Greg-

As always... thank you for the detailed response.

I have been looking in detail at some amino acid structures going from 2D to 3D. Here are my results from investigating the chirality of alanine with RDKit's methods:

>>> suppl = Chem.SDMolSupplier('4aminoacids.separated.sdf')
>>>
>>> m = suppl[0]
>>> Chem.FindMolChiralCenters(m)
[(2, 'S')]
>>> mH = Chem.AddHs(m)
>>> Chem.FindMolChiralCenters(mH)
[(2, 'R')]
>>> print(Chem.MolToMolBlock(m))

     RDKit          3D

  6  5  0  0  0  0  0  0  0  0999 V2000
   -0.7083    0.0583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4833    1.4125    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.8333    0.0583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4833   -1.2708    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.6083    1.4125    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.6083   -1.2708    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  2  0
  3  1  1  1
  4  1  1  0
  3  5  1  0
  6  3  1  0
M  END

>>> print(Chem.MolToMolBlock(mH))

     RDKit          3D

 13 12  0  0  0  0  0  0  0  0999 V2000
   -0.7083    0.0583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4833    1.4125    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.8333    0.0583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4833   -1.2708    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.6083    1.4125    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.6083   -1.2708    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  2  0
  3  1  1  6
  4  1  1  0
  3  5  1  0
  6  3  1  0
  3  7  1  0
  4  8  1  0
  5  9  1  0
  5 10  1  0
  6 11  1  0
  6 12  1  0
  6 13  1  0
M  END

The embedded version of mH also had an 'R' configuration.
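Comparing the two bond blocks above: apart from the new C-H bonds, the only change is on the bond from atom 3 (the alpha carbon) to atom 1, whose V2000 stereo field (fourth column) goes from 1 (wedge) in m to 6 (hash) in mH, while the added H atoms all sit at 0,0,0. With unchanged heavy-atom coordinates, a wedge turning into a hash would be consistent with the label flipping from S to R. A minimal sketch for pulling those flags out of two molblocks so they can be diffed; `bond_stereo_flags` is a hypothetical helper that reads only the fixed-width V2000 counts line and bond block, nothing else.

```python
def bond_stereo_flags(molblock):
    """Map (begin, end) atom numbers to the V2000 bond-stereo field.

    For single bonds: 0 = no stereo, 1 = wedge (up), 4 = either, 6 = hash (down).
    Assumes a V2000 block: 3 header lines, then the counts line.
    """
    lines = molblock.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    bond_start = 4 + n_atoms  # bond block follows the atom block
    flags = {}
    for line in lines[bond_start:bond_start + n_bonds]:
        # fixed-width fields: first atom, second atom, bond type, stereo
        begin, end, stereo = int(line[0:3]), int(line[3:6]), int(line[9:12])
        flags[(begin, end)] = stereo
    return flags
```

Called on `Chem.MolToMolBlock(m)` and `Chem.MolToMolBlock(mH)`, the keys of the first result are a subset of the second, so a comprehension like `[k for k in a if a[k] != b[k]]` lists the heavy-atom bonds whose stereo flag changed.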

I will take a look at your tests. The big difference I see is that you are using SMILES as your source, while I am using SDF files.

Could there be an issue with handling SDF?
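One way to separate an SDF-handling problem from an AddHs() or embedding problem is to compute the labels for the same molecule at each step and compare them. A minimal sketch, assuming a current RDKit under Python 3; `labels` and `stereo_report` are hypothetical helpers and the random seed is arbitrary. Canonical SMILES renumbers atoms, so the SMILES round trip only compares the sorted list of labels, which would miss two centers swapping labels.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def labels(mol):
    """CIP labels of the assigned stereocenters, keyed by atom index."""
    return dict(Chem.FindMolChiralCenters(mol))


def stereo_report(mol, seed=42):
    """Compare CIP labels for one molecule as read, with Hs, embedded, and via SMILES."""
    report = {"as_read": labels(mol)}

    # AddHs() appends the new H atoms after the heavy atoms,
    # so heavy-atom indices stay comparable with "as_read".
    mol_h = Chem.AddHs(mol)
    report["with_hs"] = labels(mol_h)

    # EmbedMolecule returns 0 on success, -1 on failure.
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == 0:
        # re-derive the chiral tags from the 3D coordinates
        Chem.AssignAtomChiralTagsFromStructure(mol_h)
        report["embedded"] = labels(mol_h)

    # Round trip through isomeric SMILES; atom order changes, so compare sorted labels.
    smi = Chem.MolToSmiles(mol, isomericSmiles=True)
    report["smiles_labels"] = sorted(labels(Chem.MolFromSmiles(smi)).values())
    return report
```

For the SDF above, something like `for i, m in enumerate(Chem.SDMolSupplier('4aminoacids.separated.sdf')): print(i, stereo_report(m))` (skipping entries where `m` is None) would show at which step, if any, a label changes.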

-Marshall

On Apr 14, 2009, at 12:55 PM, Greg Landrum wrote:

On Tue, Apr 14, 2009 at 7:38 AM, Greg Landrum <greg.land...@gmail.com> wrote:

To answer the overall question: yes, the embedding process *should*
preserve stereochemistry. Having said that, chirality is the area
where one is most likely to encounter "correctness bugs" in the
RDKit.

Since I'm probably more skeptical about all of this stuff than anyone
else, I just ran an experiment to make sure that I wasn't completely
wrong to be confident that the RDKit handles chirality reasonably
in the embedding procedure.

From the PubChem screening set (or at least one version of it), I
pulled out the 4579 molecules that have stereochemistry information
provided for at least one atom (easily done by grepping for "@" in the
SMILES file).
I then ran the following code snippet over those molecules:
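#-------------------
import logging
from rdkit import Chem
from rdkit.Chem import AllChem

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# ms: list of (name, smiles, mol) tuples, built beforehand (not shown here)
oMs = []
logger.info('generating and testing:')
for i, (nm, smi, m) in enumerate(ms):
    # CIP labels of the original (implicit-H) molecule, keyed by atom index
    cDict = dict(Chem.FindMolChiralCenters(m))
    m2 = Chem.AddHs(m)

    # check 1: do the labels survive AddHs()?
    for idx, label in Chem.FindMolChiralCenters(m2):
        if label != cDict.get(idx, label):
            print('1:', i, nm, smi, idx, label)

    # check 2: do the labels survive 3D embedding?
    try:
        if AllChem.EmbedMolecule(m2) < 0:  # -1 means embedding failed
            continue
    except Exception:
        continue
    # re-derive chiral tags from the 3D coordinates, then recompute labels
    Chem.AssignAtomChiralTagsFromStructure(m2)
    for idx, label in Chem.FindMolChiralCenters(m2):
        if label != cDict.get(idx, label):
            print('2:', i, nm, smi, idx, label)
    oMs.append((nm, smi, m2))
    if not (i + 1) % 10:
        logger.info('Done: %d' % (i + 1))
#------------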
Also visible for 30 days here: http://pastebin.com/m19a4c639
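The snippet assumes a list `ms` of (name, SMILES, mol) tuples that isn't shown. A minimal sketch of the "@" filter described above, assuming a hypothetical SMILES file with one `SMILES name` pair per line; `stereo_smiles` is an illustrative helper, not an RDKit function. Since "@" only marks atom-centered chirality in SMILES, molecules whose only stereo is E/Z (written with / and \) are not selected, matching "stereochemistry information provided for at least one atom".

```python
def stereo_smiles(lines):
    """Keep (name, smiles) pairs whose SMILES contain atom stereo ('@').

    Assumes hypothetical "SMILES<whitespace>name" lines. Unlike a plain
    `grep @`, only the SMILES field is checked, not the name.
    """
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        smi, name = fields[0], fields[1]
        if "@" in smi:
            selected.append((name, smi))
    return selected
```

To build `ms` from it, something like `[(nm, smi, Chem.MolFromSmiles(smi)) for nm, smi in stereo_smiles(open(path))]` would do, dropping entries where `MolFromSmiles` returns None.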

The only error that comes out of this is for the molecule:
[C@@H]([...@h](C(=O)O)O)(C(=O)O)O

Here R and S are assigned incorrectly in the result of AddHs(). That is
a bug, but one unconnected to embedding:
https://sourceforge.net/tracker/?func=detail&aid=2762917&group_id=160139&atid=814650
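The "1:" and "2:" lines come from the `cDict.get(...)` comparison in the snippet. A minimal pure-Python sketch of that comparison, with `label_mismatches` as a hypothetical name, makes its behavior explicit: an atom labeled only in the second result is compared with itself and never reported, so only centers whose label actually changed are flagged.

```python
def label_mismatches(before, after):
    """Atoms whose CIP label changed between two FindMolChiralCenters() results.

    `before` and `after` are lists of (atom_index, label) tuples. An atom that
    is labeled only in `after` falls back to its own label and is not reported.
    """
    reference = dict(before)
    return [(idx, label) for idx, label in after
            if label != reference.get(idx, label)]
```

Note that the comparison is one-directional: a center present in `before` but missing from `after` is not reported either, so a lost stereocenter would need a separate check.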

Though this is a limited test (fewer than 5000 molecules), the 100%
success rate for embedding makes me feel a bit more comfortable with things.

-greg

