On Tue, Apr 14, 2009 at 7:38 AM, Greg Landrum <greg.land...@gmail.com> wrote:
>
> To answer the overall question: yes the embedding process *should*
> preserve stereochemistry. Having said that, the area of chirality is
> where one is the most likely to encounter "correctness bugs" in the
> RDKit.

Since I'm probably more skeptical about all of this stuff than anyone
else, I just did an experiment to make sure that I wasn't completely
wrong to be confident that the RDKit was handling chirality reasonably
in the embedding procedure.

From the PubChem screening set (or at least one version of it), I
pulled out the 4579 molecules that have stereochemistry information
provided for at least one atom (easily done by grepping for "@" in the
SMILES file).
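
The grep step can be sketched in Python roughly as follows. This is only an illustration, not the script used for the experiment: the function names `has_stereo_annotation` and `select_stereo_lines`, the sample records, and the assumed file layout (SMILES in the first whitespace-separated column, an optional name in the second) are all hypothetical.

```python
def has_stereo_annotation(smiles):
    """Return True if the SMILES string contains at least one '@' chirality mark."""
    return "@" in smiles


def select_stereo_lines(lines):
    """Keep the (smiles, name) records whose SMILES column contains '@'."""
    selected = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        smiles = fields[0]
        name = fields[1] if len(fields) > 1 else ""
        if has_stereo_annotation(smiles):
            selected.append((smiles, name))
    return selected


# Hypothetical sample input standing in for a SMILES file
sample = [
    "C[C@H](N)C(=O)O mol1",
    "CCO mol2",
    "",
    "N[C@@H](CO)C(=O)O mol3",
]
stereo_records = select_stereo_lines(sample)
```

Unlike a plain grep over whole lines, this only looks at the SMILES column, so an "@" in a molecule name would not cause a false match.
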
I then ran the following code snippet over those molecules:
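#-------------------
# Fragment of a larger script: `ms` (a list of (name, smiles, mol) tuples),
# `oMs` (a list collecting the embedded molecules) and `logger` are
# defined elsewhere.
from rdkit import Chem
from rdkit.Chem import AllChem

logger.info('generating and testing:')
for i,(nm,smi,m) in enumerate(ms):
    # reference R/S labels from the input molecule
    centers=Chem.FindMolChiralCenters(m)
    cDict = {}
    for id,l in centers: cDict[id]=l
    m2=Chem.AddHs(m)

    # check 1: adding Hs should not change any existing label
    centers2=Chem.FindMolChiralCenters(m2)
    for id,l in centers2:
        if l!= cDict.get(id,l):
            print('1:',i,nm,smi,id,l)
    # skip molecules that cannot be embedded
    try:
        cid=AllChem.EmbedMolecule(m2)
    except Exception:
        continue
    if cid<0:
        # EmbedMolecule returns -1 when no conformer was generated
        continue
    # check 2: re-perceive chirality from the 3D coordinates and compare
    Chem.AssignAtomChiralTagsFromStructure(m2)
    centers2=Chem.FindMolChiralCenters(m2)
    for id,l in centers2:
        if l!= cDict.get(id,l):
            print('2:',i,nm,smi,id,l)
    oMs.append((nm,smi,m2))
    if not (i+1)%10: logger.info('Done: %d'%(i+1))
#------------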
Also visible for 30 days here: http://pastebin.com/m19a4c639
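
The comparison in the loop hinges on the default argument of the `cDict.get` call: an atom that had no label in the original molecule falls back to its own new label, so only atoms whose label actually changed get reported. Here is a small sketch of that logic in isolation, with no RDKit dependency. The function name `changed_labels` and the sample data are hypothetical, chosen only for illustration.

```python
def changed_labels(reference, observed):
    """Return (atom_index, old_label, new_label) for each center whose label differs.

    Both arguments are lists of (atom_index, label) pairs, the shape returned
    by Chem.FindMolChiralCenters. Centers absent from `reference` are ignored,
    mirroring the default passed to cDict.get in the snippet.
    """
    ref = dict(reference)
    changes = []
    for idx, label in observed:
        old = ref.get(idx, label)  # unknown atoms compare equal to themselves
        if label != old:
            changes.append((idx, old, label))
    return changes


# Hypothetical label sets standing in for FindMolChiralCenters output
before = [(1, "R"), (4, "S")]
after_ok = [(1, "R"), (4, "S"), (7, "R")]  # atom 7 is new, so it is not flagged
after_bad = [(1, "S"), (4, "S")]           # atom 1 flipped from R to S

no_changes = changed_labels(before, after_ok)
flipped = changed_labels(before, after_bad)
```

One consequence of this design is that a center which loses its label entirely (present in `before`, absent from `after`) is also not reported, since the loop only walks the observed centers.
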

The only error that comes out of this is for the molecule:
[C@@H]([...@h](C(=O)O)O)(C(=O)O)O

Here AddHs() produces a bad assignment of R and S. That is a bug, but
one unrelated to the embedding step:
https://sourceforge.net/tracker/?func=detail&aid=2762917&group_id=160139&atid=814650

Though this is a limited test, fewer than 5000 molecules, the 100%
success rate makes me feel a bit more comfortable with things.

-greg
