On Tue, Apr 14, 2009 at 7:38 AM, Greg Landrum <greg.land...@gmail.com> wrote:
>
> To answer the overall question: yes the embedding process *should*
> preserve stereochemistry. Having said that, the area of chirality is
> where one is the most likely to encounter "correctness bugs" in the
> RDKit.
Since I'm probably more skeptical about all of this stuff than anyone else, I just did an experiment to make sure that I wasn't completely wrong to be confident that the RDKit was handling chirality reasonably in the embedding procedure.

From the PubChem screening set (or at least one version of it), I pulled out the 4579 molecules that have stereochemistry information provided for at least one atom (easily done by grepping for "@" in the SMILES file). I then ran the following code snippet over those molecules:

#-------------------
import logging

from rdkit import Chem
from rdkit.Chem import AllChem

logger = logging.getLogger(__name__)
# ms is assumed to already hold (name, smiles, mol) tuples for the 4579 molecules
oMs = []

logger.info('generating and testing:')
for i, (nm, smi, m) in enumerate(ms):
    # reference CIP labels (R/S) perceived from the input SMILES
    cDict = dict(Chem.FindMolChiralCenters(m))
    # check 1: adding explicit Hs must not change any label
    m2 = Chem.AddHs(m)
    for idx, l in Chem.FindMolChiralCenters(m2):
        if l != cDict.get(idx, l):
            print('1:', i, nm, smi, idx, l)
    # EmbedMolecule returns -1 (rather than raising) when embedding fails
    if AllChem.EmbedMolecule(m2) < 0:
        continue
    # check 2: re-derive chiral tags from the 3D coordinates and compare again
    Chem.AssignAtomChiralTagsFromStructure(m2)
    for idx, l in Chem.FindMolChiralCenters(m2):
        if l != cDict.get(idx, l):
            print('2:', i, nm, smi, idx, l)
    oMs.append((nm, smi, m2))
    if not (i + 1) % 10:
        logger.info('Done: %d', i + 1)
#------------

Also visible for 30 days here: http://pastebin.com/m19a4c639

The only error that comes out of this is for the molecule:
[C@@H]([...@h](C(=O)O)O)(C(=O)O)O
where there's a bad assignment of R and S in the result of AddHs(); that's a bug, but an unrelated one:
https://sourceforge.net/tracker/?func=detail&aid=2762917&group_id=160139&atid=814650

Though this is a limited test (fewer than 5000 molecules), the 100% success rate makes me feel a bit more comfortable with things.

-greg
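For readers without the RDKit at hand, two bookkeeping steps of the experiment can be sketched in plain Python. This is a hypothetical sketch, not code from the original post: `select_stereo_lines` and `changed_centers` are names made up for illustration, and the whitespace-separated "SMILES name" line layout is an assumption about the input file. The second helper applies the same rule as `l != cDict.get(id, l)` in the test snippet: a center is flagged only if it had a label in the reference molecule and that label changed.

```python
# Hypothetical helpers (not RDKit API) sketching two steps of the experiment:
# selecting stereo-annotated SMILES, and flagging changed CIP labels.
from typing import Dict, Iterable, List, Tuple


def select_stereo_lines(lines: Iterable[str]) -> List[str]:
    """Keep lines whose SMILES (assumed to be the first field) contains '@'.

    Like `grep "@"`, but restricted to the SMILES column so that an '@'
    in a name field is not counted. Blank lines are skipped.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if fields and "@" in fields[0]:
            kept.append(line.rstrip("\n"))
    return kept


def changed_centers(before: Iterable[Tuple[int, str]],
                    after: Iterable[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Return (atom index, old label, new label) for centers whose label changed.

    Centers that appear only in `after` are ignored, matching the
    `cDict.get(id, l)` default used in the test loop.
    """
    ref: Dict[int, str] = dict(before)
    return [(idx, ref[idx], label) for idx, label in after
            if idx in ref and ref[idx] != label]


if __name__ == "__main__":
    lines = ["C[C@H](N)C(=O)O alanine\n", "CCO ethanol\n"]
    print(select_stereo_lines(lines))  # ['C[C@H](N)C(=O)O alanine']
    print(changed_centers([(1, "S")], [(1, "R"), (3, "S")]))  # [(1, 'S', 'R')]
```

The (index, label) pairs have the same shape as the tuples returned by `Chem.FindMolChiralCenters`, so the comparison helper could be fed directly from it.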