Greg-
As always... thank you for the detailed response.
I have been looking in detail at some amino acid structures as they
go from 2D to 3D. Here are my results for alanine, using RDKit's
functions for finding chiral centers:
>>> suppl = Chem.SDMolSupplier('4aminoacids.separated.sdf')
>>>
>>> m = suppl[0]
>>> Chem.FindMolChiralCenters(m)
[(2, 'S')]
>>> mH = Chem.AddHs(m)
>>> Chem.FindMolChiralCenters(mH)
[(2, 'R')]
>>> print Chem.MolToMolBlock(m)
RDKit 3D
6 5 0 0 0 0 0 0 0 0999 V2000
-0.7083 0.0583 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.4833 1.4125 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
0.8333 0.0583 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.4833 -1.2708 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.6083 1.4125 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
1.6083 -1.2708 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2 1 2 0
3 1 1 1
4 1 1 0
3 5 1 0
6 3 1 0
M END
>>> print Chem.MolToMolBlock(mH)
RDKit 3D
13 12 0 0 0 0 0 0 0 0999 V2000
-0.7083 0.0583 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.4833 1.4125 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
0.8333 0.0583 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.4833 -1.2708 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.6083 1.4125 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
1.6083 -1.2708 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
2 1 2 0
3 1 1 6
4 1 1 0
3 5 1 0
6 3 1 0
3 7 1 0
4 8 1 0
5 9 1 0
5 10 1 0
6 11 1 0
6 12 1 0
6 13 1 0
M END
The embedded version of mH also had an 'R' configuration.
I will take a look at your tests. The big difference I see here is
that you are using SMILES while I am using SDFs as my source.
Could there be an issue with handling SDF?
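One place to look is the bond block of the two mol blocks above. The sketch below is plain Python, not an RDKit call; the helper name bond_stereo_flags is made up for illustration, and it assumes the bond lines are whitespace-separated as printed here (real V2000 files use fixed-width columns). It compares the stereo flag (fourth field: 1 is a wedge "up" bond, 6 is a hashed "down" bond) on the heavy-atom bonds of m and mH.

```python
def bond_stereo_flags(bond_lines):
    """Map (begin_atom, end_atom) -> stereo flag for whitespace-separated bond lines."""
    flags = {}
    for line in bond_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a bond line in the layout assumed here
        begin, end, _order, stereo = (int(f) for f in fields[:4])
        flags[(begin, end)] = stereo
    return flags


# Heavy-atom bond lines copied from the two mol blocks in this message.
bonds_m = ["2 1 2 0", "3 1 1 1", "4 1 1 0", "3 5 1 0", "6 3 1 0"]
bonds_mH = ["2 1 2 0", "3 1 1 6", "4 1 1 0", "3 5 1 0", "6 3 1 0"]

flags_m = bond_stereo_flags(bonds_m)
flags_mH = bond_stereo_flags(bonds_mH)

# Bonds whose stereo flag differs between m and mH, as (flag in m, flag in mH).
changed = {
    bond: (flags_m[bond], flags_mH[bond])
    for bond in flags_m
    if flags_m[bond] != flags_mH.get(bond)
}
print(changed)
```

For these two blocks the only difference is the bond from atom 3 to atom 1, which is written with flag 1 in m and flag 6 in mH. Atom 3 is the same center that FindMolChiralCenters reports as index 2.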
-Marshall
On Apr 14, 2009, at 12:55 PM, Greg Landrum wrote:
On Tue, Apr 14, 2009 at 7:38 AM, Greg Landrum
<greg.land...@gmail.com> wrote:
To answer the overall question: yes the embedding process *should*
preserve stereochemistry. Having said that, the area of chirality is
where one is the most likely to encounter "correctness bugs" in the
RDKit.
Since I'm probably more skeptical about all of this stuff than anyone
else, I just did an experiment to make sure that I wasn't completely
wrong to be confident that the RDKit was handling chirality reasonably
in the embedding procedure.
From the PubChem screening set (or at least one version of it), I
pulled out the 4579 molecules that have stereochemistry information
provided for at least one atom (easily done by grepping for "@" in the
SMILES file).
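The same filter can be sketched in plain Python. This is only an illustration: it assumes one molecule per line with the SMILES in the first whitespace-separated column, and the function name and sample lines are made up rather than taken from the PubChem set.

```python
def filter_stereo_lines(lines):
    """Keep lines whose SMILES (first column) contains '@'."""
    kept = []
    for line in lines:
        fields = line.split()
        # '@' and '@@' mark atom-centered chirality in SMILES.
        if fields and "@" in fields[0]:
            kept.append(line.rstrip("\n"))
    return kept


# Made-up sample input: "SMILES name" per line.
sample = [
    "C[C@H](N)C(=O)O alanine_stereo\n",
    "CC(N)C(=O)O alanine_flat\n",
    "NCC(=O)O glycine\n",
]
stereo_lines = filter_stereo_lines(sample)
print(stereo_lines)
```

Checking only the first column avoids picking up an "@" that happens to appear in a molecule name.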
I then ran the following code snippet over those molecules:
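#-------------------
from rdkit import Chem
from rdkit.Chem import AllChem

# ms, oMs and logger are defined earlier in the script (not shown).
logger.info('generating and testing:')
for i, (nm, smi, m) in enumerate(ms):
    # reference labels from the input molecule
    centers = Chem.FindMolChiralCenters(m)
    cDict = {}
    for idx, label in centers:
        cDict[idx] = label

    # check 1: labels after adding hydrogens
    m2 = Chem.AddHs(m)
    centers2 = Chem.FindMolChiralCenters(m2)
    for idx, label in centers2:
        if label != cDict.get(idx, label):
            print('1: %s %s %s %s %s' % (i, nm, smi, idx, label))

    try:
        AllChem.EmbedMolecule(m2)
    except Exception:
        continue

    # check 2: labels recomputed from the embedded 3D structure
    Chem.AssignAtomChiralTagsFromStructure(m2)
    centers2 = Chem.FindMolChiralCenters(m2)
    for idx, label in centers2:
        if label != cDict.get(idx, label):
            print('2: %s %s %s %s %s' % (i, nm, smi, idx, label))
    oMs.append((nm, smi, m2))
    if not (i + 1) % 10:
        logger.info('Done: %d' % (i + 1))
#------------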
Also visible for 30 days here: http://pastebin.com/m19a4c639
The only error that comes out of this is for the molecule:
[C@@H]([...@h](C(=O)O)O)(C(=O)O)O
where R and S are assigned incorrectly in the result of AddHs(); a
bug, but an unrelated one:
https://sourceforge.net/tracker/?func=detail&aid=2762917&group_id=160139&atid=814650
Though this is a limited test, less than 5000 molecules, the 100%
success rate makes me feel a bit more comfortable with things.
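For reference, the label comparison done twice in the loop can be pulled out into a small helper. This is a sketch in plain Python, not something the RDKit provides; the name changed_centers is made up for illustration. Like the loop, it ignores centers that are missing from the reference list.

```python
def changed_centers(reference, candidate):
    """Return (atom_idx, reference_label, candidate_label) for labels that differ.

    Both arguments are lists of (atom_idx, label) tuples, the shape that
    Chem.FindMolChiralCenters() returns.
    """
    ref = dict(reference)
    return [
        (idx, ref[idx], label)
        for idx, label in candidate
        if idx in ref and ref[idx] != label
    ]


# The alanine results reported earlier in this thread: m versus mH.
flipped = changed_centers([(2, 'S')], [(2, 'R')])
print(flipped)
```

Applied to the alanine lists quoted above, it reports atom 2 going from 'S' to 'R'.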
-greg