Dear Marshall,

To answer the overall question: yes, the embedding process *should*
preserve stereochemistry. Having said that, the area of chirality is
where one is the most likely to encounter "correctness bugs" in the
RDKit.

I think there may be some confusion about the representation of
stereochemistry in the RDKit though. Since this is pretty much
completely undocumented, that's to be expected. I will try to do a
better job of this, but here's a start:

The first thing to know is that the result returned by
atom.GetChiralTag() gives you incomplete information about the atom's
stereochemistry -- for the full story you also need to know the
ordering of the bonds around the atom. This is because the RDKit's
internal representation of chirality is based on the way chirality is
represented in SMILES. For a nice overview of this take a look at two
posts from Noel's blog:
http://baoilleach.blogspot.com/2009/03/clockwisdom-of-smiles.html
http://baoilleach.blogspot.com/2009/03/clockwisdom-of-smiles-part-ii.html
and for an authoritative definition, try section 3.3.3 here:
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

In short, the result of atom.GetChiralTag() tells you the direction you
have to rotate to move the second substituent of the atom onto its
third substituent when you are looking down the bond from the first
substituent.

Since implicit Hs always count as the second substituent, this means
that adding Hs (making them explicit) *can* change an atom's chiral
tag without affecting its actual chirality. This happens only
in rare instances.
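To make the dependence on neighbor ordering concrete, here is a toy model in plain Python. It is only a sketch of the permutation-parity idea behind SMILES-style chirality, not the RDKit's implementation; the function names and the "CW"/"CCW" strings are made up for illustration. The rule it encodes: reordering the neighbors by an odd permutation flips the tag, while an even permutation leaves it alone, and in both cases the real chirality is unchanged.

```python
from itertools import combinations


def permutation_parity(old_order, new_order):
    """Return +1 if new_order is an even permutation of old_order, else -1."""
    positions = [old_order.index(x) for x in new_order]
    # Count pairs that appear in reversed relative order (inversions).
    inversions = sum(
        1
        for i, j in combinations(range(len(positions)), 2)
        if positions[i] > positions[j]
    )
    return 1 if inversions % 2 == 0 else -1


def retag(tag, old_order, new_order):
    """Tag that describes the SAME chirality after the neighbors are reordered."""
    if permutation_parity(old_order, new_order) == 1:
        return tag
    return "CW" if tag == "CCW" else "CCW"


# Hypothetical neighbor lists: H is moved to the end of the list,
# the way an explicit H might end up after the other neighbors.
print(retag("CCW", ["a", "H", "b", "c"], ["a", "b", "c", "H"]))  # even: CCW
print(retag("CCW", ["H", "a", "b", "c"], ["a", "b", "c", "H"]))  # odd: CW
```

In this model, moving the H from the second slot to the end is an even permutation, so the tag survives; moving it from the first slot is odd, so the tag flips. That is one way a tag can change while the stereocenter itself stays the same.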

For most applications, you are probably not interested in the result
of atom.GetChiralTag() itself, which, as described above, is an
implementation detail. You are probably more interested in the actual
stereochemistry/chirality at that atom. For that you need to assign
R/S labels to the atoms in the molecule and look at those.
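A small sketch of why the labels are the safer thing to compare. The SMILES strings below are my own illustrative choices (two different atom orderings of the same alanine enantiomer, plus its mirror image), not molecules from your set, and the helper names are hypothetical. The raw tags depend on how the SMILES was written; the R/S labels should not.

```python
from rdkit import Chem


def chiral_tags(smiles):
    """Raw chiral tags (an implementation detail) for atoms that have one."""
    mol = Chem.MolFromSmiles(smiles)
    return [
        (atom.GetIdx(), str(atom.GetChiralTag()))
        for atom in mol.GetAtoms()
        if atom.GetChiralTag() != Chem.ChiralType.CHI_UNSPECIFIED
    ]


def cip_labels(smiles):
    """R/S labels as (atom index, label) pairs."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.FindMolChiralCenters(mol)


same_a = "C[C@H](N)C(=O)O"    # one way of writing the molecule
same_b = "N[C@@H](C)C(=O)O"   # same enantiomer, different atom order
mirror = "C[C@@H](N)C(=O)O"   # the other enantiomer

for smi in (same_a, same_b, mirror):
    print(smi, chiral_tags(smi), cip_labels(smi))
```

The first two lines of output should carry the same R/S label even if their tags differ, while the third should carry the opposite label.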

Here's a quick demonstration from your mol 483696 (assuming I drew it
correctly):
[22]>>> m = Chem.MolFromSmiles('CC(C)(C)OC(=O)N1CC(c...@h]1c(O)=O)OCC1=CC=CC=C1')

[23]>>> Chem.FindMolChiralCenters(m)
Out[23] [(11, 'S')]


FindMolChiralCenters is a convenience function that assigns CIP codes
to chiral atoms in a molecule, and returns them as a list. If you know
specifically which atom you're interested in, you can get the same
information as follows:
[33]>>> Chem.AssignStereochemistry(m)

[34]>>> m.GetAtomWithIdx(11).GetProp('_CIPCode')
Out[34] 'S'
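If you want the labels for every atom at once without FindMolChiralCenters, something like the following should work. It is a sketch on an illustrative molecule of my choosing (an alanine SMILES), and it guards with HasProp() because atoms that are not stereocenters do not carry the '_CIPCode' property, so calling GetProp() on them raises an error.

```python
from rdkit import Chem

# Illustrative molecule, not one from the original data set.
mol = Chem.MolFromSmiles("C[C@H](N)C(=O)O")

# Recompute the stereochemistry labels from the chiral tags.
Chem.AssignStereochemistry(mol, cleanIt=True, force=True)

# Only atoms that were assigned a label carry the '_CIPCode' property.
cip_by_atom = {
    atom.GetIdx(): atom.GetProp("_CIPCode")
    for atom in mol.GetAtoms()
    if atom.HasProp("_CIPCode")
}
print(cip_by_atom)
```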

Here's the result of embedding this molecule:
[25]>>> m2 = Chem.AddHs(m)

[26]>>> ids = AllChem.EmbedMultipleConfs(m2, 10)

With 3D structures, one can derive the stereochemical information from
the coordinates:

[27]>>> for cid in ids:
   ....:     Chem.AssignAtomChiralTagsFromStructure(m2, confId=cid)
   ....:     print(Chem.FindMolChiralCenters(m2))
   ....:
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]

Two things to notice:
1) atom 11 is always "S", the same as in the starting molecule
2) the other possible stereocenter (9) also shows up as a chiral center.
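The same check can be written as a self-contained script that compares R/S labels (rather than raw chiral tags) between the input molecule and each embedded conformer. This is a sketch on an illustrative alanine SMILES, not on your molecule; the number of conformers and the random seed are arbitrary choices of mine.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Illustrative input with one specified stereocenter.
mol = Chem.MolFromSmiles("C[C@H](N)C(=O)O")
expected = dict(Chem.FindMolChiralCenters(mol))

# AddHs appends the new H atoms, so heavy-atom indices are unchanged.
mol_h = Chem.AddHs(mol)
conf_ids = list(AllChem.EmbedMultipleConfs(mol_h, numConfs=5, randomSeed=42))

per_conformer = []
for cid in conf_ids:
    # Derive the chiral tags from this conformer's 3D coordinates,
    # then read them back as R/S labels.
    Chem.AssignAtomChiralTagsFromStructure(mol_h, confId=cid)
    per_conformer.append(dict(Chem.FindMolChiralCenters(mol_h)))

for cid, labels in zip(conf_ids, per_conformer):
    status = "OK" if labels == expected else "MISMATCH"
    print(cid, labels, status)
```

Comparing labels this way avoids false alarms caused by the tag changing when the neighbor ordering changes.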

As I said, this is an area where I wouldn't be surprised to encounter
bugs, so if you still see odd behavior after taking the stuff above
into account, please post. The more test cases and examples we have
for this, the more robust the code will be.

-greg

On Tue, Apr 14, 2009 at 4:36 AM, Marshall Levesque
<marsh...@emolecules.com> wrote:
> Just to include other thoughts on the topic... can this issue come from the
> Chem.AddHs(mol) step that is performed pre-embed for 2D->3D generation?
>
> Since the conformers of a single starting 2D structure don't all see
> flipping of stereochemistry when a single structure has Hs added and is sent
> to be embedded, I am assuming AddHs is not the problem.  Here are additional
> example stats on how variably the stereochemistry is lost:
>
> 481809_atom_4 has 90.00 % correct with 10 conformers
> 482080_atom_4 has 25.00 % correct with 4 conformers
> 475598_atom_2 has 0.00 % correct with 6 conformers
> 476397_atom_1 has 40.00 % correct with 10 conformers
> 482629_atom_0 has 20.00 % correct with 10 conformers
> 475620_atom_4 has 16.67 % correct with 6 conformers
> 476459_atom_3 has 83.33 % correct with 6 conformers
> 484276_atom_0 has 10.00 % correct with 10 conformers
> 483000_atom_1 has 8.33 % correct with 12 conformers
> 483696_atom_0 has 0.00 % correct with 14 conformers
> 483370_atom_1 has 11.11 % correct with 18 conformers
>
> The 6-digit code is the MOL_ID.  "correct" is considered a match between the
> results of calling GetChiralTag() on the same atom of the 2D and 3D versions
> of a molecule.
>
