Dear Marshall,

To answer the overall question: yes, the embedding process *should* preserve stereochemistry. Having said that, chirality is the area where one is most likely to encounter "correctness bugs" in the RDKit.

I think there may be some confusion about the representation of stereochemistry in the RDKit though. Since this is pretty much completely undocumented, that's to be expected. I will try to do a better job of this, but here's a start:

The first thing to know is that the result returned by atom.GetChiralTag() gives you incomplete information about the atom's stereochemistry -- for the full story you also need to know the ordering of the bonds around the atom. This is because the RDKit's internal representation of chirality is based on the way chirality is represented in SMILES. For a nice overview of this take a look at two posts from Noel's blog:
http://baoilleach.blogspot.com/2009/03/clockwisdom-of-smiles.html
http://baoilleach.blogspot.com/2009/03/clockwisdom-of-smiles-part-ii.html
and for an authoritative definition, try section 3.3.3 here:
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

In short, the result of GetChiralTag() tells you the direction you have to rotate to move the second substituent of the atom onto its third substituent when you are looking down the bond from the first substituent. Since implicit Hs always count as the second substituent, this means that adding Hs (making them explicit) *can* change an atom's chiral tag without affecting its actual chirality. This happens only in rare instances.

For most applications, you are probably not interested in the results of GetChiralTag(), which, as described above, are an implementation detail; instead you're probably more interested in the actual stereochemistry/chirality at that atom. For that you need to assign R/S labels to the atoms in the molecule and look at those.

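To make the difference between the chiral tag and the R/S label concrete, here is a minimal sketch. The molecule (L-alanine) and the helper name stereo_summary are my own choices for illustration, not something from your data set. It lists the raw chiral tag next to the CIP label before and after adding Hs; the tag is free to differ between the two, while the label should stay the same.

```python
from rdkit import Chem


def stereo_summary(mol):
    """Return (atom index, chiral tag, CIP label) for each labelled stereocenter.

    stereo_summary is a hypothetical helper name used only in this sketch.
    """
    # Recompute the _CIPCode atom properties from the current chiral tags.
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    rows = []
    for atom in mol.GetAtoms():
        if atom.HasProp('_CIPCode'):
            rows.append((atom.GetIdx(),
                         str(atom.GetChiralTag()),
                         atom.GetProp('_CIPCode')))
    return rows


# L-alanine, chosen here purely as a small illustrative molecule.
mol = Chem.MolFromSmiles('C[C@H](N)C(=O)O')
mol_h = Chem.AddHs(mol)  # heavy atoms keep their indices; Hs are appended

before = stereo_summary(mol)
after = stereo_summary(mol_h)

# The tag is an implementation detail and may or may not change with AddHs;
# the CIP label is what describes the actual stereochemistry.
print('implicit Hs:', before)
print('explicit Hs:', after)
```

Comparing the third element of each tuple (the CIP label) rather than the second (the tag) is the comparison that reflects the real stereochemistry.
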
Here's a quick demonstration from your mol 483696 (assuming I drew it correctly):

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
[22]>>> m = Chem.MolFromSmiles('CC(C)(C)OC(=O)N1CC(c...@h]1c(O)=O)OCC1=CC=CC=C1')
[23]>>> Chem.FindMolChiralCenters(m)
Out[23] [(11, 'S')]

FindMolChiralCenters is a convenience function that assigns CIP codes to chiral atoms in a molecule and returns them as a list. If you know specifically which atom you're interested in, you can get the same information as follows:

[33]>>> Chem.AssignStereochemistry(m)
[34]>>> m.GetAtomWithIdx(11).GetProp('_CIPCode')
Out[34] 'S'

Here's the result of embedding this molecule:

[25]>>> m2 = Chem.AddHs(m)
[26]>>> ids = AllChem.EmbedMultipleConfs(m2,10)

With 3D structures, one can derive the stereochemical information from the coordinates:

[27]>>> for id in ids:
   ....:     Chem.AssignAtomChiralTagsFromStructure(m2,confId=id)
   ....:     print(Chem.FindMolChiralCenters(m2))
   ....:
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]
[(9, 'S'), (11, 'S')]

Two things to notice:
1) atom 11 is always "S", as it was in the starting molecule
2) the other possible stereocenter (9) also shows up as a chiral center.

As I said, this is an area where I wouldn't be surprised to encounter bugs, so if you still see odd behavior after taking the stuff above into account, please post. The more test cases and examples we have for this, the more robust the code will be.

-greg

On Tue, Apr 14, 2009 at 4:36 AM, Marshall Levesque <marsh...@emolecules.com> wrote:
> Just to include other thoughts on the topic... can this issue come from the
> Chem.AddHs(mol) step that is performed pre-embed for 2D->3D generation?
>
> Since the conformers of a single starting 2D structure don't all see
> flipping of stereochemistry when a single structure has Hs added and is sent
> to be embedded, I am assuming AddHs is not the problem.
> Here are additional example stats on how variably the stereochemistry is lost:
>
> 481809_atom_4 has 90.00 % correct with 10 conformers
> 482080_atom_4 has 25.00 % correct with 4 conformers
> 475598_atom_2 has 0.00 % correct with 6 conformers
> 476397_atom_1 has 40.00 % correct with 10 conformers
> 482629_atom_0 has 20.00 % correct with 10 conformers
> 475620_atom_4 has 16.67 % correct with 6 conformers
> 476459_atom_3 has 83.33 % correct with 6 conformers
> 484276_atom_0 has 10.00 % correct with 10 conformers
> 483000_atom_1 has 8.33 % correct with 12 conformers
> 483696_atom_0 has 0.00 % correct with 14 conformers
> 483370_atom_1 has 11.11 % correct with 18 conformers
>
> The 6-digit code is the MOL_ID. "correct" is considered a match between the
> results of calling GetChiralTag() on the same atom of the 2D and 3D versions
> of a molecule.
>
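
Since GetChiralTag() can differ between the 2D and 3D versions of a molecule without the stereochemistry having changed, stats like the ones quoted above are probably better computed on CIP labels. Here is a rough sketch of how that might look. It uses L-alanine as a stand-in molecule, and the helper name cip_agreement and the random seed are hypothetical choices of mine, not part of the RDKit or of your script.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def cip_agreement(smiles, num_confs=10, seed=42):
    """Per stereocenter, the fraction of embedded conformers whose CIP label
    matches the label of the input molecule.

    cip_agreement is a hypothetical helper name used only in this sketch.
    """
    mol = Chem.MolFromSmiles(smiles)
    # Reference labels from the input (no coordinates): {atom index: 'R'/'S'}
    reference = dict(Chem.FindMolChiralCenters(mol))

    mol_h = Chem.AddHs(mol)  # heavy-atom indices are unchanged
    conf_ids = list(AllChem.EmbedMultipleConfs(mol_h, num_confs,
                                               randomSeed=seed))

    matches = {idx: 0 for idx in reference}
    for conf_id in conf_ids:
        # Re-derive the chiral tags from this conformer's 3D coordinates,
        # then read back the CIP labels they imply.
        Chem.AssignAtomChiralTagsFromStructure(mol_h, confId=conf_id)
        found = dict(Chem.FindMolChiralCenters(mol_h))
        for idx, label in reference.items():
            if found.get(idx) == label:
                matches[idx] += 1

    n = len(conf_ids)
    fractions = {idx: (count / n if n else 0.0)
                 for idx, count in matches.items()}
    return fractions, n


# L-alanine as a stand-in; substitute the SMILES of the molecule of interest.
agreement, n_confs = cip_agreement('C[C@H](N)C(=O)O')
print(n_confs, 'conformers:', agreement)
```

A fraction below 1.0 for a center that was specified in the input would be the kind of test case worth posting.
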