Hello all, I realize that this topic has been discussed in some detail (https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/76909664-2C16-4B61-8BEE-2196B3721EA1%40gmail.com/#msg34923617), but I remain somewhat confused. Let me lay out what I am trying to achieve:
I would like a method for creating a canonical order of the atoms in a molecule, independent of the input order. For example, given (R)-1-(sec-butyl)naphthalene (see attached image), if you start with the SMILES string “CC[C@H](C1=CC=CC2=C1C=CC=C2)C” versus the InChI string “InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1”, you obviously get two different atom orders.

I have tried applying the `CanonicalRankAtoms` method to each of the molecules, as in the following example code:

```
from rdkit import Chem


def atom_order(m):
    return [(x.GetIdx(), x.GetAtomicNum(), x.GetDegree()) for x in m.GetAtoms()]


m = Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C")
m = Chem.AddHs(m)
m1 = Chem.MolFromInchi("InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1")
m1 = Chem.AddHs(m1)

# Some simple comparison of atom ordering
atom_order(m) == atom_order(m1)  # returns False

m_order = list(Chem.CanonicalRankAtoms(m))
m1_order = list(Chem.CanonicalRankAtoms(m1))
m_order == m1_order  # returns False

# For completeness
m_ordered = Chem.RenumberAtoms(m, m_order)
m1_ordered = Chem.RenumberAtoms(m1, m1_order)
atom_order(m_ordered) == atom_order(m1_ordered)  # returns False
```

One plausible solution that seems to work is the following extension:

```
m_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m))
m1_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m1))
atom_order(m_canon) == atom_order(m1_canon)  # returns True
```

I believe this works because `MolToSmiles` defaults to `canonical=True`.

I suppose what I would like to know is:

1. Why does `CanonicalRankAtoms` not return the same result for different inputs of the same molecule? I understand that it has something to do with the underlying molecular graph.
In particular, in the linked mailing list discussion Greg says (https://sourceforge.net/p/rdkit/mailman/message/34923647/): “If you just want a canonical ordering of the atoms, there is no reason to generate the SMILES. You can just use Chem.CanonicalRankAtoms().”
2. Is there a better solution than round-tripping from import X format -> export canonical SMILES -> import canonical SMILES -> export canonical mol (mol file or similar)?
3. As a related but tangential question, is there a way to get canonical SMILES without the lowercase aromaticity notation?

Thank you very much,
Jeff van Santen
The Natural Products Atlas (www.npatlas.org)