Hello all,
I realize that this topic has been discussed in some detail
(https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/76909664-2C16-4B61-8BEE-2196B3721EA1%40gmail.com/#msg34923617),
but I remain somewhat confused. Let me lay out what I am trying to achieve:
I would like a method for creating a canonical order of the atoms in a
molecule, independent of the input order. For example, given
(R)-1-(sec-butyl)naphthalene (see attached image)
if you start with the following SMILES string `CC[C@H](C1=CC=CC2=C1C=CC=C2)C`
versus the InChI string
`InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1`,
you obviously get two different atom orders. I have tried to apply the
`CanonicalRankAtoms` method to each of the molecules, such as the following
example code:
```
from rdkit import Chem

def atom_order(m):
    # One (index, atomic number, degree) tuple per atom, in the molecule's atom order
    return [(x.GetIdx(), x.GetAtomicNum(), x.GetDegree()) for x in m.GetAtoms()]

m = Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C")
m = Chem.AddHs(m)
m1 = Chem.MolFromInchi("InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1")
m1 = Chem.AddHs(m1)
# Some simple comparison of atom ordering
atom_order(m) == atom_order(m1) # returns False
m_order = list(Chem.CanonicalRankAtoms(m))
m1_order = list(Chem.CanonicalRankAtoms(m1))
m_order == m1_order # returns False
# For completeness
m_ordered = Chem.RenumberAtoms(m, m_order)
m1_ordered = Chem.RenumberAtoms(m1, m1_order)
atom_order(m_ordered) == atom_order(m1_ordered) # returns False
```
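One thing I am unsure about is whether `RenumberAtoms` wants the ranks
themselves or the permutation derived from them. As I read its documentation,
`newOrder[i]` should be the old index of the atom that ends up at position `i`,
whereas a rank list gives the rank of atom `i`. A minimal sketch of the
difference in plain Python (no RDKit involved; the rank values and the helper
name `ranks_to_order` are made up for illustration):

```python
def ranks_to_order(ranks):
    """Return order such that order[k] is the index of the atom with the k-th smallest rank.

    Assumes the ranks are unique (no ties), so the result is a permutation.
    """
    return sorted(range(len(ranks)), key=lambda i: ranks[i])


# Hypothetical ranks for a 4-atom molecule:
# atom 0 has rank 2, atom 1 has rank 0, atom 2 has rank 3, atom 3 has rank 1.
ranks = [2, 0, 3, 1]
order = ranks_to_order(ranks)

print(order)           # [1, 3, 0, 2]
print(order == ranks)  # False: a rank list and an order list are inverse permutations
```

If that reading is right, the list passed to `RenumberAtoms` above would need to
be the inverted permutation rather than the raw output of `CanonicalRankAtoms`.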
One plausible solution that seems to work is the following extension:
```
m_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m))
m1_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m1))
atom_order(m_canon) == atom_order(m1_canon) # returns True
```
I believe this works because `MolToSmiles` defaults to `canonical=True`.
What I would like to know is:
1. Why does `CanonicalRankAtoms` not return the same result for different
inputs of the same molecule? I understand that it has something to do with the
underlying molecular graph. In particular, in the linked mailing list discussion
Greg says (https://sourceforge.net/p/rdkit/mailman/message/34923647/):
“If you just want a canonical ordering of the atoms, there is no reason to
generate the SMILES. You can just use Chem.CanonicalRankAtoms().”
2. Is there a better solution than round-tripping: import format X ->
export canonical SMILES -> import canonical SMILES -> export canonical mol (mol
file or similar)?
3. A related but tangential question: is there a way to get canonical
SMILES without the lowercase aromaticity notation?
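For question 3, one possible approach is to kekulize the molecule and ask
`MolToSmiles` for Kekulé output. This is only a sketch: the helper name
`kekule_smiles` is hypothetical, and I have not verified that the resulting
string is canonical across different input atom orders.

```python
from rdkit import Chem


def kekule_smiles(smiles):
    """Return a SMILES string with explicit single/double bonds instead of lowercase aromatic atoms."""
    mol = Chem.MolFromSmiles(smiles)
    # Convert aromatic bonds to alternating single/double bonds and clear the aromatic flags
    Chem.Kekulize(mol, clearAromaticFlags=True)
    return Chem.MolToSmiles(mol, kekuleSmiles=True)


# Naphthalene written in aromatic notation
kek = kekule_smiles("c1ccc2ccccc2c1")
print(kek)
```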
Thank you very much,
Jeff van Santen
The Natural Products Atlas (www.npatlas.org)