Hello all,

I realize that this topic has been discussed in some detail 
(https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/76909664-2C16-4B61-8BEE-2196B3721EA1%40gmail.com/#msg34923617),
 but I remain somewhat confused. Let me lay out what I am trying to achieve:

I would like a method for creating a canonical order of the atoms in a 
molecule, independent of the input order. For example, given 
(R)-1-(sec-butyl)naphthalene (see attached image)
if you start with the SMILES string "CC[C@H](C1=CC=CC2=C1C=CC=C2)C" 
versus the InChI string 
"InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1",
 you get two different atom orders. I have tried applying the 
`CanonicalRankAtoms` function to each of the molecules, as in the following 
example code:

```
from rdkit import Chem

def atom_order(m):
    return [(x.GetIdx(), x.GetAtomicNum(), x.GetDegree()) for x in m.GetAtoms()]

m = Chem.MolFromSmiles("CC[C@H](C1=CC=CC2=C1C=CC=C2)C")
m = Chem.AddHs(m)
m1 = Chem.MolFromInchi("InChI=1S/C14H16/c1-3-11(2)13-10-6-8-12-7-4-5-9-14(12)13/h4-11H,3H2,1-2H3/t11-/m1/s1")
m1 = Chem.AddHs(m1)
# Some simple comparison of atom ordering
atom_order(m) == atom_order(m1) # returns False
m_order = list(Chem.CanonicalRankAtoms(m))
m1_order = list(Chem.CanonicalRankAtoms(m1))
m_order == m1_order # returns False
# For completeness
m_ordered = Chem.RenumberAtoms(m, m_order)
m1_ordered = Chem.RenumberAtoms(m1, m1_order)
atom_order(m_ordered) == atom_order(m1_ordered) # returns False
```
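
One thing worth checking in the code above: as I read the documentation, `CanonicalRankAtoms` returns one rank per atom, indexed by the current atom index, whereas `RenumberAtoms` expects a list of old atom indices in their new order. Those two lists are inverse permutations of each other, so passing the ranks straight to `RenumberAtoms` may not do what is intended. Below is a minimal pure-Python sketch of the inversion; `ranks_to_order` is a hypothetical helper name, not part of RDKit, and the ranks are made up for illustration.

```python
def ranks_to_order(ranks):
    """Invert a per-atom rank list into a new-order list.

    ranks[i] is the rank of the atom currently at index i. The result lists
    the old atom indices sorted by rank, so result[k] is the old index of
    the atom that should end up at position k.
    """
    return sorted(range(len(ranks)), key=lambda i: ranks[i])


# Toy example with made-up ranks for a four-atom molecule.
ranks = [2, 0, 3, 1]
order = ranks_to_order(ranks)
print(order)  # [1, 3, 0, 2]
```

With RDKit this would presumably be used as `Chem.RenumberAtoms(m, ranks_to_order(list(Chem.CanonicalRankAtoms(m))))`. I have not confirmed that this alone makes the two inputs agree.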

One plausible solution that seems to work is the following extension:

```
m_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m))
m1_canon = Chem.MolFromSmiles(Chem.MolToSmiles(m1))
atom_order(m_canon) == atom_order(m1_canon) # returns True
```

I believe this works because `MolToSmiles` defaults to `canonical=True`.

I suppose what I would like to know is

  1.  Why does `CanonicalRankAtoms` not return the same result for different 
inputs of the same molecule? I understand that it has something to do with the 
underlying molecular graph. In particular, in the linked mailing list discussion 
Greg says (https://sourceforge.net/p/rdkit/mailman/message/34923647/):
“If you just want a canonical ordering of the atoms, there is no reason to 
generate the SMILES. You can just use Chem.CanonicalRankAtoms().”
  2.  Is there a better solution than round-tripping: import from format X -> 
export canonical SMILES -> import canonical SMILES -> export canonical mol (mol 
file or similar)?
  3.  As a related but tangential question, is there a way to get canonical 
SMILES without the lowercase aromaticity notation?

Thank you very much,

Jeff van Santen
The Natural Products Atlas (www.npatlas.org)