Hi Pavel,

Unfortunately, it is not that easy.
The canonicalization algorithm does not use atom aromaticity when
determining atom ordering, so as far as it is concerned there is no
difference between atoms 0 and 2 in either of your examples. What does get
used is the number of hydrogens, so you need to set that explicitly in order
to get the results you are looking for.[1] For technical reasons, you also
need to tell the RDKit that the atoms should not have implicit Hs attached.
Here's a gist that works for me:
https://gist.github.com/greglandrum/f4e2f2f2ad311560d8ab36874d503843
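
As a rough illustration of the approach described above (a sketch, not
necessarily what the linked gist does): set an explicit H count on each
carbon and mark it as having no implicit Hs before generating the SMILES.
The helper name build_fragment and the H counts (0 on the aromatic atom, 1 on
the central atom, 3 on the other end) are assumptions made up for this
example; use whatever counts are correct for your fragments.

```python
from rdkit import Chem


def build_fragment(aromatic_idx, other_end_idx):
    """Build the three-carbon fragment with a mapped dummy on atom 1.

    Hypothetical helper: aromatic_idx and other_end_idx must be 0 and 2
    in either order.
    """
    m = Chem.RWMol()
    for _ in range(3):
        m.AddAtom(Chem.Atom(6))
    m.AddAtom(Chem.Atom(0))

    m.GetAtomWithIdx(aromatic_idx).SetIsAromatic(True)
    m.GetAtomWithIdx(3).SetAtomMapNum(1)

    # Assumed H counts, for illustration only.
    h_counts = {aromatic_idx: 0, 1: 1, other_end_idx: 3}
    for idx, n_hs in h_counts.items():
        atom = m.GetAtomWithIdx(idx)
        atom.SetNumExplicitHs(n_hs)  # the H count is used for atom ordering
        atom.SetNoImplicit(True)     # do not let the RDKit add implicit Hs

    m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
    m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
    m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)
    return m


smi_a = Chem.MolToSmiles(build_fragment(0, 2))  # aromatic atom is atom 0
smi_b = Chem.MolToSmiles(build_fragment(2, 0))  # aromatic atom is atom 2
print(smi_a, smi_b)
```

With the H counts differing between the two end atoms, both calls should
print the same SMILES regardless of which index carries the aromatic flag.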

Two notes:
 1) I don't set the number of Hs on atom 1 in that gist, but I would
suggest doing that too.
 2) If atoms 0 and 2 have the same number of Hs attached, this is still not
going to work if you're building things from fragments: the H count is then
no longer able to tell the two atoms apart. The canonicalization code was
not really designed to be used in situations like this.
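
To make note 2 concrete, here is a toy model in plain Python. It is NOT the
RDKit's canonicalization code; it just sorts atoms by a made-up invariant
tuple (atomic number, degree, H count) that, like the real invariants,
leaves out aromaticity, and falls back to the atom index when two atoms tie.
All names here (toy_order, fragment, the labels) are hypothetical.

```python
def toy_order(atoms):
    """Return atom labels sorted by a toy invariant, ties broken by index."""
    def key(i):
        a = atoms[i]
        # Aromaticity is deliberately absent from the invariant.
        return (a["z"], a["degree"], a["num_hs"], i)
    return [atoms[i]["label"] for i in sorted(range(len(atoms)), key=key)]


def fragment(aromatic_idx, hs_aromatic, hs_other):
    """Describe the four-atom fragment; aromatic_idx is 0 or 2."""
    other_idx = 2 if aromatic_idx == 0 else 0
    atoms = [None] * 4
    atoms[aromatic_idx] = {"label": "aromatic C", "z": 6, "degree": 1,
                           "num_hs": hs_aromatic}
    atoms[other_idx] = {"label": "aliphatic C", "z": 6, "degree": 1,
                        "num_hs": hs_other}
    atoms[1] = {"label": "central C", "z": 6, "degree": 3, "num_hs": 1}
    atoms[3] = {"label": "dummy", "z": 0, "degree": 1, "num_hs": 0}
    return atoms


# Same H count on both ends: the tie is broken by index, so the order
# depends on how the fragment happened to be built.
same_a = toy_order(fragment(0, 0, 0))
same_b = toy_order(fragment(2, 0, 0))

# Different H counts: the order no longer depends on the indices.
diff_a = toy_order(fragment(0, 0, 3))
diff_b = toy_order(fragment(2, 0, 3))

print(same_a == same_b, diff_a == diff_b)
```

In this toy model the two end atoms swap places when their H counts are
equal, which mirrors the index-dependent SMILES in the original examples.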

-greg
[1] The details of the canonicalization algorithm, including the contents
of the atom invariants, are described here:
http://dx.doi.org/10.1021/acs.jcim.5b00543


On Tue, Aug 1, 2017 at 2:53 PM, Pavel Polishchuk <pavel_polishc...@ukr.net>
wrote:

> Hi all,
>
>   canonicalization of fragment SMILES does not work properly. Below there
> are two examples of identical fragments. The only difference is the order
> of atoms (indices). However, it seems that RDKit canonicalization does not
> take into account atom types.
>
>   Does someone have an idea how to solve this issue with small losses?
>
> #1 ===========
>
> from rdkit import Chem
> from rdkit.Chem import Atom, RWMol
>
> m = RWMol()
>
> for i in range(3):
>     a = Atom(6)
>     m.AddAtom(a)
> a = Atom(0)
> m.AddAtom(a)
>
> m.GetAtomWithIdx(0).SetIsAromatic(True)  # set atom 0 as aromatic
> m.GetAtomWithIdx(3).SetAtomMapNum(1)
>
>
> m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
> m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
> m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)
>
> Chem.MolToSmiles(m)
>
> OUTPUT: 'cC(C)[*:1]'
>
> #2 ===========
>
> m2 = RWMol()
>
> for i in range(3):
>     a = Atom(6)
>     m2.AddAtom(a)
> a = Atom(0)
> m2.AddAtom(a)
>
> m2.GetAtomWithIdx(2).SetIsAromatic(True) # set atom 2 as aromatic
> m2.GetAtomWithIdx(3).SetAtomMapNum(1)
>
>
> m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
> m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
> m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)
>
> Chem.MolToSmiles(m2)
>
> OUTPUT: 'CC(c)[*:1]'
>
>
> Pavel.
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
