Hi Ling,
If there are symmetries, then a substructure search like that will only give
you one of the possible mappings, and that might not be the canonical mapping.
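To illustrate the symmetry point, here is a minimal sketch that lists every way a molecule matches itself. The helper name all_self_mappings and the toluene example are my own choices for illustration; GetSubstructMatches with uniquify=False is the RDKit call that returns all atom-order variants of a match instead of one representative.

```python
from rdkit import Chem


def all_self_mappings(smiles):
    """Return every atom mapping of a molecule onto itself.

    Hypothetical helper for illustration only. Each mapping is a tuple
    whose i-th entry is the molecule atom matched by query atom i.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # uniquify=False keeps matches that cover the same atom set in a
    # different order, i.e. the symmetry-equivalent mappings.
    return mol.GetSubstructMatches(mol, uniquify=False)


# Toluene has a mirror plane through the methyl group, so more than one
# mapping is expected; GetSubstructMatch would return only one of them.
mappings = all_self_mappings("Cc1ccccc1")
for mapping in mappings:
    print(mapping)
```

With more than one mapping available, nothing guarantees that the single one returned by GetSubstructMatch is the one used when the canonical SMILES was written.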
What you're looking for is the special property _smilesAtomOutputOrder,
which is set on the molecule when you call MolToSmiles:
>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C")
>>> Chem.MolToSmiles(mol)
'COc1cc(CNC(=O)CCCC/C=C/C(C)C)ccc1O'
>>> mol.GetProp("_smilesAtomOutputOrder")
'[8,7,6,5,4,3,2,1,0,13,14,15,16,17,18,19,20,21,12,11,9,10,]'
Here are the atom indices of the original SMILES:
      ┌                 1 11  1111 1 1 1 2 2
 atoms│ 0 1 234 56 78 9 0 12  3456 7 8 9 0 1
      └ | | ||| || || | | ||  |||| | | | | |
SMILES[ O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C
You can see that the first atom of the output is a "C". The first entry in
_smilesAtomOutputOrder is 8, so that atom corresponds to atom 8 of the
original SMILES, which is the methyl carbon in "...(OC)...", and so on.
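As a minimal sketch of how you might turn that property into usable mappings: the helper name canonical_atom_mapping is hypothetical, and I am assuming the property string looks like the '[8,7,...,]' output above (the parsing tolerates the trailing comma, in case the exact format differs between RDKit versions).

```python
from rdkit import Chem


def canonical_atom_mapping(smiles):
    """Return (canonical_smiles, output_order, input_to_output).

    Hypothetical helper for illustration only.
    output_order[i] is the input atom index of the i-th atom written
    in the canonical SMILES; input_to_output is the inverse lookup.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # MolToSmiles stores _smilesAtomOutputOrder on the molecule as a
    # side effect, so it must be called before reading the property.
    canonical = Chem.MolToSmiles(mol)
    raw = mol.GetProp("_smilesAtomOutputOrder")
    # Assumed format: "[8,7,6,...,10,]"; empty tokens are skipped.
    output_order = [int(tok) for tok in raw.strip("[]").split(",") if tok.strip()]
    input_to_output = {old: new for new, old in enumerate(output_order)}
    return canonical, output_order, input_to_output


canonical, output_order, input_to_output = canonical_atom_mapping(
    "O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C"
)
print(canonical)
print(output_order)
# Position of input atom 0 (the carbonyl O) in the canonical SMILES:
print(input_to_output[0])
```

If you reparse the canonical SMILES, the new molecule's atoms come in the order they were written, so input_to_output tells you where each marked input atom ended up.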
Cheers,
Andrew
[email protected]
> On Nov 3, 2021, at 00:18, Ling Chan <[email protected]> wrote:
>
> O.K. Problem solved. Sorry about the spam, folks.
>
> I can use GetSubstructMatch, as follows.
>
> # sinput is the input smiles
> # scanon is the output smiles
>
> minput = Chem.MolFromSmiles(sinput)
> scanon=Chem.MolToSmiles(minput)
> mcanon=Chem.MolFromSmiles(scanon)
> map_forward = minput.GetSubstructMatch(mcanon)
> map_backward = mcanon.GetSubstructMatch(minput)
>
>
>
>
> On Tue, Nov 2, 2021 at 3:55 PM, Ling Chan <[email protected]> wrote:
> Dear colleagues,
>
> I am wondering whether I can obtain a mapping of the atom indices upon
> canonicalization by MolToSmiles. I am aware that canonicalization (and hence
> atom reordering) can be suppressed in MolToSmiles, but I do want to
> canonicalize the output SMILES.
>
> If you are interested, here are a few more details of my problem. For each
> molecule, I want to delete one or two side chains and obtain a SMILES of
> what is left. I also want to know which atoms were bonded to the
> deleted side chains. I know that things will work if I suppress
> canonicalization, but I would like to canonicalize the SMILES so that I can
> tell whether there are duplicates.
>
> I tried marking the atoms. But I believe that properties that get carried
> over to the output SMILES, e.g. Isotope, affect the canonicalization, while
> properties that do not affect canonicalization, e.g. IntProp, are lost upon
> the conversion to SMILES.
>
> Thank you for your insight.
>
> Ling
>
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss