Pavel, this isn't exactly trivial, so I went ahead and made an example.  The
basic issue is that atom maps are part of the canonicalization, i.e. their
values are used when generating the SMILES.

To solve this problem:
1) back up the atom maps and remove them
2) canonicalize *without* atom maps, but record the order in which the
atoms in the molecule are output
3) using that atom output order, relabel the atom maps from 1...N.

That's a mouthful, but here's some code that should do the trick:

from rdkit import Chem

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
       "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
       "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
       "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
       "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
       "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]


def CanonicalizeMaps(m, *a, **kw):
    # atom maps are canonicalized, so rename them
    #  figure out where they would have gone
    #  and relabel from 1...N based on output order
    atomMap = "molAtomMapNumber"
    backupAtomMap = "oldMolAtomMapNumber"

    for atom in m.GetAtoms():
        if atom.HasProp(atomMap):
            atomNum = atom.GetProp(atomMap)
            atom.SetProp(backupAtomMap, atomNum)
            atom.ClearProp(atomMap)

    # canonicalize
    smi = Chem.MolToSmiles(m, *a, **kw)
    # where did the atoms end up in the output string?
    atoms = [(pos, atom_idx) for atom_idx, pos in enumerate(
        eval(m.GetProp("_smilesAtomOutputOrder")))]
    atommap = 1
    atoms.sort()

    # set the new atommap based on output position
    for pos, atom_idx in atoms:
        atom = m.GetAtomWithIdx(atom_idx)
        if atom.HasProp(backupAtomMap):
            atom.SetProp(atomMap, str(atommap))
            atommap += 1

    # re-canonicalize with the new maps, using the same options as before
    return Chem.MolToSmiles(m, *a, **kw)

for s in smi:
    m = Chem.MolFromSmiles(s)
    print(CanonicalizeMaps(m, True))



Output:

S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
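
As a rough illustration of step 3 on its own, here is a small sketch that
needs no RDKit at all.  It assumes the output order is a list in which entry
i is the index of the i-th atom written to the SMILES (which is how I read
the "_smilesAtomOutputOrder" property); the function name and the toy
5-atom example are made up for illustration.

```python
def relabel_by_output_order(output_order, mapped_atoms):
    """Assign map numbers 1..N to mapped atoms in the order they were written.

    output_order: list of atom indices; entry i is assumed to be the index
                  of the i-th atom written to the SMILES.
    mapped_atoms: set of atom indices that carried a map number before the
                  maps were backed up and removed.
    Returns a dict {atom index: new map number}.
    """
    new_maps = {}
    next_map = 1
    for atom_idx in output_order:
        if atom_idx in mapped_atoms:
            new_maps[atom_idx] = next_map
            next_map += 1
    return new_maps


# Hypothetical molecule: atoms 1 and 4 were mapped, and the writer emitted
# the atoms in the order 4, 0, 2, 1, 3.
print(relabel_by_output_order([4, 0, 2, 1, 3], {1, 4}))
```

Atom 4 is written first, so it gets map number 1, and atom 1 gets map
number 2, whatever numbers they carried originally.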

Now, if you want the atomMaps in 1...2...3 output order, we could do that
as well, but it is even trickier.
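
One cheap way to approximate this is to post-process the final string rather
than the molecule.  The sketch below is plain string manipulation, not an
RDKit feature: it renumbers the ":n]" map labels in order of appearance.  The
helper name is hypothetical, and the result is not guaranteed to be what
MolToSmiles would write if you parsed the relabelled string again, since the
new map numbers can change the canonical atom order.

```python
import re


def renumber_maps_in_string(smiles):
    """Renumber atom-map labels 1..N in order of appearance in the string."""
    mapping = {}

    def repl(match):
        old = match.group(1)
        # first time we see a map number, give it the next free label
        if old not in mapping:
            mapping[old] = str(len(mapping) + 1)
        return ":" + mapping[old] + "]"

    # atom maps appear as ":<digits>]" at the end of a bracket atom
    return re.sub(r":(\d+)\]", repl, smiles)


print(renumber_maps_in_string("S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]"))
# S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
```

Because all six strings above are identical before this step, they stay
identical after it.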

Enjoy,
 Brian

On Sat, May 27, 2017 at 8:36 AM, Pavel Polishchuk <pavel_polishc...@ukr.net>
wrote:

> Hi,
>
>   I cannot solve an issue and would like to ask for advice.
>   If the attachment points of the same fragment carry different map
> numbers, different canonical SMILES are generated.
>   I observed such behavior only for fragments with 3 attachment points.
> Below is an example.
>   I'm looking for a solution/workaround to produce the "same" SMILES
> strings irrespective of the mapping, so that after removal of the map
> numbers the SMILES become identical.
>   Any advice would be appreciated.
>
> smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
>        "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
>        "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
>        "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
>        "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
>        "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]
>
> for s in smi:
>     print(Chem.MolToSmiles(Chem.MolFromSmiles(s)))
>
> output:
> S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
> S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
> S=c1c([*:1])c([*:3])[nH]c(Cl)c1[*:2]
> S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
> S=c1c([*:1])c([*:2])[nH]c(Cl)c1[*:3]
> S=c1c([*:2])c([*:1])[nH]c(Cl)c1[*:3]
>
> Kind regards,
> Pavel.
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>