Hi

Surely the problem is that some of these SMILES aren't really valid. From
the Daylight theory manual: '*The bonds are numbered in any order,
designating ring opening (or ring closure) bonds by a digit immediately
following the atomic symbol at each ring closure*' (my emphasis).

So the behaviour for SMILES in which a branch atom sits between the atom
that owns the ring closure and its ring closure digit (e.g.
[C@@](F)1(C)CCO1) may not be well defined. Arguably RDKit should refuse to
process these, but apparently it looks at the atom order and inverts the
stereochemistry instead. In Daylight SMILES the @ symbol refers to the
order of substituents around the asymmetric atom. If we swap the ring
closure digit and one of the atoms then we've changed the order of
connections and inverted the stereochemistry, so the current behaviour
seems reasonable. Personally I wouldn't change the behaviour; at most I
would get RDKit to issue a warning that the SMILES isn't 'strict' in these
cases.
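
To make the ordering argument concrete, here is a minimal sketch in plain
Python (no RDKit needed). The neighbour labels and the function name
permutation_parity are my own illustrative choices, and the 'digit first'
ordering is only one possible reading that would be consistent with the
reported output, not a statement about RDKit's internals. It simply checks
whether two neighbour orderings around the stereo centre differ by an even
or an odd permutation; an odd permutation with the same @/@@ mark means
the opposite configuration.

```python
def permutation_parity(reference, other):
    """Return +1 if `other` is an even permutation of `reference`, -1 if odd."""
    if len(set(reference)) != len(reference) or sorted(reference) != sorted(other):
        raise ValueError("orderings must list the same distinct neighbours")
    index = {label: i for i, label in enumerate(reference)}
    perm = [index[label] for label in other]
    parity = 1
    seen = [False] * len(perm)
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk one cycle of the permutation; a cycle of even length is odd.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:
            parity = -parity
    return parity


# Neighbour order for F[C@@]1(C)CCO1: preceding F, ring closure to O, CH3, CH2.
strict_order = ["F", "O", "CH3", "CH2"]
# [C@@](F)1(C)CCO1 read purely in textual order gives the same sequence.
textual_order = ["F", "O", "CH3", "CH2"]
# The same string read as if the digit came first, i.e. [C@@]1(F)(C)CCO1.
digit_first_order = ["O", "F", "CH3", "CH2"]

print(permutation_parity(strict_order, textual_order))      # same handedness
print(permutation_parity(strict_order, digit_first_order))  # inverted
```

Under the textual reading the two SMILES describe the same centre; under
the digit-first reading they differ by a single swap, which is the
inversion described above.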

I think the safest approach is to stick to SMILES which are unequivocally
valid, unless RDKit is going to create its own definition of SMILES...
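
If anyone wants to screen their input for the questionable form before it
reaches a toolkit, a rough sketch follows. It is a plain regular-expression
heuristic of my own (the function name has_ring_closure_after_branch is
hypothetical, not an RDKit call) and not a SMILES parser: it only flags a
ring closure digit, optionally preceded by a bond symbol, that comes
directly after a closing parenthesis.

```python
import re

# A ring-closure label (single digit or %nn), optionally preceded by a bond
# symbol, appearing right after ')' means a branch was written between the
# atom and its ring closure digit.
_RING_DIGIT_AFTER_BRANCH = re.compile(r"\)[-=#$:/\\]?(?:%\d{2}|\d)")


def has_ring_closure_after_branch(smiles):
    """Return True if a ring closure digit directly follows a branch."""
    return _RING_DIGIT_AFTER_BRANCH.search(smiles) is not None


for smi in ("F[C@@]1(C)CCO1", "[C@@](F)1(C)CCO1", "F[C@@](C)1CCO1"):
    label = "non-strict" if has_ring_closure_after_branch(smi) else "ok"
    print(smi, label)
```

Anything flagged this way could be rejected or rewritten by hand before
being canonicalised, which avoids relying on how a given toolkit treats
the ambiguous ordering.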


Best regards,
Chris Earnshaw

On 9 November 2017 at 07:13, Greg Landrum <greg.land...@gmail.com> wrote:

>
> On Thu, Nov 9, 2017 at 6:32 AM, Brian Cole <col...@gmail.com> wrote:
>
>> Hi Cheminformaticians,
>>
>> This is an extreme subtlety in the interpretation of SMILES atom
>> stereochemistry and I think a bug in RDKit. Specifically, I think the
>> following SMILES should be the same molecule:
>>
>> >>> rdkit.__version__
>> '2017.09.1'
>> >>> Chem.CanonSmiles('F[C@@]1(C)CCO1')
>> 'C[C@]1(F)CCO1'
>> >>> Chem.CanonSmiles('[C@@](F)1(C)CCO1')
>> 'C[C@@]1(F)CCO1'
>>
>
> As was discussed in the comments of
> https://github.com/rdkit/rdkit/issues/786, I think it's pretty gross that
> the second syntax is even legal. But that's a side point.
>
>> Since there is no hydrogen inside the stereo carbon atom block the bond
>> being 'looked down' should be the first atom encountered. In both cases
>> above, that should be the fluorine, therefore the molecules should be
>> equivalent.
>>
>
> Agreed, and this is a view that's further supported by this behavior:
>
> In [2]: Chem.CanonSmiles('F[C@@]1(C)CCO1')
> Out[2]: 'C[C@]1(F)CCO1'
>
> In [3]: Chem.CanonSmiles('F[C@@](C)1CCO1')
> Out[3]: 'C[C@@]1(F)CCO1'
>
> Would you mind filing a bug for this and I'll try to track it down/fix it?
>
> Thanks,
> -greg
>
>
>
>>
>> Though it could be argued the 2nd one is not strict SMILES as Andrew
>> describes here: https://github.com/rdkit/rdkit/issues/786
>>
>> It is useful when recombining fragments with ring closure digits for
>> these to be equivalent:
>> [*][C@]1(C)CCO1
>> [C@]([*])1(C)CCO1
>>
>> Also, every other tool I can get my hands on agrees they're the same:
>> OEChem, OpenBabel, indigo, and ChemAxon. (CDK lacks a simple enough
>> canonicalization example for me to work from.)
>>
>> Sure wish there was a SMILES validation test suite we could all run
>> against. So I'm attaching the examples I used to verify the above, so
>> that whatever poor soul is assigned that task later can find this on
>> Google. (I'm hopeful :-)
>>
>> Thanks,
>> Brian
>>
>> PS: the current output from the script:
>>
>> $ python stereo_handling_first_atom.py
>> RDKit = 2017.09.1
>> OEChem = 2.1.2
>> OpenBabel = 2.4.1
>> indigo = 1.2.3.r0-g98188eb mac10.7
>> RDKit failed to recognize these as the same:
>> [*:1][C@]1([*:2])CC1(Cl)Cl -> ClC1(Cl)C[C@]1([*:1])[*:2]
>> [C@]([*:1])1([*:2])CC1(Cl)Cl -> ClC1(Cl)C[C@@]1([*:1])[*:2]
>> OpenBabel failed to recognize these as the same:
>> Cl[S@](C)=O -> C[S@](=O)Cl
>> [S@](Cl)(C)=O -> C[S@@](=O)Cl
>> Indigo failed to recognize these as the same:
>> Cl[S@](C)=O -> C[S@](=O)Cl
>> [S@](Cl)(C)=O -> C[S@@](=O)Cl
>> OpenBabel failed to recognize these as the same:
>> Cl[S@](C)=CCCC -> CCCC=[S@](Cl)C
>> [S@](Cl)(C)=CCCC -> CCCC=[S@@](Cl)C
>> Indigo failed to recognize these as the same:
>> Cl[S@](C)=CCCC -> CCCC=[S@@](C)Cl
>> [S@](Cl)(C)=CCCC -> CCCC=[S@](C)Cl
>> RDKit failed to recognize these as the same:
>> Cl[C@](F)1CC[C@H](F)CC1 -> F[C@H]1CC[C@](F)(Cl)CC1
>> [C@](Cl)(F)1CC[C@H](F)CC1 -> F[C@H]1CC[C@@](F)(Cl)CC1
>> RDKit failed to recognize these as the same:
>> Cl[C@]1(c2ccccc2)NCCCS1 -> Cl[C@]1(c2ccccc2)NCCCS1
>> [C@](Cl)1(c2ccccc2)NCCCS1 -> Cl[C@@]1(c2ccccc2)NCCCS1
>> RDKit failed to recognize these as the same:
>> Cl3.[C@]31(c2ccccc2)NCCCS1 -> Cl[C@]1(c2ccccc2)NCCCS1
>> [C@](Cl)1(c2ccccc2)NCCCS1 -> Cl[C@@]1(c2ccccc2)NCCCS1
>> RDKit failed to recognize these as the same:
>> Cl[C@](F)1C2C(C1)CNC2 -> F[C@@]1(Cl)CC2CNCC21
>> [C@](Cl)(F)1C2C(C1)CNC2 -> F[C@]1(Cl)CC2CNCC21
>> RDKit failed to recognize these as the same:
>> [*][C@@H]1CO1 -> [*][C@@H]1CO1
>> [C@H]([*])1CO1 -> [*][C@H]1CO1
>> RDKit failed to recognize these as the same:
>> [*][C@@]1(C)CCO1 -> [*][C@@]1(C)CCO1
>> [C@@]([*])1(C)CCO1 -> [*][C@]1(C)CCO1
>> RDKit failed to recognize these as the same:
>> F[C@@]1(C)CCO1 -> C[C@]1(F)CCO1
>> [C@@](F)1(C)CCO1 -> C[C@@]1(F)CCO1
>> RDKit failed to recognize these as the same:
>> Cl[C@@H]1[C@@H](Cl)C(Cl)CCN1 -> ClC1CCN[C@H](Cl)[C@H]1Cl
>> [C@H](Cl)1[C@@H](Cl)C(Cl)CCN1 -> ClC1CCN[C@@H](Cl)[C@H]1Cl
>>
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>