That does seem like a bug. You can also see it without involving
DeleteSubstructs, by starting from different SMILES representations of the
same molecule:
>>> from rdkit import Chem
>>> m1 = Chem.MolFromSmiles('FC12C3CCCC1C32F')
>>> m2 = Chem.MolFromSmiles('C12C3CCCC1C32')
>>> m3 = Chem.MolFromSmiles('C1CC2C3C(C1)C23')
>>> Chem.MolToSmiles(m2) == Chem.MolToSmiles(m3)
True
>>> m1.GetSubstructMatch(m2)
(1, 2, 3, 4, 5, 6, 7)
>>> m1.GetSubstructMatch(m3)
()
Note that if you parse the problematic SMILES as a SMARTS, you do get a match:
>>> m4 = Chem.MolFromSmarts('C1CC2C3C(C1)C23')
>>> m1.GetSubstructMatch(m4)
(4, 3, 2, 1, 6, 5, 7)
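As a possible stopgap, the SMARTS behaviour above can be wrapped in a small helper. This is only a sketch: `match_as_smarts` is a hypothetical name, not an RDKit function, and SMILES and SMARTS semantics differ in general (for example, a bracket atom such as `[NH]` becomes an explicit hydrogen-count constraint), so it is probably only reasonable for simple cores like this one.

```python
from rdkit import Chem


def match_as_smarts(mol, core_smiles):
    """Match core_smiles against mol, parsing the string as SMARTS.

    Hypothetical helper, not part of RDKit. Returns a tuple of atom
    indices in mol, or an empty tuple if there is no match.
    """
    query = Chem.MolFromSmarts(core_smiles)
    if query is None:
        raise ValueError(f"could not parse {core_smiles!r} as SMARTS")
    return mol.GetSubstructMatch(query)


m1 = Chem.MolFromSmiles('FC12C3CCCC1C32F')
# Both spellings of the stripped core, parsed as SMARTS rather than SMILES
for core in ('C12C3CCCC1C32', 'C1CC2C3C(C1)C23'):
    print(core, match_as_smarts(m1, core))
```

Since the query is built with MolFromSmarts, the result should not depend on which SMILES spelling of the core you start from.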
Another interesting bit is that while the InChIs of m2 and m3 are also the
same, the conversion produces a warning about stereochemistry:
>>> Chem.MolToInchi(m2) == Chem.MolToInchi(m3)
[18:26:48] WARNING: Omitted undefined stereo
[18:26:48] WARNING: Omitted undefined stereo
True
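The warning suggests that the InChI code sees stereocenters that the SMILES leaves unspecified. One way to check what RDKit itself perceives is to list the unassigned centers for both parses. A minimal sketch follows; `potential_centers` is a hypothetical helper name, and I am not assuming any particular output here, since it may vary between RDKit versions:

```python
from rdkit import Chem


def potential_centers(smiles):
    """List (atom index, label) pairs for stereocenters in a SMILES.

    Hypothetical helper, not part of RDKit. With includeUnassigned=True,
    centers without assigned stereochemistry are reported with '?'.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse {smiles!r} as SMILES")
    return Chem.FindMolChiralCenters(mol, includeUnassigned=True)


# Compare the two spellings of the stripped core
for smi in ('C12C3CCCC1C32', 'C1CC2C3C(C1)C23'):
    print(smi, potential_centers(smi))
```

If the two parses report different centers, that might be a hint about where they diverge, though it would not by itself explain the failed match.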
Ivan
On Wed, Nov 3, 2021 at 3:59 PM Ling Chan <[email protected]> wrote:
> Dear colleagues,
>
> I have a molecule "FC12C3CCCC1C32F". I stripped it of the F's, and tried
> to do a GetSubstructMatch. It worked. But if I reconstruct the stripped
> molecule from a smiles string, it does not. Please see attached.
>
> I suppose some info is lost when you reconstruct the stripped core from a
> smiles string. But still, I would think it should match anyway.
>
> Another issue is that the 2D depiction has the left most carbons lying
> exactly on top of each other, creating a false impression. A better
> depiction would be like the second attached image. (Not sure if this is
> easy to fix though.)
>
> Thank you for your attention.
>
> Ling
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>