Yes, if you don't care about stereochemistry, you need to call
Chem.RemoveStereochemistry() before your comparison. But if you do care
about stereochemistry, "CC=CC(C)(O)CF" and "C/C=C/[C@](C)(O)CF" are indeed
different molecular specifications.
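
A minimal sketch of that comparison, using the two SMILES from this thread.
The helper name flat_smiles is made up for illustration and is not an RDKit
function; Chem.RemoveStereochemistry() modifies the molecule in place.

```python
from rdkit import Chem


def flat_smiles(smiles):
    """Canonical SMILES with all stereo annotations stripped (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    Chem.RemoveStereochemistry(mol)  # works in place, returns None
    return Chem.MolToSmiles(mol)


plain = "CC=CC(C)(O)CF"
stereo = "C/C=C/[C@](C)(O)CF"

# Canonical SMILES keep the stereo annotations, so these two differ.
same_with_stereo = (
    Chem.MolToSmiles(Chem.MolFromSmiles(plain))
    == Chem.MolToSmiles(Chem.MolFromSmiles(stereo))
)

# After stripping stereochemistry, both collapse to the same string.
same_flat = flat_smiles(plain) == flat_smiles(stereo)

print(same_with_stereo, same_flat)
```

If you only need the string and want to leave the molecule untouched,
Chem.MolToSmiles(mol, isomericSmiles=False) should give the same kind of
stereo-free output.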

As for fingerprints, I believe that most fingerprint schemes do not
distinguish between stereoisomers.
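
For Morgan fingerprints this is easy to check. The sketch below assumes a
reasonably recent RDKit that ships rdFingerprintGenerator; the two
enantiomers are my own toy example, not molecules from this thread. By
default the fingerprints should come out identical, and the includeChirality
option is the opt-in for telling them apart.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Two enantiomers of a toy molecule (illustrative only).
mol_a = Chem.MolFromSmiles("C[C@H](F)Cl")
mol_b = Chem.MolFromSmiles("C[C@@H](F)Cl")

default_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2)
chiral_gen = rdFingerprintGenerator.GetMorganGenerator(
    radius=2, includeChirality=True
)


def same_fp(gen, a, b):
    """True if the generator produces identical bit vectors for both molecules."""
    return gen.GetFingerprint(a).ToBitString() == gen.GetFingerprint(b).ToBitString()


same_by_default = same_fp(default_gen, mol_a, mol_b)
same_with_chirality = same_fp(chiral_gen, mol_a, mol_b)
print(same_by_default, same_with_chirality)
```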

Regarding SDFs, note that "officially" the stereochemistry has nothing to
do with the actual atomic coordinates. Two molecules with identical atomic
coordinates can have different stereochemistry specifications. One may have
unspecified chirality while the other is specified. And it is even possible
to have a geometry inconsistent with the stereochemistry specification. If
you invoke Chem.AssignStereochemistryFrom3D(), RDKit will assign chirality
to all centres using atomic coordinates, as the function name suggests. If
you don't, it seems to me that RDKit sometimes does not read in the
stereochemistry correctly. See my post "SDMolSupplier chirality" three
weeks ago.
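
Here is a rough sketch of the coordinate-based assignment. To keep it
self-contained I generate the 3D coordinates with AllChem.EmbedMolecule()
instead of reading an SDF, which is an assumption on my part; the toy
molecule and the random seed are arbitrary. The stereo annotation is thrown
away on purpose and then recovered from the coordinates alone.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Reference molecule with a specified stereocentre (toy example).
reference = Chem.MolFromSmiles("C[C@H](F)Cl")

# Build 3D coordinates; embedding needs explicit hydrogens.
mol3d = Chem.AddHs(reference)
status = AllChem.EmbedMolecule(mol3d, randomSeed=42)  # -1 would mean failure

# Drop the stereo annotation but keep the coordinates.
Chem.RemoveStereochemistry(mol3d)
before = Chem.MolToSmiles(Chem.RemoveHs(mol3d))

# Re-derive chirality from the conformer.
Chem.AssignStereochemistryFrom3D(mol3d)
after = Chem.MolToSmiles(Chem.RemoveHs(mol3d))

print(before)
print(after)
```

For a molecule read with SDMolSupplier the same call applies to the Mol you
get back, provided the file really carries 3D coordinates.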

On a related note, if you run a GetSubstructMatch with useChirality=True on
"CC=CC(C)(O)CF" and "C/C=C/[C@](C)(O)CF", one is a substructure of the
other, but not the other way round. (With the default settings the match
ignores stereochemistry.)
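
The asymmetry only shows up when the match is asked to honour
stereochemistry, since RDKit's substructure matching ignores it by default.
A small sketch with the two SMILES from this thread (the variable names are
mine):

```python
from rdkit import Chem

plain = Chem.MolFromSmiles("CC=CC(C)(O)CF")
stereo = Chem.MolFromSmiles("C/C=C/[C@](C)(O)CF")

# Default matching ignores stereochemistry, so each one matches the other.
default_plain_in_stereo = stereo.HasSubstructMatch(plain)
default_stereo_in_plain = plain.HasSubstructMatch(stereo)

# With useChirality=True the unspecified query still matches the specified
# target, but the stereo-specified query no longer matches the plain target.
chiral_plain_in_stereo = stereo.HasSubstructMatch(plain, useChirality=True)
chiral_stereo_in_plain = plain.HasSubstructMatch(stereo, useChirality=True)

print(default_plain_in_stereo, default_stereo_in_plain)
print(chiral_plain_in_stereo, chiral_stereo_in_plain)
```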

As for molecular equivalence, it all depends on what you want. You may also
like to take a look at InChI since it allows for comparisons on different
levels (e.g. stereochemistry or not).
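
As a sketch of what "different levels" can look like in practice: the first
block of an InChIKey hashes the connectivity, while the stereo layers go
into the second block. This assumes your RDKit build includes InChI support;
the variable names are only illustrative.

```python
from rdkit import Chem

plain = Chem.MolFromSmiles("CC=CC(C)(O)CF")
stereo = Chem.MolFromSmiles("C/C=C/[C@](C)(O)CF")

key_plain = Chem.MolToInchiKey(plain)
key_stereo = Chem.MolToInchiKey(stereo)

# First 14-character block: same skeleton, regardless of stereo annotation.
same_connectivity = key_plain.split("-")[0] == key_stereo.split("-")[0]

# Full key: differs because only one molecule carries stereo layers.
same_full_key = key_plain == key_stereo

print(same_connectivity, same_full_key)
```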

Ling

On Fri, Jan 13, 2023 at 8:03 AM Rocco Moretti <rmoretti...@gmail.com> wrote:

>> Just as an FYI: the best easy way, by far, to keep track of whether or not
>> you've seen a particular molecule is to use the SMILES.
>>
>
> Though as a caveat with SMILES, be aware of issues about partial chirality
> and E/Z isomerization specification. "CC=CC(C)(O)CF" is not the same SMILES
> as "C/C=C/[C@](C)(O)CF", even though they might refer to the "same"
> molecule for your purposes. RDKit canonical SMILES will faithfully render
> the stereochemistry information if available, but depending on how you're
> reading and/or processing things, you may or may not have that info
> properly annotated for the SMILES outputter to use. (Something as simple as
> generating 3D coordinates can potentially add that info in. But also, just
> because your SDF file has 3D coordinates doesn't necessarily guarantee that
> RDKit will completely annotate stereochemical info on the read-in Mol.)
>
> Take a look at `Chem.AssignStereochemistryFrom3D()`,
> `Chem.RemoveStereochemistry()` and
> `Chem.EnumerateStereoisomers.EnumerateStereoisomers()` if this is
> potentially going to be an issue for you.
>
> On Fri, Jan 13, 2023 at 1:41 AM Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Hi Eric,
>>
>> That would be due to the fix for this bug:
>> https://github.com/rdkit/rdkit/issues/5036
>> If you were generating the fingerprints on "normal" (i.e.
>> hydrogen-suppressed) graphs, you wouldn't notice this one, but the fact
>> that you add the Hs before generating the fingerprint causes you to notice
>> it.
>>
>> Just as an FYI: the best easy way, by far, to keep track of whether or
>> not you've seen a particular molecule is to use the SMILES.
>>
>> -greg
>>
>>
>> On Fri, Jan 13, 2023 at 6:27 AM Eric Jonas <jo...@ericjonas.com> wrote:
>>
>>> Hello! I use the CRC of Morgan fingerprints as a quick-and-dirty way to
>>> keep track of different molecules, but now I realize it might have been too
>>> quick and dirty! In particular, there appears to have been a change in the
>>> Morgan code sometime between 2021.09.2 and 2022.03.5. The following code
>>> produces different output under these versions:
>>>
>>> import zlib
>>>
>>> from rdkit import Chem
>>> from rdkit.Chem import rdMolDescriptors
>>>
>>> def get_morgan4_crc32(m):
>>>     # CRC32 of the serialized hashed Morgan fingerprint (radius 4)
>>>     mf = rdMolDescriptors.GetHashedMorganFingerprint(m, 4)
>>>     return zlib.crc32(mf.ToBinary())
>>>
>>> # Explicit Hs are added before fingerprinting
>>> mol = Chem.AddHs(Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1'))
>>> print(get_morgan4_crc32(mol))
>>>
>>> 2021.09.2 : 1567135676
>>> 2022.03.5 : 204854560
>>>
>>> I tried looking at the release notes but I didn't seem to see any
>>> breaking changes (I might have missed them!) and I tried looking at "blame"
>>> for the relevant source but didn't see any seemingly-substantive changes
>>> within the relevant timeframe.
>>>
>>> So am I doing something crazy here, or did something change
>>> deliberately, or is it possible this is a bug?
>>>
>>> ...E
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
