Hi Greg

Thanks for looking into this. I think, but of course cannot prove, that Rogers 
chose to include only chirality that can be disambiguated within the fragment 
itself, so that the fingerprints describe a real sub-fragment of the molecule, 
independent of any information outside its radius. If such a fragment, even if 
derived from a chiral molecule, is achiral, how can the chirality information 
be set in a way that ensures consistency and alignment independence? In your 
current implementation, how does the chirality information get set when the 
substituents cannot be disambiguated within the Morgan radius?


As for molecules that are truly different but cannot be distinguished by 
Morgan fingerprints: that effect kicks in at a certain alkyl chain length 
anyway. From CCCCCCO onwards, the chain homologues can no longer be 
distinguished by Morgan-2 (without counts, that is), so I do not find it 
surprising that side chains extending beyond the radius are not distinguished 
in fragments either. The answer is that you sometimes need to increase the 
radius in order to disambiguate longer repeats, just as in genomic sequence 
assembly, where longer reads are needed to assemble repeat-rich genomes.
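
The homologue effect described above can be illustrated with a small toy 
model. This is only a sketch under stated assumptions: it is not RDKit's or 
Pipeline Pilot's code, the atom invariant is reduced to element plus 
heavy-atom degree, identifiers are plain strings instead of hashes, and the 
function names chain_environments and alkanol are hypothetical.

```python
def chain_environments(symbols, radius=2):
    """Set of environment identifiers (no counts) for an unbranched chain.

    Toy model: an atom starts as element + heavy-atom degree; each round
    appends the sorted previous-round identifiers of its neighbours.
    """
    n = len(symbols)
    neighbours = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
    ids = [f"{sym}{len(neighbours[i])}" for i, sym in enumerate(symbols)]
    seen = set(ids)
    for _ in range(radius):
        ids = [
            f"{ids[i]}({','.join(sorted(ids[j] for j in neighbours[i]))})"
            for i in range(n)
        ]
        seen.update(ids)
    return seen


def alkanol(n_carbons):
    """Atom symbols of a linear alkan-1-ol, e.g. 6 -> CCCCCCO."""
    return ["C"] * n_carbons + ["O"]


pentanol, hexanol, heptanol = (chain_environments(alkanol(n)) for n in (5, 6, 7))
print(pentanol == hexanol)  # False: CCCCCO lacks the all-CH2 radius-2 environment
print(hexanol == heptanol)  # True: same set of environments from CCCCCCO on
```

In this model, keeping counts instead of a set would separate the homologues 
again, because the all-CH2 environment occurs once in CCCCCCO and twice in 
CCCCCCCO.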

I agree with your idea to make the original implementation available via a 
flag rather than changing the default, even if only for inter-version 
compatibility reasons.

Best regards

Ansgar

Ansgar Schuffenhauer
Senior Investigator I
T +41 79 608 9063
ansgar.schuffenha...@novartis.com

Novartis Pharma AG
NIBR

From: Greg Landrum <greg.land...@gmail.com>
Sent: Monday, 2 December 2019 10:25
To: Schuffenhauer, Ansgar <ansgar.schuffenha...@novartis.com>
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] FW: rdkit Chiral Moragn Fingerprint unexpected 
behaviour

This is a really good question.

I must admit that I find the ECFP behavior as published to be somewhat weird.
It doesn't make sense to me that the chiral versions of the Morgan-2 
fingerprints for CCC[CH](C)CCO, CCC[C@@H](C)CCO, and CCC[C@H](C)CCO would be 
identical.

However, as you point out, we have tried to reproduce the details of the 
published algorithm and the way chirality is being handled currently does not 
do that. I don't think "fixing" the current behavior would be a great idea, but 
it would make sense to add an additional option to use the original chirality 
rules (along with some documentation explaining them). Here's the github issue: 
https://github.com/rdkit/rdkit/issues/2818

I didn't notice this discrepancy when I did the original comparison of 
similarities between RDKit's Morgan fingerprints and Pipeline Pilot's ECFP 
implementation many years ago because I ran both of them without chirality 
being turned on.

Thanks for pointing this out Ansgar!
-greg




On Mon, Nov 25, 2019 at 1:09 PM Schuffenhauer, Ansgar 
<ansgar.schuffenha...@novartis.com> wrote:
Dear all

I have observed some unexpected behaviour with the chiral version of the Morgan 
fingerprints in RDKit.

When reading the Rogers paper (http://doi.org/10.1021/ci100050t) I find:
“If the atom is a possible stereoatom but is not yet disambiguated, and all 
attachment atoms have different identifiers, then the atom is marked as 
disambiguated, and a stereochemical flag is appended to the array, depending on 
the marked stereochemistry. (Step 4 is only performed if stereochemical 
fingerprints are requested.)”

In this respect I believe that the RDKit implementation does not exactly follow 
the ECFP paper.
As a test I calculated the pairwise similarity between the enantiomers of 
butan-2-ol, hexan-3-ol, octan-4-ol, decan-5-ol, ...
Eventually both alkyl chains should grow too long to be disambiguated within 
the fingerprint radius; the chirality of the chiral center should therefore no 
longer be recognised, and the similarity between the enantiomers' fingerprints 
should become 1 once the chains outgrow the fingerprint radius.
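
The expectation above can be made concrete with a toy model of the quoted 
step. This is a sketch of one reading of that step, not Pipeline Pilot's or 
RDKit's code. Assumptions: the stereo flag is set in the first iteration in 
which all heavy-atom neighbours of the carbinol carbon carry different 
identifiers from the previous iteration (the implicit hydrogen is always 
distinct), atom invariants are reduced to element plus heavy-atom degree, and 
the function name first_stereo_iteration is hypothetical.

```python
def first_stereo_iteration(len_a, len_b, max_radius=6):
    """Iteration at which the carbinol carbon of a secondary alcohol with two
    unbranched alkyl chains (len_a and len_b carbons) would get its stereo
    flag in this toy model; None if that never happens within max_radius."""
    # Heavy-atom graph: atom 0 is the stereocentre, atom 1 the hydroxyl oxygen.
    symbols = ["C", "O"]
    neighbours = [[1], [0]]
    for length in (len_a, len_b):
        previous = 0
        for _ in range(length):
            symbols.append("C")
            neighbours.append([previous])
            neighbours[previous].append(len(symbols) - 1)
            previous = len(symbols) - 1
    # Iteration-0 identifiers: element plus heavy-atom degree.
    ids = [f"{s}{len(nb)}" for s, nb in zip(symbols, neighbours)]
    for iteration in range(1, max_radius + 1):
        # Quoted rule: flag the atom once all attachments differ.
        attached = [ids[j] for j in neighbours[0]]
        if len(set(attached)) == len(attached):
            return iteration
        # Otherwise grow every environment by one bond and try again.
        ids = [
            f"{ids[i]}({','.join(sorted(ids[j] for j in neighbours[i]))})"
            for i in range(len(symbols))
        ]
    return None


# butan-2-ol, hexan-3-ol, octan-4-ol, decan-5-ol
print([first_stereo_iteration(a, a + 1) for a in (1, 2, 3, 4)])  # [1, 2, 3, 4]
```

Iteration k corresponds to ECFP diameter 2k, so under this reading the 
enantiomers of octan-4-ol would only start to differ at diameter 6 and those 
of decan-5-ol at diameter 8.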

Strangely, that doesn’t happen: as can be seen in the attached notebook, all 
fingerprints with radius > 0 always give similarities < 1.0 for the 
enantiomer pairs.
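
For readers without the attachment, a comparison of this kind could look 
roughly as follows. This is a minimal sketch and not the attached notebook: 
the SMILES strings, the 2048-bit folding, the radius range and the names 
ENANTIOMER_PAIRS and enantiomer_similarity are assumptions made for 
illustration. It uses the legacy AllChem.GetMorganFingerprintAsBitVect call 
with useChirality=True, which recent RDKit releases may report as deprecated 
in favour of rdFingerprintGenerator.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Enantiomer pairs of the homologous series; which SMILES is (R) and which
# is (S) does not matter for the comparison.
ENANTIOMER_PAIRS = {
    "butan-2-ol": ("C[C@H](O)CC", "C[C@@H](O)CC"),
    "hexan-3-ol": ("CC[C@H](O)CCC", "CC[C@@H](O)CCC"),
    "octan-4-ol": ("CCC[C@H](O)CCCC", "CCC[C@@H](O)CCCC"),
    "decan-5-ol": ("CCCC[C@H](O)CCCCC", "CCCC[C@@H](O)CCCCC"),
}


def enantiomer_similarity(smiles_a, smiles_b, radius, n_bits=2048):
    """Tanimoto similarity of chirality-aware Morgan bit fingerprints."""
    fps = [
        AllChem.GetMorganFingerprintAsBitVect(
            Chem.MolFromSmiles(smi), radius, nBits=n_bits, useChirality=True
        )
        for smi in (smiles_a, smiles_b)
    ]
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])


# One row per alcohol, one similarity value per radius 0..4.
for name, (smi_a, smi_b) in ENANTIOMER_PAIRS.items():
    sims = [enantiomer_similarity(smi_a, smi_b, radius) for radius in range(5)]
    print(name, [round(s, 3) for s in sims])
```

If the fingerprints followed the published rule, the rows for the longer 
alcohols would be expected to stay at 1.0 up to larger radii than the row for 
butan-2-ol.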

This contrasts with the Pipeline Pilot implementation, where the similarity of 
the enantiomers indeed becomes 1.0 once the chains outgrow the fingerprint 
radius. For your reference I have also added the fingerprints and similarity 
values obtained at different ECFP diameters.

Is this difference in behaviour intentional? Until now I had always assumed 
that RDKit Morgan and Pipeline Pilot ECFP would give identical similarity 
results.


With best regards


Ansgar Schuffenhauer

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss