[Rdkit-discuss] job for opening at Novartis Basel

2018-04-11 Thread Schuffenhauer, Ansgar
Hi all

We have a job opening at Novartis in Basel for a "Computational Drug Hunter", 
where rdkit experience is a relevant skill:

You can find out more and apply under this link:


https://www.novartis.com/careers/career-search/job-details?jobID=238569BR

best regards

Ansgar

Ansgar Schuffenhauer
Senior Investigator I
T +41 79 608 9063
ansgar.schuffenha...@novartis.com

Novartis Pharma AG
NIBR

Novartis Campus
Virchow 16-4.249.09
4056 Basel
Switzerland

Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 141, Issue 16

2019-07-22 Thread Schuffenhauer, Ansgar
Dear all

For the standardizer module (Chem.MolStandardize), what is the best way to 
change some of the tautomerizer rules?
There is a data file, share/RDKit/Data/Molstandardize/tautomerTransforms.in, 
which I assume defines the defaults.

//  Name  SMARTS  Bonds  Charges
1,3 (thio)keto/enol f   [CX4!H0]-[C]=[O,S,Se,Te;X1]
1,3 (thio)keto/enol r   [O,S,Se,Te;X2!H0]-[C]=[C]
1,5 (thio)keto/enol f   [CX4,NX3;!H0]-[C]=[C][CH0]=[O,S,Se,Te;X1]
1,5 (thio)keto/enol r   [O,S,Se,Te;X2!H0]-[CH0]=[C]-[C]=[C,N]
...

Now my questions are:
1. What is the syntax of this file? What do the "f" and the "r" stand for? Do 
the SMARTS have to start with the atom carrying the mobile H?
2. How can I instruct rdkit to use a file supplied by the user instead of this 
default file?

The background to this question is that the SMARTS for keto/enol seem to be a 
bit too generic, as they also catch the alpha C-atoms of carboxylic acids and 
amides. Generating tautomers here leads to an epimerization of stereo-centers 
in alpha positions of carboxylic acids and amides. That appears odd to me, as 
such stereo-centers are quite stable (in contrast to those of "real" ketones 
and aldehydes).
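
To illustrate the point, here is a minimal sketch (not part of the RDKit distribution) that checks the forward 1,3 keto/enol SMARTS quoted above against a ketone, an amino acid and an amide. The second pattern, STRICT_KETO_ENOL, is a hypothetical variant made up for this example: it simply rejects carbonyl carbons that also carry an N or O substituent, and it is not claimed to be the right rule.

```python
from rdkit import Chem

# Forward 1,3 keto/enol pattern, copied from the file excerpt above.
DEFAULT_KETO_ENOL = Chem.MolFromSmarts("[CX4!H0]-[C]=[O,S,Se,Te;X1]")

# Hypothetical stricter variant (an assumption for illustration only):
# the carbonyl carbon must not be part of an acid, ester or amide group.
STRICT_KETO_ENOL = Chem.MolFromSmarts(
    "[CX4!H0]-[C;!$(C(=O)[N,O])]=[O,S,Se,Te;X1]"
)

EXAMPLES = {
    "butanone": "CCC(C)=O",
    "alanine": "C[C@H](N)C(=O)O",
    "acetamide": "CC(N)=O",
}


def matches(pattern, smiles):
    """Return True if the SMARTS pattern occurs in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return mol.HasSubstructMatch(pattern)


for name, smi in EXAMPLES.items():
    print(name, matches(DEFAULT_KETO_ENOL, smi), matches(STRICT_KETO_ENOL, smi))
```

The stock pattern matches all three molecules, including the alpha carbon of alanine; the stricter variant leaves only the ketone.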


Best regards

Ansgar

Ansgar Schuffenhauer
Senior Investigator I
T +41 79 608 9063
ansgar.schuffenha...@novartis.com

Novartis Pharma AG
NIBR





Re: [Rdkit-discuss] rdkit.MolStandardize tautomer

2019-07-22 Thread Schuffenhauer, Ansgar
Hi Greg

Thanks for your quick answer. What I am doing is essentially the following:

from rdkit.Chem import MolStandardize
my_standardizer = MolStandardize.standardize.Standardizer()
standard_tautomer = my_standardizer.tautomer_parent(input_mol)


I assume that at the stage where I construct my_standardizer there would be some 
opportunity to slip in alternative configuration info.

By the way, I think also that one of the two cases of vanishing 
stereo-chemistry reported in https://github.com/rdkit/rdkit/issues/2363
is caused by an overly eager keto/enol tautomerizer.

Best regards

Ansgar


Ansgar Schuffenhauer
Senior Investigator I
T +41 79 608 9063
ansgar.schuffenha...@novartis.com

Novartis Pharma AG
NIBR

From: Greg Landrum 
Sent: Monday, 22 July 2019 17:42
To: Schuffenhauer, Ansgar 
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 141, Issue 16

Hi Ansgar,

It is possible to specify the tautomer parameter file that is used, but in 
order for me to explain how, I need to know how you are currently using the 
code to enumerate tautomers (i.e. which function you are calling).

As for the format: it's tab-delimited and the first entry is the name. The 
"r/f" flag is an indicator of which direction the transform is going that is 
just there to make the name unique.
In the SMARTS the first atom is the one with the mobile H and the last atom is 
where it should be moved to.
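
The format described here can be read with a few lines of plain Python. This is a minimal sketch, assuming tab-separated fields in the order name, SMARTS, bonds, charges (as in the header line of the quoted file) and treating lines that start with // as comments. The names TautomerTransform, parse_transforms and SAMPLE are made up for this example and are not RDKit API.

```python
from typing import List, NamedTuple


class TautomerTransform(NamedTuple):
    name: str     # e.g. "1,3 (thio)keto/enol f"; the f/r suffix only keeps names unique
    smarts: str   # first atom carries the mobile H, last atom receives it
    bonds: str    # optional column, empty string when absent
    charges: str  # optional column, empty string when absent


def parse_transforms(text: str) -> List[TautomerTransform]:
    """Parse tab-delimited transform lines, skipping blanks and // comments."""
    transforms = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("//"):
            continue
        fields = line.split("\t")
        fields += [""] * (4 - len(fields))  # pad missing optional columns
        name, smarts, bonds, charges = (f.strip() for f in fields[:4])
        transforms.append(TautomerTransform(name, smarts, bonds, charges))
    return transforms


SAMPLE = (
    "//\tName\tSMARTS\tBonds\tCharges\n"
    "1,3 (thio)keto/enol f\t[CX4!H0]-[C]=[O,S,Se,Te;X1]\n"
    "1,3 (thio)keto/enol r\t[O,S,Se,Te;X2!H0]-[C]=[C]\n"
)

for t in parse_transforms(SAMPLE):
    print(t.name, "->", t.smarts)
```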

-greg





Re: [Rdkit-discuss] rdkit.MolStandardize tautomer

2019-07-24 Thread Schuffenhauer, Ansgar
Hi Greg

Very useful. Thank you. I’ll be able to deal with the SMARTS part myself.

But your answer has left me with another series of questions:

You are saying that I am still using the old MolVS code for tautomer 
processing. That explains why the execution speed is not yet quite at the 
level I have come to expect from rdkit (no complaint meant, you have just set 
the bar quite high with rdkit in general).
Are you saying that, with respect to tautomer standardization, there is no C++ 
port from Google Summer of Code? Or is there something that is not quite ready? 
Or is there tautomer code in the C++ and I am just not using it? In that case, 
what would be the right function to use? What is your recommendation when it 
comes to tautomer standardization?

For the sake of clarity: I am looking for a tautomer standardizer that 
produces a uniform, canonical tautomer. I am not asking for the 
physico-chemically correct, i.e. lowest-energy, one, as this is a task I 
assume no simple rule-based tautomer standardizer can perform.
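
A rule-based canonical tautomer of this kind can be requested as follows in RDKit releases newer than the one discussed here, which expose a C++ canonicalizer as rdMolStandardize.TautomerEnumerator. This is a minimal sketch under that assumption; the helper canonical_tautomer_smiles is made up for the example, and the result is a uniform representative picked by a scoring scheme, not necessarily the lowest-energy form.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Enumerator with default settings; assumes a release that ships this class.
enumerator = rdMolStandardize.TautomerEnumerator()


def canonical_tautomer_smiles(smiles):
    """Return the SMILES of the canonical tautomer chosen by the enumerator."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(enumerator.Canonicalize(mol))


# Keto and enol form of the same compound should map to one representative.
keto = canonical_tautomer_smiles("CC(C)=O")
enol = canonical_tautomer_smiles("CC(O)=C")
print(keto, enol)
```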

Is the C++ port everything that is in 
rdkit.Chem.MolStandardize.rdMolStandardize?

Best regards

Ansgar


Ansgar Schuffenhauer
Senior Investigator I
T +41 79 608 9063
ansgar.schuffenha...@novartis.com

Novartis Pharma AG
NIBR

From: Greg Landrum 
Sent: Tuesday, 23 July 2019 14:43
To: Schuffenhauer, Ansgar 
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: rdkit.MolStandardize tautomer

Hi Ansgar,

This is still using the MolVS tautomer-handling code since we didn't finish the 
canonicalization part during last year's Google Summer of Code.[1]
That means it's not using the parameter file that you found. The rules that are 
used are here:
https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/MolStandardize/tautomer.py

You can change those at runtime, but you need to be careful to properly 
re-import modules after doing so. Here's an example showing how to do that:
https://gist.github.com/greglandrum/4ac2b4e7f8c61e25836e106467aef150

I'm not going to claim that the SMARTS which I constructed to change the 1,3 
(thio)keto/enol rule is the right one, but it does at least show how to make the 
changes and reload the standardize module so that they take effect.

I hope this helps,
-greg
[1] and I haven't made it a priority because I dread the "no, that's not the 
right canonical tautomer" arguments that will ensue



[Rdkit-discuss] FW: rdkit Chiral Morgan Fingerprint unexpected behaviour

2019-11-25 Thread Schuffenhauer, Ansgar
Dear all

I have observed some unexpected behaviour with the chiral version of the Morgan 
Fingerprints in RDKit

When reading the Rogers paper (http://doi.org/10.1021/ci100050t ) I find:
"If the atom is a possible stereoatom but is not yet disambiguated, and all 
attachment atoms have different identifiers, then the atom is marked as 
disambiguated, and a stereochemical flag is appended to the array, depending on 
the marked stereochemistry. (Step 4 is only performed if stereochemical 
fingerprints are requested.)"

In this aspect I believe that the rdkit implementation does not follow exactly 
the ECFP paper.
As a test I calculated the pairwise similarity between the enantiomers of 
butan-2-ol, hexan-3-ol, octan-4-ol, decan-5-ol, ...
Eventually both alkyl chains should grow too long to be disambiguated 
within the fingerprint radius. The chirality of the chiral center should 
therefore no longer be recognised, and the similarity of the enantiomers' 
fingerprints should become equal to 1 once the chains outgrow the fingerprint 
radius.

Strangely, that doesn't happen. As can be seen in the attached notebook, all 
fingerprints with radius > 0 always give similarities < 1.0 for the 
enantiomer pairs.
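
The test described above can be reproduced roughly as follows. This is a sketch rather than the attached notebook: the helper names are made up, and it assumes an RDKit version that provides rdFingerprintGenerator.GetMorganGenerator with the includeChirality option.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator


def enantiomer_pair(n):
    """SMILES of both enantiomers in the series butan-2-ol (n=1),
    hexan-3-ol (n=2), octan-4-ol (n=3), decan-5-ol (n=4), ..."""
    long_chain, short_chain = "C" * (n + 1), "C" * n
    return (
        f"{long_chain}[C@H]({short_chain})O",
        f"{long_chain}[C@@H]({short_chain})O",
    )


def enantiomer_similarity(n, radius, chiral=True):
    """Tanimoto similarity of the Morgan bit vectors of the two enantiomers."""
    gen = rdFingerprintGenerator.GetMorganGenerator(
        radius=radius, includeChirality=chiral, fpSize=2048
    )
    fps = [gen.GetFingerprint(Chem.MolFromSmiles(s)) for s in enantiomer_pair(n)]
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])


# One row per alcohol, one column per radius 1..3.
for n in range(1, 5):
    row = [round(enantiomer_similarity(n, r), 3) for r in range(1, 4)]
    print(enantiomer_pair(n)[0], row)
```

Passing chiral=False gives the non-chiral fingerprints, which serve as a baseline for comparison.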

This contrasts with the Pipeline Pilot implementation, where the 
similarity of the enantiomers indeed becomes 1.0 once the chains outgrow the 
fingerprint radius. For your reference I have also added fingerprints and 
similarity values obtained at different ECFP diameters.

Is this difference in behaviour intentional? I always assumed so far that rdkit 
Morgan and Pipeline Pilot ECFP would give identical similarity results.


With best regards


Ansgar Schuffenhauer
Senior Investigator I
T +41 79 608 9063
ansgar.schuffenha...@novartis.com

Novartis Pharma AG
NIBR

Novartis Campus
Virchow 16-4.249.09
4056 Basel
Switzerland


Attachment: ChiralMorganTest.ipynb

Attached table (header only, truncated in the archive): columns n, smiles_a and 
smiles_b, followed for each of ECFP_0, ECFP_2, ECFP_4, ECFP_6 and ECFP_8 by the 
feature columns #S_B[i] and #S_A[i] of the two enantiomers; the columns headed 
0, 1, 2 and 3 between the blocks presumably hold the corresponding similarity 
values.

Re: [Rdkit-discuss] FW: rdkit Chiral Morgan Fingerprint unexpected behaviour

2019-12-02 Thread Schuffenhauer, Ansgar
Hi Greg

Thanks for looking into this. I think, but of course cannot prove, that the 
choice taken by Rogers was to include only chirality that can be 
disambiguated within the fragment itself, in order to ensure that the 
fingerprints describe a real sub-fragment of the molecule, independent of any 
information outside its radius. If such a fragment, even if derived from a 
chiral molecule, is achiral, how can the chirality information be set so as 
to ensure consistency and alignment independence? In your current 
implementation, how does the chirality information get set when the 
substituents cannot be disambiguated within the Morgan radius?


As for molecules that are truly different but cannot be distinguished by 
Morgan fingerprints: that effect kicks in at a certain alkyl chain length 
anyway. From CCO on, the chain homologues cannot be distinguished any more by 
Morgan-2 (without counts, that is), so not distinguishing side chains that lie 
outside the radius of a fragment is, I think, not surprising. The answer is 
that you sometimes need to increase the radius in order to disambiguate longer 
repeats, just as in genomic sequence assembly, where longer reads are needed to 
assemble repeat-rich genomes.

I agree with your idea to make the original implementation a flag rather than 
changing the default, even if only for inter-version compatibility reasons.

Best regards

Ansgar

Ansgar Schuffenhauer
Senior Investigator I
T +41 79 608 9063
ansgar.schuffenha...@novartis.com

Novartis Pharma AG
NIBR

From: Greg Landrum 
Sent: Monday, 2 December 2019 10:25
To: Schuffenhauer, Ansgar 
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] FW: rdkit Chiral Moragn Fingerprint unexpected 
behaviour

This is a really good question.

I must admit that I find the ECFP behavior as published to be somewhat weird.
It doesn't make sense to me that the chiral versions of the Morgan-2 
fingerprints for CCC[CH](C)CCO, CCC[C@@H](C)CCO, and CCC[C@H](C)CCO would be 
identical.

However, as you point out, we have tried to reproduce the details of the 
published algorithm and the way chirality is being handled currently does not 
do that. I don't think "fixing" the current behavior would be a great idea, but 
it would make sense to add an additional option to use the original chirality 
rules (along with some documentation explaining them). Here's the github issue: 
https://github.com/rdkit/rdkit/issues/2818

I didn't notice this discrepancy when I did the original comparison of 
similarities between RDKit's MorganFP and PPs ECFP implementation many years 
ago because I ran both of them without chirality being turned on.

Thanks for pointing this out Ansgar!
-greg





Re: [Rdkit-discuss] Synthetic Accessibility (SA) score

2020-04-01 Thread Schuffenhauer, Ansgar
Hi Ganesh

The chemical motifs of fragments do not simply add up. Linking two 
fragments creates new motifs at the linkage points that are present in 
neither fragment, and the circular substructures of the formerly terminal 
atoms disappear.
Depending on the type of fragmentation and assembly strategy you use, even 
stereocenters could be created by fragment linkage.


Best regards


Ansgar Schuffenhauer
Senior Investigator I
T +41 79 608 9063
ansgar.schuffenha...@novartis.com

Novartis Pharma AG
NIBR

-Original Message-
From: rdkit-discuss-requ...@lists.sourceforge.net 
 
Sent: Wednesday, 1 April 2020 15:15
To: rdkit-discuss@lists.sourceforge.net
Subject: Rdkit-discuss Digest, Vol 150, Issue 4



Today's Topics:

   1. Re: Synthetic Accessibility (SA) score (Alan Kerstjens Medina)


--

Message: 1
Date: Wed, 1 Apr 2020 10:58:56 +
From: Alan Kerstjens Medina 
To: Ganesh Shahane 
Cc: "rdkit-discuss@lists.sourceforge.net"

Subject: Re: [Rdkit-discuss] Synthetic Accessibility (SA) score

Hi Ganesh,

To delve a bit deeper into this, if I recall correctly, SA score is calculated 
based on both:

  1.  The prevalence of your molecule's chemical motifs in a virtual library of 
synthesizable compounds.
  2.  A set of logarithmic formulas that take as parameters molecular features 
associated with chemical complexity, like the ring complexity or number of 
stereocenters.

Consequently, like Axel and Nils have pointed out before me, the sum of 
fragment SA scores shouldn't be the same as that of a whole molecule. This is 
because you have more chemical motifs in your larger molecule (think of how 
fingerprints are generated) and because you can't sum logarithms in the same 
way you sum real numbers.
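
The point about logarithms can be seen with a small numeric sketch. It assumes that the stereocenter term has the form log10(n + 1), which is, as far as I recall, what the sascorer contrib script uses; the fragment stereocenter counts below are hypothetical.

```python
import math


def stereo_penalty(n_stereocenters: int) -> float:
    """Assumed form of the stereocenter complexity term: log10(n + 1)."""
    return math.log10(n_stereocenters + 1)


# Hypothetical stereocenter counts for fragments A, B and C.
fragments = [1, 1, 0]
whole = sum(fragments)  # ignores any stereocenter created by the linkage itself

sum_of_parts = sum(stereo_penalty(n) for n in fragments)
penalty_of_whole = stereo_penalty(whole)

print(f"sum of fragment penalties: {sum_of_parts:.3f}")
print(f"penalty of whole molecule: {penalty_of_whole:.3f}")
```

The two numbers differ even though the whole molecule carries exactly the stereocenters of its fragments, so no per-fragment sum can reproduce this term.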

You can find the details in the original publication:

Ertl, P., & Schuffenhauer, A. (2009). Estimation of synthetic accessibility 
score of drug-like molecules based on molecular complexity and fragment 
contributions. Journal of Cheminformatics, 1(1), 1-11. 
https://doi.org/10.1186/1758-2946-1-8
From: Nils Weskamp
Sent: 01 April 2020 12:48
To: Ganesh Shahane
Cc: 
rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Synthetic Accessibility (SA) score

Hi Ganesh,

I would like to challenge your premise. Why do you think that synthetic 
accessibility should add up like that?

Theoretically, I would expect that the combination of A,B and C to ABC will 
require some synthetic effort - so should be SA(A) + SA(B) + SA(C) < SA(ABC).

Technically, the combination of the three fragments will change the properties 
and environment of at least some atoms in the molecule, so that should have an 
influence on the result. I suspect it will be difficult to define a score with 
the desired properties without making use of the fragmentation scheme you are 
using.

Best regards,
Nils


On Wed, Apr 1, 2020 at 12:36 PM Ganesh Shahane 
<ganesh7shah...@gmail.com> wrote:
Hi Axel,

Thank you for your response.

Yes, I tried to implement the aforementioned script. It works very well on 
whole molecules.

However, I am trying to apply the script to fragments. For example, if I 
have fragments A, B and C that make up a whole molecule "ABC", then the sum 
of the SA scores of the fragments should be equal to the SA score of the whole 
molecule.

Right now, the summation doesn't add up.

I was wondering if there is a correction that needs to be made to the sum of SA 
scores such that it is equal to the SA score of the whole molecule.

--
Best,
Ganesh


On Mon, Mar 30, 2020 at 4:30 PM Axel Pahl <axelp...@gmx.de> wrote:
Hi Ganesh,

are you aware that the SA Score IS implemented in RDKit:

https://urldefense.proofpoint.c