Hi Ansgar,

On Wed, Jul 24, 2019 at 9:33 AM Schuffenhauer, Ansgar <
ansgar.schuffenha...@novartis.com> wrote:

>
>
> Very useful. Thank you. I’ll be able to deal with the SMARTS part myself.
>
>
Great.


> But your answer has left me with another series of questions:
>
>
>
> You are saying that I am still using the old MolVS code for tautomer
> processing. That explains why the execution speed is not yet quite at the
> level I have come to expect from RDKit (no complaint meant; you have just
> set the bar quite high with RDKit in general).
>

You say such nice things. :-)
I haven't spent enough time looking at the tautomer enumeration code to
know whether or not the poor performance you are seeing is something
inherent in the process or an implementation artifact that can be fixed.
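A back-of-the-envelope way to see why enumeration can be inherently costly (a general combinatorial point, not a profile of the MolVS code): if a molecule had n independent sites, each with two possible positions for its mobile H, every additional site doubles the number of combinations. Real sites are usually coupled, so treat this as an upper-bound intuition only. `enumerate_site_states` is a hypothetical helper.

```python
from itertools import product


def enumerate_site_states(n_sites: int, states_per_site: int = 2):
    """Yield every combination of per-site states (hypothetical helper).

    Each independent mobile-H site is modelled as one of `states_per_site`
    forms, so the number of combinations is states_per_site ** n_sites.
    """
    yield from product(range(states_per_site), repeat=n_sites)


# Count how many combinations an exhaustive enumeration would have to visit
for n in (1, 5, 10, 15):
    count = sum(1 for _ in enumerate_site_states(n))
    print(n, count)
```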


> Are you saying that there is actually no C++ port of the tautomer
> standardization from the Google Summer of Code? Or is there something
> that is not quite ready? Or is there tautomer code in C++ that I am just
> not using? In that case, what would be the right function to use?
> What is your recommendation when it comes to tautomer standardization?
>

Susan (who did the MolVS port) got a first version of the tautomer
enumeration done but did not finish the scoring code that is necessary to
get a "canonical" tautomer. Since the tautomer code in general wasn't
finished, we didn't do a Python wrapper for any of it.
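To make concrete what the missing scoring step does, here is a minimal sketch of the selection logic only. It is modelled loosely on MolVS (an assumption, not a description of RDKit's code): each enumerated tautomer gets a rule-based score, the highest score wins, and ties are broken by alphabetical SMILES so the answer does not depend on enumeration order. The inputs are assumed to already be canonical SMILES. `pick_canonical_tautomer` and `toy_score` are hypothetical; real scoring uses SMARTS-based rules, not substring counts.

```python
from typing import Callable, Iterable


def pick_canonical_tautomer(tautomers: Iterable[str],
                            score: Callable[[str], int]) -> str:
    """Return the highest-scoring tautomer SMILES (hypothetical helper).

    Candidates are visited in alphabetical order and only a strictly higher
    score replaces the current best, so ties go to the alphabetically first
    SMILES and the result is independent of input order.
    """
    best_smiles, best_score = None, None
    for smi in sorted(set(tautomers)):
        s = score(smi)
        if best_score is None or s > best_score:
            best_smiles, best_score = smi, s
    if best_smiles is None:
        raise ValueError("no tautomers supplied")
    return best_smiles


def toy_score(smi: str) -> int:
    """Hypothetical stand-in for SMARTS-based scoring: +2 per '=O' in the SMILES."""
    return 2 * smi.count("=O")


# Keto and enol forms of acetone; input order does not matter
print(pick_canonical_tautomer(["C=C(C)O", "CC(C)=O"], toy_score))  # CC(C)=O
print(pick_canonical_tautomer(["CC(C)=O", "C=C(C)O"], toy_score))  # CC(C)=O
```

The result is canonical in the sense of being reproducible from the rules, which is also why it need not be the lowest-energy tautomer.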


> For the sake of clarity: I am looking for a tautomer standardizer that
> produces a uniform, canonical tautomer. I am not asking for the
> physico-chemically correct one, i.e. the lowest-energy tautomer, as this is
> a task I assume no simple rule-based tautomer standardizer can perform.
>
>
At the moment the only real option is to stick with the Python-based code.
Completing the tautomer enumeration/canonicalization work is something that
is on my ToDo list, but it hasn't managed to bubble up to the top. Having
people in the community[1] requesting it helps raise the priority.
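Staying with the Python code means rule changes go through the runtime edit plus module reload shown in the gist quoted below. One plausible reason the reload is needed (an assumption about the module internals, not verified against tautomer.py) is ordinary Python behaviour: default argument values are evaluated once, when the `def` statement runs, so rebinding a module-level rule list afterwards does not change a default that was already captured. The names below are hypothetical.

```python
# Hypothetical stand-in for a module-level rule set (think tautomer.py)
RULES = ("generic 1,3 keto/enol",)


class Canonicalizer:
    # Default argument values are evaluated once, when this `def` executes
    def __init__(self, transforms=RULES):
        self.transforms = transforms


# Runtime change: rebind the module-level name to a new rule set
RULES = ("restricted 1,3 keto/enol",)

print(Canonicalizer().transforms)       # ('generic 1,3 keto/enol',)
print(Canonicalizer(RULES).transforms)  # ('restricted 1,3 keto/enol',)
```

Reloading a module re-executes its `def` and `from ... import` statements, so defaults are re-evaluated against whatever values are current at that point. Passing the rules explicitly would sidestep the reload, but whether the RDKit classes accept such an argument should be checked in tautomer.py and standardize.py.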

-greg
[1] Particularly people in the community who also happen to work for
companies that have RDKit support contracts.


>
>
> Is the C++ port everything that is in
> rdkit.Chem.MolStandardize.rdMolStandardize?
>
>
>
> Best regards
>
>
>
> Ansgar
>
>
>
>
>
> *Ansgar Schuffenhauer*
>
> Senior Investigator I
>
> T +41 79 608 9063
>
> ansgar.schuffenha...@novartis.com
>
>
>
> *Novartis Pharma AG*
>
> NIBR
>
>
>
> *From:* Greg Landrum <greg.land...@gmail.com>
> *Sent:* Dienstag, 23. Juli 2019 14:43
> *To:* Schuffenhauer, Ansgar <ansgar.schuffenha...@novartis.com>
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: rdkit.MolStandardize tautomer
>
>
>
> Hi Ansgar,
>
>
>
> This is still using the MolVS tautomer-handling code since we didn't
> finish the canonicalization part during last year's Google Summer of
> Code.[1]
>
> That means it's not using the parameter file that you found. The rules
> that are used are here:
>
>
> https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/MolStandardize/tautomer.py
>
>
>
> You can change those at runtime, but you need to be careful to properly
> re-import modules after doing so. Here's an example showing how to do that:
>
> https://gist.github.com/greglandrum/4ac2b4e7f8c61e25836e106467aef150
>
>
>
> I'm not going to claim that the SMARTS I constructed to change the
> 1,3 (thio)keto/enol rule is the right one, but it does at least show how to
> make the changes and reload the standardize module so that they take effect.
>
>
>
> I hope this helps,
>
> -greg
>
> [1] and I haven't made it a priority because I dread the "no, that's not
> the right canonical tautomer" arguments that will ensue
>
>
>
>
>
> On Tue, Jul 23, 2019 at 8:54 AM Schuffenhauer, Ansgar <
> ansgar.schuffenha...@novartis.com> wrote:
>
> Hi Greg
>
>
>
> Thanks for your quick answer. What I am doing is essentially the following:
>
>
>
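> from rdkit import Chem
> from rdkit.Chem import MolStandardize
> from rdkit.Chem.MolStandardize import standardize  # make sure the submodule is loaded
>
> input_mol = Chem.MolFromSmiles('C[C@H](N)C(=O)O')  # example input: L-alanine
> my_standardizer = MolStandardize.standardize.Standardizer()
> standard_tautomer = my_standardizer.tautomer_parent(input_mol)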
>
>
>
>
>
> I assume that at the stage where I construct my_standardizer there would be
> some opportunity to slip in alternative configuration info.
>
>
>
> By the way, I also think that one of the two cases of vanishing
> stereochemistry reported in https://github.com/rdkit/rdkit/issues/2363
> is caused by an overly eager keto/enol tautomerizer.
>
>
>
> Best regards
>
>
>
> Ansgar
>
>
>
>
>
>
>
> *From:* Greg Landrum <greg.land...@gmail.com>
> *Sent:* Montag, 22. Juli 2019 17:42
> *To:* Schuffenhauer, Ansgar <ansgar.schuffenha...@novartis.com>
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 141, Issue 16
>
>
>
> Hi Ansgar,
>
>
>
> It is possible to specify the tautomer parameter file that is used, but in
> order for me to explain how, I need to know how you are currently using the
> code to enumerate tautomers (i.e. which function you are calling).
>
>
>
> As for the format: it's tab-delimited and the first entry is the name. The
> "r"/"f" flag indicates which direction the transform goes; it is just there
> to make the name unique.
>
> In the SMARTS the first atom is the one with the mobile H and the last
> atom is where it should be moved to.
>
>
>
> -greg
>
>
>
>
>
>
>
> On Mon, Jul 22, 2019 at 3:08 PM Schuffenhauer, Ansgar <
> ansgar.schuffenha...@novartis.com> wrote:
>
> Dear all
>
> For the standardizer module (Chem.MolStandardize), what is the best way to
> change some of the tautomerizer rules?
> There is a data file in
> share/RDKit/Data/Molstandardize/tautomerTransforms.in which I assume
> defines the defaults.
>
> //      Name    SMARTS  Bonds   Charges
> 1,3 (thio)keto/enol f   [CX4!H0]-[C]=[O,S,Se,Te;X1]
> 1,3 (thio)keto/enol r   [O,S,Se,Te;X2!H0]-[C]=[C]
> 1,5 (thio)keto/enol f   [CX4,NX3;!H0]-[C]=[C][CH0]=[O,S,Se,Te;X1]
> 1,5 (thio)keto/enol r   [O,S,Se,Te;X2!H0]-[CH0]=[C]-[C]=[C,N]
> ...
>
> Now my questions are
> 1. What is the syntax of this file? What do the "f" and the "r" stand
> for? Do the SMARTS have to start with the atom carrying the mobile H?
> 2. How can I instruct RDKit not to use this default file, but one
> supplied by the user?
>
> The background to this question is that the SMARTS for keto/enol seems to
> be a bit too generic, as it also catches the alpha C atoms of carboxylic
> acids and amides. Generating tautomers there leads to epimerization of
> stereocenters alpha to carboxylic acids and amides. That seems odd to me,
> as such stereocenters are quite stable (in contrast to those of "real"
> ketones and aldehydes).
>
>
> Best regards
>
> Ansgar
>
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
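A minimal sketch of reading the tautomerTransforms.in format discussed in the quoted thread above, assuming only what is stated there: tab-delimited fields, the name first (the f/r suffix is part of the name), then the SMARTS (first atom carries the mobile H, last atom receives it), then optional Bonds and Charges fields, with `//` lines as comments. `parse_transform_lines` and `TransformRecord` are hypothetical, not RDKit functions, and the SMARTS strings are not validated here (RDKit's `Chem.MolFromSmarts` could be used for that).

```python
from typing import List, NamedTuple


class TransformRecord(NamedTuple):
    name: str      # e.g. "1,3 (thio)keto/enol f"; the f/r suffix keeps names unique
    smarts: str    # first atom carries the mobile H, last atom receives it
    bonds: str     # optional field, empty if absent
    charges: str   # optional field, empty if absent


def parse_transform_lines(text: str) -> List[TransformRecord]:
    """Parse tab-delimited transform definitions (hypothetical helper)."""
    records = []
    for raw in text.splitlines():
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("//"):
            continue  # skip blank lines and // comment/header lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected at least name and SMARTS: {line!r}")
        # Pad missing optional columns with empty strings, drop any extras
        fields += [""] * (4 - len(fields))
        records.append(TransformRecord(*fields[:4]))
    return records


SAMPLE = (
    "//\tName\tSMARTS\tBonds\tCharges\n"
    "1,3 (thio)keto/enol f\t[CX4!H0]-[C]=[O,S,Se,Te;X1]\n"
    "1,3 (thio)keto/enol r\t[O,S,Se,Te;X2!H0]-[C]=[C]\n"
)

for rec in parse_transform_lines(SAMPLE):
    print(rec.name, "->", rec.smarts)
```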
