Re: [Rdkit-discuss] MolFromInchi with Amides

2018-06-16 Thread Greg Landrum
Hi Rocco,

On Fri, Jun 15, 2018 at 3:29 PM Rocco Moretti wrote:

>
> Is there an easy way from within RDKit to take an arbitrary amide tautomer
> and convert it to the "correct" (according to chemists) one?
>

I suspect it's tricky to define a transformation that handles an arbitrary
tautomer, but dealing with this specific one isn't too hard:

In [1]: from rdkit import Chem

In [2]: from rdkit.Chem import AllChem

In [4]: ims = [Chem.MolFromSmiles(x) for x in ('C(O)=N','C(O)=NC')]

In [5]: tf = AllChem.ReactionFromSmarts('[C:1](-[OH:2])=[N:3]>>[C:1](=[O:2])-[N:3]')

In [7]: ps = [tf.RunReactants((x,))[0][0] for x in ims]

In [8]: _ = [Chem.SanitizeMol(x) for x in ps]

In [9]: [Chem.MolToSmiles(x) for x in ps]
Out[9]: ['NC=O', 'CNC=O']
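
For molecules with several such groups (peptides, for example), the same
transformation can be applied repeatedly until nothing matches any more. The
following is only a minimal sketch under stated assumptions: the helper name
iminol_to_amide and the max_steps guard are hypothetical and not part of
RDKit, it has only been reasoned through for molecules parsed from SMILES, and
it handles just this aliphatic C(-OH)=N pattern, not tautomers in general.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Same transformation as in the session above: iminol C(-OH)=N -> amide C(=O)-N.
_IMINOL_TO_AMIDE = AllChem.ReactionFromSmarts(
    '[C:1](-[OH:2])=[N:3]>>[C:1](=[O:2])-[N:3]'
)


def iminol_to_amide(mol, max_steps=100):
    """Hypothetical helper: rewrite every aliphatic iminol group as an amide.

    Each pass converts one matching group; the loop stops as soon as the
    reaction no longer finds a match (or after max_steps passes).
    """
    for _ in range(max_steps):
        products = _IMINOL_TO_AMIDE.RunReactants((mol,))
        if not products:
            break  # no iminol groups left
        mol = products[0][0]    # take the product of the first match
        Chem.SanitizeMol(mol)   # recompute valences and implicit Hs
    return mol


# A molecule with two iminol groups, written as SMILES.
fixed = iminol_to_amide(Chem.MolFromSmiles('CC(O)=NCC(O)=N'))
print(Chem.MolToSmiles(fixed))
```

A molecule without any matching group is returned unchanged, so the helper
can be mapped over a mixed data set.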


-greg



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] MolFromInchi with Amides

2018-06-15 Thread Rocco Moretti
Is there an easy way from within RDKit to take an arbitrary amide tautomer
and convert it to the "correct" (according to chemists) one?



Re: [Rdkit-discuss] MolFromInchi with Amides

2018-06-14 Thread Markus Sitzmann
Hi Jeff,

That is because InChI is a structure identifier, not a structure
representation. The difference between the two is that a structure identifier
normalizes the structure to a form which it regards as the standard
representation of the molecule, so that the molecule is identifiable
regardless of the state in which it arrives from an input source (and hence
always yields the same identifier).

For Standard InChI, the decision was made to make it insensitive to tautomers
(within the limitations of the InChI algorithm). Unfortunately, this
normalizes most amides to a form that chemists regard as the incorrect one.
The second unfortunate thing is that you can convert the InChI back to a
structure representation, which is then of course the normalized or
standardized form of the molecule.
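
The tautomer insensitivity can be checked directly. This is a small sketch,
assuming an RDKit build with InChI support; the variable names are only
illustrative. It compares the Standard InChI and the canonical SMILES of the
amide and iminol forms of acetamide.

```python
from rdkit import Chem

# Two tautomers of acetamide, written explicitly as SMILES.
amide = Chem.MolFromSmiles('CC(=O)N')
iminol = Chem.MolFromSmiles('CC(O)=N')

# Standard InChI treats the mobile hydrogen as shared between N and O,
# so both forms are expected to map to the same identifier.
same_inchi = Chem.MolToInchi(amide) == Chem.MolToInchi(iminol)

# Canonical SMILES is a representation and keeps the tautomers distinct.
same_smiles = Chem.MolToSmiles(amide) == Chem.MolToSmiles(iminol)

print(same_inchi, same_smiles)
```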

So if you want to make sure to keep the original representation of a molecule,
don’t use InChI as your representation format (calculate InChI as an identifier
field next to it). If your input source only provides InChI or Standard InChI,
then you are of course out of luck.
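
One way to follow this advice is to store a structure representation and the
InChI side by side. The sketch below is only an illustration: the function
name make_record and the dictionary keys are hypothetical, and it assumes the
input is available as SMILES and that RDKit was built with InChI support.

```python
from rdkit import Chem


def make_record(smiles):
    """Hypothetical helper: keep the structure and its identifier together."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError('could not parse SMILES: %r' % smiles)
    inchi = Chem.MolToInchi(mol)
    return {
        'smiles': Chem.MolToSmiles(mol),           # representation: keeps the tautomer
        'inchi': inchi,                            # identifier: tautomer-insensitive
        'inchikey': Chem.InchiToInchiKey(inchi),   # fixed-length key for lookups
    }


record = make_record('CC(=O)N')
print(record)
```

Reloading the molecule from record['smiles'] rather than record['inchi']
keeps the amide form, while the InChI fields remain available for matching
records across sources.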

Best,
Markus

-
|  Markus Sitzmann
|  markus.sitzm...@gmail.com



[Rdkit-discuss] MolFromInchi with Amides

2018-06-14 Thread Jeff van Santen

Hi all,


I have some questions about how RDKit handles amides. For context, I am
working with a large set of molecules, many of which contain peptides. I
have been running into a problem with RDKit: when I try to
load a molecule from its InChI, the wrong tautomer is loaded. As a
simple example, consider acetamide:



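"""
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Loading from the Standard InChI gives the iminol tautomer.
FromInchi = Chem.MolFromInchi('InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4)')
print(rdMolDescriptors.CalcNumAmideBonds(FromInchi))
> 0
print(Chem.MolToSmiles(FromInchi))
> CC(=N)O

# Loading from SMILES keeps the amide as written.
FromSmiles = Chem.MolFromSmiles('CC(=O)N')
print(rdMolDescriptors.CalcNumAmideBonds(FromSmiles))
> 1
print(Chem.MolToSmiles(FromSmiles))
> CC(N)=O
"""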


I realize that Standard InChI does not have a mechanism for
distinguishing between the two tautomers, so I am wondering why RDKit
considers the iminol to be a better representation. Also, is there
any way to get the amide instead (without using MolVS)?



Thanks,

Jeff

