Re: [Rdkit-discuss] MolFromInchi with Amides
Hi Rocco,

On Fri, Jun 15, 2018 at 3:29 PM Rocco Moretti wrote:
>
> Is there an easy way from within RDKit to take an arbitrary amide tautomer
> and convert it to the "correct" (according to chemists) one?
>

I suspect it's tricky to define a transformation that handles an arbitrary tautomer, but dealing with this specific one isn't too hard:

from rdkit import Chem
from rdkit.Chem import AllChem

ims = [Chem.MolFromSmiles(x) for x in ('C(O)=N', 'C(O)=NC')]
# iminol C(-OH)=N  ->  amide C(=O)-N
tf = AllChem.ReactionFromSmarts('[C:1](-[OH:2])=[N:3]>>[C:1](=[O:2])-[N:3]')
# first product of the first match for each input
ps = [tf.RunReactants((x,))[0][0] for x in ims]
# reaction products are not sanitized
for x in ps:
    Chem.SanitizeMol(x)
print([Chem.MolToSmiles(x) for x in ps])
# ['NC=O', 'CNC=O']

-greg

> On Fri, Jun 15, 2018 at 12:26 AM, Markus Sitzmann <markus.sitzm...@gmail.com> wrote:
>> [...]

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
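Greg's one-step version takes `[0][0]` from `RunReactants`, which raises an `IndexError` for a molecule with no iminol and converts only one site when there are several (as in a peptide loaded from InChI). A minimal sketch that reuses the same SMARTS and re-runs it until nothing matches could look like this; the function name `iminols_to_amides` and the `max_iter` guard are illustrative choices, not RDKit API.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Same transformation as in the reply above: iminol C(-OH)=N -> amide C(=O)-N
IMINOL_TO_AMIDE = AllChem.ReactionFromSmarts(
    '[C:1](-[OH:2])=[N:3]>>[C:1](=[O:2])-[N:3]')


def iminols_to_amides(mol, max_iter=100):
    """Hypothetical helper: apply the iminol->amide reaction until no iminol is left.

    RunReactants returns one product set per match, and each product has
    only that single site transformed, so take the first product and repeat.
    A molecule without any iminol is returned unchanged.
    """
    for _ in range(max_iter):
        products = IMINOL_TO_AMIDE.RunReactants((mol,))
        if not products:
            return mol
        mol = products[0][0]
        # Reaction products come back unsanitized (implicit Hs not recomputed)
        Chem.SanitizeMol(mol)
    raise RuntimeError('iminol -> amide conversion did not converge')


for smi in ('CC(=N)O', 'OC(=N)CC(O)=N', 'CCO'):
    print(smi, '->', Chem.MolToSmiles(iminols_to_amides(Chem.MolFromSmiles(smi))))
```

Because the pattern uses aliphatic `[C]` and `[N]` joined by an explicit double bond, aromatic systems such as 2-hydroxypyridine are left alone. The sketch was written with SMILES input in mind; molecules from `Chem.MolFromInchi` may carry explicit hydrogen counts, so it is worth checking the output on a few of your own structures.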
Re: [Rdkit-discuss] MolFromInchi with Amides
Is there an easy way from within RDKit to take an arbitrary amide tautomer and convert it to the "correct" (according to chemists) one?

On Fri, Jun 15, 2018 at 12:26 AM, Markus Sitzmann wrote:
> [...]
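For the general case Rocco asks about, RDKit releases newer than the ones discussed in this 2018 thread include a tautomer canonicalizer, `TautomerEnumerator`, in `rdkit.Chem.MolStandardize.rdMolStandardize`. It is a C++ reimplementation of the MolVS standardizer, so it may not satisfy Jeff's "without MolVS" constraint. A minimal sketch, assuming such a release is installed: its scoring rules reward C=O over C=N, so acetamide and its iminol should both come back as the amide.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Assumes an RDKit release that ships rdMolStandardize.TautomerEnumerator
# (it postdates the 2018 discussion in this thread).
enumerator = rdMolStandardize.TautomerEnumerator()


def canonical_tautomer_smiles(smiles):
    """Return the canonical SMILES of RDKit's canonical tautomer for `smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(enumerator.Canonicalize(mol))


for smi in ('CC(=N)O', 'CC(=O)N'):
    print(smi, '->', canonical_tautomer_smiles(smi))
```

Unlike the single-purpose reaction, this rescores every tautomeric group in the molecule, so it can also change parts of a structure other than the amides.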
Re: [Rdkit-discuss] MolFromInchi with Amides
Hi Jeff,

That is because InChI is a structure identifier, not a structure representation. The difference between the two is that a structure identifier normalizes the structure to a form it regards as the standard representation of the molecule, so that the molecule is identifiable (and gets the same identifier) regardless of the form in which it arrives from an input resource.

For Standard InChI, the decision was made to make it insensitive to tautomers (within the limitations of the InChI algorithm). Unfortunately, this normalizes most amides to a form that chemists regard as the incorrect one. The second unfortunate thing is that when you convert the InChI back to a structure representation, you of course get the normalized or standardized form of the molecule.

So if you want to keep the original representation of a molecule, don’t use InChI as your representation format (calculate InChI as an identifier field next to it). If your input resource only provides InChI or Standard InChI, then you are of course out of luck.

Best,
Markus

-
| Markus Sitzmann
| markus.sitzm...@gmail.com

> On 14. Jun 2018, at 23:33, Jeff van Santen wrote:
> [...]
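Markus's suggestion of keeping the structure and the InChI side by side can be sketched as a small record; the helper `make_record` and its field names are hypothetical. The amide and iminol forms of acetamide get different SMILES but the same Standard InChI, which is the tautomer insensitivity he describes.

```python
from rdkit import Chem


def make_record(smiles):
    """Hypothetical record: the structure (SMILES) plus identifiers derived from it.

    The SMILES keeps the tautomer as it was drawn; the Standard InChI and
    InChIKey are identifiers and treat mobile-H forms such as amide/iminol
    as the same compound.
    """
    mol = Chem.MolFromSmiles(smiles)
    return {
        'smiles': Chem.MolToSmiles(mol),      # representation
        'inchi': Chem.MolToInchi(mol),        # identifier
        'inchikey': Chem.MolToInchiKey(mol),  # hashed identifier for lookups
    }


amide = make_record('CC(=O)N')
iminol = make_record('CC(=N)O')
print(amide['smiles'], iminol['smiles'])
print(amide['inchi'] == iminol['inchi'])
```

Storing the SMILES as the primary field means the amide survives a database round trip, while searches can still match on the InChI or InChIKey.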
[Rdkit-discuss] MolFromInchi with Amides
Hi all,

I have some questions about how RDKit handles amides. For context, I am working with a large set of molecules, many of which contain peptides. I have been running into a problem with RDKit: when I load a molecule from its InChI, the wrong tautomer is loaded. As a simple example, consider acetamide:

"""
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

FromInchi = Chem.MolFromInchi('InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4)')
print(rdMolDescriptors.CalcNumAmideBonds(FromInchi))
# 0
print(Chem.MolToSmiles(FromInchi))
# CC(=N)O

FromSmiles = Chem.MolFromSmiles('CC(=O)N')
print(rdMolDescriptors.CalcNumAmideBonds(FromSmiles))
# 1
print(Chem.MolToSmiles(FromSmiles))
# CC(N)=O
"""

I realize that Standard InChI does not have a mechanism for distinguishing between the two tautomers, so I am wondering why RDKit considers the iminol to be a better representation. Also, is there any way to get the amide instead? (Without using MolVS)

Thanks,

Jeff