Hi Jeff,
That is because InChI is a structure identifier, not a structure
representation. The difference is that a structure identifier normalizes the
structure to the form it regards as the standard representation of the
molecule, so that the molecule stays identifiable (i.e., yields the same
identifier) regardless of the form in which it arrives from an input resource.
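To make the identifier/representation distinction concrete, here is a small sketch (not part of the original reply). It assumes an RDKit build with InChI support. The amide and iminol tautomers of acetamide are different SMILES representations, but they should collapse to the same Standard InChI and InChIKey.

```python
from rdkit import Chem

# Two tautomers of acetamide, written as SMILES: the amide and the iminol.
amide = Chem.MolFromSmiles('CC(=O)N')
iminol = Chem.MolFromSmiles('CC(=N)O')

# As representations they are different molecules graphs ...
print(Chem.MolToSmiles(amide), Chem.MolToSmiles(iminol))

# ... but Standard InChI treats the N-H/O-H hydrogen as mobile,
# so both tautomers get the same identifier.
print(Chem.MolToInchi(amide))
print(Chem.MolToInchi(iminol))
print(Chem.MolToInchiKey(amide) == Chem.MolToInchiKey(iminol))
```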
For Standard InChI, the decision was made to make it insensitive to tautomers
(within the limitations of the InChI algorithm). Unfortunately, this
normalizes most amides to the form that chemists regard as the incorrect one.
The second unfortunate point is that when you convert the InChI back to a
structure representation, what you get is, of course, the normalized or
standardized form of the molecule.
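A sketch of the round trip described above, plus one possible post-processing workaround (not from the original reply). The helper iminol_to_amide and its SMARTS pattern are hypothetical, not an RDKit API. It is a blunt rule: it rewrites every non-aromatic C(=N)-OH group into C(-N)=O, so it would also "fix" molecules where the iminol form is actually intended.

```python
from rdkit import Chem

# Hypothetical rule (not an RDKit API): a non-aromatic iminol group,
# i.e. a carbon double-bonded to N and single-bonded to an O-H.
IMINOL = Chem.MolFromSmarts('[CX3](=[NX2])[OX2H1]')

def iminol_to_amide(mol):
    """Hypothetical helper: rewrite each C(=N)-OH group as C(-N)=O."""
    rw = Chem.RWMol(mol)
    for c_idx, n_idx, o_idx in mol.GetSubstructMatches(IMINOL):
        # Read the N hydrogen count from the original, sanitized molecule.
        n_hs = mol.GetAtomWithIdx(n_idx).GetTotalNumHs()
        # Swap the bond orders: C=N becomes C-N, C-O becomes C=O.
        rw.GetBondBetweenAtoms(c_idx, n_idx).SetBondType(Chem.BondType.SINGLE)
        rw.GetBondBetweenAtoms(c_idx, o_idx).SetBondType(Chem.BondType.DOUBLE)
        # Move the hydroxyl hydrogen from O to N, fixing the counts explicitly.
        n_atom = rw.GetAtomWithIdx(n_idx)
        n_atom.SetNoImplicit(True)
        n_atom.SetNumExplicitHs(n_hs + 1)
        o_atom = rw.GetAtomWithIdx(o_idx)
        o_atom.SetNoImplicit(True)
        o_atom.SetNumExplicitHs(0)
    out = rw.GetMol()
    Chem.SanitizeMol(out)  # recompute valences and check the result
    return out

inchi = 'InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4)'
round_trip = Chem.MolFromInchi(inchi)
print(Chem.MolToSmiles(round_trip))                   # the normalized form
print(Chem.MolToSmiles(iminol_to_amide(round_trip)))  # rewritten as the amide
```

Which tautomer the first print shows may depend on the InChI library version bundled with RDKit; molecules without a matching iminol group pass through unchanged, and the rewritten molecule still has the same InChI.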
So if you want to preserve the original representation of a molecule, don't
use InChI as your representation format (calculate InChI as an identifier
field next to it). If your input resource only provides InChI or Standard
InChI, then you are, of course, out of luck.
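A minimal sketch of that advice, assuming SMILES is the representation you keep. The function make_record and its field names are hypothetical, chosen only for illustration.

```python
from rdkit import Chem

def make_record(smiles):
    """Hypothetical record layout: SMILES is the stored representation,
    InChI and InChIKey are identifier fields computed next to it."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        'smiles': Chem.MolToSmiles(mol),      # keeps the tautomer as drawn
        'inchi': Chem.MolToInchi(mol),        # tautomer-insensitive identifier
        'inchikey': Chem.MolToInchiKey(mol),  # hashed form, handy for lookups
    }

records = [make_record('CC(=O)N'), make_record('CC(=N)O')]
for rec in records:
    print(rec['smiles'], rec['inchikey'])
```

The identifier fields match, so both entries can be recognized as the same compound, while each smiles field still holds its own tautomer instead of whatever an InChI round trip would produce.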
Best,
Markus
-------------------------------------
| Markus Sitzmann
| markus.sitzm...@gmail.com
> On 14. Jun 2018, at 23:33, Jeff van Santen <jeffrey_van_san...@sfu.ca> wrote:
>
> Hi all,
>
>
> I have some questions about how rdkit handles amides. For context, I am
> working with a large set of molecules, many of which contain peptides. I have
> been running into a problem with rdkit: when I try to load a molecule from
> its InChI, the wrong tautomer is loaded. As a simple example, consider
> acetamide:
>
>
> """
>
> FromInchi = Chem.MolFromInchi('InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4)')
>
> print(rdMolDescriptors.CalcNumAmideBonds(FromInchi))
>
> > 0
>
> print(Chem.MolToSmiles(FromInchi))
>
> > CC(=N)O
>
>
>
> FromSmiles = Chem.MolFromSmiles('CC(=O)N')
>
> print(rdMolDescriptors.CalcNumAmideBonds(FromInchi))
>
> > 1
>
> print(Chem.MolToSmiles(FromSmiles))
>
> > CC(=N)O
>
> """
>
>
> I realize that Standard InChI does not have a mechanism for distinguishing
> between the two tautomers, so I am wondering why rdkit considers the iminol
> to be the better representation. Also, is there any way to get the amide
> instead? (Without using MolVS)
>
>
> Thanks,
>
> Jeff
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss