Is there an easy way from within RDKit to take an arbitrary amide tautomer
and convert it to the "correct" (according to chemists) one?
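
To make the question concrete, here is a rough sketch of the kind of hand-rolled fix that seems possible. It assumes a reaction SMARTS written for this purpose; the pattern and the helper name iminol_to_amide are illustrative assumptions, not RDKit built-ins. It starts from a SMILES string and has not been checked against molecules loaded with MolFromInchi, or against anything beyond simple aliphatic iminols.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hand-written transform (an assumption, not an RDKit built-in):
# aliphatic iminol C(=N)-OH  ->  amide C(=O)-N
IMINOL_TO_AMIDE = AllChem.ReactionFromSmarts(
    "[C:1](=[N:2])[OX2H1:3]>>[C:1](=[O:3])-[N:2]"
)


def iminol_to_amide(mol, max_steps=100):
    """Hypothetical helper: rewrite iminol groups as amides, one per pass."""
    for _ in range(max_steps):
        products = IMINOL_TO_AMIDE.RunReactants((mol,))
        if not products:
            break  # no iminol group left to rewrite
        mol = products[0][0]
        # Reaction products are unsanitized; this recomputes implicit Hs.
        Chem.SanitizeMol(mol)
    return mol


iminol = Chem.MolFromSmiles("CC(=N)O")  # the form returned for acetamide
amide = iminol_to_amide(iminol)
print(Chem.MolToSmiles(amide))
```

Maintaining patterns like this by hand is fragile, hence the question about a built-in route.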
On Fri, Jun 15, 2018 at 12:26 AM, Markus Sitzmann <[email protected]> wrote:
> Hi Jeff,
>
> That is because InChI is a structure identifier, not a structure
> representation. The difference is that a structure identifier
> normalizes the structure to a form that it regards as the standard
> representation of the molecule. This makes the molecule identifiable
> regardless of the state in which it arrives from an input resource
> (and hence yields the same identifier).
>
> For Standard InChI, the decision was made to make them insensitive to
> tautomers (within the limitations of the InChI algorithm). Unfortunately,
> this normalizes most amides to a form that chemists regard as the
> incorrect one. A second unfortunate consequence is that you can convert
> the InChI back to a structure representation, which is then of course the
> normalized (standardized) form of the molecule.
>
> So if you want to be sure to keep the original representation of a
> molecule, don’t use InChI as your representation format (calculate the
> InChI as an identifier field next to it). If your input resource only
> provides InChI or Standard InChI, then you are of course out of luck.
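
A minimal sketch of the layout described above, using only the Python standard library. The Record class and its field names are hypothetical, and the InChI string is hard-coded from the acetamide example in this thread; in practice it would presumably be computed from the molecule (for example with Chem.MolToInchi) rather than typed in.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical record: representation and identifier kept side by side."""
    smiles: str  # structure representation, stored exactly as supplied
    inchi: str   # structure identifier, used only for lookup/deduplication


# Standard InChI for acetamide, copied from the example in this thread.
ACETAMIDE_INCHI = "InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4)"

records = [
    Record(smiles="CC(=O)N", inchi=ACETAMIDE_INCHI),  # amide form
    Record(smiles="CC(=N)O", inchi=ACETAMIDE_INCHI),  # iminol form
]

# Index by identifier: both tautomers land under the same key,
# while each record still carries its original representation.
by_inchi = defaultdict(list)
for rec in records:
    by_inchi[rec.inchi].append(rec.smiles)

print(by_inchi[ACETAMIDE_INCHI])
```

The identifier answers "is this the same compound?", while the stored SMILES is what gets loaded back, so the tautomer that was supplied is never replaced by the normalized one.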
>
> Best,
> Markus
>
> -------------------------------------
> | Markus Sitzmann
> | [email protected]
>
> On 14. Jun 2018, at 23:33, Jeff van Santen <[email protected]>
> wrote:
>
> Hi all,
>
>
> I have some questions about how RDKit handles amides. For context, I am
> working with a large set of molecules, many of which contain peptides. I
> have run into a problem with RDKit: when I load a molecule from its
> InChI, the wrong tautomer is loaded. As a simple example, consider
> acetamide:
>
>
> """
>
> from rdkit import Chem
> from rdkit.Chem import rdMolDescriptors
>
> # Load acetamide from its Standard InChI
> FromInchi = Chem.MolFromInchi('InChI=1S/C2H5NO/c1-2(3)4/h1H3,(H2,3,4)')
>
> print(rdMolDescriptors.CalcNumAmideBonds(FromInchi))
>
> > 0
>
> print(Chem.MolToSmiles(FromInchi))
>
> > CC(=N)O
>
>
> # Load acetamide from SMILES for comparison
> FromSmiles = Chem.MolFromSmiles('CC(=O)N')
>
> print(rdMolDescriptors.CalcNumAmideBonds(FromSmiles))
>
> > 1
>
> print(Chem.MolToSmiles(FromSmiles))
>
> > CC(N)=O
>
> """
>
>
> I realize that Standard InChI does not have a mechanism for distinguishing
> between the two tautomers, so I am wondering why RDKit considers the iminol
> to be a better representation. Also, is there any way to get the amide
> instead? (Without using MolVS)
>
>
> Thanks,
>
> Jeff
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>