Thanks for your helpful answer. I learned a lot.
I have a few more questions:
1. How do you generate a non-standard InChI? Is it available in RDKit?
2. What are the 15T and KET options?
3. Can your solution be applied systematically? As a systematic solution, I tried:
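from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

enumerator = rdMolStandardize.TautomerEnumerator()
for smi in my_smi_list:
    m = Chem.MolFromSmiles(smi)
    m = enumerator.Canonicalize(m)  # pick the canonical tautomer first
    inchi = Chem.MolToInchi(m)      # returns the InChI string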
The problem with this approach is that for very large molecules (for example,
macrocycles) I get a 'MemoryError'.
4. In another case (not involving tautomers), I can't tell whether the InChI
output is correct or not:
C[N+]1=C(\C=C\C2=CNC=C2)C=CC2=CC=CC=C12
C[N+]1=C(\C=C/C2=CNC=C2)C=CC2=CC=CC=C12
Usually, when I enter two E/Z stereoisomers, I get two different InChIs (and
the difference is in the /b or /t layers, as it should be). However, this time
(both in RDKit and OpenBabel) I get the same InChI for both:
InChI=1S/C16H14N2/c1-18-15(8-6-13-10-11-17-12-13)9-7-14-4-2-3-5-16(14)18/h2-12H,1H3/p+1
InChI=1S/C16H14N2/c1-18-15(8-6-13-10-11-17-12-13)9-7-14-4-2-3-5-16(14)18/h2-12H,1H3/p+1
I only get different InChIs if I remove the charge (putting a hydrogen instead
of the carbon on the methylquinoline) or modify the pyrrole group on the other
side. Why?
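Regarding question 3, one way to keep the systematic approach while limiting memory use is to cap the tautomer enumeration. This is a minimal sketch, assuming a recent RDKit release in which TautomerEnumerator provides SetMaxTautomers and SetMaxTransforms (older versions may lack these setters). The helpers make_enumerator and canonical_tautomer_inchi, and the limits of 200, are hypothetical choices for illustration. The sketch also falls back to the input tautomer if Canonicalize still raises MemoryError.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def make_enumerator(max_tautomers=200, max_transforms=200):
    """Hypothetical helper: a TautomerEnumerator with tighter enumeration limits."""
    enumerator = rdMolStandardize.TautomerEnumerator()
    # Assumption: these setters exist in your RDKit version.
    # The limits are illustrative, not recommended values.
    enumerator.SetMaxTautomers(max_tautomers)
    enumerator.SetMaxTransforms(max_transforms)
    return enumerator


def canonical_tautomer_inchi(smi, enumerator):
    """Hypothetical helper: return (InChI, canonicalized) for one SMILES string.

    Returns (None, False) when the SMILES cannot be parsed.
    """
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        return None, False
    try:
        mol = enumerator.Canonicalize(mol)
        canonicalized = True
    except MemoryError:
        # Keep the input tautomer if enumeration still runs out of memory.
        canonicalized = False
    # Chem.MolToInchi returns the InChI string itself.
    return Chem.MolToInchi(mol), canonicalized


if __name__ == "__main__":
    enumerator = make_enumerator()
    my_smi_list = [
        "C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12",
        "C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12",
    ]
    for smi in my_smi_list:
        print(canonical_tautomer_inchi(smi, enumerator))
```

The trade-off: when a limit is hit, the canonical tautomer is picked from an incomplete set, so two tautomeric inputs of a large molecule may not converge to the same InChI. The second element of the returned tuple lets you flag molecules that fell back to the input structure.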
Thanks a lot,
Benny
From: Markus Sitzmann [mailto:[email protected]]
Sent: Tuesday, July 21, 2020 2:47 PM
To: Da'Adoosh Binyamin <[email protected]>
Cc: [email protected]
Subject: Re: [Rdkit-discuss] RDKit/tautomers
Hi Benny,
that is a pure InChI problem (not an RDKit one). Back when the Standard
InChI was defined, the 15T and KET options for the InChI calculation were
either not yet available or still experimental (I don't remember :-)), so they
didn't make it into the standard set of options for the Standard InChI
calculation. Hence it isn't too surprising that this tautomer pair doesn't
yield the same Standard InChI (InChI isn't/wasn't particularly strong regarding
tautomerism outside rings). You could use non-standard InChI with the 15T and
KET options switched on; that should fix your particular case.
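The suggestion above can be sketched in RDKit as follows. This assumes the InChI library bundled with your RDKit build accepts the experimental -KET (keto-enol tautomerism) and -15T (1,5-tautomerism) flags through the options argument of Chem.MolToInchi. Passing such options makes the output a non-standard InChI (prefixed InChI=1/ rather than InChI=1S/), so it should only be compared with InChIs computed using the same options. The helper inchi_pair is hypothetical, and the script only prints whether each pair matches rather than assuming the outcome.

```python
from rdkit import Chem

# The keto/enol pair from the original question.
TAUTOMER_A = "C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12"
TAUTOMER_B = "C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12"

# Assumption: the bundled InChI library accepts these experimental flags.
NONSTANDARD_OPTIONS = "-KET -15T"


def inchi_pair(smiles_a, smiles_b, options=""):
    """Hypothetical helper: InChIs for two SMILES computed with the same options."""
    mols = [Chem.MolFromSmiles(s) for s in (smiles_a, smiles_b)]
    return tuple(Chem.MolToInchi(m, options=options) for m in mols)


std_a, std_b = inchi_pair(TAUTOMER_A, TAUTOMER_B)
ket_a, ket_b = inchi_pair(TAUTOMER_A, TAUTOMER_B, NONSTANDARD_OPTIONS)
print("Standard InChIs identical:   ", std_a == std_b)
print("-KET -15T InChIs identical:  ", ket_a == ket_b)
```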
In general, there are ongoing efforts to improve InChI's handling of
tautomerism: https://pubmed.ncbi.nlm.nih.gov/32043883/
Markus
On Tue, Jul 21, 2020 at 12:11 PM Da'Adoosh Binyamin
<[email protected]> wrote:
Hi,
I have a question about RDKit/tautomers.
Let's say I have the following SMILES input:
C[CH]2CCC(=O)C1=C(O)[CH](O)C[CH](O)[CH]12
C[CH]2CCC(O)=C1C(=O)[CH](O)C[CH](O)[CH]12
Now, if I run this code for each input:
from rdkit import Chem
m = Chem.MolFromSmiles(smi)  # smi is one of the SMILES strings above
inchi = Chem.MolToInchi(m)   # returns the InChI string
I get different InChIs:
InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,13-15H,2-4H2,1H3
InChI=1S/C11H16O4/c1-5-2-3-6(12)10-9(5)7(13)4-8(14)11(10)15/h5,7-9,12-14H,2-4H2,1H3
My question is why this happens. Usually, if I enter two tautomers, they get
the same InChI (as they are supposed to, according to the literature).
What is the difference in this example?
Thanks,
Benny
_______________________________________________
Rdkit-discuss mailing list
[email protected]<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss