The InChIs have me confused.
I'm going to simplify the list below by showing just the input SMILES,
the current (= master) RDKit InChI, and the cactus.nci.nih.gov InChI.
On Mon, Feb 23, 2015 at 10:54 AM, JP <jeanpaul.ebe...@inhibox.com> wrote:
>
> Here is the list (first inchi is the 2014_09_2, second one is the
> 2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):
>
> O=C(/N=c1/[nH]ncs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
> # RDKit 2015.03.1pre
> InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-
> # cactus.nci.nih.gov
> InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?
>
> O=C(/N=c1\[nH]c(-c2ccccn2)cs1)[C@H]1CC[C@H](Cn2cnc3ccccc3c2=O)CC1
> InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17-
>
> InChI=1S/C24H39N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h16-21,25-26H,1-15H2,(H,27,28,30)/t16-,17-,18?,19?,20?,21?
>
> CCOC(=O)Cc1cs/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)[nH]1
> InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16-
>
> InChI=1S/C23H36N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h15-19,24H,2-14H2,1H3,(H,25,26,29)/t15-,16-,17?,18?,19?
>
> COCc1n[nH]/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)s1
> InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14-
>
> InChI=1S/C20H33N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h13-17,21,23H,2-12H2,1H3,(H,22,24,26)/t13-,14-,15?,16?,17?
>
> COC(=O)c1[nH]/c(=N\C(=O)[C@H]2CC[C@H](Cn3cnc4ccccc4c3=O)CC2)sc1C(C)C
> InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16-
>
> InChI=1S/C24H38N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h14-20,25H,4-13H2,1-3H3,(H,26,27,29)/t15-,16-,17?,18?,19?,20?
>
> CC(C)[C@H]1CC[C@H](C(=O)N[C@H](Cc2ccccc2)C(=O)/N=c2\[nH]ncs2)CC1
> InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1
>
> InChI=1S/C21H36N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h14-18,22H,3-13H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1
>
If you look at the formula layer of the cactus InChIs, you will see
that they all have *way* too many H atoms (for example, C18H29N5O2S instead
of C18H19N5O2S for the first molecule). I think there's something about
these structures that is confusing the PubChem/CACTVS conversion code.
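To make that comparison mechanical, here is a minimal sketch (stdlib only) that pulls the formula layer out of each InChI and reports how many extra H atoms the cactus version has. The helper names are hypothetical, the InChIs are truncated after the formula layer because nothing else is needed, and the parser assumes a single-component Hill formula (no '.'-separated components).

```python
import re
from collections import Counter


def formula_layer(inchi: str) -> str:
    """Return the formula layer: the text between the first and second '/'."""
    prefix, _, rest = inchi.partition("/")
    if not prefix.startswith("InChI=") or not rest:
        raise ValueError(f"not an InChI: {inchi!r}")
    return rest.split("/", 1)[0]


def element_counts(formula: str) -> Counter:
    """Parse a single-component Hill formula such as 'C18H19N5O2S'."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def hydrogen_excess(reference: str, suspect: str) -> int:
    """H atoms in `suspect` minus H atoms in `reference` (both InChIs).

    Raises if the heavy-atom composition differs, since then the H counts
    are not comparable.
    """
    ref = element_counts(formula_layer(reference))
    sus = element_counts(formula_layer(suspect))
    heavy_ref = {e: n for e, n in ref.items() if e != "H"}
    heavy_sus = {e: n for e, n in sus.items() if e != "H"}
    if heavy_ref != heavy_sus:
        raise ValueError("heavy-atom composition differs")
    return sus["H"] - ref["H"]


# (RDKit InChI, cactus InChI) pairs from the list above, cut after the formula layer.
PAIRS = [
    ("InChI=1S/C18H19N5O2S", "InChI=1S/C18H29N5O2S"),
    ("InChI=1S/C24H23N5O2S", "InChI=1S/C24H39N5O2S"),
    ("InChI=1S/C23H26N4O4S", "InChI=1S/C23H36N4O4S"),
    ("InChI=1S/C20H23N5O3S", "InChI=1S/C20H33N5O3S"),
    ("InChI=1S/C24H28N4O4S", "InChI=1S/C24H38N4O4S"),
    ("InChI=1S/C21H28N4O2S", "InChI=1S/C21H36N4O2S"),
]

for rdkit_inchi, cactus_inchi in PAIRS:
    extra = hydrogen_excess(rdkit_inchi, cactus_inchi)
    print(f"{formula_layer(rdkit_inchi)} -> {formula_layer(cactus_inchi)} (+{extra} H)")
```

Checking only the heavy atoms first guards against comparing two different molecules by accident.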
Compare these two outputs.
Aromatic form:
http://cactus.nci.nih.gov/chemical/structure/O=C(N=c1[nH]ncs1)C1CCC(Cn2cnc3ccccc3c2=O)CC1/stdinchi
produces:
InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)
Kekule form:
http://cactus.nci.nih.gov/chemical/structure/O=C(/N=C1/[NH]N=CS1)[C@H]1CC[C@H](CN2C=NC3=CC=CC=C3C2=O)CC1/stdinchi
produces:
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-
In fact, converting just the five-membered ring to Kekule form is enough:
http://cactus.nci.nih.gov/chemical/structure/O=C(N=C1[NH]N=CS1)C1CCC(Cn2cnc3ccccc3c2=O)CC1/stdinchi
produces:
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)
This can't be right: the aromatic and Kekule SMILES describe the same
molecule, so they should give the same InChI.
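For rerunning these checks from a script, a small URL builder for the resolver might look like the sketch below. The base URL and the `stdinchi` representation come from the links above; using https instead of http, the helper names, and the idea that the resolver decodes percent-encoded SMILES characters (such as %23 for '#') are all assumptions. `quote()` leaves '/' unencoded by default, which matches the links above that work with raw '/' in the SMILES.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as in the links above; https rather than http is an assumption.
RESOLVER = "https://cactus.nci.nih.gov/chemical/structure"


def resolver_url(smiles: str, representation: str = "stdinchi") -> str:
    """Build a resolver URL for `smiles` (hypothetical helper).

    quote() keeps '/' (its default safe character) but percent-encodes
    '#', '\\', '=', '[', ']', '@' and similar characters.
    """
    return f"{RESOLVER}/{quote(smiles)}/{representation}"


def fetch(smiles: str, representation: str = "stdinchi", timeout: float = 30.0) -> str:
    """Fetch one representation from the resolver (requires network access)."""
    with urlopen(resolver_url(smiles, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

With network access, `fetch("N=c1[nH]ncs1")` and `fetch("N=C1NN=CS1")` would request the aromatic and Kekule forms of the same ring, which is the comparison made above.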
We can further simplify things to track down the problem:
http://cactus.nci.nih.gov/chemical/structure/N=c1[nH]ncs1/stdinchi
InChI=1S/C2H5N3S/c3-2-5-4-1-6-2/h4H,1H2,(H2,3,5)
vs
http://cactus.nci.nih.gov/chemical/structure/O=c1[nH]ncs1/stdinchi
InChI=1S/C2H2N2OS/c5-2-4-3-1-6-2/h1H,(H,4,5)
The trigger seems to be an exocyclic double bond to an atom that carries
Hs. This one is OK:
http://cactus.nci.nih.gov/chemical/structure/O=c1occo1/stdinchi
InChI=1S/C3H2O3/c4-3-5-1-2-6-3/h1-2H
but both of these are wrong:
http://cactus.nci.nih.gov/chemical/structure/N=c1occo1/stdinchi
InChI=1S/C3H5NO2/c4-3-5-1-2-6-3/h4H,1-2H2
http://cactus.nci.nih.gov/chemical/structure/C=c1occo1/stdinchi
InChI=1S/C4H6O2/c1-4-5-2-3-6-4/h1-3H2
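The pattern can be checked numerically. This sketch compares the H count in each resolver formula with a formula worked out by hand from the SMILES. The hand-computed formulas are an assumption of the sketch, not resolver output. The cases with an H-bearing exocyclic atom (=NH, =CH2) come out exactly two H atoms too high, while the =O cases match.

```python
import re


def hydrogens(formula: str) -> int:
    """Count H atoms in a single-component Hill formula such as 'C2H5N3S'."""
    return sum(
        int(count) if count else 1
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
        if element == "H"
    )


# SMILES -> (formula layer returned by the resolver, formula worked out by hand).
# The hand-computed formulas are an assumption of this sketch.
CASES = {
    "N=c1[nH]ncs1": ("C2H5N3S", "C2H3N3S"),
    "O=c1[nH]ncs1": ("C2H2N2OS", "C2H2N2OS"),
    "O=c1occo1": ("C3H2O3", "C3H2O3"),
    "N=c1occo1": ("C3H5NO2", "C3H3NO2"),
    "C=c1occo1": ("C4H6O2", "C4H4O2"),
}


def h_excess(cases: dict) -> dict:
    """Map each SMILES to (resolver H count - expected H count)."""
    return {
        smiles: hydrogens(resolver) - hydrogens(expected)
        for smiles, (resolver, expected) in cases.items()
    }


for smiles, excess in h_excess(CASES).items():
    print(f"{smiles:15s} {excess:+d} H")
```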
I'm pretty sure that the RDKit is not the one doing the wrong thing here.
@Markus: what would be the best way to report this to the NCI CADD guys?
-greg
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss