Hello all,

Back on 19/03/2009 I emailed this list, under the subject
"Canonical SMILES performance", about a test set of around 18000
PubChem 3D structures. I did the following analysis:
(1) sdf -> can
(2) sdf -> smi -> can
(3) diff of (1) and (2)
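
For anyone wanting to repeat step (3) without relying on line order, here is a minimal Python sketch that compares the two outputs by title. It assumes each line is "SMILES<whitespace>title" with a single-token title (as with PubChem CIDs). The function names are hypothetical, and the "benzene" entry is made up purely for illustration.

```python
def parse_smiles_lines(lines):
    """Map each title to its SMILES string, given 'SMILES<whitespace>title' lines."""
    records = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without a title
        smiles, title = parts[0], parts[1]
        records[title] = smiles
    return records


def roundtrip_mismatches(direct_lines, roundtrip_lines):
    """Return (title, direct, roundtrip) for every title whose two SMILES differ."""
    direct = parse_smiles_lines(direct_lines)
    roundtrip = parse_smiles_lines(roundtrip_lines)
    mismatches = []
    for title, smiles in direct.items():
        other = roundtrip.get(title)  # None if the round trip lost the molecule
        if other != smiles:
            mismatches.append((title, smiles, other))
    return mismatches


# (1) sdf -> can, and (2) sdf -> smi -> can; the "benzene" line is illustrative only
direct = ["c12=NCCN=c1ncnc2\t167", "c1ccccc1\tbenzene"]
roundtrip = ["C12=NCCN=C1NCNC2\t167", "c1ccccc1\tbenzene"]
failures = roundtrip_mismatches(direct, roundtrip)
print(failures)
```

A molecule counts as a failure when the two canonical strings differ, or when it is missing from the round-trip output altogether.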

At that time, we had 1424 failures (8%), which wasn't great. According
to a later email, the 22x branch finished with 190 failures.

I've just redone the analysis. The download from PubChem has changed,
but it still has 18000 or so molecules:
(ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/SDF/Conformers_00000001_00025000.sdf.gz)

Now we have only 5 failures. Pretty good by any measure.

(There were also two canonicalisation timeouts. I think we should add
an option, either to obabel or to the canonical format, to set the
timeout.)

obabel failures.sdf -ocan -O sdf_to_can.txt
obabel failures.sdf -osmi -O sdf_to_smi.txt
obabel -ismi sdf_to_smi.txt -ocan -O smi_to_can.txt
diff sdf_to_can.txt smi_to_can.txt

< c12=NCCN=c1ncnc2      167
< N12CC[C@@H](CC1)CC2   7527
< c12c3c(cc4c1c1c(nn2)c2c(cc1cc4)cccc2)cccc3    9107
< c12c(c(c[nH]1)C[C@@h]1n3c...@h](C1)CC3)cccc2  21918
< c\1(=c/2\[n+](=O)cccc2)/n(cccc1)[O-]  23699
---
> C12=NCCN=C1NCNC2      167
> n12c...@h](CC1)CC2    7527
> c12c3c(cc4c1c1c([nH][nH]2)c2c(cc1cc4)cccc2)cccc3      9107
> c12c(c(c[nH]1)C[C@@H]1N3CC[C@@H](C1)CC3)cccc2 21918
> C1(C2[N+](=O)CCCC2)N(CCCC1)[O-]       23699
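
To read the output above molecule by molecule, the "<" and ">" lines can be paired up by their trailing ID. A rough Python sketch follows, assuming "<" lines come from step (1) and ">" lines from step (2). The function name and the "direct"/"roundtrip" labels are mine, and the sample uses only two of the five entries.

```python
def pair_diff_lines(diff_text):
    """Group '<' and '>' lines of normal diff output by the trailing ID."""
    pairs = {}
    for line in diff_text.splitlines():
        if not line.startswith(("<", ">")):
            continue  # skip the '---' separator and any hunk header lines
        fields = line[1:].split()
        if len(fields) < 2:
            continue  # no ID on this line, nothing to pair on
        smiles, mol_id = fields[0], fields[1]
        side = "direct" if line[0] == "<" else "roundtrip"
        pairs.setdefault(mol_id, {})[side] = smiles
    return pairs


sample = """< c12=NCCN=c1ncnc2      167
< c12c3c(cc4c1c1c(nn2)c2c(cc1cc4)cccc2)cccc3    9107
---
> C12=NCCN=C1NCNC2      167
> c12c3c(cc4c1c1c([nH][nH]2)c2c(cc1cc4)cccc2)cccc3      9107
"""

pairs = pair_diff_lines(sample)
for mol_id, sides in pairs.items():
    print(mol_id, sides.get("direct"), sides.get("roundtrip"))
```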

I make it two kekulization problems and two canonicalisation problems
(both the same substructure). The fifth structure (23699) is a tough
one.

failures.sdf attached.

- Noel
