Using the latest code from this morning (-r4157), I ran four processes all day long for a total of about 280,000 SMILES. I found 12 more interesting cases where canonicalization failed.
Let me say again that this is very impressive, a huge reduction in problem cases! These molecules seem to fall into new categories of problems, not necessarily the canonicalizer itself -- some seem to be problems with aromaticity and bond order or H count.

These were all discovered using my "shuffle" script that generates 20 random SMILES from each input SMILES using "babel -i smi -o smi -xC" then runs them through "babel -i smi -o can | sort -u".

Craig

http://www.emolecules.com/image?db=549&id=4842449&width=500&height=500
c12c(C(=O)C(=C(C1=O)s...@h]1c([...@h]([C@@H](CO1)OC(=O)C)OC(=O)C)OC(=O)C)s...@h]1[c@@H]([...@h]([C@@H](CO1)OC(=O)C)OC(=O)C)OC(=O)C)cccc2 4842449
c12c(C(=O)C(=C(C1=O)s...@h]1[c@@H]([...@h]([C@@H](CO1)OC(=O)C)OC(=O)C)OC(=O)C)s...@h]1c([...@h]([C@@H](CO1)OC(=O)C)OC(=O)C)OC(=O)C)cccc2 4842449

http://www.emolecules.com/image?db=549&id=4782286&width=500&height=500
[...@]12(CC(CN(C1)Cc1ccccc1)(CNC2)C)C 4782286
C12(c...@](CN(C1)Cc1ccccc1)(CNC2)C)C 4782286

http://www.emolecules.com/image?db=549&id=4785090&width=500&height=500
n12c(c3c(n4c(c5c1CCCC5)nc1c(c4=O)cccc1)cccc3)nc1c(c2=O)cccc1 4785090
n12c(=O)c3c(nc1c1c(n4c(c5c2cccc5)nc2c(c4=O)cccc2)cccc1)cccc3 4785090
n12c(=O)c3c(nc1c1c(n4c(c5c2cccc5)nc2c(c4=O)CCCC2)cccc1)cccc3 4785090

http://www.emolecules.com/image?db=549&id=5860502&width=500&height=500
c12=c3c(c4c(c5c(c1csc2)csc5)csc4)csc3 5860502
C12C(C3C(C4C(C5C1CSC5)CSC4)CSC3)CSC2 5860502

http://www.emolecules.com/image?db=549&id=5860663&width=500&height=500
c12c3c4c5c(ccc4c4c(c3ccc1cccc2)nc1c(n4)cc2c(c1)cccc2)cccc5 5860663
c12c3c4c(ccc3c3c(c2ccc2c1cccc2)nc1c(n3)cc2c(c1)CCCC2)cccc4 5860663
c12c3c(ccc2c2c(c4c1c1c(cc4)CCCC1)nc1c(n2)cc2c(c1)cccc2)cccc3 5860663

http://www.emolecules.com/image?db=549&id=5860665&width=500&height=500
c12c3c4c5c(ccc4c4c(c3ccc1cccc2)nc1c(n4)c2c(c3c1CCc1c3cccc1)c1c(CC2)cccc1)cccc5 5860665
c12c3c4c(c5c(c3ccc1cccc2)nc1c(n5)c2c(c3c1ccc1c3cccc1)c1c(cc2)CCCC1)ccc1c4CCCC1 5860665
c12c3c4c(ccc3c3c(c2ccc2c1cccc2)nc1c(n3)c2c(c3c1ccc1c3cccc1)c1c(cc2)cccc1)cccc4 5860665
c12c3c4c(ccc3c3c(c2ccc2c1cccc2)nc1c(n3)c2c(c3c1ccc1c3CCCC1)c1c(cc2)cccc1)cccc4 5860665
c12c3c4c(ccc3c3c(c2ccc2c1cccc2)nc1c(n3)c2c(c3c1ccc1c3CCCC1)c1c(cc2)CCCC1)cccc4 5860665
c12c3c4c(ccc3c3c(c2ccc2c1cccc2)nc1c(n3)c2c(c3c1CCc1c3cccc1)c1c(cc2)cccc1)cccc4 5860665
c12c3c(c4c(c5c3c3c(CC5)cccc3)nc3c(n4)c4c(c5c3ccc3c5cccc3)c3c(CC4)cccc3)ccc1cccc5860665
c12c3c(c4c(c5c3c3c(CC5)cccc3)nc3c(n4)c4c(c5c3CCc3c5cccc3)c3c(cc4)cccc3)ccc1cccc5860665
c12c3c(ccc1c1c(c4c2c2c(CC4)cccc2)nc2c(n1)c1c(c4c2ccc2c4CCCC2)c2c(cc1)CCCC2)cccc5860665
c12c3c(ccc2c2c(c4c1c1c(cc4)CCCC1)nc1c(n2)c2c(c4c1CCc1c4cccc1)c1c(cc2)cccc1)cccc5860665

http://www.emolecules.com/image?db=549&id=6137697&width=500&height=500
c12=c(nn2)ssnc1S 6137697
c12c(nn2)ssnc1S 6137697

http://www.emolecules.com/image?db=549&id=5863122&width=500&height=500
C1([N+](=O)[O-])C2C[C@@h]3...@h]1c[c@H](C2)C3 5863122
C1([N+](=O)[O-])[C@@H]2C[C@@h]3cc1...@h](C2)C3 5863122

http://www.emolecules.com/image?db=549&id=5865030&width=500&height=500
c12c3c4c5c6c7c8c(ccc7ccc6ccc5ccc4ccc3ccc1cccc2)cccc8 5865030
c12c3c(ccc2ccc2c1c1c4c5c6c(ccc5ccc4ccc1cc2)CCCC6)cccc3 5865030

http://www.emolecules.com/image?db=549&id=5865292&width=500&height=500
c12c3c4c5c6c7c8c9c(ccc8ccc7ccc6ccc5ccc4ccc3ccc1cccc2)cccc9 5865292
c12c3c4c5c6c7c8c9c(CCc8ccc7ccc6ccc5ccc4ccc3ccc1cccc2)cccc9 5865292
c12c3c4c(ccc3ccc2ccc2c1c1c3c5c6c(ccc5ccc3ccc1cc2)CCCC6)CCCC4 5865292
c12c3c(ccc2ccc2c1c1c4c5c6c7c(ccc6ccc5ccc4ccc1cc2)CCCC7)cccc3 5865292

http://www.emolecules.com/image?db=549&id=5865338&width=500&height=500
C(C(N(C)C)C)(c1ccccc1)o...@h]([C@@H](N(C)C)C)(c1ccccc1)O 5865338
[...@h]([C@@H](N(C)C)C)(c1ccccc1)O.C(C(N(C)C)C)(c1ccccc1)O 5865338

http://www.emolecules.com/image?db=549&id=5865516&width=500&height=500
c12c3c4c5c6c(c7c8c9c%10c(CCc9cc(c8ccc7cc6)Br)cccc%10)ccc5ccc4c(cc3ccc1cccc2)Br 5865516
c12c3c4c5c(c6c7c8c9c(CCc8cc(c7ccc6cc5)Br)cccc9)ccc4ccc3c(cc2ccc2c1CCCC2)Br 5865516
c12c3c(ccc2cc(c2c1c1c4c(c5c6C7c8c(CCC7CC(c6ccc5cc4)Br)cccc8)ccc1cc2)Br)cccc3 5865516
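
For anyone who wants to reproduce the check, the "shuffle" test described in this message can be sketched roughly as follows. This is not the original script, only a minimal Python sketch of the same idea. The function names are made up for illustration, and it assumes that babel reads from stdin and writes to stdout when no file names are given, as the quoted pipeline suggests. The babel options are the ones quoted in the message.

```python
import subprocess


def run_babel(args, smiles):
    """Pipe one SMILES through babel and return the first token it prints."""
    result = subprocess.run(
        ["babel", *args],
        input=smiles + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"babel produced no output for {smiles!r}")
    # Output lines may carry a title after the SMILES; keep only the SMILES.
    return lines[0].split()[0]


def random_smiles(smiles):
    # Options as quoted in the message: random-order SMILES output.
    return run_babel(["-i", "smi", "-o", "smi", "-xC"], smiles)


def canonical_smiles(smiles):
    # Options as quoted in the message: canonical SMILES output.
    return run_babel(["-i", "smi", "-o", "can"], smiles)


def distinct_canonical_forms(smiles, randomize=random_smiles,
                             canonicalize=canonical_smiles, n=20):
    """Return the sorted set of canonical forms seen over n random rewrites.

    More than one entry means canonicalization was not stable for this
    input (the Python equivalent of "sort -u" giving several lines).
    """
    variants = [randomize(smiles) for _ in range(n)]
    return sorted({canonicalize(v) for v in variants})
```

An input counts as a failure when distinct_canonical_forms() returns more than one string. The randomize and canonicalize arguments are only there so the logic can be tried without babel installed.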