On 10/5/10 10:56 AM, Geoffrey Hutchison wrote:
> That's good! Maybe we should let you generate a file of errors
> over the next day or two(?), post it, and then make every effort
> to fix them?

The truth is, just a dozen bugs out of over a million structures is quite 
remarkable.  Naturally we all want a zero error rate, but this is a
massive improvement over the previous release.  I'm very impressed by
the work everyone has put into this.

My biggest concern is the runaway process that used up 26 GB of memory before I 
killed it.  The molecules that take a very long time to canonicalize are also 
problematic, but with the "timeout" feature, they're not fatal.  We should 
verify that structures like ferrocenes and buckyballs are handled in a 
reasonable amount of time, since they're so common.
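
For anyone who wants to guard a batch run against both problems (the runaway
memory use and the slow canonicalizations), here is a minimal sketch of a
wrapper that runs each structure in a child process with a wall-clock timeout
and an optional memory cap.  It is not how the built-in "timeout" feature
works; it is an external safety net.  The function name run_with_limits, its
parameters, and the stand-in command in the demo are all hypothetical, and
the memory cap assumes a Unix system where the resource module is available.

```python
import subprocess
import sys

try:
    import resource  # Unix-only; used to cap the child's address space
except ImportError:
    resource = None


def run_with_limits(cmd, input_text, timeout_s=30.0, max_bytes=None):
    """Run cmd with input_text on stdin; return (status, stdout).

    status is "ok", "error" (non-zero exit), or "timeout".
    The name and parameters are illustrative, not part of any library.
    """
    def limit_memory():
        # Runs in the child just before exec; caps its virtual memory so a
        # runaway process fails instead of consuming the whole machine.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    preexec = limit_memory if (max_bytes and resource) else None
    try:
        proc = subprocess.run(
            cmd, input=input_text, capture_output=True, text=True,
            timeout=timeout_s, preexec_fn=preexec,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timeout", ""
    return ("ok" if proc.returncode == 0 else "error"), proc.stdout


if __name__ == "__main__":
    # Stand-in command that just echoes stdin; replace it with whatever
    # converter invocation you are actually testing.
    echo = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"]
    for smiles in ["c1ccccc1", "CCO"]:
        status, out = run_with_limits(echo, smiles + "\n", timeout_s=10)
        print(smiles, status, out.strip())
```

Structures that come back as "timeout" or "error" can be written to a file
for later inspection, so one bad molecule doesn't stall or kill the whole run.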

> With some of the smaller subsets we've been testing, it's easy to
> fix the bug, verify, etc. With your large DB, it seems like we
> should allow for longer lead-time between "iterations" and perhaps
> concentrate on other pre-release tasks.

I'm happy to keep chugging away on the data, but my suspicion is that the four 
classes of bugs I've encountered so far are all we'll find.  There's a chance 
that another bug will pop up in the next 4 million compounds, but my money is 
on our having already identified all of the important ones.

If anyone wants a big test set, you can download the eMolecules SMILES or SDF 
files yourself:

   http://www.emolecules.com/doc/plus/download-database.php

It's about six months old, but still very good data.

PubChem is also a useful test case.  It's much larger but tends to have a lot 
of redundancy because they include things like ZINC (which has multiple 
automatically-generated stereoisomers for each of the input molecules) as well 
as some "virtual libraries" (from suppliers who say they have in-stock 
compounds but actually have never synthesized them).

Craig
