On 10/5/10 10:56 AM, Geoffrey Hutchison wrote:
> That's good! Maybe we should let you generate a file of errors
> over the next day or two(?), post it, and then make every effort
> to fix them?

The truth is, just a dozen bugs out of over a million structures is
quite remarkable. Naturally we all want a zero error rate, but this is
a massive improvement over the previous release. I'm very impressed by
the work everyone has put into this.

My biggest concern is the runaway process that used up 26 GB of memory
before I killed it. The molecules that take a very long time to
canonicalize are also problematic, but with the "timeout" feature,
they're not deadly. We should verify that structures like ferrocenes
and buckyballs are handled in a reasonable amount of time, since
they're so common.

> With some of the smaller subsets we've been testing, it's easy to
> fix the bug, verify, etc. With your large DB, it seems like we
> should allow for longer lead-time between "iterations" and perhaps
> concentrate on other pre-release tasks.

I'm happy to keep chugging away on the data, but my suspicion is that
the four classes of bugs I've encountered so far are all we'll find.
There's a chance that another bug will pop up in the next 4 million
compounds, but my money is on our having identified all of the
important bugs.

If anyone wants a big test set, you can download the eMolecules SMILES
or SDF files yourself:

  http://www.emolecules.com/doc/plus/download-database.php

It's about six months old, but still very good data.

PubChem is also a useful test case. It's much larger but tends to have
a lot of redundancy because it includes things like ZINC (which has
multiple automatically generated stereoisomers for each of the input
molecules) as well as some "virtual libraries" (from suppliers who say
they have in-stock compounds but have never actually synthesized
them).

Craig

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel