Dear all,
Nadine and I have been working on a new canonicalization algorithm for the
RDKit. The algorithm, inspired by an idea from Roger, is more robust,
faster, and deals much better with some odd high-symmetry edge cases. We do
plan to write the new algorithm up, but at the moment the description is
the code. ;-)
We've still got some optimization work to do, but the new algorithm is
already faster than the old one. As an example: generating SMILES for 50K
molecules from the ZINC Natural Products set took 13.1s with the 2014_09_1
release; with the current code it takes 8.3s (using the tcmalloc library
this number comes down to 5.7s).
I merged the new implementation onto the trunk this morning and plan to
have it be part of the next release.
Nadine has done a thorough job of testing, doing round-trips and atom
renumbering tests on millions of molecules with only a few failures that we
feel like we understand (we will do a post on these later). The problem
is definitely not a trivial one, though, so there are almost certainly some
bugs lurking. If you find one, we'd love to hear about it.
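For anyone who wants to run this kind of check on their own molecules, here is a minimal sketch of a round-trip plus atom-renumbering test using the Python wrappers. This is not the actual test harness described above; the helper name `check_canonicalization`, the number of shuffles, and the seed are illustrative assumptions.

```python
import random

from rdkit import Chem


def check_canonicalization(smiles, n_shuffles=10, seed=42):
    """Return True if the canonical SMILES survives a round-trip and
    random atom renumberings (hypothetical helper, not part of the RDKit)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    reference = Chem.MolToSmiles(mol)

    # Round-trip: parsing the canonical SMILES and writing it again
    # should reproduce the same string.
    roundtrip = Chem.MolToSmiles(Chem.MolFromSmiles(reference))
    if roundtrip != reference:
        return False

    # Renumbering: shuffling the input atom order must not change the output.
    rng = random.Random(seed)
    order = list(range(mol.GetNumAtoms()))
    for _ in range(n_shuffles):
        rng.shuffle(order)
        shuffled = Chem.RenumberAtoms(mol, order)
        if Chem.MolToSmiles(shuffled) != reference:
            return False
    return True
```

A molecule for which this returns False would be a candidate for a bug report, ideally together with the input SMILES and the RDKit version used.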
Note that because the atom-ordering algorithm has changed, SMILES generated
with the new code will be different from those generated with the old code.
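One practical consequence is that canonical SMILES strings written by different releases should not be compared directly as text. The sketch below, which assumes both strings can be parsed by the installed version, re-canonicalizes the two inputs with the same code before comparing them. The function name `same_molecule` is a hypothetical helper, not an RDKit API.

```python
from rdkit import Chem


def same_molecule(smiles_a, smiles_b):
    """Compare two SMILES by re-canonicalizing both with the installed RDKit
    (hypothetical helper; avoids comparing strings from different releases)."""
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("could not parse one of the input SMILES")
    # Both strings are regenerated by the same canonicalization code,
    # so the atom ordering used when they were first written no longer matters.
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)
```

Stored canonical SMILES used as database keys would, under the same assumption, need to be regenerated once with the new code.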
Best Regards,
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss