Hi Markus,

On Fri, Apr 12, 2013 at 9:30 AM, Markus Hartenfeller <[email protected]> wrote:
>
> I wanted to ask if anybody is currently working on an implementation of a
> tautomer enumeration (and canonicalization) in RDKit.

I'm not, but I definitely think it would be interesting.

>
> This JCAMD paper from Sitzmann & Ihlenfeldt from 2010 describes an
> implementation that looks fairly solid at first glance, plus: they
> published the rSMARTS transformations.
>
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/
>
> I admit I haven't yet read it carefully. Did anyone have a look at it
> already? One thing that came to my mind is that there might be a problem
> with differences in the aromaticity detection between CACTVS and RDKit,
> because the transforms partly depend on aromaticity flags. In addition, the
> enumeration at some point would benefit from a solid way of identifying
> generated duplicate molecules, so this might be another building block that
> is missing.

You could just do a canonical SMILES filter for that.

>
> Any other thoughts on why a re-implementation in RDKit might be problematic?

I read the paper when it came out and don't remember seeing any absolute
blockers. When I skimmed it yesterday that impression held: I think it should
be possible.

>
> It will be some work for sure.
>

That's certainly true.

Note: another method for tautomer canonicalization (but not enumeration) is to
convert to InChI and back. This is similar to Noel's "canonical SMILES using
InChI" idea. The approach may be somewhat fragile (I'm not convinced that the
RDKit's InChI->molecule implementation is the best), but it is worth
considering.

-greg
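
A minimal sketch of the two suggestions above (the canonical SMILES duplicate
filter and the InChI round trip), assuming RDKit is installed and was built
with InChI support. The helper names unique_by_key, canonical_smiles_key and
inchi_roundtrip_key are made up for illustration and are not RDKit functions.
Whether the InChI round trip gives a satisfactory canonical tautomer for a
given molecule is not verified here.

```python
from typing import Callable, Hashable, Iterable, List


def unique_by_key(smiles_list: Iterable[str],
                  key: Callable[[str], Hashable]) -> List[str]:
    """Keep the first input for each distinct key, preserving input order."""
    seen = set()
    unique = []
    for smi in smiles_list:
        k = key(smi)
        if k not in seen:
            seen.add(k)
            unique.append(smi)
    return unique


def canonical_smiles_key(smi: str) -> str:
    """Hypothetical helper: canonical SMILES used as the duplicate key."""
    # Imported lazily so unique_by_key itself does not depend on RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smi!r}")
    return Chem.MolToSmiles(mol)  # MolToSmiles is canonical by default


def inchi_roundtrip_key(smi: str) -> str:
    """Hypothetical helper: SMILES -> InChI -> molecule -> canonical SMILES."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smi!r}")
    inchi = Chem.MolToInchi(mol)
    if not inchi:
        raise ValueError(f"InChI generation failed for {smi!r}")
    back = Chem.MolFromInchi(inchi)
    if back is None:
        raise ValueError(f"could not rebuild a molecule from {inchi!r}")
    return Chem.MolToSmiles(back)
```

With RDKit available, unique_by_key(["OCC", "CCO"], canonical_smiles_key)
treats the two inputs as the same molecule and keeps only the first one.
Passing inchi_roundtrip_key instead additionally merges whatever forms the
InChI round trip maps to the same structure.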
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

