On Tue, Apr 17, 2012 at 7:56 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
> Well, Geoff, if you're going to be working on this I've recently been
> subjecting ChEMBL to some canonicalisation tests, and can supply a good few
> more test cases. I'll wrap them up and email them to you off list tomorrow.

Would it be possible to get all of these cases -- both the Kekulé form and
the "correct" canonicalized aromatic versions of them?

I more-or-less threw up my hands on getting the OpenSMILES definition of
aromaticity nailed down, due in part to the fact that I'm not a chemist and
I was never able to spark a conversation to resolve it. Now I'm thinking
that if we have a comprehensive set of example structures, it might be
possible to do it by spelling out a few fairly simple rules and then
enumerating a set of exceptions or special cases. Or maybe, given a complete
set of examples, someone can actually create a set of rules that handles
every case.

Craig

>
> - Noel
>
> On 17 April 2012 15:11, Geoffrey Hutchison <ge...@geoffhutchison.net> wrote:
>>
>> > Let me start with a little more background on the problem. I am using
>> > Pybel to extract the information I need about a set of ~875 PAH molecules
>> > (including alkyl substituted and radical PAHs).
>> ...
>> > "signature" of an error is typically that a C atom is labelled as sp3
>> > hybridized when it only has three atoms attached. (I have since learned
>> > that I can correct the labeling of one of the molecules by reordering
>> > the C atoms.)
>>
>> Quick question -- can we turn this data set into a unit test to distribute
>> with Open Babel? I wrote up a few fused aromatics into one of the tests, and
>> we've added through bug reports. But this is definitely the most systematic
>> torture test of Kekulization that I've seen.
>>
>> > I have worked quite a bit with two of the molecules, azulene and
>> > 2175908. I have tried to reorder the atoms, convert to 2d, create a mol
>> > file using openbabel, remove hydrogens and then convert to 2d, etc. None
>> > of these things has helped. However, when I create the same molecule in
>> > ChemDraw, openbabel does label the aromaticity correctly.
>>
>> Right.
>> The problem with XYZ format is that Open Babel has to work out all
>> the bond orders from scratch, while in ChemDraw, it just has to detect that
>> it's an aromatic system.
>>
>> As Noel can tell you, we've worked through plenty of rare, subtle Kekule
>> bugs across versions, so this will definitely help us stomp out more of
>> them.
>>
>> If no one else goes for it, I should have some time on Thursday to sift
>> through the code and fix this.
>>
>> Thanks,
>> -Geoff
>
>
>
> ------------------------------------------------------------------------------
> Better than sec? Nothing is better than sec when it comes to
> monitoring Big Data applications. Try Boundary one-second
> resolution app monitoring today. Free.
> http://p.sf.net/sfu/Boundary-dev2dev
> _______________________________________________
> OpenBabel-discuss mailing list
> OpenBabel-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>