First of all, here are the test cases: 428 molecules from ChEMBL (sent off
list in a separate email). They are in a tab-separated file, one line per
molecule. The first column is the canonical SMILES. The remaining 10
columns are "anti-canonical" SMILES. When the "anti-canonical" SMILES are
read in and converted to canonical SMILES, aromaticity is lost in at least
one of the 10 cases.
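
A minimal sketch of how a file in that layout could be checked. It assumes
only what is described above (tab-separated, canonical SMILES first, then
the "anti-canonical" variants). The function name and the `canonicalize`
callback are hypothetical, and the toy canonicalizer is a stand-in for a
real toolkit call, not Open Babel's behaviour.

```python
def find_canonicalisation_failures(tsv_text, canonicalize):
    """Return (line_no, column_no, input_smiles, output_smiles) per mismatch.

    tsv_text     -- contents of the tab-separated test file
    canonicalize -- hypothetical callback: SMILES string -> canonical SMILES
    """
    failures = []
    for line_no, line in enumerate(tsv_text.splitlines(), start=1):
        fields = line.strip().split("\t")
        if not fields or not fields[0]:
            continue  # skip blank lines
        expected, variants = fields[0], fields[1:]
        for column_no, variant in enumerate(variants, start=1):
            result = canonicalize(variant)
            if result != expected:
                failures.append((line_no, column_no, variant, result))
    return failures


# Toy stand-in: pretends the Kekule form of benzene is never re-aromatized.
def toy_canonicalize(smiles):
    return smiles


demo = "c1ccccc1\tC1=CC=CC=C1\tc1ccccc1\n"
for failure in find_canonicalisation_failures(demo, toy_canonicalize):
    print(failure)
```

With Pybel, the callback could plausibly be something like
`lambda s: pybel.readstring("smi", s).write("can").split()[0]`, but that is
an assumption about how the test would be wired up, not the script used to
produce the attached file.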

Craig, I am with you on this. Whatever needs to be done, I will help with.
There has been a recent discussion on the mailing list about this.

Regarding the Kekule form... I'm not sure how to get this right now. It's
not really exposed to the user. Geoff, any thoughts? I could add some code
at the SMILES writer end if you can expose it somehow during the
Kekulization. There have been some requests for this recently, and in the
past also.

- Noel

On 17 April 2012 16:56, Craig James <cja...@emolecules.com> wrote:

> On Tue, Apr 17, 2012 at 7:56 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
> > Well, Geoff, if you're going to be working on this I've recently been
> > subjecting ChEMBL to some canonicalisation tests, and can supply a good
> > few more test cases. I'll wrap them up and email them to you off list
> > tomorrow.
>
> Would it be possible to get all of these cases -- both the Kekule'
> form and the "correct" canonicalized aromatic versions of them?
>
> I more-or-less threw up my hands on getting the OpenSMILES definition
> of aromaticity nailed down, due in part to the fact that I'm not a
> chemist and I was never able to spark a conversation to resolve it.
> Now I'm thinking that if we have a comprehensive set of example
> structures, it might be possible to do it by spelling out a few fairly
> simple rules, and then enumerate a set of exceptions or special cases.
>  Or maybe given a complete set of examples, someone can actually
> create a set of rules that handles every case.
>
> Craig
>
> >
> > - Noel
> >
> > On 17 April 2012 15:11, Geoffrey Hutchison <ge...@geoffhutchison.net> wrote:
> >>
> >> > Let me start with a little more background on the problem. I am using
> >> > Pybel to extract the information I need about a set of ~875 PAH
> >> > molecules (including alkyl substituted and radical PAHs).
> >> ...
> >> > "signature" of an error is typically that a C atom is labelled as sp3
> >> > hybridized when it only has three atoms attached. (I have since
> >> > learned that I can correct the labeling of one of the molecules by
> >> > reordering the C atoms.)
> >>
> >> Quick question -- can we turn this data set into a unit test to
> >> distribute with Open Babel? I wrote up a few fused aromatics into one of
> >> the tests, and we've added through bug reports. But this is definitely
> >> the most systematic torture test of Kekulization that I've seen.
> >>
> >> > I have worked quite a bit with two of the molecules, azulene and
> >> > 2175908. I have tried to reorder the atoms, convert to 2d, create a
> >> > mol file using openbabel, remove hydrogens and then convert to 2d, etc.
> >> > None of these things has helped. However, when I create the same
> >> > molecule in ChemDraw, openbabel does label the aromaticity correctly.
> >>
> >> Right. The problem with XYZ format is that Open Babel has to work out
> >> all the bond orders from scratch, while in ChemDraw, it just has to
> >> detect that it's an aromatic system.
> >>
> >> As Noel can tell you, we've worked through plenty of rare, subtle Kekule
> >> bugs across versions, so this will definitely help us stomp out more of
> >> them.
> >>
> >> If no one else goes for it, I should have some time on Thursday to sift
> >> through the code and fix this.
> >>
> >> Thanks,
> >> -Geoff
> >
> >
> >
> >
> ------------------------------------------------------------------------------
> > Better than sec? Nothing is better than sec when it comes to
> > monitoring Big Data applications. Try Boundary one-second
> > resolution app monitoring today. Free.
> > http://p.sf.net/sfu/Boundary-dev2dev
> > _______________________________________________
> > OpenBabel-discuss mailing list
> > openbabel-disc...@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
> >
>
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
