Re: [OpenBabel-Devel] Aromaticity detection recomputed unnecessarily?

Craig James Thu, 24 May 2012 07:45:49 -0700

On Thu, May 24, 2012 at 6:03 AM, Noel O'Boyle <baoille...@gmail.com> wrote:


> Looking at "blame" online, it seems that Tim made the change to allow
> ferrocene structures to be normalised for the eMolecule
> canonicalisation tests:
>
>
> http://openbabel.svn.sourceforge.net/viewvc/openbabel/openbabel/trunk/src/formats/smilesformat.cpp?r1=4033&r2=4034&;
>
> IMO, structure normalisations like this are probably best done as an
> op (like the normalisation of dative bonds is currently) rather than
> inline in the smilesformat.
>

Interesting.  Even more interesting is that this ferrocene code was taken
out at some point.


http://openbabel.svn.sourceforge.net/viewvc/openbabel/openbabel/trunk/src/formats/smilesformat.cpp?r1=4034&r2=4758&;

I have a vague recollection of a discussion about the wisdom of doing this
in OpenBabel versus adding either normalization plug-ins or letting the
application decide how to do it, but I could be wrong.

I'll look in detail at the rest of the changes to smilesformat.cpp and see
if I can spot any reason for copying the molecule.  Since there's only one
BeginModify/EndModify anywhere in the file, and all it does is add
hydrogens, I suspect it will be safe to remove the copying operation.

In fact, whether or not it's safe, I have to do it.  It's dramatically
slower for fullerenes, but even for ordinary molecules it slows me down by
a factor of 5 to 100.

I should probably devise a test application to add to OpenBabel so that
this doesn't happen again.  Is there some way to add a test to the system
where success is defined by performance rather than correct output?

In fact, I tested it by removing the copy operation, and it improved the
speed dramatically ... the roughly 10,000 slowdown disappeared completely
on analyzing fullerenes.

Craig

>
> I haven't checked whether there are other modifications to the
> structure, but it would be easy to test with a before and after
> comparison.
>
> - Noel
>
> On 23 May 2012 23:50, Craig James <cja...@emolecules.com> wrote:
> > More information on this question...
> >
> > On Wed, May 23, 2012 at 3:05 PM, Craig James <cja...@emolecules.com>
> wrote:
> >>
> >> Hi,
> >>
> >> I've run into a case where one of my algorithms runs something like
> 10,000
> >> time slower in 2.3.x than it did in 2.2.x, and running valgrind with the
> >> "callgrind" tool seems to indicate that aromaticity detection is the
> >> problem.
> >>
> >> Here's the problem: Normally when you generate a SMILES, it detects
> >> aromaticity and you're done.  But in my case, I'm generating thousands
> of
> >> "fragment SMILES" on the same molecule (taking various substructures and
> >> generating a SMILES for each structure using an OBBitVec of atoms).  It
> >> appears that each time you ask for a SMILES, it discards the
> aromaticity and
> >> starts over, even though the molecule hasn't changed.
> >
> >
> > It looks like the source of the problem is here:
> >
> >   SMIBaseFormat::WriteMolecule(OBBase* pOb,OBConversion* pConv)
> >   {
> >     OBMol* pmol = dynamic_cast<OBMol*>(pOb);
> >     ....
> >     OBMol mol = *pmol;
> >
> > In other words, it makes a fresh new copy of the molecule object every
> time
> > you ask for a SMILES.  The operator= method copies most of the molecule's
> > data, but not aromaticity, so the aromaticity has to be recomputed every
> > time.  Plus, the time to copy the molecule object is significant.
> >
> > This turns out to be a horrible performance hit for my algorithm.  I'm
> > generating fingerprints using fragment SMILES, and each fragment that
> goes
> > into the fingerprint requires a canonical "fragment SMILES".  As
> mentioned
> > above, for large systems like fullerenes it's about 10,000 times slower.
> > Even for ordinary molecules it's a huge performance hit.
> >
> > As far as I can tell, there is only one place where the molecule is
> > modified: explicit hydrogens are added to chiral atoms to make the
> analysis
> > easier.
> >
> > Is there some reason for this change?  In 2.2.x the molecule wasn't
> copied.
> >
> > Thanks,
> > Craig
> >
> >>
> >>
> >> I haven't dug into the code yet ... I wanted to check to see if this
> makes
> >> sense to anyone.  In 2.2.x this didn't happen, so it's a recent change,
> and
> >> I know several OB contributors made significant improvements to
> >> aromaticity.  My "use case" is unusual so I'm not surprised this problem
> >> didn't show up in anyone else's app.
> >>
> >> It is dramatic with fullerenes -- a C70 molecule takes less than a
> second
> >> to analyze in 2.2.x and over five minutes in 2.3.x.
> >>
> >> Thanks,
> >> Craig
> >>
> >
> >
> >
> ------------------------------------------------------------------------------
> > Live Security Virtual Conference
> > Exclusive live event will cover all the ways today's security and
> > threat landscape has changed and how IT managers can respond. Discussions
> > will include endpoint security, mobile security and the latest in malware
> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> > _______________________________________________
> > OpenBabel-Devel mailing list
> > OpenBabel-Devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/openbabel-devel
> >
>

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Re: [OpenBabel-Devel] Aromaticity detection recomputed unnecessarily?

Reply via email to