More information on this question...

On Wed, May 23, 2012 at 3:05 PM, Craig James <cja...@emolecules.com> wrote:

> Hi,
>
> I've run into a case where one of my algorithms runs something like 10,000
> time slower in 2.3.x than it did in 2.2.x, and running valgrind with the
> "callgrind" tool seems to indicate that aromaticity detection is the
> problem.
>
> Here's the problem: Normally when you generate a SMILES, it detects
> aromaticity and you're done.  But in my case, I'm generating thousands of
> "fragment SMILES" on the same molecule (taking various substructures and
> generating a SMILES for each structure using an OBBitVec of atoms).  It
> appears that each time you ask for a SMILES, it discards the aromaticity
> and starts over, even though the molecule hasn't changed.
>

It looks like the source of the problem is here:

  SMIBaseFormat::WriteMolecule(OBBase* pOb,OBConversion* pConv)
  {
    OBMol* pmol = dynamic_cast<OBMol*>(pOb);
    ....
    OBMol mol = *pmol;

In other words, it makes a fresh new copy of the molecule object every time
you ask for a SMILES.  The operator= method copies most of the molecule's
data, but not aromaticity, so the aromaticity has to be recomputed every
time.  Plus, the time to copy the molecule object is significant.

This turns out to be a horrible performance hit for my algorithm.  I'm
generating fingerprints using fragment SMILES, and each fragment that goes
into the fingerprint requires a canonical "fragment SMILES".  As mentioned
above, for large systems like fullerenes it's about 10,000 times slower.
Even for ordinary molecules it's a huge performance hit.

As far as I can tell, there is only one place where the molecule is
modified: explicit hydrogens are added to chiral atoms to make the analysis
easier.

Is there some reason for this change?  In 2.2.x the molecule wasn't copied.

Thanks,
Craig


>
> I haven't dug into the code yet ... I wanted to check to see if this makes
> sense to anyone.  In 2.2.x this didn't happen, so it's a recent change, and
> I know several OB contributors made significant improvements to
> aromaticity.  My "use case" is unusual so I'm not surprised this problem
> didn't show up in anyone else's app.
>
> It is dramatic with fullerenes -- a C70 molecule takes less than a second
> to analyze in 2.2.x and over five minutes in 2.3.x.
>
> Thanks,
> Craig
>
>
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Reply via email to