On Thu, May 31, 2012 at 7:13 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
> On 30 May 2012 16:29, Craig James <cja...@emolecules.com> wrote:
> > Hi Tim,
> >
> > When I reverted the SMILES canonicalizer (that is,
> > SMIBaseFormat::WriteMolecule()) back so that it doesn't copy the molecule,
> > I solved the performance problem, but it causes the Tautomer Test to fail.
> > I've looked through your tautomer code briefly, but before I spend a great
> > deal of time figuring it all out, I wonder if you can spot the problem
> > quickly since you wrote the tautomer code.
> >
> > My guess is that the SMILES canonicalizer screws up your tautomer code
> > because it adds explicit hydrogens to chiral centers (mostly for
> > convenience, so that there would always be four bonds to every tetrahedral
> > center), which understandably would alter the nature of what the tautomer
> > generator is doing.
> >
> > So the first question: Does the SMILES canonicalizer still need to add
> > explicit hydrogens to the chiral atoms? You rewrote all of the stereo
> > code, and I wonder if the extra hydrogens are even necessary. I looked
> > through the graphsym.cpp and canon.cpp files and couldn't see any need for
> > the explicit hydrogens. I might try disabling the add-explicit-hydrogens
> > feature and see if the tests still pass, but I prefer a more analytical
> > approach.
>
> Another option would be to remember which atoms were added, and remove
> them afterwards. This would solve the problem in the short term.
>
I thought about that, and may try it. But it's probably too big of a
performance hit. On large, highly symmetrical molecules, I'm computing
thousands or tens of thousands of "fragment SMILES," and it would have to
add/remove the H atoms over and over.

And I suppose I'd have to be really careful to "fool" OpenBabel so that
when BeginModify/EndModify are called it doesn't discard the aromaticity
and other calculated information.

On the other hand, the specific molecules that are problematic probably
don't have any tetrahedral centers...
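
To make the "remember which atoms were added, and remove them afterwards"
idea concrete, here is a rough sketch of the bookkeeping. It is only an
illustration under loud assumptions: ToyMol, ScopedHydrogens and
atomsWhileGuarded are made-up names, ToyMol is a toy stand-in and not
OBMol, and nothing here touches the real OpenBabel API. It also sidesteps
the BeginModify/EndModify worry above, since a toy struct has no cached
aromaticity to lose. The point is just that if the hydrogens are always
appended at the end, undoing them is a truncation rather than a search.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a molecule (hypothetical; NOT OpenBabel's OBMol).
struct ToyMol {
    std::vector<int> atoms;                                   // atomic numbers
    std::vector<std::pair<std::size_t, std::size_t>> bonds;   // atom index pairs
};

// RAII guard (hypothetical): appends one explicit H to each listed center
// and restores the original atom and bond counts when it goes out of scope.
class ScopedHydrogens {
public:
    ScopedHydrogens(ToyMol& mol, const std::vector<std::size_t>& centers)
        : mol_(mol),
          atomCount_(mol.atoms.size()),
          bondCount_(mol.bonds.size()) {
        for (std::size_t c : centers) {
            mol_.atoms.push_back(1);                           // hydrogen
            mol_.bonds.emplace_back(c, mol_.atoms.size() - 1); // center-H bond
        }
    }

    // Added atoms and bonds sit at the end, so removal is a truncation.
    ~ScopedHydrogens() {
        mol_.atoms.resize(atomCount_);
        mol_.bonds.resize(bondCount_);
    }

    ScopedHydrogens(const ScopedHydrogens&) = delete;
    ScopedHydrogens& operator=(const ScopedHydrogens&) = delete;

private:
    ToyMol& mol_;
    std::size_t atomCount_;
    std::size_t bondCount_;
};

// Usage: the molecule is only enlarged while the guard is alive.
inline std::size_t atomsWhileGuarded(ToyMol& mol,
                                     const std::vector<std::size_t>& centers) {
    ScopedHydrogens guard(mol, centers);
    assert(mol.atoms.size() == mol.bonds.size() + 1 ||
           mol.atoms.size() >= centers.size());
    return mol.atoms.size();
}
```

Whether the real thing would be this cheap depends on how OBMol stores
atoms and on what gets invalidated when it is modified, which is exactly
the part I'm unsure about.
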
Craig
> > Second question: Is there some reason why the tautomer code gets screwed
> > up by the added explicit hydrogens? Is it something that could be fixed
> > easily?
> >
> > I'm going to dig around an older version of smilesformat.cpp and
> > canon.cpp to refresh my memory about where the explicit hydrogens were
> > needed.
> >
> > Thanks,
> > Craig
> >
> >
> ------------------------------------------------------------------------------
> > Live Security Virtual Conference
> > Exclusive live event will cover all the ways today's security and
> > threat landscape has changed and how IT managers can respond. Discussions
> > will include endpoint security, mobile security and the latest in malware
> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> > _______________________________________________
> > OpenBabel-Devel mailing list
> > OpenBabel-Devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/openbabel-devel
> >
>