OK, I'm getting somewhere now. I've confirmed that there is a problem
with how the current codebase uses 'D' in the SMARTS patterns. For
protonated imidazole (as in histidine in vivo), two different answers
are found depending on whether hydrogens are explicit or not. The ring
stays aromatic in the first case below but comes out kekulized in the
second:

C:\Users\noel>obabel -:Cc1[nH]c[nH+]c1 -osmi
Cc1[nH]c[nH+]c1

C:\Users\noel>obabel -:Cc1[nH]c[nH+]c1 -omol -d | obabel -imol -osmi
CC1=C[NH+]=CN1

My plan is to alter the current matcher such that results on SMILES
from ChEMBL, PubChem and eMolecules are unchanged (it's just minor
tweaks for a few of the charged patterns, e.g. D3 might become D2 and
one H). Personally, I'd also like to remove any "patterns" that aren't
triggered by (aromatic atoms in) molecules in any of these databases,
on the basis that it's better to have a set of patterns that we know
are correct (and all covered by testcases) and exclude the few extra
that may or may not be as intended. However, I won't do this unless
you agree.
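
To make the "D3 might become D2 and one H" idea concrete, here is a
rough sketch in plain Python. It is not Open Babel code: the Atom class
and all function names are made up for illustration. It assumes the fix
is to key on (heavy-atom degree, total H count) rather than on the raw
explicit degree, so the key is the same however the H is stored:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Atom:
    """Toy atom; a hypothetical stand-in, not the Open Babel OBAtom API."""
    element: str
    implicit_h: int = 0  # hydrogens stored only as a count on the atom
    # Explicit graph neighbours (excluded from repr/eq to avoid recursion).
    nbrs: List["Atom"] = field(default_factory=list, repr=False, compare=False)


def bond(a: Atom, b: Atom) -> None:
    """Connect two atoms in the toy graph."""
    a.nbrs.append(b)
    b.nbrs.append(a)


def explicit_degree(atom: Atom) -> int:
    """SMARTS 'D': number of explicit connections, explicit H included."""
    return len(atom.nbrs)


def heavy_degree_and_h(atom: Atom) -> Tuple[int, int]:
    """(heavy-atom neighbours, total H), independent of how H is stored."""
    explicit_h = sum(1 for n in atom.nbrs if n.element == "H")
    return len(atom.nbrs) - explicit_h, atom.implicit_h + explicit_h


def make_nh_plus(explicit_h: bool) -> Atom:
    """The [nH+] of protonated imidazole: two ring carbons plus one H."""
    n = Atom("N", implicit_h=0 if explicit_h else 1)
    bond(n, Atom("C"))
    bond(n, Atom("C"))
    if explicit_h:
        bond(n, Atom("H"))
    return n


implicit_n = make_nh_plus(explicit_h=False)
explicit_n = make_nh_plus(explicit_h=True)

# Raw explicit degree differs between the two representations...
print(explicit_degree(implicit_n), explicit_degree(explicit_n))        # 2 3
# ...but the normalised key is the same for both.
print(heavy_degree_and_h(implicit_n), heavy_degree_and_h(explicit_n))  # (2, 1) (2, 1)
```

A pattern written as D3 only matches the explicit-H form here, whereas
"D2 and one H" in the sense of heavy_degree_and_h matches both.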

Regards,
- Noel


On 30 January 2017 at 22:07, Noel O'Boyle <baoille...@gmail.com> wrote:
> Maybe I'm overthinking this. If it doesn't change the final output (as
> regards aromatic SMILES) on ChEMBL, maybe it's not worth worrying
> about now.
>
> - Noel
>
> On 30 January 2017 at 18:31, Noel O'Boyle <baoille...@gmail.com> wrote:
>> Great. One question I've run into is what the intention of D2 etc.
>> was in the SMARTS patterns. Was it the number of heavy-atom neighbors?
>> As written, it's the number of explicit neighbors in the graph, which
>> is complicated by the fact that OB's SMILES parser currently adds an
>> explicit H for H's inside square brackets, e.g. [CH-]. So if the
>> patterns were developed by testing on SMILES, then the intended
>> D-value is somewhat unclear for patterns that typically match atoms
>> with hydrogens where those hydrogens are written implicitly. Confused?
>> I am too. :-)
>>
>> - Noel
>>
>> On 27 January 2017 at 22:17, Geoffrey Hutchison
>> <ge...@geoffhutchison.net> wrote:
>>> I should mention on that note, that a collaboration with Carnegie Mellon
>>> students produced a parallel implementation of Kekulization using the Eigen3
>>> matrix library. They also wrote a CUDA implementation that was modestly
>>> faster.
>>>
>>> It hasn't been ported back to Open Babel yet, but I'll leave the basic code
>>> (MIT license) here:
>>> https://github.com/NarainKrishnamurthy/chemposer
>>>
>>> Anyone interested should let me know.
>>>
>>> Cheers,
>>> -Geoff
>>>
>>> On Fri, Jan 27, 2017 at 5:13 PM, Geoffrey Hutchison
>>> <ge...@geoffhutchison.net> wrote:
>>>>
>>>> I think it's a great idea. Chris Morley had recommended similar concepts
>>>> in terms of implicit valence.
>>>>
>>>> Yes, many of the stranger SMARTS patterns here are for "dodgy" SMILES that
>>>> should retain aromaticity. It may be possible to add some level of "if
>>>> it was initially flagged as an aromatic atom, be more lenient" rules in the
>>>> code.
>>>>
>>>> I'd like to continue the concept of an annual release, so in the meantime,
>>>> I think experiments are welcome.
>>>>
>>>> -Geoff
>>>>
>>>> On Fri, Jan 27, 2017 at 3:03 AM, Noel O'Boyle <baoille...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi there,
>>>>>
>>>>> Here's a heads-up on some work I've been prototyping.
>>>>>
>>>>> The aromatic atom typer currently uses SMARTS patterns in aromatic.txt
>>>>> to assign max/min values of pi electrons. A more efficient approach is
>>>>> to simultaneously match against all the SMARTS patterns rather than
>>>>> one at a time, and well, to avoid using SMARTS at all.
>>>>>
>>>>> I've attached a Python prototype that shows the general idea - see the
>>>>> function getMinMax (the calls to IsAromatic will have to be removed,
>>>>> but are unavoidable here; the "elif"s will become a switch statement; I
>>>>> need to think some more about explicit hydrogens). To my mind, the use
>>>>> of a direct lookup is as clear, if not clearer, than using SMARTS
>>>>> patterns.
>>>>>
>>>>> I note that the existing tests don't hit all of the patterns, and
>>>>> while I can find molecules in ChEMBL that hit almost all of the
>>>>> patterns, I'm not sure whether I can find ones where the corresponding
>>>>> atom turns out to be aromatic in the end. I have a feeling this is
>>>>> because the patterns were added in response to dodgy SMILES (e.g.
>>>>> using n instead of [nH]) which were reported or found by Geoff.
>>>>>
>>>>> Regards,
>>>>> - Noel
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Check out the vibrant tech community on one of the world's most
>>>>> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>>>>> _______________________________________________
>>>>> OpenBabel-Devel mailing list
>>>>> OpenBabel-Devel@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>>>>>
>>>>
>>>
