> 1. The "D" or "degree" in SMARTS
>
> [#6D2] is supposed to mean a carbon atom with two bonds.  However,
> it treats explicit hydrogens as a bond,

Well, that can be changed. You're the former Daylight guy -- is "D"  
supposed to ignore explicit hydrogens?

> For OpenBabel 3, we should clarify the definition of "D" in SMARTS  
> so that it either does or does not include bonds to H atoms,  
> regardless of whether the H are represented implicitly or
> explicitly in the internal C++ structures.

Sounds like an "Open SMARTS" specification is needed. ;-)
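
To make the ambiguity concrete, here is a toy sketch of the candidate definitions of "D". It is not OpenBabel code: the Atom class and the three degree_* functions are names I made up for illustration. The same CH2 carbon is stored once with implicit hydrogens and once with explicit ones.

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    # Hypothetical minimal atom record, not the OpenBabel OBAtom class.
    element: str
    neighbors: list = field(default_factory=list)  # indices of explicitly stored neighbors
    implicit_h: int = 0                            # hydrogens not stored as atoms

def degree_explicit(atoms, i):
    """Count every explicitly stored connection, including explicit H atoms."""
    return len(atoms[i].neighbors)

def degree_heavy(atoms, i):
    """Count only connections to non-hydrogen atoms."""
    return sum(1 for j in atoms[i].neighbors if atoms[j].element != "H")

def degree_total(atoms, i):
    """Count heavy neighbors plus all hydrogens, explicit or implicit."""
    return len(atoms[i].neighbors) + atoms[i].implicit_h

# Propane, central carbon at index 1, hydrogens implicit.
implicit_form = [
    Atom("C", [1], 3),
    Atom("C", [0, 2], 2),
    Atom("C", [1], 3),
]

# Same molecule, but the two hydrogens on the central carbon are real atoms.
explicit_form = [
    Atom("C", [1], 3),
    Atom("C", [0, 2, 3, 4], 0),
    Atom("C", [1], 3),
    Atom("H", [1]),
    Atom("H", [1]),
]

for name, mol in (("implicit", implicit_form), ("explicit", explicit_form)):
    print(name, degree_explicit(mol, 1), degree_heavy(mol, 1), degree_total(mol, 1))
```

Only the first definition changes its answer with the internal representation (2 versus 4); the other two give the same number either way, which is the property the proposal asks for.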

> This is a mess.  Because of other flaws in the system, Geoff (I  
> think it was Geoff) was forced to add rules like these:
...
> It fixed a bug (an aromatic ring that was missed), but it is
> absurd.  It's saying nitrogen with two bonds (single and/or double),  
> OR with three bonds (single or double), OR with two single bonds  
> (regardless of overall valence), can ALL deliver either one or two  
> electrons to the aromatic system!

The problem is the state of formal charges. At the moment, when  
aromaticity detection occurs, formal charges may be unknown. That led  
us down the current path.

> So I propose to discard the range rules from aromatic.txt, and have  
> each SMARTS specify a single number, the specific number of  
> electrons contributed by that atom.

That would be great. At the time I was hacking those SMARTS rules, I  
wasn't expert enough with SMARTS to define clear, specific patterns.  
As you note, they're not trivial.

> I believe (but am not positive) that SMARTS should be sufficient to  
> define all potentially aromatic atoms.  We shouldn't need any  
> special-case code.  The SMARTS will define the electron count of  
> each atom, then we apply Hueckel's 4n+2 rule, and that's it.

If so, I think we can safely eliminate the code in typer.cpp. The  
whole point of having aromatic.txt is that the rules are user-editable  
without needing to write code. That's a good goal.
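
As a rough sketch of how little code that would leave, assuming each ring atom already has a single electron count assigned by its SMARTS rule: the function name and the example counts below are illustrative only, not what typer.cpp or aromatic.txt currently contain.

```python
def satisfies_huckel(electron_counts):
    """Return True if the ring's total pi-electron count equals 4n+2 for some n >= 0."""
    total = sum(electron_counts)
    return total >= 2 and (total - 2) % 4 == 0

# Hypothetical per-atom contributions, one number per ring atom.
benzene = [1, 1, 1, 1, 1, 1]    # six carbons, one electron each
pyrrole = [2, 1, 1, 1, 1]       # N-H nitrogen donates its lone pair
pyridine = [1, 1, 1, 1, 1, 1]   # ring nitrogen contributes one electron
cyclobutadiene = [1, 1, 1, 1]   # four electrons, fails 4n+2

for name, ring in (("benzene", benzene), ("pyrrole", pyrrole),
                   ("pyridine", pyridine), ("cyclobutadiene", cyclobutadiene)):
    print(name, sum(ring), satisfies_huckel(ring))
```

With one number per atom, the check is a sum and a modulus. The range rules are what force a search over combinations.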

> I can't fix #5 and #6, but luckily, the SDF and SMILES parsers are  
> pretty consistent, so for my purposes (cheminformatics), I should be  
> able to get a clear idea what H-count and valence actually mean.

I suspect for many other formats (i.e., those with explicit hydrogens  
like QM codes), we should be able to standardize easily. The last  
remaining task is a sane assignment of formal charges. It should not  
require the pH code. I have some in-house code which handles a variety  
of "hypervalent" atoms like S and P, so I'll try to get that ready for  
trunk.

As you said, SDF and SMILES should define formal charges in the file  
itself -- nothing is ambiguous.

>  My plan is to more-or-less rewrite the aromaticity code in  
> typer.cpp from top to bottom, and rewrite all of the rules in  
> aromatic.txt.  There doesn't seem to be anything worth saving.

I would hope you'd open the code -- e.g., on a GitHub repository or  
similar. Whether it gets incorporated into OB-2.3 or used for the  
MolCore/OB3 work can be decided later.

-Geoff

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
