> 1. The "D" or "degree" in SMARTS
>
> [#6D2] is supposed to mean a carbon atom with two bonds. However,
> it treats explicit hydrogens as a bond,

Well, that can be changed. You're the former Daylight guy -- is "D"
supposed to ignore explicit hydrogens?

> For OpenBabel 3, we should clarify the definition of "D" in SMARTS
> so that it either does or does not include bonds to H atoms,
> regardless of whether the H are represented implicitly or
> explicitly in the internal C++ structures.

Sounds like an "Open SMARTS" specification is needed. ;-)

> This is a mess. Because of other flaws in the system, Geoff (I
> think it was Geoff) was forced to add rules like these:
...
> It fixed a bug (an aromatic ring that was missed), but it's
> absurd. It's saying nitrogen with two bonds (single and/or double),
> OR with three bonds (single or double), OR with two single bonds
> (regardless of overall valence), can ALL deliver either one or two
> electrons to the aromatic system!

The problem is the state of formal charges. At the moment, when
aromaticity detection occurs, formal charges may be unknown. That led
us down the current path.

> So I propose to discard the range rules from aromatic.txt, and have
> each SMARTS specify a single number, the specific number of
> electrons contributed by that atom.

That would be great. At the time I was hacking those SMARTS rules, I
wasn't expert enough with SMARTS to define clear, specific patterns.
As you note, they're not trivial.

> I believe (but am not positive) that SMARTS should be sufficient to
> define all potentially aromatic atoms. We shouldn't need any
> special-case code. The SMARTS will define the electron count of
> each atom, then we apply Hueckel's 4n+2 rule, and that's it.

If so, I think we can safely eliminate the code in typer.cpp. The
whole point of having aromatic.txt is that the rules are user-editable
without needing to write code. That's a good goal.

> I can't fix #5 and #6, but luckily, the SDF and SMILES parsers are
> pretty consistent, so for my purposes (cheminformatics), I should be
> able to get a clear idea what H-count and valence actually mean.

I suspect for many other formats (i.e., those with explicit hydrogens
like QM codes), we should be able to standardize easily.

The last remaining task is a sane assignment of formal charges. It
should not require the pH code. I have some in-house code which
handles a variety of "hypervalent" atoms like S and P, so I'll try to
get that ready for trunk. As you said, SDF and SMILES should define
formal charges in the file itself -- nothing is ambiguous.

> My plan is to more-or-less rewrite the aromaticity code in
> typer.cpp from top to bottom, and rewrite all of the rules in
> aromatic.txt. There doesn't seem to be anything worth saving.

I would hope you'd open the code -- e.g., on a GitHub repository or
similar. Whether it gets incorporated into OB-2.3 or used for the
MolCore/OB3 work can be decided later.

-Geoff

_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
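
For illustration only, here is a rough sketch of the scheme discussed in the thread: each ring atom contributes one specific number of electrons, and the ring total is then checked against Hueckel's 4n+2 rule. This is plain Python, not Open Babel code. The function name and the per-atom counts are hypothetical, and the counts are typed in by hand where the proposal would have them come from SMARTS matches in aromatic.txt.

```python
from typing import Iterable


def satisfies_huckel(electron_counts: Iterable[int]) -> bool:
    """Return True if the ring's electron total equals 4n+2 for some n >= 0.

    electron_counts holds one integer per ring atom: the single, specific
    number of electrons that atom contributes (hypothetical input; in the
    proposal these would be assigned by SMARTS rules).
    """
    total = sum(electron_counts)
    # 4n+2 with n >= 0 means total is 2, 6, 10, 14, ...
    return total >= 2 and (total - 2) % 4 == 0


# Hand-assigned contributions for a few textbook rings (assumed values).
benzene = [1, 1, 1, 1, 1, 1]        # six sp2 carbons, one electron each
pyrrole = [1, 1, 1, 1, 2]           # four carbons plus an N-H lone pair
cyclobutadiene = [1, 1, 1, 1]       # four electrons, fails 4n+2
cyclopropenyl_cation = [1, 1, 0]    # the C+ contributes nothing

for name, ring in [
    ("benzene", benzene),
    ("pyrrole", pyrrole),
    ("cyclobutadiene", cyclobutadiene),
    ("cyclopropenyl cation", cyclopropenyl_cation),
]:
    print(name, satisfies_huckel(ring))
```

With one number per atom the ring check is a single sum. With range rules (an atom that may give either one or two electrons), every combination would have to be tried, which is where the ambiguity complained about above comes from.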