[OpenBabel-Devel] Refining openbabel's aromatic atom types

2014-02-02 Thread ro...@nextmovesoftware.com
Hi Noel and Geoff,

I've been investigating some of the weird SMILES strings distributed by eMolecules that can't be read into other cheminformatics packages. A significant fraction appear to be molecules with nonsense formal charges on aromatic atoms, which then fail to be Kekulized given the mismatched valence states. Two examples include:

c1ccc2c(c1)[n+2](c(CO)c(CO)[n+]2[O-])O

and

Fc1c(F)c(F)[c+7](c(c1F)F)[Ti]1234([C+6]5C=CC=C5)([C+6]5[C+7]3=[C+7]2[C+7]1=[C+7]45)[c+7]1c(F)c(F)c(c(c1F)F)F 44386258

The second example is complete nonsense because, as explained in my "can't break the laws of physics" blog post, a carbon can't possibly have a formal charge of +7 with only six protons. Given this brokenness, these atoms shouldn't be getting marked as aromatic.

Digging deeper into where these atoms may be getting classified as aromatic led me to OpenBabel's aromatic.txt, and indeed the current trunk version of OpenBabel will blindly transform "c1ccc2c(c1)[N+](=C(C(=[N+2]2O)CO)CO)[O-]" into the first string above.

The problem appears to be that the current SMARTS patterns in aromatic.txt are too forgiving, allowing any formal charge to be accepted as aromatic. I suspect the pattern's author may have assumed that, as in SMILES, not specifying a formal charge implies no charge. Indeed, the OpenSMILES specification implicitly repeats this, by listing the SMILES but not the SMARTS.

The attached patch resolves the issue by tightening these SMARTS patterns. The relevant ideology is "first do no harm": a ring system shouldn't be considered aromatic unless we can be certain we can correctly Kekulize it back at some point in the future. "[n+2]", if it did exist and was allowed (it isn't in Daylight SMILES, cf. daycgi/depict), should be isoelectronic with boron: trivalent, potentially aromatic, but contributing zero pi-electrons.

I've also noticed that "genheaders.sh" hasn't been run since some of the most recent changes to the data/*.txt files, meaning some of the data/*.h files are out of sync.
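As a rough illustration of the "laws of physics" argument and the boron comparison above, here is a minimal Python sketch. It assumes the simplest possible model, where an atom's electron count is its proton count minus its formal charge; the helper names and the small element table are hypothetical and are not part of Open Babel.

```python
# Hypothetical helpers for illustration only; not part of Open Babel.
# Symbols for the first ten elements, keyed by atomic number (= electron count when neutral).
SYMBOLS = {1: "H", 2: "He", 3: "Li", 4: "Be", 5: "B",
           6: "C", 7: "N", 8: "O", 9: "F", 10: "Ne"}


def electron_count(atomic_number: int, formal_charge: int) -> int:
    """Electrons left on an atom: protons minus formal charge."""
    return atomic_number - formal_charge


def charge_is_physical(atomic_number: int, formal_charge: int) -> bool:
    """An atom cannot give away more electrons than it has protons."""
    return electron_count(atomic_number, formal_charge) >= 0


def isoelectronic_neutral(atomic_number: int, formal_charge: int) -> str:
    """Symbol of the neutral element with the same electron count ("?" if unknown)."""
    return SYMBOLS.get(electron_count(atomic_number, formal_charge), "?")


print(charge_is_physical(6, 7))     # [c+7]: carbon has only six protons
print(isoelectronic_neutral(7, 2))  # [n+2]: five electrons, like neutral boron
```

Note that even [C+6] only passes this check as a bare nucleus with zero electrons, so it has nothing left with which to form the bonds drawn in the second example.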
If this proposed patch gets accepted, running genheader.sh to regenerate aromatic.h would also address this.

Please let me know what you think.

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, Cambridge CB4 0EY


##############################################################################
#
# Open Babel file: aromatic.txt
#
# Copyright (c) 1998-2001 by OpenEye Scientific Software, Inc.
# Some portions Copyright (c) 2001-2005 Geoffrey R. Hutchison
# Part of the Open Babel package, under the GNU General Public License (GPL)
#
# SMARTS patterns with minimum and maximum pi-electrons contributed to an
#   aromatic system (used by typer.cpp:OBAromaticTyper)
# The LAST PATTERN MATCHED is used to assign values, so that patterns should
#   be ordered from more general to more specific
#
##############################################################################

#PATTERN   MIN MAX

#carbon patterns
[#6rD2+0]   1   1
# exo ketone or alcohol -- don't know which
[#6rD3+0]~!@[#8]   0   1
[#6rD2+,#6rD3+] 1   1
[#6r+0]=@*  1   1
[#6rD3+0]=!@*   1   1
# external double bonds to hetero atoms contribute no electrons to the 
# aromatic systems -- quinoid systems are non-aromatic, e.g. 1,4-benzoquinone
[#6rD3+0]=!@[!#6]   0   0
[#6rD3-]   2   2

#nitrogen patterns
[#7rD2+0]   1   2
[#7rD3+0]   1   2
[#7r+0](-@*)-@* 1   2
[#7rD2+0]=@*   1   1
[#7rD3+]   1   1
[#7rD3+0]=O 1   1
[#7rD2-]   2   2

#oxygen patterns
[#8r+0] 2   2
[#8r+]  1   1

#sulfur patterns
[#16rD2+0]  2   2
[#16rD2+]   1   1
[#16rD3+0]=!@O  2   2

#other misc patterns
# Accounts Chem Res 1978 11 p. 153
# phosphole, phosphabenzene (not v. aromatic)
[#15rD3+0]  2   2
# selenophene
[#34rD2+0]  2   2
# arsabenzene, etc. (*really* not v. aromatic)
#[#33rD3+0] 2   2
# tellurophene, etc. (*really* not v. aromatic)
#[#52rD2+0] 2   2
# stibabenzene, etc. (very little aromatic character)
#[#51rD3+0] 2   2
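To make the "LAST PATTERN MATCHED" rule in the header concrete, here is a minimal Python sketch of reading PATTERN MIN MAX lines and picking the range from the last matching pattern. SMARTS matching itself is not implemented: the `matched` flags stand in for whatever a real SMARTS matcher would report, and the function names are hypothetical, not Open Babel's.

```python
from typing import List, Optional, Sequence, Tuple

# (SMARTS pattern, min pi-electrons, max pi-electrons)
Entry = Tuple[str, int, int]


def parse_aromatic_table(text: str) -> List[Entry]:
    """Parse 'PATTERN MIN MAX' lines, skipping blank lines and '#' comments."""
    entries: List[Entry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # patterns such as [#6...] start with '[', so they are kept
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 'PATTERN MIN MAX', got {line!r}")
        pattern, lo, hi = fields
        entries.append((pattern, int(lo), int(hi)))
    return entries


def assign_pi_range(entries: Sequence[Entry],
                    matched: Sequence[bool]) -> Optional[Tuple[int, int]]:
    """Return (min, max) from the LAST entry whose pattern matched, or None."""
    result: Optional[Tuple[int, int]] = None
    for (_pattern, lo, hi), hit in zip(entries, matched):
        if hit:
            result = (lo, hi)  # later matches overwrite earlier ones
    return result


TABLE = """
#carbon patterns
[#6rD3+0]=!@*   1   1
# external double bonds to hetero atoms contribute no electrons
[#6rD3+0]=!@[!#6]   0   0
"""

entries = parse_aromatic_table(TABLE)
# A ring carbon with an exocyclic C=O matches both patterns; the later one wins.
print(assign_pi_range(entries, [True, True]))   # (0, 0)
print(assign_pi_range(entries, [True, False]))  # (1, 1)
```

Because the columns are whitespace-separated, a line whose whitespace has been lost, e.g. `[#6rD3+0]~!@[#8]0   1`, would be rejected by this parser rather than silently misread.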


aromatic.txt.patch
-

Re: [OpenBabel-Devel] Refining openbabel's aromatic atom types

2014-02-03 Thread ro...@nextmovesoftware.com

Hi Geoff,

On 3 Feb 2014, at 05:23, Geoffrey Hutchison  wrote:
>> to be molecules with nonsense formal charges on aromatic atoms, which then
>> fail to be Kekulized given the mismatched valence states.
> 
> OK, I have an interesting test case from Noel's python testkekulize.py script.
> (Sadly, my laptop wasn't set to build the python bindings and so this is 
> currently broken.)
> 
> C1=[N+]C=Nc2[nH]cnc12
> vs.
> c1[n+]cnc2[nH]cnc12
> 
> Now, I'm not entirely sure why there's a positive charge on that nitrogen.
> Let's ignore that for a second; assume there's some chemical process that
> makes it N+. If we're including two explicit double bonds in that
> six-membered ring, plus a double bond from the imidazole ring, that makes
> three double bonds = 6 pi electrons.
>
> So I'd probably mark that ring as aromatic on an exam. That suggests that a
> pyridinium n+ is OK, and contributes 1 pi electron.
> 
> For example, if I add an H, Daylight depict is quite happy with this as 
> aromatic:
> c1[n+H]cnc2[nH]cnc12
> 
> And as you know, the two examples you gave are rejected completely by 
> Daylight SMILES.
> 
> Thoughts?

The presence of the hydrogen on the pyridinium nitrogen is critical for aromaticity. Atoms with non-normal valences, such as radicals, are not usually considered aromatic in SMILES. The list of aromatic atom types in the OpenSMILES spec doesn't (shouldn't) contain any radicals, but does explicitly allow (mono)cationic and (mono)anionic nitrogen.

I think the mistake lies in "Let's ignore that for a second". Consider benzene. Then let's strip a pi electron from the ring, resulting in [CH+]1=CC=CC=C1; this isn't aromatic as we now have only 5 pi-electrons, i.e. this shouldn't be [cH+]1c1. The exact same thing is happening in Noel's example. Start with 4n+2 pyridine, n1c1, and lose a (pi) electron: this gives [N+]1=CC=CC=C1, not [n+]1c1.

http://www.daylight.com/daycgi/depict?5b4e2b5d313d43433d43433d4331

The same happens if the electron is lost from one of the carbons, N1=CC=C[CH+]=C1. For Noel's example above, OpenBabel now matches Daylight's (and OpenEye's) behaviour.
http://www.daylight.com/daycgi/depict?43313d5b4e2b5d433d4e63325b6e485d636e633132
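The electron counting in this exchange comes down to Hückel's 4n+2 rule. The minimal Python sketch below just encodes the arithmetic used in the thread (two pi electrons per ring double bond, minus any electron stripped from the ring); it does not model where the removed electron actually sits, and the function names are hypothetical.

```python
def pi_electrons(double_bonds: int, electrons_removed: int = 0) -> int:
    """Two pi electrons per ring double bond, minus any electrons stripped off."""
    return 2 * double_bonds - electrons_removed


def satisfies_huckel(count: int) -> bool:
    """True when count == 4n + 2 for some integer n >= 0."""
    return count >= 2 and (count - 2) % 4 == 0


benzene = pi_electrons(3)                              # 6
benzene_cation = pi_electrons(3, electrons_removed=1)  # 5, as in [CH+]1=CC=CC=C1
print(satisfies_huckel(benzene))         # True  (n = 1)
print(satisfies_huckel(benzene_cation))  # False (5 is not 4n + 2)
```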

If OB is trying to count electrons, clearly these SMARTS patterns need to match formal charge.
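To see why charge-blind patterns matter, consider a simplified model of how per-atom MIN/MAX values could feed a ring test. This Python sketch assumes a ring is accepted when some 4n+2 count lies between the summed minima and summed maxima. That is an assumption about OBAromaticTyper, not a transcription of typer.cpp, and the function name is hypothetical. The pyrrole ranges are read off the aromatic.txt table above, assuming those are the last patterns each atom matches; the six-membered ring is a hypothetical example.

```python
from typing import Iterable, Tuple


def ring_could_be_aromatic(contributions: Iterable[Tuple[int, int]]) -> bool:
    """Sum per-atom (min, max) pi contributions; accept if any 4n+2 value lies in range."""
    lo = hi = 0
    for atom_min, atom_max in contributions:
        lo += atom_min
        hi += atom_max
    return any(n >= 2 and (n - 2) % 4 == 0 for n in range(lo, hi + 1))


# Pyrrole, c1cc[nH]c1: four ring carbons at (1, 1), the NH nitrogen at (1, 2).
pyrrole = [(1, 1)] * 4 + [(1, 2)]
print(ring_could_be_aromatic(pyrrole))  # True: the range 5..6 contains 6

# Hypothetical six-membered ring: five carbons at (1, 1) plus one nitrogen.
charge_blind = [(1, 1)] * 5 + [(1, 2)]  # nitrogen scored as if it were neutral
charge_aware = [(1, 1)] * 5 + [(0, 0)]  # boron-like [n+2] contributing zero
print(ring_could_be_aromatic(charge_blind))  # True
print(ring_could_be_aromatic(charge_aware))  # False
```

A pattern that ignores the formal charge lets the first version of the ring through, whereas scoring the doubly charged nitrogen as contributing zero pi-electrons leaves only 5, so the ring is not perceived as aromatic.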

p.s. I'm not claiming that the revised SMARTS in aromatic.txt are correct, just that they have fewer bugs than without this patch.

Best regards,

Roger


___
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel