Hi Noel and Geoff,

I've been investigating some of the weird SMILES strings distributed by eMolecules that can't be read into other cheminformatics packages. A significant fraction appear to be molecules with nonsense formal charges on aromatic atoms, which then fail to be Kekulized given the mismatched valence states.

Two examples:

c1ccc2c(c1)[n+2](c(CO)c(CO)[n+]2[O-])O

and

Fc1c(F)c(F)[c+7](c(c1F)F)[Ti]1234([C+6]5C=CC=C5)([C+6]5[C+7]3=[C+7]2[C+7]1=[C+7]45)[c+7]1c(F)c(F)c(c(c1F)F)F 44386258

The second example is complete nonsense because, as explained in my "can't break the laws of physics" blog post, a carbon can't possibly have a formal charge of +7 with only six protons. Given this brokenness, these atoms shouldn't be getting marked as aromatic.

Digging deeper into where these atoms may be getting classified as aromatic led me to OpenBabel's aromatic.txt, and indeed the current trunk version of openbabel will blindly transform "c1ccc2c(c1)[N+](=C(C(=[N+2]2O)CO)CO)[O-]" into the first string above.

The problem appears to be that the current SMARTS patterns in aromatic.txt are too forgiving, allowing any formal charge to be accepted as aromatic. I suspect the pattern's author may have assumed that, as in SMILES, not specifying a formal charge implies no charge, whereas in SMARTS an unspecified charge matches any charge. Indeed, the OpenSMILES specification implicitly repeats this, by listing the SMILES but not the SMARTS.

The attached patch resolves the issue by tightening these SMARTS patterns. The guiding principle is "first do no harm": a ring system shouldn't be considered aromatic unless we can be certain we can correctly Kekulize it back at some point in the future. "[n+2]", if it did exist and was allowed (it isn't in Daylight SMILES, cf. daycgi/depict), should be isoelectronic with boron, three-valent, potentially aromatic, but contributing zero pi-electrons.

I've also noticed that "genheaders.sh" hasn't been run since some of the most recent changes to the data/*.txt files, meaning some of the data/*.h files are out of sync. If this proposed patch gets accepted, running genheader.sh to regenerate aromatic.h would also address this.

Please let me know what you think.

Roger
--
Roger Sayle, Ph.D.
CEO and founder
NextMove Software Limited
Registered in England No. 07588305
Registered Office: Innovation Centre (Unit 23), Cambridge Science Park, Cambridge CB4 0EY
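As a rough illustration of the "laws of physics" point above, the sketch below scans a SMILES string for bracket atoms whose positive formal charge exceeds the element's atomic number. It is not part of the attached patch and is not how Open Babel handles charges; the names `ATOMIC_NUMBER`, `formal_charge` and `impossible_charges` are hypothetical, the element table is deliberately partial, and the regular expressions cover only the common bracket-atom forms. The check is necessary rather than sufficient: it flags [c+7] but says nothing about [n+2].

```python
import re

# Partial table of atomic numbers (illustrative only; extend as needed).
ATOMIC_NUMBER = {
    "H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
    "P": 15, "S": 16, "Cl": 17, "Ti": 22, "Se": 34,
}

# Bracket atom: optional isotope, element symbol, then everything up to "]".
BRACKET_ATOM = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})([^\]]*)\]")
# Charge: one or more signs, optionally followed by a magnitude.
CHARGE = re.compile(r"(\++|-+)(\d*)")


def formal_charge(rest):
    """Return the formal charge written after the element symbol (0 if none)."""
    m = CHARGE.search(rest)
    if m is None:
        return 0
    signs, digits = m.groups()
    magnitude = int(digits) if digits else len(signs)
    return magnitude if signs[0] == "+" else -magnitude


def impossible_charges(smiles):
    """List (token, element, charge) for atoms charged beyond their atomic number."""
    flagged = []
    for m in BRACKET_ATOM.finditer(smiles):
        symbol = m.group(1).capitalize()  # aromatic "c" -> "C", "se" -> "Se"
        z = ATOMIC_NUMBER.get(symbol)
        if z is None:
            continue  # element not in the partial table: skip rather than guess
        charge = formal_charge(m.group(2))
        if charge > z:  # cannot remove more electrons than the atom has
            flagged.append((m.group(0), symbol, charge))
    return flagged
```

Applied to the two examples quoted above, this flags the six [c+7]/[C+7] atoms of the second string and nothing in the first, which is why a stricter valence or charge test is still needed for cases such as [n+2].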
##
##
# Open Babel file: aromatic.txt #
##
##
# Copyright (c) 1998-2001 by OpenEye Scientific Software, Inc. #
# Some portions Copyright (c) 2001-2005 Geoffrey R. Hutchison #
# Part of the Open Babel package, under the GNU General Public License (GPL) #
##
# SMARTS patterns with minimum and maximum pi-electrons contributed to an #
# aromatic system (used by typer.cpp:OBAromaticTyper) #
# The LAST PATTERN MATCHED is used to assign values, so that patterns should #
# be ordered from more general to more specific #
##
##
#PATTERN MIN MAX
#carbon patterns
[#6rD2+0] 1 1
# exo ketone or alcohol -- don't know which
[#6rD3+0]~!@[#8] 0 1
[#6rD2+,#6rD3+] 1 1
[#6r+0]=@* 1 1
[#6rD3+0]=!@* 1 1
# external double bonds to hetero atoms contribute no electrons to the
# aromatic systems -- quinoid systems are non-aromatic, e.g. 1,4-benzoquinone
[#6rD3+0]=!@[!#6] 0 0
[#6rD3-] 2 2
#nitrogen patterns
[#7rD2+0] 1 2
[#7rD3+0] 1 2
[#7r+0](-@*)-@* 1 2
[#7rD2+0]=@* 1 1
[#7rD3+] 1 1
[#7rD3+0]=O 1 1
[#7rD2-] 2 2
#oxygen patterns
[#8r+0] 2 2
[#8r+] 1 1
#sulfur patterns
[#16rD2+0] 2 2
[#16rD2+] 1 1
[#16rD3+0]=!@O 2 2
#other misc patterns
# Accounts Chem Res 1978 11 p. 153
# phosphole, phosphabenzene (not v. aromatic)
[#15rD3+0] 2 2
# selenophene
[#34rD2+0] 2 2
# arsabenzene, etc. (*really* not v. aromatic)
#[#33rD3+0] 2 2
# tellurophene, etc. (*really* not v. aromatic)
#[#52rD2+0] 2 2
# stibabenzene, etc. (very little aromatic character)
#[#51rD3+0] 2 2
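
The header of the file above says that the last pattern matched assigns the values, which is why ordering from general to specific matters. A minimal sketch of that rule follows, assuming whitespace-separated "pattern min max" rows and "#" comment lines as shown in the listing. The function names and the `matches` callable are hypothetical stand-ins: a real implementation would use a SMARTS matcher, and this is not a copy of the code in typer.cpp.

```python
def parse_aromatic_table(text):
    """Return [(smarts, min_pi, max_pi), ...] in file order."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Comment lines start with "#"; pattern lines start with "[".
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not "pattern min max"
        smarts, lo, hi = fields
        rows.append((smarts, int(lo), int(hi)))
    return rows


def assign_pi_electrons(rows, atom, matches):
    """Apply every row in order; a later match overwrites an earlier one.

    `matches(smarts, atom)` is a caller-supplied predicate (hypothetical here)
    that reports whether the SMARTS pattern matches the atom.
    """
    result = None
    for smarts, lo, hi in rows:
        if matches(smarts, atom):
            result = (lo, hi)  # last pattern matched wins
    return result
```

With this rule, a ring carbon carrying an exocyclic C=O matches both "[#6rD3+0]=!@*" (1 1) and the later, more specific "[#6rD3+0]=!@[!#6]" (0 0), and ends up with the latter's values.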
aromatic.txt.patch