> For now we are interested in consistency before "accuracy", which is another 
> subject. As a related note, we have tested several atom typing programs 
> (Knodle, I-interpret, Unicon and also Open Babel) and the perceived number 
> of aromatic atoms typically differs by 10-20% when analyzing 3600 
> structures in the PDBbind database.

This is hardly surprising. For one, if I put 10 organic chemists in a room and 
ask them to identify aromatic rings, I’ll get at least 10-20% variation.

More specifically, there is no single uniform cheminformatics model for 
aromaticity, because aromaticity has no well-defined chemical definition. And 
that’s before the hard cases, which remain even once you pick a specific 
aromaticity model. I’d guess we get 5-10 bug reports per year on specific cases 
for OB aromaticity detection.

But your question is how to get uniform atom types regardless of the input 
file format. This is probably impossible. If you have data in format X with 
correct bond and formal charge assignments (e.g., SDF) and data in XYZ format 
with atoms and no bonds or formal charges, then getting the same atom types 
from both requires assuming that bond perception is perfect. It isn’t. I don’t 
have a good metric for OB’s implementation, but I’d guess it is right somewhere 
in the ~90-95% range.
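
To make the information loss concrete, here is a minimal sketch in plain 
Python (standard library only, not Open Babel code). It reads a tiny 
hand-written V2000 mol block for the nitrite anion and writes it out as XYZ. 
The helper names (parse_molblock, to_xyz), the rough coordinates, and the 
whitespace-based parsing are all assumptions made for illustration; real mol 
files need fixed-column parsing.

```python
# Hypothetical hand-written mol block (nitrite anion, O=N-[O-]).
# Coordinates are rough and only for illustration.
MOLBLOCK = """\
nitrite
  sketch

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0
    1.1000    0.6000    0.0000 O   0  0
   -1.1000    0.6000    0.0000 O   0  5
  1  2  2  0
  1  3  1  0
M  END
"""

# V2000 atom-block charge codes mapped to formal charges.
# Code 4 (doublet radical) is not a charge and is treated as 0 here.
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


def parse_molblock(text):
    """Return (atoms, bonds) from a small V2000 mol block.

    Assumption: fields are whitespace-separated, which holds for this toy
    input but not for every real file.
    """
    lines = text.splitlines()
    counts = lines[3].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        fields = line.split()
        x, y, z = (float(v) for v in fields[:3])
        atoms.append({
            "symbol": fields[3],
            "xyz": (x, y, z),
            "charge": CHARGE_CODES.get(int(fields[5]), 0),
        })
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        fields = line.split()
        # (begin atom index, end atom index, bond order), 1-based indices
        bonds.append((int(fields[0]), int(fields[1]), int(fields[2])))
    return atoms, bonds


def to_xyz(atoms, comment=""):
    """Write atoms as XYZ text: only element symbols and coordinates fit."""
    rows = [str(len(atoms)), comment]
    for atom in atoms:
        x, y, z = atom["xyz"]
        rows.append(f"{atom['symbol']:<2} {x:10.4f} {y:10.4f} {z:10.4f}")
    return "\n".join(rows) + "\n"


atoms, bonds = parse_molblock(MOLBLOCK)
print("bonds in mol block:", bonds)
print("formal charges in mol block:", [a["charge"] for a in atoms])
print(to_xyz(atoms, "nitrite"))
```

The mol block carries two bonds with explicit orders and a -1 formal charge 
on one oxygen. The XYZ text has no field for either, so any program reading 
it has to guess them back from geometry alone.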

In short, please don’t throw away good data. Stick to file formats that retain 
as much information as possible.

-Geoff
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
