> For now we are interested in consistency before "accuracy", which is
> another subject. As a related note, we have tested several atom typing
> programs (Knodle, I-interpret, Unicon and also Open Babel) and the
> perceived number of aromatic atoms typically differs by 10-20% when
> analyzing 3600 structures in the PDBbind database.

This is hardly surprising. For one, if I put 10 organic chemists in a room and ask them to identify aromatic rings, I’ll get at least 10-20% variation.

More specifically, there is no single uniform cheminformatics model for aromaticity, because there is no well-defined chemical definition of it. And that’s before considering the hard cases, which remain even once you pick a specific aromaticity model. I’d guess we get 5-10 bug reports per year on specific cases for OB aromaticity detection.

But your question is how to get uniform atom types regardless of the input file format. This is probably impossible. If you have data in format X with correct bond and formal charge assignments (e.g., SDF) and the same data in XYZ format with atoms but no bonds or formal charges, then to get identical types from both you have to assume that all of the bond perception is perfect. I don’t have a good metric for OB’s implementation, but I’d guess it gets things right somewhere in the ~90-95% range.

In short, please don’t throw away good data. Stick to file formats that retain as much information as possible.

-Geoff

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
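
To make the XYZ point above concrete, here is a minimal sketch of distance-based bond perception in plain Python. It is not Open Babel's implementation: the function name `perceive_bonds`, the covalent radii table and the 0.45 Å tolerance are illustrative assumptions, and the formaldehyde coordinates are approximate. The idea is only that, with no bonds in the file, connectivity has to be guessed from geometry.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (illustrative values,
# assumed for this sketch; real toolkits ship their own tables).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def perceive_bonds(atoms, tolerance=0.45):
    """Guess bonds from geometry alone.

    Two atoms are treated as bonded when their distance is at most the sum
    of their covalent radii plus a tolerance (an assumed cutoff rule).
    `atoms` is a list of (element, (x, y, z)) tuples, as read from an XYZ file.
    """
    bonds = []
    for (i, (el_i, pos_i)), (j, (el_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tolerance
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Formaldehyde (H2C=O), approximate planar geometry in angstroms.
formaldehyde = [
    ("C", (0.000, 0.000, 0.000)),
    ("O", (0.000, 0.000, 1.210)),
    ("H", (0.000, 0.940, -0.540)),
    ("H", (0.000, -0.940, -0.540)),
]

bonds = perceive_bonds(formaldehyde)
print(bonds)  # index pairs only: no bond orders, no formal charges
```

Note what the result does and does not contain: the three C-O and C-H contacts are found, but nothing says that the C-O bond is a double bond, and nothing carries formal charges. Bond orders, charges and then aromaticity all have to be inferred in further steps, each of which can go wrong, whereas an SDF file states them explicitly.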