Hi all, I've found some rather serious problems in the OB MACCS definitions in the 2.3 release.
I'm doing more work on my fingerprint generation code in preparation for my poster at the GCC conference in Goslar. To test the bit order, I constructed SMILES strings designed to hit specific bits in the MACCS keys. The test case "C1CCC1" should produce a "1" at position 11 (bit 10 if you count from 0). RDKit and OEChem both do this, but OpenBabel does not.

I looked up OB's pattern definition and found:

  11:('*1~*~*~*~*1',0), # 4M Ring *NOTE* Was '*1~*~*~*~1' This and 9 others changed by CM because OB didn't like it

This pattern is incorrect. It has five atoms, so it matches a 5-membered ring. I tested the MACCS fingerprint with the 5-membered ring "C1CCCC1", and sure enough, OB gives a 1 for position 11.

With the 4-membered ring "C1CCC1", OB sets position 22 (bit 21) to 1 when it should be 0. Here's the OB SMARTS definition for position 22:

  22:('*1~*~*~*1',0), # 3M Ring

Here's the corresponding line from rdkit/Chem/MACCSkeys.py:

  22:('*1~*~*~1',0), # 3M Ring

Here too, the modified OB definition is looking for one too many atoms: '*1~*~*~*1' has four atoms and matches a 4-membered ring, not a 3-membered one.

According to the comment, nine other definitions were changed at the same time. I haven't done a full analysis of which of them are incorrect in this way, but what I've found is enough to say that people shouldn't use OB's MACCS definitions until they've been reviewed and fixed.

Andrew
da...@dalkescientific.com

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
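The off-by-one can be checked just by counting atoms. In these single-ring patterns each "*" is one atom, and the trailing ring-closure digit (with or without a "~" in front of it) closes the ring back to the first atom instead of adding a new one. The Python below is a minimal sketch of that reasoning. It is not code from OpenBabel, RDKit or OEChem. "ring_size" and "keys_set" are hypothetical helpers that assume simple monocyclic inputs built from single-letter aliphatic atoms or "*" wildcards, and they ignore bond types. Under those assumptions, a ring key can only fire when its ring size equals the molecule's ring size.

```python
import re

# Key definitions as quoted in the message (key number -> SMARTS).
RDKIT_KEYS = {11: "*1~*~*~*~1", 22: "*1~*~*~1"}
OB_KEYS = {11: "*1~*~*~*~*1", 22: "*1~*~*~*1"}


def ring_size(s: str) -> int:
    """Hypothetical helper: count ring atoms in a single-ring SMILES/SMARTS.

    Assumes only '*' wildcards or single-letter aliphatic atoms (e.g. 'C')
    and exactly one ring closure. The closure digit, with or without a
    preceding '~' bond, joins back to the first atom and adds no atom.
    """
    return len(re.findall(r"[A-Z*]", s))


def keys_set(keys: dict[int, str], smiles: str) -> list[int]:
    """Hypothetical helper: which ring keys would fire for a plain monocycle.

    A monocycle of size n contains only one cycle, so a ring pattern
    matches only when its own ring size is also n. Bond types are ignored.
    """
    n = ring_size(smiles)
    return sorted(k for k, pattern in keys.items() if ring_size(pattern) == n)


if __name__ == "__main__":
    for smi in ("C1CCC1", "C1CCCC1"):
        print(smi, "RDKit:", keys_set(RDKIT_KEYS, smi), "OB:", keys_set(OB_KEYS, smi))
```

Run on the two test molecules, the sketch reproduces the behaviour described above. With the RDKit patterns, "C1CCC1" sets only key 11. With the OB patterns, "C1CCC1" sets key 22 instead, and "C1CCCC1" sets key 11.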