Hi all,

 I've found some rather serious problems in the OB MACCS definitions in the 2.3 
release.

I'm working more on my fingerprint generation code, in preparation for my 
poster at the GCC conference in Goslar. To test the bit order, I constructed 
SMILES strings designed to hit specific bits in the MACCS keys.

The test case "C1CCC1" should produce a "1" at position 11 (bit 10 if you 
count from 0). RDKit and OEChem both do this, but OpenBabel does not. I looked 
up OB's pattern definition and found:

11:('*1~*~*~*~*1',0), # 4M Ring *NOTE* Was '*1~*~*~*~1' This and 9 others changed by CM because OB didn't like it

This pattern is incorrect. It contains five atoms, so it matches a 5-membered 
ring rather than a 4-membered one. I tested the MACCS fingerprint with the 
5-membered ring "C1CCCC1" and, sure enough, OB gives a 1 for position 11.

With the 4-membered ring "C1CCC1" I see that OB sets position 22 (bit 21) to 
1, when it should be 0. Here's the OB SMARTS definition for that position:

22:('*1~*~*~*1',0), # 3M Ring

Here's the corresponding line from rdkit/Chem/MACCSkeys.py:

22:('*1~*~*~1',0), # 3M Ring

Here too, the modified OB definition looks for one atom too many: four 
instead of three.
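
The off-by-one can be checked mechanically by counting the atoms in each 
pattern. What follows is only a minimal sketch, not a SMARTS parser: the 
helper name ring_atom_count is made up for illustration, and it assumes 
patterns of exactly the simple form quoted above (wildcard atoms joined by 
"~" bonds, closed by a single ring-closure digit "1").

```python
import re

# Accepts only the simple single-ring wildcard form quoted in this message:
# '*1', then one or more '~*', then the closing '1' (optionally written '~1').
_SIMPLE_RING = re.compile(r"\*1(?:~\*)+~?1")


def ring_atom_count(smarts: str) -> int:
    """Return the number of atoms in a simple wildcard ring pattern.

    Hypothetical helper for illustration; anything outside the simple
    form above is rejected rather than guessed at.
    """
    if not _SIMPLE_RING.fullmatch(smarts):
        raise ValueError(f"unsupported pattern: {smarts!r}")
    # Each '*' is one atom, and every atom lies on the single ring.
    return smarts.count("*")


if __name__ == "__main__":
    patterns = [
        ("OB    key 11", "*1~*~*~*~*1"),
        ("RDKit key 11", "*1~*~*~*~1"),
        ("OB    key 22", "*1~*~*~*1"),
        ("RDKit key 22", "*1~*~*~1"),
    ]
    for label, smarts in patterns:
        print(f"{label}: {smarts:<12} -> {ring_atom_count(smarts)} atoms")
```

In the RDKit form the ring-closure digit follows the last bond symbol, while 
in the OB form an extra "*" was inserted before the digit, which is where the 
additional atom comes from.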

I haven't done a full analysis of which other bits are incorrect in this way, 
but what I've found is enough to say that people shouldn't use OB's MACCS 
definitions until they've been reviewed and fixed.
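
For anyone who wants to repeat or extend these checks, the cases above can be 
written down as a small table. This is a rough sketch under assumptions: 
check_cases and the on_positions callable are hypothetical names, and the 
callable stands in for whatever toolkit call returns the set of 1-based MACCS 
positions that are on for a SMILES string. No real toolkit API is used here.

```python
from typing import Callable, List, Set, Tuple

# (SMILES, positions that must be 1, positions that must be 0).
# Positions are 1-based MACCS key numbers; the bit index is position - 1.
CASES: List[Tuple[str, Set[int], Set[int]]] = [
    ("C1CCC1", {11}, {22}),   # 4-membered ring: key 11 on, key 22 off
    ("C1CCCC1", set(), {11}),  # 5-membered ring: key 11 must stay off
]


def check_cases(on_positions: Callable[[str], Set[int]]) -> List[str]:
    """Return a list of human-readable failures (empty if all cases pass)."""
    failures = []
    for smiles, must_set, must_clear in CASES:
        found = on_positions(smiles)
        for pos in sorted(must_set - found):
            failures.append(f"{smiles}: position {pos} (bit {pos - 1}) should be 1")
        for pos in sorted(must_clear & found):
            failures.append(f"{smiles}: position {pos} (bit {pos - 1}) should be 0")
    return failures


if __name__ == "__main__":
    # Stand-in for the OB behaviour reported in this message. Only the two
    # ring-related positions are recorded; all other bits are left out.
    reported_ob = {"C1CCC1": {22}, "C1CCCC1": {11}}
    for line in check_cases(reported_ob.__getitem__):
        print(line)
```

Fed with the behaviour reported above, this prints three failures: two for 
"C1CCC1" and one for "C1CCCC1".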


                                Andrew
                                da...@dalkescientific.com
