On 22/07/2012 23:35, Tim Vandermeersch wrote:
> Hi,
>
> The problem seems to be in src/fingerprints/finger3.cpp:
>
>    //Each bit represents a single substructure; no need for confirmation when substructure searching
>    virtual unsigned int Flags() { return FPT_UNIQUEBITS;};
>
> This confuses me. Just because the substructures are present in the
> queried molecule does not mean that the queried molecule is a
> superstructure of the query. So an isomorphism search is still needed
> to confirm the hit.

This code comment (and its implementation in fastsearchformat) is
clearly wrong and has been for a long time. The flag means that each bit
represents only one substructure feature, rather than being a hash as in
FP2. I have corrected the code in fastsearchformat and the comment in
finger3.cpp in trunk.
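To make the two-stage logic concrete, here is a minimal C++ sketch of a screened substructure search. It is not the fastsearchformat code: the names `Fingerprint`, `PassesScreen` and `SubstructureSearch` are hypothetical, the fingerprint is assumed to be a vector of 32-bit words, and `confirmMatch` stands in for whatever isomorphism test (for example a SMARTS match) does the confirmation. The screen is only a necessary condition, so confirmation runs for every survivor even when each bit is a single feature.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical fingerprint type: a fixed-length array of 32-bit words.
using Fingerprint = std::vector<unsigned int>;

// True if every bit set in the query is also set in the target.
// Passing this is necessary, but not sufficient, for a substructure match.
bool PassesScreen(const Fingerprint& query, const Fingerprint& target) {
  assert(query.size() == target.size());
  for (std::size_t i = 0; i < query.size(); ++i) {
    if ((query[i] & target[i]) != query[i]) return false;
  }
  return true;
}

// Screen first, then always confirm with the caller-supplied isomorphism
// test. Bits that each represent one feature do not make this step optional.
std::vector<std::size_t> SubstructureSearch(
    const Fingerprint& queryFp,
    const std::vector<Fingerprint>& targetFps,
    const std::function<bool(std::size_t)>& confirmMatch) {
  std::vector<std::size_t> hits;
  for (std::size_t i = 0; i < targetFps.size(); ++i) {
    if (PassesScreen(queryFp, targetFps[i]) && confirmMatch(i)) {
      hits.push_back(i);
    }
  }
  return hits;
}
```

A target can pass the screen and still be rejected by `confirmMatch`; that second stage is exactly what the old comment allowed to be skipped.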

> When I change the Flags() function to return 0, I still don't get the
> expected results, though. With my query there should be 46 hits, but FP3
> gives 26, FP4 25 and MACCS only 12. Is there something I'm missing
> here? If the bits simply represent substructures, the fingerprint
> screening should return all possible molecules containing the query.

I think that this is because a structure as a pattern is not being 
distinguished sufficiently from a structure as a molecule. A SMILES 
input of OC will match any ether when it is used as SMARTS or in an FP2 
substructure search. With FP4 or MACCS, it is seen as methanol and a bit 
corresponding to an alcohol is set. This prevents a match to an ordinary 
ether.
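The effect can be reproduced with a toy perception routine. This is a deliberately simplified model, not FP4's real SMARTS patterns: molecules have single bonds only, hydrogens are implicit and derived from a fixed valence of 2 for oxygen, and the three feature bits (`kC_ONS_bond`, `kAlcohol`, `kDialkylether`) are hypothetical, loosely named after the FP4 descriptions in the obabel output. Perceived as a molecule, OC gets an alcohol bit that dimethyl ether lacks, so the screen rejects the ether even though OC read as SMARTS would match it.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy molecule (assumption): element symbols plus single bonds between
// atom indices. Hydrogens are not stored; they are derived from valence.
struct ToyMol {
  std::vector<char> elements;              // e.g. 'C', 'O', 'N', 'S'
  std::vector<std::pair<int, int>> bonds;  // single bonds only
};

// Hypothetical feature bits, loosely named after the FP4 descriptions.
enum Feature { kC_ONS_bond = 0, kAlcohol = 1, kDialkylether = 2 };
constexpr std::size_t kNumFeatures = 3;
using Keys = std::bitset<kNumFeatures>;

// Perceive features treating the input as a complete molecule, so any
// open valence on O is filled with implicit hydrogens.
Keys PerceiveAsMolecule(const ToyMol& mol) {
  const int n = static_cast<int>(mol.elements.size());
  std::vector<std::vector<int>> nbrs(mol.elements.size());
  for (const auto& b : mol.bonds) {
    assert(b.first >= 0 && b.first < n && b.second >= 0 && b.second < n);
    nbrs[b.first].push_back(b.second);
    nbrs[b.second].push_back(b.first);
  }
  Keys keys;
  for (int i = 0; i < n; ++i) {
    const char e = mol.elements[i];
    if (e == 'C') {
      for (int j : nbrs[i]) {
        const char f = mol.elements[j];
        if (f == 'O' || f == 'N' || f == 'S') keys.set(kC_ONS_bond);
      }
    } else if (e == 'O') {
      int carbons = 0;
      for (int j : nbrs[i]) {
        if (mol.elements[j] == 'C') ++carbons;
      }
      const int implicitH = 2 - static_cast<int>(nbrs[i].size());
      if (carbons == 1 && implicitH == 1) keys.set(kAlcohol);  // C-O-H
      if (carbons == 2) keys.set(kDialkylether);               // C-O-C
    }
  }
  return keys;
}

// Screen: every bit set in the query must also be set in the target.
bool PassesScreen(const Keys& query, const Keys& target) {
  return (query & target) == query;
}
```

The alcohol bit depends on an implicit hydrogen that only exists because OC was completed as a molecule; a pattern reading of the same input would not imply it.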

obabel -:"OC" -ofpt -xfFP4 -xs
 >
Alcohol C_ONS_bond
1 molecule converted

obabel -:"COC" -ofpt -xfFP4 -xs
 >
Dialkylether    C_ONS_bond
1 molecule converted

obabel -:"OC" -ofpt -xfMACCS -xs
 >
93: QCH3        139: OH 157: C-O        160: CH3        164: O
1 molecule converted

obabel -:"COC" -ofpt -xfMACCS -xs
 >
74: CH3ACH3     86: CH2QCH2     93: QCH3        126: A!O!A      149: CH3 > 1*2
157: C-O        160: CH3        164: O
1 molecule converted

I guess structure-key fingerprints should not be used for substructure 
searches, at least until we have a way round this. But they may be 
better for similarity comparisons. For example, in the above, the 
presence of an alcohol is more chemically significant than any old O 
bonded to C.
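Both points can be checked on the MACCS keys printed above. In this sketch the helper names `MissingKeys` and `Tanimoto` are hypothetical, and the key numbers are copied from the obabel output and treated as plain sets of integers. The screen fails only because key 139 (OH) is absent from dimethyl ether, while the Tanimoto coefficient, a common similarity measure for fingerprints, still gives 4 shared keys out of 9 in the union.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// MACCS key numbers copied from the obabel -xs output.
const std::set<int> kMethanolKeys = {93, 139, 157, 160, 164};
const std::set<int> kDimethylEtherKeys = {74, 86, 93, 126, 149, 157, 160, 164};

// Hypothetical helper: keys set in the query but absent from the target.
// Any entry here means a substructure screen would reject the target.
std::vector<int> MissingKeys(const std::set<int>& query,
                             const std::set<int>& target) {
  std::vector<int> missing;
  std::set_difference(query.begin(), query.end(), target.begin(),
                      target.end(), std::back_inserter(missing));
  return missing;
}

// Hypothetical helper: Tanimoto coefficient |A and B| / |A or B|.
double Tanimoto(const std::set<int>& a, const std::set<int>& b) {
  std::vector<int> common;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(common));
  const std::size_t unionSize = a.size() + b.size() - common.size();
  return unionSize == 0 ? 0.0
                        : static_cast<double>(common.size()) / unionSize;
}
```

A plain Tanimoto weights every key equally, so it does not by itself express that the alcohol key is more significant than key 157; any such weighting would be a further design choice.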

Chris


_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
