On 22/07/2012 23:35, Tim Vandermeersch wrote:
> Hi,
>
> The problem seems to be in src/fingerprints/finger3.cpp:
>
> //Each bit represents a single substructure; no need for
> //confirmation when substructure searching
> virtual unsigned int Flags() { return FPT_UNIQUEBITS;};
>
> This confuses me. Just because the substructures are present in the
> queried molecule does not mean that the queried molecule is a
> superstructure of the query. So an isomorphism search is still needed
> to confirm the hit.
This code comment (and the implementation of it in fastsearchformat) is
clearly wrong and has been for a long time. The flag means that each bit
represents only one substructure feature and is not a hash as in FP2. I
have corrected the trunk code in fastsearchformat and the comment in
finger3.cpp.
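
To illustrate why one feature per bit still does not remove the need for confirmation, here is a minimal sketch in Python. It is not the fastsearchformat code: strings stand in for molecules, each distinct character stands in for one substructure bit, and a contiguous-substring test stands in for the isomorphism search. All function names are hypothetical.

```python
def fingerprint(mol: str) -> frozenset:
    """Toy 'unique bits' fingerprint: one bit per distinct character."""
    return frozenset(mol)


def screen_candidates(query: str, targets: list) -> list:
    """Keep targets whose bits include every bit set by the query."""
    qbits = fingerprint(query)
    return [t for t in targets if qbits <= fingerprint(t)]


def confirm(query: str, target: str) -> bool:
    """Stand-in for the isomorphism search: contiguous substring test."""
    return query in target


def substructure_search(query: str, targets: list) -> list:
    """Screen first, then confirm every surviving candidate."""
    return [t for t in screen_candidates(query, targets) if confirm(query, t)]


targets = ["cab", "ba", "xyz"]
screened = screen_candidates("ab", targets)   # "ba" passes the screen
hits = substructure_search("ab", targets)     # but is rejected on confirmation
```

In this toy, "ba" has both features of the query "ab" but does not contain it, so the screen alone would report a false hit.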
> When I change the Flags() function to return 0, I still don't get the
> expected results though. With my query there should be 46 hits but FP3
> gives 26, FP4 25 and MACCS only 12. Is there something I'm missing
> here? If the bits simply represent a substructure, the fingerprint
> screening should return all possible molecules containing the query.
I think that this is because a structure used as a pattern is not being
distinguished sufficiently from a structure used as a molecule. A SMILES
input of OC will match any ether when it is used as SMARTS or in an FP2
substructure search. With FP4 or MACCS, it is seen as methanol and a bit
corresponding to an alcohol is set. An ordinary ether does not have that
bit set, so it fails the screen and is never matched.
obabel -:"OC" -ofpt -xfFP4 -xs
>
Alcohol C_ONS_bond
1 molecule converted
obabel -:"COC" -ofpt -xfFP4 -xs
>
Dialkylether C_ONS_bond
1 molecule converted
obabel -:"OC" -ofpt -xfMACCS -xs
>
93: QCH3 139: OH 157: C-O 160: CH3 164: O
1 molecule converted
obabel -:"COC" -ofpt -xfMACCS -xs
>
74: CH3ACH3 86: CH2QCH2 93: QCH3 126: A!O!A 149: CH3 > 1*2
157: C-O 160: CH3 164: O
1 molecule converted
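
The FP4 output above can be turned into a small Python sketch of the screening step. The bit names are copied from that output; the subset test is the usual screening rule, written here as an illustration and not taken from the Open Babel source. The `PATTERN_ONLY` set is a hypothetical example of what a pattern-aware fingerprint might set for OC, not something Open Babel produces.

```python
# FP4 bits reported by obabel when each SMILES is read as a molecule.
FP4_AS_MOLECULE = {
    "OC": {"Alcohol", "C_ONS_bond"},
    "COC": {"Dialkylether", "C_ONS_bond"},
}


def passes_screen(query_bits: set, target_bits: set) -> bool:
    """A target survives only if it has every bit the query has."""
    return query_bits <= target_bits


query = FP4_AS_MOLECULE["OC"]
target = FP4_AS_MOLECULE["COC"]

# Bits set by the query but absent from the target; any such bit
# makes the screen reject the target.
blocking_bits = query - target
ether_passes = passes_screen(query, target)

# Hypothetical: if OC were treated as a pattern, only the generic bit
# would be set and the ether would survive the screen.
PATTERN_ONLY = {"C_ONS_bond"}
ether_passes_as_pattern = passes_screen(PATTERN_ONLY, target)
```

Here the Alcohol bit is the only one blocking the match, which is why the ether is screened out even though OC as SMARTS would match it.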
I guess structure-key fingerprints should not be used for substructure
searches, at least until we have a way round this. But they may be
better for similarity comparisons. For example, in the above, the
presence of an alcohol is more chemically significant than any old O
bonded to C.
Chris
_______________________________________________
OpenBabel-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openbabel-devel