Dear all,

As I mentioned in an earlier message
(http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01430.html),
the default parameters for the RDKit fingerprint set far too many bits
for drug-like molecules. As a result, similarity values are generally
too high, and molecules more often appear similar to each other purely
because of bit collisions.
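
To give a rough feel for why bit density matters, here is a toy model
(not RDKit's actual hashing). It assumes every path sets its bits
uniformly at random and independently in a 2048-bit fingerprint, and
the path counts are made up for illustration. Under those assumptions
it estimates the fraction of bits set and the Tanimoto similarity that
two unrelated fingerprints of that density would show by chance alone.

```python
def expected_density(n_paths, bits_per_hash, fp_size=2048):
    """Expected fraction of bits set, assuming each path sets
    bits_per_hash bits uniformly at random and independently."""
    return 1.0 - (1.0 - 1.0 / fp_size) ** (n_paths * bits_per_hash)


def chance_tanimoto(density):
    """Expected Tanimoto between two independent random fingerprints
    of equal density d: intersection ~ d*d, union ~ 2*d - d*d."""
    return density / (2.0 - density)


# Hypothetical numbers of distinct paths per molecule.
for n_paths in (100, 300, 1000):
    for bits_per_hash in (4, 2):
        d = expected_density(n_paths, bits_per_hash)
        print(f"paths={n_paths:5d} nBitsPerHash={bits_per_hash} "
              f"density={d:.2f} chance Tanimoto={chance_tanimoto(d):.2f}")
```

In this simplified picture, halving the bits set per path lowers both
the density and the background similarity; real fingerprints will
deviate from it because paths are shared between related molecules.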

The easy solution to this problem is to decrease the number of bits
set per path found (the nBitsPerHash parameter) from 4 to 2. I propose
doing this for the Q4 2010 release of the RDKit. The downside is that
fingerprints generated with that release will not be compatible with
fingerprints from earlier releases unless you explicitly pass
nBitsPerHash=4 yourself. The upside is a much more useful similarity
fingerprint.
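
For anyone who wants to see the effect on their own molecules, a
minimal sketch along these lines should work. The two SMILES (aspirin
and ibuprofen) are arbitrary examples of mine, and nBitsPerHash is
passed explicitly in both cases so the result does not depend on which
release's default you have installed.

```python
from rdkit import Chem, DataStructs

# Two arbitrary molecules, chosen only for illustration.
mol_a = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")       # aspirin
mol_b = Chem.MolFromSmiles("CC(C)Cc1ccc(cc1)C(C)C(=O)O")  # ibuprofen

on_bits = {}
similarities = {}
for n_bits_per_hash in (4, 2):
    # Pass nBitsPerHash explicitly instead of relying on the default.
    fp_a = Chem.RDKFingerprint(mol_a, nBitsPerHash=n_bits_per_hash)
    fp_b = Chem.RDKFingerprint(mol_b, nBitsPerHash=n_bits_per_hash)
    on_bits[n_bits_per_hash] = fp_a.GetNumOnBits()
    similarities[n_bits_per_hash] = DataStructs.TanimotoSimilarity(fp_a, fp_b)
    print(f"nBitsPerHash={n_bits_per_hash}: "
          f"{on_bits[n_bits_per_hash]} of {fp_a.GetNumBits()} bits set in A, "
          f"Tanimoto(A, B)={similarities[n_bits_per_hash]:.3f}")
```

Fingerprints stored from an earlier release should only be compared
against new ones generated with nBitsPerHash=4, as in the first pass
of the loop.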

Any objections to me making this change?

-greg
