Dear all,
As I mentioned in an earlier message (http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01430.html), the default parameters for the RDKit fingerprint set far too many bits for drug-like molecules. As a result, similarity values are generally too high, and molecules more often appear similar to each other purely because of bit collisions.
The easy solution to this problem is to decrease the number of bits set per path found (the nBitsPerHash parameter) from 4 to 2. I propose making this change for the Q4 2010 release of the RDKit. The downside is that fingerprints generated with that release will not be compatible with fingerprints from earlier releases unless you explicitly specify nBitsPerHash=4. The upside is a much more useful similarity fingerprint.
Any objections to me making this change?
-greg
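For anyone who wants to see the collision effect in isolation, here is a rough toy model in plain Python. It is not the RDKit hashing scheme: the SHA-256 based bit selection, the function names, the fingerprint size of 2048, and the figure of 300 paths per molecule are all assumptions chosen for illustration. Two hypothetical molecules share no paths at all, so any Tanimoto similarity between their fingerprints comes from bit collisions alone.

```python
import hashlib


def toy_fingerprint(paths, fp_size=2048, n_bits_per_hash=4):
    """Return the set of bit indices turned on by a list of path labels.

    Each path sets up to n_bits_per_hash bits. The bit positions are derived
    from SHA-256 here, which is only a stand-in for a real hashing scheme.
    """
    bits = set()
    for path in paths:
        for i in range(n_bits_per_hash):
            digest = hashlib.sha256(f"{path}:{i}".encode()).digest()
            bits.add(int.from_bytes(digest[:8], "big") % fp_size)
    return bits


def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Two hypothetical molecules with no paths in common (path count is arbitrary).
mol_a = [f"A-path-{i}" for i in range(300)]
mol_b = [f"B-path-{i}" for i in range(300)]

results = {}
for k in (4, 2):
    fp_a = toy_fingerprint(mol_a, n_bits_per_hash=k)
    fp_b = toy_fingerprint(mol_b, n_bits_per_hash=k)
    results[k] = {
        "density": len(fp_a) / 2048,        # fraction of bits set in molecule A
        "tanimoto": tanimoto(fp_a, fp_b),   # similarity due to collisions only
    }
    print(f"nBitsPerHash={k}: density={results[k]['density']:.3f}, "
          f"collision-only Tanimoto={results[k]['tanimoto']:.3f}")
```

In this toy setting, halving the bits set per path lowers both the bit density and the similarity that unrelated molecules get from collisions, which is the behaviour the proposal is aiming for. The actual numbers for real molecules will depend on the RDKit implementation and on how many paths each molecule contains.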