On Thu, Jul 11, 2019 at 12:02 AM Lewis Martin <lewis.marti...@gmail.com>
wrote:

> Thanks Greg!
> Would you mind giving a blurb or a link to a paper on how count simulation
> works? I looked through the GSOC pull request but unfortunately don't
> understand it.
>

There's a summary of it on pages 21 and 22 here:
https://www.rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
and a paragraph (that probably needs a picture) in the new fingerprint docs
here:
https://github.com/rdkit/rdkit/blob/master/Docs/Book/RDKit_Book.rst#atom-pair-and-topological-torsion-fingerprints

> Agreed re. your comments. Usually 256 bits is for playing around and then
> larger FPs are for 'production' runs. Although in my use case, for example
> logistic regression / naive bayes classifier on protein activity records in
> chembl, I really don't see a big difference despite collisions! That was
> prior to count simulation.
>

Yeah, when I've looked at this before I've seen more or less the same
thing: http://rdkit.blogspot.com/2014/03/colliding-bits-ii.html

Note that the difference is likely larger with count simulation: there,
each feature takes four bits (one per count threshold), so a 256-bit FP has
essentially 64 bits to encode the features.
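
To make the 256 -> 64 arithmetic concrete, here is a rough pure-Python sketch of the idea. It is only an illustration, not the RDKit implementation: the function names, the modulo "hash", and the (1, 2, 4, 8) count thresholds are assumptions chosen for the example.

```python
# Illustrative sketch only; NOT the actual RDKit code.
# Assumption: each feature gets one bit per count threshold.
COUNT_BOUNDS = (1, 2, 4, 8)


def effective_slots(n_bits=256, bounds=COUNT_BOUNDS):
    """Number of distinct feature positions left after count simulation."""
    return n_bits // len(bounds)


def simulate_counts(feature_counts, n_bits=256, bounds=COUNT_BOUNDS):
    """Fold a {feature_id: count} dict into a set of on-bit indices.

    feature_id is assumed to be an already-hashed non-negative integer;
    folding by modulo stands in for the real hashing step.
    """
    n_slots = effective_slots(n_bits, bounds)
    on_bits = set()
    for feature_id, count in feature_counts.items():
        slot = feature_id % n_slots
        for i, bound in enumerate(bounds):
            # Bit i of the slot is set when the count reaches threshold i.
            if count >= bound:
                on_bits.add(slot * len(bounds) + i)
    return on_bits
```

With these assumptions a 256-bit FP has only 64 feature slots, so (hypothetical) feature ids 3 and 67 land in the same slot, which is why collisions are expected to matter more here than in the plain bit-vector case.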

-greg