On Thu, Jul 11, 2019 at 12:02 AM Lewis Martin <lewis.marti...@gmail.com> wrote:
> Thanks Greg!
>
> Would you mind giving a blurb or a link to a paper on how count simulation
> works? I looked through the GSoC pull request but unfortunately don't
> understand it.
>

There's a summary of it on pages 21 and 22 here:
https://www.rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
and a paragraph (that probably needs a picture) in the new fingerprint docs
here:
https://github.com/rdkit/rdkit/blob/master/Docs/Book/RDKit_Book.rst#atom-pair-and-topological-torsion-fingerprints

> Agreed re. your comments. Usually 256 bits is for playing around and then
> larger FPs are for 'production' runs. Although in my use case, for example
> logistic regression / naive Bayes classifier on protein activity records in
> ChEMBL, I really don't see a big difference despite collisions! That was
> prior to count simulation.
>

Yeah, when I've looked at this before I've seen more or less the same thing:
http://rdkit.blogspot.com/2014/03/colliding-bits-ii.html

Note that there is likely a larger difference with count simulation: in that
case a 256-bit FP has essentially 64 bits to encode the features.

-greg
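
To illustrate the "essentially 64 bits" remark, here is a minimal pure-Python sketch of the count-simulation idea. It is not the RDKit implementation: the function name simulate_counts, the thresholds (1, 2, 4, 8), and the exact bit layout are assumptions made for the example. Each feature gets a block of four adjacent bits, and a bit in the block is set when the feature's count reaches the corresponding threshold, so a 256-bit FP has only 256 / 4 = 64 distinct slots.

```python
# Assumed count thresholds: one bit per threshold, four bits per feature.
COUNT_BOUNDS = (1, 2, 4, 8)


def simulate_counts(feature_counts, n_bits=256, count_bounds=COUNT_BOUNDS):
    """Return the set of on-bits for a dict of {feature_hash: count}.

    Hypothetical helper for illustration only, not an RDKit function.
    """
    bits_per_feature = len(count_bounds)
    # Features are hashed into slots, not bits: 256 // 4 = 64 slots.
    n_slots = n_bits // bits_per_feature
    on_bits = set()
    for feature_hash, count in feature_counts.items():
        slot = feature_hash % n_slots
        for i, bound in enumerate(count_bounds):
            # Bit i of the slot's block means "count >= bound".
            if count >= bound:
                on_bits.add(slot * bits_per_feature + i)
    return on_bits


# A feature seen 3 times sets the ">= 1" and ">= 2" bits of its block.
bits_a = simulate_counts({5: 3})
# Hash 69 collides with hash 5, because 69 % 64 == 5.
bits_b = simulate_counts({69: 1})
```

With these assumptions, features whose hashes differ by a multiple of 64 land in the same block, which is why collisions are expected to matter more than in a plain 256-bit FP.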