Hi,

Let me start by saying I am sorry it took me so long to answer.

Rajarshi Guha wrote:
> In the past for binary fingerprints the string version of a BitSet was
> sufficient to output fp's to a file
>
> For something like the signature fingerprint - I see it has a method
> to output a bit version and the raw version. How is the bit version
> generated and what is the bit string length?

I am assuming you mean the signature fingerprinter, as in the class
generating the fingerprint, not the actual representation of it. It does
indeed provide the raw version and a new bit version. I introduced a few
interfaces for different fingerprint representations, since a dense
BitSet is not always what one wants. My implementation of the bit
fingerprint interface does two things:

1. It provides a sparse representation of which bits are set by storing
   the indexes of the set bits. Compared to the approach based on Java's
   BitSet, this means that instead of storing an array of bits it stores
   an array of integers corresponding to which bits are set. For the
   signature fingerprint, each index is obtained by hashing the
   signature String to an integer with Java's hashCode method and then
   using that integer directly, without folding it down to a shorter
   length. That is, instead of a length of 1024 (corresponding to 10
   bits for lookup), which is common in CDK, it uses 32 bits, i.e. a
   full integer.

2. It does not store the actual signatures, which is what the raw
   fingerprint does. The raw fingerprint is handy when one wants to go
   back and use the benefits of signatures. However, storing these
   Strings takes up memory, and sometimes that is not wanted.

Note that the sparse representation only makes sense for fingerprints
that are truly sparse, i.e. that have very few true bits and many false
ones.

> is it appropriate to dump out a binary version of a signature
> fingerprint? or instead provide the raw values of such an fp?

That depends on what you want. If you are going to use it only for
lookup and don't want to go back to the signature Strings, then yes,
some sort of String representation looking a bit like this probably
makes sense:

    5 45 756 45657 4568721 ...

where the numbers signify which bits are set to true. I did not give
the class / interface such a method, but perhaps it would be good?
(Patches welcome.)

If you want to keep the signatures too, then perhaps another String
representation would be good for the RawFingerprint.

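
To make the sparse representation and the index-list output above a bit
more concrete, here is a minimal sketch in plain Java. It is not the CDK
code: the class name SparseFingerprintSketch, its method names, and the
example signature strings are all made up for illustration, and it only
assumes what is described above (String.hashCode gives the bit index, and
only the indexes of set bits are kept). Note that hashCode can be
negative, so indexes in this sketch can be negative too; how the real
class deals with that is not covered here.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from CDK.
final class SparseFingerprintSketch {

    // Hash each signature String to a 32-bit int and keep the distinct
    // values in sorted order. Only the set bits are stored.
    static SortedSet<Integer> setBits(List<String> signatures) {
        SortedSet<Integer> bits = new TreeSet<>();
        for (String signature : signatures) {
            bits.add(signature.hashCode());
        }
        return bits;
    }

    // Write the set bits as a space-separated list of indexes,
    // in the style of "5 45 756 45657 ...".
    static String toIndexString(SortedSet<Integer> bits) {
        return bits.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" "));
    }

    // Read such a line back into the set of indexes.
    static SortedSet<Integer> parseIndexString(String line) {
        SortedSet<Integer> bits = new TreeSet<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return bits; // no bits set
        }
        for (String token : trimmed.split("\\s+")) {
            bits.add(Integer.parseInt(token));
        }
        return bits;
    }

    public static void main(String[] args) {
        // Made-up signature strings; a repeated signature sets the same bit once.
        List<String> signatures = List.of("[C]([C][O])", "[O]([C])", "[C]([C][O])");
        System.out.println(toIndexString(setBits(signatures)));
    }
}
```

A sorted set is used here only so that the output line is stable between
runs. The signatures themselves are thrown away, so this corresponds to
the bit version, not the raw one.
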
Before concluding, let me also say I am sorry I never got around to
writing the blog post Egon suggested I write about the ideas behind the
new Fingerprinter things. Actually, I think this mail has explained most
of it, except maybe that there is now also an interface for Count
fingerprints, and now even that is mentioned... :)

-- 
// Jonathan

_______________________________________________
Cdk-user mailing list
https://lists.sourceforge.net/lists/listinfo/cdk-user

