The source code to HPPC is public and accessible, so you are more than
welcome to peek, contribute, or take whatever you want, Benson.

Dawid

On Fri, Apr 2, 2010 at 10:45 PM, Benson Margulies <bimargul...@gmail.com> wrote:
> Dawid,
>
> Now I recall why I stopped working on features of Mahout collections :-)
> HPPC.
>
> We'll see who gets where first.
>
> --benson
>
>
> On Fri, Apr 2, 2010 at 10:06 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>
>> > What's the use case for needing to vary the hash function? It's one of
>> > those things where I assume there are incorrect ways to do it, and
>> > correct ways, and among the correct ways fairly clear arguments about
>> > which function will be better -- i.e. the object should provide the
>> > best function.
>>
>> Unfortunately this is not true -- just recently I've hit a use case
>> where the keys stored were Long values and their distribution had a
>> very low variance in the lower bits. HPPC implemented open hashing
>> using 2^n-sized arrays, with hashes reduced by a bitmask... this caused really,
>> really long conflict chains for values that were actually very
>> different. I looked at how JDK's HashMap solves this problem -- they
>> do a simple rehashing scheme internally (so it's object hash and then
>> remixing hash in a cascade). I've finally decided to allow external
>> hash functions AND changed the _default_ hash function used for
>> "remixing" to be murmur hash. Performance benchmarks show this yields
>> virtually no degradation in execution time (the CPUs seem to spend
>> most of their time waiting on cache misses anyway, so internal
>> rehashing is not an issue).
>>
>> I must also apologize for a bit of inactivity with HPPC... Like I
>> said, we have released it internally on our "labs" Web site here:
>>
>> http://labs.carrotsearch.com/hppc.html
>>
>> It doesn't mean we turn our backs on contributing HPPC to Mahout --
>> quite the opposite, we would love to do it. But contrary to what I
>> originally thought (to push HPPC to Mahout as soon as possible) I kind
>> of grew reluctant because so many things are missing (equals/hashcode,
>> java collections adapters) or can be improved (documentation, faster
>> iterators).
>>
>> So... I'm still going to experiment with HPPC in our labs, especially
>> API-wise, release one or two versions in between and then kindly ask
>> you to peek at the final (?) result and consider moving the code under
>> the Mahout umbrella. Sounds good?
>>
>> Dawid
>>
>
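
For anyone following the thread, the low-bit problem described in the
quoted message can be reproduced with a small sketch. This is not HPPC's
code: the class name, table size, and key pattern below are made up for
illustration, and the remixing step is the 64-bit finalizer from
MurmurHash3, assumed here as a stand-in for whatever murmur variant HPPC
actually uses. It only shows that masking Long.hashCode() with 2^n - 1
collapses keys that differ in their upper bits into one slot, while a
remixed hash spreads them out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not taken from HPPC or Mahout.
final class HashRemixSketch {
    static final int TABLE_SIZE = 1 << 10;   // 2^n slots (assumed size)
    static final int MASK = TABLE_SIZE - 1;  // bitmask used instead of modulo

    // 64-bit finalizer ("fmix64") from MurmurHash3.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Slot chosen from the object's own hash, as a naive 2^n table would do.
    static int slotPlain(long key) {
        return Long.hashCode(key) & MASK;
    }

    // Slot chosen after remixing the key's bits.
    static int slotRemixed(long key) {
        return (int) fmix64(key) & MASK;
    }

    // Counts how many distinct slots are hit by keys whose low 16 bits are all zero.
    static int countDistinctSlots(boolean remix) {
        Set<Integer> slots = new HashSet<>();
        for (long i = 0; i < TABLE_SIZE; i++) {
            long key = i << 16; // distinct keys, identical low bits
            slots.add(remix ? slotRemixed(key) : slotPlain(key));
        }
        return slots.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct slots, plain hash:   " + countDistinctSlots(false));
        System.out.println("distinct slots, remixed hash: " + countDistinctSlots(true));
    }
}
```

With the plain hash every key lands in slot 0, which is the "really long
conflict chain" case; the remixed hash uses a large share of the table.
The extra multiplications are cheap next to a cache miss, which fits the
benchmark observation in the quoted message.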
