Yes, there are some public benchmark results, such as the official benchmark on the xxHash site (http://www.xxhash.com/) and the published comparison from the smhasher project (https://github.com/rurban/smhasher/).

On Tue, Jul 9, 2019 at 5:25 AM Wes McKinney <[email protected]> wrote:
>
> Do you have any benchmark data to support the choice of hash function?
>
> On Wed, Jul 3, 2019 at 8:41 AM 俊杰陈 <[email protected]> wrote:
> >
> > Dear Parquet developers
> >
> > To simplify the voting, I'd like to update voting content to the spec
> > with xxHash hash strategy. Now you can reply with +1 or -1.
> >
> > Thanks for your participation.
> >
> > On Tue, Jul 2, 2019 at 10:23 AM 俊杰陈 <[email protected]> wrote:
> > >
> > > Dear Parquet developers
> > >
> > > Parquet Bloom filter has been developed for a while, per the discussion
> > > on the mail list, it's time to call a vote for spec to move forward. The
> > > current spec can be found at
> > > https://github.com/apache/parquet-format/blob/master/BloomFilter.md.
> > > There are some different options about the internal hash choice of Bloom
> > > filter and the PR is for that concern.
> > >
> > > So I'd like to propose to vote the spec + hash option, for example:
> > >
> > > +1 to spec and xxHash
> > > +1 to spec and murmur3
> > > ...
> > >
> > > Please help to vote, any feedback is also welcome in the discussion
> > > thread.
> > >
> > > Thanks & Best Regards
> > >
> > >
> > --
> > Thanks & Best Regards

--
Thanks & Best Regards
