Sorry about that; it made me laugh 6 years ago, and I didn't expect it to
come back to haunt me :)...

There are ways out of this, though none of them is particularly appealing:
- Add a SQL conf to make the value configurable.
- Add a seed parameter to the function. I am not sure if we can make this
work well with star expansion (e.g. xxhash64(*) is allowed).
- Add a new function that allows you to set the seed, e.g.
xxhash64_with_seed(<seed>, <value 1>, ..., <value n>).
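
For concreteness, a rough Python sketch of the shape the third option could
take; xxhash64_with_seed, its seed-first argument order, and the FNV-1a-style
mixing are all hypothetical stand-ins, not an existing Spark API:

```python
def xxhash64_with_seed(seed, *values):
    """Hypothetical seed-first signature: putting the seed before the values
    means star expansion could still supply the value columns unambiguously.
    The body is a toy FNV-1a-style mix standing in for the real XXH64 core."""
    h = seed & 0xFFFFFFFFFFFFFFFF
    for v in values:
        for b in repr(v).encode():
            h ^= b
            # FNV-1a prime, reduced mod 2^64 to stay in 64 bits
            h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h
```

Since each per-byte step (xor, multiply by an odd constant mod 2^64) is a
bijection in the running state, two different seeds never collide on the same
inputs, which is also why a seed=42 default diverges from seed=0 everywhere.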

On Mon, Sep 26, 2022 at 8:27 PM Sean Owen <sro...@gmail.com> wrote:

> Oh yeah, I get why we love to pick 42 for random things. I'm guessing it
> was a bit of an oversight here, as the 'seed' is used directly as the
> initial state, and 0 would make much more sense.
>
> On Mon, Sep 26, 2022, 7:24 PM Nicholas Gustafson <njgustaf...@gmail.com>
> wrote:
>
>> I don’t know the reason, however would offer a hunch that perhaps it’s a
>> nod to Douglas Adams (author of The Hitchhiker’s Guide to the Galaxy).
>>
>>
>> https://news.mit.edu/2019/answer-life-universe-and-everything-sum-three-cubes-mathematics-0910
>>
>> On Sep 26, 2022, at 16:59, Sean Owen <sro...@gmail.com> wrote:
>>
>> 
>> OK, it came to my attention today that hash functions in Spark, like
>> xxhash64, actually always seed with 42:
>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L655
>>
>> This is an issue if you want the hash of some value in Spark to match the
>> hash you compute with xxhash64 somewhere else: AFAICT, most any other
>> implementation will start with seed=0.
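>>
>> As a toy illustration of why this breaks interop (using Python's
>> zlib.crc32 as a stand-in for xxhash64, since it likewise treats its
>> optional second argument as the starting state of the hash):

```python
import zlib

data = b"spark"

# The same bytes hashed with starting state 0 vs 42 give different results,
# so a consumer defaulting to 0 can never match a producer defaulting to 42.
h_default = zlib.crc32(data)      # starting value 0, the usual default
h_seeded = zlib.crc32(data, 42)   # starting value 42

assert h_default != h_seeded
```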
>>
>> I'm guessing there wasn't a *great* reason for this; 42 just seemed like
>> a nice default seed. We can't change it now without possibly subtly
>> changing program behaviors. And I'm guessing it's messy to let the
>> function take a seed argument now, especially in SQL.
>>
>> So I'm left with: I guess we should document that? I can do it if so.
>> And it's a cautionary tale, I guess, for users of these hash functions.
>>
>>
