Re: [SQL] hash: 64-bits and seeding

2019-03-07 Thread Huon.Wilson
Thanks for the guidance. That was my initial inclination, but I decided that consistency with the existing ‘hash’ was better. However, like you, I also prefer the specific form. I’ve opened https://issues.apache.org/jira/browse/SPARK-27099 and submitted the patch (using ‘xxhash64’) at

[SQL] hash: 64-bits and seeding

2019-03-06 Thread Huon.Wilson
Hi, I’m working on something that requires deterministic randomness, i.e. a row gets the same “random” value no matter the order of the DataFrame. A seeded hash seems to be the perfect way to do this, but the existing hashes have various limitations:
- hash: 32-bit output (only 4 billion
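
To illustrate the "deterministic randomness" idea from the message above, here is a minimal sketch in plain Python. It is not Spark's implementation: it uses the standard library's keyed BLAKE2b as a stand-in for a seeded 64-bit hash such as xxhash64, and the helper names `seeded_hash64` and `deterministic_uniform` are hypothetical. It assumes each row has a stable string key and that the seed is a non-negative integer below 2**64.

```python
import hashlib


def seeded_hash64(value: str, seed: int) -> int:
    """Hypothetical helper: 64-bit hash of `value`, keyed by `seed`.

    Keyed BLAKE2b stands in for a seeded hash like xxhash64 here.
    The seed must satisfy 0 <= seed < 2**64.
    """
    digest = hashlib.blake2b(
        value.encode("utf-8"),
        digest_size=8,                   # 8 bytes = 64 bits of output
        key=seed.to_bytes(8, "little"),  # the seed acts as the hash key
    ).digest()
    return int.from_bytes(digest, "little")


def deterministic_uniform(value: str, seed: int) -> float:
    """Map a row key to a float in [0, 1) that depends only on key and seed."""
    # Keep the top 53 bits so the result is exactly representable
    # as a float and always strictly less than 1.0.
    return (seeded_hash64(value, seed) >> 11) / 2**53


rows = ["alice", "bob", "carol"]
shuffled = ["carol", "alice", "bob"]

# Each row gets the same value regardless of its position in the input.
values = {r: deterministic_uniform(r, seed=42) for r in rows}
values_shuffled = {r: deterministic_uniform(r, seed=42) for r in shuffled}
```

Because the value is a pure function of the row key and the seed, reordering the input does not change it, while a different seed gives an unrelated set of values. A 64-bit output also makes accidental collisions far less likely than with a 32-bit hash.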