Hi,
I noticed that Spark contains a Hive Hash partitioning implementation, but
also that it isn't being used: Spark's hash partitioning function is
presently hardcoded to Murmur3. My question is whether Hive Hash is dead
code, or whether there are future plans to support reading and understanding
data that has been partitioned using Hive Hash? By "understanding", I mean
being able to avoid a full shuffle join on Table A (partitioned by Hive
Hash) when joining it with a Table B that I can shuffle via Hive Hash to
match Table A's layout.
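
For concreteness, here is a small sketch (my understanding of Hive's
bucketing scheme, not a Spark API) of what I'd hope Spark could do for
Table B: for an integer key, Hive's hash is the value itself, and the
bucket is (hash & Integer.MAX_VALUE) % numBuckets, which differs from
Spark's Murmur3-based assignment, hence the shuffle today.

```python
INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE

def hive_hash_int(value: int) -> int:
    # Hive hashes a Java int to the value itself
    return value

def hive_bucket(key: int, num_buckets: int) -> int:
    # Hive's bucket assignment: mask to a non-negative int, then mod
    return (hive_hash_int(key) & INT_MAX) % num_buckets

# A row with key 42 in an 8-bucket table lands in bucket 2
print(hive_bucket(42, 8))
```

If Spark could shuffle Table B with this same function, B's rows would
land in the buckets that already hold the matching keys of Table A, and
the join could proceed without re-shuffling A.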
Thank you,

Tyson