Hi Saurabh.
Bucketing in Hive refers to hash partitioning, where a hashing function is
applied to the bucketing column. As in an RDBMS, Hive will apply a linear hashing algorithm to
prevent data from clustering within specific partitions. Hashing is very
effective if the column selected for bucketing has very high
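
To make the hashing idea concrete, here is a minimal Python sketch of how rows could be assigned to buckets. It is only an illustration, not Hive's actual code: the rule "non-negative hash modulo the number of buckets", the use of the integer value as its own hash, and the names bucket_for and bucket_sizes are all assumptions made for this example. It shows that a column of unique IDs spreads evenly, while a column with only two distinct values leaves buckets empty.

```python
from collections import Counter


def bucket_for(key: int, num_buckets: int) -> int:
    # Assumed rule for illustration: mask the hash to a non-negative
    # 31-bit value, then take it modulo the bucket count.
    # For an integer key the hash is taken to be the value itself.
    return (key & 0x7FFFFFFF) % num_buckets


def bucket_sizes(keys, num_buckets):
    # Count how many keys land in each bucket, in bucket order.
    counts = Counter(bucket_for(k, num_buckets) for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


# 1000 unique IDs spread evenly across 4 buckets.
even = bucket_sizes(range(1, 1001), 4)

# A column with only two distinct values can fill at most two buckets.
skewed = bucket_sizes([1, 2] * 500, 4)

print(even)
print(skewed)
```

With 4 buckets, the unique IDs give four equally sized buckets, whereas the two-valued column puts all rows into two buckets and leaves the other two empty.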
Details of Hive Version:
I am using Hive 0.14.0 with Tez as the execution engine.
Thanks,
Saurabh
> On 07-Sep-2015, at 1:51 am, Db-Blog wrote:
Hi,
I need to join two big tables in Hive. The join key is the grain of both these
tables, hence clustering and sorting on the same will provide significant
performance optimisation while joining.
However, I am not sure how to calculate the exact number of buckets while
creating these