RE: Bucketing- Identify Number of Buckets

2015-09-06 Thread Mich Talebzadeh
Hi Saurabh. Bucketing in Hive refers to hash partitioning, where a hashing function is applied to the bucketing column. As in an RDBMS, Hive will apply a linear hashing algorithm to prevent data from clustering within specific partitions. Hashing is very effective if the column selected for bucketing has very high
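
To make the hash-partitioning idea concrete, here is a minimal sketch in Python. It assumes a simple mask-and-modulo scheme on integer keys; the names `bucket_for` and `bucket_counts` are hypothetical, and the hashing Hive actually applies may differ by column type and version. It only illustrates how the number of distinct key values affects the spread of rows across buckets.

```python
from collections import Counter


def bucket_for(key: int, num_buckets: int) -> int:
    """Map an integer key to a bucket index in [0, num_buckets)."""
    # Assumption: the hash of an integer key is the integer itself.
    # Masking with 0x7FFFFFFF keeps the value non-negative before the modulo.
    return (key & 0x7FFFFFFF) % num_buckets


def bucket_counts(keys, num_buckets: int) -> Counter:
    """Count how many rows land in each bucket."""
    return Counter(bucket_for(k, num_buckets) for k in keys)


# Many distinct keys: rows spread evenly over all 4 buckets.
many_distinct = bucket_counts(range(1000), 4)

# Only two distinct keys: at most two of the 4 buckets receive any rows.
few_distinct = bucket_counts([1, 2] * 500, 4)

print(sorted(many_distinct.items()))
print(sorted(few_distinct.items()))
```

With few distinct values in the bucketing column, most buckets stay empty no matter how many buckets are declared, which is one reason the choice of column matters as much as the bucket count.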

Re: Bucketing- Identify Number of Buckets

2015-09-06 Thread Db-Blog
Details of Hive version: I am using Hive 0.14.0 with Tez as the execution engine.

Thanks,
Saurabh

Sent from my iPhone, please avoid typos.

> On 07-Sep-2015, at 1:51 am, Db-Blog wrote:
>
> Hi,
>
> I need to join two big tables in Hive. The join key is the grain of both

Bucketing- Identify Number of Buckets

2015-09-06 Thread Db-Blog
Hi, I need to join two big tables in Hive. The join key is the grain of both tables, so clustering and sorting on it should significantly improve join performance. However, I am not sure how to calculate the exact number of buckets while creating these