Hi, 

I need to join two big tables in Hive. The join key is the grain of both 
tables, so clustering and sorting both tables on that key should give a 
significant performance improvement for the join. 

However, I am not sure how to calculate the exact number of buckets to use when 
creating these tables. Can someone please share any pointers? 

I am planning to store these clustered and sorted tables as Parquet or ORC, for 
columnar storage and better compression. 
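
For reference, here is a rough sketch of the kind of DDL I have in mind. The 
table and column names are made up for illustration, and the bucket count of 64 
is only a placeholder, since that is the number I do not know how to choose. My 
understanding is that both tables need the same bucket count on the join key 
for a sort-merge-bucket join, and that the SET options below are the relevant 
ones, but please correct me if that is wrong for current Hive versions. 

```sql
-- Hypothetical table/column names; 64 is a placeholder, not a recommendation.
CREATE TABLE orders_bucketed (
  order_id BIGINT,
  payload  STRING
)
CLUSTERED BY (order_id) SORTED BY (order_id ASC) INTO 64 BUCKETS
STORED AS ORC;

-- Second table: same join key, same sort order, same bucket count.
CREATE TABLE order_details_bucketed (
  order_id BIGINT,
  detail   STRING
)
CLUSTERED BY (order_id) SORTED BY (order_id ASC) INTO 64 BUCKETS
STORED AS ORC;

-- Settings I assume are needed for the join to use the bucketing.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.auto.convert.sortmerge.join = true;

SELECT o.order_id, o.payload, d.detail
FROM orders_bucketed o
JOIN order_details_bucketed d
  ON o.order_id = d.order_id;
```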

Thanks,
Saurabh
