Hi, I need to join two big tables in Hive. The join key is the grain of both tables, so clustering and sorting both tables on that key should significantly improve join performance.
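To make the setup concrete, here is a rough sketch of the kind of DDL I have in mind. The table and column names are made up for illustration, and the 64 in INTO 64 BUCKETS is only a placeholder, not a recommendation.

```sql
-- Hypothetical table and column names, for illustration only.
-- The bucket count (64) is a placeholder value.
CREATE TABLE orders_bucketed (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(18,2)
)
CLUSTERED BY (order_id)        -- join key, which is also the grain of the table
SORTED BY (order_id ASC)       -- sorted within each bucket on the same key
INTO 64 BUCKETS
STORED AS ORC;                 -- or STORED AS PARQUET
```

The second table would be declared the same way, clustered and sorted on the same join key.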
However, I am not sure how to calculate the right number of buckets when creating these tables. Can someone please share any pointers? I am planning to store these clustered and sorted tables as Parquet or ORC, for columnar storage and better compression. Thanks, Saurabh