Re: Does DataFrame have something like set hive.groupby.skewindata=true;

2016-05-23, Virgil Palanciuc
It doesn't. However, if you have a very large number of keys, a few of which are very large, you can do one of the following: A. Use a custom partitioner that counts the number of items for each key and avoids putting large keys together; alternatively, if feasible (and needed), include part
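A minimal plain-Python sketch of the partition-assignment logic behind option A, assuming the per-key counts fit in driver memory. It uses a greedy "largest key first, into the lightest partition" heuristic, which is one possible way to keep large keys apart and is not necessarily what the poster had in mind. The function name assign_partitions is hypothetical. In Spark, a mapping like this could then back a custom partitioner (for example, a lookup inside the function passed to RDD.partitionBy in PySpark, or an org.apache.spark.Partitioner subclass in Scala). Note that DataFrames do not take a custom partitioner directly, so this route goes through the RDD API.

```python
import heapq
from collections import Counter


def assign_partitions(keys, num_partitions):
    """Map each distinct key to a partition id so heavy keys are spread out.

    Hypothetical helper: counts items per key, then visits keys from the
    largest count to the smallest and places each one in the partition
    with the smallest load accumulated so far.
    """
    counts = Counter(keys)
    # Min-heap of (current_load, partition_id); ties break on partition id.
    heap = [(0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending count; str(key) makes tie-breaking deterministic.
    for key, count in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        load, p = heapq.heappop(heap)
        assignment[key] = p
        heapq.heappush(heap, (load + count, p))
    return assignment


keys = ["a"] * 100 + ["b"] * 90 + ["c"] * 10 + ["d"] * 5
print(assign_partitions(keys, 2))  # {'a': 0, 'b': 1, 'c': 1, 'd': 0}
```

The two heaviest keys land in different partitions, and the small keys fill in around them. For terabytes of data, the counts would come from a prior aggregation pass rather than from Counter over raw rows.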

Does DataFrame have something like set hive.groupby.skewindata=true;

2016-05-21, unk1102
Hi, I have a DataFrame with terabytes of heavily skewed data, and I am doing a groupBy on 8 fields, which unfortunately I can't avoid. I am looking to optimize this. I found that Hive has set hive.groupby.skewindata=true; but I don't use Hive, I have a Spark DataFrame. Can we achieve the same in Spark? Please advise.
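Hive's skewindata option is commonly described as splitting a skewed GROUP BY into two stages: the first spreads rows randomly and computes partial aggregates, and the second combines the partials by the real grouping key. A similar effect can be had manually with "salting." The sketch below simulates this in plain Python, not Spark. It assumes the aggregate is decomposable (sum, count, min, max; an average would need sum and count carried separately). The function two_stage_sum and the num_salts parameter are hypothetical names. In a DataFrame this would roughly correspond to adding a random integer salt column (e.g. derived from pyspark.sql.functions.rand), grouping by the 8 fields plus the salt, then grouping the result by the 8 fields alone.

```python
import random
from collections import defaultdict


def two_stage_sum(rows, num_salts, seed=0):
    """Sum values per key in two stages, similar in spirit to skew-aware group-by.

    rows: iterable of (key, value) pairs; key may be a tuple of several fields.
    Stage 1 pairs each key with a random salt in [0, num_salts), so one hot key
    is split into up to num_salts smaller groups that could be aggregated in
    parallel. Stage 2 drops the salt and combines the partial sums per key.
    """
    rng = random.Random(seed)
    partial = defaultdict(int)
    for key, value in rows:
        salt = rng.randrange(num_salts)
        partial[(key, salt)] += value
    final = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(final)


rows = [(("hot", 1), 1)] * 1000 + [(("cold", 2), 2)] * 3
print(two_stage_sum(rows, num_salts=8))  # {('hot', 1): 1000, ('cold', 2): 6}
```

The salt only changes how work is split in stage 1, so the final result matches a plain single-stage group-by. The trade-off is an extra aggregation pass in exchange for not funnelling every row of a hot key to a single task.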