It doesn't.
However, if you have a very large number of keys, with a small number of
very large keys, you can do one of the following:
A. Use a custom partitioner that counts the number of items per key and
avoids putting large keys together; alternatively, if feasible (and
needed), include part of the value in the key so that a single hot key is
split up (key salting).
Hi, I have a DataFrame with heavily skewed data (on the order of TBs), and
I am doing a groupBy on 8 fields, which unfortunately I can't avoid. I am
looking to optimize this. I have found that Hive has
set hive.groupby.skewindata=true;
I don't use Hive; I have a Spark DataFrame. Can we achieve the above in
Spark? Please guide.
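What `hive.groupby.skewindata=true` does internally is a two-stage aggregation: rows are first partially aggregated under a randomized (salted) key so no single reducer receives all rows of a hot key, and the partials are then merged under the original key. The same idea (key salting) applies to a skewed Spark groupBy. As a minimal, Spark-free sketch of the principle (function name and salt count are illustrative, not from any library):

```python
import random
from collections import defaultdict

def salted_group_sum(records, num_salts=4):
    """Two-stage aggregation: salt the key to spread a hot key
    across several partial aggregates, then merge the partials."""
    # Stage 1: pre-aggregate on (key, salt). A hot key is split into
    # up to num_salts partial sums instead of one giant group.
    partial = defaultdict(int)
    for key, value in records:
        salt = random.randrange(num_salts)
        partial[(key, salt)] += value
    # Stage 2: strip the salt and combine the partial sums per key.
    final = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(final)

# One hot key ("hot") dominates the data, as in a skewed groupBy.
rows = [("hot", 1)] * 1000 + [("rare", 2)] * 3
totals = salted_group_sum(rows)
```

In Spark DataFrame terms, the analogous (hand-rolled) approach is to add a random salt column, do a first groupBy on the 8 fields plus the salt, then a second groupBy on the 8 fields alone to merge the partials. Note this works for algebraic aggregates like sum/count/min/max; also, on recent Spark versions, Adaptive Query Execution has skew handling that may help without manual salting.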