I sent a PR to add skewed join last year:
https://github.com/apache/spark/pull/3505
However, it does not split a key to multiple partitions. Instead, if a key
has too many values that can not be fit in to memory, it will store the
values into the disk temporarily and use disk files to do the join.

Best Regards,
Shixiong Zhu

2015-03-13 9:37 GMT+08:00 Soila Pertet Kavulya <skavu...@gmail.com>:

> Does Spark support skewed joins similar to Pig which distributes large
> keys over multiple partitions? I tried using the RangePartitioner but
> I am still experiencing failures because some keys are too large to
> fit in a single partition. I cannot use broadcast variables to
> work-around this because both RDDs are too large to fit in driver
> memory.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

Reply via email to