Thanks Shixiong, I'll try out your PR. Do you know its current status? Are there any plans to incorporate this change into DataFrames/SchemaRDDs in Spark 1.3?
Soila

On Thu, Mar 12, 2015 at 7:52 PM, Shixiong Zhu <zsxw...@gmail.com> wrote:
> I sent a PR to add skewed join last year:
> https://github.com/apache/spark/pull/3505
> However, it does not split a key across multiple partitions. Instead, if a key
> has too many values to fit in memory, it stores the values on disk
> temporarily and uses the disk files to do the join.
>
> Best Regards,
>
> Shixiong Zhu
>
> 2015-03-13 9:37 GMT+08:00 Soila Pertet Kavulya <skavu...@gmail.com>:
>>
>> Does Spark support skewed joins similar to Pig, which distributes large
>> keys over multiple partitions? I tried using the RangePartitioner, but
>> I am still experiencing failures because some keys are too large to
>> fit in a single partition. I cannot use broadcast variables to
>> work around this because both RDDs are too large to fit in driver
>> memory.
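
For reference, one common way to approximate a Pig-style skewed join on plain RDDs is key "salting": append a random salt to each key on the skewed side so a hot key is spread over several partitions, and replicate the other side once per salt value so matches are preserved. The sketch below is a hypothetical helper (saltedJoin and saltFactor are illustrative names, not part of Spark or of PR 3505, which spills to disk instead); it assumes the non-skewed side is small enough that replicating it saltFactor times is acceptable.

import scala.reflect.ClassTag
import scala.util.Random
import org.apache.spark.rdd.RDD

// Salt the skewed side's keys so one hot key maps to saltFactor distinct
// composite keys, then replicate the other (smaller) side once per salt
// so every salted key still finds all of its matches.
def saltedJoin[K: ClassTag, V: ClassTag, W: ClassTag](
    skewed: RDD[(K, V)],
    other: RDD[(K, W)],
    saltFactor: Int): RDD[(K, (V, W))] = {
  // Skewed side: attach a random salt in [0, saltFactor) to each key.
  val saltedLeft = skewed.map { case (k, v) =>
    ((k, Random.nextInt(saltFactor)), v)
  }
  // Other side: emit one copy of each record per salt value.
  val saltedRight = other.flatMap { case (k, w) =>
    (0 until saltFactor).map(salt => ((k, salt), w))
  }
  // Join on (key, salt), then drop the salt from the result.
  saltedLeft.join(saltedRight).map { case ((k, _), vw) => (k, vw) }
}

Note the trade-off: this multiplies the replicated side's shuffle volume by saltFactor for every key, whereas Pig's skewed join first samples the data to find the hot keys and only splits those, so salting is mainly a win when the skewed keys dominate the join.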