Hello Soila,

Can you share code that shows usage of RangePartitioner? I am facing an issue with .join() where one task runs forever. I tried repartition(100/200/300/1200) and it did not help. I cannot use a map-side join because both datasets are huge and exceed the driver's memory.

Regards,
Deepak
On Fri, Mar 13, 2015 at 9:54 AM, Soila Pertet Kavulya <skavu...@gmail.com> wrote:
> Thanks Shixiong,
>
> I'll try out your PR. Do you know what the status of the PR is? Are
> there any plans to incorporate this change into the
> DataFrames/SchemaRDDs in Spark 1.3?
>
> Soila
>
> On Thu, Mar 12, 2015 at 7:52 PM, Shixiong Zhu <zsxw...@gmail.com> wrote:
> > I sent a PR to add skewed join last year:
> > https://github.com/apache/spark/pull/3505
> > However, it does not split a key across multiple partitions. Instead, if a key
> > has too many values to fit into memory, it stores the
> > values on disk temporarily and uses the disk files to do the join.
> >
> > Best Regards,
> >
> > Shixiong Zhu
> >
> > 2015-03-13 9:37 GMT+08:00 Soila Pertet Kavulya <skavu...@gmail.com>:
> >> Does Spark support skewed joins similar to Pig, which distributes large
> >> keys over multiple partitions? I tried using the RangePartitioner but
> >> I am still experiencing failures because some keys are too large to
> >> fit in a single partition. I cannot use broadcast variables to
> >> work around this because both RDDs are too large to fit in driver
> >> memory.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org

--
Deepak
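[Editor's note: the workaround the thread circles around — splitting one hot key across multiple partitions — is usually done by "salting" the key. The sketch below is not from the thread or from Shixiong's PR; it demonstrates the idea on plain Scala collections, with names and the salt factor invented for illustration. In Spark the same two map/flatMap transformations would be applied to the two RDDs before calling .join(), so no single partition receives all values of a hot key.]

```scala
import scala.util.Random

object SaltedJoin {
  // Sketch of a salted join: each key on the skewed side gets a random
  // suffix in [0, numSalts), turning one hot key into numSalts sub-keys;
  // the other side is replicated once per suffix so every sub-key matches.
  def saltedJoin(left: Seq[(String, Int)],      // skewed side
                 right: Seq[(String, String)],  // replicated side
                 numSalts: Int): Seq[(String, (Int, String))] = {
    val rnd = new Random(42) // fixed seed only to keep the sketch deterministic

    // Salt the skewed side: one random suffix per record.
    val saltedLeft = left.map { case (k, v) =>
      (s"$k#${rnd.nextInt(numSalts)}", v)
    }

    // Replicate the other side once per possible salt value.
    val saltedRight = right.flatMap { case (k, v) =>
      (0 until numSalts).map(i => (s"$k#$i", v))
    }

    // Plain hash join on the salted keys (stand-in for RDD.join),
    // stripping the salt from the output key afterwards.
    val rightMap = saltedRight.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
    for {
      (k, lv) <- saltedLeft
      rv      <- rightMap.getOrElse(k, Nil)
    } yield (k.takeWhile(_ != '#'), (lv, rv))
  }
}
```

The cost of this approach is replicating the non-skewed side numSalts times, which is still far cheaper than broadcasting it when both datasets exceed driver memory.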