I am performing join operation , if I convert reduce side join to map side
(no shuffle will happen)  and I assume in that case this error shouldn't
come. Let me know if this understanding is correct

On Tue, May 1, 2018 at 9:37 PM, Ryan Blue <rb...@netflix.com> wrote:

> This is usually caused by skew. Sometimes you can work around it by in
> creasing the number of partitions like you tried, but when that doesn’t
> work you need to change the partitioning that you’re using.
>
> If you’re aggregating, try adding an intermediate aggregation. For
> example, if your query is select sum(x), a from t group by a, then try select
> sum(partial), a from (select sum(x) as partial, a, b from t group by a, b)
> group by a.
>
> rb
> ​
>
> On Tue, May 1, 2018 at 4:21 AM, Pralabh Kumar <pralabhku...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am getting the above error in Spark SQL . I have increase (using 5000 )
>> number of partitions but still getting the same error .
>>
>> My data most probably is skew.
>>
>>
>>
>> org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829
>>      at 
>> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:419)
>>      at 
>> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:349)
>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Reply via email to