I am performing a join operation. If I convert the reduce-side join to a map-side join, no shuffle will happen, so I assume this error shouldn't occur. Let me know if this understanding is correct.
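For reference, a minimal sketch of forcing a map-side join in Spark via a broadcast hint, assuming the smaller side fits in executor memory. The table names (facts, dims) and the join key (id) are illustrative, not from this thread:

// Map-side (broadcast) join sketch. broadcast() hints Spark to ship the
// small table to every executor, so the large table is joined in place
// and no shuffle (hence no oversized shuffle frame) is needed.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("broadcast-join").getOrCreate()

val facts = spark.table("facts") // large table
val dims  = spark.table("dims")  // small table; must fit in executor memory

val joined = facts.join(broadcast(dims), Seq("id"))

Note this only sidesteps the shuffle for the join itself, and the broadcast side has to be small enough to replicate to every executor.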
On Tue, May 1, 2018 at 9:37 PM, Ryan Blue <rb...@netflix.com> wrote:

> This is usually caused by skew. Sometimes you can work around it by
> increasing the number of partitions like you tried, but when that doesn't
> work you need to change the partitioning that you're using.
>
> If you're aggregating, try adding an intermediate aggregation. For
> example, if your query is select sum(x), a from t group by a, then try
> select sum(partial), a from (select sum(x) as partial, a, b from t group
> by a, b) group by a.
>
> rb
>
> On Tue, May 1, 2018 at 4:21 AM, Pralabh Kumar <pralabhku...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am getting the above error in Spark SQL. I have increased the number
>> of partitions (to 5000) but am still getting the same error.
>>
>> My data is most probably skewed.
>>
>> org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829
>>     at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:419)
>>     at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:349)
>
> --
> Ryan Blue
> Software Engineer
> Netflix
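For reference, a minimal sketch of the intermediate aggregation Ryan describes, run through spark.sql; the table t and columns x, a, b come from his quoted example, and an active SparkSession named spark is assumed:

// Stage 1: pre-aggregate on the finer key (a, b), so no single reducer
// has to fetch all rows for a hot value of a.
val partial = spark.sql(
  "SELECT sum(x) AS partial, a, b FROM t GROUP BY a, b")
partial.createOrReplaceTempView("partial_agg")

// Stage 2: combine the much smaller partial sums. Each shuffle block now
// stays well under the ~2 GB frame limit behind the "Too large frame"
// FetchFailedException (4247124829 bytes exceeds Int.MaxValue).
val result = spark.sql(
  "SELECT sum(partial) AS total, a FROM partial_agg GROUP BY a")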