I ran into the same issue when the dataset was very large. Marcelo from Cloudera found that it may be caused by SPARK-2711, so Cloudera's Spark 1.1 release reverted SPARK-2711, and the issue went away. See https://issues.apache.org/jira/browse/SPARK-3633 for details.
You can check out Cloudera's version here: https://github.com/cloudera/spark/tree/cdh5-1.1.0_5.2.0

PS: I haven't tested it yet, but I will test it in the next couple of days and report back.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Sat, Oct 18, 2014 at 6:22 PM, marylucy <qaz163wsx_...@hotmail.com> wrote:
> When doing a groupBy on big data (maybe 500g), some partition tasks
> succeed and some fail with a fetch failed error. Spark retries the
> previous stage, but it always fails.
> 6 computers: 384g
> Worker: 40g*7 for one computer
>
> Can anyone tell me why the fetch failed?