Hi,

I am hitting a FetchFailed issue when the driver tries to collect about
2.5M lines of short strings (about 10 characters per line) from a YARN
cluster with 400 nodes:

14/08/22 11:43:27 WARN scheduler.TaskSetManager: Lost task 205.0 in stage
0.0 (TID 1228, aaa.xxx.com): FetchFailed(BlockManagerId(220, aaa.xxx.com,
37899, 0), shuffleId=0, mapId=420, reduceId=205)
14/08/22 11:43:27 WARN scheduler.TaskSetManager: Lost task 603.0 in stage
0.0 (TID 1626, aaa.xxx.com): FetchFailed(BlockManagerId(220, aaa.xxx.com,
37899, 0), shuffleId=0, mapId=420, reduceId=603)

Other than these FetchFailed warnings, I cannot see anything else in the
log files (no OOM errors shown).

This does not happen when there are only 2M lines. I guessed it might be
because of the Akka message size, so I set the following:

spark.akka.frameSize  100
spark.akka.timeout      200

That did not help either. Has anyone experienced similar problems?
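For completeness, here is how I applied those settings, passed as --conf
flags to spark-submit (a sketch; spark.akka.frameSize is in MB and
spark.akka.timeout in seconds on the 1.x property names I am using):

```
spark-submit \
  --master yarn \
  --conf spark.akka.frameSize=100 \
  --conf spark.akka.timeout=200 \
  my-app.jar
```

The same two lines can also go into conf/spark-defaults.conf, as quoted above.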

Thanks,
Jiayu



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/FetchFailed-when-collect-at-YARN-cluster-tp12670.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
