RE: Issues with partitionBy: FetchFailed

2014-09-21 Shao, Saisai
Hi, I've also run into this problem before. You could try setting "spark.core.connection.ack.wait.timeout" to a larger value to avoid ack timeouts; the default is 60 seconds. Sometimes, because of a GC pause or for other reasons, the acknowledgement message times out, which will lead to this
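For reference, a minimal sketch of applying the suggested setting programmatically. It assumes a Spark 1.x Scala application that builds its own SparkConf and gets its master URL from spark-submit; the app name, the object name and the 600-second value are hypothetical illustrative choices, not values recommended in this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AckTimeoutExample { // hypothetical object name
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("partitionBy-job") // hypothetical app name
      // Value is in seconds (default 60 per the reply above); 600 is an arbitrary example.
      .set("spark.core.connection.ack.wait.timeout", "600")

    val sc = new SparkContext(conf)
    try {
      // ... build the pair RDD and call partitionBy here ...
    } finally {
      sc.stop() // release executors even if the job fails
    }
  }
}
```

Raising the timeout only hides slow acknowledgements; if the root cause is something else (as the next reply argues), the fetch failures may persist.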

Re: Issues with partitionBy: FetchFailed

2014-09-21 David Rowe
Hi, I've seen this problem before, and I'm not convinced it's GC. When Spark shuffles, it writes a lot of small files to store the data to be sent to other executors (AFAICT). According to what I've read, the intention is that these files be stored in disk buffers, and since