Any limitations of spark.shuffle.spill?

2014-11-05 Thread Yangcheng Huang
Hi

One question about the power of spark.shuffle.spill -
(I know this has been asked several times :-)

Basically, in handling a (cached) dataset that doesn't fit in memory, Spark can 
spill it to disk.
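(For context, these are the knobs I am referring to, shown with what I understand to be the documented Spark 1.x defaults; this is just a sketch, not my actual job configuration:)

  import org.apache.spark.SparkConf

  // Minimal sketch (Spark 1.x settings); values shown are the documented
  // defaults, listed only to make the relevant knobs explicit.
  val conf = new SparkConf()
    .setAppName("shuffle-spill-example")          // placeholder app name
    .set("spark.shuffle.spill", "true")           // spilling is on by default
    .set("spark.shuffle.memoryFraction", "0.2")   // in-memory budget before spilling
    .set("spark.shuffle.spill.compress", "true")  // compress spilled shuffle data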

However, can I say that, when this is enabled, Spark can handle the situation
faultlessly, no matter:

(1) How big the data set is (compared to the available memory)

(2) How complex the computation being carried out is
Can spark.shuffle.spill handle this perfectly?

Here we assume that (1) the disk space has no limitations and (2) the code is 
correctly written according to the functional requirements.

The reason I ask is that, in such situations, I keep receiving warnings like
"FetchFailed" when memory usage reaches the limit.

Thanks
YC


Re: Any limitations of spark.shuffle.spill?

2014-11-05 Thread Shixiong Zhu
Two limitations we found here:
http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemory-in-quot-cogroup-quot-td17349.html
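
In short, one well-known limitation is that spilling helps with the overall
shuffle data size, but all the values for a single key still have to fit in
memory when they are grouped. A rough sketch of the kind of job that can still
fail even with spark.shuffle.spill enabled (the sizes here are made up purely
for illustration):

  import org.apache.spark.{SparkConf, SparkContext}

  object SkewedGroupBySketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("skew-sketch"))

      // Illustrative skew: every record maps to the same key, so the grouped
      // Iterable for that key must be materialized in memory on one reducer,
      // regardless of whether shuffle spilling is enabled.
      val skewed = sc.parallelize(1 to 100000000, 200).map(i => ("hot", i))

      val grouped = skewed.groupByKey()
      println(grouped.mapValues(_.size).first())

      sc.stop()
    }
  }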

Best Regards,
Shixiong Zhu
