Re: Is spark.driver.maxResultSize used correctly ?

2016-03-01 Thread Jeff Zhang
Checked the code again. It looks like currently the task result is loaded into driver memory no matter whether it is a DirectTaskResult or an IndirectTaskResult. Previously I thought an IndirectTaskResult could be loaded into memory lazily, which would save memory; RDD#collectAsIterator is what I had in mind for saving memory.
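The driver-side behavior described above can be sketched roughly as follows. This is a simplified illustration of the result-fetching path (in the spirit of TaskResultGetter), not the actual Spark source; helper names like `process` and `taskSetManager` are stand-ins:

```scala
// Simplified sketch of driver-side result handling; not the real Spark code.
def handleTaskResult(serializedData: java.nio.ByteBuffer): Unit = {
  serializer.deserialize[TaskResult[_]](serializedData) match {
    case direct: DirectTaskResult[_] =>
      // The whole result is already in driver memory at this point;
      // the size check can only reject it after the fact.
      if (!taskSetManager.canFetchMoreResults(serializedData.limit())) return
      process(direct)
    case IndirectTaskResult(blockId, size) =>
      // The size check happens first here, but when it passes the full
      // block is still fetched and deserialized into driver memory
      // eagerly -- which is the point being made in this message.
      if (!taskSetManager.canFetchMoreResults(size)) return
      val block = blockManager.getRemoteBytes(blockId).get
      process(serializer.deserialize[DirectTaskResult[_]](block))
  }
}
```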

Re: Is spark.driver.maxResultSize used correctly ?

2016-03-01 Thread Reynold Xin
How big of a deal is this, though? If I am reading your email correctly, either way the job will fail. You simply want it to fail earlier on the executor side, rather than collecting the result and failing on the driver side? On Sunday, February 28, 2016, Jeff Zhang wrote: > data skew

Re: Is spark.driver.maxResultSize used correctly ?

2016-02-28 Thread Jeff Zhang
Data skew might be possible, but it is not the common case. I think we should design for the common case; for the skew case, we could add some fraction parameter to allow the user to tune it. On Sat, Feb 27, 2016 at 4:51 PM, Reynold Xin wrote: > But sometimes you might have skew
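A tunable fraction along the lines suggested here might look like the following. Note that `spark.driver.maxResultSize.taskFraction` is an invented name for illustration, not an existing Spark configuration:

```scala
// Hypothetical executor-side check: cap a single task's result at a
// user-tunable fraction of spark.driver.maxResultSize, instead of
// checking each task against the whole driver-side budget.
// "taskFraction" is a made-up parameter name for this sketch.
val maxResultSize = conf.getSizeAsBytes("spark.driver.maxResultSize", "1g")
val taskFraction  = conf.getDouble("spark.driver.maxResultSize.taskFraction", 0.2)
val perTaskLimit  = (maxResultSize * taskFraction).toLong

if (resultSize > perTaskLimit) {
  // Fail fast on the executor in the common (non-skewed) case, rather
  // than shipping a result the driver would reject anyway.
  throw new SparkException(
    s"Task result ($resultSize bytes) exceeds per-task limit ($perTaskLimit bytes)")
}
```

A user with known skew could raise the fraction toward 1.0 to allow a few large tasks through.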

Re: Is spark.driver.maxResultSize used correctly ?

2016-02-27 Thread Reynold Xin
But sometimes you might have skew, and almost all of the result data is in one or a few tasks. On Friday, February 26, 2016, Jeff Zhang wrote: > > My job get this exception very easily even when I set large value of > spark.driver.maxResultSize. After checking the spark

Is spark.driver.maxResultSize used correctly ?

2016-02-26 Thread Jeff Zhang
My job gets this exception very easily even when I set a large value for spark.driver.maxResultSize. After checking the Spark code, I found that spark.driver.maxResultSize is also used on the executor side to decide whether a DirectTaskResult or an IndirectTaskResult is sent. This doesn't make sense to me. Using
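The executor-side use of the setting that this question refers to works roughly as sketched below. This is a condensed illustration of the decision logic (in the spirit of Executor.run), not the exact source; logging and bookkeeping are omitted:

```scala
// Condensed sketch of how the executor decides what to send back;
// simplified from the real code, shown here only to frame the question.
val resultSize = serializedDirectResult.limit()
val serializedResult: java.nio.ByteBuffer =
  if (maxResultSize > 0 && resultSize > maxResultSize) {
    // Result exceeds spark.driver.maxResultSize: the data is dropped and
    // only a reference carrying the size is sent, so the driver can
    // reject it without fetching.
    ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
  } else if (resultSize > maxDirectResultSize) {
    // Too big to piggyback on the status update: store the bytes in the
    // block manager and send an indirect reference for the driver to fetch.
    val blockId = TaskResultBlockId(taskId)
    env.blockManager.putBytes(blockId, serializedDirectResult,
      StorageLevel.MEMORY_AND_DISK_SER)
    ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
  } else {
    serializedDirectResult // small enough to send directly
  }
```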