But sometimes you might have skew, with almost all of the result data concentrated in one or a few tasks.
On Friday, February 26, 2016, Jeff Zhang <zjf...@gmail.com> wrote:

> My job gets this exception very easily, even when I set a large value of
> spark.driver.maxResultSize. After checking the Spark code, I found that
> spark.driver.maxResultSize is also used on the executor side to decide
> whether a DirectTaskResult or an IndirectTaskResult is sent. This doesn't
> make sense to me. Using spark.driver.maxResultSize / taskNum might be more
> appropriate. If spark.driver.maxResultSize is 1g and we have 10 tasks,
> each with 200m of output, then the output of each task is less than
> spark.driver.maxResultSize, so a DirectTaskResult will be sent to the
> driver, but the total result size is 2g, which causes an exception on the
> driver side.
>
> 16/02/26 10:10:49 INFO DAGScheduler: Job 4 failed: treeAggregate at
> LogisticRegression.scala:283, took 33.796379 s
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Total size of serialized results of 1 tasks (1085.0
> MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
>
>
> --
> Best Regards
>
> Jeff Zhang
>
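To make the argument concrete, here is a minimal, self-contained sketch (not Spark's actual source) of the two policies being contrasted: the current per-task check against the full spark.driver.maxResultSize budget, versus the proposed per-task check against maxResultSize / taskNum. The object and method names (ResultSizeCheck, current, proposed) are illustrative.

```scala
// Sketch of the executor-side decision described above. An executor either
// sends the serialized result inline to the driver (Direct) or stores it in
// the block manager and sends only a reference (Indirect).
object ResultSizeCheck {
  sealed trait TaskResultKind
  case object Direct extends TaskResultKind   // serialized bytes sent inline
  case object Indirect extends TaskResultKind // driver fetches via block manager

  // Current behavior as described: each task checks its own result size
  // against the whole driver-side budget.
  def current(taskResultSize: Long, maxResultSize: Long): TaskResultKind =
    if (maxResultSize > 0 && taskResultSize > maxResultSize) Indirect else Direct

  // Proposed behavior: divide the budget by the number of tasks, so the sum
  // of all direct results cannot exceed maxResultSize.
  def proposed(taskResultSize: Long, maxResultSize: Long, numTasks: Int): TaskResultKind =
    if (maxResultSize > 0 && taskResultSize > maxResultSize / numTasks) Indirect else Direct

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024
    val gb = 1024L * mb
    // The example from the mail: 10 tasks of 200 MB each against a 1 GB budget.
    // Each task passes the per-task check, but the 2 GB total still aborts
    // the job on the driver.
    println(current(200 * mb, 1 * gb))      // Direct
    println(proposed(200 * mb, 1 * gb, 10)) // Indirect (200 MB > 1 GB / 10)
  }
}
```

As a stopgap, raising the driver-side budget itself (e.g. conf.set("spark.driver.maxResultSize", "4g") on the SparkConf, or "0" for unlimited) avoids the abort, at the cost of more memory pressure on the driver. Note that with skew, as mentioned above, the per-task division in proposed could push a single large result to the indirect path even when the total would have fit.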