Ziqi Liu created SPARK-40261:
--------------------------------

             Summary: TaskResult meta should not be counted into result size
                 Key: SPARK-40261
                 URL: https://issues.apache.org/jira/browse/SPARK-40261
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Ziqi Liu


This issue has existed for a long time (since 
[https://github.com/liuzqt/spark/commit/c33e55008239f417764d589c1366371d18331686]).

When calculating whether the results fetched by the driver exceed the 
`spark.driver.maxResultSize` limit, the size of the whole serialized task 
result is taken into account, including the task 
metadata ([accumUpdates|https://github.com/apache/spark/blob/c95ed826e23fdec6e1a779cfebde7b3364594fb5/core/src/main/scala/org/apache/spark/scheduler/TaskResult.scala#L41]).
 However, the metadata should not be counted, because it is discarded by the 
driver immediately after being processed.

This can lead to an exception when running jobs with a huge number of tasks 
that actually return small results.
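
A hypothetical reproduction sketch (the concrete numbers here are 
assumptions; how many tasks it takes to trip the limit depends on the 
serialized metadata size per task):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical repro: each task returns a single Int, so the real result
// data is tiny, but every serialized task result also carries accumulator
// updates and metric peaks. With enough tasks, the size counted against
// spark.driver.maxResultSize can exceed the limit and collect() fails.
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("SPARK-40261-repro")
  .set("spark.driver.maxResultSize", "1m") // deliberately low limit

val sc = new SparkContext(conf)
try {
  // 20000 tiny results; the actual collected values are far below 1 MB,
  // yet the job may still be aborted under the current accounting.
  sc.parallelize(1 to 20000, 20000).map(_ => 1).collect()
} finally {
  sc.stop()
}
```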

Therefore we should only count 
[valueBytes|https://github.com/apache/spark/blob/c95ed826e23fdec6e1a779cfebde7b3364594fb5/core/src/main/scala/org/apache/spark/scheduler/TaskResult.scala#L40]
 when checking against the result size limit, as sketched below.
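
A minimal, self-contained sketch of the idea. The real check lives in 
TaskResultGetter.enqueueSuccessfulTask, which passes `serializedData.limit()` 
to TaskSetManager.canFetchMoreResults; the model below is illustrative only, 
and its names and sizes are simplified stand-ins, not the actual Spark code:

```scala
import java.nio.ByteBuffer

// Illustrative model of the driver-side result size accounting. The real
// logic lives in TaskSetManager.canFetchMoreResults; names are stand-ins.
object ResultSizeCheckSketch {
  val maxResultSize: Long = 1L << 20 // e.g. spark.driver.maxResultSize = 1m
  var totalResultSize: Long = 0L

  def canFetchMoreResults(resultSize: Long): Boolean = {
    totalResultSize += resultSize
    totalResultSize <= maxResultSize
  }

  def main(args: Array[String]): Unit = {
    val valueBytes = ByteBuffer.allocate(8) // tiny actual result payload
    val wholeBlobSize = 4096L // valueBytes + accumUpdates + metricPeaks (assumed)

    // Current behavior: the whole serialized result is charged against the
    // limit, so ~256 tasks of 4 KB each exhaust a 1 MB budget even though
    // the useful payload is only 8 bytes per task.
    totalResultSize = 0L
    val tasksUntilAbortToday =
      Iterator.from(1).find(_ => !canFetchMoreResults(wholeBlobSize)).get

    // Proposed behavior: charge only valueBytes, since accumUpdates and
    // metric peaks are discarded by the driver right after being processed.
    totalResultSize = 0L
    val tasksUntilAbortProposed =
      Iterator.from(1).find(_ => !canFetchMoreResults(valueBytes.limit().toLong)).get

    println(s"aborts after $tasksUntilAbortToday tasks today, " +
      s"after $tasksUntilAbortProposed with the proposed accounting")
  }
}
```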
