[ https://issues.apache.org/jira/browse/SPARK-40261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ziqi Liu updated SPARK-40261:
-----------------------------
    Summary: DirectTaskResult meta should not be counted into result size  (was: TaskResult meta should not be counted into result size)

> DirectTaskResult meta should not be counted into result size
> ------------------------------------------------------------
>
>                 Key: SPARK-40261
>                 URL: https://issues.apache.org/jira/browse/SPARK-40261
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Ziqi Liu
>            Priority: Major
>
> This issue has existed for a long time (since
> https://github.com/liuzqt/spark/commit/c33e55008239f417764d589c1366371d18331686).
> When calculating whether a result fetched by the driver exceeds the
> `spark.driver.maxResultSize` limit, the entire serialized task result size is
> taken into account, including the task metadata overhead
> ([accumUpdates|https://github.com/apache/spark/blob/c95ed826e23fdec6e1a779cfebde7b3364594fb5/core/src/main/scala/org/apache/spark/scheduler/TaskResult.scala#L41])
> as well. However, the metadata should not be counted, because it is
> discarded by the driver immediately after being processed.
> As a result, jobs with a huge number of tasks can hit this limit even though
> they actually return only small results.
> Therefore we should count only
> [valueBytes|https://github.com/apache/spark/blob/c95ed826e23fdec6e1a779cfebde7b3364594fb5/core/src/main/scala/org/apache/spark/scheduler/TaskResult.scala#L40]
> against the result size limit.
> cc [~joshrosen]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
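The accounting problem described in the issue can be illustrated with a minimal sketch. This is hypothetical Python, not the actual Spark Core code; the field names `value_bytes_len` and `accum_updates_len` merely mirror `valueBytes` and `accumUpdates` in `DirectTaskResult`, and all byte sizes are made up for illustration.

```python
# Hypothetical sketch of the maxResultSize accounting (NOT actual Spark code).

def exceeds_limit_current(value_bytes_len: int, accum_updates_len: int,
                          max_result_size: int) -> bool:
    """Current behavior: the whole serialized result, metadata included,
    is counted against spark.driver.maxResultSize."""
    return value_bytes_len + accum_updates_len > max_result_size

def exceeds_limit_proposed(value_bytes_len: int, max_result_size: int) -> bool:
    """Proposed behavior: count only valueBytes, since the accumulator-update
    metadata is discarded by the driver right after it is processed."""
    return value_bytes_len > max_result_size

# A job whose tasks each return a tiny value but carry nontrivial
# accumulator-update metadata (illustrative numbers only).
value_len = 10     # bytes of actual result data
meta_len = 2_000   # bytes of metadata overhead
limit = 1_024      # pretend spark.driver.maxResultSize

print(exceeds_limit_current(value_len, meta_len, limit))   # True: spurious failure
print(exceeds_limit_proposed(value_len, limit))            # False: job succeeds
```

With metadata included, the tiny result is judged to exceed the limit; counting only `valueBytes` lets the job pass, which is the change the issue proposes.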