Tengfei Huang created SPARK-52006: ------------------------------------- Summary: CollectMetricsExec's accumulator should be excluded from heartbeats and event logs Key: SPARK-52006 URL: https://issues.apache.org/jira/browse/SPARK-52006 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.5 Reporter: Tengfei Huang
CollectMetricsExec uses an accumulator(AggregatingAccumulator) for emitting metrics in a side-channel that can be collected to the driver. Unfortunately, this inherits many default accumulator behaviors, some of which are undesirable: * This is a named accumulator but isn’t an internal accumulator (i.e. its name doesn’t start with the internal task metrics prefix), so its result is shown in the Spark UI. But that result isn’t meaningful to users because it’s the `.toString` representation of an UnsafeRow representing an aggregation buffer. * This same string representation is also logged in event logs on a per-task basis, slowing down event logging and bloating log size. * The underlying accumulator value is also retained by the live UI even though it’s not displayed on a per-task basis, bloating driver memory consumption. * Even though our code tries to only publish a metric update once at the end of the task, there is a race condition between task completion listeners and metric heartbeater deregistration(stemming from the division of responsibilities between Task.scala and Executor.scala) which will lead to accumulator serialization issues. Based on the above undesirable facts, we should give this metric a special internal accumulator name and should use that name to explicitly exclude it from heartbeats and from event logs. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org