Ali Ince created SPARK-45759: -------------------------------- Summary: Custom metrics should be updated after commit too Key: SPARK-45759 URL: https://issues.apache.org/jira/browse/SPARK-45759 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1 Reporter: Ali Ince
We have a DataWriter component, which processes records in configurable batches, which are accumulated in `write(T record)` implementation and sent to the persistent store when the configured batch size is reached. Within this approach, last batch is handled during `commit()` call, as there is no other mechanism of knowing if there are more records or not. We are now adding support for custom metrics, by implementing the `supportedCustomMetrics()` and `currentMetricsValues()` in the `Write` and `DataWriter` implementations. The problem we see is, since `CustomMetrics.updateMetrics` is only called [during|[https://github.com/apache/spark/blob/af8907a0873f5ca192b150f28a0c112107594722/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala#L443-L443]] and [just after|[https://github.com/apache/spark/blob/af8907a0873f5ca192b150f28a0c112107594722/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala#L451-L451|https://github.com/apache/spark/blob/af8907a0873f5ca192b150f28a0c112107594722/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala#L451-L451)]] record processing, we do not observe the complete metrics since the last batch that is handled during `commit()` call is not collected/updated. We propose to also to add `CustomMetrics.updateMetrics` call after `commit()` is processed successfully, ideally just before `run` function exits (maybe just above [https://github.com/apache/spark/blob/af8907a0873f5ca192b150f28a0c112107594722/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala#L473-L473]). -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org