leixm opened a new pull request, #55228: URL: https://github.com/apache/spark/pull/55228
### What changes were proposed in this pull request?

This PR makes the custom metrics update interval configurable by introducing a new SQL configuration, `spark.sql.execution.customMetrics.numRowsPerUpdate`. Previously, `CustomMetrics.NUM_ROWS_PER_UPDATE` was hardcoded to `100`, meaning custom metrics were updated every 100 rows during data source read/write operations. This PR replaces the hardcoded constant with a configurable value; the default remains `100` for backward compatibility.

### Why are the changes needed?

Updating custom metrics every 100 rows introduces non-trivial CPU overhead for high-throughput workloads. In our production testing, increasing this value to `1000` resulted in a noticeable reduction in CPU time with no observable loss in metric accuracy. Different workloads have different trade-offs between metric freshness and CPU overhead; making this configurable allows users to tune the update frequency based on their specific requirements.

### Does this PR introduce _any_ user-facing change?

Yes. A new SQL configuration is added:

- **`spark.sql.execution.customMetrics.numRowsPerUpdate`** (default: `100`): Controls the number of rows between custom metrics updates during data source read/write operations. Increasing this value reduces the frequency of metrics updates and can lower CPU overhead for high-throughput workloads.

The default behavior is unchanged.

### How was this patch tested?

Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor with Claude

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
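To illustrate the trade-off the new configuration tunes, here is a minimal sketch of the row-counting throttle pattern: metrics are refreshed only once every `num_rows_per_update` rows, so a larger interval means fewer (relatively costly) update calls per task. This is an illustrative model only, not Spark's actual `CustomMetrics` code; the function and parameter names are hypothetical.

```python
def process_rows(rows, num_rows_per_update=100):
    """Consume `rows`, refreshing metrics every `num_rows_per_update` rows.

    Returns the number of metric updates performed, to show how the
    interval controls update frequency (and hence overhead).
    """
    updates = 0
    for count, _row in enumerate(rows, start=1):
        if count % num_rows_per_update == 0:
            # Stand-in for the actual custom-metrics update call,
            # which is the work this config lets users amortize.
            updates += 1
    return updates
```

For a task processing 1,000,000 rows, the default interval of 100 triggers 10,000 metric updates, while an interval of 1000 triggers only 1,000 — a 10x reduction in update calls at the cost of coarser metric freshness.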
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
