leixm opened a new pull request, #55228:
URL: https://github.com/apache/spark/pull/55228

   ### What changes were proposed in this pull request?
   
   This PR makes the custom metrics update interval configurable by introducing 
a new SQL configuration `spark.sql.execution.customMetrics.numRowsPerUpdate`.
   
   Previously, `CustomMetrics.NUM_ROWS_PER_UPDATE` was hardcoded to `100`, 
meaning custom metrics were updated every 100 rows during data source 
read/write operations. This PR replaces the hardcoded constant with a 
configurable value, with the default remaining at `100` for backward 
compatibility.
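
   To illustrate the mechanism described above, here is a minimal Python 
sketch of an "update every N rows" iterator. It is not Spark's actual 
implementation: the names `rows_with_periodic_updates`, `update_metrics`, and 
`count_updates` are hypothetical, and the final flush after the last row is an 
assumption made for the sketch.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def rows_with_periodic_updates(
    rows: Iterable[T],
    num_rows_per_update: int,
    update_metrics: Callable[[], None],
) -> Iterator[T]:
    """Yield rows unchanged, calling update_metrics once every N rows.

    Hypothetical helper for illustration only; it mirrors the idea of a
    per-row counter checked against a configurable interval.
    """
    if num_rows_per_update <= 0:
        raise ValueError("num_rows_per_update must be positive")
    count = 0
    for row in rows:
        yield row
        count += 1
        # Periodic update: fires after every num_rows_per_update-th row.
        if count % num_rows_per_update == 0:
            update_metrics()
    # Assumed final flush so the last partial batch is also reported.
    update_metrics()


def count_updates(total_rows: int, num_rows_per_update: int) -> int:
    """Count how many times the metrics callback fires for a given interval."""
    calls = 0

    def on_update() -> None:
        nonlocal calls
        calls += 1

    for _ in rows_with_periodic_updates(
        range(total_rows), num_rows_per_update, on_update
    ):
        pass
    return calls
```

   In this sketch, `count_updates(10_000, 100)` returns `101` and 
`count_updates(10_000, 1000)` returns `11`, which shows how a larger interval 
reduces the number of update calls for the same number of rows.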
   
   
   ### Why are the changes needed?
   
   Updating custom metrics every 100 rows introduces non-trivial CPU overhead 
for high-throughput workloads. In our production testing, increasing this value 
to `1000` resulted in a noticeable reduction in CPU time with no observable 
loss in metric accuracy.
   
   Different workloads have different trade-offs between metric freshness and 
CPU overhead. Making this configurable allows users to tune the update 
frequency based on their specific requirements.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. A new SQL configuration is added:
   
   - **`spark.sql.execution.customMetrics.numRowsPerUpdate`** (default: `100`): 
Controls the number of rows between custom metrics updates during data source 
read/write operations. Increasing this value reduces the frequency of metrics 
updates and can lower CPU overhead for high-throughput workloads.
   
   The default behavior is unchanged.
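
   As a usage sketch, the new setting could be applied from PySpark roughly 
as follows. This assumes the configuration can be set per session through 
`spark.conf.set`, which this description does not state explicitly; the helper 
name `set_custom_metrics_interval` is hypothetical.

```python
# Config key introduced by this PR.
CONF_KEY = "spark.sql.execution.customMetrics.numRowsPerUpdate"


def set_custom_metrics_interval(spark, num_rows: int) -> None:
    """Set the custom metrics update interval on a SparkSession-like object.

    Assumption: the config is a session-level SQL conf, so it can be changed
    via spark.conf.set. If it turns out to be static, it would instead need
    to be supplied when the session is created.
    """
    spark.conf.set(CONF_KEY, str(num_rows))
```

   For example, `set_custom_metrics_interval(spark, 1000)` would request an 
update every 1000 rows instead of the default 100.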
   
   ### How was this patch tested?
   
   Existing unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor with Claude


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

