Re: [PR] [CORE] Add customMetrics extension point to ShuffleWriterMetrics for backend-specific shuffle stats [gluten]

via GitHub Tue, 19 May 2026 02:16:53 -0700


luis4a0 commented on code in PR #12114:
URL: https://github.com/apache/gluten/pull/12114#discussion_r3265097543



##########
backends-velox/src/main/scala/org/apache/spark/shuffle/ColumnarShuffleWriter.scala:
##########
@@ -258,6 +258,18 @@ class ColumnarShuffleWriter[K, V](
     dep.metrics("dataSize").add(splitResult.getRawPartitionLengths.sum)
     dep.metrics("compressTime").add(splitResult.getTotalCompressTime)
     dep.metrics("peakBytes").add(splitResult.getPeakBytes)
+    // Backend-specific custom metrics (see GlutenSplitResult.getCustomMetrics).
+    // Only entries whose key is pre-registered in `dep.metrics` are recorded;
+    // unknown keys are silently dropped so that the native side can ship new
+    // metrics ahead of the Spark-side `VeloxMetricsApi` registration without
+    // breaking older builds.
+    splitResult.getCustomMetrics.forEach {
+      (key, value) =>
+        val m = dep.metrics.get(key)
+        if (m.isDefined) {
+          m.get.add(value)

Review Comment:
   Good catch — fully agreed about the parity gap across the 3 writer entry 
points. This particular concern is now moot because, per offline reviewer 
feedback, we dropped the second commit entirely (the one that wired 
`getCustomMetrics()` into `ColumnarShuffleWriter.scala`). The Scala consumer 
side will come back as small follow-up PRs once specific metrics have proven 
useful, and your `applyCustomMetrics(splitResult, dep)` helper is exactly the 
right shape for the first such follow-up — I'll factor it that way to keep all 
3 writers in sync from day one.
   
   Will add a TODO note next to `getCustomMetrics()` in `GlutenSplitResult` so 
the helper-shape requirement is captured for whoever picks up the first 
per-metric follow-up.
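
   A minimal sketch of the helper shape discussed above, written in Java so it can run standalone. The real helper would live on the Scala side and operate on `dep.metrics`; here a plain `Map<String, LongAdder>` stands in for the registered Spark metrics. That stand-in, the class name `CustomMetricsHelper`, and the returned skip count are assumptions for illustration, not code from the PR. It keeps the behaviour described in the dropped diff: only pre-registered keys are recorded, and unknown keys are skipped.

```java
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper: one shared code path that every writer entry point could call.
final class CustomMetricsHelper {
  private CustomMetricsHelper() {}

  /**
   * Adds each custom metric to its pre-registered accumulator.
   * Keys with no registered accumulator are skipped rather than failing,
   * so the native side can emit new metrics before they are registered.
   *
   * @return the number of keys that were skipped because they were not registered
   */
  static int applyCustomMetrics(
      Map<String, Long> customMetrics, Map<String, LongAdder> registered) {
    int skipped = 0;
    for (Map.Entry<String, Long> entry : customMetrics.entrySet()) {
      LongAdder accumulator = registered.get(entry.getKey());
      if (accumulator == null) {
        skipped++; // unknown key: drop it, but count it so callers can log if they wish
        continue;
      }
      accumulator.add(entry.getValue());
    }
    return skipped;
  }
}
```

   Returning the skip count is a design choice of this sketch only: it keeps the "silently dropped" behaviour while still letting a caller surface the number at debug level.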



##########
gluten-arrow/src/main/java/org/apache/gluten/vectorized/GlutenSplitResult.java:
##########
@@ -122,4 +139,32 @@ public double getAvgDictionaryFields() {
   public long getDictionarySize() {
     return dictionarySize;
   }
+
+  /**
+   * Backend-specific shuffle writer metrics, keyed by `<Backend>.<Family>.<Stat>`. The map
+   * preserves the iteration order JNI marshalled, but callers should treat the map as unordered.
+   * Returns an empty map if the native side did not populate any custom metrics (e.g. older Gluten
+   * libs, or backends that don't yet emit any). The returned map is unmodifiable.
+   */
+  public Map<String, Long> getCustomMetrics() {
+    Map<String, Long> cached = customMetricsCache;
+    if (cached != null) {
+      return cached;
+    }
+    synchronized (this) {
+      if (customMetricsCache != null) {
+        return customMetricsCache;
+      }
+      if (customMetricsKeys == null || customMetricsKeys.length == 0) {

Review Comment:
   Excellent point. Fixed in 6e6d98d. Added an explicit length-match check at 
the top of the synchronized block. It throws `IllegalStateException` with both 
lengths in the message, and also covers the case where values is null while 
keys is non-empty, so the producer-side bug is unambiguous and the cache field 
is never assigned a partial map. Two new `GlutenSplitResultSuite` cases:
   
   - `fails loudly on mismatched key/value array lengths` — also asserts the 
second call still throws (the cache field stays null on the failure path)
   - `fails loudly when values array is null but keys is non-empty`
   
   Both pass locally; full suite is now 8/8.
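
   For readers without the commit at hand, the check described above can be sketched roughly as follows. This is not the code from 6e6d98d: the class name `SplitResultSketch`, the `long[]` type of the values array, the field name `customMetricsValues`, and the exception wording are all assumptions. The sketch only shows the pattern: validate lengths inside the synchronized block before building anything, so a failure leaves the cache field null and every later call fails the same way.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for GlutenSplitResult, reduced to the custom-metrics fields.
final class SplitResultSketch {
  private final String[] customMetricsKeys;
  private final long[] customMetricsValues; // assumed type; the PR may differ
  private volatile Map<String, Long> customMetricsCache;

  SplitResultSketch(String[] keys, long[] values) {
    this.customMetricsKeys = keys;
    this.customMetricsValues = values;
  }

  Map<String, Long> getCustomMetrics() {
    Map<String, Long> cached = customMetricsCache;
    if (cached != null) {
      return cached;
    }
    synchronized (this) {
      if (customMetricsCache != null) {
        return customMetricsCache;
      }
      // Treat a null array as length 0 for the comparison, but report null explicitly.
      int keyCount = customMetricsKeys == null ? 0 : customMetricsKeys.length;
      int valueCount = customMetricsValues == null ? 0 : customMetricsValues.length;
      if (keyCount != valueCount) {
        // Thrown before any assignment, so the cache field stays null.
        throw new IllegalStateException(
            "customMetrics length mismatch: keys="
                + (customMetricsKeys == null ? "null" : String.valueOf(keyCount))
                + ", values="
                + (customMetricsValues == null ? "null" : String.valueOf(valueCount)));
      }
      if (keyCount == 0) {
        customMetricsCache = Collections.emptyMap();
        return customMetricsCache;
      }
      Map<String, Long> built = new LinkedHashMap<>();
      for (int i = 0; i < keyCount; i++) {
        built.put(customMetricsKeys[i], customMetricsValues[i]);
      }
      customMetricsCache = Collections.unmodifiableMap(built);
      return customMetricsCache;
    }
  }
}
```

   Note that in this sketch a null keys array with a non-empty values array is also reported as a mismatch; whether the actual fix does the same is not stated above.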



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

