Re: [PR] [VL] Add per-batch input-encoding counter to VeloxHashShuffleWriter [gluten]

via GitHub Mon, 18 May 2026 09:34:58 -0700


luis4a0 commented on code in PR #12107:
URL: https://github.com/apache/gluten/pull/12107#discussion_r3260487942



##########
cpp/velox/shuffle/VeloxHashShuffleWriter.cc:
##########
@@ -1328,6 +1370,22 @@ void VeloxHashShuffleWriter::stat() const {
     }
     LOG(INFO) << oss.str();
   }
+  {
+    std::ostringstream oss;
+    oss << "Velox shuffle writer stat:InputEncoding";
+    int64_t total = 0;
+    for (auto v : inputEncodingCounts_) {
+      total += v;
+    }
+    for (int b = 0; b < kInputEncodingNum; ++b) {
+      auto v = inputEncodingCounts_[b];
+      oss << " " << inputEncodingName(static_cast<InputEncodingBucket>(b)) << "=" << v;
+      if (total > 0) {
+        oss << "(" << (100.0 * static_cast<double>(v) / static_cast<double>(total)) << "%)";

Review Comment:
   Right, fixed in a208a78. I added an `inputEncodingSkippedBatches_` counter 
that is incremented on every early-return path in 
`accumulateInputEncodingCounts` and printed at the end of the InputEncoding log 
line as `SkippedNonVeloxBatches=N`. The sum of the encoding-bucket counts and 
the skip count now equals the `count` field on the surrounding 
`cpuWallTimingList_` lines, so the two log blocks have comparable denominators.
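
A minimal standalone sketch of the accounting described above, not the code from the PR. The bucket names (`Flat`, `Dictionary`, `Other`), the `EncodingStats` struct, the `accumulate` signature, and the assumption that each batch increments exactly one bucket are hypothetical stand-ins; only `inputEncodingCounts_`, `inputEncodingSkippedBatches_`, `inputEncodingName`, `kInputEncodingNum`, and the `SkippedNonVeloxBatches=N` suffix come from the discussion.

```cpp
#include <array>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical buckets; the real enum lives in VeloxHashShuffleWriter.
enum class InputEncodingBucket : int { kFlat = 0, kDictionary = 1, kOther = 2 };
constexpr int kInputEncodingNum = 3;

const char* inputEncodingName(InputEncodingBucket b) {
  switch (b) {
    case InputEncodingBucket::kFlat:
      return "Flat";
    case InputEncodingBucket::kDictionary:
      return "Dictionary";
    case InputEncodingBucket::kOther:
      return "Other";
  }
  return "Unknown";
}

struct EncodingStats {
  std::array<int64_t, kInputEncodingNum> inputEncodingCounts_{};
  int64_t inputEncodingSkippedBatches_ = 0;

  // Assumption: called once per input batch. Every early return bumps the
  // skip counter, so each batch is counted exactly once somewhere.
  void accumulate(bool isVeloxBatch, InputEncodingBucket bucket) {
    if (!isVeloxBatch) {
      ++inputEncodingSkippedBatches_;
      return;
    }
    ++inputEncodingCounts_[static_cast<int>(bucket)];
  }

  int64_t bucketTotal() const {
    int64_t total = 0;
    for (auto v : inputEncodingCounts_) {
      total += v;
    }
    return total;
  }

  // Should match the per-batch `count` reported by the timing lines.
  int64_t batchesSeen() const {
    return bucketTotal() + inputEncodingSkippedBatches_;
  }

  std::string format() const {
    std::ostringstream oss;
    oss << "Velox shuffle writer stat:InputEncoding";
    const int64_t total = bucketTotal();
    for (int b = 0; b < kInputEncodingNum; ++b) {
      const auto v = inputEncodingCounts_[b];
      oss << " " << inputEncodingName(static_cast<InputEncodingBucket>(b)) << "=" << v;
      if (total > 0) {
        // Percentages are relative to classified batches only.
        oss << "(" << (100.0 * static_cast<double>(v) / static_cast<double>(total)) << "%)";
      }
    }
    oss << " SkippedNonVeloxBatches=" << inputEncodingSkippedBatches_;
    return oss.str();
  }
};
```

In this sketch the percentages still use the classified-batch total as their denominator; the skip count is reported separately so a reader can reconcile `bucketTotal() + inputEncodingSkippedBatches_` against the timing `count`.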



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


