luis4a0 opened a new pull request, #12083:
URL: https://github.com/apache/gluten/pull/12083

   ## What changes are proposed in this pull request?
   
   The `SCOPED_TIMER` macro in `cpp/velox/utils/Common.h` wraps its
   `DeltaCpuWallTimer` in a `do { ... } while (0)` block. The
   `DeltaCpuWallTimer` is constructed inside the do-block, so it is also
   **destroyed at the closing brace of the do-block — i.e. immediately
   after construction**. The lambda fires recording essentially zero ns,
   and the surrounding scope (which the call sites intend to time) is
   not measured at all.
   
   As a consequence, every existing `SCOPED_TIMER` call in the Velox
   shuffle writers (`VeloxHashShuffleWriter.cc`,
   `VeloxRssSortShuffleWriter.cc`) has been silently recording near-zero
   values into `cpuWallTimingList_`. The per-bucket time breakdown
   (Compute, FlattenRV, SplitRV, EvictPartition, Stop, …) that
   `stat()` would log under `VELOX_SHUFFLE_WRITER_LOG_FLAG=1` is
   unusable for performance analysis as a result.
   
   This PR replaces the do-while wrapper with two uniquely-named local
   variables that survive until the end of the **enclosing** scope —
   which is what every call site already expects from a SCOPED-style
   RAII timer.
   
   ```cpp
   #define SCOPED_TIMER(timing)                                                 
                                         \
     auto GLUTEN_SCOPED_TIMER_CONCAT(_glutenScopedTimerTarget_, __LINE__) = 
&(timing);                                   \
     facebook::velox::DeltaCpuWallTimer 
GLUTEN_SCOPED_TIMER_CONCAT(_glutenScopedTimer_, __LINE__) {                     
 \
       [_glutenScopedTimerTargetPtr = 
GLUTEN_SCOPED_TIMER_CONCAT(_glutenScopedTimerTarget_, __LINE__)](               
   \
           const facebook::velox::CpuWallTiming& delta) { 
_glutenScopedTimerTargetPtr->add(delta); }                     \
     }
   ```
   
   A few implementation notes:
   
   - The unique-naming pattern (using `__LINE__`) is needed so that
     multiple `SCOPED_TIMER` calls in the same scope don't shadow each
     other.
   - The extra local for the timing target (a pointer) is needed because
     the `timing` macro argument may be an arbitrary expression
     (e.g. `cpuWallTimingList_[CpuWallTimingCompute]`) that is not a
     valid lambda capture name. Capturing a pointer to it sidesteps that.
   
   ## How was this patch tested?
   
   I verified the change compiles cleanly across all 18 call sites of
   `SCOPED_TIMER` in
   `cpp/velox/shuffle/Velox{Hash,RssSort}ShuffleWriter.cc`. No new tests
   are added in this PR — adding a unit test that asserts a `SCOPED_TIMER`
   records non-zero elapsed time after a deliberate sleep would be a
   useful defensive addition; happy to add it as a follow-up if the
   reviewers want it in this PR.
   
   End-to-end runtime validation (running TPC-DS with
   `VELOX_SHUFFLE_WRITER_LOG_FLAG=1` and confirming the per-bucket
   numbers are now non-zero) was not possible in my development
   environment because a system-wide gluten-velox JAR took precedence
   over the locally-built OSS bundle on the JNI symbol resolution path,
   shadowing the patched `libvelox.so`. The fix is small and
   inspection-correct against the SCOPED_TIMER pattern used elsewhere in
   Velox/Folly.
   
   Marked as draft for an extra round of eyes on the macro before flipping to 
ready.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to