luis4a0 opened a new pull request, #12083:
URL: https://github.com/apache/gluten/pull/12083
## What changes are proposed in this pull request?
The `SCOPED_TIMER` macro in `cpp/velox/utils/Common.h` wraps its
`DeltaCpuWallTimer` in a `do { ... } while (0)` block. The
`DeltaCpuWallTimer` is constructed inside the do-block, so it is also
**destroyed at the closing brace of the do-block — i.e. immediately
after construction**. The lambda fires recording essentially zero ns,
and the surrounding scope (which the call sites intend to time) is
not measured at all.
As a consequence, every existing `SCOPED_TIMER` call in the Velox
shuffle writers (`VeloxHashShuffleWriter.cc`,
`VeloxRssSortShuffleWriter.cc`) has been silently recording near-zero
values into `cpuWallTimingList_`. The per-bucket time breakdown
(Compute, FlattenRV, SplitRV, EvictPartition, Stop, …) that
`stat()` would log under `VELOX_SHUFFLE_WRITER_LOG_FLAG=1` is
unusable for performance analysis as a result.
This PR replaces the do-while wrapper with two uniquely-named local
variables that survive until the end of the **enclosing** scope —
which is what every call site already expects from a SCOPED-style
RAII timer.
```cpp
#define SCOPED_TIMER(timing)
\
auto GLUTEN_SCOPED_TIMER_CONCAT(_glutenScopedTimerTarget_, __LINE__) =
&(timing); \
facebook::velox::DeltaCpuWallTimer
GLUTEN_SCOPED_TIMER_CONCAT(_glutenScopedTimer_, __LINE__) {
\
[_glutenScopedTimerTargetPtr =
GLUTEN_SCOPED_TIMER_CONCAT(_glutenScopedTimerTarget_, __LINE__)](
\
const facebook::velox::CpuWallTiming& delta) {
_glutenScopedTimerTargetPtr->add(delta); } \
}
```
A few implementation notes:
- The unique-naming pattern (using `__LINE__`) is needed so that
multiple `SCOPED_TIMER` calls in the same scope don't shadow each
other.
- The extra local for the timing target (a pointer) is needed because
the `timing` macro argument may be an arbitrary expression
(e.g. `cpuWallTimingList_[CpuWallTimingCompute]`) that is not a
valid lambda capture name. Capturing a pointer to it sidesteps that.
## How was this patch tested?
I verified the change compiles cleanly across all 18 call sites of
`SCOPED_TIMER` in
`cpp/velox/shuffle/Velox{Hash,RssSort}ShuffleWriter.cc`. No new tests
are added in this PR — adding a unit test that asserts a `SCOPED_TIMER`
records non-zero elapsed time after a deliberate sleep would be a
useful defensive addition; happy to add it as a follow-up if the
reviewers want it in this PR.
End-to-end runtime validation (running TPC-DS with
`VELOX_SHUFFLE_WRITER_LOG_FLAG=1` and confirming the per-bucket
numbers are now non-zero) was not possible in my development
environment because a system-wide gluten-velox JAR took precedence
over the locally-built OSS bundle on the JNI symbol resolution path,
shadowing the patched `libvelox.so`. The fix is small and
inspection-correct against the SCOPED_TIMER pattern used elsewhere in
Velox/Folly.
Marked as draft for an extra round of eyes on the macro before flipping to
ready.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]