luis4a0 opened a new pull request, #12086:
URL: https://github.com/apache/gluten/pull/12086

   ## What changes are proposed in this pull request?
   
   The bounds check in `SubstraitParser::checkWindowFunction`
   (`cpp/velox/substrait/SubstraitParser.cc:300`) reads:
   
   ```cpp
   if ((pos != std::string::npos) &&
       (msg.value().size() >= targetFunction.size()) &&
       (msg.value().substr(pos + config.size(), targetFunction.size()) == 
targetFunction))
   ```
   
   The size comparison `msg.value().size() >= targetFunction.size()` is
   not enough to guarantee that the subsequent `substr()` call stays
   within the buffer — the substring starts at offset
   `pos + config.size()`, so the correct bound is
   
   ```cpp
   msg.value().size() >= pos + config.size() + targetFunction.size()
   ```
   
   This PR fixes the bound and adds a small 
`cpp/velox/tests/SubstraitParserTest.cc`
   to lock in the desired behavior.
   
   ## Is the bug observable today?
   
   **No.** I want to be honest about this up front. `std::string::substr`
   **clamps** the requested length to the actual remaining string length,
   so the buggy bound + the substr call together produce the right
   answer (false) for truncated inputs by accident. None of the current
   callers (`"row_number"`, `"rank"`, `"dense_rank"`) exercise the
   failure path in any way that produces a wrong result today.
   
   I'm sending the fix anyway because:
   
   1. It's visibly wrong on inspection — reviewers later have to spend
      cycles confirming whether it's a real bug. Cleaner to fix it now.
   2. `std::string::substr`'s clamping behaviour is the only thing
      keeping this from being a runtime fault. A future refactor to
      `std::string_view::substr` (different contract — throws
      `std::out_of_range` on bad bounds) would turn this into a real
      exception with no logical change to the surrounding code.
   3. The fix is one tightening character per line of code, with a
      clear correctness argument.
   
   ## How was this patch tested?
   
   Adds `cpp/velox/tests/SubstraitParserTest.cc` with five cases:
   
   - `checkWindowFunctionWellFormedMatch` — happy path.
   - `checkWindowFunctionWellFormedNoMatch` — well-formed payload, target 
doesn't match.
   - `checkWindowFunctionTruncatedPayload` — the regression case: payload bytes
     after `window_function=` are shorter than the target. Must return false
     without overrunning the buffer.
   - `checkWindowFunctionEmptyPayload` — empty payload, no `window_function=` 
selector at all.
   - `checkWindowFunctionNoOptimization` — extension with no optimization 
payload.
   
   **All five tests pass under both the buggy and the fixed version**, because
   the bug is dormant (see "Is the bug observable today?" above). I validated
   this with a standalone harness that compiles the same logic against both
   versions of the bounds check (gated by `-DUSE_BUGGY=1`):
   
   |                                                | Fixed (this PR) | Buggy 
(current main) |
   
|------------------------------------------------|----------------:|---------------------:|
   | `checkWindowFunctionWellFormedMatch`           | PASS ✅          | PASS ✅  
            |
   | `checkWindowFunctionWellFormedNoMatch`         | PASS ✅          | PASS ✅  
            |
   | `checkWindowFunctionTruncatedPayload`          | PASS ✅          | PASS ✅  
            |
   | `checkWindowFunctionEmptyPayload`              | PASS ✅          | PASS ✅  
            |
   | `checkWindowFunctionNoOptimization`            | PASS ✅          | PASS ✅  
            |
   
   So the test is **defensive only** — its purpose is to lock in the
   desired behavior so any future refactor that breaks the bound check
   fails loudly, rather than to demonstrate the buggy version producing
   wrong output today.
   
   If reviewers prefer not to add a test for a dormant-bug case, I'm
   happy to drop it and ship the 1-line fix on its own.
   
   Marked as draft for one round of review before flipping to ready.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to