Akanksha-kedia commented on PR #18892:
URL: https://github.com/apache/pinot/pull/18892#issuecomment-4855078364

   ## Code Review: `fix/split-part-memory-optimization`
   
   The core idea is sound — eliminating the per-row `String[]` allocation in 
the 4-arg `splitPart` overload is a legitimate hot-path optimization, and the 
implementation faithfully replicates `splitByWholeSeparator` semantics. The 
deterministic test table is thorough. Two **MAJOR** gaps and several **MINOR** 
issues below.
   
   ---
   
   ### C4 — Performance
   
   **[MAJOR] No benchmark covers the 4-arg overload (the thing this PR actually 
changes)**
   `BenchmarkSplitPart` only benchmarks `splitPart(input, delimiter, index)` 
(3-arg). The PR's optimization target is `splitPart(input, delimiter, limit, 
index)` (4-arg), and there is no way to verify or quantify the improvement 
without a benchmark.
   > Add `testSplitPartLimitedNew()` / `testSplitPartLimitedOld()` methods to 
`BenchmarkSplitPart`, mirroring the existing 3-arg pair.
   
   **[MINOR] No single-char delimiter fast path for negative indices in the 
4-arg path**
   The 3-arg overload has `splitPartNegativeIdxSingleCharDelim` which avoids a 
second forward scan for single-char delimiters (lines 609–612). The 4-arg 
overload always executes two full scans for negative indices regardless of 
delimiter length. Single-char delimiters (`,`, `.`, `/`) dominate real 
workloads.
   > Add a `// TODO: add single-char fast path for negative indices, analogous 
to splitPartNegativeIdxSingleCharDelim` comment.
   
   ---
   
   ### C5 — Correctness & Nulls
   
   **[MINOR] `while (pos <= inputLen)` loop bound at `splitPartLimitedForward` 
— the `=` arm is dead code**
   Every code path that sets `pos >= inputLen` returns before the outer loop 
re-evaluates. The loop is functionally `while (pos < inputLen)`.
   > Change to `while (pos < inputLen)` for clarity.
   
   **[MINOR] `if (totalFields == 0)` guard at line 752 is unreachable**
   `countFieldsLimited` returns at minimum `1` in all cases. A `totalFields` of 
`0` is structurally impossible.
   > Remove the dead guard or replace with `assert totalFields > 0`.
   
   **[MINOR] Stale comment at `splitPartLimitedForward` lines 839–841**
   `// (this case is handled by the trailing-delimiter logic below)` — the 
trailing-delimiter logic is on a completely separate code path, not "below" 
this return.
   > Replace with: `// The requested index is beyond the last field.`
   
   ---
   
   ### C6 — Testing
   
   **[MAJOR] `testSplitPartRandomized` only fuzz-tests the 3-arg overload — the 
4-arg overload has zero randomized coverage**
   ```java
   // line 316:
   String actual = StringFunctions.splitPart(input, delimiter, index);  // ← 
3-arg only
   ```
   > Add `testSplitPartLimitedRandomized` cross-checking `splitPart(input, 
delimiter, limit, index)` against 
`splitPartArrayBased(StringUtils.splitByWholeSeparator(input, delimiter, 
limit), index)` over 10k random tuples.
   
   **[MINOR] No SQL-level test coverage for `splitPart` with a limit**
   `pinot-query-runtime/src/test/resources/queries/StringFunctions.json` has no 
`splitPart` entries. SQL-level tests catch function-registry wiring issues that 
unit tests miss.
   
   ---
   
   ### C3 — Naming & API
   
   **[MINOR] Missing `// compare with BenchmarkSplitPart` cross-reference in 
the 4-arg overload**
   The 3-arg `splitPart` (line 588) has this comment. The 4-arg overload — now 
the more complex and performance-sensitive — has no such pointer.
   
   ---
   
   ### C8 — Process & Scope
   
   **[MINOR] Commit message overstates fuzz coverage**
   > *"All existing unit tests (172 cases including randomized fuzz) pass 
unchanged, confirming behavioral equivalence."*
   
   The fuzz test only exercises the 3-arg path. The 4-arg path is covered by 89 
deterministic table-driven cases only.
   
   Scope is clean — single file, no creep. PR description accurately describes 
the change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to