Akanksha-kedia commented on PR #18892:
URL: https://github.com/apache/pinot/pull/18892#issuecomment-4853660677

   cc @xiangfu0 @Jackie-Jiang — following up on the earlier review comment with 
a summary for visibility.
   
   ## What this PR does
   
   Refactors the 4-arg splitPart(input, delimiter, limit, index) overload to 
avoid allocating a full String[] array on every invocation (#17585).
   
   **Problem:** The original implementation called 
StringUtils.splitByWholeSeparator(), which allocates a String[] plus every 
element String on each call. On hot query paths where splitPart runs once per 
row, those short-lived allocations add up to significant GC pressure.
   
   **Approach:** Replaces the array allocation with direct index-based forward 
scanning:
   - Positive index: a single forward scan extracts only the one substring 
needed. One pass, O(n), no intermediate allocations.
   - Negative index: one forward scan to count fields, then a second to 
extract. Two passes, still O(n), no intermediate allocations.
   
   The null/empty delimiter case falls back to the original Commons 
implementation (complex whitespace-splitting rules).
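
For readers who want a concrete picture of the scanning approach described above, here is a minimal sketch. It is not the code from the PR: the class and method names (`SplitPartSketch`, `fieldForward`, `countFields`) are hypothetical, and the semantics are simplified assumptions. The sketch keeps empty fields, treats `limit <= 0` as "no limit", returns `null` for an out-of-range index, and rejects a null/empty delimiter instead of falling back to Commons. It does not try to reproduce the Commons Lang rules (for example, how adjacent separators are handled).

```java
final class SplitPartSketch {
    private SplitPartSketch() { }

    // Forward scan: walks delimiter to delimiter and materializes only the
    // requested field. Assumes index >= 0 and a non-empty delimiter.
    static String fieldForward(String input, String delimiter, int limit, int index) {
        int start = 0;
        int field = 0;
        while (true) {
            // Once the limit is reached, the current field absorbs the rest of the input.
            boolean lastAllowed = limit > 0 && field == limit - 1;
            int next = lastAllowed ? -1 : input.indexOf(delimiter, start);
            int end = next < 0 ? input.length() : next;
            if (field == index) {
                return input.substring(start, end); // the only String created
            }
            if (next < 0) {
                return null; // ran out of fields (assumed out-of-range result)
            }
            start = next + delimiter.length();
            field++;
        }
    }

    // Counting pass used for negative indexes; creates no Strings.
    static int countFields(String input, String delimiter, int limit) {
        int count = 1;
        int start = 0;
        while (limit <= 0 || count < limit) {
            int next = input.indexOf(delimiter, start);
            if (next < 0) {
                break;
            }
            count++;
            start = next + delimiter.length();
        }
        return count;
    }

    // Entry point: a negative index is resolved against the field count first,
    // then extracted with the same forward scan (two passes in total).
    static String splitPart(String input, String delimiter, int limit, int index) {
        if (delimiter == null || delimiter.isEmpty()) {
            // The PR falls back to the Commons implementation here; this sketch does not.
            throw new IllegalArgumentException("sketch requires a non-empty delimiter");
        }
        if (index >= 0) {
            return fieldForward(input, delimiter, limit, index);
        }
        int resolved = countFields(input, delimiter, limit) + index;
        return resolved < 0 ? null : fieldForward(input, delimiter, limit, resolved);
    }
}
```

Under these assumptions, `SplitPartSketch.splitPart("a,b,c,d", ",", 2, 1)` yields `"b,c,d"` because the second field absorbs the remainder once the limit is hit, and `SplitPartSketch.splitPart("a,b,c,d", ",", 0, -1)` yields `"d"` after one counting pass and one extraction pass.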
   
   **Files changed:**
   - StringFunctions.java — new splitPartLimitedForward() and 
countFieldsLimited() private helpers
   - StringFunctionsTest.java — fuzz test (testSplitPartLimitedRandomized) 
cross-checking 10k random inputs against splitPartArrayBased, plus targeted 
data-provider cases for limit+index combinations


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

