Akanksha-kedia commented on PR #18892: URL: https://github.com/apache/pinot/pull/18892#issuecomment-4853660677
cc @xiangfu0 @Jackie-Jiang: following up on the earlier review comment with a summary for visibility.

## What this PR does

Refactors the 4-arg splitPart(input, delimiter, limit, index) overload to avoid allocating a full String[] array on every invocation (#17585).

**Problem:** The original implementation called StringUtils.splitByWholeSeparator(), which allocates a String[] plus all element Strings on every row. In hot query paths where splitPart is called per row, this creates significant GC pressure.

**Approach:** Replaces the array allocation with direct index-based forward scanning:

- Positive index: a single forward scan extracts only the one substring needed. O(n), zero intermediate allocations.
- Negative index: one forward scan to count fields, then a second to extract. Two passes, still O(n) overall, zero intermediate allocations.

The null/empty delimiter case falls back to the original Commons implementation (complex whitespace-splitting rules).

**Files changed:**

- StringFunctions.java: new splitPartLimitedForward() and countFieldsLimited() private helpers
- StringFunctionsTest.java: fuzz test (testSplitPartLimitedRandomized) cross-checking 10k random inputs against splitPartArrayBased, plus targeted data-provider cases for limit+index combinations
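
For readers who want to see the shape of the forward-scanning idea described above, here is a minimal sketch. It is not the code from this PR: the class name `SplitPartSketch` and the helpers `fieldAt` and `countFields` are hypothetical, it returns `null` for an out-of-range index as an assumption, it keeps empty fields, and it makes no attempt to match the edge-case behavior of StringUtils.splitByWholeSeparator() or the null/empty delimiter fallback.

```java
// Illustrative sketch only; names and out-of-range behavior are assumptions,
// not the helpers added in the PR.
final class SplitPartSketch {
  private SplitPartSketch() {}

  /**
   * Returns the field at {@code index} (negative counts from the end),
   * or null when the index is out of range.
   * limit <= 0 means "no limit"; otherwise at most {@code limit} fields are
   * produced and the last one holds the unsplit remainder.
   */
  static String splitPart(String input, String delimiter, int limit, int index) {
    if (delimiter.isEmpty()) {
      throw new IllegalArgumentException("empty delimiter is out of scope for this sketch");
    }
    if (index < 0) {
      // First pass: count fields so the negative index can be made absolute.
      index += countFields(input, delimiter, limit);
      if (index < 0) {
        return null;
      }
    }
    // Single forward pass that materializes only the requested substring.
    return fieldAt(input, delimiter, limit, index);
  }

  /** Counts fields without building an array. */
  static int countFields(String input, String delimiter, int limit) {
    int count = 1;
    int from = 0;
    while (limit <= 0 || count < limit) {
      int hit = input.indexOf(delimiter, from);
      if (hit < 0) {
        break;
      }
      count++;
      from = hit + delimiter.length();
    }
    return count;
  }

  /** Scans forward and returns the field at a non-negative index, or null. */
  static String fieldAt(String input, String delimiter, int limit, int index) {
    int start = 0;
    int field = 0;
    while (true) {
      // Once the limit is reached, the current field swallows the remainder.
      boolean lastAllowed = limit > 0 && field == limit - 1;
      int hit = lastAllowed ? -1 : input.indexOf(delimiter, start);
      if (field == index) {
        return hit < 0 ? input.substring(start) : input.substring(start, hit);
      }
      if (hit < 0) {
        return null; // ran out of fields before reaching the index
      }
      start = hit + delimiter.length();
      field++;
    }
  }
}
```

Under these assumptions, `SplitPartSketch.splitPart("a,b,c,d", ",", 2, -1)` yields `"b,c,d"`: the count pass stops at two fields because of the limit, and the extraction pass returns the remainder as the final field. The only String allocated is the returned substring.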
