Re: [PR] arrow-data: Add REE support for `build_extend` and `build_extend_nulls` [arrow-rs]

via GitHub Wed, 18 Jun 2025 06:57:17 -0700


brancz commented on PR #7671:
URL: https://github.com/apache/arrow-rs/pull/7671#issuecomment-2984332176


   It actually raises a larger design question: should two extend calls that 
end up referencing the same value continue the "previous" run? As in, if we 
have an array that is an REE with the logical values being:
   ```
   [1, 1, 0, 0, 1, 1]
   ```
   and the interactions are
   ```
   arr.extend(0, 2)
   arr.extend(4, 6)
   ```
   should the result be
   1) runs: [4], values: [1]
   2) runs: [2, 4], values: [1, 1]
   
   Obviously 1) is more compact, but it would also mean that `.extend` needs 
to be able to compare arbitrary values (because it needs to know when to 
continue a run vs. start a new one).
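
To make the two outcomes concrete, here is a minimal sketch of both behaviours. It is a toy model and not the arrow-data API: the `Ree` struct, its `extend` method, the `merge` flag, and the `demo` helper are all hypothetical names, and the values are fixed to `i64` so that `==` is trivially available. In arrow-data the values child can be of any type, which is exactly where the comparison problem described above comes from.

```rust
/// Toy run-end encoded array (hypothetical, not an arrow-rs type).
/// `run_ends[i]` is the exclusive logical end of run `i`.
#[derive(Debug, Default, PartialEq)]
struct Ree {
    run_ends: Vec<i32>,
    values: Vec<i64>,
}

impl Ree {
    /// Logical length: the last run end, or 0 when empty.
    fn len(&self) -> i32 {
        self.run_ends.last().copied().unwrap_or(0)
    }

    /// Append the logical range `start..end` of `src` to `self`.
    /// `merge == true` continues the previous run when the value matches
    /// (option 1); `merge == false` always starts a new run (option 2).
    fn extend(&mut self, src: &Ree, start: i32, end: i32, merge: bool) {
        let mut run_start = 0;
        for (&run_end, &value) in src.run_ends.iter().zip(&src.values) {
            // Clip the source run to the requested logical range.
            let lo = run_start.max(start);
            let hi = run_end.min(end);
            run_start = run_end;
            if lo >= hi {
                continue; // this run does not overlap the range
            }
            let new_end = self.len() + (hi - lo);
            // This equality check is the step that needs value comparison.
            let continues = merge && self.values.last() == Some(&value);
            if continues {
                *self.run_ends.last_mut().unwrap() = new_end;
            } else {
                self.run_ends.push(new_end);
                self.values.push(value);
            }
        }
    }
}

/// Replays the two extend calls from the comment on [1, 1, 0, 0, 1, 1].
fn demo(merge: bool) -> Ree {
    let src = Ree { run_ends: vec![2, 4, 6], values: vec![1, 0, 1] };
    let mut out = Ree::default();
    out.extend(&src, 0, 2, merge);
    out.extend(&src, 4, 6, merge);
    out
}
```

Under these assumptions, `demo(true)` produces run ends `[4]` with values `[1]` (option 1), and `demo(false)` produces run ends `[2, 4]` with values `[1, 1]` (option 2). Both decode to the same logical array `[1, 1, 1, 1]`; they differ only in how compact the encoding is.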
   
   I would propose that we don't do this optimization in this PR and have a 
larger conversation about whether arrow-data should be able to do something 
like that and if so, how without creating cyclic dependency issues, and in this 
PR just adjust the test expectations.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
