brancz commented on PR #7671: URL: https://github.com/apache/arrow-rs/pull/7671#issuecomment-2984332176
It actually throws up a larger design question: should two `extend` calls that end up referencing the same value continue the "previous" run? As in, if we have an array that is an REE with the logical values being:

```
[1, 1, 0, 0, 1, 1]
```

and the interactions are

```
arr.extend(0, 2)
arr.extend(4, 6)
```

should the result be

1) runs: [4], values: [1]
2) runs: [2, 4], values: [1, 1]

Obviously 1) is more optimized, but it would also mean that `.extend` needs to be able to compare arbitrary values (because it needs to know when to continue vs. start a new run).

I would propose that we don't do this optimization in this PR and instead have a larger conversation about whether arrow-data should be able to do something like that and, if so, how to do it without creating cyclic dependency issues. In this PR we would just adjust the test expectations.

Thoughts?
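To make the two outcomes concrete, here is a minimal, self-contained sketch. It does not use arrow-rs at all: `Ree`, `push_run`, and the `merge` flag are hypothetical names invented for illustration, and values are fixed to `i32` so that equality comparison is trivial (which is exactly the part that is hard for arbitrary types in arrow-data).

```rust
/// Hypothetical, simplified run-end encoded array of `i32` values.
/// This is NOT the arrow-rs `RunArray`; it only illustrates the question.
#[derive(Debug, Default, PartialEq)]
struct Ree {
    run_ends: Vec<usize>, // cumulative logical end index of each run
    values: Vec<i32>,     // one value per run
}

impl Ree {
    /// Build an REE from logical values, merging equal neighbours.
    fn from_logical(logical: &[i32]) -> Self {
        let mut out = Ree::default();
        for &v in logical {
            out.push_run(v, 1, true);
        }
        out
    }

    /// Append `len` copies of `value`. With `merge == true`, a run whose
    /// value equals the last stored value continues that run (option 1);
    /// otherwise a new run is always started (option 2).
    fn push_run(&mut self, value: i32, len: usize, merge: bool) {
        if len == 0 {
            return;
        }
        let end = self.run_ends.last().copied().unwrap_or(0) + len;
        if merge && self.values.last() == Some(&value) {
            // Continue the previous run: requires comparing values.
            *self.run_ends.last_mut().unwrap() = end;
        } else {
            self.run_ends.push(end);
            self.values.push(value);
        }
    }

    /// Copy the logical range `[start, end)` of `src` onto the end of `self`.
    fn extend(&mut self, src: &Ree, start: usize, end: usize, merge: bool) {
        let mut run_start = 0;
        for (&run_end, &value) in src.run_ends.iter().zip(&src.values) {
            // Intersect this source run with the requested range.
            let lo = run_start.max(start);
            let hi = run_end.min(end);
            if lo < hi {
                self.push_run(value, hi - lo, merge);
            }
            run_start = run_end;
        }
    }
}
```

With the source built from `[1, 1, 0, 0, 1, 1]`, calling `extend(&src, 0, 2, merge)` followed by `extend(&src, 4, 6, merge)` gives option 1 when `merge` is `true` and option 2 when it is `false`. The only difference between the two paths is the value comparison in `push_run`.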