GeorgeLeePatterson opened a new pull request, #327: URL: https://github.com/apache/arrow-js/pull/327
## Summary Adds O(1) amortized sequential access optimization for RunEndEncoded vectors through a stateful iterator implementation. This PR builds on the base RunEndEncoded support added in #326 by optimizing the iteration performance from O(n log n) to O(n) for sequential access patterns. ## Implementation Implements `RunEndEncodedIterator` based on Apache Arrow C++'s `PhysicalIndexFinder` caching algorithm: - **Caches** the last physical index from previous lookup - **Fast path**: Validates cached index is still valid for current logical index (O(1)) - **Fallback**: Uses cached index to partition search space, then binary searches the relevant partition - **Sequential iteration**: O(1) amortized per element instead of O(log n) ### Algorithm 1. Check if cached physical index is still valid for current logical index 2. If valid and within run bounds, return cached index immediately (common case) 3. If not valid, use cached index to partition search space into before/after ranges 4. Binary search only the relevant partition ### Performance - **Best case** (sequential iteration): O(1) per element - **Worst case** (random access): O(log n) + 1 extra probe vs standard binary search - **Typical iteration patterns**: Near-constant time per element ## Changes - `src/visitor/iterator.ts`: Added `RunEndEncodedIterator` class and `runEndEncodedIterator()` function ## Testing Existing iterator tests cover the functionality. The optimization is transparent to the API. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
