GeorgeLeePatterson opened a new pull request, #327:
URL: https://github.com/apache/arrow-js/pull/327

   ## Summary
   
   Adds O(1) amortized sequential access optimization for RunEndEncoded vectors 
through a stateful iterator implementation.
   
   This PR builds on the base RunEndEncoded support added in #326 by optimizing 
the iteration performance from O(n log n) to O(n) for sequential access 
patterns.
   
   ## Implementation
   
   Implements `RunEndEncodedIterator` based on Apache Arrow C++'s 
`PhysicalIndexFinder` caching algorithm:
   
   - **Caches** the last physical index from previous lookup
   - **Fast path**: Validates cached index is still valid for current logical 
index (O(1))
   - **Fallback**: Uses cached index to partition search space, then binary 
searches the relevant partition
   - **Sequential iteration**: O(1) amortized per element instead of O(log n)
   
   ### Algorithm
   
   1. Check if cached physical index is still valid for current logical index
   2. If valid and within run bounds, return cached index immediately (common 
case)
   3. If not valid, use cached index to partition search space into 
before/after ranges
   4. Binary search only the relevant partition
   
   ### Performance
   
   - **Best case** (sequential iteration): O(1) per element
   - **Worst case** (random access): O(log n) + 1 extra probe vs standard 
binary search
   - **Typical iteration patterns**: Near-constant time per element
   
   ## Changes
   
   - `src/visitor/iterator.ts`: Added `RunEndEncodedIterator` class and 
`runEndEncodedIterator()` function
   
   ## Testing
   
   Existing iterator tests cover the functionality. The optimization is 
transparent to the API.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to