alamb commented on issue #5523:
URL: https://github.com/apache/arrow-rs/issues/5523#issuecomment-3074832941

   Here is a thought I had about how to decide when to use `RowSelection` vs `BooleanArray`. I mentioned it to @XiangpengHao this afternoon but wanted to get it into writing somewhere.
   
   The heuristic for deciding between the two representations likely needs one quantity: the average run length of the selected portions of the mask.
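As a rough illustration of such a heuristic, here is a minimal sketch over a plain `&[bool]` (not arrow's `BooleanArray`, so it runs without any crates). The `FilterRepr` enum, `avg_selected_run_len`, `choose_repr`, and the `min_avg_run` threshold are all hypothetical; a real threshold would have to come from benchmarks.

```rust
/// Which representation to use for a filter result (hypothetical names,
/// standing in for parquet's `RowSelection` and arrow's `BooleanArray`).
#[derive(Debug, PartialEq)]
enum FilterRepr {
    RowSelection,
    BooleanMask,
}

/// Average length of the runs of `true` values in `mask`.
/// Returns 0.0 when nothing is selected.
fn avg_selected_run_len(mask: &[bool]) -> f64 {
    let selected = mask.iter().filter(|&&b| b).count();
    // A selected run starts at a `true` that is either the first element
    // or is preceded by a `false`.
    let runs = mask
        .iter()
        .enumerate()
        .filter(|&(i, &b)| b && (i == 0 || !mask[i - 1]))
        .count();
    if runs == 0 {
        0.0
    } else {
        selected as f64 / runs as f64
    }
}

/// Prefer a RowSelection when selected runs are long on average.
/// `min_avg_run` is an assumed tuning knob, not a measured value.
fn choose_repr(mask: &[bool], min_avg_run: f64) -> FilterRepr {
    if avg_selected_run_len(mask) >= min_avg_run {
        FilterRepr::RowSelection
    } else {
        FilterRepr::BooleanMask
    }
}

fn main() {
    // 8 selected rows in 2 runs -> average run length 4.0
    let clustered = [true, true, true, true, false, false, true, true, true, true];
    // 5 selected rows in 5 runs -> average run length 1.0
    let scattered = [true, false, true, false, true, false, true, false, true, false];
    println!("{:?}", choose_repr(&clustered, 2.0)); // RowSelection
    println!("{:?}", choose_repr(&scattered, 2.0)); // BooleanMask
}
```

Using the average selected run length (rather than the raw selectivity) is what distinguishes a few long, cheaply skippable ranges from many tiny ones that a bitmap handles better.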
   
   One way to calculate "how many times does the mask change" (namely, how many distinct `RowSelector`s would be needed) would be:
   
   ```rust
   use arrow::array::{Array, BooleanArray};
   use arrow::compute::kernels::cmp::neq;

   let mask: BooleanArray = evaluate_predicate();
   // compute the locations where a bit differs from the previous bit
   let transitions = neq(&mask.slice(0, mask.len() - 1), &mask.slice(1, mask.len() - 1))?;

   // each transition starts a new run, so for a non-empty mask the
   // total number of row selections (RowSelectors) would be
   let total_row_selections = transitions.true_count() + 1;

   // the average length of the selection can be obtained via the set bit
   // iterator
   ```
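To sanity-check the counting above, here is a std-only sketch that run-length encodes a `&[bool]` mask into `RowSelector`-like runs. The `RowSelector` struct is a local stand-in that only mirrors the `row_count`/`skip` shape of parquet's type, and `transitions` / `to_selectors` are hypothetical helpers. It shows that a non-empty mask with `t` transitions needs `t + 1` selectors, and derives the average selected run length from the selected runs.

```rust
/// Local stand-in mirroring the shape of parquet's `RowSelector`:
/// `row_count` consecutive rows that are either skipped or selected.
#[derive(Debug, PartialEq)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Number of positions where a bit differs from the previous bit,
/// i.e. what comparing the two offset slices with `neq` and counting
/// the `true` results computes.
fn transitions(mask: &[bool]) -> usize {
    mask.windows(2).filter(|w| w[0] != w[1]).count()
}

/// Run-length encode `mask` into alternating skip/select runs.
fn to_selectors(mask: &[bool]) -> Vec<RowSelector> {
    let mut out: Vec<RowSelector> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        // Extend the current run if this bit matches it
        if let Some(last) = out.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        // Otherwise a transition happened (or this is the first bit)
        out.push(RowSelector { row_count: 1, skip });
    }
    out
}

fn main() {
    let mask = [false, true, true, true, false, false, true, true];
    let selectors = to_selectors(&mask);
    // 3 transitions -> 4 runs: skip 1, select 3, skip 2, select 2
    assert_eq!(selectors.len(), transitions(&mask) + 1);

    // Average selected run length = selected rows / number of selected runs
    let selected_runs: Vec<usize> = selectors
        .iter()
        .filter(|s| !s.skip)
        .map(|s| s.row_count)
        .collect();
    let avg = selected_runs.iter().sum::<usize>() as f64 / selected_runs.len() as f64;
    // prints "4 selectors, average selected run 2.5"
    println!("{} selectors, average selected run {:.1}", selectors.len(), avg);
}
```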
   
   
   

