alamb commented on issue #5523: URL: https://github.com/apache/arrow-rs/issues/5523#issuecomment-3074832941
Here is a thought I had about how to decide when to use `RowSelection` vs `BooleanArray`. I mentioned it to @XiangpengHao this afternoon but wanted to get it into writing somewhere.

The heuristic for deciding between the two representations likely needs the quantity "what is the average run length of the selected portions of the mask?"

One way to calculate how many times the mask changes (namely, how many distinct `RowSelector`s would be needed) would be:

```rust
use arrow::array::{Array, BooleanArray};
use arrow::compute::kernels::cmp::ne;

let mask: BooleanArray = evaluate_predicate();
// compute locations where the subsequent bit is not equal to the previous bit
let transitions = ne(&mask.slice(0, mask.len() - 1), &mask.slice(1, mask.len() - 1))?;
// each transition starts a new run, so the total number of row selections would be
let total_row_selections = transitions.true_count() + 1;
// the average length of the selection can be obtained via the set bit iterator
```
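A minimal, dependency-free sketch of these statistics follows. It assumes the mask has already been materialized as a plain `&[bool]`, and the names `RunStats` and `run_stats` are hypothetical (not part of arrow-rs). It counts transitions the same way as comparing the mask with itself shifted by one, and derives the average selected run length by counting the maximal runs of `true` values:

```rust
/// Hypothetical summary of the runs in a selection mask.
#[derive(Debug, PartialEq)]
pub struct RunStats {
    /// Number of positions i where mask[i] != mask[i + 1]
    pub transitions: usize,
    /// Number of maximal runs (selected or skipped), i.e. RowSelectors needed
    pub total_runs: usize,
    /// Number of maximal runs of `true`
    pub selected_runs: usize,
    /// Total number of `true` values
    pub selected_rows: usize,
}

impl RunStats {
    /// Average length of the selected (`true`) runs, or None if nothing is selected.
    pub fn avg_selected_run_len(&self) -> Option<f64> {
        if self.selected_runs == 0 {
            None
        } else {
            Some(self.selected_rows as f64 / self.selected_runs as f64)
        }
    }
}

/// Hypothetical helper: computes run statistics for a mask given as a bool slice.
pub fn run_stats(mask: &[bool]) -> RunStats {
    // Same idea as comparing mask[0..n-1] with mask[1..n]: count adjacent pairs that differ.
    let transitions = mask.windows(2).filter(|w| w[0] != w[1]).count();
    // A non-empty mask has one more run than transitions; an empty mask has none.
    let total_runs = if mask.is_empty() { 0 } else { transitions + 1 };
    // A selected run starts at a `true` that is either first or follows a `false`.
    let selected_runs = mask
        .iter()
        .enumerate()
        .filter(|&(i, &b)| b && (i == 0 || !mask[i - 1]))
        .count();
    // Total selected rows, used as the numerator of the average run length.
    let selected_rows = mask.iter().filter(|&&b| b).count();
    RunStats { transitions, total_runs, selected_runs, selected_rows }
}
```

For example, the mask `[true, true, false, false, false, true, true, true, true, false]` has 3 transitions, 4 runs, and 2 selected runs covering 6 rows, for an average selected run length of 3.0. A long average would presumably favor `RowSelection` and a short one (many tiny runs) would favor keeping the `BooleanArray`; the exact cutoff is left open here. Within arrow-rs itself, the transition count could come from the `ne` kernel as above, and the selected runs could likely be enumerated with `BooleanBuffer::set_slices` instead of the manual scan.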