alamb commented on issue #3463:
URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3696495342

   Here is no
   
   Theory 3: overhead of converting back to mask is taking a long time
   
   Changes: force non-inlining of a few functions so they show up in the profile
   
   
   <details><summary>Details</summary>
   <p>
   
   ```diff
   diff --git a/parquet/src/arrow/arrow_reader/selection.rs b/parquet/src/arrow/arrow_reader/selection.rs
   index 2ddf812f9c3..5a91f3f273c 100644
   --- a/parquet/src/arrow/arrow_reader/selection.rs
   +++ b/parquet/src/arrow/arrow_reader/selection.rs
   @@ -146,6 +146,7 @@ impl RowSelection {
        /// # Panic
        ///
        /// Panics if any of the [`BooleanArray`] contain nulls
   +    #[inline(never)]
        pub fn from_filters(filters: &[BooleanArray]) -> Self {
            let mut next_offset = 0;
            let total_rows = filters.iter().map(|x| x.len()).sum();
   @@ -161,6 +162,7 @@ impl RowSelection {
        }
   
        /// Creates a [`RowSelection`] from an iterator of consecutive ranges to keep
   +    #[inline(never)]
        pub fn from_consecutive_ranges<I: Iterator<Item = Range<usize>>>(
            ranges: I,
            total_rows: usize,
   @@ -201,6 +203,7 @@ impl RowSelection {
        /// Note: this method does not make any effort to combine consecutive ranges, nor coalesce
        /// ranges that are close together. This is instead delegated to the IO subsystem to optimise,
        /// e.g. [`ObjectStore::get_ranges`](object_store::ObjectStore::get_ranges)
   +    #[inline(never)]
        pub fn scan_ranges(&self, page_locations: &[PageLocation]) -> Vec<Range<u64>> {
            let mut ranges: Vec<Range<u64>> = vec![];
            let mut row_offset = 0;
   @@ -342,6 +345,7 @@ impl RowSelection {
        /// Panics if `other` does not have a length equal to the number of rows selected
        /// by this RowSelection
        ///
   +    #[inline(never)]
        pub fn and_then(&self, other: &Self) -> Self {
            let mut selectors = vec![];
            let mut first = self.selectors.iter().cloned().peekable();
   @@ -923,6 +927,7 @@ impl RowSelectionCursor {
        }
    }
   
   +#[inline(never)]
    fn boolean_mask_from_selectors(selectors: &[RowSelector]) -> BooleanBuffer {
        let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
        let mut builder = BooleanBufferBuilder::new(total_rows);
   ```
   
   
   </p>
   </details> 
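As a minimal standalone sketch of what the `#[inline(never)]` attributes in the diff are for: the attribute asks the compiler not to inline the function, so it stays a separate symbol and a sampling profiler can attribute time to it rather than to its caller. The function names below are hypothetical and are not part of the parquet crate.

```rust
/// Hypothetical hot helper (not from the parquet crate).
/// Without the attribute, an optimized build may inline this into its
/// caller, and it would then not show up as its own frame in a profile.
#[inline(never)]
fn count_selected(mask: &[bool]) -> usize {
    mask.iter().filter(|&&selected| selected).count()
}

/// Hypothetical caller: in a profile, time spent counting is reported
/// under `count_selected` instead of being folded into this function.
fn count_selected_in_batches(batches: &[Vec<bool>]) -> usize {
    batches.iter().map(|batch| count_selected(batch)).sum()
}
```

The attribute is a hint for profiling builds like this one; it would normally be removed again once the measurement is done.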
   
   
   Profiling with  q10
   
   ```shell
   samply record -- /Users/andrewlamb/Software/datafusion2/target/profiling/datafusion-cli   -f q.sql  > /dev/null  2>&1
   ```
   ```sql
   SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
   ```
   Also note that this is a situation where the predicate column also appears 
in the projection ("MobilePhoneModel")
   
   Note that MobilePhoneModel is a StringViewArray and most values are short
   
   ```sql
   > select  "MobilePhoneModel", count(*) as c  from hits GROUP BY 1 ORDER BY 2 
desc;
   ```
   
   <details><summary>Details</summary>
   <p>
   
   ```sql
   +------------------+----------+
   | MobilePhoneModel | c        |
   +------------------+----------+
   |                  | 94434285 |
   | iPad             | 5106874  |
   | iPhone           | 229128   |
   | A500             | 75723    |
   | N8-00            | 24880    |
   | ONE TOUCH 6030A  | 12436    |
   | iPho             | 10163    |
   | GT-P7300B        | 9217     |
   | 3110000          | 8975     |
   | HTC Desire       | 7129     |
   | eagle75          | 6084     |
   | GT-I9500         | 5833     |
   | Transformer      | 4566     |
   | GT-I9100         | 4303     |
   | LG/P760/V1       | 4214     |
   | 5250             | 3635     |
   | GT-I9192         | 3518     |
   | MTC              | 3196     |
   | SGH-I317         | 3066     |
   | GT-S5830         | 2391     |
   | One_dual         | 1912     |
   | IQ245Plus        | 1812     |
   | HTC One          | 1802     |
   | 308              | 1705     |
   | Acer A701        | 1644     |
   | S7-3             | 1631     |
   | HTC_WildfireSV   | 1551     |
   | X2-02            | 1538     |
   | HTC              | 1326     |
   | GT-P7500         | 1315     |
   | Sensation        | 1305     |
   | LT26ii           | 1124     |
   | HTC Wildfire     | 1083     |
   | GT-P7500R        | 1055     |
   | U8800pro         | 960      |
   | Nokia303         | 955      |
   | GT-I9000000.2    | 896      |
   | 3110             | 885      |
   | ST25i            | 832      |
   | W536             | 829      |
   | .                           |
   | .                           |
   | .                           |
   +------------------+----------+
   ```
   
   </p>
   </details> 
   
   
   While looking at the profile, it seems to me like the overall cost of 
decoding the rows and then applying the mask is taking longer than using the 
selectors 
   
   <img width="1728" height="849" alt="Image" src="https://github.com/user-attachments/assets/58ed9367-0812-4baa-acb1-6dde1c42da51" />
   
   Note that only 2% of the time seems to be spent converting selectors back 
and forth to masks
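For reference, the selector/mask round trip being measured can be sketched as follows. This is a simplified illustration, not the parquet crate's code: `RowSelector` here is a local stand-in struct, and a plain `Vec<bool>` is used in place of `BooleanBuffer` so the example has no dependencies.

```rust
/// Local stand-in for a run-length selector: `row_count` consecutive rows
/// that are either all skipped or all selected.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Expand run-length selectors into one boolean per row (true = keep).
fn mask_from_selectors(selectors: &[RowSelector]) -> Vec<bool> {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    let mut mask = Vec::with_capacity(total_rows);
    for s in selectors {
        mask.extend(std::iter::repeat(!s.skip).take(s.row_count));
    }
    mask
}

/// Collapse a per-row mask back into run-length selectors.
fn selectors_from_mask(mask: &[bool]) -> Vec<RowSelector> {
    let mut selectors: Vec<RowSelector> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        // Extend the current run if this row has the same skip/select state.
        if let Some(last) = selectors.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        // Otherwise start a new run of length one.
        selectors.push(RowSelector { row_count: 1, skip });
    }
    selectors
}
```

Both directions are a single linear pass over the rows, which is consistent with the conversion being a small share of the profile.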
   
   ## next experiments:
   1. Adjust the threshold used to choose between mask and selectors (from 32 to 64, for example)
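A rough sketch of what such a threshold experiment might look like is below. It assumes the threshold is compared against the average run length of the selectors, which this comment does not state; the `Strategy` enum and `choose_strategy` function are hypothetical names for illustration only.

```rust
/// Hypothetical strategy choice (names are illustrative, not a real API).
#[derive(Debug, PartialEq)]
enum Strategy {
    Mask,
    Selectors,
}

/// Assumed heuristic: many short runs favor a mask, a few long runs favor
/// selectors. `run_lengths` holds the row count of each selector.
fn choose_strategy(run_lengths: &[usize], threshold: usize) -> Strategy {
    if run_lengths.is_empty() {
        return Strategy::Selectors;
    }
    let total_rows: usize = run_lengths.iter().sum();
    let average_run = total_rows / run_lengths.len();
    if average_run < threshold {
        Strategy::Mask
    } else {
        Strategy::Selectors
    }
}
```

Under this assumption, raising the threshold from 32 to 64 moves selections with average runs of 32 to 63 rows from the selector path to the mask path, which is the population the experiment would affect.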
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


