alamb commented on issue #3463:
URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3696495342
Here is no
Theory 3: overhead of converting back to mask is taking a long time
Changes: force non-inlining of a few functions so they show up in the profile
<details><summary>Details</summary>
<p>
```diff
diff --git a/parquet/src/arrow/arrow_reader/selection.rs b/parquet/src/arrow/arrow_reader/selection.rs
index 2ddf812f9c3..5a91f3f273c 100644
--- a/parquet/src/arrow/arrow_reader/selection.rs
+++ b/parquet/src/arrow/arrow_reader/selection.rs
@@ -146,6 +146,7 @@ impl RowSelection {
/// # Panic
///
/// Panics if any of the [`BooleanArray`] contain nulls
+ #[inline(never)]
pub fn from_filters(filters: &[BooleanArray]) -> Self {
let mut next_offset = 0;
let total_rows = filters.iter().map(|x| x.len()).sum();
@@ -161,6 +162,7 @@ impl RowSelection {
}
/// Creates a [`RowSelection`] from an iterator of consecutive ranges to keep
+ #[inline(never)]
pub fn from_consecutive_ranges<I: Iterator<Item = Range<usize>>>(
ranges: I,
total_rows: usize,
@@ -201,6 +203,7 @@ impl RowSelection {
/// Note: this method does not make any effort to combine consecutive ranges, nor coalesce
/// ranges that are close together. This is instead delegated to the IO subsystem to optimise,
/// e.g. [`ObjectStore::get_ranges`](object_store::ObjectStore::get_ranges)
+ #[inline(never)]
pub fn scan_ranges(&self, page_locations: &[PageLocation]) -> Vec<Range<u64>> {
let mut ranges: Vec<Range<u64>> = vec![];
let mut row_offset = 0;
@@ -342,6 +345,7 @@ impl RowSelection {
/// Panics if `other` does not have a length equal to the number of rows selected
/// by this RowSelection
///
+ #[inline(never)]
pub fn and_then(&self, other: &Self) -> Self {
let mut selectors = vec![];
let mut first = self.selectors.iter().cloned().peekable();
@@ -923,6 +927,7 @@ impl RowSelectionCursor {
}
}
+#[inline(never)]
fn boolean_mask_from_selectors(selectors: &[RowSelector]) -> BooleanBuffer {
let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
let mut builder = BooleanBufferBuilder::new(total_rows);
```
</p>
</details>
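For readers unfamiliar with the conversion being profiled, here is a minimal, self-contained sketch of it. It is not the parquet crate's code: `RowSelector` below is a simplified stand-in, the mask is a plain `Vec<bool>` rather than a `BooleanBuffer`, and `selectors_from_mask` is a hypothetical helper added only to show the reverse direction. The `#[inline(never)]` attribute asks the compiler to keep each function as a separate symbol so that it can appear as its own frame in a sampling profile.

```rust
/// Simplified stand-in for a run-length row selector (an assumption for
/// illustration, not the parquet crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowSelector {
    pub row_count: usize,
    pub skip: bool,
}

/// Expand run-length selectors into one bool per row (`true` = selected).
/// `#[inline(never)]` is a hint that keeps this function out of its callers,
/// so a profiler can attribute time to it directly.
#[inline(never)]
pub fn boolean_mask_from_selectors(selectors: &[RowSelector]) -> Vec<bool> {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    let mut mask = Vec::with_capacity(total_rows);
    for s in selectors {
        mask.extend(std::iter::repeat(!s.skip).take(s.row_count));
    }
    mask
}

/// Hypothetical inverse: collapse a per-row mask back into run-length selectors.
#[inline(never)]
pub fn selectors_from_mask(mask: &[bool]) -> Vec<RowSelector> {
    let mut out: Vec<RowSelector> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        // Extend the current run when the row has the same skip/select state.
        if let Some(last) = out.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        out.push(RowSelector { row_count: 1, skip });
    }
    out
}
```

Round-tripping a selection through the mask and back should return the original selectors, as long as adjacent selectors alternate between skip and select.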
Profiling with q10
```shell
samply record -- /Users/andrewlamb/Software/datafusion2/target/profiling/datafusion-cli -f q.sql > /dev/null 2>&1
```
```sql
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE
"MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
```
Also note that this is a case where the predicate column ("MobilePhoneModel") also appears in the projection.
Note that MobilePhoneModel is a StringViewArray and most values are short.
```sql
> select "MobilePhoneModel", count(*) as c from hits GROUP BY 1 ORDER BY 2 desc;
```
<details><summary>Details</summary>
<p>
```sql
+------------------+----------+
| MobilePhoneModel | c        |
+------------------+----------+
|                  | 94434285 |
| iPad             | 5106874  |
| iPhone           | 229128   |
| A500             | 75723    |
| N8-00            | 24880    |
| ONE TOUCH 6030A  | 12436    |
| iPho             | 10163    |
| GT-P7300B        | 9217     |
| 3110000          | 8975     |
| HTC Desire       | 7129     |
| eagle75          | 6084     |
| GT-I9500         | 5833     |
| Transformer      | 4566     |
| GT-I9100         | 4303     |
| LG/P760/V1       | 4214     |
| 5250             | 3635     |
| GT-I9192         | 3518     |
| MTC              | 3196     |
| SGH-I317         | 3066     |
| GT-S5830         | 2391     |
| One_dual         | 1912     |
| IQ245Plus        | 1812     |
| HTC One          | 1802     |
| 308              | 1705     |
| Acer A701        | 1644     |
| S7-3             | 1631     |
| HTC_WildfireSV   | 1551     |
| X2-02            | 1538     |
| HTC              | 1326     |
| GT-P7500         | 1315     |
| Sensation        | 1305     |
| LT26ii           | 1124     |
| HTC Wildfire     | 1083     |
| GT-P7500R        | 1055     |
| U8800pro         | 960      |
| Nokia303         | 955      |
| GT-I9000000.2    | 896      |
| 3110             | 885      |
| ST25i            | 832      |
| W536             | 829      |
| .                           |
| .                           |
| .                           |
+------------------+----------+
```
</p>
</details>
Looking at the profile, it seems to me that the overall cost of decoding the rows and then applying the mask is higher than the cost of using the selectors.
<img width="1728" height="849" alt="Image"
src="https://github.com/user-attachments/assets/58ed9367-0812-4baa-acb1-6dde1c42da51"
/>
Note that only 2% of the time seems to be spent converting selectors back and forth to masks.
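To make the two strategies being compared concrete, here is a rough sketch under simplifying assumptions: values are plain `u32`s, the mask is a `&[bool]`, and selectors are `(row_count, skip)` tuples. Both function names are hypothetical and neither is the parquet reader's code. The mask path touches every row after decoding it, while the selector path copies only the selected runs and steps over the rest.

```rust
/// Mask path (illustrative): all rows are already decoded; keep those whose
/// mask entry is `true`.
pub fn filter_with_mask(decoded: &[u32], mask: &[bool]) -> Vec<u32> {
    decoded
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect()
}

/// Selector path (illustrative): walk `(row_count, skip)` runs, copying
/// selected runs and skipping over the others without touching their rows.
pub fn read_with_selectors(source: &[u32], runs: &[(usize, bool)]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut offset = 0;
    for &(row_count, skip) in runs {
        if !skip {
            out.extend_from_slice(&source[offset..offset + row_count]);
        }
        offset += row_count;
    }
    out
}
```

Both paths produce the same rows for equivalent inputs; the difference the profile is probing is how much work each one does per skipped row.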
## next experiments:
1. Adjust the threshold when using mask --> selectors (from 32 to 64, for
example)
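As a sketch of what such a threshold could control: the rule below picks the mask when the average run length of the selection is below the threshold, and the selectors otherwise. That rule is an assumption made for illustration, and `Strategy` and `choose_strategy` are hypothetical names, not the reader's API. Only the values 32 and 64 come from the experiment above.

```rust
/// Which representation to use when applying a selection (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    Mask,
    Selectors,
}

/// Assumed heuristic: short average runs favour a mask, long runs favour
/// selectors. `run_lengths` holds the row count of each selector.
pub fn choose_strategy(run_lengths: &[usize], threshold: usize) -> Strategy {
    if run_lengths.is_empty() {
        return Strategy::Selectors;
    }
    let total_rows: usize = run_lengths.iter().sum();
    let average_run = total_rows / run_lengths.len();
    if average_run < threshold {
        Strategy::Mask
    } else {
        Strategy::Selectors
    }
}
```

Under this assumed rule, a selection with an average run of 50 rows would use selectors at a threshold of 32 and switch to the mask at 64, which is the kind of shift the experiment would measure.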