This is an automated email from the ASF dual-hosted git repository.

kamille pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 6cc514bf6f Add extended query for checking improvement for blocked 
groups optimization (#15936)
6cc514bf6f is described below

commit 6cc514bf6fd5419dead03edc16b071d73ddb861d
Author: kamille <kami...@apache.org>
AuthorDate: Sat May 3 22:50:39 2025 +0800

    Add extended query for checking improvement for blocked groups optimization 
(#15936)
    
    * add query to show improvement for 15591.
    
    * document the new added query.
---
 benchmarks/queries/clickbench/README.md    | 32 ++++++++++++++++++++++++++++++
 benchmarks/queries/clickbench/extended.sql |  1 +
 2 files changed, 33 insertions(+)

diff --git a/benchmarks/queries/clickbench/README.md 
b/benchmarks/queries/clickbench/README.md
index bbe6981960..7d27d3c6c3 100644
--- a/benchmarks/queries/clickbench/README.md
+++ b/benchmarks/queries/clickbench/README.md
@@ -132,6 +132,7 @@ Results look like
 ```
 
 ### Q6: How many social shares meet complex multi-stage filtering criteria?
+
 **Question**: What is the count of sharing actions from iPhone mobile users on 
specific social networks, within common timezones, participating in seasonal 
campaigns, with high screen resolutions and closely matched UTM parameters?
 **Important Query Properties**: Simple filter with high-selectivity, Costly 
string matching, A large number of filters with high overhead are positioned 
relatively later in the process
 
@@ -159,6 +160,37 @@ WHERE
 ```
 Result is empty,Since it has already been filtered by `"SocialAction" = 
'share'`.
 
+### Q7: Device Resolution and Refresh Behavior Analysis
+
+**Question**: Identify the top 10 WatchIDs with the highest resolution range 
(min/max "ResolutionWidth") and total refresh count ("IsRefresh") in descending 
WatchID order
+
+**Important Query Properties**: Primitive aggregation functions, group by 
single primitive column, high cardinality grouping
+
+```sql
+SELECT "WatchID", MIN("ResolutionWidth") as wmin, MAX("ResolutionWidth") as 
wmax, SUM("IsRefresh") as srefresh
+FROM hits
+GROUP BY "WatchID"
+ORDER BY "WatchID" DESC
+LIMIT 10;
+```
+
+Results look like
+```
++---------------------+------+------+----------+
+| WatchID             | wmin | wmax | srefresh |
++---------------------+------+------+----------+
+| 9223372033328793741 | 1368 | 1368 | 0        |
+| 9223371941779979288 | 1479 | 1479 | 0        |
+| 9223371906781104763 | 1638 | 1638 | 0        |
+| 9223371803397398692 | 1990 | 1990 | 0        |
+| 9223371799215233959 | 1638 | 1638 | 0        |
+| 9223371785975219972 | 0    | 0    | 0        |
+| 9223371776706839366 | 1368 | 1368 | 0        |
+| 9223371740707848038 | 1750 | 1750 | 0        |
+| 9223371715190479830 | 1368 | 1368 | 0        |
+| 9223371620124912624 | 1828 | 1828 | 0        |
++---------------------+------+------+----------+
+```
 
 ## Data Notes
 
diff --git a/benchmarks/queries/clickbench/extended.sql 
b/benchmarks/queries/clickbench/extended.sql
index 455de3a98f..f210c3a56e 100644
--- a/benchmarks/queries/clickbench/extended.sql
+++ b/benchmarks/queries/clickbench/extended.sql
@@ -5,3 +5,4 @@ SELECT "SocialSourceNetworkID", "RegionID", COUNT(*), 
AVG("Age"), AVG("ParamPric
 SELECT "ClientIP", "WatchID",  COUNT(*) c, MIN("ResponseStartTiming") tmin, 
MEDIAN("ResponseStartTiming") tmed, MAX("ResponseStartTiming") tmax FROM hits 
WHERE "JavaEnable" = 0  GROUP BY  "ClientIP", "WatchID" HAVING c > 1 ORDER BY 
tmed DESC LIMIT 10;
 SELECT "ClientIP", "WatchID",  COUNT(*) c, MIN("ResponseStartTiming") tmin, 
APPROX_PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY "ResponseStartTiming") 
tp95, MAX("ResponseStartTiming") tmax FROM 'hits' WHERE "JavaEnable" = 0 GROUP 
BY  "ClientIP", "WatchID" HAVING c > 1 ORDER BY tp95 DESC LIMIT 10;
 SELECT COUNT(*) AS ShareCount FROM hits WHERE "IsMobile" = 1 AND 
"MobilePhoneModel" LIKE 'iPhone%' AND "SocialAction" = 'share' AND 
"SocialSourceNetworkID" IN (5, 12) AND "ClientTimeZone" BETWEEN -5 AND 5 AND 
regexp_match("Referer", '\/campaign\/(spring|summer)_promo') IS NOT NULL AND 
CASE WHEN split_part(split_part("URL", 'resolution=', 2), '&', 1) ~ '^\d+$' 
THEN split_part(split_part("URL", 'resolution=', 2), '&', 1)::INT ELSE 0 END > 
1920 AND levenshtein(CAST("UTMSource" AS STRING), C [...]
+SELECT "WatchID", MIN("ResolutionWidth") as wmin, MAX("ResolutionWidth") as 
wmax, SUM("IsRefresh") as srefresh FROM hits GROUP BY "WatchID" ORDER BY 
"WatchID" DESC LIMIT 10;


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org

Reply via email to