l1t1 opened a new issue, #10245:
URL: https://github.com/apache/datafusion/issues/10245

   ### Is your feature request related to a problem or challenge?
   
   it took about 280s to group by 4 columns on a table of 1e8 rows
   
   ```
   > create table ren as select
   ((random()*1000)::int%2+1)::int c1,
   ((random()*1000)::int%100+1)::int c2,
   ((random()*1000)::int%30+1)::int c3,
   ((random()*1000)::int%60+1)::int c4,
   ((random()*1000)::int%5+1)::int c5
   from unnest(generate_series(1,1e8::int));
   0 row(s) fetched.
   Elapsed 317.856 seconds.
   
   > create table rc1c2c3c4 as select c1,c2,c3,c4,sum(1)cnt from ren group by 
cube(c1,c2,c3,c4);
   0 row(s) fetched.
   Elapsed 281.501 seconds.
   
   > create table t as select c1,c2,c3,c4,sum(1)cnt from ren group by 
c1,c2,c3,c4;
   0 row(s) fetched.
   Elapsed 38.093 seconds.
   
   > create table rc1c2c3c4m as
   select c1,c2,c3,c4,sum(1)cnt from ren group by c1,c2,c3,c4 union all
   select c1,null c2,null c3,null c4,sum(1)cnt from ren group by c1 union all
   select null c1,c2,null c3,null c4,sum(1)cnt from ren group by c2 union all
   select null c1,null c2,c3,null c4,sum(1)cnt from ren group by c3 union all
   select null c1,null c2,null c3,c4,sum(1)cnt from ren group by c4 union all
   select c1,c2,null c3,null c4,sum(1)cnt from ren group by c1,c2 union all
   select c1,null c2,c3,null c4,sum(1)cnt from ren group by c1,c3 union all
   select c1,null c2,null c3,c4,sum(1)cnt from ren group by c1,c4 union all
   select null c1,c2,c3,null c4,sum(1)cnt from ren group by c2,c3 union all
   select null c1,c2,null c3,c4,sum(1)cnt from ren group by c2,c4 union all
   select null c1,null c2,c3,c4,sum(1)cnt from ren group by c3,c4 union all
   select c1,c2,c3,null c4,sum(1)cnt from ren group by c1,c2,c3 union all
   select c1,c2,null c3,c4,sum(1)cnt from ren group by c1,c2,c4 union all
   select c1,null c2,c3,c4,sum(1)cnt from ren group by c1,c3,c4 union all
   select null c1,c2,c3,c4,sum(1)cnt from ren group by c2,c3,c4 union all
   select null c1,null c2,null c3,null c4,sum(1)cnt from ren  ;
   0 row(s) fetched.
   Elapsed 1077.624 seconds.
   ```
   the test of duckdb is relatively fast, though its `group by cube` is slower 
than the equivalent `union all`
   
   https://github.com/duckdb/duckdb/issues/10450
   
   ### Describe the solution you'd like
   
   the result of cube should  use the results of one group by statement as temp 
table and query it subsequently.
   such as
   https://github.com/duckdb/duckdb/issues/10451 said
   
   ```
   create table t as select c1,c2,c3,c4,sum(1)cnt from ren group by c1,c2,c3,c4;
   --Run Time (s): real 6.829 user 70.730853 sys 10.280466
   create table rc1c2c3c4m3 as 
   select * from t union all
   select c1,null,null,null,sum(cnt)cnt from t group by c1 union all
   select null,c2,null,null,sum(cnt)cnt from t group by c2 union all
   select null,null,c3,null,sum(cnt)cnt from t group by c3 union all
   select null,null,null,c4,sum(cnt)cnt from t group by c4 union all
   select c1,c2,null,null,sum(cnt)cnt from t group by c1,c2 union all
   select c1,null,c3,null,sum(cnt)cnt from t group by c1,c3 union all
   select c1,null,null,c4,sum(cnt)cnt from t group by c1,c4 union all
   select null,c2,c3,null,sum(cnt)cnt from t group by c2,c3 union all
   select null,c2,null,c4,sum(cnt)cnt from t group by c2,c4 union all
   select null,null,c3,c4,sum(cnt)cnt from t group by c3,c4 union all
   select c1,c2,c3,null,sum(cnt)cnt from t group by c1,c2,c3 union all
   select c1,c2,null,c4,sum(cnt)cnt from t group by c1,c2,c4 union all
   select c1,null,c3,c4,sum(cnt)cnt from t group by c1,c3,c4 union all
   select null,c2,c3,c4,sum(cnt)cnt from t group by c2,c3,c4 union all
   select null,null,null,null,sum(cnt)cnt from t    ;
   ```
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to