l1t1 opened a new issue, #10245: URL: https://github.com/apache/datafusion/issues/10245
### Is your feature request related to a problem or challenge? it took about 280s to group by 4 columns on a table of 1e8 rows ``` > create table ren as select ((random()*1000)::int%2+1)::int c1, ((random()*1000)::int%100+1)::int c2, ((random()*1000)::int%30+1)::int c3, ((random()*1000)::int%60+1)::int c4, ((random()*1000)::int%5+1)::int c5 from unnest(generate_series(1,1e8::int)); 0 row(s) fetched. Elapsed 317.856 seconds. > create table rc1c2c3c4 as select c1,c2,c3,c4,sum(1)cnt from ren group by cube(c1,c2,c3,c4); 0 row(s) fetched. Elapsed 281.501 seconds. > create table t as select c1,c2,c3,c4,sum(1)cnt from ren group by c1,c2,c3,c4; 0 row(s) fetched. Elapsed 38.093 seconds. > create table rc1c2c3c4m as select c1,c2,c3,c4,sum(1)cnt from ren group by c1,c2,c3,c4 union all select c1,null c2,null c3,null c4,sum(1)cnt from ren group by c1 union all select null c1,c2,null c3,null c4,sum(1)cnt from ren group by c2 union all select null c1,null c2,c3,null c4,sum(1)cnt from ren group by c3 union all select null c1,null c2,null c3,c4,sum(1)cnt from ren group by c4 union all select c1,c2,null c3,null c4,sum(1)cnt from ren group by c1,c2 union all select c1,null c2,c3,null c4,sum(1)cnt from ren group by c1,c3 union all select c1,null c2,null c3,c4,sum(1)cnt from ren group by c1,c4 union all select null c1,c2,c3,null c4,sum(1)cnt from ren group by c2,c3 union all select null c1,c2,null c3,c4,sum(1)cnt from ren group by c2,c4 union all select null c1,null c2,c3,c4,sum(1)cnt from ren group by c3,c4 union all select c1,c2,c3,null c4,sum(1)cnt from ren group by c1,c2,c3 union all select c1,c2,null c3,c4,sum(1)cnt from ren group by c1,c2,c4 union all select c1,null c2,c3,c4,sum(1)cnt from ren group by c1,c3,c4 union all select null c1,c2,c3,c4,sum(1)cnt from ren group by c2,c3,c4 union all select null c1,null c2,null c3,null c4,sum(1)cnt from ren ; 0 row(s) fetched. Elapsed 1077.624 seconds. ``` the test of duckdb is relatively fast, though its `group by cube` is slower than the equivalent `union all` https://github.com/duckdb/duckdb/issues/10450 ### Describe the solution you'd like the result of cube should use the results of one group by statement as temp table and query it subsequently. such as https://github.com/duckdb/duckdb/issues/10451 said ``` create table t as select c1,c2,c3,c4,sum(1)cnt from ren group by c1,c2,c3,c4; --Run Time (s): real 6.829 user 70.730853 sys 10.280466 create table rc1c2c3c4m3 as select * from t union all select c1,null,null,null,sum(cnt)cnt from t group by c1 union all select null,c2,null,null,sum(cnt)cnt from t group by c2 union all select null,null,c3,null,sum(cnt)cnt from t group by c3 union all select null,null,null,c4,sum(cnt)cnt from t group by c4 union all select c1,c2,null,null,sum(cnt)cnt from t group by c1,c2 union all select c1,null,c3,null,sum(cnt)cnt from t group by c1,c3 union all select c1,null,null,c4,sum(cnt)cnt from t group by c1,c4 union all select null,c2,c3,null,sum(cnt)cnt from t group by c2,c3 union all select null,c2,null,c4,sum(cnt)cnt from t group by c2,c4 union all select null,null,c3,c4,sum(cnt)cnt from t group by c3,c4 union all select c1,c2,c3,null,sum(cnt)cnt from t group by c1,c2,c3 union all select c1,c2,null,c4,sum(cnt)cnt from t group by c1,c2,c4 union all select c1,null,c3,c4,sum(cnt)cnt from t group by c1,c3,c4 union all select null,c2,c3,c4,sum(cnt)cnt from t group by c2,c3,c4 union all select null,null,null,null,sum(cnt)cnt from t ; ``` ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
