jiangzhx opened a new issue #1246: URL: https://github.com/apache/arrow-datafusion/issues/1246
**Describe the bug**

In DataFusion, a `GROUP BY` on a high-cardinality column is much slower than a `GROUP BY` on a low-cardinality column: about 7.5 times slower in the reproduction below (2.582 s vs. 0.341 s). I also tested other OLAP engines, where the high-cardinality query is at most about 2 times slower:

### [Trino](https://github.com/trinodb/trino)

OLAP engine written in Java

- Low-cardinality `GROUP BY`: ~1000 ms
- High-cardinality `GROUP BY`: ~2000 ms

### [Doris](https://github.com/apache/incubator-doris/)

OLAP engine written in C++

- Low-cardinality `GROUP BY`: ~350 ms
- High-cardinality `GROUP BY`: ~500 ms

**To Reproduce**

Steps to reproduce the behavior:

Use a Parquet table with 60,000,000 rows, with data generated by [ssb-dbgen](https://github.com/electrum/ssb-dbgen).

Group by `LO_ORDERPRIORITY` (low cardinality, 5 groups):

`SELECT sum(LO_EXTENDEDPRICE) AS revenue FROM lineorder_flat group by LO_ORDERPRIORITY;`

`5 rows in set. Query took 0.341 seconds.`

Group by `S_ADDRESS` (high cardinality, 20,000 groups):

`SELECT sum(LO_EXTENDEDPRICE) AS revenue FROM lineorder_flat group by S_ADDRESS;`

`20000 rows in set. Query took 2.582 seconds.`

**Expected behavior**

The slowdown for the high-cardinality `GROUP BY` should be similar to other engines (about 2 times or less).

--
This is an automated message from the Apache Git Service.