alamb commented on issue #6897: URL: https://github.com/apache/arrow-datafusion/issues/6897#issuecomment-1629071485
I have verified this has been fixed on master (aka what will be released in DataFusion `28.0.0`).

BTW I added new test coverage in https://github.com/apache/arrow-datafusion/pull/6836 so that we don't break this again by accident.

Since it is a regression, I would be willing to create a patch release (`27.0.1`) with the fix if that would be helpful for others.

Using this query (thanks for the reproducer @maxburke 🙏 )

```sql
SELECT "day" AS "date", count(distinct "direction") AS "num_directions"
FROM 'test_data.parquet'
GROUP BY "day"
ORDER BY "day" ASC;
```

## `26.0.0` works

```shell
DataFusion CLI v26.0.0
❯ SELECT "day" AS "date", count(distinct "direction") AS "num_directions" FROM 'test_data.parquet' GROUP BY "day" ORDER BY "day" ASC;
+---------------------+----------------+
| date                | num_directions |
+---------------------+----------------+
| 2011-09-09T00:00:00 | 2              |
| 2011-09-10T00:00:00 | 2              |
...
| 2018-04-14T00:00:00 | 2              |
| 2018-04-15T00:00:00 | 2              |
+---------------------+----------------+
81 rows in set. Query took 0.024 seconds.
❯
```

## `27.0.0` fails

```shell
DataFusion CLI v27.0.0
❯ SELECT "day" AS "date", count(distinct "direction") AS "num_directions" FROM 'test_data.parquet' GROUP BY "day" ORDER BY "day" ASC;
Optimizer rule 'simplify_expressions' failed
caused by
Schema error: No field named "test_data.parquet".day. Valid fields are "test_data.parquet.day", "COUNT(DISTINCT test_data.parquet.direction)".
❯
```

## `main` passes:

```shell
$ git checkout main
Already on 'main'
Your branch is up to date with 'apache/main'.
$ CARGO_TARGET_DIR=/Users/alamb/Software/target-df cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `/Users/alamb/Software/target-df/debug/datafusion-cli`
DataFusion CLI v27.0.0
❯ SELECT "day" AS "date", count(distinct "direction") AS "num_directions" FROM 'test_data.parquet' GROUP BY "day" ORDER BY "day" ASC;
+---------------------+----------------+
| date                | num_directions |
+---------------------+----------------+
| 2011-09-09T00:00:00 | 2              |
| 2011-09-10T00:00:00 | 2              |
...
| 2018-04-14T00:00:00 | 2              |
| 2018-04-15T00:00:00 | 2              |
+---------------------+----------------+
81 rows in set. Query took 0.027 seconds.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
