LiaCastaneda opened a new issue, #17012: URL: https://github.com/apache/datafusion/issues/17012
### Describe the bug

Queries that group by columns of type `List<Dictionary<(),()>>` fail with the following error:

`Expected infallible creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("[Large]ListArray's child datatype Utf8 does not correspond to the List's datatype Dictionary(Int8, Utf8)")`

This happens during a round trip from `ArrayRef` -> `Row` -> `ArrayRef`: it panics in `convert_rows`. I believe this is because encoding a dictionary to `Row` does not seem to preserve the dictionary encoding (see [here](https://github.com/apache/arrow-rs/blob/079d4f2db87c9b542c63c4f862876d5559dbfd99/arrow-row/src/lib.rs#L1608)).

The error arises from Arrow, but in DataFusion it may happen [here](https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/row.rs), since that code uses Arrow's `RowConverter` and does round-trip conversions. There is already an open issue on Arrow: https://github.com/apache/arrow-rs/issues/7165

### To Reproduce

```
// Imports assumed for a standalone test; adjust the paths to your crate layout.
use std::sync::Arc;

use datafusion::arrow::array::{Int32Array, ListBuilder, StringDictionaryBuilder};
use datafusion::arrow::datatypes::{DataType, Field, Int8Type, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::test]
async fn df_list_of_dict_should_panic() -> Result<()> {
    // build List<Dictionary<Int8,Utf8>>
    let mut dict_builder = StringDictionaryBuilder::<Int8Type>::new();
    for s in ["foo", "bar", "baz", "foo"] {
        dict_builder.append(s)?;
    }
    let mut list_builder = ListBuilder::new(dict_builder);
    list_builder.values().append("foo")?;
    list_builder.values().append("bar")?;
    list_builder.append(true);
    list_builder.values().append("baz")?;
    list_builder.append(true);
    let list_dict = list_builder.finish();

    let schema = Arc::new(Schema::new(vec![
        Field::new("a", DataType::Int32, false),
        Field::new("c", list_dict.data_type().clone(), false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1, 2])), Arc::new(list_dict)],
    )?;

    let ctx = SessionContext::new();
    ctx.register_batch("x", batch)?;

    // GROUP BY forces Aggregate (first RowConverter pass)
    // ORDER BY … LIMIT forces TopKExec (second pass)
    let df = ctx
        .sql(
            r#"
            SELECT c, COUNT(*) AS cnt
            FROM x
            GROUP BY c
            ORDER BY cnt DESC
            LIMIT 10
            "#,
        )
        .await?;
    df.collect().await?;
    Ok(())
}
```

### Expected behavior

The query should complete without throwing an error.

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org