LiaCastaneda opened a new issue, #17012:
URL: https://github.com/apache/datafusion/issues/17012

   ### Describe the bug
   
   Queries that group by columns of type `List<Dictionary<_, _>>` fail with the following error:
   
   ```
   Expected infallible creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("[Large]ListArray's child datatype Utf8 does not correspond to the List's datatype Dictionary(Int8, Utf8)")
   ```
   
   This happens during the round trip `ArrayRef` -> `Row` -> `ArrayRef`: it panics in `convert_rows`. I believe this is because encoding a dictionary into the row format does not preserve the dictionary encoding (see [here](https://github.com/apache/arrow-rs/blob/079d4f2db87c9b542c63c4f862876d5559dbfd99/arrow-row/src/lib.rs#L1608)), so the decoded list child comes back as plain `Utf8` and no longer matches the list's declared `Dictionary(Int8, Utf8)` child type.
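   To make the suspected mechanism concrete, here is a deliberately simplified, std-only Rust model of the round trip. It is not Arrow code: `ToyType`, `ToyChild`, `encode_rows`, `decode_rows`, `build_list` and `round_trip_demo` are hypothetical names, and the model simply assumes (as described above) that the encoder writes out dictionary *values* and drops the keys/dictionary, while list reconstruction validates the child type against the declared one.

   ```rust
   // Hypothetical toy model of ArrayRef -> Row -> ArrayRef; NOT the arrow-row implementation.

   /// Minimal stand-in for a column data type.
   #[derive(Debug, Clone, PartialEq)]
   enum ToyType {
       Int8,
       Utf8,
       /// Dictionary(key type, value type)
       Dictionary(Box<ToyType>, Box<ToyType>),
   }

   /// Child values of a list column: dictionary-encoded or plain strings.
   #[derive(Debug, Clone, PartialEq)]
   enum ToyChild {
       Dict { keys: Vec<i8>, values: Vec<String> },
       Plain(Vec<String>),
   }

   impl ToyChild {
       fn data_type(&self) -> ToyType {
           match self {
               ToyChild::Dict { .. } => {
                   ToyType::Dictionary(Box::new(ToyType::Int8), Box::new(ToyType::Utf8))
               }
               ToyChild::Plain(_) => ToyType::Utf8,
           }
       }
   }

   /// Assumed encoding: each element becomes the bytes of its string value;
   /// the dictionary keys and the dictionary itself are not stored.
   fn encode_rows(child: &ToyChild) -> Vec<Vec<u8>> {
       let strings: Vec<String> = match child {
           ToyChild::Dict { keys, values } => {
               keys.iter().map(|k| values[*k as usize].clone()).collect()
           }
           ToyChild::Plain(v) => v.clone(),
       };
       strings.into_iter().map(String::into_bytes).collect()
   }

   /// Decoding only sees string bytes, so it can only rebuild a plain Utf8 child.
   fn decode_rows(rows: &[Vec<u8>]) -> ToyChild {
       ToyChild::Plain(
           rows.iter()
               .map(|r| String::from_utf8(r.clone()).expect("valid UTF-8"))
               .collect(),
       )
   }

   /// Rebuild the list, checking the child type against the declared list child type
   /// (analogous to the validation that produces the error in this issue).
   fn build_list(declared_child: &ToyType, child: ToyChild) -> Result<ToyChild, String> {
       let actual = child.data_type();
       if &actual != declared_child {
           return Err(format!(
               "ListArray's child datatype {:?} does not correspond to the List's datatype {:?}",
               actual, declared_child
           ));
       }
       Ok(child)
   }

   /// Round-trips a dictionary-encoded child through the toy row encoding.
   fn round_trip_demo() -> Result<ToyChild, String> {
       let original = ToyChild::Dict {
           keys: vec![0, 1, 2, 0],
           values: vec!["foo".to_string(), "bar".to_string(), "baz".to_string()],
       };
       let declared = original.data_type();
       let rows = encode_rows(&original);
       build_list(&declared, decode_rows(&rows))
   }
   ```

   In this model, `round_trip_demo()` returns an `Err` whose message has the same shape as the one reported above, while a child that was already plain `Utf8` round-trips cleanly. That is consistent with the failure being specific to dictionary-encoded list children.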
   
   The error originates in Arrow, but in DataFusion it can be triggered in the row-based group values implementation ([here](https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/row.rs)), since that code uses Arrow's `RowConverter` and performs round-trip conversions.
   
   There is already an open issue in arrow-rs: https://github.com/apache/arrow-rs/issues/7165
   
   
   ### To Reproduce
   
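   ```rust
   use std::sync::Arc;

   use datafusion::arrow::array::{Array, Int32Array, ListBuilder, StringDictionaryBuilder};
   use datafusion::arrow::datatypes::{DataType, Field, Int8Type, Schema};
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::error::Result;
   use datafusion::prelude::SessionContext;

   #[tokio::test]
   async fn df_list_of_dict_should_panic() -> Result<()> {
       // build List<Dictionary<Int8,Utf8>>
       let mut dict_builder = StringDictionaryBuilder::<Int8Type>::new();
       // Note: these values are appended before any list entry is closed, so they
       // end up in the first list element together with "foo" and "bar" below.
       for s in ["foo", "bar", "baz", "foo"] { dict_builder.append(s)?; }
       let mut list_builder = ListBuilder::new(dict_builder);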
       list_builder.values().append("foo")?; 
       list_builder.values().append("bar")?;
       list_builder.append(true);
       list_builder.values().append("baz")?; 
       list_builder.append(true);
       let list_dict = list_builder.finish();
   
       let schema = Arc::new(Schema::new(vec![
           Field::new("a", DataType::Int32, false),
           Field::new("c", list_dict.data_type().clone(), false),
       ]));
       let batch = RecordBatch::try_new(
           schema.clone(),
           vec![Arc::new(Int32Array::from(vec![1,2])), Arc::new(list_dict)],
       )?;
   
       let ctx = SessionContext::new();
       ctx.register_batch("x", batch)?;
   
       // GROUP BY forces Aggregate (first RowConverter pass)
       // ORDER BY … LIMIT forces TopKExec (second pass)
       let df = ctx.sql(
           r#"
           SELECT c, COUNT(*) AS cnt
           FROM   x
           GROUP  BY c
           ORDER  BY cnt DESC
           LIMIT  10
           "#,
       ).await?;
   
       df.collect().await?;
   
       Ok(())
   }
   ```
   
   ### Expected behavior
   
   The query should complete without an error.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org