duongcongtoai opened a new issue, #10672: URL: https://github.com/apache/datafusion/issues/10672
### Describe the bug Given this query ``` select unnest(struct_c1c0), unnest(list_c2c0) from ( select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as list_c2c0 from temp limit 1 ); ``` where list_c2c0 is the result of an unnest logical plan followed by a limit ``` | logical_plan | Unnest: lists[unnest(list_c2c0)] structs[unnest(struct_c1c0)] | | | Projection: unnest(temp.column1) AS unnest(struct_c1c0), get_field(unnest(column2), Utf8("c0")) AS unnest(list_c2c0) | | | Limit: skip=0, fetch=1 | | | Unnest: lists[unnest(temp.column1), unnest(column2)] structs[] | | | Projection: temp.column1 AS unnest(temp.column1), temp.column2 AS unnest(column2) | | | TableScan: temp projection=[column1, column2] ``` then unnesting on this column "list_c2c0" again will cause the error ``` Arrow error: Invalid argument error: all columns in a record batch must have the same length ``` ### To Reproduce ``` > create table temp as values ([struct('a')], [struct([1,2]),struct([2,3])]); 0 row(s) fetched. Elapsed 0.008 seconds. > select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as list_c2c0 from temp limit 1; +-------------+-----------+ | struct_c1c0 | list_c2c0 | +-------------+-----------+ | {c0: a} | [1, 2] | +-------------+-----------+ 1 row(s) fetched. Elapsed 0.010 seconds. > select unnest(struct_c1c0), unnest(list_c2c0) from ( select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as list_c2c0 from temp limit 1 ); Arrow error: Invalid argument error: all columns in a record batch must have the same length ``` ### Expected behavior Correct result returned ``` > select unnest(struct_c1c0), unnest(list_c2c0) from ( select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as list_c2c0 from temp limit 1 ); +------------------------+-------------------+ | unnest(struct_c1c0).c0 | unnest(list_c2c0) | +------------------------+-------------------+ | a | 1 | | a | 2 | +------------------------+-------------------+ ``` ### Additional context The problem may be somewhere around this line ``` https://github.com/apache/datafusion/blob/34eda15b73a9e278af8844b30ed2f1c21c10359c/datafusion/physical-plan/src/unnest.rs#L431 ``` The context of this function is it's trying to downcast the generic Array into underlying type and use the underlying values as the unnested column values. However this downcast somehow leak all the values which were previously skipped at the Limit operator, thus leak the extra elements [2,3]. If we add these debug lines ``` println!( "dyn array length: {} and data {:?}", list_arrays[0].len(), list_arrays[0] ); println!( "down casted array length {} and data {:?}", typed_arrays[0].values().len(), typed_arrays[0].values() ); ``` The result will be ``` dyn array length: 1 and data ListArray [ PrimitiveArray<Int64> [ 1, 2, ], ] down casted array length 4 and data PrimitiveArray<Int64> [ 1, 2, 2, 3, ] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org