duongcongtoai opened a new issue, #10672:
URL: https://github.com/apache/datafusion/issues/10672

   ### Describe the bug
   
   Given this query
   ```
   select unnest(struct_c1c0), unnest(list_c2c0) from (
               select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as 
list_c2c0 from temp limit 1
    );
   ```
   where list_c2c0 is the result of an unnest logical plan followed by a limit
   ```
   | logical_plan  | Unnest: lists[unnest(list_c2c0)] 
structs[unnest(struct_c1c0)]                                                    
             |
   |               |   Projection: unnest(temp.column1) AS unnest(struct_c1c0), 
get_field(unnest(column2), Utf8("c0")) AS unnest(list_c2c0)        |
   |               |     Limit: skip=0, fetch=1                                 
                                                                   |
   |               |       Unnest: lists[unnest(temp.column1), unnest(column2)] 
structs[]                                                          |
   |               |         Projection: temp.column1 AS unnest(temp.column1), 
temp.column2 AS unnest(column2)                                     |
   |               |           TableScan: temp projection=[column1, column2]   
   ```
   then unnesting on this column "list_c2c0" again will cause the error
   ```
   Arrow error: Invalid argument error: all columns in a record batch must have 
the same length
   ```
   
   
   ### To Reproduce
   
   ```
   > create table temp as values ([struct('a')], [struct([1,2]),struct([2,3])]);
   0 row(s) fetched. 
   Elapsed 0.008 seconds.
   > select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as list_c2c0 
from temp limit 1;
   +-------------+-----------+
   | struct_c1c0 | list_c2c0 |
   +-------------+-----------+
   | {c0: a}     | [1, 2]    |
   +-------------+-----------+
   1 row(s) fetched. 
   Elapsed 0.010 seconds.
   > select unnest(struct_c1c0), unnest(list_c2c0) from (
               select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as 
list_c2c0 from temp limit 1
    );
   Arrow error: Invalid argument error: all columns in a record batch must have 
the same length
   ```
   
   ### Expected behavior
   
   Correct result returned
   ```
   > select unnest(struct_c1c0), unnest(list_c2c0) from (
               select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as 
list_c2c0 from temp limit 1
    );
   +------------------------+-------------------+
   | unnest(struct_c1c0).c0 | unnest(list_c2c0) |
   +------------------------+-------------------+
   | a                      | 1                 |
   | a                      | 2                 |
   +------------------------+-------------------+
   ```
   
   ### Additional context
   
   The problem may be somewhere around this line
   ```
   
https://github.com/apache/datafusion/blob/34eda15b73a9e278af8844b30ed2f1c21c10359c/datafusion/physical-plan/src/unnest.rs#L431
   ```
   The context of this function is it's trying to downcast the generic Array 
into underlying type and use the underlying values as the unnested column 
values.
   However this downcast somehow leak all the values which were previously 
skipped at the Limit operator, thus leak the extra elements [2,3].
   If we add these debug lines
   ```
           println!(
               "dyn array length: {} and data {:?}",
               list_arrays[0].len(),
               list_arrays[0]
           );
           println!(
               "down casted array length {} and data {:?}",
               typed_arrays[0].values().len(),
               typed_arrays[0].values()
           );
   ```
   The result will be
   ```
   dyn array length: 1 and data ListArray
   [
     PrimitiveArray<Int64>
   [
     1,
     2,
   ],
   ]
   down casted array length 4 and data PrimitiveArray<Int64>
   [
     1,
     2,
     2,
     3,
   ]
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to