alamb commented on issue #15162:
URL: https://github.com/apache/datafusion/issues/15162#issuecomment-2715232608

   Thank you for bringing this up, @comphead. I think we have struggled with this issue downstream in DataFusion for a while.
   
   I think the core fix for this issue lies not in how `DataType::List` is constructed, but rather in how it is compared.
   
   As @tustvold points out, the field name is arbitrary and not consistent across Arrow implementations. Plumbing through some way to change it might work, but we would be forever trying to find all the corner cases.
   
   Thus, in my opinion, rather than trying to control the name of the field, a better approach is to change the places where **`DataType::List`s** are compared so that they ignore the field name unless it is important.
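A minimal sketch of that kind of comparison, assuming a deliberately simplified, hypothetical model of Arrow types: the `DataType`, `Field`, and `equals_ignoring_list_field_names` names below are illustrative only and are not the arrow-rs API. Two list types are treated as equal when their element types and nullability match, whatever the child field happens to be called (`"element"` vs `"item"` in the error below):

```rust
// A deliberately simplified, hypothetical model of Arrow types.
// This is NOT the arrow-rs `DataType`/`Field` API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Utf8,
    // A list type carries a child field; the child's name is the part
    // that differs between implementations.
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Hypothetical helper: structural equality that ignores the name of a
/// list's child field, recursing into nested lists.
fn equals_ignoring_list_field_names(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::List(fa), DataType::List(fb)) => {
            // The child name is deliberately not compared.
            fa.nullable == fb.nullable
                && equals_ignoring_list_field_names(&fa.data_type, &fb.data_type)
        }
        // Non-list types (or a list vs. a non-list) use ordinary equality.
        _ => a == b,
    }
}

fn main() {
    // Modeled on the two types in the error message (minus the fields this model omits).
    let expected = DataType::List(Box::new(Field::new("element", DataType::Int8, true)));
    let found = DataType::List(Box::new(Field::new("item", DataType::Int8, true)));

    // Derived (strict) equality sees two different types...
    assert_ne!(expected, found);
    // ...while the relaxed comparison accepts them.
    assert!(equals_ignoring_list_field_names(&expected, &found));

    // A genuinely different element type is still rejected.
    let utf8_list = DataType::List(Box::new(Field::new("item", DataType::Utf8, true)));
    assert!(!equals_ignoring_list_field_names(&expected, &utf8_list));

    // Nested lists are compared recursively.
    let nested_a = DataType::List(Box::new(Field::new("element", expected.clone(), true)));
    let nested_b = DataType::List(Box::new(Field::new("item", found.clone(), true)));
    assert!(equals_ignoring_list_field_names(&nested_a, &nested_b));

    println!("all checks passed");
}
```

Only the child field's name is ignored in this sketch; element type and nullability still have to match, so a list of a different type is still reported as a mismatch.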
   
   For example, the specific error that @comphead posted in this issue is:
   
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 309.0 failed 1 times, most recent failure: Lost task 0.0 in stage 309.0 (TID 797) (Mac-1741305812954.local executor driver):
   org.apache.comet.CometNativeException: Invalid argument error: column types must match schema types,
   expected List(Field { name: "element", data_type: Int8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })
   but found List(Field { name: "item", data_type: Int8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })
   at column index 0
   ```
   
   It seems like that error actually comes from `RecordBatch` construction within arrow-rs:
   
   
https://github.com/apache/arrow-rs/blob/f4fde769ab6e1a9b75f890b7f8b47bc22800830b/arrow-array/src/record_batch.rs#L333
   
   Perhaps we can relax this check, or update `RecordBatch::new()` to align incoming `DataType::List` types to match the schema 🤔
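One possible reading of "align incoming `DataType::List` to match the schema" is sketched below, again on a simplified, hypothetical model rather than the real arrow-rs types. `Column`, `list_of`, `equals_ignoring_list_field_names`, and `align_to_schema` are made-up names, and the error string only imitates the message in the log above. Each column is first checked with a comparison that ignores list child names; only if every column passes are the column types rewritten to use the schema's naming:

```rust
// Simplified, hypothetical model of Arrow types and columns.
// None of these names are the arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Stand-in for an array: only its type matters for this sketch.
#[derive(Debug)]
struct Column {
    data_type: DataType,
}

/// Convenience constructor for a nullable list type with the given child name.
fn list_of(child_name: &str, child: DataType) -> DataType {
    DataType::List(Box::new(Field::new(child_name, child, true)))
}

/// Structural equality that ignores list child field names.
fn equals_ignoring_list_field_names(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::List(fa), DataType::List(fb)) => {
            fa.nullable == fb.nullable
                && equals_ignoring_list_field_names(&fa.data_type, &fb.data_type)
        }
        _ => a == b,
    }
}

/// Check each column against its schema field, ignoring list child names,
/// and on success rewrite the column types to use the schema's names.
fn align_to_schema(columns: &mut [Column], schema: &[Field]) -> Result<(), String> {
    if columns.len() != schema.len() {
        return Err(format!(
            "number of columns ({}) must match number of fields ({})",
            columns.len(),
            schema.len()
        ));
    }
    // First pass: validate everything, so a failure leaves `columns` untouched.
    for (i, (col, field)) in columns.iter().zip(schema).enumerate() {
        if !equals_ignoring_list_field_names(&col.data_type, &field.data_type) {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                field.data_type, col.data_type, i
            ));
        }
    }
    // Second pass: adopt the schema's list child names.
    for (col, field) in columns.iter_mut().zip(schema) {
        col.data_type = field.data_type.clone();
    }
    Ok(())
}

fn main() {
    let schema = vec![Field::new("a", list_of("element", DataType::Int8), true)];

    // Same list type, different child name: accepted and renamed.
    let mut columns = vec![Column { data_type: list_of("item", DataType::Int8) }];
    align_to_schema(&mut columns, &schema).expect("types differ only by child name");
    assert_eq!(columns[0].data_type, schema[0].data_type);

    // A genuinely different type is still an error.
    let mut bad = vec![Column { data_type: DataType::Int8 }];
    let err = align_to_schema(&mut bad, &schema).unwrap_err();
    println!("{err}");
}
```

Validating every column before rewriting any of them keeps the input unchanged when the check fails, which is how a fallible constructor would usually be expected to behave.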


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

