alamb commented on issue #15162: URL: https://github.com/apache/datafusion/issues/15162#issuecomment-2715232608
Thank you for bringing this up @comphead -- I think we have struggled with this issue for a while downstream in DataFusion.

I think the core fix for this issue lies not in how `DataType::List` is constructed, but in how it is compared.

As @tustvold points out, the field name is arbitrary and not consistent across arrow implementations. Plumbing some way to change it around might work, but we'll be forever trying to find all the corner cases.

Thus, in my opinion, rather than trying to control the name of the field, a better approach is to change the places where **`DataType::List`s** are compared so that they ignore the field name unless it is important.

For example, the specific error that @comphead posted in this issue is

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 309.0 failed 1 times, most recent failure: Lost task 0.0 in stage 309.0 (TID 797) (Mac-1741305812954.local executor driver): org.apache.comet.CometNativeException: Invalid argument error: column types must match schema types, expected List(Field { name: "element", data_type: Int8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: "item", data_type: Int8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 0
```

It seems that this error actually comes from RecordBatch construction within arrow-rs:

https://github.com/apache/arrow-rs/blob/f4fde769ab6e1a9b75f890b7f8b47bc22800830b/arrow-array/src/record_batch.rs#L333

Perhaps we can relax this check, or update `RecordBatch::new()` to align incoming `DataType::List` columns with the schema 🤔

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
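As a rough illustration of the comparison-based approach suggested in the comment, here is a minimal sketch of a type check that ignores the name of a list's element field. It deliberately does not use the real arrow-rs types: `DataType` and `Field` below are simplified stand-ins with only the fields needed for the example, and the helper `types_match_ignoring_list_field_name` is a hypothetical name, not an existing arrow-rs or DataFusion API.

```rust
// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Int32,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Hypothetical helper: compares two types structurally, ignoring the
/// name of list element fields but still checking nullability and the
/// element type (recursively, so nested lists are handled too).
fn types_match_ignoring_list_field_name(a: &DataType, b: &DataType) -> bool {
    match (a, b) {
        (DataType::List(fa), DataType::List(fb)) => {
            fa.nullable == fb.nullable
                && types_match_ignoring_list_field_name(&fa.data_type, &fb.data_type)
        }
        // Non-list types (and list vs. non-list) fall back to plain equality.
        _ => a == b,
    }
}

fn main() {
    // The two types from the error message: same element type, different field name.
    let expected = DataType::List(Box::new(Field::new("element", DataType::Int8, true)));
    let found = DataType::List(Box::new(Field::new("item", DataType::Int8, true)));

    // Strict equality rejects them because the field names differ.
    assert_ne!(expected, found);
    // The relaxed comparison accepts them.
    assert!(types_match_ignoring_list_field_name(&expected, &found));

    // A genuinely different element type is still rejected.
    let other = DataType::List(Box::new(Field::new("item", DataType::Int32, true)));
    assert!(!types_match_ignoring_list_field_name(&expected, &other));
}
```

Whether nullability and metadata should also be relaxed is a separate design question; this sketch keeps the nullability check and relaxes only the field name.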
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org