xudong963 commented on issue #1064: URL: https://github.com/apache/arrow-datafusion/issues/1064#issuecomment-937813055
The bug is located at https://github.com/apache/arrow-datafusion/blob/4687899957463ce81c4795a6d35d31320db0252b/datafusion/src/physical_plan/planner.rs#L836

`input_dfschema` comes from the logical input schema, so the column index (`idx`) is computed against the logical input schema. That index is then wrapped in a physical expr and used at https://github.com/apache/arrow-datafusion/blob/4687899957463ce81c4795a6d35d31320db0252b/datafusion/src/physical_plan/type_coercion.rs#L56

Pay attention to the `schema` there, which comes from the physical input schema. So the bug appears whenever the logical input schema and the physical input schema have a different number of fields.

The most direct fix I can think of is to get the column index from the physical input schema instead: `let idx = input_schema.index_of(c.name.as_str())?;`. But sometimes the column name, the logical input schema field name, and the physical input schema field name are not the same, as in the following case:

```sql
select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1995-01-01'
    and l_discount between 0.06 - 0.01 and 0.06 + 0.01
    and l_quantity < 24;
```

```rust
[datafusion/src/physical_plan/planner.rs:836] c = Column {
    relation: None,
    name: "SUM(lineitem.l_extendedprice * lineitem.l_discount)",
}
[datafusion/src/physical_plan/planner.rs:837] input_dfschema = DFSchema {
    fields: [
        DFField {
            qualifier: None,
            field: Field {
                name: "SUM(lineitem.l_extendedprice * lineitem.l_discount)",
                data_type: Float64,
                nullable: true,
                dict_id: 0,
                dict_is_ordered: false,
                metadata: None,
            },
        },
    ],
}
[datafusion/src/physical_plan/planner.rs:838] input_schema = Schema {
    fields: [
        Field {
            name: "SUM(lineitem.l_extendedprice Multiply lineitem.l_discount)",
            data_type: Float64,
            nullable: true,
            dict_id: 0,
            dict_is_ordered: false,
            metadata: None,
        },
    ],
    metadata: {},
}
```

Here the column and the logical field are named with `*`, while the physical field is named with `Multiply`, so looking the column up by name in the physical schema would not find it.

Please give me some suggestions about this situation, thanks! @alamb @Dandandan @houqp
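To make the mismatch described in the comment concrete, here is a minimal, self-contained sketch. It does not use DataFusion or arrow: `SimpleSchema` and `resolve_then_read` are hypothetical stand-ins for the logical/physical schemas and for the "compute an index in one schema, use it in another" pattern, and the field names `a`, `b`, `c` are made up for illustration.

```rust
/// Hypothetical stand-in for a schema: an ordered list of field names.
/// (The real code uses `DFSchema` for the logical side and arrow's
/// `Schema` for the physical side.)
struct SimpleSchema {
    fields: Vec<String>,
}

impl SimpleSchema {
    fn new(names: &[&str]) -> Self {
        SimpleSchema {
            fields: names.iter().map(|n| n.to_string()).collect(),
        }
    }

    /// Position of the field with exactly this name, or an error if absent.
    fn index_of(&self, name: &str) -> Result<usize, String> {
        self.fields
            .iter()
            .position(|f| f == name)
            .ok_or_else(|| format!("no field named {:?}", name))
    }
}

/// Resolve `name` to an index in `resolve_in`, then read the field at that
/// index from `read_from`. Returns `Ok(None)` if the index is out of range.
fn resolve_then_read<'a>(
    resolve_in: &SimpleSchema,
    read_from: &'a SimpleSchema,
    name: &str,
) -> Result<Option<&'a str>, String> {
    let idx = resolve_in.index_of(name)?;
    Ok(read_from.fields.get(idx).map(|s| s.as_str()))
}
```

With a logical schema `["a", "b", "c"]` and a physical schema `["b", "c"]`, resolving `"b"` against the logical schema and reading from the physical one yields `"c"` (the wrong field), and resolving `"c"` yields an out-of-range index. Resolving against the physical schema itself avoids this, but only when the names match exactly: calling `index_of` on a schema whose only field is `"SUM(lineitem.l_extendedprice Multiply lineitem.l_discount)"` with the `*` spelling returns an error.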