xudong963 commented on issue #1064:
URL: 
https://github.com/apache/arrow-datafusion/issues/1064#issuecomment-937813055


   The bug is located at 
https://github.com/apache/arrow-datafusion/blob/4687899957463ce81c4795a6d35d31320db0252b/datafusion/src/physical_plan/planner.rs#L836
   
   `input_dfschema` comes from the logical input schema, so the column index 
(`idx`) is resolved against the logical input schema.
   
   That `idx` is wrapped in the physical expr and is later used in 
https://github.com/apache/arrow-datafusion/blob/4687899957463ce81c4795a6d35d31320db0252b/datafusion/src/physical_plan/type_coercion.rs#L56
   
   Note that the `schema` there comes from the physical input schema. So 
when the logical input schema and the physical input schema have a different 
number of fields, `idx` can point at the wrong field or past the end of the 
physical schema, and the bug appears.
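
To make the mismatch concrete, here is a minimal sketch. It does not use DataFusion's types: each schema is modelled as a plain list of field names, and `position_of` and `resolve_via_logical` are hypothetical helpers invented for this illustration. The field names `a`, `b`, `c` in the comments are made up as well.

```rust
/// Hypothetical helper: position of `name` in a schema modelled as a list
/// of field names (a stand-in for a real schema lookup).
fn position_of(schema: &[&str], name: &str) -> Option<usize> {
    schema.iter().position(|field| *field == name)
}

/// Resolve `name` against the logical schema, then read that index from the
/// physical schema. This mirrors the pattern described above: the index is
/// computed from one schema and consumed against another.
fn resolve_via_logical<'a>(
    logical: &[&str],
    physical: &[&'a str],
    name: &str,
) -> Option<&'a str> {
    let idx = position_of(logical, name)?;
    // If the two schemas differ in size, `idx` may be out of range here,
    // or may silently select a different field.
    physical.get(idx).copied()
}

// Example (hypothetical schemas):
//   logical  = ["a", "b", "c"]
//   physical = ["a", "c"]
//   resolve_via_logical(&logical, &physical, "c") is None (index 2 is out of range)
//   resolve_via_logical(&logical, &physical, "b") is Some("c") (wrong field)
```

The second case is the more dangerous one, since nothing fails: the lookup simply returns a different field than the one that was asked for.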
   
   The most direct fix I can think of is to look up the column index in the 
physical input schema instead: 
`let idx = input_schema.index_of(c.name.as_str())?;`. But the column name, the 
logical input schema field name, and the physical input schema field name are 
not always the same, as in the following case:
   ```sql
   select
       sum(l_extendedprice * l_discount) as revenue
   from
       lineitem
   where
           l_shipdate >= date '1994-01-01'
     and l_shipdate < date '1995-01-01'
     and l_discount between 0.06 - 0.01 and 0.06 + 0.01
     and l_quantity < 24;
   ```
   ```rust
   [datafusion/src/physical_plan/planner.rs:836] c = Column {
       relation: None,
       name: "SUM(lineitem.l_extendedprice * lineitem.l_discount)",
   }
   [datafusion/src/physical_plan/planner.rs:837] input_dfschema = DFSchema {
       fields: [
           DFField {
               qualifier: None,
               field: Field {
                   name: "SUM(lineitem.l_extendedprice * lineitem.l_discount)",
                   data_type: Float64,
                   nullable: true,
                   dict_id: 0,
                   dict_is_ordered: false,
                   metadata: None,
               },
           },
       ],
   }
   [datafusion/src/physical_plan/planner.rs:838] input_schema = Schema {
       fields: [
           Field {
               name: "SUM(lineitem.l_extendedprice Multiply lineitem.l_discount)",
               data_type: Float64,
               nullable: true,
               dict_id: 0,
               dict_is_ordered: false,
               metadata: None,
           },
       ],
       metadata: {},
   }
   ```
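
The dump above shows why a name-based lookup against the physical schema would not work here. As a minimal sketch (again using plain string lists rather than DataFusion's schema types; `index_of_name` is a hypothetical stand-in for a name lookup such as `index_of`), the three names from the dump behave like this:

```rust
/// Hypothetical stand-in for a by-name schema lookup: returns the position
/// of `name`, or an error if no field has exactly that name.
fn index_of_name(fields: &[&str], name: &str) -> Result<usize, String> {
    fields
        .iter()
        .position(|field| *field == name)
        .ok_or_else(|| format!("no field named {:?}", name))
}

// Names copied from the debug output above.
const COLUMN_NAME: &str = "SUM(lineitem.l_extendedprice * lineitem.l_discount)";
const LOGICAL_FIELDS: [&str; 1] =
    ["SUM(lineitem.l_extendedprice * lineitem.l_discount)"];
const PHYSICAL_FIELDS: [&str; 1] =
    ["SUM(lineitem.l_extendedprice Multiply lineitem.l_discount)"];

// index_of_name(&LOGICAL_FIELDS, COLUMN_NAME) succeeds with index 0, while
// index_of_name(&PHYSICAL_FIELDS, COLUMN_NAME) is an error, because the
// physical field name spells the operator as "Multiply" instead of "*".
```

So in this query the column name matches the logical field name exactly, but an exact-match lookup in the physical schema fails even though both schemas have a single field at the same position.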
   Any suggestions on how to handle this situation would be appreciated, 
thanks! @alamb @Dandandan @houqp 


