liukun4515 opened a new issue, #3583:
URL: https://github.com/apache/arrow-datafusion/issues/3583

   **Describe the bug**
   
   in this https://github.com/apache/arrow-datafusion/pull/3396 pr, I remove 
the type coercion in the physical phase.
   
   We need do the type coercion in the logical phase as much as early.
   
   But after do the type coercion, there is some bugs for this.
   
   For example: test case `tpch_explain_q10`
   
   the `ProjectionPushDown` will not work for this test case.
   
   I add some log to debug this, find the cause reason is the type and column 
name for this.
   
   ```
               // aggregate:
               // * remove any aggregate expression that is not required
               // * construct the new set of required columns
   
               // Find distinct group by exprs in the case where we have a 
grouping set
               let all_group_expr: Vec<Expr> = 
grouping_set_to_exprlist(group_expr)?;
   
               exprlist_to_columns(&all_group_expr, &mut new_required_columns)?;
   
               warn!("the input plan schema  {:?}", plan.schema().fields());
               // Gather all columns needed for expressions in this Aggregate
               let mut new_aggr_expr = Vec::new();
               aggr_expr.iter().try_for_each(|expr| {
                   let name = &expr.name()?;
                   let column = Column::from_name(name);
                   warn!("debug iter agg expr: {} \n {} \n {} \n", expr, name, 
column);
                   if required_columns.contains(&column) {
                       new_aggr_expr.push(expr.clone());
                       new_required_columns.insert(column);
   
                       // add to the new set of required columns
                       expr_to_columns(expr, &mut new_required_columns)
                   } else {
                       Ok(())
                   }
               })?;
               warn!(
                   "debug optimizer exprs: {:?} due to unexpected colums: {:?}, 
the new agg_expr: {:?}",
                   aggr_expr, required_columns, new_aggr_expr
               );
   ```
   
   run the test
   
   ```
   RUST_LOG=trace cargo test --package datafusion --test sql_integration 
sql::explain_analyze::tpch_explain_q10 -- --exact
   ```
   
   Get the warn long
   
   ```
   [2022-09-22T02:27:25Z WARN  datafusion_optimizer::projection_push_down] the 
input plan schema  [DFField { qualifier: Some("customer"), field: Field { name: 
"c_custkey", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: 
false
   , metadata: None } }, DFField { qualifier: Some("customer"), field: Field { 
name: "c_name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: 
false, metadata: None } }, DFField { qualifier: Some("customer"), field: Field 
{ na
   me: "c_acctbal", data_type: Decimal128(15, 2), nullable: false, dict_id: 0, 
dict_is_ordered: false, metadata: None } }, DFField { qualifier: 
Some("customer"), field: Field { name: "c_phone", data_type: Utf8, nullable: 
false, dict_id: 0, d
   ict_is_ordered: false, metadata: None } }, DFField { qualifier: 
Some("nation"), field: Field { name: "n_name", data_type: Utf8, nullable: 
false, dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { 
qualifier: Some("customer")
   , field: Field { name: "c_address", data_type: Utf8, nullable: false, 
dict_id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: 
Some("customer"), field: Field { name: "c_comment", data_type: Utf8, nullable: 
false, dict_
   id: 0, dict_is_ordered: false, metadata: None } }, DFField { qualifier: 
None, field: Field { name: "SUM(lineitem.l_extendedprice * Int64(1) - 
lineitem.l_discount)", data_type: Decimal128(38, 4), nullable: true, dict_id: 
0, dict_is_ordered
   : false, metadata: None } }]
   [2022-09-22T02:27:25Z WARN  datafusion_optimizer::projection_push_down] 
debug iter agg expr: SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * 
CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 
2)) AS D
   ecimal128(38, 4)))
        SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - 
lineitem.l_discount)
        #SUM(lineitem.l_extendedprice * Decimal128(Some(100),23,2) - 
lineitem.l_discount)
   
   [2022-09-22T02:27:25Z WARN  datafusion_optimizer::projection_push_down] 
debug optimizer exprs: [SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 
4)) * CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS 
Decimal128(23, 2)) A
   S Decimal128(38, 4)))] due to unexpected colums: {Column { relation: 
Some("nation"), name: "n_name" }, Column { relation: Some("customer"), name: 
"c_name" }, Column { relation: None, name: "SUM(lineitem.l_extendedprice * 
Int64(1) - lineit
   em.l_discount)" }, Column { relation: None, name: "revenue" }, Column { 
relation: Some("customer"), name: "c_custkey" }, Column { relation: 
Some("customer"), name: "c_address" }, Column { relation: Some("customer"), 
name: "c_comment" }, C
   olumn { relation: Some("customer"), name: "c_acctbal" }, Column { relation: 
Some("customer"), name: "c_phone" }}, the new agg_expr: []
   ```
   
   From the log, we can see
   
   In the schema, a field is `Field { name: "SUM(lineitem.l_extendedprice * 
Int64(1) - lineitem.l_discount)"`
   
   Bu the expr is `SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * 
CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 
2)) AS D
   ecimal128(38, 4)))` and the column is `#SUM(lineitem.l_extendedprice * 
Decimal128(Some(100),23,2) - lineitem.l_discount)`
   
   The difference is `Int64(1)` and `Decimal128(Some(100),23,2`, the decimal128 
is got from the type coercion.
   
   
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   **Expected behavior**
   A clear and concise description of what you expected to happen.
   
   **Additional context**
   Add any other context about the problem here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to