linhongliu-db opened a new pull request #30053:
URL: https://github.com/apache/spark/pull/30053


   ### What changes were proposed in this pull request?
   This PR fixes a conflict between `RewriteDistinctAggregates` and `DecimalAggregates`.
   In some cases, `DecimalAggregates` wraps a decimal column in `UnscaledValue`, using a different
   rule for each aggregate function.
   
   This means that the same distinct column, when used by different aggregates, can become
   different distinct columns after `DecimalAggregates`. For example,
   `avg(distinct decimal_col), sum(distinct decimal_col)` may change to
   `avg(distinct UnscaledValue(decimal_col)), sum(distinct decimal_col)`.
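To make the example concrete, here is a toy Python model (not Spark code) of the per-function check. It assumes the thresholds that `DecimalAggregates` uses in Spark 3.0: `sum` is rewritten when precision + 10 <= 18, and `avg` when precision + 4 <= 15. `Agg`, `decimal_aggregates`, and `distinct_children` are hypothetical names. For a column with precision 10, only `avg` passes its check, so the two distinct aggregates end up with different children.

```python
from dataclasses import dataclass

# Assumed thresholds, mirroring Spark's DecimalAggregates rule:
# sum is rewritten when precision + 10 <= 18 (Decimal.MAX_LONG_DIGITS),
# avg is rewritten when precision + 4 <= 15 (digits exact in a double).
MAX_LONG_DIGITS = 18
MAX_DOUBLE_DIGITS = 15


@dataclass(frozen=True)
class Agg:
    func: str       # "sum" or "avg"
    child: str      # textual form of the aggregated expression
    distinct: bool  # True for e.g. sum(distinct ...)


def decimal_aggregates(agg: Agg, precision: int) -> Agg:
    """Toy DecimalAggregates: wrap the child in UnscaledValue only when
    the check for this particular aggregate function passes."""
    if agg.func == "sum" and precision + 10 <= MAX_LONG_DIGITS:
        return Agg(agg.func, f"UnscaledValue({agg.child})", agg.distinct)
    if agg.func == "avg" and precision + 4 <= MAX_DOUBLE_DIGITS:
        return Agg(agg.func, f"UnscaledValue({agg.child})", agg.distinct)
    return agg


def distinct_children(aggs):
    """The set of distinct-aggregate children, i.e. the distinct groups."""
    return {a.child for a in aggs if a.distinct}


# decimal_col is assumed to have precision 10: 10 + 4 <= 15 but 10 + 10 > 18.
aggs = [Agg("avg", "decimal_col", True), Agg("sum", "decimal_col", True)]
rewritten = [decimal_aggregates(a, precision=10) for a in aggs]
print(sorted(distinct_children(aggs)))       # ['decimal_col']
print(sorted(distinct_children(rewritten)))  # ['UnscaledValue(decimal_col)', 'decimal_col']
```

One distinct column goes in and two come out, which is exactly the situation described above.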
   
   Spark assumes that after `RewriteDistinctAggregates`, aggregates contain at most one distinct column,
   but `DecimalAggregates` breaks this assumption. To fix this, this PR switches the order of these
   two rules.
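The effect of the rule order can be sketched with a similar toy Python model. It is a simplification under stated assumptions: the column precision is assumed to be 10, `decimal_aggregates` reuses the assumed Spark 3.0 thresholds, and `rewrite_distinct_aggregates` is a hypothetical stand-in that leaves a single distinct group untouched and otherwise just clears the distinct flags (the real rule rewrites the plan using an `Expand`). It also assumes the fix means `DecimalAggregates` now runs first, which is the order implied by the assumption described above.

```python
# Toy model (not Spark code): an aggregate is (func, child, distinct).
MAX_LONG_DIGITS = 18    # assumed sum threshold: precision + 10 <= 18
MAX_DOUBLE_DIGITS = 15  # assumed avg threshold: precision + 4 <= 15
PRECISION = 10          # assumed precision of decimal_col


def decimal_aggregates(aggs):
    """Wrap the child in UnscaledValue when the per-function check passes."""
    out = []
    for func, child, distinct in aggs:
        wrap = (func == "sum" and PRECISION + 10 <= MAX_LONG_DIGITS) or (
            func == "avg" and PRECISION + 4 <= MAX_DOUBLE_DIGITS
        )
        out.append((func, f"UnscaledValue({child})" if wrap else child, distinct))
    return out


def rewrite_distinct_aggregates(aggs):
    """Hypothetical stand-in: a single distinct group is left for the planner;
    with several groups, the distinct flags are cleared to model the rewrite."""
    groups = {child for _, child, distinct in aggs if distinct}
    if len(groups) <= 1:
        return aggs
    return [(func, child, False) for func, child, _ in aggs]


def distinct_groups(aggs):
    """Number of distinct columns the planner would still have to handle."""
    return len({child for _, child, distinct in aggs if distinct})


aggs = [("avg", "decimal_col", True), ("sum", "decimal_col", True)]
old_order = decimal_aggregates(rewrite_distinct_aggregates(aggs))
new_order = rewrite_distinct_aggregates(decimal_aggregates(aggs))
print(distinct_groups(old_order))  # 2: breaks the "at most one" assumption
print(distinct_groups(new_order))  # 0: both groups were rewritten
```

In the old order, `RewriteDistinctAggregates` sees a single distinct group and does nothing, and only afterwards does `DecimalAggregates` split it in two. Running `DecimalAggregates` first means the distinct rewrite sees the final set of distinct columns.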
   
   
   ### Why are the changes needed?
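   Bug fix: `DecimalAggregates` can break the assumption that at most one distinct column remains after `RewriteDistinctAggregates`.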
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Added test cases.
   
   Authored-by: Linhong Liu <linhong....@databricks.com>
   Signed-off-by: Wenchen Fan <wenc...@databricks.com>
   (cherry picked from commit 40ef5c91ade906b38169f959b3991ce8b0f45154)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.