Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19488#discussion_r145017429
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
 ---
    @@ -205,14 +205,17 @@ object PhysicalAggregation {
         case logical.Aggregate(groupingExpressions, resultExpressions, child) 
=>
           // A single aggregate expression might appear multiple times in 
resultExpressions.
           // In order to avoid evaluating an individual aggregate function 
multiple times, we'll
    -      // build a set of the distinct aggregate expressions and build a 
function which can
    +      // build a map of the distinct aggregate expressions and build a 
function which can
           // be used to re-write expressions so that they reference the single 
copy of the
    -      // aggregate function which actually gets computed.
    -      val aggregateExpressions = resultExpressions.flatMap { expr =>
    +      // aggregate function which actually gets computed. Note that 
aggregate expressions
    +      // should always be deterministic, so we can use its canonicalized 
expression as its
    --- End diff --
    
    That's not true, see Postgres:
    ```
    cloud=# select random(), random();
          random       |      random      
    -------------------+------------------
     0.360924747306854 | 0.90011026058346
    (1 row)
    
    cloud=# select random() - random();
          ?column?       
    ---------------------
     -0.0260558221489191
    (1 row)
    ```
    
    Seems we need a way to distinguish FIRST_VALUE and RANDOM. Both of them are 
nondeterministic, but `FIRST_VALUE` has consistent values in the same 
project/aggregate.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to