Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144175468
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
 ---
    @@ -46,14 +47,20 @@ object ExpressionSet {
      *   set.contains(1 + a) => true
      *   set.contains(a + 2) => false
      * }}}
    + *
    + * For non-deterministic expressions, they are always considered as not 
contained in the [[Set]].
    + * On adding a non-deterministic expression, simply append it to the 
original expressions.
    + * This is consistent with how we define `semanticEquals` between two 
expressions.
      */
     class ExpressionSet protected(
         protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
         protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
       extends Set[Expression] {
     
       protected def add(e: Expression): Unit = {
    -    if (!baseSet.contains(e.canonicalized)) {
    +    if (!e.deterministic) {
    +      originals += e
    +    } else if (!baseSet.contains(e.canonicalized) ) {
    --- End diff --
    
    Is it ok to just to replace `baseSet.contains(e.canonicalized)` ?:
    ```
        if (!baseSet.exists(_.semanticEquals(e))) {
          baseSet.add(e.canonicalized)
          originals += e
        }
    ```
    I feel better to put equality checks in one place.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to