Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144182958
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala
 ---
    @@ -46,14 +47,20 @@ object ExpressionSet {
      *   set.contains(1 + a) => true
      *   set.contains(a + 2) => false
      * }}}
    + *
    + * For non-deterministic expressions, they are always considered as not 
contained in the [[Set]].
    + * On adding a non-deterministic expression, simply append it to the 
original expressions.
    + * This is consistent with how we define `semanticEquals` between two 
expressions.
      */
     class ExpressionSet protected(
         protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
         protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
       extends Set[Expression] {
     
       protected def add(e: Expression): Unit = {
    -    if (!baseSet.contains(e.canonicalized)) {
    +    if (!e.deterministic) {
    +      originals += e
    +    } else if (!baseSet.contains(e.canonicalized) ) {
    --- End diff --
    
    Yeah I have thought about this.
    But the time complexity is O(n) for adding an expression.
    It is a trade off, I prefer to my current implementation, the time 
complexity is O(1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to