Github user heary-cao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18969#discussion_r136531688
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -24,6 +24,24 @@ import org.apache.spark.sql.catalyst.plans._
     import org.apache.spark.sql.catalyst.plans.logical._
     
     /**
    + * A pattern that matches a project on top of another relational operator,
    + * when all of its fields are deterministic or its child is a LeafNode.
    + */
    +object ProjectOperation extends PredicateHelper {
    +  type ReturnType = (Seq[NamedExpression], LogicalPlan)
    +
    +  def unapply(plan: LogicalPlan): Option[ReturnType] = plan match {
    +    case Project(fields, child) if fields.forall(_.deterministic) =>
    +      Some((fields, child))
    +
    +    case Project(fields, child: LeafNode) if !fields.forall(_.deterministic) =>
    --- End diff --
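    (For context, a purely illustrative sketch of how an extractor with this shape is consumed; IllustrativeProjectRule and its no-op body are assumptions for illustration, not part of this PR:)

    import org.apache.spark.sql.catalyst.planning.ProjectOperation
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    // Illustrative only: a rule that matches plans through the ProjectOperation extractor.
    object IllustrativeProjectRule extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        // fields: Seq[NamedExpression], child: LogicalPlan, as returned by unapply above.
        case p @ ProjectOperation(fields, child) =>
          // A real rule would rewrite the plan here; this sketch leaves it unchanged.
          p
      }
    }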
    
    This might be a rare case, but there are still many business scenarios that rely on it. Similar business scenarios:
    1. Random grouping: add a random factor to each row of data before grouping.
    2. Filling a field with a random value, either to simplify a downstream algorithm's calculation or to prevent query anomalies.
    3. Data skew: the data are discretized (salted) with random values; see the rough sketch below.
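
    A rough, minimal sketch of scenario 3, assuming a local SparkSession and made-up column names (key, value, salt) that are not from this PR:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, floor, rand, sum}

    object SaltedAggregationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("salting-sketch").master("local[*]").getOrCreate()
        import spark.implicits._

        // A deliberately skewed key distribution: almost every row shares one key.
        val df = (Seq.fill(1000)(("hot_key", 1L)) :+ (("cold_key", 1L))).toDF("key", "value")

        // Non-deterministic projection directly over the source relation:
        // each row gets a random salt in [0, 10).
        val salted = df.withColumn("salt", floor(rand() * 10))

        // Aggregate on (key, salt) first, then combine the partial results per key.
        val partial = salted.groupBy(col("key"), col("salt")).agg(sum("value").as("partial_sum"))
        val result  = partial.groupBy(col("key")).agg(sum("partial_sum").as("total"))

        result.show()
        spark.stop()
      }
    }

    The projection that adds the salt column is non-deterministic and sits directly over the source relation, which is exactly the shape discussed above.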
    Thanks.


