Github user heary-cao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18969#discussion_r136531688

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
    @@ -24,6 +24,24 @@ import org.apache.spark.sql.catalyst.plans._
     import org.apache.spark.sql.catalyst.plans.logical._
     
     /**
    + * A pattern that matches any number of projects whose fields are all
    + * deterministic, or a project with non-deterministic fields whose child
    + * is a LeafNode, on top of another relational operator.
    + */
    +object ProjectOperation extends PredicateHelper {
    +  type ReturnType = (Seq[NamedExpression], LogicalPlan)
    +
    +  def unapply(plan: LogicalPlan): Option[ReturnType] = plan match {
    +    case Project(fields, child) if fields.forall(_.deterministic) =>
    +      Some((fields, child))
    +
    +    case Project(fields, child: LeafNode) if !fields.forall(_.deterministic) =>
    --- End diff --
    
    This might be a rare case, but there are still many real business scenarios that rely on it. Similar business scenarios include:
    
    1. Random grouping: add a random factor to each row of data before grouping.
    2. Filling a field with a random value, either to simplify a downstream algorithm's calculation or to prevent anomalies when querying the data.
    3. Data skew: discretize (salt) the data using random values.
    
    Thanks.
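For context, the data-skew scenario (3) is the kind of workload where a non-deterministic projection appears directly over a leaf relation. A minimal sketch of the salting technique follows; it assumes an active `SparkSession` and an input `DataFrame` named `df` with columns `key` and `value` (these names are illustrative, not taken from the PR):

    ```scala
    import org.apache.spark.sql.functions.{col, floor, rand, sum}

    // Sketch only: salt a skewed grouping key with a random factor.
    // The withColumn call below is exactly a Project with a
    // non-deterministic field, the case this pattern matches.
    val salted = df
      .withColumn("salt", floor(rand() * 10))   // non-deterministic projection
      .groupBy(col("key"), col("salt"))         // pre-aggregate per salted key
      .agg(sum("value").as("partial"))
      .groupBy(col("key"))                      // final aggregate per real key
      .agg(sum("partial").as("total"))
    ```

The two-stage aggregation spreads a hot key across up to 10 partitions before combining the partial sums, which is why optimizer support for non-deterministic projections over leaf nodes is useful in practice.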