[ https://issues.apache.org/jira/browse/SPARK-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai resolved SPARK-9082.
-----------------------------
    Resolution: Fixed
 Fix Version/s: 1.5.0

Issue resolved by pull request 7446
[https://github.com/apache/spark/pull/7446]

> Filter using non-deterministic expressions should not be pushed down
> --------------------------------------------------------------------
>
>                 Key: SPARK-9082
>                 URL: https://issues.apache.org/jira/browse/SPARK-9082
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Wenchen Fan
>             Fix For: 1.5.0
>
>
> For example,
> {code}
> val df = sqlContext.range(1, 10).select($"id", rand(0).as('r))
> df.as("a").join(df.filter($"r" < 0.5).as("b"), $"a.id" === $"b.id").explain(true)
> {code}
> The plan is
> {code}
> == Physical Plan ==
> ShuffledHashJoin [id#55323L], [id#55327L], BuildRight
>  Exchange (HashPartitioning 200)
>   Project [id#55323L,Rand 0 AS r#55324]
>    PhysicalRDD [id#55323L], MapPartitionsRDD[42268] at range at <console>:37
>  Exchange (HashPartitioning 200)
>   Project [id#55327L,Rand 0 AS r#55325]
>    Filter (LessThan)
>     PhysicalRDD [id#55327L], MapPartitionsRDD[42268] at range at <console>:37
> {code}
> The rand gets evaluated twice instead of once.
> This happens because, when we push down predicates, we replace the attribute
> reference in the predicate with the actual expression.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
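As a self-contained sketch of the bug (plain Scala, no Spark; a seeded `Random` stands in for `rand(0)`, and the method names `withoutPushdown`/`withPushdown` are hypothetical, not Spark APIs), substituting the expression for the attribute reference makes the non-deterministic value get drawn once in the Filter and again in the Project:

```scala
import scala.util.Random

object NonDeterministicPushdown {
  // Rows are plain ids, mirroring sqlContext.range(1, 10).
  val ids: List[Int] = (1 until 10).toList

  // Correct plan: Project computes r once per row, and Filter reuses the
  // already-computed attribute r. Every surviving row satisfies r < 0.5.
  def withoutPushdown(seed: Long): List[(Int, Double)] = {
    val rng = new Random(seed)
    ids.map(id => (id, rng.nextDouble())).filter { case (_, r) => r < 0.5 }
  }

  // Broken plan after pushdown: the attribute reference r in the predicate
  // was replaced by the underlying non-deterministic expression, so Filter
  // draws one value and Project draws another -- two evaluations per path.
  def withPushdown(seed: Long): List[(Int, Double)] = {
    val rng = new Random(seed)
    ids.filter(_ => rng.nextDouble() < 0.5) // first evaluation, in Filter
       .map(id => (id, rng.nextDouble()))   // second evaluation, in Project
  }

  def main(args: Array[String]): Unit = {
    println(withoutPushdown(0)) // every r in the output is < 0.5
    println(withPushdown(0))    // r values need not satisfy the predicate
  }
}
```

The sketch shows why the fix keeps filters over non-deterministic expressions above the node that produces them: only then is the expression evaluated exactly once per row.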