[ https://issues.apache.org/jira/browse/SPARK-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-10740: ------------------------------------ Assignee: (was: Apache Spark) > handle nondeterministic expressions correctly for set operations > ---------------------------------------------------------------- > > Key: SPARK-10740 > URL: https://issues.apache.org/jira/browse/SPARK-10740 > Project: Spark > Issue Type: Bug > Components: SQL > Reporter: Wenchen Fan > > We should only push down deterministic filter condition to set operator. > For Union, let's say we do a non-deterministic filter on 1...5 union 1...5, > and we may get 1,3 for the left side and 2,4 for the right side, then the > result should be 1,3,2,4. If we push down this filter, we get 1,3 for both > side(we create a new random object with same seed in each side) and the > result would be 1,3,1,3. > For Intersect, let's say there is a non-deterministic condition with a 0.5 > possibility to accept a row and we have a row that presents in both sides of > an Intersect. Once we push down this condition, the possibility to accept > this row will be 0.25. > For Except, let's say there is a row that presents in both sides of an > Except. This row should not be in the final output. However, if we pushdown a > non-deterministic condition, it is possible that this row is rejected from > one side and then we output a row that should not be a part of the result. > We should only push down deterministic projection to Union. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org