[ https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465008#comment-16465008 ]
Dilip Biswal commented on SPARK-21274: -------------------------------------- [~viirya] Thank you. I will study this more tomorrow. From a quick peek, it looks reasonable to me. Since our current EXCEPT ALL rewrite is simpler, we can stay with that (given both hive and db2 do it the same way) ? And change the INTERSECT ALL rewrite after studying it a little more. > Implement EXCEPT ALL and INTERSECT ALL > -------------------------------------- > > Key: SPARK-21274 > URL: https://issues.apache.org/jira/browse/SPARK-21274 > Project: Spark > Issue Type: New Feature > Components: SQL > Affects Versions: 2.0.0, 2.1.0, 2.2.0 > Reporter: Ruslan Dautkhanov > Priority: Major > > 1) *EXCEPT ALL* / MINUS ALL : > {code} > SELECT a,b,c FROM tab1 > EXCEPT ALL > SELECT a,b,c FROM tab2 > {code} > can be rewritten as following outer join: > {code} > SELECT a,b,c > FROM tab1 t1 > LEFT OUTER JOIN > tab2 t2 > ON ( > (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c) > ) > WHERE > COALESCE(t2.a, t2.b, t2.c) IS NULL > {code} > (register as a temp.view this second query under "*t1_except_t2_df*" name > that can be also used to find INTERSECT ALL below): > 2) *INTERSECT ALL*: > {code} > SELECT a,b,c FROM tab1 > INTERSECT ALL > SELECT a,b,c FROM tab2 > {code} > can be rewritten as following anti-join using t1_except_t2_df we defined > above: > {code} > SELECT a,b,c > FROM tab1 t1 > WHERE > NOT EXISTS > ( SELECT 1 > FROM t1_except_t2_df e > WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c) > ) > {code} > So the suggestion is just to use above query rewrites to implement both > EXCEPT ALL and INTERSECT ALL sql set operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org