Ruslan Dautkhanov created SPARK-21274: -----------------------------------------
Summary: Implement EXCEPT ALL and INTERSECT ALL Key: SPARK-21274 URL: https://issues.apache.org/jira/browse/SPARK-21274 Project: Spark Issue Type: New Feature Components: Optimizer, SQL Affects Versions: 2.1.0, 2.0.0, 2.2.0 Reporter: Ruslan Dautkhanov 1) *EXCEPT ALL* / MINUS ALL : {code} SELECT a,b,c FROM tab1 EXCEPT ALL SELECT a,b,c FROM tab2 {code} can be rewritten as following outer join: {code} SELECT a,b,c FROM tab1 t1 LEFT OUTER JOIN tab2 t2 ON ( (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c) ) WHERE COALESCE(t2.a, t2.b, t2.c) IS NULL {code} (register as a temp.view this second query under "*t1_except_t2_df*" name that can be also used to find INTERSECT ALL below): 2) *INTERSECT ALL*: {code} SELECT a,b,c FROM tab1 INTERSECT ALL SELECT a,b,c FROM tab2 {code} can be rewritten as following anti-join using t1_except_t2_df we defined above: {code} SELECT a,b,c FROM tab1 t1 WHERE NOT EXISTS ( SELECT 1 FROM t1_except_t2_df e WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c) ) {code} So the suggestion is just to use above query rewrites to implement both EXCEPT ALL and INTERSECT ALL sql set operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org