Ruslan Dautkhanov created SPARK-21274:
-----------------------------------------

             Summary: Implement EXCEPT ALL and INTERSECT ALL
                 Key: SPARK-21274
                 URL: https://issues.apache.org/jira/browse/SPARK-21274
             Project: Spark
          Issue Type: New Feature
          Components: Optimizer, SQL
    Affects Versions: 2.1.0, 2.0.0, 2.2.0
            Reporter: Ruslan Dautkhanov


1) *EXCEPT ALL* / MINUS ALL :

{code}
SELECT a,b,c FROM tab1
 EXCEPT ALL 
SELECT a,b,c FROM tab2
{code}
can be rewritten as following outer join:
{code}
SELECT a,b,c
FROM    tab1 t1
     LEFT OUTER JOIN 
        tab2 t2
     ON (
        (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
     )
WHERE
    COALESCE(t2.a, t2.b, t2.c) IS NULL
{code}
(register as a temp.view this second query under "*t1_except_t2_df*" name that 
can be also used to find INTERSECT ALL below):

2) *INTERSECT ALL*:

{code}
SELECT a,b,c FROM tab1
 INTERSECT ALL 
SELECT a,b,c FROM tab2
{code}
can be rewritten as following anti-join using t1_except_t2_df we defined above:
{code}
SELECT a,b,c
FROM    tab1 t1
WHERE 
   NOT EXISTS
   (    SELECT 1
        FROM    t1_except_t2_df e
        WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
   )
{code}

So the suggestion is just to use above query rewrites to implement both EXCEPT 
ALL and INTERSECT ALL sql set operations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to