[ https://issues.apache.org/jira/browse/SPARK-18597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702328#comment-15702328 ]
Herman van Hovell commented on SPARK-18597:
-------------------------------------------

[~nsyca] LEFT SEMI and LEFT ANTI are both SEMI joins (half joins). A semi join only returns a row from the first table when it matches one or more rows from the second table (I got this from http://www.slideshare.net/alokeparnachoudhury/semi-joins). The anti join is the opposite, and only returns a row from the first table when it does not match any row in the second table.

Hive also supports LEFT SEMI JOIN: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

Impala supports both LEFT SEMI and LEFT ANTI JOIN: https://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_joins.html

> Do not push down filters for LEFT ANTI JOIN
> -------------------------------------------
>
> Key: SPARK-18597
> URL: https://issues.apache.org/jira/browse/SPARK-18597
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Herman van Hovell
> Assignee: Herman van Hovell
> Priority: Minor
> Labels: correctness
> Fix For: 2.1.0
>
>
> The optimizer pushes down filters for left anti joins. This unfortunately has
> the opposite effect. For example:
> {noformat}
> sql("create or replace temporary view tbl_a as values (1, 5), (2, 1), (3, 6) as t(c1, c2)")
> sql("create or replace temporary view tbl_b as values 1 as t(c1)")
> sql("""
> select *
> from tbl_a
> left anti join tbl_b on ((tbl_a.c1 = tbl_a.c2) is null or tbl_a.c1 = tbl_a.c2)
> """)
> {noformat}
> Should return rows [2, 1] & [3, 6], but returns no rows.
> The upside is that this will only happen when you use a really weird
> anti-join (only referencing the table on the left hand side).
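
The semi/anti join semantics described above, and the reason a pushed-down filter inverts an anti join, can be sketched in plain Python. This is only a model of the semantics, not Spark's optimizer or its API. The helper names (left_semi_join, left_anti_join, sql_eq, left_only_condition) and the rows in tbl_a are hypothetical and chosen for illustration; they are not the rows from the issue.

```python
from typing import Optional

def sql_eq(x, y) -> Optional[bool]:
    """SQL-style equality: a NULL (None) on either side yields NULL (None)."""
    if x is None or y is None:
        return None
    return x == y

def left_only_condition(a) -> bool:
    """Models `(a.c1 = a.c2) IS NULL OR a.c1 = a.c2`, which reads only the left row."""
    eq = sql_eq(a[0], a[1])
    return eq is None or eq is True

def left_semi_join(left, right, cond):
    """Keep a left row when the condition is true for at least one right row."""
    return [a for a in left if any(cond(a, b) is True for b in right)]

def left_anti_join(left, right, cond):
    """Keep a left row when the condition is true for no right row."""
    return [a for a in left if not any(cond(a, b) is True for b in right)]

# Hypothetical data, not taken from the issue.
tbl_a = [(1, 1), (2, 1), (3, 6), (4, None)]
tbl_b = [(1,)]

def cond(a, b):
    # The join condition ignores the right-hand row entirely.
    return left_only_condition(a)

semi = left_semi_join(tbl_a, tbl_b, cond)      # [(1, 1), (4, None)]
anti = left_anti_join(tbl_a, tbl_b, cond)      # [(2, 1), (3, 6)]

# Model of the incorrect rewrite: apply the left-only condition as a filter
# on tbl_a first, then run the anti join on what is left.
filtered = [a for a in tbl_a if left_only_condition(a)]
pushed_down = left_anti_join(filtered, tbl_b, cond)  # []
```

In this model the filter keeps exactly the rows the anti join is supposed to discard, so the anti join then removes every one of them and nothing is returned. For a semi join the same rewrite is harmless, because the filter and the join keep the same rows.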