[ https://issues.apache.org/jira/browse/SPARK-24859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550359#comment-16550359 ]
Johannes Mayer commented on SPARK-24859: ---------------------------------------- I will provide an example. Could you test it on the master branch? > Predicates pushdown on outer joins > ---------------------------------- > > Key: SPARK-24859 > URL: https://issues.apache.org/jira/browse/SPARK-24859 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL > Affects Versions: 2.2.0 > Environment: Cloudera CDH 5.13.1 > Reporter: Johannes Mayer > Priority: Major > > I have two AVRO tables in Hive called FAct and DIm. Both are partitioned by a > common column called part_col. Now I want to join both tables on their id but > only for some of partitions. > If I use an inner join, everything works well: > > {code:java} > select * > from FA f > join DI d > on(f.id = d.id and f.part_col = d.part_col) > where f.part_col = 'xyz' > {code} > > In the sql explain plan I can see, that the predicate part_col = 'xyz' is > also used in the DIm HiveTableScan. > > When I execute the same query using a left join the full dim table is > scanned. There are some workarounds for this issue, but I wanted to report > this as a bug, since it works on an inner join, and i think the behaviour > should be the same for an outer join > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org