[ https://issues.apache.org/jira/browse/SPARK-30768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-30768:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> Constraints inferred from inequality attributes
> -----------------------------------------------
>
>                 Key: SPARK-30768
>                 URL: https://issues.apache.org/jira/browse/SPARK-30768
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> How to reproduce:
> {code:sql}
> create table SPARK_30768_1(c1 int, c2 int);
> create table SPARK_30768_2(c1 int, c2 int);
> {code}
> *Spark SQL*:
> {noformat}
> spark-sql> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on (t1.c1 > t2.c1) where t1.c1 = 3;
> == Physical Plan ==
> *(3) Project [c1#5, c2#6]
> +- BroadcastNestedLoopJoin BuildRight, Inner, (c1#5 > c1#7)
>    :- *(1) Project [c1#5, c2#6]
>    :  +- *(1) Filter (isnotnull(c1#5) AND (c1#5 = 3))
>    :     +- *(1) ColumnarToRow
>    :        +- FileScan parquet default.spark_30768_1[c1#5,c2#6] Batched: true, DataFilters: [isnotnull(c1#5), (c1#5 = 3)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,3)], ReadSchema: struct<c1:int,c2:int>
>    +- BroadcastExchange IdentityBroadcastMode, [id=#60]
>       +- *(2) Project [c1#7]
>          +- *(2) Filter isnotnull(c1#7)
>             +- *(2) ColumnarToRow
>                +- FileScan parquet default.spark_30768_2[c1#7] Batched: true, DataFilters: [isnotnull(c1#7)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: struct<c1:int>
> {noformat}
> *Hive* supports this feature (note the inferred predicate {{c1 < 3}} on t2):
> {noformat}
> hive> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on (t1.c1 > t2.c1) where t1.c1 = 3;
> Warning: Map Join MAPJOIN[13][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         $hdt$_0:t1
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         $hdt$_0:t1
>           TableScan
>             alias: t1
>             filterExpr: (c1 = 3) (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Filter Operator
>               predicate: (c1 = 3) (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               Select Operator
>                 expressions: c2 (type: int)
>                 outputColumnNames: _col1
>                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 HashTable Sink Operator
>                   keys:
>                     0
>                     1
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t2
>             filterExpr: (c1 < 3) (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Filter Operator
>               predicate: (c1 < 3) (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               Select Operator
>                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 Map Join Operator
>                   condition map:
>                        Inner Join 0 to 1
>                   keys:
>                     0
>                     1
>                   outputColumnNames: _col1
>                   Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                   Select Operator
>                     expressions: 3 (type: int), _col1 (type: int)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                       table:
>                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Execution mode: vectorized
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> Time taken: 5.491 seconds, Fetched: 71 row(s)
> {noformat}
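The inference requested here can be illustrated with a toy model. From the filter `t1.c1 = 3` and the join condition `t1.c1 > t2.c1`, substituting the constant gives `3 > t2.c1`, i.e. `t2.c1 < 3`. This matches the `filterExpr: (c1 < 3)` that Hive pushes into the t2 scan. The Python sketch below is a minimal, hypothetical illustration under simplifying assumptions. Predicates are plain `(operand, operator, operand)` triples over column names and integer literals. `Pred` and `infer_constraints` are made-up names, and this is not Spark's Catalyst API or how Spark implements constraint inference. NULL handling is ignored; the derived filter is safe for this query only because the inner-join condition `t1.c1 > t2.c1` already rejects NULLs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Toy model (hypothetical): an operand is a column name such as "t1.c1"
# or an integer literal such as 3.
Operand = Union[str, int]

# The operator to use when the two sides of a comparison are swapped.
FLIP = {"=": "=", "<": ">", ">": "<", "<=": ">=", ">=": "<="}


@dataclass(frozen=True)
class Pred:
    """A binary comparison `left op right` (hypothetical helper type)."""
    left: Operand
    op: str
    right: Operand

    def normalized(self) -> Pred:
        """Put the column on the left when comparing a literal with a column."""
        if isinstance(self.left, int) and isinstance(self.right, str):
            return Pred(self.right, FLIP[self.op], self.left)
        return self


def infer_constraints(preds: set[Pred]) -> set[Pred]:
    """Derive new column-vs-literal predicates by substituting every
    `column = literal` equality into column-vs-column comparisons.
    Returns only predicates that are not already present."""
    # Map each column pinned to a literal by an equality, e.g. {"t1.c1": 3}.
    constants = {
        p.left: p.right
        for p in (q.normalized() for q in preds)
        if p.op == "=" and isinstance(p.left, str) and isinstance(p.right, int)
    }
    inferred = set()
    for p in preds:
        # Only comparisons between two columns can yield new constraints here.
        if not (isinstance(p.left, str) and isinstance(p.right, str)):
            continue
        if p.left in constants:
            inferred.add(Pred(constants[p.left], p.op, p.right).normalized())
        if p.right in constants:
            inferred.add(Pred(p.left, p.op, constants[p.right]).normalized())
    return inferred - {q.normalized() for q in preds}


if __name__ == "__main__":
    # The query from this issue: join on t1.c1 > t2.c1, filter t1.c1 = 3.
    query_preds = {Pred("t1.c1", ">", "t2.c1"), Pred("t1.c1", "=", 3)}
    print(infer_constraints(query_preds))  # {Pred(left='t2.c1', op='<', right=3)}
```

The derived `t2.c1 < 3` only references t2, so an optimizer could push it below the join into the t2 scan. Spark's plan above pushes only `IsNotNull(c1)` there.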