[ https://issues.apache.org/jira/browse/SPARK-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196752#comment-15196752 ]
Ashok kumar Rajendran commented on SPARK-13900:
-----------------------------------------------

Hi Xiao Li,

Thanks for looking into this. Here are the explain plans for these two queries. (TableA is a big table with 150 fields; I have shortened the field list here to reduce text size.)

Execution plan for the Union query:

Explain execution
====================
16/03/15 21:00:21 INFO datasources.DataSourceStrategy: Selected 24 partitions out of 24, pruned 0.0% partitions.
16/03/15 21:00:21 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 24.1 KB, free 41.9 KB)
16/03/15 21:00:21 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 4.9 KB, free 46.9 KB)
16/03/15 21:00:21 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.88.12.80:50492 (size: 4.9 KB, free: 7.0 GB)
16/03/15 21:00:21 INFO spark.SparkContext: Created broadcast 2 from explain at JavaSparkSQL.java:676
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 73.3 KB, free 120.2 KB)
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 4.9 KB, free 125.1 KB)
16/03/15 21:00:22 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.88.12.80:50492 (size: 4.9 KB, free: 7.0 GB)
16/03/15 21:00:22 INFO spark.SparkContext: Created broadcast 3 from explain at JavaSparkSQL.java:676
16/03/15 21:00:22 INFO datasources.DataSourceStrategy: Selected 24 partitions out of 24, pruned 0.0% partitions.
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 73.3 KB, free 198.5 KB)
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 4.9 KB, free 203.4 KB)
16/03/15 21:00:22 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.88.12.80:50492 (size: 4.9 KB, free: 7.0 GB)
16/03/15 21:00:22 INFO spark.SparkContext: Created broadcast 4 from explain at JavaSparkSQL.java:676
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 73.3 KB, free 276.7 KB)
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 4.9 KB, free 281.7 KB)
16/03/15 21:00:22 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 10.88.12.80:50492 (size: 4.9 KB, free: 7.0 GB)
16/03/15 21:00:22 INFO spark.SparkContext: Created broadcast 5 from explain at JavaSparkSQL.java:676
16/03/15 21:00:22 INFO datasources.DataSourceStrategy: Selected 24 partitions out of 24, pruned 0.0% partitions.
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 73.3 KB, free 355.0 KB)
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 4.9 KB, free 359.9 KB)
16/03/15 21:00:22 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on 10.88.12.80:50492 (size: 4.9 KB, free: 7.0 GB)
16/03/15 21:00:22 INFO spark.SparkContext: Created broadcast 6 from explain at JavaSparkSQL.java:676
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 73.3 KB, free 433.3 KB)
16/03/15 21:00:22 INFO storage.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 4.9 KB, free 438.2 KB)
16/03/15 21:00:22 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on 10.88.12.80:50492 (size: 4.9 KB, free: 7.0 GB)
16/03/15 21:00:22 INFO spark.SparkContext: Created broadcast 7 from explain at JavaSparkSQL.java:676

== Parsed Logical Plan ==
Union
:- Union
: :- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#163,TableB_dimension2#155 AS inv_ua#164,TableB_dimension3#156 AS TableB_dimension3#165,TableB_timestamp_mills#157 AS TableB_timestamp_mills#166]
: : +- Filter (((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + cast(3600000 as double)))) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
: : +- Join Inner, Some((TableA_Dimension1#74 = TableB_dimension1#162))
: : :- Subquery TableA
: : : +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
: : +- Subquery TableB
: : +- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation
: +- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#167,TableB_dimension2#155 AS inv_ua#168,TableB_dimension3#156 AS TableB_dimension3#169,TableB_timestamp_mills#157 AS TableB_timestamp_mills#170]
: +- Filter ((((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + cast(3600000 as double)))) && (TableA_Dimension3#68 = TableB_dimension3#156)) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
: +- Join Inner, None
: :- Subquery TableA
: : +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
: +- Subquery TableB
: +- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation
+- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#171,TableB_dimension2#155 AS inv_ua#172,TableB_dimension3#156 AS TableB_dimension3#173,TableB_timestamp_mills#157 AS TableB_timestamp_mills#174]
+- Filter ((((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + cast(3600000 as double)))) && (TableA_Dimension2#94 = TableB_dimension2#155)) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
+- Join Inner, None
:- Subquery TableA
: +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
+- Subquery TableB
+- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation

== Analyzed Logical Plan ==
TableA_Dimension1: string, TableA_Dimension2: string, TableA_Dimension3: string, TableA_timestamp_millis: string, TableA_field40: string, TableB_dimension1: string, inv_ua: string, TableB_dimension3: string, TableB_timestamp_mills: string
Union
:- Union
: :- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#163,TableB_dimension2#155 AS inv_ua#164,TableB_dimension3#156 AS TableB_dimension3#165,TableB_timestamp_mills#157 AS TableB_timestamp_mills#166]
: : +- Filter (((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + cast(3600000 as double)))) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
: : +- Join Inner, Some((TableA_Dimension1#74 = TableB_dimension1#162))
: : :- Subquery TableA
: : : +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
: : +- Subquery TableB
: : +- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation
: +- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#167,TableB_dimension2#155 AS inv_ua#168,TableB_dimension3#156 AS TableB_dimension3#169,TableB_timestamp_mills#157 AS TableB_timestamp_mills#170]
: +- Filter ((((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + cast(3600000 as double)))) && (TableA_Dimension3#68 = TableB_dimension3#156)) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
: +- Join Inner, None
: :- Subquery TableA
: : +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
: +- Subquery TableB
: +- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation
+- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#171,TableB_dimension2#155 AS inv_ua#172,TableB_dimension3#156 AS TableB_dimension3#173,TableB_timestamp_mills#157 AS TableB_timestamp_mills#174]
+- Filter ((((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + cast(3600000 as double)))) && (TableA_Dimension2#94 = TableB_dimension2#155)) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
+- Join Inner, None
:- Subquery TableA
: +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
+- Subquery TableB
+- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation

== Optimized Logical Plan ==
Union
:- Union
: :- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#163,TableB_dimension2#155 AS inv_ua#164,TableB_dimension3#156 AS TableB_dimension3#165,TableB_timestamp_mills#157 AS TableB_timestamp_mills#166]
: : +- Join Inner, Some(((((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + 3600000.0))) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159)) && (TableA_Dimension1#74 = TableB_dimension1#162)))
: : :- Project [TableA_timestamp_millis#38,TableA_Dimension1#74,TableA_field40#40,TableA_Dimension3#68,TableA_partition_hour_bucket#153,TableA_Dimension2#94]
: : : +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
: : +- Project [TableB_dimension3#156,TableB_partition_hour_bucket#159,TableB_dimension1#162,TableB_timestamp_mills#157,TableB_dimension2#155]
: : +- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation
: +- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#167,TableB_dimension2#155 AS inv_ua#168,TableB_dimension3#156 AS TableB_dimension3#169,TableB_timestamp_mills#157 AS TableB_timestamp_mills#170]
: +- Join Inner, Some(((((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + 3600000.0))) && (TableA_Dimension3#68 = TableB_dimension3#156)) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159)))
: :- Project [TableA_timestamp_millis#38,TableA_Dimension1#74,TableA_field40#40,TableA_Dimension3#68,TableA_partition_hour_bucket#153,TableA_Dimension2#94]
: : +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
: +- Project [TableB_dimension3#156,TableB_partition_hour_bucket#159,TableB_dimension1#162,TableB_timestamp_mills#157,TableB_dimension2#155]
: +- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation
+- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#171,TableB_dimension2#155 AS inv_ua#172,TableB_dimension3#156 AS TableB_dimension3#173,TableB_timestamp_mills#157 AS TableB_timestamp_mills#174]
+- Join Inner, Some(((((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + 3600000.0))) && (TableA_Dimension2#94 = TableB_dimension2#155)) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159)))
:- Project [TableA_timestamp_millis#38,TableA_Dimension1#74,TableA_field40#40,TableA_Dimension3#68,TableA_partition_hour_bucket#153,TableA_Dimension2#94]
: +- Relation[TableA_field0#0,TableA_field1#1,TableA_field2#2,TableA_field3#3,TableA_field4#4,TableA_field5#5,TableA_field6#6,TableA_field7#7,TableA_field8#8,TableA_field9#9,TableA_field10#10,TableA_field11#11,TableA_field12#12,TableA_field13#13,TableA_field14#14,TableA_field15#15,....,TableA_field150#150] ParquetRelation
+- Project [TableB_dimension3#156,TableB_partition_hour_bucket#159,TableB_dimension1#162,TableB_timestamp_mills#157,TableB_dimension2#155]
+- Relation[timeBucket#154,TableB_dimension2#155,TableB_dimension3#156,TableB_timestamp_mills#157,endTime#158,TableB_partition_hour_bucket#159,endTimeBucket#160,count#161L,TableB_dimension1#162] ParquetRelation

== Physical Plan ==
Union
:- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#163,TableB_dimension2#155 AS inv_ua#164,TableB_dimension3#156 AS TableB_dimension3#165,TableB_timestamp_mills#157 AS TableB_timestamp_mills#166]
: +- Filter (((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + 3600000.0))) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
: +- BroadcastHashJoin [TableA_Dimension1#74], [TableB_dimension1#162], BuildRight
: :- Scan ParquetRelation[TableA_timestamp_millis#38,TableA_Dimension1#74,TableA_field40#40,TableA_Dimension3#68,TableA_partition_hour_bucket#153,TableA_Dimension2#94] InputPaths: hdfs://myhdfs:8888/backfill/akraj/spark/output/TableA-parq
: +- Scan ParquetRelation[TableB_dimension3#156,TableB_partition_hour_bucket#159,TableB_dimension1#162,TableB_timestamp_mills#157,TableB_dimension2#155] InputPaths: hdfs://myhdfs:8888/backfill/akraj/spark/input/TableB-parq
:- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#167,TableB_dimension2#155 AS inv_ua#168,TableB_dimension3#156 AS TableB_dimension3#169,TableB_timestamp_mills#157 AS TableB_timestamp_mills#170]
: +- Filter (((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + 3600000.0))) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
: +- BroadcastHashJoin [TableA_Dimension3#68], [TableB_dimension3#156], BuildRight
: :- Scan ParquetRelation[TableA_timestamp_millis#38,TableA_Dimension1#74,TableA_field40#40,TableA_Dimension3#68,TableA_partition_hour_bucket#153,TableA_Dimension2#94] InputPaths: hdfs://myhdfs:8888/backfill/akraj/spark/output/TableA-parq
: +- Scan ParquetRelation[TableB_dimension3#156,TableB_partition_hour_bucket#159,TableB_dimension1#162,TableB_timestamp_mills#157,TableB_dimension2#155] InputPaths: hdfs://myhdfs:8888/backfill/akraj/spark/input/TableB-parq
+- Project [TableA_Dimension1#74,TableA_Dimension2#94,TableA_Dimension3#68,TableA_timestamp_millis#38,TableA_field40#40,TableB_dimension1#162 AS TableB_dimension1#171,TableB_dimension2#155 AS inv_ua#172,TableB_dimension3#156 AS TableB_dimension3#173,TableB_timestamp_mills#157 AS TableB_timestamp_mills#174]
+- Filter (((TableA_timestamp_millis#38 >= TableB_timestamp_mills#157) && (cast(TableA_timestamp_millis#38 as double) <= (cast(TableB_timestamp_mills#157 as double) + 3600000.0))) && (TableA_partition_hour_bucket#153 >= TableB_partition_hour_bucket#159))
+- BroadcastHashJoin [TableA_Dimension2#94], [TableB_dimension2#155], BuildRight
:- Scan ParquetRelation[TableA_timestamp_millis#38,TableA_Dimension1#74,TableA_field40#40,TableA_Dimension3#68,TableA_partition_hour_bucket#153,TableA_Dimension2#94] InputPaths: hdfs://myhdfs:8888/backfill/akraj/spark/output/TableA-parq
+- Scan ParquetRelation[TableB_dimension3#156,TableB_partition_hour_bucket#159,TableB_dimension1#162,TableB_timestamp_mills#157,TableB_dimension2#155] InputPaths: hdfs://myhdfs:8888/backfill/akraj/spark/input/TableB-parq

Explain execution
====================

> Spark SQL queries with OR condition is not optimized properly
> -------------------------------------------------------------
>
> Key: SPARK-13900
> URL: https://issues.apache.org/jira/browse/SPARK-13900
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Ashok kumar Rajendran
>
> I have a large table with a few billion rows and a very small table with 4 dimension values. All the data is stored in Parquet format. I would like to get the rows that match any of these dimensions. For example:
> Select field1, field2 from A, B where A.dimension1 = B.dimension1 OR A.dimension2 = B.dimension2 OR A.dimension3 = B.dimension3 OR A.dimension4 = B.dimension4.
> The query plan treats this as a BroadcastNestedLoopJoin and executes for a very long time.
> If I execute this as Union queries, it takes around 1.5 minutes for each dimension. Each query internally does a BroadcastHashJoin.
> Select field1, field2 from A, B where A.dimension1 = B.dimension1
> UNION ALL
> Select field1, field2 from A, B where A.dimension2 = B.dimension2
> UNION ALL
> Select field1, field2 from A, B where A.dimension3 = B.dimension3
> UNION ALL
> Select field1, field2 from A, B where A.dimension4 = B.dimension4.
> This is obviously not an optimal solution, as it scans the same table multiple times, but it gives results much faster than the OR condition.
> It seems the SQL optimizer is not working properly, which causes a huge performance impact on this type of OR query.
> Please correct me if I miss anything here.
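For reference, a minimal sketch of the two query shapes, using the placeholder tables A and B and the illustrative column names from the description above (not the reporter's actual schema). In Spark 1.6, a join condition that is a disjunction of equalities exposes no single equi-join key, so the planner falls back to BroadcastNestedLoopJoin; rewriting each disjunct as its own equi-join lets every branch be planned as a BroadcastHashJoin, as the physical plan pasted above shows:

    -- OR form: no extractable equi-join key, planned as BroadcastNestedLoopJoin
    EXPLAIN
    SELECT A.field1, A.field2
    FROM A JOIN B
      ON A.dimension1 = B.dimension1
      OR A.dimension2 = B.dimension2;

    -- UNION ALL form: one equi-join per disjunct, each planned as BroadcastHashJoin
    EXPLAIN
    SELECT A.field1, A.field2 FROM A JOIN B ON A.dimension1 = B.dimension1
    UNION ALL
    SELECT A.field1, A.field2 FROM A JOIN B ON A.dimension2 = B.dimension2;

One caveat with the rewrite: a row pair that matches more than one dimension appears once under the OR condition but once per matching branch under UNION ALL, so the two forms are only equivalent when at most one dimension can match, or when such duplicates are acceptable.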