Babulal created SPARK-25332:
-------------------------------

                 Summary: Sort merge join is selected instead of broadcast hash join after restarting spark-shell/the JDBC server, for hive-provider tables
                     Key: SPARK-25332
                     URL: https://issues.apache.org/jira/browse/SPARK-25332
                 Project: Spark
              Issue Type: Bug
              Components: SQL
        Affects Versions: 2.3.0
                Reporter: Babulal
Reproduction steps:

{code}
spark.sql("create table x1(name string,age int) stored as parquet")
spark.sql("insert into x1 select 'a',29")
spark.sql("create table x2(name string,age int) stored as parquet")
spark.sql("insert into x2 select 'a',29")
{code}

In the same session, the join is planned as a broadcast hash join:

{code}
scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain
== Physical Plan ==
*{color:#14892c}(2) BroadcastHashJoin{color} [name#101], [name#103], Inner, BuildRight
:- *(2) Project [name#101, age#102]
:  +- *(2) Filter isnotnull(name#101)
:     +- *(2) FileScan parquet default.x1[name#101,age#102] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
   +- *(1) Project [name#103, age#104]
      +- *(1) Filter isnotnull(name#103)
         +- *(1) FileScan parquet default.x2[name#103,age#104] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
{code}

Now restart spark-shell (or spark-submit, or restart the JDBC server) and run the same select query again. The plan switches to a sort merge join:

{code}
scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain
== Physical Plan ==
*{color:#FF0000}(5) SortMergeJoin{color} [name#43], [name#45], Inner
:- *(2) Sort [name#43 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(name#43, 200)
:     +- *(1) Project [name#43, age#44]
:        +- *(1) Filter isnotnull(name#43)
:           +- *(1) FileScan parquet default.x1[name#43,age#44] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
+- *(4) Sort [name#45 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(name#45, 200)
      +- *(3) Project [name#45, age#46]
         +- *(3) Filter isnotnull(name#45)
            +- *(3) FileScan parquet default.x2[name#45,age#46] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
{code}

The table uses the hive provider:

{code}
scala> spark.sql("desc formatted x1").show(200,false)
+----------------------------+--------------------------------------------------------------+-------+
|col_name                    |data_type                                                     |comment|
+----------------------------+--------------------------------------------------------------+-------+
|name                        |string                                                        |null   |
|age                         |int                                                           |null   |
|                            |                                                              |       |
|# Detailed Table Information|                                                              |       |
|Database                    |default                                                       |       |
|Table                       |x1                                                            |       |
|Owner                       |Administrator                                                 |       |
|Created Time                |Sun Aug 19 12:36:58 IST 2018                                  |       |
|Last Access                 |Thu Jan 01 05:30:00 IST 1970                                  |       |
|Created By                  |Spark 2.3.0                                                   |       |
|Type                        |MANAGED                                                       |       |
|Provider                    |hive                                                          |       |
|Table Properties            |[transient_lastDdlTime=1534662418]                            |       |
|Location                    |file:/D:/spark_release/spark/bin/spark-warehouse/x1           |       |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |       |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|       |
|Storage Properties          |[serialization.format=1]                                      |       |
|Partition Provider          |Catalog                                                       |       |
+----------------------------+--------------------------------------------------------------+-------+
{code}

With a datasource table (i.e. create table ... using parquet instead of stored as parquet), this works fine and the broadcast hash join is still selected after restart.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
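A likely explanation (my assumption, not confirmed in this report): for a hive-provider table with no computed statistics, a fresh session cannot estimate the table size, so the planner falls back to a very large default size (spark.sql.defaultSizeInBytes); the comparison against spark.sql.autoBroadcastJoinThreshold (default 10 MB) then fails and SortMergeJoin is chosen. A minimal sketch of that size check, with the threshold and fallback values hard-coded as assumptions rather than taken from Spark's source:

```scala
// Simplified sketch, NOT Spark's actual planner code: a join side is
// broadcast only if its estimated size is at or under the broadcast
// threshold. When statistics are missing, the estimate falls back to an
// effectively infinite size, so the broadcast check can never pass.
object JoinSelectionSketch {
  val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024 // assumed 10 MB default
  val defaultSizeInBytes: Long = Long.MaxValue             // assumed fallback when stats are missing

  def chooseJoin(sizeInBytes: Long): String =
    if (sizeInBytes <= autoBroadcastJoinThreshold) "BroadcastHashJoin"
    else "SortMergeJoin"

  def main(args: Array[String]): Unit = {
    println(chooseJoin(1024))               // fresh session with a known small size: BroadcastHashJoin
    println(chooseJoin(defaultSizeInBytes)) // restarted session, no stats: SortMergeJoin
  }
}
```

If this is indeed the cause, running spark.sql("ANALYZE TABLE x2 COMPUTE STATISTICS") before the join, or forcing the build side with the broadcast() hint from org.apache.spark.sql.functions, should restore the broadcast plan even after a restart.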