[ https://issues.apache.org/jira/browse/SPARK-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-25332:
------------------------------------

    Assignee: (was: Apache Spark)

Sort merge join is selected instead of broadcast hash join after restarting spark-shell / the JDBC server for hive provider tables
----------------------------------------------------------------------------------------------------------------------------------

                 Key: SPARK-25332
                 URL: https://issues.apache.org/jira/browse/SPARK-25332
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Babulal
            Priority: Major

Reproduction (spark-shell, tables created with the hive provider):

spark.sql("create table x1(name string, age int) stored as parquet")
spark.sql("insert into x1 select 'a',29")
spark.sql("create table x2(name string, age int) stored as parquet")
spark.sql("insert into x2 select 'a',29")

scala> spark.sql("select * from x1 t1, x2 t2 where t1.name = t2.name").explain
== Physical Plan ==
*(2) {color:#14892c}BroadcastHashJoin{color} [name#101], [name#103], Inner, BuildRight
:- *(2) Project [name#101, age#102]
:  +- *(2) Filter isnotnull(name#101)
:     +- *(2) FileScan parquet default.x1[name#101,age#102] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
   +- *(1) Project [name#103, age#104]
      +- *(1) Filter isnotnull(name#103)
         +- *(1) FileScan parquet default.x2[name#103,age#104] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>

Now restart spark-shell (or the JDBC server, or run again via spark-submit) and execute the same select query:

scala> spark.sql("select * from x1 t1, x2 t2 where t1.name = t2.name").explain
== Physical Plan ==
*(5) {color:#FF0000}SortMergeJoin{color} [name#43], [name#45], Inner
:- *(2) Sort [name#43 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(name#43, 200)
:     +- *(1) Project [name#43, age#44]
:        +- *(1) Filter isnotnull(name#43)
:           +- *(1) FileScan parquet default.x1[name#43,age#44] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>
+- *(4) Sort [name#45 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(name#45, 200)
      +- *(3) Project [name#45, age#46]
         +- *(3) Filter isnotnull(name#45)
            +- *(3) FileScan parquet default.x2[name#45,age#46] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2], PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:string,age:int>

The desc formatted output confirms the table uses the hive provider:

scala> spark.sql("desc formatted x1").show(200,false)
+----------------------------+--------------------------------------------------------------+-------+
|col_name                    |data_type                                                     |comment|
+----------------------------+--------------------------------------------------------------+-------+
|name                        |string                                                        |null   |
|age                         |int                                                           |null   |
|                            |                                                              |       |
|# Detailed Table Information|                                                              |       |
|Database                    |default                                                       |       |
|Table                       |x1                                                            |       |
|Owner                       |Administrator                                                 |       |
|Created Time                |Sun Aug 19 12:36:58 IST 2018                                  |       |
|Last Access                 |Thu Jan 01 05:30:00 IST 1970                                  |       |
|Created By                  |Spark 2.3.0                                                   |       |
|Type                        |MANAGED                                                       |       |
|Provider                    |hive                                                          |       |
|Table Properties            |[transient_lastDdlTime=1534662418]                            |       |
|Location                    |file:/D:/spark_release/spark/bin/spark-warehouse/x1           |       |
|Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |       |
|InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|       |
|Storage Properties          |[serialization.format=1]                                      |       |
|Partition Provider          |Catalog                                                       |       |
+----------------------------+--------------------------------------------------------------+-------+
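Note that the desc formatted output above carries no size/statistics entry. If (assumption, not confirmed by this report) the optimizer has no size estimate for the hive-provider tables after a restart and therefore falls back to sort merge join, the following spark-shell sketch shows workarounds to try; none of them is a fix for the underlying issue:

// Sketch only; assumes the broadcast decision fails because no size
// statistics are visible for the hive-provider tables after restart.

// (a) Persist statistics in the metastore so they survive restarts:
spark.sql("ANALYZE TABLE x1 COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE x2 COMPUTE STATISTICS")

// (b) Allow falling back to the on-disk size when no stats are recorded:
spark.conf.set("spark.sql.statistics.fallBackToHdfs", "true")

// (c) Bypass size estimation entirely with a broadcast hint:
spark.sql("select /*+ BROADCAST(t2) */ * from x1 t1, x2 t2 where t1.name = t2.name").explain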
With a datasource table the same scenario works fine: creating the tables with create table ... using parquet instead of stored as parquet keeps the broadcast hash join after a restart.
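For comparison, a minimal sketch of that datasource-table variant (table names y1/y2 are illustrative, not from the report):

// Datasource provider: "using parquet" instead of "stored as parquet".
spark.sql("create table y1(name string, age int) using parquet")
spark.sql("insert into y1 select 'a',29")
spark.sql("create table y2(name string, age int) using parquet")
spark.sql("insert into y2 select 'a',29")

// After restarting spark-shell, per the report the plan should still
// show BroadcastHashJoin for the equivalent join:
spark.sql("select * from y1 t1, y2 t2 where t1.name = t2.name").explain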