[GitHub] spark pull request #22198: [SPARK-25121][SQL] Supports multi-part table name...

maropu Thu, 23 Aug 2018 02:54:01 -0700

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/22198


    [SPARK-25121][SQL] Supports multi-part table names for broadcast hint resolution

    ## What changes were proposed in this pull request?
    This PR fixes broadcast table hint resolution so that it respects the database name.
    Currently, Spark ignores the database name in multi-part table names:
    ```
    scala> sql("CREATE DATABASE testDb")
    scala> spark.range(10).write.saveAsTable("testDb.t")
    
    // without this patch
    scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
    == Physical Plan ==
    *(2) Project [id#24L]
    +- *(2) BroadcastHashJoin [id#24L], [id#26L], Inner, BuildLeft
       :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
       :  +- *(1) Range (0, 10, step=1, splits=4)
       +- *(2) Project [id#26L]
          +- *(2) Filter isnotnull(id#26L)
             +- *(2) FileScan parquet testdb.t[id#26L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-2.3.1-bin-hadoop2.7/spark-warehouse..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
    
    // with this patch
    scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
    == Physical Plan ==
    *(2) Project [id#3L]
    +- *(2) BroadcastHashJoin [id#3L], [id#5L], Inner, BuildRight
       :- *(2) Range (0, 10, step=1, splits=4)
       +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
          +- *(1) Project [id#5L]
             +- *(1) Filter isnotnull(id#5L)
                +- *(1) FileScan parquet testdb.t[id#5L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-master/spark-warehouse/testdb.db/t], PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
    ```
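
    The matching rule this change aims for can be sketched in plain Scala, without any Spark dependency. This is only an illustration under assumptions, not the code in this patch: `HintNameMatching`, `parts`, and `matches` are hypothetical names, names are compared case-insensitively, and backtick-quoted identifiers are not handled.

    ```scala
    // Hypothetical helper for illustration; this is NOT Spark's implementation.
    object HintNameMatching {
      // Splits a dotted name such as "testDb.t" into its parts.
      // Assumption: no backtick quoting, so every '.' is a separator.
      def parts(name: String): Seq[String] = name.split('.').toSeq

      // Returns true when the hinted name refers to the given relation.
      // A one-part hint ("t") is compared with the table name only;
      // a two-part hint ("testDb.t") must also match the database name.
      def matches(hint: String, relationDb: String, relationTable: String): Boolean = {
        val relation = Seq(relationDb, relationTable)
        val hinted = parts(hint)
        hinted.nonEmpty && hinted.length <= relation.length &&
          hinted.zip(relation.takeRight(hinted.length)).forall {
            case (h, r) => h.equalsIgnoreCase(r) // assumed case-insensitive comparison
          }
      }
    }
    ```

    Under this sketch, a hint on `testDb.t` matches the relation `testdb.t`, while a hint on `otherDb.t` does not, which is the distinction that is lost when the database part is ignored.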
    
    ## How was this patch tested?
    Added tests in `DataFrameJoinSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-25121

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22198
    
----
commit d2be6920ba1cc052e9d5d8364cf48375cea8ba44
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-08-23T07:20:51Z

    Fix

----


---

