GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/23215

    [SPARK-26263][SQL] Throw exception when partition column value can't be converted to user-specified type

    ## What changes were proposed in this pull request?
    
    Currently, if a user provides a data schema, partition column values are converted according to it. But if the conversion fails, e.g. converting a string to an int, the column value becomes null. We should throw an exception in such cases.
    
    For the following directory
    ```
    /tmp/testDir
    ├── p=bar
    └── p=foo
    ```
    If we run:
    ```
    val schema = StructType(Seq(StructField("p", IntegerType, false)))
    spark.read.schema(schema).csv("/tmp/testDir/").show()
    ```
    We will get:
    ```
    +----+
    |   p|
    +----+
    |null|
    |null|
    +----+
    ```
    This PR proposes to throw an exception in such cases instead of silently converting to a null value:
    1. Null partition column values don't make sense to users in most cases. It is better to surface the conversion failure so that users can adjust the schema or their ETL jobs to fix it.
    2. Conversion failures on non-partition data columns already throw exceptions. Partition columns should behave the same way.
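    The strict behavior proposed here can be sketched in plain Scala (no Spark involved; `castPartitionValue` is a hypothetical helper for illustration, not Spark's actual code path):
    ```scala
    object PartitionCast {
      // Hypothetical strict cast for an IntegerType partition column:
      // fail loudly instead of silently producing null.
      def castPartitionValue(raw: String): Int =
        try raw.toInt
        catch {
          case _: NumberFormatException =>
            throw new IllegalArgumentException(
              s"Failed to cast partition value '$raw' to IntegerType")
        }
    }
    ```
    With this behavior, a directory name like `p=foo` under an `IntegerType` schema raises an error at read time rather than yielding a null row.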
    ## How was this patch tested?
    
    Unit test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark SPARK-26263

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23215.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23215
    
----
commit 7060e127de339de42be12ed382ef0a4363ae325d
Author: Gengliang Wang <gengliang.wang@...>
Date:   2018-12-04T09:43:03Z

    SPARK-26263: Throw exception when partition value can't be converted to specific type

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
