GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/16313

    [SPARK-18899][SQL] append a bucketed table using DataFrameWriter with 
mismatched bucketing should fail

    ## What changes were proposed in this pull request?
    
    When we append data to an existing table with
    `DataFrameWriter.saveAsTable`, we check the schema and partition columns
    for a mismatch. However, we do not check bucketing, so an append may
    produce a problematic table whose data files use different bucketing.
    
    This PR cleans up the checking logic to fix this bug, and also adds the
    schema check for non-file-based data sources.
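
To illustrate the kind of check described above, here is a minimal, standalone sketch in plain Python. It is not Spark's implementation: `BucketSpec`, `TableLayout`, and `check_append` are hypothetical names invented for this example, and comparing column names by equality is a simplifying assumption. It only shows the idea of rejecting an append whose bucketing differs from that of the existing table, alongside the schema and partition column checks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BucketSpec:
    """Hypothetical stand-in for a table's bucketing definition."""
    num_buckets: int
    bucket_columns: Tuple[str, ...]
    sort_columns: Tuple[str, ...] = ()


@dataclass(frozen=True)
class TableLayout:
    """Hypothetical summary of the properties compared on append."""
    columns: Tuple[str, ...]
    partition_columns: Tuple[str, ...] = ()
    bucket_spec: Optional[BucketSpec] = None


def check_append(existing: TableLayout, incoming: TableLayout) -> None:
    """Raise ValueError if the incoming write does not match the table."""
    if incoming.columns != existing.columns:
        raise ValueError("schema mismatch")
    if incoming.partition_columns != existing.partition_columns:
        raise ValueError("partition column mismatch")
    # The comparison the PR description says was previously missing.
    if incoming.bucket_spec != existing.bucket_spec:
        raise ValueError("bucketing mismatch")


existing = TableLayout(("id", "name"), (), BucketSpec(8, ("id",)))
same = TableLayout(("id", "name"), (), BucketSpec(8, ("id",)))
different = TableLayout(("id", "name"), (), BucketSpec(4, ("id",)))

check_append(existing, same)  # passes silently
try:
    check_append(existing, different)
except ValueError as err:
    print(f"append rejected: {err}")
```

In this sketch, an append that specifies no bucketing against a bucketed table is also rejected, because `None` does not equal the existing `BucketSpec`. Whether the actual patch treats that case the same way is not stated in the description.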
    
    ## How was this patch tested?
    A new regression test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bug1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16313
    
----
commit 370bdc9c15bb865869e2c2e10e60dd501ad8b2f0
Author: Wenchen Fan <wenc...@databricks.com>
Date:   2016-12-16T16:40:16Z

    append a bucketed table using DataFrameWriter with mismatched bucketing 
should fail

----

