[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

dongjoon-hyun Thu, 27 Oct 2016 15:21:12 -0700

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/15664


    [SPARK-18123][SQL] Use db column names instead of RDD column ones during JDBC Writing

    ## What changes were proposed in this pull request?
    
    Apache Spark supports the following cases **by quoting RDD column names** while saving through JDBC.
    * Allow a reserved keyword as a column name, e.g., 'order'.
    * Allow mixed-case column names, e.g., `[a: int, A: int]`.
    
    ```scala
    scala> val df = sql("select 1 a, 1 A")
    df: org.apache.spark.sql.DataFrame = [a: int, A: int]
    scala> val option = Map("url" -> "jdbc:postgresql:postgres", "dbtable" -> "mixed", "user" -> "postgres", "password" -> "test")
    scala> df.write.mode("overwrite").format("jdbc").options(option).save()
    scala> df.write.mode("append").format("jdbc").options(option).save()
    ```
    
    This PR aims to use database column names instead of RDD column names in order to additionally support the following case.
    Note that before this PR, this case succeeds with `MySQL` but fails on `Postgres`/`Oracle`.
    
    ```scala
    val df1 = sql("select 1 a")
    val df2 = sql("select 1 A")
    ...
    df1.write.mode("overwrite").format("jdbc").options(option).save()
    df2.write.mode("append").format("jdbc").options(option).save()
    ```
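    
    As a rough illustration of the idea (not the code in this patch), the sketch below resolves each RDD column name against the column names of the existing table, preferring an exact match and falling back to a case-insensitive one, and then quotes the resolved names in the `INSERT` statement. The object `ColumnNameSketch` and its methods `resolveColumns`, `quote`, and `insertStatement` are hypothetical names used only for this example, and double-quote identifier quoting is assumed.
    
    ```scala
    // Hypothetical sketch; these names are not part of Spark's API.
    object ColumnNameSketch {
      // Map each RDD column to a database column: exact match first,
      // then a case-insensitive match; fail if neither exists.
      def resolveColumns(rddColumns: Seq[String], dbColumns: Seq[String]): Seq[String] =
        rddColumns.map { c =>
          dbColumns.find(_ == c)
            .orElse(dbColumns.find(_.equalsIgnoreCase(c)))
            .getOrElse(throw new IllegalArgumentException(s"Column '$c' not found in table"))
        }
    
      // Assumption: identifiers are quoted with double quotes, and embedded
      // double quotes are escaped by doubling them.
      def quote(name: String): String = "\"" + name.replace("\"", "\"\"") + "\""
    
      // Build the INSERT statement from the resolved database column names.
      def insertStatement(table: String, rddColumns: Seq[String], dbColumns: Seq[String]): String = {
        val cols = resolveColumns(rddColumns, dbColumns).map(quote).mkString(", ")
        val placeholders = rddColumns.map(_ => "?").mkString(", ")
        s"INSERT INTO $table ($cols) VALUES ($placeholders)"
      }
    }
    ```
    
    With a table created from `df1` (column `a`) and an append from `df2` (column `A`), `ColumnNameSketch.insertStatement("mixed", Seq("A"), Seq("a"))` yields `INSERT INTO mixed ("a") VALUES (?)`, so the quoted name matches the existing column.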
    
    ## How was this patch tested?
    
    Pass the Jenkins tests with a new test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-18123

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15664.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15664
    
----
commit 9558f96eb8d66ed89b2b507e81a285f710c82262
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-10-27T21:30:58Z

    [SPARK-18123][SQL] Use database column names instead of RDD schema column names

----

