GitHub user sureshthalamati opened a pull request:

    https://github.com/apache/spark/pull/16209

    [WIP][SPARK-10849][SQL] Adds an option to the JDBC data source for users to 
specify database column types for the created table

    ## What changes were proposed in this pull request?
    Currently the JDBC data source creates tables in the target database using 
the default type mapping and the JDBC dialect mechanism. If users want to 
specify a different database data type for only some columns, no option is 
available. In scenarios where the default mapping does not work, users are 
forced to create tables on the target database before writing. This workaround 
is not acceptable from a usability point of view. This PR provides a 
user-defined type mapping for specific columns.
    
    The solution is to allow users to specify the database column data type for 
the created table as a JDBC data source option (createTableColumnTypes) on 
write. Data type information can be specified as key (column name)-value (data 
type) pairs in JSON (e.g. {"name":"varchar(128)", "comments":"clob(20k)"}). 
Users can use org.apache.spark.sql.types.MetadataBuilder to build the metadata 
and generate the JSON string required for this option.
    
    Example:
    ```Scala
    val mdb = new MetadataBuilder()
    mdb.putString("name", "VARCHAR(128)")
    mdb.putString("comments", "CLOB(20K)")
    val createTableColTypes = mdb.build().json
    df.write.option("createTableColumnTypes", createTableColTypes).jdbc(url,
      "TEST.DBCOLTYPETEST", properties)
    ```
    An alternative approach is to add a new column metadata property to the 
JDBC data source so that users can specify the database column type through 
per-column metadata.
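    As a rough sketch of that alternative (the metadata key 
"createTableColumnType" and the overall shape are assumptions for 
illustration, not the PR's actual implementation), the target type could be 
attached as column-level metadata when building the DataFrame:
    
    ```Scala
    import org.apache.spark.sql.types.MetadataBuilder
    
    // Hypothetical sketch: carry the target database type as column-level
    // metadata instead of a single datasource option.
    val colMeta = new MetadataBuilder()
      .putString("createTableColumnType", "VARCHAR(128)")  // assumed key
      .build()
    // Column.as(alias, metadata) re-attaches the column with the metadata.
    val dfWithMeta = df.withColumn("name", df("name").as("name", colMeta))
    dfWithMeta.write.jdbc(url, "TEST.DBCOLTYPETEST", properties)
    ```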
    
    TODO: Case-insensitive column name lookup based on the 
spark.sql.caseSensitive property value.
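    A minimal sketch of that lookup (the helper name and signature are 
hypothetical, plain Scala independent of the PR's code):
    
    ```Scala
    // Resolve a user-supplied column type, honoring spark.sql.caseSensitive:
    // exact match when case-sensitive, first case-insensitive match otherwise.
    def findColumnType(userTypes: Map[String, String],
                       column: String,
                       caseSensitive: Boolean): Option[String] = {
      if (caseSensitive) {
        userTypes.get(column)
      } else {
        userTypes.collectFirst {
          case (name, dbType) if name.equalsIgnoreCase(column) => dbType
        }
      }
    }
    ```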
    
    ## How was this patch tested?
    Added a new test case to JDBCWriteSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sureshthalamati/spark 
jdbc_custom_dbtype_option_json-spark-10849

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16209
    
----
commit 6eec6ca63c5641d1bbbbc9958bdd300ac079d5cf
Author: sureshthalamati <suresh.thalam...@gmail.com>
Date:   2016-12-02T23:22:17Z

    Adding new option to the jdbc to allow users to specify create table column 
types when table is created on write

----
