GitHub user sureshthalamati opened a pull request:
https://github.com/apache/spark/pull/16208
[WIP][SPARK-10849][SQL] Adds a new column metadata property to the jdbc
data source for users to specify database column type using the metadata
## What changes were proposed in this pull request?
Currently the JDBC data source creates tables in the target database using the
default type mapping and the JDBC dialect mechanism. If users want to specify a
different database data type for only some of the columns, there is no option
available. In scenarios where the default mapping does not work, users are
forced to create tables in the target database before writing. This workaround
is probably not acceptable from a usability point of view. This PR provides a
user-defined type mapping for specific columns.

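For context, the dialect mechanism mentioned above lets users override the type mapping globally per Spark data type, but not per column, which is what motivates this PR. A minimal sketch of that existing workaround, assuming a SQL Server JDBC URL (the URL prefix and the chosen target type here are illustrative only):

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Illustrative dialect: maps every StringType column to NVARCHAR(123).
// Note this applies to ALL string columns -- there is no per-column control.
object NVarcharDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:sqlserver")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("NVARCHAR(123)", java.sql.Types.NVARCHAR))
    case _          => None // fall back to the default mapping
  }
}

JdbcDialects.registerDialect(NVarcharDialect)
```

Because the dialect is keyed on the Spark data type rather than the column name, two string columns needing different database types cannot be expressed this way.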
The solution is based on the existing Redshift connector
(https://github.com/databricks/spark-redshift#setting-a-custom-column-type). We
add a new column metadata property to the JDBC data source so that users can
specify the database column type through the column metadata.

Example:
```scala
val nvarcharMd = new MetadataBuilder()
  .putString("createTableColumnType", "NVARCHAR(123)")
  .build()
val newDf = df.withColumn("name", col("name").as("name", nvarcharMd))
newDf.write.mode(SaveMode.Overwrite).jdbc(url, "TEST.USERDBTYPETEST", properties)
```
One restriction of this approach is that metadata modification is unsupported
in the Python, SQL, and R language APIs. Users have to create a new DataFrame
to specify the metadata with the _createTableColumnType_ property.

An alternative approach is to add a JDBC data source option that lets users
specify database column type information as a JSON string.
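That alternative could look roughly like the following; the option name `createTableColumnTypes` and the JSON shape are purely hypothetical, sketched here for illustration and not an existing Spark API:

```scala
// Hypothetical option name and JSON payload -- illustrative only.
// Each key is a column name, each value the desired database type.
df.write
  .mode(SaveMode.Overwrite)
  .option("createTableColumnTypes",
    """{"name": "NVARCHAR(123)", "id": "BIGINT"}""")
  .jdbc(url, "TEST.USERDBTYPETEST", properties)
```

An option-based design would also be reachable from the Python, SQL, and R APIs, which the metadata-based approach is not.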
TODO: Documentation for specifying the database column type
## How was this patch tested?
Added new test case to the JDBCWriteSuite
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sureshthalamati/spark jdbc_custom_dbtype-spark-10849
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16208.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16208
commit 38349033a306a733e83975ca09b6cf8a8d69d397
Author: sureshthalamati
Date: 2016-12-02T23:22:17Z
[SPARK-10849][SQL] Add new jdbc datasource metadata property to allow users
to specify database column type when creating table on write.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.