[GitHub] spark pull request #16208: [WIP][SPARK-10849][SQL] Adds a new column metadat...

2017-03-23 Thread sureshthalamati
GitHub user sureshthalamati closed the pull request at:

https://github.com/apache/spark/pull/16208





[GitHub] spark pull request #16208: [WIP][SPARK-10849][SQL] Adds a new column metadat...

2016-12-07 Thread sureshthalamati
GitHub user sureshthalamati opened a pull request:

https://github.com/apache/spark/pull/16208

[WIP][SPARK-10849][SQL] Adds a new column metadata property to the jdbc 
data source for users to specify database column type using the metadata

## What changes were proposed in this pull request?
Currently the JDBC data source creates tables in the target database using the 
default type mapping and the JDBC dialect mechanism. If users want to specify a 
different database data type for only some of the columns, there is no option 
available. In scenarios where the default mapping does not work, users are 
forced to create the table on the target database before writing; that 
workaround is not acceptable from a usability point of view. This PR provides a 
user-defined type mapping for specific columns.
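
For context, the manual workaround looks roughly like this; a minimal sketch, 
assuming the same `url`, `properties` (a `java.util.Properties`), and `df` as 
in the example below, with illustrative table and column names:

```Scala
import java.sql.DriverManager
import org.apache.spark.sql.SaveMode

// Pre-create the table with the desired column type over plain JDBC.
val conn = DriverManager.getConnection(url, properties)
try {
  conn.createStatement().executeUpdate(
    "CREATE TABLE TEST.USERDBTYPETEST (NAME NVARCHAR(123))")
} finally {
  conn.close()
}
// Append instead of overwrite, so the JDBC writer reuses the existing
// table rather than re-creating it with the default type mapping.
df.write.mode(SaveMode.Append).jdbc(url, "TEST.USERDBTYPETEST", properties)
```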
 
The solution is based on the existing Redshift connector 
(https://github.com/databricks/spark-redshift#setting-a-custom-column-type). We 
add a new column metadata property to the JDBC data source that lets users 
specify the database column type through column metadata.
 
Example:
```Scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

// Attach the createTableColumnType property to the column's metadata.
val nvarcharMd = new MetadataBuilder()
  .putString("createTableColumnType", "NVARCHAR(123)")
  .build()
val newDf = df.withColumn("name", col("name").as("name", nvarcharMd))
newDf.write.mode(SaveMode.Overwrite).jdbc(url, "TEST.USERDBTYPETEST", properties)
```
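Here `Column.as(alias, metadata)` is the public API for attaching metadata; 
column metadata is immutable, so a new DataFrame with the re-aliased column has 
to be projected rather than modifying `df` in place.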
One restriction with this approach is that metadata modification is unsupported 
in the Python, SQL, and R language APIs; users have to create a new DataFrame 
to specify the metadata with the _createTableColumnType_ property.
 
An alternative approach is to add a JDBC data source option that lets users 
specify the database column type information as a JSON string.
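
A minimal sketch of how that alternative might look; the option name 
`createTableColumnTypes` and the JSON payload shape here are assumptions for 
illustration, not an API introduced by this PR:

```Scala
// Hypothetical: the option name and the JSON shape are illustrative
// assumptions, not part of this PR.
df.write
  .mode(SaveMode.Overwrite)
  .option("createTableColumnTypes", """{"name": "NVARCHAR(123)"}""")
  .jdbc(url, "TEST.USERDBTYPETEST", properties)
```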

TODO: Documentation for specifying the database column type

## How was this patch tested?
Added a new test case to JDBCWriteSuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sureshthalamati/spark jdbc_custom_dbtype-spark-10849

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16208


commit 38349033a306a733e83975ca09b6cf8a8d69d397
Author: sureshthalamati 
Date:   2016-12-02T23:22:17Z

[SPARK-10849][SQL] Add new jdbc datasource metadata property to allow users 
to specify database column type when creating table on write.



