[ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297105#comment-14297105 ]

Tor Myklebust edited comment on SPARK-5472 at 1/29/15 4:37 PM:
---------------------------------------------------------------

Not sure what you mean by "essentially" here.  JdbcRDD certainly lets you pull 
information out of a database, and from there you can munge it into whatever 
shape you need.  Part of the point here is to eliminate, or at least drastically 
reduce, the need for that manual munging.

JdbcRDD gives you an RDD of Array[Object]s or, if you supply a function that 
maps ResultSet rows to objects of your choosing, an RDD of some class of your 
choosing.  It doesn't natively produce Spark SQL DataFrames.  To get a 
DataFrame, you need an RDD of Row objects plus a schema; a lot of the work 
here comes from type mapping between types in the external database and Spark 
SQL types.
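
To give a sense of the type-mapping work involved, here is a minimal sketch. The function name is hypothetical, and the Spark SQL types are returned as plain strings so the example has no Spark dependency; real code would construct Catalyst DataType instances instead:

```scala
import java.sql.Types

// Hypothetical sketch: map JDBC type codes (from ResultSetMetaData)
// to Spark SQL type names.  The actual mapping would need to handle
// precision/scale for DECIMAL, nullability, driver quirks, etc.
def jdbcTypeToSparkType(jdbcType: Int): String = jdbcType match {
  case Types.BIT | Types.BOOLEAN                    => "BooleanType"
  case Types.TINYINT | Types.SMALLINT
     | Types.INTEGER                                => "IntegerType"
  case Types.BIGINT                                 => "LongType"
  case Types.REAL | Types.FLOAT | Types.DOUBLE      => "DoubleType"
  case Types.DECIMAL | Types.NUMERIC                => "DecimalType"
  case Types.CHAR | Types.VARCHAR
     | Types.LONGVARCHAR                            => "StringType"
  case Types.DATE                                   => "DateType"
  case Types.TIMESTAMP                              => "TimestampType"
  case other => sys.error(s"Unsupported JDBC type code: $other")
}
```

Every column of every table pulled in needs to go through a mapping like this before a schema can be attached to the RDD of Rows.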

JdbcRDD also doesn't expose itself as a data source in Spark SQL; you can't 
write "CREATE TABLE foo USING something" with some options in Spark SQL to 
get a table named foo that actually lives in an external database.
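
For concreteness, what this issue is asking for would look something like the following (the provider name and option keys here are assumptions for illustration, not confirmed syntax):

```sql
-- Hypothetical: register a Postgres table as a Spark SQL table
CREATE TEMPORARY TABLE foo
USING org.apache.spark.sql.jdbc
OPTIONS (
  url     'jdbc:postgresql://dbhost/mydb',
  dbtable 'foo'
);
```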



> Add support for reading from and writing to a JDBC database
> -----------------------------------------------------------
>
>                 Key: SPARK-5472
>                 URL: https://issues.apache.org/jira/browse/SPARK-5472
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Tor Myklebust
>            Priority: Minor
>
> It would be nice to be able to make a table in a JDBC database appear as a 
> table in Spark SQL.  This would let users, for instance, perform a JOIN 
> between a DataFrame in Spark SQL with a table in a Postgres database.
> It might also be nice to be able to go the other direction---save a DataFrame 
> to a database---for instance in an ETL job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
