[ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303924#comment-14303924 ]
Anand Mohan Tumuluri commented on SPARK-5472:
---------------------------------------------

[~tmyklebu] Many thanks for this extremely useful feature. [~rxin] Many thanks for replying and covering the missing filter pushdown. (I don't know how I missed it; perhaps because we don't use the JdbcRdd directly but only after joining it to Parquet data and caching.)

The advantage of the old JdbcRdd over the new SQL-based JDBCRDD is that it supports any query that returns a ResultSet; it is not limited to a table or view like the current one. In our use case we actually use JdbcRdd to pull data from a normalized transactional database into a denormalized dimensional model with a mammoth SQL statement, and we only fetch 5% of the columns that are in the DB. If I were to use the new JDBCRDD, I would have to either
a. add a view to the transactional DB (which would definitely face a lot of resistance), or
b. map all the tables into Spark SQL and do the joins and denormalization within Spark SQL (I don't know what issues I would hit given Spark SQL's limitations in SQL-92 support).

In addition, we were able to take advantage of SQL conditionals to partition the table/query in whichever way we wanted. I don't know how we would achieve that now.

Overall, it would not be a very pleasant situation for us if the new JDBCRDD replaced the old JdbcRdd. We are OK if it complements the old one rather than replacing it. We will definitely try the write path with the new JDBCRDD but continue to use the old one for reading. (Rough sketches of both usage patterns follow the quoted issue description below.)

> Add support for reading from and writing to a JDBC database
> ------------------------------------------------------------
>
>                 Key: SPARK-5472
>                 URL: https://issues.apache.org/jira/browse/SPARK-5472
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Tor Myklebust
>            Assignee: Tor Myklebust
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> It would be nice to be able to make a table in a JDBC database appear as a
> table in Spark SQL. This would let users, for instance, perform a JOIN
> between a DataFrame in Spark SQL and a table in a Postgres database.
>
> It might also be nice to be able to go the other direction -- save a
> DataFrame to a database -- for instance in an ETL job.
>
> Edited to clarify: Both of these tasks are certainly possible to accomplish
> at the moment with a little bit of ad-hoc glue code. However, there is no
> fundamental reason why the user should need to supply the table schema and
> some code for pulling data out of a ResultSet row into a Catalyst Row
> structure when this information can be derived from the schema of the
> database table itself.
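For reference, this is a minimal sketch of the old JdbcRdd pattern described in the comment above: any SQL that returns a ResultSet is accepted, and the two ? placeholders let the caller partition the query with an arbitrary numeric predicate. The connection URL, query, and column names are made up purely for illustration.

    import java.sql.DriverManager
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.JdbcRDD

    val sc = new SparkContext(new SparkConf().setAppName("jdbcrdd-sketch"))

    // Any query returning a ResultSet works; the two ? placeholders are bound to each
    // partition's lower/upper bound, so SQL conditionals control how the query is split.
    val denormalized = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:postgresql://db-host/sales", "user", "pass"),
      """SELECT o.order_id, o.total, c.region
         FROM orders o JOIN customers c ON o.customer_id = c.customer_id
         WHERE o.order_id >= ? AND o.order_id <= ?""",
      lowerBound = 1L,
      upperBound = 10000000L,
      numPartitions = 20,
      mapRow = rs => (rs.getLong(1), rs.getBigDecimal(2), rs.getString(3)))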
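And a rough sketch of what the equivalent read plus the new write path might look like with the 1.3 JDBC data source, assuming the 1.3-era API names (sqlContext.jdbc for reading a table partitioned on a numeric column, and DataFrame.insertIntoJDBC for writing); the URL and table names are again hypothetical, and sc is the SparkContext from the previous sketch.

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Read path: dbtable must name a table or view, and partitioning is expressed as a
    // numeric column plus bounds rather than arbitrary SQL conditionals.
    val orders = sqlContext.jdbc(
      "jdbc:postgresql://db-host/sales", "orders",
      "order_id", 1L, 10000000L, 20)

    // Write path: push a DataFrame back out to an existing JDBC table
    // (overwrite = false appends rows instead of replacing the table).
    orders.insertIntoJDBC("jdbc:postgresql://db-host/warehouse", "orders_copy", false)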