[ https://issues.apache.org/jira/browse/SPARK-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303924#comment-14303924 ]
Anand Mohan Tumuluri commented on SPARK-5472:
---------------------------------------------

[~tmyklebu] Many thanks for this extremely useful feature. [~rxin] Many thanks for replying and covering the missing filter pushdown. (I don't know how I missed it; perhaps because we don't use the JdbcRdd directly but only after joining it to Parquet data and caching.)

The advantage of the old JdbcRdd over the new SQL-based JDBCRDD is that it supports any query that returns a ResultSet; it is not limited to a table or view like the current one. In our use case we actually use JdbcRdd to pull data from a normalized transactional database into a denormalized dimensional model with a mammoth SQL statement, and we only fetch 5% of the columns that are in the DB. If I were to use the new JDBCRDD, I would have to either
a. add a view to the transactional DB (which would definitely face a lot of resistance), or
b. map all the tables into Spark SQL and do the joins and denormalization within Spark SQL (I don't know what issues I would hit given Spark SQL's limitations in SQL-92 support).

In addition, we were able to take advantage of SQL conditionals to partition the table/query in whichever way we wanted. I don't know how we would achieve that now.

Overall, it would not be a very pleasant situation for us if the new JDBCRDD replaced the old JdbcRdd. We are OK if it complements the old one rather than replacing it. We will definitely try the write path with the new JDBCRDD but continue to use the old one for reading. (Rough sketches of both usage patterns follow the quoted issue description below.)

> Add support for reading from and writing to a JDBC database
> ------------------------------------------------------------
>
>                 Key: SPARK-5472
>                 URL: https://issues.apache.org/jira/browse/SPARK-5472
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Tor Myklebust
>            Assignee: Tor Myklebust
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> It would be nice to be able to make a table in a JDBC database appear as a
> table in Spark SQL. This would let users, for instance, perform a JOIN
> between a DataFrame in Spark SQL and a table in a Postgres database.
>
> It might also be nice to be able to go the other direction -- save a
> DataFrame to a database -- for instance in an ETL job.
>
> Edited to clarify: Both of these tasks are certainly possible to accomplish
> at the moment with a little bit of ad-hoc glue code. However, there is no
> fundamental reason why the user should need to supply the table schema and
> some code for pulling data out of a ResultSet row into a Catalyst Row
> structure when this information can be derived from the schema of the
> database table itself.
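For reference, this is a minimal sketch of the old JdbcRdd pattern described in the comment above: any SQL that returns a ResultSet is accepted, and the two ? placeholders let the caller partition the query with an arbitrary numeric predicate. The connection URL, query, and column names are made up purely for illustration.

    import java.sql.DriverManager
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.JdbcRDD

    val sc = new SparkContext(new SparkConf().setAppName("jdbcrdd-sketch"))

    // Any query returning a ResultSet works; the two ? placeholders are bound to each
    // partition's lower/upper bound, so SQL conditionals control how the query is split.
    val denormalized = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:postgresql://db-host/sales", "user", "pass"),
      """SELECT o.order_id, o.total, c.region
         FROM orders o JOIN customers c ON o.customer_id = c.customer_id
         WHERE o.order_id >= ? AND o.order_id <= ?""",
      lowerBound = 1L,
      upperBound = 10000000L,
      numPartitions = 20,
      mapRow = rs => (rs.getLong(1), rs.getBigDecimal(2), rs.getString(3)))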
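And a rough sketch of what the equivalent read plus the new write path might look like with the 1.3 JDBC data source, assuming the 1.3-era API names (sqlContext.jdbc for reading a table partitioned on a numeric column, and DataFrame.insertIntoJDBC for writing); the URL and table names are again hypothetical, and sc is the SparkContext from the previous sketch.

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Read path: dbtable must name a table or view, and partitioning is expressed as a
    // numeric column plus bounds rather than arbitrary SQL conditionals.
    val orders = sqlContext.jdbc(
      "jdbc:postgresql://db-host/sales", "orders",
      "order_id", 1L, 10000000L, 20)

    // Write path: push a DataFrame back out to an existing JDBC table
    // (overwrite = false appends rows instead of replacing the table).
    orders.insertIntoJDBC("jdbc:postgresql://db-host/warehouse", "orders_copy", false)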