[ https://issues.apache.org/jira/browse/SPARK-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232431#comment-14232431 ]
Michael Armbrust commented on SPARK-2710:
-----------------------------------------

I'd suggest looking at the test cases (which have examples of each of the different interfaces): https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/sources

Additionally, I have put together a sample library here that reads Avro data: https://github.com/databricks/spark-avro

> Build SchemaRDD from a JdbcRDD with MetaData (no hard-coded case class)
> -----------------------------------------------------------------------
>
>                 Key: SPARK-2710
>                 URL: https://issues.apache.org/jira/browse/SPARK-2710
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>            Reporter: Teng Qiu
>
> Spark SQL can take Parquet files or JSON files as a table directly (without
> being given a case class to define the schema).
> As a component named SQL, it should also be able to take a ResultSet from an
> RDBMS easily.
> I found that there is a JdbcRDD in core:
> core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
> so I want to make a small change in this file to allow SQLContext to read
> the MetaData from the PreparedStatement (reading the metadata does not
> require actually executing the query).
> Then, in Spark SQL, SQLContext can create a SchemaRDD from the JdbcRDD and
> its MetaData.
> Further on, maybe we can add a feature to sql-shell, so that users can join
> tables from different sources via spark-thrift-server,
> such as:
> {code}
> CREATE TABLE jdbc_tbl1 AS JDBC "connectionString" "username" "password"
> "initQuery" "bound" ...
> CREATE TABLE parquet_files AS PARQUET "hdfs://tmp/parquet_table/"
> SELECT parquet_files.colX, jdbc_tbl1.colY
> FROM parquet_files
> JOIN jdbc_tbl1
> ON (parquet_files.id = jdbc_tbl1.id)
> {code}
> I think such a feature would be useful, like Facebook's Presto engine.
> Oh, and there is a small bug in JdbcRDD: in compute(), the close() method has
> {code}
> if (null != conn && !
> stmt.isClosed()) conn.close()
> {code}
> which should be
> {code}
> if (null != conn && ! conn.isClosed()) conn.close()
> {code}
> Just a small typo :) but with such a guard the close() method will never
> actually close conn...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
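The metadata-driven schema inference the ticket proposes can be sketched as below. This is a minimal illustration, not Spark's actual implementation: `ColumnSchema`, `JdbcSchemaSketch`, and the string type names are hypothetical, and the `java.sql.Types` mapping is an illustrative subset. The key point it demonstrates is that `PreparedStatement.getMetaData` yields a `ResultSetMetaData` for a prepared-but-not-executed statement, so column names, types, and nullability can be read without running the query.

{code}
import java.sql.{Connection, PreparedStatement, ResultSetMetaData, Types}

// Hypothetical column descriptor; a real implementation would build
// Catalyst types (e.g. IntegerType) instead of plain strings.
case class ColumnSchema(name: String, dataType: String, nullable: Boolean)

object JdbcSchemaSketch {

  // Illustrative mapping from java.sql.Types codes to type names.
  def sqlTypeName(jdbcType: Int): String = jdbcType match {
    case Types.INTEGER | Types.SMALLINT | Types.TINYINT => "IntegerType"
    case Types.BIGINT                                   => "LongType"
    case Types.FLOAT | Types.REAL                       => "FloatType"
    case Types.DOUBLE                                   => "DoubleType"
    case Types.DECIMAL | Types.NUMERIC                  => "DecimalType"
    case Types.BOOLEAN | Types.BIT                      => "BooleanType"
    case Types.TIMESTAMP                                => "TimestampType"
    case Types.DATE                                     => "DateType"
    case _                                              => "StringType"
  }

  // Derive the schema from ResultSetMetaData alone.
  def schemaFromMetaData(md: ResultSetMetaData): Seq[ColumnSchema] =
    (1 to md.getColumnCount).map { i =>
      ColumnSchema(
        md.getColumnLabel(i),
        sqlTypeName(md.getColumnType(i)),
        md.isNullable(i) != ResultSetMetaData.columnNoNulls)
    }

  // Against a live connection: prepare the statement, read its metadata,
  // and never execute the query.
  def schemaForQuery(conn: Connection, sql: String): Seq[ColumnSchema] = {
    val stmt: PreparedStatement = conn.prepareStatement(sql)
    try schemaFromMetaData(stmt.getMetaData)
    finally stmt.close()
  }
}
{code}

From such a column list, SQLContext could assemble a StructType-style schema and pair it with the rows produced by JdbcRDD, which is essentially what the ticket asks for.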