GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/14907

    [SPARK-17351] Refactor JDBCRDD to expose ResultSet -> Seq[Row] utility methods

    This patch refactors the internals of the JDBC data source in order to
    allow some of its code to be re-used in an automated comparison testing
    harness. Here are the key changes:
    
    - Move the JDBC `ResultSetMetaData` to `StructType` conversion logic from
      `JDBCRDD.resolveTable()` to the `JdbcUtils` object (as a new
      `getSchema(ResultSet, JdbcDialect)` method), allowing it to be applied to
      `ResultSet`s that are created elsewhere.
    - Move the `ResultSet` to `InternalRow` conversion methods from `JDBCRDD`
      to `JdbcUtils`:
      - It makes sense to move the `JDBCValueGetter` type and `makeGetter`
        functions here given that their write-path counterparts
        (`JDBCValueSetter`) are already in `JdbcUtils`.
      - Add an internal `resultSetToSparkInternalRows` method which takes a
        `ResultSet` and schema and returns an `Iterator[InternalRow]`. This
        effectively extracts the main loop of `JDBCRDD` into its own method.
      - Add a public `resultSetToRows` method to `JdbcUtils`, which wraps the
        minimal machinery around `resultSetToSparkInternalRows` in order to
        allow it to be called from outside of a Spark job.
    - Make `JdbcDialects.get` into a `DeveloperApi` (`JdbcDialect` itself is
      already a `DeveloperApi`).
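
    For readers unfamiliar with the read path, the getter-per-column pattern
    described in the bullets above can be sketched roughly as follows. This is
    a simplified, JDK-only illustration and not the code in this patch: the
    names `ResultSetSketch`, `ValueGetter`, `orNull`, and `rows` are
    hypothetical, only a few `java.sql.Types` codes are handled, and rows are
    returned as plain `Seq[Any]` rather than `InternalRow`.

    ```scala
    import java.sql.{ResultSet, Types}

    // Hypothetical sketch; names and structure are assumptions, not Spark's code.
    object ResultSetSketch {
      // Reads the column at a 1-based JDBC index from the current row.
      type ValueGetter = (ResultSet, Int) => Any

      // JDBC primitive getters return a default (0, false) for SQL NULL,
      // so wasNull() must be checked right after the read.
      private def orNull(rs: ResultSet, value: Any): Any =
        if (rs.wasNull()) null else value

      // Pick a getter once per column, based on its java.sql.Types code.
      def makeGetter(sqlType: Int): ValueGetter = sqlType match {
        case Types.INTEGER =>
          (rs: ResultSet, i: Int) => orNull(rs, rs.getInt(i))
        case Types.BIGINT =>
          (rs: ResultSet, i: Int) => orNull(rs, rs.getLong(i))
        case Types.DOUBLE =>
          (rs: ResultSet, i: Int) => orNull(rs, rs.getDouble(i))
        case Types.BOOLEAN | Types.BIT =>
          (rs: ResultSet, i: Int) => orNull(rs, rs.getBoolean(i))
        case _ =>
          // Fallback for this sketch only: read everything else as a string.
          (rs: ResultSet, i: Int) => rs.getString(i)
      }

      // Wrap a ResultSet in an Iterator. The caller still owns the
      // ResultSet and is responsible for closing it.
      def rows(rs: ResultSet): Iterator[Seq[Any]] = {
        val md = rs.getMetaData()
        val getters: IndexedSeq[ValueGetter] =
          (1 to md.getColumnCount()).map(i => makeGetter(md.getColumnType(i)))

        new Iterator[Seq[Any]] {
          private var advanced = false // has rs.next() been called for this step?
          private var hasRow = false   // result of that rs.next() call

          override def hasNext: Boolean = {
            if (!advanced) {
              hasRow = rs.next()
              advanced = true
            }
            hasRow
          }

          override def next(): Seq[Any] = {
            if (!hasNext) throw new NoSuchElementException("end of ResultSet")
            advanced = false
            getters.zipWithIndex.map { case (get, idx) => get(rs, idx + 1) }
          }
        }
      }
    }
    ```

    In this sketch the getters are chosen once per column, so the per-row loop
    only invokes preselected functions instead of matching on the column type
    for every value.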
    
    Put together, these changes enable the following testing pattern:
    
    ```scala
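    import java.sql.ResultSet
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
    import org.apache.spark.sql.jdbc.JdbcDialects
    import org.apache.spark.sql.types.StructType

    val jdbcResultSet: ResultSet = conn.prepareStatement(query).executeQuery()
    val resultSchema: StructType =
      JdbcUtils.getSchema(jdbcResultSet, JdbcDialects.get("jdbc:postgresql"))
    val jdbcRows: Seq[Row] =
      JdbcUtils.resultSetToRows(jdbcResultSet, resultSchema).toSeq
    checkAnswer(sparkResult, jdbcRows) // in a test case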
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark modularize-jdbc-internals

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14907
    
----
commit 17d770a85e0921a5bfcbe00aead71cf169f76119
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-23T19:02:55Z

    Move ResultSet -> Seq[InternalRow] conversion into JdbcUtils

commit 682b5917341a530627a5d873196f6c4a3259a91b
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-23T19:07:44Z

    Make new method private[spark]

commit ec49accbf2c532766fb66c3b8910fd7c81563839
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-23T19:18:03Z

    Move getCatalystType to JdbcUtils and add new getSchema() method.

commit 025c9d08d485ebfab0dd23f1ed5065e537ad0437
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-23T19:54:54Z

    Add public resultSetToRows() method for converting to public rows.

commit 05dfe5276017862dadf8791672de818329d52723
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-24T02:25:35Z

    Remove InputMetrics from a public API.

commit fca548ae24bbc60e8c04ea4e6756dfb19942fb61
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-25T00:49:28Z

    Open up JdbcDialects.get as developerapi.

commit 43cbef6b4310dd9af08672bcaa01d8114b1fe5fc
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-09-01T00:32:25Z

    Merge remote-tracking branch 'origin/master' into modularize-jdbc-internals

----


