GitHub user GaalDornick opened a pull request: https://github.com/apache/spark/pull/17190
[SPARK-19478][SS] JDBC Sink [WIP]

## What changes were proposed in this pull request?

Implementation of a Sink that supports storing structured streaming data into a JDBC-compliant RDBMS. It supports Overwrite and Append modes. By default it supports _at-least-once_ operations, and it can be configured to support _exactly-once_.

To keep track of batches that have been written to a table, it creates a _log_ table with the name <tablename>$_SINK_LOG. This table has 2 columns: batchID and status of the batch. The status can be either COMMITTED or UNCOMMITTED. When the JDBC Sink receives a batch, it checks whether there is an entry in the sink log table for that batch with status = COMMITTED. If the status is COMMITTED, it ignores the batch; otherwise it tries the append/overwrite operation.

To enable _exactly-once_, the client should create a column in the original table that stores the batchID. This column should be of LongType. The name of the column should be passed in the options with the name _batchIdCol_. If the JDBC Sink finds that this option is set, it will use _exactly-once_ mode. In this mode, it will set the column named by _batchIdCol_ to the id of the batch that is inserting or overwriting the record. Also, at the beginning of the batch, if it finds a batch with status = UNCOMMITTED, it deletes the records in the original table that match the batchID.

## How was this patch tested?
Implemented JDBCSinkSuite, which is modeled along the lines of the other Sink tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GaalDornick/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17190.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17190

----
commit 28c8bebadbb7a800c94ba7321af7d144d4678e73
Author: Jayesh Lalwani <lalwani.jay...@gmail.com>
Date:   2017-02-26T07:13:39Z

    Implemented JDBCSink

commit f838c4974d435cc19b7589e198d152b362227959
Author: Jayesh Lalwani <lalwani.jay...@gmail.com>
Date:   2017-02-26T07:15:07Z

    Merge remote-tracking branch 'upstream/master'

commit 7ac0d7899c06e7f35a3253ee57fa31f30aa946a4
Author: Jayesh Lalwani <lalwani.jay...@gmail.com>
Date:   2017-02-28T13:26:49Z

    Formatting code

commit 12086becb1ab882738349b5bb959b4b536832f12
Author: Jayesh Lalwani <lalwani.jay...@gmail.com>
Date:   2017-02-28T14:04:05Z

    Merge remote-tracking branch 'upstream/master'

commit 2a43d29a329afa27f4238d61c681fa918cd84d40
Author: Jayesh Lalwani <lalwani.jay...@gmail.com>
Date:   2017-03-01T13:04:26Z

    Merge remote-tracking branch 'upstream/master'

commit 756ea2cb32c8a85ccb98cd84d85962e1b5d37154
Author: Jayesh Lalwani <lalwani.jay...@gmail.com>
Date:   2017-03-06T14:36:23Z

    Merge remote-tracking branch 'upstream/master'

commit dde8b0b15f11c4e19361e8af485c113ef1a5b422
Author: Jayesh Lalwani <lalwani.jay...@gmail.com>
Date:   2017-03-07T13:35:48Z

    Merge remote-tracking branch 'upstream/master'

----