GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/20096
[SPARK-22908] Add kafka source and sink for continuous processing. ## What changes were proposed in this pull request? Add kafka source and sink for continuous processing. This involves two small changes to the execution engine: * Bring data reader close() into the normal data reader thread to avoid thread safety issues. * Fix up the semantics of the RECONFIGURING StreamExecution state. State updates are now atomic, and we don't have to deal with swallowing an exception. ## How was this patch tested? new unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/jose-torres/spark continuous-kafka Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20096.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20096 ---- commit eec38d374a4f3db46a26b8926192ae44da90b6ae Author: Jose Torres <jose@...> Date: 2017-12-21T21:07:35Z basic kafka commit f91cb0190d5a5d7942cd3d53bc571140a03965c6 Author: Jose Torres <jose@...> Date: 2017-12-24T21:40:20Z move reader close to data reader thread in case reader isn't thread safe commit 7c180db439c9d3c6389c5ff6033a61341e7f1bbf Author: Jose Torres <jose@...> Date: 2017-12-27T20:43:16Z test + small fixes commit 7596e34f1e8a047263da7bf8522a14869f289125 Author: Jose Torres <jose@...> Date: 2017-12-27T21:21:56Z fixes lost in cherrypick ---- --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org