I asked a similar question about end-to-end exactly-once with Kafka, and you're correct: distributed transactions are not supported. Introducing a distributed transaction protocol like two-phase commit would require huge changes to the Spark codebase, and the feedback was not positive.
What you could try instead is an intermediate output: insert into a temporary table from the executors, then move the inserted records to the final table in the driver (the move must be atomic).

Thanks,
Jungtaek Lim (HeartSaVioR)

On Sat, Aug 3, 2019 at 4:56 AM Shiv Prashant Sood <shivprash...@gmail.com> wrote:

> All,
>
> I understood that DataSourceV2 supports transactional writes and wanted to
> implement that in the JDBC DataSource V2 connector ( PR#25211
> <https://github.com/apache/spark/pull/25211> ).
>
> I don't see how this is feasible for a JDBC-based connector. The framework
> suggests that the EXECUTOR sends a commit message to the DRIVER, and the
> actual commit should only be done by the DRIVER after receiving all commit
> confirmations. This will not work for JDBC, as commits have to happen on
> the JDBC Connection, which is maintained by the EXECUTORS, and the
> JDBCConnection is not serializable, so it cannot be sent to the DRIVER.
>
> Am I right in thinking that this cannot be supported for JDBC? My goal is
> to either fully write or roll back the dataframe write operation.
>
> Thanks in advance for your help.
>
> Regards,
> Shiv

--
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
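For what it's worth, the staging-table idea above can be sketched roughly as follows. This is only an illustration of the pattern, not Spark or connector code: it uses Python's sqlite3 in place of a real JDBC sink, and the table names (`staging_job42`, `final`) and helper functions (`executor_write`, `driver_commit`) are made up for the example. The point is that each "executor" appends to a per-job staging table, and the "driver" performs the move into the final table as a single transaction, so readers see either all of the job's rows or none.

```python
import sqlite3

def executor_write(conn, staging, rows):
    # Each executor appends its partition's rows to the staging table.
    conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
    conn.commit()

def driver_commit(conn, staging, final):
    # The driver atomically moves staged rows into the final table:
    # both statements run in one transaction, so a failure leaves
    # `final` untouched and the staging rows still in place.
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.execute(f"INSERT INTO {final} SELECT * FROM {staging}")
    cur.execute(f"DROP TABLE {staging}")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final (id INTEGER, v TEXT)")
conn.execute("CREATE TABLE staging_job42 (id INTEGER, v TEXT)")

# Simulate two executors writing their partitions.
executor_write(conn, "staging_job42", [(1, "a"), (2, "b")])
executor_write(conn, "staging_job42", [(3, "c")])

# Driver commits once all executors have reported success.
driver_commit(conn, "staging_job42", "final")
print(sorted(conn.execute("SELECT id FROM final").fetchall()))
```

With a real JDBC database the same shape applies, but the driver would open its own connection for the final move, and cleanup of orphaned staging tables from failed jobs still has to be handled separately.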