[ https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001220#comment-17001220 ]

Cory Lassila commented on SPARK-19335:
--------------------------------------

+1, I believe this would be useful. My scenario is a Spark Structured
Streaming job that computes 5-minute aggregates, reads from Kafka, and uses
foreachBatch to write out to several different Postgres tables. If the job
fails halfway through writing to those tables, the whole batch is replayed on
restart, so the tables that were already written get their rows appended a
second time and we end up with duplicate records in Postgres. Let me know if
this makes sense or if I'm missing something.
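
For concreteness, a minimal sketch of the pattern (assuming a SparkSession
named spark is in scope; the topic, table names, URL, and credentials are all
hypothetical):

    // Sketch only: topic, table names, URL, and credentials are made up.
    import org.apache.spark.sql.DataFrame

    val jdbcUrl = "jdbc:postgresql://db-host:5432/metrics"
    val props = new java.util.Properties()
    props.setProperty("user", "spark")
    props.setProperty("password", "...")  // hypothetical

    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      // ... the 5-minute aggregation is elided here ...
      .writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        // Two separate, independently committed JDBC appends: if the job
        // dies after the first succeeds, the whole batch is replayed on
        // restart and table_a receives its rows a second time.
        batch.write.mode("append").jdbc(jdbcUrl, "table_a", props)
        batch.write.mode("append").jdbc(jdbcUrl, "table_b", props)
      }
      .start()

foreachBatch does hand you a batchId precisely so writes can be made
idempotent, but that is awkward to achieve with plain JDBC appends, which is
why an upsert mode would help here.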

Thanks!

> Spark should support doing an efficient DataFrame Upsert via JDBC
> -----------------------------------------------------------------
>
>                 Key: SPARK-19335
>                 URL: https://issues.apache.org/jira/browse/SPARK-19335
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Ilya Ganelin
>            Priority: Minor
>
> Doing a database update, as opposed to an insert, is useful, particularly
> when working with streaming applications that may require revisions to
> previously stored data.
> Spark DataFrames/Datasets do not currently support an Update mode in the
> JDBC writer; only Overwrite and Append are allowed.
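
Absent built-in support, one workaround is to bypass the DataFrameWriter and
issue the upsert by hand per partition. A minimal sketch, assuming a Postgres
target (the metrics(id, value) table, column names, and credentials are
hypothetical, and INSERT ... ON CONFLICT is Postgres-specific syntax):

    // Hedged sketch: table, columns, and credentials are made up.
    import java.sql.DriverManager
    import org.apache.spark.sql.{DataFrame, Row}

    def upsertToPostgres(df: DataFrame, jdbcUrl: String): Unit = {
      val upsertPartition: Iterator[Row] => Unit = { rows =>
        val conn = DriverManager.getConnection(jdbcUrl, "spark", "...")
        conn.setAutoCommit(false)
        val stmt = conn.prepareStatement(
          "INSERT INTO metrics (id, value) VALUES (?, ?) " +
          "ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value")
        try {
          rows.foreach { row =>
            stmt.setLong(1, row.getAs[Long]("id"))
            stmt.setDouble(2, row.getAs[Double]("value"))
            stmt.addBatch()
          }
          stmt.executeBatch()  // replaying a batch re-applies the same
          conn.commit()        // upserts instead of appending duplicates
        } finally {
          stmt.close()
          conn.close()
        }
      }
      df.foreachPartition(upsertPartition)
    }

This pushes idempotence into the database, but it forfeits the JDBC writer's
connection handling and dialect support, which is the gap this issue asks
Spark to close.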


