Thank you for your response. In some cases an update means simply updating an
existing record, while in other cases we need to update the existing record
and also insert a new record. The statement you proposed doesn't handle that.
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Modern versions of postgres have upsert, ie insert into ... on
conflict ... do update
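For reference, a minimal sketch of such an upsert, assuming a hypothetical table `events` with primary key `id` (the table and column names are placeholders, not from the original thread):

```sql
-- Assumes a table like: CREATE TABLE events (id bigint PRIMARY KEY, payload text);
-- If a row with the same id exists, it is updated in place; otherwise it is inserted.
INSERT INTO events (id, payload)
VALUES (1, 'hello')
ON CONFLICT (id)
DO UPDATE SET payload = EXCLUDED.payload;
```

ON CONFLICT requires PostgreSQL 9.5 or later.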
On Thu, Dec 14, 2017 at 11:26 AM, salemi wrote:
> Thank you for your response.
> That approach just loads the data into the DB. I am looking for an approach
> that allows me to update
Thank you for your response.
That approach just loads the data into the DB. I am looking for an approach
that allows me to update existing entries in the DB, or insert a new entry
if it doesn't exist.
use foreachPartition(), get a connection from a jdbc connection pool,
and insert the data the same way you would in a non-spark program.
If you're only doing inserts, postgres COPY will be faster (e.g.
https://discuss.pivotal.io/hc/en-us/articles/204237003), but if you're
doing updates that's not an option.
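A minimal sketch of that foreachPartition pattern, assuming PySpark with the psycopg2 driver and a hypothetical `events(id, payload)` table — the table name, columns, and connection string are placeholders, and a production job would draw connections from a pool rather than opening one per partition:

```python
def build_upsert_sql(table, columns, key):
    """Generate an INSERT ... ON CONFLICT ... DO UPDATE statement
    (requires PostgreSQL 9.5+)."""
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(
        "{c} = EXCLUDED.{c}".format(c=c) for c in columns if c != key
    )
    return (
        "INSERT INTO {t} ({cols}) VALUES ({ph}) "
        "ON CONFLICT ({key}) DO UPDATE SET {updates}"
    ).format(t=table, cols=cols, ph=placeholders, key=key, updates=updates)

def save_partition(rows):
    # Imported inside the function so it resolves on the Spark workers.
    import psycopg2  # assumed driver; any DB-API driver works the same way
    conn = psycopg2.connect("dbname=mydb user=spark")  # placeholder DSN
    try:
        with conn, conn.cursor() as cur:
            cur.executemany(
                build_upsert_sql("events", ["id", "payload"], "id"),
                [(r.id, r.payload) for r in rows],
            )
    finally:
        conn.close()

# In the DStream handler, write each partition in bulk:
# dstream.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))
```

Because the upsert runs per row inside executemany, each batch either inserts new rows or updates existing ones, which covers both cases asked about above.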
Hi All,
we are consuming messages from Kafka using Spark dsteam. Once the processing
is done we would like to update/insert the data in bulk fashion into the
database.
I was wondering what the best solution for this might be. Our Postgres
database table is not partitioned.
Thank you,
Ali