Re: bulk upsert data batch from Kafka dstream into Postgres db

2017-12-14 Thread salemi
Thank you for your response. In some cases we need to just update a record, and in other cases we need to update the existing record and also insert a new record. The statement you proposed doesn't handle that.
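A minimal sketch (Scala with plain JDBC) of one way to cover that combined case, assuming a hypothetical versioned table events(id, payload, valid_to) where "updating" an entry means closing out the current row and inserting a new version, atomically in one transaction:

    import java.sql.DriverManager

    // Sketch only: schema and connection details are placeholders. The current
    // version of a record is the row with valid_to = NULL.
    val conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/mydb", "user", "pass")
    try {
      conn.setAutoCommit(false)

      // Step 1: update the existing record (close out the current version).
      val update = conn.prepareStatement(
        "UPDATE events SET valid_to = now() WHERE id = ? AND valid_to IS NULL")
      update.setLong(1, 42L)
      update.executeUpdate()
      update.close()

      // Step 2: insert the new record.
      val insert = conn.prepareStatement(
        "INSERT INTO events (id, payload, valid_to) VALUES (?, ?, NULL)")
      insert.setLong(1, 42L)
      insert.setString(2, "new version")
      insert.executeUpdate()
      insert.close()

      conn.commit() // both statements succeed or neither does
    } catch {
      case e: Exception =>
        conn.rollback()
        throw e
    } finally {
      conn.close()
    }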

Re: bulk upsert data batch from Kafka dstream into Postgres db

2017-12-14 Thread Cody Koeninger
Modern versions of Postgres have upsert, i.e. INSERT INTO ... ON CONFLICT ... DO UPDATE.
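A minimal sketch of that statement via plain JDBC in Scala, assuming a hypothetical table events(id bigint primary key, payload text):

    import java.sql.DriverManager

    // Sketch only: connection details and schema are placeholders.
    val conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/mydb", "user", "pass")
    try {
      // Postgres 9.5+ upsert: insert, or on key conflict update the existing row.
      val ps = conn.prepareStatement(
        "INSERT INTO events (id, payload) VALUES (?, ?) " +
        "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload")
      ps.setLong(1, 42L)
      ps.setString(2, "hello")
      ps.executeUpdate()
      ps.close()
    } finally {
      conn.close()
    }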

Re: bulk upsert data batch from Kafka dstream into Postgres db

2017-12-14 Thread salemi
Thank you for your response. That approach only loads the data into the DB. I am looking for an approach that allows me to update existing entries in the DB, or insert a new entry if it doesn't exist.

Re: bulk upsert data batch from Kafka dstream into Postgres db

2017-12-14 Thread Cody Koeninger
Use foreachPartition(), get a connection from a JDBC connection pool, and insert the data the same way you would in a non-Spark program. If you're only doing inserts, Postgres COPY will be faster (e.g. https://discuss.pivotal.io/hc/en-us/articles/204237003), but if you're doing updates that's not an option.
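A minimal sketch of that pattern in Scala, assuming a DStream[(Long, String)] named stream, the hypothetical events table from above, and a ConnectionPool wrapper of your own (e.g. around HikariCP):

    // Runs on the executors: one pooled connection per partition, batched inserts.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val conn = ConnectionPool.getConnection() // hypothetical pool helper
        try {
          val ps = conn.prepareStatement(
            "INSERT INTO events (id, payload) VALUES (?, ?)")
          partition.foreach { case (id, payload) =>
            ps.setLong(1, id)
            ps.setString(2, payload)
            ps.addBatch() // accumulate rows into a JDBC batch
          }
          ps.executeBatch() // flush the whole partition in one round trip
          ps.close()
        } finally {
          conn.close() // return the connection to the pool
        }
      }
    }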

bulk upsert data batch from Kafka dstream into Postgres db

2017-12-13 Thread salemi
Hi All, we are consuming messages from Kafka using a Spark DStream. Once the processing is done, we would like to update/insert the data into the database in bulk. I was wondering what the best solution for this might be. Our Postgres database table is not partitioned. Thank you, Ali