I am using Spark to run a merge query against PostgreSQL. The current flow: Spark saves the data to be merged into Postgres as temp tables, and the merge queries are then executed through a plain Java JDBC Connection and Statement, so the merge itself runs entirely inside Postgres. The queries insert a row into the target table if it exists in the temp table but not in the target, and update it otherwise. The problem is that both tables have about 400K records, and the whole job takes around 20 hours to run. Is there any way to do the merge in Spark itself rather than in PG, so it can complete in a reasonable time?
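For reference, this is roughly the update-then-insert pattern I am running (a minimal sketch; the table names `target` and `staging` and the columns `id`/`value` are placeholders, the real schema is wider):

```java
// Sketch of the per-batch merge SQL currently executed over JDBC.
// Table and column names are placeholders, not the real schema.
public class MergeSql {

    // Update rows that already exist in the target from the staging copy.
    static String updateSql() {
        return "UPDATE target t SET value = s.value "
             + "FROM staging s WHERE t.id = s.id";
    }

    // Insert rows present in staging but missing from the target.
    static String insertSql() {
        return "INSERT INTO target (id, value) "
             + "SELECT s.id, s.value FROM staging s "
             + "WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id)";
    }

    public static void main(String[] args) {
        // In the real job these strings are run via Statement.executeUpdate().
        System.out.println(updateSql());
        System.out.println(insertSql());
    }
}
```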
-- Thanks Deepak