I am using Spark to run a merge query against PostgreSQL.
The way it's done now: the data to be merged is saved into Postgres as temp
tables, and the merge queries are then run in Postgres through a Java SQL
Connection and Statement. So basically the whole merge runs inside Postgres.
The logic is: insert a row into the source table if it exists in the temp
table but not in the source table, otherwise update the existing row.
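For context, the current flow is roughly this (a simplified sketch; the table
and column names source_table, temp_table, id, and val are just placeholders,
the real job has more columns):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MergeJob {
    public static void main(String[] args) throws Exception {
        // The merge runs entirely inside Postgres over plain JDBC.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://host:5432/db", "user", "pass");
             Statement stmt = conn.createStatement()) {

            // Update rows that already exist in the source table.
            stmt.executeUpdate(
                "UPDATE source_table s SET val = t.val " +
                "FROM temp_table t WHERE s.id = t.id");

            // Insert rows that exist only in the temp table.
            stmt.executeUpdate(
                "INSERT INTO source_table (id, val) " +
                "SELECT t.id, t.val FROM temp_table t " +
                "WHERE NOT EXISTS (SELECT 1 FROM source_table s WHERE s.id = t.id)");
        }
    }
}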
The problem is that both tables have 400K records, and this whole merge takes
20 hours to run.
Is there any way to do this in Spark itself rather than running the query in
PG, so that it can complete in a reasonable time?
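What I had in mind is something along these lines, i.e. pull both tables in
through the Spark JDBC source and compute the merge in Spark, though I'm not
sure this is the right approach or how best to write the result back
(connection details, table names, and the id/val columns below are
placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SparkMergeIdea {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("merge-idea").getOrCreate();

        String url = "jdbc:postgresql://host:5432/db";
        java.util.Properties props = new java.util.Properties();
        props.setProperty("user", "user");
        props.setProperty("password", "pass");

        Dataset<Row> source = spark.read().jdbc(url, "source_table", props);
        Dataset<Row> temp   = spark.read().jdbc(url, "temp_table", props);

        // Keep source rows that have no match in the temp table, then add all
        // temp rows; temp rows win on matching ids, which mimics the
        // update-else-insert behaviour of the merge.
        Dataset<Row> unchanged = source.join(
                temp, source.col("id").equalTo(temp.col("id")), "left_anti");
        Dataset<Row> merged = unchanged.unionByName(temp);

        // Write-back: overwrite a staging table, since the JDBC sink has no
        // upsert mode -- this is the part I'm unsure about.
        merged.write().mode(SaveMode.Overwrite)
                .jdbc(url, "source_table_staging", props);

        spark.stop();
    }
}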

-- 
Thanks
Deepak
