What would be the best way to accomplish the following behavior:

1. There is a table which is partitioned by date.
2. A Spark job runs for a particular date, and we would like it to wipe out
all existing data for that date before writing. This makes the job
idempotent, so we can rerun it after a failure without fear of duplicated
data.
3. Data for all other dates is preserved.
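
To make the desired semantics concrete, here is a rough sketch in plain
Python rather than Spark. It assumes each date partition is a directory named
date=<value> under the table root; the function names, the part file name,
and the dates are all made up for illustration. It only shows the behavior I
am after, not how Spark does or should implement it.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def partition_dir(table_root: Path, run_date: str) -> Path:
    # Assumed layout: one directory per date, e.g. <root>/date=2016-07-01
    return table_root / f"date={run_date}"


def overwrite_partition(table_root: Path, run_date: str, rows: List[str]) -> Path:
    """Replace all data for run_date and leave every other date untouched."""
    target = partition_dir(table_root, run_date)
    if target.exists():
        # Wipe whatever a previous (possibly failed) run left behind.
        shutil.rmtree(target)
    target.mkdir(parents=True)
    # Hypothetical single output file; a real job would write many part files.
    (target / "part-00000.txt").write_text("\n".join(rows) + "\n")
    return target


def read_partition(table_root: Path, run_date: str) -> List[str]:
    """Return every row stored for run_date, or an empty list if none exist."""
    target = partition_dir(table_root, run_date)
    rows: List[str] = []
    for part in sorted(target.glob("part-*")):
        rows.extend(part.read_text().splitlines())
    return rows


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    overwrite_partition(root, "2016-07-01", ["a", "b"])
    overwrite_partition(root, "2016-07-02", ["c"])
    # Rerunning the same date must not duplicate its rows.
    overwrite_partition(root, "2016-07-02", ["c"])
    print(read_partition(root, "2016-07-01"))
    print(read_partition(root, "2016-07-02"))
    shutil.rmtree(root)
```

Running the job twice for the same date should leave exactly one copy of that
date's rows, and the other date's directory should be unchanged.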

I am guessing that overwrite mode would not work here, or that if it does,
it's not guaranteed to stay that way, but I am not sure. If that's the case,
is there a good, robust way to get this behavior?

-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
