What would be the best way to accomplish the following behavior?

1. There is a table which is partitioned by date.
2. When a Spark job runs for a particular date, it should wipe out all existing data for that date. This makes the job idempotent, letting us rerun a failed job without fear of duplicated data.
3. Data for all other dates is preserved.
I am guessing that overwrite would not work here, or if it does, it's not guaranteed to stay that way, but I am not sure. If that's the case, is there a good/robust way to get this behavior?

--
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni
ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience