I am looking into writing a DataFrame to Parquet using partitioning, so something like:

    df.write
      .mode(saveMode)
      .partitionBy(partitionColumn)
      .format("parquet")
      .save(path)

I imagine I will have thousands of partitions. Generally my goal is not to recreate all partitions every time, but just a few of them. For the partitions I do write to, I want to replace all of their data. I would expect this to be a general and typical use case, since a true append (adding data to existing partitions) is messy, not idempotent, and to be avoided by design (in fact I am not sure why it exists at all, unless transactions are supported). Redoing all partitions is very inefficient.

What saveMode do I use? In my tests, if I use saveMode=Overwrite then I lose all the existing partitions. If I use saveMode=Append I get the dangerous non-idempotent behavior that adds to partitions. I don't think saveMode=Ignore or saveMode=ErrorIfExists will help me either.
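To make the three behaviors concrete, here is a toy in-memory model in plain Scala. It is only a sketch of the semantics described above, not Spark code: the object `PartitionWriteModel`, its functions, and the sample partition names `p1` and `p2` are all hypothetical and invented for illustration.

```scala
// Toy model: a partitioned dataset is a map from partition value to its rows.
// Nothing here calls Spark; every name is a hypothetical stand-in.
object PartitionWriteModel {
  type Row = String
  type Dataset = Map[String, Vector[Row]]

  // What I see with saveMode=Overwrite: the whole target is replaced,
  // so partitions missing from the incoming data are lost.
  def overwriteAll(existing: Dataset, incoming: Dataset): Dataset = incoming

  // saveMode=Append: incoming rows are added to whatever is already there,
  // so running the same write twice duplicates rows.
  def append(existing: Dataset, incoming: Dataset): Dataset =
    incoming.foldLeft(existing) { case (acc, (part, rows)) =>
      acc.updated(part, acc.getOrElse(part, Vector.empty) ++ rows)
    }

  // What I want: partitions present in the incoming data are fully replaced,
  // all other partitions are left untouched. Map ++ keeps the right-hand
  // value for duplicate keys, which gives exactly that.
  def replaceTouched(existing: Dataset, incoming: Dataset): Dataset =
    existing ++ incoming

  // Made-up sample data: two existing partitions, one of which is rewritten.
  val existing: Dataset = Map("p1" -> Vector("a"), "p2" -> Vector("b"))
  val incoming: Dataset = Map("p2" -> Vector("c"))
}
```

In this model `replaceTouched` is idempotent (applying the same incoming data twice gives the same result as applying it once) and leaves `p1` alone, while `overwriteAll` drops `p1` and `append` grows `p2` on every run. That is the behavior I am trying to get from the DataFrame writer.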