Hi All,

df = spark.read.csv(PATH)
spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic')
df.repartition('col1', 'col2').write.mode('overwrite').partitionBy('col1').parquet(OUT_PATH)
This works fine and overwrites the partitioned directory as expected. However, it does not overwrite when the previous run was abruptly interrupted and the partition directory contains only a _started marker file, with no _SUCCESS or _committed. In that case the second run does not overwrite, and the partition ends up with duplicate files. Could someone please help?

--
Regards,
Rishi Shah