koert kuipers created SPARK-28945:
-------------------------------------

Summary: Allow concurrent writes to unrelated partitions with dynamic partition overwrite
Key: SPARK-28945
URL: https://issues.apache.org/jira/browse/SPARK-28945
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.3
Reporter: koert kuipers
It is desirable to run concurrent jobs that write to different partitions within the same baseDir using partitionBy and the dynamic partitionOverwriteMode. See for example:
https://stackoverflow.com/questions/38964736/multiple-spark-jobs-appending-parquet-data-to-same-base-path-with-partitioning

or the discussion here:
https://github.com/delta-io/delta/issues/9

This doesn't seem that difficult. I suspect the only changes needed are in org.apache.spark.internal.io.HadoopMapReduceCommitProtocol, which already has a flag for dynamicPartitionOverwrite. I got a quick test to work by disabling all committer activity (committer.setupJob, committer.commitJob, etc.) when dynamicPartitionOverwrite is true.
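Why disabling the committer calls helps can be illustrated with a toy model. It assumes (a reading of the 2.4 code, not verified here) that with dynamicPartitionOverwrite the commit protocol stages task output under a per-job `.spark-staging-<jobId>` directory, while the wrapped Hadoop FileOutputCommitter still works in a `_temporary` directory directly under the output path and deletes that directory on job cleanup. `StagingModel`, `stage`, `commit` and `run` are hypothetical names, not Spark or Hadoop APIs; local files stand in for HDFS, and the two "jobs" run sequentially so the interleaving is deterministic.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}

// Toy model only: none of these names are Spark or Hadoop APIs.
object StagingModel {

  // Recursively delete a directory tree (children first), if it exists.
  def deleteTree(p: Path): Unit = {
    val f = p.toFile
    if (f.isDirectory) f.listFiles().foreach(c => deleteTree(c.toPath))
    Files.deleteIfExists(p)
  }

  // A "task" writes one file for `partition` under the job's staging root.
  def stage(stagingRoot: Path, partition: String, jobId: String): Unit = {
    val dir = Files.createDirectories(stagingRoot.resolve(partition))
    Files.write(dir.resolve(s"part-$jobId.txt"), jobId.getBytes("UTF-8"))
  }

  // Dynamic-overwrite style commit: replace only the partitions this job
  // staged, then clean up the job's staging root.
  def commit(base: Path, stagingRoot: Path, partitions: Seq[String]): Unit = {
    partitions.foreach { p =>
      val target = base.resolve(p)
      deleteTree(target)                          // overwrite just this partition
      Files.move(stagingRoot.resolve(p), target)  // throws if the staged dir is gone
    }
    deleteTree(stagingRoot)                       // job cleanup
  }

  // Jobs A and B write different partitions of the same base dir; A commits
  // while B is still in flight. Returns the entries left under the base dir.
  def run(sharedStaging: Boolean): Set[String] = {
    val base = Files.createTempDirectory("basedir")
    def stagingFor(jobId: String): Path =
      if (sharedStaging) base.resolve("_temporary")  // one directory shared by all jobs
      else base.resolve(s".spark-staging-$jobId")    // one directory per job
    stage(stagingFor("A"), "date=2019-09-01", "A")
    stage(stagingFor("B"), "date=2019-09-02", "B")
    commit(base, stagingFor("A"), Seq("date=2019-09-01"))  // A finishes first
    try commit(base, stagingFor("B"), Seq("date=2019-09-02"))
    catch { case _: IOException => () }  // B's staged output was already deleted by A
    val result = base.toFile.list().toSet
    deleteTree(base)
    result
  }
}
```

With a shared staging directory, `StagingModel.run(sharedStaging = true)` leaves only `date=2019-09-01`, because job A's cleanup removes job B's in-progress output. With per-job staging, `StagingModel.run(sharedStaging = false)` leaves both `date=2019-09-01` and `date=2019-09-02`, since each job only touches its own staging directory and the partitions it wrote.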