[ 
https://issues.apache.org/jira/browse/SPARK-28945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973715#comment-16973715
 ] 

koert kuipers commented on SPARK-28945:
---------------------------------------

I understand there is a great deal of complexity in the committer, and this 
might require more work to get right.

But it's still unclear to me whether the committer is doing anything at all in 
the case of dynamic partition overwrite.
What do I lose by disabling all committer activity (committer.setupJob, 
committer.commitJob, etc.) when dynamicPartitionOverwrite is true? And if I 
lose nothing, is that a good thing, or does it mean I should be worried about 
the current state?
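
For context, a minimal sketch of the concurrent-write scenario this issue is 
about. The path "/data/baseDir" and the partition column "date" are 
hypothetical names, not from the issue; the config key and writer API are the 
standard Spark ones:

```scala
// Hypothetical sketch of the scenario in SPARK-28945: two independent Spark
// jobs each overwrite a *different* partition under the same baseDir.
// Path and column names below are illustrative only.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Dynamic partition overwrite (Spark 2.3+): only the partitions present in
// the written data are replaced, not the entire baseDir.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// Job A writes rows for date=2019-01-01; a separate job B concurrently
// writes rows for date=2019-01-02 to the same baseDir. Making this safe
// is what the issue asks for.
val dfA = spark.read.parquet("/data/staging/a") // assumed: only date=2019-01-01 rows
dfA.write
  .mode("overwrite")
  .partitionBy("date")
  .parquet("/data/baseDir")
```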

> Allow concurrent writes to different partitions with dynamic partition 
> overwrite
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-28945
>                 URL: https://issues.apache.org/jira/browse/SPARK-28945
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: koert kuipers
>            Priority: Minor
>
> It is desirable to run concurrent jobs that write to different partitions 
> within the same baseDir using partitionBy and dynamic partitionOverwriteMode.
> See for example here:
> https://stackoverflow.com/questions/38964736/multiple-spark-jobs-appending-parquet-data-to-same-base-path-with-partitioning
> Or the discussion here:
> https://github.com/delta-io/delta/issues/9
> This doesn't seem that difficult. I suspect the only changes needed are in 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol, which already has 
> a flag for dynamicPartitionOverwrite. I got a quick test to work by disabling 
> all committer activity (committer.setupJob, committer.commitJob, etc.) when 
> dynamicPartitionOverwrite is true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
