[ https://issues.apache.org/jira/browse/SPARK-40034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678814#comment-17678814 ]

Steve Loughran commented on SPARK-40034:
----------------------------------------

Note that these changes aren't sufficient. The Hadoop 3.3.5 manifest committer 
can tell Spark that it has the rename semantics needed, but the protocol is 
still broken, and the downstream tests I wrote aren't sufficient.

SPARK-41551 will have a fix.

> PathOutputCommitters to work with dynamic partition overwrite
> -------------------------------------------------------------
>
>                 Key: SPARK-40034
>                 URL: https://issues.apache.org/jira/browse/SPARK-40034
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>             Fix For: 3.4.0
>
>
> sibling of MAPREDUCE-7403: allow PathOutputCommitter implementations to 
> declare that they support the semantics required by Spark dynamic 
> partitioning:
> * rename works as expected
> * the working dir is on the same fs as the final dir
> They will do this by implementing StreamCapabilities and adding a new 
> probe, "mapreduce.job.committer.dynamic.partitioning"; the Spark-side 
> changes are to
> * postpone rejection of dynamic partition overwrite until the output 
> committer is created
> * allow it if the committer implements StreamCapabilities and returns true 
> for {{hasCapability("mapreduce.job.committer.dynamic.partitioning")}}
> This isn't going to be supported by the s3a committers; they don't meet the 
> requirements. The manifest committer of MAPREDUCE-7341 running against abfs 
> and gcs does work.
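
The capability probe described in the quoted issue can be sketched roughly as follows. This is a Scala sketch, not the actual Spark patch: StreamCapabilities is stubbed here as a minimal local trait (the real interface is org.apache.hadoop.fs.StreamCapabilities in hadoop-common), and ManifestLikeCommitter is a hypothetical stand-in for a committer that opts in.

```scala
// Minimal stand-in for org.apache.hadoop.fs.StreamCapabilities; assumes the
// real Hadoop interface exposes the same single-method probe.
trait StreamCapabilities {
  def hasCapability(capability: String): Boolean
}

// Hypothetical committer declaring that it meets the rename and same-fs
// semantics Spark's dynamic partition overwrite requires.
class ManifestLikeCommitter extends StreamCapabilities {
  override def hasCapability(capability: String): Boolean =
    capability == "mapreduce.job.committer.dynamic.partitioning"
}

// Sketch of the Spark-side check: dynamic partition overwrite is only
// allowed once the committer is created and it opts in via the probe.
def supportsDynamicPartitioning(committer: AnyRef): Boolean = committer match {
  case c: StreamCapabilities =>
    c.hasCapability("mapreduce.job.committer.dynamic.partitioning")
  case _ => false
}
```

A committer that does not implement the interface at all (the s3a committers, for instance) simply falls through to `false`, so rejection happens at committer-creation time rather than up front.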



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
