[ https://issues.apache.org/jira/browse/SPARK-37948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hujiahua updated SPARK-37948:
-----------------------------
    Summary: Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default (was: Disable spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 by default)

> Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default
> --------------------------------------------------------------------
>
>                 Key: SPARK-37948
>                 URL: https://issues.apache.org/jira/browse/SPARK-37948
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: hujiahua
>            Priority: Major
>
> The Hadoop MR v2 commit algorithm has a correctness issue, described in
> SPARK-33019, and as a result Spark changed the default to
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1.
> However, some Spark users like me were unaware of this correctness issue
> and had used the v2 commit algorithm in Spark 2.x for performance reasons.
> After upgrading to Spark 3.x, we hit this correctness issue in a
> production environment, and it caused a very serious failure. The issue
> was more likely to trigger in Spark 3.x, although I did not dig into the
> specific reasons. So I propose that we disable
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 by default:
> if a user configures the v2 commit algorithm, fail the job and warn the
> user about this correctness issue. Users who still want v2 could force it
> through a new configuration.
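A minimal Python sketch of the proposed guard, assuming the effective Hadoop configuration is available as a plain string map. This is not Spark's code. The opt-in key `spark.hypothetical.fileoutputcommitter.forceV2` and the names `check_commit_algorithm` and `UnsafeCommitterError` are hypothetical, since the issue only says "a new configuration". Treating an unset key as v1 is also an assumption, chosen to match the Spark default described above.

```python
import logging
from typing import Mapping

# Real Hadoop key, as named in the issue summary.
ALGO_KEY = "mapreduce.fileoutputcommitter.algorithm.version"
# HYPOTHETICAL opt-in flag; the issue only proposes "a new configuration".
FORCE_V2_KEY = "spark.hypothetical.fileoutputcommitter.forceV2"


class UnsafeCommitterError(RuntimeError):
    """Hypothetical error raised when v2 is configured without an explicit opt-in."""


def check_commit_algorithm(hadoop_conf: Mapping[str, str]) -> int:
    """Return the configured algorithm version, failing fast on an un-forced v2."""
    # Assumption: an unset key means v1, matching Spark's default after SPARK-33019.
    version = int(hadoop_conf.get(ALGO_KEY, "1").strip())
    if version != 2:
        return version

    forced = hadoop_conf.get(FORCE_V2_KEY, "false").strip().lower() == "true"
    if not forced:
        # Proposed behaviour: fail the job and explain the correctness issue.
        raise UnsafeCommitterError(
            f"{ALGO_KEY}=2 has a known correctness issue (SPARK-33019); "
            f"use version 1, or set {FORCE_V2_KEY}=true to force v2."
        )

    # User explicitly forced v2: allow it, but still warn.
    logging.warning(
        "%s=2 forced via %s; see SPARK-33019 for the correctness issue.",
        ALGO_KEY,
        FORCE_V2_KEY,
    )
    return version
```

For example, `check_commit_algorithm({"mapreduce.fileoutputcommitter.algorithm.version": "2"})` raises, while adding the hypothetical force key set to `"true"` returns `2` and logs a warning. Checking the plain Hadoop key rather than the `spark.hadoop.`-prefixed one follows the updated summary, so the guard would apply however the value reaches the Hadoop configuration.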



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
