[ https://issues.apache.org/jira/browse/SPARK-37948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hujiahua updated SPARK-37948: ----------------------------- Summary: Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default (was: Disable spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 by default) > Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default > -------------------------------------------------------------------- > > Key: SPARK-37948 > URL: https://issues.apache.org/jira/browse/SPARK-37948 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 3.2.0 > Reporter: hujiahua > Priority: Major > > The hadoop MR v2 commit algorithm had a correctness issue described by > SPARK-33019, and changed > spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default. > But some spark users like me ware unaware of this correctness issue before > and had used v2 commit algorithm in spark 2.x for performance purposes. And > after upgrade to spark 3.x, we encountered this correctness issue in > production environment, caused a very serious failure.The trigger probability > of this issue was higher in new version spark 3.x, and I didn't delve into > the specific reasons. So I propose we should better disable > spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 by default, if > users using v2 commit algorithm, then fail the job and warn users this > correctness issue. Or users can choose to force the v2 usage through a new > configuration. -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org