[jira] [Commented] (SPARK-37948) Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default
[ https://issues.apache.org/jira/browse/SPARK-37948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482813#comment-17482813 ]

hujiahua commented on SPARK-37948:
----------------------------------

Well, your explanation does make sense.

> Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default
> --------------------------------------------------------------------
>
>                 Key: SPARK-37948
>                 URL: https://issues.apache.org/jira/browse/SPARK-37948
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: hujiahua
>            Priority: Major
>
> The Hadoop MR v2 commit algorithm has a correctness issue described in
> SPARK-33019, which changed the default to
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1.
> But some Spark users, like me, were unaware of this correctness issue and
> had been using the v2 commit algorithm in Spark 2.x for performance reasons.
> After upgrading to Spark 3.x, we hit this correctness issue in our
> production environment, and it caused a very serious failure. The issue
> seemed to trigger more often in Spark 3.x than before; I didn't delve into
> the specific reasons. So I propose that we disable
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 by default:
> if a user is using the v2 commit algorithm, fail the job and warn the user
> about this correctness issue. Alternatively, users can choose to force v2
> through a new configuration.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37948) Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default
[ https://issues.apache.org/jira/browse/SPARK-37948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479096#comment-17479096 ]

Hyukjin Kwon commented on SPARK-37948:
--------------------------------------

The problem is that users might intentionally enable the v2 protocol, in which case it makes less sense to warn and disable it. They might already know the risk and enable v2 anyway. I personally think we should avoid assuming that a user's input is wrong.
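For readers following the thread, the behavior proposed in the issue description (fail the job when v2 is configured, unless the user explicitly forces it) might look roughly like the sketch below. This is only an illustration in plain Python, not Spark code and not anything Spark implements. The override key `FORCE_V2_KEY`, the exception class, and the function name are all hypothetical; the issue only says "a new configuration" without naming one. The fallback of "1" mirrors the v1 default mentioned in the issue.

```python
from typing import Mapping

# Real configuration key discussed in the issue.
COMMITTER_VERSION_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"
# Hypothetical override key, invented for this sketch only.
FORCE_V2_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.v2.force"


class UnsafeCommitterError(ValueError):
    """Raised when v2 is requested without the explicit (hypothetical) override."""


def check_committer_version(conf: Mapping[str, str]) -> int:
    """Return the commit algorithm version to use, or raise if v2 is not forced.

    `conf` is a plain mapping of configuration keys to string values.
    """
    # Assumption: an unset key falls back to v1, the default named in the issue.
    version = int(conf.get(COMMITTER_VERSION_KEY, "1"))
    if version not in (1, 2):
        raise ValueError(f"unsupported commit algorithm version: {version}")

    forced = conf.get(FORCE_V2_KEY, "false").strip().lower() == "true"
    if version == 2 and not forced:
        # Fail fast and point the user at the known correctness issue.
        raise UnsafeCommitterError(
            "The v2 commit algorithm has a known correctness issue (see SPARK-33019). "
            f"Use version 1, or set {FORCE_V2_KEY}=true to accept the risk."
        )
    return version
```

Under this sketch, a user who sets v2 on purpose would also have to set the second key, which is the extra friction the comment above objects to.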