[jira] [Commented] (SPARK-37948) Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default

2022-01-26 Thread hujiahua (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482813#comment-17482813
 ] 

hujiahua commented on SPARK-37948:
--

Well, your explanation does make sense.

> Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default
> 
>
> Key: SPARK-37948
> URL: https://issues.apache.org/jira/browse/SPARK-37948
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: hujiahua
> Priority: Major
>
> The Hadoop MR v2 commit algorithm has a correctness issue, described in 
> SPARK-33019, which changed the default to 
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1. 
> But some Spark users like me were unaware of this correctness issue 
> and had been using the v2 commit algorithm in Spark 2.x for performance 
> reasons. After upgrading to Spark 3.x, we hit this correctness issue in our 
> production environment, which caused a very serious failure. The issue 
> seemed to trigger more often in Spark 3.x, though I didn't delve into 
> the specific reasons. So I propose that we disable 
> spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 by default: if 
> a user selects the v2 commit algorithm, fail the job and warn the user about 
> this correctness issue. Users could still choose to force v2 through a new 
> configuration.
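
To make the proposal above concrete, here is a minimal sketch of the check it describes, written as plain Python over a dictionary of settings rather than against Spark itself. It is not Spark's actual behavior. The opt-in key `example.committer.allowV2`, the exception class, and the function name are all hypothetical, since the issue does not name the new configuration. The fallback to v1 when the key is unset follows the issue description.

```python
# Hypothetical sketch of the proposed check; this is NOT how Spark behaves today.

# Real configuration key, as quoted in the issue description.
COMMITTER_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"

# Assumed name for the proposed opt-in flag; the issue does not name it.
FORCE_V2_KEY = "example.committer.allowV2"


class UnsafeCommitterError(ValueError):
    """Raised when v2 is selected without the explicit opt-in (hypothetical)."""


def validate_committer_conf(conf):
    """Return the committer algorithm version, or fail if v2 is not opted into.

    `conf` is a plain dict of string keys to string values, standing in
    for the job configuration.
    """
    # Assumption from the issue description: v1 is the default when unset.
    version = conf.get(COMMITTER_KEY, "1")
    forced = conf.get(FORCE_V2_KEY, "false").strip().lower() == "true"

    if version == "2" and not forced:
        raise UnsafeCommitterError(
            f"{COMMITTER_KEY}=2 has a known correctness issue (see SPARK-33019); "
            f"set {FORCE_V2_KEY}=true to use it anyway."
        )
    return version
```

Under these assumptions, a job that sets only the v2 key would fail fast with a message pointing at SPARK-33019, while a job that also sets the opt-in flag would proceed with v2 unchanged.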



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37948) Disable mapreduce.fileoutputcommitter.algorithm.version=2 by default

2022-01-19 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479096#comment-17479096
 ] 

Hyukjin Kwon commented on SPARK-37948:
--

The problem is that users might intentionally enable the v2 protocol, so it makes 
less sense to warn and disable it. They might already know the risk and enable 
v2 anyway. I personally think we should avoid assuming that the user's input is wrong.



