[spark] branch branch-3.0 updated: [SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default

2020-09-29 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new f3b80f8  [SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default
f3b80f8 is described below

commit f3b80f88324e8a1a76d01d13cfc1fc7082238214
Author: Dongjoon Hyun 
AuthorDate: Tue Sep 29 12:02:45 2020 -0700

[SPARK-33019][CORE] Use spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 by default

### What changes were proposed in this pull request?

Apache Spark 3.1's default Hadoop profile is `hadoop-3.2`. Instead of only
documenting a warning, this PR makes Spark default to `v1`, the consistent and
safer version of the Apache Hadoop file output committer algorithm. This
prevents a silent correctness regression during migration from Apache Spark
2.4/3.0 to Apache Spark 3.1.0. A user-provided configuration such as
`spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2` still takes
precedence.
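
The precedence described above can be sketched with plain Scala maps. This is an illustrative model only, not Spark's code: `CommitterDefaultSketch`, `effectiveHadoopConf`, and the use of `Map[String, String]` in place of `SparkConf` and Hadoop's `Configuration` are assumptions made for the example.

```scala
object CommitterDefaultSketch {
  val SparkHadoopPrefix = "spark.hadoop."
  val CommitterKey = "mapreduce.fileoutputcommitter.algorithm.version"

  // Hypothetical helper: models SparkConf and the Hadoop Configuration as immutable maps.
  def effectiveHadoopConf(
      sparkConf: Map[String, String],
      hadoopConf: Map[String, String]): Map[String, String] = {
    // Copy every spark.hadoop.* entry into the Hadoop settings, stripping the prefix.
    val copied = sparkConf.collect {
      case (key, value) if key.startsWith(SparkHadoopPrefix) =>
        key.substring(SparkHadoopPrefix.length) -> value
    }
    val merged = hadoopConf ++ copied
    // Fall back to "1" only when the user did not set the Spark-side key.
    if (sparkConf.contains(SparkHadoopPrefix + CommitterKey)) merged
    else merged + (CommitterKey -> "1")
  }
}
```

In this model, an empty Spark configuration yields `"1"` for the committer key even when the Hadoop-side map already held `"2"`, because only the Spark-side key is checked. Setting `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2` keeps `"2"`.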

### Why are the changes needed?

Apache Spark provides multiple distributions, built against Hadoop 2.7 and
Hadoop 3.2. The default value of
`spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` depends on the
Hadoop version: Apache Hadoop 3.0 switched the default algorithm from `v1` to
`v2`, and there is now a discussion about removing `v2`. Spark should provide
a consistent default of `v1` across its distributions.

- [MAPREDUCE-7282](https://issues.apache.org/jira/browse/MAPREDUCE-7282) MR v2 commit algorithm should be deprecated and not the default

### Does this PR introduce _any_ user-facing change?

Yes. This changes the default behavior. Users can override this conf.
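
As a hedged illustration of such an override, the sketch below builds a `SparkConf` that explicitly opts back in to `v2`. It assumes `spark-core` is on the classpath. The object name `OverrideCommitterVersion` and the method `confWithV2` are hypothetical names chosen for the example.

```scala
import org.apache.spark.SparkConf

object OverrideCommitterVersion {
  val Key = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"

  // Builds a SparkConf that explicitly requests the v2 algorithm.
  // SparkConf(false) skips loading spark.* system properties,
  // so only the value set here is present.
  def confWithV2(): SparkConf = new SparkConf(false).set(Key, "2")
}
```

The same key can also be passed on the command line, for example `spark-submit --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`.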

### How was this patch tested?

Manual.

**BEFORE (spark-3.0.1-bin-hadoop3.2)**
```scala
scala> sc.version
res0: String = 3.0.1

scala> sc.hadoopConfiguration.get("mapreduce.fileoutputcommitter.algorithm.version")
res1: String = 2
```

**AFTER**
```scala
scala> sc.hadoopConfiguration.get("mapreduce.fileoutputcommitter.algorithm.version")
res0: String = 1
```

Closes #29895 from dongjoon-hyun/SPARK-DEFAUT-COMMITTER.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit cc06266ade5a4eb35089501a3b32736624208d4c)
Signed-off-by: Dongjoon Hyun 
---
 .../main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala   |  3 +++
 docs/configuration.md                                          | 10 ++--------
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
index 1180501..6f799a5 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -462,6 +462,9 @@ private[spark] object SparkHadoopUtil {
 for ((key, value) <- conf.getAll if key.startsWith("spark.hadoop.")) {
   hadoopConf.set(key.substring("spark.hadoop.".length), value)
 }
+if (conf.getOption("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version").isEmpty) {
+  hadoopConf.set("mapreduce.fileoutputcommitter.algorithm.version", "1")
+}
   }
 
   private def appendSparkHiveConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
diff --git a/docs/configuration.md b/docs/configuration.md
index 95ff282..36e4f45 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1761,16 +1761,10 @@ Apart from these, the following properties are also available, and may be useful
 
 
   
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version
-  Dependent on environment
+  1
   
 The file output committer algorithm version, valid algorithm version number: 1 or 2.
-Version 2 may have better performance, but version 1 may handle failures better in certain situations,
-as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>.
-The default value depends on the Hadoop version used in an environment:
-1 for Hadoop versions lower than 3.0
-2 for Hadoop versions 3.0 and higher
-It's important to note that this can change back to 1 again in the future once <a href="https://issues.apache.org/jira/browse/MAPREDUCE-7282">MAPREDUCE-7282</a>
-is fixed and merged.
+Note that 2 may cause a correctness issue like MAPREDUCE-7282.
   
   2.2.0
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
