This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push: new 23db9b4 [SPARK-38191][CORE][FOLLOWUP] The staging directory of write job only needs to be initialized once in HadoopMapReduceCommitProtocol 23db9b4 is described below commit 23db9b440ba70f4edf1f4a604f4829e1831ea502 Author: weixiuli <weixi...@jd.com> AuthorDate: Wed Mar 2 18:00:56 2022 -0600 [SPARK-38191][CORE][FOLLOWUP] The staging directory of write job only needs to be initialized once in HadoopMapReduceCommitProtocol ### What changes were proposed in this pull request? This pr follows up the https://github.com/apache/spark/pull/35492, try to use a stagingDir constant instead of the stagingDir method in HadoopMapReduceCommitProtocol. ### Why are the changes needed? In the https://github.com/apache/spark/pull/35492#issuecomment-1054910730 ``` ./build/sbt -mem 4096 -Phadoop-2 "sql/testOnly org.apache.spark.sql.sources.PartitionedWriteSuite -- -z SPARK-27194" ... [info] Cause: org.apache.spark.SparkException: Task not serializable ... [info] Cause: java.io.NotSerializableException: org.apache.hadoop.fs.Path ... ``` It's because org.apache.hadoop.fs.Path is serializable in Hadoop3 but not in Hadoop2. So, we should make the stagingDir transient to avoid that. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Passed `./build/sbt -mem 4096 -Phadoop-2 "sql/testOnly org.apache.spark.sql.sources.PartitionedWriteSuite -- -z SPARK-27194"` Pass the CIs. Closes #35693 from weixiuli/staging-directory. Authored-by: weixiuli <weixi...@jd.com> Signed-off-by: Sean Owen <sro...@gmail.com> --- .../org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala index a39e9ab..3a24da9 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala @@ -104,7 +104,7 @@ class HadoopMapReduceCommitProtocol( * The staging directory of this write job. Spark uses it to deal with files with absolute output * path, or writing data into partitioned directory with dynamicPartitionOverwrite=true. */ - protected def stagingDir = getStagingDir(path, jobId) + @transient protected lazy val stagingDir = getStagingDir(path, jobId) protected def setupCommitter(context: TaskAttemptContext): OutputCommitter = { val format = context.getOutputFormatClass.getConstructor().newInstance() --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org