GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/18111
[SPARK-20886][CORE] HadoopMapReduceCommitProtocol to fail meaningfully if FileOutputCommitter.getWorkPath==null

## What changes were proposed in this pull request?

Handles the situation where `FileOutputCommitter.getWorkPath()` returns `null` with a `require()` call whose message explains the problem and includes the `toString` value of the committer for better diagnostics.

The situation occurs if the committer being passed in is a job committer, not a task committer; that is, it was initialised with a `JobAttemptContext` not a `TaskAttemptContext`.

The existing code does an `Option(workPath.toString).getOrElse(path)`, which *may* be an attempt to handle the null path case. If so, it does not work, because it's the `.toString()` call which is failing. If people do think that code should be resilient to null work paths, that line could be changed. However, doing so may hide the underlying problem: the committer is misconfigured.

It may be a rare occurrence today, but it is more likely with modified subclasses of `FileOutputCommitter`, and is also possible with some ongoing work of mine in Hadoop to better support committing work to cloud storage infrastructures.

## How was this patch tested?

Manually. The before & after stack traces are on the JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/spark cloud/SPARK-20886-committer-NPE

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18111.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18111

----
commit 02eb7bf0ee6b81841f22e3c46d822eaebb28e85c
Author: Steve Loughran <ste...@hortonworks.com>
Date: 2017-05-25T15:46:50Z

SPARK-20886 HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null

Add a requirement.
The existing code does an Option.getWorkpath.toString() which *may* be an attempt to handle the null path case. If so, it does not work, because it's the .toString() call which is failing.

Change-Id: Idddf9813761e7008425542f96903bce12bedd978
----
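
For readers following the null-handling argument above, here is a minimal, self-contained sketch of why wrapping `workPath.toString` in `Option(...)` cannot catch a null work path, next to a `require()`-based check of the kind the pull request describes. It is an illustration under stated assumptions, not the code in the patch: `FakePath`, `FakeCommitter`, `WorkPathCheck`, its three method names, and the error message text are all hypothetical stand-ins, and no Hadoop or Spark classes are used.

```scala
// Hypothetical stand-in for a Hadoop Path; only toString matters here.
final case class FakePath(value: String) {
  override def toString: String = value
}

// Hypothetical stand-in for a committer. A null workPath models a committer
// that was created from a job context rather than a task context.
final class FakeCommitter(workPath: FakePath) {
  def getWorkPath: FakePath = workPath
  override def toString: String = s"FakeCommitter(workPath=$workPath)"
}

object WorkPathCheck {
  // Pattern the description calls ineffective: toString is invoked on the
  // null reference before Option(...) sees anything, so this throws a
  // NullPointerException instead of falling back.
  def oldStagingDir(committer: FakeCommitter, fallback: String): String =
    Option(committer.getWorkPath.toString).getOrElse(fallback)

  // Fail-fast pattern: require() raises IllegalArgumentException with a
  // message that names the committer (illustrative wording only).
  def checkedStagingDir(committer: FakeCommitter): String = {
    val workPath = committer.getWorkPath
    require(workPath != null, s"Committer has no work path: $committer")
    workPath.toString
  }

  // Null-tolerant alternative: wrap the reference, not its toString.
  // As the description notes, this would hide a misconfigured committer.
  def lenientStagingDir(committer: FakeCommitter, fallback: String): String =
    Option(committer.getWorkPath).map(_.toString).getOrElse(fallback)
}
```

With a committer built as `new FakeCommitter(null)`, `oldStagingDir` throws a bare `NullPointerException`, `checkedStagingDir` throws an `IllegalArgumentException` whose message includes the committer's `toString`, and `lenientStagingDir` silently returns the fallback. The second behaviour is the trade-off argued for above: a louder failure in exchange for a diagnosable one.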