[GitHub] spark pull request #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol...

steveloughran Thu, 25 May 2017 08:58:15 -0700

GitHub user steveloughran opened a pull request:

    https://github.com/apache/spark/pull/18111


    [SPARK-20886][CORE] HadoopMapReduceCommitProtocol to fail meaningfully if FileOutputCommitter.getWorkPath==null

    ## What changes were proposed in this pull request?
    
    Handles the situation where `FileOutputCommitter.getWorkPath()` returns `null` by adding a `require()` call whose message explains the problem and includes the committer's `toString()` value, for better diagnostics.
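    A minimal sketch of the kind of check described above, not the patch itself. It assumes a simplified stand-in for the committer: `WorkPathCommitter` and `stagingDirFor` are hypothetical names used for illustration only, and the work path is modelled as a nullable `String` rather than a Hadoop `Path`.

```scala
// Hypothetical stand-in for a FileOutputCommitter (not Hadoop's class):
// workPath may be null, e.g. when the committer was built for a job
// rather than for a task attempt.
class WorkPathCommitter(val workPath: String) {
  override def toString: String = s"WorkPathCommitter(workPath=$workPath)"
}

// Fail fast with a message naming the likely cause and including the
// committer's toString(), rather than letting a NullPointerException
// surface later with no context. require() throws IllegalArgumentException.
def stagingDirFor(committer: WorkPathCommitter): String = {
  val workPath = committer.workPath
  require(workPath != null,
    "Committer has no work path; was it created with a job context " +
      s"instead of a task attempt context? $committer")
  workPath
}
```

    With a null work path, the resulting `IllegalArgumentException` message contains `WorkPathCommitter(workPath=null)`, so the misconfigured committer is visible in the stack trace.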
    
    The situation occurs if the committer being passed in is a job committer, not a task committer; that is, it was initialised with a `JobContext` rather than a `TaskAttemptContext`.
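    To illustrate why a job committer has no work path, here is a deliberately simplified, hypothetical model: `Ctx`, `JobCtx`, `TaskAttemptCtx` and `ModelCommitter` are illustrative names rather than Hadoop types, and the `_temporary/<attempt>` layout is only indicative of a per-task directory, not Hadoop's exact layout. The point it shows is that only a context which identifies a task attempt can yield a per-task work path.

```scala
// Hypothetical model of the two kinds of context a committer can be built with.
sealed trait Ctx { def outputPath: String }
final case class JobCtx(outputPath: String) extends Ctx
final case class TaskAttemptCtx(outputPath: String, attemptId: String) extends Ctx

// Only a task-attempt context identifies an attempt, so only it can
// derive a per-task work path; a job-level committer is left with null.
final class ModelCommitter(ctx: Ctx) {
  val workPath: String = ctx match {
    case TaskAttemptCtx(out, attempt) => s"$out/_temporary/$attempt"
    case JobCtx(_)                    => null
  }
}
```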
    
    The existing code does an `Option(workPath.toString).getOrElse(path)`, which *may* be an attempt to handle the null path case. If so, it doesn't work, because it's the `.toString()` call which fails, before `Option()` ever sees the value. If people do think that code should be resilient to null work paths, that line could be changed. However, that may hide the underlying problem: the committer is misconfigured.
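    A small sketch of the difference between the two expression shapes, with the work path passed in as a possibly-null `AnyRef`. The function names are hypothetical; the first mirrors the existing expression, the second is one way the line could be changed if null tolerance were wanted.

```scala
// Mirrors the existing expression: .toString runs before Option(...)
// sees the value, so a null work path throws NullPointerException.
def stagingDirEager(workPath: AnyRef, path: String): String =
  Option(workPath.toString).getOrElse(path)

// Null-tolerant alternative: wrap the possibly-null value first, then map.
// It silently falls back to `path`, which can mask a misconfigured committer.
def stagingDirTolerant(workPath: AnyRef, path: String): String =
  Option(workPath).map(_.toString).getOrElse(path)
```

    The tolerant form never throws, which is exactly the trade-off noted above: it avoids the NPE but quietly writes to `path` instead of surfacing the misconfiguration.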
    
    It may be a rare occurrence today, but it is more likely with modified subclasses of `FileOutputCommitter`, as well as possible with some ongoing work of mine in Hadoop to better support committing output to cloud storage infrastructures.
    
    ## How was this patch tested?
    
    Manually. The before & after stack traces are on the JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/spark cloud/SPARK-20886-committer-NPE

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18111.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18111
    
----
commit 02eb7bf0ee6b81841f22e3c46d822eaebb28e85c
Author: Steve Loughran <ste...@hortonworks.com>
Date:   2017-05-25T15:46:50Z

    SPARK-20886 HadoopMapReduceCommitProtocol to fail with message if FileOutputCommitter.getWorkPath==null
    Add a requirement.
    The existing code does an Option.getWorkpath.toString() which *may* be an attempt to handle the null path case. If so, it doesn't, because it's the .toString() which is failing.
    
    Change-Id: Idddf9813761e7008425542f96903bce12bedd978

----



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
