GitHub user rshest opened a pull request:

    https://github.com/apache/spark/pull/19154

    Fix DiskBlockManager crashing when a root local folder has been externally deleted

    ## What changes were proposed in this pull request?
    
    **The problem:** 
    
    `DiskBlockManager` has a notion of "scratch" local folder(s), which can be configured via the `spark.local.dir` option and default to the system's `/tmp`. The hierarchy is two-level, e.g. `/blockmgr-XXX.../YY`, where the `YY` part is a hash-derived subdirectory name used to spread files evenly.
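    To illustrate the two-level layout, here is a minimal sketch in Java of how a file name could be mapped to `<root>/YY/<name>`. It assumes the root is chosen by a non-negative hash of the name modulo the number of roots, and that `YY` is the next component of that hash rendered as two hex digits; the class `ScratchLayout`, its methods, and the subdirectory count of 64 used in `main` are hypothetical, not taken from Spark's code:

```java
import java.io.File;

// Hypothetical sketch of a two-level scratch layout: <root>/<YY>/<name>.
class ScratchLayout {
    // Assumed hash: String.hashCode() made non-negative (MIN_VALUE maps to 0).
    static int nonNegativeHash(String s) {
        int h = s.hashCode();
        return h == Integer.MIN_VALUE ? 0 : Math.abs(h);
    }

    // Picks a root directory, then a two-hex-digit subdirectory, from the same hash.
    static File resolve(File[] localDirs, int subDirsPerLocalDir, String filename) {
        int hash = nonNegativeHash(filename);
        int dirId = hash % localDirs.length;
        int subDirId = (hash / localDirs.length) % subDirsPerLocalDir;
        File subDir = new File(localDirs[dirId], String.format("%02x", subDirId));
        return new File(subDir, filename);
    }

    public static void main(String[] args) {
        File[] roots = { new File("/tmp/blockmgr-example") };
        // Assumed count of 64 subdirectories per root (illustrative only);
        // prints a path of the form /tmp/blockmgr-example/YY/shuffle_0_0_0.data
        System.out.println(resolve(roots, 64, "shuffle_0_0_0.data"));
    }
}
```

    Deriving `YY` from the hash keeps any single directory from accumulating too many entries, while still letting every lookup recompute the path from the file name alone.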
    
    Function `DiskBlockManager.getFile` _expects_ the top-level directories (`blockmgr-XXX...`) to always exist (they are created only once, when the Spark context is first created); otherwise it fails with a message like:
    
    ```
    ... java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
    ```
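    The failure can be reproduced in isolation with plain `java.io.File`. The sketch below assumes that the pre-patch `getFile` creates the `YY` subdirectory with a single `File.mkdir` call, which succeeds only if the `blockmgr-XXX...` parent still exists; the class `MkdirRepro`, its `getSubDir` helper, and the temporary directory standing in for a real `blockmgr-XXX...` root are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class MkdirRepro {
    // Hypothetical stand-in for the pre-patch logic: create <root>/YY with mkdir(),
    // which only succeeds if <root> itself still exists.
    static File getSubDir(File root, int subDirId) throws IOException {
        File newDir = new File(root, String.format("%02x", subDirId));
        if (!newDir.exists() && !newDir.mkdir()) {
            throw new IOException("Failed to create local dir in " + newDir);
        }
        return newDir;
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("blockmgr-").toFile();
        getSubDir(root, 0x0a); // works while the root exists

        // Simulate an external cleanup (e.g. a /tmp cron job) removing the root.
        new File(root, "0a").delete();
        root.delete();

        try {
            getSubDir(root, 0x0b);
        } catch (IOException e) {
            // mkdir() cannot create <root>/0b because <root> is gone.
            System.out.println(e.getMessage());
        }
    }
}
```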
    
    However, this is not always the case, in particular when the scratch folder is the default `/tmp`: on certain operating systems, `/tmp` can be cleaned on a regular basis (e.g. once per day via a system cron job).
    
    The symptom is that after a process using Spark has been running for a while (a few days), it may no longer be able to load files, because the scratch directories are gone and `DiskBlockManager.getFile` throws an `IOException`.
    
    The change/mitigation is simple: use `File.mkdirs` instead of `File.mkdir` inside `getFile`, so that the _full path_ is created there, which handles the case where the parent directory no longer exists.
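    A sketch of a `getFile`-style helper with the mitigation applied, using hypothetical names (`MkdirsFix`, `getSubDir`) rather than Spark's actual code. Because `mkdirs` also creates any missing parent directories, an externally deleted root is simply recreated:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class MkdirsFix {
    // Hypothetical stand-in for the patched logic: mkdirs() also recreates any
    // missing parents, e.g. a blockmgr-... root removed by a /tmp cleaner.
    static File getSubDir(File root, int subDirId) throws IOException {
        File newDir = new File(root, String.format("%02x", subDirId));
        if (!newDir.exists() && !newDir.mkdirs()) {
            throw new IOException("Failed to create local dir in " + newDir);
        }
        return newDir;
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("blockmgr-").toFile();
        root.delete(); // simulate external cleanup of the (empty) root
        File sub = getSubDir(root, 0x0b);
        System.out.println(sub.isDirectory()); // true: root and 0b were recreated
    }
}
```

    Since `mkdirs` returns `false` when the directory already exists (for example, if another thread created it between the `exists()` check and the call), a more defensive variant could re-check `isDirectory()` before throwing, or use `java.nio.file.Files.createDirectories`, which does not fail when the directory already exists.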
    
    ## How was this patch tested?
    
    I have added a unit test to `DiskBlockManagerSuite` that fails without this patch and passes with it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rshest/spark fix-DiskBlockManager-local-root-removed

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19154.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19154
    
----
commit dc502493c8c5cde03ba4dc1ce8391e176c583267
Author: Ruslan Shestopalyuk <rushb...@gmail.com>
Date:   2017-09-06T15:24:43Z

    Fix DiskBlockManager crashing when root local folder has been removed

----

