[GitHub] spark issue #19154: [SPARK-21942][CORE] Fix DiskBlockManager crashing when a...

2017-09-07 Thread rshest
Github user rshest commented on the issue:

https://github.com/apache/spark/pull/19154
  
I have managed to create the JIRA task and updated the pull request's title 
accordingly:
https://issues.apache.org/jira/browse/SPARK-21942

Since it does not look like this change is going to be accepted, for 
posterity and for people who might come here with a similar problem, the 
suggested workarounds (according to Sean's comment in the 
issue) are:

- manually configure your scratch directory (_spark.local.dir_) to be 
elsewhere
- stop your system from cleaning up your temp folder
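
For instance, the first workaround could look like the following in 
`spark-defaults.conf` (the path is only an illustration; pick a directory 
that your OS does not clean automatically):

```properties
# spark-defaults.conf
# Point Spark's scratch space away from /tmp (example path)
spark.local.dir /var/spark-scratch
```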


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19154: Fix DiskBlockManager crashing when a root local folder h...

2017-09-07 Thread rshest
Github user rshest commented on the issue:

https://github.com/apache/spark/pull/19154
  
Please note that it's not _people_ deleting the files, it's the operating 
system doing it automatically, inside the `/tmp` folder. Given enough time, this 
is quite likely to happen.

I did read the guidelines above before submitting the PR, and I believe I 
went through all the steps aside from creating the JIRA issue (I had trouble 
logging into the system for some reason). Could you please point me to what 
else needs to be done, exactly? Thanks!


---




[GitHub] spark pull request #19154: Fix DiskBlockManager crashing when a root local f...

2017-09-07 Thread rshest
GitHub user rshest opened a pull request:

https://github.com/apache/spark/pull/19154

Fix DiskBlockManager crashing when a root local folder has been externally 
deleted

## What changes were proposed in this pull request?

**The problem:** 

`DiskBlockManager` has a notion of "scratch" local folder(s), which can be 
configured via the `spark.local.dir` option, and which defaults to the system's 
`/tmp`. The hierarchy is two-level, e.g. `/blockmgr-XXX.../YY`, where the `YY` 
part is a hash-derived subdirectory name, used to spread files evenly. 

Function `DiskBlockManager.getFile` _expects_ the top-level directories 
(`blockmgr-XXX...`) to always exist (they get created once, when the Spark 
context is first created); otherwise it fails with a message like:

```
... java.io.IOException: Failed to create local dir in 
/tmp/blockmgr-XXX.../YY
```
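
The directory scheme described above can be sketched in plain Java as 
follows. This is an illustrative re-implementation, not Spark's exact code: 
the method name, the non-negative-hash helper, and the two-hex-digit 
subdirectory format are assumptions based on the description in this PR.

```java
import java.io.File;

public class GetFileSketch {
    // Maps a block filename to a file under one of the configured root
    // local dirs: the hash picks the root, then a subdirectory inside it.
    static File getFile(File[] localDirs, int subDirsPerLocalDir, String filename) {
        int hash = nonNegativeHash(filename);
        File root = localDirs[hash % localDirs.length];
        int subDirId = (hash / localDirs.length) % subDirsPerLocalDir;
        // e.g. /tmp/blockmgr-XXX.../0e/<filename>
        File subDir = new File(root, String.format("%02x", subDirId));
        return new File(subDir, filename);
    }

    // String.hashCode can be negative; clamp it into a usable index range.
    static int nonNegativeHash(String s) {
        int h = s.hashCode();
        return h == Integer.MIN_VALUE ? 0 : Math.abs(h);
    }
}
```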

However, this may not always be the case, in particular with the default 
`/tmp` folder: on certain operating systems it is cleaned 
on a regular basis (e.g. once per day via a system cron job). 

The symptom is that after the process using Spark has been running for a while (a 
few days), it may not be able to create or load block files anymore, since the scratch 
directories are gone and `DiskBlockManager.getFile` crashes.

The change/mitigation is simple: use `File.mkdirs` instead of `File.mkdir` 
inside `getFile`, so that the _full path_ is created there, which handles 
the case where the parent directory is no longer present.
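
The difference between the two calls can be demonstrated with plain 
`java.io.File`, independent of Spark (a minimal standalone sketch; the 
directory names only stand in for the `blockmgr-XXX.../YY` layout):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class MkdirsDemo {
    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("mkdirs-demo").toFile();
        File parent = new File(base, "blockmgr-demo"); // stands in for blockmgr-XXX...
        File sub = new File(parent, "0e");             // stands in for the YY hash dir

        // The parent does not exist, as if the OS had cleaned /tmp:
        System.out.println(sub.mkdir());  // false -- mkdir cannot create missing parents
        System.out.println(sub.mkdirs()); // true  -- mkdirs recreates the full path
    }
}
```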

## How was this patch tested?

I have added a unit test inside `DiskBlockManagerSuite` that reproduces the 
failure and passes with this patch.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rshest/spark fix-DiskBlockManager-local-root-removed

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19154.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19154


commit dc502493c8c5cde03ba4dc1ce8391e176c583267
Author: Ruslan Shestopalyuk <rushb...@gmail.com>
Date:   2017-09-06T15:24:43Z

Fix DiskBlockManager crashing when root local folder has been removed




---
