[ 
https://issues.apache.org/jira/browse/SPARK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-21942:
------------------------------
    Affects Version/s:     (was: 2.2.1)
                           (was: 2.3.0)
                           (was: 3.0.0)
                           (was: 2.0.2)
                           (was: 1.6.3)
     Target Version/s:   (was: 2.3.0)
        Fix Version/s:     (was: 2.3.0)

> DiskBlockManager crashing when a root local folder has been externally 
> deleted by OS
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21942
>                 URL: https://issues.apache.org/jira/browse/SPARK-21942
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1, 1.6.2, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.2.0
>            Reporter: Ruslan Shestopalyuk
>            Priority: Minor
>              Labels: storage
>
> _DiskBlockManager_ has a notion of "scratch" local folder(s), which can be 
> configured via the _spark.local.dir_ option, and which default to the system's 
> _/tmp_. The hierarchy is two-level, e.g. _/blockmgr-XXX.../YY_, where the 
> _YY_ part is derived from a hash, to spread files evenly.
> The function _DiskBlockManager.getFile_ expects the top-level directories 
> (_blockmgr-XXX..._) to always exist (they are created once, when the Spark 
> context is first created); otherwise it fails with a message like:
> {code}
> ... java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
> {code}
> However, this may not always be the case.
> In particular, *if it's the default _/tmp_ folder*, the OS may remove files 
> from it automatically, using different strategies:
> * at boot time
> * on a regular basis (e.g. once per day via a system cron job)
> * based on the file age
> The symptom is that after the process (in our case, a service) using Spark has 
> been running for a while (a few days), it may no longer be able to load files, 
> since the top-level scratch directories are gone and 
> _DiskBlockManager.getFile_ crashes.
> Please note that this is different from people arbitrarily removing files 
> manually.
> Here, both conditions hold: _/tmp_ is the default in the Spark config, and 
> the system has the right to tamper with its contents and will very likely do 
> so after some period of time.
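
The failure mode described above can be illustrated with a minimal Python sketch. This is not Spark's code: `get_file`, `demo`, the CRC32 hash, the `blockmgr-demo` directory name, and the 64-bucket default are all hypothetical choices made for illustration. It only shows one way a two-level scratch lookup could tolerate the root folder being removed externally, by recreating missing parent directories on demand instead of assuming they still exist.

```python
import os
import shutil
import tempfile
import zlib


def get_file(root_dir: str, filename: str, sub_dirs: int = 64) -> str:
    """Return the path for `filename` under a two-level scratch layout.

    Hypothetical helper: the hash function and bucket count are
    illustrative assumptions, not Spark's actual scheme.
    """
    # Pick a subdirectory from a stable hash of the file name.
    bucket = zlib.crc32(filename.encode("utf-8")) % sub_dirs
    sub_dir = os.path.join(root_dir, f"{bucket:02x}")
    # makedirs also recreates the root if an external cleaner removed it;
    # a plain os.mkdir(sub_dir) would raise FileNotFoundError in that case.
    os.makedirs(sub_dir, exist_ok=True)
    return os.path.join(sub_dir, filename)


def demo() -> bool:
    """Simulate the OS deleting the scratch root between two lookups."""
    base = tempfile.mkdtemp()
    root = os.path.join(base, "blockmgr-demo")
    try:
        os.makedirs(root)
        first = get_file(root, "shuffle_0_0_0.data")
        shutil.rmtree(root)  # stand-in for a tmp cleaner removing the root
        second = get_file(root, "shuffle_0_0_0.data")
        # The same path is returned and its parent directory exists again.
        return first == second and os.path.isdir(os.path.dirname(second))
    finally:
        shutil.rmtree(base, ignore_errors=True)
```

Calling `demo()` runs both lookups against a throwaway temporary directory and cleans up afterwards. Recreating the directory does not bring back any files that were stored in it, so a sketch like this only addresses the lookup failure, not the loss of data written before the cleanup.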



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
