[ https://issues.apache.org/jira/browse/SPARK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157489#comment-16157489 ]
Ruslan Shestopalyuk commented on SPARK-21942:
---------------------------------------------

Sean, the OS won't ever delete files that are currently open - in fact, it only deletes files that have not been _accessed_ for several days. For example, on a RHEL/Fedora-based distribution (which is the base for the standard AWS Linux image), the corresponding cron job config looks like this:

{code:bash}
$ cat /etc/cron.daily/tmpwatch
#! /bin/sh
flags=-umc
/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \
    -x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \
    -X '/tmp/hsperfdata_*' 10d /tmp
/usr/sbin/tmpwatch "$flags" 30d /var/tmp
for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do
    if [ -d "$d" ]; then
        /usr/sbin/tmpwatch "$flags" -f 30d "$d"
    fi
done
{code}

So it runs the cron task daily, executing the [tmpwatch](https://linux.die.net/man/8/tmpwatch) utility and telling it, in particular:
* for _/tmp_, to delete all files that have not been _accessed_ for more than 10 days
* the same for _/var/tmp_, but with a 30-day threshold

So in the case of the spark scratch folder, it will be purged if it has not been accessed (written to or read from) for more than 10 days. A few hedged sketches of reproducing and working around this are included after the quoted issue description below.

> DiskBlockManager crashing when a root local folder has been externally deleted by OS
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-21942
>                 URL: https://issues.apache.org/jira/browse/SPARK-21942
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.2.0, 2.2.1, 2.3.0, 3.0.0
>            Reporter: Ruslan Shestopalyuk
>            Priority: Minor
>              Labels: storage
>             Fix For: 2.3.0
>
> _DiskBlockManager_ has a notion of "scratch" local folder(s), which can be configured via the _spark.local.dir_ option and which defaults to the system's _/tmp_. The hierarchy is two-level, e.g. _/blockmgr-XXX.../YY_, where the _YY_ part is derived from a hash, to spread files evenly.
> The function _DiskBlockManager.getFile_ expects the top-level directories (_blockmgr-XXX..._) to always exist (they get created once, when the spark context is first created); otherwise it fails with a message like:
> {code}
> ... java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
> {code}
> However, this may not always be the case.
> In particular, *if it's the default _/tmp_ folder*, the OS can have different strategies for automatically removing files from it:
> * at boot time
> * on a regular basis (e.g. once per day, via a system cron job)
> * based on file age
> The symptom is that after the process (in our case, a service) using spark has been running for a while (a few days), it may not be able to load files anymore, since the top-level scratch directories are not there and _DiskBlockManager.getFile_ crashes.
> Please note that this is different from people arbitrarily removing files manually.
> We have both the fact that _/tmp_ is the default in the spark config and the fact that the system has the right to tamper with its contents - and will do so, with high probability, after some period of time.
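To make the failure mode concrete: what the cron job does over several days can be simulated by hand. A minimal sketch, assuming the default _/tmp_ scratch location (destructive, so throwaway hosts only):

{code:bash}
# Simulate the OS cleanup while a Spark application is still running.
# Assumes spark.local.dir is the default /tmp; run on a throwaway host only.
rm -rf /tmp/blockmgr-*

# Any subsequent block write that goes through DiskBlockManager.getFile
# now fails, because the top-level directory it expects is gone:
#   java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
{code}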
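One workaround on the OS side is to exclude the Spark scratch directories from the daily sweep. A hedged sketch, reusing the same `-X` pattern flag the stock script already applies to _hsperfdata_*_ - the _blockmgr-*_ and _spark-*_ patterns are assumptions about Spark's default directory names:

{code:bash}
# /etc/cron.daily/tmpwatch, with the Spark scratch dirs excluded.
# The two extra -X patterns are assumptions about Spark's default names;
# adjust them to whatever spark.local.dir actually points at on your hosts.
flags=-umc
/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \
    -x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \
    -X '/tmp/hsperfdata_*' \
    -X '/tmp/spark-*' -X '/tmp/blockmgr-*' \
    10d /tmp
{code}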
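The other workaround is on the Spark side: point _spark.local.dir_ at a directory the OS cleanup never touches. A minimal sketch - _/var/lib/spark/scratch_, the application class, and the jar name are all hypothetical:

{code:bash}
# Move the scratch space off /tmp so tmpwatch never sees it.
# /var/lib/spark/scratch is a hypothetical path; pick one that fits your hosts.
mkdir -p /var/lib/spark/scratch

spark-submit \
    --conf spark.local.dir=/var/lib/spark/scratch \
    --class com.example.MyApp \
    my-app.jar
{code}

Note that under some cluster managers (e.g. YARN) the local directories are set by the cluster manager's own environment variables, in which case this option may be overridden.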