[
https://issues.apache.org/jira/browse/SPARK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157489#comment-16157489
]
Ruslan Shestopalyuk commented on SPARK-21942:
---------------------------------------------
Sean, the OS won't ever delete files that are currently open; in fact, it only
deletes files that have not been _accessed_ for several days.
For example, on a RHEL/Fedora-family distribution (which is the base for the
standard AWS Linux image), the corresponding cron job config looks like this:
{code:bash}
$ cat /etc/cron.daily/tmpwatch
#! /bin/sh
flags=-umc
/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \
    -x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \
    -X '/tmp/hsperfdata_*' 10d /tmp
/usr/sbin/tmpwatch "$flags" 30d /var/tmp
for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do
    if [ -d "$d" ]; then
        /usr/sbin/tmpwatch "$flags" -f 30d "$d"
    fi
done
{code}
So it runs the cron task daily, executing the
[tmpwatch](https://linux.die.net/man/8/tmpwatch) utility, telling it in
particular:
* for _/tmp_ to delete all files that have not been _accessed_ for more than 10
days
* the same for _/var/tmp_, but not accessed for 30 days
So, in the case of the Spark scratch folder, it will be purged if it has not been
accessed (written [...]
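
The age check described above can be sketched in a few lines of Python. This is only an illustration, not tmpwatch itself: the helper names `last_touched` and `stale_entries` are made up for this example, and it assumes that the combined `-umc` flags mean the decision is based on the latest of the access, modification and inode change times, as the tmpwatch man page describes.

```python
import os
import time


def last_touched(path):
    """Return the latest of atime, mtime and ctime for path, in epoch seconds."""
    st = os.lstat(path)
    return max(st.st_atime, st.st_mtime, st.st_ctime)


def stale_entries(root, max_age_days, now=None):
    """List entries directly under root that were not touched within max_age_days.

    Hypothetical helper: it only reports candidates and deletes nothing.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in os.scandir(root):
        if last_touched(entry.path) < cutoff:
            stale.append(entry.path)
    return sorted(stale)
```

Calling `stale_entries("/tmp", 10)` on a long-running host would show whether a `blockmgr-...` directory is already a candidate for the next daily run.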
> DiskBlockManager crashing when a root local folder has been externally
> deleted by OS
> ------------------------------------------------------------------------------------
>
> Key: SPARK-21942
> URL: https://issues.apache.org/jira/browse/SPARK-21942
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1,
> 2.2.0, 2.2.1, 2.3.0, 3.0.0
> Reporter: Ruslan Shestopalyuk
> Priority: Minor
> Labels: storage
> Fix For: 2.3.0
>
>
> _DiskBlockManager_ has a notion of "scratch" local folder(s), which can be
> configured via the _spark.local.dir_ option and which default to the system's
> _/tmp_. The hierarchy is two-level, e.g. _/blockmgr-XXX.../YY_, where the
> _YY_ part is a hash-derived component, used to spread files evenly.
> The function _DiskBlockManager.getFile_ expects the top-level directories
> (_blockmgr-XXX..._) to always exist (they get created once, when the Spark
> context is first created); otherwise it fails with a message like:
> {code}
> ... java.io.IOException: Failed to create local dir in /tmp/blockmgr-XXX.../YY
> {code}
> However, this may not always be the case.
> In particular, *if it's the default _/tmp_ folder*, there can be different
> strategies for automatically removing files from it, depending on the OS:
> * at boot time
> * on a regular basis (e.g. once per day via a system cron job)
> * based on the file age
> The symptom is that after the process (in our case, a service) using Spark has
> been running for a while (a few days), it may no longer be able to load files,
> since the top-level scratch directories are no longer there and
> _DiskBlockManager.getFile_ crashes.
> Please note that this is different from people arbitrarily removing files
> manually.
> Two facts combine here: _/tmp_ is the default in the Spark config, and the
> system has the right to tamper with its contents and will very likely do so
> after some period of time.
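
The failure mode in the quoted description can be illustrated with a small Python sketch of a two-level scratch layout. This is not Spark's code and not necessarily the fix adopted for this issue: `get_file`, the `sub_dirs` parameter and the CRC32-based bucket are assumptions made for the example. The point is that creating only the _YY_ level fails once the top-level directory is gone, whereas a recursive create restores both levels.

```python
import os
import zlib


def get_file(root_dir, filename, sub_dirs=64):
    """Return the path for filename under root_dir/YY, creating directories as needed.

    Hypothetical stand-in for a getFile-style lookup; the hashing scheme is
    an assumption and does not reproduce Spark's.
    """
    bucket = zlib.crc32(filename.encode("utf-8")) % sub_dirs
    sub_dir = os.path.join(root_dir, format(bucket, "02x"))
    # os.makedirs also creates missing parents, so a purged root_dir is
    # recreated; a plain os.mkdir(sub_dir) would raise FileNotFoundError.
    os.makedirs(sub_dir, exist_ok=True)
    return os.path.join(sub_dir, filename)
```

Pointing the scratch location at a directory the OS does not clean up, via _spark.local.dir_ as mentioned in the description, sidesteps the problem without any code change.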