Hi Yang,

Isn't the external shuffle service better for long-running applications?
"It runs as a standalone application and manages shuffle output files so
they are available for executors at all time"

It is described here:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-ExternalShuffleService.html
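
On the manual cleanup question in the quoted thread below, here is a minimal sketch of how a cron job might find the older blockmgr directories under one executor local directory. The function name `stale_blockmgr_dirs` and the `blockmgr-` prefix match are assumptions for illustration, and directory modification time is only a rough proxy for age. It lists candidates and deletes nothing, because this thread does not settle whether removing them while the application is still running is safe.

```python
import os


def stale_blockmgr_dirs(local_dir, prefix="blockmgr-"):
    """List blockmgr directories under local_dir, oldest first, minus the newest.

    Hypothetical helper: the prefix and the mtime heuristic are assumptions,
    not something Spark itself provides.
    """
    candidates = [
        entry.path
        for entry in os.scandir(local_dir)
        if entry.is_dir() and entry.name.startswith(prefix)
    ]
    # Sort by modification time so the most recently modified directory is last.
    candidates.sort(key=os.path.getmtime)
    # Drop the newest directory; the remaining ones are cleanup candidates.
    return candidates[:-1]
```

Calling `stale_blockmgr_dirs` on the executor's local directory returns the paths to review; an empty list means there is at most one blockmgr directory there.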

---
Artur

On Fri, Sep 2, 2016 at 12:30 PM 汪洋 <tiandiwo...@icloud.com> wrote:

> Thank you for your response.
>
> We are using spark-1.6.2 in standalone deploy mode with dynamic allocation
> disabled.
>
> I have traced the code. IMHO, this cleanup is not handled by the shutdown
> hooks directly. The shutdown hooks only send an “ExecutorStateChanged”
> message to the worker, and when the worker sees the message, it cleans up
> the directory *only when this application is finished*. In our case, the
> application is not finished (it is long running). The executor exits due
> to some unknown error and is restarted by the worker right away. In this
> scenario, the old directories are never deleted.
>
> If the application is still running, is it safe to delete the old
> “blockmgr” directories and leave only the newest one?
>
> Our temporary solution is to restart our application regularly and we are
> seeking a more elegant way.
>
> Thanks.
>
> Yang
>
>
> On Sep 2, 2016, at 4:11 PM, Sun Rui <sunrise_...@163.com> wrote:
>
> Hi,
> Could you give more information about your Spark environment: cluster
> manager, Spark version, whether dynamic allocation is used, etc.?
>
> Generally, executors delete their temporary directories for shuffle files
> on exit because JVM shutdown hooks are registered, unless they are killed
> forcibly.
>
> You can safely delete the directories when you are sure that the Spark
> applications related to them have finished. A cron job can be used for
> automatic cleanup.
>
> On Sep 2, 2016, at 12:18, 汪洋 <tiandiwo...@icloud.com> wrote:
>
> Hi all,
>
> I discovered that an executor sometimes exits unexpectedly, and when it is
> restarted, it creates another blockmgr directory without deleting the
> old ones. Thus, for a long-running application, some shuffle files are
> never cleaned up. Sometimes those files can take up the whole disk.
>
> Is there a way to clean up those unused files automatically? Or is it safe
> to delete the old directories manually, leaving only the newest one?
>
> Here is the executor’s local directory.
> <D7718580-FF26-47F8-B6F8-00FB1F20A8C0.png>
>
> Any advice on this?
>
> Thanks.
>
> Yang
>
--
Artur Sukhenko
