Hi Yang,

Isn't the external shuffle service better for long-running applications?
"It runs as a standalone application and manages shuffle output files so they are available for executors at all time"
It is described here:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-ExternalShuffleService.html

--- Artur

On Fri, Sep 2, 2016 at 12:30 PM 汪洋 <tiandiwo...@icloud.com> wrote:

> Thank you for your response.
>
> We are using spark-1.6.2 in standalone deploy mode with dynamic allocation
> disabled.
>
> I have traced the code. IMHO, it seems this cleanup is not handled by
> shutdown hooks directly. The shutdown hooks only send an
> “ExecutorStateChanged” message to the worker, and if the worker sees the
> message, it will clean up the directory *only when this application is
> finished*. In our case, the application is not finished (long running).
> The executor exits due to some unknown error and is restarted by the worker
> right away. In this scenario, those old directories are not going to be
> deleted.
>
> If the application is still running, is it safe to delete the old
> “blockmgr” directories, leaving only the newest one?
>
> Our temporary solution is to restart our application regularly, and we are
> seeking a more elegant way.
>
> Thanks.
>
> Yang
>
>
> On Sep 2, 2016, at 4:11 PM, Sun Rui <sunrise_...@163.com> wrote:
>
> Hi,
> Could you give more information about your Spark environment? Cluster
> manager, Spark version, using dynamic allocation or not, etc.
>
> Generally, executors will delete temporary directories for shuffle files
> on exit because JVM shutdown hooks are registered, unless they are brutally
> killed.
>
> You can safely delete the directories when you are sure that the Spark
> applications related to them have finished. A crontab task may be used for
> automatic cleanup.
>
> On Sep 2, 2016, at 12:18, 汪洋 <tiandiwo...@icloud.com> wrote:
>
> Hi all,
>
> I discovered that sometimes an executor exits unexpectedly, and when it is
> restarted, it creates another blockmgr directory without deleting the
> old ones. Thus, for a long-running application, some shuffle files will
> never be cleaned up.
> Sometimes those files could take up the whole disk.
>
> Is there a way to clean up those unused files automatically? Or is it safe
> to delete the old directories manually, leaving only the newest one?
>
> Here is the executor’s local directory.
> <D7718580-FF26-47F8-B6F8-00FB1F20A8C0.png>
>
> Any advice on this?
>
> Thanks.
>
> Yang

--
Artur Sukhenko
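
For reference, the "keep only the newest blockmgr directory" idea asked about in the thread could be sketched roughly as below. This is only an illustration, not something taken from Spark. The function names are hypothetical, "newest" is assumed to mean the most recent modification time, and nobody in the thread confirms that deleting these directories is safe while the application is still running. For that reason the sketch only lists candidates unless `dry_run=False` is passed explicitly.

```python
import shutil
from pathlib import Path


def stale_blockmgr_dirs(executor_dir):
    """Return every blockmgr-* directory under executor_dir except the newest.

    "Newest" is judged by modification time. That is an assumption: the
    thread does not say how to tell the live directory from the old ones.
    """
    dirs = [p for p in Path(executor_dir).glob("blockmgr-*") if p.is_dir()]
    dirs.sort(key=lambda p: p.stat().st_mtime)
    return dirs[:-1]  # empty when there are zero or one directories


def clean_blockmgr_dirs(executor_dir, dry_run=True):
    """Return the stale directories; delete them only when dry_run is False."""
    stale = stale_blockmgr_dirs(executor_dir)
    if not dry_run:
        for path in stale:
            shutil.rmtree(path, ignore_errors=True)
    return stale
```

A script like this could be run from the crontab task Sun Rui mentions, first in dry-run mode to review what it would remove.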