[ 
https://issues.apache.org/jira/browse/FLINK-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309212#comment-14309212
 ] 

Stephan Ewen commented on FLINK-1492:
-------------------------------------

The current solution is a bit hacky. Right now, we see multiple cleanups. I 
think one happens on graceful shutdown of the task manager (through observation 
of job manager death) and another through the shutdown hook.

I think the right solution is to not simply let the shutdown hook delete the 
directory, but to have the shutdown hook call "shutdown" on the blob manager.
The shutdown should also make sure it runs only once, so that cleanup does not 
happen through both the task manager shutdown and the shutdown hook.

It is also good practice for the blob manager to remove the shutdown hook 
once shutdown is called, to prevent resource leaks.
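
A minimal sketch of what I mean, assuming a simplified blob manager. The class 
BlobStoreSketch and its methods are hypothetical names for illustration, not 
the actual BlobServer/BlobCache code. The hook delegates to shutdown(), an 
atomic flag makes sure cleanup runs only once, and shutdown() deregisters the 
hook when it is called from the regular shutdown path.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a blob manager; not Flink's actual implementation. */
final class BlobStoreSketch {

    private final File storageDir;
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);
    private final Thread shutdownHook;

    BlobStoreSketch(File storageDir) {
        this.storageDir = storageDir;
        // The hook delegates to shutdown() instead of deleting the directory itself.
        this.shutdownHook = new Thread(this::shutdown, "blob-store-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    /** Idempotent: only the first caller (task manager or hook) performs the cleanup. */
    void shutdown() {
        if (!shutdownRequested.compareAndSet(false, true)) {
            return; // cleanup was already started by the other path
        }

        deleteRecursively(storageDir);

        // Deregister the hook so it does not keep this object reachable.
        // Skip this when we are running inside the hook itself.
        if (Thread.currentThread() != shutdownHook) {
            try {
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            } catch (IllegalStateException e) {
                // The JVM is already shutting down; the hook can no longer be removed.
            }
        }
    }

    boolean isShutdown() {
        return shutdownRequested.get();
    }

    /** Best-effort delete that tolerates files or directories that are already gone. */
    private static void deleteRecursively(File file) {
        File[] children = file.listFiles(); // null if not a directory or missing
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        file.delete(); // returns false instead of throwing if the file is missing
    }
}
```

With this shape, a second call to shutdown() returns immediately, so the 
"does not exist" failure from a repeated directory delete should not occur in 
the sketch. Whether the real code should log or rethrow on a failed delete is 
a separate decision.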


> Exceptions on shutdown concerning BLOB store cleanup
> ----------------------------------------------------
>
>                 Key: FLINK-1492
>                 URL: https://issues.apache.org/jira/browse/FLINK-1492
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>            Assignee: Ufuk Celebi
>             Fix For: 0.9
>
>
> The following stack traces do not occur every time, but they occur frequently.
> {code}
> java.lang.IllegalArgumentException: /tmp/blobStore-7a89856a-47f9-45d6-b88b-981a3eff1982 does not exist
>       at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637)
>       at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>       at org.apache.flink.runtime.blob.BlobServer.shutdown(BlobServer.java:213)
>       at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.shutdown(BlobLibraryCacheManager.java:171)
>       at org.apache.flink.runtime.jobmanager.JobManager.postStop(JobManager.scala:136)
>       at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
>       at org.apache.flink.runtime.jobmanager.JobManager.aroundPostStop(JobManager.scala:80)
>       at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>       at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:292)
>       at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:369)
>       at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:63)
>       at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:369)
>       at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:455)
>       at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>       at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>       at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:16:15,350 ERROR org.apache.flink.test.util.ForkableFlinkMiniCluster$$anonfun$startTaskManager$1$$anon$1 - LibraryCacheManager did not shutdown properly.
> java.io.IOException: Unable to delete file: /tmp/blobStore-e2619536-fb7c-452a-8639-487a074d1582/cache/blob_ff74895f7bdeeaa3bd70b6932beed143048bb4c7
>       at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2279)
>       at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>       at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>       at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>       at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>       at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>       at org.apache.flink.runtime.blob.BlobCache.shutdown(BlobCache.java:159)
>       at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.shutdown(BlobLibraryCacheManager.java:171)
>       at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:173)
>       at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
>       at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:86)
>       at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>       at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>       at akka.actor.ActorCell.terminate(ActorCell.scala:369)
>       at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
>       at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>       at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>       at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:16:15,345 ERROR org.apache.flink.runtime.blob.BlobCache - Error deleting directory /tmp/blobStore-4313349e-8a58-4683-9fd0-3d2c52be1864 during JVM shutdown: /tmp/blobStore-4313349e-8a58-4683-9fd0-3d2c52be1864 does not exist
> java.lang.IllegalArgumentException: /tmp/blobStore-4313349e-8a58-4683-9fd0-3d2c52be1864 does not exist
>       at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637)
>       at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>       at org.apache.flink.runtime.blob.BlobUtils$1.run(BlobUtils.java:210)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
