[ https://issues.apache.org/jira/browse/FLINK-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609708#comment-14609708 ]
Stephan Ewen edited comment on FLINK-2299 at 7/1/15 8:07 AM: ------------------------------------------------------------- Okay, judging from the timestamps, it looks like the TM disappears due to a large GC stall. Here are a few things to try: 1. Run Flink with the G1 garbage collector. It works incrementally and usually leads to lower GC pauses 2. Dedicate more memory to the user code (lower the managed memory fraction for the system) was (Author: stephanewen): Okay, judging from the timestamps, it looks like the TM disappears due to a large GC stall. Here are a few things to try: 1. Run Flink with the G1 garbage collector. It works incrementally and usually leads to lower GC pauses 2. Dedicate more memory to the user code (lower the mamaged memory fraction for the system) > The slot on which the task maanger was scheduled was killed > ----------------------------------------------------------- > > Key: FLINK-2299 > URL: https://issues.apache.org/jira/browse/FLINK-2299 > Project: Flink > Issue Type: Bug > Affects Versions: 0.9, 0.10 > Reporter: Andra Lungu > Priority: Critical > Fix For: 0.9.1 > > > The following code: > https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSATriangleCount.java > Ran on the twitter follower graph: > http://twitter.mpi-sws.org/data-icwsm2010.html > With a similar configuration to the one in FLINK-2293 > fails with the following exception: > java.lang.Exception: The slot in which the task was executed has been > released. Probably loss of TaskManager 57c67d938c9144bec5ba798bb8ebe636 @ > wally025 - 8 slots - URL: > akka.tcp://flink@130.149.249.35:56135/user/taskmanager > at > org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151) > at > org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547) > at > org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119) > at > org.apache.flink.runtime.instance.Instance.markDead(Instance.java:154) > at > org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:182) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:421) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at > akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46) > at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369) > at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501) > at akka.actor.ActorCell.invoke(ActorCell.scala:486) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 06/29/2015 10:33:46 Job execution switched to status FAILING. > The logs are here: > https://drive.google.com/file/d/0BwnaKJcSLc43M1BhNUt5NWdINHc/view?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)