Hi James,

On Tue, Feb 11, 2014 at 9:21 PM, James Bellenger <ja...@kixeye.com> wrote:

> Thanks for the followup, Roland.
> Bonus Followup Round!
>
> This original issue came up in looking at the cluster sharding support in
> 2.3.0-RC2.
> We like to use "ephemeral" cluster nodes with random akka ports. After the
> cluster shuts down, the first node that comes up and becomes
> ShardCoordinator recovers the previous journal and tries to reconnect to
> the regions defined in the journal. These not only don't exist, but they
> never will exist due to the randomized ports.
>

Yes, the coordinator tries to connect to stored regions, but it should not
cause any problem if it can't. It watches them and removes them when it
receives Terminated. The regions then re-register themselves with the
coordinator. Do you see anything else?
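
To illustrate the recovery behaviour described above, here is a minimal
plain-Scala model. It is only a sketch, not the actual ShardCoordinator
code: the event names, the State class and the recover function are all
hypothetical, and regions are represented as plain address strings.

```scala
// Simplified model of the behaviour described above; NOT the actual
// ShardCoordinator implementation. All names here are hypothetical.
object CoordinatorModel {
  // Events a coordinator might persist to its journal (assumed shape).
  sealed trait Event
  final case class RegionRegistered(region: String) extends Event
  final case class RegionTerminated(region: String) extends Event

  // Coordinator state: the set of regions it currently knows about.
  final case class State(regions: Set[String] = Set.empty) {
    def updated(event: Event): State = event match {
      case RegionRegistered(r) => copy(regions = regions + r)
      case RegionTerminated(r) => copy(regions = regions - r)
    }
  }

  // Recovery replays the journal in order, so regions from the previous
  // incarnation of the cluster reappear in the state at first.
  def recover(journal: Seq[Event]): State =
    journal.foldLeft(State())((state, event) => state.updated(event))
}
```

In this model, replaying a journal that contains one stale region and then
applying RegionTerminated for it, followed by RegionRegistered for a region
on a new port, leaves only the new region in the state.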


>
> I had expected that when ShardCoordinator shuts down without handoff that
> it would wipe out its internal state. It's hard to tell if this is an issue
> in cluster sharding, cluster singleton manager in general, or a caveat
> emptor with using ephemeral ports.
>

To make it clear: there is no handoff in the singleton manager (any more).
It is just careful to avoid running more than one instance. The handoff
from a shut-down coordinator to a new coordinator is done by recovering
the persistent state of the coordinator.
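
For reference, the leave sequence from the original question quoted
further down (call cluster.leave, wait for MemberRemoved with the node's
own address, allow a grace period, then shut down the actor system) could
be sketched roughly as follows. This assumes the Akka 2.3 cluster API; the
GracefulLeave actor and its grace parameter are hypothetical names, and
the sketch does not work around the ClusterSingletonManager issue in the
ticket mentioned below.

```scala
import akka.actor.{Actor, ActorLogging, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.MemberRemoved
import scala.concurrent.duration._

// Hypothetical helper actor; the name and the grace period are assumptions.
class GracefulLeave(grace: FiniteDuration) extends Actor with ActorLogging {
  private val cluster = Cluster(context.system)

  override def preStart(): Unit = {
    // Subscribe first so that our own removal cannot be missed,
    // then ask the cluster to move this node to Leaving.
    cluster.subscribe(self, classOf[MemberRemoved])
    cluster.leave(cluster.selfAddress)
  }

  // The initial CurrentClusterState snapshot sent on subscription is not
  // matched here and is simply left unhandled.
  def receive = {
    case e: MemberRemoved if e.member.address == cluster.selfAddress =>
      log.info("member removed: leave completed")
      // Capture the system outside the scheduled closure, then give
      // singletons the grace period before shutting down.
      val system = context.system
      import system.dispatcher
      system.scheduler.scheduleOnce(grace)(system.shutdown())
  }
}

object GracefulLeave {
  def props(grace: FiniteDuration): Props = Props(new GracefulLeave(grace))
}
```

Such an actor would be started when the JVM receives its shutdown signal,
for example with system.actorOf(GracefulLeave.props(5.seconds)), where the
five seconds are an arbitrary choice.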

Regards,
Patrik


>
>
>
> On Tue, Feb 11, 2014 at 2:36 AM, Akka Team <akka.offic...@gmail.com> wrote:
>
>> Hi James,
>>
>> the sequence you describe makes perfect sense to me, and the
>> ClusterSingletonManager tries to be overly thorough here; so much so that I
>> would call it a bug
>> <https://www.assembla.com/spaces/ddEDvgVAKr3QrUeJe5aVNr/tickets/3869>.
>> Thanks for reporting!
>>
>> Regards,
>>
>> Roland
>>
>>
>>
>> On Mon, Feb 10, 2014 at 9:34 PM, James Bellenger <ja...@kixeye.com> wrote:
>>
>>> Hi gang.
>>> What is the process for a node to gracefully exit a cluster?
>>> Nodes in our system are going through this sequence:
>>>
>>>    - jvm gets the shutdown signal
>>>    - node calls cluster.leave(cluster.selfAddress)
>>>    - node waits until it sees MemberRemoved with its own address
>>>    - node gives singletons a grace period to migrate
>>>    - actor system is shutdown
>>>    - jvm exits
>>>
>>> This *feels* correct, but the docs
>>> <http://doc.akka.io/docs/akka/2.3.0-RC2/scala/cluster-usage.html#Leaving>
>>> are fuzzy on when the node can drop out.
>>> Moreover, ClusterSingletonManager has a hard time with this flow.
>>> Especially for 1-node clusters, it tries to handover to a non-existing
>>> peer, fails, and then fails harder when it is restarted and the cluster
>>> service is no longer running.
>>>
>>> Is there a better way for nodes to leave the cluster?
>>> Logs below.
>>>
>>> INFO 12:19:40,586 com.kixeye.common.log.AkkaLogger - Cluster Node
>>> [akka.tcp://ghost@127.0.0.1:50570] - Marked address [akka.tcp://
>>> ghost@127.0.0.1:50570] as [Leaving]
>>> INFO 12:19:41,355 com.kixeye.common.log.AkkaLogger - Cluster Node
>>> [akka.tcp://ghost@127.0.0.1:50570] - Leader is moving node [akka.tcp://
>>> ghost@127.0.0.1:50570] to [Exiting]
>>> INFO 12:19:41,356 com.kixeye.common.cluster.ClusterModule - member
>>> removed: leave completed!
>>> INFO 12:19:41,362 com.kixeye.common.log.AkkaLogger - Cluster Node
>>> [akka.tcp://ghost@127.0.0.1:50570] - Shutting down...
>>> INFO 12:19:41,371 com.kixeye.common.log.AkkaLogger - Cluster Node
>>> [akka.tcp://ghost@127.0.0.1:50570] - Successfully shut down
>>> INFO 12:19:41,374 akka.contrib.pattern.ClusterSingletonManager - Exited
>>> [akka.tcp://ghost@127.0.0.1:50570]
>>> INFO 12:19:41,376 akka.contrib.pattern.ClusterSingletonManager - Oldest
>>> observed OldestChanged: [akka.tcp://ghost@127.0.0.1:50570 -> None]
>>> INFO 12:19:41,381 akka.contrib.pattern.ClusterSingletonManager -
>>> ClusterSingletonManager state change [Oldest -> WasOldest]
>>> INFO 12:19:41,396 akka.actor.LocalActorRef - Message
>>> [akka.cluster.ClusterEvent$LeaderChanged] from
>>> Actor[akka://ghost/deadLetters] to
>>> Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was
>>> not delivered. [1] dead letters encountered. This logging can be turned off
>>> or adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:41,396 akka.actor.LocalActorRef - Message
>>> [akka.dispatch.sysmsg.Terminate] from
>>> Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#1919962524]
>>> to
>>> Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#1919962524]
>>> was not delivered. [2] dead letters encountered. This logging can be turned
>>> off or adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:41,397 akka.actor.LocalActorRef - Message
>>> [akka.cluster.ClusterEvent$RoleLeaderChanged] from
>>> Actor[akka://ghost/deadLetters] to
>>> Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was
>>> not delivered. [3] dead letters encountered. This logging can be turned off
>>> or adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:41,397 akka.actor.LocalActorRef - Message
>>> [akka.cluster.ClusterEvent$SeenChanged] from
>>> Actor[akka://ghost/deadLetters] to
>>> Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was
>>> not delivered. [4] dead letters encountered. This logging can be turned off
>>> or adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:41,398 akka.actor.LocalActorRef - Message
>>> [akka.cluster.InternalClusterAction$Unsubscribe] from
>>> Actor[akka://ghost/deadLetters] to
>>> Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not
>>> delivered. [5] dead letters encountered. This logging can be turned off or
>>> adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:41,398 akka.actor.LocalActorRef - Message
>>> [akka.cluster.InternalClusterAction$Unsubscribe] from
>>> Actor[akka://ghost/deadLetters] to
>>> Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not
>>> delivered. [6] dead letters encountered. This logging can be turned off or
>>> adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:42,395 akka.contrib.pattern.ClusterSingletonManager - Retry
>>> [1], sending TakeOverFromMe to [None]
>>> INFO 12:19:43,415 akka.contrib.pattern.ClusterSingletonManager - Retry
>>> [2], sending TakeOverFromMe to [None]
>>> INFO 12:19:44,435 akka.contrib.pattern.ClusterSingletonManager - Retry
>>> [3], sending TakeOverFromMe to [None]
>>> INFO 12:19:45,455 akka.contrib.pattern.ClusterSingletonManager - Retry
>>> [4], sending TakeOverFromMe to [None]
>>> INFO 12:19:46,475 akka.contrib.pattern.ClusterSingletonManager - Retry
>>> [5], sending TakeOverFromMe to [None]
>>> ERROR 12:19:47,517 akka.actor.OneForOneStrategy - Expected hand-over to
>>> [None] never occured
>>> akka.contrib.pattern.ClusterSingletonManagerIsStuck: Expected hand-over
>>> to [None] never occured
>>> at
>>> akka.contrib.pattern.ClusterSingletonManager$$anonfun$10.applyOrElse(ClusterSingletonManager.scala:556)
>>> ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> akka.contrib.pattern.ClusterSingletonManager$$anonfun$10.applyOrElse(ClusterSingletonManager.scala:548)
>>> ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>>> ~[scala-library.jar:?]
>>> at akka.actor.FSM$class.processEvent(FSM.scala:603)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> akka.contrib.pattern.ClusterSingletonManager.processEvent(ClusterSingletonManager.scala:336)
>>> ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:597)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:569)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at
>>> akka.contrib.pattern.ClusterSingletonManager.aroundReceive(ClusterSingletonManager.scala:336)
>>> ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:491)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at akka.actor.ActorCell.invoke_aroundBody2(ActorCell.scala:462)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.ActorCell.invoke_aroundBody3$advice(ActorCell.scala:536)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:1)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> [scala-library.jar:?]
>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> [scala-library.jar:?]
>>>  at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> [scala-library.jar:?]
>>> at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> [scala-library.jar:?]
>>> INFO 12:19:47,520 akka.actor.LocalActorRef - Message
>>> [akka.cluster.InternalClusterAction$Unsubscribe] from
>>> Actor[akka://ghost/deadLetters] to
>>> Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not
>>> delivered. [7] dead letters encountered. This logging can be turned off or
>>> adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:47,520 akka.actor.LocalActorRef - Message
>>> [akka.cluster.InternalClusterAction$Unsubscribe] from
>>> Actor[akka://ghost/deadLetters] to
>>> Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not
>>> delivered. [8] dead letters encountered. This logging can be turned off or
>>> adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:47,581 akka.actor.LocalActorRef - Message
>>> [akka.actor.PoisonPill$] from
>>> Actor[akka://ghost/user/$a/masterregion-DI_USA_1/queuemgr-DI_USA_1/$b#-1227772002]
>>> to
>>> Actor[akka://ghost/user/$a/masterregion-DI_USA_1/queuemgr-DI_USA_1/$b/$a#1401591931]
>>> was not delivered. [9] dead letters encountered. This logging can be turned
>>> off or adjusted with configuration settings 'akka.log-dead-letters' and
>>> 'akka.log-dead-letters-during-shutdown'.
>>> INFO 12:19:47,611 akka.actor.LocalActorRef - Message
>>> [akka.actor.PoisonPill$] from
>>> Actor[akka://ghost/user/$a/masterregion-DI_USA_1/masterstats-DI_USA_1/$a#1216246889]
>>> to
>>> Actor[akka://ghost/user/$a/masterregion-DI_USA_1/masterstats-DI_USA_1/$a/$a#-146187624]
>>> was not delivered. [10] dead letters encountered. This logging can be
>>> turned off or adjusted with configuration settings 'akka.log-dead-letters'
>>> and 'akka.log-dead-letters-during-shutdown'.
>>> ERROR 12:19:47,616 akka.actor.OneForOneStrategy - requirement failed:
>>> Cluster node must not be terminated
>>> akka.actor.PostRestartException: exception post restart (class
>>> akka.contrib.pattern.ClusterSingletonManagerIsStuck)
>>>  at
>>> akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:240)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> akka.actor.dungeon.FaultHandling$$anonfun$6.apply(FaultHandling.scala:238)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:293)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> akka.actor.dungeon.FaultHandling$$anonfun$handleNonFatalOrInterruptedException$1.applyOrElse(FaultHandling.scala:288)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>> ~[scala-library.jar:?]
>>>  at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>> ~[scala-library.jar:?]
>>> at
>>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>> ~[scala-library.jar:?]
>>>  at
>>> akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:238)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:281)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:344)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at
>>> akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:53)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:344)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:430)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.ActorCell.systemInvoke_aroundBody0(ActorCell.scala:453)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at
>>> akka.actor.ActorCell.systemInvoke_aroundBody1$advice(ActorCell.scala:477)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.ActorCell.systemInvoke(ActorCell.scala:1)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>>> [akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> [scala-library.jar:?]
>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> [scala-library.jar:?]
>>>  at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> [scala-library.jar:?]
>>> at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> [scala-library.jar:?]
>>> Caused by: java.lang.IllegalArgumentException: requirement failed:
>>> Cluster node must not be terminated
>>> at scala.Predef$.require(Predef.scala:233) ~[scala-library.jar:?]
>>>  at
>>> akka.contrib.pattern.ClusterSingletonManager.preStart(ClusterSingletonManager.scala:389)
>>> ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.Actor$class.postRestart(Actor.scala:547)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at
>>> akka.contrib.pattern.ClusterSingletonManager.postRestart(ClusterSingletonManager.scala:336)
>>> ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at akka.actor.Actor$class.aroundPostRestart(Actor.scala:485)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>> at
>>> akka.contrib.pattern.ClusterSingletonManager.aroundPostRestart(ClusterSingletonManager.scala:336)
>>> ~[akka-contrib_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  at
>>> akka.actor.dungeon.FaultHandling$class.finishRecreate(FaultHandling.scala:229)
>>> ~[akka-actor_2.10-2.3.0-RC1.jar:2.3.0-RC1]
>>>  ... 15 more
>>>
>>> --
>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>> >>>>>>>>>> Check the FAQ: http://akka.io/faq/
>>> >>>>>>>>>> Search the archives:
>>> https://groups.google.com/group/akka-user
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Akka User List" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to akka-user+unsubscr...@googlegroups.com.
>>> To post to this group, send email to akka-user@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/akka-user.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>
>>
>> --
>> Akka Team
>> Typesafe - The software stack for applications that scale
>> Blog: letitcrash.com
>> Twitter: @akkateam
>>
>>
>
>



-- 

Patrik Nordwall
Typesafe <http://typesafe.com/> -  Reactive apps on the JVM
Twitter: @patriknw
