[akka-user] cluster sharding failover

2014-02-17 Thread James Bellenger
Hi gang, a question regarding shard failover

Here's my 3-node setup (using akka-2.3.0 rc3):

   - backend 1
   - backend 2
   - frontend

All nodes initialize ClusterSharding at startup -- frontend supplies an
empty entryProps so that it does not host any regions.
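For reference, a minimal sketch of the routing functions such a setup would pass to `ClusterSharding.start` (the akka-contrib 2.3 API takes an `Option[Props]` for `entryProps`, so the frontend can pass `None` to run a proxy-only region). The `UserEnvelope` type and shard count below are illustrative, not taken from the original setup:

```scala
// Hypothetical message type for a "user" region.
final case class UserEnvelope(userId: Long, payload: Any)

val numberOfShards = 10

// idExtractor: map an incoming message to (entry id, inner message)
val idExtractor: PartialFunction[Any, (String, Any)] = {
  case UserEnvelope(id, payload) => (id.toString, payload)
}

// shardResolver: derive a shard id from the message
val shardResolver: Any => String = {
  case UserEnvelope(id, _) => (id % numberOfShards).toString
}

// Backend:  ClusterSharding(system).start("user", Some(Props[UserActor]), idExtractor, shardResolver)
// Frontend: ClusterSharding(system).start("user", None, idExtractor, shardResolver)
//           (proxy-only: forwards to regions on other nodes, hosts no entries)

// quick check of the routing functions
assert(idExtractor(UserEnvelope(42L, "ping")) == ("42", "ping"))
assert(shardResolver(UserEnvelope(42L, "ping")) == "2")
```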

Frontend starts pinging a range of sharded actors hosted on the backends -- I
can see that the entries are evenly distributed across the backends.

When I invoke a clean shutdown of one of the backends, the frontend node is
left in a bad state. It continually tries to connect to the dead backend
node. It does this forever. Restarting the frontend is required to get it
to find the failed over shards on the remaining backend.

Am I missing something with shutting down cluster sharding?
Some logs from frontend. They're a bit noisy -- I've tried to call out the
interesting bits.

DEBUG 15:40:42,389 akka.contrib.pattern.ShardRegion - Forwarding request
for shard [0] to [Actor[akka.tcp://
ghost@127.0.0.1:50351/user/sharding/user#-576014717]]
DEBUG 15:40:43,405 akka.contrib.pattern.ShardRegion - Forwarding request
for shard [1] to [Actor[akka.tcp://
ghost@127.0.0.1:50324/user/sharding/user#-498786510]]
DEBUG 15:40:44,425 akka.contrib.pattern.ShardRegion - Forwarding request
for shard [2] to [Actor[akka.tcp://
ghost@127.0.0.1:50351/user/sharding/user#-576014717]]
INFO 15:40:44,825 akka.actor.LocalActorRef - Message
[akka.remote.transport.AssociationHandle$Disassociated] from
Actor[akka://ghost/deadLetters] to
Actor[akka://ghost/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fghost%40127.0.0.1%3A50351-1#-206867083]
was not delivered. [2] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 15:40:44,881 akka.actor.LocalActorRef - Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://ghost/deadLetters] to
Actor[akka://ghost/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fghost%40127.0.0.1%3A50351-1#-206867083]
was not delivered. [3] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
DEBUG 15:40:44,884 akka.remote.EndpointWriter - Disassociated
[akka.tcp://ghost@127.0.0.1:50373] -
[akka.tcp://ghost@127.0.0.1:50351]
INFO 15:40:44,884 akka.actor.LocalActorRef - Message [akka.actor.FSM$Timer]
from Actor[akka://ghost/deadLetters] to
Actor[akka://ghost/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fghost%40127.0.0.1%3A50351-1/endpointWriter#1517188237]
was not delivered. [4] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 15:40:45,091
akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef - Message
[akka.cluster.GossipEnvelope] from
Actor[akka://ghost/system/cluster/core/daemon#-1507678631] to
Actor[akka://ghost/deadLetters] was not delivered. [5] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 15:40:45,097 com.kixeye.common.log.AkkaLogger - Cluster Node
[akka.tcp://ghost@127.0.0.1:50373] - Marking exiting node(s) as UNREACHABLE
[Member(address = akka.tcp://ghost@127.0.0.1:50351, status = Exiting)].
This is expected and they will be removed.
INFO 15:40:45,106 com.kixeye.common.cluster.ClusterModule - member is
unreachable: Member(address = akka.tcp://ghost@127.0.0.1:50351, status = Exiting)
^-- node becomes unreachable

DEBUG 15:40:45,445 akka.contrib.pattern.ShardRegion - Forwarding request
for shard [3] to [Actor[akka.tcp://
ghost@127.0.0.1:50324/user/sharding/user#-498786510]]
INFO 15:40:45,535
akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef - Message
[akka.cluster.ClusterHeartbeatSender$Heartbeat] from
Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#214131267] to
Actor[akka://ghost/deadLetters] was not delivered. [6] dead letters
encountered. This logging can be turned off or adjusted with configuration
settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
DEBUG 15:40:46,465 akka.contrib.pattern.ShardRegion - Forwarding request
for shard [4] to
[Actor[akka.tcp://ghost@127.0.0.1:50351/user/sharding/user#-576014717]]
^-- forwarding msg to a known-unreachable node

INFO 15:40:46,465
akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef - Message
[com.kixeye.common.cluster.UserBackend$UserMessageEnvelope] from
Actor[akka://ghost/user/ghost-sheppard/module-com.kixeye.common.cluster.UserClientModule-8c0d9b97-8b4e-4988-a40b-91d054080afa#1779934259]
to Actor[akka://ghost/deadLetters] was 

Re: [akka-user] akka cluster seed nodes on ec2

2014-02-11 Thread James Bellenger
Hi Tim,
We have a similar-ish setup over here. We ended up registering all nodes in
zookeeper and doing discovery through that. This works well for initial
cluster startup as well as nodes joining an existing cluster. It also works
well for hybrid environments that are not all on aws.
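A sketch of the discovery step described above: turn registry entries (e.g. from ZooKeeper) into a deterministic seed-node list that every node computes identically. The `Node` type, the `ghost` system name, and the registration/watch code are all illustrative assumptions; only the list-building logic is shown:

```scala
// A registry entry as it might come back from ZooKeeper (illustrative).
final case class Node(host: String, port: Int)

// Build the seed list deterministically so every node derives the same order.
def seedAddresses(systemName: String, nodes: Seq[Node]): Seq[String] =
  nodes
    .sortBy(n => (n.host, n.port)) // stable order regardless of discovery order
    .map(n => s"akka.tcp://$systemName@${n.host}:${n.port}")

val discovered = Seq(Node("10.0.0.2", 2551), Node("10.0.0.1", 2551))
assert(seedAddresses("ghost", discovered).head == "akka.tcp://ghost@10.0.0.1:2551")
// The resulting addresses would then be fed to Cluster(system).joinSeedNodes(...)
```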


On Tue, Feb 11, 2014 at 11:07 AM, Patrik Nordwall <patrik.nordw...@gmail.com> wrote:

 Hi Tim,

 You can use whatever nodes as seed nodes, except for when you start up a
 fresh cluster from scratch. When you start a node you can use the AWS API
 to discover other EC2 instances and use all or a few as seed nodes.

 The special case is when starting a new cluster, and then I imagine that
 you can mark one node as special using AWS metadata, or a special argument
 in the start script. This special node must put itself first in the list of
 seed nodes. In all other cases a node should not include itself as seed
 node.
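Patrik's rule above can be sketched as a pure function: the designated first seed node puts itself first in the list, every other node excludes itself. The `designatedFirst` value would come from AWS metadata or a start-script argument, as he suggests; everything here is illustrative:

```scala
// self / all / designatedFirst are full akka addresses (illustrative values below).
def seedList(self: String, all: Seq[String], designatedFirst: String): Seq[String] =
  if (self == designatedFirst) self +: all.filterNot(_ == self) // first seed includes itself, first
  else all.filterNot(_ == self)                                 // everyone else excludes itself

val all = Seq("akka.tcp://sys@a:2551", "akka.tcp://sys@b:2551", "akka.tcp://sys@c:2551")
assert(seedList("akka.tcp://sys@a:2551", all, "akka.tcp://sys@a:2551").head == "akka.tcp://sys@a:2551")
assert(!seedList("akka.tcp://sys@b:2551", all, "akka.tcp://sys@a:2551").contains("akka.tcp://sys@b:2551"))
```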

 The reason for the special first seed node is to avoid creating several
 separate clusters when starting from an empty cluster.

 I would be interested in understanding why this would not
 be feasible solution.

 Cheers,
 Patrik


 On Tue, Feb 11, 2014 at 7:20 PM, Timothy Perrett <timo...@getintheloop.eu> wrote:

 Hey Roland - doesn't "always on" sound like "will never fail"...? I
 wouldn't be comfortable manually managing that cluster, so it would need an
 ASG, which in turn will remove the oldest node when shrinking, so over time
 you'd end up losing all your original seed nodes if your cluster is
 expanding and contracting enough.



 On Sunday, 9 February 2014 23:35:20 UTC-8, rkuhn wrote:

 Hi Tim,

 you could have one small group which is always on and acts as entry
 point (seed nodes) while the rest are auto-scaling. Would that solve the
 issue?
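Roland's "small always-on group" amounts to a static seed list in application.conf listing only the fixed nodes, while auto-scaled instances stay out of it. Host names and port below are illustrative:

```
akka.cluster.seed-nodes = [
  "akka.tcp://ghost@seed-1.internal:2551",
  "akka.tcp://ghost@seed-2.internal:2551"
]
```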

 Regards,

 Roland

 On 6 Feb 2014 at 23:12, Timothy Perrett <tim...@getintheloop.eu> wrote:

 Scott,

 What did you ever end up doing about this? As yet, I have not seen any
 decent story for auto-scaling akka-cluster on AWS; it strikes me that over
 a long enough period all the seed nodes could be reaped by the ASG, but
 also that there would be enough folks wanting to do this that there should
 be a decent solution for it.

 Cheers

 Tim

 On Thursday, 24 October 2013 01:09:08 UTC-7, Patrik Nordwall wrote:

 Hi Scott,

 Using the AWS API together with Cluster(system).joinSeedNodes should be
 possible. The first seed node must be marked somehow.

 From docs:
 You may also use Cluster(system).joinSeedNodes, which is attractive
 when dynamically discovering other nodes at startup by using some external
 tool or API. When using joinSeedNodes you should not include the node
 itself except for the node that is supposed to be the first seed node, and
 that should be placed first in the parameter to joinSeedNodes.

 Regards,
 Patrik



 On Tue, Oct 22, 2013 at 1:36 AM, Ryan Tanner ryan@gmail.com wrote:

 We have Vagrant auto-update the hosts file over ssh.  Kinda clunky but
 it works for small clusters.


 On Monday, October 21, 2013 4:58:36 PM UTC-5, Scott Clasen wrote:

 Anyone have good tricks for standing up an akka cluster on ec2 WRT
 seed nodes?

 Would like to be able to auto scale it so not having to manually join
 a node is necessary.

 Could use elastic IPs but would rather not. Could maybe use a TCP ELB?
 Not sure that would work, and would rather not.

 Guess I can use ZK which I already have stood up...what weird things
 can happen if a node attempts to join a cluster via a seed that has been
 partitioned away from the cluster?



 --
  Read the docs: http://akka.io/docs/
  Check the FAQ: http://akka.io/faq/
  Search the archives: https://groups.google.com/group/akka-user
 ---
 You received this message because you are subscribed to the Google
 Groups Akka User List group.
 To unsubscribe from this group and stop receiving emails from it, send
 an email to akka-user+...@googlegroups.com.
 To post to this group, send email to akka...@googlegroups.com.
 Visit this group at http://groups.google.com/group/akka-user.
 For more options, visit https://groups.google.com/groups/opt_out.




 --

 Patrik Nordwall
 Typesafe http://typesafe.com/ -  Reactive apps on the JVM
 Twitter: @patriknw






 *Dr. Roland Kuhn*
 *Akka Tech Lead*
 Typesafe http://typesafe.com/ - Reactive apps on the JVM.
 twitter: @rolandkuhn
  http://twitter.com/#!/rolandkuhn


Re: [akka-user] clean cluster exit

2014-02-11 Thread James Bellenger
Thanks for the followup Roland
Bonus Followup Round!

This original issue came up while looking at the cluster sharding support in
2.3.0-RC2.
We like to use ephemeral cluster nodes with random akka ports. After the
cluster shuts down, the first node that comes up and becomes
ShardCoordinator recovers the previous journal and tries to reconnect to
the regions defined in the journal. These not only don't exist, but they
never will exist due to the randomized ports.

I had expected that when the ShardCoordinator shuts down without handoff it
would wipe out its internal state. It's hard to tell if this is an issue in
cluster sharding, in cluster singleton manager in general, or a caveat
emptor with using ephemeral ports.
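For context, the ephemeral-port setup described above boils down to letting the OS assign the remoting port (illustrative config for Akka 2.3):

```
# 0 = bind to a random free port; this is why region addresses recovered
# from the coordinator journal can never come back after a full restart
akka.remote.netty.tcp.port = 0
```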



On Tue, Feb 11, 2014 at 2:36 AM, Akka Team akka.offic...@gmail.com wrote:

 Hi James,

 the sequence you describe makes perfect sense to me, and the
 ClusterSingletonManager tries to be overly thorough here; so much so that I
 would call it a bug
 (https://www.assembla.com/spaces/ddEDvgVAKr3QrUeJe5aVNr/tickets/3869).
 Thanks for reporting!

 Regards,

 Roland



 On Mon, Feb 10, 2014 at 9:34 PM, James Bellenger ja...@kixeye.com wrote:

 Hi gang.
 What is the process for a node to gracefully exit a cluster?
 Nodes in our system are going through this sequence:

- jvm gets the shutdown signal
- node calls cluster.leave(cluster.selfAddress)
- node waits until it sees MemberRemoved with its own address
- node gives singletons a grace period to migrate
- actor system is shutdown
- jvm exits

 This *feels* correct, but the docs
 (http://doc.akka.io/docs/akka/2.3.0-RC2/scala/cluster-usage.html#Leaving)
 are fuzzy on when the node can drop out.
 Moreover, ClusterSingletonManager has a hard time with this flow.
 Especially for 1-node clusters, it tries to hand over to a non-existent
 peer, fails, and then fails harder when it is restarted and the cluster
 service is no longer running.

 Is there a better way for nodes to leave the cluster?
 Logs below.

 INFO 12:19:40,586 com.kixeye.common.log.AkkaLogger - Cluster Node
 [akka.tcp://ghost@127.0.0.1:50570] - Marked address [akka.tcp://
 ghost@127.0.0.1:50570] as [Leaving]
 INFO 12:19:41,355 com.kixeye.common.log.AkkaLogger - Cluster Node
 [akka.tcp://ghost@127.0.0.1:50570] - Leader is moving node [akka.tcp://
 ghost@127.0.0.1:50570] to [Exiting]
 INFO 12:19:41,356 com.kixeye.common.cluster.ClusterModule - member
 removed: leave completed!
 INFO 12:19:41,362 com.kixeye.common.log.AkkaLogger - Cluster Node
 [akka.tcp://ghost@127.0.0.1:50570] - Shutting down...
 INFO 12:19:41,371 com.kixeye.common.log.AkkaLogger - Cluster Node
 [akka.tcp://ghost@127.0.0.1:50570] - Successfully shut down
 INFO 12:19:41,374 akka.contrib.pattern.ClusterSingletonManager - Exited
 [akka.tcp://ghost@127.0.0.1:50570]
 INFO 12:19:41,376 akka.contrib.pattern.ClusterSingletonManager - Oldest
 observed OldestChanged: [akka.tcp://ghost@127.0.0.1:50570 -> None]
 INFO 12:19:41,381 akka.contrib.pattern.ClusterSingletonManager -
 ClusterSingletonManager state change [Oldest -> WasOldest]
 INFO 12:19:41,396 akka.actor.LocalActorRef - Message
 [akka.cluster.ClusterEvent$LeaderChanged] from
 Actor[akka://ghost/deadLetters] to
 Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was
 not delivered. [1] dead letters encountered. This logging can be turned off
 or adjusted with configuration settings 'akka.log-dead-letters' and
 'akka.log-dead-letters-during-shutdown'.
 INFO 12:19:41,396 akka.actor.LocalActorRef - Message
 [akka.dispatch.sysmsg.Terminate] from
 Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#1919962524]
 to
 Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#1919962524]
 was not delivered. [2] dead letters encountered. This logging can be turned
 off or adjusted with configuration settings 'akka.log-dead-letters' and
 'akka.log-dead-letters-during-shutdown'.
 INFO 12:19:41,397 akka.actor.LocalActorRef - Message
 [akka.cluster.ClusterEvent$RoleLeaderChanged] from
 Actor[akka://ghost/deadLetters] to
 Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was
 not delivered. [3] dead letters encountered. This logging can be turned off
 or adjusted with configuration settings 'akka.log-dead-letters' and
 'akka.log-dead-letters-during-shutdown'.
 INFO 12:19:41,397 akka.actor.LocalActorRef - Message
 [akka.cluster.ClusterEvent$SeenChanged] from
 Actor[akka://ghost/deadLetters] to
 Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was
 not delivered. [4] dead letters encountered. This logging can be turned off
 or adjusted with configuration settings 'akka.log-dead-letters' and
 'akka.log-dead-letters-during-shutdown'.
 INFO 12:19:41,398 akka.actor.LocalActorRef - Message
 [akka.cluster.InternalClusterAction$Unsubscribe] from
 Actor[akka://ghost/deadLetters] to
 Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not
 delivered. [5] dead letters encountered. This logging can be turned off

[akka-user] clean cluster exit

2014-02-10 Thread James Bellenger
Hi gang.
What is the process for a node to gracefully exit a cluster?
Nodes in our system are going through this sequence:

   - jvm gets the shutdown signal
   - node calls cluster.leave(cluster.selfAddress)
   - node waits until it sees MemberRemoved with its own address
   - node gives singletons a grace period to migrate
   - actor system is shutdown
   - jvm exits
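The "wait until it sees MemberRemoved" step can be sketched with a plain promise that a cluster-event listener would complete before the system is shut down. The Akka listener itself is omitted; apart from the scala.concurrent stdlib, everything here (names, addresses) is illustrative:

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Completed by the cluster-event listener when MemberRemoved arrives
// for our own address; the shutdown hook blocks on it.
val removed = Promise[Unit]()

def onMemberRemoved(address: String, selfAddress: String): Unit =
  if (address == selfAddress) { removed.trySuccess(()); () }

// Simulated here: the listener fires for our own address after cluster.leave(...)
onMemberRemoved("akka.tcp://ghost@127.0.0.1:2551", "akka.tcp://ghost@127.0.0.1:2551")

// In real code: await with a timeout, then grace period, then system.shutdown()
Await.result(removed.future, 5.seconds)
assert(removed.isCompleted)
```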

This *feels* correct, but the docs
(http://doc.akka.io/docs/akka/2.3.0-RC2/scala/cluster-usage.html#Leaving)
are fuzzy on when the node can drop out.
Moreover, ClusterSingletonManager has a hard time with this flow.
Especially for 1-node clusters, it tries to hand over to a non-existent
peer, fails, and then fails harder when it is restarted and the cluster
service is no longer running.

Is there a better way for nodes to leave the cluster?
Logs below.

INFO 12:19:40,586 com.kixeye.common.log.AkkaLogger - Cluster Node
[akka.tcp://ghost@127.0.0.1:50570] - Marked address [akka.tcp://
ghost@127.0.0.1:50570] as [Leaving]
INFO 12:19:41,355 com.kixeye.common.log.AkkaLogger - Cluster Node
[akka.tcp://ghost@127.0.0.1:50570] - Leader is moving node [akka.tcp://
ghost@127.0.0.1:50570] to [Exiting]
INFO 12:19:41,356 com.kixeye.common.cluster.ClusterModule - member removed:
leave completed!
INFO 12:19:41,362 com.kixeye.common.log.AkkaLogger - Cluster Node
[akka.tcp://ghost@127.0.0.1:50570] - Shutting down...
INFO 12:19:41,371 com.kixeye.common.log.AkkaLogger - Cluster Node
[akka.tcp://ghost@127.0.0.1:50570] - Successfully shut down
INFO 12:19:41,374 akka.contrib.pattern.ClusterSingletonManager - Exited
[akka.tcp://ghost@127.0.0.1:50570]
INFO 12:19:41,376 akka.contrib.pattern.ClusterSingletonManager - Oldest
observed OldestChanged: [akka.tcp://ghost@127.0.0.1:50570 -> None]
INFO 12:19:41,381 akka.contrib.pattern.ClusterSingletonManager -
ClusterSingletonManager state change [Oldest -> WasOldest]
INFO 12:19:41,396 akka.actor.LocalActorRef - Message
[akka.cluster.ClusterEvent$LeaderChanged] from
Actor[akka://ghost/deadLetters] to
Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was not
delivered. [1] dead letters encountered. This logging can be turned off or
adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,396 akka.actor.LocalActorRef - Message
[akka.dispatch.sysmsg.Terminate] from
Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#1919962524]
to
Actor[akka://ghost/system/cluster/core/daemon/heartbeatSender#1919962524]
was not delivered. [2] dead letters encountered. This logging can be turned
off or adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,397 akka.actor.LocalActorRef - Message
[akka.cluster.ClusterEvent$RoleLeaderChanged] from
Actor[akka://ghost/deadLetters] to
Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was not
delivered. [3] dead letters encountered. This logging can be turned off or
adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,397 akka.actor.LocalActorRef - Message
[akka.cluster.ClusterEvent$SeenChanged] from
Actor[akka://ghost/deadLetters] to
Actor[akka://ghost/system/cluster/core/daemon/autoDown#2017004581] was not
delivered. [4] dead letters encountered. This logging can be turned off or
adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,398 akka.actor.LocalActorRef - Message
[akka.cluster.InternalClusterAction$Unsubscribe] from
Actor[akka://ghost/deadLetters] to
Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not
delivered. [5] dead letters encountered. This logging can be turned off or
adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 12:19:41,398 akka.actor.LocalActorRef - Message
[akka.cluster.InternalClusterAction$Unsubscribe] from
Actor[akka://ghost/deadLetters] to
Actor[akka://ghost/system/cluster/core/daemon#1571353727] was not
delivered. [6] dead letters encountered. This logging can be turned off or
adjusted with configuration settings 'akka.log-dead-letters' and
'akka.log-dead-letters-during-shutdown'.
INFO 12:19:42,395 akka.contrib.pattern.ClusterSingletonManager - Retry [1],
sending TakeOverFromMe to [None]
INFO 12:19:43,415 akka.contrib.pattern.ClusterSingletonManager - Retry [2],
sending TakeOverFromMe to [None]
INFO 12:19:44,435 akka.contrib.pattern.ClusterSingletonManager - Retry [3],
sending TakeOverFromMe to [None]
INFO 12:19:45,455 akka.contrib.pattern.ClusterSingletonManager - Retry [4],
sending TakeOverFromMe to [None]
INFO 12:19:46,475 akka.contrib.pattern.ClusterSingletonManager - Retry [5],
sending TakeOverFromMe to [None]
ERROR 12:19:47,517 akka.actor.OneForOneStrategy - Expected hand-over to
[None] never occured
akka.contrib.pattern.ClusterSingletonManagerIsStuck: Expected hand-over to
[None] never occured
at