I see this issue whenever AWS has a network hiccup. I have a multi-node cluster behind an LB, using akka cluster sharding along with akka persistence writing to a Dynamo journal. I'm currently on akka 2.3.11, which means the same shared Dynamo table used to store my persistent actors' events is also used to store the cluster sharding coordinator's state. I know of no way to separate the two until akka 2.4.
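For context, the relevant slice of my config looks roughly like this. The "dynamodb-journal" plugin id is whatever your Dynamo journal plugin uses (mine comes from the akka-persistence-dynamodb plugin), so treat it as a placeholder; the point is that there is only one journal setting for everything:

akka {
  persistence {
    # One journal for everything on 2.3: my own persistent actors AND the
    # ShardCoordinator's internal state both land in this Dynamo table.
    journal.plugin = "dynamodb-journal"
  }
  cluster {
    # (nodes / 2) + 1, e.g. 3 for a 5-node cluster
    min-nr-of-members = 3
  }
}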
I have min-nr-of-members set to (nodes / 2) + 1. Things work fine during clean node restarts and code deploys. However, I run into the same problem as the OP whenever AWS suffers an intermittent network partition: the nodes within the akka cluster can't fully communicate with each other, yet the LB can still reach all of them. Because cluster sharding state is persisted into the same journal as everything else, eventually cluster sharding gets upset and panics, causing the error below to be repeated constantly until the full cluster is shut down and started back up cleanly.

What should a developer do to isolate cluster sharding from split-brain issues? min-nr-of-members appears to be checked only during cluster startup. Once the cluster is up and participating, what happens automatically when it detects that membership has dropped below min-nr-of-members? I can attempt to guard against this in application land by subscribing to the cluster events and taking some action, along the lines of the sketch below. I'm not sure there's anything I can do to prevent the cluster sharding internals from running into this state, however, since writing cluster state to a shared journal is unavoidable and network issues are unavoidable.
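Something like the following is what I have in mind. This is only a sketch of the idea, not battle-tested code: the expectedNodes parameter, the quorum math, and the choice to shut the whole ActorSystem down (rather than, say, stopping just the shard regions) are all my own assumptions about what a reasonable guard might look like.

import akka.actor.{ Actor, ActorLogging, Address }
import akka.cluster.{ Cluster, MemberStatus }
import akka.cluster.ClusterEvent._

// Hypothetical guard: each node watches cluster membership and shuts
// itself down when fewer than (expectedNodes / 2) + 1 members are Up
// and reachable. Nothing in akka 2.3 does this for you.
class QuorumWatcher(expectedNodes: Int) extends Actor with ActorLogging {
  val cluster = Cluster(context.system)
  val quorum  = expectedNodes / 2 + 1

  var members     = Set.empty[Address]
  var unreachable = Set.empty[Address]

  // Subscribing without an initial-state mode means the first message
  // is a CurrentClusterState snapshot, followed by individual events.
  override def preStart(): Unit =
    cluster.subscribe(self, classOf[MemberEvent], classOf[ReachabilityEvent])

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive = {
    case state: CurrentClusterState =>
      members = state.members.iterator
        .collect { case m if m.status == MemberStatus.Up => m.address }
        .toSet
      unreachable = state.unreachable.map(_.address)
      check()

    case MemberUp(m)          => members += m.address; check()
    case MemberRemoved(m, _)  => members -= m.address; unreachable -= m.address; check()
    case UnreachableMember(m) => unreachable += m.address; check()
    case ReachableMember(m)   => unreachable -= m.address; check()
    case _: MemberEvent       => // other membership transitions don't change the count
  }

  def check(): Unit = {
    val reachable = (members -- unreachable).size
    if (reachable < quorum) {
      // This side of the partition has lost its majority: stop the whole
      // node rather than risk a second shard coordinator writing to the
      // shared journal. Stopping only the shard regions is a gentler option.
      log.error("Only {} of the required {} members reachable; shutting down", reachable, quorum)
      context.system.shutdown()
    }
  }
}

Each node would start one of these at boot, e.g. system.actorOf(Props(classOf[QuorumWatcher], 5), "quorum-watcher") for a five-node cluster. Whether that's the right action, or even sufficient, is exactly what I'm asking.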
The error that repeats until the whole cluster is bounced:

2015-08-02 05:33:30.138UTC [Device] ERROR akka.actor.OneForOneStrategy DeviceSvc-akka.actor.default-dispatcher-3 akka://DeviceSvc/user/sharding/UserDeviceIndexCoordinator/singleton/coordinator - requirement failed: Shard [2] already allocated: State(Map(-2 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 0 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 2 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], -1 -> Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773], 3 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]),Map(Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203] -> Vector(2, 3, -2, 0), Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773] -> Vector(-1)),Set())

java.lang.IllegalArgumentException: requirement failed: Shard [2] already allocated: State(Map(-2 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 0 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 2 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], -1 -> Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773], 3 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]),Map(Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203] -> Vector(2, 3, -2, 0), Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773] -> Vector(-1)),Set())
    at scala.Predef$.require(Predef.scala:219) ~[org.scala-lang.scala-library-2.11.6.jar:na]
    at akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1119) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
    at akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1242) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) ~[org.scala-lang.scala-library-2.11.6.jar:na]
    at akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.persistence.Recovery$class.runReceive(Recovery.scala:48) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.contrib.pattern.ShardCoordinator.runReceive(ClusterSharding.scala:1195) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
    at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.persistence.Recovery$class.withCurrentPersistent(Recovery.scala:185) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.contrib.pattern.ShardCoordinator.withCurrentPersistent(ClusterSharding.scala:1195) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
    at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]