I see this issue whenever AWS has a network hiccup. I have a multi-node cluster behind an LB, using akka cluster sharding along with akka persistence writing to a Dynamo journal. I'm currently on akka 2.3.11, which means the same shared Dynamo table used to store my persistent actors' events is also used to store the cluster sharding coordinator's state. I know of no way to separate the two until akka 2.4.
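For context, the relevant slice of my config looks roughly like this. The "dynamodb-journal" plugin id is whatever your Dynamo journal plugin uses (mine comes from the akka-persistence-dynamodb plugin), so treat it as a placeholder; the point is that there is only one journal setting for everything:

akka {
  persistence {
    # One journal for everything on 2.3: my own persistent actors AND the
    # ShardCoordinator's internal state both land in this Dynamo table.
    journal.plugin = "dynamodb-journal"
  }
  cluster {
    # (nodes / 2) + 1, e.g. 3 for a 5-node cluster
    min-nr-of-members = 3
  }
}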
I have min-nr-of-members set to (nodes / 2) + 1. Things work fine during clean node restarts and code deploys. However, I run into the same problem as the OP whenever AWS suffers an intermittent network partition: the nodes within the akka cluster can't fully communicate with each other, yet the LB can still reach all of them. Because cluster sharding state is persisted into the same journal as everything else, eventually cluster sharding gets upset and panics, causing the error below to be repeated constantly until the full cluster is shut down and started back up cleanly.

What should a developer do to isolate cluster sharding from split-brain issues? min-nr-of-members appears to be checked only during cluster startup. Once the cluster is up and participating, what happens automatically when it detects that membership has dropped below min-nr-of-members? I can attempt to guard against this in application land by subscribing to the cluster events and taking some action, along the lines of the sketch below. I'm not sure there's anything I can do to prevent the cluster sharding internals from running into this state, however, since writing cluster state to a shared journal is unavoidable and network issues are unavoidable.
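Something like the following is what I have in mind. This is only a sketch of the idea, not battle-tested code: the expectedNodes parameter, the quorum math, and the choice to shut the whole ActorSystem down (rather than, say, stopping just the shard regions) are all my own assumptions about what a reasonable guard might look like.

import akka.actor.{ Actor, ActorLogging, Address }
import akka.cluster.{ Cluster, MemberStatus }
import akka.cluster.ClusterEvent._

// Hypothetical guard: each node watches cluster membership and shuts
// itself down when fewer than (expectedNodes / 2) + 1 members are Up
// and reachable. Nothing in akka 2.3 does this for you.
class QuorumWatcher(expectedNodes: Int) extends Actor with ActorLogging {
  val cluster = Cluster(context.system)
  val quorum  = expectedNodes / 2 + 1

  var members     = Set.empty[Address]
  var unreachable = Set.empty[Address]

  // Subscribing without an initial-state mode means the first message
  // is a CurrentClusterState snapshot, followed by individual events.
  override def preStart(): Unit =
    cluster.subscribe(self, classOf[MemberEvent], classOf[ReachabilityEvent])

  override def postStop(): Unit =
    cluster.unsubscribe(self)

  def receive = {
    case state: CurrentClusterState =>
      members = state.members.iterator
        .collect { case m if m.status == MemberStatus.Up => m.address }
        .toSet
      unreachable = state.unreachable.map(_.address)
      check()

    case MemberUp(m)          => members += m.address; check()
    case MemberRemoved(m, _)  => members -= m.address; unreachable -= m.address; check()
    case UnreachableMember(m) => unreachable += m.address; check()
    case ReachableMember(m)   => unreachable -= m.address; check()
    case _: MemberEvent       => // other membership transitions don't change the count
  }

  def check(): Unit = {
    val reachable = (members -- unreachable).size
    if (reachable < quorum) {
      // This side of the partition has lost its majority: stop the whole
      // node rather than risk a second shard coordinator writing to the
      // shared journal. Stopping only the shard regions is a gentler option.
      log.error("Only {} of the required {} members reachable; shutting down", reachable, quorum)
      context.system.shutdown()
    }
  }
}

Each node would start one of these at boot, e.g. system.actorOf(Props(classOf[QuorumWatcher], 5), "quorum-watcher") for a five-node cluster. Whether that's the right action, or even sufficient, is exactly what I'm asking.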
The error that repeats until the whole cluster is bounced:

2015-08-02 05:33:30.138UTC [Device] ERROR akka.actor.OneForOneStrategy DeviceSvc-akka.actor.default-dispatcher-3 akka://DeviceSvc/user/sharding/UserDeviceIndexCoordinator/singleton/coordinator - requirement failed: Shard [2] already allocated: State(Map(-2 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 0 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 2 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], -1 -> Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773], 3 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]),Map(Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203] -> Vector(2, 3, -2, 0), Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773] -> Vector(-1)),Set())

java.lang.IllegalArgumentException: requirement failed: Shard [2] already allocated: State(Map(-2 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 0 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], 2 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203], -1 -> Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773], 3 -> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]),Map(Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203] -> Vector(2, 3, -2, 0), Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773] -> Vector(-1)),Set())
    at scala.Predef$.require(Predef.scala:219) ~[org.scala-lang.scala-library-2.11.6.jar:na]
    at akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1119) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
    at akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1242) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) ~[org.scala-lang.scala-library-2.11.6.jar:na]
    at akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.persistence.Recovery$class.runReceive(Recovery.scala:48) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.contrib.pattern.ShardCoordinator.runReceive(ClusterSharding.scala:1195) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
    at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.persistence.Recovery$class.withCurrentPersistent(Recovery.scala:185) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
    at akka.contrib.pattern.ShardCoordinator.withCurrentPersistent(ClusterSharding.scala:1195) ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
    at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:33) ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]