Thanks, Patrik. That is precisely what happened. I had been using auto-down-unreachable-after, and while this appeared to work fine in the normal "rolling update" mode of deploying nodes to our cluster, it caused split-brain effects during transient periods of unreachability. I have since replaced auto-down-unreachable-after with an implementation that, upon detecting an UnreachableMember cluster event, queries AWS/ECS to determine whether the node is still present in the cluster (and service). If it is not, the implementation marks the node down; otherwise, it leaves the node alone. So far, this seems to work just fine. (And yes, we did clean up the duplicate persistence records prior to restarting the cluster.)
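For anyone wanting to try something similar, the decision logic described above can be sketched roughly as follows. This is only an illustration, not the code we actually run: `NodeRegistry`, `DowningDecision`, and `UnreachableHandler` are hypothetical names, and the ECS lookup is stubbed out. It also assumes that a failed lookup should be treated as "node still present", so that an AWS outage never causes a node to be downed.

```scala
import scala.util.Try

// Hypothetical abstraction over "ask AWS/ECS whether this node still exists".
// A real implementation would call the AWS SDK; that part is not shown here.
trait NodeRegistry {
  def isStillRegistered(host: String): Boolean
}

sealed trait DowningDecision
case object Down extends DowningDecision        // node is gone: safe to mark it down
case object KeepWaiting extends DowningDecision // node still exists: probably a transient partition

object UnreachableHandler {
  // Decide what to do about a node the cluster reports as unreachable.
  // Assumption: if the lookup itself fails, we act as if the node is still
  // registered and do nothing, because downing a live node risks split brain.
  def decide(host: String, registry: NodeRegistry): DowningDecision = {
    val stillThere = Try(registry.isStillRegistered(host)).getOrElse(true)
    if (stillThere) KeepWaiting else Down
  }
}

object Demo extends App {
  // Stand-in registry: only 10.0.0.1 is still a running task.
  val registry = new NodeRegistry {
    def isStillRegistered(host: String): Boolean = Set("10.0.0.1").contains(host)
  }

  println(UnreachableHandler.decide("10.0.0.1", registry)) // prints KeepWaiting
  println(UnreachableHandler.decide("10.0.0.2", registry)) // prints Down
}
```

In an Akka actor, `decide` would be called when an `UnreachableMember` event arrives (after subscribing with `Cluster(context.system).subscribe(self, classOf[UnreachableMember])`), and a `Down` result would lead to `cluster.down(member.address)`. Keeping the decision in a plain function makes it easy to test without starting a cluster.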
And yes, I'm aware of the Split Brain Resolver from Lightbend. I'm sure it would work well too. At this point in our journey, where we have no revenue and little funding, we're looking primarily to open-source and roll-our-own solutions. But as we get into production and have customers, we'll likely take advantage of Lightbend services and products.

Thanks again. -- Eric

On Sunday, August 7, 2016 at 9:48:08 AM UTC-7, Patrik Nordwall wrote:
>
> It's typically caused by multiple persistent actors with the same
> persistenceId running at the same time. E.g. because there was a network
> split and your cluster was split into two separate clusters, thereby
> starting multiple persistent actors. That is why we so strongly recommend
> against using auto-downing and instead recommend Split Brain Resolver
> <http://doc.akka.io/docs/akka/rp-current/scala/split-brain-resolver.html>
> or similar.
>
> Now you need to clean up the corrupted data before starting the system.
>
> http://doc.akka.io/docs/akka/2.4/scala/cluster-sharding.html#Removal_of_Internal_Cluster_Sharding_Data
>
> Have you changed the default mode=repair-by-discard-old in the config of
> the replay filter?
>
> https://github.com/akka/akka/blob/master/akka-persistence/src/main/resources/reference.conf#L131
>
> Regards,
> Patrik
>
> On Tue, Aug 2, 2016 at 9:44 PM, Eric Swenson <er...@swenson.org> wrote:
>
>> I'm getting this error consistently now, and don't know why this is
>> happening nor what to do about it. I form the persistenceId this way:
>>
>> override def persistenceId: String = self.path.parent.parent.name +
>> "-" + self.path.name
>>
>> So I don't see how I could have two persisters with the same id. I'm
>> unable to bring up my akka cluster due to this error.
>>
>> Any suggestions?
>> I'm running akka 2.4.8 with akka.persistence:
>>
>> journal.plugin = "cassandra-journal"
>> snapshot-store.plugin = "cassandra-snapshot-store"
>>
>>
>> On Monday, April 25, 2016 at 3:34:21 AM UTC-7, Tim Pigden wrote:
>>>
>>> Hi
>>> I'm getting this message. I'm probably doing something wrong, but any
>>> idea what that might be? I know what messages I'm persisting, and this
>>> particular test is one in which I kill off my persistor and restart it.
>>> Or does it indicate the message is failing to deserialize or something
>>> like that?
>>>
>>> 2016-04-25 10:33:47,570 - ERROR - from
>>> com.optrak.opkakka.ddd.persistence.SimplePersistor Persistence failure when
>>> replaying events for persistenceId [shd-matrix-testId]. Last known sequence
>>> number [0]
>>> java.lang.IllegalStateException: Invalid replayed event [1] in buffer from
>>> old writer [f6bd09c4-1f1c-4710-8cf6-c6f0776f39d3] with persistenceId
>>> [shd-matrix-testId]
>>> at
>>> akka.persistence.journal.ReplayFilter$$anonfun$receive$1.applyOrElse(ReplayFilter.scala:125)
>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:480)
>>> at
>>> akka.persistence.journal.ReplayFilter.aroundReceive(ReplayFilter.scala:50)
>>>
>>> Any suggestions much appreciated!
>>>
>> --
>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Akka User List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to akka-user+...@googlegroups.com.
>> To post to this group, send email to akka...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/akka-user.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
>
> Patrik Nordwall
> Akka Tech Lead
> Lightbend <http://www.lightbend.com/> - Reactive apps on the JVM
> Twitter: @patriknw