[akka-user] Sharding problem when restarting Cluster

2014-08-06 Thread Moax76
Hi,

We have a project that uses Akka Persistence (2.3.4) with sharding in a 
cluster (with 2 nodes).
It has 3 different types of AbstractPersistentActor, all using Sharding.
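
Each type is registered with ClusterSharding roughly like this (a simplified 
sketch against the akka-contrib 2.3.x API, not our exact code; the Envelope 
message type and the "connection" type name are just placeholders):

    import akka.actor.{ActorRef, ActorSystem, Props}
    import akka.contrib.pattern.{ClusterSharding, ShardRegion}

    // Placeholder envelope type used only for this sketch.
    final case class Envelope(entityId: String, payload: Any)

    object ConnectionSharding {
      // Extracts the entity id and the message that is forwarded to the entity.
      val idExtractor: ShardRegion.IdExtractor = {
        case Envelope(id, payload) => (id, payload)
      }

      // Maps a message to a shard id (here simply a modulo over the entity id hash).
      val shardResolver: ShardRegion.ShardResolver = {
        case Envelope(id, _) => (math.abs(id.hashCode) % 10).toString
      }

      def start(system: ActorSystem, connectionProps: Props): ActorRef =
        ClusterSharding(system).start(
          typeName = "connection",
          entryProps = Some(connectionProps),
          idExtractor = idExtractor,
          shardResolver = shardResolver)
    }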

When the cluster is running, everything works fine, and stopping the cluster 
and restarting it again usually works too. But from time to time the following 
happens when trying to start the cluster:

When starting the first node (server1), the logs get flooded with 
log entries like this:

ERROR akka.actor.OneForOneStrategy akka://NCSSystem/user/data/connectionCoordinator/singleton/coordinator:
requirement failed: Region Actor[akka.tcp://NCSSystem@server2/user/data/connection#789496176] not registered: State(Map(),Map(),Set())
java.lang.IllegalArgumentException: requirement failed: Region Actor[akka.tcp://NCSSystem@server2/user/data/connection#789496176] not registered: State(Map(),Map(),Set())
    at scala.Predef$.require(Predef.scala:233) ~[scala-library-2.10.4.jar:na]
    at akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1115) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
    at akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1236) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) ~[scala-library-2.10.4.jar:na]
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) ~[scala-library-2.10.4.jar:na]
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) ~[scala-library-2.10.4.jar:na]
    at akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.persistence.Recovery$class.withCurrentPersistent(Recovery.scala:176) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.contrib.pattern.ShardCoordinator.withCurrentPersistent(ClusterSharding.scala:1192) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
    at akka.persistence.Recovery$State$class.processPersistent(Recovery.scala:33) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.persistence.Recovery$$anon$1.processPersistent(Recovery.scala:95) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.persistence.Recovery$$anon$1.aroundReceive(Recovery.scala:101) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.persistence.Recovery$class.aroundReceive(Recovery.scala:256) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.contrib.pattern.ShardCoordinator.akka$persistence$Eventsourced$$super$aroundReceive(ClusterSharding.scala:1192) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
    at akka.persistence.Eventsourced$$anon$1.aroundReceive(Eventsourced.scala:35) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:369) ~[akka-persistence-experimental_2.10-2.3.4.jar:na]
    at akka.contrib.pattern.ShardCoordinator.aroundReceive(ClusterSharding.scala:1192) ~[akka-contrib_2.10-2.3.4.jar:2.3.4]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [akka-actor_2.10-2.3.4.jar:na]
    at akka.actor.ActorCell.invoke(ActorCell.scala:487) [akka-actor_2.10-2.3.4.jar:na]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [akka-actor_2.10-2.3.4.jar:na]
    at akka.dispatch.Mailbox.run(Mailbox.scala:220) [akka-actor_2.10-2.3.4.jar:na]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [akka-actor_2.10-2.3.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.4.jar:na]




This happens for all 3 coordinators, and it does not look like it will ever 
stop logging. The connection#789496176 part is not the same in every entry.

The only way I have found to recover from this situation is to manually 
delete all journal and snapshot entries for the coordinators.

Any help or hint would be appreciated.

Best regards,
Morten Kjetland


Re: [akka-user] Sharding problem when restarting Cluster

2014-08-06 Thread Konrad Malawski
Hi Morten,
thanks for reporting!
Which journal plugin are you using?

It looks like during replay it gets a ShardHomeAllocated without getting
ShardRegionProxyRegistered first – which makes it blow up (we must first
register, then allocate the shard).

One reason could be that the persist of ShardRegionProxyRegistered never
succeeded...?
Would you be able to verify whether your journal contains these events (or
if ShardRegionProxyRegistered is missing)?
It would be great to track down the root of this problem. It *could* be
a bug on our side, but it is hard to pinpoint exactly yet.
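
Conceptually, recovery folds the persisted events into the coordinator state
in order, and allocating a shard home for a region that was never registered
fails a precondition. A much simplified sketch of that invariant (illustration
only, not the actual ClusterSharding source):

    // Much simplified model of the coordinator's persisted state; not the real
    // akka.contrib.pattern.ShardCoordinator internals.
    final case class CoordinatorState(
        shards: Map[String, String],          // shard id -> region
        regions: Map[String, Vector[String]], // region -> allocated shards
        proxies: Set[String]) {

      // Applied when a region-registration event is replayed.
      def regionRegistered(region: String): CoordinatorState =
        copy(regions = regions.updated(region, regions.getOrElse(region, Vector.empty)))

      // Applied when a ShardHomeAllocated event is replayed. If the registration
      // event is missing, or comes back in the wrong order, this require fails
      // with exactly the "Region ... not registered" error from the log above.
      def shardHomeAllocated(shard: String, region: String): CoordinatorState = {
        require(regions.contains(region), s"Region $region not registered: $this")
        copy(
          shards = shards.updated(shard, region),
          regions = regions.updated(region, regions(region) :+ shard))
      }
    }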

-- 
Cheers,
Konrad 'ktoso' Malawski
hAkker @ Typesafe





Re: [akka-user] Sharding problem when restarting Cluster

2014-08-07 Thread Morten Kjetland
Hi,

Turns out there was a bug in our homebrew jdbc-snapshot implementation.

The loaded SelectedSnapshot was populated with Option(state) instead of
just the state, so the following lines in ShardCoordinator were not executed:

 case SnapshotOffer(_, state: State) =>
  log.debug("receiveRecover SnapshotOffer {}", state)
  persistentState = state

The snapshot was therefore never applied, so when it started receiving
events with sequenceNr after the snapshot, it blew up.
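
In other words, the value our snapshot store handed back was an Option wrapping
the state, so the typed pattern match above never fired. A minimal illustration
of the mismatch (plain Scala, just to show the effect; not the ShardCoordinator
code itself):

    object SnapshotMismatch extends App {
      final case class State(entries: Map[String, String])

      // What our broken snapshot plugin effectively handed back vs. what it should be.
      val fromBrokenPlugin: Any = Some(State(Map.empty)) // Option(state)
      val fromFixedPlugin: Any  = State(Map.empty)       // the state itself

      def applySnapshot(snapshot: Any): String = snapshot match {
        case state: State => s"snapshot applied: $state"
        case other        => s"snapshot ignored: $other" // Option-wrapped value falls through here
      }

      println(applySnapshot(fromBrokenPlugin)) // snapshot ignored: Some(State(Map()))
      println(applySnapshot(fromFixedPlugin))  // snapshot applied: State(Map())
    }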

Thanks a lot for pointing me in the right direction.

Best regards,
Morten Kjetland


On Wed, Aug 6, 2014 at 2:12 PM, Morten Kjetland  wrote:

> Thanks for the response,
>
> We are using a homebrew jdbc journal.
>
> I checked the journal and ShardRegionProxyRegistered is written to it.
> But I was unable to reproduce the problem now.
> It might be a problem related to snapshotting in combination with a bug in
> our jdbc journal.
> I'll try to reproduce it later and check the db again.
>
> I just saw that https://github.com/dnvriend/akka-persistence-jdbc was
> worked on during the summer, so I'll try to use that one instead of our
> own, and see if the problem goes away.
>
> Best regards,
> Morten Kjetland
>



Re: [akka-user] Sharding problem when restarting Cluster

2014-08-07 Thread Konrad Malawski
Great to hear you've found the problem!
We'll provide a TCK for journal plugins with the next (minor) release, so I
suggest running your custom plugin through it to see if it's really valid :-)
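
Roughly along these lines (a sketch of how such a spec will look; the exact base
class and how the config is supplied may still change before release, and
"my-jdbc-journal" is a placeholder for your plugin id):

    import akka.persistence.journal.JournalSpec
    import com.typesafe.config.ConfigFactory

    // Sketch only: exercises a custom journal plugin against the persistence TCK.
    // "my-jdbc-journal" is a placeholder for the plugin id defined in your config.
    class MyJdbcJournalSpec extends JournalSpec {
      lazy val config = ConfigFactory.parseString(
        """
        akka.persistence.journal.plugin = "my-jdbc-journal"
        """)
    }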

Happy hakking!





-- 
Cheers,
Konrad 'ktoso' Malawski
hAkker @ Typesafe





Re: [akka-user] Sharding problem when restarting Cluster

2014-11-10 Thread Richard Bowker
I have seen a similar problem when restarting nodes in a cluster using 
sharding.

After restarting, the node with the shard coordinator went into an infinite 
error loop.

I was using Akka 2.3.6 and "com.github.krasserm" %% 
"akka-persistence-cassandra" % "0.3.4" as the journal/persistence store.

A section of the error log is below. I did not know how to recover from this 
other than manually deleting all the Akka keyspaces from the database, which 
obviously isn't ideal!

Any thoughts?

thanks

[ERROR] [11/10/2014 13:16:21.969] [ClusterSystem-akka.actor.default-dispatcher-17] [akka://ClusterSystem/user/sharding/PollServiceCoordinator/singleton/coordinator]
requirement failed: Region Actor[akka.tcp://ClusterSystem@172.31.18.169:2552/user/sharding/PollService#546005322] not registered:
State(Map(
  test47 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test6 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test30 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test42 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test29 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test18 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test14 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test36 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test25 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test28 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test43 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test32 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test20 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test15 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test33 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test22 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test0 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test44 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test41 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test37 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test16 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test9 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test23 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test34 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test45 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test38 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test8 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test19 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test35 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828],
  test5 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test24 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
  test2 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828]),
Map(
  Actor[akka://ClusterSystem/user/sharding/PollService#1625036981] ->
    Vector(test6, test18, test28, test20, test33, test44, test16, test34, test8, test24, test5, test11, test19, test25, test30, test38, test47),
  Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828] ->
    Vector(test42, test36, test32, test15, test0, test41, test23, test45, test35, test2, test9, test14, test22, test29, test37, test43)),
Set(Actor[akka.tcp://ClusterSystem@172.31.24.129:49813/user/sharding/PollService#1785638107]))
java.lang.IllegalArgumentException: requirement failed: Region Actor[akka.tcp://ClusterSystem@172.31.18.169:2552/user/sharding/PollService#546005322] not registered:
State(Map(test47 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test6 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test42 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test29 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test14 -> Actor[akk

Re: [akka-user] Sharding problem when restarting Cluster

2014-11-10 Thread Patrik Nordwall
Hi Richard,

That is not good. We have seen a similar issue a few times and tracked it
down to bugs in the journal implementations. It happens when events are
replayed in the wrong order.
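
If you want a quick way to see whether replay order is the culprit, something
like this can help (a rough sketch only; point it at the persistenceId you want
to inspect):

    import akka.actor.{ActorLogging, Props}
    import akka.persistence.{PersistentActor, RecoveryCompleted}

    // Rough sketch: replays a given persistenceId and logs an error if the
    // journal delivers events with non-increasing sequence numbers.
    class ReplayOrderChecker(override val persistenceId: String)
        extends PersistentActor with ActorLogging {

      private var lastSeen = 0L

      override def receiveRecover: Receive = {
        case RecoveryCompleted =>
          log.info("replay of {} finished at sequenceNr {}", persistenceId, lastSeen)
        case _ =>
          if (lastSequenceNr <= lastSeen)
            log.error("non-increasing replay: sequenceNr {} after {}", lastSequenceNr, lastSeen)
          lastSeen = lastSequenceNr
      }

      override def receiveCommand: Receive = {
        case msg => log.debug("ignoring {}", msg)
      }
    }

    object ReplayOrderChecker {
      def props(persistenceId: String): Props = Props(new ReplayOrderChecker(persistenceId))
    }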

Is there a way we can reproduce this?

Regards,
Patrik


Re: [akka-user] Sharding problem when restarting Cluster

2014-11-10 Thread Richard Bowker
Hi Patrik, unfortunately not. In fact it's only happened once to me so far, 
so it may be a difficult one to reproduce.

I will of course get back to you if I can find a trigger.

Thanks!


Re: [akka-user] Sharding problem when restarting Cluster

2014-11-10 Thread Patrik Nordwall
On Mon, Nov 10, 2014 at 5:18 PM, Richard Bowker <
mechajohntravo...@googlemail.com> wrote:

> Hi Patrik, unfortunately not. In fact it's only happened once to me so far,
> so it may be a difficult one to reproduce.
>
> I will of course get back to you if I can find a trigger.
>

Thanks. That is the problem with these bugs. I don't know much about
Cassandra. Is it possible to export the data from Cassandra if this happens
again, so it could be analyzed (replayed) by us? Given that you don't have
any sensitive information in it.
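
Even a plain dump of the journal table would help, something like this (a rough
sketch with the DataStax Java driver; the contact point and the akka.messages
keyspace/table are assumptions based on what I believe are the plugin defaults,
so adjust to your setup):

    import com.datastax.driver.core.Cluster
    import scala.collection.JavaConverters._

    // Rough sketch: dump every row of the journal table for inspection.
    // "127.0.0.1" and the akka.messages keyspace/table are assumptions
    // (the plugin defaults, as far as I remember); adjust to your setup.
    object DumpJournal extends App {
      val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
      val session = cluster.connect()
      try
        session.execute("SELECT * FROM akka.messages").iterator().asScala.foreach(println)
      finally {
        session.close()
        cluster.close()
      }
    }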

/Patrik


>
> Tackar!
>
> On Monday, November 10, 2014 4:04:18 PM UTC, Patrik Nordwall wrote:
>>
>> Hi Richard,
>>
>> That is not good. We have seen similar issue a few times and tracked it
>> down to bugs in the journal implementations. It will happen when events are
>> replayed in the wrong order.
>>
>> Is there a way we can reproduce this?
>>
>> Regards,
>> Patrik
>>
>> On Mon, Nov 10, 2014 at 2:42 PM, Richard Bowker > com> wrote:
>>
>> I have had seen a similar problem when restarting nodes in a cluster
>> using sharding.
>>
>> after restarting, the node with the shard coordinator went into an
>> infinite error loop.
>>
>> I was using akka 2.3.6 and "com.github.krasserm" %%
>> "akka-persistence-cassandra" % "0.3.4" as the journal/persistence store.
>>
>> section of the error log below, I didn not know what to do to recover
>> this other than just manually delete all the akka keystores from the
>> database which obviously isn't ideal!
>>
>> any thoughts?
>>
>> thanks
>>
>> [ERROR] [11/10/2014 13:16:21.969] 
>> [ClusterSystem-akka.actor.default-dispatcher-17]
>> [akka://ClusterSystem/user/sharding/PollServiceCoordinator/
>> singleton/coordinator]
>>
>> requirement failed: Region Actor[akka.tcp://ClusterSystem
>> @172.31.18.169:2552/user/sharding/PollService#546005322] not registered:
>> State(Map(test47 -> Actor
>>
>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test6 ->
>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30
>> -> Actor
>>
>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test42 ->
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test29 ->
>>
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981],
>>
>> test14 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test36 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>> PollService#1716980828], test25 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981], test28 ->
>>
>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test43
>> -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828],
>>
>> test32 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test20 -> Actor
>>
>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test15 ->
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test33 ->
>>
>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test22
>> -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test0
>>
>> -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981],
>>
>> test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
>> test41 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>> PollService#1716980828], test37 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>> PollService#1716980828], test16 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981], test9 ->
>>
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test23 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>> PollService#1716980828], test34 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981], test45 ->
>>
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test38 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981], test8
>>
>> -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
>> test19 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
>> test35 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>> PollService#1716980828], test5 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981], test24 ->
>>
>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test2
>> -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828]
>> 

Re: [akka-user] Sharding problem when restarting Cluster

2014-11-10 Thread Richard Bowker
Sure, no problem.


Re: [akka-user] Sharding problem when restarting Cluster

2014-11-11 Thread Richard Bowker
Hi Patrik, I have managed to reproduce it twice more. We have Typesafe support, 
so I will get in touch with them to discuss how best to send the repro 
setup, as it's not a simple attachment!

thanks

Rich


Re: [akka-user] Sharding problem when restarting Cluster

2014-11-11 Thread Patrik Nordwall
On Tue, Nov 11, 2014 at 1:34 PM, Richard Bowker <
mechajohntravo...@googlemail.com> wrote:

> Hi Patrik, I have managed to reproduce it twice more. We have Typesafe
> support, so I will get in touch with them to discuss how best to send the
> repro setup, as it's not a simple attachment!
>

Excellent!
Thanks



Re: [akka-user] Sharding problem when restarting Cluster

2015-06-09 Thread Anders Båtstrand
Is there any news about this? I have experienced the same, using Akka 
2.3.10 and Cassandra (latest version of the plugin).

Best regards,

Anders


Re: [akka-user] Sharding problem when restarting Cluster

2015-06-09 Thread Brandon Arp
I am seeing this as well with Akka 2.3.10.


Re: [akka-user] Sharding problem when restarting Cluster

2015-06-10 Thread Patrik Nordwall
We need logs (debug level) and a description of the scenario. Perhaps it is
best that you create a github issue and we can discuss it over there.
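
(Capturing that output usually just means raising akka's log level before the
system is started; a minimal sketch using standard akka settings, to be adjusted
for whatever logging backend you use:)

import com.typesafe.config.ConfigFactory

object DebugLoggingConfig {
  // Minimal sketch: raise akka's log level and turn on a couple of the
  // built-in diagnostic switches before creating the ActorSystem.
  val config = ConfigFactory.parseString("""
    akka.loglevel = "DEBUG"
    akka.actor.debug.receive = on
    akka.remote.log-remote-lifecycle-events = on
  """)
}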

/Patrik

On Tue, 9 Jun 2015 at 23:58, Brandon Arp wrote:

> I am seeing this as well with Akka 2.3.10.
>
>
> On Tuesday, June 9, 2015 at 2:27:18 AM UTC-7, Anders Båtstrand wrote:
>
>> Is there any news about this? I have experienced the same, using Akka
>> 2.3.10 and Cassandra (latest version of the plugin).
>>
>> Best regards,
>>
>> Anders
>>
>> tirsdag 11. november 2014 13.34.03 UTC+1 skrev Richard Bowker følgende:
>>
>> Hi Patrik, I have managed to repro it twice again. We have typesafe
>> support so I will get in touch with them to discuss how best to send the
>> repro setup, as it's not a simple attachment!
>>
>> thanks
>>
>> Rich
>>
>> On Monday, November 10, 2014 4:35:24 PM UTC, Patrik Nordwall wrote:
>>
>>
>>
>> On Mon, Nov 10, 2014 at 5:18 PM, Richard Bowker <
>> mechajoh...@googlemail.com> wrote:
>>
>> Hi Patrik, unfortunately not.  In fact its only happened once to me so
>> far so may be a difficult one to reproduce.
>>
>> I will of course get back to you if I can find a trigger.
>>
>>
>> Thanks. That is problematic with these bugs. I don't know much about
>> Cassandra. Is it possible to export the data from cassandra if this happens
>> again so it could be analyzed (replayed) by us? Given that you don't have
>> any sensitive information in it.
>>
>> /Patrik
>>
>>
>>
>> Tackar!
>>
>> On Monday, November 10, 2014 4:04:18 PM UTC, Patrik Nordwall wrote:
>>
>> Hi Richard,
>>
>> That is not good. We have seen similar issue a few times and tracked it
>> down to bugs in the journal implementations. It will happen when events are
>> replayed in the wrong order.
>>
>> Is there a way we can reproduce this?
>>
>> Regards,
>> Patrik
>>
>> On Mon, Nov 10, 2014 at 2:42 PM, Richard Bowker > com> wrote:
>>
>> I have had seen a similar problem when restarting nodes in a cluster
>> using sharding.
>>
>> after restarting, the node with the shard coordinator went into an
>> infinite error loop.
>>
>> I was using akka 2.3.6 and "com.github.krasserm" %%
>> "akka-persistence-cassandra" % "0.3.4" as the journal/persistence store.
>>
>> section of the error log below, I didn not know what to do to recover
>> this other than just manually delete all the akka keystores from the
>> database which obviously isn't ideal!
>>
>> any thoughts?
>>
>> thanks
>>
>> [ERROR] [11/10/2014 13:16:21.969] 
>> [ClusterSystem-akka.actor.default-dispatcher-17]
>> [akka://ClusterSystem/user/sharding/PollServiceCoordinator/
>> singleton/coordinator]
>>
>> requirement failed: Region Actor[akka.tcp://ClusterSystem
>> @172.31.18.169:2552/user/sharding/PollService#546005322] not registered:
>> State(Map(test47 -> Actor
>>
>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test6 ->
>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test30
>> -> Actor
>>
>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test42 ->
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test29 ->
>>
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981],
>>
>> test14 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test36 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>> PollService#1716980828], test25 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981], test28 ->
>>
>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test43
>> -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828],
>>
>> test32 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test20 -> Actor
>>
>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test15 ->
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test33 ->
>>
>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], test22
>> -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test0
>>
>> -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981],
>>
>> test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981],
>> test41 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>> PollService#1716980828], test37 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>> PollService#1716980828], test16 -> Actor[akka://ClusterSystem/
>> user/sharding/PollService#1625036981], test9 ->
>>
>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>> sharding/PollService#1716980828], test23 -> Actor
>>
>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-15 Thread GG
Was there a github issue created for this? I am seeing something that looks 
very similar and I don't want to create a duplicate ticket if one already 
exists.

On Wednesday, June 10, 2015 at 1:27:55 AM UTC-7, Patrik Nordwall wrote:
>
> We need logs (debug level) and description of the scenario. Perhaps it is 
> best that you create a github issue and we can discuss over there.
>
> /Patrik
>
> tis 9 jun 2015 kl. 23:58 skrev Brandon Arp  >:
>
>> I am seeing this as well with Akka 2.3.10.
>>
>>
>> On Tuesday, June 9, 2015 at 2:27:18 AM UTC-7, Anders Båtstrand wrote:
>>
>>> Is there any news about this? I have experienced the same, using Akka 
>>> 2.3.10 and Cassandra (latest version of the plugin).
>>>
>>> Best regards,
>>>
>>> Anders
>>>
>>> tirsdag 11. november 2014 13.34.03 UTC+1 skrev Richard Bowker følgende:
>>>
>>> Hi Patrik, I have managed to repro it twice again. We have typesafe 
>>> support so I will get in touch with them to discuss how best to send the 
>>> repro setup, as it's not a simple attachment!
>>>
>>> thanks
>>>
>>> Rich
>>>
>>> On Monday, November 10, 2014 4:35:24 PM UTC, Patrik Nordwall wrote:
>>>
>>>
>>>
>>> On Mon, Nov 10, 2014 at 5:18 PM, Richard Bowker <
>>> mechajoh...@googlemail.com> wrote:
>>>
>>> Hi Patrik, unfortunately not.  In fact its only happened once to me so 
>>> far so may be a difficult one to reproduce.
>>>
>>> I will of course get back to you if I can find a trigger.
>>>
>>>
>>> Thanks. That is problematic with these bugs. I don't know much about 
>>> Cassandra. Is it possible to export the data from cassandra if this happens 
>>> again so it could be analyzed (replayed) by us? Given that you don't have 
>>> any sensitive information in it.
>>>
>>> /Patrik
>>>  
>>>
>>>
>>> Tackar!
>>>
>>> On Monday, November 10, 2014 4:04:18 PM UTC, Patrik Nordwall wrote:
>>>
>>> Hi Richard,
>>>
>>> That is not good. We have seen similar issue a few times and tracked it 
>>> down to bugs in the journal implementations. It will happen when events are 
>>> replayed in the wrong order.
>>>
>>> Is there a way we can reproduce this?
>>>
>>> Regards,
>>> Patrik
>>>
>>> On Mon, Nov 10, 2014 at 2:42 PM, Richard Bowker >> com> wrote:
>>>
>>> I have had seen a similar problem when restarting nodes in a cluster 
>>> using sharding.
>>>
>>> after restarting, the node with the shard coordinator went into an 
>>> infinite error loop.
>>>
>>> I was using akka 2.3.6 and "com.github.krasserm" %% 
>>> "akka-persistence-cassandra" % "0.3.4" as the journal/persistence store.
>>>
>>> section of the error log below, I didn not know what to do to recover 
>>> this other than just manually delete all the akka keystores from the 
>>> database which obviously isn't ideal!
>>>
>>> any thoughts?
>>>
>>> thanks
>>>
>>> [ERROR] [11/10/2014 13:16:21.969] 
>>> [ClusterSystem-akka.actor.default-dispatcher-17] 
>>> [akka://ClusterSystem/user/sharding/PollServiceCoordinator/
>>> singleton/coordinator] 
>>>
>>> requirement failed: Region Actor[akka.tcp://ClusterSystem
>>> @172.31.18.169:2552/user/sharding/PollService#546005322] not 
>>> registered: State(Map(test47 -> Actor
>>>
>>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test6 -> 
>>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 
>>> test30 -> Actor
>>>
>>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test42 -> 
>>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>>> sharding/PollService#1716980828], test29 -> 
>>>
>>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>>> sharding/PollService#1716980828], test18 -> Actor[akka://ClusterSystem/
>>> user/sharding/PollService#1625036981], 
>>>
>>> test14 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>>> sharding/PollService#1716980828], test36 -> Actor
>>>
>>> [akka.tcp://ClusterSystem@172.31.21.9:2552/user/sharding/
>>> PollService#1716980828], test25 -> Actor[akka://ClusterSystem/
>>> user/sharding/PollService#1625036981], test28 -> 
>>>
>>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 
>>> test43 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>>> sharding/PollService#1716980828], 
>>>
>>> test32 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>>> sharding/PollService#1716980828], test20 -> Actor
>>>
>>> [akka://ClusterSystem/user/sharding/PollService#1625036981], test15 -> 
>>> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>>> sharding/PollService#1716980828], test33 -> 
>>>
>>> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 
>>> test22 -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>>> sharding/PollService#1716980828], test0 
>>>
>>> -> Actor[akka.tcp://ClusterSystem@172.31.21.9:2552/user/
>>> sharding/PollService#1716980828], test44 -> Actor[akka://ClusterSystem/
>>> user/sharding/PollService#1625036981], 
>>>
>>> test11 -> Actor[akka://ClusterSystem/user/sharding/PollService#1625036981], 
>>> test41 -> Actor
>>>
>>> [akka.tcp://ClusterSystem@172.31.21.9:2552/use

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-15 Thread GG
A little more detail on my issue: we've found that if we simply move our 
leveldb out of the way, the issue goes away, which seems to align with 
Patrik's earlier post indicating a possible problem in the persistence 
impl. We are currently using the leveldb plugin in native mode. There seems 
to be some issue during replay where a Region Actor fails to register, with 
a "requirement failed" similar to Richard's stack trace above:



2015/06/16 00:00:00.472 [DEBUG] 
[ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
 
resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
2015/06/16 00:00:00.472 [DEBUG] 
[ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
 
resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
2015/06/16 00:00:00.472 [ERROR] 
[ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy] 
requirement failed: Shard [57] already allocated: State(Map(67 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 
12 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
23 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
68 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
57 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
69 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
42 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 
27 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 
97 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 
91 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]),Map(Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504]
 
-> Vector(), 
Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] -> 
Vector(), 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
 
-> Vector(42, 97, 67, 27, 91), 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985]
 
-> Vector(12, 23, 68, 69, 57), 
Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] -> 
Vector(48, 53, 25, 40)),Set()) java.lang.IllegalArgumentException: 
requirement failed: Shard [57] already allocated: State(Map(67 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 
12 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
23 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
68 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
57 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
69 -> 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 
53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
42 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 
27 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 
97 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 
91 -> 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]),Map(Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504]
 
-> Vector(), 
Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] -> 
Vector(), 
Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
 
-> Vector(42, 97, 67, 27, 91), 
Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985]
 
-> Vector(12, 23, 68, 69, 57), 
Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] -> 
Vector(48, 53, 25, 40)),Set())
at scala.Predef$.require(Predef.scala:219) ~[referrals:1.0]
at 
akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1119)
 
~[referrals:1.0]
at 
akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.ap

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-16 Thread Patrik Nordwall
How did you use the LevelDB journal? It can't really be used in a clustered
system. It is possible to use it for demos or testing with a shared journal,
but it must not be used in production.

/Patrik

On Tue, Jun 16, 2015 at 3:07 AM, GG  wrote:

> A little more detail on my issue: We've found that if we simply move our
> leveldb out of the way, the issue goes away which seems to align with
> Patrik's earlier post indicating a possible problem in the persistence
> impl. We are currently using the leveldb plugin in native mode. There seems
> to be some issue during replay where a Region Actor failed to register with
> a "requirement failed" similar to Richards stack trace above:
>
>
>
> 2015/06/16 00:00:00.472 [DEBUG]
> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
> 2015/06/16 00:00:00.472 [DEBUG]
> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
> 2015/06/16 00:00:00.472 [ERROR]
> [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy]
> requirement failed: Shard [57] already allocated: State(Map(67 ->
> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
> 12 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 23 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
> 68 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
> 57 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
> 69 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
> 42 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
> 27 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
> 97 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
> 91 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
> ),Map(Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504]
> -> Vector(),
> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] ->
> Vector(), Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
> -> Vector(42, 97, 67, 27, 91), Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985]
> -> Vector(12, 23, 68, 69, 57),
> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] ->
> Vector(48, 53, 25, 40)),Set()) java.lang.IllegalArgumentException:
> requirement failed: Shard [57] already allocated: State(Map(67 ->
> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
> 12 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 23 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
> 68 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
> 57 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
> 69 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
> 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
> 42 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
> 27 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
> 97 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
> 91 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
> ),Map(Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504]
> -> Vector(),
> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] ->
> Vector(), Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
> -> Vector(42, 97, 67, 27, 91), Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985]
> -> Vector(12, 23, 68, 69, 57),
> Act

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-16 Thread GG
Patrik,

Thanks for your reply. We are using leveldb in a cluster system following 
the SharedLevelDb store instructions in the akka persistence docs. My 
understanding is that it shouldn't be used in production as it's a single 
point of failure.  We eventually want to move to a more available storage 
system but for developing and testing our application, this was the fastest 
way to get started.
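
To be concrete, the wiring looks roughly like this (a simplified sketch along
the lines of the shared-journal docs; the actor name, store path and directory
are placeholders, not our actual setup). One node runs the single
SharedLeveldbStore and every node points its journal plugin at it, which is
exactly the single point of failure mentioned above:

import akka.actor.{ActorIdentity, ActorPath, ActorSystem, Identify, Props}
import akka.pattern.ask
import akka.persistence.journal.leveldb.{SharedLeveldbJournal, SharedLeveldbStore}
import akka.util.Timeout
import scala.concurrent.duration._

object SharedJournalSetup {
  // application.conf (inlined here as a comment for brevity):
  //   akka.persistence.journal.plugin = "akka.persistence.journal.leveldb-shared"
  //   akka.persistence.journal.leveldb-shared.store.dir = "target/shared-journal"

  // Started on one designated node only.
  def startStore(system: ActorSystem): Unit =
    system.actorOf(Props[SharedLeveldbStore], "store")

  // Every node resolves that single store actor and hands it to the plugin.
  def setStore(system: ActorSystem, storePath: ActorPath): Unit = {
    implicit val timeout = Timeout(10.seconds)
    import system.dispatcher
    (system.actorSelection(storePath) ? Identify(None)).foreach {
      case ActorIdentity(_, Some(store)) => SharedLeveldbJournal.setStore(store, system)
      case _ => system.log.error("Shared journal store not found at {}", storePath)
    }
  }
}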
If the issue at hand can, with 100% certainty, be blamed on our usage of a 
shared leveldb, then we can move forward and invest in another persistence 
implementation. If, on the other hand, it's the result of a bug in akka 
remoting or clustering, then we'll need to dig into and resolve that issue 
before we can confidently use those technologies in production.

Do you have any insight on this, Patrik? If the solution is to move to 
another persistence layer, we're considering Cassandra, Dynamo and Kafka 
(in roughly that order) as our production impls. Do you have any insight 
into the maturity of any of those impls?

Thanks



On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote:
>
> How did you use Leveldb journal? It can't really be used in a clustered 
> system. It is possible to use it for demo or testing with shared journal, 
> but that must not be used for production.
>
> /Patrik
>
> On Tue, Jun 16, 2015 at 3:07 AM, GG > 
> wrote:
>
>> A little more detail on my issue: We've found that if we simply move our 
>> leveldb out of the way, the issue goes away which seems to align with 
>> Patrik's earlier post indicating a possible problem in the persistence 
>> impl. We are currently using the leveldb plugin in native mode. There seems 
>> to be some issue during replay where a Region Actor failed to register with 
>> a "requirement failed" similar to Richards stack trace above:
>>
>>
>>
>> 2015/06/16 00:00:00.472 [DEBUG] 
>> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>>  
>> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
>> 2015/06/16 00:00:00.472 [DEBUG] 
>> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>>  
>> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
>> 2015/06/16 00:00:00.472 [ERROR] 
>> [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy] 
>> requirement failed: Shard [57] already allocated: State(Map(67 -> 
>> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
>> 12 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
>> 23 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
>> 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
>> 68 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
>> 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
>> 57 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
>> 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
>> 69 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
>> 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
>> 42 -> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
>> 27 -> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
>> 97 -> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
>> 91 -> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
>> ),Map(Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504] 
>> -> Vector(), 
>> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] -> 
>> Vector(), Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996] 
>> -> Vector(42, 97, 67, 27, 91), Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985] 
>> -> Vector(12, 23, 68, 69, 57), 
>> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] -> 
>> Vector(48, 53, 25, 40)),Set()) java.lang.IllegalArgumentException: 
>> requirement failed: Shard [57] already allocated: State(Map(67 -> 
>> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
>> 12 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
>> 23 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
>> 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
>> 68 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
>> 48 -> Actor[akka://ClusterSystem/user/shard

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-16 Thread Patrik Nordwall
On Tue, Jun 16, 2015 at 7:44 PM, GG  wrote:

> Patrick,
>
> Thanks for your reply. We are using leveldb in a cluster system following
> the SharedLevelDb store instructions in the akka persistence docs. My
> understanding is that it shouldn't be used in production as it's a single
> point of failure.  We eventually want to move to a more available storage
> system but for developing and testing our application, this was the fastest
> way to get started.
>
> If the issue at hand can, with 100% certainty, be blamed on our usage of a
> shared leveldb then we can move forward and invest in another persistence
> implementation. If, on the other hand, it's the result of a bug in akka
> remoting or clustering then we'll need to dig into and resolve that issue
> before we can confidently use those technologies in production.
>

I can't be 100% sure, of course, and if you want me to investigate it we have
to do that in the Typesafe support channel (contact i...@typesafe.com if you
are not a subscriber).


>
>  Do you have any insight on this Patrik? If the solution is to move to
> another persistence layer, we're considering Cassandra, Dynamo and Kafka
> (in roughly that order) as our production impls. Do you have any insight
> into the maturity of any of those impls?
>

Cassandra should be a good first choice.
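
(Switching over is mostly a journal-plugin swap; a minimal sketch assuming the
krasserm plugin already mentioned in this thread, with a placeholder contact
point and keyspace - check the plugin's README for the exact keys supported by
your version:)

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object CassandraJournalExample extends App {
  val config = ConfigFactory.parseString("""
    akka.persistence.journal.plugin = "cassandra-journal"
    akka.persistence.snapshot-store.plugin = "cassandra-snapshot-store"
    cassandra-journal.contact-points = ["127.0.0.1"]
    cassandra-journal.keyspace = "akka"
  """).withFallback(ConfigFactory.load())

  val system = ActorSystem("ClusterSystem", config)
}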

/Patrik


>
> Thanks
>
>
>
> On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote:
>>
>> How did you use Leveldb journal? It can't really be used in a clustered
>> system. It is possible to use it for demo or testing with shared journal,
>> but that must not be used for production.
>>
>> /Patrik
>>
>> On Tue, Jun 16, 2015 at 3:07 AM, GG  wrote:
>>
>>> A little more detail on my issue: We've found that if we simply move our
>>> leveldb out of the way, the issue goes away which seems to align with
>>> Patrik's earlier post indicating a possible problem in the persistence
>>> impl. We are currently using the leveldb plugin in native mode. There seems
>>> to be some issue during replay where a Region Actor failed to register with
>>> a "requirement failed" similar to Richards stack trace above:
>>>
>>>
>>>
>>> 2015/06/16 00:00:00.472 [DEBUG]
>>> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>>> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
>>> 2015/06/16 00:00:00.472 [DEBUG]
>>> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>>> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
>>> 2015/06/16 00:00:00.472 [ERROR]
>>> [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy]
>>> requirement failed: Shard [57] already allocated: State(Map(67 ->
>>> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 12 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 23 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>>> 68 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>>> 57 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>>> 69 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>>> 42 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 27 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 97 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 91 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
>>> ),Map(Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504]
>>> -> Vector(),
>>> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] ->
>>> Vector(), Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
>>> -> Vector(42, 97, 67, 27, 91), Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985]
>>> -> Vector(12, 23, 68, 69, 57),
>>> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] ->
>>> Vector(48, 53, 25, 40)),Set()) java.lang.IllegalArgumentException:
>>> requirement failed: Shard [57] already allocated: State(Map(67 ->
>>> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 12 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 23 -> Ac

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-17 Thread GG
Alright. We'll give Cassandra a try. Thanks for the help Patrik.

On Tuesday, June 16, 2015 at 1:05:39 PM UTC-7, Patrik Nordwall wrote:
>
>
>
> On Tue, Jun 16, 2015 at 7:44 PM, GG > 
> wrote:
>
> Patrick,
>
> Thanks for your reply. We are using leveldb in a cluster system following 
> the SharedLevelDb store instructions in the akka persistence docs. My 
> understanding is that it shouldn't be used in production as it's a single 
> point of failure.  We eventually want to move to a more available storage 
> system but for developing and testing our application, this was the fastest 
> way to get started.
>
> If the issue at hand can, with 100% certainty, be blamed on our usage of a 
> shared leveldb then we can move forward and invest in another persistence 
> implementation. If, on the other hand, it's the result of a bug in akka 
> remoting or clustering then we'll need to dig into and resolve that issue 
> before we can confidently use those technologies in production.
>
>
> I can't be 100% of course, and if you want me to investigate it we have to 
> do that in the Typesafe support channel (contact in...@typesafe.com 
>  if you are not subscriber).
>  
>
>
>  Do you have any insight on this Patrik? If the solution is to move to 
> another persistence layer, we're considering Cassandra, Dynamo and Kafka 
> (in roughly that order) as our production impls. Do you have any insight 
> into the maturity of any of those impls?
>
>
> Cassandra should be a good first choice.
>
> /Patrik
>  
>
>
> Thanks
>
>
>
> On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote:
>
> How did you use Leveldb journal? It can't really be used in a clustered 
> system. It is possible to use it for demo or testing with shared journal, 
> but that must not be used for production.
>
> /Patrik
>
> On Tue, Jun 16, 2015 at 3:07 AM, GG  wrote:
>
> A little more detail on my issue: We've found that if we simply move our 
> leveldb out of the way, the issue goes away which seems to align with 
> Patrik's earlier post indicating a possible problem in the persistence 
> impl. We are currently using the leveldb plugin in native mode. There seems 
> to be some issue during replay where a Region Actor failed to register with 
> a "requirement failed" similar to Richards stack trace above:
>
>
>
> 2015/06/16 00:00:00.472 [DEBUG] 
> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>  
> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
> 2015/06/16 00:00:00.472 [DEBUG] 
> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>  
> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
> 2015/06/16 00:00:00.472 [ERROR] 
> [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy] 
> requirement failed: Shard [57] already allocated: State(Map(67 -> 
> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
> 12 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
> 23 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
> 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
> 68 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
> 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
> 57 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
> 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
> 69 -> Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985], 
> 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
> 42 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
> 27 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
> 97 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
> 91 -> Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996] 
> 
> ),Map(Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504] 
> -> Vector(), 
> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] -> 
> Vector(), Actor[akka.tcp://
> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996] 
> -> Vector(42, 97, 67, 27, 91), Actor[akka.tcp://
> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985] 
> -> Vector(12, 23, 68, 69, 57), 
> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538] -> 
> Vector(48, 53, 25, 40)),Set()) java.lang.IllegalArgumentException: 
> requirement failed: Shard [57] already allocated: State(Map(67 -> 
> Actor[akka.tc

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-17 Thread Patrik Nordwall
You're welcome. Let me know if you see the same problem with Cassandra, then I
will give it more attention.
/Patrik

On Wed, 17 Jun 2015 at 18:56, GG wrote:

> Alright. We'll give Cassandra a try. Thanks for the help Patrik.
>
> On Tuesday, June 16, 2015 at 1:05:39 PM UTC-7, Patrik Nordwall wrote:
>>
>>
>>
>> On Tue, Jun 16, 2015 at 7:44 PM, GG  wrote:
>>
>> Patrick,
>>
>> Thanks for your reply. We are using leveldb in a cluster system following
>> the SharedLevelDb store instructions in the akka persistence docs. My
>> understanding is that it shouldn't be used in production as it's a single
>> point of failure.  We eventually want to move to a more available storage
>> system but for developing and testing our application, this was the fastest
>> way to get started.
>>
>> If the issue at hand can, with 100% certainty, be blamed on our usage of
>> a shared leveldb then we can move forward and invest in another persistence
>> implementation. If, on the other hand, it's the result of a bug in akka
>> remoting or clustering then we'll need to dig into and resolve that issue
>> before we can confidently use those technologies in production.
>>
>>
>> I can't be 100% of course, and if you want me to investigate it we have
>> to do that in the Typesafe support channel (contact in...@typesafe.com
>> if you are not subscriber).
>>
>
>>
>>
>>  Do you have any insight on this Patrik? If the solution is to move to
>> another persistence layer, we're considering Cassandra, Dynamo and Kafka
>> (in roughly that order) as our production impls. Do you have any insight
>> into the maturity of any of those impls?
>>
>>
>> Cassandra should be a good first choice.
>>
>> /Patrik
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote:
>>
>> How did you use Leveldb journal? It can't really be used in a clustered
>> system. It is possible to use it for demo or testing with shared journal,
>> but that must not be used for production.
>>
>> /Patrik
>>
>> On Tue, Jun 16, 2015 at 3:07 AM, GG  wrote:
>>
>> A little more detail on my issue: We've found that if we simply move our
>> leveldb out of the way, the issue goes away which seems to align with
>> Patrik's earlier post indicating a possible problem in the persistence
>> impl. We are currently using the leveldb plugin in native mode. There seems
>> to be some issue during replay where a Region Actor failed to register with
>> a "requirement failed" similar to Richards stack trace above:
>>
>>
>>
>> 2015/06/16 00:00:00.472 [DEBUG]
>> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
>> 2015/06/16 00:00:00.472 [DEBUG]
>> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
>> 2015/06/16 00:00:00.472 [ERROR]
>> [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy]
>> requirement failed: Shard [57] already allocated: State(Map(67 ->
>> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>> 12 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>> 23 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>> 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>> 68 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>> 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>> 57 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>> 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>> 69 -> Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>> 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>> 42 -> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>> 27 -> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>> 97 -> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>> 91 -> Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
>> 
>> ),Map(Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#-1086032504]
>> -> Vector(),
>> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1934388853] ->
>> Vector(), Actor[akka.tcp://
>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
>> -> Vector(42, 97, 67, 27, 91), Actor[akka.tcp://
>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985]
>> -> Vector(12, 23, 68, 69, 57),
>> Actor[akka://ClusterSystem

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-17 Thread Richard Bowker
It's been a while! But just for reference, Patrik investigated my issue at
the time and we came to the conclusion that I had accidentally created two
clusters writing to the same database (I had not set up my seed nodes in a
resilient way, as I was only doing a prototype). Once I fixed this, the issue
was never seen again.
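
(For anyone hitting the same thing: the fix was essentially to give every node
the same ordered seed-nodes list, so that only the first seed ever bootstraps a
fresh cluster and restarted nodes join the existing one instead of founding a
second one. A sketch below; the hostnames are placeholders, not my actual
setup:)

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object SeedNodeExample extends App {
  // Identical on every node; node1 is the only one allowed to self-join.
  val config = ConfigFactory.parseString("""
    akka.cluster.seed-nodes = [
      "akka.tcp://ClusterSystem@node1.internal:2552",
      "akka.tcp://ClusterSystem@node2.internal:2552"
    ]
  """).withFallback(ConfigFactory.load())

  val system = ActorSystem("ClusterSystem", config)
}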
On 17 Jun 2015 20:29, "Patrik Nordwall"  wrote:

> You're welcome. Let me know if you see same problem with Cassandra, then I
> will give it more attention.
> /Patrik
>
> ons 17 jun 2015 kl. 18:56 skrev GG :
>
>> Alright. We'll give Cassandra a try. Thanks for the help Patrik.
>>
>> On Tuesday, June 16, 2015 at 1:05:39 PM UTC-7, Patrik Nordwall wrote:
>>>
>>>
>>>
>>> On Tue, Jun 16, 2015 at 7:44 PM, GG  wrote:
>>>
>>> Patrick,
>>>
>>> Thanks for your reply. We are using leveldb in a cluster system
>>> following the SharedLevelDb store instructions in the akka persistence
>>> docs. My understanding is that it shouldn't be used in production as it's a
>>> single point of failure.  We eventually want to move to a more available
>>> storage system but for developing and testing our application, this was the
>>> fastest way to get started.
>>>
>>> If the issue at hand can, with 100% certainty, be blamed on our usage of
>>> a shared leveldb then we can move forward and invest in another persistence
>>> implementation. If, on the other hand, it's the result of a bug in akka
>>> remoting or clustering then we'll need to dig into and resolve that issue
>>> before we can confidently use those technologies in production.
>>>
>>>
>>> I can't be 100% of course, and if you want me to investigate it we have
>>> to do that in the Typesafe support channel (contact in...@typesafe.com
>>> if you are not subscriber).
>>>
>>
>>>
>>>
>>>  Do you have any insight on this Patrik? If the solution is to move to
>>> another persistence layer, we're considering Cassandra, Dynamo and Kafka
>>> (in roughly that order) as our production impls. Do you have any insight
>>> into the maturity of any of those impls?
>>>
>>>
>>> Cassandra should be a good first choice.
>>>
>>> /Patrik
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote:
>>>
>>> How did you use Leveldb journal? It can't really be used in a clustered
>>> system. It is possible to use it for demo or testing with shared journal,
>>> but that must not be used for production.
>>>
>>> /Patrik
>>>
>>> On Tue, Jun 16, 2015 at 3:07 AM, GG  wrote:
>>>
>>> A little more detail on my issue: We've found that if we simply move our
>>> leveldb out of the way, the issue goes away which seems to align with
>>> Patrik's earlier post indicating a possible problem in the persistence
>>> impl. We are currently using the leveldb plugin in native mode. There seems
>>> to be some issue during replay where a Region Actor failed to register with
>>> a "requirement failed" similar to Richards stack trace above:
>>>
>>>
>>>
>>> 2015/06/16 00:00:00.472 [DEBUG]
>>> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>>> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
>>> 2015/06/16 00:00:00.472 [DEBUG]
>>> [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
>>> resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
>>> 2015/06/16 00:00:00.472 [ERROR]
>>> [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy]
>>> requirement failed: Shard [57] already allocated: State(Map(67 ->
>>> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 12 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 23 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>>> 68 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>>> 57 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>>> 69 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
>>> 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538],
>>> 42 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 27 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 97 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
>>> 91 -> Actor[akka.tcp://
>>> ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996]
>>> 
>>> ),Map(A

Re: [akka-user] Sharding problem when restarting Cluster

2015-06-23 Thread Anders Båtstrand
This just happened to me again, after a problem with the clustering
(separate thread). The nodes did not agree on who the leader was, or on
the cluster size. It seems two clusters writing to the same database is
the cause of this.

On 17 June 2015 at 21:33, Richard Bowker  wrote:
> It's been a while! But just for reference, Patrik investigated my issue at
> the time and we came to the conclusion I had accidentally created two
> clusters writing to the same database (I had not set up my seed nodes in a
> resilient way as I was only doing a prototype). Once I fixed this the issue
> was never seen again.
>
> On 17 Jun 2015 20:29, "Patrik Nordwall"  wrote:
>>
>> You're welcome. Let me know if you see same problem with Cassandra, then I
>> will give it more attention.
>> /Patrik
>>
>> ons 17 jun 2015 kl. 18:56 skrev GG :
>>>
>>> Alright. We'll give Cassandra a try. Thanks for the help Patrik.
>>>
>>> On Tuesday, June 16, 2015 at 1:05:39 PM UTC-7, Patrik Nordwall wrote:



 On Tue, Jun 16, 2015 at 7:44 PM, GG  wrote:

 Patrick,

 Thanks for your reply. We are using leveldb in a cluster system
 following the SharedLevelDb store instructions in the akka persistence 
 docs.
 My understanding is that it shouldn't be used in production as it's a 
 single
 point of failure.  We eventually want to move to a more available storage
 system but for developing and testing our application, this was the fastest
 way to get started.

 If the issue at hand can, with 100% certainty, be blamed on our usage of
 a shared leveldb then we can move forward and invest in another persistence
 implementation. If, on the other hand, it's the result of a bug in akka
 remoting or clustering then we'll need to dig into and resolve that issue
 before we can confidently use those technologies in production.


 I can't be 100% of course, and if you want me to investigate it we have
 to do that in the Typesafe support channel (contact in...@typesafe.com if
 you are not subscriber).




  Do you have any insight on this Patrik? If the solution is to move to
 another persistence layer, we're considering Cassandra, Dynamo and Kafka 
 (in
 roughly that order) as our production impls. Do you have any insight into
 the maturity of any of those impls?


 Cassandra should be a good first choice.

 /Patrik



 Thanks



 On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote:

 How did you use Leveldb journal? It can't really be used in a clustered
 system. It is possible to use it for demo or testing with shared journal,
 but that must not be used for production.

 /Patrik

 On Tue, Jun 16, 2015 at 3:07 AM, GG  wrote:

 A little more detail on my issue: We've found that if we simply move our
 leveldb out of the way, the issue goes away which seems to align with
 Patrik's earlier post indicating a possible problem in the persistence 
 impl.
 We are currently using the leveldb plugin in native mode. There seems to be
 some issue during replay where a Region Actor failed to register with a
 "requirement failed" similar to Richards stack trace above:



 2015/06/16 00:00:00.472 [DEBUG]
 [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
 resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
 2015/06/16 00:00:00.472 [DEBUG]
 [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)]
 resolve of path sequence [/user/sharding/ReferralView#-947611826] failed
 2015/06/16 00:00:00.472 [ERROR]
 [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy]
 requirement failed: Shard [57] already allocated: State(Map(67 ->
 Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996],
 12 ->
 Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 23 ->
 Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 40 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
 68
 ->
 Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 48 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
 57
 ->
 Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 25 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
 69
 ->
 Actor[akka.tcp://ClusterSystem@172.31.15.250:9599/user/sharding/ReferralView#1263176985],
 53 -> Actor[akka://ClusterSystem/user/sharding/ReferralView#-1114626538], 
 42
 ->
 Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralV

Re: [akka-user] Sharding problem when restarting Cluster

2015-07-20 Thread Zack Angelo
I'm also experiencing this issue intermittently using a JDBC persistence 
plugin. I'm working on a way to reproduce it, but in the meantime, can anyone 
suggest a way to detect this failure programmatically? I'd like our server 
to fail more visibly when something like this occurs; is there an actor I 
can watch, or something similar?

This also breaks our shutdown hook, because we do a graceful shard handoff 
on SIGTERM and then wait for the shard region actor to terminate (it never 
does in this state).
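
For the "fail more visibly" part, one thing that can be death-watched is the
shard region ActorRef returned by ClusterSharding(system).shardRegion(...).
Below is a rough sketch of a watchdog along those lines; the trigger message,
timeout and shutdown reaction are made up for illustration, not a recommended
implementation:

import akka.actor.{Actor, ActorLogging, ActorRef, Props, ReceiveTimeout, Terminated}
import akka.contrib.pattern.ClusterSharding
import scala.concurrent.duration.FiniteDuration

// Illustrative watchdog: death-watch the shard region so a region that never
// terminates after handoff surfaces as an explicit failure instead of silence.
class ShardRegionWatchdog(typeName: String, handoffTimeout: FiniteDuration)
  extends Actor with ActorLogging {

  val region: ActorRef = ClusterSharding(context.system).shardRegion(typeName)
  context.watch(region)

  def receive = {
    case "handoff-started" => // hypothetical trigger sent from the SIGTERM hook
      context.setReceiveTimeout(handoffTimeout)
    case Terminated(`region`) =>
      log.info("Shard region {} terminated cleanly", typeName)
      context.stop(self)
    case ReceiveTimeout =>
      log.error("Shard region {} did not terminate within {}", typeName, handoffTimeout)
      context.system.shutdown() // or escalate to the process supervisor / health check
  }
}

object ShardRegionWatchdog {
  def props(typeName: String, handoffTimeout: FiniteDuration): Props =
    Props(classOf[ShardRegionWatchdog], typeName, handoffTimeout)
}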

On Tuesday, June 23, 2015 at 8:06:59 AM UTC-5, Anders Båtstrand wrote:
>
> This did happen again to me now, after a problem with the clustering 
> (separate thread). The nodes did not agree on who was the leader, or 
> the cluster size. Seems two clusters writing to the same database is 
> the cause of this. 
>
> On 17 June 2015 at 21:33, Richard Bowker  > wrote: 
> > It's been a while! But just for reference, Patrik investigated my issue 
> at 
> > the time and we came to the conclusion I had accidentally created two 
> > clusters writing to the same database (I had not set up my seed nodes in 
> a 
> > resilient way as I was only doing a prototype). Once I fixed this the 
> issue 
> > was never seen again. 
> > 
> > On 17 Jun 2015 20:29, "Patrik Nordwall"  > wrote: 
> >> 
> >> You're welcome. Let me know if you see same problem with Cassandra, 
> then I 
> >> will give it more attention. 
> >> /Patrik 
> >> 
> >> ons 17 jun 2015 kl. 18:56 skrev GG >: 
>
> >>> 
> >>> Alright. We'll give Cassandra a try. Thanks for the help Patrik. 
> >>> 
> >>> On Tuesday, June 16, 2015 at 1:05:39 PM UTC-7, Patrik Nordwall wrote: 
>  
>  
>  
>  On Tue, Jun 16, 2015 at 7:44 PM, GG  wrote: 
>  
>  Patrick, 
>  
>  Thanks for your reply. We are using leveldb in a cluster system 
>  following the SharedLevelDb store instructions in the akka 
> persistence docs. 
>  My understanding is that it shouldn't be used in production as it's a 
> single 
>  point of failure.  We eventually want to move to a more available 
> storage 
>  system but for developing and testing our application, this was the 
> fastest 
>  way to get started. 
>  
>  If the issue at hand can, with 100% certainty, be blamed on our usage 
> of 
>  a shared leveldb then we can move forward and invest in another 
> persistence 
>  implementation. If, on the other hand, it's the result of a bug in 
> akka 
>  remoting or clustering then we'll need to dig into and resolve that 
> issue 
>  before we can confidently use those technologies in production. 
>  
>  
>  I can't be 100% of course, and if you want me to investigate it we 
> have 
>  to do that in the Typesafe support channel (contact 
> in...@typesafe.com if 
>  you are not subscriber). 
>  
>  
>  
>  
>   Do you have any insight on this Patrik? If the solution is to move 
> to 
>  another persistence layer, we're considering Cassandra, Dynamo and 
> Kafka (in 
>  roughly that order) as our production impls. Do you have any insight 
> into 
>  the maturity of any of those impls? 
>  
>  
>  Cassandra should be a good first choice. 
>  
>  /Patrik 
>  
>  
>  
>  Thanks 
>  
>  
>  
>  On Tuesday, June 16, 2015 at 1:08:22 AM UTC-7, Patrik Nordwall wrote: 
>  
>  How did you use Leveldb journal? It can't really be used in a 
> clustered 
>  system. It is possible to use it for demo or testing with shared 
> journal, 
>  but that must not be used for production. 
>  
>  /Patrik 
>  
>  On Tue, Jun 16, 2015 at 3:07 AM, GG  wrote: 
>  
>  A little more detail on my issue: We've found that if we simply move 
> our 
>  leveldb out of the way, the issue goes away which seems to align with 
>  Patrik's earlier post indicating a possible problem in the 
> persistence impl. 
>  We are currently using the leveldb plugin in native mode. There seems 
> to be 
>  some issue during replay where a Region Actor failed to register with 
> a 
>  "requirement failed" similar to Richards stack trace above: 
>  
>  
>  
>  2015/06/16 00:00:00.472 [DEBUG] [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)] resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
>  2015/06/16 00:00:00.472 [DEBUG] [ClusterSystem-akka.actor.default-dispatcher-21][LocalActorRefProvider(akka://ClusterSystem)] resolve of path sequence [/user/sharding/ReferralView#-947611826] failed 
>  2015/06/16 00:00:00.472 [ERROR] [ClusterSystem-akka.actor.default-dispatcher-22][OneForOneStrategy] 
>  requirement failed: Shard [57] already allocated: State(Map(67 -> 
>  Actor[akka.tcp://ClusterSystem@172.31.10.125:9599/user/sharding/ReferralView#-575704996], 
>  12 -> 
>  Ac

Re: [akka-user] Sharding problem when restarting Cluster

2015-08-04 Thread Jim Hazen
I see this issue happen whenever AWS has a network hiccup.  I have a 
multi-node cluster behind an LB, running akka cluster sharding along with akka 
persistence writing to a Dynamo journal.  I'm currently on akka 2.3.11, 
which means the same shared Dynamo table used to store my persistent actors 
is also being used to store cluster sharding state.  I know of no way to 
prevent this until akka 2.4.

I have my min-nr-of-members set to (nodes / 2) + 1.  Things seem to work 
fine during clean node restarts and code deploys.
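
For reference, this is roughly how that setting can be wired in when building the ActorSystem (the node count of 5 and loading the value programmatically rather than from application.conf are just for illustration, giving a quorum of (5 / 2) + 1 = 3):

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Illustrative sketch: with a 5-node cluster, a majority quorum is (5 / 2) + 1 = 3.
// The value could just as well live directly in application.conf.
object QuorumConfig extends App {
  val config = ConfigFactory.parseString(
    "akka.cluster.min-nr-of-members = 3"
  ).withFallback(ConfigFactory.load())

  val system = ActorSystem("DeviceSvc", config)
}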

However, I run into the same problem as the OP when AWS suffers an 
intermittent network partition.  The nodes within the akka cluster can't 
fully communicate, yet the LB is able to reach all nodes.  Cluster state is 
persisted into the same location, because that's unavoidable while using 
akka persistence for other things.  Eventually cluster sharding gets upset 
and panics, causing the error below to be repeated constantly until the 
full cluster is shut down and started back up cleanly.

What should a developer do to insulate against split-brain issues when using 
cluster sharding?  min-nr-of-members appears to only be checked during 
cluster startup.  However, once started and participating, what happens 
automatically when the cluster detects that it has dropped below 
min-nr-of-members?  I can attempt to guard against possible issues in 
application land by subscribing to the cluster events and taking some 
action.  I'm not sure if there's anything I can do to prevent the cluster 
sharding internals from running into this state, however, since writing cluster 
state to a shared journal is unavoidable and network issues are inevitable.
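
A rough sketch of that application-land guard, for the sake of discussion (the quorum value, the actor itself and the drastic reaction of shutting the node down are my own illustration, not something cluster sharding provides):

import akka.actor.{ Actor, ActorLogging, Address, Props }
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Rough sketch: track reachability and react when the number of reachable
// members drops below a quorum. Both the quorum value and the chosen
// reaction are illustrative assumptions.
class QuorumWatcher(quorum: Int) extends Actor with ActorLogging {
  val cluster = Cluster(context.system)
  var unreachable = Set.empty[Address]

  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[MemberEvent], classOf[ReachabilityEvent])

  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive = {
    case UnreachableMember(m) => unreachable += m.address; check()
    case ReachableMember(m)   => unreachable -= m.address
    case MemberRemoved(m, _)  => unreachable -= m.address; check()
    case _: MemberEvent       => // other membership transitions ignored here
  }

  def check(): Unit = {
    val reachable = cluster.state.members.count(m => !unreachable(m.address))
    if (reachable < quorum) {
      log.error("Only {} reachable members, below quorum {}; stopping this node", reachable, quorum)
      context.system.shutdown() // 2.3-era API; drastic, but stops further split-brain writes
    }
  }
}

object QuorumWatcher {
  def props(quorum: Int): Props = Props(new QuorumWatcher(quorum))
}

Whether shutting the node down (rather than just raising an alarm) is the right reaction obviously depends on the deployment; the point is only that the decision has to live in application code.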


2015-08-02 05:33:30.138 05:33:30.138UTC [Device] ERROR 
> akka.actor.OneForOneStrategy DeviceSvc-akka.actor.default-dispatcher-3 
> akka://DeviceSvc/user/sharding/UserDeviceIndexCoordinator/singleton/coordinator
>  
> - requirement failed: Shard [2] already allocated: State(Map(-2 -> 
> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203],
>  
> 0 -> 
> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203],
>  
> 2 -> 
> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203],
>  
> -1 -> 
> Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773],
>  
> 3 -> 
> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]),Map(Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]
>  
> -> Vector(2, 3, -2, 0), 
> Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773]
>  
> -> Vector(-1)),Set())
>
> > java.lang.IllegalArgumentException: requirement failed: Shard [2] 
> already allocated: State(Map(-2 -> 
> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203],
>  
> 0 -> 
> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203],
>  
> 2 -> 
> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203],
>  
> -1 -> 
> Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773],
>  
> 3 -> 
> Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]),Map(Actor[akka.tcp://DeviceSvc@172.31.4.174:8108/user/sharding/UserDeviceIndex#-360404203]
>  
> -> Vector(2, 3, -2, 0), 
> Actor[akka.tcp://DeviceSvc@172.31.13.57:8108/user/sharding/UserDeviceIndex#855444773]
>  
> -> Vector(-1)),Set())
>
> >at scala.Predef$.require(Predef.scala:219) 
> ~[org.scala-lang.scala-library-2.11.6.jar:na]
>
> >at 
> akka.contrib.pattern.ShardCoordinator$Internal$State.updated(ClusterSharding.scala:1119)
>  
> ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
>
> >at 
> akka.contrib.pattern.ShardCoordinator$$anonfun$receiveRecover$1.applyOrElse(ClusterSharding.scala:1242)
>  
> ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
>
> >at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) 
> ~[org.scala-lang.scala-library-2.11.6.jar:na]
>
> >at 
> akka.persistence.Eventsourced$$anonfun$akka$persistence$Eventsourced$$recoveryBehavior$1.applyOrElse(Eventsourced.scala:168)
>  
> ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
>
> >at akka.persistence.Recovery$class.runReceive(Recovery.scala:48) 
> ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
>
> >at 
> akka.contrib.pattern.ShardCoordinator.runReceive(ClusterSharding.scala:1195) 
> ~[com.typesafe.akka.akka-contrib_2.11-2.3.11.jar:2.3.11]
>
> >at 
> akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33)
>  
> ~[com.typesafe.akka.akka-persistence-experimental_2.11-2.3.11.jar:na]
>
> >at 
> akka.persistence.Recovery$State$$anonfun$processPersistent$1.apply(Recovery.scala:33)
>  
> ~[com.typesafe.ak