Okay, so following the other discussion as well, I guess there's no need for 
a bug report then, since the uid is part of the address. The problem we saw 
only occurred because we had the clusters misconfigured, and it shouldn't 
happen if you shut down a node before moving it to another cluster.
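
For my own future reference, here's a minimal sketch (plain Akka 2.3.x, all 
names made up) showing that the incarnation uid is part of the serialized 
actor path, which is the same kind of reference the coordinator persists for 
a shard region:

    import akka.actor.{ Actor, ActorSystem, Props }
    import akka.serialization.Serialization

    object ShowUid extends App {
      class Dummy extends Actor { def receive = Actor.emptyBehavior }

      val system = ActorSystem("MyCluster")
      val ref = system.actorOf(Props[Dummy], "dummy")

      // Prints something like akka://MyCluster/user/dummy#1129742792.
      // The trailing "#<uid>" identifies this particular incarnation, so a
      // reference persisted by a previous run of the node cannot match it.
      println(Serialization.serializedActorPath(ref))

      system.shutdown()
    }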

Thanks for clearing this up.

Johannes

On Monday, March 2, 2015 at 1:58:45 PM UTC+2, Patrik Nordwall wrote:
>
> Thinking some more about this. Actually, this can only happen if you split 
> an existing cluster into two separate clusters, typically by using auto-down. 
> It is not only using host:port; it uses the full ActorRef of the shard 
> region, and that includes the uid of the actor incarnation. That means it 
> will not match, and this will not be an issue if you shut down the system.
>
> /Patrik
>
> On Mon, Mar 2, 2015 at 12:38 PM, Patrik Nordwall <patrik....@gmail.com> wrote:
>
>> This is an oversight. Please create an issue 
>> <https://github.com/akka/akka/issues>. For now I would recommend that 
>> you use unique names (actor system names) for the clusters to avoid 
>> accidental cross talk like this.
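
(For illustration only, a minimal sketch of what unique actor system names 
could look like; the cluster names and config file names below are made up:)

    import akka.actor.ActorSystem
    import com.typesafe.config.ConfigFactory

    object Clusters {
      // Each cluster gets its own actor system name, so the node addresses
      // (akka.tcp://OrderCluster@..., akka.tcp://BillingCluster@...) can
      // never be mistaken for one another, even if a node is accidentally
      // pointed at the wrong seed nodes or journal.
      def orderCluster(): ActorSystem =
        ActorSystem("OrderCluster", ConfigFactory.load("order-cluster"))

      def billingCluster(): ActorSystem =
        ActorSystem("BillingCluster", ConfigFactory.load("billing-cluster"))
    }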
>>
>> Thanks for reporting!
>>
>> Regards,
>> Patrik
>>
>> On Mon, Mar 2, 2015 at 11:04 AM, Brice Figureau <bric...@daysofwonder.com> wrote:
>>
>>> On Tue, 2015-02-24 at 23:06 -0800, Johannes Berg wrote:
>>> > Hi!
>>> >
>>> > We recently made a mistake by having two separate clusters use the
>>> > same journal DB, which obviously doesn't work, but it led to some
>>> > questions about how cluster sharding works.
>>> >
>>> > We detected our problem by seeing a stack trace containing lines of
>>> > code that didn't exist anywhere in the code deployed in that cluster.
>>> > After triple-checking that we were running the correct code in all
>>> > places in the cluster, we finally concluded that it must in fact be
>>> > talking to an actor outside of the cluster. Looking at what the
>>> > cluster sharding coordinator persists in the journal, we found the IP
>>> > address of a node in the other cluster.
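
(A side note for anyone hitting the same thing: what we should have had is a 
separate journal per cluster. A rough sketch, where "my-journal" and its 
"db-url" setting are placeholders for whatever journal plugin is actually in 
use:)

    import com.typesafe.config.ConfigFactory

    object JournalConfigs {
      // Each cluster points akka.persistence at its own journal storage, so
      // the coordinators can never read each other's persisted state.
      // "my-journal" and "db-url" are placeholders, not real plugin keys.
      val clusterA = ConfigFactory.parseString("""
        akka.persistence.journal.plugin = "my-journal"
        my-journal.db-url = "jdbc:postgresql://db-a/journal_a"
      """)

      val clusterB = ConfigFactory.parseString("""
        akka.persistence.journal.plugin = "my-journal"
        my-journal.db-url = "jdbc:postgresql://db-b/journal_b"
      """)
    }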
>>>
>>> What a coincidence! I observed the exact same issue a couple of days
>>> after you.
>>> See here:
>>> https://groups.google.com/d/msg/akka-user/OJKGEgKBn2g/BQLRWkoK8zcJ
>>>
>>> So I tested a bit more and found that the system would heal by itself (a
>>> bit later): if there's no ActorSystem running on those other nodes, the
>>> ShardCoordinator eventually receives a Terminated message (at least under
>>> Akka 2.3.9) and allocates the region somewhere else.
>>> Granted, if the node had moved to another cluster using the same
>>> ActorSystem name, it might be more problematic.
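
(Not the coordinator's actual code, of course; just a minimal sketch of the 
deathwatch mechanism Brice describes, with made-up names. Once the watched 
region's node is removed, a Terminated message arrives and the shards can be 
handed to another region:)

    import akka.actor.{ Actor, ActorLogging, ActorRef, Terminated }

    class CoordinatorLikeWatcher(region: ActorRef) extends Actor with ActorLogging {
      // Watch the remote shard region; cluster death watch delivers
      // Terminated once the region's node is marked as removed.
      context.watch(region)

      def receive = {
        case Terminated(`region`) =>
          log.info("Region {} terminated, re-allocating its shards", region)
          // ... pick another known region and allocate the shards there ...
      }
    }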
>>>
>>> Still, this looks a bit strange to me, like something that was
>>> overlooked during the design.
>>>
>>> > Granted, the data in the journal table is pretty invalid, but this still
>>> > raises the question of why the shard coordinator could start using nodes
>>> > that aren't part of the cluster, based only on some historical
>>> > persisted info in the journal table. For example, what happens if you
>>> > run sharding over two nodes and decide to move the second node to
>>> > another cluster? You take down the system, change the config files
>>> > and then start the two systems again; could the sharding coordinator
>>> > on the first node, under any circumstances, still use the second node
>>> > based on the info in the journal table, even though it's now not part
>>> > of the same cluster anymore?
>>>
>>> Yes, this isn't practical at all in an elastic environment. I would have
>>> preferred a mechanism where, when recovering, the ShardCoordinator would
>>> allocate the regions locally and then rebalance the shards based on the
>>> regions discovered at the end of the recovery.
>>>
>>> I think I'll enter a bug report for this problem.
>>>
>>> --
>>> Brice Figureau <bric...@daysofwonder.com>
>>>
>>
>>
>>
>> -- 
>>
>> Patrik Nordwall
>> Typesafe <http://typesafe.com/> -  Reactive apps on the JVM
>> Twitter: @patriknw
>>
>
>
> -- 
>
> Patrik Nordwall
> Typesafe <http://typesafe.com/> -  Reactive apps on the JVM
> Twitter: @patriknw
>
