On Mon, Feb 9, 2015 at 6:15 PM, Magnus H <magnus.holtl...@gmail.com> wrote:

> Hi Patrik,
>
> The message extractor is quite simple. I've modified some class names, but
> it basically looks like this:
> public static final ShardRegion.MessageExtractor messageExtractor =
>     new ShardRegion.MessageExtractor() {
>
>   @Override
>   public String entryId(Object msg) {
>     return String.valueOf(getSessionId(msg));
>   }
>
>   @Override
>   public Object entryMessage(Object msg) {
>     return msg;
>   }
>
>   @Override
>   public String shardId(Object msg) {
>     return String.valueOf(getSessionId(msg) % 20);
>   }
>
>   public int getSessionId(Object msg) {
>     if (msg instanceof AMessageEnvelope) {
>       return getSessionId(((AMessageEnvelope) msg).messageObject());
>     }
>     if (msg instanceof OneMessageType) {
>       return ((OneMessageType) msg).sessionId();
>     }
>     if (msg instanceof AnotherMessageType) {
>       return ((AnotherMessageType) msg).sessionId();
>     }
>     return -1;
>   }
> };
>
>
That looks good.
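As a quick sanity check that routing depends only on the message itself, here is a plain-Java sketch of the same id/shard mapping with no Akka dependency. The three message classes are hypothetical stand-ins for the renamed ones, and `NUMBER_OF_SHARDS` is just a name given to the literal 20. Under those assumptions, session 104 maps to entry "104" in shard "4" whether or not it arrives wrapped in an envelope, so the extractor alone should not send one session to two different shards.

```java
// Hypothetical stand-ins for the renamed message classes in the extractor above.
final class OneMessageType {
    private final int sessionId;
    OneMessageType(int sessionId) { this.sessionId = sessionId; }
    int sessionId() { return sessionId; }
}

final class AnotherMessageType {
    private final int sessionId;
    AnotherMessageType(int sessionId) { this.sessionId = sessionId; }
    int sessionId() { return sessionId; }
}

final class AMessageEnvelope {
    private final Object messageObject;
    AMessageEnvelope(Object messageObject) { this.messageObject = messageObject; }
    Object messageObject() { return messageObject; }
}

final class SessionIdMapping {
    // Name introduced for illustration; the original uses the literal 20.
    static final int NUMBER_OF_SHARDS = 20;

    // Same logic as getSessionId above: unwrap envelopes, -1 for unknown message types.
    static int getSessionId(Object msg) {
        if (msg instanceof AMessageEnvelope) {
            return getSessionId(((AMessageEnvelope) msg).messageObject());
        }
        if (msg instanceof OneMessageType) {
            return ((OneMessageType) msg).sessionId();
        }
        if (msg instanceof AnotherMessageType) {
            return ((AnotherMessageType) msg).sessionId();
        }
        return -1;
    }

    static String entryId(Object msg) {
        return String.valueOf(getSessionId(msg));
    }

    static String shardId(Object msg) {
        return String.valueOf(getSessionId(msg) % NUMBER_OF_SHARDS);
    }

    public static void main(String[] args) {
        Object direct = new OneMessageType(104);
        Object wrapped = new AMessageEnvelope(new AnotherMessageType(104));
        System.out.println(entryId(direct) + " " + shardId(direct));       // 104 4
        System.out.println(entryId(wrapped) + " " + shardId(wrapped));     // 104 4
        System.out.println(entryId("unknown") + " " + shardId("unknown")); // -1 -1
    }
}
```

Note that every unrecognized message ends up on entry "-1" in shard "-1" (Java's `%` keeps the sign of the dividend), which may or may not be intended.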


> I'll try to get some better logs. We have only seen this problem
> once, but we'll see if we can get it to happen again.
>

Thanks.

By the way, are you sure that the two duplicate actors were part of the
same cluster?
Note that auto-down may split the cluster into two separate clusters,
thereby starting two shard coordinators, and so on.
What downing strategy do you use?
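The auto-down concern can be made concrete with a toy model. The sketch below is plain Java, not Akka code or an Akka API: it only counts how many separate clusters remain after a network partition, each of which could start its own shard coordinator singleton. The "keep majority" (quorum) rule is a common alternative included purely for comparison; it is an assumption here, not a setting of Akka 2.3.

```java
import java.util.List;
import java.util.Set;

// Toy model of a network partition; not Akka's downing implementation.
final class DowningSketch {

    // Returns how many separate clusters (each able to start its own shard
    // coordinator singleton) exist after the downing rule has acted on the partition.
    static int clustersAfterSplit(List<Set<String>> partitions, int totalMembers, boolean keepMajority) {
        int sidesThatDownOthers = 0;
        for (Set<String> side : partitions) {
            boolean downsOthers = keepMajority
                    ? side.size() * 2 > totalMembers   // quorum rule: only a strict majority may down the rest
                    : side.size() < totalMembers;      // auto-down: every unreachable member is eventually downed
            if (downsOthers) {
                sidesThatDownOthers++;
            }
        }
        // If no side downs the other, the original cluster (and its single coordinator) is kept.
        return Math.max(1, sidesThatDownOthers);
    }

    public static void main(String[] args) {
        List<Set<String>> twoNodes = List.of(Set.of("node1"), Set.of("node2"));
        List<Set<String>> threeNodes = List.of(Set.of("node1", "node2"), Set.of("node3"));
        System.out.println(clustersAfterSplit(twoNodes, 2, false));   // 2: each node downs the other
        System.out.println(clustersAfterSplit(twoNodes, 2, true));    // 1: neither side has a majority
        System.out.println(clustersAfterSplit(threeNodes, 3, false)); // 2
        System.out.println(clustersAfterSplit(threeNodes, 3, true));  // 1: only {node1, node2} proceeds
    }
}
```

With only two nodes, a majority rule never downs anyone automatically, so the price of never running two coordinators is that an operator has to down the unreachable node by hand.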

/Patrik


>
> Thanks,
> Magnus
>
> On Monday, February 9, 2015 at 1:05:19 PM UTC+2, Patrik Nordwall wrote:
>>
>> Hi Magnus,
>>
>> That should not happen.
>>
>> What does your MessageExtractor look like?
>>
>> We might be able to understand the problem better if you are able to run
>> with debug log level and share those logs.
>>
>> Failures to save snapshots should not influence correctness, but the
>> maintainer of akka-persistence-sql-async might be interested in that issue.
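One way to see why a failed snapshot save should not affect correctness: recovery starts from the newest snapshot that was actually stored and replays the journal events after it, so a missing snapshot only means replaying more events. This toy sketch assumes the journal writes themselves succeeded; the `Snapshot` class and the summing `apply` function are hypothetical, and it is not akka-persistence's recovery code.

```java
import java.util.List;
import java.util.Optional;

final class SnapshotRecoverySketch {

    // A snapshot stores a state together with the sequence number it covers.
    static final class Snapshot {
        final long seqNr;
        final int state;
        Snapshot(long seqNr, int state) { this.seqNr = seqNr; this.state = state; }
    }

    // Hypothetical event-sourced state: the running sum of all event values.
    static int apply(int state, int event) {
        return state + event;
    }

    // Start from the snapshot (if any) and replay only the events after it.
    // journal.get(i) holds the event with sequence number i + 1.
    static int recover(List<Integer> journal, Optional<Snapshot> latestSnapshot) {
        int state = latestSnapshot.map(s -> s.state).orElse(0);
        long from = latestSnapshot.map(s -> s.seqNr).orElse(0L);
        for (int i = (int) from; i < journal.size(); i++) {
            state = apply(state, journal.get(i));
        }
        return state;
    }

    public static void main(String[] args) {
        List<Integer> journal = List.of(5, 3, 7, 1);               // events with seqNr 1..4
        Snapshot afterSecond = new Snapshot(2, 5 + 3);              // snapshot saved after event 2
        System.out.println(recover(journal, Optional.of(afterSecond))); // 16
        System.out.println(recover(journal, Optional.empty()));         // 16: snapshot save failed
    }
}
```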
>>
>> Regards,
>> Patrik
>>
>>
>>
>> On Thu, Feb 5, 2015 at 5:13 PM, Magnus H <magnus....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a problem where multiple actors for the same entry id are running
>>> in the cluster. We have two nodes that are both running a ShardRegion and
>>> the sharded actors. By analyzing logs I've concluded that when a message
>>> comes in to any of the nodes, the message is routed to an entry actor
>>> running on the same node. The two ShardRegions do not seem to have
>>> consensus about where the entry is actually running. Other entries in the
>>> same shard seem to work fine; I have only identified a single entry that is
>>> in this bad state.
>>>
>>> I can see that an entry actor is being created on both nodes:
>>> Node 1: 2015-02-03 17:08:35,773+0000 Info [reactor-akka.actor.default-dispatcher-3@0xb] [com.packagename.service.SessionPollHandler], Instantiated SessionPollHandler for session 104
>>> Node 2: 2015-02-03 17:09:08,599+0000 Info [reactor-akka.actor.default-dispatcher-2@0xa] [com.packagename.service.SessionPollHandler], Instantiated SessionPollHandler for session 104
>>>
>>> The entries are sharded on session id. The shard coordinator
>>> singleton is running on node 2. At 17:09:08 node 2 receives a message that
>>> the ShardRegion routes to session 104. This triggers the actor creation,
>>> although the actor is already running on node 1 and the message should be
>>> routed there.
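The symptom described above, each node routing session 104 to its own local entry, is what one would expect if the two ShardRegions were getting answers from different coordinators. The toy below is not the real ShardCoordinator protocol: `Coordinator`, `Region`, and the "first requester gets the shard" allocation rule are simplifying assumptions. With one shared coordinator both regions agree on the home of shard "4"; with one coordinator per side, each allocates that shard locally.

```java
import java.util.HashMap;
import java.util.Map;

final class ShardAllocationSketch {

    // Toy coordinator: remembers which region owns each shard; the first requester gets it.
    static final class Coordinator {
        private final Map<String, String> shardHomes = new HashMap<>();

        String homeOf(String shardId, String requestingNode) {
            return shardHomes.computeIfAbsent(shardId, id -> requestingNode);
        }
    }

    // Toy region: asks its coordinator once per shard and caches the answer.
    static final class Region {
        final String node;
        final Coordinator coordinator;
        private final Map<String, String> cachedHomes = new HashMap<>();

        Region(String node, Coordinator coordinator) {
            this.node = node;
            this.coordinator = coordinator;
        }

        // Returns the node on which the entry for this shard would be started.
        String route(String shardId) {
            return cachedHomes.computeIfAbsent(shardId, id -> coordinator.homeOf(id, node));
        }
    }

    public static void main(String[] args) {
        // One cluster, one coordinator: both regions agree on the home of shard "4".
        Coordinator shared = new Coordinator();
        Region r1 = new Region("node1", shared);
        Region r2 = new Region("node2", shared);
        System.out.println(r1.route("4") + " " + r2.route("4")); // node1 node1

        // Split cluster: each side has its own coordinator, so each allocates shard "4" locally.
        Region s1 = new Region("node1", new Coordinator());
        Region s2 = new Region("node2", new Coordinator());
        System.out.println(s1.route("4") + " " + s2.route("4")); // node1 node2
    }
}
```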
>>>
>>> The entry actors use Akka persistence and extend
>>> UntypedPersistentActor. One of the entry actors is not able to write to the
>>> journal and fails with a message like:
>>> ERROR[db-async-netty-thread-2] MySQLConnection - Received an error message -> ErrorMessage(1062,#23000,Duplicate entry 'SessionPollHandler-104-2' for key 'PRIMARY')
>>>
>>> I assume this is because the two actors have different views of the
>>> current sequence number. As a result, requests alternate: one is served
>>> correctly, and the next goes to the bad actor and fails because its event
>>> cannot be persisted.
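The alternating success/failure pattern fits two writers sharing one persistence id. This sketch assumes the journal's primary key is the persistence id plus the sequence number, as the key 'SessionPollHandler-104-2' in the MySQL error suggests, and that a writer simply keeps its old sequence number after a failed write; the real UntypedPersistentActor failure handling is not modelled. `Journal` and `Writer` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

final class DuplicateWriterSketch {

    // Toy journal enforcing a unique key on (persistenceId, sequenceNr).
    static final class Journal {
        private final Set<String> keys = new HashSet<>();

        // Returns false when the key already exists, like a duplicate-key error.
        boolean write(String persistenceId, long seqNr) {
            return keys.add(persistenceId + "-" + seqNr);
        }
    }

    // Toy persistent actor: tracks its own idea of the last sequence number.
    static final class Writer {
        private final String persistenceId;
        private final Journal journal;
        private long lastSeqNr;

        Writer(String persistenceId, Journal journal, long recoveredSeqNr) {
            this.persistenceId = persistenceId;
            this.journal = journal;
            this.lastSeqNr = recoveredSeqNr;
        }

        // Assumption: the sequence number only advances when the write succeeds.
        boolean persist() {
            long next = lastSeqNr + 1;
            boolean ok = journal.write(persistenceId, next);
            if (ok) {
                lastSeqNr = next;
            }
            return ok;
        }
    }

    public static void main(String[] args) {
        Journal journal = new Journal();
        // Both actors recovered the same history, ending at sequence number 1.
        Writer onNode1 = new Writer("SessionPollHandler-104", journal, 1);
        Writer onNode2 = new Writer("SessionPollHandler-104", journal, 1);
        // Requests alternate between the two nodes.
        for (int request = 0; request < 4; request++) {
            Writer target = (request % 2 == 0) ? onNode1 : onNode2;
            System.out.println("request " + request + ": " + (target.persist() ? "ok" : "duplicate key"));
        }
    }
}
```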
>>>
>>> I can also see some log messages where the shard coordinator fails to
>>> write to the database:
>>> Persistent snapshot failure: Error 1062 - #23000 - Duplicate entry '/user/sharding/SessionPollShardCoordinator/singleton/coordinator' for key 'PRIMARY'
>>>
>>> None of these occur around the time the two entry actors were created;
>>> they appear some hours before and some hours after. If the poll
>>> coordinator fails to store its state, I can understand that things might
>>> break, for example that no actor is created at all, but I wouldn't expect
>>> two actors to be created. Despite the number of these messages, the system
>>> seems to be working fine except for this one misbehaving entry.
>>>
>>> We are currently running Akka 2.3.6, akka-persistence-sql-async 0.1 and
>>> mysql-async 0.2.15.
>>>
>>> Are there any known issues that could cause something like this, perhaps
>>> already fixed in later versions of Akka? To me it feels like a problem in
>>> Akka, or could there be something that we are doing wrong?
>>>
>>> I am sure I can resolve the issue for now by modifying the journal
>>> database and/or restarting the nodes. I have not tried that yet because
>>> I want to understand what is going on and make sure it does not
>>> happen again. Any ideas would be appreciated.
>>>
>>> Thanks,
>>> Magnus
>>>
>>> --
>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/
>>> current/additional/faq.html
>>> >>>>>>>>>> Search the archives: https://groups.google.com/
>>> group/akka-user
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Akka User List" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to akka-user+...@googlegroups.com.
>>> To post to this group, send email to akka...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/akka-user.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>>
>> Patrik Nordwall
>> Typesafe <http://typesafe.com/> -  Reactive apps on the JVM
>> Twitter: @patriknw
>>



