Roland and Martin, thank you for your quick responses.  I've opened two 
tickets:  https://github.com/akka/akka/issues/15816 
and https://github.com/krasserm/akka-persistence-cassandra/issues/21.

On Wednesday, 3 September 2014 21:02:06 UTC-4, Michael Diamant wrote:
>
> My team uses akka-persistence 2.3.3 and akka-persistence-cassandra 0.3.1.
> Recently, in production, my team observed a View that did not appear to be
> polling as expected.  The application had been running for about 12 hours
> (and has previously run for much longer without issue).  Updates to
> Cassandra did not propagate to the consuming application.  The consumer did
> not emit any error-level logging (in production, logging is set to error).
> The application runs on multiple nodes.  Restarting one application
> instance fixed the issue (i.e. the View read all events on start-up and
> continued polling as expected).
>
> Having limited instrumentation available, there is not much else that I 
> can specify with certainty about the running instances with suspected 
> broken Views.  The View actor is created with the default supervision 
> strategy (i.e. restart on exception), which rules out the scenario that the 
> actor was stopped.  Additionally, local tests were performed to confirm 
> this behavior in the event of an exception.  
>
> The hypothesis my team formed to explain the situation is that perhaps a 
> call to Cassandra via the akka-persistence-cassandra journal never 
> returned.  There are several issues related to the DataStax driver (e.g. 
> https://datastax-oss.atlassian.net/browse/JAVA-268) that might be at play 
> here.  These issues appear to be resolved in 2.0.4, while 
> akka-persistence-cassandra is compiled against 2.0.1.  My team will upgrade 
> accordingly.
>
> Assuming this is the issue, I want to voice my concern about how
> akka-persistence handles journals that fail to return a response.
> Following the code, akka.persistence.Recovery tells the journal to read:
>
>     journal ! ReplayMessages(lastSequenceNr + 1L, toSnr, replayMax,
>       processorId, self)
>
> Then, based on the response type (success/failure), appropriate callbacks 
> are invoked until ultimately in View, onReplayComplete() is invoked.  This 
> function is responsible for scheduling the next polling attempt.  If the 
> journal fails to respond, then the View never seeks to poll again because 
> there is no timeout mechanism (that I am aware of).
>
> If what I'm talking through holds water, would it make sense to consider 
> adding a timeout to the View to ensure it continues to attempt polling for 
> updates?  It could also make sense to instrument a policy for reporting an 
> error when this stale condition is discovered.  I'm happy to think through 
> the proposed enhancements further should the hypothesis be validated.
>
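
To make the proposal in the quoted message concrete, here is a minimal sketch of the timeout idea as a plain Scala state machine. It is not Akka's implementation and uses no Akka API: every name in it (ViewWatchdogSketch, ReplayCompleted, ReplayTimedOut, ReportStale, ScheduleNextPoll, State, startPoll, handle) is hypothetical. The assumption is that each replay request is tagged with a poll id and that a timer is armed alongside it, so a journal that never answers cannot stop the View from scheduling its next poll.

```scala
// Hypothetical model of a View polling loop with a replay watchdog.
// None of these names exist in akka-persistence; they are illustrative only.
object ViewWatchdogSketch {
  sealed trait Event { def pollId: Long }
  // The journal answered the replay request (success or failure).
  final case class ReplayCompleted(pollId: Long) extends Event
  // The timer armed with the replay request fired before any answer arrived.
  final case class ReplayTimedOut(pollId: Long) extends Event

  sealed trait Action
  case object ScheduleNextPoll extends Action
  final case class ReportStale(pollId: Long) extends Action

  final case class State(currentPoll: Long = 0L, awaiting: Boolean = false, staleCount: Int = 0)

  // Conceptually: send the replay request to the journal and arm a timer
  // that will deliver ReplayTimedOut(currentPoll) if nothing comes back.
  def startPoll(s: State): State =
    s.copy(currentPoll = s.currentPoll + 1, awaiting = true)

  def handle(s: State, e: Event): (State, List[Action]) =
    if (!s.awaiting || e.pollId != s.currentPoll) {
      // Late or duplicate event from an earlier poll: ignore it.
      (s, Nil)
    } else e match {
      case ReplayCompleted(_) =>
        (s.copy(awaiting = false), List(ScheduleNextPoll))
      case ReplayTimedOut(id) =>
        // The journal never answered: report the stale condition, keep polling.
        (s.copy(awaiting = false, staleCount = s.staleCount + 1),
          List(ReportStale(id), ScheduleNextPoll))
    }
}
```

The poll id is there so that a journal reply arriving after its timeout is not mistaken for the answer to a later poll. In a real View the two actions would presumably map to the existing scheduling of the next update and to an error-level log entry, but that mapping is an assumption rather than something checked against the Akka source.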
