On 06.05.14 21:13, Matthew Howard wrote:
On Tuesday, May 6, 2014 2:06:16 AM UTC-4, Martin Krasser wrote:
You may be interested in this pull request
<https://github.com/akka/akka/pull/15036> that enables reading
from akka-persistence journals via reactive-stream producers.
Yea, actually that looks much like what I had in mind.
On Tuesday, May 6, 2014 2:29:33 AM UTC-4, massivedynamic wrote:
I was also in the same mindset that you have in terms of just
wanting a pure Akka solution with a pulling pattern. The only
issue that I found there was in the scenario where your master
actors (the ones that hold the work that other actors pull
from) go down. In this case, you're losing all of the data
that the master actors held unless you have some sort of
safe-guard in place (not really sure what this might look like).
I am in the same boat as Ryan mentions below - I'm enriching and
processing data that already resides in a database, so if a failure
occurs I can just reprocess from the start. That is generally the
direction I was thinking with a pure Akka implementation based on
persistence. If you had a (persistent) Processor(s) just accumulating
Tweet events, then a View(s) could subscribe to that processor and
emit the events to downstream workers for processing. In that case the
View acts as the coordinator and the Processor is effectively the
durable mailbox. If either goes down you can recreate it and pick up
where you left off. You'd need to play with
the snapshots, replay and recovery a bit to get proper flow control
while reading the journal... Based on a quick read of Martin's PR
above I think that is where streams would be helpful (a replacement of
the View in my scenario). A PersistentChannel might be an easier
option now that I think about it... then your workers can confirm when
done, giving you automatic replay of anything missed if a
coordinator/worker dies.
Please note that the primary use case for persistent channels is to deal
with slow and/or temporarily available consumers/destinations. They are not
optimized for high throughput (yet). In more detail, a persistent channel
usually has a very high write rate (up to 100k msgs/sec, provided
by a Processor it uses internally) but only a moderate message delivery
rate to consumers. If you need a persistent queue with high message
throughput, consider using a 3rd party messaging product.
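
To make the confirm-and-replay idea discussed above a bit more concrete, here is a minimal sketch in plain Python. It deliberately does not use Akka or any akka-persistence API. The names Journal, Coordinator, pull and confirm are hypothetical and chosen only for illustration. It assumes an append-only journal that survives a crash, workers that pull batches on demand, and a confirmed set that is itself stored durably (here it is simply handed to the recovered coordinator).

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Append-only event log; stands in for a durable journal (hypothetical)."""
    events: list = field(default_factory=list)

    def append(self, payload):
        self.events.append(payload)
        return len(self.events)  # 1-based sequence number of the new event

    def replay(self, from_seq):
        """Yield (sequence_nr, payload) for every event with sequence_nr >= from_seq."""
        for seq in range(max(from_seq, 1), len(self.events) + 1):
            yield seq, self.events[seq - 1]


class Coordinator:
    """Hands out journal entries on request and redelivers anything unconfirmed."""

    def __init__(self, journal, confirmed=None):
        self.journal = journal
        # Assumption: in a real system the confirmed set must be durable too.
        self.confirmed = set(confirmed or ())
        # In-flight bookkeeping is volatile and is lost when the coordinator dies.
        self.in_flight = set()

    def pull(self, batch_size):
        """Return up to batch_size entries that are neither confirmed nor in flight."""
        batch = []
        for seq, payload in self.journal.replay(1):
            if len(batch) == batch_size:
                break
            if seq not in self.confirmed and seq not in self.in_flight:
                self.in_flight.add(seq)
                batch.append((seq, payload))
        return batch

    def confirm(self, seq):
        """Called by a worker once it has finished processing entry seq."""
        self.in_flight.discard(seq)
        self.confirmed.add(seq)


journal = Journal()
for tweet in ["t1", "t2", "t3", "t4"]:
    journal.append(tweet)

coordinator = Coordinator(journal)
first = coordinator.pull(batch_size=2)   # a worker asks for two items
coordinator.confirm(first[0][0])         # only the first item gets confirmed

# Simulate a coordinator crash: in-flight state is gone,
# but the journal and the confirmations survive.
recovered = Coordinator(journal, confirmed=coordinator.confirmed)
redelivered = recovered.pull(batch_size=10)
print(redelivered)  # [(2, 't2'), (3, 't3'), (4, 't4')]
```

Because workers ask for work rather than having it pushed at them, the batch size acts as a crude form of flow control. Anything that was handed out but never confirmed is simply handed out again after recovery, so workers in this sketch would need to tolerate seeing the same item twice.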
I'm not sure in your case how you might protect yourself if the
coordinator dies, although if that is likely there should be some way
to reduce the job and state of the coordinator and so minimize its
role. So for example if the coordinator is responsible for a) pulling
data from the Twitter stream, and b) supervising workers to consume
that data, and c) acting on the response from workers and maintaining
some state... then perhaps that really is best done with 3 actors. I
could see a) and c) possibly being done with actors behind a router to
provide some fault-tolerance (in which case "a" couldn't really be a
Processor, but I think it could use a PersistentChannel).
In my case I had lots of niggling reasons not to introduce another
architectural component - but I think some MQ/DB option is perfectly
reasonable. On another note - Ryan, I have been reading your blog
also... very helpful, thanks.
--
Martin Krasser
blog: http://krasserm.blogspot.com
code: http://github.com/krasserm
twitter: http://twitter.com/mrt1nz