Provability not probability (iPad helping)

On Wednesday, August 27, 2014, Greg Young <gregoryyou...@gmail.com> wrote:
> I have used 1:n; it's a fairly common pattern. I just wanted to point out that it's not a panacea that works everywhere and has some rather large downsides if applied in the wrong place. I agree that the discussion should be about when each is applicable. My original point was that you can't really do 1:1 with many of the backends as they don't support millions of streams.
>
> Btw for performance what many do is an identity map and caching in memory (assuming the whole set does not fit in memory).
>
> There is a side bit to this as well in terms of probability. If you use one actor for n it's next to impossible to show that transactions do not interfere with each other (while this is rather trivial with 1:1 as they would need messages between each other).
>
> Cheers,
>
> Greg
>
> On Wednesday, August 27, 2014, Martin Krasser <krass...@googlemail.com> wrote:
>
>> Whether to go for a 1:1 approach or a 1:n approach (or a partitioned m:n approach where m << n) really depends on the concrete use case and non-functional requirements. Your example might be a good candidate for a 1:1 approach (see also further comments inline) but there are also examples for which a 1:n or m:n approach is a better choice. Here are some general influencing factors:
>>
>> - length of event history required to recover state: bank accounts need the full event history to be recovered but order management is an example where this is often not the case. Orders (trade orders in finance, lab orders during medical treatments, ...) usually have a limited validity so that you can recover active orders from a limited event history (last 10 days, for example), which should make migrations after code changes rather painless.
>> BTW, having only a single persistent actor (or a few) that maintains state is comparable to the role of a "Business Logic Processor" in the LMAX architecture, which originated from the high-frequency trading domain.
>>
>> - latency requirements: creating a new persistent actor has some overhead, not only memory but also bootstrap as its creation requires a roundtrip to the backend store. Re-activation of passivated actors that have been designed around a 1:1 approach may also be in conflict with low latency requirements. Good compromises can often be found by following an m:n approach in this case.
>>
>> - write throughput: high write throughput can only be achieved by batching writes and batching is currently implemented on a per persistent actor basis. Throughput therefore scales better when having a small(er) number of actors. A large number of actors will create more but smaller batches, reducing throughput. This is however more a limitation of the current implementation of akka-persistence. Maybe a switch to batching on journal level is a good idea, so that a single write batch can contain events from several actors.
>>
>> - ...
>>
>> Even if you need to replay a long event history (for example after a code change), you can always do that in the background on a separate node until the new version of the persistent actor has caught up and switch the application to it when done. You could even have both versions running at the same time for A/B testing for example. With a replay rate of 100k/sec you can replay a billion events within a few hours.
>>
>> Further comments inline ...
>>
>> On 26.08.14 20:34, Greg Young wrote:
>>
>> OK for bank accounts there is some amount of state needed to verify a transaction.
>> Let's propose that for now it's the branch you opened your account at, your current balance, your address and a risk classification as well as a customer profitability/loyalty score (these are all reasonable things to track in terms of deciding if a transaction should be accepted or not)
>>
>>
>> When validating commands, you only need to keep that part of application state within persistent actors for which you have strict consistency requirements. In context of bank accounts, this is for sure the case for the balance, but not necessarily for customer profitability, loyalty score or whatever. These metrics may be calculated in the background, hence, having eventual read consistency for them should be sufficient. Consequently this state can be maintained elsewhere (as part of a separate read model) and requested from persistent actors during transaction validation. If you need further metrics in the future, new read models can be added and included into the validation workflow initiated by a persistent actor.
>>
>>
>> I could keep millions of these inside of a single actor.
>>
>> A few problems come up though:
>>
>> Replaying this actor from events is very painful (millions possibly hundreds of millions of events and they must be processed serially) solution->snapshots?
>> Snapshots have all the same versioning issues people are used to with keeping state around. What happens when the state I am keeping changes, say now I also need to keep avg+stddev of transaction amount or we found a bug in how we were maintaining the loyalty score (back to #1)? This will invalidate my snapshot
>>
>>
>> See above, there's no need to keep all of that inside the persistent actor for strict read consistency.
>> Allowing eventual consistency during command validation where possible not only makes the validation process more flexible (by just including new read models if required) but also reduces snapshot migration efforts (by simplifying the state structure inside persistent actors).
>>
>> Furthermore, ensuring strict consistency for persistent actor state requires usage of persist() instead of persistAsync() which reduces throughput at least by a factor of 10. That may again be in conflict with write throughput requirements.
>>
>> To conclude, I think there are use cases where a 1:1 approach makes sense but this shouldn't be a general recommendation IMO. It really depends on the specific functional and non-functional requirements for finding the best compromise.
>>
>> (requiring a full replay or else you run into another whole series of hokey problems trying to do "from here forward" type things (imagine a new feature that relies on a 6-month moving average))
>>
>>
>> On Tue, Aug 26, 2014 at 2:15 PM, Martin Krasser <krass...@googlemail.com> wrote:
>>
>>>
>>> On 26.08.14 20:12, Greg Young wrote:
>>>
>>> In particular I am interested in the associated state that's needed. I can see keeping it in a single actor but this does not turn out well at all for most production systems, in particular as changes happen over time.
>>>
>>>
>>> I don't get your point. Please elaborate.
>>>
>>> On Tue, Aug 26, 2014 at 2:08 PM, Martin Krasser <krass...@googlemail.com> wrote:
>>>
>>>> See my eventsourced example(s), that I published 1-2 years ago, others are closed source
>>>>
>>>>
>>>> On 26.08.14 20:06, Greg Young wrote:
>>>>
>>>> Love to see an example
>>>>
>>>> On Tuesday, August 26, 2014, Martin Krasser <krass...@googlemail.com> wrote:
>>>>
>>>>>
>>>>> On 26.08.14 19:56, Greg Young wrote:
>>>>>
>>>>> I'm curious how you would model say bank accounts with only a few hundred actors. Can you go into a bit of detail?
>>>>>
>>>>>
>>>>> persistent-actor : bank-account = 1:n (instead of 1:1)
>>>>>
>>>>>
>>>>> On Tuesday, August 26, 2014, Martin Krasser <krass...@googlemail.com> wrote:
>>>>>
>>>>>>
>>>>>> On 26.08.14 16:44, Andrzej Dębski wrote:
>>>>>>
>>>>>> My mind must have filtered out the possibility of making snapshots using Views - thanks.
>>>>>>
>>>>>> About partitions: I suspected as much. The only thing that I am wondering now is: is it possible to dynamically create partitions in Kafka? AFAIK the number of partitions is set during topic creation (be it programmatically using the API or CLI tools) and there is a CLI tool you can use to modify an existing topic: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-5.AddPartitionTool. To keep the invariant "PersistentActor is the only writer to a partitioned journal topic" you would have to create those partitions dynamically (usually you don't know up front how many PersistentActors your system will have) on a per-PersistentActor basis.
>>>>>>
>>>>>>
>>>>>> You're right. If you want to keep all data in Kafka without ever deleting them, you'd need to add partitions dynamically (which is currently possible with APIs that back the CLI). On the other hand, using Kafka this way is the wrong approach IMO.
>>>>>> If you really need to keep the full event history, keep old events on HDFS or wherever and only the more recent ones in Kafka (where a full replay must first read from HDFS and then from Kafka) or use a journal plugin that is explicitly designed for long-term event storage.
>>>>>>
>>>>>> The main reason why I developed the Kafka plugin was to integrate my Akka applications in unified log processing architectures as described in Jay Kreps' excellent article <http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying>. Also mentioned in this article is a snapshotting strategy that fits typical retention times in Kafka.
>>>>>>
>>>>>>
>>>>>> On the other hand maybe you are assuming that each actor is writing to a different topic
>>>>>>
>>>>>>
>>>>>> yes, and the Kafka plugin is currently implemented that way.
>>>>>>
>>>>>> - but I think this solution is not viable because information about topics is limited by ZK and other factors: http://grokbase.com/t/kafka/users/133v60ng6v/limit-on-number-of-kafka-topic .
>>>>>>
>>>>>>
>>>>>> A more in-depth discussion about these limitations is given at http://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka with a detailed comment from Jay. I'd say that if you designed your application to run more than a few hundred persistent actors, then the Kafka plugin is probably the wrong choice. I tend to design my applications to have only a small number of persistent actors (which is in contrast to many other discussions on akka-user) which makes the Kafka plugin a good candidate.
>>>>>>
>>>>>> To recap, the Kafka plugin is a reasonable choice if
>>>>>>
>>>>>> - frequent snapshotting is done by persistent actors (every day or so)
>>>>>> - you don't have more than a few hundred persistent actors and
>>>>>> - your application is a component of a unified log processing architecture (backed by Kafka)
>>>>>>
>>>>>> The most interesting next Kafka plugin feature for me to develop is an HDFS integration for long-term event storage (and full event history replay). WDYT?
>>>>>>
>>>>>>
>>>>>> On Tuesday, 26 August 2014 15:28:47 UTC+2, Martin Krasser wrote:
>>>>>>>
>>>>>>> Hi Andrzej,
>>>>>>>
>>>>>>> On 26.08.14 09:15, Andrzej Dębski wrote:
>>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> Lately I have been reading about the possibility of using Apache Kafka as journal/snapshot store for akka-persistence.
>>>>>>>
>>>>>>> I am aware of the plugin created by Martin Krasser: https://github.com/krasserm/akka-persistence-kafka/ and I also read another thread about Kafka as journal: https://groups.google.com/forum/#!searchin/akka-user/kakfka/akka-user/iIHmvC6bVrI/zeZJtW0_6FwJ .
>>>>>>>
>>>>>>> In both sources I linked two ideas were presented:
>>>>>>>
>>>>>>> 1. Set log retention to 7 days, take snapshots every 3 days (example values)
>>>>>>> 2. Set log retention to unlimited.
>>>>>>>
>>>>>>> Here is the first question: in the first case wouldn't it mean that persistent views would receive a skewed view of the PersistentActor state (only events from 7 days) - is it really a viable solution? As far as I know a PersistentView can only receive events - it can't receive snapshots from the corresponding PersistentActor (which is good in the general case).
>>>>>>>
>>>>>>>
>>>>>>> PersistentViews can create their own snapshots which are isolated from the corresponding PersistentActor's snapshots.
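
The first option above (retention of 7 days, snapshots every 3 days) only permits recovery if a snapshot always exists that is newer than the oldest retained event. The following is a minimal sketch of that condition, written in Python to stay independent of any journal implementation. The function name and parameters are hypothetical and are not part of akka-persistence or Kafka; it assumes snapshots are taken at a fixed interval and that events older than the retention period are deleted.

```python
def can_recover(retention_days: float, snapshot_interval_days: float) -> bool:
    """Return True if recovery from the latest snapshot is always possible.

    Assumptions (illustrative only): snapshots are taken every
    `snapshot_interval_days`, and the log drops events older than
    `retention_days`.
    """
    if retention_days <= 0 or snapshot_interval_days <= 0:
        raise ValueError("both periods must be positive")
    # Worst case: recovery starts just before the next snapshot is due, so the
    # newest snapshot is almost `snapshot_interval_days` old. Every event
    # written since that snapshot must still be present in the log.
    return snapshot_interval_days < retention_days


if __name__ == "__main__":
    print(can_recover(retention_days=7, snapshot_interval_days=3))
    print(can_recover(retention_days=7, snapshot_interval_days=10))
```

The same condition would apply to a view that takes its own snapshots: its snapshot interval, not the actor's, has to stay below the retention period.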
>>>>>>>
>>>>>>>
>>>>>>> Second question (more directed to Martin): in the thread I linked you wrote:
>>>>>>>
>>>>>>>> I don't go into Kafka partitioning details here but it is possible to implement the journal driver in a way that both a single persistent actor's data are partitioned *and* kept in order
>>>>>>>
>>>>>>> I am very interested in this idea. AFAIK it is not yet implemented in the current plugin but I was wondering if you could share the high-level idea of how you would achieve that (one persistent actor, multiple partitions, ordering ensured)?
>>>>>>>
>>>>>>>
>>>>>>> The idea is to
>>>>>>>
>>>>>>> - first write events 1 to n to partition 1
>>>>>>> - then write events n+1 to 2n to partition 2
>>>>>>> - then write events 2n+1 to 3n to partition 3
>>>>>>> - ... and so on
>>>>>>>
>>>>>>> This works because a PersistentActor is the only writer to a partitioned journal topic. During replay, you first replay partition 1, then partition 2 and so on. This should be rather easy to implement in the Kafka journal, just didn't have time so far; pull requests are welcome :) Btw, the Cassandra journal <https://github.com/krasserm/akka-persistence-cassandra> follows the very same strategy for scaling with data volume (by using different partition keys).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Martin
>>>>>>>
>>>>>>> --
>>>>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to the Google Groups "Akka User List" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
>>>>>>> To post to this group, send email to akka...@googlegroups.com.
>>>>>>> Visit this group at http://groups.google.com/group/akka-user.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Martin Krasser
>>>>>>>
>>>>>>> blog: http://krasserm.blogspot.com
>>>>>>> code: http://github.com/krasserm
>>>>>>> twitter: http://twitter.com/mrt1nz
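
The partitioning idea in Martin's first reply (events 1 to n go to partition 1, n+1 to 2n to partition 2, and so on, with replay reading the partitions in order) can be sketched as a mapping from sequence number to partition. This is only an illustration in Python, not code from the Kafka or Cassandra journal plugins; `partition_for`, `replay_order` and `events_per_partition` are hypothetical names, and sequence numbers and partitions are assumed to be 1-based as in the description.

```python
from typing import Iterator, Tuple


def partition_for(sequence_nr: int, events_per_partition: int) -> int:
    """Map a 1-based event sequence number to a 1-based partition number."""
    if sequence_nr < 1 or events_per_partition < 1:
        raise ValueError("sequence_nr and events_per_partition must be >= 1")
    # Events 1..n -> partition 1, n+1..2n -> partition 2, and so on.
    return (sequence_nr - 1) // events_per_partition + 1


def replay_order(highest_sequence_nr: int,
                 events_per_partition: int) -> Iterator[Tuple[int, range]]:
    """Yield (partition, sequence numbers) in the order a replay reads them."""
    if highest_sequence_nr < 1:
        return  # nothing has been written yet
    last_partition = partition_for(highest_sequence_nr, events_per_partition)
    for partition in range(1, last_partition + 1):
        first = (partition - 1) * events_per_partition + 1
        last = min(partition * events_per_partition, highest_sequence_nr)
        yield partition, range(first, last + 1)


if __name__ == "__main__":
    for partition, seq_nrs in replay_order(250, 100):
        print(partition, seq_nrs[0], seq_nrs[-1])
```

Because a single writer assigns the sequence numbers, ordering inside each partition plus reading partitions in ascending order is enough to restore the total order; with several writers this mapping alone would not guarantee it.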
>>
>
> --
> Studying for the Turing test

--
Studying for the Turing test
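
As a check on the estimate quoted earlier in the thread (a replay rate of 100k events/sec covering a billion events within a few hours), the arithmetic can be written out as below. This is a small sketch in Python that assumes a constant replay rate and a single serial replay; the function name is hypothetical.

```python
def replay_hours(event_count: int, events_per_second: float) -> float:
    """Hours needed to replay `event_count` events at a constant rate."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    seconds = event_count / events_per_second
    return seconds / 3600


if __name__ == "__main__":
    # One billion events at 100,000 events/sec is 10,000 seconds of replay.
    print(round(replay_hours(10**9, 100_000), 2))
```

Under these assumptions the replay takes 10,000 seconds, a little under three hours, which is consistent with "within a few hours".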