On 19 Aug 2014 at 16:27, Greg Young <[email protected]> wrote:
> I am not responding to this one post just a reply towards the end and will
> discuss a few posts from earlier.
>
> To start I have to agree with some of the posters that premature scaling can
> cause many issues. This actually reminds me of the CQRS journey which people
> mentioned earlier. One of the main criticisms of the CQRS Journey is that it
> prematurely took scaling constraints which causes the code to be much much
> more complex than it needs to be. This was partially due to it being a sample
> app of something larger and partially due to the p&p team also showing azure
> at the same time. Because they wanted to distribute and show Azure at the
> same time the team took cloud constraints as a given. This caused for
> instance every handler in the system to need to be idempotent. While
> seemingly a small constraint this actually adds a significant amount of
> complexity to the system.
>
> The same problem exists in what is being discussed today. For 95+% of systems
> it is totally reasonable that when I write a projection I expect my events to
> have assured ordering. As Vaughn mentioned a few hundred events/second is the
> vast majority of systems. Systems like these can be completely linearized and
> ordering assurances are not an issue. This removes a LOT of complexity in
> projections code as you don't have to handle hundreds to thousands of edge
> cases in your read models where you get events out of order. Saying that
> ordering assurances are not needed and everyone should use causal consistency
> is really saying "we don't care about the bottom 95% of users".

As noted earlier we are in agreement on this: providing projections (which I also called Queries in this thread) without strict ordering would be meaningless because reliable consumption would only be possible from start to finish. The ability to remember a stream position and restart replay from there implies linearization. We also all agree that this feature cannot be supported by a back-end store that is scalable beyond a single partition (i.e. when multiple distributed nodes are concurrently written to). And we agree that this restriction is tolerable in a large number of relevant use-cases.

> RKuhn had mentioned doing joins. You are correct in this is how we do it now.
> We offer historically perfect joins but in live there is no way to do a live
> perfect join via queries. We do however support another mechanism for this
> that will assure that your live join will always match your historical. We
> allow you to precalculate and save the results of the join. This produces a
> stream full of stream links which can then be replayed as many times
> (perfectly) as you want.
>
> There was some discussion above about using precalculated topics to handle
> projections. I believe the terminology was called tags. The general idea if I
> can repeat it is to write an event FooOccurred and to include upon it some
> tags (foo, bar, baz) which would map it to topics that could then be replayed
> as a whole. This on the outset seems like a good idea but will not work well
> in production. The place where it will run into a problem is that I cannot
> know when writing the events all mappings that any future projections may
> wish to have. Tomorrow my business could ask me for a report that looks at a
> completely new way of partitioning the events and I will be unable to do it.

This is a crucial point which implies that Akka Persistence cannot generically provide meaningful projections (or Queries) without relying on a linearizable back-end store.
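
To make the link between a remembered stream position and linearization concrete, here is a minimal sketch in plain Scala. It is a toy model and not Akka Persistence or Event Store API: `LinearLog`, `Projection`, `ShopEvent` and every other name are hypothetical. It assumes a single-writer log that assigns offsets in append order, so a projection only has to remember one number to resume, and a projection invented later can still be built from the same events.

```scala
// Toy model with hypothetical names; not an Akka Persistence or Event Store API.
final case class Stored[E](offset: Long, event: E)

// A single-writer, linearized log: offsets are assigned in append order.
final class LinearLog[E] {
  private var entries = Vector.empty[Stored[E]]

  def append(event: E): Long = {
    val offset = entries.size.toLong
    entries :+= Stored(offset, event)
    offset
  }

  // Replay everything at or after `from`, always in offset order.
  def replay(from: Long): Vector[Stored[E]] = entries.drop(from.toInt)
}

// A projection remembers the next offset it needs, so it can resume later.
final class Projection[E, S](zero: S)(matches: E => Boolean, update: (S, E) => S) {
  private var current = zero
  private var nextOffset = 0L

  def state: S = current
  def position: Long = nextOffset

  def catchUp(log: LinearLog[E]): Unit =
    log.replay(nextOffset).foreach { stored =>
      if (matches(stored.event)) current = update(current, stored.event)
      nextOffset = stored.offset + 1
    }
}

sealed trait ShopEvent
final case class ItemAdded(cart: String, sku: String) extends ShopEvent
final case class CartCheckedOut(cart: String) extends ShopEvent

object ResumableProjectionDemo {
  // Returns (items added, position of that projection, checkouts seen).
  def run(): (Int, Long, Int) = {
    val log = new LinearLog[ShopEvent]
    log.append(ItemAdded("cart-1", "book"))
    log.append(ItemAdded("cart-2", "pen"))

    val itemsAdded =
      new Projection[ShopEvent, Int](0)(_.isInstanceOf[ItemAdded], (n, _) => n + 1)
    itemsAdded.catchUp(log) // processes offsets 0 and 1

    log.append(CartCheckedOut("cart-1"))
    log.append(ItemAdded("cart-2", "ink"))
    itemsAdded.catchUp(log) // resumes at offset 2, nothing lost or repeated

    // A report nobody anticipated when the events were written.
    val checkouts =
      new Projection[ShopEvent, Int](0)(_.isInstanceOf[CartCheckedOut], (n, _) => n + 1)
    checkouts.catchUp(log)

    (itemsAdded.state, itemsAdded.position, checkouts.state)
  }
}
```

The single `nextOffset` is only meaningful because the log has one total order; with several independently written partitions a consumer would need one cursor per partition, and the merged order would no longer be fixed.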

> As I mentioned previously in a quick comment. What is being asked for today
> is actually already supported with akka.persistence providing you are using
> event store as your backend (for those interested today is the release of the
> final RC of 3.0 which has all of the support for the akka.persistence client
> (binaries are for win/linux/mac)). Basically what you would do is run
> akka.persistence on your write side but *not* use it for supporting your read
> models. Instead when dealing with your read models you would use a
> catchupsubscription for what you are interested in. I do not see anything
> inherently wrong with this way of doing things and it begs the question of
> whether this is actually a more appropriate way to deal with eventsourced
> systems using akka.persistence. eg use native storage directly if it
> supports it.

Taking together the conclusions so far I tend to agree with this assessment. Akka Persistence can provide designated event streams with proper ordering (per persistenceId or Topic) while projections or Queries depend on the underlying storage technology. A potential compromise would be to offer generic but inefficient Queries for those Journals that can provide everything in-order; otherwise we would need to standardize on a query language and that prospect makes me shiver …

Regards,

Roland

>
> Cheers,
>
> Greg
> On Tuesday, August 19, 2014 9:24:10 AM UTC-4, √ wrote:
> The decision if scale is needed cannot be implicit, as then you are luring
> people into the non-scalable world and when they find out then it is too late.
>
>
> On Tue, Aug 19, 2014 at 3:20 PM, Roland Kuhn <[email protected]> wrote:
>
> On 19 Aug 2014 at 14:57, Gary Malouf <[email protected]> wrote:
>
> For CQRS specifically, a lot of what people call scalability is in its
> ability to easily model multiple read views to make queries very fast off the
> same event data.
>
> In the cases where a true global ordering is truly necessary, one often does
> not need to handle hundreds of thousands of writes per second. I think the
> ideal is to have the global ordering property for events by default, and have
> to disable that if you feel a need to do more writes per second than a single
> writer can handle.
>
>
> Unfortunately it is not only the number of writes per second, the sheer data
> volume can drive the need for a distributed, partitioned storage mechanism.
> There is only so much you can fit within a single machine and once you go
> beyond that you quickly run into CAP (if you want your guarantees to hold
> 100% at all times). The way forward then necessitates that you must
> compromise on something, either Availability or Determinism (in this case).
>
> Regards,
>
> Roland
>
> Once the global ordering property is enforced, solving many of the publisher
> ordering issues (and supporting sagas) becomes significantly easier to
> achieve.
>
> On Aug 19, 2014 8:49 AM, "Roland Kuhn" <[email protected]> wrote:
>
> On 18 Aug 2014 at 16:49, Patrik Nordwall <[email protected]> wrote:
>
> On Mon, Aug 18, 2014 at 3:38 PM, Roland Kuhn <[email protected]> wrote:
>
> On 18 Aug 2014 at 10:27, Patrik Nordwall <[email protected]> wrote:
>
> Hi Roland,
>
> A few more questions for clarification...
>
>
> On Sat, Aug 16, 2014 at 10:11 PM, Vaughn Vernon <[email protected]>
> wrote:
>
> On Friday, August 15, 2014 11:39:45 AM UTC-6, rkuhn wrote:
> Dear hakkers,
>
> unfortunately it took me a long time to catch up with akka-user to this point
> after the vacation, but on the other hand this made for a very interesting
> and stimulating read, thanks for this thread!
>
> If I may, here’s what I have understood so far:
> 1. In order to support not only actor persistence but also full CQRS we need to
>    adjust our terminology: events are published to topics, where each
>    persistenceId is one such topic but others are also allowed.
> 2. Common use-cases of building projections or denormalized views require the
>    ability to query the union of a possibly large number of topics in such a
>    fashion that no events are lost. This union can be viewed as a synthetic or
>    logical topic, but issues arise in that true topics provide total ordering
>    while these synthetic ones have difficulties doing so.
> 3. Constructing Sagas is hard.
>
> AFAICS 3. is not related to the other two, the mentions in this thread have
> only alluded to the problems so I assume that the difficulty is primarily to
> design a process that has the right eventual consistency properties (i.e.
> rollbacks, retries, …). This is an interesting topic but let’s concentrate on
> the original question first.
>
> The first point is a rather simple one, we just need to expose the necessary
> API for writing to a given topic instead of the local Actor’s persistenceId;
> I’d opt for adding variants of the persist() methods that take an additional
> String argument. Using the resulting event log is then done as for the others
> (i.e. Views and potentially queries should just work).
>
> Does that mean that a PersistentActor can emit events targeted to its
> persistenceId and/or targeted to an external topic and it is only the events
> targeted to the persistenceId that will be replayed during recovery of that
> PersistentActor?
>
> Yes.
>
> Both these two types of events can be replayed by a PersistentView.
>
> Yes; they are not different types of events, just how they get to the Journal
> is slightly different.
>
>
> The only concern is that the Journal needs to be prepared to receive events
> concurrently from multiple sources instead of just the same Actor, but since
> each topic needs to be totally ordered this will not be an additional hassle
> beyond just routing to the same replica, just like for persistenceIds.
>
> Replica as in data store replica, or as in journal actor?
>
> The Journal must implement this in whatever way is suitable for the back-end.
> A generic solution would be to shard the topics as Actors across the cluster
> (internal to the Journal), or the Journal could talk to the replicated
> back-end store such that a topic always is written to one specific node (if
> that helps).
>
> What has been requested is "all events for an Aggregate type", e.g. all
> shopping carts, and this will not scale. It can still be useful, and
> with some careful design you could partition things when scalability is
> needed. I'm just saying that it is a big gun, that can be pointed in the
> wrong direction.
>
> Mixed-up context: #1 is about predefined topics to which events are emitted,
> not queries. We need to strictly keep these separate.
>
>
> Is point one for providing a sequence number from a single ordering source?
>
> Yes, that is also what I was wondering. Do we need such a sequence number? A
> PersistentView should be able to define a replay starting point. (right now I
> think that is missing, it is only supported by saving snapshots)
>
> Or do you mean topic in the sense that I cover above with EntitiesRef? In
> other words, what is the String argument and how does it work? If you would
> show a few sample persist() APIs that might help clarify. And if you are
> referring to a global ordering sequence, who must maintain that? Is it the
> store implementation or the developer?
>
> #1 is not about sequence numbers per se (although it has consequences of that
> kind): it is only about allowing persistenceIds that are not bound to a
> single PersistentActor and that all PersistentActors can publish to. Mock
> code:
>
> def apply(evt: Event) = state = evt(state)
>
> def receiveCommand = {
>   case c: Cmd =>
>     if (isValid(c)) {
>       persist(Event1(c))(apply)
>       persistToTopic("myTopic", Event2(c)) { evt =>
>         apply(evt)
>         sender() ! Done
>       }
>     }
> }
>
>
> Looks good, but to make it clear, there is no transaction that spans over
> these two persist calls.
>
> Of course.
>
>
> Everyone who listens to "myTopic" will then (eventually) get Event2.
>
>
>
> The second point is the contentious one, since a feature request (consistent
> iteration over a query) clashes with a design choice (scalability). First it
> is important to note that this clash is genuine: scalability means that we do
> not want to limit the size of a topic to always fit one unit of consistency,
> our default assumption is that everything should be prepared for
> distribution. We all know that in a distributed system linearizability is not
> generally achievable, meaning that a distributed (synthetic) topic that
> receives events from concurrent sources will not be able to provide a global
> ordering. A non-distributed Journal, OTOH, is a single point of failure which
> is not desirable for many applications (i.e. your business will go down while
> the Journal has issues—true replication requires the ability to fail
> independently and hence is distributed in the CAP sense).
>
> I think I understand this to mean that if you decide to implement a store
> using MySQL/Postgres/Oracle/LevelDB or whatever, then you live with what you
> get and what you don't get from those stores. If so, that's okay with me
> because we already live with those trade offs all the time anyway. I think
> this is far better than trying to make the whole world step up to
> Availability and Partition tolerance when all they want to do is write a
> business app using akka-persistence. This allows teams to decide for
> themselves which of the two CAP attributes they want, and note that even
> Amazon would choose C over A or P in some cases.
>
> I agree, I think the ordering quality of service should be provided by the
> journal implementation and not enforced by akka persistence. If you use
> MySQL/Postgres/Oracle/LevelDB the total ordering is a no-brainer, but if you
> use Cassandra or Kafka it is not.
>
> My point (perhaps not well articulated) was that in order to offer the
> feature of listening to arbitrary Queries the Journal MUST provide the
> resulting event stream in a consistent order (see more below). Providing them
> in random order for each replay is worse than useless as far as I can see.
>
>
> As I see it, a query (like “all events of this type” etc.) should be
> configured for the given Journal and should then be available as a
> (synthetic) topic for normal consumption—but not for being written to. The
> Journal is then free to implement this in any way it sees fit, but barring
> fundamental advances in CS or errors on my part this will always require that
> the synthetic topic is not scalable in the way we usually define that (i.e.
> distributable). As Vaughn points out this may not be an issue at all, actual
> benchmarks would help settle this point. Journal backends that already
> implement a global order can make use of that, for others the synthetic topic
> would work just like any other non-PersistentActor topic with manual
> duplication of those events that match the query (akin to (a) in the first
> post of this thread); this duplication does not necessarily need to double
> the memory consumption, it could also only persist the events by reference
> (depending on the storage engine).
>
> I think these are very typical kinds of queries:
>
> - All newly persisted events that I have not yet processed since the last
> time I asked for them (because I always process all new events in some
> specific way)
>
> This needs a total order, and it must be consistent across runs (i.e.
> eventual consistency is not good enough, otherwise you will lose events or
> double-process them).
>
> - All persisted events that constitute the state of my actor
>
> This is defined to be the Actor’s own topic, hence it is not a Query (and
> therefore not a Problem ;-) ).
>
> - All persisted events from the beginning of time because I just redesigned
> 20 user interface views and I have to delete and rebuild all my view states
> from day-1, and the events must be delivered in the same order that they
> originally happened, or my generated views' state will be wrong
>
> The order that they originally happened in is ill-defined unless your Journal
> globally serializes events—which is not a given.
>
> - All persisted events from the beginning of time because a new system is
> coming on line and needs to be seeded with what happened from the time I was
> deployed until now, and the events must be delivered in the same order that
> they originally happened, or the state of my newly deployed system will be
> wrong
>
> Same here. But it is difficult to imagine things going wrong between
> unrelated (concurrent & distributed) entities, their order did not matter the
> first time around as well, right? An exception here is causal ordering (BTW:
> casual ordering is not known to me).
>
>
> We have still not really defined what this *order* is.
>
> "in the same order that they originally happened" sounds like a wall clock
> timestamp in the PersistentActor, is that what we mean? -- and then we all
> know that it is not perfect in a distributed system, and events may have
> exactly the same timestamp.
>
> Or do we mean the insert order in the data store? There is often no such
> thing in a distributed store.
>
> Or do we mean that the replay of these events should be deterministic, i.e.
> always replayed in the same order?
>
> This is the only one we can aim for AFAICS.
>
> I agree, but that is impossible to achieve with a fully available system
> (AFAIK).
>
> Yes. Queries cannot scale, I have yet to see someone contesting this
> conclusion.
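
As an illustration of the third meaning of *order* above (deterministic replay), here is a minimal sketch in plain Scala. It is one possible approach under stated assumptions, not something the thread prescribes: the `Recorded` type and the sort key (timestamp, then persistenceId, then sequenceNr) are hypothetical, timestamps are assumed not to go backwards within one persistenceId, and the set of events is assumed to be closed (historical). Ties between equal wall-clock timestamps are broken arbitrarily but identically on every replay.

```scala
// Hypothetical event envelope; not an Akka Persistence type.
final case class Recorded(
    persistenceId: String,
    sequenceNr: Long,
    timestamp: Long,
    payload: String)

object DeterministicReplay {
  // Fixes one replay order for a closed set of events, regardless of the
  // order in which the partitions happened to deliver them.
  def replayOrder(events: Seq[Recorded]): Vector[Recorded] =
    events.toVector.sortBy(e => (e.timestamp, e.persistenceId, e.sequenceNr))
}

object DeterministicReplayDemo {
  val a1 = Recorded("cart-1", 1L, 100L, "a1")
  val a2 = Recorded("cart-1", 2L, 105L, "a2")
  val b1 = Recorded("cart-2", 1L, 100L, "b1") // same timestamp as a1
  val b2 = Recorded("cart-2", 2L, 103L, "b2")

  // Two different arrival orders, e.g. from two store partitions.
  val firstRun: Vector[Recorded] = DeterministicReplay.replayOrder(Seq(a1, a2, b1, b2))
  val secondRun: Vector[Recorded] = DeterministicReplay.replayOrder(Seq(b2, b1, a2, a1))
}
```

This does not help a live consumer: an event that arrives late with an earlier timestamp would be sorted in front of events that have already been delivered, so the live order and a later historical replay could still differ.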
>
>
> As usual, I’d love to be proven wrong.
>
> I tried to understand what is supported by EventStore. Found this page:
> https://github.com/EventStore/EventStore/wiki/Projections-fromStreams
>
> It is clear that the total order of a projection from multiple streams is not
> perfect, but probably good enough for practical purposes.
>
> What Greg describes there is exactly what I mean with Eventually
> Linearizable: while things are happening it takes a while for the system to
> sync up and agree on a replay order. Once that is set, everything is
> deterministic.
>
> Yes, the problem is that a PersistentView is querying live data, and if that
> "replay" order is supposed to be the same as a later historical query the
> data must be stored in total order, or some kind of buffering/sorting best
> effort must be used.
>
> Yes.
>
>
>
> The alternative would be to strictly obey causal order by tracking who sent
> what in response to which message. Causality only defines a partial order,
> but that should by definition be enough because those event pairs which are
> unordered also do not care about each other (i.e. the global ordering might
> be different during each replay but that is irrelevant since nobody can
> observe it anyway).
>
> I think this ties in to what is confusing me most about all this. The only
> consistency boundary in my world is the DDD Aggregate instance, i.e. one
> PersistentActor instance. What has been requested is something that requires
> consistency (ordering) across Aggregate instances.
>
> Why not model the topic as a separate PersistentActor? Problems with that have
> been raised, such as how to reliably deliver the events from the Aggregate
> PersistentActor to the topic PersistentActor. Is that impossible to solve?
>
> Again: we need to keep Topics and Queries separated. What you describe is a
> Topic, and that works fine. What people asked for are Queries, and they are
> difficult. The use-case that was initially presented was not about
> consistency between different PersistentActors, it was only about the
> capability to deterministically replay all events in the system, which
> includes the ability to start at a given point. I still have not seen a
> proposal for how to achieve that without someone having to actually store
> different cursors for the distributed datastore partitions—unless the store
> is not partitioned and therefore not scalable.
>
> Regards,
>
> Roland
>
>
> Don't we have a similar problem with the two calls to persist and
> persistToTopic? They are not atomic.
>
> /Patrik
>
>
> Regards,
>
> Roland
>
>
> Regards,
> Patrik
>
>
>
> When it comes to providing queries in a way that does not have a global
> ordering, my current opinion is that we should not do this because it would
> be quite pointless (a.k.a. unusable). A compromise would be to provide
> eventually linearizable queries based on the premise that the application of
> events should be idempotent in any case and overlapping replay (i.e. where
> necessary from the last known-linear point instead of the requested one) must
> be tolerated. AFAIK this is the topic of ongoing research, though, so I’d
> place that lower on the priority list.
>
> Are you here referring to Casual Consistency? Otherwise, I am not sure I
> follow. If you refer to Casual Consistency, I agree that this should be lower
> on the to-support priority list than global ordering, because it is just too
> hard compared to the needs of most teams that want to use ES/CQRS.
>
>
> Does this sound like a fair summary? Please let me know in case I
> misrepresent or misunderstand something, once we reach consensus on what we
> need we’ll ticket and solve it, as usual ;-)
>
> Regards,
>
> Roland
>
> On 12 Aug 2014 at 18:10, Ashley Aitken <[email protected]> wrote:
>
>
> Thanks for your post Vaughn.
>
> On Monday, 11 August 2014 05:57:05 UTC+8, Vaughn Vernon wrote:
> None of this stuff is easy to do, and even harder to do right.
>
> I am the first to agree with that.
>
> Your post gives away the main problem with getting this to work correctly,
> because Actor Model and akka-persistence currently supports the first half of
> A, but not the second half. In other words, to make the interface rich we not
> only need a new set of abstractions, we also need to overcome the direct
> messaging nature of actors because it can be limiting in some use cases.
> With the messaging library I am building, currently named Number 9, which
> includes both persistence and a process manager, this problem is handled as
> follows. Any actor that sends a message may:
>
> 1. send a persistent message to another actor
> 2. send a persistent message to a topic
> 3. send a persistent message primarily to another actor, but also to a topic
>
> That is very interesting.
>
> It seems to me that CQRS commands should be sent as messages (persistent or
> not) - your (1.) and changes of state (AR or application) should be published
> as events (to topics or more generally) - your (2.) but I can't see a need
> for (3.)?
>
> Further, a process manager for a bank account transfer could be implemented
> with a command to the source account (withdrawForTransfer) that would be
> acknowledged by a published persistent event (WithdrawnForTransfer). Similar
> for deposit into target account.
>
> Pawel Kaczor in his DDD-Leaven-Akka series (Lesson 3) includes projections
> from aggregated streams of events and a process manager / saga using Akka
> Persistence by having the ARs persisting their events and also publishing
> their events.
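
The bank-transfer process manager outlined above (command to the source account, acknowledged by a published event, then the same for the target account) can be sketched as a small pure state machine. This is a minimal sketch in plain Scala with no Akka dependency. Only `WithdrawForTransfer` and `WithdrawnForTransfer` are named in the thread; `DepositFromTransfer`, `DepositedFromTransfer`, the states and the `TransferProcess` object are hypothetical names assumed by analogy.

```scala
// Commands the process manager sends to accounts.
sealed trait Command
final case class WithdrawForTransfer(transferId: String, account: String, amount: BigDecimal)
    extends Command
final case class DepositFromTransfer(transferId: String, account: String, amount: BigDecimal)
    extends Command // assumed name, by analogy with the withdrawal

// Events the accounts publish as acknowledgement.
sealed trait Event
final case class WithdrawnForTransfer(transferId: String) extends Event
final case class DepositedFromTransfer(transferId: String) extends Event // assumed name

sealed trait TransferState
case object AwaitingWithdrawal extends TransferState
case object AwaitingDeposit extends TransferState
case object Completed extends TransferState

final case class Transfer(id: String, source: String, target: String, amount: BigDecimal)

object TransferProcess {
  // Starting a transfer emits the first command, addressed to the source account.
  def start(t: Transfer): (TransferState, Command) =
    (AwaitingWithdrawal, WithdrawForTransfer(t.id, t.source, t.amount))

  // Each acknowledged event advances the process and may emit the next command.
  def onEvent(t: Transfer, state: TransferState, event: Event): (TransferState, Option[Command]) =
    (state, event) match {
      case (AwaitingWithdrawal, WithdrawnForTransfer(id)) if id == t.id =>
        (AwaitingDeposit, Some(DepositFromTransfer(t.id, t.target, t.amount)))
      case (AwaitingDeposit, DepositedFromTransfer(id)) if id == t.id =>
        (Completed, None)
      case _ =>
        (state, None) // duplicate, out-of-place or unrelated event: ignore
    }
}
```

Ignoring events that do not fit the current state keeps the handler idempotent, so a redelivered `WithdrawnForTransfer` does not trigger a second deposit. Failure handling (rejected withdrawal, compensation) is left out of this sketch.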
>
>
> http://pkaczor.blogspot.com.au/2014/06/reactive-ddd-with-akka-projections.html
>
> https://github.com/pawelkaczor/ddd-leaven-akka
>
> The only shortcomings (not his fault or a criticism) seem to be: 1) the use
> of two event infrastructures (one for persistence and one for pub/sub), 2)
> the limited ability for complex projections (like Greg mentioned and
> available in Event Store), and 3) lack of persistence for pub/sub events.
>
> The latter makes reconstruction of a read model or construction of a new read
> model after the events have been published more difficult.
>
> If you have watched any of my presentations on this subject you have heard
> this before. I am presenting most of this to the DDD Denver meetup this
> Monday night. The title of the talk is "Building a Reactive Process Manager,
> Twice". The twice part is because I will demonstrate this working both in
> Scala with Akka and also in C# with Dotsero:
>
> Thank you I will look out for that (please share the video link if it is
> recorded and put on the Web). I have seen (but not watched) some of your
> videos because I am unsure as to who is leading here and the videos I saw
> seemed to be from a few years ago.
>
> I've just got your book so I will get on with reading that (for DDD and CQRS
> enlightenment).
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
>
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
>
>
> Dr. Roland Kuhn
> Akka Tech Lead
> Typesafe – Reactive apps on the JVM.
> twitter: @rolandkuhn
>
> --
>
> Patrik Nordwall
> Typesafe - Reactive apps on the JVM
> Twitter: @patriknw

Dr. Roland Kuhn
Akka Tech Lead
Typesafe – Reactive apps on the JVM.
twitter: @rolandkuhn
