On 05.09.14 09:09, Roland Kuhn wrote:
Here is a second round-up of what shall go into tickets. In addition to my first summary we need to:

  * predefine trait JournalQuery with minimal semantics (to make the
    Journal support discoverable at runtime)
  * predefine queries for named streams since that is universally
    useful; these are separate from PersistenceID queries due to
    different consistency requirements
  * add support for write-side tags (see below)
  * add a comprehensive PersistenceTestKit which supports the
    fabrication of arbitrary event streams for both PersistentActor
    and read-side verification
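
A minimal sketch of what "discoverable at runtime" could mean for the first bullet, using plain Scala and no Akka dependency. Everything except the name JournalQuery is hypothetical (QueryByStreamName, QueryNotSupported, JournalCapabilities); it only illustrates a marker trait plus a way for a Journal to advertise which query types it understands.

```scala
// Hypothetical sketch; apart from the name JournalQuery nothing here is an Akka API.
object JournalQuerySketch {
  // Marker trait with minimal semantics: every query message extends it.
  trait JournalQuery

  final case class QueryByPersistenceId(id: String, fromSeqNr: Long, toSeqNr: Long)
      extends JournalQuery
  // Assumed shape of a named-stream query.
  final case class QueryByStreamName(name: String, fromSeqNr: Long, toSeqNr: Long)
      extends JournalQuery

  // Reply a Journal could send for a query type it does not implement.
  final case class QueryNotSupported(query: JournalQuery)

  // A Journal advertises the query classes it understands.
  final class JournalCapabilities(supported: Class[_ <: JournalQuery]*) {
    def supports(query: JournalQuery): Boolean =
      supported.exists(_.isInstance(query))
  }
}
```

A read-side could check supports(query) before sending the query, or simply handle a QueryNotSupported reply.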


Ashley, your challenge about considering non-ES write-sides is one that I think we might not take up: the scope of Akka Persistence is to support persistent Actors and their interactions, therefore I believe we should be opinionated about how we achieve that. If you want to use CQRS without ES then Akka might just not be for you (for some values of “you”, not necessarily you ;-) ).

Now why tags? My previous conclusion was that burdening the write-side with generating them goes counter to the spirit of ES, in that this tagging should well be possible after the fact. The problem is that tagging after the fact can be extremely costly: spawning a particular query on the read-side should not implicitly replay all events of all time, since that has the potential of bringing down the whole system due to overload. I still think that stores might want to offer this feature under the covers (i.e. not accessible via Akka Persistence standard APIs), but for those that cannot we need to provide something else. The most prominent use of tags will probably be that each kind of PersistentActor has its own tag, solving the type issue as well (as brought up by Alex and Olger). In summary, write-side tags are just an optimization.
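
To make the optimization argument concrete, here is a rough sketch in plain Scala of a journal that indexes events by tag at write time, so that a tag query never has to replay the whole log. The names Tagged, TaggingJournal and eventsByTag are assumptions for illustration, not part of any Akka Persistence API.

```scala
// Hypothetical sketch of write-side tags; not an Akka Persistence API.
object WriteSideTagsSketch {
  // The event plus the tags chosen by the write-side, e.g. one tag
  // per kind of PersistentActor.
  final case class Tagged(persistenceId: String, seqNr: Long, tags: Set[String], payload: Any)

  // In-memory stand-in for a journal that maintains a per-tag index on write.
  final class TaggingJournal {
    private var byTag = Map.empty[String, Vector[Tagged]]

    def write(event: Tagged): Unit =
      event.tags.foreach { tag =>
        byTag = byTag.updated(tag, byTag.getOrElse(tag, Vector.empty) :+ event)
      }

    // Reading one tag touches only that tag's index, not all events of all time.
    def eventsByTag(tag: String): Vector[Tagged] =
      byTag.getOrElse(tag, Vector.empty)
  }
}
```

Tagging after the fact would instead require a scan over every stored event the first time a tag is queried, which is the cost described above.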

Concerning the ability to publish to arbitrary topics from any Actor I am on the fence: this is a powerful feature that can be quite a burden to implement. What we are defining here is—again—Akka Persistence, meaning that all things that are journaled are intended to stay there eternally. Using this to realize a (usually ephemeral) event bus is probably going to suffer from impedance mismatches, as witnessed by previous questions concerning the efficiency of deleting log entries—something that should really not be done in the intended use-cases. So, if an Actor wants to persist an Event to make it part of the journaled event stream, then I’d argue that that Actor is at least conceptually a PersistentActor. What is wrong with requiring it to be one also in practice? The only thing that we might want to add is that recovery (i.e. the write-side reading of events) can be opted out of. Thoughts?

Already doable with:

    override def preStart(): Unit = {
      self ! Recover(fromSnapshot = SnapshotSelectionCriteria.None, replayMax = 0L)
    }

but maybe you were thinking more about different traits for writing and recovery?


For everything going beyond the above I’d say we should wait and see what extensions are provided by Journal implementations and how well they work in practice.

Regards,

Roland

On 27 Aug 2014, at 16:34, Roland Kuhn <goo...@rkuhn.info <mailto:goo...@rkuhn.info>> wrote:

Dear hakkers,

there have been several very interesting, educational and productive threads in the past weeks (e.g. here <https://groups.google.com/d/msg/akka-user/SL5vEVW7aTo/KfqAXAmzol0J> and here <https://groups.google.com/d/msg/akka-user/4kbYcwWS2OI/hpmAkxnB9D4J>). We have taken some time to distill the essential problems as well as discuss the proposed solutions and below is my attempt at a summary. In the very likely case that I missed something, by all means please raise your voice. The intention for this thread is to end with a set of GitHub issues for making Akka Persistence as closely aligned with CQRS/ES principles as we can make it.

As Greg and others have confirmed, the write-side (PersistentActor) is already doing a very good job, so we do not see a need to change anything at this point. My earlier proposal of adding specific topics as well as the discussed labels or tags all feel a bit wrong since they benefit only the read-side and should therefore not be a concern/duty of the write-side.

On the read-side we came to the conclusion that PersistentView does nearly the right thing, but it focuses on the wrong aspect: it seems most suited to tracking a single PersistentActor with some slack, and it does not treat back-pressure as a first-class citizen (achieving it is possible, albeit not trivial). What we distilled as the core functionality for a read-side actor is the following:

  * it can ask for a certain set of events
  * it consumes the resulting event stream on its own schedule
  * it can be stateful and persistent on its own


This does not preclude populating e.g. a graph database or a SQL store directly from the journal back-end via Spark, but we do see the need to allow Akka Actors to be used to implement such a projection.

Starting from the bottom up, allowing the read-side to be a PersistentActor in itself means that receiving Events should not require a mixin trait like PersistentView. The next bullet point means that the Event stream must be properly back-pressured, and we have a technology under development that is predestined for such an endeavor: Akka Streams. So the proposal is that any Actor can obtain the ActorRef for a given Journal and send it a request for the event stream it wants, and in response it will get a message containing a stream (i.e. Flow) of events and some meta-information to go with it.

The question that remains at this point is what exactly it means to “ask for a certain set of events”. In order to keep the number of abstractions minimal, the first use-case for this feature is the recovery of a PersistentActor. Each Journal will probably support different kinds of queries, but it must for this use-case respond to

case class QueryByPersistenceId(id: String, fromSeqNr: Long, toSeqNr: Long)

with something like

case class EventStreamOffer(metadata: Metadata, stream: Flow[PersistentMsg])

The metadata allows the recipient to correlate this offer with the corresponding request and it contains other information as we will see in the following.
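
As a rough illustration of this request/offer exchange, the following plain-Scala sketch answers a QueryByPersistenceId from an in-memory log. It makes several assumptions that are not in the proposal: an Iterator stands in for the Akka Streams Flow (the consumer pulls at its own pace), Metadata carries only the original query for correlation, both sequence number bounds are inclusive, and InMemoryJournal and the PersistentMsg fields are made up for the example.

```scala
// Hypothetical sketch; Iterator stands in for the proposed Flow[PersistentMsg].
object EventStreamOfferSketch {
  final case class PersistentMsg(persistenceId: String, seqNr: Long, payload: Any)
  final case class QueryByPersistenceId(id: String, fromSeqNr: Long, toSeqNr: Long)

  // Assumed minimal metadata: the query this offer answers.
  final case class Metadata(query: QueryByPersistenceId)
  final case class EventStreamOffer(metadata: Metadata, stream: Iterator[PersistentMsg])

  final class InMemoryJournal(log: Vector[PersistentMsg]) {
    // Both bounds are treated as inclusive (an assumption).
    def ask(q: QueryByPersistenceId): EventStreamOffer = {
      val events = log.iterator.filter { m =>
        m.persistenceId == q.id && m.seqNr >= q.fromSeqNr && m.seqNr <= q.toSeqNr
      }
      EventStreamOffer(Metadata(q), events)
    }
  }
}
```

The events are only read from the log as the recipient pulls from the iterator, which loosely mirrors consuming the stream on its own schedule.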

Another way to ask for events was discussed as Topics or Labels or Tags in the previous threads, and the idea was that the generated stream of all events was enriched by qualifiers that allow the Journal to construct a materialized view (e.g. a separate queue that copies all events of a given type). This view then has a name that is requested from the read-side in order to e.g. have an Actor that keeps track of certain aspects of all persistent ShoppingCarts in a retail application. As I said above we think that this concern should be handled outside of the write-side because logically it does not belong there. Its closest cousin is the construction of an additional index or view within a SQL store, maintained by the RDBMS upon request from the DBA, but available to and relied upon by the read-side. We propose that this is also how this should work with Akka Persistence: the Journal is free to allow the configuration of materialized views that can be requested as event streams by name. The extraction of the indexing characteristics is performed by the Journal or its backing store, outside the scope of the Journal SPI; one example of doing it this way has been implemented by Martin <https://github.com/krasserm/akka-persistence-kafka/#user-defined-topics> already. We propose to access the auxiliary streams by something like

case class QueryKafkaTopic(name: String, fromSeqNr: Long, toSeqNr: Long)

Sequence numbers are necessary for deterministic replay/consumption. We had long discussions about the scalability implications, which is the reason why we propose to leave such queries proprietary to the Journal backend. Assuming a perfectly scalable (but then of course not real-time linearizable) Journal, the query might allow only

case class QuerySuperscalableTopic(name: String, fromTime: DateTime)

This will try to give you all events that were recorded after the given moment, but replay will not be deterministic and there will not be unique sequence numbers. These properties will be reflected in the Metadata that comes with the EventStreamOffer.
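
One possible way to reflect these properties in the Metadata is sketched below in plain Scala. The field names deterministicReplay and uniqueSeqNrs, the metadataFor and canResumeBySeqNr helpers, and the use of java.time.Instant in place of DateTime are all assumptions made for the example.

```scala
// Hypothetical sketch of metadata describing what a query result guarantees.
object StreamPropertiesSketch {
  import java.time.Instant // stands in for the DateTime used above

  final case class QueryKafkaTopic(name: String, fromSeqNr: Long, toSeqNr: Long)
  final case class QuerySuperscalableTopic(name: String, fromTime: Instant)

  // Assumed fields; the proposal only says such properties are reflected here.
  final case class Metadata(deterministicReplay: Boolean, uniqueSeqNrs: Boolean)

  // What a Journal might attach to the offer for each kind of query.
  def metadataFor(query: Any): Option[Metadata] = query match {
    case _: QueryKafkaTopic         => Some(Metadata(deterministicReplay = true, uniqueSeqNrs = true))
    case _: QuerySuperscalableTopic => Some(Metadata(deterministicReplay = false, uniqueSeqNrs = false))
    case _                          => None // query type not supported
  }

  // A consumer can resume from a stored sequence number only with both guarantees.
  def canResumeBySeqNr(m: Metadata): Boolean =
    m.deterministicReplay && m.uniqueSeqNrs
}
```

A read-side that needs exact resumption could inspect the metadata and fall back to idempotent processing when canResumeBySeqNr is false.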

The last way to ask for events is to select them using an arbitrarily powerful query at runtime, probably with dynamic parameters so that it cannot be prepared or materialized while writing the log. Whether and how this is supported by the Journal depends on the precise back-end, and this is very much deliberate: we want to allow the Journal implementations to focus on different use-cases and offer different feature trade-offs. If an RDBMS is used, then things will naturally be linearized, but less scalable, for example. Document databases can offer a different set of features than a store that keeps BLOBs in Oracle, etc. The user-facing API would be defined by each Journal implementation and could include

case class QueryEventStoreJS(javascriptCode: String)
case class QueryByProperty(jsonKey: String, value: String, since: DateTime)
case class QueryByType(clazz: Class[_], fromSeqNr: Long, toSeqNr: Long)
case class QueryNewStreams(fromSeqNr: Long, toSeqNr: Long)

The last one should elegantly solve the use-case of wanting to catalog which persistenceIds are valid in the Journal (which has been requested several times as well). As discussed for the SuperscalableTopic, each Journal would be free to decide whether it wants to implement deterministic replay, etc.
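
The sketch below shows one possible reading of QueryNewStreams in plain Scala: the journal records a numbered entry the first time it sees a persistenceId, and the query returns those entries for a sequence number range. StreamCreated, CatalogingJournal and the inclusive bounds are assumptions for illustration only.

```scala
// Hypothetical sketch of cataloging persistenceIds via QueryNewStreams.
object NewStreamsSketch {
  final case class QueryNewStreams(fromSeqNr: Long, toSeqNr: Long)
  // Assumed result element: the n-th stream ever created and its id.
  final case class StreamCreated(seqNr: Long, persistenceId: String)

  final class CatalogingJournal {
    private var created = Vector.empty[StreamCreated]
    private var known = Set.empty[String]

    // Storing the event itself is omitted; only the catalog is kept here.
    def write(persistenceId: String, event: Any): Unit =
      if (!known(persistenceId)) {
        known += persistenceId
        created :+= StreamCreated(created.size + 1L, persistenceId)
      }

    // Both bounds are treated as inclusive (an assumption).
    def ask(q: QueryNewStreams): Vector[StreamCreated] =
      created.filter(c => c.seqNr >= q.fromSeqNr && c.seqNr <= q.toSeqNr)
  }
}
```

A read-side that remembers the last seqNr it saw can ask only for streams created since then, instead of listing all persistenceIds again.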

Properly modeling streams of events as Akka Streams feels like a consistent way forward. It also allows non-actor code to be employed for doing stream processing on the resulting event streams, including merging several of them or feeding events into Spark; the possibilities are boundless. I’m quite excited by this new perspective and look forward to your feedback on how well this helps Akka users implement the Q in CQRS.

Regards,


*Dr. Roland Kuhn*
/Akka Tech Lead/
Typesafe <http://typesafe.com/> – Reactive apps on the JVM.
twitter: @rolandkuhn
<http://twitter.com/#%21/rolandkuhn>


--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group. To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscr...@googlegroups.com <mailto:akka-user+unsubscr...@googlegroups.com>. To post to this group, send email to akka-user@googlegroups.com <mailto:akka-user@googlegroups.com>.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.





--
Martin Krasser

blog:    http://krasserm.blogspot.com
code:    http://github.com/krasserm
twitter: http://twitter.com/mrt1nz

