On 26.08.14 16:44, Andrzej Dębski wrote:
My mind must have filtered out the possibility of making snapshots using Views - thanks.

About partitions: I suspected as much. The only thing I am wondering now is whether it is possible to dynamically create partitions in Kafka. AFAIK the number of partitions is set during topic creation (be it programmatically via the API or the CLI tools), and there is a CLI tool you can use to modify an existing topic: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-5.AddPartitionTool. To keep the invariant "a PersistentActor is the only writer to a partitioned journal topic" you would have to create those partitions dynamically, on a per-PersistentActor basis (usually you don't know up front how many PersistentActors your system will have).

You're right. If you want to keep all data in Kafka without ever deleting it, you'd need to add partitions dynamically (which is currently possible with the APIs that back the CLI). On the other hand, using Kafka this way is the wrong approach IMO. If you really need to keep the full event history, keep old events on HDFS (or wherever) and only the more recent ones in Kafka, so that a full replay first reads from HDFS and then from Kafka, or use a journal plugin that is explicitly designed for long-term event storage.
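
For completeness, a minimal sketch of adding partitions programmatically with the admin API that backs the CLI tool. It assumes Kafka 0.8.x's kafka.admin.AdminUtils; the topic name and target partition count are just examples, and the exact semantics of the partition argument should be checked against your Kafka version:

    import kafka.admin.AdminUtils
    import kafka.utils.ZKStringSerializer
    import org.I0Itec.zkclient.ZkClient

    object AddPartitionsExample extends App {
      // connect to the ZooKeeper ensemble used by the Kafka brokers
      val zkClient = new ZkClient("localhost:2181", 30000, 30000, ZKStringSerializer)
      try {
        // grow the example journal topic to 4 partitions
        // (partitions can only be added, never removed)
        AdminUtils.addPartitions(zkClient, "journal-topic-example", 4)
      } finally {
        zkClient.close()
      }
    }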

The main reason why I developed the Kafka plugin was to integrate my Akka applications into unified log processing architectures as described in Jay Kreps' excellent article <http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying>. Also mentioned in this article is a snapshotting strategy that fits typical retention times in Kafka.


On the other hand, maybe you are assuming that each actor is writing to a different topic

yes, and the Kafka plugin is currently implemented that way.

- but I think this solution is not viable because the number of topics is limited by ZK and other factors: http://grokbase.com/t/kafka/users/133v60ng6v/limit-on-number-of-kafka-topic.

A more in-depth discussion of these limitations is given at http://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka with a detailed comment from Jay. I'd say that if you designed your application to run more than a few hundred persistent actors, then the Kafka plugin is probably the wrong choice. I tend to design my applications to have only a small number of persistent actors (in contrast to many other discussions on akka-user), which makes the Kafka plugin a good candidate.

To recap, the Kafka plugin is a reasonable choice if

- frequent snapshotting is done by persistent actors (every day or so; see the sketch below),
- you don't have more than a few hundred persistent actors, and
- your application is a component of a unified log processing architecture (backed by Kafka)
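
A minimal sketch of such a frequently snapshotting persistent actor (the persistenceId, event type, and snapshot threshold are made-up examples; a daily scheduler tick would work just as well):

    import akka.persistence.{PersistentActor, SnapshotOffer}

    case class ItemAdded(item: String)

    class OrderProcessor extends PersistentActor {
      override def persistenceId = "order-processor-1"

      var itemCount = 0            // deliberately simple state
      var eventsSinceSnapshot = 0

      def receiveCommand = {
        case item: String => persist(ItemAdded(item)) { _ =>
          itemCount += 1
          eventsSinceSnapshot += 1
          // snapshot often enough that recovery only needs events that are
          // still within the Kafka topic's retention window
          if (eventsSinceSnapshot >= 1000) {
            saveSnapshot(itemCount)
            eventsSinceSnapshot = 0
          }
        }
      }

      def receiveRecover = {
        case SnapshotOffer(_, snapshot: Int) => itemCount = snapshot
        case ItemAdded(_)                    => itemCount += 1
      }
    }

With frequent snapshots, recovery only has to replay the events written after the latest snapshot, which is what makes a bounded Kafka retention time workable.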

The most interesting next Kafka plugin feature for me to develop is an HDFS integration for long-term event storage (and full event history replay). WDYT?
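
To make the idea concrete, a rough sketch of what such a combined replay could look like. JournalEvent, EventSource and CombinedReplay are hypothetical names; none of this is part of the existing plugin:

    // hypothetical event representation and source abstraction
    case class JournalEvent(persistenceId: String, sequenceNr: Long, payload: Any)

    trait EventSource {
      def read(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long): Iterator[JournalEvent]
    }

    class CombinedReplay(hdfs: EventSource, kafka: EventSource) {
      // full history replay: archived events first (HDFS), then the recent
      // events still retained in the Kafka journal topic; sequence numbers
      // keep the two sources contiguous and in order
      def replay(persistenceId: String, toSequenceNr: Long)(callback: JournalEvent => Unit): Unit = {
        var next = 1L
        for (e <- hdfs.read(persistenceId, next, toSequenceNr)) { callback(e); next = e.sequenceNr + 1 }
        for (e <- kafka.read(persistenceId, next, toSequenceNr)) { callback(e); next = e.sequenceNr + 1 }
      }
    }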


On Tuesday, August 26, 2014 15:28:47 UTC+2, Martin Krasser wrote:

    Hi Andrzej,

    On 26.08.14 09:15, Andrzej Dębski wrote:
    Hello

    Lately I have been reading about the possibility of using Apache
    Kafka as a journal/snapshot store for akka-persistence.

    I am aware of the plugin created by Martin Krasser:
    https://github.com/krasserm/akka-persistence-kafka/
    and I also read another topic about Kafka as a journal:
    https://groups.google.com/forum/#!searchin/akka-user/kakfka/akka-user/iIHmvC6bVrI/zeZJtW0_6FwJ.

    In both sources I linked, two ideas were presented:

    1. Set log retention to 7 days, take snapshots every 3 days
    (example values)
    2. Set log retention to unlimited.

    Here is the first question: in the first case, wouldn't it mean
    that persistent views would receive a skewed view of the
    PersistentActor state (only events from the last 7 days) - is that
    really a viable solution? As far as I know, a PersistentView can
    only receive events - it can't receive snapshots from the
    corresponding PersistentActor (which is good in the general case).

    PersistentViews can create their own snapshots which are isolated
    from the corresponding PersistentActor's snapshots.
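
    For illustration, a minimal sketch of a view that takes its own
    snapshots (class name, ids and threshold are made up; akka 2.3.x
    PersistentView API):

        import akka.persistence.{PersistentView, SnapshotOffer}

        class OrderView extends PersistentView {
          // the persistent actor whose events this view consumes (example id)
          override def persistenceId = "order-processor-1"
          // the view's own id, under which its snapshots are stored
          override def viewId = "order-processor-1-view"

          var itemCount = 0

          def receive = {
            case SnapshotOffer(_, snapshot: Int) =>
              itemCount = snapshot   // recover from the view's own snapshot
            case payload if isPersistent =>
              itemCount += 1         // replayed or live event from the actor
              if (itemCount % 1000 == 0) saveSnapshot(itemCount)
          }
        }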


    Second question (more directed to Martin): in the thread I linked
    you wrote:

         I don't go into Kafka partitioning details here but it is
        possible to implement the journal driver in a way that both a
        single persistent actor's data are partitioned *and* kept in
        order


     I am very interested in this idea. AFAIK it is not yet
    implemented in the current plugin, but I was wondering if you could
    share the high-level idea of how you would achieve that (one
    persistent actor, multiple partitions, ordering ensured)?

    The idea is to

    - first write events 1 to n to partition 1
    - then write events n+1 to 2n to partition 2
    - then write events 2n+1 to 3n to partition 3
    - ... and so on

    This works because a PersistentActor is the only writer to a
    partitioned journal topic. During replay, you first replay
    partition 1, then partition 2, and so on. This should be rather
    easy to implement in the Kafka journal, I just didn't have the
    time so far; pull requests are welcome :) Btw, the Cassandra journal
    <https://github.com/krasserm/akka-persistence-cassandra> follows
    the very same strategy for scaling with data volume (by using
    different partition keys).
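
    A small sketch of that strategy (n, the topic name, Event and
    readPartition are illustrative only, not the plugin's actual API):

        case class Event(sequenceNr: Long, payload: Any)

        // hypothetical low-level reader that returns one partition's
        // events in offset order
        def readPartition(topic: String, partition: Int): Iterator[Event] = ???

        // events are written in fixed-size blocks of n: sequence numbers
        // 1..n go to partition 0, n+1..2n to partition 1, and so on
        // (Kafka partitions are 0-indexed)
        val n = 1000L
        def partitionFor(sequenceNr: Long): Int = ((sequenceNr - 1) / n).toInt

        // replay: since the persistent actor is the topic's only writer,
        // reading the partitions in ascending order yields all events in
        // sequence-number order
        def replay(topic: String, numPartitions: Int)(callback: Event => Unit): Unit =
          for (p <- 0 until numPartitions; event <- readPartition(topic, p))
            callback(event)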

    Cheers,
    Martin

    --
    Martin Krasser

    blog:    http://krasserm.blogspot.com
    code:    http://github.com/krasserm
    twitter: http://twitter.com/mrt1nz


--
Martin Krasser

blog:    http://krasserm.blogspot.com
code:    http://github.com/krasserm
twitter: http://twitter.com/mrt1nz
