Re: Distributed Log as Kafka's backend

Khurrum Nasim Tue, 30 Aug 2016 22:13:50 -0700

Thanks Leigh and Sijie.

I will move the development under a contrib project. I am also going to talk
to the Kafka folks about whether that is the best place to host this idea.


KN

On Mon, Aug 29, 2016 at 7:45 AM, Leigh Stewart <lstew...@twitter.com.invalid> wrote:

> Agree with Sijie, I think this is exciting work and I didn't mean to cut
> off your options. My objection was just about code organization.
>
> A contribs project seems like a good compromise for now, until we can think
> of a better place to put the code.
>
> Sijie's right though: if we want to fully productionize this and make it
> reusable, this might not be the best long-term location.
>
> What are your thoughts, Khurrum? Does the code organization/layering
> argument make sense?
>
> Thanks!
>
> On Fri, Aug 26, 2016 at 8:24 PM, Sijie Guo <si...@apache.org> wrote:
>
> > + Leigh
> >
> > Khurrum,
> >
> > Thanks for your hard work on this. The approach in general looks good
> > to me.
> >
> > However, I tend to agree with what Leigh commented on the pull request.
> > Ideally we want DL to focus on single streams themselves: their
> > durability, consistency and performance. Different applications might
> > use streams in different ways to produce different data/consumption
> > models. For example, you can use a set of streams to build a Kafka-like
> > partitioned pub/sub, or other people can use a set of streams to build a
> > queue-like messaging system or a database.
> >
> > On the other hand, it is very interesting to see a good Kafka
> > client integration using DL streams as partitions rather than just an
> > incomplete tutorial. I wouldn't want to discourage your hard work.
> > Probably a tradeoff here is to create a distributedlog-contribs module and
> > move the distributedlog-kafka module under it. The distributedlog-contribs
> > module would host any integration-related contributions. This would help
> > avoid any confusion. Any thoughts, Leigh?
> >
> > Also, Khurrum, did you talk with the Kafka community? I am not sure if DL is
> > the right repo to host this. Does anyone else have better suggestions on
> > this?
> >
> > - Sijie
> >
> >
> > On Thursday, August 25, 2016, Khurrum Nasim <khurrumnas...@gmail.com>
> > wrote:
> >
> >> I sent out another pull request to improve the Kafka publisher in the
> >> tutorial: https://github.com/apache/incubator-distributedlog/pull/16
> >>
> >> We tried to reuse the existing Kafka configuration, key/value serializers
> >> and partitioner as much as we can, so that we don't need to rewrite our
> >> existing services to adopt DistributedLog.
> >>
> >> Although the pull request is still WIP, we'd like to know if we are using
> >> DistributedLog in the right way. In particular, we are thinking of changing
> >> the write proxy to also return either the transaction id or the sequence id
> >> on write requests.
> >>
> >> We appreciate your help.
> >>
> >> - KN
> >>
> >>
> >>
> >> On Thu, Aug 25, 2016 at 1:28 AM, Khurrum Nasim <khurrumnas...@gmail.com>
> >> wrote:
> >>
> >> > I sent out a pull request for the offset sequencer:
> >> > https://github.com/apache/incubator-distributedlog/pull/15
> >> >
> >> > I am not sure if there is any code guideline to follow. I tried my best
> >> > to follow the existing code style. If I did anything wrong, please help
> >> > me fix it.
> >> >
> >> > - KN
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Aug 23, 2016 at 9:38 AM, Khurrum Nasim <khurrumnas...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> After reading the DL code, we have a better idea of how to use
> >> >> DistributedLog as the Kafka implementation. There are two approaches:
> >> >> one is to use the distributedlog-core library directly in the Kafka
> >> >> broker, while the other is to use all the DL components.
> >> >>
> >> >> The first approach is basically to replace the storage of the Kafka
> >> >> broker with BookKeeper. The good part is that all the Kafka wire
> >> >> protocols will remain unchanged. But it might take longer and also make
> >> >> operations more complicated.
> >> >>
> >> >> The second approach is to implement Kafka's publisher and subscriber
> >> >> API using DL. It would be much faster and more consistent for
> >> >> operations (we would only need to operate the DL backend). However, it
> >> >> would only support the Java client.
> >> >>
> >> >> We discussed this internally and felt the second approach is good enough
> >> >> for us and easier to achieve, so we will start with it. If anyone is
> >> >> interested in the first approach, we'd like to participate and help
> >> >> too.
> >> >>
> >> >> Here is an outline of our changes:
> >> >>
> >> >> * Kafka Namespace: as I replied in the other email thread, we want to
> >> >> lay out the streams in the following format:
> >> >>
> >> >> namespace/topic/partitions : storing all the partitions
> >> >> namespace/topic/partitions/N : storing the given partition `N`
> >> >> namespace/topic/subscriptions : storing all the subscriptions
> >> >> namespace/topic/subscriptions/S : storing the information of
> >> >> subscription `S`
> >> >>
> >> >> Both `namespace/topic/partitions/N` and
> >> >> `namespace/topic/subscriptions/S` are DL streams.
> >> >>
> >> >> * Offset Sequencer: we want to assign the `offset` as the transaction id
> >> >> instead of the `timestamp`. We will add an `OffsetSequencer` and allow the
> >> >> write proxy to load `OffsetSequencer` instead of `TimeSequencer`.
> >> >>
> >> >> * Use separate DL streams to store the information of a subscription,
> >> >> such as offsets and consumer load-balancing information.
> >> >>
> >> >> Do you see any concerns here?
> >> >>
> >> >>
> >> >> - KN
> >> >>
> >> >> On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <si...@apache.org> wrote:
> >> >>
> >> >>> Thanks Khurrum.
> >> >>>
> >> >>> At this point, we don't have any specific process to follow for big
> >> >>> features. We were discussing one under
> >> >>> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
> >> >>>
> >> >>> But ideally, let's use the mailing list for discussion and use a
> >> >>> Confluence page to turn the discussions into a design doc.
> >> >>>
> >> >>> If you already have a Confluence account (if not, please create one),
> >> >>> please email me your account name. I can then grant you permission to
> >> >>> edit.
> >> >>>
> >> >>> - Sijie
> >> >>>
> >> >>> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <khurrumnas...@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>> > Sijie,
> >> >>> >
> >> >>> > Thank you so much for your quick reply. We are using Kafka now and we
> >> >>> > are interested in DL features like durability and handling of slow
> >> >>> > machines.
> >> >>> >
> >> >>> > If it is okay with the community, we'd like to give it a try and
> >> >>> > evaluate the solution. Is there any process that I should follow?
> >> >>> >
> >> >>> > KN
> >> >>> >
> >> >>> > On Sunday, July 31, 2016, Sijie Guo <si...@apache.org> wrote:
> >> >>> >
> >> >>> > > Khurrum,
> >> >>> > >
> >> >>> > > Interesting. Thank you for your interest in DistributedLog.
> >> >>> > >
> >> >>> > > Three years ago, when we started the project internally at Twitter,
> >> >>> > > we did have a plan to use it as a backend for both kestrel
> >> >>> > > (Twitter's in-house queue system) and Kafka. However, we didn't go
> >> >>> > > down that direction. Instead, we built a similar self-serve pub/sub
> >> >>> > > system over DistributedLog to consolidate our kestrel and Kafka. So
> >> >>> > > we don't have a concrete plan to build Kafka's interface over
> >> >>> > > DistributedLog. The module was put under tutorials mostly to give
> >> >>> > > people an idea of how it can be used for building a partition-based
> >> >>> > > pub/sub system.
> >> >>> > >
> >> >>> > > However, I don't have any strong preference here. If you think it
> >> >>> > > would be useful to other people, you are welcome to contribute.
> >> >>> > > We'd be happy to guide you and offer any help.
> >> >>> > >
> >> >>> > > Also, it would be good if you could explain more about what you are
> >> >>> > > planning to do, so other people in the community can chime in and
> >> >>> > > discuss.
> >> >>> > >
> >> >>> > > Please let us know your thoughts. You are very welcome to make any
> >> >>> > > contributions.
> >> >>> > >
> >> >>> > > - Sijie
> >> >>> > >
> >> >>> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim
> >> >>> > > <khurrumnas...@gmail.com> wrote:
> >> >>> > >
> >> >>> > > > Hello folks,
> >> >>> > > >
> >> >>> > > > I saw there is a 'distributedlog-kafka' module in tutorials, but
> >> >>> > > > it does not seem complete yet. I am wondering if there is a plan
> >> >>> > > > to fully implement Kafka's interface. It would be great if we
> >> >>> > > > could use Kafka's interface to access DistributedLog. I'd like to
> >> >>> > > > contribute if there is a plan.
> >> >>> > > >
> >> >>> > > > Thanks,
> >> >>> > > > KN
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >
>
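
Note on the stream layout proposed in the Aug 23 message quoted above: the
four paths can be derived mechanically from a namespace, a topic, a partition
number and a subscription name. The following is only a minimal sketch of that
derivation. `KafkaStreamLayout` and its method names are hypothetical and are
not taken from DistributedLog or from the pull requests; the validation rules
are likewise an assumption.

```java
// Hypothetical helper (not part of DistributedLog): derives stream names for
// the layout namespace/topic/{partitions,subscriptions}/... from the thread.
final class KafkaStreamLayout {
    private KafkaStreamLayout() {}

    // Container path under which every partition stream of a topic lives.
    static String partitionsRoot(String namespace, String topic) {
        return namespace + "/" + segment(topic) + "/partitions";
    }

    // DL stream that stores partition N of the topic.
    static String partitionStream(String namespace, String topic, int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0: " + partition);
        }
        return partitionsRoot(namespace, topic) + "/" + partition;
    }

    // Container path under which every subscription stream of a topic lives.
    static String subscriptionsRoot(String namespace, String topic) {
        return namespace + "/" + segment(topic) + "/subscriptions";
    }

    // DL stream that stores the state (e.g. offsets) of subscription S.
    static String subscriptionStream(String namespace, String topic, String subscription) {
        return subscriptionsRoot(namespace, topic) + "/" + segment(subscription);
    }

    // Assumption: topic and subscription names are single path segments.
    private static String segment(String name) {
        if (name == null || name.isEmpty() || name.indexOf('/') >= 0) {
            throw new IllegalArgumentException("invalid path segment: " + name);
        }
        return name;
    }
}
```

With this sketch, partition 3 of topic `orders` in namespace `ns` maps to
`ns/orders/partitions/3`, and subscription `billing` on the same topic maps to
`ns/orders/subscriptions/billing`.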

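
Note on the `OffsetSequencer` idea from the same Aug 23 message: the point of
using an offset instead of a timestamp as the transaction id is that ids
become dense and strictly increasing, like Kafka offsets. The sketch below
illustrates only that contrast. `IdSequencer` is a hypothetical stand-in
interface defined here for self-containment, and `TimeBasedSequencer` is an
illustrative guess at timestamp-based behaviour; neither is claimed to match
the actual DistributedLog classes or the code in the pull request.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for whatever interface the write proxy would load.
interface IdSequencer {
    long nextId();
}

// Offset-style ids: each record gets the previous id plus one.
final class OffsetSequencer implements IdSequencer {
    private final AtomicLong lastOffset;

    // lastOffset is the offset of the last record already in the stream,
    // or -1 for an empty stream (assumption: the caller recovers this value).
    OffsetSequencer(long lastOffset) {
        this.lastOffset = new AtomicLong(lastOffset);
    }

    @Override
    public long nextId() {
        return lastOffset.incrementAndGet();
    }
}

// Timestamp-style ids for comparison: never decreasing, but ids can repeat
// within one millisecond and leave gaps between milliseconds.
final class TimeBasedSequencer implements IdSequencer {
    private long lastId = Long.MIN_VALUE;

    @Override
    public synchronized long nextId() {
        lastId = Math.max(lastId, System.currentTimeMillis());
        return lastId;
    }
}
```

An `OffsetSequencer` created with -1 hands out 0, 1, 2, ... in order, while
one created with the last persisted offset continues from the next value. How
the last offset is recovered when stream ownership changes is left open here,
as it is in the thread.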