Re: Distributed Log as Kafka's backend

Khurrum Nasim Thu, 25 Aug 2016 01:28:57 -0700

I sent out a pull request about the offset sequencer.
https://github.com/apache/incubator-distributedlog/pull/15


I am not sure if there is any code guideline to follow. I tried my best to
follow the existing code style. If I did anything wrong, please help me fix
it.

- KN




On Tue, Aug 23, 2016 at 9:38 AM, Khurrum Nasim <[email protected]>
wrote:

> Hi All,
>
> After reading the DL code, we have a better idea of how to use
> DistributedLog as the Kafka implementation. There are two approaches: one
> is to use the distributedlog-core library directly in the Kafka broker,
> while the other is to use all the DL components.
>
> The first approach is basically to replace the storage of the Kafka broker
> with BookKeeper. The good part is that all the Kafka wire protocols will
> remain unchanged. But it might take longer and also make operations
> more complicated.
>
> The second approach is to implement Kafka's publisher and subscriber API
> using DL. It would be much faster to implement and more consistent to
> operate (we would only need to operate the DL backend). However, it would
> only support the Java client.
>
> We discussed this internally. We felt the second approach is good enough
> for us and easier to achieve, so we will start with it. If anyone is
> interested in the first approach, we'd like to participate and help
> too.
>
> Here is an outline of our changes:
>
> * Kafka Namespace: as I replied in the other email thread, we want to
> lay out the streams in the following format:
>
> namespace/topic/partitions : storing all the partitions
> namespace/topic/partitions/N : storing the given partition `N`
> namespace/topic/subscriptions : storing all the subscriptions
> namespace/topic/subscriptions/S : storing the information of subscription `S`
>
> Both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
> are DL streams.
>
> * Offset Sequencer: we want to assign the `offset` as the transaction id
> instead of the `timestamp`. We will add an `OffsetSequencer` and allow the
> write proxy to load `OffsetSequencer` instead of `TimeSequencer`.
>
> * Use separate DL streams to store the information of a subscription,
> such as offsets and consumer load-balancing information.
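
A minimal sketch of the stream layout described above, assuming the stream names are plain strings joined with `/`. The class and method names (`KafkaStreamNames`, `partitionsRoot`, `partition`, `subscriptionsRoot`, `subscription`) are hypothetical and exist only for illustration; they are not part of DistributedLog or Kafka.

```java
// Hypothetical helper that builds the stream names from the proposed layout.
// Nothing here calls into DistributedLog; it only formats paths.
final class KafkaStreamNames {
    private KafkaStreamNames() {}

    // namespace/topic/partitions : parent of all partition streams
    static String partitionsRoot(String namespace, String topic) {
        return namespace + "/" + topic + "/partitions";
    }

    // namespace/topic/partitions/N : the DL stream for partition N
    static String partition(String namespace, String topic, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("partition must be >= 0: " + n);
        }
        return partitionsRoot(namespace, topic) + "/" + n;
    }

    // namespace/topic/subscriptions : parent of all subscription streams
    static String subscriptionsRoot(String namespace, String topic) {
        return namespace + "/" + topic + "/subscriptions";
    }

    // namespace/topic/subscriptions/S : the DL stream for subscription S
    static String subscription(String namespace, String topic, String s) {
        return subscriptionsRoot(namespace, topic) + "/" + s;
    }
}
```

With this sketch, `KafkaStreamNames.partition("ns", "orders", 3)` yields `ns/orders/partitions/3`, which would be the name of one DL stream under the proposal.
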
>
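
The offset sequencer idea above can be sketched as follows. This is an illustration under assumptions, not the code in the pull request: the `Sequencer` interface is a local stand-in defined for the example, and `TimeSequencerSketch` and `OffsetSequencerSketch` are hypothetical names. The point is only the difference in how ids are assigned: a timestamp-based sequencer hands out non-decreasing wall-clock values, while an offset-based one hands out dense, consecutive ids.

```java
import java.util.concurrent.atomic.AtomicLong;

// Local stand-in for the sequencer abstraction a write proxy might load
// (assumption; defined here so the sketch is self-contained).
interface Sequencer {
    long nextId();
}

// Timestamp-style ids: follow the wall clock and never go backwards,
// but may repeat within a millisecond and leave gaps between records.
final class TimeSequencerSketch implements Sequencer {
    private long lastId = -1L;

    @Override
    public synchronized long nextId() {
        lastId = Math.max(lastId, System.currentTimeMillis());
        return lastId;
    }
}

// Offset-style ids: consecutive integers continuing after the last
// offset already written to the stream.
final class OffsetSequencerSketch implements Sequencer {
    private final AtomicLong next;

    OffsetSequencerSketch(long lastOffset) {
        this.next = new AtomicLong(lastOffset + 1);
    }

    @Override
    public long nextId() {
        return next.getAndIncrement();
    }
}
```

For an empty partition, constructing the sketch with `new OffsetSequencerSketch(-1L)` makes the first record get offset 0. When recovering an existing stream, the last written offset would be passed in instead.
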
> Do you see any concerns here?
>
>
> - KN
>
> On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <[email protected]> wrote:
>
>> Thanks Khurrum.
>>
>> At this point, we don't have any specific process to follow for big
>> features. We were discussing one under
>> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>>
>> But ideally, let's use the mailing list for discussion and use a Confluence
>> page for capturing the discussions in a design doc.
>>
>> If you already have a Confluence account (if not, please create one),
>> please email me your account name. I can grant you permission so that
>> you can edit.
>>
>> - Sijie
>>
>> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <[email protected]>
>> wrote:
>>
>> > Sijie,
>> >
>> > Thank you so much for your quick reply. We are using Kafka now and we are
>> > interested in the features in DL like durability and handling slow
>> > machines.
>> >
>> > If it is okay with the community, we'd like to give it a try and evaluate
>> > the solution. Is there any process that I should follow?
>> >
>> > KN
>> >
>> > On Sunday, July 31, 2016, Sijie Guo <[email protected]> wrote:
>> >
>> > > Khurrum,
>> > >
>> > > Interesting. Thank you for your interest in DistributedLog.
>> > >
>> > > Three years ago when we started the project internally at Twitter, we did
>> > > have a plan to use it as a backend for both kestrel (Twitter's in-house
>> > > queue system) and Kafka. However, we didn't go down that direction.
>> > > Instead, we built a similar self-serve pub/sub system over DistributedLog
>> > > to consolidate our kestrel and Kafka. So we don't have a concrete plan to
>> > > build Kafka's interface over DistributedLog. The module was put under
>> > > tutorials mostly to give people an idea of how it can be used for building
>> > > a partition-based pub/sub system.
>> > >
>> > > However, I don't have any strong preference here. If you think it would be
>> > > useful to other people, you are welcome to contribute. We'd be happy to
>> > > guide you and offer any help.
>> > >
>> > > Also, it might be good if you can explain more about what you are planning
>> > > to do. Other people in the community can chime in and discuss.
>> > >
>> > > Please let us know your thoughts. You are very welcome to make any
>> > > contributions.
>> > >
>> > > - Sijie
>> > >
>> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <[email protected]>
>> > > wrote:
>> > >
>> > > > Hello folks,
>> > > >
>> > > > I saw there is a 'distributedlog-kafka' module in tutorials. But it seems
>> > > > not complete yet. I am wondering if there is a plan to fully implement
>> > > > Kafka's interface. It would be great if we can use Kafka's interface to
>> > > > access DistributedLog. I'd like to contribute if there is a plan.
>> > > >
>> > > > Thanks,
>> > > > KN
>> > > >
>> > >
>> >
>>
>
>
