I sent out a pull request about the offset sequencer. https://github.com/apache/incubator-distributedlog/pull/15
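
For readers following the thread, here is a minimal sketch of the idea, not the code in the pull request. It assumes a sequencer is simply something that hands out the next transaction id; the `Sequencer` interface below is a hypothetical local stand-in rather than DL's actual API, and the constructor argument is an assumption about how a writer would resume after recovery.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a sequencer contract: hands out transaction ids. */
interface Sequencer {
    long nextId();
}

/**
 * Assigns consecutive offsets as transaction ids, instead of timestamps.
 * Seed it with the last offset already written to the stream
 * (use -1 for an empty stream so that the first record gets offset 0).
 */
final class OffsetSequencer implements Sequencer {
    private final AtomicLong lastOffset;

    OffsetSequencer(long lastOffset) {
        this.lastOffset = new AtomicLong(lastOffset);
    }

    @Override
    public long nextId() {
        // Atomic increment keeps ids dense and strictly increasing
        // even if several threads ask for an id at the same time.
        return lastOffset.incrementAndGet();
    }
}
```

Unlike a time-based sequencer, the ids here are dense (no gaps), which is what a Kafka-style consumer expects when it asks for "offset + 1".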
I am not sure whether there is a code guideline to follow. I tried my best
to follow the existing code style. If I did anything wrong, please help me
fix it.

- KN

On Tue, Aug 23, 2016 at 9:38 AM, Khurrum Nasim <[email protected]> wrote:

> Hi All,
>
> After reading the DL code, we have a better idea of how to use distributed
> log as the kafka implementation. There are two approaches: one is to use
> the distributedlog-core library directly in the kafka broker, while the
> other is to use all the DL components.
>
> The first approach basically replaces the storage of the kafka broker
> with bookkeeper. The good part is that all the kafka wire protocols will
> remain unchanged. But it might take longer and also make operations
> more complicated.
>
> The second approach is to implement Kafka's publisher and subscriber API
> using DL. It would be much faster and more consistent on operations (we
> only need to operate the DL backend). However, it would only support the
> java client.
>
> We discussed this internally. We felt the second approach is good enough
> for us and easier to achieve, so we will start with it. If anyone is
> interested in the first approach, we'd like to participate and help too.
>
> Here is the outline of our changes:
>
> * Kafka Namespace: as I replied in the other email thread, we want to
> lay out the streams in the following format:
>
> namespace/topic/partitions : storing all the partitions
> namespace/topic/partitions/N : storing the given partition `N`
> namespace/topic/subscriptions : storing all the subscriptions
> namespace/topic/subscriptions/S : storing the information of subscription `S`
>
> Both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
> are DL streams.
>
> * Offset Sequencer: we want to assign `offset` as the transaction id
> instead of `timestamp`. We will add an `OffsetSequencer` and allow the
> write proxy to load `OffsetSequencer` instead of `TimeSequencer`.
>
> * Use separate DL streams to store the information of a subscription,
> such as offsets and consumer load balancing information.
>
> Do you see any concerns here?
>
>
> - KN
>
> On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <[email protected]> wrote:
>
>> Thanks Khurrum.
>>
>> At this point, we don't have any specific process to follow for big
>> features. We were discussing one under
>> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>>
>> But ideally, let's use the mailing list for discussion and use a
>> confluence page to reflect the discussions in a design doc.
>>
>> If you already have a confluence account (if not, please create one),
>> please email me your account. I can grant you permission, then you
>> can edit.
>>
>> - Sijie
>>
>> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <[email protected]>
>> wrote:
>>
>> > Sijie,
>> >
>> > Thank you so much for your quick reply. We are using Kafka now and we
>> > are interested in DL features like durability and handling slow
>> > machines.
>> >
>> > If it is okay with the community, we'd like to give it a try and
>> > evaluate the solution. Is there any process that I should follow?
>> >
>> > KN
>> >
>> > On Sunday, July 31, 2016, Sijie Guo <[email protected]> wrote:
>> >
>> > > Khurrum,
>> > >
>> > > Interesting. Thank you for your interest in DistributedLog.
>> > >
>> > > Three years ago when we started the project internally at Twitter,
>> > > we did have a plan to use it as a backend for both kestrel (Twitter's
>> > > in-house queue system) and Kafka. However, we didn't go down that
>> > > direction. Instead, we built a similar self-serve pub/sub system over
>> > > DistributedLog to consolidate our kestrel and kafka. So we don't have
>> > > a concrete plan to build the kafka's interface over DistributedLog.
>> > > The module was put under tutorials mostly to give people an idea of
>> > > how it can be used for building a partition-based pub/sub system.
>> > >
>> > > However, I don't have any strong preference here. If you think it
>> > > would be useful to other people, you are welcome to contribute. We'd
>> > > be happy to guide and offer any help.
>> > >
>> > > Also, it might be good if you can explain more about what you are
>> > > planning to do. Other people in the community can chime in and
>> > > discuss.
>> > >
>> > > Please let us know your thoughts. You are very welcome to make any
>> > > contributions.
>> > >
>> > > - Sijie
>> > >
>> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <[email protected]>
>> > > wrote:
>> > >
>> > > > Hello folks,
>> > > >
>> > > > I saw there is a 'distributedlog-kafka' module in tutorials, but it
>> > > > does not seem complete yet. I am wondering if there is a plan to
>> > > > fully implement the kafka's interface. It would be great if we
>> > > > could use kafka's interface to access distributed log. I'd like to
>> > > > contribute if there is a plan.
>> > > >
>> > > > Thanks,
>> > > > KN
