Hi All,

After reading the DL code, we have a better idea of how to use
DistributedLog as the Kafka implementation. There are two approaches:
one is to use the distributedlog-core library directly in the Kafka
broker, while the other is to use all the DL components.

The first approach is basically to replace the storage of the Kafka
broker with BookKeeper. The good part is that all the Kafka wire
protocols remain unchanged. But it might take longer and also make
operations more complicated.

The second approach is to implement Kafka's publisher and subscriber API
using DL. It would be much faster to implement and more consistent to
operate (we would only need to operate the DL backend). However, it
would only support the Java client.

We discussed this internally. We felt the second approach is good enough
for us and easier to achieve, so we will start with it. If anyone is
interested in the first approach, we'd like to participate and help too.

Here is the outline of our changes:

* Kafka Namespace: as I replied in the other email thread, we want to
  lay out the streams in the following format:

    namespace/topic/partitions      : storing all the partitions
    namespace/topic/partitions/N    : storing the given partition `N`
    namespace/topic/subscriptions   : storing all the subscriptions
    namespace/topic/subscriptions/S : storing the information of subscription `S`

  Both `namespace/topic/partitions/N` and
  `namespace/topic/subscriptions/S` are DL streams.

* Offset Sequencer: we want to assign `offset` as the transaction id
  instead of `timestamp`. We will add an `OffsetSequencer` and allow the
  write proxy to load `OffsetSequencer` instead of `TimeSequencer`.

* Use separate DL streams to store the information of a subscription,
  such as offsets and consumer load balancing information.

Do you see any concerns here?

- KN

On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <[email protected]> wrote:

> Thanks Khurrum.
>
> At this point, we don't have any specific process to follow for big
> features. We were discussing one under
> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>
> But ideally, let's use the mailing list for discussion and use a
> Confluence page for reflecting the discussions into a design doc.
>
> If you already have a Confluence account (if not, please create one),
> please email me your account. I can grant the permission to you, then
> you can edit.
>
> - Sijie
>
> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <[email protected]>
> wrote:
>
> > Sijie,
> >
> > Thank you so much for your quick reply. We are using Kafka now and we
> > are interested in the features in DL like durability and handling
> > slow machines.
> >
> > If it is okay with the community, we'd like to give it a try and
> > evaluate the solution. Is there any process that I should follow?
> >
> > KN
> >
> > On Sunday, July 31, 2016, Sijie Guo <[email protected]> wrote:
> >
> > > Khurrum,
> > >
> > > Interesting. Thank you for your interest in DistributedLog.
> > >
> > > Three years ago when we started the project internally at Twitter,
> > > we did have a plan to use it as a backend for both kestrel
> > > (Twitter's in-house queue system) and Kafka. However, we didn't go
> > > down that direction. Instead, we built a similar self-serve pub/sub
> > > system over DistributedLog to consolidate our kestrel and Kafka. So
> > > we don't have a concrete plan to build Kafka's interface over
> > > DistributedLog. The module under tutorials is mostly there to give
> > > people an idea of how it can be used for building a partition-based
> > > pub/sub system.
> > >
> > > However, I don't have any strong preference here. If you think it
> > > would be useful to other people, you are welcome to contribute.
> > > We'd be happy to guide and offer any help.
> > >
> > > Also, it might be good if you can explain more about what you are
> > > planning to do. Other people in the community can chime in and
> > > discuss.
> > >
> > > Please let us know your thoughts. You are very welcome to make any
> > > contributions.
> > >
> > > - Sijie
> > >
> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <[email protected]>
> > > wrote:
> > >
> > > > Hello folks,
> > > >
> > > > I saw there is a 'distributedlog-kafka' module in tutorials. But
> > > > it seems not complete yet. I am wondering if there is a plan to
> > > > fully implement the kafka's interface. It would be great if we
> > > > can use kafka's interface to access distributed log. I'd like to
> > > > contribute if there is a plan.
> > > >
> > > > Thanks,
> > > > KN
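
For reference, the stream layout and offset-based sequencing proposed at the top of this thread could be sketched in Java roughly as follows. This is only an illustration under assumptions: `KafkaStreamLayout` and `OffsetSequencerSketch` are hypothetical names that do not exist in DistributedLog, and a real `OffsetSequencer` would have to implement whatever sequencer interface the write proxy actually loads.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that builds stream names for the proposed layout. */
final class KafkaStreamLayout {
    private KafkaStreamLayout() {}

    /** namespace/topic/partitions : parent of all partition streams. */
    static String partitionsRoot(String namespace, String topic) {
        return namespace + "/" + topic + "/partitions";
    }

    /** namespace/topic/partitions/N : the DL stream for partition N. */
    static String partitionStream(String namespace, String topic, int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0: " + partition);
        }
        return partitionsRoot(namespace, topic) + "/" + partition;
    }

    /** namespace/topic/subscriptions : parent of all subscription streams. */
    static String subscriptionsRoot(String namespace, String topic) {
        return namespace + "/" + topic + "/subscriptions";
    }

    /** namespace/topic/subscriptions/S : the DL stream for subscription S. */
    static String subscriptionStream(String namespace, String topic, String subscription) {
        return subscriptionsRoot(namespace, topic) + "/" + subscription;
    }
}

/**
 * Hypothetical sequencer that hands out consecutive offsets as transaction
 * ids, in contrast to a timestamp-based sequencer.
 */
final class OffsetSequencerSketch {
    private final AtomicLong nextOffset;

    /** Resume after the last offset already written to the stream. */
    OffsetSequencerSketch(long lastAssignedOffset) {
        this.nextOffset = new AtomicLong(lastAssignedOffset + 1);
    }

    /** Returns the next offset; each call advances the counter by one. */
    long nextId() {
        return nextOffset.getAndIncrement();
    }
}
```

With these assumptions, `KafkaStreamLayout.partitionStream("ns", "orders", 3)` yields `ns/orders/partitions/3`, and a sequencer created with last offset 41 hands out 42 and then 43, so the ids in this sketch are dense and map directly onto Kafka-style offsets.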
