Thanks Leigh and Sijie. I will move the development under a contrib project. I am also going to talk to the Kafka folks about whether that is the best place to host this idea.
KN

On Mon, Aug 29, 2016 at 7:45 AM, Leigh Stewart <lstew...@twitter.com.invalid> wrote:

> Agree with Sijie, I think this is exciting work and I didn't mean to cut
> off your options. My objection was just about code organization.
>
> A contribs project seems like a good compromise for now, until we can think
> of a better place to put the code.
>
> Sijie's right though: if we want to fully productionize this and make it
> reusable, this might not be the best long-term location.
>
> What are your thoughts, Khurrum? Does the code organization/layering
> argument make sense?
>
> Thanks!
>
> On Fri, Aug 26, 2016 at 8:24 PM, Sijie Guo <si...@apache.org> wrote:
>
> > + Leigh
> >
> > Khurrum,
> >
> > Thanks for your hard work on this. The approach in general looks good
> > to me.
> >
> > However, I kind of agree with what Leigh commented on the pull request.
> > Ideally we want to keep DL focused on single streams themselves: durability,
> > consistency and performance. Different applications might use streams in
> > different ways to produce different data/consumption models. For example,
> > you can use a set of streams to build Kafka-like partitioned pub/sub, or
> > other people can use a set of streams to build a queue-like messaging
> > system, or build a database.
> >
> > On the other hand, it is very interesting to see a good Kafka client
> > integration using DL streams as partitions rather than just an incomplete
> > tutorial. I wouldn't want to discourage your hard work. A possible tradeoff
> > here is to create a distributedlog-contribs module and move the
> > distributedlog-kafka module under it. The distributedlog-contribs module
> > would host any integration-related contributions. This would help avoid any
> > confusion. Any thoughts, Leigh?
> >
> > Also, Khurrum, did you talk with the Kafka community? I am not sure if DL is
> > the right repo to host this. Does anyone else have better suggestions on
> > this?
> >
> > - Sijie
> >
> > On Thursday, August 25, 2016, Khurrum Nasim <khurrumnas...@gmail.com> wrote:
> >
> >> I sent out another pull request to improve the Kafka publisher in the
> >> tutorial: https://github.com/apache/incubator-distributedlog/pull/16
> >>
> >> We tried to reuse the existing Kafka configuration, key/value serializers
> >> and partitioner as much as possible, so we don't need to rewrite our
> >> existing services to adopt DistributedLog.
> >>
> >> Although the pull request is still WIP, we'd like to know if we are using
> >> DistributedLog in the right way. In particular, we are thinking of changing
> >> the write proxy to also return either the transaction id or the sequence id
> >> on write requests.
> >>
> >> Appreciate your help.
> >>
> >> - KN
> >>
> >> On Thu, Aug 25, 2016 at 1:28 AM, Khurrum Nasim <khurrumnas...@gmail.com> wrote:
> >>
> >> > I sent out a pull request about the offset sequencer:
> >> > https://github.com/apache/incubator-distributedlog/pull/15
> >> >
> >> > I am not sure if there is any code guideline to follow. I tried my best
> >> > to follow the existing code style. If I did anything wrong, please help
> >> > me fix it.
> >> >
> >> > - KN
> >> >
> >> > On Tue, Aug 23, 2016 at 9:38 AM, Khurrum Nasim <khurrumnas...@gmail.com> wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> After reading the DL code, we have a better idea of how to use
> >> >> DistributedLog as the Kafka implementation. There are two approaches:
> >> >> one is to use the distributedlog-core library directly in the Kafka
> >> >> broker, while the other is to use all the DL components.
> >> >>
> >> >> The first approach is basically to replace the storage of the Kafka
> >> >> broker with BookKeeper. The good part is that all the Kafka wire
> >> >> protocols will remain unchanged.
> >> >> But it might take longer and also make operations more complicated.
> >> >>
> >> >> The second approach is to implement Kafka's publisher and subscriber API
> >> >> using DL. It would be much faster and operationally more consistent (we
> >> >> only need to operate the DL backend). However, it would only support the
> >> >> Java client.
> >> >>
> >> >> We discussed this internally and felt the second approach is good enough
> >> >> for us and easier to achieve, so we will start with it. If anyone is
> >> >> interested in the first approach, we'd like to participate and help too.
> >> >>
> >> >> Here is an outline of our changes:
> >> >>
> >> >> * Kafka Namespace: as I replied in the other email thread, we want to
> >> >> lay out the streams in the following format:
> >> >>
> >> >>     namespace/topic/partitions      : storing all the partitions
> >> >>     namespace/topic/partitions/N    : storing the given partition `N`
> >> >>     namespace/topic/subscriptions   : storing all the subscriptions
> >> >>     namespace/topic/subscriptions/S : storing the information of subscription `S`
> >> >>
> >> >> Both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
> >> >> are DL streams.
> >> >>
> >> >> * Offset Sequencer: we want to assign the `offset` as the transaction id
> >> >> instead of the `timestamp`. We will add an `OffsetSequencer` and allow the
> >> >> write proxy to load `OffsetSequencer` instead of `TimeSequencer`.
> >> >>
> >> >> * Use separate DL streams to store the information of a subscription,
> >> >> such as offsets and consumer load-balancing information.
> >> >>
> >> >> Do you see any concerns here?
> >> >>
> >> >> - KN
> >> >>
> >> >> On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <si...@apache.org> wrote:
> >> >>
> >> >>> Thanks Khurrum.
> >> >>>
> >> >>> At this point, we don't have any specific process to follow for big
> >> >>> features.
> >> >>> We were discussing one under
> >> >>> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
> >> >>>
> >> >>> But ideally, let's use the mailing list for discussion and a Confluence
> >> >>> page to reflect the discussions in a design doc.
> >> >>>
> >> >>> If you already have a Confluence account (if not, please create one),
> >> >>> please email me your account name. I can grant you permission, and then
> >> >>> you can edit.
> >> >>>
> >> >>> - Sijie
> >> >>>
> >> >>> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <khurrumnas...@gmail.com> wrote:
> >> >>>
> >> >>> > Sijie,
> >> >>> >
> >> >>> > Thank you so much for your quick reply. We are using Kafka now and we
> >> >>> > are interested in DL features like durability and handling slow
> >> >>> > machines.
> >> >>> >
> >> >>> > If it is okay with the community, we'd like to give it a try and
> >> >>> > evaluate the solution. Is there any process that I should follow?
> >> >>> >
> >> >>> > KN
> >> >>> >
> >> >>> > On Sunday, July 31, 2016, Sijie Guo <si...@apache.org> wrote:
> >> >>> >
> >> >>> > > Khurrum,
> >> >>> > >
> >> >>> > > Interesting. Thank you for your interest in DistributedLog.
> >> >>> > >
> >> >>> > > Three years ago, when we started the project internally at Twitter,
> >> >>> > > we did have a plan to use it as a backend for both Kestrel (Twitter's
> >> >>> > > in-house queue system) and Kafka. However, we didn't go down that
> >> >>> > > direction. Instead, we built a similar self-serve pub/sub system over
> >> >>> > > DistributedLog to consolidate Kestrel and Kafka. So we don't have a
> >> >>> > > concrete plan to build Kafka's interface over DistributedLog.
> >> >>> > > The module was put under tutorials mostly to give people an idea of
> >> >>> > > how it can be used to build a partition-based pub/sub system.
> >> >>> > >
> >> >>> > > However, I don't have any strong preference here. If you think it
> >> >>> > > would be useful to other people, you are welcome to contribute. We'd
> >> >>> > > be happy to guide you and offer any help.
> >> >>> > >
> >> >>> > > Also, it might be good if you could explain more about what you are
> >> >>> > > planning to do. Other people in the community can chime in and
> >> >>> > > discuss.
> >> >>> > >
> >> >>> > > Please let us know your thoughts. You are very welcome to make any
> >> >>> > > contributions.
> >> >>> > >
> >> >>> > > - Sijie
> >> >>> > >
> >> >>> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <khurrumnas...@gmail.com> wrote:
> >> >>> > >
> >> >>> > > > Hello folks,
> >> >>> > > >
> >> >>> > > > I saw there is a 'distributedlog-kafka' module in tutorials, but it
> >> >>> > > > seems incomplete. I am wondering if there is a plan to fully
> >> >>> > > > implement Kafka's interface. It would be great if we could use
> >> >>> > > > Kafka's interface to access DistributedLog. I'd like to contribute
> >> >>> > > > if there is a plan.
> >> >>> > > >
> >> >>> > > > Thanks,
> >> >>> > > > KN
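The "Kafka Namespace" layout from the Aug 23 outline, combined with the partitioner reuse mentioned on Aug 25, can be sketched as a small naming helper. This is a minimal sketch under stated assumptions: `KafkaStreamLayout` and its methods are hypothetical names (not part of DistributedLog or the Kafka client), the path strings are assumed to be usable as DL stream names exactly as written in the outline, and `partitionFor` is a placeholder hash, not Kafka's default partitioner.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not a DL or Kafka API): maps the proposed
// namespace/topic/... layout onto stream names.
final class KafkaStreamLayout {
    private KafkaStreamLayout() {}

    // namespace/topic/partitions/N : the DL stream backing partition N
    static String partitionStream(String namespace, String topic, int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0, got " + partition);
        }
        return namespace + "/" + topic + "/partitions/" + partition;
    }

    // namespace/topic/subscriptions/S : the DL stream holding the state
    // (offsets, consumer assignment) of subscription S
    static String subscriptionStream(String namespace, String topic, String subscription) {
        return namespace + "/" + topic + "/subscriptions/" + subscription;
    }

    // Placeholder key -> partition mapping. ASSUMPTION: this is NOT Kafka's
    // default partitioner; it only shows where a partitioner plugs into the
    // layout. The same key always maps to the same partition.
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be > 0, got " + numPartitions);
        }
        // Mask the sign bit so the modulo result is never negative.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // Resolve the partition stream that a keyed record would be written to.
    static String streamForKey(String namespace, String topic, String key, int numPartitions) {
        int partition = partitionFor(key.getBytes(StandardCharsets.UTF_8), numPartitions);
        return partitionStream(namespace, topic, partition);
    }
}
```

For example, `KafkaStreamLayout.partitionStream("ns", "orders", 3)` yields `"ns/orders/partitions/3"`. Keeping subscriptions in their own streams, as the outline proposes, means consumer state can be updated without writing to the partition streams.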
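The "Offset Sequencer" item proposes using a per-partition offset as the DL transaction id instead of a timestamp, and the Aug 25 message suggests the write proxy return that id on writes. A minimal sketch, assuming a hypothetical single-method `Sequencer` interface (the write proxy's real sequencer interface may differ) and a hypothetical in-memory stand-in for a partition stream. `OffsetSequencer` reuses the name from the thread, but this body is illustrative and is not the code in pull request 15.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the write proxy's sequencer abstraction:
// hands out the transaction id for the next record.
interface Sequencer {
    long nextId();
}

// Assigns dense, strictly increasing offsets (0, 1, 2, ...) as transaction ids.
// Timestamp-based ids jump with the wall clock, so they cannot double as
// consecutive Kafka-style offsets.
final class OffsetSequencer implements Sequencer {
    private long lastOffset;

    // lastOffset: the last offset already in the partition stream, or -1 if empty.
    // ASSUMPTION: the caller recovers this value when it takes over the stream.
    OffsetSequencer(long lastOffset) {
        this.lastOffset = lastOffset;
    }

    @Override
    public synchronized long nextId() {
        return ++lastOffset;
    }
}

// Hypothetical in-memory partition showing the proposed write-path change:
// the assigned id is returned to the producer as the record's offset.
final class InMemoryPartition {
    private final Sequencer sequencer;
    private final List<byte[]> records = new ArrayList<>();

    InMemoryPartition(Sequencer sequencer) {
        this.sequencer = sequencer;
    }

    synchronized long append(byte[] payload) {
        long offset = sequencer.nextId();
        records.add(payload);
        return offset;
    }

    synchronized int size() {
        return records.size();
    }
}
```

Offsets stay dense only if every id handed out is actually written; this sketch does not handle a write that fails after `nextId()`, which would leave a gap.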
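The third item in the Aug 23 outline stores subscription information, such as offsets, in separate DL streams. One way to read such a stream is sketched below under assumptions the thread does not specify: each record is a hypothetical `OffsetCommit(partition, offset)` entry appended in commit order, and a consumer restores its position by replaying the stream so that the latest commit per partition wins. The record format and class names are illustrative only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record format for a subscription stream: one entry per
// offset commit. Not a DL or Kafka type.
final class OffsetCommit {
    final int partition;
    final long offset;

    OffsetCommit(int partition, long offset) {
        this.partition = partition;
        this.offset = offset;
    }
}

final class SubscriptionState {
    private SubscriptionState() {}

    // Rebuild the committed offset of each partition by replaying the
    // subscription stream from the beginning: later commits overwrite earlier ones.
    static Map<Integer, Long> replay(List<OffsetCommit> commitsInStreamOrder) {
        Map<Integer, Long> committed = new HashMap<>();
        for (OffsetCommit c : commitsInStreamOrder) {
            committed.put(c.partition, c.offset);
        }
        return committed;
    }
}
```

Because such a stream only grows, a real implementation would likely need a way to bound replay time (for example, periodic snapshots); the sketch ignores that.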