Sure, we could do it. We skipped it last time because DL was not OSS. I need to find some time though - let's discuss quickly next week.
On Fri, Nov 11, 2016 at 12:10 PM, Sijie Guo <si...@apache.org> wrote:
> /cc Leigh
>
> I don't think we pushed the DL-related code to kestrel, as I think kestrel
> has been on the deprecation path internally at Twitter. But it might be
> worth pushing the code change just for reference. Leigh, what's your
> opinion?
>
> - Sijie
>
> On Wed, Nov 9, 2016 at 2:48 AM, Gerrit Sundaram <gerritsunda...@gmail.com>
> wrote:
>
>> Sijie, thank you for your comments and suggestions. I will start a
>> separate thread for discussing the metadata operation primitives.
>>
>> BTW, I didn't find any code in kestrel that is related to distributedlog
>> :( Can you kindly point me to the files?
>>
>> - Gerrit
>>
>> On Wed, Nov 2, 2016 at 10:35 AM, Sijie Guo <sij...@twitter.com> wrote:
>>
>>> On Wed, Nov 2, 2016 at 3:14 AM, Gerrit Sundaram <
>>> gerritsunda...@gmail.com> wrote:
>>>
>>>> FYI - I tried to use the AppendOnlyStreamWriter and
>>>> AppendOnlyStreamReader to demonstrate the idea:
>>>> https://github.com/apache/incubator-distributedlog/pulls/43 Let me
>>>> know if this is a good direction to go after.
>>>>
>>>> - Gerrit
>>>>
>>>> On Wed, Nov 2, 2016 at 2:21 AM, Gerrit Sundaram <
>>>> gerritsunda...@gmail.com> wrote:
>>>>
>>>>> Hi distributedlog folks,
>>>>>
>>>>> I am new to this community. I am wondering whether anyone has tried
>>>>> to build a file system over replicated logs. There are a lot of
>>>>> similarities between a filesystem file and a replicated log. You can
>>>>> use files to build a replicated log or use replicated logs to build a
>>>>> filesystem.
>>>>>
>>>>> I took a look at the code repo and found there are two files,
>>>>> 'AppendOnlyStreamReader' and 'AppendOnlyStreamWriter'. They seem to
>>>>> implement a file I/O related API. Did you guys attempt to provide a
>>>>> filesystem API over distributedlog?
>>>>>
>>>
>>> Ah, those two classes were designed for filesystem-like I/O operations.
>>> We used them for substituting the local-file-based journal in kestrel
>>> <https://github.com/twitter-archive/kestrel>.
>>>
>>>>> I am wondering if it is possible to build a filesystem over
>>>>> distributedlog. Would this be an interesting topic to this project
>>>>> and the community? I have two reasons for that:
>>>>> - I can leverage the good stuff like parallel replication and low
>>>>> latency for better performance.
>>>>> - DL uses ZooKeeper for metadata storage. ZooKeeper has a pretty nice
>>>>> filesystem-like interface. So it would be a nice fit.
>>>>>
>>>
>>> This sounds interesting. I don't think there are any major blockers for
>>> DL exposing a filesystem-like API, as indeed we already did that for
>>> kestrel. You might need to spend time on refining the metadata
>>> operations, like list files, get file status and such.
>>>
>>> Re "better performance" - for data I/O, it should be just fine for
>>> workloads like writes, tailing reads and caught-up reads (scans). I am
>>> not sure about random reads, as we didn't really pay attention to this
>>> at Twitter (although Salesforce used bookkeeper as the storage for also
>>> serving random reads, so it should probably work just fine). I am not
>>> certain about metadata operations - we did create/open/delete log
>>> streams frequently for some of our use cases, but that still might be
>>> less frequent compared to a filesystem. We have a plan to make the
>>> stream primitive very lightweight, so we can support a huge number of
>>> streams. We can probably work together on improving the metadata part.
>>>
>>> I took a look at your pull request. I liked your layout - putting it in
>>> a contrib module to incubate this idea. We definitely welcome any
>>> contributions that make DL easy to use. Feel free to start a proposal
>>> discussion
>>> <https://cwiki.apache.org/confluence/display/DL/Project+Proposals>. I
>>> believe there will be a lot of corner cases to discuss.
>>>
>>>>> - Gerrit