/cc Leigh

I don't think we pushed the DL-related code to kestrel, as I think kestrel has been on the deprecation path internally at Twitter. But it might be worth pushing the code change just for reference. Leigh, what's your opinion?
- Sijie

On Wed, Nov 9, 2016 at 2:48 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:

> Sijie, thank you for your comments and suggestions. I will start a
> separate thread for discussing the metadata operation primitives.
>
> BTW, I didn't find any code in kestrel that is related to distributedlog
> :( Can you kindly point me to the files?
>
> - Gerrit
>
> On Wed, Nov 2, 2016 at 10:35 AM, Sijie Guo <sij...@twitter.com> wrote:
>
>> On Wed, Nov 2, 2016 at 3:14 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
>>
>>> FYI - I tried to use the AppendOnlyStreamWriter and
>>> AppendOnlyStreamReader to demonstrate the idea:
>>> https://github.com/apache/incubator-distributedlog/pulls/43
>>> Let me know if this is a good direction to go after.
>>>
>>> - Gerrit
>>>
>>> On Wed, Nov 2, 2016 at 2:21 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
>>>
>>>> Hi distributedlog folks,
>>>>
>>>> I am new to this community. I am wondering whether anyone has tried to
>>>> build a file system over replicated logs. There are a lot of similarities
>>>> between a filesystem file and a replicated log. You can use files to build
>>>> a replicated log or use replicated logs to build a filesystem.
>>>>
>>>> I took a look at the code repo and found there are two files,
>>>> 'AppendOnlyStreamReader' and 'AppendOnlyStreamWriter'. They seem to
>>>> implement a file-I/O-related API. Did you guys attempt to provide a
>>>> filesystem API over distributedlog?
>>
>> Ah, those two classes were designed for filesystem-like I/O operations.
>> We used them for substituting the local-file-based journal in kestrel
>> <https://github.com/twitter-archive/kestrel>.
>>
>>>> I am wondering if it is possible to build a filesystem over
>>>> distributedlog. Would this be an interesting topic to this project and the
>>>> community? I have two reasons for that:
>>>> - I can leverage the good features like parallel replication and low
>>>>   latency for better performance?
>>>> - DL uses zookeeper for metadata storage. ZooKeeper has a pretty nice
>>>>   filesystem-like interface. So it would be a nice fit.
>>
>> This sounds interesting. I don't think there are any major blockers for
>> DL exposing a filesystem-like API, as indeed we already did that for
>> kestrel. You might need to spend time on refining the metadata operations,
>> like list files, get file status and such.
>>
>> Re "better performance" - for data I/O, it should be just fine for
>> workloads like writes, tailing reads and caught-up reads (scans). I am not
>> sure about random reads, as we didn't really pay attention to them at
>> Twitter (Salesforce, though, has used bookkeeper as storage that also
>> serves random reads, so it should probably work just fine). I am not
>> certain about metadata operations - we did create/open/delete log streams
>> frequently for some of our use cases, but that might still be less
>> frequent than in a filesystem. We have a plan to make the stream primitive
>> very lightweight, so we can support a huge number of streams. We probably
>> can work together on improving the metadata part.
>>
>> I took a look at your pull request. I liked your layout - putting it in a
>> contrib module to incubate this idea. We definitely welcome any
>> contributions that make DL easy to use. Feel free to start a proposal
>> discussion
>> <https://cwiki.apache.org/confluence/display/DL/Project+Proposals>. I
>> believe there will be a lot of corner cases to discuss.
>>
>>>> - Gerrit
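
For readers following the file-over-log idea discussed in this thread, here is a minimal in-memory sketch in Python. It is not the DistributedLog API: `AppendOnlyLog`, `LogBackedFS`, and all of their methods are hypothetical names chosen for illustration, the dict merely stands in for a metadata store such as ZooKeeper, and nothing here is replicated or persisted. It only shows how append, offset-based read, "list files", and "get file status" might map onto one append-only stream per file.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """In-memory stand-in for one log stream (hypothetical, not the DL API)."""
    records: list = field(default_factory=list)
    length: int = 0  # total bytes appended so far

    def append(self, data: bytes) -> int:
        """Append one record and return the byte offset at which it starts."""
        offset = self.length
        self.records.append(data)
        self.length += len(data)
        return offset

    def read(self, offset: int, size: int) -> bytes:
        """Return up to `size` bytes starting at byte `offset`."""
        # Joining all records is fine for a sketch; a real reader would seek.
        return b"".join(self.records)[offset:offset + size]


class LogBackedFS:
    """Maps file paths to log streams; the dict plays the metadata store."""

    def __init__(self):
        self._streams = {}

    def create(self, path: str) -> None:
        if path in self._streams:
            raise FileExistsError(path)
        self._streams[path] = AppendOnlyLog()

    def append(self, path: str, data: bytes) -> int:
        return self._streams[path].append(data)

    def read(self, path: str, offset: int = 0, size: int = -1) -> bytes:
        log = self._streams[path]
        if size < 0:  # negative size means "read to the current end"
            size = log.length - offset
        return log.read(offset, size)

    def list_files(self, prefix: str = "/") -> list:
        """Metadata operation: list paths under a prefix."""
        return sorted(p for p in self._streams if p.startswith(prefix))

    def status(self, path: str) -> dict:
        """Metadata operation: size and record count for one file."""
        log = self._streams[path]
        return {"path": path, "size": log.length, "records": len(log.records)}

    def delete(self, path: str) -> None:
        del self._streams[path]
```

In this sketch, appends and sequential reads are cheap, while a random read has to locate the record containing the requested offset, and every create, list, or delete touches the metadata store. Those are the same two areas (random reads and metadata operations) flagged as uncertain in the reply above.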