Sijie, thank you for your comments and suggestions. I will start a separate thread to discuss the metadata operation primitives.
BTW, I didn't find any code in kestrel that is related to distributedlog :( Could you kindly point me to the files?

- Gerrit

On Wed, Nov 2, 2016 at 10:35 AM, Sijie Guo <sij...@twitter.com> wrote:

> On Wed, Nov 2, 2016 at 3:14 AM, Gerrit Sundaram <gerritsunda...@gmail.com>
> wrote:
>
>> FYI - I tried to use the AppendOnlyStreamWriter and
>> AppendOnlyStreamReader to demonstrate the idea:
>> https://github.com/apache/incubator-distributedlog/pulls/43 Let me know
>> if this is a good direction to go after.
>>
>> - Gerrit
>>
>> On Wed, Nov 2, 2016 at 2:21 AM, Gerrit Sundaram <gerritsunda...@gmail.com>
>> wrote:
>>
>>> Hi distributedlog folks,
>>>
>>> I am new to this community. I am wondering whether anyone has tried to
>>> build a file system over replicated logs. There are a lot of similarities
>>> between a filesystem file and a replicated log. You can use files to build
>>> a replicated log, or use replicated logs to build a filesystem.
>>>
>>> I took a look at the code repo and found there are two files,
>>> 'AppendOnlyStreamReader' and 'AppendOnlyStreamWriter'. They seem to
>>> implement a file I/O related API. Did you guys attempt to provide a
>>> filesystem API over distributedlog?
>
> Ah, those two classes were designed for filesystem-like I/O operations. We
> used them for substituting the local-file-based journal in kestrel
> <https://github.com/twitter-archive/kestrel>.
>
>>> I am wondering if it is possible to build a filesystem over
>>> distributedlog. Would this be an interesting topic to this project and the
>>> community? I have two reasons for that:
>>> - I can leverage the good stuff like parallel replication and low latency
>>>   for better performance?
>>> - DL uses zookeeper for metadata storage. ZooKeeper has a pretty nice
>>>   filesystem-like interface. So it would be a nice fit.
>
> This sounds interesting. I don't think there are any major blockers for DL
> exposing a filesystem-like API, as indeed we already did that for kestrel.
> You might need to spend time on refining the metadata operations, like list
> files, get file status and such.
>
> Re "better performance" - for data I/O, it should be just fine for
> workloads like writes, tailing reads and caught-up reads (scans). I am not
> sure about random reads, as we didn't really pay attention to this at
> Twitter (although Salesforce used bookkeeper as the storage for also
> serving random reads, so it should probably work well). I am not certain
> about metadata operations - we did create/open/delete log streams
> frequently for some of our use cases, but that might still be less frequent
> compared to a filesystem. We have a plan to make the stream primitive very
> lightweight, so we can support a huge number of streams. We can probably
> work together on improving the metadata part.
>
> I took a look at your pull request. I liked your layout - putting it in a
> contrib module to incubate this idea. We definitely welcome any
> contributions that make DL easy to use. Feel free to start a proposal
> discussion
> <https://cwiki.apache.org/confluence/display/DL/Project+Proposals>. I
> believe there will be a lot of corner cases to discuss.
>
>>> - Gerrit
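
To make the metadata operations mentioned above (list files, get file status, create/open/delete) a bit more concrete, here is a rough sketch of how they might sit on top of append-only logs. It is a toy in-memory model in Python, not the DistributedLog API: `AppendOnlyLog`, `LogNamespace` and all method names are hypothetical, and a real implementation would keep the path-to-stream mapping in the metadata store rather than in a dict.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Hypothetical stand-in for one replicated log stream (one 'file')."""
    records: list = field(default_factory=list)

    def append(self, data: bytes) -> int:
        """Append a record and return its position in the log."""
        self.records.append(data)
        return len(self.records) - 1

    def read_from(self, position: int = 0) -> bytes:
        """Scan from a record position to the current end of the log."""
        return b"".join(self.records[position:])

    def length(self) -> int:
        """Total number of bytes appended so far."""
        return sum(len(r) for r in self.records)


class LogNamespace:
    """Hypothetical metadata layer mapping file paths to log streams."""

    def __init__(self):
        self._logs = {}

    def create(self, path: str) -> AppendOnlyLog:
        if path in self._logs:
            raise FileExistsError(path)
        self._logs[path] = AppendOnlyLog()
        return self._logs[path]

    def open(self, path: str) -> AppendOnlyLog:
        if path not in self._logs:
            raise FileNotFoundError(path)
        return self._logs[path]

    def delete(self, path: str) -> None:
        if path not in self._logs:
            raise FileNotFoundError(path)
        del self._logs[path]

    def list_files(self, prefix: str = "/") -> list:
        """List all paths under a prefix, sorted for stable output."""
        return sorted(p for p in self._logs if p.startswith(prefix))

    def get_file_status(self, path: str) -> dict:
        """Return basic status for a path: byte length and record count."""
        log = self.open(path)
        return {"path": path, "length": log.length(), "records": len(log.records)}
```

In this model a file is only ever appended to or scanned forward, which matches the write, tailing-read and scan workloads discussed above. Random reads and in-place updates are deliberately left out.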