Re: FileSystem API over distributedlog logs

Leigh Stewart Fri, 11 Nov 2016 12:22:05 -0800

Sure, we could do it. We skipped it last time because DL was not OSS.

Need to find some time though - let's discuss quickly next week.


On Fri, Nov 11, 2016 at 12:10 PM, Sijie Guo <si...@apache.org> wrote:

> /cc Leigh
>
> I don't think we pushed the DL-related code to kestrel, as I think kestrel
> has been on the deprecation path internally at Twitter. But it might be
> worth pushing the code change just for reference. Leigh, what's your
> opinion?
>
> - Sijie
>
> On Wed, Nov 9, 2016 at 2:48 AM, Gerrit Sundaram <gerritsunda...@gmail.com>
> wrote:
>
>> Sijie, thank you for your comments and suggestions. I will start a
>> separate thread for discussing the metadata operation primitives.
>>
>> BTW, I didn't find any code in kestrel that is related to distributedlog
>> :( Can you kindly point me to the files?
>>
>> - Gerrit
>>
>>
>> On Wed, Nov 2, 2016 at 10:35 AM, Sijie Guo <sij...@twitter.com> wrote:
>>
>>>
>>>
>>> On Wed, Nov 2, 2016 at 3:14 AM, Gerrit Sundaram <
>>> gerritsunda...@gmail.com> wrote:
>>>
>>>> FYI - I tried to use the AppendOnlyStreamWriter and
>>>> AppendOnlyStreamReader to demonstrate the idea :
>>>> https://github.com/apache/incubator-distributedlog/pulls/43 Let me
>>>> know if this is a good direction to go after.
>>>>
>>>> - Gerrit
>>>>
>>>> On Wed, Nov 2, 2016 at 2:21 AM, Gerrit Sundaram <
>>>> gerritsunda...@gmail.com> wrote:
>>>>
>>>>> Hi distributedlog folks,
>>>>>
>>>>> I am new to this community. I am wondering whether anyone has tried to
>>>>> build a filesystem over replicated logs. There are a lot of similarities
>>>>> between a filesystem file and a replicated log. You can use files to build
>>>>> a replicated log, or use replicated logs to build a filesystem.
>>>>>
>>>>> I took a look at the code repo and found two files,
>>>>> 'AppendOnlyStreamReader' and 'AppendOnlyStreamWriter'. They seem to
>>>>> implement a file-I/O-related API. Did you guys attempt to provide a
>>>>> filesystem API over distributedlog?
>>>>>
>>>>
>>> Ah, those two classes were designed for filesystem-like I/O operations.
>>> We used them for substituting the local-file-based journal in kestrel
>>> <https://github.com/twitter-archive/kestrel>.
>>>
>>
>>>
>>>>
>>>>> I am wondering if it is possible to build a filesystem over
>>>>> distributedlog. Would this be an interesting topic for this project and the
>>>>> community? I have two reasons for that:
>>>>> - I can leverage the good stuff, like parallel replication and low
>>>>> latency, for better performance.
>>>>> - DL uses ZooKeeper for metadata storage. ZooKeeper has a pretty nice
>>>>> filesystem-like interface, so it would be a nice fit.
>>>>>
>>>>
>>> This sounds interesting. I don't think there are any major blockers for
>>> DL exposing a filesystem-like API, as indeed we already did that for
>>> kestrel. You might need to spend time on refining the metadata operations,
>>> like list files, get file status and such.
>>>
>>> Re "better performance" - for data I/O, it should be just fine for
>>> workloads like writes, tailing reads and catch-up reads (scans). I am not
>>> sure about random reads, as we didn't really pay attention to this at
>>> Twitter (although Salesforce used BookKeeper as storage that also serves
>>> random reads, so it should probably work just fine). I am not certain
>>> about metadata operations - we did create/open/delete log streams
>>> frequently for some of our use cases, but that might still be less frequent
>>> compared to a filesystem. We have a plan to make the stream primitive very
>>> lightweight, so we can support a huge number of streams. We can probably
>>> work together on improving the metadata part.
>>>
>>> I took a look at your pull request. I liked your layout - putting it in
>>> a contrib module to incubate this idea. We definitely welcome any
>>> contributions that make DL easy to use. Feel free to start a proposal
>>> discussion
>>> <https://cwiki.apache.org/confluence/display/DL/Project+Proposals>. I
>>> believe there will be a lot of corner cases to discuss.
>>>
>>
>>>
>>>
>>>>
>>>>> - Gerrit
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
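
To make the idea discussed in this thread more concrete, here is a minimal sketch of a filesystem-like facade over append-only logs, covering data I/O plus the metadata operations mentioned above (list files, get file status). Every name in it (InMemoryLog, FileStatus, LogBackedFileSystem) is hypothetical, and the logs live in memory. It is not the DistributedLog API, and it does not reflect how AppendOnlyStreamWriter or AppendOnlyStreamReader are implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryLog:
    """Hypothetical stand-in for one append-only log stream."""
    records: List[bytes] = field(default_factory=list)

    def append(self, data: bytes) -> int:
        # Appending is the only write operation; returns the record's position.
        self.records.append(data)
        return len(self.records) - 1


@dataclass
class FileStatus:
    path: str
    length: int  # total number of bytes appended so far


class LogBackedFileSystem:
    """Toy filesystem-like facade: each file maps to one log stream."""

    def __init__(self) -> None:
        self._streams: Dict[str, InMemoryLog] = {}

    def create(self, path: str) -> None:
        if path in self._streams:
            raise FileExistsError(path)
        self._streams[path] = InMemoryLog()

    def append(self, path: str, data: bytes) -> None:
        self._streams[path].append(data)

    def read_all(self, path: str) -> bytes:
        # Sequential scan from the start of the stream.
        return b"".join(self._streams[path].records)

    def list_files(self, prefix: str = "/") -> List[str]:
        # Metadata operation: list the streams under a path prefix.
        return sorted(p for p in self._streams if p.startswith(prefix))

    def get_file_status(self, path: str) -> FileStatus:
        # Metadata operation: derive the file length from the log's records.
        log = self._streams[path]
        return FileStatus(path, sum(len(r) for r in log.records))

    def delete(self, path: str) -> None:
        del self._streams[path]
```

In this sketch, creating a file and appending to it is just creating a stream and writing records, while list_files and get_file_status are the metadata-side calls that a real design would presumably need to back with the metadata store.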
