Gerrit,

Can you send me your wiki account?

- Sijie

On Thu, Nov 17, 2016 at 1:38 AM, Gerrit Sundaram <gerritsunda...@gmail.com>
wrote:

> Can you grant me the permissions for editing the wiki page?
>
> - Gerrit
>
> On Thu, Nov 17, 2016 at 1:37 AM, Gerrit Sundaram <gerritsunda...@gmail.com>
> wrote:
>
> >
> >
> > On Tue, Nov 15, 2016 at 2:14 AM, Sijie Guo <si...@apache.org> wrote:
> >
> >> On Sat, Nov 12, 2016 at 2:30 AM, Gerrit Sundaram <gerritsunda...@gmail.com>
> >> wrote:
> >>
> >> > On Fri, Nov 11, 2016 at 1:09 PM, Sijie Guo <si...@apache.org> wrote:
> >> >
> >> > > I like this topic. A better name might be 'stream storage
> >> > > primitives', as we treat DL as stream storage. Comments inline.
> >> > >
> >> > > On Wed, Nov 9, 2016 at 3:09 AM, Gerrit Sundaram <gerritsunda...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > As Sijie suggested in the other email thread, I am starting this
> >> > > > thread to discuss the stream operation primitives.
> >> > > >
> >> > > > The stream operations that I know DL supports are:
> >> > > >
> >> > > > * Open a distributedlog stream
> >> > > > * Delete a distributedlog stream
> >> > > > * List all the distributedlog streams under a namespace
> >> > > >
> >> > >
> >> > > Are you also looking to list streams under a 'sub-namespace' (or
> >> > > streams that share a common prefix)? (Based on my understanding of
> >> > > your proposal, you might need this for a filesystem-like API?)
> >> > >
> >> >
> >> > Yes. However, it seems DL is designed around a flat namespace that
> >> > contains only streams.
> >>
> >>
> >> Ah, yes. The original thought was to tie a namespace to a user or an
> >> application. Under a namespace, the application can manage the streams
> >> on its own. That's why it was designed with a flat namespace.
> >>
> >>
> >> > There is no concept of a 'sub-namespace', although I can probably
> >> > hack it by naming the streams in a filesystem path-like way.
> >> >
> >> > However, I am still curious whether you want to introduce any sort of
> >> > naming hierarchy within a namespace. For example, could you have a
> >> > 'StreamSet', which is a set of streams (like a directory in a
> >> > filesystem, which has a list of children)? If you had a similar
> >> > hierarchy, it would definitely simplify my work.
> >> >
> >>
> >> In the write proxy, we have a concept similar to 'StreamSet' that groups
> >> some physical DL streams into one virtual stream. However, it was mostly
> >> used for exporting metrics for grouped virtual streams. We don't really
> >> emphasize the concept of a 'virtual stream' in DL, as we tended to let
> >> the application decide what the virtual stream looks like.
> >>
> >> However, for metadata organization and management, it might make sense
> >> to think about such a hierarchy.
> >>
> >> What do you have in mind for 'StreamSet'? Can you explain a little
> >> more?
> >
> >
> > I was thinking of a group of streams that might be used by the same
> > application but store different parts of its data. It is like the
> > 'virtual' stream that you mentioned.
> >
> > - Gerrit
> >
> >
> >>
> >> >
> >> >
> >> > >
> >> > >
> >> > > > * Seal a distributedlog stream
> >> > > > * Truncate a distributedlog stream
> >> > > >
> >> > >
> >> > > Just to clarify, 'truncate' in DL trims the head of the stream, not
> >> > > the tail. 'truncate' in the filesystem world cuts a file to a size
> >> > > of precisely *length* bytes, so it truncates the tail.
> >> > >
> >> > > Let's make sure we are clear on this and on the same page.
> >> > >
> >> >
> >> > Yes, we are on the same page.
> >> >
> >> >
> >> > >
> >> > >
> >> > > >
> >> > > > I am looking for a more filesystem-like API. For example,
> >> > > >
> >> > > > * Get the status/attributes of a stream (like stat in filesystem)
> >> > > >
> >> > >
> >> > > +1 for stream status/attributes. I think we might actually already
> >> > > have this in DL, since in Kestrel we use it for storing customized
> >> > > metadata. It might make sense to formalize it into 'stream status'.
> >> > >
> >> >
> >> > Gotcha.
> >> >
> >> >
> >> > >
> >> > >
> >> > > > * Rename a stream
> >> > > >
> >> > >
> >> > > We've talked about this for a while. +1.
> >> > >
> >> > >
> >> > > > * Symlink a stream
> >> > >
> >> > >
> >> > > Symlinking a stream is probably easy to do. +1. We've thought about
> >> > > it as a way to have the flexibility to move streams between
> >> > > different storage backends. Symlinks would help with this.
> >> > >
> >> > > But a more fundamental thought here is symlinks for log segments, so
> >> > > that when a symlinked stream is deleted, the underlying log segments
> >> > > are not deleted until their link count drops to zero.
> >> > >
> >> > >
> >> > >
> >> > > >
> >> > > > Other operations that I think might be useful:
> >> > > >
> >> > > > * Split/Fork a stream (it can be useful for dynamic data
> >> > > >   partitioning)
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > Splitting and forking a stream sounds interesting, but it sounds
> >> > > like a higher-level feature rather than a storage primitive. It
> >> > > might be a good topic for a separate discussion.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > > * Merge/Concat streams
> >> > > >
> >> > >
> >> > >
> >> > > I think there is already an outstanding JIRA for concatenating two
> >> > > DL streams. Jia and Arvind are working on that.
> >> > >
> >> > > https://issues.apache.org/jira/browse/DL-46
> >> >
> >> >
> >> > I will watch that JIRA.
> >> >
> >> >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > >
> >> > > > The above operations are based on my knowledge of DL. Feel free
> >> > > > to add more.
> >> > >
> >> > >
> >> > > >
> >> > > > - Gerrit
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
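
To make the head-versus-tail 'truncate' distinction discussed in the thread
concrete, here is a minimal sketch. It models a stream as a plain Python list
of records rather than bytes. The names trim_head and truncate_tail are
hypothetical helpers for illustration only; they are not DL or filesystem
APIs.

```python
def trim_head(records, first_kept):
    """DL-style 'truncate' as described in the thread: drop records at
    the head, keeping everything from index first_kept onward."""
    return records[first_kept:]


def truncate_tail(records, length):
    """Filesystem-style truncate: keep exactly the first `length`
    records, dropping the tail."""
    return records[:length]


stream = ["r0", "r1", "r2", "r3"]
print(trim_head(stream, 2))      # ['r2', 'r3']
print(truncate_tail(stream, 2))  # ['r0', 'r1']
```

The same argument value removes opposite ends of the stream, which is why the
two meanings of 'truncate' are easy to confuse.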
