Gerrit,

Can you send me your wiki account?

- Sijie

On Thu, Nov 17, 2016 at 1:38 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:

> Can you grant me the permissions for editing the wiki page?
>
> - Gerrit
>
> On Thu, Nov 17, 2016 at 1:37 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
>
> > On Tue, Nov 15, 2016 at 2:14 AM, Sijie Guo <si...@apache.org> wrote:
> >
> >> On Sat, Nov 12, 2016 at 2:30 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
> >>
> >> > On Fri, Nov 11, 2016 at 1:09 PM, Sijie Guo <si...@apache.org> wrote:
> >> >
> >> > > I liked this topic. A better name might be 'stream storage
> >> > > primitives', as we treat DL as a stream storage. Comments inline.
> >> > >
> >> > > On Wed, Nov 9, 2016 at 3:09 AM, Gerrit Sundaram <gerritsunda...@gmail.com> wrote:
> >> > >
> >> > > > As Sijie suggested in the other email thread, I started this
> >> > > > email thread to discuss the stream operation primitives.
> >> > > >
> >> > > > The stream operations that I am aware of that DL supports are
> >> > > >
> >> > > > * Open a distributedlog stream
> >> > > > * Delete a distributedlog stream
> >> > > > * List all the distributedlog streams under a namespace
> >> > > >
> >> > >
> >> > > Are you also looking for listing streams under a 'sub-namespace'
> >> > > (or streams that have a common prefix)? (Based on my understanding
> >> > > of your proposal, you might need this for a filesystem-like API?)
> >> > >
> >> >
> >> > Yes. However, it seems like DL is designed more around a flat
> >> > namespace with just streams.
> >>
> >> Ah, yes. The original thought was to tie a namespace to a user or an
> >> application. Under a namespace, an application can manage its streams
> >> on its own. So that's why it was designed with a flat namespace.
> >>
> >> > There is no concept of a 'sub-namespace', although I can probably
> >> > hack it by just naming the streams in a filesystem path-like way.
> >> >
> >> > However, I am still curious whether you guys want to introduce any
> >> > sort of naming hierarchy within a namespace. For example, can you
> >> > have a 'StreamSet', which is a set of streams (like in a filesystem,
> >> > where a directory has a list of children)? If you have a similar
> >> > hierarchy, it will definitely simplify my work.
> >> >
> >>
> >> In the write proxy, we have a concept similar to 'StreamSet' to group
> >> some physical DL streams into one virtual stream. However, that was
> >> mostly used for exporting metrics for grouped virtual streams. We
> >> don't quite emphasize the concept of a 'virtual stream' in DL, as we
> >> tended to let the application decide what the virtual stream looks
> >> like.
> >>
> >> However, for metadata organization and management, it might make
> >> sense to think of such a hierarchy.
> >>
> >> What do you have in mind about 'StreamSet'? Can you explain a little
> >> more?
> >
> > I was thinking of a group of streams that might be used for the same
> > application but store different parts of the data. It is like the
> > 'virtual' stream that you mentioned.
> >
> > - Gerrit
> >
> >> > > > * Seal a distributedlog stream
> >> > > > * Truncate a distributedlog stream
> >> > > >
> >> > >
> >> > > Just to clarify this, the 'truncate' in DL trims the head of the
> >> > > stream, not the tail.
> >> > > The 'truncate' in the filesystem world sets a file to a size of
> >> > > precisely *length* bytes; it truncates the tail.
> >> > >
> >> > > Let's make sure we have clarified it and are on the same page.
> >> > >
> >> >
> >> > Yes, we are on the same page.
> >> >
> >> > > > I am looking for a more filesystem-like API. For example,
> >> > > >
> >> > > > * Get the status/attributes of a stream (like stat in filesystem)
> >> > > >
> >> > >
> >> > > +1 for stream status/attributes. I think we might actually already
> >> > > have this in DL, since in kestrel we use that for storing
> >> > > customized metadata.
> >> > > It might make sense to formalize it into 'stream status'.
> >> > >
> >> >
> >> > Gotcha.
> >> >
> >> > > > * Rename a stream
> >> > > >
> >> > >
> >> > > We've talked about this for a while. +1.
> >> > >
> >> > > > * Symlink a stream
> >> > >
> >> > > Symlinking a stream is probably easy to do. +1, we've thought about
> >> > > that for having the flexibility to move a stream between different
> >> > > storage backends. Symlinks would help with this.
> >> > >
> >> > > But a more fundamental thought here is symlinks for log segments.
> >> > > So when a symlinked stream is deleted, the underlying log segments
> >> > > might not be deleted until their link count drops to zero.
> >> > >
> >> > > > Other operations that I think might be useful:
> >> > > >
> >> > > > * Split/Fork a stream (it can be useful for dynamic data
> >> > > >   partitioning)
> >> > > >
> >> > >
> >> > > Splitting and forking a stream sounds interesting. But it sounds
> >> > > like a more high-level feature rather than a storage primitive.
> >> > > Actually, it might be good to discuss this feature separately.
> >> > >
> >> > > > * Merge/Concat streams
> >> > > >
> >> > >
> >> > > I think there is already one outstanding jira for concatenating two
> >> > > DL streams. Jia and Arvind are working on that.
> >> > >
> >> > > https://issues.apache.org/jira/browse/DL-46
> >> >
> >> > I will watch that jira.
> >> >
> >> > > > The above operations are based on my knowledge about DL. Feel
> >> > > > free to add more.
> >> > > >
> >> > > > - Gerrit
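
For reference, two points from the thread can be sketched in code: the difference between DL's 'truncate' (trim the head) and the filesystem 'truncate' (cut the tail), and the idea that log segments shared through symlinked streams are only freed when their link count reaches zero. This is a minimal in-memory sketch, not the DistributedLog API. Every name in it (`ToyStream`, `trim_head`, `truncate_tail`, `SegmentStore`) is hypothetical and was not proposed in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Hypothetical in-memory stream; NOT the DistributedLog API."""
    records: list = field(default_factory=list)
    first_position: int = 0  # logical position of records[0]

    def append(self, record):
        """Append a record and return its logical position."""
        self.records.append(record)
        return self.first_position + len(self.records) - 1

    def trim_head(self, position):
        """DL-style 'truncate': drop every record before `position`."""
        drop = max(0, min(position - self.first_position, len(self.records)))
        del self.records[:drop]
        self.first_position += drop

    def truncate_tail(self, length):
        """Filesystem-style truncate: keep only the first `length` records.

        Growing the stream, which ftruncate allows for files, is omitted.
        """
        del self.records[length:]


class SegmentStore:
    """Hypothetical link-counted log segments shared by linked streams."""

    def __init__(self):
        self.link_counts = {}  # segment id -> number of streams using it
        self.streams = {}      # stream name -> list of segment ids

    def create(self, name, segment_ids):
        self.streams[name] = list(segment_ids)
        for seg in segment_ids:
            self.link_counts[seg] = self.link_counts.get(seg, 0) + 1

    def symlink(self, target, link_name):
        """Make `link_name` share the segments of `target`."""
        self.create(link_name, self.streams[target])

    def delete(self, name):
        """Remove a stream; return the segments that became unreferenced."""
        freed = []
        for seg in self.streams.pop(name):
            self.link_counts[seg] -= 1
            if self.link_counts[seg] == 0:
                del self.link_counts[seg]
                freed.append(seg)
        return freed
```

In this sketch, deleting one of two linked streams frees nothing, and deleting the last one frees all of its segments. That is the behavior described above for symlinked log segments, though a real design would also have to decide how links interact with head trimming.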