Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
> No graph changes either on client side or server side. The
> snap-view-server will detect availability of new snapshot from
> glusterd, and will spin up a new glfs_t for the corresponding snap,
> and start returning new list of "names" in readdir(), etc.

I asked if we were dynamically changing the client graph to add new protocol/client instances. Here is Varun's answer.

> Adding a protocol/client instance to connect to protocol/server at the
> daemon.

Apparently the addition he mentions wasn't the kind I was asking about, but something that only occurs at normal volfile-generation time. Is that correct?

> No volfile/graph changes at all. Creation/removal of snapshots is
> handled in the form of a dynamic list of glfs_t's on the server side.

So we still have dynamically added graphs, but they're wrapped up in GFAPI objects? Let's be sure to capture that nuance in v2 of the spec.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
On Thu, May 8, 2014 at 12:20 PM, Jeff Darcy wrote:

> > They were: a) snap view generation requires privileged ops to
> > glusterd. So moving this task to the server side solves a lot of those
> > challenges.
>
> Not really. A server-side component issuing privileged requests
> whenever a client asks it to is no more secure than a client-side
> component issuing them directly.

A client cannot ask the server side component to do any privileged requests on its behalf. If it has the right to connect to the volume, then it can issue a readdir() request and get served with whatever is served to it. If it presents an unknown filehandle, snap-view-server returns ESTALE.

> There needs to be some sort of
> authentication and authorization at the glusterd level (the only place
> these all converge). This is a more general problem that we've had with
> glusterd for a long time. If security is a sincere concern for USS,
> shouldn't we address it by trying to move the general solution forward?

The goal was to not make the security problem harder or worse. With this design the privileged operation is still contained within the server side. If clients were to issue RPCs to glusterd (to get the list of snaps, their volfiles, etc.), it would have been a challenge for the general glusterd security problem.
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
> > Overall, it seems like having clients connect *directly* to the
> > snapshot volumes once they've been started might have avoided some
> > complexity or problems. Was this considered?
>
> Yes this was considered. I have mentioned the two reasons why this was
> dropped in the other mail.

I look forward to the next version of the design which reflects the new ideas since this email thread started.

> They were: a) snap view generation requires privileged ops to
> glusterd. So moving this task to the server side solves a lot of those
> challenges.

Not really. A server-side component issuing privileged requests whenever a client asks it to is no more secure than a client-side component issuing them directly. There needs to be some sort of authentication and authorization at the glusterd level (the only place these all converge). This is a more general problem that we've had with glusterd for a long time. If security is a sincere concern for USS, shouldn't we address it by trying to move the general solution forward?
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
On Thu, May 8, 2014 at 11:48 AM, Jeff Darcy wrote:

> > client graph is not dynamically modified. the snapview-client and
> > protocol/server are inserted by volgen and no further changes are made on
> > the client side. I believe Anand was referring to "Adding a protocol/client
> > instance to connect to protocol/server at the daemon" as an action being
> > performed by volgen.
>
> OK, so let's say we create a new volfile including connections for a snapshot
> that didn't even exist when the client first mounted. Are you saying we do
> a full graph switch to that new volfile?

No graph changes either on client side or server side. The snap-view-server will detect availability of new snapshot from glusterd, and will spin up a new glfs_t for the corresponding snap, and start returning new list of "names" in readdir(), etc.

> That still seems dynamic. Doesn't
> that still mean we need to account for USS state when we regenerate the
> next volfile after an add-brick (for example)? One way or another the
> graph's going to change, which creates a lot of state-management issues.

No volfile/graph changes at all. Creation/removal of snapshots is handled in the form of a dynamic list of glfs_t's on the server side.
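[Editorial sketch] The server-side mechanism described above — no graph changes, just a dynamic map of snapshot name to glfs_t handle refreshed from glusterd, whose keys become the entries served for the virtual .snaps directory — can be modeled roughly as follows. All names here (SnapRegistry, FakeGlfs) are illustrative stand-ins, not Gluster APIs.

```python
class FakeGlfs:
    """Stand-in for a glfs_t handle onto one snapshot volume."""
    def __init__(self, snap_name):
        self.snap_name = snap_name

class SnapRegistry:
    def __init__(self):
        self._handles = {}          # snapshot name -> FakeGlfs

    def refresh(self, snaps_from_glusterd):
        # Spin up a handle for each newly created snapshot ...
        for name in snaps_from_glusterd:
            if name not in self._handles:
                self._handles[name] = FakeGlfs(name)
        # ... and drop handles for snapshots that were removed.
        for name in list(self._handles):
            if name not in snaps_from_glusterd:
                del self._handles[name]

    def readdir_dot_snaps(self):
        # readdir() of the virtual .snaps directory: the current names.
        return sorted(self._handles)

reg = SnapRegistry()
reg.refresh(["snap1", "snap2"])
reg.refresh(["snap2", "snap3"])     # snap1 deleted, snap3 created
print(reg.readdir_dot_snaps())      # -> ['snap2', 'snap3']
```

The point of the sketch is that creation and removal of snapshots only mutates this map; no volfile is regenerated and no translator graph is switched.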
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
> client graph is not dynamically modified. the snapview-client and
> protocol/server are inserted by volgen and no further changes are made on
> the client side. I believe Anand was referring to "Adding a protocol/client
> instance to connect to protocol/server at the daemon" as an action being
> performed by volgen.

OK, so let's say we create a new volfile including connections for a snapshot that didn't even exist when the client first mounted. Are you saying we do a full graph switch to that new volfile? That still seems dynamic. Doesn't that still mean we need to account for USS state when we regenerate the next volfile after an add-brick (for example)? One way or another the graph's going to change, which creates a lot of state-management issues. Those need to be addressed in a reviewable design so everyone can think about it and contribute their thoughts based on their perspectives.
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
On Thu, May 8, 2014 at 4:53 AM, Jeff Darcy wrote:

> > > * How do clients find it? Are we dynamically changing the client
> > >   side graph to add new protocol/client instances pointing to new
> > >   snapview-servers, or is snapview-client using RPC directly? Are
> > >   the snapview-server ports managed through the glusterd portmapper
> > >   interface, or patched in some other way?
> >
> > Adding a protocol/client instance to connect to protocol/server at the
> > daemon.
>
> So now the client graph is being dynamically modified, in ways that
> make it un-derivable from the volume configuration (because they're
> based in part on user activity since then)? What happens if a normal
> graph switch (e.g. due to add-brick) happens? I'll need to think some
> more about what this architectural change really means.

The client graph is not dynamically modified. The snapview-client and protocol/server are inserted by volgen and no further changes are made on the client side. I believe Anand was referring to "Adding a protocol/client instance to connect to protocol/server at the daemon" as an action being performed by volgen.
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
On Thu, May 8, 2014 at 4:48 AM, Jeff Darcy wrote:

> If snapview-server runs on all servers, how does a particular client
> decide which one to use? Do we need to do something to avoid hot spots?
>
> Overall, it seems like having clients connect *directly* to the snapshot
> volumes once they've been started might have avoided some complexity or
> problems. Was this considered?

Yes this was considered. I have mentioned the two reasons why this was dropped in the other mail. They were: a) snap view generation requires privileged ops to glusterd, so moving this task to the server side solves a lot of those challenges; b) keeping a tab on the total number of connections in the system, so that we don't explode the number of connections with more clients (given that there can be lots of snapshots).

> > > * How does snapview-server manage user credentials for connecting
> > >   to snap bricks? What if multiple users try to use the same
> > >   snapshot at the same time? How does any of this interact with
> > >   on-wire or on-disk encryption?
> >
> > No interaction with on-disk or on-wire encryption. Multiple users can
> > always access the same snapshot (volume) at the same time. Why do you
> > see any restrictions there?
>
> If we're using either on-disk or on-network encryption, client keys and
> certificates must remain on the clients. They must not be on servers.
> If the volumes are being proxied through snapview-server, it needs
> those credentials, but letting it have them defeats both security
> mechanisms.

The encryption xlator sits on top of snapview-client on the client side, and should be able to decrypt file content whether coming from a snap view or the main volume. Keys and certs remain on the client. But thanks for mentioning this; we need to spin up an instance of the locks xlator on top of snapview-server to satisfy the locking requests from crypt.

Avati
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
On Thu, May 8, 2014 at 4:45 AM, Ira Cooper wrote:

> Also inline.
>
> ----- Original Message -----
>
> > The scalability factor I mentioned simply had to do with the core
> > infrastructure (depending on very basic mechanisms like the epoll wait
> > thread, the entire end-to-end flow of a single fop like say, a lookup()
> > here). Even though this was contained to an extent by the introduction
> > of the io-threads xlator in snapd, it is still a complex path that is
> > not exactly about high performance design. That wasn't the goal to begin
> > with.
>
> Yes, if you get rid of the daemon it doesn't have those issues ;).
>
> > I am not sure what the linear range versus a non-linear one has to do
> > with the design? Maybe you are seeing something that I miss. A random
> > gfid is generated in the snapview-server xlator on lookups. The
> > snapview-client is a kind of a basic redirector that detects when a
> > reference is made to a "virtual" inode (based on stored context) and
> > simply redirects to the snapd daemon. It stores the info returned from
> > snapview-server, capturing the essential inode info in the inode context
> > (note this is the client side inode we are talking about).
>
> That last note is merely a warning against changing the properties of the
> UUID generator; please ignore it.
>
> > In the daemon there is another level of translation which needs to
> > associate this gfid with an inode in the context of the protocol-server
> > xlator. The next step of the translation is that this inode needs to be
> > translated to the actual gfid on disk - that is the only on-disk gfid
> > which exists in one of the snapshotted gluster volumes. To that extent
> > the snapview-server xlator needs to know which is the glfs_t structure to
> > access so it can get to the right gfapi graph. Once it knows that, it
> > can access any object in that gfapi graph using the glfs_object (which
> > has the real inode info from the gfapi world and the actual on-disk gfid).
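[Editorial sketch] The translation chain quoted above — a random "virtual" gfid handed out on lookup, which the daemon must later resolve back to the right snapshot's gfapi graph and the one gfid that actually exists on disk — can be modeled as a simple table. SnapDaemon and its methods are illustrative names, not Gluster APIs.

```python
import uuid

class SnapDaemon:
    def __init__(self):
        self._virtual = {}      # virtual gfid -> (snap name, on-disk gfid)

    def lookup(self, snap_name, on_disk_gfid):
        vgfid = uuid.uuid4()    # random gfid generated on lookup
        self._virtual[vgfid] = (snap_name, on_disk_gfid)
        return vgfid            # client caches this in its inode context

    def resolve(self, vgfid):
        # Second-level translation in the daemon: from the virtual gfid
        # back to the glfs_t graph to use and the real on-disk gfid.
        return self._virtual[vgfid]

d = SnapDaemon()
real = uuid.uuid4()
v = d.lookup("snap1", real)
assert d.resolve(v) == ("snap1", real)
```

This table is the state whose growth Ira's per-snapshot-offset proposal below in the thread aims to eliminate.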
> No daemon! SCRAP IT! Throw it in the bin, and don't let it climb back
> out.
>
> What you are proposing: random gfid -> real gfid ; as the mapping the
> daemon must maintain.
>
> What I am proposing: real gfid + offset -> real gfid ; offset is a per
> snapshot value, local to the client.
>
> Because the lookup table is now trivial, a single integer per snapshot,
> you don't need all that complex infrastructure.

The purpose for the existence of the daemon is twofold:

- the client cannot perform privileged ops to glusterd regarding listing of snaps etc.

- limit the total number of connections coming to bricks. If each client has a new set of connections to each of the snapshot bricks, the total number of connections in the system will become a function of the total number of clients * the total number of snapshots.

gfid management is something completely orthogonal; we can use the current random gfid or a more deterministic one (going to require a LOT more changes to make gfids deterministic, and what about already assigned ones, etc.) whether the .snap view is generated on client side or server side.
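[Editorial sketch] The connection-count concern above can be put in rough numbers. With direct client connections, brick connections grow as clients x snapshots x bricks; proxying through one snapview-server per server keeps the client-facing count flat. All figures are illustrative.

```python
def direct_connections(clients, snapshots, bricks_per_snap):
    # Every client connects to every brick of every snapshot volume.
    return clients * snapshots * bricks_per_snap

def proxied_connections(clients, servers, snapshots, bricks_per_snap):
    # Each client talks to one snapd; each snapd holds one gfapi graph
    # per snapshot, connecting to that snapshot's bricks.
    return clients + servers * snapshots * bricks_per_snap

# e.g. 1000 clients, 4 servers, 256 snapshots, 4 bricks per snapshot:
assert direct_connections(1000, 256, 4) == 1_024_000
assert proxied_connections(1000, 4, 256, 4) == 5_096
```

The daemon turns a multiplicative blow-up into an additive one, which is the substance of reason (b).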
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
> > Overall, it seems like having clients connect *directly* to the
> > snapshot volumes once they've been started might have avoided some
> > complexity or problems. Was this considered?
>
> Can you explain this in more detail? Are you saying that the virtual
> namespace overlay used by the current design can be reused along with
> returning extra info to clients or is this a new approach where you
> make the clients much more intelligent than they are in the current
> approach?

Basically the clients would have the same intelligence that now resides in snapview-server. Instead of spinning up a new protocol/client to talk to a new snapview-server, they'd send a single RPC to start the snapshot brick daemons, then connect to those themselves. Of course, this exacerbates the problem with dynamically changing translator graphs on the client side, because now the dynamically added parts will be whole trees (corresponding to whole volfiles) instead of single protocol/client translators.

Long term, I think we should consider *not* handling these overlays as modifications to the main translator graph, but instead allowing multiple translator graphs to be active in the glusterfs process concurrently. For example, this greatly simplifies the question of how to deal with a graph change after we've added several overlays.

* "Splice" method: graph comparisons must be enhanced to ignore the overlays, overlays must be re-added after the graph switch takes place, etc.

* "Multiple graph" method: just change the main graph (the one that's rooted at mount/fuse) and leave the others alone.

Stray thought: does any of this break when we're in an NFS or Samba daemon instead of a native-mount glusterfs daemon?

> > If we're using either on-disk or on-network encryption, client keys
> > and certificates must remain on the clients. They must not be on
> > servers.
> > If the volumes are being proxied through snapview-server,
> > it needs those credentials, but letting it have them defeats both
> > security mechanisms.
> >
> > Also, do we need to handle the case where the credentials have
> > changed since the snapshot was taken? This is probably a more
> > general problem with snapshots themselves, but still needs to be
> > considered.
>
> Agreed. Very nice point you brought up. We will need to think a bit
> more on this Jeff.

This is what reviews are for. ;)

Another thought: are there any interesting security implications because USS allows one user to expose *other users'* previous versions through the automatically mounted snapshot?
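[Editorial sketch] The "multiple graph" method proposed in this message can be pictured as a process that holds the fuse-rooted main graph plus independent overlay graphs, so a graph switch touches only the main graph and never has to splice overlays back in. This is purely illustrative and not the glusterfs graph-management code.

```python
class Process:
    def __init__(self, main_graph):
        self.main = main_graph
        self.overlays = []            # e.g. one graph per snapshot volume

    def add_overlay(self, graph):
        self.overlays.append(graph)

    def graph_switch(self, new_main):
        # Only the graph rooted at mount/fuse is replaced.
        self.main = new_main

p = Process("vol-fuse.vol")
p.add_overlay("snap1.vol")
p.graph_switch("vol-fuse.vol.v2")     # e.g. after add-brick
assert p.main == "vol-fuse.vol.v2"
assert p.overlays == ["snap1.vol"]    # untouched by the switch
```

Under the "splice" method, by contrast, graph_switch would also have to diff around the overlays and re-attach them afterwards.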
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
On 05/08/2014 05:18 PM, Jeff Darcy wrote:

> > > * Since a snap volume will refer to multiple bricks, we'll need
> > >   more brick daemons as well. How are *those* managed?
> >
> > This is infra handled by the "core" snapshot functionality/feature.
> > When a snap is created, it is treated not only as a lvm2 thin-lv but
> > as a glusterfs volume as well. The snap volume is activated and
> > mounted and made available for regular use through the native
> > fuse-protocol client. Management of these is not part of the USS
> > feature. But handled as part of the core snapshot implementation.
>
> If we're auto-starting snapshot volumes, are we auto-stopping them as
> well? According to what policy?

These are not auto-stopped at all. A deactivate cmd has been introduced as part of the core snapshot feature, which can be used to deactivate such a snap vol. Refer to the snapshot feature design for details.

> > USS (mainly snapview-server xlator) talks to the snapshot volumes
> > (and hence the bricks) through the glfs_t *, and passing a
> > glfs_object pointer.
>
> So snapview-server is using GFAPI from within a translator? This caused
> a *lot* of problems in NSR reconciliation, especially because of how
> GFAPI constantly messes around with the "THIS" pointer. Does the USS
> work include fixing these issues?

Well, not only that, here we have multiple gfapi call-graphs all hanging (for each snap vol) from the same snapd/snapview-server xlator address space. Ok, don't panic :) We haven't hit any issues in our basic testing so far. Let us test some more to see if we hit the problems you mention. If we hit them, we fix them. Or something like that. ;)

> If snapview-server runs on all servers, how does a particular client
> decide which one to use? Do we need to do something to avoid hot spots?

The idea as of today is to connect to the snapd running on the host the client connects to, where the mgmt glusterd is running.
We can think of other mechanisms like distributing the connections to different snapds, but that is not implemented for the first drop. And it is a concern only if we hit a perf bottleneck wrt the number of requests a given snapd hits.

> Overall, it seems like having clients connect *directly* to the snapshot
> volumes once they've been started might have avoided some complexity or
> problems. Was this considered?

Can you explain this in more detail? Are you saying that the virtual namespace overlay used by the current design can be reused along with returning extra info to clients, or is this a new approach where you make the clients much more intelligent than they are in the current approach?

> > > * How does snapview-server manage user credentials for connecting
> > >   to snap bricks? What if multiple users try to use the same
> > >   snapshot at the same time? How does any of this interact with
> > >   on-wire or on-disk encryption?
> >
> > No interaction with on-disk or on-wire encryption. Multiple users can
> > always access the same snapshot (volume) at the same time. Why do you
> > see any restrictions there?
>
> If we're using either on-disk or on-network encryption, client keys and
> certificates must remain on the clients. They must not be on servers.
> If the volumes are being proxied through snapview-server, it needs
> those credentials, but letting it have them defeats both security
> mechanisms.
>
> Also, do we need to handle the case where the credentials have changed
> since the snapshot was taken? This is probably a more general problem
> with snapshots themselves, but still needs to be considered.

Agreed. Very nice point you brought up. We will need to think a bit more on this Jeff.

Cheers,
Anand
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
> > * How do clients find it? Are we dynamically changing the client
> >   side graph to add new protocol/client instances pointing to new
> >   snapview-servers, or is snapview-client using RPC directly? Are
> >   the snapview-server ports managed through the glusterd portmapper
> >   interface, or patched in some other way?
>
> Adding a protocol/client instance to connect to protocol/server at the
> daemon.

So now the client graph is being dynamically modified, in ways that make it un-derivable from the volume configuration (because they're based in part on user activity since then)? What happens if a normal graph switch (e.g. due to add-brick) happens? I'll need to think some more about what this architectural change really means.
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
> > * Since a snap volume will refer to multiple bricks, we'll need
> >   more brick daemons as well. How are *those* managed?
>
> This is infra handled by the "core" snapshot functionality/feature. When
> a snap is created, it is treated not only as a lvm2 thin-lv but as a
> glusterfs volume as well. The snap volume is activated and mounted and
> made available for regular use through the native fuse-protocol client.
> Management of these is not part of the USS feature. But handled as part
> of the core snapshot implementation.

If we're auto-starting snapshot volumes, are we auto-stopping them as well? According to what policy?

> USS (mainly snapview-server xlator)
> talks to the snapshot volumes (and hence the bricks) through the glfs_t
> *, and passing a glfs_object pointer.

So snapview-server is using GFAPI from within a translator? This caused a *lot* of problems in NSR reconciliation, especially because of how GFAPI constantly messes around with the "THIS" pointer. Does the USS work include fixing these issues?

If snapview-server runs on all servers, how does a particular client decide which one to use? Do we need to do something to avoid hot spots?

Overall, it seems like having clients connect *directly* to the snapshot volumes once they've been started might have avoided some complexity or problems. Was this considered?

> > * How does snapview-server manage user credentials for connecting
> >   to snap bricks? What if multiple users try to use the same
> >   snapshot at the same time? How does any of this interact with
> >   on-wire or on-disk encryption?
>
> No interaction with on-disk or on-wire encryption. Multiple users can
> always access the same snapshot (volume) at the same time. Why do you
> see any restrictions there?

If we're using either on-disk or on-network encryption, client keys and certificates must remain on the clients. They must not be on servers.
If the volumes are being proxied through snapview-server, it needs those credentials, but letting it have them defeats both security mechanisms.

Also, do we need to handle the case where the credentials have changed since the snapshot was taken? This is probably a more general problem with snapshots themselves, but still needs to be considered.
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Also inline.

----- Original Message -----

> The scalability factor I mentioned simply had to do with the core
> infrastructure (depending on very basic mechanisms like the epoll wait
> thread, the entire end-to-end flow of a single fop like say, a lookup()
> here). Even though this was contained to an extent by the introduction
> of the io-threads xlator in snapd, it is still a complex path that is
> not exactly about high performance design. That wasn't the goal to begin
> with.

Yes, if you get rid of the daemon it doesn't have those issues ;).

> I am not sure what the linear range versus a non-linear one has to do
> with the design? Maybe you are seeing something that I miss. A random
> gfid is generated in the snapview-server xlator on lookups. The
> snapview-client is a kind of a basic redirector that detects when a
> reference is made to a "virtual" inode (based on stored context) and
> simply redirects to the snapd daemon. It stores the info returned from
> snapview-server, capturing the essential inode info in the inode context
> (note this is the client side inode we are talking about).

That last note is merely a warning against changing the properties of the UUID generator; please ignore it.

> In the daemon there is another level of translation which needs to
> associate this gfid with an inode in the context of the protocol-server
> xlator. The next step of the translation is that this inode needs to be
> translated to the actual gfid on disk - that is the only on-disk gfid
> which exists in one of the snapshotted gluster volumes. To that extent
> the snapview-server xlator needs to know which is the glfs_t structure to
> access so it can get to the right gfapi graph. Once it knows that, it
> can access any object in that gfapi graph using the glfs_object (which
> has the real inode info from the gfapi world and the actual on-disk gfid).

No daemon! SCRAP IT! Throw it in the bin, and don't let it climb back out.
What you are proposing: random gfid -> real gfid ; as the mapping the daemon must maintain.

What I am proposing: real gfid + offset -> real gfid ; offset is a per snapshot value, local to the client.

Because the lookup table is now trivial, a single integer per snapshot, you don't need all that complex infrastructure.

Thanks,

-Ira
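[Editorial sketch] The offset mapping proposed in this thread — a per-snapshot constant n added to the first 32 bits of the real gfid with plain unsigned overflow, inverted by subtraction — can be sketched directly on UUIDs. This is an illustration of the arithmetic only, not Gluster code.

```python
import uuid

MASK32 = 0xFFFFFFFF

def map_gfid(gfid: uuid.UUID, n: int) -> uuid.UUID:
    hi = (gfid.int >> 96) & MASK32            # first 32 bits of the gfid
    hi = (hi + n) & MASK32                    # unsigned add, just overflows
    return uuid.UUID(int=(hi << 96) | (gfid.int & ((1 << 96) - 1)))

def unmap_gfid(vgfid: uuid.UUID, n: int) -> uuid.UUID:
    # Subtraction is addition of the two's-complement of n mod 2^32.
    return map_gfid(vgfid, (-n) & MASK32)

g = uuid.uuid4()
v = map_gfid(g, 7)          # virtual gfid for the snapshot with offset 7
assert unmap_gfid(v, 7) == g
```

The per-snapshot state really is a single integer, and the mapped gfid is as collision-prone as any random one, since adding a constant to a uniformly random value leaves it uniformly random.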
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Inline.

On 05/07/2014 10:59 PM, Ira Cooper wrote:

> Anand, I also have a concern regarding the user-serviceable snapshot
> feature. You rightfully call out the lack of scaling caused by
> maintaining the gfid -> gfid mapping tables, and correctly point out
> that this will limit the use cases this feature will be applicable to,
> on the client side.
>
> If in fact gluster generates its gfids randomly, and has always done so,
> I propose that we can change the algorithm used to determine the
> mapping, to eliminate the lack of scaling of our solution.
>
> We can create a fixed constant per-snapshot. (Can be in just the
> client's memory, or stored on disk; that is an implementation detail
> here.) We will call this constant "n". I propose we just add the
> constant to the gfid to determine the new gfid. It turns out that this
> new gfid has the same chance of collision as any random gfid. (It will
> take a moment for you to convince yourself of this, but the argument is
> fairly intuitive.)
>
> If we do this, I'd suggest we do it on the first 32 bits of the gfid,
> because we can use simple unsigned math, and let it just overflow. (If
> we get up to 2^32 snapshots, we can revisit this aspect of the design,
> but we'll have other issues at that number.) By using addition this
> way, we also allow for subtraction to be used for a later purpose.
>
> Note: This design relies on our random gfid generator not turning out a
> linear range of numbers. If it has in the past, or will in the future,
> clearly this design has flaws. But, I know of no such plans. As long as
> the randomness is sufficient, there should be no issue. (IE: It doesn't
> turn out linear results.)

I don't claim to understand your question completely but have a feeling you are going off the track here. So bear with me, as my explanation could be off the mark as well.
The scalability factor I mentioned simply had to do with the core infrastructure (depending on very basic mechanisms like the epoll wait thread, the entire end-to-end flow of a single fop like say, a lookup() here). Even though this was contained to an extent by the introduction of the io-threads xlator in snapd, it is still a complex path that is not exactly about high performance design. That wasn't the goal to begin with.

I am not sure what the linear range versus a non-linear one has to do with the design? Maybe you are seeing something that I miss. A random gfid is generated in the snapview-server xlator on lookups. The snapview-client is a kind of a basic redirector that detects when a reference is made to a "virtual" inode (based on stored context) and simply redirects to the snapd daemon. It stores the info returned from snapview-server, capturing the essential inode info in the inode context (note this is the client side inode we are talking about).

In the daemon there is another level of translation which needs to associate this gfid with an inode in the context of the protocol-server xlator. The next step of the translation is that this inode needs to be translated to the actual gfid on disk - that is the only on-disk gfid which exists in one of the snapshotted gluster volumes. To that extent the snapview-server xlator needs to know which is the glfs_t structure to access so it can get to the right gfapi graph. Once it knows that, it can access any object in that gfapi graph using the glfs_object (which has the real inode info from the gfapi world and the actual on-disk gfid).

Anand

> Thanks,
>
> -Ira / ira@(redhat.com|samba.org)
>
> PS: +1 to Jeff here. He's spotting major issues, that should be looked
> at, above the issue above.
>
> ----- Original Message -----
>
> > > Attached is a basic write-up of the user-serviceable snapshot
> > > feature design (Avati's). Please take a look and let us know if you
> > > have questions of any sort...
> >
> > A few. The design creates a new type of daemon: snapview-server.
> > * Where is it started? One server (selected how) or all?
> >
> > * How do clients find it? Are we dynamically changing the client
> >   side graph to add new protocol/client instances pointing to new
> >   snapview-servers, or is snapview-client using RPC directly? Are
> >   the snapview-server ports managed through the glusterd portmapper
> >   interface, or patched in some other way?
> >
> > * Since a snap volume will refer to multiple bricks, we'll need
> >   more brick daemons as well. How are *those* managed?
> >
> > * How does snapview-server manage user credentials for connecting
> >   to snap bricks? What if multiple users try to use the same
> >   snapshot at the same time? How does any of this interact with
> >   on-wire or on-disk encryption?
> >
> > I'm sure I'll come up with more later. Also, next time it might be
> > nice to use the upstream feature proposal template *as it was
> > designed* to make sure that questions like these get addressed where
> > the whole community can participate in a timely fashion.
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Answers inline.

On 05/07/2014 10:52 PM, Jeff Darcy wrote:

> > Attached is a basic write-up of the user-serviceable snapshot feature
> > design (Avati's). Please take a look and let us know if you have
> > questions of any sort...
>
> A few. The design creates a new type of daemon: snapview-server.
>
> * Where is it started? One server (selected how) or all?

snapview-server is the name of the server side xlator. snapd is the (glusterfsd) daemon that, when started and running, looks like this:

  /usr/local/sbin/glusterfsd -s localhost --volfile-id snapd/vol2
      -p /var/lib/glusterd/vols/vol2/run/snapd.pid
      -l /usr/local/var/log/glusterfs/snapd.log
      --xlator-option *-posix.glusterd-uuid=bd3b0111-33db-499c-8497-f455db729394
      --brick-name vol2-server
      -S /var/run/2301b86236cdb9bd09c7f3988ac5c29f.socket
      --brick-port 49164 --xlator-option vol2-server.listen-port=49164

where vol2 is the name of the glusterfs vol in question. snapd is started by nodesvc start (as of today) and is started when the vol is started. It includes the protocol-server, io-threads and the snapview-server xlators.

> * How do clients find it? Are we dynamically changing the client
>   side graph to add new protocol/client instances pointing to new
>   snapview-servers, or is snapview-client using RPC directly? Are
>   the snapview-server ports managed through the glusterd portmapper
>   interface, or patched in some other way?

As Varun mentioned in his response, snapd gets a port through the glusterd pmap_registry calls. The client side xlator as usual does the pmap client query to find out the port it needs to connect to. Nothing different from the norm here.

> * Since a snap volume will refer to multiple bricks, we'll need
>   more brick daemons as well. How are *those* managed?

This is infra handled by the "core" snapshot functionality/feature. When a snap is created, it is treated not only as a lvm2 thin-lv but as a glusterfs volume as well.
The snap volume is activated and mounted and made available for regular use through the native fuse-protocol client. Management of these is not part of the USS feature; it is handled as part of the core snapshot implementation. USS (mainly the snapview-server xlator) talks to the snapshot volumes (and hence the bricks) through a glfs_t *, passing a glfs_object pointer. Each of these glfs instances represents the gfapi world for an individual snapshotted volume; accessing any of the snap vol bricks etc. is handled within the gfapi world. To that extent, the snapview-server xlator is yet another consumer of the handle-based gfapi calls, like nfs-ganesha.

> * How does snapview-server manage user credentials for connecting
>   to snap bricks? What if multiple users try to use the same
>   snapshot at the same time? How does any of this interact with
>   on-wire or on-disk encryption?

No interaction with on-disk or on-wire encryption. Multiple users can always access the same snapshot (volume) at the same time. Why do you see any restrictions there? Maybe you want to know whether userA can access the .snaps directory of another user, userB? Today the credentials are passed as is, i.e. snapview-server does not play around with user creds. If the snapshot volume access allows it, it goes through. Let me get back with more details on this one, and thanks for bringing it up. Remember that the snapshot vol is read-only, and snapview-server has many glfs_t * pointers, each one pointing to one of the snapshot volumes (through the gfapi world).

> I'm sure I'll come up with more later. Also, next time it might
> be nice to use the upstream feature proposal template *as it was
> designed* to make sure that questions like these get addressed
> where the whole community can participate in a timely fashion.

Umm... this wasn't exactly a feature "proposal", was it ;-) ? But that said, point taken. Will do that. Next time. Please do send all questions you come up with, and thanks for these.
Certainly helped clarify several important points here.

Anand

___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Hi, On Wednesday 07 May 2014 10:52 PM, Jeff Darcy wrote:

> > Attached is a basic write-up of the user-serviceable snapshot feature
> > design (Avati's). Please take a look and let us know if you have
> > questions of any sort...
>
> A few. The design creates a new type of daemon: snapview-server.
>
> * Where is it started? One server (selected how) or all?

All the servers in the cluster.

> * How do clients find it? Are we dynamically changing the client
>   side graph to add new protocol/client instances pointing to new
>   snapview-servers, or is snapview-client using RPC directly? Are
>   the snapview-server ports managed through the glusterd portmapper
>   interface, or patched in some other way?

Adding a protocol/client instance to connect to the protocol/server at the daemon. So the call flow (if the call is to .snaps) would look like: snapview-client -> protocol/client -> protocol/server -> snapview-server. Yes, it is handled through the glusterd portmapper.

> * Since a snap volume will refer to multiple bricks, we'll need
>   more brick daemons as well. How are *those* managed?

Brick processes associated with the snapshot will be started.

- Varun Shastry

> * How does snapview-server manage user credentials for connecting
>   to snap bricks? What if multiple users try to use the same
>   snapshot at the same time? How does any of this interact with
>   on-wire or on-disk encryption?
>
> I'm sure I'll come up with more later. Also, next time it might
> be nice to use the upstream feature proposal template *as it was
> designed* to make sure that questions like these get addressed
> where the whole community can participate in a timely fashion.

___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
- Original Message - From: "Varun Shastry" To: "Sobhan Samantaray" , ana...@redhat.com Cc: gluster-devel@gluster.org, "gluster-users" , "Anand Avati" Sent: Thursday, May 8, 2014 11:16:11 AM Subject: Re: [Gluster-users] User-serviceable snapshots design Hi Sobhan, On Wednesday 07 May 2014 09:12 PM, Sobhan Samantaray wrote: > I think its a good idea to include the auto-remove of the snapshots based on > the time or space as threshold as mentioned in below link. > > http://www.howtogeek.com/110138/how-to-back-up-your-linux-system-with-back-in-time/ I think this feature is already implemented (partially?) as part of the snapshot feature. The feature proposed here only concentrates on the user serviceability of the snapshots taken. I understand that it would be part of core snapshot feature. I was talking w.r.t Paul's suggestion of scheduling the snapshot based on threshold levels in snapshot which will be part of phase-2. - Varun Shastry > > - Original Message - > From: "Anand Subramanian" > To: "Paul Cuzner" > Cc: gluster-devel@gluster.org, "gluster-users" , > "Anand Avati" > Sent: Wednesday, May 7, 2014 7:50:30 PM > Subject: Re: [Gluster-users] User-serviceable snapshots design > > Hi Paul, that is definitely doable and a very nice suggestion. It is just > that we probably won't be able to get to that in the immediate code drop > (what we like to call phase-1 of the feature). But yes, let us try to > implement what you suggest for phase-2. Soon :-) > > Regards, > Anand > > On 05/06/2014 07:27 AM, Paul Cuzner wrote: > > > > Just one question relating to thoughts around how you apply a filter to the > snapshot view from a user's perspective. > > In the "considerations" section, it states - "We plan to introduce a > configurable option to limit the number of snapshots visible under the USS > feature." 
> Would it not be possible to take the meta data from the snapshots to form a > tree hierarchy when the number of snapshots present exceeds a given > threshold, effectively organising the snaps by time. I think this would work > better from an end-user workflow perspective. > > i.e. > .snaps > \/ Today > +-- snap01_20140503_0800 > +-- snap02_ 20140503_ 1400 >> Last 7 days >> 7-21 days >> 21-60 days >> 60-180days >> 180days > > > > > > > From: "Anand Subramanian" > To: gluster-de...@nongnu.org , "gluster-users" > Cc: "Anand Avati" > Sent: Saturday, 3 May, 2014 2:35:26 AM > Subject: [Gluster-users] User-serviceable snapshots design > > Attached is a basic write-up of the user-serviceable snapshot feature > design (Avati's). Please take a look and let us know if you have > questions of any sort... > > We have a basic implementation up now; reviews and upstream commit > should follow very soon over the next week. > > Cheers, > Anand > > ___ > Gluster-users mailing list > gluster-us...@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users > > > > ___ > Gluster-users mailing list > gluster-us...@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users > ___ > Gluster-users mailing list > gluster-us...@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Hi Sobhan, On Wednesday 07 May 2014 09:12 PM, Sobhan Samantaray wrote: I think its a good idea to include the auto-remove of the snapshots based on the time or space as threshold as mentioned in below link. http://www.howtogeek.com/110138/how-to-back-up-your-linux-system-with-back-in-time/ I think this feature is already implemented (partially?) as part of the snapshot feature. The feature proposed here only concentrates on the user serviceability of the snapshots taken. - Varun Shastry - Original Message - From: "Anand Subramanian" To: "Paul Cuzner" Cc: gluster-devel@gluster.org, "gluster-users" , "Anand Avati" Sent: Wednesday, May 7, 2014 7:50:30 PM Subject: Re: [Gluster-users] User-serviceable snapshots design Hi Paul, that is definitely doable and a very nice suggestion. It is just that we probably won't be able to get to that in the immediate code drop (what we like to call phase-1 of the feature). But yes, let us try to implement what you suggest for phase-2. Soon :-) Regards, Anand On 05/06/2014 07:27 AM, Paul Cuzner wrote: Just one question relating to thoughts around how you apply a filter to the snapshot view from a user's perspective. In the "considerations" section, it states - "We plan to introduce a configurable option to limit the number of snapshots visible under the USS feature." Would it not be possible to take the meta data from the snapshots to form a tree hierarchy when the number of snapshots present exceeds a given threshold, effectively organising the snaps by time. I think this would work better from an end-user workflow perspective. i.e. .snaps \/ Today +-- snap01_20140503_0800 +-- snap02_ 20140503_ 1400 Last 7 days 7-21 days 21-60 days 60-180days 180days From: "Anand Subramanian" To: gluster-de...@nongnu.org , "gluster-users" Cc: "Anand Avati" Sent: Saturday, 3 May, 2014 2:35:26 AM Subject: [Gluster-users] User-serviceable snapshots design Attached is a basic write-up of the user-serviceable snapshot feature design (Avati's). 
Please take a look and let us know if you have questions of any sort... We have a basic implementation up now; reviews and upstream commit should follow very soon over the next week. Cheers, Anand ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Anand,

I also have a concern regarding the user-serviceable snapshot feature. You rightfully call out the lack of scaling caused by maintaining the gfid -> gfid mapping tables, and correctly point out that this will limit the use cases this feature will be applicable to, on the client side.

If in fact gluster generates its gfids randomly, and has always done so, I propose that we can change the algorithm used to determine the mapping, to eliminate the lack of scaling in our solution.

We can create a fixed constant per snapshot. (It can live in just the client's memory, or be stored on disk; that is an implementation detail here.) We will call this constant "n". I propose we just add the constant to the gfid to determine the new gfid. It turns out that this new gfid has the same chance of collision as any random gfid. (It will take a moment to convince yourself of this, but the argument is fairly intuitive.)

If we do this, I'd suggest we do it on the first 32 bits of the gfid, because we can use simple unsigned math and let it just overflow. (If we get up to 2^32 snapshots, we can revisit this aspect of the design, but we'll have other issues at that number.) By using addition this way, we also allow for subtraction to be used for a later purpose.

Note: This design relies on our random gfid generator not turning out a linear range of numbers. If it has in the past, or will in the future, clearly this design has flaws. But I know of no such plans. As long as the randomness is sufficient (i.e., it doesn't turn out linear results), there should be no issue.

Thanks,

-Ira / ira@(redhat.com|samba.org)

PS: +1 to Jeff here. He's spotting major issues that should be looked at, beyond the issue above.

- Original Message -
> > Attached is a basic write-up of the user-serviceable snapshot feature
> > design (Avati's). Please take a look and let us know if you have
> > questions of any sort...
>
> A few.
>
> The design creates a new type of daemon: snapview-server.
> * Where is it started? One server (selected how) or all?
>
> * How do clients find it? Are we dynamically changing the client
>   side graph to add new protocol/client instances pointing to new
>   snapview-servers, or is snapview-client using RPC directly? Are
>   the snapview-server ports managed through the glusterd portmapper
>   interface, or patched in some other way?
>
> * Since a snap volume will refer to multiple bricks, we'll need
>   more brick daemons as well. How are *those* managed?
>
> * How does snapview-server manage user credentials for connecting
>   to snap bricks? What if multiple users try to use the same
>   snapshot at the same time? How does any of this interact with
>   on-wire or on-disk encryption?
>
> I'm sure I'll come up with more later. Also, next time it might
> be nice to use the upstream feature proposal template *as it was
> designed* to make sure that questions like these get addressed
> where the whole community can participate in a timely fashion.
>
> ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users

___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
> Attached is a basic write-up of the user-serviceable snapshot feature
> design (Avati's). Please take a look and let us know if you have
> questions of any sort...

A few.

The design creates a new type of daemon: snapview-server.

* Where is it started? One server (selected how) or all?

* How do clients find it? Are we dynamically changing the client side graph to add new protocol/client instances pointing to new snapview-servers, or is snapview-client using RPC directly? Are the snapview-server ports managed through the glusterd portmapper interface, or patched in some other way?

* Since a snap volume will refer to multiple bricks, we'll need more brick daemons as well. How are *those* managed?

* How does snapview-server manage user credentials for connecting to snap bricks? What if multiple users try to use the same snapshot at the same time? How does any of this interact with on-wire or on-disk encryption?

I'm sure I'll come up with more later. Also, next time it might be nice to use the upstream feature proposal template *as it was designed* to make sure that questions like these get addressed where the whole community can participate in a timely fashion.

___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
I think it's a good idea to include auto-removal of snapshots based on time or space as a threshold, as mentioned in the link below.

http://www.howtogeek.com/110138/how-to-back-up-your-linux-system-with-back-in-time/

- Original Message -
From: "Anand Subramanian"
To: "Paul Cuzner"
Cc: gluster-devel@gluster.org, "gluster-users" , "Anand Avati"
Sent: Wednesday, May 7, 2014 7:50:30 PM
Subject: Re: [Gluster-users] User-serviceable snapshots design

Hi Paul, that is definitely doable and a very nice suggestion. It is just that we probably won't be able to get to that in the immediate code drop (what we like to call phase-1 of the feature). But yes, let us try to implement what you suggest for phase-2. Soon :-)

Regards,
Anand

On 05/06/2014 07:27 AM, Paul Cuzner wrote:

Just one question relating to thoughts around how you apply a filter to the snapshot view from a user's perspective.

In the "considerations" section, it states - "We plan to introduce a configurable option to limit the number of snapshots visible under the USS feature."

Would it not be possible to take the meta data from the snapshots to form a tree hierarchy when the number of snapshots present exceeds a given threshold, effectively organising the snaps by time? I think this would work better from an end-user workflow perspective.

i.e.
.snaps
\/ Today
   +-- snap01_20140503_0800
   +-- snap02_20140503_1400
> Last 7 days
> 7-21 days
> 21-60 days
> 60-180days
> 180days

From: "Anand Subramanian"
To: gluster-de...@nongnu.org , "gluster-users"
Cc: "Anand Avati"
Sent: Saturday, 3 May, 2014 2:35:26 AM
Subject: [Gluster-users] User-serviceable snapshots design

Attached is a basic write-up of the user-serviceable snapshot feature design (Avati's). Please take a look and let us know if you have questions of any sort...

We have a basic implementation up now; reviews and upstream commit should follow very soon over the next week.
Cheers, Anand ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Hi Paul, that is definitely doable and a very nice suggestion. It is just that we probably won't be able to get to that in the immediate code drop (what we like to call phase-1 of the feature). But yes, let us try to implement what you suggest for phase-2. Soon :-) Regards, Anand On 05/06/2014 07:27 AM, Paul Cuzner wrote: Just one question relating to thoughts around how you apply a filter to the snapshot view from a user's perspective. In the "considerations" section, it states - "We plan to introduce a configurable option to limit the number of snapshots visible under the USS feature." Would it not be possible to take the meta data from the snapshots to form a tree hierarchy when the number of snapshots present exceeds a given threshold, effectively organising the snaps by time. I think this would work better from an end-user workflow perspective. i.e. .snaps \/ Today +-- snap01_20140503_0800 +-- snap02_20140503_1400 > Last 7 days > 7-21 days > 21-60 days > 60-180days > 180days *From: *"Anand Subramanian" *To: *gluster-de...@nongnu.org, "gluster-users" *Cc: *"Anand Avati" *Sent: *Saturday, 3 May, 2014 2:35:26 AM *Subject: *[Gluster-users] User-serviceable snapshots design Attached is a basic write-up of the user-serviceable snapshot feature design (Avati's). Please take a look and let us know if you have questions of any sort... We have a basic implementation up now; reviews and upstream commit should follow very soon over the next week. Cheers, Anand ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Hi Sobhan,

Thanks for the comments. It was a very quick writeup so that there is at least some clarity on the implementation internals. I will try to find time and plug in some more details.

I am not quite sure what you mean by "the default value of the option of uss", but am assuming it's the option of turning the feature on/off? If so, this will be set to "off" by default initially. The admin would need to turn uss on for a given vol.

Thanks,
Anand

On 05/06/2014 10:08 PM, Sobhan Samantaray wrote:

Hi Anand, thanks for coming up with the nice design. I have a couple of comments.

1. The design should mention the access protocols to be used (NFS/CIFS etc.), although the requirement states that.

2. Consideration section: "Again, this is not a performance oriented feature. Rather, the goal is to allow a seamless user-experience by allowing easy and useful access to snapshotted volumes and individual data stored in those volumes". If fops performance would not be impacted by the introduction of this feature, then that should be clarified.

3. It would be good to mention the default value of the uss option.

Regards
Sobhan

From: "Paul Cuzner"
To: ana...@redhat.com
Cc: gluster-devel@gluster.org, "gluster-users" , "Anand Avati"
Sent: Tuesday, May 6, 2014 7:27:29 AM
Subject: Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design

Just one question relating to thoughts around how you apply a filter to the snapshot view from a user's perspective.

In the "considerations" section, it states - "We plan to introduce a configurable option to limit the number of snapshots visible under the USS feature."

Would it not be possible to take the meta data from the snapshots to form a tree hierarchy when the number of snapshots present exceeds a given threshold, effectively organising the snaps by time? I think this would work better from an end-user workflow perspective.

i.e.
.snaps
\/ Today
   +-- snap01_20140503_0800
   +-- snap02_20140503_1400
> Last 7 days
> 7-21 days
> 21-60 days
> 60-180days
> 180days

From: "Anand Subramanian"
To: gluster-de...@nongnu.org, "gluster-users"
Cc: "Anand Avati"
Sent: Saturday, 3 May, 2014 2:35:26 AM
Subject: [Gluster-users] User-serviceable snapshots design

Attached is a basic write-up of the user-serviceable snapshot feature design (Avati's). Please take a look and let us know if you have questions of any sort...

We have a basic implementation up now; reviews and upstream commit should follow very soon over the next week.

Cheers,
Anand

___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Hi Anand, thanks for coming up with the nice design. I have a couple of comments.

1. The design should mention the access protocols to be used (NFS/CIFS etc.), although the requirement states that.

2. Consideration section: "Again, this is not a performance oriented feature. Rather, the goal is to allow a seamless user-experience by allowing easy and useful access to snapshotted volumes and individual data stored in those volumes". If fops performance would not be impacted by the introduction of this feature, then that should be clarified.

3. It would be good to mention the default value of the uss option.

Regards
Sobhan

From: "Paul Cuzner"
To: ana...@redhat.com
Cc: gluster-devel@gluster.org, "gluster-users" , "Anand Avati"
Sent: Tuesday, May 6, 2014 7:27:29 AM
Subject: Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design

Just one question relating to thoughts around how you apply a filter to the snapshot view from a user's perspective.

In the "considerations" section, it states - "We plan to introduce a configurable option to limit the number of snapshots visible under the USS feature."

Would it not be possible to take the meta data from the snapshots to form a tree hierarchy when the number of snapshots present exceeds a given threshold, effectively organising the snaps by time? I think this would work better from an end-user workflow perspective.

i.e.
.snaps
\/ Today
   +-- snap01_20140503_0800
   +-- snap02_20140503_1400
> Last 7 days
> 7-21 days
> 21-60 days
> 60-180days
> 180days

From: "Anand Subramanian"
To: gluster-de...@nongnu.org, "gluster-users"
Cc: "Anand Avati"
Sent: Saturday, 3 May, 2014 2:35:26 AM
Subject: [Gluster-users] User-serviceable snapshots design

Attached is a basic write-up of the user-serviceable snapshot feature design (Avati's). Please take a look and let us know if you have questions of any sort...
We have a basic implementation up now; reviews and upstream commit should follow very soon over the next week. Cheers, Anand ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] User-serviceable snapshots design
Just one question relating to thoughts around how you apply a filter to the snapshot view from a user's perspective.

In the "considerations" section, it states - "We plan to introduce a configurable option to limit the number of snapshots visible under the USS feature."

Would it not be possible to take the meta data from the snapshots to form a tree hierarchy when the number of snapshots present exceeds a given threshold, effectively organising the snaps by time? I think this would work better from an end-user workflow perspective.

i.e.
.snaps
\/ Today
   +-- snap01_20140503_0800
   +-- snap02_20140503_1400
> Last 7 days
> 7-21 days
> 21-60 days
> 60-180days
> 180days

> From: "Anand Subramanian"
> To: gluster-de...@nongnu.org, "gluster-users"
> Cc: "Anand Avati"
> Sent: Saturday, 3 May, 2014 2:35:26 AM
> Subject: [Gluster-users] User-serviceable snapshots design
>
> Attached is a basic write-up of the user-serviceable snapshot feature
> design (Avati's). Please take a look and let us know if you have
> questions of any sort...
>
> We have a basic implementation up now; reviews and upstream commit
> should follow very soon over the next week.
>
> Cheers,
> Anand
>
> ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users

___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel