On Tue, Mar 1, 2016 at 12:37 PM, Prasanna Kumar Kalever <pkale...@redhat.com> wrote:
> Hello Gluster,
>
> Introducing a new file-based snapshot feature in Gluster, built on the
> reflink feature that will be available in XFS in a couple of months
> (downstream).
>
> What is a reflink?
>
> You have surely used softlinks and hardlinks every day. A reflink supports
> transparent copy-on-write, unlike soft/hard links, which is what makes it
> useful for snapshotting. A reflink points to the same data blocks used by
> the actual file (the blocks are shared between the real file and the
> reflinked file, hence it is space efficient), but it has its own inode
> number, so it can carry different permissions for the same data blocks.
> Reflinks may look similar to hardlinks, but they are more space efficient
> and support every operation that can be performed on a regular file,
> unlike hardlinks, which are limited to unlink().
>
> Which filesystems support reflinks?
> I think Btrfs was the first to offer them; XFS is now working hard to make
> them available, and in the future we may see them in ext4 as well.
>
> You can get a feel for reflinks by following this tutorial:
> https://pkalever.wordpress.com/2016/01/22/xfs-reflinks-tutorial/
>
> POC in gluster: https://asciinema.org/a/be50ukifcwk8tqhvo0ndtdqdd?speed=2
>
> How are we doing it?
> Currently there is no dedicated system call that creates reflinks, so I
> decided to go with an ioctl call using the XFS_IOC_CLONE command.
>
> In the POC I have used setxattr/getxattr to create/delete/list snapshots.
> The restore feature will use setxattr as well.
>
> We could add a dedicated fop, but since FUSE doesn't understand it we will
> manage with a setxattr at the FUSE mount point; from the client side it is
> a fop down to the posix xlator and then an ioctl to the underlying
> filesystem. Planning to expose APIs for create, delete, list and restore.
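For reference, a minimal sketch of that clone call, assuming an XFS
filesystem formatted with reflink support. Newer <linux/fs.h> headers expose
the same ioctl number as FICLONE, so either name works where it is defined:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FICLONE: same ioctl number as XFS_IOC_CLONE */

/* Clone src_path into a new file dest_path; afterwards both files share
 * the same data blocks and a write to either side triggers CoW. */
int reflink_file(const char *src_path, const char *dest_path)
{
    int src = open(src_path, O_RDONLY);
    int dest = open(dest_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    int ret = -1;

    if (src >= 0 && dest >= 0)
        ret = ioctl(dest, FICLONE, src);  /* shared blocks, separate inode */
    if (ret < 0)
        perror("reflink");

    if (src >= 0)
        close(src);
    if (dest >= 0)
        close(dest);
    return ret;
}

int main(int argc, char **argv)
{
    return (argc == 3 && reflink_file(argv[1], argv[2]) == 0) ? 0 : 1;
}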
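And from the mount point, the client-side create could look roughly like the
sketch below. The xattr key name is made up for illustration; the real key
will be whatever the posix xlator ends up handling:

#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

/* Ask glusterfs for a file snapshot through a (hypothetical) virtual xattr.
 * The FUSE client would carry the setxattr as a fop down to the posix
 * xlator, which finally issues the clone ioctl on the backend file. */
int create_file_snapshot(const char *path, const char *snap_name)
{
    if (setxattr(path, "glusterfs.snapshot.create",  /* illustrative key */
                 snap_name, strlen(snap_name), 0) < 0) {
        perror("setxattr");
        return -1;
    }
    return 0;
}

From a shell the equivalent would be something like
"setfattr -n glusterfs.snapshot.create -v snap1 /mnt/glusterfs/vm1.img",
again with a made-up key name and path.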
> Are these snapshots internal or external?
> We will have a separate file each time we create a snapshot; the snapshot
> file will have a different inode number and will be read-only. All these
> files are kept in a ".fsnap/" directory under the parent directory where
> the snapshotted/actual file resides, so they will not be visible to the
> user (not even with ls -a, just like USS).
>
> *** We can always restore to any snapshot available in the list, and the
> best part is that we can delete any snapshot between snapshot1 and
> snapshotN, because all of them are independent ***
>
> It is the application's duty to ensure the consistency of the file before
> it requests a snapshot; for example, for a VM image snapshot it is the
> hypervisor that should freeze the IO and then request the snapshot.
>
>
> Integration with gluster: (initial state, needs more investigation)
>
> Quota:
> Since the snapshot files reside in the ".fsnap/" directory under the same
> directory as the actual file, they fall under the same user's quota :)
>
> DHT:
> As said, the snapshot files will reside in the same directory as the
> actual file, inside the ".fsnap/" directory.
>
> Re-balancing:
> The simplest solution could be: copy the actual file in full, then for the
> snapshot files rsync only the deltas and recreate the snapshot history by
> repeating the snapshot sequence after each snapshot-file rsync.
>
> AFR:
> Mostly the same as a write fop (inodelk's and quorum). There would be no
> way to recover or recreate a snapshot on a node (brick, to be precise)
> that was down while the snapshot was taken and comes back later.
>
> Disperse:
> Mostly take the inodelk and snapshot the file on each of the bricks; that
> should work.
>
> Sharding:
> Assume we have a file split into 4 shards. If the take-snapshot fop is
> sent to all the subvols holding the shards, that would be sufficient:
> every shard will then have a snapshot of its state.
> The list-snaps fop should be sent only to the main subvol where shard 0
> resides.
> Deleting a snap should be similar to creating one.
> Restore would be a little difficult because the file's metadata needs to
> be updated in the shard xlator.
> <Needs more investigation>
> Also, with sharding the bricks have a gfid-based flat filesystem, so the
> snaps will also be created in the shard directory; quota is therefore not
> straightforward and needs additional work in this case.
>
>
> How can we make it better?
> Discussion page: http://pad.engineering.redhat.com/kclYd9TPjr
This link is not accessible externally. Could you move the contents to a public location?

> Thanks to "Pranith Kumar Karampuri", "Raghavendra Talur", "Rajesh Joseph",
> "Poornima Gurusiddaiah" and "Kotresh Hiremath Ravishankar"
> for all the initial discussions.
>
>
> -Prasanna

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel