Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files
- Original Message - From: Prashanth Pai p...@redhat.com To: Anoop C S anoo...@redhat.com Cc: gluster-devel@gluster.org Sent: Tuesday, August 18, 2015 11:59:09 AM Subject: Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files - Original Message - From: Anoop C S anoo...@redhat.com To: gluster-devel@gluster.org Sent: Monday, August 17, 2015 6:20:50 PM Subject: [Gluster-devel] Implementing Flat Hierarchy for trashed files Hi all, As we move forward, in order to fix the limitations with current trash translator we are planning to replace the existing criteria for trashed files inside trash directory with a general flat hierarchy as described in the following sections. Please have your thoughts on following design considerations. Current implementation == * Trash translator resides on glusterfs server stack just above posix. * Trash directory (.trashcan) is created during volume start and is visible under root of the volume. * Each trashed file is moved (renamed) to trash directory with an appended time stamp in the file name. Do these files get moved during re-balance due to name change or do you choose file name according to the DHT regex magic to avoid that ? * Exact directory hierarchy (w.r.t the root of volume) is maintained inside trash directory whenever a file is deleted/truncated from a directory Outstanding issues == * Since renaming occurs at the server side, client-side is unaware of trash doing rename or create operations. * As a result files/directories may not be visible from mount point. * Files/Directories created from from trash translator will not have gfid associated with it until lookup is performed. Proposed Flat hierarchy === * Instead of creating the whole directory under trash, we will rename the file and place it directly under trash directory (of course with appended time stamp). The .trashcan directory might not scale with millions of such files placed under one directory. 
We had faced the same problem with gluster-swift project for object expiration feature and had decided to distribute our files across multiple directories in a deterministic way. And, personally, I'd prefer storing absolute timestamp, for example: as returned by `date +%s` command. * Directory hierarchy can be stored via either of the following two approaches: (a) File name will contain the whole path with time stamp appended If this approach is taken, you might have trouble with choosing a magic letter representing slashes. (b) Store whole hierarchy as an xattr Other enhancements == * Create the trash directory only when trash xlator is enabled. This is a needed enhancement. Upgrade to 3.7.* from older glusterfs versions caused undesired results in gluster-swift integration because .trashcan was visible by default on all glusterfs volumes. * Operations such as unlink, rename etc will be prevented on trash directory only when trash xlator is enabled. * A new trash helper translator on client side (loaded only when trash is enabled) to resolve split brain issues with truncation of files. * Restore files from trash with the help of an explicit setfattr call. You have to be very careful with races involved in re-creating the path when clients are accessing volume, also with over-writing if path exists. It's way easier (from implementer's perspective) if this is a manual process. If the on-disk structure is changed, how will upgrades be handled? Thanks Regards, -Anoop C S -Jiffin Tony Thottan ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
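The suggestion above, to spread trashed files across multiple directories in a deterministic way, could look roughly like the following. This is only a sketch: the thread does not describe the scheme gluster-swift actually uses, so the hash function, the bucket count and the names `trash_bucket` and `trash_location` are all assumptions made for illustration.

```python
import hashlib
import posixpath


def trash_bucket(original_path: str, buckets: int = 256) -> str:
    """Map a path to a bucket directory name (hypothetical scheme).

    The same path always hashes to the same bucket, so the location can be
    recomputed later without scanning the whole trash directory.
    """
    digest = hashlib.sha1(original_path.encode("utf-8")).hexdigest()
    index = int(digest, 16) % buckets
    return f"{index:02x}"


def trash_location(original_path: str, deleted_at: int,
                   trash_root: str = ".trashcan") -> str:
    """Build <trash_root>/<bucket>/<basename>_<epoch seconds>."""
    name = posixpath.basename(original_path.rstrip("/"))
    bucket = trash_bucket(original_path)
    return posixpath.join(trash_root, bucket, f"{name}_{deleted_at}")
```

With the default of 256 buckets, each bucket name is two hex digits, so a trash directory holding millions of entries is split into 256 much smaller ones. The original hierarchy is not encoded here; it would still have to be kept somewhere else, for example in an xattr as in approach (b).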
Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files
On Mon, 2015-08-17 at 23:15 +0530, Soumya Koduri wrote: This approach sounds good. Few inputs/queries inline. On 08/17/2015 06:20 PM, Anoop C S wrote: Hi all, As we move forward, in order to fix the limitations with current trash translator we are planning to replace the existing criteria for trashed files inside trash directory with a general flat hierarchy as described in the following sections. Please have your thoughts on following design considerations. Current implementation == * Trash translator resides on glusterfs server stack just above posix. * Trash directory (.trashcan) is created during volume start and is visible under root of the volume. * Each trashed file is moved (renamed) to trash directory with an appended time stamp in the file name. * Exact directory hierarchy (w.r.t the root of volume) is maintained inside trash directory whenever a file is deleted/truncated from a directory Outstanding issues == * Since renaming occurs at the server side, client-side is unaware of trash doing rename or create operations. * As a result files/directories may not be visible from mount point. * Files/Directories created from from trash translator will not have gfid associated with it until lookup is performed. Proposed Flat hierarchy === * Instead of creating the whole directory under trash, we will rename the file and place it directly under trash directory (of course with appended time stamp). * Directory hierarchy can be stored via either of the following two approaches: (a) File name will contain the whole path with time stamp appended (b) Store whole hierarchy as an xattr IMO, (b) sounds better compared to (a) as storing entire hierarchical path as the file name may end up reaching file_name max length limit sooner. Also users may wish to look at the file names with the original names for easy identification in the .trash directory. Other enhancements == * Create the trash directory only when trash xlator is enabled. 
> Can the trash xlator be disabled once it's enabled? If yes, will the files still be visible from the mount point?

Trash translator can be disabled, and the trash directory will still be visible from the mount point with its contents.

>> * Operations such as unlink, rename etc will be prevented on trash directory only when trash xlator is enabled.
>> * A new trash helper translator on client side (loaded only when trash is enabled) to resolve split brain issues with truncation of files.

> Doesn't AFR/EC already take care of this? Could you please provide more details on this issue?

With trash translator enabled, truncate is performed in 2 steps:
(1) Read from the original file and create an exact copy under trash directory. This create call from trash translator will miss gfid for that file under trash directory.
(2) Truncate the original file.

> Thanks,
> Soumya

>> * Restore files from trash with the help of an explicit setfattr call.
>>
>> Thanks
>> Regards,
>> -Anoop C S
>> -Jiffin Tony Thottan
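The two-step truncate described above can be illustrated in user space as follows. This is a minimal sketch and not the trash translator's code: the real translator works on the brick in C, and the function name `truncate_with_trash` and the `<name>_<epoch>` naming are assumptions for illustration. The gfid problem does not show up in such a sketch, because a plain local copy has no gfid at all.

```python
import os
import shutil
import time


def truncate_with_trash(path: str, trash_dir: str, length: int = 0) -> str:
    """Copy `path` into `trash_dir`, then truncate the original.

    Returns the path of the trashed copy.
    """
    os.makedirs(trash_dir, exist_ok=True)
    stamp = int(time.time())
    trashed = os.path.join(trash_dir, f"{os.path.basename(path)}_{stamp}")

    # Step 1: create an exact copy of the original under the trash directory.
    shutil.copy2(path, trashed)

    # Step 2: truncate the original file.
    os.truncate(path, length)
    return trashed
```

Because step 1 is a create rather than a rename, the copy is a brand-new file, which is why the translator's copy ends up without a gfid until a lookup happens.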
Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files
- Original Message - From: Anoop C S anoo...@redhat.com To: gluster-devel@gluster.org Sent: Monday, August 17, 2015 6:20:50 PM Subject: [Gluster-devel] Implementing Flat Hierarchy for trashed files Hi all, As we move forward, in order to fix the limitations with current trash translator we are planning to replace the existing criteria for trashed files inside trash directory with a general flat hierarchy as described in the following sections. Please have your thoughts on following design considerations. Current implementation == * Trash translator resides on glusterfs server stack just above posix. * Trash directory (.trashcan) is created during volume start and is visible under root of the volume. * Each trashed file is moved (renamed) to trash directory with an appended time stamp in the file name. Do these files get moved during re-balance due to name change or do you choose file name according to the DHT regex magic to avoid that ? * Exact directory hierarchy (w.r.t the root of volume) is maintained inside trash directory whenever a file is deleted/truncated from a directory Outstanding issues == * Since renaming occurs at the server side, client-side is unaware of trash doing rename or create operations. * As a result files/directories may not be visible from mount point. * Files/Directories created from from trash translator will not have gfid associated with it until lookup is performed. Proposed Flat hierarchy === * Instead of creating the whole directory under trash, we will rename the file and place it directly under trash directory (of course with appended time stamp). The .trashcan directory might not scale with millions of such files placed under one directory. We had faced the same problem with gluster-swift project for object expiration feature and had decided to distribute our files across multiple directories in a deterministic way. And, personally, I'd prefer storing absolute timestamp, for example: as returned by `date +%s` command. 
* Directory hierarchy can be stored via either of the following two approaches: (a) File name will contain the whole path with time stamp appended If this approach is taken, you might have trouble with choosing a magic letter representing slashes. (b) Store whole hierarchy as an xattr Other enhancements == * Create the trash directory only when trash xlator is enabled. This is a needed enhancement. Upgrade to 3.7.* from older glusterfs versions caused undesired results in gluster-swift integration because .trashcan was visible by default on all glusterfs volumes. * Operations such as unlink, rename etc will be prevented on trash directory only when trash xlator is enabled. * A new trash helper translator on client side(loaded only when trash is enabled) to resolve split brain issues with truncation of files. * Restore files from trash with the help of an explicit setfattr call. You have to be very careful with races involved in re-creating the path when clients are accessing volume, also with over-writing if path exists. It's way easier (from implementer's perspective) if this is a manual process. Thanks Regards, -Anoop C S -Jiffin Tony Thottan ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
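The warning above about overwriting an existing path during restore can be made concrete with a small user-space sketch. It assumes a local POSIX filesystem with hard-link support and is not how the proposed setfattr-based restore would be implemented; the function name `restore_from_trash` is hypothetical. The point is only that the "does the destination exist?" check and the creation of the destination should be one atomic operation.

```python
import os


def restore_from_trash(trashed_path: str, original_path: str) -> bool:
    """Restore a trashed file without overwriting an existing destination.

    Returns True on success, False if `original_path` already exists.
    """
    parent = os.path.dirname(original_path) or "."
    # Re-create the parent directories; another client creating them at the
    # same time is tolerated by exist_ok=True.
    os.makedirs(parent, exist_ok=True)
    try:
        # link() fails if the destination exists, so a file created by
        # another client in the meantime is never clobbered.
        os.link(trashed_path, original_path)
    except FileExistsError:
        return False
    os.unlink(trashed_path)
    return True
```

A plain rename would silently replace whatever is at the destination, which is exactly the overwrite case the reply warns about.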
Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files
On Tue, 2015-08-18 at 02:29 -0400, Prashanth Pai wrote: - Original Message - From: Anoop C S anoo...@redhat.com To: gluster-devel@gluster.org Sent: Monday, August 17, 2015 6:20:50 PM Subject: [Gluster-devel] Implementing Flat Hierarchy for trashed files Hi all, As we move forward, in order to fix the limitations with current trash translator we are planning to replace the existing criteria for trashed files inside trash directory with a general flat hierarchy as described in the following sections. Please have your thoughts on following design considerations. Current implementation == * Trash translator resides on glusterfs server stack just above posix. * Trash directory (.trashcan) is created during volume start and is visible under root of the volume. * Each trashed file is moved (renamed) to trash directory with an appended time stamp in the file name. Do these files get moved during re-balance due to name change or do you choose file name according to the DHT regex magic to avoid that ? Actually we had put up http://review.gluster.org/#/c/9865/ for addressing this issue. With the above change we can have this xattr set on trashed files so as to mask those from rebalance process. * Exact directory hierarchy (w.r.t the root of volume) is maintained inside trash directory whenever a file is deleted/truncated from a directory Outstanding issues == * Since renaming occurs at the server side, client-side is unaware of trash doing rename or create operations. * As a result files/directories may not be visible from mount point. * Files/Directories created from from trash translator will not have gfid associated with it until lookup is performed. Proposed Flat hierarchy === * Instead of creating the whole directory under trash, we will rename the file and place it directly under trash directory (of course with appended time stamp). The .trashcan directory might not scale with millions of such files placed under one directory. 
We had faced the same problem with gluster-swift project for object expiration feature and had decided to distribute our files across multiple directories in a deterministic way. And, personally, I'd prefer storing absolute timestamp, for example: as returned by `date +%s` command. In glusterfs we use the strftime() library call for formatting date and time as strings. We can use the gf_timefmt_s format inside gluster, which is a wrapper for the %s format exposed by the strftime() lib call, to get the number of seconds since the Epoch. But the problem here is that it depends on TZ (timezone). For a more detailed explanation see the commit message from http://review.gluster.org/#/c/11930/. * Directory hierarchy can be stored via either of the following two approaches: (a) File name will contain the whole path with time stamp appended If this approach is taken, you might have trouble with choosing a magic letter representing slashes. (b) Store whole hierarchy as an xattr Other enhancements == * Create the trash directory only when trash xlator is enabled. This is a needed enhancement. Upgrade to 3.7.* from older glusterfs versions caused undesired results in gluster-swift integration because .trashcan was visible by default on all glusterfs volumes. * Operations such as unlink, rename etc will be prevented on trash directory only when trash xlator is enabled. * A new trash helper translator on client side (loaded only when trash is enabled) to resolve split brain issues with truncation of files. * Restore files from trash with the help of an explicit setfattr call. You have to be very careful with races involved in re-creating the path when clients are accessing volume, also with over-writing if path exists. It's way easier (from implementer's perspective) if this is a manual process.
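As a rough illustration of the absolute-timestamp idea discussed above, the suffix can be taken straight from the clock as seconds since the Epoch, so that no formatted or broken-down local time is involved. This is a Python sketch only; it says nothing about how gf_timefmt_s behaves inside gluster, and the helper names are hypothetical.

```python
import calendar
import time


def epoch_suffix(now=None) -> str:
    """Whole seconds since the Epoch, as `date +%s` would print them."""
    if now is None:
        now = time.time()
    return str(int(now))


def trashed_name(name: str, now=None) -> str:
    """Append the epoch suffix to a file name, e.g. foo -> foo_1439879400."""
    return f"{name}_{epoch_suffix(now)}"


def utc_struct_to_epoch(tm: time.struct_time) -> int:
    """Convert a UTC struct_time back to epoch seconds.

    calendar.timegm() treats its argument as UTC, so the result does not
    change with the TZ environment variable.
    """
    return calendar.timegm(tm)
```

If a broken-down time has to be converted back to seconds, doing it with an explicit UTC conversion, as in `utc_struct_to_epoch`, keeps the result the same on every server regardless of its timezone setting.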
Thanks Regards, -Anoop C S -Jiffin Tony Thottan
Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files
Comments inline. On 18/08/15 09:54, Niels de Vos wrote: On Mon, Aug 17, 2015 at 06:20:50PM +0530, Anoop C S wrote: Hi all, As we move forward, in order to fix the limitations with current trash translator we are planning to replace the existing criteria for trashed files inside trash directory with a general flat hierarchy as described in the following sections. Please have your thoughts on following design considerations. Current implementation == * Trash translator resides on glusterfs server stack just above posix. * Trash directory (.trashcan) is created during volume start and is visible under root of the volume. * Each trashed file is moved (renamed) to trash directory with an appended time stamp in the file name. * Exact directory hierarchy (w.r.t the root of volume) is maintained inside trash directory whenever a file is deleted/truncated from a directory Outstanding issues == * Since renaming occurs at the server side, client-side is unaware of trash doing rename or create operations. * As a result files/directories may not be visible from mount point. This might be something upcall could help with. If the trash xlator is placed above upcall, any clients interested in the .trashcan directory (or subdirs) could get an in/revalidation request. * Files/Directories created from trash translator will not have gfid associated with it until lookup is performed. When a client receives an invalidation of the parent directory (from upcall), a LOOKUP will follow on the next request. If I understand correctly, the solution becomes more complex if we integrate the trash translator and upcall together. 1.) An upcall notification can be sent to a client only if it has accessed .trashcan 2.) There should be a translator on the client side to initiate a lookup after receiving the upcall notification 3.) Performance hit. Say file `foo` is present in a/b/c/. We need to create path a/b/c/ inside trash directory.
So ideally the trash xlator will first create directory `a`, then send an upcall notification to all of the clients, and then the clients will initiate a lookup on `a` and perform gfid healing on that directory. After that it will create `b` and repeat the same procedure. Proposed Flat hierarchy === I'm missing a bit of info here, what limitations need to be addressed? All of the above-mentioned outstanding issues can be addressed by the flat hierarchy. * Instead of creating the whole directory under trash, we will rename the file and place it directly under trash directory (of course with appended time stamp). * Directory hierarchy can be stored via either of the following two approaches: (a) File name will contain the whole path with time stamp appended (b) Store whole hierarchy as an xattr If this is needed, definitely go with (b). Filenames have a limit, and the full path (directories + filename + timestamp) could surely hit that. Thanks for the suggestion. Other enhancements == Have these been filed as bugs/RFEs? If not, please do so and include a good description of the work that is needed. Maybe others in the Gluster community are interested in providing patches, and details on what to do is very helpful. Sure. We will file separate RFEs as soon as possible and send them in a separate mail. Thanks, Niels * Create the trash directory only when trash xlator is enabled. * Operations such as unlink, rename etc will be prevented on trash directory only when trash xlator is enabled. * A new trash helper translator on client side (loaded only when trash is enabled) to resolve split brain issues with truncation of files. * Restore files from trash with the help of an explicit setfattr call.
Thanks Regards, -Anoop C S -Jiffin Tony Thottan -- Jiffin
Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files
On Mon, Aug 17, 2015 at 06:20:50PM +0530, Anoop C S wrote: Hi all, As we move forward, in order to fix the limitations with current trash translator we are planning to replace the existing criteria for trashed files inside trash directory with a general flat hierarchy as described in the following sections. Please have your thoughts on following design considerations. Current implementation == * Trash translator resides on glusterfs server stack just above posix. * Trash directory (.trashcan) is created during volume start and is visible under root of the volume. * Each trashed file is moved (renamed) to trash directory with an appended time stamp in the file name. * Exact directory hierarchy (w.r.t the root of volume) is maintained inside trash directory whenever a file is deleted/truncated from a directory Outstanding issues == * Since renaming occurs at the server side, client-side is unaware of trash doing rename or create operations. * As a result files/directories may not be visible from mount point. This might be something upcall could help with. If the trash xlator is placed above upcall, any clients interested in the .trashcan directory (or subdirs) could get an in/revalidation request. * Files/Directories created from from trash translator will not have gfid associated with it until lookup is performed. When a client receives an invalidation of the parent directory (from upcall), a LOOKUP will follow on the next request. Proposed Flat hierarchy === I'm missing a bit of info here, what limitations need to be addressed? * Instead of creating the whole directory under trash, we will rename the file and place it directly under trash directory (of course with appended time stamp). * Directory hierarchy can be stored via either of the following two approaches: (a) File name will contain the whole path with time stamp appended (b) Store whole hierarchy as an xattr If this is needed, definitely go with (b). 
Filenames have a limit, and the full path (directories + filename + timestamp) could surely hit that. Other enhancements == Have these been filed as bugs/RFEs? If not, please do so and include a good description of the work that is needed. Maybe others in the Gluster community are interested in providing patches, and details on what to do is very helpful. Thanks, Niels * Create the trash directory only when trash xlator is enabled. * Operations such as unlink, rename etc will be prevented on trash directory only when trash xlator is enabled. * A new trash helper translator on client side(loaded only when trash is enabled) to resolve split brain issues with truncation of files. * Restore files from trash with the help of an explicit setfattr call. Thanks Regards, -Anoop C S -Jiffin Tony Thottan
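The file-name limit mentioned above for approach (a) is easy to hit, as the following sketch suggests. It assumes a 255-byte limit per name, which is typical for common Linux filesystems but should be checked for the actual brick filesystem. The `%` separator standing in for slashes and the helper names are hypothetical; the thread does not settle on a separator.

```python
NAME_MAX = 255  # assumption: typical per-name limit in bytes


def flattened_name(original_path: str, deleted_at: int, sep: str = "%") -> str:
    """Approach (a): encode the whole path plus timestamp in one file name."""
    flat = original_path.strip("/").replace("/", sep)
    return f"{flat}_{deleted_at}"


def fits_in_name(original_path: str, deleted_at: int) -> bool:
    """True if the flattened name stays within NAME_MAX bytes."""
    name = flattened_name(original_path, deleted_at)
    return len(name.encode("utf-8")) <= NAME_MAX
```

For a shallow path such as `/a/b/c/foo` the flattened name is short, but twenty nested directories with 30-character names already exceed the limit. The separator is also ambiguous when a real file name contains it, which is the "magic letter" concern raised elsewhere in the thread. Storing the path in an xattr, as in approach (b), avoids both problems.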
Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files
This approach sounds good. Few inputs/queries inline. On 08/17/2015 06:20 PM, Anoop C S wrote: Hi all, As we move forward, in order to fix the limitations with current trash translator we are planning to replace the existing criteria for trashed files inside trash directory with a general flat hierarchy as described in the following sections. Please have your thoughts on following design considerations. Current implementation == * Trash translator resides on glusterfs server stack just above posix. * Trash directory (.trashcan) is created during volume start and is visible under root of the volume. * Each trashed file is moved (renamed) to trash directory with an appended time stamp in the file name. * Exact directory hierarchy (w.r.t the root of volume) is maintained inside trash directory whenever a file is deleted/truncated from a directory Outstanding issues == * Since renaming occurs at the server side, client-side is unaware of trash doing rename or create operations. * As a result files/directories may not be visible from mount point. * Files/Directories created from from trash translator will not have gfid associated with it until lookup is performed. Proposed Flat hierarchy === * Instead of creating the whole directory under trash, we will rename the file and place it directly under trash directory (of course with appended time stamp). * Directory hierarchy can be stored via either of the following two approaches: (a) File name will contain the whole path with time stamp appended (b) Store whole hierarchy as an xattr IMO, (b) sounds better compared to (a) as storing entire hierarchical path as the file name may end up reaching file_name max length limit sooner. Also users may wish to look at the file names with the original names for easy identification in the .trash directory. Other enhancements == * Create the trash directory only when trash xlator is enabled. Can the trash xlator be disabled once its enabled? 
If yes, will the files be still visible from the mount point? * Operations such as unlink, rename etc will be prevented on trash directory only when trash xlator is enabled. * A new trash helper translator on client side(loaded only when trash is enabled) to resolve split brain issues with truncation of files. Doesn't AFR/EC already take care of this? Could you please provide more details on this issue. Thanks, Soumya * Restore files from trash with the help of an explicit setfattr call. Thanks Regards, -Anoop C S -Jiffin Tony Thottan