Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files

2015-08-18 Thread Rajesh Joseph


- Original Message -
 From: Prashanth Pai p...@redhat.com
 To: Anoop C S anoo...@redhat.com
 Cc: gluster-devel@gluster.org
 Sent: Tuesday, August 18, 2015 11:59:09 AM
 Subject: Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files
 
 
 - Original Message -
  From: Anoop C S anoo...@redhat.com
  To: gluster-devel@gluster.org
  Sent: Monday, August 17, 2015 6:20:50 PM
  Subject: [Gluster-devel] Implementing Flat Hierarchy for trashed files
  
  Hi all,
  
  As we move forward, in order to fix the limitations with current trash
  translator we are planning to replace the existing criteria for trashed
  files inside trash directory with a general flat hierarchy as described
  in the following sections. Please have your thoughts on following
  design considerations.
  
  Current implementation
  ==
  * Trash translator resides on glusterfs server stack just above posix.
  * Trash directory (.trashcan) is created during volume start and is
visible under root of the volume.
  * Each trashed file is moved (renamed) to trash directory with an
appended time stamp in the file name.
 
 Do these files get moved during re-balance due to name change or do you
 choose file name according to the DHT regex magic to avoid that ?
 
  * Exact directory hierarchy (w.r.t the root of volume) is maintained
inside trash directory whenever a file is deleted/truncated from a
directory
  
  Outstanding issues
  ==
  * Since renaming occurs at the server side, client-side is unaware of
trash doing rename or create operations.
  * As a result files/directories may not be visible from mount point.
  * Files/Directories created from trash translator will not have
gfid associated with it until lookup is performed.
  
  Proposed Flat hierarchy
  ===
  * Instead of creating the whole directory under trash, we will rename
the file and place it directly under trash directory (of course with
appended time stamp).
 
 The .trashcan directory might not scale with millions of such files placed
 under one directory. We had faced the same problem with gluster-swift
 project for object expiration feature and had decided to distribute our
 files across multiple directories in a deterministic way. And, personally,
 I'd prefer storing absolute timestamp, for example: as returned by `date
 +%s` command.
 
  * Directory hierarchy can be stored via either of the following two
approaches:
  (a) File name will contain the whole path with time stamp
  appended
 
 If this approach is taken, you might have trouble with choosing a magic
 letter representing slashes.
 
  (b) Store whole hierarchy as an xattr
  
  Other enhancements
  ==
  * Create the trash directory only when trash xlator is enabled.
 
 This is a needed enhancement. Upgrade to 3.7.* from older glusterfs versions
 caused undesired results in gluster-swift integration because .trashcan was
 visible by default on all glusterfs volumes.
 
  * Operations such as unlink, rename etc will be prevented on trash
    directory only when trash xlator is enabled.
  * A new trash helper translator on client side (loaded only when trash
    is enabled) to resolve split brain issues with truncation of files.
  * Restore files from trash with the help of an explicit setfattr call.
 
 You have to be very careful with races involved in re-creating the path when
 clients are accessing volume, also with over-writing if path exists.
 It's way easier (from implementer's perspective) if this is a manual process.
 
  


If the on-disk structure is changed, how will upgrades be handled?


  Thanks & Regards,
  -Anoop C S
  -Jiffin Tony Thottan
  ___
  Gluster-devel mailing list
  Gluster-devel@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-devel
  


Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files

2015-08-18 Thread Anoop C S
On Mon, 2015-08-17 at 23:15 +0530, Soumya Koduri wrote:
 This approach sounds good. Few inputs/queries inline.
 
 
 On 08/17/2015 06:20 PM, Anoop C S wrote:
  Hi all,
  
  As we move forward, in order to fix the limitations with current trash
  translator we are planning to replace the existing criteria for trashed
  files inside trash directory with a general flat hierarchy as described
  in the following sections. Please have your thoughts on following
  design considerations.

  Current implementation
  ==
  * Trash translator resides on glusterfs server stack just above posix.
  * Trash directory (.trashcan) is created during volume start and is
    visible under root of the volume.
  * Each trashed file is moved (renamed) to trash directory with an
    appended time stamp in the file name.
  * Exact directory hierarchy (w.r.t the root of volume) is maintained
    inside trash directory whenever a file is deleted/truncated from a
    directory

  Outstanding issues
  ==
  * Since renaming occurs at the server side, client-side is unaware of
    trash doing rename or create operations.
  * As a result files/directories may not be visible from mount point.
  * Files/Directories created from trash translator will not have
    gfid associated with it until lookup is performed.

  Proposed Flat hierarchy
  ===
  * Instead of creating the whole directory under trash, we will rename
    the file and place it directly under trash directory (of course with
    appended time stamp).
  * Directory hierarchy can be stored via either of the following two
    approaches:
    (a) File name will contain the whole path with time stamp appended
    (b) Store whole hierarchy as an xattr
  
 IMO, (b) sounds better compared to (a) as storing entire hierarchical
 path as the file name may end up reaching file_name max length limit
 sooner. Also users may wish to look at the file names with the original
 names for easy identification in the .trash directory.
 
  Other enhancements
  ==
  * Create the trash directory only when trash xlator is enabled.
 
 Can the trash xlator be disabled once it is enabled? If yes, will the
 files still be visible from the mount point?
 

Trash translator can be disabled and trash directory will be still
visible from the mount point with its contents. 

  * Operations such as unlink, rename etc will be prevented on trash
    directory only when trash xlator is enabled.
  * A new trash helper translator on client side (loaded only when trash
    is enabled) to resolve split brain issues with truncation of files.

 Doesn't AFR/EC already take care of this? Could you please provide more
 details on this issue.
 

With trash translator enabled, truncate is performed in 2 steps:
(1) Read from the original file and create an exact copy under the trash
directory. Because this create call comes from the trash translator, the
copy under the trash directory will not have a gfid.
(2) Truncate the original file.
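
The two steps above can be sketched roughly as follows. This is only a local-filesystem illustration in Python, not the trash translator's code: the function name `trash_then_truncate`, the trash directory argument and the timestamp format are assumptions made for the example.

```python
import os
import shutil
import time


def trash_then_truncate(path, trash_dir, length=0):
    """Keep a copy of `path` under `trash_dir`, then truncate the original.

    Hypothetical helper; it mirrors the two steps described in the mail.
    """
    os.makedirs(trash_dir, exist_ok=True)
    # Assumed naming: original base name plus a UTC time stamp.
    stamp = time.strftime("%Y-%m-%d_%H%M%S", time.gmtime())
    saved = os.path.join(trash_dir, f"{os.path.basename(path)}_{stamp}")
    # Step 1: read the original and create an exact copy under the trash directory.
    shutil.copy2(path, saved)
    # Step 2: truncate the original file.
    os.truncate(path, length)
    return saved
```

Note that the copy in step 1 is a brand-new file, which is why it starts out without the metadata (in gluster's case, the gfid) that the original carried.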

 
 Thanks,
 Soumya
 
  * Restore files from trash with the help of an explicit setfattr call.
  
  Thanks & Regards,
  -Anoop C S
  -Jiffin Tony Thottan


Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files

2015-08-18 Thread Prashanth Pai

- Original Message -
 From: Anoop C S anoo...@redhat.com
 To: gluster-devel@gluster.org
 Sent: Monday, August 17, 2015 6:20:50 PM
 Subject: [Gluster-devel] Implementing Flat Hierarchy for trashed files
 
 Hi all,
 
 As we move forward, in order to fix the limitations with current trash
 translator we are planning to replace the existing criteria for trashed
 files inside trash directory with a general flat hierarchy as described
 in the following sections. Please have your thoughts on following
 design considerations.
 
 Current implementation
 ==
 * Trash translator resides on glusterfs server stack just above posix.
 * Trash directory (.trashcan) is created during volume start and is
   visible under root of the volume.
 * Each trashed file is moved (renamed) to trash directory with an
   appended time stamp in the file name.

Do these files get moved during re-balance due to name change or do you choose 
file name according to the DHT regex magic to avoid that ?

 * Exact directory hierarchy (w.r.t the root of volume) is maintained
   inside trash directory whenever a file is deleted/truncated from a
   directory
 
 Outstanding issues
 ==
 * Since renaming occurs at the server side, client-side is unaware of
   trash doing rename or create operations.
 * As a result files/directories may not be visible from mount point.
 * Files/Directories created from trash translator will not have
   gfid associated with it until lookup is performed.
 
 Proposed Flat hierarchy
 ===
 * Instead of creating the whole directory under trash, we will rename
   the file and place it directly under trash directory (of course with
   appended time stamp).

The .trashcan directory might not scale with millions of such files placed 
under one directory. We had faced the same problem with gluster-swift project 
for object expiration feature and had decided to distribute our files across 
multiple directories in a deterministic way. And, personally, I'd prefer 
storing absolute timestamp, for example: as returned by `date +%s` command.
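
A minimal sketch of what "distribute across multiple directories in a deterministic way" combined with an absolute timestamp could look like. The bucket count, the use of SHA-1 and the name `trash_location` are assumptions for illustration only; this is not the scheme gluster-swift uses, which is not described in this thread.

```python
import hashlib
import time

BUCKETS = 256  # hypothetical number of sub-directories under .trashcan


def trash_location(original_path, now=None):
    """Return (bucket_dir, trashed_name) for a file being trashed.

    The bucket is derived from a hash of the original path, so the same
    path always maps to the same sub-directory.
    """
    seconds = int(time.time() if now is None else now)  # like `date +%s`
    digest = hashlib.sha1(original_path.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % BUCKETS
    base = original_path.rstrip("/").rsplit("/", 1)[-1]
    return f".trashcan/{bucket:02x}", f"{base}_{seconds}"
```

With 256 buckets, a million trashed files would average roughly 4000 entries per directory instead of a million in one.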

 * Directory hierarchy can be stored via either of the following two
   approaches:
   (a) File name will contain the whole path with time stamp
   appended

If this approach is taken, you might have trouble with choosing a magic 
letter representing slashes.
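
To illustrate the problem with a magic letter: any character that is also legal in file names makes the flattened name ambiguous, unless it is escaped. The sketch below uses percent-encoding from Python's standard library as one possible escaping; the helper names are made up and nothing here is taken from the trash translator.

```python
from urllib.parse import quote, unquote


def naive_flatten(path, magic="_"):
    """Replace every slash with a magic character (ambiguous)."""
    return path.strip("/").replace("/", magic)


def escaped_flatten(path):
    """Percent-encode the path (relative to the volume root).

    '/' becomes '%2F' and a literal '%' becomes '%25', so the mapping
    can be reversed without guessing.
    """
    return quote(path.strip("/"), safe="")


def escaped_restore(name):
    """Inverse of escaped_flatten()."""
    return unquote(name)
```

Here `naive_flatten("a/b_c")` and `naive_flatten("a_b/c")` both give `a_b_c`, while the escaped forms stay distinct. The cost is that every slash now takes three bytes, which makes the file name length limit bite sooner.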

   (b) Store whole hierarchy as an xattr
 
 Other enhancements
 ==
 * Create the trash directory only when trash xlator is enabled.

This is a needed enhancement. Upgrade to 3.7.* from older glusterfs versions 
caused undesired results in gluster-swift integration because .trashcan was 
visible by default on all glusterfs volumes.

 * Operations such as unlink, rename etc will be prevented on trash
   directory only when trash xlator is enabled.
 * A new trash helper translator on client side (loaded only when trash
   is enabled) to resolve split brain issues with truncation of files.
 * Restore files from trash with the help of an explicit setfattr call.

You have to be very careful with races involved in re-creating the path when 
clients are accessing volume, also with over-writing if path exists.
It's way easier (from implementer's perspective) if this is a manual process.
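
As an illustration of the over-writing concern on a plain POSIX file system: a restore done with rename() silently replaces an existing destination, whereas link() fails if the destination already exists. The helper below is a hypothetical sketch under that assumption (both paths on the same local file system); it says nothing about how a setfattr-triggered restore inside gluster would be implemented.

```python
import os


def restore_no_overwrite(trashed, destination):
    """Move `trashed` back to `destination`, failing if the destination exists."""
    # Re-create the parent path if it is gone; exist_ok tolerates another
    # client creating the same directories at the same time.
    os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
    # os.link() raises FileExistsError if `destination` already exists,
    # so a file created concurrently is never replaced.
    os.link(trashed, destination)
    # Only after the new name exists is the trash entry removed.
    os.unlink(trashed)
```

Between the link and the unlink both names exist for a moment, so a crash at that point leaves the file in the trash as well as restored, which is the safer of the two possible failure modes.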

 
 Thanks & Regards,
 -Anoop C S
 -Jiffin Tony Thottan


Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files

2015-08-18 Thread Anoop C S
On Tue, 2015-08-18 at 02:29 -0400, Prashanth Pai wrote:
 - Original Message -
  From: Anoop C S anoo...@redhat.com
  To: gluster-devel@gluster.org
  Sent: Monday, August 17, 2015 6:20:50 PM
  Subject: [Gluster-devel] Implementing Flat Hierarchy for trashed files
  
  Hi all,
  
  As we move forward, in order to fix the limitations with current trash
  translator we are planning to replace the existing criteria for trashed
  files inside trash directory with a general flat hierarchy as described
  in the following sections. Please have your thoughts on following
  design considerations.

  Current implementation
  ==
  * Trash translator resides on glusterfs server stack just above posix.
  * Trash directory (.trashcan) is created during volume start and is
    visible under root of the volume.
  * Each trashed file is moved (renamed) to trash directory with an
    appended time stamp in the file name.
 
 Do these files get moved during re-balance due to name change or do
 you choose file name according to the DHT regex magic to avoid that ?
 

Actually we had put up http://review.gluster.org/#/c/9865/ to address
this issue. With that change we can set an xattr on trashed files so as
to mask them from the rebalance process.

  * Exact directory hierarchy (w.r.t the root of volume) is maintained
    inside trash directory whenever a file is deleted/truncated from a
    directory

  Outstanding issues
  ==
  * Since renaming occurs at the server side, client-side is unaware of
    trash doing rename or create operations.
  * As a result files/directories may not be visible from mount point.
  * Files/Directories created from trash translator will not have
    gfid associated with it until lookup is performed.

  Proposed Flat hierarchy
  ===
  * Instead of creating the whole directory under trash, we will rename
    the file and place it directly under trash directory (of course with
    appended time stamp).
 
 The .trashcan directory might not scale with millions of such files
 placed under one directory. We had faced the same problem with
 gluster-swift project for object expiration feature and had decided
 to distribute our files across multiple directories in a
 deterministic way. And, personally, I'd prefer storing absolute
 timestamp, for example: as returned by `date +%s` command.
 

In glusterfs we use the strftime() library call to format date and time
strings. We can use the gf_timefmt_s format inside gluster, which is a
wrapper for the %s format exposed by strftime(), to get the number of
seconds since the Epoch. But the problem here is that it depends on
TZ (timezone). For a more detailed explanation see the commit message
from http://review.gluster.org/#/c/11930/.
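
A generic illustration of how a seconds-since-Epoch value can end up depending on TZ: when the value is computed from broken-down time fields, the result depends on whether those fields are interpreted as local time or as UTC. This is not gluster code and does not reproduce the linked commit message; the two helper names are made up for the example.

```python
import calendar
import time


def epoch_via_local(broken_down):
    """Interpret the fields as local time (TZ dependent), like mktime()."""
    return int(time.mktime(broken_down))


def epoch_via_utc(broken_down):
    """Interpret the fields as UTC (independent of TZ)."""
    return calendar.timegm(broken_down)
```

If the fields come from `time.gmtime(t)` and TZ is set to a zone 5:30 ahead of UTC, `epoch_via_utc` returns `t` while `epoch_via_local` returns a value 19800 seconds smaller. Taking the counter directly (for example `int(time.time())`) avoids the round trip through broken-down time altogether.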

  * Directory hierarchy can be stored via either of the following two
approaches:
  (a) File name will contain the whole path with time stamp
  appended
 
 If this approach is taken, you might have trouble with choosing a 
 magic letter representing slashes.
  (b) Store whole hierarchy as an xattr
  
  Other enhancements
  ==
  * Create the trash directory only when trash xlator is enabled.
 
 This is a needed enhancement. Upgrade to 3.7.* from older glusterfs
 versions caused undesired results in gluster-swift integration
 because .trashcan was visible by default on all glusterfs volumes.
 
  * Operations such as unlink, rename etc will be prevented on trash
    directory only when trash xlator is enabled.
  * A new trash helper translator on client side (loaded only when trash
    is enabled) to resolve split brain issues with truncation of files.
  * Restore files from trash with the help of an explicit setfattr call.
 
 You have to be very careful with races involved in re-creating the
 path when clients are accessing volume, also with over-writing if
 path exists.
 It's way easier (from implementer's perspective) if this is a manual
 process.
 
  
  Thanks & Regards,
  -Anoop C S
  -Jiffin Tony Thottan


Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files

2015-08-18 Thread Jiffin Tony Thottan

Comments inline.

On 18/08/15 09:54, Niels de Vos wrote:

On Mon, Aug 17, 2015 at 06:20:50PM +0530, Anoop C S wrote:

Hi all,

As we move forward, in order to fix the limitations with current trash
translator we are planning to replace the existing criteria for trashed
files inside trash directory with a general flat hierarchy as described
in the following sections. Please have your thoughts on following
design considerations.

Current implementation
==
* Trash translator resides on glusterfs server stack just above posix.
* Trash directory (.trashcan) is created during volume start and is
   visible under root of the volume.
* Each trashed file is moved (renamed) to trash directory with an
   appended time stamp in the file name.
* Exact directory hierarchy (w.r.t the root of volume) is maintained
   inside trash directory whenever a file is deleted/truncated from a
   directory

Outstanding issues
==
* Since renaming occurs at the server side, client-side is unaware of
   trash doing rename or create operations.
* As a result files/directories may not be visible from mount point.

This might be something upcall could help with. If the trash xlator is
placed above upcall, any clients interested in the .trashcan directory
(or subdirs) could get an in/revalidation request.


* Files/Directories created from trash translator will not have
   gfid associated with it until lookup is performed.

When a client receives an invalidation of the parent directory (from
upcall), a LOOKUP will follow on the next request.


If I understand it correctly, the solution becomes more complex if we
integrate the trash translator and upcall together:
1.) An upcall notification can be sent to a client only if it has
accessed .trashcan.
2.) There should be a translator on the client side to initiate a lookup
after receiving the upcall notification.
3.) Performance hit. Say file `foo` is present in a/b/c/. We need to
create the path a/b/c/ inside the trash directory. So ideally the trash
xlator will first create directory `a`, then send an upcall notification
to all of the clients, and the clients will then initiate a lookup on `a`
and perform gfid healing on that directory. After that it will create `b`
and repeat the same procedure.

Proposed Flat hierarchy
===

I'm missing a bit of info here, what limitations need to be addressed?


All of the outstanding issues mentioned above can be addressed by the
flat hierarchy.

* Instead of creating the whole directory under trash, we will rename
   the file and place it directly under trash directory (of course with
   appended time stamp).
* Directory hierarchy can be stored via either of the following two
   approaches:
(a) File name will contain the whole path with time stamp
appended
(b) Store whole hierarchy as an xattr

If this is needed, definitely go with (b). Filenames have a limit, and
the full path (directories + filename + timestamp) could surely hit
that.


Thanks for the suggestion.


Other enhancements
==

Have these been filed as bugs/RFEs? If not, please do so and include a
good description of the work that is needed. Maybe others in the Gluster
community are interested in providing patches, and details on what to do
are very helpful.


Sure. We will file separate RFEs as soon as possible and send them in a
separate mail.



Thanks,
Niels


* Create the trash directory only when trash xlator is enabled.
* Operations such as unlink, rename etc will be prevented on trash
   directory only when trash xlator is enabled.
* A new trash helper translator on client side (loaded only when trash
   is enabled) to resolve split brain issues with truncation of files.
* Restore files from trash with the help of an explicit setfattr call.

Thanks & Regards,
-Anoop C S
-Jiffin Tony Thottan



--
Jiffin





Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files

2015-08-17 Thread Niels de Vos
On Mon, Aug 17, 2015 at 06:20:50PM +0530, Anoop C S wrote:
 Hi all,
 
 As we move forward, in order to fix the limitations with current trash
 translator we are planning to replace the existing criteria for trashed
 files inside trash directory with a general flat hierarchy as described
 in the following sections. Please have your thoughts on following
 design considerations.
 
 Current implementation
 ==
 * Trash translator resides on glusterfs server stack just above posix.
 * Trash directory (.trashcan) is created during volume start and is
   visible under root of the volume.
 * Each trashed file is moved (renamed) to trash directory with an
   appended time stamp in the file name. 
 * Exact directory hierarchy (w.r.t the root of volume) is maintained
   inside trash directory whenever a file is deleted/truncated from a
   directory
 
 Outstanding issues
 ==
 * Since renaming occurs at the server side, client-side is unaware of
   trash doing rename or create operations.
 * As a result files/directories may not be visible from mount point.

This might be something upcall could help with. If the trash xlator is
placed above upcall, any clients interested in the .trashcan directory
(or subdirs) could get an in/revalidation request.

 * Files/Directories created from trash translator will not have
   gfid associated with it until lookup is performed.

When a client receives an invalidation of the parent directory (from
upcall), a LOOKUP will follow on the next request.

 Proposed Flat hierarchy
 ===

I'm missing a bit of info here, what limitations need to be addressed?

 * Instead of creating the whole directory under trash, we will rename
   the file and place it directly under trash directory (of course with
   appended time stamp).
 * Directory hierarchy can be stored via either of the following two
   approaches:
   (a) File name will contain the whole path with time stamp
   appended
   (b) Store whole hierarchy as an xattr

If this is needed, definitely go with (b). Filenames have a limit, and
the full path (directories + filename + timestamp) could surely hit
that.
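
A quick back-of-the-envelope check of the limit mentioned above. Most Linux file systems cap a single file name at 255 bytes (NAME_MAX), so a flattened name built from the whole path can overflow even when every individual component was fine. The flattening rule and helper names below are assumptions for the example, not what the trash translator does.

```python
NAME_MAX = 255  # typical per-file-name limit in bytes on Linux file systems


def flat_name(original_path, seconds):
    """Hypothetical option (a): whole path in the name, time stamp appended."""
    return original_path.strip("/").replace("/", "_") + f"_{seconds}"


def fits(name, limit=NAME_MAX):
    """True if `name` is a legal single file name length-wise."""
    return len(name.encode("utf-8")) <= limit
```

For example, a file 30 directories deep with 11-character directory names already needs well over 300 bytes under option (a), whereas the base name plus time stamp used with option (b) stays short regardless of depth.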

 Other enhancements
 ==

Have these been filed as bugs/RFEs? If not, please do so and include a
good description of the work that is needed. Maybe others in the Gluster
community are interested in providing patches, and details on what to do
are very helpful.

Thanks,
Niels

 * Create the trash directory only when trash xlator is enabled.
 * Operations such as unlink, rename etc will be prevented on trash
   directory only when trash xlator is enabled.
 * A new trash helper translator on client side (loaded only when trash
   is enabled) to resolve split brain issues with truncation of files.
 * Restore files from trash with the help of an explicit setfattr call.
 
 Thanks & Regards,
 -Anoop C S
 -Jiffin Tony Thottan


Re: [Gluster-devel] Implementing Flat Hierarchy for trashed files

2015-08-17 Thread Soumya Koduri

This approach sounds good. Few inputs/queries inline.


On 08/17/2015 06:20 PM, Anoop C S wrote:

Hi all,

As we move forward, in order to fix the limitations with current trash
translator we are planning to replace the existing criteria for trashed
files inside trash directory with a general flat hierarchy as described
in the following sections. Please have your thoughts on following
design considerations.

Current implementation
==
* Trash translator resides on glusterfs server stack just above posix.
* Trash directory (.trashcan) is created during volume start and is
   visible under root of the volume.
* Each trashed file is moved (renamed) to trash directory with an
   appended time stamp in the file name.
* Exact directory hierarchy (w.r.t the root of volume) is maintained
   inside trash directory whenever a file is deleted/truncated from a
   directory

Outstanding issues
==
* Since renaming occurs at the server side, client-side is unaware of
   trash doing rename or create operations.
* As a result files/directories may not be visible from mount point.
* Files/Directories created from trash translator will not have
   gfid associated with it until lookup is performed.

Proposed Flat hierarchy
===
* Instead of creating the whole directory under trash, we will rename
   the file and place it directly under trash directory (of course with
   appended time stamp).
* Directory hierarchy can be stored via either of the following two
   approaches:
(a) File name will contain the whole path with time stamp
appended
(b) Store whole hierarchy as an xattr

IMO, (b) sounds better compared to (a) as storing entire hierarchical 
path as the file name may end up reaching file_name max length limit 
sooner. Also users may wish to look at the file names with the original 
names for easy identification in the .trash directory.
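
Option (b) could look roughly like this on a local Linux file system: the trashed file keeps a recognisable name and the original location travels with it as an extended attribute. The attribute name `user.trash.original-path` and both helper names are hypothetical; `os.setxattr` and `os.getxattr` are Linux-only and need a file system with user xattr support.

```python
import os

# Hypothetical attribute name, chosen only for this example.
XATTR_KEY = "user.trash.original-path"


def remember_origin(trashed, original_path):
    """Record where the trashed file used to live."""
    os.setxattr(trashed, XATTR_KEY, original_path.encode("utf-8"))


def recall_origin(trashed):
    """Read back the original path stored by remember_origin()."""
    return os.getxattr(trashed, XATTR_KEY).decode("utf-8")
```

Because the path is stored as an attribute value rather than in the name, it is not subject to the file name length limit, and no magic character for slashes is needed.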



Other enhancements
==
* Create the trash directory only when trash xlator is enabled.


Can the trash xlator be disabled once it is enabled? If yes, will the
files still be visible from the mount point?



* Operations such as unlink, rename etc will be prevented on trash
   directory only when trash xlator is enabled.
* A new trash helper translator on client side (loaded only when trash
   is enabled) to resolve split brain issues with truncation of files.

Doesn't AFR/EC already take care of this? Could you please provide more
details on this issue.



Thanks,
Soumya


* Restore files from trash with the help of an explicit setfattr call.

Thanks & Regards,
-Anoop C S
-Jiffin Tony Thottan