Hello all -

Following this message is a series of 11 patches that rework the reiserfs
xattr code. The current implementation open codes a number of functions that
are well tested and stable elsewhere, but will slight modifications. It also
does a number of things suboptimally, such as operations that affect all of
the xattrs associated with an inode, as well as not handling journalling as
well as can be.

I've run them through a weekend of stress testing successfully, but I'd like
some additional testing before considering them safe.

Here's the run down:

* 01 - Make internal xattr operations use vfs ops
  This eliminates the open coding of the read/write loops in favor of using
  vfs_read and vfs_write. Performance-wise, it's very little difference
  from the open coding, and implementation-wise, it's more tested. Yes, this
  violates the no-file-io-in-the-kernel rules, but this rule has been violated
  since the day the original patches were accepted.

* 02 - Simplify xattr internal file lookups/opens
  This patch uses vfs-level operations where possible, using the i_mutex
  to protect directories rather than a special xattr lock.

* 03 - Eliminate per-super xattr lock
  Since we use i_mutex, we can eliminate the xattr lock, removing the locking
  constraints from namei.c

* 04 - Make per-inode xattr locking more fine grained
  The better locking allows us to tighten the critical sections that must
  be protected against concurrent access. Writers protect against concurrency
  using the owning inode's i_mutex, but readers still need protection against
  writers. This patch pushes the xattr_sem down to surround just the operation
  on the xattr itself.

* 05 - Make i_has_xattr_dir flag more sane
  With the simplification of open/lookups, there is now only one place for
  the flag to be set and one for it to be cleared.

* 06 - Use generic xattr handlers
  This patch moves the reiserfs xattr handlers to the generic VFS supplied
  struct xattr_handler. It uses the generic handlers where possible, but
  due to the generic handlers splitting the prefix from the name, that's not
  always possible since ReiserFS requires that the prefix be preserved.

* 07 - Use O_NOATIME for internal files
  This patch adds O_NOATIME for internal files, since it is never exported and
  it saves us adding another transaction for the dirty inode. It also means
  that on read operations, we don't start a transaction with an xattr sem
  held when the inode is dirtied.

* 08 - Add per-file data=ordered mode and use it for xattrs
  Despite being metadata, extended attributes are written with the same
  data logging policy as the rest of the file system. This patch adds a 
  per-file data=ordered mode and uses it when the file system is using
  data=writeback.

* 09 - Journaled xattrs
  This is the big patch. This eliminates the deadlocks that can still occur
  with xattrs on ReiserFS. The basic rule is that before doing a write
  operation on the xattr, a transaction must be started. In order to do this,
  reiserfs_xattr_set is split into two functions; One wraps the other in a
  transaction for the ->setxattr calls, and the other is used directly by
  internal functions like those which implement ACLs. This also alters the
  return value from reiserfs_cache_default_acl to contain the number of blocks
  required to handle the additional blocks required by inheriting the default
  ACL.
  
* 10 - Use generic readdir for operations across all xattrs
  One of the most wasteful operations with the existing code is operations
  that traverse all the extended attributes for a given inode. Since they are
  stored as regular files, it makes sense to use ->readdir to enumerate them.
  However, since the operations may modify the file system, the cached path
  must be dropped before the filldir routine is called. This means that
  the path must be re-read for each iteration. This patch leverages the
  generic readdir function to simply fill and return an array of dentries
  that can be operated on after the path has already been freed. ReiserFS's
  hashed directories really come in handy here, since the directory offset
  doesn't change when entries are deleted. Therefore, the old code which was
  quite involved and full of duplication is reduced to a single for_each_xattr
  routine that calls out to individual delete or chown functions as needed.

* 11 - Associate xattr's packing locality w/ owning inode's
  Since there isn't any linkage between an inode and its xattr directory, it's
  possible for the xattrs to be placed quite far away from the inode on disk.
  This patch leverages the dentry's fsdata pointer to allow reiserfs_new_inode
  to use the owning inode's dir id in determining where to place the new
  objects.

  -Jeff

Reply via email to