from:"Andreas Grünbacher"

Re: [PATCH 006/141] gfs2: Fix fall-through warnings for Clang

2021-04-20 Thread Andreas Grünbacher

Am Di., 20. Apr. 2021 um 22:29 Uhr schrieb Gustavo A. R. Silva
:
> Friendly ping: who can take this, please?

Oops, that got lost. I've added it to for-next now.

Thanks,
Andreas

Re: [PATCH -next] gfs2: use kmem_cache_free() instead of kfree()

2021-04-09 Thread Andreas Grünbacher

Hi,

Am Fr., 9. Apr. 2021 um 10:20 Uhr schrieb Wei Yongjun :
> memory allocated by kmem_cache_alloc() should be freed using
> kmem_cache_free(), not kfree().

thanks for the patch, that's true. This patch has turned out to have
other problems as well, so we've pulled it and Bob is currently
investigating.

Andreas

Re: [PATCH 1/3] posic_acl: Add a helper determine if SGID should be cleared

2021-03-19 Thread Andreas Grünbacher

Hi,

Am Fr., 19. März 2021 um 20:58 Uhr schrieb Vivek Goyal :
> posix_acl_update_mode() determines what's the equivalent mode and if SGID
> needs to be cleared or not. I need to make use of this code in fuse
> as well. Fuse will send this information to virtiofs file server and
> file server will take care of clearing SGID if it needs to be done.
>
> Hence move this code in a separate helper so that more than one place
> can call into it.
>
> Cc: Jan Kara 
> Cc: Andreas Gruenbacher 
> Cc: Alexander Viro 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/posix_acl.c|  3 +--
>  include/linux/posix_acl.h | 11 +++
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/fs/posix_acl.c b/fs/posix_acl.c
> index f3309a7edb49..2d62494c4a5b 100644
> --- a/fs/posix_acl.c
> +++ b/fs/posix_acl.c
> @@ -684,8 +684,7 @@ int posix_acl_update_mode(struct user_namespace 
> *mnt_userns,
> return error;
> if (error == 0)
> *acl = NULL;
> -   if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
> -   !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> +   if (posix_acl_mode_clear_sgid(mnt_userns, inode))
> mode &= ~S_ISGID;
> *mode_p = mode;
> return 0;
> diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
> index 307094ebb88c..073c5e546de3 100644
> --- a/include/linux/posix_acl.h
> +++ b/include/linux/posix_acl.h
> @@ -59,6 +59,17 @@ posix_acl_release(struct posix_acl *acl)
>  }
>
>
> +static inline bool
> +posix_acl_mode_clear_sgid(struct user_namespace *mnt_userns,
> + struct inode *inode)
> +{
> +   if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
> +   !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
> +   return true;
> +
> +   return false;

That's just

return !in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
!capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID);

The same pattern we have in posix_acl_update_mode also exists in
setattr_copy and inode_init_owner, and almost the same pattern exists
in setattr_prepare, so can this be cleaned up as well? The function
also isn't POSIX ACL specific, so the function name is misleading.

> +}
> +
>  /* posix_acl.c */
>
>  extern void posix_acl_init(struct posix_acl *, int);
> --
> 2.25.4

Thanks,
Andreas

Re: [PATCH v2] fs: amend SLAB_RECLAIM_ACCOUNT on gfs2 related slab cache

2021-01-05 Thread Andreas Grünbacher

Bob,

Am Di., 5. Jan. 2021 um 14:28 Uhr schrieb Bob Peterson :
> - Original Message -
> > From: Zhaoyang Huang 
> >
> > As gfs2_quotad_cachep and gfs2_glock_cachep have registered
> > the shrinker, amending SLAB_RECLAIM_ACCOUNT when creating
> > them, which make the slab acount to be presiced.
> >
> > Signed-off-by: Zhaoyang Huang 
> > ---
> > v2: add gfs2_glock_cachep for same operation
> > ---
> Hi,
>
> Thanks. Your patch is now pushed to the linux-gfs2/for-next branch.
> I cleaned up the description a bit. For example, I changed "fs:" to
> "gfs2:" to conform to other gfs2 patches.

so the patch description now reads:

"As gfs2_quotad_cachep and gfs2_glock_cachep have registered
shrinkers, amending SLAB_RECLAIM_ACCOUNT when creating them,
which improves slab accounting."

I'm not sure what that description is based on; the definition of
SLAB_RECLAIM_ACCOUNT in include/linux/slab.h looks as follows, which
indicates that the purpose isn't really accounting but object
grouping:

/* The following flags affect the page allocator grouping pages by mobility */
/* Objects are reclaimable */
#define SLAB_RECLAIM_ACCOUNT((slab_flags_t __force)0x0002U)
#define SLAB_TEMPORARY  SLAB_RECLAIM_ACCOUNT/* Objects are
short-lived */

Furthermore, would it make sense to use SLAB_RECLAIM_ACCOUNT or
SLAB_TEMPORARY for gfs2_bufdata and gfs2_trans as well?

Thanks,
Andreas

Re: [PATCH 5.8 214/464] iomap: Make sure iomap_end is called after iomap_begin

2020-08-17 Thread Andreas Grünbacher

Gerg,

Am Mo., 17. Aug. 2020 um 21:39 Uhr schrieb Greg Kroah-Hartman
:
> From: Andreas Gruenbacher 
>
> [ Upstream commit 856473cd5d17dbbf3055710857c67a4af6d9fcc0 ]
>
> Make sure iomap_end is always called when iomap_begin succeeds.
>
> Without this fix, iomap_end won't be called when a filesystem's
> iomap_begin operation returns an invalid mapping, bypassing any
> unlocking done in iomap_end.  With this fix, the unlocking will still
> happen.
>
> This bug was found by Bob Peterson during code review.  It's unlikely
> that such iomap_begin bugs will survive to affect users, so backporting
> this fix seems unnecessary.

this doesn't need to be backported.

Thanks,
Andreas


> Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure")
> Signed-off-by: Andreas Gruenbacher 
> Reviewed-by: Christoph Hellwig 
> Reviewed-by: Darrick J. Wong 
> Signed-off-by: Darrick J. Wong 
> Signed-off-by: Sasha Levin 
> ---
>  fs/iomap/apply.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 76925b40b5fd2..26ab6563181fc 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -46,10 +46,14 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t 
> length, unsigned flags,
> ret = ops->iomap_begin(inode, pos, length, flags, , );
> if (ret)
> return ret;
> -   if (WARN_ON(iomap.offset > pos))
> -   return -EIO;
> -   if (WARN_ON(iomap.length == 0))
> -   return -EIO;
> +   if (WARN_ON(iomap.offset > pos)) {
> +   written = -EIO;
> +   goto out;
> +   }
> +   if (WARN_ON(iomap.length == 0)) {
> +   written = -EIO;
> +   goto out;
> +   }
>
> trace_iomap_apply_dstmap(inode, );
> if (srcmap.type != IOMAP_HOLE)
> @@ -80,6 +84,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, 
> unsigned flags,
> written = actor(inode, pos, length, data, ,
> srcmap.type != IOMAP_HOLE ?  : );
>
> +out:
> /*
>  * Now the data has been copied, commit the range we've copied.  This
>  * should not fail unless the filesystem has had a fatal error.
> --
> 2.25.1
>
>
>

Re: [PATCH 5.7 188/393] iomap: Make sure iomap_end is called after iomap_begin

2020-08-17 Thread Andreas Grünbacher

Greg,

Am Mo., 17. Aug. 2020 um 21:01 Uhr schrieb Greg Kroah-Hartman
:
> From: Andreas Gruenbacher 
>
> [ Upstream commit 856473cd5d17dbbf3055710857c67a4af6d9fcc0 ]
>
> Make sure iomap_end is always called when iomap_begin succeeds.
>
> Without this fix, iomap_end won't be called when a filesystem's
> iomap_begin operation returns an invalid mapping, bypassing any
> unlocking done in iomap_end.  With this fix, the unlocking will still
> happen.
>
> This bug was found by Bob Peterson during code review.  It's unlikely
> that such iomap_begin bugs will survive to affect users, so backporting
> this fix seems unnecessary.

this doesn't need to be backported.

Thanks,
Andreas


> Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure")
> Signed-off-by: Andreas Gruenbacher 
> Reviewed-by: Christoph Hellwig 
> Reviewed-by: Darrick J. Wong 
> Signed-off-by: Darrick J. Wong 
> Signed-off-by: Sasha Levin 
> ---
>  fs/iomap/apply.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 76925b40b5fd2..26ab6563181fc 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -46,10 +46,14 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t 
> length, unsigned flags,
> ret = ops->iomap_begin(inode, pos, length, flags, , );
> if (ret)
> return ret;
> -   if (WARN_ON(iomap.offset > pos))
> -   return -EIO;
> -   if (WARN_ON(iomap.length == 0))
> -   return -EIO;
> +   if (WARN_ON(iomap.offset > pos)) {
> +   written = -EIO;
> +   goto out;
> +   }
> +   if (WARN_ON(iomap.length == 0)) {
> +   written = -EIO;
> +   goto out;
> +   }
>
> trace_iomap_apply_dstmap(inode, );
> if (srcmap.type != IOMAP_HOLE)
> @@ -80,6 +84,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, 
> unsigned flags,
> written = actor(inode, pos, length, data, ,
> srcmap.type != IOMAP_HOLE ?  : );
>
> +out:
> /*
>  * Now the data has been copied, commit the range we've copied.  This
>  * should not fail unless the filesystem has had a fatal error.
> --
> 2.25.1
>
>
>

Re: [RFC v3 1/2] fs: Add IOCB_NOIO flag for generic_file_read_iter

2020-07-07 Thread Andreas Grünbacher

Am Di., 7. Juli 2020 um 20:40 Uhr schrieb Jens Axboe :
> On 7/7/20 8:44 AM, Andreas Gruenbacher wrote:
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 3f881a892ea7..1ab2ea19e883 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -315,6 +315,7 @@ enum rw_hint {
> >  #define IOCB_SYNC(1 << 5)
> >  #define IOCB_WRITE   (1 << 6)
> >  #define IOCB_NOWAIT  (1 << 7)
> > +#define IOCB_NOIO(1 << 8)
>
> Just to make this even more trivial in terms of merge conflicts, could
> you do 1 << 9 instead?

Sure.

Andreas

Re: [PATCH 05/15] mm: allow read-ahead with IOCB_NOWAIT set

2020-07-07 Thread Andreas Grünbacher

Am Mi., 24. Juni 2020 um 18:48 Uhr schrieb Jens Axboe :
>
> On 6/24/20 10:41 AM, Matthew Wilcox wrote:
> > On Wed, Jun 24, 2020 at 09:35:19AM -0600, Jens Axboe wrote:
> >> On 6/24/20 9:00 AM, Jens Axboe wrote:
> >>> On 6/23/20 7:46 PM, Matthew Wilcox wrote:
>  I'd be quite happy to add a gfp_t to struct readahead_control.
>  The other thing I've been looking into for other reasons is adding
>  a memalloc_nowait_{save,restore}, which would avoid passing down
>  the gfp_t.
> >>>
> >>> That was my first thought, having the memalloc_foo_save/restore for
> >>> this. I don't think adding a gfp_t to readahead_control is going
> >>> to be super useful, seems like the kind of thing that should be
> >>> non-blocking by default.
> >>
> >> We're already doing memalloc_nofs_save/restore in
> >> page_cache_readahead_unbounded(), so I think all we need is to just do a
> >> noio dance in generic_file_buffered_read() and that should be enough.
> >
> > I think we can still sleep though, right?  I was thinking more
> > like this:
> >
> > http://git.infradead.org/users/willy/linux.git/shortlog/refs/heads/memalloc
>
> Yeah, that's probably better. How do we want to handle this? I've already
> got the other bits queued up. I can either add them to the series, or
> pull a branch that'll go into Linus as well.

Also note my conflicting patch that introduces a IOCB_NOIO flag for
fixing a gfs2 regression:

https://lore.kernel.org/linux-fsdevel/20200703095325.1491832-2-agrue...@redhat.com/

Thanks,
Andreas

Re: [PATCH 07/15] mm: add support for async page locking

2020-07-07 Thread Andreas Grünbacher

Jens,

Am Do., 18. Juni 2020 um 16:47 Uhr schrieb Jens Axboe :
> Normally waiting for a page to become unlocked, or locking the page,
> requires waiting for IO to complete. Add support for lock_page_async()
> and wait_on_page_locked_async(), which are callback based instead. This
> allows a caller to get notified when a page becomes unlocked, rather
> than wait for it.
>
> We add a new iocb field, ki_waitq, to pass in the necessary data for this
> to happen. We can unionize this with ki_cookie, since that is only used
> for polled IO. Polled IO can never co-exist with async callbacks, as it is
> (by definition) polled completions. struct wait_page_key is made public,
> and we define struct wait_page_async as the interface between the caller
> and the core.
>
> Acked-by: Johannes Weiner 
> Signed-off-by: Jens Axboe 
> ---
>  include/linux/fs.h  |  7 ++-
>  include/linux/pagemap.h | 17 
>  mm/filemap.c| 45 -
>  3 files changed, 67 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 6c4ab4dc1cd7..6ac919b40596 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -315,6 +315,8 @@ enum rw_hint {
>  #define IOCB_SYNC  (1 << 5)
>  #define IOCB_WRITE (1 << 6)
>  #define IOCB_NOWAIT(1 << 7)
> +/* iocb->ki_waitq is valid */
> +#define IOCB_WAITQ (1 << 8)
>
>  struct kiocb {
> struct file *ki_filp;
> @@ -328,7 +330,10 @@ struct kiocb {
> int ki_flags;
> u16 ki_hint;
> u16 ki_ioprio; /* See linux/ioprio.h */
> -   unsigned intki_cookie; /* for ->iopoll */
> +   union {
> +   unsigned intki_cookie; /* for ->iopoll */
> +   struct wait_page_queue  *ki_waitq; /* for async buffered IO */
> +   };
>
> randomized_struct_fields_end
>  };
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 2f18221bb5c8..e053e1d9a4d7 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -535,6 +535,7 @@ static inline int wake_page_match(struct wait_page_queue 
> *wait_page,
>
>  extern void __lock_page(struct page *page);
>  extern int __lock_page_killable(struct page *page);
> +extern int __lock_page_async(struct page *page, struct wait_page_queue 
> *wait);
>  extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> unsigned int flags);
>  extern void unlock_page(struct page *page);
> @@ -571,6 +572,22 @@ static inline int lock_page_killable(struct page *page)
> return 0;
>  }
>
> +/*
> + * lock_page_async - Lock the page, unless this would block. If the page
> + * is already locked, then queue a callback when the page becomes unlocked.
> + * This callback can then retry the operation.
> + *
> + * Returns 0 if the page is locked successfully, or -EIOCBQUEUED if the page
> + * was already locked and the callback defined in 'wait' was queued.
> + */
> +static inline int lock_page_async(struct page *page,
> + struct wait_page_queue *wait)
> +{
> +   if (!trylock_page(page))
> +   return __lock_page_async(page, wait);
> +   return 0;
> +}
> +
>  /*
>   * lock_page_or_retry - Lock the page, unless this would block and the
>   * caller indicated that it can handle a retry.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c3175dbd8fba..e8aaf43bee9f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1180,6 +1180,36 @@ int wait_on_page_bit_killable(struct page *page, int 
> bit_nr)
>  }
>  EXPORT_SYMBOL(wait_on_page_bit_killable);
>
> +static int __wait_on_page_locked_async(struct page *page,
> +  struct wait_page_queue *wait, bool set)
> +{
> +   struct wait_queue_head *q = page_waitqueue(page);
> +   int ret = 0;
> +
> +   wait->page = page;
> +   wait->bit_nr = PG_locked;
> +
> +   spin_lock_irq(>lock);
> +   __add_wait_queue_entry_tail(q, >wait);
> +   SetPageWaiters(page);
> +   if (set)
> +   ret = !trylock_page(page);
> +   else
> +   ret = PageLocked(page);
> +   /*
> +* If we were succesful now, we know we're still on the
> +* waitqueue as we're still under the lock. This means it's
> +* safe to remove and return success, we know the callback
> +* isn't going to trigger.
> +*/
> +   if (!ret)
> +   __remove_wait_queue(q, >wait);
> +   else
> +   ret = -EIOCBQUEUED;
> +   spin_unlock_irq(>lock);
> +   return ret;
> +}
> +
>  /**
>   * put_and_wait_on_page_locked - Drop a reference and wait for it to be 
> unlocked
>   * @page: The page to wait for.
> @@ -1342,6 +1372,11 @@ int __lock_page_killable(struct page *__page)
>  }
>

Re: [Cluster-devel] [PATCH v11 16/25] fs: Convert mpage_readpages to mpage_readahead

2020-06-16 Thread Andreas Grünbacher

Am Mi., 17. Juni 2020 um 02:33 Uhr schrieb Matthew Wilcox :
>
> On Wed, Jun 17, 2020 at 12:36:13AM +0200, Andreas Gruenbacher wrote:
> > Am Mi., 15. Apr. 2020 um 23:39 Uhr schrieb Matthew Wilcox 
> > :
> > > From: "Matthew Wilcox (Oracle)" 
> > >
> > > Implement the new readahead aop and convert all callers (block_dev,
> > > exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
> > > reiserfs & udf).  The callers are all trivial except for GFS2 & OCFS2.
> >
> > This patch leads to an ABBA deadlock in xfstest generic/095 on gfs2.
> >
> > Our lock hierarchy is such that the inode cluster lock ("inode glock")
> > for an inode needs to be taken before any page locks in that inode's
> > address space.
>
> How does that work for ...
>
> writepage:  yes, unlocks (see below)
> readpage:   yes, unlocks
> invalidatepage: yes
> releasepage:yes
> freepage:   yes
> isolate_page:   yes
> migratepage:yes (both)
> putback_page:   yes
> launder_page:   yes
> is_partially_uptodate:  yes
> error_remove_page:  yes
>
> Is there a reason that you don't take the glock in the higher level
> ops which are called before readhead gets called?  I'm looking at XFS,
> and it takes the xfs_ilock SHARED in xfs_file_buffered_aio_read()
> (called from xfs_file_read_iter).

Right, the approach from the following thread might fix this:

https://lore.kernel.org/linux-fsdevel/20191122235324.17245-1-agrue...@redhat.com/T/#t

Andreas

Re: [RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private

2020-04-30 Thread Andreas Grünbacher

Hi,

Am Do., 30. Apr. 2020 um 23:56 Uhr schrieb Guoqing Jiang
:
> The logic in attach_page_buffers and  __clear_page_buffers are quite
> paired, but
>
> 1. they are located in different files.
>
> 2. attach_page_buffers is implemented in buffer_head.h, so it could be
>used by other files. But __clear_page_buffers is static function in
>buffer.c and other potential users can't call the function, md-bitmap
>even copied the function.
>
> So, introduce the new attach/clear_page_private to replace them. With
> the new pair of function, we will remove the usage of attach_page_buffers
> and  __clear_page_buffers in next patches. Thanks for the new names from
> Christoph Hellwig.
>
> Suggested-by: Matthew Wilcox 
> Cc: Andrew Morton 
> Cc: "Darrick J. Wong" 
> Cc: William Kucharski 
> Cc: "Kirill A. Shutemov" 
> Cc: Andreas Gruenbacher 
> Cc: Yang Shi 
> Cc: Yafang Shao 
> Cc: Song Liu 
> Cc: linux-r...@vger.kernel.org
> Cc: Chris Mason 
> Cc: Josef Bacik 
> Cc: David Sterba 
> Cc: linux-bt...@vger.kernel.org
> Cc: Alexander Viro 
> Cc: Jaegeuk Kim 
> Cc: Chao Yu 
> Cc: linux-f2fs-de...@lists.sourceforge.net
> Cc: Christoph Hellwig 
> Cc: linux-...@vger.kernel.org
> Cc: Anton Altaparmakov 
> Cc: linux-ntfs-...@lists.sourceforge.net
> Cc: Mike Marshall 
> Cc: Martin Brandenburg 
> Cc: de...@lists.orangefs.org
> Cc: Thomas Gleixner 
> Cc: Sebastian Andrzej Siewior 
> Cc: Roman Gushchin 
> Cc: Andreas Dilger 
> Signed-off-by: Guoqing Jiang 
> ---
> RFC -> RFC V2:  Address the comments from Christoph Hellwig
> 1. change function names to attach/clear_page_private and add comments.
> 2. change the return type of attach_page_private.
>
>  include/linux/pagemap.h | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index a8f7bd8ea1c6..2e515f210b18 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -205,6 +205,41 @@ static inline int page_cache_add_speculative(struct page 
> *page, int count)
> return __page_cache_add_speculative(page, count);
>  }
>
> +/**
> + * attach_page_private - attach data to page's private field and set 
> PG_private.
> + * @page: page to be attached and set flag.
> + * @data: data to attach to page's private field.
> + *
> + * Need to take reference as mm.h said "Setting PG_private should also 
> increment
> + * the refcount".
> + */
> +static inline void attach_page_private(struct page *page, void *data)
> +{
> +   get_page(page);
> +   set_page_private(page, (unsigned long)data);
> +   SetPagePrivate(page);
> +}
> +
> +/**
> + * clear_page_private - clear page's private field and PG_private.
> + * @page: page to be cleared.
> + *
> + * The counterpart function of attach_page_private.
> + * Return: private data of page or NULL if page doesn't have private data.
> + */
> +static inline void *clear_page_private(struct page *page)
> +{
> +   void *data = (void *)page_private(page);
> +
> +   if (!PagePrivate(page))
> +   return NULL;
> +   ClearPagePrivate(page);
> +   set_page_private(page, 0);
> +   put_page(page);
> +
> +   return data;
> +}
> +

I like this in general, but the name clear_page_private suggests that
this might be the inverse operation of set_page_private, which it is
not. So maybe this can be renamed to detach_page_private to more
clearly indicate that it pairs with attach_page_private?

>  #ifdef CONFIG_NUMA
>  extern struct page *__page_cache_alloc(gfp_t gfp);
>  #else
> --
> 2.17.1
>

Thanks,
Andreas

GFS2: Pull Request (2nd attempt)

2019-05-08 Thread Andreas Grünbacher

Hi Linus,

please consider pulling the following changes for the GFS2 file
system, now without resolving the merge conflict Stephen Rothwell has
pointed out between commit

  2b070cfe582b ("block: remove the i argument to bio_for_each_segment_all")

which is already merged, interacting with commit

  e21e191994af ("gfs2: read journal in large chunks")

which is part of this pull request. Stephen's suggested resolution of
the conflict can be found at
https://lore.kernel.org/lkml/20190506150707.618f0...@canb.auug.org.au/.

Thanks,
Andreas

The following changes since commit b4b52b881cf08e13d110eac811d4becc0775abbf:

  Merge tag 'Wimplicit-fallthrough-5.2-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
(2019-05-07 12:48:10 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
tags/gfs2-for-5.2

for you to fetch changes up to f4686c26ecc34e8e458b8235f0af5198c9b13bfd:

  gfs2: read journal in large chunks (2019-05-07 23:39:15 +0200)


We've got the following patches ready for this merge window:

"gfs2: Fix loop in gfs2_rbm_find (v2)"

  A rework of a fix we ended up reverting in 5.0 because of an iozone
  performance regression.

"gfs2: read journal in large chunks" and
"gfs2: fix race between gfs2_freeze_func and unmount"

  An improved version of a commit we also ended up reverting in 5.0
  because of a regression in xfstest generic/311.  It turns out that the
  journal changes were mostly innocent and that unfreeze didn't wait for
  the freeze to complete, which caused the filesystem to be unmounted
  before it was actually idle.

"gfs2: Fix occasional glock use-after-free"
"gfs2: Fix iomap write page reclaim deadlock"
"gfs2: Fix lru_count going negative"

  Fixes for various problems reported and partially fixed by Citrix
  engineers.  Thank you very much.

"gfs2: clean_journal improperly set sd_log_flush_head"

  Another fix from Bob.

A few other minor cleanups.


Abhi Das (2):
  gfs2: fix race between gfs2_freeze_func and unmount
  gfs2: read journal in large chunks

Andreas Gruenbacher (7):
  gfs2: Fix loop in gfs2_rbm_find (v2)
  gfs2: Fix occasional glock use-after-free
  gfs2: Remove misleading comments in gfs2_evict_inode
  gfs2: Remove unnecessary extern declarations
  gfs2: Rename sd_log_le_{revoke,ordered}
  gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}
  gfs2: Fix iomap write page reclaim deadlock

Bob Peterson (2):
  gfs2: clean_journal improperly set sd_log_flush_head
  gfs2: Replace gl_revokes with a GLF flag

Ross Lagerwall (1):
  gfs2: Fix lru_count going negative

 fs/gfs2/aops.c   |  14 ++-
 fs/gfs2/bmap.c   | 118 ++-
 fs/gfs2/bmap.h   |   1 +
 fs/gfs2/dir.c|   2 +-
 fs/gfs2/glock.c  |  25 +++--
 fs/gfs2/glops.c  |   3 +-
 fs/gfs2/incore.h |   9 +-
 fs/gfs2/log.c|  47 ++
 fs/gfs2/log.h|   5 +-
 fs/gfs2/lops.c   | 261 ++-
 fs/gfs2/lops.h   |  11 +--
 fs/gfs2/main.c   |   1 -
 fs/gfs2/ops_fstype.c |   7 +-
 fs/gfs2/recovery.c   | 135 ++
 fs/gfs2/recovery.h   |   4 +-
 fs/gfs2/rgrp.c   |  56 +--
 fs/gfs2/super.c  |  20 ++--
 fs/gfs2/trans.c  |   4 +-
 fs/gfs2/trans.h  |   2 +-
 fs/gfs2/xattr.c  |   6 +-
 20 files changed, 438 insertions(+), 293 deletions(-)

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2019-05-03 Thread Andreas Grünbacher

Am Fr., 3. Mai 2019 um 01:24 Uhr schrieb NeilBrown :
> On Thu, May 02 2019, Andreas Gruenbacher wrote:
> > On Thu, 2 May 2019 at 05:57, NeilBrown  wrote:
> >> On Wed, May 01 2019, Amir Goldstein wrote:
> >> > On Wed, May 1, 2019 at 10:03 PM NeilBrown  wrote:
> >> >> On Tue, Dec 06 2016, J. Bruce Fields wrote:
> >> >> > On Tue, Dec 06, 2016 at 02:18:31PM +0100, Andreas Gruenbacher wrote:
> >> >> >> On Tue, Dec 6, 2016 at 11:08 AM, Miklos Szeredi  
> >> >> >> wrote:
> >> >> >> > On Tue, Dec 6, 2016 at 12:24 AM, Andreas Grünbacher
> >> >> >> >  wrote:
> >> >> >> >> 2016-12-06 0:19 GMT+01:00 Andreas Grünbacher 
> >> >> >> >> :
> >> >> >> >
> >> >> >> >>> It's not hard to come up with a heuristic that determines if a
> >> >> >> >>> system.nfs4_acl value is equivalent to a file mode, and to 
> >> >> >> >>> ignore the
> >> >> >> >>> attribute in that case. (The file mode is transmitted in its own
> >> >> >> >>> attribute already, so actually converting .) That way, overlayfs 
> >> >> >> >>> could
> >> >> >> >>> still fail copying up files that have an actual ACL. It's still 
> >> >> >> >>> an
> >> >> >> >>> ugly hack ...
> >> >> >> >>
> >> >> >> >> Actually, that kind of heuristic would make sense in the NFS 
> >> >> >> >> client
> >> >> >> >> which could then hide the "system.nfs4_acl" attribute.
> >
> > I still think the nfs client could make this problem mostly go away by
> > not exposing "system.nfs4_acl" xattrs when the acl is equivalent to
> > the file mode.
>
> Maybe ... but this feels a bit like "sweeping it under the carpet".
> What happens if some file on the lower layer does have a more complex
> ACL?
> Do we just fail any attempt to modify that object?  Doesn't that violate
> the law of least surprise?

It would at least expose that there is a problem only if there is an
actual problem.

> Maybe if the lower-layer has an i_op->permission method, then overlayfs
> should *always* call that for permission checking - unless a
> chmod/chown/etc has happened on the file.  That way, we wouldn't need to
> copy-up the ACL, but would still get correct ACL testing.

No, the permissions need to stick with the object. Otherwise, what
would you do on rename or when the lower layer changes?

Andreas

> Thanks,
> NeilBrown
>
>
> > The richacl patches contain a workable abgorithm for
> > that. The problem would remain for files that have an actual NFS4 ACL,
> > which just cannot be mapped to a file mode or to POSIX ACLs in the
> > general case, as well as for files that have a POSIX ACL. Mapping NFS4
> > ACL that used to be a POSIX ACL back to POSIX ACLs could be achieved
> > in many cases as well, but the code would be quite messy. A better way
> > seems to be to using a filesystem that doesn't support POSIX ACLs in
> > the first place. Unfortunately, xfs doesn't allow turning off POSIX
> > ACLs, for example.
> >
> > Andreas
> >
> >> >> >> > Even simpler would be if knfsd didn't send the attribute if not
> >> >> >> > necessary.  Looks like there's code actively creating the nfs4_acl 
> >> >> >> > on
> >> >> >> > the wire even if the filesystem had none:
> >> >> >> >
> >> >> >> > pacl = get_acl(inode, ACL_TYPE_ACCESS);
> >> >> >> > if (!pacl)
> >> >> >> > pacl = posix_acl_from_mode(inode->i_mode, GFP_KERNEL);
> >> >> >> >
> >> >> >> > What's the point?
> >> >> >>
> >> >> >> That's how the protocol is specified.
> >> >> >
> >> >> > Yep, even if we could make that change to nfsd it wouldn't help the
> >> >> > client with the large number of other servers that are out there
> >> >> > (including older knfsd's).
> >> >> >
> >> >> > --b.
> >> >> >
> >> >> >> (I'm not saying that that's very helpful.)
> >> >> >>
> >> >> >> Andreas
> >> >>
> >> >> Hi

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2019-05-02 Thread Andreas Grünbacher

Am Do., 2. Mai 2019 um 16:28 Uhr schrieb Miklos Szeredi :
> On Thu, May 2, 2019 at 10:05 AM Andreas Gruenbacher  
> wrote:
> > On Thu, 2 May 2019 at 05:57, NeilBrown  wrote:
> > > On Wed, May 01 2019, Amir Goldstein wrote:
> > > > On Wed, May 1, 2019 at 10:03 PM NeilBrown  wrote:
> > > >> On Tue, Dec 06 2016, J. Bruce Fields wrote:
> > > >> > On Tue, Dec 06, 2016 at 02:18:31PM +0100, Andreas Gruenbacher wrote:
> > > >> >> On Tue, Dec 6, 2016 at 11:08 AM, Miklos Szeredi  
> > > >> >> wrote:
> > > >> >> > On Tue, Dec 6, 2016 at 12:24 AM, Andreas Grünbacher
> > > >> >> >  wrote:
> > > >> >> >> 2016-12-06 0:19 GMT+01:00 Andreas Grünbacher 
> > > >> >> >> :
> > > >> >> >
> > > >> >> >>> It's not hard to come up with a heuristic that determines if a
> > > >> >> >>> system.nfs4_acl value is equivalent to a file mode, and to 
> > > >> >> >>> ignore the
> > > >> >> >>> attribute in that case. (The file mode is transmitted in its own
> > > >> >> >>> attribute already, so actually converting .) That way, 
> > > >> >> >>> overlayfs could
> > > >> >> >>> still fail copying up files that have an actual ACL. It's still 
> > > >> >> >>> an
> > > >> >> >>> ugly hack ...
> > > >> >> >>
> > > >> >> >> Actually, that kind of heuristic would make sense in the NFS 
> > > >> >> >> client
> > > >> >> >> which could then hide the "system.nfs4_acl" attribute.
> >
> > I still think the nfs client could make this problem mostly go away by
> > not exposing "system.nfs4_acl" xattrs when the acl is equivalent to
> > the file mode. The richacl patches contain a workable abgorithm for
> > that. The problem would remain for files that have an actual NFS4 ACL,
> > which just cannot be mapped to a file mode or to POSIX ACLs in the
> > general case, as well as for files that have a POSIX ACL. Mapping NFS4
> > ACL that used to be a POSIX ACL back to POSIX ACLs could be achieved
> > in many cases as well, but the code would be quite messy. A better way
> > seems to be to using a filesystem that doesn't support POSIX ACLs in
> > the first place. Unfortunately, xfs doesn't allow turning off POSIX
> > ACLs, for example.
>
> How about mounting NFSv4 with noacl?  That should fix this issue, right?

You'll still see permissions that differ from what the filesystem
enforces, and copy-up would change that behavior.

Andreas

> Thanks,
> Miklos
>
>
>
> >
> > Andreas
> >
> > > >> >> > Even simpler would be if knfsd didn't send the attribute if not
> > > >> >> > necessary.  Looks like there's code actively creating the 
> > > >> >> > nfs4_acl on
> > > >> >> > the wire even if the filesystem had none:
> > > >> >> >
> > > >> >> > pacl = get_acl(inode, ACL_TYPE_ACCESS);
> > > >> >> > if (!pacl)
> > > >> >> > pacl = posix_acl_from_mode(inode->i_mode, GFP_KERNEL);
> > > >> >> >
> > > >> >> > What's the point?
> > > >> >>
> > > >> >> That's how the protocol is specified.
> > > >> >
> > > >> > Yep, even if we could make that change to nfsd it wouldn't help the
> > > >> > client with the large number of other servers that are out there
> > > >> > (including older knfsd's).
> > > >> >
> > > >> > --b.
> > > >> >
> > > >> >> (I'm not saying that that's very helpful.)
> > > >> >>
> > > >> >> Andreas
> > > >>
> > > >> Hi everyone.
> > > >>  I have a customer facing this problem, and so stumbled onto the email
> > > >>  thread.
> > > >>  Unfortunately it didn't resolve anything.  Maybe I can help kick 
> > > >> things
> > > >>  along???
> > > >>
> > > >>  The core problem here is that NFSv4 and ext4 use different and largely
> > > >>  incompatible ACL implementations.  There is no way to accurately
> >

Re: linux-next: Signed-off-by missing for commit in the gfs2 tree

2018-12-11 Thread Andreas Grünbacher

Hi,

Am Di., 11. Dez. 2018 um 21:40 Uhr schrieb Stephen Rothwell
:
> Hi all,
>
> On Wed, 12 Dec 2018 07:24:02 +1100 Stephen Rothwell  
> wrote:
> >
> > Commit
> >
> >   ec702fa79702 ("gfs2: Dump nrpages for inodes and their glocks")
> >
> > is missing a Signed-off-by from its committer.
>
> Now, after the rebase:
>
> Commits
>
>   187e4dad3f55 ("gfs2: Remove vestigial bd_ops")
>
> is missing a Signed-off-by from its committer.

Yep, I've not asked for a way to get the server to reject or warn
about such pushes without a reason.

Thanks,
Andreas

Re: sysfs: Do not return POSIX ACL xattrs via listxattr()

2018-09-10 Thread Andreas Grünbacher

Marc,

Am Mo., 10. Sep. 2018 um 05:03 Uhr schrieb Marc Aurele La France
:
> Greetings.
>
> Commit 786534b92f3ce68f4afc8a761c80b76887797b0a "tmpfs: listxattr
> should include POSIX ACL xattrs", which first appeared in 4.5 kernels,
> introduced a regression whereby listxattr() syscalls on anything in
> sysfs, or its mountpoint, return the name of the two POSIX ACL xattrs,
> but attempts to retrieve these values are denied with EOPNOTSUP.  For
> example ...
>
> # getfattr -d --match=- /sys
> /sys: system.posix_acl_access: Operation not supported
> /sys: system.posix_acl_default: Operation not supported
> #

I can confirm this regression.

> This inconsistent behaviour confuses rsync(1) (among others) when it
> runs into a sysfs mountpoint, even when told to not descend into it.
> This issue occurs because simple_xattr_list() does not correctly deal
> with cached ACLs.
>
> The suggested fix below was developed with a 4.18.7 kernel, but should
> apply, modulo patch fuzz, to any kernel >= 4.7.  A fix for 4.5 <=
> kernels < 4.7 is trivially different, but I won't bother given such
> kernels are no longer maintained.
>
> Note that the only other simple_xattr_list() caller, shmem, avoids
> this glitch by previously calling cache_no_acl() on all inodes it
> creates, so perhaps sysfs/kernfs should do the same.
>
> Please Reply-To-All.
>
> Thanks and have a great day.
>
> Marc.
>
> Signed-off-by: Marc Aurèle La France 
>
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -949,13 +949,13 @@ ssize_t simple_xattr_list(struct inode *inode, struct 
> simple_xattrs *xattrs,
> int err = 0;
>
>  #ifdef CONFIG_FS_POSIX_ACL
> -   if (inode->i_acl) {
> +   if (inode->i_acl && !is_uncached_acl(inode->i_acl)) {
> err = xattr_list_one(, _size,
>  XATTR_NAME_POSIX_ACL_ACCESS);
> if (err)
> return err;
> }
> -   if (inode->i_default_acl) {
> +   if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl)) {
> err = xattr_list_one(, _size,
>  XATTR_NAME_POSIX_ACL_DEFAULT);
> if (err)

This seems to be a better fix, but I haven't fully verified it, yet:

--- a/fs/inode.c
+++ b/fs/inode.c
@@ -187,7 +187,8 @@ int inode_init_always(struct super_block *sb,
struct inode *inode)
 inode->i_mapping = mapping;
 INIT_HLIST_HEAD(>i_dentry);/* buggered by rcu freeing */
 #ifdef CONFIG_FS_POSIX_ACL
-inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
+inode->i_acl = inode->i_default_acl =
+   (sb->s_flags & SB_POSIXACL) ? ACL_NOT_CACHED : NULL;
 #endif

 #ifdef CONFIG_FSNOTIFY

Thanks,
Andreas

Re: sysfs: Do not return POSIX ACL xattrs via listxattr()

2018-09-10 Thread Andreas Grünbacher

Marc,

Am Mo., 10. Sep. 2018 um 05:03 Uhr schrieb Marc Aurele La France
:
> Greetings.
>
> Commit 786534b92f3ce68f4afc8a761c80b76887797b0a "tmpfs: listxattr
> should include POSIX ACL xattrs", which first appeared in 4.5 kernels,
> introduced a regression whereby listxattr() syscalls on anything in
> sysfs, or its mountpoint, return the name of the two POSIX ACL xattrs,
> but attempts to retrieve these values are denied with EOPNOTSUP.  For
> example ...
>
> # getfattr -d --match=- /sys
> /sys: system.posix_acl_access: Operation not supported
> /sys: system.posix_acl_default: Operation not supported
> #

I can confirm this regression.

> This inconsistent behaviour confuses rsync(1) (among others) when it
> runs into a sysfs mountpoint, even when told to not descend into it.
> This issue occurs because simple_xattr_list() does not correctly deal
> with cached ACLs.
>
> The suggested fix below was developed with a 4.18.7 kernel, but should
> apply, modulo patch fuzz, to any kernel >= 4.7.  A fix for 4.5 <=
> kernels < 4.7 is trivially different, but I won't bother given such
> kernels are no longer maintained.
>
> Note that the only other simple_xattr_list() caller, shmem, avoids
> this glitch by previously calling cache_no_acl() on all inodes it
> creates, so perhaps sysfs/kernfs should do the same.
>
> Please Reply-To-All.
>
> Thanks and have a great day.
>
> Marc.
>
> Signed-off-by: Marc Aurèle La France 
>
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -949,13 +949,13 @@ ssize_t simple_xattr_list(struct inode *inode, struct 
> simple_xattrs *xattrs,
> int err = 0;
>
>  #ifdef CONFIG_FS_POSIX_ACL
> -   if (inode->i_acl) {
> +   if (inode->i_acl && !is_uncached_acl(inode->i_acl)) {
> err = xattr_list_one(, _size,
>  XATTR_NAME_POSIX_ACL_ACCESS);
> if (err)
> return err;
> }
> -   if (inode->i_default_acl) {
> +   if (inode->i_default_acl && !is_uncached_acl(inode->i_default_acl)) {
> err = xattr_list_one(, _size,
>  XATTR_NAME_POSIX_ACL_DEFAULT);
> if (err)

This seems to be a better fix, but I haven't fully verified it, yet:

--- a/fs/inode.c
+++ b/fs/inode.c
@@ -187,7 +187,8 @@ int inode_init_always(struct super_block *sb,
struct inode *inode)
 inode->i_mapping = mapping;
 INIT_HLIST_HEAD(>i_dentry);/* buggered by rcu freeing */
 #ifdef CONFIG_FS_POSIX_ACL
-inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
+inode->i_acl = inode->i_default_acl =
+   (sb->s_flags & SB_POSIXACL) ? ACL_NOT_CACHED : NULL;
 #endif

 #ifdef CONFIG_FSNOTIFY

Thanks,
Andreas

Re: linux-next: Signed-off-by missing for commit in the gfs2 tree

2018-07-24 Thread Andreas Grünbacher

Hi Stephen,

2018-07-25 0:01 GMT+02:00 Stephen Rothwell :
> Hi all,
>
> Commit
>
>   b6fbc2f6a686 ("GFS2: Fix recovery issues for spectators")
>
> is missing a Signed-off-by from its committer.  It was rebased.

thanks, I've fixed it now but this is getting rather annoying -- and
it must happen to others as well. Do you happen to have git hooks that
prevent this from happening?

Thanks,
Andreas

Re: linux-next: Signed-off-by missing for commit in the gfs2 tree

2018-07-24 Thread Andreas Grünbacher

Hi Stephen,

2018-07-25 0:01 GMT+02:00 Stephen Rothwell :
> Hi all,
>
> Commit
>
>   b6fbc2f6a686 ("GFS2: Fix recovery issues for spectators")
>
> is missing a Signed-off-by from its committer.  It was rebased.

thanks, I've fixed it now but this is getting rather annoying -- and
it must happen to others as well. Do you happen to have git hooks that
prevent this from happening?

Thanks,
Andreas

Re: linux-next: Signed-off-by missing for commits in the gfs2 tree

2018-07-04 Thread Andreas Grünbacher

2018-07-04 23:57 GMT+02:00 Stephen Rothwell :
> Hi all,
>
> On Thu, 5 Jul 2018 07:55:44 +1000 Stephen Rothwell  
> wrote:
>>
>> Commits
>>
>>   0627518adcc8 ("GFS2: rgrp free blocks used incorrectly")
>>   b30cbec54def ("gfs2: Don't reject a supposedly full bitmap if we have 
>> blocks reserved")
>>
>> are missing a Signed-off-by from their committer.
>>
>> Not updated in the rebase.
>
> Just cc'ing the committer.

Yeah thanks, fixed now.

Andreas

Re: linux-next: Signed-off-by missing for commits in the gfs2 tree

2018-07-04 Thread Andreas Grünbacher

2018-07-04 23:57 GMT+02:00 Stephen Rothwell :
> Hi all,
>
> On Thu, 5 Jul 2018 07:55:44 +1000 Stephen Rothwell  
> wrote:
>>
>> Commits
>>
>>   0627518adcc8 ("GFS2: rgrp free blocks used incorrectly")
>>   b30cbec54def ("gfs2: Don't reject a supposedly full bitmap if we have 
>> blocks reserved")
>>
>> are missing a Signed-off-by from their committer.
>>
>> Not updated in the rebase.
>
> Just cc'ing the committer.

Yeah thanks, fixed now.

Andreas

Re: Quilt vs gmail (Was: [PATCH 0/3] sched/swait: Convert to full exclusive mode)

2018-06-13 Thread Andreas Grünbacher

Jean,

2018-06-13 14:32 GMT+02:00 Jean Delvare :
> Hi Peter, Linus, Andreas,
>
> On Tue, 12 Jun 2018 19:14:20 +0200, Peter Zijlstra wrote:
>> On Tue, Jun 12, 2018 at 09:47:34AM -0700, Linus Torvalds wrote:
>>
>> > I do note how quilt emails are really hard to read, because that:
>> >
>> > Content-Disposition: inline
>> >
>> > makes gmail think it's flowed.
>> >
>> > Which works horribly badly for patches, surprise surprise.
>> >
>> > So I really wish quilt wouldn't do that. It does smell like a gmail
>> > bug, but at the same time, why would you use "Content-Disposition:
>> > inline" when you don't have an actual multi-part email? So I do blame
>> > quilt too for sending nonsensical headers.
>> >
>> > (Yes, yes, I see the "It is permissible to use Content-Disposition on
>> > the main body" in the RFC. But the RFC also makes it clear that it
>> > actually matters for how things are presented, so saying "ok, I'll do
>> > flowed" seems equally insane and equally technically RFC-compliant)
>>
>> Quilt people, anything that can be done about that?
>
> The purpose of the Content-Disposition header is to let quilt store the
> original patch file name, so that the recipient can save the email as a
> patch file having the exact same name as the sender was using, to make
> communication between developers easier. This is the reason why the
> header is being added despite the email not being multi-part. As Linus
> found out already, RFC 2183 allows that. The RFC also explicitly allows
> the use of a filename parameter for inline parts (see section 2.3.)
>
> Using "attachment" instead of "inline" would presumably force the user
> to save the patch to a file before being able to read it, or, at least,
> to take additional actions in the MUA to convince it to display the
> contents inline regardless of what the Content-Disposition header
> says. This is clearly not desirable.
>
> We could try specifying the filename directly, without the "inline"
> keyword, however that would no longer comply with the RFC
> ("disposition-parm" is optional, but "disposition-type" is mandatory)
> and I am afraid that some MUA implementations would either default to
> disposition-type "attachment" in this case, or ignore the header
> altogether.
>
> I'm not sure I understand what "flowed" means in this context. If you
> mean that gmail breaks the formatting of the patch, I would say that
> gmail is infringing the RFC, which clearly stipulates at the beginning
> that the Content-Disposition header field is only about telling the MUA
> which parts should be displayed immediately and which parts should not,
> and not about presentation issues.
>
> Considering that "inline" is the default for a non-multi-part message,
> any MUA which changes its behavior in the presence of
> "Content-Disposition: inline" is bugged in my opinion.

All of that may be correct, but those headers apparently do break
email based patch reviewing on Thunderbird and Gmail now, and that's
not very likely to change. If we continue with our current practice,
we'll end up frustrating users. On top of that, i we make this an
optional feature, quilt users may think that using that option is a
good idea when they will actually break their recipients' workflows.
As Thomas Gleixner wrote in the other thread, most recipients will
already have a way to deal with messages from other sources that don't
include patch filenames, so let's just get rid of Content-Disposition
headers in quilt for good.

Andreas

Re: Quilt vs gmail (Was: [PATCH 0/3] sched/swait: Convert to full exclusive mode)

2018-06-13 Thread Andreas Grünbacher

Jean,

2018-06-13 14:32 GMT+02:00 Jean Delvare :
> Hi Peter, Linus, Andreas,
>
> On Tue, 12 Jun 2018 19:14:20 +0200, Peter Zijlstra wrote:
>> On Tue, Jun 12, 2018 at 09:47:34AM -0700, Linus Torvalds wrote:
>>
>> > I do note how quilt emails are really hard to read, because that:
>> >
>> > Content-Disposition: inline
>> >
>> > makes gmail think it's flowed.
>> >
>> > Which works horribly badly for patches, surprise surprise.
>> >
>> > So I really wish quilt wouldn't do that. It does smell like a gmail
>> > bug, but at the same time, why would you use "Content-Disposition:
>> > inline" when you don't have an actual multi-part email? So I do blame
>> > quilt too for sending nonsensical headers.
>> >
>> > (Yes, yes, I see the "It is permissible to use Content-Disposition on
>> > the main body" in the RFC. But the RFC also makes it clear that it
>> > actually matters for how things are presented, so saying "ok, I'll do
>> > flowed" seems equally insane and equally technically RFC-compliant)
>>
>> Quilt people, anything that can be done about that?
>
> The purpose of the Content-Disposition header is to let quilt store the
> original patch file name, so that the recipient can save the email as a
> patch file having the exact same name as the sender was using, to make
> communication between developers easier. This is the reason why the
> header is being added despite the email not being multi-part. As Linus
> found out already, RFC 2183 allows that. The RFC also explicitly allows
> the use of a filename parameter for inline parts (see section 2.3.)
>
> Using "attachment" instead of "inline" would presumably force the user
> to save the patch to a file before being able to read it, or, at least,
> to take additional actions in the MUA to convince it to display the
> contents inline regardless of what the Content-Disposition header
> says. This is clearly not desirable.
>
> We could try specifying the filename directly, without the "inline"
> keyword, however that would no longer comply with the RFC
> ("disposition-parm" is optional, but "disposition-type" is mandatory)
> and I am afraid that some MUA implementations would either default to
> disposition-type "attachment" in this case, or ignore the header
> altogether.
>
> I'm not sure I understand what "flowed" means in this context. If you
> mean that gmail breaks the formatting of the patch, I would say that
> gmail is infringing the RFC, which clearly stipulates at the beginning
> that the Content-Disposition header field is only about telling the MUA
> which parts should be displayed immediately and which parts should not,
> and not about presentation issues.
>
> Considering that "inline" is the default for a non-multi-part message,
> any MUA which changes its behavior in the presence of
> "Content-Disposition: inline" is bugged in my opinion.

All of that may be correct, but those headers apparently do break
email based patch reviewing on Thunderbird and Gmail now, and that's
not very likely to change. If we continue with our current practice,
we'll end up frustrating users. On top of that, i we make this an
optional feature, quilt users may think that using that option is a
good idea when they will actually break their recipients' workflows.
As Thomas Gleixner wrote in the other thread, most recipients will
already have a way to deal with messages from other sources that don't
include patch filenames, so let's just get rid of Content-Disposition
headers in quilt for good.

Andreas

Re: linux-next: manual merge of the xfs tree with Linus' tree

2018-06-04 Thread Andreas Grünbacher

2018-06-05 2:59 GMT+02:00 Dave Chinner :
> On Tue, Jun 05, 2018 at 10:34:03AM +1000, Stephen Rothwell wrote:
>> Hi all,
>>
>> Today's linux-next merge of the xfs tree got a conflict in:
>>
>>   fs/gfs2/bmap.c
>>
>> between commit:
>>
>>   628e366df11c ("gfs2: Iomap cleanups and improvements")
>>
>> from Linus' tree and commit:
>>
>>   7ee66c03e40a ("iomap: move IOMAP_F_BOUNDARY to gfs2")
>>
>> from the xfs tree.
>>
>> I fixed it up (see below) and can carry the fix as necessary. This
>> is now fixed as far as linux-next is concerned, but any non trivial
>> conflicts should be mentioned to your upstream maintainer when your tree
>> is submitted for merging.  You may also want to consider cooperating
>> with the maintainer of the conflicting tree to minimise any particularly
>> complex conflicts.
>
> We should have seen this before the gfs2 tree was merged into Linus'
> tree. Does that mean the gfs2 tree is not being pulled into the
> linux-next tree?

That's probably our fault, the gfs2 for-next branch was slightly
outdated. That patch would have been better in the gfs2 tree. How
would you like to proceed?

Thanks,
Andreas

Re: linux-next: manual merge of the xfs tree with Linus' tree

2018-06-04 Thread Andreas Grünbacher

2018-06-05 2:59 GMT+02:00 Dave Chinner :
> On Tue, Jun 05, 2018 at 10:34:03AM +1000, Stephen Rothwell wrote:
>> Hi all,
>>
>> Today's linux-next merge of the xfs tree got a conflict in:
>>
>>   fs/gfs2/bmap.c
>>
>> between commit:
>>
>>   628e366df11c ("gfs2: Iomap cleanups and improvements")
>>
>> from Linus' tree and commit:
>>
>>   7ee66c03e40a ("iomap: move IOMAP_F_BOUNDARY to gfs2")
>>
>> from the xfs tree.
>>
>> I fixed it up (see below) and can carry the fix as necessary. This
>> is now fixed as far as linux-next is concerned, but any non trivial
>> conflicts should be mentioned to your upstream maintainer when your tree
>> is submitted for merging.  You may also want to consider cooperating
>> with the maintainer of the conflicting tree to minimise any particularly
>> complex conflicts.
>
> We should have seen this before the gfs2 tree was merged into Linus'
> tree. Does that mean the gfs2 tree is not being pulled into the
> linux-next tree?

That's probably our fault, the gfs2 for-next branch was slightly
outdated. That patch would have been better in the gfs2 tree. How
would you like to proceed?

Thanks,
Andreas

Re: [PATCH 1/6] rhashtable: improve documentation for rhashtable_walk_peek()

2018-03-27 Thread Andreas Grünbacher

Neil,

2018-03-27 1:33 GMT+02:00 NeilBrown :
> The documentation for rhashtable_walk_peek() wrong.  It claims to
> return the *next* entry, whereas it in fact returns the *previous*
> entry.
> However if no entries have yet been returned - or if the iterator
> was reset due to a resize event, then rhashtable_walk_peek()
> *does* return the next entry, but also advances the iterator.
>
> I suspect that this interface should be discarded and the one user
> should be changed to not require it.  Possibly this patch should be
> seen as a first step in that conversation.
>
> This patch mostly corrects the documentation, but does make a
> small code change so that the documentation can be correct without
> listing too many special cases.  I don't think the one user will
> be affected by the code change.

how about I come up with a replacement so that we can remove
rhashtable_walk_peek straight away without making it differently
broken in the meantime?

Thanks,
Andreas

Re: [PATCH 1/6] rhashtable: improve documentation for rhashtable_walk_peek()

2018-03-27 Thread Andreas Grünbacher

Neil,

2018-03-27 1:33 GMT+02:00 NeilBrown :
> The documentation for rhashtable_walk_peek() wrong.  It claims to
> return the *next* entry, whereas it in fact returns the *previous*
> entry.
> However if no entries have yet been returned - or if the iterator
> was reset due to a resize event, then rhashtable_walk_peek()
> *does* return the next entry, but also advances the iterator.
>
> I suspect that this interface should be discarded and the one user
> should be changed to not require it.  Possibly this patch should be
> seen as a first step in that conversation.
>
> This patch mostly corrects the documentation, but does make a
> small code change so that the documentation can be correct without
> listing too many special cases.  I don't think the one user will
> be affected by the code change.

how about I come up with a replacement so that we can remove
rhashtable_walk_peek straight away without making it differently
broken in the meantime?

Thanks,
Andreas

Re: [PATCH] gfs2: select CONFIG_LIBCRC32C

2018-02-02 Thread Andreas Grünbacher

Hi Arnd,

2018-02-02 16:43 GMT+01:00 Arnd Bergmann :
> The new crc32c logic in gfs2_log_header_v2 causes a link
> error without libcrc32c:
>
> ERROR: "crc32c" [fs/gfs2/gfs2.ko] undefined!
>
> While the original patch selected CONFIG_CRYPTO_CRC32C to deal
> with this issue, it turned out to be the wrong symbol.
>
> Fixes: c1696fb85d33 ("GFS2: Introduce new gfs2_log_header_v2")
> Signed-off-by: Arnd Bergmann 
> ---
>  fs/gfs2/Kconfig | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
> index c0225d4b5435..3ed2b088dcfd 100644
> --- a/fs/gfs2/Kconfig
> +++ b/fs/gfs2/Kconfig
> @@ -3,8 +3,7 @@ config GFS2_FS
> depends on (64BIT || LBDAF)
> select FS_POSIX_ACL
> select CRC32
> -   select CRYPTO
> -   select CRYPTO_CRC32C
> +   select LIBCRC32C
> select QUOTACTL
> select FS_IOMAP
> help
> --
> 2.9.0
>

thanks for the fix, it's identical to what Bob has just asked Linus to
merge (https://lkml.org/lkml/2018/2/2/361).

Andreas

Re: [PATCH] gfs2: select CONFIG_LIBCRC32C

2018-02-02 Thread Andreas Grünbacher

Hi Arnd,

2018-02-02 16:43 GMT+01:00 Arnd Bergmann :
> The new crc32c logic in gfs2_log_header_v2 causes a link
> error without libcrc32c:
>
> ERROR: "crc32c" [fs/gfs2/gfs2.ko] undefined!
>
> While the original patch selected CONFIG_CRYPTO_CRC32C to deal
> with this issue, it turned out to be the wrong symbol.
>
> Fixes: c1696fb85d33 ("GFS2: Introduce new gfs2_log_header_v2")
> Signed-off-by: Arnd Bergmann 
> ---
>  fs/gfs2/Kconfig | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
> index c0225d4b5435..3ed2b088dcfd 100644
> --- a/fs/gfs2/Kconfig
> +++ b/fs/gfs2/Kconfig
> @@ -3,8 +3,7 @@ config GFS2_FS
> depends on (64BIT || LBDAF)
> select FS_POSIX_ACL
> select CRC32
> -   select CRYPTO
> -   select CRYPTO_CRC32C
> +   select LIBCRC32C
> select QUOTACTL
> select FS_IOMAP
> help
> --
> 2.9.0
>

thanks for the fix, it's identical to what Bob has just asked Linus to
merge (https://lkml.org/lkml/2018/2/2/361).

Andreas

Re: [PATCH 1/1] fs: Allows for xattr support to be compiled out

2017-04-12 Thread Andreas Grünbacher

 2017-04-12 0:47 GMT+02:00 Brian Ashworth :
> Allows for xattr syscalls and related functions to be compiled out.
> These are not needed on machines that do not utilize filesystems that
> support xattrs or userspaces that require extended attributes. This will
> aid in the tinification efforts.

How much does this shrink things in the end?

> The xattr_full_name function was moved from fs/xattr.c to
> include/linux/xattr.h and made static inline. This function only deals
> with interaction of the xattr_handler struct and does not require any
> of the xattr functions to be implemented.

This seems unnecessary.: when there are no xattr syscalls,
xattr_full_name shouldn't be called anymore.

> The rest of the functions in fs/xattr.c have had stubs created in
> include/linux/xattr.h to return the appropriate value to indicate that
> it is not supported or that there is no data.
>
> add/remove: 0/39 grow/shrink: 0/5 up/down: 0/-4020 (-4020)
> function old new   delta
> xattr_getsecurity  6   -  -6
> simple_xattr_list_add 13   - -13
> fdget 56  42 -14
> sys_lremovexattr  15   - -15
> sys_removexattr   18   - -18
> cap_inode_killpriv22   3 -19
> xattr_full_name   25   - -25
> sys_llistxattr25   - -25
> sys_listxattr 25   - -25
> cap_inode_need_killpriv   28   3 -25
> sys_lgetxattr 33   - -33
> sys_getxattr  33   - -33
> vfs_listxattr 34   - -34
> sys_setxattr  43   - -43
> sys_lsetxattr 43   - -43
> removexattr   56   - -56
> simple_xattr_alloc59   - -59
> sys_flistxattr61   - -61
> sys_fgetxattr 66   - -66
> vfs_getxattr  69   - -69
> __vfs_removexattr 70   - -70
> __vfs_getxattr71   - -71
> sys_fremovexattr  75   - -75
> simple_xattr_get  75   - -75
> vfs_removexattr   79   - -79
> simple_xattr_list 79   - -79
> sys_fsetxattr 89   - -89
> __vfs_setxattr91   - -91
> path_listxattr98   - -98
> path_getxattr103   --103
> __vfs_setxattr_noperm105   --105
> vfs_setxattr 113   --113
> path_removexattr 114   --114
> path_setxattr132   --132
> listxattr149   --149
> xattr_resolve_name   155   --155
> get_vfs_caps_from_disk   184  19-165
> generic_listxattr194   --194
> xattr_permission 208   --208
> vfs_getxattr_alloc   216   --216
> simple_xattr_set 231   --231
> setxattr 239   --239
> getxattr 240   --240
> cap_bprm_set_creds   613 366-247
> Total: Before=1900297, After=1896277, chg -0.21%
>
> Signed-off-by: Brian Ashworth 
> ---
>  fs/Kconfig |   1 +
>  fs/Makefile|   3 +-
>  fs/cifs/Kconfig|   1 +
>  fs/ext2/Kconfig|   1 +
>  fs/f2fs/Kconfig|   1 +
>  fs/jffs2/Kconfig   |   1 +
>  fs/reiserfs/Kconfig|   1 +
>  fs/squashfs/Kconfig|   1 +
>  fs/xattr.c |  24 
>  include/linux/xattr.h  | 126 
> -
>  init/Kconfig   |  11 
>  kernel/sys_ni.c|  12 
>  security/integrity/evm/Kconfig |   1 +
>  13 files changed, 157 insertions(+), 27 deletions(-)
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 83eab52fb3f6..18ce4032e177 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -173,6 +173,7 @@ config TMPFS_POSIX_ACL
>  config TMPFS_XATTR
> bool "Tmpfs extended attributes"
>

Re: [PATCH 1/1] fs: Allows for xattr support to be compiled out

2017-04-12 Thread Andreas Grünbacher

 2017-04-12 0:47 GMT+02:00 Brian Ashworth :
> Allows for xattr syscalls and related functions to be compiled out.
> These are not needed on machines that do not utilize filesystems that
> support xattrs or userspaces that require extended attributes. This will
> aid in the tinification efforts.

How much does this shrink things in the end?

> The xattr_full_name function was moved from fs/xattr.c to
> include/linux/xattr.h and made static inline. This function only deals
> with interaction of the xattr_handler struct and does not require any
> of the xattr functions to be implemented.

This seems unnecessary.: when there are no xattr syscalls,
xattr_full_name shouldn't be called anymore.

> The rest of the functions in fs/xattr.c have had stubs created in
> include/linux/xattr.h to return the appropriate value to indicate that
> it is not supported or that there is no data.
>
> add/remove: 0/39 grow/shrink: 0/5 up/down: 0/-4020 (-4020)
> function old new   delta
> xattr_getsecurity  6   -  -6
> simple_xattr_list_add 13   - -13
> fdget 56  42 -14
> sys_lremovexattr  15   - -15
> sys_removexattr   18   - -18
> cap_inode_killpriv22   3 -19
> xattr_full_name   25   - -25
> sys_llistxattr25   - -25
> sys_listxattr 25   - -25
> cap_inode_need_killpriv   28   3 -25
> sys_lgetxattr 33   - -33
> sys_getxattr  33   - -33
> vfs_listxattr 34   - -34
> sys_setxattr  43   - -43
> sys_lsetxattr 43   - -43
> removexattr   56   - -56
> simple_xattr_alloc59   - -59
> sys_flistxattr61   - -61
> sys_fgetxattr 66   - -66
> vfs_getxattr  69   - -69
> __vfs_removexattr 70   - -70
> __vfs_getxattr71   - -71
> sys_fremovexattr  75   - -75
> simple_xattr_get  75   - -75
> vfs_removexattr   79   - -79
> simple_xattr_list 79   - -79
> sys_fsetxattr 89   - -89
> __vfs_setxattr91   - -91
> path_listxattr98   - -98
> path_getxattr103   --103
> __vfs_setxattr_noperm105   --105
> vfs_setxattr 113   --113
> path_removexattr 114   --114
> path_setxattr132   --132
> listxattr149   --149
> xattr_resolve_name   155   --155
> get_vfs_caps_from_disk   184  19-165
> generic_listxattr194   --194
> xattr_permission 208   --208
> vfs_getxattr_alloc   216   --216
> simple_xattr_set 231   --231
> setxattr 239   --239
> getxattr 240   --240
> cap_bprm_set_creds   613 366-247
> Total: Before=1900297, After=1896277, chg -0.21%
>
> Signed-off-by: Brian Ashworth 
> ---
>  fs/Kconfig |   1 +
>  fs/Makefile|   3 +-
>  fs/cifs/Kconfig|   1 +
>  fs/ext2/Kconfig|   1 +
>  fs/f2fs/Kconfig|   1 +
>  fs/jffs2/Kconfig   |   1 +
>  fs/reiserfs/Kconfig|   1 +
>  fs/squashfs/Kconfig|   1 +
>  fs/xattr.c |  24 
>  include/linux/xattr.h  | 126 
> -
>  init/Kconfig   |  11 
>  kernel/sys_ni.c|  12 
>  security/integrity/evm/Kconfig |   1 +
>  13 files changed, 157 insertions(+), 27 deletions(-)
>
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 83eab52fb3f6..18ce4032e177 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -173,6 +173,7 @@ config TMPFS_POSIX_ACL
>  config TMPFS_XATTR
> bool "Tmpfs extended attributes"
> depends on TMPFS
> +   depends on

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2016-12-05 Thread Andreas Grünbacher

2016-12-06 0:19 GMT+01:00 Andreas Grünbacher <andreas.gruenbac...@gmail.com>:
> 2016-12-05 23:58 GMT+01:00 Patrick Plagwitz <patrick_plagw...@web.de>:
>> On 12/05/2016 08:37 PM, Andreas Grünbacher wrote:
>>> 2016-12-05 17:25 GMT+01:00 J. Bruce Fields <bfie...@fieldses.org>:
>>>> On Mon, Dec 05, 2016 at 04:36:03PM +0100, Miklos Szeredi wrote:
>>>>> On Mon, Dec 5, 2016 at 4:19 PM, J. Bruce Fields <bfie...@fieldses.org> 
>>>>> wrote:
>>>>>>> Can NFS people comment on this?  Where does the nfs4_acl come from?
>>>>>>
>>>>>> This is the interface the NFS client provides for applications to modify
>>>>>> NFSv4 ACLs on servers that support them.
>>>>>
>>>>> Fine, but why are we seeing this xattr on exports where no xattrs are
>>>>> set on the exported fs?
>>>>
>>>> I don't know.  I took another look at the original patch and don't see
>>>> any details on the server setup: which server is it (knfsd, ganesha,
>>>> netapp, ...)?  How is it configured?
>>>>
>>>>>>> What can overlayfs do if it's a non-empty ACL?
>>>>>>
>>>>>> As little as possible.  You can't copy it up, can you?  So any attempt
>>>>>> to support it is going to be incomplete.
>>>>>
>>>>> Right.
>>>>>
>>>>>>
>>>>>>> Does knfsd translate posix ACL into NFS acl?  If so, we can translate
>>>>>>> back.  Should we do a generic POSIX<->NFS acl translator?
>>>>>>
>>>>>> knsd does translate between POSIX and NFSv4 ACLs.  It's a complicated
>>>>>
>>>>> This does explain the nfs4_acl xattr on the client.  Question: if it's
>>>>> empty, why have it at all?
>>>>
>>>> I'm honestly not sure what's going on there.  I'd be curious to see a
>>>> network trace if possible.
>>>
>>> I do see "system.nfs4_acl" attributes on knfsd exported filesystems
>>> that support POSIX ACLs (for ext4: "mount -o acl"). For exported
>>> filesystem that don't support POSIX ACLs (ext4: mount -o noacl), that
>>> attribute is missing. The attribute shouldn't be empty though; when
>>> the file has no real ACL, "system.nfs4_acl" represents the file mode
>>> permissions. The "system.nfs4_acl" attribute exposes the information
>>> on the wire; there is no resonable way to translate that into an ACL
>>> on another filesystem, really.
>>>
>>> Patrick, what does 'getfattr -m- -d /nfs/file' give you?
>>>
>> getfattr -m - -d nfs/folder -e text gives
>>
>> # file: nfs/folder/
>> system.nfs4_acl="\000\000\000^C\000\000\000\000\000\000\000\000\000^V^A\000\000\000^FOWNER@\000\000\000\000\000\000\000\000\000\000\000^R\000\000\000\000^FGROUP@\000\000\000\000\000\000\000\000\000\000\000^R\000\000\000\000
>> EVERYONE@\000\000"
>>
>> Those are 80 bytes. I checked again and vfs_getxattr indeed returns size=80.
>> It just looked empty because the first byte is 0... Ok, so nfs4_acl is not
>> empty after all and checking *value == 0 does not tell if there are actually
>> ACLs present or not, sorry for the confusion.
>>
>> You are right, when I mount the exported fs with noacl the problem goes away.
>> You already helped me there, thanks.
>>
>> Still, I think there should be a way to copy up files that actually have no
>> ACLs since acl is often the default for ext4 mounts and giving an "Operation
>> not supported" for random open(2)s is not a very good way to convey what's
>> going on.
>
> It's not hard to come up with a heuristic that determines if a
> system.nfs4_acl value is equivalent to a file mode, and to ignore the
> attribute in that case. (The file mode is transmitted in its own
> attribute already, so actually converting .) That way, overlayfs could
> still fail copying up files that have an actual ACL. It's still an
> ugly hack ...

Actually, that kind of heuristic would make sense in the NFS client
which could then hide the "system.nfs4_acl" attribute.

Andreas

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2016-12-05 Thread Andreas Grünbacher

2016-12-06 0:19 GMT+01:00 Andreas Grünbacher :
> 2016-12-05 23:58 GMT+01:00 Patrick Plagwitz :
>> On 12/05/2016 08:37 PM, Andreas Grünbacher wrote:
>>> 2016-12-05 17:25 GMT+01:00 J. Bruce Fields :
>>>> On Mon, Dec 05, 2016 at 04:36:03PM +0100, Miklos Szeredi wrote:
>>>>> On Mon, Dec 5, 2016 at 4:19 PM, J. Bruce Fields  
>>>>> wrote:
>>>>>>> Can NFS people comment on this?  Where does the nfs4_acl come from?
>>>>>>
>>>>>> This is the interface the NFS client provides for applications to modify
>>>>>> NFSv4 ACLs on servers that support them.
>>>>>
>>>>> Fine, but why are we seeing this xattr on exports where no xattrs are
>>>>> set on the exported fs?
>>>>
>>>> I don't know.  I took another look at the original patch and don't see
>>>> any details on the server setup: which server is it (knfsd, ganesha,
>>>> netapp, ...)?  How is it configured?
>>>>
>>>>>>> What can overlayfs do if it's a non-empty ACL?
>>>>>>
>>>>>> As little as possible.  You can't copy it up, can you?  So any attempt
>>>>>> to support it is going to be incomplete.
>>>>>
>>>>> Right.
>>>>>
>>>>>>
>>>>>>> Does knfsd translate posix ACL into NFS acl?  If so, we can translate
>>>>>>> back.  Should we do a generic POSIX<->NFS acl translator?
>>>>>>
>>>>>> knsd does translate between POSIX and NFSv4 ACLs.  It's a complicated
>>>>>
>>>>> This does explain the nfs4_acl xattr on the client.  Question: if it's
>>>>> empty, why have it at all?
>>>>
>>>> I'm honestly not sure what's going on there.  I'd be curious to see a
>>>> network trace if possible.
>>>
>>> I do see "system.nfs4_acl" attributes on knfsd exported filesystems
>>> that support POSIX ACLs (for ext4: "mount -o acl"). For exported
>>> filesystem that don't support POSIX ACLs (ext4: mount -o noacl), that
>>> attribute is missing. The attribute shouldn't be empty though; when
>>> the file has no real ACL, "system.nfs4_acl" represents the file mode
>>> permissions. The "system.nfs4_acl" attribute exposes the information
>>> on the wire; there is no resonable way to translate that into an ACL
>>> on another filesystem, really.
>>>
>>> Patrick, what does 'getfattr -m- -d /nfs/file' give you?
>>>
>> getfattr -m - -d nfs/folder -e text gives
>>
>> # file: nfs/folder/
>> system.nfs4_acl="\000\000\000^C\000\000\000\000\000\000\000\000\000^V^A\000\000\000^FOWNER@\000\000\000\000\000\000\000\000\000\000\000^R\000\000\000\000^FGROUP@\000\000\000\000\000\000\000\000\000\000\000^R\000\000\000\000
>> EVERYONE@\000\000"
>>
>> Those are 80 bytes. I checked again and vfs_getxattr indeed returns size=80.
>> It just looked empty because the first byte is 0... Ok, so nfs4_acl is not
>> empty after all and checking *value == 0 does not tell if there are actually
>> ACLs present or not, sorry for the confusion.
>>
>> You are right, when I mount the exported fs with noacl the problem goes away.
>> You already helped me there, thanks.
>>
>> Still, I think there should be a way to copy up files that actually have no
>> ACLs since acl is often the default for ext4 mounts and giving an "Operation
>> not supported" for random open(2)s is not a very good way to convey what's
>> going on.
>
> It's not hard to come up with a heuristic that determines if a
> system.nfs4_acl value is equivalent to a file mode, and to ignore the
> attribute in that case. (The file mode is transmitted in its own
> attribute already, so actually converting .) That way, overlayfs could
> still fail copying up files that have an actual ACL. It's still an
> ugly hack ...

Actually, that kind of heuristic would make sense in the NFS client
which could then hide the "system.nfs4_acl" attribute.

Andreas

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2016-12-05 Thread Andreas Grünbacher

2016-12-05 23:58 GMT+01:00 Patrick Plagwitz <patrick_plagw...@web.de>:
> On 12/05/2016 08:37 PM, Andreas Grünbacher wrote:
>> 2016-12-05 17:25 GMT+01:00 J. Bruce Fields <bfie...@fieldses.org>:
>>> On Mon, Dec 05, 2016 at 04:36:03PM +0100, Miklos Szeredi wrote:
>>>> On Mon, Dec 5, 2016 at 4:19 PM, J. Bruce Fields <bfie...@fieldses.org> 
>>>> wrote:
>>>>>> Can NFS people comment on this?  Where does the nfs4_acl come from?
>>>>>
>>>>> This is the interface the NFS client provides for applications to modify
>>>>> NFSv4 ACLs on servers that support them.
>>>>
>>>> Fine, but why are we seeing this xattr on exports where no xattrs are
>>>> set on the exported fs?
>>>
>>> I don't know.  I took another look at the original patch and don't see
>>> any details on the server setup: which server is it (knfsd, ganesha,
>>> netapp, ...)?  How is it configured?
>>>
>>>>>> What can overlayfs do if it's a non-empty ACL?
>>>>>
>>>>> As little as possible.  You can't copy it up, can you?  So any attempt
>>>>> to support it is going to be incomplete.
>>>>
>>>> Right.
>>>>
>>>>>
>>>>>> Does knfsd translate posix ACL into NFS acl?  If so, we can translate
>>>>>> back.  Should we do a generic POSIX<->NFS acl translator?
>>>>>
>>>>> knsd does translate between POSIX and NFSv4 ACLs.  It's a complicated
>>>>
>>>> This does explain the nfs4_acl xattr on the client.  Question: if it's
>>>> empty, why have it at all?
>>>
>>> I'm honestly not sure what's going on there.  I'd be curious to see a
>>> network trace if possible.
>>
>> I do see "system.nfs4_acl" attributes on knfsd exported filesystems
>> that support POSIX ACLs (for ext4: "mount -o acl"). For exported
>> filesystem that don't support POSIX ACLs (ext4: mount -o noacl), that
>> attribute is missing. The attribute shouldn't be empty though; when
>> the file has no real ACL, "system.nfs4_acl" represents the file mode
>> permissions. The "system.nfs4_acl" attribute exposes the information
>> on the wire; there is no resonable way to translate that into an ACL
>> on another filesystem, really.
>>
>> Patrick, what does 'getfattr -m- -d /nfs/file' give you?
>>
> getfattr -m - -d nfs/folder -e text gives
>
> # file: nfs/folder/
> system.nfs4_acl="\000\000\000^C\000\000\000\000\000\000\000\000\000^V^A\000\000\000^FOWNER@\000\000\000\000\000\000\000\000\000\000\000^R\000\000\000\000^FGROUP@\000\000\000\000\000\000\000\000\000\000\000^R\000\000\000\000
> EVERYONE@\000\000"
>
> Those are 80 bytes. I checked again and vfs_getxattr indeed returns size=80.
> It just looked empty because the first byte is 0... Ok, so nfs4_acl is not
> empty after all and checking *value == 0 does not tell if there are actually
> ACLs present or not, sorry for the confusion.
>
> You are right, when I mount the exported fs with noacl the problem goes away.
> You already helped me there, thanks.
>
> Still, I think there should be a way to copy up files that actually have no
> ACLs since acl is often the default for ext4 mounts and giving an "Operation
> not supported" for random open(2)s is not a very good way to convey what's
> going on.

It's not hard to come up with a heuristic that determines if a
system.nfs4_acl value is equivalent to a file mode, and to ignore the
attribute in that case. (The file mode is transmitted in its own
attribute already, so actually converting .) That way, overlayfs could
still fail copying up files that have an actual ACL. It's still an
ugly hack ...

Andreas

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2016-12-05 Thread Andreas Grünbacher

2016-12-05 23:58 GMT+01:00 Patrick Plagwitz :
> On 12/05/2016 08:37 PM, Andreas Grünbacher wrote:
>> 2016-12-05 17:25 GMT+01:00 J. Bruce Fields :
>>> On Mon, Dec 05, 2016 at 04:36:03PM +0100, Miklos Szeredi wrote:
>>>> On Mon, Dec 5, 2016 at 4:19 PM, J. Bruce Fields  
>>>> wrote:
>>>>>> Can NFS people comment on this?  Where does the nfs4_acl come from?
>>>>>
>>>>> This is the interface the NFS client provides for applications to modify
>>>>> NFSv4 ACLs on servers that support them.
>>>>
>>>> Fine, but why are we seeing this xattr on exports where no xattrs are
>>>> set on the exported fs?
>>>
>>> I don't know.  I took another look at the original patch and don't see
>>> any details on the server setup: which server is it (knfsd, ganesha,
>>> netapp, ...)?  How is it configured?
>>>
>>>>>> What can overlayfs do if it's a non-empty ACL?
>>>>>
>>>>> As little as possible.  You can't copy it up, can you?  So any attempt
>>>>> to support it is going to be incomplete.
>>>>
>>>> Right.
>>>>
>>>>>
>>>>>> Does knfsd translate posix ACL into NFS acl?  If so, we can translate
>>>>>> back.  Should we do a generic POSIX<->NFS acl translator?
>>>>>
>>>>> knsd does translate between POSIX and NFSv4 ACLs.  It's a complicated
>>>>
>>>> This does explain the nfs4_acl xattr on the client.  Question: if it's
>>>> empty, why have it at all?
>>>
>>> I'm honestly not sure what's going on there.  I'd be curious to see a
>>> network trace if possible.
>>
>> I do see "system.nfs4_acl" attributes on knfsd exported filesystems
>> that support POSIX ACLs (for ext4: "mount -o acl"). For exported
>> filesystem that don't support POSIX ACLs (ext4: mount -o noacl), that
>> attribute is missing. The attribute shouldn't be empty though; when
>> the file has no real ACL, "system.nfs4_acl" represents the file mode
>> permissions. The "system.nfs4_acl" attribute exposes the information
>> on the wire; there is no resonable way to translate that into an ACL
>> on another filesystem, really.
>>
>> Patrick, what does 'getfattr -m- -d /nfs/file' give you?
>>
> getfattr -m - -d nfs/folder -e text gives
>
> # file: nfs/folder/
> system.nfs4_acl="\000\000\000^C\000\000\000\000\000\000\000\000\000^V^A\000\000\000^FOWNER@\000\000\000\000\000\000\000\000\000\000\000^R\000\000\000\000^FGROUP@\000\000\000\000\000\000\000\000\000\000\000^R\000\000\000\000
> EVERYONE@\000\000"
>
> Those are 80 bytes. I checked again and vfs_getxattr indeed returns size=80.
> It just looked empty because the first byte is 0... Ok, so nfs4_acl is not
> empty after all and checking *value == 0 does not tell if there are actually
> ACLs present or not, sorry for the confusion.
>
> You are right, when I mount the exported fs with noacl the problem goes away.
> You already helped me there, thanks.
>
> Still, I think there should be a way to copy up files that actually have no
> ACLs since acl is often the default for ext4 mounts and giving an "Operation
> not supported" for random open(2)s is not a very good way to convey what's
> going on.

It's not hard to come up with a heuristic that determines if a
system.nfs4_acl value is equivalent to a file mode, and to ignore the
attribute in that case. (The file mode is transmitted in its own
attribute already, so actually converting .) That way, overlayfs could
still fail copying up files that have an actual ACL. It's still an
ugly hack ...

Andreas

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2016-12-05 Thread Andreas Grünbacher

2016-12-05 17:25 GMT+01:00 J. Bruce Fields :
> On Mon, Dec 05, 2016 at 04:36:03PM +0100, Miklos Szeredi wrote:
>> On Mon, Dec 5, 2016 at 4:19 PM, J. Bruce Fields  wrote:
>> >> Can NFS people comment on this?  Where does the nfs4_acl come from?
>> >
>> > This is the interface the NFS client provides for applications to modify
>> > NFSv4 ACLs on servers that support them.
>>
>> Fine, but why are we seeing this xattr on exports where no xattrs are
>> set on the exported fs?
>
> I don't know.  I took another look at the original patch and don't see
> any details on the server setup: which server is it (knfsd, ganesha,
> netapp, ...)?  How is it configured?
>
>> >> What can overlayfs do if it's a non-empty ACL?
>> >
>> > As little as possible.  You can't copy it up, can you?  So any attempt
>> > to support it is going to be incomplete.
>>
>> Right.
>>
>> >
>> >> Does knfsd translate posix ACL into NFS acl?  If so, we can translate
>> >> back.  Should we do a generic POSIX<->NFS acl translator?
>> >
>> > knsd does translate between POSIX and NFSv4 ACLs.  It's a complicated
>>
>> This does explain the nfs4_acl xattr on the client.  Question: if it's
>> empty, why have it at all?
>
> I'm honestly not sure what's going on there.  I'd be curious to see a
> network trace if possible.

I do see "system.nfs4_acl" attributes on knfsd exported filesystems
that support POSIX ACLs (for ext4: "mount -o acl"). For exported
filesystem that don't support POSIX ACLs (ext4: mount -o noacl), that
attribute is missing. The attribute shouldn't be empty though; when
the file has no real ACL, "system.nfs4_acl" represents the file mode
permissions. The "system.nfs4_acl" attribute exposes the information
on the wire; there is no resonable way to translate that into an ACL
on another filesystem, really.

Patrick, what does 'getfattr -m- -d /nfs/file' give you?

>> > algorithm, and lossy (in the NFSv4->POSIX direction).  The client
>> > developers have been understandably reluctant to have anything to do
>> > with it.
>> >
>> > So, I think listxattr should omit system.nfs4_acl, and attempts to
>> > set/get the attribute should error out.  The same should apply to any
>> > "system." attribute not supported by both filesystems, I think?
>>
>> Basically that's what happens now.  The problem is that nfsv4 mounts
>> seem always have these xattrs, even when the exported fs doesn't have
>> anything.
>
> I said "both", that's a logical "and".  Whether or not nfs claims
> support would then be irrelevant in this case, since ext4 doesn't
> support system.nfs4_acl.
>
>> We could do the copy up even if the NFS4->POSIX translation was
>> possible (which is the case with POSIX ACL translated by knfsd).  We'd
>> just get back the original ACL, so that's OK.
>
> Note that knfsd is an exception, most NFSv4-acl-supporting servers
> aren't translating from POSIX ACLs.
>
>> NFS is only supported as lower (read-only) layer, so we don't care
>> about changing the ACL on the server.
>
> Out of curiosity, how do you check permissions after copy up?
>
> The client doesn't do much permissions-checking normally, because it's
> hard to get right--even in the absence of ACLs, it may not understand
> the server's owners and groups completely.
>
> I guess that's fine, you may be happy to let people write to the file
> without permissions to the lower file, since the writes aren't going
> there anyway.
>
> So, I don't know what want here.
>
> You're not going to want to use the ACL for actual permission-checking,
> and you can't modify it, so it doesn't seem very useful.

IMO the only thing overlayfs can do is hide those attributes from the
user and ignore them when copying up. Still, the attributes shouldn't
be empty.

Andreas

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2016-12-05 Thread Andreas Grünbacher

2016-12-05 17:25 GMT+01:00 J. Bruce Fields :
> On Mon, Dec 05, 2016 at 04:36:03PM +0100, Miklos Szeredi wrote:
>> On Mon, Dec 5, 2016 at 4:19 PM, J. Bruce Fields  wrote:
>> >> Can NFS people comment on this?  Where does the nfs4_acl come from?
>> >
>> > This is the interface the NFS client provides for applications to modify
>> > NFSv4 ACLs on servers that support them.
>>
>> Fine, but why are we seeing this xattr on exports where no xattrs are
>> set on the exported fs?
>
> I don't know.  I took another look at the original patch and don't see
> any details on the server setup: which server is it (knfsd, ganesha,
> netapp, ...)?  How is it configured?
>
>> >> What can overlayfs do if it's a non-empty ACL?
>> >
>> > As little as possible.  You can't copy it up, can you?  So any attempt
>> > to support it is going to be incomplete.
>>
>> Right.
>>
>> >
>> >> Does knfsd translate posix ACL into NFS acl?  If so, we can translate
>> >> back.  Should we do a generic POSIX<->NFS acl translator?
>> >
>> > knsd does translate between POSIX and NFSv4 ACLs.  It's a complicated
>>
>> This does explain the nfs4_acl xattr on the client.  Question: if it's
>> empty, why have it at all?
>
> I'm honestly not sure what's going on there.  I'd be curious to see a
> network trace if possible.

I do see "system.nfs4_acl" attributes on knfsd exported filesystems
that support POSIX ACLs (for ext4: "mount -o acl"). For exported
filesystem that don't support POSIX ACLs (ext4: mount -o noacl), that
attribute is missing. The attribute shouldn't be empty though; when
the file has no real ACL, "system.nfs4_acl" represents the file mode
permissions. The "system.nfs4_acl" attribute exposes the information
on the wire; there is no resonable way to translate that into an ACL
on another filesystem, really.

Patrick, what does 'getfattr -m- -d /nfs/file' give you?

>> > algorithm, and lossy (in the NFSv4->POSIX direction).  The client
>> > developers have been understandably reluctant to have anything to do
>> > with it.
>> >
>> > So, I think listxattr should omit system.nfs4_acl, and attempts to
>> > set/get the attribute should error out.  The same should apply to any
>> > "system." attribute not supported by both filesystems, I think?
>>
>> Basically that's what happens now.  The problem is that nfsv4 mounts
>> seem always have these xattrs, even when the exported fs doesn't have
>> anything.
>
> I said "both", that's a logical "and".  Whether or not nfs claims
> support would then be irrelevant in this case, since ext4 doesn't
> support system.nfs4_acl.
>
>> We could do the copy up even if the NFS4->POSIX translation was
>> possible (which is the case with POSIX ACL translated by knfsd).  We'd
>> just get back the original ACL, so that's OK.
>
> Note that knfsd is an exception, most NFSv4-acl-supporting servers
> aren't translating from POSIX ACLs.
>
>> NFS is only supported as lower (read-only) layer, so we don't care
>> about changing the ACL on the server.
>
> Out of curiosity, how do you check permissions after copy up?
>
> The client doesn't do much permissions-checking normally, because it's
> hard to get right--even in the absence of ACLs, it may not understand
> the server's owners and groups completely.
>
> I guess that's fine, you may be happy to let people write to the file
> without permissions to the lower file, since the writes aren't going
> there anyway.
>
> So, I don't know what want here.
>
> You're not going to want to use the ACL for actual permission-checking,
> and you can't modify it, so it doesn't seem very useful.

IMO the only thing overlayfs can do is hide those attributes from the
user and ignore them when copying up. Still, the attributes shouldn't
be empty.

Andreas

Re: [PATCH 10/12] posix_acl: don't ignore return value of posix_acl_create_masq()

2016-09-16 Thread Andreas Grünbacher

2016-09-16 14:19 GMT+02:00 Miklos Szeredi :
> Signed-off-by: Miklos Szeredi 
> Cc: Andreas Gruenbacher 
> ---
>  fs/posix_acl.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fs/posix_acl.c b/fs/posix_acl.c
> index 59d47ab0791a..ea3eb6f3bf1e 100644
> --- a/fs/posix_acl.c
> +++ b/fs/posix_acl.c
> @@ -598,13 +598,14 @@ posix_acl_create(struct inode *dir, umode_t *mode,
> if (IS_ERR(p))
> return PTR_ERR(p);
>
> +   ret = -ENOMEM;
> clone = posix_acl_clone(p, GFP_NOFS);
> if (!clone)
> -   goto no_mem;
> +   goto err_release;
>
> ret = posix_acl_create_masq(clone, mode);
> if (ret < 0)
> -   goto no_mem_clone;
> +   goto err_release_clone;
>
> if (ret == 0)
> posix_acl_release(clone);
> @@ -618,11 +619,11 @@ posix_acl_create(struct inode *dir, umode_t *mode,
>
> return 0;
>
> -no_mem_clone:
> +err_release_clone:
> posix_acl_release(clone);
> -no_mem:
> +err_release:
> posix_acl_release(p);
> -   return -ENOMEM;
> +   return ret;
>  }
>  EXPORT_SYMBOL_GPL(posix_acl_create);

Indeed, the return value of posix_acl_create_masq shouldn't be ignored
here. posix_acl_create_masq can still only fail when the default ACL
of the parent directory is corrupted as users are prohibited from
setting invalid default ACLs.

Reviewed-by: Andreas Gruenbacher

Re: [PATCH 10/12] posix_acl: don't ignore return value of posix_acl_create_masq()

2016-09-16 Thread Andreas Grünbacher

2016-09-16 14:19 GMT+02:00 Miklos Szeredi :
> Signed-off-by: Miklos Szeredi 
> Cc: Andreas Gruenbacher 
> ---
>  fs/posix_acl.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fs/posix_acl.c b/fs/posix_acl.c
> index 59d47ab0791a..ea3eb6f3bf1e 100644
> --- a/fs/posix_acl.c
> +++ b/fs/posix_acl.c
> @@ -598,13 +598,14 @@ posix_acl_create(struct inode *dir, umode_t *mode,
> if (IS_ERR(p))
> return PTR_ERR(p);
>
> +   ret = -ENOMEM;
> clone = posix_acl_clone(p, GFP_NOFS);
> if (!clone)
> -   goto no_mem;
> +   goto err_release;
>
> ret = posix_acl_create_masq(clone, mode);
> if (ret < 0)
> -   goto no_mem_clone;
> +   goto err_release_clone;
>
> if (ret == 0)
> posix_acl_release(clone);
> @@ -618,11 +619,11 @@ posix_acl_create(struct inode *dir, umode_t *mode,
>
> return 0;
>
> -no_mem_clone:
> +err_release_clone:
> posix_acl_release(clone);
> -no_mem:
> +err_release:
> posix_acl_release(p);
> -   return -ENOMEM;
> +   return ret;
>  }
>  EXPORT_SYMBOL_GPL(posix_acl_create);

Indeed, the return value of posix_acl_create_masq shouldn't be ignored
here. posix_acl_create_masq can still only fail when the default ACL
of the parent directory is corrupted as users are prohibited from
setting invalid default ACLs.

Reviewed-by: Andreas Gruenbacher

Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)

2016-08-17 Thread Andreas Grünbacher

Hi Jan,

2016-05-31 11:50 GMT+02:00 Jan Kara :
> On Tue 31-05-16 10:07:24, Minchan Kim wrote:
>> On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
>> > [adding lkml and linux-mm to the cc list]
>> >
>> > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG 
>> > wrote:
>> > > Hi Dave,
>> > >   Hi Brian,
>> > >
>> > > below are the results with a vanilla 4.4.11 kernel.
>> >
>> > Thanks for persisting with the testing, Stefan.
>> >
>> > 
>> >
>> > > i've now used a vanilla 4.4.11 Kernel and the issue remains. After a
>> > > fresh reboot it has happened again on the root FS for a debian apt file:
>> > >
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 
>> > > 0x12b990
>> > > [ cut here ]
>> > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239
>> > > xfs_vm_releasepage+0x10f/0x140()
>> > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport
>> > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse
>> > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan
>> > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor
>> > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod
>> > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
>> > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas
>> > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G   O4.4.11 #1
>> > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015
>> > >   880c4dacfa88 a23c5b8f 
>> > >  a2a51ab4 880c4dacfac8 a20837a7 880c4dacfae8
>> > >  0001 ea00010c3640 8802176b49d0 ea00010c3660
>> > > Call Trace:
>> > >  [] dump_stack+0x63/0x84
>> > >  [] warn_slowpath_common+0x97/0xe0
>> > >  [] warn_slowpath_null+0x1a/0x20
>> > >  [] xfs_vm_releasepage+0x10f/0x140
>> > >  [] ? page_mkclean_one+0xd0/0xd0
>> > >  [] ? anon_vma_prepare+0x150/0x150
>> > >  [] try_to_release_page+0x32/0x50
>> > >  [] shrink_active_list+0x3ce/0x3e0
>> > >  [] shrink_lruvec+0x687/0x7d0
>> > >  [] shrink_zone+0xdc/0x2c0
>> > >  [] kswapd+0x4f9/0x970
>> > >  [] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>> > >  [] kthread+0xc9/0xe0
>> > >  [] ? kthread_stop+0x100/0x100
>> > >  [] ret_from_fork+0x3f/0x70
>> > >  [] ? kthread_stop+0x100/0x100
>> > > ---[ end trace c9d679f8ed4d7610 ]---
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size
>> > > 0x12b990
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size
>> > .
>> >
>> > Ok, I suspect this may be a VM bug. I've been looking at the 4.6
>> > code (so please try to reproduce on that kernel!) but it looks to me
>> > like the only way we can get from shrink_active_list() direct to
>> > try_to_release_page() is if we are over the maximum bufferhead
>> > threshold (i.e buffer_heads_over_limit = true) and we are trying to
>> > reclaim pages direct from the active list.
>> >
>> > Because we are called from kswapd()->balance_pgdat(), we have:
>> >
>> > struct scan_control sc = {
>> > .gfp_mask = GFP_KERNEL,
>> > .order = order,
>> > .priority = DEF_PRIORITY,
>> > .may_writepage = !laptop_mode,
>> > .may_unmap = 1,
>> > .may_swap = 1,
>> > };
>> >
>> > The key point here is reclaim is being run with .may_writepage =
>> > true for default configuration kernels. when we get to
>> > shrink_active_list():
>> >
>> > if (!sc->may_writepage)
>> > isolate_mode |= ISOLATE_CLEAN;
>> >
>> > But sc->may_writepage = true and this allows isolate_lru_pages() to
>> > isolate dirty pages from the active list. Normally this isn't a
>> > problem, because the isolated active list pages are rotated to the
>> > inactive list, and nothing else happens to them. *Except when
>> > buffer_heads_over_limit = true*. This special condition would
>> > explain why I have never seen apt/dpkg cause this problem on any of
>> > my (many) Debian systems that all use XFS
>> >
>> > In that case, shrink_active_list() runs:
>> >
>> > if (unlikely(buffer_heads_over_limit)) {
>> > if (page_has_private(page) && trylock_page(page)) {
>> > if (page_has_private(page))
>> > try_to_release_page(page, 0);
>> > unlock_page(page);
>> > }
>> > }
>> >
>> > i.e. it locks the page, and if it has buffer heads it trys to get
>> > the bufferheads freed from the page.
>> >
>> > But this is a dirty page, which means it may have delalloc or
>> > unwritten state on it's buffers, both of which indicate that there
>> > is dirty data in teh page that hasn't been written. XFS issues a
>> > warning on this because neither shrink_active_list nor
>> > try_to_release_page() check for whether the page is dirty or not.
>> >
>> >

Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)

2016-08-17 Thread Andreas Grünbacher

Hi Jan,

2016-05-31 11:50 GMT+02:00 Jan Kara :
> On Tue 31-05-16 10:07:24, Minchan Kim wrote:
>> On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
>> > [adding lkml and linux-mm to the cc list]
>> >
>> > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG 
>> > wrote:
>> > > Hi Dave,
>> > >   Hi Brian,
>> > >
>> > > below are the results with a vanilla 4.4.11 kernel.
>> >
>> > Thanks for persisting with the testing, Stefan.
>> >
>> > 
>> >
>> > > i've now used a vanilla 4.4.11 Kernel and the issue remains. After a
>> > > fresh reboot it has happened again on the root FS for a debian apt file:
>> > >
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x0 size 
>> > > 0x12b990
>> > > [ cut here ]
>> > > WARNING: CPU: 1 PID: 111 at fs/xfs/xfs_aops.c:1239
>> > > xfs_vm_releasepage+0x10f/0x140()
>> > > Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 xt_multiport
>> > > iptable_filter ip_tables x_tables bonding coretemp 8021q garp fuse
>> > > sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd shpchp vxlan
>> > > ip6_udp_tunnel udp_tunnel ipmi_si ipmi_msghandler button btrfs xor
>> > > raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod
>> > > ehci_pci ehci_hcd usbcore usb_common igb ahci i2c_algo_bit libahci
>> > > i2c_core mpt3sas ptp pps_core raid_class scsi_transport_sas
>> > > CPU: 1 PID: 111 Comm: kswapd0 Tainted: G   O4.4.11 #1
>> > > Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0b 05/18/2015
>> > >   880c4dacfa88 a23c5b8f 
>> > >  a2a51ab4 880c4dacfac8 a20837a7 880c4dacfae8
>> > >  0001 ea00010c3640 8802176b49d0 ea00010c3660
>> > > Call Trace:
>> > >  [] dump_stack+0x63/0x84
>> > >  [] warn_slowpath_common+0x97/0xe0
>> > >  [] warn_slowpath_null+0x1a/0x20
>> > >  [] xfs_vm_releasepage+0x10f/0x140
>> > >  [] ? page_mkclean_one+0xd0/0xd0
>> > >  [] ? anon_vma_prepare+0x150/0x150
>> > >  [] try_to_release_page+0x32/0x50
>> > >  [] shrink_active_list+0x3ce/0x3e0
>> > >  [] shrink_lruvec+0x687/0x7d0
>> > >  [] shrink_zone+0xdc/0x2c0
>> > >  [] kswapd+0x4f9/0x970
>> > >  [] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>> > >  [] kthread+0xc9/0xe0
>> > >  [] ? kthread_stop+0x100/0x100
>> > >  [] ret_from_fork+0x3f/0x70
>> > >  [] ? kthread_stop+0x100/0x100
>> > > ---[ end trace c9d679f8ed4d7610 ]---
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x1000 size
>> > > 0x12b990
>> > > XFS (md127p3): ino 0x41221d1 delalloc 1 unwritten 0 pgoff 0x2000 size
>> > .
>> >
>> > Ok, I suspect this may be a VM bug. I've been looking at the 4.6
>> > code (so please try to reproduce on that kernel!) but it looks to me
>> > like the only way we can get from shrink_active_list() direct to
>> > try_to_release_page() is if we are over the maximum bufferhead
>> > threshold (i.e buffer_heads_over_limit = true) and we are trying to
>> > reclaim pages direct from the active list.
>> >
>> > Because we are called from kswapd()->balance_pgdat(), we have:
>> >
>> > struct scan_control sc = {
>> > .gfp_mask = GFP_KERNEL,
>> > .order = order,
>> > .priority = DEF_PRIORITY,
>> > .may_writepage = !laptop_mode,
>> > .may_unmap = 1,
>> > .may_swap = 1,
>> > };
>> >
>> > The key point here is reclaim is being run with .may_writepage =
>> > true for default configuration kernels. when we get to
>> > shrink_active_list():
>> >
>> > if (!sc->may_writepage)
>> > isolate_mode |= ISOLATE_CLEAN;
>> >
>> > But sc->may_writepage = true and this allows isolate_lru_pages() to
>> > isolate dirty pages from the active list. Normally this isn't a
>> > problem, because the isolated active list pages are rotated to the
>> > inactive list, and nothing else happens to them. *Except when
>> > buffer_heads_over_limit = true*. This special condition would
>> > explain why I have never seen apt/dpkg cause this problem on any of
>> > my (many) Debian systems that all use XFS
>> >
>> > In that case, shrink_active_list() runs:
>> >
>> > if (unlikely(buffer_heads_over_limit)) {
>> > if (page_has_private(page) && trylock_page(page)) {
>> > if (page_has_private(page))
>> > try_to_release_page(page, 0);
>> > unlock_page(page);
>> > }
>> > }
>> >
>> > i.e. it locks the page, and if it has buffer heads it trys to get
>> > the bufferheads freed from the page.
>> >
>> > But this is a dirty page, which means it may have delalloc or
>> > unwritten state on it's buffers, both of which indicate that there
>> > is dirty data in teh page that hasn't been written. XFS issues a
>> > warning on this because neither shrink_active_list nor
>> > try_to_release_page() check for whether the page is dirty or not.
>> >
>> > Hence it seems

Re: [PATCH] ubifs: Fix xattr generic handler usage

2016-08-01 Thread Andreas Grünbacher

2016-07-31 21:42 GMT+02:00 Richard Weinberger :
> UBIFS uses full names to work with xattrs, therefore we have to use
> xattr_full_name() to obtain the xattr prefix as string.
>
> Cc: 
> Cc: Andreas Gruenbacher 
> Fixes: 2b88fc21ca ("ubifs: Switch to generic xattr handlers")
> Signed-off-by: Richard Weinberger 

Reviewed-by: Andreas Gruenbacher 

> ---
>  fs/ubifs/xattr.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
> index b5fc279..c63710f 100644
> --- a/fs/ubifs/xattr.c
> +++ b/fs/ubifs/xattr.c
> @@ -575,7 +575,8 @@ static int ubifs_xattr_get(const struct xattr_handler 
> *handler,
> dbg_gen("xattr '%s', ino %lu ('%pd'), buf size %zd", name,
> inode->i_ino, dentry, size);
>
> -   return  __ubifs_getxattr(inode, name, buffer, size);
> +   name = xattr_full_name(handler, name);
> +   return __ubifs_getxattr(inode, name, buffer, size);
>  }
>
>  static int ubifs_xattr_set(const struct xattr_handler *handler,
> @@ -586,6 +587,8 @@ static int ubifs_xattr_set(const struct xattr_handler 
> *handler,
> dbg_gen("xattr '%s', host ino %lu ('%pd'), size %zd",
> name, inode->i_ino, dentry, size);
>
> +   name = xattr_full_name(handler, name);
> +
> if (value)
> return __ubifs_setxattr(inode, name, value, size, flags);
> else
> --
> 2.7.3
>

Re: [PATCH] ubifs: Fix xattr generic handler usage

2016-08-01 Thread Andreas Grünbacher

2016-07-31 21:42 GMT+02:00 Richard Weinberger :
> UBIFS uses full names to work with xattrs, therefore we have to use
> xattr_full_name() to obtain the xattr prefix as string.
>
> Cc: 
> Cc: Andreas Gruenbacher 
> Fixes: 2b88fc21ca ("ubifs: Switch to generic xattr handlers")
> Signed-off-by: Richard Weinberger 

Reviewed-by: Andreas Gruenbacher 

> ---
>  fs/ubifs/xattr.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
> index b5fc279..c63710f 100644
> --- a/fs/ubifs/xattr.c
> +++ b/fs/ubifs/xattr.c
> @@ -575,7 +575,8 @@ static int ubifs_xattr_get(const struct xattr_handler 
> *handler,
> dbg_gen("xattr '%s', ino %lu ('%pd'), buf size %zd", name,
> inode->i_ino, dentry, size);
>
> -   return  __ubifs_getxattr(inode, name, buffer, size);
> +   name = xattr_full_name(handler, name);
> +   return __ubifs_getxattr(inode, name, buffer, size);
>  }
>
>  static int ubifs_xattr_set(const struct xattr_handler *handler,
> @@ -586,6 +587,8 @@ static int ubifs_xattr_set(const struct xattr_handler 
> *handler,
> dbg_gen("xattr '%s', host ino %lu ('%pd'), size %zd",
> name, inode->i_ino, dentry, size);
>
> +   name = xattr_full_name(handler, name);
> +
> if (value)
> return __ubifs_setxattr(inode, name, value, size, flags);
> else
> --
> 2.7.3
>

Re: [RFC v3 42/45] nfs: Add richacl support

2016-07-15 Thread Andreas Grünbacher

Hi Andy,

2016-06-23 18:37 GMT+02:00 Weston Andros Adamson :
>> On Apr 24, 2015, at 7:04 AM, Andreas Gruenbacher 
>>  wrote:
>> Changes nfs to support the "system.richacl" xattr instead of 
>> "system.nfs4_acl".
>>
>> The "system.nfs4_acl" xattr nfs uses directly exposes the on-the-wire format 
>> of
>> NFSv4's acl attribute to user space. This has at least two downsides: (1) the
>> format is different from other file systems, so user-space code needs to be 
>> nfs
>> filesystem aware; (2) when symbolic user@domain and group@domain names are 
>> used
>> in the acl, user-space needs to perform ID mapping in the same way as the
>> kernel.
>>
>> Previously, when user-space requested only the length of the 
>> "system.nfs4_acl"
>> attribute, nfs didn't need to retrieve the entire "system.nfs4_acl" 
>> attribute;
>> retrieving its length was enough.  With the "system.richacl" xattr, the 
>> length
>> of "system.richacl" cannot be computed from the length of the NFSv4 acl
>> attribute, so we always need to retrieve and cache the acl even when 
>> user-space
>> only asks for the length of the "system.richacl" attribute.
>>
>> Because the nfs client now knows which kind of acl the user is trying to set,
>> it will now no longer sends acls with deny entries to servers which didn't
>> declare support for that feature.  The maximum supported size of the NFSv4 
>> acl
>> attribute is now hard coded in the client code and no longer depends on the
>> size of the buffer the user provides to the getxattr system call.  When an 
>> acl
>> exceeds this limit, getxattr fails with errno set to ENOMEM.
>>
>> Signed-off-by: Andreas Gruenbacher 
>> ---
>> fs/nfs/inode.c|   3 -
>> fs/nfs/nfs4proc.c | 354 
>> --
>> fs/nfs/nfs4xdr.c  | 224 +
>> fs/nfs/super.c|   4 +-
>> include/linux/nfs_fs.h|   1 -
>> include/linux/nfs_fs_sb.h |   2 +
>> include/linux/nfs_xdr.h   |   8 +-
>> 7 files changed, 357 insertions(+), 239 deletions(-)
>>
>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> index d42dff6..e67f72e 100644
>> --- a/fs/nfs/inode.c
>> +++ b/fs/nfs/inode.c
>> @@ -1824,9 +1824,6 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
>>   return NULL;
>>   nfsi->flags = 0UL;
>>   nfsi->cache_validity = 0UL;
>> -#if IS_ENABLED(CONFIG_NFS_V4)
>> - nfsi->nfs4_acl = NULL;
>> -#endif /* CONFIG_NFS_V4 */
>>   return >vfs_inode;
>> }
>> EXPORT_SYMBOL_GPL(nfs_alloc_inode);
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 8c50670..c2ba4f0 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -55,6 +55,8 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>> +#include 
>>
>> #include "nfs4_fs.h"
>> #include "delegation.h"
>> @@ -2892,15 +2894,18 @@ static int _nfs4_server_capabilities(struct 
>> nfs_server *server, struct nfs_fh *f
>>   res.attr_bitmask[2] &= FATTR4_WORD2_NFS42_MASK;
>>   }
>>   memcpy(server->attr_bitmask, res.attr_bitmask, 
>> sizeof(server->attr_bitmask));
>> - server->caps &= ~(NFS_CAP_ACLS|NFS_CAP_HARDLINKS|
>> - NFS_CAP_SYMLINKS|NFS_CAP_FILEID|
>> + server->caps &= ~(NFS_CAP_ALLOW_ACLS|NFS_CAP_DENY_ACLS|
>> + 
>> NFS_CAP_HARDLINKS|NFS_CAP_SYMLINKS|NFS_CAP_FILEID|
>>   NFS_CAP_MODE|NFS_CAP_NLINK|NFS_CAP_OWNER|
>>   NFS_CAP_OWNER_GROUP|NFS_CAP_ATIME|
>>   NFS_CAP_CTIME|NFS_CAP_MTIME|
>>   NFS_CAP_SECURITY_LABEL);
>> - if (res.attr_bitmask[0] & FATTR4_WORD0_ACL &&
>> - res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
>> - server->caps |= NFS_CAP_ACLS;
>> + if (res.attr_bitmask[0] & FATTR4_WORD0_ACL) {
>> + if (res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
>> + server->caps |= NFS_CAP_ALLOW_ACLS;
>> + if (res.acl_bitmask & ACL4_SUPPORT_DENY_ACL)
>> + server->caps |= NFS_CAP_DENY_ACLS;
>> + }
>>   if (res.has_links != 0)
>>   server->caps |= NFS_CAP_HARDLINKS;
>>   if (res.has_symlinks != 0)
>> @@ -4428,45 +4433,11 @@ static int nfs4_proc_renew(struct nfs_client *clp, 
>> struct rpc_cred *cred)
>>   return 0;
>> }
>>
>> -static inline int nfs4_server_supports_acls(struct nfs_server *server)
>> -{
>> - return server->caps & NFS_CAP_ACLS;
>> -}
>> -
>> -/* Assuming that XATTR_SIZE_MAX is a multiple of PAGE_SIZE, and that
>> - * it's OK to put sizeof(void) * (XATTR_SIZE_MAX/PAGE_SIZE) bytes on
>> - * the stack.
>> +/* A arbitrary limit; we allocate at most DIV_ROUND_UP(NFS4ACL_SIZE_MAX,
>> + * PAGE_SIZE) pages and put an array of

Re: [RFC v3 42/45] nfs: Add richacl support

2016-07-15 Thread Andreas Grünbacher

Hi Andy,

2016-06-23 18:37 GMT+02:00 Weston Andros Adamson :
>> On Apr 24, 2015, at 7:04 AM, Andreas Gruenbacher 
>>  wrote:
>> Changes nfs to support the "system.richacl" xattr instead of 
>> "system.nfs4_acl".
>>
>> The "system.nfs4_acl" xattr nfs uses directly exposes the on-the-wire format 
>> of
>> NFSv4's acl attribute to user space. This has at least two downsides: (1) the
>> format is different from other file systems, so user-space code needs to be 
>> nfs
>> filesystem aware; (2) when symbolic user@domain and group@domain names are 
>> used
>> in the acl, user-space needs to perform ID mapping in the same way as the
>> kernel.
>>
>> Previously, when user-space requested only the length of the 
>> "system.nfs4_acl"
>> attribute, nfs didn't need to retrieve the entire "system.nfs4_acl" 
>> attribute;
>> retrieving its length was enough.  With the "system.richacl" xattr, the 
>> length
>> of "system.richacl" cannot be computed from the length of the NFSv4 acl
>> attribute, so we always need to retrieve and cache the acl even when 
>> user-space
>> only asks for the length of the "system.richacl" attribute.
>>
>> Because the nfs client now knows which kind of acl the user is trying to set,
>> it will now no longer sends acls with deny entries to servers which didn't
>> declare support for that feature.  The maximum supported size of the NFSv4 
>> acl
>> attribute is now hard coded in the client code and no longer depends on the
>> size of the buffer the user provides to the getxattr system call.  When an 
>> acl
>> exceeds this limit, getxattr fails with errno set to ENOMEM.
>>
>> Signed-off-by: Andreas Gruenbacher 
>> ---
>> fs/nfs/inode.c|   3 -
>> fs/nfs/nfs4proc.c | 354 
>> --
>> fs/nfs/nfs4xdr.c  | 224 +
>> fs/nfs/super.c|   4 +-
>> include/linux/nfs_fs.h|   1 -
>> include/linux/nfs_fs_sb.h |   2 +
>> include/linux/nfs_xdr.h   |   8 +-
>> 7 files changed, 357 insertions(+), 239 deletions(-)
>>
>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> index d42dff6..e67f72e 100644
>> --- a/fs/nfs/inode.c
>> +++ b/fs/nfs/inode.c
>> @@ -1824,9 +1824,6 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
>>   return NULL;
>>   nfsi->flags = 0UL;
>>   nfsi->cache_validity = 0UL;
>> -#if IS_ENABLED(CONFIG_NFS_V4)
>> - nfsi->nfs4_acl = NULL;
>> -#endif /* CONFIG_NFS_V4 */
>>   return >vfs_inode;
>> }
>> EXPORT_SYMBOL_GPL(nfs_alloc_inode);
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 8c50670..c2ba4f0 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -55,6 +55,8 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>> +#include 
>>
>> #include "nfs4_fs.h"
>> #include "delegation.h"
>> @@ -2892,15 +2894,18 @@ static int _nfs4_server_capabilities(struct 
>> nfs_server *server, struct nfs_fh *f
>>   res.attr_bitmask[2] &= FATTR4_WORD2_NFS42_MASK;
>>   }
>>   memcpy(server->attr_bitmask, res.attr_bitmask, 
>> sizeof(server->attr_bitmask));
>> - server->caps &= ~(NFS_CAP_ACLS|NFS_CAP_HARDLINKS|
>> - NFS_CAP_SYMLINKS|NFS_CAP_FILEID|
>> + server->caps &= ~(NFS_CAP_ALLOW_ACLS|NFS_CAP_DENY_ACLS|
>> + 
>> NFS_CAP_HARDLINKS|NFS_CAP_SYMLINKS|NFS_CAP_FILEID|
>>   NFS_CAP_MODE|NFS_CAP_NLINK|NFS_CAP_OWNER|
>>   NFS_CAP_OWNER_GROUP|NFS_CAP_ATIME|
>>   NFS_CAP_CTIME|NFS_CAP_MTIME|
>>   NFS_CAP_SECURITY_LABEL);
>> - if (res.attr_bitmask[0] & FATTR4_WORD0_ACL &&
>> - res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
>> - server->caps |= NFS_CAP_ACLS;
>> + if (res.attr_bitmask[0] & FATTR4_WORD0_ACL) {
>> + if (res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
>> + server->caps |= NFS_CAP_ALLOW_ACLS;
>> + if (res.acl_bitmask & ACL4_SUPPORT_DENY_ACL)
>> + server->caps |= NFS_CAP_DENY_ACLS;
>> + }
>>   if (res.has_links != 0)
>>   server->caps |= NFS_CAP_HARDLINKS;
>>   if (res.has_symlinks != 0)
>> @@ -4428,45 +4433,11 @@ static int nfs4_proc_renew(struct nfs_client *clp, 
>> struct rpc_cred *cred)
>>   return 0;
>> }
>>
>> -static inline int nfs4_server_supports_acls(struct nfs_server *server)
>> -{
>> - return server->caps & NFS_CAP_ACLS;
>> -}
>> -
>> -/* Assuming that XATTR_SIZE_MAX is a multiple of PAGE_SIZE, and that
>> - * it's OK to put sizeof(void) * (XATTR_SIZE_MAX/PAGE_SIZE) bytes on
>> - * the stack.
>> +/* A arbitrary limit; we allocate at most DIV_ROUND_UP(NFS4ACL_SIZE_MAX,
>> + * PAGE_SIZE) pages and put an array of DIV_ROUND_UP(NFS4ACL_SIZE_MAX,
>> + * PAGE_SIZE) pages on the stack when

Re: [PATCH] staging: lustre: llite: drop acl from cache

2016-05-23 Thread Andreas Grünbacher

2016-05-24 2:35 GMT+02:00 James Simmons :
> Commit b8a7a3a6 change get_acl() for posix xattr to always cache
> the ACL which increases the reference count. That reference count
> can be reduced by have ll_get_acl() call forget_cached_acl() which
> it wasn't. When an inode gets deleted by Lustre the POSIX ACL
> reference count is tested to ensure its 1 and if not produces an error.

Lustre shouldn't assume that the VFS immediately drops the reference
it is passed. Please remove that check as well.

> Since forget_cached_acl() was not called Lustre started to complain.
> This patch changes ll_get_acl() to call forget_cached_acl().
>
> Signed-off-by: James Simmons 

Reviewed-by: Andreas Gruenbacher 

> ---
>  drivers/staging/lustre/lustre/llite/file.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/llite/file.c 
> b/drivers/staging/lustre/lustre/llite/file.c
> index f47f2ac..0191945 100644
> --- a/drivers/staging/lustre/lustre/llite/file.c
> +++ b/drivers/staging/lustre/lustre/llite/file.c
> @@ -3124,6 +3124,7 @@ struct posix_acl *ll_get_acl(struct inode *inode, int 
> type)
> spin_lock(>lli_lock);
> /* VFS' acl_permission_check->check_acl will release the refcount */
> acl = posix_acl_dup(lli->lli_posix_acl);
> +   forget_cached_acl(inode, type);
> spin_unlock(>lli_lock);
>
> return acl;
> --
> 1.7.1
>

Re: [PATCH] staging: lustre: llite: drop acl from cache

2016-05-23 Thread Andreas Grünbacher

2016-05-24 2:35 GMT+02:00 James Simmons :
> Commit b8a7a3a6 change get_acl() for posix xattr to always cache
> the ACL which increases the reference count. That reference count
> can be reduced by have ll_get_acl() call forget_cached_acl() which
> it wasn't. When an inode gets deleted by Lustre the POSIX ACL
> reference count is tested to ensure its 1 and if not produces an error.

Lustre shouldn't assume that the VFS immediately drops the reference
it is passed. Please remove that check as well.

> Since forget_cached_acl() was not called Lustre started to complain.
> This patch changes ll_get_acl() to call forget_cached_acl().
>
> Signed-off-by: James Simmons 

Reviewed-by: Andreas Gruenbacher 

> ---
>  drivers/staging/lustre/lustre/llite/file.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/llite/file.c 
> b/drivers/staging/lustre/lustre/llite/file.c
> index f47f2ac..0191945 100644
> --- a/drivers/staging/lustre/lustre/llite/file.c
> +++ b/drivers/staging/lustre/lustre/llite/file.c
> @@ -3124,6 +3124,7 @@ struct posix_acl *ll_get_acl(struct inode *inode, int 
> type)
> spin_lock(>lli_lock);
> /* VFS' acl_permission_check->check_acl will release the refcount */
> acl = posix_acl_dup(lli->lli_posix_acl);
> +   forget_cached_acl(inode, type);
> spin_unlock(>lli_lock);
>
> return acl;
> --
> 1.7.1
>

Re: [lkp] [selinux] 5d226df4ed: -10.1% unixbench.score

2016-01-04 Thread Andreas Grünbacher

2015-12-28 2:59 GMT+01:00 kernel test robot :
> FYI, we noticed the below changes on
>
> git://git.infradead.org/users/pcmoore/selinux stable-4.5
> commit 5d226df4edfa0eb1e689e7ac2741cf261ff7cbf1 ("selinux: Revalidate invalid 
> inode security labels")

I could reproduce the performance regression with:

  $ wget https://byte-unixbench.googlecode.com/files/UnixBench5.1.3.tgz
  $ tar xvfz UnixBench5.1.3.tgz
  $ cd UnixBench
  $ make
  $ ./Run pipe -c 1

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [lkp] [selinux] 5d226df4ed: -10.1% unixbench.score

2016-01-04 Thread Andreas Grünbacher

2015-12-28 2:59 GMT+01:00 kernel test robot :
> FYI, we noticed the below changes on
>
> git://git.infradead.org/users/pcmoore/selinux stable-4.5
> commit 5d226df4edfa0eb1e689e7ac2741cf261ff7cbf1 ("selinux: Revalidate invalid 
> inode security labels")
>
>
> =
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
>   
> gcc-4.9/performance/x86_64-rhel/1/debian-x86_64-2015-02-07.cgz/ivb42/pipe/unixbench
>
> commit:
>   6f3be9f562e3027c77bc4482ccf2cea8600a7f74
>   5d226df4edfa0eb1e689e7ac2741cf261ff7cbf1
>
> 6f3be9f562e3027c 5d226df4edfa0eb1e689e7ac27
>  --
>  %stddev %change %stddev
>  \  |\
>   2086 ±  0% -10.1%   1876 ±  0%  unixbench.score
>   7.79 ±  0% -10.2%   7.00 ±  1%  time.user_time
>  33404 ±117%   +9470.5%3196941 ±173%  
> latency_stats.max.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
>  18934 ± 34% -82.8%   3252 ± 73%  
> latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_access.[nfsv4].nfs4_proc_access.[nfsv4].nfs_do_access.nfs_permission.__inode_permission.inode_permission.link_path_walk
>  57519 ±100%   +5478.5%3208706 ±172%  
> latency_stats.sum.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
>  13414 ± 33% +49.4%  20036 ±  8%  numa-meminfo.node0.Active(anon)
>  13399 ± 33% +49.0%  19963 ±  7%  numa-meminfo.node0.AnonPages
>  11624 ± 38% -52.7%   5495 ± 18%  numa-meminfo.node1.Active(anon)
>  11518 ± 38% -52.6%   5457 ± 17%  numa-meminfo.node1.AnonPages
>   1641 ± 50% -53.8% 757.75 ± 15%  numa-meminfo.node1.PageTables
>   3353 ± 33% +49.2%   5004 ±  8%  numa-vmstat.node0.nr_active_anon
>   3349 ± 33% +49.0%   4991 ±  7%  numa-vmstat.node0.nr_anon_pages
>   2904 ± 38% -52.7%   1374 ± 18%  numa-vmstat.node1.nr_active_anon
>   2878 ± 38% -52.6%   1365 ± 17%  numa-vmstat.node1.nr_anon_pages
> 410.00 ± 50% -53.9% 189.00 ± 15%  
> numa-vmstat.node1.nr_page_table_pages
>  14760 ±  2%+207.2%  45336 ± 65%  numa-vmstat.node1.numa_other
>
>
> ivb42: Ivytown Ivy Bridge-EP
> Memory: 64G
>
>
>
>   unixbench.score
>
>   2150 ++---+
>||
>   2100 ++  .*..*.*..|
>|  .*..* *.*..*.   .*.*..*.* |
>*.*..*.*..* *..*.*.*..*.*..*.*.  |
>   2050 ++   |
>||
>   2000 ++   |
>||
>   1950 ++   |
>||
>O O  O O  O  OO  O O  O O  O O O |
>   1900 ++  O  OO O O  O O  O O  O O |
>| O O  O O
>   1850 ++---+
>
>
> [*] bisect-good sample
> [O] bisect-bad  sample
>
> To reproduce:
>
> git clone 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml  # job file is attached in this email
> bin/lkp run job.yaml
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [lkp] [selinux] 5d226df4ed: -10.1% unixbench.score

2016-01-04 Thread Andreas Grünbacher

2015-12-28 2:59 GMT+01:00 kernel test robot :
> FYI, we noticed the below changes on
>
> git://git.infradead.org/users/pcmoore/selinux stable-4.5
> commit 5d226df4edfa0eb1e689e7ac2741cf261ff7cbf1 ("selinux: Revalidate invalid 
> inode security labels")
>
>
> =
> compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
>   
> gcc-4.9/performance/x86_64-rhel/1/debian-x86_64-2015-02-07.cgz/ivb42/pipe/unixbench
>
> commit:
>   6f3be9f562e3027c77bc4482ccf2cea8600a7f74
>   5d226df4edfa0eb1e689e7ac2741cf261ff7cbf1
>
> 6f3be9f562e3027c 5d226df4edfa0eb1e689e7ac27
>  --
>  %stddev %change %stddev
>  \  |\
>   2086 ±  0% -10.1%   1876 ±  0%  unixbench.score
>   7.79 ±  0% -10.2%   7.00 ±  1%  time.user_time
>  33404 ±117%   +9470.5%3196941 ±173%  
> latency_stats.max.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
>  18934 ± 34% -82.8%   3252 ± 73%  
> latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_access.[nfsv4].nfs4_proc_access.[nfsv4].nfs_do_access.nfs_permission.__inode_permission.inode_permission.link_path_walk
>  57519 ±100%   +5478.5%3208706 ±172%  
> latency_stats.sum.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
>  13414 ± 33% +49.4%  20036 ±  8%  numa-meminfo.node0.Active(anon)
>  13399 ± 33% +49.0%  19963 ±  7%  numa-meminfo.node0.AnonPages
>  11624 ± 38% -52.7%   5495 ± 18%  numa-meminfo.node1.Active(anon)
>  11518 ± 38% -52.6%   5457 ± 17%  numa-meminfo.node1.AnonPages
>   1641 ± 50% -53.8% 757.75 ± 15%  numa-meminfo.node1.PageTables
>   3353 ± 33% +49.2%   5004 ±  8%  numa-vmstat.node0.nr_active_anon
>   3349 ± 33% +49.0%   4991 ±  7%  numa-vmstat.node0.nr_anon_pages
>   2904 ± 38% -52.7%   1374 ± 18%  numa-vmstat.node1.nr_active_anon
>   2878 ± 38% -52.6%   1365 ± 17%  numa-vmstat.node1.nr_anon_pages
> 410.00 ± 50% -53.9% 189.00 ± 15%  
> numa-vmstat.node1.nr_page_table_pages
>  14760 ±  2%+207.2%  45336 ± 65%  numa-vmstat.node1.numa_other
>
>
> ivb42: Ivytown Ivy Bridge-EP
> Memory: 64G
>
>
>
>   unixbench.score
>
>   2150 ++---+
>||
>   2100 ++  .*..*.*..|
>|  .*..* *.*..*.   .*.*..*.* |
>*.*..*.*..* *..*.*.*..*.*..*.*.  |
>   2050 ++   |
>||
>   2000 ++   |
>||
>   1950 ++   |
>||
>O O  O O  O  OO  O O  O O  O O O |
>   1900 ++  O  OO O O  O O  O O  O O |
>| O O  O O
>   1850 ++---+
>
>
> [*] bisect-good sample
> [O] bisect-bad  sample
>
> To reproduce:
>
> git clone 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml  # job file is attached in this email
> bin/lkp run job.yaml
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [lkp] [selinux] 5d226df4ed: -10.1% unixbench.score

2016-01-04 Thread Andreas Grünbacher

2015-12-28 2:59 GMT+01:00 kernel test robot :
> FYI, we noticed the below changes on
>
> git://git.infradead.org/users/pcmoore/selinux stable-4.5
> commit 5d226df4edfa0eb1e689e7ac2741cf261ff7cbf1 ("selinux: Revalidate invalid 
> inode security labels")

I could reproduce the performance regression with:

  $ wget https://byte-unixbench.googlecode.com/files/UnixBench5.1.3.tgz
  $ tar xvfz UnixBench5.1.3.tgz
  $ cd UnixBench
  $ make
  $ ./Run pipe -c 1

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the vfs tree

2015-12-09 Thread Andreas Grünbacher

2015-12-09 23:20 GMT+01:00 Stephen Rothwell :
> OK, I wrote all that and then I realised that the preferred names
> (XATTR_NAME_POSIX_ACL_..) have been in Linus' tree for a long time (in
> include/uapi/linux/xattr.h), so you could just change the orangefs tree
> to use those already. i.e. my patch should apply cleanly to the
> orangefs tree and build and work correctly independently of the vfs
> tree change (I think).

Yes, I think that would be best.

Thanks, in particular for all the linux-next work!

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: build failure after merge of the vfs tree

2015-12-09 Thread Andreas Grünbacher

2015-12-09 23:20 GMT+01:00 Stephen Rothwell :
> OK, I wrote all that and then I realised that the preferred names
> (XATTR_NAME_POSIX_ACL_..) have been in Linus' tree for a long time (in
> include/uapi/linux/xattr.h), so you could just change the orangefs tree
> to use those already. i.e. my patch should apply cleanly to the
> orangefs tree and build and work correctly independently of the vfs
> tree change (I think).

Yes, I think that would be best.

Thanks, in particular for all the linux-next work!

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 10/10] xattr handlers: Simplify list operation

2015-12-01 Thread Andreas Grünbacher

2015-12-01 10:53 GMT+01:00 James Morris :
> On Mon, 30 Nov 2015, Andreas Gruenbacher wrote:
>
>> diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
>> index 906022d..61dd93f 100644
>> --- a/fs/ocfs2/xattr.c
>> +++ b/fs/ocfs2/xattr.c
>> @@ -895,6 +895,8 @@ static int ocfs2_xattr_list_entry(char *buffer, size_t 
>> size,
>>
>>   *result += total_len;
>>
>> + /* FIXME: Not checking the ->list operation here ... */
>> +
>
> What does this mean?

ocfs2 defines list xattr handler operations for "user.*" and
"trusted.*" xattrs but never bothers calling them. Should be fixed
separately.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 10/10] xattr handlers: Simplify list operation

2015-12-01 Thread Andreas Grünbacher

2015-12-01 10:53 GMT+01:00 James Morris :
> On Mon, 30 Nov 2015, Andreas Gruenbacher wrote:
>
>> diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
>> index 906022d..61dd93f 100644
>> --- a/fs/ocfs2/xattr.c
>> +++ b/fs/ocfs2/xattr.c
>> @@ -895,6 +895,8 @@ static int ocfs2_xattr_list_entry(char *buffer, size_t 
>> size,
>>
>>   *result += total_len;
>>
>> + /* FIXME: Not checking the ->list operation here ... */
>> +
>
> What does this mean?

ocfs2 defines list xattr handler operations for "user.*" and
"trusted.*" xattrs but never bothers calling them. Should be fixed
separately.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] xfs: only modify ACLs if CONFIG_XFS_POSIX_ACL enabled

2015-11-03 Thread Andreas Grünbacher

Hi Arnd,

2015-11-03 14:43 GMT+01:00 Arnd Bergmann :
> A recent bug fix added a call to forget_cached_acl(), but that
> function is not always available, so we can get a build error
> here [...]

didn't expect anybody to trip over this anytime soon...

I've sent a fix yesterday,

  http://oss.sgi.com/archives/xfs/2015-11/msg00028.html

and Dave has already fixed it in for-next:

  git//git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] xfs: only modify ACLs if CONFIG_XFS_POSIX_ACL enabled

2015-11-03 Thread Andreas Grünbacher

Hi Arnd,

2015-11-03 14:43 GMT+01:00 Arnd Bergmann :
> A recent bug fix added a call to forget_cached_acl(), but that
> function is not always available, so we can get a build error
> here [...]

didn't expect anybody to trip over this anytime soon...

I've sent a fix yesterday,

  http://oss.sgi.com/archives/xfs/2015-11/msg00028.html

and Dave has already fixed it in for-next:

  git//git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v10 38/46] richacl: Add support for unmapped identifiers

2015-10-11 Thread Andreas Grünbacher

2015-10-12 2:22 GMT+02:00 Dave Chinner :
> This was used by the XFS support patch earlier in the series. Bisect
> problem here...

Yes, I'll fix that up as well.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v10 24/46] xfs: Add richacl support

2015-10-11 Thread Andreas Grünbacher

2015-10-12 2:10 GMT+02:00 Dave Chinner :
> On Mon, Oct 12, 2015 at 12:58:35AM +0200, Andreas Gruenbacher wrote:
>> diff --git a/fs/ext4/richacl.h b/fs/ext4/richacl.h
>> index fc7826f..f26fbdd 100644
>> --- a/fs/ext4/richacl.h
>> +++ b/fs/ext4/richacl.h
>> @@ -24,7 +24,6 @@
>>
>>  extern struct richacl *ext4_get_richacl(struct inode *);
>>  extern int ext4_set_richacl(struct inode *, struct richacl *);
>> -
>>  extern int ext4_init_richacl(handle_t *, struct inode *, struct inode *);
>>
>>  #else  /* CONFIG_EXT4_FS_RICHACL */
>
> stray hunk.

Oops, thanks.

>> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
>> index 5d47b4d..05dd312 100644
>> --- a/fs/xfs/Kconfig
>> +++ b/fs/xfs/Kconfig
>> @@ -52,6 +52,17 @@ config XFS_POSIX_ACL
>>
>> If you don't know what Access Control Lists are, say N.
>>
>> +config XFS_RICHACL
>> +bool "XFS Rich Access Control Lists (EXPERIMENTAL)"
>> +depends on XFS_FS
>> +select FS_RICHACL
>> +help
>> +  Richacls are an implementation of NFSv4 ACLs, extended by file 
>> masks
>> +  to cleanly integrate into the POSIX file permission model.  To 
>> learn
>> +   more about them, see http://www.bestbits.at/richacl/.
>> +
>> +  If you don't know what Richacls are, say N.
>> +
>
> So now we have multiple compile time ACL configs that we have to
> test. Personally, I see no reason for making either posix or rich
> acls compile time dependent. How about we just change CONFIG_FS_XFS
> to have "select FS_POSIX_ACL" and "select FS_RICHACL"
> unconditionally, and then we can get rid of all the conditional
> code?

I'm sure neither kind of ACLs makes sense on many tiny systems, it
should be possible to disable them. XFS may not make sense on those
kinds of systems either, that is not my call to make though.

>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 9590a06..8c6da45 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -461,10 +461,18 @@ xfs_sb_has_ro_compat_feature(
>>  #define XFS_SB_FEAT_INCOMPAT_FTYPE   (1 << 0)/* filetype in dirent 
>> */
>>  #define XFS_SB_FEAT_INCOMPAT_SPINODES(1 << 1)/* sparse 
>> inode chunks */
>>  #define XFS_SB_FEAT_INCOMPAT_META_UUID   (1 << 2)/* metadata 
>> UUID */
>> +
>> +#ifdef CONFIG_XFS_RICHACL
>> +#define XFS_SB_FEAT_INCOMPAT_RICHACL (1 << 3)/* richacls */
>> +#else
>> +#define XFS_SB_FEAT_INCOMPAT_RICHACL 0
>> +#endif
>> +
>
> Regardless of whether we build in richacl support, this is wrong.
> Feature bits are on-disk format definitions, so it will always have
> this value whether or not the kernel supports the feature.

This is so that xfs_mount_validate_sb() will complain when mounting a
richacl filesystem on a kernel which doesn't have richacl suport
compiled in. The same effect can be had with extra code there of
course.

>> @@ -722,7 +737,10 @@ xfs_setattr_nonsize(
>>*   Posix ACL code seems to care about this issue either.
>>*/
>>   if ((mask & ATTR_MODE) && !(flags & XFS_ATTR_NOACL)) {
>> - error = posix_acl_chmod(inode, inode->i_mode);
>> + if (XFS_IS_RICHACL(inode))
>> + error = richacl_chmod(inode, inode->i_mode);
>> + else
>> + error = posix_acl_chmod(inode, inode->i_mode);
>>   if (error)
>>   return error;
>>   }
>
> This sort of thing could do with a helper like:
>
> static inline int
> xfs_acl_chmod(struct inode *inode, int mode)
> {
> if (IS_RICHACL(inode))
> return richacl_chmod(inode, mode);
> return posix_acl_chmod(inode, mode);
> }

Alright.

>> +#include "xfs.h"
>> +#include "xfs_format.h"
>> +#include "xfs_log_format.h"
>> +#include "xfs_inode.h"
>> +#include "xfs_attr.h"
>> +
>> +#include 
>> +
>> +struct richacl *
>> +xfs_get_richacl(struct inode *inode)
>> +{
>
> ...
>
> If we always build in ACL support, this can all go in xfs_acl.c.
>
>> + struct xfs_inode *ip = XFS_I(inode);
>> + struct richacl *acl;
>> + int size = XATTR_SIZE_MAX;
>> + void *value;
>> + int error;
>> +
>> + value = kmem_zalloc_large(size, KM_SLEEP);
>> + if (!value)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + error = xfs_attr_get(ip, XATTR_NAME_RICHACL, value, , ATTR_ROOT);
>> + if (error) {
>> + /*
>> +  * If the attribute doesn't exist make sure we have a negative
>> +  * cache entry, for any other error assume it is transient and
>> +  * leave the cache entry as ACL_NOT_CACHED.
>> +  */
>> + acl = (error == -ENOATTR) ? NULL : ERR_PTR(error);
>
> I rather dislike ternaries in code like this - it's hard to read the
> logic. Initalise acl to NULL, and then this simply becomes:
>
> if (error != -ENOATTR)
> acl = ERR_PTR(error);

As you prefer.

>> + } else
>> +

Re: [PATCH v10 24/46] xfs: Add richacl support

2015-10-11 Thread Andreas Grünbacher

2015-10-12 2:10 GMT+02:00 Dave Chinner :
> On Mon, Oct 12, 2015 at 12:58:35AM +0200, Andreas Gruenbacher wrote:
>> diff --git a/fs/ext4/richacl.h b/fs/ext4/richacl.h
>> index fc7826f..f26fbdd 100644
>> --- a/fs/ext4/richacl.h
>> +++ b/fs/ext4/richacl.h
>> @@ -24,7 +24,6 @@
>>
>>  extern struct richacl *ext4_get_richacl(struct inode *);
>>  extern int ext4_set_richacl(struct inode *, struct richacl *);
>> -
>>  extern int ext4_init_richacl(handle_t *, struct inode *, struct inode *);
>>
>>  #else  /* CONFIG_EXT4_FS_RICHACL */
>
> stray hunk.

Oops, thanks.

>> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
>> index 5d47b4d..05dd312 100644
>> --- a/fs/xfs/Kconfig
>> +++ b/fs/xfs/Kconfig
>> @@ -52,6 +52,17 @@ config XFS_POSIX_ACL
>>
>> If you don't know what Access Control Lists are, say N.
>>
>> +config XFS_RICHACL
>> +bool "XFS Rich Access Control Lists (EXPERIMENTAL)"
>> +depends on XFS_FS
>> +select FS_RICHACL
>> +help
>> +  Richacls are an implementation of NFSv4 ACLs, extended by file 
>> masks
>> +  to cleanly integrate into the POSIX file permission model.  To 
>> learn
>> +   more about them, see http://www.bestbits.at/richacl/.
>> +
>> +  If you don't know what Richacls are, say N.
>> +
>
> So now we have multiple compile time ACL configs that we have to
> test. Personally, I see no reason for making either posix or rich
> acls compile time dependent. How about we just change CONFIG_FS_XFS
> to have "select FS_POSIX_ACL" and "select FS_RICHACL"
> unconditionally, and then we can get rid of all the conditional
> code?

I'm sure neither kind of ACLs makes sense on many tiny systems, it
should be possible to disable them. XFS may not make sense on those
kinds of systems either, that is not my call to make though.

>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 9590a06..8c6da45 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -461,10 +461,18 @@ xfs_sb_has_ro_compat_feature(
>>  #define XFS_SB_FEAT_INCOMPAT_FTYPE   (1 << 0)/* filetype in dirent 
>> */
>>  #define XFS_SB_FEAT_INCOMPAT_SPINODES(1 << 1)/* sparse 
>> inode chunks */
>>  #define XFS_SB_FEAT_INCOMPAT_META_UUID   (1 << 2)/* metadata 
>> UUID */
>> +
>> +#ifdef CONFIG_XFS_RICHACL
>> +#define XFS_SB_FEAT_INCOMPAT_RICHACL (1 << 3)/* richacls */
>> +#else
>> +#define XFS_SB_FEAT_INCOMPAT_RICHACL 0
>> +#endif
>> +
>
> Regardless of whether we build in richacl support, this is wrong.
> Feature bits are on-disk format definitions, so it will always have
> this value whether or not the kernel supports the feature.

This is so that xfs_mount_validate_sb() will complain when mounting a
richacl filesystem on a kernel which doesn't have richacl suport
compiled in. The same effect can be had with extra code there of
course.

>> @@ -722,7 +737,10 @@ xfs_setattr_nonsize(
>>*   Posix ACL code seems to care about this issue either.
>>*/
>>   if ((mask & ATTR_MODE) && !(flags & XFS_ATTR_NOACL)) {
>> - error = posix_acl_chmod(inode, inode->i_mode);
>> + if (XFS_IS_RICHACL(inode))
>> + error = richacl_chmod(inode, inode->i_mode);
>> + else
>> + error = posix_acl_chmod(inode, inode->i_mode);
>>   if (error)
>>   return error;
>>   }
>
> This sort of thing could do with a helper like:
>
> static inline int
> xfs_acl_chmod(struct inode *inode, int mode)
> {
> if (IS_RICHACL(inode))
> return richacl_chmod(inode, mode);
> return posix_acl_chmod(inode, mode);
> }

Alright.

>> +#include "xfs.h"
>> +#include "xfs_format.h"
>> +#include "xfs_log_format.h"
>> +#include "xfs_inode.h"
>> +#include "xfs_attr.h"
>> +
>> +#include 
>> +
>> +struct richacl *
>> +xfs_get_richacl(struct inode *inode)
>> +{
>
> ...
>
> If we always build in ACL support, this can all go in xfs_acl.c.
>
>> + struct xfs_inode *ip = XFS_I(inode);
>> + struct richacl *acl;
>> + int size = XATTR_SIZE_MAX;
>> + void *value;
>> + int error;
>> +
>> + value = kmem_zalloc_large(size, KM_SLEEP);
>> + if (!value)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + error = xfs_attr_get(ip, XATTR_NAME_RICHACL, value, , ATTR_ROOT);
>> + if (error) {
>> + /*
>> +  * If the attribute doesn't exist make sure we have a negative
>> +  * cache entry, for any other error assume it is transient and
>> +  * leave the cache entry as ACL_NOT_CACHED.
>> +  */
>> + acl = (error == -ENOATTR) ? NULL : ERR_PTR(error);
>
> I rather dislike ternaries in code like this - it's hard to read the
> logic. Initalise acl to NULL, and then this simply becomes:
>
> if (error != -ENOATTR)
> acl = ERR_PTR(error);

As you prefer.

Re: [PATCH v10 38/46] richacl: Add support for unmapped identifiers

2015-10-11 Thread Andreas Grünbacher

2015-10-12 2:22 GMT+02:00 Dave Chinner :
> This was used by the XFS support patch earlier in the series. Bisect
> problem here...

Yes, I'll fix that up as well.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 00/41] Richacls

2015-10-06 Thread Andreas Grünbacher

2015-10-06 11:49 GMT+02:00 James Morris :
> Where is the rationale for them?
>
> This url doesn't work: http://acl.bestbits.at/richacl/

That URL also properly redirects to http://www.bestbits.at/richacl/
now. There is some information on that website, in the manual pages in
the richacl user-space package, an in the patch descriptions and
comments.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 00/41] Richacls

2015-10-06 Thread Andreas Grünbacher

2015-10-06 11:49 GMT+02:00 James Morris :
> Where is the rationale for them?
>
> This url doesn't work: http://acl.bestbits.at/richacl/

That URL also properly redirects to http://www.bestbits.at/richacl/
now. There is some information on that website, in the manual pages in
the richacl user-space package, an in the patch descriptions and
comments.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 6/7] squashfs: xattr simplifications

2015-10-04 Thread Andreas Grünbacher

2015-10-04 8:29 GMT+02:00 Christoph Hellwig :
> IS it just me or is this handler broke before and after this patch? It
> only copies out the prefix, but not the actual xattr name.

Well, squashfs appends the rest of the name in squashfs_listxattr, so
it's not broken, just different. I have tried cleaning that up by
converting the list operation into something like this instead:

  bool (*may_list)(struct dentry *dentry);

The xattr name would always be copied into the buffer in
iop->listxattr. But security_inode_listsecurity which is called from
vfs_getxattr as well as nfs4_xattr_list_nfs4_label was getting in the
way, and it's unclear to me how to best clean that up first.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 6/7] squashfs: xattr simplifications

2015-10-04 Thread Andreas Grünbacher

2015-10-04 8:29 GMT+02:00 Christoph Hellwig :
> IS it just me or is this handler broke before and after this patch? It
> only copies out the prefix, but not the actual xattr name.

Well, squashfs appends the rest of the name in squashfs_listxattr, so
it's not broken, just different. I have tried cleaning that up by
converting the list operation into something like this instead:

  bool (*may_list)(struct dentry *dentry);

The xattr name would always be copied into the buffer in
iop->listxattr. But security_inode_listsecurity which is called from
vfs_getxattr as well as nfs4_xattr_list_nfs4_label was getting in the
way, and it's unclear to me how to best clean that up first.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 00/41] Richacls

2015-09-29 Thread Andreas Grünbacher

2015-09-28 19:46 GMT+02:00 J. Bruce Fields :
> On Mon, Sep 28, 2015 at 07:10:06PM +0200, Andreas Grünbacher wrote:
>> 2015-09-28 18:35 GMT+02:00 J. Bruce Fields :
>> > On Mon, Sep 28, 2015 at 12:08:51AM +0200, Andreas Gruenbacher wrote:
>> >> Open issues in nfs:
>> >>
>> >> * When a user or group name cannot be mapped, nfs's idmapper always maps 
>> >> it
>> >>   to nobody. That's good enough for mapping the file owner and owning
>> >>   group, but not for identifiers in acls. For now, to get the nfs richacl
>> >>   support somewhat working, I'm explicitly checking if mapping has 
>> >> resulted
>> >>   in uid/gid 99 in the kernel.
>> >>
>> >> * When the nfs server replies with NFS4ERR_BADNAME for any user or group
>> >>   name lookup, the client will stop sending numeric uids and gids to the
>> >>   server even when the lookup wasn't numeric.  From then on, the client
>> >>   will translate uids and gids that have no mapping to the string 
>> >> "nobody",
>> >>   and the server will reject them.  This problem is not specific to acls.
>> >
>> > Do you have fixes in mind for these two issues?
>>
>> I'm not sure how to best fix the idmapper problem, with backwards
>> compatibility and all.
>
> I haven't looked at the current nfsidmap interface  So it's
> completely lacking any way to communicate failure?

Yes, when a user doesn't exist, idmapper maps that to the nobody
uid/gid. That's the failure mode of stat. In the acl case, we do want
to map user and group names to their respective ids where possible (so
that the acl makes sense in the local system context), but we do want
to preserve the original user and group names when there is no such
mapping instead of mapping to the nobody uid/gid.

>> The second problem shouldn't be too hard to fix.
>
> Is it enough to turn off the failover in the case there's no possibility
> it could have been caused by a numeric id?

Yes, I believe that would be enough.

> If any user can set ACLs with arbitrary strings as names, then we'd be
> giving any user unprivileged user the ability to turn off numeric
> idmapping, so I think we need to fix that.

The bug can be triggered by unprivileged users with nfs4_setfacl.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 00/41] Richacls

2015-09-29 Thread Andreas Grünbacher

2015-09-28 19:46 GMT+02:00 J. Bruce Fields <bfie...@fieldses.org>:
> On Mon, Sep 28, 2015 at 07:10:06PM +0200, Andreas Grünbacher wrote:
>> 2015-09-28 18:35 GMT+02:00 J. Bruce Fields <bfie...@fieldses.org>:
>> > On Mon, Sep 28, 2015 at 12:08:51AM +0200, Andreas Gruenbacher wrote:
>> >> Open issues in nfs:
>> >>
>> >> * When a user or group name cannot be mapped, nfs's idmapper always maps 
>> >> it
>> >>   to nobody. That's good enough for mapping the file owner and owning
>> >>   group, but not for identifiers in acls. For now, to get the nfs richacl
>> >>   support somewhat working, I'm explicitly checking if mapping has 
>> >> resulted
>> >>   in uid/gid 99 in the kernel.
>> >>
>> >> * When the nfs server replies with NFS4ERR_BADNAME for any user or group
>> >>   name lookup, the client will stop sending numeric uids and gids to the
>> >>   server even when the lookup wasn't numeric.  From then on, the client
>> >>   will translate uids and gids that have no mapping to the string 
>> >> "nobody",
>> >>   and the server will reject them.  This problem is not specific to acls.
>> >
>> > Do you have fixes in mind for these two issues?
>>
>> I'm not sure how to best fix the idmapper problem, with backwards
>> compatibility and all.
>
> I haven't looked at the current nfsidmap interface  So it's
> completely lacking any way to communicate failure?

Yes, when a user doesn't exist, idmapper maps that to the nobody
uid/gid. That's the failure mode of stat. In the acl case, we do want
to map user and group names to their respective ids where possible (so
that the acl makes sense in the local system context), but we do want
to preserve the original user and group names when there is no such
mapping instead of mapping to the nobody uid/gid.

>> The second problem shouldn't be too hard to fix.
>
> Is it enough to turn off the failover in the case there's no possibility
> it could have been caused by a numeric id?

Yes, I believe that would be enough.

> If any user can set ACLs with arbitrary strings as names, then we'd be
> giving any user unprivileged user the ability to turn off numeric
> idmapping, so I think we need to fix that.

The bug can be triggered by unprivileged users with nfs4_setfacl.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 00/41] Richacls

2015-09-28 Thread Andreas Grünbacher

2015-09-28 18:35 GMT+02:00 J. Bruce Fields :
> On Mon, Sep 28, 2015 at 12:08:51AM +0200, Andreas Gruenbacher wrote:
>> here's another update of the richacl patch queue.  At this stage, I would
>> like to ask for final feedback so that the core and ext4 code (patches
>> 1-19) can be merged in the 4.4 merge window.  The nfsd and nfs code should
>> then go through the respective maintainer trees.
>
> I've been over the core richacl and nfsd parts very carefully, and they
> definitely look ready to me.

Thanks a lot for all that work, by the way.

>> Changes since the last posting (https://lwn.net/Articles/656704/):
>>
>> * The MAY_DELETE_SELF permission now also overrides the sticky
>>directory checks.
>>
>> * Fix the permission check algorithm to apply the owner mask instead
>>   of the group mask to user entries matching the current owner. That way,
>>   the owner will retain the permissions in those entries when creating
>>   objects with create mode 0700 and similar. (A chmod to mode 0700 already
>>   creates an owner@:rwpx::allow ace, which was hiding this bug.)
>>
>> * Fix richacl_apply_masks to properly insert deny aces when raising the
>>   permissions of the other class. The bug could be triggered by
>>   chmod'ing a group@:r::allow acl to mode 0077, for example.
>>
>> * Various cleanups and improvements to comments.
>>
>>
>> The complete patch queue is available here:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/agruen/linux-richacl.git \
>>   richacl-2015-09-28
>>
>>
>> The richacl user-space utilitites and test suite are available here:
>>
>>   https://github.com/andreas-gruenbacher/richacl/
>>
>>
>> Open issues in nfs:
>>
>> * When a user or group name cannot be mapped, nfs's idmapper always maps it
>>   to nobody. That's good enough for mapping the file owner and owning
>>   group, but not for identifiers in acls. For now, to get the nfs richacl
>>   support somewhat working, I'm explicitly checking if mapping has resulted
>>   in uid/gid 99 in the kernel.
>>
>> * When the nfs server replies with NFS4ERR_BADNAME for any user or group
>>   name lookup, the client will stop sending numeric uids and gids to the
>>   server even when the lookup wasn't numeric.  From then on, the client
>>   will translate uids and gids that have no mapping to the string "nobody",
>>   and the server will reject them.  This problem is not specific to acls.
>
> Do you have fixes in mind for these two issues?

I'm not sure how to best fix the idmapper problem, with backwards
compatibility and all. The second problem shouldn't be too hard to
fix.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 10/41] richacl: Permission check algorithm

2015-09-28 Thread Andreas Grünbacher

2015-09-28 18:29 GMT+02:00 J. Bruce Fields :
> On Mon, Sep 28, 2015 at 06:25:23PM +0200, Andreas Grünbacher wrote:
>> 2015-09-28 18:08 GMT+02:00 J. Bruce Fields :
>> > The above also skips the following group_mask application on any unix
>> > group.
>>
>> Really? How does it do that?
>
> Sorry, I meant "unix user", not "unix group"!

Indeed, that's a bit tricky. Probably worth changing if just for clarity:

-   goto entry_matches_owner;
+   if (uid_eq(current_fsuid(), inode->i_uid))
+   goto entry_matches_owner;

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 10/41] richacl: Permission check algorithm

2015-09-28 Thread Andreas Grünbacher

2015-09-28 18:08 GMT+02:00 J. Bruce Fields :
> On Mon, Sep 28, 2015 at 12:09:01AM +0200, Andreas Gruenbacher wrote:
>> + /*
>> +  * Check if the acl grants the requested access and determine which
>> +  * file class the process is in.
>> +  */
>> + richacl_for_each_entry(ace, acl) {
>> + unsigned int ace_mask = ace->e_mask;
>> +
>> + if (richace_is_inherit_only(ace))
>> + continue;
>> + if (richace_is_owner(ace)) {
>> + if (!uid_eq(current_fsuid(), inode->i_uid))
>> + continue;
>> + goto entry_matches_owner;
>> + } else if (richace_is_group(ace)) {
>> + if (!in_owning_group)
>> + continue;
>> + } else if (richace_is_unix_user(ace)) {
>> + if (!uid_eq(current_fsuid(), ace->e_id.uid))
>> + continue;
>> + goto entry_matches_owner;
>> + } else if (richace_is_unix_group(ace)) {
>> + if (!in_group_p(ace->e_id.gid))
>> + continue;
>> + } else
>> + goto entry_matches_everyone;
>> +
>> + /*
>> +  * Apply the group file mask to entries other than owner@ and
>> +  * everyone@ or user entries matching the owner.
>
> The above also skips the following group_mask application on any unix
> group.

Really? How does it do that?

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 00/41] Richacls

2015-09-28 Thread Andreas Grünbacher

2015-09-28 18:35 GMT+02:00 J. Bruce Fields :
> On Mon, Sep 28, 2015 at 12:08:51AM +0200, Andreas Gruenbacher wrote:
>> here's another update of the richacl patch queue.  At this stage, I would
>> like to ask for final feedback so that the core and ext4 code (patches
>> 1-19) can be merged in the 4.4 merge window.  The nfsd and nfs code should
>> then go through the respective maintainer trees.
>
> I've been over the core richacl and nfsd parts very carefully, and they
> definitely look ready to me.

Thanks a lot for all that work, by the way.

>> Changes since the last posting (https://lwn.net/Articles/656704/):
>>
>> * The MAY_DELETE_SELF permission now also overrides the sticky
>>directory checks.
>>
>> * Fix the permission check algorithm to apply the owner mask instead
>>   of the group mask to user entries matching the current owner. That way,
>>   the owner will retain the permissions in those entries when creating
>>   objects with create mode 0700 and similar. (A chmod to mode 0700 already
>>   creates an owner@:rwpx::allow ace, which was hiding this bug.)
>>
>> * Fix richacl_apply_masks to properly insert deny aces when raising the
>>   permissions of the other class. The bug could be triggered by
>>   chmod'ing a group@:r::allow acl to mode 0077, for example.
>>
>> * Various cleanups and improvements to comments.
>>
>>
>> The complete patch queue is available here:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/agruen/linux-richacl.git \
>>   richacl-2015-09-28
>>
>>
>> The richacl user-space utilitites and test suite are available here:
>>
>>   https://github.com/andreas-gruenbacher/richacl/
>>
>>
>> Open issues in nfs:
>>
>> * When a user or group name cannot be mapped, nfs's idmapper always maps it
>>   to nobody. That's good enough for mapping the file owner and owning
>>   group, but not for identifiers in acls. For now, to get the nfs richacl
>>   support somewhat working, I'm explicitly checking if mapping has resulted
>>   in uid/gid 99 in the kernel.
>>
>> * When the nfs server replies with NFS4ERR_BADNAME for any user or group
>>   name lookup, the client will stop sending numeric uids and gids to the
>>   server even when the lookup wasn't numeric.  From then on, the client
>>   will translate uids and gids that have no mapping to the string "nobody",
>>   and the server will reject them.  This problem is not specific to acls.
>
> Do you have fixes in mind for these two issues?

I'm not sure how to best fix the idmapper problem, with backwards
compatibility and all. The second problem shouldn't be too hard to
fix.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 10/41] richacl: Permission check algorithm

2015-09-28 Thread Andreas Grünbacher

2015-09-28 18:29 GMT+02:00 J. Bruce Fields <bfie...@fieldses.org>:
> On Mon, Sep 28, 2015 at 06:25:23PM +0200, Andreas Grünbacher wrote:
>> 2015-09-28 18:08 GMT+02:00 J. Bruce Fields <bfie...@fieldses.org>:
>> > The above also skips the following group_mask application on any unix
>> > group.
>>
>> Really? How does it do that?
>
> Sorry, I meant "unix user", not "unix group"!

Indeed, that's a bit tricky. Probably worth changing if just for clarity:

-   goto entry_matches_owner;
+   if (uid_eq(current_fsuid(), inode->i_uid))
+   goto entry_matches_owner;

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v8 10/41] richacl: Permission check algorithm

2015-09-28 Thread Andreas Grünbacher

2015-09-28 18:08 GMT+02:00 J. Bruce Fields :
> On Mon, Sep 28, 2015 at 12:09:01AM +0200, Andreas Gruenbacher wrote:
>> + /*
>> +  * Check if the acl grants the requested access and determine which
>> +  * file class the process is in.
>> +  */
>> + richacl_for_each_entry(ace, acl) {
>> + unsigned int ace_mask = ace->e_mask;
>> +
>> + if (richace_is_inherit_only(ace))
>> + continue;
>> + if (richace_is_owner(ace)) {
>> + if (!uid_eq(current_fsuid(), inode->i_uid))
>> + continue;
>> + goto entry_matches_owner;
>> + } else if (richace_is_group(ace)) {
>> + if (!in_owning_group)
>> + continue;
>> + } else if (richace_is_unix_user(ace)) {
>> + if (!uid_eq(current_fsuid(), ace->e_id.uid))
>> + continue;
>> + goto entry_matches_owner;
>> + } else if (richace_is_unix_group(ace)) {
>> + if (!in_group_p(ace->e_id.gid))
>> + continue;
>> + } else
>> + goto entry_matches_everyone;
>> +
>> + /*
>> +  * Apply the group file mask to entries other than owner@ and
>> +  * everyone@ or user entries matching the owner.
>
> The above also skips the following group_mask application on any unix
> group.

Really? How does it do that?

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v7 24/41] richacl: Set the other permissions to the other mask

2015-09-23 Thread Andreas Grünbacher

2015-09-23 16:03 GMT+02:00 J. Bruce Fields :
> On Sat, Sep 05, 2015 at 12:27:19PM +0200, Andreas Gruenbacher wrote:
>> +static int
>> +richacl_set_other_permissions(struct richacl_alloc *alloc)
>> +{
>> + struct richacl *acl = alloc->acl;
>> + unsigned int x = RICHACE_POSIX_ALWAYS_ALLOWED;
>> + unsigned int other_mask = acl->a_other_mask & ~x;
>> + struct richace *ace = acl->a_entries + acl->a_count - 1;
>> +
>> + if (!(other_mask &&
>> +   (acl->a_flags & RICHACL_WRITE_THROUGH) &&
>> +   (acl->a_flags & RICHACL_MASKED)))
>> + return 0;
>
> By the way, I think this is only called after checking MASKED--so the
> MASKED check here could be a WARN_ON, or could just be dropped.  Ditto
> for the following patch.

Hmm, yes. I'll drop that.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v7 24/41] richacl: Set the other permissions to the other mask

2015-09-23 Thread Andreas Grünbacher

2015-09-23 16:03 GMT+02:00 J. Bruce Fields :
> On Sat, Sep 05, 2015 at 12:27:19PM +0200, Andreas Gruenbacher wrote:
>> +static int
>> +richacl_set_other_permissions(struct richacl_alloc *alloc)
>> +{
>> + struct richacl *acl = alloc->acl;
>> + unsigned int x = RICHACE_POSIX_ALWAYS_ALLOWED;
>> + unsigned int other_mask = acl->a_other_mask & ~x;
>> + struct richace *ace = acl->a_entries + acl->a_count - 1;
>> +
>> + if (!(other_mask &&
>> +   (acl->a_flags & RICHACL_WRITE_THROUGH) &&
>> +   (acl->a_flags & RICHACL_MASKED)))
>> + return 0;
>
> By the way, I think this is only called after checking MASKED--so the
> MASKED check here could be a WARN_ON, or could just be dropped.  Ditto
> for the following patch.

Hmm, yes. I'll drop that.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v7 10/41] richacl: Permission check algorithm

2015-09-11 Thread Andreas Grünbacher

2015-09-11 23:16 GMT+02:00 J. Bruce Fields :
> On Sat, Sep 05, 2015 at 12:27:05PM +0200, Andreas Gruenbacher wrote:
>> + /*
>> +  * Apply the group file mask to entries other than owner@ and
>> +  * everyone@ or user entries matching the owner.  This ensures
>> +  * that we grant the same permissions as the acl computed by
>> +  * richacl_apply_masks().
>> +  *
>> +  * Without this restriction, the following richacl would grant
>> +  * rw access to processes which are both the owner and in the
>> +  * owning group, but not to other users in the owning group,
>> +  * which could not be represented without masks:
>> +  *
>> +  *  owner:rw::mask
>> +  *  group@:rw::allow
>> +  */
>> + if ((acl->a_flags & RICHACL_MASKED) && richace_is_allow(ace))
>> + ace_mask &= acl->a_group_mask;
>
> I'm having trouble understanding this.  I think the problem is that I
> don't really understand the notation in your example.  Is a_group_mask
> zero in that example?  I think it must be, in which case, OK I think I
> get it.

Yes. I'm not sure if the example becomes easier to understand when the
empty group mask and perhaps also the other mask is included.

> (Though I still have to think about it a little more to convince myself
> that richacl_apply_masks() always gets the same result.)

I have tried to break the algorithm into digestible pieces. Do you see
another way to make things easier to understand?

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v7 10/41] richacl: Permission check algorithm

2015-09-11 Thread Andreas Grünbacher

2015-09-11 23:16 GMT+02:00 J. Bruce Fields :
> On Sat, Sep 05, 2015 at 12:27:05PM +0200, Andreas Gruenbacher wrote:
>> + /*
>> +  * Apply the group file mask to entries other than owner@ and
>> +  * everyone@ or user entries matching the owner.  This ensures
>> +  * that we grant the same permissions as the acl computed by
>> +  * richacl_apply_masks().
>> +  *
>> +  * Without this restriction, the following richacl would grant
>> +  * rw access to processes which are both the owner and in the
>> +  * owning group, but not to other users in the owning group,
>> +  * which could not be represented without masks:
>> +  *
>> +  *  owner:rw::mask
>> +  *  group@:rw::allow
>> +  */
>> + if ((acl->a_flags & RICHACL_MASKED) && richace_is_allow(ace))
>> + ace_mask &= acl->a_group_mask;
>
> I'm having trouble understanding this.  I think the problem is that I
> don't really understand the notation in your example.  Is a_group_mask
> zero in that example?  I think it must be, in which case, OK I think I
> get it.

Yes. I'm not sure if the example becomes easier to understand when the
empty group mask and perhaps also the other mask is included.

> (Though I still have to think about it a little more to convince myself
> that richacl_apply_masks() always gets the same result.)

I have tried to break the algorithm into digestible pieces. Do you see
another way to make things easier to understand?

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v6 09/40] richacl: Update the file masks in chmod()

2015-09-02 Thread Andreas Grünbacher

2015-09-01 23:38 GMT+02:00 J. Bruce Fields :
> Do you need to require WRITE_THROUGH here too?

Yes, indeed.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v6 09/40] richacl: Update the file masks in chmod()

2015-09-02 Thread Andreas Grünbacher

2015-09-01 23:38 GMT+02:00 J. Bruce Fields :
> Do you need to require WRITE_THROUGH here too?

Yes, indeed.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v6 10/40] richacl: Permission check algorithm

2015-08-28 Thread Andreas Grünbacher

2015-08-28 23:49 GMT+02:00 J. Bruce Fields :
> On Tue, Aug 04, 2015 at 01:53:08PM +0200, Andreas Gruenbacher wrote:
>> + /*
>> +  * We don't care which class the process is in when the acl is
>> +  * not masked.
>> +  */
>> + in_owner_or_group_class = 1;
>
> So why bother setting it at all, then?  Oh, I see, it lets you break out
> of the loop below earlier.  OK.

Comment changed to:

/*
 * When the acl is not masked, there is no need to determine if
 * the process is in the group class and we can earlier break
 * out of the loop below.
 */

> Patch makes sense to me, ACK.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v6 03/40] vfs: Add MAY_DELETE_SELF and MAY_DELETE_CHILD permission flags

2015-08-28 Thread Andreas Grünbacher

2015-08-28 23:36 GMT+02:00 Andy Lutomirski :
> Silly question from the peanut gallery: is there any such thing as
> opening an fd pointing at a file such that the "open file description"
> (i.e. the struct file) captures the right to delete the file?
>
> IOW do we need FMODE_DELETE_SELF?

When would that permission be checked, what syscall would you use to
unlink an open file descriptor?

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v6 10/40] richacl: Permission check algorithm

2015-08-28 Thread Andreas Grünbacher

2015-08-28 23:49 GMT+02:00 J. Bruce Fields bfie...@fieldses.org:
 On Tue, Aug 04, 2015 at 01:53:08PM +0200, Andreas Gruenbacher wrote:
 + /*
 +  * We don't care which class the process is in when the acl is
 +  * not masked.
 +  */
 + in_owner_or_group_class = 1;

 So why bother setting it at all, then?  Oh, I see, it lets you break out
 of the loop below earlier.  OK.

Comment changed to:

/*
 * When the acl is not masked, there is no need to determine if
 * the process is in the group class and we can earlier break
 * out of the loop below.
 */

 Patch makes sense to me, ACK.

Thanks,
Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v6 03/40] vfs: Add MAY_DELETE_SELF and MAY_DELETE_CHILD permission flags

2015-08-28 Thread Andreas Grünbacher

2015-08-28 23:36 GMT+02:00 Andy Lutomirski l...@amacapital.net:
 Silly question from the peanut gallery: is there any such thing as
 opening an fd pointing at a file such that the open file description
 (i.e. the struct file) captures the right to delete the file?

 IOW do we need FMODE_DELETE_SELF?

When would that permission be checked, what syscall would you use to
unlink an open file descriptor?

Thanks,
Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 39/39] nfs: Add support for the v4.1 dacl attribute

2015-07-22 Thread Andreas Grünbacher

Hi Anna,

2015-07-22 17:57 GMT+02:00 Anna Schumaker :
> Discussions around deprecating or removing the minorversion= argument have 
> come up recently.
> The preferred way of mounting NFS v4.1 these days is with: "-o vers=4.1".  
> Can you update your
> patch description so we don't confuse people? :)

sure, that was easy :)

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v5 39/39] nfs: Add support for the v4.1 dacl attribute

2015-07-22 Thread Andreas Grünbacher

Hi Anna,

2015-07-22 17:57 GMT+02:00 Anna Schumaker anna.schuma...@netapp.com:
 Discussions around deprecating or removing the minorversion= argument have 
 come up recently.
 The preferred way of mounting NFS v4.1 these days is with: -o vers=4.1.  
 Can you update your
 patch description so we don't confuse people? :)

sure, that was easy :)

Thanks,
Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v4 06/31] richacl: In-memory representation and helper functions

2015-06-26 Thread Andreas Grünbacher

2015-06-25 23:40 GMT+02:00 Stefan (metze) Metzmacher :
>>> I'm wondering if the size of an ace should be dynamic,
>>> which might make it possible to support other ace types
>>> in future. E.g. supporting other identities like 128-bit values
>>> to make it easier to map Windows SIDS.
>>
>> I'm working on additionally supporting unmapped user@domain and
>> group@domain identifier strings; we have to deal with that case in the
>> nfs client; that may be useful for Samba as well.
>
> Can this be any string? So would
> "S-1-5-21-4052121579-2079768045-1474639452-1001" also work?

I don't see why not, we'd just need to prevent namespace clashes.

> How would the current thread/process get a "token" that would match such
> an ace?

Solaris seems to solve this by what they call ephemeral ids; that concept may
become useful.

> [...]
> In general shouldn't kuid_t uid = current_fsuid(); be at the top of the
> function just once?

It really is just a pointer dereference.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v4 06/31] richacl: In-memory representation and helper functions

2015-06-26 Thread Andreas Grünbacher

2015-06-25 23:40 GMT+02:00 Stefan (metze) Metzmacher me...@samba.org:
 I'm wondering if the size of an ace should be dynamic,
 which might make it possible to support other ace types
 in future. E.g. supporting other identities like 128-bit values
 to make it easier to map Windows SIDS.

 I'm working on additionally supporting unmapped user@domain and
 group@domain identifier strings; we have to deal with that case in the
 nfs client; that may be useful for Samba as well.

 Can this be any string? So would
 S-1-5-21-4052121579-2079768045-1474639452-1001 also work?

I don't see why not, we'd just need to prevent namespace clashes.

 How would the current thread/process get a token that would match such
 an ace?

Solaris seems to solve this by what they call ephemeral ids; that concept may
become useful.

 [...]
 In general shouldn't kuid_t uid = current_fsuid(); be at the top of the
 function just once?

It really is just a pointer dereference.

Thanks,
Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v4 06/31] richacl: In-memory representation and helper functions

2015-06-25 Thread Andreas Grünbacher

Hi,

2015-06-25 21:58 GMT+02:00 Stefan (metze) Metzmacher :
> Is that also the on disk representation?

that's not the xattr representation, no.

> I'm wondering if the size of an ace should be dynamic,
> which might make it possible to support other ace types
> in future. E.g. supporting other identities like 128-bit values
> to make it easier to map Windows SIDS.

I'm working on additionally supporting unmapped user@domain and
group@domain identifier strings; we have to deal with that case in the
nfs client; that may be useful for Samba as well.

> Even without 128-bit ids, it would be very useful to mark an
> ace so that it applies to a uid or gid at the same time.
> This would reduce the size of the ace list when Samba uses
> IDMAP_TYPE_BOTH, which means a SID is mapped to a unix id, which
> is user (uid) and group (gid) at the same time. This feature is required
> in order to support SID-Histories on accounts.
> Currently Samba needs to add two aces (one uid and one gid)
> in order to represent one Windows ace.

It's not clear to me if supporting this would be a good idea right now.
The kernel would have to treat each such entry like two separate entries
internally. How would we map a combined user-space "uid + gid"
number to a kernel uid and gid since it could map to two distinct
numbers there?

> I haven't looked at the claims based acls on Windows, but it would be
> good if the new infrastructure is dynamic enough to support something
> like that in a future version.

I don't know, I have yet to see a use case that isn't totally crazy.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v4 06/31] richacl: In-memory representation and helper functions

2015-06-25 Thread Andreas Grünbacher

Hi,

2015-06-25 21:58 GMT+02:00 Stefan (metze) Metzmacher me...@samba.org:
 Is that also the on disk representation?

that's not the xattr representation, no.

 I'm wondering if the size of an ace should be dynamic,
 which might make it possible to support other ace types
 in future. E.g. supporting other identities like 128-bit values
 to make it easier to map Windows SIDS.

I'm working on additionally supporting unmapped user@domain and
group@domain identifier strings; we have to deal with that case in the
nfs client; that may be useful for Samba as well.

 Even without 128-bit ids, it would be very useful to mark an
 ace so that it applies to a uid or gid at the same time.
 This would reduce the size of the ace list when Samba uses
 IDMAP_TYPE_BOTH, which means a SID is mapped to a unix id, which
 is user (uid) and group (gid) at the same time. This feature is required
 in order to support SID-Histories on accounts.
 Currently Samba needs to add two aces (one uid and one gid)
 in order to represent one Windows ace.

It's not clear to me if supporting this would be a good idea right now.
The kernel would have to treat each such entry like two separate entries
internally. How would we map a combined user-space uid + gid
number to a kernel uid and gid since it could map to two distinct
numbers there?

 I haven't looked at the claims based acls on Windows, but it would be
 good if the new infrastructure is dynamic enough to support something
 like that in a future version.

I don't know, I have yet to see a use case that isn't totally crazy.

Thanks,
Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] vfs: Minor documentation fix

2015-06-04 Thread Andreas Grünbacher

Jon,

2015-06-05 0:46 GMT+02:00 Jonathan Corbet :
> I was going to toss this into the docs tree, but it won't apply even
> after correcting for the line wrapping and white-space mangling.  Care to
> try again?

Sorry. I'll remember not to send via the Drafts folder with git-imap-send again.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] vfs: Minor documentation fix

2015-06-04 Thread Andreas Grünbacher

Jon,

2015-06-05 0:46 GMT+02:00 Jonathan Corbet cor...@lwn.net:
 I was going to toss this into the docs tree, but it won't apply even
 after correcting for the line wrapping and white-space mangling.  Care to
 try again?

Sorry. I'll remember not to send via the Drafts folder with git-imap-send again.

Thanks,
Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] vfs: Minor documentation fix

2015-06-01 Thread Andreas Grünbacher

The check_acl inode operation and the IPERM_FLAG_RCU flag are long gone; update
the documentation.

Signed-off-by: Andreas Gruenbacher 
---
 Documentation/filesystems/porting | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/porting
b/Documentation/filesystems/porting
index e69274d..7a34c7e 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -379,10 +379,10 @@ may now be called in rcu-walk mode (nd->flags &
LOOKUP_RCU). -ECHILD should be
 returned if the filesystem cannot handle rcu-walk. See
 Documentation/filesystems/vfs.txt for more details.

- permission and check_acl are inode permission checks that are called
-on many or all directory inodes on the way down a path walk (to check for
-exec permission). These must now be rcu-walk aware (flags & IPERM_FLAG_RCU).
-See Documentation/filesystems/vfs.txt for more details.
+ permission is an inode permission check that is called on many or all
+directory inodes on the way down a path walk (to check for exec permission). It
+must now be rcu-walk aware (mask & MAY_NOT_BLOCK).  See
+Documentation/filesystems/vfs.txt for more details.

 --
 [mandatory]
-- 
2.4.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] vfs: Minor documentation fix

2015-06-01 Thread Andreas Grünbacher

The check_acl inode operation and the IPERM_FLAG_RCU flag are long gone; update
the documentation.

Signed-off-by: Andreas Gruenbacher agrue...@redhat.com
---
 Documentation/filesystems/porting | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/porting
b/Documentation/filesystems/porting
index e69274d..7a34c7e 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -379,10 +379,10 @@ may now be called in rcu-walk mode (nd-flags 
LOOKUP_RCU). -ECHILD should be
 returned if the filesystem cannot handle rcu-walk. See
 Documentation/filesystems/vfs.txt for more details.

- permission and check_acl are inode permission checks that are called
-on many or all directory inodes on the way down a path walk (to check for
-exec permission). These must now be rcu-walk aware (flags  IPERM_FLAG_RCU).
-See Documentation/filesystems/vfs.txt for more details.
+ permission is an inode permission check that is called on many or all
+directory inodes on the way down a path walk (to check for exec permission). It
+must now be rcu-walk aware (mask  MAY_NOT_BLOCK).  See
+Documentation/filesystems/vfs.txt for more details.

 --
 [mandatory]
-- 
2.4.0
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v3 41/45] rpc: Allow to demand-allocate pages to encode into

2015-05-29 Thread Andreas Grünbacher

2015-05-29 1:24 GMT+02:00 Trond Myklebust :
> xdr_get_next_encode() should return NULL on failure, not ENOMEM.
> Why is this trying to do a GFP_KERNEL allocation inside an XDR routine
> anyway? That's not an I/O safe sleep.

Fixed, thanks.

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v3 42/45] nfs: Add richacl support

2015-05-29 Thread Andreas Grünbacher

2015-05-29 17:24 GMT+02:00 Trond Myklebust :
>> It seems unreasonable to me to expect applications other than special file
>> system maintenance tools to cater to such file system differences; there are
>> just too many file systems out there for that to work. Instead, it
>> would be better
>> to use an interface that can be generalized across file systems.
>
> My point is that system.richacl is not such an interface. It can only
> ever work for local filesystems that understand and store local uids
> and gids. It has no support for the remote users/groups that are
> stored on your NFS/SMB server unless they happen to have a local
> mapping into uids and gids, and so the API is inappropriate to replace
> the existing NFSv4 acl API on the client.

That can be changed if we find a reasonable solution.

>>> The problem is that you are 100% reliant on an accurate idmapper in
>>> order to convert the name@domain to a _correct_ uid/gid. It isn't
>>> sufficient to convert to just any valid uid/gid, because if your ACL
>>> tool is trying to edit the ACL, you can end up converting all those
>>> DENY modes for user 'johnny_rot...@blackhats.are.us' into DENY modes
>>> for user 'nobody'.
>>> ...and yes, libnfsidmap will happily convert all unknown
>>> user/groupnames into whatever uid/gid corresponds to 'nobody' without
>>> returning an error.
>>
>> That's indeed a problem, and I can think of two ways of addressing it:
>>
>> First, acl editors need to be careful about nobody entries; they need to be
>> aware that nobody could actually stand for somebody else. (We could map
>> unmappable users and groups to something else than nobody, but that
>> might just shift the problem without improve anything.)
>
> What is the editor supposed to do about an entry it cannot even
> interpret, let alone store back to the server?

In the end, it will have to be a user decision what to do about such entries,
the editor can warn the user and make it harder to make mistakes though.

>> Second, we could add support for passing through unmappable Who values
>> as is. But that raises the problem of how to pass thtough and represent
>> different kinds of identifiers: NFSv4 users user@domain and group@domain
>> strings; smb users SIDs; maybe more.
>
> Now you have a mixture of some stuff that is being translated into
> local uid/gid format and therefore needs translating back when you're
> going to update the ACL, and some stuff that is not. What is the value
> of doing the mapping here?

If you don't translate, you cannot copy the permissions to another file
system. In the ideal case, everything can be translated; where that fails,
the user probably wants to know.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v3 37/45] nfs/sunrpc: No more encode and decode function pointer casting

2015-05-29 Thread Andreas Grünbacher

2015-05-29 2:37 GMT+02:00 Trond Myklebust :
> On Thu, May 28, 2015 at 7:40 PM, Andreas Grünbacher
>  wrote:
>> 2015-05-29 1:11 GMT+02:00 Trond Myklebust :
>>> How is this even remotely relevant to ACL functionality, and why does
>>> it deserve to bypass the NFS tree?
>>
>> I've posted this to the linux-nfs mailing list for review among
>> others, how is that
>> bypassing the NFS tree? Would you prefer those things sent to you personally
>> as well?
>
> No. I'm saying that changes that affect the core RPC code should not
> be going through external trees as part of an external feature; they
> should go through the maintainer trees.

I agree, none of that is going to get merged directly.

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v3 42/45] nfs: Add richacl support

2015-05-29 Thread Andreas Grünbacher

2015-05-29 15:15 GMT+02:00 Trond Myklebust :
> [reply reordered]
> So having revisited the reasons why I chose the system.nfs4_acl
> interface when we did NFSv4 ACLs, I'm not sure we should implement
> system.richacl for the NFS client at all.
>
> Your assertion that "when symbolic user@domain and group@domain names
> are used in the acl, user-space needs to perform ID mapping in the
> same way as the kernel"  is WRONG. User space needs do no such thing,
> and that was the whole point of the interface; to allow the user to
> specify ACLs in a format that is checked only on the _server_, and not
> on the client.

That's only half true. Right now, user-space applications trying to copy
permissions between an nfs mount and another file system will fail unless the
application has explicitly been made nfs aware and supports the
"system.nfs4_acl"
attribute (as well as some other acl mechanism if the permissions go beyond the
file mode).

The same problem exists when trying to make sense of acls.

It seems unreasonable to me to expect applications other than special file
system maintenance tools to cater to such file system differences; there are
just too many file systems out there for that to work. Instead, it
would be better
to use an interface that can be generalized across file systems.

> The problem is that you are 100% reliant on an accurate idmapper in
> order to convert the name@domain to a _correct_ uid/gid. It isn't
> sufficient to convert to just any valid uid/gid, because if your ACL
> tool is trying to edit the ACL, you can end up converting all those
> DENY modes for user 'johnny_rot...@blackhats.are.us' into DENY modes
> for user 'nobody'.
> ...and yes, libnfsidmap will happily convert all unknown
> user/groupnames into whatever uid/gid corresponds to 'nobody' without
> returning an error.

That's indeed a problem, and I can think of two ways of addressing it:

First, acl editors need to be careful about nobody entries; they need to be
aware that nobody could actually stand for somebody else. (We could map
unmappable users and groups to something else than nobody, but that
might just shift the problem without improve anything.)

Second, we could add support for passing through unmappable Who values
as is. But that raises the problem of how to pass thtough and represent
different kinds of identifiers: NFSv4 users user@domain and group@domain
strings; smb users SIDs; maybe more.

Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v3 37/45] nfs/sunrpc: No more encode and decode function pointer casting

2015-05-29 Thread Andreas Grünbacher

2015-05-29 2:37 GMT+02:00 Trond Myklebust trond.mykleb...@primarydata.com:
 On Thu, May 28, 2015 at 7:40 PM, Andreas Grünbacher
 andreas.gruenbac...@gmail.com wrote:
 2015-05-29 1:11 GMT+02:00 Trond Myklebust trond.mykleb...@primarydata.com:
 How is this even remotely relevant to ACL functionality, and why does
 it deserve to bypass the NFS tree?

 I've posted this to the linux-nfs mailing list for review among
 others, how is that
 bypassing the NFS tree? Would you prefer those things sent to you personally
 as well?

 No. I'm saying that changes that affect the core RPC code should not
 be going through external trees as part of an external feature; they
 should go through the maintainer trees.

I agree, none of that is going to get merged directly.

Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v3 41/45] rpc: Allow to demand-allocate pages to encode into

2015-05-29 Thread Andreas Grünbacher

2015-05-29 1:24 GMT+02:00 Trond Myklebust trond.mykleb...@primarydata.com:
 xdr_get_next_encode() should return NULL on failure, not ENOMEM.
 Why is this trying to do a GFP_KERNEL allocation inside an XDR routine
 anyway? That's not an I/O safe sleep.

Fixed, thanks.

Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC v3 42/45] nfs: Add richacl support

2015-05-29 Thread Andreas Grünbacher

2015-05-29 15:15 GMT+02:00 Trond Myklebust trond.mykleb...@primarydata.com:
 [reply reordered]
 So having revisited the reasons why I chose the system.nfs4_acl
 interface when we did NFSv4 ACLs, I'm not sure we should implement
 system.richacl for the NFS client at all.

 Your assertion that when symbolic user@domain and group@domain names
 are used in the acl, user-space needs to perform ID mapping in the
 same way as the kernel  is WRONG. User space needs do no such thing,
 and that was the whole point of the interface; to allow the user to
 specify ACLs in a format that is checked only on the _server_, and not
 on the client.

That's only half true. Right now, user-space applications trying to copy
permissions between an nfs mount and another file system will fail unless the
application has explicitly been made nfs aware and supports the
system.nfs4_acl
attribute (as well as some other acl mechanism if the permissions go beyond the
file mode).

The same problem exists when trying to make sense of acls.

It seems unreasonable to me to expect applications other than special file
system maintenance tools to cater to such file system differences; there are
just too many file systems out there for that to work. Instead, it
would be better
to use an interface that can be generalized across file systems.

 The problem is that you are 100% reliant on an accurate idmapper in
 order to convert the name@domain to a _correct_ uid/gid. It isn't
 sufficient to convert to just any valid uid/gid, because if your ACL
 tool is trying to edit the ACL, you can end up converting all those
 DENY modes for user 'johnny_rot...@blackhats.are.us' into DENY modes
 for user 'nobody'.
 ...and yes, libnfsidmap will happily convert all unknown
 user/groupnames into whatever uid/gid corresponds to 'nobody' without
 returning an error.

That's indeed a problem, and I can think of two ways of addressing it:

First, acl editors need to be careful about nobody entries; they need to be
aware that nobody could actually stand for somebody else. (We could map
unmappable users and groups to something else than nobody, but that
might just shift the problem without improve anything.)

Second, we could add support for passing through unmappable Who values
as is. But that raises the problem of how to pass thtough and represent
different kinds of identifiers: NFSv4 users user@domain and group@domain
strings; smb users SIDs; maybe more.

Thanks,
Andreas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 >

1 - 100 of 145 matches

Mail list logo