Re: [Cluster-devel] [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Damien Le Moal
On 7/28/23 07:59, Dave Chinner wrote:
> On Thu, Jul 27, 2023 at 07:20:46PM +0900, Damien Le Moal wrote:
>> On 7/27/23 17:55, Qi Zheng wrote:
>>>>>   goto err;
>>>>>   }
>>>>>
>>>>> +    zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
>>>>> +    zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
>>>>> +    zmd->mblk_shrinker->seeks = DEFAULT_SEEKS;
>>>>> +    zmd->mblk_shrinker->private_data = zmd;
>>>>> +
>>>>> +    shrinker_register(zmd->mblk_shrinker);
>>>>
>>>> I fail to see how this new shrinker API is better... Why isn't there a
>>>> shrinker_alloc_and_register() function ? That would avoid adding all this
>>>> code all over the place as the new API call would be very similar to the
>>>> current shrinker_register() call with static allocation.
>>>
>>> In some registration scenarios, memory needs to be allocated in advance,
>>> so we continue to use the previous prealloc/register_prepared() scheme.
>>> shrinker_alloc_and_register() would just be a helper combining the two,
>>> and it would increase the number of APIs that the shrinker code exposes,
>>> so I chose not to add this helper.
>>
>> And that results in more code in many places instead of less code + a simple
>> inline helper in the shrinker header file...
> 
> It's not just a "simple helper" - it's a function that has to take 6
> or 7 parameters with a return value that must be checked and
> handled.
> 
> This was done in the first versions of the patch set - the amount of
> code in each caller does not go down and, IMO, it was much harder to
> read and determine "this is obviously correct" than what we have
> now.
> 
>> So not adding that super simple
>> helper is not exactly the best choice in my opinion.
> 
> Each to their own - I much prefer the existing style/API over having
> to go look up a helper function every time I want to check some
> random shrinker has been set up correctly.

OK. All fair points.
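
For reference, a combined helper along the lines being debated might look as
follows. This is a sketch only, assuming the shrinker_alloc()/shrinker_register()
API introduced by this series; it is not code from the patch:

static inline struct shrinker *
shrinker_alloc_and_register(unsigned long (*count)(struct shrinker *,
						   struct shrink_control *),
			    unsigned long (*scan)(struct shrinker *,
						  struct shrink_control *),
			    int seeks, void *private_data,
			    unsigned int flags, const char *name)
{
	/* Allocate the dynamic shrinker, as the series does. */
	struct shrinker *s = shrinker_alloc(flags, "%s", name);

	if (!s)
		return NULL;

	s->count_objects = count;
	s->scan_objects = scan;
	s->seeks = seeks;
	s->private_data = private_data;
	shrinker_register(s);
	return s;
}

Six parameters plus a NULL return to check at every call site - the trade-off
Dave describes above.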


-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH v2 92/92] fs: rename i_ctime field to __i_ctime

2023-07-05 Thread Damien Le Moal
On 7/6/23 03:58, Jeff Layton wrote:
> Now that everything in-tree is converted to use the accessor functions,
> rename the i_ctime field in the inode to discourage direct access.
> 
> Signed-off-by: Jeff Layton 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH v2 08/92] fs: new helper: simple_rename_timestamp

2023-07-05 Thread Damien Le Moal
On 7/6/23 03:58, Jeff Layton wrote:
> A rename potentially involves updating 4 different inode timestamps. Add
> a function that handles the details sanely, and convert the libfs.c
> callers to use it.
> 
> Signed-off-by: Jeff Layton 
> ---
>  fs/libfs.c | 36 +++-
>  include/linux/fs.h |  2 ++
>  2 files changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index a7e56baf8bbd..9ee79668c909 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -692,6 +692,31 @@ int simple_rmdir(struct inode *dir, struct dentry *dentry)
>  }
>  EXPORT_SYMBOL(simple_rmdir);
>  
> +/**
> + * simple_rename_timestamp - update the various inode timestamps for rename
> + * @old_dir: old parent directory
> + * @old_dentry: dentry that is being renamed
> + * @new_dir: new parent directory
> + * @new_dentry: target for rename
> + *
> + * POSIX mandates that the old and new parent directories have their ctime
> + * and mtime updated, and that inodes of @old_dentry and @new_dentry (if
> + * any), have their ctime updated.
> + */
> +void simple_rename_timestamp(struct inode *old_dir, struct dentry *old_dentry,
> +			     struct inode *new_dir, struct dentry *new_dentry)
> +{
> + struct inode *newino = d_inode(new_dentry);
> +
> + old_dir->i_mtime = inode_set_ctime_current(old_dir);
> + if (new_dir != old_dir)
> + new_dir->i_mtime = inode_set_ctime_current(new_dir);
> + inode_set_ctime_current(d_inode(old_dentry));
> + if (newino)
> + inode_set_ctime_current(newino);
> +}
> +EXPORT_SYMBOL_GPL(simple_rename_timestamp);
> +
>  int simple_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
>  struct inode *new_dir, struct dentry *new_dentry)
>  {
> @@ -707,11 +732,7 @@ int simple_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
>   inc_nlink(old_dir);
>   }
>   }
> - old_dir->i_ctime = old_dir->i_mtime =
> - new_dir->i_ctime = new_dir->i_mtime =
> - d_inode(old_dentry)->i_ctime =
> - d_inode(new_dentry)->i_ctime = current_time(old_dir);
> -
> + simple_rename_timestamp(old_dir, old_dentry, new_dir, new_dentry);

This is somewhat changing the current behavior: before the patch, the mtime and
ctime of old_dir, new_dir and the inodes associated with the dentries are always
equal. But given that simple_rename_timestamp() calls inode_set_ctime_current()
4 times, the times could potentially differ.

I am not sure if that is an issue, but calling inode_set_ctime_current() once,
recording the "now" time it sets, and using that value to set all the other
timestamps would be more efficient and would preserve the existing behavior.
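
A sketch of that single-timestamp variant - assuming an
inode_set_ctime_to_ts()-style setter is available; this is not the code from
the patch quoted here:

void simple_rename_timestamp(struct inode *old_dir, struct dentry *old_dentry,
			     struct inode *new_dir, struct dentry *new_dentry)
{
	struct inode *newino = d_inode(new_dentry);
	struct timespec64 now;

	/* Sample the clock once so that all four timestamps stay equal. */
	now = inode_set_ctime_current(old_dir);
	old_dir->i_mtime = now;
	if (new_dir != old_dir)
		new_dir->i_mtime = inode_set_ctime_to_ts(new_dir, now);
	inode_set_ctime_to_ts(d_inode(old_dentry), now);
	if (newino)
		inode_set_ctime_to_ts(newino, now);
}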

>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(simple_rename_exchange);
> @@ -720,7 +741,6 @@ int simple_rename(struct mnt_idmap *idmap, struct inode *old_dir,
> struct dentry *old_dentry, struct inode *new_dir,
> struct dentry *new_dentry, unsigned int flags)
>  {
> - struct inode *inode = d_inode(old_dentry);
>   int they_are_dirs = d_is_dir(old_dentry);
>  
>   if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
> @@ -743,9 +763,7 @@ int simple_rename(struct mnt_idmap *idmap, struct inode *old_dir,
>   inc_nlink(new_dir);
>   }
>  
> - old_dir->i_ctime = old_dir->i_mtime = new_dir->i_ctime =
> - new_dir->i_mtime = inode->i_ctime = current_time(old_dir);
> -
> + simple_rename_timestamp(old_dir, old_dentry, new_dir, new_dentry);
>   return 0;
>  }
>  EXPORT_SYMBOL(simple_rename);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index bdfbd11a5811..14e38bd900f1 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2979,6 +2979,8 @@ extern int simple_open(struct inode *inode, struct file *file);
>  extern int simple_link(struct dentry *, struct inode *, struct dentry *);
>  extern int simple_unlink(struct inode *, struct dentry *);
>  extern int simple_rmdir(struct inode *, struct dentry *);
> +void simple_rename_timestamp(struct inode *old_dir, struct dentry *old_dentry,
> +			     struct inode *new_dir, struct dentry *new_dentry);
>  extern int simple_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
> 				  struct inode *new_dir, struct dentry *new_dentry);
>  extern int simple_rename(struct mnt_idmap *, struct inode *,

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH v2 07/92] fs: add ctime accessors infrastructure

2023-07-05 Thread Damien Le Moal
On 7/6/23 03:58, Jeff Layton wrote:
> struct timespec64 has unused bits in the tv_nsec field that can be used
> for other purposes. In future patches, we're going to change how the
> inode->i_ctime is accessed in certain inodes in order to make use of
> them. In order to do that safely though, we'll need to eradicate raw
> accesses of the inode->i_ctime field from the kernel.
> 
> Add new accessor functions for the ctime that we can use to replace them.
> 
> Reviewed-by: Jan Kara 
> Reviewed-by: Luis Chamberlain 
> Signed-off-by: Jeff Layton 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 01/79] fs: add ctime accessors infrastructure

2023-06-21 Thread Damien Le Moal
On 6/21/23 23:45, Jeff Layton wrote:
> struct timespec64 has unused bits in the tv_nsec field that can be used
> for other purposes. In future patches, we're going to change how the
> inode->i_ctime is accessed in certain inodes in order to make use of
> them. In order to do that safely though, we'll need to eradicate raw
> accesses of the inode->i_ctime field from the kernel.
> 
> Add new accessor functions for the ctime that we can use to replace them.
> 
> Signed-off-by: Jeff Layton 

[...]

> +/**
> + * inode_ctime_peek - fetch the current ctime from the inode
> + * @inode: inode from which to fetch ctime
> + *
> + * Grab the current ctime from the inode and return it.
> + */
> +static inline struct timespec64 inode_ctime_peek(const struct inode *inode)

To be consistent with inode_ctime_set(), why not call this one inode_ctime_get()?
Also, inode_set_ctime() & inode_get_ctime() may be a little more natural. No
strong opinion about that though.
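
For illustration, the symmetric naming suggested here would read as follows -
a hypothetical sketch against the pre-conversion i_ctime field, not the
patch's code:

/* Hypothetical symmetric accessors, for illustration only. */
static inline struct timespec64 inode_get_ctime(const struct inode *inode)
{
	return inode->i_ctime;
}

static inline struct timespec64 inode_set_ctime(struct inode *inode,
						struct timespec64 ts)
{
	inode->i_ctime = ts;
	return ts;
}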

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper

2023-05-31 Thread Damien Le Moal
On 5/31/23 16:50, Christoph Hellwig wrote:
> Add a helper dealing with handling the syncing of a buffered write fallback
> for direct I/O.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info

2023-05-31 Thread Damien Le Moal
On 5/31/23 16:50, Christoph Hellwig wrote:
> The last user of current->backing_dev_info disappeared in commit
> b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
> functions").  Remove the field and all assignments to it.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Hannes Reinecke 
> Reviewed-by: Darrick J. Wong 

Nice cleanup !

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 11/13] fuse: update ki_pos in fuse_perform_write

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Both callers of fuse_perform_write need to update ki_pos; move it into
> common code.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 12/13] fuse: drop redundant arguments to fuse_perform_write

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> pos is always equal to iocb->ki_pos, and mapping is always equal to
> iocb->ki_filp->f_mapping.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 04/13] filemap: add a kiocb_write_and_wait helper

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Factor out a helper that does filemap_write_and_wait_range for a the

for a the -> for the

> range covered by a read kiocb, or returns -EAGAIN if the kiocb
> is marked as nowait and there would be pages to write.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 01/13] iomap: update ki_pos a little later in iomap_dio_complete

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Move the ki_pos update down a bit to prepare for a better common
> helper that invalidates pages based on an iocb.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK to me.

Reviewed-by: Damien Le Moal 

> + if (dio->flags & IOMAP_DIO_NEED_SYNC)
> + ret = generic_write_sync(iocb, ret);
> + if (ret > 0)
> + ret += dio->done_before;
> + }
>   trace_iomap_dio_complete(iocb, dio->error, ret);
>   kfree(dio);
> -

Blank line change. Personally, I like a blank line before returns to make them
stand out :)

>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_dio_complete);

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 13/13] fuse: use direct_write_fallback

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Use the generic direct_write_fallback helper instead of duplicating the
> logic.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 02/13] filemap: update ki_pos in generic_perform_write

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> All callers of generic_perform_write need to update ki_pos; move it into
> common code.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 05/13] filemap: add a kiocb_invalidate_pages helper

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Factor out a helper that calls filemap_write_and_wait_range and
> invalidate_inode_pages2_rangefor a the range covered by a write kiocb or

invalidate_inode_pages2_rangefor a the range
->
invalidate_inode_pages2_range for the range

> returns -EAGAIN if the kiocb is marked as nowait and there would be pages
> to write or invalidate.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 09/13] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Use the common helpers for direct I/O page invalidation instead of
> open coding the logic.  This leads to a slight reordering of checks
> in __iomap_dio_rw to keep the logic straight.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 07/13] iomap: update ki_pos in iomap_file_buffered_write

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> All callers of iomap_file_buffered_write need to update ki_pos; move it
> into common code.
> 
> Signed-off-by: Christoph Hellwig 

One nit below.

Acked-by: Damien Le Moal 

> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 063133ec77f49e..550525a525c45c 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>   .len= iov_iter_count(i),
>   .flags  = IOMAP_WRITE,
>   };
> - int ret;
> + ssize_t ret;
>  
>   if (iocb->ki_flags & IOCB_NOWAIT)
>   iter.flags |= IOMAP_NOWAIT;
>  
>   while ((ret = iomap_iter(&iter, ops)) > 0)
>   iter.processed = iomap_write_iter(&iter, i);
> - if (iter.pos == iocb->ki_pos)
> +
> + if (unlikely(ret < 0))

Nit: This could be if (unlikely(ret <= 0)), no ?

>   return ret;
> - return iter.pos - iocb->ki_pos;
> +     ret = iter.pos - iocb->ki_pos;
> + iocb->ki_pos += ret;
> + return ret;


-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 03/13] filemap: assign current->backing_dev_info in generic_perform_write

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Move the assignment to current->backing_dev_info from the callers into
> generic_perform_write to reduce boiler plate code and reduce the scope
> to just around the page dirtying loop.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 10/13] fs: factor out a direct_write_fallback helper

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Add a helper dealing with handling the syncing of a buffered write fallback
> for direct I/O.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK. One comment below.

Reviewed-by: Damien Le Moal 

> + /*
> +  * We need to ensure that the page cache pages are written to disk and
> +  * invalidated to preserve the expected O_DIRECT semantics.
> +  */
> + end = pos + buffered_written - 1;
> + err = filemap_write_and_wait_range(mapping, pos, end);
> + if (err < 0) {
> + /*
> +  * We don't know how much we wrote, so just return the number of
> +  * bytes which were direct-written
> +  */
> + return err;
> + }
> + invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
> + return direct_written + buffered_written;

Why not add here something like:

	if (buffered_written != iov_iter_count(from))
		return -EIO;

	return direct_written + buffered_written;

to have the same semantics as plain DIO?

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 08/13] iomap: assign current->backing_dev_info in iomap_file_buffered_write

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Move the assignment to current->backing_dev_info from the callers into
> iomap_file_buffered_write to reduce boiler plate code and reduce the
> scope to just around the page dirtying loop.
> 
> Note that zonefs was missing this assignment before.

Hmm... Shouldn't this be fixed as a separate patch with a Fixes tag for this
cycle? I have never noticed any issues with this missing though; not sure how
an issue can be triggered with this assignment missing.

Apart from that, this patch looks good to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 06/13] filemap: add a kiocb_invalidate_post_write helper

2023-05-22 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Add a helper to invalidate page cache after a dio write.
> 
> Signed-off-by: Christoph Hellwig 

Nit: kiocb_invalidate_post_dio_write() may be a better name, to make it
explicit that this helper is for DIOs only?
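
For context, the helper under discussion boils down to something like the
sketch below, using the name suggested above. This assumes iocb->ki_pos has
already been advanced past the write, and error handling (e.g. warning on a
failed invalidation) is elided:

static void kiocb_invalidate_post_dio_write(struct kiocb *iocb, size_t count)
{
	struct address_space *mapping = iocb->ki_filp->f_mapping;

	/* Drop page cache pages covering the just-written byte range. */
	invalidate_inode_pages2_range(mapping,
				      (iocb->ki_pos - count) >> PAGE_SHIFT,
				      (iocb->ki_pos - 1) >> PAGE_SHIFT);
}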

Otherwise looks ok to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 18/19] dm-crypt: check if adding pages to clone bio fails

2023-03-29 Thread Damien Le Moal
On 3/30/23 09:17, Yang Shi wrote:
> On Wed, Mar 29, 2023 at 4:49 PM Damien Le Moal
>  wrote:
>>
>> On 3/30/23 02:06, Johannes Thumshirn wrote:
>>> Check if adding pages to the clone bio fails and, if so, bail out.
>>
>> Nope. The code retries with direct reclaim until it succeeds. Which is very
>> suspicious...
> 
> It is not related to bio_add_page() failure. It is used to avoid a
> race condition when two processes are depleting the mempool
> simultaneously.
> 
> IIUC, page merging should not happen for dm-crypt, so is
> __bio_add_page() good enough? I'm working on this code too, and using
> __bio_add_page() would make my patch easier.

If the BIO was allocated with enough bvecs, we could use that function. But page
merging reduces overhead, so if it can happen, let's use it.
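
To make the trade-off concrete - a sketch with hypothetical bio, page, len,
offset and bdev variables, not code from this patch:

	/*
	 * Freshly allocated bio with a free bvec slot: __bio_add_page()
	 * cannot fail, but it also never merges into the previous bvec.
	 */
	__bio_add_page(bio, page, len, offset);

	/*
	 * bio_add_page() may merge the page into the previous bvec (fewer
	 * segments, less overhead), but its return value must be checked.
	 */
	if (!bio_add_page(bio, page, len, offset)) {
		/* No free slot and no merge possible: start a new bio. */
		submit_bio(bio);
		bio = bio_alloc(bdev, 1, REQ_OP_WRITE, GFP_KERNEL);
		__bio_add_page(bio, page, len, offset);
	}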

> 
>>
>>>
>>> This way we can mark bio_add_pages as __must_check.
>>>
>>> Signed-off-by: Johannes Thumshirn 
>>
>> With the commit message fixed,
>>
>> Reviewed-by: Damien Le Moal 
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>>
>>

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 19/19] block: mark bio_add_page as __must_check

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> Now that all users of bio_add_page check for the return value, mark
> bio_add_page as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 18/19] dm-crypt: check if adding pages to clone bio fails

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> Check if adding pages to the clone bio fails and, if so, bail out.

Nope. The code retries with direct reclaim until it succeeds. Which is very
suspicious...

> 
> This way we can mark bio_add_pages as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

With the commit message fixed,

Reviewed-by: Damien Le Moal 


-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 17/19] md: raid1: check if adding pages to resync bio fails

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> Check if adding pages to the resync bio fails and, if so, bail out.
> 
> As the comment above suggests this cannot happen, WARN if it actually
> happens.
> 
> This way we can mark bio_add_pages as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 
> ---
>  drivers/md/raid1-10.c |  7 ++-
>  drivers/md/raid10.c   | 12 ++--
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> index e61f6cad4e08..c21b6c168751 100644
> --- a/drivers/md/raid1-10.c
> +++ b/drivers/md/raid1-10.c
> @@ -105,7 +105,12 @@ static void md_bio_reset_resync_pages(struct bio *bio, struct resync_pages *rp,
>* won't fail because the vec table is big
>* enough to hold all these pages
>*/
> - bio_add_page(bio, page, len, 0);
> + if (WARN_ON(!bio_add_page(bio, page, len, 0))) {

Not sure we really need the WARN_ON here...
Nevertheless,

Reviewed-by: Damien Le Moal 


> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + return;
> + }
> +
>   size -= len;
>   } while (idx++ < RESYNC_PAGES && size > 0);
>  }
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6c66357f92f5..5682dba52fd3 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3808,7 +3808,11 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
>* won't fail because the vec table is big enough
>* to hold all these pages
>*/
> - bio_add_page(bio, page, len, 0);
> + if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + goto giveup;
> + }
>   }
>   nr_sectors += len>>9;
>   sector_nr += len>>9;
> @@ -4989,7 +4993,11 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
>* won't fail because the vec table is big enough
>* to hold all these pages
>*/
> - bio_add_page(bio, page, len, 0);
> + if (WARN_ON(!bio_add_page(bio, page, len, 0))) {
> + bio->bi_status = BLK_STS_RESOURCE;
> + bio_endio(bio);
> + return sectors_done; /* XXX: is this correct? */
> + }
>   }
>   sector_nr += len >> 9;
>   nr_sectors += len >> 9;

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 16/19] md: raid1: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> The sync request code uses bio_add_page() to add a page to a newly created bio.
> bio_add_page() can fail, but the return value is never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 15/19] md: check for failure when adding pages in alloc_behind_master_bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> alloc_behind_master_bio() can possibly add multiple pages to a bio, but it
> is not checking for the return value of bio_add_page() if adding really
> succeeded.
> 
> Check if the page adding succeeded and if not bail out.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 14/19] floppy: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:06, Johannes Thumshirn wrote:
> The floppy code uses bio_add_page() to add a page to a newly created bio.
> bio_add_page() can fail, but the return value is never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 13/19] zram: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The zram writeback code uses bio_add_page() to add a page to a newly
> created bio. bio_add_page() can fail, but the return value is never
> checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 12/19] zonefs: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The zonefs superblock reading code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is
> never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Acked-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 11/19] gfs: use __bio_add_page for adding single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The GFS superblock reading code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is never
> checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 10/19] jfs: logmgr: use __bio_add_page to add single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The JFS IO code uses bio_add_page() to add a page to a newly created bio.
> bio_add_page() can fail, but the return value is never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 09/19] btrfs: raid56: use __bio_add_page to add single page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The btrfs raid56 sector submission code uses bio_add_page() to add a page
> to a newly created bio. bio_add_page() can fail, but the return value is
> never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 08/19] btrfs: repair: use __bio_add_page for adding single page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The btrfs repair bio submission code uses bio_add_page() to add a page to
> a newly created bio. bio_add_page() can fail, but the return value is
> never checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 07/19] md: raid5: use __bio_add_page to add single page to new bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The raid5-ppl submission code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is never
> checked. For adding consecutive pages, the return is actually checked and
> a new bio is allocated if adding the page fails.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 05/19] md: use __bio_add_page to add single page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The md-raid superblock writing code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is never
> checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 03/19] dm: dm-zoned: use __bio_add_page for adding single metadata page

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> dm-zoned uses bio_add_page() for adding a single page to a freshly created
> metadata bio.
> 
> Use __bio_add_page() instead as adding a single page to a new bio is
> always guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 04/19] fs: buffer: use __bio_add_page to add single page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The buffer_head submission code uses bio_add_page() to add a page to a
> newly created bio. bio_add_page() can fail, but the return value is never
> checked.
> 
> Use __bio_add_page() as adding a single page to a newly created bio is
> guaranteed to succeed.
> 
> This brings us a step closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 02/19] drbd: use __bio_add_page to add page to bio

2023-03-29 Thread Damien Le Moal
On 3/30/23 02:05, Johannes Thumshirn wrote:
> The drbd code only adds a single page to a newly created bio. So use
> __bio_add_page() to add the page which is guaranteed to succeed in this
> case.
> 
> This brings us closer to marking bio_add_page() as __must_check.
> 
> Signed-off-by: Johannes Thumshirn 

With Matthew's comment addressed,

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [RFC v6 08/10] iomap/xfs: Eliminate the iomap_valid handler

2023-01-18 Thread Damien Le Moal
On 1/18/23 16:21, Christoph Hellwig wrote:
> On Sun, Jan 15, 2023 at 09:29:58AM -0800, Darrick J. Wong wrote:
>> I don't have any objections to pulling everything except patches 8 and
>> 10 for testing this week. 
> 
> That would be great.  I now have a series to return the ERR_PTR
> from __filemap_get_folio which will cause a minor conflict, but
> I think that's easy enough for Linus to handle.
> 
>>
>> 1. Does zonefs need to revalidate mappings?  The mappings are 1:1 so I
>> don't think it does, but OTOH zone pointer management might complicate
>> that.
> 
> Adding Damien.

zonefs has a static mapping of file blocks that never changes and that is fully
populated up to each file's max size from mount. So zonefs is not using the
iomap_valid page operation. In fact, zonefs is not even using struct
iomap_page_ops.

> 
>> 2. How about porting the writeback iomap validation to use this
>> mechanism?  (I suspect Dave might already be working on this...)
> 
> What is "this mechanism"?  Do you mean the ->iomap_valid removed here?
> Writeback calls into ->map_blocks for every block while under the
> folio lock, so the validation can be (and for XFS currently is) done
> in that.  Moving it out into a separate method with extra indirect
> function call overhead and interactions between the methods seems
> like a retrograde step to me.
> 
>> 3. Do we need to revalidate mappings for directio writes?  I think the
>> answer is no (for xfs) because the ->iomap_begin call will allocate
>> whatever blocks are needed and truncate/punch/reflink block on the
>> iolock while the directio writes are pending, so you'll never end up
>> with a stale mapping.
> 
> Yes.

-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH 1/1] Revert "iomap: fall back to buffered writes for invalidation failures"

2022-02-09 Thread Damien Le Moal
On 2022/02/09 17:52, Lee Jones wrote:
> This reverts commit 60263d5889e6dc5987dc51b801be4955ff2e4aa7.
> 
> Reverting since this commit opens a potential avenue for abuse.
> 
> The C-reproducer and more information can be found at the link below.
> 
> With this patch applied, I can no longer get the repro to trigger.
> 
>   kernel BUG at fs/ext4/inode.c:2647!
>   invalid opcode:  [#1] PREEMPT SMP KASAN
>   CPU: 0 PID: 459 Comm: syz-executor359 Tainted: GW 5.10.93-syzkaller-01028-g0347b1658399 #0
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>   RIP: 0010:mpage_prepare_extent_to_map+0xbe9/0xc00 fs/ext4/inode.c:2647
>   Code: e8 fc bd c3 ff e9 80 f6 ff ff 44 89 e9 80 e1 07 38 c1 0f 8c a6 fe ff ff 4c 89 ef e8 e1 bd c3 ff e9 99 fe ff ff e8 87 c9 89 ff <0f> 0b e8 80 c9 89 ff 0f 0b e8 79 1e b8 02 66 0f 1f 84 00 00 00 00
>   RSP: 0018:c9e2e9c0 EFLAGS: 00010293
>   RAX: 81e321f9 RBX:  RCX: 88810c12cf00
>   RDX:  RSI:  RDI: 
>   RBP: c9e2eb90 R08: 81e31e71 R09: f940008d68b1
>   R10: f940008d68b1 R11:  R12: ea00046b4580
>   R13: c9e2ea80 R14: 011e R15: dc00
>   FS:  7fcfd0717700() GS:8881f700() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2: 7fcfd07d5520 CR3: 00010a142000 CR4: 003506b0
>   DR0:  DR1:  DR2: 
>   DR3:  DR6: fffe0ff0 DR7: 0400
>   Call Trace:
>ext4_writepages+0xcbb/0x3950 fs/ext4/inode.c:2776
>do_writepages+0x13a/0x280 mm/page-writeback.c:2358
>__filemap_fdatawrite_range+0x356/0x420 mm/filemap.c:427
>filemap_write_and_wait_range+0x64/0xe0 mm/filemap.c:660
>__iomap_dio_rw+0x621/0x10c0 fs/iomap/direct-io.c:495
>iomap_dio_rw+0x35/0x80 fs/iomap/direct-io.c:611
>ext4_dio_write_iter fs/ext4/file.c:571 [inline]
>ext4_file_write_iter+0xfc5/0x1b70 fs/ext4/file.c:681
>do_iter_readv_writev+0x548/0x740 include/linux/fs.h:1941
>do_iter_write+0x182/0x660 fs/read_write.c:866
>vfs_iter_write+0x7c/0xa0 fs/read_write.c:907
>iter_file_splice_write+0x7fb/0xf70 fs/splice.c:686
>do_splice_from fs/splice.c:764 [inline]
>direct_splice_actor+0xfe/0x130 fs/splice.c:933
>splice_direct_to_actor+0x4f4/0xbc0 fs/splice.c:888
>do_splice_direct+0x28b/0x3e0 fs/splice.c:976
>do_sendfile+0x914/0x1390 fs/read_write.c:1257
> 
> Link: https://syzkaller.appspot.com/bug?extid=41c966bf0729a530bd8d
> 
> From the patch:
> Cc: Stable 
> Cc: Christoph Hellwig 
> Cc: Dave Chinner 
> Cc: Goldwyn Rodrigues 
> Cc: Darrick J. Wong 
> Cc: Bob Peterson 
> Cc: Damien Le Moal 
> Cc: Theodore Ts'o  # for ext4
> Cc: Andreas Gruenbacher 
> Cc: Ritesh Harjani 
> 
> Others:
> Cc: Johannes Thumshirn 
> Cc: linux-...@vger.kernel.org
> Cc: linux-fsde...@vger.kernel.org
> Cc: linux-e...@vger.kernel.org
> Cc: cluster-devel@redhat.com
> 
> Fixes: 60263d5889e6d ("iomap: fall back to buffered writes for invalidation failures")
> Reported-by: syzbot+0ed9f769264276638...@syzkaller.appspotmail.com
> Signed-off-by: Lee Jones 

For zonefs:

Acked-by: Damien Le Moal 



-- 
Damien Le Moal
Western Digital Research



Re: [Cluster-devel] [PATCH RFC PKS/PMEM 26/58] fs/zonefs: Utilize new kmap_thread()

2020-10-11 Thread Damien Le Moal
On 2020/10/10 4:52, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> The kmap() calls in this FS are localized to a single thread.  To avoid
> the overhead of global PKRS updates, use the new kmap_thread() call.
> 
> Cc: Damien Le Moal 
> Cc: Naohiro Aota 
> Signed-off-by: Ira Weiny 
> ---
>  fs/zonefs/super.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 8ec7c8f109d7..2fd6c86beee1 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -1297,7 +1297,7 @@ static int zonefs_read_super(struct super_block *sb)
>   if (ret)
>   goto free_page;
>  
> - super = kmap(page);
> + super = kmap_thread(page);
>  
>   ret = -EINVAL;
>   if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC)
> @@ -1349,7 +1349,7 @@ static int zonefs_read_super(struct super_block *sb)
>   ret = 0;
>  
>  unmap:
> - kunmap(page);
> +     kunmap_thread(page);
>  free_page:
>   __free_page(page);
>  
> 

Acked-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research





Re: [Cluster-devel] [PATCH 3/3] iomap: fall back to buffered writes for invalidation failures

2020-07-21 Thread Damien Le Moal
> */
>   ASSERT(ret < 0 || ret == count);
>   return ret;
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 07bc42d62673ce..d0a04528a7e18e 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -786,8 +786,11 @@ static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>   if (iocb->ki_pos >= ZONEFS_I(inode)->i_max_size)
>   return -EFBIG;
>  
> - if (iocb->ki_flags & IOCB_DIRECT)
> - return zonefs_file_dio_write(iocb, from);
> + if (iocb->ki_flags & IOCB_DIRECT) {
> + ssize_t ret = zonefs_file_dio_write(iocb, from);
> + if (ret != -ENOTBLK)
> + return ret;
> + }
>  
>   return zonefs_file_buffered_write(iocb, from);
>  }
> 

Looks fine. For zonefs:

Acked-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research





Re: [Cluster-devel] [PATCH 2/2] iomap: fall back to buffered writes for invalidation failures

2020-07-14 Thread Damien Le Moal
e..793850454b752f 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -786,8 +786,11 @@ static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>   if (iocb->ki_pos >= ZONEFS_I(inode)->i_max_size)
>   return -EFBIG;
>  
> - if (iocb->ki_flags & IOCB_DIRECT)
> - return zonefs_file_dio_write(iocb, from);
> + if (iocb->ki_flags & IOCB_DIRECT) {
> + ret = zonefs_file_dio_write(iocb, from);
> + if (ret != -EREMCHG)
> + return ret;
> + }

This looks fine. This would happen only for conventional zone writes since
sequential zone writes cannot ever issue direct IOs on top of cached data as
that would be a forbidden "overwrite" operation.

>  
>   return zonefs_file_buffered_write(iocb, from);
>  }
> 


-- 
Damien Le Moal
Western Digital Research