Re: [f2fs-dev] [PATCH] f2fs: check discard support for conventional zones

2024-08-15 Thread Damien Le Moal
On 8/16/24 11:23, Shin'ichiro Kawasaki wrote:
> As the helper function f2fs_bdev_support_discard() shows, f2fs checks if
> the target block devices support discard by calling
> bdev_max_discard_sectors() and bdev_is_zoned(). This check works good

s/good/well

> for most cases, but it does not work for conventional zones on zoned
> block devices. F2fs assumes that zoned block devices support discard,
> and calls __submit_discard_cmd(). When __submit_discard_cmd() is called
> for sequential write required zones, it works fine since
> __submit_discard_cmd() issues zone reset commands instead of discard
> commands. However, when __submit_discard_cmd() is called for
> conventional zones, __blkdev_issue_discard() is called even when the
> devices do not support discard.
> 
> The inappropriate __blkdev_issue_discard() call was not a problem before
> the commit 30f1e7241422 ("block: move discard checks into the ioctl
> handler") because __blkdev_issue_discard() checked if the target devices
> support discard or not. If not, it returned EOPNOTSUPP. After the
> commit, __blkdev_issue_discard() no longer checks it and always returns
> zero and sets NULL to the give bio pointer. This NULL pointer triggers
> f2fs_bug_on() in __submit_discard_cmd(). The BUG is recreated with the
> commands below at the umount step, where /dev/nullb0 is a zoned null_blk
> with 5GB total size, 128MB zone size and 10 conventional zones.
> 
> $ mkfs.f2fs -f -m /dev/nullb0
> $ mount /dev/nullb0 /mnt
> $ for ((i=0;i<5;i++)); do dd if=/dev/zero of=/mnt/test bs=65536 count=1600 
> conv=fsync; done
> $ umount /mnt
> 
> To fix the BUG, avoid the inappropriate __blkdev_issue_discard() call.
> When discard is requested for conventional zones, check if the device
> supports discard or not. If not, return EOPNOTSUPP.
> 
> Fixes: 30f1e7241422 ("block: move discard checks into the ioctl handler")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Shin'ichiro Kawasaki 
> ---
>  fs/f2fs/segment.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 78c3198a6308..e10bf3c4eed5 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1281,6 +1281,8 @@ static int __submit_discard_cmd(struct f2fs_sb_info 
> *sbi,
>   __submit_zone_reset_cmd(sbi, dc, flag,
>   wait_list, issued);
>   return 0;
> + } else if (!bdev_max_discard_sectors(bdev)) {
> + return -EOPNOTSUPP;
>   }

You do not need the else here. And I suggest adding a comment as well:

/*
 * Issue discard for conventional zones only if the
         * device supports discard.
     */
if (!bdev_max_discard_sectors(bdev))
return -EOPNOTSUPP;

With that:

Reviewed-by: Damien Le Moal 

>   }
>  #endif

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 1/5] zonefs: pass GFP_KERNEL to blkdev_zone_mgmt() call

2024-02-08 Thread Damien Le Moal
On 1/29/24 16:52, Johannes Thumshirn wrote:
> Pass GFP_KERNEL instead of GFP_NOFS to the blkdev_zone_mgmt() call in
> zonefs_zone_mgmt().
> 
> As as zonefs_zone_mgmt() and zonefs_inode_zone_mgmt() are never called
> from a place that can recurse back into the filesystem on memory reclaim,
> it is save to call blkdev_zone_mgmt() with GFP_KERNEL.
> 
> Link: https://lore.kernel.org/all/zzcgxi46ainlc...@casper.infradead.org/
> Signed-off-by: Johannes Thumshirn 
> ---
>  fs/zonefs/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 93971742613a..63fbac018c04 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -113,7 +113,7 @@ static int zonefs_zone_mgmt(struct super_block *sb,
>  
>   trace_zonefs_zone_mgmt(sb, z, op);
>   ret = blkdev_zone_mgmt(sb->s_bdev, op, z->z_sector,
> -z->z_size >> SECTOR_SHIFT, GFP_NOFS);
> +z->z_size >> SECTOR_SHIFT, GFP_KERNEL);
>   if (ret) {
>   zonefs_err(sb,
>  "Zone management operation %s at %llu failed %d\n",
> 

Given that zonefs_inode_zone_mgmt() which calls zonefs_zone_mgmt() is only used
for sequential zone inodes, and that these inodes cannot be written with
buffered IOs, I think this is safe as we will never have dirty pages to
writeback for reclaim. So there should be no risk of re-entering the FS on
reclaim with GFP_KERNEL.

So:

Acked-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 5/5] block: remove gfp_flags from blkdev_zone_mgmt

2024-01-29 Thread Damien Le Moal
On 1/29/24 16:52, Johannes Thumshirn wrote:
> Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use
> memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can
> drop the gfp_mask parameter from blkdev_zone_mgmt() as well as
> blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated().
> 
> Signed-off-by: Johannes Thumshirn 

Looks good to me.

Reviewed-by: Damien Le Moal 

But let me check zonefs (patch 1).

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 4/5] f2fs: guard blkdev_zone_mgmt with nofs scope

2024-01-29 Thread Damien Le Moal
On 1/29/24 16:52, Johannes Thumshirn wrote:
> Guard the calls to blkdev_zone_mgmt() with a memalloc_nofs scope.
> This helps us getting rid of the GFP_NOFS argument to blkdev_zone_mgmt();
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 3/5] btrfs: zoned: call blkdev_zone_mgmt in nofs scope

2024-01-29 Thread Damien Le Moal
On 1/29/24 16:52, Johannes Thumshirn wrote:
> Add a memalloc_nofs scope around all calls to blkdev_zone_mgmt(). This
> allows us to further get rid of the GFP_NOFS argument for
> blkdev_zone_mgmt().
> 
> Signed-off-by: Johannes Thumshirn 

Looks OK to me.

Reviewed-by: Damien Le Moal 


-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 2/5] dm: dm-zoned: guard blkdev_zone_mgmt with noio scope

2024-01-29 Thread Damien Le Moal
On 1/29/24 16:52, Johannes Thumshirn wrote:
> Guard the calls to blkdev_zone_mgmt() with a memalloc_noio scope.
> This helps us getting rid of the GFP_NOIO argument to blkdev_zone_mgmt();
> 
> Signed-off-by: Johannes Thumshirn 

Reviewed-by: Damien Le Moal 

> ---
>  drivers/md/dm-zoned-metadata.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index fdfe30f7b697..165996cc966c 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -1655,10 +1655,13 @@ static int dmz_reset_zone(struct dmz_metadata *zmd, 
> struct dm_zone *zone)
>  
>   if (!dmz_is_empty(zone) || dmz_seq_write_err(zone)) {
>   struct dmz_dev *dev = zone->dev;
> + unsigned int noio_flag;
>  
> + noio_flag = memalloc_noio_save();
>   ret = blkdev_zone_mgmt(dev->bdev, REQ_OP_ZONE_RESET,
>  dmz_start_sect(zmd, zone),
> -zmd->zone_nr_sectors, GFP_NOIO);
> +zmd->zone_nr_sectors, GFP_KERNEL);
> + memalloc_noio_restore(noio_flag);
>   if (ret) {
>   dmz_dev_err(dev, "Reset zone %u failed %d",
>   zone->id, ret);
> 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 1/5] zonefs: pass GFP_KERNEL to blkdev_zone_mgmt() call

2024-01-29 Thread Damien Le Moal
On 1/29/24 16:52, Johannes Thumshirn wrote:
> Pass GFP_KERNEL instead of GFP_NOFS to the blkdev_zone_mgmt() call in
> zonefs_zone_mgmt().
> 
> As as zonefs_zone_mgmt() and zonefs_inode_zone_mgmt() are never called
> from a place that can recurse back into the filesystem on memory reclaim,
> it is save to call blkdev_zone_mgmt() with GFP_KERNEL.

s/save/safe

> Link: https://lore.kernel.org/all/zzcgxi46ainlc...@casper.infradead.org/
> Signed-off-by: Johannes Thumshirn 
> ---
>  fs/zonefs/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 93971742613a..63fbac018c04 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -113,7 +113,7 @@ static int zonefs_zone_mgmt(struct super_block *sb,
>  
>   trace_zonefs_zone_mgmt(sb, z, op);
>   ret = blkdev_zone_mgmt(sb->s_bdev, op, z->z_sector,
> -z->z_size >> SECTOR_SHIFT, GFP_NOFS);
> +z->z_size >> SECTOR_SHIFT, GFP_KERNEL);
>   if (ret) {
>   zonefs_err(sb,
>  "Zone management operation %s at %llu failed %d\n",

I think this is OK but I will need a little more time to fully convince myself.
So let me look again at the code to check all the calls contexts.

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 1/5] zonefs: pass GFP_KERNEL to blkdev_zone_mgmt() call

2024-01-23 Thread Damien Le Moal
On 1/24/24 08:21, Dave Chinner wrote:
> On Tue, Jan 23, 2024 at 09:39:02PM +0100, Mikulas Patocka wrote:
>>
>>
>> On Tue, 23 Jan 2024, Johannes Thumshirn wrote:
>>
>>> Pass GFP_KERNEL instead of GFP_NOFS to the blkdev_zone_mgmt() call in
>>> zonefs_zone_mgmt().
>>>
>>> As as zonefs_zone_mgmt() and zonefs_inode_zone_mgmt() are never called
>>> from a place that can recurse back into the filesystem on memory reclaim,
>>> it is save to call blkdev_zone_mgmt() with GFP_KERNEL.
>>>
>>> Signed-off-by: Johannes Thumshirn 
>>> ---
>>>  fs/zonefs/super.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
>>> index 93971742613a..63fbac018c04 100644
>>> --- a/fs/zonefs/super.c
>>> +++ b/fs/zonefs/super.c
>>> @@ -113,7 +113,7 @@ static int zonefs_zone_mgmt(struct super_block *sb,
>>>  
>>> trace_zonefs_zone_mgmt(sb, z, op);
>>> ret = blkdev_zone_mgmt(sb->s_bdev, op, z->z_sector,
>>> -  z->z_size >> SECTOR_SHIFT, GFP_NOFS);
>>> +  z->z_size >> SECTOR_SHIFT, GFP_KERNEL);
>>> if (ret) {
>>> zonefs_err(sb,
>>>"Zone management operation %s at %llu failed %d\n",
>>>
>>> -- 
>>> 2.43.0
>>
>> zonefs_inode_zone_mgmt calls 
>> lockdep_assert_held(&ZONEFS_I(inode)->i_truncate_mutex); - so, this 
>> function is called with the mutex held - could it happen that the 
>> GFP_KERNEL allocation recurses into the filesystem and attempts to take 
>> i_truncate_mutex as well?
>>
>> i.e. GFP_KERNEL -> iomap_do_writepage -> zonefs_write_map_blocks -> 
>> zonefs_write_iomap_begin -> mutex_lock(&zi->i_truncate_mutex)
> 
> zonefs doesn't have a ->writepage method, so writeback can't be
> called from memory reclaim like this.

And also, buffered writes are allowed only for conventional zone files, for
which we never do zone management. For sequential zone files which may have
there zone managed with blkdev_zone_mgmt(), only direct writes are allowed.

> 
> -Dave.

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 3/5] block: remove support for the host aware zone model

2023-12-19 Thread Damien Le Moal
On 12/19/23 17:08, Ed Tsai (蔡宗軒) wrote:
> On Tue, 2023-12-19 at 07:16 +, Naohiro Aota wrote:
>>   
>> External email : Please do not click links or open attachments until
>> you have verified the sender or the content.
>>  On Mon, Dec 18, 2023 at 08:21:22AM +, Ed Tsai (蔡宗軒) wrote:
>>> On Mon, 2023-12-18 at 15:53 +0900, Damien Le Moal wrote:
>>>>  On 2023/12/18 15:15, Ed Tsai (蔡宗軒) wrote:
>>>>> Hi Christoph,
>>>>>
>>>>> some minor suggestions:
>>>>>
>>>>> On Sun, 2023-12-17 at 17:53 +0100, Christoph Hellwig wrote:
>>>>>>
>>>>>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>>>>>> index 198d38b53322c1..260b5b8f2b0d7e 100644
>>>>>> --- a/drivers/md/dm-table.c
>>>>>> +++ b/drivers/md/dm-table.c
>>>>>> @@ -1579,21 +1579,18 @@ bool
>> dm_table_has_no_data_devices(struct
>>>>>> dm_table *t)
>>>>>>  return true;
>>>>>>  }
>>>>>>  
>>>>>> -static int device_not_zoned_model(struct dm_target *ti,
>> struct
>>>>>> dm_dev *dev,
>>>>>> -  sector_t start, sector_t len, void
>>>>>> *data)
>>>>>> +static int device_not_zoned(struct dm_target *ti, struct
>> dm_dev
>>>>>> *dev,
>>>>>> +sector_t start, sector_t len, void *data)
>>>>>>  {
>>>>>> -struct request_queue *q = bdev_get_queue(dev->bdev);
>>>>>> -enum blk_zoned_model *zoned_model = data;
>>>>>> +bool *zoned = data;
>>>>>>  
>>>>>> -return blk_queue_zoned_model(q) != *zoned_model;
>>>>>> +return bdev_is_zoned(dev->bdev) != *zoned;
>>>>>>  }
>>>>>>  
>>>>>>  static int device_is_zoned_model(struct dm_target *ti, struct
>>>> dm_dev
>>>>>> *dev,
>>>>>>   sector_t start, sector_t len, void
>>>>>> *data)
>>>>>
>>>>> Seems like the word "model" should also be remove here.
>>>>>
>>>>>>  {
>>>>>> -struct request_queue *q = bdev_get_queue(dev->bdev);
>>>>>> -
>>>>>> -return blk_queue_zoned_model(q) != BLK_ZONED_NONE;
>>>>>> +return bdev_is_zoned(dev->bdev);
>>>>>>  }
>>>>>>  
>>>>>>  /*
>>>>>> @@ -1603,8 +1600,7 @@ static int device_is_zoned_model(struct
>>>>>> dm_target *ti, struct dm_dev *dev,
>>>>>>   * has the DM_TARGET_MIXED_ZONED_MODEL feature set, the
>> devices
>>>> can
>>>>>> have any
>>>>>>   * zoned model with all zoned devices having the same zone
>> size.
>>>>>>   */
>>>>>> -static bool dm_table_supports_zoned_model(struct dm_table *t,
>>>>>> -  enum blk_zoned_model
>>>>>> zoned_model)
>>>>>> +static bool dm_table_supports_zoned(struct dm_table *t, bool
>>>> zoned)
>>>>>>  {
>>>>>>  for (unsigned int i = 0; i < t->num_targets; i++) {
>>>>>>  struct dm_target *ti = dm_table_get_target(t, i);
>>>>>> @@ -1623,11 +1619,11 @@ static bool
>>>>>> dm_table_supports_zoned_model(struct dm_table *t,
>>>>>>  
>>>>>>  if (dm_target_supports_zoned_hm(ti->type)) {
>>>>>>  if (!ti->type->iterate_devices ||
>>>>>> -ti->type->iterate_devices(ti,
>>>>>> device_not_zoned_model,
>>>>>> -  &zoned_model))
>>>>>> +ti->type->iterate_devices(ti,
>>>>>> device_not_zoned,
>>>>>> +  &zoned))
>>>>>>  return false;
>>>>>>  } else if (!dm_target_supports_mixed_zoned_model(ti-
>>>>>>> type)) {
>>>>>> -if (zoned_model == BLK_ZONED_HM)
>>>>>> +if (zoned)
>>>>>>  return false;
>>>>>>  }
>>>>>>  }
>>>>>
>>>>> The parameter "bool zoned" is redundant. It should be removed
>> from
>>>> the
>>>>> above 3 functions
>>>
>>> The two func, is zoned and not zoned, are essentially the same.
>> They
>>> can be simplified into one function.
>>
>> Both functions are used for iterate_devices's callback in
>> dm_table_supports_zoned_model(). As shown in raid_iterate_devices(),
>> iterate_devices() returns 0 if the callback func calls on all the
>> devices
>> returns 0, or returns a non-zero result early otherwise. So, the
>> iterate_devices() call returns "true" if any one of the underlying
>> devices
>> is (zoned|not zoned).
>>
>> Since we cannot create lambda as in other fancy languages, we need
>> two
>> functions...
> 
> Not really, there is a "void *data" can be used.
> 
> The device_is_zoned_model() is just the same as the device_not_zoned()
> with (bool *)data = false.
> 
> It's very minor, so is okay to ignore my preference.

Send a patch on top of Christoph's series if you want to clean this up.

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 5/5] sd: only call disk_clear_zoned when needed

2023-12-18 Thread Damien Le Moal
On 2023/12/18 1:53, Christoph Hellwig wrote:
> disk_clear_zoned only needs to be called when a device reported zone
> managed mode first and we clear it.  Add a check so that disk_clear_zoned
> isn't called on devices that were never zoned.
> 
> This avoids a fairly expensive queue freezing when revalidating
> conventional devices.
> 
> Signed-off-by: Christoph Hellwig 

Looks good.

Reviewed-by: Damien Le Moal 


-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 4/5] block: simplify disk_set_zoned

2023-12-18 Thread Damien Le Moal
On 2023/12/18 1:53, Christoph Hellwig wrote:
> Only use disk_set_zoned to actually enable zoned device support.
> For clearing it, call disk_clear_zoned, which is renamed from
> disk_clear_zone_settings and now directly clears the zoned flag as
> well.
> 
> Signed-off-by: Christoph Hellwig 

Looks good.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 3/5] block: remove support for the host aware zone model

2023-12-18 Thread Damien Le Moal
On 2023/12/18 1:53, Christoph Hellwig wrote:
> When zones were first added the SCSI and ATA specs, two different
> models were supported (in addition to the drive managed one that
> is invisible to the host):
> 
>  - host managed where non-conventional zones there is strict requirement
>to write at the write pointer, or else an error is returned
>  - host aware where a write point is maintained if writes always happen
>at it, otherwise it is left in an under-defined state and the
>sequential write preferred zones behave like conventional zones
>(probably very badly performing ones, though)
> 
> Not surprisingly this lukewarm model didn't prove to be very useful and
> was finally removed from the ZBC and SBC specs (NVMe never implemented
> it).  Due to to the easily disappearing write pointer host software
> could never rely on the write pointer to actually be useful for say
> recovery.
> 
> Fortunately only a few HDD prototypes shipped using this model which
> never made it to mass production.  Drop the support before it is too
> late.  Note that any such host aware prototype HDD can still be used
> with Linux as we'll now treat it as a conventional HDD.
> 
> Signed-off-by: Christoph Hellwig 

[...]

> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index 6d8218a4412264..d03d66f1149301 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -339,7 +339,7 @@ struct sdebug_dev_info {
>   bool used;
>  
>   /* For ZBC devices */
> - enum blk_zoned_model zmodel;
> + bool zoned;
>   unsigned int zcap;
>   unsigned int zsize;
>   unsigned int zsize_shift;
> @@ -844,8 +844,11 @@ static bool write_since_sync;
>  static bool sdebug_statistics = DEF_STATISTICS;
>  static bool sdebug_wp;
>  static bool sdebug_allow_restart;
> -/* Following enum: 0: no zbc, def; 1: host aware; 2: host managed */
> -static enum blk_zoned_model sdeb_zbc_model = BLK_ZONED_NONE;
> +static enum {
> + BLK_ZONED_NONE  = 0,
> + BLK_ZONED_HA= 1,
> + BLK_ZONED_HM= 2,
> +} sdeb_zbc_model = BLK_ZONED_NONE;
>  static char *sdeb_zbc_model_s;
>  
>  enum sam_lun_addr_method {SAM_LUN_AM_PERIPHERAL = 0x0,
> @@ -1815,8 +1818,6 @@ static int inquiry_vpd_b1(struct sdebug_dev_info 
> *devip, unsigned char *arr)
>   arr[1] = 1; /* non rotating medium (e.g. solid state) */
>   arr[2] = 0;
>   arr[3] = 5; /* less than 1.8" */
> - if (devip->zmodel == BLK_ZONED_HA)
> - arr[4] = 1 << 4;    /* zoned field = 01b */

I think we should keep everything related to HA in scsi debug as that is an easy
way to test the block layer and scsi. no ?

Other than this, very nice cleanup !

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/5] virtio_blk: remove the broken zone revalidation support

2023-12-18 Thread Damien Le Moal
On 2023/12/18 1:53, Christoph Hellwig wrote:
> virtblk_revalidate_zones is called unconditionally from
> virtblk_config_changed_work from the virtio config_changed callback.
> 
> virtblk_revalidate_zones is a bit odd in that it re-clears the zoned
> state for host aware or non-zoned devices, which isn't needed unless the
> zoned mode changed - but a zone mode change to a host managed model isn't
> handled at all, and virtio_blk also doesn't handle any other config
> change except for a capacity change is handled (and even if it was
> the upper layers above virtio_blk wouldn't handle it very well).
> 
> But even the useful case of a size change that would add or remove
> zones isn't handled properly as blk_revalidate_disk_zones expects the
> device capacity to cover all zones, but the capacity is only updated
> after virtblk_revalidate_zones.
> 
> As this code appears to be entirely untested and is getting in the way
> remove it for now, but it can be readded in a fixed version with
> proper test coverage if needed.
> 
> Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
> Fixes: f1ba4e674feb ("virtio-blk: fix to match virtio spec")
> Signed-off-by: Christoph Hellwig 

Not ideal... But I think this is OK for now given that as you say, the upper
layer will not be able to handle zone changes anyway.

Reviewed-by: Damien Le Moal 

> ---
>  drivers/block/virtio_blk.c | 26 --
>  1 file changed, 26 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index aeead732a24dc9..a28f1687066bb4 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -722,27 +722,6 @@ static int virtblk_report_zones(struct gendisk *disk, 
> sector_t sector,
>   return ret;
>  }
>  
> -static void virtblk_revalidate_zones(struct virtio_blk *vblk)
> -{
> - u8 model;
> -
> - virtio_cread(vblk->vdev, struct virtio_blk_config,
> -  zoned.model, &model);
> - switch (model) {
> - default:
> - dev_err(&vblk->vdev->dev, "unknown zone model %d\n", model);
> - fallthrough;
> - case VIRTIO_BLK_Z_NONE:
> - case VIRTIO_BLK_Z_HA:
> - disk_set_zoned(vblk->disk, BLK_ZONED_NONE);
> - return;
> - case VIRTIO_BLK_Z_HM:
> - WARN_ON_ONCE(!vblk->zone_sectors);
> - if (!blk_revalidate_disk_zones(vblk->disk, NULL))
> - set_capacity_and_notify(vblk->disk, 0);
> - }
> -}
> -
>  static int virtblk_probe_zoned_device(struct virtio_device *vdev,
>  struct virtio_blk *vblk,
>  struct request_queue *q)
> @@ -823,10 +802,6 @@ static int virtblk_probe_zoned_device(struct 
> virtio_device *vdev,
>   */
>  #define virtblk_report_zones   NULL
>  
> -static inline void virtblk_revalidate_zones(struct virtio_blk *vblk)
> -{
> -}
> -
>  static inline int virtblk_probe_zoned_device(struct virtio_device *vdev,
>   struct virtio_blk *vblk, struct request_queue *q)
>  {
> @@ -982,7 +957,6 @@ static void virtblk_config_changed_work(struct 
> work_struct *work)
>   struct virtio_blk *vblk =
>   container_of(work, struct virtio_blk, config_work);
>  
> - virtblk_revalidate_zones(vblk);
>   virtblk_update_capacity(vblk, true);
>  }
>  

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 1/5] virtio_blk: cleanup zoned device probing

2023-12-18 Thread Damien Le Moal
On 2023/12/18 1:53, Christoph Hellwig wrote:
> Move reading and checking the zoned model from virtblk_probe_zoned_device
> into the caller, leaving only the code to perform the actual setup for
> host managed zoned devices in virtblk_probe_zoned_device.
> 
> This allows to share the model reading and sharing between builds with
> and without CONFIG_BLK_DEV_ZONED, and improve it for the
> !CONFIG_BLK_DEV_ZONED case.
> 
> Signed-off-by: Christoph Hellwig 

Looks good.
Reviewed-by: Damien Le Moal 

> ---
>  drivers/block/virtio_blk.c | 50 +-
>  1 file changed, 22 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index d53d6aa8ee69a4..aeead732a24dc9 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -748,22 +748,6 @@ static int virtblk_probe_zoned_device(struct 
> virtio_device *vdev,
>  struct request_queue *q)
>  {
>   u32 v, wg;
> - u8 model;
> -
> - virtio_cread(vdev, struct virtio_blk_config,
> -  zoned.model, &model);
> -
> - switch (model) {
> - case VIRTIO_BLK_Z_NONE:
> - case VIRTIO_BLK_Z_HA:
> - /* Present the host-aware device as non-zoned */
> - return 0;
> - case VIRTIO_BLK_Z_HM:
> - break;
> - default:
> - dev_err(&vdev->dev, "unsupported zone model %d\n", model);
> - return -EINVAL;
> - }
>  
>   dev_dbg(&vdev->dev, "probing host-managed zoned device\n");
>  
> @@ -846,16 +830,9 @@ static inline void virtblk_revalidate_zones(struct 
> virtio_blk *vblk)
>  static inline int virtblk_probe_zoned_device(struct virtio_device *vdev,
>   struct virtio_blk *vblk, struct request_queue *q)
>  {
> - u8 model;
> -
> - virtio_cread(vdev, struct virtio_blk_config, zoned.model, &model);
> - if (model == VIRTIO_BLK_Z_HM) {
> - dev_err(&vdev->dev,
> - "virtio_blk: zoned devices are not supported");
> - return -EOPNOTSUPP;
> - }
> -
> - return 0;
> + dev_err(&vdev->dev,
> + "virtio_blk: zoned devices are not supported");
> + return -EOPNOTSUPP;
>  }
>  #endif /* CONFIG_BLK_DEV_ZONED */
>  
> @@ -1570,9 +1547,26 @@ static int virtblk_probe(struct virtio_device *vdev)
>* placed after the virtio_device_ready() call above.
>*/
>   if (virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED)) {
> - err = virtblk_probe_zoned_device(vdev, vblk, q);
> - if (err)
> + u8 model;
> +
> + virtio_cread(vdev, struct virtio_blk_config, zoned.model,
> + &model);
> + switch (model) {
> + case VIRTIO_BLK_Z_NONE:
> + case VIRTIO_BLK_Z_HA:
> + /* Present the host-aware device as non-zoned */
> + break;
> + case VIRTIO_BLK_Z_HM:
> + err = virtblk_probe_zoned_device(vdev, vblk, q);
> + if (err)
> + goto out_cleanup_disk;
> + break;
> +     default:
> + dev_err(&vdev->dev, "unsupported zone model %d\n",
> + model);
> + err = -EINVAL;
>   goto out_cleanup_disk;
> + }
>   }
>  
>   err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 3/5] block: remove support for the host aware zone model

2023-12-18 Thread Damien Le Moal
On 12/18/23 17:21, Ed Tsai (蔡宗軒) wrote:
> On Mon, 2023-12-18 at 15:53 +0900, Damien Le Moal wrote:
>>  On 2023/12/18 15:15, Ed Tsai (蔡宗軒) wrote:
>>> Hi Christoph,
>>>
>>> some minor suggestions:
>>>
>>> On Sun, 2023-12-17 at 17:53 +0100, Christoph Hellwig wrote:
>>>>
>>>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>>>> index 198d38b53322c1..260b5b8f2b0d7e 100644
>>>> --- a/drivers/md/dm-table.c
>>>> +++ b/drivers/md/dm-table.c
>>>> @@ -1579,21 +1579,18 @@ bool dm_table_has_no_data_devices(struct
>>>> dm_table *t)
>>>>  return true;
>>>>  }
>>>>  
>>>> -static int device_not_zoned_model(struct dm_target *ti, struct
>>>> dm_dev *dev,
>>>> -  sector_t start, sector_t len, void
>>>> *data)
>>>> +static int device_not_zoned(struct dm_target *ti, struct dm_dev
>>>> *dev,
>>>> +sector_t start, sector_t len, void *data)
>>>>  {
>>>> -struct request_queue *q = bdev_get_queue(dev->bdev);
>>>> -enum blk_zoned_model *zoned_model = data;
>>>> +bool *zoned = data;
>>>>  
>>>> -return blk_queue_zoned_model(q) != *zoned_model;
>>>> +return bdev_is_zoned(dev->bdev) != *zoned;
>>>>  }
>>>>  
>>>>  static int device_is_zoned_model(struct dm_target *ti, struct
>> dm_dev
>>>> *dev,
>>>>   sector_t start, sector_t len, void
>>>> *data)
>>>
>>> Seems like the word "model" should also be remove here.
>>>
>>>>  {
>>>> -struct request_queue *q = bdev_get_queue(dev->bdev);
>>>> -
>>>> -return blk_queue_zoned_model(q) != BLK_ZONED_NONE;
>>>> +return bdev_is_zoned(dev->bdev);
>>>>  }
>>>>  
>>>>  /*
>>>> @@ -1603,8 +1600,7 @@ static int device_is_zoned_model(struct
>>>> dm_target *ti, struct dm_dev *dev,
>>>>   * has the DM_TARGET_MIXED_ZONED_MODEL feature set, the devices
>> can
>>>> have any
>>>>   * zoned model with all zoned devices having the same zone size.
>>>>   */
>>>> -static bool dm_table_supports_zoned_model(struct dm_table *t,
>>>> -  enum blk_zoned_model
>>>> zoned_model)
>>>> +static bool dm_table_supports_zoned(struct dm_table *t, bool
>> zoned)
>>>>  {
>>>>  for (unsigned int i = 0; i < t->num_targets; i++) {
>>>>  struct dm_target *ti = dm_table_get_target(t, i);
>>>> @@ -1623,11 +1619,11 @@ static bool
>>>> dm_table_supports_zoned_model(struct dm_table *t,
>>>>  
>>>>  if (dm_target_supports_zoned_hm(ti->type)) {
>>>>  if (!ti->type->iterate_devices ||
>>>> -ti->type->iterate_devices(ti,
>>>> device_not_zoned_model,
>>>> -  &zoned_model))
>>>> +ti->type->iterate_devices(ti,
>>>> device_not_zoned,
>>>> +  &zoned))
>>>>  return false;
>>>>  } else if (!dm_target_supports_mixed_zoned_model(ti-
>>>>> type)) {
>>>> -if (zoned_model == BLK_ZONED_HM)
>>>> +if (zoned)
>>>>  return false;
>>>>  }
>>>>  }
>>>
>>> The parameter "bool zoned" is redundant. It should be removed from
>> the
>>> above 3 functions
> 
> The two func, is zoned and not zoned, are essentially the same. They
> can be simplified into one function.

Maybe... But that needs testing/checking. I added the one because I could not
reuse the other given what is being tested.

> 
>>>
>>> Additionally, because we no longer need to distinguish the zoned
>> model
>>> here, DM_TARGET_MIXED_ZONED_MODEL is meaningless. We can also clean
>> up
>>> its related code.
>>
>> Nope. The mixed thing is for mixing up non-zoned with zoned models.
>> For the entire DM code, HM and HA are both treated as HM-like zoned.
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 
> Thank you. I have some misunderstanding. Please disregard it.

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 3/5] block: remove support for the host aware zone model

2023-12-17 Thread Damien Le Moal
On 2023/12/18 15:15, Ed Tsai (蔡宗軒) wrote:
> Hi Christoph,
> 
> some minor suggestions:
> 
> On Sun, 2023-12-17 at 17:53 +0100, Christoph Hellwig wrote:
>>
>> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
>> index 198d38b53322c1..260b5b8f2b0d7e 100644
>> --- a/drivers/md/dm-table.c
>> +++ b/drivers/md/dm-table.c
>> @@ -1579,21 +1579,18 @@ bool dm_table_has_no_data_devices(struct
>> dm_table *t)
>>  return true;
>>  }
>>  
>> -static int device_not_zoned_model(struct dm_target *ti, struct
>> dm_dev *dev,
>> -  sector_t start, sector_t len, void
>> *data)
>> +static int device_not_zoned(struct dm_target *ti, struct dm_dev
>> *dev,
>> +sector_t start, sector_t len, void *data)
>>  {
>> -struct request_queue *q = bdev_get_queue(dev->bdev);
>> -enum blk_zoned_model *zoned_model = data;
>> +bool *zoned = data;
>>  
>> -return blk_queue_zoned_model(q) != *zoned_model;
>> +return bdev_is_zoned(dev->bdev) != *zoned;
>>  }
>>  
>>  static int device_is_zoned_model(struct dm_target *ti, struct dm_dev
>> *dev,
>>   sector_t start, sector_t len, void
>> *data)
> 
> Seems like the word "model" should also be remove here.
> 
>>  {
>> -struct request_queue *q = bdev_get_queue(dev->bdev);
>> -
>> -return blk_queue_zoned_model(q) != BLK_ZONED_NONE;
>> +return bdev_is_zoned(dev->bdev);
>>  }
>>  
>>  /*
>> @@ -1603,8 +1600,7 @@ static int device_is_zoned_model(struct
>> dm_target *ti, struct dm_dev *dev,
>>   * has the DM_TARGET_MIXED_ZONED_MODEL feature set, the devices can
>> have any
>>   * zoned model with all zoned devices having the same zone size.
>>   */
>> -static bool dm_table_supports_zoned_model(struct dm_table *t,
>> -  enum blk_zoned_model
>> zoned_model)
>> +static bool dm_table_supports_zoned(struct dm_table *t, bool zoned)
>>  {
>>  for (unsigned int i = 0; i < t->num_targets; i++) {
>>  struct dm_target *ti = dm_table_get_target(t, i);
>> @@ -1623,11 +1619,11 @@ static bool
>> dm_table_supports_zoned_model(struct dm_table *t,
>>  
>>  if (dm_target_supports_zoned_hm(ti->type)) {
>>  if (!ti->type->iterate_devices ||
>> -ti->type->iterate_devices(ti,
>> device_not_zoned_model,
>> -  &zoned_model))
>> +ti->type->iterate_devices(ti,
>> device_not_zoned,
>> +  &zoned))
>>  return false;
>>  } else if (!dm_target_supports_mixed_zoned_model(ti-
>>> type)) {
>> -if (zoned_model == BLK_ZONED_HM)
>> +if (zoned)
>>  return false;
>>  }
>>  }
> 
> The parameter "bool zoned" is redundant. It should be removed from the
> above 3 functions
> 
> Additionally, because we no longer need to distinguish the zoned model
> here, DM_TARGET_MIXED_ZONED_MODEL is meaningless. We can also clean up
> its related code.

Nope. The mixed thing is for mixing up non-zoned with zoned models.
For the entire DM code, HM and HA are both treated as HM-like zoned.

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Damien Le Moal
On 7/28/23 07:59, Dave Chinner wrote:
> On Thu, Jul 27, 2023 at 07:20:46PM +0900, Damien Le Moal wrote:
>> On 7/27/23 17:55, Qi Zheng wrote:
>>>>>   goto err;
>>>>>   }
>>>>>   +    zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
>>>>> +    zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
>>>>> +    zmd->mblk_shrinker->seeks = DEFAULT_SEEKS;
>>>>> +    zmd->mblk_shrinker->private_data = zmd;
>>>>> +
>>>>> +    shrinker_register(zmd->mblk_shrinker);
>>>>
>>>> I fail to see how this new shrinker API is better... Why isn't there a
>>>> shrinker_alloc_and_register() function ? That would avoid adding all this 
>>>> code
>>>> all over the place as the new API call would be very similar to the current
>>>> shrinker_register() call with static allocation.
>>>
>>> In some registration scenarios, memory needs to be allocated in advance.
>>> So we continue to use the previous prealloc/register_prepared()
>>> algorithm. The shrinker_alloc_and_register() is just a helper function
>>> that combines the two, and this increases the number of APIs that
>>> shrinker exposes to the outside, so I choose not to add this helper.
>>
>> And that results in more code in many places instead of less code + a simple
>> inline helper in the shrinker header file...
> 
> It's not just a "simple helper" - it's a function that has to take 6
> or 7 parameters with a return value that must be checked and
> handled.
> 
> This was done in the first versions of the patch set - the amount of
> code in each caller does not go down and, IMO, was much harder to
> read and determine "this is obviously correct" that what we have
> now.
> 
>> So not adding that super simple
>> helper is not exactly the best choice in my opinion.
> 
> Each to their own - I much prefer the existing style/API over having
> to go look up a helper function every time I want to check some
> random shrinker has been set up correctly

OK. All fair points.


-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Damien Le Moal
On 7/27/23 17:55, Qi Zheng wrote:
>>>   goto err;
>>>   }
>>>   +    zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
>>> +    zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
>>> +    zmd->mblk_shrinker->seeks = DEFAULT_SEEKS;
>>> +    zmd->mblk_shrinker->private_data = zmd;
>>> +
>>> +    shrinker_register(zmd->mblk_shrinker);
>>
>> I fail to see how this new shrinker API is better... Why isn't there a
>> shrinker_alloc_and_register() function ? That would avoid adding all this 
>> code
>> all over the place as the new API call would be very similar to the current
>> shrinker_register() call with static allocation.
> 
> In some registration scenarios, memory needs to be allocated in advance.
> So we continue to use the previous prealloc/register_prepared()
> algorithm. The shrinker_alloc_and_register() is just a helper function
> that combines the two, and this increases the number of APIs that
> shrinker exposes to the outside, so I choose not to add this helper.

And that results in more code in many places instead of less code + a simple
inline helper in the shrinker header file... So not adding that super simple
helper is not exactly the best choice in my opinion.

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Damien Le Moal
On 7/27/23 17:04, Qi Zheng wrote:
> In preparation for implementing lockless slab shrink, use new APIs to
> dynamically allocate the dm-zoned-meta shrinker, so that it can be freed
> asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
> read-side critical section when releasing the struct dmz_metadata.
> 
> Signed-off-by: Qi Zheng 
> Reviewed-by: Muchun Song 
> ---
>  drivers/md/dm-zoned-metadata.c | 28 
>  1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 9d3cca8e3dc9..0bcb26a43578 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -187,7 +187,7 @@ struct dmz_metadata {
>   struct rb_root  mblk_rbtree;
>   struct list_headmblk_lru_list;
>   struct list_headmblk_dirty_list;
> - struct shrinker mblk_shrinker;
> + struct shrinker *mblk_shrinker;
>  
>   /* Zone allocation management */
>   struct mutexmap_lock;
> @@ -615,7 +615,7 @@ static unsigned long dmz_shrink_mblock_cache(struct 
> dmz_metadata *zmd,
>  static unsigned long dmz_mblock_shrinker_count(struct shrinker *shrink,
>  struct shrink_control *sc)
>  {
> - struct dmz_metadata *zmd = container_of(shrink, struct dmz_metadata, 
> mblk_shrinker);
> + struct dmz_metadata *zmd = shrink->private_data;
>  
>   return atomic_read(&zmd->nr_mblks);
>  }
> @@ -626,7 +626,7 @@ static unsigned long dmz_mblock_shrinker_count(struct 
> shrinker *shrink,
>  static unsigned long dmz_mblock_shrinker_scan(struct shrinker *shrink,
> struct shrink_control *sc)
>  {
> - struct dmz_metadata *zmd = container_of(shrink, struct dmz_metadata, 
> mblk_shrinker);
> + struct dmz_metadata *zmd = shrink->private_data;
>   unsigned long count;
>  
>   spin_lock(&zmd->mblk_lock);
> @@ -2936,19 +2936,23 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>*/
>   zmd->min_nr_mblks = 2 + zmd->nr_map_blocks + zmd->zone_nr_bitmap_blocks 
> * 16;
>   zmd->max_nr_mblks = zmd->min_nr_mblks + 512;
> - zmd->mblk_shrinker.count_objects = dmz_mblock_shrinker_count;
> - zmd->mblk_shrinker.scan_objects = dmz_mblock_shrinker_scan;
> - zmd->mblk_shrinker.seeks = DEFAULT_SEEKS;
>  
>   /* Metadata cache shrinker */
> - ret = register_shrinker(&zmd->mblk_shrinker, "dm-zoned-meta:(%u:%u)",
> - MAJOR(dev->bdev->bd_dev),
> - MINOR(dev->bdev->bd_dev));
> - if (ret) {
> - dmz_zmd_err(zmd, "Register metadata cache shrinker failed");
> + zmd->mblk_shrinker = shrinker_alloc(0,  "dm-zoned-meta:(%u:%u)",
> + MAJOR(dev->bdev->bd_dev),
> + MINOR(dev->bdev->bd_dev));
> + if (!zmd->mblk_shrinker) {
> + dmz_zmd_err(zmd, "Allocate metadata cache shrinker failed");

ret is not set here, so dmz_ctr_metadata() will return success. You need to add:
ret = -ENOMEM;
or something.
>   goto err;
>   }
>  
> + zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
> + zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
> + zmd->mblk_shrinker->seeks = DEFAULT_SEEKS;
> + zmd->mblk_shrinker->private_data = zmd;
> +
> + shrinker_register(zmd->mblk_shrinker);

I fail to see how this new shrinker API is better... Why isn't there a
shrinker_alloc_and_register() function ? That would avoid adding all this code
all over the place as the new API call would be very similar to the current
shrinker_register() call with static allocation.

> +
>   dmz_zmd_info(zmd, "DM-Zoned metadata version %d", zmd->sb_version);
>   for (i = 0; i < zmd->nr_devs; i++)
>   dmz_print_dev(zmd, i);
> @@ -2995,7 +2999,7 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
>   */
>  void dmz_dtr_metadata(struct dmz_metadata *zmd)
>  {
> - unregister_shrinker(&zmd->mblk_shrinker);
> + shrinker_free(zmd->mblk_shrinker);
>   dmz_cleanup_metadata(zmd);
>   kfree(zmd);
>  }

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: don't reopen the main block device in f2fs_scan_devices

2023-07-10 Thread Damien Le Moal
On 7/11/23 10:52, Jaegeuk Kim wrote:
>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>> index ca31163da00a55..8d11d4a5ec331d 100644
>>>> --- a/fs/f2fs/super.c
>>>> +++ b/fs/f2fs/super.c
>>>> @@ -1560,7 +1560,8 @@ static void destroy_device_list(struct f2fs_sb_info 
>>>> *sbi)
>>>>  {
>>>>int i;
>>>>  
>>>> -  for (i = 0; i < sbi->s_ndevs; i++) {
>>>
>>> #ifdef CONFIG_BLK_DEV_ZONED
>>>
>>>> +  kvfree(FDEV(0).blkz_seq);
>>>
>>> #endif
>>
>> This should not be needed since for the !CONFIG_BLK_DEV_ZONED case,
>> FDEV(0).blkz_seq should always be NULL. However, what I think may be missing 
>> is
>> "FDEV(0).blkz_seq = NULL;" after the kvfree() call. No ?
> 
> I was looking at a glance of this:
> https://lore.kernel.org/linux-f2fs-devel/202307110542.nbamyzxe-...@intel.com/T/#u

OK. Got it.
I still think that it may be safer to add the NULL affectation though.


-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: don't reopen the main block device in f2fs_scan_devices

2023-07-10 Thread Damien Le Moal
if (i > 0 && !RDEV(i).path[0])
>> +if (i == 0)
>> +FDEV(0).bdev = sbi->sb->s_bdev;
>> +else if (!RDEV(i).path[0])
>>  break;
>>  
>> -if (max_devices == 1) {
>> -/* Single zoned block device mount */
>> -FDEV(0).bdev =
>> -blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev, mode,
>> -  sbi->sb->s_type, NULL);
>> -} else {
>> +if (max_devices > 1) {
>>  /* Multi-device mount */
>>  memcpy(FDEV(i).path, RDEV(i).path, MAX_PATH_LEN);
>>  FDEV(i).total_segments =
>> @@ -4215,10 +4212,9 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
>>  FDEV(i).end_blk = FDEV(i).start_blk +
>>  (FDEV(i).total_segments <<
>>  sbi->log_blocks_per_seg) - 1;
>> +FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path,
>> +mode, sbi->sb->s_type, NULL);
>>  }
>> -FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path, mode,
>> -  sbi->sb->s_type,
>> -  NULL);
>>  }
>>  if (IS_ERR(FDEV(i).bdev))
>>  return PTR_ERR(FDEV(i).bdev);
>> -- 
>> 2.39.2

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2 92/92] fs: rename i_ctime field to __i_ctime

2023-07-05 Thread Damien Le Moal
On 7/6/23 03:58, Jeff Layton wrote:
> Now that everything in-tree is converted to use the accessor functions,
> rename the i_ctime field in the inode to discourage direct access.
> 
> Signed-off-by: Jeff Layton 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2 08/92] fs: new helper: simple_rename_timestamp

2023-07-05 Thread Damien Le Moal
On 7/6/23 03:58, Jeff Layton wrote:
> A rename potentially involves updating 4 different inode timestamps. Add
> a function that handles the details sanely, and convert the libfs.c
> callers to use it.
> 
> Signed-off-by: Jeff Layton 
> ---
>  fs/libfs.c | 36 +++-
>  include/linux/fs.h |  2 ++
>  2 files changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index a7e56baf8bbd..9ee79668c909 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -692,6 +692,31 @@ int simple_rmdir(struct inode *dir, struct dentry 
> *dentry)
>  }
>  EXPORT_SYMBOL(simple_rmdir);
>  
> +/**
> + * simple_rename_timestamp - update the various inode timestamps for rename
> + * @old_dir: old parent directory
> + * @old_dentry: dentry that is being renamed
> + * @new_dir: new parent directory
> + * @new_dentry: target for rename
> + *
> + * POSIX mandates that the old and new parent directories have their ctime 
> and
> + * mtime updated, and that inodes of @old_dentry and @new_dentry (if any), 
> have
> + * their ctime updated.
> + */
> +void simple_rename_timestamp(struct inode *old_dir, struct dentry 
> *old_dentry,
> +  struct inode *new_dir, struct dentry *new_dentry)
> +{
> + struct inode *newino = d_inode(new_dentry);
> +
> + old_dir->i_mtime = inode_set_ctime_current(old_dir);
> + if (new_dir != old_dir)
> + new_dir->i_mtime = inode_set_ctime_current(new_dir);
> + inode_set_ctime_current(d_inode(old_dentry));
> + if (newino)
> + inode_set_ctime_current(newino);
> +}
> +EXPORT_SYMBOL_GPL(simple_rename_timestamp);
> +
>  int simple_rename_exchange(struct inode *old_dir, struct dentry *old_dentry,
>  struct inode *new_dir, struct dentry *new_dentry)
>  {
> @@ -707,11 +732,7 @@ int simple_rename_exchange(struct inode *old_dir, struct 
> dentry *old_dentry,
>   inc_nlink(old_dir);
>   }
>   }
> - old_dir->i_ctime = old_dir->i_mtime =
> - new_dir->i_ctime = new_dir->i_mtime =
> - d_inode(old_dentry)->i_ctime =
> - d_inode(new_dentry)->i_ctime = current_time(old_dir);
> -
> + simple_rename_timestamp(old_dir, old_dentry, new_dir, new_dentry);

This is somewhat changing the current behavior: before the patch, the mtime and
ctime of old_dir, new_dir and the inodes associated with the dentries are always
equal. But given that simple_rename_timestamp() calls inode_set_ctime_current()
4 times, the times could potentially be different.

I am not sure if that is an issue, but it seems that calling
inode_set_ctime_current() once, recording the "now" time it sets and using that
value to set all times may be more efficient and preserve the existing behavior.

>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(simple_rename_exchange);
> @@ -720,7 +741,6 @@ int simple_rename(struct mnt_idmap *idmap, struct inode 
> *old_dir,
> struct dentry *old_dentry, struct inode *new_dir,
> struct dentry *new_dentry, unsigned int flags)
>  {
> - struct inode *inode = d_inode(old_dentry);
>   int they_are_dirs = d_is_dir(old_dentry);
>  
>   if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE))
> @@ -743,9 +763,7 @@ int simple_rename(struct mnt_idmap *idmap, struct inode 
> *old_dir,
>   inc_nlink(new_dir);
>   }
>  
> - old_dir->i_ctime = old_dir->i_mtime = new_dir->i_ctime =
> - new_dir->i_mtime = inode->i_ctime = current_time(old_dir);
> -
> + simple_rename_timestamp(old_dir, old_dentry, new_dir, new_dentry);
>   return 0;
>  }
>  EXPORT_SYMBOL(simple_rename);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index bdfbd11a5811..14e38bd900f1 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2979,6 +2979,8 @@ extern int simple_open(struct inode *inode, struct file 
> *file);
>  extern int simple_link(struct dentry *, struct inode *, struct dentry *);
>  extern int simple_unlink(struct inode *, struct dentry *);
>  extern int simple_rmdir(struct inode *, struct dentry *);
> +void simple_rename_timestamp(struct inode *old_dir, struct dentry 
> *old_dentry,
> +  struct inode *new_dir, struct dentry *new_dentry);
>  extern int simple_rename_exchange(struct inode *old_dir, struct dentry 
> *old_dentry,
> struct inode *new_dir, struct dentry 
> *new_dentry);
>  extern int simple_rename(struct mnt_idmap *, struct inode *,

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2 07/92] fs: add ctime accessors infrastructure

2023-07-05 Thread Damien Le Moal
On 7/6/23 03:58, Jeff Layton wrote:
> struct timespec64 has unused bits in the tv_nsec field that can be used
> for other purposes. In future patches, we're going to change how the
> inode->i_ctime is accessed in certain inodes in order to make use of
> them. In order to do that safely though, we'll need to eradicate raw
> accesses of the inode->i_ctime field from the kernel.
> 
> Add new accessor functions for the ctime that we use to replace them.
> 
> Reviewed-by: Jan Kara 
> Reviewed-by: Luis Chamberlain 
> Signed-off-by: Jeff Layton 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 01/79] fs: add ctime accessors infrastructure

2023-06-21 Thread Damien Le Moal
On 6/21/23 23:45, Jeff Layton wrote:
> struct timespec64 has unused bits in the tv_nsec field that can be used
> for other purposes. In future patches, we're going to change how the
> inode->i_ctime is accessed in certain inodes in order to make use of
> them. In order to do that safely though, we'll need to eradicate raw
> accesses of the inode->i_ctime field from the kernel.
> 
> Add new accessor functions for the ctime that we can use to replace them.
> 
> Signed-off-by: Jeff Layton 

[...]

> +/**
> + * inode_ctime_peek - fetch the current ctime from the inode
> + * @inode: inode from which to fetch ctime
> + *
> + * Grab the current ctime from the inode and return it.
> + */
> +static inline struct timespec64 inode_ctime_peek(const struct inode *inode)

To be consistent with inode_ctime_set(), why not call this one inode_ctime_get()
? Also, inode_set_ctime() & inode_get_ctime() may be a little more natural. But
no strong opinion about that though.

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 09/12] fs: factor out a direct_write_fallback helper

2023-05-31 Thread Damien Le Moal
On 5/31/23 16:50, Christoph Hellwig wrote:
> Add a helper dealing with handling the syncing of a buffered write fallback
> for direct I/O.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 01/12] backing_dev: remove current->backing_dev_info

2023-05-31 Thread Damien Le Moal
On 5/31/23 16:50, Christoph Hellwig wrote:
> The last user of current->backing_dev_info disappeared in commit
> b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
> functions").  Remove the field and all assignments to it.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Hannes Reinecke 
> Reviewed-by: Darrick J. Wong 

Nice cleanup !

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 13/13] fuse: use direct_write_fallback

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Use the generic direct_write_fallback helper instead of duplicating the
> logic.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 12/13] fuse: drop redundant arguments to fuse_perform_write

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> pos is always equal to iocb->ki_pos, and mapping is always equal to
> iocb->ki_filp->f_mapping.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 11/13] fuse: update ki_pos in fuse_perform_write

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Both callers of fuse_perform_write need to updated ki_pos, move it into
> common code.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 10/13] fs: factor out a direct_write_fallback helper

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Add a helper dealing with handling the syncing of a buffered write fallback
> for direct I/O.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK. One comment below.

Reviewed-by: Damien Le Moal 

> + /*
> +  * We need to ensure that the page cache pages are written to disk and
> +  * invalidated to preserve the expected O_DIRECT semantics.
> +  */
> + end = pos + buffered_written - 1;
> + err = filemap_write_and_wait_range(mapping, pos, end);
> + if (err < 0) {
> + /*
> +  * We don't know how much we wrote, so just return the number of
> +  * bytes which were direct-written
> +  */
> + return err;
> + }
> + invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
> + return direct_written + buffered_written;

Why not adding here something like:

if (buffered_written != iov_iter_count(from))
return -EIO;

return direct_written + buffered_written;

to have the same semantic as plain DIO ?

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 09/13] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Use the common helpers for direct I/O page invalidation instead of
> open coding the logic.  This leads to a slight reordering of checks
> in __iomap_dio_rw to keep the logic straight.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 08/13] iomap: assign current->backing_dev_info in iomap_file_buffered_write

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Move the assignment to current->backing_dev_info from the callers into
> iomap_file_buffered_write to reduce boiler plate code and reduce the
> scope to just around the page dirtying loop.
> 
> Note that zonefs was missing this assignment before.

Hu... Shouldn't this be fixed as a separate patch with a Fixes tag for this 
cycle ?
I have never noticed any issues with this missing though. Not sure how an issue
can be triggered with this assignment missing.

Apart from that, this patch look good to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 07/13] iomap: update ki_pos in iomap_file_buffered_write

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> All callers of iomap_file_buffered_write need to updated ki_pos, move it
> into common code.
> 
> Signed-off-by: Christoph Hellwig 

One nit below.

Acked-by: Damien Le Moal 

> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 063133ec77f49e..550525a525c45c 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
> iov_iter *i,
>   .len= iov_iter_count(i),
>   .flags  = IOMAP_WRITE,
>   };
> - int ret;
> + ssize_t ret;
>  
>   if (iocb->ki_flags & IOCB_NOWAIT)
>   iter.flags |= IOMAP_NOWAIT;
>  
>   while ((ret = iomap_iter(&iter, ops)) > 0)
>   iter.processed = iomap_write_iter(&iter, i);
> - if (iter.pos == iocb->ki_pos)
> +
> + if (unlikely(ret < 0))

Nit: This could be if (unlikely(ret <= 0)), no ?

>   return ret;
> - return iter.pos - iocb->ki_pos;
> + ret = iter.pos - iocb->ki_pos;
> + iocb->ki_pos += ret;
> + return ret;


-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 06/13] filemap: add a kiocb_invalidate_post_write helper

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Add a helper to invalidate page cache after a dio write.
> 
> Signed-off-by: Christoph Hellwig 

Nit: kiocb_invalidate_post_dio_write() may be a better name to be explicit about
the fact that this is for DIOs only ?

Otherwise looks ok to me.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 05/13] filemap: add a kiocb_invalidate_pages helper

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Factor out a helper that calls filemap_write_and_wait_range and
> invalidate_inode_pages2_rangefor a the range covered by a write kiocb or

invalidate_inode_pages2_rangefor a the range
->
invalidate_inode_pages2_range for the range

> returns -EAGAIN if the kiocb is marked as nowait and there would be pages
> to write or invalidate.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 04/13] filemap: add a kiocb_write_and_wait helper

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Factor out a helper that does filemap_write_and_wait_range for a the

for a the -> for the

> range covered by a read kiocb, or returns -EAGAIN if the kiocb
> is marked as nowait and there would be pages to write.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 03/13] filemap: assign current->backing_dev_info in generic_perform_write

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Move the assignment to current->backing_dev_info from the callers into
> generic_perform_write to reduce boiler plate code and reduce the scope
> to just around the page dirtying loop.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 02/13] filemap: update ki_pos in generic_perform_write

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> All callers of generic_perform_write need to updated ki_pos, move it into
> common code.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 01/13] iomap: update ki_pos a little later in iomap_dio_complete

2023-05-21 Thread Damien Le Moal
On 5/19/23 18:35, Christoph Hellwig wrote:
> Move the ki_pos update down a bit to prepare for a better common
> helper that invalidates pages based of an iocb.
> 
> Signed-off-by: Christoph Hellwig 

Looks OK to me.

Reviewed-by: Damien Le Moal 

> + if (dio->flags & IOMAP_DIO_NEED_SYNC)
> + ret = generic_write_sync(iocb, ret);
> + if (ret > 0)
> + ret += dio->done_before;
> + }
>   trace_iomap_dio_complete(iocb, dio->error, ret);
>   kfree(dio);
> -

white line change. Personally, I like a blank line before returns to make them
stand out :)

>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_dio_complete);

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [RFC PATCH] f2fs: preserve direct write semantics when buffering is forced

2023-03-23 Thread Damien Le Moal via Linux-f2fs-devel
On Thu, 2023-03-23 at 16:46 -0700, Jaegeuk Kim wrote:
> On 03/23, Damien Le Moal wrote:
> > On Thu, 2023-03-23 at 15:14 -0700, Jaegeuk Kim wrote:
> > > On 03/20, Christoph Hellwig wrote:
> > > > On Mon, Feb 20, 2023 at 01:20:04PM +0100, Hans Holmberg wrote:
> > > > > A) Supporting proper direct writes for zoned block devices
> > > > > would
> > > > > be the best, but it is currently not supported (probably for
> > > > > a good but non-obvious reason). Would it be feasible to
> > > > > implement proper direct IO?
> > > > 
> > > > I don't think why not.  In many ways direct writes to zoned
> > > > devices
> > > > should be easier than non-direct writes.
> > > > 
> > > > Any comments from the maintainers why the direct I/O writes to
> > > > zoned
> > > > devices are disabled?  I could not find anything helpful in the
> > > > comments
> > > > or commit logs.
> > > 
> > > The direct IO wants to overwrite the data on the same block
> > > address,
> > > while
> > > zoned device does not support it?
> > 
> > Surely that is not the case with LFS mode, doesn't it ? Otherwise,
> > even
> > buffered overwrites would have an issue.
> 
> Zoned device only supports LFS mode.

Yes, and that was exactly my point: with LFS mode, O_DIRECT write
should never overwrite anything. So I do not see why direct writes
should be handled as buffered writes with zoned devices. Am I missing
something here ?

> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [RFC PATCH] f2fs: preserve direct write semantics when buffering is forced

2023-03-23 Thread Damien Le Moal via Linux-f2fs-devel
On Thu, 2023-03-23 at 15:14 -0700, Jaegeuk Kim wrote:
> On 03/20, Christoph Hellwig wrote:
> > On Mon, Feb 20, 2023 at 01:20:04PM +0100, Hans Holmberg wrote:
> > > A) Supporting proper direct writes for zoned block devices would
> > > be the best, but it is currently not supported (probably for
> > > a good but non-obvious reason). Would it be feasible to
> > > implement proper direct IO?
> > 
> > I don't think why not.  In many ways direct writes to zoned devices
> > should be easier than non-direct writes.
> > 
> > Any comments from the maintainers why the direct I/O writes to
> > zoned
> > devices are disabled?  I could not find anything helpful in the
> > comments
> > or commit logs.
> 
> The direct IO wants to overwrite the data on the same block address,
> while
> zoned device does not support it?

Surely that is not the case with LFS mode, doesn't it ? Otherwise, even
buffered overwrites would have an issue.


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 01/10] kobject: introduce kobject_del_and_put()

2023-03-22 Thread Damien Le Moal via Linux-f2fs-devel
On 3/23/23 01:58, Yangtao Li wrote:
> There are plenty of using kobject_del() and kobject_put() together
> in the kernel tree. This patch wraps these two calls in a single helper.
> 
> Signed-off-by: Yangtao Li 
> ---
> v3:
> -convert to inline helper
> v2:
> -add kobject_del_and_put() users
>  include/linux/kobject.h | 13 +
>  lib/kobject.c   |  3 +--
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index bdab370a24f4..e21b7c22e355 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -112,6 +112,19 @@ extern struct kobject * __must_check 
> kobject_get_unless_zero(
>   struct kobject *kobj);
>  extern void kobject_put(struct kobject *kobj);
>  
> +/**
> + * kobject_del_and_put() - Delete kobject.
> + * @kobj: object.
> + *
> + * Unlink kobject from hierarchy and decrement the refcount.

Unlink kobject from hierarchy and decrement its refcount.

> + * If refcount is 0, call kobject_cleanup().

That is done by kobject_put() and not explicitly done directly in this helper.
So I would not mention this to avoid confusion as you otherwise have a
description that does not match the code we can see here.

With that fixed, this looks OK to me, so feel free to add:

Reviewed-by: Damien Le Moal 

> + */
> +static inline void kobject_del_and_put(struct kobject *kobj)
> +{
> + kobject_del(kobj);
> + kobject_put(kobj);
> +}
> +
>  extern const void *kobject_namespace(const struct kobject *kobj);
>  extern void kobject_get_ownership(const struct kobject *kobj,
> kuid_t *uid, kgid_t *gid);
> diff --git a/lib/kobject.c b/lib/kobject.c
> index f79a434e1231..e6c5a3ff1c53 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -876,8 +876,7 @@ void kset_unregister(struct kset *k)
>  {
>   if (!k)
>   return;
> - kobject_del(&k->kobj);
> - kobject_put(&k->kobj);
> + kobject_del_and_put(&k->kobj);

Nit: You could simplify this one to be:

if (k)
kobject_del_and_put(&k->kobj);

and drop the return line.

>  }
>  EXPORT_SYMBOL(kset_unregister);
>  

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 02/10] f2fs: convert to kobject_del_and_put()

2023-03-22 Thread Damien Le Moal via Linux-f2fs-devel
On 3/23/23 01:58, Yangtao Li wrote:
> Use kobject_del_and_put() to simplify code.
> 
> Cc: Greg Kroah-Hartman 
> Cc: "Rafael J. Wysocki" 
> Cc: Damien Le Moal 
> Signed-off-by: Yangtao Li 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [RESEND, PATCH v2 01/10] kobject: introduce kobject_del_and_put()

2023-03-20 Thread Damien Le Moal via Linux-f2fs-devel
On 3/21/23 03:46, Yangtao Li wrote:
> There are plenty of using kobject_del() and kobject_put() together
> in the kernel tree. This patch wraps these two calls in a single helper.
> 
> Signed-off-by: Yangtao Li 
> ---
> v2:
> -add kobject_del_and_put() users
> resend patchset to gregkh, Rafael and Damien
>  include/linux/kobject.h |  1 +
>  lib/kobject.c   | 17 +++--
>  2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index bdab370a24f4..782d4bd119f8 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -111,6 +111,7 @@ extern struct kobject *kobject_get(struct kobject *kobj);
>  extern struct kobject * __must_check kobject_get_unless_zero(
>   struct kobject *kobj);
>  extern void kobject_put(struct kobject *kobj);
> +extern void kobject_del_and_put(struct kobject *kobj);
>  
>  extern const void *kobject_namespace(const struct kobject *kobj);
>  extern void kobject_get_ownership(const struct kobject *kobj,
> diff --git a/lib/kobject.c b/lib/kobject.c
> index 6e2f0bee3560..8c0293e37214 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -731,6 +731,20 @@ void kobject_put(struct kobject *kobj)
>  }
>  EXPORT_SYMBOL(kobject_put);
>  
> +/**
> + * kobject_del_and_put() - Delete kobject.
> + * @kobj: object.
> + *
> + * Unlink kobject from hierarchy and decrement the refcount.
> + * If refcount is 0, call kobject_cleanup().
> + */
> +void kobject_del_and_put(struct kobject *kobj)
> +{
> + kobject_del(kobj);
> + kobject_put(kobj);
> +}
> +EXPORT_SYMBOL_GPL(kobject_del_and_put);

Why not make this an inline helper defined in include/linux/kobject.h instead of
a new symbol ?

> +
>  static void dynamic_kobj_release(struct kobject *kobj)
>  {
>   pr_debug("kobject: (%p): %s\n", kobj, __func__);
> @@ -874,8 +888,7 @@ void kset_unregister(struct kset *k)
>  {
>   if (!k)
>       return;
> - kobject_del(&k->kobj);
> - kobject_put(&k->kobj);
> + kobject_del_and_put(&k->kobj);
>  }
>  EXPORT_SYMBOL(kset_unregister);
>  

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2, RESEND 01/10] kobject: introduce kobject_del_and_put()

2023-03-20 Thread Damien Le Moal via Linux-f2fs-devel
On 3/20/23 16:11, Yangtao Li wrote:
> Hi filesystem maintainers,
> 
>> Hard to comment on patches with this. It is only 10 patches. So send 
>> everything please.
> 
> If you are interested in the entire patchset besides Damien,
> please let me know. I'll resend the email later to cc more people.

Yes, I said I am interested, twice already. It is IMPOSSIBLE to review a patch
without the context of other patches before/after said patch. So if you want a
review/ack for zonefs, then send the entire series.

> 
> Thx,
> Yangtao

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2, RESEND 01/10] kobject: introduce kobject_del_and_put()

2023-03-19 Thread Damien Le Moal via Linux-f2fs-devel
On 3/20/23 12:34, Yangtao Li wrote:
> Hi all,
> 
> Out of consideration for minimizing disruption, I did not send the
> patchset to everyone. However, it seems that my consideration was
> unnecessary, so I CC'd everyone on the first patch. If you would
> like to see the entire patchset, you can access it at this address.
> 
> https://lore.kernel.org/lkml/20230319092641.41917-1-frank...@vivo.com/

Hard to comment on patches with this. It is only 10 patches. So send everything
please.

> 
> Thx,
> Yangtao

-- 
Damien Le Moal
Western Digital Research



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 01/16] block: make blkdev_nr_zones and blk_queue_zone_no generic for npo2 zsze

2022-05-03 Thread Damien Le Moal via Linux-f2fs-devel
On 2022/05/04 1:37, Bart Van Assche wrote:
> On 4/27/22 09:02, Pankaj Raghav wrote:
>> Adapt blkdev_nr_zones and blk_queue_zone_no function so that it can
>> also work for non-power-of-2 zone sizes.
>>
>> As the existing deployments of zoned devices had power-of-2
>> assumption, power-of-2 optimized calculation is kept for those devices.
>>
>> There are no direct hot paths modified and the changes just
>> introduce one new branch per call.
>>
>> Reviewed-by: Luis Chamberlain 
>> Signed-off-by: Pankaj Raghav 
>> ---
>>   block/blk-zoned.c  | 8 +++-
>>   include/linux/blkdev.h | 8 +++-
>>   2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>> index 38cd840d8838..1dff4a8bd51d 100644
>> --- a/block/blk-zoned.c
>> +++ b/block/blk-zoned.c
>> @@ -117,10 +117,16 @@ EXPORT_SYMBOL_GPL(__blk_req_zone_write_unlock);
>>   unsigned int blkdev_nr_zones(struct gendisk *disk)
>>   {
>>  sector_t zone_sectors = blk_queue_zone_sectors(disk->queue);
>> +sector_t capacity = get_capacity(disk);
>>   
>>  if (!blk_queue_is_zoned(disk->queue))
>>  return 0;
>> -return (get_capacity(disk) + zone_sectors - 1) >> ilog2(zone_sectors);
>> +
>> +if (is_power_of_2(zone_sectors))
>> +return (capacity + zone_sectors - 1) >>
>> +   ilog2(zone_sectors);
>> +
>> +return div64_u64(capacity + zone_sectors - 1, zone_sectors);
>>   }
>>   EXPORT_SYMBOL_GPL(blkdev_nr_zones);
> 
> Does anyone need support for more than 4 billion sectors per zone? If 
> not, do_div() should be sufficient.
> 
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 60d016138997..c4e4c7071b7b 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -665,9 +665,15 @@ static inline unsigned int blk_queue_nr_zones(struct 
>> request_queue *q)
>>   static inline unsigned int blk_queue_zone_no(struct request_queue *q,
>>   sector_t sector)
>>   {
>> +sector_t zone_sectors = blk_queue_zone_sectors(q);
>> +
>>  if (!blk_queue_is_zoned(q))
>>  return 0;
>> -return sector >> ilog2(q->limits.chunk_sectors);
>> +
>> +    if (is_power_of_2(zone_sectors))
>> +return sector >> ilog2(zone_sectors);
>> +
>> +return div64_u64(sector, zone_sectors);
>>   }
> 
> Same comment here.

sector_t is 64-bits even on 32-bits arch, no ?
so div64_u64 is needed here I think, which will be a simple regular division for
64-bit arch.

> 
> Thanks,
> 
> Bart.


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 16/16] dm-zoned: ensure only power of 2 zone sizes are allowed

2022-04-28 Thread Damien Le Moal via Linux-f2fs-devel
On 4/29/22 02:34, Luis Chamberlain wrote:
> On Thu, Apr 28, 2022 at 08:42:41AM +0900, Damien Le Moal wrote:
>> On 4/28/22 01:02, Pankaj Raghav wrote:
>>> From: Luis Chamberlain 
>>>
>>> Today dm-zoned relies on the assumption that you have a zone size
>>> with a power of 2. Even though the block layer today enforces this
>>> requirement, these devices do exist and so provide a stop-gap measure
>>> to ensure these devices cannot be used by mistake
>>>
>>> Signed-off-by: Luis Chamberlain 
>>> Signed-off-by: Pankaj Raghav 
>>> ---
>>>  drivers/md/dm-zone.c | 12 
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
>>> index 57daa86c19cf..221e0aa0f1a7 100644
>>> --- a/drivers/md/dm-zone.c
>>> +++ b/drivers/md/dm-zone.c
>>> @@ -231,6 +231,18 @@ static int dm_revalidate_zones(struct mapped_device 
>>> *md, struct dm_table *t)
>>> struct request_queue *q = md->queue;
>>> unsigned int noio_flag;
>>> int ret;
>>> +   struct block_device *bdev = md->disk->part0;
>>> +   sector_t zone_sectors;
>>> +   char bname[BDEVNAME_SIZE];
>>> +
>>> +   zone_sectors = bdev_zone_sectors(bdev);
>>> +
>>> +   if (!is_power_of_2(zone_sectors)) {
>>> +   DMWARN("%s: %s only power of two zone size supported\n",
>>> +  dm_device_name(md),
>>> +  bdevname(bdev, bname));
>>> +   return 1;
>>> +   }
>>
>> Why ?
>>
>> See my previous email about still allowing ZC < ZS for non power of 2 zone
>> size drives. dm-zoned can easily support non power of 2 zone size as long
>> as ZC == ZS for all zones.
> 
> Great, thanks for the heads up.
> 
>> The problem with dm-zoned is ZC < ZS *AND* potentially variable ZC per
>> zone. That cannot be supported easily (still not impossible, but
>> definitely a lot more complex).
> 
> I see thanks.
> 
> Testing would still be required to ensure this all works well with npo2.
> So I'd prefer to do that as a separate effort, even if it is easy. So
> for now I think it makes sense to avoid this as this is not yet well
> tested.
> 
> As with filesystem support, we've even have gotten hints that support
> for npo2 should be easy, but without proper testing it would not be
> prudent to enable support for users yet.
> 
> One step at a time.

Yes, in general, I agree. But in this case, that will create kernel
versions that end up having partial support for zoned drives. Not ideal to
say the least. So if the patches are not that big, I would rather like to
see everything go into a single release.

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 03/16] block: add bdev_zone_no helper

2022-04-27 Thread Damien Le Moal via Linux-f2fs-devel
On 4/28/22 01:02, Pankaj Raghav wrote:
> Many places in the filesystem for zoned devices open code this function
> to find the zone number for a given sector with power of 2 assumption.
> This generic helper can be used to calculate zone number for a given
> sector in a block device
> 
> This helper internally uses blk_queue_zone_no to find the zone number.
> 
> Reviewed-by: Luis Chamberlain 
> Signed-off-by: Pankaj Raghav 
> ---
>  include/linux/blkdev.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index f8f2d2998afb..55293e0a8702 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1392,6 +1392,15 @@ static inline bool bdev_zone_aligned(struct 
> block_device *bdev, sector_t sec)
>   return false;
>  }
>  
> +static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t 
> sec)
> +{
> + struct request_queue *q = bdev_get_queue(bdev);
> +
> + if (q)

q is never NULL. So this can be simplified to:

return blk_queue_zone_no(bdev_get_queue(bdev), sector);

> + return blk_queue_zone_no(q, sec);
> + return 0;
> +}
> +
>  static inline unsigned int bdev_max_open_zones(struct block_device *bdev)
>  {
>   struct request_queue *q = bdev_get_queue(bdev);


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 04/16] block: allow blk-zoned devices to have non-power-of-2 zone size

2022-04-27 Thread Damien Le Moal via Linux-f2fs-devel
On 4/28/22 01:02, Pankaj Raghav wrote:
> Convert the calculations on zone size to be generic instead of relying on
> power_of_2 based logic in the block layer using the helpers wherever
> possible.
> 
> The only hot path affected by this change for power_of_2 zoned devices
> is in blk_check_zone_append() but the effects should be negligible as the
> helper blk_queue_zone_aligned() optimizes the calculation for those
> devices. Note that the append path cannot be accessed by direct raw access
> to the block device but only through a filesystem abstraction.
> 
> Finally, remove the check for power_of_2 zone size requirement in
> blk-zoned.c
> 
> Reviewed-by: Luis Chamberlain 
> Signed-off-by: Pankaj Raghav 
> ---
>  block/blk-core.c  |  3 +--
>  block/blk-zoned.c | 12 ++--
>  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 937bb6b86331..850caf311064 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -634,8 +634,7 @@ static inline blk_status_t blk_check_zone_append(struct 
> request_queue *q,
>   return BLK_STS_NOTSUPP;
>  
>   /* The bio sector must point to the start of a sequential zone */
> - if (pos & (blk_queue_zone_sectors(q) - 1) ||
> - !blk_queue_zone_is_seq(q, pos))
> + if (!blk_queue_zone_aligned(q, pos) || !blk_queue_zone_is_seq(q, pos))

blk_queue_zone_aligned() is a little confusing since "aligned" is also
used for write-pointer aligned. I would rename this helper

blk_queue_is_zone_start()

or something like that.


>   return BLK_STS_IOERR;
>  
>   /*
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 1dff4a8bd51d..f7c7c3bd148d 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -288,10 +288,10 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum 
> req_opf op,
>   return -EINVAL;
>  
>   /* Check alignment (handle eventual smaller last zone) */
> - if (sector & (zone_sectors - 1))
> + if (!blk_queue_zone_aligned(q, sector))
>   return -EINVAL;
>  
> - if ((nr_sectors & (zone_sectors - 1)) && end_sector != capacity)
> + if (!blk_queue_zone_aligned(q, nr_sectors) && end_sector != capacity)
>   return -EINVAL;
>  
>   /*
> @@ -489,14 +489,14 @@ static int blk_revalidate_zone_cb(struct blk_zone 
> *zone, unsigned int idx,
>* smaller last zone.
>*/
>   if (zone->start == 0) {
> - if (zone->len == 0 || !is_power_of_2(zone->len)) {
> - pr_warn("%s: Invalid zoned device with non power of two 
> zone size (%llu)\n",
> - disk->disk_name, zone->len);
> + if (zone->len == 0) {
> + pr_warn("%s: Invalid zoned device size",
> + disk->disk_name);

The message is weird now. Please change it to "Invalid zone size".

Also, the entire premise of this patch series is that it is hard for
people to support the unusable sectors between zone capacity and zone end
for drives with a zone capacity smaller than the zone size.

Yet, here you do not check that zone capacity == zone size for drives that
do not have a zone size equal to a power of 2 number of sectors. This
means that we can still have drives with ZC < ZS AND ZS not equal to a
power of 2. So from the point of view of your arguments, no gains at all.
Any thoughts on this ?

>   return -ENODEV;
>   }
>  
>   args->zone_sectors = zone->len;
> - args->nr_zones = (capacity + zone->len - 1) >> ilog2(zone->len);
> + args->nr_zones = div64_u64(capacity + zone->len - 1, zone->len);
>   } else if (zone->start + args->zone_sectors < capacity) {
>   if (zone->len != args->zone_sectors) {
>   pr_warn("%s: Invalid zoned device with non constant 
> zone size\n",


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 16/16] dm-zoned: ensure only power of 2 zone sizes are allowed

2022-04-27 Thread Damien Le Moal via Linux-f2fs-devel
On 4/28/22 01:02, Pankaj Raghav wrote:
> From: Luis Chamberlain 
> 
> Today dm-zoned relies on the assumption that you have a zone size
> with a power of 2. Even though the block layer today enforces this
> requirement, these devices do exist and so provide a stop-gap measure
> to ensure these devices cannot be used by mistake
> 
> Signed-off-by: Luis Chamberlain 
> Signed-off-by: Pankaj Raghav 
> ---
>  drivers/md/dm-zone.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
> index 57daa86c19cf..221e0aa0f1a7 100644
> --- a/drivers/md/dm-zone.c
> +++ b/drivers/md/dm-zone.c
> @@ -231,6 +231,18 @@ static int dm_revalidate_zones(struct mapped_device *md, 
> struct dm_table *t)
>   struct request_queue *q = md->queue;
>   unsigned int noio_flag;
>   int ret;
> + struct block_device *bdev = md->disk->part0;
> + sector_t zone_sectors;
> + char bname[BDEVNAME_SIZE];
> +
> + zone_sectors = bdev_zone_sectors(bdev);
> +
> + if (!is_power_of_2(zone_sectors)) {
> + DMWARN("%s: %s only power of two zone size supported\n",
> +dm_device_name(md),
> +bdevname(bdev, bname));
> + return 1;
> + }

Why ?

See my previous email about still allowing ZC < ZS for non power of 2 zone
size drives. dm-zoned can easily support non power of 2 zone size as long
as ZC == ZS for all zones.

The problem with dm-zoned is ZC < ZS *AND* potentially variable ZC per
zone. That cannot be supported easily (still not impossible, but
definitely a lot more complex).

>  
>   /*
>* Check if something changed. If yes, cleanup the current resources


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 12/16] zonefs: allow non power of 2 zoned devices

2022-04-27 Thread Damien Le Moal via Linux-f2fs-devel
On 4/28/22 01:02, Pankaj Raghav wrote:
> The zone size shift variable is useful only if the zone sizes are known
> to be power of 2. Remove that variable and use generic helpers from
> block layer to calculate zone index in zonefs

Period missing at the end of the sentence.

What about zonefs-tools and its test suite ? Is everything still OK on
that front ? I suspect not...

> 
> Reviewed-by: Luis Chamberlain 
> Signed-off-by: Pankaj Raghav 
> ---
>  fs/zonefs/super.c  | 6 ++
>  fs/zonefs/zonefs.h | 1 -
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 3614c7834007..5422be2ca570 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -401,10 +401,9 @@ static void __zonefs_io_error(struct inode *inode, bool 
> write)
>  {
>   struct zonefs_inode_info *zi = ZONEFS_I(inode);
>   struct super_block *sb = inode->i_sb;
> - struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
>   unsigned int noio_flag;
>   unsigned int nr_zones =
> - zi->i_zone_size >> (sbi->s_zone_sectors_shift + SECTOR_SHIFT);
> + bdev_zone_no(sb->s_bdev, zi->i_zone_size >> SECTOR_SHIFT);
>   struct zonefs_ioerr_data err = {
>   .inode = inode,
>   .write = write,
> @@ -1300,7 +1299,7 @@ static void zonefs_init_file_inode(struct inode *inode, 
> struct blk_zone *zone,
>   struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
>   struct zonefs_inode_info *zi = ZONEFS_I(inode);
>  
> - inode->i_ino = zone->start >> sbi->s_zone_sectors_shift;
> + inode->i_ino = bdev_zone_no(sb->s_bdev, zone->start);
>   inode->i_mode = S_IFREG | sbi->s_perm;
>  
>   zi->i_ztype = type;
> @@ -1647,7 +1646,6 @@ static int zonefs_fill_super(struct super_block *sb, 
> void *data, int silent)
>* interface constraints.
>*/
>   sb_set_blocksize(sb, bdev_zone_write_granularity(sb->s_bdev));
> - sbi->s_zone_sectors_shift = ilog2(bdev_zone_sectors(sb->s_bdev));
>   sbi->s_uid = GLOBAL_ROOT_UID;
>   sbi->s_gid = GLOBAL_ROOT_GID;
>   sbi->s_perm = 0640;
> diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
> index 7b147907c328..2d5ea3be3a8e 100644
> --- a/fs/zonefs/zonefs.h
> +++ b/fs/zonefs/zonefs.h
> @@ -175,7 +175,6 @@ struct zonefs_sb_info {
>   kgid_t      s_gid;
>   umode_t s_perm;
>   uuid_t  s_uuid;
> - unsigned ints_zone_sectors_shift;
>  
>   unsigned ints_nr_files[ZONEFS_ZTYPE_MAX];
>  


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: use flush command instead of FUA for zoned device

2022-04-21 Thread Damien Le Moal via Linux-f2fs-devel
On 4/20/22 06:57, Jaegeuk Kim wrote:
> The block layer for zoned disk can reorder the FUA'ed IOs. Let's use flush
> command to keep the write order.

Stricktly speaking, for a request that has data, the problem is triggered
by REQ_PREFLUSH since in this case the request does not go through the
scheduler and is processed through the blk-flush machinery. REQ_FUA on its
own should not matter if the device supports it. If the device does not
support FUA, then the same problem can happen due to POSTFLUSH (again no
scheduler).

Bypassing the scheduler leads to the write not write-locking the zone,
which leads to reordering... Completely overlooked that case when the zone
write locking was implemented.

Ideally, the FS should not have to care about this. blk-flush machinery
should be a little more intelligent and process the write phase of the
request using the scheduler. Need to look into that.

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/file.c | 4 +++-
>  fs/f2fs/node.c | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index f08e6208e183..2aef0632f35b 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -372,7 +372,9 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
> start, loff_t end,
>   f2fs_remove_ino_entry(sbi, ino, APPEND_INO);
>   clear_inode_flag(inode, FI_APPEND_WRITE);
>  flush_out:
> - if (!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER)
> + if ((!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER) ||
> + (atomic && !test_opt(sbi, NOBARRIER) &&
> + f2fs_sb_has_blkzoned(sbi)))

Aligning the conditions and not breaking the second line would make this a
lot easier to read...

>   ret = f2fs_issue_flush(sbi, inode->i_ino);
>   if (!ret) {
>   f2fs_remove_ino_entry(sbi, ino, UPDATE_INO);
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index c280f482c741..7224a980056f 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1633,7 +1633,7 @@ static int __write_node_page(struct page *page, bool 
> atomic, bool *submitted,
>   goto redirty_out;
>   }
>  
> - if (atomic && !test_opt(sbi, NOBARRIER))
> + if (atomic && !test_opt(sbi, NOBARRIER) && !f2fs_sb_has_blkzoned(sbi))
>   fio.op_flags |= REQ_PREFLUSH | REQ_FUA;

Is this really OK to do ? flush + write as different operations may not
lead to the same result as a preflush+fua write.

Until the block layer is fixed to properly handle this, a simpler fix for
f2fs would be to force enable the NOBARRIER option for zoned drives ? That
would avoid these changes no ?

Also, with all the testing we do on SMR disks and f2fs (smaller, older SMR
disks due to the 16TB limit), we never have triggered this problem. How
did you trigger it ?

>  
>   /* should add to global list before clearing PAGECACHE status */


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 10/27] mm: use bdev_is_zoned in claim_swapfile

2022-04-14 Thread Damien Le Moal via Linux-f2fs-devel
On 4/15/22 13:52, Christoph Hellwig wrote:
> Use the bdev based helper instead of poking into the queue.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  mm/swapfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 63c61f8b26118..4c7537162af5e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2761,7 +2761,7 @@ static int claim_swapfile(struct swap_info_struct *p, 
> struct inode *inode)
>* write only restriction.  Hence zoned block devices are not
>* suitable for swapping.  Disallow them here.
>*/
> - if (blk_queue_is_zoned(p->bdev->bd_disk->queue))
> + if (bdev_is_zoned(p->bdev))
>   return -EINVAL;
>   p->flags |= SWP_BLKDEV;
>   } else if (S_ISREG(inode->i_mode)) {

Looks good.

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 27/27] direct-io: remove random prefetches

2022-04-14 Thread Damien Le Moal via Linux-f2fs-devel
On 4/15/22 13:52, Christoph Hellwig wrote:
> Randomly poking into block device internals for manual prefetches isn't
> exactly a very maintainable thing to do.  And none of the performance
> criticil direct I/O implementations still use this library function

s/criticil/critical

> anyway, so just drop it.
> 
> Signed-off-by: Christoph Hellwig 

Looks good to me.

Reviewed-by: Damien Le Moal 


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/3] libf2fs: add support to report zone size

2022-04-12 Thread Damien Le Moal via Linux-f2fs-devel
On Tue, 2022-04-12 at 09:23 -0700, Luis Chamberlain wrote:
> On Tue, Apr 12, 2022 at 12:14:13PM +0000, Damien Le Moal wrote:
> > On Tue, 2022-04-12 at 13:27 +0200, Pankaj Raghav wrote:
> > > From: Luis Chamberlain 
> > > 
> > > Expand get_device_info() to report the zone size.
> > > This is now important give power of 2 zone sizees (PO2)
> > 
> > s/give/given that
> > s/sizees/size
> > 
> > > can exist, and so can non power of 2 zones sizes (NPO2),
> > 
> > No they cannot, not yet in Linux.
> 
> They *do* exist in the real world, and in Linux they do come up when
> users are manually removing the current po2 requirement on Linux
> sources [0].
> 
> So how about:
> 
> This is now important give power of 2 zone sizees (PO2)
> do exist and some users are manually forcing to enable these
> on Linux [0].

Please fix the grammar and typos as mentioned above.

> 
> [0] 
> https://lkml.kernel.org/r/CAJjM=8Cap1bwu8cp-mRTsiBmbHH=ed8pp9vdasig2o_ziht...@mail.gmail.com

This link does not work: lore says "not found"

> 
> > Overall, I really do not see the point of this patch.
> 
> It makes the subsequent patch easier to read and also makes
> the displaying zone support separate easier to review. It can
> certainly be merged, but I don't think it makes the last patch
> as easier to read. It's subjective and up to the maintainer.

In my opinion, a single patch that adds power of 2 check right after
chunk_sectors is read would be a lot simpler and easier to understand.

> 
>   Luis

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 1/3] libf2fs_zoned: factor out helper to get chunk sectors

2022-04-12 Thread Damien Le Moal via Linux-f2fs-devel
On Tue, 2022-04-12 at 13:27 +0200, Pankaj Raghav wrote:
> From: Luis Chamberlain 
> 
> This moves the code which fetches the chunk_sectors into a helper.
> Yes, this could in theory be used by non-zoned devices but that's
> not the case yet, so no need to make this a generic routine.
> 
> This makes it clear what this is doing, and helps us make the
> subsequent changes easier to read.
> 
> Signed-off-by: Luis Chamberlain 

Pankaj, since you are sending the patch, add your Signed-ff-by please.

> ---
>  lib/libf2fs_zoned.c | 27 +++
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
> index ce73b9a..1447181 100644
> --- a/lib/libf2fs_zoned.c
> +++ b/lib/libf2fs_zoned.c
> @@ -146,35 +146,46 @@ int f2fs_get_zoned_model(int i)
>   return 0;
>  }
>  
> -int f2fs_get_zone_blocks(int i)
> +uint64_t f2fs_get_zone_chunk_sectors(struct device_info *dev)

chunk_sectors is an unsigned int. Why the change to uint64_t ?

>  {
> - struct device_info *dev = c.devices + i;
>   uint64_t sectors;
>   char str[PATH_MAX];
>   FILE *file;
>   int res;
>  
> - /* Get zone size */
> - dev->zone_blocks = 0;
> -
>   res = get_sysfs_path(dev, "queue/chunk_sectors", str, sizeof(str));
>   if (res != 0) {
>   MSG(0, "\tError: Failed to get device sysfs attribute path\n");
> - return -1;
> + return 0;
>   }
>  
>   file = fopen(str, "r");
>   if (!file)
> - return -1;
> + return 0;
>  
>   memset(str, 0, sizeof(str));
>   res = fscanf(file, "%s", str);
>   fclose(file);
>  
>   if (res != 1)
> - return -1;
> + return 0;
>  
>   sectors = atol(str);
> + if (!sectors)
> + return 0;
> +
> + return sectors;
> +}
> +
> +int f2fs_get_zone_blocks(int i)
> +{
> + struct device_info *dev = c.devices + i;
> + uint64_t sectors;
> +
> + /* Get zone size */
> + dev->zone_blocks = 0;
> +
> + sectors = f2fs_get_zone_chunk_sectors(dev);
>   if (!sectors)
>   return -1;
>  

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 3/3] libf2fs: don't allow mkfs / fsck on zoned NPO2

2022-04-12 Thread Damien Le Moal via Linux-f2fs-devel
On Tue, 2022-04-12 at 13:27 +0200, Pankaj Raghav wrote:
> From: Luis Chamberlain 
> 
> f2fs currently only work with zoned storage devices with a zone
> size which is a power of 2 (PO2). So check if a non-power of 2
> zone is device is found, and if so disallow its use. This prevents
> users from incorrectly using these devices.
> 
> This is a non-issue today give today's kernel does not allow NPO2
> zon devices to exist. But these devices do exist, and so proactively
> put a stop-gap measure in place to prevent the from being assumed
> to be used.
> 
> Signed-off-by: Luis Chamberlain 
> Signed-off-by: Pankaj Raghav 
> ---
>  lib/libf2fs.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/lib/libf2fs.c b/lib/libf2fs.c
> index 8fad1d7..a13ba32 100644
> --- a/lib/libf2fs.c
> +++ b/lib/libf2fs.c
> @@ -882,6 +882,11 @@ static int open_check_fs(char *path, int flag)
>   return open(path, O_RDONLY | flag);
>  }
>  
> +static int is_power_of_2(unsigned long n)
> +{
> + return (n != 0 && ((n & (n - 1)) == 0));
> +}
> +
>  int get_device_info(int i)
>  {
>   int32_t fd = 0;
> @@ -1043,6 +1048,13 @@ int get_device_info(int i)
>   return -1;
>   }
>  
> + if (!dev->zone_size || !is_power_of_2(dev->zone_size)) {
> + MSG(0, "\tError: zoned: illegal zone size %lu (not a 
> power of 2)\n",
> + dev->zone_size);

The message should be different for the !dev->zone_size case since that
would be an error.

> + free(stat_buf);
> + return -1;
> + }
> +
>   /*
>* Check zone configuration: for the first disk of a
>* multi-device volume, conventional zones are needed.

Of the 3 patches of this series, this one is the only one that makes sense
to me. I fail to see how the first 2 patches improve things.

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/3] libf2fs: add support to report zone size

2022-04-12 Thread Damien Le Moal via Linux-f2fs-devel
On Tue, 2022-04-12 at 13:27 +0200, Pankaj Raghav wrote:
> From: Luis Chamberlain 
> 
> Expand get_device_info() to report the zone size.
> This is now important give power of 2 zone sizees (PO2)

s/give/given that
s/sizees/size

> can exist, and so can non power of 2 zones sizes (NPO2),

No they cannot, not yet in Linux.

> and we should be aware of the differences in terms of
> support.
> 
> This will be used more in subsequent patch.
> 
> Signed-off-by: Luis Chamberlain 
> Signed-off-by: Pankaj Raghav 
> ---
>  include/f2fs_fs.h   | 1 +
>  lib/libf2fs.c   | 5 +++--
>  lib/libf2fs_zoned.c | 5 +++--
>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
> index d236437..83c5b33 100644
> --- a/include/f2fs_fs.h
> +++ b/include/f2fs_fs.h
> @@ -386,6 +386,7 @@ struct device_info {
>   u_int32_t nr_zones;
>   u_int32_t nr_rnd_zones;
>   size_t zone_blocks;
> + uint64_t zone_size;
>   size_t *zone_cap_blocks;
>  };
>  
> diff --git a/lib/libf2fs.c b/lib/libf2fs.c
> index 420dfda..8fad1d7 100644
> --- a/lib/libf2fs.c
> +++ b/lib/libf2fs.c
> @@ -1055,8 +1055,9 @@ int get_device_info(int i)
>   MSG(0, "Info: Host-%s zoned block device:\n",
>   (dev->zoned_model == F2FS_ZONED_HA) ?
>   "aware" : "managed");
> - MSG(0, "  %u zones, %u randomly writeable zones\n",
> - dev->nr_zones, dev->nr_rnd_zones);
> + MSG(0, "  %u zones, %lu zone size: %u randomly writeable 
> zones\n",

No unit mentioned for zone size in the message.

> + dev->nr_zones, dev->zone_size,
> + dev->nr_rnd_zones);
>   MSG(0, "  %lu blocks per zone\n",
>   dev->zone_blocks);
>   }
> diff --git a/lib/libf2fs_zoned.c b/lib/libf2fs_zoned.c
> index 1447181..0acae88 100644
> --- a/lib/libf2fs_zoned.c
> +++ b/lib/libf2fs_zoned.c
> @@ -189,8 +189,9 @@ int f2fs_get_zone_blocks(int i)
>   if (!sectors)
>   return -1;
>  
> - dev->zone_blocks = sectors >> (F2FS_BLKSIZE_BITS - 9);
> - sectors = (sectors << 9) / c.sector_size;
> + dev->zone_size = sectors << SECTOR_SHIFT;
> + dev->zone_blocks = sectors >> (F2FS_BLKSIZE_BITS - SECTOR_SHIFT);
> + sectors = dev->zone_size / c.sector_size;
>  
>   /*
>* Total number of zones: there may

Overall, I really do not see the point of this patch.

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 14/27] block: add a bdev_max_zone_append_sectors helper

2022-04-06 Thread Damien Le Moal via Linux-f2fs-devel

On 4/6/22 15:05, Christoph Hellwig wrote:

Add a helper to check the max supported sectors for zone append based on
the block_device instead of having to poke into the block layer internal
request_queue.

Signed-off-by: Christoph Hellwig 
---
  drivers/nvme/target/zns.c | 3 +--
  fs/zonefs/super.c | 3 +--
  include/linux/blkdev.h| 6 ++
  3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index e34718b095504..82b61acf7a72b 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -34,8 +34,7 @@ static int validate_conv_zones_cb(struct blk_zone *z,
  
  bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)

  {
-   struct request_queue *q = ns->bdev->bd_disk->queue;
-   u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
+   u8 zasl = nvmet_zasl(bdev_max_zone_append_sectors(ns->bdev));
struct gendisk *bd_disk = ns->bdev->bd_disk;
int ret;
  
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c

index 3614c7834007d..7a63807b736c4 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -678,13 +678,12 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, 
struct iov_iter *from)
struct inode *inode = file_inode(iocb->ki_filp);
struct zonefs_inode_info *zi = ZONEFS_I(inode);
struct block_device *bdev = inode->i_sb->s_bdev;
-   unsigned int max;
+   unsigned int max = bdev_max_zone_append_sectors(bdev);
struct bio *bio;
ssize_t size;
int nr_pages;
ssize_t ret;
  
-	max = queue_max_zone_append_sectors(bdev_get_queue(bdev));

max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
iov_iter_truncate(from, max);
  
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h

index a433798c3343e..f8c50b77543eb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1188,6 +1188,12 @@ static inline unsigned int 
queue_max_zone_append_sectors(const struct request_qu
return min(l->max_zone_append_sectors, l->max_sectors);
  }
  
+static inline unsigned int

+bdev_max_zone_append_sectors(struct block_device *bdev)
+{
+   return queue_max_zone_append_sectors(bdev_get_queue(bdev));
+}
+
  static inline unsigned queue_logical_block_size(const struct request_queue *q)
  {
    int retval = 512;


Looks good.

Acked-by: Damien Le Moal 

--
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v4 0/6] IO priority fixes and improvements

2021-08-18 Thread Damien Le Moal
On 2021/08/18 22:24, Jens Axboe wrote:
> On 8/10/21 9:36 PM, Damien Le Moal wrote:
>> This series fixes problems with IO priority values handling and cleans
>> up several macro names and code for clarity.
> 
> Applied for 5.15 - note that I dropped the lightnvm change from 6/6,
> as it isn't strictly needed and lightnvm is deleted from the 5.15
> drivers branch anyway.

OK. Thanks !



-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v4 0/6] IO priority fixes and improvements

2021-08-18 Thread Damien Le Moal
On 2021/08/11 12:37, Damien Le Moal wrote:
> This series fixes problems with IO priority values handling and cleans
> up several macro names and code for clarity.

Jens,

Any comment on this ?

> 
> Changes from v3:
> * Split former patch 2 into patches 2, 3 and 4 to facilitate review and
>   have more descriptive commit titles.
> * In patch 5, keep IOPRIO_BE_NR as an alias for the new IOPRIO_NR_LEVELS
>   macro. Change this patch title and commit message accordingly.
> * In patch 6, define IOPRIO_BE_NORM as an alias of IOPRIO_NORM.
> 
> Changes from v2:
> * Fixed typo in a comment in patch 3
> * Added reviewed-by tags
> 
> Changes from v1:
> * Added patch 4 to unify the default priority value used in various
>   places.
> * Fixed patch 2 as suggested by Bart: remove extra parenthesis and move
>   ioprio_valid() from the uapi header to the kernel header.
> * In patch 2, add priority value masking.
> 
> Damien Le Moal (6):
>   block: bfq: fix bfq_set_next_ioprio_data()
>   block: improve ioprio class description comment
>   block: change ioprio_valid() to an inline function
>   block: fix IOPRIO_PRIO_CLASS() and IOPRIO_PRIO_VALUE() macros
>   block: Introduce IOPRIO_NR_LEVELS
>   block: fix default IO priority handling
> 
>  block/bfq-iosched.c  | 10 +-
>  block/bfq-iosched.h  |  4 ++--
>  block/bfq-wf2q.c |  6 +++---
>  block/ioprio.c   |  9 -
>  drivers/nvme/host/lightnvm.c |  2 +-
>  fs/f2fs/sysfs.c  |  2 +-
>  include/linux/ioprio.h   | 17 -
>  include/uapi/linux/ioprio.h  | 34 ++++--
>  8 files changed, 52 insertions(+), 32 deletions(-)
> 


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v4 3/6] block: change ioprio_valid() to an inline function

2021-08-11 Thread Damien Le Moal
On 2021/08/11 16:56, Johannes Thumshirn wrote:
> On 11/08/2021 05:37, Damien Le Moal wrote:
>> Change the ioprio_valid() macro in include/usapi/linux/ioprio.h to an
>> uapi ~^
> 
>> inline function declared on the kernel side in include/linux/ioprio.h.
>> Also improve checks on the class value by checking the upper bound
>> value.
> 
> But I think it needs to stay in include/uapi/linux/ioprio.h as it's there
> since the 2.6.x days (I've checked back to v2.6.39.4) so the chance of
> user-space using it is quite high.

include/uapi/linux/ioprio.h is being introduced with kernel 5.15. This user
header did not exist now and in previous kernels. include/linux/ioprio.h has
been around for a while though, but that is a kernel header, not an application
header.


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v4 4/6] block: fix IOPRIO_PRIO_CLASS() and IOPRIO_PRIO_VALUE() macros

2021-08-10 Thread Damien Le Moal
The ki_ioprio field of struct kiocb is 16-bits (u16) but often handled
as an int in the block layer. E.g. ioprio_check_cap() takes an int as
argument.

With such implicit int casting function calls, the upper 16-bits of the
int argument may be left uninitialized by the compiler, resulting in
invalid values for the IOPRIO_PRIO_CLASS() macro (garbage upper bits)
and in an error return for functions such as ioprio_check_cap().

Fix this by masking the result of the shift by IOPRIO_CLASS_SHIFT bits
in the IOPRIO_PRIO_CLASS() macro. The new macro IOPRIO_CLASS_MASK
defines the 3-bits mask for the priority class.
Similarly, apply the IOPRIO_PRIO_MASK mask to the data argument of the
IOPRIO_PRIO_VALUE() macro to ignore the upper bits of the data value.
The IOPRIO_CLASS_MASK mask is also applied to the class argument of this
macro before shifting the result by IOPRIO_CLASS_SHIFT bits.

While at it, also change the argument name of the IOPRIO_PRIO_CLASS()
and IOPRIO_PRIO_DATA() macros from "mask" to "ioprio" to reflect the
fact that a priority value should be passed rather than a mask.

Signed-off-by: Damien Le Moal 
---
 include/uapi/linux/ioprio.h | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index 5064e230374c..936f0d8f30e1 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -5,12 +5,16 @@
 /*
  * Gives us 8 prio classes with 13-bits of data for each class
  */
-#define IOPRIO_CLASS_SHIFT (13)
+#define IOPRIO_CLASS_SHIFT 13
+#define IOPRIO_CLASS_MASK  0x07
 #define IOPRIO_PRIO_MASK   ((1UL << IOPRIO_CLASS_SHIFT) - 1)
 
-#define IOPRIO_PRIO_CLASS(mask)((mask) >> IOPRIO_CLASS_SHIFT)
-#define IOPRIO_PRIO_DATA(mask) ((mask) & IOPRIO_PRIO_MASK)
-#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | data)
+#define IOPRIO_PRIO_CLASS(ioprio)  \
+   (((ioprio) >> IOPRIO_CLASS_SHIFT) & IOPRIO_CLASS_MASK)
+#define IOPRIO_PRIO_DATA(ioprio)   ((ioprio) & IOPRIO_PRIO_MASK)
+#define IOPRIO_PRIO_VALUE(class, data) \
+   class) & IOPRIO_CLASS_MASK) << IOPRIO_CLASS_SHIFT) | \
+((data) & IOPRIO_PRIO_MASK))
 
 /*
  * These are the io priority groups as implemented by the BFQ and mq-deadline
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v4 3/6] block: change ioprio_valid() to an inline function

2021-08-10 Thread Damien Le Moal
Change the ioprio_valid() macro in include/usapi/linux/ioprio.h to an
inline function declared on the kernel side in include/linux/ioprio.h.
Also improve checks on the class value by checking the upper bound
value.

Signed-off-by: Damien Le Moal 
---
 include/linux/ioprio.h  | 10 ++
 include/uapi/linux/ioprio.h |  2 --
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index ef9ad4fb245f..2ee3373684b1 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -8,6 +8,16 @@
 
 #include 
 
+/*
+ * Check that a priority value has a valid class.
+ */
+static inline bool ioprio_valid(unsigned short ioprio)
+{
+   unsigned short class = IOPRIO_PRIO_CLASS(ioprio);
+
+   return class > IOPRIO_CLASS_NONE && class <= IOPRIO_CLASS_IDLE;
+}
+
 /*
  * if process has set io priority explicitly, use that. if not, convert
  * the cpu scheduler nice value to an io priority
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index 6b735854aebd..5064e230374c 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -27,8 +27,6 @@ enum {
IOPRIO_CLASS_IDLE,
 };
 
-#define ioprio_valid(mask) (IOPRIO_PRIO_CLASS((mask)) != IOPRIO_CLASS_NONE)
-
 /*
  * 8 best effort priority levels are supported
  */
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v4 0/6] IO priority fixes and improvements

2021-08-10 Thread Damien Le Moal
This series fixes problems with IO priority values handling and cleans
up several macro names and code for clarity.

Changes from v3:
* Split former patch 2 into patches 2, 3 and 4 to facilitate review and
  have more descriptive commit titles.
* In patch 5, keep IOPRIO_BE_NR as an alias for the new IOPRIO_NR_LEVELS
  macro. Change this patch title and commit message accordingly.
* In patch 6, define IOPRIO_BE_NORM as an alias of IOPRIO_NORM.

Changes from v2:
* Fixed typo in a comment in patch 3
* Added reviewed-by tags

Changes from v1:
* Added patch 4 to unify the default priority value used in various
  places.
* Fixed patch 2 as suggested by Bart: remove extra parenthesis and move
  ioprio_valid() from the uapi header to the kernel header.
* In patch 2, add priority value masking.

Damien Le Moal (6):
  block: bfq: fix bfq_set_next_ioprio_data()
  block: improve ioprio class description comment
  block: change ioprio_valid() to an inline function
  block: fix IOPRIO_PRIO_CLASS() and IOPRIO_PRIO_VALUE() macros
  block: Introduce IOPRIO_NR_LEVELS
  block: fix default IO priority handling

 block/bfq-iosched.c  | 10 +-
 block/bfq-iosched.h  |  4 ++--
 block/bfq-wf2q.c |  6 +++---
 block/ioprio.c   |  9 -
 drivers/nvme/host/lightnvm.c |  2 +-
 fs/f2fs/sysfs.c  |  2 +-
 include/linux/ioprio.h   | 17 -
 include/uapi/linux/ioprio.h  | 34 --
 8 files changed, 52 insertions(+), 32 deletions(-)

-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v4 6/6] block: fix default IO priority handling

2021-08-10 Thread Damien Le Moal
The default IO priority is the best effort (BE) class with the
normal priority level IOPRIO_NORM (4). However, get_task_ioprio()
returns IOPRIO_CLASS_NONE/IOPRIO_NORM as the default priority and
get_current_ioprio() returns IOPRIO_CLASS_NONE/0. Let's be consistent
with the defined default and have both of these functions return the
default priority IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM) when
the user did not define another default IO priority for the task.

In include/uapi/linux/ioprio.h, introduce the IOPRIO_BE_NORM macro as
an alias to IOPRIO_NORM to clarify that this default level applies to
the BE priotity class. In include/linux/ioprio.h, define the macro
IOPRIO_DEFAULT as IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM)
and use this new macro when setting a priority to the default.

Signed-off-by: Damien Le Moal 
---
 block/bfq-iosched.c  | 2 +-
 block/ioprio.c   | 6 +++---
 drivers/nvme/host/lightnvm.c | 2 +-
 include/linux/ioprio.h   | 7 ++-
 include/uapi/linux/ioprio.h  | 5 +++--
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 4b434369e411..e92bc0348433 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5411,7 +5411,7 @@ static struct bfq_queue **bfq_async_queue_prio(struct 
bfq_data *bfqd,
case IOPRIO_CLASS_RT:
return &bfqg->async_bfqq[0][ioprio];
case IOPRIO_CLASS_NONE:
-   ioprio = IOPRIO_NORM;
+   ioprio = IOPRIO_BE_NORM;
fallthrough;
case IOPRIO_CLASS_BE:
return &bfqg->async_bfqq[1][ioprio];
diff --git a/block/ioprio.c b/block/ioprio.c
index ca6b136c5586..0e4ff245f2bf 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -170,7 +170,7 @@ static int get_task_ioprio(struct task_struct *p)
ret = security_task_getioprio(p);
if (ret)
goto out;
-   ret = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, IOPRIO_NORM);
+   ret = IOPRIO_DEFAULT;
task_lock(p);
if (p->io_context)
ret = p->io_context->ioprio;
@@ -182,9 +182,9 @@ static int get_task_ioprio(struct task_struct *p)
 int ioprio_best(unsigned short aprio, unsigned short bprio)
 {
if (!ioprio_valid(aprio))
-   aprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   aprio = IOPRIO_DEFAULT;
if (!ioprio_valid(bprio))
-   bprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   bprio = IOPRIO_DEFAULT;
 
return min(aprio, bprio);
 }
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index e9d9ad47f70f..0fbbff0b3edb 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -662,7 +662,7 @@ static struct request *nvme_nvm_alloc_request(struct 
request_queue *q,
if (rqd->bio)
blk_rq_append_bio(rq, rqd->bio);
else
-   rq->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   rq->ioprio = IOPRIO_DEFAULT;
 
return rq;
 }
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 2ee3373684b1..3f53bc27a19b 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -8,6 +8,11 @@
 
 #include 
 
+/*
+ * Default IO priority.
+ */
+#define IOPRIO_DEFAULT IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM)
+
 /*
  * Check that a priority value has a valid class.
  */
@@ -51,7 +56,7 @@ static inline int get_current_ioprio(void)
 
if (ioc)
return ioc->ioprio;
-   return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+   return IOPRIO_DEFAULT;
 }
 
 /*
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index aac39338d02c..f70f2596a6bf 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -44,8 +44,9 @@ enum {
 };
 
 /*
- * Fallback BE priority
+ * Fallback BE priority level.
  */
-#define IOPRIO_NORM(4)
+#define IOPRIO_NORM4
+#define IOPRIO_BE_NORM IOPRIO_NORM
 
 #endif /* _UAPI_LINUX_IOPRIO_H */
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v4 2/6] block: improve ioprio class description comment

2021-08-10 Thread Damien Le Moal
In include/usapi/linux/ioprio.h, change the ioprio class enum comment
to remove the outdated reference to CFQ and mention BFQ and mq-deadline
instead. Also document the high priority NCQ command use for RT class
IOs directed at ATA drives that support NCQ priority.

Signed-off-by: Damien Le Moal 
---
 include/uapi/linux/ioprio.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index 77b17e08b0da..6b735854aebd 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -13,10 +13,12 @@
 #define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | data)
 
 /*
- * These are the io priority groups as implemented by CFQ. RT is the realtime
- * class, it always gets premium service. BE is the best-effort scheduling
- * class, the default for any process. IDLE is the idle scheduling class, it
- * is only served when no one else is using the disk.
+ * These are the io priority groups as implemented by the BFQ and mq-deadline
+ * schedulers. RT is the realtime class, it always gets premium service. For
+ * ATA disks supporting NCQ IO priority, RT class IOs will be processed using
+ * high priority NCQ commands. BE is the best-effort scheduling class, the
+ * default for any process. IDLE is the idle scheduling class, it is only
+ * served when no one else is using the disk.
  */
 enum {
IOPRIO_CLASS_NONE,
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v4 5/6] block: Introduce IOPRIO_NR_LEVELS

2021-08-10 Thread Damien Le Moal
The BFQ scheduler and ioprio_check_cap() both assume that the RT
priority class (IOPRIO_CLASS_RT) can have up to 8 different priority
levels, similarly to the BE class (IOPRIO_CLASS_iBE). This is
controlled using the IOPRIO_BE_NR macro , which is badly named as the
number of levels also applies to the RT class.

Introduce the class independent IOPRIO_NR_LEVELS macro, defined to 8,
to make things clear. Keep the old IOPRIO_BE_NR macro definition as an
alias for IOPRIO_NR_LEVELS.

Signed-off-by: Damien Le Moal 
---
 block/bfq-iosched.c | 8 
 block/bfq-iosched.h | 4 ++--
 block/bfq-wf2q.c| 6 +++---
 block/ioprio.c  | 3 +--
 fs/f2fs/sysfs.c | 2 +-
 include/uapi/linux/ioprio.h | 5 +++--
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index e546a5f4bff9..4b434369e411 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2508,7 +2508,7 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
int i, j;
 
for (i = 0; i < 2; i++)
-   for (j = 0; j < IOPRIO_BE_NR; j++)
+   for (j = 0; j < IOPRIO_NR_LEVELS; j++)
if (bfqg->async_bfqq[i][j])
bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
if (bfqg->async_idle_bfqq)
@@ -5293,10 +5293,10 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct 
bfq_io_cq *bic)
break;
}
 
-   if (bfqq->new_ioprio >= IOPRIO_BE_NR) {
+   if (bfqq->new_ioprio >= IOPRIO_NR_LEVELS) {
pr_crit("bfq_set_next_ioprio_data: new_ioprio %d\n",
bfqq->new_ioprio);
-   bfqq->new_ioprio = IOPRIO_BE_NR - 1;
+   bfqq->new_ioprio = IOPRIO_NR_LEVELS - 1;
}
 
bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
@@ -6825,7 +6825,7 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct 
bfq_group *bfqg)
int i, j;
 
for (i = 0; i < 2; i++)
-   for (j = 0; j < IOPRIO_BE_NR; j++)
+   for (j = 0; j < IOPRIO_NR_LEVELS; j++)
__bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
 
__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 99c2a3cb081e..385e28a843d1 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -931,7 +931,7 @@ struct bfq_group {
 
void *bfqd;
 
-   struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+   struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
struct bfq_queue *async_idle_bfqq;
 
struct bfq_entity *my_entity;
@@ -948,7 +948,7 @@ struct bfq_group {
struct bfq_entity entity;
struct bfq_sched_data sched_data;
 
-   struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+   struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
struct bfq_queue *async_idle_bfqq;
 
struct rb_root rq_pos_tree;
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 7a462df71f68..b74cc0da118e 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -505,7 +505,7 @@ static void bfq_active_insert(struct bfq_service_tree *st,
  */
 unsigned short bfq_ioprio_to_weight(int ioprio)
 {
-   return (IOPRIO_BE_NR - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
+   return (IOPRIO_NR_LEVELS - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
 }
 
 /**
@@ -514,12 +514,12 @@ unsigned short bfq_ioprio_to_weight(int ioprio)
  *
  * To preserve as much as possible the old only-ioprio user interface,
  * 0 is used as an escape ioprio value for weights (numerically) equal or
- * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
+ * larger than IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF.
  */
 static unsigned short bfq_weight_to_ioprio(int weight)
 {
return max_t(int, 0,
-IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight);
+IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF - weight);
 }
 
 static void bfq_get_entity(struct bfq_entity *entity)
diff --git a/block/ioprio.c b/block/ioprio.c
index bee628f9f1b2..ca6b136c5586 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -74,9 +74,8 @@ int ioprio_check_cap(int ioprio)
fallthrough;
/* rt has prio field too */
case IOPRIO_CLASS_BE:
-   if (data >= IOPRIO_BE_NR || data < 0)
+   if (data >= IOPRIO_NR_LEVELS || data < 0)
return -EINVAL;
-
break;
case IOPRIO_CLASS_IDLE:
break;
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 6642246206bd..daad532a4e2b 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -378,7 +378,7 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
ret = kstrtol(name, 10, &am

[f2fs-dev] [PATCH v4 1/6] block: bfq: fix bfq_set_next_ioprio_data()

2021-08-10 Thread Damien Le Moal
For a request that has a priority level equal to or larger than
IOPRIO_BE_NR, bfq_set_next_ioprio_data() prints a critical warning but
defaults to setting the request new_ioprio field to IOPRIO_BE_NR. This
is not consistent with the warning and the allowed values for priority
levels. Fix this by setting the request new_ioprio field to
IOPRIO_BE_NR - 1, the lowest priority level allowed.

Cc: 
Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an 
extra scheduler")
Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index e4a61eda2d0f..e546a5f4bff9 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5296,7 +5296,7 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct 
bfq_io_cq *bic)
if (bfqq->new_ioprio >= IOPRIO_BE_NR) {
pr_crit("bfq_set_next_ioprio_data: new_ioprio %d\n",
bfqq->new_ioprio);
-   bfqq->new_ioprio = IOPRIO_BE_NR;
+   bfqq->new_ioprio = IOPRIO_BE_NR - 1;
}
 
bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 4/4] block: fix default IO priority handling

2021-08-07 Thread Damien Le Moal via Linux-f2fs-devel
On 2021/08/08 1:19, Jens Axboe wrote:
> On 8/6/21 5:18 AM, Damien Le Moal wrote:
>> diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
>> index 99d37d4807b8..5b4a39c2f623 100644
>> --- a/include/uapi/linux/ioprio.h
>> +++ b/include/uapi/linux/ioprio.h
>> @@ -42,8 +42,8 @@ enum {
>>  };
>>  
>>  /*
>> - * Fallback BE priority
>> + * Fallback BE priority level.
>>   */
>> -#define IOPRIO_NORM 4
>> +#define IOPRIO_BE_NORM  4
> 
> This again seems like a very poor idea.

OK. Will remove that. Or we could do:

#define IOPRIO_NORM 4
#define IOPRIO_BE_NORM  IOPRIO_NORM

In case other classes want to set a different default... Though, that is not
critical I think.

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 2/4] block: fix ioprio interface

2021-08-07 Thread Damien Le Moal via Linux-f2fs-devel
On 2021/08/08 1:18, Jens Axboe wrote:
> On 8/6/21 5:18 AM, Damien Le Moal wrote:
>> An iocb aio_reqprio field is 16-bits (u16) but often handled as an int
>> in the block layer. E.g. ioprio_check_cap() takes an int as argument.
> 
> I'd just reference kiocb->ki_ioprio, that's transport agnostic.
> 
> Apart from that, this one looks fine to me. A better commit title
> would do wonders too, though, it's very non-descript.

OK. Will cleanup commit title and message.


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v3 3/4] block: rename IOPRIO_BE_NR

2021-08-07 Thread Damien Le Moal via Linux-f2fs-devel
On 2021/08/08 1:16, Jens Axboe wrote:
> On 8/6/21 5:18 AM, Damien Le Moal wrote:
>> diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
>> index abc40965aa96..99d37d4807b8 100644
>> --- a/include/uapi/linux/ioprio.h
>> +++ b/include/uapi/linux/ioprio.h
>> @@ -31,9 +31,9 @@ enum {
>>  };
>>  
>>  /*
>> - * 8 best effort priority levels are supported
>> + * The RT and BE priority classes both support up to 8 priority levels.
>>   */
>> -#define IOPRIO_BE_NR8
>> +#define IOPRIO_NR_LEVELS8
> 
> That might not be a good idea, if an application already uses
> IOPRIO_BE_NR...

Hmmm. include/uapi/linux/ioprio.h is being introduced with kernel 5.15. These
definition are not UAPI level right now.

What about something like this:

#define IOPRIO_NR_LEVELS8
#define IOPRIO_BE_NRIOPRIO_NR_LEVELS

To keep IOPRIO_BE_NR ?

OR,

Keep IOPRIO_BE_NR as is in include/uapi/linux/ioprio.h and add

#define IOPRIO_NR_LEVELSIOPRIO_BE_NR

in include/linux/ioprio.h ?

Both would still allow doing some cleanup kernel side.

Or I can just drop this patch too.

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v3 3/4] block: rename IOPRIO_BE_NR

2021-08-06 Thread Damien Le Moal
The BFQ scheduler and ioprio_check_cap() both assume that the RT
priority class (IOPRIO_CLASS_RT) can have up to 8 different priority
levels. This is controlled using the macro IOPRIO_BE_NR, which is badly
named as the number of levels also applies to the RT class.

Rename IOPRIO_BE_NR to the class independent IOPRIO_NR_LEVELS to make
things clear.

Signed-off-by: Damien Le Moal 
---
 block/bfq-iosched.c | 8 
 block/bfq-iosched.h | 4 ++--
 block/bfq-wf2q.c| 6 +++---
 block/ioprio.c  | 3 +--
 fs/f2fs/sysfs.c | 2 +-
 include/uapi/linux/ioprio.h | 4 ++--
 6 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 1f38d75524ae..d5824cab34d7 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2505,7 +2505,7 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
int i, j;
 
for (i = 0; i < 2; i++)
-   for (j = 0; j < IOPRIO_BE_NR; j++)
+   for (j = 0; j < IOPRIO_NR_LEVELS; j++)
if (bfqg->async_bfqq[i][j])
bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
if (bfqg->async_idle_bfqq)
@@ -5290,10 +5290,10 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct 
bfq_io_cq *bic)
break;
}
 
-   if (bfqq->new_ioprio >= IOPRIO_BE_NR) {
+   if (bfqq->new_ioprio >= IOPRIO_NR_LEVELS) {
pr_crit("bfq_set_next_ioprio_data: new_ioprio %d\n",
bfqq->new_ioprio);
-   bfqq->new_ioprio = IOPRIO_BE_NR - 1;
+   bfqq->new_ioprio = IOPRIO_NR_LEVELS - 1;
}
 
bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
@@ -6822,7 +6822,7 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct 
bfq_group *bfqg)
int i, j;
 
for (i = 0; i < 2; i++)
-   for (j = 0; j < IOPRIO_BE_NR; j++)
+   for (j = 0; j < IOPRIO_NR_LEVELS; j++)
__bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
 
__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 99c2a3cb081e..385e28a843d1 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -931,7 +931,7 @@ struct bfq_group {
 
void *bfqd;
 
-   struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+   struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
struct bfq_queue *async_idle_bfqq;
 
struct bfq_entity *my_entity;
@@ -948,7 +948,7 @@ struct bfq_group {
struct bfq_entity entity;
struct bfq_sched_data sched_data;
 
-   struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+   struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
struct bfq_queue *async_idle_bfqq;
 
struct rb_root rq_pos_tree;
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 7a462df71f68..b74cc0da118e 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -505,7 +505,7 @@ static void bfq_active_insert(struct bfq_service_tree *st,
  */
 unsigned short bfq_ioprio_to_weight(int ioprio)
 {
-   return (IOPRIO_BE_NR - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
+   return (IOPRIO_NR_LEVELS - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
 }
 
 /**
@@ -514,12 +514,12 @@ unsigned short bfq_ioprio_to_weight(int ioprio)
  *
  * To preserve as much as possible the old only-ioprio user interface,
  * 0 is used as an escape ioprio value for weights (numerically) equal or
- * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
+ * larger than IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF.
  */
 static unsigned short bfq_weight_to_ioprio(int weight)
 {
return max_t(int, 0,
-IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight);
+IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF - weight);
 }
 
 static void bfq_get_entity(struct bfq_entity *entity)
diff --git a/block/ioprio.c b/block/ioprio.c
index bee628f9f1b2..ca6b136c5586 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -74,9 +74,8 @@ int ioprio_check_cap(int ioprio)
fallthrough;
/* rt has prio field too */
case IOPRIO_CLASS_BE:
-   if (data >= IOPRIO_BE_NR || data < 0)
+   if (data >= IOPRIO_NR_LEVELS || data < 0)
return -EINVAL;
-
break;
case IOPRIO_CLASS_IDLE:
break;
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 6642246206bd..daad532a4e2b 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -378,7 +378,7 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
ret = kstrtol(name, 10, &data);
if (ret)
return ret;
-   if (data >= IOPRIO_BE_NR || data <

[f2fs-dev] [PATCH v3 2/4] block: fix ioprio interface

2021-08-06 Thread Damien Le Moal
An iocb aio_reqprio field is 16-bits (u16) but often handled as an int
in the block layer. E.g. ioprio_check_cap() takes an int as argument.
With such implicit int casting function calls, the upper 16-bits of the
int argument may be left uninitialized by the compiler, resulting in
invalid values for the IOPRIO_PRIO_CLASS() macro (garbage upper bits)
and in an error return for functions such as ioprio_check_cap().

Fix this by masking the result of the shift by IOPRIO_CLASS_SHIFT bits
in the IOPRIO_PRIO_CLASS() macro. The new macro IOPRIO_CLASS_MASK
defines the 3-bits mask for the priority class.

While at it, cleanup the following:
* Apply the mask IOPRIO_PRIO_MASK to the data argument of the
  IOPRIO_PRIO_VALUE() macro to ignore upper bits of the data value.
* Remove unnecessary parenthesis around fixed values in the macro
  definitions in include/uapi/linux/ioprio.h.
* Update the outdated mention of CFQ in the comment describing priority
  classes and instead mention BFQ and mq-deadline.
* Change the argument name of the IOPRIO_PRIO_CLASS() and
  IOPRIO_PRIO_DATA() macros from "mask" to "ioprio" to reflect the fact
  that an IO priority value should be passed rather than a mask.
* Change the ioprio_valid() macro into an inline function, adding a
  check on the maximum value of the class of a priority value as
  defined by the IOPRIO_CLASS_MAX enum value. Move this function to
  the kernel side in include/linux/ioprio.h.
* Remove the unnecessary "else" after the return statements in
  task_nice_ioclass().

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 include/linux/ioprio.h  | 15 ---
 include/uapi/linux/ioprio.h | 19 +++
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index ef9ad4fb245f..9b3a6d8172b4 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -8,6 +8,16 @@
 
 #include 
 
+/*
+ * Check that a priority value has a valid class.
+ */
+static inline bool ioprio_valid(unsigned short ioprio)
+{
+   unsigned short class = IOPRIO_PRIO_CLASS(ioprio);
+
+   return class > IOPRIO_CLASS_NONE && class < IOPRIO_CLASS_MAX;
+}
+
 /*
  * if process has set io priority explicitly, use that. if not, convert
  * the cpu scheduler nice value to an io priority
@@ -25,10 +35,9 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   if (task_is_realtime(task))
return IOPRIO_CLASS_RT;
-   else
-   return IOPRIO_CLASS_BE;
+   return IOPRIO_CLASS_BE;
 }
 
 /*
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index 77b17e08b0da..abc40965aa96 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -5,12 +5,15 @@
 /*
  * Gives us 8 prio classes with 13-bits of data for each class
  */
-#define IOPRIO_CLASS_SHIFT (13)
+#define IOPRIO_CLASS_SHIFT 13
+#define IOPRIO_CLASS_MASK  0x07
 #define IOPRIO_PRIO_MASK   ((1UL << IOPRIO_CLASS_SHIFT) - 1)
 
-#define IOPRIO_PRIO_CLASS(mask)((mask) >> IOPRIO_CLASS_SHIFT)
-#define IOPRIO_PRIO_DATA(mask) ((mask) & IOPRIO_PRIO_MASK)
-#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | data)
+#define IOPRIO_PRIO_CLASS(ioprio)  \
+   (((ioprio) >> IOPRIO_CLASS_SHIFT) & IOPRIO_CLASS_MASK)
+#define IOPRIO_PRIO_DATA(ioprio)   ((ioprio) & IOPRIO_PRIO_MASK)
+#define IOPRIO_PRIO_VALUE(class, data) \
+   (((class) << IOPRIO_CLASS_SHIFT) | ((data) & IOPRIO_PRIO_MASK))
 
 /*
  * These are the io priority groups as implemented by CFQ. RT is the realtime
@@ -23,14 +26,14 @@ enum {
IOPRIO_CLASS_RT,
IOPRIO_CLASS_BE,
IOPRIO_CLASS_IDLE,
-};
 
-#define ioprio_valid(mask) (IOPRIO_PRIO_CLASS((mask)) != IOPRIO_CLASS_NONE)
+   IOPRIO_CLASS_MAX,
+};
 
 /*
  * 8 best effort priority levels are supported
  */
-#define IOPRIO_BE_NR   (8)
+#define IOPRIO_BE_NR   8
 
 enum {
IOPRIO_WHO_PROCESS = 1,
@@ -41,6 +44,6 @@ enum {
 /*
  * Fallback BE priority
  */
-#define IOPRIO_NORM(4)
+#define IOPRIO_NORM4
 
 #endif /* _UAPI_LINUX_IOPRIO_H */
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v3 0/4] IO priority fixes and improvements

2021-08-06 Thread Damien Le Moal
This series fixes problems with IO priority values handling and cleans
up several macro names and code for clarity.

Changes from v2:
* Fixed typo in a comment in patch 3
* Added reviewed-by tags

Changes from v1:
* Added patch 4 to unify the default priority value used in various
  places.
* Fixed patch 2 as suggested by Bart: remove extra parenthesis and move
  ioprio_valid() from the uapi header to the kernel header.
* In patch 2, add priority value masking.

Damien Le Moal (4):
  block: bfq: fix bfq_set_next_ioprio_data()
  block: fix ioprio interface
  block: rename IOPRIO_BE_NR
  block: fix default IO priority handling

 block/bfq-iosched.c  | 10 +-
 block/bfq-iosched.h  |  4 ++--
 block/bfq-wf2q.c |  6 +++---
 block/ioprio.c   |  9 -
 drivers/nvme/host/lightnvm.c |  2 +-
 fs/f2fs/sysfs.c  |  2 +-
 include/linux/ioprio.h   | 22 ++
 include/uapi/linux/ioprio.h  | 23 +--
 8 files changed, 47 insertions(+), 31 deletions(-)

-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v3 1/4] block: bfq: fix bfq_set_next_ioprio_data()

2021-08-06 Thread Damien Le Moal
For a request that has a priority level equal to or larger than
IOPRIO_BE_NR, bfq_set_next_ioprio_data() prints a critical warning but
defaults to setting the request new_ioprio field to IOPRIO_BE_NR. This
is not consistent with the warning and the allowed values for priority
levels. Fix this by setting the request new_ioprio field to
IOPRIO_BE_NR - 1, the lowest priority level allowed.

Cc: 
Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an 
extra scheduler")
Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 727955918563..1f38d75524ae 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5293,7 +5293,7 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct 
bfq_io_cq *bic)
if (bfqq->new_ioprio >= IOPRIO_BE_NR) {
pr_crit("bfq_set_next_ioprio_data: new_ioprio %d\n",
bfqq->new_ioprio);
-   bfqq->new_ioprio = IOPRIO_BE_NR;
+   bfqq->new_ioprio = IOPRIO_BE_NR - 1;
}
 
bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v3 4/4] block: fix default IO priority handling

2021-08-06 Thread Damien Le Moal
The default IO priority is the best effort (BE) class with the
normal priority level IOPRIO_NORM (4). However, get_task_ioprio()
returns IOPRIO_CLASS_NONE/IOPRIO_NORM as the default priority and
get_current_ioprio() returns IOPRIO_CLASS_NONE/0. Let's be consistent
with the defined default and have both of these functions return the
default priority IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM) when
the user did not define another default IO priority for the task.

In include/linux/ioprio.h, rename the IOPRIO_NORM macro to
IOPRIO_BE_NORM to clarify that this default level applies to the BE
priotity class. Also, define the macro IOPRIO_DEFAULT as
IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM) and use this new
macro when setting a priority to the default.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 block/bfq-iosched.c  | 2 +-
 block/ioprio.c   | 6 +++---
 drivers/nvme/host/lightnvm.c | 2 +-
 include/linux/ioprio.h   | 7 ++-
 include/uapi/linux/ioprio.h  | 4 ++--
 5 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d5824cab34d7..a07d630c6972 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5408,7 +5408,7 @@ static struct bfq_queue **bfq_async_queue_prio(struct 
bfq_data *bfqd,
case IOPRIO_CLASS_RT:
return &bfqg->async_bfqq[0][ioprio];
case IOPRIO_CLASS_NONE:
-   ioprio = IOPRIO_NORM;
+   ioprio = IOPRIO_BE_NORM;
fallthrough;
case IOPRIO_CLASS_BE:
return &bfqg->async_bfqq[1][ioprio];
diff --git a/block/ioprio.c b/block/ioprio.c
index ca6b136c5586..0e4ff245f2bf 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -170,7 +170,7 @@ static int get_task_ioprio(struct task_struct *p)
ret = security_task_getioprio(p);
if (ret)
goto out;
-   ret = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, IOPRIO_NORM);
+   ret = IOPRIO_DEFAULT;
task_lock(p);
if (p->io_context)
ret = p->io_context->ioprio;
@@ -182,9 +182,9 @@ static int get_task_ioprio(struct task_struct *p)
 int ioprio_best(unsigned short aprio, unsigned short bprio)
 {
if (!ioprio_valid(aprio))
-   aprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   aprio = IOPRIO_DEFAULT;
if (!ioprio_valid(bprio))
-   bprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   bprio = IOPRIO_DEFAULT;
 
return min(aprio, bprio);
 }
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index e9d9ad47f70f..0fbbff0b3edb 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -662,7 +662,7 @@ static struct request *nvme_nvm_alloc_request(struct 
request_queue *q,
if (rqd->bio)
blk_rq_append_bio(rq, rqd->bio);
else
-   rq->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   rq->ioprio = IOPRIO_DEFAULT;
 
return rq;
 }
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 9b3a6d8172b4..2837c3a0d2e1 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -8,6 +8,11 @@
 
 #include 
 
+/*
+ * Default IO priority.
+ */
+#define IOPRIO_DEFAULT IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM)
+
 /*
  * Check that a priority value has a valid class.
  */
@@ -50,7 +55,7 @@ static inline int get_current_ioprio(void)
 
if (ioc)
return ioc->ioprio;
-   return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+   return IOPRIO_DEFAULT;
 }
 
 /*
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index 99d37d4807b8..5b4a39c2f623 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -42,8 +42,8 @@ enum {
 };
 
 /*
- * Fallback BE priority
+ * Fallback BE priority level.
  */
-#define IOPRIO_NORM4
+#define IOPRIO_BE_NORM 4
 
 #endif /* _UAPI_LINUX_IOPRIO_H */
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2 2/4] block: fix ioprio interface

2021-08-06 Thread Damien Le Moal
On 2021/08/06 15:35, Hannes Reinecke wrote:
> On 8/6/21 7:11 AM, Damien Le Moal wrote:
>> An iocb aio_reqprio field is 16-bits (u16) but often handled as an int
>> in the block layer. E.g. ioprio_check_cap() takes an int as argument.
>> With such implicit int casting function calls, the upper 16-bits of the
>> int argument may be left uninitialized by the compiler, resulting in
>> invalid values for the IOPRIO_PRIO_CLASS() macro (garbage upper bits)
>> and in an error return for functions such as ioprio_check_cap().
>>
>> Fix this by masking the result of the shift by IOPRIO_CLASS_SHIFT bits
>> in the IOPRIO_PRIO_CLASS() macro. The new macro IOPRIO_CLASS_MASK
>> defines the 3-bits mask for the priority class.
>>
>> While at it, cleanup the following:
>> * Apply the mask IOPRIO_PRIO_MASK to the data argument of the
>>IOPRIO_PRIO_VALUE() macro to ignore upper bits of the data value.
>> * Remove unnecessary parenthesis around fixed values in the macro
>>definitions in include/uapi/linux/ioprio.h.
>> * Update the outdated mention of CFQ in the comment describing priority
>>classes and instead mention BFQ and mq-deadline.
>> * Change the argument name of the IOPRIO_PRIO_CLASS() and
>>IOPRIO_PRIO_DATA() macros from "mask" to "ioprio" to reflect the fact
>>that an IO priority value should be passed rather than a mask.
>> * Change the ioprio_valid() macro into an inline function, adding a
>>check on the maximum value of the class of a priority value as
>>defined by the IOPRIO_CLASS_MAX enum value. Move this function to
>>    the kernel side in include/linux/ioprio.h.
>> * Remove the unnecessary "else" after the return statements in
>>task_nice_ioclass().
>>
>> Signed-off-by: Damien Le Moal 
>> ---
>>   include/linux/ioprio.h  | 15 ---
>>   include/uapi/linux/ioprio.h | 19 +++
>>   2 files changed, 23 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
>> index ef9ad4fb245f..9b3a6d8172b4 100644
>> --- a/include/linux/ioprio.h
>> +++ b/include/linux/ioprio.h
>> @@ -8,6 +8,16 @@
>>   
>>   #include 
>>   
>> +/*
>> + * Check that a priority value has a valid class.
>> + */
>> +static inline bool ioprio_valid(unsigned short ioprio)
> 
> Wouldn't it be better to use 'u16' here as type, as we're relying on the 
> number of bits?

Other functions in block/ioprio.c and in include/linux/ioprio.h use "unsigned
short", so I followed. But many functions, if not most, use "int". This is all a
bit of a mess. I think we need a "typedef ioprio_t u16;" to clean things up. But
there are a lot of places to fix. I can add such patch... Worth it ?

> 
>> +{
>> +unsigned short class = IOPRIO_PRIO_CLASS(ioprio);
>> +
>> +return class > IOPRIO_CLASS_NONE && class < IOPRIO_CLASS_MAX;
>> +}
>> +
>>   /*
>>* if process has set io priority explicitly, use that. if not, convert
>>* the cpu scheduler nice value to an io priority
>> @@ -25,10 +35,9 @@ static inline int task_nice_ioclass(struct task_struct 
>> *task)
>>   {
>>  if (task->policy == SCHED_IDLE)
>>  return IOPRIO_CLASS_IDLE;
>> -else if (task_is_realtime(task))
>> +if (task_is_realtime(task))
>>  return IOPRIO_CLASS_RT;
>> -else
>> -return IOPRIO_CLASS_BE;
>> +return IOPRIO_CLASS_BE;
>>   }
>>   
>>   /*
>> diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
>> index 77b17e08b0da..abc40965aa96 100644
>> --- a/include/uapi/linux/ioprio.h
>> +++ b/include/uapi/linux/ioprio.h
>> @@ -5,12 +5,15 @@
>>   /*
>>* Gives us 8 prio classes with 13-bits of data for each class
>>*/
>> -#define IOPRIO_CLASS_SHIFT  (13)
>> +#define IOPRIO_CLASS_SHIFT  13
>> +#define IOPRIO_CLASS_MASK   0x07
>>   #define IOPRIO_PRIO_MASK   ((1UL << IOPRIO_CLASS_SHIFT) - 1)
>>   
>> -#define IOPRIO_PRIO_CLASS(mask) ((mask) >> IOPRIO_CLASS_SHIFT)
>> -#define IOPRIO_PRIO_DATA(mask)  ((mask) & IOPRIO_PRIO_MASK)
>> -#define IOPRIO_PRIO_VALUE(class, data)  (((class) << 
>> IOPRIO_CLASS_SHIFT) | data)
>> +#define IOPRIO_PRIO_CLASS(ioprio)   \
>> +(((ioprio) >> IOPRIO_CLASS_SHIFT) & IOPRIO_CLASS_MASK)
>> +#define IOPRIO_PRIO_DATA(ioprio)((ioprio) & IOPRIO_PRIO_MASK)
>> +#define IOPRIO_PRIO_VALUE(class, data)  \
>

Re: [f2fs-dev] [PATCH v2 3/4] block: rename IOPRIO_BE_NR

2021-08-05 Thread Damien Le Moal
On 2021/08/06 15:38, Hannes Reinecke wrote:
> On 8/6/21 7:11 AM, Damien Le Moal wrote:
>> The BFQ scheduler and ioprio_check_cap() both assume that the RT
>> priority class (IOPRIO_CLASS_RT) can have up to 8 different priority
>> levels. This is controlled using the macro IOPRIO_BE_NR, which is badly
>> named as the number of levels applies to the RT class.
>>
>> Rename IOPRIO_BE_NR to the class independent IOPRIO_NR_LEVELS to make
>> things clear.
>>
>> Signed-off-by: Damien Le Moal 
>> ---
>>   block/bfq-iosched.c | 8 
>>   block/bfq-iosched.h | 4 ++--
>>   block/bfq-wf2q.c| 6 +++---
>>   block/ioprio.c  | 3 +--
>>   fs/f2fs/sysfs.c | 2 +-
>>   include/uapi/linux/ioprio.h | 4 ++--
>>   6 files changed, 13 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 1f38d75524ae..d5824cab34d7 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -2505,7 +2505,7 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
>>  int i, j;
>>   
>>  for (i = 0; i < 2; i++)
>> -for (j = 0; j < IOPRIO_BE_NR; j++)
>> +for (j = 0; j < IOPRIO_NR_LEVELS; j++)
>>  if (bfqg->async_bfqq[i][j])
>>  bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
>>  if (bfqg->async_idle_bfqq)
>> @@ -5290,10 +5290,10 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, 
>> struct bfq_io_cq *bic)
>>  break;
>>  }
>>   
>> -if (bfqq->new_ioprio >= IOPRIO_BE_NR) {
>> +if (bfqq->new_ioprio >= IOPRIO_NR_LEVELS) {
>>  pr_crit("bfq_set_next_ioprio_data: new_ioprio %d\n",
>>  bfqq->new_ioprio);
>> -bfqq->new_ioprio = IOPRIO_BE_NR - 1;
>> +bfqq->new_ioprio = IOPRIO_NR_LEVELS - 1;
>>  }
>>   
>>  bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
>> @@ -6822,7 +6822,7 @@ void bfq_put_async_queues(struct bfq_data *bfqd, 
>> struct bfq_group *bfqg)
>>  int i, j;
>>   
>>  for (i = 0; i < 2; i++)
>> -for (j = 0; j < IOPRIO_BE_NR; j++)
>> +for (j = 0; j < IOPRIO_NR_LEVELS; j++)
>>  __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
>>   
>>  __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
>> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
>> index 99c2a3cb081e..385e28a843d1 100644
>> --- a/block/bfq-iosched.h
>> +++ b/block/bfq-iosched.h
>> @@ -931,7 +931,7 @@ struct bfq_group {
>>   
>>  void *bfqd;
>>   
>> -struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
>> +struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
>>  struct bfq_queue *async_idle_bfqq;
>>   
>>  struct bfq_entity *my_entity;
>> @@ -948,7 +948,7 @@ struct bfq_group {
>>  struct bfq_entity entity;
>>  struct bfq_sched_data sched_data;
>>   
>> -struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
>> +struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
>>  struct bfq_queue *async_idle_bfqq;
>>   
>>  struct rb_root rq_pos_tree;
>> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
>> index 7a462df71f68..b74cc0da118e 100644
>> --- a/block/bfq-wf2q.c
>> +++ b/block/bfq-wf2q.c
>> @@ -505,7 +505,7 @@ static void bfq_active_insert(struct bfq_service_tree 
>> *st,
>>*/
>>   unsigned short bfq_ioprio_to_weight(int ioprio)
>>   {
>> -return (IOPRIO_BE_NR - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
>> +return (IOPRIO_NR_LEVELS - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
>>   }
>>   
>>   /**
>> @@ -514,12 +514,12 @@ unsigned short bfq_ioprio_to_weight(int ioprio)
>>*
>>* To preserve as much as possible the old only-ioprio user interface,
>>* 0 is used as an escape ioprio value for weights (numerically) equal or
>> - * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
>> + * larger than IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF.
>>*/
>>   static unsigned short bfq_weight_to_ioprio(int weight)
>>   {
>>  return max_t(int, 0,
>> - IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight);
>> + IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF - weight);
>>   }
>>   
>>   static void bfq_get_entity(struct bfq_entity *entity)
>> diff --git a/block/

[f2fs-dev] [PATCH v2 3/4] block: rename IOPRIO_BE_NR

2021-08-05 Thread Damien Le Moal
The BFQ scheduler and ioprio_check_cap() both assume that the RT
priority class (IOPRIO_CLASS_RT) can have up to 8 different priority
levels. This is controlled using the macro IOPRIO_BE_NR, which is badly
named as the number of levels applies to the RT class.

Rename IOPRIO_BE_NR to the class independent IOPRIO_NR_LEVELS to make
things clear.

Signed-off-by: Damien Le Moal 
---
 block/bfq-iosched.c | 8 
 block/bfq-iosched.h | 4 ++--
 block/bfq-wf2q.c| 6 +++---
 block/ioprio.c  | 3 +--
 fs/f2fs/sysfs.c | 2 +-
 include/uapi/linux/ioprio.h | 4 ++--
 6 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 1f38d75524ae..d5824cab34d7 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2505,7 +2505,7 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
int i, j;
 
for (i = 0; i < 2; i++)
-   for (j = 0; j < IOPRIO_BE_NR; j++)
+   for (j = 0; j < IOPRIO_NR_LEVELS; j++)
if (bfqg->async_bfqq[i][j])
bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
if (bfqg->async_idle_bfqq)
@@ -5290,10 +5290,10 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct 
bfq_io_cq *bic)
break;
}
 
-   if (bfqq->new_ioprio >= IOPRIO_BE_NR) {
+   if (bfqq->new_ioprio >= IOPRIO_NR_LEVELS) {
pr_crit("bfq_set_next_ioprio_data: new_ioprio %d\n",
bfqq->new_ioprio);
-   bfqq->new_ioprio = IOPRIO_BE_NR - 1;
+   bfqq->new_ioprio = IOPRIO_NR_LEVELS - 1;
}
 
bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
@@ -6822,7 +6822,7 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct 
bfq_group *bfqg)
int i, j;
 
for (i = 0; i < 2; i++)
-   for (j = 0; j < IOPRIO_BE_NR; j++)
+   for (j = 0; j < IOPRIO_NR_LEVELS; j++)
__bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
 
__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 99c2a3cb081e..385e28a843d1 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -931,7 +931,7 @@ struct bfq_group {
 
void *bfqd;
 
-   struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+   struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
struct bfq_queue *async_idle_bfqq;
 
struct bfq_entity *my_entity;
@@ -948,7 +948,7 @@ struct bfq_group {
struct bfq_entity entity;
struct bfq_sched_data sched_data;
 
-   struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+   struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
struct bfq_queue *async_idle_bfqq;
 
struct rb_root rq_pos_tree;
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 7a462df71f68..b74cc0da118e 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -505,7 +505,7 @@ static void bfq_active_insert(struct bfq_service_tree *st,
  */
 unsigned short bfq_ioprio_to_weight(int ioprio)
 {
-   return (IOPRIO_BE_NR - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
+   return (IOPRIO_NR_LEVELS - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
 }
 
 /**
@@ -514,12 +514,12 @@ unsigned short bfq_ioprio_to_weight(int ioprio)
  *
  * To preserve as much as possible the old only-ioprio user interface,
  * 0 is used as an escape ioprio value for weights (numerically) equal or
- * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
+ * larger than IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF.
  */
 static unsigned short bfq_weight_to_ioprio(int weight)
 {
return max_t(int, 0,
-IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight);
+IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF - weight);
 }
 
 static void bfq_get_entity(struct bfq_entity *entity)
diff --git a/block/ioprio.c b/block/ioprio.c
index bee628f9f1b2..ca6b136c5586 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -74,9 +74,8 @@ int ioprio_check_cap(int ioprio)
fallthrough;
/* rt has prio field too */
case IOPRIO_CLASS_BE:
-   if (data >= IOPRIO_BE_NR || data < 0)
+   if (data >= IOPRIO_NR_LEVELS || data < 0)
return -EINVAL;
-
break;
case IOPRIO_CLASS_IDLE:
break;
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 6642246206bd..daad532a4e2b 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -378,7 +378,7 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
ret = kstrtol(name, 10, &data);
if (ret)
return ret;
-   if (data >= IOPRIO_BE_NR || data <

[f2fs-dev] [PATCH v2 4/4] block: fix default IO priority handling

2021-08-05 Thread Damien Le Moal
The default IO priority is the best effort (BE) class with the
normal priority level IOPRIO_NORM (4). However, get_task_ioprio()
returns IOPRIO_CLASS_NONE/IOPRIO_NORM as the default priority and
get_current_ioprio() returns IOPRIO_CLASS_NONE/0. Let's be consistent
with the defined default and have both of these functions return the
default priority IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM) when
the user did not define another default IO priority for the task.

In include/linux/ioprio.h, rename the IOPRIO_NORM macro to
IOPRIO_BE_NORM to clarify that this default level applies to the BE
priotity class. Also, define the macro IOPRIO_DEFAULT as
IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM) and use this new
macro when setting a priority to the default.

Signed-off-by: Damien Le Moal 
---
 block/bfq-iosched.c  | 2 +-
 block/ioprio.c   | 6 +++---
 drivers/nvme/host/lightnvm.c | 2 +-
 include/linux/ioprio.h   | 7 ++-
 include/uapi/linux/ioprio.h  | 4 ++--
 5 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d5824cab34d7..a07d630c6972 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5408,7 +5408,7 @@ static struct bfq_queue **bfq_async_queue_prio(struct 
bfq_data *bfqd,
case IOPRIO_CLASS_RT:
return &bfqg->async_bfqq[0][ioprio];
case IOPRIO_CLASS_NONE:
-   ioprio = IOPRIO_NORM;
+   ioprio = IOPRIO_BE_NORM;
fallthrough;
case IOPRIO_CLASS_BE:
return &bfqg->async_bfqq[1][ioprio];
diff --git a/block/ioprio.c b/block/ioprio.c
index ca6b136c5586..0e4ff245f2bf 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -170,7 +170,7 @@ static int get_task_ioprio(struct task_struct *p)
ret = security_task_getioprio(p);
if (ret)
goto out;
-   ret = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, IOPRIO_NORM);
+   ret = IOPRIO_DEFAULT;
task_lock(p);
if (p->io_context)
ret = p->io_context->ioprio;
@@ -182,9 +182,9 @@ static int get_task_ioprio(struct task_struct *p)
 int ioprio_best(unsigned short aprio, unsigned short bprio)
 {
if (!ioprio_valid(aprio))
-   aprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   aprio = IOPRIO_DEFAULT;
if (!ioprio_valid(bprio))
-   bprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   bprio = IOPRIO_DEFAULT;
 
return min(aprio, bprio);
 }
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index e9d9ad47f70f..0fbbff0b3edb 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -662,7 +662,7 @@ static struct request *nvme_nvm_alloc_request(struct 
request_queue *q,
if (rqd->bio)
blk_rq_append_bio(rq, rqd->bio);
else
-   rq->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+   rq->ioprio = IOPRIO_DEFAULT;
 
return rq;
 }
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 9b3a6d8172b4..2837c3a0d2e1 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -8,6 +8,11 @@
 
 #include 
 
+/*
+ * Default IO priority.
+ */
+#define IOPRIO_DEFAULT IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM)
+
 /*
  * Check that a priority value has a valid class.
  */
@@ -50,7 +55,7 @@ static inline int get_current_ioprio(void)
 
if (ioc)
return ioc->ioprio;
-   return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+   return IOPRIO_DEFAULT;
 }
 
 /*
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index b9d48744dacb..ccc633af44d5 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -42,8 +42,8 @@ enum {
 };
 
 /*
- * Fallback BE priority
+ * Fallback BE priority level.
  */
-#define IOPRIO_NORM4
+#define IOPRIO_BE_NORM 4
 
 #endif /* _UAPI_LINUX_IOPRIO_H */
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 2/4] block: fix ioprio interface

2021-08-05 Thread Damien Le Moal
An iocb aio_reqprio field is 16-bits (u16) but often handled as an int
in the block layer. E.g. ioprio_check_cap() takes an int as argument.
With such implicit int casting function calls, the upper 16-bits of the
int argument may be left uninitialized by the compiler, resulting in
invalid values for the IOPRIO_PRIO_CLASS() macro (garbage upper bits)
and in an error return for functions such as ioprio_check_cap().

Fix this by masking the result of the shift by IOPRIO_CLASS_SHIFT bits
in the IOPRIO_PRIO_CLASS() macro. The new macro IOPRIO_CLASS_MASK
defines the 3-bits mask for the priority class.

While at it, cleanup the following:
* Apply the mask IOPRIO_PRIO_MASK to the data argument of the
  IOPRIO_PRIO_VALUE() macro to ignore upper bits of the data value.
* Remove unnecessary parenthesis around fixed values in the macro
  definitions in include/uapi/linux/ioprio.h.
* Update the outdated mention of CFQ in the comment describing priority
  classes and instead mention BFQ and mq-deadline.
* Change the argument name of the IOPRIO_PRIO_CLASS() and
  IOPRIO_PRIO_DATA() macros from "mask" to "ioprio" to reflect the fact
  that an IO priority value should be passed rather than a mask.
* Change the ioprio_valid() macro into an inline function, adding a
  check on the maximum value of the class of a priority value as
  defined by the IOPRIO_CLASS_MAX enum value. Move this function to
  the kernel side in include/linux/ioprio.h.
* Remove the unnecessary "else" after the return statements in
  task_nice_ioclass().

Signed-off-by: Damien Le Moal 
---
 include/linux/ioprio.h  | 15 ---
 include/uapi/linux/ioprio.h | 19 +++
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index ef9ad4fb245f..9b3a6d8172b4 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -8,6 +8,16 @@
 
 #include 
 
+/*
+ * Check that a priority value has a valid class.
+ */
+static inline bool ioprio_valid(unsigned short ioprio)
+{
+   unsigned short class = IOPRIO_PRIO_CLASS(ioprio);
+
+   return class > IOPRIO_CLASS_NONE && class < IOPRIO_CLASS_MAX;
+}
+
 /*
  * if process has set io priority explicitly, use that. if not, convert
  * the cpu scheduler nice value to an io priority
@@ -25,10 +35,9 @@ static inline int task_nice_ioclass(struct task_struct *task)
 {
if (task->policy == SCHED_IDLE)
return IOPRIO_CLASS_IDLE;
-   else if (task_is_realtime(task))
+   if (task_is_realtime(task))
return IOPRIO_CLASS_RT;
-   else
-   return IOPRIO_CLASS_BE;
+   return IOPRIO_CLASS_BE;
 }
 
 /*
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index 77b17e08b0da..abc40965aa96 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -5,12 +5,15 @@
 /*
  * Gives us 8 prio classes with 13-bits of data for each class
  */
-#define IOPRIO_CLASS_SHIFT (13)
+#define IOPRIO_CLASS_SHIFT 13
+#define IOPRIO_CLASS_MASK  0x07
 #define IOPRIO_PRIO_MASK   ((1UL << IOPRIO_CLASS_SHIFT) - 1)
 
-#define IOPRIO_PRIO_CLASS(mask)((mask) >> IOPRIO_CLASS_SHIFT)
-#define IOPRIO_PRIO_DATA(mask) ((mask) & IOPRIO_PRIO_MASK)
-#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | data)
+#define IOPRIO_PRIO_CLASS(ioprio)  \
+   (((ioprio) >> IOPRIO_CLASS_SHIFT) & IOPRIO_CLASS_MASK)
+#define IOPRIO_PRIO_DATA(ioprio)   ((ioprio) & IOPRIO_PRIO_MASK)
+#define IOPRIO_PRIO_VALUE(class, data) \
+   (((class) << IOPRIO_CLASS_SHIFT) | ((data) & IOPRIO_PRIO_MASK))
 
 /*
  * These are the io priority groups as implemented by CFQ. RT is the realtime
@@ -23,14 +26,14 @@ enum {
IOPRIO_CLASS_RT,
IOPRIO_CLASS_BE,
IOPRIO_CLASS_IDLE,
-};
 
-#define ioprio_valid(mask) (IOPRIO_PRIO_CLASS((mask)) != IOPRIO_CLASS_NONE)
+   IOPRIO_CLASS_MAX,
+};
 
 /*
  * 8 best effort priority levels are supported
  */
-#define IOPRIO_BE_NR   (8)
+#define IOPRIO_BE_NR   8
 
 enum {
IOPRIO_WHO_PROCESS = 1,
@@ -41,6 +44,6 @@ enum {
 /*
  * Fallback BE priority
  */
-#define IOPRIO_NORM(4)
+#define IOPRIO_NORM4
 
 #endif /* _UAPI_LINUX_IOPRIO_H */
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 0/4] IO priority fixes and improvements

2021-08-05 Thread Damien Le Moal
This series fixes problems with IO priority values handling and cleans
up several macro names and code for clarity.

Changes from v1:
* Added patch 4 to unify the default priority value used in various
  places.
* Fixed patch 2 as suggested by Bart: remove extra parenthesis and move
  ioprio_valid() from the uapi header to the kernel header.
* In patch 2, add priority value masking.

Damien Le Moal (4):
  block: bfq: fix bfq_set_next_ioprio_data()
  block: fix ioprio interface
  block: rename IOPRIO_BE_NR
  block: fix default IO priority handling

 block/bfq-iosched.c  | 10 +-
 block/bfq-iosched.h  |  4 ++--
 block/bfq-wf2q.c |  6 +++---
 block/ioprio.c   |  9 -
 drivers/nvme/host/lightnvm.c |  2 +-
 fs/f2fs/sysfs.c  |  2 +-
 include/linux/ioprio.h   | 22 ++
 include/uapi/linux/ioprio.h  | 23 +--
 8 files changed, 47 insertions(+), 31 deletions(-)

-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 1/4] block: bfq: fix bfq_set_next_ioprio_data()

2021-08-05 Thread Damien Le Moal
For a request that has a priority level equal to or larger than
IOPRIO_BE_NR, bfq_set_next_ioprio_data() prints a critical warning but
defaults to setting the request new_ioprio field to IOPRIO_BE_NR. This
is not consistent with the warning and the allowed values for priority
levels. Fix this by setting the request new_ioprio field to
IOPRIO_BE_NR - 1, the lowest priority level allowed.

Cc: 
Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an 
extra scheduler")
Signed-off-by: Damien Le Moal 
---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 727955918563..1f38d75524ae 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5293,7 +5293,7 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct 
bfq_io_cq *bic)
if (bfqq->new_ioprio >= IOPRIO_BE_NR) {
pr_crit("bfq_set_next_ioprio_data: new_ioprio %d\n",
bfqq->new_ioprio);
-   bfqq->new_ioprio = IOPRIO_BE_NR;
+   bfqq->new_ioprio = IOPRIO_BE_NR - 1;
}
 
bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
-- 
2.31.1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 04/13] mm: Add functions to lock invalidate_lock for two mappings

2021-05-26 Thread Damien Le Moal
On 2021/05/26 19:07, Jan Kara wrote:
> On Tue 25-05-21 13:48:05, Darrick J. Wong wrote:
>> On Tue, May 25, 2021 at 03:50:41PM +0200, Jan Kara wrote:
>>> Some operations such as reflinking blocks among files will need to lock
>>> invalidate_lock for two mappings. Add helper functions to do that.
>>>
>>> Signed-off-by: Jan Kara 
>>> ---
>>>  include/linux/fs.h |  6 ++
>>>  mm/filemap.c   | 38 ++
>>>  2 files changed, 44 insertions(+)
>>>
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index 897238d9f1e0..e6f7447505f5 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -822,6 +822,12 @@ static inline void inode_lock_shared_nested(struct 
>>> inode *inode, unsigned subcla
>>>  void lock_two_nondirectories(struct inode *, struct inode*);
>>>  void unlock_two_nondirectories(struct inode *, struct inode*);
>>>  
>>> +void filemap_invalidate_down_write_two(struct address_space *mapping1,
>>> +  struct address_space *mapping2);
>>> +void filemap_invalidate_up_write_two(struct address_space *mapping1,
>>
>> TBH I find myself wishing that the invalidate_lock used the same
>> lock/unlock style wrappers that we use for i_rwsem.
>>
>> filemap_invalidate_lock(inode1->mapping);
>> filemap_invalidate_lock_two(inode1->i_mapping, inode2->i_mapping);
> 
> OK, and filemap_invalidate_lock_shared() for down_read()? I guess that
> works for me.

What about filemap_invalidate_lock_read() and filemap_invalidate_lock_write() ?
That reminds the down_read()/down_write() without the slightly confusing 
down/up.

> 
>   Honza
> 
>  
>> To be fair, I've never been able to keep straight that down means lock
>> and up means unlock.  Ah well, at least you didn't use "p" and "v".
>>
>> Mechanically, the changes look ok to me.
>> Acked-by: Darrick J. Wong 
>>
>> --D
>>
>>> +struct address_space *mapping2);
>>> +
>>> +
>>>  /*
>>>   * NOTE: in a 32bit arch with a preemptable kernel and
>>>   * an UP compile the i_size_read/write must be atomic
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 4d9ec4c6cc34..d3801a9739aa 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -1009,6 +1009,44 @@ struct page *__page_cache_alloc(gfp_t gfp)
>>>  EXPORT_SYMBOL(__page_cache_alloc);
>>>  #endif
>>>  
>>> +/*
>>> + * filemap_invalidate_down_write_two - lock invalidate_lock for two 
>>> mappings
>>> + *
>>> + * Lock exclusively invalidate_lock of any passed mapping that is not NULL.
>>> + *
>>> + * @mapping1: the first mapping to lock
>>> + * @mapping2: the second mapping to lock
>>> + */
>>> +void filemap_invalidate_down_write_two(struct address_space *mapping1,
>>> +  struct address_space *mapping2)
>>> +{
>>> +   if (mapping1 > mapping2)
>>> +   swap(mapping1, mapping2);
>>> +   if (mapping1)
>>> +   down_write(&mapping1->invalidate_lock);
>>> +   if (mapping2 && mapping1 != mapping2)
>>> +   down_write_nested(&mapping2->invalidate_lock, 1);
>>> +}
>>> +EXPORT_SYMBOL(filemap_invalidate_down_write_two);
>>> +
>>> +/*
>>> + * filemap_invalidate_up_write_two - unlock invalidate_lock for two 
>>> mappings
>>> + *
>>> + * Unlock exclusive invalidate_lock of any passed mapping that is not NULL.
>>> + *
>>> + * @mapping1: the first mapping to unlock
>>> + * @mapping2: the second mapping to unlock
>>> + */
>>> +void filemap_invalidate_up_write_two(struct address_space *mapping1,
>>> +struct address_space *mapping2)
>>> +{
>>> +   if (mapping1)
>>> +   up_write(&mapping1->invalidate_lock);
>>> +   if (mapping2 && mapping1 != mapping2)
>>> +   up_write(&mapping2->invalidate_lock);
>>> +}
>>> +EXPORT_SYMBOL(filemap_invalidate_up_write_two);
>>> +
>>>  /*
>>>   * In order to wait for pages to become available there must be
>>>   * waitqueues associated with pages. By using a hash table of
>>> -- 
>>> 2.26.2
>>>


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 07/11] zonefs: Convert to using invalidate_lock

2021-05-12 Thread Damien Le Moal
On 2021/05/12 22:46, Jan Kara wrote:
> Use invalidate_lock instead of zonefs' private i_mmap_sem. The intended
> purpose is exactly the same.
> 
> CC: Damien Le Moal 
> CC: Johannes Thumshirn 
> CC: 
> Signed-off-by: Jan Kara 
> ---
>  fs/zonefs/super.c  | 23 +--
>  fs/zonefs/zonefs.h |  7 +++
>  2 files changed, 8 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index cd145d318b17..da2e95d98677 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -462,7 +462,7 @@ static int zonefs_file_truncate(struct inode *inode, 
> loff_t isize)
>   inode_dio_wait(inode);
>  
>   /* Serialize against page faults */
> - down_write(&zi->i_mmap_sem);
> + down_write(&inode->i_mapping->invalidate_lock);
>  
>   /* Serialize against zonefs_iomap_begin() */
>   mutex_lock(&zi->i_truncate_mutex);
> @@ -500,7 +500,7 @@ static int zonefs_file_truncate(struct inode *inode, 
> loff_t isize)
>  
>  unlock:
>   mutex_unlock(&zi->i_truncate_mutex);
> - up_write(&zi->i_mmap_sem);
> + up_write(&inode->i_mapping->invalidate_lock);
>  
>   return ret;
>  }
> @@ -575,18 +575,6 @@ static int zonefs_file_fsync(struct file *file, loff_t 
> start, loff_t end,
>   return ret;
>  }
>  
> -static vm_fault_t zonefs_filemap_fault(struct vm_fault *vmf)
> -{
> - struct zonefs_inode_info *zi = ZONEFS_I(file_inode(vmf->vma->vm_file));
> - vm_fault_t ret;
> -
> - down_read(&zi->i_mmap_sem);
> - ret = filemap_fault(vmf);
> - up_read(&zi->i_mmap_sem);
> -
> - return ret;
> -}
> -
>  static vm_fault_t zonefs_filemap_page_mkwrite(struct vm_fault *vmf)
>  {
>   struct inode *inode = file_inode(vmf->vma->vm_file);
> @@ -607,16 +595,16 @@ static vm_fault_t zonefs_filemap_page_mkwrite(struct 
> vm_fault *vmf)
>   file_update_time(vmf->vma->vm_file);
>  
>   /* Serialize against truncates */
> - down_read(&zi->i_mmap_sem);
> + down_read(&inode->i_mapping->invalidate_lock);
>   ret = iomap_page_mkwrite(vmf, &zonefs_iomap_ops);
> - up_read(&zi->i_mmap_sem);
> + up_read(&inode->i_mapping->invalidate_lock);
>  
>   sb_end_pagefault(inode->i_sb);
>   return ret;
>  }
>  
>  static const struct vm_operations_struct zonefs_file_vm_ops = {
> - .fault  = zonefs_filemap_fault,
> + .fault  = filemap_fault,
>   .map_pages  = filemap_map_pages,
>   .page_mkwrite   = zonefs_filemap_page_mkwrite,
>  };
> @@ -1158,7 +1146,6 @@ static struct inode *zonefs_alloc_inode(struct 
> super_block *sb)
>  
>   inode_init_once(&zi->i_vnode);
>   mutex_init(&zi->i_truncate_mutex);
> - init_rwsem(&zi->i_mmap_sem);
>   zi->i_wr_refcnt = 0;
>  
>   return &zi->i_vnode;
> diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
> index 51141907097c..7b147907c328 100644
> --- a/fs/zonefs/zonefs.h
> +++ b/fs/zonefs/zonefs.h
> @@ -70,12 +70,11 @@ struct zonefs_inode_info {
>* and changes to the inode private data, and in particular changes to
>* a sequential file size on completion of direct IO writes.
>* Serialization of mmap read IOs with truncate and syscall IO
> -  * operations is done with i_mmap_sem in addition to i_truncate_mutex.
> -  * Only zonefs_seq_file_truncate() takes both lock (i_mmap_sem first,
> -  * i_truncate_mutex second).
> +  * operations is done with invalidate_lock in addition to
> +  * i_truncate_mutex.  Only zonefs_seq_file_truncate() takes both lock
> +  * (invalidate_lock first, i_truncate_mutex second).
>*/
>   struct mutexi_truncate_mutex;
> - struct rw_semaphore i_mmap_sem;
>  
>   /* guarded by i_truncate_mutex */
>   unsigned inti_wr_refcnt;
> 

Looks OK to me for zonefs. This is a nice cleanup.

Acked-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [dm-devel] [PATCH 01/17] zonefs: use bio_alloc in zonefs_file_dio_append

2021-01-26 Thread Damien Le Moal
On 2021/01/26 23:58, Christoph Hellwig wrote:
> Use bio_alloc instead of open coding it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/zonefs/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index bec47f2d074beb..faea2ed34b4a37 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -678,7 +678,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, 
> struct iov_iter *from)
>   if (!nr_pages)
>   return 0;
>  
> - bio = bio_alloc_bioset(GFP_NOFS, nr_pages, &fs_bio_set);
> + bio = bio_alloc(GFP_NOFS, nr_pages);
>   if (!bio)
>       return -ENOMEM;
>  
> 

Acked-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH RFC PKS/PMEM 26/58] fs/zonefs: Utilize new kmap_thread()

2020-10-11 Thread Damien Le Moal
On 2020/10/10 4:52, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> The kmap() calls in this FS are localized to a single thread.  To avoid
> the over head of global PKRS updates use the new kmap_thread() call.
> 
> Cc: Damien Le Moal 
> Cc: Naohiro Aota 
> Signed-off-by: Ira Weiny 
> ---
>  fs/zonefs/super.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 8ec7c8f109d7..2fd6c86beee1 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -1297,7 +1297,7 @@ static int zonefs_read_super(struct super_block *sb)
>   if (ret)
>   goto free_page;
>  
> - super = kmap(page);
> + super = kmap_thread(page);
>  
>   ret = -EINVAL;
>   if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC)
> @@ -1349,7 +1349,7 @@ static int zonefs_read_super(struct super_block *sb)
>   ret = 0;
>  
>  unmap:
> - kunmap(page);
> +     kunmap_thread(page);
>  free_page:
>   __free_page(page);
>  
> 

acked-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Fix direct IO handling

2019-11-27 Thread Damien Le Moal
On 2019/11/26 17:34, Ritesh Harjani wrote:
> Hello Damien,
> 
> IIUC, you are trying to fix a stale data read by DIO read for the case
> you explained in your patch w.r.t. DIO-write forced to write as buffIO.
> 
> Coincidentally I was just looking at the same code path just now.
> So I do have a query to you/f2fs group. Below could be silly one, as I
> don't understand F2FS in great detail.
> 
> How is the stale data by DIO read, is protected against a mmap
> writes via f2fs_vm_page_mkwrite?
> 
> f2fs_vm_page_mkwrite() f2fs_direct_IO (read)
>   filemap_write_and_wait_range()
>   -> f2fs_get_blocks()
>-> submit_bio()
> 
>   -> set_page_dirty()
> 
> Is above race possible with current f2fs code?
> i.e. f2fs_direct_IO could read the stale data from the blocks
> which were allocated due to mmap fault?

The faulted page is locked until the fault is fully processed so direct
IO has to wait for that to complete first.

> 
> Am I missing something here?
> 
> -ritesh
> 
> On 11/26/19 1:27 PM, Damien Le Moal wrote:
>> f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
>> flag for a kiocb structure. However, the file system direct IO handler
>> function f2fs_direct_IO() may have decided that a direct IO has to be
>> exececuted as a buffered IO using the function f2fs_force_buffered_io().
>> This is the case for instance for volumes including zoned block device
>> and for unaligned write IOs with LFS mode enabled.
>>
>> These 2 different methods of identifying direct IOs can result in
>> inconsistencies generating stale data access for direct reads after a
>> direct IO write that is treated as a buffered write. Fix this
>> inconsistency by combining the IOCB_DIRECT flag test with the result
>> of f2fs_force_buffered_io().
>>
>> Reported-by: Javier Gonzalez 
>> Signed-off-by: Damien Le Moal 
>> ---
>>   fs/f2fs/data.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 5755e897a5f0..8ac2d3b70022 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct 
>> iov_iter *from)
>>  int flag;
>>  int err = 0;
>>  bool direct_io = iocb->ki_flags & IOCB_DIRECT;
>> +bool do_direct_io = direct_io &&
>> +!f2fs_force_buffered_io(inode, iocb, from);
>>   
>>  /* convert inline data for Direct I/O*/
>>  if (direct_io) {
>> @@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct 
>> iov_iter *from)
>>  return err;
>>  }
>>   
>> -if (direct_io && allow_outplace_dio(inode, iocb, from))
>> +if (do_direct_io && allow_outplace_dio(inode, iocb, from))
>>  return 0;
>>   
>>  if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>>
> 
> 


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH] f2fs: Fix direct IO handling

2019-11-25 Thread Damien Le Moal
f2fs_preallocate_blocks() identifies direct IOs using the IOCB_DIRECT
flag for a kiocb structure. However, the file system direct IO handler
function f2fs_direct_IO() may have decided that a direct IO has to be
exececuted as a buffered IO using the function f2fs_force_buffered_io().
This is the case for instance for volumes including zoned block device
and for unaligned write IOs with LFS mode enabled.

These 2 different methods of identifying direct IOs can result in
inconsistencies generating stale data access for direct reads after a
direct IO write that is treated as a buffered write. Fix this
inconsistency by combining the IOCB_DIRECT flag test with the result
of f2fs_force_buffered_io().

Reported-by: Javier Gonzalez 
Signed-off-by: Damien Le Moal 
---
 fs/f2fs/data.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5755e897a5f0..8ac2d3b70022 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1073,6 +1073,8 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct 
iov_iter *from)
int flag;
int err = 0;
bool direct_io = iocb->ki_flags & IOCB_DIRECT;
+   bool do_direct_io = direct_io &&
+   !f2fs_force_buffered_io(inode, iocb, from);
 
/* convert inline data for Direct I/O*/
if (direct_io) {
@@ -1081,7 +1083,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct 
iov_iter *from)
return err;
}
 
-   if (direct_io && allow_outplace_dio(inode, iocb, from))
+   if (do_direct_io && allow_outplace_dio(inode, iocb, from))
return 0;
 
if (is_inode_flag_set(inode, FI_NO_PREALLOC))
-- 
2.23.0



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: disble physical prealloc in LSF mount

2019-11-25 Thread Damien Le Moal
+ Shin'Ichiro

On 2019/11/26 15:19, Damien Le Moal wrote:
> On 2019/11/26 12:58, Javier González wrote:
>> On 26.11.2019 02:06, Damien Le Moal wrote:
>>> On 2019/11/26 4:03, Javier González wrote:
>>>> On 25.11.2019 00:48, Damien Le Moal wrote:
>>>>> On 2019/11/22 18:00, Javier González wrote:
>>>>>> From: Javier González 
>>>>>>
>>>>>> Fix file system corruption when using LFS mount (e.g., in zoned
>>>>>> devices). Seems like the fallback into buffered I/O creates an
>>>>>> inconsistency if the application is assuming both read and write DIO. I
>>>>>> can easily reproduce a corruption with a simple RocksDB test.
>>>>>>
>>>>>> Might be that the f2fs_forced_buffered_io path brings some problems too,
>>>>>> but I have not seen other failures besides this one.
>>>>>>
>>>>>> Problem reproducible without a zoned block device, simply by forcing
>>>>>> LFS mount:
>>>>>>
>>>>>>   $ sudo mkfs.f2fs -f -m /dev/nvme0n1
>>>>>>   $ sudo mount /dev/nvme0n1 /mnt/f2fs
>>>>>>   $ sudo  /opt/rocksdb/db_bench  --benchmarks=fillseq --use_existing_db=0
>>>>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>>>>   --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1
>>>>>>   --block_size=65536
>>>>>>
>>>>>> Note that the options that cause the problem are:
>>>>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>>>>
>>>>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write")
>>>>>>
>>>>>> Signed-off-by: Javier González 
>>>>>> ---
>>>>>>  fs/f2fs/data.c | 3 ---
>>>>>>  1 file changed, 3 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>> index 5755e897a5f0..b045dd6ab632 100644
>>>>>> --- a/fs/f2fs/data.c
>>>>>> +++ b/fs/f2fs/data.c
>>>>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, 
>>>>>> struct iov_iter *from)
>>>>>>  return err;
>>>>>>  }
>>>>>>
>>>>>> -if (direct_io && allow_outplace_dio(inode, iocb, from))
>>>>>> -return 0;
>>>>>
>>>>> Since for LFS mode, all DIOs can end up out of place, I think that it
>>>>> may be better to change allow_outplace_dio() to always return true in
>>>>> the case of LFS mode. So may be something like:
>>>>>
>>>>> static inline int allow_outplace_dio(struct inode *inode,
>>>>>   struct kiocb *iocb, struct iov_iter *iter)
>>>>> {
>>>>>   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>>>>   int rw = iov_iter_rw(iter);
>>>>>
>>>>>   return test_opt(sbi, LFS) ||
>>>>>   (rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
>>>>> }
>>>>>
>>>>> instead of the original:
>>>>>
>>>>> static inline int allow_outplace_dio(struct inode *inode,
>>>>>   struct kiocb *iocb, struct iov_iter *iter)
>>>>> {
>>>>>   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>>>>   int rw = iov_iter_rw(iter);
>>>>>
>>>>>   return (test_opt(sbi, LFS) && (rw == WRITE) &&
>>>>>   !block_unaligned_IO(inode, iocb, iter));
>>>>> }
>>>>>
>>>>> Thoughts ?
>>>>>
>>>>
>>>> I see what you mean and it makes sense. However, the problem I am seeing
>>>> occurs when allow_outplace_dio() returns true, as this is what creates
>>>> the inconsistency between the write being buffered and the read being
>>>> DIO.
>>>
>>> But if the write is switched to buffered, the DIO read should use the
>>> buffered path too, no ? Since this is all happening under VFS, the
>>> generic DIO read path will not ensure that the buffered writes are
>>> flushed to disk before issuing the direct read, I think. So that would
>>> explain your data corruption, i.e. y

Re: [f2fs-dev] [PATCH] f2fs: disble physical prealloc in LSF mount

2019-11-25 Thread Damien Le Moal
On 2019/11/26 12:58, Javier González wrote:
> On 26.11.2019 02:06, Damien Le Moal wrote:
>> On 2019/11/26 4:03, Javier González wrote:
>>> On 25.11.2019 00:48, Damien Le Moal wrote:
>>>> On 2019/11/22 18:00, Javier González wrote:
>>>>> From: Javier González 
>>>>>
>>>>> Fix file system corruption when using LFS mount (e.g., in zoned
>>>>> devices). Seems like the fallback into buffered I/O creates an
>>>>> inconsistency if the application is assuming both read and write DIO. I
>>>>> can easily reproduce a corruption with a simple RocksDB test.
>>>>>
>>>>> Might be that the f2fs_forced_buffered_io path brings some problems too,
>>>>> but I have not seen other failures besides this one.
>>>>>
>>>>> Problem reproducible without a zoned block device, simply by forcing
>>>>> LFS mount:
>>>>>
>>>>>   $ sudo mkfs.f2fs -f -m /dev/nvme0n1
>>>>>   $ sudo mount /dev/nvme0n1 /mnt/f2fs
>>>>>   $ sudo  /opt/rocksdb/db_bench  --benchmarks=fillseq --use_existing_db=0
>>>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>>>   --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1
>>>>>   --block_size=65536
>>>>>
>>>>> Note that the options that cause the problem are:
>>>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>>>
>>>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write")
>>>>>
>>>>> Signed-off-by: Javier González 
>>>>> ---
>>>>>  fs/f2fs/data.c | 3 ---
>>>>>  1 file changed, 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>> index 5755e897a5f0..b045dd6ab632 100644
>>>>> --- a/fs/f2fs/data.c
>>>>> +++ b/fs/f2fs/data.c
>>>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, 
>>>>> struct iov_iter *from)
>>>>>   return err;
>>>>>   }
>>>>>
>>>>> - if (direct_io && allow_outplace_dio(inode, iocb, from))
>>>>> - return 0;
>>>>
>>>> Since for LFS mode, all DIOs can end up out of place, I think that it
>>>> may be better to change allow_outplace_dio() to always return true in
>>>> the case of LFS mode. So may be something like:
>>>>
>>>> static inline int allow_outplace_dio(struct inode *inode,
>>>>struct kiocb *iocb, struct iov_iter *iter)
>>>> {
>>>>struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>>>int rw = iov_iter_rw(iter);
>>>>
>>>>return test_opt(sbi, LFS) ||
>>>>(rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
>>>> }
>>>>
>>>> instead of the original:
>>>>
>>>> static inline int allow_outplace_dio(struct inode *inode,
>>>>struct kiocb *iocb, struct iov_iter *iter)
>>>> {
>>>>struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>>>int rw = iov_iter_rw(iter);
>>>>
>>>>return (test_opt(sbi, LFS) && (rw == WRITE) &&
>>>>!block_unaligned_IO(inode, iocb, iter));
>>>> }
>>>>
>>>> Thoughts ?
>>>>
>>>
>>> I see what you mean and it makes sense. However, the problem I am seeing
>>> occurs when allow_outplace_dio() returns true, as this is what creates
>>> the inconsistency between the write being buffered and the read being
>>> DIO.
>>
>> But if the write is switched to buffered, the DIO read should use the
>> buffered path too, no ? Since this is all happening under VFS, the
>> generic DIO read path will not ensure that the buffered writes are
>> flushed to disk before issuing the direct read, I think. So that would
>> explain your data corruption, i.e. you are reading stale data on the
>> device before the buffered writes make it to the media.
>>
> 
> As far as I can see, the read is always sent DIO, so yes, I also believe
> that we are reading stale data. This is why the corruption is not seen
> if preventing allow_outplace_dio() from sending the write to the
> buffered path.
> 
> What surprises me is that this is very easy to trigger (see

Re: [f2fs-dev] [PATCH] f2fs: disble physical prealloc in LSF mount

2019-11-25 Thread Damien Le Moal
On 2019/11/26 4:03, Javier González wrote:
> On 25.11.2019 00:48, Damien Le Moal wrote:
>> On 2019/11/22 18:00, Javier González wrote:
>>> From: Javier González 
>>>
>>> Fix file system corruption when using LFS mount (e.g., in zoned
>>> devices). Seems like the fallback into buffered I/O creates an
>>> inconsistency if the application is assuming both read and write DIO. I
>>> can easily reproduce a corruption with a simple RocksDB test.
>>>
>>> Might be that the f2fs_forced_buffered_io path brings some problems too,
>>> but I have not seen other failures besides this one.
>>>
>>> Problem reproducible without a zoned block device, simply by forcing
>>> LFS mount:
>>>
>>>   $ sudo mkfs.f2fs -f -m /dev/nvme0n1
>>>   $ sudo mount /dev/nvme0n1 /mnt/f2fs
>>>   $ sudo  /opt/rocksdb/db_bench  --benchmarks=fillseq --use_existing_db=0
>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>   --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1
>>>   --block_size=65536
>>>
>>> Note that the options that cause the problem are:
>>>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>>>
>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write")
>>>
>>> Signed-off-by: Javier González 
>>> ---
>>>  fs/f2fs/data.c | 3 ---
>>>  1 file changed, 3 deletions(-)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 5755e897a5f0..b045dd6ab632 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, 
>>> struct iov_iter *from)
>>> return err;
>>> }
>>>
>>> -   if (direct_io && allow_outplace_dio(inode, iocb, from))
>>> -   return 0;
>>
>> Since for LFS mode, all DIOs can end up out of place, I think that it
>> may be better to change allow_outplace_dio() to always return true in
>> the case of LFS mode. So may be something like:
>>
>> static inline int allow_outplace_dio(struct inode *inode,
>>  struct kiocb *iocb, struct iov_iter *iter)
>> {
>>  struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>  int rw = iov_iter_rw(iter);
>>
>>  return test_opt(sbi, LFS) ||
>>  (rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
>> }
>>
>> instead of the original:
>>
>> static inline int allow_outplace_dio(struct inode *inode,
>>  struct kiocb *iocb, struct iov_iter *iter)
>> {
>>  struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>  int rw = iov_iter_rw(iter);
>>
>>  return (test_opt(sbi, LFS) && (rw == WRITE) &&
>>  !block_unaligned_IO(inode, iocb, iter));
>> }
>>
>> Thoughts ?
>>
> 
> I see what you mean and it makes sense. However, the problem I am seeing
> occurs when allow_outplace_dio() returns true, as this is what creates
> the inconsistency between the write being buffered and the read being
> DIO.

But if the write is switched to buffered, the DIO read should use the
buffered path too, no ? Since this is all happening under VFS, the
generic DIO read path will not ensure that the buffered writes are
flushed to disk before issuing the direct read, I think. So that would
explain your data corruption, i.e. you are reading stale data on the
device before the buffered writes make it to the media.

> 
> I did test the code you propose and, as expected, it still triggered the
> corruption.
> 
>>> -
>>> if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>>> return 0;
>>>
>>>
>>
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 
> Thanks,
> Javier
> 


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: disble physical prealloc in LSF mount

2019-11-24 Thread Damien Le Moal
On 2019/11/22 18:00, Javier González wrote:
> From: Javier González 
> 
> Fix file system corruption when using LFS mount (e.g., in zoned
> devices). Seems like the fallback into buffered I/O creates an
> inconsistency if the application is assuming both read and write DIO. I
> can easily reproduce a corruption with a simple RocksDB test.
> 
> Might be that the f2fs_forced_buffered_io path brings some problems too,
> but I have not seen other failures besides this one.
> 
> Problem reproducible without a zoned block device, simply by forcing
> LFS mount:
> 
>   $ sudo mkfs.f2fs -f -m /dev/nvme0n1
>   $ sudo mount /dev/nvme0n1 /mnt/f2fs
>   $ sudo  /opt/rocksdb/db_bench  --benchmarks=fillseq --use_existing_db=0
>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
>   --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1
>   --block_size=65536
> 
> Note that the options that cause the problem are:
>   --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
> 
> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write")
> 
> Signed-off-by: Javier González 
> ---
>  fs/f2fs/data.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 5755e897a5f0..b045dd6ab632 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct 
> iov_iter *from)
>   return err;
>   }
>  
> - if (direct_io && allow_outplace_dio(inode, iocb, from))
> - return 0;

Since for LFS mode, all DIOs can end up out of place, I think that it
may be better to change allow_outplace_dio() to always return true in
the case of LFS mode. So may be something like:

static inline int allow_outplace_dio(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int rw = iov_iter_rw(iter);

return test_opt(sbi, LFS) ||
(rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
}

instead of the original:

static inline int allow_outplace_dio(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int rw = iov_iter_rw(iter);

return (test_opt(sbi, LFS) && (rw == WRITE) &&
        !block_unaligned_IO(inode, iocb, iter));
}

Thoughts ?

> -
>   if (is_inode_flag_set(inode, FI_NO_PREALLOC))
>   return 0;
>  
> 


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2 0/9] Zoned block device enhancements and zone report rework

2019-11-12 Thread Damien Le Moal
On 2019/11/13 11:16, Jens Axboe wrote:
> On 11/10/19 6:39 PM, Damien Le Moal wrote:
>> This series of patches introduces changes to zoned block device handling
>> code with the intent to simplify the code while optimizing run-time
>> operation, particularly in the area of zone reporting.
>>
>> The first patch lifts the device zone check code out of the sd driver
>> and reimplements these zone checks generically as part of
>> blk_revalidate_disk_zones(). This avoids zoned block device drivers to
>> have to implement these checks. The second patch simplifies this
>> function code for the !zoned case.
>>
>> The third patch is a small cleanup of zone report processing in
>> preparation for the fourth patch which removes support for partitions
>> on zoned block devices. As mentioned in that patch commit message, none
>> of the known partitioning tools support zoned devices and there are no
>> known use case in the field of SMR disks being used with partitions.
>> Dropping partition supports allows to significantly simplify the code
>> for zone report as zone sector values remapping becomes unnecessary.
>>
>> Patch 5 to 6 are small cleanups and fixes of the null_blk driver zoned
>> mode.
>>
>> The prep patch 7 optimizes zone report buffer allocation for the SCSI
>> sd driver. Finally, patch 8 introduces a new interface for report zones
>> handling using a callback function executed per zone reported by the
>> device. This allows avoiding the need to allocate large arrays of
>> blk_zone structures for the execution of zone reports. This can
>> significantly reduce memory usage and pressure on the memory management
>> system while significantly simplify the code all over.
>>
>> Overall, this series not only reduces significantly the code size, it
>> also improves run-time memory usage for zone report execution.
>>
>> This series applies cleanly on the for-next block tree on top of the
>> zone management operation series. It may however create a conflict with
>> Christoph's reqork of disk size revalidation. Please consider this
>> series for inclusion in the 5.5 kernel.
> 
> We're taking branching to new levels... I created for-5.5/zoned for this,
> which is for-5.5/block + for-5.5/drivers + for-5.5/drivers-post combined.
> The latter is a branch with the SCSI dependencies from Martin pulled in.
> 

Jens,

Thanks !

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 6/9] null_blk: clean up report zones

2019-11-10 Thread Damien Le Moal
From: Christoph Hellwig 

Make the instance name match the method name and define the name to NULL
instead of providing an inline stub, which is rather pointless for a
method call.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
---
 drivers/block/null_blk.h   | 11 +++
 drivers/block/null_blk_main.c  |  2 +-
 drivers/block/null_blk_zoned.c |  4 ++--
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/block/null_blk.h b/drivers/block/null_blk.h
index 93c2a3d403da..9bf56fbab091 100644
--- a/drivers/block/null_blk.h
+++ b/drivers/block/null_blk.h
@@ -91,8 +91,8 @@ struct nullb {
 #ifdef CONFIG_BLK_DEV_ZONED
 int null_zone_init(struct nullb_device *dev);
 void null_zone_exit(struct nullb_device *dev);
-int null_zone_report(struct gendisk *disk, sector_t sector,
-struct blk_zone *zones, unsigned int *nr_zones);
+int null_report_zones(struct gendisk *disk, sector_t sector,
+ struct blk_zone *zones, unsigned int *nr_zones);
 blk_status_t null_handle_zoned(struct nullb_cmd *cmd,
enum req_opf op, sector_t sector,
sector_t nr_sectors);
@@ -105,12 +105,6 @@ static inline int null_zone_init(struct nullb_device *dev)
return -EINVAL;
 }
 static inline void null_zone_exit(struct nullb_device *dev) {}
-static inline int null_zone_report(struct gendisk *disk, sector_t sector,
-  struct blk_zone *zones,
-  unsigned int *nr_zones)
-{
-   return -EOPNOTSUPP;
-}
 static inline blk_status_t null_handle_zoned(struct nullb_cmd *cmd,
 enum req_opf op, sector_t sector,
 sector_t nr_sectors)
@@ -123,5 +117,6 @@ static inline size_t null_zone_valid_read_len(struct nullb 
*nullb,
 {
return len;
 }
+#define null_report_zones  NULL
 #endif /* CONFIG_BLK_DEV_ZONED */
 #endif /* __NULL_BLK_H */
diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 89d385bab45b..2687eb36441c 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1444,7 +1444,7 @@ static void null_config_discard(struct nullb *nullb)
 
 static const struct block_device_operations null_ops = {
.owner  = THIS_MODULE,
-   .report_zones   = null_zone_report,
+   .report_zones   = null_report_zones,
 };
 
 static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq)
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index 02f41a3bc4cb..00696f16664b 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -66,8 +66,8 @@ void null_zone_exit(struct nullb_device *dev)
kvfree(dev->zones);
 }
 
-int null_zone_report(struct gendisk *disk, sector_t sector,
-struct blk_zone *zones, unsigned int *nr_zones)
+int null_report_zones(struct gendisk *disk, sector_t sector,
+ struct blk_zone *zones, unsigned int *nr_zones)
 {
struct nullb *nullb = disk->private_data;
struct nullb_device *dev = nullb->dev;
-- 
2.23.0



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 7/9] null_blk: Add zone_nr_conv to features

2019-11-10 Thread Damien Le Moal
For a null_blk device with zoned mode enabled, the number of
conventional zones can be configured through configfs with the
zone_nr_conv parameter. Add this missing parameter in the features
string.

Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
---
 drivers/block/null_blk_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 2687eb36441c..27fb34d7da31 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -467,7 +467,7 @@ nullb_group_drop_item(struct config_group *group, struct 
config_item *item)
 
 static ssize_t memb_group_features_show(struct config_item *item, char *page)
 {
-   return snprintf(page, PAGE_SIZE, 
"memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size\n");
+   return snprintf(page, PAGE_SIZE, 
"memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size,zone_nr_conv\n");
 }
 
 CONFIGFS_ATTR_RO(memb_group_, features);
-- 
2.23.0



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2 5/9] null_blk: clean up the block device operations

2019-11-10 Thread Damien Le Moal
From: Christoph Hellwig 

Remove the pointless stub open and release methods, give the operations
vector a slightly less confusing name, and use normal alignment for the
assignment operators.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Chaitanya Kulkarni 
---
 drivers/block/null_blk_main.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 40a64bdd8ea7..89d385bab45b 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1442,20 +1442,9 @@ static void null_config_discard(struct nullb *nullb)
blk_queue_flag_set(QUEUE_FLAG_DISCARD, nullb->q);
 }
 
-static int null_open(struct block_device *bdev, fmode_t mode)
-{
-   return 0;
-}
-
-static void null_release(struct gendisk *disk, fmode_t mode)
-{
-}
-
-static const struct block_device_operations null_fops = {
-   .owner =THIS_MODULE,
-   .open = null_open,
-   .release =  null_release,
-   .report_zones = null_zone_report,
+static const struct block_device_operations null_ops = {
+   .owner  = THIS_MODULE,
+   .report_zones   = null_zone_report,
 };
 
 static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq)
@@ -1556,7 +1545,7 @@ static int null_gendisk_register(struct nullb *nullb)
disk->flags |= GENHD_FL_EXT_DEVT | GENHD_FL_SUPPRESS_PARTITION_INFO;
disk->major = null_major;
disk->first_minor   = nullb->index;
-   disk->fops  = &null_fops;
+   disk->fops  = &null_ops;
disk->private_data  = nullb;
disk->queue = nullb->q;
strncpy(disk->disk_name, nullb->disk_name, DISK_NAME_LEN);
-- 
2.23.0



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


  1   2   3   >