On 4/9/25 8:51 AM, Benjamin Marzinski wrote:
Thanks for all the fixes. Of note is that I am looking into the dm-delay
pre-suspend problem for zoned case (potential reordering of writes). It turns
out to not be an easy fix given how dm-delay works. Still scratching my head :)
> There were multiple places in dm's __bind() function where it could fail
> and not completely roll back, leaving the device using the the old
> table, but with device limits and resources from the new table.
> Additionally, unused mempools for request-based devices were not always
> freed immediately.
>
> Also, there were a number of issues with switching zoned tables that
> emulate zone append (in other words, dm-crypt on top of zoned devices).
> dm_blk_report_zones() could be called while the device was initially
> setting up a zoned table and creating zoned resources or could possibly
> fail to end a srcu read section. More importantly,
> blk_revalidate_disk_zones() would never get called when updating a zoned
> table. This could cause the dm device to see the wrong zone write
> offsets, not have a large enough zwplugs reserved in its mempool, or
> read invalid memory when checking the conventional zones bitmap.
>
> Finally, any DM device created on top of a device emulating zone appeads
> will automatically have zone write plug resources created for it, since
> max_hw_zone_append_sectors will always be 0 for a device stacked on top
> of a device max_hw_zone_append_sectors = 0
>
> This patchset fixes these issues. It deals with the problems around
> blk_revalidate_disk_zones() by only calling it for devices that have no
> zone write plug resources. This will always correctly update the zoned
> settings. If a device has zone write plug resources, calling
> blk_revalidate_disk_zones() will not correctly update them in many
> cases, so DM simply doesn't call it for devices with zone write plug
> resources. Instead of allowing people to load tables that can break the
> device, like currently happens, DM disallosw any table reloads that
> change the zoned setting for devices that already have zone write plug
> resources. Finally, it deals with the max_hw_zone_append_sectors issue
> by making sure that it is non-zero for zoned DM devices that do not need
> zone write append emulation.
>
> Changes in V3:
> - Use queue_limits_start_update() instead of modifying
> queue_limits_set()
> - Rewrote the commit message for patch 0004 ("dm: fix
> dm_blk_report_zones") to explain that this only happens when initially
> setting up a table with zone append resources, so disallowing table
> swaps after you set up a zoned dm-crypt device will not effect the
> issue at all. I did not implement Christoph's suggestion because I
> don't understand how it would work here. Perhaps I'm just being dense.
> I'm not wedded to this solution. Any one that keeps this
> use-after-free error from being possible is fine by me.
> - Added a final patch to deal with the issue of stacked devices always
> getting zone append resources if any underlying device needs them.
>
> Changes in V2:
> - Made queue_limits_set() optionally return the old limits (grabbed
> while holding the limits_lock), and used this in
> dm_table_set_restrictions()
> - dropped changes to disk_free_zone_resources() and the
> blk_revalidate_disk_zones() code path (removed patches 0005 & 0006)
> - Instead of always calling blk_revalidate_disk_zones(), disallow
> changes that would change zone settings if the device has
> zone write plug resources (final patch).
>
> Benjamin Marzinski (6):
> dm: don't change md if dm_table_set_restrictions() fails
> dm: free table mempools if not used in __bind
> dm: handle failures in dm_table_set_restrictions
> dm: fix dm_blk_report_zones
> dm: limit swapping tables for devices with zone write plugs
> dm: fix native zone append devices on top of emulated ones
>
> drivers/md/dm-core.h | 1 +
> drivers/md/dm-table.c | 67 +++++++++++++++++++++++++-------
> drivers/md/dm-zone.c | 90 ++++++++++++++++++++++++++++++-------------
> drivers/md/dm.c | 36 ++++++++++-------
> drivers/md/dm.h | 6 +++
> 5 files changed, 146 insertions(+), 54 deletions(-)
>
--
Damien Le Moal
Western Digital Research