On 6/19/25 17:16, Andrey Zhadchenko wrote:
When doing simple discard tests, we encountered the following crash
with panic_on_warn:

[4421147.245184] ------------[ cut here ]------------
[4421147.245569] WARNING: CPU: 6 PID: 505054 at drivers/md/dm-ploop.h:392 
ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
[4421147.246073] Modules linked in: loop tls dm_qcow2(E) vhost_net tap 
xt_physdev nft_meta_bridge tun dm_zero_rq vhost_blk vhost_vsock vhost 
vhost_iotlb ip6t_REJECT nf_reject_ipv6 xt_CHECKSUM xt_MASQUERADE xt_conntrack 
ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 nft_counter 8021q garp mrp nf_tables nfnetlink 
rpcrdma rfkill intel_rapl_msr intel_rapl_common intel_pmc_core intel_vsec 
pmt_telemetry pmt_class kvm_intel kvm iTCO_wdt irqbypass iTCO_vendor_support 
rapl vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common 
virtio_console vsock i2c_i801 pcspkr lpc_ich i2c_smbus pvpanic_mmio pvpanic 
joydev nfsd br_netfilter veth overlay auth_rpcgss nfs_acl lockd grace bridge 
stp llc sunrpc binfmt_misc xfs libcrc32c sd_mod sr_mod cdrom t10_pi sg 
drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul 
sysfillrect ahci sysimgblt libahci fb_sys_fops virtio_net ghash_clmulni_intel 
libata net_failover drm v!
irtio_scsi failover
[4421147.246120]  serio_raw crc32c_intel fuse_kio_pcs rdma_cm iw_cm ib_cm 
ib_core fuse push_backup dm_tracking ploop dm_mod [last unloaded: dm_qcow2]
[4421147.249733] CPU: 6 PID: 505054 Comm: kworker/6:1 ve: / Kdump: loaded 
Tainted: G            E  X  -------  ---  5.14.0-427.44.1.ovz9.80.29 #1 
ovz9.80.29
[4421147.250261] Hardware name: Virtuozzo KVM/Virtuozzo, BIOS 1.16.3-2.vz9.2 
04/01/2014
[4421147.250651] Workqueue: dio/dm-1 iomap_dio_complete_work
[4421147.250914] RIP: 0010:ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
[4421147.251216] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b b5 88 00 00 
00 3b 8d 90 00 00 00 73 0c 89 cf f0 48 0f ab 3e e9 81 fe ff ff <0f> 0b e9 7a fe 
ff ff 0f 0b 0f 0b e9 22 ff ff ff 0f 0b e9 c9 fe ff
[4421147.251938] RSP: 0018:ffffc07dc8447d98 EFLAGS: 00010002
[4421147.252202] RAX: ffff9dc778699040 RBX: ffff9dc7413bd800 RCX: 
0000000000000000
[4421147.252578] RDX: 0000000000000010 RSI: 0000000068746957 RDI: 
ffff9dcc5414c420
[4421147.252943] RBP: ffff9dcac2299000 R08: 0000000000000000 R09: 
ffff9dcac2299400
[4421147.253328] R10: 0000000000000001 R11: ffff9dc747460000 R12: 
ffffc07dc8447dd0
[4421147.253706] R13: ffffffffcedc7000 R14: 0000000000000074 R15: 
ffff9dc8e56aa380
[4421147.254071] FS:  0000000000000000(0000) GS:ffff9dce9fb80000(0000) 
knlGS:0000000000000000
[4421147.254468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4421147.254743] CR2: 00007ffd825d3000 CR3: 00000001052a4002 CR4: 
0000000000372ee0
[4421147.255109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[4421147.255474] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 
0000000000000400
[4421147.255848] Call Trace:
[4421147.256040]  <TASK>
[4421147.256225]  ? show_trace_log_lvl+0x1c4/0x2df
[4421147.256466]  ? show_trace_log_lvl+0x1c4/0x2df
[4421147.256712]  ? ploop_put_piwb+0x187/0x1f0 [ploop]
[4421147.256959]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
[4421147.257272]  ? __warn+0x81/0x110
[4421147.257485]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
[4421147.257775]  ? report_bug+0x10a/0x140
[4421147.257997]  ? handle_bug+0x3c/0x70
[4421147.258212]  ? exc_invalid_op+0x14/0x70
[4421147.258438]  ? asm_exc_invalid_op+0x16/0x20
[4421147.258678]  ? ploop_advance_local_after_bat_wb+0x2ae/0x2d0 [ploop]
[4421147.258962]  ? ploop_advance_local_after_bat_wb+0xcc/0x2d0 [ploop]
[4421147.259246]  ploop_put_piwb+0x187/0x1f0 [ploop]
[4421147.259493]  ploop_do_pio_endio+0x31/0x90 [ploop]
[4421147.259748]  process_one_work+0x1e2/0x3b0
[4421147.259980]  ? __pfx_worker_thread+0x10/0x10
[4421147.261238]  worker_thread+0x50/0x3a0
[4421147.261460]  ? __pfx_worker_thread+0x10/0x10
[4421147.261704]  kthread+0xdd/0x100
[4421147.261915]  ? __pfx_kthread+0x10/0x10
[4421147.262138]  ret_from_fork+0x29/0x50
[4421147.262360]  </TASK>
[4421147.262553] Kernel panic - not syncing: panic_on_warn set ...

The extra debug looks like that:

[ 3022.561985] got discard: pos 0, size 1048576
[ 3022.562373]          processing discard: clu 0 (idx 16), len 1048576
[ 3022.563182]                  dropped clu 1752459607 (total 101) [page 0, clu 
0 (i 16 + off -16)]

1752459607 -> 0x68746957 -> 57 69 74 68 ("With") which is a start of ploop
header. So we clearly missed the first metadata page handling.

Since ploop first metadata page contains 64bit header, we shift all
clusters in bat by this value when accessing them.
In ploop_advance_local_after_bat_wb() we iterate over the page either from
0 or 16, if it is a first page. Therefore:
  - i is the position of iterated cluster in the metadata page, so we should
use it for accessing/writing bat_entries and bat_levels.
  - i + off is the real cluster number
  - bat_entries[i] is the cluster we should punch from hole bitmap

Also the current code does out-of-bound access if the metadata page is
not the first, as it uses real cluster number as index when accessing
the bat page (which is only 4096 bytes long)

Fixes: da5f60147b62 ("dm-ploop: dm-ploop: simplify discard completion")
Signed-off-by: Andrey Zhadchenko <andrey.zhadche...@virtuozzo.com>
---
v2: dropped erroneous ploop_hole_set_bit() change, expanded commit
message for a bit

  drivers/md/dm-ploop-map.c | 9 ++++-----
  1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index 6dd94af5fd94..988bf00ed4be 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -853,16 +853,15 @@ static void ploop_advance_local_after_bat_wb(struct ploop 
*ploop,
        for (; i < last; i++) {
                if (piwb->type == PIWB_TYPE_DISCARD) {
                        /* discard completed */
-                       u32 clu = i + off;
-                       u8 level = md->bat_levels[clu];
-                       u32 d_clu = READ_ONCE(bat_entries[clu]);
+                       u8 level = md->bat_levels[i];
+                       u32 d_clu = READ_ONCE(bat_entries[i]);
if (success && !dst_clu[i] &&
                            (!(d_clu == BAT_ENTRY_NONE ||
                            level < ploop_top_level(ploop)))) {
                                WARN_ON_ONCE(ploop->nr_deltas != 1);
-                               WRITE_ONCE(bat_entries[clu], BAT_ENTRY_NONE);
-                               WRITE_ONCE(md->bat_levels[clu], 0);
+                               WRITE_ONCE(bat_entries[i], BAT_ENTRY_NONE);
+                               WRITE_ONCE(md->bat_levels[i], 0);

In original code in ploop_piwb_discard_completed we accessed bat_entries and bat_levels at index no more than 1024 (PAGE_SIZE / sizeof(map_index_t)) as we used ploop_bat_clu_idx_in_page() for conversion, now maximum value is not so obvious:

(READ_ONCE(ploop->nr_bat_entries) - piwb->page_id * PAGE_SIZE / sizeof(map_index_t) + PLOOP_MAP_OFFSET - 1)

are we sure it won't overflow?

I mean, maybe the idea in da5f60147b62 ("dm-ploop: dm-ploop: simplify discard completion") about "ploop_advance_local_after_bat_wb is handling only one md page" was wrong?

                                ploop_hole_set_bit(d_clu, ploop);
                        }

--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.


_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

Reply via email to