Re: [PATCH] blk-mq: avoid stall during boot due to synchronize_rcu_expedited

Mikulas Patocka Tue, 06 Jan 2026 09:07:22 -0800

On Tue, 6 Jan 2026, Uladzislau Rezki wrote:

> On Tue, Jan 06, 2026 at 04:56:07PM +0100, Mikulas Patocka wrote:
> > On the kernel 6.19-rc, I am experiencing 15-second boot stall in a
> > virtual machine when probing a virtio-scsi disk:
> > [    1.011641] SCSI subsystem initialized
> > [    1.013972] virtio_scsi virtio6: 16/0/0 default/read/poll queues
> > [    1.015983] scsi host0: Virtio SCSI HBA
> > [    1.019578] ACPI: \_SB_.GSIA: Enabled at IRQ 16
> > [    1.020225] ahci 0000:00:1f.2: AHCI vers 0001.0000, 32 command slots, 
> > 1.5 Gbps, SATA mode
> > [    1.020228] ahci 0000:00:1f.2: 6/6 ports implemented (port mask 0x3f)
> > [    1.020230] ahci 0000:00:1f.2: flags: 64bit ncq only
> > [    1.024688] scsi host1: ahci
> > [    1.025432] scsi host2: ahci
> > [    1.025966] scsi host3: ahci
> > [    1.026511] scsi host4: ahci
> > [    1.028371] scsi host5: ahci
> > [    1.028918] scsi host6: ahci
> > [    1.029266] ata1: SATA max UDMA/133 abar m4096@0xfea23000 port 
> > 0xfea23100 irq 16 lpm-pol 1
> > [    1.029305] ata2: SATA max UDMA/133 abar m4096@0xfea23000 port 
> > 0xfea23180 irq 16 lpm-pol 1
> > [    1.029316] ata3: SATA max UDMA/133 abar m4096@0xfea23000 port 
> > 0xfea23200 irq 16 lpm-pol 1
> > [    1.029327] ata4: SATA max UDMA/133 abar m4096@0xfea23000 port 
> > 0xfea23280 irq 16 lpm-pol 1
> > [    1.029341] ata5: SATA max UDMA/133 abar m4096@0xfea23000 port 
> > 0xfea23300 irq 16 lpm-pol 1
> > [    1.029356] ata6: SATA max UDMA/133 abar m4096@0xfea23000 port 
> > 0xfea23380 irq 16 lpm-pol 1
> > [    1.118111] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK 2.5+ 
> > PQ: 0 ANSI: 5
> > [    1.348916] ata1: SATA link down (SStatus 0 SControl 300)
> > [    1.350713] ata2: SATA link down (SStatus 0 SControl 300)
> > [    1.351025] ata6: SATA link down (SStatus 0 SControl 300)
> > [    1.351160] ata5: SATA link down (SStatus 0 SControl 300)
> > [    1.351326] ata3: SATA link down (SStatus 0 SControl 300)
> > [    1.351536] ata4: SATA link down (SStatus 0 SControl 300)
> > [    1.449153] input: ImExPS/2 Generic Explorer Mouse as 
> > /devices/platform/i8042/serio1/input/input2
> > [   16.483477] sd 0:0:0:0: Power-on or device reset occurred
> > [   16.483691] sd 0:0:0:0: [sda] 2097152 512-byte logical blocks: (1.07 
> > GB/1.00 GiB)
> > [   16.483762] sd 0:0:0:0: [sda] Write Protect is off
> > [   16.483877] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
> > doesn't support DPO or FUA
> > [   16.569225] sd 0:0:0:0: [sda] Attached SCSI disk
> > 
> > I bisected it and it is caused by the commit 89e1fb7ceffd which
> > introduces calls to synchronize_rcu_expedited.
> > 
> > This commit replaces synchronize_rcu_expedited and kfree with a call to 
> > kfree_rcu_mightsleep, avoiding the 15-second delay.
> > 
> > Signed-off-by: Mikulas Patocka <[email protected]>
> > Fixes: 89e1fb7ceffd ("blk-mq: fix potential uaf for 'queue_hw_ctx'")
> > 
> > ---
> >  block/blk-mq.c |    3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > Index: linux-2.6/block/blk-mq.c
> > ===================================================================
> > --- linux-2.6.orig/block/blk-mq.c   2026-01-06 16:45:11.000000000 +0100
> > +++ linux-2.6/block/blk-mq.c        2026-01-06 16:48:00.000000000 +0100
> > @@ -4553,8 +4553,7 @@ static void __blk_mq_realloc_hw_ctxs(str
> >              * Make sure reading the old queue_hw_ctx from other
> >              * context concurrently won't trigger uaf.
> >              */
> > -           synchronize_rcu_expedited();
> > -           kfree(hctxs);
> > +           kfree_rcu_mightsleep(hctxs);
> >
> I agree, doing freeing that way is not optimal. But kfree_rcu_mightsleep()
> also might not work. It has a fallback, if we can not place an object into
> "page" due to memory allocation failure, it inlines freeing:
> 
> <snip>
> synchronize_rcu();
> free().
> <snip>
> 
> Please note, synchronize_rcu() can easily be converted into expedited
> version. See rcu_gp_is_expedited().
> 
> --
> Uladzislau Rezki

Would this patch be better? It does a GFP_KERNEL allocation, which doesn't
fail in practice.

> Inlining is a corner case but it can happen. The best way is to add
> rcu_head to the blk_mq_hw_ctx structure and use kfree_rcu(). It never
> blocks.

We are not protecting the blk_mq_hw_ctx structure with RCU, we are 
protecting the q->queue_hw_ctx array. So, rcu_head cannot be added to an 
array. We could cast the array to rcu_head (and make sure that the initial 
allocation is at least sizeof(struct rcu_head)), but that is hacky.
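
As a rough illustration of the wrapper idea (a small separately allocated
object carries the callback head and points at the array), here is a
userspace sketch. It is not kernel code: cb_head, defer_call() and
grace_period_elapsed() are hypothetical stand-ins for struct rcu_head,
call_rcu() and the end of a grace period, and deferred_array and
retire_array() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Same idea as the kernel's container_of(): member pointer -> enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct cb_head *pending;	/* callbacks waiting for the "grace period" */
static int arrays_freed;	/* counter, for demonstration only */

/* Stand-in for call_rcu(): queue the callback and return without blocking. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	assert(func != NULL);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for the end of a grace period: run every queued callback. */
static void grace_period_elapsed(void)
{
	while (pending) {
		struct cb_head *head = pending;

		/* Read the link first, the callback frees the wrapper. */
		pending = head->next;
		head->func(head);
	}
}

/* Wrapper: the array itself has no room for a callback head. */
struct deferred_array {
	struct cb_head head;
	void **array;
};

static void free_deferred_array(struct cb_head *head)
{
	struct deferred_array *d = container_of(head, struct deferred_array, head);

	free(d->array);
	free(d);
	arrays_freed++;
}

/* Retire an old array. Returns 1 if the free was deferred, 0 on fallback. */
static int retire_array(void **old)
{
	struct deferred_array *d = malloc(sizeof(*d));

	if (!d) {
		/* Fallback: wait synchronously, then free directly. */
		grace_period_elapsed();
		free(old);
		arrays_freed++;
		return 0;
	}
	d->array = old;
	defer_call(&d->head, free_deferred_array);
	return 1;
}
```

The caller only pays for one small allocation and returns immediately; the
blocking wait is taken only if that allocation fails.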

Mikulas

---
 block/blk-mq.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

Index: linux-2.6/block/blk-mq.c
===================================================================
--- linux-2.6.orig/block/blk-mq.c       2026-01-06 15:55:41.000000000 +0100
+++ linux-2.6/block/blk-mq.c    2026-01-06 16:22:40.000000000 +0100
@@ -4531,6 +4531,18 @@ static struct blk_mq_hw_ctx *blk_mq_allo
        return NULL;
 }
 
+struct rcu_free_hctxs {
+       struct rcu_head head;
+       struct blk_mq_hw_ctx **hctxs;
+};
+
+static void rcu_free_hctxs(struct rcu_head *head)
+{
+       struct rcu_free_hctxs *r = container_of(head, struct rcu_free_hctxs, head);
+       kfree(r->hctxs);
+       kfree(r);
+}
+
 static void __blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
                                     struct request_queue *q)
 {
@@ -4539,6 +4551,7 @@ static void __blk_mq_realloc_hw_ctxs(str
 
        if (q->nr_hw_queues < set->nr_hw_queues) {
                struct blk_mq_hw_ctx **new_hctxs;
+               struct rcu_free_hctxs *r;
 
                new_hctxs = kcalloc_node(set->nr_hw_queues,
                                       sizeof(*new_hctxs), GFP_KERNEL,
@@ -4553,8 +4566,14 @@ static void __blk_mq_realloc_hw_ctxs(str
                 * Make sure reading the old queue_hw_ctx from other
                 * context concurrently won't trigger uaf.
                 */
-               synchronize_rcu_expedited();
-               kfree(hctxs);
+               r = kmalloc(sizeof(struct rcu_free_hctxs), GFP_KERNEL);
+               if (!r) {
+                       synchronize_rcu_expedited();
+                       kfree(hctxs);
+               } else {
+                       r->hctxs = hctxs;
+                       call_rcu(&r->head, rcu_free_hctxs);
+               }
                hctxs = new_hctxs;
        }
 