On Tue, Jan 29, 2019 at 03:25:48AM -0800, John Garry wrote:
> Hi,
>
> I have a question on $subject which I hope you can shed some light on.
>
> According to commit c5cb83bb337c25 ("genirq/cpuhotplug: Handle managed
> IRQs on CPU hotplug"), if we offline the last CPU in a managed IRQ
> affinity mask, the IRQ is shut down.
>
> The reasoning is that this IRQ is thought to be associated with a
> specific queue on a MQ device, and the CPUs in the IRQ affinity mask are
> the same CPUs associated with the queue. So, if no CPU is using the
> queue, then there is no need for the IRQ.
>
> However, how does this handle the scenario of the last CPU in an IRQ
> affinity mask being offlined while IO associated with the queue is
> still in flight?
>
> Or if we make the decision to use the queue associated with the current
> CPU, and then that CPU (being the last CPU online in the queue's IRQ
> affinity mask) goes offline and we finish the delivery with another CPU?
>
> In these cases, when the IO completes, it would not be serviced and
> would time out.
>
> I have actually tried this on my arm64 system and I see IO timeouts.
Hm, we used to freeze the queues with the CPUHP_BLK_MQ_PREPARE callback, which would reap all outstanding commands before the CPU and IRQ are taken offline. That was removed with commit 4b855ad37194f ("blk-mq: Create hctx for each present CPU"). It sounds like we should bring something like that back, but make it more fine-grained to the per-cpu context.