On 4/21/2026 3:38 PM, Arnd Bergmann wrote:
> On Tue, Apr 21, 2026, at 08:57, Peng Yang wrote:
>> __send_to_port() busy-waits in virtqueue_get_buf() while holding
>> outvq_lock with IRQs disabled. If the host stops draining the TX
>> virtqueue, this loop never terminates.
>>
>> This was observed during secondary VM boot: virtio_mem plugged memory
>> in multiple iterations, each emitting dev_info() messages through the
>> hvc console. A writev() on the hvc TTY entered __send_to_port() and
>> stalled in the spin loop. When the watchdog bark ISR fired on another
>> CPU, it attempted printk(), which tried to acquire outvq_lock through
>> the same path and spun indefinitely. With all CPUs stuck, the watchdog
>> could not be serviced and triggered a bite.
>>
>> Add a 200 ms deadline using ktime_get_mono_fast_ns() to bound the spin
>> loop. ktime_get_mono_fast_ns() reads the hardware counter directly and
>> is safe to call with IRQs disabled and spinlocks held.
>>
>> The 200 ms value is chosen to be far above normal host response latency
>> (microseconds) to avoid spurious exits, yet well below the watchdog
>> bark-to-bite window (typically 3 s) so that CPUs can escape the loop
>> and complete the bark handler before a bite occurs.
> 
> Which host implementation do you use? The way the virtio_console
> driver works really assumes that virtqueue_kick() consumes the
> buffer synchronously. Even though that is not how virtio is
> specified, this does tend to work. ;-)
> 
We are using crosvm as the host VMM with its virtio-console backend,
running on Android. The trigger is an Android host reboot/shutdown: when
Android initiates a reboot, the crosvm process exits and tears down
the virtio-console backend. At that point the TX virtqueue is no
longer drained by the host, and any buffers queued on it will never
be consumed.

The crash dump from the actual failure confirms the exact deadlock
scenario:

Core 3 holds outvq_lock and spins forever in virtqueue_get_buf waiting
for the host to consume the buffer:

virtqueue_get_buf
__send_to_port
put_chars
hvc_push
hvc_write
n_tty_write
  <- writev() syscall

Core 0 takes a watchdog bark interrupt and attempts printk; it holds
the console lock but spins in _raw_spin_lock_irqsave waiting to
acquire outvq_lock:

queued_spin_lock_slowpath
_raw_spin_lock_irqsave
__send_to_port
put_chars
hvc_console_print
console_flush_all
console_unlock
vprintk_emit
  <- printk (watchdog bark handler)

Core 1 runs a virtio_mem worker calling _dev_info, which spins inside
vprintk_emit waiting for the console lock held by Core 0:

vprintk_emit       <- spinning for console lock
dev_vprintk_emit
dev_printk_emit
__dev_printk
_dev_info
  <- virtio_mem worker

Core 2 (khvcd kernel thread) is also blocked in __hvc_poll trying to
acquire outvq_lock:

queued_spin_lock_slowpath
_raw_spin_lock_irqsave
__hvc_poll
khvcd

With all CPUs stuck, the watchdog cannot be serviced and a bite occurs
before the graceful shutdown can complete.

The 200 ms timeout is intended as a bounded escape from this "backend
already gone" scenario. It is well above normal crosvm response latency
(microseconds) to avoid false exits under normal operation, and well
below the watchdog bark-to-bite window so that CPUs can escape the loop
and allow the graceful shutdown to proceed.
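
Since the hunk quoted below only shows the comment change, here is
roughly what the bounded loop looks like inside __send_to_port. This
is a sketch rather than the exact patch text; the deadline local is
named for illustration:

/*
 * Fragment of __send_to_port(): spin until the host consumes the
 * buffer, but give up after 200 ms so a dead backend cannot wedge
 * the CPU with IRQs disabled.  ktime_get_mono_fast_ns() is NMI-safe
 * and takes no locks, so it is usable here with outvq_lock held.
 */
u64 deadline = ktime_get_mono_fast_ns() + 200 * NSEC_PER_MSEC;

while (!virtqueue_get_buf(out_vq, &len) &&
       !virtqueue_is_broken(out_vq)) {
	if (ktime_get_mono_fast_ns() > deadline)
		break;
	cpu_relax();
}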
>> @@ -632,10 +634,18 @@ static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
>>       * buffer and relax the spinning requirement.  The downside is
>>       * we need to kmalloc a GFP_ATOMIC buffer each time the
>>       * console driver writes something out.
>> +     *
>> +     * To avoid spinning forever if the host stops processing the
>> +     * TX virtqueue (e.g. during VM shutdown), a 200ms deadline is
>> +     * used to break out of the loop as a fallback.
>        */
> 
> Did you by any chance mean to use microseconds instead of milliseconds?
> Waiting this long with interrupts disabled likely breaks a lot
> of assumptions, e.g. in the scheduler. If you have to deal with
> a hypervisor that does not handle the console output synchronously,
> the alternative suggested in the existing comment would likely
> be more appropriate.
> 
>       Arnd

The unit is intentionally milliseconds, not microseconds. The timeout
only triggers in the abnormal case where crosvm has already exited;
under normal operation crosvm drains the TX virtqueue in microseconds
and the loop exits almost immediately. The 200 ms value is chosen to
be well below the watchdog bark-to-bite window (typically a few
seconds) so that CPUs can escape the spin loop and allow the watchdog
bark handler to complete before a bite occurs.

You are right that holding IRQs disabled for up to 200 ms is far from
ideal and breaks scheduler assumptions. We considered the alternative
suggested in the existing comment, copying the data to a GFP_ATOMIC
buffer to avoid the spin entirely, but that approach has a fundamental
limitation in our case: the spin loop is reached not only from
__send_to_port on the hvc TTY path, but also from send_control_msg,
and, more critically, the deadlock occurs because printk itself ends
up calling put_chars → __send_to_port while holding the console lock
with IRQs disabled. Refactoring to eliminate the spin loop entirely
would require a more invasive rework of the console write path, which
is beyond the scope of this fix.
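
For concreteness, the buffered alternative would look something like
the sketch below. send_buf_async is a hypothetical helper, not code
from the driver, and the reclaim path (reclaim_consumed_buffers) would
additionally need to kfree these copies when the host returns them,
which is part of why the rework is more invasive than it first looks:

/*
 * Hypothetical sketch: copy the data into a GFP_ATOMIC buffer and
 * queue it without waiting for the host.  The copy serves as both
 * payload and token, so a reclaim path that knows about it could
 * kfree() it once the host consumes the buffer.
 */
static int send_buf_async(struct port *port, const void *data,
			  size_t count)
{
	struct scatterlist sg[1];
	unsigned long flags;
	void *buf;
	int err;

	buf = kmemdup(data, count, GFP_ATOMIC);
	if (!buf)
		return -ENOMEM;

	sg_init_one(sg, buf, count);

	spin_lock_irqsave(&port->outvq_lock, flags);
	reclaim_consumed_buffers(port);
	err = virtqueue_add_outbuf(port->out_vq, sg, 1, buf, GFP_ATOMIC);
	virtqueue_kick(port->out_vq);
	spin_unlock_irqrestore(&port->outvq_lock, flags);

	if (err)
		kfree(buf);
	return err;
}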

The 200 ms timeout is intended as a minimal, targeted workaround to prevent
the watchdog bite in our specific scenario. We are open to suggestions on a
better long-term approach.
