On Tue, Apr 21, 2026, at 08:57, Peng Yang wrote:
> __send_to_port() busy-waits in virtqueue_get_buf() while holding
> outvq_lock with IRQs disabled. If the host stops draining the TX
> virtqueue, this loop never terminates.
>
> This was observed during secondary VM boot: virtio_mem plugged memory
> in multiple iterations, each emitting dev_info() messages through the
> hvc console. A writev() on the hvc TTY entered __send_to_port() and
> stalled in the spin loop. When the watchdog bark ISR fired on another
> CPU, it attempted printk(), which tried to acquire outvq_lock through
> the same path and spun indefinitely. With all CPUs stuck, the watchdog
> could not be serviced and triggered a bite.
>
> Add a 200 ms deadline using ktime_get_mono_fast_ns() to bound the spin
> loop. ktime_get_mono_fast_ns() reads the hardware counter directly and
> is safe to call with IRQs disabled and spinlocks held.
>
> The 200 ms value is chosen to be far above normal host response latency
> (microseconds) to avoid spurious exits, yet well below the watchdog
> bark-to-bite window (typically 3 s) so that CPUs can escape the loop
> and complete the bark handler before a bite occurs.
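For illustration, the bounded wait described above can be modeled in user space roughly as follows. This is a sketch under assumptions, not the patch itself. clock_gettime(CLOCK_MONOTONIC) stands in for ktime_get_mono_fast_ns(). The names fake_vq, fake_get_buf() and wait_tx_bounded() are hypothetical stand-ins for the TX virtqueue and virtqueue_get_buf(). A kernel version would also call cpu_relax() inside the loop.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for ktime_get_mono_fast_ns(): monotonic time in nanoseconds. */
static uint64_t mono_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical TX queue model: the "host" returns the buffer after
 * polls_until_done polls, or never if polls_until_done < 0 (stalled host). */
struct fake_vq {
	int polls_until_done;
};

/* Hypothetical stand-in for virtqueue_get_buf(): true once the buffer is used. */
static bool fake_get_buf(struct fake_vq *vq)
{
	if (vq->polls_until_done < 0)
		return false;
	if (vq->polls_until_done == 0)
		return true;
	vq->polls_until_done--;
	return false;
}

/* Spin until the buffer comes back or the deadline passes.
 * Returns 0 on completion, -1 on timeout. */
static int wait_tx_bounded(struct fake_vq *vq, uint64_t timeout_ns)
{
	uint64_t deadline;

	assert(vq != NULL);
	deadline = mono_ns() + timeout_ns;
	while (!fake_get_buf(vq)) {
		if (mono_ns() > deadline)
			return -1;	/* give up instead of spinning forever */
		/* the kernel loop would call cpu_relax() here */
	}
	return 0;
}
```

With a responsive queue, wait_tx_bounded() returns 0 after a few polls. With polls_until_done set to -1 it returns -1 once the timeout expires, which is the stalled-host case the 200 ms deadline is meant to catch.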

Which host implementation do you use? The way the virtio_console
driver works really assumes that virtqueue_kick() consumes the
buffer synchronously. Even though that is not how virtio is
specified, this does tend to work. ;-)

> @@ -632,10 +634,18 @@ static ssize_t __send_to_port(struct port *port, struct scatterlist *sg,
>        * buffer and relax the spinning requirement.  The downside is
>        * we need to kmalloc a GFP_ATOMIC buffer each time the
>        * console driver writes something out.
> +      *
> +      * To avoid spinning forever if the host stops processing the
> +      * TX virtqueue (e.g. during VM shutdown), a 200ms deadline is
> +      * used to break out of the loop as a fallback.
>        */

Did you by any chance mean to use microseconds instead of milliseconds?
Waiting this long with interrupts disabled likely breaks a lot
of assumptions, e.g. in the scheduler. If you have to deal with
a hypervisor that does not handle the console output synchronously,
the alternative suggested in the existing comment would likely
be more appropriate.
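The alternative from the existing comment is to copy the data into a private buffer and return without waiting for the host. It could look roughly like the sketch below. This is a user-space model only. malloc() stands in for kmalloc(..., GFP_ATOMIC). The names tx_ring, send_nowait() and reclaim_consumed() are hypothetical and are not virtio_console APIs. In the driver, the reclaim step would presumably run when the host returns used buffers, for example from the TX completion path or at the start of the next send.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 8

/* Hypothetical TX queue model: slots hold buffers the host has not returned yet. */
struct tx_ring {
	char *slot[RING_SIZE];
	size_t head;	/* next slot to fill */
	size_t tail;	/* oldest in-flight slot */
	size_t used;	/* number of in-flight buffers */
};

/* Copy the caller's data and queue it; never waits for the host.
 * Returns bytes queued, or 0 if the ring is full or allocation fails
 * (in the kernel, a GFP_ATOMIC allocation can fail too). */
static size_t send_nowait(struct tx_ring *r, const char *data, size_t len)
{
	char *copy;

	assert(r != NULL);
	if (r->used == RING_SIZE)
		return 0;		/* queue full: caller drops or retries later */
	copy = malloc(len);		/* stands in for kmalloc(len, GFP_ATOMIC) */
	if (!copy)
		return 0;
	memcpy(copy, data, len);
	r->slot[r->head] = copy;
	r->head = (r->head + 1) % RING_SIZE;
	r->used++;
	return len;
}

/* Free up to n buffers the host reports as consumed, oldest first.
 * Returns the number actually reclaimed. */
static size_t reclaim_consumed(struct tx_ring *r, size_t n)
{
	size_t done = 0;

	while (done < n && r->used > 0) {
		free(r->slot[r->tail]);
		r->slot[r->tail] = NULL;
		r->tail = (r->tail + 1) % RING_SIZE;
		r->used--;
		done++;
	}
	return done;
}
```

The cost is one allocation and copy per console write, as the existing comment notes. In exchange, the writer never spins with IRQs disabled, so a stalled host only leads to a full queue and dropped or deferred output.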

      Arnd
