Hi Stephen,

ok, thanks, that's tricky indeed.

If you are doing this with QEMU, could you please make sure your QEMU
includes the following change:
https://lists.gnu.org/archive/html/qemu-devel/2024-09/msg02207.html

Or do you see this on hardware?


Thanks,
Adam

On Tue Sep 30, 2025 at 11:14:57 +0800, yy18513676366 wrote:
> Hi Adam,
> 
> 
> Thank you very much for your reply; it really gave me some hope.
> 
> 
> This issue is indeed difficult to reproduce reliably, which has been one of 
> the main challenges during my debugging.
> So far, I have found that increasing the vtimer interrupt frequency, while 
> keeping the traditional handling mode (i.e., without direct injection),
> makes the problem significantly easier to reproduce. 
> 
> 
> The relevant changes are as follows. 
> 1. In this setup, the vtimer is adjusted from roughly one trigger per 
> millisecond to approximately one trigger per microsecond, 
> and the system remains stable and functional:
> 
> 
> diff --git a/src/kern/arm/timer-arm-generic.cpp b/src/kern/arm/timer-arm-generic.cpp
> index a040cf46..b4cbbceb 100644
> --- a/src/kern/arm/timer-arm-generic.cpp
> +++ b/src/kern/arm/timer-arm-generic.cpp
> @@ -64,7 +64,8 @@ void Timer::init(Cpu_number cpu)
>    if (cpu == Cpu_number::boot_cpu())
>      {
>        _freq0 = frequency();
> -      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
> +      //_interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
> +      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000000;
>        printf("ARM generic timer: freq=%ld interval=%ld cnt=%lld\n",
>               _freq0, _interval, Gtimer::counter());
>        assert(_freq0);
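
A rough check of the arithmetic in the change above, as a sketch only. It
reuses the expression from the diff; the 62.5 MHz counter frequency and a
Scheduler_granularity of 1000 (microseconds) are assumed example values, not
numbers taken from this thread, and interval_ticks/period_ns are hypothetical
helper names.

```cpp
#include <cstdint>

// Ticks between timer interrupts, mirroring the expression in the diff:
//   _interval = freq * Scheduler_granularity / divisor
constexpr std::uint64_t interval_ticks(std::uint64_t freq_hz,
                                       std::uint64_t granularity,
                                       std::uint64_t divisor)
{
  return freq_hz * granularity / divisor;
}

// Resulting timer period in nanoseconds for a given tick count.
constexpr std::uint64_t period_ns(std::uint64_t ticks, std::uint64_t freq_hz)
{
  return ticks * 1000000000ULL / freq_hz;
}

// Assumed example values (not from the thread).
constexpr std::uint64_t example_freq_hz = 62500000;  // 62.5 MHz counter
constexpr std::uint64_t example_granularity = 1000;  // assumed to be in us

// Original divisor: 62500 ticks, i.e. a period of 1 ms.
static_assert(interval_ticks(example_freq_hz, example_granularity,
                             1000000ULL) == 62500, "1 ms interval");
static_assert(period_ns(62500, example_freq_hz) == 1000000, "1 ms period");

// Modified divisor: 62 ticks (integer division), i.e. a period of about 1 us.
static_assert(interval_ticks(example_freq_hz, example_granularity,
                             1000000000ULL) == 62, "about 1 us interval");
static_assert(period_ns(62, example_freq_hz) == 992, "about 1 us period");
```

Under these assumptions the change raises the vtimer rate by a factor of
about 1000, which matches the "one per millisecond" versus "one per
microsecond" description.
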
> 
> 
> 2. In addition, I selected the mode where interrupts are not directly injected:
> diff --git a/src/Kconfig b/src/Kconfig
> index 4391c996..55deeb1c 100644
> --- a/src/Kconfig
> +++ b/src/Kconfig
> @@ -367,7 +367,7 @@ config IOMMU
>  config IRQ_DIRECT_INJECT
>         bool "Support direct interrupt forwarding to guests"
>         depends on CPU_VIRT && HAS_IRQ_DIRECT_INJECT_OPTION
> -       default y
> +       default n
>         help
>           Adds support in the kernel to allow the VMM to let Fiasco directly
>           forward hardware interrupts to a guest. This enables just the
> 
> At the moment, this is the only way I have found that can noticeably increase 
> the reproduction rate.
> Once again, thank you for your valuable time and feedback!
> 
> Best regards,
> Stephen.yang
> 
> 
> 
> 
> At 2025-09-29 00:11:41, "Adam Lackorzynski" <[email protected]> wrote:
> >Hi,
> >
> >On Wed Sep 17, 2025 at 13:57:43 +0800, yy18513676366 wrote:
> >> When running a virtual machine, I encounter an assertion failure after the
> >> VM has been up for some time. The kernel crashes in
> >> src/kern/arm/thread-arm-hyp.cpp, specifically in the function
> >> vcpu_vgic_upcall(unsigned virq):
> >> 
> >> vcpu_vgic_upcall(unsigned virq)
> >> {
> >>    ......
> >>    assert(state() & Thread_vcpu_user);
> >>    ......
> >> }
> >> 
> >> Based on source code inspection and preliminary debugging, the problem
> >> seems to be related to the management of the Thread_vcpu_user state.
> >> 
> >>   1. Under normal circumstances, the vcpu_resume path (transitioning from
> >> the kernel back to the guest OS) updates the vCPU state to include
> >> Thread_vcpu_user. However, if an interrupt is delivered during this
> >> transition while the receiving side is not yet ready, the vCPU frequently
> >> returns to the kernel (via vcpu_return_to_kernel) and subsequently
> >> processes the interrupt through guest_irq in vcpu_entries. In this
> >> situation, the expected update of Thread_vcpu_user may not yet have taken
> >> place, which seems to result in the assert being triggered when a VGIC
> >> interrupt is involved.
> >> 
> >>   2. A similar condition seems to occur in the vcpu_async_ipc path. At the
> >> end of IPC handling, this function explicitly clears the Thread_vcpu_user
> >> flag. If a VGIC interrupt is delivered during this phase, the absence of
> >> the expected Thread_vcpu_user state seems to lead to the same assertion
> >> failure.
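
The two suspected windows described above can be written down as a toy
model. This is only a sketch of the hypothesis, not kernel code:
Thread_vcpu_user is the flag named in the report, while its bit value,
Vcpu_model, and every function name below are hypothetical.

```cpp
#include <cstdint>

// Hypothetical bit value; only the flag name comes from the report.
constexpr std::uint32_t Thread_vcpu_user = 1u << 0;

struct Vcpu_model
{
  std::uint32_t state = 0;

  // Models the point in the resume path where the flag gets set.
  constexpr void resume_sets_flag() { state |= Thread_vcpu_user; }

  // Models the end of async IPC handling, where the flag is cleared.
  constexpr void async_ipc_clears_flag() { state &= ~Thread_vcpu_user; }

  // The condition checked by assert(state() & Thread_vcpu_user).
  constexpr bool upcall_check_passes() const
  { return (state & Thread_vcpu_user) != 0; }
};

// Window 1: the upcall arrives before the resume path has set the flag.
constexpr bool upcall_before_flag_set()
{
  Vcpu_model v{};
  return v.upcall_check_passes();
}

// Expected case: the upcall arrives after the flag has been set.
constexpr bool upcall_after_resume()
{
  Vcpu_model v{};
  v.resume_sets_flag();
  return v.upcall_check_passes();
}

// Window 2: the upcall arrives after async IPC handling cleared the flag.
constexpr bool upcall_after_ipc_clear()
{
  Vcpu_model v{};
  v.resume_sets_flag();
  v.async_ipc_clears_flag();
  return v.upcall_check_passes();
}

static_assert(!upcall_before_flag_set(), "window 1 would trip the assert");
static_assert(upcall_after_resume(), "expected case passes the check");
static_assert(!upcall_after_ipc_clear(), "window 2 would trip the assert");
```

Whether the real kernel can actually reach either window depends on locking
and interrupt masking that this model leaves out.
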
> >> 
> >> I would like to confirm if the two points above are correct, and what
> >> steps I should take next to further debug this issue.
> >
> >Thanks for reporting. At least the description sounds reasonable to me.
> >
> >Do you have a good way of reliably reproducing this situation?
> >
> >> In addition, I have some assumptions I would like to confirm:
> >> 
> >> First, for IPC between non-vcpu threads, the L4 microkernel handles message
> >> delivery and scheduling (wake/schedule) directly, without requiring any
> >> forwarding through uvmm. Similarly, interrupts bound via the interrupt
> >> controller (ICU) to a non-vcpu thread or handler are also managed by the
> >> kernel and scheduler, and therefore do not necessarily involve uvmm.
> >
> >IPCs between threads are handled by the microkernel. Whether a thread is
> >a vcpu thread or a non-vcpu thread only changes how the IPC is delivered
> >to it. A non-vcpu thread has to wait in IPC to receive it; in vcpu mode
> >the IPC is received by causing a vcpu event and bringing the vcpu to its
> >entry. This also works without virtualization (note that vcpus also work
> >without hw-virtualization). For interrupts it is the same: non-vcpu
> >threads have to block in IPC to get an interrupt, while vcpu threads
> >will be brought to their entry.
> >
> >> Second, passthrough interrupts, when not delivered in direct-injection
> >> mode, are routed to uvmm for handling if they are bound to a vCPU.
> >> Likewise, services provided by uvmm (such as virq) are also bound to a
> >> vCPU and therefore require forwarding through uvmm.
> >
> >Yes. Direct injection will only happen when the vcpu is running.
> >
> >> There seems to have been a similar question in the past, but it does not
> >> seem to have been resolved.
> >> 
> >> Re: Assertion failure error in kernel vgic interrupt processing -
> >> l4-hackers - OS Site
> >> 
> >> I wonder if my questions are related to that post, and if any solutions
> >> exist.
> >
> >Thanks, we need to work on it. Reproducing this situation on our side
> >would be very valuable.
> >
> >
> >Thanks, Adam
_______________________________________________
l4-hackers mailing list -- [email protected]
To unsubscribe send an email to [email protected]
