Hi Stephen,

ok, thanks, that's tricky indeed.
In case you are doing this with QEMU, could you please make sure you
have the following change in your QEMU?
https://lists.gnu.org/archive/html/qemu-devel/2024-09/msg02207.html

Or do you see this on hardware?

Thanks,
Adam

On Tue Sep 30, 2025 at 11:14:57 +0800, yy18513676366 wrote:
> Hi Adam,
>
> Thank you very much for your reply; it really gave me some hope.
>
> This issue is indeed difficult to reproduce reliably, which has been one of
> the main challenges during my debugging.
> So far, I have found that increasing the vtimer interrupt frequency, while
> keeping the traditional handling mode (i.e., without direct injection),
> makes the problem significantly easier to reproduce.
>
> The relevant changes are as follows.
>
> 1. In this setup, the vtimer is adjusted from roughly one trigger per
> millisecond to approximately one trigger per microsecond,
> and the system remains stable and functional:
>
> diff --git a/src/kern/arm/timer-arm-generic.cpp b/src/kern/arm/timer-arm-generic.cpp
> index a040cf46..b4cbbceb 100644
> --- a/src/kern/arm/timer-arm-generic.cpp
> +++ b/src/kern/arm/timer-arm-generic.cpp
> @@ -64,7 +64,8 @@ void Timer::init(Cpu_number cpu)
>    if (cpu == Cpu_number::boot_cpu())
>      {
>        _freq0 = frequency();
> -      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
> +      //_interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
> +      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000000;
>        printf("ARM generic timer: freq=%ld interval=%ld cnt=%lld\n",
>               _freq0, _interval, Gtimer::counter());
>        assert(_freq0);
>
> 2. In addition, I selected the mode where interrupts are not directly injected:
>
> diff --git a/src/Kconfig b/src/Kconfig
> index 4391c996..55deeb1c 100644
> --- a/src/Kconfig
> +++ b/src/Kconfig
> @@ -367,7 +367,7 @@ config IOMMU
>  config IRQ_DIRECT_INJECT
>         bool "Support direct interrupt forwarding to guests"
>         depends on CPU_VIRT && HAS_IRQ_DIRECT_INJECT_OPTION
> -       default y
> +       default n
>         help
>           Adds support in the kernel to allow the VMM to let Fiasco directly
>           forward hardware interrupts to a guest. This enables just the
>
> At the moment, this is the only way I have found that can noticeably increase
> the reproduction rate.
> Once again, thank you for your valuable time and feedback!
>
> Best regards,
> Stephen.yang
>
>
> At 2025-09-29 00:11:41, "Adam Lackorzynski" wrote:
> >Hi,
> >
> >On Wed Sep 17, 2025 at 13:57:43 +0800, yy18513676366 wrote:
> >> When running a virtual machine, I encounter an assertion failure after the
> >> VM has been up for some time. The kernel crashes in
> >> src/kern/arm/thread-arm-hyp.cpp, specifically in the function
> >> vcpu_vgic_upcall(unsigned virq):
> >>
> >> vcpu_vgic_upcall(unsigned virq)
> >> {
> >>   ......
> >>   assert(state() & Thread_vcpu_user);
> >>   ......
> >> }
> >>
> >> Based on source code inspection and preliminary debugging, the problem
> >> seems to be related to the management of the Thread_vcpu_user state.
> >>
> >> 1. Under normal circumstances, the vcpu_resume path (transitioning from
> >> the kernel back to the guest OS) updates the vCPU state to include
> >> Thread_vcpu_user. However, if an interrupt is delivered during this
> >> transition while the receiving side is not yet ready, the vCPU frequently
> >> returns to the kernel (via vcpu_return_to_kernel) and subsequently
> >> processes the interrupt through guest_irq in vcpu_entries. In this
> >> situation, the expected update of Thread_vcpu_user may not yet have taken
> >> place, which seems to result in the assert being triggered when a VGIC
> >> interrupt is involved.
> >>
> >> 2. A similar condition seems to occur in the vcpu_async_ipc path. At the
> >> end of IPC handling, this function explicitly clears the Thread_vcpu_user
> >> flag. If a VGIC interrupt is delivered during this phase, the absence of
> >> the expected Thread_vcpu_user state seems to lead to the same assertion
> >> failure.
> >>
> >> I would like to confirm if the two points above are correct, and what
> >> steps I should take next to further debug this issue.
> >
> >Thanks for reporting. At least the description sounds reasonable to me.
> >
> >Do you have a good way of reliably reproducing this situation?
> >
> >> In addition, I have some assumptions I would like to confirm:
> >>
> >> First, for IPC between non-vcpu threads, the L4 microkernel handles message
> >> delivery and scheduling (wake/schedule) directly, without requiring any
> >> forwarding through uvmm. Similarly, interrupts bound via the interrupt
> >> controller (ICU) to a non-vcpu thread or handler are also managed by the
> >> kernel and scheduler, and therefore do not necessarily involve uvmm.
> >
> >IPCs between threads are handled by the microkernel. vcpu-thread vs.
> >non-vcpu-thread only makes a difference regarding how the IPC is
> >delivered to the thread. For a non-vcpu thread the receiver has to wait
> >in IPC to get it; in vcpu mode the IPC is received by causing a vcpu
> >event and bringing the vcpu to its entry. This also works without
> >virtualization (note that vcpus also work without hw-virtualization).
> >For interrupts it is the same. Non-vcpu threads have to block
> >in IPC to get an interrupt, whereas vcpu threads will be brought to
> >their entry.
> >
> >> Second, passthrough interrupts, when not delivered in direct-injection
> >> mode, are routed to uvmm for handling if they are bound to a vCPU.
> >> Likewise, services provided by uvmm (such as virq) are also bound to a
> >> vCPU and therefore require forwarding through uvmm.
> >
> >Yes. Direct injection will only happen when the vcpu is running.
> >
> >> There seems to have been a similar question in the past, but it does not
> >> seem to have been resolved.
> >>
> >> Re: Assertion failure error in kernel vgic interrupt processing -
> >> l4-hackers - OS Site
> >>
> >> I wonder if my questions are related to that post, and if any solutions
> >> exist.
> >
> >Thanks, we need to work on it. Reproducing this situation on our side
> >would be very valuable.
> >
> >
> >Thanks, Adam
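
For reference, the effect of the divisor change in the quoted timer patch can be checked with a small stand-alone calculation. This is only a sketch and not kernel code: the function names are made up for illustration, and both the counter frequency (62.5 MHz) and a Scheduler_granularity of 1000 are assumptions chosen to match the "roughly one trigger per millisecond" figure mentioned in the thread. The actual values depend on the platform and configuration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the arithmetic of the quoted patch:
//   interval = freq * granularity / divisor   (integer division)
// The result is the number of counter ticks between two timer interrupts.
inline std::uint64_t timer_interval(std::uint64_t freq_hz,
                                    std::uint64_t granularity,
                                    std::uint64_t divisor)
{
  assert(divisor != 0);
  return freq_hz * granularity / divisor;
}

// Hypothetical helper: approximate interrupt rate for a given interval.
// Returns 0 if the interval was truncated to 0 ticks.
inline std::uint64_t interrupts_per_second(std::uint64_t freq_hz,
                                           std::uint64_t interval_ticks)
{
  return interval_ticks ? freq_hz / interval_ticks : 0;
}
```

Under these assumptions, timer_interval(62500000, 1000, 1000000) gives 62500 ticks (about 1000 interrupts per second), while the patched divisor 1000000000 gives 62 ticks, i.e. roughly one interrupt per microsecond. Note that with a low counter frequency the integer division could truncate the interval to very few ticks or even 0.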

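The two windows described in the original report can also be written down as a toy model. This is not Fiasco code and does not confirm the hypothesis; it only restates the reporter's description in executable form. All names (Toy_vcpu, Toy_vcpu_user, Irq_point, vgic_upcall_would_pass) are hypothetical, and the flag value is arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Thread_vcpu_user state bit.
enum : std::uint32_t { Toy_vcpu_user = 1u << 0 };

struct Toy_vcpu
{
  std::uint32_t state = 0;

  // Stands in for a completed vcpu_resume: the flag is set.
  void enter_guest() { state |= Toy_vcpu_user; }

  // Stands in for returning to the kernel or the end of async IPC
  // handling: the flag is cleared.
  void leave_guest() { state &= ~Toy_vcpu_user; }

  // The condition checked by the assert in the reported upcall.
  bool upcall_precondition_holds() const
  { return (state & Toy_vcpu_user) != 0; }
};

// Points in time at which a VGIC interrupt might arrive in this model.
enum class Irq_point
{
  Before_resume_completes, // window 1 in the report
  While_guest_runs,        // expected case
  After_flag_cleared       // window 2 in the report
};

// Returns whether the modelled assertion would hold for an interrupt
// arriving at the given point.
inline bool vgic_upcall_would_pass(Irq_point when)
{
  Toy_vcpu v;
  assert(v.state == 0); // a fresh toy vCPU starts without the flag

  switch (when)
    {
    case Irq_point::Before_resume_completes:
      break; // flag not yet set
    case Irq_point::While_guest_runs:
      v.enter_guest();
      break;
    case Irq_point::After_flag_cleared:
      v.enter_guest();
      v.leave_guest();
      break;
    }
  return v.upcall_precondition_holds();
}
```

In this model only Irq_point::While_guest_runs satisfies the precondition; the other two cases correspond to the windows in which the report suspects the assertion to fire.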