Hi Stephen,

I doubt this is hardware; it seldom is. Could you share which Arm core
it is? I'll try to reproduce here; your observation that increasing the
timer frequency makes the problem easier to trigger is a good hint, and
knowing which core, or at least which category of core, would also help.


BR, Adam

On Thu Oct 02, 2025 at 13:02:50 +0800, yy18513676366 wrote:
> Hi Adam,
> 
> 
> I truly appreciate your reply. 
> I actually encountered this issue on real hardware rather than QEMU. 
> May I ask if this problem could be related to the hardware itself? I’m not 
> quite sure I fully understand.
> 
> 
> Best regards
> Stephen.yang
> 
> At 2025-09-30 15:50:22, "Adam Lackorzynski" <[email protected]> wrote:
> >Hi Stephen,
> >
> >ok, thanks, that's tricky indeed.
> >
> >In case you are doing this with QEMU, could you please make sure your
> >QEMU includes the following change:
> >https://lists.gnu.org/archive/html/qemu-devel/2024-09/msg02207.html
> >
> >Or do you see this on hardware?
> >
> >
> >Thanks,
> >Adam
> >
> >On Tue Sep 30, 2025 at 11:14:57 +0800, yy18513676366 wrote:
> >> Hi Adam,
> >> 
> >> 
> >> Thank you very much for your reply — it really gave me some hope.
> >> 
> >> 
> >> This issue is indeed difficult to reproduce reliably, which has been one 
> >> of the main challenges during my debugging.
> >> So far, I have found that increasing the vtimer interrupt frequency, while 
> >> keeping the traditional handling mode (i.e., without direct injection),
> >> makes the problem significantly easier to reproduce. 
> >> 
> >> 
> >> The relevant changes are as follows. 
> >> 1. In this setup, the vtimer is adjusted from roughly one trigger per
> >> millisecond to approximately one trigger per microsecond, and the
> >> system remains stable and functional:
> >> 
> >> 
> >> diff --git a/src/kern/arm/timer-arm-generic.cpp b/src/kern/arm/timer-arm-generic.cpp
> >> index a040cf46..b4cbbceb 100644
> >> --- a/src/kern/arm/timer-arm-generic.cpp
> >> +++ b/src/kern/arm/timer-arm-generic.cpp
> >> @@ -64,7 +64,8 @@ void Timer::init(Cpu_number cpu)
> >>    if (cpu == Cpu_number::boot_cpu())
> >>      {
> >>        _freq0 = frequency();
> >> -      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
> >> +      //_interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
> >> +      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000000;
> >>        printf("ARM generic timer: freq=%ld interval=%ld cnt=%lld\n",
> >>               _freq0, _interval, Gtimer::counter());
> >>        assert(_freq0);
> >> 
> >> 
> >> 2. In addition, I selected the mode where interrupts are not directly
> >> injected:
> >> 
> >> diff --git a/src/Kconfig b/src/Kconfig
> >> index 4391c996..55deeb1c 100644
> >> --- a/src/Kconfig
> >> +++ b/src/Kconfig
> >> @@ -367,7 +367,7 @@ config IOMMU
> >>  config IRQ_DIRECT_INJECT
> >>         bool "Support direct interrupt forwarding to guests"
> >>         depends on CPU_VIRT && HAS_IRQ_DIRECT_INJECT_OPTION
> >> -       default y
> >> +       default n
> >>         help
> >>           Adds support in the kernel to allow the VMM to let Fiasco directly
> >>           forward hardware interrupts to a guest. This enables just the
> >> 
> >> At the moment, this is the only way I have found that can noticeably 
> >> increase the reproduction rate.
> >> Once again, thank you for your valuable time and feedback!
> >> 
> >> Best regards,
> >> Stephen.yang
> >> 
> >> 
> >> 
> >> 
> >> At 2025-09-29 00:11:41, "Adam Lackorzynski" <[email protected]> wrote:
> >> >Hi,
> >> >
> >> >On Wed Sep 17, 2025 at 13:57:43 +0800, yy18513676366 wrote:
> >> >> When running a virtual machine, I encounter an assertion failure after
> >> >> the VM has been up for some time. The kernel crashes in
> >> >> src/kern/arm/thread-arm-hyp.cpp, specifically in the function
> >> >> vcpu_vgic_upcall(unsigned virq):
> >> >> 
> >> >> vcpu_vgic_upcall(unsigned virq)
> >> >> {
> >> >>    ......
> >> >>    assert(state() & Thread_vcpu_user);
> >> >>    ......
> >> >> }
> >> >> 
> >> >> Based on source code inspection and preliminary debugging, the problem 
> >> >> seems to
> >> >> be related to the management of the Thread_vcpu_user state.
> >> >> 
> >> >>   1  Under normal circumstances, the vcpu_resume path (transitioning
> >> >> from the kernel back to the guest OS) updates the vCPU state to include
> >> >> Thread_vcpu_user. However, if an interrupt is delivered during this
> >> >> transition while the receiving side is not yet ready, the vCPU
> >> >> frequently returns to the kernel (via vcpu_return_to_kernel) and
> >> >> subsequently processes the interrupt through guest_irq in vcpu_entries.
> >> >> In this situation, the expected update of Thread_vcpu_user may not yet
> >> >> have taken place, which seems to result in the assert being triggered
> >> >> when a VGIC interrupt is involved.
> >> >> 
> >> >>   2  A similar condition seems to occur in the vcpu_async_ipc path. At 
> >> >> the end
> >> >> of IPC handling, this function explicitly clears the Thread_vcpu_user 
> >> >> flag. If
> >> >> a VGIC interrupt is delivered during this phase, the absence of the 
> >> >> expected
> >> >> Thread_vcpu_user state seems to lead to the same assertion failure.
> >> >> 
> >> >> I would like to confirm if the two points above are correct, and what 
> >> >> steps I
> >> >> should take next to further debug this issue.
> >> >
> >> >Thanks for reporting. At least the description sounds reasonable to me.
> >> >
> >> >Do you have a good way of reliably reproducing this situation?
> >> >
> >> >> In addition, I have some assumptions I would like to confirm:
> >> >> 
> >> >> First, for IPC between non-vcpu threads, the L4 microkernel handles 
> >> >> message
> >> >> delivery and scheduling (wake/schedule) directly, without requiring any
> >> >> forwarding through uvmm. Similarly, interrupts bound via the interrupt
> >> >> controller (ICU) to a non-vcpu thread or handler are also managed by 
> >> >> the kernel
> >> >> and scheduler, and therefore do not necessarily involve uvmm.
> >> >
> >> >IPCs between threads are handled by the microkernel. The vcpu vs.
> >> >non-vcpu distinction only affects how the message is delivered to the
> >> >thread: a non-vcpu thread has to wait in IPC to receive it, while in
> >> >vcpu mode the IPC is received by causing a vcpu event and bringing the
> >> >vcpu to its entry. This also works without virtualization (note that
> >> >vcpus also work without hw-virtualization). Interrupts behave the same
> >> >way: non-vcpu threads have to block in IPC to receive an interrupt,
> >> >while vcpu threads are brought to their entry.
> >> >
> >> >> Second, passthrough interrupts, when not delivered in direct-injection 
> >> >> mode,
> >> >> are routed to uvmm for handling if they are bound to a vCPU. Likewise, 
> >> >> services
> >> >> provided by uvmm (such as virq) are also bound to a vCPU and therefore 
> >> >> require
> >> >> forwarding through uvmm.
> >> >
> >> >Yes. Direct injection will only happen when the vcpu is running.
> >> >
> >> >> There seems to have been a similar question in the past, but it does 
> >> >> not seem
> >> >> to have been resolved.
> >> >> 
> >> >> Re: Assertion failure error in kernel vgic interrupt processing - 
> >> >> l4-hackers -
> >> >> OS Site
> >> >> 
> >> >> I wonder if my questions are related to that post, and if any solutions 
> >> >> exist.
> >> >
> >> >Thanks, we need to work on it. Reproducing this situation on our side
> >> >would be very valuable.
> >> >
> >> >
> >> >Thanks, Adam
_______________________________________________
l4-hackers mailing list -- [email protected]
To unsubscribe send an email to [email protected]
