Hi Stephen,

While we do not have this particular Rockchip SoC right now, we do have
A76 + A55 cores here. Thanks!


BR,
Adam

On Mon Oct 13, 2025 at 10:28:14 +0800, yy18513676366 wrote:
> Hi Adam,
> 
> We’re using the Radxa SiRider S1 development board, which is based on the 
> Rockchip RK3588 SoC.
> You can find more details here: https://arace.tech/products/radxa-sirider-s1.
> The CPU consists of 4 Cortex-A76 cores and 2 Cortex-A55 cores.
> 
> Best,
> Stephen.yang
> 
> 
> 
> 
> 
> At 2025-10-06 06:00:44, "Adam Lackorzynski" <[email protected]> wrote:
> >Hi Stephen,
> >
> >I doubt this is hardware; it seldom is. Would you be able to share
> >which Arm core it is?
> >I'll try to reproduce here; your observation that increasing the timer
> >frequency makes the problem easier to reproduce is a good hint, and
> >knowing which core, or at least which category of core, would be helpful.
> >
> >
> >BR, Adam
> >
> >On Thu Oct 02, 2025 at 13:02:50 +0800, yy18513676366 wrote:
> >> Hi Adam,
> >> 
> >> 
> >> I truly appreciate your reply. 
> >> I actually encountered this issue on real hardware rather than QEMU. 
> >> May I ask if this problem could be related to the hardware itself? I’m not 
> >> quite sure I fully understand.
> >> 
> >> 
> >> Best regards
> >> Stephen.yang
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> At 2025-09-30 15:50:22, "Adam Lackorzynski" <[email protected]> wrote:
> >> >Hi Stephen,
> >> >
> >> >ok, thanks, that's tricky indeed.
> >> >
> >> >In case you are doing this with QEMU, could you please make sure you
> >> >have the following change applied to your QEMU:
> >> >https://lists.gnu.org/archive/html/qemu-devel/2024-09/msg02207.html
> >> >
> >> >Or do you see this on hardware?
> >> >
> >> >
> >> >Thanks,
> >> >Adam
> >> >
> >> >On Tue Sep 30, 2025 at 11:14:57 +0800, yy18513676366 wrote:
> >> >> Hi Adam,
> >> >> 
> >> >> 
> >> >> Thank you very much for your reply — it really gave me some hope.
> >> >> 
> >> >> 
> >> >> This issue is indeed difficult to reproduce reliably, which has been 
> >> >> one of the main challenges during my debugging.
> >> >> So far, I have found that increasing the vtimer interrupt frequency, 
> >> >> while keeping the traditional handling mode (i.e., without direct 
> >> >> injection),
> >> >> makes the problem significantly easier to reproduce. 
> >> >> 
> >> >> 
> >> >> The relevant changes are as follows:
> >> >> 1. In this setup, the vtimer is adjusted from roughly one trigger per
> >> >> millisecond to approximately one trigger per microsecond, and the
> >> >> system remains stable and functional:
> >> >> 
> >> >> 
> >> >> diff --git a/src/kern/arm/timer-arm-generic.cpp b/src/kern/arm/timer-arm-generic.cpp
> >> >> index a040cf46..b4cbbceb 100644
> >> >> --- a/src/kern/arm/timer-arm-generic.cpp
> >> >> +++ b/src/kern/arm/timer-arm-generic.cpp
> >> >> @@ -64,7 +64,8 @@ void Timer::init(Cpu_number cpu)
> >> >>    if (cpu == Cpu_number::boot_cpu())
> >> >>      {
> >> >>        _freq0 = frequency();
> >> >> -      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
> >> >> +      //_interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
> >> >> +      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000000;
> >> >>        printf("ARM generic timer: freq=%ld interval=%ld cnt=%lld\n",
> >> >>               _freq0, _interval, Gtimer::counter());
> >> >>        assert(_freq0);
> >> >> 
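> >> >> To make the magnitude of this change concrete, here is a minimal
> >> >> standalone sketch of the interval arithmetic. The 24 MHz counter
> >> >> frequency and the Scheduler_granularity value of 1000 are assumptions
> >> >> for illustration, not values read off the board:
> >> >> 
> >> >> // Minimal sketch: interval arithmetic before and after the patch.
> >> >> // Assumed values (hypothetical): CNTFRQ = 24 MHz,
> >> >> // Config::Scheduler_granularity = 1000 (microseconds).
> >> >> #include <cstdio>
> >> >> #include <cstdint>
> >> >> 
> >> >> int main()
> >> >> {
> >> >>   uint64_t freq0 = 24000000;  // assumed generic-timer frequency (Hz)
> >> >>   uint64_t gran  = 1000;      // assumed scheduler granularity (us)
> >> >> 
> >> >>   // Original: counter ticks per scheduling period, ~1 ms here.
> >> >>   uint64_t interval_orig = freq0 * gran / 1000000;     // = 24000 ticks
> >> >>   // Patched: divisor 1000x larger, so the timer fires 1000x as often.
> >> >>   uint64_t interval_new  = freq0 * gran / 1000000000;  // = 24 ticks (~1 us)
> >> >> 
> >> >>   printf("orig=%llu ticks, patched=%llu ticks\n",
> >> >>          (unsigned long long)interval_orig,
> >> >>          (unsigned long long)interval_new);
> >> >>   return 0;
> >> >> }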
> >> >> 
> >> >> 2. In addition, I selected the mode in which interrupts are not
> >> >> directly injected:
> >> >> diff --git a/src/Kconfig b/src/Kconfig
> >> >> index 4391c996..55deeb1c 100644
> >> >> --- a/src/Kconfig
> >> >> +++ b/src/Kconfig
> >> >> @@ -367,7 +367,7 @@ config IOMMU
> >> >>  config IRQ_DIRECT_INJECT
> >> >>         bool "Support direct interrupt forwarding to guests"
> >> >>         depends on CPU_VIRT && HAS_IRQ_DIRECT_INJECT_OPTION
> >> >> -       default y
> >> >> +       default n
> >> >>         help
> >> >>           Adds support in the kernel to allow the VMM to let Fiasco directly
> >> >>           forward hardware interrupts to a guest. This enables just the
> >> >> 
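> >> >> (A hedged aside: assuming the standard Fiasco build layout, one way to
> >> >> double-check that the option took effect is to verify that
> >> >> CONFIG_IRQ_DIRECT_INJECT is not set in the build directory's
> >> >> globalconfig.out.)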
> >> >> At the moment, this is the only way I have found that can noticeably 
> >> >> increase the reproduction rate.
> >> >> Once again, thank you for your valuable time and feedback!
> >> >> 
> >> >> Best regards,
> >> >> Stephen.yang
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> At 2025-09-29 00:11:41, "Adam Lackorzynski" <[email protected]> wrote:
> >> >> >Hi,
> >> >> >
> >> >> >On Wed Sep 17, 2025 at 13:57:43 +0800, yy18513676366 wrote:
> >> >> >> When running a virtual machine, I encounter an assertion failure
> >> >> >> after the VM has been up for some time. The kernel crashes in
> >> >> >> src/kern/arm/thread-arm-hyp.cpp, specifically in the function
> >> >> >> vcpu_vgic_upcall(unsigned virq):
> >> >> >> 
> >> >> >> vcpu_vgic_upcall(unsigned virq)
> >> >> >> {
> >> >> >>    ......
> >> >> >>    assert(state() & Thread_vcpu_user);
> >> >> >>    ......
> >> >> >> }
> >> >> >> 
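> >> >> >> As a debugging aid, one could log the raw state before the assertion
> >> >> >> trips. The following is an illustrative sketch only; the enclosing
> >> >> >> class and the exact signature are assumed from context, not copied
> >> >> >> from the source:
> >> >> >> 
> >> >> >> // Illustrative instrumentation sketch, not upstream code: report the
> >> >> >> // actual state bits when the expectation is violated, so a failing
> >> >> >> // run shows what was set when the VGIC upcall arrived.
> >> >> >> void
> >> >> >> Thread::vcpu_vgic_upcall(unsigned virq)
> >> >> >> {
> >> >> >>   // ......
> >> >> >>   if (!(state() & Thread_vcpu_user))
> >> >> >>     printf("vgic upcall: virq=%u unexpected state=%lx\n",
> >> >> >>            virq, (unsigned long)state());
> >> >> >>   assert(state() & Thread_vcpu_user);
> >> >> >>   // ......
> >> >> >> }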
> >> >> >> Based on source code inspection and preliminary debugging, the 
> >> >> >> problem seems to
> >> >> >> be related to the management of the Thread_vcpu_user state.
> >> >> >> 
> >> >> >>   1  Under normal circumstances, the vcpu_resume path (transitioning
> >> >> >> from the kernel back to the guest OS) updates the vCPU state to
> >> >> >> include Thread_vcpu_user. However, if an interrupt is delivered
> >> >> >> during this transition while the receiving side is not yet ready,
> >> >> >> the vCPU frequently returns to the kernel (via vcpu_return_to_kernel)
> >> >> >> and subsequently processes the interrupt through guest_irq in
> >> >> >> vcpu_entries. In this situation, the expected update of
> >> >> >> Thread_vcpu_user may not yet have taken place, which seems to result
> >> >> >> in the assert being triggered when a VGIC interrupt is involved (see
> >> >> >> the interleaving sketch after point 2).
> >> >> >> 
> >> >> >>   2  A similar condition seems to occur in the vcpu_async_ipc path. 
> >> >> >> At the end
> >> >> >> of IPC handling, this function explicitly clears the 
> >> >> >> Thread_vcpu_user flag. If
> >> >> >> a VGIC interrupt is delivered during this phase, the absence of the 
> >> >> >> expected
> >> >> >> Thread_vcpu_user state seems to lead to the same assertion failure.
> >> >> >> 
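> >> >> >> To summarize the suspected race, here is an illustrative
> >> >> >> interleaving, written from my reading of the code rather than
> >> >> >> verified against the actual paths:
> >> >> >> 
> >> >> >> // Scenario 1: interrupt hits the vcpu_resume window.
> >> >> >> //
> >> >> >> //   vCPU thread                       interrupt path
> >> >> >> //   -----------                       --------------
> >> >> >> //   vcpu_resume() begins;
> >> >> >> //   Thread_vcpu_user not yet set
> >> >> >> //                                     IRQ arrives
> >> >> >> //                                     vcpu_return_to_kernel()
> >> >> >> //                                     guest_irq -> vcpu_vgic_upcall()
> >> >> >> //                                     assert(state() & Thread_vcpu_user)
> >> >> >> //                                     fails: flag not yet set
> >> >> >> //
> >> >> >> // Scenario 2 is analogous: vcpu_async_ipc() has just cleared
> >> >> >> // Thread_vcpu_user, and a VGIC interrupt arriving in that window
> >> >> >> // hits the same assertion.
> >> >> >> 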
> >> >> >> I would like to confirm whether the two points above are correct,
> >> >> >> and what steps I should take next to further debug this issue.
> >> >> >
> >> >> >Thanks for reporting. The description at least sounds reasonable to me.
> >> >> >
> >> >> >Do you have a good way of reliably reproducing this situation?
> >> >> >
> >> >> >> In addition, I have some assumptions I would like to confirm:
> >> >> >> 
> >> >> >> First, for IPC between non-vcpu threads, the L4 microkernel handles 
> >> >> >> message
> >> >> >> delivery and scheduling (wake/schedule) directly, without requiring 
> >> >> >> any
> >> >> >> forwarding through uvmm. Similarly, interrupts bound via the 
> >> >> >> interrupt
> >> >> >> controller (ICU) to a non-vcpu thread or handler are also managed by 
> >> >> >> the kernel
> >> >> >> and scheduler, and therefore do not necessarily involve uvmm.
> >> >> >
> >> >> >IPCs between threads are handled by the microkernel; whether the
> >> >> >receiver is a vcpu thread or a non-vcpu thread only makes a difference
> >> >> >in how the message is delivered. A non-vcpu thread has to wait in IPC
> >> >> >to receive it, whereas in vcpu mode the IPC is received by causing a
> >> >> >vcpu event and bringing the vcpu to its entry. This also works without
> >> >> >virtualization (note that vcpus also work without hw-virtualization).
> >> >> >It is the same for interrupts: non-vcpu threads have to block in IPC
> >> >> >to receive an interrupt, while vcpu threads are brought to their
> >> >> >entry. See the sketch below for the non-vcpu case.
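> >> >> >
> >> >> >A minimal sketch of a non-vcpu thread blocking in IPC for an
> >> >> >interrupt could look roughly like this. The binding step is omitted
> >> >> >because the call differs between L4Re versions (older trees use
> >> >> >l4_irq_attach(), newer ones l4_rcv_ep_bind_thread()); treat the
> >> >> >snippet as an illustration under those assumptions, not a verified
> >> >> >example:
> >> >> >
> >> >> >// Sketch: a non-vcpu thread receiving a hardware interrupt by
> >> >> >// blocking in IPC. irq_cap is assumed to be an IRQ capability
> >> >> >// already bound to this thread.
> >> >> >#include <l4/sys/irq.h>
> >> >> >#include <l4/sys/ipc.h>
> >> >> >#include <cstdio>
> >> >> >
> >> >> >void irq_wait_loop(l4_cap_idx_t irq_cap)
> >> >> >{
> >> >> >  for (;;)
> >> >> >    {
> >> >> >      // Block until the kernel delivers the interrupt to this thread.
> >> >> >      // A vcpu thread would instead be forced to its vcpu entry.
> >> >> >      l4_msgtag_t tag = l4_irq_receive(irq_cap, L4_IPC_NEVER);
> >> >> >      if (l4_ipc_error(tag, l4_utcb()))
> >> >> >        continue;
> >> >> >      printf("interrupt received\n");
> >> >> >    }
> >> >> >}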
> >> >> >
> >> >> >> Second, passthrough interrupts, when not delivered in 
> >> >> >> direct-injection mode,
> >> >> >> are routed to uvmm for handling if they are bound to a vCPU. 
> >> >> >> Likewise, services
> >> >> >> provided by uvmm (such as virq) are also bound to a vCPU and 
> >> >> >> therefore require
> >> >> >> forwarding through uvmm.
> >> >> >
> >> >> >Yes. Direct injection will only happen when the vcpu is running.
> >> >> >
> >> >> >> There seems to have been a similar question in the past, but it does 
> >> >> >> not seem
> >> >> >> to have been resolved.
> >> >> >> 
> >> >> >> "Re: Assertion failure error in kernel vgic interrupt processing"
> >> >> >> (l4-hackers mailing list)
> >> >> >> 
> >> >> >> I wonder if my questions are related to that post, and if any 
> >> >> >> solutions exist.
> >> >> >
> >> >> >Thanks, we need to work on it. Reproducing this situation on our side
> >> >> >would be very valuable.
> >> >> >
> >> >> >
> >> >> >Thanks, Adam
_______________________________________________
l4-hackers mailing list -- [email protected]
To unsubscribe send an email to [email protected]
