Hi Adam,

Thank you for your reply.
I actually encountered this issue on real hardware rather than in QEMU.
May I ask whether this problem could be related to the hardware itself? I am
not quite sure I fully understand.


Best regards,
Stephen.yang

At 2025-09-30 15:50:22, "Adam Lackorzynski" <[email protected]> wrote:
>Hi Stephen,
>
>ok, thanks, that's tricky indeed.
>
>In case you are doing this with QEMU, could you please make sure you
>have the following change in your QEMU?: 
>https://lists.gnu.org/archive/html/qemu-devel/2024-09/msg02207.html
>
>Or do you see this on hardware?
>
>
>Thanks,
>Adam
>
>On Tue Sep 30, 2025 at 11:14:57 +0800, yy18513676366 wrote:
>> Hi Adam,
>> 
>> 
>> Thank you very much for your reply — it really gave me some hope.
>> 
>> 
>> This issue is indeed difficult to reproduce reliably, which has been one of 
>> the main challenges during my debugging.
>> So far, I have found that increasing the vtimer interrupt frequency, while 
>> keeping the traditional handling mode (i.e., without direct injection),
>> makes the problem significantly easier to reproduce. 
>> 
>> 
>> The relevant changes are as follows.
>> 1. The timer interval in Fiasco is shortened from roughly one interrupt per
>> millisecond to approximately one per microsecond, and the system remains
>> stable and functional (see the arithmetic sketch after the diff):
>> 
>> 
>> diff --git a/src/kern/arm/timer-arm-generic.cpp b/src/kern/arm/timer-arm-generic.cpp
>> index a040cf46..b4cbbceb 100644
>> --- a/src/kern/arm/timer-arm-generic.cpp
>> +++ b/src/kern/arm/timer-arm-generic.cpp
>> @@ -64,7 +64,8 @@ void Timer::init(Cpu_number cpu)
>>    if (cpu == Cpu_number::boot_cpu())
>>      {
>>        _freq0 = frequency();
>> -      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
>> +      //_interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000;
>> +      _interval = Unsigned64{_freq0} * Config::Scheduler_granularity / 1000000000;
>>        printf("ARM generic timer: freq=%ld interval=%ld cnt=%lld\n",
>>               _freq0, _interval, Gtimer::counter());
>>        assert(_freq0);
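>> 
>> For context, a quick sketch of the interval arithmetic above (the timer
>> frequency is an assumed value for illustration only; the real one is read
>> from the hardware, and Config::Scheduler_granularity is the configured
>> scheduling granularity in microseconds):
>> 
>> #include <cstdint>
>> #include <cstdio>
>> 
>> int main()
>> {
>>   // Assumed values for illustration only.
>>   uint64_t freq = 62500000;    // hypothetical 62.5 MHz generic timer
>>   uint64_t granularity = 1000; // Scheduler_granularity in microseconds
>> 
>>   // Original divisor: the timer fires about once per millisecond.
>>   uint64_t interval_old = freq * granularity / 1000000;    // 62500 ticks
>>   // New divisor: the interval shrinks 1000x, to about once per microsecond.
>>   uint64_t interval_new = freq * granularity / 1000000000; // 62 ticks
>> 
>>   printf("interval: %llu -> %llu counter ticks\n",
>>          (unsigned long long)interval_old,
>>          (unsigned long long)interval_new);
>>   return 0;
>> }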
>> 
>> 
>> 2. In addition, I selected the mode in which interrupts are not directly
>> injected:
>> diff --git a/src/Kconfig b/src/Kconfig
>> index 4391c996..55deeb1c 100644
>> --- a/src/Kconfig
>> +++ b/src/Kconfig
>> @@ -367,7 +367,7 @@ config IOMMU
>>  config IRQ_DIRECT_INJECT
>>         bool "Support direct interrupt forwarding to guests"
>>         depends on CPU_VIRT && HAS_IRQ_DIRECT_INJECT_OPTION
>> -       default y
>> +       default n
>>         help
>>           Adds support in the kernel to allow the VMM to let Fiasco directly
>>           forward hardware interrupts to a guest. This enables just the
>> 
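>> As a side note, Fiasco's Kconfig options surface as CONFIG_* preprocessor
>> defines, so (a hedged sketch, not a quote of the actual kernel source) the
>> affected code paths are gated roughly like this:
>> 
>> #include <cstdio>
>> 
>> int main()
>> {
>> #ifdef CONFIG_IRQ_DIRECT_INJECT
>>   puts("direct injection: hw interrupts are forwarded straight to the guest");
>> #else
>>   puts("traditional path: interrupts go to uvmm, which injects them via the vGIC");
>> #endif
>>   return 0;
>> }
>> 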
>> At the moment, this is the only way I have found that can noticeably 
>> increase the reproduction rate.
>> Once again, thank you for your valuable time and feedback!
>> 
>> Best regards,
>> Stephen.yang
>> 
>> At 2025-09-29 00:11:41, "Adam Lackorzynski" <[email protected]> wrote:
>> >Hi,
>> >
>> >On Wed Sep 17, 2025 at 13:57:43 +0800, yy18513676366 wrote:
>> >> When running a virtual machine, I encounter an assertion failure after
>> >> the VM has been up for some time. The kernel crashes in
>> >> src/kern/arm/thread-arm-hyp.cpp, specifically in the function
>> >> vcpu_vgic_upcall(unsigned virq):
>> >> 
>> >> vcpu_vgic_upcall(unsigned virq)
>> >> {
>> >>    ......
>> >>    assert(state() & Thread_vcpu_user);
>> >>    ......
>> >> }
>> >> 
>> >> Based on source code inspection and preliminary debugging, the problem
>> >> seems to be related to the management of the Thread_vcpu_user state.
>> >> 
>> >>   1. Under normal circumstances, the vcpu_resume path (transitioning
>> >> from the kernel back to the guest OS) updates the vCPU state to include
>> >> Thread_vcpu_user. However, if an interrupt is delivered during this
>> >> transition while the receiving side is not yet ready, the vCPU frequently
>> >> returns to the kernel (via vcpu_return_to_kernel) and subsequently
>> >> processes the interrupt through guest_irq in vcpu_entries. In this
>> >> situation, the expected update of Thread_vcpu_user may not yet have taken
>> >> place, which seems to result in the assertion being triggered when a VGIC
>> >> interrupt is involved.
>> >> 
>> >>   2. A similar condition seems to occur in the vcpu_async_ipc path. At
>> >> the end of IPC handling, this function explicitly clears the
>> >> Thread_vcpu_user flag. If a VGIC interrupt is delivered during this
>> >> phase, the absence of the expected Thread_vcpu_user state seems to lead
>> >> to the same assertion failure (a minimal model of this window is
>> >> sketched below).
>> >> 
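>> >> As a minimal self-contained model of the suspected window (hypothetical,
>> >> not the actual Fiasco code; it only illustrates the flag ordering):
>> >> 
>> >> #include <cassert>
>> >> #include <cstdio>
>> >> 
>> >> enum : unsigned { Thread_vcpu_user = 0x1 };
>> >> static unsigned thread_state = 0; // models state() of the vcpu thread
>> >> 
>> >> // Models the failing check in vcpu_vgic_upcall() (thread-arm-hyp.cpp).
>> >> static void vgic_upcall_model()
>> >> {
>> >>   assert(thread_state & Thread_vcpu_user);
>> >> }
>> >> 
>> >> int main()
>> >> {
>> >>   // Points 1 and 2 above: a VGIC interrupt arrives before vcpu_resume
>> >>   // has set the flag, or after vcpu_async_ipc has cleared it.
>> >>   // Uncommenting the next line aborts, modeling the crash:
>> >>   // vgic_upcall_model();
>> >> 
>> >>   thread_state |= Thread_vcpu_user; // resume path completes
>> >>   vgic_upcall_model();              // the assertion now holds
>> >>   puts("ok: flag set before upcall");
>> >>   return 0;
>> >> }
>> >> 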
>> >> I would like to confirm whether the two points above are correct, and
>> >> what steps I should take next to further debug this issue.
>> >
>> >Thanks for reporting. At least the description sounds reasonable to me.
>> >
>> >Do you have a good way of reliably reproducing this situation?
>> >
>> >> In addition, I have some assumptions I would like to confirm:
>> >> 
>> >> First, for IPC between non-vcpu threads, the L4 microkernel handles
>> >> message delivery and scheduling (wake/schedule) directly, without
>> >> requiring any forwarding through uvmm. Similarly, interrupts bound via
>> >> the interrupt controller (ICU) to a non-vcpu thread or handler are also
>> >> managed by the kernel and scheduler, and therefore do not necessarily
>> >> involve uvmm.
>> >
>> >IPCs between threads are handled by the microkernel. vcpu-thread vs.
>> >non-vcpu-thread only makes a difference in how the message is delivered
>> >to the thread. A non-vcpu thread has to wait in IPC to receive it; in
>> >vcpu mode the IPC is received by raising a vcpu event and bringing the
>> >vcpu to its entry point. This also works without virtualization (note
>> >that vcpus also work without hw-virtualization). Interrupts behave the
>> >same way: non-vcpu threads have to block in IPC to receive an interrupt,
>> >while vcpu threads are brought to their entry.
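>> >
>> >For illustration, a minimal sketch of the non-vcpu receive side using the
>> >L4Re C API (simplified; error handling and binding an IRQ to the thread
>> >are omitted):
>> >
>> >#include <l4/sys/ipc.h>
>> >#include <l4/sys/utcb.h>
>> >
>> >// A non-vcpu thread blocks in an open wait; IPC messages and interrupts
>> >// bound to this thread are delivered by waking it up here.
>> >void receive_loop()
>> >{
>> >  l4_umword_t label;
>> >  for (;;)
>> >    {
>> >      l4_msgtag_t tag = l4_ipc_wait(l4_utcb(), &label, L4_IPC_NEVER);
>> >      (void)tag; // dispatch on label: regular message or interrupt
>> >    }
>> >}
>> >
>> >// A vcpu thread, by contrast, is brought to its registered entry point
>> >// (entry_ip/entry_sp in the vcpu state) instead of returning from a wait.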
>> >
>> >> Second, passthrough interrupts, when not delivered in direct-injection
>> >> mode, are routed to uvmm for handling if they are bound to a vCPU.
>> >> Likewise, services provided by uvmm (such as virq) are also bound to a
>> >> vCPU and therefore require forwarding through uvmm.
>> >
>> >Yes. Direct injection will only happen when the vcpu is running.
>> >
>> >> There seems to have been a similar question in the past, but it does
>> >> not seem to have been resolved:
>> >> 
>> >> Re: Assertion failure error in kernel vgic interrupt processing - l4-hackers - OS Site
>> >> 
>> >> I wonder if my questions are related to that post, and if any solutions
>> >> exist.
>> >
>> >Thanks, we need to work on it. Reproducing this situation on our side
>> >would be very valuable.
>> >
>> >
>> >Thanks, Adam
_______________________________________________
l4-hackers mailing list -- [email protected]
To unsubscribe send an email to [email protected]
