[Bug 1859384] Re: arm gic: gic_acknowledge_irq doesn't clear line level for other cores for 1-n level-sensitive interrupts and gic_clear_pending uses GIC_DIST_TEST_MODEL (even on v2 where it always re
** Summary changed:

- arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
+ arm gic: gic_acknowledge_irq doesn't clear line level for other cores for 1-n level-sensitive interrupts and gic_clear_pending uses GIC_DIST_TEST_MODEL (even on v2 where it always read 0 - "N-N")

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1859384

Title:
  arm gic: gic_acknowledge_irq doesn't clear line level for other cores for 1-n level-sensitive interrupts and gic_clear_pending uses GIC_DIST_TEST_MODEL (even on v2 where it always read 0 - "N-N")

Status in QEMU:
  New

Bug description:
  For a 1-N interrupt (any SPI on the GICv2), as mandated by the TRM, only one CPU can acknowledge the IRQ until it becomes inactive.

  The TRM also mandates that SGIs and PPIs follow the N-N model and that SPIs follow the 1-N model.

  However this is not currently the case with QEMU. I have locally (no minimal test case) seen e.g. uart interrupts being acknowledged twice before having been deactivated (expected: irqId on one CPU and 1023 on the other instead).

  I have narrowed the issue down to the following:

  1) arm_gic_common_reset resets all irq_state[id] fields to 0. This means all IRQ will use the N-N model, and if s->revision != REV_11MPCORE, then there's no way to set any interrupt to 1-N.

  If "fixed" locally with a hackjob, I still have the following trace:

  pl011_irq_state 534130.800 pid=2424 level=0x1
  gic_set_irq 2.900 pid=2424 irq=0x21 level=0x1 cpumask=0xff target=0xff
  gic_update_set_irq 3.300 pid=2424 cpu=0x0 name=irq level=0x1
  gic_update_set_irq 4.200 pid=2424 cpu=0x1 name=irq level=0x1
  gic_acknowledge_irq 539.400 pid=2424 s=cpu cpu=0x1 irq=0x21
  gic_update_set_irq 269.800 pid=2424 cpu=0x0 name=irq level=0x1
  gic_cpu_read 4.100 pid=2424 s=cpu cpu=0x1 addr=0xc val=0x21
  gic_acknowledge_irq 15.600 pid=2424 s=cpu cpu=0x0 irq=0x21
  gic_cpu_read 265.000 pid=2424 s=cpu cpu=0x0 addr=0xc val=0x21
  pl011_write 1594.700 pid=2424 addr=0x44 value=0x50
  pl011_irq_state 2.000 pid=2424 level=0x0
  gic_set_irq 1.300 pid=2424 irq=0x21 level=0x0 cpumask=0xff target=0xff
  pl011_write 30.700 pid=2424 addr=0x38 value=0x0
  pl011_irq_state 1.200 pid=2424 level=0x0
  gic_cpu_write 110.600 pid=2424 s=cpu cpu=0x0 addr=0x10 val=0x21
  gic_cpu_write 193.400 pid=2424 s=cpu cpu=0x0 addr=0x1000 val=0x21
  pl011_irq_state 1169.500 pid=2424 level=0x0

  This is because:

  2) gic_acknowledge_irq calls gic_clear_pending which uses GIC_DIST_CLEAR_PENDING but this usually has no effect on level-sensitive interrupts.

  With this often being a no-op (ie. assuming ispendr was not written to), any 1-n level-sensitive interrupt is still improperly pending on all the other cores.

  (Also, I don't really know how the qemu thread model works, there might be race conditions in the acknowledgment logic if gic_acknowledge_irq is called by multiple threads, too.)

  Option used:
  -nographic -machine virt,virtualization=on,accel=tcg,gic-version=2 -cpu cortex-a57 -smp 4 -m 1024 -kernel whatever.elf -d unimp,guest_errors -semihosting-config enable,target=native -chardev stdio,id=uart -serial chardev:uart -monitor none -trace gic_update_set_irq -trace gic_acknowledge_irq -trace pl011_irq_state -trace pl011_write -trace gic_cpu_read -trace gic_cpu_write -trace gic_set_irq

  Commit used: dc65a5bdc9fa543690a775b50d4ffbeb22c56d6d "Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200108' into staging"

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1859384/+subscriptions
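
Point 2) of the description can be illustrated with a small toy model in C. This is only a sketch under stated assumptions: the struct and function names (toy_irq, toy_is_pending, toy_clear_latch) are hypothetical and are not QEMU's macros or data structures. It assumes, as the report describes, that a level-sensitive interrupt counts as pending whenever its input line is asserted, independently of the latch that a write to ispendr would set.

```c
#include <stdbool.h>

/* Toy model of one interrupt's distributor state.
 * All names here are illustrative, not QEMU's. */
struct toy_irq {
    bool edge_trigger;     /* true: edge-triggered, false: level-sensitive */
    bool latched_pending;  /* set by an edge or by a write to ispendr */
    bool line_level;       /* current state of the input line */
};

/* Pending if the latch is set, or if the interrupt is level-sensitive
 * and its input line is still asserted. */
static bool toy_is_pending(const struct toy_irq *irq)
{
    return irq->latched_pending || (!irq->edge_trigger && irq->line_level);
}

/* Clear only the latch, which is what the report says the acknowledge
 * path ends up doing. */
static void toy_clear_latch(struct toy_irq *irq)
{
    irq->latched_pending = false;
}
```

With a level-sensitive interrupt whose line is high and whose latch was never set, calling toy_clear_latch changes nothing and toy_is_pending still returns true, which matches the second gic_acknowledge_irq seen on cpu=0x0 in the trace.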
[Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
Please find attached a test case reproducing this issue.

(this is a variant of https://github.com/rhdrjones/kvm-unit-tests/blob/master/arm/pl031.c but for multiple CPUs)

** Attachment added: "Test case (kvm-unit-tests)"
   https://bugs.launchpad.net/qemu/+bug/1859384/+attachment/5319887/+files/pl031_smp.c
[Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
Err, I meant 3.2 subitem 5 "Note" "In a multiprocessor implementation, the GIC handles(...)" too, sorry
[Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
For 2): since I've never written to ispendr, the level interrupt is still considered as pending on the other core because GIC_DIST_TEST_LEVEL(...) evaluates to true. I believe ack should clear the level on other cores for 1-n interrupts.

> For part (2), I think you're saying that we're missing the bit of functionality that in the arch spec ...

I do, apologies if what I wrote was confusing. "Implications of the 1-N model" provides clearer wording about that functionality.
[Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
> about describing the expected versus actual behaviour you see.

Expected behavior:
* core 0 (or 1) reads irqId (irqId becomes active/active-pending)
* core 1 (or resp. 0) reads 1023
* core 0 handles and deactivates the interrupt

What I am getting instead:
* core 0 reads irqId
* core 1 also reads irqId
* core 0 handles the interrupt, later deactivates it
* core 1 attempts to handle the interrupt

In arm-gic.c, reads of GICC_IAR call gic_acknowledge_irq. gic_acknowledge_irq, in turn, calls gic_clear_pending (in gic_internal.h) which eventually evaluates GIC_DIST_TEST_MODEL, line 266
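
The expected behaviour listed in this message can be sketched as a tiny state machine in C. This is a hedged illustration only: toy_spi, toy_acknowledge and toy_deactivate are hypothetical names, the model collapses "active and pending" into "active", and it is not how QEMU implements the acknowledge path. The state is shared by all CPU interfaces, so whichever core reads second sees the interrupt already active.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SPURIOUS_IRQ 1023u  /* ID read when nothing can be acknowledged */

/* Shared distributor-side state of one SPI (illustrative names only). */
struct toy_spi {
    uint32_t id;
    bool pending;
    bool active;
};

/* Models a read of the acknowledge register under the 1-N model:
 * only the first reader gets the ID, later readers get 1023 until
 * the interrupt has been deactivated and becomes pending again. */
static uint32_t toy_acknowledge(struct toy_spi *spi)
{
    if (!spi->pending || spi->active) {
        return TOY_SPURIOUS_IRQ;
    }
    spi->pending = false;
    spi->active = true;
    return spi->id;
}

/* Models the deactivation done by the core that handled the interrupt. */
static void toy_deactivate(struct toy_spi *spi)
{
    spi->active = false;
}
```

For an SPI with id 0x21 that is pending, the first toy_acknowledge call returns 0x21 and a second call made before toy_deactivate returns 1023, which is the sequence the report expects from the two cores.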
[Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
> You might find there is enough in kvm-unit-tests GIC tests already to build a test case for what you are seeing.

Right, I will do so as soon as possible. For bug 1) however, a simpler test can be made:

_start:
    // x0 = GICD base (0x08000000 on the virt machine)
    mov x0, #0x08000000
    // Read icfgr[for irqid=32...]
    ldr w1, [x0, #(0xc00+32/4)]
    // Try to write to icfgr
    mov w1, #3
    str w1, [x0, #(0xc00+32/4)]
    // Read back
    ldr w1, [x0, #(0xc00+32/4)]
    b .

Running this code through the gdbstub, we can see that the model bits (bit (2*id) mod 32 of the icfgr word covering that id) are always 0, no matter what.

However, even for the GICv2, GIC_DIST_TEST_MODEL is being used in qemu source code, meaning all interrupts, including SPIs, are wrongly treated as N-N. The initialization function of the GIC should (at least for GICv2 devices) initialize these bits as 1 for all SPIs; this is currently not the case.
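
The register arithmetic behind the test above can be written out in C. This is a sketch, assuming the standard GIC distributor layout in which the icfgr block starts at offset 0xC00 and each 32-bit word holds a 2-bit field for 16 consecutive interrupt IDs; the helper and struct names are hypothetical.

```c
#include <stdint.h>

#define TOY_GICD_ICFGR_BASE 0xC00u  /* offset of the first icfgr word in the distributor */

/* Location of an interrupt's 2-bit configuration field (illustrative type). */
struct toy_icfgr_loc {
    uint32_t reg_offset;  /* byte offset of the 32-bit word from the GICD base */
    uint32_t bit;         /* position of the lower bit of the 2-bit field */
};

/* Each 32-bit icfgr word covers 16 interrupt IDs, two bits per ID. */
static struct toy_icfgr_loc toy_icfgr_locate(uint32_t irq)
{
    struct toy_icfgr_loc loc;
    loc.reg_offset = TOY_GICD_ICFGR_BASE + (irq / 16u) * 4u;
    loc.bit = (irq % 16u) * 2u;
    return loc;
}
```

For irq 32 this gives word offset 0xC08 and bit 0, the same offset the assembly computes as 0xc00+32/4. The two expressions only agree when the ID is a multiple of 16, so the helper is the safer form for other IDs.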
[Bug 1859384] Re: arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
** Summary changed:

- arm gicv2: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
+ arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq

** Tags removed: gicv2
** Tags added: gic
[Bug 1859384] [NEW] arm gic: interrupt model never 1 on non-mpcore and race condition in gic_acknowledge_irq
Public bug reported:

For a 1-N interrupt (any SPI on the GICv2), as mandated by the TRM, only one CPU can acknowledge the IRQ until it becomes inactive.

The TRM also mandates that SGIs and PPIs follow the N-N model and that SPIs follow the 1-N model.

However this is not currently the case with QEMU. I have locally (no minimal test case) seen e.g. uart interrupts being acknowledged twice before having been deactivated (expected: irqId on one CPU and 1023 on the other instead).

I have narrowed the issue down to the following:

1) arm_gic_common_reset resets all irq_state[id] fields to 0. This means all IRQ will use the N-N model, and if s->revision != REV_11MPCORE, then there's no way to set any interrupt to 1-N.

If "fixed" locally with a hackjob, I still have the following trace:

pl011_irq_state 534130.800 pid=2424 level=0x1
gic_set_irq 2.900 pid=2424 irq=0x21 level=0x1 cpumask=0xff target=0xff
gic_update_set_irq 3.300 pid=2424 cpu=0x0 name=irq level=0x1
gic_update_set_irq 4.200 pid=2424 cpu=0x1 name=irq level=0x1
gic_acknowledge_irq 539.400 pid=2424 s=cpu cpu=0x1 irq=0x21
gic_update_set_irq 269.800 pid=2424 cpu=0x0 name=irq level=0x1
gic_cpu_read 4.100 pid=2424 s=cpu cpu=0x1 addr=0xc val=0x21
gic_acknowledge_irq 15.600 pid=2424 s=cpu cpu=0x0 irq=0x21
gic_cpu_read 265.000 pid=2424 s=cpu cpu=0x0 addr=0xc val=0x21
pl011_write 1594.700 pid=2424 addr=0x44 value=0x50
pl011_irq_state 2.000 pid=2424 level=0x0
gic_set_irq 1.300 pid=2424 irq=0x21 level=0x0 cpumask=0xff target=0xff
pl011_write 30.700 pid=2424 addr=0x38 value=0x0
pl011_irq_state 1.200 pid=2424 level=0x0
gic_cpu_write 110.600 pid=2424 s=cpu cpu=0x0 addr=0x10 val=0x21
gic_cpu_write 193.400 pid=2424 s=cpu cpu=0x0 addr=0x1000 val=0x21
pl011_irq_state 1169.500 pid=2424 level=0x0

This is because:

2) gic_acknowledge_irq calls gic_clear_pending which uses GIC_DIST_CLEAR_PENDING but this usually has no effect on level-sensitive interrupts.

With this often being a no-op (ie. assuming ispendr was not written to), any 1-n level-sensitive interrupt is still improperly pending on all the other cores.

(Also, I don't really know how the qemu thread model works, there might be race conditions in the acknowledgment logic if gic_acknowledge_irq is called by multiple threads, too.)

Option used:
-nographic -machine virt,virtualization=on,accel=tcg,gic-version=2 -cpu cortex-a57 -smp 4 -m 1024 -kernel whatever.elf -d unimp,guest_errors -semihosting-config enable,target=native -chardev stdio,id=uart -serial chardev:uart -monitor none -trace gic_update_set_irq -trace gic_acknowledge_irq -trace pl011_irq_state -trace pl011_write -trace gic_cpu_read -trace gic_cpu_write -trace gic_set_irq

Commit used: dc65a5bdc9fa543690a775b50d4ffbeb22c56d6d "Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.0-20200108' into staging"

** Affects: qemu
     Importance: Undecided
         Status: New

** Tags: arm gic
[Bug 1859021] [NEW] qemu-system-aarch64 (tcg): cval + voff overflow not handled, causes qemu to hang
Public bug reported:

The Armv8 architecture reference manual states that for any timer set (e.g. CNTP* and CNTV*), the condition for such timer to generate an interrupt (if enabled & unmasked) is:

CVAL <= CNT(P/V)CT

Although this is arguably sloppy coding, I have seen code that therefore assumes it can set CVAL to a very high value (e.g. UINT64_MAX), leave the interrupt enabled in CTL, and never get the interrupt.

On the latest master commit as of the time of writing, there is an integer overflow in target/arm/helper.c gt_recalc_timer affecting the virtual timer when the interrupt is enabled in CTL:

    /* Next transition is when we hit cval */
    nexttick = gt->cval + offset;

When this overflow happens, I notice that qemu is no longer responsive and that I have to SIGKILL the process:

    - qemu takes nearly all the cpu time of the cores it is running on (e.g. 50% cpu usage if running on half the cores) and is completely unresponsive
    - no guest interrupt (reported via -d int) is generated

Here is the minimal code example to reproduce the issue:

    mov x0, #1
    msr cntvoff_el2, x0
    mov x0, #-1
    msr cntv_cval_el0, x0
    mov x0, #1
    msr cntv_ctl_el0, x0 // interrupt generation enabled, not masked; qemu will start to hang here

Options used:
-nographic -machine virt,virtualization=on,gic-version=2,accel=tcg -cpu cortex-a57 -smp 4 -m 1024 -kernel whatever.elf -d unimp,guest_errors,int -semihosting-config enable,target=native -serial mon:stdio

Version used: 4.2

** Affects: qemu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1859021

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1859021/+subscriptions
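
The wrap-around in cval + offset can be shown, together with one possible guard, in a few lines of C. This is a sketch of a saturating addition under the assumption that both operands are unsigned 64-bit values; toy_sat_add_u64 is a hypothetical helper and this is not necessarily how QEMU handles the case.

```c
#include <stdint.h>

/* Add two unsigned 64-bit values, clamping to UINT64_MAX instead of
 * wrapping around. Hypothetical helper, for illustration only. */
static uint64_t toy_sat_add_u64(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;  /* unsigned addition wraps modulo 2^64 */
    return sum < a ? UINT64_MAX : sum;
}
```

With the values from the reproducer (cval = UINT64_MAX, offset = 1), a plain addition wraps to 0, so the next transition appears to lie in the past, whereas the saturating form stays at UINT64_MAX.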