On 2026/02/20 19:51, Philippe Mathieu-Daudé wrote:
On 20/2/26 04:02, Chao Liu wrote:
Hi Peter,

On Wed, Feb 18, 2026 at 04:47:50PM +0000, Peter Maydell wrote:
On Tue, 17 Feb 2026 at 21:42, Philippe Mathieu-Daudé <[email protected]> wrote:

On 17/2/26 20:12, Yodel Eldar wrote:
+Philippe

Hi,

On 17/02/2026 03:21, Peter Maydell wrote:
On Tue, 17 Feb 2026 at 06:35, Akihiko Odaki
<[email protected]> wrote:

alpha_cpu_realizefn() did not properly call cpu_reset(), which
corrupted icount. Add the missing function call to fix icount.

Signed-off-by: Akihiko Odaki <[email protected]>
---

So, the real culprit was hiding in plain sight in Alpha-specific code
all along? Congrats on finding it!

   target/alpha/cpu.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index 1780db7d1e29..74281ebdb367 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -124,6 +124,7 @@ static void alpha_cpu_realizefn(DeviceState *dev, Error **errp)
       }

       qemu_init_vcpu(cs);
+    cpu_reset(cs);

       acc->parent_realize(dev, errp);
   }

Realize functions shouldn't call reset on themselves.
For CPU objects it is currently the responsibility of the
board code to arrange that the CPU objects get reset.

I think the following addresses Peter's remarks; it passed 100
repetitions of the Alpha replay test after reapplying the reverted
commit:

diff --git hw/alpha/dp264.c hw/alpha/dp264.c
index 5e64528431..091ffc0085 100644
--- hw/alpha/dp264.c
+++ hw/alpha/dp264.c
@@ -68,5 +68,7 @@ static void clipper_init(MachineState *machine)
       memset(cpus, 0, sizeof(cpus));
       for (i = 0; i < smp_cpus; ++i) {
-        cpus[i] = ALPHA_CPU(cpu_create(machine->cpu_type));
+        CPUState *cpu = cpu_create(machine->cpu_type);
+        cpu_reset(cpu);
+        cpus[i] = ALPHA_CPU(cpu);

Hmm this pattern is used a lot (creating CPUs in board_init without
manually calling cpu_reset). If this is the simplest fix, maybe
we could add a cpu_create_resetted() helper and use it where
appropriate (i.e. not where qemu_register_reset is then called).

Resetting the CPU either in realize or else in the machine
after create causes it to get reset once on startup. But
it doesn't do anything to cause it to be reset when the
user (or the guest) triggers a system reset. If the board
arranges for the CPUs to be reset during system reset then
that also works for the initial startup case.

Reset is unfortunately a bit of a mess: I have periodically
thought about it but still don't have an overview of what
exactly it ought to be doing, let alone a plan (e.g. should

I'm quite interested in the CPU reset flow, so I tried to analyze the
related code across all architectures and organized my findings below.


== Current CPU Reset Strategies ==

There are essentially five different patterns in use now:

   Strategy | Who does it        | Initial | System  | Example
   ---------+--------------------+---------+---------+------------------------
   A        | realize only       | Yes     | No (*)  | x86
            |                    |         |         |
   ---------+--------------------+---------+---------+------------------------
   B        | realize + board    | Yes x2  | Yes     | RISC-V, ARM, OpenRISC
            | (simple wrapper)   |         |         |
   ---------+--------------------+---------+---------+------------------------
   C        | board only         | Yes     | Yes     | PPC, MIPS, SPARC, M68K,
            | (custom logic)     |         |         | SH4, Xtensa, Microblaze
   ---------+--------------------+---------+---------+------------------------
   D        | nobody             | No      | No      | Alpha (bug)
            |                    |         |         |
   ---------+--------------------+---------+---------+------------------------
   E        | board (SoC child   | Yes     | Yes     | MIPS CPS
            | on qbus tree)      |         |         |
   ---------+--------------------+---------+---------+------------------------

     (*) x86 gets away with this because the reset vector is architecturally
         fixed at 0xFFFFFFF0 and does not need board-level PC setup.

The root cause of this inconsistency is that cpu_create() calls
qdev_realize(dev, NULL, ...) with a NULL bus, so CPUs are not part of
the qbus tree. The only thing registered into the root reset container
in qemu_machine_creation_done() is sysbus_get_default():

```
   /* hw/core/machine.c */
   qemu_register_resettable(OBJECT(sysbus_get_default()));
```

The comment right above it says it all:

   "Note that this will *not* reset any Device objects
    which are not attached to some part of the qbus tree."

This means every board must independently arrange for CPU reset, and
as Alpha demonstrates, it is easy to forget.
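
To make the consequence concrete, here is a minimal toy model in C. It is
only a sketch: every toy_* name below is hypothetical and none of it is
QEMU code. It assumes a "system reset" that visits only objects explicitly
registered with it, which is the behaviour described above for CPUs that
are not on the qbus tree.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are QEMU APIs. */
typedef struct ToyCPU {
    int reset_count;            /* how many times this CPU was reset */
} ToyCPU;

typedef void (*ToyResetFn)(void *opaque);

#define TOY_MAX_HANDLERS 8

static struct {
    ToyResetFn fn;
    void *opaque;
} toy_handlers[TOY_MAX_HANDLERS];
static size_t toy_nr_handlers;

/* Stand-in for board code registering a reset callback. */
static void toy_register_reset(ToyResetFn fn, void *opaque)
{
    assert(toy_nr_handlers < TOY_MAX_HANDLERS);
    toy_handlers[toy_nr_handlers].fn = fn;
    toy_handlers[toy_nr_handlers].opaque = opaque;
    toy_nr_handlers++;
}

/* Stand-in for a system reset: only registered objects are visited. */
static void toy_system_reset(void)
{
    for (size_t i = 0; i < toy_nr_handlers; i++) {
        toy_handlers[i].fn(toy_handlers[i].opaque);
    }
}

/* Reset callback for one toy CPU. */
static void toy_cpu_reset(void *opaque)
{
    ToyCPU *cpu = opaque;
    cpu->reset_count++;
}
```

In this model, a ToyCPU passed to toy_register_reset() has its counter
bumped on every toy_system_reset(), while a ToyCPU that the "board" forgot
to register is never touched, which mirrors the Alpha case.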


== Per-Architecture Details ==


--- Strategy A: realize-only reset (x86) ---

x86_cpu_realizefn() calls cpu_reset() at the end of realize:

```
   /* target/i386/cpu.c */
   cpu_reset(cs);
   xcc->parent_realize(dev, &local_err);
```

But the board code in hw/i386/x86-common.c:x86_cpus_init() does NOT
register any qemu_register_reset() callback for CPUs. The CPU is
created with qdev_realize(DEVICE(cpu), NULL, errp) and is not on any
bus.

x86 has its own x86_cpu_reset_hold() (target/i386/cpu.c) which
correctly resets all architectural state including segment registers,
CR0, etc. But since the CPU object is not in the reset container,
this method would not be called during qemu_system_reset() through
the resettable framework alone.

x86 "works" because:
   1. The reset vector is hardwired (no board PC setup needed)
   2. KVM may handle reset internally for KVM-accelerated guests
   3. The firmware (SeaBIOS/OVMF) handles warm reboot via the
      keyboard controller reset mechanism


--- Strategy B: realize + board simple wrapper (RISC-V, ARM, OpenRISC) ---

These targets call cpu_reset() in their realize function AND the board
registers a qemu_register_reset() callback that also calls cpu_reset().
This results in the CPU being reset twice at startup.

RISC-V example:

```
   /* target/riscv/cpu.c - in riscv_cpu_realize() */
   qemu_init_vcpu(cs);
   cpu_reset(cs);                    /* first reset */

   /* hw/riscv/riscv_hart.c - board callback */
   static void riscv_harts_cpu_reset(void *opaque)
   {
       RISCVCPU *cpu = opaque;
       cpu_reset(CPU(cpu)); /* second reset (at startup + system reset) */
   }

   /* hw/riscv/riscv_hart.c */
   qemu_register_reset(riscv_harts_cpu_reset, &s->harts[idx]);
   return qdev_realize(DEVICE(&s->harts[idx]), NULL, errp);
```

Note the ordering:

qemu_register_reset() is called BEFORE qdev_realize(). So the
sequence at startup is:
   1. qdev_realize() -> riscv_cpu_realize() -> cpu_reset() [first]
   2. qemu_machine_creation_done() -> qemu_system_reset()
      -> LegacyReset callback -> cpu_reset()               [second]

The board callback here is a pure wrapper with no extra logic, so the
double reset is harmless but wasteful (this can be confirmed with gdb).
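
The ordering can also be sketched as a small standalone C model. This is a
toy under stated assumptions, not QEMU code: the demo_* names are
hypothetical, and the "system reset" is reduced to a single call of the
registered callback after realize.

```c
#include <assert.h>

/* Toy model only: these are not QEMU types or functions. */
typedef struct DemoHart {
    int realized;
    int reset_count;
} DemoHart;

static void demo_hart_reset(DemoHart *h)
{
    h->reset_count++;
}

/* Strategy B realize: resets the hart as its last step. */
static void demo_hart_realize(DemoHart *h)
{
    h->realized = 1;
    demo_hart_reset(h);                 /* first reset */
}

/* Board callback: a pure wrapper around reset. */
static void demo_board_reset_cb(void *opaque)
{
    DemoHart *h = opaque;
    assert(h->realized);                /* system reset runs after realize */
    demo_hart_reset(h);
}

/*
 * Startup sequence: the callback is "registered" first, then the hart is
 * realized, then one system reset fires. Returns the number of resets.
 */
static int demo_startup(DemoHart *h)
{
    void (*registered_cb)(void *) = demo_board_reset_cb;

    demo_hart_realize(h);
    registered_cb(h);                   /* second reset */
    return h->reset_count;
}
```

Calling demo_startup() on a zero-initialized DemoHart returns 2, matching
the two resets in the sequence above.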

ARM is similar but the board callback has substantial custom logic
(see Strategy C below). The realize-time cpu_reset() at
target/arm/cpu.c is redundant with the one in do_cpu_reset().

OpenRISC (target/openrisc/cpu.c) also calls cpu_reset() in
realize, and boards like hw/openrisc/openrisc_sim.c register
qemu_register_reset() with a simple wrapper.


--- Strategy C: board-only with custom post-reset logic (PPC, MIPS...) ---

These boards register qemu_register_reset() callbacks that first call
cpu_reset() and then apply board-specific state (PC, registers, TLB).

MIPS Malta (hw/mips/malta.c):

```
   static void main_cpu_reset(void *opaque)
   {
       MIPSCPU *cpu = opaque;
       CPUMIPSState *env = &cpu->env;

       cpu_reset(CPU(cpu));

       /* Clear ERL bit when booting a kernel */
       if (loaderparams.kernel_filename) {
           env->CP0_Status &= ~(1 << CP0St_ERL);
       }
   }
```

PPC e500 (hw/ppc/e500.c) - distinguishes primary/secondary:

```
   static void ppce500_cpu_reset(void *opaque)       /* primary */
   {
       /* ... */
       cpu_reset(cs);
       cs->halted = 0;
       env->gpr[1] = (16 * MiB) - 8;
       env->gpr[3] = bi->dt_base;    /* device tree address */
       /* ... sets up TLB mappings ... */
   }

   static void ppce500_cpu_reset_sec(void *opaque)   /* secondary */
   {
       cpu_reset(cs);
       cs->exception_index = EXCP_HLT;  /* halt until kicked */
   }

   /* Registration distinguishes primary vs secondary */
   if (!i) {
       qemu_register_reset(ppce500_cpu_reset, cpu);
   } else {
       qemu_register_reset(ppce500_cpu_reset_sec, cpu);
   }
```

SPARC sun4m (hw/sparc/sun4m.c) and about 30 other board
files across PPC, M68K, SH4, Xtensa, Microblaze, and OpenRISC
follow the same general pattern with varying amounts of custom
post-reset logic.
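
Stripped of the target details, the shape of this pattern might look like
the following standalone C sketch. It is a toy, not code from any board:
the board_* names and the BOARD_ENTRY_PC value are hypothetical, and the
"core reset" only clears two fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not QEMU APIs. */
typedef struct BoardCPU {
    uint64_t pc;
    bool halted;
} BoardCPU;

typedef void (*BoardResetFn)(void *opaque);

#define BOARD_ENTRY_PC 0x1000u          /* hypothetical boot entry point */

/* Generic part of reset: return the CPU to its architectural defaults. */
static void board_cpu_core_reset(BoardCPU *cpu)
{
    cpu->pc = 0;
    cpu->halted = false;
}

/* Primary CPU: core reset first, then board-specific boot state. */
static void board_primary_reset(void *opaque)
{
    BoardCPU *cpu = opaque;
    board_cpu_core_reset(cpu);
    cpu->pc = BOARD_ENTRY_PC;
}

/* Secondary CPU: core reset first, then park it until it is kicked. */
static void board_secondary_reset(void *opaque)
{
    BoardCPU *cpu = opaque;
    board_cpu_core_reset(cpu);
    cpu->halted = true;
}

/* Choose the callback by CPU index, as the e500 registration does. */
static BoardResetFn board_pick_reset(int cpu_index)
{
    assert(cpu_index >= 0);
    return cpu_index == 0 ? board_primary_reset : board_secondary_reset;
}
```

The point of the sketch is the ordering inside each callback: the generic
reset always runs first, and the board-specific state is applied on top of
it on every reset, not just at startup.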

PS:

In this case, we can also talk about ARM.

ARM boot (hw/arm/boot.c) - the most complex case:

```
   static void do_cpu_reset(void *opaque)
   {
       ARMCPU *cpu = opaque;
       /* ... */
       cpu_reset(cs);
       if (info) {
           if (!info->is_linux) {
               /* Set endianness, jump to entry */
               cpu_set_pc(cs, info->entry);
           } else {
               /* Emulate firmware: set EL, configure SCTLR, ... */
               arm_emulate_firmware_reset(cs, target_el);
               if (cpu == info->primary_cpu) {
                   cpu_set_pc(cs, info->loader_start);
               } else if (info->secondary_cpu_reset_hook) {
                   info->secondary_cpu_reset_hook(cpu, info);
               }
           }
       }
   }
```

ARM boards don't call qemu_register_reset() directly. Instead,
arm_load_kernel() (hw/arm/boot.c) registers do_cpu_reset for
all CPUs in a loop:

```
   for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
       qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
   }
```


--- Strategy D: no reset at all (Alpha - BUG) ---

Alpha is the only architecture that neither calls cpu_reset() in
realize nor registers a qemu_register_reset() callback:

```
   /* target/alpha/cpu.c */
   static void alpha_cpu_realizefn(DeviceState *dev, Error **errp)
   {
       /* ... */
       cpu_exec_realizefn(cs, &local_err);
       /* ... */
       qemu_init_vcpu(cs);
       /* NO cpu_reset(cs) here! */
       acc->parent_realize(dev, errp);
   }

   /* hw/alpha/dp264.c */
   for (i = 0; i < smp_cpus; ++i) {
       cpus[i] = ALPHA_CPU(cpu_create(machine->cpu_type));
       /* NO qemu_register_reset() */
   }

   /* Later, board directly pokes env fields: */
   cpus[i]->env.pc = palcode_entry;
   cpus[i]->env.palbr = palcode_entry;
```

Alpha also has no target-specific reset_hold method -
alpha_cpu_class_init() does not set rc->phases.hold, so it relies
entirely on the parent cpu_common_reset_hold(). But since the CPU
is not in the reset container, even that never runs during system
reset.

This is the bug that corrupted icount, as Akihiko's patch identified.


--- Strategy E: SoC container on qbus tree (MIPS CPS) ---

Some SoC-level container devices create CPUs as child objects:

```
   /* hw/mips/cps.c */
   static void mips_cps_realize(DeviceState *dev, Error **errp)
   {
       for (i = 0; i < s->num_vp; i++) {
           cpu = MIPS_CPU(object_new(s->cpu_type));
           qdev_realize_and_unref(DEVICE(cpu), NULL, errp);
           qemu_register_reset(main_cpu_reset, s->cpus[i]);
       }
   }
```

Even here, the CPU itself is realized with a NULL bus, so it still
needs the explicit qemu_register_reset() call.


== The Core Problem ==

cpu_create() and most board code call qdev_realize(dev, NULL, ...)
for CPUs, placing them outside the qbus tree. The comment in
hw/core/cpu-common.c acknowledges this:

```
   /*
    * Reason: CPUs still need special care by board code: wiring up
    * IRQs, adding reset handlers, halting non-first CPUs, ...
    */
   dc->user_creatable = false;
```

The "adding reset handlers" part is exactly what is inconsistently
done across the codebase.


===

As the analysis above shows, the current CPU reset situation is indeed
quite messy -- multiple different strategies across the codebase, with at
least one outright bug (Alpha) and several cases of redundant double
resets.

This is a nice analysis.


I think Peter's ideas about rethinking the reset infrastructure
are well worth pursuing. Whether that means cascading reset via the QOM
tree, or a separate reset infrastructure that defaults to the QOM tree
but allows SoC-level overrides, having a single consistent mechanism
would eliminate this entire class of "forgot to register CPU reset"
bugs.

I don't think an infrastructure is necessary. Anyone who writes a new board or CPU will start by copying an existing example; if the existing boards reset CPUs properly, they are unlikely to forget that for the new ones. Alpha is just too old.

In my opinion, the problem here is that the existing ones do not agree on how to reset the CPU. This can lead to confusion and may create bugs in new and old boards and CPUs.


The tricky part, as Peter noted, is getting from here to there
without breaking things or starting yet another never-completed API
transition.

Great analysis!

For completeness you should also consider the hotplug path issues:

https://lore.kernel.org/qemu-devel/CAFEAcA- [email protected]/
https://lore.kernel.org/qemu-devel/20251001010127.3092631-1- [email protected]/

Based on that, a few months ago I started to audit the current
transitions with a view to an enforced state machine; from my notes:
https://tinyurl.com/3h3bv63k

With that in mind (state-machine enforced transitions) we really
should use the "call RESET within REALIZE" pattern, and fix this by
other means (probably a reset bus).

It would be more accurate to model it as a combination of several state machines, since there are several variables. For example, the diagram shows halted and stopped as mutually exclusive, but both halted and stopped can be true.

Likewise, if you only want to avoid the "call RESET within REALIZE" pattern, you can just add a bool variable. In the reset handler, you can check the variable, run the hotplug path, and then set the variable.
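
A minimal sketch of that idea in standalone C follows. It reflects one
reading of the suggestion (the hotplug-path work runs once, on the first
reset) and all names, including the first_reset_done flag, are
hypothetical rather than existing QEMU fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not QEMU APIs. */
typedef struct FlagCPU {
    bool first_reset_done;      /* hypothetical one-shot flag */
    int hotplug_setups;         /* times the hotplug path ran */
    int resets;                 /* times the ordinary reset ran */
} FlagCPU;

/* Stand-in for whatever the hotplug path needs to do exactly once. */
static void flag_cpu_hotplug_setup(FlagCPU *cpu)
{
    cpu->hotplug_setups++;
}

/* Reset handler: check the flag, run the hotplug path, set the flag. */
static void flag_cpu_reset(FlagCPU *cpu)
{
    assert(cpu != NULL);
    if (!cpu->first_reset_done) {
        flag_cpu_hotplug_setup(cpu);
        cpu->first_reset_done = true;
    }
    cpu->resets++;              /* ordinary reset work runs every time */
}
```

With this shape, realize never has to call reset itself: the first reset
delivered by the machine does the one-time work, and later resets skip it.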

However, having multiple variables makes it hard to understand the whole system, so interactions among them need to be analyzed. Perhaps we may find particular states of different variables are mutually exclusive; in such a case the variables can be merged into one variable (in C it will be an enum).
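
As an illustration of merging variables into an enum, here is a small
standalone C sketch. It assumes, purely for the example, two hypothetical
flags ("realized" and "reset done") where the second can only be true if
the first is; the three reachable combinations then collapse into one enum
with checked transitions. None of these names exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: replaces two dependent bools with one enum. */
typedef enum VcpuLife {
    VCPU_CREATED,       /* realized = false, reset_done = false */
    VCPU_REALIZED,      /* realized = true,  reset_done = false */
    VCPU_RESET_DONE,    /* realized = true,  reset_done = true  */
} VcpuLife;

/*
 * Move to the next state if the transition is allowed.
 * Returns true on success and leaves *state unchanged on failure.
 */
static bool vcpu_life_advance(VcpuLife *state, VcpuLife next)
{
    bool ok = false;

    assert(state != NULL);
    switch (*state) {
    case VCPU_CREATED:
        ok = (next == VCPU_REALIZED);
        break;
    case VCPU_REALIZED:
        ok = (next == VCPU_RESET_DONE);
        break;
    case VCPU_RESET_DONE:
        ok = (next == VCPU_RESET_DONE);  /* repeated resets are fine */
        break;
    }
    if (ok) {
        *state = next;
    }
    return ok;
}
```

The impossible combination (reset done but not realized) simply has no
enum value, so it cannot be represented, and an attempt to reset before
realize is rejected by the transition check.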

Regards,
Akihiko Odaki


Today I think we should add some powerOn/Off phases too, which would
likely resolve Salil's timing issues.

If there is interest in moving forward with refactoring this area,
I'd be happy to help contribute patches or review work. :)


Thanks,
Chao

we cascade reset via the QOM tree? or do we need a separate
reset infrastructure that defaults to the QOM tree but that
SoC objects can override if they have more complex reset
requirements? How does this interact with bus-reset (which is
definitely a thing for some buses)? And once we've decided what
we want, how do we get from where we are right now to there
without breaking things and ideally without having another
of the long-drawn-out never-completed API transitions we're so
good at?)

thanks
-- PMM



