Hi Peter,
On Wed, Feb 18, 2026 at 04:47:50PM +0000, Peter Maydell wrote:
> On Tue, 17 Feb 2026 at 21:42, Philippe Mathieu-Daudé <[email protected]>
> wrote:
> >
> > On 17/2/26 20:12, Yodel Eldar wrote:
> > > +Philippe
> > >
> > > Hi,
> > >
> > > On 17/02/2026 03:21, Peter Maydell wrote:
> > >> On Tue, 17 Feb 2026 at 06:35, Akihiko Odaki
> > >> <[email protected]> wrote:
> > >>>
> > >>> alpha_cpu_realizefn() did not properly call cpu_reset(), which
> > >>> corrupted icount. Add the missing function call to fix icount.
> > >>>
> > >>> Signed-off-by: Akihiko Odaki <[email protected]>
> > >>> ---
> > >
> > > So, the real culprit was hiding in plain sight in Alpha-specific code
> > > all along? Congrats on finding it!
> > >
> > >>> target/alpha/cpu.c | 1 +
> > >>> 1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
> > >>> index 1780db7d1e29..74281ebdb367 100644
> > >>> --- a/target/alpha/cpu.c
> > >>> +++ b/target/alpha/cpu.c
> > >>> @@ -124,6 +124,7 @@ static void alpha_cpu_realizefn(DeviceState *dev,
> > >>> Error **errp)
> > >>> }
> > >>>
> > >>> qemu_init_vcpu(cs);
> > >>> + cpu_reset(cs);
> > >>>
> > >>> acc->parent_realize(dev, errp);
> > >>> }
> > >>
> > >> Realize functions shouldn't call reset on themselves.
> > >> For CPU objects it is currently the responsibility of the
> > >> board code to arrange that the CPU objects get reset.
>
> > > I think the following addresses Peter's remarks; it passed 100
> > > repetitions of the Alpha replay test after reapplying the reverted
> > > commit:
> > >
> > > diff --git hw/alpha/dp264.c hw/alpha/dp264.c
> > > index 5e64528431..091ffc0085 100644
> > > --- hw/alpha/dp264.c
> > > +++ hw/alpha/dp264.c
> > > @@ -68,5 +68,7 @@ static void clipper_init(MachineState *machine)
> > > memset(cpus, 0, sizeof(cpus));
> > > for (i = 0; i < smp_cpus; ++i) {
> > > - cpus[i] = ALPHA_CPU(cpu_create(machine->cpu_type));
> > > + CPUState *cpu = cpu_create(machine->cpu_type);
> > > + cpu_reset(cpu);
> > > + cpus[i] = ALPHA_CPU(cpu);
> >
> > Hmm this pattern is used a lot (creating CPUs in board_init without
> > manually calling cpu_reset). If this is the simplest fix, maybe
> > we could add a cpu_create_resetted() helper and use it where
> > appropriate (i.e. not where qemu_register_reset is then called).
>
> Resetting the CPU either in realize or else in the machine
> after create causes it to get reset once on startup. But
> it doesn't do anything to cause it to be reset when the
> user (or the guest) triggers a system reset. If the board
> arranges for the CPUs to be reset during system reset then
> that also works for the initial startup case.
>
> Reset is unfortunately a bit of a mess: I have periodically
> thought about it but still don't have an overview of what
> exactly it ought to be doing, let alone a plan (e.g. should
I'm quite interested in the CPU reset flow, so I tried to analyze the
related code across all architectures and organized my findings below.
== Current CPU Reset Strategies ==
There are essentially five different patterns in use now:
Strategy | Who does it | Initial | System | Example
---------+--------------------+---------+---------+------------------------
A | realize only | Yes | No (*) | x86
| | | |
---------+--------------------+---------+---------+------------------------
B | realize + board | Yes x2 | Yes | RISC-V, ARM, OpenRISC
| (simple wrapper) | | |
---------+--------------------+---------+---------+------------------------
C | board only | Yes | Yes | PPC, MIPS, SPARC, M68K,
| (custom logic) | | | SH4, Xtensa, Microblaze
---------+--------------------+---------+---------+------------------------
D | nobody | No | No | Alpha (bug)
| | | |
---------+--------------------+---------+---------+------------------------
E | board (SoC child | Yes | Yes | MIPS CPS
| on qbus tree) | | |
---------+--------------------+---------+---------+------------------------
(*) x86 gets away with this because the reset vector is architecturally
fixed at 0xFFFFFFF0 and does not need board-level PC setup.
The root cause of this inconsistency is that cpu_create() calls
qdev_realize(dev, NULL, ...) with a NULL bus, so CPUs are not part of
the qbus tree. The only thing registered into the root reset container
in qemu_machine_creation_done() is sysbus_get_default():
```
/* hw/core/machine.c */
qemu_register_resettable(OBJECT(sysbus_get_default()));
```
The comment right above it says it all:
"Note that this will *not* reset any Device objects
which are not attached to some part of the qbus tree."
This means every board must independently arrange for CPU reset, and
as Alpha demonstrates, it is easy to forget.
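To make the point concrete, here is a toy model in plain C. It is NOT QEMU code: ToyDevice, ToyBus and the toy_* functions are made-up names, and the "bus" is just a linked list. It only illustrates why a device realized with a NULL bus is never visited when system reset walks the tree from the root bus.

```c
/* Toy model, NOT QEMU code: every name here is made up for illustration. */
#include <assert.h>
#include <stddef.h>

typedef struct ToyDevice {
    const char *name;
    int reset_count;
    struct ToyDevice *next_on_bus;  /* sibling link on the toy bus */
} ToyDevice;

typedef struct ToyBus {
    ToyDevice *first;
} ToyBus;

/* Attach a device to the bus; a NULL bus leaves it outside the tree. */
static void toy_realize(ToyDevice *dev, ToyBus *bus)
{
    if (bus) {
        dev->next_on_bus = bus->first;
        bus->first = dev;
    }
}

/* System reset only visits what hangs off the root bus. */
static void toy_system_reset(ToyBus *root)
{
    for (ToyDevice *d = root->first; d; d = d->next_on_bus) {
        d->reset_count++;
    }
}

/* Returns the CPU's reset count after one system reset. */
static int toy_unattached_cpu_resets(void)
{
    ToyBus root = { NULL };
    ToyDevice uart = { "uart", 0, NULL };
    ToyDevice cpu = { "cpu", 0, NULL };

    toy_realize(&uart, &root);  /* on the bus tree */
    toy_realize(&cpu, NULL);    /* NULL bus, as cpu_create() does */
    toy_system_reset(&root);

    assert(uart.reset_count == 1);
    return cpu.reset_count;
}
```

In this model the UART is reset once and the CPU not at all, which is the gap each board currently has to close by hand.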
== Per-Architecture Details ==
--- Strategy A: realize-only reset (x86) ---
x86_cpu_realizefn() calls cpu_reset() at the end of realize:
```
/* target/i386/cpu.c */
cpu_reset(cs);
xcc->parent_realize(dev, &local_err);
```
But the board code in hw/i386/x86-common.c:x86_cpus_init() does NOT
register any reset callback for CPUs via qemu_register_reset(). The
CPU is created with qdev_realize(DEVICE(cpu), NULL, errp) and is not
on any bus.
x86 has its own x86_cpu_reset_hold() (target/i386/cpu.c) which
correctly resets all architectural state including segment registers,
CR0, etc. But since the CPU object is not in the reset container,
this method would not be called during qemu_system_reset() through
the resettable framework alone.
x86 "works" because:
1. The reset vector is hardwired (no board PC setup needed)
2. KVM may handle reset internally for KVM-accelerated guests
3. The firmware (SeaBIOS/OVMF) handles warm reboot via the
keyboard controller reset mechanism
--- Strategy B: realize + board simple wrapper (RISC-V, ARM, OpenRISC) ---
These targets call cpu_reset() in their realize function AND the board
registers a qemu_register_reset() callback that also calls cpu_reset().
This results in the CPU being reset twice at startup.
RISC-V example:
```
/* target/riscv/cpu.c - in riscv_cpu_realize() */
qemu_init_vcpu(cs);
cpu_reset(cs); /* first reset */
/* hw/riscv/riscv_hart.c - board callback */
static void riscv_harts_cpu_reset(void *opaque)
{
RISCVCPU *cpu = opaque;
cpu_reset(CPU(cpu)); /* second reset (at startup + system reset) */
}
/* hw/riscv/riscv_hart.c */
qemu_register_reset(riscv_harts_cpu_reset, &s->harts[idx]);
return qdev_realize(DEVICE(&s->harts[idx]), NULL, errp);
```
Note the ordering: qemu_register_reset() is called BEFORE
qdev_realize(), so the sequence at startup is:
1. qdev_realize() -> riscv_cpu_realize() -> cpu_reset() [first]
2. qemu_machine_creation_done() -> qemu_system_reset()
-> LegacyReset callback -> cpu_reset() [second]
The board callback here is a pure wrapper with no extra logic, so the
double reset is harmless but wasteful (this can be confirmed with gdb).
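The startup sequence above can be sketched with a small stand-alone model. Again this is NOT QEMU code: ToyCPU, toy_register_reset() and friends are hypothetical stand-ins for the real functions, and the handler list is a fixed-size array for brevity.

```c
/* Toy model, NOT QEMU code: all names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_HANDLERS 8

typedef struct ToyCPU {
    int reset_count;
} ToyCPU;

typedef void ToyResetFn(void *opaque);

static struct {
    ToyResetFn *fn;
    void *opaque;
} toy_handlers[TOY_MAX_HANDLERS];
static int toy_nr_handlers;

static void toy_cpu_reset(ToyCPU *cpu)
{
    cpu->reset_count++;
}

static void toy_register_reset(ToyResetFn *fn, void *opaque)
{
    assert(toy_nr_handlers < TOY_MAX_HANDLERS);
    toy_handlers[toy_nr_handlers].fn = fn;
    toy_handlers[toy_nr_handlers].opaque = opaque;
    toy_nr_handlers++;
}

/* Run every registered handler, as a system reset would. */
static void toy_system_reset(void)
{
    for (int i = 0; i < toy_nr_handlers; i++) {
        toy_handlers[i].fn(toy_handlers[i].opaque);
    }
}

/* Realize resets the CPU itself, as in strategy B. */
static void toy_cpu_realize(ToyCPU *cpu)
{
    toy_cpu_reset(cpu);
}

/* Board callback: a pure wrapper with no extra logic. */
static void toy_board_cpu_reset(void *opaque)
{
    toy_cpu_reset(opaque);
}

/* Returns how many times the CPU was reset by the end of startup. */
static int toy_startup_reset_count(void)
{
    ToyCPU cpu = { 0 };

    toy_nr_handlers = 0;
    toy_register_reset(toy_board_cpu_reset, &cpu);  /* registered first */
    toy_cpu_realize(&cpu);                          /* first reset */
    toy_system_reset();                             /* second reset */
    return cpu.reset_count;
}
```

The model ends startup with a reset count of 2; dropping the reset from toy_cpu_realize() would leave exactly one, delivered by the board callback.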
ARM is similar, but its board callback has substantial custom logic
(see Strategy C below). The realize-time cpu_reset() in
target/arm/cpu.c is redundant with the one in do_cpu_reset().
OpenRISC (target/openrisc/cpu.c) also calls cpu_reset() in
realize, and boards like hw/openrisc/openrisc_sim.c register
qemu_register_reset() with a simple wrapper.
--- Strategy C: board-only with custom post-reset logic (PPC, MIPS...) ---
These boards register qemu_register_reset() callbacks that first call
cpu_reset() and then apply board-specific state (PC, registers, TLB).
MIPS Malta (hw/mips/malta.c):
```
static void main_cpu_reset(void *opaque)
{
MIPSCPU *cpu = opaque;
CPUMIPSState *env = &cpu->env;
cpu_reset(CPU(cpu));
/* Clear ERL bit when booting a kernel */
if (loaderparams.kernel_filename) {
env->CP0_Status &= ~(1 << CP0St_ERL);
}
}
```
PPC e500 (hw/ppc/e500.c) - distinguishes primary/secondary:
```
static void ppce500_cpu_reset(void *opaque) /* primary */
{
/* ... */
cpu_reset(cs);
cs->halted = 0;
env->gpr[1] = (16 * MiB) - 8;
env->gpr[3] = bi->dt_base; /* device tree address */
/* ... sets up TLB mappings ... */
}
static void ppce500_cpu_reset_sec(void *opaque) /* secondary */
{
cpu_reset(cs);
cs->exception_index = EXCP_HLT; /* halt until kicked */
}
/* Registration distinguishes primary vs secondary */
if (!i) {
qemu_register_reset(ppce500_cpu_reset, cpu);
} else {
qemu_register_reset(ppce500_cpu_reset_sec, cpu);
}
```
SPARC sun4m (hw/sparc/sun4m.c) and about 30 other board
files across PPC, M68K, SH4, Xtensa, Microblaze, and OpenRISC
follow the same general pattern with varying amounts of custom
post-reset logic.
ARM's board-side callback also fits this pattern. ARM boot
(hw/arm/boot.c) is the most complex case:
```
static void do_cpu_reset(void *opaque)
{
ARMCPU *cpu = opaque;
/* ... */
cpu_reset(cs);
if (info) {
if (!info->is_linux) {
/* Set endianness, jump to entry */
cpu_set_pc(cs, info->entry);
} else {
/* Emulate firmware: set EL, configure SCTLR, ... */
arm_emulate_firmware_reset(cs, target_el);
if (cpu == info->primary_cpu) {
cpu_set_pc(cs, info->loader_start);
} else if (info->secondary_cpu_reset_hook) {
info->secondary_cpu_reset_hook(cpu, info);
}
}
}
}
```
ARM boards don't call qemu_register_reset() directly. Instead,
arm_load_kernel() (hw/arm/boot.c) registers do_cpu_reset for
all CPUs in a loop:
```
for (cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
qemu_register_reset(do_cpu_reset, ARM_CPU(cs));
}
```
--- Strategy D: no reset at all (Alpha - BUG) ---
Alpha is the only architecture that neither calls cpu_reset() in
realize nor registers a qemu_register_reset() callback:
```
/* target/alpha/cpu.c */
static void alpha_cpu_realizefn(DeviceState *dev, Error **errp)
{
/* ... */
cpu_exec_realizefn(cs, &local_err);
/* ... */
qemu_init_vcpu(cs);
/* NO cpu_reset(cs) here! */
acc->parent_realize(dev, errp);
}
/* hw/alpha/dp264.c */
for (i = 0; i < smp_cpus; ++i) {
cpus[i] = ALPHA_CPU(cpu_create(machine->cpu_type));
/* NO qemu_register_reset() */
}
/* Later, board directly pokes env fields: */
cpus[i]->env.pc = palcode_entry;
cpus[i]->env.palbr = palcode_entry;
```
Alpha also has no target-specific reset_hold method -
alpha_cpu_class_init() does not set rc->phases.hold, so it relies
entirely on the parent cpu_common_reset_hold(). But since the CPU
is not in the reset container, even that never runs during system
reset.
This is the bug that corrupted icount, as Akihiko's patch identified.
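As a rough illustration of why a one-time poke of env fields is not a substitute for a reset handler, consider the toy model below. It is NOT QEMU code and does not model the icount corruption itself: ToyBoard, the "scratch" field and the entry address are all invented. It only contrasts a board with no reset handler against one using a strategy C style callback.

```c
/* Toy model, NOT QEMU code: all names and values are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PALCODE_ENTRY UINT64_C(0x2000)

typedef struct ToyCPU {
    uint64_t pc;
    uint64_t scratch;  /* stands in for state only a reset clears */
} ToyCPU;

typedef struct ToyBoard {
    ToyCPU cpu;
    bool has_reset_handler;
} ToyBoard;

static void toy_cpu_reset(ToyCPU *cpu)
{
    cpu->pc = 0;
    cpu->scratch = 0;
}

/* Strategy C style callback: reset first, then reapply board state. */
static void toy_board_cpu_reset(ToyBoard *board)
{
    toy_cpu_reset(&board->cpu);
    board->cpu.pc = TOY_PALCODE_ENTRY;
}

static void toy_board_init(ToyBoard *board, bool register_handler)
{
    board->cpu.pc = TOY_PALCODE_ENTRY;  /* one-time poke, like dp264.c */
    board->cpu.scratch = 0;
    board->has_reset_handler = register_handler;
}

static void toy_system_reset(ToyBoard *board)
{
    if (board->has_reset_handler) {
        toy_board_cpu_reset(board);
    }
}

/* Pretend the guest ran for a while and dirtied CPU state. */
static void toy_guest_runs(ToyBoard *board)
{
    board->cpu.pc += 0x100;
    board->cpu.scratch = 42;
}

/* True if the CPU is back at its boot state after a reboot. */
static bool toy_clean_after_reboot(bool register_handler)
{
    ToyBoard board;

    toy_board_init(&board, register_handler);
    toy_system_reset(&board);  /* startup */
    toy_guest_runs(&board);
    toy_system_reset(&board);  /* guest-triggered reboot */

    return board.cpu.pc == TOY_PALCODE_ENTRY && board.cpu.scratch == 0;
}
```

Without the handler the model's CPU keeps its dirty state across the reboot; with it, the CPU returns to the boot state every time.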
--- Strategy E: SoC container on qbus tree (MIPS CPS) ---
Some SoC-level container devices create CPUs as child objects:
```
/* hw/mips/cps.c */
static void mips_cps_realize(DeviceState *dev, Error **errp)
{
for (i = 0; i < s->num_vp; i++) {
cpu = MIPS_CPU(object_new(s->cpu_type));
qdev_realize_and_unref(DEVICE(cpu), NULL, errp);
qemu_register_reset(main_cpu_reset, s->cpus[i]);
}
}
```
Even here, the CPU itself is realized with a NULL bus, so it still
needs the explicit qemu_register_reset() call.
== The Core Problem ==
cpu_create() and most board code call qdev_realize(dev, NULL, ...)
for CPUs, placing them outside the qbus tree. The comment in
hw/core/cpu-common.c acknowledges this:
```
/*
* Reason: CPUs still need special care by board code: wiring up
* IRQs, adding reset handlers, halting non-first CPUs, ...
*/
dc->user_creatable = false;
```
The "adding reset handlers" part is exactly what is inconsistently
done across the codebase.
===
As the analysis above shows, the current CPU reset situation is indeed
quite messy -- five different strategies across the codebase, with at
least one outright bug (Alpha) and several cases of redundant double
resets.
I think Peter's ideas about rethinking the reset infrastructure
are well worth pursuing. Whether that means cascading reset via the QOM
tree, or a separate reset infrastructure that defaults to the QOM tree
but allows SoC-level overrides, having a single consistent mechanism
would eliminate this entire class of "forgot to register CPU reset"
bugs.
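Purely as a thought experiment, the "cascade via the tree" option might look like the toy model below. This is NOT a proposal for the actual API and NOT QEMU code: ToyObject and the toy_* functions are invented, and resetting children before the parent is an arbitrary choice made for the sketch.

```c
/* Toy model, NOT QEMU code: all names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

typedef struct ToyObject {
    const char *name;
    int reset_count;
    struct ToyObject *children[TOY_MAX_CHILDREN];
    int nr_children;
} ToyObject;

static void toy_add_child(ToyObject *parent, ToyObject *child)
{
    assert(parent->nr_children < TOY_MAX_CHILDREN);
    parent->children[parent->nr_children++] = child;
}

/* Reset the children first, then the object itself. */
static void toy_reset_tree(ToyObject *obj)
{
    for (int i = 0; i < obj->nr_children; i++) {
        toy_reset_tree(obj->children[i]);
    }
    obj->reset_count++;
}

/* Returns the CPU's reset count after startup plus one reboot. */
static int toy_cascaded_cpu_resets(void)
{
    ToyObject machine = { .name = "machine" };
    ToyObject soc = { .name = "soc" };
    ToyObject cpu = { .name = "cpu" };

    toy_add_child(&machine, &soc);
    toy_add_child(&soc, &cpu);

    toy_reset_tree(&machine);  /* startup */
    toy_reset_tree(&machine);  /* guest-triggered reboot */
    return cpu.reset_count;
}
```

Here the CPU is reset on both passes without any per-board registration, simply because it is a child of something that is reset. The model says nothing about the harder questions Peter raised, such as SoC overrides or bus reset.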
The tricky part, as Peter noted, is getting from here to there
without breaking things or starting yet another never-completed API
transition.
If there is interest in moving forward with refactoring this area,
I'd be happy to help contribute patches or review work. :)
Thanks,
Chao
> we cascade reset via the QOM tree? or do we need a separate
> reset infrastructure that defaults to the QOM tree but that
> SoC objects can override if they have more complex reset
> requirements? How does this interact with bus-reset (which is
> definitely a thing for some buses? And once we've decided what
> we want, how do we get from where we are right now to there
> without breaking things and ideally without having another
> of the long-drawn-out never-completed API transitions we're so
> good at?)
>
> thanks
> -- PMM
>