QEMU maintains disk boot order and geometry information in the lists "fw_boot_order" and "fw_lchs". For example, boot order is derived from each device's bootindex value.
During system reset, and only during system reset, QEMU updates the "bootorder" and "bios-geometry" entries in fw_cfg based on the contents of "fw_boot_order" and "fw_lchs". After the guest VM boots, the firmware (e.g., SeaBIOS) can read the boot order from fw_cfg and boot from the disk at the top of the list. The reset handler fw_cfg_machine_reset() is invoked either implicitly during instance creation or explicitly by the user via HMP/QMP. However, users may attach or detach disks while the VM is in the prelaunch state. Because there is no implicit reset when transitioning from prelaunch to running, the "bootorder" and "bios-geometry" data in fw_cfg can become stale. As a result, the firmware may be unable to locate the correct disk to boot from. Here is an example that demonstrates the bug. 1. Create a QEMU instance with a virtio-scsi HBA and keep it in the prelaunch state. Use SeaBIOS rather than UEFI. -device virtio-scsi-pci,id=scsi0,num_queues=4 \ -S \ 2. First, attach the boot disk, then attach the secondary disk. (qemu) drive_add 0 file=boot.qcow2,if=none,id=drive0 (qemu) device_add scsi-hd,drive=drive0,bus=scsi0.0,channel=0,scsi-id=0,lun=1,bootindex=1 (qemu) drive_add 0 file=secondary.qcow2,if=none,id=drive1 (qemu) device_add scsi-hd,drive=drive1,bus=scsi0.0,channel=0,scsi-id=0,lun=2,bootindex=-1 3. Start the VM from the prelaunch state. Because the "bootorder" and "bios-geometry" data in fw_cfg is stale, SeaBIOS attempts to boot from the secondary disk only once and then stops. As a result, the VM fails to boot. One possible workaround is to require QEMU users to explicitly issue a system_reset before starting a guest VM from the prelaunch state, if any disks have been attached or detached. Another option is to address the issue in SeaBIOS. Nowadays, SeaBIOS attempts to boot from only a single disk. We could enhance SeaBIOS to try multiple disks in order until boot succeeds. Another option is to update "bootorder" and "bios-geometry" everywhere disks are attached or detached. This may require identifying the relevant functions across multiple device types, such as SCSI, NVMe, virtio-blk, and IDE. This commit fixes the issue in QEMU by ensuring that "bootorder" and "bios-geometry" are always updated when QEMU transitions from the prelaunch state to running. According to runstate_transitions_def[], RUN_STATE_PRELAUNCH is allowed to transition to four states. Only the transition to RUN_STATE_RUNNING requires updating "bootorder" and "bios-geometry". Co-developed-by: Joe Jin <[email protected]> Signed-off-by: Dongli Zhang <[email protected]> --- v1 -> v2: - Add new runstate tranisition notifier to track transition from prelaunch to running hw/nvram/fw_cfg.c | 31 +++++++++++++++++++++++++++++-- include/hw/nvram/fw_cfg.h | 1 + 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c index 1d7d835421..5154d77028 100644 --- a/hw/nvram/fw_cfg.c +++ b/hw/nvram/fw_cfg.c @@ -27,6 +27,7 @@ #include "system/system.h" #include "system/dma.h" #include "system/reset.h" +#include "system/runstate.h" #include "system/address-spaces.h" #include "hw/core/boards.h" #include "hw/nvram/fw_cfg.h" @@ -965,9 +966,8 @@ bool fw_cfg_add_file_from_generator(FWCfgState *s, return true; } -static void fw_cfg_machine_reset(void *opaque) +static void __fw_cfg_machine_reset(FWCfgState *s) { - FWCfgState *s = opaque; void *ptr; size_t len; char *buf; @@ -981,10 +981,37 @@ static void fw_cfg_machine_reset(void *opaque) g_free(ptr); } +static void fw_cfg_machine_reset(void *opaque) +{ + FWCfgState *s = opaque; + + __fw_cfg_machine_reset(s); +} + +static void fw_cfg_runstate_transition(Notifier *notifier, void *data) +{ + FWCfgState *s = container_of(notifier, FWCfgState, + runstate_transition); + const VMRunStateTransition *transition = data; + + /* + * According to runstate_transitions_def[], RUN_STATE_PRELAUNCH is + * allowed to transition to four states. Only the transition to + * RUN_STATE_RUNNING requires updating "bootorder" and "bios-geometry". + */ + if (transition->old_state == RUN_STATE_PRELAUNCH && + transition->new_state == RUN_STATE_RUNNING) { + __fw_cfg_machine_reset(s); + } +} + static void fw_cfg_machine_ready(struct Notifier *n, void *data) { FWCfgState *s = container_of(n, FWCfgState, machine_ready); + qemu_register_reset(fw_cfg_machine_reset, s); + s->runstate_transition.notify = fw_cfg_runstate_transition; + qemu_add_runstate_transition_notifier(&s->runstate_transition); } static const Property fw_cfg_properties[] = { diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h index 56f17a0bdc..81ac940119 100644 --- a/include/hw/nvram/fw_cfg.h +++ b/include/hw/nvram/fw_cfg.h @@ -66,6 +66,7 @@ struct FWCfgState { uint16_t cur_entry; uint32_t cur_offset; Notifier machine_ready; + Notifier runstate_transition; bool dma_enabled; dma_addr_t dma_addr; -- 2.39.3
