QEMU maintains disk boot order and geometry information in the lists
"fw_boot_order" and "fw_lchs". For example, boot order is derived from each
device's bootindex value.

During system reset, and only during system reset, QEMU updates the
"bootorder" and "bios-geometry" entries in fw_cfg based on the contents of
"fw_boot_order" and "fw_lchs". After the guest VM boots, the firmware
(e.g., SeaBIOS) can read the boot order from fw_cfg and boot from the disk
at the top of the list.

The reset handler fw_cfg_machine_reset() is invoked either implicitly
during instance creation or explicitly by the user via HMP/QMP.

However, users may attach or detach disks while the VM is in the prelaunch
state. Because there is no implicit reset when transitioning from prelaunch
to running, the "bootorder" and "bios-geometry" data in fw_cfg can become
stale. As a result, the firmware may be unable to locate the correct disk
to boot from.

Here is an example that demonstrates the bug.

1. Create a QEMU instance with a virtio-scsi HBA and keep it in the
prelaunch state. Use SeaBIOS rather than UEFI.

-device virtio-scsi-pci,id=scsi0,num_queues=4 \
-S \

2. First, attach the boot disk, then attach the secondary disk.

(qemu) drive_add 0 file=boot.qcow2,if=none,id=drive0
(qemu) device_add scsi-hd,drive=drive0,bus=scsi0.0,channel=0,scsi-id=0,lun=1,bootindex=1
(qemu) drive_add 0 file=secondary.qcow2,if=none,id=drive1
(qemu) device_add scsi-hd,drive=drive1,bus=scsi0.0,channel=0,scsi-id=0,lun=2,bootindex=-1

3. Start the VM from the prelaunch state. Because the "bootorder" and
"bios-geometry" data in fw_cfg is stale, SeaBIOS attempts to boot from the
secondary disk only once and then stops. As a result, the VM fails to boot.

One possible workaround is to require QEMU users to explicitly issue a
system_reset before starting a guest VM from the prelaunch state, if any
disks have been attached or detached.

Another option is to address the issue in SeaBIOS. Currently, SeaBIOS
attempts to boot from only a single disk. We could enhance SeaBIOS to try
multiple disks in order until boot succeeds.

Another option is to update "bootorder" and "bios-geometry" everywhere
disks are attached or detached. This may require identifying the relevant
functions across multiple device types, such as SCSI, NVMe, virtio-blk, and
IDE.

This commit fixes the issue in QEMU by ensuring that "bootorder" and
"bios-geometry" are always updated when QEMU transitions from the prelaunch
state to running.
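
The stale-snapshot problem and the fix can be illustrated with a small
standalone model. This is only a sketch and not QEMU code: the toy_* names,
the struct layout, and the newline-separated "bootorder" string are
assumptions made up for illustration. It shows a snapshot that is rebuilt
only on reset going stale after a hot-attach, and a reload on the
prelaunch-to-running transition bringing it up to date.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DISKS 8

/* Hypothetical stand-in for one attached disk and its bootindex. */
struct toy_disk {
    const char *name;
    int bootindex;              /* negative means "not bootable" */
};

/* Hypothetical stand-in for the live device list plus the fw_cfg snapshot. */
struct toy_machine {
    struct toy_disk disks[MAX_DISKS];
    int ndisks;
    char bootorder[128];        /* what the firmware would read */
    int running;
};

/* Rebuild the snapshot from the live list, lowest bootindex first. */
static void toy_reload(struct toy_machine *m)
{
    int used[MAX_DISKS] = {0};
    size_t off = 0;

    m->bootorder[0] = '\0';
    for (;;) {
        int best = -1;

        for (int i = 0; i < m->ndisks; i++) {
            if (used[i] || m->disks[i].bootindex < 0) {
                continue;
            }
            if (best < 0 || m->disks[i].bootindex < m->disks[best].bootindex) {
                best = i;
            }
        }
        if (best < 0) {
            break;              /* no bootable disk left to append */
        }
        used[best] = 1;

        int n = snprintf(m->bootorder + off, sizeof(m->bootorder) - off,
                         "%s\n", m->disks[best].name);
        if (n < 0 || (size_t)n >= sizeof(m->bootorder) - off) {
            m->bootorder[off] = '\0';   /* drop a truncated entry */
            break;
        }
        off += (size_t)n;
    }
}

/* Attach a disk. The snapshot is deliberately left untouched. */
static void toy_attach(struct toy_machine *m, const char *name, int bootindex)
{
    assert(m->ndisks < MAX_DISKS);
    m->disks[m->ndisks].name = name;
    m->disks[m->ndisks].bootindex = bootindex;
    m->ndisks++;
}

/* Leave prelaunch; optionally refresh the snapshot first (the fix). */
static void toy_cont(struct toy_machine *m, int reload_on_start)
{
    if (!m->running && reload_on_start) {
        toy_reload(m);
    }
    m->running = 1;
}
```

In this model, calling toy_reload() once at creation, attaching "boot"
(bootindex 1) and "secondary" (bootindex -1), and then calling
toy_cont(&m, 0) leaves the snapshot empty, whereas toy_cont(&m, 1) leaves
it listing only "boot".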

Co-developed-by: Joe Jin <[email protected]>
Signed-off-by: Dongli Zhang <[email protected]>
---
 hw/nvram/fw_cfg.c         | 20 ++++++++++++++++++--
 include/hw/nvram/fw_cfg.h |  2 ++
 monitor/qmp-cmds.c        |  6 ++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 1d7d835421..6f60abae65 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -965,9 +965,8 @@ bool fw_cfg_add_file_from_generator(FWCfgState *s,
     return true;
 }
 
-static void fw_cfg_machine_reset(void *opaque)
+static void __fw_cfg_machine_reset(FWCfgState *s)
 {
-    FWCfgState *s = opaque;
     void *ptr;
     size_t len;
     char *buf;
@@ -981,6 +980,23 @@ static void fw_cfg_machine_reset(void *opaque)
     g_free(ptr);
 }
 
+void fw_cfg_machine_reload(void)
+{
+    FWCfgState *s = fw_cfg_find();
+
+    if (!s) {
+        return;
+    }
+
+    __fw_cfg_machine_reset(s);
+}
+
+static void fw_cfg_machine_reset(void *opaque)
+{
+    FWCfgState *s = opaque;
+    __fw_cfg_machine_reset(s);
+}
+
 static void fw_cfg_machine_ready(struct Notifier *n, void *data)
 {
     FWCfgState *s = container_of(n, FWCfgState, machine_ready);
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 56f17a0bdc..8aace1d4a9 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -351,4 +351,6 @@ void load_image_to_fw_cfg(FWCfgState *fw_cfg, uint16_t size_key,
                           uint16_t data_key, const char *image_name,
                           bool try_decompress);
 
+void fw_cfg_machine_reload(void);
+
 #endif
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 0c409c27dc..dfeb80d1d9 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -32,6 +32,7 @@
 #include "hw/mem/memory-device.h"
 #include "hw/intc/intc.h"
 #include "migration/misc.h"
+#include "hw/nvram/fw_cfg.h"
 
 NameInfo *qmp_query_name(Error **errp)
 {
@@ -112,6 +113,11 @@ void qmp_cont(Error **errp)
             error_propagate(errp, local_err);
             return;
         }
+
+        if (runstate_check(RUN_STATE_PRELAUNCH)) {
+            fw_cfg_machine_reload();
+        }
+
         vm_start();
     }
 }
-- 
2.39.3

