Paolo Bonzini <pbonz...@redhat.com> writes:

> On 11/11/21 15:37, Markus Armbruster wrote:
>>> 1) PHASE_NO_MACHINE - backends can already be created here, but no
>>> machine exists yet
>>>
>>> 2) PHASE_MACHINE_CREATED - the machine object has been created.  It's
>>> not initialized, but it's there.
>>>
>>> 3) PHASE_ACCEL_CREATED - the accelerator object has been created.  The
>>> accelerator needs the machine object, because for example KVM might
>>> not support all machine types.  So the accelerator queries the machine
>>> object and fails creation in case of incompatibility.  This enables
>>> e.g. fallback to TCG.  -preconfig starts the monitor here.
>> 
>> We should be able to start monitors first, if we put in the work.
>
> The monitor starts, the question is the availability of the event loop. 

What does the event loop depend on?

>   This requires a command (or a something) to advance to the next phase. 
>    x-exit-preconfig is such a command.
>
> In addition, one thing I don't like of preconfig is that command line 
> arguments linger until they are triggered by x-exit-preconfig.  Adding 
> more such commands makes things worse.

Yes, that's ugly.  I'd prefer command line left to right, and then QMP
commands in order.  If your command line advances the phase too far for
your QMP commands, then that's your own fault.

>>> 4) PHASE_MACHINE_INIT - machine initialization consists mostly in
>>> creating the onboard devices.  For this to happen, the machine needs
>>> to learn about the accelerator, because onboard devices include CPUs
>>> and other accelerator-dependent devices.  Devices plugged in this
>>> phase are cold-plugged.
>>>
>>> 5) PHASE_MACHINE_READY - machine init done notifiers have been called
>>> and the VM is ready.  Devices plugged in this phase already count as
>>> hot-plugged.  -S starts the monitor here.
>> 
>> Remind us: what work is done in the machine init done notifiers?
>
> It depends, but---generally speaking---what they do applies only to 
> cold-plugged devices.  For example, fw_cfg gathers the boot order in the 
> machine-init-done notifier (via get_boot_devices_list).
>
>> What exactly necessitates "count as hot-plugged"?  Is it something done
>> in these notifiers?
>
> It depends on the bus.  It boils down to this code in device_initfn:
>
>      if (phase_check(PHASE_MACHINE_READY)) {
>          dev->hotplugged = 1;
>          qdev_hot_added = true;
>      }
>
> For example, hotplugged PCI devices must define function 0 last; 
> coldplugged PCI devices can define functions in any order 
> (do_pci_register_device, called by pci_qdev_realize).
>
> Another example, a device_add after machine-done causes an ACPI hotplug 
> event, because acpi_pcihp_device_plug_cb checks dev->hotplugged.

Worse, if the guest doesn't play ball, the device remains in hot plug
limbo.

Why would anyone *want* to plug a device in PHASE_MACHINE_READY (when
the plug is hot) instead of earlier (when it's cold)?

>>> x-exit-preconfig goes straight from PHASE_ACCEL_CREATED to
>>> PHASE_MACHINE_READY.  Devices can only be created after
>>> PHASE_MACHINE_INIT, so device_add cannot be enabled at preconfig
>>> stage.
>> 
>> Now I am confused again.  Can you cold plug devices with device_add in
>> presence of -preconfig, and if yes, how?
>
> No, because the monitor goes directly from a point where device_add 
> fails (PHASE_ACCEL_CREATED) to a point where devices are hotplugged 
> (PHASE_MACHINE_READY).

Bummer.

>> Related question: when exactly in these phases do we create devices
>> specified with -device?
>
> In PHASE_MACHINE_INIT---that is, after the machine has been initialized 
> and before machine-done-notifiers have been called.

In other words, you should never use device_add where -device would do,
because the latter gives you cold plug (which is simple and reliable),
and the former hot plug (which is the opposite).

>>> With a pure-QMP configuration flow, PHASE_MACHINE_CREATED would be
>>> reached with a machine-set command (corresponding to the
>>> non-deprecated parts of -machine) and PHASE_ACCEL_CREATED would be
>>> reached with an accel-set command (corresponding to -accel).
>> 
>> I don't think this depends on "pure-QMP configuration flow".  -machine
>> and -accel could advance the phase just like their buddies machine-set
>> and accel-set.
>
> They already do (see qemu_init's calls to phase_advance).
>
>> State transition diagram:
>> 
>>      PHASE_NO_MACHINE (initial state)
>>              |
>>              |  -machine or machine-set
>>              v
>>      PHASE_MACHINE_CREATED
>>              |
>>              |  -accel or accel-set
>>              v
>>      PHASE_ACCEL_CREATED
>>              |
>>              |  ???
>
> qmp_x_exit_preconfig() -> qemu_init_board() -> machine_run_board_init()

I read this as "the state transition happens in
machine_run_board_init(), called from qmp_x_exit_preconfig() via
qemu_init_board()".

>>              v
>>      PHASE_MACHINE_INIT
>>              |
>>              |  ???
>
> qmp_x_exit_preconfig() -> qemu_machine_creation_done() -> 
> qdev_machine_creation_done()

I read this as "the state transition happens in
qdev_machine_creation_done(), called from qmp_x_exit_preconfig() via
qemu_machine_creation_done()".

> The steps in qmp_x_exit_preconfig() are pretty self-explanatory:
>
>      qemu_init_board();
>      qemu_create_cli_devices();
>      qemu_machine_creation_done();
>
> qemu_init() calls qmp_x_exit_preconfig() if -preconfig is not passed on 
> the command line.
>
>>              v
>>      PHASE_MACHINE_READY
>>              |
>>              |  cont
>>              v
>>             ???
>
> No phases anymore, it's always PHASE_MACHINE_READY.  cont simply changes 
> the runstate to RUNNING.
>
>> The earlier the monitor becomes available, the better.
>> Ideally, we'd process the command line strictly left to right, and fail
>> options that are "out of phase".  Make the monitor available right when
>> we process its -mon.  The -chardev for its character device must precede
>> it.
>
> The boat for this has sailed.  The only sane way to do this is a new binary.

"Ideally" still applies to any new binary.

>> Likewise, we'd fail QMP commands that are "out of phase".
>> @allow-preconfig is a crutch that only exists because we're afraid (with
>> reason) of hidden assumptions in QMP commands.
>
> At this point, it's not even like that anymore (except for block devices 
> because my patches haven't been applied).  allow-preconfig is just a 
> quick way to avoid writing
>
>      if (!phase_check(PHASE_MACHINE_READY)) {
>          error_setg(errp, "Please configure the machine first");
>          return;
>      }
>
> over and over in many commands.

My point is that we still have quite a few commands without
'allow-preconfig' mostly because we are afraid of allowing them in
preconfig state, not because of true phase dependencies.


Reply via email to