On Thu, Oct 18, 2012 at 08:32:54AM +0200, Alexander Graf wrote:
> 
> 
> On 18.10.2012, at 03:18, Benjamin Herrenschmidt <b...@kernel.crashing.org> 
> wrote:
> 
> > On Thu, 2012-10-18 at 11:09 +1100, David Gibson wrote:
> > 
> >>>> That's horrible; if you use -boot just once it will clobber a
> >>>> persistent NVRAM's boot order.  I see that a means of changing the
> >>>> default boot order from management tools is desirable, but that
> >>>> shouldn't be the normal behaviour of -boot.  And the objections to (2)
> >>>> apply even more strongly - we'd need to translate arbitrary -boot
> >>>> strings to NVRAM representation which may not be at all
> >>>> straightforward from the information qemu has available.
> >>> 
> >>> It may not be straight forward, but it's what makes the most sense from
> >>> a user's PoV.
> >> 
> >> Bollocks.  Using -boot to override the normal boot sequence
> >> permanently changing the normal boot sequence absolutely does not make
> >> sense from a user's PoV.
> > 
> > I strongly agree with David here. -boot should not change the persistent
> > state.
> 
> I think Anthony and you are looking at 2 different use cases, each
> with their own sane reasoning.
> 
> You want to have the chance to override the boot order temporarily
> for things like cd boot or quick guest rescue missions.
> 
> You also want to be able to permanently change the guest's boot
> order from a management tool. At that same place you want to be able
> to display it, so you don't have to boot your vm to know what it
> would be doing.

That's true to an extent.  However, I vehemently disagree that it's
arbitrary which one gets the new option.  Neither -boot nor bootindex=
alter any persistent data now and they should not suddenly start doing
so.

Now a method of externally altering the firmware persistent boot order
would certainly be nice to have.  However, I'm not at all convinced
that it's realistically possible to do that in a way that has a platform
neutral interface.  The fundamental problem here is that we're tied to
the pre-existing ways the platform stores the boot order information
and what that's even capable of expressing can be very different from
platform to platform: can it express an arbitrary list, or just a
limited number of devices, or just one?  can it represent arbitrary
devices in some firmware id/address scheme, or does it just
give order of a fixed set of known devices?  or is it even more
limited, containing just a few "CD before disk" type booleans?  for
that matter, does the firmware even have any notion at all of a
persistent configurable boot order?
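
To make the expressiveness problem concrete, here is a minimal sketch of how one requested boot order would be stored under four styles of firmware. The encoder functions and the device names ("net", "cdrom", "disk") are hypothetical and do not correspond to any real firmware format; they only illustrate that a platform-neutral request is silently truncated or reduced depending on the platform.

```python
# Hypothetical models of how different firmwares might persist a boot order.
# None of these correspond to a real NVRAM layout.

def encode_full_list(order):
    """Firmware that can store an arbitrary ordered list of devices."""
    return list(order)

def encode_limited(order, slots=2):
    """Firmware with a fixed number of slots: later entries are dropped."""
    return list(order[:slots])

def encode_cd_first_flag(order):
    """Firmware with only a 'CD before disk' boolean."""
    if "cdrom" in order and "disk" in order:
        return order.index("cdrom") < order.index("disk")
    # With only one of the two present, the flag just records whether a CD was asked for.
    return "cdrom" in order

def encode_none(order):
    """Firmware with no persistent boot order at all: nothing can be stored."""
    return None

requested = ["net", "cdrom", "disk"]

stored = {
    "full list": encode_full_list(requested),
    "two slots": encode_limited(requested),
    "cd-first flag": encode_cd_first_flag(requested),
    "no persistent order": encode_none(requested),
}

for platform, value in stored.items():
    print(f"{platform}: {value!r}")
```

Only the first encoder round-trips the request; the others lose the network entry, the ordering detail, or everything, which is why a single neutral interface is hard to define.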

If the configuration tool/setting has to be platform specific anyway,
then most of the questions the current proposal attempts to address
simply don't arise.  We could make such a tool for pseries right now:
access the persistent nvram image via qemu-nbd and poke the necessary
things in.

> As for device detection logic, both face the same problems. You need
> to be able to say 'boot from cd-rom first temporarily' just the same
> as you need to be able to say 'boot from the first cd-rom as first
> boot option permanently'. The permanent change needs to be possible
> with the vm turned off though.
> 
> I suppose that Anthony's reasoning is that we can implement
> temporary in the management layer (or even qemu) if we have the
> permanent mechanism, by switching back to the previous state after
> shutdown if the guest written boot order didn't change.

That really doesn't work, for the reason you mention in the next
paragraph, amongst others.

> I don't mind personally if we have one interface for temporary and
> persistent or 2 separate ones, but I think we should aim for having
> both options available in the long run. Though doing permanent
> changes first and reverting them later could raise problems when you
> kill your vm, since that wouldn't clean up the temporary change.

Not to mention that the persistent store could be used for other
things as well, and restoring it could clobber other changes that the
guest has made and which should be persistent.
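
A minimal sketch of that failure mode, assuming the management layer snapshots the whole persistent store and restores it after shutdown when the boot order looks untouched. The store is modelled as a plain dict and the key names and values are purely illustrative, not a real NVRAM layout.

```python
import copy

def run_with_temporary_boot_order(nvram, temp_order, session):
    """Naive 'temporary' override built on top of a persistent mechanism."""
    saved = copy.deepcopy(nvram)          # snapshot of the whole store
    nvram["boot-device"] = temp_order     # persistent change made before boot
    session(nvram)                        # guest runs and may write to the store
    if nvram.get("boot-device") == temp_order:
        # Guest did not touch the boot order, so "revert" by restoring the snapshot.
        nvram.clear()
        nvram.update(saved)
    return nvram

def guest_session(nvram):
    """Hypothetical guest that persists an unrelated setting during the run."""
    nvram["os-setting"] = "updated-by-guest"

nvram = {"boot-device": "disk", "os-setting": "original"}
result = run_with_temporary_boot_order(nvram, "cdrom", guest_session)
print(result)
```

The boot order is back to "disk", but the guest's unrelated write is gone as well. Restoring only the boot-order entry would avoid that, but it requires the host to understand the store's internal format, which is exactly what the thread argues qemu should not need to know.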

> > In our case, the persistent state will have been carefully crafted by
> > complicated scripts by the distro installer, and while I may want to use
> > -boot to boot once off a cd image or similar, I certainly don't want
> > that to affect my nvram setting pointing to the right on-disk
> > bootloader.
> > 
> > Additionally I don't want qemu to have to understand all the intricacies
> > of expressing OFW boot path if we can avoid it.
> 
> Yes, the same problem as EFI for example is facing. The solution here is as 
> simple as it gets: a new device name space. Instead of having a boot list 
> entry saying 'boot from device x, part y, file z' you would get an entry 
> saying 'boot from /qemu/disk0' and leave the rest to the firmware. The good 
> thing about this approach is that it again is persistable and can be used in 
> boot order lists. So you can directly translate -boot cd into 
> /qemu/disk0,/qemu/cdrom. And if you screwed up your guest boot config, just 
> put that order in by hand into permanent config.
> 
> > 
> > Qemu gives as much info as it can and lets the firmware itself inside the
> > guest figure things out.
> 
> Yes, that's the only chance we have really. Even for bootindex, which could 
> for example get translated to /qemu/pci/0.10.0/disk0 which again would then 
> get aliased to the actual disk device node behind pci device 0.10.0 (first 
> disk) by SLOF.
> 
> > 
> > In fact, I don't want Qemu to know anything about our internal nvram
> > format. This is a business between the guest FW and the guest OS. The
> > only thing qemu is allowed to do is wipe it out if asked to do so :-)
> 
> It might be useful to use fdt in nvram to store the permanent boot order. 
> That way QEMU / management tools have the chance to make persistent changes. 
> Everyone around already understands fdt anyways :).
> 
> >> Um.. as far as I can tell that's a point in favour of my position.  It
> >> makes it impossible for qemu to correctly describe boot sequences
> >> using these devices in the terms firmware uses internally.  On the
> >> other hand it certainly is possible for qemu to pass bootorder="cd"
> >> (or whatever) to the firmware via device tree or fw_cfg and have
> >> firmware locally interpret that in terms of what it knows about
> >> available devices.
> > 
> > This is more/less what happens with -boot today. IE. If you pass "c"
> > SLOF looks for a bootable disk (though arguably the algorithm could be
> > improved), "d" for a bootable optical media etc...
> > 
> > We definitely want something a bit more expressive and in some cases
> > might even be able to pass down from the command line a full path to an
> > actual device but we don't necessarily want qemu to understand the nvram
> > format of this.
> > 
> > Make it an expressive representation that makes sense to qemu, and let
> > the FW "translate" that to something it understands internally.
> 
> Yes :).
> 
> Regardless of this problem, I think the conclusion on how to handle default 
> -boot makes sense to everyone, so you (Avik?) can already start working on 
> that one while we nail down the details of the boot protocol handshakes 
> between QEMU and SLOF.
> 
> 
> Alex
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

