On Wed, May 08, 2013 at 03:35:46PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 08, 2013 at 02:35:44PM +0300, Gleb Natapov wrote:
> > On Wed, May 08, 2013 at 02:07:24PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 08, 2013 at 01:59:12PM +0300, Gleb Natapov wrote:
> > > > On Wed, May 08, 2013 at 01:43:25PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 08, 2013 at 01:34:59PM +0300, Gleb Natapov wrote:
> > > > > > On Wed, May 08, 2013 at 01:29:12PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, May 08, 2013 at 12:31:50PM +0300, Gleb Natapov wrote:
> > > > > > > > On Tue, May 07, 2013 at 07:01:13PM -0400, Kevin O'Connor wrote:
> > > > > > > > > On Tue, May 07, 2013 at 09:00:48PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > > On Thu, Apr 25, 2013 at 12:02:20PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > > > Untested yet, but I thought I'd share the
> > > > > > > > > > > BIOS bits so we can agree on direction.
> > > > > > > > > > >
> > > > > > > > > > > In particular check out ROM sizes:
> > > > > > > > > > > - Before patchset with DSDT enabled
> > > > > > > > > > >   Total size: 127880 Fixed: 59060 Free: 3192 (used 97.6% of 128KiB rom)
> > > > > > > > > > > - Before patchset with DSDT disabled
> > > > > > > > > > >   Total size: 122844 Fixed: 58884 Free: 8228 (used 93.7% of 128KiB rom)
> > > > > > > > > > > - After patchset:
> > > > > > > > > > >   Total size: 128776 Fixed: 59100 Free: 2296 (used 98.2% of 128KiB rom)
> > > > > > > > > > > - Legacy disabled at build time:
> > > > > > > > > > >   Total size: 119836 Fixed: 58996 Free: 11236 (used 91.4% of 128KiB rom)
> > > > > > > > > > >
> > > > > > > > > > > As can be seen from this, most size savings come
> > > > > > > > > > > from dropping DSDT, but we do save a bit by removing
> > > > > > > > > > > other tables.
> > > > > > > > > > > Of course the real reason to move tables to QEMU
> > > > > > > > > > > is so that ACPI can better match hardware.
> > > > > > > > > > >
> > > > > > > > > > > This patchset adds an option to move all code for formatting acpi tables
> > > > > > > > > > > out of BIOS. With this, QEMU has full control over the table layout.
> > > > > > > > > > > All tables are loaded from the new "/etc/acpi/" directory.
> > > > > > > > > > > Any entries in this directory cause BIOS to disable
> > > > > > > > > > > ACPI table generation completely.
> > > > > > > > > > > A generic linker script, controlled by QEMU, is
> > > > > > > > > > > loaded from "/etc/linker-script". It is used to
> > > > > > > > > > > patch in table pointers and checksums.
> > > > > > > > > >
> > > > > > > > > > After some thought, there are two additional
> > > > > > > > > > options worth considering, in that they simplify
> > > > > > > > > > bios code somewhat:
> > > > > > > > > >
> > > > > > > > > > - bios could get size from qemu, allocate a buffer
> > > > > > > > > >   (e.g. could be one buffer for all tables)
> > > > > > > > > >   and pass the address to qemu.
> > > > > > > > > >   qemu does all the patching
> > > > > > > > > >
> > > > > > > > > > - further, qemu could do the copy of tables into
> > > > > > > > > >   that address directly
> > > > > > > > >
> > > > > > > > > This seems more complex than necessary to me.
> > > > > > > > >
> > > > > > > > > The important task is to get the tables generated in QEMU - I'd focus
> > > > > > > > > on getting the tables generated in QEMU (one table per fw_cfg "file").
> > > > > > > > > Once that is done, the SeaBIOS side can be easily implemented, and we
> > > > > > > > > can add any enhancements on top if we feel it is necessary.
> > > > > > > > >
> > > > > > > > +1.
> > > > > > > > This "copy of tables into that address directly" is just an ad-hoc PV
> > > > > > > > isa DMA device in disguise. Such a device was refused when libguestfs
> > > > > > > > asked for it, and they wanted it for a much better reason - performance.
> > > > > > > > There is an existing mechanism to pass data into firmware. Use it please.
> > > > > > > >
> > > > > > > Yes I can code it up using FW_CFG for now.
> > > > > > >
> > > > > > > One issue with QEMU_CFG_FILE_DIR is that it's broken wrt migration,
> > > > > > > unless we pass in very small bits of data which we
> > > > > > > can guarantee never change across qemu versions.
> > > > > > >
> > > > > > Shouldn't we guarantee that ACPI tables do not change for the same
> > > > > > machine type anyway?
> > > > > >
> > > > > That's not practical. They are too big to stay completely unchanged.
> > > > >
> > > > I will not be surprised if this causes us problems somehow. Guest
> > > > will see new tables only after reboot/resume from S4 so damage is
> > > > limited, but one thing that comes to mind is a table size change. If
> > > > they grow from one version to the other, after resuming a guest from S4
> > > > on a new QEMU version part of the tables may be corrupted.
> > > >
> > > Why would it be corrupted?
> > >
> > Because ACPI tables are stored in the memory marked as "ACPI data" (IIRC
> > seabios marks them as reserved). Guests do not save reserved memory during
> > S4 (don't know about "ACPI data", but if guest does not copy tables from
> > it to another location I doubt it saves the memory, anyway ACPI spec does
> > not mandate it), so what happens on resume if the memory grows? Part
> > of it, that was not marked as reserved before S4, is re-written with
> > whatever data guest had there and the rest now contains corrupted ACPI tables
> > that BIOS put there during boot.
> > We can hope that guest is smart enough
> > to see that memory map changed and refuse to resume or we can put ACPI
> > tables into NVS memory which has to be saved and restored during S4 by
> > OSPM.
> >
> This is easily solvable by asking for memory with a solid margin.
>
> But what you write above sounds strange: would not it
> apply to any memory allocated with malloc_high?
> If so, a minor change in BIOS e.g. where we
> allocate a bit more, less, or in a different order,
> will break suspend.
>
Good point. I really do not know how S4 resume handles memory map
changes. The only sane way that I see is to refuse resume if memory map
changes.
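
A minimal sketch of such a check, only to illustrate the idea of comparing
the map saved at hibernation with the one seen at resume. The e820-style
entry layout and the helper name are hypothetical; nothing here is taken
from SeaBIOS or Linux:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical e820-style entry; field names are illustrative only. */
struct mem_range {
    uint64_t base;
    uint64_t length;
    uint32_t type;   /* 1 = usable RAM, 2 = reserved, 3 = ACPI data, 4 = ACPI NVS */
};

/* Return true only if both maps hold the same entries in the same order. */
bool memmap_unchanged(const struct mem_range *saved, size_t n_saved,
                      const struct mem_range *cur, size_t n_cur)
{
    assert(saved != NULL && cur != NULL);
    if (n_saved != n_cur)
        return false;
    for (size_t i = 0; i < n_saved; i++) {
        if (saved[i].base != cur[i].base ||
            saved[i].length != cur[i].length ||
            saved[i].type != cur[i].type)
            return false;   /* any difference means: do not resume */
    }
    return true;
}
```

A resume path would call this with the map recorded in the hibernation
image and the map reported at boot, and fall back to a cold boot when it
returns false.
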
> > > In any case, FACS has a hardware signature value for
> > > just such a case. If we know VM can not be resumed on new QEMU,
> > > we can change the signature and it will cold-boot instead.
> > >
> > Nice, does Linux check it?
>
> If not we can fix it.
>
> > > > > > > Off-list, I suggested fixing it and migrating file
> > > > > > > content, but Anthony thinks it's a bad idea.
> > > > > > >
> > > > > > Why is this a bad idea to fix device migration?
> > > > > >
> > > > > You misunderstand I think.
> > > > > Question is whether we should be putting so much info in fw_cfg.
> > > > > If we keep fw_cfg for small things we don't need to
> > > > > migrate it. In that case ACPI tables have to be passed in
> > > > > using some other mechanism.
> > > > >
> > > > Where is this notion that fw_cfg is only for small things coming
> > > > from? I can assure you this was not the case when the device was
> > > > introduced. In fact it is used today for not so small things like
> > > > bootindex, splash screen bitmaps, option rom loading and kernel/initrd
> > > > loading. Some of those are bigger than ACPI tables will ever be.
> > > > And they all should be migrated, so fw_cfg should be fixed anyway.
> > > >
> > > > --
> > > > Gleb.
> > > >
> > > I'm not arguing with that. Convince Anthony please.
> > >
> > Convince him of what? That fw_cfg is broken wrt migration and there are
> > cases that will fail _today_ without any ACPI related changes? This has
> > been known for ages.
> >
> > --
> > Gleb.
>
> That we should use fw_cfg to load acpi tables.
>
I haven't seen his arguments against it.

--
Gleb.
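
For reference, the FACS hardware-signature check discussed above can be
sketched roughly as follows. The three leading FACS fields (signature,
length, hardware signature) are as laid out in the ACPI specification; the
struct name, the function name and the saved-signature parameter are
hypothetical, and this is not claimed to be how Linux or any firmware
implements it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Leading fields of the FACS; the rest of the table is omitted here. */
struct facs_head {
    char     signature[4];        /* "FACS" */
    uint32_t length;              /* length of the whole table in bytes */
    uint32_t hardware_signature;  /* expected to change when the platform changes */
};

/*
 * Hypothetical helper: decide whether an S4 image may be resumed.
 * saved_hw_sig is the value the OS recorded before hibernating.
 */
bool s4_resume_allowed(const struct facs_head *facs, uint32_t saved_hw_sig)
{
    assert(facs != NULL);
    if (memcmp(facs->signature, "FACS", 4) != 0)
        return false;             /* not a FACS: be conservative */
    return facs->hardware_signature == saved_hw_sig;
}
```

With a check like this in the guest, changing the signature value that the
new QEMU exposes would make the comparison fail and lead to a cold boot
instead of a resume.
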