On Wed, May 08, 2013 at 03:35:46PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 08, 2013 at 02:35:44PM +0300, Gleb Natapov wrote:
> > On Wed, May 08, 2013 at 02:07:24PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 08, 2013 at 01:59:12PM +0300, Gleb Natapov wrote:
> > > > On Wed, May 08, 2013 at 01:43:25PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 08, 2013 at 01:34:59PM +0300, Gleb Natapov wrote:
> > > > > > On Wed, May 08, 2013 at 01:29:12PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, May 08, 2013 at 12:31:50PM +0300, Gleb Natapov wrote:
> > > > > > > > On Tue, May 07, 2013 at 07:01:13PM -0400, Kevin O'Connor wrote:
> > > > > > > > > On Tue, May 07, 2013 at 09:00:48PM +0300, Michael S. Tsirkin 
> > > > > > > > > wrote:
> > > > > > > > > > On Thu, Apr 25, 2013 at 12:02:20PM +0300, Michael S. 
> > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > Untested yet, but I thought I'd share the
> > > > > > > > > > > BIOS bits so we can agree on direction.
> > > > > > > > > > > 
> > > > > > > > > > > In particular check out ROM sizes:
> > > > > > > > > > > - Before patchset with DSDT enabled
> > > > > > > > > > >     Total size: 127880  Fixed: 59060  Free: 3192 (used 
> > > > > > > > > > > 97.6% of 128KiB rom)
> > > > > > > > > > > - Before patchset with DSDT disabled
> > > > > > > > > > >     Total size: 122844  Fixed: 58884  Free: 8228 (used 
> > > > > > > > > > > 93.7% of 128KiB rom)
> > > > > > > > > > > - After patchset:
> > > > > > > > > > >     Total size: 128776  Fixed: 59100  Free: 2296 (used 
> > > > > > > > > > > 98.2% of 128KiB rom)
> > > > > > > > > > > - Legacy disabled at build time:
> > > > > > > > > > >     Total size: 119836  Fixed: 58996  Free: 11236 (used 
> > > > > > > > > > > 91.4% of 128KiB rom)
> > > > > > > > > > > 
> > > > > > > > > > > As can be seen from this, most size savings come
> > > > > > > > > > > from dropping DSDT, but we do save a bit by removing
> > > > > > > > > > > other tables. Of course the real reason to move tables to 
> > > > > > > > > > > QEMU
> > > > > > > > > > > is so that ACPI can better match hardware.
> > > > > > > > > > > 
> > > > > > > > > > > This patchset adds an option to move all code for 
> > > > > > > > > > > formatting acpi tables
> > > > > > > > > > > out of BIOS. With this, QEMU has full control over the 
> > > > > > > > > > > table layout.
> > > > > > > > > > > All tables are loaded from the new "/etc/acpi/" directory.
> > > > > > > > > > > Any entries in this directory cause BIOS to disable
> > > > > > > > > > > ACPI table generation completely.
> > > > > > > > > > > A generic linker script, controlled by QEMU, is
> > > > > > > > > > > loaded from "/etc/linker-script". It is used to
> > > > > > > > > > > patch in table pointers and checksums.
> > > > > > > > > > 
> > > > > > > > > > After some thought, there are two additional
> > > > > > > > > > options worth considering, in that they simplify
> > > > > > > > > > bios code somewhat:
> > > > > > > > > > 
> > > > > > > > > > - bios could get size from qemu, allocate a buffer
> > > > > > > > > >   (e.g. could be one buffer for all tables)
> > > > > > > > > >   and pass the address to qemu.
> > > > > > > > > >   qemu does all the patching
> > > > > > > > > > 
> > > > > > > > > > - further, qemu could do the copy of tables into
> > > > > > > > > >   that address directly
> > > > > > > > > 
> > > > > > > > > This seems more complex than necessary to me.
> > > > > > > > > 
> > > > > > > > > The important task is to get the tables generated in QEMU - 
> > > > > > > > > I'd focus
> > > > > > > > > on getting the tables generated in QEMU (one table per fw_cfg 
> > > > > > > > > "file").
> > > > > > > > > Once that is done, the SeaBIOS side can be easily 
> > > > > > > > > implemented, and we
> > > > > > > > > can add any enhancements on top if we feel it is necessary.
> > > > > > > > > 
> > > > > > > > +1. This "copy of tables into that address directly" is just an 
> > > > > > > > ad-hoc PV
> > > > > > > > isa DMA device in disguise. Such device was refused when 
> > > > > > > > libguestfs
> > > > > > > > asked for it, and they wanted it for much better reason - 
> > > > > > > > performance.
> > > > > > > > There is existing mechanism to pass data into firmware. Use it 
> > > > > > > > please.
> > > > > > > 
> > > > > > > Yes I can code it up using FW_CFG for now.
> > > > > > > 
> > > > > > > One issue with QEMU_CFG_FILE_DIR is that it's broken wrt 
> > > > > > > migration,
> > > > > > > unless we pass in very small bits of data which we
> > > > > > > can guarantee never changes across qemu versions.
> > > > > > > 
> > > > > > Shouldn't we guaranty that ACPI tables do not change for the same
> > > > > > machine type anyway?
> > > > > 
> > > > > That's not practical. They are too big to stay completely unchanged.
> > > > > 
> > > > I will not be surprised if this will cause us problem somehow. Guest
> > > > will see new tables only after reboot/resume from S4 so damage is
> > > > limited, but one thing that comes to mind is table's size change. If
> > > > they grow from one version to the other after resuming a guest from S4
> > > > on new QEMU version part of the tables may be corrupted.
> > > 
> > > Why would it be corrupted?
> > > 
> > Because ACPI tables are stored in the memory marked as "ACPI data" (IIRC
> > seabios mark them as reserved). Guests do not save reserved memory during
> > S4 (don't know about "ACPI data", but if guest does not copy tables from
> > it to another location I doubt it saves the memory, anyway ACPI spec does
> > not mandate it), so what happens on resume if the memory grows? Part
> > of it, that was not marked as reserved before S4, is re-written with
> > whatever data guest had there and rest contains now corrupted ACPI tables
> > that BIOS put there during boot. We can hope that guest is smart enough
> > to see that memory map changed and refuse to resume or we can put ACPI
> > tables into NVS memory which has to be saved and restored during S4 by
> > OSPM.
> 
> This easily solvable by asking for memory with a solid margin.
> 
> But what you write above sounds strange: would not it
> apply to any memory allocated with malloc_high?
> If so, a minor change in BIOS e.g. where we
> allocate a bit more, less, or in a different order,
> will break suspend.
> 
Good point. I really do not know how S4 resume handles memory map
changes. The only sane way that I see is to refuse resume if memory map
changes.

> > > In any case, FACS has a hardware signature value for
> > > just such a case. If we know VM can not be resumed on new QEMU,
> > > we can change the signature and it will cold-boot instead.
> > > 
> > Nice, does Linux check it?
> 
> If not we can fix it.
> 
> > > > > > > Off-list, I suggested fixing it and migrating file
> > > > > > > content, but Anthony thinks it's a bad idea.
> > > > > > > 
> > > > > > Why is this a bad idea to fix device migration?
> > > > > 
> > > > > You misunderstand I think.
> > > > > Question is whether we should be putting so much info in fw_cfg.
> > > > > If we keep fw_cfg for small things we don't need to
> > > > > migrate it. In that case ACPI tables have to be passed in
> > > > > using some other mechanism.
> > > > > 
> > > > Where this notion that fw_cfg is only for a small things is coming
> > > > from? I can assure you this was not the case when the device was
> > > > introduced. In fact it is used today for not so small things like
> > > > bootindex splash screen bitmaps, option rom loading and kernel/initrd
> > > > loading. Some of those are bigger then ACPI tables will ever be.
> > > > And they all should be migrated, so fw_cfg should be fixed anyway.
> > > > 
> > > > --
> > > >                         Gleb.
> > > 
> > > I'm not arguing with that. Convince Anthony please.
> > > 
> > Convince him in what? That fw_cfg is broken vrt migration and there are
> > cases that will fail _today_ without any ACPI related changes? This is
> > knows for ages.
> > 
> > --
> >                     Gleb.
> 
> That we should use fw_cfg to load acpi tables.
> 
I haven't seen his arguments against it.

--
                        Gleb.

Reply via email to