from:"Vasilis Liaskovitis"

Re: [RFC PATCH v3 00/19] ACPI memory hotplug

2012-11-01 Thread Vasilis Liaskovitis

On Wed, Oct 31, 2012 at 01:16:56PM +0200, Avi Kivity wrote:
 On 10/31/2012 12:58 PM, Stefan Hajnoczi wrote:
  On Fri, Sep 21, 2012 at 1:17 PM, Vasilis Liaskovitis
  vasilis.liaskovi...@profitbricks.com wrote:
  This is v3 of the ACPI memory hotplug functionality. Only x86_64 target is
  supported for now.
  
  Hi Vasilis,
  Regarding the hot unplug issue we've been discussing, it's possible to
  progress this patch series without fully solving that problem upfront.
  
  Karen Noel suggested that the series could be rolled without the hot
  unplug command, so that it's not possible to hit the unsafe case.
  This would allow users to hot plug additional memory.  They would have
  to use virtio-balloon to reduce the memory footprint again.  Later,
  when the memory region referencing issue has been solved the hot
  unplug command can be added.
  
  Just wanted to mention Karen's idea in case you feel stuck right now.
 
 We could introduce hotunplug as an experimental feature so people can
 test and play with it, and later graduate it to a fully supported feature.

ok, I'll separate the hotplug and hotunplug patches for the next version of the
patch series (maybe even offer hotunplug in a separate series)

thanks,

- Vasilis
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [RFC PATCH v3 05/19] Implement dimm device abstraction

2012-10-24 Thread Vasilis Liaskovitis

Hi,

On Wed, Oct 24, 2012 at 12:15:17PM +0200, Stefan Hajnoczi wrote:
 On Wed, Oct 24, 2012 at 10:06 AM, liu ping fan qemul...@gmail.com wrote:
  On Tue, Oct 23, 2012 at 8:25 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
  On Fri, Sep 21, 2012 at 01:17:21PM +0200, Vasilis Liaskovitis wrote:
  +static void dimm_populate(DimmDevice *s)
  +{
  +    DeviceState *dev= (DeviceState*)s;
  +    MemoryRegion *new = NULL;
  +
  +    new = g_malloc(sizeof(MemoryRegion));
  +    memory_region_init_ram(new, dev->id, s->size);
  +    vmstate_register_ram_global(new);
  +    memory_region_add_subregion(get_system_memory(), s->start, new);
  +    s->mr = new;
  +}
  +
  +static void dimm_depopulate(DimmDevice *s)
  +{
  +    assert(s);
  +    vmstate_unregister_ram(s->mr, NULL);
  +    memory_region_del_subregion(get_system_memory(), s->mr);
  +    memory_region_destroy(s->mr);
  +    s->mr = NULL;
  +}
 
  How is dimm hot unplug protected against callers who currently have RAM
  mapped (from cpu_physical_memory_map())?
 
  Emulated devices call cpu_physical_memory_map() directly or indirectly
  through DMA emulation code.  The RAM pointer may be held for arbitrary
  lengths of time, across main loop iterations, etc.
 
  It's not clear to me that it is safe to unplug a DIMM that has network
  or disk I/O buffers, for example.  We also need to be robust against
  malicious guests who abuse the hotplug lifecycle.  QEMU should never be
  left with dangling pointers.
 
  Not sure about the block layer. But I think those threads are already
  out of the big lock, so there should be a MemoryListener to catch the
  RAM-unplug event, and if needed, bdrv_flush.

do we want bdrv_flush, or some kind of cancel request e.g. bdrv_aio_cancel?

 
 Here is the detailed scenario:
 
 1. Emulated device does cpu_physical_memory_map() and gets a pointer
 to guest RAM.
 2. Return to vcpu or iothread, continue processing...
 3. Hot unplug of RAM causes the guest RAM to disappear.
 4. Pending I/O completes and overwrites memory from dangling guest RAM
 pointer.
 
 Any I/O device that does zero-copy I/O in QEMU faces this problem:
  * The block layer is affected.
  * The net layer is unaffected because it doesn't do zero-copy tx/rx
 across returns to the main loop (#2 above).
  * Not sure about other devices classes (e.g. USB).
 
 How should the MemoryListener callback work?  For block I/O it may not
 be possible to cancel pending I/O asynchronously - if you try to
 cancel then your thread may block until the I/O completes.

e.g. paio_cancel does this?
is there already an API to asynchronously cancel all in flight operations in a
BlockDriverState? Afaict block_job_cancel refers to streaming jobs only and
doesn't help here.

Can we make the RAM unplug initiate async I/O cancellations, prevent further
I/Os, and only free the memory in a callback, after all DMA I/O to the
associated memory region has been cancelled or completed?
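
The deferred-free idea above can be sketched in plain C as follows. This is only an illustration under stated assumptions: the type `HotplugRam` and the helpers `ram_map`, `ram_unmap` and `ram_request_unplug` are hypothetical names, not QEMU APIs, and a real implementation would have to hook into cpu_physical_memory_map() users and the memory API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for one hot-pluggable RAM region (not a QEMU type). */
typedef struct HotplugRam {
    void *host_mem;        /* backing memory, NULL once freed */
    unsigned inflight;     /* outstanding zero-copy mappings */
    bool unplug_pending;   /* eject requested, waiting for I/O to drain */
} HotplugRam;

/* Release the backing memory; only called when nothing maps it anymore. */
static void ram_free_backing(HotplugRam *r)
{
    free(r->host_mem);
    r->host_mem = NULL;
    r->unplug_pending = false;
}

/* Map the region for DMA. New mappings are refused once unplug is pending. */
static void *ram_map(HotplugRam *r)
{
    if (r->unplug_pending || r->host_mem == NULL) {
        return NULL;
    }
    r->inflight++;
    return r->host_mem;
}

/* Called when an I/O request completes or is cancelled. */
static void ram_unmap(HotplugRam *r)
{
    assert(r->inflight > 0);
    r->inflight--;
    if (r->unplug_pending && r->inflight == 0) {
        ram_free_backing(r);   /* last user gone: finish the deferred unplug */
    }
}

/* Eject request. Returns true if the memory could be freed immediately. */
static bool ram_request_unplug(HotplugRam *r)
{
    r->unplug_pending = true;
    if (r->inflight == 0) {
        ram_free_backing(r);
        return true;
    }
    return false;
}
```

The point of the sketch is that the unplug path never blocks: it only sets a flag, and the final free happens from the completion path of the last in-flight request.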

Also iiuc the MemoryListener should be registered from users of
cpu_physical_memory_map e.g. hw/virtio.c

By the way, dimm_depopulate only frees the qemu memory on an ACPI _EJ request,
which means that a well-behaved guest will have already offlined the memory and
is not using it anymore. If the guest still uses the memory, e.g. for a DMA
buffer, the logical memory offlining will fail and the _EJ/qemu memory freeing
will never happen.

But in theory a malicious acpi guest driver could trigger _EJ requests to do
step 3 above.

Or perhaps the backing block driver can finish an I/O request for a zero-copy
block device that the guest doesn't care for anymore? I'll think about this a
bit more.

 Synchronous cancel behavior is not workable since it can lead to poor
 latency or hangs in the guest.

ok

thanks,

- Vasilis


Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-10-22 Thread Vasilis Liaskovitis

Hi,
On Thu, Oct 18, 2012 at 02:33:02PM +0200, Avi Kivity wrote:
 On 10/18/2012 11:27 AM, Vasilis Liaskovitis wrote:
  On Wed, Oct 17, 2012 at 12:03:51PM +0200, Avi Kivity wrote:
  On 10/17/2012 11:19 AM, Vasilis Liaskovitis wrote:
   
   I don't think so, but probably there's a limit of DIMMs that real
   controllers have, something like 8 max.
   
   In the case of i440fx specifically, do you mean that we should model the
   DRB (Dram row boundary registers in section 3.2.19 of the i440fx spec) ?
   
   The i440fx DRB registers only support up to 8 DRAM rows (let's say 1 row
   maps 1-1 to a DimmDevice for this discussion) and only support up to
   2GB of memory afaict (bit 31 and above is ignored).
   
   I'd rather not model this part of the i440fx - having only 8 DIMMs
   seems too restrictive. The rest of the patchset supports up to 255 DIMMs
   so it would be a waste imho to model an old pc memory controller that only
   supports 8 DIMMs.
   
   There was also an old discussion about i440fx modeling here:
   https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
   the general direction was that i440fx is too old and we don't want to
   precisely emulate the DRB registers, since they lack flexibility.
   
   Possible solutions:
   
   1) is there a newer and more flexible chipset that we could model?
  
  Look for q35 on this list.
  
  thanks, I'll take a look. It sounds like the other options below are more
  straightforward now, but let me know if you prefer q35 integration as a
  priority.
 
 At least validate that what you're doing fits with how q35 works.

In terms of pmc modeling, the q35 page http://wiki.qemu.org/Features/Q35
mentions:

Refactor i440fx to create i440fx-pmc class
ich9: model ICH9 Super I/O chip
ich9: make i440fx-pmc a generic PCNorthBridge class and add support for ich9
northbridge 

is this still the plan? There was an old patchset creating i440fx-pmc here:
http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03501.html
but I am not sure if it has been dropped or worked on. v3 of the q35 patchset
doesn't include a pmc I think.

It would be good to know what the current plan regarding pmc modeling (for both
q35 and i440fx) is.

thanks,

- Vasilis


Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-10-18 Thread Vasilis Liaskovitis

On Wed, Oct 17, 2012 at 12:03:51PM +0200, Avi Kivity wrote:
 On 10/17/2012 11:19 AM, Vasilis Liaskovitis wrote:
  
  I don't think so, but probably there's a limit of DIMMs that real
  controllers have, something like 8 max.
  
  In the case of i440fx specifically, do you mean that we should model the DRB
  (Dram row boundary registers in section 3.2.19 of the i440fx spec) ?
  
  The i440fx DRB registers only support up to 8 DRAM rows (let's say 1 row
  maps 1-1 to a DimmDevice for this discussion) and only support up to 2GB of
  memory afaict (bit 31 and above is ignored).
  
  I'd rather not model this part of the i440fx - having only 8 DIMMs seems
  too restrictive. The rest of the patchset supports up to 255 DIMMs so it
  would be a waste imho to model an old pc memory controller that only supports
  8 DIMMs.
  
  There was also an old discussion about i440fx modeling here:
  https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
  the general direction was that i440fx is too old and we don't want to
  precisely emulate the DRB registers, since they lack flexibility.
  
  Possible solutions:
  
  1) is there a newer and more flexible chipset that we could model?
 
 Look for q35 on this list.

thanks, I'll take a look. It sounds like the other options below are more
straightforward now, but let me know if you prefer q35 integration as a
priority.

 
  
  2) model and document 
  ^--- the critical bit
 
  a generic (non-existent) i440fx that would support more
  and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description
  similar to the i440fx DRB registers, the registers would take up a lot of
  space. In i440fx there is one 8-bit DRB register per DIMM, and DRB[i]
  describes how many 8MB chunks are contained in DIMMs 0...i. So, the register
  values are cumulative (and total described memory cannot exceed 256x8MB = 2GB)
 
 Our i440fx has already been extended by support for pci and cpu hotplug,
 and I see no reason not to extend it for memory.  We can allocate extra
 mmio space for registers if needed.  Usually I'm against this sort of
 thing, but in this case we don't have much choice.

ok

 
  
  We could for example model:
  - an 8-bit non-cumulative register for each DIMM, denoting how many
  128MB chunks it contains. This allows 32GB for each DIMM, and with 255
  DIMMs we describe a bit less than 8TB. These registers require 255 bytes.
  - a 16-bit cumulative register for each DIMM, again for 128MB chunks. This
  allows us to describe 8TB of memory (but the registers take up double the
  space, because they describe cumulative memory amounts)
 
 There is no reason to save space.  Why not have two 64-bit registers per
 DIMM, one describing the size and the other the base address, both in
 bytes?  Use a few low order bits for control.

Do we want this generic scheme above to be tied into the i440fx/pc machine?
Or have it as a separate generic memory bus / pmc usable by others (e.g. in
hw/dimm.c)?
The 64-bit values you describe are already part of DimmDevice properties, but
they are not hardware registers described as part of a chipset.

In terms of control bits, did you want to mimic some other chipset registers? -
any examples would be useful.
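
To make the "two 64-bit registers per DIMM" proposal concrete, here is a rough sketch of one possible encoding. Everything in it is an assumption for illustration: the 4KB alignment that frees the low 12 bits, the `DIMM_REG_CTRL_ENABLE` flag, and the `DimmRegs` layout are hypothetical and do not come from any real chipset or from the patch series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: base and size are 4KB aligned, so the low 12 bits are free. */
#define DIMM_REG_CTRL_MASK   0xfffULL
#define DIMM_REG_CTRL_ENABLE (1ULL << 0)   /* hypothetical "populated" flag */

/* Hypothetical per-DIMM register pair, both values in bytes. */
typedef struct DimmRegs {
    uint64_t base;   /* base address, control flags in the low bits */
    uint64_t size;   /* size in bytes, low bits reserved */
} DimmRegs;

/* Pack base, size and the enable flag; fails if a value is misaligned. */
static bool dimm_regs_encode(DimmRegs *r, uint64_t base, uint64_t size,
                             bool enabled)
{
    if ((base & DIMM_REG_CTRL_MASK) || (size & DIMM_REG_CTRL_MASK)) {
        return false;
    }
    r->base = base | (enabled ? DIMM_REG_CTRL_ENABLE : 0);
    r->size = size;
    return true;
}

/* Address with the control bits masked out. */
static uint64_t dimm_regs_base(const DimmRegs *r)
{
    return r->base & ~DIMM_REG_CTRL_MASK;
}

static uint64_t dimm_regs_size(const DimmRegs *r)
{
    return r->size & ~DIMM_REG_CTRL_MASK;
}

static bool dimm_regs_enabled(const DimmRegs *r)
{
    return (r->base & DIMM_REG_CTRL_ENABLE) != 0;
}
```

With byte-granular 64-bit values there is no chunk size to agree on and no cumulative arithmetic, at the cost of 16 bytes of register space per DIMM.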

 
  
  3) let everything be handled/abstracted by dimmbus - the chipset DRB
  modelling is not done (at least for i440fx, other machines could). This is
  the least precise in terms of emulation. On the other hand, if we are not
  really trying to emulate the real (too restrictive) hardware, does it matter?
 
 We could emulate base memory using the chipset, and extra memory using
 the scheme above.  This allows guests that are tied to the chipset to
 work, and guests that have more awareness (seabios) to use the extra
 features.

But if we use the real i440fx pmc DRBs for base memory, this means base memory
would be <= 2GB, right?

Sounds like we'd need to change the DRBs anyway to describe useful amounts of
base memory (e.g. 512MB chunks and check against address lines [36:29] can
describe base memory up to 64GB, though that's still limiting for very large
VMs). But we'd be diverging from the real hardware again.

Then we can model base memory with tweaked i440fx pmc's DRB registers - we
could only use DRB[0] (one DIMM describing all of base memory) or more.

DIMMs would be allowed to be hotplugged in the generic mem-controller scheme
only (unless it makes sense to allow hotplug in the remaining pmc DRBs and
start using the generic scheme once we run out of emulated DRBs)

thanks,

- Vasilis

Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-10-17 Thread Vasilis Liaskovitis

On Sat, Oct 13, 2012 at 08:57:19AM +0000, Blue Swirl wrote:
 On Tue, Oct 9, 2012 at 5:04 PM, Vasilis Liaskovitis
 vasilis.liaskovi...@profitbricks.com wrote:
 
<snip>
  Maybe even the dimmbus device shouldn't exist by itself after all, or
  it should be pretty much invisible to users. On real HW, the memory
  controller or south bridge handles the memory. For i440fx, it's part
  of the same chipset. So I think we should just add qdev properties to
  i440fx to specify the sizes, nodes etc. Then i440fx should create the
  dimmbus device unconditionally using the properties. The default
  properties should create a sane configuration, otherwise -global
  i440fx.dimm_size=512M etc. could be used. Then the bus would be
  populated as before or with device_add.
 
  hmm the problem with using only i440fx properties, is that size/nodes look
  dimm specific to me, not chipset-memcontroller specific. Unless we only
  allow uniform size dimms. Is it possible to have a dynamic list of
  sizes/nodes pairs as properties of a qdev device?
 
 I don't think so, but probably there's a limit of DIMMs that real
 controllers have, something like 8 max.

In the case of i440fx specifically, do you mean that we should model the DRB
(Dram row boundary registers in section 3.2.19 of the i440fx spec) ?

The i440fx DRB registers only support up to 8 DRAM rows (let's say 1 row
maps 1-1 to a DimmDevice for this discussion) and only support up to 2GB of
memory afaict (bit 31 and above is ignored).

I'd rather not model this part of the i440fx - having only 8 DIMMs seems too
restrictive. The rest of the patchset supports up to 255 DIMMs so it would be a
waste imho to model an old pc memory controller that only supports 8 DIMMs.

There was also an old discussion about i440fx modeling here:
https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
the general direction was that i440fx is too old and we don't want to precisely
emulate the DRB registers, since they lack flexibility.

Possible solutions:

1) is there a newer and more flexible chipset that we could model?

2) model and document a generic (non-existent) i440fx that would support more
and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description
similar to the i440fx DRB registers, the registers would take up a lot of space.
In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how
many 8MB chunks are contained in DIMMs 0...i. So, the register values are
cumulative (and total described memory cannot exceed 256x8MB = 2GB)

We could for example model:
- an 8-bit non-cumulative register for each DIMM, denoting how many
128MB chunks it contains. This allows 32GB for each DIMM, and with 255 DIMMs we
describe a bit less than 8TB. These registers require 255 bytes.
- a 16-bit cumulative register for each DIMM, again for 128MB chunks. This allows
us to describe 8TB of memory (but the registers take up double the space,
because they describe cumulative memory amounts)
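
The register arithmetic in option 2 can be checked with a small standalone sketch. The helper names `drb_fill_cumulative` and `reg_max_bytes` are made up for this illustration and do not model the real i440fx register file; the sketch only reproduces the chunk counting described above.

```c
#include <stddef.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/*
 * Fill cumulative 8-bit boundary registers: drb[i] holds the number of
 * chunks contained in DIMMs 0..i. Returns 0 on success, -1 if a DIMM size
 * is not a multiple of the chunk size or the running total no longer fits
 * in 8 bits.
 */
static int drb_fill_cumulative(const uint64_t *dimm_bytes, size_t n,
                               uint64_t chunk_bytes, uint8_t *drb)
{
    uint64_t total_chunks = 0;

    for (size_t i = 0; i < n; i++) {
        if (dimm_bytes[i] % chunk_bytes != 0) {
            return -1;
        }
        total_chunks += dimm_bytes[i] / chunk_bytes;
        if (total_chunks > UINT8_MAX) {
            return -1;
        }
        drb[i] = (uint8_t)total_chunks;
    }
    return 0;
}

/* Largest amount of memory a reg_bits-wide chunk counter can describe. */
static uint64_t reg_max_bytes(unsigned reg_bits, uint64_t chunk_bytes)
{
    return ((1ULL << reg_bits) - 1) * chunk_bytes;
}
```

For example, `reg_max_bytes(8, 128 * MiB)` gives just under 32GB per DIMM for the non-cumulative scheme, and `reg_max_bytes(16, 128 * MiB)` gives just under 8TB in total for the cumulative 16-bit scheme.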

3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling
is not done (at least for i440fx, other machines could). This is the least
precise in terms of emulation. On the other hand, if we are not really trying to
emulate the real (too restrictive) hardware, does it matter?

thanks,

- Vasilis

Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-10-09 Thread Vasilis Liaskovitis

Hi,

sorry for the delayed answer.

On Sat, Sep 29, 2012 at 11:13:04AM +0000, Blue Swirl wrote:
 
  The -dimm option is supposed to specify the dimm/memory layout, and not
  create any devices.
 
  If we don't want this new option, I have a question:
 
  A -device/device_add means we create a new qdev device at startup or as a
  hotplug operation respectively. So, the semantics of
  -device dimm,id=dimm0,size=512M,node=0,populated=on are clear to me.
 
  What does -device dimm,populated=off mean from a qdev perspective? There
  are 2 alternatives:
 
  - The device is created on the dimmbus, but is not used/populated yet. Then
  the activation/acpi-hotplug of the dimm may require a separate command (we
  used to have dimm_add in versions < 3). device_add handling always hotplugs
  a new qdev device, so this wouldn't fit this usecase, because the device
  already exists. In this case, the actual acpi hotplug operation is decoupled
  from qdev device creation.
 
 The bus exists but the devices do not, device_add would add DIMMs to
 the bus. This matches PCI bus created by the host bridge and PCI
 device hotplug.
 
 A more complex setup would be dimm bus, dimm slot devices and DIMM
 devices. The intermediate slot device would contain one DIMM device if
 plugged.

interesting, I haven't thought about this alternative. It does sound overly
complex, but a dimmslot / dimmdevice split could consolidate the hotplug semantic
differences between populated=on/off. Something similar to the dimmslot device
is already present in v3 (dimmcfg structure), but it's not a qdev-visible
device. I'd rather avoid the complication, but I might revisit this idea.

 
 
  - The dimmdevice is not created when -device dimm,populated=off (this
  would require some ugly checking in normal -device argument handling). Only
  the dimm layout is saved. The hotplug is triggered from a normal device_add
  later. So in this case, the acpi hotplug happens at the same time as the qdev
  hotplug.
 
  Do you see a simpler alternative without introducing a new option?
 
  Using the -dimm option follows the second semantic and avoids changing
  the -device semantics. Dimm layout description is decoupled from dimmdevice
  creation, and qdev hotplug coincides with acpi hotplug.
 
 Maybe even the dimmbus device shouldn't exist by itself after all, or
 it should be pretty much invisible to users. On real HW, the memory
 controller or south bridge handles the memory. For i440fx, it's part
 of the same chipset. So I think we should just add qdev properties to
 i440fx to specify the sizes, nodes etc. Then i440fx should create the
 dimmbus device unconditionally using the properties. The default
 properties should create a sane configuration, otherwise -global
 i440fx.dimm_size=512M etc. could be used. Then the bus would be
 populated as before or with device_add.

hmm the problem with using only i440fx properties, is that size/nodes look
dimm specific to me, not chipset-memcontroller specific. Unless we only allow
uniform size dimms. Is it possible to have a dynamic list of sizes/nodes pairs
as properties of a qdev device?

Also if there is no dimmbus, and instead we have only links from i440fx to
dimm-devices, would the current qdev hotplug API be enough?

I am currently leaning towards this: i440fx unconditionally creates the
dimmbus. Users don't have to specify the bus (I assume this is what you mean by
dimmbus should be invisible to the users)

We only use -device dimm to describe dimms. With -device dimm,populated=off,
only the dimm config layout will be saved in the dimmbus. The hotplug is
triggered from a normal device_add later (same as pci hotplug).

thanks,

- Vasilis

Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-09-24 Thread Vasilis Liaskovitis

On Sat, Sep 22, 2012 at 01:46:57PM +0000, Blue Swirl wrote:
 On Fri, Sep 21, 2012 at 11:17 AM, Vasilis Liaskovitis
 vasilis.liaskovi...@profitbricks.com wrote:
  Example:
  -dimm id=dimm0,size=512M,node=0,populated=off
 
 There should not be a need to introduce a new top level option,
 instead you should just use -device, like
 -device dimm,base=0,id=dimm0,size=512M,node=0,populated=off
 
 That would also specify the start address.

What is base? The start address? I think the start address should be
calculated by the chipset / board, not by the user.

The -dimm option is supposed to specify the dimm/memory layout, and not create
any devices.

If we don't want this new option, I have a question:

A -device/device_add means we create a new qdev device at startup or as a
hotplug operation respectively. So, the semantics of 
-device dimm,id=dimm0,size=512M,node=0,populated=on are clear to me.

What does -device dimm,populated=off mean from a qdev perspective? There are 2
alternatives:

- The device is created on the dimmbus, but is not used/populated yet. Then the
activation/acpi-hotplug of the dimm may require a separate command (we used to
have dimm_add in versions < 3). device_add handling always hotplugs a new qdev
device, so this wouldn't fit this usecase, because the device already exists. In
this case, the actual acpi hotplug operation is decoupled from qdev device
creation.

- The dimmdevice is not created when -device dimm,populated=off (this would
require some ugly checking in normal -device argument handling). Only the dimm
layout is saved. The hotplug is triggered from a normal device_add later. So in
this case, the acpi hotplug happens at the same time as the qdev hotplug.

Do you see a simpler alternative without introducing a new option?

Using the -dimm option follows the second semantic and avoids changing the
-device semantics. Dimm layout description is decoupled from dimmdevice
creation, and qdev hotplug coincides with acpi hotplug.

thanks,

- Vasilis

Re: [RFC PATCH v3 20/19][SeaBIOS] alternative: Use paravirt interface for pci windows

2012-09-24 Thread Vasilis Liaskovitis

On Mon, Sep 24, 2012 at 02:35:30PM +0800, Wen Congyang wrote:
 At 09/21/2012 07:20 PM, Vasilis Liaskovitis Wrote:
  Initialize the 32-bit and 64-bit pci starting offsets from values passed in
  by the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the
  starting offsets based on initial memory and hotplug-able dimms.
 
 This patch can't be applied if I apply the other patches for seabios. And I
 don't find this patch in your tree.

to test these alternative patches, please try these trees:

https://github.com/vliaskov/seabios/commits/memhp-v3-alt
https://github.com/vliaskov/qemu-kvm/commits/memhp-v3-alt

thanks,

- Vasilis

Re: [RFC PATCH v3 11/19] Implement qmp and hmp commands for notification lists

2012-09-24 Thread Vasilis Liaskovitis

Hi,

On Fri, Sep 21, 2012 at 04:03:26PM -0600, Eric Blake wrote:
 On 09/21/2012 05:17 AM, Vasilis Liaskovitis wrote:
  Guest can respond to ACPI hotplug events e.g. with _EJ or _OST method.
  This patch implements a tail queue to store guest notifications for memory
  hot-add and hot-remove requests.
  
  Guest responses for memory hotplug command on a per-dimm basis can be
  detected with the new hmp command "info memhp" or the new qmp command
  "query-memhp"
 
 Naming doesn't match the QMP code.

will fix.

 
  Examples:
  
  (qemu) device_add dimm,id=ram0
 
  
  These notification items should probably be part of migration state (not yet
  implemented).
 
 In the case of libvirt driving migration, you already said in 10/19 that
 libvirt has to start the destination with the populated=on|off fields
 correct for each dimm according to the state it was in at the time the

That patch actually alleviates this restriction for the off->on direction, i.e.
it allows the target VM to not have its args updated for dimm hot-add.
(e.g. let's say the source was started with a dimm, initially off. The dimm is
 hot-plugged, and then the VM is migrated. With patch 10/19, the populated arg
 doesn't have to be updated on the target.)
The other direction (on->off) still needs the correct arg change.

If libvirt/management layers guarantee the dimm arguments are correctly changed,
I don't see that we need the 10/19 patch eventually.

What I think is needed is another hmp/qmp command, that will report
which dimms are on/off at any given time e.g.

(monitor) info memory-hotplug

dimm0: off
dimm1: on
...
dimmN: off

This can be used on the source by libvirt / other layers to find out the
populated dimms, and construct the correct command line on the destination.
Does this make sense to you?
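
As a rough illustration of how a management layer might turn such an on/off report into destination arguments, here is a standalone sketch. The `DimmState` struct and `build_dimm_args` are hypothetical names for this example; the only thing taken from the series is the `-dimm id=...,size=...,node=...,populated=on|off` syntax.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-DIMM state as a management layer might track it. */
typedef struct DimmState {
    const char *id;
    uint64_t size_mb;
    int node;
    bool populated;
} DimmState;

/*
 * Write one "-dimm ..." argument per DIMM into buf, separated by spaces.
 * Returns the string length, or -1 if an id is missing or buf is too small.
 */
static int build_dimm_args(const DimmState *d, size_t n, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0) {
        return -1;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (d[i].id == NULL || strlen(d[i].id) == 0) {
            return -1;
        }
        int w = snprintf(buf + used, len - used,
                         "%s-dimm id=%s,size=%lluM,node=%d,populated=%s",
                         i ? " " : "", d[i].id,
                         (unsigned long long)d[i].size_mb, d[i].node,
                         d[i].populated ? "on" : "off");
        if (w < 0 || (size_t)w >= len - used) {
            return -1;   /* output would have been truncated */
        }
        used += (size_t)w;
    }
    return (int)used;
}
```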

The current patch only deals with success/failure event notifications (not
on-off state of dimms) and should probably be renamed to
query-memory-hotplug-events.

 host started the update.  Can the host hot unplug memory after migration
 has started?

Good testcase. I would rather not allow any hotplug operations while the
migration is happening.

What do we do with pci hotplug during migration currently?  I found a discussion
dating from a year ago, suggesting the same as the simplest solution, but I
don't know what's currently implemented.
http://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg01204.html

 
  +
  +##
  +# @MemHpInfo:
  +#
  +# Information about status of a memory hotplug command
  +#
  +# @dimm: the Dimm associated with the result
  +#
  +# @result: the result of the hotplug command
  +#
  +# Since: 1.3
  +#
  +##
  +{ 'type': 'MemHpInfo',
  +  'data': {'dimm': 'str', 'request': 'str', 'result': 'str'} }
 
 Should 'result' be a bool (true for success, false for still pending) or
 an enum, instead of a free-form string?  Likewise, isn't 'request' going
 to be exactly one of two values (plug or unplug)?

agreed with 'request'.

For 'result' it is also a boolean, but with 'success' and 'failure' (rather than
'pending'). Items are only queued when the guest has given us a definite _OST
or _EJ result, which is either success or failure. If an operation is pending,
nothing is queued here.

Perhaps queueing pending operations also has a usecase, but this isn't addressed
in this patch.
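
A minimal sketch of what typed notification items could look like, assuming 'request' becomes an enum and 'result' a boolean as discussed above. The names `MemHpRequest`, `MemHpEvent` and `MemHpQueue` are hypothetical; the patch itself uses QEMU's tail-queue macros and QAPI-generated types rather than this hand-rolled list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* 'request' is exactly one of two values, so an enum fits. */
typedef enum { MEMHP_REQ_PLUG, MEMHP_REQ_UNPLUG } MemHpRequest;

/* One definite guest response (_OST or _EJ); pending ops are never queued. */
typedef struct MemHpEvent {
    char dimm[32];
    MemHpRequest request;
    bool success;
    struct MemHpEvent *next;
} MemHpEvent;

/* FIFO with a pointer to the last 'next' link, so appends are O(1). */
typedef struct MemHpQueue {
    MemHpEvent *head;
    MemHpEvent **tail;
} MemHpQueue;

static void memhp_queue_init(MemHpQueue *q)
{
    q->head = NULL;
    q->tail = &q->head;
}

/* Append a result at the tail. Returns false on allocation failure. */
static bool memhp_queue_push(MemHpQueue *q, const char *dimm,
                             MemHpRequest request, bool success)
{
    MemHpEvent *e = calloc(1, sizeof(*e));

    if (e == NULL) {
        return false;
    }
    snprintf(e->dimm, sizeof(e->dimm), "%s", dimm);
    e->request = request;
    e->success = success;
    *q->tail = e;
    q->tail = &e->next;
    return true;
}

/* Remove and return the oldest event (caller frees it), or NULL if empty. */
static MemHpEvent *memhp_queue_pop(MemHpQueue *q)
{
    MemHpEvent *e = q->head;

    if (e != NULL) {
        q->head = e->next;
        if (q->head == NULL) {
            q->tail = &q->head;
        }
        e->next = NULL;
    }
    return e;
}
```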
 
thanks,

- Vasilis


Re: [RFC PATCH v3 08/19] pc: calculate dimm physical addresses and adjust memory map

2012-09-24 Thread Vasilis Liaskovitis

On Sat, Sep 22, 2012 at 02:15:28PM +0000, Blue Swirl wrote:
  +
  +/* Function to configure memory offsets of hotpluggable dimms */
  +
  +target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
  +{
  +    target_phys_addr_t ret;
  +
  +    /* on first call, initialize ram_hp_offset */
  +    if (!ram_hp_offset) {
  +        if (ram_size >= PCI_HOLE_START ) {
  +            ram_hp_offset = 0x100000000LL + (ram_size - PCI_HOLE_START);
  +        } else {
  +            ram_hp_offset = ram_size;
  +        }
  +    }
  +
  +    if (ram_hp_offset >= 0x100000000LL) {
  +        ret = ram_hp_offset;
  +        above_4g_hp_mem_size += size;
  +        ram_hp_offset += size;
  +    }
  +    /* if dimm fits before pci hole, append it normally */
  +    else if (ram_hp_offset + size <= PCI_HOLE_START) {
 
 } else if ...
 
  +        ret = ram_hp_offset;
  +        below_4g_hp_mem_size += size;
  +        ram_hp_offset += size;
  +    }
  +    /* otherwise place it above 4GB */
  +    else {
 
 } else {
 
  +        ret = 0x100000000LL;
  +        above_4g_hp_mem_size += size;
  +        ram_hp_offset = 0x100000000LL + size;
  +    }
  +
  +    return ret;
  +}
 
 But the function and use of lots of global variables is ugly. The dimm
 devices should be just created in piix_pci.c (i440fx) directly with
 correct offsets and sizes, so all  below_4g_mem_size etc. calculations
 should be moved there. That would implement the PMC part of i440fx.
 
 For ISA PC, probably the board should create the DIMMs since there may
 not be a memory controller. The 4G logic does not make sense there
 anyway.

What about moving the implementation to pc_piix.c?
Initial RAM and pci windows are already calculated in pc_init1, and then passed
to i440fx_init. The memory bus could be attached to i440fx for pci-enabled pc
and to isabus-bridge for isa-pc (isa-pc not tested yet).

Something like the following:

---
 hw/pc.h  |1 +
 hw/pc_piix.c |   57 +++--
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/hw/pc.h b/hw/pc.h
index e4db071..d6cc43b 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -10,6 +10,7 @@
 #include "memory.h"
 #include "ioapic.h"
 
+#define PCI_HOLE_START 0xe0000000
 /* PC-style peripherals (also used by other machines).  */
 
 /* serial.c */
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 88ff041..17db95a 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -43,6 +43,7 @@
 #include "xen.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "dimm.h"
 #ifdef CONFIG_XEN
 #  include <xen/hvm/hvm_info_table.h>
 #endif
@@ -52,6 +53,8 @@
 static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 };
 static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
 static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
+static ram_addr_t below_4g_hp_mem_size = 0;
+static ram_addr_t above_4g_hp_mem_size = 0;
 
 static void kvm_piix3_setup_irq_routing(bool pci_enabled)
 {
@@ -117,6 +120,41 @@ static void ioapic_init(GSIState *gsi_state)
 }
 }
 
+static target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
+{
+    target_phys_addr_t ret;
+    static ram_addr_t ram_hp_offset = 0;
+
+    /* on first call, initialize ram_hp_offset */
+    if (!ram_hp_offset) {
+        if (ram_size >= PCI_HOLE_START ) {
+            ram_hp_offset = 0x100000000LL + (ram_size - PCI_HOLE_START);
+        } else {
+            ram_hp_offset = ram_size;
+        }
+    }
+
+    if (ram_hp_offset >= 0x100000000LL) {
+        ret = ram_hp_offset;
+        above_4g_hp_mem_size += size;
+        ram_hp_offset += size;
+    }
+    /* if dimm fits before pci hole, append it normally */
+    else if (ram_hp_offset + size <= PCI_HOLE_START) {
+        ret = ram_hp_offset;
+        below_4g_hp_mem_size += size;
+        ram_hp_offset += size;
+    }
+    /* otherwise place it above 4GB */
+    else {
+        ret = 0x100000000LL;
+        above_4g_hp_mem_size += size;
+        ram_hp_offset = 0x100000000LL + size;
+    }
+
+    return ret;
+}
+
 /* PC hardware initialisation */
 static void pc_init1(MemoryRegion *system_memory,
                      MemoryRegion *system_io,
@@ -155,9 +193,9 @@ static void pc_init1(MemoryRegion *system_memory,
         kvmclock_create();
     }
 
-    if (ram_size >= 0xe0000000 ) {
-        above_4g_mem_size = ram_size - 0xe0000000;
-        below_4g_mem_size = 0xe0000000;
+    if (ram_size >= PCI_HOLE_START ) {
+        above_4g_mem_size = ram_size - PCI_HOLE_START;
+        below_4g_mem_size = PCI_HOLE_START;
     } else {
         above_4g_mem_size = 0;
         below_4g_mem_size = ram_size;
@@ -172,6 +210,9 @@ static void pc_init1(MemoryRegion *system_memory,
         rom_memory = system_memory;
     }
 
+    /* adjust memory map for hotplug dimms */
+    dimm_calc_offsets(pc_set_hp_memory_offset);
+
     /* allocate ram and load rom/bios */
     if (!xen_enabled()) {
         fw_cfg = pc_memory_init(system_memory,
@@ -192,18 +233,22 @@ static void pc_init1(MemoryRegion *system_memory,
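
For readers who want to experiment with the placement logic outside of QEMU, here is a standalone sketch of the same three-way decision with the state kept in a struct instead of globals. `HpMemLayout` and `hp_mem_place` are hypothetical names for this illustration, `ram_size` is passed in explicitly, and plain `uint64_t` stands in for `target_phys_addr_t`; it is not part of the patch.

```c
#include <stdint.h>

#define PCI_HOLE_START 0xe0000000ULL
#define ABOVE_4G_START 0x100000000ULL

/* Hypothetical replacement for the global/static variables in the patch. */
typedef struct HpMemLayout {
    uint64_t next_offset;     /* next free address, 0 means uninitialised */
    uint64_t below_4g_size;   /* hotplug memory placed below the PCI hole */
    uint64_t above_4g_size;   /* hotplug memory placed above 4GB */
} HpMemLayout;

/* Return the guest-physical start address for a DIMM of 'size' bytes. */
static uint64_t hp_mem_place(HpMemLayout *l, uint64_t ram_size, uint64_t size)
{
    uint64_t ret;

    /* on first call, start right after the initial RAM */
    if (!l->next_offset) {
        if (ram_size >= PCI_HOLE_START) {
            l->next_offset = ABOVE_4G_START + (ram_size - PCI_HOLE_START);
        } else {
            l->next_offset = ram_size;
        }
    }

    if (l->next_offset >= ABOVE_4G_START) {
        /* already above 4GB: keep appending there */
        ret = l->next_offset;
        l->above_4g_size += size;
        l->next_offset += size;
    } else if (l->next_offset + size <= PCI_HOLE_START) {
        /* the DIMM fits before the PCI hole */
        ret = l->next_offset;
        l->below_4g_size += size;
        l->next_offset += size;
    } else {
        /* it would overlap the hole: skip to 4GB */
        ret = ABOVE_4G_START;
        l->above_4g_size += size;
        l->next_offset = ABOVE_4G_START + size;
    }
    return ret;
}
```

With 3GB of initial RAM and 256MB DIMMs, the first two DIMMs land at 0xc0000000 and 0xd0000000, and the third is pushed to 0x100000000 because it would otherwise overlap the PCI hole.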

[RFC PATCH v3 00/19] ACPI memory hotplug

2012-09-21 Thread Vasilis Liaskovitis

/seabios/commits/memhp-v3

Vasilis Liaskovitis (12):
  Implement dimm device abstraction
  Implement -dimm command line option
  acpi_piix4: Implement memory device hotplug registers
  pc: calculate dimm physical addresses and adjust memory map
  pc: Add dimm paravirt SRAT info
  fix live-migration when populated=on is missing
  Implement qmp and hmp commands for notification lists
  Implement info memory-total and query-memory-total
  balloon: update with hotplugged memory
  Add _OST dimm support
  Update dimm state on reset
  Implement _PS3 for dimm

 arch_init.c |   24 ++-
 docs/specs/acpi_hotplug.txt |   54 ++
 docs/specs/fwcfg.txt|   28 +++
 hmp-commands.hx |4 +
 hmp.c   |   24 +++
 hmp.h   |2 +
 hw/Makefile.objs|2 +-
 hw/acpi_piix4.c |  114 +++-
 hw/dimm.c   |  435 +++
 hw/dimm.h   |  101 ++
 hw/pc.c |   55 ++-
 hw/pc.h |6 +
 hw/pc_piix.c|   20 ++-
 hw/virtio-balloon.c |   13 +-
 monitor.c   |   14 ++
 qapi-schema.json|   37 
 qemu-config.c   |   25 +++
 qemu-options.hx |5 +
 qmp-commands.hx |   57 ++
 sysemu.h|1 +
 vl.c|   51 +
 21 files changed, 1051 insertions(+), 21 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt
 create mode 100644 docs/specs/fwcfg.txt
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h

 Vasilis Liaskovitis (7):
  Add ACPI_EXTRACT_DEVICE* macros
  Subject: [PATCH 02/18] Add SSDT memory device support
  acpi-dsdt: Implement functions for memory hotplug
  acpi: generate hotplug memory devices
  Add _OST dimm method
  Implement _PS3 method for memory device
  Calculate pcimem_start and pcimem64_start from SRAT entries

 Makefile  |2 +-
 src/acpi-dsdt.dsl |  135 ++-
 src/acpi.c|  216 
 src/acpi.h|3 +
 src/pciinit.c |6 +-
 src/post.c|3 +
 src/smp.c |4 +
 src/ssdt-mem.dsl  |   73 +
 tools/acpi_extract.py |   28 +++
 9 files changed, 447 insertions(+), 23 deletions(-)
 create mode 100644 src/ssdt-mem.dsl


-- 
1.7.9


[RFC PATCH v3 01/19][SeaBIOS] Add ACPI_EXTRACT_DEVICE* macros

2012-09-21 Thread Vasilis Liaskovitis

This allows extracting the beginning, end and name of a Device object.
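
As a rough illustration of what the new helpers compute, the sketch below decodes the AML PkgLength field and derives the start, name offset and end of a Device object. The names pkg_length and device_bounds are hypothetical and are not part of acpi_extract.py; the encoding follows the ACPI PkgLength rules (the top two bits of the lead byte give the number of extra bytes).

```python
DEVICE_OP = (0x5B, 0x82)  # ExtOpPrefix followed by DeviceOp


def pkg_length(aml, offset):
    """Decode an AML PkgLength at offset; return (length, encoding_bytes)."""
    lead = aml[offset]
    extra = lead >> 6                  # number of bytes after the lead byte
    if extra == 0:
        return lead & 0x3F, 1          # one-byte form: bits 5..0 are the length
    length = lead & 0x0F               # multi-byte form: low nibble comes first
    for i in range(extra):
        length |= aml[offset + 1 + i] << (4 + 8 * i)
    return length, 1 + extra


def device_bounds(aml, offset):
    """Return (start, name_offset, end) of the Device object at offset."""
    if (aml[offset], aml[offset + 1]) != DEVICE_OP:
        raise ValueError("expected 0x5B 0x82 at offset 0x%x" % offset)
    length, nbytes = pkg_length(aml, offset + 2)
    name_offset = offset + 2 + nbytes  # NameString follows the PkgLength bytes
    end = offset + 2 + length          # PkgLength counts its own encoding bytes
    return offset, name_offset, end
```

Here name_offset corresponds to what aml_device_string returns and end to what aml_device_end returns in the patch below.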

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 tools/acpi_extract.py |   28 
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/tools/acpi_extract.py b/tools/acpi_extract.py
index 167a322..cb2540e 100755
--- a/tools/acpi_extract.py
+++ b/tools/acpi_extract.py
@@ -195,6 +195,28 @@ def aml_package_start(offset):
 offset += 1
 return offset + aml_pkglen_bytes(offset) + 1
 
+def aml_device_start(offset):
+#0x5B 0x82 DeviceOp PkgLength NameString ProcID
+if ((aml[offset] != 0x5B) or (aml[offset + 1] != 0x82)):
+die("Name offset 0x%x: expected 0x5B 0x82 actual 0x%x 0x%x" %
+ (offset, aml[offset], aml[offset + 1]));
+return offset
+
+def aml_device_string(offset):
+#0x5B 0x82 DeviceOp PkgLength NameString ProcID
+start = aml_device_start(offset)
+offset += 2
+pkglenbytes = aml_pkglen_bytes(offset)
+offset += pkglenbytes
+return offset
+
+def aml_device_end(offset):
+start = aml_device_start(offset)
+offset += 2
+pkglenbytes = aml_pkglen_bytes(offset)
+pkglen = aml_pkglen(offset)
+return offset + pkglen
+
 lineno = 0
 for line in fileinput.input():
 # Strip trailing newline
@@ -279,6 +301,12 @@ for i in range(len(asl)):
 offset = aml_processor_end(offset)
 elif (directive == "ACPI_EXTRACT_PKG_START"):
 offset = aml_package_start(offset)
+elif (directive == "ACPI_EXTRACT_DEVICE_START"):
+offset = aml_device_start(offset)
+elif (directive == "ACPI_EXTRACT_DEVICE_STRING"):
+offset = aml_device_string(offset)
+elif (directive == "ACPI_EXTRACT_DEVICE_END"):
+offset = aml_device_end(offset)
 else:
 die("Unsupported directive %s" % directive)
 
-- 
1.7.9


[RFC PATCH v3 03/19][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug

2012-09-21 Thread Vasilis Liaskovitis

Extend the DSDT to include methods for handling memory hot-add and hot-remove
notifications and memory device status requests. These functions are called
from the memory device SSDT methods.
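
The status scan performed by the MESC method can be modelled roughly as follows. This is only a sketch in Python, not the ASL itself: scan_memory_status is a hypothetical name, bitmap stands for the 32-byte MES field and meon for the MEON package. Notify value 1 is a device check (hot-add) and 3 is an eject request (hot-remove), matching the values passed to MTFY in the patch.

```python
def scan_memory_status(bitmap, meon):
    """Compare the status bitmap against MEON and return (index, notify) pairs.

    bitmap: bytes-like object, one bit per memory device (bit i of the map).
    meon:   list of 0/1 flags, updated in place to the new state.
    """
    events = []
    for i in range(len(meon)):
        active = (bitmap[i >> 3] >> (i & 7)) & 1
        if meon[i] != active:
            meon[i] = active                       # record the new state
            events.append((i, 1 if active else 3))  # 1 = check, 3 = eject
    return events
```

A second scan with an unchanged bitmap returns no events, since MEON already reflects the current state.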

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |   70 +++-
 1 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 2060686..5d3e92b 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -737,6 +737,71 @@ DefinitionBlock (
 }
 Return(One)
 }
+/* Objects filled in by run-time generated SSDT */
+External(MTFY, MethodObj)
+External(MEON, PkgObj)
+
+Method (CMST, 1, NotSerialized) {
+// _STA method - return ON status of memdevice
+// Local0 = MEON flag for this memory device
+Store(DerefOf(Index(MEON, Arg0)), Local0)
+If (Local0) { Return(0xF) } Else { Return(0x0) }
+}
+
+/* Memory hotplug notify array */
+OperationRegion(MEST, SystemIO, 0xaf80, 32)
+Field (MEST, ByteAcc, NoLock, Preserve)
+{
+MES, 256
+}
+ 
+/* Memory eject byte */
+OperationRegion(MEMJ, SystemIO, 0xafa0, 1)
+Field (MEMJ, ByteAcc, NoLock, Preserve)
+{
+MPE, 8
+}
+
+Method(MESC, 0) {
+// Local5 = active memdevice bitmap
+Store (MES, Local5)
+// Local2 = last read byte from bitmap
+Store (Zero, Local2)
+// Local0 = memory device iterator
+Store (Zero, Local0)
+While (LLess(Local0, SizeOf(MEON))) {
+// Local1 = MEON flag for this memory device
+Store(DerefOf(Index(MEON, Local0)), Local1)
+If (And(Local0, 0x07)) {
+// Shift down previously read bitmap byte
+ShiftRight(Local2, 1, Local2)
+} Else {
+// Read next byte from memdevice bitmap
+Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
+}
+// Local3 = active state for this memory device
+Store(And(Local2, 1), Local3)
+
+If (LNotEqual(Local1, Local3)) {
+// State change - update MEON with new state
+Store(Local3, Index(MEON, Local0))
+// Do MEM notify
+If (LEqual(Local3, 1)) {
+MTFY(Local0, 1)
+} Else {
+MTFY(Local0, 3)
+}
+}
+Increment(Local0)
+}
+Return(One)
+}
+
+Method (MPEJ, 2, NotSerialized) {
+// _EJ0 method - eject callback
+Store(Arg0, MPE)
+Sleep(200)
+}
 }
 
 
@@ -759,8 +824,9 @@ DefinitionBlock (
 // CPU hotplug event
 Return(\_SB.PRSC())
 }
-Method(_L03) {
-Return(0x01)
+Method(_E03) {
+// Memory hotplug event
+Return(\_SB.MESC())
 }
 Method(_L04) {
 Return(0x01)
-- 
1.7.9


[RFC PATCH v3 04/19][SeaBIOS] acpi: generate hotplug memory devices

2012-09-21 Thread Vasilis Liaskovitis

The memory device generation is guided by qemu paravirt info. Seabios
first uses the info to setup SRAT entries for the hotplug-able memory slots.
Afterwards, build_memssdt uses the created SRAT entries to generate
appropriate memory device objects. One memory device (and corresponding SRAT
entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
memory device is created for initial system memory.

We only support up to 255 DIMMs for now (PackageOp used for the MEON array can
only describe an array of at most 255 elements. VarPackageOp would be needed to
support more than 255 devices)
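
For illustration, the byte layout of one MTFY clause and the overall SSDT payload length used by build_memssdt can be sketched as below. The helper names are hypothetical, mem_sizeof stands in for MEM_SIZEOF (which depends on the compiled ssdt-mem template), and the upper-case hex digits in get_hex are an assumption about SeaBIOS' getHex.

```python
def get_hex(val):
    """ASCII code of the hex digit for the low nibble of val (upper case assumed)."""
    return ord("0123456789ABCDEF"[val & 0x0F])


def mtfy_clause(i):
    """AML bytes for: If (LEqual(Arg0, i)) { Notify(MPxx, Arg1) }."""
    return bytes([
        0xA0, 11,        # IfOp, PkgLength (the length byte plus 10 more bytes)
        0x93, 0x68,      # LEqualOp, Arg0Op
        0x0A, i,         # BytePrefix, device index (must fit in one byte)
        0x86,            # NotifyOp
        ord("M"), ord("P"), get_hex(i >> 4), get_hex(i),
        0x69,            # Arg1Op
    ])


def ssdt_payload_length(nb_memdevs, mem_sizeof):
    """Length of the SSDT body after the ACPI table header."""
    if not 0 <= nb_memdevs <= 255:
        raise ValueError("PackageOp can describe at most 255 elements")
    return ((1 + 3 + 4)                        # Scope(_SB_) header
            + nb_memdevs * mem_sizeof          # one memory device per slot
            + (1 + 2 + 5 + 12 * nb_memdevs)    # Method(MTFY, 2) and clauses
            + (6 + 2 + 1 + nb_memdevs))        # Name(MEON, Package() {...})
```

Each clause is 12 bytes, which is where the 12*nb_memdevs term in the length calculation comes from.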

v1-v2:
Seabios reads mems_sts from qemu to build e820_map
SSDT size and some offsets are calculated with extraction macros.

v2-v3:
Minor name changes

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi.c |  158 +--
 1 files changed, 152 insertions(+), 6 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 6d239fa..1223b52 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -13,6 +13,7 @@
 #include "pci_regs.h" // PCI_INTERRUPT_LINE
 #include "ioport.h" // inl
 #include "paravirt.h" // qemu_cfg_irq0_override
+#include "memmap.h"
 
 //
 /* ACPI tables init */
@@ -416,11 +417,26 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
 #define PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start)
 #define PCI_SLOTS 32
 
+/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */
+#define MEM_BASE 0xaf80
+#define MEM_AML (ssdm_mem_aml + *ssdt_mem_start)
+#define MEM_SIZEOF (*ssdt_mem_end - *ssdt_mem_start)
+#define MEM_OFFSET_HEX (*ssdt_mem_name - *ssdt_mem_start + 2)
+#define MEM_OFFSET_ID (*ssdt_mem_id - *ssdt_mem_start)
+#define MEM_OFFSET_PXM 31
+#define MEM_OFFSET_START 55
+#define MEM_OFFSET_END   63
+#define MEM_OFFSET_SIZE  79
+
+u64 nb_hp_memslots = 0;
+struct srat_memory_affinity *mem;
+
 #define SSDT_SIGNATURE 0x54445353 // SSDT
 #define SSDT_HEADER_LENGTH 36
 
 #include "ssdt-susp.hex"
 #include "ssdt-pcihp.hex"
+#include "ssdt-mem.hex"
 
 #define PCI_RMV_BASE 0xae0c
 
@@ -472,6 +488,111 @@ static void patch_pcihp(int slot, u8 *ssdt_ptr, u32 eject)
 }
 }
 
+static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
+{
+memcpy(ssdt_ptr, MEM_AML, MEM_SIZEOF);
+ssdt_ptr[MEM_OFFSET_HEX] = getHex(i >> 4);
+ssdt_ptr[MEM_OFFSET_HEX+1] = getHex(i);
+ssdt_ptr[MEM_OFFSET_ID] = i;
+ssdt_ptr[MEM_OFFSET_PXM] = node;
+*(u64*)(ssdt_ptr + MEM_OFFSET_START) = mem_base;
+*(u64*)(ssdt_ptr + MEM_OFFSET_END) = mem_base + mem_len;
+*(u64*)(ssdt_ptr + MEM_OFFSET_SIZE) = mem_len;
+}
+
+static void*
+build_memssdt(void)
+{
+u64 mem_base;
+u64 mem_len;
+u8  node;
+int i;
+struct srat_memory_affinity *entry = mem;
+u64 nb_memdevs = nb_hp_memslots;
+u8  memslot_status, enabled;
+
+int length = ((1+3+4)
+  + (nb_memdevs * MEM_SIZEOF)
+  + (1+2+5+(12*nb_memdevs))
+  + (6+2+1+(1*nb_memdevs)));
+u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
+if (! ssdt) {
+warn_noalloc();
+return NULL;
+}
+u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header);
+
+// build Scope(_SB_) header
+*(ssdt_ptr++) = 0x10; // ScopeOp
+ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3);
+*(ssdt_ptr++) = '_';
+*(ssdt_ptr++) = 'S';
+*(ssdt_ptr++) = 'B';
+*(ssdt_ptr++) = '_';
+
+for (i = 0; i < nb_memdevs; i++) {
+mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+node = entry->proximity[0];
+build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
+ssdt_ptr += MEM_SIZEOF;
+entry++;
+}
+
+// build Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(MP00, Arg1)} ...}
+*(ssdt_ptr++) = 0x14; // MethodOp
+ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'T';
+*(ssdt_ptr++) = 'F';
+*(ssdt_ptr++) = 'Y';
+*(ssdt_ptr++) = 0x02;
+for (i=0; i<nb_memdevs; i++) {
+*(ssdt_ptr++) = 0xA0; // IfOp
+   ssdt_ptr = encodeLen(ssdt_ptr, 11, 1);
+*(ssdt_ptr++) = 0x93; // LEqualOp
+*(ssdt_ptr++) = 0x68; // Arg0Op
+*(ssdt_ptr++) = 0x0A; // BytePrefix
+*(ssdt_ptr++) = i;
+*(ssdt_ptr++) = 0x86; // NotifyOp
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'P';
+*(ssdt_ptr++) = getHex(i >> 4);
+*(ssdt_ptr++) = getHex(i);
+*(ssdt_ptr++) = 0x69; // Arg1Op
+}
+
+// build Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+*(ssdt_ptr++) = 0x08; // NameOp
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'E';
+*(ssdt_ptr++) = 'O';
+*(ssdt_ptr++) = 'N';
+*(ssdt_ptr++) = 0x12; // PackageOp
+ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
+*(ssdt_ptr

[RFC PATCH v3 10/19] fix live-migration when populated=on is missing

2012-09-21 Thread Vasilis Liaskovitis

Live migration works after memory hot-add events, as long as the
qemu command line -dimm arguments are changed on the destination host
to specify populated=on for the dimms that have been hot-added.

If a command-line change has not occurred, the destination host does not yet
have the corresponding ramblock in its ram_list. Activate the dimm on the
destination during ram_load.
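
The fallback described above can be sketched as follows. This is a simplified Python model, not the ram_load code: find_or_activate_ramblock is a hypothetical name, ram_list is modelled as a dict keyed by ramblock id, and dimm_add is assumed to return 0 on success as in the patch.

```python
def find_or_activate_ramblock(ram_list, block_id, dimm_add):
    """Return the ramblock named block_id, activating its dimm if it is missing."""
    block = ram_list.get(block_id)
    if block is not None:
        return block
    # The block is unknown: the dimm may have been hot-added on the source.
    if dimm_add(block_id) != 0:
        raise LookupError('Cannot add unknown ramblock "%s"' % block_id)
    # Rescan: the ramblock should exist now that the dimm is populated.
    block = ram_list.get(block_id)
    if block is None:
        raise LookupError('Unknown ramblock "%s"' % block_id)
    return block
```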

Perhaps several fields of the DimmDevice should be part of a
VMStateDescription to handle migration in a cleaner way. But the problem
is that ramblocks are checked before qdev vmstates.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 arch_init.c |   24 +---
 1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 5a1173e..b63caa7 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -45,6 +45,7 @@
 #include "hw/pcspk.h"
 #include "qemu/page_cache.h"
 #include "qmp-commands.h"
+#include "hw/dimm.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -740,10 +741,27 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 }
 
 if (!block) {
-fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+/* this can happen if a dimm was hot-added at source host */
+bool ramblock_found = false;
+if (dimm_add(id)) {
+fprintf(stderr, "Cannot add unknown ramblock \"%s\", "
+"cannot accept migration\n", id);
+ret = -EINVAL;
+goto done;
+}
+/* rescan ram_list, verify ramblock is there now */
+QLIST_FOREACH(block, &ram_list.blocks, next) {
+if (!strncmp(id, block->idstr, sizeof(id))) {
+ramblock_found = true;
+break;
+}
+}
+if (!ramblock_found) {
+fprintf(stderr, "Unknown ramblock \"%s\", cannot "
 "accept migration\n", id);
-ret = -EINVAL;
-goto done;
+ret = -EINVAL;
+goto done;
+}
 }
 
 total_ram_bytes -= length;
-- 
1.7.9


[RFC PATCH v3 09/19] pc: Add dimm paravirt SRAT info

2012-09-21 Thread Vasilis Liaskovitis

The numa_fw_cfg paravirt interface is extended to include SRAT information for
all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
denoting start address, size and node proximity. The new info is appended after
existing numa info, so that the fw_cfg layout does not break.  This information
is used by Seabios to build hotplug memory device objects at runtime.
nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
to SeaBIOS.
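
To make the layout concrete, the sketch below packs and parses the extended FW_CFG_NUMA blob as a sequence of little-endian 64-bit words. The function names are hypothetical and this only models the layout described here; it is not the QEMU or SeaBIOS code.

```python
import struct


def pack_numa_fw_cfg(cpu_pxm, node_mem, dimms):
    """Build the blob: #nodes, per-cpu pxm, per-node mem, #dimms, dimm triplets."""
    words = [len(node_mem)]
    words += cpu_pxm                 # one entry per vCPU (max_cpus entries)
    words += node_mem                # one entry per NUMA node
    words.append(len(dimms))
    for start, size, pxm in dimms:   # (start address, size, node proximity)
        words += [start, size, pxm]
    return struct.pack("<%dQ" % len(words), *words)


def unpack_dimms(blob, max_cpus):
    """Recover the dimm triplets from a blob produced by pack_numa_fw_cfg."""
    words = struct.unpack("<%dQ" % (len(blob) // 8), blob)
    nb_nodes = words[0]
    base = 1 + max_cpus + nb_nodes   # index of the #dimms entry
    nb_dimms = words[base]
    return [tuple(words[base + 1 + 3 * i: base + 4 + 3 * i])
            for i in range(nb_dimms)]
```

The blob size is (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8 bytes, the same expression used for the allocation in the hw/pc.c hunk.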

v1-v2:
Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
to break existing layout
Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/fwcfg.txt |   28 
 hw/pc.c  |   14 --
 2 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 docs/specs/fwcfg.txt

diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
new file mode 100644
index 000..55f96d9
--- /dev/null
+++ b/docs/specs/fwcfg.txt
@@ -0,0 +1,28 @@
+QEMU-BIOS Paravirt Documentation
+--
+
+This document describes paravirt data structures passed from QEMU to BIOS.
+
+FW_CFG_NUMA paravirt info
+
+The SRAT info passed from QEMU to BIOS has the following layout:
+
+---
+#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
+
+---
+#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
+
+Entry 0 contains the number of numa nodes (nb_numa_nodes).
+
+Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
+one of the vCPUs in the system.
+
+Entries max_cpus+1..max_cpus+nb_numa_nodes: The next nb_numa_nodes entries
+describe the memory size for each one of the NUMA nodes in the system.
+
+Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms).
+
+The last 3 * nb_hp_dimms entries are organized in triplets: each triplet contains
+the physical address offset, size (in bytes), and node proximity for the
+respective dimm.
diff --git a/hw/pc.c b/hw/pc.c
index 2c9664d..f2604ae 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -598,6 +598,7 @@ static void *bochs_bios_init(void)
 uint8_t *smbios_table;
 size_t smbios_len;
 uint64_t *numa_fw_cfg;
+uint64_t *hp_dimms_fw_cfg;
 int i, j;
 
 register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
@@ -632,8 +633,10 @@ static void *bochs_bios_init(void)
 /* allocate memory for the NUMA channel: one (64bit) word for the number
  * of nodes, one word for each VCPU-node and one word for each node to
  * hold the amount of memory.
+ * Finally one word for the number of hotplug memory slots and three words
+ * for each hotplug memory slot (start address, size and node proximity).
  */
-numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
+numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
 numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
 for (i = 0; i < max_cpus; i++) {
 for (j = 0; j < nb_numa_nodes; j++) {
@@ -646,8 +649,15 @@ static void *bochs_bios_init(void)
 for (i = 0; i < nb_numa_nodes; i++) {
 numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
 }
+
+numa_fw_cfg[1 + max_cpus + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
+
+hp_dimms_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
+if (nb_hp_dimms)
+setup_fwcfg_hp_dimms(hp_dimms_fw_cfg);
+
 fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
- (1 + max_cpus + nb_numa_nodes) * 8);
+ (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
 
 return fw_cfg;
 }
-- 
1.7.9


[RFC PATCH v3 11/19] Implement qmp and hmp commands for notification lists

2012-09-21 Thread Vasilis Liaskovitis

The guest can respond to ACPI hotplug events, e.g. with the _EJ or _OST method.
This patch implements a tail queue to store guest notifications for memory
hot-add and hot-remove requests.

Guest responses to memory hotplug commands can be queried on a per-dimm basis
with the new hmp command info memory-hotplug or the new qmp command
query-memory-hotplug.
Examples:

(qemu) device_add dimm,id=ram0
(qemu) info memory-hotplug
dimm: ram0 hot-add success
or
dimm: ram0 hot-add failure

(qemu) device_del ram3
(qemu) info memory-hotplug
dimm: ram3 hot-remove success
or
dimm: ram3 hot-remove failure

Results are removed from the queue once read.
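
The read-once behaviour can be pictured with the following Python sketch. HotplugResults is a hypothetical stand-in for the dimm_hp_result_queue tail queue and its query command, not the QEMU implementation.

```python
from collections import deque


class HotplugResults:
    """FIFO of guest responses that is drained by each query."""

    def __init__(self):
        self._queue = deque()

    def notify(self, dimm, request, result):
        """Queue one guest response, e.g. ("ram3", "hot-remove", "success")."""
        self._queue.append((dimm, request, result))

    def query(self):
        """Return all pending responses in arrival order and empty the queue."""
        results = list(self._queue)
        self._queue.clear()
        return results
```

A consequence of this design is that only one monitor client sees each result; a second query returns an empty list until the guest responds again.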

This patch only queues _EJ events that signal hot-remove success.
For _OST event queuing, which covers the hot-remove failure and
hot-add success/failure cases, the _OST patches in this series are also
needed.

These notification items should probably be part of migration state (not yet
implemented).

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hmp-commands.hx  |2 +
 hmp.c|   17 ++
 hmp.h|1 +
 hw/dimm.c|   62 +-
 hw/dimm.h|2 +-
 monitor.c|7 ++
 qapi-schema.json |   26 ++
 qmp-commands.hx  |   37 
 8 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ed67e99..cfb1b67 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1462,6 +1462,8 @@ show device tree
 show qdev device model list
 @item info roms
 show roms
+@item info memory-hotplug
+show memory-hotplug
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index ba6fbd3..4b3d63d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1168,3 +1168,20 @@ void hmp_screen_dump(Monitor *mon, const QDict *qdict)
 qmp_screendump(filename, &err);
 hmp_handle_error(mon, &err);
 }
+
+void hmp_info_memory_hotplug(Monitor *mon)
+{
+MemHpInfoList *info;
+MemHpInfoList *item;
+MemHpInfo *dimm;
+
+info = qmp_query_memory_hotplug(NULL);
+for (item = info; item; item = item->next) {
+dimm = item->value;
+monitor_printf(mon, "dimm: %s %s %s\n", dimm->dimm,
+dimm->request, dimm->result);
+dimm->dimm = NULL;
+}
+
+qapi_free_MemHpInfoList(info);
+}
diff --git a/hmp.h b/hmp.h
index 48b9c59..986705a 100644
--- a/hmp.h
+++ b/hmp.h
@@ -73,5 +73,6 @@ void hmp_getfd(Monitor *mon, const QDict *qdict);
 void hmp_closefd(Monitor *mon, const QDict *qdict);
 void hmp_send_key(Monitor *mon, const QDict *qdict);
 void hmp_screen_dump(Monitor *mon, const QDict *qdict);
+void hmp_info_memory_hotplug(Monitor *mon);
 
 #endif
diff --git a/hw/dimm.c b/hw/dimm.c
index 288b997..fbd93a8 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -65,6 +65,7 @@ static void dimm_bus_initfn(Object *obj)
 DimmBus *bus = DIMM_BUS(obj);
 QTAILQ_INIT(&bus->dimmconfig_list);
 QTAILQ_INIT(&bus->dimmlist);
+QTAILQ_INIT(&bus->dimm_hp_result_queue);
 
 QTAILQ_FOREACH_SAFE(dimm_cfg, &dimmconfig_list, nextdimmcfg, next_dimm_cfg) {
 QTAILQ_REMOVE(&dimmconfig_list, dimm_cfg, nextdimmcfg);
@@ -236,20 +237,78 @@ void dimm_notify(uint32_t idx, uint32_t event)
 {
 DimmBus *bus = main_memory_bus;
 DimmDevice *s;
+DimmConfig *slotcfg;
+struct dimm_hp_result *result;
+
 s = dimm_find_from_idx(idx);
 assert(s != NULL);
+result = g_malloc0(sizeof(*result));
+slotcfg = dimmcfg_find_from_name(DEVICE(s)->id);
+result->dimmname = slotcfg->name;
 
 switch(event) {
 case DIMM_REMOVE_SUCCESS:
 dimm_depopulate(s);
-qdev_simple_unplug_cb((DeviceState*)s);
 QTAILQ_REMOVE(&bus->dimmlist, s, nextdimm);
+qdev_simple_unplug_cb((DeviceState*)s);
+QTAILQ_INSERT_TAIL(&bus->dimm_hp_result_queue, result, next);
 break;
 default:
+g_free(result);
 break;
 }
 }
 
+MemHpInfoList *qmp_query_memory_hotplug(Error **errp)
+{
+DimmBus *bus = main_memory_bus;
+MemHpInfoList *head = NULL, *cur_item = NULL, *info;
+struct dimm_hp_result *item, *nextitem;
+
+QTAILQ_FOREACH_SAFE(item, &bus->dimm_hp_result_queue, next, nextitem) {
+
+info = g_malloc0(sizeof(*info));
+info->value = g_malloc0(sizeof(*info->value));
+info->value->dimm = g_malloc0(sizeof(char) * 32);
+info->value->request = g_malloc0(sizeof(char) * 16);
+info->value->result = g_malloc0(sizeof(char) * 16);
+switch (item->ret) {
+case DIMM_REMOVE_SUCCESS:
+strcpy(info->value->request, "hot-remove");
+strcpy(info->value->result, "success");
+break;
+case DIMM_REMOVE_FAIL:
+strcpy(info->value->request, "hot-remove");
+strcpy(info->value->result, "failure");
+break;
+case DIMM_ADD_SUCCESS:
+strcpy(info-value-request, hot-add

[RFC PATCH v3 17/19][SeaBIOS] Implement _PS3 method for memory device

2012-09-21 Thread Vasilis Liaskovitis

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |   15 +++
 src/ssdt-mem.dsl  |4 
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 0d37bbc..8a18770 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -784,6 +784,13 @@ DefinitionBlock (
 MIF, 8
 }
 
+/* Memory _PS3 byte */
+OperationRegion(MPSB, SystemIO, 0xafa4, 1)
+Field (MPSB, ByteAcc, NoLock, Preserve)
+{
+MPS, 8
+}
+
 Method(MESC, 0) {
 // Local5 = active memdevice bitmap
 Store (MES, Local5)
@@ -824,6 +831,14 @@ DefinitionBlock (
 Store(Arg0, MPE)
 Sleep(200)
 }
+
+Method (MPS3, 1, NotSerialized) {
+// _PS3 method - power-off method
+Store(Arg0, MPS)
+Store(Zero, Index(MEON, Arg0))
+Sleep(200)
+}
+
 Method (MOST, 3, Serialized) {
 // _OST method - OS status indication
 Switch (And(Arg0, 0xFF)) {
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
index 041d301..7423fc6 100644
--- a/src/ssdt-mem.dsl
+++ b/src/ssdt-mem.dsl
@@ -39,6 +39,7 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
 External(CMST, MethodObj)
 External(MPEJ, MethodObj)
 External(MOST, MethodObj)
+External(MPS3, MethodObj)
 
 Name(_CRS, ResourceTemplate() {
 QwordMemory(
@@ -64,6 +65,9 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
 Method (_OST, 3) {
 MOST(Arg0, Arg1, ID)
 }
+Method (_PS3, 0) {
+MPS3(ID)
+}
 }
 }
 
-- 
1.7.9


[RFC PATCH v3 18/19] Implement _PS3 for dimm

2012-09-21 Thread Vasilis Liaskovitis

This will allow us to update dimm state on OSPM-initiated eject operations e.g.
with echo 1 > /sys/bus/acpi/devices/PNP0C80\:00/eject

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/acpi_hotplug.txt |7 +++
 hw/acpi_piix4.c |5 +
 hw/dimm.c   |3 +++
 hw/dimm.h   |3 ++-
 4 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
index 536da16..69868fe 100644
--- a/docs/specs/acpi_hotplug.txt
+++ b/docs/specs/acpi_hotplug.txt
@@ -45,3 +45,10 @@ insertion failed.
 Written by ACPI memory device _OST method to notify qemu of failed
 hot-add.  Write-only.
 
+Memory Dimm _PS3 power-off initiated by OSPM (IO port 0xafa4, 1-byte access):
+---
+Dimm power-off (_PS3) initiated by OSPM. Byte value indicates the Dimm slot which
+entered D3 state.
+
+Written by ACPI memory device _PS3 method to notify qemu of power-off state for
+the dimm.  Write-only.
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 8bf58a6..aad78ca 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -52,6 +52,7 @@
 #define MEM_OST_REMOVE_FAIL 0xafa1
 #define MEM_OST_ADD_SUCCESS 0xafa2
 #define MEM_OST_ADD_FAIL 0xafa3
+#define MEM_PS3 0xafa4
 
 #define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
@@ -545,6 +546,9 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
 case MEM_OST_ADD_FAIL:
 dimm_notify(val, DIMM_ADD_FAIL);
 break;
+case MEM_PS3:
+dimm_notify(val, DIMM_OSPM_POWEROFF);
+break;
 default:
 acpi_gpe_ioport_writeb(&s->ar, addr, val);
 }
@@ -621,6 +625,7 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 register_ioport_write(MEM_OST_REMOVE_FAIL, 1, 1,  gpe_writeb, s);
 register_ioport_write(MEM_OST_ADD_SUCCESS, 1, 1,  gpe_writeb, s);
 register_ioport_write(MEM_OST_ADD_FAIL, 1, 1,  gpe_writeb, s);
+register_ioport_write(MEM_PS3, 1, 1,  gpe_writeb, s);
 
 for(i = 0; i < DIMM_BITMAP_BYTES; i++) {
 s->gperegs.mems_sts[i] = 0;
diff --git a/hw/dimm.c b/hw/dimm.c
index b993668..08f66d5 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -319,6 +319,9 @@ void dimm_notify(uint32_t idx, uint32_t event)
 qdev_simple_unplug_cb((DeviceState*)s);
 QTAILQ_INSERT_TAIL(&bus->dimm_hp_result_queue, result, next);
 break;
+case DIMM_OSPM_POWEROFF:
+if (bus->dimm_revert)
+bus->dimm_revert(bus->dimm_hotplug_qdev, s, 1);
 default:
 g_free(result);
 break;
diff --git a/hw/dimm.h b/hw/dimm.h
index ce091fe..8d73b8f 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -15,7 +15,8 @@ typedef enum {
 DIMM_REMOVE_SUCCESS = 0,
 DIMM_REMOVE_FAIL = 1,
 DIMM_ADD_SUCCESS = 2,
-DIMM_ADD_FAIL = 3
+DIMM_ADD_FAIL = 3,
+DIMM_OSPM_POWEROFF = 4
 } dimm_hp_result_code;
 
 typedef enum {
-- 
1.7.9


[RFC PATCH v3 08/19] pc: calculate dimm physical addresses and adjust memory map

2012-09-21 Thread Vasilis Liaskovitis

Dimm physical address offsets are calculated automatically and the memory map is
adjusted accordingly. If a DIMM fits below PCI_HOLE_START (currently
0xe0000000), it is appended after the existing RAM; otherwise its physical
address is placed above 4GB.
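
The placement rule can be modelled in a few lines of Python. This is a sketch of the logic in pc_set_hp_memory_offset, not the C code itself: HotplugLayout is a hypothetical name, and the offset is initialised in the constructor rather than lazily on the first call.

```python
PCI_HOLE_START = 0xE0000000
FOUR_GB = 0x100000000


class HotplugLayout:
    """Tracks where the next hotplug dimm is placed in guest physical memory."""

    def __init__(self, ram_size):
        # Initial RAM beyond the PCI hole already lives above 4GB.
        if ram_size >= PCI_HOLE_START:
            self.offset = FOUR_GB + (ram_size - PCI_HOLE_START)
        else:
            self.offset = ram_size
        self.below_4g = 0
        self.above_4g = 0

    def place(self, size):
        """Return the physical base address for a dimm of the given size."""
        if self.offset >= FOUR_GB:
            base = self.offset                   # already filling above 4GB
            self.above_4g += size
            self.offset += size
        elif self.offset + size <= PCI_HOLE_START:
            base = self.offset                   # fits before the PCI hole
            self.below_4g += size
            self.offset += size
        else:
            base = FOUR_GB                       # does not fit: start above 4GB
            self.above_4g += size
            self.offset = FOUR_GB + size
        return base
```

For example, with 1GB of initial RAM a first 1GB dimm lands at 0x40000000, while a following 2GB dimm no longer fits below the hole and is placed at 0x100000000. Once one dimm has been placed above 4GB, every later dimm is placed above 4GB as well, even if a smaller one would still fit below the hole.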

Also create memory bus on i440fx-pcihost device.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/pc.c  |   41 +
 hw/pc.h  |6 ++
 hw/pc_piix.c |   20 ++--
 vl.c |1 +
 4 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 112739a..2c9664d 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -52,6 +52,7 @@
 #include "arch_init.h"
 #include "bitmap.h"
 #include "vga-pci.h"
+#include "dimm.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -93,6 +94,9 @@ struct e820_table {
 static struct e820_table e820_table;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+ram_addr_t below_4g_hp_mem_size = 0;
+ram_addr_t above_4g_hp_mem_size = 0;
+extern target_phys_addr_t ram_hp_offset;
 void gsi_handler(void *opaque, int n, int level)
 {
 GSIState *s = opaque;
@@ -1160,3 +1164,40 @@ void pc_pci_device_init(PCIBus *pci_bus)
 pci_create_simple(pci_bus, -1, "lsi53c895a");
 }
 }
+
+
+/* Function to configure memory offsets of hotpluggable dimms */
+
+target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
+{
+target_phys_addr_t ret;
+
+/* on first call, initialize ram_hp_offset */
+if (!ram_hp_offset) {
+if (ram_size >= PCI_HOLE_START ) {
+ram_hp_offset = 0x100000000LL + (ram_size - PCI_HOLE_START);
+} else {
+ram_hp_offset = ram_size;
+}
+}
+
+if (ram_hp_offset >= 0x100000000LL) {
+ret = ram_hp_offset;
+above_4g_hp_mem_size += size;
+ram_hp_offset += size;
+}
+/* if dimm fits before pci hole, append it normally */
+else if (ram_hp_offset + size <= PCI_HOLE_START) {
+ret = ram_hp_offset;
+below_4g_hp_mem_size += size;
+ram_hp_offset += size;
+}
+/* otherwise place it above 4GB */
+else {
+ret = 0x100000000LL;
+above_4g_hp_mem_size += size;
+ram_hp_offset = 0x100000000LL + size;
+}
+}
+
+return ret;
+}
diff --git a/hw/pc.h b/hw/pc.h
index e4db071..f3304fc 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -10,6 +10,7 @@
 #include "memory.h"
 #include "ioapic.h"
 
+#define PCI_HOLE_START 0xe0000000
 /* PC-style peripherals (also used by other machines).  */
 
 /* serial.c */
@@ -214,6 +215,11 @@ static inline bool isa_ne2000_init(ISABus *bus, int base, int irq, NICInfo *nd)
 /* pc_sysfw.c */
 void pc_system_firmware_init(MemoryRegion *rom_memory);
 
+/* memory hotplug */
+target_phys_addr_t pc_set_hp_memory_offset(uint64_t size);
+extern ram_addr_t below_4g_hp_mem_size;
+extern ram_addr_t above_4g_hp_mem_size;
+
 /* e820 types */
 #define E820_RAM        1
 #define E820_RESERVED   2
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 88ff041..d1fd276 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -43,6 +43,7 @@
 #include "xen.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "dimm.h"
 #ifdef CONFIG_XEN
 #  include <xen/hvm/hvm_info_table.h>
 #endif
@@ -155,9 +156,9 @@ static void pc_init1(MemoryRegion *system_memory,
 kvmclock_create();
 }
 
-if (ram_size >= 0xe0000000 ) {
-above_4g_mem_size = ram_size - 0xe0000000;
-below_4g_mem_size = 0xe0000000;
+if (ram_size >= PCI_HOLE_START ) {
+above_4g_mem_size = ram_size - PCI_HOLE_START;
+below_4g_mem_size = PCI_HOLE_START;
 } else {
 above_4g_mem_size = 0;
 below_4g_mem_size = ram_size;
@@ -172,6 +173,9 @@ static void pc_init1(MemoryRegion *system_memory,
 rom_memory = system_memory;
 }
 
+/* adjust memory map for hotplug dimms */
+dimm_calc_offsets(pc_set_hp_memory_offset);
+
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
 fw_cfg = pc_memory_init(system_memory,
@@ -192,9 +196,11 @@ static void pc_init1(MemoryRegion *system_memory,
 if (pci_enabled) {
 pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_bus, gsi,
   system_memory, system_io, ram_size,
-  below_4g_mem_size,
-  0x1ULL - below_4g_mem_size,
-  0x1ULL + above_4g_mem_size,
+  below_4g_mem_size + below_4g_hp_mem_size,
+  0x1ULL - below_4g_mem_size
+- below_4g_hp_mem_size,
+  0x1ULL + above_4g_mem_size
++ above_4g_hp_mem_size,
   (sizeof(target_phys_addr_t) == 4
? 0
: ((uint64_t)1 << 62)),
@@ -223,6 +229,8 @@ static void pc_init1(MemoryRegion *system_memory

[RFC PATCH v3 19/19][SeaBIOS] Calculate pcimem_start and pcimem64_start from SRAT entries

2012-09-21 Thread Vasilis Liaskovitis

pcimem_start and pcimem64_start are adjusted from SRAT entries. For this reason,
paravirt info (NUMA SRAT entries and number of cpus) needs to be read before
pci_setup.
Imho, this is an ugly code change since SRAT bios tables and the number of
cpus have to be read earlier. But the advantage is that no new paravirt
interface is introduced. Suggestions to make the code change cleaner are welcome.

The alternative patch (will be sent as a reply to this patch) implements a
paravirt interface to read the starting values of pcimem_start and
pcimem64_start from QEMU.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi.c|   82 
 src/acpi.h|3 ++
 src/pciinit.c |6 +++-
 src/post.c|3 ++
 src/smp.c |4 +++
 5 files changed, 72 insertions(+), 26 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 1223b52..9e99aa7 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -428,7 +428,10 @@ encodeLen(u8 *ssdt_ptr, int length, int bytes)
 #define MEM_OFFSET_END   63
 #define MEM_OFFSET_SIZE  79
 
-u64 nb_hp_memslots = 0;
+u64 nb_hp_memslots = 0, nb_numanodes;
+u64 *numa_data, *hp_memdata;
+u64 below_4g_hp_mem_size = 0;
+u64 above_4g_hp_mem_size = 0;
 struct srat_memory_affinity *mem;
 
 #define SSDT_SIGNATURE 0x54445353 // SSDT
@@ -763,17 +766,7 @@ acpi_build_srat_memory(struct srat_memory_affinity *numamem,
 static void *
 build_srat(void)
 {
-int nb_numa_nodes = qemu_cfg_get_numa_nodes();
-
-u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + 
nb_numa_nodes));
-if (!numadata) {
-warn_noalloc();
-return NULL;
-}
-
-qemu_cfg_get_numa_data(numadata, MaxCountCPUs + nb_numa_nodes);
-
-qemu_cfg_get_numa_data(&nb_hp_memslots, 1);
+int nb_numa_nodes = nb_numanodes;
 struct system_resource_affinity_table *srat;
 int srat_size = sizeof(*srat) +
 sizeof(struct srat_processor_affinity) * MaxCountCPUs +
@@ -782,7 +775,7 @@ build_srat(void)
 srat = malloc_high(srat_size);
 if (!srat) {
 warn_noalloc();
-free(numadata);
+free(numa_data);
 return NULL;
 }
 
@@ -791,6 +784,7 @@ build_srat(void)
 struct srat_processor_affinity *core = (void*)(srat + 1);
 int i;
 u64 curnode;
+u64 *numadata = numa_data;
 
 for (i = 0; i < MaxCountCPUs; ++i) {
 core->type = SRAT_PROCESSOR;
@@ -847,15 +841,7 @@ build_srat(void)
 mem = (void*)numamem;
 
 if (nb_hp_memslots) {
-u64 *hpmemdata = malloc_tmphigh(sizeof(u64) * (3 * nb_hp_memslots));
-if (!hpmemdata) {
-warn_noalloc();
-free(hpmemdata);
-free(numadata);
-return NULL;
-}
-
-qemu_cfg_get_numa_data(hpmemdata, 3 * nb_hp_memslots);
+u64 *hpmemdata = hp_memdata;
 
 for (i = 1; i < nb_hp_memslots + 1; ++i) {
 mem_base = *hpmemdata++;
@@ -865,7 +851,7 @@ build_srat(void)
 numamem++;
 slots++;
 }
-free(hpmemdata);
+free(hp_memdata);
 }
 
 for (; slots < nb_numa_nodes + nb_hp_memslots + 2; slots++) {
@@ -875,10 +861,58 @@ build_srat(void)
 
 build_header((void*)srat, SRAT_SIGNATURE, srat_size, 1);
 
-free(numadata);
+free(numa_data);
 return srat;
 }
 
+/* QEMU paravirt SRAT entries need to be read in before pci initialization */
+void read_srat_early(void)
+{
+int i;
+
+nb_numanodes = qemu_cfg_get_numa_nodes();
+u64 *hpmemdata;
+u64 mem_len, mem_base;
+
+numa_data = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numanodes));
+if (!numa_data) {
+warn_noalloc();
+}
+
+qemu_cfg_get_numa_data(numa_data, MaxCountCPUs + nb_numanodes);
+qemu_cfg_get_numa_data(&nb_hp_memslots, 1);
+
+if (nb_hp_memslots) {
+hp_memdata = malloc_tmphigh(sizeof(u64) * (3 * nb_hp_memslots));
+if (!hp_memdata) {
+warn_noalloc();
+free(hp_memdata);
+free(numa_data);
+}
+
+qemu_cfg_get_numa_data(hp_memdata, 3 * nb_hp_memslots);
+hpmemdata = hp_memdata;
+
+for (i = 1; i < nb_hp_memslots + 1; ++i) {
+mem_base = *hpmemdata++;
+mem_len = *hpmemdata++;
+hpmemdata++;
+if (mem_base >= 0x100000000LL) {
+above_4g_hp_mem_size += mem_len;
+}
+/* if dimm fits before pci hole, append it normally */
+else if (mem_base + mem_len <= BUILD_PCIMEM_START) {
+below_4g_hp_mem_size += mem_len;
+}
+/* otherwise place it above 4GB */
+else {
+above_4g_hp_mem_size += mem_len;
+}
+}
+
+}
+}
+
 static const struct pci_device_id acpi_find_tbl[] = {
 /* PIIX4 Power Management device. */
 PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371AB_3, NULL),
diff --git a/src/acpi.h b/src/acpi.h
index cb21561..d29837f

[RFC PATCH v3 16/19] Update dimm state on reset

2012-09-21 Thread Vasilis Liaskovitis

In case of hot-remove failure on a guest that does not implement _OST,
the dimm bitmaps in qemu and SeaBIOS show the dimm as unplugged, but the dimm
is still present on the qdev/memory bus. To avoid this inconsistency, we set the
dimm state to active/hot-plugged on a reset of the associated acpi_pm device.
This way the dimm is still active after a VM reboot, and dimm visibility always
behaves the same way, regardless of _OST support in the guest.
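
The reset-time fixup can be pictured with a small standalone sketch. This is only an illustration under assumptions: struct slot, enum pending_state and sync_on_reset() are hypothetical names, and clearing the pending flag after the revert is my assumption, not something the patch below is shown to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slot state, loosely modelled on the description above. */
enum pending_state { PENDING_NONE, PENDING_ADD, PENDING_REMOVE };

struct slot {
    unsigned idx;               /* dimm slot index, 0..255 */
    enum pending_state pending; /* outstanding hotplug request, if any */
};

/* On reset, any slot whose hot-remove was never acknowledged is marked
 * active again in the status bitmap. Returns the number of slots reverted.
 * Resetting the pending flag here is an assumption of this sketch. */
static size_t sync_on_reset(struct slot *slots, size_t n, uint8_t *bitmap)
{
    size_t i, reverted = 0;

    assert(slots != NULL && bitmap != NULL);
    for (i = 0; i < n; i++) {
        if (slots[i].pending == PENDING_REMOVE) {
            bitmap[slots[i].idx / 8] |= (uint8_t)(1u << (slots[i].idx % 8));
            slots[i].pending = PENDING_NONE;
            reverted++;
        }
    }
    return reverted;
}
```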

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/acpi_piix4.c |1 +
 hw/dimm.c   |   20 
 hw/dimm.h   |1 +
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index f7220d4..8bf58a6 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -373,6 +373,7 @@ static void piix4_reset(void *opaque)
 pci_conf[0x5B] = 0x02;
 }
 piix4_update_hotplug(s);
+dimm_state_sync();
 }
 
 static void piix4_powerdown(void *opaque, int irq, int power_failing)
diff --git a/hw/dimm.c b/hw/dimm.c
index 1521462..b993668 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -182,6 +182,26 @@ static DimmDevice *dimm_find_from_idx(uint32_t idx)
 return NULL;
 }
 
+void dimm_state_sync(void)
+{
+DimmBus *bus = main_memory_bus;
+DimmDevice *slot;
+
+/* if a hot-remove operation is pending on reset, it means the hot-remove
+ * operation has failed, but the guest hasn't notified us e.g. because the
+ * guest does not provide _OST notifications. The device is still present on
+ * the dimmbus, but the qemu and Seabios dimm bitmaps show this device as
+ * unplugged. To avoid this inconsistency, we set the dimm bits to active
+ * i.e. hot-plugged for each dimm present on the dimmbus.
+ */
+QTAILQ_FOREACH(slot, &bus->dimmlist, nextdimm) {
+if (slot->pending == DIMM_REMOVE_PENDING) {
+if (bus->dimm_revert)
+bus->dimm_revert(bus->dimm_hotplug_qdev, slot, 0);
+}
+}
+}
+
 /* used to create a dimm device, only on incoming migration of a hotplugged
  * RAMBlock
  */
diff --git a/hw/dimm.h b/hw/dimm.h
index a6c6e6f..ce091fe 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -95,5 +95,6 @@ void main_memory_bus_create(Object *parent);
 void dimm_config_create(char *id, uint64_t size, uint64_t node,
 uint32_t dimm_idx, uint32_t populated);
 uint64_t get_hp_memory_total(void);
+void dimm_state_sync(void);
 
 #endif
-- 
1.7.9


[RFC PATCH v3 15/19] Add _OST dimm support

2012-09-21 Thread Vasilis Liaskovitis

This allows qemu to receive notifications from the guest OS on success or
failure of a memory hotplug request. The guest OS needs to implement the _OST
functionality for this to work (linux-next: http://lkml.org/lkml/2012/6/25/321)

This patch also updates the dimm bitmap state and the hot-remove pending flag
when a hot-remove fails.  This allows failed hot operations to be retried at
any time, but only for guests that use _OST notification.
It also documents the new _OST registers in docs/specs/acpi_hotplug.txt

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/acpi_hotplug.txt |   25 +
 hw/acpi_piix4.c |   35 ++-
 hw/dimm.c   |   28 +++-
 hw/dimm.h   |   10 +-
 4 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
index cf86242..536da16 100644
--- a/docs/specs/acpi_hotplug.txt
+++ b/docs/specs/acpi_hotplug.txt
@@ -20,3 +20,28 @@ ejected.
 
 Written by ACPI memory device _EJ0 method to notify qemu of successful
 hot-removal.  Write-only.
+
+Memory Dimm ejection failure notification (IO port 0xafa1, 1-byte access):
+---
+Dimm hot-remove _OST notification. Byte value indicates Dimm slot for which
+ejection failed.
+
+Written by ACPI memory device _OST method to notify qemu of failed
+hot-removal.  Write-only.
+
+Memory Dimm insertion success notification (IO port 0xafa2, 1-byte access):
+---
+Dimm hot-add _OST notification. Byte value indicates Dimm slot for which
+insertion succeeded.
+
+Written by ACPI memory device _OST method to notify qemu of successful
+hot-add.  Write-only.
+
+Memory Dimm insertion failure notification (IO port 0xafa3, 1-byte access):
+---
+Dimm hot-add _OST notification. Byte value indicates Dimm slot for which
+insertion failed.
+
+Written by ACPI memory device _OST method to notify qemu of failed
+hot-add.  Write-only.
+
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 8776669..f7220d4 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -49,6 +49,9 @@
 #define PCI_RMV_BASE 0xae0c
 #define MEM_BASE 0xaf80
 #define MEM_EJ_BASE 0xafa0
+#define MEM_OST_REMOVE_FAIL 0xafa1
+#define MEM_OST_ADD_SUCCESS 0xafa2
+#define MEM_OST_ADD_FAIL 0xafa3
 
 #define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
@@ -87,6 +90,7 @@ typedef struct PIIX4PMState {
 uint8_t s4_val;
 } PIIX4PMState;
 
+static int piix4_dimm_revert(DeviceState *qdev, DimmDevice *dev, int add);
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s);
 
 #define ACPI_ENABLE 0xf1
@@ -531,6 +535,15 @@ static void gpe_writeb(void *opaque, uint32_t addr, 
uint32_t val)
 case MEM_EJ_BASE:
 dimm_notify(val, DIMM_REMOVE_SUCCESS);
 break;
+case MEM_OST_REMOVE_FAIL:
+dimm_notify(val, DIMM_REMOVE_FAIL);
+break;
+case MEM_OST_ADD_SUCCESS:
+dimm_notify(val, DIMM_ADD_SUCCESS);
+break;
+case MEM_OST_ADD_FAIL:
+dimm_notify(val, DIMM_ADD_FAIL);
+break;
 default:
 acpi_gpe_ioport_writeb(&s->ar, addr, val);
 }
@@ -604,13 +617,16 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, 
PIIX4PMState *s)
 
 register_ioport_read(MEM_BASE, DIMM_BITMAP_BYTES, 1,  gpe_readb, s);
 register_ioport_write(MEM_EJ_BASE, 1, 1,  gpe_writeb, s);
+register_ioport_write(MEM_OST_REMOVE_FAIL, 1, 1,  gpe_writeb, s);
+register_ioport_write(MEM_OST_ADD_SUCCESS, 1, 1,  gpe_writeb, s);
+register_ioport_write(MEM_OST_ADD_FAIL, 1, 1,  gpe_writeb, s);
 
 for(i = 0; i < DIMM_BITMAP_BYTES; i++) {
 s->gperegs.mems_sts[i] = 0;
 }
 
 pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
-dimm_bus_hotplug(piix4_dimm_hotplug, &s->dev.qdev);
+dimm_bus_hotplug(piix4_dimm_hotplug, piix4_dimm_revert, &s->dev.qdev);
 }
 
 static void enable_device(PIIX4PMState *s, int slot)
@@ -656,6 +672,23 @@ static int piix4_dimm_hotplug(DeviceState *qdev, 
DimmDevice *dev, int
 return 0;
 }
 
+static int piix4_dimm_revert(DeviceState *qdev, DimmDevice *dev, int add)
+{
+PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, qdev);
+PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, pci_dev);
+struct gpe_regs *g = &s->gperegs;
+DimmDevice *slot = DIMM(dev);
+int idx = slot->idx;
+
+if (add) {
+g->mems_sts[idx/8] &= ~(1 << (idx%8));
+}
+else {
+g->mems_sts[idx/8] |= (1 << (idx%8));
+}
+return 0;
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
PCIHotplugState state)
 {
diff --git a/hw/dimm.c b/hw/dimm.c
index 21626f6..1521462 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c

[RFC PATCH v3 14/19][SeaBIOS] Add _OST dimm method

2012-09-21 Thread Vasilis Liaskovitis

Add support for _OST method. _OST method will write into the correct I/O byte to
signal success / failure of hot-add or hot-remove to qemu.
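
As a rough C model of the dispatch done by the ASL method in this patch: the low byte of the _OST source event selects hot-add versus hot-remove, and the low byte of the status code selects success or failure. ost_port() is a hypothetical helper written for this note; the port numbers are the ones documented in docs/specs/acpi_hotplug.txt, and the reading of event 0x1 as device check and 0x3 as ejection request follows my understanding of the ACPI _OST definitions.

```c
#include <assert.h>
#include <stdint.h>

/* IO ports as listed in docs/specs/acpi_hotplug.txt for this series. */
#define PORT_REMOVE_FAIL 0xafa1
#define PORT_ADD_SUCCESS 0xafa2
#define PORT_ADD_FAIL    0xafa3

/* Hypothetical helper: pick the IO port that the slot id is written to,
 * or return 0 when no notification is sent for this event/status pair. */
static uint16_t ost_port(uint32_t source_event, uint32_t status)
{
    switch (source_event & 0xFF) {
    case 0x3: /* ejection request */
        return (status & 0xFF) == 0x1 ? PORT_REMOVE_FAIL : 0;
    case 0x1: /* device check, i.e. insertion */
        switch (status & 0xFF) {
        case 0x0:
            return PORT_ADD_SUCCESS;
        case 0x1:
            return PORT_ADD_FAIL;
        default:
            return 0;
        }
    default:
        return 0;
    }
}
```

Successful hot-remove is not covered here because it is already reported through the _EJ0 byte at 0xafa0.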

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |   50 ++
 src/ssdt-mem.dsl  |4 
 2 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 5d3e92b..0d37bbc 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -762,6 +762,28 @@ DefinitionBlock (
 MPE, 8
 }
 
+
+/* Memory hot-remove notify failure byte */
+OperationRegion(MEEF, SystemIO, 0xafa1, 1)
+Field (MEEF, ByteAcc, NoLock, Preserve)
+{
+MEF, 8
+}
+
+/* Memory hot-add notify success byte */
+OperationRegion(MPIS, SystemIO, 0xafa2, 1)
+Field (MPIS, ByteAcc, NoLock, Preserve)
+{
+MIS, 8
+}
+
+/* Memory hot-add notify failure byte */
+OperationRegion(MPIF, SystemIO, 0xafa3, 1)
+Field (MPIF, ByteAcc, NoLock, Preserve)
+{
+MIF, 8
+}
+
 Method(MESC, 0) {
 // Local5 = active memdevice bitmap
 Store (MES, Local5)
@@ -802,6 +824,34 @@ DefinitionBlock (
 Store(Arg0, MPE)
 Sleep(200)
 }
+Method (MOST, 3, Serialized) {
+// _OST method - OS status indication
+Switch (And(Arg0, 0xFF)) {
+Case(0x3)
+{
+Switch(And(Arg1, 0xFF)) {
+Case(0x1) {
+Store(Arg2, MEF)
+// Revert MEON flag for this memory device to one
+Store(One, Index(MEON, Arg2))
+}
+}
+}
+Case(0x1)
+{
+Switch(And(Arg1, 0xFF)) {
+Case(0x0) {
+Store(Arg2, MIS)
+}
+Case(0x1) {
+Store(Arg2, MIF)
+// Revert MEON flag for this memory device to zero
+Store(Zero, Index(MEON, Arg2))
+}
+}
+}
+}
+}
 }
 
 
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
index ee322f0..041d301 100644
--- a/src/ssdt-mem.dsl
+++ b/src/ssdt-mem.dsl
@@ -38,6 +38,7 @@ DefinitionBlock (ssdt-mem.aml, SSDT, 0x02, BXPC, 
CSSDT, 0x1)
 
 External(CMST, MethodObj)
 External(MPEJ, MethodObj)
+External(MOST, MethodObj)
 
 Name(_CRS, ResourceTemplate() {
 QwordMemory(
@@ -60,6 +61,9 @@ DefinitionBlock (ssdt-mem.aml, SSDT, 0x02, BXPC, 
CSSDT, 0x1)
 Method (_EJ0, 1, NotSerialized) {
 MPEJ(ID, Arg0)
 }
+Method (_OST, 3) {
+MOST(Arg0, Arg1, ID)
+}
 }
 }
 
-- 
1.7.9


[RFC PATCH v3 13/19] balloon: update with hotplugged memory

2012-09-21 Thread Vasilis Liaskovitis

query-balloon and info balloon should report total memory available to the
guest.

Balloon inflate/deflate can also use all memory available to the guest (initial
+ hotplugged memory).

The balloon driver has been minimally tested with the patch; please review and test.

Caveat: if the guest does not online hotplugged-memory, it's easy for a balloon
inflate command to OOM a guest.
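
A small sketch of the arithmetic this patch changes, assuming a 4 KiB balloon page (shift of 12): the target is clamped to initial plus hotplugged memory, and the balloon size is whatever remains. balloon_pages_for_target() is a hypothetical helper for illustration and leaves out the zero-target check done in virtio_balloon_to_target().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed balloon page shift (4 KiB pages) for this sketch. */
#define PFN_SHIFT 12

/* Number of balloon pages needed to bring the guest down to 'target'
 * bytes, when the guest owns ram_size + hotplugged bytes in total. */
static uint64_t balloon_pages_for_target(uint64_t ram_size,
                                         uint64_t hotplugged,
                                         uint64_t target)
{
    uint64_t total = ram_size + hotplugged;

    if (target > total) {
        target = total; /* cannot deflate beyond what the guest has */
    }
    return (total - target) >> PFN_SHIFT;
}
```

With 1 GiB of initial RAM and a 512 MiB hotplugged dimm, a target of 1 GiB therefore asks the guest to give up 512 MiB worth of pages, which is the case where a guest that never onlined the dimm can run out of memory.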

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/virtio-balloon.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index dd1a650..bca21bc 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -22,6 +22,7 @@
 #include "virtio-balloon.h"
 #include "kvm.h"
 #include "exec-memory.h"
+#include "dimm.h"
 
 #if defined(__linux__)
 #include <sys/mman.h>
@@ -147,10 +148,11 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 VirtIOBalloon *dev = to_virtio_balloon(vdev);
 struct virtio_balloon_config config;
 uint32_t oldactual = dev->actual;
+uint64_t hotplugged_ram_size = get_hp_memory_total();
 memcpy(&config, config_data, 8);
 dev->actual = le32_to_cpu(config.actual);
 if (dev->actual != oldactual) {
-qemu_balloon_changed(ram_size -
+qemu_balloon_changed(ram_size + hotplugged_ram_size -
  (dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
 }
 }
@@ -188,17 +190,20 @@ static void virtio_balloon_stat(void *opaque, BalloonInfo 
*info)
 
 info->actual = ram_size - ((uint64_t) dev->actual <<
 VIRTIO_BALLOON_PFN_SHIFT);
+info->actual += get_hp_memory_total();
 }
 
 static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
 {
 VirtIOBalloon *dev = opaque;
+uint64_t hotplugged_ram_size = get_hp_memory_total();
 
-if (target > ram_size) {
-target = ram_size;
+if (target > ram_size + hotplugged_ram_size) {
+target = ram_size + hotplugged_ram_size;
 }
 if (target) {
-dev->num_pages = (ram_size - target) >> VIRTIO_BALLOON_PFN_SHIFT;
+dev->num_pages = (ram_size + hotplugged_ram_size - target) >>
+ VIRTIO_BALLOON_PFN_SHIFT;
 virtio_notify_config(&dev->vdev);
 }
 }
-- 
1.7.9


[RFC PATCH v3 12/19] Implement info memory-total and query-memory-total

2012-09-21 Thread Vasilis Liaskovitis

Returns total physical memory available to guest in bytes, including hotplugged
memory. Note that the number reported here may be different from what the guest
sees e.g. if the guest has not logically onlined hotplugged memory.

This functionality is provided independently of a balloon device, since a
guest can be using ACPI memory hotplug without using a balloon device.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hmp-commands.hx  |2 ++
 hmp.c|7 +++
 hmp.h|1 +
 hw/dimm.c|   21 +
 hw/dimm.h|1 +
 monitor.c|7 +++
 qapi-schema.json |   11 +++
 qmp-commands.hx  |   20 
 8 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index cfb1b67..988d207 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1464,6 +1464,8 @@ show qdev device model list
 show roms
 @item info memory-hotplug
 show memory-hotplug
+@item info memory-total
+show memory-total
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index 4b3d63d..cc31ddc 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1185,3 +1185,10 @@ void hmp_info_memory_hotplug(Monitor *mon)
 
 qapi_free_MemHpInfoList(info);
 }
+
+void hmp_info_memory_total(Monitor *mon)
+{
+uint64_t ram_total;
+ram_total = (uint64_t)qmp_query_memory_total(NULL);
+monitor_printf(mon, "MemTotal: %" PRIu64 "\n", ram_total);
+}
diff --git a/hmp.h b/hmp.h
index 986705a..ab96dba 100644
--- a/hmp.h
+++ b/hmp.h
@@ -74,5 +74,6 @@ void hmp_closefd(Monitor *mon, const QDict *qdict);
 void hmp_send_key(Monitor *mon, const QDict *qdict);
 void hmp_screen_dump(Monitor *mon, const QDict *qdict);
 void hmp_info_memory_hotplug(Monitor *mon);
+void hmp_info_memory_total(Monitor *mon);
 
 #endif
diff --git a/hw/dimm.c b/hw/dimm.c
index fbd93a8..21626f6 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -28,6 +28,7 @@ static DimmBus *main_memory_bus;
 /* the following list is used to hold dimm config info before machine
  * initialization. After machine init, the list is emptied and not used 
anymore.*/
 static DimmConfiglist dimmconfig_list = 
QTAILQ_HEAD_INITIALIZER(dimmconfig_list);
+extern ram_addr_t ram_size;
 
 static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent);
 static char *dimmbus_get_fw_dev_path(DeviceState *dev);
@@ -233,6 +234,26 @@ void setup_fwcfg_hp_dimms(uint64_t *fw_cfg_slots)
 }
 }
 
+uint64_t get_hp_memory_total(void)
+{
+DimmBus *bus = main_memory_bus;
+DimmDevice *slot;
+uint64_t info = 0;
+
+QTAILQ_FOREACH(slot, &bus->dimmlist, nextdimm) {
+info += slot->size;
+}
+return info;
+}
+
+int64_t qmp_query_memory_total(Error **errp)
+{
+uint64_t info;
+info = ram_size + get_hp_memory_total();
+
+return (int64_t)info;
+}
+
 void dimm_notify(uint32_t idx, uint32_t event)
 {
 DimmBus *bus = main_memory_bus;
diff --git a/hw/dimm.h b/hw/dimm.h
index 95251ba..21225be 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -86,5 +86,6 @@ int dimm_add(char *id);
 void main_memory_bus_create(Object *parent);
 void dimm_config_create(char *id, uint64_t size, uint64_t node,
 uint32_t dimm_idx, uint32_t populated);
+uint64_t get_hp_memory_total(void);
 
 #endif
diff --git a/monitor.c b/monitor.c
index be9a1d9..4f5ea60 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2747,6 +2747,13 @@ static mon_cmd_t info_cmds[] = {
 .mhandler.info = hmp_info_memory_hotplug,
 },
 {
+.name   = "memory-total",
+.args_type  = "",
+.params = "",
+.help   = "show total memory size",
+.mhandler.info = hmp_info_memory_total,
+},
+{
 .name   = NULL,
 },
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index 3706a2a..c1d2571 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2581,3 +2581,14 @@
 # Since: 1.3
 ##
 { 'command': 'query-memory-hotplug', 'returns': ['MemHpInfo'] }
+
+##
+# @query-memory-total:
+#
+# Returns total memory in bytes, including hotplugged dimms
+#
+# Returns: int
+#
+# Since: 1.3
+##
+{ 'command': 'query-memory-total', 'returns': 'int' }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e50dcc2..20b7eea 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2576,3 +2576,23 @@ Example:
}
 
 EQMP
+
+{
+.name   = "query-memory-total",
+.args_type  = "",
+.mhandler.cmd_new = qmp_marshal_input_query_memory_total
+},
+SQMP
+query-memory-total
+--
+
+Return total memory in bytes, including hotplugged dimms
+
+Example:
+
+-> { "execute": "query-memory-total" }
+<- {
+  "return": 1073741824
+   }
+
+EQMP
-- 
1.7.9


[RFC PATCH v3 07/19] acpi_piix4: Implement memory device hotplug registers

2012-09-21 Thread Vasilis Liaskovitis

A 32-byte register is used to present up to 256 hotplug-able memory devices
to BIOS and OSPM. Hot-add and hot-remove functions trigger an ACPI hotplug
event through these. Only reads are allowed from these registers.

A hot-remove operation triggers an ACPI hot-remove event, but then needs to
wait for OSPM to eject the device. We use a single-byte register to learn when
OSPM has called the _EJ0 method for a particular dimm. A write to this byte
will depopulate the respective dimm. Only writes are allowed to this byte.
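
The 32-byte status array is a plain bitmap: slot idx lives in byte idx/8, bit idx%8, which is how 32 bytes cover 256 dimms. A minimal standalone sketch of that indexing follows; slot_set(), slot_clear() and slot_test() are illustrative names, not functions from this series.

```c
#include <assert.h>
#include <stdint.h>

#define BITMAP_BYTES 32                 /* 32 bytes * 8 bits = 256 slots */
#define MAX_SLOTS    (BITMAP_BYTES * 8)

/* Mark slot idx as having a pending hotplug notification. */
static void slot_set(uint8_t *map, unsigned idx)
{
    assert(idx < MAX_SLOTS);
    map[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/* Clear the bit for slot idx. */
static void slot_clear(uint8_t *map, unsigned idx)
{
    assert(idx < MAX_SLOTS);
    map[idx / 8] &= (uint8_t)~(1u << (idx % 8));
}

/* Return 1 if the bit for slot idx is set, 0 otherwise. */
static int slot_test(const uint8_t *map, unsigned idx)
{
    assert(idx < MAX_SLOTS);
    return (map[idx / 8] >> (idx % 8)) & 1;
}
```

The guest reads the array one byte at a time from 0xaf80 to 0xaf9f, so the last valid byte offset is BITMAP_BYTES - 1.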

v1-v2:
mems_sts address moved from 0xaf20 to 0xaf80 (to accommodate more space for
cpu-hotplugging in the future).
_EJ array is reduced to a single byte.
Add documentation in docs/specs/acpi_hotplug.txt

v2-v3:
minor name changes

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/acpi_hotplug.txt |   22 +
 hw/acpi_piix4.c |   73 --
 2 files changed, 91 insertions(+), 4 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
new file mode 100644
index 000..cf86242
--- /dev/null
+++ b/docs/specs/acpi_hotplug.txt
@@ -0,0 +1,22 @@
+QEMU-ACPI BIOS hotplug interface
+--
+This document describes the interface between QEMU and the ACPI BIOS for 
non-PCI
+space. For the PCI interface please look at docs/specs/acpi_pci_hotplug.txt
+
+QEMU-ACPI BIOS memory hotplug interface
+--
+
+Memory Dimm status array (IO port 0xaf80-0xaf9f, 1-byte access):
+---
+Dimm hot-plug notification pending. One bit per slot.
+
+Read by ACPI BIOS GPE.3 handler to notify OS of memory hot-add or hot-remove
+events.  Read-only.
+
+Memory Dimm ejection success notification (IO port 0xafa0, 1-byte access):
+---
+Dimm hot-remove _EJ0 notification. Byte value indicates Dimm slot that was
+ejected.
+
+Written by ACPI memory device _EJ0 method to notify qemu of successful
+hot-removal.  Write-only.
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index c56220b..8776669 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -28,6 +28,8 @@
 #include "range.h"
 #include "ioport.h"
 #include "fw_cfg.h"
+#include "sysbus.h"
+#include "dimm.h"
 
 //#define DEBUG
 
@@ -45,9 +47,15 @@
 #define PCI_DOWN_BASE 0xae04
 #define PCI_EJ_BASE 0xae08
 #define PCI_RMV_BASE 0xae0c
+#define MEM_BASE 0xaf80
+#define MEM_EJ_BASE 0xafa0
 
+#define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 
+struct gpe_regs {
+uint8_t mems_sts[DIMM_BITMAP_BYTES];
+};
 struct pci_status {
 uint32_t up; /* deprecated, maintained for migration compatibility */
 uint32_t down;
@@ -69,6 +77,7 @@ typedef struct PIIX4PMState {
 Notifier machine_ready;
 
 /* for pci hotplug */
+struct gpe_regs gperegs;
 struct pci_status pci0_status;
 uint32_t pci0_hotplug_enable;
 uint32_t pci0_slot_device_present;
@@ -93,8 +102,8 @@ static void pm_update_sci(PIIX4PMState *s)
ACPI_BITMASK_POWER_BUTTON_ENABLE |
ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
-(((s->ar.gpe.sts[0] & s->ar.gpe.en[0])
-   & PIIX4_PCI_HOTPLUG_STATUS) != 0);
+(((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) &
+  (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_MEM_HOTPLUG_STATUS)) != 0);
 
 qemu_set_irq(s-irq, sci_level);
 /* schedule a timer interruption if needed */
@@ -499,7 +508,16 @@ type_init(piix4_pm_register_types)
 static uint32_t gpe_readb(void *opaque, uint32_t addr)
 {
 PIIX4PMState *s = opaque;
-uint32_t val = acpi_gpe_ioport_readb(&s->ar, addr);
+uint32_t val = 0;
+struct gpe_regs *g = &s->gperegs;
+
+switch (addr) {
+case MEM_BASE ... MEM_BASE+DIMM_BITMAP_BYTES-1:
+val = g->mems_sts[addr - MEM_BASE];
+break;
+default:
+val = acpi_gpe_ioport_readb(&s->ar, addr);
+}
 
 PIIX4_DPRINTF("gpe read %x == %x\n", addr, val);
 return val;
@@ -509,7 +527,13 @@ static void gpe_writeb(void *opaque, uint32_t addr, 
uint32_t val)
 {
 PIIX4PMState *s = opaque;
 
-acpi_gpe_ioport_writeb(&s->ar, addr, val);
+switch (addr) {
+case MEM_EJ_BASE:
+dimm_notify(val, DIMM_REMOVE_SUCCESS);
+break;
+default:
+acpi_gpe_ioport_writeb(&s->ar, addr, val);
+}
 pm_update_sci(s);
 
 PIIX4_DPRINTF("gpe write %x == %d\n", addr, val);
@@ -560,9 +584,11 @@ static uint32_t pcirmv_read(void *opaque, uint32_t addr)
 
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 PCIHotplugState state);
+static int piix4_dimm_hotplug(DeviceState *qdev, DimmDevice *dev, int add);
 
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 {
+int i = 0;
 
 register_ioport_write

[RFC PATCH v3 05/19] Implement dimm device abstraction

2012-09-21 Thread Vasilis Liaskovitis

Each hotplug-able memory slot is a DimmDevice. All DimmDevices are attached
to a new bus called DimmBus. This bus is introduced so that we no longer
depend on hotplug-capability of main system bus (the main bus does not allow
hotplugging). The DimmBus should be attached to a chipset Device (i440fx in case
of the pc)

A hot-add operation for a particular dimm:
- creates a new DimmDevice and attaches it to the DimmBus
- creates a new MemoryRegion of the given physical address offset, size and
node proximity, and attaches it to main system memory as a sub_region.

A successful hot-remove operation detaches and frees the MemoryRegion from
system memory, and removes the DimmDevice from the DimmBus.

Hotplug operations are done through the normal device_add / device_del commands.
Also add properties to DimmDevice.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/dimm.c |  305 +
 hw/dimm.h |   90 ++
 2 files changed, 395 insertions(+), 0 deletions(-)
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h

diff --git a/hw/dimm.c b/hw/dimm.c
new file mode 100644
index 000..288b997
--- /dev/null
+++ b/hw/dimm.c
@@ -0,0 +1,305 @@
+/*
+ * Dimm device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/
+ */
+
+#include "trace.h"
+#include "qdev.h"
+#include "dimm.h"
+#include <time.h>
+#include "../exec-memory.h"
+#include "qmp-commands.h"
+
+/* the system-wide memory bus. */
+static DimmBus *main_memory_bus;
+/* the following list is used to hold dimm config info before machine
+ * initialization. After machine init, the list is emptied and not used anymore.*/
+static DimmConfiglist dimmconfig_list = QTAILQ_HEAD_INITIALIZER(dimmconfig_list);
+
+static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent);
+static char *dimmbus_get_fw_dev_path(DeviceState *dev);
+
+static Property dimm_properties[] = {
+DEFINE_PROP_UINT64("start", DimmDevice, start, 0),
+DEFINE_PROP_UINT64("size", DimmDevice, size, DEFAULT_DIMMSIZE),
+DEFINE_PROP_UINT32("node", DimmDevice, node, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void dimmbus_dev_print(Monitor *mon, DeviceState *dev, int indent)
+{
+}
+
+static char *dimmbus_get_fw_dev_path(DeviceState *dev)
+{
+char path[40];
+
+snprintf(path, sizeof(path), "%s", qdev_fw_name(dev));
+return strdup(path);
+}
+
+static void dimm_bus_class_init(ObjectClass *klass, void *data)
+{
+BusClass *k = BUS_CLASS(klass);
+
+k->print_dev = dimmbus_dev_print;
+k->get_fw_dev_path = dimmbus_get_fw_dev_path;
+}
+
+static void dimm_bus_initfn(Object *obj)
+{
+DimmConfig *dimm_cfg, *next_dimm_cfg;
+DimmBus *bus = DIMM_BUS(obj);
+QTAILQ_INIT(&bus->dimmconfig_list);
+QTAILQ_INIT(&bus->dimmlist);
+
+QTAILQ_FOREACH_SAFE(dimm_cfg, &dimmconfig_list, nextdimmcfg, next_dimm_cfg) {
+QTAILQ_REMOVE(&dimmconfig_list, dimm_cfg, nextdimmcfg);
+QTAILQ_INSERT_TAIL(&bus->dimmconfig_list, dimm_cfg, nextdimmcfg);
+}
+}
+
+static const TypeInfo dimm_bus_info = {
+.name = TYPE_DIMM_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(DimmBus),
+.instance_init = dimm_bus_initfn,
+.class_init = dimm_bus_class_init,
+};
+
+void main_memory_bus_create(Object *parent)
+{
+main_memory_bus = g_malloc0(dimm_bus_info.instance_size);
+main_memory_bus->qbus.glib_allocated = true;
+qbus_create_inplace(&main_memory_bus->qbus, TYPE_DIMM_BUS, DEVICE(parent),
+"membus");
+}
+
+static void dimm_populate(DimmDevice *s)
+{
+DeviceState *dev= (DeviceState*)s;
+MemoryRegion *new = NULL;
+
+new = g_malloc(sizeof(MemoryRegion));
+memory_region_init_ram(new, dev->id, s->size);
+vmstate_register_ram_global(new);
+memory_region_add_subregion(get_system_memory(), s->start, new);
+s->mr = new;
+}
+
+static void dimm_depopulate(DimmDevice *s)
+{
+assert(s);
+vmstate_unregister_ram(s->mr, NULL);
+memory_region_del_subregion(get_system_memory(), s->mr);
+memory_region_destroy(s->mr);
+s->mr = NULL;
+}
+
+void dimm_config_create(char *id, uint64_t size, uint64_t node, uint32_t
+dimm_idx, uint32_t populated)
+{
+DimmConfig *dimm_cfg;
+dimm_cfg = (DimmConfig*) g_malloc0(sizeof(DimmConfig));
+dimm_cfg-name = id

[RFC PATCH v3 06/19] Implement -dimm command line option

2012-09-21 Thread Vasilis Liaskovitis

Example:
-dimm id=dimm0,size=512M,node=0,populated=off
will define a 512M memory slot belonging to numa node 0.

When populated=on, a DimmDevice is created and hot-plugged at system startup.
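
To make the option format concrete, here is a standalone sketch of parsing such a string into its four fields. QEMU itself goes through its QemuOpts machinery, as the patch below shows; struct dimm_opts, parse_size() and parse_dimm_opts() are hypothetical names, and only the K, M and G size suffixes are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one -dimm argument. */
struct dimm_opts {
    char id[32];
    uint64_t size;    /* bytes */
    unsigned node;    /* NUMA node (proximity) */
    int populated;    /* 1 if populated=on */
};

/* Parse a size such as "512M" or "2G" into bytes (K, M, G suffixes only). */
static uint64_t parse_size(const char *s)
{
    char *end;
    uint64_t v = strtoull(s, &end, 10);

    switch (*end) {
    case 'K': return v << 10;
    case 'M': return v << 20;
    case 'G': return v << 30;
    default:  return v;
    }
}

/* Returns 0 on success, -1 on an unknown key or a malformed key=value pair. */
static int parse_dimm_opts(const char *arg, struct dimm_opts *out)
{
    char buf[128];
    char *p, *next;

    assert(arg != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (strlen(arg) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, arg);

    for (p = buf; p && *p; p = next) {
        char *comma = strchr(p, ',');
        char *eq;

        if (comma) {
            *comma = '\0';
            next = comma + 1;
        } else {
            next = NULL;
        }
        eq = strchr(p, '=');
        if (!eq) {
            return -1;
        }
        *eq++ = '\0';
        if (strcmp(p, "id") == 0) {
            snprintf(out->id, sizeof(out->id), "%s", eq);
        } else if (strcmp(p, "size") == 0) {
            out->size = parse_size(eq);
        } else if (strcmp(p, "node") == 0) {
            out->node = (unsigned)strtoul(eq, NULL, 10);
        } else if (strcmp(p, "populated") == 0) {
            out->populated = (strcmp(eq, "on") == 0);
        } else {
            return -1;
        }
    }
    return 0;
}
```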

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/Makefile.objs |2 +-
 qemu-config.c|   25 +
 qemu-options.hx  |5 +
 sysemu.h |1 +
 vl.c |   50 ++
 5 files changed, 82 insertions(+), 1 deletions(-)

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 6dfebd2..8c5c39a 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
 hw-obj-$(CONFIG_PCSPK) += pcspk.o
 hw-obj-$(CONFIG_PCKBD) += pckbd.o
 hw-obj-$(CONFIG_FDC) += fdc.o
-hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
+hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
 hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_I82374) += i82374.o
diff --git a/qemu-config.c b/qemu-config.c
index eba977e..4022d64 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -646,6 +646,30 @@ QemuOptsList qemu_boot_opts = {
 },
 };
 
+static QemuOptsList qemu_dimm_opts = {
+.name = "dimm",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_dimm_opts.head),
+.desc = {
+{
+.name = "id",
+.type = QEMU_OPT_STRING,
+.help = "id of this dimm device",
+},{
+.name = "size",
+.type = QEMU_OPT_SIZE,
+.help = "memory size for this dimm",
+},{
+.name = "populated",
+.type = QEMU_OPT_BOOL,
+.help = "populated for this dimm",
+},{
+.name = "node",
+.type = QEMU_OPT_NUMBER,
+.help = "NUMA node number (i.e. proximity) for this dimm",
+},
+{ /* end of list */ }
+},
+};
 static QemuOptsList *vm_config_groups[32] = {
 &qemu_drive_opts,
 &qemu_chardev_opts,
@@ -662,6 +686,7 @@ static QemuOptsList *vm_config_groups[32] = {
 &qemu_boot_opts,
 &qemu_iscsi_opts,
 &qemu_sandbox_opts,
+&qemu_dimm_opts,
 NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 804a2d1..3687722 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2842,3 +2842,8 @@ HXCOMM This is the last statement. Insert new options 
before this line!
 STEXI
 @end table
 ETEXI
+
+DEF("dimm", HAS_ARG, QEMU_OPTION_dimm,
+"-dimm id=dimmid,size=sz,node=nd,populated=on|off\n"
+"specify memory dimm device with name dimmid, size sz on node nd",
+QEMU_ARCH_ALL)
diff --git a/sysemu.h b/sysemu.h
index 65552ac..7baf9c9 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -139,6 +139,7 @@ extern QEMUClock *rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern unsigned long *node_cpumask[MAX_NODES];
+extern int nb_hp_dimms;
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 7c577fa..af1745c 100644
--- a/vl.c
+++ b/vl.c
@@ -126,6 +126,7 @@ int main(int argc, char **argv)
 #include "hw/xen.h"
 #include "hw/qdev.h"
 #include "hw/loader.h"
+#include "hw/dimm.h"
 #include "bt-host.h"
 #include "net.h"
 #include "net/slirp.h"
@@ -248,6 +249,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = 
QTAILQ_HEAD_INITIALIZER(fw_boot_order
 int nb_numa_nodes;
 uint64_t node_mem[MAX_NODES];
 unsigned long *node_cpumask[MAX_NODES];
+int nb_hp_dimms;
 
 uint8_t qemu_uuid[16];
 
@@ -530,6 +532,37 @@ static void configure_rtc_date_offset(const char 
*startdate, int legacy)
 }
 }
 
+static void configure_dimm(QemuOpts *opts)
+{
+    const char *id;
+    uint64_t size, node;
+    bool populated;
+    QemuOpts *devopts;
+    char buf[256];
+    if (nb_hp_dimms == MAX_DIMMS) {
+        fprintf(stderr, "qemu: maximum number of DIMMs (%d) exceeded\n",
+                MAX_DIMMS);
+        exit(1);
+    }
+    id = qemu_opts_id(opts);
+    size = qemu_opt_get_size(opts, "size", DEFAULT_DIMMSIZE);
+    populated = qemu_opt_get_bool(opts, "populated", 0);
+    node = qemu_opt_get_number(opts, "node", 0);
+
+    dimm_config_create((char*)id, size, node, nb_hp_dimms, 0);
+
+    if (populated) {
+        devopts = qemu_opts_create(qemu_find_opts("device"), id, 0, NULL);
+        qemu_opt_set(devopts, "driver", "dimm");
+        snprintf(buf, sizeof(buf), "%lu", size);
+        qemu_opt_set(devopts, "size", buf);
+        snprintf(buf, sizeof(buf), "%lu", node);
+        qemu_opt_set(devopts, "node", buf);
+        qemu_opt_set(devopts, "bus", "membus");
+    }
+    nb_hp_dimms++;
+}
+
 static void configure_rtc(QemuOpts *opts)
 {
 const char *value;
@@ -2354,6 +2387,8 @@ int main(int argc, char **argv, char **envp)
 DisplayChangeListener *dcl;
 int cyls, heads, secs, translation;
 QemuOpts *hda_opts = NULL, *opts, *machine_opts;
+QemuOpts *dimm_opts[MAX_DIMMS];
+int nb_dimm_opts = 0;
 QemuOptsList *olist;
 int optind;
 const char *optarg;
@@ -3288,6 +3323,18 @@ int main

[RFC PATCH v3 02/19][SeaBIOS] Add SSDT memory device support

2012-09-21 Thread Vasilis Liaskovitis

Define SSDT hotplug-able memory devices in _SB namespace. The dynamically
generated SSDT includes per memory device hotplug methods. These methods
just call methods defined in the DSDT. Also dynamically generate a MTFY
method and a MEON array of the online/available memory devices.  ACPI
extraction macros are used to place the AML code in variables later used by
src/acpi. The design is taken from SSDT cpu generation.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 Makefile |2 +-
 src/ssdt-mem.dsl |   65 ++
 2 files changed, 66 insertions(+), 1 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

diff --git a/Makefile b/Makefile
index 5486f88..e82cfc9 100644
--- a/Makefile
+++ b/Makefile
@@ -233,7 +233,7 @@ $(OUT)%.hex: src/%.dsl ./tools/acpi_extract_preprocess.py 
./tools/acpi_extract.p
$(Q)$(PYTHON) ./tools/acpi_extract.py $(OUT)$*.lst  $(OUT)$*.off
$(Q)cat $(OUT)$*.off  $@
 
-$(OUT)ccode32flat.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex 
$(OUT)ssdt-pcihp.hex $(OUT)ssdt-susp.hex
+$(OUT)ccode32flat.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex 
$(OUT)ssdt-pcihp.hex $(OUT)ssdt-susp.hex $(OUT)ssdt-mem.hex
 
  Kconfig rules
 
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
new file mode 100644
index 000..ee322f0
--- /dev/null
+++ b/src/ssdt-mem.dsl
@@ -0,0 +1,65 @@
+/* This file is the basis for the ssdt_mem[] variable in src/acpi.c.
+ * It is similar in design to the ssdt_proc variable.
+ * It defines the contents of the per-memory-device Device() object.  At
+ * runtime, a dynamically generated SSDT will contain one copy of this
+ * AML snippet for every possible memory device in the system.  The
+ * objects will be placed in the \_SB_ namespace.
+ *
+ * In addition to the aml code generated from this file, the
+ * src/acpi.c file creates a MTFY method with an entry for each memdevice:
+ * Method(MTFY, 2) {
+ * If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) }
+ * If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) }
+ * ...
+ * }
+ * and a MEON array with the list of active and inactive memory devices:
+ * Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+ */
+ACPI_EXTRACT_ALL_CODE ssdm_mem_aml
+
+DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
+/*  v-- DO NOT EDIT --v */
+{
+ACPI_EXTRACT_DEVICE_START ssdt_mem_start
+ACPI_EXTRACT_DEVICE_END ssdt_mem_end
+ACPI_EXTRACT_DEVICE_STRING ssdt_mem_name
+Device(MPAA) {
+ACPI_EXTRACT_NAME_BYTE_CONST ssdt_mem_id
+Name(ID, 0xAA)
+/*  ^-- DO NOT EDIT --^
+ *
+ * The src/acpi.c code requires the above layout so that it can update
+ * MPAA and 0xAA with the appropriate MEMDEVICE id (see
+ * SD_OFFSET_MEMHEX/MEMID1/MEMID2).  Don't change the above without
+ * also updating the C code.
+ */
+Name(_HID, EISAID("PNP0C80"))
+Name(_PXM, 0xAA)
+
+External(CMST, MethodObj)
+External(MPEJ, MethodObj)
+
+Name(_CRS, ResourceTemplate() {
+QwordMemory(
+   ResourceConsumer,
+   ,
+   MinFixed,
+   MaxFixed,
+   Cacheable,
+   ReadWrite,
+   0x0,
+   0xDEADBEEF,
+   0xE6ADBEEE,
+   0x00000000,
+   0x08000000,
+   )
+})
+Method (_STA, 0) {
+Return(CMST(ID))
+}
+Method (_EJ0, 1, NotSerialized) {
+MPEJ(ID, Arg0)
+}
+}
+}
+
-- 
1.7.9
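
The "DO NOT EDIT" comment in ssdt-mem.dsl says that src/acpi.c copies this template once per memory device and overwrites the MPAA name and the 0xAA id. A minimal sketch of that copy-and-patch step, assuming a made-up 8-byte template and made-up offsets (the real bytes and offsets come from the ACPI_EXTRACT_* output, and build_mem_device is a hypothetical name, not a SeaBIOS function):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the compiled AML of one Device(MPAA) block.
 * The real bytes and offsets come from the ACPI_EXTRACT_* macros. */
enum {
    TMPL_SIZE   = 8,
    NAME_OFFSET = 0,   /* assumed position of the name "MPAA" */
    ID_OFFSET   = 5    /* assumed position of the 0xAA id byte */
};

static const uint8_t mem_template[TMPL_SIZE] = {
    'M', 'P', 'A', 'A', 0x0A, 0xAA, 0x00, 0x00
};

/* Map the low nibble of v to an upper-case hex digit. */
static char hex_digit(unsigned v)
{
    return "0123456789ABCDEF"[v & 0xF];
}

/* Copy the template into dst (TMPL_SIZE bytes) and patch in the id. */
static void build_mem_device(uint8_t *dst, uint8_t id)
{
    assert(dst != NULL);
    memcpy(dst, mem_template, TMPL_SIZE);
    dst[NAME_OFFSET + 2] = (uint8_t)hex_digit(id >> 4);   /* "AA" -> hex id */
    dst[NAME_OFFSET + 3] = (uint8_t)hex_digit(id);
    dst[ID_OFFSET] = id;                                  /* 0xAA -> id */
}
```

Calling build_mem_device(buf, 0x1f) would yield a device named MP1F with id 0x1f. This is why the template layout must not change without also updating the C side.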


[RFC PATCH v3 19/19] alternative: Introduce paravirt interface QEMU_CFG_PCI_WINDOW

2012-09-21 Thread Vasilis Liaskovitis

Qemu already calculates the 32-bit and 64-bit PCI starting offsets based on
initial memory and hotplug-able dimms. This info needs to be passed to Seabios
for PCI initialization.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/fwcfg.txt |9 +
 hw/fw_cfg.h  |1 +
 hw/pc_piix.c |   10 ++
 3 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
index 55f96d9..d9fa215 100644
--- a/docs/specs/fwcfg.txt
+++ b/docs/specs/fwcfg.txt
@@ -26,3 +26,12 @@ Entry max_cpus+nb_numa_nodes+1 contains the number of memory 
dimms (nb_hp_dimms)
 The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet 
contains
 the physical address offset, size (in bytes), and node proximity for the
 respective dimm.
+
+FW_CFG_PCI_WINDOW paravirt info
+
+QEMU passes the starting address for the 32-bit and 64-bit PCI windows to BIOS.
+The following layouts are followed:
+
+
+pcimem32_start | pcimem64_start | 
+
diff --git a/hw/fw_cfg.h b/hw/fw_cfg.h
index 856bf91..6c8c151 100644
--- a/hw/fw_cfg.h
+++ b/hw/fw_cfg.h
@@ -27,6 +27,7 @@
 #define FW_CFG_SETUP_SIZE   0x17
 #define FW_CFG_SETUP_DATA   0x18
 #define FW_CFG_FILE_DIR 0x19
+#define FW_CFG_PCI_WINDOW   0x1a
 
 #define FW_CFG_FILE_FIRST   0x20
 #define FW_CFG_FILE_SLOTS   0x10
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index d1fd276..034761f 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -44,6 +44,7 @@
 #include "memory.h"
 #include "exec-memory.h"
 #include "dimm.h"
+#include "fw_cfg.h"
 #ifdef CONFIG_XEN
 #  include <xen/hvm/hvm_info_table.h>
 #endif
@@ -149,6 +150,7 @@ static void pc_init1(MemoryRegion *system_memory,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 void *fw_cfg = NULL;
+uint64_t *pci_window_fw_cfg;
 
 pc_cpus_init(cpu_model);
 
@@ -205,6 +207,14 @@ static void pc_init1(MemoryRegion *system_memory,
? 0
: ((uint64_t)1 << 62)),
   pci_memory, ram_memory);
+
+pci_window_fw_cfg = g_malloc0(2 * 8);
+pci_window_fw_cfg[0] = cpu_to_le64(below_4g_mem_size +
+below_4g_hp_mem_size);
+pci_window_fw_cfg[1] = cpu_to_le64(0x100000000ULL + above_4g_mem_size
++ above_4g_hp_mem_size);
+fw_cfg_add_bytes(fw_cfg, FW_CFG_PCI_WINDOW, 
+(uint8_t *)pci_window_fw_cfg, 2 * 8);
 } else {
 pci_bus = NULL;
 i440fx_state = NULL;
-- 
1.7.9
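
The FW_CFG_PCI_WINDOW entry is a 16-byte blob holding two little-endian 64-bit values: the 32-bit window start followed by the 64-bit window start. A self-contained sketch of both sides of that layout, assuming the sums used in the pc_piix.c hunk above; pci_window_pack and pci_window_unpack are hypothetical helper names, not qemu or SeaBIOS functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOUR_GIB 0x100000000ULL

/* Store v into buf as 8 little-endian bytes. */
static void put_le64(uint8_t *buf, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Read 8 little-endian bytes from buf. */
static uint64_t get_le64(const uint8_t *buf)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)buf[i] << (8 * i);
    }
    return v;
}

/* Producer side: pcimem32_start | pcimem64_start. */
static void pci_window_pack(uint8_t blob[16],
                            uint64_t below_4g_mem, uint64_t below_4g_hp,
                            uint64_t above_4g_mem, uint64_t above_4g_hp)
{
    assert(blob != NULL);
    put_le64(blob, below_4g_mem + below_4g_hp);
    put_le64(blob + 8, FOUR_GIB + above_4g_mem + above_4g_hp);
}

/* Consumer side: recover both window starts from the blob. */
static void pci_window_unpack(const uint8_t blob[16],
                              uint64_t *pcimem_start, uint64_t *pcimem64_start)
{
    assert(blob != NULL && pcimem_start != NULL && pcimem64_start != NULL);
    *pcimem_start = get_le64(blob);
    *pcimem64_start = get_le64(blob + 8);
}
```

With 2G of initial RAM plus 512M of hotplug-able dimms below 4G, the first value would be 0xA0000000, so the 32-bit PCI window starts above the last address a dimm could occupy.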


[RFC PATCH v3 20/19][SeaBIOS] alternative: Use paravirt interface for pci windows

2012-09-21 Thread Vasilis Liaskovitis

Initialize the 32-bit and 64-bit pci starting offsets from values passed in by
the qemu paravirt interface QEMU_CFG_PCI_WINDOW. Qemu calculates the starting
offsets based on initial memory and hotplug-able dimms.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/paravirt.c |6 ++
 src/paravirt.h |2 ++
 src/pciinit.c  |5 ++---
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/paravirt.c b/src/paravirt.c
index 2a98d53..390ef30 100644
--- a/src/paravirt.c
+++ b/src/paravirt.c
@@ -346,3 +346,9 @@ void qemu_cfg_romfile_setup(void)
 dprintf(3, "Found fw_cfg file: %s (size=%d)\n", file->name, file->size);
 }
 }
+
+void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start)
+{
+qemu_cfg_read_entry(pcimem_start, QEMU_CFG_PCI_WINDOW, sizeof(u64));
+qemu_cfg_read((u8*)(pcimem64_start), sizeof(u64));
+}
diff --git a/src/paravirt.h b/src/paravirt.h
index a284c41..b53ff88 100644
--- a/src/paravirt.h
+++ b/src/paravirt.h
@@ -35,6 +35,7 @@ static inline int kvm_para_available(void)
 #define QEMU_CFG_BOOT_MENU  0x0e
 #define QEMU_CFG_MAX_CPUS   0x0f
 #define QEMU_CFG_FILE_DIR   0x19
+#define QEMU_CFG_PCI_WINDOW 0x1a
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES(QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES (QEMU_CFG_ARCH_LOCAL + 1)
@@ -65,5 +66,6 @@ struct e820_reservation {
 u32 qemu_cfg_e820_entries(void);
 void* qemu_cfg_e820_load_next(void *addr);
 void qemu_cfg_romfile_setup(void);
+void qemu_cfg_get_pci_offsets(u64 *pcimem_start, u64 *pcimem64_start);
 
 #endif
diff --git a/src/pciinit.c b/src/pciinit.c
index 68f302a..64468a0 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -592,8 +592,7 @@ static void pci_region_map_entries(struct pci_bus *busses, 
struct pci_region *r)
 
 static void pci_bios_map_devices(struct pci_bus *busses)
 {
-pcimem_start = RamSize;
-
+qemu_cfg_get_pci_offsets(&pcimem_start, &pcimem64_start);
 if (pci_bios_init_root_regions(busses)) {
 struct pci_region r64_mem, r64_pref;
 r64_mem.list = NULL;
@@ -611,7 +610,7 @@ static void pci_bios_map_devices(struct pci_bus *busses)
 u64 align_mem = pci_region_align(&r64_mem);
 u64 align_pref = pci_region_align(&r64_pref);
 
-r64_mem.base = ALIGN(0x100000000LL + RamSizeOver4G, align_mem);
+r64_mem.base = ALIGN(pcimem64_start, align_mem);
 r64_pref.base = ALIGN(r64_mem.base + sum_mem, align_pref);
 pcimem64_start = r64_mem.base;
 pcimem64_end = r64_pref.base + sum_pref;
-- 
1.7.9


Re: [RFC PATCH v2 03/21][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug

2012-07-20 Thread Vasilis Liaskovitis

On Tue, Jul 17, 2012 at 03:23:00PM +0800, Wen Congyang wrote:
  +Method(MESC, 0) {
  +// Local5 = active memdevice bitmap
  +Store (MES, Local5)
  +// Local2 = last read byte from bitmap
  +Store (Zero, Local2)
  +// Local0 = memory device iterator
  +Store (Zero, Local0)
  +While (LLess(Local0, SizeOf(MEON))) {
  +// Local1 = MEON flag for this memory device
  +Store(DerefOf(Index(MEON, Local0)), Local1)
  +If (And(Local0, 0x07)) {
  +// Shift down previously read bitmap byte
  +ShiftRight(Local2, 1, Local2)
  +} Else {
  +// Read next byte from memdevice bitmap
  +Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), 
  Local2)
  +}
  +// Local3 = active state for this memory device
  +Store(And(Local2, 1), Local3)
  +
  +If (LNotEqual(Local1, Local3)) {
 
 There are two ways to hot remove a memory device:
 1. dimm_del
 2. echo 1 > /sys/bus/acpi/devices/PNP0C80:XX/eject
 
 In the 2nd case, we cannot hotplug this memory device again,
 because both Local1 and Local3 are 1.
 
 So, I think MEON flag for this memory device should be set to 0 in method _EJ0
 or implement method _PS3 for memory device.

good catch. Both internal seabios state (MEON) and the machine qemu bitmap
(mems_sts in hw/acpi_piix4.c) have to be updated when the ejection comes from
OSPM action. I will implement a _PS3 method that updates the MEON flag and also
signals qemu to change the mems_sts bitmap.
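
To make the failure mode concrete, here is a minimal C model of a single slot, assuming one MEON flag on the BIOS side and one mems_sts bit on the qemu side. All names are hypothetical; the sync_state switch stands in for the proposed fix of updating both pieces of state when the ejection is initiated by OSPM:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One memory slot: BIOS-side MEON flag and qemu-side status bit. */
struct dimm_slot {
    bool meon;      /* what the MESC scan last reported to the OS */
    bool sts_bit;   /* what qemu currently advertises in mems_sts */
};

/* MESC-style check: notify only when the two views differ. */
static bool scan_slot(struct dimm_slot *s)
{
    assert(s != NULL);
    if (s->meon != s->sts_bit) {
        s->meon = s->sts_bit;
        return true;    /* a Notify would be sent */
    }
    return false;       /* no state change seen, no Notify */
}

/* qemu-side hot-add: advertise the slot as populated. */
static void hot_add(struct dimm_slot *s)
{
    assert(s != NULL);
    s->sts_bit = true;
}

/* Eject initiated by OSPM (_EJ0). Without sync_state neither view changes. */
static void ospm_eject(struct dimm_slot *s, bool sync_state)
{
    assert(s != NULL);
    if (sync_state) {
        s->meon = false;
        s->sts_bit = false;
    }
}
```

Without the sync, a hot_add after an OSPM-initiated eject leaves both flags at 1, so scan_slot returns false and the guest never hears about the new dimm. With the sync, the same sequence produces a notification again.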

thanks,
- Vasilis


Re: [Qemu-devel] [RFC PATCH v2 06/21] dimm: Implement memory device abstraction

2012-07-13 Thread Vasilis Liaskovitis

Hi,

On Thu, Jul 12, 2012 at 07:55:42PM +, Blue Swirl wrote:
 On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
 vasilis.liaskovi...@profitbricks.com wrote:
  Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
  particular dimm creates a new MemoryRegion of the given physical address
  offset, size and node proximity, and attaches it to main system memory as a
  sub_region. A hot-remove operation detaches and frees the MemoryRegion from
  system memory.
 
  This prototype still lacks proper qdev integration: a separate
  hotplug side-channel is used and main system bus hotplug capability is
  ignored.
 
  Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
  ---
   hw/Makefile.objs |2 +-
   hw/dimm.c|  234 
  ++
   hw/dimm.h|   58 +
   3 files changed, 293 insertions(+), 1 deletions(-)
   create mode 100644 hw/dimm.c
   create mode 100644 hw/dimm.h
 
  diff --git a/hw/Makefile.objs b/hw/Makefile.objs
  index 3d77259..e2184bf 100644
  --- a/hw/Makefile.objs
  +++ b/hw/Makefile.objs
  @@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
   hw-obj-$(CONFIG_PCSPK) += pcspk.o
   hw-obj-$(CONFIG_PCKBD) += pckbd.o
   hw-obj-$(CONFIG_FDC) += fdc.o
  -hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
  +hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
   hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
   hw-obj-$(CONFIG_DMA) += dma.o
   hw-obj-$(CONFIG_I82374) += i82374.o
  diff --git a/hw/dimm.c b/hw/dimm.c
  new file mode 100644
  index 000..00c4623
  --- /dev/null
  +++ b/hw/dimm.c
  @@ -0,0 +1,234 @@
  +/*
  + * Dimm device for Memory Hotplug
  + *
  + * Copyright ProfitBricks GmbH 2012
  + * This library is free software; you can redistribute it and/or
  + * modify it under the terms of the GNU Lesser General Public
  + * License as published by the Free Software Foundation; either
  + * version 2 of the License, or (at your option) any later version.
  + *
  + * This library is distributed in the hope that it will be useful,
  + * but WITHOUT ANY WARRANTY; without even the implied warranty of
  + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  + * Lesser General Public License for more details.
  + *
  + * You should have received a copy of the GNU Lesser General Public
  + * License along with this library; if not, see 
  http://www.gnu.org/licenses/
  + */
  +
  +#include trace.h
  +#include qdev.h
  +#include dimm.h
  +#include time.h
  +#include ../exec-memory.h
  +#include qmp-commands.h
  +
  +static DeviceState *dimm_hotplug_qdev;
  +static dimm_hotplug_fn dimm_hotplug;
  +static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
 
 Using global state does not look right. It should always be possible
 to pass around structures to avoid it.

ok, I 'll try to remove the global state.

 
  +
  +static Property dimm_properties[] = {
  +DEFINE_PROP_END_OF_LIST()
  +};
  +
  +void dimm_populate(DimmState *s)
 
 All functions are global and exported but there does not seem to be
 users. Please make all static which you can.

will do

 
  +{
  +DeviceState *dev= (DeviceState*)s;
  +MemoryRegion *new = NULL;
  +
  +new = g_malloc(sizeof(MemoryRegion));
  +memory_region_init_ram(new, dev-id, s-size);
  +vmstate_register_ram_global(new);
  +memory_region_add_subregion(get_system_memory(), s-start, new);
  +s-mr = new;
  +s-populated = true;
  +}
  +
  +
  +void dimm_depopulate(DimmState *s)
  +{
  +assert(s);
  +if (s-populated) {
  +vmstate_unregister_ram(s-mr, NULL);
  +memory_region_del_subregion(get_system_memory(), s-mr);
  +memory_region_destroy(s-mr);
  +s-populated = false;
  +s-mr = NULL;
  +}
  +}
  +
  +DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
  +dimm_idx, bool populated)
  +{
  +DeviceState *dev;
  +DimmState *mdev;
  +
  +dev = sysbus_create_simple(dimm, -1, NULL);
  +dev-id = id;
  +
  +mdev = DIMM(dev);
  +mdev-idx = dimm_idx;
  +mdev-start = 0;
  +mdev-size = size;
  +mdev-node = node;
  +mdev-populated = populated;
  +QTAILQ_INSERT_TAIL(dimmlist, mdev, nextdimm);
  +return mdev;
  +}
  +
  +void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
  +{
  +dimm_hotplug_qdev = qdev;
  +dimm_hotplug = hotplug;
  +dimm_scan_populated();
  +}
  +
  +void dimm_activate(DimmState *slot)
  +{
  +dimm_populate(slot);
  +if (dimm_hotplug)
  +dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);
 
 Why the cast?

dimm_hotplug accepts SysBusDevice, not DimmState, though that can be changed.
 
 Also braces, please check your patches with checkpatch.pl.


ok, I 'll do checks with checkpatch.pl. 

  +}
  +
  +void dimm_deactivate(DimmState *slot)
  +{
  +if (dimm_hotplug)
  +dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot

Re: [Qemu-devel] [RFC PATCH v2 00/21] ACPI memory hotplug

2012-07-13 Thread Vasilis Liaskovitis

On Thu, Jul 12, 2012 at 08:04:56PM +, Blue Swirl wrote:
 On Wed, Jul 11, 2012 at 10:31 AM, Vasilis Liaskovitis
 vasilis.liaskovi...@profitbricks.com wrote:
  This is v2 of the ACPI memory hotplug prototype for x86_64 target.
 
 I think the concept of DIMMs (what about SIMMs? SODIMMs? I liked
 memslot) would be useful for most targets, but hotplugging may be
 limited to x86 only. It would be nice to keep these two separate or as
 loosely coupled as possible.

agreed.
what specific usecases besides hotplugging are you thinking about? 
Also are there non-acpi hotplug platforms?

I am trying to keep generic dimm manipulation functions (e.g. population /
depopulation and searching) in hw/dimm[.ch]. Currently the x86-acpi_piix4 
backend
registers a callback for hot-add / hot-remove. In theory other hotplug backends
can hook in. 

btw I don't mind using -memslot (I think someone during v1 mentioned -dimm), 
we just
need some consensus on the naming.

 
 
  Changes v1-v2
 
  - memory map is automatically calculated for hotplug dimms. Dimms are added 
  from
  top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
  - Renamed from -memslot to -dimm. Commands changed to dimm_add, 
  dimm_del.
  - Seabios ejection array reduced to a byte. Use extraction macros for dimm 
  ssdt.
  - additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
  - Documentation of new acpi_piix4 registers and paravirt data.
  - add ACPI _OST support for _OST enabled guests. This allows qemu to receive
  notification for success / failure of memory hot-add and hot-remove 
  operations.
  Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
  - add monitor info command to report total guest memory (initial + 
  hot-added)
  - add command line options and monitor commands for batch dimm 
  creation/population
 
  Overview:
 
  Dimm devices are modeled with a new qemu command line
 
  -dimm id=name,size=sz,node=pxm,populated=on|off
 
  As already mentioned, the starting physical address for all dimms is 
  calculated
  automatically from top of memory, skipping the pci hole at [PCI_HOLE_START, 
  4G).
  Node is defining numa proximity for this dimm. When not defined it defaults
  to zero.
  -dimm id=dimm0,size=512M,node=0,populated=off
  will define a 512M memory slot belonging to numa node 0.
 
  Dimms are added or removed with a new hmp command dimm_add/dimm_del:
  Hot-add syntax: dimm_add id
  Hot-remove syntax: dimm_del id
 
  Issues:
 
  - Live migration works as long as populated field is changed to on for
  hotplugged dimms at the destination qemu command line (patch 12/21 lifts
  this requirement). The DimmState structure does not yet define a
  VMStateDescription, but i assume this is the preferred way to pass state
  for migration.
 
  - Dimms are abstracted as qdevices attached to the main system bus. However,
  memory hotplugging has its own side channel ignoring main_system_bus's 
  hotplug
  incapability. A cleaner integration is still needed, probably attaching 
  memory
  devices as children-links of an acpi-capable device (in the pc case 
  acpi_piix4)
  instead of the system bus (TBD). Then device_add/device_del instead of new
  commands can hopefully be used.
 
  Comments/review welcome.
 
  series is based on uq/master for qemu-kvm, and master for seabios. Can be 
  found
  also at:
  http://github.com/vliaskov/qemu-kvm/commits/memhp-v2
  http://github.com/vliaskov/seabios/commits/memhp-v2
 
  Vasilis Liaskovitis (14):
dimm: Implement memory device abstraction
acpi_piix4: Implement memory device hotplug registers
pc: calculate dimm physical addresses and adjust memory map
pc: Add dimm paravirt SRAT info
Implement -dimm command line option
Implement dimm_add and dimm_del commands for hmp and qmp
fix live-migration when populated=on is missing
Implement memory hotplug notification lists
acpi_piix4: _OST dimm support
acpi_piix4: Update dimm state on VM reboot
acpi_piix4: Update dimm bitmap state on hot-remove fail
Implement info memtotal and query-memtotal
Implement -dimms, -dimmspop command line options
Implement mem_increase, mem_decrease hmp/qmp commands
 
   arch_init.c |   23 ++-
   docs/specs/acpi_hotplug.txt |   46 +
   docs/specs/fwcfg.txt|   28 +++
   hmp-commands.hx |   67 +++
   hmp.c   |   24 +++
   hmp.h   |2 +
   hw/Makefile.objs|2 +-
   hw/acpi_piix4.c |  131 -
   hw/dimm.c   |  449 
  +++
   hw/dimm.h   |   72 +++
   hw/pc.c |   94 +-
   hw/pc.h |6 +
   hw/pc_piix.c|   18 ++-
   monitor.c   |   35 
   monitor.h   |5 +
   qapi-schema.json|   38 
   qemu-config.c   |   70

Re: [RFC PATCH v2 05/21][SeaBIOS] pciinit: Fix pcimem_start value

2012-07-12 Thread Vasilis Liaskovitis

On Thu, Jul 12, 2012 at 09:22:14AM +0200, Gerd Hoffmann wrote:
 On 07/11/12 18:45, Vasilis Liaskovitis wrote:
  Hi,
  
  On Wed, Jul 11, 2012 at 01:56:19PM +0200, Gerd Hoffmann wrote:
  On 07/11/12 12:31, Vasilis Liaskovitis wrote:
  In order to hotplug memory between RamSize and BUILD_PCIMEM_START, the pci
  window needs to start at BUILD_PCIMEM_START (0xe0000000).
  Otherwise, the guest cannot online new dimms at those ranges due to 
  pci_root
  window conflicts. (workaround for linux guest is booting with pci=nocrs)
 
   static void pci_bios_map_devices(struct pci_bus *busses)
   {
  -pcimem_start = RamSize;
  +pcimem_start = BUILD_PCIMEM_START;
 
  It isn't that simple.  For the 32bit pci window it will work, but it will
  leave address space unused instead of assigning it to the 32bit pci
  window.  For the 64bit pci window it will not work.
 
  You have to walk the dimms and figure what the highest used address is,
  for both below-4g and above-4g.  Then fill two variable with it and make
  the pci init code use that instead of RamSize and RamSizeOver4G.
  
  I see. I already have these values computed in qemu-kvm, so I can 
  pass
  them in a paravirt struct, or infer them from the dimm/srat paravirt info 
  that I
  already pass to seabios. 
 
 I'd suggest to infer from the dimm info, to limit the amount of
 information which needs to be passed from qemu to seabios.

ok. Currently dimm info is processed in bios_init_tables(), which is called after
pci_setup(). I'll see if I can do the processing earlier.

 
  If i understand correctly, we would like the pcimem windows to use the 
  maximum
  possible address space (constrained by the exact dimms/ranges which are 
  defined)
  instead of leaving unused space.
 
 Yes, for the 32bit pci window.
 
 The 64bit pci window is mapped above all memory, and it must likewise
 consider defined+unfilled dimms so the start address doesn't collide
 with memory hot-plugged above 4G later on.

yes, understood.
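
The walk Gerd describes can be sketched as follows, assuming each dimm is given as a start/size pair, that no dimm straddles the 4G boundary, and that defined-but-unpopulated dimms are included in the list. struct dimm_range and pci_window_starts are hypothetical names, not SeaBIOS code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOUR_GIB 0x100000000ULL

/* One defined dimm, populated or not (assumed layout). */
struct dimm_range {
    uint64_t start;
    uint64_t size;
};

/* Find the highest used address below and above 4G. These replace
 * RamSize and 4G + RamSizeOver4G as the two PCI window start values. */
static void pci_window_starts(const struct dimm_range *dimms, size_t n,
                              uint64_t ram_below_4g, uint64_t ram_over_4g,
                              uint64_t *pcimem_start, uint64_t *pcimem64_start)
{
    uint64_t lo = ram_below_4g;
    uint64_t hi = FOUR_GIB + ram_over_4g;

    assert(dimms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t end = dimms[i].start + dimms[i].size;
        if (dimms[i].start < FOUR_GIB) {
            if (end > lo) {
                lo = end;
            }
        } else if (end > hi) {
            hi = end;
        }
    }
    *pcimem_start = lo;
    *pcimem64_start = hi;
}
```

With 2G of initial RAM, a 512M dimm at 2G and a 1G dimm at 4G, this gives 0xA0000000 for the 32-bit window and 0x140000000 for the 64-bit window.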

thanks,

- Vasilis

[RFC PATCH v2 00/21] ACPI memory hotplug

2012-07-11 Thread Vasilis Liaskovitis

This is v2 of the ACPI memory hotplug prototype for x86_64 target.

Changes v1-v2

- memory map is automatically calculated for hotplug dimms. Dimms are added from
top-of-memory skipping the pci hole at [PCI_HOLE_START, 4G).
- Renamed from -memslot to -dimm. Commands changed to dimm_add, 
dimm_del.
- Seabios ejection array reduced to a byte. Use extraction macros for dimm ssdt.
- additional SRAT paravirt info does not break previous SRAT fw_cfg layout.
- Documentation of new acpi_piix4 registers and paravirt data.
- add ACPI _OST support for _OST enabled guests. This allows qemu to receive
notification for success / failure of memory hot-add and hot-remove operations.
Guest needs to support _OST (https://lkml.org/lkml/2012/6/25/321)
- add monitor info command to report total guest memory (initial + hot-added)
- add command line options and monitor commands for batch dimm 
creation/population

Overview:

Dimm devices are modeled with a new qemu command line 

-dimm id=name,size=sz,node=pxm,populated=on|off

As already mentioned, the starting physical address for all dimms is calculated
automatically from top of memory, skipping the pci hole at [PCI_HOLE_START, 4G).
Node is defining numa proximity for this dimm. When not defined it defaults
to zero.
-dimm id=dimm0,size=512M,node=0,populated=off
will define a 512M memory slot belonging to numa node 0.
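
As a rough illustration of the automatic placement (not the code from the series), the following sketch assigns each dimm the next free address below the pci hole if it fits, and otherwise places it above 4G. PCI_HOLE_START is assumed to be 0xe0000000 here, and place_dimm is a hypothetical helper; alignment constraints are ignored:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_HOLE_START 0xe0000000ULL   /* assumed value for this sketch */
#define FOUR_GIB       0x100000000ULL

/* Return the start address for a dimm of the given size and advance the
 * matching cursor. *next_below starts at the top of initial RAM below 4G,
 * *next_above at the top of initial RAM above 4G (at least FOUR_GIB). */
static uint64_t place_dimm(uint64_t *next_below, uint64_t *next_above,
                           uint64_t size)
{
    uint64_t start;

    assert(next_below != NULL && next_above != NULL);
    assert(*next_above >= FOUR_GIB);
    if (*next_below + size <= PCI_HOLE_START) {
        /* Fits entirely below the pci hole. */
        start = *next_below;
        *next_below += size;
    } else {
        /* Would overlap [PCI_HOLE_START, 4G): skip above 4G. */
        start = *next_above;
        *next_above += size;
    }
    return start;
}
```

Starting with 3G of initial RAM, three 512M dimms would land at 0xC0000000, 0x100000000 and 0x120000000 in this model.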

Dimms are added or removed with a new hmp command dimm_add/dimm_del:
Hot-add syntax: dimm_add id
Hot-remove syntax: dimm_del id

Issues:

- Live migration works as long as populated field is changed to on for
hotplugged dimms at the destination qemu command line (patch 12/21 lifts
this requirement). The DimmState structure does not yet define a
VMStateDescription, but i assume this is the preferred way to pass state
for migration.

- Dimms are abstracted as qdevices attached to the main system bus. However,
memory hotplugging has its own side channel ignoring main_system_bus's hotplug
incapability. A cleaner integration is still needed, probably attaching memory
devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
instead of the system bus (TBD). Then device_add/device_del instead of new
commands can hopefully be used.

Comments/review welcome.

series is based on uq/master for qemu-kvm, and master for seabios. Can be found
also at:
http://github.com/vliaskov/qemu-kvm/commits/memhp-v2
http://github.com/vliaskov/seabios/commits/memhp-v2

Vasilis Liaskovitis (14):
  dimm: Implement memory device abstraction
  acpi_piix4: Implement memory device hotplug registers
  pc: calculate dimm physical addresses and adjust memory map
  pc: Add dimm paravirt SRAT info
  Implement -dimm command line option
  Implement dimm_add and dimm_del commands for hmp and qmp
  fix live-migration when populated=on is missing
  Implement memory hotplug notification lists
  acpi_piix4: _OST dimm support
  acpi_piix4: Update dimm state on VM reboot
  acpi_piix4: Update dimm bitmap state on hot-remove fail
  Implement info memtotal and query-memtotal
  Implement -dimms, -dimmspop command line options
  Implement mem_increase, mem_decrease hmp/qmp commands

 arch_init.c |   23 ++-
 docs/specs/acpi_hotplug.txt |   46 +
 docs/specs/fwcfg.txt|   28 +++
 hmp-commands.hx |   67 +++
 hmp.c   |   24 +++
 hmp.h   |2 +
 hw/Makefile.objs|2 +-
 hw/acpi_piix4.c |  131 -
 hw/dimm.c   |  449 +++
 hw/dimm.h   |   72 +++
 hw/pc.c |   94 +-
 hw/pc.h |6 +
 hw/pc_piix.c|   18 ++-
 monitor.c   |   35 
 monitor.h   |5 +
 qapi-schema.json|   38 
 qemu-config.c   |   70 +++
 qemu-options.hx |   15 ++
 qmp-commands.hx |  137 +
 sysemu.h|1 +
 vl.c|  122 -
 21 files changed, 1368 insertions(+), 17 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt
 create mode 100644 docs/specs/fwcfg.txt
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h

Vasilis Liaskovitis (7):
  Add ACPI_EXTRACT_DEVICE* macros
  Add SSDT memory device support
  acpi-dsdt: Implement functions for memory hotplug.
  acpi: generate hotplug memory devices.
  pciinit: Fix pcimem_start value
  acpi_dsdt: Support _OST dimm method
  acpi_dsdt: Revert internal dimm state on _OST failure
 
 Makefile  |2 +-
 src/acpi-dsdt.dsl |  120 -
 src/acpi.c|  158 +++--
 src/pciinit.c |2 +-
 src/ssdt-mem.dsl  |   69 +
 tools/acpi_extract.py |   28 +
 6 files changed, 369 insertions(+), 10 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

[RFC PATCH v2 03/21][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug

2012-07-11 Thread Vasilis Liaskovitis

Extend the DSDT to include methods for handling memory hot-add and hot-remove
notifications and memory device status requests. These functions are called
from the memory device SSDT methods.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |   70 +++-
 1 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 2060686..5d3e92b 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -737,6 +737,71 @@ DefinitionBlock (
 }
 Return(One)
 }
+/* Objects filled in by run-time generated SSDT */
+External(MTFY, MethodObj)
+External(MEON, PkgObj)
+
+Method (CMST, 1, NotSerialized) {
+// _STA method - return ON status of memdevice
+// Local0 = MEON flag for this memory device
+Store(DerefOf(Index(MEON, Arg0)), Local0)
+If (Local0) { Return(0xF) } Else { Return(0x0) }
+}
+
+/* Memory hotplug notify array */
+OperationRegion(MEST, SystemIO, 0xaf80, 32)
+Field (MEST, ByteAcc, NoLock, Preserve)
+{
+MES, 256
+}
+ 
+/* Memory eject byte */
+OperationRegion(MEMJ, SystemIO, 0xafa0, 1)
+Field (MEMJ, ByteAcc, NoLock, Preserve)
+{
+MPE, 8
+}
+
+Method(MESC, 0) {
+// Local5 = active memdevice bitmap
+Store (MES, Local5)
+// Local2 = last read byte from bitmap
+Store (Zero, Local2)
+// Local0 = memory device iterator
+Store (Zero, Local0)
+While (LLess(Local0, SizeOf(MEON))) {
+// Local1 = MEON flag for this memory device
+Store(DerefOf(Index(MEON, Local0)), Local1)
+If (And(Local0, 0x07)) {
+// Shift down previously read bitmap byte
+ShiftRight(Local2, 1, Local2)
+} Else {
+// Read next byte from memdevice bitmap
+Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
+}
+// Local3 = active state for this memory device
+Store(And(Local2, 1), Local3)
+
+If (LNotEqual(Local1, Local3)) {
+// State change - update MEON with new state
+Store(Local3, Index(MEON, Local0))
+// Do MEM notify
+If (LEqual(Local3, 1)) {
+MTFY(Local0, 1)
+} Else {
+MTFY(Local0, 3)
+}
+}
+Increment(Local0)
+}
+Return(One)
+}
+
+Method (MPEJ, 2, NotSerialized) {
+// _EJ0 method - eject callback
+Store(Arg0, MPE)
+Sleep(200)
+}
 }
 
 
@@ -759,8 +824,9 @@ DefinitionBlock (
 // CPU hotplug event
 Return(\_SB.PRSC())
 }
-Method(_L03) {
-Return(0x01)
+Method(_E03) {
+// Memory hotplug event
+Return(\_SB.MESC())
 }
 Method(_L04) {
 Return(0x01)
-- 
1.7.9
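
For readers less familiar with ASL, the MESC scan above can be restated in C. This is only a sketch under the assumption that the 32-byte status region holds one bit per slot, least significant bit first; mesc_scan, notify_fn and record_notify are hypothetical names. The Notify values follow the patch: 1 for a device check on hot-add, 3 for an eject request on hot-remove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*notify_fn)(unsigned slot, unsigned event, void *opaque);

/* Small recorder so the scan can be exercised without a real guest. */
struct notify_log {
    unsigned count;
    unsigned slot[8];
    unsigned event[8];
};

static void record_notify(unsigned slot, unsigned event, void *opaque)
{
    struct notify_log *l = opaque;
    if (l->count < 8) {
        l->slot[l->count] = slot;
        l->event[l->count] = event;
    }
    l->count++;
}

/* Compare each slot's bit in the status bitmap with its MEON flag,
 * update MEON on a change and send the matching notification.
 * Returns the number of slots whose state changed. */
static unsigned mesc_scan(const uint8_t bitmap[32], uint8_t *meon,
                          unsigned ndev, notify_fn notify, void *opaque)
{
    unsigned changed = 0;

    assert(bitmap != NULL && meon != NULL && notify != NULL);
    assert(ndev <= 256);
    for (unsigned i = 0; i < ndev; i++) {
        uint8_t active = (bitmap[i >> 3] >> (i & 7)) & 1;
        if (meon[i] != active) {
            meon[i] = active;
            notify(i, active ? 1 : 3, opaque);
            changed++;
        }
    }
    return changed;
}
```

The ASL version reads one byte per eight slots and shifts it down as it goes; indexing the bit directly, as done here, is equivalent and a little easier to follow.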

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v2 07/21] acpi_piix4: Implement memory device hotplug registers

2012-07-11 Thread Vasilis Liaskovitis

A 32-byte register is used to present up to 256 hotplug-able memory devices
to BIOS and OSPM. Hot-add and hot-remove functions trigger an ACPI hotplug
event through these. Only reads are allowed from these registers.

A hot-remove operation triggers an ACPI event, but qemu needs to wait for OSPM
to eject the device before it can depopulate the dimm.
We use a single-byte register to know when OSPM has called the _EJ0 method
for a particular dimm. A write to this byte will depopulate the respective dimm.
Only writes are allowed to this byte.

v1-v2:
mems_sts address moved from 0xaf20 to 0xaf80 (to accommodate more space for
cpu-hotplugging in the future).
_EJ array is reduced to a single byte.
Add documentation in docs/specs/acpi_hotplug.txt

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/acpi_hotplug.txt |   22 +
 hw/acpi_piix4.c |   73 --
 2 files changed, 91 insertions(+), 4 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
new file mode 100644
index 000..cf86242
--- /dev/null
+++ b/docs/specs/acpi_hotplug.txt
@@ -0,0 +1,22 @@
+QEMU-ACPI BIOS hotplug interface
+--
+This document describes the interface between QEMU and the ACPI BIOS for non-PCI
+space. For the PCI interface please look at docs/specs/acpi_pci_hotplug.txt
+
+QEMU-ACPI BIOS memory hotplug interface
+--
+
+Memory Dimm status array (IO port 0xaf80-0xaf9f, 1-byte access):
+---
+Dimm hot-plug notification pending. One bit per slot.
+
+Read by ACPI BIOS GPE.3 handler to notify OS of memory hot-add or hot-remove
+events.  Read-only.
+
+Memory Dimm ejection success notification (IO port 0xafa0, 1-byte access):
+---
+Dimm hot-remove _EJ0 notification. Byte value indicates Dimm slot that was
+ejected.
+
+Written by ACPI memory device _EJ0 method to notify qemu of successful
+hot-removal.  Write-only.
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 0aace60..b988597 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -28,6 +28,8 @@
 #include "range.h"
 #include "ioport.h"
 #include "fw_cfg.h"
+#include "sysbus.h"
+#include "dimm.h"
 
 //#define DEBUG
 
@@ -45,9 +47,15 @@
 #define PCI_DOWN_BASE 0xae04
 #define PCI_EJ_BASE 0xae08
 #define PCI_RMV_BASE 0xae0c
+#define MEM_BASE 0xaf80
+#define MEM_EJ_BASE 0xafa0
 
+#define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 
+struct gpe_regs {
+uint8_t mems_sts[DIMM_BITMAP_BYTES];
+};
 struct pci_status {
 uint32_t up; /* deprecated, maintained for migration compatibility */
 uint32_t down;
@@ -69,6 +77,7 @@ typedef struct PIIX4PMState {
 Notifier machine_ready;
 
 /* for pci hotplug */
+struct gpe_regs gperegs;
 struct pci_status pci0_status;
 uint32_t pci0_hotplug_enable;
 uint32_t pci0_slot_device_present;
@@ -93,8 +102,8 @@ static void pm_update_sci(PIIX4PMState *s)
                    ACPI_BITMASK_POWER_BUTTON_ENABLE |
                    ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
                    ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
-        (((s->ar.gpe.sts[0] & s->ar.gpe.en[0])
-          & PIIX4_PCI_HOTPLUG_STATUS) != 0);
+        (((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) &
+          (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_MEM_HOTPLUG_STATUS)) != 0);
 
     qemu_set_irq(s->irq, sci_level);
     /* schedule a timer interruption if needed */
@@ -499,7 +508,16 @@ type_init(piix4_pm_register_types)
 static uint32_t gpe_readb(void *opaque, uint32_t addr)
 {
     PIIX4PMState *s = opaque;
-    uint32_t val = acpi_gpe_ioport_readb(&s->ar, addr);
+    uint32_t val = 0;
+    struct gpe_regs *g = &s->gperegs;
+
+    switch (addr) {
+    case MEM_BASE ... MEM_BASE + DIMM_BITMAP_BYTES - 1:
+        val = g->mems_sts[addr - MEM_BASE];
+        break;
+    default:
+        val = acpi_gpe_ioport_readb(&s->ar, addr);
+    }
 
     PIIX4_DPRINTF("gpe read %x == %x\n", addr, val);
     return val;
@@ -509,7 +527,13 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     PIIX4PMState *s = opaque;
 
-    acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    switch (addr) {
+    case MEM_EJ_BASE:
+        dimm_notify(val, DIMM_REMOVE_SUCCESS);
+        break;
+    default:
+        acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    }
     pm_update_sci(s);
 
     PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
@@ -560,9 +584,11 @@ static uint32_t pcirmv_read(void *opaque, uint32_t addr)
 
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 PCIHotplugState state);
+static int piix4_dimm_hotplug(DeviceState *qdev, SysBusDevice *dev, int add);
 
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 {
+int i = 0;
 
 register_ioport_write(GPE_BASE, GPE_LEN, 1, gpe_writeb

[RFC PATCH v2 13/21] Implement memory hotplug notification lists

2012-07-11 Thread Vasilis Liaskovitis

The guest can respond to ACPI hotplug events, e.g. with the _EJ0 or _OST method.
This patch implements a tail queue to store guest notifications for memory
hot-add and hot-remove requests.

Guest responses to memory hotplug commands can be inspected on a per-dimm basis
with the new hmp command "info memhp" or the new qmp command "query-memhp".
Examples:

(qemu) dimm_add dimm0
(qemu) info memhp
Dimm: dimm0 hot-add success
or
Dimm: dimm0 hot-add failure

(qemu) dimm_del dimm0
(qemu) info memhp
Dimm: dimm0 hot-remove success
or
Dimm: dimm0 hot-remove failure

Results are removed from the queue once read.
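
The "removed once read" behaviour can be sketched outside of qemu as below.
This is a simplified illustration, not the patch's code: the patch uses qemu's
QTAILQ macros, whereas this sketch uses a hand-rolled singly linked list, and
the hp_* names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event codes mirroring the four outcomes listed above. */
enum hp_event { HP_REMOVE_SUCCESS, HP_REMOVE_FAIL, HP_ADD_SUCCESS, HP_ADD_FAIL };

struct hp_result {
    unsigned slot;              /* dimm slot the guest responded for */
    enum hp_event event;
    struct hp_result *next;
};

struct hp_queue {
    struct hp_result *head;
    struct hp_result **tailp;   /* address of the last "next" pointer */
};

static void hp_queue_init(struct hp_queue *q)
{
    q->head = NULL;
    q->tailp = &q->head;
}

/* Append a guest response; returns 0 on success, -1 if allocation fails. */
static int hp_queue_push(struct hp_queue *q, unsigned slot, enum hp_event ev)
{
    struct hp_result *r = calloc(1, sizeof(*r));
    if (!r) {
        return -1;
    }
    r->slot = slot;
    r->event = ev;
    *q->tailp = r;
    q->tailp = &r->next;
    return 0;
}

/* Read and remove the oldest response; returns 0 when the queue is empty. */
static int hp_queue_pop(struct hp_queue *q, unsigned *slot, enum hp_event *ev)
{
    struct hp_result *r = q->head;
    if (!r) {
        return 0;
    }
    q->head = r->next;
    if (!q->head) {
        q->tailp = &q->head;
    }
    *slot = r->slot;
    *ev = r->event;
    free(r);
    return 1;
}
```

A second read after the queue has been drained returns nothing, which matches
the behaviour of "info memhp" described above.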

This patch only queues _EJ0 events, which signal hot-remove success.
Queuing of _OST events, which cover the hot-remove failure and
hot-add success/failure cases, also needs the next 2 patches.

These notification items should probably be part of migration state (not yet
implemented).

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hmp-commands.hx  |2 +
 hmp.c|   17 
 hmp.h|1 +
 hw/dimm.c|   55 ++
 hw/dimm.h|6 +
 monitor.c|7 ++
 qapi-schema.json |   26 +
 qmp-commands.hx  |   38 +
 8 files changed, 152 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 012c150..3172cde 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1459,6 +1459,8 @@ show device tree
 show qdev device model list
 @item info roms
 show roms
+@item info memhp
+show memhp
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index b9cec1d..ec25d9a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1000,3 +1000,20 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict)
     qmp_netdev_del(id, &err);
     hmp_handle_error(mon, &err);
 }
+
+void hmp_info_memhp(Monitor *mon)
+{
+    MemHpInfoList *info;
+    MemHpInfoList *item;
+    MemHpInfo *dimm;
+
+    info = qmp_query_memhp(NULL);
+    for (item = info; item; item = item->next) {
+        dimm = item->value;
+        monitor_printf(mon, "Dimm: %s %s %s\n", dimm->Dimm,
+                       dimm->request, dimm->result);
+        dimm->Dimm = NULL;
+    }
+
+    qapi_free_MemHpInfoList(info);
+}
diff --git a/hmp.h b/hmp.h
index 79d138d..971e7c4 100644
--- a/hmp.h
+++ b/hmp.h
@@ -64,5 +64,6 @@ void hmp_device_del(Monitor *mon, const QDict *qdict);
 void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
 void hmp_netdev_add(Monitor *mon, const QDict *qdict);
 void hmp_netdev_del(Monitor *mon, const QDict *qdict);
+void hmp_info_memhp(Monitor *mon);
 
 #endif
diff --git a/hw/dimm.c b/hw/dimm.c
index 00c4623..9b32386 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -26,6 +26,7 @@
 static DeviceState *dimm_hotplug_qdev;
 static dimm_hotplug_fn dimm_hotplug;
 static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
+static QTAILQ_HEAD(dimm_hp_result_head, dimm_hp_result)  dimm_hp_result_queue;
 
 static Property dimm_properties[] = {
     DEFINE_PROP_END_OF_LIST()
@@ -189,16 +190,69 @@ void dimm_notify(uint32_t idx, uint32_t event)
     DimmState *s;
     s = dimm_find_from_idx(idx);
     assert(s != NULL);
+    struct dimm_hp_result *result = g_malloc0(sizeof(*result));
 
+    result->s = s;
+    result->ret = event;
     switch(event) {
     case DIMM_REMOVE_SUCCESS:
         dimm_depopulate(s);
+        QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
         break;
     default:
+        g_free(result);
         break;
     }
 }
 
+MemHpInfoList *qmp_query_memhp(Error **errp)
+{
+    MemHpInfoList *head = NULL, *cur_item = NULL, *info;
+    struct dimm_hp_result *item, *nextitem;
+
+    QTAILQ_FOREACH_SAFE(item, &dimm_hp_result_queue, next, nextitem) {
+
+        info = g_malloc0(sizeof(*info));
+        info->value = g_malloc0(sizeof(*info->value));
+        info->value->Dimm = g_malloc0(sizeof(char) * 32);
+        info->value->request = g_malloc0(sizeof(char) * 16);
+        info->value->result = g_malloc0(sizeof(char) * 16);
+        switch (item->ret) {
+        case DIMM_REMOVE_SUCCESS:
+            strcpy(info->value->request, "hot-remove");
+            strcpy(info->value->result, "success");
+            break;
+        case DIMM_REMOVE_FAIL:
+            strcpy(info->value->request, "hot-remove");
+            strcpy(info->value->result, "failure");
+            break;
+        case DIMM_ADD_SUCCESS:
+            strcpy(info->value->request, "hot-add");
+            strcpy(info->value->result, "success");
+            break;
+        case DIMM_ADD_FAIL:
+            strcpy(info->value->request, "hot-add");
+            strcpy(info->value->result, "failure");
+            break;
+        default:
+            break;
+        }
+        strcpy(info->value->Dimm, item->s->busdev.qdev.id);
+        /* XXX: waiting for the qapi to support GSList */
+        if (!cur_item) {
+            head = cur_item = info

[RFC PATCH v2 14/21][SeaBIOS] acpi_dsdt: Support _OST dimm method

2012-07-11 Thread Vasilis Liaskovitis

Add support for the _OST method. _OST writes the dimm slot number into the
matching I/O byte to signal hot-add success, hot-add failure or hot-remove
failure to qemu.
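
The dispatch done by the MOST method in this patch can be read as the
following C sketch. It is an illustration only: ost_port() is a made-up helper,
the port numbers are the ones used in this series (0xafa1 to 0xafa3), and the
reading of source event 0x3 as "eject request" and 0x1 as "device check" is my
understanding of the ACPI _OST arguments rather than something stated in the
patch.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_OST_REMOVE_FAIL 0xafa1  /* hot-remove failure byte */
#define MEM_OST_ADD_SUCCESS 0xafa2  /* hot-add success byte */
#define MEM_OST_ADD_FAIL    0xafa3  /* hot-add failure byte */

/*
 * Hypothetical helper: map an _OST (source event, status) pair to the I/O
 * port that receives the dimm slot number. Returns 0 when nothing is written.
 */
static uint16_t ost_port(uint32_t event, uint32_t status)
{
    switch (event & 0xff) {
    case 0x3:                       /* eject request, i.e. hot-remove */
        return (status & 0xff) == 0x1 ? MEM_OST_REMOVE_FAIL : 0;
    case 0x1:                       /* device check, i.e. hot-add */
        switch (status & 0xff) {
        case 0x0:
            return MEM_OST_ADD_SUCCESS;
        case 0x1:
            return MEM_OST_ADD_FAIL;
        default:
            return 0;
        }
    default:
        return 0;
    }
}
```

Hot-remove success has no entry here because it is reported through _EJ0 and
the eject byte at 0xafa0.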
 
Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |   46 ++
 src/ssdt-mem.dsl  |4 
 2 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 5d3e92b..1c253ca 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -762,6 +762,28 @@ DefinitionBlock (
 MPE, 8
 }
 
+
+/* Memory hot-remove notify failure byte */
+OperationRegion(MEEF, SystemIO, 0xafa1, 1)
+Field (MEEF, ByteAcc, NoLock, Preserve)
+{
+MEF, 8
+}
+
+/* Memory hot-add notify success byte */
+OperationRegion(MPIS, SystemIO, 0xafa2, 1)
+Field (MPIS, ByteAcc, NoLock, Preserve)
+{
+MIS, 8
+}
+
+/* Memory hot-add notify failure byte */
+OperationRegion(MPIF, SystemIO, 0xafa3, 1)
+Field (MPIF, ByteAcc, NoLock, Preserve)
+{
+MIF, 8
+}
+
 Method(MESC, 0) {
 // Local5 = active memdevice bitmap
 Store (MES, Local5)
@@ -802,6 +824,30 @@ DefinitionBlock (
 Store(Arg0, MPE)
 Sleep(200)
 }
+Method (MOST, 3, Serialized) {
+// _OST method - OS status indication
+Switch (And(Arg0, 0xFF)) {
+Case(0x3)
+{
+Switch(And(Arg1, 0xFF)) {
+Case(0x1) {
+Store(Arg2, MEF)
+}
+}
+}
+Case(0x1)
+{
+Switch(And(Arg1, 0xFF)) {
+Case(0x0) {
+Store(Arg2, MIS)
+}
+Case(0x1) {
+Store(Arg2, MIF)
+}
+}
+}
+}
+}
 }
 
 
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
index ee322f0..041d301 100644
--- a/src/ssdt-mem.dsl
+++ b/src/ssdt-mem.dsl
@@ -38,6 +38,7 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
 
 External(CMST, MethodObj)
 External(MPEJ, MethodObj)
+External(MOST, MethodObj)
 
 Name(_CRS, ResourceTemplate() {
 QwordMemory(
@@ -60,6 +61,9 @@ DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
 Method (_EJ0, 1, NotSerialized) {
 MPEJ(ID, Arg0)
 }
+Method (_OST, 3) {
+MOST(Arg0, Arg1, ID)
+}
 }
 }
 
-- 
1.7.9


[RFC PATCH v2 18/21] acpi_piix4: Update dimm bitmap state on hot-remove fail

2012-07-11 Thread Vasilis Liaskovitis

This allows failed hot operations to be retried at any time. This only
works for guests that use _OST notification. Other guests cannot retry failed
hot operations on the same devices until after a reboot.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/acpi_piix4.c |   20 +++-
 hw/dimm.c   |   16 +++-
 hw/dimm.h   |2 +-
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index ebc5de7..db631cc 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -599,6 +599,7 @@ static uint32_t pcirmv_read(void *opaque, uint32_t addr)
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state);
 static int piix4_dimm_hotplug(DeviceState *qdev, SysBusDevice *dev, int add);
+static int piix4_dimm_revert(DeviceState *qdev, SysBusDevice *dev, int add);
 
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 {
@@ -627,7 +628,7 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
     }
 
     pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
-    dimm_register_hotplug(piix4_dimm_hotplug, &s->dev.qdev);
+    dimm_register_hotplug(piix4_dimm_hotplug, piix4_dimm_revert, &s->dev.qdev);
 }
 
 static void enable_device(PIIX4PMState *s, int slot)
@@ -696,6 +697,23 @@ void piix4_dimm_state_sync(PIIX4PMState *s)
     }
 }
 
+static int piix4_dimm_revert(DeviceState *qdev, SysBusDevice *dev, int add)
+{
+    PCIDevice *pci_dev = DO_UPCAST(PCIDevice, qdev, qdev);
+    PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, pci_dev);
+    struct gpe_regs *g = &s->gperegs;
+    DimmState *slot = DIMM(dev);
+    int idx = slot->idx;
+
+    if (add) {
+        g->mems_sts[idx/8] &= ~(1 << (idx%8));
+    }
+    else {
+        g->mems_sts[idx/8] |= (1 << (idx%8));
+    }
+    return 0;
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state)
 {
diff --git a/hw/dimm.c b/hw/dimm.c
index ba104cc..2115567 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -25,6 +25,7 @@
 
 static DeviceState *dimm_hotplug_qdev;
 static dimm_hotplug_fn dimm_hotplug;
+static dimm_hotplug_fn dimm_revert;
 static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
 static QTAILQ_HEAD(dimm_hp_result_head, dimm_hp_result)  dimm_hp_result_queue;
 
@@ -77,10 +78,12 @@ DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
     return mdev;
 }
 
-void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, dimm_hotplug_fn revert,
+        DeviceState *qdev)
 {
     dimm_hotplug_qdev = qdev;
     dimm_hotplug = hotplug;
+    dimm_revert = revert;
     dimm_scan_populated();
 }
 
@@ -211,10 +214,20 @@ void dimm_notify(uint32_t idx, uint32_t event)
         s->pending = false;
         break;
     case DIMM_REMOVE_FAIL:
+        QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
+        s->pending = false;
+        if (dimm_revert)
+            dimm_revert(dimm_hotplug_qdev, (SysBusDevice*)s, 0);
+        break;
     case DIMM_ADD_SUCCESS:
+        QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
+        s->pending = false;
+        break;
     case DIMM_ADD_FAIL:
         QTAILQ_INSERT_TAIL(&dimm_hp_result_queue, result, next);
         s->pending = false;
+        if (dimm_revert)
+            dimm_revert(dimm_hotplug_qdev, (SysBusDevice*)s, 1);
         break;
     default:
         g_free(result);
@@ -288,6 +301,7 @@ static void dimm_class_init(ObjectClass *klass, void *data)
     dc->props = dimm_properties;
     sc->init = dimm_init;
     dimm_hotplug = NULL;
+    dimm_revert = NULL;
     QTAILQ_INIT(&dimmlist);
     QTAILQ_INIT(&dimm_hp_result_queue);
 }
diff --git a/hw/dimm.h b/hw/dimm.h
index 0fa6137..b563e3f 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -54,7 +54,7 @@ void dimm_depopulate(DimmState *s);
 int dimm_do(Monitor *mon, const QDict *qdict, bool add);
 DimmState *dimm_find_from_idx(uint32_t idx);
 DimmState *dimm_find_from_name(char *id);
-void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev);
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, dimm_hotplug_fn revert, DeviceState *qdev);
 void dimm_calc_offsets(dimm_calcoffset_fn calcfn);
 void dimm_activate(DimmState *slot);
 void dimm_deactivate(DimmState *slot);
-- 
1.7.9


[RFC PATCH v2 20/21] Implement -dimms, -dimmspop command line options

2012-07-11 Thread Vasilis Liaskovitis

Implement batch dimm creation command line options. These avoid bloating
the command line when a large number of dimms is needed.

syntax: -dimms pfx=poolid,size=sz,num=n
This creates n dimms with ids poolid0, ..., poolidn-1. Each dimm has a
size of sz.

Implement the -dimmspop option to populate dimms at bootup.
syntax: -dimmspop pfx=poolid,num=n
This populates n dimms with ids poolid0, ..., poolidn-1.
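
The id scheme (prefix followed by a zero-based index) could be generated along
these lines. This is a standalone sketch rather than the code in vl.c:
dimm_pool_id() is a made-up helper, and the 32-byte buffer in the usage note
merely mirrors the buf[32] that configure_dimms() declares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<pfx><index>" into buf.
 * Returns 0 on success, -1 if the id does not fit into len bytes.
 */
static int dimm_pool_id(char *buf, size_t len, const char *pfx, unsigned index)
{
    int n = snprintf(buf, len, "%s%u", pfx, index);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a 32-byte buffer, dimm_pool_id(buf, sizeof(buf), "poolid", 0) fills buf
with the id "poolid0". A prefix that is too long for the buffer is reported as
an error instead of being silently truncated.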

(Live migration could break here without patch 12/21. -dimmspop
needs to be reworked to support populating individual dimms with the
same prefix, and not only a range of dimms starting from 0.)

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/dimm.c   |9 ++
 hw/dimm.h   |2 +-
 qemu-config.c   |   45 
 qemu-options.hx |   10 ++
 vl.c|   86 ++-
 5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/hw/dimm.c b/hw/dimm.c
index 2115567..6e324d3 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -187,6 +187,15 @@ void dimm_calc_offsets(dimm_calcoffset_fn calcfn)
 }
 }
 
+int dimm_set_populated(DimmState *s)
+{
+    if (s) {
+        s->populated = true;
+        return 0;
+    }
+    else return -1;
+}
+
 /* used to populate and activate dimms at boot time */
 void dimm_scan_populated(void)
 {
diff --git a/hw/dimm.h b/hw/dimm.h
index b563e3f..0fdf59b 100644
--- a/hw/dimm.h
+++ b/hw/dimm.h
@@ -60,6 +60,6 @@ void dimm_activate(DimmState *slot);
 void dimm_deactivate(DimmState *slot);
 void dimm_scan_populated(void);
 void dimm_notify(uint32_t idx, uint32_t event);
-
+int dimm_set_populated(DimmState *s);
 
 #endif
diff --git a/qemu-config.c b/qemu-config.c
index 4abc31b..7f63186 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -650,6 +650,49 @@ static QemuOptsList qemu_dimm_opts = {
         { /* end of list */ }
     },
 };
+
+static QemuOptsList qemu_dimms_opts = {
+    .name = "dimms",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_dimms_opts.head),
+    .desc = {
+        {
+            .name = "pfx",
+            .type = QEMU_OPT_STRING,
+            .help = "prefix of ids for these dimm devices",
+        },{
+            .name = "size",
+            .type = QEMU_OPT_SIZE,
+            .help = "memory size for these dimm",
+        },{
+            .name = "num",
+            .type = QEMU_OPT_NUMBER,
+            .help = "number of dimm devices in this pool",
+        },{
+            .name = "node",
+            .type = QEMU_OPT_NUMBER,
+            .help = "NUMA node number (i.e. proximity) for these dimms",
+        },
+        { /* end of list */ }
+    },
+};
+
+static QemuOptsList qemu_dimmspop_opts = {
+    .name = "dimmspop",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_dimmspop_opts.head),
+    .desc = {
+        {
+            .name = "pfx",
+            .type = QEMU_OPT_STRING,
+            .help = "pool prefix for this dimm device",
+        },{
+            .name = "num",
+            .type = QEMU_OPT_SIZE,
+            .help = "number of dimm devices to populate",
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList *vm_config_groups[32] = {
     &qemu_drive_opts,
     &qemu_chardev_opts,
@@ -666,6 +709,8 @@ static QemuOptsList *vm_config_groups[32] = {
     &qemu_boot_opts,
     &qemu_iscsi_opts,
     &qemu_dimm_opts,
+    &qemu_dimms_opts,
+    &qemu_dimmspop_opts,
     NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 61909f7..0a9326e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2752,3 +2752,13 @@ DEF("dimm", HAS_ARG, QEMU_OPTION_dimm,
     "-dimm id=dimmid,size=sz,node=nd,populated=on|off\n"
     "specify memory dimm device with name dimmid, size sz on node nd",
     QEMU_ARCH_ALL)
+
+DEF("dimms", HAS_ARG, QEMU_OPTION_dimms,
+    "-dimms pfx=id,size=sz,num=n,node=nd\n"
+    "specify pool of n memory dimm devices of size sz each on node nd",
+    QEMU_ARCH_ALL)
+
+DEF("dimmspop", HAS_ARG, QEMU_OPTION_dimmspop,
+    "-dimmspop pfx=id,num=n\n"
+    "populate n dimms of pool id (dimms with ids id0,...,idn-1) at system startup",
+    QEMU_ARCH_ALL)
diff --git a/vl.c b/vl.c
index efe915e..37752be 100644
--- a/vl.c
+++ b/vl.c
@@ -538,6 +538,65 @@ static void configure_dimm(QemuOpts *opts)
     nb_hp_dimms++;
 }
 
+static void configure_dimms(QemuOpts *opts)
+{
+    const char *value, *pfx, *id;
+    uint64_t size, node;
+    int num, dimm;
+    char buf[32];
+
+    id = qemu_opts_id(opts);
+    value = qemu_opt_get(opts, "pfx");
+    if (!value) {
+        fprintf(stderr, "qemu: invalid prefix for dimm pool '%s'\n", id);
+        exit(1);
+    }
+    pfx = value;
+
+    size = qemu_opt_get_size(opts, "size", DEFAULT_DIMMSIZE);
+    num = qemu_opt_get_number(opts, "num", 1);
+    node = qemu_opt_get_number(opts, "node", 0);
+
+    for (dimm = 0; dimm < num; dimm++) {
+        if (nb_hp_dimms == MAX_DIMMS) {
+            fprintf(stderr, "qemu: maximum number of DIMMs (%d) exceeded\n",
+                    MAX_DIMMS

[RFC PATCH v2 06/21] dimm: Implement memory device abstraction

2012-07-11 Thread Vasilis Liaskovitis

Each hotplug-able memory slot is a SysBusDevice. A hot-add operation for a
particular dimm creates a new MemoryRegion of the given physical address
offset, size and node proximity, and attaches it to main system memory as a
sub_region. A hot-remove operation detaches and frees the MemoryRegion from
system memory.

This prototype still lacks proper qdev integration: a separate
hotplug side-channel is used and main system bus hotplug capability is
ignored.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/Makefile.objs |2 +-
 hw/dimm.c|  234 ++
 hw/dimm.h|   58 +
 3 files changed, 293 insertions(+), 1 deletions(-)
 create mode 100644 hw/dimm.c
 create mode 100644 hw/dimm.h

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 3d77259..e2184bf 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,7 +26,7 @@ hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
 hw-obj-$(CONFIG_PCSPK) += pcspk.o
 hw-obj-$(CONFIG_PCKBD) += pckbd.o
 hw-obj-$(CONFIG_FDC) += fdc.o
-hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
+hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o dimm.o
 hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_I82374) += i82374.o
diff --git a/hw/dimm.c b/hw/dimm.c
new file mode 100644
index 000..00c4623
--- /dev/null
+++ b/hw/dimm.c
@@ -0,0 +1,234 @@
+/*
+ * Dimm device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/
+ */
+
+#include "trace.h"
+#include "qdev.h"
+#include "dimm.h"
+#include <time.h>
+#include "../exec-memory.h"
+#include "qmp-commands.h"
+
+static DeviceState *dimm_hotplug_qdev;
+static dimm_hotplug_fn dimm_hotplug;
+static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
+
+static Property dimm_properties[] = {
+    DEFINE_PROP_END_OF_LIST()
+};
+
+void dimm_populate(DimmState *s)
+{
+    DeviceState *dev= (DeviceState*)s;
+    MemoryRegion *new = NULL;
+
+    new = g_malloc(sizeof(MemoryRegion));
+    memory_region_init_ram(new, dev->id, s->size);
+    vmstate_register_ram_global(new);
+    memory_region_add_subregion(get_system_memory(), s->start, new);
+    s->mr = new;
+    s->populated = true;
+}
+
+
+void dimm_depopulate(DimmState *s)
+{
+    assert(s);
+    if (s->populated) {
+        vmstate_unregister_ram(s->mr, NULL);
+        memory_region_del_subregion(get_system_memory(), s->mr);
+        memory_region_destroy(s->mr);
+        s->populated = false;
+        s->mr = NULL;
+    }
+}
+
+DimmState *dimm_create(char *id, uint64_t size, uint64_t node, uint32_t
+        dimm_idx, bool populated)
+{
+    DeviceState *dev;
+    DimmState *mdev;
+
+    dev = sysbus_create_simple("dimm", -1, NULL);
+    dev->id = id;
+
+    mdev = DIMM(dev);
+    mdev->idx = dimm_idx;
+    mdev->start = 0;
+    mdev->size = size;
+    mdev->node = node;
+    mdev->populated = populated;
+    QTAILQ_INSERT_TAIL(&dimmlist, mdev, nextdimm);
+    return mdev;
+}
+
+void dimm_register_hotplug(dimm_hotplug_fn hotplug, DeviceState *qdev)
+{
+    dimm_hotplug_qdev = qdev;
+    dimm_hotplug = hotplug;
+    dimm_scan_populated();
+}
+
+void dimm_activate(DimmState *slot)
+{
+    dimm_populate(slot);
+    if (dimm_hotplug)
+        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);
+}
+
+void dimm_deactivate(DimmState *slot)
+{
+    if (dimm_hotplug)
+        dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
+}
+
+DimmState *dimm_find_from_name(char *id)
+{
+    Error *err = NULL;
+    DeviceState *qdev;
+    const char *type;
+    qdev = qdev_find_recursive(sysbus_get_default(), id);
+    if (qdev) {
+        type = object_property_get_str(OBJECT(qdev), "type", &err);
+        if (!type) {
+            return NULL;
+        }
+        if (!strcmp(type, "dimm")) {
+            return DIMM(qdev);
+        }
+    }
+    return NULL;
+}
+
+int dimm_do(Monitor *mon, const QDict *qdict, bool add)
+{
+    DimmState *slot = NULL;
+
+    char *id = (char*) qdict_get_try_str(qdict, "id");
+    if (!id) {
+        fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
+        return 1;
+    }
+
+    slot = dimm_find_from_name(id);
+
+    if (!slot) {
+        fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
+        return 1;
+    }
+
+    if (add) {
+        if (slot->populated) {
+            fprintf(stderr, "ERROR

[RFC PATCH v2 12/21] fix live-migration when populated=on is missing

2012-07-11 Thread Vasilis Liaskovitis

Live migration works after memory hot-add events, as long as the
qemu command line -dimm arguments are changed on the destination host
to specify populated=on for the dimms that have been hot-added.

If the command line has not been changed, the destination host does not yet
have the corresponding ramblock in its ram_list. Activate the memslot on the
destination during ram_load.

Perhaps several fields of the DimmState struct should be part of a
VMStateDescription to handle migration in a cleaner way.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 arch_init.c |   23 ---
 1 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a9e8b74..5f46b98 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -43,6 +43,7 @@
 #include "hw/smbios.h"
 #include "exec-memory.h"
 #include "hw/pcspk.h"
+#include "hw/dimm.h"
 
 #ifdef TARGET_SPARC
 int graphic_width = 1024;
@@ -452,9 +453,25 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
                     }
 
                     if (!block) {
-                        fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-                                "accept migration\n", id);
-                        return -EINVAL;
+                        /* this can happen if a dimm was hot-added at source host */
+                        DimmState *slot = dimm_find_from_name(id);
+                        if (slot) {
+                            dimm_activate(slot);
+                            /* rescan ram_list, verify ramblock is there now */
+                            QLIST_FOREACH(block, &ram_list.blocks, next) {
+                                if (!strncmp(id, block->idstr, sizeof(id))) {
+                                    if (block->length != length)
+                                        return -EINVAL;
+                                    break;
+                                }
+                            }
+                            assert(block);
+                        }
+                        else {
+                            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                                    "accept migration\n", id);
+                            return -EINVAL;
+                        }
                     }
 
                     total_ram_bytes -= length;
-- 
1.7.9


[RFC PATCH v2 16/21] acpi_piix4: Update dimm state on VM reboot

2012-07-11 Thread Vasilis Liaskovitis

In case of a hot-remove or hot-add failure, the dimm bitmaps in qemu and SeaBIOS
are inconsistent with the true state of the DIMM devices. The populated field
of the DimmState reflects the true state of the device. This inconsistency means
that a failed operation cannot be retried.

This patch updates the bit array to the true state of the dimms on VM reboot.
This allows retry of failed hot-add or hot-remove operations after a reboot.
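
Rebuilding the bit array from the per-dimm populated flags can be pictured
with the standalone sketch below. It is not the code of this patch: the
rebuild_status() name and the plain bool array are assumptions made for the
example, and unlike piix4_dimm_state_sync() it clears the whole bitmap up front
instead of stopping at the first missing slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SLOTS    256                /* assumed slot limit for the sketch */
#define BITMAP_BYTES (MAX_SLOTS / 8)

/* Hypothetical helper: recompute the status bitmap from the true slot state. */
static void rebuild_status(uint8_t *bitmap, const bool *populated,
                           size_t nslots)
{
    size_t i;

    /* Forget whatever failed operations left behind. */
    memset(bitmap, 0, BITMAP_BYTES);
    for (i = 0; i < nslots && i < MAX_SLOTS; i++) {
        if (populated[i]) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
        }
    }
}
```

After such a rebuild a stale bit from a failed hot-add is gone, so the same
slot can be hot-added again.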

Retrying a failed hot operation is not yet possible before reboot (the following
patch removes this limitation for guests with _OST ACPI support).

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/acpi_piix4.c |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index d8e2c22..ebc5de7 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -91,6 +91,7 @@ typedef struct PIIX4PMState {
 } PIIX4PMState;
 
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s);
+static void piix4_dimm_state_sync(PIIX4PMState *s);
 
 #define ACPI_ENABLE 0xf1
 #define ACPI_DISABLE 0xf0
@@ -369,6 +370,7 @@ static void piix4_reset(void *opaque)
 /* Mark SMM as already inited (until KVM supports SMM). */
 pci_conf[0x5B] = 0x02;
 }
+piix4_dimm_state_sync(s);
 piix4_update_hotplug(s);
 }
 
@@ -671,6 +673,29 @@ static int piix4_dimm_hotplug(DeviceState *qdev, SysBusDevice *dev, int
     return 0;
 }
 
+void piix4_dimm_state_sync(PIIX4PMState *s)
+{
+    struct gpe_regs *g = &s->gperegs;
+    DimmState *slot = NULL;
+    uint32_t i, temp = 1;
+
+    for(i = 0; i < MAX_DIMMS; i++) {
+        slot = dimm_find_from_idx(i);
+        if (!slot)
+            break;
+        if (i % 8 == 0) {
+            temp = 1;
+            g->mems_sts[i / 8] = 0;
+        }
+        else
+            temp = temp << 1;
+        if (slot->populated) {
+            g->mems_sts[i / 8] |= temp;
+        }
+        slot->pending = false;
+    }
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state)
 {
-- 
1.7.9


[RFC PATCH v2 17/21][SeaBIOS] acpi_dsdt: Revert internal dimm state on _OST failure

2012-07-11 Thread Vasilis Liaskovitis

This reverts the bitmap state in the case of a failed hot operation, so that
the failed operation can be retried.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 1c253ca..0d37bbc 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -832,6 +832,8 @@ DefinitionBlock (
 Switch(And(Arg1, 0xFF)) {
 Case(0x1) {
 Store(Arg2, MEF)
+// Revert MEON flag for this memory device to one
+Store(One, Index(MEON, Arg2))
 }
 }
 }
@@ -843,6 +845,8 @@ DefinitionBlock (
 }
 Case(0x1) {
 Store(Arg2, MIF)
+// Revert MEON flag for this memory device to zero
+Store(Zero, Index(MEON, Arg2))
 }
 }
 }
-- 
1.7.9


[RFC PATCH v2 19/21] Implement info memtotal and query-memtotal

2012-07-11 Thread Vasilis Liaskovitis

Returns total memory of guest in bytes, including hotplugged memory.
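
In other words, the reported total is the boot-time RAM size plus the size of
every populated dimm. A minimal standalone sketch of that sum follows; the
struct dimm_slot type and the mem_total() name are invented for the example and
do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down view of a dimm slot. */
struct dimm_slot {
    uint64_t size;      /* size in bytes */
    bool populated;     /* true if the dimm is currently plugged in */
};

/* Sum the boot-time RAM and all populated dimms, in bytes. */
static uint64_t mem_total(uint64_t base_ram, const struct dimm_slot *slots,
                          size_t nslots)
{
    uint64_t total = base_ram;
    size_t i;

    for (i = 0; i < nslots; i++) {
        if (slots[i].populated) {
            total += slots[i].size;
        }
    }
    return total;
}
```

Unpopulated dimms do not count, so the value changes only when a hot-add or
hot-remove actually takes effect.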

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hmp-commands.hx  |2 ++
 hmp.c|7 +++
 hmp.h|1 +
 hw/dimm.c|   15 +++
 monitor.c|7 +++
 qapi-schema.json |   12 
 qmp-commands.hx  |   20 
 7 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3172cde..016062e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1461,6 +1461,8 @@ show qdev device model list
 show roms
 @item info memhp
 show memhp
+@item info memtotal
+show memtotal
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index ec25d9a..8f89c7d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1017,3 +1017,10 @@ void hmp_info_memhp(Monitor *mon)
 
 qapi_free_MemHpInfoList(info);
 }
+
+void hmp_info_memtotal(Monitor *mon)
+{
+    uint64_t ram_total;
+    ram_total = (uint64_t)qmp_query_memtotal(NULL);
+    monitor_printf(mon, "MemTotal: %" PRIu64 " \n", ram_total);
+}
diff --git a/hmp.h b/hmp.h
index 971e7c4..d6e715e 100644
--- a/hmp.h
+++ b/hmp.h
@@ -65,5 +65,6 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict);
 void hmp_netdev_add(Monitor *mon, const QDict *qdict);
 void hmp_netdev_del(Monitor *mon, const QDict *qdict);
 void hmp_info_memhp(Monitor *mon);
+void hmp_info_memtotal(Monitor *mon);
 
 #endif
diff --git a/hw/dimm.c b/hw/dimm.c
index 6e324d3..b544173 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -28,6 +28,7 @@ static dimm_hotplug_fn dimm_hotplug;
 static dimm_hotplug_fn dimm_revert;
 static QTAILQ_HEAD(Dimmlist, DimmState)  dimmlist;
 static QTAILQ_HEAD(dimm_hp_result_head, dimm_hp_result)  dimm_hp_result_queue;
+extern ram_addr_t ram_size;
 
 static Property dimm_properties[] = {
 DEFINE_PROP_END_OF_LIST()
@@ -292,6 +293,20 @@ MemHpInfoList *qmp_query_memhp(Error **errp)
 
 return head;
 }
+
+int64_t qmp_query_memtotal(Error **errp)
+{
+    DimmState *slot;
+    uint64_t info = ram_size;
+
+    QTAILQ_FOREACH(slot, &dimmlist, nextdimm) {
+        if (slot->populated) {
+            info += slot->size;
+        }
+    }
+    return (int64_t)info;
+}
+
 static int dimm_init(SysBusDevice *s)
 {
 DimmState *slot;
diff --git a/monitor.c b/monitor.c
index 4a14e26..1dd646c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2739,6 +2739,13 @@ static mon_cmd_t info_cmds[] = {
         .mhandler.info = hmp_info_memhp,
     },
     {
+        .name       = "memtotal",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show total memory size",
+        .mhandler.info = hmp_info_memtotal,
+    },
+    {
         .name       = NULL,
     },
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index 049f6f9..5bbf2c0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1888,3 +1888,15 @@
 # Since: 1.1.3
 ##
 { 'command': 'query-memhp', 'returns': ['MemHpInfo'] }
+
+##
+# @query-memtotal:
+#
+# Returns total memory in bytes, including hotplugged dimms
+#
+# Returns: a l
+#
+# Since: 1.2
+##
+{ 'command': 'query-memtotal', 'returns': 'int' }
+
diff --git a/qmp-commands.hx b/qmp-commands.hx
index cd1d5f0..6c71696 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2286,3 +2286,23 @@ Example:
}
 
 EQMP
+
+    {
+        .name       = "query-memtotal",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_memtotal
+    },
+SQMP
+query-memtotal
+--
+
+Return total memory in bytes, including hotplugged dimms
+
+Example:
+
+-> { "execute": "query-memtotal" }
+<- {
+      "return": 1073741824
+   }
+
+EQMP
-- 
1.7.9


[RFC PATCH v2 15/21] acpi_piix4: _OST dimm support

2012-07-11 Thread Vasilis Liaskovitis

This allows qemu to receive notifications from the guest OS on success or
failure of a memory hotplug request. The guest OS needs to implement the _OST
functionality for this to work (linux-next: http://lkml.org/lkml/2012/6/25/321)
Also add new _OST registers in docs/specs/acpi_hotplug.txt
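
As a rough illustration of the notification scheme described above, the sketch below maps a guest write on one of the four I/O ports to an event. The port numbers are the ones used in this patch; the enum values and the helper ost_event_for_port() are hypothetical and exist only for this example, they are not part of the series.

```c
#include <stdint.h>

/* I/O ports as defined by this patch in hw/acpi_piix4.c. */
#define MEM_EJ_BASE          0xafa0
#define MEM_OST_REMOVE_FAIL  0xafa1
#define MEM_OST_ADD_SUCCESS  0xafa2
#define MEM_OST_ADD_FAIL     0xafa3

/* Event names mirror the DIMM_* constants used by the patch; the numeric
 * values are placeholders chosen for this sketch only. */
enum dimm_event {
    EV_NONE = -1,
    EV_REMOVE_SUCCESS,
    EV_REMOVE_FAIL,
    EV_ADD_SUCCESS,
    EV_ADD_FAIL,
};

/* Hypothetical helper: classify a one-byte guest write by its port.
 * The byte written to the port is the dimm slot index. */
static enum dimm_event ost_event_for_port(uint32_t port)
{
    switch (port) {
    case MEM_EJ_BASE:         return EV_REMOVE_SUCCESS;
    case MEM_OST_REMOVE_FAIL: return EV_REMOVE_FAIL;
    case MEM_OST_ADD_SUCCESS: return EV_ADD_SUCCESS;
    case MEM_OST_ADD_FAIL:    return EV_ADD_FAIL;
    default:                  return EV_NONE;
    }
}
```

In the patch itself the same dispatch is done inline in gpe_writeb(), which calls dimm_notify(val, event) for each port.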

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/acpi_hotplug.txt |   24 
 hw/acpi_piix4.c |   15 +++
 hw/dimm.c   |   18 ++
 hw/dimm.h   |1 +
 4 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
index cf86242..2f6fd5f 100644
--- a/docs/specs/acpi_hotplug.txt
+++ b/docs/specs/acpi_hotplug.txt
@@ -20,3 +20,27 @@ ejected.
 
 Written by ACPI memory device _EJ0 method to notify qemu of successfull
 hot-removal.  Write-only.
+
+Memory Dimm ejection failure notification (IO port 0xafa1, 1-byte access):
+---
+Dimm hot-remove _OST failure notification. Byte value indicates Dimm slot for
+which ejection failed.
+
+Written by ACPI memory device _OST method to notify qemu of failed
+hot-removal.  Write-only.
+
+Memory Dimm insertion success notification (IO port 0xafa2, 1-byte access):
+---
+Dimm hot-add _OST success notification. Byte value indicates Dimm slot for which
+insertion succeeded.
+
+Written by ACPI memory device _OST method to notify qemu of successful
+hot-add.  Write-only.
+
+Memory Dimm insertion failure notification (IO port 0xafa3, 1-byte access):
+---
+Dimm hot-add _OST failure notification. Byte value indicates Dimm slot for which
+insertion failed.
+
+Written by ACPI memory device _OST method to notify qemu of failed
+hot-add.  Write-only.
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index b988597..d8e2c22 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -49,6 +49,9 @@
 #define PCI_RMV_BASE 0xae0c
 #define MEM_BASE 0xaf80
 #define MEM_EJ_BASE 0xafa0
+#define MEM_OST_REMOVE_FAIL 0xafa1
+#define MEM_OST_ADD_SUCCESS 0xafa2
+#define MEM_OST_ADD_FAIL 0xafa3
 
 #define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
@@ -531,6 +534,15 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
 case MEM_EJ_BASE:
 dimm_notify(val, DIMM_REMOVE_SUCCESS);
 break;
+case MEM_OST_REMOVE_FAIL:
+dimm_notify(val, DIMM_REMOVE_FAIL);
+break;
+case MEM_OST_ADD_SUCCESS:
+dimm_notify(val, DIMM_ADD_SUCCESS);
+break;
+case MEM_OST_ADD_FAIL:
+dimm_notify(val, DIMM_ADD_FAIL);
+break;
 default:
 acpi_gpe_ioport_writeb(&s->ar, addr, val);
 }
@@ -604,6 +616,9 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 
 register_ioport_read(MEM_BASE, DIMM_BITMAP_BYTES, 1,  gpe_readb, s);
 register_ioport_write(MEM_EJ_BASE, 1, 1,  gpe_writeb, s);
+register_ioport_write(MEM_OST_REMOVE_FAIL, 1, 1,  gpe_writeb, s);
+register_ioport_write(MEM_OST_ADD_SUCCESS, 1, 1,  gpe_writeb, s);
+register_ioport_write(MEM_OST_ADD_FAIL, 1, 1,  gpe_writeb, s);
 
 for(i = 0; i < DIMM_BITMAP_BYTES; i++) {
 s->gperegs.mems_sts[i] = 0;
diff --git a/hw/dimm.c b/hw/dimm.c
index 9b32386..ba104cc 100644
--- a/hw/dimm.c
+++ b/hw/dimm.c
@@ -89,12 +89,14 @@ void dimm_activate(DimmState *slot)
 dimm_populate(slot);
 if (dimm_hotplug)
 dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 1);
+slot->pending = true;
 }
 
 void dimm_deactivate(DimmState *slot)
 {
 if (dimm_hotplug)
 dimm_hotplug(dimm_hotplug_qdev, (SysBusDevice*)slot, 0);
+slot->pending = true;
 }
 
 DimmState *dimm_find_from_name(char *id)
@@ -138,6 +140,10 @@ int dimm_do(Monitor *mon, const QDict *qdict, bool add)
 __FUNCTION__, id);
 return 1;
 }
+if (slot->pending) {
+fprintf(stderr, "warning: %s slot %s hot-operation pending\n",
+__FUNCTION__, id);
+}
 dimm_activate(slot);
 }
 else {
@@ -146,6 +152,10 @@ int dimm_do(Monitor *mon, const QDict *qdict, bool add)
 __FUNCTION__, id);
 return 1;
 }
+if (slot->pending) {
+fprintf(stderr, "warning: %s slot %s hot-operation pending\n",
+__FUNCTION__, id);
+}
 dimm_deactivate(slot);
 }
 
@@ -198,6 +208,13 @@ void dimm_notify(uint32_t idx, uint32_t event)
 case DIMM_REMOVE_SUCCESS:
 dimm_depopulate(s);
 QTAILQ_INSERT_TAIL(dimm_hp_result_queue, result, next);
+s->pending = false;
+break;
+case DIMM_REMOVE_FAIL:
+case DIMM_ADD_SUCCESS:
+case DIMM_ADD_FAIL

[RFC PATCH v2 11/21] Implement dimm_add and dimm_del hmp/qmp commands

2012-07-11 Thread Vasilis Liaskovitis

Hot-add hmp syntax: dimm_add dimmid
Hot-remove hmp syntax: dimm_del dimmid

Respective qmp commands are dimm-add, dimm-del.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hmp-commands.hx |   32 
 monitor.c   |   11 +++
 monitor.h   |3 +++
 qmp-commands.hx |   39 +++
 4 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index f5d9d91..012c150 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -618,6 +618,38 @@ Add device.
 ETEXI
 
 {
+.name   = "dimm_del",
+.args_type  = "id:s",
+.params = "id",
+.help   = "hot-remove memory (dimm device)",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_dimm_del,
+},
+
+STEXI
+@item dimm_del @var{config}
+@findex dimm_del
+
+Hot-remove dimm.
+ETEXI
+
+{
+.name   = "dimm_add",
+.args_type  = "id:s",
+.params = "id",
+.help   = "hot-add memory (dimm device)",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_dimm_add,
+},
+
+STEXI
+@item dimm_add @var{config}
+@findex dimm_add
+
+Hot-add dimm.
+ETEXI
+
+{
 .name   = "device_del",
 .args_type  = "id:s",
 .params = "device",
diff --git a/monitor.c b/monitor.c
index f6107ba..d3d95a6 100644
--- a/monitor.c
+++ b/monitor.c
@@ -67,6 +67,7 @@
 #include "qmp-commands.h"
 #include "hmp.h"
 #include "qemu-thread.h"
+#include "hw/dimm.h"
 
 /* for pic/irq_info */
 #if defined(TARGET_SPARC)
@@ -4813,3 +4814,13 @@ int monitor_read_block_device_key(Monitor *mon, const char *device,
 
 return monitor_read_bdrv_key_start(mon, bs, completion_cb, opaque);
 }
+
+int do_dimm_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+return dimm_do(mon, qdict, true);
+}
+
+int do_dimm_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+return dimm_do(mon, qdict, false);
+}
diff --git a/monitor.h b/monitor.h
index 5f4de1b..afdd721 100644
--- a/monitor.h
+++ b/monitor.h
@@ -86,4 +86,7 @@ int qmp_qom_set(Monitor *mon, const QDict *qdict, QObject **ret);
 
 int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
 
+int do_dimm_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_dimm_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+
 #endif /* !MONITOR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2e1a38e..7efd628 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2209,3 +2209,42 @@ EQMP
 .args_type  = "implements:s?,abstract:b?",
 .mhandler.cmd_new = qmp_marshal_input_qom_list_types,
 },
+{
+.name   = "dimm-add",
+.args_type  = "id:s",
+.mhandler.cmd_new = do_dimm_add,
+},
+SQMP
+dimm-add
+-
+
+Hot-add memory DIMM
+
+Will hotplug memory DIMMs with given id.
+
+Example:
+
+-> { "execute": "dimm-add", "arguments": { "id": "dimm0" } }
+<- { "return": {} }
+
+EQMP
+
+{
+.name   = "dimm-del",
+.args_type  = "id:s",
+.mhandler.cmd_new = do_dimm_del,
+},
+SQMP
+dimm-del
+-
+
+Hot-remove memory DIMM
+
+Will hot-unplug memory DIMMs with given id.
+
+Example:
+
+-> { "execute": "dimm-del", "arguments": { "id": "dimm0" } }
+<- { "return": {} }
+
+EQMP
-- 
1.7.9


[RFC PATCH v2 10/21] Implement -dimm command line option

2012-07-11 Thread Vasilis Liaskovitis

Syntax: -dimm id=name,size=sz,node=pxm,populated=on|off

The starting physical address of each dimm is calculated automatically from the
top of memory, skipping the pci hole at [PCI_HOLE_START, 4G).
populated=on means the dimm is populated at machine startup. The default is off.
node defines the numa proximity of this dimm. The default is node zero.

Example:
-dimm id=dimm0,size=512M,node=0,populated=off
will define a 512M memory slot belonging to numa node 0.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 qemu-config.c   |   25 +
 qemu-options.hx |5 +
 sysemu.h|1 +
 vl.c|   35 +++
 4 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index 5c3296b..4abc31b 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -626,6 +626,30 @@ QemuOptsList qemu_boot_opts = {
 },
 };
 
+static QemuOptsList qemu_dimm_opts = {
+.name = "dimm",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_dimm_opts.head),
+.desc = {
+{
+.name = "id",
+.type = QEMU_OPT_STRING,
+.help = "id of this dimm device",
+},{
+.name = "size",
+.type = QEMU_OPT_SIZE,
+.help = "memory size for this dimm",
+},{
+.name = "populated",
+.type = QEMU_OPT_BOOL,
+.help = "populated for this dimm",
+},{
+.name = "node",
+.type = QEMU_OPT_NUMBER,
+.help = "NUMA node number (i.e. proximity) for this dimm",
+},
+{ /* end of list */ }
+},
+};
 static QemuOptsList *vm_config_groups[32] = {
 &qemu_drive_opts,
 &qemu_chardev_opts,
@@ -641,6 +665,7 @@ static QemuOptsList *vm_config_groups[32] = {
 &qemu_machine_opts,
 &qemu_boot_opts,
 &qemu_iscsi_opts,
+&qemu_dimm_opts,
 NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..61909f7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2747,3 +2747,8 @@ HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
 ETEXI
+
+DEF("dimm", HAS_ARG, QEMU_OPTION_dimm,
+"-dimm id=dimmid,size=sz,node=nd,populated=on|off\n"
+"specify memory dimm device with name dimmid, size sz on node nd",
+QEMU_ARCH_ALL)
diff --git a/sysemu.h b/sysemu.h
index bc2c788..3e21a22 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -136,6 +136,7 @@ extern QEMUClock *rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern uint64_t node_cpumask[MAX_NODES];
+extern int nb_hp_dimms;
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/vl.c b/vl.c
index 0ff8818..efe915e 100644
--- a/vl.c
+++ b/vl.c
@@ -120,6 +120,7 @@ int main(int argc, char **argv)
 #include "hw/xen.h"
 #include "hw/qdev.h"
 #include "hw/loader.h"
+#include "hw/dimm.h"
 #include "bt-host.h"
 #include "net.h"
 #include "net/slirp.h"
@@ -242,6 +243,7 @@ QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order
 int nb_numa_nodes;
 uint64_t node_mem[MAX_NODES];
 uint64_t node_cpumask[MAX_NODES];
+int nb_hp_dimms;
 
 uint8_t qemu_uuid[16];
 
@@ -518,6 +520,23 @@ static void configure_rtc_date_offset(const char *startdate, int legacy)
 rtc_date_offset = time(NULL) - rtc_start_date;
 }
 }
+static void configure_dimm(QemuOpts *opts)
+{
+    const char *id;
+    uint64_t size, node;
+    bool populated;
+    if (nb_hp_dimms == MAX_DIMMS) {
+        fprintf(stderr, "qemu: maximum number of DIMMs (%d) exceeded\n",
+                MAX_DIMMS);
+        exit(1);
+    }
+    id = qemu_opts_id(opts);
+    size = qemu_opt_get_size(opts, "size", DEFAULT_DIMMSIZE);
+    populated = qemu_opt_get_bool(opts, "populated", 0);
+    node = qemu_opt_get_number(opts, "node", 0);
+    dimm_create((char*)id, size, node, nb_hp_dimms, populated);
+    nb_hp_dimms++;
+}
 
 static void configure_rtc(QemuOpts *opts)
 {
@@ -2273,6 +2292,8 @@ int main(int argc, char **argv, char **envp)
 DisplayChangeListener *dcl;
 int cyls, heads, secs, translation;
 QemuOpts *hda_opts = NULL, *opts, *machine_opts;
+QemuOpts *dimm_opts[MAX_DIMMS];
+int nb_dimm_opts = 0;
 QemuOptsList *olist;
 int optind;
 const char *optarg;
@@ -3200,6 +3221,18 @@ int main(int argc, char **argv, char **envp)
 case QEMU_OPTION_qtest_log:
 qtest_log = optarg;
 break;
+case QEMU_OPTION_dimm:
+if (nb_dimm_opts == MAX_DIMMS) {
+fprintf(stderr, "qemu: maximum number of DIMMs (%d) exceeded\n",
+MAX_DIMMS);
+}
+dimm_opts[nb_dimm_opts] =
+qemu_opts_parse(qemu_find_opts("dimm"), optarg, 0);
+if (!dimm_opts[nb_dimm_opts]) {
+exit(1);
+}
+nb_dimm_opts++;
+break;
 default:
 os_parse_cmd_args

[RFC PATCH v2 09/21] pc: Add dimm paravirt SRAT info

2012-07-11 Thread Vasilis Liaskovitis

The numa_fw_cfg paravirt interface is extended to include SRAT information for
all hotplug-able dimms. There are 3 words for each hotplug-able memory slot,
denoting start address, size and node proximity. The new info is appended after
existing numa info, so that the fw_cfg layout does not break.  This information
is used by Seabios to build hotplug memory device objects at runtime.
nb_numa_nodes is set to 1 by default (not 0), so that we always pass srat info
to SeaBIOS.

v1-v2:
Dimm SRAT info (#dimms) is appended at end of existing numa fw_cfg in order not
to break existing layout
Documentation of the new fwcfg layout is included in docs/specs/fwcfg.txt
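
To make the extended layout easier to follow, here is a minimal sketch of the index arithmetic, assuming the layout stated in this message (one word for #nodes, max_cpus words, nb_numa_nodes words, one word for #dimms, then three words per dimm). The helper names are hypothetical and are not part of the patch.

```c
#include <stddef.h>

/* Total number of 64-bit words in the extended NUMA fw_cfg blob:
 * 1 (#nodes) + max_cpus + nb_nodes + 1 (#dimms) + 3 per dimm. */
static size_t numa_cfg_words(size_t max_cpus, size_t nb_nodes, size_t nb_dimms)
{
    return 2 + max_cpus + nb_nodes + 3 * nb_dimms;
}

/* Index of the word that holds the number of hotplug dimms. */
static size_t dimm_count_index(size_t max_cpus, size_t nb_nodes)
{
    return 1 + max_cpus + nb_nodes;
}

/* Index of the first word (start address) of the triplet for dimm `dimm`;
 * the size and proximity words follow at +1 and +2. */
static size_t dimm_triplet_index(size_t max_cpus, size_t nb_nodes, size_t dimm)
{
    return dimm_count_index(max_cpus, nb_nodes) + 1 + 3 * dimm;
}
```

With 4 vCPUs, 2 nodes and 3 dimms, the blob has 17 words, the dimm count sits at index 7, and the triplets start at indices 8, 11 and 14.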

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/fwcfg.txt |   28 ++
 hw/pc.c  |   53 -
 vl.c |2 +-
 3 files changed, 80 insertions(+), 3 deletions(-)
 create mode 100644 docs/specs/fwcfg.txt

diff --git a/docs/specs/fwcfg.txt b/docs/specs/fwcfg.txt
new file mode 100644
index 0000000..e6fcd8f
--- /dev/null
+++ b/docs/specs/fwcfg.txt
@@ -0,0 +1,28 @@
+QEMU-BIOS Paravirt Documentation
+--
+
+This document describes paravirt data structures passed from QEMU to BIOS.
+
+fw_cfg SRAT paravirt info
+
+The SRAT info passed from QEMU to BIOS has the following layout:
+
+---
+#nodes | cpu0_pxm | cpu1_pxm | ... | cpulast_pxm | node0_mem | node1_mem | ... | nodelast_mem
+
+---
+#dimms | dimm0_start | dimm0_sz | dimm0_pxm | ... | dimmlast_start | dimmlast_sz | dimmlast_pxm
+
+Entry 0 contains the number of numa nodes (nb_numa_nodes).
+
+Entries 1..max_cpus: The next max_cpus entries describe node proximity for each
+one of the vCPUs in the system.
+
+Entries max_cpus+1..max_cpus+nb_numa_nodes:  The next nb_numa_nodes entries
+describe the memory size for each one of the NUMA nodes in the system.
+
+Entry max_cpus+nb_numa_nodes+1 contains the number of memory dimms (nb_hp_dimms)
+
+The last 3 * nb_hp_dimms entries are organized in triplets: Each triplet contains
+the physical address offset, size (in bytes), and node proximity for the
+respective dimm.
diff --git a/hw/pc.c b/hw/pc.c
index ef9901a..cf651d0 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -598,12 +598,15 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
 return index;
 }
 
+static void setup_hp_dimms(uint64_t *fw_cfg_slots);
+
 static void *bochs_bios_init(void)
 {
 void *fw_cfg;
 uint8_t *smbios_table;
 size_t smbios_len;
 uint64_t *numa_fw_cfg;
+uint64_t *hp_dimms_fw_cfg;
 int i, j;
 
 register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
@@ -638,8 +641,10 @@ static void *bochs_bios_init(void)
 /* allocate memory for the NUMA channel: one (64bit) word for the number
  * of nodes, one word for each VCPU-node and one word for each node to
  * hold the amount of memory.
+ * Finally one word for the number of hotplug memory slots and three words
+ * for each hotplug memory slot (start address, size and node proximity).
  */
-numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
+numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
 numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
 for (i = 0; i < max_cpus; i++) {
 for (j = 0; j < nb_numa_nodes; j++) {
@@ -652,8 +657,15 @@ static void *bochs_bios_init(void)
 for (i = 0; i < nb_numa_nodes; i++) {
 numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
 }
+
+numa_fw_cfg[1 + max_cpus + nb_numa_nodes] = cpu_to_le64(nb_hp_dimms);
+
+hp_dimms_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
+if (nb_hp_dimms)
+setup_hp_dimms(hp_dimms_fw_cfg);
+
 fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
- (1 + max_cpus + nb_numa_nodes) * 8);
+ (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_dimms) * 8);
 
 return fw_cfg;
 }
@@ -1223,3 +1235,40 @@ target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
 
 return ret;
 }
+
+static void setup_hp_dimms(uint64_t *fw_cfg_slots)
+{
+int i = 0;
+Error *err = NULL;
+DeviceState *dev;
+DimmState *slot;
+const char *type;
+BusChild *kid;
+BusState *bus = sysbus_get_default();
+
+QTAILQ_FOREACH(kid, &bus->children, sibling) {
+dev = kid->child;
+type = object_property_get_str(OBJECT(dev), "type", &err);
+if (err) {
+error_free(err);
+fprintf(stderr, "error getting device type\n");
+exit(1);
+}
+
+if (!strcmp(type, "dimm")) {
+if (!dev->id) {
+fprintf(stderr, "error getting dimm device id\n

[RFC PATCH v2 08/21] pc: calculate dimm physical addresses and adjust memory map

2012-07-11 Thread Vasilis Liaskovitis

Dimm physical address offsets are calculated automatically and memory map is
adjusted accordingly. If a DIMM can fit before the PCI_HOLE_START (currently
0xe0000000), it will be added normally, otherwise its physical address will be
above 4GB.
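
The placement rule can be sketched as a small standalone function. This is only an illustration under stated assumptions: struct hp_layout and place_dimm() are hypothetical names that stand in for the globals ram_hp_offset, below_4g_hp_mem_size and above_4g_hp_mem_size used by the patch, and the comparison operators are assumed to be >= and <= as the comments in the patch suggest.

```c
#include <stdint.h>

#define PCI_HOLE_START 0xe0000000ULL
#define FOUR_GB        0x100000000ULL

/* Hypothetical container for the state the patch keeps in globals. */
struct hp_layout {
    uint64_t next;      /* next free hotplug offset, 0 = not initialised */
    uint64_t below_4g;  /* hotplug bytes placed below the PCI hole */
    uint64_t above_4g;  /* hotplug bytes placed above 4GB */
};

/* Return the physical start address chosen for a dimm of `size` bytes. */
static uint64_t place_dimm(struct hp_layout *l, uint64_t ram_size,
                           uint64_t size)
{
    uint64_t ret;

    /* On the first call, start right after the boot-time RAM. */
    if (!l->next) {
        l->next = ram_size >= PCI_HOLE_START
                ? FOUR_GB + (ram_size - PCI_HOLE_START)
                : ram_size;
    }

    if (l->next >= FOUR_GB) {
        /* Already above 4GB: keep appending there. */
        ret = l->next;
        l->above_4g += size;
        l->next += size;
    } else if (l->next + size <= PCI_HOLE_START) {
        /* The dimm fits before the PCI hole: append it normally. */
        ret = l->next;
        l->below_4g += size;
        l->next += size;
    } else {
        /* Otherwise skip the hole and place it at 4GB. */
        ret = FOUR_GB;
        l->above_4g += size;
        l->next = FOUR_GB + size;
    }
    return ret;
}
```

For a guest booted with 1GB of RAM, a 512MB dimm lands at 0x40000000, a following 3GB dimm no longer fits below the hole and lands at 0x100000000, and any later dimm is appended above that.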

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/pc.c  |   41 +
 hw/pc.h  |6 ++
 hw/pc_piix.c |   18 --
 vl.c |1 +
 4 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index c7e9ab3..ef9901a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -48,6 +48,7 @@
 #include "memory.h"
 #include "exec-memory.h"
 #include "arch_init.h"
+#include "dimm.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -89,6 +90,9 @@ struct e820_table {
 static struct e820_table e820_table;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+ram_addr_t below_4g_hp_mem_size = 0;
+ram_addr_t above_4g_hp_mem_size = 0;
+extern target_phys_addr_t ram_hp_offset;
 void gsi_handler(void *opaque, int n, int level)
 {
 GSIState *s = opaque;
@@ -1182,3 +1186,40 @@ void pc_pci_device_init(PCIBus *pci_bus)
 pci_create_simple(pci_bus, -1, "lsi53c895a");
 }
 }
+
+
+/* Function to configure memory offsets of hotpluggable dimms */
+
+target_phys_addr_t pc_set_hp_memory_offset(uint64_t size)
+{
+    target_phys_addr_t ret;
+
+    /* on first call, initialize ram_hp_offset */
+    if (!ram_hp_offset) {
+        if (ram_size >= PCI_HOLE_START ) {
+            ram_hp_offset = 0x100000000LL + (ram_size - PCI_HOLE_START);
+        } else {
+            ram_hp_offset = ram_size;
+        }
+    }
+
+    if (ram_hp_offset >= 0x100000000LL) {
+        ret = ram_hp_offset;
+        above_4g_hp_mem_size += size;
+        ram_hp_offset += size;
+    }
+    /* if dimm fits before pci hole, append it normally */
+    else if (ram_hp_offset + size <= PCI_HOLE_START) {
+        ret = ram_hp_offset;
+        below_4g_hp_mem_size += size;
+        ram_hp_offset += size;
+    }
+    /* otherwise place it above 4GB */
+    else {
+        ret = 0x100000000LL;
+        above_4g_hp_mem_size += size;
+        ram_hp_offset = 0x100000000LL + size;
+    }
+
+    return ret;
+}
diff --git a/hw/pc.h b/hw/pc.h
index 31ccb6f..15bdd7d 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -10,6 +10,7 @@
 #include memory.h
 #include ioapic.h
 
+#define PCI_HOLE_START 0xe0000000
 /* PC-style peripherals (also used by other machines).  */
 
 /* serial.c */
@@ -218,6 +219,11 @@ static inline bool isa_ne2000_init(ISABus *bus, int base, int irq, NICInfo *nd)
 /* pc_sysfw.c */
 void pc_system_firmware_init(MemoryRegion *rom_memory);
 
+/* memory hotplug */
+target_phys_addr_t pc_set_hp_memory_offset(uint64_t size);
+extern ram_addr_t below_4g_hp_mem_size;
+extern ram_addr_t above_4g_hp_mem_size;
+
 /* e820 types */
 #define E820_RAM        1
 #define E820_RESERVED   2
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 0c0096f..f3f1651 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -43,6 +43,7 @@
 #include "xen.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "dimm.h"
 #ifdef CONFIG_XEN
 #  include <xen/hvm/hvm_info_table.h>
 #endif
@@ -155,9 +156,9 @@ static void pc_init1(MemoryRegion *system_memory,
 kvmclock_create();
 }
 
-if (ram_size >= 0xe0000000 ) {
-above_4g_mem_size = ram_size - 0xe0000000;
-below_4g_mem_size = 0xe0000000;
+if (ram_size >= PCI_HOLE_START ) {
+above_4g_mem_size = ram_size - PCI_HOLE_START;
+below_4g_mem_size = PCI_HOLE_START;
 } else {
 above_4g_mem_size = 0;
 below_4g_mem_size = ram_size;
@@ -172,6 +173,9 @@ static void pc_init1(MemoryRegion *system_memory,
 rom_memory = system_memory;
 }
 
+/* adjust memory map for hotplug dimms */
+dimm_calc_offsets(pc_set_hp_memory_offset);
+
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
 fw_cfg = pc_memory_init(system_memory,
@@ -192,9 +196,11 @@ static void pc_init1(MemoryRegion *system_memory,
 if (pci_enabled) {
 pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_bus, gsi,
   system_memory, system_io, ram_size,
-  below_4g_mem_size,
-  0x100000000ULL - below_4g_mem_size,
-  0x100000000ULL + above_4g_mem_size,
+  below_4g_mem_size + below_4g_hp_mem_size,
+  0x100000000ULL - below_4g_mem_size
+- below_4g_hp_mem_size,
+  0x100000000ULL + above_4g_mem_size
++ above_4g_hp_mem_size,
   (sizeof(target_phys_addr_t) == 4
? 0
: ((uint64_t)1 << 62)),
diff --git a/vl.c b/vl.c
index 1329c30..0ff8818 100644
--- a/vl.c
+++ b/vl.c
@@ -176,6 +176,7 @@ DisplayType display_type

[RFC PATCH v2 04/21][SeaBIOS] acpi: generate hotplug memory devices

2012-07-11 Thread Vasilis Liaskovitis

The memory device generation is guided by qemu paravirt info. Seabios
first uses the info to setup SRAT entries for the hotplug-able memory slots.
Afterwards, build_memssdt uses the created SRAT entries to generate
appropriate memory device objects. One memory device (and corresponding SRAT
entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
memory device is created for initial system memory.

We only support up to 255 DIMMs for now (PackageOp used for the MEON array can
only describe an array of at most 255 elements. VarPackageOp would be needed to
support more than 255 devices)

v1-v2:
Seabios reads mems_sts from qemu to build e820_map
SSDT size and some offsets are calculated with extraction macros.
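
The way the MEON array is derived from the mems_sts status bytes can be sketched as follows. This assumes one bit per dimm, least significant bit first, which is what the loop in build_memssdt() appears to do (the shift and mask operators are partly lost in this copy of the patch). Both helper names are hypothetical.

```c
#include <stdint.h>

/* Return 1 if dimm `idx` is marked enabled in the status bitmap, else 0.
 * Bit (idx % 8) of byte (idx / 8) describes the dimm. */
static int dimm_enabled(const uint8_t *mems_sts, unsigned idx)
{
    return (mems_sts[idx / 8] >> (idx % 8)) & 1;
}

/* Fill a MEON-style array: one byte per dimm, 0x01 if enabled, else 0x00. */
static void fill_meon(const uint8_t *mems_sts, unsigned nb_dimms,
                      uint8_t *meon)
{
    unsigned i;

    for (i = 0; i < nb_dimms; i++) {
        meon[i] = dimm_enabled(mems_sts, i) ? 0x01 : 0x00;
    }
}
```

In SeaBIOS the bytes would come from inb(MEM_BASE + i/8) rather than from an in-memory array; the array form is used here only to keep the sketch self-contained.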

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi.c |  158 +--
 1 files changed, 152 insertions(+), 6 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 55e4607..c83e8c7 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -510,6 +510,127 @@ build_ssdt(void)
 return ssdt;
 }
 
+#include "ssdt-mem.hex"
+
+/* 0x5B 0x82 DeviceOp PkgLength NameString DimmID */
+#define MEM_BASE 0xaf80
+#define SD_MEM (ssdm_mem_aml + *ssdt_mem_start)
+#define SD_MEMSIZEOF (*ssdt_mem_end - *ssdt_mem_start)
+#define SD_OFFSET_MEMHEX (*ssdt_mem_name - *ssdt_mem_start + 2)
+#define SD_OFFSET_MEMID (*ssdt_mem_id - *ssdt_mem_start)
+#define SD_OFFSET_PXMID 31
+#define SD_OFFSET_MEMSTART 55
+#define SD_OFFSET_MEMEND   63
+#define SD_OFFSET_MEMSIZE  79
+
+u64 nb_hp_memslots = 0;
+struct srat_memory_affinity *mem;
+
+static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
+{
+memcpy(ssdt_ptr, SD_MEM, SD_MEMSIZEOF);
+ssdt_ptr[SD_OFFSET_MEMHEX] = getHex(i >> 4);
+ssdt_ptr[SD_OFFSET_MEMHEX+1] = getHex(i);
+ssdt_ptr[SD_OFFSET_MEMID] = i;
+ssdt_ptr[SD_OFFSET_PXMID] = node;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMSTART) = mem_base;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMEND) = mem_base + mem_len;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMSIZE) = mem_len;
+}
+
+static void*
+build_memssdt(void)
+{
+u64 mem_base;
+u64 mem_len;
+u8  node;
+int i;
+struct srat_memory_affinity *entry = mem;
+u64 nb_memdevs = nb_hp_memslots;
+u8  memslot_status, enabled;
+
+int length = ((1+3+4)
+  + (nb_memdevs * SD_MEMSIZEOF)
+  + (1+2+5+(12*nb_memdevs))
+  + (6+2+1+(1*nb_memdevs)));
+u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
+if (! ssdt) {
+warn_noalloc();
+return NULL;
+}
+u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header);
+
+// build Scope(_SB_) header
+*(ssdt_ptr++) = 0x10; // ScopeOp
+ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3);
+*(ssdt_ptr++) = '_';
+   *(ssdt_ptr++) = 'S';
+*(ssdt_ptr++) = 'B';
+*(ssdt_ptr++) = '_';
+
+for (i = 0; i < nb_memdevs; i++) {
+mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+node = entry->proximity[0];
+build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
+ssdt_ptr += SD_MEMSIZEOF;
+entry++;
+}
+
+// build Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} ...}
+*(ssdt_ptr++) = 0x14; // MethodOp
+ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'T';
+*(ssdt_ptr++) = 'F';
+*(ssdt_ptr++) = 'Y';
+*(ssdt_ptr++) = 0x02;
+for (i=0; i<nb_memdevs; i++) {
+*(ssdt_ptr++) = 0xA0; // IfOp
+   ssdt_ptr = encodeLen(ssdt_ptr, 11, 1);
+*(ssdt_ptr++) = 0x93; // LEqualOp
+*(ssdt_ptr++) = 0x68; // Arg0Op
+*(ssdt_ptr++) = 0x0A; // BytePrefix
+*(ssdt_ptr++) = i;
+*(ssdt_ptr++) = 0x86; // NotifyOp
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'P';
+*(ssdt_ptr++) = getHex(i >> 4);
+*(ssdt_ptr++) = getHex(i);
+*(ssdt_ptr++) = 0x69; // Arg1Op
+}
+
+// build Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+*(ssdt_ptr++) = 0x08; // NameOp
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'E';
+*(ssdt_ptr++) = 'O';
+*(ssdt_ptr++) = 'N';
+*(ssdt_ptr++) = 0x12; // PackageOp
+ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
+*(ssdt_ptr++) = nb_memdevs;
+
+entry = mem;
+memslot_status = 0;
+
+for (i = 0; i < nb_memdevs; i++) {
+enabled = 0;
+if (i % 8 == 0)
+memslot_status = inb(MEM_BASE + i/8);
+enabled = memslot_status & 1;
+mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+*(ssdt_ptr++) = enabled ? 0x01 : 0x00;
+if (enabled)
+add_e820(mem_base, mem_len, E820_RAM);
+memslot_status = memslot_status >> 1;
+entry

[RFC PATCH v2 02/21][SeaBIOS] Add SSDT memory device support

2012-07-11 Thread Vasilis Liaskovitis

Define SSDT hotplug-able memory devices in _SB namespace. The dynamically
generated SSDT includes per memory device hotplug methods. These methods
just call methods defined in the DSDT. Also dynamically generate a MTFY
method and a MEON array of the online/available memory devices.  ACPI
extraction macros are used to place the AML code in variables later used by
src/acpi. The design is taken from SSDT cpu generation.
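
For illustration, the per-device name patching (the MPAA template name becoming MP00, MP01, ...) can be sketched like this. hex_digit() plays the role that getHex() plays in src/acpi.c, and memdev_name() is a hypothetical helper that does not exist in SeaBIOS.

```c
#include <stdint.h>

/* Convert the low nibble of `v` to an upper-case hex character. */
static char hex_digit(unsigned v)
{
    v &= 0x0f;
    return (char)(v <= 9 ? '0' + v : 'A' + (v - 10));
}

/* Write the four-character ACPI name of memory device `id` into `out`,
 * followed by a terminating NUL (so `out` needs room for 5 bytes). */
static void memdev_name(uint8_t id, char out[5])
{
    out[0] = 'M';
    out[1] = 'P';
    out[2] = hex_digit(id >> 4);
    out[3] = hex_digit(id);
    out[4] = '\0';
}
```

The real code writes the two hex characters directly into the copied AML at SD_OFFSET_MEMHEX instead of building a C string.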

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 Makefile |2 +-
 src/ssdt-mem.dsl |   65 ++
 2 files changed, 66 insertions(+), 1 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

diff --git a/Makefile b/Makefile
index fe974f7..299069e 100644
--- a/Makefile
+++ b/Makefile
@@ -228,7 +228,7 @@ $(OUT)%.hex: src/%.dsl ./tools/acpi_extract_preprocess.py ./tools/acpi_extract.p
$(Q)$(PYTHON) ./tools/acpi_extract.py $(OUT)$*.lst > $(OUT)$*.off
$(Q)cat $(OUT)$*.off > $@
 
-$(OUT)ccode32flat.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex
+$(OUT)ccode32flat.o: $(OUT)acpi-dsdt.hex $(OUT)ssdt-proc.hex $(OUT)ssdt-pcihp.hex $(OUT)ssdt-mem.hex
 
  Kconfig rules
 
diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
new file mode 100644
index 0000000..ee322f0
--- /dev/null
+++ b/src/ssdt-mem.dsl
@@ -0,0 +1,65 @@
+/* This file is the basis for the ssdt_mem[] variable in src/acpi.c.
+ * It is similar in design to the ssdt_proc variable.
+ * It defines the contents of the per-memory-device Device() object.  At
+ * runtime, a dynamically generated SSDT will contain one copy of this
+ * AML snippet for every possible memory device in the system.  The
+ * objects will be placed in the \_SB_ namespace.
+ *
+ * In addition to the aml code generated from this file, the
+ * src/acpi.c file creates a MTFY method with an entry for each memdevice:
+ * Method(MTFY, 2) {
+ * If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) }
+ * If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) }
+ * ...
+ * }
+ * and a MEON array with the list of active and inactive memory devices:
+ * Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+ */
+ACPI_EXTRACT_ALL_CODE ssdm_mem_aml
+
+DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
+/*  v-- DO NOT EDIT --v */
+{
+ACPI_EXTRACT_DEVICE_START ssdt_mem_start
+ACPI_EXTRACT_DEVICE_END ssdt_mem_end
+ACPI_EXTRACT_DEVICE_STRING ssdt_mem_name
+Device(MPAA) {
+ACPI_EXTRACT_NAME_BYTE_CONST ssdt_mem_id
+Name(ID, 0xAA)
+/*  ^-- DO NOT EDIT --^
+ *
+ * The src/acpi.c code requires the above layout so that it can update
+ * MPAA and 0xAA with the appropriate MEMDEVICE id (see
+ * SD_OFFSET_MEMHEX/MEMID1/MEMID2).  Don't change the above without
+ * also updating the C code.
+ */
+Name(_HID, EISAID("PNP0C80"))
+Name(_PXM, 0xAA)
+
+External(CMST, MethodObj)
+External(MPEJ, MethodObj)
+
+Name(_CRS, ResourceTemplate() {
+QwordMemory(
+   ResourceConsumer,
+   ,
+   MinFixed,
+   MaxFixed,
+   Cacheable,
+   ReadWrite,
+   0x0,
+   0xDEADBEEF,
+   0xE6ADBEEE,
+   0x00000000,
+   0x08000000,
+   )
+})
+Method (_STA, 0) {
+Return(CMST(ID))
+}
+Method (_EJ0, 1, NotSerialized) {
+MPEJ(ID, Arg0)
+}
+}
+}
+
-- 
1.7.9


Re: [RFC PATCH v2 04/21][SeaBIOS] acpi: generate hotplug memory devices

2012-07-11 Thread Vasilis Liaskovitis

Hi,

On Wed, Jul 11, 2012 at 06:48:38PM +0800, Wen Congyang wrote:
  +if (enabled)
  +add_e820(mem_base, mem_len, E820_RAM);
 
 add_e820() is declared in memmap.h. You should include this header file,
 otherwise, seabios cannot be built.

thanks. you had the same comment on v1 but I forgot to address it. I will
update.

- Vasilis
 
 Thanks
 Wen Congyang
 
  +memslot_status = memslot_status >> 1;
  +entry++;
  +}
  +build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
  +
  +return ssdt;
  +}
  +
   #include "ssdt-pcihp.hex"
   
   #define PCI_RMV_BASE 0xae0c
  @@ -618,9 +739,6 @@ build_srat(void)
   {
   int nb_numa_nodes = qemu_cfg_get_numa_nodes();
   
  -if (nb_numa_nodes == 0)
  -return NULL;
  -
   u64 *numadata = malloc_tmphigh(sizeof(u64) * (MaxCountCPUs + nb_numa_nodes));
   if (!numadata) {
   warn_noalloc();
  @@ -629,10 +747,11 @@ build_srat(void)
   
   qemu_cfg_get_numa_data(numadata, MaxCountCPUs + nb_numa_nodes);
   
  +qemu_cfg_get_numa_data(&nb_hp_memslots, 1);
   struct system_resource_affinity_table *srat;
   int srat_size = sizeof(*srat) +
   sizeof(struct srat_processor_affinity) * MaxCountCPUs +
  -sizeof(struct srat_memory_affinity) * (nb_numa_nodes + 2);
  +sizeof(struct srat_memory_affinity) * (nb_numa_nodes + nb_hp_memslots + 2);
   
   srat = malloc_high(srat_size);
   if (!srat) {
  @@ -667,7 +786,7 @@ build_srat(void)
* from 640k-1M and possibly another one from 3.5G-4G.
*/
   struct srat_memory_affinity *numamem = (void*)core;
  -int slots = 0;
  +int slots = 0, node;
   u64 mem_len, mem_base, next_base = 0;
   
   acpi_build_srat_memory(numamem, 0, 640*1024, 0, 1);
  @@ -694,10 +813,36 @@ build_srat(void)
   next_base += (1ULL << 32) - RamSize;
   }
   acpi_build_srat_memory(numamem, mem_base, mem_len, i-1, 1);
  +
   numamem++;
   slots++;
  +
  +}
  +mem = (void*)numamem;
  +
  +if (nb_hp_memslots) {
  +u64 *hpmemdata = malloc_tmphigh(sizeof(u64) * (3 * nb_hp_memslots));
  +if (!hpmemdata) {
  +warn_noalloc();
  +free(hpmemdata);
  +free(numadata);
  +return NULL;
  +}
  +
  +qemu_cfg_get_numa_data(hpmemdata, 3 * nb_hp_memslots);
  +
  +for (i = 1; i < nb_hp_memslots + 1; ++i) {
  +mem_base = *hpmemdata++;
  +mem_len = *hpmemdata++;
  +node = *hpmemdata++;
  +acpi_build_srat_memory(numamem, mem_base, mem_len, node, 1);
  +numamem++;
  +slots++;
  +}
  +free(hpmemdata);
   }
  -for (; slots < nb_numa_nodes + 2; slots++) {
  +
  +for (; slots < nb_numa_nodes + nb_hp_memslots + 2; slots++) {
   acpi_build_srat_memory(numamem, 0, 0, 0, 0);
   numamem++;
   }
  @@ -748,6 +893,7 @@ acpi_bios_init(void)
   ACPI_INIT_TABLE(build_madt());
   ACPI_INIT_TABLE(build_hpet());
   ACPI_INIT_TABLE(build_srat());
  +ACPI_INIT_TABLE(build_memssdt());
   ACPI_INIT_TABLE(build_pcihp());
   
   u16 i, external_tables = qemu_cfg_acpi_additional_tables();
 

Re: [RFC PATCH v2 05/21][SeaBIOS] pciinit: Fix pcimem_start value

2012-07-11 Thread Vasilis Liaskovitis

Hi,

On Wed, Jul 11, 2012 at 01:56:19PM +0200, Gerd Hoffmann wrote:
 On 07/11/12 12:31, Vasilis Liaskovitis wrote:
  In order to hotplug memory between RamSize and BUILD_PCIMEM_START, the pci
  window needs to start at BUILD_PCIMEM_START (0xe0000000).
  Otherwise, the guest cannot online new dimms at those ranges due to pci_root
  window conflicts. (workaround for linux guest is booting with pci=nocrs)
 
   static void pci_bios_map_devices(struct pci_bus *busses)
   {
  -pcimem_start = RamSize;
  +pcimem_start = BUILD_PCIMEM_START;
 
 It isn't that simple.  For the 32bit pci window it will work, but will
 leave address space unused instead of assigning it to the 32bit pci
 window.  For the 64bit pci window it will not work.
 
 You have to walk the dimms and figure what the highest used address is,
 for both below-4g and above-4g.  Then fill two variable with it and make
 the pci init code use that instead of RamSize and RamSizeOver4G.

I see. I already have these values computed in qemu-kvm, so I can pass
them in a paravirt struct, or infer them from the dimm/srat paravirt info that I
already pass to seabios.

If I understand correctly, we would like the pcimem windows to use the maximum
possible address space (constrained by the exact dimms/ranges which are defined)
instead of leaving unused space.
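
A minimal sketch of the dimm walk Gerd describes, for illustration only. The
struct and function names are hypothetical (they are not SeaBIOS or qemu
identifiers), and the sketch assumes that no dimm straddles the 4G boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOUR_GIB (1ULL << 32)

/* Hypothetical description of one dimm range (not a SeaBIOS structure). */
struct dimm_range {
    uint64_t base;
    uint64_t len;
};

/* Hypothetical result: first free address below 4G and above 4G. */
struct mem_tops {
    uint64_t below_4g;
    uint64_t above_4g;
};

/*
 * Start from the boot-time RAM sizes and raise each top to the end of the
 * highest dimm found in that region.  Assumes no dimm straddles 4G.
 */
static struct mem_tops find_mem_tops(uint64_t ram_size,
                                     uint64_t ram_size_over_4g,
                                     const struct dimm_range *dimms,
                                     size_t count)
{
    struct mem_tops tops = { ram_size, FOUR_GIB + ram_size_over_4g };

    for (size_t i = 0; i < count; i++) {
        uint64_t end = dimms[i].base + dimms[i].len;

        assert(end >= dimms[i].base); /* range must not wrap around */
        if (end <= FOUR_GIB) {
            if (end > tops.below_4g)
                tops.below_4g = end;
        } else if (end > tops.above_4g) {
            tops.above_4g = end;
        }
    }
    return tops;
}
```

The 32bit pci window would then start at below_4g and the 64bit window at
above_4g, instead of at RamSize and 4G + RamSizeOver4G.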

thanks,

- Vasilis

Re: [Qemu-devel] [RFC PATCH v2 13/21] Implement memory hotplug notification lists

2012-07-11 Thread Vasilis Liaskovitis

Hi,

On Wed, Jul 11, 2012 at 08:59:03AM -0600, Eric Blake wrote:
 On 07/11/2012 04:31 AM, Vasilis Liaskovitis wrote:
  Guest can respond to ACPI hotplug events e.g. with _EJ or _OST method.
  This patch implements a tail queue to store guest notifications for memory
  hot-add and hot-remove requests.
  
  Guest responses for memory hotplug command on a per-dimm basis can be 
  detected
  with the new hmp command info memhp or the new qmp command query-memhp
  Examples:
  
 
  +++ b/qapi-schema.json
  @@ -1862,3 +1862,29 @@
   # Since: 0.14.0
   ##
   { 'command': 'netdev_del', 'data': {'id': 'str'} }
  +
  +##
  +# @MemHpInfo:
  +#
  +# Information about status of a memory hotplug command
  +#
  +# @Dimm: the Dimm associated with the result
  +#
  +# @result: the result of the hotplug command
  +#
  +# Since: 1.1.3
 
 Should probably be 1.2, not 1.1.3.


right

  +#
  +##
  +{ 'type': 'MemHpInfo',
  +  'data': {'Dimm': 'str', 'request': 'str', 'result': 'str'} }
 
 Why the upper case?  Wouldn't 'dimm' be more consistent?


I will change to dimm

  +
  +##
  +# @query-memhp:
 
 Why are we abbreviating?  It might be better to name the QMP command
 query-memory-hotplug
 

agreed, memhp is a bit cryptic. I will change to your suggestion 

  +#
  +# Returns a list of information about pending hotplug commands
  +#
  +# Returns: a list of @MemhpInfo
  +#
  +# Since: 1.1.3
 
 Likewise for 1.2.

right

 
  +
  +- Dimm: Dimm name (json-str)
  +- request: type of hot request: hot-add or hot-remove  (json-str)
  +- result: result of the hotplug request for this Dimm success or failure 
  (json-str)
 
 This may need tweaks (such as s/Dimm/dimm/) based on resolution of above
 comments.
ok, it will be dimm

thanks,

- Vasilis


Re: [Qemu-devel] [RFC PATCH v2 19/21] Implement info memtotal and query-memtotal

2012-07-11 Thread Vasilis Liaskovitis

Hi,

On Wed, Jul 11, 2012 at 09:14:29AM -0600, Eric Blake wrote:
 On 07/11/2012 04:32 AM, Vasilis Liaskovitis wrote:
  Returns total memory of guest in bytes, including hotplugged memory.
  
  Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
 
 Should this instead be merged with query-balloon output, so that we have
 a single command that shows all aspects of memory usage (both balloon
 and hotplug at once)?
 
 
  @@ -1888,3 +1888,15 @@
   # Since: 1.1.3
   ##
   { 'command': 'query-memhp', 'returns': ['MemHpInfo'] }
  +
  +##
  +# @query-memtotal:
 
 A more generic name might be 'query-memory', especially if we merge
 balloon and hotplug information into one command.

query-memory sounds reasonable to me.

query-balloon should also be updated to show the correct memory. 
Do you foresee any issues with merging them?  the query-memory command
should work independently of the balloon driver.

  +#
  +# Returns total memory in bytes, including hotplugged dimms
  +#
  +# Returns: a l
 
 truncated

sorry about that.

thanks,

- Vasilis

 
 -- 
 Eric Blake   ebl...@redhat.com+1-919-301-3266
 Libvirt virtualization library http://libvirt.org
 
 
 



Re: [RFC PATCH v2 20/21] Implement -dimms, -dimmspop command line options

2012-07-11 Thread Vasilis Liaskovitis

Hi,

On Wed, Jul 11, 2012 at 05:55:25PM +0300, Avi Kivity wrote:
 On 07/11/2012 01:32 PM, Vasilis Liaskovitis wrote:
  Implement batch dimm creation command line options. These could be useful 
  for
  not bloating the command line with a large number of dimms.
 
 IMO this is unneeded.  With a management tool there is no problem
 generating a long command line; from the command line -dimm will be a
 rarely used option.

ok, I thought so. I guess this patch and the next are unwanted, unless there is
a strong opinion for using them coming from others.

thanks,

- Vasilis

Re: [RFC PATCH 0/9] ACPI memory hotplug

2012-04-24 Thread Vasilis Liaskovitis

Hi,
On Tue, Apr 24, 2012 at 10:52:24AM +0300, Gleb Natapov wrote:
 On Mon, Apr 23, 2012 at 02:31:15PM +0200, Vasilis Liaskovitis wrote:
  The 440fx spec mentions: The address range from the top of main DRAM to 4
  Gbytes (top of physical memory space supported by the 440FX PCIset) is 
  normally
  mapped to PCI. The PMC forwards all accesses within this address range to 
  PCI.
  
  What we probably want is that the initial memory map creation takes into 
  account
  all dimms specified (both populated/unpopulated) 
 Yes.
 
  So -m 1G, -device dimm,size=1G,populated=true -device 
  dimm,size=1G,populated=false
  would create a system map with top of memory and start of PCI-hole at 2G. 
  
 What -m 1G means on this command line? Isn't it redundant?
yes, this was redundant with the original concept.

 Maybe we should make -m create a non-unpluggable, populated slot starting
 at address 0. Then your config above will specify 3G memory with 2G
 populated (the first of which is not removable) and 1G unpopulated. PCI hole
 starts above 3G.

I agree -m should mean one big unpluggable slot.

So in the new proposal, -device dimm,populated=true means a hot-removable dimm
that has already been hotplugged.

A question here is when exactly the initial hot-add event for this dimm should
be played. If the relevant OSPM has not yet been initialized (e.g. the
acpi_memhotplug module in a linux guest needs to be loaded), the guest may not
see the event. This is a general issue of course, but with initially populated
hot-removable dimms it may be a bigger issue. Can ospm acpi initialization be
detected?

Or maybe you are suggesting populated=true is part of initial memory (i.e. not
hot-added, but still hot-removable). Though in that case guestOS may use it for
bootmem allocations, making hot-remove more likely to fail at the memory
offlining stage.

 
  This may require some shifting of physical address offsets around
  3.5GB-4GB - is this the minimum PCI hole allowed?
 Currently it is 1G in QEMU code.
ok

  
  E.g. if we specify 4x1GB DIMMs (only the first initially populated)
  -m 1G, -device dimm,size=1G,populated=true -device 
  dimm,size=1G,populated=false
  -device dimm,size=1G,populated=false -device dimm,size=1G,populated=false
  
  we create the following memory map:
  dimm0: [0,1G)
  dimm1: [1G, 2G)
  dimm2: [2G, 3G)
  dimm3: [4G, 5G) or dimm3 is split into [3G, 3.5G) and [4G, 4.5G)
  
  does either of these options sound reasonable?
  
 We shouldn't split dimms IMO. Just unnecessary complication. Better make
 bigger PCI hole.

ok
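
A small sketch of the no-split placement discussed above, for illustration
only. It assumes a PCI hole of [3G, 4G) (the 1G minimum Gleb mentions) and
places dimms back to back; a dimm that would reach into the hole is moved
whole to the end of the hole instead of being split. The function name and
parameters are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/*
 * Assign a base address to each dimm, in order, starting at first_base.
 * A dimm that would overlap [hole_start, hole_end) is placed at hole_end
 * rather than split in two.
 */
static void place_dimms(const uint64_t *sizes, uint64_t *bases, size_t count,
                        uint64_t first_base,
                        uint64_t hole_start, uint64_t hole_end)
{
    uint64_t next = first_base;

    assert(hole_start <= hole_end);
    for (size_t i = 0; i < count; i++) {
        if (next < hole_end && next + sizes[i] > hole_start)
            next = hole_end; /* skip the hole, keep the dimm contiguous */
        bases[i] = next;
        next += sizes[i];
    }
}
```

With four 1G dimms and first_base 0 this yields bases 0, 1G, 2G and 4G, i.e.
the "dimm3: [4G, 5G)" variant of the map quoted above.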

thanks,

- Vasilis

Re: [RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices.

2012-04-24 Thread Vasilis Liaskovitis

Hi,

On Mon, Apr 23, 2012 at 07:37:51PM -0400, Kevin O'Connor wrote:
 On Thu, Apr 19, 2012 at 04:08:41PM +0200, Vasilis Liaskovitis wrote:
   The memory device generation is guided by qemu paravirt info. Seabios
   first uses the info to setup SRAT entries for the hotplug-able memory 
  slots.
   Afterwards, build_memssdt uses the created SRAT entries to generate
   appropriate memory device objects. One memory device (and corresponding 
  SRAT
   entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
   memory device is created for initial system memory (the method can be
   generalized to all memory though).
  
   Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
  ---
   src/acpi.c |  151 
  ++--
   1 files changed, 147 insertions(+), 4 deletions(-)
  
  diff --git a/src/acpi.c b/src/acpi.c
  index 30888b9..5580099 100644
  --- a/src/acpi.c
  +++ b/src/acpi.c
  @@ -484,6 +484,131 @@ build_ssdt(void)
   return ssdt;
   }
   
  +static unsigned char ssdt_mem[] = {
  +0x5b,0x82,0x47,0x07,0x4d,0x50,0x41,0x41,
 
 This patch looks like it uses the SSDT generation mechanism that was
 present in SeaBIOS v1.6.3.  Since then, however, the runtime AML code
 generation has been improved to be more dynamic.  Any runtime
 generated AML code should be updated to use the newer mechanisms.

thanks, I will look into the new mechanism and rewrite.

- Vasilis


Re: [RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove

2012-04-23 Thread Vasilis Liaskovitis

On Sun, Apr 22, 2012 at 04:58:47PM +0300, Gleb Natapov wrote:
 On Thu, Apr 19, 2012 at 04:08:46PM +0200, Vasilis Liaskovitis wrote:
   Hotplugged memory is not persistent in the e820 memory maps. After 
  hotplugging
   a memslot and rebooting the VM, the hotplugged device is not present.
  
   A possible solution is to add an e820 for the new memslot in the acpi_piix4
   hot-add handler. On a reset, Seabios (see next patch in series) will 
  enable all
   memory devices for which it finds an e820 entry that covers the devices's 
  address
   range.
  
   On hot-remove, the acpi_piix4 handler will try to remove the e820 entry
   corresponding to the device. This will work when no VM reboots happen
   between hot-add and hot-remove, but it is not a sufficient solution in
   general: Seabios and GuestOS merge adjacent e820 entries on machine reboot,
   so the sequence hot-add/ rebootVM / hot-remove will fail to remove a
   corresponding e820 entry at the hot-remove phase.
  
 Why do you need this patch and the next one? Bios can restore the state
 of memslots and build e820 map by reading mems_sts.

I see, that is a simpler solution. Since qemu currently creates most ram e820map
entries and passes them to seabios, I tried to follow the same approach. But
your suggestion makes things easier and we don't have to worry about merged e820
entries on hot-remove.  I'll rework it.
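
A rough sketch of what the reworked BIOS side could look like, for
illustration only. It assumes the status register is readable as a bitmap
with one bit per memslot (bit set = populated); that layout, and all names
below, are assumptions and not taken from the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1 /* e820 type for usable RAM */

/* Hypothetical per-slot range, e.g. taken from the paravirt/SRAT info. */
struct hp_slot {
    uint64_t base;
    uint64_t len;
};

struct e820_entry {
    uint64_t start;
    uint64_t size;
    uint32_t type;
};

/*
 * Emit one RAM entry for every slot whose bit is set in the status bitmap.
 * Returns the number of entries written to out.
 */
static size_t e820_from_status(const uint8_t *status,
                               const struct hp_slot *slots, size_t nslots,
                               struct e820_entry *out, size_t max_out)
{
    size_t n = 0;

    for (size_t i = 0; i < nslots; i++) {
        if (!(status[i / 8] & (1u << (i % 8))))
            continue; /* slot not populated */
        assert(n < max_out);
        out[n].start = slots[i].base;
        out[n].size = slots[i].len;
        out[n].type = E820_RAM;
        n++;
    }
    return n;
}
```

Because the map is rebuilt from the slot status on every boot, no e820 entry
has to be removed at hot-remove time, so merged entries are not a concern.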
thanks,

 Vasilis

Re: [RFC PATCH 0/9] ACPI memory hotplug

2012-04-23 Thread Vasilis Liaskovitis

Hi,

On Sun, Apr 22, 2012 at 05:20:59PM +0300, Gleb Natapov wrote:
 On Sun, Apr 22, 2012 at 05:13:27PM +0300, Avi Kivity wrote:
  On 04/22/2012 05:09 PM, Gleb Natapov wrote:
   On Sun, Apr 22, 2012 at 05:06:43PM +0300, Avi Kivity wrote:
On 04/22/2012 04:56 PM, Gleb Natapov wrote:
 start. We will need it for migration anyway.

  hotplug-able memory slots i.e. initial system memory is not modeled 
  with
  memslots. The concept could be generalized to include all memory 
  though, or it
  could more closely follow kvm-memory slots.
 OK, I hope final version will allow for memory < 4G to be hot-pluggable.

Why is that important?

   Because my feeling is that people that want to use this kind of feature
   want to start using it with VMs smaller than 4G. Of course not all
   memory has to be hot unpluggable. Making the first 1M or even the first 128M not
   unpluggable makes perfect sense.
  
  Can't you achieve this with -m 1G, -device dimm,size=1G,populated=true
  -device dimm,size=1G,populated=false?
  
 From this:
 
 (for hw/pc.c PCI hole is currently [below_4g_mem_size, 4G), so
 hotplugged memory should start from max(4G, above_4g_mem_size).
 
 I understand that hotpluggable memory can start from above 4G only. With
 the config above we will have a memory hole from 1G to the PCI memory hole.
 Maybe not a big problem, but I do not see a technical reason for the constraint.
  
The 440fx spec mentions: The address range from the top of main DRAM to 4
Gbytes (top of physical memory space supported by the 440FX PCIset) is normally
mapped to PCI. The PMC forwards all accesses within this address range to PCI.

What we probably want is that the initial memory map creation takes into account
all dimms specified (both populated/unpopulated) 
So -m 1G, -device dimm,size=1G,populated=true -device 
dimm,size=1G,populated=false
would create a system map with top of memory and start of PCI-hole at 2G. 

This may require some shifting of physical address offsets around
3.5GB-4GB - is this the minimum PCI hole allowed?

E.g. if we specify 4x1GB DIMMs (only the first initially populated)
-m 1G, -device dimm,size=1G,populated=true -device dimm,size=1G,populated=false
-device dimm,size=1G,populated=false -device dimm,size=1G,populated=false

we create the following memory map:
dimm0: [0,1G)
dimm1: [1G, 2G)
dimm2: [2G, 3G)
dimm3: [4G, 5G) or dimm3 is split into [3G, 3.5G) and [4G, 4.5G)

does either of these options sound reasonable?

thanks,

- Vasilis

Re: [Qemu-devel] [RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug.

2012-04-20 Thread Vasilis Liaskovitis

Hi,

On Fri, Apr 20, 2012 at 12:55:24PM +0200, Igor Mammedov wrote:
 +/* Memory eject notify method */
 +OperationRegion(MEMJ, SystemIO, 0xaf40, 32)
 +Field (MEMJ, ByteAcc, NoLock, Preserve)
 +{
 +MPE, 256
 +}
 +
 +Method (MPEJ, 2, NotSerialized) {
 +// _EJ0 method - eject callback
 +Store(ShiftLeft(1,Arg0), MPE)
 +Sleep(200)
 +}
 MPE is write only and only one memslot is ejected at a time. Why is a 256-bit
 field here then?
 Could we use just 1 byte and write a slot number into it and save some io
 address space this way?

good point. This was implemented similarly to the hot-add/status register only
for symmetry, but you are right, since only one slot is ejected at a time, this
can be reduced to one byte and save space. I will update for the next version.
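
To make the trade-off concrete, here is a minimal sketch of the two register
encodings, for illustration only. The function names are hypothetical; the
first mirrors Store(ShiftLeft(1, Arg0), MPE) on a 256-bit field, the second is
the one-byte variant Igor suggests:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SLOTS 256

/* 256-bit layout: one bit per slot, occupying 32 bytes of I/O space. */
static void eject_encode_bitmap(uint8_t reg[MEM_SLOTS / 8], unsigned slot)
{
    assert(slot < MEM_SLOTS);
    memset(reg, 0, MEM_SLOTS / 8);
    reg[slot / 8] = (uint8_t)(1u << (slot % 8));
}

/* One-byte layout: the guest simply writes the slot number. */
static uint8_t eject_encode_byte(unsigned slot)
{
    assert(slot < MEM_SLOTS);
    return (uint8_t)slot;
}
```

Since only one slot is ejected per write, both encodings carry the same
information, but the second needs 1 byte of I/O space instead of 32.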

 
 +
 +/* Memory hotplug notify method */
 +OperationRegion(MEST, SystemIO, 0xaf20, 32)
 It's more a suggestion: move it a bit farther to allow maybe 1024 cpus in the
 future.
 That will prevent a compatibility headache if we decide to expand support to
 more than 256 cpus.

ok, I will move it to 0xaf80 or higher (so cpu-hotplug could be extended to at
least 1024 cpus)

 
 Or even better, make this address configurable at run-time and build this
 var along with the SSDT (converting along the way all other hard-coded io ports
 to the same generic run-time interface). This wish is out of scope of this
 patch-set, but what do you think about the idea?

yes, that would give more flexibility and avoid more compatibility headaches.
As you say it's not a main issue for the series, but I can work on it as we
start converting hardcoded i/o ports to configurable properties.

thanks,

- Vasilis

Re: [RFC PATCH 0/9] ACPI memory hotplug

2012-04-20 Thread Vasilis Liaskovitis

On Thu, Apr 19, 2012 at 04:08:38PM +0200, Vasilis Liaskovitis wrote:
 
 series is based on uq/master for qemu-kvm, and master for seabios. Can be 
 found
 also at:
forgot to paste the repo links in the original coverletter, here they are if
someone wants them:

https://github.com/vliaskov/qemu-kvm/commits/memory-hotplug 
https://github.com/vliaskov/seabios/commits/memory-hotplug

thanks,

- Vasilis

Re: [Qemu-devel] [RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS

2012-04-20 Thread Vasilis Liaskovitis

On Fri, Apr 20, 2012 at 12:33:57PM +0200, Igor Mammedov wrote:
 On 04/19/2012 04:08 PM, Vasilis Liaskovitis wrote:
 -numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
 +numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * 
 nb_hp_memslots) * 8);
   numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
 +numa_fw_cfg[1] = cpu_to_le64(nb_hp_memslots);
 this will break compatibility: if the guest was migrated from old->new qemu,
 then on reboot it will use the old bios, which expects numa_fw_cfg[1] to be
 something else.
 Could memslots info be moved to the end of an existing interface?

right. The number of memslots can be placed at 1 + max_cpus + nb_numa_nodes,
instead of right after the number of nodes. This way the old layout is
preserved, and all memslot info comes at the end. I will rewrite.
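
A sketch of the resulting index layout, for illustration only. The struct and
function are hypothetical. The positions of the per-cpu and per-node entries
reflect my understanding of the existing interface and should be treated as an
assumption; the three u64 values per memslot (base, length, node) follow what
the SeaBIOS side of this series reads:

```c
#include <assert.h>
#include <stddef.h>

/* Indices (in u64 units) into the numa fw_cfg array. Hypothetical helper. */
struct numa_cfg_layout {
    size_t node_count_idx;    /* nb_numa_nodes, unchanged at index 0 */
    size_t cpu_map_idx;       /* max_cpus entries (assumed old layout) */
    size_t node_mem_idx;      /* nb_numa_nodes entries (assumed old layout) */
    size_t memslot_count_idx; /* nb_hp_memslots, appended after old data */
    size_t memslot_data_idx;  /* 3 entries per memslot: base, len, node */
    size_t total_entries;
};

static struct numa_cfg_layout
compute_numa_cfg_layout(size_t max_cpus, size_t nb_numa_nodes,
                        size_t nb_hp_memslots)
{
    struct numa_cfg_layout l;

    l.node_count_idx = 0;
    l.cpu_map_idx = 1;
    l.node_mem_idx = 1 + max_cpus;
    l.memslot_count_idx = 1 + max_cpus + nb_numa_nodes;
    l.memslot_data_idx = l.memslot_count_idx + 1;
    l.total_entries = l.memslot_data_idx + 3 * nb_hp_memslots;

    /* Same size as the g_malloc0() call in the patch. */
    assert(l.total_entries ==
           2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots);
    return l;
}
```

An old bios only reads the first 1 + max_cpus + nb_numa_nodes entries, so it
never sees the appended memslot data.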

thanks,
- Vasilis

[RFC PATCH 0/9] ACPI memory hotplug

2012-04-19 Thread Vasilis Liaskovitis

This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
earlier work and comments from Gleb.

Memslot devices are modeled with a new qemu command line option:

-memslot id=name,start=start_addr,size=sz,node=pxm

The user is responsible for defining memslots with meaningful start/size values,
e.g. not defining a memory slot over a PCI-hole. Alternatively, the start and size
could also be handled/assigned automatically from the specific emulated hardware
(for hw/pc.c the PCI hole is currently [below_4g_mem_size, 4G), so hotplugged memory
should start from max(4G, above_4g_mem_size)).

node defines the numa proximity for this memslot. When not defined it defaults
to zero.

e.g. -memslot id=hot1,start=4294967296,size=536870912,node=0
will define a 512M memory slot starting at physical address 4G, belonging to 
numa node 0.
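
As a sanity check of the kind a user (or qemu) would have to do, here is a
minimal sketch that tests whether a memslot overlaps the PCI hole
[below_4g_mem_size, 4G). It is for illustration only; the function name is
hypothetical and nothing like it is part of this series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GIB (1ULL << 32)

/*
 * Returns true if [start, start + size) intersects the PCI hole
 * [below_4g_mem_size, 4G) described above for hw/pc.c.
 */
static bool memslot_overlaps_pci_hole(uint64_t start, uint64_t size,
                                      uint64_t below_4g_mem_size)
{
    uint64_t end = start + size; /* exclusive end of the memslot */

    assert(below_4g_mem_size <= FOUR_GIB);
    return start < FOUR_GIB && end > below_4g_mem_size;
}
```

For the example above, start=4294967296 is exactly 4G and size=536870912 is
512M, so the slot begins right at the end of the hole and does not overlap it.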

Memory slots are added or removed with a new hmp command memslot:
Hot-add syntax: memslot id add
Hot-remove syntax: memslot id delete

- All memslots are initially unpopulated. Memslots are currently modeling only
hotplug-able memory slots i.e. initial system memory is not modeled with
memslots. The concept could be generalized to include all memory though, or it
could more closely follow kvm-memory slots.

- Memslots are abstracted as qdevices attached to the main system bus. However,
memory hotplugging has its own side channel ignoring main_system_bus's hotplug
incapability. A cleaner integration would be needed. What's the preferred
way of modeling memory devices in qom? Would it be better to attach memory
devices as children-links of an acpi-capable device (in the pc case acpi_piix4)
instead of the system bus?

- Refcounting memory slots has been discussed (but is not included in this
series yet). Depopulating a memory region happens on a guestOS _EJ callback,
which means the guestOS will not be using the region anymore. However, guest
addresses from the depopulated region need to also be unmapped from the qemu
address space using cpu_physical_memory_unmap(). Does
memory_region_del_subregion() or some other memory API call guarantee that a
memoryregion has been unmapped from qemu's address space?

- What is the expected behaviour of hotplugged memory after a reboot? Is it
supposed to be persistent after reboots? The last 2 patches in the series try to
make hotplugged memslots persistent after reboot by creating and consulting e820
map entries.  A better solution is needed for hot-remove after a reboot, because
e820 entries can be merged.

series is based on uq/master for qemu-kvm, and master for seabios. Can be found
also at:


Vasilis Liaskovitis (9):
  Seabios: Add SSDT memory device support
  Seabios, acpi: Implement acpi-dsdt functions for memory hotplug.
  Seabios, acpi: generate hotplug memory devices.
  Implement memslot device abstraction
  acpi_piix4: Implement memory device hotplug registers and handlers. 
  pc: pass paravirt info for hotplug memory slots to BIOS
  Implement memslot command-line option and memslot hmp monitor command
  pc: adjust e820 map on hot-add and hot-remove
  Seabios, acpi: enable memory devices if e820 entry is present

 Makefile.objs   |2 +-
 hmp-commands.hx |   15 
 hw/acpi_piix4.c |  103 +++-
 hw/memslot.c|  201 +++
 hw/memslot.h|   44 
 hw/pc.c |   87 ++--
 hw/pc.h |1 +
 monitor.c   |8 ++
 monitor.h   |1 +
 qemu-config.c   |   25 +++
 qemu-options.hx |8 ++
 sysemu.h|1 +
 vl.c|   44 -
 13 files changed, 528 insertions(+), 12 deletions(-)
 create mode 100644 hw/memslot.c
 create mode 100644 hw/memslot.h

 create mode 100644 src/ssdt-mem.dsl
 src/acpi-dsdt.dsl |   68 ++-
 src/acpi.c|  155 +++--
 src/memmap.c  |   15 +
 src/ssdt-mem.dsl  |   66 ++
 4 files changed, 298 insertions(+), 6 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

-- 
1.7.9


[RFC PATCH 3/9][SeaBIOS] acpi: generate hotplug memory devices.

2012-04-19 Thread Vasilis Liaskovitis

 The memory device generation is guided by qemu paravirt info. Seabios
 first uses the info to setup SRAT entries for the hotplug-able memory slots.
 Afterwards, build_memssdt uses the created SRAT entries to generate
 appropriate memory device objects. One memory device (and corresponding SRAT
 entry) is generated for each hotplug-able qemu memslot. Currently no SSDT
 memory device is created for initial system memory (the method can be
 generalized to all memory though).

 Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi.c |  151 ++--
 1 files changed, 147 insertions(+), 4 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 30888b9..5580099 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -484,6 +484,131 @@ build_ssdt(void)
 return ssdt;
 }
 
+static unsigned char ssdt_mem[] = {
+0x5b,0x82,0x47,0x07,0x4d,0x50,0x41,0x41,
+0x08,0x49,0x44,0x5f,0x5f,0x0a,0xaa,0x08,
+0x5f,0x48,0x49,0x44,0x0c,0x41,0xd0,0x0c,
+0x80,0x08,0x5f,0x50,0x58,0x4d,0x0a,0xaa,
+0x08,0x5f,0x43,0x52,0x53,0x11,0x33,0x0a,
+0x30,0x8a,0x2b,0x00,0x00,0x0d,0x03,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xef,
+0xbe,0xad,0xde,0x00,0x00,0x00,0x00,0xee,
+0xbe,0xad,0xe6,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x08,0x00,0x00,0x00,0x00,0x79,
+0x00,0x14,0x0f,0x5f,0x53,0x54,0x41,0x00,
+0xa4,0x43,0x4d,0x53,0x54,0x49,0x44,0x5f,
+0x5f,0x14,0x0f,0x5f,0x45,0x4a,0x30,0x01,
+0x4d,0x50,0x45,0x4a,0x49,0x44,0x5f,0x5f,
+0x68
+};
+
+#define SD_OFFSET_MEMHEX 6
+#define SD_OFFSET_MEMID 14
+#define SD_OFFSET_PXMID 31
+#define SD_OFFSET_MEMSTART 55
+#define SD_OFFSET_MEMEND   63
+#define SD_OFFSET_MEMSIZE  79
+
+u64 nb_hp_memslots = 0;
+struct srat_memory_affinity *mem;
+
+static void build_memdev(u8 *ssdt_ptr, int i, u64 mem_base, u64 mem_len, u8 node)
+{
+memcpy(ssdt_ptr, ssdt_mem, sizeof(ssdt_mem));
+ssdt_ptr[SD_OFFSET_MEMHEX] = getHex(i >> 4);
+ssdt_ptr[SD_OFFSET_MEMHEX+1] = getHex(i);
+ssdt_ptr[SD_OFFSET_MEMID] = i;
+ssdt_ptr[SD_OFFSET_PXMID] = node;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMSTART) = mem_base;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMEND) = mem_base + mem_len;
+*(u64*)(ssdt_ptr + SD_OFFSET_MEMSIZE) = mem_len;
+}
+
+static void*
+build_memssdt(void)
+{
+u64 mem_base;
+u64 mem_len;
+u8  node;
+int i;
+struct srat_memory_affinity *entry = mem;
+u64 nb_memdevs = nb_hp_memslots;
+
+int length = ((1+3+4)
+  + (nb_memdevs * sizeof(ssdt_mem))
+  + (1+2+5+(12*nb_memdevs))
+  + (6+2+1+(1*nb_memdevs)));
+u8 *ssdt = malloc_high(sizeof(struct acpi_table_header) + length);
+if (! ssdt) {
+warn_noalloc();
+return NULL;
+}
+u8 *ssdt_ptr = ssdt + sizeof(struct acpi_table_header);
+
+// build Scope(_SB_) header
+*(ssdt_ptr++) = 0x10; // ScopeOp
+ssdt_ptr = encodeLen(ssdt_ptr, length-1, 3);
+*(ssdt_ptr++) = '_';
+*(ssdt_ptr++) = 'S';
+*(ssdt_ptr++) = 'B';
+*(ssdt_ptr++) = '_';
+
+for (i = 0; i < nb_memdevs; i++) {
+mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+node = entry->proximity[0];
+build_memdev(ssdt_ptr, i, mem_base, mem_len, node);
+ssdt_ptr += sizeof(ssdt_mem);
+entry++;
+}
+
+// build Method(MTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CM00, Arg1)} ...}
+*(ssdt_ptr++) = 0x14; // MethodOp
+ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*nb_memdevs), 2);
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'T';
+*(ssdt_ptr++) = 'F';
+*(ssdt_ptr++) = 'Y';
+*(ssdt_ptr++) = 0x02;
+for (i=0; i<nb_memdevs; i++) {
+*(ssdt_ptr++) = 0xA0; // IfOp
+ssdt_ptr = encodeLen(ssdt_ptr, 11, 1);
+*(ssdt_ptr++) = 0x93; // LEqualOp
+*(ssdt_ptr++) = 0x68; // Arg0Op
+*(ssdt_ptr++) = 0x0A; // BytePrefix
+*(ssdt_ptr++) = i;
+*(ssdt_ptr++) = 0x86; // NotifyOp
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'P';
+*(ssdt_ptr++) = getHex(i >> 4);
+*(ssdt_ptr++) = getHex(i);
+*(ssdt_ptr++) = 0x69; // Arg1Op
+}
+
+// build Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+*(ssdt_ptr++) = 0x08; // NameOp
+*(ssdt_ptr++) = 'M';
+*(ssdt_ptr++) = 'E';
+*(ssdt_ptr++) = 'O';
+*(ssdt_ptr++) = 'N';
+*(ssdt_ptr++) = 0x12; // PackageOp
+ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*nb_memdevs), 2);
+*(ssdt_ptr++) = nb_memdevs;
+
+entry = mem;
+
+for (i = 0; i < nb_memdevs; i++) {
+mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
+mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
+*(ssdt_ptr++) = 0x00;
+entry++;
+}
+build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
+
+return ssdt

[RFC PATCH 1/9][SeaBIOS] Add SSDT memory device support

2012-04-19 Thread Vasilis Liaskovitis

Define SSDT hotplug-able memory devices in _SB namespace. The dynamically
generated SSDT includes per memory device hotplug methods. These methods
just call methods defined in the DSDT. Also dynamically generate a MTFY
method and a MEON array of the online/available memory devices.  Add file
src/ssdt-mem.dsl with directions for generating the per-memory device
processor object AML code.
The design is taken from SSDT cpu generation.
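
For illustration, a standalone sketch of how one entry of the generated MTFY
method can be encoded, using the same AML opcodes as the src/acpi.c patch in
this series. hex_digit() and emit_notify_if() are hypothetical names
(hex_digit stands in for the getHex helper the patch uses); each entry is 12
bytes, which is where the 12*nb_memdevs term in the length computation comes
from:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert the low nibble of val to an upper-case hex digit. */
static uint8_t hex_digit(unsigned val)
{
    val &= 0x0f;
    return (uint8_t)(val <= 9 ? '0' + val : 'A' + (val - 10));
}

/*
 * Emit: If (LEqual(Arg0, slot)) { Notify(MPxx, Arg1) }
 * where xx is the slot number in hex.  Returns the bytes written (12).
 */
static size_t emit_notify_if(uint8_t *out, uint8_t slot)
{
    uint8_t *p = out;

    *p++ = 0xA0;                 /* IfOp */
    *p++ = 11;                   /* PkgLength: this byte plus the 10 after it */
    *p++ = 0x93;                 /* LEqualOp */
    *p++ = 0x68;                 /* Arg0Op */
    *p++ = 0x0A;                 /* BytePrefix */
    *p++ = slot;
    *p++ = 0x86;                 /* NotifyOp */
    *p++ = 'M';
    *p++ = 'P';
    *p++ = hex_digit(slot >> 4); /* high nibble of the device name */
    *p++ = hex_digit(slot);      /* low nibble of the device name */
    *p++ = 0x69;                 /* Arg1Op */

    assert(p - out == 12);
    return (size_t)(p - out);
}
```

Calling emit_notify_if() once per memory device, after the MethodOp header,
produces the body of the MTFY method described above.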

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/ssdt-mem.dsl |   66 ++
 1 files changed, 66 insertions(+), 0 deletions(-)
 create mode 100644 src/ssdt-mem.dsl

diff --git a/src/ssdt-mem.dsl b/src/ssdt-mem.dsl
new file mode 100644
index 0000000..9586643
--- /dev/null
+++ b/src/ssdt-mem.dsl
@@ -0,0 +1,66 @@
+/* This file is the basis for the ssdt_mem[] variable in src/acpi.c.
+ * It is similar in design to the ssdt_proc variable.  
+ * It defines the contents of the per-cpu Processor() object.  At
+ * runtime, a dynamically generated SSDT will contain one copy of this
+ * AML snippet for every possible memory device in the system.  The 
+ * objects will be placed in the \_SB_ namespace.
+ *
+ * To generate a new ssdt_mem[], run the commands:
+ *   cpp -P src/ssdt-mem.dsl > out/ssdt-mem.dsl.i
+ *   iasl -ta -p out/ssdt-mem out/ssdt-mem.dsl.i
+ *   tail -c +37 < out/ssdt-mem.aml | hexdump -e '"    " 8/1 "0x%02x," "\n"'
+ * and then cut-and-paste the output into the src/acpi.c ssdt_mem[]
+ * array.
+ *
+ * In addition to the aml code generated from this file, the
+ * src/acpi.c file creates a MEMNTFY method with an entry for each memdevice:
+ * Method(MTFY, 2) {
+ * If (LEqual(Arg0, 0x00)) { Notify(MP00, Arg1) }
+ * If (LEqual(Arg0, 0x01)) { Notify(MP01, Arg1) }
+ * ...
+ * }
+ * and a MEON array with the list of active and inactive memory devices:
+ * Name(MEON, Package() { One, One, ..., Zero, Zero, ... })
+ */
+DefinitionBlock ("ssdt-mem.aml", "SSDT", 0x02, "BXPC", "CSSDT", 0x1)
+/*  v-- DO NOT EDIT --v */
+{
+Device(MPAA) {
+Name(ID, 0xAA)
+/*  ^-- DO NOT EDIT --^
+ *
+ * The src/acpi.c code requires the above layout so that it can update
+ * MPAA and 0xAA with the appropriate MEMDEVICE id (see
+ * SD_OFFSET_MEMHEX/MEMID1/MEMID2).  Don't change the above without
+ * also updating the C code.
+ */
+Name(_HID, EISAID("PNP0C80"))
+Name(_PXM, 0xAA)
+
+External(CMST, MethodObj)
+External(MPEJ, MethodObj)
+
+Name(_CRS, ResourceTemplate() {
+QwordMemory(
+   ResourceConsumer,
+   ,
+   MinFixed,
+   MaxFixed,
+   Cacheable,
+   ReadWrite, 
+   0x0, 
+   0xDEADBEEF, 
+   0xE6ADBEEE, 
+   0x00000000,
+   0x08000000, 
+   )
+})
+Method (_STA, 0) {
+Return(CMST(ID))
+}
+Method (_EJ0, 1, NotSerialized) {
+MPEJ(ID, Arg0)
+}
+}
+}
+
-- 
1.7.9


[RFC PATCH 4/9] Implement memslot device abstraction

2012-04-19 Thread Vasilis Liaskovitis

 Each hotplug-able memory slot is a SysBusDevice. All memslots are initially
 unpopulated. A hot-add operation for a particular memory slot creates a new
 MemoryRegion of the given physical address offset, size and node proximity,
 and attaches it to main system memory as a sub_region. A hot-remove operation
 detaches and frees the MemoryRegion from system memory.

 This is an early prototype and lacks proper qdev integration: a separate
 hotplug mechanism/side-channel is used and main system bus hotplug
 capability is ignored.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/memslot.c |  195 ++
 hw/memslot.h |   44 +
 2 files changed, 239 insertions(+), 0 deletions(-)
 create mode 100644 hw/memslot.c
 create mode 100644 hw/memslot.h

diff --git a/hw/memslot.c b/hw/memslot.c
new file mode 100644
index 0000000..b100824
--- /dev/null
+++ b/hw/memslot.c
@@ -0,0 +1,195 @@
+/*
+ * MemorySlot device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/
+ */
+
+#include "trace.h"
+#include "qdev.h"
+#include "memslot.h"
+#include "../exec-memory.h"
+
+static DeviceState *memslot_hotplug_qdev;
+static memslot_hotplug_fn memslot_hotplug;
+
+static Property memslot_properties[] = {
+DEFINE_PROP_END_OF_LIST()
+};
+
+void memslot_populate(MemSlotState *s)
+{
+char buf[32];
+MemoryRegion *new = NULL;
+
+sprintf(buf, "memslot%u", s->idx);
+new = g_malloc(sizeof(MemoryRegion));
+memory_region_init_ram(new, buf, s->size);
+vmstate_register_ram_global(new);
+memory_region_add_subregion(get_system_memory(), s->start, new);
+s->mr = new;
+s->populated = 1;
+}
+
+void memslot_depopulate(MemSlotState *s)
+{
+assert(s);
+if (s-populated) {
+vmstate_unregister_ram(s-mr, NULL);
+memory_region_del_subregion(get_system_memory(), s-mr);
+memory_region_destroy(s-mr);
+s-populated = 0;
+s-mr = NULL;
+}
+}
+
+MemSlotState *memslot_create(char *id, target_phys_addr_t start, uint64_t size,
+        uint64_t node, uint32_t memslot_idx)
+{
+    DeviceState *dev;
+    MemSlotState *mdev;
+
+    dev = sysbus_create_simple("memslot", -1, NULL);
+    dev->id = id;
+
+    mdev = MEMSLOT(dev);
+    mdev->idx = memslot_idx;
+    mdev->start = start;
+    mdev->size = size;
+    mdev->node = node;
+
+    return mdev;
+}
+
+void memslot_register_hotplug(memslot_hotplug_fn hotplug, DeviceState *qdev)
+{
+    memslot_hotplug_qdev = qdev;
+    memslot_hotplug = hotplug;
+}
+
+static MemSlotState *memslot_find(char *id)
+{
+    DeviceState *qdev;
+    qdev = qdev_find_recursive(sysbus_get_default(), id);
+    if (qdev)
+        return MEMSLOT(qdev);
+    return NULL;
+}
+
+int memslot_do(Monitor *mon, const QDict *qdict)
+{
+    MemSlotState *slot = NULL;
+
+    char *id = (char*) qdict_get_try_str(qdict, "id");
+    if (!id) {
+        fprintf(stderr, "ERROR %s invalid id\n",__FUNCTION__);
+        return 1;
+    }
+
+    slot = memslot_find(id);
+
+    if (!slot) {
+        fprintf(stderr, "%s no slot %s found\n", __FUNCTION__, id);
+        return 1;
+    }
+
+    char *action = (char*) qdict_get_try_str(qdict, "action");
+    if (!action || (strcmp(action, "add") && strcmp(action, "delete"))) {
+        fprintf(stderr, "ERROR %s invalid action\n", __FUNCTION__);
+        return 1;
+    }
+
+    if (!strcmp(action, "add")) {
+        if (slot->populated) {
+            fprintf(stderr, "ERROR %s slot %s already populated\n",
+                    __FUNCTION__, id);
+            return 1;
+        }
+        memslot_populate(slot);
+        if (memslot_hotplug)
+            memslot_hotplug(memslot_hotplug_qdev, (SysBusDevice*)slot, 1);
+    }
+    else {
+        if (!slot->populated) {
+            fprintf(stderr, "ERROR %s slot %s is not populated\n",
+                    __FUNCTION__, id);
+            return 1;
+        }
+        if (memslot_hotplug)
+            memslot_hotplug(memslot_hotplug_qdev, (SysBusDevice*)slot, 0);
+    }
+    return 0;
+}
+
+MemSlotState *memslot_find_from_idx(uint32_t idx)
+{
+    Error *err = NULL;
+    DeviceState *dev;
+    MemSlotState *slot;
+    char *type;
+    BusState *bus = sysbus_get_default();
+    QTAILQ_FOREACH(dev, &bus->children, sibling) {
+type = object_property_get_str(OBJECT(dev

[RFC PATCH 5/9] acpi_piix4: Implement memory device hotplug registers

2012-04-19 Thread Vasilis Liaskovitis

 A 32-byte register is used to present up to 256 hotplug-able memory devices
 to BIOS and OSPM. Hot-add and hot-remove functions trigger an ACPI hotplug
 event through these. Only reads are allowed from these registers (from
 BIOS/OSPM perspective). memslot id add will immediately populate the new
 memslot (a new MemoryRegion is created and attached to system memory), and
 then trigger the ACPI hot-add event. memslot id delete triggers the ACPI
 hot-remove event but needs to wait for OSPM to eject the device.  We use a
 second set of eject registers to know when OSPM has called the _EJ function
 for a particular memslot. A write to these will depopulate the corresponding
 memslot i.e. detach and free the MemoryRegion. Only writes to the eject
 registers are allowed.

 A new property mem_acpi_hotplug should enable these memory hotplug registers
 for future machine types (not yet implemented in this version).
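
 The status-register layout described above (32 bytes, one bit per memory
 device, 256 devices in total) can be sketched in isolation as below. This is
 only an illustration: struct mem_status and the mem_status_* helpers are
 hypothetical names, not part of the patch, and the ioport plumbing is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SLOTS 256                 /* devices covered by the register block */
#define MEM_STS_BYTES (MEM_SLOTS / 8) /* 32 bytes, one bit per device */

/* Hypothetical stand-in for the 32-byte status register block. */
struct mem_status {
    uint8_t sts[MEM_STS_BYTES];
};

static void mem_status_init(struct mem_status *m)
{
    memset(m->sts, 0, sizeof(m->sts));
}

/* Mark a device as present (hot-add). */
static void mem_status_set(struct mem_status *m, unsigned idx)
{
    assert(idx < MEM_SLOTS);
    m->sts[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/* Mark a device as absent (after the guest has ejected it). */
static void mem_status_clear(struct mem_status *m, unsigned idx)
{
    assert(idx < MEM_SLOTS);
    m->sts[idx / 8] &= (uint8_t)~(1u << (idx % 8));
}

/* Return 1 if the device bit is set, 0 otherwise. */
static int mem_status_test(const struct mem_status *m, unsigned idx)
{
    assert(idx < MEM_SLOTS);
    return (m->sts[idx / 8] >> (idx % 8)) & 1;
}
```

 Device 10, for example, lands in byte 1, bit 2; the guest reads the same byte
 back through the read-only status ioport.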

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/acpi_piix4.c |   93 --
 1 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 797ed24..a14dd3c 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -27,6 +27,8 @@
 #include "sysemu.h"
 #include "range.h"
 #include "ioport.h"
+#include "sysbus.h"
+#include "memslot.h"
 
 //#define DEBUG
 
@@ -43,9 +45,16 @@
 #define PCI_BASE 0xae00
 #define PCI_EJ_BASE 0xae08
 #define PCI_RMV_BASE 0xae0c
+#define MEM_BASE 0xaf20
+#define MEM_EJ_BASE 0xaf40
 
+#define PIIX4_MEM_HOTPLUG_STATUS 8
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 
+struct gpe_regs {
+    uint8_t mems_sts[32];
+};
+
 struct pci_status {
     uint32_t up;
     uint32_t down;
@@ -66,6 +75,7 @@ typedef struct PIIX4PMState {
     int kvm_enabled;
     Notifier machine_ready;
 
+    struct gpe_regs gpe;
     /* for pci hotplug */
     struct pci_status pci0_status;
     uint32_t pci0_hotplug_enable;
@@ -86,8 +96,8 @@ static void pm_update_sci(PIIX4PMState *s)
                    ACPI_BITMASK_POWER_BUTTON_ENABLE |
                    ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
                    ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
-        (((s->ar.gpe.sts[0] & s->ar.gpe.en[0])
-          & PIIX4_PCI_HOTPLUG_STATUS) != 0);
+        (((s->ar.gpe.sts[0] & s->ar.gpe.en[0]) &
+          (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_MEM_HOTPLUG_STATUS)) != 0);
 
 qemu_set_irq(s-irq, sci_level);
 /* schedule a timer interruption if needed */
@@ -432,17 +442,34 @@ type_init(piix4_pm_register_types)
 static uint32_t gpe_readb(void *opaque, uint32_t addr)
 {
     PIIX4PMState *s = opaque;
-    uint32_t val = acpi_gpe_ioport_readb(&s->ar, addr);
+    uint32_t val = 0;
+    struct gpe_regs *g = &s->gpe;
+
+    switch (addr) {
+        case MEM_BASE ... MEM_BASE+31:
+            val = g->mems_sts[addr - MEM_BASE];
+            break;
+        default:
+            val = acpi_gpe_ioport_readb(&s->ar, addr);
+    }
 
     PIIX4_DPRINTF("gpe read %x == %x\n", addr, val);
     return val;
 }
 
+static void piix4_memslot_eject(uint32_t addr, uint32_t val);
+
 static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     PIIX4PMState *s = opaque;
 
-    acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    switch (addr) {
+        case MEM_EJ_BASE ... MEM_EJ_BASE+31:
+            piix4_memslot_eject(addr, val);
+            break;
+        default:
+            acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    }
     pm_update_sci(s);
 
     PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
@@ -521,9 +548,12 @@ static void pcirmv_write(void *opaque, uint32_t addr, uint32_t val)
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
                                 PCIHotplugState state);
 
+static int piix4_memslot_hotplug(DeviceState *qdev, SysBusDevice *dev, int add);
+
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
 {
     struct pci_status *pci0_status = &s->pci0_status;
+    int i = 0;
 
     register_ioport_write(GPE_BASE, GPE_LEN, 1, gpe_writeb, s);
     register_ioport_read(GPE_BASE, GPE_LEN, 1,  gpe_readb, s);
@@ -538,6 +568,13 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
     register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
     register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
 
+    register_ioport_read(MEM_BASE, 32, 1,  gpe_readb, s);
+    register_ioport_write(MEM_EJ_BASE, 32, 1,  gpe_writeb, s);
+    for(i = 0; i < 32; i++) {
+        s->gpe.mems_sts[i] = 0;
+    }
+    memslot_register_hotplug(piix4_memslot_hotplug, &s->dev.qdev);
+
     pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
 }
 
@@ -553,6 +590,54 @@ static void disable_device(PIIX4PMState *s, int slot)
     s->pci0_status.down |= (1 << slot);
 }
 
+static void enable_mem_device(PIIX4PMState *s, int memdevice)
+{
+    struct gpe_regs *g = &s->gpe;
+    s->ar.gpe.sts[0] |= PIIX4_MEM_HOTPLUG_STATUS;
+    g->mems_sts[memdevice/8] |= (1 << (memdevice%8));
+}
+static void

[RFC PATCH 8/9] pc: adjust e820 map on hot-add and hot-remove

2012-04-19 Thread Vasilis Liaskovitis

 Hotplugged memory is not persistent in the e820 memory maps. After hotplugging
 a memslot and rebooting the VM, the hotplugged device is not present.

 A possible solution is to add an e820 entry for the new memslot in the
 acpi_piix4 hot-add handler. On a reset, SeaBIOS (see next patch in series) will
 enable all memory devices for which it finds an e820 entry that covers the
 device's address range.

 On hot-remove, the acpi_piix4 handler will try to remove the e820 entry
 corresponding to the device. This will work when no VM reboots happen
 between hot-add and hot-remove, but it is not a sufficient solution in
 general: SeaBIOS and the guest OS merge adjacent e820 entries on machine
 reboot, so the sequence hot-add / VM reboot / hot-remove will fail to remove
 the corresponding e820 entry at the hot-remove phase.
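
 The exact-match removal, and why it stops working once adjacent entries have
 been merged, can be sketched on a toy table as below. This is not the patch's
 code: table_add and table_del are hypothetical helpers, the table capacity is
 arbitrary, and the endian conversions used for the real table are omitted.
 memmove is used because the source and destination ranges overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define E820_NR_ENTRIES 16  /* illustrative capacity only */
#define E820_RAM 1

struct e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

struct e820_table {
    uint32_t count;
    struct e820_entry entry[E820_NR_ENTRIES];
};

/* Append an entry; returns the new entry count, or -1 if the table is full. */
static int table_add(struct e820_table *t, uint64_t address, uint64_t length,
                     uint32_t type)
{
    assert(t != NULL);
    if (t->count >= E820_NR_ENTRIES) {
        return -1;
    }
    t->entry[t->count].address = address;
    t->entry[t->count].length = length;
    t->entry[t->count].type = type;
    t->count++;
    return (int)t->count;
}

/* Remove the entry matching (address, length, type) exactly.
 * Returns 0 on success, -1 if no such entry exists. */
static int table_del(struct e820_table *t, uint64_t address, uint64_t length,
                     uint32_t type)
{
    uint32_t i;

    assert(t != NULL);
    for (i = 0; i < t->count; i++) {
        struct e820_entry *e = &t->entry[i];
        if (e->address == address && e->length == length && e->type == type) {
            /* Close the gap: shift the tail down by one entry. */
            memmove(e, e + 1, (size_t)(t->count - i - 1) * sizeof(*e));
            t->count--;
            return 0;
        }
    }
    return -1;
}
```

 If two adjacent 1G entries at 4G and 5G have been merged into a single 2G
 entry, table_del for the original 1G range finds no exact match and fails,
 which is the hot-add / reboot / hot-remove case described above.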

 Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/acpi_piix4.c |6 ++
 hw/pc.c |   28 
 hw/pc.h |1 +
 3 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 2921d18..2b5fd04 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -619,6 +619,9 @@ static void piix4_memslot_eject(uint32_t addr, uint32_t val)
             s = memslot_find_from_idx(start + idx);
             assert(s != NULL);
             memslot_depopulate(s);
+            if (e820_del_entry(s->start, s->size, E820_RAM) == -EBUSY)
+                PIIX4_DPRINTF("failed to remove e820 entry for memslot %u\n",
+                        s->idx);
         }
         val = val >> 1;
         idx++;
@@ -634,6 +637,9 @@ static int piix4_memslot_hotplug(DeviceState *qdev, SysBusDevice *dev, int
 
     if (add) {
         enable_mem_device(s, slot->idx);
+        if (e820_add_entry(slot->start, slot->size, E820_RAM) == -EBUSY)
+            PIIX4_DPRINTF("failed to add e820 entry for memslot %u\n",
+                    slot->idx);
     }
     else {
         disable_mem_device(s, slot->idx);
diff --git a/hw/pc.c b/hw/pc.c
index f1f550a..04d243f 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -593,6 +593,34 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
     return index;
 }
 
+int e820_del_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+    int index = le32_to_cpu(e820_table.count);
+    int search;
+    struct e820_entry *entry;
+
+    if (index == 0)
+        return -EBUSY;
+    search = index - 1;
+    entry = &e820_table.entry[search];
+    while (search >= 0) {
+        if ((entry->address == cpu_to_le64(address)) &&
+            (entry->length == cpu_to_le64(length)) &&
+            (entry->type == cpu_to_le32(type))){
+            if (search != index - 1) {
+                memmove(&e820_table.entry[search], &e820_table.entry[search + 1],
+                    sizeof(struct e820_entry) * (index - search - 1));
+            }
+            index--;
+            e820_table.count = cpu_to_le32(index);
+            return 1;
+        }
+        search--;
+        entry = &e820_table.entry[search];
+    }
+    return -EBUSY;
+}
+
 static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots);
 
 static void *bochs_bios_init(void)
diff --git a/hw/pc.h b/hw/pc.h
index 74d3369..4925e8c 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -226,5 +226,6 @@ void pc_system_firmware_init(MemoryRegion *rom_memory);
 #define E820_UNUSABLE   5
 
 int e820_add_entry(uint64_t, uint64_t, uint32_t);
+int e820_del_entry(uint64_t, uint64_t, uint32_t);
 
 #endif
-- 
1.7.9


[RFC PATCH 9/9][SeaBIOS] enable memory devices if e820 entry is present

2012-04-19 Thread Vasilis Liaskovitis

 On a reboot, seabios regenerates srat/ssdt objects. If a valid e820 entry is
 found spanning the whole address range of a hotplug memory device, the  device
 will be enabled. This ensures persistency of hotplugged memory slots across VM
 reboots.
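
 The "entry spans the whole device range" test can be sketched as below. This
 is an illustration under assumptions: struct range_entry, entry_covers and
 find_covering are hypothetical names rather than SeaBIOS code, and the
 comparison is written so that it cannot overflow for ranges near the top of
 the 64-bit address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range_entry {
    uint64_t start;
    uint64_t size;
    uint32_t type;
};

/* Return 1 if entry e has the given type and fully contains
 * [start, start + size), 0 otherwise. */
static int entry_covers(const struct range_entry *e, uint64_t start,
                        uint64_t size, uint32_t type)
{
    uint64_t offset;

    if (e->type != type || e->start > start) {
        return 0;
    }
    offset = start - e->start;      /* distance into the entry */
    if (offset > e->size) {
        return 0;
    }
    return size <= e->size - offset;
}

/* Return 1 if any entry in the list covers the requested range. */
static int find_covering(const struct range_entry *list, int count,
                         uint64_t start, uint64_t size, uint32_t type)
{
    int i;

    assert(list != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (entry_covers(&list[i], start, size, type)) {
            return 1;
        }
    }
    return 0;
}
```

 A device is reported as enabled only when its whole range lies inside one
 entry; a range that is only partially covered is treated as absent.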

 Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi.c   |6 +-
 src/memmap.c |   15 +++
 2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 5580099..2ebed2e 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -601,7 +601,11 @@ build_memssdt(void)
     for (i = 0; i < nb_memdevs; i++) {
         mem_base = (((u64)(entry->base_addr_high) << 32 )| entry->base_addr_low);
         mem_len = (((u64)(entry->length_high) << 32 )| entry->length_low);
-        *(ssdt_ptr++) = 0x00;
+        if (find_e820(mem_base, mem_len, E820_RAM)) {
+            *(ssdt_ptr++) = 0x01;
+        }
+        else
+            *(ssdt_ptr++) = 0x00;
         entry++;
     }
     build_header((void*)ssdt, SSDT_SIGNATURE, ssdt_ptr - ssdt, 1);
diff --git a/src/memmap.c b/src/memmap.c
index 56865b4..9790da1 100644
--- a/src/memmap.c
+++ b/src/memmap.c
@@ -131,6 +131,21 @@ add_e820(u64 start, u64 size, u32 type)
     //dump_map();
 }
 
+// Check if an e820 entry exists that covers the memory range
+// [start, start+size) with same type as type.
+int
+find_e820(u64 start, u64 size, u32 type)
+{
+    int i;
+    for (i=0; i<e820_count; i++) {
+        struct e820entry *e = &e820_list[i];
+        if ((e->start <= start) && (e->size >= (size + start - e->start)) &&
+            (e->type == type))
+            return 1;
+    }
+    return 0;
+}
+
 // Report on final memory locations.
 void
 memmap_finalize(void)
-- 
1.7.9


[RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command

2012-04-19 Thread Vasilis Liaskovitis

 Implement -memslot qemu-kvm command line option to define hotplug-able memory
 slots.
 Syntax: -memslot id=name,start=addr,size=sz,node=nodeid

 e.g. -memslot id=hot1,start=4294967296,size=1073741824,node=0
 will define a 1G memory slot starting at physical address 4G, belonging to numa
 node 0. Defining no node will automatically add a memslot to node 0.

 Also implement a new hmp monitor command for hot-add and hot-remove of memory
 slots.
 Syntax: memslot slotname action
 where action is add/delete and slotname is the qdev-id of the memory slot.
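
 The byte values in the example above are 4G and 1G written out in full
 (4294967296 = 4 * 2^30, 1073741824 = 2^30). A minimal sketch of a parser that
 accepts either form follows; parse_size is a hypothetical helper written for
 illustration, not the option parser QEMU uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a size such as "1073741824", "1G" or "512M" into bytes.
 * Returns 0 on success, -1 on malformed input or overflow. */
static int parse_size(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;
    unsigned shift = 0;

    assert(s != NULL && out != NULL);
    if (s[0] < '0' || s[0] > '9') {
        return -1;              /* reject empty, signed or padded input */
    }
    errno = 0;
    v = strtoull(s, &end, 0);
    if (end == s || errno == ERANGE) {
        return -1;
    }
    switch (*end) {
    case 'K': case 'k': shift = 10; end++; break;
    case 'M': case 'm': shift = 20; end++; break;
    case 'G': case 'g': shift = 30; end++; break;
    case 'T': case 't': shift = 40; end++; break;
    default: break;
    }
    if (*end != '\0') {
        return -1;              /* trailing garbage after the suffix */
    }
    if (shift != 0 && v > (UINT64_MAX >> shift)) {
        return -1;              /* would overflow 64 bits */
    }
    *out = (uint64_t)v << shift;
    return 0;
}
```

 With such a helper, start=4G,size=1G and start=4294967296,size=1073741824
 describe the same slot.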

 Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 Makefile.objs   |2 +-
 hmp-commands.hx |   15 +++
 monitor.c   |8 
 monitor.h   |1 +
 qemu-config.c   |   25 +
 qemu-options.hx |8 
 sysemu.h|1 +
 vl.c|   40 
 8 files changed, 99 insertions(+), 1 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 5c3bcda..98ce865 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -240,7 +240,7 @@ hw-obj-$(CONFIG_USB_OHCI) += usb/hcd-ohci.o
 hw-obj-$(CONFIG_USB_EHCI) += usb/hcd-ehci.o
 hw-obj-$(CONFIG_USB_XHCI) += usb/hcd-xhci.o
 hw-obj-$(CONFIG_FDC) += fdc.o
-hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o
+hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o memslot.o
 hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
 hw-obj-$(CONFIG_DMA) += dma.o
 hw-obj-$(CONFIG_I82374) += i82374.o
diff --git a/hmp-commands.hx b/hmp-commands.hx
index a6f5a84..cadf4ca 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -618,6 +618,21 @@ Add device.
 ETEXI
 
     {
+        .name       = "memslot",
+        .args_type  = "id:s,action:s",
+        .params     = "id,action",
+        .help       = "add memslot device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_memslot_add,
+    },
+
+STEXI
+@item memslot_add @var{config}
+@findex memslot_add
+
+Add memslot.
+ETEXI
+    {
         .name       = "device_del",
         .args_type  = "id:s",
         .params     = "device",
diff --git a/monitor.c b/monitor.c
index 8946a10..f672186 100644
--- a/monitor.c
+++ b/monitor.c
@@ -30,6 +30,7 @@
 #include "hw/pci.h"
 #include "hw/watchdog.h"
 #include "hw/loader.h"
+#include "hw/memslot.h"
 #include "gdbstub.h"
 #include "net.h"
 #include "net/slirp.h"
@@ -4675,3 +4676,10 @@ int monitor_read_block_device_key(Monitor *mon, const char *device,
 
     return monitor_read_bdrv_key_start(mon, bs, completion_cb, opaque);
 }
+
+int do_memslot_add(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+    return memslot_do(mon, qdict);
+#endif
+}
diff --git a/monitor.h b/monitor.h
index 0d49800..1e14a63 100644
--- a/monitor.h
+++ b/monitor.h
@@ -80,5 +80,6 @@ int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
 int qmp_qom_set(Monitor *mon, const QDict *qdict, QObject **ret);
 
 int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
+int do_memslot_add(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif /* !MONITOR_H */
diff --git a/qemu-config.c b/qemu-config.c
index be84a03..1f26187 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -613,6 +613,30 @@ QemuOptsList qemu_boot_opts = {
     },
 };
 
+static QemuOptsList qemu_memslot_opts = {
+    .name = "memslot",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_memslot_opts.head),
+    .desc = {
+        {
+            .name = "id",
+            .type = QEMU_OPT_STRING,
+        },{
+            .name = "start",
+            .type = QEMU_OPT_SIZE,
+            .help = "physical address start for this memslot",
+        },{
+            .name = "size",
+            .type = QEMU_OPT_SIZE,
+            .help = "memory size for this memslot",
+        },{
+            .name = "node",
+            .type = QEMU_OPT_NUMBER,
+            .help = "NUMA node number (i.e. proximity) for this memslot",
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList *vm_config_groups[32] = {
     &qemu_drive_opts,
     &qemu_chardev_opts,
@@ -628,6 +652,7 @@ static QemuOptsList *vm_config_groups[32] = {
     &qemu_machine_opts,
     &qemu_boot_opts,
     &qemu_iscsi_opts,
+    &qemu_memslot_opts,
     NULL,
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index a169792..aff0546 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2728,3 +2728,11 @@ HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
 ETEXI
+
+DEF("memslot", HAS_ARG, QEMU_OPTION_memslot,
+    "-memslot start=num,size=num,id=name\n"
+    "specify unpopulated memory slot",
+    QEMU_ARCH_ALL)
+
+
+
diff --git a/sysemu.h b/sysemu.h
index bc2c788..7247099 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -136,6 +136,7 @@ extern QEMUClock *rtc_clock;
 extern int nb_numa_nodes;
 extern uint64_t node_mem[MAX_NODES];
 extern uint64_t node_cpumask[MAX_NODES];
+extern int nb_hp_memslots;
 
 #define MAX_OPTION_ROMS 16
 typedef struct

[RFC PATCH 6/9] pc: pass paravirt info for hotplug memory slots to BIOS

2012-04-19 Thread Vasilis Liaskovitis

 The numa_fw_cfg paravirt interface is extended to include SRAT information for
 all hotplug-able memslots. There are 3 words for each hotplug-able memory slot,
 denoting start address, size and node proximity. nb_numa_nodes is set to 1 by
 default (not 0), so that we always pass srat info to SeaBIOS.

 This information is used by SeaBIOS to build hotplug memory device objects at
 runtime.
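
 The word layout of the extended numa_fw_cfg blob can be sketched as index
 arithmetic. This is an illustration only: struct numa_cfg_layout and the two
 helpers are hypothetical names, and the layout follows the description above
 (two header words, one word per VCPU, one word per node, then three words per
 hotplug slot).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets, in 64-bit words, of each section of the blob. */
struct numa_cfg_layout {
    size_t cpu_base;    /* first per-VCPU node word */
    size_t node_base;   /* first per-node memory word */
    size_t slot_base;   /* first hotplug-slot triple */
    size_t total_words; /* size of the whole blob in words */
};

static struct numa_cfg_layout numa_cfg_layout_for(size_t max_cpus,
                                                  size_t nb_nodes,
                                                  size_t nb_slots)
{
    struct numa_cfg_layout l;

    l.cpu_base = 2;     /* word 0: node count, word 1: slot count */
    l.node_base = l.cpu_base + max_cpus;
    l.slot_base = l.node_base + nb_nodes;
    l.total_words = l.slot_base + 3 * nb_slots;
    return l;
}

/* Word index of a slot field: 0 = start, 1 = size, 2 = node proximity. */
static size_t slot_word(const struct numa_cfg_layout *l, size_t slot_idx,
                        size_t field)
{
    assert(l != NULL && field < 3);
    return l->slot_base + 3 * slot_idx + field;
}
```

 For 4 VCPUs, 1 node and 2 slots this gives 13 words (104 bytes), with the
 node field of slot 1 in the last word.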

 Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/pc.c |   59 +--
 vl.c|4 +++-
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 67f0479..f1f550a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -46,6 +46,7 @@
 #include "ui/qemu-spice.h"
 #include "memory.h"
 #include "exec-memory.h"
+#include "memslot.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -592,12 +593,15 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
     return index;
 }
 
+static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots);
+
 static void *bochs_bios_init(void)
 {
     void *fw_cfg;
     uint8_t *smbios_table;
     size_t smbios_len;
     uint64_t *numa_fw_cfg;
+    uint64_t *hp_memslots_fw_cfg;
     int i, j;
 
     register_ioport_write(0x400, 1, 2, bochs_bios_write, NULL);
@@ -630,28 +634,71 @@ static void *bochs_bios_init(void)
     fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, (uint8_t *)&hpet_cfg,
                      sizeof(struct hpet_fw_config));
     /* allocate memory for the NUMA channel: one (64bit) word for the number
-     * of nodes, one word for each VCPU-node and one word for each node to
-     * hold the amount of memory.
+     * of nodes, one word for the number of hotplug memory slots, one word
+     * for each VCPU-node, one word for each node to hold the amount of memory.
+     * Finally three words for each hotplug memory slot, denoting start address,
+     * size and node proximity.
      */
-    numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
+    numa_fw_cfg = g_malloc0((2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
     numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
+    numa_fw_cfg[1] = cpu_to_le64(nb_hp_memslots);
+
     for (i = 0; i < max_cpus; i++) {
         for (j = 0; j < nb_numa_nodes; j++) {
             if (node_cpumask[j] & (1 << i)) {
-                numa_fw_cfg[i + 1] = cpu_to_le64(j);
+                numa_fw_cfg[i + 2] = cpu_to_le64(j);
                 break;
             }
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
+        numa_fw_cfg[max_cpus + 2 + i] = cpu_to_le64(node_mem[i]);
     }
+
+    hp_memslots_fw_cfg = numa_fw_cfg + 2 + max_cpus + nb_numa_nodes;
+    if (nb_hp_memslots)
+        bochs_bios_setup_hp_memslots(hp_memslots_fw_cfg);
+
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
-                     (1 + max_cpus + nb_numa_nodes) * 8);
+                     (2 + max_cpus + nb_numa_nodes + 3 * nb_hp_memslots) * 8);
 
     return fw_cfg;
 }
 
+static void bochs_bios_setup_hp_memslots(uint64_t *fw_cfg_slots)
+{
+    int i = 0;
+    Error *err = NULL;
+    DeviceState *dev;
+    MemSlotState *slot;
+    char *type;
+    BusState *bus = sysbus_get_default();
+
+    QTAILQ_FOREACH(dev, &bus->children, sibling) {
+        type = object_property_get_str(OBJECT(dev), "type", &err);
+        if (err) {
+            error_free(err);
+            fprintf(stderr, "error getting device type\n");
+            exit(1);
+        }
+
+        if (!strcmp(type, "memslot")) {
+            if (!dev->id) {
+                error_free(err);
+                fprintf(stderr, "error getting memslot device id\n");
+                exit(1);
+            }
+            if (!strcmp(dev->id, "initialslot")) continue;
+            slot = MEMSLOT(dev);
+            fw_cfg_slots[3 * slot->idx] = cpu_to_le64(slot->start);
+            fw_cfg_slots[3 * slot->idx + 1] = cpu_to_le64(slot->size);
+            fw_cfg_slots[3 * slot->idx + 2] = cpu_to_le64(slot->node);
+            i++;
+        }
+    }
+    assert(i == nb_hp_memslots);
+}
+
 static long get_file_size(FILE *f)
 {
 long where, size;
diff --git a/vl.c b/vl.c
index ae91a8a..50df453 100644
--- a/vl.c
+++ b/vl.c
@@ -3428,8 +3428,10 @@ int main(int argc, char **argv, char **envp)
 
     register_savevm_live(NULL, "ram", 0, 4, NULL, ram_save_live, NULL,
                          ram_load, NULL);
+    if (!nb_numa_nodes)
+        nb_numa_nodes = 1;
 
-    if (nb_numa_nodes > 0) {
+    {
         int i;
 
         if (nb_numa_nodes > MAX_NODES) {
-- 
1.7.9


[RFC PATCH 2/9][SeaBIOS] Implement acpi-dsdt functions for memory hotplug.

2012-04-19 Thread Vasilis Liaskovitis

Extend the DSDT to include methods for handling memory hot-add and hot-remove
notifications and memory device status requests. These functions are called
from the memory device SSDT methods.

Eject has only been tested with level gpe event, but will be changed to edge gpe
event soon, according to recent master patch for other ACPI hotplug events.
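
The scan performed by the hotplug notification method (compare the host's
status bitmap against the per-device state the guest last saw, then notify on
every difference) can be sketched in C as below. This is only an illustration
of the logic, not the ASL itself: scan_memory_devices, notify_fn and the
recording callback are hypothetical names. The notify codes 1 and 3 are the
ACPI Device Check and Eject Request values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    NOTIFY_DEVICE_CHECK = 1,   /* device appeared */
    NOTIFY_EJECT_REQUEST = 3   /* device should be ejected */
};

typedef void (*notify_fn)(size_t device, int code, void *opaque);

/* Example callback that records what was notified (for demonstration). */
struct notify_record {
    size_t count;
    size_t device[8];
    int code[8];
};

static void record_notify(size_t device, int code, void *opaque)
{
    struct notify_record *r = opaque;

    if (r->count < 8) {
        r->device[r->count] = device;
        r->code[r->count] = code;
    }
    r->count++;
}

/* bitmap: one bit per device as reported by the host.
 * on:     one byte per device, the state the guest last acted on.
 * Returns the number of devices whose state changed. */
static size_t scan_memory_devices(const uint8_t *bitmap, uint8_t *on,
                                  size_t ndev, notify_fn notify, void *opaque)
{
    size_t i, changed = 0;

    assert(bitmap != NULL && on != NULL && notify != NULL);
    for (i = 0; i < ndev; i++) {
        uint8_t active = (bitmap[i / 8] >> (i % 8)) & 1;
        if (on[i] != active) {
            on[i] = active;    /* remember the new state */
            notify(i, active ? NOTIFY_DEVICE_CHECK : NOTIFY_EJECT_REQUEST,
                   opaque);
            changed++;
        }
    }
    return changed;
}
```

A second scan with an unchanged bitmap produces no notifications, since the
per-device state was updated during the first pass.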

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |   68 +++-
 1 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 4bdc268..184daf0 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -709,9 +709,72 @@ DefinitionBlock (
 }
 Return(One)
 }
-}
 
+/* Objects filled in by run-time generated SSDT */
+External(MTFY, MethodObj)
+External(MEON, PkgObj)
+
+Method (CMST, 1, NotSerialized) {
+// _STA method - return ON status of memdevice
+// Local0 = MEON flag for this cpu
+Store(DerefOf(Index(MEON, Arg0)), Local0)
+If (Local0) { Return(0xF) } Else { Return(0x0) }
+}
+/* Memory eject notify method */
+OperationRegion(MEMJ, SystemIO, 0xaf40, 32)
+Field (MEMJ, ByteAcc, NoLock, Preserve)
+{
+MPE, 256
+}
+
+Method (MPEJ, 2, NotSerialized) {
+// _EJ0 method - eject callback
+Store(ShiftLeft(1,Arg0), MPE)
+Sleep(200)
+}
+
+/* Memory hotplug notify method */
+OperationRegion(MEST, SystemIO, 0xaf20, 32)
+Field (MEST, ByteAcc, NoLock, Preserve)
+{
+MES, 256
+}
+
+Method(MESC, 0) {
+// Local5 = active memdevice bitmap
+Store (MES, Local5)
+// Local2 = last read byte from bitmap
+Store (Zero, Local2)
+// Local0 = memory device iterator
+Store (Zero, Local0)
+While (LLess(Local0, SizeOf(MEON))) {
+// Local1 = MEON flag for this memory device
+Store(DerefOf(Index(MEON, Local0)), Local1)
+If (And(Local0, 0x07)) {
+// Shift down previously read bitmap byte
+ShiftRight(Local2, 1, Local2)
+} Else {
+// Read next byte from memdevice bitmap
+Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
+}
+// Local3 = active state for this memory device
+Store(And(Local2, 1), Local3)
 
+If (LNotEqual(Local1, Local3)) {
+// State change - update MEON with new state
+Store(Local3, Index(MEON, Local0))
+// Do MEM notify
+If (LEqual(Local3, 1)) {
+MTFY(Local0, 1)
+} Else {
+MTFY(Local0, 3)
+}
+}
+Increment(Local0)
+}
+Return(One)
+}
+}
 /
  * General purpose events
  /
@@ -732,7 +795,8 @@ DefinitionBlock (
 Return(\_SB.PRSC())
 }
 Method(_L03) {
-Return(0x01)
+// Memory hotplug event
+Return(\_SB.MESC())
 }
 Method(_L04) {
 Return(0x01)
-- 
1.7.9


Re: [Qemu-devel] [RFC PATCH 0/9] ACPI memory hotplug

2012-04-19 Thread Vasilis Liaskovitis

Hi,

On Thu, Apr 19, 2012 at 09:49:31AM -0500, Anthony Liguori wrote:
 On 04/19/2012 09:08 AM, Vasilis Liaskovitis wrote:
 This is a prototype for ACPI memory hotplug on x86_64 target. Based on some
 earlier work and comments from Gleb.
 
 Memslot devices are modeled with a new qemu command line
 
 -memslot id=name,start=start_addr,size=sz,node=pxm
 
 Hi,
 
 For 1.2, I'd really like to focus on refactoring the PC machine as
 described in this series:
 
 https://github.com/aliguori/qemu/commits/qom-rebase.12
 
 I'd like to represent the guest memory as a DIMM device.
 
 In terms of this proposal, I would then expect that the i440fx
 device would have a num_dimms property that controlled how many
 link<DIMM>'s it had.  Hotplug would consist of creating a DIMM at
 run time and connecting it to the appropriate link.

ok, makes sense.

 One thing that's not clear to me is how the start/size fits in.  On
 bare metal, is this something that's calculated by the firmware
 during start up and then populated in ACPI?   Does it do something
 like take the largest possible DIMM size that it supports and fill
 out the table?

The current series works as follows:
For each DIMM/memslot option, firmware constructs a QWordMemory ACPI object
(see ACPI spec, ASL 18.5.95). This object has AddressMinimum, AddressMaximum,
RangeLength fields. The first of these corresponds directly to the start
attribute, the third corresponds to size, and the second is derived from both.

On bare metal, I believe the firmware detects the actual DIMM devices and their
size and calculates the physical offset (AddressMinimum) for each, taking into
account possible PCI hole. I doubt the largest possible DIMM size is used, since
a hotplug entity/event should correspond to a physical device. (Kevin or Gleb
may have a better idea of what real metal firmware usually does).

Perhaps you are suggesting having a predefined number of equally sized DIMMs as
being hotplug-able? This way no memslot/DIMM config would have to be passed by
the user at the command line for each DIMM.

In this series, the user-defined memslot options pass the desired DIMM
descriptions to SeaBIOS, which then builds the aforementioned objects. (I assume
it would be possible to pass this info with normal -device commands, after
proper qom-ification)

 
 At any rate, I think we should focus on modeling this in QOM versus
 adding a new option and hacking at the existing memory init code.

agreed. I will take a look into qom-rebase.

thanks,

- Vasilis

Re: [RFC PATCH 7/9] Implement memslot command-line option and memslot hmp command

2012-04-19 Thread Vasilis Liaskovitis

Hi,

On Thu, Apr 19, 2012 at 05:22:52PM +0300, Avi Kivity wrote:
 On 04/19/2012 05:08 PM, Vasilis Liaskovitis wrote:
   Implement -memslot qemu-kvm command line option to define hotplug-able
   memory slots.
   Syntax: -memslot id=name,start=addr,size=sz,node=nodeid
 
   e.g. -memslot id=hot1,start=4294967296,size=1073741824,node=0
   will define a 1G memory slot starting at physical address 4G, belonging to
   numa node 0. Defining no node will automatically add a memslot to node 0.
 
 start=4G,size=1G ought to work too, no?

it should, but it didn't when I tried. Probably some silliness on my part, I
will retry.

thanks,

- Vasilis

live migration between qemu-kvm 1.0 and 0.15

2012-03-27 Thread Vasilis Liaskovitis

Hi,

is live migration between qemu-kvm stable-0.15 and stable-1.0 trees possible?
When I live migrate a VM from 1.0 to 0.15, the destination side 0.15 qemu-kvm
exits with:
(qemu) Unknown savevm section or instance 'i8259' 0

That's expected, since commit i8259:convert to qdev 
747c70af78f7088f182c87e683bdf847beead1e4
introduces the i8259 device in the qdev tree.

The other direction (live migrate from 0.15.1 to 1.0.0) seems to work fine.
Are any other issues expected in this direction?

The vmstate for i8259 has not changed between these trees afaict. If the
qdev-ified i8259 is reverted from stable-1.0 tree (to restore live-migration
compatibility between the versions), would you expect problems?

thanks,

- Vasilis

Re: [PATCH][SeaBIOS] memory hotplug

2012-03-16 Thread Vasilis Liaskovitis

Hi,

On Thu, Mar 15, 2012 at 02:01:38PM +0200, Gleb Natapov wrote:
 Commenting a little bit late, but since you've said that you are working on
 a new version of the patch... better late than never.
 
 On Thu, Aug 11, 2011 at 04:39:38PM +0200, Vasilis Liaskovitis wrote:
  Hi,
  
  I am testing a set of experimental patches for memory-hotplug on x86_64
  host / guest combinations. I have implemented this in a similar way to
  cpu-hotplug.
  
  A dynamic SSDT table with all memory devices is created at boot time.  This
  table calls static methods from the DSDT.  A byte array indicates which
  memory device is online or not. This array is kept in sync with a qemu-kvm
  bitmap array through ioport 0xaf20. Qemu-kvm updates this table on a mem_set
  command and an ACPI event is triggered.
  
  Memory devices are 128MB in size (to match
  /sys/devices/memory/block_size_bytes in x86_64). They are constructed
  dynamically in src/ssdt-mem.asl, similarly to hotpluggable-CPUs.  The _CRS
  memstart-memend attribute for each memory device is defined accordingly,
  skipping the hole at 0xe000 - 0x1.
  Hotpluggable memory is always located above 4GB.
  
 What is the reason for this limitation?

We currently model a PCI hole from below_4g_mem_size to 4GB, see i440fx_init
call in pc_init1. The decision was discussed here:
http://patchwork.ozlabs.org/patch/105892/
afaict because there was no clear resolution on using a top-of-memory register.
So, hotplugging will start at 4GB + above_4g_mem_size. Unless we can model the
pci hole more accurately hardware-wise.
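
As a rough sketch of that address calculation (split_ram and struct ram_split
are hypothetical names; the 0xe0000000 low-memory limit is an assumption about
where the PCI hole begins, not something this mail specifies):

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GB      0x100000000ULL
#define LOWMEM_LIMIT 0xe0000000ULL  /* assumed start of the PCI hole */

struct ram_split {
    uint64_t below_4g;      /* initial RAM mapped below the PCI hole */
    uint64_t above_4g;      /* initial RAM mapped from 4GB upwards */
    uint64_t hotplug_base;  /* first address available for hotplugged memory */
};

/* Split the initial RAM around the PCI hole and compute where
 * hotplugged memory would start: 4GB + above_4g. */
static struct ram_split split_ram(uint64_t ram_size)
{
    struct ram_split s;

    if (ram_size >= LOWMEM_LIMIT) {
        s.below_4g = LOWMEM_LIMIT;
        s.above_4g = ram_size - LOWMEM_LIMIT;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    assert(s.below_4g + s.above_4g == ram_size);
    s.hotplug_base = FOUR_GB + s.above_4g;
    return s;
}
```

With 1GB of initial RAM the first hotplug address is exactly 4GB; with 6GB it
moves up to 0x1a0000000 under the assumed limit.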

 
  Qemu-kvm sets the upper bound of hotpluggable memory with maxmem =
  [totalmemory in MB] on the command line. Maxmem is an argument for -m
  similar to maxcpus for smp.
  E.g. -m 1024,maxmem=2048 on the qemu command line will create memory
  devices for 2GB of RAM, enabling only 1GB initially.
  
  Qemu_monitor triggers a memory hotplug with:
  (qemu) mem_set [memory range in MBs] online
  
 As far as I see mem_set does not get memory range as a parameter. The
 parameter is amount of memory to add/remove and memory is added/removed
 to/from the top.
 
 This is not flexible enough.  Fine-grained control for memory slots is
 needed. What about exposing memory slot configuration to command line
 like this: 
 
 -memslot mem=size,populated=yes|no
 
 adding one of those for each slot.
 
yes, I agree we need this.
Is the idea to model all physical DIMMs? For initial system RAM does it make
sense to explicitly specify slots at the command line, or infer them? 

I think we can allocate a new qemu ram MemoryRegion for each new hotplugged
slot/DIMM, so there will be a 1-1 mapping between new populated slots and
qemu memory ram regions.  Perhaps we want initial memory allocation to also 
comply with physical slot/DIMM modeling. Initial (cold) RAM is created as
a single MemoryRegion pc.ram

Also in kvm we can easily run out of kvm_memory_slots (10 slots per VM and 32
system-wide I think)

 mem_set will get slot id to populate/depopulate just like cpu_set gets
 cpu slot number to remove and not just yanks cpus with highest slot id.

right, but I think for upstream qemu, people would like to eventually use
device_add, instead of a new mem_set command. Pretty much the same way as cpu
hotplug?

For this to happen, memory devices should be modeled in QOM/qdev. Are we
planning on keeping CPUSocket structures for CPUs? Or perhaps modelling a
memory controller is the right way. What type should the memory
controller/devices be a child of?

I'll try to resubmit in a few weeks' time, though depending on feedback,
qom/qdev of memory devices will probably take longer.

thanks,

- Vasilis


live migration between amd fam15h-fam10h

2012-03-01 Thread Vasilis Liaskovitis

Hi,

I am getting a frozen guest when migrating from an Opteron 6274 host (amd
fam15h) to an Opteron 6174 host (amd fam10h). The live migration completes
successfully, but the guest is frozen: the VNC screen is still there, but no
input is possible and no kernel output is seen. Trying c on the qemu-monitor
does not help. I am using -cpu Opteron_G3, which I assumed would be ok for
both host cpus.

In the opposite direction (migrating from an amd fam10h host to an amd fam15h
host) the guest continues to run on the destination. However, on most of these
successful live migrations, I notice a clocksource unstable message on the
guest kernel (using the default kvm-clock clocksource), e.g.
Clocksource tsc unstable (delta = -1500533439 ns)
The same situation (guest runs on destination with clocksource unstable
message) happens when migrating between fam15h hosts (I have not tried between
fam10h hosts).

Changing the clocksource (tsc, acpi_pm, hpet) does not solve the issue.
Also tried with -cpu kvm64 with same result.

qemu-kvm version: 0.15.1, 1.0 or qemu-kvm/master
Host kernel: 3.0.15 (on both hosts)
Guest kernel: 3.0.6 or 3.2

this is the qemu-kvm command line used on the source host: 


kvm -enable-kvm -m 1024 -smp 1 -cpu Opteron_G3,check \
-drive file=/opt/test.img,if=none,id=drive-virtio-disk1,format=raw,cache=writethrough,boot=on \
-device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 \
-monitor stdio -vnc 0.0.0.0:6 -vga std -chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0


The destination host has the same command line with an added -incoming
tcp:. I have mainly tested this with non-shared storage (but shared storage
gives the same result). Migration is triggered with migrate -b tcp:destip:

Do the TSC microarchitecture changes in amd fam15h (see AMD SW optimization
guide for fam15h, 47414 Rev 3.02 Appendix E) affect pvclock stability on
migration in the same family or across families?

cpuid information follows in case it's helpful.

6274 host:

 eax in  eax  ebx  ecx  edx
 000d 68747541 444d4163 69746e65
0001 00600f12 02100800 1e98220b 178bfbff
0002    
0003    
0004    
0005 0040 0040 0003 
0006   0001 
0007    
0008    
0009    
000a    
000b    
000c    
000d    
8000 801e 68747541 444d4163 69746e65
8001 00600f12 3000 01c9bfff 2fd3fbff
8002 20444d41 6574704f 286e6f72 20294d54
8003 636f7250 6f737365 32362072 20203437
8004 20202020 20202020 20202020 00202020
8005 ff20ff18 ff20ff30 10040140 40020140
8006 6400 64004200 08008140 0060e140
8007    03d9
8008 3030  500f 
8009    
800a 0001 0001  14ff
800b    
800c    
800d    
800e    
800f    
8010    
8011    
8012    
8013    
8014    
8015    
8016    
8017    
8018    
8019 f020f018 6400  
801a 0003   
801b 00ff   
801c  80032013 00010200 800f
801d    
801e 0022 0101 0100 

Vendor ID: AuthenticAMD; CPUID level 13

AMD-specific functions
Version 00600f12:
Family: 15 Model: 1 []

Standard feature flags 178bfbff:
Floating Point Unit
Virtual Mode Extensions
Debugging Extensions
Page Size Extensions
Time Stamp Counter (with RDTSC and CR4 disable bit)
Model Specific Registers with RDMSR & WRMSR
PAE - Page Address Extensions
Machine Check Exception
COMPXCHG8B Instruction
APIC
SYSCALL/SYSRET or SYSENTER/SYSEXIT instructions
MTRR - Memory Type Range Registers
Global paging extension
Machine Check Architecture
Conditional Move Instruction
PAT - Page Attribute Table
PSE-36 - Page Size Extensions
19 - reserved
MMX instructions
FXSAVE/FXRSTOR
25 - reserved
26 - reserved
28 - reserved
Generation: 15 Model: 1
Extended feature flags 2fd3fbff:
Floating Point Unit
Virtual Mode Extensions
Debugging Extensions
Page Size Extensions
Time
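
For reference, the family/model values can be derived from the leaf 1 eax value in the dump (00600f12) with the usual base-plus-extended arithmetic. A small sketch, with hypothetical helper names that are not part of any cpuid tool:

```c
#include <stdint.h>

/* Display family from CPUID leaf 1 eax: base family in bits 11:8,
 * extended family in bits 27:20. The extended family is added only
 * when the base family is 0xf. */
unsigned cpuid_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xf;
    unsigned ext  = (eax >> 20) & 0xff;
    return base == 0xf ? base + ext : base;
}

/* Display model: base model in bits 7:4, extended model in bits 19:16.
 * The extended model is used here when the base family is 0xf (AMD rule). */
unsigned cpuid_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base = (eax >> 4) & 0xf;
    unsigned ext  = (eax >> 16) & 0xf;
    return base_family == 0xf ? (ext << 4) | base : base;
}

/* Stepping is the low nibble. */
unsigned cpuid_stepping(uint32_t eax)
{
    return eax & 0xf;
}
```

For eax = 0x00600f12 this gives family 0x15 (hence "fam15h"; the "Family: 15" printed by the tool is the base family field in decimal), model 1 and stepping 2.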

Re: [PATCH v2 3/4] uq/master: Add CPU eject handling for acpi_piix4

2012-01-30 Thread Vasilis Liaskovitis

On Thu, Jan 26, 2012 at 12:46:18PM +0200, Avi Kivity wrote:
 On 01/24/2012 04:56 PM, Vasilis Liaskovitis wrote:
  On Tue, Jan 24, 2012 at 11:28:41AM +0100, Jan Kiszka wrote:
   On 2012-01-24 11:10, Vasilis Liaskovitis wrote:
    Add stub functions for CPU eject callback. Define cpu_acpi_eject
    property and enable eject callback only for pc-1.1 machine model.
   
   Just to get the idea: What is the plan and advantage of introducing a
   stub first? How much more is required to have some usable feature, even
   if it's just a fraction of the full support?
  
  There's not really an advantage to adding stubs first. The plan depends on
  the lifecycle patches getting accepted in some form at some point. The code
  is all out there, and some of it has been reviewed/commented on, but not
  accepted.
 
  kvm needs the following patches:
  https://lkml.org/lkml/2012/1/6/355 (v7, still in work)
  http://patchwork.ozlabs.org/patch/127828/
  This second patch introduces ioctl KVM_SETSTATE_VCPU (qemu uses it to
  signal vcpu destruction to the host), but the review mentions there should
  be a simpler way. It's unclear to me whether this ioctl is desired or not.
 
 Those patches are not strictly needed.  On a kernel that doesn't have
 them, you can simply park the vcpu thread in userspace until it is
 re-added.  I suggest writing the qemu patches without the assumption
 that you're running on a 3.4+ kernel.

ok, I will try to handle CPU ejection without relying on the lifecycle
patches.

thanks,

- Vasilis

[PATCH v2 0/4] acpi_piix4: Add CPU eject infrastructure for pc-1.1

2012-01-24 Thread Vasilis Liaskovitis

This patch series adds support for CPU ejection callbacks in Seabios and qemu.
This will be needed for proper ACPI vcpu destruction/unplug in conjunction
with the vcpu lifecycle patches. 

v1->v2: Add pc-1.1 model with cpu acpi ejection property. Add documentation.
v1 of the series also defined the eject method to handle the CPU_DEAD event
in the cpu lifecycle/destruction series. That patch has been dropped from the
patchset and will be sent separately as lifecycle/unplug series matures.

v2 patches are based on uq/master, plus a patch from the first version of
vcpu-hotplug qemu upstream series, specifically:
http://patchwork.ozlabs.org/patch/136463/

Vasilis Liaskovitis (3):
  uq/master: Add machine model pc-1.1
  uq/master: Add CPU eject handling for acpi_piix4
  uq/master: Add acpi cpu interface documentation

 docs/specs/acpi_hotplug.txt |   49 +++
 docs/specs/acpi_pci_hotplug.txt |   37 -
 hw/acpi_piix4.c |   20 
 hw/pc_piix.c|   16 
 4 files changed, 85 insertions(+), 37 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt
 delete mode 100644 docs/specs/acpi_pci_hotplug.txt

-- 
1.7.7.3


[PATCH v2 1/4][SeaBios] Add bitmap for CPU EJ0 callback

2012-01-24 Thread Vasilis Liaskovitis

Add bitmap for CPU EJ0 callback and write to it on a cpu _EJ0 callback. Remove
Sleep() call.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 7082b65..5138c2a 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -650,9 +650,15 @@ DefinitionBlock (
 Store(DerefOf(Index(CPON, Arg0)), Local0)
 If (Local0) { Return(0xF) } Else { Return(0x0) }
 }
+/* CPU eject notify method */
+OperationRegion(PREJ, SystemIO, 0xaf20, 32)
+Field (PREJ, ByteAcc, NoLock, Preserve)
+{
+PRE, 256
+}
 Method (CPEJ, 2, NotSerialized) {
 // _EJ0 method - eject callback
-Sleep(200)
+Store(ShiftLeft(1, Arg0), PRE)
 }
 
 /* CPU hotplug notify method */
-- 
1.7.7.3


[PATCH 2/4] uq/master: Add machine model pc-1.1

2012-01-24 Thread Vasilis Liaskovitis

Add machine model pc-1.1 

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/pc_piix.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 744b0dc..ac251c6 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -375,6 +375,13 @@ static void pc_xen_hvm_init(ram_addr_t ram_size,
 }
 #endif
 
+static QEMUMachine pc_machine_v1_1 = {
+    .name = "pc-1.1",
+    .desc = "Standard PC",
+    .init = pc_init_pci,
+    .max_cpus = 255,
+};
+
 static QEMUMachine pc_machine_v1_0 = {
     .name = "pc-1.0",
     .alias = "pc",
@@ -674,6 +681,7 @@ static QEMUMachine xenfv_machine = {
 
 static void pc_machine_init(void)
 {
+    qemu_register_machine(&pc_machine_v1_1);
     qemu_register_machine(&pc_machine_v1_0);
     qemu_register_machine(&pc_machine_v0_15);
     qemu_register_machine(&pc_machine_v0_14);
-- 
1.7.7.3


[PATCH 4/4] uq/master: Add acpi cpu interface documentation

2012-01-24 Thread Vasilis Liaskovitis

Add CPU acpi interface documentation. Move all ACPI documentation (CPU and
PCI) to one file.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 docs/specs/acpi_hotplug.txt |   49 +++
 docs/specs/acpi_pci_hotplug.txt |   37 -
 2 files changed, 49 insertions(+), 37 deletions(-)
 create mode 100644 docs/specs/acpi_hotplug.txt
 delete mode 100644 docs/specs/acpi_pci_hotplug.txt

diff --git a/docs/specs/acpi_hotplug.txt b/docs/specs/acpi_hotplug.txt
new file mode 100644
index 000..2026bed
--- /dev/null
+++ b/docs/specs/acpi_hotplug.txt
@@ -0,0 +1,49 @@
+QEMU<->ACPI BIOS PCI hotplug interface
+--------------------------------------
+
+QEMU supports PCI hotplug via ACPI, for PCI bus 0. This document
+describes the interface between QEMU and the ACPI BIOS.
+
+ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
+-----------------------------------------------------
+
+Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
+event to ACPI BIOS, via SCI interrupt.
+
+PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
+-------------------------------------------------------------------------------
+Slot injection notification pending. One bit per slot.
+
+Read by ACPI BIOS GPE.1 handler to notify OS of injection
+events.
+
+PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
+---------------------------------------------------------------------
+Slot removal notification pending. One bit per slot.
+
+Read by ACPI BIOS GPE.1 handler to notify OS of removal
+events.
+
+PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
+--------------------------------------------------------
+
+Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
+Reads return 0.
+
+PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
+---------------------------------------------------------------
+
+Used by ACPI BIOS _RMV method to indicate removability status to OS. One
+bit per slot.
+
+CPU hotplug notification pending (IO port 0xaf00-0xaf1f, 32-byte access):
+-------------------------------------------------------------------------
+CPU hotplug notification pending. One bit per cpu.
+
+Read by ACPI BIOS GPE.2 handler to notify OS of injection
+events.
+
+CPU eject (IO port 0xaf20-0xaf3f, 32-byte access):
+--------------------------------------------------
+
+Used by ACPI BIOS _EJ0 method to request cpu removal. One bit per cpu.
diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
deleted file mode 100644
index f0f74a7..000
--- a/docs/specs/acpi_pci_hotplug.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-QEMU<->ACPI BIOS PCI hotplug interface
---------------------------------------
-
-QEMU supports PCI hotplug via ACPI, for PCI bus 0. This document
-describes the interface between QEMU and the ACPI BIOS.
-
-ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
------------------------------------------------------
-
-Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
-event to ACPI BIOS, via SCI interrupt.
-
-PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
--------------------------------------------------------------------------------
-Slot injection notification pending. One bit per slot.
-
-Read by ACPI BIOS GPE.1 handler to notify OS of injection
-events.
-
-PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
----------------------------------------------------------------------
-Slot removal notification pending. One bit per slot.
-
-Read by ACPI BIOS GPE.1 handler to notify OS of removal
-events.
-
-PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
---------------------------------------------------------
-
-Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
-Reads return 0.
-
-PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
----------------------------------------------------------------
-
-Used by ACPI BIOS _RMV method to indicate removability status to OS. One
-bit per slot.
-- 
1.7.7.3
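
As an illustration of the "one bit per cpu" layout documented for the CPU eject ports, the following sketch maps a cpu index to the byte-wide port and bit mask that would carry its eject request. It assumes bit n of the 256-bit field lives in bit n % 8 of the byte at 0xaf20 + n / 8, which the documentation does not spell out; cpu_eject_port and cpu_eject_mask are hypothetical helper names.

```c
#include <stdint.h>

#define PROC_EJ_BASE 0xaf20u   /* CPU eject base port, from the documentation */
#define PROC_EJ_LEN  32u       /* 32 byte-wide ports, i.e. 256 bits */

/* Port that holds the eject bit for a cpu, or 0 if the index is out of range. */
uint16_t cpu_eject_port(unsigned cpu)
{
    if (cpu >= PROC_EJ_LEN * 8) {
        return 0;
    }
    return (uint16_t)(PROC_EJ_BASE + cpu / 8);
}

/* Byte value to write to that port: a single bit set. */
uint8_t cpu_eject_mask(unsigned cpu)
{
    return (uint8_t)(1u << (cpu % 8));
}
```

Under that assumption cpu 9 maps to port 0xaf21 with mask 0x02, and cpu 255 to port 0xaf3f with mask 0x80.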


[PATCH v2 3/4] uq/master: Add CPU eject handling for acpi_piix4

2012-01-24 Thread Vasilis Liaskovitis

Add stub functions for CPU eject callback. Define cpu_acpi_eject property and
enable eject callback only for pc-1.1 machine model.

Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/acpi_piix4.c |   20 
 hw/pc_piix.c|8 
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 96e1ce8..8475aa6 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -40,6 +40,7 @@
 
 #define GPE_BASE 0xafe0
 #define PROC_BASE 0xaf00
+#define PROC_EJ_BASE 0xaf20
 #define GPE_LEN 4
 #define PCI_BASE 0xae00
 #define PCI_EJ_BASE 0xae08
@@ -80,6 +81,8 @@ typedef struct PIIX4PMState {
 struct gpe_regs gpe_cpu;
 struct pci_status pci0_status;
 uint32_t pci0_hotplug_enable;
+/* for cpu hotplug */
+uint32_t cpu_acpi_eject;
 } PIIX4PMState;
 
 static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s);
@@ -424,6 +427,7 @@ static PCIDeviceInfo piix4_pm_info = {
     .class_id   = PCI_CLASS_BRIDGE_OTHER,
     .qdev.props = (Property[]) {
         DEFINE_PROP_UINT32("smb_io_base", PIIX4PMState, smb_io_base, 0),
+        DEFINE_PROP_UINT32("cpu_acpi_eject", PIIX4PMState, cpu_acpi_eject, 0),
         DEFINE_PROP_END_OF_LIST(),
     }
 };
@@ -497,6 +501,17 @@ static void pcihotplug_write(void *opaque, uint32_t addr, uint32_t val)
     PIIX4_DPRINTF("pcihotplug write %x <== %d\n", addr, val);
 }
 
+static uint32_t cpuej_read(void *opaque, uint32_t addr)
+{
+    PIIX4_DPRINTF("cpuej read %x\n", addr);
+    return 0;
+}
+
+static void cpuej_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    PIIX4_DPRINTF("cpuej write %x <== %d\n", addr, val);
+}
+
 static uint32_t pciej_read(void *opaque, uint32_t addr)
 {
     PIIX4_DPRINTF("pciej read %x\n", addr);
@@ -555,6 +570,11 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
     register_ioport_write(PROC_BASE, 32, 1, gpe_writeb, s);
     register_ioport_read(PROC_BASE, 32, 1,  gpe_readb, s);
 
+    if (s->cpu_acpi_eject) {
+        register_ioport_write(PROC_EJ_BASE, 32, 1, cpuej_write, s);
+        register_ioport_read(PROC_EJ_BASE, 32, 1,  cpuej_read, s);
+    }
+
     register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
     register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
 
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index ac251c6..6d61567 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -380,6 +380,14 @@ static QEMUMachine pc_machine_v1_1 = {
     .desc = "Standard PC",
     .init = pc_init_pci,
     .max_cpus = 255,
+    .compat_props = (GlobalProperty[]) {
+        {
+            .driver   = "PIIX4_PM",
+            .property = "cpu_acpi_eject",
+            .value    = stringify(1),
+        },
+        { /* end of list */ }
+    },
 };
 
 static QEMUMachine pc_machine_v1_0 = {
-- 
1.7.7.3


Re: [PATCH v2 3/4] uq/master: Add CPU eject handling for acpi_piix4

2012-01-24 Thread Vasilis Liaskovitis

On Tue, Jan 24, 2012 at 11:28:41AM +0100, Jan Kiszka wrote:
 On 2012-01-24 11:10, Vasilis Liaskovitis wrote:
  Add stub functions for CPU eject callback. Define cpu_acpi_eject property
  and enable eject callback only for pc-1.1 machine model.
 
 Just to get the idea: What is the plan and advantage of introducing a
 stub first? How much more is required to have some usable feature, even
 if it's just a fraction of the full support?

There's not really an advantage to adding stubs first. The plan depends on the
lifecycle patches getting accepted in some form at some point. The code is all
out there, and some of it has been reviewed/commented on, but not accepted.

kvm needs the following patches:
https://lkml.org/lkml/2012/1/6/355 (v7, still in work)
http://patchwork.ozlabs.org/patch/127828/
This second patch introduces ioctl KVM_SETSTATE_VCPU, (qemu uses it to signal
vcpu destruction to the host) but the review mentions there should be a
simpler way. It's unclear to me whether this ioctl is desired or not.

userspace qemu/qemu-kvm need some form of these patches
http://patchwork.ozlabs.org/patch/127831/
http://patchwork.ozlabs.org/patch/127830/
http://patchwork.ozlabs.org/patch/127833/
http://patchwork.ozlabs.org/patch/127834/

Assuming that the above is further reviewed and accepted, the extra code
needed to actually make something useful in the stub functions would be
something like the following (with the above ioctl); comments welcome. This
code calls a kvm function from hw/acpi_piix4.c, so it's probably not well
abstracted enough for upstream.

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 8475aa6..b5fcb4a 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -509,6 +509,20 @@ static uint32_t cpuej_read(void *opaque, uint32_t addr)
 
 static void cpuej_write(void *opaque, uint32_t addr, uint32_t val)
 {
+    PIIX4PMState *s = opaque;
+    CPUState *env;
+    int cpu;
+    int ret;
+
+    cpu = ffs(val);
+    /* zero means no bit was set, i.e. no CPU ejection happened */
+    if (!cpu)
+        return;
+    cpu--;
+    env = cpu_phyid_to_cpu((uint64_t)cpu);
+    if (s->kvm_enabled && env != NULL) {
+        kvm_eject_vcpu(env);
+    }
     PIIX4_DPRINTF("cpuej write %x <== %d\n", addr, val);
 }
 
diff --git a/kvm-all.c b/kvm-all.c
index 88f1156..d3e53f5 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -193,6 +193,13 @@ static void kvm_reset_vcpu(void *opaque)
 kvm_arch_reset_vcpu(env);
 }
 
+static void kvm_eject_vcpu(void *opaque)
+{
+    CPUState *env = opaque;
+
+    kvm_arch_eject_vcpu(env);
+}
+
 int kvm_irqchip_in_kernel(void)
 {
     return kvm_state->irqchip_in_kernel;
diff --git a/kvm.h b/kvm.h
index 40b5ffc..ace28a8 100644
--- a/kvm.h
+++ b/kvm.h
@@ -125,6 +125,8 @@ int kvm_arch_init_vcpu(CPUState *env);
 
 void kvm_arch_reset_vcpu(CPUState *env);
 
+void kvm_arch_eject_vcpu(CPUState *env);
+
 int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr);
 int kvm_arch_on_sigbus(int code, void *addr);
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e41de39..f8239c0 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -589,6 +589,21 @@ void kvm_arch_reset_vcpu(CPUState *env)
 }
 }
 
+void kvm_arch_eject_vcpu(CPUState *env)
+{
+    struct kvm_vcpu_state state;
+    int ret = 0;
+
+    if (env->state == CPU_STATE_ZAPREQ) {
+        state.vcpu_id = env->cpu_index;
+        state.state = 1;
+        ret = kvm_vm_ioctl(env->kvm_state, KVM_SETSTATE_VCPU, &state);
+        if (ret)
+            fprintf(stderr, "KVM_SETSTATE_VCPU failed: %s\n",
+                    strerror(ret));
+    }
+}
+
 static int kvm_get_supported_msrs(KVMState *s)
 {
 static int kvm_supported_msrs;
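
One detail worth checking in the cpuej_write() sketch: the eject ports are registered with byte-wide access, so val would only carry the bit position within one byte and the port offset would have to supply the rest of the cpu index. A standalone illustration of that decoding follows, assuming the handler receives the absolute port address in addr (an assumption about the ioport callback convention) and the bit layout described in the documentation patch; cpuej_decode is a hypothetical helper, not part of the series.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

#define PROC_EJ_BASE 0xaf20u   /* CPU eject base port */
#define PROC_EJ_LEN  32u       /* number of byte-wide eject ports */

/* Map a byte write at absolute port addr to a cpu index.
 * Returns -1 if no bit is set or the port is outside the eject window. */
int cpuej_decode(uint32_t addr, uint32_t val)
{
    int bit = ffs((int)(val & 0xff));   /* 1-based position, 0 means no bit set */

    if (bit == 0 || addr < PROC_EJ_BASE || addr >= PROC_EJ_BASE + PROC_EJ_LEN) {
        return -1;
    }
    return (int)(addr - PROC_EJ_BASE) * 8 + (bit - 1);
}
```

Under these assumptions a write of 0x02 to port 0xaf21 decodes to cpu 9, whereas ffs(val) alone would report cpu 1.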

Re: [PATCH 1/3][Seabios] Add bitmap for cpu _EJ0 callback

2012-01-19 Thread Vasilis Liaskovitis

On Fri, Jan 13, 2012 at 07:27:01PM -0500, Kevin O'Connor wrote:
 
 [...]
   Method (CPEJ, 2, NotSerialized) {
   // _EJ0 method - eject callback
  +Store(ShiftLeft(1, Arg0), PRE)
   Sleep(200)
   }

I have another question here: the PCI _EJ0 callback seems to return 0x0, but
the CPU _EJ0 doesn't return anything. The ACPI spec 4.0a draft, section 6.3.3,
mentions that _EJx methods have no return value. Is the above difference
intentional?

thanks,

- Vasilis

Re: [PATCH 2/3] acpi_piix4: Add stub functions for CPU eject callback

2012-01-16 Thread Vasilis Liaskovitis

On Sun, Jan 15, 2012 at 02:38:52PM +0200, Avi Kivity wrote:
 On 01/13/2012 01:11 PM, Vasilis Liaskovitis wrote:
  Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
  ---
   hw/acpi_piix4.c |   15 +++
   1 files changed, 15 insertions(+), 0 deletions(-)
 
  diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
  index d5743b6..8bf30dd 100644
  --- a/hw/acpi_piix4.c
  +++ b/hw/acpi_piix4.c
  @@ -37,6 +37,7 @@
   
   #define GPE_BASE 0xafe0
   #define PROC_BASE 0xaf00
  +#define PROC_EJ_BASE 0xaf20
 
 
 We're adding stuff to piix4 which was never there.  At a minimum this
 needs to be documented.  Also needs to be -M pc-1.1 and later only.

Where should this be documented? PCI/ACPI hotplug addresses are documented in
docs/specs/acpi_pci_hotplug.txt but for CPU hotplug documentation (i.e.
for the existing PROC_BASE) I don't see relevant documentation. I will
create a docs/specs/acpi_cpu_hotplug.txt if that sounds reasonable.

For pc-1.1, a new QEMUMachine type will be needed, I assume. Should a check be
made against the machine version in the piix4 code? Are there any relevant
examples?

thanks,

- Vasilis


Re: [PATCH 1/3][Seabios] Add bitmap for cpu _EJ0 callback

2012-01-16 Thread Vasilis Liaskovitis

On Fri, Jan 13, 2012 at 07:27:01PM -0500, Kevin O'Connor wrote:
 On Fri, Jan 13, 2012 at 12:11:30PM +0100, Vasilis Liaskovitis wrote:
  
  Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
 
 The SeaBIOS change is okay with me, but the qemu/kvm change needs to
 be accepted first.
 
 [...]
   Method (CPEJ, 2, NotSerialized) {
   // _EJ0 method - eject callback
  +Store(ShiftLeft(1, Arg0), PRE)
   Sleep(200)
   }
 
 Is the Sleep() still needed?

I believe it's unnecessary. I'll test without it and resend.
thanks,

- Vasilis


[PATCH 0/3] acpi_piix4: Add CPU eject handling

2012-01-13 Thread Vasilis Liaskovitis

This patch series adds support for CPU _EJ0 callback in Seabios and qemu-kvm.
The first patch defines the CPU eject bitmap in Seabios and writes to it
during the callback. The second patch adds empty stub functions to qemu-kvm to
handle the bitmap writes.

The third patch defines the eject method to handle the CPU_DEAD event
in Liu Ping Fan's cpu lifecycle/destruction patchseries, see:
http://patchwork.ozlabs.org/patch/127832/
This ACPI implementation can be used instead of the cpustate virtio/pci device
in the original series.

Vasilis Liaskovitis (2):
  acpi_piix4: Add CPU ejection handling
  acpi_piix4: Call KVM_SETSTATE_VCPU ioctl on cpu ejection

 hw/acpi_piix4.c |   36 
 1 files changed, 36 insertions(+), 0 deletions(-)

-- 
1.7.7.3


[PATCH 1/3][Seabios] Add bitmap for cpu _EJ0 callback

2012-01-13 Thread Vasilis Liaskovitis


Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 src/acpi-dsdt.dsl |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 7082b65..71d8ac4 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -650,8 +650,15 @@ DefinitionBlock (
 Store(DerefOf(Index(CPON, Arg0)), Local0)
 If (Local0) { Return(0xF) } Else { Return(0x0) }
 }
+/* CPU eject notify method */
+OperationRegion(PREJ, SystemIO, 0xaf20, 32)
+Field (PREJ, ByteAcc, NoLock, Preserve)
+{
+PRE, 256
+}
 Method (CPEJ, 2, NotSerialized) {
 // _EJ0 method - eject callback
+Store(ShiftLeft(1, Arg0), PRE)
 Sleep(200)
 }
 
-- 
1.7.7.3


[PATCH 2/3] acpi_piix4: Add stub functions for CPU eject callback

2012-01-13 Thread Vasilis Liaskovitis


Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/acpi_piix4.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index d5743b6..8bf30dd 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -37,6 +37,7 @@
 
 #define GPE_BASE 0xafe0
 #define PROC_BASE 0xaf00
+#define PROC_EJ_BASE 0xaf20
 #define GPE_LEN 4
 #define PCI_BASE 0xae00
 #define PCI_EJ_BASE 0xae08
@@ -493,6 +494,17 @@ static void pcihotplug_write(void *opaque, uint32_t addr, uint32_t val)
     PIIX4_DPRINTF("pcihotplug write %x <== %d\n", addr, val);
 }
 
+static uint32_t cpuej_read(void *opaque, uint32_t addr)
+{
+    PIIX4_DPRINTF("cpuej read %x\n", addr);
+    return 0;
+}
+
+static void cpuej_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    PIIX4_DPRINTF("cpuej write %x <== %d\n", addr, val);
+}
+
 static uint32_t pciej_read(void *opaque, uint32_t addr)
 {
     PIIX4_DPRINTF("pciej read %x\n", addr);
@@ -553,6 +565,9 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
     register_ioport_write(PROC_BASE, 32, 1, gpe_writeb, s);
     register_ioport_read(PROC_BASE, 32, 1,  gpe_readb, s);
 
+    register_ioport_write(PROC_EJ_BASE, 32, 1, cpuej_write, s);
+    register_ioport_read(PROC_EJ_BASE, 32, 1,  cpuej_read, s);
+
     register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
     register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
 
-- 
1.7.7.3


[PATCH 3/3] acpi_piix4: Call KVM_SETSTATE_VCPU ioctl on cpu ejection

2012-01-13 Thread Vasilis Liaskovitis


Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
---
 hw/acpi_piix4.c |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 8bf30dd..12eef55 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -502,6 +502,27 @@ static uint32_t cpuej_read(void *opaque, uint32_t addr)
 
 static void cpuej_write(void *opaque, uint32_t addr, uint32_t val)
 {
+    struct kvm_vcpu_state state;
+    CPUState *env;
+    int cpu;
+    int ret;
+
+    cpu = ffs(val);
+    /* zero means no bit was set, i.e. no CPU ejection happened */
+    if (!cpu)
+        return;
+    cpu--;
+    env = cpu_phyid_to_cpu((uint64_t)cpu);
+    if (env != NULL) {
+        if (env->state == CPU_STATE_ZAPREQ) {
+            state.vcpu_id = env->cpu_index;
+            state.state = 1;
+            ret = kvm_vm_ioctl(env->kvm_state, KVM_SETSTATE_VCPU, &state);
+            if (ret)
+                fprintf(stderr, "KVM_SETSTATE_VCPU failed: %s\n",
+                        strerror(ret));
+        }
+    }
     PIIX4_DPRINTF("cpuej write %x <== %d\n", addr, val);
 }
 
-- 
1.7.7.3


Re: [PATCH 3/3] acpi_piix4: Call KVM_SETSTATE_VCPU ioctl on cpu ejection

2012-01-13 Thread Vasilis Liaskovitis

On Fri, Jan 13, 2012 at 12:58:53PM +0100, Jan Kiszka wrote:
 On 2012-01-13 12:11, Vasilis Liaskovitis wrote:
  Signed-off-by: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
  ---
   hw/acpi_piix4.c |   21 +
   1 files changed, 21 insertions(+), 0 deletions(-)
  
  diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
  index 8bf30dd..12eef55 100644
  --- a/hw/acpi_piix4.c
  +++ b/hw/acpi_piix4.c
  @@ -502,6 +502,27 @@ static uint32_t cpuej_read(void *opaque, uint32_t addr)
   
   static void cpuej_write(void *opaque, uint32_t addr, uint32_t val)
   {
  +    struct kvm_vcpu_state state;
  +    CPUState *env;
  +    int cpu;
  +    int ret;
  +
  +    cpu = ffs(val);
  +    /* zero means no bit was set, i.e. no CPU ejection happened */
  +    if (!cpu)
  +        return;
  +    cpu--;
  +    env = cpu_phyid_to_cpu((uint64_t)cpu);
  +    if (env != NULL) {
  +        if (env->state == CPU_STATE_ZAPREQ) {
  +            state.vcpu_id = env->cpu_index;
  +            state.state = 1;
  +            ret = kvm_vm_ioctl(env->kvm_state, KVM_SETSTATE_VCPU, &state);
 
 That breaks in the absence of KVM or if it is not enabled.

Right, I will rework.

Do we expect icc-bus related changes on a CPU unplug? This patch does not
handle this yet.

 
 Also, where was this IOCTL introduced? Where are the linux header changes?


The headers are here:
http://patchwork.ozlabs.org/patch/127834/

And the ioctl is introduced here:
http://patchwork.ozlabs.org/patch/127828/

Though the actual ioctl code seems to have fallen through the cracks in the
above patch. A sample implementation against 3.1.0 is below, but I have not
included it in the patch series. I expect the ioctl implementation to be part
of Liu's kernel kvm-related series. In any case, this third patch depends on
the cpu zap/lifecycle patchseries and perhaps should be reviewed separately
from the first 2.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6d3a724..8dd9ebd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2095,6 +2095,22 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = kvm_ioeventfd(kvm, &data);
 		break;
 	}
+	case KVM_SETSTATE_VCPU: {
+		struct kvm_vcpu_state vcpu_state;
+		struct kvm_vcpu *vcpu;
+		int idx;
+		r = -EFAULT;
+		if (copy_from_user(&vcpu_state, argp,
+				sizeof(struct kvm_vcpu_state)))
+			goto out;
+		idx = srcu_read_lock(&kvm->srcu);
+		kvm_for_each_vcpu(vcpu, kvm)
+			if (vcpu_state.vcpu_id == vcpu->vcpu_id)
+				vcpu->state = vcpu_state.state;
+		srcu_read_unlock(&kvm->srcu, idx);
+		r = 0;
+		break;
+	}
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
 	case KVM_SET_BOOT_CPU_ID:
 		r = 0;

Re: [PATCH 0/3] acpi_piix4: Add CPU eject handling

2012-01-13 Thread Vasilis Liaskovitis

On Fri, Jan 13, 2012 at 12:58:10PM +0100, Jan Kiszka wrote:
 Please work against upstream (uq/master for kvm-related patches), not
 qemu-kvm. It possibly makes no technical difference here, but we do not
 want to let the code bases needlessly diverge again. If it does make a
 difference and upstream lacks further bits, push them first.

Apologies, I will from now on.

thanks,

- Vasilis


[PATCH] KVM, CPU hotplug: Avoid wraparound in pvclock_get_nsec_offset

2011-12-12 Thread Vasilis Liaskovitis

Hotplugging a vCPU with kvmclock enabled can cause a guest stall/hang. When
the stall happens, pvclock_clocksource_read() is called for the new vCPU and
pvclock_get_nsec_offset() calculates native_read_tsc() - shadow->tsc_timestamp.
shadow->tsc_timestamp contains a value larger than native_read_tsc(), so the
result is a very large 64-bit unsigned value. The global tsc variable
last_value gets updated with this, causing system stall/freeze:
rcu_sched_state detected stalls on CPUs/tasks ...

The large shadow->tsc_timestamp value observed in the hung cases is the tsc
written into the boot clock on VM startup.
Is the boot clock persistent in the guest? Can it get accessed by a vCPU
other than vCPU 0, if its own hv_clock struct has not yet been registered
or if the host has not yet updated the new hv_clock with a valid tsc_timestamp 
in kvm_guest_time_update() ?

Fix temporarily by returning a zero offset if the delta in
pvclock_get_nsec_offset() is negative.
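
For illustration only (this is not part of the patch): a minimal userspace
sketch of the wraparound described above and of the guard that avoids it.
struct shadow_time and the three function names are hypothetical stand-ins,
not the kernel's definitions, and the scaling step done by
pvclock_scale_delta() is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest's shadow copy of the pvclock data. */
struct shadow_time {
	uint64_t tsc_timestamp;	/* TSC value recorded by the host */
};

/* Unguarded form: wraps to a huge value when tsc < tsc_timestamp. */
static uint64_t tsc_delta_unguarded(uint64_t tsc, const struct shadow_time *s)
{
	return tsc - s->tsc_timestamp;
}

/* Guarded form, mirroring the patch: report zero instead of wrapping. */
static uint64_t tsc_delta_guarded(uint64_t tsc, const struct shadow_time *s)
{
	if (tsc > s->tsc_timestamp)
		return tsc - s->tsc_timestamp;
	return 0;
}

/* Small self-check of both behaviours. */
static void check_tsc_delta(void)
{
	struct shadow_time s = { .tsc_timestamp = 1000 };

	/* 900 - 1000 in unsigned 64-bit arithmetic is 2^64 - 100. */
	assert(tsc_delta_unguarded(900, &s) == UINT64_MAX - 99);
	assert(tsc_delta_guarded(900, &s) == 0);
	assert(tsc_delta_guarded(1500, &s) == 500);
}
```

Clamping to zero hides the symptom rather than explaining why the timestamp
is ahead of the TSC, which is why the fix is described as temporary.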

Tested on 3.0.6 guest kernel. Testing this patch requires qemu-kvm from: 
git://git.kiszka.org/qemu-kvm.git queues/cpu-hotplug

---
 arch/x86/kernel/pvclock.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..9d31144 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -43,9 +43,14 @@ void pvclock_set_flags(u8 flags)
 
 static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
 {
-   u64 delta = native_read_tsc() - shadow->tsc_timestamp;
-   return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
-  shadow->tsc_shift);
+u64 current_read_tsc = native_read_tsc();
+if (current_read_tsc > shadow->tsc_timestamp) {
+u64 delta = current_read_tsc - shadow->tsc_timestamp;
+return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
+shadow->tsc_shift);
+}
+/* tsc value can be smaller than tsc_timestamp on a vCPU hotplug */
+else return 0;
 }
 
 /*
-- 
1.7.7.3


Re: [PATCH] KVM, CPU hotplug: Avoid wraparound in pvclock_get_nsec_offset

2011-12-12 Thread Vasilis Liaskovitis

On Mon, Dec 12, 2011 at 02:53:29PM +0100, Jan Kiszka wrote:
>
> Can't comment on the semantics, but your patch is whitespace damaged and
> doesn't follow kernel coding style. But I assume it's not for
> application yet, right?

Right. It fixes the hang for me, but I am not sure it's the best solution. If
it is, I'll resend properly.
thanks,

- Vasilis

