date:20140729

Eric Blake ebl...@redhat.com writes:

 On 07/25/2014 02:12 AM, Markus Armbruster wrote:
 Stefan Hajnoczi stefa...@redhat.com writes:
 
 When QEMU is executed as part of a test case or from a script, it is
 usually desirable to exit if the parent process terminates.  This
 ensures that leaked QEMU processes do not continue consuming resources
 after their parent has died.

  ##
 -{ 'type': 'ChardevHostdev', 'data': { 'device' : 'str' } }
 +{ 'type': 'ChardevHostdev', 'data': { 'device' : 'str',
 +  '*exit-on-eof' : 'bool' } }
  
  ##
  # @ChardevSocket:
 
 Any use cases beyond libqtest?

 Libvirt probably won't use it for normal guests (we don't want to kill
 qemu just because the monitor disconnects), but does have the notion of
 an autodestroy guest where it might be useful (we WANT the guest to go
 away if libvirtd dies early).  In fact, autodestroy guests are used
 during migration - we want to kill qemu on the destination side if
 libvirtd dies before the source side finishes sending the migration
 stream.  But in that scenario, once migration succeeds, libvirt has to
 be able to convert an autodestroy guest back into a normal guest that no
 longer disappears when libvirtd does; would this be something that QMP
 can toggle the state of this attribute on the fly?  Perhaps through qom-set?

After migration completes, execution moves from source to target.
Wouldn't you want to switch off target auto-destruct together with that
move, atomically?

 If no, should this be x-exit-on-eof?
 
 Hmm, looks like there's no precedence for x- in QAPI.

 Ah, but we do.  For example, x-rdma-pin-all in MigrationCapability in
 qemu 1.7; which has later been removed.

 However, we also have precedence of actions in QAPI that are very
 unlikely to be used outside of qtest, but which are not marked
 experimental; for example, the 'Abort' action in 'transaction' will
 probably never be used by libvirt

Arguably not a conscious decision to make it ABI forever, more a case of
nobody thought about *not* making it ABI :)

   (but as I type that, the thought in
 the back of my mind is that I could possibly do some sort of feature
 probing by setting up a transaction that will abort, and distinguish
 between whether the transaction aborted or whether it errored out
 because the feature I'm probing for isn't present).

 I'm fine leaving this as plain 'exit-on-eof'.

I don't really mind this instance, but I'm a bit concerned about rank
ABI growth.

[Qemu-devel] [PATCH for-2.1] po: update Italian translation

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 po/it.po | 63 ---
 1 file changed, 40 insertions(+), 23 deletions(-)

diff --git a/po/it.po b/po/it.po
index a62665c..e46fb3a 100644
--- a/po/it.po
+++ b/po/it.po
@@ -1,13 +1,13 @@
 # Italian translation for QEMU.
 # This file is put in the public domain.
-# Paolo Bonzini pbonz...@redhat.com, 2012.
+# Paolo Bonzini pbonz...@redhat.com, 2012-2014.
 #
 msgid 
 msgstr 
 Project-Id-Version: QEMU 1.4.50\n
 Report-Msgid-Bugs-To: qemu-devel@nongnu.org\n
-POT-Creation-Date: 2013-07-05 22:36+0200\n
-PO-Revision-Date: 2012-02-27 08:23+0100\n
+POT-Creation-Date: 2014-07-29 08:14+0200\n
+PO-Revision-Date: 2014-07-29 08:25+0200\n
 Last-Translator: Paolo Bonzini pbonz...@redhat.com\n
 Language-Team: Italian i...@li.org\n
 Language: it\n
@@ -16,49 +16,66 @@ msgstr 
 Content-Transfer-Encoding: 8bit\n
 Plural-Forms: nplurals=2; plural=n != 1;\n
 
-#: ui/gtk.c:214
+#: ui/gtk.c:321
 msgid  - Press Ctrl+Alt+G to release grab
-msgstr 
+msgstr  - Premere Ctrl+Alt+G per rilasciare l'input
 
-#: ui/gtk.c:218
+#: ui/gtk.c:325
 msgid  [Paused]
-msgstr 
+msgstr  [Pausa]
 
-#: ui/gtk.c:1318
+#: ui/gtk.c:1601
 msgid _Pause
-msgstr 
+msgstr _Pausa
 
-#: ui/gtk.c:1324
+#: ui/gtk.c:1607
 msgid _Reset
-msgstr 
+msgstr _Reset
 
-#: ui/gtk.c:1327
+#: ui/gtk.c:1610
 msgid Power _Down
-msgstr 
+msgstr _Spegni
+
+#: ui/gtk.c:1616
+msgid _Quit
+msgstr _Esci
 
-#: ui/gtk.c:1381
+#: ui/gtk.c:1702
+msgid Zoom _In
+msgstr _Aumenta zoom
+
+#: ui/gtk.c:1709
+msgid Zoom _Out
+msgstr _Riduci zoom
+
+#: ui/gtk.c:1716
+msgid Best _Fit
+msgstr A_nnulla zoom
+
+#: ui/gtk.c:1723
 msgid Zoom To _Fit
 msgstr Adatta alla _finestra
 
-#: ui/gtk.c:1387
+#: ui/gtk.c:1729
 msgid Grab On _Hover
 msgstr Cattura _automatica input
 
-#: ui/gtk.c:1390
+#: ui/gtk.c:1732
 msgid _Grab Input
 msgstr _Cattura input
 
-#: ui/gtk.c:1416
+#: ui/gtk.c:1761
 msgid Show _Tabs
 msgstr Mostra _tab
 
-#: ui/gtk.c:1430
+#: ui/gtk.c:1764
+msgid Detach Tab
+msgstr _Sposta in una nuova finestra
+
+#: ui/gtk.c:1778
 msgid _Machine
-msgstr 
+msgstr _Macchina virtuale
 
-#: ui/gtk.c:1435
+#: ui/gtk.c:1783
 msgid _View
 msgstr _Visualizza
-
-#~ msgid _File
-#~ msgstr _File
-- 
1.9.3

Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1

Paolo Bonzini pbonz...@redhat.com writes:

 v3-v4:
   drop all pretense of supporting bridges [me]

Does this non-support need documentation?  None visible in diffstat...


 v2-v3:
   fix tests/acpi-test-data/pc/DSDT [Peter]
   track down make check failure, fix it [patch 4, me]
   split patch 2 in two parts [mst]
   do not make bsel_alloc global [mst]
   include Igor's bridge patch [mst, as discussed on IRC]

 Igor Mammedov (1):
   pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is
 disabled

 Paolo Bonzini (4):
   acpi-dsdt: procedurally generate _PRT
   pc: hack for migration compatibility from QEMU 2.0
   pc: future-proof migration-compatibility of ACPI tables
   bios-tables-test: fix ASL normalization false positive

  hw/i386/acpi-build.c|   94 +-
  hw/i386/acpi-dsdt.dsl   |   90 +-
  hw/i386/acpi-dsdt.hex.generated | 1910 
 +++
  hw/i386/pc_piix.c   |   19 +
  hw/i386/pc_q35.c|5 +
  include/hw/i386/pc.h|1 +
  tests/acpi-test-data/pc/DSDT|  Bin 4499 - 2807 bytes
  tests/bios-tables-test.c|6 +-
  8 files changed, 263 insertions(+), 1862 deletions(-)

Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1

Il 29/07/2014 08:16, Markus Armbruster ha scritto:
  v3-v4:
 drop all pretense of supporting bridges [me]
 Does this non-support need documentation?  None visible in diffstat...

Yes, it should.  It was already broken for 1.7-2.0.

Paolo

Re: [Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30

Eric Auger eric.au...@linaro.org writes:

 Dear all,

 For your information, a phone call will be held this week on Wed July
 30, 17h-18h CET to address the topic of dynamic instantiation of QEMU
 platform devices in machine files (using the -device qemu option).

 Related threads are:
 http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
 http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg01019.html

 The objective of the meeting is to discuss (and hopefully reach a
 consensus) on where we can put the code associated to platform device dt
 node generation and also qom binding stuff.

 In case you are interested in this discussion, please contact me and I
 will send you the phone call details.

Why not use the regular developers' conference call?  Next one is
scheduled for August 5th, but an extraordinary one could probably be
arranged earlier (tomorrow usual time perhaps?).

Re: [Qemu-devel] [PATCH 3/5] scsi-block: extract scsi_block_is_passthrough

On Mon, 07/28 17:08, Paolo Bonzini wrote:
 This will be used for both scsi_block_new_request and the scsi-block
 implementation of parse_cdb.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  hw/scsi/scsi-disk.c | 38 ++
  1 file changed, 26 insertions(+), 12 deletions(-)
 
 diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
 index d47ecd6..81b7276 100644
 --- a/hw/scsi/scsi-disk.c
 +++ b/hw/scsi/scsi-disk.c
 @@ -2501,12 +2501,8 @@ static int scsi_block_initfn(SCSIDevice *dev)
  return scsi_initfn(s-qdev);
  }
  
 -static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
 -   uint32_t lun, uint8_t *buf,
 -   void *hba_private)
 +static bool scsi_block_is_passthrough(SCSIDiskState *s, uint8_t *buf)
  {
 -SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
 -
  switch (buf[0]) {
  case READ_6:
  case READ_10:
 @@ -2523,9 +2519,9 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice 
 *d, uint32_t tag,
  case WRITE_VERIFY_12:
  case WRITE_VERIFY_16:
  /* If we are not using O_DIRECT, we might read stale data from the
 -  * host cache if writes were made using other commands than these
 -  * ones (such as WRITE SAME or EXTENDED COPY, etc.).  So, without
 -  * O_DIRECT everything must go through SG_IO.
 + * host cache if writes were made using other commands than these
 + * ones (such as WRITE SAME or EXTENDED COPY, etc.).  So, without
 + * O_DIRECT everything must go through SG_IO.
   */
  if (!(bdrv_get_flags(s-qdev.conf.bs)  BDRV_O_NOCACHE)) {
  break;
 @@ -2542,13 +2538,31 @@ static SCSIRequest *scsi_block_new_request(SCSIDevice 
 *d, uint32_t tag,
   * just make scsi-block operate the same as scsi-generic for them.
   */
  if (s-qdev.type != TYPE_ROM) {
 -return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun,
 -  hba_private);
 +return false;
  }
 +break;
 +
 +default:
 +break;

Out of curiosity, why add this default branch?

Fam

  }
  
 -return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun,
 -  hba_private);
 +return true;
 +}
 +
 +
 +static SCSIRequest *scsi_block_new_request(SCSIDevice *d, uint32_t tag,
 +   uint32_t lun, uint8_t *buf,
 +   void *hba_private)
 +{
 +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
 +
 +if (scsi_block_is_passthrough(s, buf)) {
 +return scsi_req_alloc(scsi_generic_req_ops, s-qdev, tag, lun,
 +  hba_private);
 +} else {
 +return scsi_req_alloc(scsi_disk_dma_reqops, s-qdev, tag, lun,
 +  hba_private);
 +}
  }
  #endif
  
 -- 
 1.9.3

Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-07-29 Thread Chen, Tiejun


On 2014/7/26 1:01, Konrad Rzeszutek Wilk wrote:

On Thu, Jul 24, 2014 at 09:44:41AM +0800, Chen, Tiejun wrote:

On 2014/7/24 4:54, Konrad Rzeszutek Wilk wrote:

On Sat, Jul 19, 2014 at 12:27:21AM +, Kay, Allen M wrote:

For the MCH PCI registers that do need to be read - can you tell us which ones 
those are?


In qemu/hw/xen_pt_igd.c/igd_pci_read(), following MCH PCI config register reads 
are passthrough to the host HW.   Some of the registers are needed by Ironlake 
GFX driver which we probably can remove.  I did a trace recently on Broadwell,  
the number of register accessed are even smaller (0, 2, 2c, 2e, 50, 52, a0, 
a4).  Given that we now have integrated MCH and GPU in the same socket, looks 
like driver can easily remove reads for offsets 0 - 0x2e.

case 0x00:/* vendor id */
case 0x02:/* device id */
case 0x08:/* revision id */
case 0x2c:/* sybsystem vendor id */
case 0x2e:/* sybsystem id */


Right. We can fix the i915 to use the mechanism that Michael mentioned.
(see attached RFC patches)


case 0x50:/* SNB: processor graphics control register */
case 0x52:/* processor graphics control register */
case 0xa0:/* top of memory */
case 0xb0:/* ILK: BSM: should read from dev 2 offset 
0x5c */
case 0x58:/* SNB: PAVPC Offset */
case 0xa4:/* SNB: graphics base of stolen memory */
case 0xa8:/* SNB: base of GTT stolen memory */


I dug in the intel-gtt.c (see ironlake_gtt_driver) to figure out this
a bit more. As in, I speculated, what if we returned 0 (and not implement
any support for reading from these registers). What would happen?

0x52 for Ironlake (g5):
--
It looks like intel_gmch_probe is called when i915_gem_gtt_init
starts (there is a lot of code that looks to be used between
intel-gtt.c and i915.c).

Anyhow the interesting parts are that i9xx_setup ends up calling
ioremap the i915 BAR to setup some of these registers for older generations.

Then i965_gtt_total_entries gets which reads 0x52, but it is only
needed for v5 generation. For other (v4 and G33) it reads it from the GPU's
0x2020  register.

If there is a mismatch, it writes to the GPU at 0x2020 to update the
the size based on the bridge. And then it reads from 0x2020 and that
is returned and stuck in  intel_private.gtt_total_entries.

So 0x52 in the emulated bridge could be populated with what the
GPU has at 0x2020. And the writes go in the emulated area.

0x52 for v6 - v8:
-
We seem to go to gen6_gmch_probe which just figures out the
the GTT based on the GPU's BAR sizes. The stolen values
are read from 0x50 from the GPU. So no need to touch the bridge
(see gen6_gmch_probe)

OK, so no need to have 0x52 or 0x50 in the bridge.

0xA0:
-
Could not find any reference in the Linux code. Why would
Windows driver need this? If we returned the _real_ TOM would
it matter? Is it used to figure out the device should use 32-bit
DMA operations or 40-bit?

0xb0 or 0x5c:
-
No mention of them in the Linux code.

0x58, 0xa4, 0xa8:
-
No usage of them in the Linux code. We seem to be using the 0x52

from the bridge and 0x2020 from the GPU for this functionality.



So in theory*, if using Ironlake we need to have a proper value
in 0x52 in the bridge. But for the later chipsets we do not need
these values (I am assuming that intel_setup_mchbar can safely
return as it does that for Ironlake and could very well for later
generations).

Can this be reflected in the Windows driver as well?

P.S.
*theory: That is assuming we modify the Linux i915_drv.c:intel_detect_pch
to pick up the id as suggested earlier. See the RFC patches attached.
(Not compile tested at all!)


I take a look these patches, looks we still scan all PCI Bridge to walk all
PCHs. So this means we still need to fake a PCI bridge, right? Or maybe you
don't cover this problem this time.


I totally forgot. Feel free to fix that.




Sorry for this delay response.


I prefer we should check dev slot to get that PCH like my previous patch,


Uh? Your patch was checking for 0:1f.0, not the slot of the device.


Yes.



(see https://lkml.org/lkml/2014/6/19/121)

gpu:drm:i915:intel_detect_pch: back to check devfn instead of check class
type. Because Windows always use this way, so I think this point should be
same between Linux and Windows.


Didn't we discuss that we did not want to pass in PCH at all if we can?


I'm a bit confused since I guess I'm missing something important in this 
long discussion. I guess we just fake a simple PCIe device but just has 
PCI_VENDOR_ID_XEN/Real_PCH_Device_ID, and then well probe such a PCIe 
device inside intel_detect_pch(), right?


And this means we will not support those existing drivers?



And from the

Re: [Qemu-devel] [PATCH for-2.2 v2 0/5] scsi: enable passthrough of vendor-specific commands

On Mon, 07/28 17:08, Paolo Bonzini wrote:
 Right now scsi-generic is parsing the CDB, in order to compute
 the expected number of bytes to be transferred.  This is necessary
 if DMA is done by the HBA via scsi_req_data, but it prevents executing
 vendor-specific commands via scsi-generic because we don't know how
 to parse them.
 
 If DMA is delegated to the SCSI layer via get_sg_list, we know in
 advance how many bytes the guest will want to receive and we can pass
 the information straight from the guest to SG_IO.  In this case, it is
 unnecessary to parse the CDB to get the same information.  scsi-disk needs
 it to detect underruns and overruns, but scsi-generic and scsi-block can
 just ask the HBA about the transfer direction and size.
 
 This series introduces a new parse_cdb callback in both the device and
 the HBA.  The latter is called by scsi_bus_parse_cdb, which devices can
 call for passthrough requests in their implementation of parse_cdb.
 
 Paolo
 
 v1-v2: use the right CDB size for non-vendor-specific commands,
 as some drivers and/or firmware expect that and complain
 if you pass a READ(10) command in a 16-byte CDB.  Interdiff
 here.

More learning than reviewing for me, so take this with a grain of salt:

Reviewed-by: Fam Zheng f...@redhat.com

[Qemu-devel] [RESEND PATCH V3] qemu-img info: show nocow info

2014-07-29 Thread Chunyan Liu

Add nocow info in 'qemu-img info' output to show whether the file
currently has NOCOW flag set or not.

Signed-off-by: Chunyan Liu cy...@suse.com
Acked-by: Eric Blake ebl...@redhat.com
---
Resend for QEMU 2.2. Change json version comment.

 block/qapi.c | 25 +
 qapi/block-core.json |  5 -
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/block/qapi.c b/block/qapi.c
index f44f6b4..aa53f19 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -28,6 +28,13 @@
 #include qapi-visit.h
 #include qapi/qmp-output-visitor.h
 #include qapi/qmp/types.h
+#ifdef __linux__
+#include linux/fs.h
+#include sys/ioctl.h
+#ifndef FS_NOCOW_FL
+#define FS_NOCOW_FL 0x0080 /* Do not cow file */
+#endif
+#endif
 
 BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs)
 {
@@ -173,6 +180,20 @@ void bdrv_query_image_info(BlockDriverState *bs,
 Error *err = NULL;
 ImageInfo *info = g_new0(ImageInfo, 1);
 
+#ifdef __linux__
+int fd, attr;
+
+/* get NOCOW info */
+fd = qemu_open(bs-filename, O_RDONLY | O_NONBLOCK);
+if (fd = 0) {
+if (ioctl(fd, FS_IOC_GETFLAGS, attr) == 0  (attr  FS_NOCOW_FL)) {
+info-has_nocow = true;
+info-nocow = true;
+}
+qemu_close(fd);
+}
+#endif
+
 bdrv_get_geometry(bs, total_sectors);
 
 info-filename= g_strdup(bs-filename);
@@ -625,4 +646,8 @@ void bdrv_image_info_dump(fprintf_function func_fprintf, 
void *f,
 func_fprintf(f, Format specific information:\n);
 bdrv_image_info_specific_dump(func_fprintf, f, info-format_specific);
 }
+
+if (info-has_nocow  info-nocow) {
+func_fprintf(f, NOCOW flag: set\n);
+}
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index e378653..72b4015 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -115,6 +115,8 @@
 # @format-specific: #optional structure supplying additional format-specific
 # information (since 1.7)
 #
+# @nocow: #optional info of whether NOCOW flag is set or not. (since 2.2)
+#
 # Since: 1.3
 #
 ##
@@ -126,7 +128,8 @@
'*backing-filename': 'str', '*full-backing-filename': 'str',
'*backing-filename-format': 'str', '*snapshots': ['SnapshotInfo'],
'*backing-image': 'ImageInfo',
-   '*format-specific': 'ImageInfoSpecific' } }
+   '*format-specific': 'ImageInfoSpecific',
+   '*nocow': 'bool' } }
 
 ##
 # @ImageCheck:
-- 
1.8.4.5

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

Serge,

 I don't think that is in any way a problem.  Is migrating to older
 versions ever actually expected to work?  In either case I don't
 think for this particular case it's a problem.

Good; no; and good - respectively.

 (The how to handle this in libvirt question is more interesting)

I've been giving this some thought. However rococo we make this,
I think the caller is often going to need to modify the command
line anyway, i.e. is going to need to be aware of the migration
problem.

For instance, taking Ubuntu as an example, 12.04 shipped with
qemu-kvm-1.0, and a pxe-virtio.rom of just under 64k, giving
a 64k ROM slot. 14.04 ships with qemu-2.0 and a pxe-virtio.rom
of over 80k, giving a 128k ROM slot. So however we fix the
machine types, when migrating in a 12.04 initiated VM, qemu
will need
 -global virtio-net-pci.romfile=/path/to/pxe-virtio.rom.12.04
on the command line (or, if you don't much care about PXE
working on a soft reboot, a blank file of the same size),
whereas when migrating in a 14.04 initiated VM, that must
not be on the command line.

Fixing this properly would entail requiring that the ROMs are
(effectively) distributed with qemu or at least that all
ROM sizes become part of the machine type standard. This
would have the advantage that loading a larger ROM than
the machine type specifies would fail unless the ROM
size was explicitly configured on the command line. But
this is a subject wider than this patch.

So the long and the short of it is that libvirt (sadly) like
anything else starting qemu machines needs to know a bit about
different versions of qemu, and be able to replace a machine
type option with a machine type option and more on the
command line.

My previous suggestion doesn't help much because qemu will
still need to be passed something on the command line.

I think the best way to go with this patch would be something
like:

* Add pc-1.0-qemu-kvm as a machine type (done)

* Rename pc-1.0 to pc-1.0-qemu-git

* Add an alias for pc-1.0 to either pc-1.0-qemu-git or
  pc-1.0-qemu-kvm, configurable at build time with
  a ./configure option. The distro can then set this
  appropriately. This would default to pc-1.0-qemu-git
  (i.e. the current behaviour).

On distros that only every used one of the above,
./configure will sort things out, +/- self-inflicted
injuries relating to ROM size changes. If life is
more complicated, libvirt (and other callers) will
need to be aware. pc-1.0-qemu-git and pc-1.0-qemu-kvm
can be used to unambiguously mean the relevant machine
type, which will fix things going forward for that
machine type.

WDYT?

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH 3/5] scsi-block: extract scsi_block_is_passthrough

Il 29/07/2014 08:28, Fam Zheng ha scritto:
  +
  +default:
  +break;
 Out of curiosity, why add this default branch?

No particular reason, I guess I've changed my style since I wrote this code!

Paolo

Re: [Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30

2014-07-29 Thread Eric Auger

Hi Markus,

Thank you for the proposal. Actually quite a lot of people are now OK
with this time slot. This phone call was organized in non std way since
Alex was not available next week. If you don't mind I would now prefer
to keep this slot. Do you wish to join by the way?

Best Regards

Eric

On 07/29/2014 08:24 AM, Markus Armbruster wrote:
 Eric Auger eric.au...@linaro.org writes:
 
 Dear all,

 For your information, a phone call will be held this week on Wed July
 30, 17h-18h CET to address the topic of dynamic instantiation of QEMU
 platform devices in machine files (using the -device qemu option).

 Related threads are:
 http://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00047.html
 http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg01019.html

 The objective of the meeting is to discuss (and hopefully reach a
 consensus) on where we can put the code associated to platform device dt
 node generation and also qom binding stuff.

 In case you are interested in this discussion, please contact me and I
 will send you the phone call details.
 
 Why not use the regular developers' conference call?  Next one is
 scheduled for August 5th, but an extraordinary one could probably be
 arranged earlier (tomorrow usual time perhaps?).

Re: [Qemu-devel] [PATCH 1/3] loader: add support for resizeable blobs

2014-07-29 Thread Dr. David Alan Gilbert

* Michael S. Tsirkin (m...@redhat.com) wrote:
 On Mon, Jul 28, 2014 at 05:52:27PM +0100, Dr. David Alan Gilbert wrote:
  * Michael S. Tsirkin (m...@redhat.com) wrote:
   Support resizeable blobs: we allocate more memory than currently
   available in the blob, which can later be filled in.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
include/hw/loader.h   | 14 +++--
include/hw/nvram/fw_cfg.h |  2 +-
hw/core/loader.c  | 15 +
hw/nvram/fw_cfg.c | 79 
   ---
4 files changed, 96 insertions(+), 14 deletions(-)
   
  
  
   +static bool fw_cfg_len_needed(void *opaque, int version_id)
   +{
   +FWCfgState *s = FW_CFG(opaque);
   +int i, j;
   +
   +for (i = 0; i  ARRAY_SIZE(s-entries); ++i) {
   +for (j = 0; j  ARRAY_SIZE(s-entries[0]); ++j) {
   +if (s-entries[i][j].len != s-entries[i][j].max_len) {
   +return true;
   +}
   +}
   +}
   +
   +return false;
   +}
  
  This feels a bit delicate; it means that increasing the 'max_len' changes
  the expected migration stream, which seems odd for a parameter whose
  job is to make things easier to migrate.
  
  Dave
 
 It's an API issue - so you are more comfortable with
 an explicit flag?

Yes, if it was something that was just switched on with new machine types
I wouldn't worry as much.

Dave

 
   +
static const VMStateDescription vmstate_fw_cfg = {
.name = fw_cfg,
.version_id = 2,
.minimum_version_id = 1,
   +.post_load = fw_cfg_post_load,
   +.pre_save = fw_cfg_pre_save,
.fields = (VMStateField[]) {
VMSTATE_UINT16(cur_entry, FWCfgState),
VMSTATE_UINT16_HACK(cur_offset, FWCfgState, is_version_1),
VMSTATE_UINT32_V(cur_offset, FWCfgState, 2),
   +VMSTATE_ARRAY_TEST(len, FWCfgState, FW_CFG_LEN_ENTRIES,
   +   fw_cfg_len_needed,
   +   vmstate_info_uint32, uint32_t),
VMSTATE_END_OF_LIST()
}
};
   @@ -388,23 +449,28 @@ static const VMStateDescription vmstate_fw_cfg = {
static void fw_cfg_add_bytes_read_callback(FWCfgState *s, uint16_t key,
   FWCfgReadCallback callback,
   void *callback_opaque,
   -   void *data, size_t len)
   +   void *data, size_t len,
   +   size_t max_len)
{
int arch = !!(key  FW_CFG_ARCH_LOCAL);

   +assert(len = max_len);
   +
key = FW_CFG_ENTRY_MASK;

assert(key  FW_CFG_MAX_ENTRY  len  UINT32_MAX);

s-entries[arch][key].data = data;
s-entries[arch][key].len = (uint32_t)len;
   +s-entries[arch][key].reset_len = (uint32_t)len;
   +s-entries[arch][key].max_len = (uint32_t)max_len;
s-entries[arch][key].read_callback = callback;
s-entries[arch][key].callback_opaque = callback_opaque;
}

void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t 
   len)
{
   -fw_cfg_add_bytes_read_callback(s, key, NULL, NULL, data, len);
   +fw_cfg_add_bytes_read_callback(s, key, NULL, NULL, data, len, len);
}

void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
   @@ -454,13 +520,15 @@ void fw_cfg_add_callback(FWCfgState *s, uint16_t 
   key, FWCfgCallback callback,

s-entries[arch][key].data = data;
s-entries[arch][key].len = (uint32_t)len;
   +s-entries[arch][key].reset_len = (uint32_t)len;
   +s-entries[arch][key].max_len = (uint32_t)len;
s-entries[arch][key].callback_opaque = callback_opaque;
s-entries[arch][key].callback = callback;
}

void fw_cfg_add_file_callback(FWCfgState *s,  const char *filename,
  FWCfgReadCallback callback, void 
   *callback_opaque,
   -  void *data, size_t len)
   +  void *data, size_t len, size_t max_len)
{
int i, index;
size_t dsize;
   @@ -475,7 +543,8 @@ void fw_cfg_add_file_callback(FWCfgState *s,  const 
   char *filename,
assert(index  FW_CFG_FILE_SLOTS);

fw_cfg_add_bytes_read_callback(s, FW_CFG_FILE_FIRST + index,
   -   callback, callback_opaque, data, len);
   +   callback, callback_opaque, data, len,
   +   max_len);

pstrcpy(s-files-f[index].name, sizeof(s-files-f[index].name),
filename);
   @@ -496,7 +565,7 @@ void fw_cfg_add_file_callback(FWCfgState *s,  const 
   char *filename,
void fw_cfg_add_file(FWCfgState *s,  const char *filename,
 void *data, size_t len)
{
   -

[Qemu-devel] [Bug 1349722] [NEW] qemu-io: Exit code is always zero

2014-07-29 Thread Maria Kustova

Public bug reported:

The qemu-io always returns zero on exit independently on errors occurred
during the command execution.

Example,

$ qemu-io -c 'write 128 234' /tmp/run1/test-1/test.img

offset 128 is not sector aligned

$ echo $?
0


qemu.git HEAD: 41a1a9c42c4e

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1349722

Title:
  qemu-io: Exit code is always zero

Status in QEMU:
  New

Bug description:
  The qemu-io always returns zero on exit independently on errors
  occurred during the command execution.

  Example,

  $ qemu-io -c 'write 128 234' /tmp/run1/test-1/test.img

  offset 128 is not sector aligned

  $ echo $?
  0

  
  qemu.git HEAD: 41a1a9c42c4e

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1349722/+subscriptions

Re: [Qemu-devel] [PATCH] piix: set legacy table size for 1.7

Il 28/07/2014 23:01, Michael S. Tsirkin ha scritto:
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Paolo, so the following is needed on top of your patch?
 
  hw/i386/pc_piix.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
 index 4524e6b..9694f88 100644
 --- a/hw/i386/pc_piix.c
 +++ b/hw/i386/pc_piix.c
 @@ -326,6 +326,7 @@ static void pc_compat_1_7(MachineState *machine)
  smbios_defaults = false;
  gigabyte_align = false;
  option_rom_has_mr = true;
 +legacy_acpi_table_size = 6414;
  x86_cpu_compat_disable_kvm_features(FEAT_1_ECX, CPUID_EXT_X2APIC);
  }

Only if you want to support pc-i440fx-1.7 migration for 1.7-2.1 (and
not for 2.0-2.1).  Probably better considering we're applying Igor's patch.

Paolo

Re: [Qemu-devel] [PATCH] acpi-build: tweak acpi migration limits

Il 28/07/2014 23:15, Michael S. Tsirkin ha scritto:
 - Tweak error message for legacy machine type:
   Basically if table size exceeds the limits we set all
   bets are off for migration: e.g. it can start failing even
   within given qemu minor version simply because of a bugfix.
 - Increase table size to 128k.
 - Make sure we notice it long before we start getting close to the
   128k limit: warn at 64k.
 - Don't fail if we exceed the limit: most people don't care about
   migration, even less people care about cross version miration.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/i386/acpi-build.c | 16 +---
  1 file changed, 9 insertions(+), 7 deletions(-)
 
 diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
 index 2178894..816c6d9 100644
 --- a/hw/i386/acpi-build.c
 +++ b/hw/i386/acpi-build.c
 @@ -62,7 +62,7 @@
  #define ACPI_BUILD_LEGACY_CPU_AML_SIZE97
  #define ACPI_BUILD_ALIGN_SIZE 0x1000
  
 -#define ACPI_BUILD_TABLE_SIZE 0x1
 +#define ACPI_BUILD_TABLE_SIZE 0x2
  
  typedef struct AcpiCpuInfo {
  DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT);
 @@ -1586,17 +1586,19 @@ void acpi_build(PcGuestInfo *guest_info, 
 AcpiBuildTables *tables)
   ACPI_BUILD_ALIGN_SIZE);
  if (tables-table_data-len  legacy_table_size) {
  /* Should happen only with PCI bridges and -M pc-i440fx-2.0.  */
 -error_report(Warning: migration to QEMU 2.0 may not work.);
 +error_report(Warning: migration may not work.);
  }
  g_array_set_size(tables-table_data, legacy_table_size);
  } else {
 -if (tables-table_data-len  ACPI_BUILD_TABLE_SIZE) {
 +/* Make sure we have a buffer in case we need to resize the tables. 
 */
 +if (tables-table_data-len  ACPI_BUILD_TABLE_SIZE / 2) {
  /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory 
 slots.  */
 -error_report(ACPI tables are larger than 64k.  Please remove);
 -error_report(CPUs, NUMA nodes, memory slots or PCI bridges.);
 -exit(1);
 +error_report(Warning: ACPI tables are larger than 64k.);
 +error_report(Warning: migration may not work.);
 +error_report(Warning: please remove CPUs, NUMA nodes, 
 + memory slots or PCI bridges.);
  }
 -g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE);
 +acpi_align_size(tables-table_data, ACPI_BUILD_TABLE_SIZE);
  }
  
  acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE);
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

Il 29/07/2014 08:59, Chen, Tiejun ha scritto:

 (see https://lkml.org/lkml/2014/6/19/121)
 gpu:drm:i915:intel_detect_pch: back to check devfn instead of check
 class
 type. Because Windows always use this way, so I think this point
 should be
 same between Linux and Windows.

 Didn't we discuss that we did not want to pass in PCH at all if we can?
 
 I'm a bit confused since I guess I'm missing something important in this
 long discussion. I guess we just fake a simple PCIe device but just has
 PCI_VENDOR_ID_XEN/Real_PCH_Device_ID, and then well probe such a PCIe
 device inside intel_detect_pch(), right?

Yes, for the special PIIX4 legacy machine type we want to do that and
place the device at 00:1f.0.

Paolo

Re: [Qemu-devel] [PATCH for-2.2 1/2] rename parse_enum_option to qapi_enum_parse and make it public

Il 28/07/2014 22:06, Peter Lieven ha scritto:
  ...but you have relaxed the license to LGPLv2+ in your code motion.
  Then again, Peter is the original author of this code in commit
  82a402e9, so you have the legal right to relax things.
 Actually, I was not aware. I just took Hu Taos original commit. As we want
 to make a collection of utilities I would go as far as put a BSD license here.
 
 I will send a v2 with your comments regarding Patch 2. And do that license
 change as well.

LGPL v2.1+ is the standard license for qapi/, so I think it's okay to
use either.

Paolo

Re: [Qemu-devel] [PATCH] target-mips: Ignore unassigned accesses with KVM

Il 28/07/2014 23:36, Aurelien Jarno ha scritto:
 On Mon, Jul 28, 2014 at 12:37:50PM +0100, James Hogan wrote:
 MIPS registers an unassigned access handler which raises a guest bus
 error exception. However this causes QEMU to crash when KVM is enabled
 as it isn't called from the main execution loop so longjmp() gets called
 without a corresponding setjmp().

 Until the KVM API can be updated to trigger a guest exception in
 response to an MMIO exit, prevent the bus error exception being raised
 from mips_cpu_unassigned_access() if KVM is enabled.

 The check is at run time since the do_unassigned_access callback is
 initialised before it is known whether KVM will be enabled.

 The problem can be triggered with Malta emulation by making the guest
 write to the reset region at physical address 0x1bf0, since it is
 marked read-only which is treated as unassigned for writes.

 Signed-off-by: James Hogan james.ho...@imgtec.com
 Cc: Aurelien Jarno aurel...@aurel32.net
 Cc: Peter Maydell peter.mayd...@linaro.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Gleb Natapov g...@redhat.com
 Cc: Christoffer Dall christoffer.d...@linaro.org
 Cc: Sanjay Lal sanj...@kymasys.com
 ---
  target-mips/op_helper.c | 11 +++
  1 file changed, 11 insertions(+)

 diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
 index 27651a4a00c1..df97b35f8701 100644
 --- a/target-mips/op_helper.c
 +++ b/target-mips/op_helper.c
 @@ -21,6 +21,7 @@
  #include qemu/host-utils.h
  #include exec/helper-proto.h
  #include exec/cpu_ldst.h
 +#include sysemu/kvm.h
  
  #ifndef CONFIG_USER_ONLY
  static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global);
 @@ -2168,6 +2169,16 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr 
 addr,
  MIPSCPU *cpu = MIPS_CPU(cs);
  CPUMIPSState *env = cpu-env;
  
 +/*
 + * Raising an exception with KVM enabled will crash because it won't be 
 from
 + * the main execution loop so the longjmp won't have a matching setjmp.
 + * Until we can trigger a bus error exception through KVM lets just 
 ignore
 + * the access.
 + */
 +if (kvm_enabled()) {
 +return;
 +}
 +
  if (is_exec) {
  helper_raise_exception(env, EXCP_IBE);
  } else {
 
 Reviewed-by: Aurelien Jarno aurel...@aurel32.net
 
 Note that even if the test is added for each exception, it is light
 enough compared to triggering and handling an exception so that it has
 no impact on performance.
 
 Paolo, do you want to take this patch in your kvm tree?

Sure, I'll include it for 2.2.

Paolo

Re: [Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30

Eric Auger eric.au...@linaro.org writes:

 Hi Markus,

 Thank you for the proposal. Actually quite a lot of people are now OK
 with this time slot. This phone call was organized in non std way since
 Alex was not available next week. If you don't mind I would now prefer
 to keep this slot.

Let's keep it.

Don't be shy using the ordinary call in the future, even if that means
asking for one outside the biweekly schedule.

Do you wish to join by the way?

I'd love to, but family obligations will make it difficult.

You said 17h-18h CET.  Do you mean CEST?  Easiest way to avoid the
confusion is UTC.

Re: [Qemu-devel] [Intel-gfx] ResettRe: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support

2014-07-29 Thread Chen, Tiejun


On 2014/7/29 16:32, Paolo Bonzini wrote:

Il 29/07/2014 08:59, Chen, Tiejun ha scritto:


(see https://lkml.org/lkml/2014/6/19/121)

gpu:drm:i915:intel_detect_pch: back to check devfn instead of check
class
type. Because Windows always use this way, so I think this point
should be
same between Linux and Windows.


Didn't we discuss that we did not want to pass in PCH at all if we can?


I'm a bit confused since I guess I'm missing something important in this
long discussion. I guess we just fake a simple PCIe device but just has
PCI_VENDOR_ID_XEN/Real_PCH_Device_ID, and then well probe such a PCIe
device inside intel_detect_pch(), right?


Yes, for the special PIIX4 legacy machine type we want to do that and
place the device at 00:1f.0.



Got it.

BTW, please review those patches implemented a separate machine, xenigd, 
firstly. Then I can step on this point.


Thanks
Tiejun

Re: [Qemu-devel] [PATCH v2 1/7] bootindex: add modify_boot_device_path function

2014-07-29 Thread Gonglei (Arei)

Hi, Gerd

   I think it is ok to allow only *changing* the bootindex.
  
  Yes, that's no problem.

 But then yoy always  will have a old entry where you can take the 
 suffix
 from, and you don't need the suffix as parameter for the monitor
 command.

No, optional.
  
Because the bootindex property is not a necessary property for devices.
If a device, such as IDE cdrom haven't attach the bootindex in qemu
 booting
command line, the global fw_boot_order will not have its entry.
  
   I'd suggest to simply not support this and throw an error then.
  
  Ok.
 
So, we should
save the suffix as parameter.
  
   I think it is a bad idea to have the suffix as monitor command
   parameter.  It is a implementation detail which the monitor users should
   not have to worry about.
  
  Yes. Actually I also have this misgivings.
 
   Easiest way to do this is to allow *changing* an existing bootindex
   entry only, and not support *adding* new boot entries.  The user has to
   set a bootindex for any device it might want to boot from in the future
   then.  I think this is acceptable.
  
  Hmm..
 
   If you want support adding bootentries at runtime (and it probably makes
   sense to support removing entries too in that case) the suffix handling
   should be reworked.  The suffix could be stored in DeviceState instead
   of being passed to the add_bootentry function, so you can add boot
   entries later on without expecting the user to know what the correct
   suffix is.
  
  That's a good idea! I think this is a good improvement.
 
  Any other comments? Thanks!
 
 Maybe we should save the suffix as parameter in add_boot_device_path()
 
 There are two reasons:
 
 1. the floppy device may have two bootindex, which configure as below:
   -global isa-fdc.driveA=drive-fdc0-0-0, bootindexA=xxx \
   -global isa-fdc.driveB=drive-fdc0-0-1 bootindexB=xxx \
  We can see it in isabus_fdc_realize() [hw/block/fdc.c]
 
 2. The option rom don't need pass dev to add_bootentry,
   which realized in rom_add_file() [hw/core/loader.c]
 
 in those situation, we have to pass suffix to add_bootentry function.
 
In addition, because an isa-fdc device can be configured two drive,
we have to pass the suffix for changing floppy's bootindex. Both del_bootentry
and add_bootentry need suffix to confirm the unique device.

Because the suffix of bootindex is invisible for common user. Maybe we 
can introduce a query interface for user. Such as hmp monitor command
info bootindex, which show the information of bootindex have configured,
includeing suffix. But the suffix is optional, maybe just useful for floppy 
disk,
that will be not a serious or trouble problem IMHO. 

So, I want to do those in the next version:
- save the suffix as optional continue.
- allow changing the existing bootindex entry only, 
 not support adding new boot entries.
- rework modify_boot_device_path(), add checking logic for the suffix
 when remove old bootindex entry from global boot order list. 

I'm looking forward to your reply, thanks!

Best regards,
-Gonglei

Re: [Qemu-devel] Dynamic QEMU platform device instantiation in machine files: phone call on Wed July 30

2014-07-29 Thread Eric Auger

On 07/29/2014 11:07 AM, Markus Armbruster wrote:
 Eric Auger eric.au...@linaro.org writes:
 
 Hi Markus,

 Thank you for the proposal. Actually quite a lot of people are now OK
 with this time slot. This phone call was organized in non std way since
 Alex was not available next week. If you don't mind I would now prefer
 to keep this slot.
 
 Let's keep it.
 
 Don't be shy using the ordinary call in the future, even if that means
 asking for one outside the biweekly schedule.
OK sure I will ;-)
 
Do you wish to join by the way?
 
 I'd love to, but family obligations will make it difficult.
 
 You said 17h-18h CET.  Do you mean CEST?  Easiest way to avoid the
 confusion is UTC.

Yes CEST = UTC + 2. Thanks for pointing out the need for clarification.

Regards

Eric

[Qemu-devel] [PATCH] acpi-build: minor code cleanup

Fix up and add  comments to clarify code, plus a trivial
code change for clarity.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/acpi-build.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 779160f..ec86f1b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -825,9 +825,9 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 bool bus_hotplug_support = false;
 
 /*
-skip bridge subtree creation if bridge hotplug is disabled
-to make it compatible with 1.7 machine type
-*/
+ * Skip bridge subtree creation if bridge hotplug is disabled
+ * to make acpi tables compatible with legacy machine types.
+ */
 if (!child-pcihp_bridge_en  bus-parent_dev) {
 return;
 }
@@ -869,6 +869,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 PCIDeviceClass *pc;
 PCIDevice *pdev = bus-devices[i];
 int slot = PCI_SLOT(i);
+bool bridge_in_acpi;
 
 if (!pdev) {
 continue;
@@ -878,8 +879,13 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 pc = PCI_DEVICE_GET_CLASS(pdev);
 dc = DEVICE_GET_CLASS(pdev);
 
-if (pc-class_id == PCI_CLASS_BRIDGE_ISA ||
-(pc-is_bridge  child-pcihp_bridge_en)) {
+/* When hotplug for bridges is enabled, bridges are
+ * described in ACPI separately (see build_pci_bus_end).
+ * In this case they aren't themselves hot-pluggable.
+ */
+bridge_in_acpi = pc-is_bridge  child-pcihp_bridge_en;
+
+if (pc-class_id == PCI_CLASS_BRIDGE_ISA || bridge_in_acpi) {
 set_bit(slot, slot_device_system);
 }
 
@@ -891,7 +897,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 }
 }
 
-if (!dc-hotpluggable || (pc-is_bridge  child-pcihp_bridge_en)) {
+if (!dc-hotpluggable || bridge_in_acpi) {
 clear_bit(slot, slot_hotplug_enable);
 }
 }
-- 
MST

Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1

On Tue, Jul 29, 2014 at 12:25:56PM +0200, Laszlo Ersek wrote:
 On 07/28/14 23:27, Michael S. Tsirkin wrote:
  On Mon, Jul 28, 2014 at 05:34:13PM +0200, Paolo Bonzini wrote:
  v3-v4:
 drop all pretense of supporting bridges [me]
 
  v2-v3:
 fix tests/acpi-test-data/pc/DSDT [Peter]
 track down make check failure, fix it [patch 4, me]
 split patch 2 in two parts [mst]
 do not make bsel_alloc global [mst]
 include Igor's bridge patch [mst, as discussed on IRC]
  
  OK, I applied this, and did some tweaks on top that I think
  make it a bit safer.
  It's very very late in the release cycle, but also very late in the
  day so I don't want to risk sending pull request now.
  I did push it out: tag for_upstream in my tree
  Will send tomorrow: Paolo, Laszlo, Gerd, could you please take
  a look and ack?
 
 Any particular reason for reordering the patches from Paolo's v4?
 
 In that series, the order is:
 
 1  acpi-dsdt: procedurally generate _PRT
 2  pc: hack for migration compatibility from QEMU 2.0
 3  pc: future-proof migration-compatibility of ACPI tables
 4  bios-tables-test: fix ASL normalization false positive
 5  pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug
is disabled
 
 In yours,
 
 1  acpi-dsdt: procedurally generate _PRT
 2  pc: hack for migration compatibility from QEMU 2.0
 3  bios-tables-test: fix ASL normalization false positive
 4  pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug
is disabled
 5  acpi-build: minor code cleanup
 6  pc: future-proof migration-compatibility of ACPI tables

It's because of this patch. It was a bit controversial,
so I deferred applying it for a while.
I implemented an alternative solution just to see how
it would look like.
That patch turned out to be much bigger, so I agreed we
should go ahead with Paolo's one for 2.1 even if it's not pretty,
and will create more work for 2.2.

 7  acpi-build: tweak acpi migration limits
 8  piix: set legacy table size for 1.7
 
 1 - 1
 2 - 2
 3 - 6
 4 - 3
 5 - 4
 
 Patches 1  2 are identical between the two sets, and their order is the
 same.
 
 You cut out patch #3, moved up patches #4 and #5, added a new patch
 (acpi-build: minor code cleanup), and then reinserted #3.
 
 .. Patches taken from Paolo's v4 seem to be identical.
 
 For patch acpi-build: minor code cleanup:
 - two typos in the commit message (double space, for clarify)
 - seems OK otherwise
 
 For patch acpi-build: tweak acpi migration limits:
 
 len   pre-patch post-patch
   message   actionmessage  action
       ---  
 [  0, 64 KB]  none  set to 64 KB  none set to 128KB
 ( 64 KB, 128 KB]  error exit  warning  set to 128KB
 (128 KB, inf) error exit  warning  round up to multiple
  of 128 KB
 
 I don't object.
 
 For patch piix: set legacy table size for 1.7: didn't Igor say
 something that such a migration wouldn't work anyway? I could be
 remembering wrong.
 
 Thanks
 Laszlo

I don't recall this, but if there are more bug we could just
fix them too.

-- 
MST

Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1

Il 29/07/2014 12:31, Michael S. Tsirkin ha scritto:
  For patch piix: set legacy table size for 1.7: didn't Igor say
  something that such a migration wouldn't work anyway? I could be
  remembering wrong.
 
 I don't recall this, but if there are more bug we could just
 fix them too.

You have to choose between a -M pc-i440fx-1.7 that migrates from 1.7
to 2.1, and one that migrates from 2.0 to 2.1.

- to make 1.7-2.1 work: you need both Igor's patch (generate AML
only...) and piix: set legacy table size for 1.7.  All configurations
will work, including those with PCI bridges.

- to make 2.0-2.1 work (with -M pc-i440fx-1.7): you need to omit
piix: set legacy table size for 1.7, and configurations with PCI
bridges remain broken.  Igor's patch is not needed, because it only
affects configurations with PCI bridges.

mst prefers the first one, and he changed my view on that too.  And he
noticed that with piix: set legacy table size for 1.7 more things just
work, so it's better to include it.

Paolo

[Qemu-devel] [PULL 0/8] pc migration fixes

The following changes since commit c60a57ff497667780132a3fcdc1500c83af5d5c0:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-07-25 16:58:41 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to f47337cb9181a1f9339b9b0516b92fe77a22f0f3:

  piix: set legacy table size for 1.7 (2014-07-29 12:26:12 +0200)


pc migration fixes

Last minute fixes for migration.
It seems that if we don't fix it now, fixing
it in the next version will be even more painful ...

Signed-off-by: Michael S. Tsirkin m...@redhat.com


Igor Mammedov (1):
  pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is 
disabled

Michael S. Tsirkin (3):
  acpi-build: minor code cleanup
  acpi-build: tweak acpi migration limits
  piix: set legacy table size for 1.7

Paolo Bonzini (4):
  acpi-dsdt: procedurally generate _PRT
  pc: hack for migration compatibility from QEMU 2.0
  bios-tables-test: fix ASL normalization false positive
  pc: future-proof migration-compatibility of ACPI tables

 include/hw/i386/pc.h|1 +
 hw/i386/acpi-build.c|   99 +-
 hw/i386/pc_piix.c   |   20 +
 hw/i386/pc_q35.c|5 +
 tests/bios-tables-test.c|6 +-
 hw/i386/acpi-dsdt.dsl   |   90 +-
 hw/i386/acpi-dsdt.hex.generated | 1910 +++
 tests/acpi-test-data/pc/DSDT|  Bin 4499 - 2807 bytes
 8 files changed, 269 insertions(+), 1862 deletions(-)

[Qemu-devel] [PULL 2/8] pc: hack for migration compatibility from QEMU 2.0

From: Paolo Bonzini pbonz...@redhat.com

Changing the ACPI table size causes migration to break, and the memory
hotplug work opened our eyes on how horribly we were breaking things in
2.0 already.

The ACPI table size is rounded to the next 4k, which one would think
gives some headroom.  In practice this is not the case, because the user
can control the ACPI table size (each CPU adds 97 bytes to the SSDT and
8 to the MADT) and so some -smp values will break the 4k boundary and
fail to migrate.  Similarly, PCI bridges add ~1870 bytes to the SSDT.

This patch concerns itself with fixing migration from QEMU 2.0.  It
computes the payload size of QEMU 2.0 and always uses that one.
The previous patch shrunk the ACPI tables enough that the QEMU 2.0 size
should always be enough; non-AML tables can change depending on the
configuration (especially MADT, SRAT, HPET) but they remain the same
between QEMU 2.0 and 2.1, so we only compute our padding based on the
sizes of the SSDT and DSDT.

Migration from QEMU 1.7 should work for guests that have a number of CPUs
other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140.  It was already
broken from QEMU 1.7 to QEMU 2.0 in the same way, though.

Even with this patch, QEMU 1.7 and 2.0 have two different ideas of
-M pc-i440fx-2.0 when there are PCI bridges.  Igor sent a patch to
adopt the QEMU 1.7 definition.  I think distributions should apply
it if they move directly from QEMU 1.7 to 2.1+ without ever packaging
version 2.0.

Reviewed-by: Laszlo Ersek ler...@redhat.com
Tested-by: Igor Mammedov imamm...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/hw/i386/pc.h |  1 +
 hw/i386/acpi-build.c | 57 
 hw/i386/pc_piix.c| 19 ++
 hw/i386/pc_q35.c |  5 +
 4 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 1c0c382..f4b9b2b 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -94,6 +94,7 @@ struct PcGuestInfo {
 uint64_t *node_mem;
 uint64_t *node_cpu;
 FWCfgState *fw_cfg;
+int legacy_acpi_table_size;
 bool has_acpi_build;
 bool has_reserved_memory;
 };
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ebc5f03..d90c471 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -25,7 +25,9 @@
 #include glib.h
 #include qemu-common.h
 #include qemu/bitmap.h
+#include qemu/osdep.h
 #include qemu/range.h
+#include qemu/error-report.h
 #include hw/pci/pci.h
 #include qom/cpu.h
 #include hw/i386/pc.h
@@ -52,6 +54,14 @@
 #include qapi/qmp/qint.h
 #include qom/qom-qobject.h
 
+/* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
+ * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
+ * a little bit, there should be plenty of free space since the DSDT
+ * shrunk by ~1.5k between QEMU 2.0 and QEMU 2.1.
+ */
+#define ACPI_BUILD_LEGACY_CPU_AML_SIZE97
+#define ACPI_BUILD_ALIGN_SIZE 0x1000
+
 typedef struct AcpiCpuInfo {
 DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT);
 } AcpiCpuInfo;
@@ -1440,13 +1450,14 @@ static
 void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
 {
 GArray *table_offsets;
-unsigned facs, dsdt, rsdt;
+unsigned facs, ssdt, dsdt, rsdt;
 AcpiCpuInfo cpu;
 AcpiPmInfo pm;
 AcpiMiscInfo misc;
 AcpiMcfgInfo mcfg;
 PcPciInfo pci;
 uint8_t *u;
+size_t aml_len = 0;
 
 acpi_get_cpu_info(cpu);
 acpi_get_pm_info(pm);
@@ -1474,13 +1485,20 @@ void acpi_build(PcGuestInfo *guest_info, 
AcpiBuildTables *tables)
 dsdt = tables-table_data-len;
 build_dsdt(tables-table_data, tables-linker, misc);
 
+/* Count the size of the DSDT and SSDT, we will need it for legacy
+ * sizing of ACPI tables.
+ */
+aml_len += tables-table_data-len - dsdt;
+
 /* ACPI tables pointed to by RSDT */
 acpi_add_table(table_offsets, tables-table_data);
 build_fadt(tables-table_data, tables-linker, pm, facs, dsdt);
 
+ssdt = tables-table_data-len;
 acpi_add_table(table_offsets, tables-table_data);
 build_ssdt(tables-table_data, tables-linker, cpu, pm, misc, pci,
guest_info);
+aml_len += tables-table_data-len - ssdt;
 
 acpi_add_table(table_offsets, tables-table_data);
 build_madt(tables-table_data, tables-linker, cpu, guest_info);
@@ -1513,14 +1531,45 @@ void acpi_build(PcGuestInfo *guest_info, 
AcpiBuildTables *tables)
 /* RSDP is in FSEG memory, so allocate it separately */
 build_rsdp(tables-rsdp, tables-linker, rsdt);
 
-/* We'll expose it all to Guest so align size to reduce
+/* We'll expose it all to Guest so we want to reduce
  * chance of size changes.
  * RSDP is small so it's easy to keep it immutable, no need to
  * bother with alignment.
+ *
+ * We used to align the tables

[Qemu-devel] [PULL 7/8] acpi-build: tweak acpi migration limits

- Tweak error message for legacy machine type:
  Basically if table size exceeds the limits we set all
  bets are off for migration: e.g. it can start failing even
  within given qemu minor version simply because of a bugfix.
- Increase table size to 128k.
- Make sure we notice it long before we start getting close to the
  128k limit: warn at 64k.
- Don't fail if we exceed the limit: most people don't care about
  migration, even less people care about cross version miration.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/i386/acpi-build.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2178894..816c6d9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -62,7 +62,7 @@
 #define ACPI_BUILD_LEGACY_CPU_AML_SIZE97
 #define ACPI_BUILD_ALIGN_SIZE 0x1000
 
-#define ACPI_BUILD_TABLE_SIZE 0x1
+#define ACPI_BUILD_TABLE_SIZE 0x2
 
 typedef struct AcpiCpuInfo {
 DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT);
@@ -1586,17 +1586,19 @@ void acpi_build(PcGuestInfo *guest_info, 
AcpiBuildTables *tables)
  ACPI_BUILD_ALIGN_SIZE);
 if (tables-table_data-len  legacy_table_size) {
 /* Should happen only with PCI bridges and -M pc-i440fx-2.0.  */
-error_report(Warning: migration to QEMU 2.0 may not work.);
+error_report(Warning: migration may not work.);
 }
 g_array_set_size(tables-table_data, legacy_table_size);
 } else {
-if (tables-table_data-len  ACPI_BUILD_TABLE_SIZE) {
+/* Make sure we have a buffer in case we need to resize the tables. */
+if (tables-table_data-len  ACPI_BUILD_TABLE_SIZE / 2) {
 /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. 
 */
-error_report(ACPI tables are larger than 64k.  Please remove);
-error_report(CPUs, NUMA nodes, memory slots or PCI bridges.);
-exit(1);
+error_report(Warning: ACPI tables are larger than 64k.);
+error_report(Warning: migration may not work.);
+error_report(Warning: please remove CPUs, NUMA nodes, 
+ memory slots or PCI bridges.);
 }
-g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE);
+acpi_align_size(tables-table_data, ACPI_BUILD_TABLE_SIZE);
 }
 
 acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE);
-- 
MST

[Qemu-devel] [PULL 8/8] piix: set legacy table size for 1.7

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc_piix.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 4524e6b..9694f88 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -326,6 +326,7 @@ static void pc_compat_1_7(MachineState *machine)
 smbios_defaults = false;
 gigabyte_align = false;
 option_rom_has_mr = true;
+legacy_acpi_table_size = 6414;
 x86_cpu_compat_disable_kvm_features(FEAT_1_ECX, CPUID_EXT_X2APIC);
 }
 
-- 
MST

[Qemu-devel] [PULL 5/8] acpi-build: minor code cleanup

Fix up and add  comments to clarify code, plus a trivial
code change for clarity.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/acpi-build.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 779160f..ec86f1b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -825,9 +825,9 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 bool bus_hotplug_support = false;
 
 /*
-skip bridge subtree creation if bridge hotplug is disabled
-to make it compatible with 1.7 machine type
-*/
+ * Skip bridge subtree creation if bridge hotplug is disabled
+ * to make acpi tables compatible with legacy machine types.
+ */
 if (!child-pcihp_bridge_en  bus-parent_dev) {
 return;
 }
@@ -869,6 +869,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 PCIDeviceClass *pc;
 PCIDevice *pdev = bus-devices[i];
 int slot = PCI_SLOT(i);
+bool bridge_in_acpi;
 
 if (!pdev) {
 continue;
@@ -878,8 +879,13 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 pc = PCI_DEVICE_GET_CLASS(pdev);
 dc = DEVICE_GET_CLASS(pdev);
 
-if (pc-class_id == PCI_CLASS_BRIDGE_ISA ||
-(pc-is_bridge  child-pcihp_bridge_en)) {
+/* When hotplug for bridges is enabled, bridges are
+ * described in ACPI separately (see build_pci_bus_end).
+ * In this case they aren't themselves hot-pluggable.
+ */
+bridge_in_acpi = pc-is_bridge  child-pcihp_bridge_en;
+
+if (pc-class_id == PCI_CLASS_BRIDGE_ISA || bridge_in_acpi) {
 set_bit(slot, slot_device_system);
 }
 
@@ -891,7 +897,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 }
 }
 
-if (!dc-hotpluggable || (pc-is_bridge  child-pcihp_bridge_en)) {
+if (!dc-hotpluggable || bridge_in_acpi) {
 clear_bit(slot, slot_hotplug_enable);
 }
 }
-- 
MST

[Qemu-devel] [PULL 1/8] acpi-dsdt: procedurally generate _PRT

From: Paolo Bonzini pbonz...@redhat.com

This replaces the _PRT constant with a method that computes it.

The problem is that the DSDT+SSDT have grown from 2.0 to 2.1,
enough to cross the 8k barrier (we align the ACPI tables to 4k
before putting them in fw_cfg).  This causes problems with
migration and the pc-i440fx-2.0 machine type.

The solution to the problem is to hardcode 64k as the limit,
but this doesn't solve the bug with pc-i440fx-2.0.  The fix will be
for QEMU 2.1 to use exactly the same size as QEMU 2.0 for the
ACPI tables.  First, however, we must make the actual AML
equal or smaller; to do this, rewrite _PRT in a way that saves
over 1k of bytecode.

Reviewed-by: Laszlo Ersek ler...@redhat.com
Tested-by: Igor Mammedov imamm...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/acpi-dsdt.dsl   |   90 +-
 hw/i386/acpi-dsdt.hex.generated | 1910 +++
 tests/acpi-test-data/pc/DSDT|  Bin 4499 - 2807 bytes
 3 files changed, 148 insertions(+), 1852 deletions(-)

diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl
index 3cc0ea0..6ba0170 100644
--- a/hw/i386/acpi-dsdt.dsl
+++ b/hw/i386/acpi-dsdt.dsl
@@ -181,57 +181,45 @@ DefinitionBlock (
 
 Scope(\_SB) {
 Scope(PCI0) {
-Name(_PRT, Package() {
-/* PCI IRQ routing table, example from ACPI 2.0a specification,
-   section 6.2.8.1 */
-/* Note: we provide the same info as the PCI routing
-   table of the Bochs BIOS */
-
-#define prt_slot(nr, lnk0, lnk1, lnk2, lnk3) \
-Package() { nr##, 0, lnk0, 0 }, \
-Package() { nr##, 1, lnk1, 0 }, \
-Package() { nr##, 2, lnk2, 0 }, \
-Package() { nr##, 3, lnk3, 0 }
-
-#define prt_slot0(nr) prt_slot(nr, LNKD, LNKA, LNKB, LNKC)
-#define prt_slot1(nr) prt_slot(nr, LNKA, LNKB, LNKC, LNKD)
-#define prt_slot2(nr) prt_slot(nr, LNKB, LNKC, LNKD, LNKA)
-#define prt_slot3(nr) prt_slot(nr, LNKC, LNKD, LNKA, LNKB)
-
-prt_slot0(0x),
-/* Device 1 is power mgmt device, and can only use irq 9 */
-prt_slot(0x0001, LNKS, LNKB, LNKC, LNKD),
-prt_slot2(0x0002),
-prt_slot3(0x0003),
-prt_slot0(0x0004),
-prt_slot1(0x0005),
-prt_slot2(0x0006),
-prt_slot3(0x0007),
-prt_slot0(0x0008),
-prt_slot1(0x0009),
-prt_slot2(0x000a),
-prt_slot3(0x000b),
-prt_slot0(0x000c),
-prt_slot1(0x000d),
-prt_slot2(0x000e),
-prt_slot3(0x000f),
-prt_slot0(0x0010),
-prt_slot1(0x0011),
-prt_slot2(0x0012),
-prt_slot3(0x0013),
-prt_slot0(0x0014),
-prt_slot1(0x0015),
-prt_slot2(0x0016),
-prt_slot3(0x0017),
-prt_slot0(0x0018),
-prt_slot1(0x0019),
-prt_slot2(0x001a),
-prt_slot3(0x001b),
-prt_slot0(0x001c),
-prt_slot1(0x001d),
-prt_slot2(0x001e),
-prt_slot3(0x001f),
-})
+Method (_PRT, 0) {
+Store(Package(128) {}, Local0)
+Store(Zero, Local1)
+While(LLess(Local1, 128)) {
+// slot = pin  2
+Store(ShiftRight(Local1, 2), Local2)
+
+// lnk = (slot + pin)  3
+Store(And(Add(Local1, Local2), 3), Local3)
+If (LEqual(Local3, 0)) {
+Store(Package(4) { Zero, Zero, LNKD, Zero }, Local4)
+}
+If (LEqual(Local3, 1)) {
+// device 1 is the power-management device, needs SCI
+If (LEqual(Local1, 4)) {
+Store(Package(4) { Zero, Zero, LNKS, Zero }, 
Local4)
+} Else {
+Store(Package(4) { Zero, Zero, LNKA, Zero }, 
Local4)
+}
+}
+If (LEqual(Local3, 2)) {
+Store(Package(4) { Zero, Zero, LNKB, Zero }, Local4)
+}
+If (LEqual(Local3, 3)) {
+Store(Package(4) { Zero, Zero, LNKC, Zero }, Local4)
+}
+
+// Complete the interrupt routing entry:
+//Package(4) { 0x[slot], [pin], [link], 0) }
+
+Store(Or(ShiftLeft(Local2, 16), 0x), Index(Local4, 0))
+Store(And(Local1, 3),Index(Local4, 1))
+Store(Local4,Index(Local0,

[Qemu-devel] [PULL 3/8] bios-tables-test: fix ASL normalization false positive

From: Paolo Bonzini pbonz...@redhat.com

My version of IASL (from RHEL7) puts two newlines between the head comment
and the DefinitionBlock property.  Kill all newlines after the comment,
so that normalize_asl works properly.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Laszlo Ersek ler...@redhat.com
---
 tests/bios-tables-test.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 62771f7..045eb27 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -487,7 +487,11 @@ static GString *normalize_asl(gchar *asl_code)
 /* strip comments (different generation days) */
 comment = g_strstr_len(asl-str, asl-len, COMMENT_END);
 if (comment) {
-asl = g_string_erase(asl, 0, comment + sizeof(COMMENT_END) - asl-str);
+comment += strlen(COMMENT_END);
+while (*comment == '\n') {
+comment++;
+}
+asl = g_string_erase(asl, 0, comment - asl-str);
 }
 
 /* strip def block name (it has file path in it) */
-- 
MST

[Qemu-devel] [PULL 4/8] pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled

From: Igor Mammedov imamm...@redhat.com

Fixes migration regression from QEMU-1.7 to a newer QEMUs.
SSDT table size in QEMU-1.7 doesn't change regardless of
a number of PCI bridge devices present at startup.

However in QEMU-2.0 since addition of hotplug on PCI bridges,
each PCI bridge adds ~1875 bytes to SSDT table, including
pc-i440fx-1.7 machine type where PCI bridge hotplug disabled
via compat property.
It breaks migration from QEMU-1.7 to QEMU-2.[01] -M pc-i440fx-1.7
since RAMBlock size of ACPI tables on target becomes larger
then on source and migration fails with:

Length mismatch: /rom@etc/acpi/tables: 2000 in != 3000

error.

Fix this by generating AML only for PCI0 bus if
hotplug on PCI bridges is disabled and preserves PCI brigde
description in AML as it was done in QEMU-1.7 for pc-i440fx-1.7.

It will help to maintain size of SSDT static regardless of
number of PCI bridges on startup for pc-i440fx-1.7 machine type.

Signed-off-by: Igor Mammedov imamm...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/acpi-build.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index d90c471..779160f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -74,6 +74,7 @@ typedef struct AcpiMcfgInfo {
 typedef struct AcpiPmInfo {
 bool s3_disabled;
 bool s4_disabled;
+bool pcihp_bridge_en;
 uint8_t s4_val;
 uint16_t sci_int;
 uint8_t acpi_enable_cmd;
@@ -95,6 +96,7 @@ typedef struct AcpiBuildPciBusHotplugState {
 GArray *device_table;
 GArray *notify_table;
 struct AcpiBuildPciBusHotplugState *parent;
+bool pcihp_bridge_en;
 } AcpiBuildPciBusHotplugState;
 
 static void acpi_get_dsdt(AcpiMiscInfo *info)
@@ -198,6 +200,9 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
NULL);
 pm-gpe0_blk_len = object_property_get_int(obj, ACPI_PM_PROP_GPE0_BLK_LEN,
NULL);
+pm-pcihp_bridge_en =
+object_property_get_bool(obj, acpi-pci-hotplug-with-bridge-support,
+ NULL);
 }
 
 static void acpi_get_misc_info(AcpiMiscInfo *info)
@@ -778,11 +783,13 @@ static void acpi_set_pci_info(void)
 }
 
 static void build_pci_bus_state_init(AcpiBuildPciBusHotplugState *state,
- AcpiBuildPciBusHotplugState *parent)
+ AcpiBuildPciBusHotplugState *parent,
+ bool pcihp_bridge_en)
 {
 state-parent = parent;
 state-device_table = build_alloc_array();
 state-notify_table = build_alloc_array();
+state-pcihp_bridge_en = pcihp_bridge_en;
 }
 
 static void build_pci_bus_state_cleanup(AcpiBuildPciBusHotplugState *state)
@@ -796,7 +803,7 @@ static void *build_pci_bus_begin(PCIBus *bus, void 
*parent_state)
 AcpiBuildPciBusHotplugState *parent = parent_state;
 AcpiBuildPciBusHotplugState *child = g_malloc(sizeof *child);
 
-build_pci_bus_state_init(child, parent);
+build_pci_bus_state_init(child, parent, parent-pcihp_bridge_en);
 
 return child;
 }
@@ -817,6 +824,14 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 GArray *method;
 bool bus_hotplug_support = false;
 
+/*
+skip bridge subtree creation if bridge hotplug is disabled
+to make it compatible with 1.7 machine type
+*/
+if (!child-pcihp_bridge_en  bus-parent_dev) {
+return;
+}
+
 if (bus-parent_dev) {
 op = 0x82; /* DeviceOp */
 build_append_nameseg(bus_table, S%.02X_,
@@ -863,7 +878,8 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 pc = PCI_DEVICE_GET_CLASS(pdev);
 dc = DEVICE_GET_CLASS(pdev);
 
-if (pc-class_id == PCI_CLASS_BRIDGE_ISA || pc-is_bridge) {
+if (pc-class_id == PCI_CLASS_BRIDGE_ISA ||
+(pc-is_bridge  child-pcihp_bridge_en)) {
 set_bit(slot, slot_device_system);
 }
 
@@ -875,7 +891,7 @@ static void build_pci_bus_end(PCIBus *bus, void *bus_state)
 }
 }
 
-if (!dc-hotpluggable || pc-is_bridge) {
+if (!dc-hotpluggable || (pc-is_bridge  child-pcihp_bridge_en)) {
 clear_bit(slot, slot_hotplug_enable);
 }
 }
@@ -1140,7 +1156,7 @@ build_ssdt(GArray *table_data, GArray *linker,
 bus = PCI_HOST_BRIDGE(pci_host)-bus;
 }
 
-build_pci_bus_state_init(hotplug_state, NULL);
+build_pci_bus_state_init(hotplug_state, NULL, 
pm-pcihp_bridge_en);
 
 if (bus) {
 /* Scan all PCI buses. Generate tables to support hotplug. */
-- 
MST

[Qemu-devel] [PULL 6/8] pc: future-proof migration-compatibility of ACPI tables

From: Paolo Bonzini pbonz...@redhat.com

This patch avoids that similar changes break QEMU again in the future.
QEMU will now hard-code 64k as the maximum ACPI table size, which
(despite being an order of magnitude smaller than 640k) should be enough
for everyone.

Reviewed-by: Laszlo Ersek ler...@redhat.com
Tested-by: Igor Mammedov imamm...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Reviewed-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/acpi-build.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ec86f1b..2178894 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -62,6 +62,8 @@
 #define ACPI_BUILD_LEGACY_CPU_AML_SIZE97
 #define ACPI_BUILD_ALIGN_SIZE 0x1000
 
+#define ACPI_BUILD_TABLE_SIZE 0x1
+
 typedef struct AcpiCpuInfo {
 DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT);
 } AcpiCpuInfo;
@@ -1588,7 +1590,13 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables 
*tables)
 }
 g_array_set_size(tables-table_data, legacy_table_size);
 } else {
-acpi_align_size(tables-table_data, ACPI_BUILD_ALIGN_SIZE);
+if (tables-table_data-len  ACPI_BUILD_TABLE_SIZE) {
+/* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. 
 */
+error_report(ACPI tables are larger than 64k.  Please remove);
+error_report(CPUs, NUMA nodes, memory slots or PCI bridges.);
+exit(1);
+}
+g_array_set_size(tables-table_data, ACPI_BUILD_TABLE_SIZE);
 }
 
 acpi_align_size(tables-linker, ACPI_BUILD_ALIGN_SIZE);
-- 
MST

[Qemu-devel] [PATCH v2] Add ACPI tables for TPM

2014-07-29 Thread Stefan Berger

From: Stefan Berger stef...@linux.vnet.ibm.com

Add an SSDT ACPI table for the TPM device.
Add a TCPA table for BIOS logging area when a TPM is being used.

The latter follows this spec here:

http://www.trustedcomputinggroup.org/files/static_page_files/DCD4188E-1A4B-B294-D050A155FB6F7385/TCG_ACPIGeneralSpecification_PublicReview.pdf

Signed-off-by: Stefan Berger stef...@linux.vnet.ibm.com
---
 hw/i386/Makefile.objs |  3 ++-
 hw/i386/acpi-build.c  | 46 ++
 hw/i386/acpi-defs.h   | 11 +++
 hw/i386/ssdt-tpm.dsl  | 43 +++
 hw/tpm/tpm_tis.h  |  5 +
 include/hw/acpi/tpm.h | 29 +
 include/sysemu/tpm.h  |  5 +
 7 files changed, 137 insertions(+), 5 deletions(-)
 create mode 100644 hw/i386/ssdt-tpm.dsl
 create mode 100644 include/hw/acpi/tpm.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 48014ab..3688cf8 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -10,7 +10,8 @@ obj-y += bios-linker-loader.o
 hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex \
hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex \
hw/i386/acpi-dsdt.hex hw/i386/q35-acpi-dsdt.hex \
-   hw/i386/q35-acpi-dsdt.hex hw/i386/ssdt-mem.hex
+   hw/i386/q35-acpi-dsdt.hex hw/i386/ssdt-mem.hex \
+   hw/i386/ssdt-tpm.hex
 
 iasl-option=$(shell if test -z `$(1) $(2) 21  /dev/null` \
 ; then echo $(2); else echo $(3); fi ;)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ebc5f03..d767e37 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -38,6 +38,8 @@
 #include hw/loader.h
 #include hw/isa/isa.h
 #include hw/acpi/memory_hotplug.h
+#include sysemu/tpm.h
+#include hw/acpi/tpm.h
 
 /* Supported chipsets: */
 #include hw/acpi/piix4.h
@@ -75,6 +77,7 @@ typedef struct AcpiPmInfo {
 
 typedef struct AcpiMiscInfo {
 bool has_hpet;
+bool has_tpm;
 DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX);
 const unsigned char *dsdt_code;
 unsigned dsdt_size;
@@ -193,6 +196,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
 static void acpi_get_misc_info(AcpiMiscInfo *info)
 {
 info-has_hpet = hpet_find();
+info-has_tpm = tpm_find();
 info-pvpanic_port = pvpanic_port();
 }
 
@@ -681,6 +685,7 @@ static inline char acpi_get_hex(uint32_t val)
 
 #include hw/i386/ssdt-misc.hex
 #include hw/i386/ssdt-pcihp.hex
+#include hw/i386/ssdt-tpm.hex
 
 static void
 build_append_notify_method(GArray *device, const char *name,
@@ -1167,6 +1172,40 @@ build_hpet(GArray *table_data, GArray *linker)
  (void *)hpet, HPET, sizeof(*hpet), 1);
 }
 
+static void
+build_tpm_tcpa(GArray *table_data, GArray *linker)
+{
+Acpi20Tcpa *tcpa;
+uint32_t log_area_minimum_length = TPM_LOG_AREA_MINIMUM_SIZE;
+uint64_t log_area_start_address;
+size_t len = log_area_minimum_length + sizeof(*tcpa);
+
+log_area_start_address = table_data-len + sizeof(*tcpa);
+
+tcpa = acpi_data_push(table_data, len);
+
+tcpa-platform_class = cpu_to_le16(TPM_TCPA_ACPI_CLASS_CLIENT);
+tcpa-log_area_minimum_length = cpu_to_le32(log_area_minimum_length);
+tcpa-log_area_start_address = cpu_to_le64(log_area_start_address);
+
+/* LASA address to be filled by Guest linker */
+bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+   ACPI_BUILD_TABLE_FILE,
+   table_data, tcpa-log_area_start_address,
+   sizeof(tcpa-log_area_start_address));
+build_header(linker, table_data,
+ (void *)tcpa, TCPA, sizeof(*tcpa), 2);
+}
+
+static void
+build_tpm_ssdt(GArray *table_data, GArray *linker)
+{
+void *tpm_ptr;
+
+tpm_ptr = acpi_data_push(table_data, sizeof(ssdt_tpm_aml));
+memcpy(tpm_ptr, ssdt_tpm_aml, sizeof(ssdt_tpm_aml));
+}
+
 typedef enum {
 MEM_AFFINITY_NOFLAGS  = 0,
 MEM_AFFINITY_ENABLED  = (1  0),
@@ -1489,6 +1528,13 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables 
*tables)
 acpi_add_table(table_offsets, tables-table_data);
 build_hpet(tables-table_data, tables-linker);
 }
+if (misc.has_tpm) {
+acpi_add_table(table_offsets, tables-table_data);
+build_tpm_ssdt(tables-table_data, tables-linker);
+
+acpi_add_table(table_offsets, tables-table_data);
+build_tpm_tcpa(tables-table_data, tables-linker);
+}
 if (guest_info-numa_nodes) {
 acpi_add_table(table_offsets, tables-table_data);
 build_srat(tables-table_data, tables-linker, cpu, guest_info);
diff --git a/hw/i386/acpi-defs.h b/hw/i386/acpi-defs.h
index e93babb..1bc974a 100644
--- a/hw/i386/acpi-defs.h
+++ b/hw/i386/acpi-defs.h
@@ -314,4 +314,15 @@ struct AcpiTableMcfg {
 } QEMU_PACKED;
 typedef struct AcpiTableMcfg AcpiTableMcfg;
 
+/*
+ * TCPA Description Table
+ */
+struct Acpi20Tcpa {
+

[Qemu-devel] [Bug 1343827] Re: block.c: multiwrite_merge() truncates overlapping requests

Thanks for reporting this bug.  I'm writing a test case and fix, will CC
you on the patches.

Please keep in mind that no ordering is guaranteed between requests that
are in-flight at the same time.  Therefore it is unusual to submit
overlapping requests and could indicate a bug in the application.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1343827

Title:
  block.c: multiwrite_merge() truncates overlapping requests

Status in QEMU:
  New

Bug description:
  If the list of requests passed to multiwrite_merge() contains two
  requests where the first is for a range of sectors that is a strict
  subset of the second's, the second request is truncated to end where
  the first starts, so the second half of the second request is lost.

  This is easy to reproduce by running fio against a virtio-blk device
  running on qemu 2.1.0-rc1 with the below fio script. At least with fio
  2.0.13, the randwrite pass will issue overlapping bios to the block
  driver, which the kernel is happy to pass along to qemu:

  [global]
  randrepeat=0
  ioengine=libaio
  iodepth=64
  direct=1
  size=1M
  numjobs=1
  verify_fatal=1
  verify_dump=1

  filename=$dev

  [seqwrite]
  blocksize_range=4k-1M
  rw=write
  verify=crc32c-intel

  [randwrite]
  stonewall
  blocksize_range=4k-1M
  rw=randwrite
  verify=meta

  Here is a naive fix for the problem that simply avoids merging
  problematic requests. I guess a better solution would be to redo
  qemu_iovec_concat() to do the right thing.

  diff -ur old/qemu-2.1.0-rc2/block.c qemu-2.1.0-rc2/block.c
  --- old/qemu-2.1.0-rc2/block.c  2014-07-15 14:49:14.0 -0700
  +++ qemu-2.1.0-rc2/block.c  2014-07-17 23:03:14.224169741 -0700
  @@ -4460,7 +4460,9 @@
   int64_t oldreq_last = reqs[outidx].sector + reqs[outidx].nb_sectors;
   
   // Handle exactly sequential writes and overlapping writes.
  -if (reqs[i].sector = oldreq_last) {
  +// If this request ends before the previous one, don't merge.
  +if (reqs[i].sector = oldreq_last 
  +reqs[i].sector + reqs[i].nb_sectors = oldreq_last) {
   merge = 1;
   }

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1343827/+subscriptions

Re: [Qemu-devel] [PATCH v4 0/5] ACPI fixes for QEMU 2.1

2014-07-29 Thread Laszlo Ersek

On 07/29/14 12:39, Paolo Bonzini wrote:
 Il 29/07/2014 12:31, Michael S. Tsirkin ha scritto:
 For patch piix: set legacy table size for 1.7: didn't Igor say
 something that such a migration wouldn't work anyway? I could be
 remembering wrong.

 I don't recall this, but if there are more bug we could just
 fix them too.
 
 You have to choose between a -M pc-i440fx-1.7 that migrates from 1.7
 to 2.1, and one that migrates from 2.0 to 2.1.
 
 - to make 1.7-2.1 work: you need both Igor's patch (generate AML
 only...) and piix: set legacy table size for 1.7.  All configurations
 will work, including those with PCI bridges.
 
 - to make 2.0-2.1 work (with -M pc-i440fx-1.7): you need to omit
 piix: set legacy table size for 1.7, and configurations with PCI
 bridges remain broken.  Igor's patch is not needed, because it only
 affects configurations with PCI bridges.
 
 mst prefers the first one, and he changed my view on that too.  And he
 noticed that with piix: set legacy table size for 1.7 more things just
 work, so it's better to include it.

I see. Thanks.

 1  acd727e acpi-dsdt: procedurally generate _PRT
 2  07fb617 pc: hack for migration compatibility from QEMU 2.0

These already have my R-b.

 3  3d5061f bios-tables-test: fix ASL normalization false positive

Just gave my R-b in this thread.

 4  82631f6 pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug 
 is disabled

Acked-by: Laszlo Ersek ler...@redhat.com

 5  1cffcf8 acpi-build: minor code cleanup

Typos in the commit message should be fixed. With those changes,
Reviewed-by: Laszlo Ersek ler...@redhat.com

 6  3200ac6 pc: future-proof migration-compatibility of ACPI tables

Has my R-b already.

 7  004af2c acpi-build: tweak acpi migration limits

Reviewed-by: Laszlo Ersek ler...@redhat.com

 8  5c35a24 piix: set legacy table size for 1.7

Acked-by: Laszlo Ersek ler...@redhat.com

Laszlo

Re: [Qemu-devel] [PATCH 2/4] xen:hw:pci-host:piix: create host bridge to passthrough

On Thu, Jul 24, 2014 at 07:30:27PM +0800, Tiejun Chen wrote:
 Implement that pci

s/that/a/

 host bridge to specific

s/to specific/specific/

 to passthrough. Actually
 this just inherit

s/inherit/inherits/

 the standard one.
 
 This is based on http://patchwork.ozlabs.org/patch/363810/.
 
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 ---
  hw/pci-host/piix.c | 43 +++
  1 file changed, 43 insertions(+)
 
 diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
 index e0e0946..9feddf5 100644
 --- a/hw/pci-host/piix.c
 +++ b/hw/pci-host/piix.c
 @@ -34,6 +34,7 @@
  #include sysemu/sysemu.h
  #include hw/i386/ioapic.h
  #include qapi/visitor.h
 +#include hw/xen/xen_pt.h

What call needs this, exactly?

  
  /*
   * I440FX chipset data sheet.
 @@ -44,6 +45,10 @@
  #define I440FX_PCI_HOST_BRIDGE(obj) \
  OBJECT_CHECK(I440FXState, (obj), TYPE_I440FX_PCI_HOST_BRIDGE)
  

OK cool, but don't you want to put igd and passthrough
somewhere in the name? Maybe legacy as well since
future drivers will work with existing machine type, right?

Applies to functions and macro names as well.

 +#define TYPE_I440FX_XEN_PCI_DEVICE i440FX-xen
 +#define I440FX_XEN_PCI_DEVICE(obj) \
 +OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_XEN_PCI_DEVICE)
 +
  typedef struct I440FXState {
  PCIHostState parent_obj;
  PcPciInfo pci_info;
 @@ -305,6 +310,16 @@ static int i440fx_initfn(PCIDevice *dev)
  return 0;
  }
  
 +static int i440fx_xen_initfn(PCIDevice *dev)
 +{
 +PCII440FXState *d = I440FX_XEN_PCI_DEVICE(dev);
 +
 +dev-config[I440FX_SMRAM] = 0x02;
 +
 +cpu_smm_register(i440fx_set_smm, d);
 +return 0;
 +}
 +
  PCIBus *i440fx_init(PCII440FXState **pi440fx_state,
  int *piix3_devfn,
  ISABus **isa_bus, qemu_irq *pic,
 @@ -704,6 +719,33 @@ static const TypeInfo i440fx_info = {
  .class_init= i440fx_class_init,
  };
  
 +static void i440fx_xen_class_init(ObjectClass *klass, void *data)
 +{
 +DeviceClass *dc = DEVICE_CLASS(klass);
 +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 +
 +k-init = i440fx_xen_initfn;
 +k-vendor_id = PCI_VENDOR_ID_INTEL;
 +k-device_id = PCI_DEVICE_ID_INTEL_82441;
 +k-revision = 0x02;
 +k-class_id = PCI_CLASS_BRIDGE_ISA;
 +dc-desc = XEN Host bridge;
 +dc-vmsd = vmstate_i440fx;
 +/*
 + * PCI-facing part of the host bridge, not usable without the
 + * host-facing part, which can't be device_add'ed, yet.
 + */
 +dc-cannot_instantiate_with_device_add_yet = true;
 +dc-hotpluggable   = false;
 +}
 +
 +static const TypeInfo i440fx_xen_info = {
 +.name  = TYPE_I440FX_XEN_PCI_DEVICE,
 +.parent= TYPE_PCI_DEVICE,
 +.instance_size = sizeof(PCII440FXState),
 +.class_init= i440fx_xen_class_init,
 +};
 +
  static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge,
  PCIBus *rootbus)
  {
 @@ -745,6 +787,7 @@ static const TypeInfo i440fx_pcihost_info = {
  static void i440fx_register_types(void)
  {
  type_register_static(i440fx_info);
 +type_register_static(i440fx_xen_info);
  type_register_static(piix3_info);
  type_register_static(piix3_xen_info);
  type_register_static(i440fx_pcihost_info);
 -- 
 1.9.1

Re: [Qemu-devel] [PATCH 3/4] xen:hw:pci-host:piix: introduce xen_igd_i440fx_init

On Thu, Jul 24, 2014 at 07:30:28PM +0800, Tiejun Chen wrote:
 This is almost same as an original i440fx_init but just
 work with that xen igd host bridge to passthrough.
 
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 ---
  hw/pci-host/piix.c   | 79 
 
  include/hw/i386/pc.h | 10 +++
  2 files changed, 89 insertions(+)
 
 diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
 index 9feddf5..7ef08d7 100644
 --- a/hw/pci-host/piix.c
 +++ b/hw/pci-host/piix.c
 @@ -407,6 +407,85 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state,
  return b;
  }
  
 +PCIBus *xen_igd_i440fx_init(PCII440FXState **pi440fx_state,
 +int *piix3_devfn,
 +ISABus **isa_bus, qemu_irq *pic,
 +MemoryRegion *address_space_mem,
 +MemoryRegion *address_space_io,
 +ram_addr_t ram_size,
 +ram_addr_t below_4g_mem_size,
 +ram_addr_t above_4g_mem_size,
 +MemoryRegion *pci_address_space,
 +MemoryRegion *ram_memory)
 +{
 +DeviceState *dev;
 +PCIBus *b;
 +PCIDevice *d;
 +PCIHostState *s;
 +PIIX3State *piix3;
 +PCII440FXState *f;
 +unsigned i;
 +I440FXState *i440fx;
 +
 +dev = qdev_create(NULL, TYPE_I440FX_PCI_HOST_BRIDGE);
 +s = PCI_HOST_BRIDGE(dev);
 +b = pci_bus_new(dev, NULL, pci_address_space,
 +address_space_io, 0, TYPE_PCI_BUS);
 +s-bus = b;
 +object_property_add_child(qdev_get_machine(), i440fx, OBJECT(dev), 
 NULL);
 +qdev_init_nofail(dev);
 +
 +d = pci_create_simple(b, 0, TYPE_I440FX_XEN_PCI_DEVICE);
 +*pi440fx_state = I440FX_XEN_PCI_DEVICE(d);
 +f = *pi440fx_state;
 +f-system_memory = address_space_mem;
 +f-pci_address_space = pci_address_space;
 +f-ram_memory = ram_memory;
 +
 +i440fx = I440FX_PCI_HOST_BRIDGE(dev);
 +i440fx-pci_info.w32.begin = below_4g_mem_size;
 +
 +/* setup pci memory mapping */
 +pc_pci_as_mapping_init(OBJECT(f), f-system_memory,
 +   f-pci_address_space);
 +
 +memory_region_init_alias(f-smram_region, OBJECT(d), smram-region,
 + f-pci_address_space, 0xa, 0x2);
 +memory_region_add_subregion_overlap(f-system_memory, 0xa,
 +f-smram_region, 1);
 +memory_region_set_enabled(f-smram_region, false);
 +init_pam(dev, f-ram_memory, f-system_memory, f-pci_address_space,
 + f-pam_regions[0], PAM_BIOS_BASE, PAM_BIOS_SIZE);
 +for (i = 0; i  12; ++i) {
 +init_pam(dev, f-ram_memory, f-system_memory, f-pci_address_space,
 + f-pam_regions[i+1], PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE,
 + PAM_EXPAN_SIZE);
 +}
 +
 +/* Xen supports additional interrupt routes from the PCI devices to
 + * the IOAPIC: the four pins of each PCI device on the bus are also
 + * connected to the IOAPIC directly.
 + * These additional routes can be discovered through ACPI. */
 +piix3 = DO_UPCAST(PIIX3State, dev,
 +pci_create_simple_multifunction(b, -1, true, PIIX3-xen));
 +pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
 +piix3, XEN_PIIX_NUM_PIRQS);
 +piix3-pic = pic;
 +*isa_bus = ISA_BUS(qdev_get_child_bus(DEVICE(piix3), isa.0));
 +
 +*piix3_devfn = piix3-dev.devfn;
 +
 +ram_size = ram_size / 8 / 1024 / 1024;
 +if (ram_size  255) {
 +ram_size = 255;
 +}
 +d-config[0x57] = ram_size;
 +
 +i440fx_update_memory_mappings(f);
 +
 +return b;
 +}

Too much copy-paste. Please refactor to avoid code duplication.
Is the only difference here the use of TYPE_I440FX_XEN_PCI_DEVICE?
Then you can pass type in as a parameter to a common static sub-function.

 +
  PCIBus *find_i440fx(void)
  {
  PCIHostState *s = OBJECT_CHECK(PCIHostState,
 diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
 index 1c0c382..51656d9 100644
 --- a/include/hw/i386/pc.h
 +++ b/include/hw/i386/pc.h
 @@ -239,6 +239,16 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int 
 *piix_devfn,
  MemoryRegion *pci_memory,
  MemoryRegion *ram_memory);
  
 +PCIBus *xen_igd_i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn,
 +ISABus **isa_bus, qemu_irq *pic,
 +MemoryRegion *address_space_mem,
 +MemoryRegion *address_space_io,
 +ram_addr_t ram_size,
 +ram_addr_t below_4g_mem_size,
 +ram_addr_t above_4g_mem_size,
 +MemoryRegion *pci_memory,
 +MemoryRegion *ram_memory);
 +
  PCIBus *find_i440fx(void);
  /* piix4.c */
  extern PCIDevice *piix4_dev;
 -- 
 1.9.1

Re: [Qemu-devel] [PATCH 1/4] hw:i386:pc_piix: split pc_init1()

On Thu, Jul 24, 2014 at 07:30:26PM +0800, Tiejun Chen wrote:
 We'd like to split pc_init1 and then we can share something
 with other stuff.
 
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com

Did you test this patch? It does not look like it can work.

 ---
  hw/i386/pc_piix.c | 93 
 +--
  1 file changed, 70 insertions(+), 23 deletions(-)
 
 diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
 index 7081c08..2391fda 100644
 --- a/hw/i386/pc_piix.c
 +++ b/hw/i386/pc_piix.c
 @@ -70,34 +70,21 @@ static bool smbios_legacy_mode;
  static bool gigabyte_align = true;
  static bool has_reserved_memory = true;
  
 -/* PC hardware initialisation */
 -static void pc_init1(MachineState *machine,
 +static ram_addr_t below_4g_mem_size;
 +static ram_addr_t above_4g_mem_size;
 +static void pc_machine_base_init(MachineState *machine,
   int pci_enabled,
 - int kvmclock_enabled)
 + int kvmclock_enabled,
 + DeviceState *icc_bridge,
 + MemoryRegion *ram_memory,
 + MemoryRegion *pci_memory,
 + qemu_irq *gsi,
 + GSIState *gsi_state,
 + FWCfgState *fw_cfg)
  {
  PCMachineState *pc_machine = PC_MACHINE(machine);
  MemoryRegion *system_memory = get_system_memory();
 -MemoryRegion *system_io = get_system_io();
 -int i;
 -ram_addr_t below_4g_mem_size, above_4g_mem_size;
 -PCIBus *pci_bus;
 -ISABus *isa_bus;
 -PCII440FXState *i440fx_state;
 -int piix3_devfn = -1;
 -qemu_irq *cpu_irq;
 -qemu_irq *gsi;
 -qemu_irq *i8259;
 -qemu_irq *smi_irq;
 -GSIState *gsi_state;
 -DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
 -BusState *idebus[MAX_IDE_BUS];
 -ISADevice *rtc_state;
 -ISADevice *floppy;
 -MemoryRegion *ram_memory;
 -MemoryRegion *pci_memory;
  MemoryRegion *rom_memory;
 -DeviceState *icc_bridge;
 -FWCfgState *fw_cfg = NULL;
  PcGuestInfo *guest_info;
  ram_addr_t lowmem;
  
 @@ -190,6 +177,20 @@ static void pc_init1(MachineState *machine,
  } else {
  gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
  }
 +}
 +
 +static void pc_machine_pci_bus_init(MachineState *machine,
 + int pci_enabled,
 + PCII440FXState *i440fx_state,
 + int piix3_devfn,
 + PCIBus *pci_bus,
 + ISABus *isa_bus,
 + qemu_irq *gsi,
 + MemoryRegion *pci_memory,
 + MemoryRegion *ram_memory)
 +{
 +MemoryRegion *system_memory = get_system_memory();
 +MemoryRegion *system_io = get_system_io();
  
  if (pci_enabled) {
  pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_bus, gsi,
 @@ -203,6 +204,28 @@ static void pc_init1(MachineState *machine,
  isa_bus = isa_bus_new(NULL, system_io);
  no_hpet = 1;
  }
 +}
 +
 +static void pc_machine_device_init(MachineState *machine,
 + int pci_enabled,
 + GSIState *gsi_state,
 + DeviceState *icc_bridge,
 + int piix3_devfn,
 + FWCfgState *fw_cfg,
 + PCIBus *pci_bus,
 + ISABus *isa_bus,
 + qemu_irq *gsi)
 +{
 +int i;
 +DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
 +BusState *idebus[MAX_IDE_BUS];
 +qemu_irq *smi_irq;
 +PCMachineState *pc_machine = PC_MACHINE(machine);
 +qemu_irq *cpu_irq;
 +qemu_irq *i8259;
 +ISADevice *rtc_state;
 +ISADevice *floppy;
 +
  isa_bus_irqs(isa_bus, gsi);
  
  if (kvm_irqchip_in_kernel()) {
 @@ -290,6 +313,30 @@ static void pc_init1(MachineState *machine,
  }
  }
  
 +/* PC hardware initialisation */
 +static void pc_init1(MachineState *machine,
 + int pci_enabled,
 + int kvmclock_enabled)
 +{
 +PCIBus *pci_bus = NULL;
 +ISABus *isa_bus = NULL;
 +PCII440FXState *i440fx_state = NULL;
 +int piix3_devfn = -1;
 +qemu_irq *gsi = NULL;
 +GSIState *gsi_state = NULL;
 +MemoryRegion *ram_memory = NULL;
 +MemoryRegion *pci_memory = NULL;
 +DeviceState *icc_bridge = NULL;
 +FWCfgState *fw_cfg = NULL;

These are set to NULL here and never modified below.
Why does it make sense?


 +
 +pc_machine_base_init(machine, pci_enabled, kvmclock_enabled, icc_bridge,
 +ram_memory, pci_memory, gsi, gsi_state, fw_cfg);
 +pc_machine_pci_bus_init(machine, pci_enabled, i440fx_state, piix3_devfn,
 +pci_bus, isa_bus, gsi, pci_memory, ram_memory);
 +pc_machine_device_init(machine, pci_enabled, gsi_state, icc_bridge,
 +piix3_devfn, fw_cfg, pci_bus, isa_bus, gsi);
 +}
 +
  static void pc_init_pci(MachineState *machine)
  {

[Qemu-devel] [PATCH] fix vcpu long time io blocking on tap, when too many packets was delivered to the guest os via tap interface.

2014-07-29 Thread Wangkai


Signed-off-by: Wangkai wangka...@huawei.com
---
 net/tap.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index a40f7f0..9a59934 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -189,6 +189,7 @@ static void tap_send(void *opaque)
 {
 TAPState *s = opaque;
 int size;
+int packets = 0;
 
 while (qemu_can_send_packet(s-nc)) {
 uint8_t *buf = s-buf;
@@ -210,6 +211,19 @@ static void tap_send(void *opaque)
 } else if (size  0) {
 break;
 }
+
+/*
+ * When receive packets on tap, QEMU io was locked, if too many
+ * packets was delivered to the guest os via tap interface,
+ * tap_send() would keep looping, if then the VM required a io
+ * operation, would be blocked for a long time.
+ * Here we set the number to limit one tap interface receive time,
+ * keep io events fair and lock time little.
+ */
+packets++;
+if (packets = 50) {
+break;
+}
 }
 }
 
-- 
1.7.2.5

Re: [Qemu-devel] [PATCH 4/4] xen:hw:i386:pc_piix: introduce new machine for IGD passthrough

On Thu, Jul 24, 2014 at 07:30:29PM +0800, Tiejun Chen wrote:
 Now we can introduce a new machine, xenigd, specific to IGD
 passthrough. This can avoid involving other common codes.
 
 Signed-off-by: Tiejun Chen tiejun.c...@intel.com
 ---
  hw/i386/pc_piix.c | 87 
 +++
  1 file changed, 87 insertions(+)
 
 diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
 index 2391fda..46e5901 100644
 --- a/hw/i386/pc_piix.c
 +++ b/hw/i386/pc_piix.c
 @@ -206,6 +206,33 @@ static void pc_machine_pci_bus_init(MachineState 
 *machine,
  }
  }
  
 +static void xen_igd_pc_machine_pci_bus_init(MachineState *machine,
 + int pci_enabled,
 + PCII440FXState *i440fx_state,
 + int piix3_devfn,
 + PCIBus *pci_bus,
 + ISABus *isa_bus,
 + qemu_irq *gsi,
 + MemoryRegion *pci_memory,
 + MemoryRegion *ram_memory)
 +{
 +MemoryRegion *system_memory = get_system_memory();
 +MemoryRegion *system_io = get_system_io();
 +
 +if (pci_enabled) {
 +pci_bus = xen_igd_i440fx_init(i440fx_state, piix3_devfn, isa_bus,
 +  gsi, system_memory, system_io, 
 machine-ram_size,
 +  below_4g_mem_size,
 +  above_4g_mem_size,
 +  pci_memory, ram_memory);
 +} else {
 +pci_bus = NULL;
 +i440fx_state = NULL;

what does this do?

 +isa_bus = isa_bus_new(NULL, system_io);
 +no_hpet = 1;
 +}

no_hpet is code duplicated from pc in chunk above, better to move to
a common function.

 +}
 +
  static void pc_machine_device_init(MachineState *machine,
   int pci_enabled,
   GSIState *gsi_state,
 @@ -337,11 +364,39 @@ static void pc_init1(MachineState *machine,
  piix3_devfn, fw_cfg, pci_bus, isa_bus, gsi);
  }
  
 +static void xen_igd_pc_init1(MachineState *machine,
 + int pci_enabled,
 + int kvmclock_enabled)
 +{
 +PCIBus *pci_bus = NULL;
 +ISABus *isa_bus = NULL;
 +PCII440FXState *i440fx_state = NULL;
 +int piix3_devfn = -1;
 +qemu_irq *gsi = NULL;
 +GSIState *gsi_state = NULL;
 +MemoryRegion *ram_memory = NULL;
 +MemoryRegion *pci_memory = NULL;
 +DeviceState *icc_bridge = NULL;
 +FWCfgState *fw_cfg = NULL;
 +
 +pc_machine_base_init(machine, pci_enabled, kvmclock_enabled, icc_bridge,
 +ram_memory, pci_memory, gsi, gsi_state, fw_cfg);
 +xen_igd_pc_machine_pci_bus_init(machine, pci_enabled, i440fx_state,
 +piix3_devfn, pci_bus, isa_bus, gsi, pci_memory, 
 ram_memory);
 +pc_machine_device_init(machine, pci_enabled, gsi_state, icc_bridge,
 +piix3_devfn, fw_cfg, pci_bus, isa_bus, gsi);
 +}
 +
  static void pc_init_pci(MachineState *machine)
  {
  pc_init1(machine, 1, 1);
  }
  
 +static void xen_igd_pc_init_pci(MachineState *machine)
 +{
 +xen_igd_pc_init1(machine, 1, 1);
 +}
 +
  static void pc_compat_2_0(MachineState *machine)
  {
  smbios_legacy_mode = true;
 @@ -470,6 +525,17 @@ static void pc_xen_hvm_init(MachineState *machine)
  pci_create_simple(bus, -1, xen-platform);
  }
  }
 +static void xen_igd_pc_hvm_init(MachineState *machine)
 +{
 +PCIBus *bus;
 +
 +xen_igd_pc_init_pci(machine);
 +
 +bus = pci_find_primary_bus();
 +if (bus != NULL) {
 +pci_create_simple(bus, -1, xen-platform);
 +}
 +}
  #endif
  
  #define PC_I440FX_MACHINE_OPTIONS \
 @@ -919,6 +985,26 @@ static QEMUMachine xenfv_machine = {
  { /* end of list */ }
  },
  };
 +static QEMUMachine xenigd_machine = {
 +PC_COMMON_MACHINE_OPTIONS,
 +.name = xenigd,
 +.desc = Xen Fully-virtualized PC specific to IGD,
 +.init = xen_igd_pc_hvm_init,
 +.max_cpus = HVM_MAX_VCPUS,
 +.default_machine_opts = accel=xen,
 +.hot_add_cpu = pc_hot_add_cpu,
 +.compat_props = (GlobalProperty[]) {
 +/* xenfv has no fwcfg and so does not load acpi from QEMU.
 + * as such new acpi features don't work.
 + */
 +{
 +.driver   = PIIX4_PM,
 +.property = acpi-pci-hotplug-with-bridge-support,
 +.value= off,
 +},
 +{ /* end of list */ }
 +},
 +};
  #endif
  
  static void pc_machine_init(void)
 @@ -942,6 +1028,7 @@ static void pc_machine_init(void)
  qemu_register_pc_machine(isapc_machine);
  #ifdef CONFIG_XEN
  qemu_register_pc_machine(xenfv_machine);
 +qemu_register_pc_machine(xenigd_machine);
  #endif
  }
  
 -- 
 1.9.1

Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread Christian Borntraeger

On 28/07/14 16:22, Alexander Graf wrote:

On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:

On 10.07.14 15:10, Christian Borntraeger wrote:
From: David Hildenbrand d...@linux.vnet.ibm.com

If a cpu is stopped, it must never be allowed to run and no interrupt may
wake it
up. A cpu also has to be unhalted if it is halted and has work to do - this
scenario wasn't hit in kvm case yet, as only disabled wait is processed
within
QEMU.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

This looks like it's something that generic infrastructure should take
care of, no? How does this work for the other archs? They always get an
interrupt on the transition between !has_work - has_work. Why don't we
get one for s390x?

Alex

Well, we have the special case on s390 as a CPU that is in the STOPPED or the
CHECK STOP state may never run - even if there is an interrupt. It's
basically like this CPU has been switched off.

Imagine that it is tried to inject an interrupt into a stopped vcpu. It
will kick the stopped vcpu and thus lead to a call to
kvm_arch_process_async_events(). We have to deny that this vcpu will ever
run as long as it is stopped. It's like a way to suppress the
interrupt for such a transition you mentioned.

An interrupt kick usually just means we go back into the main loop. From
there we check the interrupt bitmap which interrupt to handle. Check out the
handling code here:

http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580

If you just check for the stopped state in here, do_interrupt() will never
get called and thus the CPU shouldn't ever get executed. Unless I'm heavily
mistaken :).

Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
SIGP START to that vcpu).

Yes, in that case that other CPU generates a signal (a different bit in
interrupt_request) and the first CPU would see that it has to wake up and
wake up.

I am not sure if such a mechanism/scenario is applicable to any other arch.
They
all seem to reset the cs-halted flag if they know they are able to run (e.g.
due to an interrupt) - they have no such thing as stopped cpus, only
halted/waiting cpus.

There's not really much difference between the two. The only difference from
a software point of view is that a stopped CPU has its external interrupt
bits masked off, no?

We have
- wait (wait bit in PSW)
- disabled wait (wait bit and interrupt fencing in PSW)
- STOPPED (not related to PSW, state change usually handled via service
processor or hypervisor)

I think we have to differentiate between KVM/TCG. On KVM we always do in kernel
halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take
care of the normal wait as well.

From a first glimpse, a disabled wait and STOPPED look similar, but there are
(important) differences, e.g. other CPUs get a different a different result
from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that
send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on
hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu
states natively makes sense.

The question is now only how to model it correctly without breaking TCG/KVM and
reuse as much common code as possible. Correct?

Do I understand you correctly, that your collapsing of stopped and halted is
only in the qemu coding sense, IOW maybe we could just modify
kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp
implementation does not support SMP anyway?
David would that work?

Christian

[Qemu-devel] [Bug 1224444] Re: virtio-serial loses writes when used over virtio-mmio

2014-07-29 Thread Richard Jones

FWIW I am able to reproduce this quite easily on aarch64 too.

My test program is:
https://github.com/libguestfs/libguestfs/blob/master/tests/qemu/qemu-speed-test.c

and you use it like this:
qemu-speed-test --virtio-serial-upload

(You can also test virtio-serial downloads and a few other things, but
those don't appear to deadlock)

Slowing down the upload, even just by enabling debugging, is sufficient
to make the problem go away most of the time.

I am testing with qemu from git
(f45c56e0166e86d3b309ae72f4cb8e3d0949c7ef).

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/122

Title:
  virtio-serial loses writes when used over virtio-mmio

Status in QEMU:
  New

Bug description:
  virtio-serial appears to lose writes, but only when used on top of
  virtio-mmio.  The scenario is this:

  /home/rjones/d/qemu/arm-softmmu/qemu-system-arm \
  -global virtio-blk-device.scsi=off \
  -nodefconfig \
  -nodefaults \
  -nographic \
  -M vexpress-a15 \
  -machine accel=kvm:tcg \
  -m 500 \
  -no-reboot \
  -kernel /home/rjones/d/libguestfs/tmp/.guestfs-1001/kernel.27944 \
  -dtb /home/rjones/d/libguestfs/tmp/.guestfs-1001/dtb.27944 \
  -initrd /home/rjones/d/libguestfs/tmp/.guestfs-1001/initrd.27944 \
  -device virtio-scsi-device,id=scsi \
  -drive 
file=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/scratch.1,cache=unsafe,format=raw,id=hd0,if=none
 \
  -device scsi-hd,drive=hd0 \
  -drive 
file=/home/rjones/d/libguestfs/tmp/.guestfs-1001/root.27944,snapshot=on,id=appliance,cache=unsafe,if=none
 \
  -device scsi-hd,drive=appliance \
  -device virtio-serial-device \
  -serial stdio \
  -chardev 
socket,path=/home/rjones/d/libguestfs/tmp/libguestfsLa9dE2/guestfsd.sock,id=channel0
 \
  -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
  -append 'panic=1 mem=500M console=ttyAMA0 udevtimeout=600 no_timer_check 
acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 
guestfs_verbose=1 TERM=xterm-256color'

  After the guest starts up, a daemon writes 4 bytes to a virtio-serial
  socket.  The host side reads these 4 bytes correctly and writes a 64
  byte message.  The guest never sees this message.

  I enabled virtio-mmio debugging, and this is what is printed (## = my
  comment):

  ## guest opens the socket:
  trying to open virtio-serial channel 
'/dev/virtio-ports/org.libguestfs.channel.0'
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x3
  opened the socket, sock = 3
  udevadm settle
  ## guest writes 4 bytes to the socket:
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x5
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  sent magic GUESTFS_LAUNCH_FLAG
  ## host reads 4 bytes successfully:
  main_loop libguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [14605ms] appliance is up
  Guest launched OK.
  ## host writes 64 bytes to socket:
  libguestfs: writing the data to the socket (size = 64)
  waiting for next request
  libguestfs: data written OK
  ## hangs here forever with guest in read() call never receiving any data

  I am using qemu from git today (2d1fe1873a984).

  Some notes:

  - It's not 100% reproducible.  Sometimes everything works fine, although it 
fails often (eg  2/3rds of the time).
  - KVM is being used.
  - We've long used virtio-serial on x86 and have never seen anything like this.

  This is what the output looks like when it doesn't fail:

  trying to open virtio-serial channel 
'/dev/virtio-ports/org.libguestfs.channel.0
  '
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x3
  opened the socket, sock = 3
  udevadm settle
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x5
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  sent magic GUESTlibguestfs: recv_from_daemon: received GUESTFS_LAUNCH_FLAG
  libguestfs: [14710ms] appliance is up
  Guest launched OK.
  libguestfs: writing the data to the socket (size = 64)
  FS_LAUNCH_FLAG
  main_loop waiting for next request
  libguestfs: data written OK
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x2
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_write offset 0x50 value 0x2
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio setting IRQ 1
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x1
  virtio_mmio: virtio_mmio setting IRQ 0
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset 0x64 value 0x0
  virtio_mmio: virtio_mmio setting IRQ 0
  virtio_mmio: virtio_mmio_read offset 0x60
  virtio_mmio: virtio_mmio_write offset

Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread Alexander Graf



On 29.07.14 13:44, Christian Borntraeger wrote:

On 28/07/14 16:22, Alexander Graf wrote:

On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:


On 10.07.14 15:10, Christian Borntraeger wrote:

From: David Hildenbrand d...@linux.vnet.ibm.com

If a cpu is stopped, it must never be allowed to run and no interrupt may wake 
it
up. A cpu also has to be unhalted if it is halted and has work to do - this
scenario wasn't hit in kvm case yet, as only disabled wait is processed within
QEMU.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

This looks like it's something that generic infrastructure should take
care of, no? How does this work for the other archs? They always get an
interrupt on the transition between !has_work - has_work. Why don't we
get one for s390x?


Alex



Well, we have the special case on s390 as a CPU that is in the STOPPED or the
CHECK STOP state may never run - even if there is an interrupt. It's
basically like this CPU has been switched off.

Imagine that it is tried to inject an interrupt into a stopped vcpu. It
will kick the stopped vcpu and thus lead to a call to
kvm_arch_process_async_events(). We have to deny that this vcpu will ever
run as long as it is stopped. It's like a way to suppress the
interrupt for such a transition you mentioned.

An interrupt kick usually just means we go back into the main loop. From there 
we check the interrupt bitmap which interrupt to handle. Check out the handling 
code here:

   
http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580

If you just check for the stopped state in here, do_interrupt() will never get 
called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
mistaken :).


Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
SIGP START to that vcpu).

Yes, in that case that other CPU generates a signal (a different bit in 
interrupt_request) and the first CPU would see that it has to wake up and wake 
up.


I am not sure if such a mechanism/scenario is applicable to any other arch. They
all seem to reset the cs-halted flag if they know they are able to run (e.g.
due to an interrupt) - they have no such thing as stopped cpus, only
halted/waiting cpus.

There's not really much difference between the two. The only difference from a software 
point of view is that a stopped CPU has its external interrupt bits masked 
off, no?

We have
- wait (wait bit in PSW)
- disabled wait (wait bit and interrupt fencing in PSW)
- STOPPED (not related to PSW, state change usually handled via service 
processor or hypervisor)

I think we have to differentiate between KVM/TCG. On KVM we always do in kernel 
halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take 
care of the normal wait as well.

 From a first glimpse, a disabled wait and STOPPED look similar, but there are 
(important) differences, e.g. other CPUs get a different a different result 
from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that 
send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on 
hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu 
states natively makes sense.

The question is now only how to model it correctly without breaking TCG/KVM and 
reuse as much common code as possible. Correct?

Do I understand you correctly, that your collapsing of stopped and halted is 
only in the qemu coding sense, IOW maybe we could just modify 
kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp 
implementation does not support SMP anyway?


That works for me, yes.


Alex

[Qemu-devel] Live migration debugging

2014-07-29 Thread Paul Boven


Hi folks,

Recently there's been several patches to fix kvmclock issues during 
migrations, which were subsequently reverted. I hope the observations 
below can be helpful in pinning down the actual issues to make live 
migration work again in the future.


Live migration has been broken since at least release 1.4.0 (as shipped 
with Ubuntu 13.04), and still has the same problems in 2.1.0-rc2, but 
briefly worked in 2.0-git-20140609.


The problem is that once the live migration is complete and the guest 
gets started on the destination server, it will hang for a long time, 
consuming 100% cpu. This can be mere seconds, but I've also observed 
hangs for as long as 11 minutes. And then suddenly the guest starts to 
respond again as if nothing happens, but its clock has not progressed at 
all while the machine was hanging.


What I have observed is that the time spent hanging is exactly the 
difference between the clock rate of the host, and the 'real' (NTP) 
time. If you multiply the time since the previous migration with the PPM 
offset as determined by NTP (see /var/lib/ntp/ntp.drift), that is 
exactly how many seconds the guest will spend at 100% CPU before 
becoming responsive again. I have observed this on two different pairs 
of KVM servers. Each of the servers has a negative PPM value according 
to NTP.


Example: a guest having nearly 9 days of uptime, with (according to NTP) 
a clock rate of -34 ppm, froze for 27 seconds when I migrated it. I have 
done quite a few test migrations, and this relationship holds quite 
precisely.


As the duration of the freeze is proportional to the time since the 
previous migration, debugging is a bit difficult as you have to wait a 
while before you can demonstrate the problem. It is also probably a 
reason this problem is underreported, because it is not very noticeable 
if you do it right after starting the VM, but looks like a complete 
crash if you have a few months of uptime.


With the 2.0 sources from 2014-06-09, the problem does *not* occur. A 
side-effect of that patch is that the guest clock has a lot of jitter 
until the first migration, but behaves normally (yet without hangs) on 
subsequent migrations.


Is there a way that I can directly read the kvmclock from the guest or 
host, so we can compare them before and after migration, and see what 
goes wrong precisely?


See also https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218

Regards, Paul Boven.
--
Paul Boven bo...@jive.nl +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science

Re: [Qemu-devel] [PATCH v2] Tap: fix vcpu long time io blocking on tap

2014-07-29 Thread Wangkai (Kevin,C)

 -Original Message-
 From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
 Sent: Monday, July 28, 2014 11:53 PM
 To: Wangkai (Kevin,C)
 Cc: qemu-devel@nongnu.org; Lee yang; aligu...@amazon.com; Stefan
 Hajnoczi
 Subject: Re: [Qemu-devel] [PATCH v2] Tap: fix vcpu long time io
 blocking on tap

 On Fri, Jul 18, 2014 at 09:33:42AM +, Wangkai (Kevin,C) wrote:
  fix vcpu long time io blocking on tap, when too many packets was
  delivered to the guest os via tap interface.

  --
  Signed-off-by: Wangkai wangka...@huawei.com

 Thanks, applied to my net-next tree:
 https://github.com/stefanha/qemu/commits/net-next

 The patch did not apply cleanly so I had to do it manually:

   Applying: Tap: fix vcpu long time io blocking on tap
   fatal: corrupt patch at line 32
   Repository lacks necessary blobs to fall back on 3-way merge.
   Cannot fall back to three-way merge.
   Patch failed at 0001 Tap: fix vcpu long time io blocking on tap

 Please use git-send-email(1).

 I also adjusted the commit message and doc comments to fit QEMU style
 and for grammar.

 Stefan
[Wangkai (Kevin,C)] 

Hi Stefan,
I will send a patch v3 use git send-email to you.

Wangkai

Re: [Qemu-devel] [RFC PATCH v2 07/49] kvmapic: fixing loading vmstate

2014-07-29 Thread Pavel Dovgaluk

 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo 
 Bonzini
 Il 17/07/2014 13:02, Pavel Dovgalyuk ha scritto:
  diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
  index ce3d903..9d75ee0 100644
  --- a/hw/intc/apic_common.c
  +++ b/hw/intc/apic_common.c
  @@ -347,7 +347,7 @@ static int apic_dispatch_post_load(void *opaque, int 
  version_id)
 
   static const VMStateDescription vmstate_apic_common = {
   .name = apic,
  -.version_id = 3,
  +.version_id = 4,
   .minimum_version_id = 3,
   .minimum_version_id_old = 1,
   .load_state_old = apic_load_old,
  @@ -374,6 +374,9 @@ static const VMStateDescription vmstate_apic_common = {
   VMSTATE_INT64(next_time, APICCommonState),
   VMSTATE_INT64(timer_expiry,
 APICCommonState), /* open-coded timer state */
  +VMSTATE_INT32_V(sipi_vector, APICCommonState, 4),
  +VMSTATE_INT32_V(wait_for_sipi, APICCommonState, 4),
 
 This could be a subsection.  sipi_vector is only used (needed) if 
 wait_for_sipi != 0.

  Right, sipi_vector is used when wait_for_sipi != 0. But we can set 
sipi_vector to non-zero,
save the snapshot, and then set wait_for_sipi. If that snapshot will be loaded, 
sipi_vector become incorrect.
Isn't this scenario possible?

Pavel Dovgalyuk

Re: [Qemu-devel] [RFC PATCH v2 07/49] kvmapic: fixing loading vmstate

Il 29/07/2014 14:03, Pavel Dovgaluk ha scritto:
  
  This could be a subsection.  sipi_vector is only used (needed) if 
  wait_for_sipi != 0.
   Right, sipi_vector is used when wait_for_sipi != 0. But we can set 
 sipi_vector to non-zero,
 save the snapshot, and then set wait_for_sipi. If that snapshot will be 
 loaded, sipi_vector become incorrect.
 Isn't this scenario possible?

sipi_vector will not be used until CPU_INTERRUPT_SIPI is set, and then
sipi_vector will have been overwritten with a new value.  The
architecture guarantees that.

Paolo

Re: [Qemu-devel] [PATCH v6 3/3] sclp-s390: Add memory hotplug SCLPs

2014-07-29 Thread Christian Borntraeger

On 30/06/14 16:00, Matthew Rosato wrote:
 Add memory information to read SCP info and add handlers for
 Read Storage Element Information, Attach Storage Element,
 Assign Storage and Unassign
 
 Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com
 ---
  hw/s390x/sclp.c|  259 
 ++--
  target-s390x/cpu.h |   15 +++
  target-s390x/kvm.c |5 +
  3 files changed, 273 insertions(+), 6 deletions(-)
 
 diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
 index 769d7c3..936b189 100644
 --- a/hw/s390x/sclp.c
 +++ b/hw/s390x/sclp.c
 @@ -16,7 +16,8 @@
  #include sysemu/kvm.h
  #include exec/memory.h
  #include sysemu/sysemu.h
 -
 +#include exec/address-spaces.h
 +#include qemu/config-file.h
  #include hw/s390x/sclp.h
  #include hw/s390x/event-facility.h
 
 @@ -33,10 +34,19 @@ static inline SCLPEventFacility *get_event_facility(void)
  static void read_SCP_info(SCCB *sccb)
  {
  ReadInfo *read_info = (ReadInfo *) sccb;
 +sclpMemoryHotplugDev *mhd = get_sclp_memory_hotplug_dev();
  CPUState *cpu;
 -int shift = 0;
  int cpu_count = 0;
  int i = 0;
 +int increment_size = 20;
 +int rnsize, rnmax;
 +QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory), NULL);
 +int slots = qemu_opt_get_number(opts, slots, 0);
 +int max_avail_slots = s390_get_memslot_count(kvm_state);
 +
 +if (slots  max_avail_slots) {
 +slots = max_avail_slots;
 +}
 
  CPU_FOREACH(cpu) {
  cpu_count++;
 @@ -54,14 +64,235 @@ static void read_SCP_info(SCCB *sccb)
 
  read_info-facilities = cpu_to_be64(SCLP_HAS_CPU_INFO);
 
 -while ((ram_size  (20 + shift))  65535) {
 -shift++;
 +/*

I was about to apply this series, but I think it breaks the non-ccw machine for 
strange memory sizes (e.g. 40001MB).
The s390-virtio.c file still does a shifting via 65535. Maybe just do a change 
over there as well?

You should also start making up your mind about migration and prepare patches. 
I will send cpu migration patches soon.

Christian



 + * The storage increment size is a multiple of 1M and is a power of 2.
 + * The number of storage increments must be MAX_STORAGE_INCREMENTS or 
 fewer.
 + */
 +while ((ram_size  increment_size)  MAX_STORAGE_INCREMENTS) {
 +increment_size++;
 +}
 +rnmax = ram_size  increment_size;
 +
 +/* Memory Hotplug is only supported for the ccw machine type */
 +if (mhd) {
 +while ((mhd-standby_mem_size  increment_size) 
 +   MAX_STORAGE_INCREMENTS) {
 +increment_size++;
 +}
 +assert(increment_size == mhd-increment_size);
 +
 +mhd-standby_subregion_size = MEM_SECTION_SIZE;
 +/* Deduct the memory slot already used for core */
 +if (slots  0) {
 +while ((mhd-standby_subregion_size * (slots - 1)
 + mhd-standby_mem_size)) {
 +mhd-standby_subregion_size = mhd-standby_subregion_size  
 1;
 +}
 +}
 +/*
 + * Initialize mapping of guest standby memory sections indicating 
 which
 + * are and are not online. Assume all standby memory begins offline.
 + */
 +if (mhd-standby_state_map == 0) {
 +if (mhd-standby_mem_size % mhd-standby_subregion_size) {
 +mhd-standby_state_map = g_malloc0((mhd-standby_mem_size /
 + mhd-standby_subregion_size + 
 1) *
 + (mhd-standby_subregion_size /
 + MEM_SECTION_SIZE));
 +} else {
 +mhd-standby_state_map = g_malloc0(mhd-standby_mem_size /
 +   MEM_SECTION_SIZE);
 +}
 +}
 +mhd-padded_ram_size = ram_size + mhd-pad_size;
 +mhd-rzm = 1  mhd-increment_size;
 +rnmax = ((ram_size + mhd-standby_mem_size + mhd-pad_size)
 +  mhd-increment_size);
 +
 +read_info-facilities |= 
 cpu_to_be64(SCLP_FC_ASSIGN_ATTACH_READ_STOR);
 +}
 +
 +rnsize = 1  (increment_size - 20);
 +if (rnsize = 128) {
 +read_info-rnsize = rnsize;
 +} else {
 +read_info-rnsize = 0;
 +read_info-rnsize2 = cpu_to_be32(rnsize);
  }
 -read_info-rnmax = cpu_to_be16(ram_size  (20 + shift));
 -read_info-rnsize = 1  shift;
 +
 +if (rnmax  0x1) {
 +read_info-rnmax = cpu_to_be16(rnmax);
 +} else {
 +read_info-rnmax = cpu_to_be16(0);
 +read_info-rnmax2 = cpu_to_be64(rnmax);
 +}
 +
  sccb-h.response_code = cpu_to_be16(SCLP_RC_NORMAL_READ_COMPLETION);
  }
 
 +static void read_storage_element0_info(SCCB *sccb)
 +{
 +int i, assigned;
 +int subincrement_id = SCLP_STARTING_SUBINCREMENT_ID;
 +ReadStorageElementInfo *storage_info = (ReadStorageElementInfo *) sccb;
 +sclpMemoryHotplugDev *mhd =

Re: [Qemu-devel] [PULL 0/8] pc migration fixes

On 29 July 2014 11:48, Michael S. Tsirkin m...@redhat.com wrote:
 The following changes since commit c60a57ff497667780132a3fcdc1500c83af5d5c0:

   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
 staging (2014-07-25 16:58:41 +0100)

 are available in the git repository at:

   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

 for you to fetch changes up to f47337cb9181a1f9339b9b0516b92fe77a22f0f3:

   piix: set legacy table size for 1.7 (2014-07-29 12:26:12 +0200)

 
 pc migration fixes

 Last minute fixes for migration.
 It seems that if we don't fix it now, fixing
 it in the next version will be even more painful ...

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 
 Igor Mammedov (1):
   pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is 
 disabled

 Michael S. Tsirkin (3):
   acpi-build: minor code cleanup
   acpi-build: tweak acpi migration limits
   piix: set legacy table size for 1.7

 Paolo Bonzini (4):
   acpi-dsdt: procedurally generate _PRT
   pc: hack for migration compatibility from QEMU 2.0
   bios-tables-test: fix ASL normalization false positive
   pc: future-proof migration-compatibility of ACPI tables

Applied, thanks.

-- PMM

[Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices

This patch set is based on a patch suggested by Einar Lueck
on Feb 08, 2013.

This patch set introduces:
1. s390x specific geometry detection:
Add s390 specific version of hd_geometry_guess function,
which uses HDIO_GETGEO ioctl.

2. A set of blocksize patches for autodetection of logical and
physical blocksizes. Change history:
2.1
Original blocksize patch only configured autolookup for
virtio-blk devices. There was a request from Stefan Hajnoczi
to make this architecture-independent. Now autolookup is
configured by default for all block devices.
2.2
Add driver method to probe blocksizes for raw and
host_device drivers.
(also requested by the reviewers of the original patch)

Ekaterina Tumanova (4):
  hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
  blocksize: support auto-sensing of blocksizes
  blocksize: Add driver method to get the blocksizes
  blocksize: add blkconf_blocksize call to all block devices

 block.c   | 12 +
 block/raw-posix.c | 69 ++-
 block/raw_bsd.c   | 14 ++
 hw/block/Makefile.objs|  6 -
 hw/block/block.c  | 25 +
 hw/block/hd-geometry.c| 56 ++
 hw/block/nvme.c   |  1 +
 hw/block/virtio-blk.c |  1 +
 hw/core/qdev-properties.c |  4 ++-
 hw/ide/qdev.c |  1 +
 hw/scsi/scsi-disk.c   |  1 +
 hw/usb/dev-storage.c  |  1 +
 include/block/block.h |  1 +
 include/block/block_int.h |  5 
 include/hw/block/block.h  |  6 +++--
 15 files changed, 180 insertions(+), 23 deletions(-)

-- 
1.8.5.5

[Qemu-devel] [PATCH 3/4] blocksize: Add driver method to get the blocksizes

This patch introduces a new method of defining the physical
and logical blocksizes for raw and host_device drivers.
It uses various ioctls to determine the logical and physical
blocksizes.

For detecting the logical size it uses ioctl calls, that
were previously coded in raw_probe_alignment (now moved into
separate function probe_logical_blocksize). For detecting the
physical blocksize it uses BLKPBSZGET ioctl.

For raw devices driver calls the child's method.

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 block/raw-posix.c | 69 ---
 block/raw_bsd.c   | 14 +++
 2 files changed, 64 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 8e9758e..6e0d80c 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -221,50 +221,70 @@ static int raw_normalize_devicepath(const char **filename)
 }
 #endif
 
-static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
+static unsigned int probe_logical_blocksize(BlockDriverState *bs, int fd)
 {
-BDRVRawState *s = bs-opaque;
-char *buf;
-unsigned int sector_size;
-
-/* For /dev/sg devices the alignment is not really used.
-   With buffered I/O, we don't have any restrictions. */
-if (bs-sg || !(s-open_flags  O_DIRECT)) {
-bs-request_alignment = 1;
-s-buf_align = 1;
-return;
-}
+unsigned int sector_size = 0;
 
 /* Try a few ioctls to get the right size */
-bs-request_alignment = 0;
-s-buf_align = 0;
-
 #ifdef BLKSSZGET
 if (ioctl(fd, BLKSSZGET, sector_size) = 0) {
-bs-request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef DKIOCGETBLOCKSIZE
 if (ioctl(fd, DKIOCGETBLOCKSIZE, sector_size) = 0) {
-bs-request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef DIOCGSECTORSIZE
 if (ioctl(fd, DIOCGSECTORSIZE, sector_size) = 0) {
-bs-request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef CONFIG_XFS
 if (s-is_xfs) {
 struct dioattr da;
 if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, da) = 0) {
-bs-request_alignment = da.d_miniosz;
+sector_size = da.d_miniosz;
 /* The kernel returns wrong information for d_mem */
 /* s-buf_align = da.d_mem; */
+return sector_size;
 }
 }
 #endif
 
+return 0;
+}
+
+static unsigned int probe_physical_blocksize(BlockDriverState *bs, int fd)
+{
+unsigned int blk_size = 0;
+#ifdef BLKPBSZGET
+if (ioctl(fd, BLKPBSZGET, blk_size) = 0) {
+return blk_size;
+}
+#endif
+
+return 0;
+}
+
+static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
+{
+BDRVRawState *s = bs-opaque;
+char *buf;
+
+/* For /dev/sg devices the alignment is not really used.
+   With buffered I/O, we don't have any restrictions. */
+if (bs-sg || !(s-open_flags  O_DIRECT)) {
+bs-request_alignment = 1;
+s-buf_align = 1;
+return;
+}
+
+s-buf_align = 0;
+/* Let's try to use the logical blocksize for the alignment. */
+bs-request_alignment = probe_logical_blocksize(bs, fd);
+
 /* If we could not get the sizes so far, we can only guess them */
 if (!s-buf_align) {
 size_t align;
@@ -642,6 +662,16 @@ static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
 bs-bl.opt_mem_alignment = s-buf_align;
 }
 
+static int hdev_probe_blocksizes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs-opaque;
+
+bs-logical_block_size = probe_logical_blocksize(bs, s-fd);
+bs-physical_block_size = probe_physical_blocksize(bs, s-fd);
+
+return 0;
+}
+
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
 {
 int ret;
@@ -2009,6 +2039,7 @@ static BlockDriver bdrv_host_device = {
 .bdrv_get_info = raw_get_info,
 .bdrv_get_allocated_file_size
 = raw_get_allocated_file_size,
+.bdrv_probe_blocksizes = hdev_probe_blocksizes,
 
 .bdrv_detach_aio_context = raw_detach_aio_context,
 .bdrv_attach_aio_context = raw_attach_aio_context,
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index f82f4c2..ae400a6 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -173,6 +173,19 @@ static int raw_probe(const uint8_t *buf, int buf_size, 
const char *filename)
 return 1;
 }
 
+static int raw_probe_blocksizes(BlockDriverState *bs)
+{
+int ret;
+
+ret = bdrv_probe_blocksizes(bs-file);
+if (ret == 0) {
+bs-logical_block_size = bs-file-logical_block_size;
+bs-physical_block_size = bs-file-physical_block_size;
+}
+
+return ret;
+}
+
 static BlockDriver bdrv_raw = {
 .format_name  = raw,
 .bdrv_probe   = raw_probe,
@@ -190,6 +203,7 @@ static BlockDriver bdrv_raw = {

[Qemu-devel] [PATCH 2/4] blocksize: support auto-sensing of blocksizes

The block device model does not impose fixed block sizes for
access to backing devices. This patch introduces support for
auto lookup of the block sizes of the backing block device.

To achieve this, a new function blkconf_blocksizes is
implemented. This function tries to get values
from the block driver. If this does not work 512 is used,
so the default excecution logic is not changed.

Based on 2013 patch from Einar Lueck elelu...@de.ibm.com

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 block.c   | 12 
 hw/block/block.c  | 25 +
 hw/core/qdev-properties.c |  4 +++-
 include/block/block.h |  1 +
 include/block/block_int.h |  5 +
 include/hw/block/block.h  |  2 ++
 6 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 8cf519b..67de6e9 100644
--- a/block.c
+++ b/block.c
@@ -552,6 +552,18 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error 
**errp)
 }
 }
 
+int bdrv_probe_blocksizes(BlockDriverState *bs)
+{
+BlockDriver *drv = bs-drv;
+
+assert(drv != NULL);
+if (drv-bdrv_probe_blocksizes) {
+return drv-bdrv_probe_blocksizes(bs);
+}
+
+return -1;
+}
+
 /*
  * Create a uniquely-named empty temporary file.
  * Return 0 upon success, otherwise a negative errno value.
diff --git a/hw/block/block.c b/hw/block/block.c
index 33dd3f3..29a0227 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -10,6 +10,10 @@
 #include sysemu/blockdev.h
 #include hw/block/block.h
 #include qemu/error-report.h
+#include block/block_int.h
+#ifdef __linux__
+#include linux/fs.h
+#endif
 
 void blkconf_serial(BlockConf *conf, char **serial)
 {
@@ -22,6 +26,27 @@ void blkconf_serial(BlockConf *conf, char **serial)
 }
 }
 
+void blkconf_blocksizes(BlockConf *conf)
+{
+BlockDriverState *bs = conf-bs;
+
+/* default values as a basis - if probing fails */
+conf-physical_block_size = BLOCK_PROPERTY_STD_BLKSIZE;
+conf-logical_block_size = BLOCK_PROPERTY_STD_BLKSIZE;
+if (bdrv_probe_blocksizes(conf-bs)  0) {
+return;
+}
+if (bs-logical_block_size) {
+conf-logical_block_size = (uint16_t)(bs-logical_block_size);
+}
+if (bs-physical_block_size) {
+conf-physical_block_size = (uint16_t)(bs-physical_block_size);
+} else if (conf-logical_block_size) {
+/* if driver sets no physical size, try to use logical size */
+conf-physical_block_size = conf-logical_block_size;
+}
+}
+
 int blkconf_geometry(BlockConf *conf, int *ptrans,
  unsigned cyls_max, unsigned heads_max, unsigned secs_max)
 {
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 3d12560..49fb1e3 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -579,7 +579,9 @@ static void set_blocksize(Object *obj, Visitor *v, void 
*opaque,
 error_propagate(errp, local_err);
 return;
 }
-if (value  min || value  max) {
+
+/* value == 0 indicates that block size should be sensed later on */
+if ((value  min || value  max)  value  0) {
 error_set(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
   dev-id?:, name, (int64_t)value, min, max);
 return;
diff --git a/include/block/block.h b/include/block/block.h
index f08471d..43af424 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -583,6 +583,7 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs);
  * the old #AioContext is not executing.
  */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
+int bdrv_probe_blocksizes(BlockDriverState *bs);
 
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7b541a0..572e954 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -266,6 +266,8 @@ struct BlockDriver {
 void (*bdrv_io_unplug)(BlockDriverState *bs);
 void (*bdrv_flush_io_queue)(BlockDriverState *bs);
 
+int (*bdrv_probe_blocksizes)(BlockDriverState *bs);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
@@ -360,6 +362,9 @@ struct BlockDriverState {
 /* the block size for which the guest device expects atomicity */
 int guest_block_size;
 
+unsigned int physical_block_size;
+unsigned int logical_block_size;
+
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 7c3d6c8..7a0092e 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -40,6 +40,7 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
 return exp;
 }
 
+#define BLOCK_PROPERTY_STD_BLKSIZE 512
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)  \

[Qemu-devel] [PATCH 1/4] hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x

This patch extends the function hd_geometry_guess. It introduces a
target specific hook. The default implementation for this target
specific hook is empty, has therefore no effect and the existing logic
works as before.

For target-s390x, the behaviour is chosen as follows:
If no geo could be guessed via guess_disk_lchs, a new function called
guess_disk_pchs is called. The latter utilizes HDIO_GET_GEO ioctl to ask
the underlying disk for geometry.
If this is not successful (e.g. image files) geometry is derived from the
size of the disk (as before).

The new HDIO_GETGEO logic is required for two use cases:
a) Support for geometries of Direct Attached Storage Disks (DASD)
on s390x configured as backing of virtio block devices.
b) Support for FCP attached SCSI disks that do not yet have a
partition table. Without this patch, fdisk -l on the host would
return different results then fdisk -l in the guest.

Based on 2013 patch from Einar Lueck elelu...@de.ibm.com

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/block/Makefile.objs |  6 +-
 hw/block/hd-geometry.c | 56 ++
 2 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index bf46f03..e4f6205 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += block.o cdrom.o hd-geometry.o
+common-obj-y += block.o cdrom.o
 common-obj-$(CONFIG_FDC) += fdc.o
 common-obj-$(CONFIG_SSI_M25P80) += m25p80.o
 common-obj-$(CONFIG_NAND) += nand.o
@@ -13,3 +13,7 @@ obj-$(CONFIG_SH4) += tc58128.o
 
 obj-$(CONFIG_VIRTIO) += virtio-blk.o
 obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += dataplane/
+
+# geometry is target/architecture dependent and therefore needs to be built
+# for every target platform
+obj-y += hd-geometry.o
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 6feb4f8..7988264 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -33,6 +33,10 @@
 #include block/block.h
 #include hw/block/block.h
 #include trace.h
+#ifdef __linux__
+#include linux/fs.h
+#include linux/hdreg.h
+#endif
 
 struct partition {
 uint8_t boot_ind;   /* 0x80 - active */
@@ -47,6 +51,37 @@ struct partition {
 uint32_t nr_sects;  /* nr of sectors in partition */
 } QEMU_PACKED;
 
+static void guess_chs_for_size(BlockDriverState *bs, uint32_t *pcyls,
+   uint32_t *pheads, uint32_t *psecs);
+
+/* try to get geometry from disk via HDIO_GETGEO ioctl
+   Return 0 if OK, -1 if ioctl does not work (e.g. image file) */
+static inline int guess_disk_pchs(BlockDriverState *bs, uint32_t *pcylinders,
+  uint32_t *pheads, uint32_t *psectors)
+{
+#ifdef __linux__
+struct hd_geometry geo;
+
+if (bdrv_ioctl(bs, HDIO_GETGEO, geo)) {
+return -1;
+}
+
+/* HDIO_GETGEO may return success even though geo contains zeros
+   (e.g. certain multipath setups) */
+if (!geo.heads || !geo.sectors || !geo.cylinders) {
+return -1;
+}
+
+*pheads = geo.heads;
+*psectors = geo.sectors;
+*pcylinders = geo.cylinders;
+return 0;
+#else
+return -1;
+#endif
+}
+
+
 /* try to guess the disk logical geometry from the MSDOS partition table.
Return 0 if OK, -1 if could not guess */
 static int guess_disk_lchs(BlockDriverState *bs,
@@ -116,6 +151,26 @@ static void guess_chs_for_size(BlockDriverState *bs,
 *psecs = 63;
 }
 
+#ifdef TARGET_S390X
+void hd_geometry_guess(BlockDriverState *bs, uint32_t *pcyls, uint32_t *pheads,
+   uint32_t *psecs, int *ptrans)
+{
+if (guess_disk_lchs(bs, (int *)pcyls, (int *)pheads, (int *)psecs) == 0) {
+goto done;
+}
+if (guess_disk_pchs(bs, pcyls, pheads, psecs) == 0) {
+goto done;
+}
+/* still no geometry - try to guess from disk size */
+guess_chs_for_size(bs, pcyls, pheads, psecs);
+done:
+if (ptrans) {
+*ptrans = BIOS_ATA_TRANSLATION_NONE;
+}
+trace_hd_geometry_guess(bs, *pcyls, *pheads, *psecs,
+BIOS_ATA_TRANSLATION_NONE);
+}
+#else
 void hd_geometry_guess(BlockDriverState *bs,
uint32_t *pcyls, uint32_t *pheads, uint32_t *psecs,
int *ptrans)
@@ -148,6 +203,7 @@ void hd_geometry_guess(BlockDriverState *bs,
 }
 trace_hd_geometry_guess(bs, *pcyls, *pheads, *psecs, translation);
 }
+#endif
 
 int hd_bios_chs_auto_trans(uint32_t cyls, uint32_t heads, uint32_t secs)
 {
-- 
1.8.5.5

[Qemu-devel] [PATCH 4/4] blocksize: add blkconf_blocksize call to all block devices

This patch add the blkconf_blocksize call to all
devices, which use DEFINE_BLOCK_PROPERTIES.
If the underlying driver function fails, blkconf_blocksizes
will set blocksizes to default (512) value.

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 hw/block/nvme.c  | 1 +
 hw/block/virtio-blk.c| 1 +
 hw/ide/qdev.c| 1 +
 hw/scsi/scsi-disk.c  | 1 +
 hw/usb/dev-storage.c | 1 +
 include/hw/block/block.h | 4 ++--
 6 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5fd8f89..50fe769 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -761,6 +761,7 @@ static int nvme_init(PCIDevice *pci_dev)
 if (!n-serial) {
 return -1;
 }
+blkconf_blocksizes(n-conf);
 
 pci_conf = pci_dev-config;
 pci_conf[PCI_INTERRUPT_PIN] = 1;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index c241c50..b5027b1 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -743,6 +743,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 }
 
 blkconf_serial(blk-conf, blk-serial);
+blkconf_blocksizes(blk-conf);
 s-original_wce = bdrv_enable_write_cache(blk-conf.bs);
 if (blkconf_geometry(blk-conf, NULL, 65535, 255, 255)  0) {
 error_setg(errp, Error setting geometry);
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index 6e475e6..6d94d8f 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -161,6 +161,7 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind)
 }
 
 blkconf_serial(dev-conf, dev-serial);
+blkconf_blocksizes(dev-conf);
 if (kind != IDE_CD
  blkconf_geometry(dev-conf, dev-chs_trans, 65536, 16, 255)  0) {
 return -1;
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index d47ecd6..bfae48b 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2250,6 +2250,7 @@ static int scsi_initfn(SCSIDevice *dev)
 }
 
 blkconf_serial(s-qdev.conf, s-serial);
+blkconf_blocksizes(s-qdev.conf);
 if (dev-type == TYPE_DISK
  blkconf_geometry(dev-conf, NULL, 65535, 255, 255)  0) {
 return -1;
diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index ae4efcb..cf50ac1 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -603,6 +603,7 @@ static int usb_msd_initfn_storage(USBDevice *dev)
 }
 
 blkconf_serial(s-conf, dev-serial);
+blkconf_blocksizes(s-conf);
 
 /*
  * Hack alert: this pretends to be a block device, but it's really
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 7a0092e..8560bb2 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -44,9 +44,9 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)  \
 DEFINE_PROP_DRIVE(drive, _state, _conf.bs),   \
 DEFINE_PROP_BLOCKSIZE(logical_block_size, _state, \
-  _conf.logical_block_size, 512),   \
+  _conf.logical_block_size, 0), \
 DEFINE_PROP_BLOCKSIZE(physical_block_size, _state,\
-  _conf.physical_block_size, 512),  \
+  _conf.physical_block_size, 0),\
 DEFINE_PROP_UINT16(min_io_size, _state, _conf.min_io_size, 0),  \
 DEFINE_PROP_UINT32(opt_io_size, _state, _conf.opt_io_size, 0),\
 DEFINE_PROP_INT32(bootindex, _state, _conf.bootindex, -1),\
-- 
1.8.5.5

Re: [Qemu-devel] [PATCH 0/4] Geometry and blocksize support for backing devices

2014-07-29 Thread Christian Borntraeger

On 29/07/14 14:27, Ekaterina Tumanova wrote:
 This patch set is based on a patch suggested by Einar Lueck
 on Feb 08, 2013.
 
 This patch set introduces:
 1. s390x specific geometry detection:
 Add s390 specific version of hd_geometry_guess function,
 which uses HDIO_GETGEO ioctl.
 
 2. A set of blocksize patches for autodetection of logical and
 physical blocksizes. Change history:
 2.1
 Original blocksize patch only configured autolookup for
 virtio-blk devices. There was a request from Stefan Hajnoczi
 to make this architecture-independent. Now autolookup is
 configured by default for all block devices.
 2.2
 Add driver method to probe blocksizes for raw and
 host_device drivers.
 (also requested by the reviewers of the original patch)
 
 Ekaterina Tumanova (4):
   hd-geometry.c: Integrate HDIO_GETGEO in guessing for target-s390x
   blocksize: support auto-sensing of blocksizes
   blocksize: Add driver method to get the blocksizes
   blocksize: add blkconf_blocksize call to all block devices
 
  block.c   | 12 +
  block/raw-posix.c | 69 
 ++-
  block/raw_bsd.c   | 14 ++
  hw/block/Makefile.objs|  6 -
  hw/block/block.c  | 25 +
  hw/block/hd-geometry.c| 56 ++
  hw/block/nvme.c   |  1 +
  hw/block/virtio-blk.c |  1 +
  hw/core/qdev-properties.c |  4 ++-
  hw/ide/qdev.c |  1 +
  hw/scsi/scsi-disk.c   |  1 +
  hw/usb/dev-storage.c  |  1 +
  include/block/block.h |  1 +
  include/block/block_int.h |  5 
  include/hw/block/block.h  |  6 +++--
  15 files changed, 180 insertions(+), 23 deletions(-)
 

CCing Kevin and Stefan.

Re: [Qemu-devel] [PATCH for-2.1] po: Update French translation

On 28 July 2014 22:44, Aurelien Jarno aurel...@aurel32.net wrote:
 Add new translations for recently added messages.

 Signed-off-by: Aurelien Jarno aurel...@aurel32.net
 ---
  po/fr_FR.po |   54 +++---

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH for-2.1] po: update Italian translation

On 29 July 2014 07:15, Paolo Bonzini pbonz...@redhat.com wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  po/it.po | 63 ---
  1 file changed, 40 insertions(+), 23 deletions(-)

Applied, thanks.

-- PMM

Re: [Qemu-devel] [Bug 1343827] [NEW] block.c: multiwrite_merge() truncates overlapping requests

On Mon, Jul 28, 2014 at 2:12 PM, Andrey Korolyov and...@xdel.ru wrote:
 bug is still here in the master and can be easily reproduced (and, of
 course, looks like blocker since data corruption takes place). Does
 anyone have an idea on when the fix (at least suggested one) will be
 merged?

I just sent the patches and CCed you.

The fix will not be included last-minute in QEMU 2.1 unless you have a
real-world application that depends on this behavior.  It will be
included in QEMU 2.2.  If this is critical for you, please reply with
details and we might be able to still get it into QEMU 2.1.

Stefan

Re: [Qemu-devel] [PATCH 2/2] target-mips/translate.c: Add judgement for msb and lsb

2014-07-29 Thread Elta


On 07/29/2014 06:52 AM, Aurelien Jarno wrote:

On Mon, Jul 28, 2014 at 11:34:30PM +0100, Peter Maydell wrote:

On 28 July 2014 23:32, Aurelien Jarno aurel...@aurel32.net wrote:

On Mon, Jul 28, 2014 at 11:01:02PM +0100, Peter Maydell wrote:

This may be true, but the TCG README doesn't define negative
lengths as being unspecified behaviour (ie guaranteed to at
least not crash even if the result isn't specified), and in fact the
implementation of tcg_gen_deposit will assert on negative lengths.
We shouldn't implement guest unpredictable cases as crash QEMU.

Well I tried this code under QEMU, and it clearly doesn't crash. It
seems the assert are not enabled with the default configuration options.

Try --enable-debug...

That's my point, it's only in debug mode, not in the default
configuration.


Maybe remove the tcg_debug_assert in tcg_gen_deposit_i64 and 
tcg_gen_deposit_i64
is a better way. But it may cause other mistake in other architecture, 
i'm not

sure.




That said I agree it's something to avoid, but I don't think triggering
a RI exception is the thing to do (even if it is correct according the
MIPS ISA manual) when real silicon output a random result instead.

Yes, you could emit code to do that instead if you like.

When I said random, it didn't say in the sense of random generator, but
in the sense a result that might depend on the input value and the
silicon implementation. It would be silly to emit code just for that,
but it would be smart for example to skip the deposit op in that case
instead of triggering an exception.

I think, debug mode shouldn't crash the qemu with an unpredictable 
operation,

so i want to fix it. And you say there shouldn't raise RI, i agree with you.

Or when lsb  msb, just leave the code and do nothing. What do you think 
about

this way?

[Qemu-devel] [PATCH 0/2] block: fix multiwrite_merge() overlapping requests

This is a fix for https://bugs.launchpad.net/qemu/+bug/1343827.

Patch 1 fixes the bug.  Patch 2 adds a qemu-iotests test case to prevent 
regressions.

Stefan Hajnoczi (2):
  block: fix overlapping multiwrite requests
  qemu-iotests: add multiwrite test cases

 block.c|  6 +++
 tests/qemu-iotests/100 | 97 ++
 tests/qemu-iotests/100.out | 54 ++
 tests/qemu-iotests/group   |  1 +
 4 files changed, 158 insertions(+)
 create mode 100755 tests/qemu-iotests/100
 create mode 100644 tests/qemu-iotests/100.out

-- 
1.9.3

[Qemu-devel] [PATCH 1/2] block: fix overlapping multiwrite requests

When request A is a strict subset of request B:

  


multiwrite_merge() merges them as follows:

  AA

The tail of request A should have been included:

  AAAA

This patch fixes data loss but this code path is probably rare.  Since
guests cannot assume ordering between in-flight requests, few
applications submit overlapping write requests.

Reported-by: Slava Pestov sviatoslav.pes...@gmail.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block.c b/block.c
index 8cf519b..0a3ac43 100644
--- a/block.c
+++ b/block.c
@@ -4498,6 +4498,12 @@ static int multiwrite_merge(BlockDriverState *bs, 
BlockRequest *reqs,
 // Add the second request
 qemu_iovec_concat(qiov, reqs[i].qiov, 0, reqs[i].qiov-size);
 
+// Add tail of first request, if necessary
+if (qiov-size  reqs[outidx].qiov-size) {
+qemu_iovec_concat(qiov, reqs[outidx].qiov, qiov-size,
+  reqs[outidx].qiov-size - qiov-size);
+}
+
 reqs[outidx].nb_sectors = qiov-size  9;
 reqs[outidx].qiov = qiov;
 
-- 
1.9.3

[Qemu-devel] [PATCH 2/2] qemu-iotests: add multiwrite test cases

This test case covers the basic bdrv_aio_multiwrite() scenarios:
1. Single request
2. Sequential requests
3. Overlapping requests
4. Disjoint requests

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/qemu-iotests/100 | 97 ++
 tests/qemu-iotests/100.out | 54 ++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 152 insertions(+)
 create mode 100755 tests/qemu-iotests/100
 create mode 100644 tests/qemu-iotests/100.out

diff --git a/tests/qemu-iotests/100 b/tests/qemu-iotests/100
new file mode 100755
index 000..636e0e0
--- /dev/null
+++ b/tests/qemu-iotests/100
@@ -0,0 +1,97 @@
+#!/bin/bash
+#
+# Test simple read/write using plain bdrv_read/bdrv_write
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see http://www.gnu.org/licenses/.
+#
+
+# creator
+owner=stefa...@redhat.com
+
+seq=`basename $0`
+echo QA output created by $seq
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap _cleanup; exit \$status 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt generic
+_supported_proto generic
+_supported_os Linux
+
+
+size=128M
+
+echo
+echo == Single request ==
+_make_test_img $size
+$QEMU_IO -c multiwrite 0 4k $TEST_IMG | _filter_qemu_io
+
+echo
+echo == verify pattern ==
+$QEMU_IO -c read -P 0xcd 0 4k $TEST_IMG | _filter_qemu_io
+$QEMU_IO -c read -P 0 4k 4k $TEST_IMG | _filter_qemu_io
+
+echo
+echo == Sequential requests ==
+_make_test_img $size
+$QEMU_IO -c multiwrite 0 4k ; 4k 4k $TEST_IMG | _filter_qemu_io
+
+echo
+echo == verify pattern ==
+$QEMU_IO -c read -P 0xcd 0 4k $TEST_IMG | _filter_qemu_io
+$QEMU_IO -c read -P 0xce 4k 4k $TEST_IMG | _filter_qemu_io
+$QEMU_IO -c read -P 0 8k 4k $TEST_IMG | _filter_qemu_io
+
+echo
+echo == Overlapping requests ==
+_make_test_img $size
+$QEMU_IO -c multiwrite 0 4k ; 1k 2k $TEST_IMG | _filter_qemu_io
+
+echo
+echo == verify pattern ==
+# Order of overlapping in-flight requests is not guaranteed so we cannot verify
+# [1k, 3k) since it could have either pattern 0xcd or 0xce.
+$QEMU_IO -c read -P 0xcd 0 1k $TEST_IMG | _filter_qemu_io
+$QEMU_IO -c read -P 0xcd 3k 1k $TEST_IMG | _filter_qemu_io
+$QEMU_IO -c read -P 0 4k 4k $TEST_IMG | _filter_qemu_io
+
+echo
+echo == Disjoint requests ==
+_make_test_img $size
+$QEMU_IO -c multiwrite 0 4k ; 64k 4k $TEST_IMG | _filter_qemu_io
+
+echo
+echo == verify pattern ==
+$QEMU_IO -c read -P 0xcd 0 4k $TEST_IMG | _filter_qemu_io
+$QEMU_IO -c read -P 0 4k 60k $TEST_IMG | _filter_qemu_io
+$QEMU_IO -c read -P 0xce 64k 4k $TEST_IMG | _filter_qemu_io
+$QEMU_IO -c read -P 0 68k 4k $TEST_IMG | _filter_qemu_io
+
+# success, all done
+echo *** done
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/100.out b/tests/qemu-iotests/100.out
new file mode 100644
index 000..2b9f741
--- /dev/null
+++ b/tests/qemu-iotests/100.out
@@ -0,0 +1,54 @@
+QA output created by 100
+
+== Single request ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 
+wrote 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 4096
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Sequential requests ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 
+wrote 8192/8192 bytes at offset 0
+8 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 4096
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 8192
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Overlapping requests ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 
+wrote 6144/6144 bytes at offset 0
+6 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 1024/1024 bytes at offset 0
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 3072
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 4096
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Disjoint requests ==
+Formatting

Re: [Qemu-devel] [PATCH v2] Tap: fix vcpu long time io blocking on tap

On Tue, Jul 29, 2014 at 1:02 PM, Wangkai (Kevin,C) wangka...@huawei.com wrote:

 -Original Message-
 From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
 Sent: Monday, July 28, 2014 11:53 PM
 To: Wangkai (Kevin,C)
 Cc: qemu-devel@nongnu.org; Lee yang; aligu...@amazon.com; Stefan
 Hajnoczi
 Subject: Re: [Qemu-devel] [PATCH v2] Tap: fix vcpu long time io
 blocking on tap

 On Fri, Jul 18, 2014 at 09:33:42AM +, Wangkai (Kevin,C) wrote:
  fix vcpu long time io blocking on tap, when too many packets was
  delivered to the guest os via tap interface.

  --
  Signed-off-by: Wangkai wangka...@huawei.com

 Thanks, applied to my net-next tree:
 https://github.com/stefanha/qemu/commits/net-next

 The patch did not apply cleanly so I had to do it manually:

   Applying: Tap: fix vcpu long time io blocking on tap
   fatal: corrupt patch at line 32
   Repository lacks necessary blobs to fall back on 3-way merge.
   Cannot fall back to three-way merge.
   Patch failed at 0001 Tap: fix vcpu long time io blocking on tap

 Please use git-send-email(1).

 I also adjusted the commit message and doc comments to fit QEMU style
 and for grammar.

 Stefan
 [Wangkai (Kevin,C)]

 Hi Stefan,
 I will send a patch v3 use git send-email to you.

There is no need, I have already applied v2.

Thanks,
Stefan

Re: [Qemu-devel] [PATCH RFC v2 02/12] reset handler for qdevified devices

2014-07-29 Thread Juan Quintela

Sanidhya Kashyap sanidhya.ii...@gmail.com wrote:
 I have added a structure containing the list of qdevified devices which have
 been added to the SaveVMHandlers. Since, I was unable to find any particular
 struct containing the information about all the qdevified devices. So, I have
 created my own version, which is very very specific.

 I shall be grateful if anyone can give some pointers on this.

 Signed-off-by: Sanidhya Kashyap sanidhya.ii...@gmail.com
 ---
  savevm.c | 30 ++
  1 file changed, 30 insertions(+)

 diff --git a/savevm.c b/savevm.c
 index e19ae0a..0255fa0 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -503,12 +503,29 @@ void unregister_savevm(DeviceState *dev, const char 
 *idstr, void *opaque)
  }
  }
  
 +/*
 + * Reset entry for qdevified devices.
 + * Contains all those devices which have been qdevified and are
 + * part of the SaveVMHandlers. This one allows resetting of
 + * single device at any time.
 + */
 +
 +typedef struct VMStatesQdevResetEntry {
 +QTAILQ_ENTRY(VMStatesQdevResetEntry) entry;
 +DeviceState *dev;
 +char device_name[256];
 +} VMStatesQdevResetEntry;
 +
 +static QTAILQ_HEAD(vmstate_reset_handlers, VMStatesQdevResetEntry)
 +  vmstate_reset_handlers = 
 QTAILQ_HEAD_INITIALIZER(vmstate_reset_handlers);
 +
  int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
 const VMStateDescription *vmsd,
 void *opaque, int alias_id,
 int required_for_version)
  {
  SaveStateEntry *se;
 +VMStatesQdevResetEntry *qre;
  
  /* If this triggers, alias support can be dropped for the vmsd. */
  assert(alias_id == -1 || required_for_version = 
 vmsd-minimum_version_id);
 @@ -521,6 +538,12 @@ int vmstate_register_with_alias_id(DeviceState *dev, int 
 instance_id,
  se-alias_id = alias_id;
  
  if (dev) {
 +
 +qre = g_malloc0(sizeof(VMStatesQdevResetEntry));
 +qre-dev = dev;
 +strcpy(qre-device_name, vmsd-name);
 +QTAILQ_INSERT_TAIL(vmstate_reset_handlers, qre, entry);
 +

As stated on irc, you could add a dev field to SaveStateEntry and call
it a day.

Thanks, Juan.


  char *id = qdev_get_dev_path(dev);
  if (id) {
  pstrcpy(se-idstr, sizeof(se-idstr), id);
 @@ -551,6 +574,7 @@ void vmstate_unregister(DeviceState *dev, const 
 VMStateDescription *vmsd,
  void *opaque)
  {
  SaveStateEntry *se, *new_se;
 +VMStatesQdevResetEntry *qre, *new_qre;
  
  QTAILQ_FOREACH_SAFE(se, savevm_handlers, entry, new_se) {
  if (se-vmsd == vmsd  se-opaque == opaque) {
 @@ -561,6 +585,12 @@ void vmstate_unregister(DeviceState *dev, const 
 VMStateDescription *vmsd,
  g_free(se);
  }
  }
 +
 +QTAILQ_FOREACH_SAFE(qre, vmstate_reset_handlers, entry, new_qre) {
 +if (dev  qre-dev  !strcmp(vmsd-name, qre-device_name)) {
 +QTAILQ_REMOVE(vmstate_reset_handlers, qre, entry);
 +}
 +}
  }
  
  static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id)

Re: [Qemu-devel] [RESEND PATCH V3] qemu-img info: show nocow info

2014-07-29 Thread Eric Blake

On 07/29/2014 01:18 AM, Chunyan Liu wrote:
 Add nocow info in 'qemu-img info' output to show whether the file
 currently has NOCOW flag set or not.
 
 Signed-off-by: Chunyan Liu cy...@suse.com
 Acked-by: Eric Blake ebl...@redhat.com

Actually, this should be Reviewed-by, not Acked-by.  My understanding is
that Reviewed-by is stronger (I actually read through the patch, and
looked for possible problems) while Acked-by is weaker (a maintainer
stating that the work is taken in on the basis of third-party reviews
without actually reviewing the patch itself).

 ---
 Resend for QEMU 2.2. Change json version comment.
 
  block/qapi.c | 25 +
  qapi/block-core.json |  5 -
  2 files changed, 29 insertions(+), 1 deletion(-)
 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH RFC v2 03/12] VMState test: query command to extract the qdevified device names

2014-07-29 Thread Juan Quintela

Sanidhya Kashyap sanidhya.ii...@gmail.com wrote:
 I have provided a qmp interface for getting the list of qdevified devices
 that have been registered with SaveVMHandlers.

 Signed-off-by: Sanidhya Kashyap sanidhya.ii...@gmail.com
 ---
  qapi-schema.json | 22 ++
  qmp-commands.hx  | 25 +
  savevm.c | 34 ++
  3 files changed, 81 insertions(+)

 diff --git a/qapi-schema.json b/qapi-schema.json
 index b11aad2..996e6b5 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -3480,3 +3480,25 @@
  # Since: 2.1
  ##
  { 'command': 'rtc-reset-reinjection' }
 +
 +##
 +# @VMstatesQdevDevices
 +#
 +# list of qdevified devices that are registered with SaveStateEntry
 +#
 +# @device: list of qdevified device names


Should we use qdev on the name?  Or just list of devices?  My
understanding is that all devices are on this list, no?

 +#
 +# Since 2.2
 +##
 +{ 'type': 'VMStatesQdevDevices',
 +  'data': { 'device': ['str'] } }
 +
 +##
 +# @query-qdev-devices
 +#
 +# returns the VMStatesQdevDevices that have the associated value
 +#
 +# Since 2.2
 +##
 +{ 'command': 'query-qdev-devices',
 +  'returns': 'VMStatesQdevDevices' }
 diff --git a/qmp-commands.hx b/qmp-commands.hx
 index 4be4765..2e20032 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -3755,3 +3755,28 @@ Example:
  - { return: {} }
  
  EQMP
 +
 +{
 +.name   = query-qdev-devices,
 +.args_type  = ,
 +.mhandler.cmd_new = qmp_marshal_input_query_qdev_devices,
 +},
 +
 +SQMP
 +query-qdev-devices
 +--
 +
 +Shows registered Qdevified devices
 +
 +
 +Example (1):
 +
 +- { execute: query-qdev-devices }
 +- { return: [
 +   {
 + devices: [ kvm-tpr-opt, piix4_pm ]

Once here, can we change this to also include the device version?

i.e. something like:

 devices: [ [ device: [ name: kvm-tpr-opt, version, 15]]], ...]

Or somesuch?

 +   }
 + ]
 +   }
 +
 +EQMP
 diff --git a/savevm.c b/savevm.c
 index 0255fa0..7c1600a 100644
 --- a/savevm.c
 +++ b/savevm.c
 @@ -1167,6 +1167,40 @@ void do_savevm(Monitor *mon, const QDict *qdict)
  }
  }
  
 +static strList *create_qdev_list(const char *name, strList *list)
 +{
 +strList *temp_list;
 +int len;
 +
 +if (!list) {
 +list = g_malloc0(sizeof(strList));
 +len = strlen(name);
 +list-value = g_malloc0(sizeof(char)*(len+1));
 +strcpy(list-value, name);
 +list-next = NULL;
 +return list;
 +}
 +temp_list = g_malloc0(sizeof(strList));
 +len = strlen(name);
 +temp_list-value = g_malloc0(sizeof(char)*(len+1));
 +strcpy(temp_list-value, name);
 +temp_list-next = list;
 +list = temp_list;
 +return list;
 +}
 +
 +VMStatesQdevDevices *qmp_query_qdev_devices(Error **errp)
 +{
 +VMStatesQdevResetEntry *qre;
 +VMStatesQdevDevices *qdev_devices = 
 g_malloc0(sizeof(VMStatesQdevDevices));
 +
 +QTAILQ_FOREACH(qre, vmstate_reset_handlers, entry) {
 +qdev_devices-device = create_qdev_list(qre-device_name,
 + qdev_devices-device);
 +}
 +return qdev_devices;
 +}
 +
  void qmp_xen_save_devices_state(const char *filename, Error **errp)
  {
  QEMUFile *f;

Re: [Qemu-devel] [PATCH 1/2] block: fix overlapping multiwrite requests

On Tue, 07/29 13:41, Stefan Hajnoczi wrote:
 When request A is a strict subset of request B:
 
   
 

s/subset/superset/ ?

Fam

Re: [Qemu-devel] [PATCH 2/2] target-mips/translate.c: Add judgement for msb and lsb

On 28 July 2014 23:52, Aurelien Jarno aurel...@aurel32.net wrote:
 On Mon, Jul 28, 2014 at 11:34:30PM +0100, Peter Maydell wrote:
 On 28 July 2014 23:32, Aurelien Jarno aurel...@aurel32.net wrote:
  On Mon, Jul 28, 2014 at 11:01:02PM +0100, Peter Maydell wrote:
  This may be true, but the TCG README doesn't define negative
  lengths as being unspecified behaviour (ie guaranteed to at
  least not crash even if the result isn't specified), and in fact the
  implementation of tcg_gen_deposit will assert on negative lengths.
  We shouldn't implement guest unpredictable cases as crash QEMU.
 
  Well I tried this code under QEMU, and it clearly doesn't crash. It
  seems the assert are not enabled with the default configuration options.

 Try --enable-debug...

 That's my point, it's only in debug mode, not in the default
 configuration.

Debug builds are pretty common though, it's not exactly
something obscure like only crashes on SPARC hosts.

  That said I agree it's something to avoid, but I don't think triggering
  a RI exception is the thing to do (even if it is correct according the
  MIPS ISA manual) when real silicon output a random result instead.

 Yes, you could emit code to do that instead if you like.

 When I said random, it didn't say in the sense of random generator, but
 in the sense a result that might depend on the input value and the
 silicon implementation. It would be silly to emit code just for that,
 but it would be smart for example to skip the deposit op in that case
 instead of triggering an exception.

That's what I had in mind, yes.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v5 2/2] vmdk: Optimize cluster allocation

On Tue, Jul 29, 2014 at 09:00:43AM +0800, Fam Zheng wrote:
 On Mon, 07/28 16:11, Stefan Hajnoczi wrote:
  On Tue, May 27, 2014 at 04:49:22PM +0800, Fam Zheng wrote:
   +if (!bs-backing_hd) {
   +memset(whole_grain, 0,  skip_start_sector  BDRV_SECTOR_BITS);
   +memset(whole_grain + (skip_end_sector  BDRV_SECTOR_BITS), 0,
   +   cluster_bytes - (skip_end_sector  BDRV_SECTOR_BITS));
   +}
   +
   +assert(skip_end_sector = sector_num + extent-cluster_sectors);
  
  Does this assertion make sense?  skip_end_sector is a small number of
  sectors (relative to start of cluster), while sector_num +
  extent-cluster_sectors is a large absolute sector offset.
 
 skip_end_sector is absolute sector number too. The caller hunk in this patch
 is:

I disagree.  If it was an absolute sector number then the memset() a few
lines above would be incorrect:

  memset(whole_grain, 0,  skip_start_sector  BDRV_SECTOR_BITS);
  memset(whole_grain + (skip_end_sector  BDRV_SECTOR_BITS), 0,
 cluster_bytes - (skip_end_sector  BDRV_SECTOR_BITS));

Look at the code you pasted again:

 @@ -1406,12 +1468,17 @@ static int vmdk_write(BlockDriverState *bs, int64_t 
 sector_num,
  if (!extent) {
  return -EIO;
  }
 -ret = get_cluster_offset(
 -bs,
 -extent,
 -m_data,
 -sector_num  9, !extent-compressed,
 -cluster_offset);
 +extent_begin_sector = extent-end_sector - extent-sectors;
 +extent_relative_sector_num = sector_num - extent_begin_sector;
 +index_in_cluster = extent_relative_sector_num % 
 extent-cluster_sectors;
 +n = extent-cluster_sectors - index_in_cluster;
 +if (n  nb_sectors) {
 +n = nb_sectors;
 +}
 +ret = get_cluster_offset(bs, extent, m_data, sector_num  9,
 + !(extent-compressed || zeroed),
 + cluster_offset,
 + index_in_cluster, index_in_cluster + n);
  if (extent-compressed) {
  if (ret == VMDK_OK) {
  /* Refuse write to allocated cluster for streamOptimized */
 
 See the last parameter of get_cluster_offset.

The last parameter is (extent_relative_sector_num %
extent-cluster_sectors) + (extent-cluster_sectors - index_in_cluster).
Those are definitely sector counts (like nb_sectors) and not absolute
sector numbers (like sector_num).


pgppqs4d2XJb8.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH 1/2] block: fix overlapping multiwrite requests

On Tue, Jul 29, 2014 at 1:46 PM, Fam Zheng f...@redhat.com wrote:
 On Tue, 07/29 13:41, Stefan Hajnoczi wrote:
 When request A is a strict subset of request B:

   
 

 s/subset/superset/ ?

Yes :)

Re: [Qemu-devel] [PATCH 2/3] qemu-char: add -chardev exit-on-eof option

2014-07-29 Thread Eric Blake

On 07/29/2014 12:12 AM, Markus Armbruster wrote:

 Libvirt probably won't use it for normal guests (we don't want to kill
 qemu just because the monitor disconnects), but does have the notion of
 an autodestroy guest where it might be useful (we WANT the guest to go
 away if libvirtd dies early).  In fact, autodestroy guests are used
 during migration - we want to kill qemu on the destination side if
 libvirtd dies before the source side finishes sending the migration
 stream.  But in that scenario, once migration succeeds, libvirt has to
 be able to convert an autodestroy guest back into a normal guest that no
 longer disappears when libvirtd does; would this be something that QMP
 can toggle the state of this attribute on the fly?  Perhaps through qom-set?
 
 After migration completes, execution moves from source to target.
 Wouldn't you want to switch off target auto-destruct together with that
 move, atomically?

Libvirt starts the destination with -S, and migration can't complete
until libvirt resumes the destination CPUs with 'cont'.  Libvirt's
current timing of releasing auto-destruct is based on handshaking
between source and destination; it occurs after source claims migration
is done but before resuming CPUs on the destination, which satisfies the
atomicity that you correctly observed to be necessary.

 However, we also have precedence of actions in QAPI that are very
 unlikely to be used outside of qtest, but which are not marked
 experimental; for example, the 'Abort' action in 'transaction' will
 probably never be used by libvirt
 
 Arguably not a conscious decision to make it ABI forever, more a case of
 nobody thought about *not* making it ABI :)

Added in June 2013; and we *did* have a discussion on whether to hide
the transaction name at the time...
https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg03281.html

 I don't really mind this instance, but I'm a bit concerned about rank
 ABI growth.

And that's a good position to maintain - it's always good to justify new
knobs, especially since once they are ABI, it's harder to refactor
around them.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

2014-07-29 Thread Serge E. Hallyn

Quoting Alex Bligh (a...@alex.org.uk):
 Serge,
 
  I don't think that is in any way a problem.  Is migrating to older
  versions ever actually expected to work?  In either case I don't
  think for this particular case it's a problem.
 
 Good; no; and good - respectively.
 
  (The how to handle this in libvirt question is more interesting)
 
 I've been giving this some thought. However rococo we make this,
 I think the caller is often going to need to modify the command
 line anyway, i.e. is going to need to be aware of the migration
 problem.
 
 For instance, taking Ubuntu as an example, 12.04 shipped with
 qemu-kvm-1.0, and a pxe-virtio.rom of just under 64k, giving
 a 64k ROM slot. 14.04 ships with qemu-2.0 and a pxe-virtio.rom
 of over 80k, giving a 128k ROM slot. So however we fix the
 machine types, when migrating in a 12.04 initiated VM, qemu
 will need
  -global virtio-net-pci.romfile=/path/to/pxe-virtio.rom.12.04
 on the command line (or, if you don't much care about PXE
 working on a soft reboot, a blank file of the same size),
 whereas when migrating in a 14.04 initiated VM, that must
 not be on the command line.
 
 Fixing this properly would entail requiring that the ROMs are
 (effectively) distributed with qemu or at least that all
 ROM sizes become part of the machine type standard. This
 would have the advantage that loading a larger ROM than
 the machine type specifies would fail unless the ROM
 size was explicitly configured on the command line. But
 this is a subject wider than this patch.
 
 So the long and the short of it is that libvirt (sadly) like
 anything else starting qemu machines needs to know a bit about
 different versions of qemu, and be able to replace a machine
 type option with a machine type option and more on the
 command line.
 
 My previous suggestion doesn't help much because qemu will
 still need to be passed something on the command line.
 
 I think the best way to go with this patch would be something
 like:
 
 * Add pc-1.0-qemu-kvm as a machine type (done)
 
 * Rename pc-1.0 to pc-1.0-qemu-git
 
 * Add an alias for pc-1.0 to either pc-1.0-qemu-git or
   pc-1.0-qemu-kvm, configurable at build time with
   a ./configure option. The distro can then set this
   appropriately. This would default to pc-1.0-qemu-git
   (i.e. the current behaviour).
 
 On distros that only every used one of the above,
 ./configure will sort things out, +/- self-inflicted
 injuries relating to ROM size changes. If life is
 more complicated, libvirt (and other callers) will
 need to be aware. pc-1.0-qemu-git and pc-1.0-qemu-kvm
 can be used to unambiguously mean the relevant machine
 type, which will fix things going forward for that
 machine type.
 
 WDYT?

That sounds good.

And from there I think the thing to do will be to introduce a transient
alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm and
depends on the legacy pxe rom.  And maybe users can then choose that package for
migration purposes if needed, without having to make any changes to libvirt at
all, while others are completely unaffected.

-serge

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm


On 29 Jul 2014, at 14:03, Serge E. Hallyn se...@hallyn.com wrote:

 That sounds good.
 
 And from there I think the thing to do will be to introduce a transient
 alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm and
 depends on the legacy pxe rom.  And maybe users can then choose that package 
 for
 migration purposes if needed, without having to make any changes to libvirt at
 all, while others are completely unaffected.

OK, I'll do that the next time I roll the patch. I'd like to hear from
some others first and am mindful of Paolo's comment re waiting for the
release to ship.

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

Il 29/07/2014 15:03, Serge E. Hallyn ha scritto:
 
 And from there I think the thing to do will be to introduce a transient
 alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm

This should be done in the main package, too.

 and depends on the legacy pxe rom.

If you can make the pxe-virtio.rom file 64k or less, then that would be
a good idea for 14.04 in general.  Newer machine types use
efi-virtio.rom, so you won't break -M pc migration.

Paolo

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

2014-07-29 Thread Serge Hallyn

Quoting Paolo Bonzini (pbonz...@redhat.com):
 Il 29/07/2014 15:03, Serge E. Hallyn ha scritto:
  
  And from there I think the thing to do will be to introduce a transient
  alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm
 
 This should be done in the main package, too.

That seems like a problem, unless Im misunderstanding something.  If we do
that in the main package, then anyone running a pc-1.0 system under the
qemu package won't be able to migrate.  Wouldn't it be better to have
pc-1.0 alias by default point to the pc-1.0-qemu machine type?

  and depends on the legacy pxe rom.
 
 If you can make the pxe-virtio.rom file 64k or less, then that would be
 a good idea for 14.04 in general.  Newer machine types use
 efi-virtio.rom, so you won't break -M pc migration.

Hm.  No idea offhand how I'd do that, but it sounds worth looking into.

thanks,
-serge

Re: [Qemu-devel] [PATCH v5 2/2] vmdk: Optimize cluster allocation

On Tue, 07/29 13:51, Stefan Hajnoczi wrote:
 On Tue, Jul 29, 2014 at 09:00:43AM +0800, Fam Zheng wrote:
  On Mon, 07/28 16:11, Stefan Hajnoczi wrote:
   On Tue, May 27, 2014 at 04:49:22PM +0800, Fam Zheng wrote:
+if (!bs-backing_hd) {
+memset(whole_grain, 0,  skip_start_sector  BDRV_SECTOR_BITS);
+memset(whole_grain + (skip_end_sector  BDRV_SECTOR_BITS), 0,
+   cluster_bytes - (skip_end_sector  BDRV_SECTOR_BITS));
+}
+
+assert(skip_end_sector = sector_num + extent-cluster_sectors);
   
   Does this assertion make sense?  skip_end_sector is a small number of
   sectors (relative to start of cluster), while sector_num +
   extent-cluster_sectors is a large absolute sector offset.
  
  skip_end_sector is absolute sector number too. The caller hunk in this patch
  is:
 
 I disagree.

You are right, I totally misread when replying. Will respin to fix the
assertion and also the spellings.

Thanks for reviewing and explaining my mistake :)

Fam

Re: [Qemu-devel] [PATCH for-2.1 0/2] qdev-monitor: include QOM device properties in -device FOO, help output

On Wed, Jul 09, 2014 at 02:01:30PM +0200, Stefan Hajnoczi wrote:
 These two patches fix the -device FOO,help output regression that Cole spotted
 in QEMU 2.0-rc0.  The problem is that virtio-blk-pci qdev properties have been
 converted to QOM alias properties but -device FOO,help shows only qdev
 properties.
 
 We simply need to update -device FOO,help code to use both qdev and QOM
 properties.  Note that types change because a 'drive' qdev type is actually a
 'str' QOM type.  We're moving more and more to QOM properties where the final
 type for this property would be 'linkDrive' or similar.
 
 Cole: please confirm that this fixes the issue
 
 Stefan Hajnoczi (2):
   qmp: hide hotplugged device property from device-list-properties
   qdev-monitor: include QOM properties in -device FOO,help output
 
  qdev-monitor.c | 40 +---
  qmp.c  |  1 +
  2 files changed, 18 insertions(+), 23 deletions(-)

CCed qemu-stable since we ought to fix -device FOO,?.  This patch was
missed for QEMU 2.1 but not critical (see Cole's response).

Applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan


pgpHm5HOZydZ6.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

Il 29/07/2014 15:27, Serge Hallyn ha scritto:
 Quoting Paolo Bonzini (pbonz...@redhat.com):
 Il 29/07/2014 15:03, Serge E. Hallyn ha scritto:

 And from there I think the thing to do will be to introduce a transient
 alternate package that has the pc-1.0 alias pointing ot pc-1.0-qemu-kvm

 This should be done in the main package, too.
 
 That seems like a problem, unless Im misunderstanding something.  If we do
 that in the main package, then anyone running a pc-1.0 system under the
 qemu package won't be able to migrate.  Wouldn't it be better to have
 pc-1.0 alias by default point to the pc-1.0-qemu machine type?

You'd break that for people who have already upgraded from 12.04 and
14.04 and are keeping the old machine type.  You'd fix it for people who
are upgrading now.

I think providing a smoother upgrade path is worthwhile, even if it
annoys someone else.

Unfortunately the only solution is a lot of testing *before* a release,
and in fact this is why 2.1 was delayed by a migration problem.  Once
the release is out, you'll have to make someone unhappy.

 and depends on the legacy pxe rom.

 If you can make the pxe-virtio.rom file 64k or less, then that would be
 a good idea for 14.04 in general.  Newer machine types use
 efi-virtio.rom, so you won't break -M pc migration.
 
 Hm.  No idea offhand how I'd do that, but it sounds worth looking into.

I'm not sure either.  You could simply package the 12.04 ipxe ROMs into
14.04, and add a note about getting the sources for GPL friendliness.

Paolo

Re: [Qemu-devel] [PATCH] vmdk: improve streamOptimized vmdk support

On Mon, Jul 07, 2014 at 10:54:27AM -0400, Milos Vyletel wrote:
 VMDK's streamOptimized format is different that regular sparse format.
 L1(GD) and L2(GT) tables are not predefined but rather generated and
 written during image creation mainly because there is no way to tell
 how much space data will occupy once they are compressed. Also the
 location of header, L1 and L2 tables differs.
 
 - L2 tables (grain tables) are written after all grains they point to
 - L1 tables are written after all grains and L2 tables
 - footer at the end is used instead of header in first sector
 
 This patch improves streamOptimized support and adds possibility to
 create true streamOptimized images using qemu-img. Some of the changes
 are from VMDK specs, some of them from hexdump-ing images from VMWare
 and VirtualBox.
 
 I have compared these images to the ones generated by VMWare and vbox
 and they are identical with the exception of DescriptorFile that has
 some differences but none that would change behavior(CID and some
 additional DDB entries differ) and streamOptimized image generated from
 raw image was succesfully imported (as OVA) into VMWare ESXi and Oracle
 OVM.
 
 Signed-off-by: Milos Vyletel milos.vyle...@gmail.com
 ---
  block/vmdk.c |  363 
 +-
  1 files changed, 281 insertions(+), 82 deletions(-)

What does this patch do beyond what QEMU already supports today?

Is there a particular application that rejected QEMU's streamOptimized
images?  Is this a bug fix?

Please use scripts/checkpatch.pl to check coding style.

Fam: Please review

 diff --git a/block/vmdk.c b/block/vmdk.c
 index 27a78da..f482225 100644
 --- a/block/vmdk.c
 +++ b/block/vmdk.c
 @@ -81,6 +81,21 @@ typedef struct {
  uint16_t compressAlgorithm;
  } QEMU_PACKED VMDK4Header;
  
 +typedef struct {
 +uint64_t val;
 +uint32_t size;
 +uint32_t type;
 +uint8_t pad[BDRV_SECTOR_SIZE - sizeof(uint64_t) - 2*sizeof(uint32_t)];
 +} QEMU_PACKED VMDK4MetaMarker;
 +
 +typedef struct {
 +VMDK4MetaMarker footer_marker;
 +uint32_t magic;
 +VMDK4Header header;
 +uint8_t pad[BDRV_SECTOR_SIZE - sizeof(uint32_t) - sizeof(VMDK4Header)];
 +VMDK4MetaMarker eos_marker;
 +} QEMU_PACKED VMDK4Footer;
 +
  #define L2_CACHE_SIZE 16
  
  typedef struct VmdkExtent {
 @@ -89,24 +104,29 @@ typedef struct VmdkExtent {
  bool compressed;
  bool has_marker;
  bool has_zero_grain;
 +bool has_footer;
  int version;
  int64_t sectors;
  int64_t end_sector;
  int64_t flat_start_offset;
  int64_t l1_table_offset;
  int64_t l1_backup_table_offset;
 +uint32_t l1_index;
  uint32_t *l1_table;
  uint32_t *l1_backup_table;
  unsigned int l1_size;
  uint32_t l1_entry_sectors;
  
  unsigned int l2_size;
 +uint32_t *l2_table;
  uint32_t *l2_cache;
  uint32_t l2_cache_offsets[L2_CACHE_SIZE];
  uint32_t l2_cache_counts[L2_CACHE_SIZE];
  
  int64_t cluster_sectors;
  char *type;
 +
 +VMDK4Footer footer;
  } VmdkExtent;
  
  typedef struct BDRVVmdkState {
 @@ -555,14 +575,51 @@ static char *vmdk_read_desc(BlockDriverState *file, 
 uint64_t desc_offset,
  return buf;
  }
  
 +static int vmdk_read_footer(BlockDriverState *bs,
 +VMDK4Footer *footer)
 +{
 +int ret;
 +
 +/*
 +* footer starts 3 sectors from end
 +* - footer marker
 +* - footer
 +* - end-of-stream marker
 +*/
 +ret = bdrv_pread(bs-file,
 +(bs-file-total_sectors - 3) * BDRV_SECTOR_SIZE,
 +footer, sizeof(*footer));
 +if (ret  0) {
 +goto out;
 +}
 +
 +/* Some sanity checks for the footer */
 +if (be32_to_cpu(footer-magic) != VMDK4_MAGIC ||
 +le32_to_cpu(footer-footer_marker.size) != 0  ||
 +le32_to_cpu(footer-footer_marker.type) != MARKER_FOOTER ||
 +le64_to_cpu(footer-eos_marker.val) != 0  ||
 +le32_to_cpu(footer-eos_marker.size) != 0  ||
 +le32_to_cpu(footer-eos_marker.type) != MARKER_END_OF_STREAM)
 +{
 +ret = -EINVAL;
 +goto out;
 +}
 +
 +ret = VMDK_OK;
 + out:
 +return ret;
 +}
 +
  static int vmdk_open_vmdk4(BlockDriverState *bs,
 BlockDriverState *file,
 int flags, Error **errp)
  {
  int ret;
 +bool has_footer = false;
  uint32_t magic;
  uint32_t l1_size, l1_entry_sectors;
  VMDK4Header header;
 +VMDK4Footer footer;
  VmdkExtent *extent;
  BDRVVmdkState *s = bs-opaque;
  int64_t l1_backup_offset = 0;
 @@ -593,48 +650,13 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
  
  if (le64_to_cpu(header.gd_offset) == VMDK4_GD_AT_END) {
  /*
 - * The footer takes precedence over the header, so read it in. The
 - * footer starts at offset -1024 from the end: One sector for the
 - * footer, and another one for the end-of-stream marker.
 + * The

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm


On 29 Jul 2014, at 14:21, Paolo Bonzini pbonz...@redhat.com wrote:

 If you can make the pxe-virtio.rom file 64k or less, then that would be
 a good idea for 14.04 in general.  Newer machine types use
 efi-virtio.rom, so you won't break -M pc migration.

Without further, won't that break migration from 14.04 *with* the big
ROM?

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm


On 29 Jul 2014, at 14:35, Paolo Bonzini pbonz...@redhat.com wrote:

 I'm not sure either.  You could simply package the 12.04 ipxe ROMs into
 14.04, and add a note about getting the sources for GPL friendliness.

This would be my preference (or in Ubuntu's case, add it to the ipxe-qemu
package) but I think it should only be used when the legacy machine type
is used or it will break inbound migrations from other 14.04 machines
started with 128k ROMs (AIUI).

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH RFC v2 05/12] VMstate test: basic VMState testing mechanism

2014-07-29 Thread Juan Quintela

Sanidhya Kashyap sanidhya.ii...@gmail.com wrote:
 In this patch, I have made the following changes:

 * changed the DPRINT statement.
 * renamed the variables.
 * added noqdev variable which decides which option to use for resetting.
 * added devices option which can help in resetting one or many devices
 (only qdevified ones).
 * updated the documentation.

 Signed-off-by: Sanidhya Kashyap sanidhya.ii...@gmail.com

 +##
 +# @test-vmstates
 +#
 +# tests the vmstates' value by dumping and loading in memory
 +#
 +# @iterations: (optional) The total iterations for vmstate testing.
 +# The min and max defind value is 10 and 100 respectively.
 +#
 +# @period: (optional) sleep interval between iteration (in milliseconds).
 +# The default interval is 100 milliseconds with min and max being
 +# 1 and 1 respectively.
 +#
 +# @noqdev: boolean variable which decides whether to use qdevified devices
 +# or not. Will be removed when all the devices have been qdevified.
 +#
 +# @devices: (optional) helps in resetting particular qdevified decices
 +# that have been registered with SaveStateEntry
 +#
 +# Since 2.2
 +##
 +{ 'command': 'test-vmstates',
 +  'data': {'*iterations': 'int',
 +   '*period': 'int',
 +   'noqdev':  'bool',

Do we really care about noqdev, or should we just decree that it is
false always?


 +#define DEBUG_TEST_VMSTATES 1
 +
 +#ifndef DEBUG_TEST_VMSTATES
 +#define DEBUG_TEST_VMSTATES 0
 +#endif

If you have the previe line, this ones are not needed.


 +
 +#define DPRINTF(fmt, ...) \
 +do { \
 +if (DEBUG_TEST_VMSTATES) { \
 +printf(vmstate_test:  fmt, ## __VA_ARGS__); \
 +} \
 +} while (0)

DPRINTF is *so* last year O:-)
It is considedered better to just add tracepoints to the code.  I think
that all the DPRINTF's make sense to be a tracepoint, no?


 +struct VMStateLogState {
 +int64_t current_iteration;
 +int64_t iterations;
 +int64_t period;
 +bool active_state;
 +bool noqdev;
 +VMStatesQdevDevices *qdevices;
 +QEMUTimer *timer;
 +
 +QTAILQ_HEAD(qdev_list, VMStatesQdevResetEntry) qdev_list;
 +};
 +
 +static VMStateLogState *vmstate_current_state(void)
 +{
 +static VMStateLogState current_state = {
 +.active_state = false,
 +};
 +
 +return current_state;
 +}
 +
 +static inline void test_vmstates_clear_qdev_entries(VMStateLogState *v)

We need a better preffix that test_vmstates_
But I can't think of one right now.  Will think later about it.


 +static inline bool check_device_name(VMStateLogState *v,
 + VMStatesQdevDevices *qdevices,
 + Error **errp)

Is inline needed?  I would expect the compiler to do a reasonable
decision with an static function called only once?

 +{
 +VMStatesQdevResetEntry *qre;
 +strList *devices_name = qdevices-device;
 +QTAILQ_INIT(v-qdev_list);
 +bool device_present;
 +
 +/* now, checking against each one */
 +for (; devices_name; devices_name = devices_name-next) {
 +device_present = false;
 +VMStatesQdevResetEntry *new_qre;
 +QTAILQ_FOREACH(qre, vmstate_reset_handlers, entry) {
 +if (!strcmp(qre-device_name, devices_name-value)) {
 +
 +device_present = true;
 +
 +new_qre = g_malloc0(sizeof(VMStatesQdevResetEntry));
 +new_qre-dev = qre-dev;
 +strcpy(new_qre-device_name, qre-device_name);
 +QTAILQ_INSERT_TAIL(v-qdev_list, new_qre, entry);
 +
 +break;
   return;

And remove the whole device_present variable and assignation?

 +}
 +}
 +if (!device_present) {
 +test_vmstates_clear_qdev_entries(v);
 +error_setg(errp, Incorrect device name - %s\n,
 +   devices_name-value);
 +return false;
 +}
 +}
 +return true;
 +}
 +
 +static inline void test_vmstates_reset_devices(VMStateLogState *v)
 +{
 +VMStatesQdevResetEntry *qre;
 +
 +if (v-noqdev) {
 +DPRINTF(resetting all devices\n);
 +qemu_system_reset(VMRESET_SILENT);

What is diffe9rent between calling with noqdev and with an empty
device list?  I would expect them to be the same list of devices.

 +} else if (!v-qdevices) {
 +QTAILQ_FOREACH(qre, vmstate_reset_handlers, entry) {
 +DPRINTF(resetting device: %s\n, qre-device_name);
 +device_reset(qre-dev);
 +}
 +} else {
 +QTAILQ_FOREACH(qre, v-qdev_list, entry) {
 +DPRINTF(resetting device: %s\n, qre-device_name);
 +device_reset(qre-dev);
 +}
 +}
 +}
 +
 +static void vmstate_test_cb(void *opaque)
 +{
 +VMStateLogState *v = opaque;
 +int saved_vm_running = runstate_is_running();
 +const QEMUSizedBuffer *qsb;
 +QEMUFile *f;
 +int ret;
 +int64_t save_vmstates_duration, load_vmstates_duration;

Re: [Qemu-devel] [PATCH for 2.1 V3] qemu-img info: show nocow info

On Mon, Jul 28, 2014 at 12:58:33PM -0600, Eric Blake wrote:
 On 07/28/2014 09:19 AM, Stefan Hajnoczi wrote:
  On Wed, Jul 09, 2014 at 10:43:13AM +0800, Chunyan Liu wrote:
  Add nocow info in 'qemu-img info' output to show whether the file
  currently has NOCOW flag set or not.
 
  Signed-off-by: Chunyan Liu cy...@suse.com
  ---
  Changes:
- update output info to NOCOW flag: set
 
   block/qapi.c | 25 +
   qapi/block-core.json |  5 -
   2 files changed, 29 insertions(+), 1 deletion(-)
  
  This patch was sent on July 9th, after the 2.1 soft freeze when we stop
  merging new features.  Soft freeze was 17th of June.
  
  Please resend for QEMU 2.2 and update the qapi-schema.json version
  comment.
 
 There's still the argument that this is a bug fix for an incomplete
 implementation of a new feature that IS in qemu 2.1 (that is, 2.1 is
 adding the ability to set the nocow flag, but without this patch, that
 addition is a write-only interface, and this patch is correcting the bug
 to allow it to be a read-write interface).
 
 But it is fairly late in the game - what is the level of damage if 2.1
 is released with a write-only setting, and this patch is deferred to
 2.2?  Without some strong justification, I can agree with the decision
 to postpone this patch.

You can use lsattr(1), so I see no critical need.

Stefan


pgpEu4JGYUZON.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

2014-07-29 Thread Serge Hallyn

Quoting Alex Bligh (a...@alex.org.uk):
 
 On 29 Jul 2014, at 14:21, Paolo Bonzini pbonz...@redhat.com wrote:
 
  If you can make the pxe-virtio.rom file 64k or less, then that would be
  a good idea for 14.04 in general.  Newer machine types use
  efi-virtio.rom, so you won't break -M pc migration.
 
 Without further, won't that break migration from 14.04 *with* the big
 ROM?

Heh, good point.

Re: [Qemu-devel] [PATCH] vmdk: improve streamOptimized vmdk support

2014-07-29 Thread Milos Vyletel

On Tue, Jul 29, 2014 at 9:37 AM, Stefan Hajnoczi stefa...@redhat.com wrote:
 On Mon, Jul 07, 2014 at 10:54:27AM -0400, Milos Vyletel wrote:
 VMDK's streamOptimized format is different that regular sparse format.
 L1(GD) and L2(GT) tables are not predefined but rather generated and
 written during image creation mainly because there is no way to tell
 how much space data will occupy once they are compressed. Also the
 location of header, L1 and L2 tables differs.

 - L2 tables (grain tables) are written after all grains they point to
 - L1 tables are written after all grains and L2 tables
 - footer at the end is used instead of header in first sector

 This patch improves streamOptimized support and adds possibility to
 create true streamOptimized images using qemu-img. Some of the changes
 are from VMDK specs, some of them from hexdump-ing images from VMWare
 and VirtualBox.

 I have compared these images to the ones generated by VMWare and vbox
 and they are identical with the exception of DescriptorFile that has
 some differences but none that would change behavior(CID and some
 additional DDB entries differ) and streamOptimized image generated from
 raw image was succesfully imported (as OVA) into VMWare ESXi and Oracle
 OVM.

 Signed-off-by: Milos Vyletel milos.vyle...@gmail.com
 ---
  block/vmdk.c |  363 
 +-
  1 files changed, 281 insertions(+), 82 deletions(-)

 What does this patch do beyond what QEMU already supports today?

 Is there a particular application that rejected QEMU's streamOptimized
 images?  Is this a bug fix?

 Please use scripts/checkpatch.pl to check coding style.

 Fam: Please review


Images created/converted using QEMU were rejected by VMWare ESXi,
vCloud, VirtualBox and Oracle OVM. I did not try anything else.

Generally speaking streamOptimized format is not followed by QEMU.
Instead regular VMDK format + encryption is used. We can say this is
bug fix since streamOptimzed format is already included but does not
work well.

I'll check the style and fix the code. I'll postpone resubmit until
further review is done since I'm sure there will be things that will
need to be fixed.

Milos

 diff --git a/block/vmdk.c b/block/vmdk.c
 index 27a78da..f482225 100644
 --- a/block/vmdk.c
 +++ b/block/vmdk.c
 @@ -81,6 +81,21 @@ typedef struct {
  uint16_t compressAlgorithm;
  } QEMU_PACKED VMDK4Header;

 +typedef struct {
 +uint64_t val;
 +uint32_t size;
 +uint32_t type;
 +uint8_t pad[BDRV_SECTOR_SIZE - sizeof(uint64_t) - 2*sizeof(uint32_t)];
 +} QEMU_PACKED VMDK4MetaMarker;
 +
 +typedef struct {
 +VMDK4MetaMarker footer_marker;
 +uint32_t magic;
 +VMDK4Header header;
 +uint8_t pad[BDRV_SECTOR_SIZE - sizeof(uint32_t) - sizeof(VMDK4Header)];
 +VMDK4MetaMarker eos_marker;
 +} QEMU_PACKED VMDK4Footer;
 +
  #define L2_CACHE_SIZE 16

  typedef struct VmdkExtent {
 @@ -89,24 +104,29 @@ typedef struct VmdkExtent {
  bool compressed;
  bool has_marker;
  bool has_zero_grain;
 +bool has_footer;
  int version;
  int64_t sectors;
  int64_t end_sector;
  int64_t flat_start_offset;
  int64_t l1_table_offset;
  int64_t l1_backup_table_offset;
 +uint32_t l1_index;
  uint32_t *l1_table;
  uint32_t *l1_backup_table;
  unsigned int l1_size;
  uint32_t l1_entry_sectors;

  unsigned int l2_size;
 +uint32_t *l2_table;
  uint32_t *l2_cache;
  uint32_t l2_cache_offsets[L2_CACHE_SIZE];
  uint32_t l2_cache_counts[L2_CACHE_SIZE];

  int64_t cluster_sectors;
  char *type;
 +
 +VMDK4Footer footer;
  } VmdkExtent;

  typedef struct BDRVVmdkState {
 @@ -555,14 +575,51 @@ static char *vmdk_read_desc(BlockDriverState *file, 
 uint64_t desc_offset,
  return buf;
  }

 +static int vmdk_read_footer(BlockDriverState *bs,
 +VMDK4Footer *footer)
 +{
 +int ret;
 +
 +/*
 +* footer starts 3 sectors from end
 +* - footer marker
 +* - footer
 +* - end-of-stream marker
 +*/
 +ret = bdrv_pread(bs-file,
 +(bs-file-total_sectors - 3) * BDRV_SECTOR_SIZE,
 +footer, sizeof(*footer));
 +if (ret  0) {
 +goto out;
 +}
 +
 +/* Some sanity checks for the footer */
 +if (be32_to_cpu(footer-magic) != VMDK4_MAGIC ||
 +le32_to_cpu(footer-footer_marker.size) != 0  ||
 +le32_to_cpu(footer-footer_marker.type) != MARKER_FOOTER ||
 +le64_to_cpu(footer-eos_marker.val) != 0  ||
 +le32_to_cpu(footer-eos_marker.size) != 0  ||
 +le32_to_cpu(footer-eos_marker.type) != MARKER_END_OF_STREAM)
 +{
 +ret = -EINVAL;
 +goto out;
 +}
 +
 +ret = VMDK_OK;
 + out:
 +return ret;
 +}
 +
  static int vmdk_open_vmdk4(BlockDriverState *bs,
 BlockDriverState *file,
 int flags, Error **errp)
  {
  int ret;
 +bool has_footer = false;
  uint32_t magic;

[Qemu-devel] [PATCH 2/2] target-mips/translate.c: Update OPC_SYNCI

2014-07-29 Thread Dongxue Zhang

Update OPC_SYNCI with BS_STOP, in order to handle the instructions which saved
in the same TB of the store instruction.

Signed-off-by: Dongxue Zhang elta@gmail.com
---
 target-mips/translate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index c381366..dc8afcf 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -15334,7 +15334,9 @@ static void decode_opc (CPUMIPSState *env, DisasContext 
*ctx)
 break;
 case OPC_SYNCI:
 check_insn(ctx, ISA_MIPS32R2);
-/* Treat as NOP. */
+/* Break the TB to be able to sync copied instructions
+   immediately */
+ctx-bstate = BS_STOP;
 break;
 case OPC_BPOSGE32:/* MIPS DSP branch */
 #if defined(TARGET_MIPS64)
-- 
1.8.1.2

Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

Il 28/07/2014 17:03, David Hildenbrand ha scritto:
 Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
 like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like
 a SIPI++ as it performs a psw exchange - NMI). So we basically have two
 paths that can lead to a state change.  All interrupt bits may be in any
 combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START 
 be
 denied).
 
 The other thing may be that on s390, each vcpu (including itself) can put
 another vcpu into the STOPPED state - I assume that this is different for x86 
 
 INIT_RECEIVED. For this reason we have to watch out for bad race conditions
 (e.g. multiple vcpus working on another vcpu)...

You can do that in x86 by sending an INIT inter-processor interrupt.  A
SIPI is ignored if the CPU is not in INIT_RECEIVED state.

Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current
implementation.

- an INIT cancels a previous SIPI;

- if both INIT and SIPI are sent, on real hardware you need to have a
few hundred microseconds between them, but KVM will reliably process
INIT before SIPI.

See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of
the races that can happen.

Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete,
we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE.

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm


On 29 Jul 2014, at 14:42, Paolo Bonzini pbonz...@redhat.com wrote:

 Il 29/07/2014 15:39, Alex Bligh ha scritto:
 I'm not sure either.  You could simply package the 12.04 ipxe ROMs into
 14.04, and add a note about getting the sources for GPL friendliness.
 This would be my preference (or in Ubuntu's case, add it to the ipxe-qemu
 package) but I think it should only be used when the legacy machine type
 is used or it will break inbound migrations from other 14.04 machines
 started with 128k ROMs (AIUI).
 
 -M pc and its alias -M pc-i440fx-2.0 doesn't use pxe-virtio.rom at
 all (at least upstream).

Really? How does it pxeboot on virtio then (it definitely does pxeboot
on virtio)?

  Does Ubuntu 14.04 have efi-virtio.rom?

Yes. After 2 layers of symlinks you get to.

ubuntu@trustytest:~$ ls -la /usr/lib/ipxe/qemu/efi-virtio.rom
-rw-r--r-- 1 root root 220672 Jan  6  2014 /usr/lib/ipxe/qemu/efi-virtio.rom

 It would break -M pc-1.0 started on older 14.04, but I think that's
 acceptable.

I was more worried about any previous versions of Ubuntu (newer than
12.04) which might also be using the larger rom size. But then I
haven't investigated at what stage the rom size grew.

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm

Il 29/07/2014 15:56, Alex Bligh ha scritto:
 
 On 29 Jul 2014, at 14:42, Paolo Bonzini pbonz...@redhat.com wrote:
 
 Il 29/07/2014 15:39, Alex Bligh ha scritto:
 I'm not sure either.  You could simply package the 12.04 ipxe ROMs into
 14.04, and add a note about getting the sources for GPL friendliness.
 This would be my preference (or in Ubuntu's case, add it to the ipxe-qemu
 package) but I think it should only be used when the legacy machine type
 is used or it will break inbound migrations from other 14.04 machines
 started with 128k ROMs (AIUI).

 -M pc and its alias -M pc-i440fx-2.0 doesn't use pxe-virtio.rom at
 all (at least upstream).
 
 Really? How does it pxeboot on virtio then (it definitely does pxeboot
 on virtio)?

efi-virtio.rom contains both BIOS and UEFI ROMs.

  Does Ubuntu 14.04 have efi-virtio.rom?
 
 Yes. After 2 layers of symlinks you get to.
 
 ubuntu@trustytest:~$ ls -la /usr/lib/ipxe/qemu/efi-virtio.rom
 -rw-r--r-- 1 root root 220672 Jan  6  2014 /usr/lib/ipxe/qemu/efi-virtio.rom
 
 It would break -M pc-1.0 started on older 14.04, but I think that's
 acceptable.
 
 I was more worried about any previous versions of Ubuntu (newer than
 12.04) which might also be using the larger rom size. But then I
 haven't investigated at what stage the rom size grew.

You're right, but in Serge's shoes I wouldn't bother about anything
except LTS.

Paolo

Re: [Qemu-devel] [PATCH] vmdk: improve streamOptimized vmdk support

On Tue, Jul 29, 2014 at 2:46 PM, Milos Vyletel milos.vyle...@gmail.com wrote:
 On Tue, Jul 29, 2014 at 9:37 AM, Stefan Hajnoczi stefa...@redhat.com wrote:
 On Mon, Jul 07, 2014 at 10:54:27AM -0400, Milos Vyletel wrote:
 VMDK's streamOptimized format is different that regular sparse format.
 L1(GD) and L2(GT) tables are not predefined but rather generated and
 written during image creation mainly because there is no way to tell
 how much space data will occupy once they are compressed. Also the
 location of header, L1 and L2 tables differs.

 - L2 tables (grain tables) are written after all grains they point to
 - L1 tables are written after all grains and L2 tables
 - footer at the end is used instead of header in first sector

 This patch improves streamOptimized support and adds possibility to
 create true streamOptimized images using qemu-img. Some of the changes
 are from VMDK specs, some of them from hexdump-ing images from VMWare
 and VirtualBox.

 I have compared these images to the ones generated by VMWare and vbox
 and they are identical with the exception of DescriptorFile that has
 some differences but none that would change behavior(CID and some
 additional DDB entries differ) and streamOptimized image generated from
 raw image was succesfully imported (as OVA) into VMWare ESXi and Oracle
 OVM.

 Signed-off-by: Milos Vyletel milos.vyle...@gmail.com
 ---
  block/vmdk.c |  363 
 +-
  1 files changed, 281 insertions(+), 82 deletions(-)

 What does this patch do beyond what QEMU already supports today?

 Is there a particular application that rejected QEMU's streamOptimized
 images?  Is this a bug fix?

 Please use scripts/checkpatch.pl to check coding style.

 Fam: Please review


 Images created/converted using QEMU were rejected by VMWare ESXi,
 vCloud, VirtualBox and Oracle OVM. I did not try anything else.

 Generally speaking streamOptimized format is not followed by QEMU.
 Instead regular VMDK format + encryption is used. We can say this is
 bug fix since streamOptimzed format is already included but does not
 work well.

 I'll check the style and fix the code. I'll postpone resubmit until
 further review is done since I'm sure there will be things that will
 need to be fixed.

What was the command-line you used?

qemu-img convert -f qcow2 -O vmdk -o subformat=streamOptimized
input.qcow2 output.vmdk

Stefan

Re: [Qemu-devel] [PATCH 2/3] qemu-char: add -chardev exit-on-eof option

Eric Blake ebl...@redhat.com writes:

 On 07/29/2014 12:12 AM, Markus Armbruster wrote:

 Libvirt probably won't use it for normal guests (we don't want to kill
 qemu just because the monitor disconnects), but does have the notion of
 an autodestroy guest where it might be useful (we WANT the guest to go
 away if libvirtd dies early).  In fact, autodestroy guests are used
 during migration - we want to kill qemu on the destination side if
 libvirtd dies before the source side finishes sending the migration
 stream.  But in that scenario, once migration succeeds, libvirt has to
 be able to convert an autodestroy guest back into a normal guest that no
 longer disappears when libvirtd does; would this be something that QMP
 can toggle the state of this attribute on the fly?  Perhaps through qom-set?
 
 After migration completes, execution moves from source to target.
 Wouldn't you want to switch off target auto-destruct together with that
 move, atomically?

 Libvirt starts the destination with -S, and migration can't complete
 until libvirt resumes the destination CPUs with 'cont'.  Libvirt's
 current timing of releasing auto-destruct is based on handshaking
 between source and destination; it occurs after source claims migration
 is done but before resuming CPUs on the destination, which satisfies the
 atomicity that you correctly observed to be necessary.

Makes sense, thanks.

 However, we also have precedence of actions in QAPI that are very
 unlikely to be used outside of qtest, but which are not marked
 experimental; for example, the 'Abort' action in 'transaction' will
 probably never be used by libvirt
 
 Arguably not a conscious decision to make it ABI forever, more a case of
 nobody thought about *not* making it ABI :)

 Added in June 2013; and we *did* have a discussion on whether to hide
 the transaction name at the time...
 https://lists.gnu.org/archive/html/qemu-devel/2013-06/msg03281.html

I stand corrected.

 I don't really mind this instance, but I'm a bit concerned about rank
 ABI growth.

 And that's a good position to maintain - it's always good to justify new
 knobs, especially since once they are ABI, it's harder to refactor
 around them.

Exactly.

Re: [Qemu-devel] [PATCH] vmdk: improve streamOptimized vmdk support

2014-07-29 Thread Milos Vyletel

On Tue, Jul 29, 2014 at 10:37 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Tue, Jul 29, 2014 at 2:46 PM, Milos Vyletel milos.vyle...@gmail.com 
 wrote:
 On Tue, Jul 29, 2014 at 9:37 AM, Stefan Hajnoczi stefa...@redhat.com wrote:
 On Mon, Jul 07, 2014 at 10:54:27AM -0400, Milos Vyletel wrote:
 VMDK's streamOptimized format is different that regular sparse format.
 L1(GD) and L2(GT) tables are not predefined but rather generated and
 written during image creation mainly because there is no way to tell
 how much space data will occupy once they are compressed. Also the
 location of header, L1 and L2 tables differs.

 - L2 tables (grain tables) are written after all grains they point to
 - L1 tables are written after all grains and L2 tables
 - footer at the end is used instead of header in first sector

 This patch improves streamOptimized support and adds possibility to
 create true streamOptimized images using qemu-img. Some of the changes
 are from VMDK specs, some of them from hexdump-ing images from VMWare
 and VirtualBox.

 I have compared these images to the ones generated by VMWare and vbox
 and they are identical with the exception of DescriptorFile that has
 some differences but none that would change behavior(CID and some
 additional DDB entries differ) and streamOptimized image generated from
 raw image was succesfully imported (as OVA) into VMWare ESXi and Oracle
 OVM.

 Signed-off-by: Milos Vyletel milos.vyle...@gmail.com
 ---
  block/vmdk.c |  363 
 +-
  1 files changed, 281 insertions(+), 82 deletions(-)

 What does this patch do beyond what QEMU already supports today?

 Is there a particular application that rejected QEMU's streamOptimized
 images?  Is this a bug fix?

 Please use scripts/checkpatch.pl to check coding style.

 Fam: Please review


 Images created/converted using QEMU were rejected by VMWare ESXi,
 vCloud, VirtualBox and Oracle OVM. I did not try anything else.

 Generally speaking streamOptimized format is not followed by QEMU.
 Instead regular VMDK format + encryption is used. We can say this is
 bug fix since streamOptimzed format is already included but does not
 work well.

 I'll check the style and fix the code. I'll postpone resubmit until
 further review is done since I'm sure there will be things that will
 need to be fixed.

 What was the command-line you used?

 qemu-img convert -f qcow2 -O vmdk -o subformat=streamOptimized
 input.qcow2 output.vmdk

 Stefan

I've been using test image from
http://wiki.qemu.org/download/linux-0.2.img.bz2 and converting it like
this:

qemu-img convert -O vmdk -o
adapter_type=lsilogic,compat6,subformat=streamOptimized linux-0.2.img
linux-0.2.vmdk

Milos

Re: [Qemu-devel] [PATCH v6 3/3] sclp-s390: Add memory hotplug SCLPs

2014-07-29 Thread Matthew Rosato

On 07/29/2014 08:17 AM, Christian Borntraeger wrote:
 On 30/06/14 16:00, Matthew Rosato wrote:
 Add memory information to read SCP info and add handlers for
 Read Storage Element Information, Attach Storage Element,
 Assign Storage and Unassign

 Signed-off-by: Matthew Rosato mjros...@linux.vnet.ibm.com
 ---
  hw/s390x/sclp.c|  259 
 ++--
  target-s390x/cpu.h |   15 +++
  target-s390x/kvm.c |5 +
  3 files changed, 273 insertions(+), 6 deletions(-)

 diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
 index 769d7c3..936b189 100644
 --- a/hw/s390x/sclp.c
 +++ b/hw/s390x/sclp.c
 @@ -16,7 +16,8 @@
  #include sysemu/kvm.h
  #include exec/memory.h
  #include sysemu/sysemu.h
 -
 +#include exec/address-spaces.h
 +#include qemu/config-file.h
  #include hw/s390x/sclp.h
  #include hw/s390x/event-facility.h

 @@ -33,10 +34,19 @@ static inline SCLPEventFacility *get_event_facility(void)
  static void read_SCP_info(SCCB *sccb)
  {
  ReadInfo *read_info = (ReadInfo *) sccb;
 +sclpMemoryHotplugDev *mhd = get_sclp_memory_hotplug_dev();
  CPUState *cpu;
 -int shift = 0;
  int cpu_count = 0;
  int i = 0;
 +int increment_size = 20;
 +int rnsize, rnmax;
 +QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory), NULL);
 +int slots = qemu_opt_get_number(opts, slots, 0);
 +int max_avail_slots = s390_get_memslot_count(kvm_state);
 +
 +if (slots  max_avail_slots) {
 +slots = max_avail_slots;
 +}

  CPU_FOREACH(cpu) {
  cpu_count++;
 @@ -54,14 +64,235 @@ static void read_SCP_info(SCCB *sccb)

  read_info-facilities = cpu_to_be64(SCLP_HAS_CPU_INFO);

 -while ((ram_size  (20 + shift))  65535) {
 -shift++;
 +/*
 
 I was about to apply this series, but I think it breaks the non-ccw machine 
 for strange memory sizes (e.g. 40001MB).
 The s390-virtio.c file still does a shifting via 65535. Maybe just do a 
 change over there as well?
 

Verified, this breaks on the non-ccw machine.  Thanks for catching it --
I'll submit a new series that includes a patch for it.

 You should also start making up your mind about migration and prepare 
 patches. I will send cpu migration patches soon.
 

OK, sure, I have code that I was testing a while ago, I'll start
cleaning it up for submission.

 Christian
 
 
 
 + * The storage increment size is a multiple of 1M and is a power of 2.
 + * The number of storage increments must be MAX_STORAGE_INCREMENTS or 
 fewer.
 + */
 +while ((ram_size  increment_size)  MAX_STORAGE_INCREMENTS) {
 +increment_size++;
 +}
 +rnmax = ram_size  increment_size;
 +
 +/* Memory Hotplug is only supported for the ccw machine type */
 +if (mhd) {
 +while ((mhd-standby_mem_size  increment_size) 
 +   MAX_STORAGE_INCREMENTS) {
 +increment_size++;
 +}
 +assert(increment_size == mhd-increment_size);
 +
 +mhd-standby_subregion_size = MEM_SECTION_SIZE;
 +/* Deduct the memory slot already used for core */
 +if (slots  0) {
 +while ((mhd-standby_subregion_size * (slots - 1)
 + mhd-standby_mem_size)) {
 +mhd-standby_subregion_size = mhd-standby_subregion_size 
  1;
 +}
 +}
 +/*
 + * Initialize mapping of guest standby memory sections indicating 
 which
 + * are and are not online. Assume all standby memory begins offline.
 + */
 +if (mhd-standby_state_map == 0) {
 +if (mhd-standby_mem_size % mhd-standby_subregion_size) {
 +mhd-standby_state_map = g_malloc0((mhd-standby_mem_size /
 + mhd-standby_subregion_size + 
 1) *
 + (mhd-standby_subregion_size /
 + MEM_SECTION_SIZE));
 +} else {
 +mhd-standby_state_map = g_malloc0(mhd-standby_mem_size /
 +   MEM_SECTION_SIZE);
 +}
 +}
 +mhd-padded_ram_size = ram_size + mhd-pad_size;
 +mhd-rzm = 1  mhd-increment_size;
 +rnmax = ((ram_size + mhd-standby_mem_size + mhd-pad_size)
 +  mhd-increment_size);
 +
 +read_info-facilities |= 
 cpu_to_be64(SCLP_FC_ASSIGN_ATTACH_READ_STOR);
 +}
 +
 +rnsize = 1  (increment_size - 20);
 +if (rnsize = 128) {
 +read_info-rnsize = rnsize;
 +} else {
 +read_info-rnsize = 0;
 +read_info-rnsize2 = cpu_to_be32(rnsize);
  }
 -read_info-rnmax = cpu_to_be16(ram_size  (20 + shift));
 -read_info-rnsize = 1  shift;
 +
 +if (rnmax  0x1) {
 +read_info-rnmax = cpu_to_be16(rnmax);
 +} else {
 +read_info-rnmax = cpu_to_be16(0);
 +read_info-rnmax2 = cpu_to_be64(rnmax);
 +}
 +
  sccb-h.response_code =

Re: [Qemu-devel] [PATCH] [RFC] Add machine type pc-1.0-qemu-kvm for live migrate compatibility with qemu-kvm


On 29 Jul 2014, at 15:00, Paolo Bonzini pbonz...@redhat.com wrote:

 efi-virtio.rom contains both BIOS and UEFI ROMs.

You learn a new thing every day.

 You're right, but in Serge's shoes I wouldn't bother about anything
 except LTS.

Certainly this would be the most convenient path for me (with
my 'LTS user' hat on), i.e. make it 'just work' out the box.

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread David Hildenbrand

 Il 28/07/2014 17:03, David Hildenbrand ha scritto:
  Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
  like things (SIGP START) AND a special interrupt (SIGP RESTART - which is 
  like
  a SIPI++ as it performs a psw exchange - NMI). So we basically have two
  paths that can lead to a state change.  All interrupt bits may be in any
  combination (SIGP RESTART interrupts can't be masked out, nor can SIGP 
  START be
  denied).
  
  The other thing may be that on s390, each vcpu (including itself) can put
  another vcpu into the STOPPED state - I assume that this is different for 
  x86 
  INIT_RECEIVED. For this reason we have to watch out for bad race conditions
  (e.g. multiple vcpus working on another vcpu)...
 
 You can do that in x86 by sending an INIT inter-processor interrupt.  A
 SIPI is ignored if the CPU is not in INIT_RECEIVED state.
 
 Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current
 implementation.
 
 - an INIT cancels a previous SIPI;
 
 - if both INIT and SIPI are sent, on real hardware you need to have a
 few hundred microseconds between them, but KVM will reliably process
 INIT before SIPI.
 
 See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of
 the races that can happen.
 
 Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete,
 we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE.
 

Thanks for the explanation Paolo!

Looks like from an interrupt point of view, the states have a lot in common.
The major thing that differs on s390 is probably the way these interrupts are
generated and what else they influence (all the power of the SIGP facility :)
+ special check-stop state that can't be left by an interrupt, only by SIGP
  CPU resets).

David

Re: [Qemu-devel] [PATCH 2/2] target-mips/translate.c: Add judgement for msb and lsb

2014-07-29 Thread Dongxue Zhang

Ok, I got you. I will re-build a new patch for all the bitops.


2014-07-29 22:08 GMT+08:00 Aurelien Jarno aurel...@aurel32.net:

 On Tue, Jul 29, 2014 at 08:41:08PM +0800, Elta wrote:
  I think, debug mode shouldn't crash the qemu with an unpredictable
  operation,
  so i want to fix it. And you say there shouldn't raise RI, i agree with
 you.

 Agreed.

  Or when lsb  msb, just leave the code and do nothing. What do you
  think about
  this way?

 Yes, you can use something like:

 if (lsb = msb) {
 deposit(...)
 }

 --
 Aurelien Jarno  GPG: 4096R/1DDD8C9B
 aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] hw/arm/virt: fix pl031 addr typo