date:20130822

Re: [Qemu-devel] VNC key presses not correct

2013-08-22 Thread Markus Armbruster

Erik Rull  writes:

> Markus Armbruster wrote:
>> Erik Rull  writes:
>>
>>> Hi all,
>>>
>>> I'm struggling with the QEMU VNC on qemu-kvm-1.2.0 a bit, the following two
>>> things are not working properly:
>>
>> Have you tried to reproduce on a current version?
>>
>
> Hello Markus,
>
> yes, I was able to reproduce this with the released qemu-1.6.0 from
> the qemu-wiki website. Character mistakes are the same when opening a
> Notepad on Windows XP on the guest.

Thanks!

> Maybe nobody cares because english is the layout used nearly everywhere :-)

I nobody cared, we wouldn't ship it :)

Cc'ing Gerd, who knows more about VNC than I do.

> And it is not related to the host operating system, I used a Debian
> 6.0 and a self built linux - both with the same effect.
> And it is not related to the VNC client, I tried several clients on
> several operating systems, all with the same result.
>
> Do you see this effect with a non-english input layout on your side, too?
> When using SDL/direct input everything is great.

Your initial report has details on some keypresses.  Please also tell us
your complete QEMU command line, and your complete VNC client command
line(s).

[Qemu-devel] [PATCH v2] block: Introduce bs->zero_beyond_eof

2013-08-22 Thread Asias He

In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
protocols reading beyond end of file), we break qemu-iotests ./check
-qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
for vmstate accesses (which are stored beyond the end of regular image
data).

We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
disable ->zero_beyond_eof temporarily in addition to enable ->growable.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Asias He 
---
Changes in v2: Set bs->zero_beyond_eof in bdrv_open_common

 block.c   | 4 +++-
 block/qcow2.c | 3 +++
 include/block/block_int.h | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index f3cd9fb..6668c05 100644
--- a/block.c
+++ b/block.c
@@ -706,6 +706,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
 
 bs->open_flags = flags;
 bs->buffer_alignment = 512;
+bs->zero_beyond_eof = true;
 open_flags = bdrv_open_flags(bs, flags);
 bs->read_only = !(open_flags & BDRV_O_RDWR);
 
@@ -1402,6 +1403,7 @@ void bdrv_close(BlockDriverState *bs)
 bs->valid_key = 0;
 bs->sg = 0;
 bs->growable = 0;
+bs->zero_beyond_eof = false;
 QDECREF(bs->options);
 bs->options = NULL;
 
@@ -2544,7 +2546,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState 
*bs,
 }
 }
 
-if (!bs->growable) {
+if (!(bs->zero_beyond_eof && bs->growable)) {
 ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 } else {
 /* Read zeros after EOF of growable BDSes */
diff --git a/block/qcow2.c b/block/qcow2.c
index 3376901..14e863d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1722,12 +1722,15 @@ static int qcow2_load_vmstate(BlockDriverState *bs, 
uint8_t *buf,
 {
 BDRVQcowState *s = bs->opaque;
 int growable = bs->growable;
+bool zero_beyond_eof = bs->zero_beyond_eof;
 int ret;
 
 BLKDBG_EVENT(bs->file, BLKDBG_VMSTATE_LOAD);
 bs->growable = 1;
+bs->zero_beyond_eof = false;
 ret = bdrv_pread(bs, qcow2_vm_state_offset(s) + pos, buf, size);
 bs->growable = growable;
+bs->zero_beyond_eof = zero_beyond_eof;
 
 return ret;
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e45f2a0..74b0689 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -281,6 +281,9 @@ struct BlockDriverState {
 /* Whether the disk can expand beyond total_sectors */
 int growable;
 
+/* Whether produces zeros when read beyond eof */
+bool zero_beyond_eof;
+
 /* the memory alignment required for the buffers handled by this driver */
 int buffer_alignment;
 
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] block: Introduce bs->zero_beyond_eof

2013-08-22 Thread Asias He

On Wed, Aug 21, 2013 at 05:44:08PM +0200, Stefan Hajnoczi wrote:
> On Wed, Aug 21, 2013 at 04:26:04PM +0800, Asias He wrote:
> > @@ -868,6 +868,7 @@ int bdrv_file_open(BlockDriverState **pbs, const char 
> > *filename,
> >  QDECREF(options);
> >  
> >  bs->growable = 1;
> > +bs->zero_beyond_eof = true;
> >  *pbs = bs;
> >  return 0;
> >  
> > @@ -978,6 +979,7 @@ int bdrv_open(BlockDriverState *bs, const char 
> > *filename, QDict *options,
> >  }
> >  
> >  bs->options = options;
> > +bs->zero_beyond_eof = true;
> >  options = qdict_clone_shallow(options);
> >  
> >  /* For snapshot=on, create a temporary qcow2 overlay */
> 
> We chatted about whether to duplicate bs->zero_beyond_eof = true on IRC.
> Now I think you could put it in bdrv_open_common() to avoid duplication.
> Every BDS should have ->zero_beyond_eof = true by default.

Nice. v2 is on its way.

> Stefan

-- 
Asias

[Qemu-devel] [PATCH] block: better error message for read only format name

2013-08-22 Thread Fam Zheng

When user tries to use read-only whitelist format in the command line
option, failure message was "'foo' invalid format". It might be invalid
only for writable, but valid for read-only, so it is confusing. Give the
user easier to understand information.

Signed-off-by: Fam Zheng 
---
 blockdev.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index bc7016a..d3500c6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -487,7 +487,11 @@ static DriveInfo *blockdev_init(QemuOpts *all_opts,
 
 drv = bdrv_find_whitelisted_format(buf, ro);
 if (!drv) {
-error_report("'%s' invalid format", buf);
+if (!ro && bdrv_find_whitelisted_format(buf, !ro)) {
+error_report("'%s' can be only used as read-only device.", 
buf);
+} else {
+error_report("'%s' invalid format", buf);
+}
 return NULL;
 }
 }
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Stefan Hajnoczi

On Thu, Aug 22, 2013 at 11:29:47AM +0530, Bharata B Rao wrote:
> On Wed, Aug 21, 2013 at 05:40:11PM +0200, Paolo Bonzini wrote:
> > Il 21/08/2013 17:24, Stefan Hajnoczi ha scritto:
> > > On Wed, Aug 21, 2013 at 10:02:47AM +0800, Asias He wrote:
> > >> In block/gluster.c, we have
> > >>
> > >> gluster_finish_aiocb
> > >> {
> > >>if (retval != sizeof(acb)) {
> > >>   qemu_mutex_lock_iothread(); /* We are in gluster thread context */
> > >>   ...
> > >>   qemu_mutex_unlock_iothread();
> > >>}
> > >> }
> > >>
> > >> qemu tools, e.g. qemu-img, might race here because
> > >> qemu_mutex_{lock,unlock}_iothread are a nop operation and
> > >> gluster_finish_aiocb is in the gluster thread context.
> > >>
> > >> To fix, we introduce our own mutex for qemu tools.
> > > 
> > > I think we need to look more closely at the error code path:
> > > 
> > > acb->ret = ret;
> > > retval = qemu_write_full(s->fds[GLUSTER_FD_WRITE], &acb, sizeof(acb));
> > > if (retval != sizeof(acb)) {
> > > /*
> > >  * Gluster AIO callback thread failed to notify the waiting
> > >  * QEMU thread about IO completion.
> > >  *
> > >  * Complete this IO request and make the disk inaccessible for
> > >  * subsequent reads and writes.
> > >  */
> > > error_report("Gluster failed to notify QEMU about IO completion");
> > > 
> > > qemu_mutex_lock_iothread(); /* We are in gluster thread context */
> > > acb->common.cb(acb->common.opaque, -EIO);
> > > qemu_aio_release(acb);
> > > close(s->fds[GLUSTER_FD_READ]);
> > > close(s->fds[GLUSTER_FD_WRITE]);
> > > 
> > > Is it safe to close the fds?  There is a race here:
> > > 
> > > 1. Another thread opens a new file descriptor and gets GLUSTER_FD_READ or
> > >GLUSTER_FD_WRITE's old fd value.
> > > 2. Another gluster thread invokes the callback and does
> > >qemu_write_full(s->fds[GLUSTER_FD_WRITE], ...).
> > > 
> > > Since the mutex doesn't protect s->fds[] this is possible.
> > > 
> > > Maybe a simpler solution for request completion is:
> > > 
> > > 1. Linked list of completed acbs.
> > > 2. Mutex to protect the linked list.
> > > 3. EventNotifier to signal iothread.
> > 
> > We could just use a bottom half, too.  Add a bottom half to acb,
> > schedule it in gluster_finish_aiocb, delete it in the bottom half's own
> > callback.
> 
> gluster_finish_aiocb gets called from gluster thread, is it safe to create
> and schedule a bh from such a thread ?
> 
> In my first implementation 
> (http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg01748.html), I was 
> using a BH from qemu read side thread (the thread
> that would respond to pipe write from gluster callback thread). That
> implementation was based on rbd and I later dropped the BH part since it
> looked like a round about way of completing the aio when we are already using
> the pipe mechanism for aio completion.

Recent patches made creating and scheduling a BH thread-safe.

I think Paolo's idea is better than mine.

Stefan

[Qemu-devel] [PATCH v2 0/3] vfio: fixes for better support for 128 bit memory section sizes

2013-08-22 Thread Alexey Kardashevskiy

I made a couple of small patches while debugging VFIO on SPAPR
which uses IOMMU MemoryRegion 2^64 bytes long.

Changes:
v2:
* added int128_exts64() function as a separate patch and used in
"vfio: Fix 128 bit handling".


Alexey Kardashevskiy (3):
  int128: add int128_exts64()
  vfio: Fix debug output for int128 values
  vfio: Fix 128 bit handling

 hw/misc/vfio.c| 19 +--
 include/qemu/int128.h |  5 +
 2 files changed, 18 insertions(+), 6 deletions(-)

-- 
1.8.4.rc4

[Qemu-devel] [PATCH v2 1/3] int128: add int128_exts64()

2013-08-22 Thread Alexey Kardashevskiy

This adds macro to extend signed 64bit value to signed 128bit value.

Signed-off-by: Alexey Kardashevskiy 
---
 include/qemu/int128.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 9ed47aa..987a1a9 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -38,6 +38,11 @@ static inline Int128 int128_2_64(void)
 return (Int128) { 0, 1 };
 }
 
+static inline Int128 int128_exts64(int64_t a)
+{
+return (Int128) { .lo = a, .hi = (a >> 63) ? -1 : 0 };
+}
+
 static inline Int128 int128_and(Int128 a, Int128 b)
 {
 return (Int128) { a.lo & b.lo, a.hi & b.hi };
-- 
1.8.4.rc4

[Qemu-devel] [PATCH v2 3/3] vfio: Fix 128 bit handling

2013-08-22 Thread Alexey Kardashevskiy

Upcoming VFIO on SPAPR PPC64 support will initialize the IOMMU
memory region with UINT64_MAX (2^64 bytes) size so int128_get64()
will assert.

The patch takes care of this check. The existing type1 IOMMU code
is not expected to map all 64 bits of RAM so the patch does not
touch that part.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* used new function int128_exts64()
---
 hw/misc/vfio.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index dfe3a80..3878fc7 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1920,6 +1920,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 VFIOContainer *container = container_of(listener, VFIOContainer,
 iommu_data.listener);
 hwaddr iova, end;
+Int128 llend;
 void *vaddr;
 int ret;
 
@@ -1940,13 +1941,17 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 
 iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-end = (section->offset_within_address_space + int128_get64(section->size)) 
&
-  TARGET_PAGE_MASK;
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
 
-if (iova >= end) {
+if (int128_ge(int128_make64(iova), llend)) {
 return;
 }
 
+end = (section->offset_within_address_space + int128_get64(section->size)) 
&
+  TARGET_PAGE_MASK;
+
 vaddr = memory_region_get_ram_ptr(section->mr) +
 section->offset_within_region +
 (iova - section->offset_within_address_space);
-- 
1.8.4.rc4

[Qemu-devel] [PATCH v2 2/3] vfio: Fix debug output for int128 values

2013-08-22 Thread Alexey Kardashevskiy

Memory regions can easily be 2^64 byte long and therefore overflow
for just a bit but that is enough for int128_get64() to assert.

This takes care of debug printing of huge section sizes.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/misc/vfio.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 017e693..dfe3a80 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1928,7 +1928,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 if (vfio_listener_skipped_section(section)) {
 DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
 section->offset_within_address_space,
-section->offset_within_address_space + section->size - 1);
+section->offset_within_address_space +
+int128_get64(int128_sub(section->size, int128_one(;
 return;
 }
 
@@ -1973,7 +1974,8 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 if (vfio_listener_skipped_section(section)) {
 DPRINTF("SKIPPING region_del %"HWADDR_PRIx" - %"PRIx64"\n",
 section->offset_within_address_space,
-section->offset_within_address_space + section->size - 1);
+section->offset_within_address_space +
+int128_get64(int128_sub(section->size, int128_one(;
 return;
 }
 
-- 
1.8.4.rc4

Re: [Qemu-devel] updated: kvm PCI todo wiki

2013-08-22 Thread Nicholas A. Bellinger

On Wed, 2013-08-21 at 14:45 +0200, Hannes Reinecke wrote:
> On 08/21/2013 12:48 PM, Michael S. Tsirkin wrote:
> > Hey guys,
> > I've put up a wiki page with a kvm PCI todo list,
> > mainly to avoid effort duplication, but also in the hope
> > to draw attention to what I think we should try addressing
> > in KVM:
> >
> > http://www.linux-kvm.org/page/PCITodo
> >
> > This page could cover all PCI related activity in KVM,
> > it is very incomplete.
> > We should probably add e.g. IOMMU related stuff.
> >
> > Note: if there's no developer listed for an item,
> > this just means I don't know of anyone actively working
> > on an issue at the moment, not that no one intends to.
> >
> > I would appreciate it if others working on one of the items on this list
> > would add their names so we can communicate better.  If others like this
> > wiki page, please go ahead and add stuff you are working on if any.
> >
> > It would be especially nice to add testing projects.
> >
> > Also, feel free to add links to bugzillas items.
> >
> On a related note, did anyone ever tried to test MSI / MSI-X with a 
> windows guest? I've tried to enable it for virtio but for some reason 
> Windows didn't wanted to enable it. AHCI was even worse; the stock 
> Windows version doesn't support MSI and the Intel one doesn't like our 
> implementation :-(.
> 
> Anyone ever managed to get this to work?
> 
> If not it'd be a good topic for the wiki ...
> 

Speaking of which, I asked Asias about this recently and he seems to
think that virtio-net + virtio-blk drivers for MSFT do in fact support
MSI /MSI-X.

MST, do you know if that that true..?

--nab

Re: [Qemu-devel] updated: kvm PCI todo wiki

2013-08-22 Thread Michael S. Tsirkin

On Thu, Aug 22, 2013 at 01:29:34AM -0700, Nicholas A. Bellinger wrote:
> On Wed, 2013-08-21 at 14:45 +0200, Hannes Reinecke wrote:
> > On 08/21/2013 12:48 PM, Michael S. Tsirkin wrote:
> > > Hey guys,
> > > I've put up a wiki page with a kvm PCI todo list,
> > > mainly to avoid effort duplication, but also in the hope
> > > to draw attention to what I think we should try addressing
> > > in KVM:
> > >
> > > http://www.linux-kvm.org/page/PCITodo
> > >
> > > This page could cover all PCI related activity in KVM,
> > > it is very incomplete.
> > > We should probably add e.g. IOMMU related stuff.
> > >
> > > Note: if there's no developer listed for an item,
> > > this just means I don't know of anyone actively working
> > > on an issue at the moment, not that no one intends to.
> > >
> > > I would appreciate it if others working on one of the items on this list
> > > would add their names so we can communicate better.  If others like this
> > > wiki page, please go ahead and add stuff you are working on if any.
> > >
> > > It would be especially nice to add testing projects.
> > >
> > > Also, feel free to add links to bugzillas items.
> > >
> > On a related note, did anyone ever tried to test MSI / MSI-X with a 
> > windows guest? I've tried to enable it for virtio but for some reason 
> > Windows didn't wanted to enable it. AHCI was even worse; the stock 
> > Windows version doesn't support MSI and the Intel one doesn't like our 
> > implementation :-(.
> > 
> > Anyone ever managed to get this to work?
> > 
> > If not it'd be a good topic for the wiki ...
> > 
> 
> Speaking of which, I asked Asias about this recently and he seems to
> think that virtio-net + virtio-blk drivers for MSFT do in fact support
> MSI /MSI-X.
> 
> MST, do you know if that that true..?
> 
> --nab

Yes, they do for modern windows versions.

[Qemu-devel] [RFC 0/2] osdep: warn if opening a file O_DIRECT on tmpfs fails

2013-08-22 Thread Stefan Hajnoczi

This is an implementation of Dan and Eric's idea for probing a failed O_DIRECT
open(2) call to see if the file system does not support O_DIRECT.

I wanted to see what the implementation looks like but I don't like it:

1. We still need to guess if O_DIRECT is supported in the O_CREAT EINVAL case
   because we can't probe if O_CREAT | O_DIRECT | O_EXCL returned EINVAL.
2. There is a race condition between open(O_CREAT | O_EXCL | O_DIRECT) and
   opening again without O_CREAT.  If the file is deleted we'll get ENOENT
   which would have been impossible before.
3. It's way complicated.

Issue #1 gives me an idea: why play games when we can simply warn the user?

if (ret == -1 && errno == EINVAL && (flags & O_DIRECT)) {
error_report("file system may not support O_DIRECT");
errno = EINVAL; /* in case it was clobbered */
}

I think this simple, portable approach beats statfs tmpfs and open probing.
Will send a patch for that and plan to merge it.

Stefan Hajnoczi (2):
  libcacard: link against qemu-error.o for error_report()
  osdep: warn if opening a file O_DIRECT on tmpfs fails

 libcacard/Makefile |  3 ++-
 util/osdep.c   | 62 --
 2 files changed, 57 insertions(+), 8 deletions(-)

-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state

2013-08-22 Thread Laszlo Ersek

On 08/21/13 17:32, Paolo Bonzini wrote:

> To support 1.5, libvirt should simply be ready to react to unanticipated
> GUEST_PANICKED events.  reboot-on-panic will simply be broken for 1.5
> and Linux 3.10+ guests. :(

I'm probably misunderstanding the discussion, but it might be possible
to disable pvpanic even in 1.5 from the host side, with the following hack:

  -global pvpanic.ioport=0

In qemu, this will either configure a working pvpanic device on ioport
0, or the pvpanic device will be genuinely broken. At least it doesn't
(obviously) break other stuff (in v1.5.2):

(qemu) info mtree
I/O
- (prio 0, RW): io
  - (prio 0, RW): pvpanic
  -0007 (prio 0, RW): dma-chan

(qemu) info qtree
bus: main-system-bus
  dev: i440FX-pcihost, id ""
bus: pci.0
  dev: PIIX3, id ""
bus: isa.0
  dev: pvpanic, id ""
ioport = 0

Either way, the "etc/pvpanic-port" fw_cfg file will contain 0, and
SeaBIOS will interpret it as "no pvpanic device". It will report the
same to the Linux guest too (_STA will return 0 for QEMU0001;
pvpanic_add() --> -ENODEV). Thus, no pvpanic notifier should be
registered, and reboot-on-panic should be reachable in the guest.

A horrible hack, certainly.

Laszlo

[Qemu-devel] [RFC 2/2] osdep: warn if opening a file O_DIRECT on tmpfs fails

2013-08-22 Thread Stefan Hajnoczi

Print a warning when opening a file O_DIRECT on tmpfs fails with EINVAL.
This saves users a lot of time trying to figure out the EINVAL error.

Reported-by: Deepak C Shetty 
Suggested-by: Eric Blake 
Suggested-by: Daniel P. Berrange 
Signed-off-by: Stefan Hajnoczi 
---
 util/osdep.c | 62 +---
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/util/osdep.c b/util/osdep.c
index 685c8ae..cbee2b7 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -148,6 +148,23 @@ static int qemu_parse_fdset(const char *param)
 #endif
 
 /*
+ * Open a file with O_CLOEXEC semantics
+ */
+static int open_cloexec(const char *name, int flags, int mode)
+{
+int ret;
+#ifdef O_CLOEXEC
+ret = open(name, flags | O_CLOEXEC, mode);
+#else
+ret = open(name, flags, mode);
+if (ret >= 0) {
+qemu_set_cloexec(ret);
+}
+#endif
+return ret;
+}
+
+/*
  * Opens a file with FD_CLOEXEC set
  */
 int qemu_open(const char *name, int flags, ...)
@@ -198,14 +215,45 @@ int qemu_open(const char *name, int flags, ...)
 va_end(ap);
 }
 
-#ifdef O_CLOEXEC
-ret = open(name, flags | O_CLOEXEC, mode);
-#else
-ret = open(name, flags, mode);
-if (ret >= 0) {
-qemu_set_cloexec(ret);
+/* Some file systems do not support O_DIRECT and fail with EINVAL.  Confirm
+ * the problem by trying again without O_DIRECT and printing a warning.
+ *
+ * Sounds easy enough but we need to be careful with O_CREAT since this
+ * function should not create a file when an error is returned.
+ */
+if (flags & (O_CREAT | O_DIRECT) == O_CREAT | O_DIRECT) {
+ret = open_cloexec(name, flags | O_EXCL, mode);
+
+if (ret >= 0) {
+return ret;
+}
+
+if (errno == EINVAL) {
+error_report("file system may not support O_DIRECT");
+errno = EINVAL; /* in case it was clobbered */
+return ret;
+} else if (errno == EEXIST && (flags & O_EXCL) == 0) {
+/* We know the file existed, drop O_CREAT so the following open
+ * attempts do not create a file if O_DIRECT produces EINVAL.  Note
+ * there is a race condition here if the file is deleted while we
+ * perform our open calls.
+ */
+flags &= O_CREAT;
+} else {
+return ret;
+}
+}
+
+ret = open_cloexec(name, flags, mode);
+
+if (ret == -1 && errno == EINVAL && (flags & O_DIRECT)) {
+int fd = open_cloexec(name, flags & ~O_DIRECT, mode);
+if (fd >= 0) {
+close(fd);
+error_report("file system does not support O_DIRECT");
+}
+errno = EINVAL; /* in case it was clobbered */
 }
-#endif
 
 return ret;
 }
-- 
1.8.3.1

[Qemu-devel] [RFC 1/2] libcacard: link against qemu-error.o for error_report()

2013-08-22 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 libcacard/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libcacard/Makefile b/libcacard/Makefile
index 47827a0..4d15da4 100644
--- a/libcacard/Makefile
+++ b/libcacard/Makefile
@@ -4,7 +4,8 @@ TOOLS += vscclient$(EXESUF)
 
 # objects linked into a shared library, built with libtool with -fPIC if 
required
 libcacard-obj-y = $(stub-obj-y) $(libcacard-y)
-libcacard-obj-y += util/osdep.o util/cutils.o util/qemu-timer-common.o 
util/error.o
+libcacard-obj-y += util/osdep.o util/cutils.o util/qemu-timer-common.o
+libcacard-obj-y += util/error.o util/qemu-error.o
 libcacard-obj-$(CONFIG_WIN32) += util/oslib-win32.o util/qemu-thread-win32.o
 libcacard-obj-$(CONFIG_POSIX) += util/oslib-posix.o util/qemu-thread-posix.o
 libcacard-obj-y += $(filter trace/%, $(util-obj-y))
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v3 6/6] hw: Clean up bogus default boot order

2013-08-22 Thread Markus Armbruster

"Michael S. Tsirkin"  writes:

> On Wed, Aug 21, 2013 at 06:01:45PM +0200, Markus Armbruster wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > On Wed, Aug 21, 2013 at 05:10:56PM +0200, Markus Armbruster wrote:
>> >> "Michael S. Tsirkin"  writes:
>> >> 
>> >> > On Mon, Jul 22, 2013 at 01:38:47PM +0200, Markus Armbruster wrote:
>> >> >> We set default boot order "cad" in every single machine definition
>> >> >> except "pseries" and "moxiesim", even though very few boards actually
>> >> >> care for boot order, and "cad" makes sense for even fewer.
>> >> >> 
>> >> >> Machines that care:
>> >> >> 
>> >> >> * pc and its variants
>> >> >> 
>> >> >>   Accept up to three letters 'a', 'b' (undocumented alias for 'a'),
>> >> >>   'c', 'd' and 'n'.  Reject all others (fatal with -boot).
>> >> >> 
>> >> >> * nseries (n800, n810)
>> >> >> 
>> >> >>   Check whether order starts with 'n'.  Silently ignored otherwise.
>> >> >> 
>> >> >> * prep, g3beige, mac99
>> >> >> 
>> >> >>   Extract the first character the machine understands (subset of
>> >> >>   'a'..'f').  Silently ignored otherwise.
>> >> >> 
>> >> >> * spapr
>> >> >> 
>> >> >>   Accept an arbitrary string (vl.c restricts it to contain only
>> >> >>   'a'..'p', no duplicates).
>> >> >> 
>> >> >> * sun4[mdc]
>> >> >> 
>> >> >>   Use the first character.  Silently ignored otherwise.
>> >> >> 
>> >> >> Strip characters these machines ignore from their default boot order.
>> >> >> 
>> >> >> For all other machines, remove the unused default boot order
>> >> >> alltogether.
>> >> >> 
>> >> >> Note that my rename of QEMUMachine member boot_order to
>> >> >> default_boot_order and QEMUMachineInitArgs member boot_device to
>> >> >> boot_order has a welcome side effect: it makes every use of boot
>> >> >> orders visible in this patch, for easy review.
>> >> >> 
>> >> >> Signed-off-by: Markus Armbruster 
>> >> >> ---
>> >> [...]
>> >> >> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
>> >> >> index 9327ac1..3700bd5 100644
>> >> >> --- a/hw/i386/pc_piix.c
>> >> >> +++ b/hw/i386/pc_piix.c
>> >> >> @@ -216,7 +216,7 @@ static void pc_init1(QEMUMachineInitArgs *args,
>> >> >>  }
>> >> >>  }
>> >> >>  
>> >> >> -pc_cmos_init(below_4g_mem_size, above_4g_mem_size, 
>> >> >> args->boot_device,
>> >> >> +pc_cmos_init(below_4g_mem_size, above_4g_mem_size, 
>> >> >> args->boot_order,
>> >> >>   floppy, idebus[0], idebus[1], rtc_state);
>> >> >>  
>> >> >>  if (pci_enabled && usb_enabled(false)) {
>> >> >> @@ -324,7 +324,7 @@ static QEMUMachine pc_i440fx_machine_v1_6 = {
>> >> >>  .hot_add_cpu = pc_hot_add_cpu,
>> >> >>  .max_cpus = 255,
>> >> >>  .is_default = 1,
>> >> >> -DEFAULT_MACHINE_OPTIONS,
>> >> >> +.default_boot_order = "cad",
>> >> >>  };
>> >> >>  
>> >> >>  static QEMUMachine pc_i440fx_machine_v1_5 = {
>> >> >> @@ -337,7 +337,7 @@ static QEMUMachine pc_i440fx_machine_v1_5 = {
>> >> >>  PC_COMPAT_1_5,
>> >> >>  { /* end of list */ }
>> >> >>  },
>> >> >> -DEFAULT_MACHINE_OPTIONS,
>> >> >> +.default_boot_order = "cad",
>> >> >>  };
>> >> >>  
>> >> >>  static QEMUMachine pc_i440fx_machine_v1_4 = {
>> >> >
>> >> > So all PC machine types share this?
>> >> 
>> >> Correct, just like they share DEFAULT_MACHINE_OPTIONS before my patch.
>> >> Which is defined as
>> >> 
>> >> #define DEFAULT_MACHINE_OPTIONS \
>> >> .boot_order = "cad"
>> >> 
>> >> I.e. my patch merely peels off a layer of obfuscation :)
>> >
>> > Using a macro in multiple places, instead of a hard-coded constant is
>> > not obfuscation.
>> >
>> >> > Can we set this in some common code, somehow?
>> >> 
>> >> We don't have an inheritance notion for machine types.
>> >> 
>> >> vl.c uses machine->boot_order before calling one of its methods, so
>> >> monkey-patching .boot_order from a method won't do.  Besides, that cure
>> >> looks much worse than the disease to me.
>> >> 
>> >> Can't think of anything else offhand.
>> >> 
>> >> [...]
>> >
>> > Set this in pc_init_pci somehow?
>> 
>> Too late, see "vl.c uses..." above.
>
> Ah, missed it.
> Can we fix that?

If I understand you correctly, you want to monkey-patch
machine->boot_order from machine->init().  Issues:

* machine->init() does not have access to machine.  Fixable.

* machine is read-only, except for a few places in vl.c (one is managing
  the list of registered machines, the other monkey-patches
  machine->max_cpus to one when it's zero).  I don't want *more*
  monkey-patching, I want *less*, so we can make machines const.  In
  fact, we can make current_machine const right away, and I think we
  should (patch coming).

* If machine->init() can change the default boot order, we get a data
  dependency cycle.  Current data dependency chain:

  0. Initialize QEMUMachine *machine to the default machine.

  1. Option parsing sets machine, and collects "boot-opts" options.

  2. Evaluation of "boot-opts": find normal boot order (from
 machine->boot_order and op

Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions

2013-08-22 Thread Laszlo Ersek

On 08/22/13 05:16, Wanlong Gao wrote:
> On 08/22/2013 10:29 AM, Eric Blake wrote:
>> On 08/21/2013 07:12 PM, Wanlong Gao wrote:
>>
> +   '*mem':'str' }}

 Why is size passed as a 'str' instead of an integral type?  If anything,
 at the QMP layer, it should be an integer representing size in bytes
 (the command line and HMP are already capable of converting shorthand
 like 1G into proper byte counts for use in QAPI).
>>>
>>> Since the original "mem" options is MB default, but "size" type is byte 
>>> default,
>>> so we should pass a "str" first to be consistent with original option.
>>
>> No. HMP is human-friendly - it can default to M.  QMP is
>> machine-friendly - it should default to bytes and take an 'int' rather
>> than a 'str'.  Part of the glue between HMP and QMP is converting from
>> human-friendly to machine-friendly, so that QMP doesn't have to carry cruft.
> 
> This "mem" options is only for command line options, I can't understand you
> are saying QMP command here. Because the original "mem" option treat "1024"
> as "1024MB", but if I set this to "size" type, this "mem" options will
> treat "1024" as "124B". So I should pass a str first and make it to "MB"
> default in the options parse function to be consistent with original one.

Yes. This part of the schema is not for exposure over QMP, it just
generates stuff for OptsVisitor, and it must remain compatible with the
original, manual parsing of the option.

This came up for V6:

http://thread.gmane.org/gmane.comp.emulators.qemu/225678/focus=225714

Laszlo

Re: [Qemu-devel] [RFC PATCH 3/4] qmp: Add "snapshot=" option to nbd-server-add

2013-08-22 Thread Fam Zheng

On Wed, 08/21 15:02, Stefan Hajnoczi wrote:
> On Mon, Jul 29, 2013 at 12:25:31PM +0800, Fam Zheng wrote:
> > +/* create a point-in-time snapshot BDS from an existing BDS */
> > +static BlockDriverState *nbd_create_snapshot(BlockDriverState *orig_bs)
> > +{
> > +int ret;
> > +char filename[1024];
> > +BlockDriver *drv;
> > +BlockDriverState *bs;
> > +QEMUOptionParameter *options;
> > +Error *local_err = NULL;
> > +
> > +bs = bdrv_new("");
> > +ret = get_tmp_filename(filename, sizeof(filename));
> > +if (ret < 0) {
> > +goto err;
> 
> unlink(filename) is not safe if this function returns an error.
> 
> It is simpler if you get the temporary filename first and then do
> bdrv_new().
> 
> > +}
> > +drv = bdrv_find_format("qcow2");
> > +if (drv < 0) {
> > +goto err;
> > +}
> > +options = parse_option_parameters("", drv->create_options, NULL);
> > +set_option_parameter_int(options, BLOCK_OPT_SIZE, 
> > bdrv_getlength(orig_bs));
> > +
> > +ret = bdrv_create(drv, filename, options);
> 
> This is handy but only works if the QEMU process has permission to
> create temporary files.
> 
Yes, this is a shortcut, it has this limitation, but the good side is
it's easy to get and no other dependency.

To avoid creating file, we'll need blockdev-add to override backing_hd,
and blockdev-backup to use an existing BDS as target, then we simply use
current nbd-server-add to export it.

This series is still RFC, with above said, we still need to decide which
is the way we (QEMU and libvirt) want for 1.7.  any thoughts?

Thanks
Fam

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 09:48, Stefan Hajnoczi ha scritto:
>> > gluster_finish_aiocb gets called from gluster thread, is it safe to create
>> > and schedule a bh from such a thread ?
>> > 
>> > In my first implementation 
>> > (http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg01748.html), I 
>> > was using a BH from qemu read side thread (the thread
>> > that would respond to pipe write from gluster callback thread). That
>> > implementation was based on rbd and I later dropped the BH part since it
>> > looked like a round about way of completing the aio when we are already 
>> > using
>> > the pipe mechanism for aio completion.
> Recent patches made creating and scheduling a BH thread-safe.

I thought scheduling BHs was always thread safe?

> I think Paolo's idea is better than mine.

Both are fine, they are just different.  Mine is simpler because it
leaves list management to the BH code.

But since we are at it, we should simplify the code and uniformly use a
bottom half for both successful and erroneous completions.  This applies
to both ideas.

Maybe an even simpler patch would be to just abort if the
GLUSTER_FD_WRITE write fails.  Under what circumstances could it happen?

Paolo

Re: [Qemu-devel] [PATCH v2 1/3] int128: add int128_exts64()

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 10:20, Alexey Kardashevskiy ha scritto:
> This adds macro to extend signed 64bit value to signed 128bit value.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  include/qemu/int128.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 9ed47aa..987a1a9 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -38,6 +38,11 @@ static inline Int128 int128_2_64(void)
>  return (Int128) { 0, 1 };
>  }
>  
> +static inline Int128 int128_exts64(int64_t a)
> +{
> +return (Int128) { .lo = a, .hi = (a >> 63) ? -1 : 0 };
> +}

The "? -1 : 0" is not necessary, but the compiler will remove it at -O1
or more (interestingly, or -O0 it will remove the shift and leave the
conditional!).

>  static inline Int128 int128_and(Int128 a, Int128 b)
>  {
>  return (Int128) { a.lo & b.lo, a.hi & b.hi };
> 

Acked-by: Paolo Bonzini 

Paolo

Re: [Qemu-devel] [PATCH v2 3/3] vfio: Fix 128 bit handling

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 10:20, Alexey Kardashevskiy ha scritto:
>  iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> -end = (section->offset_within_address_space + 
> int128_get64(section->size)) &
> -  TARGET_PAGE_MASK;
> +llend = int128_make64(section->offset_within_address_space);
> +llend = int128_add(llend, section->size);
> +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
>  
> -if (iova >= end) {
> +if (int128_ge(int128_make64(iova), llend)) {
>  return;
>  }
>  
> +end = (section->offset_within_address_space + 
> int128_get64(section->size)) &
> +  TARGET_PAGE_MASK;
> +

This can still fail for section->size = 2^64.  Do your IOMMU patches
take care of it?

Paolo

Re: [Qemu-devel] [PATCH V8 00/13] monitor: support sub command group in auto completion and help

2013-08-22 Thread Wenchao Xia


于 2013-8-20 22:04, Luiz Capitulino 写道:

On Tue, 30 Jul 2013 12:03:11 -0400
Luiz Capitulino  wrote:


On Fri, 26 Jul 2013 11:20:29 +0800
Wenchao Xia  wrote:


This series make auto completion and help functions works normal for sub
command, by using reentrant functions. In order to do that, global variables
are not directly used in those functions any more. With this series, cmd_table
is a member of structure Monitor so it is possible to create a monitor with
different command table now, auto completion will work in that monitor. In
short, "info" is not treated as a special case now, this series ensure help
and auto complete function works normal for any sub command added in the future.

Patch 5 replaced cur_mon with rs->mon, it is safe because:
monitor_init() calls readline_init() which initialize mon->rs, result is
mon->rs->mon == mon. Then qemu_chr_add_handlers() is called, which make
monitor_read() function take *mon as its opaque. Later, when user input,
monitor_read() is called, where cur_mon is set to *mon by "cur_mon = opaque".
If qemu's monitors run in one thread, then later in readline_handle_byte()
and readline_comletion(), cur_mon is actually equal to rs->mon, in another
word, it points to the monitor instance, so it is safe to replace *cur_mon
in those functions.


I've applied this to qmp-next with the change I suggested for
patch 09/13.


Unfortunately this series brakes make check:

GTESTER check-qtest-x86_64
Broken pipe
GTester: last random seed: R02S3492bd34f44dd17460851643383be44d
main-loop: WARNING: I/O thread spun for 1000 iterations
make: *** [check-qtest-x86_64] Error 1

I debugged it (with some help from Laszlo) and the problem is that it
broke the human-monitor-command command. Any usage of this command
triggers the bug like:

{ "execute": "human-monitor-command",
  "arguments": { "command-line": "info registers" } }

It seems simple to fix, I think you just have to initialize
mon->cmd_table in qmp_human_monitor_command(), but I'd recommend two
things:

  1. It's better to split off some/all QMP initialization from
 monitor_init() and call it from qmp_human_monitor_command()

  2. Can you please take the opportunity and test all commands using
 cur_mon? Just grep for it

Sorry for noticing this only now, but I only run make check before
sending a pull request (although this very likely shows you didn't
run it either).


  About the fd related qmp interface, to test it, send_msg() is needed,
which was not supported in python 2, but new added python 3.3. I think
there are three ways to add test cases for fd qmp APIs:
1 test only when python > 3.3.
2 python plus C: compile a .so and call it with ctypes.
3 a new test framework: pure C code to call qmp interfaces.
  Which one do you prefer?

--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [RFC PATCH v2 0/3] Start fixing the pvpanic mess

2013-08-22 Thread Daniel P. Berrange

On Wed, Aug 21, 2013 at 06:56:52PM +0200, Paolo Bonzini wrote:
> Il 21/08/2013 18:55, Daniel P. Berrange ha scritto:
> > On Wed, Aug 21, 2013 at 06:51:11PM +0200, Paolo Bonzini wrote:
> >> Il 21/08/2013 18:48, Daniel P. Berrange ha scritto:
> >>> No,  is the right thing to be using for this from
> >>> libvirt's pov & I don't think we should invent something new.
> >>> The  element has always been intended to represent
> >>> handling of guest panics, not qemu internal errors.
> >>
> >> Actually for Xen HVM guests, it mostly traps things such as failed
> >> vmentries.  The Xen PV-on-HVM drivers do not register a panic notifier
> >> that moves the guest to the "crashed" state.
> >>
> >>  cannot be salvaged, in my opinion, because all domain XMLs in
> >> the wild will have a setting that causes libvirt to add "-device
> >> isa-pvpanic".  Thus changing libvirt versions will change guest
> >> hardware, which is _very_ bad.
> >>
> >> In addition, Windows XP and 2003 will show the annoying device wizard
> >> upon a libvirt upgrade, and fixing this is what surfaced all the mess.
> > 
> > The existance of a  element should not be having any
> > effect on what hardware we create. That is merely a lifecycle
> > policy setting that should be completely independant of the
> > guest device model.
> > 
> > eg it is valid to have  present in the XML at all
> > times, even if there's no pvpanic device present. That simply
> > means the actions will never be triggered.
> 
> So are you suggesting to add a  element to ?  That
> may be fine, but it doesn't seem very user-friendly.

Yes, if we're going to have pvpanic be user controllable, it must be
via an explicit device element. 

None of the  elements should have any impact on guest ABI
model. They're purely lifecycle policy settings.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 10:38, Laszlo Ersek ha scritto:
>> > To support 1.5, libvirt should simply be ready to react to unanticipated
>> > GUEST_PANICKED events.  reboot-on-panic will simply be broken for 1.5
>> > and Linux 3.10+ guests. :(
> I'm probably misunderstanding the discussion, but it might be possible
> to disable pvpanic even in 1.5 from the host side, with the following hack:
> 
>   -global pvpanic.ioport=0
> 
> In qemu, this will either configure a working pvpanic device on ioport
> 0, or the pvpanic device will be genuinely broken. At least it doesn't
> (obviously) break other stuff (in v1.5.2):
> 
> (qemu) info mtree
> I/O
> - (prio 0, RW): io
>   - (prio 0, RW): pvpanic
>   -0007 (prio 0, RW): dma-chan

No, you're not misunderstanding the discussion.

Depending on the priorities of the pvpanic and legacy-DMA regions, it
would break DMA channel 0.  Channel 0 is (was) used for DRAM refresh, so
it should not have any visible effect.  However, it may not be entirely
disabling pvpanic, just making it mostly invisible.

Paolo

Re: [Qemu-devel] [RFC PATCH 3/4] qmp: Add "snapshot=" option to nbd-server-add

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 10:53, Fam Zheng ha scritto:
>> > This is handy but only works if the QEMU process has permission to
>> > create temporary files.
>> > 
> Yes, this is a shortcut, it has this limitation, but the good side is
> it's easy to get and no other dependency.
> 
> To avoid creating file, we'll need blockdev-add to override backing_hd,
> and blockdev-backup to use an existing BDS as target, then we simply use
> current nbd-server-add to export it.
> 
> This series is still RFC, with above said, we still need to decide which
> is the way we (QEMU and libvirt) want for 1.7.  any thoughts?

I think this was the initial design, but Kevin already said he didn't
like the idea.

(Also, if you do this you have to add nbd-server-add to qmp-transaction,
to atomically start fleecing of multiple devices).

Paolo

[Qemu-devel] [PATCH 0/2] osdep: warn if open(O_DIRECT) on fails with EINVAL

2013-08-22 Thread Stefan Hajnoczi

After trying out a few approaches, here is what I think is the simplest viable
way to print a user-friendly warning if opening a file O_DIRECT fails with
EINVAL.  This happens on Linux tmpfs.

We don't really know why we got EINVAL but if O_DIRECT was used it's a good
clue that the file system does not support O_DIRECT.

Stefan Hajnoczi (2):
  libcacard: link against qemu-error.o for error_report()
  osdep: warn if open(O_DIRECT) on fails with EINVAL

 libcacard/Makefile | 3 ++-
 util/osdep.c   | 7 +++
 2 files changed, 9 insertions(+), 1 deletion(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH 1/2] libcacard: link against qemu-error.o for error_report()

2013-08-22 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 libcacard/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libcacard/Makefile b/libcacard/Makefile
index 47827a0..4d15da4 100644
--- a/libcacard/Makefile
+++ b/libcacard/Makefile
@@ -4,7 +4,8 @@ TOOLS += vscclient$(EXESUF)
 
 # objects linked into a shared library, built with libtool with -fPIC if 
required
 libcacard-obj-y = $(stub-obj-y) $(libcacard-y)
-libcacard-obj-y += util/osdep.o util/cutils.o util/qemu-timer-common.o 
util/error.o
+libcacard-obj-y += util/osdep.o util/cutils.o util/qemu-timer-common.o
+libcacard-obj-y += util/error.o util/qemu-error.o
 libcacard-obj-$(CONFIG_WIN32) += util/oslib-win32.o util/qemu-thread-win32.o
 libcacard-obj-$(CONFIG_POSIX) += util/oslib-posix.o util/qemu-thread-posix.o
 libcacard-obj-y += $(filter trace/%, $(util-obj-y))
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state

2013-08-22 Thread Michael S. Tsirkin

On Thu, Aug 22, 2013 at 11:19:44AM +0200, Paolo Bonzini wrote:
> Il 22/08/2013 10:38, Laszlo Ersek ha scritto:
> >> > To support 1.5, libvirt should simply be ready to react to unanticipated
> >> > GUEST_PANICKED events.  reboot-on-panic will simply be broken for 1.5
> >> > and Linux 3.10+ guests. :(
> > I'm probably misunderstanding the discussion, but it might be possible
> > to disable pvpanic even in 1.5 from the host side, with the following hack:
> > 
> >   -global pvpanic.ioport=0
> > 
> > In qemu, this will either configure a working pvpanic device on ioport
> > 0, or the pvpanic device will be genuinely broken. At least it doesn't
> > (obviously) break other stuff (in v1.5.2):
> > 
> > (qemu) info mtree
> > I/O
> > - (prio 0, RW): io
> >   - (prio 0, RW): pvpanic
> >   -0007 (prio 0, RW): dma-chan
> 
> No, you're not misunderstanding the discussion.
> 
> Depending on the priorities of the pvpanic and legacy-DMA regions, it
> would break DMA channel 0.  Channel 0 is (was) used for DRAM refresh, so
> it should not have any visible effect.  However, it may not be entirely
> disabling pvpanic, just making it mostly invisible.
> 
> Paolo

Ugh.

And now that Paolo pointed out that nothing terrible
happens even when migrating from host with pvpanic
enabled to host with pvpanic disabled, I'm inclined
to think we should just disable pvpanic in 1.5.X.

Thoughts?

-- 
MST

Re: [Qemu-devel] [RFC PATCH 3/4] qmp: Add "snapshot=" option to nbd-server-add

2013-08-22 Thread Fam Zheng

On Thu, 08/22 11:22, Paolo Bonzini wrote:
> Il 22/08/2013 10:53, Fam Zheng ha scritto:
> >> > This is handy but only works if the QEMU process has permission to
> >> > create temporary files.
> >> > 
> > Yes, this is a shortcut, it has this limitation, but the good side is
> > it's easy to get and no other dependency.
> > 
> > To avoid creating file, we'll need blockdev-add to override backing_hd,
> > and blockdev-backup to use an existing BDS as target, then we simply use
> > current nbd-server-add to export it.
> > 
> > This series is still RFC, with above said, we still need to decide which
> > is the way we (QEMU and libvirt) want for 1.7.  any thoughts?
> 
> I think this was the initial design, but Kevin already said he didn't
> like the idea.
> 
> (Also, if you do this you have to add nbd-server-add to qmp-transaction,
> to atomically start fleecing of multiple devices).
> 
Rigth. And this stands the same as for blockdev-backup.

Fam

Re: [Qemu-devel] [PATCH v2 1/3] int128: add int128_exts64()

2013-08-22 Thread Peter Maydell

On 22 August 2013 10:09, Paolo Bonzini  wrote:
> Il 22/08/2013 10:20, Alexey Kardashevskiy ha scritto:
>> +static inline Int128 int128_exts64(int64_t a)
>> +{
>> +return (Int128) { .lo = a, .hi = (a >> 63) ? -1 : 0 };
>> +}
>
> The "? -1 : 0" is not necessary, but the compiler will remove it at -O1
> or more (interestingly, or -O0 it will remove the shift and leave the
> conditional!).

We can avoid relying on implementation defined
behaviour here by using
  .hi = (a < 0) ? -1 : 0;

(I know we allow ourselves to assume right-shift of signed
ints is arithmetic shift, but I think it's nicer to avoid it unless
it really makes the code better.)

-- PMM

[Qemu-devel] [PATCH 2/2] osdep: warn if open(O_DIRECT) on fails with EINVAL

2013-08-22 Thread Stefan Hajnoczi

Print a warning when opening a file O_DIRECT fails with EINVAL.  This
saves users a lot of time trying to figure out the EINVAL error, which
is typical when attempting to open a file O_DIRECT on Linux tmpfs.

Reported-by: Deepak C Shetty 
Signed-off-by: Stefan Hajnoczi 
---
 util/osdep.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/util/osdep.c b/util/osdep.c
index 685c8ae..62072b4 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -207,6 +207,13 @@ int qemu_open(const char *name, int flags, ...)
 }
 #endif
 
+#ifdef O_DIRECT
+if (ret == -1 && errno == EINVAL && (flags & O_DIRECT)) {
+error_report("file system may not support O_DIRECT");
+errno = EINVAL; /* in case it was clobbered */
+}
+#endif /* O_DIRECT */
+
 return ret;
 }
 
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v2 1/3] int128: add int128_exts64()

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 11:47, Peter Maydell ha scritto:
> On 22 August 2013 10:09, Paolo Bonzini  wrote:
>> Il 22/08/2013 10:20, Alexey Kardashevskiy ha scritto:
>>> +static inline Int128 int128_exts64(int64_t a)
>>> +{
>>> +return (Int128) { .lo = a, .hi = (a >> 63) ? -1 : 0 };
>>> +}
>>
>> The "? -1 : 0" is not necessary, but the compiler will remove it at -O1
>> or more (interestingly, or -O0 it will remove the shift and leave the
>> conditional!).
> 
> We can avoid relying on implementation defined
> behaviour here by using
>   .hi = (a < 0) ? -1 : 0;
> 
> (I know we allow ourselves to assume right-shift of signed
> ints is arithmetic shift, but I think it's nicer to avoid it unless
> it really makes the code better.)

This is what Alexey proposed.  I suggested (a >> 63) without the ?: but
he misunderstood my (probably not clear enough) suggestion.

Paolo

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 11:50, Asias He ha scritto:
> On Wed, Aug 21, 2013 at 10:16:02AM +0200, Paolo Bonzini wrote:
>> Il 21/08/2013 04:02, Asias He ha scritto:
>>> In block/gluster.c, we have
>>>
>>> gluster_finish_aiocb
>>> {
>>>if (retval != sizeof(acb)) {
>>>   qemu_mutex_lock_iothread(); /* We are in gluster thread context */
>>>   ...
>>>   qemu_mutex_unlock_iothread();
>>>}
>>> }
>>>
>>> qemu tools, e.g. qemu-img, might race here because
>>> qemu_mutex_{lock,unlock}_iothread are a nop operation and
>>> gluster_finish_aiocb is in the gluster thread context.
>>>
>>> To fix, we introduce our own mutex for qemu tools.
>>>
>>> Signed-off-by: Asias He 
>>> ---
>>>  stubs/iothread-lock.c | 11 +++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
>>> index 5d8aca1..d5c6dec 100644
>>> --- a/stubs/iothread-lock.c
>>> +++ b/stubs/iothread-lock.c
>>> @@ -1,10 +1,21 @@
>>>  #include "qemu-common.h"
>>>  #include "qemu/main-loop.h"
>>>  
>>> +static QemuMutex qemu_tools_mutex;
>>> +static pthread_once_t qemu_tools_once = PTHREAD_ONCE_INIT;
>> 
>> Doesn't work on Windows, but you can just add
> 
> Hmm, Any reasons, why it does not work on Windows?

There are no pthreads on Windows.

There is an emulation library, but we're not using it.

>> __attribute__((__constructor__)) to qemu_tools_mutex_init.
> 
> __attribute__((__constructor__)) works on Windows?

Yes, it is part of the runtime library's support for C++.  We use it
already, see include/qemu/module.h.

Paolo

>> Paolo
>>
>>> +static void qemu_tools_mutex_init(void)
>>> +{
>>> +qemu_mutex_init(&qemu_tools_mutex);
>>> +}
>>> +
>>>  void qemu_mutex_lock_iothread(void)
>>>  {
>>> +pthread_once(&qemu_tools_once, qemu_tools_mutex_init);
>>> +qemu_mutex_lock(&qemu_tools_mutex);
>>>  }
>>>  
>>>  void qemu_mutex_unlock_iothread(void)
>>>  {
>>> +qemu_mutex_unlock(&qemu_tools_mutex);
>>>  }
>>>
>>
>

Re: [Qemu-devel] [PATCH v3 6/6] hw: Clean up bogus default boot order

2013-08-22 Thread Michael S. Tsirkin

On Thu, Aug 22, 2013 at 10:39:34AM +0200, Markus Armbruster wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Wed, Aug 21, 2013 at 06:01:45PM +0200, Markus Armbruster wrote:
> >> "Michael S. Tsirkin"  writes:
> >> 
> >> > On Wed, Aug 21, 2013 at 05:10:56PM +0200, Markus Armbruster wrote:
> >> >> "Michael S. Tsirkin"  writes:
> >> >> 
> >> >> > On Mon, Jul 22, 2013 at 01:38:47PM +0200, Markus Armbruster wrote:
> >> >> >> We set default boot order "cad" in every single machine definition
> >> >> >> except "pseries" and "moxiesim", even though very few boards actually
> >> >> >> care for boot order, and "cad" makes sense for even fewer.
> >> >> >> 
> >> >> >> Machines that care:
> >> >> >> 
> >> >> >> * pc and its variants
> >> >> >> 
> >> >> >>   Accept up to three letters 'a', 'b' (undocumented alias for 'a'),
> >> >> >>   'c', 'd' and 'n'.  Reject all others (fatal with -boot).
> >> >> >> 
> >> >> >> * nseries (n800, n810)
> >> >> >> 
> >> >> >>   Check whether order starts with 'n'.  Silently ignored otherwise.
> >> >> >> 
> >> >> >> * prep, g3beige, mac99
> >> >> >> 
> >> >> >>   Extract the first character the machine understands (subset of
> >> >> >>   'a'..'f').  Silently ignored otherwise.
> >> >> >> 
> >> >> >> * spapr
> >> >> >> 
> >> >> >>   Accept an arbitrary string (vl.c restricts it to contain only
> >> >> >>   'a'..'p', no duplicates).
> >> >> >> 
> >> >> >> * sun4[mdc]
> >> >> >> 
> >> >> >>   Use the first character.  Silently ignored otherwise.
> >> >> >> 
> >> >> >> Strip characters these machines ignore from their default boot order.
> >> >> >> 
> >> >> >> For all other machines, remove the unused default boot order
> >> >> >> alltogether.
> >> >> >> 
> >> >> >> Note that my rename of QEMUMachine member boot_order to
> >> >> >> default_boot_order and QEMUMachineInitArgs member boot_device to
> >> >> >> boot_order has a welcome side effect: it makes every use of boot
> >> >> >> orders visible in this patch, for easy review.
> >> >> >> 
> >> >> >> Signed-off-by: Markus Armbruster 
> >> >> >> ---
> >> >> [...]
> >> >> >> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> >> >> >> index 9327ac1..3700bd5 100644
> >> >> >> --- a/hw/i386/pc_piix.c
> >> >> >> +++ b/hw/i386/pc_piix.c
> >> >> >> @@ -216,7 +216,7 @@ static void pc_init1(QEMUMachineInitArgs *args,
> >> >> >>  }
> >> >> >>  }
> >> >> >>  
> >> >> >> -pc_cmos_init(below_4g_mem_size, above_4g_mem_size, 
> >> >> >> args->boot_device,
> >> >> >> +pc_cmos_init(below_4g_mem_size, above_4g_mem_size, 
> >> >> >> args->boot_order,
> >> >> >>   floppy, idebus[0], idebus[1], rtc_state);
> >> >> >>  
> >> >> >>  if (pci_enabled && usb_enabled(false)) {
> >> >> >> @@ -324,7 +324,7 @@ static QEMUMachine pc_i440fx_machine_v1_6 = {
> >> >> >>  .hot_add_cpu = pc_hot_add_cpu,
> >> >> >>  .max_cpus = 255,
> >> >> >>  .is_default = 1,
> >> >> >> -DEFAULT_MACHINE_OPTIONS,
> >> >> >> +.default_boot_order = "cad",
> >> >> >>  };
> >> >> >>  
> >> >> >>  static QEMUMachine pc_i440fx_machine_v1_5 = {
> >> >> >> @@ -337,7 +337,7 @@ static QEMUMachine pc_i440fx_machine_v1_5 = {
> >> >> >>  PC_COMPAT_1_5,
> >> >> >>  { /* end of list */ }
> >> >> >>  },
> >> >> >> -DEFAULT_MACHINE_OPTIONS,
> >> >> >> +.default_boot_order = "cad",
> >> >> >>  };
> >> >> >>  
> >> >> >>  static QEMUMachine pc_i440fx_machine_v1_4 = {
> >> >> >
> >> >> > So all PC machine types share this?
> >> >> 
> >> >> Correct, just like they share DEFAULT_MACHINE_OPTIONS before my patch.
> >> >> Which is defined as
> >> >> 
> >> >> #define DEFAULT_MACHINE_OPTIONS \
> >> >> .boot_order = "cad"
> >> >> 
> >> >> I.e. my patch merely peels off a layer of obfuscation :)
> >> >
> >> > Using a macro in multiple places, instead of a hard-coded constant is
> >> > not obfuscation.
> >> >
> >> >> > Can we set this in some common code, somehow?
> >> >> 
> >> >> We don't have an inheritance notion for machine types.
> >> >> 
> >> >> vl.c uses machine->boot_order before calling one of its methods, so
> >> >> monkey-patching .boot_order from a method won't do.  Besides, that cure
> >> >> looks much worse than the disease to me.
> >> >> 
> >> >> Can't think of anything else offhand.
> >> >> 
> >> >> [...]
> >> >
> >> > Set this in pc_init_pci somehow?
> >> 
> >> Too late, see "vl.c uses..." above.
> >
> > Ah, missed it.
> > Can we fix that?
> 
> If I understand you correctly, you want to monkey-patch
> machine->boot_order from machine->init().  Issues:
> 
> * machine->init() does not have access to machine.  Fixable.
> 
> * machine is read-only, except for a few places in vl.c (one is managing
>   the list of registered machines, the other monkey-patches
>   machine->max_cpus to one when it's zero).  I don't want *more*
>   monkey-patching, I want *less*, so we can make machines const.  In
>   fact, we can make current_machine const right away, and I think we
>   should (patch coming).
> 
> * If machine->init(

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Bharata B Rao

On Thu, Aug 22, 2013 at 11:06:47AM +0200, Paolo Bonzini wrote:
> Il 22/08/2013 09:48, Stefan Hajnoczi ha scritto:
> >> > gluster_finish_aiocb gets called from gluster thread, is it safe to 
> >> > create
> >> > and schedule a bh from such a thread ?
> >> > 
> >> > In my first implementation 
> >> > (http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg01748.html), I 
> >> > was using a BH from qemu read side thread (the thread
> >> > that would respond to pipe write from gluster callback thread). That
> >> > implementation was based on rbd and I later dropped the BH part since it
> >> > looked like a round about way of completing the aio when we are already 
> >> > using
> >> > the pipe mechanism for aio completion.
> > Recent patches made creating and scheduling a BH thread-safe.
> 
> I thought scheduling BHs was always thread safe?
> 
> > I think Paolo's idea is better than mine.
> 
> Both are fine, they are just different.  Mine is simpler because it
> leaves list management to the BH code.
> 
> But since we are at it, we should simplify the code and uniformly use a
> bottom half for both successful and erroneous completions.  This applies
> to both ideas.
> 
> Maybe an even simpler patch would be to just abort if the
> GLUSTER_FD_WRITE write fails.

This was the first apporach I had. I used to abort when writes to pipe
fail. But there were concerns raised about handling the failures gracefully
and hence we ended up doing all that error handling of completing the aio
with -EIO, closing the pipe and making the disk inaccessible.

> Under what circumstances could it happen?

Not very sure, I haven't seen that happening. I had to manually inject
faults to test this error path and verify the graceful recovery.

Regards,
Bharata.

[Qemu-devel] [PATCH] chardev: fix pty_chr_timer

2013-08-22 Thread Gerd Hoffmann

pty_chr_timer first calls pty_chr_update_read_handler(), then clears
timer_tag (because it is a one-shot timer).   This is the wrong order
though.  pty_chr_update_read_handler might re-arm time timer, and the
new timer_tag gets overwitten in that case.

This leads to crashes when unplugging a pty chardev:  pty_chr_close
thinks no timer is running -> timer isn't canceled -> pty_chr_timer gets
called with stale CharDevState -> BOOM.

This patch fixes the ordering.
Kill the pointless goto while being at it.

https://bugzilla.redhat.com/show_bug.cgi?id=994414

Signed-off-by: Gerd Hoffmann 
---
 qemu-char.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 1be1cf6..1621fbd 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -1026,15 +1026,11 @@ static gboolean pty_chr_timer(gpointer opaque)
 struct CharDriverState *chr = opaque;
 PtyCharDriver *s = chr->opaque;
 
-if (s->connected) {
-goto out;
-}
-
-/* Next poll ... */
-pty_chr_update_read_handler(chr);
-
-out:
 s->timer_tag = 0;
+if (!s->connected) {
+/* Next poll ... */
+pty_chr_update_read_handler(chr);
+}
 return FALSE;
 }
 
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 11:55, Bharata B Rao ha scritto:
> This was the first apporach I had. I used to abort when writes to pipe
> fail. But there were concerns raised about handling the failures gracefully
> and hence we ended up doing all that error handling of completing the aio
> with -EIO, closing the pipe and making the disk inaccessible.
> 
>> > Under what circumstances could it happen?
> Not very sure, I haven't seen that happening. I had to manually inject
> faults to test this error path and verify the graceful recovery.

Looking at write(2), it looks like it is impossible

   EAGAIN or EWOULDBLOCK
   can't happen, blocking file descriptor

   EBADF, EPIPE
   shouldn't happen since the device is drained before
   calling qemu_gluster_close.

   EDESTADDRREQ, EDQUOT, EFBIG, EIO, ENOSPC
   cannot happen for pipes

   EFAULT
   abort would be fine

   EINTR
   handled by qemu_write_full

   EINVAL
   cannot happen (unless the pipe is closed and the
   file descriptor recycled, but see EBADF above)

The pipe(7) man page doesn't seem to add any more errors.

Paolo

Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 11:37, Michael S. Tsirkin ha scritto:
>> No, you're not misunderstanding the discussion.
>>
>> Depending on the priorities of the pvpanic and legacy-DMA regions, it
>> would break DMA channel 0.  Channel 0 is (was) used for DRAM refresh, so
>> it should not have any visible effect.  However, it may not be entirely
>> disabling pvpanic, just making it mostly invisible.
>>
>> Paolo
> 
> Ugh.
> 
> And now that Paolo pointed out that nothing terrible
> happens even when migrating from host with pvpanic
> enabled to host with pvpanic disabled, I'm inclined
> to think we should just disable pvpanic in 1.5.X.
> 
> Thoughts?

You'd obviously have no objection from me.

Paolo

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Bharata B Rao

On Thu, Aug 22, 2013 at 12:00:48PM +0200, Paolo Bonzini wrote:
> Il 22/08/2013 11:55, Bharata B Rao ha scritto:
> > This was the first apporach I had. I used to abort when writes to pipe
> > fail. But there were concerns raised about handling the failures gracefully
> > and hence we ended up doing all that error handling of completing the aio
> > with -EIO, closing the pipe and making the disk inaccessible.
> > 
> >> > Under what circumstances could it happen?
> > Not very sure, I haven't seen that happening. I had to manually inject
> > faults to test this error path and verify the graceful recovery.
> 
> Looking at write(2), it looks like it is impossible
> 
>EAGAIN or EWOULDBLOCK
>can't happen, blocking file descriptor
> 
>EBADF, EPIPE
>shouldn't happen since the device is drained before
>calling qemu_gluster_close.
> 
>EDESTADDRREQ, EDQUOT, EFBIG, EIO, ENOSPC
>cannot happen for pipes
> 
>EFAULT
>abort would be fine

In the case where we have separate system and data disks and if error (EFAULT)
happens for the data disk, don't we want to keep the VM up by gracefully
disabling IO to the data disk ? I remember this was one of the motivations to
handle this failure.

Regards,
Bharata.

Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state

2013-08-22 Thread Laszlo Ersek

On 08/22/13 11:19, Paolo Bonzini wrote:
> Il 22/08/2013 10:38, Laszlo Ersek ha scritto:
 To support 1.5, libvirt should simply be ready to react to unanticipated
 GUEST_PANICKED events.  reboot-on-panic will simply be broken for 1.5
 and Linux 3.10+ guests. :(
>> I'm probably misunderstanding the discussion, but it might be possible
>> to disable pvpanic even in 1.5 from the host side, with the following hack:
>>
>>   -global pvpanic.ioport=0
>>
>> In qemu, this will either configure a working pvpanic device on ioport
>> 0, or the pvpanic device will be genuinely broken. At least it doesn't
>> (obviously) break other stuff (in v1.5.2):
>>
>> (qemu) info mtree
>> I/O
>> - (prio 0, RW): io
>>   - (prio 0, RW): pvpanic
>>   -0007 (prio 0, RW): dma-chan
> 
> No, you're not misunderstanding the discussion.
> 
> Depending on the priorities of the pvpanic and legacy-DMA regions, it
> would break DMA channel 0. 



I think before priority comes into the picture, the access size would
matter first, no?

(I think I'm recalling this from the 0xCF9 reset control register, which
falls into the [0xCF8..0xCFA] range.)

Unless ioport 0 is accessed with width 1 for dma-chan purposes, I think
such an access would be unique to pvpanic, and always dispatched to pvpanic.

> Channel 0 is (was) used for DRAM refresh, so
> it should not have any visible effect.  However, it may not be entirely
> disabling pvpanic, just making it mostly invisible.

That's good enough for the guest to reach kexec :)



Laszlo

Re: [Qemu-devel] [PATCH v2 3/3] vfio: Fix 128 bit handling

2013-08-22 Thread Alexey Kardashevskiy

On 08/22/2013 07:11 PM, Paolo Bonzini wrote:
> Il 22/08/2013 10:20, Alexey Kardashevskiy ha scritto:
>>  iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
>> -end = (section->offset_within_address_space + 
>> int128_get64(section->size)) &
>> -  TARGET_PAGE_MASK;
>> +llend = int128_make64(section->offset_within_address_space);
>> +llend = int128_add(llend, section->size);
>> +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
>>  
>> -if (iova >= end) {
>> +if (int128_ge(int128_make64(iova), llend)) {
>>  return;
>>  }
>>  
>> +end = (section->offset_within_address_space + 
>> int128_get64(section->size)) &
>> +  TARGET_PAGE_MASK;
>> +
> 
> This can still fail for section->size = 2^64.  Do your IOMMU patches
> take care of it?

Nope. That part works for IOMMU mapped to RAM which is smaller than 2^64
bytes and therefore I do not see why we would need 2^64 bits sizes there.
Either way, I cannot test it quick (yes, I know, I should have some x86
VFIO setup by hand as everyone has a lot of x86, etc...) so I decided to
leave to the moment when x86 folks hit the problem :)

-- 
Alexey

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Asias He

On Wed, Aug 21, 2013 at 10:16:02AM +0200, Paolo Bonzini wrote:
> Il 21/08/2013 04:02, Asias He ha scritto:
> > In block/gluster.c, we have
> > 
> > gluster_finish_aiocb
> > {
> >if (retval != sizeof(acb)) {
> >   qemu_mutex_lock_iothread(); /* We are in gluster thread context */
> >   ...
> >   qemu_mutex_unlock_iothread();
> >}
> > }
> > 
> > qemu tools, e.g. qemu-img, might race here because
> > qemu_mutex_{lock,unlock}_iothread are a nop operation and
> > gluster_finish_aiocb is in the gluster thread context.
> > 
> > To fix, we introduce our own mutex for qemu tools.
> > 
> > Signed-off-by: Asias He 
> > ---
> >  stubs/iothread-lock.c | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
> > index 5d8aca1..d5c6dec 100644
> > --- a/stubs/iothread-lock.c
> > +++ b/stubs/iothread-lock.c
> > @@ -1,10 +1,21 @@
> >  #include "qemu-common.h"
> >  #include "qemu/main-loop.h"
> >  
> > +static QemuMutex qemu_tools_mutex;
> > +static pthread_once_t qemu_tools_once = PTHREAD_ONCE_INIT;
> 
> Doesn't work on Windows, but you can just add

Hmm, Any reasons, why it does not work on Windows?

> __attribute__((__constructor__)) to qemu_tools_mutex_init.

__attribute__((__constructor__)) works on Windows?

> Paolo
> 
> > +static void qemu_tools_mutex_init(void)
> > +{
> > +qemu_mutex_init(&qemu_tools_mutex);
> > +}
> > +
> >  void qemu_mutex_lock_iothread(void)
> >  {
> > +pthread_once(&qemu_tools_once, qemu_tools_mutex_init);
> > +qemu_mutex_lock(&qemu_tools_mutex);
> >  }
> >  
> >  void qemu_mutex_unlock_iothread(void)
> >  {
> > +qemu_mutex_unlock(&qemu_tools_mutex);
> >  }
> > 
> 

-- 
Asias

Re: [Qemu-devel] [PATCH v3 00/24] arm: ARM11MPCore+A9MPCore+A15MPCore QOM'ification

2013-08-22 Thread Andreas Färber

Am 21.08.2013 23:01, schrieb Peter Maydell:
> On 20 August 2013 16:20, Andreas Färber  wrote:
>> From: Andreas Färber 
>>
>> Hello Peter,
>>
>> This series fully QOM'ifies A9MPCore so that it can be embedded for Tegra2.
>> It goes on to do the same for A15MPCore, which had previously been taken as
>> template for Cortex-A57 by John Rigby, and in v3 ARM11MPCore.
> 
> This doesn't apply any more -- do you have a branch somewhere to
> save me resolving all the conflicts by hand?

I already noticed and rebased. My tegra branch is based on it, but I can
submit a v4.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v2 1/3] int128: add int128_exts64()

2013-08-22 Thread Peter Maydell

On 22 August 2013 10:48, Paolo Bonzini  wrote:
> Il 22/08/2013 11:47, Peter Maydell ha scritto:
>> We can avoid relying on implementation defined
>> behaviour here by using
>>   .hi = (a < 0) ? -1 : 0;
>>
>> (I know we allow ourselves to assume right-shift of signed
>> ints is arithmetic shift, but I think it's nicer to avoid it unless
>> it really makes the code better.)
>
> This is what Alexey proposed.  I suggested (a >> 63) without the ?: but
> he misunderstood my (probably not clear enough) suggestion.

Yes, I found that email thread after sending this. I think
the (a < 0) variant is better than using a shift (with or without
the ?: operator).

-- PMM

Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state

2013-08-22 Thread Laszlo Ersek

On 08/22/13 12:34, Laszlo Ersek wrote:

> (I think I'm recalling this from the 0xCF9 reset control register, which
> falls into the [0xCF8..0xCFA] range.)

[0xCF8..0xCFB], sigh

Re: [Qemu-devel] [PATCH v2 1/3] int128: add int128_exts64()

2013-08-22 Thread Alexey Kardashevskiy

On 08/22/2013 07:48 PM, Paolo Bonzini wrote:
> Il 22/08/2013 11:47, Peter Maydell ha scritto:
>> On 22 August 2013 10:09, Paolo Bonzini  wrote:
>>> Il 22/08/2013 10:20, Alexey Kardashevskiy ha scritto:
 +static inline Int128 int128_exts64(int64_t a)
 +{
 +return (Int128) { .lo = a, .hi = (a >> 63) ? -1 : 0 };
 +}
>>>
>>> The "? -1 : 0" is not necessary, but the compiler will remove it at -O1
>>> or more (interestingly, or -O0 it will remove the shift and leave the
>>> conditional!).
>>
>> We can avoid relying on implementation defined
>> behaviour here by using
>>   .hi = (a < 0) ? -1 : 0;
>>
>> (I know we allow ourselves to assume right-shift of signed
>> ints is arithmetic shift, but I think it's nicer to avoid it unless
>> it really makes the code better.)
> 
> This is what Alexey proposed.  I suggested (a >> 63) without the ?: but
> he misunderstood my (probably not clear enough) suggestion.

Yes, I misunderstood. It was not obvious to me that (signed long
long)-1>>63 will be still -1. I really (really) envy people who can easily
read stuff like but I cannot :(

1) return (Int128) { .lo = a, .hi = (a < 0) ? -1 : 0 };
2) return (Int128) { .lo = a, .hi = (a < 0) };
3) return (Int128) { .lo = a, .hi = a >> 63 };

So with which one should I repost the patch?


-- 
Alexey

Re: [Qemu-devel] [PATCH v3 11/24] cpu/a15mpcore: Embed GICState

2013-08-22 Thread Andreas Färber

Am 21.08.2013 23:05, schrieb Peter Maydell:
> On 20 August 2013 16:21, Andreas Färber  wrote:
>> From: Andreas Färber 
>>
>> This covers both emulated and KVM GIC.
> 
>> @@ -35,40 +36,48 @@ typedef struct A15MPPrivState {
>>  uint32_t num_cpu;
>>  uint32_t num_irq;
>>  MemoryRegion container;
>> -DeviceState *gic;
>> +
>> +GICState gic;
>>  } A15MPPrivState;
> 
>>  static void a15mp_priv_initfn(Object *obj)
>>  {
>>  SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
>>  A15MPPrivState *s = A15MPCORE_PRIV(obj);
>> +DeviceState *gicdev;
>> +const char *gictype = "arm_gic";
>> +
>> +if (kvm_irqchip_in_kernel()) {
>> +gictype = "kvm-arm-gic";
>> +}
>>
>>  memory_region_init(&s->container, obj, "a15mp-priv-container", 0x8000);
>>  sysbus_init_mmio(sbd, &s->container);
>> +
>> +object_initialize(&s->gic, gictype);
>> +gicdev = DEVICE(&s->gic);
>> +qdev_set_parent_bus(gicdev, sysbus_get_default());
>> +qdev_prop_set_uint32(gicdev, "revision", 2);
> 
> So this is basically assuming that kvm-arm-gic and arm-gic
> both have an instance struct of exactly the same size,
> even though they're different classes (they happen to be
> so at the moment, because neither adds extra state beyond
> that needed by common base class). Is that really a good
> idea? (If it ever becomes not true we get silent memory
> corruption here...)

Not sure if a union of only one member is permitted? We're not actually
accessing the GICState, only void* and DEVICE()/SYS_BUS_DEVICE(), so it
just needs to block the memory, hopefully without needing to distinguish
between ->gic.emulated and ->gic.kvm pointers.
The decision doesn't depend on any user-settable property, just on the
at this point global kvm_enabled() state, so I see nowhere else to
allocate it dynamically.

If you change the .instance_size struct one of the GICs uses, then a
number of places will need to be reviewed, including
ARM_GIC_COMMON()[*], ARM_GIC() and KVM_ARM_GIC() all returning the same
type.

[*] When we're through with the functional changes, we should
s/ARM_GIC_COMMON/COMMON_ARM_GIC/g to match the general pattern.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] chardev: fix pty_chr_timer

2013-08-22 Thread Laszlo Ersek

On 08/22/13 11:57, Gerd Hoffmann wrote:
> pty_chr_timer first calls pty_chr_update_read_handler(), then clears
> timer_tag (because it is a one-shot timer).   This is the wrong order
> though.  pty_chr_update_read_handler might re-arm time timer, and the
> new timer_tag gets overwitten in that case.
> 
> This leads to crashes when unplugging a pty chardev:  pty_chr_close
> thinks no timer is running -> timer isn't canceled -> pty_chr_timer gets
> called with stale CharDevState -> BOOM.
> 
> This patch fixes the ordering.
> Kill the pointless goto while being at it.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=994414
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  qemu-char.c | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index 1be1cf6..1621fbd 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -1026,15 +1026,11 @@ static gboolean pty_chr_timer(gpointer opaque)
>  struct CharDriverState *chr = opaque;
>  PtyCharDriver *s = chr->opaque;
>  
> -if (s->connected) {
> -goto out;
> -}
> -
> -/* Next poll ... */
> -pty_chr_update_read_handler(chr);
> -
> -out:
>  s->timer_tag = 0;
> +if (!s->connected) {
> +/* Next poll ... */
> +pty_chr_update_read_handler(chr);
> +}
>  return FALSE;
>  }

pty_chr_timer()
  s->timer_tag = 0;
  pty_chr_update_read_handler() // s->connected == 0
pty_chr_state(..., 1)   // G_IO_HUP is clear
  not calling: g_source_remove(s->timer_tag)
  s->connected = 1;
  qemu_chr_be_generic_open()
  s->fd_tag = io_add_watch_poll()

So, in this order, s->timer_tag is not removed in pty_chr_state().

But that shouldn't be necessary anyway, since pty_chr_timer() returns
FALSE, and its associated tag (s->timer_tag, see pty_chr_rearm_timer())
is removed anyway.

Seems OK to me.

Reviewed-by: Laszlo Ersek

Re: [Qemu-devel] [PATCH v2 3/3] vfio: Fix 128 bit handling

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 12:41, Alexey Kardashevskiy ha scritto:
> On 08/22/2013 07:11 PM, Paolo Bonzini wrote:
>> Il 22/08/2013 10:20, Alexey Kardashevskiy ha scritto:
>>>  iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
>>> -end = (section->offset_within_address_space + 
>>> int128_get64(section->size)) &
>>> -  TARGET_PAGE_MASK;
>>> +llend = int128_make64(section->offset_within_address_space);
>>> +llend = int128_add(llend, section->size);
>>> +llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
>>>  
>>> -if (iova >= end) {
>>> +if (int128_ge(int128_make64(iova), llend)) {
>>>  return;
>>>  }
>>>  
>>> +end = (section->offset_within_address_space + 
>>> int128_get64(section->size)) &
>>> +  TARGET_PAGE_MASK;
>>> +
>>
>> This can still fail for section->size = 2^64.  Do your IOMMU patches
>> take care of it?
> 
> Nope. That part works for IOMMU mapped to RAM which is smaller than 2^64
> bytes and therefore I do not see why we would need 2^64 bits sizes there.

Understood.  So the IOMMU patches take care of it because this code is
only used for non-IOMMU regions.  Thanks,

Paolo

> Either way, I cannot test it quick (yes, I know, I should have some x86
> VFIO setup by hand as everyone has a lot of x86, etc...) so I decided to
> leave to the moment when x86 folks hit the problem :)
> 
>

Re: [Qemu-devel] [PATCH v2 1/3] int128: add int128_exts64()

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 12:50, Alexey Kardashevskiy ha scritto:
> 1) return (Int128) { .lo = a, .hi = (a < 0) ? -1 : 0 };
> 2) return (Int128) { .lo = a, .hi = (a < 0) };
> 3) return (Int128) { .lo = a, .hi = a >> 63 };
> 
> So with which one should I repost the patch?

The second would be "-(a < 0)" actually.  Peter wants (1) I think, so go
for it.

Paolo

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 12:28, Bharata B Rao ha scritto:
> On Thu, Aug 22, 2013 at 12:00:48PM +0200, Paolo Bonzini wrote:
>> Il 22/08/2013 11:55, Bharata B Rao ha scritto:
>>> This was the first apporach I had. I used to abort when writes to pipe
>>> fail. But there were concerns raised about handling the failures gracefully
>>> and hence we ended up doing all that error handling of completing the aio
>>> with -EIO, closing the pipe and making the disk inaccessible.
>>>
> Under what circumstances could it happen?
>>> Not very sure, I haven't seen that happening. I had to manually inject
>>> faults to test this error path and verify the graceful recovery.
>>
>> Looking at write(2), it looks like it is impossible
>>
>>EAGAIN or EWOULDBLOCK
>>can't happen, blocking file descriptor
>>
>>EBADF, EPIPE
>>shouldn't happen since the device is drained before
>>calling qemu_gluster_close.
>>
>>EDESTADDRREQ, EDQUOT, EFBIG, EIO, ENOSPC
>>cannot happen for pipes
>>
>>EFAULT
>>abort would be fine
> 
> In the case where we have separate system and data disks and if error (EFAULT)
> happens for the data disk, don't we want to keep the VM up by gracefully
> disabling IO to the data disk ?

EFAULT means the buffer address is invalid, I/O error would be EIO, but...

> I remember this was one of the motivations to
> handle this failure.

... this write is on the pipe, not on a disk.

Paolo

Re: [Qemu-devel] [PATCH v3 6/6] hw: Clean up bogus default boot order

2013-08-22 Thread Markus Armbruster

"Michael S. Tsirkin"  writes:

> On Thu, Aug 22, 2013 at 10:39:34AM +0200, Markus Armbruster wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > On Wed, Aug 21, 2013 at 06:01:45PM +0200, Markus Armbruster wrote:
>> >> "Michael S. Tsirkin"  writes:
>> >> 
>> >> > On Wed, Aug 21, 2013 at 05:10:56PM +0200, Markus Armbruster wrote:
>> >> >> "Michael S. Tsirkin"  writes:
>> >> >> 
>> >> >> > On Mon, Jul 22, 2013 at 01:38:47PM +0200, Markus Armbruster wrote:
>> >> >> >> We set default boot order "cad" in every single machine definition
>> >> >> >> except "pseries" and "moxiesim", even though very few boards 
>> >> >> >> actually
>> >> >> >> care for boot order, and "cad" makes sense for even fewer.
>> >> >> >> 
>> >> >> >> Machines that care:
>> >> >> >> 
>> >> >> >> * pc and its variants
>> >> >> >> 
>> >> >> >>   Accept up to three letters 'a', 'b' (undocumented alias for 'a'),
>> >> >> >>   'c', 'd' and 'n'.  Reject all others (fatal with -boot).
>> >> >> >> 
>> >> >> >> * nseries (n800, n810)
>> >> >> >> 
>> >> >> >>   Check whether order starts with 'n'.  Silently ignored otherwise.
>> >> >> >> 
>> >> >> >> * prep, g3beige, mac99
>> >> >> >> 
>> >> >> >>   Extract the first character the machine understands (subset of
>> >> >> >>   'a'..'f').  Silently ignored otherwise.
>> >> >> >> 
>> >> >> >> * spapr
>> >> >> >> 
>> >> >> >>   Accept an arbitrary string (vl.c restricts it to contain only
>> >> >> >>   'a'..'p', no duplicates).
>> >> >> >> 
>> >> >> >> * sun4[mdc]
>> >> >> >> 
>> >> >> >>   Use the first character.  Silently ignored otherwise.
>> >> >> >> 
>> >> >> >> Strip characters these machines ignore from their default boot 
>> >> >> >> order.
>> >> >> >> 
>> >> >> >> For all other machines, remove the unused default boot order
>> >> >> >> alltogether.
>> >> >> >> 
>> >> >> >> Note that my rename of QEMUMachine member boot_order to
>> >> >> >> default_boot_order and QEMUMachineInitArgs member boot_device to
>> >> >> >> boot_order has a welcome side effect: it makes every use of boot
>> >> >> >> orders visible in this patch, for easy review.
>> >> >> >> 
>> >> >> >> Signed-off-by: Markus Armbruster 
>> >> >> >> ---
>> >> >> [...]
>> >> >> >> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
>> >> >> >> index 9327ac1..3700bd5 100644
>> >> >> >> --- a/hw/i386/pc_piix.c
>> >> >> >> +++ b/hw/i386/pc_piix.c
>> >> >> >> @@ -216,7 +216,7 @@ static void pc_init1(QEMUMachineInitArgs *args,
>> >> >> >>  }
>> >> >> >>  }
>> >> >> >>  
>> >> >> >> -pc_cmos_init(below_4g_mem_size, above_4g_mem_size, 
>> >> >> >> args->boot_device,
>> >> >> >> +pc_cmos_init(below_4g_mem_size, above_4g_mem_size, 
>> >> >> >> args->boot_order,
>> >> >> >>   floppy, idebus[0], idebus[1], rtc_state);
>> >> >> >>  
>> >> >> >>  if (pci_enabled && usb_enabled(false)) {
>> >> >> >> @@ -324,7 +324,7 @@ static QEMUMachine pc_i440fx_machine_v1_6 = {
>> >> >> >>  .hot_add_cpu = pc_hot_add_cpu,
>> >> >> >>  .max_cpus = 255,
>> >> >> >>  .is_default = 1,
>> >> >> >> -DEFAULT_MACHINE_OPTIONS,
>> >> >> >> +.default_boot_order = "cad",
>> >> >> >>  };
>> >> >> >>  
>> >> >> >>  static QEMUMachine pc_i440fx_machine_v1_5 = {
>> >> >> >> @@ -337,7 +337,7 @@ static QEMUMachine pc_i440fx_machine_v1_5 = {
>> >> >> >>  PC_COMPAT_1_5,
>> >> >> >>  { /* end of list */ }
>> >> >> >>  },
>> >> >> >> -DEFAULT_MACHINE_OPTIONS,
>> >> >> >> +.default_boot_order = "cad",
>> >> >> >>  };
>> >> >> >>  
>> >> >> >>  static QEMUMachine pc_i440fx_machine_v1_4 = {
>> >> >> >
>> >> >> > So all PC machine types share this?
>> >> >> 
>> >> >> Correct, just like they share DEFAULT_MACHINE_OPTIONS before my patch.
>> >> >> Which is defined as
>> >> >> 
>> >> >> #define DEFAULT_MACHINE_OPTIONS \
>> >> >> .boot_order = "cad"
>> >> >> 
>> >> >> I.e. my patch merely peels off a layer of obfuscation :)
>> >> >
>> >> > Using a macro in multiple places, instead of a hard-coded constant is
>> >> > not obfuscation.
>> >> >
>> >> >> > Can we set this in some common code, somehow?
>> >> >> 
>> >> >> We don't have an inheritance notion for machine types.
>> >> >> 
>> >> >> vl.c uses machine->boot_order before calling one of its methods, so
>> >> >> monkey-patching .boot_order from a method won't do.  Besides, that cure
>> >> >> looks much worse than the disease to me.
>> >> >> 
>> >> >> Can't think of anything else offhand.
>> >> >> 
>> >> >> [...]
>> >> >
>> >> > Set this in pc_init_pci somehow?
>> >> 
>> >> Too late, see "vl.c uses..." above.
>> >
>> > Ah, missed it.
>> > Can we fix that?
>> 
>> If I understand you correctly, you want to monkey-patch
>> machine->boot_order from machine->init().  Issues:
>> 
>> * machine->init() does not have access to machine.  Fixable.
>> 
>> * machine is read-only, except for a few places in vl.c (one is managing
>>   the list of registered machines, the other monkey-patches
>>   machine->max_cpus to one when it's zero).  I don't want *more*
>>   monk

[Qemu-devel] [PATCH v3 1/3] int128: add int128_exts64()

2013-08-22 Thread Alexey Kardashevskiy

This adds macro to extend signed 64bit value to signed 128bit value.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* (.hi = (a >> 63) ? -1 : 0) changed to (.hi = (a < 0) ? -1 : 0)
---
 include/qemu/int128.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 9ed47aa..ef87e5e 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -38,6 +38,11 @@ static inline Int128 int128_2_64(void)
 return (Int128) { 0, 1 };
 }
 
+static inline Int128 int128_exts64(int64_t a)
+{
+return (Int128) { .lo = a, .hi = (a < 0) ? -1 : 0 };
+}
+
 static inline Int128 int128_and(Int128 a, Int128 b)
 {
 return (Int128) { a.lo & b.lo, a.hi & b.hi };
-- 
1.8.4.rc4

[Qemu-devel] [PATCH v3 0/3] vfio: fixes for better support for 128 bit memory section sizes

2013-08-22 Thread Alexey Kardashevskiy

I made a couple of small patches while debugging VFIO on SPAPR
which uses IOMMU MemoryRegion 2^64 bytes long.

Changes:
v3:
* "int128: add int128_exts64()" updated

v2:
* added int128_exts64() function as a separate patch and used in
"vfio: Fix 128 bit handling"



Alexey Kardashevskiy (3):
  int128: add int128_exts64()
  vfio: Fix debug output for int128 values
  vfio: Fix 128 bit handling

 hw/misc/vfio.c| 19 +--
 include/qemu/int128.h |  5 +
 2 files changed, 18 insertions(+), 6 deletions(-)

-- 
1.8.4.rc4

[Qemu-devel] [PATCH v3 2/3] vfio: Fix debug output for int128 values

2013-08-22 Thread Alexey Kardashevskiy

Memory regions can easily be 2^64 byte long and therefore overflow
for just a bit but that is enough for int128_get64() to assert.

This takes care of debug printing of huge section sizes.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/misc/vfio.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 017e693..dfe3a80 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1928,7 +1928,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 if (vfio_listener_skipped_section(section)) {
 DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
 section->offset_within_address_space,
-section->offset_within_address_space + section->size - 1);
+section->offset_within_address_space +
+int128_get64(int128_sub(section->size, int128_one(;
 return;
 }
 
@@ -1973,7 +1974,8 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 if (vfio_listener_skipped_section(section)) {
 DPRINTF("SKIPPING region_del %"HWADDR_PRIx" - %"PRIx64"\n",
 section->offset_within_address_space,
-section->offset_within_address_space + section->size - 1);
+section->offset_within_address_space +
+int128_get64(int128_sub(section->size, int128_one(;
 return;
 }
 
-- 
1.8.4.rc4

[Qemu-devel] [PATCH v3 3/3] vfio: Fix 128 bit handling

2013-08-22 Thread Alexey Kardashevskiy

Upcoming VFIO on SPAPR PPC64 support will initialize the IOMMU
memory region with UINT64_MAX (2^64 bytes) size so int128_get64()
will assert.

The patch takes care of this check. The existing type1 IOMMU code
is not expected to map all 64 bits of RAM so the patch does not
touch that part.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* used new function int128_exts64()
---
 hw/misc/vfio.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index dfe3a80..3878fc7 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -1920,6 +1920,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 VFIOContainer *container = container_of(listener, VFIOContainer,
 iommu_data.listener);
 hwaddr iova, end;
+Int128 llend;
 void *vaddr;
 int ret;
 
@@ -1940,13 +1941,17 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 
 iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-end = (section->offset_within_address_space + int128_get64(section->size)) 
&
-  TARGET_PAGE_MASK;
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
 
-if (iova >= end) {
+if (int128_ge(int128_make64(iova), llend)) {
 return;
 }
 
+end = (section->offset_within_address_space + int128_get64(section->size)) 
&
+  TARGET_PAGE_MASK;
+
 vaddr = memory_region_get_ram_ptr(section->mr) +
 section->offset_within_region +
 (iova - section->offset_within_address_space);
-- 
1.8.4.rc4

Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 12:34, Laszlo Ersek ha scritto:
> 

Actually it's fine to clarify these things!  Hence the longish
digression below.

> I think before priority comes into the picture, the access size would
> matter first, no?
> 
> (I think I'm recalling this from the 0xCF9 reset control register, which
> falls into the [0xCF8..0xCFA] range.)

The base address is what matters.  A 2- or 4-byte access to x will
always go to the region that includes address x, even if there are other
regions between x and respectively x+1 or x+3.  So an access to 0xCF8
will go to the PCI address register, while an access to 0xCF9 will
always go to the reset control register.

This happens in address_space_translate_internal:

diff = int128_sub(section->mr->size, int128_make64(addr));

For a write to 0xCF8, addr would be 0 (it is relative to the base of the
MemoryRegion).  section->size would be 1 because the next section starts
at 0xCF9.  However, section->mr->size would be 4 as specified e.g. in
i440fx_pcihost_initfn:

memory_region_init_io(&s->conf_mem, obj, &pci_host_conf_le_ops, s,
  "pci-conf-idx", 4);

Using section->size would be wrong---it would attempt a 1-byte write to
0xCF8, another 1-byte write to 0xCF9, and a 2-byte write to 0xCFA.
section->mr->size instead does a single write to 0xCF8, the same as on
real hardware.

BTW, the behavior changed slightly in QEMU 1.6 for 8-byte accesses. All
such accesses were split to two 4-byte accesses before; now the maximum
size of a "direct" MMIO operation (the data bus size, effectively) is 64
bits, so a 64-bit write will always address a single MemoryRegion.

For example, say you had the PCI address and data registers occupying
two separate 4-byte MemoryRegions in 8 consecutive bytes of memory.  In
1.5 you could write both of them with a single 64-bit write.  In 1.6,
this would only write four bytes to the first MemoryRegion.  This
matches hardware more closely, and is really unlikely to be a problem: a
target with 32-bit data bus probably would not have 64-bit CPU registers
to begin with.  If it did, it would resemble the architecture of the
80386SX or 8088 processors.

> Unless ioport 0 is accessed with width 1 for dma-chan purposes, I think
> such an access would be unique to pvpanic, and always dispatched to pvpanic.

It is:

static const MemoryRegionOps channel_io_ops = {
.read = read_chan,
.write = write_chan,
.endianness = DEVICE_NATIVE_ENDIAN,
.impl = {
.min_access_size = 1,
.max_access_size = 1,
},
};

>> Channel 0 is (was) used for DRAM refresh, so
>> it should not have any visible effect.  However, it may not be entirely
>> disabling pvpanic, just making it mostly invisible.
> 
> That's good enough for the guest to reach kexec :)

Yes, I cannot deny that. :)

Paolo

Re: [Qemu-devel] [PATCH v5 0/8] Implement reference count for BlockDriverState [resend]

2013-08-22 Thread Stefan Hajnoczi

On Fri, Aug 09, 2013 at 06:01:53PM +0800, Fam Zheng wrote:
> [resend to the correct list]
> 
> BlockDriverState lifecycle management is needed by future features such as
> image fleecing and blockdev-add. This series adds reference count to
> BlockDriverState.
> 
> The first two patches clean up two odd BlockDriverState use cases, so all code
> uses bdrv_new() to create BlockDriverState instance.
> 
> Then implemented bdrv_ref() and bdrv_unref() to operate on refcnt: Initially,
> refcnt is 1, which means bdrv_unref is effectively a bdrv_delete() here. So
> patch 04 has a search and replace to convert bdrv_delete to bdrv_unref, before
> bdrv_ref is used anywhere. 05~08 patches calls bdrv_ref for device attach,
> block-migration and nbd.
> 
> The rule is: Either bdrv_ref() or bdrv_new() must have a matching
> bdrv_unref() call, and the last matching bdrv_unref deletes the bs.
> 
> v4:
> 08: Added, let block job use BDS reference.
> 02: Fix leak of bs.opaque
> 
> v3:
> 03: Removed unnecessary bdrv_close() call.
> 
> v2:
> 05: Removed: "block: use BlockDriverState refcnt for device attach/detach"
> 07: Fix xen_disk blk_disconnect() as it depended on device attach refcnt.

Sorry, can't merge this because it breaks qemu-iotests 041 and 055:

  $ ./check -qcow2 055 041

Please always run qemu-iotests before submitting patches.

Stefan

Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes

2013-08-22 Thread Stefan Hajnoczi

On Wed, Aug 21, 2013 at 05:33:41PM +0800, Wenchao Xia wrote:
> 于 2013-8-21 16:45, Stefan Hajnoczi 写道:
> >On Tue, Aug 20, 2013 at 05:59:58PM +0800, Wenchao Xia wrote:
> >>于 2013-8-16 16:12, Wenchao Xia 写道:
> >>>于 2013-8-16 15:15, Wenchao Xia 写道:
> 于 2013-8-16 0:32, Michael Roth 写道:
> >Quoting Michael Roth (2013-08-15 10:23:20)
> >>Quoting Wenchao Xia (2013-08-13 03:44:39)
> >>>于 2013-8-13 1:01, Michael Roth 写道:
> Quoting Paolo Bonzini (2013-08-12 02:30:28)
> >>1) rename AioContext to AioSource.
> >>This is my major purpose, which declare it is not a "context"
> >>concept,
> >>and GMainContext is the entity represent the thread's activity.
> >
> >Note that the nested event loops in QEMU are _very_ different from
> >glib nested event loops.  In QEMU, nested event loops only run block
> >layer events.  In glib, they run all events.  That's why you need
> >AioContext.
> 
> Would it be possible to use glib for our nested loops as well by just
> setting a higher priority for the AioContext GSource?
> 
> Stefan and I were considering how we could make use of his "drop
> ioflush" patches to use a common mechanism to register fd events, but
> still allow us to distinguish between AioContext and non-AioContext
> for nested loops. I was originally thinking of using prepare()
> functions
> to filter out non-AioContext events, but that requires we implement
> on GSource's with that in mind, and non make use of pre-baked ones
> like GIOChannel's, and bakes block stuff into every event source
> implementation.
> 
> >>>Besides priority, also g_source_set_can_recurse() can help.
> >>>With a deeper think, I found a harder problem:
> >>>g_main_context_acquire() and g_main_context_release(). In release,
> >>>pending BH/IO call back need to be cleared, but this action can't
> >>>be triggered automatically when user call g_main_context_release().
> >>
> >>I don't understand why this is a requirement, gmctx_acquire/release
> >>ensure
> >>that only one thread attempts to iterate the main loop at a time. this
> >>isn't currently an issue in qemu, and if we re-implemented
> >>qemu_aio_wait()
> >>to use the same glib interfaces, the tracking of in-flight requests
> >>would
> >>be moved to the block layer via Stefan's 'drop io_flush' patches, which
> >>moves that block-specific logic out of the main loop/AioContext GSource
> >>by design. Are there other areas where you see this as a problem?
> >
> >I think I understand better what you're referring to, you mean that
> >if qemu_aio_wait was called, and was implementated to essentially call
> >g_main_context_iterate(), that after, say, 1 iteration, the
> >iothread/dataplane thread might acquire the main loop and dispatch
> >block/non-block events between qemu_aio_wait() returned? The simple
> >approach would be to have qemu_aio_wait() call
> >g_main_context_acquire/release
> >at the start end of the function, which would ensure that this never
> >happens.
> >
>    qemu_aio_wait() is relative simple to resolve by
> g_main_context_acquire(), but also shows additional code needed
> for a thread switch:
> (guess following is the plan to implement qemu_aio_wait())
> qemu_aio_wait(GMainContext *ctx)
> {
>  return g_main_context(ctx, PRI_BLOCK);
> }
> at caller:
> {
>  while (qemu_aio_wait(ctx) {
>  ;
>  }
>  g_main_context_release(ctx);
> }
>    The above code shows that, in *ctx switch operation, there is
> more than glib's own event loop API envolved, qemu_aio_wait(). So
> it referenced to a question: what data structure
> should be used to represent context concept and control the thread
> switching behavior?  It is better to have a clear layer, GMainContext or
> GlibQContext, instead of GMainContext plus custom function. The caller
> reference to at least two: nested user block layer, flat user above
> block layer.
>    In my opinion, this problem is brought by Gsource AioContext, Take
> the call path of bdrv_aio_readv(*bdrv_cb) on raw linux file as
> an example, there are aync following operations involved for AioContext:
> 1 the bdrv_cb() will be executed in bdrv_co_em_bh().
> 2 bdrv_co_io_em_complete() will be executed in event_notfier_ready().
> 3 aio_worker() will be executed in worker_thread().
>    Operation 2 and 3 are tracked by block layer's queue after Stefan's
> dropping io_flush() series.
>    Now if we stick to GMainContext to represent context concept,
> then when thread B want to aquire GMainContext used by thread A,
> all works setupped by A should be finished before B aquire it,
> otherwis

Re: [Qemu-devel] [PATCH v3 11/24] cpu/a15mpcore: Embed GICState

2013-08-22 Thread Peter Maydell

On 22 August 2013 11:56, Andreas Färber  wrote:
> Am 21.08.2013 23:05, schrieb Peter Maydell:
> Not sure if a union of only one member is permitted? We're not actually
> accessing the GICState, only void* and DEVICE()/SYS_BUS_DEVICE(), so it
> just needs to block the memory, hopefully without needing to distinguish
> between ->gic.emulated and ->gic.kvm pointers.

Yes, but we need to actually have the right amount of memory
even if we don't care which of the subclasses this is. When
we were just storing a pointer this worked fine, because we
effectively deferred to the "create me a Foo" code to allocate
the right amount of memory for the specific object it returned.

> The decision doesn't depend on any user-settable property, just on the
> at this point global kvm_enabled() state, so I see nowhere else to
> allocate it dynamically.
>
> If you change the .instance_size struct one of the GICs uses, then a
> number of places will need to be reviewed, including
> ARM_GIC_COMMON()[*], ARM_GIC() and KVM_ARM_GIC() all returning the same
> type.

Yes, but those are part of the implementation, not the users,
and also they will be easy to find because there will be compile
errors where we've changed the type returned by one of those
macros but we haven't updated the callsite. (Conversely, users
which don't care which specific subclass they're dealing with
can continue to use the struct and macro corresponding to
the common base class.)

If there was a variant of object_initialize() that checked that
sizeof(*obj) was at least as big as the size of the object
instance that would be useful. (Would need to be a macro,
and I think we'd need to expose a function for "how big is
this type anyway").

-- PMM

Re: [Qemu-devel] [PATCHv4] add qemu-img convert -C option (skip target volume creation)

2013-08-22 Thread Stefan Hajnoczi

On Mon, Aug 12, 2013 at 12:41:50PM +0100, Alex Bligh wrote:
> From: Alexandre Derumier 
> 
> Add a -C option to skip volume creation on qemu-img convert.
> This is useful for targets such as rbd / ceph, where the
> target volume may already exist; we cannot always rely on
> qemu-img convert to create the image, as dependent on the
> output format, there may be parameters which are not possible
> to specify through the qemu-img convert command line.
> 
> Signed-off-by: Alexandre Derumier 
> Signed-off-by: Alex Bligh 
> ---
>  qemu-img-cmds.hx |4 ++--
>  qemu-img.c   |   39 ---
>  qemu-img.texi|   15 ++-
>  3 files changed, 40 insertions(+), 18 deletions(-)

Looks good but please include a new qemu-iotest test case that checks:

1. Error if the target volume does not exist.

2. Success if a correctly sized target volume exists.

3. ?? if an incorrectly sized target volume exists.

...and anything else you feel is worth testing.

I recommend keeping the test volume size small so the test case can
execute quickly.  1 MB should be fine for raw or qcow2 images.

Stefan

Re: [Qemu-devel] [PATCH]virtio: virtqueue_get_avail_bytes: fix desc_pa when loop over the indirect descriptor table

2013-08-22 Thread Stefan Hajnoczi

On Thu, Aug 22, 2013 at 02:47:16PM +0800, yinyin wrote:
> virtqueue_get_avail_bytes: when found a indirect desc, we need loop over it.
>/* loop over the indirect descriptor table */
>indirect = 1;
>max = vring_desc_len(desc_pa, i) / sizeof(VRingDesc);
>num_bufs = i = 0;
>desc_pa = vring_desc_addr(desc_pa, i);
> But, It init i to 0, then use i to update desc_pa. so we will always get:
> desc_pa = vring_desc_addr(desc_pa, 0);
> the last two line should swap.
> 
> Signed-off-by: Yin Yin 
> ---
>  hw/virtio/virtio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi

Re: [Qemu-devel] [PATCH v3 0/3] slirp: fill mainloop with more precise timeout value

2013-08-22 Thread Stefan Hajnoczi

On Wed, Aug 21, 2013 at 10:15:49AM +0800, Liu Ping Fan wrote:
> With this series, we can set the mainloop timeout more precisely when slirp 
> has
> to emulate tcp timeout problem.
> 
> v3:
>   fix comment: document timeout unit "milliseconds"
>   fix logic: no slirps, no timeout modifications in slirp_pollfds_fill()
> v2:
>   fold slirp_update_timeout logic into slirp_pollfds_fill.
> 
> 
> Liu Ping Fan (3):
>   slirp: make timeout local
>   slirp: define timeout as macro
>   slirp: set mainloop timeout with more precise value
> 
>  main-loop.c  |  3 +--
>  slirp/libslirp.h |  3 +--
>  slirp/slirp.c| 55 +++
>  slirp/slirp.h|  3 +++
>  stubs/slirp.c|  6 +-
>  5 files changed, 45 insertions(+), 25 deletions(-)

These patches look okay, I've skimmed them.  Leaving detailed slirp review to 
Jan.

Stefan

Re: [Qemu-devel] [PATCH v2] block: Introduce bs->zero_beyond_eof

2013-08-22 Thread Stefan Hajnoczi

On Thu, Aug 22, 2013 at 03:24:14PM +0800, Asias He wrote:
> In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
> protocols reading beyond end of file), we break qemu-iotests ./check
> -qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
> for vmstate accesses (which are stored beyond the end of regular image
> data).
> 
> We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
> disable ->zero_beyond_eof temporarily in addition to enable ->growable.
> 
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Asias He 
> ---
> Changes in v2: Set bs->zero_beyond_eof in bdrv_open_common
> 
>  block.c   | 4 +++-
>  block/qcow2.c | 3 +++
>  include/block/block_int.h | 3 +++
>  3 files changed, 9 insertions(+), 1 deletion(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

...with a little twist: since the broken patch hasn't been merged yet
I'm applying this fix *first* to keep the tree bisectable.

Stefan

Re: [Qemu-devel] [PATCH v2] block: Produce zeros when protocols reading beyond end of file

2013-08-22 Thread Stefan Hajnoczi

On Tue, Aug 06, 2013 at 09:53:40AM +0800, Asias He wrote:
> From: MORITA Kazutaka 
> 
> While Asias is debugging an issue creating qcow2 images on top of
> non-file protocols.  It boils down to this example using NBD:
> 
> $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> 
> Notice the open -g option to set bs->growable.  This means you can
> read/write beyond end of file.  Reading beyond end of file is supposed
> to produce zeroes.
> 
> We rely on this behavior in qcow2_create2() during qcow2 image
> creation.  We create a new file and then write the qcow2 header
> structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
> sector size, block.c first uses bdrv_read() on the empty file to fetch
> the first sector (should be all zeroes).
> 
> Here is the output from the qemu-io NBD example above:
> 
> $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
> :  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
> 0010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
> 0020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
> ...
> 
> We are not zeroing the buffer!  As a result qcow2 image creation on top
> of protocols is not guaranteed to work even when file creation is
> supported by the protocol.
> 
> Signed-off-by: MORITA Kazutaka 
> Signed-off-by: Asias He 
> ---
>  block.c | 30 +-
>  1 file changed, 29 insertions(+), 1 deletion(-)

Applied again on top of Asias' fix so qcow2 vmstate doesn't break.

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

[Qemu-devel] [PATCH ] hw: Modified usb-tablet category from Misc to Input

2013-08-22 Thread Marcel Apfelbaum

Thanks to Markus Armbruster 
for finding this bug.

Signed-off-by: Marcel Apfelbaum 
---
 hw/usb/dev-hid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-hid.c b/hw/usb/dev-hid.c
index 66c6331..5956720 100644
--- a/hw/usb/dev-hid.c
+++ b/hw/usb/dev-hid.c
@@ -658,7 +658,7 @@ static void usb_tablet_class_initfn(ObjectClass *klass, 
void *data)
 uc->product_desc   = "QEMU USB Tablet";
 dc->vmsd = &vmstate_usb_ptr;
 dc->props = usb_tablet_properties;
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
 }
 
 static const TypeInfo usb_tablet_info = {
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] block: better error message for read only format name

2013-08-22 Thread Stefan Hajnoczi

On Thu, Aug 22, 2013 at 03:36:59PM +0800, Fam Zheng wrote:
> When user tries to use read-only whitelist format in the command line
> option, failure message was "'foo' invalid format". It might be invalid
> only for writable, but valid for read-only, so it is confusing. Give the
> user easier to understand information.
> 
> Signed-off-by: Fam Zheng 
> ---
>  blockdev.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

Re: [Qemu-devel] [PULL 06/18] aio: stop using .io_flush()

2013-08-22 Thread Stefan Hajnoczi

On Wed, Aug 21, 2013 at 07:55:42PM +0200, Stefan Weil wrote:
> this patch was committed yesterday. It breaks MinGW builds:
> 
> aio-win32.c: In function ‘aio_poll’:
> aio-win32.c:128:21: error: ‘AioHandler’ has no member named ‘opaque’
> aio-win32.c:191:25: error: ‘AioHandler’ has no member named ‘opaque’

Thanks for pointing this out.  Let me try to put together a mingw
environment, fix this, and keep testing mingw builds in the future.

Stefan

Re: [Qemu-devel] [PATCHv4] add qemu-img convert -C option (skip target volume creation)

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 13:45, Stefan Hajnoczi ha scritto:
> On Mon, Aug 12, 2013 at 12:41:50PM +0100, Alex Bligh wrote:
>> > From: Alexandre Derumier 
>> > 
>> > Add a -C option to skip volume creation on qemu-img convert.
>> > This is useful for targets such as rbd / ceph, where the
>> > target volume may already exist; we cannot always rely on
>> > qemu-img convert to create the image, as dependent on the
>> > output format, there may be parameters which are not possible
>> > to specify through the qemu-img convert command line.
>> > 
>> > Signed-off-by: Alexandre Derumier 
>> > Signed-off-by: Alex Bligh 
>> > ---
>> >  qemu-img-cmds.hx |4 ++--
>> >  qemu-img.c   |   39 ---
>> >  qemu-img.texi|   15 ++-
>> >  3 files changed, 40 insertions(+), 18 deletions(-)
> Looks good but please include a new qemu-iotest test case that checks:
> 
> 1. Error if the target volume does not exist.
> 
> 2. Success if a correctly sized target volume exists.
> 
> 3. ?? if an incorrectly sized target volume exists.
> 
> ...and anything else you feel is worth testing.
> 
> I recommend keeping the test volume size small so the test case can
> execute quickly.  1 MB should be fine for raw or qcow2 images.

Also, this is the same as some HMP commands' "-n" option (live
snapshots, mirroring, backup) so I suggest to use that name.

Paolo

Re: [Qemu-devel] [PATCH 3/3] pvpanic: rename to isa-pvpanic

2013-08-22 Thread Laszlo Ersek

On 08/21/13 19:06, Paolo Bonzini wrote:
> Il 21/08/2013 19:07, Michael S. Tsirkin ha scritto:

>> NACK
> 
> You know that a single developer's NACK counts nothing (it can be you,
> it can be me), don't you?

going meta...

What's this?

All I know (... I think I know) about patch acceptance is that Anthony
prefers to have at least one R-b. As far as I've seen this is not a hard
requirement (for example, maintainers sometimes send unreviewed patches
in a pull request, and on occasion they are merged).

No words have been spent on NAKs yet (... since my subscription, that
is). Is this stuff formalized somewhere?

Sorry for wasting time...

Thanks,
Laszlo

Re: [Qemu-devel] [PATCH 3/3] pvpanic: rename to isa-pvpanic

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 14:43, Laszlo Ersek ha scritto:
> On 08/21/13 19:06, Paolo Bonzini wrote:
>> Il 21/08/2013 19:07, Michael S. Tsirkin ha scritto:
> 
>>> NACK
>>
>> You know that a single developer's NACK counts nothing (it can be you,
>> it can be me), don't you?
> 
> going meta...
> 
> What's this?
> 
> All I know (... I think I know) about patch acceptance is that Anthony
> prefers to have at least one R-b. As far as I've seen this is not a hard
> requirement (for example, maintainers sometimes send unreviewed patches
> in a pull request, and on occasion they are merged).
> 
> No words have been spent on NAKs yet (... since my subscription, that
> is). Is this stuff formalized somewhere?
> 
> Sorry for wasting time...

No, it's not.  But for example I NACKed removal of pvpanic from 1.6, it
was overridden, and I didn't complain too much.

Paolo

[Qemu-devel] [PATCH ] qemu-help: add category headlines

2013-08-22 Thread Marcel Apfelbaum

This patch follows Markus Armbruster suggestion:

A possibly better way to group help by category: instead of adding
categories to each line, add category headlines, like this:

Controller/Bridge/Hub devices:
name "NAME", bus "BUS"...
...
USB devices:
name "NAME", bus "BUS"...
...
Storage devices:
...

This way, showing devices with multiple categories once per category
actually makes sense.

Note that the "categories to each line" is kept for 2 reasons:
1. Preparation for multifunction devices
2. Ability to grep by category

Signed-off-by: Marcel Apfelbaum 
---
 qdev-monitor.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/qdev-monitor.c b/qdev-monitor.c
index 410cdcb..a7329b0 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -156,6 +156,8 @@ static void qdev_print_category_devices(DeviceCategory 
category)
 DeviceClass *dc;
 GSList *list, *curr;
 
+error_printf("%s devices:\n", qdev_category_get_name(category));
+
 list = object_class_get_list(TYPE_DEVICE, false);
 for (curr = list; curr; curr = g_slist_next(curr)) {
 dc = (DeviceClass *)object_class_dynamic_cast(curr->data, TYPE_DEVICE);
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH V8 00/13] monitor: support sub command group in auto completion and help

2013-08-22 Thread Luiz Capitulino

On Thu, 22 Aug 2013 17:16:23 +0800
Wenchao Xia  wrote:

> 于 2013-8-20 22:04, Luiz Capitulino 写道:
> > On Tue, 30 Jul 2013 12:03:11 -0400
> > Luiz Capitulino  wrote:
> >
> >> On Fri, 26 Jul 2013 11:20:29 +0800
> >> Wenchao Xia  wrote:
> >>
> >>> This series make auto completion and help functions works normal for sub
> >>> command, by using reentrant functions. In order to do that, global 
> >>> variables
> >>> are not directly used in those functions any more. With this series, 
> >>> cmd_table
> >>> is a member of structure Monitor so it is possible to create a monitor 
> >>> with
> >>> different command table now, auto completion will work in that monitor. In
> >>> short, "info" is not treated as a special case now, this series ensure 
> >>> help
> >>> and auto complete function works normal for any sub command added in the 
> >>> future.
> >>>
> >>> Patch 5 replaced cur_mon with rs->mon, it is safe because:
> >>> monitor_init() calls readline_init() which initialize mon->rs, result is
> >>> mon->rs->mon == mon. Then qemu_chr_add_handlers() is called, which make
> >>> monitor_read() function take *mon as its opaque. Later, when user input,
> >>> monitor_read() is called, where cur_mon is set to *mon by "cur_mon = 
> >>> opaque".
> >>> If qemu's monitors run in one thread, then later in readline_handle_byte()
> >>> and readline_comletion(), cur_mon is actually equal to rs->mon, in another
> >>> word, it points to the monitor instance, so it is safe to replace *cur_mon
> >>> in those functions.
> >>
> >> I've applied this to qmp-next with the change I suggested for
> >> patch 09/13.
> >
> > Unfortunately this series brakes make check:
> >
> > GTESTER check-qtest-x86_64
> > Broken pipe
> > GTester: last random seed: R02S3492bd34f44dd17460851643383be44d
> > main-loop: WARNING: I/O thread spun for 1000 iterations
> > make: *** [check-qtest-x86_64] Error 1
> >
> > I debugged it (with some help from Laszlo) and the problem is that it
> > broke the human-monitor-command command. Any usage of this command
> > triggers the bug like:
> >
> > { "execute": "human-monitor-command",
> >   "arguments": { "command-line": "info registers" } }
> >
> > It seems simple to fix, I think you just have to initialize
> > mon->cmd_table in qmp_human_monitor_command(), but I'd recommend two
> > things:
> >
> >   1. It's better to split off some/all QMP initialization from
> >  monitor_init() and call it from qmp_human_monitor_command()
> >
> >   2. Can you please take the opportunity and test all commands using
> >  cur_mon? Just grep for it
> >
> > Sorry for noticing this only now, but I only run make check before
> > sending a pull request (although this very likely shows you didn't
> > run it either).
> >
>About the fd related qmp interface, to test it, send_msg() is needed,
> which was not supported in python 2, but new added python 3.3. I think
> there are three ways to add test cases for fd qmp APIs:
> 1 test only when python > 3.3.
> 2 python plus C: compile a .so and call it with ctypes.
> 3 a new test framework: pure C code to call qmp interfaces.
>Which one do you prefer?

Can't we have a C program plus a shell script to test this? Anyway, if
this gets complicated you can skip having the test-case. This series took
a long way already and holding it because of that test-case isn't fair.

Re: [Qemu-devel] [PATCH ] hw: Modified usb-tablet category from Misc to Input

2013-08-22 Thread Andreas Färber

Am 22.08.2013 14:14, schrieb Marcel Apfelbaum:
> Thanks to Markus Armbruster 
> for finding this bug.

The simplest way to record that is:
Reported-by: Markus Armbruster 

And since this doesn't affect all of hw/, something like "usb/dev-hid:"
might been better; but Gerd as USB maintainer (CC'ed) would be the one
most qualified to change or to instruct to his likings.

Probably missing
Cc: qemu-sta...@nongnu.org
to have it fixed in 1.6.1?

> 
> Signed-off-by: Marcel Apfelbaum 

Change itself looks fine,

Reviewed-by: Andreas Färber 

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Bharata B Rao

On Thu, Aug 22, 2013 at 01:15:59PM +0200, Paolo Bonzini wrote:
> Il 22/08/2013 12:28, Bharata B Rao ha scritto:
> > On Thu, Aug 22, 2013 at 12:00:48PM +0200, Paolo Bonzini wrote:
> >> Il 22/08/2013 11:55, Bharata B Rao ha scritto:
> >>> This was the first apporach I had. I used to abort when writes to pipe
> >>> fail. But there were concerns raised about handling the failures 
> >>> gracefully
> >>> and hence we ended up doing all that error handling of completing the aio
> >>> with -EIO, closing the pipe and making the disk inaccessible.
> >>>
> > Under what circumstances could it happen?
> >>> Not very sure, I haven't seen that happening. I had to manually inject
> >>> faults to test this error path and verify the graceful recovery.
> >>
> >> Looking at write(2), it looks like it is impossible
> >>
> >>EAGAIN or EWOULDBLOCK
> >>can't happen, blocking file descriptor
> >>
> >>EBADF, EPIPE
> >>shouldn't happen since the device is drained before
> >>calling qemu_gluster_close.
> >>
> >>EDESTADDRREQ, EDQUOT, EFBIG, EIO, ENOSPC
> >>cannot happen for pipes
> >>
> >>EFAULT
> >>abort would be fine
> > 
> > In the case where we have separate system and data disks and if error 
> > (EFAULT)
> > happens for the data disk, don't we want to keep the VM up by gracefully
> > disabling IO to the data disk ?
> 
> EFAULT means the buffer address is invalid, I/O error would be EIO, but...
> 
> > I remember this was one of the motivations to
> > handle this failure.
> 
> ... this write is on the pipe, not on a disk.

Right. Failure to complete the write on the pipe means that IO done to the
disk didn't complete and hence to the VM it is essentially a disk IO failure.
That's the reason we return -EIO and make the disk inaccessible when this
failure happens.

My question was if it is ok to abort the VM when IO to one of the disks fails ?

But, if you think it is not worth handling such errors then may be we can drop
this elaborate and race-prone error recovery and just abort.

Regards,
Bharata.

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 15:25, Bharata B Rao ha scritto:
> On Thu, Aug 22, 2013 at 01:15:59PM +0200, Paolo Bonzini wrote:
>> Il 22/08/2013 12:28, Bharata B Rao ha scritto:
>>> On Thu, Aug 22, 2013 at 12:00:48PM +0200, Paolo Bonzini wrote:
 Il 22/08/2013 11:55, Bharata B Rao ha scritto:
> This was the first apporach I had. I used to abort when writes to pipe
> fail. But there were concerns raised about handling the failures 
> gracefully
> and hence we ended up doing all that error handling of completing the aio
> with -EIO, closing the pipe and making the disk inaccessible.
>
>>> Under what circumstances could it happen?
> Not very sure, I haven't seen that happening. I had to manually inject
> faults to test this error path and verify the graceful recovery.

 Looking at write(2), it looks like it is impossible

EAGAIN or EWOULDBLOCK
can't happen, blocking file descriptor

EBADF, EPIPE
shouldn't happen since the device is drained before
calling qemu_gluster_close.

EDESTADDRREQ, EDQUOT, EFBIG, EIO, ENOSPC
cannot happen for pipes

EFAULT
abort would be fine
>>>
>>> In the case where we have separate system and data disks and if error 
>>> (EFAULT)
>>> happens for the data disk, don't we want to keep the VM up by gracefully
>>> disabling IO to the data disk ?
>>
>> EFAULT means the buffer address is invalid, I/O error would be EIO, but...
>>
>>> I remember this was one of the motivations to
>>> handle this failure.
>>
>> ... this write is on the pipe, not on a disk.
> 
> Right. Failure to complete the write on the pipe means that IO done to the
> disk didn't complete and hence to the VM it is essentially a disk IO failure.

The question is, can the write to the pipe actually fail?  Not just "in
practice not" according to the documented errors, it seems to me that it
cannot.

> That's the reason we return -EIO and make the disk inaccessible when this
> failure happens.
> 
> My question was if it is ok to abort the VM when IO to one of the disks fails 
> ?

Absolutely not, but here the code seems dead to me.

Paolo

> But, if you think it is not worth handling such errors then may be we can drop
> this elaborate and race-prone error recovery and just abort.
> 
> Regards,
> Bharata.
>

[Qemu-devel] [PATCH 2/2] win32-aio: drop win32_aio_flush_cb()

2013-08-22 Thread Stefan Hajnoczi

The io_flush argument to qemu_aio_set_event_notifier() has been removed
since the block layer learnt to drain requests by itself.  Fix the
Windows build for win32-aio.o by updating the
qemu_aio_set_event_notifier() call and dropping win32_aio_flush_cb().

Signed-off-by: Stefan Hajnoczi 
---
 block/win32-aio.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/win32-aio.c b/block/win32-aio.c
index fcb7c75..5d1d199 100644
--- a/block/win32-aio.c
+++ b/block/win32-aio.c
@@ -105,13 +105,6 @@ static void win32_aio_completion_cb(EventNotifier *e)
 }
 }
 
-static int win32_aio_flush_cb(EventNotifier *e)
-{
-QEMUWin32AIOState *s = container_of(e, QEMUWin32AIOState, e);
-
-return (s->count > 0) ? 1 : 0;
-}
-
 static void win32_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 QEMUWin32AIOCB *waiocb = (QEMUWin32AIOCB *)blockacb;
@@ -201,8 +194,7 @@ QEMUWin32AIOState *win32_aio_init(void)
 goto out_close_efd;
 }
 
-qemu_aio_set_event_notifier(&s->e, win32_aio_completion_cb,
-win32_aio_flush_cb);
+qemu_aio_set_event_notifier(&s->e, win32_aio_completion_cb);
 
 return s;
 
-- 
1.8.3.1

[Qemu-devel] [PATCH 0/2] win32: build fixes due to io_flush removal

2013-08-22 Thread Stefan Hajnoczi

Stefan Weil noticed that the win32 build is broken with my io_flush changes
applied.  The issues are simple build failures that should have been avoided by
test building Windows, which I didn't.

To make amends I've set up a mingw cross-compiler and have fixed the build.
Sorry for the breakage.

I can include this in my block pull request later today if you are happy with
the patches.

Stefan Hajnoczi (2):
  aio-win32: replace incorrect AioHandler->opaque usage with ->e
  win32-aio: drop win32_aio_flush_cb()

 aio-win32.c   |  4 ++--
 block/win32-aio.c | 10 +-
 2 files changed, 3 insertions(+), 11 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH 1/2] aio-win32: replace incorrect AioHandler->opaque usage with ->e

2013-08-22 Thread Stefan Hajnoczi

The AioHandler->opaque field does not exist in aio-win32.c.  The code
that uses it was incorrectly copied from aio-posix.c.  For Windows we
can use AioHandler->e to match against AioContext->notifier.

This patch fixes the Windows build for aio-win32.o.

Signed-off-by: Stefan Hajnoczi 
---
 aio-win32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/aio-win32.c b/aio-win32.c
index 78b2801..efb2d5a 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -125,7 +125,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 node->io_notify(node->e);
 
 /* aio_notify() does not count as progress */
-if (node->opaque != &ctx->notifier) {
+if (node->e != &ctx->notifier) {
 progress = true;
 }
 }
@@ -188,7 +188,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 node->io_notify(node->e);
 
 /* aio_notify() does not count as progress */
-if (node->opaque != &ctx->notifier) {
+if (node->e != &ctx->notifier) {
 progress = true;
 }
 }
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v3 0/4] vmdk: Support ESX files

2013-08-22 Thread Stefan Hajnoczi

On Mon, Aug 19, 2013 at 06:54:24PM +0800, Fam Zheng wrote:
> This series add support for VMFS and VMFSSPARSE extents, these types are found
> in description file from ESX hosts.
> 
>  - VMFS is in monolithiFlat format (raw), but hosted in ESX.
> 
>  - VMFSSPARSE is the format we call "vmdk3" with magic bytes "COWD". This 
> patch
>fix the opening of vmdk3 and rename it to vmfs_sparse which is better in
>representing its main usage nowadays.
> 
> v3:
> Reorder patches to first move header check to
> vmdk_add_extent().
> 
> 
> Fam Zheng (3):
>   vmdk: Move l1_size check into vmdk_add_extent()
>   vmdk: fix L1 and L2 table size in vmdk3 open
>   vmdk: support vmfsSparse files
> 
> Paolo Bonzini (1):
>   vmdk: support vmfs files
> 
>  block/vmdk.c | 52 +++-
>  1 file changed, 27 insertions(+), 25 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

[Qemu-devel] [PATCH v2] usb/dev-hid: Modified usb-tablet category from Misc to Input

2013-08-22 Thread Marcel Apfelbaum

usb-tablet device was wrongy assgined to Misc category

Reported-by: Markus Armbruster 
Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Andreas Färber 
---
Changes from v2:
 - Add cc to qemu-stable and Gerd Hoffmann
 - Changed subject prefix from hw to usb/dev-hid

 hw/usb/dev-hid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-hid.c b/hw/usb/dev-hid.c
index 66c6331..5956720 100644
--- a/hw/usb/dev-hid.c
+++ b/hw/usb/dev-hid.c
@@ -658,7 +658,7 @@ static void usb_tablet_class_initfn(ObjectClass *klass, 
void *data)
 uc->product_desc   = "QEMU USB Tablet";
 dc->vmsd = &vmstate_usb_ptr;
 dc->props = usb_tablet_properties;
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
 }
 
 static const TypeInfo usb_tablet_info = {
-- 
1.8.3.1

[Qemu-devel] [PATCH v3] usb/dev-hid: Modified usb-tablet category from Misc to Input

2013-08-22 Thread Marcel Apfelbaum

usb-tablet device was wrongly assigned to Misc category

Reported-by: Markus Armbruster 
Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Andreas Färber 
---
Changes from v2:
 - Corrected spelling
Changes from v1:
 - Add cc to qemu-stable and Gerd Hoffmann
 - Changed subject prefix from hw to usb/dev-hid

 hw/usb/dev-hid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-hid.c b/hw/usb/dev-hid.c
index 66c6331..5956720 100644
--- a/hw/usb/dev-hid.c
+++ b/hw/usb/dev-hid.c
@@ -658,7 +658,7 @@ static void usb_tablet_class_initfn(ObjectClass *klass, 
void *data)
 uc->product_desc   = "QEMU USB Tablet";
 dc->vmsd = &vmstate_usb_ptr;
 dc->props = usb_tablet_properties;
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
 }
 
 static const TypeInfo usb_tablet_info = {
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH ] qemu-help: add category headlines

2013-08-22 Thread Andreas Färber

Am 22.08.2013 14:48, schrieb Marcel Apfelbaum:
> This patch follows Markus Armbruster suggestion:
> 
> A possibly better way to group help by category: instead of adding
> categories to each line, add category headlines, like this:
> 
> Controller/Bridge/Hub devices:
> name "NAME", bus "BUS"...
> ...
> USB devices:
> name "NAME", bus "BUS"...
> ...
> Storage devices:
> ...
> 
> This way, showing devices with multiple categories once per category
> actually makes sense.
> 
> Note that the "categories to each line" is kept for 2 reasons:
> 1. Preparation for multifunction devices
> 2. Ability to grep by category
> 
> Signed-off-by: Marcel Apfelbaum 
> ---
>  qdev-monitor.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/qdev-monitor.c b/qdev-monitor.c
> index 410cdcb..a7329b0 100644
> --- a/qdev-monitor.c
> +++ b/qdev-monitor.c
> @@ -156,6 +156,8 @@ static void qdev_print_category_devices(DeviceCategory 
> category)
>  DeviceClass *dc;
>  GSList *list, *curr;
>  
> +error_printf("%s devices:\n", qdev_category_get_name(category));

Why is that an error? Shouldn't it go to stdout?

Andreas

> +
>  list = object_class_get_list(TYPE_DEVICE, false);
>  for (curr = list; curr; curr = g_slist_next(curr)) {
>  dc = (DeviceClass *)object_class_dynamic_cast(curr->data, 
> TYPE_DEVICE);
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Bharata B Rao

On Thu, Aug 22, 2013 at 03:27:35PM +0200, Paolo Bonzini wrote:
>  Looking at write(2), it looks like it is impossible
> 
> EAGAIN or EWOULDBLOCK
> can't happen, blocking file descriptor
> 
> EBADF, EPIPE
> shouldn't happen since the device is drained before
> calling qemu_gluster_close.
> 
> EDESTADDRREQ, EDQUOT, EFBIG, EIO, ENOSPC
> cannot happen for pipes
> 
> EFAULT
> abort would be fine
> >>>
> >>> In the case where we have separate system and data disks and if error 
> >>> (EFAULT)
> >>> happens for the data disk, don't we want to keep the VM up by gracefully
> >>> disabling IO to the data disk ?
> >>
> >> EFAULT means the buffer address is invalid, I/O error would be EIO, but...
> >>
> >>> I remember this was one of the motivations to
> >>> handle this failure.
> >>
> >> ... this write is on the pipe, not on a disk.
> > 
> > Right. Failure to complete the write on the pipe means that IO done to the
> > disk didn't complete and hence to the VM it is essentially a disk IO 
> > failure.
> 
> The question is, can the write to the pipe actually fail?  Not just "in
> practice not" according to the documented errors, it seems to me that it
> cannot.

May be I am dragging this a bit, but since we are at it, let me make one last
observation here :)

The buffer in question here is the GlusterAIOCB pointer that gets passed
back and forth between QEMU and gluster thro' glfs_pwritev_async and associated
callback gluster_finish_aiocb. Isn't there a possibility that gluster will
not give us back the same pointer during callback due to some errors on the
gluster side ? Unlikely but possible ?

Regards,
Bharata.

Re: [Qemu-devel] [PATCH ] qemu-help: add category headlines

2013-08-22 Thread Markus Armbruster

Andreas Färber  writes:

> Am 22.08.2013 14:48, schrieb Marcel Apfelbaum:
>> This patch follows Markus Armbruster suggestion:
>> 
>> A possibly better way to group help by category: instead of adding
>> categories to each line, add category headlines, like this:
>> 
>> Controller/Bridge/Hub devices:
>> name "NAME", bus "BUS"...
>> ...
>> USB devices:
>> name "NAME", bus "BUS"...
>> ...
>> Storage devices:
>> ...
>> 
>> This way, showing devices with multiple categories once per category
>> actually makes sense.
>> 
>> Note that the "categories to each line" is kept for 2 reasons:
>> 1. Preparation for multifunction devices
>> 2. Ability to grep by category
>> 
>> Signed-off-by: Marcel Apfelbaum 
>> ---
>>  qdev-monitor.c | 2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/qdev-monitor.c b/qdev-monitor.c
>> index 410cdcb..a7329b0 100644
>> --- a/qdev-monitor.c
>> +++ b/qdev-monitor.c
>> @@ -156,6 +156,8 @@ static void qdev_print_category_devices(DeviceCategory 
>> category)
>>  DeviceClass *dc;
>>  GSList *list, *curr;
>>  
>> +error_printf("%s devices:\n", qdev_category_get_name(category));
>
> Why is that an error? Shouldn't it go to stdout?

Output of -device help has always gone to stderr, and that has always
annoyed me.  Just not enough to fix it.

[...]

Re: [Qemu-devel] [PATCH] rdma: clean up of qemu_rdma_cleanup()

2013-08-22 Thread Michael R. Hines


On 08/12/2013 10:12 PM, Isaku Yamahata wrote:

- It can't be determined by RDMAContext::cm_id != NULL if the connection
   is established or not.
- RDMAContext::cm_id is leaked and not destroyed because it is set to NULL
   too early.
- RDMAContext::qp is created by rdma_create_qp() so that it should be destroyed
   by rdma_destroy_qp(). not ibv_destroy_qp()

Cc: Michael R. Hines 
Signed-off-by: Isaku Yamahata 
---
  migration-rdma.c |9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/migration-rdma.c b/migration-rdma.c
index 3d1266f..e71c10a 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -356,6 +356,7 @@ typedef struct RDMAContext {
   */
  struct rdma_cm_id *cm_id;   /* connection manager ID */
  struct rdma_cm_id *listen_id;
+bool connected;

  struct ibv_context  *verbs;
  struct rdma_event_channel   *channel;
@@ -2192,7 +2193,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
  struct rdma_cm_event *cm_event;
  int ret, idx;

-if (rdma->cm_id) {
+if (rdma->cm_id && rdma->connected) {
  if (rdma->error_state) {
  RDMAControlHeader head = { .len = 0,
 .type = RDMA_CONTROL_ERROR,
@@ -2211,7 +2212,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
  }
  }
  DDPRINTF("Disconnected.\n");
-rdma->cm_id = NULL;
+rdma->connected = false;
  }

  g_free(rdma->block);
@@ -2233,7 +2234,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
  }

  if (rdma->qp) {
-ibv_destroy_qp(rdma->qp);
+rdma_destroy_qp(rdma->cm_id);
  rdma->qp = NULL;
  }
  if (rdma->cq) {
@@ -2370,6 +2371,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error 
**errp)
  rdma->cm_id = NULL;
  goto err_rdma_source_connect;
  }
+rdma->connected = true;

  memcpy(&cap, cm_event->param.conn.private_data, sizeof(cap));
  network_to_caps(&cap);
@@ -2904,6 +2906,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
  }

  rdma_ack_cm_event(cm_event);
+rdma->connected = true;

  ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
  if (ret) {


I have applied this to my tree. Thanks.

- Michael

Re: [Qemu-devel] [PATCH 0/2] win32: build fixes due to io_flush removal

2013-08-22 Thread Stefan Weil

Am 22.08.2013 15:28, schrieb Stefan Hajnoczi:
> Stefan Weil noticed that the win32 build is broken with my io_flush changes
> applied.  The issues are simple build failures that should have been avoided 
> by
> test building Windows, which I didn't.
>
> To make amends I've set up a mingw cross-compiler and have fixed the build.
> Sorry for the breakage.
>
> I can include this in my block pull request later today if you are happy with
> the patches.
>
> Stefan Hajnoczi (2):
>   aio-win32: replace incorrect AioHandler->opaque usage with ->e
>   win32-aio: drop win32_aio_flush_cb()
>
>  aio-win32.c   |  4 ++--
>  block/win32-aio.c | 10 +-
>  2 files changed, 3 insertions(+), 11 deletions(-)
>

For both patches:

Reviewed-by: Stefan Weil 

Hi Stefan,

thank you for the fast fix. Please include it in your pull request.

Regards,
Stefan

Re: [Qemu-devel] [PATCH] kvm: shoten the parameter list for get_real_device()

2013-08-22 Thread Alex Williamson

On Mon, 2013-08-19 at 09:19 +0800, Wei Yang wrote:
> get_real_device() has 5 parameters with the last 4 is contained in the first
> structure.
> 
> This patch removes the last 4 parameters and directly use them from the first
> parameter.
> 
> Signed-off-by: Wei Yang 


Seems harmless enough

Acked-by: Alex Williamson 

> ---
>  hw/i386/kvm/pci-assign.c |9 -
>  1 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index 5618173..011764f 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -568,8 +568,7 @@ static int get_real_device_id(const char *devpath, 
> uint16_t *val)
>  return get_real_id(devpath, "device", val);
>  }
>  
> -static int get_real_device(AssignedDevice *pci_dev, uint16_t r_seg,
> -   uint8_t r_bus, uint8_t r_dev, uint8_t r_func)
> +static int get_real_device(AssignedDevice *pci_dev)
>  {
>  char dir[128], name[128];
>  int fd, r = 0, v;
> @@ -582,7 +581,8 @@ static int get_real_device(AssignedDevice *pci_dev, 
> uint16_t r_seg,
>  dev->region_number = 0;
>  
>  snprintf(dir, sizeof(dir), "/sys/bus/pci/devices/%04x:%02x:%02x.%x/",
> - r_seg, r_bus, r_dev, r_func);
> + pci_dev->host.domain, pci_dev->host.bus,
> + pci_dev->host.slot, pci_dev->host.function);
>  
>  snprintf(name, sizeof(name), "%sconfig", dir);
>  
> @@ -1769,8 +1769,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>  memcpy(dev->emulate_config_write, dev->emulate_config_read,
> sizeof(dev->emulate_config_read));
>  
> -if (get_real_device(dev, dev->host.domain, dev->host.bus,
> -dev->host.slot, dev->host.function)) {
> +if (get_real_device(dev)) {
>  error_report("pci-assign: Error: Couldn't get real device (%s)!",
>   dev->dev.qdev.id);
>  goto out;

Re: [Qemu-devel] [PATCH] block: Fix race in gluster_finish_aiocb

2013-08-22 Thread Paolo Bonzini

Il 22/08/2013 16:01, Bharata B Rao ha scritto:
 EFAULT means the buffer address is invalid, I/O error would be EIO, but...

> I remember this was one of the motivations to
> handle this failure.

 ... this write is on the pipe, not on a disk.
>>>
>>> Right. Failure to complete the write on the pipe means that IO done to the
>>> disk didn't complete and hence to the VM it is essentially a disk IO 
>>> failure.
>>
>> The question is, can the write to the pipe actually fail?  Not just "in
>> practice not" according to the documented errors, it seems to me that it
>> cannot.
> 
> May be I am dragging this a bit, but since we are at it, let me make one last
> observation here :)
> 
> The buffer in question here is the GlusterAIOCB pointer that gets passed
> back and forth between QEMU and gluster thro' glfs_pwritev_async and 
> associated
> callback gluster_finish_aiocb.

No, it's not:

  retval = qemu_write_full(s->fds[GLUSTER_FD_WRITE], &acb, sizeof(acb));

The pointer that is passed to qemu_write_full is on the stack; it's the
address of the acb variable.  The _content_ of the buffer (which the
kernel doesn't look at, so it's not relevant for generating EFAULT) is
the GlusterAIOCB pointer.

> Isn't there a possibility that gluster will
> not give us back the same pointer during callback due to some errors on the
> gluster side ? Unlikely but possible ?

Then we would have already crashed on the line before:

  acb->ret = ret;

which writes to the hypothetically corrupted pointer.

But we cannot protect against memory corruption, or against bugs that
violate the API contract at such a fundamental level.  QEMU will
SIGSEGV, but it wouldn't be the first time.

Paolo

Re: [Qemu-devel] [PATCH v2] usb/dev-hid: Modified usb-tablet category from Misc to Input

2013-08-22 Thread Andreas Färber

Am 22.08.2013 15:39, schrieb Marcel Apfelbaum:
> usb-tablet device was wrongy assgined to Misc category
> 
> Reported-by: Markus Armbruster 
> Signed-off-by: Marcel Apfelbaum 
> Reviewed-by: Andreas Färber 
> ---
> Changes from v2:
>  - Add cc to qemu-stable and Gerd Hoffmann

FWIW I literally meant "Cc: qemu-sta...@nongnu.org", which can be
git-grep'ed for. If the patch is picked up by a human maintainer such as
Gerd rather than Anthony's scripts that can be added by the maintainer
to spare you a v4.

>  - Changed subject prefix from hw to usb/dev-hid

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH ] qemu-help: add category headlines

2013-08-22 Thread Eric Blake

On 08/22/2013 08:13 AM, Markus Armbruster wrote:

>>> +++ b/qdev-monitor.c
>>> @@ -156,6 +156,8 @@ static void qdev_print_category_devices(DeviceCategory 
>>> category)
>>>  DeviceClass *dc;
>>>  GSList *list, *curr;
>>>  
>>> +error_printf("%s devices:\n", qdev_category_get_name(category));
>>
>> Why is that an error? Shouldn't it go to stdout?
> 
> Output of -device help has always gone to stderr, and that has always
> annoyed me.  Just not enough to fix it.

Back when libvirt scraped -help output, changing it to use stdout would
be an incompatible change.  But now that libvirt uses QMP, I would also
welcome a change to use stdout (but also fall in the category of "not
enough of a bother for ME to fix it).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended

2013-08-22 Thread Andrew Jones

The comment in kvm_max_vcpus() states that it's using the recommended
procedure from the kernel API documentation to get the max number
of vcpus that kvm supports. It is, but by always returning the
maximum number supported. The maximum number should only be used
for development purposes. qemu should check KVM_CAP_NR_VCPUS for
the recommended number of vcpus. This patch adds a warning if a user
specifies a number of cpus between the recommended and max.

Signed-off-by: Andrew Jones 
---
 kvm-all.c | 45 +++--
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 716860f617455..9092e13ae60ea 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1313,24 +1313,24 @@ static int kvm_irqchip_create(KVMState *s)
 return 0;
 }
 
-static int kvm_max_vcpus(KVMState *s)
+/* Find number of supported CPUs using the recommended
+ * procedure from the kernel API documentation to cope with
+ * older kernels that may be missing capabilities.
+ */
+static int kvm_recommended_vcpus(KVMState *s)
 {
 int ret;
 
-/* Find number of supported CPUs using the recommended
- * procedure from the kernel API documentation to cope with
- * older kernels that may be missing capabilities.
- */
-ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
-if (ret) {
-return ret;
-}
 ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
-if (ret) {
-return ret;
-}
+return (ret) ? ret : 4;
+}
 
-return 4;
+static int kvm_max_vcpus(KVMState *s)
+{
+int ret;
+
+ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
+return (ret) ? ret : kvm_recommended_vcpus(s);
 }
 
 int kvm_init(void)
@@ -1383,12 +1383,21 @@ int kvm_init(void)
 goto err;
 }
 
-max_vcpus = kvm_max_vcpus(s);
+max_vcpus = kvm_recommended_vcpus(s);
 if (smp_cpus > max_vcpus) {
-ret = -EINVAL;
-fprintf(stderr, "Number of SMP cpus requested (%d) exceeds max cpus "
-"supported by KVM (%d)\n", smp_cpus, max_vcpus);
-goto err;
+fprintf(stderr,
+"Warning: Number of SMP cpus requested (%d) exceeds "
+"recommended cpus supported by KVM (%d)\n",
+smp_cpus, max_vcpus);
+
+max_vcpus = kvm_max_vcpus(s);
+if (smp_cpus > max_vcpus) {
+ret = -EINVAL;
+fprintf(stderr, "Number of SMP cpus requested (%d) exceeds "
+"max cpus supported by KVM (%d)\n",
+smp_cpus, max_vcpus);
+goto err;
+}
 }
 
 s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
-- 
1.8.1.4

[Qemu-devel] pvpanic plans?

2013-08-22 Thread Paolo Bonzini

The thread from yesterday has died off (perhaps also because of
my inappropriate answer to Michael, for which I apologize to him
and everyone).  I took some time to discuss the libvirt requirements
further with Daniel Berrange and Eric Blake on IRC.  If anyone is
interested, I can give logs.  This is a suggestion for how to
proceed in both QEMU and libvirt.


== Builtin pvpanic ==

QEMU will remove pvpanic from pc-1.5 in 1.6.1 and 1.5.4.  This does not
break migration.


== Support in libvirt for current functionality ==

libvirt will add a  element, and possibly a capability
for it accessible via "virsh capabilities".  There are two possibilities:

1) On QEMU 1.5.4/1.6.1 and newer (and on QEMU 1.6.0 with a machine type
   other than pc-1.5),  will only work if the element is there.
   On QEMU 1.5.0->1.5.3, and on QEMU 1.6.0 with the pc-1.5 machine type,
will be obeyed always, and may override e.g. reboot-on-panic
   if a guest driver exist.

2) On all versions,  will only work if the element is there.


In turn, there are two ways to implement (2):

2a) libvirt will always add -global pvpanic.iobase=0 to neutralize
the builtin pvpanic device if present.  
will create the device with -device pvpanic,iobase=0x505

Advantage: no changes to QEMU

Disadvantage 1: writes to port 0 with QEMU 1.{5.0,5.1,5.2,5.3,6.0}
  and pc-1.5 machine type will write to a pvpanic device instead of
  the DMA controller.  Probably harmless, and limited to some QEMU
  versions.

Disadvantage 2: libvirt has knowledge of the pvpanic port number

2b) QEMU will provide a way for libvirt to detect that no machine type
has the builtin pvpanic.  If some machine type may have the builtin
pvpanic, and  is absent, libvirt will add
"-global pvpanic.iobase=0" to neutralize it.  Otherwise, libvirt
will create the device normally.

A possible way for libvirt to detect "good" machine types is a
dummy property.  This is a bit ugly in that the property would not
affect the behavior of the device.  The property would remain in
the long term.

Another possibility is for QEMU to rename the device, e.g. to
isa-pvpanic.  This is also somewhat gross, but not visible in the
long term when the "pvpanic" name will be lost in history.

Advantage 1: libvirt has no knowledge of the pvpanic port number

Disadvantage 1: same as above

Disadvantage 2: need a somewhat gross change in QEMU


This method also provides an (also somewhat gross on the QEMU side)
way to detect other changes in the pvpanic semantics.  One example
mentioned below, is making the panicked state temporary.

== Possible improvements to pvpanic ==

The current implementation of pvpanic supports three modes: reset system
on panic, destroy domain on panic, preserve domain with no possibility
to resume it.  (Optionally a domain can be dumped too).

Long term, the choice to include pvpanic should not be on the guest
admin's shoulders, but rather in libosinfo.  Thus, it would be nice to
have a fourth mode where the panic is logged but the guest otherwise
keeps running.  This mode would let libosinfo add pvpanic by default
without affecting the guest's behavior on panic.

With this change, ignore will behave as follows
for the three possibilities above:

(1)  With QEMU 1.5.0 to 1.6.1,  will _not_ obey the setting,
 never (even if no  is specified).

 libvirt will have to pick a fallback action.

   advantage of destroy as fallback: it is the default (but
  note that restart is the default for virt-install)

   advantage of preserve as fallback: lets the admin examine
  the panic

   advantage of restart as fallback: maximum availability of
  the VM, it is the default for virt-install

(2a) With QEMU 1.5.0 to 1.6.1,  will _not_ obey the setting
 if  is specified.  libvirt has _no way_ to learn
 about this, so the capability would always be present with these
 QEMU versions and libosinfo would always add  with
 these versions.  Given the libosinfo scenario being considered here,
 this is not very different from (1).

(2b) With QEMU 1.5.0 to 1.6.1, the  element will not
 be available and not exposed in libvirt capabilities.  Thus with
 this version libosinfo would omit  from the XML.
 Guest policy will always be followed correctly.


The problem in both (1) and (2a) can be summarized as follows.  First,
libvirt will have to implement and document a fallback action for buggy
QEMU.  Second, even though the problems would be limited to some version
of QEMU, they would be relatively hard to debug for a casual user, could
start happening randomly by updating any one of QEMU, libvirt, libosinfo
or the guest kernel, and there is no fallback action for libvirt that is
always correct.

Thus, considering future libosinfo support for pvpanic, (2b) is preferrable
in my opinion.

Now, making pvpanic reversible requires a change in QEMU (

Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended

2013-08-22 Thread Eduardo Habkost


On 22/08/2013, at 12:39, Andrew Jones  wrote:

> The comment in kvm_max_vcpus() states that it's using the recommended
> procedure from the kernel API documentation to get the max number
> of vcpus that kvm supports. It is, but by always returning the
> maximum number supported. The maximum number should only be used
> for development purposes. qemu should check KVM_CAP_NR_VCPUS for
> the recommended number of vcpus. This patch adds a warning if a user
> specifies a number of cpus between the recommended and max.
> 
> Signed-off-by: Andrew Jones 

CCing libvir-list. It is probably interesting for libvirt to expose or warn 
about the recommended VCPU limit somehow, and in this case a simple warning on 
stderr won't be enough.

> ---
> kvm-all.c | 45 +++--
> 1 file changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 716860f617455..9092e13ae60ea 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1313,24 +1313,24 @@ static int kvm_irqchip_create(KVMState *s)
> return 0;
> }
> 
> -static int kvm_max_vcpus(KVMState *s)
> +/* Find number of supported CPUs using the recommended
> + * procedure from the kernel API documentation to cope with
> + * older kernels that may be missing capabilities.
> + */
> +static int kvm_recommended_vcpus(KVMState *s)
> {
> int ret;
> 
> -/* Find number of supported CPUs using the recommended
> - * procedure from the kernel API documentation to cope with
> - * older kernels that may be missing capabilities.
> - */
> -ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
> -if (ret) {
> -return ret;
> -}
> ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
> -if (ret) {
> -return ret;
> -}
> +return (ret) ? ret : 4;
> +}
> 
> -return 4;
> +static int kvm_max_vcpus(KVMState *s)
> +{
> +int ret;
> +
> +ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
> +return (ret) ? ret : kvm_recommended_vcpus(s);
> }
> 
> int kvm_init(void)
> @@ -1383,12 +1383,21 @@ int kvm_init(void)
> goto err;
> }
> 
> -max_vcpus = kvm_max_vcpus(s);
> +max_vcpus = kvm_recommended_vcpus(s);
> if (smp_cpus > max_vcpus) {
> -ret = -EINVAL;
> -fprintf(stderr, "Number of SMP cpus requested (%d) exceeds max cpus "
> -"supported by KVM (%d)\n", smp_cpus, max_vcpus);
> -goto err;
> +fprintf(stderr,
> +"Warning: Number of SMP cpus requested (%d) exceeds "
> +"recommended cpus supported by KVM (%d)\n",
> +smp_cpus, max_vcpus);
> +
> +max_vcpus = kvm_max_vcpus(s);
> +if (smp_cpus > max_vcpus) {
> +ret = -EINVAL;
> +fprintf(stderr, "Number of SMP cpus requested (%d) exceeds "
> +"max cpus supported by KVM (%d)\n",
> +smp_cpus, max_vcpus);
> +goto err;
> +}
> }
> 
> s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
> -- 
> 1.8.1.4
> 

-- 
Eduardo

Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions

2013-08-22 Thread Eric Blake

On 08/22/2013 02:46 AM, Laszlo Ersek wrote:
> Yes. This part of the schema is not for exposure over QMP, it just
> generates stuff for OptsVisitor, and it must remain compatible with the
> original, manual parsing of the option.
> 
> This came up for V6:
> 
> http://thread.gmane.org/gmane.comp.emulators.qemu/225678/focus=225714

My fault for coming into the conversation late, but a note to that
effect in the commit log, and/or in the description of why this type is
listed in the qapi document, would be handy.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended

2013-08-22 Thread Andreas Färber

Am 22.08.2013 18:12, schrieb Eduardo Habkost:
> 
> On 22/08/2013, at 12:39, Andrew Jones  wrote:
> 
>> The comment in kvm_max_vcpus() states that it's using the recommended
>> procedure from the kernel API documentation to get the max number
>> of vcpus that kvm supports. It is, but by always returning the
>> maximum number supported. The maximum number should only be used
>> for development purposes. qemu should check KVM_CAP_NR_VCPUS for
>> the recommended number of vcpus. This patch adds a warning if a user
>> specifies a number of cpus between the recommended and max.
>>
>> Signed-off-by: Andrew Jones 
> 
> CCing libvir-list. It is probably interesting for libvirt to expose or warn 
> about the recommended VCPU limit somehow, and in this case a simple warning 
> on stderr won't be enough.
> 
>> ---
>> kvm-all.c | 45 +++--
>> 1 file changed, 27 insertions(+), 18 deletions(-)
>>
>> diff --git a/kvm-all.c b/kvm-all.c
>> index 716860f617455..9092e13ae60ea 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -1313,24 +1313,24 @@ static int kvm_irqchip_create(KVMState *s)
>> return 0;
>> }
>>
>> -static int kvm_max_vcpus(KVMState *s)
>> +/* Find number of supported CPUs using the recommended
>> + * procedure from the kernel API documentation to cope with
>> + * older kernels that may be missing capabilities.
>> + */
>> +static int kvm_recommended_vcpus(KVMState *s)
>> {
>> int ret;
>>
>> -/* Find number of supported CPUs using the recommended
>> - * procedure from the kernel API documentation to cope with
>> - * older kernels that may be missing capabilities.
>> - */
>> -ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
>> -if (ret) {
>> -return ret;
>> -}
>> ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
>> -if (ret) {
>> -return ret;
>> -}
>> +return (ret) ? ret : 4;
>> +}
>>
>> -return 4;
>> +static int kvm_max_vcpus(KVMState *s)
>> +{
>> +int ret;
>> +
>> +ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
>> +return (ret) ? ret : kvm_recommended_vcpus(s);
>> }
>>
>> int kvm_init(void)
>> @@ -1383,12 +1383,21 @@ int kvm_init(void)
>> goto err;
>> }
>>
>> -max_vcpus = kvm_max_vcpus(s);
>> +max_vcpus = kvm_recommended_vcpus(s);
>> if (smp_cpus > max_vcpus) {
>> -ret = -EINVAL;
>> -fprintf(stderr, "Number of SMP cpus requested (%d) exceeds max cpus 
>> "
>> -"supported by KVM (%d)\n", smp_cpus, max_vcpus);
>> -goto err;
>> +fprintf(stderr,
>> +"Warning: Number of SMP cpus requested (%d) exceeds "
>> +"recommended cpus supported by KVM (%d)\n",
>> +smp_cpus, max_vcpus);
>> +
>> +max_vcpus = kvm_max_vcpus(s);
>> +if (smp_cpus > max_vcpus) {
>> +ret = -EINVAL;
>> +fprintf(stderr, "Number of SMP cpus requested (%d) exceeds "
>> +"max cpus supported by KVM (%d)\n",
>> +smp_cpus, max_vcpus);
>> +goto err;
>> +}

Should at least the fatal one use the new error_report()?

>> }
>>
>> s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);

I notice that only checks in kvm_init() based on smp_cpus are touched
herein. Should we add similar checks to CPU hot-add code and thus
possibly move that into some per-vCPU code path?

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH V8 01/11] NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions

2013-08-22 Thread Laszlo Ersek

On 08/22/13 18:14, Eric Blake wrote:
> On 08/22/2013 02:46 AM, Laszlo Ersek wrote:
>> Yes. This part of the schema is not for exposure over QMP, it just
>> generates stuff for OptsVisitor, and it must remain compatible with the
>> original, manual parsing of the option.
>>
>> This came up for V6:
>>
>> http://thread.gmane.org/gmane.comp.emulators.qemu/225678/focus=225714
> 
> My fault for coming into the conversation late, but a note to that
> effect in the commit log, and/or in the description of why this type is
> listed in the qapi document, would be handy.

I agree. We should probably tack a banner to each such structure in the
qapi schema json.

Thanks
Laszlo

Re: [Qemu-devel] pvpanic plans?

2013-08-22 Thread Anthony Liguori

Paolo Bonzini  writes:

> The thread from yesterday has died off (perhaps also because of
> my inappropriate answer to Michael, for which I apologize to him
> and everyone).  I took some time to discuss the libvirt requirements
> further with Daniel Berrange and Eric Blake on IRC.  If anyone is
> interested, I can give logs.  This is a suggestion for how to
> proceed in both QEMU and libvirt.
>
>
> == Builtin pvpanic ==
>
> QEMU will remove pvpanic from pc-1.5 in 1.6.1 and 1.5.4.  This does not
> break migration.

pvpanic has been a failure.  It's a poorly designed device with even
worse semantics.  I applied it and I'll take the fault for merging it in
the first place.

We should simply scrap it and start over.  It has so few users at this
point that this is still a realistic option.  Using something based on
ISA that requires specific ACPI entries was a mistake.

We should just introduce a simple watchdog device based on virtio and
call it a day.  Then it's cross platform, solves the guest enumeration
problem, and libvirt can detect the presence of the new device.

None of the plans outlined below give us a proper solution.  I think
removing is our best option at this point.

Regards,

Anthony Liguori

>
>
> == Support in libvirt for current functionality ==
>
> libvirt will add a  element, and possibly a capability
> for it accessible via "virsh capabilities".  There are two possibilities:
>
> 1) On QEMU 1.5.4/1.6.1 and newer (and on QEMU 1.6.0 with a machine type
>other than pc-1.5),  will only work if the element is there.
>On QEMU 1.5.0->1.5.3, and on QEMU 1.6.0 with the pc-1.5 machine type,
> will be obeyed always, and may override e.g. reboot-on-panic
>if a guest driver exist.
>
> 2) On all versions,  will only work if the element is there.
>
>
> In turn, there are two ways to implement (2):
>
> 2a) libvirt will always add -global pvpanic.iobase=0 to neutralize
> the builtin pvpanic device if present.  
> will create the device with -device pvpanic,iobase=0x505
>
> Advantage: no changes to QEMU
>
> Disadvantage 1: writes to port 0 with QEMU 1.{5.0,5.1,5.2,5.3,6.0}
>   and pc-1.5 machine type will write to a pvpanic device instead of
>   the DMA controller.  Probably harmless, and limited to some QEMU
>   versions.
>
> Disadvantage 2: libvirt has knowledge of the pvpanic port number
>
> 2b) QEMU will provide a way for libvirt to detect that no machine type
> has the builtin pvpanic.  If some machine type may have the builtin
> pvpanic, and  is absent, libvirt will add
> "-global pvpanic.iobase=0" to neutralize it.  Otherwise, libvirt
> will create the device normally.
>
> A possible way for libvirt to detect "good" machine types is a
> dummy property.  This is a bit ugly in that the property would not
> affect the behavior of the device.  The property would remain in
> the long term.
>
> Another possibility is for QEMU to rename the device, e.g. to
> isa-pvpanic.  This is also somewhat gross, but not visible in the
> long term when the "pvpanic" name will be lost in history.
>
> Advantage 1: libvirt has no knowledge of the pvpanic port number
>
> Disadvantage 1: same as above
>
> Disadvantage 2: need a somewhat gross change in QEMU
>
>
> This method also provides an (also somewhat gross on the QEMU side)
> way to detect other changes in the pvpanic semantics.  One example
> mentioned below, is making the panicked state temporary.
>
> == Possible improvements to pvpanic ==
>
> The current implementation of pvpanic supports three modes: reset system
> on panic, destroy domain on panic, preserve domain with no possibility
> to resume it.  (Optionally a domain can be dumped too).
>
> Long term, the choice to include pvpanic should not be on the guest
> admin's shoulders, but rather in libosinfo.  Thus, it would be nice to
> have a fourth mode where the panic is logged but the guest otherwise
> keeps running.  This mode would let libosinfo add pvpanic by default
> without affecting the guest's behavior on panic.
>
> With this change, ignore will behave as follows
> for the three possibilities above:
>
> (1)  With QEMU 1.5.0 to 1.6.1,  will _not_ obey the setting,
>  never (even if no  is specified).
>
>  libvirt will have to pick a fallback action.
>
>advantage of destroy as fallback: it is the default (but
>   note that restart is the default for virt-install)
>
>advantage of preserve as fallback: lets the admin examine
>   the panic
>
>advantage of restart as fallback: maximum availability of
>   the VM, it is the default for virt-install
>
> (2a) With QEMU 1.5.0 to 1.6.1,  will _not_ obey the setting
>  if  is specified.  libvirt has _no way_ to learn
>  about this, so the capability would always be present with these
>  QEMU versions and libosinfo would always add  with
>  these versions.  Given the libosinfo scenar

Re: [Qemu-devel] [PATCH 3/3] pvpanic: rename to isa-pvpanic

2013-08-22 Thread Anthony Liguori

Laszlo Ersek  writes:

> On 08/21/13 19:06, Paolo Bonzini wrote:
>> Il 21/08/2013 19:07, Michael S. Tsirkin ha scritto:
>
>>> NACK
>> 
>> You know that a single developer's NACK counts nothing (it can be you,
>> it can be me), don't you?
>
> going meta...
>
> What's this?
>
> All I know (... I think I know) about patch acceptance is that Anthony
> prefers to have at least one R-b. As far as I've seen this is not a hard
> requirement (for example, maintainers sometimes send unreviewed patches
> in a pull request, and on occasion they are merged).

I look very poorly on anyone nacking anything.  I value constructive
feedback.

Nacking does not add any value to the conversation.  I admire the fact
that we've been able to maintain a very high level of conversation over
the years on qemu-devel and throwing around nacks just lowers the
overall tone.

If you can't think of anything better to say than NACK, don't even
bother sending the email in the first place.

Regards,

Anthony Liguori

>
> No words have been spent on NAKs yet (... since my subscription, that
> is). Is this stuff formalized somewhere?
>
> Sorry for wasting time...
>
> Thanks,
> Laszlo

[Qemu-devel] [PATCH 01/18] qtest: Fix FMT_timeval vs time_t

2013-08-22 Thread Richard Henderson

Since FMT_timeval unconditionally uses %ld, cast tv_sec to long.

Signed-off-by: Richard Henderson 
---
 qtest.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/qtest.c b/qtest.c
index 74f1842..4f6963b 100644
--- a/qtest.c
+++ b/qtest.c
@@ -177,7 +177,7 @@ static void qtest_send_prefix(CharDriverState *chr)
 
 qtest_get_time(&tv);
 fprintf(qtest_log_fp, "[S +" FMT_timeval "] ",
-tv.tv_sec, (long) tv.tv_usec);
+(long) tv.tv_sec, (long) tv.tv_usec);
 }
 
 static void GCC_FMT_ATTR(2, 3) qtest_send(CharDriverState *chr,
@@ -225,7 +225,7 @@ static void qtest_process_command(CharDriverState *chr, 
gchar **words)
 
 qtest_get_time(&tv);
 fprintf(qtest_log_fp, "[R +" FMT_timeval "]",
-tv.tv_sec, (long) tv.tv_usec);
+(long) tv.tv_sec, (long) tv.tv_usec);
 for (i = 0; words[i]; i++) {
 fprintf(qtest_log_fp, " %s", words[i]);
 }
@@ -485,7 +485,7 @@ static void qtest_event(void *opaque, int event)
 qtest_opened = true;
 if (qtest_log_fp) {
 fprintf(qtest_log_fp, "[I " FMT_timeval "] OPENED\n",
-start_time.tv_sec, (long) start_time.tv_usec);
+(long) start_time.tv_sec, (long) start_time.tv_usec);
 }
 break;
 case CHR_EVENT_CLOSED:
@@ -494,7 +494,7 @@ static void qtest_event(void *opaque, int event)
 qemu_timeval tv;
 qtest_get_time(&tv);
 fprintf(qtest_log_fp, "[I +" FMT_timeval "] CLOSED\n",
-tv.tv_sec, (long) tv.tv_usec);
+(long) tv.tv_sec, (long) tv.tv_usec);
 }
 break;
 default:
-- 
1.8.1.4

1 2 3 >

1 - 100 of 253 matches

Mail list logo