Re: [Qemu-devel] [PATCH 1/1] migration: qemu_savevm_state_cleanup becomes mandatory operation

2015-11-09 Thread Li, Liang Z
> >>> Yes, you are right. Thanks a lot.
> >>>
> >>> BTW, can this patch fix the regression you reported?
> >>>
> >>> Reviewed-by: Liang Li 
> >>>
> >> yes
> > Great.  You'd better change the commit message to make it more clear.
> >
> > Liang
> argh.. you are right...
> 
> This problem has appeared in the end of big rework of another problem with
> snapshots and dataplane.
> Sorry that this is not clear that regression is fixed.
> I'll resend the patch with better commit message
> 
> Den

If the regression still exists with you patch,  could please try the test.patch 
I sent to you plus your patch to find out what's the root cause? I am not sure 
about the bdrv_drain_all() in blk_mig_cleanup().

Liang





Re: [Qemu-devel] [PATCH v2] virtio-blk: trivial code optimization

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 04:19, arei.gong...@huawei.com wrote:
> + * 1. requests are not sequential
> + * 2. merge would exceed maximum number of IOVs
> + * 3. merge would exceed maximum transfer length of backend 
> device
> + */
> +if (sector_num + nb_sectors != req->sector_num ||
> +niov > IOV_MAX - req->qiov.niov ||
> +req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > 
> max_xfer_len) {

Hi Gonglei,

the third condition should also be changed to "new > max - old".

Thanks,

Paolo



Re: [Qemu-devel] [PATCH for-2.5 1/1] spapr: Handle failure of KVM_PPC_ALLOCATE_HTAB ioctl

2015-11-09 Thread David Gibson
On Mon, Nov 09, 2015 at 03:24:15PM +1100, David Gibson wrote:
> On Tue, Nov 03, 2015 at 03:38:19PM +0530, Bharata B Rao wrote:
> > KVM_PPC_ALLOCATE_HTAB ioctl can return -ENOMEM for KVM guests and QEMU
> > never handled this correctly. But this didn't cause any problems till
> > now as KVM_PPC_ALLOCATE_HTAB ioctl returned with smaller than requested
> > HTAB when enough contiguous memory wasn't available in the host.
> > After the proposed kernel change: 
> > https://patchwork.ozlabs.org/patch/530501/,
> > KVM_PPC_ALLOCATE_HTAB ioctl will not fallback to lower sized HTAB
> > allocation and will fail if requested HTAB size can't be met.
> > 
> > Check for such failures in QEMU and abort appropriately. This will
> > prevent guest kernel from hanging/freezing during early boot by doing
> > graceful exit when host is unable to allocate requested HTAB.
> > 
> > Signed-off-by: Bharata B Rao 
> 
> I'm going to apply this, since it fixes a real problem.
> 
> I'm not entirely happy with the way it's done though - I'd prefer to
> see a separate case for (shift < 0) giving an unconditional error.
> Handling both the HV success case and the failure case in that first
> branch is unnecessarily subtle and confusing, IMO.

Ugh.. actually.. this patch seems to cause make check failures when
configured for powerpc guest on an x86 host.  I haven't debugged yet,
but I'm guessing the shift != 0 is now catching the TCG (or PR) case
where we need to allocate the htab ourselves.

> 
> 
> > ---
> >  hw/ppc/spapr.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index e1202ce..ec6e141 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1022,7 +1022,7 @@ static void spapr_alloc_htab(sPAPRMachineState *spapr)
> >  
> >  shift = kvmppc_reset_htab(spapr->htab_shift);
> >  
> > -if (shift > 0) {
> > +if (shift != 0) {
> >  /* Kernel handles htab, we don't need to allocate one */
> >  if (shift != spapr->htab_shift) {
> >  error_setg(&error_abort, "Failed to allocate HTAB of requested 
> > size, try with smaller maxmem");
> > @@ -1055,7 +1055,7 @@ static void spapr_reset_htab(sPAPRMachineState *spapr)
> >  int index;
> >  
> >  shift = kvmppc_reset_htab(spapr->htab_shift);
> > -if (shift > 0) {
> > +if (shift != 0) {
> >  if (shift != spapr->htab_shift) {
> >  error_setg(&error_abort, "Requested HTAB allocation failed 
> > during reset");
> >  }
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH qemu] spapr: Add /system-id

2015-11-09 Thread David Gibson
On Mon, Nov 09, 2015 at 05:47:17PM +1100, Alexey Kardashevskiy wrote:
> Section B.6.2.1 Root Node Properties of PAPR specification defines
> a set of properties which shall be present in the device tree root,
> one of these properties is "system-id" which "should be unique across
> all systems and all manufacturers". Since UUID is meant to be unique,
> it makes sense to use it as "system-id".
> 
> This adds "system-id" property to the device tree root when not empty.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> This might be expected by AIX so here is the patch.
> I am really not sure if it makes sense to initialize property when
> UUID is all zeroes as the requirement is "unique" and zero-uuid is
> not.

Yeah, I think it would be better to omit system-id entirely when a
UUID hasn't been supplied.
> 
> ---
>  hw/ppc/spapr.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index de77528..e8b407d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -374,6 +374,9 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
>qemu_uuid[14], qemu_uuid[15]);
>  
>  _FDT((fdt_property_string(fdt, "vm,uuid", buf)));
> +if (qemu_uuid_set) {
> +_FDT((fdt_property_string(fdt, "system-id", buf)));
> +}
>  g_free(buf);
>  
>  if (qemu_get_vm_name()) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2] virtio-blk: trivial code optimization

2015-11-09 Thread Gonglei
On 2015/11/9 16:41, Paolo Bonzini wrote:
> 
> 
> On 09/11/2015 04:19, arei.gong...@huawei.com wrote:
>> + * 1. requests are not sequential
>> + * 2. merge would exceed maximum number of IOVs
>> + * 3. merge would exceed maximum transfer length of backend 
>> device
>> + */
>> +if (sector_num + nb_sectors != req->sector_num ||
>> +niov > IOV_MAX - req->qiov.niov ||
>> +req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > 
>> max_xfer_len) {
> 
> Hi Gonglei,
> 
> the third condition should also be changed to "new > max - old".
> 
Okay, will do.  :)

Thanks,
-Gonglei





Re: [Qemu-devel] [PATCH COLO-Frame v10 11/38] QEMUSizedBuffer: Introduce two help functions for qsb

2015-11-09 Thread zhanghailiang

On 2015/11/7 2:30, Dr. David Alan Gilbert wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
VM state:
One is qsb_put_buffer(), which put the content of a given QEMUSizedBuffer
into QEMUFile, this is used to send buffered VM state to secondary.
Another is qsb_fill_buffer(), read 'size' bytes of data from the file into
qsb, this is used to get VM state from socket into a buffer.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
  include/migration/qemu-file.h |  3 ++-
  migration/qemu-file-buf.c | 58 +++
  2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 29a338d..de42d5b 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -144,7 +144,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t 
start, size_t count,
 uint8_t *buf);
  ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
   off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size);
+int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size);


I made most of the qemu_file use size_t back in August; cna you update
this please.


Of course, Will do that in next version, thanks.


Dave



  /*
   * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 49516b8..e58004d 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -366,6 +366,64 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t 
*source,
  return count;
  }

+
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size)
+{
+int i, l;
+
+for (i = 0; i < qsb->n_iov && size > 0; i++) {
+l = MIN(qsb->iov[i].iov_len, size);
+qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+size -= l;
+}
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * always fill from pos 0 and used after qsb_create().
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size)
+{
+ssize_t rc = qsb_grow(qsb, size);
+int pending = size, i;
+qsb->used = 0;
+uint8_t *buf = NULL;
+
+if (rc < 0) {
+return rc;
+}
+
+for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+int doneone = 0;
+/* read until iov full */
+while (doneone < qsb->iov[i].iov_len && pending > 0) {
+int readone = 0;
+buf = qsb->iov[i].iov_base;
+readone = qemu_get_buffer(f, buf,
+MIN(qsb->iov[i].iov_len - doneone, pending));
+if (readone == 0) {
+return qsb->used;
+}
+buf += readone;
+doneone += readone;
+pending -= readone;
+qsb->used += readone;
+}
+}
+return qsb->used;
+}
+
  typedef struct QEMUBuffer {
  QEMUSizedBuffer *qsb;
  QEMUFile *file;
--
1.8.3.1



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

.







Re: [Qemu-devel] [PATCH v9 00/56] Postcopy implementation

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 05:13, David Gibson wrote:
> Which makes me think it's a bit odd that we're not already getting 
> most of the htab data across during the precopy phase.  Don't we 
> already delay entering the postcopy phase until precopy is
> "complete" in the sense that the remaining non-postcopiable data is
> below the downtime limit?  I would have thought that would also
> ensure we'd only have a reasonable number of remaining htab updates
> for the package.

You can enter postcopy at any time.  In this case, Bharata is entering
immediately after the beginning of migration, basically without doing
any work in precopy mode.

Paolo



Re: [Qemu-devel] [PATCH v9 00/56] Postcopy implementation

2015-11-09 Thread Dr. David Alan Gilbert
* Bharata B Rao (bhar...@linux.vnet.ibm.com) wrote:
> On Fri, Nov 06, 2015 at 03:48:11PM +, Dr. David Alan Gilbert wrote:
> > * Bharata B Rao (bhar...@linux.vnet.ibm.com) wrote:
> > 
> > > > Where we have iterable, but non-postcopiable devices (e.g. htab
> > > > or block migration), complete them before forming the 'package'
> > > > but with the CPUs stopped.  This stops them filling up the package.
> > > 
> > > That helps and the migration suceeds now when I switch to postcopy
> > > immediately after starting the migration.
> > 
> > Excellent.
> > 
> > > However after postcopy migration, when I attempt to start an incoming
> > > instance again to migrate the guest back, I see this failure:
> > > 
> > > qemu-system-ppc64: cannot set up guest memory 'ppc_spapr.ram': Cannot 
> > > allocate memory
> > > 
> > > The same doesn't happen with normal migration.
> > 
> > Huh that's fun; that's the original source guest that's running out of RAM?
> > It's original QEMU should be gone by that point.
> 
> Yes, the original source QEMU is gone, but there is not enough memory left
> in the host to start another incoming QEMU instance because...

Ah, so this is ping-pong on one host?

> At the beginning
> -
> $ grep -i mem /proc/meminfo 
> MemTotal:   132816832 kB
> MemFree:128781632 kB
> MemAvailable:   131668224 kB
> 
> After starting the guest (-m 64G,slots=32,maxmem=128G)
> 
> $ grep -i mem /proc/meminfo 
> MemTotal:   132816832 kB
> MemFree:124866880 kB
> MemAvailable:   127753728 kB
> 
> After starting the destination instance (incoming)
> -
> $ grep -i mem /proc/meminfo 
> MemTotal:   132816832 kB
> MemFree:122514880 kB
> MemAvailable:   125401920 kB
> 
> After postcopy migration completes
> --
> $ grep -i mem /proc/meminfo 
> MemTotal:   132816832 kB
> MemFree:55150592 kB
> MemAvailable:   58037888 kB
> 
> After terminating the source instance
> -
> $ grep -i mem /proc/meminfo 
> MemTotal:   132816832 kB
> MemFree:59432448 kB
> MemAvailable:   62319872 kB
> 
> So as you can see, postcopy migration will result in guest claiming
> its entire RAM memory from host. This doesn't happen during normal migration.

I'll try and see if I can replicate this.

Can you :
   1) show me the command line you're using
   2) Show me /proc/pid/smaps for the destination qemu
   3) Turn on the trace postcopy_place_page_zero
  the theory is that most of your pages should end up as zero pages
  and not be allocated.

Dave

> The above results are with the additional patch you sent in this thread
> and I was switching to postcopy migration immediately after starting the
> migration. Without this patch, when I delay the switching to postcopy
> migration a bit, I see the free memory as below when postcopy migration
> completes.
> 
> $ grep -i mem /proc/meminfo 
> MemTotal:   132816832 kB
> MemFree:85018176 kB
> MemAvailable:   87937024 kB
> 
> Regards,
> Bharata.
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11

2015-11-09 Thread Paolo Bonzini


On 08/11/2015 23:55, Peter Maydell wrote:
> So the good news is that on mainline this doesn't happen any more.
> The bad news is that something weird is going on such that git
> bisect doesn't give helpful answers. Specifically if I start by
> compiling older versions and work forwards, then
>  0fd7e09 kvmclock: add a new function to update env->tsc.
> shows the bug, and
>  6388acc Revert "Introduce cpu_clean_all_dirty"
> does not. (And I've got to that commit both via a git-bisect
> and by a second round of manually trying to identify the commit,
> so it's consistent about where it changes behaviour.)
> However that makes no sense because that revert commit
> is just removing unused code. And then if I go backwards again
> to 0fd7e09 the bug doesn't repro there.

Even 0fd7e09 does not change behavior unless you use KVM (which you
obviously don't do under Mac OS X).  So if you go backwards to 0fd7e09^
it shouldn't reproduce there either.

What is the known bad SHA1?

Paolo



Re: [Qemu-devel] [PATCH COLO-Frame v10 12/38] COLO: Save PVM state to secondary side when do checkpoint

2015-11-09 Thread zhanghailiang

On 2015/11/7 2:59, Dr. David Alan Gilbert wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

The main process of checkpoint is to synchronize SVM with PVM.
VM's state includes ram and device state. So we will migrate PVM's
state to SVM when do checkpoint, just like migration does.

We will cache PVM's state in slave, we use QEMUSizedBuffer
to store the data, we need to know the size of VM state, so in master,
we use qsb to store VM state temporarily, get the data size by call 
qsb_get_length()
and then migrate the data to the qsb in the secondary side.

Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Li Zhijian 
---
  migration/colo.c   | 68 ++
  migration/ram.c| 47 +
  migration/savevm.c |  2 +-
  3 files changed, 101 insertions(+), 16 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 2510762..b865513 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -17,6 +17,9 @@
  #include "qemu/error-report.h"
  #include "qemu/sockets.h"

+/* colo buffer */
+#define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+
  bool colo_supported(void)
  {
  return true;
@@ -94,9 +97,12 @@ static int colo_ctl_get(QEMUFile *f, uint32_t require)
  return value;
  }

-static int colo_do_checkpoint_transaction(MigrationState *s)
+static int colo_do_checkpoint_transaction(MigrationState *s,
+  QEMUSizedBuffer *buffer)
  {
  int ret;
+size_t size;
+QEMUFile *trans = NULL;

  ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_CHECKPOINT_REQUEST, 0);
  if (ret < 0) {
@@ -107,15 +113,47 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s)
  if (ret < 0) {
  goto out;
  }
+/* Reset colo buffer and open it for write */
+qsb_set_length(buffer, 0);
+trans = qemu_bufopen("w", buffer);
+if (!trans) {
+error_report("Open colo buffer for write failed");
+goto out;
+}

-/* TODO: suspend and save vm state to colo buffer */
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("run", "stop");
+
+/* Disable block migration */
+s->params.blk = 0;
+s->params.shared = 0;
+qemu_savevm_state_header(trans);
+qemu_savevm_state_begin(trans, &s->params);
+qemu_mutex_lock_iothread();
+qemu_savevm_state_complete(trans);
+qemu_mutex_unlock_iothread();
+
+qemu_fflush(trans);

  ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND, 0);
  if (ret < 0) {
  goto out;
  }
+/* we send the total size of the vmstate first */
+size = qsb_get_length(buffer);
+ret = colo_ctl_put(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
+if (ret < 0) {
+goto out;
+}

-/* TODO: send vmstate to Secondary */
+qsb_put_buffer(s->to_dst_file, buffer, size);
+qemu_fflush(s->to_dst_file);
+ret = qemu_file_get_error(s->to_dst_file);
+if (ret < 0) {
+goto out;
+}

  ret = colo_ctl_get(s->from_dst_file, COLO_COMMAND_VMSTATE_RECEIVED);
  if (ret < 0) {
@@ -127,14 +165,24 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s)
  goto out;
  }

-/* TODO: resume Primary */
+ret = 0;
+/* resume master */
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "run");

  out:
+if (trans) {
+qemu_fclose(trans);
+}
+
  return ret;
  }

  static void colo_process_checkpoint(MigrationState *s)
  {
+QEMUSizedBuffer *buffer = NULL;
  int fd, ret = 0;

  /* Dup the fd of to_dst_file */
@@ -159,6 +207,13 @@ static void colo_process_checkpoint(MigrationState *s)
  goto out;
  }

+buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+if (buffer == NULL) {
+ret = -ENOMEM;
+error_report("Failed to allocate buffer!");


Please say 'Failed to allocate colo buffer'; QEMU has lots and lots of buffers.



OK, will fix it in next version.


+goto out;
+}
+
  qemu_mutex_lock_iothread();
  vm_start();
  qemu_mutex_unlock_iothread();
@@ -166,7 +221,7 @@ static void colo_process_checkpoint(MigrationState *s)

  while (s->state == MIGRATION_STATUS_COLO) {
  /* start a colo checkpoint */
-ret = colo_do_checkpoint_transaction(s);
+ret = colo_do_checkpoint_transaction(s, buffer);
  if (ret < 0) {
  goto out;
  }
@@ -179,6 +234,9 @@ out:
  migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
MIGRATION_STATUS_COMPLETED);

+qsb_free(buffer);
+buffer = NULL;
+
  if (s->from_dst_file) {
  qemu_fclose(s->from_dst_file);
  }
diff --git a/migration/ram.c b/migration/ram.c
index a25bcc7..5784c15 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3

Re: [Qemu-devel] [PATCH v9 04/56] Move page_size_init earlier

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> The HOST_PAGE_ALIGN macros don't work until the page size variables
> have been set up; later in postcopy I use those macros in the RAM
> code, and it can be triggered using -object.
>
> Fix this by initialising page_size_init() earlier - it's currently
> initialised inside the accelerators, move it up into vl.c.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH v9 06/56] qemu_ram_block_by_name

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Add a function to find a RAMBlock by name; use it in two
> of the places that already open code that loop; we've
> got another use later in postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 

> -error_report("Can't find block %s!", id);
> +error_report("Can't find block %s", id);

No bangs on error messages?  You are a boring man O:-)



Re: [Qemu-devel] [PATCH v3 8/9] block: Introduce BlockDriver.bdrv_drain callback

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 03:56, Fam Zheng wrote:
> +if (bs->drv && bs->drv->bdrv_drain) {
> +bs->drv->bdrv_drain(bs);
> +}
> +QLIST_FOREACH(child, &bs->children, next) {
> +BlockDriverState *cbs = child->bs;
> +if (cbs->drv && cbs->drv->bdrv_drain) {
> +cbs->drv->bdrv_drain(bs);
> +}
> +}

I think this is not enough, because the children could have children as
well.

Perhaps you can do it like bdrv_flush: one function does the call to
bdrv_drain and the recursion on children; bdrv_drain calls that one
function and then does the "while (busy)" loop.

Paolo



Re: [Qemu-devel] [PATCH v9 08/56] Add qemu_get_buffer_in_place to avoid copies some of the time

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> qemu_get_buffer always copies the data it reads to a users buffer,
> however in many cases the file buffer inside qemu_file could be given
> back to the caller, avoiding the copy.  This isn't always possible
> depending on the size and alignment of the data.
>
> Thus 'qemu_get_buffer_in_place' either copies the data to a supplied
> buffer or updates a pointer to the internal buffer if convenient.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH v9 12/56] ram_load: Factor out host_from_stream_offset call and check

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> The main RAM load loop has a call to host_from_stream_offset for
> each page type that actually loads data with the same test;
> factor it out before the switch.
>
> The host = NULL is to silence a bogus gcc warning of
> an unitialised in the RAM_SAVE_COMPRESS_PAGE case, it
> doesn't seem to realise that host is always initialised by the if at
> the top in the cases the switch takes.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH v9 15/56] Add Linux userfaultfd.h header

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Postcopy uses the userfaultfd.h feature in the Linux kernel; include
> the header.
>
> (In early versions of the patch series we had this, and then we dropped
> this by only including it if the kernel headers defined the syscall
> number; however 1842bdfd added the syscall definition to our
> headers, which means we can't tell if the kernel has it or not)
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 




Re: [Qemu-devel] [PATCH RFC 3/5] qapi: Use common name mangling for enumeration constants

2015-11-09 Thread Markus Armbruster
"Daniel P. Berrange"  writes:

> On Thu, Nov 05, 2015 at 04:30:00PM +0100, Markus Armbruster wrote:
>> QAPI names needn't be valid C identifiers, so we mangle them with
>> c_name().  Except for enumeration constants, which we mangle with
>> camel_to_upper().
>> 
>> c_name() is easy enough to understand: replace '.' and '-' by '_',
>> prefix certain ticklish identifiers with 'q_'.
>> 
>> camel_to_upper() is a hairball of heuristics, and guessing how it'll
>> mangle interesting input could serve as a (nerdy) game.  Despite some
>> tweaking (commit 5d371f4), it's still inadqeuate for some QAPI names
>> (commit 351d36e).
>> 
>> Example: QAPI definition
>> 
>> { 'enum': 'BlockDeviceIoStatus', 'data': [ 'ok', 'failed', 'nospace' ] }
>> 
>> generates
>> 
>> typedef enum BlockDeviceIoStatus {
>> BLOCK_DEVICE_IO_STATUS_OK = 0,
>> BLOCK_DEVICE_IO_STATUS_FAILED = 1,
>> BLOCK_DEVICE_IO_STATUS_NOSPACE = 2,
>> BLOCK_DEVICE_IO_STATUS_MAX = 3,
>> } BlockDeviceIoStatus;
>> 
>> Observe that c_name() maps BlockDeviceIoStatus to itself, and
>> camel_to_upper() maps it to BLOCK_DEVICE_IO_STATUS, i.e. the
>> enumeration constants are shouted, the enumeration type isn't.
>> 
>> Because mangled names must not clash, name mangling restricts the
>> names you can use.  For example, you can't have member 'a-b' and
>> 'a_b'.
>> 
>> Having two separate manglings complicates this.  Enumeration constants
>> must be distinct after mangling with camel_to_upper().  But as soon as
>> you use the enumeration as union tag, they must *also* be distinct
>> after mangling with c_name().
>> 
>> Having shouted enumeration constants isn't worth the complexity cost.
>
> I rather disagree with this. Having all-uppercase for enum constants
> is a really widely used convention and I think it is pretty useful
> when reading to code to have constants (whether #define/enum) clearly
> marked in uppercase.  After this change we'll have situation where
> QAPI enums uses CamelCase, while all other non-QAPI enums (whether
> defined in QEMU source, or via #included 3rd party headers use
> UPPER_CASE for constants. I think that's rather unpleasant.

CODING_STYLE doesn't mandate shouting constants.

Existing code doesn't shout constants consistently.

Third parties don't shout constants consistently.

A competing convention is to use the enumeration type's name as prefix
for the constants.

> I agree with your premise that predicting the output of the qapi
> name mangler though is essentially impossible for mortals. How
> about a counter proposal
>
> First make 'prefix' compulsory for enums, instead of trying to
> figure out a suitable prefix automatically. Then, have a very
> simple rule for the enum constants where you just uppercase a-z
> and translate any non-alpha-numeric character into a _. Don't
> try todo anything else special like figuring out word boundaries.

Essentially c_name(qapi_name).upper().

> That would get of much of the complexity in the qapi name mangler
> and give a easily predictable enum constant name. Thus the vast
> majority of .c source files (possibly even all of them) would not
> need any change.

'prefix' is a work-around for deficient name mangling.  Making it
mandatory declares defeat on enumeration constant name mangling.  The
reason for defeat are unreasonable goals, namely combining these three
conventions:

* QAPI and C type names are in CamelCase

* Enumeration constants are prefixed by the type name

* Enumeration constants are shouted, including the prefix

They necessitate converting the CamelCase prefix to shouting, which is
the troublesome part.  Note that shouting the remainder (the QAPI member
name) is trivial: .upper() serves.


Let's take a step back and examine what I want to achieve.


First, I want simple, consistent rules for QAPI names.  Two kinds:
reserved names and name clashes.

A few names are globally reserved (e.g. prefix 'q_') , and a few more
only in certain name spaces (e.g. type name suffix 'List').  Simple
enough.

Two names clash if they're equal after replacing '-' and '.' by '_'.
Simple enough.

*Except* for enumeration member names, which can clash in other ways
(example: 'GotCha' with 'got-cha').  The exact special clashing rules
haven't been written down.  Nobody knows them, actually.  Instead of
writing them down, I want to get rid of then.

We could change the normal clashing rule to additionally ignore case.
Still simple enough, and makes sense to me.

Ignoring case would let us safely shout names in generated code.


Second, I want our C names generated in a simple, predictable way.  This
is largely the case:

* We use the QAPI name with '-' and '.' replaced by '_'

* Sometimes, we prepend a 'q_' to avoid certain ticklish names

* We prepend and append fixed strings

*Except* for enumeration member names, which undergo a different
mangling that is neither simple nor predictable.

My proposal simply drops the special case.

Your counter-proposal 

Re: [Qemu-devel] [PATCH v3 8/9] block: Introduce BlockDriver.bdrv_drain callback

2015-11-09 Thread Fam Zheng
On Mon, 11/09 10:24, Paolo Bonzini wrote:
> 
> 
> On 09/11/2015 03:56, Fam Zheng wrote:
> > +if (bs->drv && bs->drv->bdrv_drain) {
> > +bs->drv->bdrv_drain(bs);
> > +}
> > +QLIST_FOREACH(child, &bs->children, next) {
> > +BlockDriverState *cbs = child->bs;
> > +if (cbs->drv && cbs->drv->bdrv_drain) {
> > +cbs->drv->bdrv_drain(bs);
> > +}
> > +}
> 
> I think this is not enough, because the children could have children as
> well.
> 
> Perhaps you can do it like bdrv_flush: one function does the call to
> bdrv_drain and the recursion on children; bdrv_drain calls that one
> function and then does the "while (busy)" loop.
> 

Right, this function is no longer recursive and only goes to one layer down.

Will fix.

Thanks,

Fam



Re: [Qemu-devel] [PATCH 3/3] block/gluster: add support for multiple gluster servers

2015-11-09 Thread Prasanna Kumar Kalever
On Monday, November 9, 2015 12:34:45 PM, Peter Krempa wrote:
> On Thu, Nov 05, 2015 at 07:45:50 -0500, Prasanna Kumar Kalever wrote:
> > On Thursday, November 5, 2015 6:07:06 PM, Prasanna Kumar Kalever wrote:
> > > This patch adds a way to specify multiple volfile servers to the gluster
> > > block backend of QEMU with tcp|rdma transport types and their port
> > > numbers.
> > > 
> > > Problem:
> > > 
> > > Currently VM Image on gluster volume is specified like this:
> > 
> 
> [...]
> 
> > > @@ -345,7 +676,7 @@ static int qemu_gluster_open(BlockDriverState *bs,
> > > QDict
> > > *options,
> > >  
> > >  out:
> > >  qemu_opts_del(opts);
> > > -qemu_gluster_gconf_free(gconf);
> > > +qapi_free_BlockdevOptionsGluster(gconf);
> > 
> > Can some one help me please ?
> > This leads to crash in the second iteration i.e. while freeing
> > "gconf->servers->next->value"
> 
> So, prior to this you allocate a array of the data structures as:
> 
> +gsconf = g_new0(GlusterServer, num_servers);
> +
> +ptr = qemu_opt_get(opts, GLUSTER_OPT_VOLUME);
> +if (!ptr) {
> +error_setg(&local_err, "Error: qemu_gluster: please provide 'volume'
> "
> +   "option");
> +goto out;
> +}
> 
> Then you use the following code to fill the linked list:
> 
> +  if (gconf->servers == NULL) {
> +gconf->servers = g_new0(GlusterServerList, 1);
> +gconf->servers->value = &gsconf[i];
> 
> So here you set the value. For a i of 0 the '&gsconf[i]' expression will
> be a pointer with equal address to 'gsconf'. For explanation:
> 
> 'gsconf[i]' can be written as '*(gsconf + i)', so
> '&gsconf[i]' becomes basically '&(*(gsconf + i))'
> 
> This can be also simplified to:
> 'gsconf + i'. For a i of 0 this becomes the same pointer as 'gsconf'
> 
> And once you use that with free(), the whole gsconf array will be freed.
> All the other pointers that you've filled to the linked list become
> invalid, since they were pointing into the same array that was
> completely freed in the first iteration.

Thanks for the help Peter, It solves the problem.

-Prasanna 
> 
> Peter
> 



Re: [Qemu-devel] [PATCH 0/2] memory: Eliminate qemu_ram_free_from_ptr() & memory_region_destructor_ram_from_ptr()

2015-11-09 Thread Paolo Bonzini


On 06/11/2015 22:20, Eduardo Habkost wrote:
> This series eliminates two functions that are not needed anymore:
> qemu_ram_free_from_ptr(), and memory_region_destructor_ram_from_ptr().
> 
> Eduardo Habkost (2):
>   exec: Eliminate qemu_ram_free_from_ptr()
>   memory: Eliminate memory_region_destructor_ram_from_ptr()
> 
>  exec.c  | 19 ---
>  include/exec/ram_addr.h |  1 -
>  memory.c|  7 +--
>  3 files changed, 1 insertion(+), 26 deletions(-)
> 

Queued for 2.6, thanks.

Paolo



Re: [Qemu-devel] [PATCH] exec: Remove unnecessary RAM_FILE flag

2015-11-09 Thread Paolo Bonzini


On 06/11/2015 23:11, Eduardo Habkost wrote:
> The only code that sets RAMBlock.fd is file_ram_alloc(), and the only
> code that calls file_ram_alloc() sets the RAM_FILE flag. That means the
> flag is always set when RAMBlock.fd >= 0, and the munmap() call at
> reclaim_ramblock() is dead code that never runs.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  exec.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index a028961..9eb8b4b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -88,9 +88,6 @@ static MemoryRegion io_mem_unassigned;
>   */
>  #define RAM_RESIZEABLE (1 << 2)
>  
> -/* RAM is backed by an mmapped file.
> - */
> -#define RAM_FILE (1 << 3)
>  #endif
>  
>  struct CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
> @@ -1597,7 +1594,6 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, 
> MemoryRegion *mr,
>  new_block->used_length = size;
>  new_block->max_length = size;
>  new_block->flags = share ? RAM_SHARED : 0;
> -new_block->flags |= RAM_FILE;
>  new_block->host = file_ram_alloc(new_block, size,
>   mem_path, errp);
>  if (!new_block->host) {
> @@ -1699,11 +1695,7 @@ static void reclaim_ramblock(RAMBlock *block)
>  xen_invalidate_map_cache_entry(block->host);
>  #ifndef _WIN32
>  } else if (block->fd >= 0) {
> -if (block->flags & RAM_FILE) {
> -qemu_ram_munmap(block->host, block->max_length);
> -} else {
> -munmap(block->host, block->max_length);
> -}
> +qemu_ram_munmap(block->host, block->max_length);
>  close(block->fd);
>  #endif
>  } else {
> 

Queued for 2.6, thanks.

Paolo



Re: [Qemu-devel] [PATCH v10 00/30] qapi member collision (post-introspection cleanups, subset C')

2015-11-09 Thread Markus Armbruster
Eric Blake  writes:

> On 11/06/2015 09:03 AM, Markus Armbruster wrote:
>> Eric Blake  writes:
[...]
>>> Hopefully, we are converging on something that will be ready
>>> for a pull request, especially for the earlier patches of this
>>> subset.
>> 
>> I guess you mean PATCH 01-12.  I had a few questions, but the most
>> likely outcome seems to be minor touchups I could apply in my tree.
>> 
>> I'm okay with trying to get more patches in, but let's get these out of
>> the way meanwhile.
>
> Yes, 01-12 seems like a good first set, if you want to make those
> touchups (I've supplied some potential text improvements in reply to
> some of your comments); I'm happy, as always, to take a peek over your
> staging repo to double check what you are prepping for the pull request.

Please have a look at my qapi-next branch.



Re: [Qemu-devel] [PULL 0/7] Block patches

2015-11-09 Thread Stefan Hajnoczi
On Mon, Nov 09, 2015 at 08:35:56AM +0100, Markus Armbruster wrote:
> Fam Zheng  writes:
> 
> > On Fri, 11/06 18:07, Peter Maydell wrote:
> >> On 6 November 2015 at 17:52, Stefan Hajnoczi  wrote:
> >> > The following changes since commit 
> >> > 4b59f39bc9a03afcc74b2fa28da7c3189fca507c:
> >> >
> >> >   Merge remote-tracking branch
> >> > 'remotes/mjt/tags/pull-trivial-patches-2015-11-06' into staging
> >> > (2015-11-06 12:50:24 +)
> >> >
> >> > are available in the git repository at:
> >> >
> >> >   git://github.com/stefanha/qemu.git tags/block-pull-request
> >> >
> >> > for you to fetch changes up to 6f707181b1bd6ccf2d2fd9397039c7ef6fa4a9fd:
> >> >
> >> >   blockdev: acquire AioContext in hmp_commit() (2015-11-06 15:41:00 
> >> > +)
> >> >
> >> > 
> >> 
> >> Build failure on OSX :-(
> >> 
> >> /Users/pm215/src/qemu-for-merges/aio-posix.c  CCblock/qcow.o
> >> :442:37: error: no member named 'epollfd' in 'struct AioContext'
> >> epoll_handler.pfd.fd = ctx->epollfd;
> >>~~~  ^
> >> 
> >
> > :(
> >
> > I think it is harmless to always include this member in AioContext. Stefan,
> > could you squash this into patch 5?  Thanks!
> >
> > Fam
> >
> > ---
> >
> > diff --git a/include/block/aio.h b/include/block/aio.h
> > index 91737d5..735f1f8 100644
> > --- a/include/block/aio.h
> > +++ b/include/block/aio.h
> > @@ -124,11 +124,9 @@ struct AioContext {
> >  QEMUTimerListGroup tlg;
> >  
> >  int external_disable_cnt;
> > -#ifdef CONFIG_EPOLL
> >  int epollfd;
> >  bool epoll_enabled;
> >  bool epoll_available;
> > -#endif
> >  };
> 
> Replace by the ifdeffery by a comment pointing to CONFIG_EPOLL, perhaps?

Done.  Will send a v2 pull request.

Stefan


signature.asc
Description: PGP signature


[Qemu-devel] [PATCH v3] virtio-blk: trivial code optimization

2015-11-09 Thread arei.gonglei
From: Gonglei 

1. avoid possible superflous checking
2. make code more robustness

Signed-off-by: Gonglei 
Reviewed-by: Fam Zheng 
---
v3: change the third condition too [Paolo]
add Fam's R-by
---
 hw/block/virtio-blk.c | 27 +--
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 093e475..9124358 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -404,24 +404,15 @@ void virtio_blk_submit_multireq(BlockBackend *blk, 
MultiReqBuffer *mrb)
 for (i = 0; i < mrb->num_reqs; i++) {
 VirtIOBlockReq *req = mrb->reqs[i];
 if (num_reqs > 0) {
-bool merge = true;
-
-/* merge would exceed maximum number of IOVs */
-if (niov + req->qiov.niov > IOV_MAX) {
-merge = false;
-}
-
-/* merge would exceed maximum transfer length of backend device */
-if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > max_xfer_len) 
{
-merge = false;
-}
-
-/* requests are not sequential */
-if (sector_num + nb_sectors != req->sector_num) {
-merge = false;
-}
-
-if (!merge) {
+/*
+ * NOTE: We cannot merge the requests in below situations:
+ * 1. requests are not sequential
+ * 2. merge would exceed maximum number of IOVs
+ * 3. merge would exceed maximum transfer length of backend device
+ */
+if (sector_num + nb_sectors != req->sector_num ||
+niov > IOV_MAX - req->qiov.niov ||
+nb_sectors > max_xfer_len - req->qiov.size / BDRV_SECTOR_SIZE) 
{
 submit_requests(blk, mrb, start, num_reqs, niov);
 num_reqs = 0;
 }
-- 
1.7.12.4





Re: [Qemu-devel] [PATCH] hw/arm/virt: error_report cleanups

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 07:44, Markus Armbruster  wrote:
> For consistency, error messages should be a phrase, not a full sentence,
> let alone a paraphraph.

This is in direct conflict with wanting them to be actually useful
to users :-(

thanks
-- PMM



[Qemu-devel] [PATCH v12 3/3] block/gluster: add support for multiple gluster servers

2015-11-09 Thread Prasanna Kumar Kalever
This patch adds a way to specify multiple volfile servers to the gluster
block backend of QEMU with tcp|rdma transport types and their port numbers.

Problem:

Currently VM Image on gluster volume is specified like this:

file=gluster[+tcp]://host[:port]/testvol/a.img

Assuming we have three hosts in trusted pool with replica 3 volume
in action and unfortunately host (mentioned in the command above) went down
for some reason, since the volume is replica 3 we now have other 2 hosts
active from which we can boot the VM.

But currently there is no mechanism to pass the other 2 gluster host
addresses to qemu.

Solution:

New way of specifying VM Image on gluster volume with volfile servers:
(We still support old syntax to maintain backward compatibility)

Basic command line syntax looks like:

Pattern I:
 -drive driver=gluster,
volume=testvol,path=/path/a.raw,
servers.0.host=1.2.3.4,
   [servers.0.port=24007,]
   [servers.0.transport=tcp,]
servers.1.host=5.6.7.8,
   [servers.1.port=24008,]
   [servers.1.transport=rdma,] ...

Pattern II:
 'json:{"driver":"qcow2","file":{"driver":"gluster",
   "volume":"testvol","path":"/path/a.qcow2",
   "servers":[{tuple0},{tuple1}, ...{tupleN}]}}'

   driver  => 'gluster' (protocol name)
   volume  => name of gluster volume where our VM image resides
   path=> absolute path of image in gluster volume

  {tuple}  => {"host":"1.2.3.4"[,"port":"24007","transport":"tcp"]}

   host=> host address (hostname/ipv4/ipv6 addresses)
   port=> port number on which glusterd is listening. (default 24007)
   transport   => transport type used to connect to gluster management daemon,
   it can be tcp|rdma (default 'tcp')

Examples:
1.
 -drive driver=qcow2,file.driver=gluster,
file.volume=testvol,file.path=/path/a.qcow2,
file.servers.0.host=1.2.3.4,
file.servers.0.port=24007,
file.servers.0.transport=tcp,
file.servers.1.host=5.6.7.8,
file.servers.1.port=24008,
file.servers.1.transport=rdma
2.
 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
 "path":"/path/a.qcow2","servers":
 [{"host":"1.2.3.4","port":"24007","transport":"tcp"},
  {"host":"4.5.6.7","port":"24008","transport":"rdma"}] } }'

This patch gives a mechanism to provide all the server addresses, which are in
replica set, so in case host1 is down VM can still boot from any of the
active hosts.

This is equivalent to the backup-volfile-servers option supported by
mount.glusterfs (FUSE way of mounting gluster volume)

This patch depends on a recent fix in libgfapi raised as part of this work:
http://review.gluster.org/#/c/12114/

Credits: Sincere thanks to Kevin Wolf  and
"Deepak C Shetty"  for inputs and all their support

Signed-off-by: Prasanna Kumar Kalever 
---
v1:
multiple host addresses but common port number and transport type
pattern: URI syntax with query (?) delimitor
syntax:
file=gluster[+transport-type]://host1:24007/testvol/a.img\
 ?servers=host2&servers=host3

v2:
multiple host addresses each have their own port number, but all use
 common transport type
pattern: URI syntax  with query (?) delimiter
syntax:
file=gluster[+transport-type]://[host[:port]]/testvol/a.img\
 [?servers=host1[:port]\
  &servers=host2[:port]]

v3:
multiple host addresses each have their own port number and transport type
pattern: changed to json
syntax:
'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol",
   "path":"/path/a.qcow2","servers":
 [{"host":"1.2.3.4","port":"24007","transport":"tcp"},
  {"host":"4.5.6.7","port":"24008","transport":"rdma"}] } }'

v4, v5:
address comments from "Eric Blake" 
renamed:
'backup-volfile-servers' -> 'volfile-servers'

v6:
address comments from Peter Krempa 
renamed:
 'volname'->  'volume'
 'image-path' ->  'path'
 'server' ->  'host'

v7:
fix for v6 (initialize num_servers to 1 and other typos)

v8:
split patch set v7 into series of 3 as per Peter Krempa 
review comments

v9:
reorder the series of patches addressing "Eric Blake" 
review comments

v10:
fix mem-leak as per Peter Krempa  review comments

v11:
using qapi-types* defined structures as per "Eric Blake" 
review comments.

v12:
fix crash caused in qapi_free_BlockdevOptionsGluster
---
 block/gluster.c  | 460 ---
 qapi/block-core.json |  64 ++-
 2 files changed, 455 insertions(+), 69 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index ededda2..2b944e6 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -11,6 +11,16 @@
 #include "block/block_int.h"
 #include "qemu/uri.h"
 
+#define GLUSTER_OPT_FILENAME   "filename"
+#define GLUSTER_OPT_VOLUME "volume"
+#define GLUSTER_OPT_PATH   "path"
+#define GLUSTER_OPT_HOST   "host"

[Qemu-devel] [PATCH 0/3] block/gluster: add support for multiple gluster servers

2015-11-09 Thread Prasanna Kumar Kalever
Prasanna Kumar Kalever (3):
  block/gluster: rename [server, volname, image] -> [host, volume, path]
  block/gluster: code cleanup
  block/gluster: add support for multiple gluster servers

 block/gluster.c  | 589 ---
 qapi/block-core.json |  64 +-
 2 files changed, 523 insertions(+), 130 deletions(-)

-- 
2.1.0




Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 09:10, Paolo Bonzini  wrote:
>
>
> On 08/11/2015 23:55, Peter Maydell wrote:
>> So the good news is that on mainline this doesn't happen any more.
>> The bad news is that something weird is going on such that git
>> bisect doesn't give helpful answers. Specifically if I start by
>> compiling older versions and work forwards, then
>>  0fd7e09 kvmclock: add a new function to update env->tsc.
>> shows the bug, and
>>  6388acc Revert "Introduce cpu_clean_all_dirty"
>> does not. (And I've got to that commit both via a git-bisect
>> and by a second round of manually trying to identify the commit,
>> so it's consistent about where it changes behaviour.)
>> However that makes no sense because that revert commit
>> is just removing unused code. And then if I go backwards again
>> to 0fd7e09 the bug doesn't repro there.
>
> Even 0fd7e09 does not change behavior unless you use KVM (which you
> obviously don't do under Mac OS X).  So if you go backwards to 0fd7e09^
> it shouldn't reproduce there either.
>
> What is the known bad SHA1?

2b5a79f is definitely bad even rebuilt from clean. I'm going
to do a re-bisect building each step from clean this morning.

thanks
-- PMM



[Qemu-devel] [PULL v2 5/7] aio: Introduce aio-epoll.c

2015-11-09 Thread Stefan Hajnoczi
From: Fam Zheng 

To minimize code duplication, epoll is hooked into aio-posix's
aio_poll() instead of rolling its own. This approach also has both
compile-time and run-time switchability.

1) When QEMU starts with a small number of fds in the event loop, ppoll
is used.

2) When QEMU starts with a big number of fds, or when more devices are
hot plugged, epoll kicks in when the number of fds hits the threshold.

3) Some fds may not support epoll, such as tty based stdio. In this
case, it falls back to ppoll.

A rough benchmark with scsi-disk on virtio-scsi dataplane (epoll gets
enabled from 64 onward). Numbers are in MB/s.

===
 | master | epoll
 ||
scsi disks # | readrandrw | readrandrw
-||
1| 86  36 | 92  45
8| 87  43 | 86  41
64   | 71  32 | 70  38
128  | 48  24 | 58  31
256  | 37  19 | 57  28
===

To comply with aio_{disable,enable}_external, we always use ppoll when
aio_external_disabled() is true.

[Removed #ifdef CONFIG_EPOLL around AioContext epollfd field declaration
since the field is also referenced outside CONFIG_EPOLL code.
--Stefan]

Signed-off-by: Fam Zheng 
Message-id: 1446177989-6702-4-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c | 184 +++-
 include/block/aio.h |   5 ++
 2 files changed, 188 insertions(+), 1 deletion(-)

diff --git a/aio-posix.c b/aio-posix.c
index 5bff3cd..06148a9 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -17,6 +17,9 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
+#ifdef CONFIG_EPOLL
+#include 
+#endif
 
 struct AioHandler
 {
@@ -29,6 +32,162 @@ struct AioHandler
 QLIST_ENTRY(AioHandler) node;
 };
 
+#ifdef CONFIG_EPOLL
+
+/* The fd number threashold to switch to epoll */
+#define EPOLL_ENABLE_THRESHOLD 64
+
+static void aio_epoll_disable(AioContext *ctx)
+{
+ctx->epoll_available = false;
+if (!ctx->epoll_enabled) {
+return;
+}
+ctx->epoll_enabled = false;
+close(ctx->epollfd);
+}
+
+static inline int epoll_events_from_pfd(int pfd_events)
+{
+return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
+   (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
+   (pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
+   (pfd_events & G_IO_ERR ? EPOLLERR : 0);
+}
+
+static bool aio_epoll_try_enable(AioContext *ctx)
+{
+AioHandler *node;
+struct epoll_event event;
+
+QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+int r;
+if (node->deleted || !node->pfd.events) {
+continue;
+}
+event.events = epoll_events_from_pfd(node->pfd.events);
+event.data.ptr = node;
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+if (r) {
+return false;
+}
+}
+ctx->epoll_enabled = true;
+return true;
+}
+
+static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+struct epoll_event event;
+int r;
+
+if (!ctx->epoll_enabled) {
+return;
+}
+if (!node->pfd.events) {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, node->pfd.fd, &event);
+if (r) {
+aio_epoll_disable(ctx);
+}
+} else {
+event.data.ptr = node;
+event.events = epoll_events_from_pfd(node->pfd.events);
+if (is_new) {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+if (r) {
+aio_epoll_disable(ctx);
+}
+} else {
+r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, node->pfd.fd, &event);
+if (r) {
+aio_epoll_disable(ctx);
+}
+}
+}
+}
+
+static int aio_epoll(AioContext *ctx, GPollFD *pfds,
+ unsigned npfd, int64_t timeout)
+{
+AioHandler *node;
+int i, ret = 0;
+struct epoll_event events[128];
+
+assert(npfd == 1);
+assert(pfds[0].fd == ctx->epollfd);
+if (timeout > 0) {
+ret = qemu_poll_ns(pfds, npfd, timeout);
+}
+if (timeout <= 0 || ret > 0) {
+ret = epoll_wait(ctx->epollfd, events,
+ sizeof(events) / sizeof(events[0]),
+ timeout);
+if (ret <= 0) {
+goto out;
+}
+for (i = 0; i < ret; i++) {
+int ev = events[i].events;
+node = events[i].data.ptr;
+node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
+(ev & EPOLLOUT ? G_IO_OUT : 0) |
+(ev & EPOLLHUP ? G_IO_HUP : 0) |
+(ev & EPOLLERR ? G_IO_ERR : 0);
+}
+}
+out:
+return ret;
+}
+
+static bool aio_epoll_enabled(AioContext *ctx)
+{
+/* Fall

[Qemu-devel] [PULL v2 4/7] aio: Introduce aio_context_setup

2015-11-09 Thread Stefan Hajnoczi
From: Fam Zheng 

This is the place to initialize platform specific bits of AioContext.

Signed-off-by: Fam Zheng 
Message-id: 1446177989-6702-3-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c |  4 
 aio-win32.c |  4 
 async.c | 13 +++--
 include/block/aio.h |  8 
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 0467f23..5bff3cd 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -302,3 +302,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/aio-win32.c b/aio-win32.c
index 43c4c79..cdc4456 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -369,3 +369,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 aio_context_release(ctx);
 return progress;
 }
+
+void aio_context_setup(AioContext *ctx, Error **errp)
+{
+}
diff --git a/async.c b/async.c
index 9589e4b..e106072 100644
--- a/async.c
+++ b/async.c
@@ -325,12 +325,18 @@ AioContext *aio_context_new(Error **errp)
 {
 int ret;
 AioContext *ctx;
+Error *local_err = NULL;
+
 ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+aio_context_setup(ctx, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
 ret = event_notifier_init(&ctx->notifier, false);
 if (ret < 0) {
-g_source_destroy(&ctx->source);
 error_setg_errno(errp, -ret, "Failed to initialize event notifier");
-return NULL;
+goto fail;
 }
 g_source_set_can_recurse(&ctx->source, true);
 aio_set_event_notifier(ctx, &ctx->notifier,
@@ -345,6 +351,9 @@ AioContext *aio_context_new(Error **errp)
 ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
 
 return ctx;
+fail:
+g_source_destroy(&ctx->source);
+return NULL;
 }
 
 void aio_context_ref(AioContext *ctx)
diff --git a/include/block/aio.h b/include/block/aio.h
index cab7c76..9ffecf7 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -429,4 +429,12 @@ static inline bool aio_node_check(AioContext *ctx, bool 
is_external)
 return !is_external || !atomic_read(&ctx->external_disable_cnt);
 }
 
+/**
+ * aio_context_setup:
+ * @ctx: the aio context
+ *
+ * Initialize the aio context.
+ */
+void aio_context_setup(AioContext *ctx, Error **errp);
+
 #endif
-- 
2.5.0




[Qemu-devel] [PULL v2 0/7] Block patches

2015-11-09 Thread Stefan Hajnoczi
The following changes since commit c4a7bf54e588ff2de9fba0898b686156251b2063:

  Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into 
staging (2015-11-07 19:55:15 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 84aa0140dd4f04f5d993f0db14c40da8d3de2706:

  blockdev: acquire AioContext in hmp_commit() (2015-11-09 10:07:10 +)





Denis V. Lunev (1):
  monitor: add missed aio_context_acquire into vm_completion call

Fam Zheng (3):
  aio: Introduce aio_external_disabled
  aio: Introduce aio_context_setup
  aio: Introduce aio-epoll.c

Michael S. Tsirkin (2):
  dataplane: simplify indirect descriptor read
  dataplane: support non-contigious s/g

Stefan Hajnoczi (1):
  blockdev: acquire AioContext in hmp_commit()

 aio-posix.c | 188 +++-
 aio-win32.c |   4 +
 async.c |  13 ++-
 blockdev.c  |  12 ++-
 hw/virtio/dataplane/vring.c |  96 ++
 include/block/aio.h |  24 ++
 monitor.c   |  11 ++-
 7 files changed, 309 insertions(+), 39 deletions(-)

-- 
2.5.0




[Qemu-devel] [PULL v2 1/7] dataplane: simplify indirect descriptor read

2015-11-09 Thread Stefan Hajnoczi
From: "Michael S. Tsirkin" 

Use address_space_read to make sure we handle the case of an indirect
descriptor crossing DIMM boundary correctly.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
Tested-by: Igor Mammedov 
Message-id: 1446047243-3221-1-git-send-email-...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio/dataplane/vring.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 68f1994..0b92fcf 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -257,6 +257,21 @@ static void copy_in_vring_desc(VirtIODevice *vdev,
 host->next = virtio_lduw_p(vdev, &guest->next);
 }
 
+static bool read_vring_desc(VirtIODevice *vdev,
+hwaddr guest,
+struct vring_desc *host)
+{
+if (address_space_read(&address_space_memory, guest, 
MEMTXATTRS_UNSPECIFIED,
+   (uint8_t *)host, sizeof *host)) {
+return false;
+}
+host->addr = virtio_tswap64(vdev, host->addr);
+host->len = virtio_tswap32(vdev, host->len);
+host->flags = virtio_tswap16(vdev, host->flags);
+host->next = virtio_tswap16(vdev, host->next);
+return true;
+}
+
 /* This is stolen from linux/drivers/vhost/vhost.c. */
 static int get_indirect(VirtIODevice *vdev, Vring *vring,
 VirtQueueElement *elem, struct vring_desc *indirect)
@@ -284,23 +299,16 @@ static int get_indirect(VirtIODevice *vdev, Vring *vring,
 }
 
 do {
-struct vring_desc *desc_ptr;
-MemoryRegion *mr;
-
 /* Translate indirect descriptor */
-desc_ptr = vring_map(&mr,
- indirect->addr + found * sizeof(desc),
- sizeof(desc), false);
-if (!desc_ptr) {
-error_report("Failed to map indirect descriptor "
+if (!read_vring_desc(vdev, indirect->addr + found * sizeof(desc),
+ &desc)) {
+error_report("Failed to read indirect descriptor "
  "addr %#" PRIx64 " len %zu",
  (uint64_t)indirect->addr + found * sizeof(desc),
  sizeof(desc));
 vring->broken = true;
 return -EFAULT;
 }
-copy_in_vring_desc(vdev, desc_ptr, &desc);
-memory_region_unref(mr);
 
 /* Ensure descriptor has been loaded before accessing fields */
 barrier(); /* read_barrier_depends(); */
-- 
2.5.0




[Qemu-devel] [PULL v2 2/7] dataplane: support non-contigious s/g

2015-11-09 Thread Stefan Hajnoczi
From: "Michael S. Tsirkin" 

bring_map currently fails if one of the entries it's mapping is
contigious in GPA but not HVA address space.  Introduce a mapped_len
parameter so it can handle this, returning the actual mapped length.

This will still fail if there's no space left in the sg, but luckily max
queue size in use is currently 256, while max sg size is 1024, so we
should be OK even is all entries happen to cross a single DIMM boundary.

Won't work well with very small DIMM sizes, unfortunately:
e.g. this will fail with 4K DIMMs where a single
request might span a large number of DIMMs.

Let's hope these are uncommon - at least we are not breaking things.

Reported-by: Stefan Hajnoczi 
Reported-by: Igor Mammedov 
Signed-off-by: Michael S. Tsirkin 
Tested-by: Igor Mammedov 
Message-id: 1446047243-3221-2-git-send-email-...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio/dataplane/vring.c | 68 ++---
 1 file changed, 46 insertions(+), 22 deletions(-)

diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 0b92fcf..23f667e 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -25,15 +25,30 @@
 
 /* vring_map can be coupled with vring_unmap or (if you still have the
  * value returned in *mr) memory_region_unref.
+ * Returns NULL on failure.
+ * Callers that can handle a partial mapping must supply mapped_len pointer to
+ * get the actual length mapped.
+ * Passing mapped_len == NULL requires either a full mapping or a failure.
  */
-static void *vring_map(MemoryRegion **mr, hwaddr phys, hwaddr len,
+static void *vring_map(MemoryRegion **mr, hwaddr phys,
+   hwaddr len, hwaddr *mapped_len,
bool is_write)
 {
 MemoryRegionSection section = memory_region_find(get_system_memory(), 
phys, len);
+uint64_t size;
 
-if (!section.mr || int128_get64(section.size) < len) {
+if (!section.mr) {
 goto out;
 }
+
+size = int128_get64(section.size);
+assert(size);
+
+/* Passing mapped_len == NULL requires either a full mapping or a failure. 
*/
+if (!mapped_len && size < len) {
+goto out;
+}
+
 if (is_write && section.readonly) {
 goto out;
 }
@@ -46,6 +61,10 @@ static void *vring_map(MemoryRegion **mr, hwaddr phys, 
hwaddr len,
 goto out;
 }
 
+if (mapped_len) {
+*mapped_len = MIN(size, len);
+}
+
 *mr = section.mr;
 return memory_region_get_ram_ptr(section.mr) + 
section.offset_within_region;
 
@@ -78,7 +97,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 addr = virtio_queue_get_desc_addr(vdev, n);
 size = virtio_queue_get_desc_size(vdev, n);
 /* Map the descriptor area as read only */
-ptr = vring_map(&vring->mr_desc, addr, size, false);
+ptr = vring_map(&vring->mr_desc, addr, size, NULL, false);
 if (!ptr) {
 error_report("Failed to map 0x%" HWADDR_PRIx " byte for vring desc "
  "at 0x%" HWADDR_PRIx,
@@ -92,7 +111,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 /* Add the size of the used_event_idx */
 size += sizeof(uint16_t);
 /* Map the driver area as read only */
-ptr = vring_map(&vring->mr_avail, addr, size, false);
+ptr = vring_map(&vring->mr_avail, addr, size, NULL, false);
 if (!ptr) {
 error_report("Failed to map 0x%" HWADDR_PRIx " byte for vring avail "
  "at 0x%" HWADDR_PRIx,
@@ -106,7 +125,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 /* Add the size of the avail_event_idx */
 size += sizeof(uint16_t);
 /* Map the device area as read-write */
-ptr = vring_map(&vring->mr_used, addr, size, true);
+ptr = vring_map(&vring->mr_used, addr, size, NULL, true);
 if (!ptr) {
 error_report("Failed to map 0x%" HWADDR_PRIx " byte for vring used "
  "at 0x%" HWADDR_PRIx,
@@ -206,6 +225,7 @@ static int get_desc(Vring *vring, VirtQueueElement *elem,
 struct iovec *iov;
 hwaddr *addr;
 MemoryRegion *mr;
+hwaddr len;
 
 if (desc->flags & VRING_DESC_F_WRITE) {
 num = &elem->in_num;
@@ -224,26 +244,30 @@ static int get_desc(Vring *vring, VirtQueueElement *elem,
 }
 }
 
-/* Stop for now if there are not enough iovecs available. */
-if (*num >= VIRTQUEUE_MAX_SIZE) {
-error_report("Invalid SG num: %u", *num);
-return -EFAULT;
-}
+while (desc->len) {
+/* Stop for now if there are not enough iovecs available. */
+if (*num >= VIRTQUEUE_MAX_SIZE) {
+error_report("Invalid SG num: %u", *num);
+return -EFAULT;
+}
+
+iov->iov_base = vring_map(&mr, desc->addr, desc->len, &len,
+  desc->flags & VRING_DESC_F_WRITE);
+if (!iov->iov_base) {
+error_report("Failed to map descriptor addr %#" PRIx64 " len %u",
+  

[Qemu-devel] [PULL v2 7/7] blockdev: acquire AioContext in hmp_commit()

2015-11-09 Thread Stefan Hajnoczi
This one slipped through.  Although we acquire AioContext when
committing all devices we don't for just a single device.

AioContext must be acquired before calling bdrv_*() functions to
synchronize access with other threads that may be using the AioContext.

Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
Reviewed-by: Jeff Cody 
Signed-off-by: Stefan Hajnoczi 
---
 blockdev.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 8b8bfa9..97be42f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1120,6 +1120,9 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 if (!strcmp(device, "all")) {
 ret = bdrv_commit_all();
 } else {
+BlockDriverState *bs;
+AioContext *aio_context;
+
 blk = blk_by_name(device);
 if (!blk) {
 monitor_printf(mon, "Device '%s' not found\n", device);
@@ -1129,7 +1132,14 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "Device '%s' has no medium\n", device);
 return;
 }
-ret = bdrv_commit(blk_bs(blk));
+
+bs = blk_bs(blk);
+aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(aio_context);
+
+ret = bdrv_commit(bs);
+
+aio_context_release(aio_context);
 }
 if (ret < 0) {
 monitor_printf(mon, "'commit' error for '%s': %s\n", device,
-- 
2.5.0




[Qemu-devel] [PULL v2 3/7] aio: Introduce aio_external_disabled

2015-11-09 Thread Stefan Hajnoczi
From: Fam Zheng 

This allows AioContext users to check the enable/disable state of
external clients.

Signed-off-by: Fam Zheng 
Message-id: 1446177989-6702-2-git-send-email-f...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index 92efc5e..cab7c76 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -406,6 +406,17 @@ static inline void aio_enable_external(AioContext *ctx)
 }
 
 /**
+ * aio_external_disabled:
+ * @ctx: the aio context
+ *
+ * Return true if the external clients are disabled.
+ */
+static inline bool aio_external_disabled(AioContext *ctx)
+{
+return atomic_read(&ctx->external_disable_cnt);
+}
+
+/**
  * aio_node_check:
  * @ctx: the aio context
  * @is_external: Whether or not the checked node is an external event source.
-- 
2.5.0




[Qemu-devel] [PULL v2 6/7] monitor: add missed aio_context_acquire into vm_completion call

2015-11-09 Thread Stefan Hajnoczi
From: "Denis V. Lunev" 

Signed-off-by: Denis V. Lunev 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
CC: Luiz Capitulino 
CC: Markus Armbruster 
Signed-off-by: Stefan Hajnoczi 
---
 monitor.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/monitor.c b/monitor.c
index 6cd747f..3295840 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3408,13 +3408,18 @@ static void vm_completion(ReadLineState *rs, const char 
*str)
 readline_set_completion_index(rs, len);
 while ((bs = bdrv_next(bs))) {
 SnapshotInfoList *snapshots, *snapshot;
+AioContext *ctx = bdrv_get_aio_context(bs);
+bool ok = false;
 
-if (!bdrv_can_snapshot(bs)) {
-continue;
+aio_context_acquire(ctx);
+if (bdrv_can_snapshot(bs)) {
+ok = bdrv_query_snapshot_info_list(bs, &snapshots, NULL) == 0;
 }
-if (bdrv_query_snapshot_info_list(bs, &snapshots, NULL)) {
+aio_context_release(ctx);
+if (!ok) {
 continue;
 }
+
 snapshot = snapshots;
 while (snapshot) {
 char *completion = snapshot->value->name;
-- 
2.5.0




Re: [Qemu-devel] [PATCH v6 09/33] exec: allow file_ram_alloc to work on file

2015-11-09 Thread Michael S. Tsirkin
On Fri, Oct 30, 2015 at 01:56:03PM +0800, Xiao Guangrong wrote:
> Currently, file_ram_alloc() only works on directory - it creates a file
> under @path and do mmap on it
> 
> This patch tries to allow it to work on file directly, if @path is a
> directory it works as before, otherwise it treats @path as the target
> file then directly allocate memory from it
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  exec.c | 80 
> ++
>  1 file changed, 51 insertions(+), 29 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 3ca7e50..f219010 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>  }
>  
>  #ifdef __linux__
> +static bool path_is_dir(const char *path)
> +{
> +struct stat fs;
> +
> +return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);

This means file doesn't exist is treated as a file.
Can't figure out if that's intentional, should
be documented in any case.

> +}
> +
> +static int open_file_path(RAMBlock *block, const char *path, size_t size)
> +{
> +char *filename;
> +char *sanitized_name;
> +char *c;
> +int fd;
> +
> +if (!path_is_dir(path)) {
> +int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;

Why does this make sense?

> +
> +flags |= O_EXCL;

And why does this makes sense?

> +return open(path, flags);
> +}
> +
> +/* Make name safe to use with mkstemp by replacing '/' with '_'. */
> +sanitized_name = g_strdup(memory_region_name(block->mr));
> +for (c = sanitized_name; *c != '\0'; c++) {
> +if (*c == '/') {
> +*c = '_';
> +}
> +}
> +filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
> +   sanitized_name);
> +g_free(sanitized_name);
> +fd = mkstemp(filename);
> +if (fd >= 0) {
> +unlink(filename);
> +/*
> + * ftruncate is not supported by hugetlbfs in older
> + * hosts, so don't bother bailing out on errors.
> + * If anything goes wrong with it under other filesystems,
> + * mmap will fail.
> + */
> +if (ftruncate(fd, size)) {
> +perror("ftruncate");
> +}
> +}
> +g_free(filename);
> +
> +return fd;
> +}
> +
>  static void *file_ram_alloc(RAMBlock *block,
>  ram_addr_t memory,
>  const char *path,
>  Error **errp)
>  {
> -char *filename;
> -char *sanitized_name;
> -char *c;
>  void *area;
>  int fd;
>  uint64_t pagesize;
> @@ -1211,38 +1257,14 @@ static void *file_ram_alloc(RAMBlock *block,
>  goto error;
>  }
>  
> -/* Make name safe to use with mkstemp by replacing '/' with '_'. */
> -sanitized_name = g_strdup(memory_region_name(block->mr));
> -for (c = sanitized_name; *c != '\0'; c++) {
> -if (*c == '/')
> -*c = '_';
> -}
> -
> -filename = g_strdup_printf("%s/qemu_back_mem.%s.XX", path,
> -   sanitized_name);
> -g_free(sanitized_name);
> +memory = ROUND_UP(memory, pagesize);
>  
> -fd = mkstemp(filename);
> +fd = open_file_path(block, path, memory);
>  if (fd < 0) {
>  error_setg_errno(errp, errno,
>   "unable to create backing store for path %s", path);
> -g_free(filename);
>  goto error;
>  }
> -unlink(filename);
> -g_free(filename);
> -
> -memory = ROUND_UP(memory, pagesize);
> -
> -/*
> - * ftruncate is not supported by hugetlbfs in older
> - * hosts, so don't bother bailing out on errors.
> - * If anything goes wrong with it under other filesystems,
> - * mmap will fail.
> - */
> -if (ftruncate(fd, memory)) {
> -perror("ftruncate");
> -}
>  
>  area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>  if (area == MAP_FAILED) {
> -- 
> 1.8.3.1



[Qemu-devel] [PATCH v4 0/9] block: Fixes for bdrv_drain

2015-11-09 Thread Fam Zheng
v4: Don't miss children's children. [Paolo]

v3: Don't reuse coroutine in bdrv_aio_ioctl. [Stefan]
Recursely call .bdrv_drain callback only. [Stefan, Paolo]
Added Kevin's reviewed-by in other patches.

v2: Add Kevin's reviewed-by in patches 1, 2, 5-7, 9.
Address Kevin's reviewing comments which are:
- Explicit "ret = 0" before out label in patch 3.
- Add missing qemu_aio_unref() in patch 4.
- Recurse into all children in bdrv_drain in patch 8.

Previously bdrv_drain and bdrv_drain_all don't handle ioctl, flush and discard
requests (which are fundamentally the same as read and write requests that
change disk state).  Forgetting such requests leaves us in risk of violating
the invariant that bdrv_drain() callers rely on - all asynchronous requests
must have completed after bdrv_drain returns.

Enrich the tracked request types, and add tracked_request_begin/end pairs to
all three code paths. As a prerequisite, ioctl code is moved into coroutine
too.

The last two patches take care of QED's "need check" timer, so that after
bdrv_drain returns, the driver is in a consistent state.

Fam


Fam Zheng (9):
  block: Add more types for tracked request
  block: Track flush requests
  block: Track discard requests
  iscsi: Emulate commands in iscsi_aio_ioctl as iscsi_ioctl
  block: Add ioctl parameter fields to BlockRequest
  block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both
  block: Drop BlockDriver.bdrv_ioctl
  block: Introduce BlockDriver.bdrv_drain callback
  qed: Implement .bdrv_drain

 block/io.c| 150 +++---
 block/iscsi.c |  73 +++---
 block/qed.c   |  13 
 block/raw-posix.c |   8 ---
 block/raw_bsd.c   |   6 --
 include/block/block.h |  16 +++--
 include/block/block_int.h |  17 +-
 7 files changed, 208 insertions(+), 75 deletions(-)

-- 
2.4.3




[Qemu-devel] [PATCH v4 1/9] block: Add more types for tracked request

2015-11-09 Thread Fam Zheng
We'll track more request types besides read and write, change the
boolean field to an enum.

Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 block/io.c|  9 +
 include/block/block_int.h | 10 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 8dcad3b..793809a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -348,13 +348,14 @@ static void tracked_request_end(BdrvTrackedRequest *req)
 static void tracked_request_begin(BdrvTrackedRequest *req,
   BlockDriverState *bs,
   int64_t offset,
-  unsigned int bytes, bool is_write)
+  unsigned int bytes,
+  enum BdrvTrackedRequestType type)
 {
 *req = (BdrvTrackedRequest){
 .bs = bs,
 .offset = offset,
 .bytes  = bytes,
-.is_write   = is_write,
+.type   = type,
 .co = qemu_coroutine_self(),
 .serialising= false,
 .overlap_offset = offset,
@@ -971,7 +972,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState 
*bs,
 bytes = ROUND_UP(bytes, align);
 }
 
-tracked_request_begin(&req, bs, offset, bytes, false);
+tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
 ret = bdrv_aligned_preadv(bs, &req, offset, bytes, align,
   use_local_qiov ? &local_qiov : qiov,
   flags);
@@ -1292,7 +1293,7 @@ static int coroutine_fn 
bdrv_co_do_pwritev(BlockDriverState *bs,
  * Pad qiov with the read parts and be sure to have a tracked request not
  * only for bdrv_aligned_pwritev, but also for the reads of the RMW cycle.
  */
-tracked_request_begin(&req, bs, offset, bytes, true);
+tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
 
 if (!qiov) {
 ret = bdrv_co_do_zero_pwritev(bs, offset, bytes, flags, &req);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 3ceeb5a..7db9900 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -60,11 +60,19 @@
 
 #define BLOCK_PROBE_BUF_SIZE512
 
+enum BdrvTrackedRequestType {
+BDRV_TRACKED_READ,
+BDRV_TRACKED_WRITE,
+BDRV_TRACKED_FLUSH,
+BDRV_TRACKED_IOCTL,
+BDRV_TRACKED_DISCARD,
+};
+
 typedef struct BdrvTrackedRequest {
 BlockDriverState *bs;
 int64_t offset;
 unsigned int bytes;
-bool is_write;
+enum BdrvTrackedRequestType type;
 
 bool serialising;
 int64_t overlap_offset;
-- 
2.4.3




[Qemu-devel] [PATCH v4 6/9] block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both

2015-11-09 Thread Fam Zheng
Currently all drivers that support .bdrv_aio_ioctl also implement
.bdrv_ioctl redundantly.  To track ioctl requests in block layer it is
easier if we unify the two paths, because we'll need to run it in a
coroutine, as required by tracked_request_begin. While we're at it, use
.bdrv_aio_ioctl plus aio_poll() to emulate bdrv_ioctl().

Signed-off-by: Fam Zheng 
---
 block/io.c | 101 +++--
 1 file changed, 92 insertions(+), 9 deletions(-)

diff --git a/block/io.c b/block/io.c
index 324ae5a..4ecb171 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2528,26 +2528,109 @@ int bdrv_discard(BlockDriverState *bs, int64_t 
sector_num, int nb_sectors)
 return rwco.ret;
 }
 
+typedef struct {
+CoroutineIOCompletion *co;
+QEMUBH *bh;
+} BdrvIoctlCompletionData;
+
+static void bdrv_ioctl_bh_cb(void *opaque)
+{
+BdrvIoctlCompletionData *data = opaque;
+
+bdrv_co_io_em_complete(data->co, -ENOTSUP);
+qemu_bh_delete(data->bh);
+}
+
+static int bdrv_co_do_ioctl(BlockDriverState *bs, int req, void *buf)
+{
+BlockDriver *drv = bs->drv;
+BdrvTrackedRequest tracked_req;
+CoroutineIOCompletion co = {
+.coroutine = qemu_coroutine_self(),
+};
+BlockAIOCB *acb;
+
+tracked_request_begin(&tracked_req, bs, 0, 0, BDRV_TRACKED_IOCTL);
+if (!drv || !drv->bdrv_aio_ioctl) {
+co.ret = -ENOTSUP;
+goto out;
+}
+
+acb = drv->bdrv_aio_ioctl(bs, req, buf, bdrv_co_io_em_complete, &co);
+if (!acb) {
+BdrvIoctlCompletionData *data = g_new(BdrvIoctlCompletionData, 1);
+data->bh = aio_bh_new(bdrv_get_aio_context(bs),
+bdrv_ioctl_bh_cb, data);
+data->co = &co;
+qemu_bh_schedule(data->bh);
+}
+qemu_coroutine_yield();
+out:
+tracked_request_end(&tracked_req);
+return co.ret;
+}
+
+typedef struct {
+BlockDriverState *bs;
+int req;
+void *buf;
+int ret;
+} BdrvIoctlCoData;
+
+static void coroutine_fn bdrv_co_ioctl_entry(void *opaque)
+{
+BdrvIoctlCoData *data = opaque;
+data->ret = bdrv_co_do_ioctl(data->bs, data->req, data->buf);
+}
+
 /* needed for generic scsi interface */
-
 int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
 {
-BlockDriver *drv = bs->drv;
+BdrvIoctlCoData data = {
+.bs = bs,
+.req = req,
+.buf = buf,
+.ret = -EINPROGRESS,
+};
 
-if (drv && drv->bdrv_ioctl)
-return drv->bdrv_ioctl(bs, req, buf);
-return -ENOTSUP;
+if (qemu_in_coroutine()) {
+/* Fast-path if already in coroutine context */
+bdrv_co_ioctl_entry(&data);
+} else {
+Coroutine *co = qemu_coroutine_create(bdrv_co_ioctl_entry);
+qemu_coroutine_enter(co, &data);
+}
+while (data.ret == -EINPROGRESS) {
+aio_poll(bdrv_get_aio_context(bs), true);
+}
+return data.ret;
+}
+
+static void coroutine_fn bdrv_co_aio_ioctl_entry(void *opaque)
+{
+BlockAIOCBCoroutine *acb = opaque;
+acb->req.error = bdrv_co_do_ioctl(acb->common.bs,
+  acb->req.req, acb->req.buf);
+bdrv_co_complete(acb);
 }
 
 BlockAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque)
 {
-BlockDriver *drv = bs->drv;
+BlockAIOCBCoroutine *acb = qemu_aio_get(&bdrv_em_co_aiocb_info,
+bs, cb, opaque);
+Coroutine *co;
 
-if (drv && drv->bdrv_aio_ioctl)
-return drv->bdrv_aio_ioctl(bs, req, buf, cb, opaque);
-return NULL;
+acb->need_bh = true;
+acb->req.error = -EINPROGRESS;
+acb->req.req = req;
+acb->req.buf = buf;
+co = qemu_coroutine_create(bdrv_co_aio_ioctl_entry);
+qemu_coroutine_enter(co, acb);
+
+bdrv_co_maybe_schedule_bh(acb);
+return &acb->common;
 }
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
-- 
2.4.3




[Qemu-devel] [PATCH v4 3/9] block: Track discard requests

2015-11-09 Thread Fam Zheng
Both bdrv_discard and bdrv_aio_discard will call into bdrv_co_discard,
so add tracked_request_begin/end calls around the loop.

Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 block/io.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index a9a49e4..324ae5a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2424,6 +2424,7 @@ static void coroutine_fn bdrv_discard_co_entry(void 
*opaque)
 int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
  int nb_sectors)
 {
+BdrvTrackedRequest req;
 int max_discard, ret;
 
 if (!bs->drv) {
@@ -2446,6 +2447,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 return 0;
 }
 
+tracked_request_begin(&req, bs, sector_num, nb_sectors,
+  BDRV_TRACKED_DISCARD);
 bdrv_set_dirty(bs, sector_num, nb_sectors);
 
 max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
@@ -2479,20 +2482,24 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, 
int64_t sector_num,
 acb = bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors,
 bdrv_co_io_em_complete, &co);
 if (acb == NULL) {
-return -EIO;
+ret = -EIO;
+goto out;
 } else {
 qemu_coroutine_yield();
 ret = co.ret;
 }
 }
 if (ret && ret != -ENOTSUP) {
-return ret;
+goto out;
 }
 
 sector_num += num;
 nb_sectors -= num;
 }
-return 0;
+ret = 0;
+out:
+tracked_request_end(&req);
+return ret;
 }
 
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
-- 
2.4.3




[Qemu-devel] [PATCH v4 2/9] block: Track flush requests

2015-11-09 Thread Fam Zheng
Both bdrv_flush and bdrv_aio_flush eventually call bdrv_co_flush, add
tracked_request_begin and tracked_request_end pair in that function so
that all flush requests are now tracked.

Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 block/io.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index 793809a..a9a49e4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2318,18 +2318,20 @@ static void coroutine_fn bdrv_flush_co_entry(void 
*opaque)
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
 int ret;
+BdrvTrackedRequest req;
 
 if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
 bdrv_is_sg(bs)) {
 return 0;
 }
 
+tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
 /* Write back cached data to the OS even with cache=unsafe */
 BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
 if (bs->drv->bdrv_co_flush_to_os) {
 ret = bs->drv->bdrv_co_flush_to_os(bs);
 if (ret < 0) {
-return ret;
+goto out;
 }
 }
 
@@ -2369,14 +2371,17 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 ret = 0;
 }
 if (ret < 0) {
-return ret;
+goto out;
 }
 
 /* Now flush the underlying protocol.  It will also have BDRV_O_NO_FLUSH
  * in the case of cache=unsafe, so there are no useless flushes.
  */
 flush_parent:
-return bs->file ? bdrv_co_flush(bs->file->bs) : 0;
+ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
+out:
+tracked_request_end(&req);
+return ret;
 }
 
 int bdrv_flush(BlockDriverState *bs)
-- 
2.4.3




Re: [Qemu-devel] [PATCH v6 11/33] hostmem-file: use whole file size if possible

2015-11-09 Thread Michael S. Tsirkin
On Fri, Oct 30, 2015 at 01:56:05PM +0800, Xiao Guangrong wrote:
> Use the whole file size if @size is not specified which is useful
> if we want to directly pass a file to guest
> 
> Signed-off-by: Xiao Guangrong 

Better split these simplifications off from the series.

> ---
>  backends/hostmem-file.c | 48 
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index 9097a57..e1bc9ff 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -9,6 +9,9 @@
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
>   */
> +#include 
> +#include 
> +
>  #include "qemu-common.h"
>  #include "sysemu/hostmem.h"
>  #include "sysemu/sysemu.h"
> @@ -33,20 +36,57 @@ struct HostMemoryBackendFile {
>  char *mem_path;
>  };
>  
> +static uint64_t get_file_size(const char *file)
> +{
> +struct stat stat_buf;
> +uint64_t size = 0;
> +int fd;
> +
> +fd = open(file, O_RDONLY);
> +if (fd < 0) {
> +return 0;
> +}
> +
> +if (stat(file, &stat_buf) < 0) {
> +goto exit;
> +}
> +
> +if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {

You must test S_ISDIR too.

> +goto exit;
> +}
> +
> +size = lseek(fd, 0, SEEK_END);
> +if (size == -1) {
> +size = 0;
> +}
> +exit:
> +close(fd);
> +return size;
> +}
> +
>  static void
>  file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>  {
>  HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
>  
> -if (!backend->size) {
> -error_setg(errp, "can't create backend with size 0");
> -return;
> -}
>  if (!fb->mem_path) {
>  error_setg(errp, "mem-path property not set");
>  return;
>  }
>  
> +if (!backend->size) {
> +/*
> + * use the whole file size if @size is not specified.
> + */
> +backend->size = get_file_size(fb->mem_path);
> +}
> +
> +if (!backend->size) {
> +error_setg(errp, "failed to get file size for %s, can't create "
> + "backend on it", mem_path);
> +return;
> +}
> +
>  backend->force_prealloc = mem_prealloc;
>  memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>   object_get_canonical_path(OBJECT(backend)),
> -- 
> 1.8.3.1



[Qemu-devel] [PATCH v4 9/9] qed: Implement .bdrv_drain

2015-11-09 Thread Fam Zheng
The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
image header after a grace period once metadata update has finished. In
compliance to the bdrv_drain semantics we should make sure it remains
deleted once .bdrv_drain is called.

We cannot reuse qed_need_check_timer_cb because here it doesn't satisfy
the assertion.  Do the "plug" and "flush" calls manually.

Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 block/qed.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/block/qed.c b/block/qed.c
index 5ea05d4..9b88895 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -375,6 +375,18 @@ static void bdrv_qed_attach_aio_context(BlockDriverState 
*bs,
 }
 }
 
+static void bdrv_qed_drain(BlockDriverState *bs)
+{
+BDRVQEDState *s = bs->opaque;
+
+/* Cancel timer and start doing I/O that were meant to happen as if it
+ * fired, that way we get bdrv_drain() taking care of the ongoing requests
+ * correctly. */
+qed_cancel_need_check_timer(s);
+qed_plug_allocating_write_reqs(s);
+bdrv_aio_flush(s->bs, qed_clear_need_check, s);
+}
+
 static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
@@ -1676,6 +1688,7 @@ static BlockDriver bdrv_qed = {
 .bdrv_check   = bdrv_qed_check,
 .bdrv_detach_aio_context  = bdrv_qed_detach_aio_context,
 .bdrv_attach_aio_context  = bdrv_qed_attach_aio_context,
+.bdrv_drain   = bdrv_qed_drain,
 };
 
 static void bdrv_qed_init(void)
-- 
2.4.3




[Qemu-devel] [PATCH v4 7/9] block: Drop BlockDriver.bdrv_ioctl

2015-11-09 Thread Fam Zheng
Now the callback is not used any more, drop the field along with all
implementations in block drivers, which are iscsi and raw.

Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 block/iscsi.c | 33 -
 block/raw-posix.c |  8 
 block/raw_bsd.c   |  6 --
 include/block/block_int.h |  1 -
 4 files changed, 48 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index b3fa0a0..002d29b 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -845,38 +845,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 return &acb->common;
 }
 
-static void ioctl_cb(void *opaque, int status)
-{
-int *p_status = opaque;
-*p_status = status;
-}
-
-static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
-{
-IscsiLun *iscsilun = bs->opaque;
-int status;
-
-switch (req) {
-case SG_GET_VERSION_NUM:
-*(int *)buf = 3;
-break;
-case SG_GET_SCSI_ID:
-((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type;
-break;
-case SG_IO:
-status = -EINPROGRESS;
-iscsi_aio_ioctl(bs, req, buf, ioctl_cb, &status);
-
-while (status == -EINPROGRESS) {
-aio_poll(iscsilun->aio_context, true);
-}
-
-return 0;
-default:
-return -1;
-}
-return 0;
-}
 #endif
 
 static int64_t
@@ -1807,7 +1775,6 @@ static BlockDriver bdrv_iscsi = {
 .bdrv_co_flush_to_disk = iscsi_co_flush,
 
 #ifdef __linux__
-.bdrv_ioctl   = iscsi_ioctl,
 .bdrv_aio_ioctl   = iscsi_aio_ioctl,
 #endif
 
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 918c756..aec9ec6 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -2175,12 +2175,6 @@ static int hdev_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 #if defined(__linux__)
-static int hdev_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
-{
-BDRVRawState *s = bs->opaque;
-
-return ioctl(s->fd, req, buf);
-}
 
 static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
 unsigned long int req, void *buf,
@@ -2338,7 +2332,6 @@ static BlockDriver bdrv_host_device = {
 
 /* generic scsi device */
 #ifdef __linux__
-.bdrv_ioctl = hdev_ioctl,
 .bdrv_aio_ioctl = hdev_aio_ioctl,
 #endif
 };
@@ -2471,7 +2464,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_lock_medium   = cdrom_lock_medium,
 
 /* generic scsi device */
-.bdrv_ioctl = hdev_ioctl,
 .bdrv_aio_ioctl = hdev_aio_ioctl,
 };
 #endif /* __linux__ */
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 0aded31..915d6fd 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -169,11 +169,6 @@ static void raw_lock_medium(BlockDriverState *bs, bool 
locked)
 bdrv_lock_medium(bs->file->bs, locked);
 }
 
-static int raw_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
-{
-return bdrv_ioctl(bs->file->bs, req, buf);
-}
-
 static BlockAIOCB *raw_aio_ioctl(BlockDriverState *bs,
  unsigned long int req, void *buf,
  BlockCompletionFunc *cb,
@@ -262,7 +257,6 @@ BlockDriver bdrv_raw = {
 .bdrv_media_changed   = &raw_media_changed,
 .bdrv_eject   = &raw_eject,
 .bdrv_lock_medium = &raw_lock_medium,
-.bdrv_ioctl   = &raw_ioctl,
 .bdrv_aio_ioctl   = &raw_aio_ioctl,
 .create_opts  = &raw_create_opts,
 .bdrv_has_zero_init   = &raw_has_zero_init
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7db9900..550ce18 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -227,7 +227,6 @@ struct BlockDriver {
 void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
 
 /* to control generic scsi devices */
-int (*bdrv_ioctl)(BlockDriverState *bs, unsigned long int req, void *buf);
 BlockAIOCB *(*bdrv_aio_ioctl)(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque);
-- 
2.4.3




[Qemu-devel] [PATCH v4 8/9] block: Introduce BlockDriver.bdrv_drain callback

2015-11-09 Thread Fam Zheng
Drivers can have internal request sources that generate IO, like the
need_check_timer in QED. Since we want quiesced periods that contain
nested event loops in block layer, we need to have a way to disable such
event sources.

Block drivers must implement the "bdrv_drain" callback if it has any
internal sources that can generate I/O activity, like a timer or a
worker thread (even in a library) that can schedule QEMUBH in an
asynchronous callback.

Update the comments of bdrv_drain and bdrv_drained_begin accordingly.

Like bdrv_requests_pending(), we should consider all the children of bs.
Before, the while loop just works, as bdrv_requests_pending() already
tracks its children; now we mustn't miss the callback, so recurse down
explicitly.

Signed-off-by: Fam Zheng 
---
 block/io.c| 16 +++-
 include/block/block_int.h |  6 ++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 4ecb171..adc1eab 100644
--- a/block/io.c
+++ b/block/io.c
@@ -237,8 +237,21 @@ bool bdrv_requests_pending(BlockDriverState *bs)
 return false;
 }
 
+static void bdrv_drain_recurse(BlockDriverState *bs)
+{
+BdrvChild *child;
+
+if (bs->drv && bs->drv->bdrv_drain) {
+bs->drv->bdrv_drain(bs);
+}
+QLIST_FOREACH(child, &bs->children, next) {
+bdrv_drain_recurse(child->bs);
+}
+}
+
 /*
- * Wait for pending requests to complete on a single BlockDriverState subtree
+ * Wait for pending requests to complete on a single BlockDriverState subtree,
+ * and suspend block driver's internal I/O until next request arrives.
  *
  * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
  * AioContext.
@@ -251,6 +264,7 @@ void bdrv_drain(BlockDriverState *bs)
 {
 bool busy = true;
 
+bdrv_drain_recurse(bs);
 while (busy) {
 /* Keep iterating */
  bdrv_flush_io_queue(bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 550ce18..4a9f8ff 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -295,6 +295,12 @@ struct BlockDriver {
  */
 int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+/**
+ * Drain and stop any internal sources of requests in the driver, and
+ * remain so until next I/O callback (e.g. bdrv_co_writev) is called.
+ */
+void (*bdrv_drain)(BlockDriverState *bs);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.4.3




[Qemu-devel] [PATCH v4 4/9] iscsi: Emulate commands in iscsi_aio_ioctl as iscsi_ioctl

2015-11-09 Thread Fam Zheng
iscsi_ioctl emulates SG_GET_VERSION_NUM and SG_GET_SCSI_ID. Now that
bdrv_ioctl() will be emulated with .bdrv_aio_ioctl, replicate the logic
into iscsi_aio_ioctl to make them consistent.

Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 block/iscsi.c | 40 ++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 9a628b7..b3fa0a0 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -96,6 +96,7 @@ typedef struct IscsiAIOCB {
 int status;
 int64_t sector_num;
 int nb_sectors;
+int ret;
 #ifdef __linux__
 sg_io_hdr_t *ioh;
 #endif
@@ -726,6 +727,38 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status,
 iscsi_schedule_bh(acb);
 }
 
+static void iscsi_ioctl_bh_completion(void *opaque)
+{
+IscsiAIOCB *acb = opaque;
+
+qemu_bh_delete(acb->bh);
+acb->common.cb(acb->common.opaque, acb->ret);
+qemu_aio_unref(acb);
+}
+
+static void iscsi_ioctl_handle_emulated(IscsiAIOCB *acb, int req, void *buf)
+{
+BlockDriverState *bs = acb->common.bs;
+IscsiLun *iscsilun = bs->opaque;
+int ret = 0;
+
+switch (req) {
+case SG_GET_VERSION_NUM:
+*(int *)buf = 3;
+break;
+case SG_GET_SCSI_ID:
+((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type;
+break;
+default:
+ret = -EINVAL;
+}
+assert(!acb->bh);
+acb->bh = aio_bh_new(bdrv_get_aio_context(bs),
+ iscsi_ioctl_bh_completion, acb);
+acb->ret = ret;
+qemu_bh_schedule(acb->bh);
+}
+
 static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque)
@@ -735,8 +768,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 struct iscsi_data data;
 IscsiAIOCB *acb;
 
-assert(req == SG_IO);
-
 acb = qemu_aio_get(&iscsi_aiocb_info, bs, cb, opaque);
 
 acb->iscsilun = iscsilun;
@@ -745,6 +776,11 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 acb->buf = NULL;
 acb->ioh = buf;
 
+if (req != SG_IO) {
+iscsi_ioctl_handle_emulated(acb, req, buf);
+return &acb->common;
+}
+
 acb->task = malloc(sizeof(struct scsi_task));
 if (acb->task == NULL) {
 error_report("iSCSI: Failed to allocate task for scsi command. %s",
-- 
2.4.3




Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 11:02, Peter Maydell wrote:
> On 9 November 2015 at 09:10, Paolo Bonzini  wrote:
>>
>>
>> On 08/11/2015 23:55, Peter Maydell wrote:
>>> So the good news is that on mainline this doesn't happen any more.
>>> The bad news is that something weird is going on such that git
>>> bisect doesn't give helpful answers. Specifically if I start by
>>> compiling older versions and work forwards, then
>>>  0fd7e09 kvmclock: add a new function to update env->tsc.
>>> shows the bug, and
>>>  6388acc Revert "Introduce cpu_clean_all_dirty"
>>> does not. (And I've got to that commit both via a git-bisect
>>> and by a second round of manually trying to identify the commit,
>>> so it's consistent about where it changes behaviour.)
>>> However that makes no sense because that revert commit
>>> is just removing unused code. And then if I go backwards again
>>> to 0fd7e09 the bug doesn't repro there.
>>
>> Even 0fd7e09 does not change behavior unless you use KVM (which you
>> obviously don't do under Mac OS X).  So if you go backwards to 0fd7e09^
>> it shouldn't reproduce there either.
>>
>> What is the known bad SHA1?
> 
> 2b5a79f is definitely bad even rebuilt from clean. I'm going

Hmm, so the list is pretty short:

-
Eduardo Habkost (3):
  pc: Set hw_version on all machine classes
  osdep: Rename qemu_{get, set}_version() to qemu_{, set_}hw_version()
  megasas: Use qemu_hw_version() instead of QEMU_VERSION

Fam Zheng (1):
  scripts/text2pod.pl: Escape left brace

Igor Mammedov (1):
  file_ram_alloc: propagate error to caller instead of terminating QEMU

John Snow (2):
  configure: disallow ccache during compile tests
  configure: disable FORTIFY_SOURCE under clang

Paolo Bonzini (4):
  target-i386: fix pcmpxstrx equal-ordered (strstr) mode
  ioport: do not use CPU_LOG_IOPORT
  qemu-log: remove -d ioport
  memory: call begin, log_start and commit when registering a new listener

Pavel Fedin (1):
  backends/hostmem-file: Allow to specify full pathname for backing file

Stefan Weil (1):
  cpu-exec: Fix compiler warning (-Werror=clobbered)
-

The only patches that could possibly fix the bug are:

 target-i386: fix pcmpxstrx equal-ordered (strstr) mode
 memory: call begin, log_start and commit when registering a new listener
 cpu-exec: Fix compiler warning (-Werror=clobbered)

It's probably the second.  The first has been there forever, while
the last doesn't have any effect under clang.

Paolo



[Qemu-devel] [PATCH v4 5/9] block: Add ioctl parameter fields to BlockRequest

2015-11-09 Thread Fam Zheng
The two fields that will be used by ioctl handling code later are added
as union, because it's used exclusively by ioctl code which dosn't need
the four fields in the other struct of the union.

Signed-off-by: Fam Zheng 
Reviewed-by: Kevin Wolf 
---
 include/block/block.h | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 610db92..c8b40b7 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -335,10 +335,18 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);
 
 typedef struct BlockRequest {
 /* Fields to be filled by multiwrite caller */
-int64_t sector;
-int nb_sectors;
-int flags;
-QEMUIOVector *qiov;
+union {
+struct {
+int64_t sector;
+int nb_sectors;
+int flags;
+QEMUIOVector *qiov;
+};
+struct {
+int req;
+void *buf;
+};
+};
 BlockCompletionFunc *cb;
 void *opaque;
 
-- 
2.4.3




Re: [Qemu-devel] [PATCH] hw/arm/virt: error_report cleanups

2015-11-09 Thread Markus Armbruster
Peter Maydell  writes:

> On 9 November 2015 at 07:44, Markus Armbruster  wrote:
>> For consistency, error messages should be a phrase, not a full sentence,
>> let alone a paraphraph.
>
> This is in direct conflict with wanting them to be actually useful
> to users :-(

I appreciate your drive for useful error messages.  Judging from the
error messages we got, it's a rare thing.

Let me rephrase.  The error message proper (the thing emitted by
error_report()) should be a phrase, and it should be short and to the
point.  It can be followed by hints.  Compare:

qemu-system-arm: Unable to determine GIC version supported by host. KVM 
acceleration is probably not supported.

and

qemu-system-arm: Unable to determine GIC version supported by host
KVM acceleration is probably not supported

I prefer the latter.  The error message proper is short and to the
point.  The hint points to the most probable cause.  Sensible line
lengths.

By the way, the error.h API supports this message + hints convention
since commit 50b7b00.



Re: [Qemu-devel] [PATCH v4 8/9] block: Introduce BlockDriver.bdrv_drain callback

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 11:16, Fam Zheng wrote:
> Drivers can have internal request sources that generate IO, like the
> need_check_timer in QED. Since we want quiesced periods that contain
> nested event loops in block layer, we need to have a way to disable such
> event sources.
> 
> Block drivers must implement the "bdrv_drain" callback if it has any
> internal sources that can generate I/O activity, like a timer or a
> worker thread (even in a library) that can schedule QEMUBH in an
> asynchronous callback.
> 
> Update the comments of bdrv_drain and bdrv_drained_begin accordingly.
> 
> Like bdrv_requests_pending(), we should consider all the children of bs.
> Before, the while loop just works, as bdrv_requests_pending() already
> tracks its children; now we mustn't miss the callback, so recurse down
> explicitly.

Reviewed-by: Paolo Bonzini 

> Signed-off-by: Fam Zheng 
> ---
>  block/io.c| 16 +++-
>  include/block/block_int.h |  6 ++
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 4ecb171..adc1eab 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -237,8 +237,21 @@ bool bdrv_requests_pending(BlockDriverState *bs)
>  return false;
>  }
>  
> +static void bdrv_drain_recurse(BlockDriverState *bs)
> +{
> +BdrvChild *child;
> +
> +if (bs->drv && bs->drv->bdrv_drain) {
> +bs->drv->bdrv_drain(bs);
> +}
> +QLIST_FOREACH(child, &bs->children, next) {
> +bdrv_drain_recurse(child->bs);
> +}
> +}
> +
>  /*
> - * Wait for pending requests to complete on a single BlockDriverState subtree
> + * Wait for pending requests to complete on a single BlockDriverState 
> subtree,
> + * and suspend block driver's internal I/O until next request arrives.
>   *
>   * Note that unlike bdrv_drain_all(), the caller must hold the 
> BlockDriverState
>   * AioContext.
> @@ -251,6 +264,7 @@ void bdrv_drain(BlockDriverState *bs)
>  {
>  bool busy = true;
>  
> +bdrv_drain_recurse(bs);
>  while (busy) {
>  /* Keep iterating */
>   bdrv_flush_io_queue(bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 550ce18..4a9f8ff 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -295,6 +295,12 @@ struct BlockDriver {
>   */
>  int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
>  
> +/**
> + * Drain and stop any internal sources of requests in the driver, and
> + * remain so until next I/O callback (e.g. bdrv_co_writev) is called.
> + */
> +void (*bdrv_drain)(BlockDriverState *bs);
> +
>  QLIST_ENTRY(BlockDriver) list;
>  };
>  
> 



Re: [Qemu-devel] [PATCH 02/16] exec.c: Allow target CPUs to define multiple AddressSpaces

2015-11-09 Thread Paolo Bonzini


On 05/11/2015 19:15, Peter Maydell wrote:
> Allow multiple calls to cpu_address_space_init(); each
> call adds an entry to the cpu->ases array at the specified
> index. It is up to the target-specific CPU code to actually use
> these extra address spaces.
> 
> Since this multiple AddressSpace support won't work with
> KVM, add an assertion to avoid confusing failures.

Actually it won't work _now_ with KVM, but it could.

It would be a good idea to map the multiple CPU AddressSpaces to KVM's
own multiple address spaces.  It's possible to modify i386 to do this,
using address space 0 for normal operation and address space 1 for SMM,
just like KVM.

More on this as I reply to the remainder of the series...

Paolo

> Signed-off-by: Peter Maydell 



Re: [Qemu-devel] [PATCH 02/16] exec.c: Allow target CPUs to define multiple AddressSpaces

2015-11-09 Thread Paolo Bonzini


On 06/11/2015 14:34, Peter Maydell wrote:
>> > IIUC, g_renew may move the entire cpu_ases area. The internals of
>> > memory_listener_register (called below) seem to put away the pointers to 
>> > listeners
>> > so a renew+move would leave invalid pointers to listeners in memory.c 
>> > wouldn't it?
>> >
>> > There are various ways of solving this, (e.g dynamic allocation of the 
>> > listener,
>> > static allocation of the cpu_ases, invalidate all listeners and restore 
>> > them after
>> > each as init and more). I'm sure you'll figure something out.
> Oops, yes, you're right.
> 
> Maybe we should just have the target CPU say in advance what the
> maximum number of AddressSpaces it will have is -- my expectation
> is that this will be (a) small (b) known in advance anyway.

I agree.  Or even just allocate room statically, for the largest amount
that all targets in QEMU use.  My expectation is that this will be 2. :)

Paolo



Re: [Qemu-devel] [PATCH v6 07/33] util: introduce qemu_file_get_page_size()

2015-11-09 Thread Michael S. Tsirkin
On Fri, Oct 30, 2015 at 01:56:01PM +0800, Xiao Guangrong wrote:
> There are three places use the some logic to get the page size on
> the file path or file fd
> 
> This patch introduces qemu_file_get_page_size() to unify the code
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  include/qemu/osdep.h |  1 +
>  target-ppc/kvm.c | 21 +++--
>  util/oslib-posix.c   | 16 
>  util/oslib-win32.c   |  5 +
>  4 files changed, 25 insertions(+), 18 deletions(-)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index b568424..d4dde02 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -302,4 +302,5 @@ int qemu_read_password(char *buf, int buf_size);
>   */
>  pid_t qemu_fork(Error **errp);
>  
> +size_t qemu_file_get_page_size(const char *mem_path);
>  #endif
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index ac70f08..c661f1c 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -308,28 +308,13 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct 
> kvm_ppc_smmu_info *info)
>  
>  static long gethugepagesize(const char *mem_path)
>  {
> -struct statfs fs;
> -int ret;
> -
> -do {
> -ret = statfs(mem_path, &fs);
> -} while (ret != 0 && errno == EINTR);
> +long size = qemu_file_get_page_size(mem_path);
>  
> -if (ret != 0) {
> -fprintf(stderr, "Couldn't statfs() memory path: %s\n",
> -strerror(errno));
> +if (!size) {
>  exit(1);
>  }
>  
> -#define HUGETLBFS_MAGIC   0x958458f6
> -
> -if (fs.f_type != HUGETLBFS_MAGIC) {
> -/* Explicit mempath, but it's ordinary pages */
> -return getpagesize();
> -}
> -
> -/* It's hugepage, return the huge page size */
> -return fs.f_bsize;
> +return size;
>  }
>  
>  static int find_max_supported_pagesize(Object *obj, void *opaque)
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 914cef5..ad94c5a 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -360,6 +360,22 @@ static size_t fd_getpagesize(int fd)
>  return getpagesize();
>  }
>  
> +size_t qemu_file_get_page_size(const char *path)
> +{
> +size_t size = 0;
> +int fd = qemu_open(path, O_RDONLY);
> +
> +if (fd < 0) {
> +fprintf(stderr, "Could not open %s.\n", path);
> +goto exit;
> +}
> +
> +size = fd_getpagesize(fd);
> +qemu_close(fd);
> +exit:
> +return size;
> +}
> +
>  void os_mem_prealloc(int fd, char *area, size_t memory)
>  {
>  int ret;

So this is opening the file for the sole purpose of
doing the fstatfs on it. Seems strange, just do statfs instead.
In fact, maybe we want statfs_getpagesize.

> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index 09f9e98..a18aa87 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -462,6 +462,11 @@ size_t getpagesize(void)
>  return system_info.dwPageSize;
>  }
>  
> +size_t qemu_file_get_page_size(const char *path)
> +{
> +return getpagesize();
> +}
> +
>  void os_mem_prealloc(int fd, char *area, size_t memory)
>  {
>  int i;

And why is this needed on win32?

> -- 
> 1.8.3.1



Re: [Qemu-devel] [PATCH v6 08/33] exec: allow memory to be allocated from any kind of path

2015-11-09 Thread Michael S. Tsirkin
On Fri, Oct 30, 2015 at 01:56:02PM +0800, Xiao Guangrong wrote:
> Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
> of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
> locates at DAX enabled filesystem
> 
> So this patch let it work on any kind of path
> 
> Signed-off-by: Xiao Guangrong 

So this allows regular memory to be specified directly.
This needs to be split out and merged separately
from acpi/nvdimm bits.

Alternatively, if it's possible to use nvdimm with DAX fs
(similar to hugetlbfs), leave these patches off for now.


> ---
>  exec.c | 56 +---
>  1 file changed, 17 insertions(+), 39 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 8af2570..3ca7e50 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1174,32 +1174,6 @@ void qemu_mutex_unlock_ramlist(void)
>  }
>  
>  #ifdef __linux__
> -
> -#include 
> -
> -#define HUGETLBFS_MAGIC   0x958458f6
> -
> -static long gethugepagesize(const char *path, Error **errp)
> -{
> -struct statfs fs;
> -int ret;
> -
> -do {
> -ret = statfs(path, &fs);
> -} while (ret != 0 && errno == EINTR);
> -
> -if (ret != 0) {
> -error_setg_errno(errp, errno, "failed to get page size of file %s",
> - path);
> -return 0;
> -}
> -
> -if (fs.f_type != HUGETLBFS_MAGIC)
> -fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> -
> -return fs.f_bsize;
> -}
> -
>  static void *file_ram_alloc(RAMBlock *block,
>  ram_addr_t memory,
>  const char *path,
> @@ -1210,20 +1184,24 @@ static void *file_ram_alloc(RAMBlock *block,
>  char *c;
>  void *area;
>  int fd;
> -uint64_t hpagesize;
> -Error *local_err = NULL;
> +uint64_t pagesize;
>  
> -hpagesize = gethugepagesize(path, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +pagesize = qemu_file_get_page_size(path);
> +if (!pagesize) {
> +error_setg(errp, "can't get page size for %s", path);
>  goto error;
>  }
> -block->mr->align = hpagesize;
>  
> -if (memory < hpagesize) {
> +if (pagesize == getpagesize()) {
> +fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
> +}
> +
> +block->mr->align = pagesize;
> +
> +if (memory < pagesize) {
>  error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
> -   "or larger than huge page size 0x%" PRIx64,
> -   memory, hpagesize);
> +   "or larger than page size 0x%" PRIx64,
> +   memory, pagesize);
>  goto error;
>  }
>  
> @@ -1247,14 +1225,14 @@ static void *file_ram_alloc(RAMBlock *block,
>  fd = mkstemp(filename);
>  if (fd < 0) {
>  error_setg_errno(errp, errno,
> - "unable to create backing store for hugepages");
> + "unable to create backing store for path %s", path);
>  g_free(filename);
>  goto error;
>  }
>  unlink(filename);
>  g_free(filename);

Looks like we are still calling mkstemp/unlink here.
How does this work?

>  
> -memory = ROUND_UP(memory, hpagesize);
> +memory = ROUND_UP(memory, pagesize);
>  
>  /*
>   * ftruncate is not supported by hugetlbfs in older
> @@ -1266,10 +1244,10 @@ static void *file_ram_alloc(RAMBlock *block,
>  perror("ftruncate");
>  }
>  
> -area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
> +area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>  if (area == MAP_FAILED) {
>  error_setg_errno(errp, errno,
> - "unable to map backing store for hugepages");
> + "unable to map backing store for path %s", path);
>  close(fd);
>  goto error;
>  }
> -- 
> 1.8.3.1



Re: [Qemu-devel] [PATCH v6 12/33] pc-dimm: remove DEFAULT_PC_DIMMSIZE

2015-11-09 Thread Michael S. Tsirkin
On Fri, Oct 30, 2015 at 01:56:06PM +0800, Xiao Guangrong wrote:
> It's not used any more
> 
> Signed-off-by: Xiao Guangrong 

You should leave the renames and cleanups off for later.
This patchset is large enough as it is.

> ---
>  include/hw/mem/pc-dimm.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
> index d83bf30..11a8937 100644
> --- a/include/hw/mem/pc-dimm.h
> +++ b/include/hw/mem/pc-dimm.h
> @@ -20,8 +20,6 @@
>  #include "sysemu/hostmem.h"
>  #include "hw/qdev.h"
>  
> -#define DEFAULT_PC_DIMMSIZE (1024*1024*1024)
> -
>  #define TYPE_PC_DIMM "pc-dimm"
>  #define PC_DIMM(obj) \
>  OBJECT_CHECK(PCDIMMDevice, (obj), TYPE_PC_DIMM)
> -- 
> 1.8.3.1



Re: [Qemu-devel] [PATCH 03/16] tlb_set_page_with_attrs: Take argument specifying AddressSpace to use

2015-11-09 Thread Paolo Bonzini


On 05/11/2015 19:15, Peter Maydell wrote:
> Add an argument to tlb_set_page_with_attrs which allows the target CPU code
> to tell the core code which AddressSpace to use.
> 
> The AddressSpace is specified by the index into the array of ASes which
> were registered with cpu_address_space_init().
> 
> Signed-off-by: Peter Maydell 

Can it be deduced from the attributes instead?  Basically, you would have

   int cpu_get_asidx(MemTxAttrs attrs);

in cpu.h, which is called by tlb_set_page_with_attrs.
cpu_get_phys_page_asidx_debug could also be replaced by
cpu_get_phys_page_attrs_debug.

Paolo

> ---
>  cputlb.c|  6 +++---
>  exec.c  |  7 ---
>  include/exec/exec-all.h | 33 ++---
>  target-arm/helper.c |  2 +-
>  target-i386/helper.c|  2 +-
>  5 files changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/cputlb.c b/cputlb.c
> index bf1d50a..e753083 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -343,7 +343,7 @@ static void tlb_add_large_page(CPUArchState *env, 
> target_ulong vaddr,
>   * Called from TCG-generated code, which is under an RCU read-side
>   * critical section.
>   */
> -void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> +void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr, int asidx,
>   hwaddr paddr, MemTxAttrs attrs, int prot,
>   int mmu_idx, target_ulong size)
>  {
> @@ -363,7 +363,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong 
> vaddr,
>  }
>  
>  sz = size;
> -section = address_space_translate_for_iotlb(cpu, paddr, &xlat, &sz);
> +section = address_space_translate_for_iotlb(cpu, asidx, paddr, &xlat, 
> &sz);
>  assert(sz >= TARGET_PAGE_SIZE);
>  
>  #if defined(DEBUG_TLB)
> @@ -433,7 +433,7 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>hwaddr paddr, int prot,
>int mmu_idx, target_ulong size)
>  {
> -tlb_set_page_with_attrs(cpu, vaddr, paddr, MEMTXATTRS_UNSPECIFIED,
> +tlb_set_page_with_attrs(cpu, vaddr, 0, paddr, MEMTXATTRS_UNSPECIFIED,
>  prot, mmu_idx, size);
>  }
>  
> diff --git a/exec.c b/exec.c
> index 6a2a694..92e76fa 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -445,12 +445,13 @@ MemoryRegion *address_space_translate(AddressSpace *as, 
> hwaddr addr,
>  
>  /* Called from RCU critical section */
>  MemoryRegionSection *
> -address_space_translate_for_iotlb(CPUState *cpu, hwaddr addr,
> +address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
>hwaddr *xlat, hwaddr *plen)
>  {
>  MemoryRegionSection *section;
> -section = 
> address_space_translate_internal(cpu->cpu_ases[0].memory_dispatch,
> -   addr, xlat, plen, false);
> +AddressSpaceDispatch *d = cpu->cpu_ases[asidx].memory_dispatch;
> +
> +section = address_space_translate_internal(d, addr, xlat, plen, false);
>  
>  assert(!section->mr->iommu_ops);
>  return section;
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 90130ca..472d0fc 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -144,7 +144,34 @@ void tlb_flush_by_mmuidx(CPUState *cpu, ...);
>  void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>hwaddr paddr, int prot,
>int mmu_idx, target_ulong size);
> -void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
> +/**
> + * tlb_set_page_with_attrs:
> + * @cpu: CPU to add this TLB entry for
> + * @vaddr: virtual address of page to add entry for
> + * @asidx: index of the AddressSpace the physical address is in
> + * @paddr: physical address of the page
> + * @attrs: memory transaction attributes
> + * @prot: access permissions (PAGE_READ/PAGE_WRITE/PAGE_EXEC bits)
> + * @mmu_idx: MMU index to insert TLB entry for
> + * @size: size of the page in bytes
> + *
> + * Add an entry to this CPU's TLB (a mapping from virtual address
> + * @vaddr to physical address @paddr) with the specified memory
> + * transaction attributes. This is generally called by the target CPU
> + * specific code after it has been called through the tlb_fill()
> + * entry point and performed a successful page table walk to find
> + * the physical address and attributes for the virtual address
> + * which provoked the TLB miss.
> + *
> + * At most one entry for a given virtual address is permitted. Only a
> + * single TARGET_PAGE_SIZE region is mapped; the supplied @size is only
> + * used by tlb_flush_page.
> + *
> + * @asidx is the index of the AddressSpace in the cpu->ases array;
> + * if the CPU does not support multiple AddressSpaces then it will
> + * always be zero.
> + */
> +void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr, int asidx,
>   hwaddr paddr, MemTxAttrs attrs,
>   int prot, int mm

Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever

2015-11-09 Thread Alexandre DERUMIER
adding to ceph.conf

[client]
rbd_non_blocking_aio = false


fix the problem for me (with rbd_cache=false)


(@cc jdur...@redhat.com)



- Mail original -
De: "Denis V. Lunev" 
À: "aderumier" , "ceph-devel" 
, "qemu-devel" 
Envoyé: Lundi 9 Novembre 2015 08:22:34
Objet: Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop 
is hanging forever

On 11/09/2015 10:19 AM, Denis V. Lunev wrote: 
> On 11/09/2015 06:10 AM, Alexandre DERUMIER wrote: 
>> Hi, 
>> 
>> with qemu (2.4.1), if I do an internal snapshot of an rbd device, 
>> then I pause the vm with vm_stop, 
>> 
>> the qemu process is hanging forever 
>> 
>> 
>> monitor commands to reproduce: 
>> 
>> 
>> # snapshot_blkdev_internal drive-virtio0 yoursnapname 
>> # stop 
>> 
>> 
>> 
>> 
>> I don't see this with qcow2 or sheepdog block driver for example. 
>> 
>> 
>> Regards, 
>> 
>> Alexandre 
>> 
> this could look like the problem I have recenty trying to 
> fix with dataplane enabled. Patch series is named as 
> 
> [PATCH for 2.5 v6 0/10] dataplane snapshot fixes 
> 
> Den 

anyway, even if above will not help, can you collect gdb 
traces from all threads in QEMU process. May be I'll be 
able to give a hit. 

Den 



Re: [Qemu-devel] [PATCH 08/16] exec.c: Have one io_mem_watch per AddressSpace

2015-11-09 Thread Paolo Bonzini


On 05/11/2015 19:15, Peter Maydell wrote:
> The io_mem_watch MemoryRegion's read and write callbacks pass the
> accesses through to an underlying address space. Now that that
> might be something other than address_space_memory, we need to
> pass the correct AddressSpace in via the opaque pointer. That
> means we need to have one io_mem_watch MemoryRegion per address
> space, rather than sharing one between all address spaces.
> 
> This should also fix gdbstub watchpoints in x86 SMRAM, which would
> previously not have worked correctly.
> 
> Signed-off-by: Peter Maydell 
> ---
>  exec.c| 40 +++-
>  include/exec/memory.h |  2 ++
>  2 files changed, 25 insertions(+), 17 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index bc6ab8a..9998fa0 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -163,8 +163,6 @@ static void io_mem_init(void);
>  static void memory_map_init(void);
>  static void tcg_commit(MemoryListener *listener);
>  
> -static MemoryRegion io_mem_watch;
> -
>  /**
>   * CPUAddressSpace: all the information a CPU needs about an AddressSpace
>   * @cpu: the CPU whose AddressSpace this is
> @@ -334,12 +332,6 @@ static MemoryRegionSection *phys_page_find(PhysPageEntry 
> lp, hwaddr addr,
>  }
>  }
>  
> -bool memory_region_is_unassigned(MemoryRegion *mr)
> -{
> -return mr != &io_mem_rom && mr != &io_mem_notdirty && !mr->rom_device
> -&& mr != &io_mem_watch;
> -}
> -
>  /* Called from RCU critical section */
>  static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch 
> *d,
>  hwaddr addr,
> @@ -2045,17 +2037,18 @@ static MemTxResult watch_mem_read(void *opaque, 
> hwaddr addr, uint64_t *pdata,
>  {
>  MemTxResult res;
>  uint64_t data;
> +AddressSpace *as = opaque;
>  
>  check_watchpoint(addr & ~TARGET_PAGE_MASK, size, attrs, BP_MEM_READ);
>  switch (size) {
>  case 1:
> -data = address_space_ldub(&address_space_memory, addr, attrs, &res);
> +data = address_space_ldub(as, addr, attrs, &res);
>  break;
>  case 2:
> -data = address_space_lduw(&address_space_memory, addr, attrs, &res);
> +data = address_space_lduw(as, addr, attrs, &res);
>  break;
>  case 4:
> -data = address_space_ldl(&address_space_memory, addr, attrs, &res);
> +data = address_space_ldl(as, addr, attrs, &res);
>  break;
>  default: abort();
>  }

current_cpu is available here, so it should be possible to have only one
io_mem_watch per CPU address space index (i.e. two).

But actually, if the address space can be derived from the attributes,
you just need to fish the right address space out of current_cpu.

> +as->io_mem_watch = g_new0(MemoryRegion, 1);

This is leaked when the address space goes away.  You can allocate it
statically inside AddressSpace if my alternative plan from above doesn't
work out.

> +memory_region_init_io(as->io_mem_watch, NULL, &watch_mem_ops, as,
> +  NULL, UINT64_MAX);
> +
>  as->dispatch = NULL;
>  as->dispatch_listener = (MemoryListener) {
>  .begin = mem_begin,
> @@ -2339,6 +2344,7 @@ void address_space_destroy_dispatch(AddressSpace *as)
>  if (d) {
>  call_rcu(d, address_space_dispatch_free, rcu);
>  }
> +memory_region_unref(as->io_mem_watch);




Re: [Qemu-devel] [PATCH 03/16] tlb_set_page_with_attrs: Take argument specifying AddressSpace to use

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 10:44, Paolo Bonzini  wrote:
>
>
> On 05/11/2015 19:15, Peter Maydell wrote:
>> Add an argument to tlb_set_page_with_attrs which allows the target CPU code
>> to tell the core code which AddressSpace to use.
>>
>> The AddressSpace is specified by the index into the array of ASes which
>> were registered with cpu_address_space_init().
>>
>> Signed-off-by: Peter Maydell 
>
> Can it be deduced from the attributes instead?  Basically, you would have
>
>int cpu_get_asidx(MemTxAttrs attrs);
>
> in cpu.h, which is called by tlb_set_page_with_attrs.
> cpu_get_phys_page_asidx_debug could also be replaced by
> cpu_get_phys_page_attrs_debug.

For ARM it could, certainly (and as you say, if we go that
way then some of the extra passing around of asidxes collapses,
and in particular we don't need to keep them separately in the
iotlb entries). I wasn't convinced that this would be true for
all possible uses of AddressSpaces.

thanks
-- PMM



Re: [Qemu-devel] [PATCH 09/16] target-arm: Support multiple address spaces in page table walks

2015-11-09 Thread Paolo Bonzini


On 05/11/2015 19:15, Peter Maydell wrote:
> If we have a secure address space, use it in page table walks:
>  * when doing the physical accesses to read descriptors,
>make them through the correct address space
>  * when the final result indicates a secure access, pass the
>correct address space index to tlb_set_page_with_attrs()
> 
> (The descriptor reads are the only direct physical accesses
> made in target-arm/ for CPUs which might have TrustZone.)

What is the case where you have no secure address space and you have
TrustZone?  KVM doesn't have TrustZone, so it should never be in a
secure regime, should it?

Paolo



Re: [Qemu-devel] [PATCH RFC 3/5] qapi: Use common name mangling for enumeration constants

2015-11-09 Thread Daniel P. Berrange
On Mon, Nov 09, 2015 at 10:34:41AM +0100, Markus Armbruster wrote:
> "Daniel P. Berrange"  writes:
> 
> > On Thu, Nov 05, 2015 at 04:30:00PM +0100, Markus Armbruster wrote:
> >> QAPI names needn't be valid C identifiers, so we mangle them with
> >> c_name().  Except for enumeration constants, which we mangle with
> >> camel_to_upper().
> >> 
> >> c_name() is easy enough to understand: replace '.' and '-' by '_',
> >> prefix certain ticklish identifiers with 'q_'.
> >> 
> >> camel_to_upper() is a hairball of heuristics, and guessing how it'll
> >> mangle interesting input could serve as a (nerdy) game.  Despite some
> >> tweaking (commit 5d371f4), it's still inadqeuate for some QAPI names
> >> (commit 351d36e).
> >> 
> >> Example: QAPI definition
> >> 
> >> { 'enum': 'BlockDeviceIoStatus', 'data': [ 'ok', 'failed', 'nospace' ] 
> >> }
> >> 
> >> generates
> >> 
> >> typedef enum BlockDeviceIoStatus {
> >> BLOCK_DEVICE_IO_STATUS_OK = 0,
> >> BLOCK_DEVICE_IO_STATUS_FAILED = 1,
> >> BLOCK_DEVICE_IO_STATUS_NOSPACE = 2,
> >> BLOCK_DEVICE_IO_STATUS_MAX = 3,
> >> } BlockDeviceIoStatus;
> >> 
> >> Observe that c_name() maps BlockDeviceIoStatus to itself, and
> >> camel_to_upper() maps it to BLOCK_DEVICE_IO_STATUS, i.e. the
> >> enumeration constants are shouted, the enumeration type isn't.
> >> 
> >> Because mangled names must not clash, name mangling restricts the
> >> names you can use.  For example, you can't have member 'a-b' and
> >> 'a_b'.
> >> 
> >> Having two separate manglings complicates this.  Enumeration constants
> >> must be distinct after mangling with camel_to_upper().  But as soon as
> >> you use the enumeration as union tag, they must *also* be distinct
> >> after mangling with c_name().
> >> 
> >> Having shouted enumeration constants isn't worth the complexity cost.
> >
> > I rather disagree with this. Having all-uppercase for enum constants
> > is a really widely used convention and I think it is pretty useful
> > when reading to code to have constants (whether #define/enum) clearly
> > marked in uppercase.  After this change we'll have situation where
> > QAPI enums uses CamelCase, while all other non-QAPI enums (whether
> > defined in QEMU source, or via #included 3rd party headers use
> > UPPER_CASE for constants. I think that's rather unpleasant.
> 
> CODING_STYLE doesn't mandate shouting constants.
> 
> Existing code doesn't shout constants consistently.

I wrote a script (attached) to parse the code and try to get extract
some data on upper/lower/mixed case usage in enum declarations

Enum member naming
  Uppercase: 993 in 325 files
  Lowercase: 119 in 76 files
  Mixedcase: 110 in 22 files

So, uppercase is preferred 10:1 over both lowercase and mixedcase
The filecount for uppercase would be far higher were it not for
fact that all QAPI enums end up in one file.

If we try to count #define's which provide constants, again
all uppercase is the clear majority

  Upper 32256
  Lower 829
  Mixed 8869

(50% of those Mixed constants come from the userspace emulator
 #defines for syscall numbers)

> Third parties don't shout constants consistently.

Running the same script across all the QEMU RPM dependancies again
shows uppercase is the favoured style:

Enum member naming
  Uppercase: 538 in 178 files
  Lowercase: 38 in 17 files
  Mixedcase: 141 in 53 files

The breakdown per-RPM is as follows:

bluez-libs-devel
Enum member naming
  Uppercase: 4 in 3 files

brlapi-devel
Enum member naming
  Uppercase: 43 in 23 files
  Mixedcase: 83 in 29 files

bzip2-devel
Enum member naming
  None

ceph-devel
Enum member naming
  None

cyrus-sasl-devel
Enum member naming
  Uppercase: 1 in 1 files

glusterfs-api-devel
Enum member naming
  Uppercase: 1 in 1 files

glusterfs-devel
Enum member naming
  Uppercase: 94 in 26 files
  Lowercase: 10 in 6 files
  Mixedcase: 2 in 1 files

gnutls-devel
Enum member naming
  Uppercase: 63 in 10 files

gtk3-devel
Enum member naming
  Uppercase: 152 in 60 files
  Lowercase: 1 in 1 files

libaio-devel
Enum member naming
  Uppercase: 1 in 1 files

libattr-devel
Enum member naming
  None

libcap-devel
Enum member naming
  None

libcurl-devel
Enum member naming
  Uppercase: 17 in 2 files
  Lowercase: 3 in 1 files
  Mixedcase: 1 in 1 files

libfdt-devel
Enum member naming
  None

libiscsi-devel
Enum member naming
  Uppercase: 32 in 2 files

libjpeg-devel
Enum member naming
  None

libpng-devel
Enum member naming
  None

librdmacm-devel
Enum member naming
  Uppercase: 7 in 3 files

libseccomp-devel
Enum member naming
  Uppercase: 2 in 1 files

libssh2-devel
Enum member naming
  None

libusbx-devel
Enum member naming
  Uppercase: 21 in 1 files

libuuid-devel
Enum member naming
  None

ncurses-devel
Enum member naming
  Uppercase: 3 in 3 files
  Mixedcase: 6 in 6 files

nss-devel
Enum member naming
  Uppercase: 2 in 2 files
  Lowercase: 10 in 2 files
  Mixedcase: 41 in 10 files

numactl-devel
Enum member naming
  None

pciutils-devel
Enum mem

Re: [Qemu-devel] [PATCH 08/16] exec.c: Have one io_mem_watch per AddressSpace

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 10:49, Paolo Bonzini  wrote:
>
>
> On 05/11/2015 19:15, Peter Maydell wrote:
>> @@ -2045,17 +2037,18 @@ static MemTxResult watch_mem_read(void *opaque, 
>> hwaddr addr, uint64_t *pdata,
>>  {
>>  MemTxResult res;
>>  uint64_t data;
>> +AddressSpace *as = opaque;
>>
>>  check_watchpoint(addr & ~TARGET_PAGE_MASK, size, attrs, BP_MEM_READ);
>>  switch (size) {
>>  case 1:
>> -data = address_space_ldub(&address_space_memory, addr, attrs, &res);
>> +data = address_space_ldub(as, addr, attrs, &res);
>>  break;
>>  case 2:
>> -data = address_space_lduw(&address_space_memory, addr, attrs, &res);
>> +data = address_space_lduw(as, addr, attrs, &res);
>>  break;
>>  case 4:
>> -data = address_space_ldl(&address_space_memory, addr, attrs, &res);
>> +data = address_space_ldl(as, addr, attrs, &res);
>>  break;
>>  default: abort();
>>  }
>
> current_cpu is available here, so it should be possible to have only one
> io_mem_watch per CPU address space index (i.e. two).

So the opaque gives you the asidx and then you look at current_cpu's
cpu_ases[asidx]? Yeah, that works and is simpler. (Good argument for
making "number of ASes per CPU" a compile-time constant I guess.)

> But actually, if the address space can be derived from the attributes,
> you just need to fish the right address space out of current_cpu.
>
>> +as->io_mem_watch = g_new0(MemoryRegion, 1);
>
> This is leaked when the address space goes away.  You can allocate it
> statically inside AddressSpace if my alternative plan from above doesn't
> work out.

Oh, right, it's not sufficient to unref it, it doesn't auto-free
itself when the refcount drops to 0.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v6 09/33] exec: allow file_ram_alloc to work on file

2015-11-09 Thread Xiao Guangrong



On 11/09/2015 06:13 PM, Michael S. Tsirkin wrote:

On Fri, Oct 30, 2015 at 01:56:03PM +0800, Xiao Guangrong wrote:

Currently, file_ram_alloc() only works on directory - it creates a file
under @path and do mmap on it

This patch tries to allow it to work on file directly, if @path is a
directory it works as before, otherwise it treats @path as the target
file then directly allocate memory from it

Signed-off-by: Xiao Guangrong 
---
  exec.c | 80 ++
  1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/exec.c b/exec.c
index 3ca7e50..f219010 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
  }

  #ifdef __linux__
+static bool path_is_dir(const char *path)
+{
+struct stat fs;
+
+return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);


This means file doesn't exist is treated as a file.
Can't figure out if that's intentional, should
be documented in any case.


The path is dir only if it passes S_ISDIR test. Otherwise, it is treated
as regular file. Any error on the file will be figured out by open() and
mmap() later.




+}
+
+static int open_file_path(RAMBlock *block, const char *path, size_t size)
+{
+char *filename;
+char *sanitized_name;
+char *c;
+int fd;
+
+if (!path_is_dir(path)) {
+int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;


Why does this make sense?


RAM_SHARED means its write is invisible to others also it does not actual
change the content of the file. Readonly permission is enough for this case.
It is also help us to pass a readonly file to the guess but discard its change
after guest shutdown.




+
+flags |= O_EXCL;


And why does this makes sense?



It' used if we pass a block device as a NVDIMM backend memory:
  O_EXCL can be used without O_CREAT if pathname refers to a block
device.  If the block device is in use by the system (e.g., mounted),
open() fails with the error EBUSY

BTW, the similar logic has already been in upsteam QEMU which is
introduced by commit 8d31d6b6 (backends/hostmem-file: Allow to specify
full pathname for backing file). I will drop these patches:
  util: introduce qemu_file_get_page_size()
  exec: allow memory to be allocated from any kind of path
  exec: allow file_ram_alloc to work on file



Re: [Qemu-devel] [PATCH v9 21/56] migration_is_setup_or_active

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Add 'migration_is_setup_or_active' utility function to check state.
> (It gets postcopy added to it's list later on in the series)
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH 11/16] memory: Add address_space_init_shareable()

2015-11-09 Thread Paolo Bonzini


On 05/11/2015 19:15, Peter Maydell wrote:
> +AddressSpace *address_space_init_shareable(MemoryRegion *root, const char 
> *name)
> +{
> +AddressSpace *as;
> +
> +QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
> +if (root == as->root) {
> +as->ref_count++;
> +return as;
> +}
> +}
> +
> +as = g_malloc0(sizeof *as);
> +address_space_init(as, root, name);
> +as->malloced = true;
> +return as;
>  }

You shouldn't return a non-shareable address space here, I think,
because it might be contained into another object and that object might
disappear.  I haven't thought this through very much, but adding an " &&
as->malloced" to the conditional seems easy and safe.

Paolo

>  
>  void address_space_destroy(AddressSpace *as)
>  {
>  MemoryRegion *root = as->root;
>  
> +as->ref_count--;
> +if (as->ref_count) {
> +return;
> +}
>  /* Flush out anything from MemoryListeners listening in on this */
>  memory_region_transaction_begin();
>  as->root = NULL;
> 



Re: [Qemu-devel] [PATCH v6 11/33] hostmem-file: use whole file size if possible

2015-11-09 Thread Xiao Guangrong



On 11/09/2015 06:17 PM, Michael S. Tsirkin wrote:

On Fri, Oct 30, 2015 at 01:56:05PM +0800, Xiao Guangrong wrote:

Use the whole file size if @size is not specified which is useful
if we want to directly pass a file to guest

Signed-off-by: Xiao Guangrong 


Better split these simplifications off from the series.


Yep, i am also little tired to respin this long series, will drop this
one.




Re: [Qemu-devel] [PATCH v9 20/56] Return path: Send responses from destination to source

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Add migrate_send_rp_message to send a message from destination to source 
> along the return path.
>   (It uses a mutex to let it be called from multiple threads)
> Add migrate_send_rp_shut to send a 'shut' message to indicate
>   the destination is finished with the RP.
> Add migrate_send_rp_ack to send a 'PONG' message in response to a PING
>   Use it in the MSG_RP_PING handler
>
> Signed-off-by: Dr. David Alan Gilbert 
> Reviewed-by: Amit Shah 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH v9 30/56] migration_completion: Take current state

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Soon we'll be in either ACTIVE or POSTCOPY_ACTIVE when we
> complete migration, and we need to know which we expect to be
> in to change state safely.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH 12/16] qom/cpu: Add MemoryRegion property

2015-11-09 Thread Paolo Bonzini


On 05/11/2015 19:15, Peter Maydell wrote:
> +
> +/* This is a softmmu CPU object, so create a property for it
> + * so users can wire up its memory. (This can't go in qom/cpu.c
> + * because that file is compiled only once for both user-mode
> + * and system builds.) The default if no link is set up is to use
> + * the system address space.
> + */
> +object_property_add_link(OBJECT(cpu), "memory", TYPE_MEMORY_REGION,
> + (Object **)&cpu->memory,
> + qdev_prop_allow_set_link_before_realize,
> + OBJ_PROP_LINK_UNREF_ON_RELEASE,
> + &error_abort);
> +cpu->memory = system_memory;

You need object_ref(cpu->memory) here, because setting cpu->memory will
drop a reference from the previously-set value.

Paolo

>  #endif



Re: [Qemu-devel] [PATCH 09/16] target-arm: Support multiple address spaces in page table walks

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 10:51, Paolo Bonzini  wrote:
>
>
> On 05/11/2015 19:15, Peter Maydell wrote:
>> If we have a secure address space, use it in page table walks:
>>  * when doing the physical accesses to read descriptors,
>>make them through the correct address space
>>  * when the final result indicates a secure access, pass the
>>correct address space index to tlb_set_page_with_attrs()
>>
>> (The descriptor reads are the only direct physical accesses
>> made in target-arm/ for CPUs which might have TrustZone.)
>
> What is the case where you have no secure address space and you have
> TrustZone?  KVM doesn't have TrustZone, so it should never be in a
> secure regime, should it?

You mean "what is the case where is_secure but cpu->num_ases == 1" ?
That happens if you have a TrustZone CPU but the board has only
connected up one address space, because there is no difference
in the view from Secure and NonSecure. (vexpress is like this
in hardware, and most of our board models for TZ CPUS are like
that now even if the real h/w makes a distinction.)

I could have handled that by making the CPU init code always
register two ASes (using the same one twice if the board code
only passes one or using system_address_space twice if the
board code doesn't pass one at all), but that seemed a bit wasteful.

thanks
-- PMM



Re: [Qemu-devel] [PATCH 11/16] memory: Add address_space_init_shareable()

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 10:55, Paolo Bonzini  wrote:
>
>
> On 05/11/2015 19:15, Peter Maydell wrote:
>> +AddressSpace *address_space_init_shareable(MemoryRegion *root, const char 
>> *name)
>> +{
>> +AddressSpace *as;
>> +
>> +QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>> +if (root == as->root) {
>> +as->ref_count++;
>> +return as;
>> +}
>> +}
>> +
>> +as = g_malloc0(sizeof *as);
>> +address_space_init(as, root, name);
>> +as->malloced = true;
>> +return as;
>>  }
>
> You shouldn't return a non-shareable address space here, I think,
> because it might be contained into another object and that object might
> disappear.  I haven't thought this through very much, but adding an " &&
> as->malloced" to the conditional seems easy and safe.

That would prevent sharing the system address space, which is one
you really commonly want to share. I guess we could make that one
be alloced-shareable.

thanks
-- PMM



Re: [Qemu-devel] [PATCH 08/16] exec.c: Have one io_mem_watch per AddressSpace

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 11:54, Peter Maydell wrote:
> > current_cpu is available here, so it should be possible to have only one
> > io_mem_watch per CPU address space index (i.e. two).
> 
> So the opaque gives you the asidx and then you look at current_cpu's
> cpu_ases[asidx]? Yeah, that works and is simpler. (Good argument for
> making "number of ASes per CPU" a compile-time constant I guess.)

Yes.  Of course it works well only if the compile-time constant is
small---but again, my guess is that it'll be two.

It would be even better if you can use the attrs instead of the opaque
to compute the asidx.  Then the only change you need is to watch_mem_*.

Paolo



Re: [Qemu-devel] [PATCH v6 07/33] util: introduce qemu_file_get_page_size()

2015-11-09 Thread Xiao Guangrong



On 11/09/2015 06:33 PM, Michael S. Tsirkin wrote:

On Fri, Oct 30, 2015 at 01:56:01PM +0800, Xiao Guangrong wrote:

There are three places use the some logic to get the page size on
the file path or file fd

This patch introduces qemu_file_get_page_size() to unify the code

Signed-off-by: Xiao Guangrong 
---
  include/qemu/osdep.h |  1 +
  target-ppc/kvm.c | 21 +++--
  util/oslib-posix.c   | 16 
  util/oslib-win32.c   |  5 +
  4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b568424..d4dde02 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -302,4 +302,5 @@ int qemu_read_password(char *buf, int buf_size);
   */
  pid_t qemu_fork(Error **errp);

+size_t qemu_file_get_page_size(const char *mem_path);
  #endif
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index ac70f08..c661f1c 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -308,28 +308,13 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct 
kvm_ppc_smmu_info *info)

  static long gethugepagesize(const char *mem_path)
  {
-struct statfs fs;
-int ret;
-
-do {
-ret = statfs(mem_path, &fs);
-} while (ret != 0 && errno == EINTR);
+long size = qemu_file_get_page_size(mem_path);

-if (ret != 0) {
-fprintf(stderr, "Couldn't statfs() memory path: %s\n",
-strerror(errno));
+if (!size) {
  exit(1);
  }

-#define HUGETLBFS_MAGIC   0x958458f6
-
-if (fs.f_type != HUGETLBFS_MAGIC) {
-/* Explicit mempath, but it's ordinary pages */
-return getpagesize();
-}
-
-/* It's hugepage, return the huge page size */
-return fs.f_bsize;
+return size;
  }

  static int find_max_supported_pagesize(Object *obj, void *opaque)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 914cef5..ad94c5a 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -360,6 +360,22 @@ static size_t fd_getpagesize(int fd)
  return getpagesize();
  }

+size_t qemu_file_get_page_size(const char *path)
+{
+size_t size = 0;
+int fd = qemu_open(path, O_RDONLY);
+
+if (fd < 0) {
+fprintf(stderr, "Could not open %s.\n", path);
+goto exit;
+}
+
+size = fd_getpagesize(fd);
+qemu_close(fd);
+exit:
+return size;
+}
+
  void os_mem_prealloc(int fd, char *area, size_t memory)
  {
  int ret;


So this is opening the file for the sole purpose of
doing the fstatfs on it. Seems strange, just do statfs instead.
In fact, maybe we want statfs_getpagesize.


It is just to reuse the code of fd_getpagesize() which already has
the logic to check pagesize.




diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 09f9e98..a18aa87 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -462,6 +462,11 @@ size_t getpagesize(void)
  return system_info.dwPageSize;
  }

+size_t qemu_file_get_page_size(const char *path)
+{
+return getpagesize();
+}
+
  void os_mem_prealloc(int fd, char *area, size_t memory)
  {
  int i;


And why is this needed on win32?


It is not actually used, just make osdep.h happy which has
qemu_file_get_page_size() declare.

BTW, i will drop this patch for now on.




Re: [Qemu-devel] [PATCH v9 34/56] Postcopy: Maintain unsentmap

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Maintain an 'unsentmap' of pages that have yet to be sent.
> This is used in the following patches to discard some set of
> the pages already sent as we enter postcopy mode.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH 11/16] memory: Add address_space_init_shareable()

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 11:59, Peter Maydell wrote:
>>> >> +as = g_malloc0(sizeof *as);
>>> >> +address_space_init(as, root, name);
>>> >> +as->malloced = true;
>>> >> +return as;
>>> >>  }
>> >
>> > You shouldn't return a non-shareable address space here, I think,
>> > because it might be contained into another object and that object might
>> > disappear.  I haven't thought this through very much, but adding an " &&
>> > as->malloced" to the conditional seems easy and safe.
> That would prevent sharing the system address space, which is one
> you really commonly want to share. I guess we could make that one
> be alloced-shareable.

You would only allocate one duplicate of the system address space
though.  The second call to address_space_init_shareable would not
create a new AS.

Paolo



Re: [Qemu-devel] [PATCH v9 00/56] Postcopy implementation

2015-11-09 Thread Dr. David Alan Gilbert
* Bharata B Rao (bhar...@linux.vnet.ibm.com) wrote:
> On Mon, Nov 09, 2015 at 09:08:33AM +, Dr. David Alan Gilbert wrote:
> > * Bharata B Rao (bhar...@linux.vnet.ibm.com) wrote:
> > > On Fri, Nov 06, 2015 at 03:48:11PM +, Dr. David Alan Gilbert wrote:
> > > > * Bharata B Rao (bhar...@linux.vnet.ibm.com) wrote:



> > Can you :
> >1) show me the command line you're using
> 
> ./ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none -machine 
> pseries -m 64G,slots=32,maxmem=128G -smp 16 -device 
> virtio-blk-pci,drive=rootdisk -drive 
> file=/home/bharata/F20-snap1,if=none,cache=none,id=rootdisk,format=qcow2 
> -monitor telnet:127.0.0.1:1234,server,nowait -trace events=my-trace-events

Thanks; looks sensible.

> >2) Show me /proc/pid/smaps for the destination qemu
> 
> Attached.

> 3fef23ff-3fff23ff rw-p  00:00 0 
> Size:   67108864 kB
> Rss:67108864 kB
> Pss:67108864 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:  67108864 kB
> Referenced: 67108864 kB
> Anonymous:  67108864 kB
> AnonHugePages: 32768 kB
> Swap:  0 kB
> SwapPss:   0 kB
> KernelPageSize:   64 kB
> MMUPageSize:  64 kB
> Locked:0 kB
> VmFlags: rd wr mr mw me dc ac hg mg 

OK, all dirty - that's not happy.

> >3) Turn on the trace postcopy_place_page_zero
> >   the theory is that most of your pages should end up as zero pages
> >   and not be allocated.
> 
> No hits for postcopy_place_page_zero either in source or destination QEMU.

OK, that's why it's all dirty then - it should be placing most pages
as zero pages; I checked and that is triggering on x86.

Oh, I think I see it; the following is untested, I'll grab a machine to
try it on but it will take a little while:

diff --git a/migration/ram.c b/migration/ram.c
index 62cf42b..e6db965 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2273,6 +2274,7 @@ static int ram_load_postcopy(QEMUFile *f)
 /* Temporary page that is later 'placed' */
 void *postcopy_host_page = postcopy_get_tmp_page(mis);
 void *last_host = NULL;
+bool all_zero = false;
 
 while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
 ram_addr_t addr;
@@ -2280,7 +2282,6 @@ static int ram_load_postcopy(QEMUFile *f)
 void *page_buffer = NULL;
 void *place_source = NULL;
 uint8_t ch;
-bool all_zero = false;
 
 addr = qemu_get_be64(f);
 flags = addr & ~TARGET_PAGE_MASK;

Thanks for spotting this!

Dave

> 
> Regards,
> Bharata.

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH 09/16] target-arm: Support multiple address spaces in page table walks

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 11:58, Peter Maydell wrote:
> On 9 November 2015 at 10:51, Paolo Bonzini  wrote:
>>
>>
>> On 05/11/2015 19:15, Peter Maydell wrote:
>>> If we have a secure address space, use it in page table walks:
>>>  * when doing the physical accesses to read descriptors,
>>>make them through the correct address space
>>>  * when the final result indicates a secure access, pass the
>>>correct address space index to tlb_set_page_with_attrs()
>>>
>>> (The descriptor reads are the only direct physical accesses
>>> made in target-arm/ for CPUs which might have TrustZone.)
>>
>> What is the case where you have no secure address space and you have
>> TrustZone?  KVM doesn't have TrustZone, so it should never be in a
>> secure regime, should it?
> 
> You mean "what is the case where is_secure but cpu->num_ases == 1" ?
> That happens if you have a TrustZone CPU but the board has only
> connected up one address space, because there is no difference
> in the view from Secure and NonSecure. (vexpress is like this
> in hardware, and most of our board models for TZ CPUS are like
> that now even if the real h/w makes a distinction.)
> 
> I could have handled that by making the CPU init code always
> register two ASes (using the same one twice if the board code
> only passes one or using system_address_space twice if the
> board code doesn't pass one at all), but that seemed a bit wasteful.

I think it's simpler though.  Complicating the init code is better than
handling the condition throughout all the helpers...

Paolo



Re: [Qemu-devel] [PATCH v2] configure: disable FORTIFY_SOURCE under clang

2015-11-09 Thread Peter Maydell
On 3 November 2015 at 20:43, John Snow  wrote:
> Some versions of clang may have difficulty compiling glibc headers when
> -D_FORTIFY_SOURCE is used. For example, Clang++ 3.5.0-9.fc22 cannot
> compile glibc's stdio headers when -D_FORTIFY_SOURCE=2 is used. This
> manifests currently as build failures with clang and any arm target.
>
> According to LLVM dev Richard Smith, clang does not target or support
> FORTIFY_SOURCE + glibc, and it should not be relied on.
> "It's still an unsupported combination, and while it might compile, some
> of the checks are unlikely to work because they require a frontend
> inliner to be useful"
>
> See: http://lists.llvm.org/pipermail/cfe-dev/2015-November/045846.html
>
> Conclusion: disable fortify-source if we appear to be using clang instead
> of testing for compile success or failure, which may be incidental or not
> indicative of proper support of the feature.

>  ##
>  # End of CC checks
>  # After here, no more $cc or $ld runs
> @@ -4442,7 +4457,7 @@ fi
>  if test "$gcov" = "yes" ; then
>CFLAGS="-fprofile-arcs -ftest-coverage -g $CFLAGS"
>LDFLAGS="-fprofile-arcs -ftest-coverage $LDFLAGS"
> -elif test "$debug" = "no" ; then
> +elif test "$fortify_source" = "yes" ; then
>CFLAGS="-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 $CFLAGS"
>  fi

This change means that a non-debug build with clang will no longer
pass -O2 in the CFLAGS, so you get an unoptimized build...

You need to handle fortify_source=yes and debug=no separately.

thanks
-- PMM



Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 10:21, Paolo Bonzini  wrote:
> Hmm, so the list is pretty short:

The good news is that my second bisect conclusively fingered
the culprit for why the bug went away...

>   configure: disable FORTIFY_SOURCE under clang

...the bad news is it's because this patch inadvertently makes
all non-fortify-source compiles be no-optimization (and we already
knew the bug didn't repro in an unoptimized build).

This also explains the weird behaviour in bisect: it was only
when some later commit touched enough of the header files to
force recompilation of whichever object file it is that's
being problematic that the bug disappeared.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v6 19/33] dimm: keep the state of the whole backend memory

2015-11-09 Thread Michael S. Tsirkin
On Fri, Oct 30, 2015 at 01:56:13PM +0800, Xiao Guangrong wrote:
> QEMU keeps the state of memory of dimm device during live migration,
> however, it is not enough for nvdimm device as its memory does not
> contain its label data, so that we should protect the whole backend
> memory instead
> 
> Signed-off-by: Xiao Guangrong 

It looks like there's now a difference between
host_memory_backend_get_memory and get_memory_region,
whereas previously they were exactly interchangeable.

This needs some thought, in particular the theoretically
generic dimm.c has to do tricks to accomodate nvdimm.

The missing piece for NVDIMM is the 128k label space at the end,
right?  Can't nvdimm specific code just register that as a
separate RAM chunk in order to migrate it?

> ---
>  hw/mem/dimm.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
> index 498d380..7d1 100644
> --- a/hw/mem/dimm.c
> +++ b/hw/mem/dimm.c
> @@ -134,9 +134,16 @@ void dimm_memory_plug(DeviceState *dev, 
> MemoryHotplugState *hpms,
>  }
>  
>  memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> -vmstate_register_ram(mr, dev);
>  numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
>  
> +/*
> + * save the state only for @mr is not enough as it does not contain
> + * the label data of NVDIMM device, so that we keep the state of
> + * whole hostmem instead.
> + */
> +vmstate_register_ram(host_memory_backend_get_memory(dimm->hostmem, errp),
> + dev);
> +
>  out:
>  error_propagate(errp, local_err);
>  }
> @@ -145,10 +152,13 @@ void dimm_memory_unplug(DeviceState *dev, 
> MemoryHotplugState *hpms,
> MemoryRegion *mr)
>  {
>  DIMMDevice *dimm = DIMM(dev);
> +MemoryRegion *backend_mr;
> +
> +backend_mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
>  
>  numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
>  memory_region_del_subregion(&hpms->mr, mr);
> -vmstate_unregister_ram(mr, dev);
> +vmstate_unregister_ram(backend_mr, dev);
>  }
>  
>  int qmp_dimm_device_list(Object *obj, void *opaque)
> -- 
> 1.8.3.1



Re: [Qemu-devel] [PATCH v6 08/33] exec: allow memory to be allocated from any kind of path

2015-11-09 Thread Xiao Guangrong



On 11/09/2015 06:39 PM, Michael S. Tsirkin wrote:

On Fri, Oct 30, 2015 at 01:56:02PM +0800, Xiao Guangrong wrote:

Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
locates at DAX enabled filesystem

So this patch let it work on any kind of path

Signed-off-by: Xiao Guangrong 


So this allows regular memory to be specified directly.
This needs to be split out and merged separately
from acpi/nvdimm bits.


Yup, that is in my plan.



Alternatively, if it's possible to use nvdimm with DAX fs
(similar to hugetlbfs), leave these patches off for now.



DAX is a filesystem mount options, it is not easy to get this
info from a file.




---
  exec.c | 56 +---
  1 file changed, 17 insertions(+), 39 deletions(-)

diff --git a/exec.c b/exec.c
index 8af2570..3ca7e50 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,32 +1174,6 @@ void qemu_mutex_unlock_ramlist(void)
  }

  #ifdef __linux__
-
-#include 
-
-#define HUGETLBFS_MAGIC   0x958458f6
-
-static long gethugepagesize(const char *path, Error **errp)
-{
-struct statfs fs;
-int ret;
-
-do {
-ret = statfs(path, &fs);
-} while (ret != 0 && errno == EINTR);
-
-if (ret != 0) {
-error_setg_errno(errp, errno, "failed to get page size of file %s",
- path);
-return 0;
-}
-
-if (fs.f_type != HUGETLBFS_MAGIC)
-fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
-
-return fs.f_bsize;
-}
-
  static void *file_ram_alloc(RAMBlock *block,
  ram_addr_t memory,
  const char *path,
@@ -1210,20 +1184,24 @@ static void *file_ram_alloc(RAMBlock *block,
  char *c;
  void *area;
  int fd;
-uint64_t hpagesize;
-Error *local_err = NULL;
+uint64_t pagesize;

-hpagesize = gethugepagesize(path, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+pagesize = qemu_file_get_page_size(path);
+if (!pagesize) {
+error_setg(errp, "can't get page size for %s", path);
  goto error;
  }
-block->mr->align = hpagesize;

-if (memory < hpagesize) {
+if (pagesize == getpagesize()) {
+fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
+}
+
+block->mr->align = pagesize;
+
+if (memory < pagesize) {
  error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
-   "or larger than huge page size 0x%" PRIx64,
-   memory, hpagesize);
+   "or larger than page size 0x%" PRIx64,
+   memory, pagesize);
  goto error;
  }

@@ -1247,14 +1225,14 @@ static void *file_ram_alloc(RAMBlock *block,
  fd = mkstemp(filename);
  if (fd < 0) {
  error_setg_errno(errp, errno,
- "unable to create backing store for hugepages");
+ "unable to create backing store for path %s", path);
  g_free(filename);
  goto error;
  }
  unlink(filename);
  g_free(filename);


Looks like we are still calling mkstemp/unlink here.
How does this work?


Hmm? We have got the fd so the file can be safely unlinked (kernel does not 
actually unlink
the file since mkstemp() holds a refcount.).

And this patch just renames the variables, no logic changed. Will drop it from 
the serials
for now on.



Re: [Qemu-devel] [PATCH v9 35/56] Postcopy: Calculate discard

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Where postcopy is preceeded by a period of precopy, the destination will
> have received pages that may have been dirtied on the source after the
> page was sent.  The destination must throw these pages away before
> starting it's CPUs.
>
> Calculate list of sent & dirty pages
> Provide helpers on the destination side to discard these.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 

But please, send a fix for:

>  return ret;
>  }
>  
> +/**
> + * postcopy_ram_discard_range: Discard a range of memory.
> + * We can assume that if we've been called postcopy_ram_hosttest returned 
> true.
> + *
> + * @mis: Current incoming migration state.
> + * @start, @length: range of memory to discard.
> + *
> + * returns: 0 on success.
> + */
> +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> +   size_t length)
> +{
> +trace_postcopy_ram_discard_range(start, length);
> +if (madvise(start, length, MADV_DONTNEED)) {

qemu_madvise(start, length, QEMU_MADV_DONTNEED)





Re: [Qemu-devel] [PATCH 09/16] target-arm: Support multiple address spaces in page table walks

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 11:03, Paolo Bonzini  wrote:
>
>
> On 09/11/2015 11:58, Peter Maydell wrote:
>> You mean "what is the case where is_secure but cpu->num_ases == 1" ?
>> That happens if you have a TrustZone CPU but the board has only
>> connected up one address space, because there is no difference
>> in the view from Secure and NonSecure. (vexpress is like this
>> in hardware, and most of our board models for TZ CPUS are like
>> that now even if the real h/w makes a distinction.)
>>
>> I could have handled that by making the CPU init code always
>> register two ASes (using the same one twice if the board code
>> only passes one or using system_address_space twice if the
>> board code doesn't pass one at all), but that seemed a bit wasteful.
>
> I think it's simpler though.  Complicating the init code is better than
> handling the condition throughout all the helpers...

It was the overhead of having an extra AddressSpace that concerned
me (plus it shows up in things like 'info mtree' somewhat confusingly
if you didn't expect your board to really have 2 ASes). But I don't
feel very strongly about it (or have anything solid to base an
argument either way on).

thanks
-- PMM



Re: [Qemu-devel] [PATCH v6 12/33] pc-dimm: remove DEFAULT_PC_DIMMSIZE

2015-11-09 Thread Xiao Guangrong



On 11/09/2015 06:40 PM, Michael S. Tsirkin wrote:

On Fri, Oct 30, 2015 at 01:56:06PM +0800, Xiao Guangrong wrote:

It's not used any more

Signed-off-by: Xiao Guangrong 


You should leave the renames and cleanups off for later.
This patchset is large enough as it is.


Okay.



Re: [Qemu-devel] [PATCH v9 36/56] postcopy: Incoming initialisation

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Signed-off-by: Dr. David Alan Gilbert 
> Reviewed-by: David Gibson 
> Reviewed-by: Amit Shah 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI

2015-11-09 Thread Igor Mammedov
On Fri, 6 Nov 2015 16:31:43 +0800
Xiao Guangrong  wrote:

> 
> 
> On 11/05/2015 10:49 PM, Igor Mammedov wrote:
> > On Thu, 5 Nov 2015 21:33:39 +0800
> > Xiao Guangrong  wrote:
> >
> >>
> >>
> >> On 11/05/2015 09:03 PM, Igor Mammedov wrote:
> >>> On Thu, 5 Nov 2015 18:15:31 +0800
> >>> Xiao Guangrong  wrote:
> >>>
> 
> 
>  On 11/05/2015 05:58 PM, Igor Mammedov wrote:
> > On Mon,  2 Nov 2015 17:13:27 +0800
> > Xiao Guangrong  wrote:
> >
> >> A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
> >   ^^ missing one 0???
> >
> >> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >> for detailed design
> >>
> >> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
> >> that controls if nvdimm support is enabled, it is true on default and
> >> it is false on 2.4 and its earlier version to keep compatibility
> >>
> >> Signed-off-by: Xiao Guangrong 
> > [...]
> >
> >> @@ -33,6 +33,15 @@
> >>  */
> >> #define MIN_NAMESPACE_LABEL_SIZE  (128UL << 10)
> >>
> >> +/*
> >> + * A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest 
> >> are
> > ^^^ missing 0 or value in define 
> > below has an extra 0
> >
> >> + * reserved for NVDIMM ACPI emulation, refer to 
> >> docs/specs/acpi_nvdimm.txt
> >> + * for detailed design.
> >> + */
> >> +#define NVDIMM_ACPI_MEM_BASE  0xFF00ULL
> > it still maps RAM at arbitrary place,
> > that's the reason why VMGenID patches hasn't been merged for
> > more than several months.
> > I'm not against of using (more exactly I'm for it) direct mapping
> > but we should reach consensus when and how to use it first.
> >
> > I'd wouldn't use addresses below 4G as it may be used firmware or
> > legacy hardware and I won't bet that 0xFF00ULL won't conflict
> > with anything.
> > An alternative place to allocate reserve from could be high memory.
> > For pc we have "reserved-memory-end" which currently makes sure
> > that hotpluggable memory range isn't used by firmware.
> >
> > How about making API that allows to map additional memory
> > ranges before reserved-memory-end and pushes it up as mappings are
> > added.
[...]

> 
> Really a good study case to me, i tried your patch and moved the 64 bit
> staffs to the private method, it worked. :)
> 
> Igor, is this the API you want?

Lets get ack from Michael on the idea of RAM mapping before
"reserved-memory-end" first.
If he rejects it then there isn't any other way except of switching
to MMIO instead.

> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 6bf569a..aba29df 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1291,6 +1291,38 @@ FWCfgState *xen_load_linux(PCMachineState *pcms,
>   return fw_cfg;
>   }
> 
> +static void pc_reserve_high_memory_init(PCMachineState *pcms,
> +uint64_t base, uint64_t align)
> +{
> +pcms->reserve_high_memory.current_addr = ROUND_UP(base, align);
> +}
> +
> +static uint64_t
> +pc_reserve_high_memory_end(PCMachineState *pcms, int64_t align)
> +{
> +return ROUND_UP(pcms->reserve_high_memory.current_addr, align);
> +}
> +
> +uint64_t pc_reserve_high_memory(PCMachineState *pcms, uint64_t size,
> +int64_t align, Error **errp)
> +{
> +uint64_t end_addr, current_addr = pcms->reserve_high_memory.current_addr;
> +
> +if (!current_addr) {
> +error_setg(errp, "reserved high memory is not initialized.");
> +return 0;
> +}
> +
> +end_addr = pc_reserve_high_memory_end(pcms, align) + size;
> +if (current_addr > end_addr) {
> +error_setg(errp, "reserved high memory is not enough.");
> +return 0;
> +}
> +
> +pcms->reserve_high_memory.current_addr = end_addr;
> +return end_addr;
> +}
> +
>   FWCfgState *pc_memory_init(PCMachineState *pcms,
>  MemoryRegion *system_memory,
>  MemoryRegion *rom_memory,
> @@ -1379,6 +1411,8 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
>  "hotplug-memory", hotplug_mem_size);
>   memory_region_add_subregion(system_memory, 
> pcms->hotplug_memory.base,
>   &pcms->hotplug_memory.mr);
> +pc_reserve_high_memory_init(pcms, pcms->hotplug_memory.base +
> +hotplug_mem_size, 1ULL << 30);
>   }
> 
>   /* Initialize PC system firmware */
> @@ -1403,7 +1437,7 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
>   uint64_t res_mem_end = pcms->hotplug_memory.base;
> 
>   if (!pcmc->broken_reserved_end) {
> -res_mem_end += memory_region_size(&pcms->hotplug_memory.mr);
> +res_mem_end = 

Re: [Qemu-devel] [PATCH v6 06/33] acpi: add aml_method_serialized

2015-11-09 Thread Michael S. Tsirkin
On Fri, Oct 30, 2015 at 01:56:00PM +0800, Xiao Guangrong wrote:
> It avoid explicit Mutex and will be used by NVDIMM ACPI
> 
> Signed-off-by: Xiao Guangrong 

I'd rather you squashed these utility patches in with where
the code is used. This is just making it harder to review
as I have to jump back and forth.

> ---
>  hw/acpi/aml-build.c | 26 --
>  include/hw/acpi/aml-build.h |  1 +
>  2 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 9f792ab..8bee8b2 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -696,14 +696,36 @@ Aml *aml_while(Aml *predicate)
>  }
>  
>  /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefMethod */
> -Aml *aml_method(const char *name, int arg_count)
> +static Aml *__aml_method(const char *name, int arg_count, bool serialized)

Please don't prefix names with __.
what should you call this?
For example, you can call it aml_method_serialized.

>  {
>  Aml *var = aml_bundle(0x14 /* MethodOp */, AML_PACKAGE);
> +int methodflags;
> +
> +/*
> + * MethodFlags:
> + *   bit 0-2: ArgCount (0-7)
> + *   bit 3: SerializeFlag
> + * 0: NotSerialized
> + * 1: Serialized
> + *   bit 4-7: reserved (must be 0)
> + */
> +assert(!(arg_count & ~7));

Or shorter assert(arg_count < 8);

> +methodflags = arg_count | (serialized << 3);
>  build_append_namestring(var->buf, "%s", name);
> -build_append_byte(var->buf, arg_count); /* MethodFlags: ArgCount */
> +build_append_byte(var->buf, methodflags);
>  return var;
>  }
>  
> +Aml *aml_method(const char *name, int arg_count)
> +{
> +return __aml_method(name, arg_count, false);
> +}
> +
> +Aml *aml_method_serialized(const char *name, int arg_count)
> +{
> +return __aml_method(name, arg_count, true);
> +}
> +
>  /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefDevice */
>  Aml *aml_device(const char *name_format, ...)
>  {
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 5b8a118..00cf40e 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -263,6 +263,7 @@ Aml *aml_qword_memory(AmlDecode dec, AmlMinFixed 
> min_fixed,
>  Aml *aml_scope(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
>  Aml *aml_device(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
>  Aml *aml_method(const char *name, int arg_count);
> +Aml *aml_method_serialized(const char *name, int arg_count);
>  Aml *aml_if(Aml *predicate);
>  Aml *aml_else(void);
>  Aml *aml_while(Aml *predicate);
> -- 
> 1.8.3.1



Re: [Qemu-devel] [PATCH v9 42/56] Page request: Consume pages off the post-copy queue

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> When transmitting RAM pages, consume pages that have been queued by
> MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
>
> Note:
>   a) After a queued page the linear walk carries on from after the
> unqueued page; there is a reasonable chance that the destination
> was about to ask for other closeby pages anyway.
>
>   b) We have to be careful of any assumptions that the page walking
> code makes, in particular it does some short cuts on its first linear
> walk that break as soon as we do a queued page.
>
>   c) We have to be careful to not break up host-page size chunks, since
> this makes it harder to place the pages on the destination.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 




Re: [Qemu-devel] [PATCH 09/16] target-arm: Support multiple address spaces in page table walks

2015-11-09 Thread Paolo Bonzini


On 09/11/2015 12:09, Peter Maydell wrote:
>>> >> I could have handled that by making the CPU init code always
>>> >> register two ASes (using the same one twice if the board code
>>> >> only passes one or using system_address_space twice if the
>>> >> board code doesn't pass one at all), but that seemed a bit wasteful.
>> >
>> > I think it's simpler though.  Complicating the init code is better than
>> > handling the condition throughout all the helpers...
> It was the overhead of having an extra AddressSpace that concerned
> me (plus it shows up in things like 'info mtree' somewhat confusingly
> if you didn't expect your board to really have 2 ASes).

I don't think it shows up twice with address_space_init_shareable, does it?

Paolo

> But I don't
> feel very strongly about it (or have anything solid to base an
> argument either way on).




Re: [Qemu-devel] [PATCH v6 19/33] dimm: keep the state of the whole backend memory

2015-11-09 Thread Xiao Guangrong



On 11/09/2015 07:04 PM, Michael S. Tsirkin wrote:

On Fri, Oct 30, 2015 at 01:56:13PM +0800, Xiao Guangrong wrote:

QEMU keeps the state of memory of dimm device during live migration,
however, it is not enough for nvdimm device as its memory does not
contain its label data, so that we should protect the whole backend
memory instead

Signed-off-by: Xiao Guangrong 


It looks like there's now a difference between
host_memory_backend_get_memory and get_memory_region,
whereas previously they were exactly interchangeable.


Yes, it is. host_memory_backend_get_memory() is used to get the whole
backend memory region. get_memory_region() is used to get the memory region
which will be mapped to guest's address space. They are on the same page
for pc-dimm, but it's different for nvdimm since part of backend memory used
as label data which is not directly mapped to guest.



This needs some thought, in particular the theoretically
generic dimm.c has to do tricks to accomodate nvdimm.

The missing piece for NVDIMM is the 128k label space at the end,
right?  Can't nvdimm specific code just register that as a
separate RAM chunk in order to migrate it?


Yes, it is.

I thought this before, then we need to have addition callbackes:
- plug(), which is called when the dimm device is plugged, then nvdimm
  has the chance to do it own migration things.

- unplug(), let nvdimm undo the thing of migration.

It needs more codes but the logic seems more clear. If you like this way,
i will do.




---
  hw/mem/dimm.c | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 498d380..7d1 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -134,9 +134,16 @@ void dimm_memory_plug(DeviceState *dev, MemoryHotplugState 
*hpms,
  }

  memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
-vmstate_register_ram(mr, dev);
  numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);

+/*
+ * save the state only for @mr is not enough as it does not contain
+ * the label data of NVDIMM device, so that we keep the state of
+ * whole hostmem instead.
+ */
+vmstate_register_ram(host_memory_backend_get_memory(dimm->hostmem, errp),
+ dev);
+
  out:
  error_propagate(errp, local_err);
  }
@@ -145,10 +152,13 @@ void dimm_memory_unplug(DeviceState *dev, 
MemoryHotplugState *hpms,
 MemoryRegion *mr)
  {
  DIMMDevice *dimm = DIMM(dev);
+MemoryRegion *backend_mr;
+
+backend_mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);

  numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
  memory_region_del_subregion(&hpms->mr, mr);
-vmstate_unregister_ram(mr, dev);
+vmstate_unregister_ram(backend_mr, dev);
  }

  int qmp_dimm_device_list(Object *obj, void *opaque)
--
1.8.3.1






Re: [Qemu-devel] [PATCH v9 44/56] Postcopy: Use helpers to map pages during migration

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> In postcopy, the destination guest is running at the same time
> as it's receiving pages; as we receive new pages we must put
> them into the guests address space atomically to avoid a running
> CPU accessing a partially written page.
>
> Use the helpers in postcopy-ram.c to map these pages.
>
> qemu_get_buffer_in_place is used to avoid a copy out of qemu_file
> in the case that postcopy is going to do a copy anyway.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PULL 0/3] target-i386: tcg: Handle clflushopt/clwb/pcommit instructions

2015-11-09 Thread Peter Maydell
On 7 November 2015 at 14:54, Eduardo Habkost  wrote:
> The following changes since commit 4b59f39bc9a03afcc74b2fa28da7c3189fca507c:
>
>   Merge remote-tracking branch 
> 'remotes/mjt/tags/pull-trivial-patches-2015-11-06' into staging (2015-11-06 
> 12:50:24 +)
>
> are available in the git repository at:
>
>   git://github.com/ehabkost/qemu.git tags/x86-pull-request
>
> for you to fetch changes up to 0c47242b519a224279f13c685aa6e79347f97b85:
>
>   target-i386: Add clflushopt/clwb/pcommit to TCG_7_0_EBX_FEATURES 
> (2015-11-06 12:19:33 -0200)
>
> 
> target-i386: tcg: Handle clflushopt/clwb/pcommit instructions
>
> A small update to TCG code so it can handle the new
> clflushopt/clwb/pcommit instructions.
>
> 

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH 09/16] target-arm: Support multiple address spaces in page table walks

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 11:19, Paolo Bonzini  wrote:
>
>
> On 09/11/2015 12:09, Peter Maydell wrote:
 >> I could have handled that by making the CPU init code always
 >> register two ASes (using the same one twice if the board code
 >> only passes one or using system_address_space twice if the
 >> board code doesn't pass one at all), but that seemed a bit wasteful.
>>> >
>>> > I think it's simpler though.  Complicating the init code is better than
>>> > handling the condition throughout all the helpers...
>> It was the overhead of having an extra AddressSpace that concerned
>> me (plus it shows up in things like 'info mtree' somewhat confusingly
>> if you didn't expect your board to really have 2 ASes).
>
> I don't think it shows up twice with address_space_init_shareable, does it?

Yes, looking at the code you're right.

-- PMM



Re: [Qemu-devel] [PATCH v9 48/56] Host page!=target page: Cleanup bitmaps

2015-11-09 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Prior to the start of postcopy, ensure that everything that will
> be transferred later is a whole host-page in size.
>
> This is accomplished by discarding partially transferred host pages
> and marking any that are partially dirty as fully dirty.
>
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Juan Quintela 



Re: [Qemu-devel] [PATCH v6 06/33] acpi: add aml_method_serialized

2015-11-09 Thread Xiao Guangrong



On 11/09/2015 07:14 PM, Michael S. Tsirkin wrote:

On Fri, Oct 30, 2015 at 01:56:00PM +0800, Xiao Guangrong wrote:

It avoid explicit Mutex and will be used by NVDIMM ACPI

Signed-off-by: Xiao Guangrong 


I'd rather you squashed these utility patches in with where
the code is used. This is just making it harder to review
as I have to jump back and forth.


Okay to me.




---
  hw/acpi/aml-build.c | 26 --
  include/hw/acpi/aml-build.h |  1 +
  2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9f792ab..8bee8b2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -696,14 +696,36 @@ Aml *aml_while(Aml *predicate)
  }

  /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefMethod */
-Aml *aml_method(const char *name, int arg_count)
+static Aml *__aml_method(const char *name, int arg_count, bool serialized)


Please don't prefix names with __.
what should you call this?
For example, you can call it aml_method_serialized.


Igor disliked that aml_method_serialized() is spepated from aml_method(), so i
will unify them as a single function instead. :)




  {
  Aml *var = aml_bundle(0x14 /* MethodOp */, AML_PACKAGE);
+int methodflags;
+
+/*
+ * MethodFlags:
+ *   bit 0-2: ArgCount (0-7)
+ *   bit 3: SerializeFlag
+ * 0: NotSerialized
+ * 1: Serialized
+ *   bit 4-7: reserved (must be 0)
+ */
+assert(!(arg_count & ~7));


Or shorter assert(arg_count < 8);


Good to me.



Re: [Qemu-devel] [PATCH v6 05/33] acpi: add aml_object_type

2015-11-09 Thread Michael S. Tsirkin
On Fri, Oct 30, 2015 at 01:55:59PM +0800, Xiao Guangrong wrote:
> Implement ObjectType which is used by NVDIMM _DSM method in
> later patch
> 
> Signed-off-by: Xiao Guangrong 

I had to go dig in the _DSM patch to see how it's used.
And sure enough, callers have to know AML to make
sense of it. So pls don't split out tiny patches like this.
include the callee with the caller.

> ---
>  hw/acpi/aml-build.c | 8 
>  include/hw/acpi/aml-build.h | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index efc06ab..9f792ab 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1178,6 +1178,14 @@ Aml *aml_concatenate(Aml *source1, Aml *source2, Aml 
> *target)
>  return var;
>  }
>  
> +/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefObjectType */
> +Aml *aml_object_type(Aml *object)
> +{
> +Aml *var = aml_opcode(0x8E /* ObjectTypeOp */);
> +aml_append(var, object);
> +return var;
> +}
> +

It would be better to have a higher level API
that can be used without knowning AML.
For example:

aml_object_type_is_package()



>  void
>  build_header(GArray *linker, GArray *table_data,
>   AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 325782d..5b8a118 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -278,6 +278,7 @@ Aml *aml_derefof(Aml *arg);
>  Aml *aml_sizeof(Aml *arg);
>  Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
>  Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target);
> +Aml *aml_object_type(Aml *object);
>  
>  void
>  build_header(GArray *linker, GArray *table_data,
> -- 
> 1.8.3.1



Re: [Qemu-devel] [PATCH v3 0/9] block: Fixes for bdrv_drain

2015-11-09 Thread Stefan Hajnoczi
On Mon, Nov 09, 2015 at 10:56:39AM +0800, Fam Zheng wrote:
> v3: Don't reuse coroutine in bdrv_aio_ioctl. [Stefan]
> Recursely call .bdrv_drain callback only. [Stefan, Paolo]

I don't understand this change.  I thought you want .bdrv_drain() to be
called on the whole tree, but the latest code seems to call it on the
root and children only.  It doesn't recurse the only the root and first
level of the tree get .bdrv_drain() calls.  Is this intentional?


signature.asc
Description: PGP signature


Re: [Qemu-devel] Qemu: Guest Linux hangs on Mac OS X 10.11

2015-11-09 Thread Peter Maydell
On 9 November 2015 at 11:06, Peter Maydell  wrote:
> On 9 November 2015 at 10:21, Paolo Bonzini  wrote:
>> Hmm, so the list is pretty short:
>
> The good news is that my second bisect conclusively fingered
> the culprit for why the bug went away...
>
>>   configure: disable FORTIFY_SOURCE under clang
>
> ...the bad news is it's because this patch inadvertently makes
> all non-fortify-source compiles be no-optimization (and we already
> knew the bug didn't repro in an unoptimized build).

Yep, just confirmed that current master with commit b553a0428
reverted displays the bug again.

thanks
-- PMM



[Qemu-devel] [Minios-devel] [PATCH v4 0/] Begin to disentangle libxenctrl and provide some stable libraries

2015-11-09 Thread Ian Campbell
In <1431963008.4944.80.ca...@citrix.com> I proposed stabilising some
parts of the libxenctrl API/ABI by disaggregating into separate
libraries.

This is v5 of that set of series against:
xen
qemu-xen
qemu-xen-traditional
mini-os

NB: Samuel+minios-devel will only get the mini-os side and Stefano+qemu
-devel the qemu-xen side.

The code for all repos can be found in:

git://xenbits.xen.org/people/ianc/libxenctrl-split/xen.git  v5
git://xenbits.xen.org/people/ianc/libxenctrl-split/qemu-xen.git v5
git://xenbits.xen.org/people/ianc/libxenctrl-split/qemu-xen-traditional.git v5
git://xenbits.xen.org/people/ianc/libxenctrl-split/mini-os.git  v5

The tip of the xen.git branch contains an extra patch hacking Config.mk
to point to all the others above, which should get the correct things for
the HEAD of the branch, but not further back in time.

The new libraries here are:

 * libxentoollog: Common logging infrastructure
 * libxenevtchn: Userspace access to evtchns (via /dev/xen/evtchn etc)
 * libxengnttab: Userspace access to grant tables (via /dev/xen/gnt??? etc)
 * libxencall: Making hypercalls (i.e. the IOCTL_PRIVCMD_HYPERCALL type
   functionality)
 * libxenforeignmemory: Privileged mappings of foreign memory
   (IOCTL_PRIVCMD_MMAP et al)

The first three were actually pretty distinct within libxenctrl already and
have not changed in quite some time.

Although the other two are somewhat new they are based on top of long
standing stable ioctls, which gives me some confidence.

Nonetheless I would appreciate extra review of at least the interface
headers of all of these with a particular eye to the suitability of these
interfaces being maintained in an ABI (_B_, not _P_) stable way going
forward.

Still to come would be libraries for specific out of tree purposes
(device model, kexec), which would be adding new library at the same
level as libxc I think, rather than underneath, i.e. also using the
libraries split out here, but hopefully not libxenctrl itself.

The new libraries use linker version-scripts to hopefully make future
ABI changes be possible in a compatible way.

Since last time:

 * xen.git now has a substantial number of acks
 * mini-os side is now completely acked
 * qemu-xen is little over half acked/reviewed, and the rest are updated
   based on feedback.
 * qemu-xen-traditional one patch acked

The whole thing has been build tested on Linux (incl stubdoms), and on
FreeBSD. I have runtime on Linux with qemu-xen, qemu-xen-trad and stubdoms.

Neither NetBSD nor Solaris have been tested at all. It's certainly not
impossible that I've not got the #includes in the new files quite right.

http://xenbits.xen.org/people/ianc/libxenctrl-split/v5.html is the document
I've been using to try and track what I'm doing. It may not be all that
useful. The history of it is in the v5-with-doc branch of the xen.git
linked to above.

I propose that these precursors (and the corresponding qemu-xen-traditional 
+ mini-os patches) should go in now:

tools/Rules.mk: Properly handle libraries with recursive dependencies.
tools: Refactor "xentoollog" into its own library

I believe all the relevant/related patches are acked and this would make a
(small) dent in the set of things to keep in the air.

Ian.

___
Minios-devel mailing list
minios-de...@lists.xenproject.org
http://lists.xenproject.org/cgi-bin/mailman/listinfo/minios-devel

___
Minios-devel mailing list
minios-de...@lists.xenproject.org
http://lists.xenproject.org/cgi-bin/mailman/listinfo/minios-devel



  1   2   3   4   5   >