date:20170417

Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint

2017-04-17 Thread Hailiang Zhang


On 2017/4/18 11:55, Jason Wang wrote:


On 2017-04-17 19:04, Hailiang Zhang wrote:

Hi Jason,

On 2017/4/14 14:38, Jason Wang wrote:

On 2017-04-14 14:22, Hailiang Zhang wrote:

Hi Jason,

On 2017/4/14 13:57, Jason Wang wrote:

On 2017-02-22 17:31, Zhang Chen wrote:

On 02/22/2017 11:42 AM, zhanghailiang wrote:

When doing a checkpoint, we need to flush all the unhandled packets.
By using the filter notifier mechanism, we can easily notify
every compare object to do this processing, which runs inside
the compare threads as a coroutine.

Hi~ Jason and Hailiang.

I will send a patch set later adding a colo-compare notify mechanism for
Xen, similar to this patch.
I want to add a new chardev socket to colo-compare that connects to Xen
COLO, to notify it of checkpoint or failover, because we have no other
way to communicate with the Xen code.
That means we will have two notify mechanisms.
What do you think about this?


Thanks
Zhang Chen

I was thinking about the possibility of using a similar approach for colo
compare.
E.g. can we use a socket? This can save duplicated code, more or less.

Since there are already too many sockets used by the filters and COLO
(two unix sockets and two tcp sockets for each vNIC), I don't want to
introduce more ;) , but I'm not sure if it is possible to make it more
flexible and optional: abstract the duplicated code and pass the opened
fd (no matter whether it is an eventfd or a socket fd) as a parameter,
for example.
Is this way acceptable?

Thanks,
Hailiang

Yes, that's kind of what I want. We don't want to use two message
formats. Passing an opened fd needs management support, so we still need
a fallback if there's no management on top. For qemu/kvm, we can do
everything transparently to the CLI by e.g. socketpair() or others, but
the key is to have a unified message format.

After a deeper investigation, I think we can re-use most of the code.
Since there is no existing way to notify Xen (no?), we still need the
notify chardev socket (used to notify Xen; it is optional).
(http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen
notify chardev socket handler frame")

Yes, and actually you can use this for bi-directional communication. The
only difference is the implementation of comparing.


Besides, there is an existing QMP command 'xen-colo-do-checkpoint',

I don't see this in master?


Er, it has been merged already, please see migration/colo.c, void 
qmp_xen_colo_do_checkpoint(Error **errp);


we can re-use it to notify
colo-compare objects and other filter objects to do a checkpoint. For
the opposite direction, we use
the notify chardev socket (only for Xen).

Just want to make sure I understand the design, who will trigger this
command? Management?


The command will be issued by Xen (xc_save?). The existing
xen-colo-do-checkpoint command is currently only used to notify block
replication to do a checkpoint; we can re-use it for filters too.


Can we just use the socket?


I don't quite understand ...
As the code below shows, in this scenario
Xen notifies colo-compare and the filters to do a checkpoint by using the
QMP command, and colo-compare notifies Xen about a net inconsistency event
by using the new socket.


So the code will look like:
diff --git a/migration/colo.c b/migration/colo.c
index 91da936..813c281 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -224,7 +224,19 @@ ReplicationStatus
*qmp_query_xen_replication_status(Error **errp)

  void qmp_xen_colo_do_checkpoint(Error **errp)
  {
+Error *local_err = NULL;
+
  replication_do_checkpoint_all(errp);
+/* Notify colo-compare and other filters to do checkpoint */
+colo_notify_compares_event(NULL, COLO_CHECKPOINT, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+colo_notify_filters_event(COLO_CHECKPOINT, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+}
  }

  static void colo_send_message(QEMUFile *f, COLOMessage msg,
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 24e13f0..de975c5 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void)
  {
  notifier_list_notify(&colo_compare_notifiers,
  migrate_get_current());


KVM will use this notifier/callback approach, which lets us avoid the
redundant socket.


+if (s->notify_dev) {
+   /* Do something, notify remote side through notify dev */
+}
  }


If a notify socket is configured, we will send a message about the net
inconsistency event.



  void colo_compare_register_notifier(Notifier *notify)

How about this scenario?

See my reply above, and we need to unify the message format too. A raw
string is ok, but we'd better have something like TLV or others.


Agreed, we need it to be more standard.

Thanks,
Hailiang
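
To make the TLV suggestion concrete, here is a minimal sketch of a type-length-value frame in C. Nothing in it comes from the patches in this thread: the 1-byte type, the 2-byte big-endian length, the message type values, and every `example_` name are assumptions chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout: 1-byte type, 2-byte big-endian length, value. */
#define EXAMPLE_TLV_HEADER_LEN ((size_t)3)

/* Hypothetical message types; the thread does not define any. */
enum {
    EXAMPLE_MSG_CHECKPOINT = 1,
    EXAMPLE_MSG_INCONSISTENT = 2,
};

/* Encode one record into out. Returns bytes written, or 0 if out is too small. */
static size_t example_tlv_encode(uint8_t type, const void *value, uint16_t len,
                                 uint8_t *out, size_t out_size)
{
    if (out_size < EXAMPLE_TLV_HEADER_LEN + len) {
        return 0;
    }
    out[0] = type;
    out[1] = (uint8_t)(len >> 8);
    out[2] = (uint8_t)(len & 0xff);
    if (len > 0) {
        memcpy(out + EXAMPLE_TLV_HEADER_LEN, value, len);
    }
    return EXAMPLE_TLV_HEADER_LEN + len;
}

/*
 * Decode one record from in. On success, *value points into in (no copy).
 * Returns 0 on success, -1 if the buffer is shorter than the record claims.
 */
static int example_tlv_decode(const uint8_t *in, size_t in_size,
                              uint8_t *type, const uint8_t **value,
                              uint16_t *len)
{
    uint16_t n;

    if (in_size < EXAMPLE_TLV_HEADER_LEN) {
        return -1;
    }
    n = (uint16_t)((in[1] << 8) | in[2]);
    if (in_size - EXAMPLE_TLV_HEADER_LEN < n) {
        return -1;
    }
    *type = in[0];
    *len = n;
    *value = in + EXAMPLE_TLV_HEADER_LEN;
    return 0;
}
```

With a fixed header like this, both directions (checkpoint requests and inconsistency events) could share one parser, which is the property the discussion asks for; the actual field sizes would be up to the patch authors.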



Re: [Qemu-devel] [PATCH 09/19] migration: Create block capabilities for shared and enable

2017-04-17 Thread Juan Quintela

Eric Blake  wrote:
> On 04/17/2017 03:00 PM, Juan Quintela wrote:
>> This two capabilites were added through the command line.  Notice that
>
> s/This/These/
> s/capabilites/capabilities/
>
>> we just created them.  This is just the boilerplate.
>> 
>> Signed-off-by: Juan Quintela 
>> ---
>>  include/migration/migration.h |  3 +++
>>  migration/migration.c | 36 
>>  qapi-schema.json  |  7 ++-
>>  3 files changed, 45 insertions(+), 1 deletion(-)
>
> I think this is a nice cleanup, even if it is exposing the internal
> block migration (the 'migrate -b' stuff) that we really don't like (it
> is one of the things causing grief at the 2.9-rc4 stage), because users
> should be favoring NBD migration over internal block migration these days.

I am first in the queue to remove it, but each time we talk about
removing it, somebody appears to come and fix it :-p

So, if it has to stay, just update it with the times.

> Reviewed-by: Eric Blake

[Qemu-devel] [PATCH 2/2] target/openrisc: Implement EPH bit

2017-04-17 Thread Tim 'mithro' Ansell

Exception Prefix High (EPH) control bit of the Supervision Register
(SR).

The significant bits (31-12) of the vector offset address for each
exception depend on the setting of the Supervision Register (SR)'s EPH
bit and the Exception Vector Base Address Register (EVBAR).

If SR[EPH] is set, the vector offset is logically ORed with the offset
0xF000.

This means that if EPH is:
 * 0 - Exception vectors start at EVBAR
 * 1 - Exception vectors start at EVBAR | 0xF000

Signed-off-by: Tim 'mithro' Ansell 
---
 target/openrisc/interrupt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/openrisc/interrupt.c b/target/openrisc/interrupt.c
index 78f0ba9421..2c91fab380 100644
--- a/target/openrisc/interrupt.c
+++ b/target/openrisc/interrupt.c
@@ -69,6 +69,9 @@ void openrisc_cpu_do_interrupt(CPUState *cs)
 if (env->cpucfgr & CPUCFGR_EVBARP) {
 vect_pc |= env->evbar;
 }
+if (env->sr & SR_EPH) {
+vect_pc |= 0xf000;
+}
 env->pc = vect_pc;
 } else {
 cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
-- 
2.12.1
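
The vector computation described in the commit message can be sketched as a standalone C function. This is an illustration only: `example_vector_pc` and `EXAMPLE_EPH_OFFSET` are hypothetical names, the offset simply mirrors the constant as posted in the patch, and the OpenRISC architecture manual remains the authority on the exact value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset ORed in when SR[EPH] is set, copied from the patch as posted. */
#define EXAMPLE_EPH_OFFSET 0xf000u

/*
 * Compute the exception vector address.
 *   exception_index: exception number (each vector is 0x100 bytes apart)
 *   evbar_present:   true if CPUCFGR reports the EVBAR register
 *   evbar:           current EVBAR value (ignored if not present)
 *   eph_set:         true if SR[EPH] is set
 */
static uint32_t example_vector_pc(uint32_t exception_index, bool evbar_present,
                                  uint32_t evbar, bool eph_set)
{
    uint32_t vect_pc = exception_index << 8;

    if (evbar_present) {
        vect_pc |= evbar;
    }
    if (eph_set) {
        vect_pc |= EXAMPLE_EPH_OFFSET;
    }
    return vect_pc;
}
```

For example, exception 2 with no EVBAR and EPH clear gives 0x200, matching the pre-patch behaviour of `exception_index << 8`.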

[Qemu-devel] [PATCH 1/2] target/openrisc: Implement EVBAR register

2017-04-17 Thread Tim 'mithro' Ansell

Exception Vector Base Address Register (EVBAR) - This optional register
can be used to apply an offset to the exception vector addresses.

The significant bits (31-12) of the vector offset address for each
exception depend on the setting of the Supervision Register (SR)'s EPH
bit and the Exception Vector Base Address Register (EVBAR).

Its presence is indicated by the EVBARP bit in the CPU Configuration
Register (CPUCFGR).

Signed-off-by: Tim 'mithro' Ansell 
---
 target/openrisc/cpu.c| 2 ++
 target/openrisc/cpu.h| 7 +++
 target/openrisc/interrupt.c  | 6 +-
 target/openrisc/sys_helper.c | 7 +++
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index 7fd2b9a216..1524ed981a 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -134,6 +134,7 @@ static void or1200_initfn(Object *obj)
 
 set_feature(cpu, OPENRISC_FEATURE_OB32S);
 set_feature(cpu, OPENRISC_FEATURE_OF32S);
+set_feature(cpu, OPENRISC_FEATURE_EVBAR);
 }
 
 static void openrisc_any_initfn(Object *obj)
@@ -141,6 +142,7 @@ static void openrisc_any_initfn(Object *obj)
 OpenRISCCPU *cpu = OPENRISC_CPU(obj);
 
 set_feature(cpu, OPENRISC_FEATURE_OB32S);
+set_feature(cpu, OPENRISC_FEATURE_EVBAR);
 }
 
 typedef struct OpenRISCCPUInfo {
diff --git a/target/openrisc/cpu.h b/target/openrisc/cpu.h
index 418a0e6960..1958b72718 100644
--- a/target/openrisc/cpu.h
+++ b/target/openrisc/cpu.h
@@ -111,6 +111,11 @@ enum {
 CPUCFGR_OF32S = (1 << 7),
 CPUCFGR_OF64S = (1 << 8),
 CPUCFGR_OV64S = (1 << 9),
+/* CPUCFGR_ND = (1 << 10), */
+/* CPUCFGR_AVRP = (1 << 11), */
+CPUCFGR_EVBARP = (1 << 12),
+/* CPUCFGR_ISRP = (1 << 13), */
+/* CPUCFGR_AECSRP = (1 << 14), */
 };
 
 /* DMMU configure register */
@@ -200,6 +205,7 @@ enum {
 OPENRISC_FEATURE_OF32S = (1 << 7),
 OPENRISC_FEATURE_OF64S = (1 << 8),
 OPENRISC_FEATURE_OV64S = (1 << 9),
+OPENRISC_FEATURE_EVBAR = (1 << 12),
 };
 
 /* Tick Timer Mode Register */
@@ -289,6 +295,7 @@ typedef struct CPUOpenRISCState {
 uint32_t dmmucfgr;/* DMMU configure register */
 uint32_t immucfgr;/* IMMU configure register */
 uint32_t esr; /* Exception supervisor register */
+uint32_t evbar;   /* Exception vector base address register */
 uint32_t fpcsr;   /* Float register */
 float_status fp_status;
 
diff --git a/target/openrisc/interrupt.c b/target/openrisc/interrupt.c
index a2eec6fb32..78f0ba9421 100644
--- a/target/openrisc/interrupt.c
+++ b/target/openrisc/interrupt.c
@@ -65,7 +65,11 @@ void openrisc_cpu_do_interrupt(CPUState *cs)
 env->lock_addr = -1;
 
 if (cs->exception_index > 0 && cs->exception_index < EXCP_NR) {
-env->pc = (cs->exception_index << 8);
+hwaddr vect_pc = cs->exception_index << 8;
+if (env->cpucfgr & CPUCFGR_EVBARP) {
+vect_pc |= env->evbar;
+}
+env->pc = vect_pc;
 } else {
 cpu_abort(cs, "Unhandled exception 0x%x\n", cs->exception_index);
 }
diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c
index 60c3193656..6ba816249b 100644
--- a/target/openrisc/sys_helper.c
+++ b/target/openrisc/sys_helper.c
@@ -39,6 +39,10 @@ void HELPER(mtspr)(CPUOpenRISCState *env,
 env->vr = rb;
 break;
 
+case TO_SPR(0, 11): /* EVBAR */
+env->evbar = rb;
+break;
+
 case TO_SPR(0, 16): /* NPC */
 cpu_restore_state(cs, GETPC());
 /* ??? Mirror or1ksim in not trashing delayed branch state
@@ -206,6 +210,9 @@ target_ulong HELPER(mfspr)(CPUOpenRISCState *env,
 case TO_SPR(0, 4): /* IMMUCFGR */
 return env->immucfgr;
 
+case TO_SPR(0, 11): /* EVBAR */
+return env->evbar;
+
 case TO_SPR(0, 16): /* NPC (equals PC) */
 cpu_restore_state(cs, GETPC());
 return env->pc;
-- 
2.12.1

[Qemu-devel] [PATCH 0/2] targets/openrisc: Improve exception vectoring.

2017-04-17 Thread Tim 'mithro' Ansell

Hi,

This patch series improves the exception vectoring on the OpenRISC platform by
adding support for both the EVBAR register and EPH bit.

This is my first patch to upstream QEMU, so please do point out if I have done
anything silly.

Tim 'mithro' Ansell (2):
  target/openrisc: Implement EVBAR register
  target/openrisc: Implement EPH bit

 target/openrisc/cpu.c| 2 ++
 target/openrisc/cpu.h| 7 +++
 target/openrisc/interrupt.c  | 9 -
 target/openrisc/sys_helper.c | 7 +++
 4 files changed, 24 insertions(+), 1 deletion(-)

-- 
2.12.1

Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)

2017-04-17 Thread Liu, Yi L

> > Hi Peter,
> >
> > Skip address space switching is a good idea to support Passthru mode.
> > However, without the address space, the vfio notifier would not be
> > registered, thus vIOMMU emulator has no way to connect to host. It is
> > no harm if there is only map/unmap notifier. But if we have more
> > notifiers other than map/unmap, it may be a problem.
> >
> > I think we need to reconsider it here.
> 
> For now I think as switching is good to us in general. Could I know more
> context about this? Would it be okay to work on top of this in the future?
> 

Let me explain. To enable vSVM, we need to add a new notifier that binds the
gPASID table to the host. If an assigned device uses SVM in the guest, the
guest IOMMU driver would, as designed, set "pt" for it. This makes sure the
host second-level page table stores the GPA->HPA mapping, so that the pIOMMU
can do nested translation to achieve the gVA->GPA and GPA->HPA mappings.

So the situation is: if we want to keep the GPA->HPA mapping, we should skip
the address space switch, but then the vfio listener would not know that a
vIOMMU has been added, so no vfio notifier would be registered. But if we do
the switch, the GPA->HPA mapping is unmapped, and we need another way to build
the GPA->HPA mapping.

I think we have two choices here. Please let me know your ideas.

1. Skip the switch for "pt" mode and find another way to register vfio
notifiers. I'm not sure if this is workable; it's just a quick thought.

2. Do the switch, and rebuild the GPA->HPA mapping if the device uses "pt"
mode. For this option, I also have two ways to build the GPA->HPA mapping:
a) Walk all the memory region sections in address_space_memory and build the
mapping. address_space_memory.dispatch->map.sections stores all the mapped
sections.

b) Let the guest IOMMU driver mimic a 1:1 mapping for the devices that use
"pt" mode. In this way, the GPA->HPA mapping could be set up by walking the
guest SL page table. This is consistent with your vIOVA replay solution.

Also, I'm curious whether Tianyu's fault report framework needs to register
new notifiers.

Thanks,
Yi L

Re: [Qemu-devel] [PATCH v4 2/6] replication: add shared-disk and shared-disk-id options

2017-04-17 Thread Xie Changlong




On 04/12/2017 10:05 PM, zhanghailiang wrote:

We use these two options to identify which disk is shared.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Zhang Chen 
---
v4:
- Add proper comment for primary_disk (Stefan)
v2:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
  block/replication.c  | 43 +--
  qapi/block-core.json | 10 +-
  2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index bf3c395..418b81b 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@
  typedef struct BDRVReplicationState {
  ReplicationMode mode;
  int replication_state;
+bool is_shared_disk;
+char *shared_disk_id;
  BdrvChild *active_disk;
  BdrvChild *hidden_disk;
  BdrvChild *secondary_disk;
+BdrvChild *primary_disk;
  char *top_id;
  ReplicationState *rs;
  Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool 
failover,
  
  #define REPLICATION_MODE"mode"

  #define REPLICATION_TOP_ID  "top-id"
+#define REPLICATION_SHARED_DISK "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
  static QemuOptsList replication_runtime_opts = {
  .name = "replication",
  .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
  .name = REPLICATION_TOP_ID,
  .type = QEMU_OPT_STRING,
  },
+{
+.name = REPLICATION_SHARED_DISK_ID,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = REPLICATION_SHARED_DISK,
+.type = QEMU_OPT_BOOL,
+},
  { /* end of list */ }
  },
  };
@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
  QemuOpts *opts = NULL;
  const char *mode;
  const char *top_id;
+const char *shared_disk_id;
+BlockBackend *blk;
+BlockDriverState *tmp_bs;
  
  bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,

 false, errp);
@@ -125,12 +142,33 @@ static int replication_open(BlockDriverState *bs, QDict 
*options,
 "The option mode's value should be primary or secondary");
  goto fail;
  }
+s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+


What if the secondary side is supplied with 'REPLICATION_SHARED_DISK_ID'?
Please refer to f4f2539bc to perfect the logic.

 false);
+if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+if (!shared_disk_id) {
+error_setg(&local_err, "Missing shared disk blk option");
+goto fail;
+}
+s->shared_disk_id = g_strdup(shared_disk_id);
+blk = blk_by_name(s->shared_disk_id);
+if (!blk) {
+error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+goto fail;
+}
+/* We have a BlockBackend for the primary disk but use BdrvChild for
+ * consistency - active_disk, secondary_disk, etc are also BdrvChild.
+ */
+tmp_bs = blk_bs(blk);
+s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
+}
  
  s->rs = replication_new(bs, &replication_ops);
  
-ret = 0;

-
+qemu_opts_del(opts);
+return 0;
  fail:
+g_free(s->shared_disk_id);
  qemu_opts_del(opts);
  error_propagate(errp, local_err);
  
@@ -141,6 +179,7 @@ static void replication_close(BlockDriverState *bs)

  {
  BDRVReplicationState *s = bs->opaque;
  
+g_free(s->shared_disk_id);

  if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
  replication_stop(s->rs, false, NULL);
  }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 033457c..361c932 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2661,12 +2661,20 @@
  #  node who owns the replication node chain. Must not be given in
  #  primary mode.
  #
+# @shared-disk-id: Id of shared disk while is replication mode, if @shared-disk
+#  is true, this option is required (Since: 2.10)
+#

Further explanations:

@shared-disk-id must be given, and may only be given, when @shared-disk is
enabled on the primary side.


+# @shared-disk: To indicate whether or not a disk is shared by primary VM
+#   and secondary VM. (The default is false) (Since: 2.10)
+#

Further explanations:

@shared-disk must be either given on both sides or omitted on both sides.



  # Since: 2.9
  ##
  { 'struct': 'BlockdevOptionsReplication',
'base': 'BlockdevOptionsGenericFormat',
'data': { 'mode': 'ReplicationMode',
-'*top-id': 'str' } }
+'*top-id': 'str',
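
The review comments above describe a constraint between the two options, which could be checked along the following lines. This is only a sketch of the rule as stated in the review, not the code from the patch: `ExampleMode`, `example_check_shared_disk_opts`, and the error strings are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the replication mode of the node. */
typedef enum {
    EXAMPLE_MODE_PRIMARY,
    EXAMPLE_MODE_SECONDARY,
} ExampleMode;

/*
 * Validate the option combination.
 * Returns NULL if the combination is acceptable, otherwise a static
 * string describing the problem.
 */
static const char *example_check_shared_disk_opts(ExampleMode mode,
                                                  bool shared_disk,
                                                  const char *shared_disk_id)
{
    bool id_expected = shared_disk && mode == EXAMPLE_MODE_PRIMARY;

    if (shared_disk_id != NULL && !id_expected) {
        /* Covers the secondary side and the shared-disk=off case. */
        return "shared-disk-id is only valid with shared-disk on the primary";
    }
    if (shared_disk_id == NULL && id_expected) {
        return "shared-disk-id is required with shared-disk on the primary";
    }
    return NULL;
}
```

The rule that @shared-disk must match on both sides cannot be checked from one side's options alone, so it is left out of this sketch.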
+

Re: [Qemu-devel] [Xen-devel] [PATCH] configure: introduce --enable-xen-fb-backend

2017-04-17 Thread Juergen Gross

On 14/04/17 19:52, Stefano Stabellini wrote:
> On Fri, 14 Apr 2017, Juergen Gross wrote:
>> On 14/04/17 08:06, Oleksandr Andrushchenko wrote:
>>> On 04/14/2017 03:12 AM, Stefano Stabellini wrote:
>>>> On Tue, 11 Apr 2017, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko
>>>>>
>>>>> For some use cases when Xen framebuffer/input backend
>>>>> is not a part of Qemu it is required to disable it,
>>>>> because of conflicting access to input/display devices.
>>>>> Introduce additional configuration option for explicit
>>>>> input/display control.
>>>> In these cases when you don't want xenfb, why don't you just remove
>>>> "vfb" from the xl config file? QEMU only starts the xenfb backend when
>>>> requested by the toolstack.
>>>>
>>>> Is it because you have an alternative xenfb backend? If so, is it really
>>>> fully xenfb compatible, or is it a different protocol? If it is a
>>>> different protocol, I suggest you rename your frontend/backend PV device
>>>> name to something different from "vfb".
>>>>
>>> Well, offending part is vkbd actually (for multi-touch
>>> we run our own user-space backend which supports
>>> kbd/ptr/mtouch), but vfb and vkbd is the same backend
>>> in QEMU. So, I am ok for vfb, but just want vkbd off
>>> So, there are 2 options:
>>> 1. At compile time remove vkbd and still allow vfb
>>> 2. Remove xenfb completely, if acceptable (this is my case)
>>
>> What about adding a Xenstore entry for backend type and let qemu test
>> for it being not present or containing "qemu"?
> 
> That is what we do for the console, using the xenstore node "type". QEMU
> is "ioemu" while xenconsoled is "xenconsoled". Weirdly, instead of a
> backend node, it is a read-only frontend node, see
> tools/libxl/libxl_console.c:libxl__device_console_add.
> 
> Oleksandr, I am sorry to feature-creep this simple patch, but I think
> Juergen is right. But we cannot do it just for one protocol. We need to
> introduce a generic way to enable/disable backends in QEMU. Using a
> xenstore node is OK.

An alternative solution would be similar to qdisk/tap or qusb/vusb
backends: Use different device types on backend side while keeping
frontend side of Xenstore the same as today.

So today the vkbd backend nodes are:

/local/domain/0/backend/vkbd/

You could use:

/local/domain/0/backend/mtouch

and keep the frontend nodes (/local/domain//device/vkbd/), possibly
with additional feature node(s).

The qemu backend would have to check for the vkbd backend nodes to be
present before enabling the related backend.


Juergen

> 
> We could do exactly the same as the PV console, thus "type" = "ioemu",
> read-only, under the frontend xenstore directory. Or we could introduce
> new nodes. I would probably go for "backend-type" = "qemu" under the
> backend xenstore directory. I don't have a strong opinion about this. In
> the example below I'll use the PV console convention.
> 
> For starters:
> 
> * libxl needs to write the "type" node to xenstore for *all* protocols.
>   The "type" is not yet configurable.
> * qemu reads them for all backends, proceeds if "type" = "ioemu"
> 
> These should be two simple patches. Stage 2:
> 
> * we add options in the xl config file to configure any backend, libxl
>   set "type" accordingly (Maybe not *any*, but vif, vkbd, vfb could all
>   have a "type". It is OK if you only add an option for vkbd.)
> * non-QEMU backends, in particular Linux backends, also read the "type"
>   node and proceed if it's "linux"
> 
> Does this sound OK to you?
>
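
The decision each backend would make under the "type" node proposal can be sketched as a pure function, leaving the actual Xenstore read out of scope. `example_backend_should_start` is a hypothetical name, and treating a missing node as "start anyway" is an assumption taken from the "not present or containing" suggestion earlier in the thread, not settled behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether a backend should attach to a device.
 *   type_node: value read from the device's "type" node, or NULL if the
 *              node does not exist (e.g. written by an older toolstack)
 *   my_type:   the string this backend answers to, e.g. "ioemu" for QEMU
 */
static bool example_backend_should_start(const char *type_node,
                                         const char *my_type)
{
    if (type_node == NULL) {
        /* Assumption: keep today's behaviour when no type was written. */
        return true;
    }
    return strcmp(type_node, my_type) == 0;
}
```

A Linux backend would call the same check with "linux" as `my_type`, so exactly one backend claims each device once the toolstack writes the node.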

Re: [Qemu-devel] [PATCH 26/31] vdi: Avoid bitrot of debugging code

2017-04-17 Thread Stefan Weil


On 18.04.2017 at 03:33, Eric Blake wrote:

Rework the debug define so that we always get -Wformat checking,
even when debugging is disabled.

Signed-off-by: Eric Blake 
---


Reviewed-by: Stefan Weil 



 block/vdi.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index d12d9cd..a70b969 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -86,12 +86,18 @@
 #define DEFAULT_CLUSTER_SIZE (1 * MiB)

 #if defined(CONFIG_VDI_DEBUG)
-#define logout(fmt, ...) \
-fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
+#define VDI_DEBUG 1
 #else
-#define logout(fmt, ...) ((void)0)
+#define VDI_DEBUG 0
 #endif

+#define logout(fmt, ...) \
+do {\
+if (VDI_DEBUG) {\
+fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__); \
+}   \
+} while (0)
+
 /* Image signature. */
 #define VDI_SIGNATURE 0xbeda107f
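
The pattern in this patch can be reproduced in isolation. The sketch below is not the vdi code: `EXAMPLE_DEBUG`, `example_log`, and `example_sum` are made-up names, and `##__VA_ARGS__` is a GNU extension (accepted by GCC and Clang), as in the patch itself.

```c
#include <stdio.h>

/* Turn the optional build-time define into a constant that always exists. */
#if defined(EXAMPLE_DEBUG)
#define EXAMPLE_DEBUG_ENABLED 1
#else
#define EXAMPLE_DEBUG_ENABLED 0
#endif

/*
 * The fprintf() call is always seen by the compiler, so format strings and
 * arguments are type-checked even when debugging is off. With the constant
 * set to 0 the optimizer can drop the dead branch.
 */
#define example_log(fmt, ...)                                        \
    do {                                                             \
        if (EXAMPLE_DEBUG_ENABLED) {                                 \
            fprintf(stderr, "example\t%-24s" fmt, __func__,          \
                    ##__VA_ARGS__);                                  \
        }                                                            \
    } while (0)

static int example_sum(int a, int b)
{
    int sum = a + b;

    /* A mismatched specifier here would now warn in every build. */
    example_log("a=%d b=%d sum=%d\n", a, b, sum);
    return sum;
}
```

With the older `((void)0)` definition, a wrong format specifier inside `example_log()` would only be noticed by someone who built with the debug define enabled.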

Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)

2017-04-17 Thread Peter Xu

On Tue, Apr 18, 2017 at 04:30:40AM +, Liu, Yi L wrote:

[...]

> > +static void vtd_switch_address_space(VTDAddressSpace *as) {
> > +bool use_iommu;
> > +
> > +assert(as);
> > +
> > +use_iommu = as->iommu_state->dmar_enabled;
> > +if (use_iommu) {
> > +/* Further checks per-device configuration */
> > +use_iommu &= !vtd_dev_pt_enabled(as);
> > +}
> > +
> > +trace_vtd_switch_address_space(pci_bus_num(as->bus),
> > +   VTD_PCI_SLOT(as->devfn),
> > +   VTD_PCI_FUNC(as->devfn),
> > +   use_iommu);
> 
> Hi Peter,
> 
> Skip address space switching is a good idea to support Passthru mode.
> However, without the address space, the vfio notifier would not be
> registered, thus vIOMMU emulator has no way to connect to host. It is
> no harm if there is only map/unmap notifier. But if we have more notifiers
> other than map/unmap, it may be a problem.
> 
> I think we need to reconsider it here. 

For now I think address space switching is good for us in general. Could
I know more context about this? Would it be okay to work on top of this
in the future?

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)

2017-04-17 Thread Liu, Yi L

> -Original Message-
> From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel@nongnu.org] On
> Behalf Of Peter Xu
> Sent: Monday, April 17, 2017 7:32 PM
> To: qemu-devel@nongnu.org
> Cc: Lan, Tianyu ; Michael S . Tsirkin ;
> Jason Wang ; pet...@redhat.com; Marcel Apfelbaum
> ; David Gibson 
> Subject: [Qemu-devel] [PATCH v2 7/7] intel_iommu: support passthrough (PT)
> 
> Signed-off-by: Peter Xu 
> ---
>  hw/i386/intel_iommu.c  | 109 
> +
>  hw/i386/intel_iommu_internal.h |   1 +
>  hw/i386/trace-events   |   1 +
>  hw/i386/x86-iommu.c|   1 +
>  include/hw/i386/x86-iommu.h|   1 +
>  5 files changed, 93 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> 05ae631..deb2007 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -872,7 +872,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s,
> uint8_t bus_num,
>  } else {
>  switch (vtd_ce_get_type(ce)) {
>  case VTD_CONTEXT_TT_MULTI_LEVEL:
> -/* fall through */
> +case VTD_CONTEXT_TT_PASS_THROUGH:
>  case VTD_CONTEXT_TT_DEV_IOTLB:
>  break;
>  default:
> @@ -883,6 +883,73 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s,
> uint8_t bus_num,
>  return 0;
>  }
> 
> +/* Fetch translation type for specific device. Returns <0 if error
> + * happens, otherwise return the shifted type to check against
> + * VTD_CONTEXT_TT_*. */
> +static int vtd_dev_get_trans_type(VTDAddressSpace *as) {
> +IntelIOMMUState *s;
> +VTDContextEntry ce;
> +int ret;
> +
> +s = as->iommu_state;
> +
> +ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
> +   as->devfn, &ce);
> +if (ret) {
> +return ret;
> +}
> +
> +return vtd_ce_get_type(&ce);
> +}
> +
> +static bool vtd_dev_pt_enabled(VTDAddressSpace *as) {
> +int ret;
> +
> +assert(as);
> +
> +ret = vtd_dev_get_trans_type(as);
> +if (ret < 0) {
> +/*
> + * Possibly failed to parse the context entry for some reason
> + * (e.g., during init, or any guest configuration errors on
> + * context entries). We should assume PT not enabled for
> + * safety.
> + */
> +return false;
> +}
> +
> +return ret == VTD_CONTEXT_TT_PASS_THROUGH;
> +}
> +
> +static void vtd_switch_address_space(VTDAddressSpace *as) {
> +bool use_iommu;
> +
> +assert(as);
> +
> +use_iommu = as->iommu_state->dmar_enabled;
> +if (use_iommu) {
> +/* Further checks per-device configuration */
> +use_iommu &= !vtd_dev_pt_enabled(as);
> +}
> +
> +trace_vtd_switch_address_space(pci_bus_num(as->bus),
> +   VTD_PCI_SLOT(as->devfn),
> +   VTD_PCI_FUNC(as->devfn),
> +   use_iommu);

Hi Peter,

Skipping address space switching is a good idea for supporting passthrough
mode. However, without the address space, the vfio notifier would not be
registered, so the vIOMMU emulator has no way to connect to the host. That
is harmless if there is only the map/unmap notifier, but if we have more
notifiers than map/unmap, it may be a problem.

I think we need to reconsider this here.

Regards,
Yi L
> +/* Turn off first then on the other */
> +if (use_iommu) {
> +memory_region_set_enabled(&as->sys_alias, false);
> +memory_region_set_enabled(&as->iommu, true);
> +} else {
> +memory_region_set_enabled(&as->iommu, false);
> +memory_region_set_enabled(&as->sys_alias, true);
> +}
> +}
> +
>  static inline uint16_t vtd_make_source_id(uint8_t bus_num, uint8_t devfn)
>  {
>  return ((bus_num & 0xffUL) << 8) | (devfn & 0xffUL);
> @@ -991,6 +1058,18 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
>  cc_entry->context_cache_gen = s->context_cache_gen;
>  }
> 
> +/*
> + * We don't need to translate for pass-through context entries.
> + * Also, let's ignore IOTLB caching as well for PT devices.
> + */
> +if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
> +entry->translated_addr = entry->iova;
> +entry->addr_mask = VTD_PAGE_SIZE - 1;
> +entry->perm = IOMMU_RW;
> +trace_vtd_translate_pt(source_id, entry->iova);
> +return;
> +}
> +
>  ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
> &reads, &writes);
>  if (ret_fr) {
> @@ -1135,6 +1214,11 @@ static void
> vtd_context_device_invalidate(IntelIOMMUState *s,
>   VTD_PCI_FUNC(devfn_it));
>  vtd_as->context_cache_entry.context_cache_gen = 0;
>  /*
> + * Do switch address space when needed, in case if the
> + * device passthrough bit is switche

Re: [Qemu-devel] [PATCH 7/7] intel_iommu: support passthrough (PT)

2017-04-17 Thread Peter Xu

On Tue, Apr 18, 2017 at 12:00:13PM +0800, Jason Wang wrote:
> 
> 
> On 2017-04-18 11:50, Peter Xu wrote:
> >On Tue, Apr 18, 2017 at 11:23:35AM +0800, Jason Wang wrote:
> >>On 2017-04-17 18:58, Peter Xu wrote:
> >[...]
> >
> >>>+static void vtd_switch_address_space(VTDAddressSpace *as)
> >>>+{
> >>>+bool use_iommu;
> >>>+
> >>>+assert(as);
> >>>+
> >>>+use_iommu = as->iommu_state->dmar_enabled;
> >>>+if (use_iommu) {
> >>>+/* Further checks per-device configuration */
> >>>+use_iommu &= !vtd_dev_pt_enabled(as);
> >>>+}
> >>Looks like you can use as->iommu_state->dmar_enabled &&
> >>!vtd_dev_pt_enabled(as)
> >vtd_dev_pt_enabled() needs to read the guest memory (starting from
> >reading root entry), which is slightly slow. I was trying to avoid
> >unnecessary reads.
> >
> >[...]
> 
> I think compiler won't go to vtd_dev_pt_enabled() if dmar_enabled is false.

You are right. I'll switch.
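
The point about `&&` can be checked with a small standalone program. C guarantees that the right-hand operand of `&&` is not evaluated when the left-hand one is false, so this is a language rule rather than a compiler optimization. The names below are hypothetical stand-ins, with a call counter playing the role of the slow guest-memory read.

```c
#include <stdbool.h>

/* Counts how often the "expensive" check actually runs. */
static int example_slow_check_calls;

/* Stand-in for a check that has to read guest memory. */
static bool example_dev_pt_enabled(void)
{
    example_slow_check_calls++;
    return true;
}

/* Same shape as the suggested expression: cheap test first, slow test second. */
static bool example_use_iommu(bool dmar_enabled)
{
    return dmar_enabled && !example_dev_pt_enabled();
}
```

Calling `example_use_iommu(false)` leaves the counter untouched, which is the behaviour the explicit `if (use_iommu)` guard in the patch was written to obtain.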

> 
> >
> >>>@@ -991,6 +1058,18 @@ static void vtd_do_iommu_translate(VTDAddressSpace 
> >>>*vtd_as, PCIBus *bus,
> >>>  cc_entry->context_cache_gen = s->context_cache_gen;
> >>>  }
> >>>+/*
> >>>+ * We don't need to translate for pass-through context entries.
> >>>+ * Also, let's ignore IOTLB caching as well for PT devices.
> >>>+ */
> >>>+if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
> >>>+entry->translated_addr = entry->iova;
> >>>+entry->addr_mask = VTD_PAGE_SIZE - 1;
> >>>+entry->perm = IOMMU_RW;
> >>>+trace_vtd_translate_pt(source_id, entry->iova);
> >>>+return;
> >>>+}
> >>Several questions here:
> >>
> >>1) Is this just for vhost?
> >No. When caching mode is not enabled, all passthroughed devices should
> >be using this path.
> 
> Ok, then it looks better to switch the address space if we've found it was
> PT?

Do you mean to switch in that if() above? Then, when the context entry
is invalidated, we switch back if needed?

> 
> >
> >>2) Since this is done after IOTLB querying, do we need flush IOTLB during
> >>address switching?
> >IMHO if guest switches address space for a device, it is required to
> >send IOTLB flush as well for that device/domain.
> >
> >[...]
> 
> Ok.
> 
> >
> >>>  static void vtd_switch_address_space_all(IntelIOMMUState *s)
> >>>  {
> >>>  GHashTableIter iter;
> >>>@@ -2849,6 +2914,10 @@ static void vtd_init(IntelIOMMUState *s)
> >>>  s->ecap |= VTD_ECAP_DT;
> >>>  }
> >>>+if (x86_iommu->pt_supported) {
> >>>+s->ecap |= VTD_ECAP_PT;
> >>>+}
> >>Since we support migration now, need compat this for pre 2.10.
> >Oh yes. If I set pt=off by default, it should be okay then, right?
> 
> Right, but I think it's better to keep this on by default for performance
> reason.

Okay. Just to confirm, that'll need one entry for HW_COMPAT_2_9,
right? (though it is still not there)

-- 
Peter Xu

[Qemu-devel] define constant in .risu file

2017-04-17 Thread G 3

Is there a way to define a constant in a .risu file? Something like this:


my $upper_imm_limit = 500;

Re: [Qemu-devel] [PATCH v2 1/7] memory: tune last param of iommu_ops.translate()

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 07:32:04PM +0800, Peter Xu wrote:
> This patch converts the old "is_write" bool into IOMMUAccessFlags. The
> difference is that "is_write" can only express either read/write, but
> sometimes what we really want is "none" here (neither read nor write).
> Replay is a good example - during replay, we should not check any RW
> permission bits since that's not an actual IO at all.
> 
> CC: Paolo Bonzini 
> CC: David Gibson 
> Signed-off-by: Peter Xu 

Reviewed-by: David Gibson 

spapr specific part,

Acked-by: David Gibson 
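
The difference between a bool and a flags value can be shown with a short sketch. The enum below uses hypothetical `EXAMPLE_` names; its values are assumptions chosen so that read is bit 0 and write is bit 1, which is what the `iotlb.perm & (1 << is_write)` test in the quoted diff implies.

```c
#include <stdbool.h>

/* Assumed layout: bit 0 = read, bit 1 = write. */
typedef enum {
    EXAMPLE_IOMMU_NONE = 0,
    EXAMPLE_IOMMU_RO = 1,
    EXAMPLE_IOMMU_WO = 2,
    EXAMPLE_IOMMU_RW = 3,
} ExampleAccessFlags;

/* What callers that still hold a bool do at the call site. */
static ExampleAccessFlags example_flags_from_is_write(bool is_write)
{
    return is_write ? EXAMPLE_IOMMU_WO : EXAMPLE_IOMMU_RO;
}

/*
 * True if every requested permission bit is granted. A request of
 * EXAMPLE_IOMMU_NONE asks for nothing, so it always passes; that is the
 * replay case, which a bool could not express.
 */
static bool example_access_allowed(ExampleAccessFlags granted,
                                   ExampleAccessFlags wanted)
{
    return (granted & wanted) == (int)wanted;
}
```

A replay caller would pass `EXAMPLE_IOMMU_NONE` and get the translation back without tripping a permission fault on a read-only or write-only mapping.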

> ---
>  exec.c   |  6 --
>  hw/alpha/typhoon.c   |  2 +-
>  hw/dma/rc4030.c  |  2 +-
>  hw/i386/amd_iommu.c  |  4 ++--
>  hw/i386/intel_iommu.c|  4 ++--
>  hw/pci-host/apb.c|  2 +-
>  hw/ppc/spapr_iommu.c |  2 +-
>  hw/s390x/s390-pci-bus.c  |  2 +-
>  hw/s390x/s390-pci-inst.c |  2 +-
>  include/exec/memory.h| 10 --
>  memory.c |  3 ++-
>  11 files changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index c97ef4a..188892b 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -475,7 +475,8 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace 
> *as, hwaddr addr,
>  break;
>  }
>  
> -iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +iotlb = mr->iommu_ops->translate(mr, addr, is_write ?
> + IOMMU_WO : IOMMU_RO);
>  if (!(iotlb.perm & (1 << is_write))) {
>  iotlb.target_as = NULL;
>  break;
> @@ -507,7 +508,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, 
> hwaddr addr,
>  break;
>  }
>  
> -iotlb = mr->iommu_ops->translate(mr, addr, is_write);
> +iotlb = mr->iommu_ops->translate(mr, addr, is_write ?
> + IOMMU_WO : IOMMU_RO);
>  addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>  | (addr & iotlb.addr_mask));
>  *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1);
> diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
> index f50f5cf..c1cf780 100644
> --- a/hw/alpha/typhoon.c
> +++ b/hw/alpha/typhoon.c
> @@ -664,7 +664,7 @@ static bool window_translate(TyphoonWindow *win, hwaddr 
> addr,
>  /* TODO: A translation failure here ought to set PCI error codes on the
> Pchip and generate a machine check interrupt.  */
>  static IOMMUTLBEntry typhoon_translate_iommu(MemoryRegion *iommu, hwaddr 
> addr,
> - bool is_write)
> + IOMMUAccessFlags flag)
>  {
>  TyphoonPchip *pchip = container_of(iommu, TyphoonPchip, iommu);
>  IOMMUTLBEntry ret;
> diff --git a/hw/dma/rc4030.c b/hw/dma/rc4030.c
> index 0080141..edf9432 100644
> --- a/hw/dma/rc4030.c
> +++ b/hw/dma/rc4030.c
> @@ -489,7 +489,7 @@ static const MemoryRegionOps jazzio_ops = {
>  };
>  
>  static IOMMUTLBEntry rc4030_dma_translate(MemoryRegion *iommu, hwaddr addr,
> -  bool is_write)
> +  IOMMUAccessFlags flag)
>  {
>  rc4030State *s = container_of(iommu, rc4030State, dma_mr);
>  IOMMUTLBEntry ret = {
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index f86a40a..42b34ef 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -987,7 +987,7 @@ static inline bool amdvi_is_interrupt_addr(hwaddr addr)
>  }
>  
>  static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
> - bool is_write)
> + IOMMUAccessFlags flag)
>  {
>  AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
>  AMDVIState *s = as->iommu_state;
> @@ -1016,7 +1016,7 @@ static IOMMUTLBEntry amdvi_translate(MemoryRegion 
> *iommu, hwaddr addr,
>  return ret;
>  }
>  
> -amdvi_do_translate(as, addr, is_write, &ret);
> +amdvi_do_translate(as, addr, flag & IOMMU_WO, &ret);
>  trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
>  PCI_FUNC(as->devfn), addr, ret.translated_addr);
>  return ret;
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 02f047c..ea54ec3 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2221,7 +2221,7 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
>  }
>  
>  static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion *iommu, hwaddr addr,
> - bool is_write)
> + IOMMUAccessFlags flag)
>  {
>  VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>  IntelIOMMUState *s = vtd_as->iommu_state;
> @@ -2243,7 +2243,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion 
> *iommu, hwaddr addr,
>  }
>  
>  vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn, addr,
> -
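
A minimal standalone sketch of the conversion the exec.c hunks above perform, written outside QEMU. The enum values are assumed to mirror QEMU's IOMMUAccessFlags (one bit per direction, which is what the "1 << is_write" check in the patch relies on). The helper names access_flag_from_is_write() and perm_allows() are hypothetical and do not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror QEMU's IOMMUAccessFlags: one bit per direction. */
typedef enum {
    IOMMU_NONE = 0,
    IOMMU_RO   = 1,
    IOMMU_WO   = 2,
    IOMMU_RW   = 3,
} IOMMUAccessFlags;

/* Hypothetical helper: the call sites in the patch do this inline with
 * "is_write ? IOMMU_WO : IOMMU_RO". */
static IOMMUAccessFlags access_flag_from_is_write(bool is_write)
{
    IOMMUAccessFlags flag = is_write ? IOMMU_WO : IOMMU_RO;

    /* Same bit that the existing "1 << is_write" permission check tests. */
    assert((int)flag == (1 << (is_write ? 1 : 0)));
    return flag;
}

/* Hypothetical helper: does the permission of a translated entry cover the
 * requested access?  IOMMU_NONE requests nothing, so it always passes. */
static bool perm_allows(IOMMUAccessFlags perm, IOMMUAccessFlags wanted)
{
    return (perm & wanted) == wanted;
}
```

The point of the third value is visible in perm_allows(): a caller that passes IOMMU_NONE, as replay wants to, is never rejected on permission grounds, which a bool could not express.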

Re: [Qemu-devel] [PATCH v2 2/7] memory: remove the last param in memory_region_iommu_replay()

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 07:32:05PM +0800, Peter Xu wrote:
> We were always passing in that one as "false" to assume that's a read
> operation, and we also assume that IOMMU translation would always have
> that read permission. A better permission would be IOMMU_NONE since the
> replay is after all not a real read operation, but just a page table
> rebuilding process.
> 
> CC: David Gibson 
> CC: Paolo Bonzini 
> Signed-off-by: Peter Xu 

Reviewed-by: David Gibson 

> ---
>  hw/vfio/common.c  | 2 +-
>  include/exec/memory.h | 5 +
>  memory.c  | 8 +++-
>  3 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6b33b9f..d008a4b 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -488,7 +488,7 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>  
>  memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> -memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
> +memory_region_iommu_replay(giommu->iommu, &giommu->n);
>  
>  return;
>  }
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 9047bf3..8721d53 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -731,11 +731,8 @@ void memory_region_register_iommu_notifier(MemoryRegion 
> *mr,
>   *
>   * @mr: the memory region to observe
>   * @n: the notifier to which to replay iommu mappings
> - * @is_write: Whether to treat the replay as a translate "write"
> - * through the iommu
>   */
> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> -bool is_write);
> +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n);
>  
>  /**
>   * memory_region_iommu_replay_all: replay existing IOMMU translations
> diff --git a/memory.c b/memory.c
> index 47dc107..6b2fdb7 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1620,12 +1620,10 @@ uint64_t 
> memory_region_iommu_get_min_page_size(MemoryRegion *mr)
>  return TARGET_PAGE_SIZE;
>  }
>  
> -void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
> -bool is_write)
> +void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n)
>  {
>  hwaddr addr, granularity;
>  IOMMUTLBEntry iotlb;
> -IOMMUAccessFlags flag = is_write ? IOMMU_WO : IOMMU_RO;
>  
>  /* If the IOMMU has its own replay callback, override */
>  if (mr->iommu_ops->replay) {
> @@ -1636,7 +1634,7 @@ void memory_region_iommu_replay(MemoryRegion *mr, 
> IOMMUNotifier *n,
>  granularity = memory_region_iommu_get_min_page_size(mr);
>  
>  for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
> -iotlb = mr->iommu_ops->translate(mr, addr, flag);
> +iotlb = mr->iommu_ops->translate(mr, addr, IOMMU_NONE);
>  if (iotlb.perm != IOMMU_NONE) {
>  n->notify(n, &iotlb);
>  }
> @@ -1654,7 +1652,7 @@ void memory_region_iommu_replay_all(MemoryRegion *mr)
>  IOMMUNotifier *notifier;
>  
>  IOMMU_NOTIFIER_FOREACH(notifier, mr) {
> -memory_region_iommu_replay(mr, notifier, false);
> +memory_region_iommu_replay(mr, notifier);
>  }
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
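
A minimal standalone sketch of the replay loop touched by the memory.c hunk above, assuming a toy translate callback in place of mr->iommu_ops->translate. All toy_* names and the ToyTLBEntry type are hypothetical; only the loop shape (walk the region in page-sized steps, translate with IOMMU_NONE, notify for every entry that has some permission) is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;

typedef enum {
    IOMMU_NONE = 0,
    IOMMU_RO   = 1,
    IOMMU_WO   = 2,
    IOMMU_RW   = 3,
} IOMMUAccessFlags;

/* Hypothetical, much reduced stand-in for IOMMUTLBEntry. */
typedef struct {
    hwaddr iova;
    IOMMUAccessFlags perm;
} ToyTLBEntry;

typedef ToyTLBEntry (*toy_translate_fn)(hwaddr addr, IOMMUAccessFlags flag);
typedef void (*toy_notify_fn)(const ToyTLBEntry *entry, void *opaque);

/* Walk [0, size) in granularity-sized steps and notify for every mapped
 * page.  Returns the number of notifications sent. */
static unsigned toy_iommu_replay(toy_translate_fn translate, hwaddr size,
                                 hwaddr granularity,
                                 toy_notify_fn notify, void *opaque)
{
    unsigned notified = 0;

    assert(granularity != 0);
    for (hwaddr addr = 0; addr < size; addr += granularity) {
        /* Replay is not a real access, so no permission is requested. */
        ToyTLBEntry entry = translate(addr, IOMMU_NONE);

        if (entry.perm != IOMMU_NONE) {
            notify(&entry, opaque);
            notified++;
        }
    }
    return notified;
}

/* Toy page table: only even-numbered 4 KiB pages are mapped. */
static ToyTLBEntry toy_translate_even_pages(hwaddr addr, IOMMUAccessFlags flag)
{
    ToyTLBEntry entry;

    (void)flag; /* the toy table ignores the requested access */
    entry.iova = addr;
    entry.perm = ((addr >> 12) % 2 == 0) ? IOMMU_RW : IOMMU_NONE;
    return entry;
}

/* Toy notifier: counts how many entries it has been shown. */
static void toy_count_notify(const ToyTLBEntry *entry, void *opaque)
{
    (void)entry;
    (*(unsigned *)opaque)++;
}
```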

Re: [Qemu-devel] [PATCH v3 0/5] FTGMAC100 nic model for the Aspeed SoCs

2017-04-17 Thread Jason Wang




On 2017-04-14 16:34, Cédric Le Goater wrote:

Hi,

The Aspeed SoCs AST2400 and AST2500 have two FTGMAC100 ethernet
controllers. This series proposes a model for this device and a way to
customize the bit definitions which are slightly different from the
Faraday definitions.

The last patch adds a fake NC-SI (Network Controller Sideband
Interface) backend to pretend a NIC is being managed. This is only
usable with the slirp stack for the moment.

The model has been tested on the 'palmetto', 'romulus' and
'ast2500-evb' machines using different implementations of the Linux
driver and with U-Boot. It has been stressed with iperf.

The full patchset is available here, based on QEMU v2.9.0-rc4 :

   https://github.com/legoater/qemu/commits/aspeed

To test, grab a palmetto or a romulus BMC firmware image:

   https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/images/palmetto/flash-palmetto
   https://openpower.xyz/job/openbmc-build/distro=ubuntu,target=romulus/lastSuccessfulBuild/artifact/images/romulus/flash-romulus

and start the machines with:

   qemu-system-arm -m 512 -M romulus-bmc -drive file=flash-romulus,format=raw,if=mtd -nographic
   qemu-system-arm -m 512 -M palmetto-bmc -drive file=flash-palmetto,format=raw,if=mtd -nographic

the fake NC-SI NIC should come up :

   ftgmac100 1e66.ethernet eth0: NCSI interface up

and to log in, use the credentials root/0penBmc

Thanks,

C.

Changes since v2:
  - rebased on QEMU v2.9.0-rc4
  - modified the 'ncsi-pkt.h' file to add its location under the Linux
tree
  - removed useless zeroing of the 'reserved' fields
  - introduced a ncsi_rsp_handlers array to hold the payload size and
a specific handler routine for each command
  - fixed the output size of the frame to match the command payload
size

Changes since v1:
  - removed TODO comments and used LOG_UNIMP in the read/write ops
  - used camelcase for struct names and typedefs.
  - removed the useless struct definitions for ring descriptors and
the alignment pragma
  - introduced a frame buffer at the machine level to reduce stack
usage in the tx path.
  - introduced symbolic constants for PHY values.
  - introduced rtl8211e PHY chip specific registers
  - removed qemu_set_irq() in reset path
  - checked for dma_memory_read() errors. Also for write but that was
less important as the descriptor is first read so it should be
valid for the write.
  - removed the irq state
  - removed the weird hack to catch a first valid descriptor.
  - fixed the read of the mac address

Cédric Le Goater (5):
   hw/net: add MII definitions
   net: add FTGMAC100 support
   net/ftgmac100: add a 'aspeed' property
   aspeed: add a FTGMAC100 nic
   slirp: add a fake NC-SI backend

  default-configs/arm-softmmu.mak |1 +
  hw/arm/aspeed_soc.c |   21 +
  hw/net/Makefile.objs|1 +
  hw/net/ftgmac100.c  | 1016 +++
  include/hw/arm/aspeed_soc.h |2 +
  include/hw/net/ftgmac100.h  |   64 +++
  include/hw/net/mii.h|   71 ++-
  include/net/eth.h   |1 +
  slirp/Makefile.objs |2 +-
  slirp/ncsi-pkt.h|  419 
  slirp/ncsi.c|  130 +
  slirp/slirp.c   |4 +
  slirp/slirp.h   |3 +
  13 files changed, 1716 insertions(+), 19 deletions(-)
  create mode 100644 hw/net/ftgmac100.c
  create mode 100644 include/hw/net/ftgmac100.h
  create mode 100644 slirp/ncsi-pkt.h
  create mode 100644 slirp/ncsi.c



Queued for 2.10.

Thanks
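
A minimal standalone sketch of the table-driven approach mentioned in the v2 changelog above (one array entry per command holding the payload size and an optional handler). This is not the code from slirp/ncsi.c: the toy_* names, the command codes, the header length and the payload sizes are all hypothetical and chosen only to illustrate how the output frame size can follow from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HDR_LEN 16 /* hypothetical fixed response header size */

typedef struct {
    uint8_t type;                                   /* command code */
    size_t payload;                                 /* payload bytes */
    int (*handler)(uint8_t *payload, size_t len);   /* may be NULL */
} ToyRspHandler;

/* Hypothetical handler: pretend the link is up. */
static int toy_fill_link_status(uint8_t *payload, size_t len)
{
    assert(len >= 1);
    payload[0] = 1;
    return 0;
}

/* One entry per supported command; codes and sizes are made up. */
static const ToyRspHandler toy_rsp_handlers[] = {
    { 0x01, 4,  NULL },
    { 0x02, 16, toy_fill_link_status },
};

static const ToyRspHandler *toy_lookup(uint8_t type)
{
    size_t count = sizeof(toy_rsp_handlers) / sizeof(toy_rsp_handlers[0]);

    for (size_t i = 0; i < count; i++) {
        if (toy_rsp_handlers[i].type == type) {
            return &toy_rsp_handlers[i];
        }
    }
    return NULL;
}

/* Build a response into frame.  Returns the total frame length, or 0 if
 * the command is unknown, the buffer is too small, or the handler fails. */
static size_t toy_build_response(uint8_t cmd_type, uint8_t *frame,
                                 size_t frame_size)
{
    const ToyRspHandler *h = toy_lookup(cmd_type);

    if (!h || frame_size < TOY_HDR_LEN + h->payload) {
        return 0;
    }
    memset(frame, 0, TOY_HDR_LEN + h->payload);
    frame[0] = cmd_type;
    if (h->handler && h->handler(frame + TOY_HDR_LEN, h->payload) != 0) {
        return 0;
    }
    /* The output size follows from the table entry, not a fixed constant. */
    return TOY_HDR_LEN + h->payload;
}
```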

Re: [Qemu-devel] [RFC 5/7] pci: Set phb->bus inside pci_register_bus()

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 06:59:14PM -0300, Eduardo Habkost wrote:
> Every single caller of pci_register_bus() saves the return value in
> phb->bus. Do that inside pci_register_bus() to avoid code duplication
> and make it harder to break.
> 
> Most (but not all) conversions done using the following Coccinelle script:
> 
>   @@
>   identifier b;
>   expression phb;
>   @@
>   -b = pci_register_bus(phb, ARGS);
>   +phb->bus = pci_register_bus(phb, ARGS);
>...
>   -phb->bus = b;
> 
>   @@
>   expression phb;
>   expression list ARGS;
>   @@
>   -phb->bus = pci_register_bus(phb, ARGS);
>   +pci_register_bus(phb, ARGS);
> 
> Cc: Richard Henderson 
> Cc: Aurelien Jarno 
> Cc: Yongbok Kim 
> Cc: Alexander Graf 
> Cc: Scott Wood 
> Cc: Paul Burton 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Cc: David Gibson 
> Cc: Cornelia Huck 
> Cc: Christian Borntraeger 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Eduardo Habkost 


Reviewed-by: David Gibson 

> ---
>  include/hw/pci/pci.h  | 12 ++--
>  hw/alpha/typhoon.c| 10 +-
>  hw/mips/gt64xxx_pci.c |  9 +++--
>  hw/pci-host/apb.c |  7 ++-
>  hw/pci-host/bonito.c  |  7 +++
>  hw/pci-host/gpex.c|  5 ++---
>  hw/pci-host/grackle.c |  9 ++---
>  hw/pci-host/ppce500.c |  8 
>  hw/pci-host/uninorth.c| 18 ++
>  hw/pci-host/xilinx-pcie.c |  6 +++---
>  hw/pci/pci.c  | 14 +++---
>  hw/ppc/ppc4xx_pci.c   |  8 
>  hw/ppc/spapr_pci.c| 10 +-
>  hw/s390x/s390-pci-bus.c   | 10 +-
>  hw/sh4/sh_pci.c   |  9 +++--
>  15 files changed, 60 insertions(+), 82 deletions(-)
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 56387ccb0c..3b1e2c408a 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -408,12 +408,12 @@ void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, 
> pci_map_irq_fn map_irq,
>  int pci_bus_get_irq_level(PCIBus *bus, int irq_num);
>  /* 0 <= pin <= 3 0 = INTA, 1 = INTB, 2 = INTC, 3 = INTD */
>  int pci_swizzle_map_irq_fn(PCIDevice *pci_dev, int pin);
> -PCIBus *pci_register_bus(PCIHostState *phb, const char *name,
> - pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
> - void *irq_opaque,
> - MemoryRegion *address_space_mem,
> - MemoryRegion *address_space_io,
> - uint8_t devfn_min, int nirq, const char *typename);
> +void pci_register_bus(PCIHostState *phb, const char *name,
> +  pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
> +  void *irq_opaque,
> +  MemoryRegion *address_space_mem,
> +  MemoryRegion *address_space_io,
> +  uint8_t devfn_min, int nirq, const char *typename);
>  void pci_bus_set_route_irq_fn(PCIBus *, pci_route_irq_fn);
>  PCIINTxRoute pci_device_route_intx_to_irq(PCIDevice *dev, int pin);
>  bool pci_intx_route_changed(PCIINTxRoute *old, PCIINTxRoute *new);
> diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
> index ac0633a55e..5926686d79 100644
> --- a/hw/alpha/typhoon.c
> +++ b/hw/alpha/typhoon.c
> @@ -883,11 +883,11 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus 
> **isa_bus,
>  memory_region_add_subregion(addr_space, 0x801fc00ULL,
>  &s->pchip.reg_io);
>  
> -b = pci_register_bus(phb, "pci",
> - typhoon_set_irq, sys_map_irq, s,
> - &s->pchip.reg_mem, &s->pchip.reg_io,
> - 0, 64, TYPE_PCI_BUS);
> -phb->bus = b;
> +pci_register_bus(phb, "pci",
> + typhoon_set_irq, sys_map_irq, s,
> + &s->pchip.reg_mem, &s->pchip.reg_io,
> + 0, 64, TYPE_PCI_BUS);
> +b = phb->bus;
>  qdev_init_nofail(dev);
>  
>  /* Host memory as seen from the PCI side, via the IOMMU.  */
> diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
> index bd131bcdc6..69963453f0 100644
> --- a/hw/mips/gt64xxx_pci.c
> +++ b/hw/mips/gt64xxx_pci.c
> @@ -1171,12 +1171,9 @@ PCIBus *gt64120_register(qemu_irq *pic)
>  phb = PCI_HOST_BRIDGE(dev);
>  memory_region_init(&d->pci0_mem, OBJECT(dev), "pci0-mem", UINT32_MAX);
>  address_space_init(&d->pci0_mem_as, &d->pci0_mem, "pci0-mem");
> -phb->bus = pci_register_bus(phb, "pci",
> -gt64120_pci_set_irq, gt64120_pci_map_irq,
> -pic,
> -&d->pci0_mem,
> -get_system_io(),
> -PCI_DEVFN(18, 0), 4, TYPE_PCI_BUS);
> +pci_register_bus(phb, "pci", gt64120_pci_set_irq, gt64120_pci_map_irq,
> + pic, &d->pci0_mem, get_system_io(), PCI_DEVFN(18, 0), 4,
> + TYPE_PCI_BUS);
>  qdev_init_nofail(dev);
>  memo

Re: [Qemu-devel] [RFC 4/7] pci: Change pci_register_bus() 'parent' parameter to PCIHostState

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 06:59:13PM -0300, Eduardo Habkost wrote:
> pci_register_bus() already requires the 'parent' argument to be a
> PCI_HOST_BRIDGE object. Change the parameter type to reflect that.
> 
> Cc: Richard Henderson 
> Cc: Aurelien Jarno 
> Cc: Yongbok Kim 
> Cc: Alexander Graf 
> Cc: Scott Wood 
> Cc: Paul Burton 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Cc: David Gibson 
> Cc: Cornelia Huck 
> Cc: Christian Borntraeger 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Eduardo Habkost 

Reviewed-by: David Gibson 

> ---
>  include/hw/pci/pci.h  | 2 +-
>  hw/alpha/typhoon.c| 2 +-
>  hw/mips/gt64xxx_pci.c | 2 +-
>  hw/pci-host/apb.c | 2 +-
>  hw/pci-host/bonito.c  | 2 +-
>  hw/pci-host/gpex.c| 2 +-
>  hw/pci-host/grackle.c | 2 +-
>  hw/pci-host/ppce500.c | 2 +-
>  hw/pci-host/uninorth.c| 4 ++--
>  hw/pci-host/xilinx-pcie.c | 2 +-
>  hw/pci/pci.c  | 4 ++--
>  hw/ppc/ppc4xx_pci.c   | 2 +-
>  hw/ppc/spapr_pci.c| 2 +-
>  hw/s390x/s390-pci-bus.c   | 2 +-
>  hw/sh4/sh_pci.c   | 2 +-
>  15 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 2242aa25eb..56387ccb0c 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -408,7 +408,7 @@ void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, 
> pci_map_irq_fn map_irq,
>  int pci_bus_get_irq_level(PCIBus *bus, int irq_num);
>  /* 0 <= pin <= 3 0 = INTA, 1 = INTB, 2 = INTC, 3 = INTD */
>  int pci_swizzle_map_irq_fn(PCIDevice *pci_dev, int pin);
> -PCIBus *pci_register_bus(DeviceState *parent, const char *name,
> +PCIBus *pci_register_bus(PCIHostState *phb, const char *name,
>   pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
>   void *irq_opaque,
>   MemoryRegion *address_space_mem,
> diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
> index f50f5cf186..ac0633a55e 100644
> --- a/hw/alpha/typhoon.c
> +++ b/hw/alpha/typhoon.c
> @@ -883,7 +883,7 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus 
> **isa_bus,
>  memory_region_add_subregion(addr_space, 0x801fc00ULL,
>  &s->pchip.reg_io);
>  
> -b = pci_register_bus(dev, "pci",
> +b = pci_register_bus(phb, "pci",
>   typhoon_set_irq, sys_map_irq, s,
>   &s->pchip.reg_mem, &s->pchip.reg_io,
>   0, 64, TYPE_PCI_BUS);
> diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
> index 4811843ab6..bd131bcdc6 100644
> --- a/hw/mips/gt64xxx_pci.c
> +++ b/hw/mips/gt64xxx_pci.c
> @@ -1171,7 +1171,7 @@ PCIBus *gt64120_register(qemu_irq *pic)
>  phb = PCI_HOST_BRIDGE(dev);
>  memory_region_init(&d->pci0_mem, OBJECT(dev), "pci0-mem", UINT32_MAX);
>  address_space_init(&d->pci0_mem_as, &d->pci0_mem, "pci0-mem");
> -phb->bus = pci_register_bus(dev, "pci",
> +phb->bus = pci_register_bus(phb, "pci",
>  gt64120_pci_set_irq, gt64120_pci_map_irq,
>  pic,
>  &d->pci0_mem,
> diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
> index 653e711121..1156a54224 100644
> --- a/hw/pci-host/apb.c
> +++ b/hw/pci-host/apb.c
> @@ -671,7 +671,7 @@ PCIBus *pci_apb_init(hwaddr special_base,
>  dev = qdev_create(NULL, TYPE_APB);
>  d = APB_DEVICE(dev);
>  phb = PCI_HOST_BRIDGE(dev);
> -phb->bus = pci_register_bus(DEVICE(phb), "pci",
> +phb->bus = pci_register_bus(phb, "pci",
>  pci_apb_set_irq, pci_pbm_map_irq, d,
>  &d->pci_mmio,
>  get_system_io(),
> diff --git a/hw/pci-host/bonito.c b/hw/pci-host/bonito.c
> index 1999ece590..27842edc04 100644
> --- a/hw/pci-host/bonito.c
> +++ b/hw/pci-host/bonito.c
> @@ -714,7 +714,7 @@ static int bonito_pcihost_initfn(SysBusDevice *dev)
>  {
>  PCIHostState *phb = PCI_HOST_BRIDGE(dev);
>  
> -phb->bus = pci_register_bus(DEVICE(dev), "pci",
> +phb->bus = pci_register_bus(phb, "pci",
>  pci_bonito_set_irq, pci_bonito_map_irq, dev,
>  get_system_memory(), get_system_io(),
>  0x28, 32, TYPE_PCI_BUS);
> diff --git a/hw/pci-host/gpex.c b/hw/pci-host/gpex.c
> index 66055ee5cc..042d127271 100644
> --- a/hw/pci-host/gpex.c
> +++ b/hw/pci-host/gpex.c
> @@ -62,7 +62,7 @@ static void gpex_host_realize(DeviceState *dev, Error 
> **errp)
>  sysbus_init_irq(sbd, &s->irq[i]);
>  }
>  
> -pci->bus = pci_register_bus(dev, "pcie.0", gpex_set_irq,
> +pci->bus = pci_register_bus(pci, "pcie.0", gpex_set_irq,
>  pci_swizzle_map_irq_fn, s, &s->io_mmio,
>  &s->io_ioport, 0, 4, TYPE_PCIE_BUS);
>  
> diff --git a/hw/pci-host/grackle.c b/hw/pci-host/grackle.c
>

Re: [Qemu-devel] [RFC 3/7] pci: Change pci_bus_new*() parameter to PCIHostState

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 06:59:12PM -0300, Eduardo Habkost wrote:
> The pci_bus_new*() functions already require the 'parent' argument to be
> a PCI_HOST_BRIDGE object. Change the parameter type to reflect that.
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Cc: "Hervé Poussineau" 
> Cc: Peter Maydell 
> Cc: qemu-...@nongnu.org
> Cc: qemu-...@nongnu.org
> Signed-off-by: Eduardo Habkost 

Reviewed-by: David Gibson 

> ---
>  include/hw/pci/pci.h|  5 +++--
>  hw/pci-bridge/pci_expander_bridge.c | 15 ---
>  hw/pci-host/piix.c  |  2 +-
>  hw/pci-host/prep.c  |  2 +-
>  hw/pci-host/q35.c   |  2 +-
>  hw/pci-host/versatile.c |  2 +-
>  hw/pci/pci.c| 13 ++---
>  7 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index a37a2d5cb6..2242aa25eb 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -393,12 +393,13 @@ typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, 
> int pin);
>  
>  bool pci_bus_is_express(PCIBus *bus);
>  bool pci_bus_is_root(PCIBus *bus);
> -void pci_bus_new_inplace(PCIBus *bus, size_t bus_size, DeviceState *parent,
> +void pci_bus_new_inplace(PCIBus *bus, size_t bus_size,
> + PCIHostState *phb,
>   const char *name,
>   MemoryRegion *address_space_mem,
>   MemoryRegion *address_space_io,
>   uint8_t devfn_min, const char *typename);
> -PCIBus *pci_bus_new(DeviceState *parent, const char *name,
> +PCIBus *pci_bus_new(PCIHostState *phb, const char *name,
>  MemoryRegion *address_space_mem,
>  MemoryRegion *address_space_io,
>  uint8_t devfn_min, const char *typename);
> diff --git a/hw/pci-bridge/pci_expander_bridge.c 
> b/hw/pci-bridge/pci_expander_bridge.c
> index 6ac187fa32..39d29d2230 100644
> --- a/hw/pci-bridge/pci_expander_bridge.c
> +++ b/hw/pci-bridge/pci_expander_bridge.c
> @@ -213,7 +213,8 @@ static gint pxb_compare(gconstpointer a, gconstpointer b)
>  static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
>  {
>  PXBDev *pxb = convert_to_pxb(dev);
> -DeviceState *ds, *bds = NULL;
> +DeviceState *bds = NULL;
> +PCIHostState *phb;
>  PCIBus *bus;
>  const char *dev_name = NULL;
>  Error *local_err = NULL;
> @@ -228,11 +229,11 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
> pcie, Error **errp)
>  dev_name = dev->qdev.id;
>  }
>  
> -ds = qdev_create(NULL, TYPE_PXB_HOST);
> +phb = PCI_HOST_BRIDGE(qdev_create(NULL, TYPE_PXB_HOST));
>  if (pcie) {
> -bus = pci_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
> +bus = pci_bus_new(phb, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
>  } else {
> -bus = pci_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
> +bus = pci_bus_new(phb, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
>  bds = qdev_create(BUS(bus), "pci-bridge");
>  bds->id = dev_name;
>  qdev_prop_set_uint8(bds, PCI_BRIDGE_DEV_PROP_CHASSIS_NR, 
> pxb->bus_nr);
> @@ -244,7 +245,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
> pcie, Error **errp)
>  bus->address_space_io = dev->bus->address_space_io;
>  bus->map_irq = pxb_map_irq_fn;
>  
> -PCI_HOST_BRIDGE(ds)->bus = bus;
> +phb->bus = bus;
>  
>  pxb_register_bus(dev, bus, &local_err);
>  if (local_err) {
> @@ -252,7 +253,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
> pcie, Error **errp)
>  goto err_register_bus;
>  }
>  
> -qdev_init_nofail(ds);
> +qdev_init_nofail(DEVICE(phb));
>  if (bds) {
>  qdev_init_nofail(bds);
>  }
> @@ -267,7 +268,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
> pcie, Error **errp)
>  err_register_bus:
>  object_unref(OBJECT(bds));
>  object_unparent(OBJECT(bus));
> -object_unref(OBJECT(ds));
> +object_unref(OBJECT(phb));
>  }
>  
>  static void pxb_dev_realize(PCIDevice *dev, Error **errp)
> diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
> index f9218aa952..91fec05b38 100644
> --- a/hw/pci-host/piix.c
> +++ b/hw/pci-host/piix.c
> @@ -340,7 +340,7 @@ PCIBus *i440fx_init(const char *host_type, const char 
> *pci_type,
>  
>  dev = qdev_create(NULL, host_type);
>  s = PCI_HOST_BRIDGE(dev);
> -b = pci_bus_new(dev, NULL, pci_address_space,
> +b = pci_bus_new(s, NULL, pci_address_space,
>  address_space_io, 0, TYPE_PCI_BUS);
>  s->bus = b;
>  object_property_add_child(qdev_get_machine(), "i440fx", OBJECT(dev), 
> NULL);
> diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
> index 260a119a9e..2e2cd267f4 100644
> --- a/hw/pci-host/prep.c
> +++ b/hw/pci-host/prep.c
> @@ -269,7 +269,7 @@ static void ra

Re: [Qemu-devel] [RFC 6/7] pci: Set phb->bus inside pci_bus_new()

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 06:59:15PM -0300, Eduardo Habkost wrote:
> Every single caller of pci_bus_new() saves the return value inside
> phb->bus. Do that inside pci_bus_new() to avoid code duplication and
> make it harder to break.
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Signed-off-by: Eduardo Habkost 

Reviewed-by: David Gibson 

> ---
>  hw/pci-bridge/pci_expander_bridge.c | 2 --
>  hw/pci-host/piix.c  | 1 -
>  hw/pci-host/q35.c   | 6 +++---
>  hw/pci/pci.c| 2 +-
>  4 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/pci-bridge/pci_expander_bridge.c 
> b/hw/pci-bridge/pci_expander_bridge.c
> index 39d29d2230..8344ca1cc8 100644
> --- a/hw/pci-bridge/pci_expander_bridge.c
> +++ b/hw/pci-bridge/pci_expander_bridge.c
> @@ -245,8 +245,6 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
> pcie, Error **errp)
>  bus->address_space_io = dev->bus->address_space_io;
>  bus->map_irq = pxb_map_irq_fn;
>  
> -phb->bus = bus;
> -
>  pxb_register_bus(dev, bus, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
> index 91fec05b38..818e4979d8 100644
> --- a/hw/pci-host/piix.c
> +++ b/hw/pci-host/piix.c
> @@ -342,7 +342,6 @@ PCIBus *i440fx_init(const char *host_type, const char 
> *pci_type,
>  s = PCI_HOST_BRIDGE(dev);
>  b = pci_bus_new(s, NULL, pci_address_space,
>  address_space_io, 0, TYPE_PCI_BUS);
> -s->bus = b;
>  object_property_add_child(qdev_get_machine(), "i440fx", OBJECT(dev), 
> NULL);
>  qdev_init_nofail(dev);
>  
> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 860b47a1ba..5b41412075 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -49,9 +49,9 @@ static void q35_host_realize(DeviceState *dev, Error **errp)
>  sysbus_add_io(sbd, MCH_HOST_BRIDGE_CONFIG_DATA, &pci->data_mem);
>  sysbus_init_ioports(sbd, MCH_HOST_BRIDGE_CONFIG_DATA, 4);
>  
> -pci->bus = pci_bus_new(pci, "pcie.0",
> -   s->mch.pci_address_space, s->mch.address_space_io,
> -   0, TYPE_PCIE_BUS);
> +pci_bus_new(pci, "pcie.0",
> +s->mch.pci_address_space, s->mch.address_space_io,
> +0, TYPE_PCIE_BUS);
>  PC_MACHINE(qdev_get_machine())->bus = pci->bus;
>  qdev_set_parent_bus(DEVICE(&s->mch), BUS(pci->bus));
>  qdev_init_nofail(DEVICE(&s->mch));
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index d3adf806e5..486aeb7514 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -408,6 +408,7 @@ PCIBus *pci_bus_new(PCIHostState *phb, const char *name,
>  
>  bus = PCI_BUS(qbus_create(typename, DEVICE(phb), name));
>  pci_bus_init(bus, phb, address_space_mem, address_space_io, devfn_min);
> +phb->bus = bus;
>  return bus;
>  }
>  
> @@ -433,7 +434,6 @@ void pci_register_bus(PCIHostState *phb, const char *name,
>  bus = pci_bus_new(phb, name, address_space_mem,
>address_space_io, devfn_min, typename);
>  pci_bus_irqs(bus, set_irq, map_irq, irq_opaque, nirq);
> -phb->bus = bus;
>  }
>  
>  int pci_bus_num(PCIBus *s)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
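
A minimal standalone sketch of the pattern this patch applies: the constructor stores the back-pointer itself, so no caller can forget the assignment. ToyHost, ToyBus and the toy_* functions are hypothetical stand-ins for PCIHostState, PCIBus, pci_bus_new() and pci_register_bus(), not QEMU code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyBus ToyBus;

/* Hypothetical stand-in for PCIHostState. */
typedef struct ToyHost {
    ToyBus *bus;
} ToyHost;

/* Hypothetical stand-in for PCIBus. */
struct ToyBus {
    ToyHost *host;
    const char *name;
};

/* Creates the bus and records it in the host bridge in one place.
 * Returns NULL if allocation fails. */
static ToyBus *toy_bus_new(ToyHost *host, const char *name)
{
    ToyBus *bus;

    assert(host != NULL);
    bus = calloc(1, sizeof(*bus));
    if (!bus) {
        return NULL;
    }
    bus->host = host;
    bus->name = name;
    host->bus = bus; /* done once here instead of in every caller */
    return bus;
}

/* Wrapper that returns nothing, like the reworked pci_register_bus():
 * callers that need the bus read host->bus afterwards. */
static void toy_register_bus(ToyHost *host, const char *name)
{
    toy_bus_new(host, name);
}
```

A caller that previously wrote "host->bus = toy_bus_new(...)" can drop the assignment, and one that only needs the side effect can use toy_register_bus() and read host->bus later.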



Re: [Qemu-devel] [RFC 1/7] pci: Change pci_host_bus_register() parameter to PCIHostState

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 06:59:10PM -0300, Eduardo Habkost wrote:
> The function requires a PCI_HOST_BRIDGE object, so change the parameter
> type to reflect that.
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Signed-off-by: Eduardo Habkost 

Reviewed-by: David Gibson 

> ---
>  hw/pci/pci.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 259483b1c0..25118fb91d 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -312,11 +312,9 @@ static void pcibus_reset(BusState *qbus)
>  }
>  }
>  
> -static void pci_host_bus_register(DeviceState *host)
> +static void pci_host_bus_register(PCIHostState *phb)
>  {
> -PCIHostState *host_bridge = PCI_HOST_BRIDGE(host);
> -
> -QLIST_INSERT_HEAD(&pci_host_bridges, host_bridge, next);
> +QLIST_INSERT_HEAD(&pci_host_bridges, phb, next);
>  }
>  
>  PCIBus *pci_find_primary_bus(void)
> @@ -377,7 +375,7 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
>  /* host bridge */
>  QLIST_INIT(&bus->child);
>  
> -pci_host_bus_register(parent);
> +pci_host_bus_register(PCI_HOST_BRIDGE(host));
>  }
>  
>  bool pci_bus_is_express(PCIBus *bus)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
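
A minimal standalone sketch of what changing the parameter type buys, assuming a toy object model in place of QOM. The ToyDevice and ToyHostBridge types and the toy_* functions are hypothetical; toy_host_bridge_cast() only imitates the runtime check that a macro such as PCI_HOST_BRIDGE() performs.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_TYPE_DEVICE, TOY_TYPE_HOST_BRIDGE } ToyType;

/* Hypothetical base object, standing in for DeviceState. */
typedef struct ToyDevice {
    ToyType type;
} ToyDevice;

/* Hypothetical derived object, standing in for PCIHostState. */
typedef struct ToyHostBridge {
    ToyDevice parent;               /* must stay first for the cast below */
    struct ToyHostBridge *next;
} ToyHostBridge;

static ToyHostBridge *toy_host_bridges; /* head of the registered list */

/* Checked downcast: a wrong type is only caught at run time. */
static ToyHostBridge *toy_host_bridge_cast(ToyDevice *dev)
{
    assert(dev->type == TOY_TYPE_HOST_BRIDGE);
    return (ToyHostBridge *)dev;
}

/* Before: accepts any device and has to cast inside. */
static void toy_register_old(ToyDevice *dev)
{
    ToyHostBridge *hb = toy_host_bridge_cast(dev);

    hb->next = toy_host_bridges;
    toy_host_bridges = hb;
}

/* After: the signature states the requirement, so passing the wrong
 * pointer type is a compiler diagnostic rather than a run-time abort. */
static void toy_register_new(ToyHostBridge *hb)
{
    hb->next = toy_host_bridges;
    toy_host_bridges = hb;
}
```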



Re: [Qemu-devel] [RFC 2/7] pci: Change pci_bus_init() 'parent' parameter to PCIHostState

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 06:59:11PM -0300, Eduardo Habkost wrote:
> pci_bus_init() already requires 'parent' to be a PCI_HOST_BRIDGE object,
> so change the parameter type to reflect that.
> 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Signed-off-by: Eduardo Habkost 

Reviewed-by: David Gibson 

I had to do some looking to convince myself this wouldn't break P2P
bridges.

I wonder if we should rename pci_bus_init() / pci_bus_new() to make it
clearer that they're about creating a whole new PCI domain, and not
for a new bus within an existing domain.

> ---
>  hw/pci/pci.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 25118fb91d..d9535c0bdc 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -362,7 +362,7 @@ const char *pci_root_bus_path(PCIDevice *dev)
>  return rootbus->qbus.name;
>  }
>  
> -static void pci_bus_init(PCIBus *bus, DeviceState *parent,
> +static void pci_bus_init(PCIBus *bus, PCIHostState *phb,
>   MemoryRegion *address_space_mem,
>   MemoryRegion *address_space_io,
>   uint8_t devfn_min)
> @@ -375,7 +375,7 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
>  /* host bridge */
>  QLIST_INIT(&bus->child);
>  
> -pci_host_bus_register(PCI_HOST_BRIDGE(host));
> +pci_host_bus_register(phb);
>  }
>  
>  bool pci_bus_is_express(PCIBus *bus)
> @@ -394,8 +394,9 @@ void pci_bus_new_inplace(PCIBus *bus, size_t bus_size, 
> DeviceState *parent,
>   MemoryRegion *address_space_io,
>   uint8_t devfn_min, const char *typename)
>  {
> +PCIHostState *phb = PCI_HOST_BRIDGE(parent);
>  qbus_create_inplace(bus, bus_size, typename, parent, name);
> -pci_bus_init(bus, parent, address_space_mem, address_space_io, 
> devfn_min);
> +pci_bus_init(bus, phb, address_space_mem, address_space_io, devfn_min);
>  }
>  
>  PCIBus *pci_bus_new(DeviceState *parent, const char *name,
> @@ -403,10 +404,11 @@ PCIBus *pci_bus_new(DeviceState *parent, const char 
> *name,
>  MemoryRegion *address_space_io,
>  uint8_t devfn_min, const char *typename)
>  {
> +PCIHostState *phb = PCI_HOST_BRIDGE(parent);
>  PCIBus *bus;
>  
>  bus = PCI_BUS(qbus_create(typename, parent, name));
> -pci_bus_init(bus, parent, address_space_mem, address_space_io, 
> devfn_min);
> +pci_bus_init(bus, phb, address_space_mem, address_space_io, devfn_min);
>  return bus;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [RFC 7/7] pci: Set phb->bus inside pci_bus_new_inplace()

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 06:59:16PM -0300, Eduardo Habkost wrote:
> Every single caller of pci_bus_new_inplace() sets phb->bus to point to
> 'bus'. Do that inside pci_bus_new_inplace() to avoid code duplication
> and make it harder to break.
> 
> Cc: "Hervé Poussineau" 
> Cc: Marcel Apfelbaum 
> Cc: "Michael S. Tsirkin" 
> Cc: Peter Maydell 
> Cc: qemu-...@nongnu.org
> Cc: qemu-...@nongnu.org
> Signed-off-by: Eduardo Habkost 

Reviewed-by: David Gibson 

> ---
>  hw/pci-host/prep.c  | 2 --
>  hw/pci-host/versatile.c | 1 -
>  hw/pci/pci.c| 1 +
>  3 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
> index 2e2cd267f4..6efa5bc5ef 100644
> --- a/hw/pci-host/prep.c
> +++ b/hw/pci-host/prep.c
> @@ -284,8 +284,6 @@ static void raven_pcihost_initfn(Object *obj)
>  address_space_init(&s->bm_as, &s->bm, "raven-bm");
>  pci_setup_iommu(&s->pci_bus, raven_pcihost_set_iommu, s);
>  
> -h->bus = &s->pci_bus;
> -
>  object_initialize(&s->pci_dev, sizeof(s->pci_dev), 
> TYPE_RAVEN_PCI_DEVICE);
>  pci_dev = DEVICE(&s->pci_dev);
>  qdev_set_parent_bus(pci_dev, BUS(&s->pci_bus));
> diff --git a/hw/pci-host/versatile.c b/hw/pci-host/versatile.c
> index 24ef87610b..630f1ac1c5 100644
> --- a/hw/pci-host/versatile.c
> +++ b/hw/pci-host/versatile.c
> @@ -389,7 +389,6 @@ static void pci_vpb_init(Object *obj)
>  pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), h, "pci",
>  &s->pci_mem_space, &s->pci_io_space,
>  PCI_DEVFN(11, 0), TYPE_PCI_BUS);
> -h->bus = &s->pci_bus;
>  
>  object_initialize(&s->pci_dev, sizeof(s->pci_dev), 
> TYPE_VERSATILE_PCI_HOST);
>  qdev_set_parent_bus(DEVICE(&s->pci_dev), BUS(&s->pci_bus));
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 486aeb7514..ef226f8b41 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -397,6 +397,7 @@ void pci_bus_new_inplace(PCIBus *bus, size_t bus_size,
>  {
>  qbus_create_inplace(bus, bus_size, typename, DEVICE(phb), name);
>  pci_bus_init(bus, phb, address_space_mem, address_space_io, devfn_min);
> +phb->bus = bus;
>  }
>  
>  PCIBus *pci_bus_new(PCIHostState *phb, const char *name,


Re: [Qemu-devel] [PATCH 7/7] intel_iommu: support passthrough (PT)

2017-04-17 Thread Jason Wang




On 2017-04-18 11:50, Peter Xu wrote:
> On Tue, Apr 18, 2017 at 11:23:35AM +0800, Jason Wang wrote:
> > On 2017-04-17 18:58, Peter Xu wrote:
>
> [...]
>
> > > +static void vtd_switch_address_space(VTDAddressSpace *as)
> > > +{
> > > +bool use_iommu;
> > > +
> > > +assert(as);
> > > +
> > > +use_iommu = as->iommu_state->dmar_enabled;
> > > +if (use_iommu) {
> > > +/* Further checks per-device configuration */
> > > +use_iommu &= !vtd_dev_pt_enabled(as);
> > > +}
> >
> > Looks like you can use as->iommu_state->dmar_enabled &&
> > !vtd_dev_pt_enabled(as)
>
> vtd_dev_pt_enabled() needs to read the guest memory (starting from
> reading root entry), which is slightly slow. I was trying to avoid
> unnecessary reads.
>
> [...]

I think compiler won't go to vtd_dev_pt_enabled() if dmar_enabled is false.
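
For reference, C guarantees that `&&` evaluates its left operand first and skips the right operand when the left one is false, so the one-line form should not trigger the extra guest-memory read when DMAR is off. A minimal sketch of the two forms side by side; `slow_pt_check()` and `call_count` are hypothetical stand-ins for illustration, not QEMU code:

```c
#include <stdbool.h>

/* Hypothetical stand-in for vtd_dev_pt_enabled(): counts how often
 * the "expensive" check actually runs. Not QEMU code. */
static int call_count;

static bool slow_pt_check(void)
{
    call_count++;
    return false;   /* pretend pass-through is not enabled */
}

/* Two-step form, shaped like the code in the patch. */
static bool use_iommu_two_step(bool dmar_enabled)
{
    bool use_iommu = dmar_enabled;

    if (use_iommu) {
        use_iommu &= !slow_pt_check();
    }
    return use_iommu;
}

/* One-line form: && short-circuits, so slow_pt_check() is skipped
 * whenever dmar_enabled is false. */
static bool use_iommu_short_circuit(bool dmar_enabled)
{
    return dmar_enabled && !slow_pt_check();
}
```

Both functions call the expensive check only when `dmar_enabled` is true, so the choice between them is a matter of style rather than cost.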




> > > @@ -991,6 +1058,18 @@ static void vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > >   cc_entry->context_cache_gen = s->context_cache_gen;
> > >   }
> > > +/*
> > > + * We don't need to translate for pass-through context entries.
> > > + * Also, let's ignore IOTLB caching as well for PT devices.
> > > + */
> > > +if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
> > > +entry->translated_addr = entry->iova;
> > > +entry->addr_mask = VTD_PAGE_SIZE - 1;
> > > +entry->perm = IOMMU_RW;
> > > +trace_vtd_translate_pt(source_id, entry->iova);
> > > +return;
> > > +}
> >
> > Several questions here:
> >
> > 1) Is this just for vhost?
>
> No. When caching mode is not enabled, all passthroughed devices should
> be using this path.


Ok, then it looks better to switch the address space if we've found it 
was PT?





> > 2) Since this is done after IOTLB querying, do we need flush IOTLB during
> > address switching?
>
> IMHO if guest switches address space for a device, it is required to
> send IOTLB flush as well for that device/domain.
>
> [...]


Ok.




> > >   static void vtd_switch_address_space_all(IntelIOMMUState *s)
> > >   {
> > >   GHashTableIter iter;
> > > @@ -2849,6 +2914,10 @@ static void vtd_init(IntelIOMMUState *s)
> > >   s->ecap |= VTD_ECAP_DT;
> > >   }
> > > +if (x86_iommu->pt_supported) {
> > > +s->ecap |= VTD_ECAP_PT;
> > > +}
> >
> > Since we support migration now, need compat this for pre 2.10.
>
> Oh yes. If I set pt=off by default, it should be okay then, right?


Right, but I think it's better to keep this on by default for 
performance reason.




> Then, at some point, we can switch to on by default, with a touch-up
> in include/hw/compat.h I guess?


Yes.

Thanks



> > > +
> > >   if (s->caching_mode) {
> > >   s->cap |= VTD_CAP_CM;
> > >   }
> > > diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> > > index 29d6707..0e73a65 100644
> > > --- a/hw/i386/intel_iommu_internal.h
> > > +++ b/hw/i386/intel_iommu_internal.h
> > > @@ -187,6 +187,7 @@
> > >   /* Interrupt Remapping support */
> > >   #define VTD_ECAP_IR (1ULL << 3)
> > >   #define VTD_ECAP_EIM (1ULL << 4)
> > > +#define VTD_ECAP_PT (1ULL << 6)
> > >   #define VTD_ECAP_MHMV   (15ULL << 20)
> > >   /* CAP_REG */
> > > diff --git a/hw/i386/trace-events b/hw/i386/trace-events
> > > index 04a6980..867ad0b 100644
> > > --- a/hw/i386/trace-events
> > > +++ b/hw/i386/trace-events
> > > @@ -38,6 +38,7 @@ vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"P
> > >   vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
> > >   vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
> > >   vtd_as_unmap_whole(uint8_t bus, uint8_t slot, uint8_t fn, uint64_t iova, uint64_t size) "Device %02x:%02x.%x start 0x%"PRIx64" size 0x%"PRIx64
> > > +vtd_translate_pt(uint16_t sid, uint64_t addr) "source id 0x%"PRIu16", iova 0x%"PRIx64
> > >   # hw/i386/amd_iommu.c
> > >   amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" +  offset 0x%"PRIx32
> > > diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> > > index 02b8825..293caf8 100644
> > > --- a/hw/i386/x86-iommu.c
> > > +++ b/hw/i386/x86-iommu.c
> > > @@ -91,6 +91,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
> > >   static Property x86_iommu_properties[] = {
> > >   DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),
> > >   DEFINE_PROP_BOOL("device-iotlb", X86IOMMUState, dt_supported, false),
> > > +DEFINE_PROP_BOOL("pt", X86IOMMUState, pt_supported, true),
> >
> > Do you know if AMD IOMMU support this?
>
> AMD IOMMU should support this. IIUC it's the first bit of Device Table Entry.
>
> Thanks,

Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint

2017-04-17 Thread Jason Wang




On 2017-04-17 19:04, Hailiang Zhang wrote:

Hi Jason,

On 2017/4/14 14:38, Jason Wang wrote:


On 2017-04-14 14:22, Hailiang Zhang wrote:

Hi Jason,

On 2017/4/14 13:57, Jason Wang wrote:

On 2017-02-22 17:31, Zhang Chen wrote:

On 02/22/2017 11:42 AM, zhanghailiang wrote:

While do checkpoint, we need to flush all the unhandled packets,
By using the filter notifier mechanism, we can easily to notify
every compare object to do this process, which runs inside
of compare threads as a coroutine.

Hi~ Jason and Hailiang.

I will send a patch set later about colo-compare notify mechanism for
Xen like this patch.
I want to add a new chardev socket way in colo-compare connect to Xen
colo, for notify
checkpoint or failover, Because We have no choice to use this way
communicate with Xen codes.
That's means we will have two notify mechanism.
What do you think about this?


Thanks
Zhang Chen
I was thinking the possibility of using similar way to for colo 
compare.

E.g can we use socket? This can saves duplicated codes more or less.

Since there are too many sockets used by filter and COLO, (Two unix
sockets and two
  tcp sockets for each vNIC), I don't want to introduce more ;) , but
i'm not sure if it is
possible to make it more flexible and optional, abstract these
duplicated codes,
pass the opened fd (No matter eventfd or socket fd ) as parameter, for
example.
Is this way acceptable ?

Thanks,
Hailiang

Yes, that's kind of what I want. We don't want to use two message
format. Passing a opened fd need management support, we still need a
fallback if there's no management on top. For qemu/kvm, we can do all
stuffs transparent to the cli by e.g socketpair() or others, but the key
is to have a unified message format.
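
As an illustration of the socketpair() idea mentioned above, a connected pair of local sockets can be created inside the process with no command-line configuration: one end goes to the code that raises checkpoint events, the other to the compare thread. This is only a rough sketch under stated assumptions; the helper names and the one-byte event code are hypothetical and are not part of the patch series:

```c
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical one-byte event code; the thread does not define a
 * message format. */
enum { EVT_CHECKPOINT = 1 };

/* Create a connected pair of local sockets: one end for the notifier,
 * one for the compare thread. Returns 0 on success, -1 on error. */
static int notify_channel_open(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Send a single event byte; returns 0 on success, -1 on error. */
static int notify_send(int fd, uint8_t evt)
{
    return write(fd, &evt, 1) == 1 ? 0 : -1;
}

/* Blocking receive of a single event byte; returns the event or -1. */
static int notify_recv(int fd)
{
    uint8_t evt;

    return read(fd, &evt, 1) == 1 ? evt : -1;
}
```

A caller would open the channel once, hand `fds[1]` to the consumer, and call `notify_send(fds[0], EVT_CHECKPOINT)` at each checkpoint. An fd passed in by management could be substituted for either end without changing the send and receive helpers.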


After a deeper investigation, i think we can re-use most codes, since 
there is no
existing way to notify xen (no ?), we still needs notify chardev 
socket (Be used to notify xen, it is optional.)
(http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen 
notify chardev socket handler frame")


Yes and actually you can use this for bi-directional communication. The 
only differences is the implementation of comparing.




Besides, there is an existing qmp comand 'xen-colo-do-checkpoint', 


I don't see this in master?


we can re-use it to notify
colo-compare objects and other filter objects to do checkpoint, for 
the opposite direction, we use

the notify chardev socket (Only for xen).


Just want to make sure I understand the design, who will trigger this 
command? Management?


Can we just use the socket?



So the codes will be like:
diff --git a/migration/colo.c b/migration/colo.c
index 91da936..813c281 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -224,7 +224,19 @@ ReplicationStatus 
*qmp_query_xen_replication_status(Error **errp)


 void qmp_xen_colo_do_checkpoint(Error **errp)
 {
+Error *local_err = NULL;
+
 replication_do_checkpoint_all(errp);
+/* Notify colo-compare and other filters to do checkpoint */
+colo_notify_compares_event(NULL, COLO_CHECKPOINT, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+colo_notify_filters_event(COLO_CHECKPOINT, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+}
 }

 static void colo_send_message(QEMUFile *f, COLOMessage msg,
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 24e13f0..de975c5 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void)
 {
 notifier_list_notify(&colo_compare_notifiers,
 migrate_get_current());
+if (s->notify_dev) {
+   /* Do something, notify remote side through notify dev */
+}
 }

 void colo_compare_register_notifier(Notifier *notify)

How about this scenario ？


See my reply above, and we need unify the message format too. Raw string 
is ok but we'd better have something like TLV or others.
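
To make the TLV suggestion concrete, here is one possible type-length-value layout, offered as a sketch only. The record types, the 1-byte type field, and the 2-byte big-endian length field are all assumptions made for illustration; the thread does not settle on a format:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message types; the thread does not define any. */
enum {
    NOTIFY_CHECKPOINT = 1,
    NOTIFY_FAILOVER   = 2,
};

/* Wire layout assumed here: 1-byte type, 2-byte big-endian length,
 * followed by the value bytes. */
#define TLV_HDR_LEN 3

/* Encode one record into buf; returns bytes written, or 0 if the
 * record does not fit. */
static size_t tlv_encode(uint8_t *buf, size_t cap, uint8_t type,
                         const void *val, uint16_t len)
{
    if (cap < (size_t)TLV_HDR_LEN + len) {
        return 0;
    }
    buf[0] = type;
    buf[1] = (uint8_t)(len >> 8);
    buf[2] = (uint8_t)(len & 0xff);
    if (len) {
        memcpy(buf + TLV_HDR_LEN, val, len);
    }
    return (size_t)TLV_HDR_LEN + len;
}

/* Decode one record; returns bytes consumed, or 0 if buf holds less
 * than one complete record. *val points into buf. */
static size_t tlv_decode(const uint8_t *buf, size_t avail, uint8_t *type,
                         const uint8_t **val, uint16_t *len)
{
    uint16_t l;

    if (avail < TLV_HDR_LEN) {
        return 0;
    }
    l = (uint16_t)((buf[1] << 8) | buf[2]);
    if (avail < (size_t)TLV_HDR_LEN + l) {
        return 0;
    }
    *type = buf[0];
    *len = l;
    *val = buf + TLV_HDR_LEN;
    return (size_t)TLV_HDR_LEN + l;
}
```

A receiver that does not recognise a type can still skip the record by its length, which is the main advantage over raw strings when both the KVM and Xen paths share one channel format.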


Thanks




Thoughts?

Thanks


Thanks



Re: [Qemu-devel] [PATCH 7/7] intel_iommu: support passthrough (PT)

2017-04-17 Thread Peter Xu

On Tue, Apr 18, 2017 at 11:23:35AM +0800, Jason Wang wrote:
> On 2017-04-17 18:58, Peter Xu wrote:

[...]

> >+static void vtd_switch_address_space(VTDAddressSpace *as)
> >+{
> >+bool use_iommu;
> >+
> >+assert(as);
> >+
> >+use_iommu = as->iommu_state->dmar_enabled;
> >+if (use_iommu) {
> >+/* Further checks per-device configuration */
> >+use_iommu &= !vtd_dev_pt_enabled(as);
> >+}
> 
> Looks like you can use as->iommu_state->dmar_enabled &&
> !vtd_dev_pt_enabled(as)

vtd_dev_pt_enabled() needs to read the guest memory (starting from
reading root entry), which is slightly slow. I was trying to avoid
unnecessary reads.

[...]

> >@@ -991,6 +1058,18 @@ static void vtd_do_iommu_translate(VTDAddressSpace 
> >*vtd_as, PCIBus *bus,
> >  cc_entry->context_cache_gen = s->context_cache_gen;
> >  }
> >+/*
> >+ * We don't need to translate for pass-through context entries.
> >+ * Also, let's ignore IOTLB caching as well for PT devices.
> >+ */
> >+if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
> >+entry->translated_addr = entry->iova;
> >+entry->addr_mask = VTD_PAGE_SIZE - 1;
> >+entry->perm = IOMMU_RW;
> >+trace_vtd_translate_pt(source_id, entry->iova);
> >+return;
> >+}
> 
> Several questions here:
> 
> 1) Is this just for vhost?

No. When caching mode is not enabled, all passthroughed devices should
be using this path.
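
To illustrate what the pass-through branch quoted above amounts to: the returned entry maps the IOVA to itself and its mask says the mapping covers one page. A simplified sketch, assuming a 4 KiB page; `struct tlb_entry`, `translate_pt()` and `resolve()` are hypothetical stand-ins, not the QEMU types:

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for IOMMUTLBEntry and
 * VTD_PAGE_SIZE; the 4 KiB page size is an assumption of this sketch. */
#define PAGE_SIZE_4K 4096ULL

struct tlb_entry {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t addr_mask;
};

/* Pass-through: the translated address equals the IOVA, and the mask
 * says the mapping covers one page. */
static void translate_pt(struct tlb_entry *e, uint64_t iova)
{
    e->iova = iova;
    e->translated_addr = iova;
    e->addr_mask = PAGE_SIZE_4K - 1;
}

/* Combine the page frame of the translation with the in-page offset
 * of the address being accessed. */
static uint64_t resolve(const struct tlb_entry *e, uint64_t addr)
{
    return (e->translated_addr & ~e->addr_mask) | (addr & e->addr_mask);
}
```

With an identity entry, `resolve()` returns the address it was given, which is why no page-table walk is needed for a pass-through device.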

> 2) Since this is done after IOTLB querying, do we need flush IOTLB during
> address switching?

IMHO if guest switches address space for a device, it is required to
send IOTLB flush as well for that device/domain.

[...]

> >  static void vtd_switch_address_space_all(IntelIOMMUState *s)
> >  {
> >  GHashTableIter iter;
> >@@ -2849,6 +2914,10 @@ static void vtd_init(IntelIOMMUState *s)
> >  s->ecap |= VTD_ECAP_DT;
> >  }
> >+if (x86_iommu->pt_supported) {
> >+s->ecap |= VTD_ECAP_PT;
> >+}
> 
> Since we support migration now, need compat this for pre 2.10.

Oh yes. If I set pt=off by default, it should be okay then, right?

Then, at some point, we can switch to on by default, with a touch-up
in include/hw/compat.h I guess?

> 
> >+
> >  if (s->caching_mode) {
> >  s->cap |= VTD_CAP_CM;
> >  }
> >diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> >index 29d6707..0e73a65 100644
> >--- a/hw/i386/intel_iommu_internal.h
> >+++ b/hw/i386/intel_iommu_internal.h
> >@@ -187,6 +187,7 @@
> >  /* Interrupt Remapping support */
> >  #define VTD_ECAP_IR (1ULL << 3)
> >  #define VTD_ECAP_EIM(1ULL << 4)
> >+#define VTD_ECAP_PT (1ULL << 6)
> >  #define VTD_ECAP_MHMV   (15ULL << 20)
> >  /* CAP_REG */
> >diff --git a/hw/i386/trace-events b/hw/i386/trace-events
> >index 04a6980..867ad0b 100644
> >--- a/hw/i386/trace-events
> >+++ b/hw/i386/trace-events
> >@@ -38,6 +38,7 @@ vtd_page_walk_skip_perm(uint64_t iova, uint64_t next) 
> >"Page walk skip iova 0x%"P
> >  vtd_page_walk_skip_reserve(uint64_t iova, uint64_t next) "Page walk skip 
> > iova 0x%"PRIx64" - 0x%"PRIx64" due to rsrv set"
> >  vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) 
> > "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
> >  vtd_as_unmap_whole(uint8_t bus, uint8_t slot, uint8_t fn, uint64_t iova, 
> > uint64_t size) "Device %02x:%02x.%x start 0x%"PRIx64" size 0x%"PRIx64
> >+vtd_translate_pt(uint16_t sid, uint64_t addr) "source id 0x%"PRIu16", iova 
> >0x%"PRIx64
> >  # hw/i386/amd_iommu.c
> >  amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at 
> > addr 0x%"PRIx64" +  offset 0x%"PRIx32
> >diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> >index 02b8825..293caf8 100644
> >--- a/hw/i386/x86-iommu.c
> >+++ b/hw/i386/x86-iommu.c
> >@@ -91,6 +91,7 @@ static void x86_iommu_realize(DeviceState *dev, Error 
> >**errp)
> >  static Property x86_iommu_properties[] = {
> >  DEFINE_PROP_BOOL("intremap", X86IOMMUState, intr_supported, false),
> >  DEFINE_PROP_BOOL("device-iotlb", X86IOMMUState, dt_supported, false),
> >+DEFINE_PROP_BOOL("pt", X86IOMMUState, pt_supported, true),
> 
> Do you know if AMD IOMMU support this?

AMD IOMMU should support this. IIUC it's the first bit of Device Table Entry.

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] cpus: Fix CPU unplug for MTTCG

2017-04-17 Thread David Gibson

On Thu, Apr 13, 2017 at 01:21:46PM +0530, Bharata B Rao wrote:
> Ensure that the unplugged CPU thread is destroyed and the waiting
> thread is notified about it. This is needed for CPU unplug to work
> correctly in MTTCG mode.
> 
> Signed-off-by: Bharata B Rao 

Applied to ppc-for-2.10, thanks.

> ---
>  cpus.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/cpus.c b/cpus.c
> index 740b8dc..79f780b 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1483,6 +1483,12 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>  /* Ignore everything else? */
>  break;
>  }
> +} else if (cpu->unplug) {
> +qemu_tcg_destroy_vcpu(cpu);
> +cpu->created = false;
> +qemu_cond_signal(&qemu_cpu_cond);
> +qemu_mutex_unlock_iothread();
> +return NULL;
>  }
>  
>  atomic_mb_set(&cpu->exit_request, 0);


Re: [Qemu-devel] [RFC for-2.10 2/3] pci: Allow host bridges to override PCI/PCIe hybrid device behaviour

2017-04-17 Thread David Gibson

On Mon, Apr 17, 2017 at 03:30:46PM -0300, Eduardo Habkost wrote:
> On Tue, Mar 28, 2017 at 01:16:50PM +1100, David Gibson wrote:
> > Currently PCI/PCIe hybrid devices - that is, devices which can appear as
> > either plain PCI or PCIe depending on where they're attached - will only
> > appear in PCIe mode if they're attached to a PCIe bus via a root port or
> > downstream port.
> > 
> > This is correct for "standard" PCIe setups, but there are some platforms
> > which need different behaviour (notably "pseries" whose paravirtualized
> > PCI host bridges have some idiosyncrasies).
> > 
> > This patch allows the host bridge to override the normal behaviour.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  hw/pci/pci.c  | 11 +--
> >  include/hw/pci/pci_host.h |  1 +
> >  2 files changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index 779787b..ac68065 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -392,9 +392,16 @@ bool pci_bus_is_root(PCIBus *bus)
> >  
> >  bool pci_allow_hybrid_pcie(PCIDevice *pci_dev)
> 
> Why are we asking the device about allowing hybrid pcie, and not
> the bus itself? Shouldn't we be able to query this before the
> device is plugged, and before pci_dev->bus is even set?
> 
> In other words, why pci_allow_hybrid_pcie() and
> pci_device_get_bus() get a PCIDevice* argument instead of a
> PCIBus* argument?

Ah good point.  I made it a PCIDevice simply for convenience, but
you're right we should be able to query before the device is plugged.

> 
> >  {
> > -PCIBus *bus = pci_dev->bus;
> > +PCIHostState *host_bridge = 
> > PCI_HOST_BRIDGE(pci_device_root_bus(pci_dev)->qbus.parent);
> 
> There's something I don't understand completely: what exactly
> guarantees that pci_device_root_bus(...)->qbus.parent is always
> going to be a PCI_HOST_BRIDGE?

Well, by definition whatever is above the root bus isn't PCI, which
pretty much means it has to be a PCI host bridge.  A machine could
break this assumption, but I think that would be a bug.  We use this
same pattern to find a PCI device's (or bus's) host bridge in other
places - there doesn't appear to be another way.
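
The override discussed here follows a common pattern: an optional class hook that, when set, replaces a default rule. A stripped-down sketch of that pattern; the structs and field names are hypothetical simplifications for illustration and do not match the QEMU definitions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for PCIBus, PCIDevice and the
 * host bridge class; field names are assumptions of this sketch. */
struct bus {
    struct bus *parent;      /* NULL for the root bus */
    bool is_express;
};

struct device {
    struct bus *bus;
};

struct host_bridge_class {
    /* Optional override; NULL means "use the default rule". */
    bool (*allow_hybrid_pcie)(const struct device *dev);
};

static bool bus_is_root(const struct bus *b)
{
    return b->parent == NULL;
}

static bool allow_hybrid_pcie(const struct host_bridge_class *hc,
                              const struct device *dev)
{
    if (hc->allow_hybrid_pcie) {
        return hc->allow_hybrid_pcie(dev);
    }
    /* Default: PCIe mode only on an express bus that is not the root. */
    return dev->bus->is_express && !bus_is_root(dev->bus);
}

/* Example override for a platform that always wants PCIe mode. */
static bool always_pcie(const struct device *dev)
{
    (void)dev;
    return true;
}
```

A host bridge class that leaves the hook as NULL keeps the standard behaviour, so only platforms with unusual requirements need to provide one.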

> > +PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_GET_CLASS(host_bridge);
> > +
> > +if (hc->allow_hybrid_pcie) {
> > +return hc->allow_hybrid_pcie(host_bridge, pci_dev);
> > +} else {
> > +PCIBus *bus = pci_dev->bus;
> >  
> > -return pci_bus_is_express(bus) && !pci_bus_is_root(bus);
> > +return pci_bus_is_express(bus) && !pci_bus_is_root(bus);
> > +}
> >  }
> >  
> >  void pci_bus_new_inplace(PCIBus *bus, size_t bus_size, DeviceState *parent,
> > diff --git a/include/hw/pci/pci_host.h b/include/hw/pci/pci_host.h
> > index ba31595..ad03cca 100644
> > --- a/include/hw/pci/pci_host.h
> > +++ b/include/hw/pci/pci_host.h
> > @@ -54,6 +54,7 @@ typedef struct PCIHostBridgeClass {
> >  SysBusDeviceClass parent_class;
> >  
> >  const char *(*root_bus_path)(PCIHostState *, PCIBus *);
> > +bool (*allow_hybrid_pcie)(PCIHostState *, PCIDevice *);
> >  } PCIHostBridgeClass;
> >  
> >  /* common internal helpers for PCI/PCIe hosts, cut off overflows */
> 


Re: [Qemu-devel] [PATCH v2 2/4] ppc: remove cannot_destroy_with_object_finalize_yet

2017-04-17 Thread David Gibson

On Fri, Apr 14, 2017 at 10:37:15AM +0200, Laurent Vivier wrote:
> This removes the assert(kvm_enabled()) from kvmppc_host_cpu_initfn()
> 
> This assert can never be triggered as the function is only registered
> when KVM is available (see also 4c315c2
> "qdev: Protect device-list-properties against broken devices").
> 
> So we can remove the cannot_destroy_with_object_finalize_yet from
> kvmppc_host_cpu_class_init() without fear and beyond reproach.
> (as it has already been done for i386 with 771a13e "i386: Unset
> cannot_destroy_with_object_finalize_yet on "host" model" and
> e435601 "target-i386: Remove assert(kvm_enabled()) from
> host_x86_cpu_initfn()")
> 
> Signed-off-by: Laurent Vivier 

Applied to ppc-for-2.10 (fixing a contextual conflict on the way).


> ---
>  target/ppc/kvm.c | 10 --
>  1 file changed, 10 deletions(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 9f1f132..64017ac 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2245,14 +2245,8 @@ static void alter_insns(uint64_t *word, uint64_t 
> flags, bool on)
>  }
>  }
>  
> -static void kvmppc_host_cpu_initfn(Object *obj)
> -{
> -assert(kvm_enabled());
> -}
> -
>  static void kvmppc_host_cpu_class_init(ObjectClass *oc, void *data)
>  {
> -DeviceClass *dc = DEVICE_CLASS(oc);
>  PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
>  uint32_t vmx = kvmppc_get_vmx();
>  uint32_t dfp = kvmppc_get_dfp();
> @@ -2279,9 +2273,6 @@ static void kvmppc_host_cpu_class_init(ObjectClass *oc, 
> void *data)
>  if (icache_size != -1) {
>  pcc->l1_icache_size = icache_size;
>  }
> -
> -/* Reason: kvmppc_host_cpu_initfn() dies when !kvm_enabled() */
> -dc->cannot_destroy_with_object_finalize_yet = true;
>  }
>  
>  bool kvmppc_has_cap_epr(void)
> @@ -2333,7 +2324,6 @@ static int kvm_ppc_register_host_cpu_type(void)
>  {
>  TypeInfo type_info = {
>  .name = TYPE_HOST_POWERPC_CPU,
> -.instance_init = kvmppc_host_cpu_initfn,
>  .class_init = kvmppc_host_cpu_class_init,
>  };
>  PowerPCCPUClass *pvr_pcc;


Re: [Qemu-devel] [PATCH 7/7] intel_iommu: support passthrough (PT)

2017-04-17 Thread Jason Wang




On 2017-04-17 18:58, Peter Xu wrote:

Signed-off-by: Peter Xu 
---
  hw/i386/intel_iommu.c  | 109 +
  hw/i386/intel_iommu_internal.h |   1 +
  hw/i386/trace-events   |   1 +
  hw/i386/x86-iommu.c|   1 +
  include/hw/i386/x86-iommu.h|   1 +
  5 files changed, 93 insertions(+), 20 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 05ae631..deb2007 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -872,7 +872,7 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, 
uint8_t bus_num,
  } else {
  switch (vtd_ce_get_type(ce)) {
  case VTD_CONTEXT_TT_MULTI_LEVEL:
-/* fall through */
+case VTD_CONTEXT_TT_PASS_THROUGH:
  case VTD_CONTEXT_TT_DEV_IOTLB:
  break;
  default:
@@ -883,6 +883,73 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, 
uint8_t bus_num,
  return 0;
  }
  
+/* Fetch translation type for specific device. Returns <0 if error

+ * happens, otherwise return the shifted type to check against
+ * VTD_CONTEXT_TT_*. */
+static int vtd_dev_get_trans_type(VTDAddressSpace *as)
+{
+IntelIOMMUState *s;
+VTDContextEntry ce;
+int ret;
+
+s = as->iommu_state;
+
+ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
+   as->devfn, &ce);
+if (ret) {
+return ret;
+}
+
+return vtd_ce_get_type(&ce);
+}
+
+static bool vtd_dev_pt_enabled(VTDAddressSpace *as)
+{
+int ret;
+
+assert(as);
+
+ret = vtd_dev_get_trans_type(as);
+if (ret < 0) {
+/*
+ * Possibly failed to parse the context entry for some reason
+ * (e.g., during init, or any guest configuration errors on
+ * context entries). We should assume PT not enabled for
+ * safety.
+ */
+return false;
+}
+
+return ret == VTD_CONTEXT_TT_PASS_THROUGH;
+}
+
+static void vtd_switch_address_space(VTDAddressSpace *as)
+{
+bool use_iommu;
+
+assert(as);
+
+use_iommu = as->iommu_state->dmar_enabled;
+if (use_iommu) {
+/* Further checks per-device configuration */
+use_iommu &= !vtd_dev_pt_enabled(as);
+}


Looks like you can use as->iommu_state->dmar_enabled && 
!vtd_dev_pt_enabled(as)



+
+trace_vtd_switch_address_space(pci_bus_num(as->bus),
+   VTD_PCI_SLOT(as->devfn),
+   VTD_PCI_FUNC(as->devfn),
+   use_iommu);
+
+/* Turn off first then on the other */
+if (use_iommu) {
+memory_region_set_enabled(&as->sys_alias, false);
+memory_region_set_enabled(&as->iommu, true);
+} else {
+memory_region_set_enabled(&as->iommu, false);
+memory_region_set_enabled(&as->sys_alias, true);
+}
+}
+
  static inline uint16_t vtd_make_source_id(uint8_t bus_num, uint8_t devfn)
  {
  return ((bus_num & 0xffUL) << 8) | (devfn & 0xffUL);
@@ -991,6 +1058,18 @@ static void vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
  cc_entry->context_cache_gen = s->context_cache_gen;
  }
  
+/*

+ * We don't need to translate for pass-through context entries.
+ * Also, let's ignore IOTLB caching as well for PT devices.
+ */
+if (vtd_ce_get_type(&ce) == VTD_CONTEXT_TT_PASS_THROUGH) {
+entry->translated_addr = entry->iova;
+entry->addr_mask = VTD_PAGE_SIZE - 1;
+entry->perm = IOMMU_RW;
+trace_vtd_translate_pt(source_id, entry->iova);
+return;
+}


Several questions here:

1) Is this just for vhost?
2) Since this is done after IOTLB querying, do we need flush IOTLB 
during address switching?



+
  ret_fr = vtd_iova_to_slpte(&ce, addr, is_write, &slpte, &level,
 &reads, &writes);
  if (ret_fr) {
@@ -1135,6 +1214,11 @@ static void 
vtd_context_device_invalidate(IntelIOMMUState *s,
   VTD_PCI_FUNC(devfn_it));
  vtd_as->context_cache_entry.context_cache_gen = 0;
  /*
+ * Do switch address space when needed, in case if the
+ * device passthrough bit is switched.
+ */
+vtd_switch_address_space(vtd_as);
+/*
   * So a device is moving out of (or moving into) a
   * domain, a replay() suites here to notify all the
   * IOMMU_NOTIFIER_MAP registers about this change.
@@ -1366,25 +1450,6 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
  vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
  }
  
-static void vtd_switch_address_space(VTDAddressSpace *as)

-{
-assert(as);
-
-trace_vtd_switch_address_space(pci_bus_num(as->bus),
-   VTD_PCI_SLOT(as->devfn),
-   VTD_PCI_FUN

[Qemu-devel] [PATCH] MAINTAINERS: update Wen's email address

2017-04-17 Thread Changlong Xie

So he can get CC'ed on future patches and bugs for this feature

Signed-off-by: Changlong Xie 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c60235e..5638992 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1817,7 +1817,7 @@ S: Supported
 F: tests/image-fuzzer/
 
 Replication
-M: Wen Congyang 
+M: Wen Congyang 
 M: Changlong Xie 
 S: Supported
 F: replication*
-- 
1.9.3

Re: [Qemu-devel] [PULL 2/8] replication: clarify permissions

2017-04-17 Thread Xie Changlong




On 04/18/2017 09:36 AM, Hailiang Zhang wrote:

On 2017/4/18 9:23, Eric Blake wrote:

On 03/17/2017 08:15 AM, Kevin Wolf wrote:

From: Changlong Xie 

Even if hidden_disk, secondary_disk are backing files, they all need
write permissions in replication scenario. Otherwise we will encounter
below exceptions on secondary side during adding nbd server:

{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 
'writable': true } }
{"error": {"class": "GenericError", "desc": "Conflicts with use by 
hidden-qcow2-driver as 'backing', which does not allow 'write' on 
sec-qcow2-driver-for-nbd"}}


CC: Zhang Hailiang 
CC: Zhang Chen 
CC: Wen Congyang 

This address for Wen Congyang is different than the one listed in
MAINTAINERS for replication (M: Wen Congyang ),
and different still from addresses my mailer has harvested from other
posts (wencongy...@gmail.com).  The MAINTAINERS entry is now resulting
in 'undelivered mail' bounce messages, can you please submit an update
to MAINTAINERS with your new preferred address? [or gently correct me if
I'm confusing two people with the same name?]



No, the same people, he just left his job from fujitsu, the entry in 
MAINTAINERS

file needs to be updated.

Cc: Changlong Xie 
Hi Changlong, would you please send a patch to update it ?

Hi all

After a short talk with Wen, i would like to update his new email 
address now.


Thanks
-Xie


Hailiang





Re: [Qemu-devel] qemu memory manage question

2017-04-17 Thread jack.chen

Thanks very much!!

2017-04-17 19:19 GMT+08:00 李强 :
>
>
>> -Original Message-
>> From: Qemu-devel
>> [mailto:qemu-devel-bounces+liqiang6-s=360...@nongnu.org] On Behalf Of
>> jack.chen
>> Sent: Monday, April 17, 2017 6:56 PM
>> To: Peter Xu
>> Cc: qemu
>> Subject: Re: [Qemu-devel] qemu memory manage question
>>
>> Thanks,from the path you have list to me,it can be  well explained,but
>> according to the source code,in the end of kvm_init,kvm_memory_listener and
>> kvm_io_listener were registered by memory_listener_register(),and  in the
>> end of
>> memory_listener_register(),listener_add_address_space() was called for each
>> address_space,so the listener->region_add was executed then.I do not know
>> what mistake I have made,can you explain it to me ?? thank you very much!
>>
>
> They are callbacks.
> Every change of memory topology will call these listeners, add 
> subregion(Peter's example),
> modify the property of memory, create address space for example.
>
> Thanks.
>
> --
> Li Qiang /the Gear Team, Qihoo 360 Inc
>
>
>> 2017-04-17 18:26 GMT+08:00 Peter Xu :
>> > On Mon, Apr 17, 2017 at 06:09:11PM +0800, jack.chen wrote:
>> >> hello,I have some questions about memory allocation in qemu for
>> >> virtual machine.I found when configure_accelerator function was
>> >> called ,memory slots  were registered to KVM,but at that time
>> >> address_space have not been initialized and ram have not been
>> >> allocated,it is really confused me,Thanks a lot!!
>> >
>> > Here's how I understand it...
>> >
>> > configure_accelerator() does not register memory slots in KVM.
>> > Instead, it registers memory listeners. See
>> > kvm_memory_listener_register(), especially:
>> >
>> > kml->listener.region_add = kvm_region_add;
>> >
>> > That's the hook function to be called when there are new memory region
>> > added to the system.
>> >
>> > Further, when RAM is initialized, it'll modify the address space layout
>> > of system_memory, and the registered listener of KVM (kvm_region_add)
>> > will be invoked, it'll further sync with kvm. It should be in the
>> > following path if you break at kvm_region_add in gdb:
>> >
>> > #0  0x557ba13a in kvm_region_add (listener=0x568330c0,
>> > section=0x7fffd310) at /root/git/qemu/kvm-all.c:859
>> > #1  0x557c1910 in address_space_update_topology_pass
>> > (as=0x5629e240 ,
>> old_view=0x567a7090,
>> > new_view=0x568d3460, adding=true) at /root/git/qemu/memory.c:871
>> > #2  0x557c19f3 in address_space_update_topology
>> > (as=0x5629e240 ) at
>> > /root/git/qemu/memory.c:886
>> > #3  0x557c1b41 in memory_region_transaction_commit () at
>> > /root/git/qemu/memory.c:922
>> > #4  0x557c4bfd in memory_region_update_container_subregions
>> > (subregion=0x568d2fc0) at /root/git/qemu/memory.c:2075
>> > #5  0x557c4c64 in memory_region_add_subregion_common
>> > (mr=0x567a5830, offset=0, subregion=0x568d2fc0) at
>> > /root/git/qemu/memory.c:2085
>> > #6  0x557c4ca0 in memory_region_add_subregion
>> > (mr=0x567a5830, offset=0, subregion=0x568d2fc0) at
>> > /root/git/qemu/memory.c:2093
>> > #7  0x5583fd68 in pc_memory_init (pcms=0x567a4100,
>> > system_memory=0x567a5830, rom_memory=0x568d21a0,
>> > ram_memory=0x7fffd550) at /root/git/qemu/hw/i386/pc.c:1383
>> > #8  0x55847363 in pc_q35_init (machine=0x567a4100) at
>> > /root/git/qemu/hw/i386/pc_q35.c:147
>> > #9  0x55847cac in pc_init_v2_9 (machine=0x567a4100) at
>> > /root/git/qemu/hw/i386/pc_q35.c:310
>> > #10 0x558f7cf8 in main (argc=11, argv=0x7fffda78,
>> > envp=0x7fffdad8) at /root/git/qemu/vl.c:4557
>> >
>> > Hope this helps. Thanks.
>> >
>> > --
>> > Peter Xu
>
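
The behaviour described in this thread (registration replays whatever regions already exist, and later topology changes invoke the same callback) can be modelled in a few lines. This is a toy model only; every name in it is hypothetical and none of it is QEMU's memory API:

```c
#include <stddef.h>

#define MAX_REGIONS   8
#define MAX_LISTENERS 4

/* Hypothetical miniature of the listener pattern; none of these names
 * are QEMU's. */
struct listener {
    void (*region_add)(struct listener *l, int region_id);
    int seen;                       /* how many regions were reported */
};

static int regions[MAX_REGIONS];
static size_t nregions;
static struct listener *listeners[MAX_LISTENERS];
static size_t nlisteners;

/* Registering replays the regions that already exist, then keeps the
 * listener around for future changes. */
static void listener_register(struct listener *l)
{
    if (nlisteners == MAX_LISTENERS) {
        return;
    }
    listeners[nlisteners++] = l;
    for (size_t i = 0; i < nregions; i++) {
        l->region_add(l, regions[i]);
    }
}

/* Adding a region later notifies every registered listener. */
static void region_add(int region_id)
{
    if (nregions == MAX_REGIONS) {
        return;
    }
    regions[nregions++] = region_id;
    for (size_t i = 0; i < nlisteners; i++) {
        listeners[i]->region_add(listeners[i], region_id);
    }
}

/* Example callback: just count the notifications. */
static void count_region(struct listener *l, int region_id)
{
    (void)region_id;
    l->seen++;
}
```

In this model a listener registered before any region exists sees nothing at registration time and is called only when a region is added later, which matches the order of events in the backtrace above.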

[Qemu-devel] [PATCH V2 2/2] COLO-compare: Optimize tcp compare trace event

2017-04-17 Thread Zhang Chen

Optimize two trace events as one, adjust print format make
it easy to read. rename trace_colo_compare_pkt_info_src/dst
to trace_colo_compare_tcp_info.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 29 +
 net/trace-events   |  3 +--
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 049f6f8..0eaa097 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -279,18 +279,23 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet 
*ppkt)
 res = -1;
 }
 
-if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
-trace_colo_compare_pkt_info_src(inet_ntoa(ppkt->ip->ip_src),
-ntohl(stcp->th_seq),
-ntohl(stcp->th_ack),
-res, stcp->th_flags,
-spkt->size);
-
-trace_colo_compare_pkt_info_dst(inet_ntoa(ppkt->ip->ip_dst),
-ntohl(ptcp->th_seq),
-ntohl(ptcp->th_ack),
-res, ptcp->th_flags,
-ppkt->size);
+if (res && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
+char ip_src[20], ip_dst[20];
+
+strcpy(ip_src, inet_ntoa(ppkt->ip->ip_src));
+strcpy(ip_dst, inet_ntoa(ppkt->ip->ip_dst));
+
+trace_colo_compare_tcp_info(ip_src,
+ip_dst,
+ntohl(ptcp->th_seq),
+ntohl(stcp->th_seq),
+ntohl(ptcp->th_ack),
+ntohl(stcp->th_ack),
+res,
+ptcp->th_flags,
+stcp->th_flags,
+ppkt->size,
+spkt->size);
 
 qemu_hexdump((char *)ppkt->data, stderr,
  "colo-compare ppkt", ppkt->size);
diff --git a/net/trace-events b/net/trace-events
index 35198bc..123cb28 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -13,8 +13,7 @@ colo_compare_icmp_miscompare(const char *sta, int size) ": %s = %d"
 colo_compare_ip_info(int psize, const char *sta, const char *stb, int ssize, const char *stc, const char *std) "ppkt size = %d, ip_src = %s, ip_dst = %s, spkt size = %d, ip_src = %s, ip_dst = %s"
 colo_old_packet_check_found(int64_t old_time) "%" PRId64
 colo_compare_miscompare(void) ""
-colo_compare_pkt_info_src(const char *src, uint32_t sseq, uint32_t sack, int res, uint32_t sflag, int ssize) "src/dst: %s s: seq/ack=%u/%u res=%d flags=%x spkt_size: %d\n"
-colo_compare_pkt_info_dst(const char *dst, uint32_t dseq, uint32_t dack, int res, uint32_t dflag, int dsize) "src/dst: %s d: seq/ack=%u/%u res=%d flags=%x dpkt_size: %d\n"
+colo_compare_tcp_info(const char *src, const char *dst, uint32_t pseq, uint32_t sseq, uint32_t pack, uint32_t sack, int res, uint32_t pflag, uint32_t sflag, int psize, int ssize) "src/dst: %s/%s pseq/sseq:%u/%u pack/sack:%u/%u res=%d pflags/sflag:%x/%x psize/ssize:%d/%d \n"
 
 # net/filter-rewriter.c
 colo_filter_rewriter_debug(void) ""
-- 
2.7.4
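
The patch above copies each inet_ntoa() result into its own buffer before
tracing. The likely reason is that inet_ntoa() returns a pointer to a single
static buffer, so two calls in one argument list would both end up printing
the same address. A minimal standalone sketch of that pattern follows. It is
not QEMU code: format_endpoints() is a hypothetical helper, and snprintf() is
used here in place of the patch's strcpy() purely as a bounded alternative.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): format the source and
 * destination address into caller-provided buffers.
 *
 * inet_ntoa() returns a pointer to a static buffer that is overwritten
 * by the next call, so each result is copied out before calling again.
 */
static void format_endpoints(struct in_addr src, struct in_addr dst,
                             char *src_buf, char *dst_buf, size_t len)
{
    /* A dotted-quad string plus NUL needs at most INET_ADDRSTRLEN bytes. */
    assert(len >= INET_ADDRSTRLEN);

    snprintf(src_buf, len, "%s", inet_ntoa(src));
    snprintf(dst_buf, len, "%s", inet_ntoa(dst));
}
```

With the copies in place, both strings can be passed to a single trace or
printf-style call without one overwriting the other.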

[Qemu-devel] [PATCH V2 1/2] COLO-compare: Optimize tcp compare for option field

2017-04-17 Thread Zhang Chen

This patch adds support for packets that carry a TCP options field.
Add a check for the TCP options field: if the packet has options,
skip them and compare only the TCP payload. This avoids
unnecessary checkpoints and improves performance.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index aada04e..049f6f8 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -248,7 +248,32 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
 spkt->ip->ip_sum = ppkt->ip->ip_sum;
 }
 
-if (ptcp->th_sum == stcp->th_sum) {
+/*
+ * Check tcp header length for tcp option field.
+ * th_off > 5 means this tcp packet have options field.
+ * The tcp options maybe always different.
+ * for example:
+ * From RFC 7323.
+ * TCP Timestamps option (TSopt):
+ * Kind: 8
+ *
+ * Length: 10 bytes
+ *
+ *+---+---+-+-+
+ *|Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
+ *+---+---+-+-+
+ *   1   1  4 4
+ *
+ * In this case the primary guest's timestamp always different with
+ * the secondary guest's timestamp. COLO just focus on payload,
+ * so we just need skip this field.
+ */
+if (ptcp->th_off > 5) {
+ptrdiff_t tcp_offset;
+tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
+ + (ptcp->th_off * 4);
+res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
+} else if (ptcp->th_sum == stcp->th_sum) {
 res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
 } else {
 res = -1;
-- 
2.7.4
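
The offset arithmetic in the patch above can be illustrated in isolation.
th_off counts 32-bit words, so th_off * 4 is the TCP header length in bytes,
and any value above 5 means options are present. The sketch below is a
standalone illustration under assumed names (tcp_payload_offset() and
compare_from() are hypothetical and are not the QEMU functions); it only
shows how skipping the header makes differing options invisible to the
comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 5 x 32-bit words = 20-byte TCP header without options. */
#define TCP_MIN_DATA_OFFSET 5

/*
 * Hypothetical helper: byte offset of the TCP payload from the start of
 * the frame, given a pointer to the TCP header and its data offset field.
 */
static ptrdiff_t tcp_payload_offset(const uint8_t *frame,
                                    const uint8_t *transport_header,
                                    unsigned th_off)
{
    assert(th_off >= TCP_MIN_DATA_OFFSET && th_off <= 15);
    return (transport_header - frame) + (ptrdiff_t)th_off * 4;
}

/*
 * Hypothetical helper: compare two equally sized frames from 'offset'
 * onward. Returns 0 if the tails are identical, -1 otherwise.
 */
static int compare_from(const uint8_t *a, const uint8_t *b,
                        size_t size, ptrdiff_t offset)
{
    if (offset < 0 || (size_t)offset > size) {
        return -1;
    }
    return memcmp(a + offset, b + offset, size - (size_t)offset) ? -1 : 0;
}
```

For a frame with a 14-byte Ethernet header and a 20-byte IP header, the TCP
header starts at byte 34. With th_off = 8 the payload starts at byte 66, so
a timestamp difference somewhere in bytes 54..65 does not affect the result.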

[Qemu-devel] [PATCH V2 0/2] COLO-compare: Optimize tcp compare performance and trace format.

2017-04-17 Thread Zhang Chen

The first patch adds TCP options support to optimize compare performance.
The second patch simplifies the code and adjusts the trace print format.

Zhang Chen (2):
  COLO-compare: Optimize tcp compare for option field
  COLO-compare: Optimize tcp compare trace event

 net/colo-compare.c | 54 ++
 net/trace-events   |  3 +--
 2 files changed, 43 insertions(+), 14 deletions(-)

-- 
2.7.4

Re: [Qemu-devel] [PATCH 1/2] COLO-compare: Optimize tcp compare for option field

2017-04-17 Thread Zhang Chen




On 04/17/2017 09:43 PM, Philippe Mathieu-Daudé wrote:

Hi Zhang,

On 04/16/2017 06:24 AM, Zhang Chen wrote:

In this patch we support packet that have tcp options field.
Add tcp options field check, If the packet have options
field we just skip it and compare tcp payload,
Avoid unnecessary checkpoint, optimize performance.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index aada04e..881d6b2 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -228,6 +228,7 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)

 {
 struct tcphdr *ptcp, *stcp;
 int res;
+int tcp_offset = 0;


No need to declare here, it can be restricted to the if {} body.


OK~





 trace_colo_compare_main("compare tcp");

@@ -248,7 +249,31 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)

 spkt->ip->ip_sum = ppkt->ip->ip_sum;
 }

-if (ptcp->th_sum == stcp->th_sum) {
+/*
+ * Check tcp header length for tcp option field.
+ * th_off > 5 means this tcp packet have options field.
+ * The tcp options maybe always different.
+ * for example:
+ * From RFC 7323.
+ * TCP Timestamps option (TSopt):
+ * Kind: 8
+ *
+ * Length: 10 bytes
+ *
+ * +---+---+-+-+
+ *|Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
+ * +---+---+-+-+
+ *   1   1  4 4
+ *
+ * In this case the primary guest's timestamp always different with
+ * the secondary guest's timestamp. COLO just focus on payload,
+ * so we just need skip this field.
+ */
+if (ptcp->th_off > 5) {


You can declare here.

I'd rather declare tcp_offset as a ptrdiff_t.


I got your point, will fix it in next version.

Thanks
Zhang Chen




+tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
+ + (ptcp->th_off * 4);
+res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
+} else if (ptcp->th_sum == stcp->th_sum) {
 res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
 } else {
 res = -1;




.



--
Thanks
Zhang Chen

Re: [Qemu-devel] [PULL 2/8] replication: clarify permissions

2017-04-17 Thread Hailiang Zhang


On 2017/4/18 9:23, Eric Blake wrote:

On 03/17/2017 08:15 AM, Kevin Wolf wrote:

From: Changlong Xie 

Even if hidden_disk and secondary_disk are backing files, they all need
write permissions in the replication scenario. Otherwise we will encounter
the exceptions below on the secondary side when adding the nbd server:

{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 'writable': true } }
{"error": {"class": "GenericError", "desc": "Conflicts with use by hidden-qcow2-driver as 'backing', which does not allow 'write' on sec-qcow2-driver-for-nbd"}}

CC: Zhang Hailiang 
CC: Zhang Chen 
CC: Wen Congyang 

This address for Wen Congyang is different than the one listed in
MAINTAINERS for replication (M: Wen Congyang ),
and different still from addresses my mailer has harvested from other
posts (wencongy...@gmail.com).  The MAINTAINERS entry is now resulting
in 'undelivered mail' bounce messages, can you please submit an update
to MAINTAINERS with your new preferred address? [or gently correct me if
I'm confusing two people with the same name?]



No, it is the same person. He has left his job at Fujitsu, so the entry in the
MAINTAINERS file needs to be updated.

Cc: Changlong Xie 
Hi Changlong, would you please send a patch to update it?

Hailiang

[Qemu-devel] [PATCH 29/31] vpc: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vpc driver accordingly.

Signed-off-by: Eric Blake 
---
 block/vpc.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index ecfee77..3cd56e7 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -689,46 +689,45 @@ fail:
 return ret;
 }

-static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int64_t coroutine_fn vpc_co_block_status(BlockDriverState *bs,
+int64_t offset, int64_t bytes, int64_t *pnum, BlockDriverState **file)
 {
 BDRVVPCState *s = bs->opaque;
 VHDFooter *footer = (VHDFooter*) s->footer_buf;
-int64_t start, offset;
+int64_t start, image_offset;
 bool allocated;
 int n;

 if (be32_to_cpu(footer->type) == VHD_FIXED) {
-*pnum = nb_sectors;
+*pnum = bytes;
 *file = bs->file->bs;
 return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
-   (sector_num << BDRV_SECTOR_BITS);
+   (offset & BDRV_BLOCK_OFFSET_MASK);
 }

-offset = get_sector_offset(bs, sector_num, 0);
-start = offset;
-allocated = (offset != -1);
+image_offset = get_image_offset(bs, offset, 0);
+start = image_offset & BDRV_BLOCK_OFFSET_MASK;
+allocated = (image_offset != -1);
 *pnum = 0;

 do {
 /* All sectors in a block are contiguous (without using the bitmap) */
-n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE)
-  - sector_num;
-n = MIN(n, nb_sectors);
+n = ROUND_UP(offset + 1, s->block_size) - offset;
+n = MIN(n, bytes);

 *pnum += n;
-sector_num += n;
-nb_sectors -= n;
+offset += n;
+bytes -= n;
 /* *pnum can't be greater than one block for allocated
  * sectors since there is always a bitmap in between. */
 if (allocated) {
 *file = bs->file->bs;
 return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
 }
-if (nb_sectors == 0) {
+if (bytes == 0) {
 break;
 }
-offset = get_sector_offset(bs, sector_num, 0);
+image_offset = get_sector_offset(bs, offset, 0);
 } while (offset == -1);

 return 0;
@@ -1074,7 +1073,7 @@ static BlockDriver bdrv_vpc = {

 .bdrv_co_preadv = vpc_co_preadv,
 .bdrv_co_pwritev= vpc_co_pwritev,
-.bdrv_co_get_block_status   = vpc_co_get_block_status,
+.bdrv_co_block_status   = vpc_co_block_status,

 .bdrv_get_info  = vpc_get_info,

-- 
2.9.3
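
The byte-based loop in the patch above relies on one piece of arithmetic:
ROUND_UP(offset + 1, block_size) - offset is the number of bytes from offset
to the end of the block that contains it. A minimal standalone sketch of
that computation follows. The names round_up_i64() and bytes_to_block_end()
are hypothetical and only stand in for the QEMU macros used in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Round 'value' up to the next multiple of 'multiple' (multiple > 0). */
static int64_t round_up_i64(int64_t value, int64_t multiple)
{
    return (value + multiple - 1) / multiple * multiple;
}

/*
 * Hypothetical helper: number of bytes that can be reported in one step,
 * i.e. the distance from 'offset' to the end of its block, capped at the
 * number of bytes the caller asked about.
 */
static int64_t bytes_to_block_end(int64_t offset, int64_t block_size,
                                  int64_t bytes)
{
    int64_t n;

    assert(offset >= 0 && block_size > 0 && bytes >= 0);
    /* offset + 1 ensures an offset on a block boundary yields a full block. */
    n = round_up_i64(offset + 1, block_size) - offset;
    return n < bytes ? n : bytes;
}
```

The "+ 1" matters at block boundaries: without it, an offset that is already
a multiple of the block size would yield zero bytes and the loop would not
advance.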

[Qemu-devel] [PATCH 30/31] vvfat: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vvfat driver accordingly.

Signed-off-by: Eric Blake 
---
 block/vvfat.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index bef2056..825fe72 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2961,13 +2961,13 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 return ret;
 }

-static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
-   int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file)
+static int64_t coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
+int64_t offset, int64_t bytes, int64_t *n, BlockDriverState **file)
 {
 BDRVVVFATState* s = bs->opaque;
-*n = s->sector_count - sector_num;
-if (*n > nb_sectors) {
-*n = nb_sectors;
+*n = s->sector_count * BDRV_SECTOR_SIZE - offset;
+if (*n > bytes) {
+*n = bytes;
 } else if (*n < 0) {
 return 0;
 }
@@ -3124,7 +3124,7 @@ static BlockDriver bdrv_vvfat = {

 .bdrv_co_preadv = vvfat_co_preadv,
 .bdrv_co_pwritev= vvfat_co_pwritev,
-.bdrv_co_get_block_status = vvfat_co_get_block_status,
+.bdrv_co_block_status   = vvfat_co_block_status,
 };

 static void bdrv_vvfat_init(void)
-- 
2.9.3

[Qemu-devel] [PATCH 26/31] vdi: Avoid bitrot of debugging code

2017-04-17 Thread Eric Blake

Rework the debug define so that we always get -Wformat checking,
even when debugging is disabled.

Signed-off-by: Eric Blake 
---
 block/vdi.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index d12d9cd..a70b969 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -86,12 +86,18 @@
 #define DEFAULT_CLUSTER_SIZE (1 * MiB)

 #if defined(CONFIG_VDI_DEBUG)
-#define logout(fmt, ...) \
-fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
+#define VDI_DEBUG 1
 #else
-#define logout(fmt, ...) ((void)0)
+#define VDI_DEBUG 0
 #endif

+#define logout(fmt, ...) \
+do {\
+if (VDI_DEBUG) {\
+fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__); \
+}   \
+} while (0)
+
 /* Image signature. */
 #define VDI_SIGNATURE 0xbeda107f

-- 
2.9.3
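
The idea behind the patch above can be shown with a small standalone sketch:
the debug macro always expands to a real fprintf() call guarded by a
compile-time constant, so the compiler still checks the format string and
arguments when debugging is off, and the optimizer can drop the dead branch.
All names below (SKETCH_DEBUG, sketch_logout(), describe_cluster()) are
hypothetical. The ##__VA_ARGS__ form is a GNU extension, as in the original
macro.

```c
#include <assert.h>
#include <stdio.h>

/* Map an optional build-time define onto a constant that is always set. */
#ifdef SKETCH_DEBUG_ENABLED
#define SKETCH_DEBUG 1
#else
#define SKETCH_DEBUG 0
#endif

/*
 * The fprintf() call is always compiled, so -Wformat can flag a mismatch
 * between 'fmt' and its arguments even when SKETCH_DEBUG is 0.
 */
#define sketch_logout(stream, fmt, ...)                                  \
    do {                                                                 \
        if (SKETCH_DEBUG) {                                              \
            fprintf(stream, "sketch\t%-24s" fmt, __func__,               \
                    ##__VA_ARGS__);                                      \
        }                                                                \
    } while (0)

/* Example caller; returns whether debug output was enabled. */
static int describe_cluster(FILE *stream, unsigned index, long long offset)
{
    assert(stream != NULL);
    sketch_logout(stream, "cluster %u at offset %lld\n", index, offset);
    return SKETCH_DEBUG;
}
```

With the older "#define logout(fmt, ...) ((void)0)" style, a wrong format
specifier in a call site would go unnoticed until someone enabled debugging.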

[Qemu-devel] [PATCH 23/31] qed: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qed driver accordingly.

Signed-off-by: Eric Blake 
---
 block/qed.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index fd76817..336dae4 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -729,7 +729,7 @@ typedef struct {
 Coroutine *co;
 uint64_t pos;
 int64_t status;
-int *pnum;
+int64_t *pnum;
 BlockDriverState **file;
 } QEDIsAllocatedCB;

@@ -737,10 +737,10 @@ static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t l
 {
 QEDIsAllocatedCB *cb = opaque;
 BDRVQEDState *s = cb->bs->opaque;
-*cb->pnum = len / BDRV_SECTOR_SIZE;
+*cb->pnum = len;
 switch (ret) {
 case QED_CLUSTER_FOUND:
-offset |= qed_offset_into_cluster(s, cb->pos);
+offset |= qed_offset_into_cluster(s, cb->pos) & BDRV_BLOCK_OFFSET_VALID;
 cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
 *cb->file = cb->bs->file->bs;
 break;
@@ -762,23 +762,23 @@ static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t l
 }
 }

-static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
+static int64_t coroutine_fn bdrv_qed_co_block_status(BlockDriverState *bs,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ BlockDriverState **file)
 {
 BDRVQEDState *s = bs->opaque;
-size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE;
 QEDIsAllocatedCB cb = {
 .bs = bs,
-.pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE,
+.pos = offset,
 .status = BDRV_BLOCK_OFFSET_MASK,
 .pnum = pnum,
 .file = file,
 };
 QEDRequest request = { .l2_table = NULL };

-qed_find_cluster(s, &request, cb.pos, len, qed_is_allocated_cb, &cb);
+qed_find_cluster(s, &request, cb.pos, bytes, qed_is_allocated_cb, &cb);

 /* Now sleep if the callback wasn't invoked immediately */
 while (cb.status == BDRV_BLOCK_OFFSET_MASK) {
@@ -1710,7 +1710,7 @@ static BlockDriver bdrv_qed = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create  = bdrv_qed_create,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
+.bdrv_co_block_status = bdrv_qed_co_block_status,
 .bdrv_aio_readv   = bdrv_qed_aio_readv,
 .bdrv_aio_writev  = bdrv_qed_aio_writev,
 .bdrv_co_pwrite_zeroes= bdrv_qed_co_pwrite_zeroes,
-- 
2.9.3

[Qemu-devel] [PATCH 25/31] sheepdog: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the sheepdog driver accordingly.

Signed-off-by: Eric Blake 
---
 block/sheepdog.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 1ccb81b..be7cc39 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2971,18 +2971,17 @@ static coroutine_fn int sd_co_pdiscard(BlockDriverState *bs, int64_t offset,
 }

 static coroutine_fn int64_t
-sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-   int *pnum, BlockDriverState **file)
+sd_co_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
+   int64_t *pnum, BlockDriverState **file)
 {
 BDRVSheepdogState *s = bs->opaque;
 SheepdogInode *inode = &s->inode;
 uint32_t object_size = (UINT32_C(1) << inode->block_size_shift);
-uint64_t offset = sector_num * BDRV_SECTOR_SIZE;
 unsigned long start = offset / object_size,
-  end = DIV_ROUND_UP((sector_num + nb_sectors) *
- BDRV_SECTOR_SIZE, object_size);
+  end = DIV_ROUND_UP(offset + bytes, object_size);
 unsigned long idx;
-int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID |
+(offset & BDRV_BLOCK_OFFSET_MASK);

 for (idx = start; idx < end; idx++) {
 if (inode->data_vdi_id[idx] == 0) {
@@ -2999,9 +2998,9 @@ sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 }
 }

-*pnum = (idx - start) * object_size / BDRV_SECTOR_SIZE;
-if (*pnum > nb_sectors) {
-*pnum = nb_sectors;
+*pnum = (idx - start) * object_size;
+if (*pnum > bytes) {
+*pnum = bytes;
 }
 if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
 *file = bs;
@@ -3079,7 +3078,7 @@ static BlockDriver bdrv_sheepdog = {
 .bdrv_co_writev = sd_co_writev,
 .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status   = sd_co_block_status,

 .bdrv_snapshot_create   = sd_snapshot_create,
 .bdrv_snapshot_goto = sd_snapshot_goto,
@@ -3115,7 +3114,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
 .bdrv_co_writev = sd_co_writev,
 .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status   = sd_co_block_status,

 .bdrv_snapshot_create   = sd_snapshot_create,
 .bdrv_snapshot_goto = sd_snapshot_goto,
@@ -3151,7 +3150,7 @@ static BlockDriver bdrv_sheepdog_unix = {
 .bdrv_co_writev = sd_co_writev,
 .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status   = sd_co_block_status,

 .bdrv_snapshot_create   = sd_snapshot_create,
 .bdrv_snapshot_goto = sd_snapshot_goto,
-- 
2.9.3
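
The start/end computation in the patch above maps a byte range onto the
half-open range of object indices that it touches. A minimal standalone
sketch of that mapping follows; ObjectRange and object_range() are
hypothetical names, and the division-based rounding stands in for
DIV_ROUND_UP().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: objects [start, end) cover the byte range. */
typedef struct {
    uint64_t start;
    uint64_t end;
} ObjectRange;

/*
 * Hypothetical helper: compute which fixed-size objects a byte range
 * [offset, offset + bytes) overlaps.
 */
static ObjectRange object_range(uint64_t offset, uint64_t bytes,
                                uint32_t object_size)
{
    ObjectRange r;

    assert(object_size > 0);
    r.start = offset / object_size;
    /* Round the end of the range up to the next object boundary. */
    r.end = (offset + bytes + object_size - 1) / object_size;
    return r;
}
```

A request that straddles an object boundary therefore yields two indices,
even if it is only a few bytes long.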

[Qemu-devel] [PATCH 22/31] qcow2: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow2 driver accordingly.

Signed-off-by: Eric Blake 
---
 block/qcow2.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4272cca..0de7210 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1358,8 +1358,8 @@ static void qcow2_join_options(QDict *options, QDict *old_options)
 }
 }

-static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int64_t coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
+int64_t offset, int64_t count, int64_t *pnum, BlockDriverState **file)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t cluster_offset;
@@ -1367,21 +1367,20 @@ static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
 unsigned int bytes;
 int64_t status = 0;

-bytes = MIN(INT_MAX, nb_sectors * BDRV_SECTOR_SIZE);
+bytes = MIN(INT_MAX, count);
 qemu_co_mutex_lock(&s->lock);
-ret = qcow2_get_cluster_offset(bs, sector_num << 9, &bytes,
-   &cluster_offset);
+ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
 qemu_co_mutex_unlock(&s->lock);
 if (ret < 0) {
 return ret;
 }

-*pnum = bytes >> BDRV_SECTOR_BITS;
+*pnum = bytes;

 if (cluster_offset != 0 && ret != QCOW2_CLUSTER_COMPRESSED &&
 !s->cipher) {
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+index_in_cluster = offset & (s->cluster_size - 1);
+cluster_offset |= (index_in_cluster & BDRV_BLOCK_OFFSET_MASK);
 *file = bs->file->bs;
 status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
 }
@@ -3429,7 +3428,7 @@ BlockDriver bdrv_qcow2 = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create= qcow2_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = qcow2_co_get_block_status,
+.bdrv_co_block_status = qcow2_co_block_status,
 .bdrv_set_key   = qcow2_set_key,

 .bdrv_co_preadv = qcow2_co_preadv,
-- 
2.9.3
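
The byte-based version above replaces the sector index within a cluster by a
byte offset within the cluster, computed with a power-of-two mask. A minimal
standalone sketch of that arithmetic follows. offset_into_cluster() and
host_offset() are hypothetical names, and the sketch assumes the cluster
size is a power of two and the host cluster offset is cluster-aligned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte position of 'offset' inside its cluster.
 * Works only when cluster_size is a power of two.
 */
static uint64_t offset_into_cluster(uint64_t offset, uint64_t cluster_size)
{
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    return offset & (cluster_size - 1);
}

/*
 * Hypothetical helper: combine a cluster-aligned host offset with the
 * in-cluster position of a guest offset.
 */
static uint64_t host_offset(uint64_t cluster_offset, uint64_t guest_offset,
                            uint64_t cluster_size)
{
    /* Because cluster_offset is aligned, OR and ADD give the same result. */
    assert(offset_into_cluster(cluster_offset, cluster_size) == 0);
    return cluster_offset | offset_into_cluster(guest_offset, cluster_size);
}
```

Working in bytes removes the shift by BDRV_SECTOR_BITS that the sector-based
code needed before combining the two values.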

[Qemu-devel] [PATCH 31/31] block: Drop unused .bdrv_co_get_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Now that all drivers have been updated to provide the
byte-based .bdrv_co_block_status(), we can delete the sector-based
interface.

Signed-off-by: Eric Blake 
---
 include/block/block_int.h |  3 ---
 block/io.c| 59 ---
 2 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8f20bc3..25197d7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -174,9 +174,6 @@ struct BlockDriver {
  * BDRV_BLOCK_DATA, _ZERO, _OFFSET_VALID, and _RAW, and only
  * according to the current BDS.
  */
-int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum,
-BlockDriverState **file);
 int64_t coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bd,
 int64_t offset, int64_t bytes, int64_t *pnum,
 BlockDriverState **file);
diff --git a/block/io.c b/block/io.c
index 361eeb8..0488d08 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1719,6 +1719,7 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 int64_t n; /* bytes */
 int64_t ret, ret2;
 BlockDriverState *tmp_file;
+int64_t aligned_offset, aligned_bytes;

 total_size = bdrv_getlength(bs);
 if (total_size < 0) {
@@ -1738,7 +1739,7 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 bytes = n;
 }

-if (!bs->drv->bdrv_co_get_block_status && !bs->drv->bdrv_co_block_status) {
+if (!bs->drv->bdrv_co_block_status) {
 *pnum = bytes;
 ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
 if (bs->drv->protocol_name) {
@@ -1752,45 +1753,35 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 }
 *file = NULL;
 bdrv_inc_in_flight(bs);
-if (bs->drv->bdrv_co_get_block_status) {
-int count; /* sectors */

-assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
-ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
-bytes >> BDRV_SECTOR_BITS,
-&count, file);
-*pnum = count * BDRV_SECTOR_SIZE;
-} else {
-/* Round out to request_alignment boundaries */
-int64_t aligned_offset, aligned_bytes;
-
-aligned_offset = QEMU_ALIGN_DOWN(offset, bs->bl.request_alignment);
-aligned_bytes = ROUND_UP(offset + bytes,
- bs->bl.request_alignment) - aligned_offset;
-ret = bs->drv->bdrv_co_block_status(bs, aligned_offset, aligned_bytes,
-&n, file);
-/* Clamp pnum and ret to original request */
-if (aligned_offset != offset && ret >= 0) {
-int sectors = DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) -
-DIV_ROUND_UP(aligned_offset, BDRV_SECTOR_SIZE);
-
-assert(n >= offset - aligned_offset);
-n -= offset - aligned_offset;
-if (sectors) {
-ret += sectors * BDRV_SECTOR_SIZE;
-}
-}
-if (ret >= 0 && n > bytes) {
-assert(aligned_bytes != bytes);
-n = bytes;
-}
-*pnum = n;
-}
+/* Round out to request_alignment boundaries */
+aligned_offset = QEMU_ALIGN_DOWN(offset, bs->bl.request_alignment);
+aligned_bytes = ROUND_UP(offset + bytes,
+ bs->bl.request_alignment) - aligned_offset;
+ret = bs->drv->bdrv_co_block_status(bs, aligned_offset, aligned_bytes,
+&n, file);
 if (ret < 0) {
 *pnum = 0;
 goto out;
 }

+/* Clamp pnum and ret to original request */
+if (aligned_offset != offset && ret >= 0) {
+int sectors = DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) -
+DIV_ROUND_UP(aligned_offset, BDRV_SECTOR_SIZE);
+
+assert(n >= offset - aligned_offset);
+n -= offset - aligned_offset;
+if (sectors) {
+ret += sectors * BDRV_SECTOR_SIZE;
+}
+}
+if (n > bytes) {
+assert(aligned_bytes != bytes);
+n = bytes;
+}
+*pnum = n;
+
 if (ret & BDRV_BLOCK_RAW) {
 assert(ret & BDRV_BLOCK_OFFSET_VALID);
 ret = bdrv_block_status(*file, ret & BDRV_BLOCK_OFFSET_MASK,
-- 
2.9.3
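
The remaining code path above rounds the request out to the driver's
alignment, queries the driver, and then clamps the answer back to what the
caller asked for. The standalone sketch below illustrates only the byte
count part of that flow, under assumed names (AlignedRequest, round_out(),
clamp_to_request() are hypothetical). It does not model the adjustment of
the returned status value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a request widened to alignment boundaries. */
typedef struct {
    int64_t offset;
    int64_t bytes;
} AlignedRequest;

/* Widen [offset, offset + bytes) so both ends are multiples of 'align'. */
static AlignedRequest round_out(int64_t offset, int64_t bytes, int64_t align)
{
    AlignedRequest r;

    assert(offset >= 0 && bytes >= 0 && align > 0);
    r.offset = offset / align * align;
    r.bytes = (offset + bytes + align - 1) / align * align - r.offset;
    return r;
}

/*
 * 'n' is the byte count a driver reported starting at r.offset. Convert
 * it to a count starting at the original offset, capped at 'bytes'.
 */
static int64_t clamp_to_request(int64_t n, int64_t offset, int64_t bytes,
                                AlignedRequest r)
{
    int64_t head = offset - r.offset;  /* bytes added in front */

    assert(n >= head);
    n -= head;
    return n > bytes ? bytes : n;
}
```

For example, a 100-byte query at offset 700 with 512-byte alignment becomes
a 512-byte query at offset 512; if the driver reports all 512 bytes, the
caller still sees at most its original 100.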

[Qemu-devel] [PATCH 24/31] raw: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the raw driver accordingly.

Signed-off-by: Eric Blake 
---
 block/raw-format.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index 36e6503..746beed 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -250,17 +250,17 @@ fail:
 return ret;
 }

-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum,
-BlockDriverState **file)
+static int64_t coroutine_fn raw_co_block_status(BlockDriverState *bs,
+int64_t offset,
+int64_t bytes, int64_t *pnum,
+BlockDriverState **file)
 {
 BDRVRawState *s = bs->opaque;
-*pnum = nb_sectors;
+*pnum = bytes;
 *file = bs->file->bs;
-sector_num += s->offset / BDRV_SECTOR_SIZE;
+offset += s->offset;
 return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
-   (sector_num << BDRV_SECTOR_BITS);
+(offset & BDRV_BLOCK_OFFSET_MASK);
 }

 static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
@@ -475,7 +475,7 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwritev  = &raw_co_pwritev,
 .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
 .bdrv_co_pdiscard = &raw_co_pdiscard,
-.bdrv_co_get_block_status = &raw_co_get_block_status,
+.bdrv_co_block_status = &raw_co_block_status,
 .bdrv_truncate= &raw_truncate,
 .bdrv_getlength   = &raw_getlength,
 .has_variable_length  = true,
-- 
2.9.3

[Qemu-devel] [PATCH 28/31] vmdk: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vmdk driver accordingly.

Signed-off-by: Eric Blake 
---
 block/vmdk.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index c61b9cc..c85ce96 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1286,25 +1286,24 @@ static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
 return offset / BDRV_SECTOR_SIZE;
 }

-static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int64_t coroutine_fn vmdk_co_block_status(BlockDriverState *bs,
+int64_t offset, int64_t bytes, int64_t *pnum, BlockDriverState **file)
 {
 BDRVVmdkState *s = bs->opaque;
 int64_t index_in_cluster, n, ret;
-uint64_t offset;
+uint64_t cluster_offset;
 VmdkExtent *extent;

-extent = find_extent(s, sector_num, NULL);
+extent = find_extent(s, offset >> BDRV_SECTOR_BITS, NULL);
 if (!extent) {
 return 0;
 }
 qemu_co_mutex_lock(&s->lock);
-ret = get_cluster_offset(bs, extent, NULL,
- sector_num * 512, false, &offset,
+ret = get_cluster_offset(bs, extent, NULL, offset, false, &cluster_offset,
  0, 0);
 qemu_co_mutex_unlock(&s->lock);

-index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
+index_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 switch (ret) {
 case VMDK_ERROR:
 ret = -EIO;
@@ -1319,18 +1318,15 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
 ret = BDRV_BLOCK_DATA;
 if (!extent->compressed) {
 ret |= BDRV_BLOCK_OFFSET_VALID;
-ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
+ret |= (cluster_offset + index_in_cluster)
 & BDRV_BLOCK_OFFSET_MASK;
 }
 *file = extent->file->bs;
 break;
 }

-n = extent->cluster_sectors - index_in_cluster;
-if (n > nb_sectors) {
-n = nb_sectors;
-}
-*pnum = n;
+n = extent->cluster_sectors * BDRV_SECTOR_SIZE - index_in_cluster;
+*pnum = MIN(n, bytes);
 return ret;
 }

@@ -2362,7 +2358,7 @@ static BlockDriver bdrv_vmdk = {
 .bdrv_close   = vmdk_close,
 .bdrv_create  = vmdk_create,
 .bdrv_co_flush_to_disk= vmdk_co_flush,
-.bdrv_co_get_block_status = vmdk_co_get_block_status,
+.bdrv_co_block_status = vmdk_co_block_status,
 .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
 .bdrv_has_zero_init   = vmdk_has_zero_init,
 .bdrv_get_specific_info   = vmdk_get_specific_info,
-- 
2.9.3

[Qemu-devel] [PATCH 18/31] mirror: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the mirror driver accordingly.

Signed-off-by: Eric Blake 
---
 block/mirror.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 750be1f..ebd0adf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1027,14 +1027,14 @@ static int coroutine_fn bdrv_mirror_top_flush(BlockDriverState *bs)
 return bdrv_co_flush(bs->backing->bs);
 }

-static int64_t coroutine_fn bdrv_mirror_top_get_block_status(
-BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
+static int64_t coroutine_fn bdrv_mirror_top_block_status(
+BlockDriverState *bs, int64_t offset, int64_t bytes, int64_t *pnum,
 BlockDriverState **file)
 {
-*pnum = nb_sectors;
+*pnum = bytes;
 *file = bs->backing->bs;
 return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
-   (sector_num << BDRV_SECTOR_BITS);
+   (offset & BDRV_BLOCK_OFFSET_MASK);
 }

 static int coroutine_fn bdrv_mirror_top_pwrite_zeroes(BlockDriverState *bs,
@@ -1083,7 +1083,7 @@ static BlockDriver bdrv_mirror_top = {
 .bdrv_co_pwrite_zeroes  = bdrv_mirror_top_pwrite_zeroes,
 .bdrv_co_pdiscard   = bdrv_mirror_top_pdiscard,
 .bdrv_co_flush  = bdrv_mirror_top_flush,
-.bdrv_co_get_block_status   = bdrv_mirror_top_get_block_status,
+.bdrv_co_block_status   = bdrv_mirror_top_block_status,
 .bdrv_refresh_filename  = bdrv_mirror_top_refresh_filename,
 .bdrv_close = bdrv_mirror_top_close,
 .bdrv_child_perm= bdrv_mirror_top_child_perm,
-- 
2.9.3

Re: [Qemu-devel] [Qemu-block] [PATCH 00/31] make bdrv_get_block_status byte-based

2017-04-17 Thread Eric Blake

On 04/17/2017 08:33 PM, Eric Blake wrote:
> There are patches floating around to add NBD_CMD_BLOCK_STATUS,
> but NBD wants to report status on byte granularity (even if the
> reporting will probably be naturally aligned to sectors or even
> much higher levels).  I've therefore started the task of
> converting our block status code to report at a byte granularity
> rather than sectors.

Apologies for botching Kevin's address in the mail; if you reply, you'll
want to edit the to list if you don't want a bounce message about an
undeliverable address ;(


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[Qemu-devel] [PATCH 20/31] parallels: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the parallels driver accordingly.  Note that
the internal function block_status() is still sector-based, because
it is still in use by other sector-based functions; but that's okay
because request_alignment is 512 as a result of those functions.

Signed-off-by: Eric Blake 
---
 block/parallels.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 8be46a7..2bc1918 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -274,22 +274,25 @@ static coroutine_fn int parallels_co_flush_to_os(BlockDriverState *bs)
 }


-static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int64_t coroutine_fn parallels_co_block_status(BlockDriverState *bs,
+int64_t offset, int64_t bytes, int64_t *pnum, BlockDriverState **file)
 {
 BDRVParallelsState *s = bs->opaque;
-int64_t offset;
+int count;

+assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
 qemu_co_mutex_lock(&s->lock);
-offset = block_status(s, sector_num, nb_sectors, pnum);
+offset = block_status(s, offset >> BDRV_SECTOR_BITS,
+  bytes >> BDRV_SECTOR_BITS, &count);
 qemu_co_mutex_unlock(&s->lock);

 if (offset < 0) {
 return 0;
 }

+*pnum = count * BDRV_SECTOR_SIZE;
 *file = bs->file->bs;
-return (offset << BDRV_SECTOR_BITS) |
+return (offset & BDRV_BLOCK_OFFSET_MASK) |
 BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

@@ -775,7 +778,7 @@ static BlockDriver bdrv_parallels = {
 .bdrv_open = parallels_open,
 .bdrv_close= parallels_close,
 .bdrv_child_perm  = bdrv_format_default_perms,
-.bdrv_co_get_block_status = parallels_co_get_block_status,
+.bdrv_co_block_status = parallels_co_block_status,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
 .bdrv_co_flush_to_os  = parallels_co_flush_to_os,
 .bdrv_co_readv  = parallels_co_readv,
-- 
2.9.3
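
The wrapping pattern in this patch (assert sector alignment, shift the byte request down to sectors for the legacy helper, then scale the results back up to bytes) can be sketched in isolation. This is an illustrative sketch only, not code from the series: legacy_status(), byte_status() and the SECTOR_* macros are hypothetical stand-ins for block_status(), the driver callback and QEMU's BDRV_SECTOR_* constants. The sketch scales the helper's sector result back to bytes, as the removed `offset << BDRV_SECTOR_BITS` line did.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for BDRV_SECTOR_BITS / BDRV_SECTOR_SIZE (assumed 512-byte sectors). */
#define SECTOR_BITS 9
#define SECTOR_SIZE (1LL << SECTOR_BITS)
#define IS_ALIGNED(n, m) (((n) % (m)) == 0)

/*
 * Hypothetical sector-based helper: reports at most 8 sectors per call and
 * pretends the data lives 16 sectors further into the image file.
 */
static int64_t legacy_status(int64_t sector_num, int nb_sectors, int *pnum)
{
    *pnum = nb_sectors < 8 ? nb_sectors : 8;
    return sector_num + 16;
}

/* Byte-based wrapper around the sector-based helper. */
static int64_t byte_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    int count;
    int64_t host_sector;

    /* Shifting is only lossless if both values are sector-aligned. */
    assert(IS_ALIGNED(offset | bytes, SECTOR_SIZE));
    host_sector = legacy_status(offset >> SECTOR_BITS,
                                (int)(bytes >> SECTOR_BITS), &count);
    *pnum = (int64_t)count * SECTOR_SIZE;   /* sectors -> bytes */
    return host_sector * SECTOR_SIZE;       /* sectors -> bytes */
}
```

Calling byte_status(4096, 8192, &pnum) queries sectors 8..23 of the toy helper and reports the first 8 sectors (4096 bytes) at host byte offset 12288.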

[Qemu-devel] [PATCH 17/31] iscsi: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the iscsi driver accordingly.  In this case,
it is handy to teach iscsi_co_block_status() to handle a NULL file
parameter, even though the block layer passes a non-NULL value,
because we also call the function directly.

Signed-off-by: Eric Blake 
---
 block/iscsi.c | 52 
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index e51fdfb..f7c8a32 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -680,9 +680,9 @@ out_unlock:



-static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
-  int64_t sector_num,
-  int nb_sectors, int *pnum,
+static int64_t coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
+  int64_t offset,
+  int64_t bytes, int64_t *pnum,
   BlockDriverState **file)
 {
 IscsiLun *iscsilun = bs->opaque;
@@ -693,15 +693,15 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,

 iscsi_co_init_iscsitask(iscsilun, &iTask);

-if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+if (!is_byte_request_lun_aligned(offset, bytes, iscsilun)) {
 ret = -EINVAL;
 goto out;
 }

 /* default to all sectors allocated */
 ret = BDRV_BLOCK_DATA;
-ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID;
-*pnum = nb_sectors;
+ret |= (offset & BDRV_BLOCK_OFFSET_MASK) | BDRV_BLOCK_OFFSET_VALID;
+*pnum = bytes;

 /* LUN does not support logical block provisioning */
 if (!iscsilun->lbpme) {
@@ -711,7 +711,7 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
 qemu_mutex_lock(&iscsilun->mutex);
 retry:
 if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun,
-  sector_qemu2lun(sector_num, iscsilun),
+  offset / iscsilun->block_size,
   8 + 16, iscsi_co_generic_cb,
   &iTask) == NULL) {
 ret = -ENOMEM;
@@ -750,12 +750,12 @@ retry:

 lbasd = &lbas->descriptors[0];

-if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
+if (offset / iscsilun->block_size != lbasd->lba) {
 ret = -EIO;
 goto out_unlock;
 }

-*pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
+*pnum = lbasd->num_blocks * iscsilun->block_size;

 if (lbasd->provisioning == SCSI_PROVISIONING_TYPE_DEALLOCATED ||
 lbasd->provisioning == SCSI_PROVISIONING_TYPE_ANCHORED) {
@@ -766,15 +766,13 @@ retry:
 }

 if (ret & BDRV_BLOCK_ZERO) {
-iscsi_allocmap_set_unallocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
-   *pnum * BDRV_SECTOR_SIZE);
+iscsi_allocmap_set_unallocated(iscsilun, offset, *pnum);
 } else {
-iscsi_allocmap_set_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
- *pnum * BDRV_SECTOR_SIZE);
+iscsi_allocmap_set_allocated(iscsilun, offset, *pnum);
 }

-if (*pnum > nb_sectors) {
-*pnum = nb_sectors;
+if (*pnum > bytes) {
+*pnum = bytes;
 }
 out_unlock:
 qemu_mutex_unlock(&iscsilun->mutex);
@@ -782,7 +780,7 @@ out:
 if (iTask.task != NULL) {
 scsi_free_scsi_task(iTask.task);
 }
-if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
+if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID && file) {
 *file = bs;
 }
 return ret;
@@ -821,25 +819,23 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs,
  nb_sectors * BDRV_SECTOR_SIZE) &&
 !iscsi_allocmap_is_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
  nb_sectors * BDRV_SECTOR_SIZE)) {
-int pnum;
-BlockDriverState *file;
+int64_t pnum;
 /* check the block status from the beginning of the cluster
  * containing the start sector */
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
-int head;
+int64_t head;
 int64_t ret;

-assert(cluster_sectors);
-head = sector_num % cluster_sectors;
-ret = iscsi_co_get_block_status(bs, sector_num - head,
-BDRV_REQUEST_MAX_SECTORS, &pnum,
-&file);
+assert(iscsilun->cluster_size);
+head = (sector_num * BDRV_SECTOR_SIZE) % iscsilun->cluster_size;
+ret = iscsi_co_block_status(bs, sector_num * BDRV_SECTOR_SIZE - head,
+BDRV_REQUEST_MAX_BYTES, &pnum, NULL);
 if (ret < 0) {

[Qemu-devel] [PATCH 27/31] vdi: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vdi driver accordingly.  Note that the
TODO is already covered (the block layer guarantees bounds of its
requests), and that we can remove the now-unused s->block_sectors.

Signed-off-by: Eric Blake 
---
 block/vdi.c | 25 -
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index a70b969..390e2f1 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -171,8 +171,6 @@ typedef struct {
 uint32_t *bmap;
 /* Size of block (bytes). */
 uint32_t block_size;
-/* Size of block (sectors). */
-uint32_t block_sectors;
 /* First sector of block map. */
 uint32_t bmap_sector;
 /* VDI header (converted to host endianness). */
@@ -462,7 +460,6 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
 bs->total_sectors = header.disk_size / SECTOR_SIZE;

 s->block_size = header.block_size;
-s->block_sectors = header.block_size / SECTOR_SIZE;
 s->bmap_sector = header.offset_bmap / SECTOR_SIZE;
 s->header = header;

@@ -508,23 +505,17 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
 return 0;
 }

-static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int64_t coroutine_fn vdi_co_block_status(BlockDriverState *bs,
+int64_t offset, int64_t bytes, int64_t *pnum, BlockDriverState **file)
 {
-/* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
 BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
-size_t bmap_index = sector_num / s->block_sectors;
-size_t sector_in_block = sector_num % s->block_sectors;
-int n_sectors = s->block_sectors - sector_in_block;
+size_t bmap_index = offset / s->block_size;
+size_t index_in_block = offset % s->block_size;
 uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
-uint64_t offset;
 int result;

-logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
-if (n_sectors > nb_sectors) {
-n_sectors = nb_sectors;
-}
-*pnum = n_sectors;
+logout("%p, %" PRId64 ", %" PRId64 ", %p\n", bs, offset, bytes, pnum);
+*pnum = MIN(s->block_size - index_in_block, bytes);
 result = VDI_IS_ALLOCATED(bmap_entry);
 if (!result) {
 return 0;
@@ -532,7 +523,7 @@ static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,

 offset = s->header.offset_data +
   (uint64_t)bmap_entry * s->block_size +
-  sector_in_block * SECTOR_SIZE;
+  (index_in_block & BDRV_BLOCK_OFFSET_MASK);
 *file = bs->file->bs;
 return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
 }
@@ -901,7 +892,7 @@ static BlockDriver bdrv_vdi = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create = vdi_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = vdi_co_get_block_status,
+.bdrv_co_block_status = vdi_co_block_status,
 .bdrv_make_empty = vdi_make_empty,

 .bdrv_co_preadv = vdi_co_preadv,
-- 
2.9.3
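
A byte-based block-map lookup of the kind this patch converts to can be sketched as follows. This is a toy model, not the vdi driver: ToyImage, toy_block_status() and the UNALLOCATED marker are hypothetical names. The sketch clamps the reported length to the remainder of the current block, as the sector-based code being removed did with `s->block_sectors - sector_in_block`, so a result never spans two map entries.

```c
#include <assert.h>
#include <stdint.h>

#define UNALLOCATED UINT32_MAX  /* hypothetical "no block" marker */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

typedef struct {
    uint32_t block_size;   /* bytes per block */
    uint64_t data_start;   /* byte offset of image block 0 in the file */
    const uint32_t *bmap;  /* guest block index -> image block index */
} ToyImage;

/*
 * Returns 1 and sets *host to the file offset if the block containing
 * 'offset' is allocated, 0 otherwise.  *pnum never crosses a block boundary.
 */
static int toy_block_status(const ToyImage *img, int64_t offset, int64_t bytes,
                            int64_t *pnum, uint64_t *host)
{
    int64_t bmap_index = offset / img->block_size;
    int64_t index_in_block = offset % img->block_size;
    uint32_t entry = img->bmap[bmap_index];

    *pnum = MIN(img->block_size - index_in_block, bytes);
    if (entry == UNALLOCATED) {
        return 0;
    }
    *host = img->data_start + (uint64_t)entry * img->block_size
            + (uint64_t)index_in_block;
    return 1;
}
```

With 4096-byte blocks, a query starting 100 bytes into an allocated block reports 3996 bytes regardless of how large the request was.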

[Qemu-devel] [PATCH 09/31] block: Switch bdrv_co_get_block_status_above() to byte-based

2017-04-17 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.

Signed-off-by: Eric Blake 
---
 block/io.c | 42 --
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/block/io.c b/block/io.c
index f7ece8d..10bc011 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1819,11 +1819,11 @@ out:
 return ret;
 }

-static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
+static int64_t coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
 BlockDriverState *base,
-int64_t sector_num,
-int nb_sectors,
-int *pnum,
+int64_t offset,
+int64_t bytes,
+int64_t *pnum,
 BlockDriverState **file)
 {
 BlockDriverState *p;
@@ -1831,41 +1831,32 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,

 assert(bs != base);
 for (p = bs; p != base; p = backing_bs(p)) {
-int64_t count;
-
-ret = bdrv_co_block_status(p, sector_num * BDRV_SECTOR_SIZE,
-   nb_sectors * BDRV_SECTOR_SIZE, &count,
-   file);
-*pnum = count >> BDRV_SECTOR_BITS;
+ret = bdrv_co_block_status(p, offset, bytes, pnum, file);
 if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
 break;
 }
-/* [sector_num, pnum] unallocated on this layer, which could be only
- * the first part of [sector_num, nb_sectors].  */
-nb_sectors = MIN(nb_sectors, *pnum);
+/* [offset, pnum] unallocated on this layer, which could be only
+ * the first part of [offset, bytes].  */
+bytes = MIN(bytes, *pnum);
 }
 return ret;
 }

 /* Coroutine wrapper for bdrv_get_block_status_above() */
-static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
+static void coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
 BdrvCoBlockStatusData *data = opaque;
-int n;

-data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
-   data->offset >> BDRV_SECTOR_BITS,
-   data->bytes >> BDRV_SECTOR_BITS,
-   &n,
-   data->file);
-*data->pnum = n * BDRV_SECTOR_SIZE;
+data->ret = bdrv_co_block_status_above(data->bs, data->base,
+   data->offset, data->bytes,
+   data->pnum, data->file);
 data->done = true;
 }

 /*
- * Synchronous wrapper around bdrv_co_get_block_status_above().
+ * Synchronous wrapper around bdrv_co_block_status_above().
  *
- * See bdrv_co_get_block_status_above() for details.
+ * See bdrv_co_block_status_above() for details.
  */
 int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 BlockDriverState *base,
@@ -1887,10 +1878,9 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,

 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
-bdrv_get_block_status_above_co_entry(&data);
+bdrv_block_status_above_co_entry(&data);
 } else {
-co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry,
-   &data);
+co = qemu_coroutine_create(bdrv_block_status_above_co_entry, &data);
 bdrv_coroutine_enter(bs, co);
 BDRV_POLL_WHILE(bs, !data.done);
 }
-- 
2.9.3
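
The loop in bdrv_co_block_status_above() walks the backing chain and narrows `bytes` at each unallocated layer, so that a lower layer is never asked about more than the upper layers have already classified. A minimal sketch of that idea, assuming a toy model in which each layer has a single allocated extent (Layer, layer_status() and status_above() are hypothetical names, not QEMU API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Toy layer: bytes [start, end) are allocated, everything else is not. */
typedef struct Layer {
    int64_t start;
    int64_t end;
    const struct Layer *backing;
} Layer;

/* Returns 1 if 'offset' is allocated in this layer, 0 otherwise. */
static int layer_status(const Layer *l, int64_t offset, int64_t bytes,
                        int64_t *pnum)
{
    if (offset >= l->start && offset < l->end) {
        *pnum = MIN(bytes, l->end - offset);
        return 1;
    }
    if (offset < l->start) {
        *pnum = MIN(bytes, l->start - offset);  /* hole before the extent */
    } else {
        *pnum = bytes;                          /* hole after the extent */
    }
    return 0;
}

/* Query layers from 'top' down to (but excluding) 'base'. */
static int status_above(const Layer *top, const Layer *base,
                        int64_t offset, int64_t bytes, int64_t *pnum)
{
    const Layer *p;
    int ret = 0;

    assert(top != base);
    for (p = top; p != base; p = p->backing) {
        ret = layer_status(p, offset, bytes, pnum);
        if (ret) {
            break;
        }
        /* Only the first *pnum bytes are known unallocated on this layer. */
        bytes = MIN(bytes, *pnum);
    }
    return ret;
}
```

If the top layer allocates [300, 400) and its backing layer allocates [0, 1000), a query for 1000 bytes at offset 0 is answered by the backing layer but limited to 300 bytes, because the top layer takes over at offset 300.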

[Qemu-devel] [PATCH 19/31] null: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake 
---
 block/null.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/block/null.c b/block/null.c
index b300390..2dc2dd7 100644
--- a/block/null.c
+++ b/block/null.c
@@ -204,22 +204,21 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
 return 0;
 }

-static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
+static int64_t coroutine_fn null_co_block_status(BlockDriverState *bs,
+ int64_t offset,
+ int64_t bytes, int64_t *pnum,
+ BlockDriverState **file)
 {
 BDRVNullState *s = bs->opaque;
-off_t start = sector_num * BDRV_SECTOR_SIZE;
+int64_t ret = BDRV_BLOCK_OFFSET_VALID | (offset & BDRV_BLOCK_OFFSET_MASK);

-*pnum = nb_sectors;
+*pnum = bytes;
 *file = bs;

 if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
 }
+return ret;
 }

 static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
@@ -250,7 +249,7 @@ static BlockDriver bdrv_null_co = {
 .bdrv_co_flush_to_disk  = null_co_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

 .bdrv_refresh_filename  = null_refresh_filename,
 };
@@ -269,7 +268,7 @@ static BlockDriver bdrv_null_aio = {
 .bdrv_aio_flush = null_aio_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

 .bdrv_refresh_filename  = null_refresh_filename,
 };
-- 
2.9.3
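
The return value built here packs status flags into the low bits of a sector-aligned offset, which is why the offset is masked before the flags are OR-ed in. A small sketch of that packing, with the caveat that the TOY_* names and the numeric flag values below are illustrative assumptions rather than the definitions in QEMU's headers:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real BDRV_BLOCK_* constants may differ. */
#define TOY_SECTOR_SIZE   512LL
#define TOY_ZERO          0x02
#define TOY_OFFSET_VALID  0x04
#define TOY_OFFSET_MASK   (~(TOY_SECTOR_SIZE - 1))

/* Combine a byte offset and flags into one int64_t, as the callback does. */
static int64_t pack_status(int64_t offset, int read_zeroes)
{
    int64_t ret = TOY_OFFSET_VALID | (offset & TOY_OFFSET_MASK);

    if (read_zeroes) {
        ret |= TOY_ZERO;
    }
    return ret;
}

/* Recover the (sector-aligned) offset from a packed status. */
static int64_t status_offset(int64_t ret)
{
    return ret & TOY_OFFSET_MASK;
}

/* Recover the flag bits from a packed status. */
static int status_flags(int64_t ret)
{
    return (int)(ret & ~TOY_OFFSET_MASK);
}
```

One consequence of this encoding is that the sub-sector part of an unaligned offset is dropped: pack_status(0x12345, 1) yields offset 0x12200 plus the two flag bits.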

[Qemu-devel] [PATCH 16/31] iscsi: Switch iscsi_allocmap_update() to byte-based

2017-04-17 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the allocmap (no semantic change).  Callers that already had bytes
available are simpler, and callers that now scale to bytes will be
easier to switch to byte-based in the future.
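
As a rough illustration of the byte-based cluster arithmetic in iscsi_allocmap_update(), the "expanded" and "shrunk" cluster ranges can be computed as below. ClusterSpan and cluster_span() are hypothetical names used only for this sketch; the four formulas follow the converted lines of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

typedef struct {
    int64_t first_expanded;  /* first cluster touched at all */
    int64_t count_expanded;  /* clusters touched at all */
    int64_t first_shrunk;    /* first cluster covered completely */
    int64_t count_shrunk;    /* clusters covered completely; may be <= 0 */
} ClusterSpan;

static ClusterSpan cluster_span(int64_t offset, int64_t bytes,
                                int64_t cluster_size)
{
    ClusterSpan s;

    assert(cluster_size > 0);
    /* Expand: every cluster that overlaps [offset, offset + bytes). */
    s.first_expanded = offset / cluster_size;
    s.count_expanded = DIV_ROUND_UP(offset + bytes, cluster_size)
                       - s.first_expanded;
    /* Shrink: only clusters lying entirely inside the range. */
    s.first_shrunk = DIV_ROUND_UP(offset, cluster_size);
    s.count_shrunk = (offset + bytes) / cluster_size - s.first_shrunk;
    return s;
}
```

With 4096-byte clusters, the range [1000, 11000) touches clusters 0..2 but fully covers only cluster 1. A range smaller than one cluster, such as [100, 300), gives a non-positive shrunk count, so a caller has to check for that before clearing bits.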

Signed-off-by: Eric Blake 
---
 block/iscsi.c | 90 +--
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 9648a45..e51fdfb 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -488,24 +488,22 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
 }

 static void
-iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
-  int nb_sectors, bool allocated, bool valid)
+iscsi_allocmap_update(IscsiLun *iscsilun, int64_t offset,
+  int64_t bytes, bool allocated, bool valid)
 {
 int64_t cl_num_expanded, nb_cls_expanded, cl_num_shrunk, nb_cls_shrunk;
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;

 if (iscsilun->allocmap == NULL) {
 return;
 }
 /* expand to entirely contain all affected clusters */
-assert(cluster_sectors);
-cl_num_expanded = sector_num / cluster_sectors;
-nb_cls_expanded = DIV_ROUND_UP(sector_num + nb_sectors,
-   cluster_sectors) - cl_num_expanded;
+assert(iscsilun->cluster_size);
+cl_num_expanded = offset / iscsilun->cluster_size;
+nb_cls_expanded = DIV_ROUND_UP(offset + bytes, iscsilun->cluster_size)
+- cl_num_expanded;
 /* shrink to touch only completely contained clusters */
-cl_num_shrunk = DIV_ROUND_UP(sector_num, cluster_sectors);
-nb_cls_shrunk = (sector_num + nb_sectors) / cluster_sectors
-  - cl_num_shrunk;
+cl_num_shrunk = DIV_ROUND_UP(offset, iscsilun->cluster_size);
+nb_cls_shrunk = (offset + bytes) / iscsilun->cluster_size - cl_num_shrunk;
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
 } else {
@@ -528,26 +526,26 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
 }

 static void
-iscsi_allocmap_set_allocated(IscsiLun *iscsilun, int64_t sector_num,
- int nb_sectors)
+iscsi_allocmap_set_allocated(IscsiLun *iscsilun, int64_t offset,
+ int64_t bytes)
 {
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, true, true);
+iscsi_allocmap_update(iscsilun, offset, bytes, true, true);
 }

 static void
-iscsi_allocmap_set_unallocated(IscsiLun *iscsilun, int64_t sector_num,
-   int nb_sectors)
+iscsi_allocmap_set_unallocated(IscsiLun *iscsilun, int64_t offset,
+   int64_t bytes)
 {
 /* Note: if cache.direct=on the fifth argument to iscsi_allocmap_update
  * is ignored, so this will in effect be an iscsi_allocmap_set_invalid.
  */
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, false, true);
+iscsi_allocmap_update(iscsilun, offset, bytes, false, true);
 }

-static void iscsi_allocmap_set_invalid(IscsiLun *iscsilun, int64_t sector_num,
-   int nb_sectors)
+static void iscsi_allocmap_set_invalid(IscsiLun *iscsilun, int64_t offset,
+   int64_t bytes)
 {
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, false, false);
+iscsi_allocmap_update(iscsilun, offset, bytes, false, false);
 }

 static void iscsi_allocmap_invalidate(IscsiLun *iscsilun)
@@ -561,34 +559,30 @@ static void iscsi_allocmap_invalidate(IscsiLun *iscsilun)
 }

 static inline bool
-iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t sector_num,
-int nb_sectors)
+iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t offset,
+int64_t bytes)
 {
 unsigned long size;
 if (iscsilun->allocmap == NULL) {
 return true;
 }
 assert(iscsilun->cluster_size);
-size = DIV_ROUND_UP(sector_num + nb_sectors,
-iscsilun->cluster_size >> BDRV_SECTOR_BITS);
+size = DIV_ROUND_UP(offset + bytes, iscsilun->cluster_size);
 return !(find_next_bit(iscsilun->allocmap, size,
-   sector_num * BDRV_SECTOR_SIZE /
-   iscsilun->cluster_size) == size);
+   offset / iscsilun->cluster_size) == size);
 }

 static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
-   int64_t sector_num, int nb_sectors)
+   int64_t offset, int64_t bytes)
 {
 unsigned long size;
 if (iscsilun->allocmap_valid == NULL) {
 return false;
 }
 assert(iscsilun->cluster_size);
-size = DIV_ROUND_UP(sector_num + nb_sectors,
-iscsilun->cluster_size >> BDRV_SE

[Qemu-devel] [PATCH 13/31] file-posix: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the file protocol driver accordingly.

Signed-off-by: Eric Blake 
---
 block/file-posix.c | 47 +++
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 1941fb6..690bd45 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1832,22 +1832,22 @@ static int find_allocation(BlockDriverState *bs, off_t start,
 /*
  * Returns the allocation status of the specified sectors.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
+ * If 'offset' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
  * allocated/unallocated state.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * 'bytes' is the max value 'pnum' should be set to.  If bytes goes
  * beyond the end of the disk image it will be clamped.
  */
-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum,
-BlockDriverState **file)
+static int64_t coroutine_fn raw_co_block_status(BlockDriverState *bs,
+int64_t offset,
+int64_t bytes, int64_t *pnum,
+BlockDriverState **file)
 {
-off_t start, data = 0, hole = 0;
+off_t data = 0, hole = 0;
 int64_t total_size;
 int ret;

@@ -1856,39 +1856,38 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
 return ret;
 }

-start = sector_num * BDRV_SECTOR_SIZE;
 total_size = bdrv_getlength(bs);
 if (total_size < 0) {
 return total_size;
-} else if (start >= total_size) {
+} else if (offset >= total_size) {
 *pnum = 0;
 return 0;
-} else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
-nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+} else if (offset + bytes > total_size) {
+bytes = total_size - offset;
 }

-ret = find_allocation(bs, start, &data, &hole);
+ret = find_allocation(bs, offset, &data, &hole);
 if (ret == -ENXIO) {
 /* Trailing hole */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_ZERO;
 } else if (ret < 0) {
 /* No info available, so pretend there are no holes */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA;
-} else if (data == start) {
-/* On a data extent, compute sectors to the end of the extent,
+} else if (data == offset) {
+/* On a data extent, compute bytes to the end of the extent,
  * possibly including a partial sector at EOF. */
-*pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+*pnum = MIN(bytes, hole - offset);
 ret = BDRV_BLOCK_DATA;
 } else {
-/* On a hole, compute sectors to the beginning of the next extent.  */
-assert(hole == start);
-*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+/* On a hole, compute bytes to the beginning of the next extent.  */
+assert(hole == offset);
+*pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }
 *file = bs;
-return ret | BDRV_BLOCK_OFFSET_VALID | start;
+return ret | BDRV_BLOCK_OFFSET_VALID | (offset & BDRV_BLOCK_OFFSET_MASK);
 }

 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
@@ -1963,7 +1962,7 @@ BlockDriver bdrv_file = {
 .bdrv_close = raw_close,
 .bdrv_create = raw_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = raw_co_get_block_status,
+.bdrv_co_block_status = raw_co_block_status,
 .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,

 .bdrv_co_preadv = raw_co_preadv,
-- 
2.9.3
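
The byte-based version no longer needs any DIV_ROUND_UP() juggling: clamping to end-of-file and measuring the distance to the next data/hole boundary are plain subtractions. A sketch of that control flow on a toy file with a single data extent (toy_extent_status() and the TOY_* codes are hypothetical; the real code gets its boundaries from find_allocation()):

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

enum { TOY_EOF = 0, TOY_DATA = 1, TOY_HOLE = 2 };

/*
 * Toy file of 'total_size' bytes whose only data extent is
 * [data_start, data_end); everything else is a hole.
 */
static int toy_extent_status(int64_t total_size,
                             int64_t data_start, int64_t data_end,
                             int64_t offset, int64_t bytes, int64_t *pnum)
{
    if (offset >= total_size) {
        *pnum = 0;
        return TOY_EOF;
    }
    if (offset + bytes > total_size) {
        bytes = total_size - offset;        /* clamp the request to EOF */
    }
    if (offset >= data_start && offset < data_end) {
        /* In data: count bytes up to the end of the extent. */
        *pnum = MIN(bytes, data_end - offset);
        return TOY_DATA;
    }
    if (offset < data_start) {
        /* In a hole: count bytes up to the start of the next extent. */
        *pnum = MIN(bytes, data_start - offset);
        return TOY_HOLE;
    }
    *pnum = bytes;                          /* trailing hole */
    return TOY_HOLE;
}
```

For a 10000-byte file with data in [4096, 8192), a 20000-byte query at offset 5000 is first clamped to 5000 bytes and then reported as 3192 bytes of data.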

[Qemu-devel] [PATCH 21/31] qcow: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow driver accordingly.

Signed-off-by: Eric Blake 
---
 block/qcow.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index 5d147b9..d7dfa08 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -515,20 +515,22 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
 return cluster_offset;
 }

-static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int64_t coroutine_fn qcow_co_block_status(BlockDriverState *bs,
+int64_t offset, int64_t bytes, int64_t *pnum, BlockDriverState **file)
 {
 BDRVQcowState *s = bs->opaque;
-int index_in_cluster, n;
+int index_in_cluster;
+int64_t n;
 uint64_t cluster_offset;

 qemu_co_mutex_lock(&s->lock);
-cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
+cluster_offset = get_cluster_offset(bs, offset, 0, 0, 0, 0);
 qemu_co_mutex_unlock(&s->lock);
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-n = s->cluster_sectors - index_in_cluster;
-if (n > nb_sectors)
-n = nb_sectors;
+index_in_cluster = offset & (s->cluster_size - 1);
+n = s->cluster_size - index_in_cluster;
+if (n > bytes) {
+n = bytes;
+}
 *pnum = n;
 if (!cluster_offset) {
 return 0;
@@ -536,7 +538,7 @@ static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
 if ((cluster_offset & QCOW_OFLAG_COMPRESSED) || s->cipher) {
 return BDRV_BLOCK_DATA;
 }
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+cluster_offset |= (index_in_cluster & BDRV_BLOCK_OFFSET_MASK);
 *file = bs->file->bs;
 return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | cluster_offset;
 }
@@ -1061,7 +1063,7 @@ static BlockDriver bdrv_qcow = {

 .bdrv_co_readv  = qcow_co_readv,
 .bdrv_co_writev = qcow_co_writev,
-.bdrv_co_get_block_status   = qcow_co_get_block_status,
+.bdrv_co_block_status   = qcow_co_block_status,

 .bdrv_set_key   = qcow_set_key,
 .bdrv_make_empty= qcow_make_empty,
-- 
2.9.3
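
The cluster arithmetic in this patch relies on the cluster size being a power of two, so that `offset & (cluster_size - 1)` gives the position inside the cluster. A stand-alone sketch of the length computation, using a hypothetical helper name bytes_to_cluster_end() that does not exist in the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes from 'offset' to the end of its cluster, capped at
 * 'bytes'.  The mask trick is only valid for power-of-two cluster sizes.
 */
static int64_t bytes_to_cluster_end(int64_t offset, int64_t bytes,
                                    int64_t cluster_size)
{
    int64_t index_in_cluster;
    int64_t n;

    assert(cluster_size > 0 && (cluster_size & (cluster_size - 1)) == 0);
    index_in_cluster = offset & (cluster_size - 1);
    n = cluster_size - index_in_cluster;
    if (n > bytes) {
        n = bytes;
    }
    return n;
}
```

With 64 KiB clusters, a 1 MiB request starting 100 bytes into the second cluster is reported as 65436 bytes, so each call describes at most one cluster.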

[Qemu-devel] [PATCH 15/31] iscsi: Switch cluster_sectors to byte-based

2017-04-17 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the cluster size in sectors, along with adding assertions that we
are not dividing by zero.

Signed-off-by: Eric Blake 
---
 block/iscsi.c | 56 +++-
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 5daa201..9648a45 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -79,7 +79,7 @@ typedef struct IscsiLun {
 unsigned long *allocmap;
 unsigned long *allocmap_valid;
 long allocmap_size;
-int cluster_sectors;
+int cluster_size;
 bool use_16_for_rw;
 bool write_protected;
 bool lbpme;
@@ -460,9 +460,10 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
 {
 iscsi_allocmap_free(iscsilun);

+assert(iscsilun->cluster_size);
 iscsilun->allocmap_size =
-DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks, iscsilun),
- iscsilun->cluster_sectors);
+DIV_ROUND_UP(iscsilun->num_blocks * iscsilun->block_size,
+ iscsilun->cluster_size);

 iscsilun->allocmap = bitmap_try_new(iscsilun->allocmap_size);
 if (!iscsilun->allocmap) {
@@ -470,7 +471,7 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
 }

 if (open_flags & BDRV_O_NOCACHE) {
-/* in case that cache.direct = on all allocmap entries are
+/* when cache.direct = on all allocmap entries are
  * treated as invalid to force a relookup of the block
  * status on every read request */
 return 0;
@@ -491,17 +492,19 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
   int nb_sectors, bool allocated, bool valid)
 {
 int64_t cl_num_expanded, nb_cls_expanded, cl_num_shrunk, nb_cls_shrunk;
+int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;

 if (iscsilun->allocmap == NULL) {
 return;
 }
 /* expand to entirely contain all affected clusters */
-cl_num_expanded = sector_num / iscsilun->cluster_sectors;
+assert(cluster_sectors);
+cl_num_expanded = sector_num / cluster_sectors;
 nb_cls_expanded = DIV_ROUND_UP(sector_num + nb_sectors,
-   iscsilun->cluster_sectors) - cl_num_expanded;
+   cluster_sectors) - cl_num_expanded;
 /* shrink to touch only completely contained clusters */
-cl_num_shrunk = DIV_ROUND_UP(sector_num, iscsilun->cluster_sectors);
-nb_cls_shrunk = (sector_num + nb_sectors) / iscsilun->cluster_sectors
+cl_num_shrunk = DIV_ROUND_UP(sector_num, cluster_sectors);
+nb_cls_shrunk = (sector_num + nb_sectors) / cluster_sectors
   - cl_num_shrunk;
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
@@ -565,9 +568,12 @@ iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t sector_num,
 if (iscsilun->allocmap == NULL) {
 return true;
 }
-size = DIV_ROUND_UP(sector_num + nb_sectors, iscsilun->cluster_sectors);
+assert(iscsilun->cluster_size);
+size = DIV_ROUND_UP(sector_num + nb_sectors,
+iscsilun->cluster_size >> BDRV_SECTOR_BITS);
 return !(find_next_bit(iscsilun->allocmap, size,
-   sector_num / iscsilun->cluster_sectors) == size);
+   sector_num * BDRV_SECTOR_SIZE /
+   iscsilun->cluster_size) == size);
 }

 static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
@@ -577,9 +583,12 @@ static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
 if (iscsilun->allocmap_valid == NULL) {
 return false;
 }
-size = DIV_ROUND_UP(sector_num + nb_sectors, iscsilun->cluster_sectors);
+assert(iscsilun->cluster_size);
+size = DIV_ROUND_UP(sector_num + nb_sectors,
+iscsilun->cluster_size >> BDRV_SECTOR_BITS);
 return (find_next_zero_bit(iscsilun->allocmap_valid, size,
-   sector_num / iscsilun->cluster_sectors) == size);
+   sector_num * BDRV_SECTOR_SIZE /
+   iscsilun->cluster_size) == size);
 }

 static int coroutine_fn
@@ -814,16 +823,21 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs,
 BlockDriverState *file;
 /* check the block status from the beginning of the cluster
  * containing the start sector */
-int64_t ret = iscsi_co_get_block_status(bs,
-  sector_num - sector_num % iscsilun->cluster_sectors,
-  BDRV_REQUEST_MAX_SECTORS, &pnum, &file);
+int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
+int head;
+int64_t ret;
+
+assert(cluster_sectors);
+head = sector_num % cluster_sectors;
+ret = isc

[Qemu-devel] [PATCH 07/31] block: Switch bdrv_co_get_block_status() to byte-based

2017-04-17 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and as with its public counterpart,
rename to bdrv_co_block_status() to make the compiler enforce that
we catch all uses.  For now, we assert that callers still pass
aligned data, but ultimately, this will be the function where we
hand off to a byte-based driver callback, and will eventually need
to add logic to ensure we round calls according to the driver's
request_alignment then touch up the result handed back to the
caller, if we start permitting a caller to pass unaligned offsets.

While at it, adjust the function to accept NULL for pnum or file,
while still guaranteeing the driver callback has a non-NULL pointer
to write into.
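
The NULL handling described above boils down to redirecting a missing out-parameter to a local dummy before the driver callback is invoked. A minimal sketch of that pattern, assuming a simplified callback that reports a file name string instead of a BlockDriverState (StatusFn, toy_status() and status_optional_out() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t (*StatusFn)(int64_t offset, int64_t bytes,
                            int64_t *pnum, const char **file);

/* Toy driver callback: writes both out-parameters unconditionally. */
static int64_t toy_status(int64_t offset, int64_t bytes,
                          int64_t *pnum, const char **file)
{
    (void)offset;
    *pnum = bytes;
    *file = "toy";
    return 1;
}

/* Wrapper that lets callers pass NULL for pnum and/or file. */
static int64_t status_optional_out(StatusFn fn, int64_t offset, int64_t bytes,
                                   int64_t *pnum, const char **file)
{
    int64_t tmp_pnum;
    const char *tmp_file;

    if (!pnum) {
        pnum = &tmp_pnum;   /* driver still gets a valid pointer */
    }
    if (!file) {
        file = &tmp_file;
    }
    *file = NULL;
    return fn(offset, bytes, pnum, file);
}
```

A caller that only cares about the status bits can then write status_optional_out(toy_status, 0, 100, NULL, NULL), while the callback itself stays free of NULL checks.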

Signed-off-by: Eric Blake 
---
 block/io.c | 88 +-
 1 file changed, 53 insertions(+), 35 deletions(-)

diff --git a/block/io.c b/block/io.c
index 5cdb1f0..e804bdd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1696,69 +1696,83 @@ typedef struct BdrvCoGetBlockStatusData {
  * Drivers not implementing the functionality are assumed to not support
  * backing files, hence all their sectors are reported as allocated.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
+ * If 'offset' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
- * allocated/unallocated state.
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
+ * allocated/unallocated state.  It may be NULL.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * 'bytes' is the max value 'pnum' should be set to.  If bytes goes
  * beyond the end of the disk image it will be clamped.
  *
- * If returned value is positive and BDRV_BLOCK_OFFSET_VALID bit is set, 'file'
- * points to the BDS which the sector range is allocated in.
+ * If returned value is positive and BDRV_BLOCK_OFFSET_VALID bit is set, and
+ * 'file' is not NULL, then *file points to the BDS which owns the allocated
+ * sector that contains offset.
  */
-static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
+static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum,
+ BlockDriverState **file)
 {
-int64_t total_sectors;
-int64_t n;
+int64_t total_size;
+int64_t n; /* bytes */
 int64_t ret, ret2;
+int count; /* sectors */
+BlockDriverState *tmp_file;

-total_sectors = bdrv_nb_sectors(bs);
-if (total_sectors < 0) {
-return total_sectors;
+total_size = bdrv_getlength(bs);
+if (total_size < 0) {
+return total_size;
 }

-if (sector_num >= total_sectors) {
+if (!pnum) {
+pnum = &n;
+}
+if (offset >= total_size) {
 *pnum = 0;
 return 0;
 }

-n = total_sectors - sector_num;
-if (n < nb_sectors) {
-nb_sectors = n;
+n = total_size - offset;
+if (n < bytes) {
+bytes = n;
 }

 if (!bs->drv->bdrv_co_get_block_status) {
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
 if (bs->drv->protocol_name) {
-ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
+ret |= BDRV_BLOCK_OFFSET_VALID | (offset & BDRV_BLOCK_OFFSET_MASK);
 }
 return ret;
 }

+if (!file) {
+file = &tmp_file;
+}
 *file = NULL;
 bdrv_inc_in_flight(bs);
-ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
+/* TODO: Rather than require aligned offsets, we could instead
+ * round to the driver's request_alignment here, then touch up
+ * count afterwards back to the caller's expectations.  But first
+ * we want to switch the driver callback to likewise be
+ * byte-based. */
+assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
+ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
+bytes >> BDRV_SECTOR_BITS, &count,
 file);
 if (ret < 0) {
 *pnum = 0;
 goto out;
 }
+*pnum = count * BDRV_SECTOR_SIZE;

 if (ret & BDRV_BLOCK_RAW) {
-int64_t

[Qemu-devel] [PATCH 11/31] block: Add .bdrv_co_block_status() callback

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based. Now that the block layer exposes byte-based allocation,
it's time to tackle the drivers.  Add a new callback that can operate
on boundaries as small as a single byte, and update the block layer to ensure
that the callback is only used with inputs aligned to the device's
request_alignment. Subsequent patches will then update individual
drivers, and then finally remove .bdrv_co_get_block_status().
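
One way the "inputs aligned to request_alignment" guarantee could be provided is to round the caller's range outwards before invoking the driver and then trim the answer back to what was asked. The sketch below shows that idea under stated assumptions; it is not the code of this patch. toy_driver_status(), status_unaligned(), TOY_ALIGN and the 1024-byte extent limit are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ALIGN 512   /* assumed request_alignment of the toy driver */

static int64_t align_down(int64_t n, int64_t m) { return n / m * m; }
static int64_t align_up(int64_t n, int64_t m) { return (n + m - 1) / m * m; }

typedef int64_t (*AlignedStatusFn)(int64_t offset, int64_t bytes,
                                   int64_t *pnum);

/* Toy driver: insists on aligned input, reports at most 1024 bytes. */
static int64_t toy_driver_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    assert(offset % TOY_ALIGN == 0 && bytes % TOY_ALIGN == 0);
    *pnum = bytes < 1024 ? bytes : 1024;
    return 1;
}

/* Accepts any byte range and hands the driver an aligned superset of it. */
static int64_t status_unaligned(AlignedStatusFn fn, int64_t align,
                                int64_t offset, int64_t bytes, int64_t *pnum)
{
    int64_t aligned_offset = align_down(offset, align);
    int64_t aligned_bytes = align_up(offset + bytes, align) - aligned_offset;
    int64_t ret = fn(aligned_offset, aligned_bytes, pnum);

    if (ret < 0) {
        *pnum = 0;
        return ret;
    }
    /* Drop the head added by rounding down, then cap at the request. */
    assert(*pnum > offset - aligned_offset);
    *pnum -= offset - aligned_offset;
    if (*pnum > bytes) {
        *pnum = bytes;
    }
    return ret;
}
```

For a 100-byte query at offset 700, the driver sees the aligned range [512, 1024) and the caller still gets back exactly 100 bytes; for a 5000-byte query at the same offset, the driver's 1024-byte answer starting at 512 becomes 836 bytes starting at 700.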

Signed-off-by: Eric Blake 
---
 include/block/block_int.h | 12 
 block/io.c| 47 +++
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index bc3a28a..8f20bc3 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -163,11 +163,23 @@ struct BlockDriver {
  */
 int coroutine_fn (*bdrv_co_pwrite_zeroes)(BlockDriverState *bs,
 int64_t offset, int count, BdrvRequestFlags flags);
+
 int coroutine_fn (*bdrv_co_pdiscard)(BlockDriverState *bs,
 int64_t offset, int count);
+
+/*
+ * Building block for bdrv_block_status[_above]. The block layer
+ * guarantees input aligned to request_alignment, as well as
+ * non-NULL pnum and file; and the result only has to worry about
+ * BDRV_BLOCK_DATA, _ZERO, _OFFSET_VALID, and _RAW, and only
+ * according to the current BDS.
+ */
 int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, int *pnum,
 BlockDriverState **file);
+int64_t coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bd,
+int64_t offset, int64_t bytes, int64_t *pnum,
+BlockDriverState **file);

 /*
  * Invalidate any cached meta-data.
diff --git a/block/io.c b/block/io.c
index 1b101cf..361eeb8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1718,7 +1718,6 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 int64_t total_size;
 int64_t n; /* bytes */
 int64_t ret, ret2;
-int count; /* sectors */
 BlockDriverState *tmp_file;

 total_size = bdrv_getlength(bs);
@@ -1739,7 +1738,7 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 bytes = n;
 }

-if (!bs->drv->bdrv_co_get_block_status) {
+if (!bs->drv->bdrv_co_get_block_status && !bs->drv->bdrv_co_block_status) {
 *pnum = bytes;
 ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
 if (bs->drv->protocol_name) {
@@ -1753,20 +1752,44 @@ static int64_t coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
 }
 *file = NULL;
 bdrv_inc_in_flight(bs);
-/* TODO: Rather than require aligned offsets, we could instead
- * round to the driver's request_alignment here, then touch up
- * count afterwards back to the caller's expectations.  But first
- * we want to switch the driver callback to likewise be
- * byte-based. */
-assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
-ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
-bytes >> BDRV_SECTOR_BITS, &count,
-file);
+if (bs->drv->bdrv_co_get_block_status) {
+int count; /* sectors */
+
+assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
+ret = bs->drv->bdrv_co_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
+bytes >> BDRV_SECTOR_BITS,
+&count, file);
+*pnum = count * BDRV_SECTOR_SIZE;
+} else {
+/* Round out to request_alignment boundaries */
+int64_t aligned_offset, aligned_bytes;
+
+aligned_offset = QEMU_ALIGN_DOWN(offset, bs->bl.request_alignment);
+aligned_bytes = ROUND_UP(offset + bytes,
+ bs->bl.request_alignment) - aligned_offset;
+ret = bs->drv->bdrv_co_block_status(bs, aligned_offset, aligned_bytes,
+&n, file);
+/* Clamp pnum and ret to original request */
+if (aligned_offset != offset && ret >= 0) {
+int sectors = DIV_ROUND_UP(offset, BDRV_SECTOR_SIZE) -
+DIV_ROUND_UP(aligned_offset, BDRV_SECTOR_SIZE);
+
+assert(n >= offset - aligned_offset);
+n -= offset - aligned_offset;
+if (sectors) {
+ret += sectors * BDRV_SECTOR_SIZE;
+}
+}
+if (ret >= 0 && n > bytes) {
+assert(aligned_bytes != bytes);
+n = bytes;
+}
+*pnum = n;
+}
 if (ret < 0) {
 *pnum = 0;
 goto out;
 }
-*pnum = count * BDRV_SECTOR_SIZE;

 if (ret & BDRV_BLOCK_RAW) {
 assert(ret & BDRV_BLOCK_OFFSET_VALID);
-- 
2.9.3

[Qemu-devel] [PATCH 14/31] gluster: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the gluster driver accordingly.
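
To make the effect of the unit change concrete, here is a small standalone
sketch of how an extent length can be derived in bytes from the data and
hole offsets that a SEEK_DATA/SEEK_HOLE style lookup returns.  The names
extent_status, EXT_DATA and EXT_ZERO are invented for the sketch; it is
not the gluster driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

enum { EXT_DATA, EXT_ZERO }; /* hypothetical status values */

/*
 * 'data' and 'hole' are the next data and hole offsets at or after
 * 'offset'.  Exactly one of them is expected to equal 'offset'.
 */
static int extent_status(int64_t offset, int64_t bytes,
                         int64_t data, int64_t hole, int64_t *pnum)
{
    if (data == offset) {
        /* In a data extent: it runs until the next hole. */
        *pnum = MIN(bytes, hole - offset);
        return EXT_DATA;
    }
    /* In a hole: it runs until the next data extent. */
    assert(hole == offset);
    *pnum = MIN(bytes, data - offset);
    return EXT_ZERO;
}
```

Because the result is in bytes, no DIV_ROUND_UP or division by the sector
size is needed, and an extent that ends mid-sector is reported exactly.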

Signed-off-by: Eric Blake 
---
 block/gluster.c | 47 +++
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 1d4e2f7..3f252c6 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1332,24 +1332,24 @@ exit:
 /*
  * Returns the allocation status of the specified sectors.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
+ * If 'offset' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
  * allocated/unallocated state.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * 'bytes' is the max value 'pnum' should be set to.  If bytes goes
  * beyond the end of the disk image it will be clamped.
  *
- * (Based on raw_co_get_block_status() from file-posix.c.)
+ * (Based on raw_co_block_status() from file-posix.c.)
  */
-static int64_t coroutine_fn qemu_gluster_co_get_block_status(
-BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
+static int64_t coroutine_fn qemu_gluster_co_block_status(
+BlockDriverState *bs, int64_t offset, int64_t bytes, int64_t *pnum,
 BlockDriverState **file)
 {
 BDRVGlusterState *s = bs->opaque;
-off_t start, data = 0, hole = 0;
+off_t data = 0, hole = 0;
 int64_t total_size;
 int ret = -EINVAL;

@@ -1357,41 +1357,40 @@ static int64_t coroutine_fn qemu_gluster_co_get_block_status(
 return ret;
 }

-start = sector_num * BDRV_SECTOR_SIZE;
 total_size = bdrv_getlength(bs);
 if (total_size < 0) {
 return total_size;
-} else if (start >= total_size) {
+} else if (offset >= total_size) {
 *pnum = 0;
 return 0;
-} else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
-nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+} else if (offset + bytes > total_size) {
+bytes = total_size - offset;
 }

-ret = find_allocation(bs, start, &data, &hole);
+ret = find_allocation(bs, offset, &data, &hole);
 if (ret == -ENXIO) {
 /* Trailing hole */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_ZERO;
 } else if (ret < 0) {
 /* No info available, so pretend there are no holes */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA;
-} else if (data == start) {
+} else if (data == offset) {
 /* On a data extent, compute sectors to the end of the extent,
  * possibly including a partial sector at EOF. */
-*pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+*pnum = MIN(bytes, hole - offset);
 ret = BDRV_BLOCK_DATA;
 } else {
 /* On a hole, compute sectors to the beginning of the next extent.  */
-assert(hole == start);
-*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+assert(hole == offset);
+*pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }

 *file = bs;

-return ret | BDRV_BLOCK_OFFSET_VALID | start;
+return ret | BDRV_BLOCK_OFFSET_VALID | (offset & BDRV_BLOCK_OFFSET_MASK);
 }


@@ -1419,7 +1418,7 @@ static BlockDriver bdrv_gluster = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 .bdrv_co_pwrite_zeroes= qemu_gluster_co_pwrite_zeroes,
 #endif
-.bdrv_co_get_block_status = qemu_gluster_co_get_block_status,
+.bdrv_co_block_status = qemu_gluster_co_block_status,
 .create_opts  = &qemu_gluster_create_opts,
 };

@@ -1447,7 +1446,7 @@ static BlockDriver bdrv_gluster_tcp = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 .bdrv_co_pwrite_zeroes= qemu_gluster_co_pwrite_zeroes,
 #endif
-.bdrv_co_get_block_status = qemu_gluster_co_get_block_status,
+.bdrv_co_block_status = qemu_gluster_co_block_status,
 .create_opts  = &qemu_gluster_create_opts,
 };

@@ -1475,7 +1474,7 @@ static BlockDriver bdrv_gluster_unix = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 .bdrv_co_pwrite_zeroes= qemu_gluster_co_pwrite_zeroes,
 #endif
-.bdrv_co_get_block_status = qemu_gluster_co_get_block_status,
+.bdrv_co_block_status = qemu_gluster_co_block_status,
 .create_opts  = &qemu_gluster_create_opts,
 };

@@ -1509,7 +1508,7 @@ static BlockDriver bdrv_gluster_rdma = {
 #ifdef CONFIG_GLUSTERFS_ZEROFILL
 .bdrv_co_pwrite_zeroes= qemu_gluster_co_pwrite_zeroes,
 #endif
-.bdrv_co_get_block_status = qemu_gluster_c

[Qemu-devel] [PATCH 12/31] commit: Switch to .bdrv_co_block_status()

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the commit driver accordingly.

Signed-off-by: Eric Blake 
---
 block/commit.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 989de7d..1cc7a00 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -228,14 +228,14 @@ static int coroutine_fn bdrv_commit_top_preadv(BlockDriverState *bs,
 return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
 }

-static int64_t coroutine_fn bdrv_commit_top_get_block_status(
-BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
+static int64_t coroutine_fn bdrv_commit_top_block_status(
+BlockDriverState *bs, int64_t offset, int64_t bytes, int64_t *pnum,
 BlockDriverState **file)
 {
-*pnum = nb_sectors;
+*pnum = bytes;
 *file = bs->backing->bs;
 return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
-   (sector_num << BDRV_SECTOR_BITS);
+   (offset & BDRV_BLOCK_OFFSET_MASK);
 }

 static void bdrv_commit_top_refresh_filename(BlockDriverState *bs, QDict *opts)
@@ -263,7 +263,7 @@ static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
 static BlockDriver bdrv_commit_top = {
 .format_name= "commit_top",
 .bdrv_co_preadv = bdrv_commit_top_preadv,
-.bdrv_co_get_block_status   = bdrv_commit_top_get_block_status,
+.bdrv_co_block_status   = bdrv_commit_top_block_status,
 .bdrv_refresh_filename  = bdrv_commit_top_refresh_filename,
 .bdrv_close = bdrv_commit_top_close,
 .bdrv_child_perm= bdrv_commit_top_child_perm,
-- 
2.9.3

[Qemu-devel] [PATCH 06/31] block: Convert bdrv_get_block_status() to bytes

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the name of the function from bdrv_get_block_status() to
bdrv_block_status() ensures that the compiler enforces that all
callers are updated.  For now, the io.c layer still assert()s that
all callers are sector-aligned, but that can be relaxed when a later
patch implements byte-based block status in the drivers.

Note that we have an inherent limitation in the BDRV_BLOCK_* return
values: BDRV_BLOCK_OFFSET_VALID can only return the start of a
sector, even if we later relax the interface to query for the status
starting at an intermediate byte; document the obvious interpretation
that valid offsets are always sector-relative.
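
A small sketch of what "sector-relative" means in practice may help.  The
constants below are local stand-ins chosen for the example (they are not
copied from QEMU headers), and pack_status/host_address are hypothetical
helpers: the status word carries only the start of the sector, so the
caller adds back the intra-sector position of the address it asked about.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for this sketch. */
#define SECTOR_SIZE       512LL
#define OFFSET_MASK       (~(SECTOR_SIZE - 1))
#define FLAG_DATA         0x01
#define FLAG_OFFSET_VALID 0x04

/* Pack status flags plus the start of the sector holding host_offset. */
static int64_t pack_status(int64_t flags, int64_t host_offset)
{
    return flags | FLAG_OFFSET_VALID | (host_offset & OFFSET_MASK);
}

/*
 * Recover the host byte address for guest_offset.  This only works when
 * guest and host offsets are equal modulo the sector size.
 */
static int64_t host_address(int64_t status, int64_t guest_offset)
{
    assert(status & FLAG_OFFSET_VALID);
    return (status & OFFSET_MASK) + guest_offset % SECTOR_SIZE;
}
```

The low nine bits stay free for flags, which is why a mapping that shifts
data by a non-multiple of 512 cannot be described this way.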

Therefore, for the most part this patch is just the addition of scaling
at the callers followed by inverse scaling at bdrv_block_status().  But
some code, particularly bdrv_is_allocated(), gets a lot simpler because
it no longer has to mess with sectors; also, it is now possible to pass
NULL if the caller does not care how much of the image is allocated
beyond the initial offset.

For ease of review, bdrv_get_block_status_above() will be tackled
separately.

Signed-off-by: Eric Blake 
---
 include/block/block.h | 15 +--
 block/io.c            | 51 ++-
 block/qcow2-cluster.c |  2 +-
 qemu-img.c            | 19 ++-
 4 files changed, 46 insertions(+), 41 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index eed1330..b9e7281 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -121,10 +121,10 @@ typedef struct HDGeometry {

 /*
  * Allocation status flags
- * BDRV_BLOCK_DATA: data is read from a file returned by bdrv_get_block_status.
+ * BDRV_BLOCK_DATA: data is read from a file returned by bdrv_block_status.
  * BDRV_BLOCK_ZERO: sectors read as zero
  * BDRV_BLOCK_OFFSET_VALID: sector stored as raw data in a file returned by
- *  bdrv_get_block_status.
+ *  bdrv_block_status.
  * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
  *   layer (as opposed to the backing file)
  * BDRV_BLOCK_RAW: used internally to indicate that the request
@@ -132,7 +132,10 @@ typedef struct HDGeometry {
  * should look in bs->file directly.
  *
  * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
- * bs->file where sector data can be read from as raw data.
+ * bs->file that begins the sector containing the address in question,
+ * where the sector can be read from as raw data with individual bytes
+ * at the same sector-relative locations (and thus, this bit cannot be
+ * set for mappings which are not equivalent modulo 512).
  *
  * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
  *
@@ -414,9 +417,9 @@ int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
-int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
-  int nb_sectors, int *pnum,
-  BlockDriverState **file);
+int64_t bdrv_block_status(BlockDriverState *bs, int64_t offset,
+  int64_t bytes, int64_t *pnum,
+  BlockDriverState **file);
 int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 BlockDriverState *base,
 int64_t sector_num,
diff --git a/block/io.c b/block/io.c
index 1f8ae81..5cdb1f0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -669,7 +669,6 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 int64_t target_size, ret, bytes, offset = 0;
 BlockDriverState *bs = child->bs;
 BlockDriverState *file;
-int n; /* sectors */

 target_size = bdrv_getlength(bs);
 if (target_size < 0) {
@@ -681,24 +680,23 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 if (bytes <= 0) {
 return 0;
 }
-ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
-bytes >> BDRV_SECTOR_BITS, &n, &file);
+ret = bdrv_block_status(bs, offset, bytes, &bytes, &file);
 if (ret < 0) {
 error_report("error getting block status at offset %" PRId64 ": %s",
  offset, strerror(-ret));
 return ret;
 }
 if (ret & BDRV_BLOCK_ZERO) {
-offset += n * BDRV_SECTOR_SIZE;
+offset += bytes;
 continue;

[Qemu-devel] [PATCH 04/31] block: Switch bdrv_make_zero() to byte-based

2017-04-17 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of zeroing a device to track by bytes instead of
sectors (although we are still guaranteed that we iterate by steps
that are sector-aligned).
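
The loop shape after the conversion can be sketched on its own as below.
MAX_REQUEST_BYTES is an assumed cap for the example and count_chunks is a
hypothetical function that only counts iterations; a real loop would query
block status and write zeroes where the comment indicates.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REQUEST_BYTES (1LL << 20) /* assumed cap for this sketch */

/* Count how many requests are needed to walk target_size bytes. */
static int count_chunks(int64_t target_size)
{
    int64_t offset = 0;
    int chunks = 0;

    for (;;) {
        int64_t bytes = target_size - offset;

        if (bytes > MAX_REQUEST_BYTES) {
            bytes = MAX_REQUEST_BYTES;
        }
        if (bytes <= 0) {
            return chunks;
        }
        /* A real loop would query status or write zeroes here. */
        offset += bytes;
        chunks++;
    }
}
```

Tracking offset and bytes directly means the only place a sector
conversion is still needed is at the boundary to a callee that has not
been converted yet.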

Signed-off-by: Eric Blake 
---
 block/io.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/block/io.c b/block/io.c
index 07165dc..1f8ae81 100644
--- a/block/io.c
+++ b/block/io.c
@@ -666,39 +666,39 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
  */
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
-int64_t target_sectors, ret, nb_sectors, sector_num = 0;
+int64_t target_size, ret, bytes, offset = 0;
 BlockDriverState *bs = child->bs;
 BlockDriverState *file;
-int n;
+int n; /* sectors */

-target_sectors = bdrv_nb_sectors(bs);
-if (target_sectors < 0) {
-return target_sectors;
+target_size = bdrv_getlength(bs);
+if (target_size < 0) {
+return target_size;
 }

 for (;;) {
-nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);
-if (nb_sectors <= 0) {
+bytes = MIN(target_size - offset, BDRV_REQUEST_MAX_BYTES);
+if (bytes <= 0) {
 return 0;
 }
-ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n, &file);
+ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS,
+bytes >> BDRV_SECTOR_BITS, &n, &file);
 if (ret < 0) {
-error_report("error getting block status at sector %" PRId64 ": %s",
- sector_num, strerror(-ret));
+error_report("error getting block status at offset %" PRId64 ": %s",
+ offset, strerror(-ret));
 return ret;
 }
 if (ret & BDRV_BLOCK_ZERO) {
-sector_num += n;
+offset += n * BDRV_SECTOR_SIZE;
 continue;
 }
-ret = bdrv_pwrite_zeroes(child, sector_num << BDRV_SECTOR_BITS,
- n << BDRV_SECTOR_BITS, flags);
+ret = bdrv_pwrite_zeroes(child, offset, n * BDRV_SECTOR_SIZE, flags);
 if (ret < 0) {
-error_report("error writing zeroes at sector %" PRId64 ": %s",
- sector_num, strerror(-ret));
+error_report("error writing zeroes at offset %" PRId64 ": %s",
+ offset, strerror(-ret));
 return ret;
 }
-sector_num += n;
+offset += n * BDRV_SECTOR_SIZE;
 }
 }

-- 
2.9.3

[Qemu-devel] [PATCH 03/31] qcow2: Switch is_zero_sectors() to byte-based

2017-04-17 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and rename it to is_zero_above() in
the process.
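
For readers following the byte arithmetic, here is a standalone sketch of
widening a byte range to sector boundaries and then clamping it to the
image length.  widen_and_clamp is a hypothetical helper, the 512-byte
sector size is hard-coded, and image_size is assumed to be sector-aligned
with the range starting inside the image.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR 512LL /* assumed sector size for the sketch */

/*
 * Widen [offset, offset + bytes) outward to sector boundaries, then
 * clamp the widened length so it does not run past image_size.
 */
static void widen_and_clamp(int64_t offset, int64_t bytes,
                            int64_t image_size,
                            int64_t *start, int64_t *len)
{
    *start = offset / SECTOR * SECTOR;
    *len = (offset + bytes + SECTOR - 1) / SECTOR * SECTOR - *start;
    if (*start + *len > image_size) {
        /* Clamp relative to the aligned start, not the raw offset. */
        *len = image_size - *start;
    }
}
```

Clamping against the aligned start keeps the resulting length a whole
number of sectors, which matters as long as the status query underneath is
still made in sectors.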

Signed-off-by: Eric Blake 
---
 block/qcow2.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4d34610..fe4ccf6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2442,23 +2442,30 @@ finish:
 }


-static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
-uint32_t count)
+static bool is_zero_above(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
 int nr;
 BlockDriverState *file;
 int64_t res;
+int64_t start;

-if (start + count > bs->total_sectors) {
-count = bs->total_sectors - start;
+/* Widen to sector boundaries, then clamp to image length, before
+ * checking status of underlying sectors */
+start = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+bytes = QEMU_ALIGN_UP(offset + bytes, BDRV_SECTOR_SIZE) - start;
+
+if (start + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
+bytes = bs->total_sectors * BDRV_SECTOR_SIZE - start;
 }

-if (!count) {
+if (!bytes) {
 return true;
 }
-res = bdrv_get_block_status_above(bs, NULL, start, count,
+res = bdrv_get_block_status_above(bs, NULL, start >> BDRV_SECTOR_BITS,
+  bytes >> BDRV_SECTOR_BITS,
   &nr, &file);
-return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
+return res >= 0 && (res & BDRV_BLOCK_ZERO) &&
+nr * BDRV_SECTOR_SIZE == bytes;
 }

 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
@@ -2476,24 +2483,21 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
 }

 if (head || tail) {
-int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
 uint64_t off;
 unsigned int nr;

 assert(head + count <= s->cluster_size);

 /* check whether remainder of cluster already reads as zero */
-if (!(is_zero_sectors(bs, cl_start,
-  DIV_ROUND_UP(head, BDRV_SECTOR_SIZE)) &&
-  is_zero_sectors(bs, (offset + count) >> BDRV_SECTOR_BITS,
-  DIV_ROUND_UP(-tail & (s->cluster_size - 1),
-   BDRV_SECTOR_SIZE)))) {
+if (!(is_zero_above(bs, offset - head, head) &&
+  is_zero_above(bs, offset + count,
+tail ? s->cluster_size - tail : 0))) {
 return -ENOTSUP;
 }

 qemu_co_mutex_lock(&s->lock);
 /* We can have new write after previous check */
-offset = cl_start << BDRV_SECTOR_BITS;
+offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
 count = s->cluster_size;
 nr = s->cluster_size;
 ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
-- 
2.9.3

[Qemu-devel] [PATCH 10/31] block: Convert bdrv_get_block_status_above() to bytes

2017-04-17 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the name of the function from bdrv_get_block_status_above()
to bdrv_block_status_above() ensures that the compiler enforces that
all callers are updated.  For now, the io.c layer still assert()s
that all callers are sector-aligned, but that can be relaxed when a
later patch implements byte-based block status in the drivers.

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_block_status().  But some
code, particularly bdrv_block_status(), gets a lot simpler because
it no longer has to mess with sectors; also, it is now possible to pass
NULL if the caller does not care how much of the image is allocated
beyond the initial offset, or which BDS in the chain owns the sector.
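
The "pass NULL if you do not care" convention can be sketched in isolation
as follows.  query_status and its toy allocation model (the first 4096
bytes count as allocated) are invented for the example; the point is only
the pattern of redirecting a NULL out-parameter to a local scratch
variable so the rest of the function can write through it unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical query: returns 1 if 'offset' lies in allocated space,
 * and optionally reports how many bytes share that status.
 */
static int query_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    int64_t n; /* scratch target when the caller passes NULL */

    if (!pnum) {
        pnum = &n;
    }
    /* Toy model: only the first 4096 bytes are allocated. */
    if (offset < 4096) {
        *pnum = bytes < 4096 - offset ? bytes : 4096 - offset;
        return 1;
    }
    *pnum = bytes;
    return 0;
}
```

Callers that only want the status no longer need to declare a dummy
variable just to satisfy the signature.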

For ease of review, bdrv_get_block_status() was tackled separately.

Signed-off-by: Eric Blake 
---
 include/block/block.h | 10 +-
 block/io.c            | 36 +++-
 block/mirror.c        | 14 +-
 block/qcow2.c         | 10 +++---
 qemu-img.c            | 37 +
 5 files changed, 45 insertions(+), 62 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index b9e7281..8f2b8a2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -420,11 +420,11 @@ bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int64_t bdrv_block_status(BlockDriverState *bs, int64_t offset,
   int64_t bytes, int64_t *pnum,
   BlockDriverState **file);
-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-BlockDriverState *base,
-int64_t sector_num,
-int nb_sectors, int *pnum,
-BlockDriverState **file);
+int64_t bdrv_block_status_above(BlockDriverState *bs,
+BlockDriverState *base,
+int64_t offset,
+int64_t bytes, int64_t *pnum,
+BlockDriverState **file);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
   int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
diff --git a/block/io.c b/block/io.c
index 10bc011..1b101cf 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1842,7 +1842,7 @@ static int64_t coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
 return ret;
 }

-/* Coroutine wrapper for bdrv_get_block_status_above() */
+/* Coroutine wrapper for bdrv_block_status_above() */
 static void coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
 BdrvCoBlockStatusData *data = opaque;
@@ -1858,21 +1858,19 @@ static void coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
  *
  * See bdrv_co_block_status_above() for details.
  */
-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-BlockDriverState *base,
-int64_t sector_num,
-int nb_sectors, int *pnum,
-BlockDriverState **file)
+int64_t bdrv_block_status_above(BlockDriverState *bs,
+BlockDriverState *base,
+int64_t offset, int64_t bytes, int64_t *pnum,
+BlockDriverState **file)
 {
 Coroutine *co;
-int64_t n;
 BdrvCoBlockStatusData data = {
 .bs = bs,
 .base = base,
 .file = file,
-.offset = sector_num * BDRV_SECTOR_SIZE,
-.bytes = nb_sectors * BDRV_SECTOR_SIZE,
-.pnum = &n,
+.offset = offset,
+.bytes = bytes,
+.pnum = pnum,
 .done = false,
 };

@@ -1884,7 +1882,6 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 bdrv_coroutine_enter(bs, co);
 BDRV_POLL_WHILE(bs, !data.done);
 }
-*pnum = n >> BDRV_SECTOR_BITS;
 return data.ret;
 }

@@ -1892,27 +1889,16 @@ int64_t bdrv_block_status(BlockDriverState *bs,
   int64_t offset, int64_t bytes, int64_t *pnum,
   BlockDriverState **file)
 {
-int64_t ret;
-int n;
-
-assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
-assert(bytes <= BDRV_REQUEST_MAX_BYTES);
-ret = bdrv_get_block_status_above(bs, backing_bs(bs),
-  offset >> BDRV_SECTOR_BITS,
-  bytes >> BDRV_SECTOR_BITS, &n, file);
-if (pnum) {
-

[Qemu-devel] [PATCH 05/31] qemu-img: Switch get_block_status() to byte-based

2017-04-17 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting
an internal function (no semantic change), and simplifying its
caller accordingly.
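
The simplified caller computes its probe length directly in bytes.  A
standalone sketch of that computation, with probe_len as a hypothetical
helper and a hard-coded 512-byte sector size:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR 512LL /* assumed sector size for the sketch */

/*
 * Probe at most 1 GiB at a time, never past the end of the image, and
 * round the result down to a whole number of sectors.
 */
static int64_t probe_len(int64_t length, int64_t offset)
{
    int64_t n = length - offset;

    if (n > (1LL << 30)) {
        n = 1LL << 30;
    }
    return n / SECTOR * SECTOR;
}
```

This replaces the earlier bookkeeping of a sector number plus a count of
sectors left with a single offset and a single byte count.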

Signed-off-by: Eric Blake 
---
 qemu-img.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index d96b4d6..8cb5165 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2662,14 +2662,16 @@ static void dump_map_entry(OutputFormat output_format, MapEntry *e,
 }
 }

-static int get_block_status(BlockDriverState *bs, int64_t sector_num,
-int nb_sectors, MapEntry *e)
+static int get_block_status(BlockDriverState *bs, int64_t offset,
+int64_t bytes, MapEntry *e)
 {
 int64_t ret;
 int depth;
 BlockDriverState *file;
 bool has_offset;
+int nb_sectors = bytes >> BDRV_SECTOR_BITS;

+assert(bytes < INT_MAX);
 /* As an optimization, we could cache the current range of unallocated
  * clusters in each file of the chain, and avoid querying the same
  * range repeatedly.
@@ -2677,8 +2679,8 @@ static int get_block_status(BlockDriverState *bs, int64_t sector_num,

 depth = 0;
 for (;;) {
-ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &nb_sectors,
-&file);
+ret = bdrv_get_block_status(bs, offset >> BDRV_SECTOR_BITS, nb_sectors,
+&nb_sectors, &file);
 if (ret < 0) {
 return ret;
 }
@@ -2698,7 +2700,7 @@ static int get_block_status(BlockDriverState *bs, int64_t sector_num,
 has_offset = !!(ret & BDRV_BLOCK_OFFSET_VALID);

 *e = (MapEntry) {
-.start = sector_num * BDRV_SECTOR_SIZE,
+.start = offset,
 .length = nb_sectors * BDRV_SECTOR_SIZE,
 .data = !!(ret & BDRV_BLOCK_DATA),
 .zero = !!(ret & BDRV_BLOCK_ZERO),
@@ -2823,16 +2825,12 @@ static int img_map(int argc, char **argv)

 length = blk_getlength(blk);
 while (curr.start + curr.length < length) {
-int64_t nsectors_left;
-int64_t sector_num;
-int n;
-
-sector_num = (curr.start + curr.length) >> BDRV_SECTOR_BITS;
+int64_t offset = curr.start + curr.length;
+int64_t n;

 /* Probe up to 1 GiB at a time.  */
-nsectors_left = DIV_ROUND_UP(length, BDRV_SECTOR_SIZE) - sector_num;
-n = MIN(1 << (30 - BDRV_SECTOR_BITS), nsectors_left);
-ret = get_block_status(bs, sector_num, n, &next);
+n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE);
+ret = get_block_status(bs, offset, n, &next);

 if (ret < 0) {
 error_report("Could not read file metadata: %s", strerror(-ret));
-- 
2.9.3

[Qemu-devel] [PATCH 08/31] block: Switch BdrvCoGetBlockStatusData to byte-based

2017-04-17 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
type (no semantic change), and rename it to match the corresponding
public function rename.
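
The intermediate state this patch creates, a byte-based request structure
in front of a callee that still speaks sectors, can be sketched as below.
struct status_req, legacy_status and status_entry are all hypothetical
names, and the callee's behaviour (it reports at most 8 sectors) is made
up so the scaling is visible.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (1LL << SECTOR_BITS)

/* Sector-based callee standing in for a not-yet-converted function. */
static int legacy_status(int64_t sector_num, int nb_sectors, int *pnum)
{
    (void)sector_num;
    *pnum = nb_sectors < 8 ? nb_sectors : 8;
    return 0;
}

/* Byte-based request, as seen by the converted caller. */
struct status_req {
    int64_t offset;
    int64_t bytes;
    int64_t *pnum;
    int ret;
};

/* Scale bytes down for the callee, then scale its answer back up. */
static void status_entry(struct status_req *req)
{
    int n;

    req->ret = legacy_status(req->offset >> SECTOR_BITS,
                             (int)(req->bytes >> SECTOR_BITS), &n);
    *req->pnum = (int64_t)n * SECTOR_SIZE;
}
```

Each later patch in a conversion like this can then move the scaling one
layer further down until it disappears.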

Signed-off-by: Eric Blake 
---
 block/io.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index e804bdd..f7ece8d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1680,16 +1680,16 @@ int bdrv_flush_all(void)
 }


-typedef struct BdrvCoGetBlockStatusData {
+typedef struct BdrvCoBlockStatusData {
 BlockDriverState *bs;
 BlockDriverState *base;
 BlockDriverState **file;
-int64_t sector_num;
-int nb_sectors;
-int *pnum;
+int64_t offset;
+int64_t bytes;
+int64_t *pnum;
 int64_t ret;
 bool done;
-} BdrvCoGetBlockStatusData;
+} BdrvCoBlockStatusData;

 /*
  * Returns the allocation status of the specified sectors.
@@ -1850,13 +1850,15 @@ static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
 /* Coroutine wrapper for bdrv_get_block_status_above() */
 static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
 {
-BdrvCoGetBlockStatusData *data = opaque;
+BdrvCoBlockStatusData *data = opaque;
+int n;

 data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
-   data->sector_num,
-   data->nb_sectors,
-   data->pnum,
+   data->offset >> BDRV_SECTOR_BITS,
+   data->bytes >> BDRV_SECTOR_BITS,
+   &n,
data->file);
+*data->pnum = n * BDRV_SECTOR_SIZE;
 data->done = true;
 }

@@ -1872,13 +1874,14 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 BlockDriverState **file)
 {
 Coroutine *co;
-BdrvCoGetBlockStatusData data = {
+int64_t n;
+BdrvCoBlockStatusData data = {
 .bs = bs,
 .base = base,
 .file = file,
-.sector_num = sector_num,
-.nb_sectors = nb_sectors,
-.pnum = pnum,
+.offset = sector_num * BDRV_SECTOR_SIZE,
+.bytes = nb_sectors * BDRV_SECTOR_SIZE,
+.pnum = &n,
 .done = false,
 };

@@ -1891,6 +1894,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 bdrv_coroutine_enter(bs, co);
 BDRV_POLL_WHILE(bs, !data.done);
 }
+*pnum = n >> BDRV_SECTOR_BITS;
 return data.ret;
 }

-- 
2.9.3

[Qemu-devel] [PATCH 02/31] block: Make bdrv_round_to_clusters() signature more useful

2017-04-17 Thread Eric Blake

In the process of converting sector-based interfaces to bytes,
I'm finding it easier to represent a byte count as a 64-bit
integer at the block layer (even if we are internally capped
by SIZE_MAX or even INT_MAX for individual transactions, it's
still nicer to not have to worry about truncation/overflow
issues on as many variables).  Update the signature of
bdrv_round_to_clusters() to uniformly use int64_t, matching
the signature already chosen for bdrv_is_allocated, and
adjust clients according to the required fallout.
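
The truncation concern can be shown with a few lines of standalone C.
narrow_count and fits_unsigned_int are hypothetical helpers, and the
values in the note below assume a platform where unsigned int is 32 bits
wide.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Converting to unsigned int keeps only the low bits of the value
 * (the result is reduced modulo UINT_MAX + 1).
 */
static unsigned int narrow_count(int64_t bytes)
{
    return (unsigned int)bytes;
}

/* Check that a 64-bit byte count survives the narrowing unchanged. */
static bool fits_unsigned_int(int64_t bytes)
{
    return bytes >= 0 && (uint64_t)bytes <= UINT_MAX;
}
```

With a 32-bit unsigned int, a count of 4 GiB plus 512 bytes silently
becomes 512 once narrowed.  Keeping the variable 64 bits wide and
asserting the bound only at the point where a narrower type is truly
required makes that failure mode explicit.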

Signed-off-by: Eric Blake 
---
 include/block/block.h | 4 ++--
 block/io.c            | 7 ---
 block/mirror.c        | 9 -
 block/trace-events    | 2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 86ad511..eed1330 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -468,9 +468,9 @@ int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
 void bdrv_round_to_clusters(BlockDriverState *bs,
-int64_t offset, unsigned int bytes,
+int64_t offset, int64_t bytes,
 int64_t *cluster_offset,
-unsigned int *cluster_bytes);
+int64_t *cluster_bytes);

 const char *bdrv_get_encrypted_filename(BlockDriverState *bs);
 void bdrv_get_backing_filename(BlockDriverState *bs,
diff --git a/block/io.c b/block/io.c
index d61a906..07165dc 100644
--- a/block/io.c
+++ b/block/io.c
@@ -422,9 +422,9 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
  * Round a region to cluster boundaries
  */
 void bdrv_round_to_clusters(BlockDriverState *bs,
-int64_t offset, unsigned int bytes,
+int64_t offset, int64_t bytes,
 int64_t *cluster_offset,
-unsigned int *cluster_bytes)
+int64_t *cluster_bytes)
 {
 BlockDriverInfo bdi;

@@ -920,7 +920,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
 struct iovec iov;
 QEMUIOVector bounce_qiov;
 int64_t cluster_offset;
-unsigned int cluster_bytes;
+int64_t cluster_bytes;
 size_t skip_bytes;
 int ret;

@@ -941,6 +941,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
 trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
cluster_offset, cluster_bytes);

+assert(cluster_bytes < SIZE_MAX);
 iov.iov_len = cluster_bytes;
 iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
 if (bounce_buffer == NULL) {
diff --git a/block/mirror.c b/block/mirror.c
index 846e392..2510793 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -177,7 +177,7 @@ static void mirror_read_complete(void *opaque, int ret)
 /* Clip bytes relative to offset to not exceed end-of-file */
 static inline void mirror_clip_bytes(MirrorBlockJob *s,
  int64_t offset,
- unsigned int *bytes)
+ int64_t *bytes)
 {
 *bytes = MIN(*bytes, s->bdev_length - offset);
 }
@@ -190,10 +190,9 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
 bool need_cow;
 int ret = 0;
 int64_t align_offset = *offset;
-unsigned int align_bytes = *bytes;
+int64_t align_bytes = *bytes;
 int max_bytes = s->granularity * s->max_iov;

-assert(*bytes < INT_MAX);
 need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
 need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
   s->cow_bitmap);
@@ -384,7 +383,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 while (nb_chunks > 0 && offset < s->bdev_length) {
 int64_t ret;
 int io_sectors;
-unsigned int io_bytes;
+int64_t io_bytes;
 int64_t io_bytes_acct;
 BlockDriverState *file;
 enum MirrorMethod {
@@ -410,7 +409,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 io_bytes = s->granularity;
 } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
 int64_t target_offset;
-unsigned int target_bytes;
+int64_t target_bytes;
 bdrv_round_to_clusters(blk_bs(s->target), offset, io_bytes,
&target_offset, &target_bytes);
 if (target_offset == offset &&
diff --git a/block/trace-events b/block/trace-events
index 04f6463..2301a50 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -15,7 +15,7 @@ bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, 
void *opaque) "bs
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"P

[Qemu-devel] [PATCH 00/31] make bdrv_get_block_status byte-based

2017-04-17 Thread Eric Blake

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.
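
As an illustration of what "byte granularity" means for a helper such as cluster rounding, here is a minimal sketch. It is a hypothetical stand-alone function (the name round_to_clusters and the explicit cluster_size parameter are assumptions made for this example), not the bdrv_round_to_clusters() touched by this series:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: widen the byte range
 * [offset, offset + bytes) so that it starts and ends on a cluster
 * boundary. A cluster_size of 0 (or less) means "unknown", in which
 * case the range is passed through unchanged.
 */
static void round_to_clusters(int64_t cluster_size,
                              int64_t offset, int64_t bytes,
                              int64_t *cluster_offset,
                              int64_t *cluster_bytes)
{
    int64_t span;

    assert(offset >= 0 && bytes >= 0);

    if (cluster_size <= 0) {
        *cluster_offset = offset;
        *cluster_bytes = bytes;
        return;
    }

    /* Align the start down to the containing cluster. */
    *cluster_offset = offset - offset % cluster_size;

    /* Distance from the aligned start to the end of the request... */
    span = offset - *cluster_offset + bytes;

    /* ...rounded up to a whole number of clusters. */
    *cluster_bytes = (span + cluster_size - 1) / cluster_size * cluster_size;
}
```

Everything stays in bytes from input to output, so no caller has to scale by a sector size before or after the call.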

This is part three of that conversion: bdrv_get_block_status(). Earlier
parts, for bdrv_is_allocated and dirty-bitmaps, have been (mostly) reviewed.

Perhaps I could have split this in two; patches 1-10 vs. 11-31 make
a nice division of labor.

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-status-v1

It requires the following (v1 of bdrv_is_allocated, v1 of dirty-bitmap,
v9 of blkdebug, and Max's block-next tree):
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01995.html
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01723.html
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02163.html

The diffstat shows no net change in total lines - but I know that the
new code has more lines of comments than the old ;)

I still haven't felt like tackling the task of rewriting migration/block.c
to use bytes (instead of sectors) everywhere - that might give another
net win in lines of code and legibility, but I also know it would
conflict with some of the refactoring work that Juan is currently
posting for review.

Eric Blake (31):
  block: Drop unused bdrv_round_sectors_to_clusters()
  block: Make bdrv_round_to_clusters() signature more useful
  qcow2: Switch is_zero_sectors() to byte-based
  block: Switch bdrv_make_zero() to byte-based
  qemu-img: Switch get_block_status() to byte-based
  block: Convert bdrv_get_block_status() to bytes
  block: Switch bdrv_co_get_block_status() to byte-based
  block: Switch BdrvCoGetBlockStatusData to byte-based
  block: Switch bdrv_co_get_block_status_above() to byte-based
  block: Convert bdrv_get_block_status_above() to bytes
  block: Add .bdrv_co_block_status() callback
  commit: Switch to .bdrv_co_block_status()
  file-posix: Switch to .bdrv_co_block_status()
  gluster: Switch to .bdrv_co_block_status()
  iscsi: Switch cluster_sectors to byte-based
  iscsi: Switch iscsi_allocmap_update() to byte-based
  iscsi: Switch to .bdrv_co_block_status()
  mirror: Switch to .bdrv_co_block_status()
  null: Switch to .bdrv_co_block_status()
  parallels: Switch to .bdrv_co_block_status()
  qcow: Switch to .bdrv_co_block_status()
  qcow2: Switch to .bdrv_co_block_status()
  qed: Switch to .bdrv_co_block_status()
  raw: Switch to .bdrv_co_block_status()
  sheepdog: Switch to .bdrv_co_block_status()
  vdi: Avoid bitrot of debugging code
  vdi: Switch to .bdrv_co_block_status()
  vmdk: Switch to .bdrv_co_block_status()
  vpc: Switch to .bdrv_co_block_status()
  vvfat: Switch to .bdrv_co_block_status()
  block: Drop unused .bdrv_co_get_block_status()

 include/block/block.h |  33 +++
 include/block/block_int.h |  13 ++-
 block/commit.c|  10 +-
 block/file-posix.c|  47 +
 block/gluster.c   |  47 +
 block/io.c| 243 ++
 block/iscsi.c | 144 ++-
 block/mirror.c|  33 +++
 block/null.c  |  21 ++--
 block/parallels.c |  15 +--
 block/qcow.c  |  22 +++--
 block/qcow2-cluster.c |   2 +-
 block/qcow2.c |  51 +-
 block/qed.c   |  22 ++---
 block/raw-format.c|  16 +--
 block/sheepdog.c  |  23 +++--
 block/vdi.c   |  37 ---
 block/vmdk.c  |  24 ++---
 block/vpc.c   |  31 +++---
 block/vvfat.c |  12 +--
 qemu-img.c|  70 ++---
 block/trace-events|   2 +-
 22 files changed, 459 insertions(+), 459 deletions(-)

-- 
2.9.3

[Qemu-devel] [PATCH 01/31] block: Drop unused bdrv_round_sectors_to_clusters()

2017-04-17 Thread Eric Blake

Now that the last user [mirror_iteration()] has converted to using
bytes, we no longer need a function to round sectors to clusters.

Signed-off-by: Eric Blake 
---
 include/block/block.h |  4 
 block/io.c| 21 -
 2 files changed, 25 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 740cb86..86ad511 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -467,10 +467,6 @@ const char *bdrv_get_device_or_node_name(const BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
-void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-int64_t *cluster_sector_num,
-int *cluster_nb_sectors);
 void bdrv_round_to_clusters(BlockDriverState *bs,
 int64_t offset, unsigned int bytes,
 int64_t *cluster_offset,
diff --git a/block/io.c b/block/io.c
index d22d35f..d61a906 100644
--- a/block/io.c
+++ b/block/io.c
@@ -419,27 +419,6 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
 }

 /**
- * Round a region to cluster boundaries (sector-based)
- */
-void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-int64_t *cluster_sector_num,
-int *cluster_nb_sectors)
-{
-BlockDriverInfo bdi;
-
-if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
-*cluster_sector_num = sector_num;
-*cluster_nb_sectors = nb_sectors;
-} else {
-int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
-*cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
-*cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
-nb_sectors, c);
-}
-}
-
-/**
  * Round a region to cluster boundaries
  */
 void bdrv_round_to_clusters(BlockDriverState *bs,
-- 
2.9.3

Re: [Qemu-devel] [PATCH 02/15] colo-compare: implement the process of checkpoint

2017-04-17 Thread Zhang Chen




On 04/17/2017 07:04 PM, Hailiang Zhang wrote:

Hi Jason,

On 2017/4/14 14:38, Jason Wang wrote:


On 2017-04-14 14:22, Hailiang Zhang wrote:

Hi Jason,

On 2017/4/14 13:57, Jason Wang wrote:

On 2017-02-22 17:31, Zhang Chen wrote:

On 02/22/2017 11:42 AM, zhanghailiang wrote:

While doing a checkpoint, we need to flush all the unhandled packets.
By using the filter notifier mechanism, we can easily notify
every compare object to do this, which runs inside
the compare threads as a coroutine.

Hi~ Jason and Hailiang.

I will send a patch set later adding a colo-compare notify mechanism for
Xen, similar to this patch.
I want to add a new chardev socket to colo-compare that connects to Xen
COLO, to notify it of checkpoint or failover, because we have no other
way to communicate with the Xen code.
That means we will have two notify mechanisms.
What do you think about this?


Thanks
Zhang Chen
I was thinking about the possibility of using a similar approach for
colo-compare.

E.g. can we use a socket? This can save duplicated code, more or less.

Since there are already too many sockets used by the filters and COLO
(two unix sockets and two tcp sockets for each vNIC), I don't want to
introduce more ;) , but I'm not sure if it is possible to make it more
flexible and optional: abstract the duplicated code and pass the opened
fd (whether eventfd or socket fd) as a parameter, for example.
Is this acceptable?

Thanks,
Hailiang

Yes, that's kind of what I want. We don't want to use two message
formats. Passing an opened fd needs management support, so we still need a
fallback if there's no management on top. For qemu/kvm, we can do all of
this transparently to the CLI by e.g. socketpair() or others, but the key
is to have a unified message format.
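
A minimal sketch of the "unified message format" idea, assuming a fixed-size message exchanged over whatever fd is handed in. The NotifyMsg layout, the event codes, and the notify_send()/notify_recv() names are all hypothetical and are not taken from the COLO patches:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical event codes, for illustration only. */
enum { NOTIFY_CHECKPOINT = 1, NOTIFY_FAILOVER = 2 };

/* Hypothetical wire format: one fixed-size field, same for every transport. */
typedef struct NotifyMsg {
    uint32_t event;
} NotifyMsg;

/* Send one event on fd; returns 0 on success, -1 on error or short write. */
static int notify_send(int fd, uint32_t event)
{
    NotifyMsg msg = { .event = event };
    ssize_t n = write(fd, &msg, sizeof(msg));

    return n == (ssize_t)sizeof(msg) ? 0 : -1;
}

/*
 * Receive one event from fd; returns 0 on success, -1 on error or short
 * read. A production version would loop on partial reads and EINTR.
 */
static int notify_recv(int fd, uint32_t *event)
{
    NotifyMsg msg;
    ssize_t n = read(fd, &msg, sizeof(msg));

    if (n != (ssize_t)sizeof(msg)) {
        return -1;
    }
    *event = msg.event;
    return 0;
}
```

With no management layer on top, the two ends could come from socketpair(AF_UNIX, SOCK_STREAM, 0, fds); with one, the same functions would work on an fd that was passed in.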


After a deeper investigation, I think we can re-use most of the code. Since
there is no existing way to notify Xen (no?), we still need the notify
chardev socket (used to notify Xen; it is optional).
(http://patchwork.ozlabs.org/patch/733431/ "COLO-compare: Add Xen 
notify chardev socket handler frame")


Besides, there is an existing QMP command 'xen-colo-do-checkpoint'; we
can re-use it to notify colo-compare objects and other filter objects to
do a checkpoint. For the opposite direction, we use the notify chardev
socket (only for Xen).

So the code will look like:
diff --git a/migration/colo.c b/migration/colo.c
index 91da936..813c281 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -224,7 +224,19 @@ ReplicationStatus *qmp_query_xen_replication_status(Error **errp)


 void qmp_xen_colo_do_checkpoint(Error **errp)
 {
+Error *local_err = NULL;
+
 replication_do_checkpoint_all(errp);
+/* Notify colo-compare and other filters to do checkpoint */
+colo_notify_compares_event(NULL, COLO_CHECKPOINT, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+colo_notify_filters_event(COLO_CHECKPOINT, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+}
 }

 static void colo_send_message(QEMUFile *f, COLOMessage msg,
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 24e13f0..de975c5 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -391,6 +391,9 @@ static void colo_compare_inconsistent_notify(void)
 {
 notifier_list_notify(&colo_compare_notifiers,
 migrate_get_current());
+if (s->notify_dev) {
+   /* Do something, notify remote side through notify dev */
+}
 }

 void colo_compare_register_notifier(Notifier *notify)

How about this approach?


I agree with this approach; maybe renaming qmp_xen_colo_do_checkpoint()
to qmp_remote_colo_do_checkpoint() would be more generic.

Thanks
Zhang Chen




Thoughts?

Thanks

--
Thanks
Zhang Chen

Re: [Qemu-devel] [PULL 2/8] replication: clarify permissions

2017-04-17 Thread Eric Blake

On 03/17/2017 08:15 AM, Kevin Wolf wrote:
> From: Changlong Xie 
> 
> Even if hidden_disk, secondary_disk are backing files, they all need
> write permissions in replication scenario. Otherwise we will encounter
> the exceptions below on the secondary side when adding the nbd server:
> 
> {'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 'writable': true } }
> {"error": {"class": "GenericError", "desc": "Conflicts with use by hidden-qcow2-driver as 'backing', which does not allow 'write' on sec-qcow2-driver-for-nbd"}}
> 
> CC: Zhang Hailiang 
> CC: Zhang Chen 
> CC: Wen Congyang 

This address for Wen Congyang is different than the one listed in
MAINTAINERS for replication (M: Wen Congyang ),
and different still from addresses my mailer has harvested from other
posts (wencongy...@gmail.com).  The MAINTAINERS entry is now resulting
in 'undelivered mail' bounce messages, can you please submit an update
to MAINTAINERS with your new preferred address? [or gently correct me if
I'm confusing two people with the same name?]

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [Qemu-devel] [Qemu-block] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based

2017-04-17 Thread Eric Blake

On 04/17/2017 06:42 PM, John Snow wrote:
> 
> 
> On 04/11/2017 06:29 PM, Eric Blake wrote:
>> There are patches floating around to add NBD_CMD_BLOCK_STATUS,
>> but NBD wants to report status on byte granularity (even if the
>> reporting will probably be naturally aligned to sectors or even
>> much higher levels).  I've therefore started the task of
>> converting our block status code to report at a byte granularity
>> rather than sectors.
>>
>> This is part one of that conversion: bdrv_is_allocated().
>> Other parts (still to be written) include tracking dirty bitmaps
>> by bytes (it's still one bit per granularity, but now we won't
>> be double-scaling from bytes to sectors to granularity),

That series is not only written, but reviewed now:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02163.html


>> then
>> replacing bdrv_get_block_status() with a byte based callback
>> in all the drivers.

Coming up soon.

>>
>> Available as a tag at:
>> git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-allocated-v1
>>
>> It requires v9 or later of my prior work on blkdebug:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01723.html
>> which in turn requires Max's block-next tree:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html
>>

>>  include/block/block.h|   6 +-
>>  include/block/block_backup.h |  11 +-
>>  include/qemu/ratelimit.h |   3 +-
>>  block/backup.c   | 126 --
>>  block/commit.c   |  54 
>>  block/io.c   |  59 +
>>  block/mirror.c   | 300 ++-
>>  block/replication.c  |  29 +++--
>>  block/stream.c   |  35 +++--
>>  block/vvfat.c|  34 +++--
>>  migration/block.c|   9 +-
>>  qemu-img.c   |  15 ++-
>>  qemu-io-cmds.c   |  57 
>>  block/trace-events   |  14 +-
>>  14 files changed, 381 insertions(+), 371 deletions(-)
> 
> Shame, you added ten lines!

Yeah, some of that back-and-forth scaling is verbose and resulted in
line breaks that used to fit in one.  Of course, the second series
regained those ten lines and more, once I got to flatten some of the
interfaces away from repeated scaling:

$ git diff --stat nbd-byte-allocated-v1..nbd-byte-dirty-v1 | cat
 include/block/block_int.h|  2 +-
 include/block/dirty-bitmap.h | 21 ---
 block/backup.c   |  7 ++--
 block/dirty-bitmap.c | 83

 block/io.c   |  6 ++--
 block/mirror.c   | 73 +-
 migration/block.c| 14 
 7 files changed, 74 insertions(+), 132 deletions(-)

> 
>>
> 
> Patches 1-15:
> 
> Reviewed-by: John Snow 
> 
> 9: Is there a good reason for a void fn() to return its argument via a
> passed parameter? I see you're matching the other interface, but that
> strikes me as wonky.

It bothered me a bit too; beyond the copy-and-paste factor, my
justification was that the parameter is both input and output, rather
than output-only. But I don't mind respinning to add a preliminary patch
that fixes mirror_clip_sectors() to return a value, then pattern
mirror_clip_bytes to do likewise.
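
A minimal sketch of what a value-returning clip helper could look like, under the assumption that the device length is passed in directly. The name clip_bytes and its signature are hypothetical; the posted patch still uses the in/out pointer form of mirror_clip_bytes():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical value-returning variant, for illustration only: clip
 * 'bytes' relative to 'offset' so the range does not run past
 * end-of-file, and return the clipped length instead of writing it
 * back through a pointer.
 */
static inline int64_t clip_bytes(int64_t bdev_length,
                                 int64_t offset, int64_t bytes)
{
    int64_t remaining;

    assert(offset >= 0 && offset <= bdev_length);

    remaining = bdev_length - offset;
    return bytes < remaining ? bytes : remaining;
}
```

A caller would then write something like io_bytes = clip_bytes(length, offset, io_bytes); which keeps the input and the result visible at the call site.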

> 
> 11: Looks correct to me, but this one's a bit hairier than the rest. How
> many times do we truly need to round, adjust, clip, round again, align,
> clip, round, align, ...

I don't know that there are that many rounds of clipping going on, but
there is definitely a lot of scaling, and it gets better as later
patches make even more things be byte-based.

> 
> I'll take a peek at the last two tomorrow.

Thanks for plodding through it so far. For supposedly being no semantic
change, it is still definitely a lot to think about.  But the end result
is more legible, in my opinion.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Qemu-devel] [Qemu-block] [PATCH 13/17] backup: Switch block_backup.h to byte-based

2017-04-17 Thread Eric Blake

On 04/17/2017 06:24 PM, John Snow wrote:
> 
> 
> On 04/11/2017 06:29 PM, Eric Blake wrote:
>> We are gradually converting to byte-based interfaces, as they are
>> easier to reason about than sector-based.  Continue by converting
>> the public interface to backup jobs (no semantic change), including
>> a change to CowRequest to track by bytes instead of cluster indices.
>>
>> Signed-off-by: Eric Blake 
>> ---

>>
>> -void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
>> -  int nb_sectors);
>> +void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
>> +  uint64_t bytes);
>>  void backup_cow_request_begin(CowRequest *req, BlockJob *job,
>> -  int64_t sector_num,
>> -  int nb_sectors);
>> +  int64_t offset, uint64_t bytes);
>>  void backup_cow_request_end(CowRequest *req);
> 
> Should we adjust the parameter names of cow_request_begin and
> wait_for_overlapping_requests, too?

Sure, I can do that.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Qemu-devel] [Qemu-block] [PATCH 00/17] make bdrv_is_allocated[_above] byte-based

2017-04-17 Thread John Snow



On 04/11/2017 06:29 PM, Eric Blake wrote:
> There are patches floating around to add NBD_CMD_BLOCK_STATUS,
> but NBD wants to report status on byte granularity (even if the
> reporting will probably be naturally aligned to sectors or even
> much higher levels).  I've therefore started the task of
> converting our block status code to report at a byte granularity
> rather than sectors.
> 
> This is part one of that conversion: bdrv_is_allocated().
> Other parts (still to be written) include tracking dirty bitmaps
> by bytes (it's still one bit per granularity, but now we won't
> be double-scaling from bytes to sectors to granularity), then
> replacing bdrv_get_block_status() with a byte based callback
> in all the drivers.
> 
> Available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-allocated-v1
> 
> It requires v9 or later of my prior work on blkdebug:
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01723.html
> which in turn requires Max's block-next tree:
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html
> 
> Eric Blake (17):
>   blockjob: Track job ratelimits via bytes, not sectors
>   trace: Show blockjob actions via bytes, not sectors
>   stream: Switch stream_populate() to byte-based
>   stream: Switch stream_run() to byte-based
>   commit: Switch commit_populate() to byte-based
>   commit: Switch commit_run() to byte-based
>   mirror: Switch MirrorBlockJob to byte-based
>   mirror: Switch mirror_do_zero_or_discard() to byte-based
>   mirror: Switch mirror_cow_align() to byte-based
>   mirror: Switch mirror_do_read() to byte-based
>   mirror: Switch mirror_iteration() to byte-based
>   backup: Switch BackupBlockJob to byte-based
>   backup: Switch block_backup.h to byte-based
>   backup: Switch backup_do_cow() to byte-based
>   backup: Switch backup_run() to byte-based
>   block: Make bdrv_is_allocated() byte-based
>   block: Make bdrv_is_allocated_above() byte-based
> 
>  include/block/block.h|   6 +-
>  include/block/block_backup.h |  11 +-
>  include/qemu/ratelimit.h |   3 +-
>  block/backup.c   | 126 --
>  block/commit.c   |  54 
>  block/io.c   |  59 +
>  block/mirror.c   | 300 ++-
>  block/replication.c  |  29 +++--
>  block/stream.c   |  35 +++--
>  block/vvfat.c|  34 +++--
>  migration/block.c|   9 +-
>  qemu-img.c   |  15 ++-
>  qemu-io-cmds.c   |  57 
>  block/trace-events   |  14 +-
>  14 files changed, 381 insertions(+), 371 deletions(-)

Shame, you added ten lines!

> 

Patches 1-15:

Reviewed-by: John Snow 

9: Is there a good reason for a void fn() to return its argument via a
passed parameter? I see you're matching the other interface, but that
strikes me as wonky.

11: Looks correct to me, but this one's a bit hairier than the rest. How
many times do we truly need to round, adjust, clip, round again, align,
clip, round, align, ...

I'll take a peek at the last two tomorrow.

--js

Re: [Qemu-devel] [PATCH v4 0/4] X86/HMP: Expose x86 model specific registers via human monitor

2017-04-17 Thread Eduardo Habkost

On Tue, Apr 04, 2017 at 11:33:33AM +0200, Julian Kirsch wrote:
> ping
> 
> I kindly request your comments.

Hi Julian,

Sorry for taking so long to reply.

I can't find the original series on either qemu-devel archives,
or on my own mail archive.

Searching for the Message-Id you were replying to, I can't find
any matches:
https://www.mail-archive.com/search?l=mid&q=20170329183017.14026-1-git%40kirschju.re

Can you resend?

> 
> On 29.03.2017 20:30, Julian Kirsch wrote:
> > Provide read/write access to x86 model specific registers (MSRs) by means of
> > two new HMP commands "msr_get" and "msr_set". The rationale behind this
> > is to improve introspection capabilities for system virtualization mode.
> > For instance, many modern x86-64 operating systems maintain access to internal
> > data structures via the MSR_GSBASE/MSR_KERNELGSBASE MSRs. Giving
> > introspection utilities (such as a remotely attached gdb via "monitor msr_get")
> > a way of accessing these registers improves analysis results drastically.
> > 
> > This iteration addresses Eduardo's comments of splitting the patch up into
> > movement, reordering and addition of new MSRs.
> > 
> > Changes v3 -> v4:
> > * Split up x86-related parts of the patch into three distinct patches
> >   performing movement, reordering and addition of new MSRs.
> > 
> > Changes v2 -> v3:
> > * Rename HMP commands to "msr_get" and "msr_set"
> > 
> > Changes v1 -> v2:
> > * Rename HMP commands to "msr-get" and "msr-set"
> > * HMP commands operate on the current default CPU only
> >   (removes need for cpu_index argument)
> > * Remove QMP command altogether
> > * Implement HMP command in target/i386/monitor.c
> > * Add #ifdef TARGET_I386 around msr-get/msr-set in hmp-commands.hx
> > 
> > Julian Kirsch (4):
> >   X86: Move rdmsr/wrmsr functionality to standalone functions
> >   X86: Reorder MSRs in rdmsr/wrmsr to follow the order used by KVM
> >   X86: Add MSRs supported by KVM to rdmsr/wrmsr
> >   HMP: Introduce msr_get and msr_set HMP commands
> > 
> >  hmp-commands.hx  |  38 
> >  include/monitor/hmp-target.h |   2 +
> >  target/i386/cpu.h|   3 +
> >  target/i386/helper.c | 524 +++
> >  target/i386/misc_helper.c| 297 +---
> >  target/i386/monitor.c|  76 +++
> >  6 files changed, 654 insertions(+), 286 deletions(-)
> > 
> 
> 

-- 
Eduardo

Re: [Qemu-devel] [Qemu-block] [PATCH 13/17] backup: Switch block_backup.h to byte-based

2017-04-17 Thread John Snow



On 04/11/2017 06:29 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting
> the public interface to backup jobs (no semantic change), including
> a change to CowRequest to track by bytes instead of cluster indices.
> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/block_backup.h | 11 +--
>  block/backup.c   | 29 ++---
>  block/replication.c  | 12 
>  3 files changed, 27 insertions(+), 25 deletions(-)
> 
> diff --git a/include/block/block_backup.h b/include/block/block_backup.h
> index 8a75947..994a3bd 100644
> --- a/include/block/block_backup.h
> +++ b/include/block/block_backup.h
> @@ -21,17 +21,16 @@
>  #include "block/block_int.h"
> 
>  typedef struct CowRequest {
> -int64_t start;
> -int64_t end;
> +int64_t start_byte;
> +int64_t end_byte;
>  QLIST_ENTRY(CowRequest) list;
>  CoQueue wait_queue; /* coroutines blocked on this request */
>  } CowRequest;
> 
> -void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
> -  int nb_sectors);
> +void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
> +  uint64_t bytes);
>  void backup_cow_request_begin(CowRequest *req, BlockJob *job,
> -  int64_t sector_num,
> -  int nb_sectors);
> +  int64_t offset, uint64_t bytes);
>  void backup_cow_request_end(CowRequest *req);

Should we adjust the parameter names of cow_request_begin and
wait_for_overlapping_requests, too?

> 
>  void backup_do_checkpoint(BlockJob *job, Error **errp);
> diff --git a/block/backup.c b/block/backup.c
> index a64a162..0502c1a 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -64,7 +64,7 @@ static void coroutine_fn 
> wait_for_overlapping_requests(BackupBlockJob *job,
>  do {
>  retry = false;
>  QLIST_FOREACH(req, &job->inflight_reqs, list) {
> -if (end > req->start && start < req->end) {
> +if (end > req->start_byte && start < req->end_byte) {
>  qemu_co_queue_wait(&req->wait_queue, NULL);
>  retry = true;
>  break;
> @@ -77,8 +77,8 @@ static void coroutine_fn 
> wait_for_overlapping_requests(BackupBlockJob *job,
>  static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
>   int64_t start, int64_t end)
>  {
> -req->start = start;
> -req->end = end;
> +req->start_byte = start;
> +req->end_byte = end;
>  qemu_co_queue_init(&req->wait_queue);
>  QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
>  }
> @@ -114,8 +114,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>sector_num * BDRV_SECTOR_SIZE,
>nb_sectors * BDRV_SECTOR_SIZE);
> 
> -wait_for_overlapping_requests(job, start, end);
> -cow_request_begin(&cow_request, job, start, end);
> +wait_for_overlapping_requests(job, start * job->cluster_size,
> +  end * job->cluster_size);
> +cow_request_begin(&cow_request, job, start * job->cluster_size,
> +  end * job->cluster_size);
> 
>  for (; start < end; start++) {
>  if (test_bit(start, job->done_bitmap)) {
> @@ -277,32 +279,29 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
>  bitmap_zero(backup_job->done_bitmap, len);
>  }
> 
> -void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
> -  int nb_sectors)
> +void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
> +  uint64_t bytes)
>  {
>  BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> -int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
>  int64_t start, end;
> 
>  assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
> 
> -start = sector_num / sectors_per_cluster;
> -end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
> +start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
> +end = QEMU_ALIGN_UP(offset + bytes, backup_job->cluster_size);
>  wait_for_overlapping_requests(backup_job, start, end);
>  }
> 
>  void backup_cow_request_begin(CowRequest *req, BlockJob *job,
> -  int64_t sector_num,
> -  int nb_sectors)
> +  int64_t offset, uint64_t bytes)
>  {
>  BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> -int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
>  int64_t start, end;
> 
>  assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
> 
> -start = sect

[Qemu-devel] [RFC 6/7] pci: Set phb->bus inside pci_bus_new()

2017-04-17 Thread Eduardo Habkost

Every single caller of pci_bus_new() saves the return value inside
phb->bus. Do that inside pci_bus_new() to avoid code duplication and
make it harder to break.
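
The pattern can be sketched in isolation as follows. The Host and Bus types and the bus_new() function are simplified, hypothetical stand-ins for PCIHostState, PCIBus and pci_bus_new(); this is not the QEMU code:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for PCIBus and PCIHostState. */
typedef struct Bus Bus;

typedef struct Host {
    Bus *bus;       /* back-pointer that every caller used to set by hand */
} Host;

struct Bus {
    Host *parent;
};

/*
 * Create a bus for 'host' and record it in host->bus before returning,
 * so that no caller can forget the assignment. Returns NULL if the
 * allocation fails.
 */
static Bus *bus_new(Host *host)
{
    Bus *bus = calloc(1, sizeof(*bus));

    if (!bus) {
        return NULL;
    }
    bus->parent = host;
    host->bus = bus;
    return bus;
}
```

Callers that only need the side effect can ignore the return value and read host->bus afterwards, which is what the q35 hunk in this patch does.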

Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Signed-off-by: Eduardo Habkost 
---
 hw/pci-bridge/pci_expander_bridge.c | 2 --
 hw/pci-host/piix.c  | 1 -
 hw/pci-host/q35.c   | 6 +++---
 hw/pci/pci.c| 2 +-
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index 39d29d2230..8344ca1cc8 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -245,8 +245,6 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
 bus->address_space_io = dev->bus->address_space_io;
 bus->map_irq = pxb_map_irq_fn;
 
-phb->bus = bus;
-
 pxb_register_bus(dev, bus, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 91fec05b38..818e4979d8 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -342,7 +342,6 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type,
 s = PCI_HOST_BRIDGE(dev);
 b = pci_bus_new(s, NULL, pci_address_space,
 address_space_io, 0, TYPE_PCI_BUS);
-s->bus = b;
 object_property_add_child(qdev_get_machine(), "i440fx", OBJECT(dev), NULL);
 qdev_init_nofail(dev);
 
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 860b47a1ba..5b41412075 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -49,9 +49,9 @@ static void q35_host_realize(DeviceState *dev, Error **errp)
 sysbus_add_io(sbd, MCH_HOST_BRIDGE_CONFIG_DATA, &pci->data_mem);
 sysbus_init_ioports(sbd, MCH_HOST_BRIDGE_CONFIG_DATA, 4);
 
-pci->bus = pci_bus_new(pci, "pcie.0",
-   s->mch.pci_address_space, s->mch.address_space_io,
-   0, TYPE_PCIE_BUS);
+pci_bus_new(pci, "pcie.0",
+s->mch.pci_address_space, s->mch.address_space_io,
+0, TYPE_PCIE_BUS);
 PC_MACHINE(qdev_get_machine())->bus = pci->bus;
 qdev_set_parent_bus(DEVICE(&s->mch), BUS(pci->bus));
 qdev_init_nofail(DEVICE(&s->mch));
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index d3adf806e5..486aeb7514 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -408,6 +408,7 @@ PCIBus *pci_bus_new(PCIHostState *phb, const char *name,
 
 bus = PCI_BUS(qbus_create(typename, DEVICE(phb), name));
 pci_bus_init(bus, phb, address_space_mem, address_space_io, devfn_min);
+phb->bus = bus;
 return bus;
 }
 
@@ -433,7 +434,6 @@ void pci_register_bus(PCIHostState *phb, const char *name,
 bus = pci_bus_new(phb, name, address_space_mem,
   address_space_io, devfn_min, typename);
 pci_bus_irqs(bus, set_irq, map_irq, irq_opaque, nirq);
-phb->bus = bus;
 }
 
 int pci_bus_num(PCIBus *s)
-- 
2.11.0.259.g40922b1

[Qemu-devel] [RFC 7/7] pci: Set phb->bus inside pci_bus_new_inplace()

2017-04-17 Thread Eduardo Habkost

Every single caller of pci_bus_new_inplace() sets phb->bus to point to
'bus'. Do that inside pci_bus_new_inplace() to avoid code duplication
and make it harder to break.

Cc: "Hervé Poussineau" 
Cc: Marcel Apfelbaum 
Cc: "Michael S. Tsirkin" 
Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
Cc: qemu-...@nongnu.org
Signed-off-by: Eduardo Habkost 
---
 hw/pci-host/prep.c  | 2 --
 hw/pci-host/versatile.c | 1 -
 hw/pci/pci.c| 1 +
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
index 2e2cd267f4..6efa5bc5ef 100644
--- a/hw/pci-host/prep.c
+++ b/hw/pci-host/prep.c
@@ -284,8 +284,6 @@ static void raven_pcihost_initfn(Object *obj)
 address_space_init(&s->bm_as, &s->bm, "raven-bm");
 pci_setup_iommu(&s->pci_bus, raven_pcihost_set_iommu, s);
 
-h->bus = &s->pci_bus;
-
 object_initialize(&s->pci_dev, sizeof(s->pci_dev), TYPE_RAVEN_PCI_DEVICE);
 pci_dev = DEVICE(&s->pci_dev);
 qdev_set_parent_bus(pci_dev, BUS(&s->pci_bus));
diff --git a/hw/pci-host/versatile.c b/hw/pci-host/versatile.c
index 24ef87610b..630f1ac1c5 100644
--- a/hw/pci-host/versatile.c
+++ b/hw/pci-host/versatile.c
@@ -389,7 +389,6 @@ static void pci_vpb_init(Object *obj)
 pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), h, "pci",
 &s->pci_mem_space, &s->pci_io_space,
 PCI_DEVFN(11, 0), TYPE_PCI_BUS);
-h->bus = &s->pci_bus;
 
 object_initialize(&s->pci_dev, sizeof(s->pci_dev), TYPE_VERSATILE_PCI_HOST);
 qdev_set_parent_bus(DEVICE(&s->pci_dev), BUS(&s->pci_bus));
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 486aeb7514..ef226f8b41 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -397,6 +397,7 @@ void pci_bus_new_inplace(PCIBus *bus, size_t bus_size,
 {
 qbus_create_inplace(bus, bus_size, typename, DEVICE(phb), name);
 pci_bus_init(bus, phb, address_space_mem, address_space_io, devfn_min);
+phb->bus = bus;
 }
 
 PCIBus *pci_bus_new(PCIHostState *phb, const char *name,
-- 
2.11.0.259.g40922b1

[Qemu-devel] [RFC 3/7] pci: Change pci_bus_new*() parameter to PCIHostState

2017-04-17 Thread Eduardo Habkost

The pci_bus_new*() functions already require the 'parent' argument to be
a PCI_HOST_BRIDGE object. Change the parameter type to reflect that.

Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: "Hervé Poussineau" 
Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
Cc: qemu-...@nongnu.org
Signed-off-by: Eduardo Habkost 
---
 include/hw/pci/pci.h|  5 +++--
 hw/pci-bridge/pci_expander_bridge.c | 15 ---
 hw/pci-host/piix.c  |  2 +-
 hw/pci-host/prep.c  |  2 +-
 hw/pci-host/q35.c   |  2 +-
 hw/pci-host/versatile.c |  2 +-
 hw/pci/pci.c| 13 ++---
 7 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index a37a2d5cb6..2242aa25eb 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -393,12 +393,13 @@ typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
 
 bool pci_bus_is_express(PCIBus *bus);
 bool pci_bus_is_root(PCIBus *bus);
-void pci_bus_new_inplace(PCIBus *bus, size_t bus_size, DeviceState *parent,
+void pci_bus_new_inplace(PCIBus *bus, size_t bus_size,
+ PCIHostState *phb,
  const char *name,
  MemoryRegion *address_space_mem,
  MemoryRegion *address_space_io,
  uint8_t devfn_min, const char *typename);
-PCIBus *pci_bus_new(DeviceState *parent, const char *name,
+PCIBus *pci_bus_new(PCIHostState *phb, const char *name,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
 uint8_t devfn_min, const char *typename);
diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 6ac187fa32..39d29d2230 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -213,7 +213,8 @@ static gint pxb_compare(gconstpointer a, gconstpointer b)
 static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
 {
 PXBDev *pxb = convert_to_pxb(dev);
-DeviceState *ds, *bds = NULL;
+DeviceState *bds = NULL;
+PCIHostState *phb;
 PCIBus *bus;
 const char *dev_name = NULL;
 Error *local_err = NULL;
@@ -228,11 +229,11 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
pcie, Error **errp)
 dev_name = dev->qdev.id;
 }
 
-ds = qdev_create(NULL, TYPE_PXB_HOST);
+phb = PCI_HOST_BRIDGE(qdev_create(NULL, TYPE_PXB_HOST));
 if (pcie) {
-bus = pci_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
+bus = pci_bus_new(phb, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
 } else {
-bus = pci_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
+bus = pci_bus_new(phb, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
 bds = qdev_create(BUS(bus), "pci-bridge");
 bds->id = dev_name;
 qdev_prop_set_uint8(bds, PCI_BRIDGE_DEV_PROP_CHASSIS_NR, pxb->bus_nr);
@@ -244,7 +245,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
pcie, Error **errp)
 bus->address_space_io = dev->bus->address_space_io;
 bus->map_irq = pxb_map_irq_fn;
 
-PCI_HOST_BRIDGE(ds)->bus = bus;
+phb->bus = bus;
 
 pxb_register_bus(dev, bus, &local_err);
 if (local_err) {
@@ -252,7 +253,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
pcie, Error **errp)
 goto err_register_bus;
 }
 
-qdev_init_nofail(ds);
+qdev_init_nofail(DEVICE(phb));
 if (bds) {
 qdev_init_nofail(bds);
 }
@@ -267,7 +268,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
pcie, Error **errp)
 err_register_bus:
 object_unref(OBJECT(bds));
 object_unparent(OBJECT(bus));
-object_unref(OBJECT(ds));
+object_unref(OBJECT(phb));
 }
 
 static void pxb_dev_realize(PCIDevice *dev, Error **errp)
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index f9218aa952..91fec05b38 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -340,7 +340,7 @@ PCIBus *i440fx_init(const char *host_type, const char 
*pci_type,
 
 dev = qdev_create(NULL, host_type);
 s = PCI_HOST_BRIDGE(dev);
-b = pci_bus_new(dev, NULL, pci_address_space,
+b = pci_bus_new(s, NULL, pci_address_space,
 address_space_io, 0, TYPE_PCI_BUS);
 s->bus = b;
 object_property_add_child(qdev_get_machine(), "i440fx", OBJECT(dev), NULL);
diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
index 260a119a9e..2e2cd267f4 100644
--- a/hw/pci-host/prep.c
+++ b/hw/pci-host/prep.c
@@ -269,7 +269,7 @@ static void raven_pcihost_initfn(Object *obj)
 memory_region_add_subregion_overlap(address_space_mem, 0x8000,
 &s->pci_io_non_contiguous, 1);
 memory_region_add_subregion(address_space_mem, 0xc000, &s->pci_memory);
-pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), DEVICE(obj), NULL,
+pci

[Qemu-devel] [RFC 2/7] pci: Change pci_bus_init() 'parent' parameter to PCIHostState

2017-04-17 Thread Eduardo Habkost

pci_bus_init() already requires 'parent' to be a PCI_HOST_BRIDGE object,
so change the parameter type to reflect that.

Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Signed-off-by: Eduardo Habkost 
---
 hw/pci/pci.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 25118fb91d..d9535c0bdc 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -362,7 +362,7 @@ const char *pci_root_bus_path(PCIDevice *dev)
 return rootbus->qbus.name;
 }
 
-static void pci_bus_init(PCIBus *bus, DeviceState *parent,
+static void pci_bus_init(PCIBus *bus, PCIHostState *phb,
  MemoryRegion *address_space_mem,
  MemoryRegion *address_space_io,
  uint8_t devfn_min)
@@ -375,7 +375,7 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
 /* host bridge */
 QLIST_INIT(&bus->child);
 
-pci_host_bus_register(PCI_HOST_BRIDGE(host));
+pci_host_bus_register(phb);
 }
 
 bool pci_bus_is_express(PCIBus *bus)
@@ -394,8 +394,9 @@ void pci_bus_new_inplace(PCIBus *bus, size_t bus_size, 
DeviceState *parent,
  MemoryRegion *address_space_io,
  uint8_t devfn_min, const char *typename)
 {
+PCIHostState *phb = PCI_HOST_BRIDGE(parent);
 qbus_create_inplace(bus, bus_size, typename, parent, name);
-pci_bus_init(bus, parent, address_space_mem, address_space_io, devfn_min);
+pci_bus_init(bus, phb, address_space_mem, address_space_io, devfn_min);
 }
 
 PCIBus *pci_bus_new(DeviceState *parent, const char *name,
@@ -403,10 +404,11 @@ PCIBus *pci_bus_new(DeviceState *parent, const char *name,
 MemoryRegion *address_space_io,
 uint8_t devfn_min, const char *typename)
 {
+PCIHostState *phb = PCI_HOST_BRIDGE(parent);
 PCIBus *bus;
 
 bus = PCI_BUS(qbus_create(typename, parent, name));
-pci_bus_init(bus, parent, address_space_mem, address_space_io, devfn_min);
+pci_bus_init(bus, phb, address_space_mem, address_space_io, devfn_min);
 return bus;
 }
 
-- 
2.11.0.259.g40922b1

[Qemu-devel] [RFC 4/7] pci: Change pci_register_bus() 'parent' parameter to PCIHostState

2017-04-17 Thread Eduardo Habkost

pci_register_bus() already requires the 'parent' argument to be a
PCI_HOST_BRIDGE object. Change the parameter type to reflect that.

Cc: Richard Henderson 
Cc: Aurelien Jarno 
Cc: Yongbok Kim 
Cc: Alexander Graf 
Cc: Scott Wood 
Cc: Paul Burton 
Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: David Gibson 
Cc: Cornelia Huck 
Cc: Christian Borntraeger 
Cc: qemu-...@nongnu.org
Signed-off-by: Eduardo Habkost 
---
 include/hw/pci/pci.h  | 2 +-
 hw/alpha/typhoon.c| 2 +-
 hw/mips/gt64xxx_pci.c | 2 +-
 hw/pci-host/apb.c | 2 +-
 hw/pci-host/bonito.c  | 2 +-
 hw/pci-host/gpex.c| 2 +-
 hw/pci-host/grackle.c | 2 +-
 hw/pci-host/ppce500.c | 2 +-
 hw/pci-host/uninorth.c| 4 ++--
 hw/pci-host/xilinx-pcie.c | 2 +-
 hw/pci/pci.c  | 4 ++--
 hw/ppc/ppc4xx_pci.c   | 2 +-
 hw/ppc/spapr_pci.c| 2 +-
 hw/s390x/s390-pci-bus.c   | 2 +-
 hw/sh4/sh_pci.c   | 2 +-
 15 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 2242aa25eb..56387ccb0c 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -408,7 +408,7 @@ void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, 
pci_map_irq_fn map_irq,
 int pci_bus_get_irq_level(PCIBus *bus, int irq_num);
 /* 0 <= pin <= 3 0 = INTA, 1 = INTB, 2 = INTC, 3 = INTD */
 int pci_swizzle_map_irq_fn(PCIDevice *pci_dev, int pin);
-PCIBus *pci_register_bus(DeviceState *parent, const char *name,
+PCIBus *pci_register_bus(PCIHostState *phb, const char *name,
  pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
  void *irq_opaque,
  MemoryRegion *address_space_mem,
diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index f50f5cf186..ac0633a55e 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -883,7 +883,7 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus,
 memory_region_add_subregion(addr_space, 0x801fc00ULL,
 &s->pchip.reg_io);
 
-b = pci_register_bus(dev, "pci",
+b = pci_register_bus(phb, "pci",
  typhoon_set_irq, sys_map_irq, s,
  &s->pchip.reg_mem, &s->pchip.reg_io,
  0, 64, TYPE_PCI_BUS);
diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
index 4811843ab6..bd131bcdc6 100644
--- a/hw/mips/gt64xxx_pci.c
+++ b/hw/mips/gt64xxx_pci.c
@@ -1171,7 +1171,7 @@ PCIBus *gt64120_register(qemu_irq *pic)
 phb = PCI_HOST_BRIDGE(dev);
 memory_region_init(&d->pci0_mem, OBJECT(dev), "pci0-mem", UINT32_MAX);
 address_space_init(&d->pci0_mem_as, &d->pci0_mem, "pci0-mem");
-phb->bus = pci_register_bus(dev, "pci",
+phb->bus = pci_register_bus(phb, "pci",
 gt64120_pci_set_irq, gt64120_pci_map_irq,
 pic,
 &d->pci0_mem,
diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 653e711121..1156a54224 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -671,7 +671,7 @@ PCIBus *pci_apb_init(hwaddr special_base,
 dev = qdev_create(NULL, TYPE_APB);
 d = APB_DEVICE(dev);
 phb = PCI_HOST_BRIDGE(dev);
-phb->bus = pci_register_bus(DEVICE(phb), "pci",
+phb->bus = pci_register_bus(phb, "pci",
 pci_apb_set_irq, pci_pbm_map_irq, d,
 &d->pci_mmio,
 get_system_io(),
diff --git a/hw/pci-host/bonito.c b/hw/pci-host/bonito.c
index 1999ece590..27842edc04 100644
--- a/hw/pci-host/bonito.c
+++ b/hw/pci-host/bonito.c
@@ -714,7 +714,7 @@ static int bonito_pcihost_initfn(SysBusDevice *dev)
 {
 PCIHostState *phb = PCI_HOST_BRIDGE(dev);
 
-phb->bus = pci_register_bus(DEVICE(dev), "pci",
+phb->bus = pci_register_bus(phb, "pci",
 pci_bonito_set_irq, pci_bonito_map_irq, dev,
 get_system_memory(), get_system_io(),
 0x28, 32, TYPE_PCI_BUS);
diff --git a/hw/pci-host/gpex.c b/hw/pci-host/gpex.c
index 66055ee5cc..042d127271 100644
--- a/hw/pci-host/gpex.c
+++ b/hw/pci-host/gpex.c
@@ -62,7 +62,7 @@ static void gpex_host_realize(DeviceState *dev, Error **errp)
 sysbus_init_irq(sbd, &s->irq[i]);
 }
 
-pci->bus = pci_register_bus(dev, "pcie.0", gpex_set_irq,
+pci->bus = pci_register_bus(pci, "pcie.0", gpex_set_irq,
 pci_swizzle_map_irq_fn, s, &s->io_mmio,
 &s->io_ioport, 0, 4, TYPE_PCIE_BUS);
 
diff --git a/hw/pci-host/grackle.c b/hw/pci-host/grackle.c
index 2c8acdaaca..a56c063be9 100644
--- a/hw/pci-host/grackle.c
+++ b/hw/pci-host/grackle.c
@@ -82,7 +82,7 @@ PCIBus *pci_grackle_init(uint32_t base, qemu_irq *pic,
 memory_region_add_subregion(address_space_mem, 0x8000ULL,
 &d->pci_hole);
 
-phb->bus = pci_register_bus(dev, NULL,
+

[Qemu-devel] [RFC 5/7] pci: Set phb->bus inside pci_register_bus()

2017-04-17 Thread Eduardo Habkost

Every single caller of pci_register_bus() saves the return value in
phb->bus. Do that inside pci_register_bus() to avoid code duplication
and make it harder to break.
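
As a rough sketch of the idea (simplified stand-in types, not the real QEMU definitions; demo_register_bus() and demo_board_init() are hypothetical names used only for illustration), a registration helper that stores the bus in phb->bus itself could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real QEMU structs carry far more state. */
typedef struct PCIBus { const char *name; int nirq; } PCIBus;
typedef struct PCIHostState { PCIBus *bus; } PCIHostState;

/*
 * Hypothetical helper in the style of the patch: it returns void and
 * records the new bus in phb->bus, so no caller can forget to do so.
 */
static void demo_register_bus(PCIHostState *phb, PCIBus *storage,
                              const char *name, int nirq)
{
    assert(phb != NULL && storage != NULL);
    storage->name = name;
    storage->nirq = nirq;
    phb->bus = storage;
}

/* A caller that still wants a local pointer reads it back from phb. */
static PCIBus *demo_board_init(PCIHostState *phb, PCIBus *storage)
{
    demo_register_bus(phb, storage, "pci", 64);
    return phb->bus;
}
```

Callers that only needed the assignment simply drop it; callers that use the bus afterwards read phb->bus, as the typhoon hunk below does with "b = phb->bus;".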

Most (but not all) conversions done using the following Coccinelle script:

  @@
  identifier b;
  expression phb;
  expression list ARGS;
  @@
  -b = pci_register_bus(phb, ARGS);
  +phb->bus = pci_register_bus(phb, ARGS);
   ...
  -phb->bus = b;

  @@
  expression phb;
  expression list ARGS;
  @@
  -phb->bus = pci_register_bus(phb, ARGS);
  +pci_register_bus(phb, ARGS);

Cc: Richard Henderson 
Cc: Aurelien Jarno 
Cc: Yongbok Kim 
Cc: Alexander Graf 
Cc: Scott Wood 
Cc: Paul Burton 
Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Cc: David Gibson 
Cc: Cornelia Huck 
Cc: Christian Borntraeger 
Cc: qemu-...@nongnu.org
Signed-off-by: Eduardo Habkost 
---
 include/hw/pci/pci.h  | 12 ++--
 hw/alpha/typhoon.c| 10 +-
 hw/mips/gt64xxx_pci.c |  9 +++--
 hw/pci-host/apb.c |  7 ++-
 hw/pci-host/bonito.c  |  7 +++
 hw/pci-host/gpex.c|  5 ++---
 hw/pci-host/grackle.c |  9 ++---
 hw/pci-host/ppce500.c |  8 
 hw/pci-host/uninorth.c| 18 ++
 hw/pci-host/xilinx-pcie.c |  6 +++---
 hw/pci/pci.c  | 14 +++---
 hw/ppc/ppc4xx_pci.c   |  8 
 hw/ppc/spapr_pci.c| 10 +-
 hw/s390x/s390-pci-bus.c   | 10 +-
 hw/sh4/sh_pci.c   |  9 +++--
 15 files changed, 60 insertions(+), 82 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 56387ccb0c..3b1e2c408a 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -408,12 +408,12 @@ void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, 
pci_map_irq_fn map_irq,
 int pci_bus_get_irq_level(PCIBus *bus, int irq_num);
 /* 0 <= pin <= 3 0 = INTA, 1 = INTB, 2 = INTC, 3 = INTD */
 int pci_swizzle_map_irq_fn(PCIDevice *pci_dev, int pin);
-PCIBus *pci_register_bus(PCIHostState *phb, const char *name,
- pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
- void *irq_opaque,
- MemoryRegion *address_space_mem,
- MemoryRegion *address_space_io,
- uint8_t devfn_min, int nirq, const char *typename);
+void pci_register_bus(PCIHostState *phb, const char *name,
+  pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
+  void *irq_opaque,
+  MemoryRegion *address_space_mem,
+  MemoryRegion *address_space_io,
+  uint8_t devfn_min, int nirq, const char *typename);
 void pci_bus_set_route_irq_fn(PCIBus *, pci_route_irq_fn);
 PCIINTxRoute pci_device_route_intx_to_irq(PCIDevice *dev, int pin);
 bool pci_intx_route_changed(PCIINTxRoute *old, PCIINTxRoute *new);
diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index ac0633a55e..5926686d79 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -883,11 +883,11 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus 
**isa_bus,
 memory_region_add_subregion(addr_space, 0x801fc00ULL,
 &s->pchip.reg_io);
 
-b = pci_register_bus(phb, "pci",
- typhoon_set_irq, sys_map_irq, s,
- &s->pchip.reg_mem, &s->pchip.reg_io,
- 0, 64, TYPE_PCI_BUS);
-phb->bus = b;
+pci_register_bus(phb, "pci",
+ typhoon_set_irq, sys_map_irq, s,
+ &s->pchip.reg_mem, &s->pchip.reg_io,
+ 0, 64, TYPE_PCI_BUS);
+b = phb->bus;
 qdev_init_nofail(dev);
 
 /* Host memory as seen from the PCI side, via the IOMMU.  */
diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
index bd131bcdc6..69963453f0 100644
--- a/hw/mips/gt64xxx_pci.c
+++ b/hw/mips/gt64xxx_pci.c
@@ -1171,12 +1171,9 @@ PCIBus *gt64120_register(qemu_irq *pic)
 phb = PCI_HOST_BRIDGE(dev);
 memory_region_init(&d->pci0_mem, OBJECT(dev), "pci0-mem", UINT32_MAX);
 address_space_init(&d->pci0_mem_as, &d->pci0_mem, "pci0-mem");
-phb->bus = pci_register_bus(phb, "pci",
-gt64120_pci_set_irq, gt64120_pci_map_irq,
-pic,
-&d->pci0_mem,
-get_system_io(),
-PCI_DEVFN(18, 0), 4, TYPE_PCI_BUS);
+pci_register_bus(phb, "pci", gt64120_pci_set_irq, gt64120_pci_map_irq,
+ pic, &d->pci0_mem, get_system_io(), PCI_DEVFN(18, 0), 4,
+ TYPE_PCI_BUS);
 qdev_init_nofail(dev);
 memory_region_init_io(&d->ISD_mem, OBJECT(dev), &isd_mem_ops, d, 
"isd-mem", 0x1000);
 
diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 1156a54224..ea86260b04 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -671,11 +671,8 @@ PCIBus *pci_apb_init(hwaddr special_base,
 dev = qdev_create(NULL, TYPE_APB);
 d =

[Qemu-devel] [RFC 1/7] pci: Change pci_host_bus_register() parameter to PCIHostState

2017-04-17 Thread Eduardo Habkost

The function requires a PCI_HOST_BRIDGE object, so change the parameter
type to reflect that.

Cc: "Michael S. Tsirkin" 
Cc: Marcel Apfelbaum 
Signed-off-by: Eduardo Habkost 
---
 hw/pci/pci.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 259483b1c0..25118fb91d 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -312,11 +312,9 @@ static void pcibus_reset(BusState *qbus)
 }
 }
 
-static void pci_host_bus_register(DeviceState *host)
+static void pci_host_bus_register(PCIHostState *phb)
 {
-PCIHostState *host_bridge = PCI_HOST_BRIDGE(host);
-
-QLIST_INSERT_HEAD(&pci_host_bridges, host_bridge, next);
+QLIST_INSERT_HEAD(&pci_host_bridges, phb, next);
 }
 
 PCIBus *pci_find_primary_bus(void)
@@ -377,7 +375,7 @@ static void pci_bus_init(PCIBus *bus, DeviceState *parent,
 /* host bridge */
 QLIST_INIT(&bus->child);
 
-pci_host_bus_register(parent);
+pci_host_bus_register(PCI_HOST_BRIDGE(host));
 }
 
 bool pci_bus_is_express(PCIBus *bus)
-- 
2.11.0.259.g40922b1

[Qemu-devel] [RFC 0/7] pci: Type-safety and phb->bus initialization cleanup

2017-04-17 Thread Eduardo Habkost

I've noticed that pci_bus_new*() and pci_register_bus() require
'parent' to be a PCI_HOST_BRIDGE object, but this is not clear
from the function signatures.

This series implements two changes in the PCI code:

1) Replace DeviceState with PCIHostState on functions that
   already require a PCI_HOST_BRIDGE argument. Makes the
   functions harder to misuse.
2) Move PCIHostState::bus initialization inside pci_bus_new*(),
   to avoid code duplication and make sure the field is
   always initialized consistently.
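
Change (1) can be illustrated with a small sketch. The types below are simplified stand-ins rather than the real QEMU definitions, and demo_pci_host_bridge(), demo_register_old() and demo_register_new() are hypothetical names. The point is only that the "must be a host bridge" requirement moves from a run-time check into the function signature:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the QEMU object types. */
typedef struct DeviceState { int is_pci_host; } DeviceState;
typedef struct PCIHostState {
    DeviceState parent_obj;   /* first member, so the cast below is valid */
    int registered;
} PCIHostState;

/* Hypothetical checked cast, loosely in the spirit of PCI_HOST_BRIDGE(). */
static PCIHostState *demo_pci_host_bridge(DeviceState *dev)
{
    assert(dev != NULL && dev->is_pci_host);
    return (PCIHostState *)dev;
}

/* Before: any DeviceState * compiles; a wrong object only fails at run time. */
static void demo_register_old(DeviceState *host)
{
    demo_pci_host_bridge(host)->registered = 1;
}

/* After: the requirement is visible in the signature and checked by the compiler. */
static void demo_register_new(PCIHostState *phb)
{
    assert(phb != NULL);
    phb->registered = 1;
}
```

With the second form, passing a plain DeviceState * draws an incompatible-pointer diagnostic from the compiler, and the cast happens once at the call site that actually knows the object type.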

Eduardo Habkost (7):
  pci: Change pci_host_bus_register() parameter to PCIHostState
  pci: Change pci_bus_init() 'parent' parameter to PCIHostState
  pci: Change pci_bus_new*() parameter to PCIHostState
  pci: Change pci_register_bus() 'parent' parameter to PCIHostState
  pci: Set phb->bus inside pci_register_bus()
  pci: Set phb->bus inside pci_bus_new()
  pci: Set phb->bus inside pci_bus_new_inplace()

 include/hw/pci/pci.h| 17 
 hw/alpha/typhoon.c  | 10 +-
 hw/mips/gt64xxx_pci.c   |  9 +++--
 hw/pci-bridge/pci_expander_bridge.c | 15 +++---
 hw/pci-host/apb.c   |  7 ++-
 hw/pci-host/bonito.c|  7 +++
 hw/pci-host/gpex.c  |  5 ++---
 hw/pci-host/grackle.c   |  9 ++---
 hw/pci-host/piix.c  |  3 +--
 hw/pci-host/ppce500.c   |  8 
 hw/pci-host/prep.c  |  4 +---
 hw/pci-host/q35.c   |  6 +++---
 hw/pci-host/uninorth.c  | 18 ++---
 hw/pci-host/versatile.c |  3 +--
 hw/pci-host/xilinx-pcie.c   |  6 +++---
 hw/pci/pci.c| 40 ++---
 hw/ppc/ppc4xx_pci.c |  8 
 hw/ppc/spapr_pci.c  | 10 +-
 hw/s390x/s390-pci-bus.c | 10 +-
 hw/sh4/sh_pci.c |  9 +++--
 20 files changed, 89 insertions(+), 115 deletions(-)

-- 
2.11.0.259.g40922b1

Re: [Qemu-devel] DMG chunk size independence

2017-04-17 Thread John Snow

On 04/15/2017 04:38 AM, Ashijeet Acharya wrote:
> Hi,
> 
> Some of you are already aware but for the benefit of the open list,
> this mail is regarding the task mentioned
> Here -> http://wiki.qemu-project.org/ToDo/Block/DmgChunkSizeIndependence
> 

OK, so the idea here is that we should be able to read portions of
chunks instead of buffering entire chunks, because chunks can be quite
large and an unverified DMG file should not be able to cause QEMU to
allocate large portions of memory.

Currently, QEMU has a maximum chunk size and it will not open DMG files
that have chunks that exceed that size, correct?

> I had a chat with Fam regarding this and he suggested a solution where
> we fix the output buffer size to a max of say "64K" and keep inflating
> until we reach the end of the input stream. We extract the required
> data when we enter the desired range and discard the rest. Fam however
> termed this as only a  "quick fix".
> 

So it looks like your problem now is how to allow reads to subsets while
tolerating zipped chunks, right?

We can't predict where the data we want is going to appear mid-stream,
but I'm not that familiar with the DMG format, so what does the data
look like and how do we seek to it in general?

We've got the mish blocks stored inside of the ResourceFork (right?), and
each mish block contains one-or-more chunk records. So given any offset
into the virtual file, we at least know which chunk it belongs to, but
thanks to zlib, we can't just read the bits we care about.

(Correct so far?)
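
A minimal sketch of that lookup, assuming the chunk records are sorted by starting sector and do not overlap. DemoChunk and demo_find_chunk() are hypothetical names, not the structures used by the dmg driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified chunk record: where the chunk starts in the
 * virtual (uncompressed) disk and how many sectors it covers.
 */
typedef struct DemoChunk {
    uint64_t first_sector;
    uint64_t sector_count;
} DemoChunk;

/*
 * Binary search over chunks sorted by first_sector.
 * Returns the index of the chunk containing `sector`, or -1 if none does.
 */
static long demo_find_chunk(const DemoChunk *chunks, size_t n, uint64_t sector)
{
    size_t lo = 0, hi = n;

    assert(chunks != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const DemoChunk *c = &chunks[mid];

        if (sector < c->first_sector) {
            hi = mid;
        } else if (sector >= c->first_sector + c->sector_count) {
            lo = mid + 1;
        } else {
            return (long)mid;
        }
    }
    return -1;
}
```

This only identifies the chunk; for a zlib chunk the data still has to be inflated from the start of that chunk up to the wanted sector.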

> The ideal fix would obviously be if we can somehow predict the exact
> location inside the compressed stream relative to the desired offset
> in the output decompressed stream, such as a specific sector in a
> chunk. Unfortunately this is not possible without doing a first pass
> over the decompressed stream as answered on the zlib FAQ page
> Here -> http://zlib.net/zlib_faq.html#faq28
> 

Yeah, I think you need to start reading the data from the beginning of
each chunk -- but it depends on the zlib data. It COULD be broken up
into different pieces, but there's no way to know without scanning it in
advance.

(Unrelated:

Do we have a zlib format driver?

It might be cute to break up such DMG files and offload zlib
optimization to another driver, like this:

[dmg]-->[zlib]-->[raw]

And we could pretend that each zlib chunk in this file is virtually its
own zlib "file" and access it with modified offsets as appropriate.

Any optimizations we make could just apply to this driver.

[anyway...])

Pre-scanning for these sync points is probably a waste of time as
there's no way to know (*I THINK*) how big each sync-block would be
decompressed, so there's still no way this helps you seek within a
compressed block...

> AFAICT after reading the zran.c example in zlib, the above mentioned
> ideal fix would ultimately lead us to decompress the whole chunk in
> steps at least once to maintain an access point lookup table. This
> solution is better if we get several random access requests over
> different read requests, otherwise it ends up being equal to the fix
> suggested by Fam plus some extra effort needed in building and
> maintaining access points.
> 

Yeah, probably not worth it overall... I have to imagine that most uses
of DMG files are for iso-like cases for installers where accesses are
going to be either sequential (or mostly sequential) and most data will
not be read twice.

I could be wrong, but that's my hunch.

Maybe you can cache the state of the INFLATE process such that once you
fill the cache with data, we can simply resume the INFLATE procedure
when the guest almost inevitably asks for the next subsequent bytes.

That'd probably be efficient /enough/ in most cases without having to
worry about a metadata cache for zlib blocks or a literal data cache for
inflated data.
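
To make the bookkeeping concrete, here is a rough sketch of "bounded output buffer, discard until the wanted range, resume on sequential reads". It deliberately does not call zlib: demo_produce() is a hypothetical stand-in for the decompressor (byte i of its output is just (uint8_t)i), and all names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_BUF = 16 };       /* fixed output buffer size (e.g. 64K for real) */

/* Stand-in for cached decompressor state for one chunk. */
typedef struct DemoStream {
    uint64_t pos;             /* uncompressed bytes produced so far */
    uint64_t total;           /* uncompressed size of the chunk */
} DemoStream;

/* Backwards seeks cannot be resumed: restart from the chunk start. */
static void demo_reset(DemoStream *s)
{
    s->pos = 0;
}

/* Pretend to inflate: emit up to `cap` bytes, never more than asked for. */
static size_t demo_produce(DemoStream *s, uint8_t *out, size_t cap)
{
    size_t n = 0;

    while (n < cap && s->pos < s->total) {
        out[n++] = (uint8_t)s->pos++;
    }
    return n;
}

/*
 * Copy uncompressed bytes [offset, offset + len) into dst using only a
 * DEMO_BUF-sized scratch buffer. Output before `offset` is discarded.
 * Production stops exactly at the end of the request, so a following
 * read at offset + len resumes without restarting. Returns bytes copied.
 */
static size_t demo_read(DemoStream *s, uint64_t offset, uint8_t *dst, size_t len)
{
    uint8_t buf[DEMO_BUF];
    size_t copied = 0;

    assert(s != NULL && dst != NULL);
    if (offset < s->pos) {
        demo_reset(s);
    }
    while (copied < len) {
        uint64_t start = s->pos;                    /* stream offset of buf[0] */
        uint64_t remaining = offset + len - start;  /* up to end of request */
        size_t cap = remaining < DEMO_BUF ? (size_t)remaining : DEMO_BUF;
        size_t got = demo_produce(s, buf, cap);
        uint64_t gap = offset > start ? offset - start : 0;
        size_t skip = gap < got ? (size_t)gap : got;

        if (got == 0) {
            break;                                  /* end of chunk */
        }
        memcpy(dst + copied, buf + skip, got - skip);
        copied += got - skip;
    }
    return copied;
}
```

With a real decompressor the same shape would presumably apply, with the stream state kept across requests and the output limit set per call, but that part is an assumption and not tested here.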

Or maybe I'm full of crap, I don't know -- I'd probably try a few
approaches and see which one empirically worked better.

> I have not explored the bzip2 compressed chunks yet but have naively
> assumed that we will face the same situation there?
> 

Not sure.

> I would like the community's opinion on this and add their suggestions
> if possible to give me some new thinking points.
> 
> Thanks
> Ashijeet
>

Re: [Qemu-devel] [PATCH 09/19] migration: Create block capabilities for shared and enable

2017-04-17 Thread Eric Blake

On 04/17/2017 03:00 PM, Juan Quintela wrote:
> This two capabilites were added through the command line.  Notice that

s/This/These/
s/capabilites/capabilities/

> we just created them.  This is just the boilerplate.
> 
> Signed-off-by: Juan Quintela 
> ---
>  include/migration/migration.h |  3 +++
>  migration/migration.c | 36 
>  qapi-schema.json  |  7 ++-
>  3 files changed, 45 insertions(+), 1 deletion(-)

I think this is a nice cleanup, even if it is exposing the internal
block migration (the 'migrate -b' stuff) that we really don't like (it
is one of the things causing grief at the 2.9-rc4 stage), because users
should be favoring NBD migration over internal block migration these days.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[Qemu-devel] [PATCH 17/19] migration: Export rdma.c functions in its own file

2017-04-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 include/migration/migration.h |  4 
 include/migration/rdma.h  | 22 ++
 migration/migration.c |  1 +
 migration/rdma.c  |  1 +
 4 files changed, 24 insertions(+), 4 deletions(-)
 create mode 100644 include/migration/rdma.h

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3e5d106..64c1ff3 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -177,10 +177,6 @@ void migration_fd_process_incoming(QEMUFile *f);
 
 uint64_t migrate_max_downtime(void);
 
-void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error 
**errp);
-
-void rdma_start_incoming_migration(const char *host_port, Error **errp);
-
 void migrate_fd_error(MigrationState *s, const Error *error);
 
 void migrate_fd_connect(MigrationState *s);
diff --git a/include/migration/rdma.h b/include/migration/rdma.h
new file mode 100644
index 000..d2cf481
--- /dev/null
+++ b/include/migration/rdma.h
@@ -0,0 +1,22 @@
+/*
+ * QEMU migration rdma functions
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_MIGRATION_RDMA_H
+#define QEMU_MIGRATION_RDMA_H
+
+void rdma_start_outgoing_migration(void *opaque, const char *host_port,
+   Error **errp);
+
+void rdma_start_incoming_migration(const char *host_port, Error **errp);
+
+#endif
diff --git a/migration/migration.c b/migration/migration.c
index ba01ea2..04aa9ad 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -23,6 +23,7 @@
 #include "migration/exec.h"
 #include "migration/fd.h"
 #include "migration/socket.h"
+#include "migration/rdma.h"
 #include "migration/ram.h"
 #include "migration/migration.h"
 #include "migration/qemu-file-channel.h"
diff --git a/migration/rdma.c b/migration/rdma.c
index 3b06fe6..2b4276b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -17,6 +17,7 @@
 #include "qapi/error.h"
 #include "qemu-common.h"
 #include "qemu/cutils.h"
+#include "migration/rdma.h"
 #include "migration/migration.h"
 #include "migration/ram.h"
 #include "migration/qemu-file-channel.h"
-- 
2.9.3

[Qemu-devel] [PATCH 13/19] migration: Remove qemu-file.h from vmstate.h

2017-04-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 include/hw/hw.h | 1 +
 include/migration/vmstate.h | 3 ---
 migration/block.c   | 1 +
 migration/channel.c | 1 +
 migration/colo.c| 1 +
 migration/postcopy-ram.c| 1 +
 migration/ram.c | 1 +
 7 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/hw/hw.h b/include/hw/hw.h
index e22d4ce..af9eae1 100644
--- a/include/hw/hw.h
+++ b/include/hw/hw.h
@@ -11,6 +11,7 @@
 #include "exec/memory.h"
 #include "hw/irq.h"
 #include "migration/vmstate.h"
+#include "migration/qemu-file.h"
 #include "qemu/module.h"
 #include "sysemu/reset.h"
 
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f4c5bed..1e6fcb5 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -27,9 +27,6 @@
 #ifndef QEMU_VMSTATE_H
 #define QEMU_VMSTATE_H
 
-#ifndef CONFIG_USER_ONLY
-#include "migration/qemu-file.h"
-#endif
 #include "migration/qjson.h"
 
 typedef void SaveStateHandler(QEMUFile *f, void *opaque);
diff --git a/migration/block.c b/migration/block.c
index 0722837..e45a42d 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -19,6 +19,7 @@
 #include "qemu/cutils.h"
 #include "migration/block.h"
 #include "migration/migration.h"
+#include "migration/qemu-file.h"
 #include "sysemu/block-backend.h"
 
 #define BLOCK_SIZE   (1 << 20)
diff --git a/migration/channel.c b/migration/channel.c
index 10416e0..04a26c5 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -17,6 +17,7 @@
 #include "migration/channel.h"
 #include "migration/tls.h"
 #include "migration/migration.h"
+#include "migration/qemu-file.h"
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
diff --git a/migration/colo.c b/migration/colo.c
index d455884..e2eaccd 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/sysemu.h"
 #include "migration/migration.h"
+#include "migration/qemu-file.h"
 #include "migration/colo.h"
 #include "io/channel-buffer.h"
 #include "trace.h"
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 11b24c6..5aea2ff 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -18,6 +18,7 @@
 
 #include "qemu/osdep.h"
 #include "migration/migration.h"
+#include "migration/qemu-file.h"
 #include "migration/postcopy-ram.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
diff --git a/migration/ram.c b/migration/ram.c
index 4f49622..b0759ac 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -37,6 +37,7 @@
 #include "migration/xbzrle.h"
 #include "migration/init.h"
 #include "migration/migration.h"
+#include "migration/qemu-file.h"
 #include "migration/postcopy-ram.h"
 #include "migration/page_cache.h"
 #include "qemu/error-report.h"
-- 
2.9.3

[Qemu-devel] [PATCH 16/19] migration: Export ram.c functions in its own file

2017-04-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 include/migration/migration.h | 36 -
 include/migration/ram.h   | 54 +++
 migration/migration.c |  1 +
 migration/postcopy-ram.c  |  1 +
 migration/ram.c   |  1 +
 migration/rdma.c  |  1 +
 migration/savevm.c|  1 +
 7 files changed, 59 insertions(+), 36 deletions(-)
 create mode 100644 include/migration/ram.h

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1451067..3e5d106 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -192,36 +192,6 @@ bool migration_is_blocked(Error **errp);
 bool migration_in_postcopy(void);
 MigrationState *migrate_get_current(void);
 
-void migrate_compress_threads_create(void);
-void migrate_compress_threads_join(void);
-void migrate_decompress_threads_create(void);
-void migrate_decompress_threads_join(void);
-uint64_t ram_bytes_remaining(void);
-uint64_t ram_bytes_transferred(void);
-uint64_t ram_bytes_total(void);
-uint64_t ram_dirty_sync_count(void);
-uint64_t ram_dirty_pages_rate(void);
-uint64_t ram_postcopy_requests(void);
-void free_xbzrle_decoded_buf(void);
-
-void acct_update_position(QEMUFile *f, size_t size, bool zero);
-
-uint64_t dup_mig_pages_transferred(void);
-uint64_t norm_mig_pages_transferred(void);
-uint64_t xbzrle_mig_bytes_transferred(void);
-uint64_t xbzrle_mig_pages_transferred(void);
-uint64_t xbzrle_mig_pages_overflow(void);
-uint64_t xbzrle_mig_pages_cache_miss(void);
-double xbzrle_mig_cache_miss_rate(void);
-
-void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
-void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
-/* For outgoing discard bitmap */
-int ram_postcopy_send_discard_bitmap(MigrationState *ms);
-/* For incoming postcopy discard */
-int ram_discard_range(const char *block_name, uint64_t start, size_t length);
-int ram_postcopy_incoming_init(MigrationIncomingState *mis);
-void ram_postcopy_migrated_memory_release(MigrationState *ms);
 
 bool migrate_release_ram(void);
 bool migrate_postcopy_ram(void);
@@ -233,8 +203,6 @@ int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
 
-int64_t xbzrle_cache_resize(int64_t new_size);
-
 bool migrate_use_block_enabled(void);
 bool migrate_use_block_shared(void);
 
@@ -273,10 +241,6 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t 
block_offset,
  ram_addr_t offset, size_t size,
  uint64_t *bytes_sent);
 
-void migration_page_queue_free(void);
-int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
-uint64_t ram_pagesize_summary(void);
-
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
 PostcopyState postcopy_state_set(PostcopyState new_state);
diff --git a/include/migration/ram.h b/include/migration/ram.h
new file mode 100644
index 000..c3653b3
--- /dev/null
+++ b/include/migration/ram.h
@@ -0,0 +1,54 @@
+/*
+ * QEMU migration ram
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_MIGRATION_RAM_H
+#define QEMU_MIGRATION_RAM_H
+
+#include "qemu-common.h"
+#include "exec/cpu-common.h"
+
+int64_t xbzrle_cache_resize(int64_t new_size);
+uint64_t dup_mig_pages_transferred(void);
+uint64_t norm_mig_pages_transferred(void);
+uint64_t xbzrle_mig_bytes_transferred(void);
+uint64_t xbzrle_mig_pages_transferred(void);
+uint64_t xbzrle_mig_pages_cache_miss(void);
+double xbzrle_mig_cache_miss_rate(void);
+uint64_t xbzrle_mig_pages_overflow(void);
+uint64_t ram_bytes_transferred(void);
+uint64_t ram_bytes_remaining(void);
+uint64_t ram_dirty_sync_count(void);
+uint64_t ram_dirty_pages_rate(void);
+uint64_t ram_postcopy_requests(void);
+uint64_t ram_bytes_total(void);
+
+void migrate_compress_threads_create(void);
+void migrate_compress_threads_join(void);
+void migrate_decompress_threads_create(void);
+void migrate_decompress_threads_join(void);
+
+uint64_t ram_pagesize_summary(void);
+void migration_page_queue_free(void);
+int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
+void acct_update_position(QEMUFile *f, size_t size, bool zero);
+void free_xbzrle_decoded_buf(void);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+void ram_postcopy_migrated_memory_release(MigrationState *ms);
+/* For outgoing discard bitmap */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms);
+/* For incoming postcopy discard */
+int ram_discard_range(const char *block_name, uint64_t start, size_t length);
+int ram_postcopy_incoming_init(MigrationIncomingState *mis);
+
+void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+#endif
diff --git a/migration/migration.c b/migration/

[Qemu-devel] [PATCH 19/19] monitor: remove monitor parameter from save_vmstate

2017-04-17 Thread Juan Quintela

load_vmstate() already uses error_report(), so be consistent.

Signed-off-by: Juan Quintela 
---
 include/sysemu/sysemu.h  |  2 +-
 migration/savevm.c   | 16 
 monitor.c|  2 +-
 replay/replay-snapshot.c |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)
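
For readers following the motivation, here is a minimal standalone sketch (not QEMU code) of why the monitor argument can go away. report_error() and save_state_sketch() are hypothetical stand-ins for error_report() and save_vmstate(); the only assumption is that errors go to one global sink instead of a per-caller handle.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for error_report(): formats the message into a
 * global buffer (so it can be inspected) and echoes it to stderr. */
static char last_error[256];

void report_error(const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    vsnprintf(last_error, sizeof(last_error), fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", last_error);
}

/* Helper for inspecting the most recently reported message. */
int last_error_is(const char *msg)
{
    return strcmp(last_error, msg) == 0;
}

/* Hypothetical stand-in for save_vmstate(): no monitor handle is needed
 * because failures are reported through report_error().
 * Returns 0 on success, -1 on failure. */
int save_state_sketch(const char *name, int have_snapshot_device)
{
    if (!have_snapshot_device) {
        report_error("No block device can accept snapshots");
        return -1;
    }
    (void)name; /* a real implementation would write the snapshot here */
    return 0;
}
```

Callers such as the record/replay code then no longer need a monitor pointer just to take a snapshot.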

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 146a0dc..d2582fa 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -75,7 +75,7 @@ void qemu_remove_exit_notifier(Notifier *notify);
 void qemu_add_machine_init_done_notifier(Notifier *notify);
 void qemu_remove_machine_init_done_notifier(Notifier *notify);
 
-int save_vmstate(Monitor *mon, const char *name);
+int save_vmstate(const char *name);
 int load_vmstate(const char *name);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
 void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
diff --git a/migration/savevm.c b/migration/savevm.c
index cbd7e0d..36a6002 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2055,7 +2055,7 @@ int qemu_loadvm_state(QEMUFile *f)
 return ret;
 }
 
-int save_vmstate(Monitor *mon, const char *name)
+int save_vmstate(const char *name)
 {
 BlockDriverState *bs, *bs1;
 QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
@@ -2069,8 +2069,8 @@ int save_vmstate(Monitor *mon, const char *name)
 AioContext *aio_context;
 
 if (!bdrv_all_can_snapshot(&bs)) {
-monitor_printf(mon, "Device '%s' is writable but does not "
-   "support snapshots.\n", bdrv_get_device_name(bs));
+error_report("Device '%s' is writable but does not support snapshots.",
+ bdrv_get_device_name(bs));
 return ret;
 }
 
@@ -2087,7 +2087,7 @@ int save_vmstate(Monitor *mon, const char *name)
 
 bs = bdrv_all_find_vmstate_bs();
 if (bs == NULL) {
-monitor_printf(mon, "No block device can accept snapshots\n");
+error_report("No block device can accept snapshots");
 return ret;
 }
 aio_context = bdrv_get_aio_context(bs);
@@ -2096,7 +2096,7 @@ int save_vmstate(Monitor *mon, const char *name)
 
 ret = global_state_store();
 if (ret) {
-monitor_printf(mon, "Error saving global state\n");
+error_report("Error saving global state");
 return ret;
 }
 vm_stop(RUN_STATE_SAVE_VM);
@@ -2128,7 +2128,7 @@ int save_vmstate(Monitor *mon, const char *name)
 /* save the VM state */
 f = qemu_fopen_bdrv(bs, 1);
 if (!f) {
-monitor_printf(mon, "Could not open VM state file\n");
+error_report("Could not open VM state file");
 goto the_end;
 }
 ret = qemu_savevm_state(f, &local_err);
@@ -2141,8 +2141,8 @@ int save_vmstate(Monitor *mon, const char *name)
 
 ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
 if (ret < 0) {
-monitor_printf(mon, "Error while creating snapshot on '%s'\n",
-   bdrv_get_device_name(bs));
+error_report("Error while creating snapshot on '%s'",
+ bdrv_get_device_name(bs));
 goto the_end;
 }
 
diff --git a/monitor.c b/monitor.c
index 2fca4fb..9e79a97 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1856,7 +1856,7 @@ static void hmp_loadvm(Monitor *mon, const QDict *qdict)
 
 static void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
-save_vmstate(mon, qdict_get_try_str(qdict, "name"));
+save_vmstate(qdict_get_try_str(qdict, "name"));
 }
 
 int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp)
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index 65e2d37..8cced46 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -64,7 +64,7 @@ void replay_vmstate_init(void)
 {
 if (replay_snapshot) {
 if (replay_mode == REPLAY_MODE_RECORD) {
-if (save_vmstate(cur_mon, replay_snapshot) != 0) {
+if (save_vmstate(replay_snapshot) != 0) {
 error_report("Could not create snapshot for icount record");
 exit(1);
 }
-- 
2.9.3

[Qemu-devel] [PATCH 10/19] migration: Remove use of old MigrationParams

2017-04-17 Thread Juan Quintela

The previous patch switched block migration over to migration
capabilities.  Note that we continue to accept the old command line
flags of the migrate command for the time being.  Remove the
set_params method, as it is now empty.

Signed-off-by: Juan Quintela 
---
 include/migration/migration.h |  2 --
 migration/block.c | 17 ++---
 migration/colo.c  |  3 ---
 migration/migration.c |  8 +---
 migration/savevm.c|  2 --
 5 files changed, 7 insertions(+), 25 deletions(-)
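
The removed block_set_params() carried the rule "shared base means that blk_enable = 1"; the hunk in qmp_migrate_set_capabilities() keeps that rule for capabilities. A minimal standalone sketch of the invariant follows. The enum, structs and function names are hypothetical and much smaller than the real QAPI-generated types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced capability enum; the real MigrationCapability
 * enum is generated from qapi-schema.json and has more members. */
enum cap {
    CAP_BLOCK_ENABLED,
    CAP_BLOCK_SHARED,
    CAP_MAX
};

struct cap_request {
    enum cap capability;
    bool state;
};

struct mig_state {
    bool enabled_capabilities[CAP_MAX];
};

/* Apply a list of capability requests, then restore the invariant that
 * block-shared implies block-enabled. */
void set_capabilities(struct mig_state *s,
                      const struct cap_request *reqs, size_t n)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < n; i++) {
        s->enabled_capabilities[reqs[i].capability] = reqs[i].state;
    }
    if (s->enabled_capabilities[CAP_BLOCK_SHARED]) {
        s->enabled_capabilities[CAP_BLOCK_ENABLED] = true;
    }
}

/* Accessor in the style of migrate_use_block_enabled(). */
bool use_block_enabled(const struct mig_state *s)
{
    return s->enabled_capabilities[CAP_BLOCK_ENABLED];
}
```

Doing the fix-up where capabilities are set means consumers such as block_is_active() only ever have to test one flag.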

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5c78ae1..09d3188 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -41,8 +41,6 @@
 extern int only_migratable;
 
 struct MigrationParams {
-bool blk;
-bool shared;
 };
 
 /* Messages sent on the return path from destination to source */
diff --git a/migration/block.c b/migration/block.c
index 7734ff7..9490343 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -94,9 +94,6 @@ typedef struct BlkMigBlock {
 } BlkMigBlock;
 
 typedef struct BlkMigState {
-/* Written during setup phase.  Can be read without a lock.  */
-int blk_enable;
-int shared_base;
 QSIMPLEQ_HEAD(bmds_list, BlkMigDevState) bmds_list;
 int64_t total_sector_sum;
 bool zero_blocks;
@@ -425,7 +422,7 @@ static int init_blk_migration(QEMUFile *f)
 bmds->bulk_completed = 0;
 bmds->total_sectors = sectors;
 bmds->completed_sectors = 0;
-bmds->shared_base = block_mig_state.shared_base;
+bmds->shared_base = migrate_use_block_shared();
 
 assert(i < num_bs);
 bmds_bs[i].bmds = bmds;
@@ -963,22 +960,12 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
 return 0;
 }
 
-static void block_set_params(const MigrationParams *params, void *opaque)
-{
-block_mig_state.blk_enable = params->blk;
-block_mig_state.shared_base = params->shared;
-
-/* shared base means that blk_enable = 1 */
-block_mig_state.blk_enable |= params->shared;
-}
-
 static bool block_is_active(void *opaque)
 {
-return block_mig_state.blk_enable == 1;
+return migrate_use_block_enabled();
 }
 
 static SaveVMHandlers savevm_block_handlers = {
-.set_params = block_set_params,
 .save_live_setup = block_save_setup,
 .save_live_iterate = block_save_iterate,
 .save_live_complete_precopy = block_save_complete,
diff --git a/migration/colo.c b/migration/colo.c
index eec1959..4c2365e 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -333,9 +333,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 goto out;
 }
 
-/* Disable block migration */
-s->params.blk = 0;
-s->params.shared = 0;
 qemu_savevm_state_header(fb);
 qemu_savevm_state_begin(fb, &s->params);
 qemu_mutex_lock_iothread();
diff --git a/migration/migration.c b/migration/migration.c
index 2f10657..39b4a41 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -743,6 +743,10 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 s->enabled_capabilities[cap->value->capability] = cap->value->state;
 }
 
+if (s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED]) {
+s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = true;
+}
+
 if (migrate_postcopy_ram()) {
 if (migrate_use_compression()) {
 /* The decompression threads asynchronously write into RAM
@@ -1170,9 +1174,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 MigrationParams params;
 const char *p;
 
-params.blk = has_blk && blk;
-params.shared = has_inc && inc;
-
 if (migration_is_setup_or_active(s->state) ||
 s->state == MIGRATION_STATUS_CANCELLING ||
 s->state == MIGRATION_STATUS_COLO) {
@@ -1195,6 +1196,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 }
 
 if (has_inc && inc) {
+migrate_set_block_enabled(s);
 migrate_set_block_shared(s);
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index c47b209..c4435e1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1237,8 +1237,6 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 {
 int ret;
 MigrationParams params = {
-.blk = 0,
-.shared = 0
 };
 MigrationState *ms = migrate_init(&params);
 MigrationStatus status;
-- 
2.9.3

[Qemu-devel] [PATCH 11/19] migration: Remove old MigrationParams

2017-04-17 Thread Juan Quintela

Not used anymore after moving block migration to use capabilities.

Signed-off-by: Juan Quintela 
---
 include/migration/migration.h | 10 +++---
 include/migration/vmstate.h   |  1 -
 include/qemu/typedefs.h   |  1 -
 include/sysemu/sysemu.h   |  3 +--
 migration/colo.c  |  2 +-
 migration/migration.c |  8 +++-
 migration/savevm.c| 16 +++-
 7 files changed, 11 insertions(+), 30 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 09d3188..3353de1 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -40,9 +40,6 @@
 /* for vl.c */
 extern int only_migratable;
 
-struct MigrationParams {
-};
-
 /* Messages sent on the return path from destination to source */
 enum mig_rp_message_type {
 MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
@@ -132,12 +129,10 @@ struct MigrationState
 QEMUBH *cleanup_bh;
 QEMUFile *to_dst_file;
 
-/* New style params from 'migrate-set-parameters' */
+/* params from 'migrate-set-parameters' */
 MigrationParameters parameters;
 
 int state;
-/* Old style params from 'migrate' command */
-MigrationParams params;
 
 /* State related to return path */
 struct {
@@ -191,7 +186,8 @@ void migrate_fd_error(MigrationState *s, const Error *error);
 
 void migrate_fd_connect(MigrationState *s);
 
-MigrationState *migrate_init(const MigrationParams *params);
+MigrationState *migrate_init(void);
+
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 624aa0d..f4c5bed 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -37,7 +37,6 @@ typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
 
 typedef struct SaveVMHandlers {
 /* This runs inside the iothread lock.  */
-void (*set_params)(const MigrationParams *params, void * opaque);
 SaveStateHandler *save_state;
 
 void (*cleanup)(void *opaque);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index e95f28c..7e6021c 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -48,7 +48,6 @@ typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionCache MemoryRegionCache;
 typedef struct MemoryRegionSection MemoryRegionSection;
 typedef struct MigrationIncomingState MigrationIncomingState;
-typedef struct MigrationParams MigrationParams;
 typedef struct MigrationState MigrationState;
 typedef struct Monitor Monitor;
 typedef struct MonitorDef MonitorDef;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 16175f7..5f2f21d 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -105,8 +105,7 @@ enum qemu_vm_cmd {
 #define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
 
 bool qemu_savevm_state_blocked(Error **errp);
-void qemu_savevm_state_begin(QEMUFile *f,
- const MigrationParams *params);
+void qemu_savevm_state_begin(QEMUFile *f);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
diff --git a/migration/colo.c b/migration/colo.c
index 4c2365e..2924602 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -334,7 +334,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 }
 
 qemu_savevm_state_header(fb);
-qemu_savevm_state_begin(fb, &s->params);
+qemu_savevm_state_begin(fb);
 qemu_mutex_lock_iothread();
 qemu_savevm_state_complete_precopy(fb, false);
 qemu_mutex_unlock_iothread();
diff --git a/migration/migration.c b/migration/migration.c
index 39b4a41..9ab66b6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1058,7 +1058,7 @@ bool migration_is_idle(void)
 return false;
 }
 
-MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(void)
 {
 MigrationState *s = migrate_get_current();
 
@@ -1072,7 +1072,6 @@ MigrationState *migrate_init(const MigrationParams *params)
 s->cleanup_bh = 0;
 s->to_dst_file = NULL;
 s->state = MIGRATION_STATUS_NONE;
-s->params = *params;
 s->rp_state.from_dst_file = NULL;
 s->rp_state.error = false;
 s->mbps = 0.0;
@@ -1171,7 +1170,6 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 {
 Error *local_err = NULL;
 MigrationState *s = migrate_get_current();
-MigrationParams params;
 const char *p;
 
 if (migration_is_setup_or_active(s->state) ||
@@ -1189,7 +1187,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 return;
 }
 
-s = migrate_init(&params);
+s = migrate_init();
 
 if (has_blk && blk) {
 migrate_set_block_enabled(s);
@@ -1917,7 +1915,7 @@ static void *migration_thread(void *opaque)
 qemu_savevm_send_postcopy_advise(s->to_dst_file);
 }
 
-

[Qemu-devel] [PATCH 18/19] monitor: move hmp_savevm() to monitor.c

2017-04-17 Thread Juan Quintela

hmp_loadvm is already there, so be consistent.

Signed-off-by: Juan Quintela 
---
 include/sysemu/sysemu.h | 1 -
 migration/savevm.c  | 5 -
 monitor.c   | 5 +
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 5f2f21d..146a0dc 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -75,7 +75,6 @@ void qemu_remove_exit_notifier(Notifier *notify);
 void qemu_add_machine_init_done_notifier(Notifier *notify);
 void qemu_remove_machine_init_done_notifier(Notifier *notify);
 
-void hmp_savevm(Monitor *mon, const QDict *qdict);
 int save_vmstate(Monitor *mon, const char *name);
 int load_vmstate(const char *name);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
diff --git a/migration/savevm.c b/migration/savevm.c
index f628d01..cbd7e0d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2156,11 +2156,6 @@ int save_vmstate(Monitor *mon, const char *name)
 return ret;
 }
 
-void hmp_savevm(Monitor *mon, const QDict *qdict)
-{
-save_vmstate(mon, qdict_get_try_str(qdict, "name"));
-}
-
 void qmp_xen_save_devices_state(const char *filename, Error **errp)
 {
 QEMUFile *f;
diff --git a/monitor.c b/monitor.c
index ceb0489..2fca4fb 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1854,6 +1854,11 @@ static void hmp_loadvm(Monitor *mon, const QDict *qdict)
 }
 }
 
+static void hmp_savevm(Monitor *mon, const QDict *qdict)
+{
+save_vmstate(mon, qdict_get_try_str(qdict, "name"));
+}
+
 int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp)
 {
 mon_fd_t *monfd;
-- 
2.9.3

[Qemu-devel] [PATCH 09/19] migration: Create block capabilities for shared and enable

2017-04-17 Thread Juan Quintela

Until now these two settings were passed through the command line.
Note that this patch only creates the capabilities; it is just the
boilerplate.

Signed-off-by: Juan Quintela 
---
 include/migration/migration.h |  3 +++
 migration/migration.c | 36 
 qapi-schema.json  |  7 ++-
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index d6f2e94..5c78ae1 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -242,6 +242,9 @@ bool migrate_colo_enabled(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
+bool migrate_use_block_enabled(void);
+bool migrate_use_block_shared(void);
+
 bool migrate_use_compression(void);
 int migrate_compress_level(void);
 int migrate_compress_threads(void);
diff --git a/migration/migration.c b/migration/migration.c
index ce34be5..2f10657 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1151,6 +1151,16 @@ bool migration_is_blocked(Error **errp)
 return false;
 }
 
+static void migrate_set_block_shared(MigrationState *s)
+{
+s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED] = true;
+}
+
+static void migrate_set_block_enabled(MigrationState *s)
+{
+s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED] = true;
+}
+
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
  bool has_inc, bool inc, bool has_detach, bool detach,
  Error **errp)
@@ -1180,6 +1190,14 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 s = migrate_init(&params);
 
+if (has_blk && blk) {
+migrate_set_block_enabled(s);
+}
+
+if (has_inc && inc) {
+migrate_set_block_shared(s);
+}
+
 if (strstart(uri, "tcp:", &p)) {
 tcp_start_outgoing_migration(s, p, &local_err);
 #ifdef CONFIG_RDMA
@@ -1375,6 +1393,24 @@ int64_t migrate_xbzrle_cache_size(void)
 return s->xbzrle_cache_size;
 }
 
+bool migrate_use_block_enabled(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_ENABLED];
+}
+
+bool migrate_use_block_shared(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_BLOCK_SHARED];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
diff --git a/qapi-schema.json b/qapi-schema.json
index 01b087f..e963bb3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -894,11 +894,16 @@
 # @release-ram: if enabled, qemu will free the migrated ram pages on the source
 #during postcopy-ram migration. (since 2.9)
 #
+# @block-enabled: enable block migration (Since 2.10)
+#
+# @block-shared: enable block shared migration (Since 2.10)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-   'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram'] }
+   'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
+   'block-enabled', 'block-shared' ] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.9.3

[Qemu-devel] [PATCH 14/19] migration: Remove vmstate.h from migration.h

2017-04-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 include/migration/migration.h | 1 -
 migration/block.c | 1 +
 migration/colo-comm.c | 1 +
 migration/migration.c | 1 +
 migration/ram.c   | 1 +
 5 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 3353de1..1451067 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -17,7 +17,6 @@
 #include "qapi/qmp/qdict.h"
 #include "qemu-common.h"
 #include "qemu/thread.h"
-#include "migration/vmstate.h"
 #include "qapi-types.h"
 #include "exec/cpu-common.h"
 #include "qemu/coroutine_int.h"
diff --git a/migration/block.c b/migration/block.c
index e45a42d..3048088 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -20,6 +20,7 @@
 #include "migration/block.h"
 #include "migration/migration.h"
 #include "migration/qemu-file.h"
+#include "migration/vmstate.h"
 #include "sysemu/block-backend.h"
 
 #define BLOCK_SIZE   (1 << 20)
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 4161242..f11fa81 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "migration/migration.h"
+#include "migration/vmstate.h"
 #include "migration/colo.h"
 
 typedef struct {
diff --git a/migration/migration.c b/migration/migration.c
index 9990c46..c2c25fe 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -25,6 +25,7 @@
 #include "migration/socket.h"
 #include "migration/migration.h"
 #include "migration/qemu-file.h"
+#include "migration/vmstate.h"
 #include "sysemu/sysemu.h"
 #include "block/block.h"
 #include "qapi/qmp/qerror.h"
diff --git a/migration/ram.c b/migration/ram.c
index b0759ac..49e518f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -38,6 +38,7 @@
 #include "migration/init.h"
 #include "migration/migration.h"
 #include "migration/qemu-file.h"
+#include "migration/vmstate.h"
 #include "migration/postcopy-ram.h"
 #include "migration/page_cache.h"
 #include "qemu/error-report.h"
-- 
2.9.3

[Qemu-devel] [PATCH 08/19] migration: Export tls.c functions in its own file

2017-04-17 Thread Juan Quintela

Create a header just for the functions exported from tls.c.  Note that
we can't remove the migration/migration.h include from tls.c because it
accesses MigrationState directly for the TLS params.

Signed-off-by: Juan Quintela 
---
 include/migration/migration.h |  9 -
 include/migration/tls.h   | 27 +++
 migration/channel.c   |  1 +
 migration/migration.c |  1 -
 migration/tls.c   |  1 +
 5 files changed, 29 insertions(+), 10 deletions(-)
 create mode 100644 include/migration/tls.h
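
The remark about tls.c comes down to C's incomplete types: a header can declare functions taking a pointer to a struct it never defines, but any file that reads a field needs the full definition. A minimal standalone sketch follows, with a hypothetical MigState standing in for MigrationState and a single made-up field standing in for the TLS parameters.

```c
#include <assert.h>
#include <stddef.h>

/* What a narrow header can get away with: a forward declaration is
 * enough to declare functions that take a pointer, but not enough to
 * touch any field. */
typedef struct MigState MigState;

const char *tls_creds_of(const MigState *s);

/* What the .c file needs: the full definition, which in QEMU would come
 * from including migration/migration.h.  Without it, s->tls_creds below
 * would fail to compile because the type is still incomplete. */
struct MigState {
    const char *tls_creds;
};

const char *tls_creds_of(const MigState *s)
{
    assert(s != NULL);
    return s->tls_creds;
}
```

This is why the new header only needs the channel include, while tls.c keeps pulling in the header that defines the state struct.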

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0c6dae5..d6f2e94 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -183,15 +183,6 @@ void migrate_set_state(int *state, int old_state, int new_state);
 
 void migration_fd_process_incoming(QEMUFile *f);
 
-void migration_tls_channel_process_incoming(MigrationState *s,
-QIOChannel *ioc,
-Error **errp);
-
-void migration_tls_channel_connect(MigrationState *s,
-   QIOChannel *ioc,
-   const char *hostname,
-   Error **errp);
-
 uint64_t migrate_max_downtime(void);
 
 void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp);
diff --git a/include/migration/tls.h b/include/migration/tls.h
new file mode 100644
index 000..1d63263
--- /dev/null
+++ b/include/migration/tls.h
@@ -0,0 +1,27 @@
+/*
+ * QEMU live migration tls functions
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_MIGRATION_TLS_H
+#define QEMU_MIGRATION_TLS_H
+
+#include "io/channel.h"
+
+void migration_tls_channel_process_incoming(MigrationState *s,
+QIOChannel *ioc,
+Error **errp);
+
+void migration_tls_channel_connect(MigrationState *s,
+   QIOChannel *ioc,
+   const char *hostname,
+   Error **errp);
+#endif
diff --git a/migration/channel.c b/migration/channel.c
index 6f11a1a..10416e0 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -15,6 +15,7 @@
 
 #include "qemu/osdep.h"
 #include "migration/channel.h"
+#include "migration/tls.h"
 #include "migration/migration.h"
 #include "trace.h"
 #include "qapi/error.h"
diff --git a/migration/migration.c b/migration/migration.c
index 181839b..ce34be5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -42,7 +42,6 @@
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
 #include "io/channel-buffer.h"
-#include "io/channel-tls.h"
 #include "migration/colo.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
diff --git a/migration/tls.c b/migration/tls.c
index 2b28ebd..b9e1791 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -21,6 +21,7 @@
 #include "qemu/osdep.h"
 #include "migration/channel.h"
 #include "migration/migration.h"
+#include "migration/tls.h"
 #include "io/channel-tls.h"
 #include "crypto/tlscreds.h"
 #include "qemu/error-report.h"
-- 
2.9.3

[Qemu-devel] [PATCH 15/19] migration: Export qemu-file-channel.c functions in its own file

2017-04-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 include/migration/qemu-file-channel.h | 22 ++
 include/migration/qemu-file.h |  3 ---
 migration/channel.c   |  2 +-
 migration/colo.c  |  2 +-
 migration/migration.c |  2 +-
 migration/qemu-file-channel.c |  2 +-
 migration/rdma.c  |  2 +-
 migration/savevm.c|  1 +
 8 files changed, 28 insertions(+), 8 deletions(-)
 create mode 100644 include/migration/qemu-file-channel.h

diff --git a/include/migration/qemu-file-channel.h b/include/migration/qemu-file-channel.h
new file mode 100644
index 000..bbbf36d
--- /dev/null
+++ b/include/migration/qemu-file-channel.h
@@ -0,0 +1,22 @@
+/*
+ * QEMU migration blockers
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_FILE_CHANNEL_H
+#define QEMU_FILE_CHANNEL_H
+
+#include "migration/qemu-file.h"
+#include "io/channel.h"
+
+QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc);
+QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc);
+#endif
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 0cd648a..ec73647 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -27,7 +27,6 @@
 
 #include "qemu-common.h"
 #include "exec/cpu-common.h"
-#include "io/channel.h"
 
 
 /* Read a chunk of data from a file at the given position.  The pos argument
@@ -119,8 +118,6 @@ typedef struct QEMUFileHooks {
 } QEMUFileHooks;
 
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
-QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc);
-QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc);
 void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
diff --git a/migration/channel.c b/migration/channel.c
index 04a26c5..e4a0443 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -17,7 +17,7 @@
 #include "migration/channel.h"
 #include "migration/tls.h"
 #include "migration/migration.h"
-#include "migration/qemu-file.h"
+#include "migration/qemu-file-channel.h"
 #include "trace.h"
 #include "qapi/error.h"
 #include "io/channel-tls.h"
diff --git a/migration/colo.c b/migration/colo.c
index e2eaccd..150ac6a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,7 +13,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/sysemu.h"
 #include "migration/migration.h"
-#include "migration/qemu-file.h"
+#include "migration/qemu-file-channel.h"
 #include "migration/colo.h"
 #include "io/channel-buffer.h"
 #include "trace.h"
diff --git a/migration/migration.c b/migration/migration.c
index c2c25fe..6e32be0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -24,7 +24,7 @@
 #include "migration/fd.h"
 #include "migration/socket.h"
 #include "migration/migration.h"
-#include "migration/qemu-file.h"
+#include "migration/qemu-file-channel.h"
 #include "migration/vmstate.h"
 #include "sysemu/sysemu.h"
 #include "block/block.h"
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 45c13f1..87e44ec 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -23,7 +23,7 @@
  */
 
 #include "qemu/osdep.h"
-#include "migration/qemu-file.h"
+#include "migration/qemu-file-channel.h"
 #include "io/channel-socket.h"
 #include "qemu/iov.h"
 
diff --git a/migration/rdma.c b/migration/rdma.c
index 94d4840..d9a2d64 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -18,7 +18,7 @@
 #include "qemu-common.h"
 #include "qemu/cutils.h"
 #include "migration/migration.h"
-#include "migration/qemu-file.h"
+#include "migration/qemu-file-channel.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/sockets.h"
diff --git a/migration/savevm.c b/migration/savevm.c
index b03973a..1e6cf79 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -36,6 +36,7 @@
 #include "audio/audio.h"
 #include "migration/machine.h"
 #include "migration/migration.h"
+#include "migration/qemu-file-channel.h"
 #include "migration/postcopy-ram.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
-- 
2.9.3

[Qemu-devel] [PATCH 06/19] migration: Export fd.c functions in its own file

2017-04-17 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 include/migration/fd.h| 20 
 include/migration/migration.h |  4 
 migration/fd.c|  2 +-
 migration/migration.c |  1 +
 4 files changed, 22 insertions(+), 5 deletions(-)
 create mode 100644 include/migration/fd.h

diff --git a/include/migration/fd.h b/include/migration/fd.h
new file mode 100644
index 000..4ec3298
--- /dev/null
+++ b/include/migration/fd.h
@@ -0,0 +1,20 @@
+/*
+ * QEMU live migration fd functions
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_MIGRATION_FD_H
+#define QEMU_MIGRATION_FD_H
+void fd_start_incoming_migration(const char *path, Error **errp);
+
+void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
+ Error **errp);
+#endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index eb0150b..077b75b 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -202,10 +202,6 @@ void unix_start_incoming_migration(const char *path, Error **errp);
 
 void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **errp);
 
-void fd_start_incoming_migration(const char *path, Error **errp);
-
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp);
-
 void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp);
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp);
diff --git a/migration/fd.c b/migration/fd.c
index fba3852..78ebd5b 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -18,7 +18,7 @@
 #include "qapi/error.h"
 #include "qemu-common.h"
 #include "migration/channel.h"
-#include "migration/migration.h"
+#include "migration/fd.h"
 #include "monitor/monitor.h"
 #include "io/channel-util.h"
 #include "trace.h"
diff --git a/migration/migration.c b/migration/migration.c
index 076c42a..a608faa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -22,6 +22,7 @@
 #include "migration/state.h"
 #include "migration/init.h"
 #include "migration/exec.h"
+#include "migration/fd.h"
 #include "migration/migration.h"
 #include "migration/qemu-file.h"
 #include "sysemu/sysemu.h"
-- 
2.9.3
