Re: [Qemu-devel] Status of my hacks on the MTTCG WIP branch

2016-01-19 Thread alvise rigo
On Mon, Jan 18, 2016 at 8:09 PM, Alex Bennée  wrote:
>
>
> Alex Bennée  writes:
>
> > alvise rigo  writes:
> >
> >> On Fri, Jan 15, 2016 at 4:25 PM, Alex Bennée  
> >> wrote:
> >>>
> >>> alvise rigo  writes:
> >>>
> >>>> On Fri, Jan 15, 2016 at 3:51 PM, Alex Bennée  
> >>>> wrote:
> >>>>>
> >>>>> alvise rigo  writes:
> >>>>>
> >>>>
> >>>> Keep in mind that Linux on arm64 uses the LDXP/STXP instructions that
> >>>> exist solely in aarch64.
> >>>> These instructions are purely emulated now and can potentially write
> >>>> 128 bits of data in a non-atomic fashion.
> >>>
> >>> Sure, but I doubt they are the reason for this hang as the kernel
> >>> doesn't use them.
> >>
> >> The kernel does use them for __cmpxchg_double in
> >> arch/arm64/include/asm/atomic_ll_sc.h.
> >
> > I take it back, if I'd have grepped for "ldxp" instead of "stxp" I would
> > have seen it, sorry about that ;-)
> >
> >> In any case, the normal exclusive instructions are also emulated in
> >> target-arm/translate-a64.c.
> >
> > I'll check on them on Monday. I'd assumed all the stuff was in the
> > helpers as I scanned through and missed the translate.c changes Fred
> > made. Hopefully that will be the last hurdle.
>
> I'm pleased to confirm you were right. I hacked up Fred's helper based
> solution for aarch64 including the ldxp/stxp stuff. It's not
> semantically correct because:
>
>   result = atomic_bool_cmpxchg(p, oldval, (uint8_t)newval) &&
>            atomic_bool_cmpxchg(&p[1], oldval2, (uint8_t)newval2);
>
> won't leave the system as it was before if the race causes the second

Exactly.

> cmpxchg to fail. I assume this won't be a problem in the LL/SC world as
> we'll be able to serialise all accesses to the exclusive page properly?

In LL/SC the idea would be to dedicate one ARM-specific helper (in
target-arm/helper-a64.c) to handle this case.
Once the helper grabbed the excl mutex, we are allowed to make 128
bits or bigger accesses.
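
As a rough illustration of that idea (not the actual helper), here is a
minimal sketch of a pair compare-and-swap done under a single lock. The
names excl_mutex and cmpxchg_pair_locked are hypothetical, and a plain
pthread mutex stands in for whatever the excl mutex ends up being; the
point is only that both words are checked before either is written, so a
failed comparison leaves memory untouched.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the exclusive mutex discussed above. */
static pthread_mutex_t excl_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Compare-and-swap two adjacent 64-bit words as one unit.
 * Both words are compared before either is written, so a mismatch on
 * the second word cannot leave the first one modified.
 */
static bool cmpxchg_pair_locked(uint64_t p[2],
                                uint64_t old_lo, uint64_t old_hi,
                                uint64_t new_lo, uint64_t new_hi)
{
    bool ok;

    assert(p != NULL);
    pthread_mutex_lock(&excl_mutex);
    ok = (p[0] == old_lo) && (p[1] == old_hi);
    if (ok) {
        p[0] = new_lo;
        p[1] = new_hi;
    }
    pthread_mutex_unlock(&excl_mutex);
    return ok;
}
```

This only holds if every other writer to the same location takes the same
lock, which is the part the LL/SC work has to guarantee.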

>
>
> See:
>
> https://github.com/stsquad/qemu/tree/mttcg/multi_tcg_v8_wip_ajb_fix_locks-r2
>
> >
> > In the meantime if I'm not booting Jessie I can get MTTCG aarch64
> > working with an initrd based rootfs. Once I've gone through those I'm
> > planning on giving it a good stress test with -fsanitize=thread.
>
> My first pass with this threw up a bunch of errors with the RCU code
> like this:
>
> WARNING: ThreadSanitizer: data race (pid=15387)
>   Atomic write of size 4 at 0x7f59efa51d48 by main thread (mutexes: write M172):
>     #0 __tsan_atomic32_fetch_add  (libtsan.so.0+0x00058e8f)
>     #1 call_rcu1 util/rcu.c:288 (qemu-system-aarch64+0x006c3bd0)
>     #2 address_space_update_topology /home/alex/lsrc/qemu/qemu.git/memory.c:806 (qemu-system-aarch64+0x001ed9ca)
>     #3 memory_region_transaction_commit /home/alex/lsrc/qemu/qemu.git/memory.c:842 (qemu-system-aarch64+0x001ed9ca)
>     #4 address_space_init /home/alex/lsrc/qemu/qemu.git/memory.c:2136 (qemu-system-aarch64+0x001f1fa6)
>     #5 memory_map_init /home/alex/lsrc/qemu/qemu.git/exec.c:2344 (qemu-system-aarch64+0x00196607)
>     #6 cpu_exec_init_all /home/alex/lsrc/qemu/qemu.git/exec.c:2795 (qemu-system-aarch64+0x00196607)
>     #7 main /home/alex/lsrc/qemu/qemu.git/vl.c:4083 (qemu-system-aarch64+0x001829aa)
>
>   Previous read of size 4 at 0x7f59efa51d48 by thread T1:
>     #0 call_rcu_thread util/rcu.c:242 (qemu-system-aarch64+0x006c3d92)
>     #1   (libtsan.so.0+0x000235f9)
>
>   Location is global 'rcu_call_count' of size 4 at 0x7f59efa51d48 (qemu-system-aarch64+0x010f1d48)
>
>   Mutex M172 (0x7f59ef6254e0) created at:
>     #0 pthread_mutex_init  (libtsan.so.0+0x00027ee5)
>     #1 qemu_mutex_init util/qemu-thread-posix.c:55 (qemu-system-aarch64+0x006ad747)
>     #2 qemu_init_cpu_loop /home/alex/lsrc/qemu/qemu.git/cpus.c:890 (qemu-system-aarch64+0x001d4166)
>     #3 main /home/alex/lsrc/qemu/qemu.git/vl.c:3005 (qemu-system-aarch64+0x001820ac)
>
>   Thread T1 (tid=15389, running) created by main thread at:
>     #0 pthread_create  (libtsan.so.0+0x000274c7)
>     #1 qemu_thread_create util/qemu-thread-posix.c:525 (qemu-system-aarch64+0x006ae04d)
>     #2 rcu_init_complete util/rcu.c:320 (qemu-system-aarch64+0x006c3d52)
>     #3 rcu_init util/rcu.c:351 (qemu-system-aarch64+0x0018e288)
>     #4 __libc_csu_init  (qemu-system-aarch64+0x006c63ec)
>
>
> but I don't know how many are false positives so I'm going to look in more
> detail now.

Umm...I'm not very familiar with the sanitize option, I'll let you
follow this lead :).

alvise

>
> 
>
> --
> Alex Bennée



Re: [Qemu-devel] [PATCH COLO-Frame v13 34/39] net/filter-buffer: Add default filter-buffer for each netdev

2016-01-19 Thread Hailiang Zhang

Hi Jason,

Thanks for your review.

On 2016/1/19 11:19, Jason Wang wrote:



On 12/29/2015 03:09 PM, zhanghailiang wrote:

Add a default filter-buffer to each netdev (except vhost-net), which
will be used by COLO or Micro-checkpoint to buffer the VM's packets.
The name of the default filter-buffer is 'nop'.
By default, the default filter-buffer does not buffer any packets,
so it has no side effects on the netdev.

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Yang Hongyang 


This patch did three things:

1) the ability to enable or disable a netfilter
2) the ability to add a default filter
3) default filter attaching for filter-buffer

Better to split them into separate small patches.

And several questions:

For 1), I'm not sure this is really needed; we can in fact disable a
filter by removing it.


If we do it that way, do we also need to _enable_ the buffer filter by
adding it dynamically instead of attaching the default filter,
just like we did in v10?
(In that series, you thought having a default filter might be better.
The main reason for that was to support hot-adding a NIC. Since we don't
support hot-adding a NIC during COLO, it will be OK to add the default
filter dynamically.)


For 2), instead of specific code just for the filter buffer, I think we
need a generic method for an arbitrary filter to be used as the default.


Good idea.


And if we can achieve 2), 3) is not needed any more.


---
v12:
- Skip vhost-net when add default filter
- Don't go through filter layer if the filter is disabled.
v11:
- New patch
---
  include/net/filter.h | 10 +++
  net/filter-buffer.c  | 82 
  net/filter.c |  6 +++-
  net/net.c| 12 
  4 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/include/net/filter.h b/include/net/filter.h
index 2deda36..40aa38c 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -56,6 +56,8 @@ struct NetFilterState {
  NetClientState *netdev;
  NetFilterDirection direction;
  char info_str[256];
+bool is_default;
+bool enabled;
  QTAILQ_ENTRY(NetFilterState) next;
  };

@@ -74,4 +76,12 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
  int iovcnt,
  void *opaque);

+static inline bool qemu_need_skip_netfilter(NetFilterState *nf)
+{
+return nf->enabled ? false : true;
+}
+
+void netdev_add_default_filter_buffer(const char *netdev_id,
+  NetFilterDirection direction,
+  Error **errp);
  #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 57be149..9cf3544 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -14,6 +14,13 @@
  #include "qapi/qmp/qerror.h"
  #include "qapi-visit.h"
  #include "qom/object.h"
+#include "net/net.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp-output-visitor.h"
+#include "qapi/qmp-input-visitor.h"
+#include "monitor/monitor.h"
+#include "qmp-commands.h"
+#include "net/vhost_net.h"

  #define TYPE_FILTER_BUFFER "filter-buffer"

@@ -102,6 +109,7 @@ static void filter_buffer_cleanup(NetFilterState *nf)
  static void filter_buffer_setup(NetFilterState *nf, Error **errp)
  {
  FilterBufferState *s = FILTER_BUFFER(nf);
+char *path = object_get_canonical_path_component(OBJECT(nf));

  /*
   * We may want to accept zero interval when VM FT solutions like MC
@@ -114,6 +122,14 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
  }

  s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+nf->is_default = !strcmp(path, "nop");
+/*
+* For the default buffer filter, it will be disabled by default,
+* So it will not buffer any packets.
+*/
+if (nf->is_default) {
+nf->enabled = false;
+}
  if (s->interval) {
  timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL,
filter_buffer_release_timer, nf);
@@ -163,6 +179,72 @@ out:
  error_propagate(errp, local_err);
  }

+/*
+* This will be used by COLO or MC FT, for which they will need
+* to buffer the packets of VM's net devices, Here we add a default
+* buffer filter for each netdev. The name of default buffer filter is
+* 'nop'
+*/
+void netdev_add_default_filter_buffer(const char *netdev_id,
+  NetFilterDirection direction,
+  Error **errp)
+{


Need a more generic way to add an arbitrary filter as default. E.g
during netdev init, query if there's a default and do the initialization
there.



We call it in net_client_init1(); I couldn't find a better place to call it.
What's your suggestion?


+QmpOutputVisitor *qov;
+QmpInputVisitor *qiv;
+Visitor *ov, *iv;
+QObject *obj = NULL;
+QDict *qdict;
+void *dummy 

Re: [Qemu-devel] [PATCH COLO-Frame v13 35/39] filter-buffer: Accept zero interval

2016-01-19 Thread Hailiang Zhang

On 2016/1/19 11:21, Jason Wang wrote:



On 12/29/2015 03:09 PM, zhanghailiang wrote:

For default buffer filter, its 'interval' value is zero,
so here we should accept zero interval.

Signed-off-by: zhanghailiang 
Reviewed-by: Yang Hongyang 
Cc: Jason Wang 
---
v12:
- Add Reviewed-by tag
v11:
- Add comment
v10:
- new patch
---
  net/filter-buffer.c | 10 --
  1 file changed, 10 deletions(-)

diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 9cf3544..8abac94 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -111,16 +111,6 @@ static void filter_buffer_setup(NetFilterState *nf, Error **errp)
  FilterBufferState *s = FILTER_BUFFER(nf);
  char *path = object_get_canonical_path_component(OBJECT(nf));

-/*
- * We may want to accept zero interval when VM FT solutions like MC
- * or COLO use this filter to release packets on demand.
- */


You'd better move this to the commit log for a better rationale of the
patch.



OK, I will fix it, thanks.


-if (!s->interval) {
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "interval",
-   "a non-zero interval");
-return;
-}
-
  s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
  nf->is_default = !strcmp(path, "nop");
  /*




Re: [Qemu-devel] [PATCH COLO-Frame v13 36/39] filter-buffer: Introduce a helper function to enable/disable default filter

2016-01-19 Thread Hailiang Zhang

On 2016/1/19 11:35, Jason Wang wrote:



On 12/29/2015 03:09 PM, zhanghailiang wrote:

The default buffer filter doesn't buffer packets by default,
but we need to buffer packets for COLO or Micro-checkpoint.
Here we add a helper function to enable/disable the filter's buffer
capability.

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Yang Hongyang 
---
v12:
- Rename the helper function to qemu_set_default_filters_status()
v11:
- New patch
---
  include/net/filter.h |  1 +
  include/net/net.h|  4 
  net/filter-buffer.c  | 19 +++
  net/net.c| 29 +
  4 files changed, 53 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index 40aa38c..08aa604 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -84,4 +84,5 @@ static inline bool qemu_need_skip_netfilter(NetFilterState *nf)
  void netdev_add_default_filter_buffer(const char *netdev_id,
NetFilterDirection direction,
Error **errp);
+void qemu_set_default_filters_status(bool enable);
  #endif /* QEMU_NET_FILTER_H */
diff --git a/include/net/net.h b/include/net/net.h
index 7af3e15..5c65c45 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -125,6 +125,10 @@ NetClientState *qemu_find_vlan_client_by_name(Monitor *mon, int vlan_id,
const char *client_str);
  typedef void (*qemu_nic_foreach)(NICState *nic, void *opaque);
  void qemu_foreach_nic(qemu_nic_foreach func, void *opaque);
+typedef void (*qemu_netfilter_foreach)(NetFilterState *nf, void *opaque,
+   Error **errp);
+void qemu_foreach_netfilter(qemu_netfilter_foreach func, void *opaque,
+Error **errp);
  int qemu_can_send_packet(NetClientState *nc);
  ssize_t qemu_sendv_packet(NetClientState *nc, const struct iovec *iov,
int iovcnt);
diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index 8abac94..90a50cc 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -169,6 +169,25 @@ out:
  error_propagate(errp, local_err);
  }

+static void set_default_filter_status(NetFilterState *nf,
+  void *opaque,
+  Error **errp)
+{
+if (!strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_BUFFER)) {
+bool *status = opaque;
+
+if (nf->is_default) {
+nf->enabled = *status;
+}
+}
+}
+
+void qemu_set_default_filters_status(bool enable)
+{
+qemu_foreach_netfilter(set_default_filter_status,
+   &enable, NULL);
+}


The name of the function sounds like a generic helper, but in fact it passes
a type-specific function. Considering that 'enabled' is a generic property of
netfilter, we want more generic code here.



Got it, I will fix it.


+
  /*
  * This will be used by COLO or MC FT, for which they will need
  * to buffer the packets of VM's net devices, Here we add a default
diff --git a/net/net.c b/net/net.c
index fd53cfc..30946c5 100644
--- a/net/net.c
+++ b/net/net.c
@@ -259,6 +259,35 @@ static char *assign_name(NetClientState *nc1, const char *model)
  return g_strdup_printf("%s.%d", model, id);
  }

+void qemu_foreach_netfilter(qemu_netfilter_foreach func, void *opaque,
+Error **errp)
+{
+NetClientState *nc;
+NetFilterState *nf;
+
+QTAILQ_FOREACH(nc, &net_clients, next) {
+if (nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
+continue;
+}
+/* FIXME: Not support multiqueue */
+if (nc->queue_index > 1) {
+error_setg(errp, "%s: multiqueue is not supported", __func__);
+return;
+}


Do we really need this? Looks like netfilter_complete() has already
checked this.



Yes, this is useless, I will remove it.


+QTAILQ_FOREACH(nf, &nc->filters, next) {
+if (func) {
+Error *local_err = NULL;
+
+func(nf, opaque, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+}
+}


Need a separate patch for this helper.



OK, I will split it out in the next version, thanks.


+
  static void qemu_net_client_destructor(NetClientState *nc)
  {
  g_free(nc);




Re: [Qemu-devel] [PATCH 0/3] clean-includes script to add osdep.h to everything

2016-01-19 Thread Peter Maydell
On 19 January 2016 at 07:27, Markus Armbruster  wrote:
> Peter Maydell  writes:
>
>> On 11 January 2016 at 15:19, Daniel P. Berrange  wrote:
>>> I think even guest-agent code & tests could include it in order to
>>> get clean includes, even if they don't use any of the QEMU functions
>>> defined in it. So I think its simplest to just say every .c file must
>>> use it and leave it at that.
>>
>> OK, let's assume that works.
>
> If it doesn't, we need a header with just configuration results that is
> included in every .c file first.  Just like config.h should be when
> using autoconf.

An example of the kind of code that I wasn't sure about is
the stuff in tests/tcg/mips/ -- this currently doesn't
include any QEMU headers that I can see and I don't think
they're even on the include path.

In any case I'll do the obvious stuff first and circle back
to the oddball standalone sources later.

thanks
-- PMM



Re: [Qemu-devel] [RE-RESEND PATCH] pci: Adjust PCI config limit based on bus topology

2016-01-19 Thread Marcel Apfelbaum

On 01/19/2016 01:06 AM, Alex Williamson wrote:

A conventional PCI bus does not support config space accesses above
the standard 256 byte configuration space.  PCIe-to-PCI bridges are
not permitted to forward transactions if the extended register address
field is non-zero and must handle it as an unsupported request (PCIe
bridge spec rev 1.0, 4.1.3, 4.1.4).  Therefore, we should not support
extended config space if there is a conventional bus anywhere on the
path to a device.

Signed-off-by: Alex Williamson 
---
Previous postings:
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg05384.html
https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg02422.html

  hw/pci/pci_host.c |   26 ++
  1 file changed, 26 insertions(+)

diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index 49f59a5..3a3e294 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -19,6 +19,7 @@
   */

  #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
  #include "hw/pci/pci_host.h"
  #include "hw/pci/pci_bus.h"
  #include "trace.h"
@@ -49,9 +50,29 @@ static inline PCIDevice *pci_dev_find_by_addr(PCIBus *bus, uint32_t addr)
  return pci_find_device(bus, bus_num, devfn);
  }

+static void pci_adjust_config_limit(PCIBus *bus, uint32_t *limit)
+{
+if (*limit > PCI_CONFIG_SPACE_SIZE) {
+if (!pci_bus_is_express(bus)) {
+*limit = PCI_CONFIG_SPACE_SIZE;
+return;
+}
+
+if (!pci_bus_is_root(bus)) {
+PCIDevice *bridge = pci_bridge_get_device(bus);
+pci_adjust_config_limit(bridge->bus, limit);
+}
+}
+}
+
  void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr,
uint32_t limit, uint32_t val, uint32_t len)
  {
+pci_adjust_config_limit(pci_dev->bus, &limit);
+if (limit <= addr) {
+return;
+}
+
  assert(len <= 4);
  /* non-zero functions are only exposed when function 0 is present,
   * allowing direct removal of unexposed functions.
@@ -70,6 +91,11 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr,
  {
  uint32_t ret;

+pci_adjust_config_limit(pci_dev->bus, &limit);
+if (limit <= addr) {
+return ~0x0;
+}
+
  assert(len <= 4);
  /* non-zero functions are only exposed when function 0 is present,
   * allowing direct removal of unexposed functions.




Quick question: could we check the limit as part of pci_config_size?
Anyway, it looks OK to me.

Reviewed-by: Marcel Apfelbaum 

Thanks,
Marcel
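
For reference, the walk the patch performs can be sketched outside QEMU
as below. This is only an illustration under stated assumptions: struct
bus and adjust_config_limit are hypothetical stand-ins (not PCIBus or
pci_adjust_config_limit), and the recursion is written as a loop. The
sizes are the standard 256-byte conventional and 4096-byte extended
config spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONV_CONFIG_SIZE 256u   /* conventional PCI config space */
#define EXPR_CONFIG_SIZE 4096u  /* PCIe extended config space */

/* Hypothetical, simplified bus node; not QEMU's PCIBus. */
struct bus {
    bool is_express;
    struct bus *parent;         /* NULL for the root bus */
};

/*
 * Clamp the config space limit to 256 bytes if any bus between the
 * device and the root is a conventional (non-express) bus.
 */
static uint32_t adjust_config_limit(const struct bus *bus, uint32_t limit)
{
    assert(bus != NULL);
    for (; bus != NULL && limit > CONV_CONFIG_SIZE; bus = bus->parent) {
        if (!bus->is_express) {
            return CONV_CONFIG_SIZE;
        }
    }
    return limit;
}
```

With a root express bus, a conventional bus below it and an express bus
below that, a device on the lowest bus ends up with a 256-byte limit.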



Re: [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specification

2016-01-19 Thread Vladimir Sementsov-Ogievskiy

On 19.01.2016 00:16, Eric Blake wrote:

preserving semantics of those extra_data bytes).  We
have enough room for future extension, and that's good e


OK, so what should go into the spec? Is the current wording OK? Should I just
delete "Type-specific":


+
+20 - 23:extra_data_size
+Size of type-specific extra data.
+
+For now, as no extra data is defined, extra_data_size is
+reserved and must be zero.
+
+variable:   Extra data for the bitmap.
+
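
To make the "reserved and must be zero" rule concrete, here is a minimal
sketch of how a reader of the entry might check it. It assumes the field
sits at bytes 20 - 23 as in the text above and is stored big-endian like
other qcow2 fields; load_be32 and entry_extra_data_ok are hypothetical
names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXTRA_DATA_SIZE_OFFSET 20   /* bytes 20 - 23, per the text above */

/* Read a 32-bit big-endian value (assumed byte order). */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return true if the entry is long enough and extra_data_size is zero. */
static bool entry_extra_data_ok(const uint8_t *entry, size_t len)
{
    assert(entry != NULL);
    if (len < EXTRA_DATA_SIZE_OFFSET + 4) {
        return false;
    }
    return load_be32(entry + EXTRA_DATA_SIZE_OFFSET) == 0;
}
```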




--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.




Re: [Qemu-devel] [RFC PATCH v2 00/10] Introduce Intel 82574 GbE Controller Emulation (e1000e)

2016-01-19 Thread Dmitry Fleytman

> On 19 Jan 2016, at 05:48 AM, Jason Wang  wrote:
> 
> 
> 
> On 01/19/2016 01:35 AM, Leonid Bloch wrote:
>> Hello All,
>> 
>> This series is the latest code of the e1000e device emulation being 
>> developed.
>> 
>> Changes since v1:
>> 
>> 1. Added support for all the device features:
>>  - Interrupt moderation.
>>  - RSS.
>>  - Multiqueue.
>> 2. Simulated exact PCI/PCIe configuration space layout.
>> 3. Made fixes needed to pass Microsoft's HW certification tests (HCK).
>> 
>> This series is still an RFC, because the following tasks are not done yet:
>> 
>> 1. See which code can be shared between this device and the existing e1000 
>> device.
>> 2. Rebase patches to the latest master (current base is v2.3.0).
>> 
>> Please share your thoughts,
>> Thanks, Dmitry.
> 
> Hi:
> 
> Do you have a public git tree for easier reviewing?


Hi,

Yes, see here: https://github.com/daynix/qemu-e1000e/commits/e1000e-v2 

Branch e1000e-v2.

~Dmitry

> 
> Thanks
> 
>> 
>> ===
>> 
>> Hello qemu-devel,
>> 
>> This patch series is an RFC for the new networking device emulation
>> we're developing for QEMU.
>> 
>> This new device emulates the Intel 82574 GbE Controller and works
>> with unmodified Intel e1000e drivers from the Linux/Windows kernels.
>> 
>> The status of the current series is "Functional Device Ready, work
>> on Extended Features in Progress".
>> 
>> More precisely, these patches represent a functional device, which
>> is recognized by the standard Intel drivers, and is able to transfer
>> TX/RX packets with CSO/TSO offloads, according to the spec.
>> 
>> Extended features not supported yet (work in progress):
>>  1. TX/RX Interrupt moderation mechanisms
>>  2. RSS
>>  3. Full-featured multi-queue (use of multiqueued network backend)
>> 
>> Also, there will be some code refactoring and performance
>> optimization efforts.
>> 
>> This series was tested on Linux (Fedora 22) and Windows (2012R2)
>> guests, using Iperf, with TX/RX and TCP/UDP streams, and various
>> packet sizes.
>> 
>> More thorough testing, including data streams with different MTU
>> sizes, and Microsoft Certification (HLK) tests, are pending missing
>> features' development.
>> 
>> See commit messages (esp. "net: Introduce e1000e device emulation")
>> for more information about the development approaches and the
>> architecture options chosen for this device.
>> 
>> This series is based upon v2.3.0 tag of the upstream QEMU repository,
>> and it will be rebased to latest before the final submission.
>> 
>> Please share your thoughts - any feedback is highly welcomed :)
>> 
>> Best Regards,
>> Dmitry Fleytman.
>> 
>> Dmitry Fleytman (10):
>>  msix: make msix_clr_pending() visible for clients
>>  pci: Introduce function for PCI PM capability creation
>>  pcie: Add support for PCIe CAP v1
>>  pcie: Introduce function for DSN capability creation
>>  net: Introduce Toeplitz hash calculator
>>  net: Add macros for ETH address tracing
>>  net_pkt: Name vmxnet3 packet abstractions more generic
>>  net_pkt: Extend packet abstraction as requied by e1000e functionality
>>  e1000_regs: Add definitions for Intel 82574-specific bits
>>  net: Introduce e1000e device emulation
>> 
>> MAINTAINERS|   14 +
>> default-configs/pci.mak|1 +
>> hw/net/Makefile.objs   |5 +-
>> hw/net/e1000_regs.h|  353 -
>> hw/net/e1000e.c|  700 +
>> hw/net/e1000e_core.c   | 3453 
>> 
>> hw/net/e1000e_core.h   |  230 +++
>> hw/net/net_rx_pkt.c|  536 +++
>> hw/net/net_rx_pkt.h|  353 +
>> hw/net/net_tx_pkt.c|  627 
>> hw/net/net_tx_pkt.h|  191 +++
>> hw/net/vmxnet3.c   |   80 +-
>> hw/net/vmxnet_rx_pkt.c |  187 ---
>> hw/net/vmxnet_rx_pkt.h |  174 ---
>> hw/net/vmxnet_tx_pkt.c |  567 
>> hw/net/vmxnet_tx_pkt.h |  148 --
>> hw/pci/msix.c  |2 +-
>> hw/pci/pci.c   |   21 +
>> hw/pci/pcie.c  |   96 +-
>> include/hw/pci/msix.h  |1 +
>> include/hw/pci/pci.h   |2 +
>> include/hw/pci/pci_regs.h  |4 +
>> include/hw/pci/pcie.h  |5 +
>> include/hw/pci/pcie_regs.h |8 +-
>> include/net/checksum.h |   49 +-
>> include/net/eth.h  |  161 ++-
>> include/net/net.h  |5 +
>> net/checksum.c |7 +-
>> net/eth.c  |  410 +-
>> tests/Makefile |4 +-
>> trace-events   |  195 +++
>> 31 files changed, 7350 insertions(+), 1239 deletions(-)
>> create mode 100644 hw/net/e1000e.c
>> create mode 100644 hw/net/e1000e_core.c
>> create mode 100644 hw/net/e1000e_core.h
>> create mode 100644 hw/net/net_rx_pkt.c
>> create mode 100644 hw/net/net_rx_pkt.h
>> create mode 100644 hw/net/net_tx_pkt.c
>> create mode 100644 hw/net/net_tx_pkt.h
>> delete mode 100644 hw/net/vmxnet_rx_pkt.c
>> 

Re: [Qemu-devel] [PATCH v1 1/1] arm_gic: Update ID registers based on revision

2016-01-19 Thread Peter Maydell
On 19 January 2016 at 01:33, Alistair Francis
 wrote:
> Update the GIC ID registers (registers above 0xfe0) based on the GIC
> revision instead of using the same values for all GIC implementations.
>
> Signed-off-by: Alistair Francis 
> Tested-by: Sören Brinkmann 
> ---
>
>  hw/intc/arm_gic.c | 29 ++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
> index 13e297d..f6bfa53 100644
> --- a/hw/intc/arm_gic.c
> +++ b/hw/intc/arm_gic.c
> @@ -31,8 +31,16 @@ do { fprintf(stderr, "arm_gic: " fmt , ## __VA_ARGS__); } while (0)
>  #define DPRINTF(fmt, ...) do {} while(0)
>  #endif
>
> -static const uint8_t gic_id[] = {
> -0x90, 0x13, 0x04, 0x00, 0x0d, 0xf0, 0x05, 0xb1
> +static const uint8_t gic_id_11mpcore[] = {
> +0x00, 0x00, 0x00, 0x00, 0x90, 0x13, 0x04, 0x00, 0x0d, 0xf0, 0x05, 0xb1
> +};
> +
> +static const uint8_t gic_id_gicv1[] = {
> +0x04, 0x00, 0x00, 0x00, 0x90, 0xb3, 0x1b, 0x00, 0x0d, 0xf0, 0x05, 0xb1
> +};
> +
> +static const uint8_t gic_id_gicv2[] = {
> +0x04, 0x00, 0x00, 0x00, 0x90, 0xb4, 0x2b, 0x00, 0x0d, 0xf0, 0x05, 0xb1
>  };
>
>  static inline int gic_get_current_cpu(GICState *s)
> @@ -689,7 +697,22 @@ static uint32_t gic_dist_readb(void *opaque, hwaddr offset, MemTxAttrs attrs)
>  if (offset & 3) {
>  res = 0;
>  } else {
> -res = gic_id[(offset - 0xfe0) >> 2];
> +switch (s->revision) {
> +case REV_11MPCORE:
> +res = gic_id_11mpcore[(offset - 0xfe0) >> 2];
> +break;
> +case 1:
> +res = gic_id_gicv1[(offset - 0xfe0) >> 2];
> +break;
> +case 2:
> +res = gic_id_gicv2[(offset - 0xfe0) >> 2];
> +break;
> +case REV_NVIC:
> +/* Shouldn't be able to get here */
> +abort();
> +default:
> +res = 0;
> +}
>  }
>  }
>  return res;

You've expanded the arrays to include the fd0...fdc values
(which is right) but the logic also needs to change to
make offset == 0xfd0..0xfdf go through this code path and
also to use the new indexing into the array.
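
Something along these lines, as a rough sketch rather than a drop-in
change: gic_id_read and GIC_ID_BASE are made-up names, the table is the
gicv2 one from the patch, and the real code would still need the
per-revision switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First ID register once the 0xfd0..0xfdc values are in the table. */
#define GIC_ID_BASE 0xfd0u

/* Values copied from the gic_id_gicv2[] table in the patch. */
static const uint8_t gic_id_gicv2[] = {
    0x04, 0x00, 0x00, 0x00, 0x90, 0xb4, 0x2b, 0x00, 0x0d, 0xf0, 0x05, 0xb1
};

/*
 * Byte read of an ID register from a 12-entry table.
 * Offsets outside 0xfd0..0xfff, or not word aligned, read as zero.
 */
static uint32_t gic_id_read(const uint8_t *ids, uint32_t offset)
{
    assert(ids != NULL);
    if (offset < GIC_ID_BASE || offset > 0xfffu || (offset & 3)) {
        return 0;
    }
    return ids[(offset - GIC_ID_BASE) >> 2];
}
```

With the base moved to 0xfd0, offset 0xfe0 lands on index 4 (0x90) and
0xffc on index 11 (0xb1), so the old 0xfe0-based values are unchanged.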

thanks
-- PMM



Re: [Qemu-devel] [PATCH 1/1] nvdimm: disable balloon

2016-01-19 Thread Xiao Guangrong



On 01/18/2016 07:42 PM, Denis V. Lunev wrote:

From: Vladimir Sementsov-Ogievskiy 

NVDIMM is for now planned to be used as a backing store for a DAX filesystem
in the guest, and thus this memory is excluded from guest memory management
and LRUs.

In this case libvirt running QEMU with a configured balloon almost
immediately inflates the balloon and effectively kills the guest, as
qemu counts nvdimm as part of the ram.



It looks good to me.

However, though it is not related to this patch, why not use the 'total memory'
reported by the guest instead? It is more precise, as a) BIOS and other
components will occupy available memory and b) the guest may limit the memory
size it can use...


Counting dimm devices as part of the ram for ballooning was started from
patch
  virtio-balloon: Fix balloon not working correctly when hotplug memory

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Denis V. Lunev 
CC: Stefan Hajnoczi 
CC: Xiao Guangrong 
CC: "Michael S. Tsirkin" 
CC: Igor Mammedov 
CC: Eric Blake 
CC: Markus Armbruster 
---
The patch is submitted to start a discussion. It may be technically correct,
but for us the situation is a bit shady.

  hw/mem/nvdimm.c  | 4 
  hw/mem/pc-dimm.c | 7 ++-
  include/hw/mem/pc-dimm.h | 1 +
  qapi-schema.json | 5 -
  4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 4fd397f..4f4d29a 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -27,9 +27,13 @@
  static void nvdimm_class_init(ObjectClass *oc, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(oc);
+PCDIMMDeviceClass *ddc = PC_DIMM_CLASS(oc);

  /* nvdimm hotplug has not been supported yet. */
  dc->hotpluggable = false;
+
+/* ballooning is not supported */
+ddc->in_ram = false;
  }

  static TypeInfo nvdimm_info = {
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index d5cdab2..e0f869d 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -164,6 +164,7 @@ int qmp_pc_dimm_device_list(Object *obj, void *opaque)
  MemoryDeviceInfo *info = g_new0(MemoryDeviceInfo, 1);
  PCDIMMDeviceInfo *di = g_new0(PCDIMMDeviceInfo, 1);
  DeviceClass *dc = DEVICE_GET_CLASS(obj);
+PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(obj);
  PCDIMMDevice *dimm = PC_DIMM(obj);

  if (dev->id) {
@@ -172,6 +173,7 @@ int qmp_pc_dimm_device_list(Object *obj, void *opaque)
  }
  di->hotplugged = dev->hotplugged;
  di->hotpluggable = dc->hotpluggable;
+di->in_ram = ddc->in_ram;
  di->addr = dimm->addr;
  di->slot = dimm->slot;
  di->node = dimm->node;
@@ -205,7 +207,9 @@ ram_addr_t get_current_ram_size(void)
  if (value) {
  switch (value->type) {
  case MEMORY_DEVICE_INFO_KIND_DIMM:
-size += value->u.dimm->size;
+if (value->u.dimm->in_ram) {
+size += value->u.dimm->size;
+}


Can we use "object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)" to filter out
NVDIMM device?


  break;
  default:
  break;
@@ -444,6 +448,7 @@ static void pc_dimm_class_init(ObjectClass *oc, void *data)
  dc->props = pc_dimm_properties;
  dc->desc = "DIMM memory module";

+ddc->in_ram = true;
  ddc->get_memory_region = pc_dimm_get_memory_region;
  }

diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index d83bf30..3bcb505 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -65,6 +65,7 @@ typedef struct PCDIMMDevice {
  typedef struct PCDIMMDeviceClass {
  /* private */
  DeviceClass parent_class;
+bool in_ram;

  /* public */
  MemoryRegion *(*get_memory_region)(PCDIMMDevice *dimm);
diff --git a/qapi-schema.json b/qapi-schema.json
index b3038b2..613b4d5 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3922,6 +3922,8 @@
  #
  # @hotpluggable: true if device if could be added/removed while machine is 
running
  #
+# @in-ram: true if device if should be counted in current ram size (since 2.6)
+#
  # Since: 2.1
  ##
  { 'struct': 'PCDIMMDeviceInfo',
@@ -3932,7 +3934,8 @@
  'node': 'int',
  'memdev': 'str',
  'hotplugged': 'bool',
-'hotpluggable': 'bool'
+'hotpluggable': 'bool',
+'in-ram': 'bool'


What is it used for?



Re: [Qemu-devel] [PATCH v8 00/35] qapi visitor cleanups (post-introspection cleanups subset E)

2016-01-19 Thread Markus Armbruster
Eric Blake  writes:

> Based on qemu.git master. Pending prerequisites:
> + Not a strong dependency, but for qapi-tests to consistently pass,
> I needed a race fixed:
> https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg01827.html
>
> Also available as a tag at this location:
> git fetch git://repo.or.cz/qemu/ericb.git qapi-cleanupv8e
>
> and will soon be part of my branch with the rest of the v5 series, at:
> http://repo.or.cz/qemu/ericb.git/shortlog/refs/heads/qapi
>
> v8 notes:
> Four new patches (13-16/35), plus rebasing on top of them, so that
> the code base now consistently passes a 'v, name' pair anywhere a
> visitor needs a name, rather than putting other arguments in between
> the pair. I got to have fun with Coccinelle :)  Also fix a bug in my
> changes to visit_next_list() (v7 29/31), so that 'make check' and
> qemu-iotests now pass at all points in the series.
>
> The parameter ordering changes have the potential to be a rebase
> magnet, so I'm hoping this series can go in relatively soon after
> Markus returns from break.
>
> I made good on my threat in v7 of writing a qapi-to-JSON output
> visitor, but that will remain a separate series based on this one
> (the only posting of that series so far now needs rebasing:
> https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg01760.html)
>
> 001/35:[] [--] 'qobject: Document more shortcomings in our number 
> handling'
> 002/35:[] [--] 'qapi: Avoid use of misnamed DO_UPCAST()'
> 003/35:[] [--] 'qapi: Drop dead dealloc visitor variable'
> 004/35:[] [--] 'hmp: Improve use of qapi visitor'
> 005/35:[] [--] 'vl: Improve use of qapi visitor'
> 006/35:[] [--] 'balloon: Improve use of qapi visitor'
> 007/35:[] [--] 'qapi: Improve generated event use of qapi visitor'
> 008/35:[] [--] 'qapi: Track all failures between visit_start/stop'
> 009/35:[] [--] 'qapi: Prefer type_int64 over type_int in visitors'
> 010/35:[] [--] 'qapi: Make all visitors supply uint64 callbacks'
> 011/35:[] [--] 'qapi: Consolidate visitor small integer callbacks'
> 012/35:[] [--] 'qapi: Don't cast Enum* to int*'
> 013/35:[down] 'qom: Use typedef for Visitor'

Applies cleanly until here.

> 014/35:[down] 'qapi: Swap visit_* arguments for consistent 'name' placement'

Doesn't apply.

You can either spin v9 addressing Marc-André's review, or you can rebase
v8 without changes somewhere I can pull, so I can review it properly.

[...]



Re: [Qemu-devel] [PATCH v16 00/14] vfio-pci: pass the aer error to guest

2016-01-19 Thread Chen Fan


On 01/17/2016 02:34 AM, Michael S. Tsirkin wrote:

On Tue, Jan 12, 2016 at 10:43:01AM +0800, Cao jin wrote:

From: Chen Fan 

For now, when QEMU receives an error from the host AER report for a vfio
PCI passthrough device, it simply terminates the guest. Usually, though,
the user wants to know which error occurred without stopping the guest.
This series therefore adds AER capability support for vfio devices,
passes the error to the guest, and lets the guest driver recover
from the error.

I would like to see a version of this patchset that doesn't
depend on pci core changes.
I think that if you make this simplifying assumption:

- all devices on same bus in guest are on same bus in host

then you can handle both reset and hotplug simply in function 0
since it will belong to vfio.

So we can have a version without pci core changes that simply assumes
this, and things will just work.


Now, if we wanted to enforce this limitation, I think the
cleanest way would be to add a callback in struct PCIDevice:

bool is_valid_function(PCIDevice *newfunction)

and call it as each function is added.
This way aer function can validate that each function
added shares the same bus.
And this way issues will be detected directly and not when
function 0 is added.
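
A minimal, self-contained sketch of the callback idea above. It assumes hypothetical stand-in types (`struct pci_func`, `struct pci_slot`) instead of QEMU's real `PCIDevice`; the `host_bus` field and `slot_add_function()` are illustrative names, not existing API:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* Hypothetical stand-in for PCIDevice; only what the sketch needs. */
struct pci_func {
    int host_bus;   /* assumed: bus number of the backing host device */
    /* Optional veto hook, called for every function added later. */
    bool (*is_valid_function)(const struct pci_func *self,
                              const struct pci_func *newfunction);
};

struct pci_slot {
    const struct pci_func *funcs[MAX_FUNCS];
    size_t count;
};

/* Example validator: accept only functions on the same host bus. */
static bool same_host_bus(const struct pci_func *self,
                          const struct pci_func *newfunction)
{
    return self->host_bus == newfunction->host_bus;
}

/* Ask every function already present before accepting a new one. */
static bool slot_add_function(struct pci_slot *slot,
                              const struct pci_func *newfunction)
{
    if (slot->count == MAX_FUNCS) {
        return false;
    }
    for (size_t i = 0; i < slot->count; i++) {
        const struct pci_func *f = slot->funcs[i];
        if (f->is_valid_function &&
            !f->is_valid_function(f, newfunction)) {
            return false;
        }
    }
    slot->funcs[slot->count++] = newfunction;
    return true;
}
```

Because the check runs on every add, a mismatching function is rejected at the moment it is plugged rather than when function 0 shows up. The sketch only lets functions that are already present veto the newcomer.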

I would prefer this validation code to be a patch on top so we can merge
the functionality directly and avoid blocking it while we figure out the
best api to validate things.

I don't see why making guest topology match host would
ever be a problem, but if it's required to support
configurations where these differ, I'd like to see
an attempt to address that be split out, after aer
is supported.

Hi Michael,

   That's a good idea. We should first simplify the implementation of the
AER function, without further affecting the PCI core code.

Thanks,
Chen





v15-v16:
10/14 and 11/14 are new; they introduce a reset sequence ID that records
whether a vfio device has already been reset for a given bus reset. The
other patches are unchanged.

v14-v15:
1. add device hot reset callback
2. add bus_in_reset for vfio devices to avoid doing the host bus reset
   more than once

v13-v14:
1. for multifunction devices, require all functions to enable AER. (9/13)
2. since all affected functions receive the error signal, ignore
   functions on which no error occurred. (12/13)

v12-v13:
1. since multifunction hotplug is supported, add a callback to enable AER.
2. add PCI device pre and post reset for the AER host reset.

Chen Fan (14):
   vfio: extract vfio_get_hot_reset_info as a single function
   vfio: squeeze out vfio_pci_do_hot_reset for support bus reset
   pcie: modify the capability size assert
   vfio: make the 4 bytes aligned for capability size
   vfio: add pcie extanded capability support
   aer: impove pcie_aer_init to support vfio device
   vfio: add aer support for vfio device
   vfio: add check host bus reset is support or not
   add check reset mechanism when hotplug vfio device
   pci: introduce pci bus pre reset
   vfio: introduce last reset sequence id
   pcie_aer: expose pcie_aer_msg() interface
   vfio-pci: pass the aer error to guest
   vfio: add 'aer' property to expose aercap

  hw/pci-bridge/ioh3420.c|   2 +-
  hw/pci-bridge/xio3130_downstream.c |   2 +-
  hw/pci-bridge/xio3130_upstream.c   |   2 +-
  hw/pci/pci.c   |  42 +++
  hw/pci/pci_bridge.c|   3 +
  hw/pci/pcie.c  |   2 +-
  hw/pci/pcie_aer.c  |   6 +-
  hw/vfio/pci.c  | 616 +
  hw/vfio/pci.h  |   9 +
  include/hw/pci/pci.h   |   1 +
  include/hw/pci/pci_bus.h   |   8 +
  include/hw/pci/pcie_aer.h  |   3 +-
  12 files changed, 624 insertions(+), 72 deletions(-)

--
1.9.3












Re: [Qemu-devel] [PATCH v16 13/14] vfio-pci: pass the aer error to guest

2016-01-19 Thread Chen Fan


On 01/18/2016 06:45 PM, Marcel Apfelbaum wrote:

On 01/12/2016 04:43 AM, Cao jin wrote:

From: Chen Fan 

when the vfio device encounters an uncorrectable error in host,
the vfio_pci driver will signal the eventfd registered by this
vfio device, the results in the qemu eventfd handler getting


Maybe "the results in" -> resulting in


invoked.

this patch is to pass the error to guest and have the guest driver
recover from the error.


Maybe "Pass the error to... and let the ... "



Signed-off-by: Chen Fan 
---
  hw/vfio/pci.c | 53 
+++--

  1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index da4815e..efa5e01 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2553,18 +2553,59 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
  static void vfio_err_notifier_handler(void *opaque)
  {
  VFIOPCIDevice *vdev = opaque;
+PCIDevice *dev = &vdev->pdev;
+PCIEAERMsg msg = {
+.severity = 0,
+.source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+};

  if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
  return;
  }

  /*
- * TBD. Retrieve the error details and decide what action
- * needs to be taken. One of the actions could be to pass
- * the error to the guest and have the guest driver recover
- * from the error. This requires that PCIe capabilities be
- * exposed to the guest. For now, we just terminate the
- * guest to contain the error.
+ * in case the real hardware configration has been changed,


configration -> configuration



+ * here we should recheck the bus reset capability.
+ */
+if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+vfio_check_host_bus_reset(vdev)) {
+goto stop;
+}
+/*
+ * we should read the error details from the real hardware
+ * configuration spaces, here we only need to do is signaling
+ * to guest an uncorrectable error has occurred.
+ */
+if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+dev->exp.aer_cap) {


Why do we need dev->exp.aer_cap check here? In patch 7/14 we fail the 
device init

process if this happens, right?


The FEATURE_ENABLE_AER property alone doesn't guarantee that the vfio
device actually has the AER capability, so we should check it here.




+uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+uint32_t uncor_status;
+bool isfatal;
+
+uncor_status = vfio_pci_read_config(dev,
+   dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
+
+/*
+ * if we receive the error signal but not this device, we can


maybe "if the error is not emitted by this device..."


Thank you for your careful review of my poor English descriptions in the
patchset; I will update them in the next version.

Thanks,
Chen




Thanks,
Marcel


+ * just ignore it.
+ */
+if (!(uncor_status & ~0UL)) {
+return;
+}
+
+isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);

+
+msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
+ PCI_ERR_ROOT_CMD_NONFATAL_EN;
+
+pcie_aer_msg(dev, &msg);
+return;
+}
+
+stop:
+/*
+ * If the aer capability is not exposed to the guest. we just
+ * terminate the guest to contain the error.
   */

  error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error 
detected.  "
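
The two computations in the hunk above (building `source_id` and deciding fatal versus non-fatal) can be sketched in isolation. This is a standalone illustration, not QEMU code; the helper names `aer_source_id()` and `aer_error_is_fatal()` are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Requester ID layout: bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t aer_source_id(uint8_t bus_num, uint8_t devfn)
{
    return (uint16_t)(((unsigned)bus_num << 8) | devfn);
}

/*
 * An uncorrectable error is treated as fatal when its status bit is
 * also set in the severity register; otherwise it is non-fatal.
 */
static bool aer_error_is_fatal(uint32_t uncor_status, uint32_t uncor_sever)
{
    return (uncor_status & uncor_sever) != 0;
}
```

For example, a device at bus 1 with devfn 0x10 yields a source ID of 0x0110, and a status bit that is not mirrored in the severity mask is reported as non-fatal.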














Re: [Qemu-devel] [PATCH v16 10/14] pci: introduce pci bus pre reset

2016-01-19 Thread Chen Fan


On 01/15/2016 04:36 AM, Alex Williamson wrote:

On Tue, 2016-01-12 at 10:43 +0800, Cao jin wrote:

From: Chen Fan 

To avoid repeated bus resets, introduce a sequence ID for each bus hot
reset, so that each vfio device can tell whether it has already
been reset for that sequence ID.

Signed-off-by: Chen Fan 
---
  hw/pci/pci.c | 13 +
  hw/pci/pci_bridge.c  |  3 +++
  include/hw/pci/pci.h |  1 +
  include/hw/pci/pci_bus.h |  3 +++
  4 files changed, 20 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index f6ca6ef..ceb72d5 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -91,6 +91,18 @@ static void pci_bus_unrealize(BusState *qbus,
Error **errp)
  vmstate_unregister(NULL, &vmstate_pcibus, bus);
  }
  
+void pci_bus_pre_reset(PCIBus *bus, uint32_t seqid)

+{
+PCIBus *sec;
+
+bus->in_reset = true;
+bus->reset_seqid = seqid;
+
+QLIST_FOREACH(sec, &bus->child, sibling) {
+pci_bus_pre_reset(sec, seqid);
+}
+}
+
  static bool pcibus_is_root(PCIBus *bus)
  {
  return !bus->parent_dev;
@@ -276,6 +288,7 @@ static void pcibus_reset(BusState *qbus)
  for (i = 0; i < bus->nirq; i++) {
  assert(bus->irq_count[i] == 0);
  }
+bus->in_reset = false;
  }
  
  static void pci_host_bus_register(PCIBus *bus, DeviceState *parent)

diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 40c97b1..c7f15a1 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -268,6 +268,9 @@ void pci_bridge_write_config(PCIDevice *d,
  newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
  if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
  /* Trigger hot reset on 0->1 transition. */
+uint32_t seqid = s->sec_bus.reset_seqid++;

Doesn't this need to come from a global sequence ID?  Imagine the case
of a nested bus, the leaf bus is reset incrementing the sequence ID.
The devices on that bus store that sequence ID as they're reset.  The
parent bus is then reset, but all the devices on the leaf bus have
already been reset for that sequence ID and ignore the reset.
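
The nested-bus scenario can be reproduced with a small standalone model. Everything here is a simplifying assumption for illustration (`struct dev_state`, `next_seqid()`, two buses, one device); it is not the patch's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state: the last reset sequence ID it acted on. */
struct dev_state {
    uint32_t last_seqid;
};

/* Returns true if the device actually resets, false if it skips. */
static bool dev_handle_reset(struct dev_state *d, uint32_t seqid)
{
    if (d->last_seqid == seqid) {
        return false;
    }
    d->last_seqid = seqid;
    return true;
}

/* Hand out the next ID from a counter, never returning 0. */
static uint32_t next_seqid(uint32_t *counter)
{
    if (++*counter == 0) {
        *counter = 1;
    }
    return *counter;
}

/*
 * Reset the leaf bus, then its parent, and report whether the device
 * on the leaf bus honoured the second (parent) reset.
 */
static bool parent_reset_reaches_leaf_device(bool global_counter)
{
    uint32_t global = 0, leaf = 0, parent = 0;
    struct dev_state dev = { 0 };

    dev_handle_reset(&dev, next_seqid(global_counter ? &global : &leaf));
    return dev_handle_reset(&dev,
                            next_seqid(global_counter ? &global : &parent));
}
```

With per-bus counters both resets carry ID 1, so the device skips the parent reset; with a single shared counter the second reset carries ID 2 and is honoured.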


+
+pci_bus_pre_reset(&s->sec_bus, seqid ? seqid : 1);

Does this work?  Seems like this would make devices ignore the second
bus reset after the VM is instantiated.  ie.  the first bus reset seqid
is 0, so we call pre_reset with 1, the second time we call it with 1
again.
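
A small model makes the repeated ID visible. It is a sketch under stated assumptions: `struct bus_state` stands in for the secondary bus, `bus_pre_reset()` mirrors only the assignment done by pci_bus_pre_reset(), and `hot_reset_preincrement()` is one possible alternative, not the author's fix:

```c
#include <stdint.h>

/* Hypothetical stand-in for the secondary bus state. */
struct bus_state {
    uint32_t reset_seqid;
};

/* Mirrors pci_bus_pre_reset() storing the ID it was handed. */
static void bus_pre_reset(struct bus_state *bus, uint32_t seqid)
{
    bus->reset_seqid = seqid;
}

/* Mirrors the hunk under review; returns the ID devices would see. */
static uint32_t hot_reset_as_posted(struct bus_state *bus)
{
    uint32_t seqid = bus->reset_seqid++;
    uint32_t passed = seqid ? seqid : 1;

    bus_pre_reset(bus, passed);
    return passed;
}

/* Alternative: compute the next ID first, skipping 0 on wrap-around. */
static uint32_t hot_reset_preincrement(struct bus_state *bus)
{
    uint32_t seqid = bus->reset_seqid + 1;

    if (seqid == 0) {
        seqid = 1;
    }
    bus_pre_reset(bus, seqid);
    return seqid;
}
```

In this model the posted sequence hands out 1 on the first and second reset, as described above, and keeps doing so afterwards because the pre-reset assignment overwrites the incremented counter. The pre-increment variant yields 1, 2, 3.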


  qbus_reset_all(&s->sec_bus.qbus);

I'd be tempted to call qbus_walk_children() directly, it already has a
pre_busfn callback hook.

Hi Alex,

This looks like it would require many changes to the PCI core code. As
Michael suggested in 00/14, maybe we should simplify the AER
implementation. What do you think?

Thanks,
Chen





  }
  }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 379b6e1..b811279 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -381,6 +381,7 @@ void pci_bus_fire_intx_routing_notifier(PCIBus
*bus);
  void pci_device_set_intx_routing_notifier(PCIDevice *dev,
PCIINTxRoutingNotifier
notifier);
  void pci_device_reset(PCIDevice *dev);
+void pci_bus_pre_reset(PCIBus *bus, uint32_t seqid);
  
  PCIDevice *pci_nic_init_nofail(NICInfo *nd, PCIBus *rootbus,

 const char *default_model,
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 7812fa9..dd6aaf1 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -40,6 +40,9 @@ struct PCIBus {
  int nirq;
  int *irq_count;
  
+bool in_reset;

+uint32_t reset_seqid;
+
  NotifierWithReturnList hotplug_notifiers;
  };
  











Re: [Qemu-devel] [PATCH] hw/misc: slavepci_passthru driver

2016-01-19 Thread Francesco Zuliani


Hi Alex,


On 01/18/2016 05:41 PM, Alex Williamson wrote:

On Mon, 2016-01-18 at 10:16 -0500, Marc-André Lureau wrote:

Hi

- Original Message -

Hi there,

I'd like to submit this new pci driver (hw/misc) for inclusion,
if you think it could be useful to others as well as ourselves.

The driver "worked for our needs" BUT we haven't done extensive
testing and this is our first attempt to submit a patch so I kindly
ask for extra-forgiveness .

The "slavepci_passthru" driver is useful in the scenario described
below to implement a simplified passthru when the host CPU does not
support IOMMU and one is interested only in pci target-mode (slave
devices).

Let's CC Alex, who worked on the most recent framework for something related to 
that (VFIO).


Embedded system CPUs (e.g. Atom, AMD G-Series) often lack the VT-d
extensions (IOMMU) needed to be able to pass-thru pci peripherals to
the guest machine (i.e. the pci pass-thru feature cannot be used).

If one is only interested in using the pci board as a pci-target
(slave device), this driver mmap(s) the host-pci-bars into the guest
within a virtual pci-device.

What exactly do you mean by pci-target/slave device?  Does this mean
that the device is not DMA capable, ie. cannot enable BusMaster?


Yes, exactly. Our approach  can be used ONLY if one is NOT interested in 
DMA-Capability (i.e. it is not possible to enable BusMaster)

This is useful in our case for debugging, via the qemu gdbserver facility
(i.e. the '-s' option in qemu), a system running a barebone executable.

Currently the driver assumes the custom pci card has four 32-bit bars
to be mapped (in current patch this is mandatory)

HowTo:
To use the new driver one shall:
- define two environment variables for assigning proper VID and DID to
   associate to the guest pci card
- give the host pci bar address to map in the guest.

Example Usage:

Let us suppose that we have in the host a slave pci device with the
following 4 bars (i.e. output of lspci -v -s YOUR-CARD | grep Memory)
   Memory at db80 (32-bit, non-prefetchable) [size=4K]
   Memory at db90 (32-bit, non-prefetchable) [size=8K]
   Memory at dba0 (32-bit, non-prefetchable) [size=4K]
   Memory at dbb0 (32-bit, non-prefetchable) [size=4K]

We can map these bars in a guest-pci with VID=0xe33e DID=0x000a using

SLAVEPASSTHRU_VID="0xe33e" SLAVEPASSTHRU_DID="0xa" qemu-system-x86_64 \
   YOUR-SET-OF-FLAGS \
   -device slavepassthru,size1=4096,baseaddr1=0xdb90,size2=8192,baseaddr2=0xdba0,size3=4096,baseaddr3=0xdbd0,size4=4096,baseaddr4=0xdbe0

Please note that if your device has less than four bars you can give
the same size and baseaddress to the unused bars.

Those are some pretty serious usage restrictions and using /dev/mem is
really not practical.  The resource files in pci-sysfs would even be a
better option.

Ours was a quick hack to fulfill our needs; the approach via sysfs is
of course the right one, and we would implement it if this patch is of
interest.
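
For reference, a hedged sketch of what the sysfs route could look like on a Linux host: each BAR is exposed as a `resourceN` file under the device's sysfs directory and can be mapped with mmap(). The helper names `pci_resource_path()` and `map_bar()` are hypothetical, error handling is minimal, and suitable privileges are assumed:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Build the sysfs path of BAR `bar` for device domain:bus:dev.fn. */
static int pci_resource_path(char *buf, size_t len, unsigned domain,
                             unsigned bus, unsigned dev, unsigned fn,
                             unsigned bar)
{
    int n = snprintf(buf, len,
                     "/sys/bus/pci/devices/%04x:%02x:%02x.%x/resource%u",
                     domain, bus, dev, fn, bar);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Map `size` bytes of a BAR resource file; returns NULL on failure. */
static void *map_bar(const char *path, size_t size)
{
    void *p;
    int fd = open(path, O_RDWR | O_SYNC);

    if (fd < 0) {
        return NULL;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

Compared with /dev/mem, this ties each mapping to one BAR of one device instead of to a raw physical address given on the command line.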



I didn't see how IO and MMIO BARs get enabled on the
physical device or whether you support any kind of interrupt scheme.

In our case the IO space is not used.
The MMIO space is already enabled.

Our custom board does not have any interrupt and our quick hack
did not implement it.


I had never really intended QEMU use of this, but you might want to
consider vfio no-iommu mode:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/vfio/vfio.c?id=03a76b60f8ba27974e2d252bc555d2c103420e15

Using this taints the kernel, but maybe that's nothing you mind if
you're already letting QEMU access /dev/mem.  The QEMU vfio-pci driver
would need to be modified to use the new device and of course it
wouldn't have IOMMU translation capabilities.  That means that the
BusMaster bit should be protected and MSI/X capabilities should be hidden
from the VM.  It seems more flexible and featureful than what you have
here.  Thanks,


I was not aware of this interesting patch, I will study it to see if
it fits our use case.

Just for information: by "taint" you mean that "security" is broken, not
that there are licensing issues, am I right?

Thanks a lot for your time

Francesco Zuliani


Alex





Re: [Qemu-devel] [PATCH 2/2] migration/virtio: Remove simple .get/.put use

2016-01-19 Thread Dr. David Alan Gilbert
* Sascha Silbe (si...@linux.vnet.ibm.com) wrote:
> Dear David,
> 
> "Dr. David Alan Gilbert"  writes:
> 
> > +/* a variable length array (i.e. _type *_field) but we know the
> > + * length
> > + */
> > +#define VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(_field, _state, _num, 
> > _version, _vmsd, _type) { \
> [...]
> 
> Thinking about it some more, wouldn't VMSTATE_STRUCT_ARRAY_POINTER be a
> better name? Like with VMSTATE_ARRAY, the size of the array is known at
> compile-time. It's just that you need to dereference it first, hence
> ..._POINTER. There's nothing variable about it at all.

It's all a bit confusing; but the only pattern I'd figured out was that the
things after the 'VARRAY_' part tended to be talking about the length of
the array rather than the contents.

> But keep in mind I don't understand the current naming scheme in the
> first place, e.g. VMSTATE_ARRAY_INT32_UNSAFE vs. VMSTATE_VARRAY_INT32,
> with both of them specifying VMS_VARRAY_INT32...

No, I don't really either; one for Juan or Amit to suggest if they
prefer one or the other.

Dave

> 
> Sascha
> -- 
> Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
> https://se-silbe.de/
> USt-IdNr. DE281696641
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PATCH v3 01/10] qom: add helpers for UserCreatable object types

2016-01-19 Thread Daniel P. Berrange
The QMP monitor code has two helper methods object_add
and qmp_object_del that are called from several places
in the code (QMP, HMP and main emulator startup).

The HMP and main emulator startup code also share
further logic that extracts the qom-type & id
values from a qdict.

We soon need to use this logic from qemu-img, qemu-io
and qemu-nbd too, but don't want those to depend on
the monitor, nor do we want to duplicate the code.

To avoid this, move some code out of qmp.c and hmp.c
adding 3 new methods to qom/object_interfaces.c

 - user_creatable_add - takes a QDict holding a full
   object definition & instantiates it
 - user_creatable_add_type - takes an ID, type name,
   and QDict holding object properties & instantiates
   it
 - user_creatable_del - takes an ID and deletes the
   corresponding object

The existing code is updated to use these new methods.
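
As a rough illustration of the split that user_creatable_add performs on a full object definition (pull out the type and ID, treat everything else as properties), here is a standalone sketch. It uses a plain key/value array instead of a QDict, and `struct kv`, `struct object_def` and `split_object_def()` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

struct kv {
    const char *key;
    const char *value;
};

struct object_def {
    const char *type;   /* value of "qom-type" */
    const char *id;     /* value of "id" */
    size_t nprops;      /* remaining entries, compacted to the front */
};

/*
 * Extract "qom-type" and "id" and leave only the object properties in
 * entries[0..nprops). Returns 0 on success, -1 if either is missing.
 */
static int split_object_def(struct kv *entries, size_t n,
                            struct object_def *out)
{
    size_t kept = 0;

    out->type = NULL;
    out->id = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(entries[i].key, "qom-type") == 0) {
            out->type = entries[i].value;
        } else if (strcmp(entries[i].key, "id") == 0) {
            out->id = entries[i].value;
        } else {
            entries[kept++] = entries[i];
        }
    }
    out->nprops = kept;
    return (out->type && out->id) ? 0 : -1;
}
```

A definition such as qom-type=secret, id=sec0, file=mypasswd.txt would split into type "secret", ID "sec0" and a single remaining property.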

Signed-off-by: Daniel P. Berrange 
---
 hmp.c   |  52 ---
 include/monitor/monitor.h   |   3 -
 include/qom/object_interfaces.h |  48 ++
 qmp.c   |  76 ++
 qom/object_interfaces.c | 139 
 vl.c|  48 --
 6 files changed, 216 insertions(+), 150 deletions(-)

diff --git a/hmp.c b/hmp.c
index 54f2620..95930b0 100644
--- a/hmp.c
+++ b/hmp.c
@@ -29,6 +29,7 @@
 #include "qapi/string-output-visitor.h"
 #include "qapi/util.h"
 #include "qapi-visit.h"
+#include "qom/object_interfaces.h"
 #include "ui/console.h"
 #include "block/qapi.h"
 #include "qemu-io.h"
@@ -1652,58 +1653,27 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict)
 void hmp_object_add(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
-Error *err_end = NULL;
 QemuOpts *opts;
-char *type = NULL;
-char *id = NULL;
-void *dummy = NULL;
 OptsVisitor *ov;
-QDict *pdict;
+Object *obj = NULL;
 
 opts = qemu_opts_from_qdict(qemu_find_opts("object"), qdict, &err);
 if (err) {
-goto out;
+hmp_handle_error(mon, &err);
+return;
 }
 
 ov = opts_visitor_new(opts);
-pdict = qdict_clone_shallow(qdict);
-
-visit_start_struct(opts_get_visitor(ov), &dummy, NULL, NULL, 0, &err);
-if (err) {
-goto out_clean;
-}
-
-qdict_del(pdict, "qom-type");
-visit_type_str(opts_get_visitor(ov), &type, "qom-type", &err);
-if (err) {
-goto out_end;
-}
+obj = user_creatable_add(qdict, opts_get_visitor(ov), &err);
+opts_visitor_cleanup(ov);
+qemu_opts_del(opts);
 
-qdict_del(pdict, "id");
-visit_type_str(opts_get_visitor(ov), &id, "id", &err);
 if (err) {
-goto out_end;
+hmp_handle_error(mon, &err);
 }
-
-object_add(type, id, pdict, opts_get_visitor(ov), &err);
-
-out_end:
-visit_end_struct(opts_get_visitor(ov), &err_end);
-if (!err && err_end) {
-qmp_object_del(id, NULL);
+if (obj) {
+object_unref(obj);
 }
-error_propagate(&err, err_end);
-out_clean:
-opts_visitor_cleanup(ov);
-
-QDECREF(pdict);
-qemu_opts_del(opts);
-g_free(id);
-g_free(type);
-g_free(dummy);
-
-out:
-hmp_handle_error(mon, &err);
 }
 
 void hmp_getfd(Monitor *mon, const QDict *qdict)
@@ -1933,7 +1903,7 @@ void hmp_object_del(Monitor *mon, const QDict *qdict)
 const char *id = qdict_get_str(qdict, "id");
 Error *err = NULL;
 
-qmp_object_del(id, &err);
+user_creatable_del(id, &err);
 hmp_handle_error(mon, &err);
 }
 
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 91b95ae..aa0f373 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -43,9 +43,6 @@ void monitor_read_command(Monitor *mon, int show_prompt);
 int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
   void *opaque);
 
-void object_add(const char *type, const char *id, const QDict *qdict,
-Visitor *v, Error **errp);
-
 AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
 bool has_opaque, const char *opaque,
 Error **errp);
diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 283ae0d..7bbaf2f 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -2,6 +2,8 @@
 #define OBJECT_INTERFACES_H
 
 #include "qom/object.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/visitor.h"
 
 #define TYPE_USER_CREATABLE "user-creatable"
 
@@ -72,4 +74,50 @@ void user_creatable_complete(Object *obj, Error **errp);
  * from implements USER_CREATABLE interface.
  */
 bool user_creatable_can_be_deleted(UserCreatable *uc, Error **errp);
+
+/**
+ * user_creatable_add:
+ * @qdict: the object definition
+ * @v: the visitor
+ * @errp: if an error occurs, a pointer to an area to store the error
+ *
+ * Create an instance of the user creatable object whose type,
+ * is defined in @qdict by the 

[Qemu-devel] [PATCH v3 00/10] Make qemu-img/qemu-nbd/qemu-io CLI more flexible

2016-01-19 Thread Daniel P. Berrange
This series of patches expands the syntax of the qemu-img,
qemu-nbd and qemu-io commands to make them more flexible.

  v0: http://lists.gnu.org/archive/html/qemu-devel/2015-10/msg04365.html
  v1: https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg04014.html
  v2: https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg04354.html

First all three gain a --object parameter, which allows
instantiation of user creatable object types. The immediate
use case is to allow for creation of the 'secret' object
type to pass passwords for curl, iscsi and rbd drivers.
For qemu-nbd this will also be needed to create TLS
certificates for encryption support.

Then all three gain a '--image-opts' parameter which causes
the positional filenames to be interpreted as option strings
rather than plain filenames. This avoids the need to use the
JSON syntax, or to add custom CLI args for each block backend
option that exists. The immediate use case is to allow the
user to specify the ID of the 'secret' object they just created.
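
To make "option string" concrete, here is a deliberately simplified sketch that splits a comma-separated list of key=value pairs in place. It is not QEMU's option parser: it ignores escaping of commas, requires every entry to contain '=', and `struct opt` and `parse_opts()` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

struct opt {
    char *key;
    char *value;
};

/*
 * Split "k1=v1,k2=v2" in place. Returns the number of options stored
 * in out[], or -1 if an entry has no key, no '=', or max is exceeded.
 */
static int parse_opts(char *buf, struct opt *out, size_t max)
{
    size_t n = 0;
    char *p = buf;

    while (p && *p) {
        char *comma = strchr(p, ',');
        char *eq;

        if (comma) {
            *comma = '\0';
        }
        eq = strchr(p, '=');
        if (!eq || eq == p || n == max) {
            return -1;
        }
        *eq = '\0';
        out[n].key = p;
        out[n].value = eq + 1;
        n++;
        p = comma ? comma + 1 : NULL;
    }
    return (int)n;
}
```

Given an illustrative string such as "driver=qcow2,file.filename=disk.img", the sketch produces two pairs; the option names are only examples.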

Finally, there are a few small cleanup patches

The first 4 patches in this series are a pre-requisite for
3 other series

 - Support for TLS in NBD
 - Support for secrets for passwd auth in curl, rbd, iscsi
 - Support for LUKS encryption passwords

Hopefully the --object patches are fairly uncontroversial
and can be merged soon. The latter patches for --image-opts
are very nice to have, but not a hard blocker right now
since the 'json:{}' syntax can be used until they are
merged.

Changed in v3:

 - Rebase to resolve conflicts with recently
   merged code
 - Remove use of errx()

Changed in v2:

 - Share more common code in qom/object_interfaces.c to
   avoid duplicating so much of 'object_create' in each
   command
 - Remove previously added '--source optstring' parameter
   which replaced the positional filenames, in favour of
   keeping the positional filenames but using a --image-opts
   boolean arg to change their interpretation
 - Added docs for --image-opts to qemu-img man page
 - Use printf instead of echo -n in examples
 - Line wrap help string based on user terminal width not
   source code width
 - Update qemu-nbd/qemu-io to use constants for options
 - Update qemu-nbd to avoid overlapping option values

Daniel P. Berrange (10):
  qom: add helpers for UserCreatable object types
  qemu-img: add support for --object command line arg
  qemu-nbd: add support for --object command line arg
  qemu-io: add support for --object command line arg
  qemu-io: allow specifying image as a set of options args
  qemu-nbd: allow specifying image as a set of options args
  qemu-img: allow specifying image as a set of options args
  qemu-nbd: don't overlap long option values with short options
  qemu-nbd: use no_argument/required_argument constants
  qemu-io: use no_argument/required_argument constants

 hmp.c   |  52 +---
 include/monitor/monitor.h   |   3 -
 include/qom/object_interfaces.h |  48 
 qemu-img-cmds.hx|  44 ++--
 qemu-img.c  | 570 +---
 qemu-img.texi   |  14 +
 qemu-io.c   | 113 +++-
 qemu-nbd.c  | 149 ---
 qemu-nbd.texi   |   6 +
 qmp.c   |  76 +-
 qom/object_interfaces.c | 139 ++
 vl.c|  48 +---
 12 files changed, 1009 insertions(+), 253 deletions(-)

-- 
2.5.0




[Qemu-devel] [PATCH v3 02/10] qemu-img: add support for --object command line arg

2016-01-19 Thread Daniel P. Berrange
Allow creation of user creatable object types with qemu-img
via a new --object command line arg. This will be used to supply
passwords and/or encryption keys to the various block driver
backends via the recently added 'secret' object type.

 # printf letmein > mypasswd.txt
 # qemu-img info --object secret,id=sec0,file=mypasswd.txt \
  ...other info args...

Signed-off-by: Daniel P. Berrange 
---
 qemu-img-cmds.hx |  44 -
 qemu-img.c   | 266 +--
 qemu-img.texi|   8 ++
 3 files changed, 288 insertions(+), 30 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 9567774..5bb1de7 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -10,68 +10,68 @@ STEXI
 ETEXI
 
 DEF("check", img_check,
-"check [-q] [-f fmt] [--output=ofmt] [-r [leaks | all]] [-T src_cache] 
filename")
+"check [-q] [--object objectdef] [-f fmt] [--output=ofmt] [-r [leaks | 
all]] [-T src_cache] filename")
 STEXI
-@item check [-q] [-f @var{fmt}] [--output=@var{ofmt}] [-r [leaks | all]] [-T 
@var{src_cache}] @var{filename}
+@item check [--object objectdef] [-q] [-f @var{fmt}] [--output=@var{ofmt}] [-r 
[leaks | all]] [-T @var{src_cache}] @var{filename}
 ETEXI
 
 DEF("create", img_create,
-"create [-q] [-f fmt] [-o options] filename [size]")
+"create [-q] [--object objectdef] [-f fmt] [-o options] filename [size]")
 STEXI
-@item create [-q] [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}]
+@item create [--object objectdef] [-q] [-f @var{fmt}] [-o @var{options}] 
@var{filename} [@var{size}]
 ETEXI
 
 DEF("commit", img_commit,
-"commit [-q] [-f fmt] [-t cache] [-b base] [-d] [-p] filename")
+"commit [-q] [--object objectdef] [-f fmt] [-t cache] [-b base] [-d] [-p] 
filename")
 STEXI
-@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-b @var{base}] [-d] [-p] 
@var{filename}
+@item commit [--object objectdef] [-q] [-f @var{fmt}] [-t @var{cache}] [-b 
@var{base}] [-d] [-p] @var{filename}
 ETEXI
 
 DEF("compare", img_compare,
-"compare [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] filename1 
filename2")
+"compare [--object objectdef] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] 
[-s] filename1 filename2")
 STEXI
-@item compare [-f @var{fmt}] [-F @var{fmt}] [-T @var{src_cache}] [-p] [-q] 
[-s] @var{filename1} @var{filename2}
+@item compare [--object objectdef] [-f @var{fmt}] [-F @var{fmt}] [-T 
@var{src_cache}] [-p] [-q] [-s] @var{filename1} @var{filename2}
 ETEXI
 
 DEF("convert", img_convert,
-"convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O 
output_fmt] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S 
sparse_size] filename [filename2 [...]] output_filename")
+"convert [--object objectdef] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T 
src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l 
snapshot_param] [-S sparse_size] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T 
@var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s 
@var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [--object objectdef] [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t 
@var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s 
@var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
-"info [-f fmt] [--output=ofmt] [--backing-chain] filename")
+"info [--object objectdef] [-f fmt] [--output=ofmt] [--backing-chain] 
filename")
 STEXI
-@item info [-f @var{fmt}] [--output=@var{ofmt}] [--backing-chain] 
@var{filename}
+@item info [--object objectdef] [-f @var{fmt}] [--output=@var{ofmt}] 
[--backing-chain] @var{filename}
 ETEXI
 
 DEF("map", img_map,
-"map [-f fmt] [--output=ofmt] filename")
+"map [--object objectdef] [-f fmt] [--output=ofmt] filename")
 STEXI
-@item map [-f @var{fmt}] [--output=@var{ofmt}] @var{filename}
+@item map [--object objectdef] [-f @var{fmt}] [--output=@var{ofmt}] 
@var{filename}
 ETEXI
 
 DEF("snapshot", img_snapshot,
-"snapshot [-q] [-l | -a snapshot | -c snapshot | -d snapshot] filename")
+"snapshot [--object objectdef] [-q] [-l | -a snapshot | -c snapshot | -d 
snapshot] filename")
 STEXI
-@item snapshot [-q] [-l | -a @var{snapshot} | -c @var{snapshot} | -d 
@var{snapshot}] @var{filename}
+@item snapshot [--object objectdef] [-q] [-l | -a @var{snapshot} | -c 
@var{snapshot} | -d @var{snapshot}] @var{filename}
 ETEXI
 
 DEF("rebase", img_rebase,
-"rebase [-q] [-f fmt] [-t cache] [-T src_cache] [-p] [-u] -b backing_file 
[-F backing_fmt] filename")
+"rebase [--object objectdef] [-q] [-f fmt] [-t cache] [-T src_cache] [-p] 
[-u] -b backing_file [-F backing_fmt] filename")
 STEXI
-@item rebase [-q] [-f @var{fmt}] 

Re: [Qemu-devel] [RFC PATCH v2 02/10] pci: Introduce function for PCI PM capability creation

2016-01-19 Thread Marcel Apfelbaum

On 01/18/2016 07:35 PM, Leonid Bloch wrote:

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
  hw/pci/pci.c  | 21 +
  include/hw/pci/pci.h  |  2 ++
  include/hw/pci/pci_regs.h |  1 +
  3 files changed, 24 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index b3d5100..3aaf86c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2050,6 +2050,27 @@ static void pci_del_option_rom(PCIDevice *pdev)
  pdev->has_rom = false;
  }

+int pci_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc)
+{
+int ret = pci_add_capability(pdev, PCI_CAP_ID_PM, offset, PCI_PM_SIZEOF);
+
+if (ret >= 0) {
+pci_set_word(pdev->config + offset + PCI_PM_PMC,
+ PCI_PM_CAP_VER_1_1 |


Hi,

Why not ver 1.2 ? just wondering


+ pmc);
+
+pci_set_word(pdev->wmask + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK |
+ PCI_PM_CTRL_PME_ENABLE |


PME_ENABLE and PME_STATUS are writable only if the function supports PME# 
generation from D3cold



+ PCI_PM_CTRL_DATA_SEL_MASK


And Data_Select is writable only if the data register is implemented.

My point is this seems to be a standard capability function, but it depends
on optional function features.

Thanks,
Marcel

);

+
+pci_set_word(pdev->w1cmask + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_PME_STATUS);
+}
+
+return ret;
+}
+
  /*
   * if !offset
   * Reserve space and add capability to the linked list in pci config space
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index b97c295..cec7234 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -319,6 +319,8 @@ int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
 uint8_t offset, uint8_t size,
 Error **errp);

+int pci_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc);
+
  void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);

  uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
index 56a404b..2bd3ac9 100644
--- a/include/hw/pci/pci_regs.h
+++ b/include/hw/pci/pci_regs.h
@@ -221,6 +221,7 @@

  #define PCI_PM_PMC2   /* PM Capabilities Register */
  #define  PCI_PM_CAP_VER_MASK  0x0007  /* Version */
+#define  PCI_PM_CAP_VER_1_1 0x0002  /* PCI PM spec ver. 1.1 */
  #define  PCI_PM_CAP_PME_CLOCK 0x0008  /* PME clock required */
  #define  PCI_PM_CAP_RESERVED0x0010  /* Reserved field */
  #define  PCI_PM_CAP_DSI   0x0020  /* Device specific 
initialization */
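
One way to address the review point about optional features is to derive the writable masks from what the function advertises. The sketch below follows the reviewer's reading (PME bits depend on D3cold PME# support, Data_Select on an implemented data register). The `PM_*` macros and `pm_ctrl_masks()` are local, hypothetical names; the bit values are assumed to match the standard PMC/PMCSR layout:

```c
#include <stdbool.h>
#include <stdint.h>

#define PM_CAP_PME_D3COLD     0x8000  /* PMC: PME# from D3cold */
#define PM_CTRL_STATE_MASK    0x0003  /* PMCSR: power state */
#define PM_CTRL_PME_ENABLE    0x0100  /* PMCSR: PME_En */
#define PM_CTRL_DATA_SEL_MASK 0x1e00  /* PMCSR: Data_Select */
#define PM_CTRL_PME_STATUS    0x8000  /* PMCSR: PME_Status */

struct pm_masks {
    uint16_t wmask;     /* plain writable bits */
    uint16_t w1cmask;   /* write-1-to-clear bits */
};

/* Compute PMCSR masks from the PMC value and data register presence. */
static struct pm_masks pm_ctrl_masks(uint16_t pmc, bool has_data_reg)
{
    struct pm_masks m = { PM_CTRL_STATE_MASK, 0 };

    if (pmc & PM_CAP_PME_D3COLD) {
        m.wmask |= PM_CTRL_PME_ENABLE;
        m.w1cmask |= PM_CTRL_PME_STATUS;
    }
    if (has_data_reg) {
        m.wmask |= PM_CTRL_DATA_SEL_MASK;
    }
    return m;
}
```

A caller with neither feature would get only the power state bits as writable, which keeps the helper generic without forcing optional fields on every device.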






[Qemu-devel] [PATCH v3 04/10] qemu-io: add support for --object command line arg

2016-01-19 Thread Daniel P. Berrange
Allow creation of user creatable object types with qemu-io
via a new --object command line arg. This will be used to supply
passwords and/or encryption keys to the various block driver
backends via the recently added 'secret' object type.

 # printf letmein > mypasswd.txt
 # qemu-io --object secret,id=sec0,file=mypasswd.txt \
  ...other args...

Signed-off-by: Daniel P. Berrange 
---
 qemu-io.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/qemu-io.c b/qemu-io.c
index d47228a..884a23e 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -21,6 +21,8 @@
 #include "qemu/config-file.h"
 #include "qemu/readline.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/opts-visitor.h"
+#include "qom/object_interfaces.h"
 #include "sysemu/block-backend.h"
 #include "block/block_int.h"
 #include "trace/control.h"
@@ -203,6 +205,8 @@ static void usage(const char *name)
 "Usage: %s [-h] [-V] [-rsnm] [-f FMT] [-c STRING] ... [file]\n"
 "QEMU Disk exerciser\n"
 "\n"
+"  --object OBJECTDEF   define an object such as 'secret' for\n"
+"   passwords and/or encryption keys\n"
 "  -c, --cmd STRING execute command with its arguments\n"
 "   from the given string\n"
 "  -f, --format FMT specifies the block driver to use\n"
@@ -364,6 +368,38 @@ static void reenable_tty_echo(void)
 qemu_set_tty_echo(STDIN_FILENO, true);
 }
 
+enum {
+OPTION_OBJECT = 256,
+};
+
+static QemuOptsList qemu_object_opts = {
+.name = "object",
+.implied_opt_name = "qom-type",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
+.desc = {
+{ }
+},
+};
+
+static int object_create(void *opaque, QemuOpts *opts, Error **errp)
+{
+Error *err = NULL;
+OptsVisitor *ov;
+QDict *pdict;
+
+ov = opts_visitor_new(opts);
+pdict = qemu_opts_to_qdict(opts, NULL);
+
+user_creatable_add(pdict, opts_get_visitor(ov), &err);
+opts_visitor_cleanup(ov);
+QDECREF(pdict);
+if (err) {
+error_propagate(errp, err);
+return -1;
+}
+return 0;
+}
+
 int main(int argc, char **argv)
 {
 int readonly = 0;
@@ -382,6 +418,7 @@ int main(int argc, char **argv)
 { "discard", 1, NULL, 'd' },
 { "cache", 1, NULL, 't' },
 { "trace", 1, NULL, 'T' },
+{ "object", 1, NULL, OPTION_OBJECT },
 { NULL, 0, NULL, 0 }
 };
 int c;
@@ -389,6 +426,7 @@ int main(int argc, char **argv)
 int flags = BDRV_O_UNMAP;
 Error *local_error = NULL;
 QDict *opts = NULL;
+QemuOpts *qopts = NULL;
 
 #ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
@@ -397,6 +435,8 @@ int main(int argc, char **argv)
 progname = basename(argv[0]);
 qemu_init_exec_dir(argv[0]);
 
+module_call_init(MODULE_INIT_QOM);
+qemu_add_opts(&qemu_object_opts);
 bdrv_init();
 
 while ((c = getopt_long(argc, argv, sopt, lopt, _index)) != -1) {
@@ -448,6 +488,13 @@ int main(int argc, char **argv)
 case 'h':
 usage(progname);
 exit(0);
+case OPTION_OBJECT:
+qopts = qemu_opts_parse_noisily(qemu_find_opts("object"),
+optarg, true);
+if (!qopts) {
+exit(1);
+}
+break;
 default:
 usage(progname);
 exit(1);
@@ -464,6 +511,12 @@ int main(int argc, char **argv)
 exit(1);
 }
 
+if (qemu_opts_foreach(qemu_find_opts("object"),
+  object_create,
+  NULL, NULL)) {
+exit(1);
+}
+
 /* initialize commands */
 qemuio_add_command(_cmd);
 qemuio_add_command(_cmd);
-- 
2.5.0




[Qemu-devel] [PATCH v3 07/10] qemu-img: allow specifying image as a set of options args

2016-01-19 Thread Daniel P. Berrange
Currently qemu-img allows an image filename to be passed on the
command line, but unless using the JSON format, it does not have
a way to set any options except the format eg

   qemu-img info https://127.0.0.1/images/centos7.iso

This adds a --image-opts arg that indicates that the positional
filename should be interpreted as a full option string, not
just a filename.

   qemu-img info --image-opts driver=http,url=https://127.0.0.1/images,sslverify=off

This flag is mutually exclusive with the '-f' / '-F' flags.

Signed-off-by: Daniel P. Berrange 
---
 qemu-img-cmds.hx |  44 
 qemu-img.c   | 304 +--
 qemu-img.texi|   6 ++
 3 files changed, 303 insertions(+), 51 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 5bb1de7..ee5c770 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -10,68 +10,68 @@ STEXI
 ETEXI
 
 DEF("check", img_check,
-"check [-q] [--object objectdef] [-f fmt] [--output=ofmt] [-r [leaks | 
all]] [-T src_cache] filename")
+"check [-q] [--object objectdef] [--image-opts] [-f fmt] [--output=ofmt] 
[-r [leaks | all]] [-T src_cache] filename")
 STEXI
-@item check [--object objectdef] [-q] [-f @var{fmt}] [--output=@var{ofmt}] [-r 
[leaks | all]] [-T @var{src_cache}] @var{filename}
+@item check [--object objectdef] [--image-opts] [-q] [-f @var{fmt}] 
[--output=@var{ofmt}] [-r [leaks | all]] [-T @var{src_cache}] @var{filename}
 ETEXI
 
 DEF("create", img_create,
-"create [-q] [--object objectdef] [-f fmt] [-o options] filename [size]")
+"create [-q] [--object objectdef] [--image-opts] [-f fmt] [-o options] 
filename [size]")
 STEXI
-@item create [--object objectdef] [-q] [-f @var{fmt}] [-o @var{options}] 
@var{filename} [@var{size}]
+@item create [--object objectdef] [--image-opts] [-q] [-f @var{fmt}] [-o 
@var{options}] @var{filename} [@var{size}]
 ETEXI
 
 DEF("commit", img_commit,
-"commit [-q] [--object objectdef] [-f fmt] [-t cache] [-b base] [-d] [-p] 
filename")
+"commit [-q] [--object objectdef] [--image-opts] [-f fmt] [-t cache] [-b 
base] [-d] [-p] filename")
 STEXI
-@item commit [--object objectdef] [-q] [-f @var{fmt}] [-t @var{cache}] [-b 
@var{base}] [-d] [-p] @var{filename}
+@item commit [--object objectdef] [--image-opts] [-q] [-f @var{fmt}] [-t 
@var{cache}] [-b @var{base}] [-d] [-p] @var{filename}
 ETEXI
 
 DEF("compare", img_compare,
-"compare [--object objectdef] [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] 
[-s] filename1 filename2")
+"compare [--object objectdef] [--image-opts] [-f fmt] [-F fmt] [-T 
src_cache] [-p] [-q] [-s] filename1 filename2")
 STEXI
-@item compare [--object objectdef] [-f @var{fmt}] [-F @var{fmt}] [-T 
@var{src_cache}] [-p] [-q] [-s] @var{filename1} @var{filename2}
+@item compare [--object objectdef] [--image-opts] [-f @var{fmt}] [-F 
@var{fmt}] [-T @var{src_cache}] [-p] [-q] [-s] @var{filename1} @var{filename2}
 ETEXI
 
 DEF("convert", img_convert,
-"convert [--object objectdef] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T 
src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l 
snapshot_param] [-S sparse_size] filename [filename2 [...]] output_filename")
+"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] 
[-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] 
[-l snapshot_param] [-S sparse_size] filename [filename2 [...]] 
output_filename")
 STEXI
-@item convert [--object objectdef] [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t 
@var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s 
@var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f 
@var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o 
@var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S 
@var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
-"info [--object objectdef] [-f fmt] [--output=ofmt] [--backing-chain] 
filename")
+"info [--object objectdef] [--image-opts] [-f fmt] [--output=ofmt] 
[--backing-chain] filename")
 STEXI
-@item info [--object objectdef] [-f @var{fmt}] [--output=@var{ofmt}] 
[--backing-chain] @var{filename}
+@item info [--object objectdef] [--image-opts] [-f @var{fmt}] 
[--output=@var{ofmt}] [--backing-chain] @var{filename}
 ETEXI
 
 DEF("map", img_map,
-"map [--object objectdef] [-f fmt] [--output=ofmt] filename")
+"map [--object objectdef] [--image-opts] [-f fmt] [--output=ofmt] 
filename")
 STEXI
-@item map [--object objectdef] [-f @var{fmt}] [--output=@var{ofmt}] 
@var{filename}
+@item map [--object objectdef] [--image-opts] [-f @var{fmt}] 
[--output=@var{ofmt}] @var{filename}
 ETEXI
 
 DEF("snapshot", img_snapshot,
-"snapshot [--object objectdef] [-q] [-l | -a snapshot | -c 

[Qemu-devel] [PATCH v3 06/10] qemu-nbd: allow specifying image as a set of options args

2016-01-19 Thread Daniel P. Berrange
Currently qemu-nbd allows an image filename to be passed on the
command line, but unless using the JSON format, it does not have
a way to set any options except the format eg

   qemu-nbd https://127.0.0.1/images/centos7.iso
   qemu-nbd /home/berrange/demo.qcow2

This adds a --image-opts arg that indicates that the positional
filename should be interpreted as a full option string, not
just a filename.

   qemu-nbd --image-opts driver=http,url=https://127.0.0.1/images,sslverify=off
   qemu-nbd --image-opts file=/home/berrange/demo.qcow2

This flag is mutually exclusive with the '-f' flag.
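
As a rough illustration of what "a full option string" means here, the sketch below splits a comma-separated key=value string in plain C. It is not QEMU's QemuOpts parser: the names kv_pair and split_image_opts are invented for this example, and it ignores things the real parser handles, such as an implied option name for a bare value and escaping of literal commas.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type for this sketch; not part of QEMU. */
struct kv_pair {
    char key[32];
    char value[128];
};

/*
 * Split "k1=v1,k2=v2" into at most max pairs.
 * Returns the number of pairs stored, or -1 if an item has no '=',
 * has an empty key, does not fit the buffers, or exceeds max.
 */
static int split_image_opts(const char *s, struct kv_pair *out, int max)
{
    int n = 0;

    while (*s != '\0') {
        const char *end = strchr(s, ',');
        size_t item_len = end ? (size_t)(end - s) : strlen(s);
        const char *eq = memchr(s, '=', item_len);
        size_t klen, vlen;

        if (eq == NULL || n == max) {
            return -1;
        }
        klen = (size_t)(eq - s);
        vlen = item_len - klen - 1;
        if (klen == 0 || klen >= sizeof(out[n].key) ||
            vlen >= sizeof(out[n].value)) {
            return -1;
        }
        memcpy(out[n].key, s, klen);
        out[n].key[klen] = '\0';
        memcpy(out[n].value, eq + 1, vlen);
        out[n].value[vlen] = '\0';
        n++;

        /* Step over this item and the comma that follows it, if any. */
        s += item_len;
        if (*s == ',') {
            s++;
        }
    }
    return n;
}
```

Only the first '=' in each item separates key from value, so a URL value that itself contains '=' later on would still be kept whole.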

Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c | 45 -
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 941c4c8..db610f9 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -48,6 +48,7 @@
 #define QEMU_NBD_OPT_DISCARD   3
 #define QEMU_NBD_OPT_DETECT_ZEROES 4
 #define QEMU_NBD_OPT_OBJECT 5
+#define QEMU_NBD_OPT_IMAGE_OPTS 6
 
 static NBDExport *exp;
 static int verbose;
@@ -381,6 +382,16 @@ static SocketAddress *nbd_build_socket_address(const char 
*sockpath,
 }
 
 
+static QemuOptsList file_opts = {
+.name = "file",
+.implied_opt_name = "file",
+.head = QTAILQ_HEAD_INITIALIZER(file_opts.head),
+.desc = {
+/* no elements => accept any params */
+{ /* end of list */ }
+},
+};
+
 static QemuOptsList qemu_object_opts = {
 .name = "object",
 .implied_opt_name = "qom-type",
@@ -448,6 +459,7 @@ int main(int argc, char **argv)
 { "persistent", 0, NULL, 't' },
 { "verbose", 0, NULL, 'v' },
 { "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
+{ "image-opts", 0, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -466,6 +478,7 @@ int main(int argc, char **argv)
 BlockdevDetectZeroesOptions detect_zeroes = 
BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
 QDict *options = NULL;
 QemuOpts *opts;
+bool imageOpts = false;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -635,6 +648,9 @@ int main(int argc, char **argv)
 exit(1);
 }
 break;
+case QEMU_NBD_OPT_IMAGE_OPTS:
+imageOpts = true;
+break;
 case '?':
 error_report("Try `%s --help' for more information.", argv[0]);
 exit(EXIT_FAILURE);
@@ -743,13 +759,32 @@ int main(int argc, char **argv)
 bdrv_init();
 atexit(bdrv_close_all);
 
-if (fmt) {
-options = qdict_new();
-qdict_put(options, "driver", qstring_from_str(fmt));
+srcpath = argv[optind];
+if (imageOpts) {
+char *file = NULL;
+if (fmt) {
+error_report("--image-opts and -f are mutually exclusive");
+exit(EXIT_FAILURE);
+}
+opts = qemu_opts_parse_noisily(&file_opts, srcpath, true);
+if (!opts) {
+qemu_opts_reset(&file_opts);
+exit(EXIT_FAILURE);
+}
+file = g_strdup(qemu_opt_get(opts, "file"));
+qemu_opt_unset(opts, "file");
+options = qemu_opts_to_qdict(opts, NULL);
+qemu_opts_reset(&file_opts);
+blk = blk_new_open("hda", file, NULL, options, flags, &local_err);
+g_free(file);
+} else {
+if (fmt) {
+options = qdict_new();
+qdict_put(options, "driver", qstring_from_str(fmt));
+}
+blk = blk_new_open("hda", srcpath, NULL, options, flags, &local_err);
 }
 
-srcpath = argv[optind];
-blk = blk_new_open("hda", srcpath, NULL, options, flags, &local_err);
 if (!blk) {
 error_reportf_err(local_err, "Failed to blk_new_open '%s': ",
   argv[optind]);
-- 
2.5.0




[Qemu-devel] [PATCH v3 03/10] qemu-nbd: add support for --object command line arg

2016-01-19 Thread Daniel P. Berrange
Allow creation of user creatable object types with qemu-nbd
via a new --object command line arg. This will be used to supply
passwords and/or encryption keys to the various block driver
backends via the recently added 'secret' object type.

 # printf letmein > mypasswd.txt
 # qemu-nbd --object secret,id=sec0,file=mypasswd.txt \
  ...other nbd args...

Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c| 53 +
 qemu-nbd.texi |  6 ++
 2 files changed, 59 insertions(+)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index ede4a54..941c4c8 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -23,9 +23,12 @@
 #include "qemu/main-loop.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
+#include "qemu/config-file.h"
 #include "block/snapshot.h"
 #include "qapi/util.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/opts-visitor.h"
+#include "qom/object_interfaces.h"
 
 #include 
 #include 
@@ -44,6 +47,7 @@
 #define QEMU_NBD_OPT_AIO   2
 #define QEMU_NBD_OPT_DISCARD   3
 #define QEMU_NBD_OPT_DETECT_ZEROES 4
+#define QEMU_NBD_OPT_OBJECT 5
 
 static NBDExport *exp;
 static int verbose;
@@ -77,6 +81,9 @@ static void usage(const char *name)
 "  -o, --offset=OFFSET   offset into the image\n"
 "  -P, --partition=NUM   only expose partition NUM\n"
 "\n"
+"General purpose options:\n"
+"  --object type,id=ID,...   define an object such as 'secret' for providing\n"
+"passwords and/or encryption keys\n"
 #ifdef __linux__
 "Kernel NBD client support:\n"
 "  -c, --connect=DEV connect FILE to the local NBD device DEV\n"
@@ -374,6 +381,35 @@ static SocketAddress *nbd_build_socket_address(const char 
*sockpath,
 }
 
 
+static QemuOptsList qemu_object_opts = {
+.name = "object",
+.implied_opt_name = "qom-type",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
+.desc = {
+{ }
+},
+};
+
+static int object_create(void *opaque, QemuOpts *opts, Error **errp)
+{
+Error *err = NULL;
+OptsVisitor *ov;
+QDict *pdict;
+
+ov = opts_visitor_new(opts);
+pdict = qemu_opts_to_qdict(opts, NULL);
+
+user_creatable_add(pdict, opts_get_visitor(ov), &err);
+opts_visitor_cleanup(ov);
+QDECREF(pdict);
+
+if (err) {
+error_propagate(errp, err);
+return -1;
+}
+return 0;
+}
+
 int main(int argc, char **argv)
 {
 BlockBackend *blk;
@@ -411,6 +447,7 @@ int main(int argc, char **argv)
 { "format", 1, NULL, 'f' },
 { "persistent", 0, NULL, 't' },
 { "verbose", 0, NULL, 'v' },
+{ "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -428,6 +465,7 @@ int main(int argc, char **argv)
 Error *local_err = NULL;
 BlockdevDetectZeroesOptions detect_zeroes = 
BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
 QDict *options = NULL;
+QemuOpts *opts;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -436,6 +474,8 @@ int main(int argc, char **argv)
 memset(&sa_sigterm, 0, sizeof(sa_sigterm));
 sa_sigterm.sa_handler = termsig_handler;
 sigaction(SIGTERM, &sa_sigterm, NULL);
+module_call_init(MODULE_INIT_QOM);
+qemu_add_opts(&qemu_object_opts);
 qemu_init_exec_dir(argv[0]);
 
 while ((ch = getopt_long(argc, argv, sopt, lopt, _ind)) != -1) {
@@ -588,6 +628,13 @@ int main(int argc, char **argv)
 usage(argv[0]);
 exit(0);
 break;
+case QEMU_NBD_OPT_OBJECT:
+opts = qemu_opts_parse_noisily(qemu_find_opts("object"),
+   optarg, true);
+if (!opts) {
+exit(1);
+}
+break;
 case '?':
 error_report("Try `%s --help' for more information.", argv[0]);
 exit(EXIT_FAILURE);
@@ -600,6 +647,12 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 
+if (qemu_opts_foreach(qemu_find_opts("object"),
+  object_create,
+  NULL, NULL)) {
+exit(1);
+}
+
 if (disconnect) {
 fd = open(argv[optind], O_RDWR);
 if (fd < 0) {
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 46fd483..9f9daca 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -14,6 +14,12 @@ Export QEMU disk image using NBD protocol.
 @table @option
 @item @var{filename}
  is a disk image filename
+@item --object type,id=@var{id},...props...
+  define a new instance of the @var{type} object class identified by @var{id}.
+  See the @code{qemu(1)} manual page for full details of the properties
+  supported. The common object type that it makes sense to define is the
+  @code{secret} object, which is used to supply passwords and/or encryption
+  keys.
 @item -p, --port=@var{port}
   port to listen on (default @samp{10809})
 @item -o, 

[Qemu-devel] [PATCH v3 05/10] qemu-io: allow specifying image as a set of options args

2016-01-19 Thread Daniel P. Berrange
Currently qemu-io allows an image filename to be passed on the
command line, but unless using the JSON format, it does not have
a way to set any options except the format eg

 qemu-io https://127.0.0.1/images/centos7.iso
 qemu-io /home/berrange/demo.qcow2

This adds a --image-opts arg that indicates that the positional
filename should be interpreted as a full option string, not
just a filename.

 qemu-io --image-opts driver=http,url=https://127.0.0.1/images,sslverify=off
 qemu-io --image-opts file=/home/berrange/demo.qcow2

This flag is mutually exclusive with the '-f' flag.

Signed-off-by: Daniel P. Berrange 
---
 qemu-io.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/qemu-io.c b/qemu-io.c
index 884a23e..dceaaa9 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -370,6 +370,7 @@ static void reenable_tty_echo(void)
 
 enum {
 OPTION_OBJECT = 256,
+OPTION_IMAGE_OPTS = 257,
 };
 
 static QemuOptsList qemu_object_opts = {
@@ -400,6 +401,16 @@ static int object_create(void *opaque, QemuOpts *opts, 
Error **errp)
 return 0;
 }
 
+static QemuOptsList file_opts = {
+.name = "file",
+.implied_opt_name = "file",
+.head = QTAILQ_HEAD_INITIALIZER(file_opts.head),
+.desc = {
+/* no elements => accept any params */
+{ /* end of list */ }
+},
+};
+
 int main(int argc, char **argv)
 {
 int readonly = 0;
@@ -419,6 +430,7 @@ int main(int argc, char **argv)
 { "cache", 1, NULL, 't' },
 { "trace", 1, NULL, 'T' },
 { "object", 1, NULL, OPTION_OBJECT },
+{ "image-opts", 0, NULL, OPTION_IMAGE_OPTS },
 { NULL, 0, NULL, 0 }
 };
 int c;
@@ -427,6 +439,7 @@ int main(int argc, char **argv)
 Error *local_error = NULL;
 QDict *opts = NULL;
 QemuOpts *qopts = NULL;
+bool imageOpts = false;
 
 #ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
@@ -495,6 +508,9 @@ int main(int argc, char **argv)
 exit(1);
 }
 break;
+case OPTION_IMAGE_OPTS:
+imageOpts = true;
+break;
 default:
 usage(progname);
 exit(1);
@@ -536,7 +552,23 @@ int main(int argc, char **argv)
 flags |= BDRV_O_RDWR;
 }
 
-if ((argc - optind) == 1) {
+if (imageOpts) {
+char *file;
+qopts = qemu_opts_parse_noisily(&file_opts, argv[optind], false);
+if (!qopts) {
+exit(1);
+}
+if (opts) {
+error_report("--image-opts and -f are mutually exclusive");
+exit(1);
+}
+file = g_strdup(qemu_opt_get(qopts, "file"));
+qemu_opt_unset(qopts, "file");
+opts = qemu_opts_to_qdict(qopts, NULL);
+qemu_opts_reset(&file_opts);
+openfile(file, flags, opts);
+g_free(file);
+} else if ((argc - optind) == 1) {
 openfile(argv[optind], flags, opts);
 }
 command_loop();
-- 
2.5.0




[Qemu-devel] [PATCH v3 10/10] qemu-io: use no_argument/required_argument constants

2016-01-19 Thread Daniel P. Berrange
When declaring the 'struct option' array, use the standard
constants no_argument/required_argument, instead of magic
values 0 and 1.
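
A small sketch of the same idea outside QEMU, assuming a libc that provides getopt_long in <getopt.h>. The table example_lopt and the helper count_required are made up for this illustration. On glibc and the BSD libcs the two macros expand to 0 and 1, so the change in this patch affects readability only.

```c
#include <getopt.h>
#include <stddef.h>

/* Example table only; the option names echo two of those in the patch. */
static const struct option example_lopt[] = {
    { "help",   no_argument,       NULL, 'h' },
    { "format", required_argument, NULL, 'f' },
    { NULL, 0, NULL, 0 }
};

/* Count the entries of a NULL-terminated table that require an argument. */
static int count_required(const struct option *tbl)
{
    int n = 0;

    for (; tbl->name != NULL; tbl++) {
        if (tbl->has_arg == required_argument) {
            n++;
        }
    }
    return n;
}
```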

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-io.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index dceaaa9..1c20e9b 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -416,21 +416,21 @@ int main(int argc, char **argv)
 int readonly = 0;
 const char *sopt = "hVc:d:f:rsnmgkt:T:";
 const struct option lopt[] = {
-{ "help", 0, NULL, 'h' },
-{ "version", 0, NULL, 'V' },
-{ "offset", 1, NULL, 'o' },
-{ "cmd", 1, NULL, 'c' },
-{ "format", 1, NULL, 'f' },
-{ "read-only", 0, NULL, 'r' },
-{ "snapshot", 0, NULL, 's' },
-{ "nocache", 0, NULL, 'n' },
-{ "misalign", 0, NULL, 'm' },
-{ "native-aio", 0, NULL, 'k' },
-{ "discard", 1, NULL, 'd' },
-{ "cache", 1, NULL, 't' },
-{ "trace", 1, NULL, 'T' },
-{ "object", 1, NULL, OPTION_OBJECT },
-{ "image-opts", 0, NULL, OPTION_IMAGE_OPTS },
+{ "help", no_argument, NULL, 'h' },
+{ "version", no_argument, NULL, 'V' },
+{ "offset", required_argument, NULL, 'o' },
+{ "cmd", required_argument, NULL, 'c' },
+{ "format", required_argument, NULL, 'f' },
+{ "read-only", no_argument, NULL, 'r' },
+{ "snapshot", no_argument, NULL, 's' },
+{ "nocache", no_argument, NULL, 'n' },
+{ "misalign", no_argument, NULL, 'm' },
+{ "native-aio", no_argument, NULL, 'k' },
+{ "discard", required_argument, NULL, 'd' },
+{ "cache", required_argument, NULL, 't' },
+{ "trace", required_argument, NULL, 'T' },
+{ "object", required_argument, NULL, OPTION_OBJECT },
+{ "image-opts", no_argument, NULL, OPTION_IMAGE_OPTS },
 { NULL, 0, NULL, 0 }
 };
 int c;
-- 
2.5.0




[Qemu-devel] [PATCH v3 08/10] qemu-nbd: don't overlap long option values with short options

2016-01-19 Thread Daniel P. Berrange
When defining values for long options, the normal practice is
to start numbering from 256, to avoid overlap with the range
of valid values for short options.
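
To illustrate the convention, here is a minimal standalone sketch, assuming getopt_long from <getopt.h>. The names OPT_OBJECT, OPT_IMAGE_OPTS, struct parsed and parse_args are invented for this example and are not the identifiers used in qemu-nbd.c.

```c
#include <getopt.h>
#include <stddef.h>

/* Long-only options start at 256, above any value a short option char can take. */
enum {
    OPT_OBJECT = 256,
    OPT_IMAGE_OPTS,
};

struct parsed {
    int verbose;         /* -v / --verbose */
    int image_opts;      /* --image-opts (no short form) */
    const char *object;  /* --object ARG (no short form) */
};

/* Returns 0 on success, -1 on an unknown option or a missing argument. */
static int parse_args(int argc, char **argv, struct parsed *p)
{
    static const struct option lopt[] = {
        { "verbose",    no_argument,       NULL, 'v' },
        { "object",     required_argument, NULL, OPT_OBJECT },
        { "image-opts", no_argument,       NULL, OPT_IMAGE_OPTS },
        { NULL, 0, NULL, 0 }
    };
    int c;

    p->verbose = 0;
    p->image_opts = 0;
    p->object = NULL;

    while ((c = getopt_long(argc, argv, "v", lopt, NULL)) != -1) {
        switch (c) {
        case 'v':
            p->verbose = 1;
            break;
        case OPT_OBJECT:
            p->object = optarg;
            break;
        case OPT_IMAGE_OPTS:
            p->image_opts = 1;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

With small values such as 1 to 6, a long-only option would share its return code with a control character, so a later short option could collide with it without any compiler diagnostic.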

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index db610f9..1776a3c 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -43,12 +43,12 @@
 #include 
 
 #define SOCKET_PATH "/var/lock/qemu-nbd-%s"
-#define QEMU_NBD_OPT_CACHE 1
-#define QEMU_NBD_OPT_AIO   2
-#define QEMU_NBD_OPT_DISCARD   3
-#define QEMU_NBD_OPT_DETECT_ZEROES 4
-#define QEMU_NBD_OPT_OBJECT 5
-#define QEMU_NBD_OPT_IMAGE_OPTS 6
+#define QEMU_NBD_OPT_CACHE 256
+#define QEMU_NBD_OPT_AIO   257
+#define QEMU_NBD_OPT_DISCARD   258
+#define QEMU_NBD_OPT_DETECT_ZEROES 259
+#define QEMU_NBD_OPT_OBJECT 260
+#define QEMU_NBD_OPT_IMAGE_OPTS 261
 
 static NBDExport *exp;
 static int verbose;
-- 
2.5.0




Re: [Qemu-devel] usb-storage assertions

2016-01-19 Thread Andrey Korolyov
On Tue, Jan 19, 2016 at 10:13 AM, Gerd Hoffmann  wrote:
> On Di, 2016-01-19 at 02:49 +0300, Andrey Korolyov wrote:
>> On Mon, Jan 18, 2016 at 4:55 PM, Gerd Hoffmann  wrote:
>> >   Hi,
>> >
>> >> > ok.  Had no trouble with freebsd, will go fetch netbsd images.  What
>> >> > arch is this?  i386?  x86_64?
>> >>
>> >> i386 7.0 for the reference, but I'm sure that this wouldn't matter in
>> >> any way.
>> >
>> > 7.0 trace:
>>
>> Whoops, sorry, should be 5.1/i386.
>
> I'll check that too ...
>
>>  On a 7.0 I observe same endless
>> loop as you do.
>
> ... just wanted to start with that one.
>
> [ trace snipped ]
>
>> > So, to shutdown ehci netbsd clears the cmd register, then sets the reset
>> > bit in the cmd register.  Fine.
>> >
>> > Then it goes read the status register, in a loop, forever.  No idea why,
>> > and I also can't spot the place in the source code.  Hmm ...
>
> /me was hoping anyone has an idea what is going on here.
> Are you familiar with the netbsd kernel sources?
>

Probably not familiar enough with the driver subsystem to point at even an
obvious issue in the EHCI driver. I'd start by slowing down the emulated CPU
10 to 100 times via its thread's cgroup, leaving the emulator code with
plenty of CPU cycles, and check whether the issue is still there. If the
crash or the endless loop is timing-related, it should either change its
appearance significantly or disappear completely (or, the other way around,
slow down the emulator thread instead). If you don't have enough time for
such blind testing, I can check it in the next few days. Since I've seen an
interrupt storm complaint on FreeBSD under the same conditions, I strongly
favor the idea that this is race-driven behavior.



Re: [Qemu-devel] [PATCHv4 8/8] pseries: Clean up error reporting in htab migration functions

2016-01-19 Thread David Gibson
On Tue, Jan 19, 2016 at 08:44:51AM +0100, Markus Armbruster wrote:
> David Gibson  writes:
> 
> > The functions for migrating the hash page table on pseries machine type
> > (htab_save_setup() and htab_load()) can report some errors with an
> > explicit fprintf() before returning an appropriate error code.  Change these
> > to use error_report() instead.
> >
> > Signed-off-by: David Gibson 
> > Reviewed-by: Thomas Huth 
> > ---
> >  hw/ppc/spapr.c | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 3cfacb9..1eb7d03 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> 
> Lost this hunk:
> 
>   @@ -1309,8 +1309,9 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
>spapr->htab_fd = kvmppc_get_htab_fd(false);
>spapr->htab_fd_stale = false;
>if (spapr->htab_fd < 0) {
>   -fprintf(stderr, "Unable to open fd for reading hash table from 
> KVM: %s\n",
>   -strerror(errno));
>   +error_report(
>   +"Unable to open fd for reading hash table from KVM: %s",
>   +strerror(errno));
>return -1;
>}
>}
> 
> Intentional?

Yes.  As noted in the cover letter, this conflicts with another series
I'm working on, which will obsolete this change anyway.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH 1/1] vl: change QEMU state machine for system reset

2016-01-19 Thread Denis V. Lunev
This patch implements Paolo's proposal for handling a system reset when
the guest is not running.

"After a reset, main_loop_should_exit should actually transition
to VM_STATE_PRELAUNCH (*not* RUN_STATE_PAUSED) for *all* states except
RUN_STATE_INMIGRATE, RUN_STATE_SAVE_VM (which I think cannot happen
there) and (of course) RUN_STATE_RUNNING."
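
The rule quoted above can be written as a small pure function. This is only a sketch: the enum below is a made-up subset whose names merely resemble QEMU's run states, and state_after_reset is a hypothetical helper, not a function in vl.c.

```c
/* Hypothetical subset of run states for illustration; not QEMU's RunState enum. */
enum run_state {
    STATE_RUNNING,
    STATE_INMIGRATE,
    STATE_SAVE_VM,
    STATE_PAUSED,
    STATE_SHUTDOWN,
    STATE_PRELAUNCH,
};

/* State the machine should be in once a system reset has been processed. */
static enum run_state state_after_reset(enum run_state cur)
{
    switch (cur) {
    case STATE_RUNNING:
    case STATE_INMIGRATE:
    case STATE_SAVE_VM:
        return cur;              /* excluded by the rule: left untouched */
    default:
        return STATE_PRELAUNCH;  /* every other state moves to prelaunch */
    }
}
```

For the move to prelaunch to be legal, each of those other states also needs a matching entry in the transition table, which is what most of the added lines in the diff provide.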

Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Dmitry Andreev 
---
 vl.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index 0172e42..c9e47b0 100644
--- a/vl.c
+++ b/vl.c
@@ -583,6 +583,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 /* from  -> to  */
 { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
 { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_INMIGRATE, RUN_STATE_INTERNAL_ERROR },
 { RUN_STATE_INMIGRATE, RUN_STATE_IO_ERROR },
@@ -596,15 +597,19 @@ static const RunStateTransition 
runstate_transitions_def[] = {
 
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_INTERNAL_ERROR, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_IO_ERROR, RUN_STATE_RUNNING },
 { RUN_STATE_IO_ERROR, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_IO_ERROR, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
 { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_PAUSED, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_POSTMIGRATE, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
 { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
@@ -612,8 +617,10 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
+{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
+{ RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
 { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
@@ -627,20 +634,25 @@ static const RunStateTransition 
runstate_transitions_def[] = {
 { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
+{ RUN_STATE_SAVE_VM, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_SHUTDOWN, RUN_STATE_PAUSED },
 { RUN_STATE_SHUTDOWN, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_SHUTDOWN, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },
 { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
 { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
 { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_WATCHDOG, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PRELAUNCH },
 
 { RUN_STATE__MAX, RUN_STATE__MAX },
 };
@@ -1886,8 +1899,10 @@ static bool main_loop_should_exit(void)
 cpu_synchronize_all_states();
 qemu_system_reset(VMRESET_REPORT);
 resume_all_vcpus();
-if (runstate_needs_reset()) {
-runstate_set(RUN_STATE_PAUSED);
+if (!runstate_check(RUN_STATE_RUNNING) &&
+!runstate_check(RUN_STATE_INMIGRATE) &&
+!runstate_check(RUN_STATE_SAVE_VM)) {
+runstate_set(RUN_STATE_PRELAUNCH);
 }
 }
 if (qemu_wakeup_requested()) {
-- 
2.5.0




Re: [Qemu-devel] [PATCH] hw/pci: ensure that only PCI/PCIe bridges can be attached to pxb/pxb-pcie devices

2016-01-19 Thread Marcel Apfelbaum

On 01/18/2016 08:16 PM, Laszlo Ersek wrote:

On 01/18/16 19:08, Peter Maydell wrote:

On 18 January 2016 at 15:27, Marcel Apfelbaum  wrote:

PCI devices can't be plugged directly into PCI extra root bridges
because their resources can't be computed by firmware before the ACPI
tables are loaded.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

This patch follows the discussion:
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01484.html


Is it definitely the case that no current working command lines plug
PCI devices directly into these things (including on platforms that
don't have anything to do with ACPI at all) ?


Hi,

The PXB devices can work only on ACPI-based platforms, and currently only on
PC machines.
So other platforms are out of scope.

I understand the concern about putting this in generic PCI code, but:
 - Non-ACPI platforms (as implemented in QEMU) do not support extra PCI host
   bridges (at least not yet).
 - Even when extra host bridges are supported, there are several ways to
   implement them, and most of them will not require their pxbs to have a
   parent_device. The presence of a parent device is a pretty solid hint that
   this is a "snooping bridge", which as far as I know is typical only of the
   existing solution.

Now the explanation of the issue we want to solve:
 - pxb (PCI expander bridge) - it already has an internal bridge, so using
   -device pxb,bus80,id=pxb1 -device e1000,bus=pxb1
   will land the device on a built-in PCI bridge.
   - Without the proposed patch, an incorrect command line will result in a
     non-working device.
 - pxb-pcie (PCIe Root Complex) - it does not have an internal bridge, so
   trying to use
   -device pxb-pcie,bus80,id=pxb1 -device e1000,bus=pxb1
   will fail.

This patch ensures none of that can happen.
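
The rule can be paraphrased as a single predicate. The types and the function below are invented for this sketch and are not QEMU's PCIBus or PCIDevice. The only assumption is that each bus knows whether it is a pxb or pxb-pcie root bus and each device knows whether it is a bridge.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not QEMU types. */
struct sketch_bus {
    bool is_pxb_root;   /* root bus exposed by a pxb or pxb-pcie device */
};

struct sketch_dev {
    bool is_bridge;     /* PCI or PCIe bridge */
};

/* A pxb root bus accepts only bridges; any other bus accepts any device. */
static bool can_attach(const struct sketch_bus *bus, const struct sketch_dev *dev)
{
    return !bus->is_pxb_root || dev->is_bridge;
}
```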

Last word:
I did consider another option: adding a "bridges-only" property (defaulting
to false) to the PCIBus class and leveraging the fact that the pxb internal
buses derive from it (so it can be set to true there).

Then we could simply check PCI_BUS_CLASS(bus)->bridges-only, but that seemed
a little odd since we don't have that limitation in the real world.
I am not against it; if it is preferred I'll submit a new patch.



No clue about "pxb-pcie", but re: "pxb", the documentation and examples
by Marcel (see: "docs/pci_expander_bridge.txt") will certainly continue
working with this patch in place. And, that text file is authoritative for
pxb, since Marcel (et al) wrote the code directly for the purposes
described in the txt.


and that reminds me I need to update the doc for pxb-pcie, thanks Laszlo!
Marcel




(But I'll let Marcel answer too! :))

Thanks
Laszlo






Re: [Qemu-devel] [PATCH v3 ] doc: Introduce coding style for errors

2016-01-19 Thread Thomas Huth
On 18.01.2016 21:26, Eric Blake wrote:
> On 01/15/2016 06:54 AM, Lluís Vilanova wrote:
>> Gives some general guidelines for reporting errors in QEMU.
>>
>> Signed-off-by: Lluís Vilanova 
>> ---
>>  HACKING |   36 
>>  1 file changed, 36 insertions(+)
...
>> +Functions in this header are used to accumulate error messages in an 'Error'
>> +object, which can be propagated up the call chain where it is finally 
>> reported.
>> +
>> +In its simplest form, you can immediately report an error with:
>> +
>> +error_setg(&error_fatal, "Error with %s", "arguments");
> 
> This paradigm doesn't appear anywhere in the current code base
> (hw/ppc/spapr*.c has a few cases of error_setg(&error_abort), but
> nothing directly passes error_fatal).  It's a bit odd to document
> something that isn't actually used.

+1 for _not_ documenting this here: IMHO this looks ugly. If we want
something like this, I think we should introduce a proper
error_report_fatal() function instead.

 Thomas






Re: [Qemu-devel] [PATCH v1 05/15] crypto: add block encryption framework

2016-01-19 Thread Daniel P. Berrange
On Mon, Jan 18, 2016 at 12:48:56PM -0700, Eric Blake wrote:
> On 01/14/2016 05:16 AM, Daniel P. Berrange wrote:
> 
> >>> +# @qcowaes: QCow/QCow2 built-in AES-CBC encryption. Do not use
> >>> +#
> >>
> >> Well, the only reason to use it would be to read data off an old
> >> insecurely-encrypted qcow2 file; so maybe it should read "Do not use on
> >> new files"
> > 
> > Yep
> > 
> >>> +# Since: 2.6
> >>> +##
> >>> +{ 'enum': 'QCryptoBlockFormat',
> >>> +#  'prefix': 'QCRYPTO_BLOCK_FORMAT',
> >>> +  'data': ['qcowaes']}
> >>
> >> Would 'qcow-aes' be any easier to read?
> > 
> > Or just shorten to 'qcow' perhaps ?
> 
> Or maybe 'old-qcow' to emphasize that it is old?  At this point, we're
> painting the bikeshed, so I'll live with whatever you like.

I've gone for just 'qcow', since that's also what libvirt calls it,
and it avoids a need for further debate as to whether to hyphenate
the two words or underscore or neither ;-P

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH v16 05/14] vfio: add pcie extanded capability support

2016-01-19 Thread Chen Fan


On 01/17/2016 09:22 PM, Marcel Apfelbaum wrote:

On 01/12/2016 04:43 AM, Cao jin wrote:

From: Chen Fan 



Hi,

I noticed a typo in the subject: extanded -> extended


For vfio pcie device, we could expose the extended capability on
PCIE bus. in order to avoid config space broken, we introduce
a copy config for parsing extended caps. and rebuild the pcie
extended config space.


Maybe we can re-word this. Will someone with better English skills
advise :) ?


that will be helpful. ;)



Signed-off-by: Chen Fan 
---
  hw/vfio/pci.c | 70 ++-
  1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 288f2c7..64b0867 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1482,6 +1482,21 @@ static uint8_t vfio_std_cap_max_size(PCIDevice 
*pdev, uint8_t pos)

  return next - pos;
  }

+
+static uint16_t vfio_ext_cap_max_size(const uint8_t *config, 
uint16_t pos)

+{
+uint16_t tmp, next = PCIE_CONFIG_SPACE_SIZE;
+
+for (tmp = PCI_CONFIG_SPACE_SIZE; tmp;
+tmp = PCI_EXT_CAP_NEXT(pci_get_long(config + tmp))) {
+if (tmp > pos && tmp < next) {
+next = tmp;
+}
+}
+
+return next - pos;
+}


Can't we reuse vfio_std_cap_max_size here? If only the config size
differs, we can pass it as a parameter.

Not only the config size differs, but also the PCI Express Extended
Capability header: it uses 16 bits to store the capability ID and
12 bits to store the next offset.
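
To make the difference concrete: a standard PCI capability has an 8-bit ID and an 8-bit next pointer, while the 32-bit extended header packs a 16-bit ID, a 4-bit version and a 12-bit next offset. The following is a minimal sketch of the decoding, assuming the field layout from the PCIe specification; the struct and function names are made up for illustration and are not the QEMU or Linux macros.

```c
#include <stdint.h>

/*
 * Fields of a PCI Express Extended Capability header (32 bits):
 *   bits 15:0  capability ID
 *   bits 19:16 capability version
 *   bits 31:20 next capability offset
 * Names are illustrative only.
 */
struct ext_cap_hdr {
    uint16_t id;
    uint8_t  version;
    uint16_t next;
};

static struct ext_cap_hdr ext_cap_decode(uint32_t header)
{
    struct ext_cap_hdr h;

    h.id = header & 0xffff;
    h.version = (header >> 16) & 0xf;
    /* The low two bits of the offset are reserved (dword alignment). */
    h.next = (header >> 20) & 0xffc;
    return h;
}
```

Because the header is four bytes wide and the offsets are 12 bits, a walker for the extended list cannot share the byte-wide reads used for the standard capability list, which is presumably why a separate helper was written.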





+
  static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t 
mask)

  {
  pci_set_word(buf, (pci_get_word(buf) & ~mask) | val);
@@ -1817,16 +1832,69 @@ static int vfio_add_std_cap(VFIOPCIDevice 
*vdev, uint8_t pos)

  return 0;
  }

+static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
+{
+PCIDevice *pdev = &vdev->pdev;
+uint32_t header;
+uint16_t cap_id, next, size;
+uint8_t cap_ver;
+uint8_t *config;
+
+/*
+ * In order to avoid breaking config space, create a copy to
+ * use for parsing extended capabilities.


It will be nice to know *how* do we break/*what* will break the config
space, I confess that I didn't see it :(.

I will improve it.




+ */
+config = g_memdup(pdev->config, vdev->config_size);
+
+for (next = PCI_CONFIG_SPACE_SIZE; next;
+ next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
+header = pci_get_long(config + next);
+cap_id = PCI_EXT_CAP_ID(header);
+cap_ver = PCI_EXT_CAP_VER(header);
+
+/*
+ * If it becomes important to configure extended capabilities to their
+ * actual size, use this as the default when it's something we don't
+ * recognize. Since QEMU doesn't actually handle many of the config
+ * accesses, exact size doesn't seem worthwhile.
+ */
+size = vfio_ext_cap_max_size(config, next);
+
+pcie_add_capability(pdev, cap_id, cap_ver, next, size);
+pci_set_long(dev->config + next, PCI_EXT_CAP(cap_id, 
cap_ver, 0));

+
+/* Use emulated next pointer to allow dropping extended caps */
+ pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
+   PCI_EXT_CAP_NEXT_MASK);
+}
+
+g_free(config);
+return 0;
+}
+
  static int vfio_add_capabilities(VFIOPCIDevice *vdev)
  {
  PCIDevice *pdev = &vdev->pdev;
+int ret;

  if (!(pdev->config[PCI_STATUS] & PCI_STATUS_CAP_LIST) ||
  !pdev->config[PCI_CAPABILITY_LIST]) {
  return 0; /* Nothing to add */
  }

-return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
+ret = vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]);
+if (ret) {
+return ret;
+}
+
+/* on PCI bus, it doesn't make sense to expose extended capabilities. */

+if (!pci_is_express(pdev) ||
+!pci_bus_is_express(pdev->bus) ||
+!pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)) {


I am curious about the last check,
"!pci_get_long(pdev->config + PCI_CONFIG_SPACE_SIZE)", can you please explain?

The PCIe 3.0 spec defines that:

7.9.1. Extended Capabilities in Configuration Space
Extended Capabilities in Configuration Space always begin at offset 100h
with a PCI Express Extended Capability header (Section 7.9.3). Absence of
any Extended Capabilities is required to be indicated by an Extended
Capability header with a Capability ID of 0000h, a Capability Version of
0h, and a Next Capability Offset of 000h.

So here we test whether the header at offset 100h is zero.
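
As a rough illustration of that rule, here is a self-contained sketch that walks the extended capability list in a raw copy of config space and treats an all-zero header at 100h as "no extended capabilities". It is not the vfio code; the constants and function names are assumptions chosen for the example, and the iteration cap is an extra safety measure against looping lists.

```c
#include <stdint.h>

#define CFG_STD_SIZE  0x100   /* extended capabilities start here */
#define CFG_EXT_SIZE  0x1000  /* size of PCIe extended config space */
#define MAX_EXT_CAPS  ((CFG_EXT_SIZE - CFG_STD_SIZE) / 4)

/* Config space is little-endian regardless of the host. */
static uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
    return (uint32_t)cfg[off] |
           (uint32_t)cfg[off + 1] << 8 |
           (uint32_t)cfg[off + 2] << 16 |
           (uint32_t)cfg[off + 3] << 24;
}

/* Count extended capabilities; an all-zero header at 100h means none. */
static int ext_cap_count(const uint8_t *cfg)
{
    uint16_t off = CFG_STD_SIZE;
    int n = 0;

    if (cfg_read32(cfg, CFG_STD_SIZE) == 0) {
        return 0;
    }
    /* Stop on a zero/invalid next offset, or after too many hops. */
    while (off >= CFG_STD_SIZE && off <= CFG_EXT_SIZE - 4 &&
           n < MAX_EXT_CAPS) {
        n++;
        off = (cfg_read32(cfg, off) >> 20) & 0xffc;
    }
    return n;
}
```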

Thanks,
Chen




Thank you,
Marcel


+return 0;
+}
+
+return vfio_add_ext_cap(vdev);
  }

  static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)





.








Re: [Qemu-devel] [PATCH v4 0/2] block: Reject negative values for throttling options

2016-01-19 Thread Markus Armbruster
Fam Zheng  writes:

> v4: Add Max's rev-by in both patches, while fixing the "maxs" typo.
>
> v3: Address comments:
> - Add test for large value; [Berto]
> - Fix typos "negative" & "caught"; [Eric, Berto]
> - Use "LL" suffix to the upper limit constant. [Berto]
>
> v2: Check the value range and report an appropriate error. [Berto]
>
> Now the negative values are silently converted to a huge positive number
> because we are doing implicit casting from uint64_t to double. Fix it and
> add a test case (this was once fixed in 7d81c1413c9 but regressed when the
> block device option parsing code was changed).

I think PATCH 1's commit message could explain the problem in a bit more
detail, and it should mention the changed valid range.

Other than that, I had two questions: why cast THROTTLE_VALUE_MAX for
printing (in scope for the series), and why parse the settings as
integers even though they're really floating-point (probably not in
scope).
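
The conversion problem described in the quoted cover letter can be reproduced in isolation. The sketch below is not the QEMU code: the helper names and the LIMIT_MAX constant are hypothetical, and it only shows why a range check has to happen before the value becomes a double.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound; the series uses its own constant. */
#define LIMIT_MAX 1000000000000000LL

/* What goes wrong: a negative value read as unsigned becomes huge. */
static double unchecked_to_double(int64_t v)
{
    return (double)(uint64_t)v;
}

/* Range-check first, then convert. Returns false on rejection. */
static bool parse_limit(int64_t v, double *out)
{
    if (v < 0 || v > LIMIT_MAX) {
        return false;
    }
    *out = (double)v;
    return true;
}
```

With these helpers, unchecked_to_double(-1) yields a value around 1.8e19, whereas parse_limit(-1, ...) rejects the input.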



Re: [Qemu-devel] [PATCH v16 07/14] vfio: add aer support for vfio device

2016-01-19 Thread Chen Fan


On 01/18/2016 05:12 PM, Marcel Apfelbaum wrote:

On 01/12/2016 04:43 AM, Cao jin wrote:

From: Chen Fan 

Calling pcie_aer_init to initialize aer related registers for
vfio device, then reload physical related registers to expose
device capability.

Signed-off-by: Chen Fan 
---
  hw/vfio/pci.c | 81 ---
  hw/vfio/pci.h |  3 +++
  2 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64b0867..38b0aa5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1832,6 +1832,62 @@ static int vfio_add_std_cap(VFIOPCIDevice 
*vdev, uint8_t pos)

  return 0;
  }

+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
+  int pos, uint16_t size)
+{
+PCIDevice *pdev = &vdev->pdev;
+PCIDevice *dev_iter;
+uint8_t type;
+uint32_t errcap;
+
+if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+pcie_add_capability(pdev, PCI_EXT_CAP_ID_ERR,
+cap_ver, pos, size);
+return 0;
+}
+
+dev_iter = pci_bridge_get_device(pdev->bus);
+if (!dev_iter) {
+goto error;
+}
+
+while (dev_iter) {
+type = pcie_cap_get_type(dev_iter);
+if ((type != PCI_EXP_TYPE_ROOT_PORT &&
+ type != PCI_EXP_TYPE_UPSTREAM &&
+ type != PCI_EXP_TYPE_DOWNSTREAM)) {
+goto error;
+}
+
+if (!dev_iter->exp.aer_cap) {
+goto error;
+}
+
+dev_iter = pci_bridge_get_device(dev_iter->bus);
+}
+
+errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+/*
+ * The ability to record multiple headers is depending on
+ * the state of the Multiple Header Recording Capable bit and
+ * enabled by the Multiple Header Recording Enable bit.
+ */
+if ((errcap & PCI_ERR_CAP_MHRC) &&
+(errcap & PCI_ERR_CAP_MHRE)) {
+pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+} else {
+pdev->exp.aer_log.log_max = 0;
+}
+
+pcie_cap_deverr_init(pdev);
+return pcie_aer_init(pdev, pos, size);
+
+error:
+error_report("vfio: Unable to enable AER for device %s, parent bus "
+ "does not support AER signaling", vdev->vbasedev.name);
+return -1;
+}
+
  static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
  {
  PCIDevice *pdev = &vdev->pdev;
@@ -1839,6 +1895,7 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
  uint16_t cap_id, next, size;
  uint8_t cap_ver;
  uint8_t *config;
+int ret = 0;

  /*
   * In order to avoid breaking config space, create a copy to
@@ -1860,16 +1917,29 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
   */
  size = vfio_ext_cap_max_size(config, next);

-pcie_add_capability(pdev, cap_id, cap_ver, next, size);
-pci_set_long(dev->config + next, PCI_EXT_CAP(cap_id, 
cap_ver, 0));

+switch (cap_id) {
+case PCI_EXT_CAP_ID_ERR:
+ret = vfio_setup_aer(vdev, cap_ver, next, size);
+break;
+default:
+pcie_add_capability(pdev, cap_id, cap_ver, next, size);
+break;
+}
+
+if (ret) {
+goto out;
+}
+
+pci_set_long(pdev->config + next, PCI_EXT_CAP(cap_id, 
cap_ver, 0));


  /* Use emulated next pointer to allow dropping extended 
caps */

pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
 PCI_EXT_CAP_NEXT_MASK);
  }

+out:
  g_free(config);
-return 0;
+return ret;
  }

  static int vfio_add_capabilities(VFIOPCIDevice *vdev)
@@ -2624,6 +2694,11 @@ static int vfio_initfn(PCIDevice *pdev)
  goto out_teardown;
  }

+if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+!pdev->exp.aer_cap) {


Hi,

I think we need an error_report here, otherwise the init
will fail without knowing the reason.

Maybe we need to exclude this case: if the device doesn't have the AER
capability, ENABLE_AER should simply have no effect.

Thanks,
Chen




Thanks,
Marcel



+goto out_teardown;
+}
+
  /* QEMU emulates all of MSI & MSIX */
  if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
  memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f004d52..48c1f69 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
  #include "qemu-common.h"
  #include "exec/memory.h"
  #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
  #include "hw/vfio/vfio-common.h"
  #include "qemu/event_notifier.h"
  #include "qemu/queue.h"
@@ -127,6 +128,8 @@ typedef struct VFIOPCIDevice {
  #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
  #define VFIO_FEATURE_ENABLE_REQ_BIT 1
  #define VFIO_FEATURE_ENABLE_REQ (1 << VFIO_FEATURE_ENABLE_REQ_BIT)
+#define VFIO_FEATURE_ENABLE_AER_BIT 2
+#define VFIO_FEATURE_ENABLE_AER (1 << 

[Qemu-devel] cgroup blkio weight has no effect for qemu

2016-01-19 Thread 陈博
Hi folks,


Could you please enlighten me about how to achieve proportional IO sharing by 
using cgroup, instead of qemu?

My qemu config is like: -drive 
file=$DISKFILe,if=none,format=qcow2,cache=none,aio=native

Test command inside vm is like: dd if=/dev/vdc of=/dev/null iflag=direct

Cgroup blkio weight of the qemu process is properly configured as well.

But no matter how I change the proportion, such as vm1=400 and vm2=100, I can
only get equal IO speed.

I wonder whether cgroup blkio.weight or blkio.weight_device has any effect on qemu?


PS. cache=writethrough aio=threads is also tested, the same results. 



- Bob





Re: [Qemu-devel] [PATCHv2 1/3] spapr: Small fixes to rtas_ibm_get_system_parameter, remove rtas_st_buffer

2016-01-19 Thread Alexey Kardashevskiy

On 01/19/2016 03:30 PM, David Gibson wrote:

rtas_st_buffer() appears in spapr.h as though it were a widely used helper,
but in fact it is only used for saving data in a format used by
rtas_ibm_get_system_parameter().  This changes it to a local helper more
specifically for that function.

While we're there fix a couple of small defects in
rtas_ibm_get_system_parameter:
   - For the string value SPLPAR_CHARACTERISTICS, it wasn't including the
 terminating \0 in the length which it should according to LoPAPR
 7.3.16.1
   - It now checks that the supplied buffer has at least enough space for
 the length of the returned data, and returns an error if it does not.
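
The buffer format described above (a 2-byte big-endian length followed by the data, truncated to whatever fits) can be sketched outside of QEMU as follows. This is an illustration under assumptions: it writes to a plain byte array instead of guest memory, and the function name and return values are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Store val_len as a 16-bit big-endian prefix, then as much of val as
 * fits. Returns 0 on success, -1 if dst cannot even hold the prefix.
 */
static int store_len_prefixed(uint8_t *dst, size_t dst_len,
                              const void *val, uint16_t val_len)
{
    size_t n;

    if (dst_len < 2) {
        return -1;
    }
    dst[0] = val_len >> 8;
    dst[1] = val_len & 0xff;

    /* Copy at most the space left after the prefix. */
    n = dst_len - 2 < val_len ? dst_len - 2 : val_len;
    if (n > 0) {
        memcpy(dst + 2, val, n);
    }
    return 0;
}
```

Note that the prefix always carries the full length, so a caller with a short buffer can tell that the data was truncated. For a string parameter the caller passes strlen() + 1 so that the terminating NUL is counted.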

Signed-off-by: David Gibson 
---
  hw/ppc/spapr_rtas.c| 22 ++
  include/hw/ppc/spapr.h | 28 +---
  2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 34b12a3..32cdd66 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -228,6 +228,20 @@ static void rtas_stop_self(PowerPCCPU *cpu, 
sPAPRMachineState *spapr,
  env->msr = 0;
  }

+



Nit: unneeded empty line. Besides that,

Reviewed-by: Alexey Kardashevskiy 




+static inline int sysparm_st(target_ulong addr, target_ulong len,
+ const void *val, uint16_t vallen)
+{
+hwaddr phys = ppc64_phys_to_real(addr);
+
+if (len < 2) {
+return RTAS_OUT_SYSPARM_PARAM_ERROR;
+}
+stw_be_phys(&address_space_memory, phys, vallen);
+cpu_physical_memory_write(phys + 2, val, MIN(len - 2, vallen));
+return RTAS_OUT_SUCCESS;
+}
+
  static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
sPAPRMachineState *spapr,
uint32_t token, uint32_t nargs,
@@ -237,7 +251,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
  target_ulong parameter = rtas_ld(args, 0);
  target_ulong buffer = rtas_ld(args, 1);
  target_ulong length = rtas_ld(args, 2);
-target_ulong ret = RTAS_OUT_SUCCESS;
+target_ulong ret;

  switch (parameter) {
  case RTAS_SYSPARM_SPLPAR_CHARACTERISTICS: {
@@ -249,18 +263,18 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
current_machine->ram_size / M_BYTE,
smp_cpus,
max_cpus);
-rtas_st_buffer(buffer, length, (uint8_t *)param_val, 
strlen(param_val));
+ret = sysparm_st(buffer, length, param_val, strlen(param_val) + 1);
  g_free(param_val);
  break;
  }
  case RTAS_SYSPARM_DIAGNOSTICS_RUN_MODE: {
  uint8_t param_val = DIAGNOSTICS_RUN_MODE_DISABLED;

-rtas_st_buffer(buffer, length, &param_val, sizeof(param_val));
+ret = sysparm_st(buffer, length, &param_val, sizeof(param_val));
  break;
  }
  case RTAS_SYSPARM_UUID:
-rtas_st_buffer(buffer, length, qemu_uuid, (qemu_uuid_set ? 16 : 0));
+ret = sysparm_st(buffer, length, qemu_uuid, (qemu_uuid_set ? 16 : 0));
  break;
  default:
  ret = RTAS_OUT_NOT_SUPPORTED;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 53af76a..1e10fc9 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -408,14 +408,15 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
  #define RTAS_SLOT_PERM_ERR_LOG   2

  /* RTAS return codes */
-#define RTAS_OUT_SUCCESS            0
-#define RTAS_OUT_NO_ERRORS_FOUND    1
-#define RTAS_OUT_HW_ERROR           -1
-#define RTAS_OUT_BUSY               -2
-#define RTAS_OUT_PARAM_ERROR        -3
-#define RTAS_OUT_NOT_SUPPORTED      -3
-#define RTAS_OUT_NO_SUCH_INDICATOR  -3
-#define RTAS_OUT_NOT_AUTHORIZED     -9002
+#define RTAS_OUT_SUCCESS                0
+#define RTAS_OUT_NO_ERRORS_FOUND        1
+#define RTAS_OUT_HW_ERROR               -1
+#define RTAS_OUT_BUSY                   -2
+#define RTAS_OUT_PARAM_ERROR            -3
+#define RTAS_OUT_NOT_SUPPORTED          -3
+#define RTAS_OUT_NO_SUCH_INDICATOR      -3
+#define RTAS_OUT_NOT_AUTHORIZED         -9002
+#define RTAS_OUT_SYSPARM_PARAM_ERROR    -9999

  /* RTAS tokens */
  #define RTAS_TOKEN_BASE  0x2000
@@ -513,17 +514,6 @@ static inline void rtas_st_buffer_direct(target_ulong phys,
MIN(buffer_len, phys_len));
  }

-static inline void rtas_st_buffer(target_ulong phys, target_ulong phys_len,
-  uint8_t *buffer, uint16_t buffer_len)
-{
-if (phys_len < 2) {
-return;
-}
-stw_be_phys(&address_space_memory,
-ppc64_phys_to_real(phys), buffer_len);
-rtas_st_buffer_direct(phys + 2, phys_len - 2, buffer, buffer_len);
-}
-
  typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPRMachineState *sm,
   

[Qemu-devel] cgroup blkio weight has no effect on qemu

2016-01-19 Thread 陈博
Hi folks,


Could you enlighten me about how to achieve proportional IO sharing by using 
cgroup, instead of qemu?

My qemu config is like: -drive 
file=$DISKFILe,if=none,format=qcow2,cache=none,aio=native

Test command inside vm is like: dd if=/dev/vdc of=/dev/null iflag=direct

Cgroup blkio weight of the qemu process is properly configured as well.

But no matter how I change the proportion, such as vm1=400 and vm2=100, I can
only get equal IO speed.

I wonder whether cgroup blkio.weight or blkio.weight_device has any effect on qemu?


PS. cache=writethrough aio=threads is also tested, the same results. 



- Bob




Re: [Qemu-devel] [PATCH 5/7] target-ppc: gdbstub: fix altivec registers for little-endian guests

2016-01-19 Thread Greg Kurz
On Mon, 18 Jan 2016 13:25:19 +1100
David Gibson  wrote:

> On Fri, Jan 15, 2016 at 04:00:38PM +0100, Greg Kurz wrote:
> > Altivec registers are 128-bit wide. They are stored in memory as two
> > 64-bit values that must be byteswapped when the guest is little-endian.
> > Let's reuse the ppc_maybe_bswap_register() helper for this.
> > 
> > We also need to fix the ordering of the 64-bit elements according to
> > the target endianness, for both system and user mode.
> > 
> > Signed-off-by: Greg Kurz   
> 
> What bothers me about this is that avr_need_swap() now depends on both
> host and guest endianness.  However the VSCR and VRSAVE swap - like
> the swaps for GPRs and FPRs - uses ppc_maybe_bswap_register() which
> depends only on guest endianness.
> 
> Why does altivec depend on the host endianness?
> 

This has always been the case:

commit b4f8d821e5211bbb51a278ba0fc4a4db2d581221
Author: aurel32 
Date:   Sat Jan 24 15:08:09 2009 +

target-ppc: Add Altivec register read/write using XML

[...]

+static int gdb_get_avr_reg(CPUState *env, uint8_t *mem_buf, int n)
+{
+if (n < 32) {
+#ifdef WORDS_BIGENDIAN
+stq_p(mem_buf, env->avr[n].u64[0]);
+stq_p(mem_buf+8, env->avr[n].u64[1]);
+#else
+stq_p(mem_buf, env->avr[n].u64[1]);
+stq_p(mem_buf+8, env->avr[n].u64[0]);
+#endif
+return 16;
+}

My understanding is that gdb expects registers to be presented with
the target endianness, but QEMU has them in host endianness.

The ppc_maybe_bswap_register() helper is needed to fix 64-bit values
according to the target's effective endianness, because stq_p() always
considers both ppc64 and ppc64le to be big endian.

Here, we have a 128-bit register that we break into two 64-bit values
in memory. Each 64-bit half has to be fixed by ppc_maybe_bswap_register().
But we also have to reorder the two halves if the host endianness
differs from the target's. This is the purpose of avr_need_swap().
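
The two-step fixup can be modelled in a few lines of standalone C. This is a simplified sketch, not the QEMU code: the host endianness is passed in as a flag so both cases can be exercised on any machine, and all function names are invented for the example. u64[] stands for the register as stored in host order, so u64[0] is the low half on a little-endian host and the high half on a big-endian one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a 64-bit value most significant byte first. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Reverse 8 bytes in place. */
static void bswap8(uint8_t *p)
{
    for (int i = 0; i < 4; i++) {
        uint8_t t = p[i];
        p[i] = p[7 - i];
        p[7 - i] = t;
    }
}

/* Serialize a 128-bit register in the guest's byte order. */
static void avr_to_wire(uint8_t out[16], const uint64_t u64[2],
                        bool host_be, bool guest_le)
{
    /* Step 1: order the halves; they swap when host and guest differ. */
    bool swap = host_be ? guest_le : !guest_le;

    store_be64(out, u64[swap ? 1 : 0]);
    store_be64(out + 8, u64[swap ? 0 : 1]);

    /* Step 2: each half was written big-endian; fix it for LE guests. */
    if (guest_le) {
        bswap8(out);
        bswap8(out + 8);
    }
}
```

Whatever the host, the result is the full 128-bit value laid out most significant byte first for a big-endian guest and least significant byte first for a little-endian one.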

Cheers.

--
Greg

> > ---
> >  target-ppc/translate_init.c |   12 ++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> > index 18e9e561561f..80d53e4dcf5a 100644
> > --- a/target-ppc/translate_init.c
> > +++ b/target-ppc/translate_init.c
> > @@ -8754,9 +8754,9 @@ static void dump_ppc_insns (CPUPPCState *env)
> >  static bool avr_need_swap(CPUPPCState *env)
> >  {
> >  #ifdef HOST_WORDS_BIGENDIAN
> > -return false;
> > +return msr_le;
> >  #else
> > -return true;
> > +return !msr_le;
> >  #endif
> >  }
> >  
> > @@ -8800,14 +8800,18 @@ static int gdb_get_avr_reg(CPUPPCState *env, 
> > uint8_t *mem_buf, int n)
> >  stq_p(mem_buf, env->avr[n].u64[1]);
> >  stq_p(mem_buf+8, env->avr[n].u64[0]);
> >  }
> > +ppc_maybe_bswap_register(env, mem_buf, 8);
> > +ppc_maybe_bswap_register(env, mem_buf + 8, 8);
> >  return 16;
> >  }
> >  if (n == 32) {
> >  stl_p(mem_buf, env->vscr);
> > +ppc_maybe_bswap_register(env, mem_buf, 4);
> >  return 4;
> >  }
> >  if (n == 33) {
> >  stl_p(mem_buf, (uint32_t)env->spr[SPR_VRSAVE]);
> > +ppc_maybe_bswap_register(env, mem_buf, 4);
> >  return 4;
> >  }
> >  return 0;
> > @@ -8816,6 +8820,8 @@ static int gdb_get_avr_reg(CPUPPCState *env, uint8_t 
> > *mem_buf, int n)
> >  static int gdb_set_avr_reg(CPUPPCState *env, uint8_t *mem_buf, int n)
> >  {
> >  if (n < 32) {
> > +ppc_maybe_bswap_register(env, mem_buf, 8);
> > +ppc_maybe_bswap_register(env, mem_buf + 8, 8);
> >  if (!avr_need_swap(env)) {
> >  env->avr[n].u64[0] = ldq_p(mem_buf);
> >  env->avr[n].u64[1] = ldq_p(mem_buf+8);
> > @@ -8826,10 +8832,12 @@ static int gdb_set_avr_reg(CPUPPCState *env, 
> > uint8_t *mem_buf, int n)
> >  return 16;
> >  }
> >  if (n == 32) {
> > +ppc_maybe_bswap_register(env, mem_buf, 4);
> >  env->vscr = ldl_p(mem_buf);
> >  return 4;
> >  }
> >  if (n == 33) {
> > +ppc_maybe_bswap_register(env, mem_buf, 4);
> >  env->spr[SPR_VRSAVE] = (target_ulong)ldl_p(mem_buf);
> >  return 4;
> >  }
> >   
> 




Re: [Qemu-devel] [PATCH v16 08/14] vfio: add check host bus reset is support or not

2016-01-19 Thread Chen Fan


On 01/18/2016 06:32 PM, Marcel Apfelbaum wrote:

On 01/12/2016 04:43 AM, Cao jin wrote:

From: Chen Fan 



Hi,

I think the subject should be rephrased.


when init vfio devices done, we should test all the devices supported
aer whether conflict with others. For each one, get the hot reset
info for the affected device list.  For each affected device, all
should attach to the VM and on/below the same bus. also, we should test
all of the non-AER supporting vfio-pci devices on or below the target
bus to verify they have a reset mechanism.



Maybe instead of explaining what this patch does you can simply
say what are the requirements:

Something like: Check there are no AER conflicts by making sure the
devices are behind on/below the same bus... (someone with better 
English may help :) )




Signed-off-by: Chen Fan 
---
  hw/vfio/pci.c | 238 --
  hw/vfio/pci.h |   1 +
  2 files changed, 232 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 38b0aa5..16ab0e3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1832,6 +1832,218 @@ static int vfio_add_std_cap(VFIOPCIDevice 
*vdev, uint8_t pos)

  return 0;
  }

+static bool vfio_pci_host_slot_match(PCIHostDeviceAddress *host1,
+ PCIHostDeviceAddress *host2)
+{
+return (host1->domain == host2->domain && host1->bus == host2->bus &&
+host1->slot == host2->slot);
+}
+
+static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
+PCIHostDeviceAddress *host2)
+{
+return (vfio_pci_host_slot_match(host1, host2) &&
+host1->function == host2->function);
+}
+
+struct VFIODeviceFind {
+PCIDevice *pdev;
+bool found;
+};
+
+static void vfio_check_device_noreset(PCIBus *bus, PCIDevice *pdev,
+  void *opaque)
+{
+DeviceState *dev = DEVICE(pdev);
+DeviceClass *dc = DEVICE_GET_CLASS(dev);
+VFIOPCIDevice *vdev;
+struct VFIODeviceFind *find = opaque;
+
+if (find->found) {
+return;
+}
+
+if (!object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
+if (!dc->reset) {
+goto found;
+}
+return;
+}
+vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+!vdev->vbasedev.reset_works) {
+goto found;
+}
+
+return;
+found:
+find->pdev = pdev;
+find->found = true;
+}
+
+static void device_find(PCIBus *bus, PCIDevice *pdev, void *opaque)
+{
+struct VFIODeviceFind *find = opaque;
+
+if (find->found) {
+return;
+}
+
+if (pdev == find->pdev) {
+find->found = true;
+}
+}
+
+static int vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
+{
+PCIBus *bus = vdev->pdev.bus;
+struct vfio_pci_hot_reset_info *info = NULL;
+struct vfio_pci_dependent_device *devices;
+VFIOGroup *group;
+struct VFIODeviceFind find;
+int ret, i;
+
+ret = vfio_get_hot_reset_info(vdev, &info);
+if (ret) {
+error_report("vfio: Cannot enable AER for device %s,"
+ " device does not support hot reset.",
+ vdev->vbasedev.name);
+goto out;




"return" is enough here in case of an error, info is released inside 
vfio_get_hot_reset_info


indeed.




+}
+
+/* List all affected devices by bus reset */
+devices = &info->devices[0];
+
+/* Verify that we have all the groups required */
+for (i = 0; i < info->count; i++) {
+PCIHostDeviceAddress host;
+VFIOPCIDevice *tmp;
+VFIODevice *vbasedev_iter;
+bool found = false;
+
+host.domain = devices[i].segment;
+host.bus = devices[i].bus;
+host.slot = PCI_SLOT(devices[i].devfn);
+host.function = PCI_FUNC(devices[i].devfn);
+
+/* Skip the current device */
+if (vfio_pci_host_match(&host, &vdev->host)) {
+continue;
+}
+
+/* Ensure we own the group of the affected device */
+QLIST_FOREACH(group, &vfio_group_list, next) {
+if (group->groupid == devices[i].group_id) {
+break;
+}
+}
+
+if (!group) {
+error_report("vfio: Cannot enable AER for device %s, "
+ "depends on group %d which is not owned.",
+ vdev->vbasedev.name, devices[i].group_id);
+ret = -1;


You can use error codes on return, or you can init ret to -1 at 
declaration,

and delete some code lines.


+goto out;
+}
+
+/* Ensure affected devices for reset on/blow the bus */


I think you meant below, not blow.


+QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+continue;
+}
+tmp = container_of(vbasedev_iter, 

Re: [Qemu-devel] [RFC 0/3] Draft implementation of HPT resizing (qemu side)

2016-01-19 Thread David Gibson
On Tue, Jan 19, 2016 at 01:18:17PM +0530, Bharata B Rao wrote:
> On Mon, Jan 18, 2016 at 04:44:38PM +1100, David Gibson wrote:
> > Here is a draft qemu implementation of my proposed PAPR extension for
> > allowing runtime resizing of a KVM/ppc64 guest's hash page table.
> > That in turn will allow for more flexible memory hotplug.
> > 
> > This should work with the guest kernel side patches I also posted
> > recently [1].
> > 
> > Still required to make this into a full implementation:
> >   * Guest needs to auto-resize HPT on memory hotplug events
> > 
> >   * qemu needs to allocate HPT size based on current rather than
> > maximum memory if the guest is HPT resize aware
> > 
> >   * KVM host side implementation
> > 
> >   * PAPR standardization
> 
> So with the current patchset (QEMU and guest kernel changes), I should
> be able to change the HTAB size of a PR guest right ? I see the below
> failure though:

Uh.. to be honest I haven't really considered the KVM case at all.
I'm kind of surprised it didn't just refuse to do anything.

> [root@localhost ~]# cat /sys/kernel/debug/powerpc/pft-size 
> 24
> [root@localhost ~]# echo 26 > /sys/kernel/debug/powerpc/pft-size
> [   65.996845] lpar: Attempting to resize HPT to shift 26
> [   65.996845] lpar: Attempting to resize HPT to shift 26
> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
> [   66.113596] lpar: HPT resize to shift 26 complete (109 ms / 6 ms)
> 
> PR guest just hangs here while I see tons of below messages in
> the 1st level guest:
> 
> KVM can't copy data from 0x3fff99e91400!
> ...
> Couldn't emulate instruction 0x (op 0 xop 0)
> kvmppc_handle_exit_pr: emulation at 700 failed ()

Hm, not sure why that's happening.  At first I thought it was because
we weren't updating SDR1 with the address of the new htab, but that's
actually in there.  Maybe the KVM PR code isn't rereading it after
initial VM startup.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH v3 09/10] qemu-nbd: use no_argument/required_argument constants

2016-01-19 Thread Daniel P. Berrange
When declaring the 'struct option' array, use the standard
constants no_argument/required_argument, instead of magic
values 0 and 1.
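
For readers unfamiliar with the constants, here is a small standalone getopt_long() example in the same style. It is only a sketch: the two options and the parse_args() wrapper are made up for illustration and have nothing to do with the qemu-nbd option table.

```c
#include <getopt.h>
#include <stddef.h>

struct opts {
    int verbose;
    const char *port;
};

/* Parse "--verbose" / "-v" and "--port N" / "-p N". Returns 0 on success. */
static int parse_args(int argc, char **argv, struct opts *o)
{
    static const struct option lopt[] = {
        { "verbose", no_argument,       NULL, 'v' },
        { "port",    required_argument, NULL, 'p' },
        { NULL, 0, NULL, 0 }
    };
    int ch;

    o->verbose = 0;
    o->port = NULL;
    optind = 1; /* restart scanning in case of repeated calls */

    while ((ch = getopt_long(argc, argv, "vp:", lopt, NULL)) != -1) {
        switch (ch) {
        case 'v':
            o->verbose = 1;
            break;
        case 'p':
            o->port = optarg;
            break;
        default:
            return -1; /* unknown option or missing argument */
        }
    }
    return 0;
}
```

The named constants make it obvious at a glance which options take a value, which is easy to misread when the column is a bare 0 or 1.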

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-nbd.c | 47 ---
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 1776a3c..fbc6610 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -437,29 +437,30 @@ int main(int argc, char **argv)
 const char *sn_id_or_name = NULL;
 const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:";
 struct option lopt[] = {
-{ "help", 0, NULL, 'h' },
-{ "version", 0, NULL, 'V' },
-{ "bind", 1, NULL, 'b' },
-{ "port", 1, NULL, 'p' },
-{ "socket", 1, NULL, 'k' },
-{ "offset", 1, NULL, 'o' },
-{ "read-only", 0, NULL, 'r' },
-{ "partition", 1, NULL, 'P' },
-{ "connect", 1, NULL, 'c' },
-{ "disconnect", 0, NULL, 'd' },
-{ "snapshot", 0, NULL, 's' },
-{ "load-snapshot", 1, NULL, 'l' },
-{ "nocache", 0, NULL, 'n' },
-{ "cache", 1, NULL, QEMU_NBD_OPT_CACHE },
-{ "aio", 1, NULL, QEMU_NBD_OPT_AIO },
-{ "discard", 1, NULL, QEMU_NBD_OPT_DISCARD },
-{ "detect-zeroes", 1, NULL, QEMU_NBD_OPT_DETECT_ZEROES },
-{ "shared", 1, NULL, 'e' },
-{ "format", 1, NULL, 'f' },
-{ "persistent", 0, NULL, 't' },
-{ "verbose", 0, NULL, 'v' },
-{ "object", 1, NULL, QEMU_NBD_OPT_OBJECT },
-{ "image-opts", 0, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
+{ "help", no_argument, NULL, 'h' },
+{ "version", no_argument, NULL, 'V' },
+{ "bind", required_argument, NULL, 'b' },
+{ "port", required_argument, NULL, 'p' },
+{ "socket", required_argument, NULL, 'k' },
+{ "offset", required_argument, NULL, 'o' },
+{ "read-only", no_argument, NULL, 'r' },
+{ "partition", required_argument, NULL, 'P' },
+{ "connect", required_argument, NULL, 'c' },
+{ "disconnect", no_argument, NULL, 'd' },
+{ "snapshot", no_argument, NULL, 's' },
+{ "load-snapshot", required_argument, NULL, 'l' },
+{ "nocache", no_argument, NULL, 'n' },
+{ "cache", required_argument, NULL, QEMU_NBD_OPT_CACHE },
+{ "aio", required_argument, NULL, QEMU_NBD_OPT_AIO },
+{ "discard", required_argument, NULL, QEMU_NBD_OPT_DISCARD },
+{ "detect-zeroes", required_argument, NULL,
+  QEMU_NBD_OPT_DETECT_ZEROES },
+{ "shared", required_argument, NULL, 'e' },
+{ "format", required_argument, NULL, 'f' },
+{ "persistent", no_argument, NULL, 't' },
+{ "verbose", no_argument, NULL, 'v' },
+{ "object", required_argument, NULL, QEMU_NBD_OPT_OBJECT },
+{ "image-opts", no_argument, NULL, QEMU_NBD_OPT_IMAGE_OPTS },
 { NULL, 0, NULL, 0 }
 };
 int ch;
-- 
2.5.0




[Qemu-devel] [PATCH v4 3/4] char: don't assume telnet initialization will not block

2016-01-19 Thread Daniel P. Berrange
The current code for doing telnet initialization is writing to
a socket without checking the return status. While it is highly
unlikely to be a problem when writing to a bare socket, as the
buffers are large enough to prevent blocking, this cannot be
assumed safe with TLS sockets. So write the telnet initialization
code into a memory buffer and then use an I/O watch to fully
send the data.
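
The buffering approach described above can be sketched without any QEMU or GLib types. The code below is an illustration under assumptions: the writer callback, the WOULD_BLOCK value and every name are hypothetical stand-ins, and the sink writer exists only to simulate a channel that accepts a few bytes at a time.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

#define WOULD_BLOCK (-2) /* hypothetical "try again later" result */

typedef ssize_t (*write_fn)(void *ctx, const char *buf, size_t len);

struct pending {
    char buf[12];
    size_t len;
};

/* Queue the four 3-byte telnet negotiation sequences. */
static void pending_init_telnet(struct pending *p)
{
    static const char seq[12] = {
        '\xff', '\xfb', '\x01',  /* IAC WILL ECHO */
        '\xff', '\xfb', '\x03',  /* IAC WILL Suppress go ahead */
        '\xff', '\xfb', '\x00',  /* IAC WILL Binary */
        '\xff', '\xfd', '\x00',  /* IAC DO Binary */
    };
    memcpy(p->buf, seq, sizeof(seq));
    p->len = sizeof(seq);
}

/* One "writable" event: 1 = keep watching, 0 = all sent, -1 = error. */
static int pending_flush(struct pending *p, write_fn wr, void *ctx)
{
    ssize_t ret = wr(ctx, p->buf, p->len);

    if (ret == WOULD_BLOCK) {
        ret = 0;
    } else if (ret < 0) {
        return -1;
    }
    p->len -= (size_t)ret;
    if (p->len == 0) {
        return 0;
    }
    /* Move the unsent tail to the front for the next attempt. */
    memmove(p->buf, p->buf + ret, p->len);
    return 1;
}

/* Demo writer that accepts at most 'limit' bytes per call. */
struct sink {
    unsigned char data[32];
    size_t used;
    size_t limit;
};

static ssize_t sink_write(void *ctx, const char *buf, size_t len)
{
    struct sink *s = ctx;
    size_t n = len < s->limit ? len : s->limit;

    memcpy(s->data + s->used, buf, n);
    s->used += n;
    return (ssize_t)n;
}
```

With a sink limited to 5 bytes per call, three flushes are needed to drain the 12-byte buffer, and the bytes arrive in their original order.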

Signed-off-by: Daniel P. Berrange 
---
 qemu-char.c | 87 -
 1 file changed, 69 insertions(+), 18 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 8e9156a..f0cea8a 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2877,19 +2877,70 @@ static void tcp_chr_update_read_handler(CharDriverState 
*chr)
 }
 }
 
-#define IACSET(x,a,b,c) x[0] = a; x[1] = b; x[2] = c;
-static void tcp_chr_telnet_init(QIOChannel *ioc)
+typedef struct {
+CharDriverState *chr;
+char buf[12];
+size_t buflen;
+} TCPCharDriverTelnetInit;
+
+static gboolean tcp_chr_telnet_init_io(QIOChannel *ioc,
+   GIOCondition cond G_GNUC_UNUSED,
+   gpointer user_data)
+{
+TCPCharDriverTelnetInit *init = user_data;
+ssize_t ret;
+
+ret = qio_channel_write(ioc, init->buf, init->buflen, NULL);
+if (ret < 0) {
+if (ret == QIO_CHANNEL_ERR_BLOCK) {
+ret = 0;
+} else {
+tcp_chr_disconnect(init->chr);
+return FALSE;
+}
+}
+init->buflen -= ret;
+
+if (init->buflen == 0) {
+tcp_chr_connect(init->chr);
+return FALSE;
+}
+
+memmove(init->buf, init->buf + ret, init->buflen);
+
+return TRUE;
+}
+
+static void tcp_chr_telnet_init(CharDriverState *chr)
 {
-char buf[3];
-/* Send the telnet negotion to put telnet in binary, no echo, single char 
mode */
-IACSET(buf, 0xff, 0xfb, 0x01);  /* IAC WILL ECHO */
-qio_channel_write(ioc, buf, 3, NULL);
-IACSET(buf, 0xff, 0xfb, 0x03);  /* IAC WILL Suppress go ahead */
-qio_channel_write(ioc, buf, 3, NULL);
-IACSET(buf, 0xff, 0xfb, 0x00);  /* IAC WILL Binary */
-qio_channel_write(ioc, buf, 3, NULL);
-IACSET(buf, 0xff, 0xfd, 0x00);  /* IAC DO Binary */
-qio_channel_write(ioc, buf, 3, NULL);
+TCPCharDriver *s = chr->opaque;
+TCPCharDriverTelnetInit *init =
+g_new0(TCPCharDriverTelnetInit, 1);
+size_t n = 0;
+
+init->chr = chr;
+init->buflen = 12;
+
+#define IACSET(x, a, b, c)  \
+do {\
+x[n++] = a; \
+x[n++] = b; \
+x[n++] = c; \
+} while (0)
+
+/* Prep the telnet negotiation to put telnet in binary,
+ * no echo, single char mode */
+IACSET(init->buf, 0xff, 0xfb, 0x01);  /* IAC WILL ECHO */
+IACSET(init->buf, 0xff, 0xfb, 0x03);  /* IAC WILL Suppress go ahead */
+IACSET(init->buf, 0xff, 0xfb, 0x00);  /* IAC WILL Binary */
+IACSET(init->buf, 0xff, 0xfd, 0x00);  /* IAC DO Binary */
+
+#undef IACSET
+
+qio_channel_add_watch(
+s->ioc, G_IO_OUT,
+tcp_chr_telnet_init_io,
+init, NULL);
 }
 
 static int tcp_chr_new_client(CharDriverState *chr, QIOChannelSocket *sioc)
@@ -2909,7 +2960,12 @@ static int tcp_chr_new_client(CharDriverState *chr, 
QIOChannelSocket *sioc)
 g_source_remove(s->listen_tag);
 s->listen_tag = 0;
 }
-tcp_chr_connect(chr);
+
+if (s->do_telnetopt) {
+tcp_chr_telnet_init(chr);
+} else {
+tcp_chr_connect(chr);
+}
 
 return 0;
 }
@@ -2935,7 +2991,6 @@ static gboolean tcp_chr_accept(QIOChannel *channel,
void *opaque)
 {
 CharDriverState *chr = opaque;
-TCPCharDriver *s = chr->opaque;
 QIOChannelSocket *sioc;
 
 sioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(channel),
@@ -2944,10 +2999,6 @@ static gboolean tcp_chr_accept(QIOChannel *channel,
 return TRUE;
 }
 
-if (s->do_telnetopt) {
-tcp_chr_telnet_init(QIO_CHANNEL(sioc));
-}
-
 tcp_chr_new_client(chr, sioc);
 
 object_unref(OBJECT(sioc));
-- 
2.5.0




[Qemu-devel] [PATCH v4 4/4] char: introduce support for TLS encrypted TCP chardev backend

2016-01-19 Thread Daniel P. Berrange
This integrates support for QIOChannelTLS object in the TCP
chardev backend. If the 'tls-creds=NAME' option is passed with
the '-chardev tcp' argument, then it will set up the chardev
such that the client is required to establish a TLS handshake
when connecting. There is no support for checking the client
certificate against ACLs in this initial patch. This is pending
work to QOM-ify the ACL object code.

A complete invocation to run QEMU as the server for a TLS
encrypted serial dev might be:

  $ qemu-system-x86_64 \
  -nodefconfig -nodefaults -device sga -display none \
  -chardev socket,id=s0,host=127.0.0.1,port=9000,tls-creds=tls0,server \
  -device isa-serial,chardev=s0 \
  -object tls-creds-x509,id=tls0,endpoint=server,verify-peer=off,\
 dir=/home/berrange/security/qemutls

To test with the gnutls-cli tool as the client:

  $ gnutls-cli --priority=NORMAL -p 9000 \
   --x509cafile=/home/berrange/security/qemutls/ca-cert.pem \
   127.0.0.1

If QEMU was told to use the 'anon' credential type, then use the
priority string 'NORMAL:+ANON-DH' with gnutls-cli.

Alternatively, if setting up a chardev to operate as a client,
then the TLS credentials registered must be for the client
endpoint. First a TLS server must be set up, which can be done
with the gnutls-serv tool:

  $ gnutls-serv --priority=NORMAL -p 9000 --echo \
   --x509cafile=/home/berrange/security/qemutls/ca-cert.pem \
   --x509certfile=/home/berrange/security/qemutls/server-cert.pem \
   --x509keyfile=/home/berrange/security/qemutls/server-key.pem

Then QEMU can connect with

  $ qemu-system-x86_64 \
  -nodefconfig -nodefaults -device sga -display none \
  -chardev socket,id=s0,host=127.0.0.1,port=9000,tls-creds=tls0 \
  -device isa-serial,chardev=s0 \
  -object tls-creds-x509,id=tls0,endpoint=client,\
dir=/home/berrange/security/qemutls

Signed-off-by: Daniel P. Berrange 
---
 qapi-schema.json |   2 +
 qemu-char.c  | 136 ++-
 qemu-options.hx  |   9 +++-
 3 files changed, 134 insertions(+), 13 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index b3038b2..8d04897 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3146,6 +3146,7 @@
 #
 # @addr: socket address to listen on (server=true)
 #or connect to (server=false)
+# @tls-creds: #optional the ID of the TLS credentials object (since 2.6)
 # @server: #optional create server socket (default: true)
 # @wait: #optional wait for incoming connection on server
 #sockets (default: false).
@@ -3160,6 +3161,7 @@
 # Since: 1.4
 ##
 { 'struct': 'ChardevSocket', 'data': { 'addr'   : 'SocketAddress',
+ '*tls-creds'  : 'str',
  '*server': 'bool',
  '*wait'  : 'bool',
  '*nodelay'   : 'bool',
diff --git a/qemu-char.c b/qemu-char.c
index f0cea8a..7ded3c2 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -35,6 +35,7 @@
 #include "qemu/base64.h"
 #include "io/channel-socket.h"
 #include "io/channel-file.h"
+#include "io/channel-tls.h"
 
 #include 
 #include 
@@ -2532,9 +2533,11 @@ static CharDriverState 
*qemu_chr_open_udp(QIOChannelSocket *sioc,
 /* TCP Net console */
 
 typedef struct {
-QIOChannel *ioc;
+QIOChannel *ioc; /* Client I/O channel */
+QIOChannelSocket *sioc; /* Client master channel */
 QIOChannelSocket *listen_ioc;
 guint listen_tag;
+QCryptoTLSCreds *tls_creds;
 int connected;
 int max_size;
 int do_telnetopt;
@@ -2776,6 +2779,8 @@ static void tcp_chr_disconnect(CharDriverState *chr)
 QIO_CHANNEL(s->listen_ioc), G_IO_IN, tcp_chr_accept, chr, NULL);
 }
 remove_fd_in_watch(chr);
+object_unref(OBJECT(s->sioc));
+s->sioc = NULL;
 object_unref(OBJECT(s->ioc));
 s->ioc = NULL;
 g_free(chr->filename);
@@ -2849,12 +2854,12 @@ static void tcp_chr_connect(void *opaque)
 {
 CharDriverState *chr = opaque;
 TCPCharDriver *s = chr->opaque;
-QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(s->ioc);
 
 g_free(chr->filename);
-chr->filename = sockaddr_to_str(&sioc->localAddr, sioc->localAddrLen,
-&sioc->remoteAddr, sioc->remoteAddrLen,
-s->is_listen, s->is_telnet);
+chr->filename = sockaddr_to_str(
+&s->sioc->localAddr, s->sioc->localAddrLen,
+&s->sioc->remoteAddr, s->sioc->remoteAddrLen,
+s->is_listen, s->is_telnet);
 
 s->connected = 1;
 if (s->ioc) {
@@ -2943,6 +2948,57 @@ static void tcp_chr_telnet_init(CharDriverState *chr)
 init, NULL);
 }
 
+
+static void tcp_chr_tls_handshake(Object *source,
+  Error *err,
+  gpointer user_data)
+{
+CharDriverState *chr = user_data;
+TCPCharDriver *s = chr->opaque;
+
+if (err) {
+

[Qemu-devel] [PATCH v4 0/4] Convert chardevs to QIOChannel & add TLS support

2016-01-19 Thread Daniel P. Berrange
This is an update of patches previously shown in an RFC posting

  RFC: https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg00829.html
   v1: https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg04222.html
   v2: https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg03823.html
   v3: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01601.html

This short series converts the chardev backends to use the new
QIOChannel framework. After doing so it then adds support for
TLS encryption of TCP chardevs. The commit message in the last
patch explains the TLS encryption in detail.

The GIOChannel -> QIOChannel conversion has been validated by
running the qtest framework, which indeed found a few bugs
initially which I have since fixed.

The TLS support has been tested for interoperability using
the gnutls-serv and gnutls-cli programs, which provide
stub TLS servers/clients respectively.

Changed in v4:

 - Rebase to resolve conflicts with recent merged patches

Changed in v3:

 - Fix buffer update after partial send of telnet data

Daniel P. Berrange (4):
  char: remove fixed length filename allocation
  char: convert from GIOChannel to QIOChannel
  char: don't assume telnet initialization will not block
  char: introduce support for TLS encrypted TCP chardev backend

 qapi-schema.json |   2 +
 qemu-char.c  | 913 ---
 qemu-options.hx  |   9 +-
 tests/Makefile   |   2 +-
 4 files changed, 479 insertions(+), 447 deletions(-)

-- 
2.5.0




[Qemu-devel] [PATCH v4 1/4] char: remove fixed length filename allocation

2016-01-19 Thread Daniel P. Berrange
A variety of places were snprintf()ing into a fixed length
filename buffer. Some of the buffers were stack allocated,
while another was heap allocated with g_malloc(). Switch
them all to heap allocated using g_strdup_printf() avoiding
arbitrary length restrictions.

This also facilitates later patches which will want to
populate the filename by calling external functions
which do not support use of a pre-allocated buffer.
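
For readers without GLib to hand, the idea can be sketched in plain C.
This is only an illustration under stated assumptions:
strdup_printf_sketch() and unix_label() are hypothetical names, and
the real patch uses g_strdup_printf() (released with g_free()), not
this helper.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a GLib or QEMU function: format into a
 * freshly malloc()ed string sized to fit. Caller releases with free(). */
static char *strdup_printf_sketch(const char *fmt, ...)
{
    va_list ap, ap2;
    int len;
    char *out;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);      /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }
    out = malloc((size_t)len + 1);
    if (out) {
        vsnprintf(out, (size_t)len + 1, fmt, ap2);  /* second pass: write */
    }
    va_end(ap2);
    return out;
}

/* Mirrors the shape of the "unix:" case in the patch. */
static char *unix_label(const char *path, int is_listen)
{
    return strdup_printf_sketch("unix:%s%s", path,
                                is_listen ? ",server" : "");
}
```

A path longer than the old 256 byte limit comes back whole rather
than truncated, at the cost of the caller owning the returned string.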

Signed-off-by: Daniel P. Berrange 
---
 qemu-char.c | 86 +++--
 1 file changed, 44 insertions(+), 42 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index e133f4f..8e96f90 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -88,39 +88,37 @@
 
 #define READ_BUF_LEN 4096
 #define READ_RETRIES 10
-#define CHR_MAX_FILENAME_SIZE 256
 #define TCP_MAX_FDS 16
 
 /***/
 /* Socket address helpers */
 
-static int SocketAddress_to_str(char *dest, int max_len,
-const char *prefix, SocketAddress *addr,
-bool is_listen, bool is_telnet)
+static char *SocketAddress_to_str(const char *prefix, SocketAddress *addr,
+  bool is_listen, bool is_telnet)
 {
 switch (addr->type) {
 case SOCKET_ADDRESS_KIND_INET:
-return snprintf(dest, max_len, "%s%s:%s:%s%s", prefix,
-is_telnet ? "telnet" : "tcp", addr->u.inet->host,
-addr->u.inet->port, is_listen ? ",server" : "");
+return g_strdup_printf("%s%s:%s:%s%s", prefix,
+   is_telnet ? "telnet" : "tcp", 
addr->u.inet->host,
+   addr->u.inet->port, is_listen ? ",server" : "");
 break;
 case SOCKET_ADDRESS_KIND_UNIX:
-return snprintf(dest, max_len, "%sunix:%s%s", prefix,
-addr->u.q_unix->path, is_listen ? ",server" : "");
+return g_strdup_printf("%sunix:%s%s", prefix,
+   addr->u.q_unix->path,
+   is_listen ? ",server" : "");
 break;
 case SOCKET_ADDRESS_KIND_FD:
-return snprintf(dest, max_len, "%sfd:%s%s", prefix, addr->u.fd->str,
-is_listen ? ",server" : "");
+return g_strdup_printf("%sfd:%s%s", prefix, addr->u.fd->str,
+   is_listen ? ",server" : "");
 break;
 default:
 abort();
 }
 }
 
-static int sockaddr_to_str(char *dest, int max_len,
-   struct sockaddr_storage *ss, socklen_t ss_len,
-   struct sockaddr_storage *ps, socklen_t ps_len,
-   bool is_listen, bool is_telnet)
+static char *sockaddr_to_str(struct sockaddr_storage *ss, socklen_t ss_len,
+ struct sockaddr_storage *ps, socklen_t ps_len,
+ bool is_listen, bool is_telnet)
 {
 char shost[NI_MAXHOST], sserv[NI_MAXSERV];
 char phost[NI_MAXHOST], pserv[NI_MAXSERV];
@@ -129,9 +127,9 @@ static int sockaddr_to_str(char *dest, int max_len,
 switch (ss->ss_family) {
 #ifndef _WIN32
 case AF_UNIX:
-return snprintf(dest, max_len, "unix:%s%s",
-((struct sockaddr_un *)(ss))->sun_path,
-is_listen ? ",server" : "");
+return g_strdup_printf("unix:%s%s",
+   ((struct sockaddr_un *)(ss))->sun_path,
+   is_listen ? ",server" : "");
 #endif
 case AF_INET6:
 left  = "[";
@@ -142,14 +140,14 @@ static int sockaddr_to_str(char *dest, int max_len,
 sserv, sizeof(sserv), NI_NUMERICHOST | NI_NUMERICSERV);
 getnameinfo((struct sockaddr *) ps, ps_len, phost, sizeof(phost),
 pserv, sizeof(pserv), NI_NUMERICHOST | NI_NUMERICSERV);
-return snprintf(dest, max_len, "%s:%s%s%s:%s%s <-> %s%s%s:%s",
-is_telnet ? "telnet" : "tcp",
-left, shost, right, sserv,
-is_listen ? ",server" : "",
-left, phost, right, pserv);
+return g_strdup_printf("%s:%s%s%s:%s%s <-> %s%s%s:%s",
+   is_telnet ? "telnet" : "tcp",
+   left, shost, right, sserv,
+   is_listen ? ",server" : "",
+   left, phost, right, pserv);
 
 default:
-return snprintf(dest, max_len, "unknown");
+return g_strdup_printf("unknown");
 }
 }
 
@@ -1074,15 +1072,18 @@ static CharDriverState *qemu_chr_open_pipe(const char 
*id,
 {
 ChardevHostdev *opts = backend->u.pipe;
 int fd_in, fd_out;
-char filename_in[CHR_MAX_FILENAME_SIZE];
-char filename_out[CHR_MAX_FILENAME_SIZE];
+char *filename_in;
+char *filename_out;
 const char *filename = opts->device;

[Qemu-devel] [PATCH v4 2/4] char: convert from GIOChannel to QIOChannel

2016-01-19 Thread Daniel P. Berrange
In preparation for introducing TLS support to the TCP chardev
backend, convert existing chardev code from using GIOChannel
to QIOChannel. This simplifies the chardev code by removing
most of the OS platform conditional code for dealing with
file descriptor passing.

Signed-off-by: Daniel P. Berrange 
---
 qemu-char.c| 648 ++---
 tests/Makefile |   2 +-
 2 files changed, 254 insertions(+), 396 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 8e96f90..8e9156a 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -33,6 +33,8 @@
 #include "qapi/qmp-output-visitor.h"
 #include "qapi-visit.h"
 #include "qemu/base64.h"
+#include "io/channel-socket.h"
+#include "io/channel-file.h"
 
 #include 
 #include 
@@ -766,7 +768,7 @@ typedef struct IOWatchPoll
 {
 GSource parent;
 
-GIOChannel *channel;
+QIOChannel *ioc;
 GSource *src;
 
 IOCanReadHandler *fd_can_read;
@@ -789,8 +791,8 @@ static gboolean io_watch_poll_prepare(GSource *source, gint 
*timeout_)
 }
 
 if (now_active) {
-iwp->src = g_io_create_watch(iwp->channel,
- G_IO_IN | G_IO_ERR | G_IO_HUP | 
G_IO_NVAL);
+iwp->src = qio_channel_create_watch(
+iwp->ioc, G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
 g_source_set_callback(iwp->src, iwp->fd_read, iwp->opaque, NULL);
 g_source_attach(iwp->src, NULL);
 } else {
@@ -836,9 +838,9 @@ static GSourceFuncs io_watch_poll_funcs = {
 };
 
 /* Can only be used for read */
-static guint io_add_watch_poll(GIOChannel *channel,
+static guint io_add_watch_poll(QIOChannel *ioc,
IOCanReadHandler *fd_can_read,
-   GIOFunc fd_read,
+   QIOChannelFunc fd_read,
gpointer user_data)
 {
 IOWatchPoll *iwp;
@@ -847,7 +849,7 @@ static guint io_add_watch_poll(GIOChannel *channel,
 iwp = (IOWatchPoll *) g_source_new(&io_watch_poll_funcs, 
sizeof(IOWatchPoll));
 iwp->fd_can_read = fd_can_read;
 iwp->opaque = user_data;
-iwp->channel = channel;
+iwp->ioc = ioc;
 iwp->fd_read = (GSourceFunc) fd_read;
 iwp->src = NULL;
 
@@ -883,79 +885,50 @@ static void remove_fd_in_watch(CharDriverState *chr)
 }
 }
 
-#ifndef _WIN32
-static GIOChannel *io_channel_from_fd(int fd)
-{
-GIOChannel *chan;
-
-if (fd == -1) {
-return NULL;
-}
 
-chan = g_io_channel_unix_new(fd);
-
-g_io_channel_set_encoding(chan, NULL, NULL);
-g_io_channel_set_buffered(chan, FALSE);
-
-return chan;
-}
-#endif
-
-static GIOChannel *io_channel_from_socket(int fd)
+static int io_channel_send_full(QIOChannel *ioc,
+const void *buf, size_t len,
+int *fds, size_t nfds)
 {
-GIOChannel *chan;
+size_t offset = 0;
 
-if (fd == -1) {
-return NULL;
-}
+while (offset < len) {
+ssize_t ret = 0;
+struct iovec iov = { .iov_base = (char *)buf + offset,
+ .iov_len = len - offset };
+
+ret = qio_channel_writev_full(
+ioc, &iov, 1,
+fds, nfds, NULL);
+if (ret == QIO_CHANNEL_ERR_BLOCK) {
+errno = EAGAIN;
+return -1;
+} else if (ret < 0) {
+if (offset) {
+return offset;
+}
 
-#ifdef _WIN32
-chan = g_io_channel_win32_new_socket(fd);
-#else
-chan = g_io_channel_unix_new(fd);
-#endif
+errno = EINVAL;
+return -1;
+}
 
-g_io_channel_set_encoding(chan, NULL, NULL);
-g_io_channel_set_buffered(chan, FALSE);
+offset += ret;
+}
 
-return chan;
+return offset;
 }
 
-static int io_channel_send(GIOChannel *fd, const void *buf, size_t len)
-{
-size_t offset = 0;
-GIOStatus status = G_IO_STATUS_NORMAL;
-
-while (offset < len && status == G_IO_STATUS_NORMAL) {
-gsize bytes_written = 0;
 
-status = g_io_channel_write_chars(fd, buf + offset, len - offset,
-  &bytes_written, NULL);
-offset += bytes_written;
-}
-
-if (offset > 0) {
-return offset;
-}
-switch (status) {
-case G_IO_STATUS_NORMAL:
-g_assert(len == 0);
-return 0;
-case G_IO_STATUS_AGAIN:
-errno = EAGAIN;
-return -1;
-default:
-break;
-}
-errno = EINVAL;
-return -1;
+static int io_channel_send(QIOChannel *ioc, const void *buf, size_t len)
+{
+return io_channel_send_full(ioc, buf, len, NULL, 0);
 }
 
 #ifndef _WIN32
 
 typedef struct FDCharDriver {
 CharDriverState *chr;
-GIOChannel *fd_in, *fd_out;
+QIOChannel *ioc_in, *ioc_out;
 int max_size;
 } FDCharDriver;
 
@@ -964,17 +937,16 @@ static int fd_chr_write(CharDriverState *chr, const 
uint8_t *buf, int len)
 {
 FDCharDriver *s = chr->opaque;
 
-

Re: [Qemu-devel] [PATCH 1/2] 9pfs: use error_report() instead of fprintf(stderr)

2016-01-19 Thread Greg Kurz
On Mon, 18 Jan 2016 17:35:25 +0100
Markus Armbruster  wrote:

> Greg Kurz  writes:
> 
> > Signed-off-by: Greg Kurz 
> > ---

I agree to all your suggestions. Thanks !

> >  hw/9pfs/9p-handle.c |5 +++--
> >  hw/9pfs/9p-local.c  |   15 ---
> >  hw/9pfs/9p-proxy.c  |   12 ++--
> >  hw/9pfs/9p.c|2 +-
> >  4 files changed, 18 insertions(+), 16 deletions(-)
> >
> > diff --git a/hw/9pfs/9p-handle.c b/hw/9pfs/9p-handle.c
> > index 58b77b4c942d..8ba88775a2b6 100644
> > --- a/hw/9pfs/9p-handle.c
> > +++ b/hw/9pfs/9p-handle.c
> > @@ -19,6 +19,7 @@
> >  #include 
> >  #include 
> >  #include "qemu/xattr.h"
> > +#include "qemu/error-report.h"
> >  #include 
> >  #include 
> >  #ifdef CONFIG_LINUX_MAGIC_H
> > @@ -655,12 +656,12 @@ static int handle_parse_opts(QemuOpts *opts, struct 
> > FsDriverEntry *fse)
> >  const char *path = qemu_opt_get(opts, "path");
> >  
> >  if (sec_model) {
> > -fprintf(stderr, "Invalid argument security_model specified with 
> > handle fsdriver\n");
> > +error_report("Invalid argument security_model specified with 
> > handle fsdriver");
> >  return -1;
> >  }
> >  
> >  if (!path) {
> > -fprintf(stderr, "fsdev: No path specified.\n");
> > +error_report("fsdev: No path specified.");  
> 
> Recommend to drop the period while there.
> 
> >  return -1;
> >  }
> >  fse->path = g_strdup(path);
> > diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> > index bf63eab729ad..9c25ab2db26b 100644
> > --- a/hw/9pfs/9p-local.c
> > +++ b/hw/9pfs/9p-local.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >  #include 
> >  #include "qemu/xattr.h"
> > +#include "qemu/error-report.h"
> >  #include 
> >  #include 
> >  #ifdef CONFIG_LINUX_MAGIC_H
> > @@ -1209,9 +1210,9 @@ static int local_parse_opts(QemuOpts *opts, struct 
> > FsDriverEntry *fse)
> >  const char *path = qemu_opt_get(opts, "path");
> >  
> >  if (!sec_model) {
> > -fprintf(stderr, "security model not specified, "
> > -"local fs needs security model\nvalid options are:"
> > -"\tsecurity_model=[passthrough|mapped|none]\n");
> > +error_report("security model not specified, local fs needs 
> > security model");
> > +error_printf("valid options are:"
> > + 
> > "\tsecurity_model=[passthrough|mapped-xattr|mapped-file|none]\n");
> >  return -1;
> >  }
> >  
> > @@ -1225,14 +1226,14 @@ static int local_parse_opts(QemuOpts *opts, struct 
> > FsDriverEntry *fse)
> >  } else if (!strcmp(sec_model, "mapped-file")) {
> >  fse->export_flags |= V9FS_SM_MAPPED_FILE;
> >  } else {
> > -fprintf(stderr, "Invalid security model %s specified, valid 
> > options are"
> > -"\n\t [passthrough|mapped-xattr|mapped-file|none]\n",
> > -sec_model);
> > +error_report("Invalid security model %s specified, valid options 
> > are",
> > + sec_model);
> > +error_printf("\t [passthrough|mapped-xattr|mapped-file|none]\n");  
> 
> Neater:
> 
> error_report("Invalid security model %s specified", sec_model);
> error_printf("valid options are:"
>  "\t[passthrough|mapped-xattr|mapped-file|none]\n");
> 
> >  return -1;
> >  }
> >  
> >  if (!path) {
> > -fprintf(stderr, "fsdev: No path specified.\n");
> > +error_report("fsdev: No path specified.");  
> 
> Recommend to drop the period while there.
> 
> >  return -1;
> >  }
> >  fse->path = g_strdup(path);
> > diff --git a/hw/9pfs/9p-proxy.c b/hw/9pfs/9p-proxy.c
> > index 73d00dd74d11..72b9952d7c8b 100644
> > --- a/hw/9pfs/9p-proxy.c
> > +++ b/hw/9pfs/9p-proxy.c
> > @@ -1100,19 +1100,19 @@ static int connect_namedsocket(const char *path)
> >  struct sockaddr_un helper;
> >  
> >  if (strlen(path) >= sizeof(helper.sun_path)) {
> > -fprintf(stderr, "Socket name too large\n");
> > +error_report("Socket name too large");  
> 
> "too long" would be clearer, I think.
> 
> >  return -1;
> >  }
> >  sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
> >  if (sockfd < 0) {
> > -fprintf(stderr, "failed to create socket: %s\n", strerror(errno));
> > +error_report("failed to create socket: %s", strerror(errno));
> >  return -1;
> >  }
> >  strcpy(helper.sun_path, path);
> >  helper.sun_family = AF_UNIX;
> >  size = strlen(helper.sun_path) + sizeof(helper.sun_family);
> >  if (connect(sockfd, (struct sockaddr *)&helper, size) < 0) {
> > -fprintf(stderr, "failed to connect to %s: %s\n", path, 
> > strerror(errno));
> > +error_report("failed to connect to %s: %s", path, strerror(errno));
> >  close(sockfd);
> >  return -1;
> >  }
> > @@ -1128,11 +1128,11 @@ static int proxy_parse_opts(QemuOpts *opts, struct 
> 

Re: [Qemu-devel] [PATCH 0/2] 9pfs: fsdev: use error_report() instead of fprintf(stderr)

2016-01-19 Thread Greg Kurz
On Mon, 18 Jan 2016 17:39:28 +0100
Markus Armbruster  wrote:

> Greg Kurz  writes:
> 
> > Hi,
> >
> > This series moves all the 9pfs/fsdev code to use error_report(), with the
> > notable exception of virtfs-proxy-helper, which doesn't need it.
> >
> > Markus,
> >
> > Should these patches go through your tree ? Or can they go through my 9p tree
> > if you ack them ?  
> 
> It can certainly go through your tree!
> 
> My tree is meant for crosscutting error work.  I also offer it
> to maintainers who prefer to leave the pull request to me.
> 
> PATCH 1 could use a bit of polish, and I encourage you to respin.  But
> it's not wrong, therefore
> 
> Series
> Reviewed-by: Markus Armbruster 
> 

Thanks Markus !

I've respun the series with your suggestions. Is it expected I repost to
qemu-devel before doing a pull request ?

Cheers.

--
Greg




[Qemu-devel] [PATCH] ehci: update irq on reset

2016-01-19 Thread Gerd Hoffmann
After clearing the status register we also have to update the irq line
status.  Otherwise an irq which happens to be pending at reset time
causes an interrupt storm.  And the guest can't stop it, as the status
register doesn't indicate any pending interrupt.

Both NetBSD and FreeBSD hang on shutdown because of that.
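
To make the failure mode concrete, here is a toy model in C. It is
not the EHCI device code: ToyController and the toy_* functions are
illustrative assumptions that only capture the idea of a
level-triggered line derived from status and enable bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not hcd-ehci.c: all names here are hypothetical. */
typedef struct {
    uint32_t status;    /* pending interrupt bits */
    uint32_t enable;    /* interrupt enable mask */
    bool irq_line;      /* level seen by the interrupt controller */
} ToyController;

/* Recompute the line level from the current register contents. */
static void toy_update_irq(ToyController *c)
{
    c->irq_line = (c->status & c->enable) != 0;
}

static void toy_raise(ToyController *c, uint32_t bits)
{
    c->status |= bits;
    toy_update_irq(c);
}

/* Reset that clears the status but forgets the line (the bug). */
static void toy_reset_buggy(ToyController *c)
{
    c->status = 0;
}

/* Reset that also recomputes the line (the fix). */
static void toy_reset_fixed(ToyController *c)
{
    c->status = 0;
    toy_update_irq(c);
}
```

After toy_reset_buggy() the line is still high while the status reads
zero, so a guest handler would find nothing to acknowledge. After
toy_reset_fixed() the line drops together with the status.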

Cc: qemu-sta...@nongnu.org
Reported-by: Andrey Korolyov 
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-ehci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index d07f228..d2b7fa2 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -865,6 +865,7 @@ void ehci_reset(void *opaque)
 s->usbsts = USBSTS_HALT;
 s->usbsts_pending = 0;
 s->usbsts_frindex = 0;
+ehci_update_irq(s);
 
 s->astate = EST_INACTIVE;
 s->pstate = EST_INACTIVE;
-- 
1.8.3.1




Re: [Qemu-devel] usb-storage assertions

2016-01-19 Thread Gerd Hoffmann
  Hi,

> Probably not enough with driver subsystem to point even at the obvious
> issue in the EHCI driver. I`d start with slowing down an emulated CPU
> 10...100 times via its thread cg, leaving emulator code hanging with
> enough CPU cycles and check if the issue is still here. If roots of
> the crash or endless loop are timing-related, they either would change
> appearance significantly or disappear completely (or vice versa, slow
> an emulator thread). If you don`t have enough time for such blind
> testing, I may check it in a next few days. Since I`ve seen interrupt
> storm complaint on FreeBSD within same conditions, I strongly prefer
> the idea of a race-driven behavior.

Ha!  That nailed it.  /me was looking for a loop in the code, waiting
for the device to finish reset or something like that.  But it
turned out to be an interrupt storm indeed.

cheers,
  Gerd




[Qemu-devel] [PATCH] hw/arm/virt: Add always-on property to the virt board timer

2016-01-19 Thread Christoffer Dall
The virt board has an arch timer, which is always on.  Emit the
"always-on" property to indicate to Linux that it can switch off the
periodic timer and reduce the number of interrupts injected into a
guest.

Signed-off-by: Christoffer Dall 
---
 hw/arm/virt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 05f9087..265fe9a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, 
int gictype)
 qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",
 "arm,armv7-timer");
 }
+qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);
 qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,
-- 
2.1.2.330.g565301e.dirty




Re: [Qemu-devel] [PATCH 1/5] ide: Prohibit RESET on IDE drives

2016-01-19 Thread Paolo Bonzini


On 19/01/2016 05:51, John Snow wrote:
> +/* Only RESET is allowed to an ATAPI device while BSY and/or DRQ are 
> set. */
> +if (s->status & (BUSY_STAT|DRQ_STAT)) {
> +if (!(val == WIN_DEVICE_RESET) && (s->drive_kind == IDE_CD)) {

I was going to complain about Pascal-ish parentheses, but actually I
think there is a bug here; the expression just looks weird.

Did you mean

if (!(val == WIN_DEVICE_RESET && s->drive_kind == IDE_CD))

or equivalently applying de Morgan's law:

if (s->drive_kind != IDE_CD || val != WIN_DEVICE_RESET)

?
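
For what it's worth, the difference between the posted condition and
the two forms above can be checked mechanically. This is just a
sketch: is_reset stands for (val == WIN_DEVICE_RESET), is_cd for
(s->drive_kind == IDE_CD), and the function names are made up for
illustration. It does not say which behaviour was intended.

```c
#include <assert.h>
#include <stdbool.h>

/* As posted: the negation applies to the first test only. */
static bool blocked_as_posted(bool is_reset, bool is_cd)
{
    return !is_reset && is_cd;
}

/* First form above: negate the whole conjunction. */
static bool blocked_negated_and(bool is_reset, bool is_cd)
{
    return !(is_reset && is_cd);
}

/* Second form above: the same thing after de Morgan's law. */
static bool blocked_de_morgan(bool is_reset, bool is_cd)
{
    return !is_cd || !is_reset;
}
```

The two rewritten forms agree on all four inputs. The posted form
differs whenever the drive is not a CD: it never takes the branch
there, while the rewritten forms always do.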

Paolo

> +return;




[Qemu-devel] [PATCH] nbd: use client_close() when negotiate phase fails

2016-01-19 Thread Daniel P. Berrange
When nbd_negotiate() fails, nbd_co_client_start() is
directly calling client->close(). This eventually
ends up calling nbd_client_put(), which does an
assert(client->closing). Unfortunately we have not
set the 'closing' flag, so the code now aborts. This
bug was accidentally introduced in

  commit ee7d7aabdaea4484e069cb99c9fc54e8cb24b56f
  Author: Fam Zheng 
  Date:   Thu Jan 14 16:41:01 2016 +0800

nbd: Always call "close_fn" in nbd_client_new

The simple fix is to not directly call client->close()
but instead call the client_close() method, which
takes care to do the right sequence of steps to close
the client.
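
The invariant can be sketched with a toy model in C. This is not
nbd/server.c: ToyClient and the toy_* functions are hypothetical, and
the sketch records whether the check would have passed instead of
calling assert() so that both paths can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the invariant; all names are hypothetical. */
typedef struct ToyClient ToyClient;
struct ToyClient {
    bool closing;
    bool invariant_ok;              /* did put() see closing set? */
    void (*close_cb)(ToyClient *);
};

/* Final release step; stands in for the assert(client->closing). */
static void toy_client_put(ToyClient *c)
{
    c->invariant_ok = c->closing;
}

/* The close callback eventually drops the reference. */
static void toy_close_cb(ToyClient *c)
{
    toy_client_put(c);
}

/* Wrapper that performs the steps in the right order. */
static void toy_client_close(ToyClient *c)
{
    if (c->closing) {
        return;                     /* already shutting down */
    }
    c->closing = true;
    c->close_cb(c);
}
```

Calling close_cb directly reaches the release step with the flag
still clear, which is the aborting path. Going through the wrapper
sets the flag first, so the check holds.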

Signed-off-by: Daniel P. Berrange 
---
 nbd/server.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index eead339..c29ba5f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1080,8 +1080,7 @@ static coroutine_fn void nbd_co_client_start(void *opaque)
 nbd_export_get(exp);
 }
 if (nbd_negotiate(data)) {
-shutdown(client->sock, 2);
-client->close(client);
+client_close(client);
 goto out;
 }
 qemu_co_mutex_init(&client->send_lock);
-- 
2.5.0




Re: [Qemu-devel] [PATCH 3/5] ide: move buffered DMA cancel to core

2016-01-19 Thread Paolo Bonzini


On 19/01/2016 05:51, John Snow wrote:
> Buffered DMA cancellation was added to ATAPI devices and implemented
> for the BMDMA HBA. Move the code over to common IDE code and allow
> it to be used for any HBA.
> 
> Signed-off-by: John Snow 
> ---
>  hw/ide/core.c | 45 +
>  hw/ide/internal.h |  1 +
>  hw/ide/pci.c  | 36 +---
>  3 files changed, 47 insertions(+), 35 deletions(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 75486c2..5d81840 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -608,6 +608,51 @@ BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t 
> sector_num,
>  return aioreq;
>  }
>  
> +/**
> + * Cancel all pending DMA requests.
> + * Any buffered DMA requests are instantly canceled,
> + * but any pending unbuffered DMA requests must be waited on.
> + */
> +void ide_cancel_dma_sync(IDEState *s)
> +{
> +IDEBufferedRequest *req;
> +
> +/* First invoke the callbacks of all buffered requests
> + * and flag those requests as orphaned. Ideally there
> + * are no unbuffered (Scatter Gather DMA Requests or
> + * write requests) pending and we can avoid to drain. */
> +QLIST_FOREACH(req, &s->buffered_requests, list) {
> +if (!req->orphaned) {
> +#ifdef DEBUG_IDE
> +printf("%s: invoking cb %p of buffered request %p with"
> +   " -ECANCELED\n", __func__, req->original_cb, req);
> +#endif
> +req->original_cb(req->original_opaque, -ECANCELED);
> +}
> +req->orphaned = true;
> +}
> +
> +/*
> + * We can't cancel Scatter Gather DMA in the middle of the
> + * operation or a partial (not full) DMA transfer would reach
> + * the storage so we wait for completion instead (we behave
> + * like if the DMA was completed by the time the guest trying
> + * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> + * set).
> + *
> + * In the future we'll be able to safely cancel the I/O if the
> + * whole DMA operation will be submitted to disk with a single
> + * aio operation with preadv/pwritev.
> + */
> +if (s->bus->dma->aiocb) {
> +#ifdef DEBUG_IDE
> +printf("%s: draining all remaining requests", __func__);
> +#endif
> +blk_drain_all();

As a separate patch you can change this to blk_drain(s->blk), which is
already an improvement.

Paolo

> +assert(s->bus->dma->aiocb == NULL);
> +}
> +}
> +
>  static void ide_sector_read(IDEState *s);
>  
>  static void ide_sector_read_cb(void *opaque, int ret)
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 2d1e2d2..86bde26 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -586,6 +586,7 @@ BlockAIOCB *ide_issue_trim(BlockBackend *blk,
>  BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
> QEMUIOVector *iov, int nb_sectors,
> BlockCompletionFunc *cb, void *opaque);
> +void ide_cancel_dma_sync(IDEState *s);
>  
>  /* hw/ide/atapi.c */
>  void ide_atapi_cmd(IDEState *s);
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 37dbc29..6b780b8 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -233,41 +233,7 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
>  /* Ignore writes to SSBM if it keeps the old value */
>  if ((val & BM_CMD_START) != (bm->cmd & BM_CMD_START)) {
>  if (!(val & BM_CMD_START)) {
> -/* First invoke the callbacks of all buffered requests
> - * and flag those requests as orphaned. Ideally there
> - * are no unbuffered (Scatter Gather DMA Requests or
> - * write requests) pending and we can avoid to drain. */
> -IDEBufferedRequest *req;
> -IDEState *s = idebus_active_if(bm->bus);
> -QLIST_FOREACH(req, &s->buffered_requests, list) {
> -if (!req->orphaned) {
> -#ifdef DEBUG_IDE
> -printf("%s: invoking cb %p of buffered request %p with"
> -   " -ECANCELED\n", __func__, req->original_cb, req);
> -#endif
> -req->original_cb(req->original_opaque, -ECANCELED);
> -}
> -req->orphaned = true;
> -}
> -/*
> - * We can't cancel Scatter Gather DMA in the middle of the
> - * operation or a partial (not full) DMA transfer would reach
> - * the storage so we wait for completion instead (we beahve
> - * like if the DMA was completed by the time the guest trying
> - * to cancel dma with bmdma_cmd_writeb with BM_CMD_START not
> - * set).
> - *
> - * In the future we'll be able to safely cancel the I/O if the
> - * whole DMA operation will be submitted to disk with a single
> - * aio operation with preadv/pwritev.
> - */
> - 

Re: [Qemu-devel] [PATCH 2/2] migration/virtio: Remove simple .get/.put use

2016-01-19 Thread Dr. David Alan Gilbert
* Sascha Silbe (si...@linux.vnet.ibm.com) wrote:
> Dear David,
> 
> "Dr. David Alan Gilbert"  writes:
> 
> >   Can you try this and let me know if it fixes it for you; I've
> > still not managed to persuade x86-64 to fail.
> 
> With Conny's hint re. virtio-1 (thanks!) I managed to make it fail on
> x86_64, too. I'm using libvirt for testing (virDomainSave() /
> virDomainRestore() use the qemu migration API internally, allowing for
> easy testing of migration code). Since current libvirt doesn't offer any
> knobs to set disable-modern/disable-legacy, I had to configure the devices
> manually:
> 
>   
> 
>  value='virtio-serial-pci,id=virtio-serial0,bus=pci.0,disable-modern=off,disable-legacy=on'/>
> 
> 
> 
> 
>   
> 
> With the above, migration fails on x86_64, too.

Thank you!  With that example I used:

  






  

(I had to use an IDE disk, as my guest didn't like virtio-disk
with that, but it still had virtio-net and virtio-serial.)

> basic save/resume test on both x86_64 and s390x, so:
> 
> Tested-By: Sascha Silbe 

Thanks.

> (I currently don't have a more extensive test for migration; in
> particular nothing that puts the guest in a pre-defined state and
> compares on-the-wire data across qemu versions.)

No, I don't think anyone does; too many fields change depending
on timing etc - and the structure of the migration stream is
too arbitrary to pull apart [One thing I'm trying to fix by
avoiding .get/.put !].

> I'm also confident by now that I'm having a reasonable grasp of this
> particular aspect of the code, so for the actual code changes:
> 
> Reviewed-By: Sascha Silbe 
> 
> A commit message explaining what's going on would be nice, though. Maybe
> something along these lines:
> 
> migration/virtio: fix migration of VirtQueues
> 
> Commit 50e5ae4d [migration/virtio: Remove simple .get/.put use]
> refactored the virtio migration code to use the VMStateDescription API
> instead of the previous custom VMStateInfo API. It relied on
> VMSTATE_STRUCT_VARRAY_KNOWN, introduced by commit 2cf01486 [Add
> VMSTATE_STRUCT_VARRAY_KNOWN]. This was described as being for "a
> variable length array (i.e. _type *_field) but we know the
> length". However it actually specified operation for arrays embedded in
> the struct (i.e. _type _field[]) since it lacked the VMS_POINTER
> flag. This caused offset calculation to be completely off, examining and
> potentially sending random data instead of the VirtQueue content.
> 
> Replace the otherwise unused VMSTATE_STRUCT_VARRAY_KNOWN with a
> VMSTATE_STRUCT_VARRAY_POINTER_KNOWN that includes the VMS_POINTER flag
> (so now actually doing what it advertises) and use it in the virtio
> migration code.
> 
> (Feel free to reuse any or all of this).

Thanks, I've reused a chunk of that; I'll post the fix soon.
Thanks for your help on this.

Dave

> Sascha
> -- 
> Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
> https://se-silbe.de/
> USt-IdNr. DE281696641
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
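
For readers following the VMS_POINTER discussion above, here is a minimal sketch of why the missing flag throws the offset calculation off. This is not QEMU's vmstate code: `Field`, `FLAG_POINTER`, `Device`, `Elem` and `field_base` are hypothetical names made up for illustration. The only point is that a descriptor without the pointer flag treats the bytes at the field offset as the array itself, while a descriptor with the flag follows the pointer stored there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from QEMU. */
#define FLAG_POINTER 0x1u

typedef struct Elem {
    uint32_t value;
} Elem;

typedef struct Device {
    int   num;   /* number of elements, known at save time */
    Elem *vq;    /* variable-length array allocated elsewhere */
} Device;

typedef struct Field {
    size_t   offset;  /* offset of the field inside the device struct */
    unsigned flags;   /* FLAG_POINTER if the field is "type *field" */
} Field;

/* Return the address of the first array element described by 'f'. */
static void *field_base(void *opaque, const Field *f)
{
    char *addr;

    assert(opaque != NULL && f != NULL);
    addr = (char *)opaque + f->offset;
    if (f->flags & FLAG_POINTER) {
        /* The struct stores a pointer: read it and follow it to the data.
         * Assumes object pointers share one representation, as on the
         * usual hosts. */
        void *p;
        memcpy(&p, addr, sizeof(p));
        return p;
    }
    /* No flag: the data is assumed to start at the field offset itself,
     * which is only right for an embedded "type field[]" array. */
    return addr;
}
```

With the flag clear on a pointer field, `field_base()` hands back the address of the pointer variable rather than the array, so whatever is serialised from there is unrelated to the queue contents.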



[Qemu-devel] [PATCH] cpu: cpu_save/cpu_load is no more

2016-01-19 Thread Paolo Bonzini
Everything has been converted to vmstate.

Signed-off-by: Paolo Bonzini 
---
 exec.c| 6 --
 include/qemu-common.h | 6 --
 2 files changed, 12 deletions(-)

diff --git a/exec.c b/exec.c
index 0d8ca3f..35705a1 100644
--- a/exec.c
+++ b/exec.c
@@ -628,12 +628,6 @@ void cpu_exec_init(CPUState *cpu, Error **errp)
 if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
 vmstate_register(NULL, cpu_index, &vmstate_cpu_common, cpu);
 }
-#if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
-register_savevm(NULL, "cpu", cpu_index, CPU_SAVE_VERSION,
-cpu_save, cpu_load, cpu->env_ptr);
-assert(cc->vmsd == NULL);
-assert(qdev_get_vmsd(DEVICE(cpu)) == NULL);
-#endif
 if (cc->vmsd != NULL) {
 vmstate_register(NULL, cpu_index, cc->vmsd, cpu);
 }
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 22b010c..f557be7 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -330,12 +330,6 @@ bool tcg_enabled(void);
 
 void cpu_exec_init_all(void);
 
-/* CPU save/load.  */
-#ifdef CPU_SAVE_VERSION
-void cpu_save(QEMUFile *f, void *opaque);
-int cpu_load(QEMUFile *f, void *opaque, int version_id);
-#endif
-
 /* Unblock cpu */
 void qemu_cpu_kick_self(void);
 
-- 
2.5.0




Re: [Qemu-devel] [PATCH] nbd: use client_close() when negotiate phase fails

2016-01-19 Thread Paolo Bonzini


On 19/01/2016 12:50, Daniel P. Berrange wrote:
> When nbd_negotiate() fails, nbd_co_client_start() is
> directly calling client->close(). This eventually
> ends up calling nbd_client_put(), which does an
> assert(client->closing). Unfortunately we have not
> set the 'closing' flag, so the code now aborts. This
> bug was accidentally introduced in
> 
>   commit ee7d7aabdaea4484e069cb99c9fc54e8cb24b56f
>   Author: Fam Zheng 
>   Date:   Thu Jan 14 16:41:01 2016 +0800
> 
> nbd: Always call "close_fn" in nbd_client_new
> 
> The simple fix is to not directly call client->close()
> but instead call the client_close() method, which
> takes care to do the right sequence of steps to close
> the client.
> 
> Signed-off-by: Daniel P. Berrange 

Good catch, thanks.

Paolo

> ---
>  nbd/server.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index eead339..c29ba5f 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1080,8 +1080,7 @@ static coroutine_fn void nbd_co_client_start(void *opaque)
>  nbd_export_get(exp);
>  }
>  if (nbd_negotiate(data)) {
> -shutdown(client->sock, 2);
> -client->close(client);
> +client_close(client);
>  goto out;
>  }
>  qemu_co_mutex_init(&client->send_lock);
> 



Re: [Qemu-devel] [PATCH 1/7] target-ppc: kvm: fix floating point registers sync on little-endian hosts

2016-01-19 Thread Greg Kurz
On Tue, 19 Jan 2016 11:55:10 +1100
David Gibson  wrote:

> On Mon, Jan 18, 2016 at 09:51:56AM +0100, Greg Kurz wrote:
> > On Mon, 18 Jan 2016 13:16:44 +1100
> > David Gibson  wrote:
> >   
> > > On Fri, Jan 15, 2016 at 04:00:12PM +0100, Greg Kurz wrote:  
> > > > On VSX capable CPUs, the 32 FP registers are mapped to the high-bits
> > > > of the 32 first VSX registers. So if you have:
> > > > 
> > > > VSR31 = (uint128) 0x0102030405060708090a0b0c0d0e0f00
> > > > 
> > > > then
> > > > 
> > > > FPR31 = (uint64) 0x0102030405060708
> > > > 
> > > > The kernel stores the VSX registers in the fp_state struct following the
> > > > host endian element ordering.
> > > > 
> > > > On big-endian:
> > > > 
> > > > fp_state.fpr[31][0] = 0x0102030405060708
> > > > fp_state.fpr[31][1] = 0x090a0b0c0d0e0f00
> > > > 
> > > > On little-endian:
> > > > 
> > > > fp_state.fpr[31][0] = 0x090a0b0c0d0e0f00
> > > > fp_state.fpr[31][1] = 0x0102030405060708
> > > > 
> > > > > The KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctls preserve this ordering, but
> > > > QEMU considers it as big-endian and always copies element [0] to the
> > > > fpr[] array and element [1] to the vsr[] array. This does not work with
> > > > little-endian hosts, and you will get:
> > > > 
> > > > (qemu) p $f31
> > > > 0x90a0b0c0d0e0f00
> > > > 
> > > > instead of:
> > > > 
> > > > (qemu) p $f31
> > > > 0x102030405060708
> > > > 
> > > > This patch fixes the element ordering for little-endian hosts.
> > > > 
> > > > Signed-off-by: Greg Kurz 
> > > 
> > > If I'm understanding correctly, the only reason this bug didn't affect
> > > things other than the gdbstub is because the get and put routines had  
> > 
> > Well it is not only gdbstub actually... as shown in the changelog, it also
> > affects the QEMU monitor, which outputs wrong values since it calls
> > kvm_get_fpu() as well.
> 
> Yes, sorry, I didn't express that well.  My point is that the only
> reason things aren't going horribly wrong is that qemu is only ever
> touching the FP/VSX values for debug, and the get/put into KVM is

I fully agree that QEMU not touching FP/VSX is a key point for
not breaking anything.

> wrong in such a way that the right values go back again as long as
> qemu doesn't try to change them.
> 

I suppose so but I must confess I did not invest time to understand how
this KVM bug did not break the guest in some way...

> > > mirrored bugs.  So although qemu ended up with definitely wrong
> > > information in its internal state, it reshuffled it to be right on
> > > setting it back into KVM.
> > > 
> > > Is that correct?
> > >   
> > 
> > My guess is that the bug only affects gdbstub and ppc_cpu_dump_state(), because
> > these are the only cases where QEMU parses the state of FP registers... this
> > is indeed confirmed by the KVM bug you are referring to, that had no visible
> > effect for more than a year BTW.  
> 
> Ok.
> 
> Still waiting for a reply for my query on 5/7, then I'm happy to apply
> these.
> 

Yeah sorry for the delay... I had written a reply but I wasn't happy with
my poor English *again* so I spent some more time rewording. I've answered
at last ! :)

Thanks !

--
Greg




Re: [Qemu-devel] [PATCH 01/10] virtio: move VirtQueueElement at the beginning of the structs

2016-01-19 Thread Cornelia Huck
On Fri, 15 Jan 2016 13:41:49 +0100
Paolo Bonzini  wrote:

> The next patch will make virtqueue_pop/vring_pop allocate memory for a

s/will make/will make it possible for/

?

I had to spend some time grepping through the code to find that blk and
scsi (and gpu, which already had elem at the beginning of its
structure) are the only ones that work like this and that other devices
do not need any change.

> "subclass" of VirtQueueElement.  For this to work, VirtQueueElement
> must be the first field in the containing struct.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/scsi/virtio-scsi.c   |  3 +--
>  include/hw/virtio/virtio-blk.h  |  2 +-
>  include/hw/virtio/virtio-scsi.h | 13 ++---
>  3 files changed, 8 insertions(+), 10 deletions(-)

Otherwise,

Reviewed-by: Cornelia Huck 
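
As a rough illustration of why the element has to be the first field, here is a sketch of the "subclass" idiom in plain C. The types are hypothetical miniatures (`Elem`, `BlockReq`, `elem_alloc`), not the real QEMU structures: a generic allocator that only knows the base type can hand out a larger object, and the caller may cast the result because the base sits at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the generic element. */
typedef struct Elem {
    unsigned index;
} Elem;

/* Hypothetical device request that "derives" from Elem. */
typedef struct BlockReq {
    Elem elem;    /* must stay first so Elem * and BlockReq * coincide */
    int  status;
} BlockReq;

/*
 * Generic allocator: it knows only the base type, but allocates 'sz'
 * bytes so the caller's larger struct fits. Returns NULL on failure.
 */
static Elem *elem_alloc(size_t sz, unsigned index)
{
    Elem *e;

    assert(sz >= sizeof(Elem));
    e = calloc(1, sz);
    if (e != NULL) {
        e->index = index;
    }
    return e;
}
```

A device would call something like `(BlockReq *)elem_alloc(sizeof(BlockReq), idx)`; if `elem` were not the first member, that cast would point at the wrong bytes.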




Re: [Qemu-devel] [PATCH v2] Add optionrom compatible with fw_cfg DMA version

2016-01-19 Thread Gerd Hoffmann
  Hi,

> > > +if (fw_cfg_dma_enabled(fw_cfg)) {
> > > +option_rom[nb_option_roms].name = "linuxboot_dma.bin";
> > > +option_rom[nb_option_roms].bootindex = 0;
> > > +} else {
> > > +option_rom[nb_option_roms].name = "linuxboot.bin";
> > > +option_rom[nb_option_roms].bootindex = 0;
> > > +}  
> > 
> > Live migration compatibility requires that guest-visible changes to
> > the machine are only introduced in a new -machine .

> > I've CCed Gerd and Juan, I think they know how changes to Option ROMs
> > affect live migration better than me.  What needs to be done to
> > preserve live migration compatibility?
> 
> They are CC'd now :)

I think we are fine here.  The dma interface is enabled for new machine
types only, that's why we have fw_cfg_dma_enabled() in the first place ;)

> > Was there a technical reason why linuxboot.S cannot be extended
> > (e.g.  a size limit)?
> 
> I don't think there's a technical reason. It is a lot simpler to write
> the fw_cfg DMA stuff in C. To extend linuxboot.S these things should be
> modified:
>  - Add fw_cfg DMA detection support
>  - Change read_fw from a macro to a function that checks for fw_cfg DMA
>support and does the operation using IO or memory
>  - Extract bits and pieces from linuxboot.S into functions, that are
>only necessary when there is no support for fw_cfg DMA (the most
>important is jumping to 32 bits to read and copy the kernel).
> 
> This way, you check for support from the very beginning (when
> configuring the machine), and you don't have to branch the code
> anymore.
> 
> (I think I discussed this with somebody in the past. But I'm not sure
> with whom, or when. So I'll suppose it was a dream and it is not on the
> archives).

Could have been /me.

The fw_cfg macros in linuxboot.S are messy, looks like because they got
extended a few times.  Piling DMA support on top of that didn't look
very appealing to me.

Also DMA support simplifies things, there is no need to switch processor
modes to load the kernel above 1M.

> If you really think they should be merged, I'd even propose to
> merge the ASM version onto the C version (convert this patch into
> linuxboot.S). This slightly improves readability.

Fully agree.  I'm personally fine with having two roms, but when merging
them into one we surely should ditch the fw_cfg asm macros and go with
something more maintainable.

cheers,
  Gerd




[Qemu-devel] [PATCH v2 11/16] qdev: Define qdev_get_gpio_out

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

An API similar to the existing qdev_get_gpio_in() except gets outputs.
Useful for:

1: Implementing lightweight devices that don't want to keep pointers
to their own GPIOs. They can get their GPIO pointers at runtime from
QOM using this API.

2: Testing or debugging code which may wish to override the
hardware generated value of a GPIO with a user specified value
(e.g. interrupt injection).

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/qdev.c | 12 
 include/hw/qdev-core.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 2c7101d..308e4a1 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -489,6 +489,18 @@ qemu_irq qdev_get_gpio_in(DeviceState *dev, int n)
 return qdev_get_gpio_in_named(dev, NULL, n);
 }
 
+qemu_irq qdev_get_gpio_out_named(DeviceState *dev, const char *name, int n)
+{
+char *propname = g_strdup_printf("%s[%d]",
+ name ? name : "unnamed-gpio-out", n);
+return (qemu_irq)object_property_get_link(OBJECT(dev), propname, NULL);
+}
+
+qemu_irq qdev_get_gpio_out(DeviceState *dev, int n)
+{
+return qdev_get_gpio_out_named(dev, NULL, n);
+}
+
 void qdev_connect_gpio_out_named(DeviceState *dev, const char *name, int n,
  qemu_irq pin)
 {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index abcdee8..0a09b8a 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -287,6 +287,8 @@ bool qdev_machine_modified(void);
 
 qemu_irq qdev_get_gpio_in(DeviceState *dev, int n);
 qemu_irq qdev_get_gpio_in_named(DeviceState *dev, const char *name, int n);
+qemu_irq qdev_get_gpio_out(DeviceState *dev, int n);
+qemu_irq qdev_get_gpio_out_named(DeviceState *dev, const char *name, int n);
 
 void qdev_connect_gpio_out(DeviceState *dev, int n, qemu_irq pin);
 void qdev_connect_gpio_out_named(DeviceState *dev, const char *name, int n,
-- 
2.5.0
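
The property-name lookup in qdev_get_gpio_out_named() can be sketched with standard C only. `gpio_out_propname` is a hypothetical helper, not part of the patch; it mirrors the "%s[%d]" format and the "unnamed-gpio-out" default from the hunk above. Note that g_strdup_printf() returns a newly allocated string, and the hunk as posted does not appear to free `propname`; the sketch makes ownership explicit by leaving the free to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<name>[<n>]", falling back to "unnamed-gpio-out" when no name
 * is given. The caller owns the returned string and must free() it.
 * Returns NULL if allocation fails.
 */
static char *gpio_out_propname(const char *name, int n)
{
    const char *base = name ? name : "unnamed-gpio-out";
    /* 16 extra bytes cover "[", a 32-bit int in decimal, "]" and NUL. */
    size_t size = strlen(base) + 16;
    char *buf = malloc(size);

    if (buf != NULL) {
        snprintf(buf, size, "%s[%d]", base, n);
    }
    return buf;
}
```

In the QEMU function the equivalent would presumably be to keep the looked-up value in a local, g_free() the name, and then return the value.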




[Qemu-devel] [PATCH v2 09/16] dma: Add Xilinx Zynq devcfg device model

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Minimal device model for devcfg module of Zynq. DMA capabilities and
interrupt generation supported.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
Changed since v4:
Create device state header.
Use REG/FIELD/EX macros
Use register init_block32
Remove un-needed timer code
Changed since v3:
Stylistic updates.
Changed over to new decoding scheme.
Use .rsvd in definitions as appropriate.
Author reset (s/petalogix/xilinx).
Changed since v2:
Some QOM styling updates.
Re-implemented nw0 for lock register as pre_write
Changed since v1:
Rebased against new version of Register API.
Use action callbacks for side effects rather than switch.
Documented reasons for ge0, ge1 (Verbatim from TRM)
Added ui1 definitions for unimplemented major features
Removed dead lock code

 default-configs/arm-softmmu.mak   |   1 +
 hw/dma/Makefile.objs  |   1 +
 hw/dma/xlnx-zynq-devcfg.c | 406 ++
 include/hw/dma/xlnx-zynq-devcfg.h |  62 ++
 4 files changed, 470 insertions(+)
 create mode 100644 hw/dma/xlnx-zynq-devcfg.c
 create mode 100644 include/hw/dma/xlnx-zynq-devcfg.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index d9b90a5..bc3914d 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -66,6 +66,7 @@ CONFIG_PXA2XX=y
 CONFIG_BITBANG_I2C=y
 CONFIG_FRAMEBUFFER=y
 CONFIG_XILINX_SPIPS=y
+CONFIG_ZYNQ_DEVCFG=y
 
 CONFIG_ARM11SCU=y
 CONFIG_A9SCU=y
diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
index 0e65ed0..eaf0a81 100644
--- a/hw/dma/Makefile.objs
+++ b/hw/dma/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_PL330) += pl330.o
 common-obj-$(CONFIG_I82374) += i82374.o
 common-obj-$(CONFIG_I8257) += i8257.o
 common-obj-$(CONFIG_XILINX_AXI) += xilinx_axidma.o
+common-obj-$(CONFIG_ZYNQ_DEVCFG) += xlnx-zynq-devcfg.o
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_dma.o
 common-obj-$(CONFIG_STP2000) += sparc32_dma.o
 common-obj-$(CONFIG_SUN4M) += sun4m_iommu.o
diff --git a/hw/dma/xlnx-zynq-devcfg.c b/hw/dma/xlnx-zynq-devcfg.c
new file mode 100644
index 000..747d830
--- /dev/null
+++ b/hw/dma/xlnx-zynq-devcfg.c
@@ -0,0 +1,406 @@
+/*
+ * QEMU model of the Xilinx Zynq Devcfg Interface
+ *
+ * (C) 2011 PetaLogix Pty Ltd
+ * (C) 2014 Xilinx Inc.
+ * Written by Peter Crosthwaite 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/dma/xlnx-zynq-devcfg.h"
+#include "qemu/bitops.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/dma.h"
+
+#define FREQ_HZ 9
+
+#define BTT_MAX 0x400
+
+#ifndef XLNX_ZYNQ_DEVCFG_ERR_DEBUG
+#define XLNX_ZYNQ_DEVCFG_ERR_DEBUG 0
+#endif
+
+#define DB_PRINT(...) do { \
+if (XLNX_ZYNQ_DEVCFG_ERR_DEBUG) { \
+qemu_log("%s: ", __func__); \
+qemu_log(__VA_ARGS__); \
+} \
+} while (0);
+
+REG32(CTRL, 0x00)
+FIELD(CTRL, FORCE_RST,  31,  1) /* Not supported, wr ignored */
+FIELD(CTRL, PCAP_PR,27,  1) /* Forced to 0 on bad unlock */
+FIELD(CTRL, PCAP_MODE,  26,  1)
+FIELD(CTRL, MULTIBOOT_EN,   24,  1)
+FIELD(CTRL, USER_MODE,  15,  1)
+FIELD(CTRL, PCFG_AES_FUSE,  12,  1)
+FIELD(CTRL, PCFG_AES_EN, 9,  3)
+FIELD(CTRL, SEU_EN,  8,  1)
+FIELD(CTRL, SEC_EN,  7,  1)
+FIELD(CTRL, SPNIDEN, 6,  1)
+FIELD(CTRL, SPIDEN,  5,  1)
+FIELD(CTRL, NIDEN,   4,  1)
+FIELD(CTRL, DBGEN,   3,  1)
+FIELD(CTRL, DAP_EN,  0,  3)
+
+REG32(LOCK, 0x04)
+#define AES_FUSE_LOCK4
+#define AES_EN_LOCK  3
+#define SEU_LOCK 2
+#define SEC_LOCK 1
+#define DBG_LOCK 0
+
+/* mapping bits 

[Qemu-devel] [PATCH v2 13/16] irq: Add opaque setter routine

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Add a routine to set or override the opaque data of an IRQ.

Qdev currently always initialises IRQ opaque as the device itself.
This allows you to override to a custom opaque in the case where
there is extra or different data needed.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/irq.c| 5 +
 include/hw/irq.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/hw/core/irq.c b/hw/core/irq.c
index 8a62a36..4a41059 100644
--- a/hw/core/irq.c
+++ b/hw/core/irq.c
@@ -76,6 +76,11 @@ qemu_irq qemu_allocate_irq(qemu_irq_handler handler, void *opaque, int n)
 return irq;
 }
 
+void qemu_irq_set_opaque(qemu_irq irq, void *opaque)
+{
+irq->opaque = opaque;
+}
+
 void qemu_free_irqs(qemu_irq *s, int n)
 {
 int i;
diff --git a/include/hw/irq.h b/include/hw/irq.h
index 4c4c2ea..edad0fc 100644
--- a/include/hw/irq.h
+++ b/include/hw/irq.h
@@ -44,6 +44,8 @@ qemu_irq qemu_allocate_irq(qemu_irq_handler handler, void *opaque, int n);
 qemu_irq *qemu_extend_irqs(qemu_irq *old, int n_old, qemu_irq_handler handler,
 void *opaque, int n);
 
+void qemu_irq_set_opaque(qemu_irq irq, void *opaque);
+
 void qemu_free_irqs(qemu_irq *s, int n);
 void qemu_free_irq(qemu_irq irq);
 
-- 
2.5.0
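
To show what overriding the opaque buys, here is a small self-contained sketch. `Irq`, `irq_set_opaque`, `irq_raise` and `record_level` are hypothetical miniatures and not the QEMU definitions; the idea is only that the handler receives whatever context pointer is currently stored in the IRQ, so swapping it redirects later deliveries without touching the handler.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*irq_handler)(void *opaque, int n, int level);

/* Hypothetical miniature of an IRQ object: a handler plus its context. */
typedef struct Irq {
    irq_handler handler;
    void *opaque;
    int n;
} Irq;

/* Replace the context pointer passed to the handler. */
static void irq_set_opaque(Irq *irq, void *opaque)
{
    assert(irq != NULL);
    irq->opaque = opaque;
}

/* Deliver 'level' to the handler with the current context. */
static void irq_raise(Irq *irq, int level)
{
    assert(irq != NULL);
    if (irq->handler != NULL) {
        irq->handler(irq->opaque, irq->n, level);
    }
}

/* Example handler: stores the level in the int that 'opaque' points to. */
static void record_level(void *opaque, int n, int level)
{
    (void)n;
    *(int *)opaque = level;
}
```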




[Qemu-devel] [PATCH v2 10/16] xilinx_zynq: add devcfg to machine model

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
Changed since v3:
Author reset.
Changed since v1:
Added manual parenting of devcfg node (evil but needed for early access
to canonical path by devcfgs realize fn).

 hw/arm/xilinx_zynq.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 40b4761..cb92f44 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -275,6 +275,14 @@ static void zynq_init(MachineState *machine)
 sysbus_connect_irq(busdev, n + 1, pic[dma_irqs[n] - IRQ_OFFSET]);
 }
 
+dev = qdev_create(NULL, "xlnx.ps7-dev-cfg");
+object_property_add_child(qdev_get_machine(), "xlnx-devcfg", OBJECT(dev),
+  NULL);
+qdev_init_nofail(dev);
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_connect_irq(busdev, 0, pic[40-IRQ_OFFSET]);
+sysbus_mmio_map(busdev, 0, 0xF8007000);
+
 zynq_binfo.ram_size = ram_size;
 zynq_binfo.kernel_filename = kernel_filename;
 zynq_binfo.kernel_cmdline = kernel_cmdline;
-- 
2.5.0




[Qemu-devel] [PATCH v13] block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2016-01-19 Thread Programmingkid
Mac OS X can be picky when it comes to allowing the user
to use physical devices in QEMU. Most mounted volumes
appear to be off limits to QEMU. If an issue is detected,
a message is displayed showing the user how to unmount a
volume. Now QEMU uses both CD and DVD media.

Signed-off-by: John Arbuckle 

---
Added continue statement to the kernResult != KERN_SUCCESS if condition in 
FindEjectableCDMedia().
Moved print_unmounting_directions() to only compile under Mac OS X. 
Fixed indentation of "setup_cdrom(bsd_path, errp) == false) {".
Replaced IOCDMedia with kIOCDMediaClass.
Changed filename variable to a character array.
Changed how filename was set to bsd_path's value by using snprintf().
Removed "goto continue_as_normal" code.
Added error_occurred variable.
Moved "if (strncmp(filename, "/dev/", 5) == 0 && ret != 0)" code to inside of 
"if (ret < 0)" block. 

 block/raw-posix.c |  169 -
 1 files changed, 127 insertions(+), 42 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 076d070..67dc166 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 //#include 
+#include 
 #include 
 #endif
 
@@ -1971,33 +1972,47 @@ BlockDriver bdrv_file = {
 /* host device */
 
 #if defined(__APPLE__) && defined(__MACH__)
-static kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator );
 static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
 CFIndex maxPathSize, int flags);
-kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator )
+static char *FindEjectableOpticalMedia(io_iterator_t *mediaIterator)
 {
-kern_return_t   kernResult;
+kern_return_t kernResult = KERN_FAILURE;
 mach_port_t masterPort;
 CFMutableDictionaryRef  classesToMatch;
+const char *matching_array[] = {kIODVDMediaClass, kIOCDMediaClass};
+char *mediaType = NULL;
 
 kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
 if ( KERN_SUCCESS != kernResult ) {
 printf( "IOMasterPort returned %d\n", kernResult );
 }
 
-classesToMatch = IOServiceMatching( kIOCDMediaClass );
-if ( classesToMatch == NULL ) {
-printf( "IOServiceMatching returned a NULL dictionary.\n" );
-} else {
-CFDictionarySetValue( classesToMatch, CFSTR( kIOMediaEjectableKey ), kCFBooleanTrue );
-}
-kernResult = IOServiceGetMatchingServices( masterPort, classesToMatch, mediaIterator );
-if ( KERN_SUCCESS != kernResult )
-{
-printf( "IOServiceGetMatchingServices returned %d\n", kernResult );
-}
+int index;
+for (index = 0; index < ARRAY_SIZE(matching_array); index++) {
+classesToMatch = IOServiceMatching(matching_array[index]);
+if (classesToMatch == NULL) {
+error_report("IOServiceMatching returned NULL for %s",
+ matching_array[index]);
+continue;
+}
+CFDictionarySetValue(classesToMatch, CFSTR(kIOMediaEjectableKey),
+ kCFBooleanTrue);
+kernResult = IOServiceGetMatchingServices(masterPort, classesToMatch,
+  mediaIterator);
+if (kernResult != KERN_SUCCESS) {
+error_report("Note: IOServiceGetMatchingServices returned %d",
+ kernResult);
+continue;
+}
 
-return kernResult;
+/* If a match was found, leave the loop */
+if (*mediaIterator != 0) {
+DPRINTF("Matching using %s\n", matching_array[index]);
+mediaType = g_strdup(matching_array[index]);
+break;
+}
+}
+return mediaType;
 }
 
 kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
@@ -2029,7 +2044,46 @@ kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
 return kernResult;
 }
 
-#endif
+/* Sets up a real cdrom for use in QEMU */
+static bool setup_cdrom(char *bsd_path, Error **errp)
+{
+int index, num_of_test_partitions = 2, fd;
+char test_partition[MAXPATHLEN];
+bool partition_found = false;
+
+/* look for a working partition */
+for (index = 0; index < num_of_test_partitions; index++) {
+snprintf(test_partition, sizeof(test_partition), "%ss%d", bsd_path,
+ index);
+fd = qemu_open(test_partition, O_RDONLY | O_BINARY | O_LARGEFILE);
+if (fd >= 0) {
+partition_found = true;
+qemu_close(fd);
+break;
+}
+}
+
+/* if a working partition on the device was not found */
+if (partition_found == false) {
+error_setg(errp, "Failed to find a working partition on disc");
+} else {
+DPRINTF("Using %s as optical disc\n", test_partition);
+pstrcpy(bsd_path, MAXPATHLEN, test_partition);
+}
+return partition_found;
+}
+
+/* Prints directions on mounting and 

Re: [Qemu-devel] [PATCH 07/10] pseries: Clean up error handling in spapr_rtas_register()

2016-01-19 Thread Eric Blake
On 01/15/2016 05:00 AM, David Gibson wrote:
> The errors detected in this function necessarily indicate bugs in the rest
> of the qemu code, rather than an external or configuration problem.
> 
> So, a simple assert() is more appropriate than any more complex error
> reporting.
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_rtas.c | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 34b12a3..0be52ae 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -648,17 +648,11 @@ target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>  
>  void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
>  {
> -if (!((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX))) {
> -fprintf(stderr, "RTAS invalid token 0x%x\n", token);
> -exit(1);
> -}
> +assert((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX));

You could drop the redundant () while touching this, as in:

assert(token >= RTAS_TOKEN_BASE && token < RTAS_TOKEN_MAX);
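
A quick sketch of why the inner parentheses are redundant: relational operators bind tighter than `&&` in C, so both spellings evaluate identically. `TOKEN_BASE` and `TOKEN_MAX` below are made-up values for illustration, not the real RTAS constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bounds for illustration; QEMU defines its own values. */
#define TOKEN_BASE 0x2000
#define TOKEN_MAX  0x2040

/* Range check written with the extra parentheses. */
static bool token_ok_parens(int token)
{
    return ((token >= TOKEN_BASE) && (token < TOKEN_MAX));
}

/* Same check relying on operator precedence alone. */
static bool token_ok_plain(int token)
{
    return token >= TOKEN_BASE && token < TOKEN_MAX;
}

/* The assert form suggested above, using the plain spelling. */
static void check_token(int token)
{
    assert(token >= TOKEN_BASE && token < TOKEN_MAX);
}
```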

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v3 0/3] Use QCryptoSecret for block device passwords

2016-01-19 Thread Paolo Bonzini


On 19/01/2016 17:46, Daniel P. Berrange wrote:
> On Tue, Jan 19, 2016 at 05:32:35PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 19/01/2016 14:51, Daniel P. Berrange wrote:
>>> This series was previously posted:
>>>
>>>   v1: https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg04365.html
>>>   v2: https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg03809.html
>>>
>>> The RBD, Curl and iSCSI block device drivers all need the ability
>>> to accept a password to authenticate with the remote network storage
>>> server. Currently RBD and iSCSI both just take the password in clear
>>> text as part of the block parameters which is insecure (passwords are
>>> visible in the process listing), while Curl doesn't support auth at
>>> all.
>>>
>>> This series updates all three drivers so that they use the recently
>>> merged QCryptoSecret API for getting passwords. Each driver gains
>>> a 'passwordid' property that can be set to provide the ID of a
>>> QCryptoSecret object instance, which in turn provides the actual
>>> password data.
>>>
>>> This series is required in order to fix a long standing CVE security
>>> flaw in libvirt, whereby passwords are exposed in the command line
>>> arguments and so visible in process listing
>>>
>>> This series would benefit from the --object additions to qemu-img,
>>> qemu-io and qemu-nbd, but this is not a pre-requisite for its merge
>>> as it is still useful in the system emulator without that support:
>>>
>>>   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03381.html
>>>
>>> Changed in v3:
>>>
>>>  - Rename 'passwordid' to 'password-id', 'proxypasswordid'
>>>to 'proxy-password-id' and 'proxyusername' to 'proxy-username'
>>>(Markus)
>>>
>>> Daniel P. Berrange (3):
>>>   rbd: add support for getting password from QCryptoSecret object
>>>   curl: add support for HTTP authentication parameters
>>>   iscsi: add support for getting CHAP password via QCryptoSecret API
>>>
>>>  block/curl.c  | 66 +++
>>>  block/iscsi.c | 24 +-
>>>  block/rbd.c   | 47 ++
>>>  3 files changed, 136 insertions(+), 1 deletion(-)
>>>
>>
>> Apologizing in advance for bikeshedding: what about using proxy-secret
>> and secret instead?  Traditionally the name of object options has
>> referred to the name of the class.
> 
> I wanted to avoid using the word 'secret', because in the future when we
> have ability to run LUKS encryption over any backend, we will have need
> to pass multiple secrets for a single drive spec. For example, we'll need
> one secret to provide the RBD password, and one secret to provide the LUKS
> decryption passphrase. So I felt using 'password' is a better choice to
> standardize on for the protocol authentication needs.

If you have a qcow2->luks->rbd tree, the LUKS passphrase would be
file.secret, while the rbd credentials would be file.file.username and
file.file.secret.

password-secret and proxy-password-secret are fine too.  I just don't
like "id" too much.

Paolo



Re: [Qemu-devel] [PATCH v1 1/2] tcg: Add support for constant value promises

2016-01-19 Thread Lluís Vilanova
Edgar E Iglesias writes:

> On Sat, Jan 16, 2016 at 09:57:36PM +0100, Lluís Vilanova wrote:
>> Richard Henderson writes:
>> 
>> > On 01/15/2016 12:12 PM, Lluís Vilanova wrote:
>> >> Richard Henderson writes:
>> >> 
>> >>> On 01/15/2016 07:35 AM, Lluís Vilanova wrote:
>>  +TCGv_i64 tcg_promise_i64(TCGv_promise_i64 *promise)
>>  +{
>>  +int pi = tcg_ctx.gen_next_parm_idx;
>>  +*promise = (TCGv_promise_i64)&tcg_ctx.gen_opparam_buf[pi];
>>  +return tcg_const_i64(0xdeadcafe);
>>  +}
>> >> 
>> >>> This doesn't work for a 32-bit host.  The constant may be split across 
>> >>> two
>> >>> different parameter indices, and you don't know exactly where the second 
>> >>> will be.
>> >> 
>> >>> Because of that, I think this is over-engineered, and really prefer the 
>> >>> simpler
>> >>> interface that Edgar posted last week.
>> >> 
>> >> In this case, 'tcg_set_promise_i64' sets the two arguments accordingly on 
>> >> 32-bit
>> >> targets. Both solutions depend on TCG internals (in this specific case the
>> >> implementation of 'tcg_gen_movi_i64'), but now it's all implemented 
>> >> inside TCG.
>> >> 
>> >> Alternatively, promises could use the longer route of recording the 
>> >> opcode index
>> >> (as Edgar did AFAIR), and retrieve the argument pointer from there. 
>> >> Still, for
>> >> 32-bit targets we have to assume the two immediate moves are gonna 
>> >> generate two
>> >> consecutive opcodes.
>> 
>> > Your solution also doesn't help Edgar, since he's interested in modifying 
>> > an
>> > argument to the insn_start opcode, not modifying a literal constant in a 
>> > move.
>> 
>> I wasn't aware of that. If the idea was to use this for more than immediates
>> stored in TCGv values, I see two options. First, modify the necessary 
>> opcodes to
>> use a TCGv argument instead of an immediate. Second, generalize this patch to
>> to select any opcode argument.
>> 
>> An example of the generalization when used to reimplement icount:
>> 
>> // insn count placeholder
>> TCGv_i32 imm = tcg_const_i32(0xcafecafe);
>> // insn count promise
>> TCGv_promise_i32 imm_promise = tcg_promise_i32(
>> 1,  // how many opcodes to go "backwards"
>> 1); // what argument to modify on that opcode
>> // operate with imm
>> ...
>> // resolve value
>> tcg_set_promise_i32(imm_promise, insn_count);
>> 
>> The question still stands on how to cleanly handle promises for opcodes like
>> a 64-bit mov on a 32-bit host (it's generated as two opcodes). Using this
>> interface would still be cleaner than directly manipulating the low-level TCG
>> arrays, and makes it easier to adopt it in future changes.
>> 

> Thanks Lluis and Richard,

> I'll stay with my version for the first try at the ARM load/store fault
> reporting. If something better comes along that works for me, I'm happy
> to change.

> Richard if you want to take the patches through your tree feel free to
> do so. Otherwise, I'll post them again with more context and try through
> the ARM queue.

My offer still stands. If the generalized interface seems adequate (specific
opcode argument to set the promise for), it's a rather simple change on the
series.


Cheers,
  Lluis
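
For readers following the thread, the generalized promise idea quoted above
can be pictured with a small stand-alone model. This is only a sketch and not
TCG code: ToyOp, ToyOpBuf, toy_emit(), toy_promise() and toy_set_promise()
are hypothetical names, and the opcode buffer is a plain array. It shows the
"N opcodes back, argument M" bookkeeping and the later patching of that slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_MAX_OPS = 64, TOY_MAX_ARGS = 4 };

/* One emitted opcode with its immediate arguments (toy model, not TCGOp). */
typedef struct {
    int opc;
    uint32_t args[TOY_MAX_ARGS];
} ToyOp;

typedef struct {
    ToyOp ops[TOY_MAX_OPS];
    size_t n_ops;
} ToyOpBuf;

/* A promise remembers which argument slot must be filled in later. */
typedef struct {
    size_t op_index;
    size_t arg_index;
} ToyPromise;

/* Append an opcode with two arguments and return its index. */
static size_t toy_emit(ToyOpBuf *buf, int opc, uint32_t a0, uint32_t a1)
{
    size_t idx = buf->n_ops;

    assert(idx < TOY_MAX_OPS);
    buf->ops[idx].opc = opc;
    buf->ops[idx].args[0] = a0;
    buf->ops[idx].args[1] = a1;
    buf->n_ops = idx + 1;
    return idx;
}

/* Promise for argument 'arg' of the opcode emitted 'back' positions ago. */
static ToyPromise toy_promise(const ToyOpBuf *buf, size_t back, size_t arg)
{
    ToyPromise p;

    assert(back >= 1 && back <= buf->n_ops);
    assert(arg < TOY_MAX_ARGS);
    p.op_index = buf->n_ops - back;
    p.arg_index = arg;
    return p;
}

/* Resolve the promise: overwrite the placeholder with the real value. */
static void toy_set_promise(ToyOpBuf *buf, ToyPromise p, uint32_t value)
{
    assert(p.op_index < buf->n_ops);
    buf->ops[p.op_index].args[p.arg_index] = value;
}
```

In this model a caller emits the placeholder move, takes a promise with
back = 1 and arg = 1, keeps emitting opcodes, and finally calls
toy_set_promise() once the instruction count is known. A 64-bit move split
into two opcodes would need two such promises, which is the open question
raised in the thread.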



Re: [Qemu-devel] [PATCH v3] s390: use FILE instead of QEMUFile for creating text file

2016-01-19 Thread Eric Blake
On 01/18/2016 04:05 AM, Daniel P. Berrange wrote:
> The s390 skeys monitor command needs to write out a plain text
> file. Currently it is using the QEMUFile class for this, but
> work is ongoing to refactor QEMUFile and eliminate much code
> related to it. The only feature qemu_fopen() gives over fopen()
> is support for QEMU FD passing, but this can be achieved with
> qemu_open() + fdopen() too. Switching to regular stdio FILE
> APIs avoids the need to sprintf via an intermediate buffer
> which slightly simplifies the code.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  hw/s390x/s390-skeys.c | 26 ++
>  1 file changed, 14 insertions(+), 12 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH v2 15/16] misc: Introduce ZynqMP IOU SLCR

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

IOU = I/O Unit
SLCR = System Level Control Registers

This IP is a miscellaneous collection of control registers that switch various
properties of system IPs. Currently the only thing implemented is the
SD_SLOTTYPE control (implemented as a GPIO output).

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/misc/Makefile.objs  |   1 +
 hw/misc/xlnx-zynqmp-iou-slcr.c | 113 +
 include/hw/misc/xlnx-zynqmp-iou-slcr.h |  47 ++
 3 files changed, 161 insertions(+)
 create mode 100644 hw/misc/xlnx-zynqmp-iou-slcr.c
 create mode 100644 include/hw/misc/xlnx-zynqmp-iou-slcr.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index d4765c2..6e01250 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -39,6 +39,7 @@ obj-$(CONFIG_OMAP) += omap_tap.o
 obj-$(CONFIG_SLAVIO) += slavio_misc.o
 obj-$(CONFIG_ZYNQ) += zynq_slcr.o
 obj-$(CONFIG_ZYNQ) += zynq-xadc.o
+obj-$(CONFIG_ZYNQ) += xlnx-zynqmp-iou-slcr.o
 obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
diff --git a/hw/misc/xlnx-zynqmp-iou-slcr.c b/hw/misc/xlnx-zynqmp-iou-slcr.c
new file mode 100644
index 000..35b989c
--- /dev/null
+++ b/hw/misc/xlnx-zynqmp-iou-slcr.c
@@ -0,0 +1,113 @@
+/*
+ * Xilinx ZynqMP IOU System Level Control Registers (SLCR)
+ *
+ * Copyright (c) 2013 Xilinx Inc
+ * Copyright (c) 2013 Peter Crosthwaite 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "hw/misc/xlnx-zynqmp-iou-slcr.h"
+
+#ifndef XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG
+#define XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG 0
+#endif
+
+REG32(SD_SLOTTYPE, 0x310)
+#define R_SD_SLOTTYPE_RSVD   0x7ffe
+
+static const RegisterAccessInfo xlnx_zynqmp_iou_slcr_regs_info[] = {
+    {   .name = "SD Slot TYPE", .decode.addr = A_SD_SLOTTYPE,
+        .rsvd = R_SD_SLOTTYPE_RSVD,
+        .gpios = (RegisterGPIOMapping []) {
+            { .name = "SD0_SLOTTYPE",   .bit_pos = 0  },
+            { .name = "SD1_SLOTTYPE",   .bit_pos = 15 },
+            {},
+        }
+    }
+    /* FIXME: Complete device model */
+};
+
+static void xlnx_zynqmp_iou_slcr_reset(DeviceState *dev)
+{
+    XlnxZynqMPIOUSLCR *s = XLNX_ZYNQMP_IOU_SLCR(dev);
+    int i;
+
+    for (i = 0; i < XLNX_ZYNQ_MP_IOU_SLCR_R_MAX; ++i) {
+        register_reset(&s->regs_info[i]);
+    }
+}
+
+static const MemoryRegionOps xlnx_zynqmp_iou_slcr_ops = {
+    .read = register_read_memory_le,
+    .write = register_write_memory_le,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    }
+};
+
+static void xlnx_zynqmp_iou_slcr_init(Object *obj)
+{
+    XlnxZynqMPIOUSLCR *s = XLNX_ZYNQMP_IOU_SLCR(obj);
+
+    memory_region_init(&s->iomem, obj, "MMIO", XLNX_ZYNQ_MP_IOU_SLCR_R_MAX * 4);
+    register_init_block32(DEVICE(obj), xlnx_zynqmp_iou_slcr_regs_info,
+                          ARRAY_SIZE(xlnx_zynqmp_iou_slcr_regs_info),
+                          s->regs_info, s->regs, &s->iomem,
+                          &xlnx_zynqmp_iou_slcr_ops,
+                          XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG);
+    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->iomem);
+}
+
+static const VMStateDescription vmstate_xlnx_zynqmp_iou_slcr = {
+    .name = "xlnx_zynqmp_iou_slcr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_ARRAY(regs, XlnxZynqMPIOUSLCR,
+                             XLNX_ZYNQ_MP_IOU_SLCR_R_MAX),
+        VMSTATE_END_OF_LIST(),
+    }
+};
+
+static void xlnx_zynqmp_iou_slcr_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = xlnx_zynqmp_iou_slcr_reset;
+dc->vmsd = 

[Qemu-devel] [PATCH v2 08/16] bitops: Add ONES macro

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Little macro that just gives you N ones (justified to LSB).

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 include/qemu/bitops.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 8164225..27bf98d 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -430,4 +430,6 @@ static inline uint64_t deposit64(uint64_t value, int start, int length,
 return (value & ~mask) | ((fieldval << start) & mask);
 }
 
+#define ONES(num) ((num) == 64 ? ~0ull : (1ull << (num)) - 1)
+
 #endif
-- 
2.5.0
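
As a quick illustration of what the macro above evaluates to, here is a
minimal stand-alone sketch. The ONES() definition is copied from the patch;
low_bits() is a hypothetical helper added only for this example. The special
case for 64 matters because shifting a 64-bit value by 64 is undefined
behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch: N ones, justified to the LSB. */
#define ONES(num) ((num) == 64 ? ~0ull : (1ull << (num)) - 1)

/* Hypothetical helper: keep only the low 'n' bits of 'value'. */
static uint64_t low_bits(uint64_t value, unsigned n)
{
    assert(n <= 64);
    return value & ONES(n);
}
```

For example, low_bits(0x1234, 8) keeps only the low byte, and
low_bits(x, 64) returns x unchanged.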




[Qemu-devel] [PATCH v2 06/16] register: QOMify

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

QOMify registers as a child of TYPE_DEVICE. This allows registers to
define GPIOs.

Define an init helper that will do QOM initialisation as well as setup
the r/w fast paths.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/register.c| 34 ++
 include/hw/register.h | 17 +
 2 files changed, 51 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index ca10cff..000b87f 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -185,6 +185,28 @@ void register_reset(RegisterInfo *reg)
 register_write_val(reg, reg->access->reset);
 }
 
+void register_init(RegisterInfo *reg)
+{
+    assert(reg);
+    const RegisterAccessInfo *ac;
+
+    if (!reg->data || !reg->access) {
+        return;
+    }
+
+    object_initialize((void *)reg, sizeof(*reg), TYPE_REGISTER);
+
+    ac = reg->access;
+
+    /* if there are no debug msgs and no RMW requirement, mark for fast write */
+    reg->write_lite = reg->debug || ac->ro || ac->w1c || ac->pre_write ||
+            ((ac->ge0 || ac->ge1) && qemu_loglevel_mask(LOG_GUEST_ERROR)) ||
+            ((ac->ui0 || ac->ui1) && qemu_loglevel_mask(LOG_UNIMP))
+             ? false : true;
+    /* no debug and no clear-on-read is a fast read */
+    reg->read_lite = reg->debug || ac->cor ? false : true;
+}
+
 static inline void register_write_memory(void *opaque, hwaddr addr,
                                          uint64_t value, unsigned size, bool be)
 {
@@ -232,3 +254,15 @@ uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size)
 {
     return register_read_memory(opaque, addr, size, false);
 }
+
+static const TypeInfo register_info = {
+    .name  = TYPE_REGISTER,
+    .parent = TYPE_DEVICE,
+};
+
+static void register_register_types(void)
+{
+    type_register_static(&register_info);
+}
+
+type_init(register_register_types)
diff --git a/include/hw/register.h b/include/hw/register.h
index 0c6f03d..6677dee 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -11,6 +11,7 @@
 #ifndef REGISTER_H
 #define REGISTER_H
 
+#include "hw/qdev-core.h"
 #include "exec/memory.h"
 
 typedef struct RegisterInfo RegisterInfo;
@@ -101,6 +102,11 @@ struct RegisterAccessInfo {
  */
 
 struct RegisterInfo {
+    /*< private >*/
+    DeviceState parent_obj;
+
+    /*< public >*/
+
     void *data;
     int data_size;
 
@@ -119,6 +125,9 @@ struct RegisterInfo {
 MemoryRegion mem;
 };
 
+#define TYPE_REGISTER "qemu,register"
+#define REGISTER(obj) OBJECT_CHECK(RegisterInfo, (obj), TYPE_REGISTER)
+
 /**
  * write a value to a register, subject to its restrictions
  * @reg: register to write to
@@ -144,6 +153,14 @@ uint64_t register_read(RegisterInfo *reg);
 void register_reset(RegisterInfo *reg);
 
 /**
+ * Initialize a register. GPIOs are set up as IOs to the specified device.
+ * Fast paths for eligible registers are enabled.
+ * @reg: Register to initialize
+ */
+
+void register_init(RegisterInfo *reg);
+
+/**
  * Memory API MMIO write handler that will write to a Register API register.
  *  _be for big endian variant and _le for little endian.
  * @opaque: RegisterInfo to write to
-- 
2.5.0
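
The fast-path selection done in register_init() above can be sketched in
isolation. This is a simplified model under stated assumptions:
ToyAccessInfo is a hypothetical cut-down stand-in for RegisterAccessInfo,
and the guest-error and unimplemented-bit logging checks from the patch are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for RegisterAccessInfo: only the fields used here. */
typedef struct {
    uint64_t ro;        /* read-only bits */
    uint64_t w1c;       /* write-one-to-clear bits */
    uint64_t cor;       /* clear-on-read bits */
    bool has_pre_write; /* true if a pre_write hook is installed */
} ToyAccessInfo;

/* A write can take the fast path only if nothing needs read-modify-write. */
static bool toy_write_is_lite(const ToyAccessInfo *ac, bool debug)
{
    assert(ac != NULL);
    return !(debug || ac->ro || ac->w1c || ac->has_pre_write);
}

/* A read can take the fast path if it has no side effects and no debug. */
static bool toy_read_is_lite(const ToyAccessInfo *ac, bool debug)
{
    assert(ac != NULL);
    return !(debug || ac->cor);
}
```

A register with no restrictions and debug disabled gets both fast paths;
setting any read-only bit disables only the write fast path.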




[Qemu-devel] [PATCH v2 12/16] qdev: Add qdev_pass_all_gpios API

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

For passing all GPIOs of all names from a contained device to a
container.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/qdev.c | 9 +
 include/hw/qdev-core.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 308e4a1..6f84161 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -589,6 +589,15 @@ void qdev_pass_gpios(DeviceState *dev, DeviceState *container,
     QLIST_INSERT_HEAD(&container->gpios, ngl, node);
 }
 
+void qdev_pass_all_gpios(DeviceState *dev, DeviceState *container)
+{
+    NamedGPIOList *ngl;
+
+    QLIST_FOREACH(ngl, &dev->gpios, node) {
+        qdev_pass_gpios(dev, container, ngl->name);
+    }
+}
+
 BusState *qdev_get_child_bus(DeviceState *dev, const char *name)
 {
 BusState *bus;
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 0a09b8a..753673c 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -312,6 +312,7 @@ void qdev_init_gpio_out_named(DeviceState *dev, qemu_irq *pins,
 
 void qdev_pass_gpios(DeviceState *dev, DeviceState *container,
  const char *name);
+void qdev_pass_all_gpios(DeviceState *dev, DeviceState *container);
 
 BusState *qdev_get_parent_bus(DeviceState *dev);
 
-- 
2.5.0




Re: [Qemu-devel] [PATCH v4 8/9] trace: [tcg] Add per-vCPU tracing states for events with the 'vcpu' property

2016-01-19 Thread Eric Blake
On 01/15/2016 09:38 AM, Lluís Vilanova wrote:
> Each event with the 'vcpu' property gets a per-vCPU dynamic tracing state.
> 
> The set of enabled events with the 'vcpu' and 'tcg' properties is used
> to select a per-vCPU physical TB cache.  The number of events with both
> properties is used to select the number of physical TB caches, and a
> bitmap of the identifiers of such enabled events is used to select a
> physical TB cache.
> 
> Signed-off-by: Lluís Vilanova 
> ---

> +++ b/qmp-commands.hx

> @@ -4594,6 +4599,13 @@ trace-event-set-state
>  
>  Set the state of events.
>  
> +Arguments:
> +
> +- "name": Event name pattern (json-string).
> +- "enable": Whether to enable or disable the event (json-bool).
> +- "ignore-unavailable": Whether to ignore errors for events that cannot be changed (json-bool, optional).

Long line; wrap to keep it in 80 columns.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH v2 05/16] register: Define REG and FIELD macros

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Define some macros that can be used for defining registers and fields.

The REG32 macro will define A_FOO, for the byte address of a register
as well as R_FOO for the uint32_t[] register number (A_FOO / 4).

The FIELD macro will define FOO_BAR_MASK, FOO_BAR_SHIFT and
FOO_BAR_LENGTH constants for field BAR in register FOO.

Finally, there are some shorthand helpers for extracting/depositing
fields from registers based on these naming schemes.

Usage can greatly reduce the verbosity of device code.

The deposit and extract macros (eg F_EX32, AF_DP32 etc.) can be used
to generate extract and deposits without any repetition of the name
stems.

Signed-off-by: Peter Crosthwaite 
[ EI Changes:
  * Add Deposit macros
]
Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Alistair Francis 
---
E.g. Currently you have to define something like:

\#define R_FOOREG (0x84/4)
\#define R_FOOREG_BARFIELD_SHIFT 10
\#define R_FOOREG_BARFIELD_LENGTH 5

uint32_t foobar_val = extract32(s->regs[R_FOOREG],
R_FOOREG_BARFIELD_SHIFT,
R_FOOREG_BARFIELD_LENGTH);

Which has:
2 macro definitions per field
3 register names ("FOOREG") per extract
2 field names ("BARFIELD") per extract

With these macros this becomes:

REG32(FOOREG, 0x84)
FIELD(FOOREG, BARFIELD, 10, 5)

uint32_t foobar_val = AF_EX32(s->regs, FOOREG, BARFIELD)

Which has:
1 macro definition per field
1 register name per extract
1 field name per extract

If you are not using arrays for the register data you can just use the
non-array "F_" variants and still save 2 name stems:

uint32_t foobar_val = F_EX32(s->fooreg, FOOREG, BARFIELD)

Deposit is similar for depositing values. Deposit has compile-time
overflow checking for literals.
For example:

REG32(XYZ1, 0x84)
FIELD(XYZ1, TRC, 0, 4)

/* Correctly set XYZ1.TRC = 5.  */
AF_DP32(s->regs, XYZ1, TRC, 5);

/* Incorrectly set XYZ1.TRC = 16.  */
AF_DP32(s->regs, XYZ1, TRC, 16);

The latter assignment results in:
warning: large integer implicitly truncated to unsigned type [-Woverflow]


 include/hw/register.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/hw/register.h b/include/hw/register.h
index 90c0185..0c6f03d 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -169,4 +169,42 @@ void register_write_memory_le(void *opaque, hwaddr addr, uint64_t value,
 uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size);
 uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size);
 
+/* Define constants for a 32 bit register */
+#define REG32(reg, addr) \
+    enum { A_ ## reg = (addr) }; \
+    enum { R_ ## reg = (addr) / 4 };
+
+/* Define SHIFT, LENGTH and MASK constants for a field within a register */
+#define FIELD(reg, field, shift, length) \
+    enum { R_ ## reg ## _ ## field ## _SHIFT = (shift)}; \
+    enum { R_ ## reg ## _ ## field ## _LENGTH = (length)}; \
+    enum { R_ ## reg ## _ ## field ## _MASK = (((1ULL << (length)) - 1) \
+                                          << (shift)) };
+
+/* Extract a field from a register */
+
+#define F_EX32(storage, reg, field) \
+    extract32((storage), R_ ## reg ## _ ## field ## _SHIFT, \
+              R_ ## reg ## _ ## field ## _LENGTH)
+
+/* Extract a field from an array of registers */
+
+#define AF_EX32(regs, reg, field) \
+    F_EX32((regs)[R_ ## reg], reg, field)
+
+/* Deposit a register field.  */
+
+#define F_DP32(storage, reg, field, val) ({ \
+    struct { \
+        unsigned int v:R_ ## reg ## _ ## field ## _LENGTH; \
+    } v = { .v = val }; \
+    uint32_t d; \
+    d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT, \
+                  R_ ## reg ## _ ## field ## _LENGTH, v.v); \
+    d; })
+
+/* Deposit a field to array of registers.  */
+
+#define AF_DP32(regs, reg, field, val) \
+    (regs)[R_ ## reg] = F_DP32((regs)[R_ ## reg], reg, field, val);
 #endif
-- 
2.5.0
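
The naming scheme these macros generate can be tried outside QEMU with the
stand-alone sketch below. It is an approximation, not the patch itself:
extract32() and deposit32() are local stand-ins for the QEMU helpers of the
same name, the deposit macro drops the bit-field overflow check (which needs
a GNU statement expression), the MASK constant is omitted, and foo_set_bar()
is a hypothetical helper. FOOREG and BARFIELD are the example names from the
commit notes.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for QEMU's extract32()/deposit32() helpers. */
static inline uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

static inline uint32_t deposit32(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* A_FOO is the byte address, R_FOO the index into a uint32_t array. */
#define REG32(reg, addr) \
    enum { A_ ## reg = (addr) }; \
    enum { R_ ## reg = (addr) / 4 };

#define FIELD(reg, field, shift, length) \
    enum { R_ ## reg ## _ ## field ## _SHIFT = (shift) }; \
    enum { R_ ## reg ## _ ## field ## _LENGTH = (length) };

#define F_EX32(storage, reg, field) \
    extract32((storage), R_ ## reg ## _ ## field ## _SHIFT, \
              R_ ## reg ## _ ## field ## _LENGTH)

#define AF_EX32(regs, reg, field) \
    F_EX32((regs)[R_ ## reg], reg, field)

/* Simplified deposit: silently truncates, unlike the patch's checked form. */
#define F_DP32(storage, reg, field, val) \
    deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT, \
              R_ ## reg ## _ ## field ## _LENGTH, (val))

#define AF_DP32(regs, reg, field, val) \
    ((regs)[R_ ## reg] = F_DP32((regs)[R_ ## reg], reg, field, val))

REG32(FOOREG, 0x84)
FIELD(FOOREG, BARFIELD, 10, 5)

/* Hypothetical helper: store a value in FOOREG.BARFIELD and read it back. */
static uint32_t foo_set_bar(uint32_t *regs, uint32_t bar)
{
    AF_DP32(regs, FOOREG, BARFIELD, bar);
    return AF_EX32(regs, FOOREG, BARFIELD);
}
```

With a register array of at least R_FOOREG + 1 entries, foo_set_bar(regs, 5)
writes 5 into bits [14:10] of regs[R_FOOREG]; the register and field names
appear once per access, which is the saving the commit notes describe.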




[Qemu-devel] [PATCH v2 07/16] register: Add block initialise helper

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Add a helper that will scan a static RegisterAccessInfo Array
and populate a container MemoryRegion with registers as defined.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V2:
 - Use memory_region_add_subregion_no_print()

 hw/core/register.c| 29 +
 include/hw/register.h | 21 +
 2 files changed, 50 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 000b87f..116fd0b 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -255,6 +255,35 @@ uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size)
     return register_read_memory(opaque, addr, size, false);
 }
 
+void register_init_block32(DeviceState *owner, const RegisterAccessInfo *rae,
+                           int num, RegisterInfo *ri, uint32_t *data,
+                           MemoryRegion *container, const MemoryRegionOps *ops,
+                           bool debug_enabled)
+{
+    const char *debug_prefix = object_get_typename(OBJECT(owner));
+    int i;
+
+    for (i = 0; i < num; i++) {
+        int index = rae[i].decode.addr / 4;
+        RegisterInfo *r = &ri[index];
+
+        *r = (RegisterInfo) {
+            .data = &data[index],
+            .data_size = sizeof(uint32_t),
+            .access = &rae[i],
+            .debug = debug_enabled,
+            .prefix = debug_prefix,
+            .opaque = owner,
+        };
+        register_init(r);
+
+        memory_region_init_io(&r->mem, OBJECT(owner), ops, r, r->access->name,
+                              sizeof(uint32_t));
+        memory_region_add_subregion_no_print(container,
+                                             r->access->decode.addr, &r->mem);
+    }
+}
+
 static const TypeInfo register_info = {
 .name  = TYPE_REGISTER,
 .parent = TYPE_DEVICE,
diff --git a/include/hw/register.h b/include/hw/register.h
index 6677dee..f3e4c2c 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -186,6 +186,27 @@ void register_write_memory_le(void *opaque, hwaddr addr, uint64_t value,
 uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size);
 uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size);
 
+/**
+ * Init a block of consecutive registers into a container MemoryRegion. A
+ * number of constant register definitions are parsed to create a corresponding
+ * array of RegisterInfo's.
+ *
+ * @owner: device owning the registers
+ * @rae: Register definitions to init
+ * @num: number of registers to init (length of @rae)
+ * @ri: Register array to init
+ * @data: Array to use for register data
+ * @container: Memory region to contain new registers
+ * @ops: Memory region ops to use to access registers. Opaque data of handler
+ * will be a RegisterInfo * (from @ri)
+ * @debug_enabled: turn on/off verbose debug information
+ */
+
+void register_init_block32(DeviceState *owner, const RegisterAccessInfo *rae,
+   int num, RegisterInfo *ri, uint32_t *data,
+   MemoryRegion *container, const MemoryRegionOps *ops,
+   bool debug_enabled);
+
 /* Define constants for a 32 bit register */
 #define REG32(reg, addr)  \
 enum { A_ ## reg = (addr) };  \
-- 
2.5.0




[Qemu-devel] [PATCH v2 04/16] register: Add support for decoding information

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Allow defining of optional address decoding information in register
definitions. This is useful for clients that want to associate
registers with specific addresses.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
changed since v4:
Remove extraneous unused definitions.

 include/hw/register.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/hw/register.h b/include/hw/register.h
index a3c41db..90c0185 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -54,6 +54,11 @@ typedef struct RegisterAccessError {
  * allowing this function to modify the value before return to the client.
  */
 
+#define REG_DECODE_READ (1 << 0)
+#define REG_DECODE_WRITE (1 << 1)
+#define REG_DECODE_EXECUTE (1 << 2)
+#define REG_DECODE_RW (REG_DECODE_READ | REG_DECODE_WRITE)
+
 struct RegisterAccessInfo {
 const char *name;
 uint64_t ro;
@@ -71,6 +76,11 @@ struct RegisterAccessInfo {
 void (*post_write)(RegisterInfo *reg, uint64_t val);
 
 uint64_t (*post_read)(RegisterInfo *reg, uint64_t val);
+
+    struct {
+        hwaddr addr;
+        uint8_t flags;
+    } decode;
 };
 
 /**
-- 
2.5.0




[Qemu-devel] [PATCH v2 03/16] register: Add Memory API glue

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Add memory io handlers that glue the register API to the memory API.
Just translation functions at this stage. Although it does allow for
devices to be created without all-in-one mmio r/w handlers.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
changed from v2:
Added fast path to register_write_memory to skip endianness bitbashing

 hw/core/register.c| 48 
 include/hw/register.h | 30 ++
 2 files changed, 78 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 02a4376..ca10cff 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -184,3 +184,51 @@ void register_reset(RegisterInfo *reg)
 
 register_write_val(reg, reg->access->reset);
 }
+
+static inline void register_write_memory(void *opaque, hwaddr addr,
+                                         uint64_t value, unsigned size, bool be)
+{
+    RegisterInfo *reg = opaque;
+    uint64_t we = ~0;
+    int shift = 0;
+
+    if (reg->data_size != size) {
+        we = (size == 8) ? ~0ull : (1ull << size * 8) - 1;
+        shift = 8 * (be ? reg->data_size - size - addr : addr);
+    }
+
+    assert(size + addr <= reg->data_size);
+    register_write(reg, value << shift, we << shift);
+}
+
+void register_write_memory_be(void *opaque, hwaddr addr, uint64_t value,
+                              unsigned size)
+{
+    register_write_memory(opaque, addr, value, size, true);
+}
+
+
+void register_write_memory_le(void *opaque, hwaddr addr, uint64_t value,
+                              unsigned size)
+{
+    register_write_memory(opaque, addr, value, size, false);
+}
+
+static inline uint64_t register_read_memory(void *opaque, hwaddr addr,
+                                            unsigned size, bool be)
+{
+    RegisterInfo *reg = opaque;
+    int shift = 8 * (be ? reg->data_size - size - addr : addr);
+
+    return register_read(reg) >> shift;
+}
+
+uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size)
+{
+    return register_read_memory(opaque, addr, size, true);
+}
+
+uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size)
+{
+    return register_read_memory(opaque, addr, size, false);
+}
diff --git a/include/hw/register.h b/include/hw/register.h
index 249f458..a3c41db 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -86,6 +86,8 @@ struct RegisterAccessInfo {
  * @prefix: String prefix for log and debug messages
  *
  * @opaque: Opaque data for the register
+ *
+ * @mem: optional Memory region for the register
  */
 
 struct RegisterInfo {
@@ -103,6 +105,8 @@ struct RegisterInfo {
 
 bool read_lite;
 bool write_lite;
+
+    MemoryRegion mem;
 };
 
 /**
@@ -129,4 +133,30 @@ uint64_t register_read(RegisterInfo *reg);
 
 void register_reset(RegisterInfo *reg);
 
+/**
+ * Memory API MMIO write handler that will write to a Register API register.
+ *  _be for big endian variant and _le for little endian.
+ * @opaque: RegisterInfo to write to
+ * @addr: Address to write
+ * @value: Value to write
+ * @size: Number of bytes to write
+ */
+
+void register_write_memory_be(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size);
+void register_write_memory_le(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size);
+
+/**
+ * Memory API MMIO read handler that will read from a Register API register.
+ *  _be for big endian variant and _le for little endian.
+ * @opaque: RegisterInfo to read from
+ * @addr: Address to read
+ * @size: Number of bytes to read
+ * returns: Value read from register
+ */
+
+uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size);
+uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size);
+
 #endif
-- 
2.5.0
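
The shift and write-enable arithmetic used by the glue functions above can
be checked with a small stand-alone sketch. The toy_* names are hypothetical,
and the merge step is a simplification: the real register_write() also
applies the per-register access restrictions, which are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write-enable mask covering 'size' bytes, as in register_write_memory(). */
static uint64_t toy_access_mask(unsigned size)
{
    assert(size >= 1 && size <= 8);
    return (size == 8) ? ~0ull : (1ull << (size * 8)) - 1;
}

/* Bit position of a 'size'-byte access at byte offset 'addr' in a register. */
static unsigned toy_access_shift(unsigned reg_size, unsigned addr,
                                 unsigned size, bool be)
{
    assert(size + addr <= reg_size);
    return 8 * (be ? reg_size - size - addr : addr);
}

/* Merge a partial write into the old register value (no access checks). */
static uint64_t toy_merge_write(uint64_t old, uint64_t value,
                                unsigned reg_size, unsigned addr,
                                unsigned size, bool be)
{
    unsigned shift = toy_access_shift(reg_size, addr, size, be);
    uint64_t we = toy_access_mask(size) << shift;

    return (old & ~we) | ((value << shift) & we);
}
```

For a 4-byte register, a 1-byte write at offset 1 lands in bits [15:8] in
the little-endian case and in bits [23:16] in the big-endian case, which is
the only difference between the _le and _be handlers.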




[Qemu-devel] [PATCH v2 14/16] register: Add GPIO API

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

Add GPIO functionality to the register API. This allows association
and automatic connection of GPIOs to bits in registers. GPIO inputs
will attach to handlers that automatically set read-only bits in
registers. GPIO outputs will be updated to reflect their field value
when their respective registers are written (or reset). Supports
active low GPIOs.

This is particularly effective for implementing system level
controllers, where heterogeneous collections of control signals are
placed in a SoC-specific peripheral and then propagated all over the
system.

Signed-off-by: Peter Crosthwaite 
[ EI Changes:
  * register: Add a polarity field to GPIO connections
  Makes it possible to directly connect active low signals
  to generic interrupt pins.
]
Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Alistair Francis 
---

 hw/core/register.c| 94 +++
 include/hw/register.h | 26 ++
 2 files changed, 120 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 116fd0b..0861603 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -135,6 +135,8 @@ void register_write(RegisterInfo *reg, uint64_t val, uint64_t we)
 }
 register_write_fast:
 register_write_val(reg, new_val);
+register_refresh_gpios(reg, old_val);
+
 if (ac->post_write) {
 ac->post_write(reg, new_val);
 }
@@ -177,18 +179,85 @@ uint64_t register_read(RegisterInfo *reg)
 void register_reset(RegisterInfo *reg)
 {
 assert(reg);
+uint64_t old_val;
 
 if (!reg->data || !reg->access) {
 return;
 }
 
+old_val = register_read_val(reg);
+
 register_write_val(reg, reg->access->reset);
+register_refresh_gpios(reg, old_val);
+}
+
+void register_refresh_gpios(RegisterInfo *reg, uint64_t old_value)
+{
+    const RegisterAccessInfo *ac;
+    const RegisterGPIOMapping *gpio;
+
+    ac = reg->access;
+    for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+        int i;
+
+        if (gpio->input) {
+            continue;
+        }
+
+        for (i = 0; i < gpio->num; ++i) {
+            uint64_t gpio_value, gpio_value_old;
+
+            qemu_irq gpo = qdev_get_gpio_out_named(DEVICE(reg), gpio->name, i);
+            gpio_value_old = extract64(old_value,
+                                       gpio->bit_pos + i * gpio->width,
+                                       gpio->width) ^ gpio->polarity;
+            gpio_value = extract64(register_read_val(reg),
+                                   gpio->bit_pos + i * gpio->width,
+                                   gpio->width) ^ gpio->polarity;
+            if (!(gpio_value_old ^ gpio_value)) {
+                continue;
+            }
+            if (reg->debug && gpo) {
+                qemu_log("refreshing gpio out %s to %" PRIx64 "\n",
+                         gpio->name, gpio_value);
+            }
+            qemu_set_irq(gpo, gpio_value);
+        }
+    }
+}
+
+typedef struct DeviceNamedGPIOHandlerOpaque {
+    DeviceState *dev;
+    const char *name;
+} DeviceNamedGPIOHandlerOpaque;
+
+static void register_gpio_handler(void *opaque, int n, int level)
+{
+    DeviceNamedGPIOHandlerOpaque *gho = opaque;
+    RegisterInfo *reg = REGISTER(gho->dev);
+
+    const RegisterAccessInfo *ac;
+    const RegisterGPIOMapping *gpio;
+
+    ac = reg->access;
+    for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+        if (gpio->input && !strcmp(gho->name, gpio->name)) {
+            register_write_val(reg, deposit64(register_read_val(reg),
+                                              gpio->bit_pos + n * gpio->width,
+                                              gpio->width,
+                                              level ^ gpio->polarity));
+            return;
+        }
+    }
+
+    abort();
 }
 
 void register_init(RegisterInfo *reg)
 {
 assert(reg);
 const RegisterAccessInfo *ac;
+const RegisterGPIOMapping *gpio;
 
 if (!reg->data || !reg->access) {
 return;
@@ -197,6 +266,30 @@ void register_init(RegisterInfo *reg)
 object_initialize((void *)reg, sizeof(*reg), TYPE_REGISTER);
 
 ac = reg->access;
+    for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+        if (!gpio->num) {
+            ((RegisterGPIOMapping *)gpio)->num = 1;
+        }
+        if (!gpio->width) {
+            ((RegisterGPIOMapping *)gpio)->width = 1;
+        }
+        if (gpio->input) {
+            DeviceNamedGPIOHandlerOpaque gho = {
+                .name = gpio->name,
+                .dev = DEVICE(reg),
+            };
+            qemu_irq irq;
+
+            qdev_init_gpio_in_named(DEVICE(reg), register_gpio_handler,
+                                    gpio->name, gpio->num);
+            irq = qdev_get_gpio_in_named(DEVICE(reg), gpio->name, gpio->num);
+

[Qemu-devel] [PATCH v2 16/16] xlnx-zynqmp: Connect the ZynqMP IOU SLCR

2016-01-19 Thread Alistair Francis
Connect the I/O Unit System Level Control Registers device
to the ZynqMP model. Unfortunately the GPIO links cannot be
connected yet as the SD device is not yet attached to the
ZynqMP machine.

Signed-off-by: Alistair Francis 
---
V2:
 - Fix up device connection

 hw/arm/xlnx-zynqmp.c | 13 +
 include/hw/arm/xlnx-zynqmp.h |  2 ++
 2 files changed, 15 insertions(+)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 57e926d..a1391ba 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -33,6 +33,8 @@
 #define SATA_ADDR   0xFD0C
 #define SATA_NUM_PORTS  2
 
+#define IOU_SLCR_ADDR   0xFF18
+
 static const uint64_t gem_addr[XLNX_ZYNQMP_NUM_GEMS] = {
 0xFF0B, 0xFF0C, 0xFF0D, 0xFF0E,
 };
@@ -118,6 +120,10 @@ static void xlnx_zynqmp_init(Object *obj)
         qdev_set_parent_bus(DEVICE(&s->sdhci[i]),
                             sysbus_get_default());
     }
+
+    object_initialize(&s->iou_slcr, sizeof(s->iou_slcr),
+                      TYPE_XLNX_ZYNQMP_IOU_SLCR);
+    qdev_set_parent_bus(DEVICE(&s->iou_slcr), sysbus_get_default());
 }
 
 static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
@@ -324,6 +330,13 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
         sysbus_connect_irq(SYS_BUS_DEVICE(&s->sdhci[i]), 0,
                            gic_spi[sdhci_intr[i]]);
     }
+
+    object_property_set_bool(OBJECT(&s->iou_slcr), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->iou_slcr), 0, IOU_SLCR_ADDR);
 }
 
 static Property xlnx_zynqmp_props[] = {
diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index 1eba937..e3a1f0b 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -22,6 +22,7 @@
 #include "hw/intc/arm_gic.h"
 #include "hw/net/cadence_gem.h"
 #include "hw/char/cadence_uart.h"
+#include "hw/misc/xlnx-zynqmp-iou-slcr.h"
 #include "hw/ide/pci.h"
 #include "hw/ide/ahci.h"
 #include "hw/sd/sdhci.h"
@@ -78,6 +79,7 @@ typedef struct XlnxZynqMPState {
 CadenceUARTState uart[XLNX_ZYNQMP_NUM_UARTS];
 SysbusAHCIState sata;
 SDHCIState sdhci[XLNX_ZYNQMP_NUM_SDHCI];
+    XlnxZynqMPIOUSLCR iou_slcr;
 
 char *boot_cpu;
 ARMCPU *boot_cpu_ptr;
-- 
2.5.0




Re: [Qemu-devel] [PATCH v1 00/15] data-driven device registers

2016-01-19 Thread Edgar E. Iglesias
On 19 Jan 2016 20:52, "Alistair Francis" wrote:
>
> On Fri, Jan 8, 2016 at 3:05 AM, Edgar E. Iglesias
>  wrote:
> > On Fri, Jan 08, 2016 at 10:40:28AM +, Peter Maydell wrote:
> >> On 8 January 2016 at 00:39, Alistair Francis
> >>  wrote:
> >> > On Wed, Dec 16, 2015 at 8:33 AM, Alistair Francis
> >> >  wrote:
> >> >> On Tue, Dec 15, 2015 at 1:56 PM, Peter Maydell <peter.mayd...@linaro.org> wrote:
> >> >>> On 15 December 2015 at 20:52, Peter Crosthwaite
> >> >>>  wrote:
> >>  It needs to exist before it can be used so there is a bit of a
> >>  chicken and egg problem there.
> >> >
> >> > No one seems to be jumping at reviewing this. Can we just send a
> >> > pull request?
> >>
> >> I don't necessarily require review [*]. I would like *somebody* other
> >> than you Xilinx folk to say "yes, I think I would use this for
> >> modelling devices". Otherwise all we have is "weird thing used
> >> only in two or three Xilinx devices and nowhere else", which I'm
> >> a bit reluctant to let into the tree. We already have a pretty
> >> wide divergence in how devices look just based on the various
> >> transitions from older to newer qdev/QOM/etc that are not complete.
> >>
> >> [*] by which I mean, I will review this series if you can find
> >> somebody else who's going to say they'd use it.
> >>
> >
> > Hi,
> >
> > I have two general comments to the series.
> >
> > 1. I think we need to do something to allow mem-attributes to be passed
> > to the reg-access callbacks and possibly also to add a way to filter on
> > attrs in the data structure.
>
> I would prefer to add this in the future. We are already having enough
> trouble getting it accepted now and this will add complexity.
>
> In saying that, it shouldn't be too hard to add in the future. The
> basic infrastructure is all there.
>

Sounds good

> >
> > 2. We had trouble in the Xilinx tree with the number of memory regions
> > created when using the style where each reg becomes an MR. I think that
> > style should either be disallowed or we need to fix the low limit (IIRC it
> > was at around 1K MRs/Regs).
>
> This I can help with. I am sending a V2 which doesn't print the memory
> regions with info mtree. When you start getting hundreds of registers
> this becomes really painful. I can't see a nice way to avoid actually
> adding the memory subregions though. It is just a linked list, what
> limit is there for memory subregions?
>

I don't remember all the details but it was not directly due to the large
amount of MRs but indirectly with the resulting AS. We hit aborts and
died.  Our skeleton generator does not use the mr style anymore for
example.

Cheers,
Edgar

> Thanks,
>
> Alistair
>
> >
> > I'm part of the Xilinx team, so this is outside of Peter's review request,
but anyway.
> >
> > Cheers,
> > Edgar
> >


[Qemu-devel] [PATCH v2 00/16] data-driven device registers

2016-01-19 Thread Alistair Francis
This patch series is based on Peter C's original register API. His
original cover letter is below.

I have added a new function memory_region_add_subregion_no_print() which
stops memory regions from being printed by 'info mtree'. This is used to
avoid every register being printed when running 'info mtree'.

NOTE: 'info qom-tree' will still print all of these registers.

Future work: Allow support for memory attributes.

V2:
 - Rebase
 - Fix up IOU SLCR connections
 - Add the memory_region_add_subregion_no_print() function and use it
   for the registers

Original cover letter From Peter:
Hi All. This is a new scheme I've come up with handling device registers in a
data driven way. My motivation for this is to factor out a lot of the access
checking that seems to be replicated in every device. See P1 commit message for
further discussion.

P1 is the main patch, adds the register definition functionality
P2-3,6 add helpers that glue the register API to the Memory API
P4 Defines a set of macros that minimise register and field definitions
P5 is QOMfication
P7 is a trivial
P10-13 Work up to GPIO support
P8,9,14 add new devices (the Xilinx Zynq devcfg & ZynqMP SLCR) that use this
scheme.
P15: Connect the ZynqMP SLCR device

This Zynq devcfg device was particularly finicky with per-bit restrictions.
I'm also looking for a higher-than-usual modelling fidelity
on the register space, with semantics defined for random reserved bits
in-between otherwise consistent fields.

Here's an example of the qemu_log output for the devcfg device. This is produced
by now generic sharable code:

/machine/unattached/device[44]:Addr 0x08:CFG: write of value 0508
/machine/unattached/device[44]:Addr 0x80:MCTRL: write of value 00800010
/machine/unattached/device[44]:Addr 0x10:INT_MASK: write of value 
/machine/unattached/device[44]:Addr :CTRL: write of value 0c00607f

And an example of a rogue guest banging on a bad bit:

/machine/unattached/device[44]:Addr 0x14:STATUS bits 0x01 may not be \
written to 1

A future feature I am interested in is implementing TCG optimisation of
side-effectless registers. The register API allows clear definition of
what registers have txn side effects and which ones don't. You could even
go a step further and translate such side-effectless accesses based on the
data pointer for the register.

Changes since RFC:
 - Connect the ZynqMP IOU SLCR device
 - Rebase

Changed from RFC v4:
Rebased
Added QOMification
Added GPIO support
Refactored Devcfg device to use FIELD/REG/EX macros.
Update style of devcfg device
Added init_block help.
Changed from v3:
Rebased
Added reserved bits.
Cleaner separation of decode and access components (Patch 3)
Changed from v2:
Fixed for hw/ re-organisation (Paolo review)
Simplified and optimized (PMM and Gerd review)
Changed from v1:
Added ONES macro patch
Dropped bogus former patch 1 (PMM review)
Addressed Blue, Gerd and MST comments.
Simplified to be more Memory API compatible.
Added Memory API helpers.
Please see discussion already on list and commit msgs for more detail.


Alistair Francis (2):
  memory: Allow subregions to not be printed by info mtree
  xlnx-zynqmp: Connect the ZynqMP IOU SLCR

Peter Crosthwaite (14):
  register: Add Register API
  register: Add Memory API glue
  register: Add support for decoding information
  register: Define REG and FIELD macros
  register: QOMify
  register: Add block initialise helper
  bitops: Add ONES macro
  dma: Add Xilinx Zynq devcfg device model
  xilinx_zynq: add devcfg to machine model
  qdev: Define qdev_get_gpio_out
  qdev: Add qdev_pass_all_gpios API
  irq: Add opaque setter routine
  register: Add GPIO API
  misc: Introduce ZynqMP IOU SLCR

 default-configs/arm-softmmu.mak|   1 +
 hw/arm/xilinx_zynq.c   |   8 +
 hw/arm/xlnx-zynqmp.c   |  13 ++
 hw/core/Makefile.objs  |   1 +
 hw/core/irq.c  |   5 +
 hw/core/qdev.c |  21 ++
 hw/core/register.c | 391 +++
 hw/dma/Makefile.objs   |   1 +
 hw/dma/xlnx-zynq-devcfg.c  | 406 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/xlnx-zynqmp-iou-slcr.c | 113 +
 include/exec/memory.h  |  17 ++
 include/hw/arm/xlnx-zynqmp.h   |   2 +
 include/hw/dma/xlnx-zynq-devcfg.h  |  62 +
 include/hw/irq.h   |   2 +
 include/hw/misc/xlnx-zynqmp-iou-slcr.h |  47 
 include/hw/qdev-core.h |   3 +
 include/hw/register.h  | 274 ++
 include/qemu/bitops.h  |   2 +
 memory.c   |  10 +-
 20 files changed, 1379 insertions(+), 1 deletion(-)
 create mode 100644 hw/core/register.c
 create mode 100644 

[Qemu-devel] [PATCH v2 02/16] register: Add Register API

2016-01-19 Thread Alistair Francis
From: Peter Crosthwaite 

This API provides some encapsulation of registers and factors out some
common functionality to common code. Bits of device state (usually MMIO
registers) often have all sorts of access restrictions and semantics
associated with them. This API allows you to define what those
restrictions are on a bit-by-bit basis.

Helper functions are then used to access the register which observe the
semantics defined by the RegisterAccessInfo struct.

Some features:
Bits can be marked as read_only (ro field)
Bits can be marked as write-1-clear (w1c field)
Bits can be marked as reserved (rsvd field)
Reset values can be defined (reset)
Bits can throw guest errors when written certain values (ge0, ge1)
Bits can throw unimp errors when written certain values (ui0, ui1)
Bits can be marked clear on read (cor)
Pre and post action callbacks can be added to read and write ops
Verbose debugging info can be enabled/disabled

Useful for defining device register spaces in a data driven way. Cuts
down on a lot of the verbosity and repetition in the switch-case blocks
in the standard foo_mmio_read/write functions.
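
To illustrate the data-driven idea described above, here is a minimal
standalone sketch. It is not the code from this patch: the names
SketchRegInfo and sketch_reg_write are hypothetical, and only the ro and
w1c masks from the feature list are modelled, under the assumption that a
write is described by a value plus a write-enable mask.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a per-register access description. */
typedef struct {
    uint64_t ro;   /* bits the guest cannot change */
    uint64_t w1c;  /* bits cleared when the guest writes 1 to them */
} SketchRegInfo;

/*
 * Compute the new register value for a guest write of 'val' with
 * write-enable mask 'we', honouring the ro and w1c masks.
 */
static uint64_t sketch_reg_write(const SketchRegInfo *info, uint64_t old_val,
                                 uint64_t val, uint64_t we)
{
    uint64_t writable = we & ~info->ro;      /* bits the guest may touch */
    uint64_t plain = writable & ~info->w1c;  /* ordinary read/write bits */
    uint64_t new_val = (old_val & ~plain) | (val & plain);

    /* Writing 1 to a w1c bit clears it; writing 0 leaves it alone. */
    new_val &= ~(val & writable & info->w1c);
    return new_val;
}
```

The point of the sketch is that the device model only supplies the two
masks; the masking logic lives in one shared function instead of being
repeated in every switch-case arm.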

Also useful for automated generation of device models from hardware
design sources.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
changed from v2:
Simplified! Removed pre-read, nwx, wo
Removed byte loops (Gerd Review)
Made data pointer optional
Added fast paths for simple registers
Moved into hw/core and include/hw (Paolo Review)
changed from v1:
Rebranded as the "Register API" - I think that's probably what it is.
Near total rewrite of implementation.
De-arrayified reset (this is client/Memory APIs job).
Moved out of bitops into its own file (Blue review)
Added debug, the register pointer, and prefix to a struct (Blue Review)
Made 64-bit to play friendlier with memory API (Blue review)
Made backend storage uint8_t (MST review)
Added read/write callbacks (Blue review)
Added ui0, ui1 (Blue review)
Moved re-purposed width (now byte width defining actual storage size)
Arrayified ge0, ge1 (ui0, ui1 too) and added .reason
Added wo field (not an April fools joke - this has genuine meaning here)
Added we mask to write accessor

 hw/core/Makefile.objs |   1 +
 hw/core/register.c| 186 ++
 include/hw/register.h | 132 +++
 3 files changed, 319 insertions(+)
 create mode 100644 hw/core/register.c
 create mode 100644 include/hw/register.h

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index abb3560..bf95db5 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -14,4 +14,5 @@ common-obj-$(CONFIG_SOFTMMU) += machine.o
 common-obj-$(CONFIG_SOFTMMU) += null-machine.o
 common-obj-$(CONFIG_SOFTMMU) += loader.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
+common-obj-$(CONFIG_SOFTMMU) += register.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
diff --git a/hw/core/register.c b/hw/core/register.c
new file mode 100644
index 000..02a4376
--- /dev/null
+++ b/hw/core/register.c
@@ -0,0 +1,186 @@
+/*
+ * Register Definition API
+ *
+ * Copyright (c) 2013 Xilinx Inc.
+ * Copyright (c) 2013 Peter Crosthwaite 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "hw/register.h"
+#include "qemu/log.h"
+
+static inline void register_write_log(RegisterInfo *reg, int dir, uint64_t val,
+  int mask, const char *msg,
+  const char *reason)
+{
+qemu_log_mask(mask, "%s:%s bits %#" PRIx64 " %s write of %d%s%s\n",
+  reg->prefix, reg->access->name, val, msg, dir,
+  reason ? ": " : "", reason ? reason : "");
+}
+
+static inline void register_write_val(RegisterInfo *reg, uint64_t val)
+{
+if (!reg->data) {
+return;
+}
+switch (reg->data_size) {
+case 1:
+*(uint8_t *)reg->data = val;
+break;
+case 2:
+*(uint16_t *)reg->data = val;
+break;
+case 4:
+*(uint32_t *)reg->data = val;
+break;
+case 8:
+*(uint64_t *)reg->data = val;
+break;
+default:
+abort();
+}
+}
+
+static inline uint64_t register_read_val(RegisterInfo *reg)
+{
+switch (reg->data_size) {
+case 1:
+return *(uint8_t *)reg->data;
+case 2:
+return *(uint16_t *)reg->data;
+case 4:
+return *(uint32_t *)reg->data;
+case 8:
+return *(uint64_t *)reg->data;
+default:
+abort();
+}
+return 0; /* unreachable */
+}
+
+void register_write(RegisterInfo *reg, uint64_t val, uint64_t we)
+{
+uint64_t old_val, new_val, test, no_w_mask;
+const RegisterAccessInfo *ac;
+const RegisterAccessError *rae;
+
+assert(reg);
+
+ac = 

[Qemu-devel] [PATCH v2 01/16] memory: Allow subregions to not be printed by info mtree

2016-01-19 Thread Alistair Francis
Add a function called memory_region_add_subregion_no_print() that
creates memory subregions that won't be printed when running
the 'info mtree' command.
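
As a rough illustration of the effect, the following standalone sketch
uses a toy tree type (ToyRegion and toy_count_printed are hypothetical
names, not QEMU's MemoryRegion API). It assumes, as in the hunk for
mtree_print_mr() below, that the check happens before the subregions are
visited, so a hidden region also hides everything beneath it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy region type; not QEMU's MemoryRegion. */
typedef struct ToyRegion {
    const char *name;
    bool do_not_print;
    struct ToyRegion *child;    /* first subregion */
    struct ToyRegion *sibling;  /* next subregion of the same parent */
} ToyRegion;

/*
 * Count how many regions a tree dump would show. A region flagged
 * do_not_print is skipped together with all of its subregions.
 */
static int toy_count_printed(const ToyRegion *r)
{
    const ToyRegion *c;
    int n;

    if (!r || r->do_not_print) {
        return 0;
    }
    n = 1;
    for (c = r->child; c; c = c->sibling) {
        n += toy_count_printed(c);
    }
    return n;
}
```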

Signed-off-by: Alistair Francis 
---

 include/exec/memory.h | 17 +
 memory.c  | 10 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 01f1004..eff2a89 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -186,6 +186,7 @@ struct MemoryRegion {
 bool skip_dump;
 bool enabled;
 bool warning_printed; /* For reservations */
+bool do_not_print;
 uint8_t vga_logging_count;
 MemoryRegion *alias;
 hwaddr alias_offset;
@@ -952,6 +953,22 @@ void memory_region_del_eventfd(MemoryRegion *mr,
 void memory_region_add_subregion(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion);
+
+/**
+ * memory_region_add_subregion_no_print: Add a subregion to a container.
+ *
+ * The same functionality as memory_region_add_subregion except that any
+ * memory regions added by this function are not printed by 'info mtree'.
+ *
+ * @mr: the region to contain the new subregion; must be a container
+ *  initialized with memory_region_init().
+ * @offset: the offset relative to @mr where @subregion is added.
+ * @subregion: the subregion to be added.
+ */
+void memory_region_add_subregion_no_print(MemoryRegion *mr,
+  hwaddr offset,
+  MemoryRegion *subregion);
+
 /**
  * memory_region_add_subregion_overlap: Add a subregion to a container
  *  with overlap.
diff --git a/memory.c b/memory.c
index 93bd8ed..ee90682 100644
--- a/memory.c
+++ b/memory.c
@@ -1827,6 +1827,14 @@ void memory_region_add_subregion(MemoryRegion *mr,
 memory_region_add_subregion_common(mr, offset, subregion);
 }
 
+void memory_region_add_subregion_no_print(MemoryRegion *mr,
+  hwaddr offset,
+  MemoryRegion *subregion)
+{
+memory_region_add_subregion(mr, offset, subregion);
+subregion->do_not_print = true;
+}
+
 void memory_region_add_subregion_overlap(MemoryRegion *mr,
  hwaddr offset,
  MemoryRegion *subregion,
@@ -2190,7 +2198,7 @@ static void mtree_print_mr(fprintf_function mon_printf, 
void *f,
 const MemoryRegion *submr;
 unsigned int i;
 
-if (!mr) {
+if (!mr || mr->do_not_print) {
 return;
 }
 
-- 
2.5.0




[Qemu-devel] [PATCH v2 6/6] ide: fix device_reset to not ignore pending AIO

2016-01-19 Thread John Snow
Signed-off-by: John Snow 
---
 hw/ide/core.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 9bc8e58..c68d1d4 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -502,7 +502,6 @@ void ide_transfer_stop(IDEState *s)
 ide_transfer_halt(s, ide_transfer_stop, true);
 }
 
-__attribute__((__unused__))
 static void ide_transfer_cancel(IDEState *s)
 {
 ide_transfer_halt(s, ide_transfer_cancel, false);
@@ -1295,6 +1294,23 @@ static bool cmd_nop(IDEState *s, uint8_t cmd)
 return true;
 }
 
+static bool cmd_device_reset(IDEState *s, uint8_t cmd)
+{
+/* Halt PIO (in the DRQ phase), then DMA */
+ide_transfer_cancel(s);
+ide_cancel_dma_sync(s);
+
+/* Reset any PIO commands, reset signature, etc */
+ide_reset(s);
+
+/* RESET: ATA8-ACS3 7.10.4 "Normal Outputs";
+ * ATA8-ACS3 Table 184 "Device Signatures for Normal Output" */
+s->status = 0x00;
+
+/* Do not overwrite status register */
+return false;
+}
+
 static bool cmd_data_set_management(IDEState *s, uint8_t cmd)
 {
 switch (s->feature) {
@@ -1611,15 +1627,6 @@ static bool cmd_exec_dev_diagnostic(IDEState *s, uint8_t 
cmd)
 return false;
 }
 
-static bool cmd_device_reset(IDEState *s, uint8_t cmd)
-{
-ide_set_signature(s);
-s->status = 0x00; /* NOTE: READY is _not_ set */
-s->error = 0x01;
-
-return false;
-}
-
 static bool cmd_packet(IDEState *s, uint8_t cmd)
 {
 /* overlapping commands not supported */
-- 
2.4.3




[Qemu-devel] [PATCH] usb: check page select value while processing iTD

2016-01-19 Thread P J P
From: Prasad J Pandit 

While processing isochronous transfer descriptors(iTD), the page
select(PG) field value could lead to an OOB read access. Add
check to avoid it.
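
The check can be summarised by the standalone sketch below. It is only an
illustration of the bounds logic, not the patched function: the helper
name and the two constants are hypothetical, and it assumes an iTD has
seven buffer pointer slots (0..6) and 4096-byte pages, matching the
"pg > 6" and "off + len > 4096" tests in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ITD_NUM_BUFPTR 7      /* assumed: buffer pointers 0..6 */
#define SKETCH_PAGE_SIZE      4096u

/*
 * Return true when a transaction with page select 'pg', offset 'off'
 * and length 'len' only touches valid buffer pointer slots.
 */
static bool sketch_itd_xact_ok(uint32_t pg, uint32_t off, uint32_t len)
{
    if (pg >= SKETCH_ITD_NUM_BUFPTR) {
        return false;                /* bufptr[pg] would be out of bounds */
    }
    if (off + len > SKETCH_PAGE_SIZE && pg + 1 >= SKETCH_ITD_NUM_BUFPTR) {
        return false;                /* crossing needs bufptr[pg + 1] */
    }
    return true;
}
```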

Reported-by: Qinghao Tang 
Signed-off-by: Prasad J Pandit 
---
 hw/usb/hcd-ehci.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index d07f228..c40013e 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1404,21 +1404,23 @@ static int ehci_process_itd(EHCIState *ehci,
 if (itd->transact[i] & ITD_XACT_ACTIVE) {
 pg   = get_field(itd->transact[i], ITD_XACT_PGSEL);
 off  = itd->transact[i] & ITD_XACT_OFFSET_MASK;
-ptr1 = (itd->bufptr[pg] & ITD_BUFPTR_MASK);
-ptr2 = (itd->bufptr[pg+1] & ITD_BUFPTR_MASK);
 len  = get_field(itd->transact[i], ITD_XACT_LENGTH);
 
 if (len > max * mult) {
 len = max * mult;
 }
-
-if (len > BUFF_SIZE) {
+if (len > BUFF_SIZE || pg > 6) {
 return -1;
 }
 
+ptr1 = (itd->bufptr[pg] & ITD_BUFPTR_MASK);
 qemu_sglist_init(&ehci->isgl, ehci->device, 2, ehci->as);
 if (off + len > 4096) {
 /* transfer crosses page border */
+if (pg == 6) {
+return -1;  /* avoid page pg + 1 */
+}
+ptr2 = (itd->bufptr[pg + 1] & ITD_BUFPTR_MASK);
 uint32_t len2 = off + len - 4096;
 uint32_t len1 = len - len2;
 qemu_sglist_add(&ehci->isgl, ptr1 + off, len1);
-- 
2.5.0




[Qemu-devel] [PATCH] target-arm: Make various system registers visible to EL3

2016-01-19 Thread Peter Maydell
The AArch64 system registers DACR32_EL2, IFSR32_EL2, SPSR_IRQ,
SPSR_ABT, SPSR_UND and SPSR_FIQ are visible and fully functional from
EL3 even if the CPU has no EL2 (unlike some others which are RES0
from EL3 in that configuration).  Move them from el2_cp_reginfo[] to
v8_cp_reginfo[] so they are always present.

Signed-off-by: Peter Maydell 
---
 target-arm/helper.c | 58 ++---
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index e8ede3f..999c617 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -3166,6 +3166,35 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
   .type = ARM_CP_ALIAS,
   .fieldoffset = offsetof(CPUARMState, vfp.xregs[ARM_VFP_FPEXC]),
   .access = PL2_RW, .accessfn = fpexc32_access },
+{ .name = "DACR32_EL2", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 4, .crn = 3, .crm = 0, .opc2 = 0,
+  .access = PL2_RW, .resetvalue = 0,
+  .writefn = dacr_write, .raw_writefn = raw_write,
+  .fieldoffset = offsetof(CPUARMState, cp15.dacr32_el2) },
+{ .name = "IFSR32_EL2", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 4, .crn = 5, .crm = 0, .opc2 = 1,
+  .access = PL2_RW, .resetvalue = 0,
+  .fieldoffset = offsetof(CPUARMState, cp15.ifsr32_el2) },
+{ .name = "SPSR_IRQ", .state = ARM_CP_STATE_AA64,
+  .type = ARM_CP_ALIAS,
+  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 3, .opc2 = 0,
+  .access = PL2_RW,
+  .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_IRQ]) },
+{ .name = "SPSR_ABT", .state = ARM_CP_STATE_AA64,
+  .type = ARM_CP_ALIAS,
+  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 3, .opc2 = 1,
+  .access = PL2_RW,
+  .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_ABT]) },
+{ .name = "SPSR_UND", .state = ARM_CP_STATE_AA64,
+  .type = ARM_CP_ALIAS,
+  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 3, .opc2 = 2,
+  .access = PL2_RW,
+  .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_UND]) },
+{ .name = "SPSR_FIQ", .state = ARM_CP_STATE_AA64,
+  .type = ARM_CP_ALIAS,
+  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 3, .opc2 = 3,
+  .access = PL2_RW,
+  .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_FIQ]) },
 REGINFO_SENTINEL
 };
 
@@ -3293,11 +3322,6 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2),
   .writefn = hcr_write },
-{ .name = "DACR32_EL2", .state = ARM_CP_STATE_AA64,
-  .opc0 = 3, .opc1 = 4, .crn = 3, .crm = 0, .opc2 = 0,
-  .access = PL2_RW, .resetvalue = 0,
-  .writefn = dacr_write, .raw_writefn = raw_write,
-  .fieldoffset = offsetof(CPUARMState, cp15.dacr32_el2) },
 { .name = "ELR_EL2", .state = ARM_CP_STATE_AA64,
   .type = ARM_CP_ALIAS,
   .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 1,
@@ -3307,10 +3331,6 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .type = ARM_CP_ALIAS,
   .opc0 = 3, .opc1 = 4, .crn = 5, .crm = 2, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.esr_el[2]) },
-{ .name = "IFSR32_EL2", .state = ARM_CP_STATE_AA64,
-  .opc0 = 3, .opc1 = 4, .crn = 5, .crm = 0, .opc2 = 1,
-  .access = PL2_RW, .resetvalue = 0,
-  .fieldoffset = offsetof(CPUARMState, cp15.ifsr32_el2) },
 { .name = "FAR_EL2", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 4, .crn = 6, .crm = 0, .opc2 = 0,
   .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.far_el[2]) },
@@ -3319,26 +3339,6 @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
   .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 0,
   .access = PL2_RW,
   .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_HYP]) },
-{ .name = "SPSR_IRQ", .state = ARM_CP_STATE_AA64,
-  .type = ARM_CP_ALIAS,
-  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 3, .opc2 = 0,
-  .access = PL2_RW,
-  .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_IRQ]) },
-{ .name = "SPSR_ABT", .state = ARM_CP_STATE_AA64,
-  .type = ARM_CP_ALIAS,
-  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 3, .opc2 = 1,
-  .access = PL2_RW,
-  .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_ABT]) },
-{ .name = "SPSR_UND", .state = ARM_CP_STATE_AA64,
-  .type = ARM_CP_ALIAS,
-  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 3, .opc2 = 2,
-  .access = PL2_RW,
-  .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_UND]) },
-{ .name = "SPSR_FIQ", .state = ARM_CP_STATE_AA64,
-  .type = ARM_CP_ALIAS,
-  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 3, .opc2 = 3,
-  .access = PL2_RW,
-  .fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_FIQ]) },
 { .name = "VBAR_EL2", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 0, .opc2 = 0,
   .access = PL2_RW, 

Re: [Qemu-devel] [PATCH] sheepdog: allow to delete snapshot

2016-01-19 Thread Jeff Cody
On Wed, Dec 23, 2015 at 09:22:26PM +0900, Hitoshi Mitake wrote:
> From: Vasiliy Tolstov 
> 
> This patch implements a blockdriver function bdrv_snapshot_delete() in
> the sheepdog driver. With the new function, snapshots of sheepdog can
> be deleted from libvirt.
> 
> Cc: Jeff Cody 
> Signed-off-by: Hitoshi Mitake 
> Signed-off-by: Vasiliy Tolstov 
> ---
>  block/sheepdog.c | 125 
> ++-
>  1 file changed, 123 insertions(+), 2 deletions(-)
> 
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index d80e4ed..0a4f2fc 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -283,6 +283,12 @@ static inline bool is_snapshot(struct SheepdogInode 
> *inode)
>  return !!inode->snap_ctime;
>  }
>  
> +static inline size_t count_data_objs(const struct SheepdogInode *inode)
> +{
> +return DIV_ROUND_UP(inode->vdi_size,
> +(1UL << inode->block_size_shift));
> +}
> +
>  #undef DPRINTF
>  #ifdef DEBUG_SDOG
>  #define DPRINTF(fmt, args...)   \
> @@ -2479,13 +2485,128 @@ out:
>  return ret;
>  }
>  
> +#define NR_BATCHED_DISCARD 128
> +
> +static bool remove_objects(BDRVSheepdogState *s)
> +{
> +int fd, i = 0, nr_objs = 0;
> +Error *local_err = NULL;
> +int ret = 0;
> +bool result = true;
> +SheepdogInode *inode = &s->inode;
> +
> +fd = connect_to_sdog(s, &local_err);
> +if (fd < 0) {
> +error_report_err(local_err);
> +return false;
> +}
> +
> +nr_objs = count_data_objs(inode);
> +while (i < nr_objs) {
> +int start_idx, nr_filled_idx;
> +
> +while (i < nr_objs && !inode->data_vdi_id[i]) {
> +i++;
> +}
> +start_idx = i;
> +
> +nr_filled_idx = 0;
> +while (i < nr_objs && nr_filled_idx < NR_BATCHED_DISCARD) {
> +if (inode->data_vdi_id[i]) {
> +inode->data_vdi_id[i] = 0;
> +nr_filled_idx++;
> +}
> +
> +i++;
> +}
> +
> +ret = write_object(fd, s->aio_context,
> +   (char *)&inode->data_vdi_id[start_idx],
> +   vid_to_vdi_oid(s->inode.vdi_id), inode->nr_copies,
> +   (i - start_idx) * sizeof(uint32_t),
> +   offsetof(struct SheepdogInode,
> +data_vdi_id[start_idx]),
> +   false, s->cache_flags);
> +if (ret < 0) {
> +error_report("failed to discard snapshot inode.");
> +result = false;
> +goto out;
> +}
> +}
> +
> +out:
> +closesocket(fd);
> +return result;
> +}
> +
>  static int sd_snapshot_delete(BlockDriverState *bs,
>const char *snapshot_id,
>const char *name,
>Error **errp)
>  {
> -/* FIXME: Delete specified snapshot id.  */
> -return 0;
> +uint32_t snap_id = 0;
> +char snap_tag[SD_MAX_VDI_TAG_LEN];
> +Error *local_err = NULL;
> +int fd, ret;
> +char buf[SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN];
> +BDRVSheepdogState *s = bs->opaque;
> +unsigned int wlen = SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN, rlen = 0;
> +uint32_t vid;
> +SheepdogVdiReq hdr = {
> +.opcode = SD_OP_DEL_VDI,
> +.data_length = wlen,
> +.flags = SD_FLAG_CMD_WRITE,
> +};
> +SheepdogVdiRsp *rsp = (SheepdogVdiRsp *)&hdr;
> +
> +if (!remove_objects(s)) {
> +return -1;
> +}
> +
> +memset(buf, 0, sizeof(buf));
> +memset(snap_tag, 0, sizeof(snap_tag));
> +pstrcpy(buf, SD_MAX_VDI_LEN, s->name);
> +if (qemu_strtoul(snapshot_id, NULL, 10, (unsigned long *)&snap_id)) {
> +return -1;
> +}
> +
> +if (snap_id) {
> +hdr.snapid = snap_id;
> +} else {
> +pstrcpy(snap_tag, sizeof(snap_tag), snapshot_id);
> +pstrcpy(buf + SD_MAX_VDI_LEN, SD_MAX_VDI_TAG_LEN, snap_tag);
> +}
> +
> +ret = find_vdi_name(s, s->name, snap_id, snap_tag, &vid, true,
> +&local_err);
> +if (ret) {
> +return ret;
> +}
> +
> +fd = connect_to_sdog(s, &local_err);
> +if (fd < 0) {
> +error_report_err(local_err);
> +return -1;
> +}
> +
> +ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
> + buf, &wlen, &rlen);
> +closesocket(fd);
> +if (ret) {
> +return ret;
> +}
> +
> +switch (rsp->result) {
> +case SD_RES_NO_VDI:
> +error_report("%s was already deleted", s->name);
> +case SD_RES_SUCCESS:
> +break;
> +default:
> +error_report("%s, %s", sd_strerror(rsp->result), s->name);
> +return -1;
> +}
> +
> +return ret;
>  }
>  
>  static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
> -- 
> 1.9.1
>

Thanks, 

Re: [Qemu-devel] [PATCH v1 17/17] arm: boot: Support big-endian elfs

2016-01-19 Thread Peter Maydell
On 18 January 2016 at 07:12, Peter Crosthwaite
 wrote:
> Support ARM big-endian ELF files in system-mode emulation. When loading
> an elf, determine the endianness mode expected by the elf, and set the
> relevant CPU state accordingly.
>
> With this, big-endian modes are now fully supported via system-mode LE,
> so there is no need to restrict the elf loading to the TARGET
> endianness so the ifdeffery on TARGET_WORDS_BIGENDIAN goes away.
>
> Signed-off-by: Peter Crosthwaite 
> ---
>
>  hw/arm/boot.c| 96 
> ++--
>  include/hw/arm/arm.h |  9 +
>  2 files changed, 88 insertions(+), 17 deletions(-)
>
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 0de4269..053c9e8 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -465,9 +465,34 @@ static void do_cpu_reset(void *opaque)
>  cpu_reset(cs);
>  if (info) {
>  if (!info->is_linux) {
> +int i;
>  /* Jump to the entry point.  */
>  uint64_t entry = info->entry;
>
> +switch (info->endianness) {
> +case ARM_ENDIANNESS_LE:
> +env->cp15.sctlr_el[1] &= ~SCTLR_E0E;
> +for (i = 1; i < 4; ++i) {
> +env->cp15.sctlr_el[i] &= ~SCTLR_EE;
> +}
> +env->uncached_cpsr &= ~CPSR_E;
> +break;
> +case ARM_ENDIANNESS_BE8:
> +env->cp15.sctlr_el[1] |= SCTLR_E0E;
> +for (i = 1; i < 4; ++i) {
> +env->cp15.sctlr_el[i] |= SCTLR_EE;
> +}
> +env->uncached_cpsr |= CPSR_E;
> +break;
> +case ARM_ENDIANNESS_BE32:
> +env->cp15.sctlr_el[1] |= SCTLR_B;
> +break;
> +case ARM_ENDIANNESS_UNKNOWN:
> +break; /* Board's decision */
> +default:
> +g_assert_not_reached();
> +}

Do we really want this much magic for non-linux images? I would
expect that the image would be intended to run with whatever
state the board puts the CPU in from reset (ie the CPU has suitable
QOM properties for its initial endianness state, corresponding to
real hardware reset-config signals like the A15's CFGEND/CFGTE).

> +
>  if (!env->aarch64) {
>  env->thumb = info->entry & 1;
>  entry &= 0xfffe;
> @@ -589,16 +614,23 @@ static void arm_load_kernel_notify(Notifier *notifier, 
> void *data)
>  int kernel_size;
>  int initrd_size;
>  int is_linux = 0;
> +
>  uint64_t elf_entry, elf_low_addr, elf_high_addr;
>  int elf_machine;
> +bool elf_is64;
> +union {
> +Elf32_Ehdr h32;
> +Elf64_Ehdr h64;
> +} elf_header;
> +
>  hwaddr entry, kernel_load_offset;
> -int big_endian;
>  static const ARMInsnFixup *primary_loader;
>  ArmLoadKernelNotifier *n = DO_UPCAST(ArmLoadKernelNotifier,
>   notifier, notifier);
>  ARMCPU *cpu = n->cpu;
>  struct arm_boot_info *info =
>  container_of(n, struct arm_boot_info, load_kernel_notifier);
> +Error *err = NULL;
>
>  /* The board code is not supposed to set secure_board_setup unless
>   * running its code in secure mode is actually possible, and KVM
> @@ -678,12 +710,6 @@ static void arm_load_kernel_notify(Notifier *notifier, 
> void *data)
>  if (info->nb_cpus == 0)
>  info->nb_cpus = 1;
>
> -#ifdef TARGET_WORDS_BIGENDIAN
> -big_endian = 1;
> -#else
> -big_endian = 0;
> -#endif

Was this code ever built with TARGET_WORDS_BIGENDIAN defined?

> -
>  /* We want to put the initrd far enough into RAM that when the
>   * kernel is uncompressed it will not clobber the initrd. However
>   * on boards without much RAM we must ensure that we still leave
> @@ -698,16 +724,52 @@ static void arm_load_kernel_notify(Notifier *notifier, 
> void *data)
>  MIN(info->ram_size / 2, 128 * 1024 * 1024);
>
>  /* Assume that raw images are linux kernels, and ELF images are not.  */
> -kernel_size = load_elf(info->kernel_filename, NULL, NULL, &elf_entry,
> -   &elf_low_addr, &elf_high_addr, big_endian,
> -   elf_machine, 1, 0);
> -if (kernel_size > 0 && have_dtb(info)) {
> -/* If there is still some room left at the base of RAM, try and put
> - * the DTB there like we do for images loaded with -bios or -pflash.
> - */
> -if (elf_low_addr > info->loader_start
> -|| elf_high_addr < info->loader_start) {
> -/* Pass elf_low_addr as address limit to load_dtb if it may be
> +
> +load_elf_hdr(info->kernel_filename, &elf_header, &elf_is64, &err);
> +
> +if (!err) {
> +int data_swab = 0;
> +bool big_endian;
> +
> +if (elf_is64) {
> +big_endian = 

Re: [Qemu-devel] [PATCH v1 00/17] ARM big-endian and setend support

2016-01-19 Thread Peter Maydell
On 18 January 2016 at 07:12, Peter Crosthwaite
 wrote:
> Hi All,
>
> This patch series adds system-mode big-endian support for ARM. It also
> implements the setend instruction, and loading of BE binaries even in
> LE emulation mode.
>
> Based on Paolo's original work. I have moved all the BE32 related work
> to the back of the series. Multiple parties are interested in the BE8
> work just on its own, so that could potentially be merged w/o BE32.
> PMM requested BE32 be at least thought out architecturally, so this
> series sees BE32 functionality through.
>
> I have tested all of LE, BE8 and BE32 in both linux-user mode (for
> regressions) and system mode (BE8 and BE32 are new here).
> My test application is here, the README gives some example command
> lines you can run:
>
> https://github.com/pcrost/arm-be-test

Thanks for picking this up again. I've sent out my review comments on it now.

-- PMM



Re: [Qemu-devel] [RFC] util: Fix QEMU_LD_PREFIX endless loop

2016-01-19 Thread Peter Maydell
On 15 January 2016 at 18:15, Richard Henderson  wrote:
> On 01/15/2016 09:53 AM, Peter Maydell wrote:
>>> @@ -58,7 +58,7 @@ static struct pathelem *new_entry(const char *root,
>>>  #if defined(DT_DIR) && defined(DT_UNKNOWN) && defined(DT_LNK)
>>>  # define dirent_type(dirent) ((dirent)->d_type)
>>>  # define is_dir_maybe(type) \
>>> -((type) == DT_DIR || (type) == DT_UNKNOWN || (type) == DT_LNK)
>>> +((type) == DT_DIR || (type) == DT_UNKNOWN)
>>>  #else
>>>  # define dirent_type(dirent) (1)
>>>  # define is_dir_maybe(type)  (type)
>>> --
>>> 2.5.0
>>
>> This change would be essentially reverting commit 338d80dd353c50b63,
>> which specifically added support for symbolic links in the directory
>> structure. So if we applied it we'd be regressing on the problem
>> that that change was meant to fix.
>>
>> Richard, git says that commit was one of yours :-)
>
> Because gcc and qemu have different names for their sysroot trees, and in my
> disks, gcc is the "master".  So I normally have
>
>.../qemu/run/qemu-alpha -> .../gcc/run-cross/alphaev67-linux/sys-root
>.../qemu/run/qemu-arm -> .../gcc/run-cross/arm-linux-gnueabi/sys-root
>.../qemu/run/qemu-sparc -> .../gcc/run-cross/sparc64-linux/sys-root
>.../qemu/run/qemu-sparc64 -> .../gcc/run-cross/sparc64-linux/sys-root
>
> The DT_LNK is required for traversing even the first link.

Right. So the path.c code is definitely buggy, but this patch
isn't the right way to fix it. It really doesn't behave
sensibly if you point it at a full root fs, but lots of people
want to do that, so it would be nice if it worked...

I think the underlying thing the code is trying to do is
create a sort of union-mount of the real root filesystem and
the directory you point at with -L. We need to do that in a way
that doesn't insist on scanning everything in the -L directory
on startup.
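
One possible shape for such a lazy lookup is sketched below. This is only
an illustration under simplifying assumptions, not what path.c does and
not a proposed patch: the function name is hypothetical, it checks a
single path at the time it is requested using POSIX access(), and it
ignores the symlink handling discussed above.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Resolve guest_path on demand: prefer the copy under 'prefix' (the -L
 * directory) if it exists, otherwise fall back to the host path.
 * Returns 'out', or NULL if the buffer is too small.
 */
static const char *sketch_resolve(const char *prefix, const char *guest_path,
                                  char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%s%s", prefix, guest_path);

    if (n < 0 || (size_t)n >= out_len) {
        return NULL;
    }
    if (access(out, F_OK) == 0) {
        return out;                 /* found under the prefix */
    }
    n = snprintf(out, out_len, "%s", guest_path);
    if (n < 0 || (size_t)n >= out_len) {
        return NULL;
    }
    return out;                     /* fall back to the real root */
}
```

Nothing is scanned at startup in this scheme; the cost is one extra
existence check per lookup.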

thanks
-- PMM



Re: [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specification

2016-01-19 Thread Kevin Wolf
Am 11.01.2016 um 14:05 hat Vladimir Sementsov-Ogievskiy geschrieben:
> The new feature for qcow2: storing bitmaps.
> 
> This patch adds new header extension to qcow2 - Bitmaps Extension. It
> provides an ability to store virtual disk related bitmaps in a qcow2
> image. For now there is only one type of such bitmaps: Dirty Tracking
> Bitmap, which just tracks virtual disk changes from some moment.
> 
> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
> should be stored in this qcow2 file. The size of each bitmap
> (considering its granularity) is equal to virtual disk size.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> v7:
> 
> - Rewordings, grammar.
>   Max, Eric, John, thank you very much.
> 
> - add last paragraph: remaining bits in bitmap data clusters must be
>   zero.
> 
> - s/Bitmap Directory/bitmap directory/ and other names like this at
>   the request of Max.
> 
> v6:
> 
> - reword bitmap_directory_size description
> - bitmap type: make 0 reserved
> - extra_data_size: resize to 4bytes
>   Also, I've marked this field as "must be zero". We can always change
>   it, if we decide allowing managing app to specify any extra data, by
>   defining some magic value as a top of user extra data. So, for now
>   non-zero extra_data_size should be considered as an error.
> - swap name and extra_data to give good alignment to extra_data.
> 
> 
> v5:
> 
> - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of
>   bitmaps.
> - rewordings
> - move upper bounds to "Notes about Qemu limits"
> - s/should/must somewhere. (but not everywhere)
> - move name_size field closer to name itself in bitmap header
> - add extra data area to bitmap header
> - move bitmap data description to separate section
> 
>  docs/specs/qcow2.txt | 172 
> ++-
>  1 file changed, 171 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 121dfc8..997239d 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -103,7 +103,18 @@ in the description of a field.
>  write to an image with unknown auto-clear features if it
>  clears the respective bits from this field first.
>  
> -Bits 0-63:  Reserved (set to 0)
> +Bit 0:  Bitmaps extension bit
> +This bit indicates consistency for the 
> bitmaps
> +extension data.
> +
> +It is an error if this bit is set without the
> +bitmaps extension present.
> +
> +If the bitmaps extension is present but this
> +bit is unset, the bitmaps extension data is
> +inconsistent.

It may as well be consistent, but we don't know.

Perhaps something like "must be considered inconsistent" or "is
potentially inconsistent".
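
A sketch of how a reader of the image might apply that wording, assuming
hypothetical names (classify_bitmaps() and the enum are illustrative only,
not QEMU code); bit 0 is taken from the quoted hunk:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of autoclear_features, as proposed in the quoted patch. */
#define AUTOCLEAR_BITMAPS (UINT64_C(1) << 0)

typedef enum {
    BITMAPS_ABSENT,     /* no extension, bit clear: nothing to do */
    BITMAPS_USABLE,     /* extension present and bit set */
    BITMAPS_UNTRUSTED,  /* extension present, bit clear: must be
                           considered inconsistent */
    BITMAPS_INVALID     /* bit set without the extension: error */
} BitmapsState;

static BitmapsState classify_bitmaps(uint64_t autoclear_features,
                                     bool extension_present)
{
    bool bit = autoclear_features & AUTOCLEAR_BITMAPS;

    if (!extension_present) {
        return bit ? BITMAPS_INVALID : BITMAPS_ABSENT;
    }
    return bit ? BITMAPS_USABLE : BITMAPS_UNTRUSTED;
}
```

The UNTRUSTED case is the one under discussion: the data may well be
fine, but software cannot tell, so it has to treat it as inconsistent.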

> +
> +Bits 1-63:  Reserved (set to 0)
>  
>   96 -  99:  refcount_order
>  Describes the width of a reference count block entry 
> (width
> @@ -123,6 +134,7 @@ be stored. Each extension has a structure like the 
> following:
>  0x00000000 - End of the header extension area
>  0xE2792ACA - Backing file format name
>  0x6803f857 - Feature name table
> +0x23852875 - Bitmaps extension
>  other  - Unknown header extension, can be safely
>   ignored
>  
> @@ -166,6 +178,34 @@ the header extension data. Each entry look like this:
>  terminated if it has full length)
>  
>  
> +== Bitmaps extension ==
> +
> +The bitmaps extension is an optional header extension. It provides the 
> ability
> +to store bitmaps related to a virtual disk. For now, there is only one bitmap
> +type: the dirty tracking bitmap, which tracks virtual disk changes from some
> +point in time.

I have one major problem with this patch, and it starts here.

The spec talks about dirty tracking bitmaps all the way, but it never
really defines what a dirty tracking bitmap even contains. It has a few
hints here and there, but they aren't consistent.

Here's the first hint: They track "virtual disk changes", which implies
they track guest clusters rather than host clusters.

> +The data of the extension should be considered consistent only if the
> +corresponding auto-clear feature bit is set, see autoclear_features above.
> +
> +The fields of the bitmaps extension are:
> +
> +  0 -  3:  nb_bitmaps
> +   The number of bitmaps contained in the image. Must be
> +   greater than or equal to 1.
> +
> +   Note: Qemu currently only supports up to 65535 bitmaps per
> +  

Re: [Qemu-devel] [PATCH] hw/arm/virt: Add always-on property to the virt board timer

2016-01-19 Thread Andrew Jones
On Tue, Jan 19, 2016 at 01:43:07PM +, Marc Zyngier wrote:
> On 19/01/16 13:32, Andrew Jones wrote:
> > On Tue, Jan 19, 2016 at 01:43:41PM +0100, Christoffer Dall wrote:
> >> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:
> >>> On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote:
>  The virt board has an arch timer, which is always on.  Emit the
>  "always-on" property to indicate to Linux that it can switch off the
>  periodic timer and reduces the amount of interrupts injected into a
>  guest.
> 
>  Signed-off-by: Christoffer Dall 
>  ---
>   hw/arm/virt.c | 1 +
>   1 file changed, 1 insertion(+)
> 
>  diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>  index 05f9087..265fe9a 100644
>  --- a/hw/arm/virt.c
>  +++ b/hw/arm/virt.c
>  @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo 
>  *vbi, int gictype)
>   qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",
>   "arm,armv7-timer");
>   }
>  +qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);
>   qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",
>  GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, 
>  irqflags,
>  GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, 
>  irqflags,
>  -- 
>  2.1.2.330.g565301e.dirty
> 
> 
> >>>
> >>> Hi Christoffer,
> >>>
> >>> We should also patch the ACPI generation at the same time. I think
> >>> something like
> >>>
> >>>  - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE;
> >>>  + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON;
> >>
> >> I'm really not familiar enough with ACPI to be comfortable writing code
> >> for this or testing this.
> >>
> >> But if someone can pick this up and add the ACPI bits or can post a
> >> follow-up patch, then I'm all for it :)
> > 
> > I can post a follow-up patch.
> > 
> >>
> >>>
> >>> should do it.
> >>>
> >>> Also, having the guest reduce the number of interrupts sounds good. Can
> >>> you point me to something to read about how/why a guest may choose to do
> >>> that, and what the trade-offs are?
> >>>
> >> Not really, but you can ask Marc.
> > 
> > OK, CCing him. One thing I see is that without this change we're
> > currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though
> > it's not true. Having that set may disable the oneshot capability
> > necessary to switch to nohz mode? I'll just stop there with my
> > speculation though, so Marc won't have to correct too much...
> 
> You're spot on. See 82a5619 in the kernel tree. When I did a similar
> change in kvmtool, I saw a massive reduction in the number of timer
> interrupts injected (especially when the number of vcpus is relatively high).
> 
> This also has interesting benefits when running on a model, where
> you're trying to squeeze the last bits of "performance" from the monster...
>

Hmm, I'm probably testing this wrong, but I don't see any difference in
the number of injected timer interrupts. My guest, which I boot with
UEFI, has 

CONFIG_ARM_ARCH_TIMER=y
CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y
CONFIG_ARM_TIMER_SP804=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000

I've booted a guest using DT with and without this patch

---WITHOUT---

# ls /proc/device-tree/timer
compatible  interrupts  name
# cat /proc/interrupts
       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  3:   6958   5766   5166   5187   5576   5129   4695   4398   GIC  27 Edge  arch_timer
# sleep 120 && cat /proc/interrupts
       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  3:   7557   5986   5487   5265   6232   5868   5464   4438   GIC  27 Edge  arch_timer

---WITH---

# ls /proc/device-tree/timer
always-on  compatible  interrupts  name
# cat /proc/interrupts
       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  3:   7005   6080   4996   5391   5165   5257   4930   4844   GIC  27 Edge  arch_timer
# sleep 120 && cat /proc/interrupts
       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  3:   7523   6505   5264   6717   5273   5391   5526   4901   GIC  27 Edge  arch_timer
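
To compare the two runs, the per-CPU counts can be reduced to one total
per 120 s window. A small sketch, with a hypothetical helper irq_delta();
the arrays hold the arch_timer counts from the samples above:

```c
#include <stddef.h>

/* Sum of the per-CPU increases between two /proc/interrupts samples. */
static unsigned long irq_delta(const unsigned long *before,
                               const unsigned long *after, size_t ncpus)
{
    unsigned long total = 0;

    for (size_t i = 0; i < ncpus; i++) {
        total += after[i] - before[i];
    }
    return total;
}

/* arch_timer counts for CPU0..CPU7, copied from the samples above. */
static const unsigned long without_before[8] =
    { 6958, 5766, 5166, 5187, 5576, 5129, 4695, 4398 };
static const unsigned long without_after[8] =
    { 7557, 5986, 5487, 5265, 6232, 5868, 5464, 4438 };
static const unsigned long with_before[8] =
    { 7005, 6080, 4996, 5391, 5165, 5257, 4930, 4844 };
static const unsigned long with_after[8] =
    { 7523, 6505, 5264, 6717, 5273, 5391, 5526, 4901 };
```

For these samples the totals are 3422 without the patch and 3432 with it,
so the two runs are effectively the same.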



And kvm trace data has

---WITHOUT---
$ grep kvm_timer_update_irq trace.out | wc -l
94336
---WITH---
$ grep kvm_timer_update_irq trace.out | wc -l
95838


Any suggestions?

Thanks,
drew



Re: [Qemu-devel] [PATCH 09/10] virtio: read avail_idx from VQ only when necessary

2016-01-19 Thread Paolo Bonzini


On 19/01/2016 17:54, Michael S. Tsirkin wrote:
> On Fri, Jan 15, 2016 at 01:41:57PM +0100, Paolo Bonzini wrote:
>> From: Vincenzo Maffione 
>>
>> The virtqueue_pop() implementation needs to check if the avail ring
>> contains some pending buffers. To perform this check, it is not
>> always necessary to fetch the avail_idx in the VQ memory, which is
>> expensive. This patch introduces a shadow variable tracking avail_idx
>> and modifies virtio_queue_empty() to access avail_idx in physical
>> memory only when necessary.
>>
>> Signed-off-by: Vincenzo Maffione 
>> Message-Id: 
>> 
>> Signed-off-by: Paolo Bonzini 
> 
> Is the cost due to the page walk?

Yes, as with all the other patches.  But unlike patches 7 and 10 where
we just reduce the number of walks, for patch 8 and 9 it's difficult to
beat a local cache. :)
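
A toy model of the idea, for illustration only: ToyVirtQueue and the
toy_* functions are simplified stand-ins rather than the real virtio.c
structures, and guest_reads just counts the accesses that would need a
page walk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for QEMU's VirtQueue; only the fields used here. */
typedef struct {
    const volatile uint16_t *guest_avail_idx; /* lives in guest memory */
    uint16_t last_avail_idx;    /* next entry the device will pop */
    uint16_t shadow_avail_idx;  /* last value fetched from the guest */
    unsigned guest_reads;       /* counts the expensive accesses */
} ToyVirtQueue;

/* Models the expensive read: fetch avail_idx and refresh the shadow. */
static uint16_t toy_vring_avail_idx(ToyVirtQueue *vq)
{
    vq->guest_reads++;
    vq->shadow_avail_idx = *vq->guest_avail_idx;
    return vq->shadow_avail_idx;
}

static bool toy_virtio_queue_empty(ToyVirtQueue *vq)
{
    if (vq->shadow_avail_idx != vq->last_avail_idx) {
        return false;           /* known pending buffers: no guest access */
    }
    /* The shadow says "empty", so only now go and ask the guest. */
    return toy_vring_avail_idx(vq) == vq->last_avail_idx;
}
```

Guest memory is touched only when the shadow copy claims the ring is
empty, which is the case where a fresh value is actually needed.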

>> @@ -1579,6 +1595,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
>> version_id)
>>  return -1;
>>  }
>>  vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
>> +vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);
>>  }
>>  }
> 
> 
> shadow_avail_idx also should be updated on vhost stop,

That's virtio_queue_set_last_avail_idx, right?

Paolo



Re: [Qemu-devel] [PATCH v1 15/17] loader: add API to load elf header

2016-01-19 Thread Peter Maydell
On 18 January 2016 at 07:12, Peter Crosthwaite
 wrote:
> Add an API to load an elf header from a file. Populates a
> buffer with the header contents, as well as a boolean for whether the
> elf is 64b or not. Both arguments are optional.
>
> Signed-off-by: Peter Crosthwaite 
> ---
>
>  hw/core/loader.c| 48 
>  include/hw/loader.h |  1 +
>  2 files changed, 49 insertions(+)
>
> diff --git a/hw/core/loader.c b/hw/core/loader.c
> index 6b69852..28da8e2 100644
> --- a/hw/core/loader.c
> +++ b/hw/core/loader.c
> @@ -331,6 +331,54 @@ const char *load_elf_strerror(int error)
>  }
>  }
>
> +void load_elf_hdr(const char *filename, void *hdr, bool *is64, Error **errp)
> +{
> +int fd;
> +uint8_t e_ident[EI_NIDENT];
> +size_t hdr_size, off = 0;
> +bool is64l;
> +
> +fd = open(filename, O_RDONLY | O_BINARY);
> +if (fd < 0) {
> +error_setg_errno(errp, errno, "Fail to open file");

"Failed" (also below).

I don't think we end up with the filename anywhere in the
error message; it would be helpful if we could include it.
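
In the patch this would presumably just mean passing the filename through
the format string of error_setg_errno(). As a standalone illustration of
the resulting message shape, here is a sketch with a hypothetical
format_open_error() helper (a stand-in built on snprintf(), not a QEMU
function):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in for the QEMU error API: builds "<what> <filename>: <reason>"
 * so the failing file is visible in the message.
 */
static int format_open_error(char *buf, size_t len,
                             const char *filename, int err)
{
    return snprintf(buf, len, "Failed to open file %s: %s",
                    filename, strerror(err));
}
```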

> +return;
> +}
> +if (read(fd, e_ident, sizeof(e_ident)) != sizeof(e_ident)) {
> +error_setg_errno(errp, errno, "Fail to read file");
> +goto fail;
> +}
> +if (e_ident[0] != ELFMAG0 ||
> +e_ident[1] != ELFMAG1 ||
> +e_ident[2] != ELFMAG2 ||
> +e_ident[3] != ELFMAG3) {
> +error_setg(errp, "Bad ELF magic");
> +goto fail;
> +}
> +
> +is64l = e_ident[EI_CLASS] == ELFCLASS64;
> +hdr_size = is64l ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr);
> +if (is64) {
> +*is64 = is64l;
> +}
> +
> +lseek(fd, 0, SEEK_SET);

You're not checking this lseek for failure (and you don't
need it anyway, because you could just copy the magic bytes
into *hdr and read four fewer bytes).
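
Roughly what I have in mind, as a sketch rather than a drop-in
replacement: read_full() and read_elf_hdr() are hypothetical names, the
fd is assumed to be already open, and error reporting is reduced to a
return code. The bytes already read (all of e_ident here) are copied into
the header buffer, and ssize_t is used so that the -1 case is well defined.

```c
#include <elf.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Read exactly len bytes; -1 on error (errno == 0 means short file). */
static int read_full(int fd, void *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        ssize_t br = read(fd, (char *)buf + off, len - off);
        if (br < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;          /* I/O error, errno set by read() */
        }
        if (br == 0) {
            errno = 0;
            return -1;          /* file too short */
        }
        off += (size_t)br;
    }
    return 0;
}

/* hdr and is64 are both optional, as in the patch. */
static int read_elf_hdr(int fd, void *hdr, bool *is64)
{
    unsigned char e_ident[EI_NIDENT];
    size_t hdr_size;
    bool is64l;

    if (read_full(fd, e_ident, sizeof(e_ident)) < 0) {
        return -1;
    }
    if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
        errno = 0;
        return -1;              /* bad ELF magic */
    }
    is64l = e_ident[EI_CLASS] == ELFCLASS64;
    hdr_size = is64l ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr);
    if (is64) {
        *is64 = is64l;
    }
    if (!hdr) {
        return 0;
    }
    /* Reuse the bytes already read instead of seeking back to 0. */
    memcpy(hdr, e_ident, sizeof(e_ident));
    return read_full(fd, (char *)hdr + sizeof(e_ident),
                     hdr_size - sizeof(e_ident));
}
```

Since nothing seeks, this also works on a non-seekable fd.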

> +while (hdr && off < hdr_size) {
> +size_t br = read(fd, hdr + off, hdr_size - off);
> +switch (br) {
> +case 0:
> +error_setg(errp, "File too short");
> +goto fail;
> +case -1:
> +error_setg_errno(errp, errno, "Failed to read file");
> +goto fail;
> +}
> +off += br;
> +}
> +
> +fail:
> +close(fd);
> +}
> +
>  /* return < 0 if error, otherwise the number of bytes loaded in memory */
>  int load_elf(const char *filename, uint64_t (*translate_fn)(void *, 
> uint64_t),
>   void *translate_opaque, uint64_t *pentry, uint64_t *lowaddr,
> diff --git a/include/hw/loader.h b/include/hw/loader.h
> index f7b43ab..33067f8 100644
> --- a/include/hw/loader.h
> +++ b/include/hw/loader.h
> @@ -36,6 +36,7 @@ int load_elf(const char *filename, uint64_t 
> (*translate_fn)(void *, uint64_t),
>   void *translate_opaque, uint64_t *pentry, uint64_t *lowaddr,
>   uint64_t *highaddr, int big_endian, int elf_machine,
>   int clear_lsb);
> +void load_elf_hdr(const char *filename, void *hdr, bool *is64, Error **errp);

Doc comment, please.

>  int load_aout(const char *filename, hwaddr addr, int max_sz,
>int bswap_needed, hwaddr target_page_size);
>  int load_uimage(const char *filename, hwaddr *ep,
> --
> 1.9.1

thanks
-- PMM



[Qemu-devel] [PULL 1/2] vfio/pci-quirks: Only quirk to size of PCI config space

2016-01-19 Thread Alex Williamson
For quirks that support the full PCIe extended config space, limit the
quirk to only the size of config space available through vfio.  This
allows host systems with broken MMCONFIG regions to still make use of
these quirks without generating bad address faults trying to access
beyond the end of config space exposed through vfio.  This may expose
direct access to the mirror of extended config space, only trapping
the sub-range of standard config space, but allowing this makes the
quirk, and thus the device, functional.  We expect that only device
specific accesses make use of the mirror, not general extended PCI
capability accesses, so any virtualization in this space is likely
unnecessary anyway, and the device is still IOMMU isolated, so it
should only be able to hurt itself through any bogus configurations
enabled by this space.
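
As an illustration of the effect of the mask change, here is a simplified
model. It assumes the window quirk treats an address as a config space
access when (addr & ~mask) equals the match value; window_matches() is a
hypothetical helper, see pci-quirks.c for the real comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_CONFIG_SPACE_SIZE  0x100   /* conventional config space */
#define PCIE_CONFIG_SPACE_SIZE 0x1000  /* extended config space */

/*
 * Simplified model of a window-quirk match: the upper address bits
 * select the config space mirror, the low bits covered by the mask
 * are the config space offset. config_size must be a power of two.
 */
static bool window_matches(uint32_t addr, uint32_t match,
                           uint32_t config_size)
{
    uint32_t mask = config_size - 1;

    return (addr & ~mask) == match;
}
```

With a 256 byte config space, address 0x4100 no longer matches the
0x4000 window and goes straight to the device, while 0x4010 is still
trapped.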

Link: https://www.redhat.com/archives/vfio-users/2015-November/msg00192.html
Reported-by: Ronnie Swanink 
Reviewed-by: Laszlo Ersek 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci-quirks.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 30c68a1..e117c41 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -328,7 +328,7 @@ static void vfio_probe_ati_bar4_quirk(VFIOPCIDevice *vdev, 
int nr)
 window->data_offset = 4;
 window->nr_matches = 1;
 window->matches[0].match = 0x4000;
-window->matches[0].mask = PCIE_CONFIG_SPACE_SIZE - 1;
+window->matches[0].mask = vdev->config_size - 1;
 window->bar = nr;
 window->addr_mem = &quirk->mem[0];
 window->data_mem = &quirk->mem[1];
@@ -674,7 +674,7 @@ static void vfio_probe_nvidia_bar5_quirk(VFIOPCIDevice 
*vdev, int nr)
 window->matches[0].match = 0x1800;
 window->matches[0].mask = PCI_CONFIG_SPACE_SIZE - 1;
 window->matches[1].match = 0x88000;
-window->matches[1].mask = PCIE_CONFIG_SPACE_SIZE - 1;
+window->matches[1].mask = vdev->config_size - 1;
 window->bar = nr;
 window->addr_mem = bar5->addr_mem = &quirk->mem[0];
 window->data_mem = bar5->data_mem = &quirk->mem[1];
@@ -765,7 +765,7 @@ static void vfio_probe_nvidia_bar0_quirk(VFIOPCIDevice 
*vdev, int nr)
 memory_region_init_io(mirror->mem, OBJECT(vdev),
   &vfio_nvidia_mirror_quirk, mirror,
   "vfio-nvidia-bar0-88000-mirror-quirk",
-  PCIE_CONFIG_SPACE_SIZE);
+  vdev->config_size);
 memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
 mirror->offset, mirror->mem, 1);
 




[Qemu-devel] [PULL 2/2] vfio/pci: Lazy PBA emulation

2016-01-19 Thread Alex Williamson
The PCI spec recommends devices use additional alignment for MSI-X
data structures to allow software to map them to separate processor
pages.  One advantage of doing this is that we can emulate those data
structures without a significant performance impact to the operation
of the device.  Some devices fail to implement that suggestion and
assigned device performance suffers.

One such case of this is a Mellanox MT27500 series, ConnectX-3 VF,
where the MSI-X vector table and PBA are aligned on separate 4K
pages.  If PBA emulation is enabled, performance suffers.  It's not
clear how much value we get from PBA emulation, but the solution here
is to only lazily enable the emulated PBA when a masked MSI-X vector
fires.  We then attempt to more aggressively disable the PBA memory
region any time a vector is unmasked.  The expectation is then that
a typical VM will run entirely with PBA emulation disabled, and only
when used is that emulation re-enabled.
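
The state machine is small enough to model in isolation. The sketch
below is a toy, not the patch itself: ToyMsix and the toy_* functions are
hypothetical, vectors are limited to 64, and pba_enabled stands in for
the enabled state of the PBA MemoryRegion.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t pending;      /* one bit per vector that fired while masked */
    uint64_t masked;       /* one bit per currently masked vector */
    bool pba_enabled;      /* stands in for the PBA MemoryRegion state */
} ToyMsix;

/* A vector fires (nr < 64): only a masked one forces PBA emulation on. */
static void toy_vector_fired(ToyMsix *m, unsigned nr)
{
    if (m->masked & (UINT64_C(1) << nr)) {
        m->pending |= UINT64_C(1) << nr;
        m->pba_enabled = true;
    }
}

/* A vector is unmasked: drop its pending bit, disable PBA if none left. */
static void toy_vector_unmasked(ToyMsix *m, unsigned nr)
{
    m->masked &= ~(UINT64_C(1) << nr);
    m->pending &= ~(UINT64_C(1) << nr);
    if (m->pending == 0) {
        m->pba_enabled = false;
    }
}
```

A guest that never lets a masked vector fire never flips pba_enabled, so
it keeps direct access to the page that holds the PBA.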

Reported-by: Shyam Kaushik 
Tested-by: Shyam Kaushik 
Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   39 +++
 hw/vfio/pci.h |1 +
 trace-events  |2 ++
 3 files changed, 42 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1fb868c..e66c47f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -356,6 +356,13 @@ static void vfio_msi_interrupt(void *opaque)
 if (vdev->interrupt == VFIO_INT_MSIX) {
 get_msg = msix_get_message;
 notify = msix_notify;
+
+/* A masked vector firing needs to use the PBA, enable it */
+if (msix_is_masked(&vdev->pdev, nr)) {
+set_bit(nr, vdev->msix->pending);
+memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, true);
+trace_vfio_msix_pba_enable(vdev->vbasedev.name);
+}
 } else if (vdev->interrupt == VFIO_INT_MSI) {
 get_msg = msi_get_message;
 notify = msi_notify;
@@ -535,6 +542,14 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 }
 }
 
+/* Disable PBA emulation when nothing more is pending. */
+clear_bit(nr, vdev->msix->pending);
+if (find_first_bit(vdev->msix->pending,
+   vdev->nr_vectors) == vdev->nr_vectors) {
+memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
+trace_vfio_msix_pba_disable(vdev->vbasedev.name);
+}
+
 return 0;
 }
 
@@ -738,6 +753,9 @@ static void vfio_msix_disable(VFIOPCIDevice *vdev)
 
 vfio_msi_disable_common(vdev);
 
+memset(vdev->msix->pending, 0,
+   BITS_TO_LONGS(vdev->msix->entries) * sizeof(unsigned long));
+
 trace_vfio_msix_disable(vdev->vbasedev.name);
 }
 
@@ -1251,6 +1269,8 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
 {
 int ret;
 
+vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
+sizeof(unsigned long));
 ret = msix_init(&vdev->pdev, vdev->msix->entries,
 &vdev->bars[vdev->msix->table_bar].region.mem,
 vdev->msix->table_bar, vdev->msix->table_offset,
@@ -1264,6 +1284,24 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
 return ret;
 }
 
+/*
+ * The PCI spec suggests that devices provide additional alignment for
+ * MSI-X structures and avoid overlapping non-MSI-X related registers.
+ * For an assigned device, this hopefully means that emulation of MSI-X
+ * structures does not affect the performance of the device.  If devices
+ * fail to provide that alignment, a significant performance penalty may
+ * result, for instance Mellanox MT27500 VFs:
+ * http://www.spinics.net/lists/kvm/msg125881.html
+ *
+ * The PBA is simply not that important for such a serious regression and
+ * most drivers do not appear to look at it.  The solution for this is to
+ * disable the PBA MemoryRegion unless it's being used.  We disable it
+ * here and only enable it if a masked vector fires through QEMU.  As the
+ * vector-use notifier is called, which occurs on unmask, we test whether
+ * PBA emulation is needed and again disable if not.
+ */
+memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
+
 return 0;
 }
 
@@ -1275,6 +1313,7 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
 msix_uninit(&vdev->pdev,
 &vdev->bars[vdev->msix->table_bar].region.mem,
 &vdev->bars[vdev->msix->pba_bar].region.mem);
+g_free(vdev->msix->pending);
 }
 }
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f004d52..6256587 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -95,6 +95,7 @@ typedef struct VFIOMSIXInfo {
 uint32_t pba_offset;
 MemoryRegion mmap_mem;
 void *mmap;
+unsigned long *pending;
 } VFIOMSIXInfo;
 
 typedef struct VFIOPCIDevice {
diff --git a/trace-events b/trace-events
index 934a7b6..c9ac144 100644
--- a/trace-events
+++ 

[Qemu-devel] [PULL 0/2] VFIO updates 2016-01-19

2016-01-19 Thread Alex Williamson
The following changes since commit 3db34bf64ab4f8797565dd8750003156c32b301d:

  Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-peter' 
into staging (2016-01-18 17:40:50 +)

are available in the git repository at:


  git://github.com/awilliam/qemu-vfio.git tags/vfio-update-20160119.0

for you to fetch changes up to 95239e162518dc6577164be3d9a789aba7f591a3:

  vfio/pci: Lazy PBA emulation (2016-01-19 11:33:42 -0700)


VFIO updates 2016-01-19

 - Performance fix for devices with poorly placed MSI-X PBA regions
 - Quirk fix for hosts with broken MMCONFIG access


Alex Williamson (2):
  vfio/pci-quirks: Only quirk to size of PCI config space
  vfio/pci: Lazy PBA emulation

 hw/vfio/pci-quirks.c |  6 +++---
 hw/vfio/pci.c| 39 +++
 hw/vfio/pci.h|  1 +
 trace-events |  2 ++
 4 files changed, 45 insertions(+), 3 deletions(-)


