date:20141202

Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds threshold

2014-12-02 Thread Francesco Romani

- Original Message -
> From: "Francesco Romani" 
> To: "Eric Blake" 
> Cc: kw...@redhat.com, lcapitul...@redhat.com, qemu-devel@nongnu.org, 
> stefa...@redhat.com, mdr...@linux.vnet.ibm.com
> Sent: Tuesday, December 2, 2014 8:47:45 AM
> Subject: Re: [Qemu-devel] [PATCHv3] block: add event when disk usage  exceeds 
> threshold
> 
> Thanks for the quick review!

> > Missing the change to the 'query-block' and 'query-named-block-nodes'
> > examples to show the new always-present output field.
> 

Sorry, I missed the *examples* keyword. Will fix.

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani

Re: [Qemu-devel] [PATCH for-2.3 0/6] vmdk: A few small fixes

2014-12-02 Thread Markus Armbruster

Fam Zheng  writes:

> Here are some improvements on miscellaneous things such as CID generation,
> comments, input validation.
>
> Fam Zheng (6):
>   vmdk: Use g_random_int to generate CID
>   vmdk: Fix comment to match code of extent lines
>   vmdk: Clean up descriptor file reading
>   vmdk: Check descriptor file length when reading it
>   vmdk: Remove unnecessary initialization
>   vmdk: Set errp on failures in vmdk_open_vmdk4
>
>  block/vmdk.c | 25 ++---
>  1 file changed, 18 insertions(+), 7 deletions(-)

The last one could be a candidate for stable.

Reviewed-by: Markus Armbruster

Re: [Qemu-devel] [PATCH v2] e1000: defer packets until BM enabled

2014-12-02 Thread Jason Wang




On Tue, Dec 2, 2014 at 2:06 AM, Michael S. Tsirkin  
wrote:

Some guests seem to set BM for e1000 after
enabling RX.
If packets arrive in the window, device is wedged.
Probably works by luck on real hardware, work around
this by making can_receive depend on BM.

Tested-by: Gabriel Somlo 
Signed-off-by: Michael S. Tsirkin 
---

Amos - you were the one reporting the failures, could
you pls confirm this patch fixes the issues for you?

 hw/net/e1000.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index e33a4da..89c5788 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -33,6 +33,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/dma.h"
 #include "qemu/iov.h"
+#include "qemu/range.h"
 
 #include "e1000_regs.h"
 
@@ -923,7 +924,9 @@ e1000_can_receive(NetClientState *nc)

 E1000State *s = qemu_get_nic_opaque(nc);
 
 return (s->mac_reg[STATUS] & E1000_STATUS_LU) &&

-(s->mac_reg[RCTL] & E1000_RCTL_EN) && e1000_has_rxbufs(s, 1);
+(s->mac_reg[RCTL] & E1000_RCTL_EN) &&
+(s->parent_obj.config[PCI_COMMAND] & PCI_COMMAND_MASTER) &&
+e1000_has_rxbufs(s, 1);
 }
 
 static uint64_t rx_desc_base(E1000State *s)

@@ -1529,6 +1532,20 @@ static NetClientInfo net_e1000_info = {
 .link_status_changed = e1000_set_link_status,
 };
 
+static void e1000_write_config(PCIDevice *pci_dev, uint32_t address,

+uint32_t val, int len)
+{
+E1000State *s = E1000(pci_dev);
+
+pci_default_write_config(pci_dev, address, val, len);
+
+if (range_covers_byte(address, len, PCI_COMMAND) &&
+(pci_dev->config[PCI_COMMAND] & PCI_COMMAND_MASTER)) {
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
+}
+
+
 static int pci_e1000_init(PCIDevice *pci_dev)
 {
 DeviceState *dev = DEVICE(pci_dev);
@@ -1539,6 +1556,8 @@ static int pci_e1000_init(PCIDevice *pci_dev)
 int i;
 uint8_t *macaddr;
 
+pci_dev->config_write = e1000_write_config;

+
 pci_conf = pci_dev->config;
 
 /* TODO: RST# value should be 0, PCI spec 6.2.4 */

--
MST


Reviewed-by: Jason Wang

Re: [Qemu-devel] [PATCH v2 01/13] block: Make essential BlockDriver objects public

2014-12-02 Thread Max Reitz


On 2014-12-01 at 16:59, Eric Blake wrote:

On 11/27/2014 07:48 AM, Max Reitz wrote:

There are some block drivers which are essential to QEMU and may not be
removed: These are raw, file and qcow2 (as the default non-raw format).
Make their BlockDriver objects public so they can be directly referenced
throughout the block layer without needing to call bdrv_find_format()
and having to deal with an error at runtime, while the real problem
occured during linking (where raw, file or qcow2 were not linked into

s/occured/occurred/


qemu).

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
  block/qcow2.c | 4 ++--
  block/raw-posix.c | 4 ++--
  block/raw-win32.c | 4 ++--
  block/raw_bsd.c   | 4 ++--
  include/block/block_int.h | 8 
  5 files changed, 16 insertions(+), 8 deletions(-)

Reviewed-by: Eric Blake 


+++ b/block/qcow2.c
@@ -2847,7 +2847,7 @@ static QemuOptsList qcow2_create_opts = {
  }
  };
  
-static BlockDriver bdrv_qcow2 = {

+BlockDriver *bdrv_qcow2 = &(BlockDriver){

Do we want any use of 'const', to avoid accidental manipulation of the
pointer and/or pointed-to contents?


Sounds good at first, but for instance qemu_opts_create() (which is 
often called with bdrv_qcow2->create_opts and the like) don't take a 
const pointer. We could fix all those functions, but trying to fix the 
const-ness of the block layer sounds like really tedious work to me... 
Also, bdrv_find_format() returns a non-const pointer so it's at least 
not more broken than it was before.


Max

Re: [Qemu-devel] [PATCH v2 02/13] block: Omit bdrv_find_format for essential drivers

2014-12-02 Thread Max Reitz


On 2014-12-01 at 17:01, Eric Blake wrote:

On 11/27/2014 07:48 AM, Max Reitz wrote:

We can always assume raw, file and qcow2 being available; so do not use
bdrv_find_format() to locate their BlockDriver objects but statically
reference the respective objects.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
  block.c | 11 ++-
  1 file changed, 2 insertions(+), 9 deletions(-)

Reviewed-by: Eric Blake 


@@ -1293,7 +1288,6 @@ int bdrv_append_temp_snapshot(BlockDriverState *bs, int 
flags, Error **errp)
  /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
  char *tmp_filename = g_malloc0(PATH_MAX + 1);
  int64_t total_size;
-BlockDriver *bdrv_qcow2;

Hmm - how hard would it be to get qemu to be clean under -Wshadow?  This
is a case where you would have had to change this hunk during patch 1 if
-Wshadow were in effect.


Well, I know I once found a bug which would have been spotted by 
-Wshadow; so in principle, I'm in favor of trying to enforce -Wshadow. 
On the other hand, I guess it may be really hard. We could try some 
time, but I don't want to have to run through all of qemu's code...


Max

Re: [Qemu-devel] [PATCH 0/3] blockdev: Add read-only option to change-blockdev

2014-12-02 Thread Max Reitz


On 2014-11-28 at 16:43, Markus Armbruster wrote:

Kevin Wolf  writes:


Am 20.11.2014 um 13:44 hat Max Reitz geschrieben:

The 'change' QMP and HMP command allows replacing the medium in drives
which support this, e.g. floppy disk drives. For some drives, the medium
carries information about whether it can be written to or not (again,
floppy drives). Therefore, it should be possible to change the read-only
state of block devices when changing the loaded medium.

This series adds an optional additional parameter to the 'change' QMP
and HMP command which allows changing the read-only state in four ways:

- 'retain': Just keep the status as it was before; this is the current
   behavior and thus this will be the default.
- 'ro': Force read-only access
- 'rw': Force writable access
- 'auto': This opens the new file R/W first, if that fails, the file is
   opened read-only.

Not sure if 'auto' is worth implementing (it's a typical HMP default
action that no QMP client would use, except that it isn't even the
default for HMP), but the implementation looks correct at least.

QMP eschews magic.  I'd prefer to keep 'auto' out.

HMP can offer it regardless, if it's useful.  But I doubt it will be.
Few users will need to control this, and fewer will realize they can by
giving an extra argument.


Well, Kevin made the good point of the user generally knowing whether a 
file should be written to or not. Furthermore, I don't think it would be 
too hard to use rw, then see that access was denied, and then use ro, 
manually. Maybe that's even better than auto, somehow.


Max


I wish I could tell you to leave change alone because it's a legacy
dungpile.  Alas, it's still only a dungpile.  I hope we can create a set
of sane media control commands relatively soon now.

Re: [Qemu-devel] [PATCH v4 0/3] chardev: Add -qmp-pretty

2014-12-02 Thread Max Reitz


On 2014-11-28 at 16:55, Markus Armbruster wrote:

Copying Luiz.

Max Reitz  writes:


This series does not add new functionality. Adding a QMP monitor with
prettily formatted JSON output can be done as follows:

$ qemu -chardev stdio,id=mon0 -mon chardev=mon0,mode=control,pretty=on

However, this is rather cumbersome, so this series (its first patch)
adds a shortcut in the form of the new command line option -qmp-pretty.

Since the argument given to a monitor command line option (such as -qmp)
is parsed depending on its prefix and probably also depending on the
current phase of the moon, this is cleaner than trying to add a "switch"
to -qmp itself (in the form of "-qmp stdio,pretty=on").

Yet another "convenience" option *groan*

Why can't we simply make -qmp set pretty=on and be done with it?
It's a convenience option, i.e. meant for humans, and why would humans
*not* want pretty=on?


Well, pretty=on produces really long output. I prefer short output if I 
don't expect to run commands like query-block which are completely 
unreadable with pretty=off.


I personally had a really bad taste when I saw that I couldn't modify 
-qmp somehow to get it to set pretty=on optionally. But then, after 
having implemented -qmp-pretty in basically five lines (the "case" 
statement for that option and the qemu_opt_set_bool() in 
monitor_parse()), I could send the series with a clear conscience. It's 
not that there's a new convenience option which is really bad to 
maintain and will break after the first gentle blow, as it reuses all 
the code paths from -qmp, which we will keep anyway.


Since this is only in block-next, I wouldn't object to it being dropped 
and making pretty=on the default for -qmp, in principle. But I 
personally like having a convenience pretty=off option, too, because I 
as a human often actually don't want pretty=on.


Max

Re: [Qemu-devel] [PATCH v8 02/10] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove

2014-12-02 Thread Max Reitz


On 2014-12-01 at 19:52, John Snow wrote:



On 11/27/2014 04:41 AM, Max Reitz wrote:

On 2014-11-26 at 18:41, John Snow wrote:

From: Fam Zheng 

The new command pair is added to manage user created dirty bitmap. The
dirty bitmap's name is mandatory and must be unique for the same 
device,

but different devices can have bitmaps with the same names.

The granularity is an optional field. If it is not specified, we will
choose a default granularity based on the cluster size if available,
clamped to between 4K and 64K (To mirror how the 'mirror' code was
already choosing granularity.) If we do not have cluster size info


Maybe swap the right parenthesis and the full stop?


This is an American thing, the difference between "aesthetic 
punctuation" and "logical punctuation." (<-- aesthetic.)


http://www.slate.com/articles/life/the_good_word/2011/05/the_rise_of_logical_punctuation.html 



I can make a mental note in the future to not use the American style, 
I just thought it would be fun to explain it.


No need to not use that style if it's just me learning English by false 
(or maybe in 50 years it'll be right) accusations. Thanks for your 
explanation!


Max


available, we choose 64K. This code has been factored out into helper


Naturally you're better at English than me, but shouldn't this be "into
a helper"?


This, on the other hand, is just a typo where my brain filled in the 
missing glue for me.



shared with block/mirror.

The types added to block-core.json will be re-used in future patches
in this series, see:
'qapi: Add transaction support to block-dirty-bitmap-{add, enable,
disable}'

Signed-off-by: Fam Zheng 
Signed-off-by: John Snow 
---
  block.c   | 19 ++
  block/mirror.c| 10 +-
  blockdev.c| 54
++
  include/block/block.h |  1 +
  qapi/block-core.json  | 55
+++
  qmp-commands.hx   | 49
+
  6 files changed, 179 insertions(+), 9 deletions(-)


Anyway, with or without these minor changes:

Reviewed-by: Max Reitz

Re: [Qemu-devel] [PATCH v3 06/22] qcow2: Helper for refcount array reallocation

2014-12-02 Thread Max Reitz


On 2014-11-28 at 11:46, Stefan Hajnoczi wrote:

On Thu, Nov 27, 2014 at 04:11:12PM +0100, Max Reitz wrote:

On 2014-11-27 at 16:09, Stefan Hajnoczi wrote:

On Thu, Nov 20, 2014 at 06:06:22PM +0100, Max Reitz wrote:

+/**
+ * Reallocates *array so that it can hold new_size entries. *size must contain
+ * the current number of entries in *array. If the reallocation fails, *array
+ * and *size will not be modified and -errno will be returned. If the
+ * reallocation is successful, *array will be set to the new buffer and *size
+ * will be set to new_size. The size of the reallocated refcount array buffer
+ * will be aligned to a cluster boundary, and the newly allocated area will be
+ * zeroed.
+ */
+static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
+  int64_t *size, int64_t new_size)
+{
+/* Round to clusters so the array can be directly written to disk */
+size_t old_byte_size = ROUND_UP(refcount_array_byte_size(s, *size),
+s->cluster_size);
+size_t new_byte_size = ROUND_UP(refcount_array_byte_size(s, new_size),
+s->cluster_size);
+uint16_t *new_ptr;
+
+if (new_byte_size <= old_byte_size) {
+*size = new_size;
+return 0;
+}

Why not realloc the array to the new smaller size? ...

Because such a call will actually never happen. I could replace this if ()
by assert(new_byte_size >= old_byte_size); if (new_byte_size ==
old_byte_size), but as I said before, I'm not a friend of assertions when
the code can deal perfectly well with the "unsupported" case.

It is simpler to put an if statement around the memset.


Well, I personally find an assertion simpler; and I will not drop the if 
(new_byte_size == old_byte_size) because this is a very common case so I 
don't want to rely on g_realloc() to detect it. Also, Eric proposed it 
and I'd like to avoid a ping-pong discussion.



Then the function actually frees unused memory


Which will never happen, though, because new_byte_size will never be 
less than old_byte_size.



and readers don't wonder why you decided not to shrink the array.


An assertion will clear up things as well.


Less code and slightly better behavior.


Well, an assert() is one line, while an if () takes two (if () { and }); 
the behavior will (hopefully) not change either.


But anyway, I'll go with your proposition because, as I said, I don't 
like assertions if the code can deal perfectly fine with the bad cases. 
Therefore, if at some point in time we decide to use 
realloc_refcount_array() to shrink a refcount array, it's better to have 
planned for that case.


Max

Re: [Qemu-devel] [PATCH v3 07/22] qcow2: Helper function for refcount modification

2014-12-02 Thread Max Reitz


On 2014-11-28 at 12:26, Stefan Hajnoczi wrote:

On Thu, Nov 27, 2014 at 04:32:52PM +0100, Max Reitz wrote:

On 2014-11-27 at 16:21, Stefan Hajnoczi wrote:

On Thu, Nov 20, 2014 at 06:06:23PM +0100, Max Reitz wrote:

@@ -583,7 +608,12 @@ static int QEMU_WARN_UNUSED_RESULT 
update_refcount(BlockDriverState *bs,
  /* we can update the count and save it */
  block_index = cluster_index & (s->refcount_block_size - 1);
-refcount = be16_to_cpu(refcount_block[block_index]);
+refcount = s->get_refcount(refcount_block, block_index);
+if (refcount < 0) {
+ret = -ERANGE;
+goto fail;
+}

Here again.


@@ -1206,11 +1236,14 @@ static int inc_refcounts(BlockDriverState *bs,
  }
  }
-if (++(*refcount_table)[k] == 0) {
+refcount = s->get_refcount(*refcount_table, k);

Here the refcount < 0 check is missing.  That's why it would be simpler
to eliminate the refcount < 0 case entirely.

It's not missing. This is part of the refcount check, as are all the
following ("in other places"). The refcount check builds a refcount array in
memory all by itself, so it knows for sure there are no overflows. The line
which you omitted directly after this clips the refcount values against
s->refcount_max which is INT64_MAX for 64-bit refcounts.

Therefore, no overflows possible in the refcount checking functions, because
the refcount checking functions don't introduce the overflows themselves.
The other overflow checks are only in place to reject faulty images provided
from the outside.

I see, that makes sense.


@@ -1726,7 +1761,7 @@ static void compare_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
  continue;
  }
-refcount2 = refcount_table[i];
+refcount2 = s->get_refcount(refcount_table, i);

Missing here too and in other places.


+typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
+  uint64_t index);

Should the return type be int64_t?

No. If it was, we'd have to check for errors every time we call it, but it
cannot return errors (well, if we let it return uint64_t, it might return
-ERANGE, but that's exactly what I don't want). Therefore, let it return
uint64_t so we know this function cannot fail.


+typedef void Qcow2SetRefcountFunc(void *refcount_array,
+  uint64_t index, uint64_t value);

Should value's type be int64_t?  Just because the type is unsigned
doesn't make (uint64_t)-1ULL a valid value.

Actually, it does. It's just that the implementation provided here does not
support it.

Since there is an assertion against that case in the 64-bit implementation
for this function, I don't have a problem with using int64_t here, though.
But that would break symmetry with Qcow2GetRefcountFunc(), and I do have a
reason there not to return a signed value, as explained above.

Okay, so uint64_t is really a different type - it's the qcow2 spec
64-bit refcount.  int64_t is the QEMU implementation's internal
representation.

This seems error-prone to me.  Maybe comments would have helped, but
it's best to eliminate the problem entirely.  Why not bite the bullet
and fix up qcow2?

There are two external function prototypes that need to be changed:

int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index);

int64_t qcow2_update_cluster_refcount(BlockDriverState *bs,
   int64_t cluster_index, int addend,
   enum qcow2_discard_type type);

/* Returns 0 on success, -errno on failure */
int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index,
uint64_t *refcount);
int qcow2_update_cluster_refcount(BlockDriverState *bs,
   int64_t cluster_index, int addend,
   enum qcow2_discard_type type,
  uint64_t *refcount);

Have I missed a fundamental reason why the implementation's internal
refcount type cannot be changed from int64_t to uint64_t?

It would keep the code complexity down and reduce errors.


Sounds simple enough. I'll take a look, thanks!

Max

Re: [Qemu-devel] [PATCH v3 10/22] qcow2: refcount_order parameter for qcow2_create2

2014-12-02 Thread Max Reitz


On 2014-11-27 at 17:25, Stefan Hajnoczi wrote:

On Thu, Nov 20, 2014 at 06:06:26PM +0100, Max Reitz wrote:

@@ -2003,6 +2015,7 @@ static int qcow2_create(const char *filename, QemuOpts 
*opts, Error **errp)
  size_t cluster_size = DEFAULT_CLUSTER_SIZE;
  PreallocMode prealloc;
  int version = 3;
+int refcount_width = 16, refcount_order;

It would be nice to consistently use either refcount_width or
refcount_bits throughout the code.


Will (try to) use refcount_bits everywhere.

Max

Re: [Qemu-devel] [PATCH v3 07/22] qcow2: Helper function for refcount modification

2014-12-02 Thread Max Reitz


On 2014-11-28 at 12:11, Stefan Hajnoczi wrote:

On Thu, Nov 20, 2014 at 06:06:23PM +0100, Max Reitz wrote:

@@ -1711,7 +1746,7 @@ static int calculate_refcounts(BlockDriverState *bs, 
BdrvCheckResult *res,
  static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
BdrvCheckMode fix, bool *rebuild,
int64_t *highest_cluster,
-  uint16_t *refcount_table, int64_t nb_clusters)
+  void *refcount_table, int64_t nb_clusters)
  {
  BDRVQcowState *s = bs->opaque;
  int64_t i, refcount1, refcount2;

An -ERANGE qcow2_get_refcount() return value is treated as a check error
here and we won't be able to rebuild the refcount blocks:

   refcount1 = qcow2_get_refcount(bs, i);
   if (refcount1 < 0) {
   fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
   i, strerror(-refcount1));
   res->check_errors++;
   continue;
   }

We should allow rebuilding refcount blocks, -ERANGE is just another
corrupted refcount.


Right, hopefully this will be fixed by implementing your proposal for 
changing "uint64_t qcow2_get_refcount(...)" to "int 
qcow2_get_refcount(..., uint64_t *)" etc..


Max

Re: [Qemu-devel] [PATCH v3 18/22] qcow2: Use intermediate helper CB for amend

2014-12-02 Thread Max Reitz


On 2014-11-28 at 15:13, Stefan Hajnoczi wrote:

On Thu, Nov 20, 2014 at 06:06:34PM +0100, Max Reitz wrote:

@@ -2674,6 +2743,7 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
  bool encrypt;
  int ret;
  QemuOptDesc *desc = opts->list->desc;
+Qcow2AmendHelperCBInfo helper_cb_info;
  
  while (desc && desc->name) {

  if (!qemu_opt_find(opts, desc->name)) {
@@ -2731,6 +2801,12 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
  desc++;
  }
  
+helper_cb_info = (Qcow2AmendHelperCBInfo){

+.original_status_cb = status_cb,
+.original_cb_opaque = cb_opaque,
+.total_operations = (new_version < old_version)

If you respin, another way of writing this is without total_operations
here (so it initializes to 0)...


+};
+
  /* Upgrade first (some features may require compat=1.1) */
  if (new_version > old_version) {
  s->qcow_version = new_version;
@@ -2789,7 +2865,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
  
  /* Downgrade last (so unsupported features can be removed before) */

  if (new_version < old_version) {
-ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+helper_cb_info.current_operation = QCOW2_DOWNGRADING;

...and then helper_cb_info.total_operations++ here.

That way the new_version < old_version check is not duplicated into the
helper_cb_info initializer.

The code is clearer because we assign current_operation and
total_operations at the same time.

Just a style suggestion, feel free to ignore if you don't like it.


I think helper_cb_info.total_operations should be set right when 
creating the object. We can only do .total_operations++ once, and that 
is for the very first operation because after that is executed, we may 
no longer change .total_operations; and I don't think we should treat 
the first operation differently than the rest.


Max

Re: [Qemu-devel] Update on TCG Multithreading

2014-12-02 Thread Dr. David Alan Gilbert

* Mark Burton (mark.bur...@greensocs.com) wrote:
> 
> All - first a huge thanks for those who have contributed, and those who have 
> expressed an interest in helping out.
> 
> One issue I???d like to see more opinions on is the question of a cache per 
> core, or a shared cache.
> I have heard anecdotal evidence that a shared cache gives a major performance 
> benefit???.
> Does anybody have anything more concrete?
> (of course we will get numbers in the end if we implement the hybrid scheme 
> as suggested in the wiki - but I???d still appreciate any feedback).
> 
> Our next plan is to start putting an implementation plan together. Probably 
> quite sketchy at this point, and we hope to start coding shortly.

I'd expect a shared one to be able to take advantage
of code that's translated by one core and then used on
another.
On the other hand with one per core you can perform updates
on the caches with a lot less locking; however you've still
got to be able to do invalidates across all the caches if any
core does the write, and that could also get tricky.

Dave

> 
> 
> Cheers
> 
> Mark.
> 
> 
> 
> 
> 
>+44 (0)20 7100 3485 x 210
>  +33 (0)5 33 52 01 77x 210
> 
>   +33 (0)603762104
>   mark.burton
>  
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 0/3] block: Don't probe for unknown backing file format

2014-12-02 Thread Kevin Wolf

Am 28.11.2014 um 12:33 hat Stefan Hajnoczi geschrieben:
> On Tue, Nov 25, 2014 at 06:12:39PM +0100, Kevin Wolf wrote:
> > v2:
> > - Added two patches to make the test case actually work properly and not
> >   just by accident on my laptop. [Max]
> > - Fixed comment in test case [Max]
> > 
> > Kevin Wolf (3):
> >   qcow2: Fix header extension size check
> >   qcow2.py: Add required padding for header extensions
> >   block: Don't probe for unknown backing file format
> > 
> >  block.c |  7 +++---
> >  block/qcow2.c   |  2 +-
> >  tests/qemu-iotests/080  |  2 ++
> >  tests/qemu-iotests/080.out  |  2 ++
> >  tests/qemu-iotests/114  | 61 
> > +
> >  tests/qemu-iotests/114.out  | 13 ++
> >  tests/qemu-iotests/group|  1 +
> >  tests/qemu-iotests/qcow2.py |  4 +++
> >  8 files changed, 87 insertions(+), 5 deletions(-)
> >  create mode 100755 tests/qemu-iotests/114
> >  create mode 100644 tests/qemu-iotests/114.out
> > 
> > -- 
> > 1.8.3.1
> > 
> > 
> 
> Thanks, applied to my block-next tree:
> https://github.com/stefanha/qemu/commits/block-next

Somehow you lost the new files (114 and 114.out) in patch 3. I'm fixing
this up in my block-next branch now, but I thought I should tell you in
case you need to fix some local script.

Kevin


pgp1q8U4S5dFT.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] qemu-iotests: 060: Filter the real disk size

2014-12-02 Thread Kevin Wolf

Am 27.11.2014 um 15:08 hat Kevin Wolf geschrieben:
> The real on-disk size of an image depends on things like the host
> filesystem. _img_info already filters it out, use the function in 060.
> 
> Signed-off-by: Kevin Wolf 

Applied to block-next.

Kevin

Re: [Qemu-devel] [PATCH] qemu-iotests: 082: Filter the real disk size

2014-12-02 Thread Kevin Wolf

Am 27.11.2014 um 17:28 hat Michael Mueller geschrieben:
> The real on-disk size of an image depends on things like the host
> filesystem. _img_info already filters it out, use the function in 082.
> 
> Signed-off-by: Michael Mueller 

Thanks, applied to the block-next branch.

Kevin

Re: [Qemu-devel] [PATCH v3 0/7] parallels format support improvements

2014-12-02 Thread Denis V. Lunev


On 06/11/14 15:54, Denis V. Lunev wrote:

The patchset implements additional compatibility bits for Parallels
format:
- initial support of parsing of Parallels DiskDeskriptor.xml
   Typically Parallels disk bundle consists of several images which are
   glued by XML disk descriptor. Also XML hides inside several important
   parameters which are not available in the image header.
- support for padded Parallels images.
   For the time being Parallels was created an optimization for such OSes
   in its desktop product. Desktop users are not qualified enough to create
   properly aligned installations. Thus Parallels makes a blind guess
   on a customer behalf and creates so-called "padded" images if guest
   OS type is specified as WinXP, Win2k and Win2k3.

The code uses approach from VMDK support, either image or XML descriptor
could be used. Though there is temporary hack in the opening code:
BlockDriverState->file is being reopened inside parallels_open. I prefer
to keep this code in this state till proper Parallels snapshots support
in order to minimize current changes.

Changes from v2:
- (patch 1) changed libxml2 addition as suggested by Michael Tokarev
- (patch 2) changed API of xml_find/xml_get_text to avoid memcpy to variable
   on stack
- (patch 2) dropped predefined value for PARALLELS_XML/PARALLELS_IMAGE
- (patch 2) other minor changes (spelling, placement)
- (patch 3) quoted TEST_IMG as suggested by Jeff Cody
- (patches 4, 6) quoted TEST_IMG as suggested by Jeff Cody

Changes from v1:
- dropped already merged part (original patches 1-3)

CC: Jeff Cody 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
CC: Roman Kagan 
CC: Denis V. Lunev 


ping

Re: [Qemu-devel] [PATCH for-2.3 0/6] vmdk: A few small fixes

2014-12-02 Thread Kevin Wolf

Am 02.12.2014 um 09:28 hat Markus Armbruster geschrieben:
> Fam Zheng  writes:
> 
> > Here are some improvements on miscellaneous things such as CID generation,
> > comments, input validation.
> >
> > Fam Zheng (6):
> >   vmdk: Use g_random_int to generate CID
> >   vmdk: Fix comment to match code of extent lines
> >   vmdk: Clean up descriptor file reading
> >   vmdk: Check descriptor file length when reading it
> >   vmdk: Remove unnecessary initialization
> >   vmdk: Set errp on failures in vmdk_open_vmdk4
> >
> >  block/vmdk.c | 25 ++---
> >  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> The last one could be a candidate for stable.
> 
> Reviewed-by: Markus Armbruster 

Thanks, applied to block-next.

Kevin

Re: [Qemu-devel] [PATCH] target-mips: Output CP0.Config2-5 in the register dump

2014-12-02 Thread Leon Alrae

On 18/11/2014 03:20, Maciej W. Rozycki wrote:
> @@ -19276,6 +19276,10 @@ void mips_cpu_dump_state(CPUState *cs, F
>  env->CP0_Status, env->CP0_Cause, env->CP0_EPC);
>  cpu_fprintf(f, "Config0 0x%08x Config1 0x%08x LLAddr 0x" 
> TARGET_FMT_lx "\n",
>  env->CP0_Config0, env->CP0_Config1, env->lladdr);
> +cpu_fprintf(f, "Config2 0x%08x Config3 0x%08x\n",
> +env->CP0_Config2, env->CP0_Config3);
> +cpu_fprintf(f, "Config4 0x%08x Config5 0x%08x\n",
> +env->CP0_Config4, env->CP0_Config5);

Wouldn't it be better to order these registers for example by CP0
Register and Select number rather than putting them at the end? I think
it doesn't matter at the moment as there are just few registers printed
out, but once we start adding more then probably we would like to avoid
having them placed arbitrarily in the output.

Regards,
Leon

Re: [Qemu-devel] [Bug 1383857] Re: aarch64: virtio disks don't show up in guest (neither blk nor scsi)

2014-12-02 Thread Christoffer Dall

I was just about to get to testing this stuff, but thanks for working
through it on your own, and apologies I didn't get to it before.

Cc'ing Marc so he's aware of the progress.

-Christoffer

On Mon, Dec 1, 2014 at 3:46 PM, Richard Jones  wrote:
> Finally found the problem, patch posted:
> https://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00034.html
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1383857
>
> Title:
>   aarch64: virtio disks don't show up in guest (neither blk nor scsi)
>
> Status in QEMU:
>   New
>
> Bug description:
>   kernel-3.18.0-0.rc1.git0.1.rwmj5.fc22.aarch64 (3.18 rc1 + some hardware 
> enablement)
>   qemu from git today
>
>   When I create a guest with virtio-scsi disks, they don't show up inside the 
> guest.
>   Literally after the virtio_mmio.ko and virtio_scsi.ko modules are loaded, 
> there are
>   no messages about disks, and of course nothing else works.
>
>   Really long command line (generated by libvirt):
>
>   HOME=/home/rjones USER=rjones LOGNAME=rjones QEMU_AUDIO_DRV=none
>   TMPDIR=/home/rjones/d/libguestfs/tmp
>   /home/rjones/d/qemu/aarch64-softmmu/qemu-system-aarch64 -name guestfs-
>   oqv29um3jp03kpjf -S -machine virt,accel=tcg,usb=off -cpu cortex-a57 -m
>   500 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid
>   a5f1a15d-2bc7-46df-9974-1d1f643b2449 -nographic -no-user-config
>   -nodefaults -chardev
>   socket,id=charmonitor,path=/home/rjones/.config/libvirt/qemu/lib
>   /guestfs-oqv29um3jp03kpjf.monitor,server,nowait -mon
>   chardev=charmonitor,id=monitor,mode=control -rtc
>   base=utc,driftfix=slew -no-reboot -boot strict=on -kernel
>   /home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d/kernel -initrd
>   /home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d/initrd -append
>   panic=1 console=ttyAMA0 earlyprintk=pl011,0x900 ignore_loglevel
>   efi-rtc=noprobe udevtimeout=6000 udev.event-timeout=6000
>   no_timer_check lpj=50 acpi=off printk.time=1 cgroup_disable=memory
>   root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm-256color -device
>   virtio-scsi-device,id=scsi0 -device virtio-serial-device,id=virtio-
>   serial0 -usb -drive
>   file=/home/rjones/d/libguestfs/tmp/libguestfs4GxfQ9/scratch.1,if=none,id
>   =drive-scsi0-0-0-0,format=raw,cache=unsafe -device scsi-
>   hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-
>   scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive
>   file=/home/rjones/d/libguestfs/tmp/libguestfs4GxfQ9/overlay2,if=none,id
>   =drive-scsi0-0-1-0,format=qcow2,cache=unsafe -device scsi-
>   hd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-
>   scsi0-0-1-0,id=scsi0-0-1-0 -serial
>   unix:/home/rjones/d/libguestfs/tmp/libguestfs4GxfQ9/console.sock
>   -chardev
>   
> socket,id=charchannel0,path=/home/rjones/d/libguestfs/tmp/libguestfs4GxfQ9/guestfsd.sock
>   -device virtserialport,bus=virtio-
>   
> serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.libguestfs.channel.0
>   -msg timestamp=on
>
>   There are no kernel messages about the disks, they just are not seen.
>
>   Worked with kernel 3.16 so I suspect this could be a kernel bug rather than 
> a
>   qemu bug, but I've no idea where to report those.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1383857/+subscriptions

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1383857

Title:
  aarch64: virtio disks don't show up in guest (neither blk nor scsi)

Status in QEMU:
  New

Bug description:
  kernel-3.18.0-0.rc1.git0.1.rwmj5.fc22.aarch64 (3.18 rc1 + some hardware 
enablement)
  qemu from git today

  When I create a guest with virtio-scsi disks, they don't show up inside the 
guest.
  Literally after the virtio_mmio.ko and virtio_scsi.ko modules are loaded, 
there are
  no messages about disks, and of course nothing else works.

  Really long command line (generated by libvirt):

  HOME=/home/rjones USER=rjones LOGNAME=rjones QEMU_AUDIO_DRV=none
  TMPDIR=/home/rjones/d/libguestfs/tmp
  /home/rjones/d/qemu/aarch64-softmmu/qemu-system-aarch64 -name guestfs-
  oqv29um3jp03kpjf -S -machine virt,accel=tcg,usb=off -cpu cortex-a57 -m
  500 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid
  a5f1a15d-2bc7-46df-9974-1d1f643b2449 -nographic -no-user-config
  -nodefaults -chardev
  socket,id=charmonitor,path=/home/rjones/.config/libvirt/qemu/lib
  /guestfs-oqv29um3jp03kpjf.monitor,server,nowait -mon
  chardev=charmonitor,id=monitor,mode=control -rtc
  base=utc,driftfix=slew -no-reboot -boot strict=on -kernel
  /home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d/kernel -initrd
  /home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d/initrd -append
  panic=1 console=ttyAMA0 earlyprintk=pl011,0x900 ignore_loglevel
  efi-rtc=noprobe udevtimeout=6000 udev.event-timeout=6000
  no_timer_check lpj=50 acpi=off printk

Re: [Qemu-devel] [PATCH v2 01/13] block: Make essential BlockDriver objects public

2014-12-02 Thread Kevin Wolf

Am 27.11.2014 um 15:48 hat Max Reitz geschrieben:
> There are some block drivers which are essential to QEMU and may not be
> removed: These are raw, file and qcow2 (as the default non-raw format).
> Make their BlockDriver objects public so they can be directly referenced
> throughout the block layer without needing to call bdrv_find_format()
> and having to deal with an error at runtime, while the real problem
> occured during linking (where raw, file or qcow2 were not linked into
> qemu).
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2.c | 4 ++--
>  block/raw-posix.c | 4 ++--
>  block/raw-win32.c | 4 ++--
>  block/raw_bsd.c   | 4 ++--
>  include/block/block_int.h | 8 
>  5 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 8b9ffc4..0eeba36 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2847,7 +2847,7 @@ static QemuOptsList qcow2_create_opts = {
>  }
>  };
>  
> -static BlockDriver bdrv_qcow2 = {
> +BlockDriver *bdrv_qcow2 = &(BlockDriver){

As you correctly guessed, I like the looks of this syntax, but it leaves
us with some bdrv_ objects being pointers, and most others still
directly being the BlockDriver object.

I think it would be better to define a public BlockDriver here and use
&bdrv_qcow2 where necessary. Perhaps not a blocker, but in case you need
to send a v3.

Kevin

Re: [Qemu-devel] [PATCH] target-mips: Fix CP0.Config3.ISAOnExc write accesses

2014-12-02 Thread Leon Alrae

On 18/11/2014 03:59, Maciej W. Rozycki wrote:
>  Please note that for this validation I'm using an artificial microMIPS 
> processor that also has an FPU implemented, so that our microMIPS FP 
> support is correctly validated too (I don't really know if there exists 
> any real microMIPS processor that includes an FPU; if so, then it would 
> be good to add it to the list our supported configurations).

FYI, there are real CPUs which support microMIPS and include FPU, for
example microAptivUC.

> qemu-mips-config3-isaonexc.diff

Reviewed-by: Leon Alrae

Re: [Qemu-devel] [PATCH v2 03/13] block/vvfat: qcow driver may not be found

2014-12-02 Thread Kevin Wolf

Am 27.11.2014 um 15:48 hat Max Reitz geschrieben:
> Although virtually impossible right now, bdrv_find_format("qcow") may
> fail. The vvfat block driver should heed that case.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz 

I wonder what the subtle breakage would be that we would get if we
changed this from qcow1 to qcow2...

Kevin

[Qemu-devel] [PATCH v2 0/7] coroutine: optimizations

2014-12-02 Thread Paolo Bonzini

As discussed in the other thread, this brings speedups from
dropping the coroutine mutex (which serializes multiple iothreads,
too) and using ELF thread-local storage.

The speedup in perf/cost is about 50% (190->125).  Windows port tested
with tests/test-coroutine.exe under Wine.

Paolo

v1->v2: include the noinline attribute [many...]
do not mention SwitchToFiber [Kevin]
rename run_main_iothread_exit -> run_main_thread_exit
leave personal opinions out of commit messages :) [Kevin]
mention gain from patch 7 [Peter]
change "alloc_pool_size +=" to "alloc_pool_size =" [Peter]

Paolo Bonzini (7):
  coroutine-ucontext: use __thread
  qemu-thread: add per-thread atexit functions
  test-coroutine: avoid overflow on 32-bit systems
  QSLIST: add lock-free operations
  coroutine: rewrite pool to avoid mutex
  coroutine: drop qemu_coroutine_adjust_pool_size
  coroutine: try harder not to delete coroutines

 block/block-backend.c |   4 --
 coroutine-ucontext.c  |  64 +++-
 include/block/coroutine.h |  10 -
 include/qemu/queue.h  |  15 ++-
 include/qemu/thread.h |   4 ++
 qemu-coroutine.c  | 104 ++
 tests/test-coroutine.c|   2 +-
 util/qemu-thread-posix.c  |  37 +
 util/qemu-thread-win32.c  |  48 -
 9 files changed, 157 insertions(+), 131 deletions(-)

-- 
2.1.0

[Qemu-devel] [PATCH v2 3/7] test-coroutine: avoid overflow on 32-bit systems

2014-12-02 Thread Paolo Bonzini

unsigned long is not large enough to represent 10 * duration there.
Just use floating point.

Signed-off-by: Paolo Bonzini 
---
 tests/test-coroutine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index e22fae1..27d1b6f 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -337,7 +337,7 @@ static void perf_cost(void)
"%luns per coroutine",
maxcycles,
duration, ops,
-   (unsigned long)(10 * duration) / maxcycles);
+   (unsigned long)(10.0 * duration / maxcycles));
 }
 
 int main(int argc, char **argv)
-- 
2.1.0

[Qemu-devel] [PATCH v2 1/7] coroutine-ucontext: use __thread

2014-12-02 Thread Paolo Bonzini

ELF thread local storage is about 10% faster on tests/test-coroutine's
perf/cost test.  The timing on my machine is 190ns per iteration with
pthread TLS, 170 with ELF TLS.

Based on a patch by Kevin Wolf and Peter Lieven, but redone to follow
the model of coroutine-win32.c (including the important "noinline"
attribute!).

Platforms without thread-local storage (OpenBSD probably?) will need
a new-enough GCC for this to compile, in order to use the same emutls
support that Windows already relies on.

Signed-off-by: Paolo Bonzini 
---
v1->v2: include the noinline attribute [many...]
do not mention SwitchToFiber [Kevin]

 coroutine-ucontext.c | 64 +---
 1 file changed, 16 insertions(+), 48 deletions(-)

diff --git a/coroutine-ucontext.c b/coroutine-ucontext.c
index 4bf2cde..d86e3e1 100644
--- a/coroutine-ucontext.c
+++ b/coroutine-ucontext.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "qemu-common.h"
 #include "block/coroutine_int.h"
@@ -48,15 +47,8 @@ typedef struct {
 /**
  * Per-thread coroutine bookkeeping
  */
-typedef struct {
-/** Currently executing coroutine */
-Coroutine *current;
-
-/** The default coroutine */
-CoroutineUContext leader;
-} CoroutineThreadState;
-
-static pthread_key_t thread_state_key;
+static __thread CoroutineUContext leader;
+static __thread Coroutine *current;
 
 /*
  * va_args to makecontext() must be type 'int', so passing
@@ -68,36 +60,6 @@ union cc_arg {
 int i[2];
 };
 
-static CoroutineThreadState *coroutine_get_thread_state(void)
-{
-CoroutineThreadState *s = pthread_getspecific(thread_state_key);
-
-if (!s) {
-s = g_malloc0(sizeof(*s));
-s->current = &s->leader.base;
-pthread_setspecific(thread_state_key, s);
-}
-return s;
-}
-
-static void qemu_coroutine_thread_cleanup(void *opaque)
-{
-CoroutineThreadState *s = opaque;
-
-g_free(s);
-}
-
-static void __attribute__((constructor)) coroutine_init(void)
-{
-int ret;
-
-ret = pthread_key_create(&thread_state_key, qemu_coroutine_thread_cleanup);
-if (ret != 0) {
-fprintf(stderr, "unable to create leader key: %s\n", strerror(errno));
-abort();
-}
-}
-
 static void coroutine_trampoline(int i0, int i1)
 {
 union cc_arg arg;
@@ -193,15 +155,23 @@ void qemu_coroutine_delete(Coroutine *co_)
 g_free(co);
 }
 
-CoroutineAction qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
-  CoroutineAction action)
+/* This function is marked noinline to prevent GCC from inlining it
+ * into coroutine_trampoline(). If we allow it to do that then it
+ * hoists the code to get the address of the TLS variable "current"
+ * out of the while() loop. This is an invalid transformation because
+ * the sigsetjmp() call may be called when running thread A but
+ * return in thread B, and so we might be in a different thread
+ * context each time round the loop.
+ */
+CoroutineAction __attribute__((noinline))
+qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
+  CoroutineAction action)
 {
 CoroutineUContext *from = DO_UPCAST(CoroutineUContext, base, from_);
 CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, to_);
-CoroutineThreadState *s = coroutine_get_thread_state();
 int ret;
 
-s->current = to_;
+current = to_;
 
 ret = sigsetjmp(from->env, 0);
 if (ret == 0) {
@@ -212,14 +181,13 @@ CoroutineAction qemu_coroutine_switch(Coroutine *from_, 
Coroutine *to_,
 
 Coroutine *qemu_coroutine_self(void)
 {
-CoroutineThreadState *s = coroutine_get_thread_state();
-
-return s->current;
+if (!current) {
+current = &leader.base;
+}
+return current;
 }
 
 bool qemu_in_coroutine(void)
 {
-CoroutineThreadState *s = pthread_getspecific(thread_state_key);
-
-return s && s->current->caller;
+return current && current->caller;
 }
-- 
2.1.0

[Qemu-devel] [PATCH v2 7/7] coroutine: try harder not to delete coroutines

2014-12-02 Thread Paolo Bonzini

From: Peter Lieven 

Placing coroutines on the global pool should be preferrable, because it
can help all threads.  But if the global pool is full, we can still
try to save some allocations by stashing completed coroutines on the
local pool.  This is quite cheap too, because it does not require
atomic operations, and provides a gain of 15% in the best case.

Signed-off-by: Peter Lieven 
Signed-off-by: Paolo Bonzini 
---
v1->v2: mention gain from this patch [Peter]
change "alloc_pool_size +=" to "alloc_pool_size =" [Peter]

 qemu-coroutine.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index da1b961..977f114 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -27,6 +27,7 @@ enum {
 static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
 static unsigned int release_pool_size;
 static __thread QSLIST_HEAD(, Coroutine) alloc_pool = 
QSLIST_HEAD_INITIALIZER(pool);
+static __thread unsigned int alloc_pool_size;
 static __thread Notifier coroutine_pool_cleanup_notifier;
 
 static void coroutine_pool_cleanup(Notifier *n, void *value)
@@ -58,13 +59,14 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
  * release_pool_size and the actual size of release_pool.  But
  * it is just a heuristic, it does not need to be perfect.
  */
-release_pool_size = 0;
+alloc_pool_size = atomic_xchg(&release_pool_size, 0);
 QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
 co = QSLIST_FIRST(&alloc_pool);
 }
 }
 if (co) {
 QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
+alloc_pool_size--;
 }
 }
 
@@ -87,6 +89,11 @@ static void coroutine_delete(Coroutine *co)
 atomic_inc(&release_pool_size);
 return;
 }
+if (alloc_pool_size < POOL_BATCH_SIZE) {
+QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
+alloc_pool_size++;
+return;
+}
 }
 
 qemu_coroutine_delete(co);
-- 
2.1.0

[Qemu-devel] [PATCH v2 2/7] qemu-thread: add per-thread atexit functions

2014-12-02 Thread Paolo Bonzini

Destructors are the main additional feature of pthread TLS compared
to __thread.  If we were using C++ (hint, hint!) we could have used
thread-local objects with a destructor.  Since we are not, instead,
we add a simple Notifier-based API.

Note that the notifier must be per-thread as well.  We can add a
global list as well later, perhaps.

The Win32 implementation has some complications because a) detached
threads used not to have a QemuThreadData; b) the main thread does
not go through win32_start_routine, so we have to use atexit too.

Signed-off-by: Paolo Bonzini 
---
v1->v2: rename run_main_iothread_exit -> run_main_thread_exit

 include/qemu/thread.h|  4 
 util/qemu-thread-posix.c | 37 +
 util/qemu-thread-win32.c | 48 +---
 3 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index f7e3b9b..e89fdc9 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -61,4 +61,8 @@ bool qemu_thread_is_self(QemuThread *thread);
 void qemu_thread_exit(void *retval);
 void qemu_thread_naming(bool enable);
 
+struct Notifier;
+void qemu_thread_atexit_add(struct Notifier *notifier);
+void qemu_thread_atexit_remove(struct Notifier *notifier);
+
 #endif
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index d05a649..41cb23d 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -26,6 +26,7 @@
 #endif
 #include "qemu/thread.h"
 #include "qemu/atomic.h"
+#include "qemu/notify.h"
 
 static bool name_threads;
 
@@ -401,6 +402,42 @@ void qemu_event_wait(QemuEvent *ev)
 }
 }
 
+static pthread_key_t exit_key;
+
+union NotifierThreadData {
+void *ptr;
+NotifierList list;
+};
+QEMU_BUILD_BUG_ON(sizeof(union NotifierThreadData) != sizeof(void *));
+
+void qemu_thread_atexit_add(Notifier *notifier)
+{
+union NotifierThreadData ntd;
+ntd.ptr = pthread_getspecific(exit_key);
+notifier_list_add(&ntd.list, notifier);
+pthread_setspecific(exit_key, ntd.ptr);
+}
+
+void qemu_thread_atexit_remove(Notifier *notifier)
+{
+union NotifierThreadData ntd;
+ntd.ptr = pthread_getspecific(exit_key);
+notifier_remove(notifier);
+pthread_setspecific(exit_key, ntd.ptr);
+}
+
+static void qemu_thread_atexit_run(void *arg)
+{
+union NotifierThreadData ntd = { .ptr = arg };
+notifier_list_notify(&ntd.list, NULL);
+}
+
+static void __attribute__((constructor)) qemu_thread_atexit_init(void)
+{
+pthread_key_create(&exit_key, qemu_thread_atexit_run);
+}
+
+
 /* Attempt to set the threads name; note that this is for debug, so
  * we're not going to fail if we can't set it.
  */
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index c405c9b..7bda85b 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -12,6 +12,7 @@
  */
 #include "qemu-common.h"
 #include "qemu/thread.h"
+#include "qemu/notify.h"
 #include 
 #include 
 #include 
@@ -268,6 +269,7 @@ struct QemuThreadData {
 void *(*start_routine)(void *);
 void *arg;
 short mode;
+NotifierList  exit;
 
 /* Only used for joinable threads. */
 bool  exited;
@@ -275,18 +277,40 @@ struct QemuThreadData {
 CRITICAL_SECTION  cs;
 };
 
+static bool atexit_registered;
+static NotifierList main_thread_exit;
+
 static __thread QemuThreadData *qemu_thread_data;
 
+static void run_main_thread_exit(void)
+{
+notifier_list_notify(&main_thread_exit, NULL);
+}
+
+void qemu_thread_atexit_add(Notifier *notifier)
+{
+if (!qemu_thread_data) {
+if (!atexit_registered) {
+atexit_registered = true;
+atexit(run_main_thread_exit);
+}
+notifier_list_add(&main_thread_exit, notifier);
+} else {
+notifier_list_add(&qemu_thread_data->exit, notifier);
+}
+}
+
+void qemu_thread_atexit_remove(Notifier *notifier)
+{
+notifier_remove(notifier);
+}
+
 static unsigned __stdcall win32_start_routine(void *arg)
 {
 QemuThreadData *data = (QemuThreadData *) arg;
 void *(*start_routine)(void *) = data->start_routine;
 void *thread_arg = data->arg;
 
-if (data->mode == QEMU_THREAD_DETACHED) {
-g_free(data);
-data = NULL;
-}
 qemu_thread_data = data;
 qemu_thread_exit(start_routine(thread_arg));
 abort();
@@ -296,12 +320,14 @@ void qemu_thread_exit(void *arg)
 {
 QemuThreadData *data = qemu_thread_data;
 
-if (data) {
-assert(data->mode != QEMU_THREAD_DETACHED);
+notifier_list_notify(&data->exit, NULL);
+if (data->mode == QEMU_THREAD_JOINABLE) {
 data->ret = arg;
 EnterCriticalSection(&data->cs);
 data->exited = true;
 LeaveCriticalSection(&data->cs);
+} else {
+g_free(data);
 }
 _endthreadex(0);
 }
@@ -313,9 +339,10 @@ void *qemu_thread_join(QemuThread *thread)
 HANDLE handle;
 
 data = thread-

[Qemu-devel] [PATCH v2 4/7] QSLIST: add lock-free operations

2014-12-02 Thread Paolo Bonzini

These operations are trivial to implement and do not have ABA problems.
They are enough to implement simple multiple-producer, single consumer
lock-free lists or, as in the next patch, the multiple consumers can
steal a whole batch of elements and process them at their leisure.

Signed-off-by: Paolo Bonzini 
---
 include/qemu/queue.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index d433b90..6a01e2f 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -191,8 +191,19 @@ struct {   
 \
 } while (/*CONSTCOND*/0)
 
 #define QSLIST_INSERT_HEAD(head, elm, field) do {\
-(elm)->field.sle_next = (head)->slh_first;  \
-(head)->slh_first = (elm);  \
+(elm)->field.sle_next = (head)->slh_first;   \
+(head)->slh_first = (elm);   \
+} while (/*CONSTCOND*/0)
+
+#define QSLIST_INSERT_HEAD_ATOMIC(head, elm, field) do {   \
+do {   \
+(elm)->field.sle_next = (head)->slh_first; \
+} while (atomic_cmpxchg(&(head)->slh_first, (elm)->field.sle_next, \
+   (elm)) != (elm)->field.sle_next);   \
+} while (/*CONSTCOND*/0)
+
+#define QSLIST_MOVE_ATOMIC(dest, src) do {   \
+(dest)->slh_first = atomic_xchg(&(src)->slh_first, NULL);\
 } while (/*CONSTCOND*/0)
 
 #define QSLIST_REMOVE_HEAD(head, field) do { \
-- 
2.1.0

[Qemu-devel] [PATCH v2 5/7] coroutine: rewrite pool to avoid mutex

2014-12-02 Thread Paolo Bonzini

This patch removes the mutex by using fancy lock-free manipulation of
the pool.  Lock-free stacks and queues are not hard, but they can suffer
from the ABA problem so they are better avoided unless you have some
deferred reclamation scheme like RCU.  Otherwise you have to stick
with adding to a list, and emptying it completely.  This is what this
patch does, by coupling a lock-free global list of available coroutines
with per-CPU lists that are actually used on coroutine creation.

Whenever the destruction pool is big enough, the next thread that runs
out of coroutines will steal the whole destruction pool.  This is positive
in two ways:

1) the allocation does not have to do any atomic operation in the fast
path, it's entirely using thread-local storage.  Once every POOL_BATCH_SIZE
allocations it will do a single atomic_xchg.  Release does an atomic_cmpxchg
loop, that hopefully doesn't cause any starvation, and an atomic_inc.

A later patch will also remove atomic operations from the release path,
and try to avoid the atomic_xchg altogether---succeeding in doing so if
all devices either use ioeventfd or are not submitting requests actively.

2) in theory this should be completely adaptive.  The number of coroutines
around should be a little more than POOL_BATCH_SIZE * number of allocating
threads; so this also empties qemu_coroutine_adjust_pool_size.  (The previous
pool size was POOL_BATCH_SIZE * number of block backends, so it was a bit
more generous.  But if you actually have many high-iodepth disks, it's better
to put them in different iothreads, which will also use separate thread
pools and aio=native file descriptors).

This speeds up perf/cost (in tests/test-coroutine) by a factor of ~1.33.
No matter if we end with some kind of coroutine bypass scheme or not,
it cannot hurt to optimize hot code.

Signed-off-by: Paolo Bonzini 
---
v1->v2: leave personal opinions out of commit messages :) [Kevin]

 qemu-coroutine.c | 93 +---
 1 file changed, 42 insertions(+), 51 deletions(-)

diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index bd574aa..aee1017 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -15,31 +15,57 @@
 #include "trace.h"
 #include "qemu-common.h"
 #include "qemu/thread.h"
+#include "qemu/atomic.h"
 #include "block/coroutine.h"
 #include "block/coroutine_int.h"
 
 enum {
-POOL_DEFAULT_SIZE = 64,
+POOL_BATCH_SIZE = 64,
 };
 
 /** Free list to speed up creation */
-static QemuMutex pool_lock;
-static QSLIST_HEAD(, Coroutine) pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int pool_size;
-static unsigned int pool_max_size = POOL_DEFAULT_SIZE;
+static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
+static unsigned int release_pool_size;
+static __thread QSLIST_HEAD(, Coroutine) alloc_pool = 
QSLIST_HEAD_INITIALIZER(pool);
+static __thread Notifier coroutine_pool_cleanup_notifier;
+
+static void coroutine_pool_cleanup(Notifier *n, void *value)
+{
+Coroutine *co;
+Coroutine *tmp;
+
+QSLIST_FOREACH_SAFE(co, &alloc_pool, pool_next, tmp) {
+QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
+qemu_coroutine_delete(co);
+}
+}
 
 Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 {
 Coroutine *co = NULL;
 
 if (CONFIG_COROUTINE_POOL) {
-qemu_mutex_lock(&pool_lock);
-co = QSLIST_FIRST(&pool);
+co = QSLIST_FIRST(&alloc_pool);
+if (!co) {
+if (release_pool_size > POOL_BATCH_SIZE) {
+/* Slow path; a good place to register the destructor, too.  */
+if (!coroutine_pool_cleanup_notifier.notify) {
+coroutine_pool_cleanup_notifier.notify = 
coroutine_pool_cleanup;
+qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
+}
+
+/* This is not exact; there could be a little skew between
+ * release_pool_size and the actual size of release_pool.  But
+ * it is just a heuristic, it does not need to be perfect.
+ */
+release_pool_size = 0;
+QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
+co = QSLIST_FIRST(&alloc_pool);
+}
+}
 if (co) {
-QSLIST_REMOVE_HEAD(&pool, pool_next);
-pool_size--;
+QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
 }
-qemu_mutex_unlock(&pool_lock);
 }
 
 if (!co) {
@@ -53,39 +80,19 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 
 static void coroutine_delete(Coroutine *co)
 {
+co->caller = NULL;
+
 if (CONFIG_COROUTINE_POOL) {
-qemu_mutex_lock(&pool_lock);
-if (pool_size < pool_max_size) {
-QSLIST_INSERT_HEAD(&pool, co, pool_next);
-co->caller = NULL;
-pool_size++;
-qemu_mutex_unlock(&pool_lock);
+if (release_pool_size < POOL_BATCH_SIZE * 2) {
+

[Qemu-devel] [PATCH v2 6/7] coroutine: drop qemu_coroutine_adjust_pool_size

2014-12-02 Thread Paolo Bonzini

This is not needed anymore.  The new TLS-based algorithm is adaptive.

Signed-off-by: Paolo Bonzini 
---
 block/block-backend.c |  4 
 include/block/coroutine.h | 10 --
 qemu-coroutine.c  |  4 
 3 files changed, 18 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d0692b1..abf0cd1 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -260,9 +260,6 @@ int blk_attach_dev(BlockBackend *blk, void *dev)
 blk_ref(blk);
 blk->dev = dev;
 bdrv_iostatus_reset(blk->bs);
-
-/* We're expecting I/O from the device so bump up coroutine pool size */
-qemu_coroutine_adjust_pool_size(COROUTINE_POOL_RESERVATION);
 return 0;
 }
 
@@ -290,7 +287,6 @@ void blk_detach_dev(BlockBackend *blk, void *dev)
 blk->dev_ops = NULL;
 blk->dev_opaque = NULL;
 bdrv_set_guest_block_size(blk->bs, 512);
-qemu_coroutine_adjust_pool_size(-COROUTINE_POOL_RESERVATION);
 blk_unref(blk);
 }
 
diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index 793df0e..20c027a 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -216,14 +216,4 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, 
QEMUClockType type,
  */
 void coroutine_fn yield_until_fd_readable(int fd);
 
-/**
- * Add or subtract from the coroutine pool size
- *
- * The coroutine implementation keeps a pool of coroutines to be reused by
- * qemu_coroutine_create().  This makes coroutine creation cheap.  Heavy
- * coroutine users should call this to reserve pool space.  Call it again with
- * a negative number to release pool space.
- */
-void qemu_coroutine_adjust_pool_size(int n);
-
 #endif /* QEMU_COROUTINE_H */
diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index aee1017..ca40f4f 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -144,7 +144,3 @@ void coroutine_fn qemu_coroutine_yield(void)
 self->caller = NULL;
 coroutine_swap(self, to);
 }
-
-void qemu_coroutine_adjust_pool_size(int n)
-{
-}
-- 
2.1.0

[Qemu-devel] [RFC v2 1/6] bitmap: add atomic set functions

2014-12-02 Thread Stefan Hajnoczi

Use atomic_or() for atomic bitmaps where several threads may set bits at
the same time.  This avoids the race condition between threads loading
an element, bitwise ORing, and then storing the element.

When setting all bits in a word we can avoid atomic ops and instead just
use an smp_wmb() write barrier at the end.

Most bitmap users don't need atomicity so introduce new functions.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/bitmap.h |  2 ++
 include/qemu/bitops.h | 14 ++
 util/bitmap.c | 36 
 3 files changed, 52 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index f0273c9..3e0a4f3 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -39,6 +39,7 @@
  * bitmap_empty(src, nbits)Are all bits zero in *src?
  * bitmap_full(src, nbits) Are all bits set in *src?
  * bitmap_set(dst, pos, nbits) Set specified bit area
+ * bitmap_set_atomic(dst, pos, nbits)   Set specified bit area with atomic ops
  * bitmap_clear(dst, pos, nbits)   Clear specified bit area
  * bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
  */
@@ -226,6 +227,7 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 }
 
 void bitmap_set(unsigned long *map, long i, long len);
+void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
 unsigned long bitmap_find_next_zero_area(unsigned long *map,
  unsigned long size,
diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 181bd46..eda4132 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -16,6 +16,7 @@
 #include 
 
 #include "host-utils.h"
+#include "atomic.h"
 
 #define BITS_PER_BYTE   CHAR_BIT
 #define BITS_PER_LONG   (sizeof (unsigned long) * BITS_PER_BYTE)
@@ -39,6 +40,19 @@ static inline void set_bit(long nr, unsigned long *addr)
 }
 
 /**
+ * set_bit_atomic - Set a bit in memory atomically
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ */
+static inline void set_bit_atomic(long nr, unsigned long *addr)
+{
+unsigned long mask = BIT_MASK(nr);
+unsigned long *p = addr + BIT_WORD(nr);
+
+atomic_or(p, mask);
+}
+
+/**
  * clear_bit - Clears a bit in memory
  * @nr: Bit to clear
  * @addr: Address to start counting from
diff --git a/util/bitmap.c b/util/bitmap.c
index 9c6bb52..40db0d9 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -11,6 +11,7 @@
 
 #include "qemu/bitops.h"
 #include "qemu/bitmap.h"
+#include "qemu/atomic.h"
 
 /*
  * bitmaps provide an array of bits, implemented using an an
@@ -177,6 +178,41 @@ void bitmap_set(unsigned long *map, long start, long nr)
 }
 }
 
+void bitmap_set_atomic(unsigned long *map, long start, long nr)
+{
+unsigned long *p = map + BIT_WORD(start);
+const long size = start + nr;
+int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
+unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
+
+/* First word */
+if (nr - bits_to_set > 0) {
+atomic_or(p, mask_to_set);
+nr -= bits_to_set;
+bits_to_set = BITS_PER_LONG;
+p++;
+}
+
+/* Full words */
+while (nr - bits_to_set >= 0) {
+*p = ~0UL;
+nr -= bits_to_set;
+p++;
+}
+
+/* Last word */
+if (nr) {
+mask_to_set &= BITMAP_LAST_WORD_MASK(size);
+atomic_or(p, mask_to_set);
+} else {
+/* If we avoided the full barrier in atomic_or() we should at least
+ * issue a write barrier so that other threads using barriers see
+ * up-to-date bitmap contents.
+ */
+smp_wmb();
+}
+}
+
 void bitmap_clear(unsigned long *map, long start, long nr)
 {
 unsigned long *p = map + BIT_WORD(start);
-- 
2.1.0

[Qemu-devel] [RFC v2 0/6] memory: make dirty_memory[] accesses atomic

2014-12-02 Thread Stefan Hajnoczi

v2:
 * Make bitmap_set_atomic() and bitmap_test_and_clear_atomic() faster by
   replacing the inner loop with cheaper operations.  I did use smp_wmb() in
   bitmap_set_atomic() so that the function is always a write barrier, no
   matter which code path is taken. [Paolo]

The dirty_memory[] bitmap is used for live migration, TCG self-modifying code
detection, and VGA emulation.  Up until now the bitmap was always accessed
under the QEMU global mutex.  This series makes all dirty_memory[] accesses
atomic to prepare the way for threads writing to guest memory without holding
the global mutex.

In particular, this series converts non-atomic dirty_memory[] accesses to
atomic_or, atomic_xchg, and atomic_fetch_and so that race conditions are
avoided when two threads manipulate the bitmap at the same time.

There are two pieces remaining before the dirty_memory[] bitmap is truly
thread-safe:

1. Convert all cpu_physical_memory_*_dirty() callers to use the API atomically.
   There are TCG callers who things along the lines of:

 if (!cpu_physical_memory_get_dirty(addr)) {
 cpu_physical_memory_set_dirty(addr);  /* not atomic! */
 }

   At first I considered ignoring TCG completely since it is currently not
   multi-threaded, but we might as well tackle this so that
   virtio-blk/virtio-scsi dataplane can be used with TCG eventually (they still
   need ioeventfd/irqfd emulation before they can support TCG).

2. Use array RCU to safely resize dirty_memory[] for memory hotplug.  Currently
   ram_block_add() grows the bitmap in a way that is not thread-safe.

   Paolo has a QEMU userspace RCU implementation which I'd like to bring in for
   this.

Stefan Hajnoczi (6):
  bitmap: add atomic set functions
  bitmap: add atomic test and clear
  memory: use atomic ops for setting dirty memory bits
  migration: move dirty bitmap sync to ram_addr.h
  memory: replace cpu_physical_memory_reset_dirty() with test-and-clear
  memory: make cpu_physical_memory_sync_dirty_bitmap() fully atomic

 arch_init.c | 46 ++
 cputlb.c|  4 +--
 exec.c  | 23 +++
 include/exec/ram_addr.h | 74 -
 include/qemu/bitmap.h   |  4 +++
 include/qemu/bitops.h   | 14 ++
 memory.c| 11 +++-
 util/bitmap.c   | 73 
 8 files changed, 170 insertions(+), 79 deletions(-)

-- 
2.1.0

[Qemu-devel] [RFC v2 6/6] memory: make cpu_physical_memory_sync_dirty_bitmap() fully atomic

2014-12-02 Thread Stefan Hajnoczi

The fast path of cpu_physical_memory_sync_dirty_bitmap() directly
manipulates the dirty bitmap.  Use atomic_xchg() to make the
test-and-clear atomic.

Signed-off-by: Stefan Hajnoczi 
---
 include/exec/ram_addr.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index cdcbe9a..3d2a4c1 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -195,13 +195,13 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned 
long *dest,
 unsigned long *src = ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION];
 
 for (k = page; k < page + nr; k++) {
-if (src[k]) {
+unsigned long bits = atomic_xchg(&src[k], 0);
+if (bits) {
 unsigned long new_dirty;
 new_dirty = ~dest[k];
-dest[k] |= src[k];
-new_dirty &= src[k];
+dest[k] |= bits;
+new_dirty &= bits;
 num_dirty += ctpopl(new_dirty);
-src[k] = 0;
 }
 }
 } else {
-- 
2.1.0

[Qemu-devel] [RFC v2 2/6] bitmap: add atomic test and clear

2014-12-02 Thread Stefan Hajnoczi

The new bitmap_test_and_clear_atomic() function clears a range and
returns whether or not the bits were set.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/bitmap.h |  2 ++
 util/bitmap.c | 37 +
 2 files changed, 39 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 3e0a4f3..86dd9cd 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -41,6 +41,7 @@
  * bitmap_set(dst, pos, nbits) Set specified bit area
  * bitmap_set_atomic(dst, pos, nbits)   Set specified bit area with atomic ops
  * bitmap_clear(dst, pos, nbits)   Clear specified bit area
+ * bitmap_test_and_clear_atomic(dst, pos, nbits)Test and clear area
  * bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
  */
 
@@ -229,6 +230,7 @@ static inline int bitmap_intersects(const unsigned long 
*src1,
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
+bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr);
 unsigned long bitmap_find_next_zero_area(unsigned long *map,
  unsigned long size,
  unsigned long start,
diff --git a/util/bitmap.c b/util/bitmap.c
index 40db0d9..d297066 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -233,6 +233,43 @@ void bitmap_clear(unsigned long *map, long start, long nr)
 }
 }
 
+bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr)
+{
+unsigned long *p = map + BIT_WORD(start);
+const long size = start + nr;
+int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+unsigned long dirty = 0;
+unsigned long old_bits;
+
+/* First word */
+if (nr - bits_to_clear > 0) {
+old_bits = atomic_fetch_and(p, ~mask_to_clear);
+dirty |= old_bits & mask_to_clear;
+nr -= bits_to_clear;
+bits_to_clear = BITS_PER_LONG;
+mask_to_clear = ~0UL;
+p++;
+}
+
+/* Full words */
+while (nr - bits_to_clear >= 0) {
+old_bits = atomic_xchg(p, 0);
+dirty |= old_bits;
+nr -= bits_to_clear;
+p++;
+}
+
+/* Last word */
+if (nr) {
+mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+old_bits = atomic_fetch_and(p, ~mask_to_clear);
+dirty |= old_bits & mask_to_clear;
+}
+
+return dirty;
+}
+
 #define ALIGN_MASK(x,mask)  (((x)+(mask))&~(mask))
 
 /**
-- 
2.1.0

[Qemu-devel] [RFC v2 4/6] migration: move dirty bitmap sync to ram_addr.h

2014-12-02 Thread Stefan Hajnoczi

The dirty memory bitmap is managed by ram_addr.h and copied to
migration_bitmap[] periodically during live migration.

Move the code to sync the bitmap to ram_addr.h where related code lives.

Signed-off-by: Stefan Hajnoczi 
---
 arch_init.c | 46 ++
 include/exec/ram_addr.h | 44 
 2 files changed, 46 insertions(+), 44 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 7680d28..79c7784 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -436,52 +436,10 @@ ram_addr_t 
migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
 return (next - base) << TARGET_PAGE_BITS;
 }
 
-static inline bool migration_bitmap_set_dirty(ram_addr_t addr)
-{
-bool ret;
-int nr = addr >> TARGET_PAGE_BITS;
-
-ret = test_and_set_bit(nr, migration_bitmap);
-
-if (!ret) {
-migration_dirty_pages++;
-}
-return ret;
-}
-
 static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
-ram_addr_t addr;
-unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
-
-/* start address is aligned at the start of a word? */
-if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
-int k;
-int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS);
-unsigned long *src = ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION];
-
-for (k = page; k < page + nr; k++) {
-if (src[k]) {
-unsigned long new_dirty;
-new_dirty = ~migration_bitmap[k];
-migration_bitmap[k] |= src[k];
-new_dirty &= src[k];
-migration_dirty_pages += ctpopl(new_dirty);
-src[k] = 0;
-}
-}
-} else {
-for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
-if (cpu_physical_memory_get_dirty(start + addr,
-  TARGET_PAGE_SIZE,
-  DIRTY_MEMORY_MIGRATION)) {
-cpu_physical_memory_reset_dirty(start + addr,
-TARGET_PAGE_SIZE,
-DIRTY_MEMORY_MIGRATION);
-migration_bitmap_set_dirty(start + addr);
-}
-}
-}
+migration_dirty_pages +=
+cpu_physical_memory_sync_dirty_bitmap(migration_bitmap, start, length);
 }
 
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index ba90daa..87a8b28 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -190,5 +190,49 @@ static inline void 
cpu_physical_memory_clear_dirty_range(ram_addr_t start,
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t length,
  unsigned client);
 
+static inline
+uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
+   ram_addr_t start,
+   ram_addr_t length)
+{
+ram_addr_t addr;
+unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
+uint64_t num_dirty = 0;
+
+/* start address is aligned at the start of a word? */
+if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
+int k;
+int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS);
+unsigned long *src = ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION];
+
+for (k = page; k < page + nr; k++) {
+if (src[k]) {
+unsigned long new_dirty;
+new_dirty = ~dest[k];
+dest[k] |= src[k];
+new_dirty &= src[k];
+num_dirty += ctpopl(new_dirty);
+src[k] = 0;
+}
+}
+} else {
+for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
+if (cpu_physical_memory_get_dirty(start + addr,
+  TARGET_PAGE_SIZE,
+  DIRTY_MEMORY_MIGRATION)) {
+cpu_physical_memory_reset_dirty(start + addr,
+TARGET_PAGE_SIZE,
+DIRTY_MEMORY_MIGRATION);
+long k = (start + addr) >> TARGET_PAGE_BITS;
+if (!test_and_set_bit(k, dest)) {
+num_dirty++;
+}
+}
+}
+}
+
+return num_dirty;
+}
+
 #endif
 #endif
-- 
2.1.0

[Qemu-devel] [RFC v2 5/6] memory: replace cpu_physical_memory_reset_dirty() with test-and-clear

2014-12-02 Thread Stefan Hajnoczi

The cpu_physical_memory_reset_dirty() function is sometimes used
together with cpu_physical_memory_get_dirty().  This is not atomic since
two separate accesses to the dirty memory bitmap are made.

Turn cpu_physical_memory_reset_dirty() into the atomic
cpu_physical_memory_test_and_clear_dirty().

The cpu_physical_memory_clear_dirty_range() function is no longer used
and can be dropped.

Signed-off-by: Stefan Hajnoczi 
---
 cputlb.c|  4 ++--
 exec.c  | 23 +--
 include/exec/ram_addr.h | 27 +++
 memory.c| 11 ---
 4 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index a55518a..3d5b9a7 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -125,8 +125,8 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
 {
-cpu_physical_memory_reset_dirty(ram_addr, TARGET_PAGE_SIZE,
-DIRTY_MEMORY_CODE);
+cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
+ DIRTY_MEMORY_CODE);
 }
 
 /* update the TLB so that writes in physical page 'phys_addr' are no longer
diff --git a/exec.c b/exec.c
index 71ac104..72bdb2e 100644
--- a/exec.c
+++ b/exec.c
@@ -845,16 +845,27 @@ static void tlb_reset_dirty_range_all(ram_addr_t start, 
ram_addr_t length)
 }
 
 /* Note: start and end must be within the same ram block.  */
-void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t length,
- unsigned client)
+bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
+  ram_addr_t length,
+  unsigned client)
 {
-if (length == 0)
-return;
-cpu_physical_memory_clear_dirty_range(start, length, client);
+unsigned long end, page;
+bool dirty;
 
-if (tcg_enabled()) {
+if (length == 0) {
+return false;
+}
+
+end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
+page = start >> TARGET_PAGE_BITS;
+dirty = bitmap_test_and_clear_atomic(ram_list.dirty_memory[client],
+ page, end - page);
+
+if (dirty && tcg_enabled()) {
 tlb_reset_dirty_range_all(start, length);
 }
+
+return dirty;
 }
 
 static void cpu_physical_memory_set_dirty_tracking(bool enable)
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 87a8b28..cdcbe9a 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -175,20 +175,9 @@ static inline void 
cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 }
 #endif /* not _WIN32 */
 
-static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start,
- ram_addr_t length,
- unsigned client)
-{
-unsigned long end, page;
-
-assert(client < DIRTY_MEMORY_NUM);
-end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
-page = start >> TARGET_PAGE_BITS;
-bitmap_clear(ram_list.dirty_memory[client], page, end - page);
-}
-
-void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t length,
- unsigned client);
+bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
+  ram_addr_t length,
+  unsigned client);
 
 static inline
 uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned long *dest,
@@ -217,12 +206,10 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(unsigned 
long *dest,
 }
 } else {
 for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
-if (cpu_physical_memory_get_dirty(start + addr,
-  TARGET_PAGE_SIZE,
-  DIRTY_MEMORY_MIGRATION)) {
-cpu_physical_memory_reset_dirty(start + addr,
-TARGET_PAGE_SIZE,
-DIRTY_MEMORY_MIGRATION);
+if (cpu_physical_memory_test_and_clear_dirty(
+start + addr,
+TARGET_PAGE_SIZE,
+DIRTY_MEMORY_MIGRATION)) {
 long k = (start + addr) >> TARGET_PAGE_BITS;
 if (!test_and_set_bit(k, dest)) {
 num_dirty++;
diff --git a/memory.c b/memory.c
index 15cf9eb..4d7761e 100644
--- a/memory.c
+++ b/memory.c
@@ -1375,13 +1375,9 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr 
addr,
 bool memory_region_test_and_clear_dirty(MemoryRegion *mr, hwaddr addr,
 hwaddr size, unsigned client)
 {
-bool ret;
 assert(mr->terminates);
-ret = cpu_physical_memory_get_

[Qemu-devel] [RFC v2 3/6] memory: use atomic ops for setting dirty memory bits

2014-12-02 Thread Stefan Hajnoczi

Use set_bit_atomic() and bitmap_set_atomic() so that multiple threads
can dirty memory without race conditions.

Signed-off-by: Stefan Hajnoczi 
---
I had to get creative to stay under 80 characters per line.  I'm open to
suggestions if you prefer me to format it another way.
---
 include/exec/ram_addr.h | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 8fc75cd..ba90daa 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -93,30 +93,32 @@ static inline void 
cpu_physical_memory_set_dirty_flag(ram_addr_t addr,
   unsigned client)
 {
 assert(client < DIRTY_MEMORY_NUM);
-set_bit(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]);
+set_bit_atomic(addr >> TARGET_PAGE_BITS, ram_list.dirty_memory[client]);
 }
 
 static inline void cpu_physical_memory_set_dirty_range_nocode(ram_addr_t start,
   ram_addr_t 
length)
 {
 unsigned long end, page;
+unsigned long **dirty_memory = ram_list.dirty_memory;
 
 end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
 page = start >> TARGET_PAGE_BITS;
-bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION], page, end - 
page);
-bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_VGA], page, end - page);
+bitmap_set_atomic(dirty_memory[DIRTY_MEMORY_MIGRATION], page, end - page);
+bitmap_set_atomic(dirty_memory[DIRTY_MEMORY_VGA], page, end - page);
 }
 
 static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start,
ram_addr_t length)
 {
 unsigned long end, page;
+unsigned long **dirty_memory = ram_list.dirty_memory;
 
 end = TARGET_PAGE_ALIGN(start + length) >> TARGET_PAGE_BITS;
 page = start >> TARGET_PAGE_BITS;
-bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION], page, end - 
page);
-bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_VGA], page, end - page);
-bitmap_set(ram_list.dirty_memory[DIRTY_MEMORY_CODE], page, end - page);
+bitmap_set_atomic(dirty_memory[DIRTY_MEMORY_MIGRATION], page, end - page);
+bitmap_set_atomic(dirty_memory[DIRTY_MEMORY_VGA], page, end - page);
+bitmap_set_atomic(dirty_memory[DIRTY_MEMORY_CODE], page, end - page);
 xen_modified_memory(start, length);
 }
 
@@ -142,10 +144,11 @@ static inline void 
cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap,
 for (k = 0; k < nr; k++) {
 if (bitmap[k]) {
 unsigned long temp = leul_to_cpu(bitmap[k]);
+unsigned long **d = ram_list.dirty_memory;
 
-ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION][page + k] |= 
temp;
-ram_list.dirty_memory[DIRTY_MEMORY_VGA][page + k] |= temp;
-ram_list.dirty_memory[DIRTY_MEMORY_CODE][page + k] |= temp;
+atomic_or(&d[DIRTY_MEMORY_MIGRATION][page + k], temp);
+atomic_or(&d[DIRTY_MEMORY_VGA][page + k], temp);
+atomic_or(&d[DIRTY_MEMORY_CODE][page + k], temp);
 }
 }
 xen_modified_memory(start, pages);
-- 
2.1.0

Re: [Qemu-devel] [PATCH] target-mips: Output CP0.Config2-5 in the register dump

2014-12-02 Thread Maciej W. Rozycki

On Tue, 2 Dec 2014, Leon Alrae wrote:

> > @@ -19276,6 +19276,10 @@ void mips_cpu_dump_state(CPUState *cs, F
> >  env->CP0_Status, env->CP0_Cause, env->CP0_EPC);
> >  cpu_fprintf(f, "Config0 0x%08x Config1 0x%08x LLAddr 0x" 
> > TARGET_FMT_lx "\n",
> >  env->CP0_Config0, env->CP0_Config1, env->lladdr);
> > +cpu_fprintf(f, "Config2 0x%08x Config3 0x%08x\n",
> > +env->CP0_Config2, env->CP0_Config3);
> > +cpu_fprintf(f, "Config4 0x%08x Config5 0x%08x\n",
> > +env->CP0_Config4, env->CP0_Config5);
> 
> Wouldn't it be better to order these registers for example by CP0
> Register and Select number rather than putting them at the end? I think
> it doesn't matter at the moment as there are just few registers printed
> out, but once we start adding more then probably we would like to avoid
> having them placed arbitrarily in the output.

 I placed the registers such that output renders in a well-formatted 
readable way.  Please note that we print 64-bit registers at the right 
side and I avoided intermixing 32-bit registers with 64-bit registers.

 I agree we might reformat output according to CP0 Register and Select 
numbers as for example our (Mentor's) GDB does in `info registers' (the 
change to dump all available CP0 registers in GDB hasn't made its way 
upstream yet, partially due to the XML register description issue I 
mentioned previously).  Hovewer we need to think how to format output in a 
readable way as the receiver will most often be a human being.

 I think deciding on a uniform register width, i.e. the native machine 
word size of the CPU, might help here, after all in the execution stream 
you can access them with DMFC0 on 64-bit processors and get the expected 
result (with "don't-care's" in the upper 32 bits).  The drawback is more 
screen space is used for data which can make it all harder to comprehend.  
For example I find the output of `info registers' produced by GDB on a 
64-bit target more difficult to process in my head than the corresponding 
output on a 32-bit target.  Maybe we can leave the upper 32 bits padded 
with spaces rather than zeros though.

  Maciej

Re: [Qemu-devel] [PATCH v2 00/13] block: Various Coverity-spotted fixes

2014-12-02 Thread Kevin Wolf

Am 27.11.2014 um 15:48 hat Max Reitz geschrieben:
> This series fixes various issues spotted by Coverity. None of these is
> critical; most are just "If you do something crazy, qemu-img crashes" or
> "But what if there is no qcow2 driver?". Therefore, while these are bug
> fixes, it is a bit late to try to push them into 2.2.0. I am therefore
> tempted to vote to target 2.3 instead.
> 
> Also, none is security-relevant. The only crashes which are fixed here
> are sure to have resulted from dereferencing a NULL pointer.

Reviewed-by: Kevin Wolf

Re: [Qemu-devel] [PATCH] target-mips: Fix CP0.Config3.ISAOnExc write accesses

2014-12-02 Thread Maciej W. Rozycki

On Tue, 2 Dec 2014, Leon Alrae wrote:

> >  Please note that for this validation I'm using an artificial microMIPS 
> > processor that also has an FPU implemented, so that our microMIPS FP 
> > support is correctly validated too (I don't really know if there exists 
> > any real microMIPS processor that includes an FPU; if so, then it would 
> > be good to add it to the list our supported configurations).
> 
> FYI, there are real CPUs which support microMIPS and include FPU, for
> example microAptivUC.

 Good to know, thanks, and good to have real hardware as a reference.  
Thanks for your review.

  Maciej

Re: [Qemu-devel] [PATCH v2 5/7] coroutine: rewrite pool to avoid mutex

2014-12-02 Thread Peter Lieven


On 02.12.2014 12:05, Paolo Bonzini wrote:

This patch removes the mutex by using fancy lock-free manipulation of
the pool.  Lock-free stacks and queues are not hard, but they can suffer
from the ABA problem so they are better avoided unless you have some
deferred reclamation scheme like RCU.  Otherwise you have to stick
with adding to a list, and emptying it completely.  This is what this
patch does, by coupling a lock-free global list of available coroutines
with per-CPU lists that are actually used on coroutine creation.

Whenever the destruction pool is big enough, the next thread that runs
out of coroutines will steal the whole destruction pool.  This is positive
in two ways:

1) the allocation does not have to do any atomic operation in the fast
path, it's entirely using thread-local storage.  Once every POOL_BATCH_SIZE
allocations it will do a single atomic_xchg.  Release does an atomic_cmpxchg
loop, that hopefully doesn't cause any starvation, and an atomic_inc.

A later patch will also remove atomic operations from the release path,
and try to avoid the atomic_xchg altogether---succeeding in doing so if
all devices either use ioeventfd or are not submitting requests actively.

2) in theory this should be completely adaptive.  The number of coroutines
around should be a little more than POOL_BATCH_SIZE * number of allocating
threads; so this also empties qemu_coroutine_adjust_pool_size.  (The previous
pool size was POOL_BATCH_SIZE * number of block backends, so it was a bit
more generous.  But if you actually have many high-iodepth disks, it's better
to put them in different iothreads, which will also use separate thread
pools and aio=native file descriptors).

This speeds up perf/cost (in tests/test-coroutine) by a factor of ~1.33.
No matter if we end with some kind of coroutine bypass scheme or not,
it cannot hurt to optimize hot code.

Signed-off-by: Paolo Bonzini 
---
v1->v2: leave personal opinions out of commit messages :) [Kevin]

  qemu-coroutine.c | 93 +---
  1 file changed, 42 insertions(+), 51 deletions(-)

diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index bd574aa..aee1017 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -15,31 +15,57 @@
  #include "trace.h"
  #include "qemu-common.h"
  #include "qemu/thread.h"
+#include "qemu/atomic.h"
  #include "block/coroutine.h"
  #include "block/coroutine_int.h"
  
  enum {

-POOL_DEFAULT_SIZE = 64,
+POOL_BATCH_SIZE = 64,
  };
  
  /** Free list to speed up creation */

-static QemuMutex pool_lock;
-static QSLIST_HEAD(, Coroutine) pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int pool_size;
-static unsigned int pool_max_size = POOL_DEFAULT_SIZE;
+static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
+static unsigned int release_pool_size;
+static __thread QSLIST_HEAD(, Coroutine) alloc_pool = 
QSLIST_HEAD_INITIALIZER(pool);
+static __thread Notifier coroutine_pool_cleanup_notifier;
+
+static void coroutine_pool_cleanup(Notifier *n, void *value)
+{
+Coroutine *co;
+Coroutine *tmp;
+
+QSLIST_FOREACH_SAFE(co, &alloc_pool, pool_next, tmp) {
+QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
+qemu_coroutine_delete(co);
+}
+}
  
  Coroutine *qemu_coroutine_create(CoroutineEntry *entry)

  {
  Coroutine *co = NULL;
  
  if (CONFIG_COROUTINE_POOL) {

-qemu_mutex_lock(&pool_lock);
-co = QSLIST_FIRST(&pool);
+co = QSLIST_FIRST(&alloc_pool);
+if (!co) {
+if (release_pool_size > POOL_BATCH_SIZE) {
+/* Slow path; a good place to register the destructor, too.  */
+if (!coroutine_pool_cleanup_notifier.notify) {
+coroutine_pool_cleanup_notifier.notify = 
coroutine_pool_cleanup;
+qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
+}
+
+/* This is not exact; there could be a little skew between
+ * release_pool_size and the actual size of release_pool.  But
+ * it is just a heuristic, it does not need to be perfect.
+ */
+release_pool_size = 0;
+QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
+co = QSLIST_FIRST(&alloc_pool);
+}
+}
  if (co) {
-QSLIST_REMOVE_HEAD(&pool, pool_next);
-pool_size--;
+QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
  }
-qemu_mutex_unlock(&pool_lock);
  }
  
  if (!co) {

@@ -53,39 +80,19 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
  
  static void coroutine_delete(Coroutine *co)

  {
+co->caller = NULL;
+
  if (CONFIG_COROUTINE_POOL) {
-qemu_mutex_lock(&pool_lock);
-if (pool_size < pool_max_size) {
-QSLIST_INSERT_HEAD(&pool, co, pool_next);
-co->caller = NULL;
-pool_size++;
-qemu_mutex_un

Re: [Qemu-devel] [PATCH v2 5/7] coroutine: rewrite pool to avoid mutex

2014-12-02 Thread Paolo Bonzini



On 02/12/2014 13:09, Peter Lieven wrote:
>>
>> -static void __attribute__((destructor)) coroutine_pool_cleanup(void)
>> -{
>> -Coroutine *co;
>> -Coroutine *tmp;
>> -
>> -QSLIST_FOREACH_SAFE(co, &pool, pool_next, tmp) {
>> -QSLIST_REMOVE_HEAD(&pool, pool_next);
>> -qemu_coroutine_delete(co);
>> -}
>> -
>> -qemu_mutex_destroy(&pool_lock);
>> -}
>> -
> 
> I still feel we should leave this destructor in to clean up the
> release_pool.

Why?  If you run QEMU under valgrind, there are thousands of blocks that
we do not free.  Stefan/Kevin, what do you think?

Paolo

Re: [Qemu-devel] [RFC v2 2/6] bitmap: add atomic test and clear

2014-12-02 Thread Paolo Bonzini



On 02/12/2014 12:23, Stefan Hajnoczi wrote:
> The new bitmap_test_and_clear_atomic() function clears a range and
> returns whether or not the bits were set.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/qemu/bitmap.h |  2 ++
>  util/bitmap.c | 37 +
>  2 files changed, 39 insertions(+)
> 
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index 3e0a4f3..86dd9cd 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -41,6 +41,7 @@
>   * bitmap_set(dst, pos, nbits)   Set specified bit area
>   * bitmap_set_atomic(dst, pos, nbits)   Set specified bit area with atomic 
> ops
>   * bitmap_clear(dst, pos, nbits) Clear specified bit area
> + * bitmap_test_and_clear_atomic(dst, pos, nbits)Test and clear area
>   * bitmap_find_next_zero_area(buf, len, pos, n, mask)Find bit free 
> area
>   */
>  
> @@ -229,6 +230,7 @@ static inline int bitmap_intersects(const unsigned long 
> *src1,
>  void bitmap_set(unsigned long *map, long i, long len);
>  void bitmap_set_atomic(unsigned long *map, long i, long len);
>  void bitmap_clear(unsigned long *map, long start, long nr);
> +bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr);
>  unsigned long bitmap_find_next_zero_area(unsigned long *map,
>   unsigned long size,
>   unsigned long start,
> diff --git a/util/bitmap.c b/util/bitmap.c
> index 40db0d9..d297066 100644
> --- a/util/bitmap.c
> +++ b/util/bitmap.c
> @@ -233,6 +233,43 @@ void bitmap_clear(unsigned long *map, long start, long 
> nr)
>  }
>  }
>  
> +bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr)
> +{
> +unsigned long *p = map + BIT_WORD(start);
> +const long size = start + nr;
> +int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
> +unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
> +unsigned long dirty = 0;
> +unsigned long old_bits;
> +
> +/* First word */
> +if (nr - bits_to_clear > 0) {
> +old_bits = atomic_fetch_and(p, ~mask_to_clear);
> +dirty |= old_bits & mask_to_clear;
> +nr -= bits_to_clear;
> +bits_to_clear = BITS_PER_LONG;
> +mask_to_clear = ~0UL;
> +p++;
> +}
> +
> +/* Full words */
> +while (nr - bits_to_clear >= 0) {
> +old_bits = atomic_xchg(p, 0);

Perhaps "if (!*p) continue;", and a smp_wmb or smp_mb at the end as in
patch 1?  That's at least what KVM does.

Paolo

> +dirty |= old_bits;
> +nr -= bits_to_clear;
> +p++;
> +}
> +
> +/* Last word */
> +if (nr) {
> +mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
> +old_bits = atomic_fetch_and(p, ~mask_to_clear);
> +dirty |= old_bits & mask_to_clear;
> +}
> +
> +return dirty;
> +}
> +
>  #define ALIGN_MASK(x,mask)  (((x)+(mask))&~(mask))
>  
>  /**
>

Re: [Qemu-devel] [PATCH v2 5/7] coroutine: rewrite pool to avoid mutex

2014-12-02 Thread Peter Lieven


On 02.12.2014 13:13, Paolo Bonzini wrote:


On 02/12/2014 13:09, Peter Lieven wrote:

-static void __attribute__((destructor)) coroutine_pool_cleanup(void)
-{
-Coroutine *co;
-Coroutine *tmp;
-
-QSLIST_FOREACH_SAFE(co, &pool, pool_next, tmp) {
-QSLIST_REMOVE_HEAD(&pool, pool_next);
-qemu_coroutine_delete(co);
-}
-
-qemu_mutex_destroy(&pool_lock);
-}
-

I still feel we should leave this destructor in to clean up the
release_pool.

Why?  If you run QEMU under valgrind, there are thousands of blocks that
we do not free.  Stefan/Kevin, what do you think?


Before this patch we cleaned up this part at least.
I have learned that it bad style not to clean up your resources.
Just because other code parts do not do it we should not introduce
new parts that don't it.

Peter

Re: [Qemu-devel] Vhost-user - multi queue support

2014-12-02 Thread Michael S. Tsirkin

On Tue, Dec 02, 2014 at 11:42:22AM +, Long, Thomas wrote:
> Hi All,
> 
> I’m just wondering what the status is with regards to supporting multi-queue 
> in
> Vhost-user?
> 
>  
> 
> I see that Nikolaev has developed a patch to support this feature:
> 
> https://github.com/SnabbCo/qemu/commit/f41eeccf4ab6ea5970e2941ce2de0aae893b10f9
> 
>  
> 
> Are there any current plans to pull either this patch or a different patch 
> into
> QEMU ?
> 
>  
> 
> Regards,
> 
> Tommy
> 
>  
> 

I don't think it was every submitted.

[Qemu-devel] [PATCH v3] qtest/bios-tables: Add DMAR aml file and enable intel_iommu for q35

2014-12-02 Thread Vasilis Liaskovitis

We generate the expected DMAR table and enable intel_iommu on q35 to test the
table.

Signed-off-by: Vasilis Liaskovitis 
---
 tests/acpi-test-data/q35/DMAR | Bin 0 -> 64 bytes
 tests/bios-tables-test.c  |   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 create mode 100644 tests/acpi-test-data/q35/DMAR

diff --git a/tests/acpi-test-data/q35/DMAR b/tests/acpi-test-data/q35/DMAR
new file mode 100644
index 
..6def4553381f4e48b80ead11af2adf6ce09c8c7e
GIT binary patch
literal 64
ycmZ?qbqsP~U|?XZbMklg2v%^42yk`*iZKGkKx`1L2E+&;zyK0sV7U1YL;?U|dI

Re: [Qemu-devel] [PATCH v2] qtest/bios-tables: Add DMAR unit test on intel_iommu for q35

2014-12-02 Thread Vasilis Liaskovitis

On Mon, Nov 23, 2014 at 04:25:05PM +0200, Marcel Apfelbaum wrote:
> On Mon, 2014-11-24 at 14:37 +0100, Vasilis Liaskovitis wrote:
> > The test enables intel_iommu on q35, looks for and reads the DMAR table as 
> > well
> > as its only DRHC structure (for now), checking the header and checksums.
> 
> Hi Vaisilis,
> I had a deeper look to your patch and the code already checks
> header and checksum for DMAR, all you had to do is to add your latest chunk:
> 
> @@ -779,7 +823,7 @@ static void test_acpi_tcg(void)
> >  
> >  memset(&data, 0, sizeof(data));
> >  data.machine = MACHINE_Q35;
> > -test_acpi_one("-machine q35,accel=tcg", &data);
> > +test_acpi_one("-machine q35,accel=tcg,iommu=on", &data);
> >  free_test_data(&data);
> 
> You can check that it is automatically done by test_dst_table function.
> You can add there a print to convince yourself.
> 
> However what is missing is a DMAR binary table to compare the content with an 
> expected one.
> You can create it by running:
> tests/acpi-test-data/rebuild-expected-aml.sh
> 
> Then add the newly created file to tests/acpi-test-data/q35/DMAR

sorry for the delay. thanks, I missed this. I sent v3 simply with the addition
of the DMAR aml file and just the "iommu=on" chunk, as you suggested.


- Vasilis

Re: [Qemu-devel] [Xen-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap

2014-12-02 Thread Stefano Stabellini

On Mon, 1 Dec 2014, Don Slutz wrote:
> On 12/01/14 10:37, Stefano Stabellini wrote:
> > On Mon, 1 Dec 2014, Don Slutz wrote:
> > > On 11/27/14 05:48, Stefano Stabellini wrote:
> 
> [...]
> 
> > > 
> > > Works fine in both claim modes and with PoD used (maxmem > memory).  Do
> > > not know how to test with tmem.  I do not see how it would be worse then
> > > current
> > > code that does not auto increase.  I.E. even without a xen change, I think
> > > something
> > > like this could be done.
> > OK, good to know. I am OK with increasing maxmem only if it is strictly
> > necessary.
> > 
> > 
> > > > > My testing shows a free 32 pages that I am not sure where they come
> > > > > from.
> > > > > But
> > > > > the code about is passing my 8 nics of e1000.
> > > > I think that raising maxmem a bit higher than necessary is not too bad.
> > > > If we really care about it, we could lower the limit after QEMU's
> > > > initialization is completed.
> > > Ok.  I did find the 32 it is VGA_HOLE_SIZE.  So here is what I have which
> > > includes
> > > a lot of extra printf.
> > In QEMU I would prefer not to assume that libxl increased maxmem for the
> > vga hole. I would rather call xc_domain_setmaxmem twice for the vga hole
> > than tie QEMU to a particular maxmem allocation scheme in libxl.
> 
> Ok.  The area we are talking about is 0x000a to 0x000c.
> It is in libxc (xc_hvm_build_x86), not libxl.   I have no issue with a name
> change to
> some thing like QEMU_EXTRA_FREE_PAGES.

Sorry, I was thinking about the relocated videoram region, the one
allocated by xen_add_to_physmap in QEMU.

Regarding 0x000a-0x000c, according to this comment:

/*
 * Allocate memory for HVM guest, skipping VGA hole 0xA-0xC.
 *

xc_hvm_build_x86 shouldn't be allocating memory for it.
In fact if I remember correctly 0xa-0xc is trapped and emulated.


> My testing has shows that some of these 32 pages are used outside of QEMU.
> I am seeing just 23 free pages using a standalone program to display
> the same info after a CentOS 6.3 guest is done booting.
>
> > In libxl I would like to avoid increasing mamxem for anything QEMU will
> > allocate later, that includes rom and vga vram. I am not sure how to
> > make that work with older QEMU versions that don't call
> > xc_domain_setmaxmem by themselves yet though. Maybe we could check the
> > specific QEMU version in libxl and decide based on that. Or we could
> > export a feature flag in QEMU.
> 
> Yes, it would be nice to adjust libxl to not increase maxmem. However since
> videoram is included in memory (and maxmem), making the change related
> to vram is a bigger issue.
> 
> the rom change is much simpler.
> 
> Currently I do not know of a way to do different things based on the QEMU
> version
> and/or features (this includes getting the QEMU version in libxl).
> 
> I have been going with:
> 1) change QEMU 1st.
> 2) Wait for an upstream version of QEMU with this.
> 3) change xen to optionally use a feature in the latest QEMU.

Let's try to take this approach for the videoram too


> > 
> > > --- a/xen-hvm.c
> > > +++ b/xen-hvm.c
> > > @@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t
> > > *shared_page, int vcpu)
> > >   #endif
> > > 
> > >   #define BUFFER_IO_MAX_DELAY  100
> > > +#define VGA_HOLE_SIZE (0x20)
> > > 
> > >   typedef struct XenPhysmap {
> > >   hwaddr start_addr;
> > > @@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
> > > size,
> > > MemoryRegion *mr)
> > >   xen_pfn_t *pfn_list;
> > >   int i;
> > >   xc_dominfo_t info;
> > > +unsigned long max_pages, free_pages, real_free;
> > > +long need_pages;
> > > +uint64_t tot_pages, pod_cache_pages, pod_entries;
> > > +
> > > +trace_xen_ram_alloc(ram_addr, size, mr->name);
> > > 
> > >   if (runstate_check(RUN_STATE_INMIGRATE)) {
> > >   /* RAM already populated in Xen */
> > > @@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
> > > size,
> > > MemoryRegion *mr)
> > >   return;
> > >   }
> > > 
> > > -fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
> > > -" bytes (%ld Kib) of ram at "RAM_ADDR_FMT
> > > -" mr.name=%s\n",
> > > -__func__, size, (long)(size>>10), ram_addr, mr->name);
> > > -
> > > -trace_xen_ram_alloc(ram_addr, size);
> > > -
> > >   nr_pfn = size >> TARGET_PAGE_BITS;
> > >   pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
> > > 
> > > @@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
> > > size,
> > > MemoryRegion *mr)
> > >   pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
> > >   }
> > > 
> > > -if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
> > > +if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||

I think it's best to call xc_domain_getinfolist.


> > > +   (info.domid != xen_domid)) {
> > >   hw_error("xc_domain_getinfo failed");

Re: [Qemu-devel] [RFC v2 1/6] bitmap: add atomic set functions

2014-12-02 Thread Paolo Bonzini



On 02/12/2014 12:23, Stefan Hajnoczi wrote:
> Use atomic_or() for atomic bitmaps where several threads may set bits at
> the same time.  This avoids the race condition between threads loading
> an element, bitwise ORing, and then storing the element.
> 
> When setting all bits in a word we can avoid atomic ops and instead just
> use an smp_wmb() write barrier at the end.
> 
> Most bitmap users don't need atomicity so introduce new functions.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/qemu/bitmap.h |  2 ++
>  include/qemu/bitops.h | 14 ++
>  util/bitmap.c | 36 
>  3 files changed, 52 insertions(+)
> 
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index f0273c9..3e0a4f3 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -39,6 +39,7 @@
>   * bitmap_empty(src, nbits)  Are all bits zero in *src?
>   * bitmap_full(src, nbits)   Are all bits set in *src?
>   * bitmap_set(dst, pos, nbits)   Set specified bit area
> + * bitmap_set_atomic(dst, pos, nbits)   Set specified bit area with atomic 
> ops
>   * bitmap_clear(dst, pos, nbits) Clear specified bit area
>   * bitmap_find_next_zero_area(buf, len, pos, n, mask)Find bit free 
> area
>   */
> @@ -226,6 +227,7 @@ static inline int bitmap_intersects(const unsigned long 
> *src1,
>  }
>  
>  void bitmap_set(unsigned long *map, long i, long len);
> +void bitmap_set_atomic(unsigned long *map, long i, long len);
>  void bitmap_clear(unsigned long *map, long start, long nr);
>  unsigned long bitmap_find_next_zero_area(unsigned long *map,
>   unsigned long size,
> diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
> index 181bd46..eda4132 100644
> --- a/include/qemu/bitops.h
> +++ b/include/qemu/bitops.h
> @@ -16,6 +16,7 @@
>  #include 
>  
>  #include "host-utils.h"
> +#include "atomic.h"
>  
>  #define BITS_PER_BYTE   CHAR_BIT
>  #define BITS_PER_LONG   (sizeof (unsigned long) * BITS_PER_BYTE)
> @@ -39,6 +40,19 @@ static inline void set_bit(long nr, unsigned long *addr)
>  }
>  
>  /**
> + * set_bit_atomic - Set a bit in memory atomically
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + */
> +static inline void set_bit_atomic(long nr, unsigned long *addr)
> +{
> +unsigned long mask = BIT_MASK(nr);
> +unsigned long *p = addr + BIT_WORD(nr);
> +
> +atomic_or(p, mask);
> +}
> +
> +/**
>   * clear_bit - Clears a bit in memory
>   * @nr: Bit to clear
>   * @addr: Address to start counting from
> diff --git a/util/bitmap.c b/util/bitmap.c
> index 9c6bb52..40db0d9 100644
> --- a/util/bitmap.c
> +++ b/util/bitmap.c
> @@ -11,6 +11,7 @@
>  
>  #include "qemu/bitops.h"
>  #include "qemu/bitmap.h"
> +#include "qemu/atomic.h"
>  
>  /*
>   * bitmaps provide an array of bits, implemented using an an
> @@ -177,6 +178,41 @@ void bitmap_set(unsigned long *map, long start, long nr)
>  }
>  }
>  
> +void bitmap_set_atomic(unsigned long *map, long start, long nr)
> +{
> +unsigned long *p = map + BIT_WORD(start);
> +const long size = start + nr;
> +int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
> +unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
> +
> +/* First word */
> +if (nr - bits_to_set > 0) {
> +atomic_or(p, mask_to_set);
> +nr -= bits_to_set;
> +bits_to_set = BITS_PER_LONG;
> +p++;
> +}
> +
> +/* Full words */
> +while (nr - bits_to_set >= 0) {
> +*p = ~0UL;
> +nr -= bits_to_set;
> +p++;
> +}
> +
> +/* Last word */
> +if (nr) {
> +mask_to_set &= BITMAP_LAST_WORD_MASK(size);
> +atomic_or(p, mask_to_set);
> +} else {
> +/* If we avoided the full barrier in atomic_or() we should at least
> + * issue a write barrier so that other threads using barriers see
> + * up-to-date bitmap contents.
> + */
> +smp_wmb();

Why not smp_mb() since that's what atomic_or does?

You can also ensure that you always wrap the loop with atomic_ors (or do
one if setting a single word).  I think it can be like this:

if (!nr) {
return;
}

/* Always do one atomic_or at the beginning, except if one word.  */
if (nr >= bits_to_set) {
...
}

/* Full words, leave the last word aside */
while (nr - bits_to_set > BITS_PER_LONG) {
}

/* Last word */
assert(nr > 0);
...

but maybe I messed up and there's some off-by-one somewhere in this sketch.

Paolo

> +}
> +}
> +
>  void bitmap_clear(unsigned long *map, long start, long nr)
>  {
>  unsigned long *p = map + BIT_WORD(start);
>

Re: [Qemu-devel] [PATCH v2 5/7] coroutine: rewrite pool to avoid mutex

2014-12-02 Thread Paolo Bonzini



On 02/12/2014 13:18, Peter Lieven wrote:
> On 02.12.2014 13:13, Paolo Bonzini wrote:
>>
>> On 02/12/2014 13:09, Peter Lieven wrote:
 -static void __attribute__((destructor)) coroutine_pool_cleanup(void)
 -{
 -Coroutine *co;
 -Coroutine *tmp;
 -
 -QSLIST_FOREACH_SAFE(co, &pool, pool_next, tmp) {
 -QSLIST_REMOVE_HEAD(&pool, pool_next);
 -qemu_coroutine_delete(co);
 -}
 -
 -qemu_mutex_destroy(&pool_lock);
 -}
 -
>>> I still feel we should leave this destructor in to clean up the
>>> release_pool.
>> Why?  If you run QEMU under valgrind, there are thousands of blocks that
>> we do not free.  Stefan/Kevin, what do you think?
> 
> Before this patch we cleaned up this part at least.
> I have learned that it bad style not to clean up your resources.
> Just because other code parts do not do it we should not introduce
> new parts that don't it.

Which other parts do we cleanup?  For example file descriptors are not
cleaned up, much less most memory; the kernel is there to do it for us.
 I think it's up to the maintainers to decide.

Paolo

Re: [Qemu-devel] [PATCH] target-i386: Intel xsaves

2014-12-02 Thread Paolo Bonzini



On 02/12/2014 13:19, Wanpeng Li wrote:
> Add xsaves related definition, it also add corresponding part to 
> kvm_get/put, and vmstate.
> 
> Signed-off-by: Wanpeng Li 
> ---
>  target-i386/cpu.h |  2 ++
>  target-i386/kvm.c | 15 +++
>  target-i386/machine.c |  3 ++-
>  3 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 015f5b5..cff7433 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -389,6 +389,7 @@
>  #define MSR_VM_HSAVE_PA 0xc0010117
>  
>  #define MSR_IA32_BNDCFGS0x0d90
> +#define MSR_IA32_XSS0x0da0
>  
>  #define XSTATE_FP   (1ULL << 0)
>  #define XSTATE_SSE  (1ULL << 1)
> @@ -1019,6 +1020,7 @@ typedef struct CPUX86State {
>  uint64_t xstate_bv;
>  
>  uint64_t xcr0;
> +uint64_t xss;
>  
>  TPRAccess tpr_access_type;
>  } CPUX86State;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index ccf36e8..c6fc417 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -80,6 +80,7 @@ static bool has_msr_hv_hypercall;
>  static bool has_msr_hv_vapic;
>  static bool has_msr_hv_tsc;
>  static bool has_msr_mtrr;
> +static bool has_msr_xss;
>  
>  static bool has_msr_architectural_pmu;
>  static uint32_t num_architectural_pmu_counters;
> @@ -826,6 +827,10 @@ static int kvm_get_supported_msrs(KVMState *s)
>  has_msr_bndcfgs = true;
>  continue;
>  }
> +if (kvm_msr_list->indices[i] == MSR_IA32_XSS) {
> +has_msr_xss = true;
> +continue;
> +}
>  }
>  }
>  
> @@ -1224,6 +1229,9 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  if (has_msr_bndcfgs) {
>  kvm_msr_entry_set(&msrs[n++], MSR_IA32_BNDCFGS, env->msr_bndcfgs);
>  }
> +if (has_msr_xss) {
> +kvm_msr_entry_set(&msrs[n++], MSR_IA32_XSS, env->xss);
> +}
>  #ifdef TARGET_X86_64
>  if (lm_capable_kernel) {
>  kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
> @@ -1570,6 +1578,10 @@ static int kvm_get_msrs(X86CPU *cpu)
>  if (has_msr_bndcfgs) {
>  msrs[n++].index = MSR_IA32_BNDCFGS;
>  }
> +if (has_msr_xss) {
> +msrs[n++].index = MSR_IA32_XSS;
> +}
> +
>  
>  if (!env->tsc_valid) {
>  msrs[n++].index = MSR_IA32_TSC;
> @@ -1717,6 +1729,9 @@ static int kvm_get_msrs(X86CPU *cpu)
>  case MSR_IA32_BNDCFGS:
>  env->msr_bndcfgs = msrs[i].data;
>  break;
> +case MSR_IA32_XSS:
> +env->xss = msrs[i].data;
> +break;
>  default:
>  if (msrs[i].index >= MSR_MC0_CTL &&
>  msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
> diff --git a/target-i386/machine.c b/target-i386/machine.c
> index 1c13b14..43af33f 100644
> --- a/target-i386/machine.c
> +++ b/target-i386/machine.c
> @@ -689,7 +689,7 @@ static const VMStateDescription vmstate_avx512 = {
>  
>  VMStateDescription vmstate_x86_cpu = {
>  .name = "cpu",
> -.version_id = 12,
> +.version_id = 13,
>  .minimum_version_id = 3,
>  .pre_save = cpu_pre_save,
>  .post_load = cpu_post_load,
> @@ -786,6 +786,7 @@ VMStateDescription vmstate_x86_cpu = {
>  VMSTATE_UINT64_V(env.xcr0, X86CPU, 12),
>  VMSTATE_UINT64_V(env.xstate_bv, X86CPU, 12),
>  VMSTATE_YMMH_REGS_VARS(env.ymmh_regs, X86CPU, CPU_NB_REGS, 12),
> +VMSTATE_UINT64_V(env.xss, X86CPU, 13),
>  VMSTATE_END_OF_LIST()
>  /* The above list is not sorted /wrt version numbers, watch out! */
>  },
> 

Please use a subsection instead of bumping the version number.
Otherwise looks good.

Thanks!

Paolo

[Qemu-devel] AMD video card passthrough reset issues

2014-12-02 Thread Lucio Andrés Illanes Albornoz

Hello,

I'm doing secondary VGA passthrough with an AMD Radeon R7 260X using 
QEMU v2.1.2 w/ KVM and VFIO on Debian v7.7 (wheezy) (qemu v2.1+dfsg-5~bpo70+1 
from wheezy-backports) and kernel version 3.16.5 (from wheezy-backports as 
well) and Windows 8.1 Update 1 (x64) as the guest OS.

At present, rebooting the VM reproducibly has Windows fail to enable/start said 
video card upon bootup w/ an error code of 43, as seems to be the case w/ 
mostly everyone else running a comparable configuration; disabling/ejecting it 
before rebooting/powering down the VM from within the guest, as with everyone 
else, has proven to be a reliable mitigation. However, being that there are 
scenarios where this is either not feasible or impossible altogether, short of 
if done through a service or kernel-mode driver (and even then,) I had intended 
to investigate the causes behind this issue.

Unfortunately, the flu got to me first (so to speak.) I did notice that simply 
removing the PCI device in question and then causing a PCI bus (re)scan (both) 
through sysfs on the host in between VM reboots/power cycles is effectively 
equivalent to disabling it within the guest. Thus, I find myself wondering 
precisely what it is that does take place when doing so vs. when QEMU performs 
a `hot reset' through the corresponding interface in drivers/vfio/pci/; 
evidently, the difference must be of sufficient importance since the latter 
mechanism ends up leaving my video card unavailable for subsequent VM operation 
until the next host reboot.

I should very much appreciate any hints concerning whether it would be possible 
to have QEMU/VFIO perform whatever need be done itself or if it should be 
possible to have this be done by either itself.

Cheers
Lucio Andrés Illanes Albornoz

[Qemu-devel] [RFC PATCH] hw/arm/virt: Add support for NUMA on ARM64

2014-12-02 Thread Shannon Zhao

Add support for NUMA on ARM64. Tested successfully running a guest
Linux kernel with the following patch applied:

- arm64:numa: adding numa support for arm64 platforms.
http://www.spinics.net/lists/arm-kernel/msg365316.html

Example qemu command line:
qemu-system-aarch64 \
-enable-kvm -smp 4\
-kernel Image \
-m 512 -machine virt,kernel_irqchip=on \
-initrd guestfs.cpio.gz \
-cpu host -nographic \
-numa node,mem=256M,cpus=0-1,nodeid=0 \
-numa node,mem=256M,cpus=2-3,nodeid=1 \
-append "console=ttyAMA0 root=/dev/ram"

Todo:
1)The NUMA nodes information in DT is not finalized yet, so this
patch might need to be further modified to follow any changes in it.

2)Consider IO-NUMA as well

Please refer to the following url for NUMA DT node details:

- Documentation: arm64/arm: dt bindings for numa.
http://www.spinics.net/lists/arm-kernel/msg380200.html

Example: 2 Node system each having 2 CPUs and a Memory

numa-map {
#address-cells = <2>;
#size-cells = <1>;
#node-count = <2>;
mem-map =  <0x0 0x4000 0>,
   <0x0 0x5000 1>;

cpu-map = <0 1 0>,
  <2 3 1>;

node-matrix = <0 0 10>,
  <0 1 20>,
  <1 0 20>,
  <1 1 10>;
};

- mem-map:  This property defines the association between a range of
memory and the proximity domain/numa node to which it belongs.

- cpu-map:  This property defines the association of range of processors
(range of cpu ids) and the proximity domain to which
the processor belongs.

- node-matrix:  This table provides a matrix that describes the relative
distance (memory latency) between all System Localities.
The value of each Entry[i j distance] in node-matrix table,
where i represents a row of a matrix and j represents a
column of a matrix, indicates the relative distances
from Proximity Domain/Numa node i to every other
node j in the system (including itself).

Signed-off-by: Shannon Zhao 
---
 hw/arm/boot.c |   25 
 hw/arm/virt.c |  120 +---
 2 files changed, 113 insertions(+), 32 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 0014c34..c20fee4 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -335,7 +335,6 @@ static int load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
 {
 void *fdt = NULL;
 int size, rc;
-uint32_t acells, scells;
 
 if (binfo->dtb_filename) {
 char *filename;
@@ -369,30 +368,6 @@ static int load_dtb(hwaddr addr, const struct 
arm_boot_info *binfo,
 return 0;
 }
 
-acells = qemu_fdt_getprop_cell(fdt, "/", "#address-cells");
-scells = qemu_fdt_getprop_cell(fdt, "/", "#size-cells");
-if (acells == 0 || scells == 0) {
-fprintf(stderr, "dtb file invalid (#address-cells or #size-cells 
0)\n");
-goto fail;
-}
-
-if (scells < 2 && binfo->ram_size >= (1ULL << 32)) {
-/* This is user error so deserves a friendlier error message
- * than the failure of setprop_sized_cells would provide
- */
-fprintf(stderr, "qemu: dtb file not compatible with "
-"RAM size > 4GB\n");
-goto fail;
-}
-
-rc = qemu_fdt_setprop_sized_cells(fdt, "/memory", "reg",
-  acells, binfo->loader_start,
-  scells, binfo->ram_size);
-if (rc < 0) {
-fprintf(stderr, "couldn't set /memory/reg\n");
-goto fail;
-}
-
 if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
 rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
  binfo->kernel_cmdline);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 78f618d..9d18a91 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -170,8 +170,6 @@ static void create_fdt(VirtBoardInfo *vbi)
  * to fill in necessary properties later
  */
 qemu_fdt_add_subnode(fdt, "/chosen");
-qemu_fdt_add_subnode(fdt, "/memory");
-qemu_fdt_setprop_string(fdt, "/memory", "device_type", "memory");
 
 /* Clock node, for the benefit of the UART. The kernel device tree
  * binding documentation claims the PL011 node clock properties are
@@ -235,6 +233,116 @@ static void fdt_add_psci_node(const VirtBoardInfo *vbi)
 qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
 }
 
+static int virt_memory_init(MachineState *machine,
+MemoryRegion *system_memory,
+const VirtBoardInfo *vbi)
+{
+MemoryRegion *ram = g_new(MemoryRegion, 1);
+CPUState *cpu;
+int min_cpu = 0, max_cpu = 0;
+int i, j, count, len;
+uint32_t acells, sc

[Qemu-devel] [PATCH RFC v5 00/19] qemu: towards virtio-1 host support

2014-12-02 Thread Cornelia Huck

Another iteration of virtio-1 patches for qemu, as always available on
git://github.com/cohuck/qemu virtio-1

This one seems to work together with the current vhost-next patches
(well, I can ping :)

Changes from v4:
- add helpers for feature bit manipulation and checking
- use 64 bit feature bits instead of 32 bit arrays
- infrastructure to allow devices to offer different sets of feature
  bits for legacy and standard devices
- several fixes (mainly regarding, you guessed it, feature bits)

Cornelia Huck (16):
  virtio: cull virtio_bus_set_vdev_features
  virtio: feature bit manipulation helpers
  virtio: add feature checking helpers
  virtio: support more feature bits
  virtio: endianness checks for virtio 1.0 devices
  virtio: allow virtio-1 queue layout
  dataplane: allow virtio-1 devices
  s390x/virtio-ccw: support virtio-1 set_vq format
  virtio: disallow late feature changes for virtio-1
  virtio: allow to fail setting status
  s390x/virtio-ccw: enable virtio 1.0
  virtio-net: no writeable mac for virtio-1
  virtio-net: support longer header
  virtio-net: enable virtio 1.0
  virtio: support revision-specific features
  virtio-blk: revision specific feature bits

Thomas Huth (3):
  linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1
  s390x/css: Add a callback for when subchannel gets disabled
  s390x/virtio-ccw: add virtio set-revision call

 hw/9pfs/virtio-9p-device.c|4 +-
 hw/block/dataplane/virtio-blk.c   |4 +-
 hw/block/virtio-blk.c |   44 +++--
 hw/char/virtio-serial-bus.c   |6 +-
 hw/net/virtio-net.c   |  100 ++-
 hw/s390x/css.c|   12 ++
 hw/s390x/css.h|1 +
 hw/s390x/s390-virtio-bus.c|3 +-
 hw/s390x/s390-virtio-bus.h|2 +-
 hw/s390x/virtio-ccw.c |  235 ++---
 hw/s390x/virtio-ccw.h |8 +-
 hw/scsi/vhost-scsi.c  |3 +-
 hw/scsi/virtio-scsi-dataplane.c   |2 +-
 hw/scsi/virtio-scsi.c |   12 +-
 hw/virtio/Makefile.objs   |2 +-
 hw/virtio/dataplane/Makefile.objs |2 +-
 hw/virtio/dataplane/vring.c   |   96 +-
 hw/virtio/virtio-balloon.c|4 +-
 hw/virtio/virtio-bus.c|   24 ++-
 hw/virtio/virtio-mmio.c   |6 +-
 hw/virtio/virtio-pci.c|7 +-
 hw/virtio/virtio-pci.h|2 +-
 hw/virtio/virtio-rng.c|2 +-
 hw/virtio/virtio.c|   83 +++--
 include/hw/qdev-properties.h  |   11 ++
 include/hw/virtio/dataplane/vring-accessors.h |   75 
 include/hw/virtio/dataplane/vring.h   |   14 +-
 include/hw/virtio/virtio-access.h |4 +
 include/hw/virtio/virtio-bus.h|   14 +-
 include/hw/virtio/virtio-net.h|   46 ++---
 include/hw/virtio/virtio-scsi.h   |6 +-
 include/hw/virtio/virtio.h|   61 +--
 linux-headers/linux/virtio_config.h   |3 +
 33 files changed, 625 insertions(+), 273 deletions(-)
 create mode 100644 include/hw/virtio/dataplane/vring-accessors.h

-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 03/19] virtio: feature bit manipulation helpers

2014-12-02 Thread Cornelia Huck

Add virtio_{add,clear}_feature helper functions for manipulating a
feature bits variable. This has some benefits over open coding:
- add check that the bit is in a sane range
- make it obvious at a glance what is going on
- have a central point to change when we want to extend feature bits

Convert existing code manipulating features to use the new helpers.

Signed-off-by: Cornelia Huck 
---
 hw/9pfs/virtio-9p-device.c  |2 +-
 hw/block/virtio-blk.c   |   16 
 hw/char/virtio-serial-bus.c |2 +-
 hw/net/virtio-net.c |   34 +-
 hw/s390x/virtio-ccw.c   |4 ++--
 hw/virtio/virtio-mmio.c |2 +-
 hw/virtio/virtio-pci.c  |4 ++--
 include/hw/virtio/virtio.h  |   12 
 8 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 2572747..30492ec 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -23,7 +23,7 @@
 
 static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
 {
-features |= 1 << VIRTIO_9P_MOUNT_TAG;
+virtio_add_feature(&features, VIRTIO_9P_MOUNT_TAG);
 return features;
 }
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b19b102..3f76e2a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -568,20 +568,20 @@ static uint32_t virtio_blk_get_features(VirtIODevice 
*vdev, uint32_t features)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
-features |= (1 << VIRTIO_BLK_F_SEG_MAX);
-features |= (1 << VIRTIO_BLK_F_GEOMETRY);
-features |= (1 << VIRTIO_BLK_F_TOPOLOGY);
-features |= (1 << VIRTIO_BLK_F_BLK_SIZE);
-features |= (1 << VIRTIO_BLK_F_SCSI);
+virtio_add_feature(&features, VIRTIO_BLK_F_SEG_MAX);
+virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
+virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
+virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
+virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
 
 if (s->conf.config_wce) {
-features |= (1 << VIRTIO_BLK_F_CONFIG_WCE);
+virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
 }
 if (blk_enable_write_cache(s->blk)) {
-features |= (1 << VIRTIO_BLK_F_WCE);
+virtio_add_feature(&features, VIRTIO_BLK_F_WCE);
 }
 if (blk_is_read_only(s->blk)) {
-features |= 1 << VIRTIO_BLK_F_RO;
+virtio_add_feature(&features, VIRTIO_BLK_F_RO);
 }
 
 return features;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index a7b1b68..0f637db 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -474,7 +474,7 @@ static uint32_t get_features(VirtIODevice *vdev, uint32_t 
features)
 vser = VIRTIO_SERIAL(vdev);
 
 if (vser->bus.max_nr_ports > 1) {
-features |= (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+virtio_add_feature(&features, VIRTIO_CONSOLE_F_MULTIPORT);
 }
 return features;
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e574bd4..f1aa100 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -446,23 +446,23 @@ static uint32_t virtio_net_get_features(VirtIODevice 
*vdev, uint32_t features)
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
 
-features |= (1 << VIRTIO_NET_F_MAC);
+virtio_add_feature(&features, VIRTIO_NET_F_MAC);
 
 if (!peer_has_vnet_hdr(n)) {
-features &= ~(0x1 << VIRTIO_NET_F_CSUM);
-features &= ~(0x1 << VIRTIO_NET_F_HOST_TSO4);
-features &= ~(0x1 << VIRTIO_NET_F_HOST_TSO6);
-features &= ~(0x1 << VIRTIO_NET_F_HOST_ECN);
+virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
+virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
+virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO6);
+virtio_clear_feature(&features, VIRTIO_NET_F_HOST_ECN);
 
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_CSUM);
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_TSO4);
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_TSO6);
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_ECN);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
 }
 
 if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
-features &= ~(0x1 << VIRTIO_NET_F_GUEST_UFO);
-features &= ~(0x1 << VIRTIO_NET_F_HOST_UFO);
+virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_UFO);
+virtio_clear_feature(&features, VIRTIO_NET_F_HOST_UFO);
 }
 
 if (!get_vhost_net(nc->peer)) {
@@ -477,11 +477,11 @@ static uint32_t virtio_net_bad_features(VirtIODevice 
*vdev)
 
 /* Linux kernel 2.6.25.  It understood MAC (as everyone must),
  * but also these: */
-features |= (1 << VIRTIO_NET_F_MAC);
-feat

[Qemu-devel] [PATCH RFC v5 04/19] virtio: add feature checking helpers

2014-12-02 Thread Cornelia Huck

Add a helper function for checking whether a bit is set in the guest
features for a vdev as well as one that works on a feature bit set.

Convert code that open-coded this: It cleans up the code and makes it
easier to extend the guest feature bits.

Signed-off-by: Cornelia Huck 
---
 hw/block/virtio-blk.c   |7 ++-
 hw/char/virtio-serial-bus.c |2 +-
 hw/net/virtio-net.c |   23 +--
 hw/scsi/virtio-scsi.c   |8 
 hw/virtio/dataplane/vring.c |   10 +-
 hw/virtio/virtio-balloon.c  |2 +-
 hw/virtio/virtio.c  |   10 +-
 include/hw/virtio/virtio.h  |   11 +++
 8 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 3f76e2a..27f263a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -590,7 +590,6 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, 
uint32_t features)
 static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
-uint32_t features;
 
 if (s->dataplane && !(status & (VIRTIO_CONFIG_S_DRIVER |
 VIRTIO_CONFIG_S_DRIVER_OK))) {
@@ -601,8 +600,6 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
 return;
 }
 
-features = vdev->guest_features;
-
 /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send
  * cache flushes.  Thus, the "auto writethrough" behavior is never
  * necessary for guests that support the VIRTIO_BLK_F_CONFIG_WCE feature.
@@ -618,10 +615,10 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
  *
  * s->blk would erroneously be placed in writethrough mode.
  */
-if (!(features & (1 << VIRTIO_BLK_F_CONFIG_WCE))) {
+if (!virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) {
 aio_context_acquire(blk_get_aio_context(s->blk));
 blk_set_enable_write_cache(s->blk,
-   !!(features & (1 << VIRTIO_BLK_F_WCE)));
+   virtio_has_feature(vdev, VIRTIO_BLK_F_WCE));
 aio_context_release(blk_get_aio_context(s->blk));
 }
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 0f637db..d49883f 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -75,7 +75,7 @@ static VirtIOSerialPort *find_port_by_name(char *name)
 static bool use_multiport(VirtIOSerial *vser)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(vser);
-return vdev->guest_features & (1 << VIRTIO_CONSOLE_F_MULTIPORT);
+return virtio_has_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT);
 }
 
 static size_t write_to_port(VirtIOSerialPort *port,
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f1aa100..9f3c58a 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -86,7 +86,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 
 memcpy(&netcfg, config, n->config_size);
 
-if (!(vdev->guest_features >> VIRTIO_NET_F_CTRL_MAC_ADDR & 1) &&
+if (!virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR) &&
 memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
 memcpy(n->mac, netcfg.mac, ETH_ALEN);
 qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
@@ -305,7 +305,7 @@ static RxFilterInfo 
*virtio_net_query_rxfilter(NetClientState *nc)
 info->multicast_table = str_list;
 info->vlan_table = get_vlan_table(n);
 
-if (!((1 << VIRTIO_NET_F_CTRL_VLAN) & vdev->guest_features)) {
+if (!virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VLAN)) {
 info->vlan = RX_STATE_ALL;
 } else if (!info->vlan_table) {
 info->vlan = RX_STATE_NONE;
@@ -519,9 +519,12 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
uint32_t features)
 VirtIONet *n = VIRTIO_NET(vdev);
 int i;
 
-virtio_net_set_multiqueue(n, !!(features & (1 << VIRTIO_NET_F_MQ)));
+virtio_net_set_multiqueue(n,
+  __virtio_has_feature(features, VIRTIO_NET_F_MQ));
 
-virtio_net_set_mrg_rx_bufs(n, !!(features & (1 << 
VIRTIO_NET_F_MRG_RXBUF)));
+virtio_net_set_mrg_rx_bufs(n,
+   __virtio_has_feature(features,
+VIRTIO_NET_F_MRG_RXBUF));
 
 if (n->has_vnet_hdr) {
 n->curr_guest_offloads =
@@ -538,7 +541,7 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
uint32_t features)
 vhost_net_ack_features(get_vhost_net(nc->peer), features);
 }
 
-if ((1 << VIRTIO_NET_F_CTRL_VLAN) & features) {
+if (__virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) {
 memset(n->vlans, 0, MAX_VLAN >> 3);
 } else {
 memset(n->vlans, 0xff, MAX_VLAN >> 3);
@@ -585,7 +588,7 @@ static int virtio_net_handle_offloads(VirtIONet *n, uint8_t 
cmd,
 uint64_t offloads;
 size_t s;
 
-if (!((1 << VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) &

[Qemu-devel] [PATCH RFC v5 01/19] linux-headers/virtio_config: Update with VIRTIO_F_VERSION_1

2014-12-02 Thread Cornelia Huck

From: Thomas Huth 

Add the new VIRTIO_F_VERSION_1 definition to the virtio_config.h
linux header.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 linux-headers/linux/virtio_config.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-headers/linux/virtio_config.h 
b/linux-headers/linux/virtio_config.h
index 75dc20b..16aa289 100644
--- a/linux-headers/linux/virtio_config.h
+++ b/linux-headers/linux/virtio_config.h
@@ -54,4 +54,7 @@
 /* Can the device handle any descriptor layout? */
 #define VIRTIO_F_ANY_LAYOUT27
 
+/* v1.0 compliant. */
+#define VIRTIO_F_VERSION_1 32
+
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 05/19] virtio: support more feature bits

2014-12-02 Thread Cornelia Huck

With virtio-1, we support more than 32 feature bits. Let's extend both
host and guest features to 64, which should suffice for a while.

vhost and migration have been ignored for now.

Signed-off-by: Cornelia Huck 
---
 hw/9pfs/virtio-9p-device.c  |2 +-
 hw/block/virtio-blk.c   |2 +-
 hw/char/virtio-serial-bus.c |2 +-
 hw/net/virtio-net.c |   22 +--
 hw/s390x/s390-virtio-bus.c  |3 ++-
 hw/s390x/s390-virtio-bus.h  |2 +-
 hw/s390x/virtio-ccw.c   |   40 --
 hw/s390x/virtio-ccw.h   |5 +
 hw/scsi/vhost-scsi.c|3 +--
 hw/scsi/virtio-scsi.c   |4 ++--
 hw/virtio/virtio-balloon.c  |2 +-
 hw/virtio/virtio-bus.c  |6 ++---
 hw/virtio/virtio-mmio.c |4 ++--
 hw/virtio/virtio-pci.c  |3 ++-
 hw/virtio/virtio-pci.h  |2 +-
 hw/virtio/virtio-rng.c  |2 +-
 hw/virtio/virtio.c  |   13 ++-
 include/hw/qdev-properties.h|   11 ++
 include/hw/virtio/virtio-bus.h  |8 +++
 include/hw/virtio/virtio-net.h  |   46 +++
 include/hw/virtio/virtio-scsi.h |6 ++---
 include/hw/virtio/virtio.h  |   38 ++--
 22 files changed, 126 insertions(+), 100 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 30492ec..60f9ff9 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -21,7 +21,7 @@
 #include "virtio-9p-coth.h"
 #include "hw/virtio/virtio-access.h"
 
-static uint32_t virtio_9p_get_features(VirtIODevice *vdev, uint32_t features)
+static uint64_t virtio_9p_get_features(VirtIODevice *vdev, uint64_t features)
 {
 virtio_add_feature(&features, VIRTIO_9P_MOUNT_TAG);
 return features;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 27f263a..9cfae66 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -564,7 +564,7 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 aio_context_release(blk_get_aio_context(s->blk));
 }
 
-static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
+static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index d49883f..2d2ed9c 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -467,7 +467,7 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 {
 }
 
-static uint32_t get_features(VirtIODevice *vdev, uint32_t features)
+static uint64_t get_features(VirtIODevice *vdev, uint64_t features)
 {
 VirtIOSerial *vser;
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9f3c58a..d6d1b98 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -38,16 +38,16 @@
 (offsetof(container, field) + sizeof(((container *)0)->field))
 
 typedef struct VirtIOFeature {
-uint32_t flags;
+uint64_t flags;
 size_t end;
 } VirtIOFeature;
 
 static VirtIOFeature feature_sizes[] = {
-{.flags = 1 << VIRTIO_NET_F_MAC,
+{.flags = 1ULL << VIRTIO_NET_F_MAC,
  .end = endof(struct virtio_net_config, mac)},
-{.flags = 1 << VIRTIO_NET_F_STATUS,
+{.flags = 1ULL << VIRTIO_NET_F_STATUS,
  .end = endof(struct virtio_net_config, status)},
-{.flags = 1 << VIRTIO_NET_F_MQ,
+{.flags = 1ULL << VIRTIO_NET_F_MQ,
  .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
 {}
 };
@@ -441,7 +441,7 @@ static void virtio_net_set_queues(VirtIONet *n)
 
 static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
 
-static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
+static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
@@ -471,9 +471,9 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, 
uint32_t features)
 return vhost_net_get_features(get_vhost_net(nc->peer), features);
 }
 
-static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
+static uint64_t virtio_net_bad_features(VirtIODevice *vdev)
 {
-uint32_t features = 0;
+uint64_t features = 0;
 
 /* Linux kernel 2.6.25.  It understood MAC (as everyone must),
  * but also these: */
@@ -496,7 +496,7 @@ static void virtio_net_apply_guest_offloads(VirtIONet *n)
 !!(n->curr_guest_offloads & (1ULL << VIRTIO_NET_F_GUEST_UFO)));
 }
 
-static uint64_t virtio_net_guest_offloads_by_features(uint32_t features)
+static uint64_t virtio_net_guest_offloads_by_features(uint64_t features)
 {
 static const uint64_t guest_offloads_mask =
 (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
@@ -514,7 +514,7 @@ static inline uint64_t 
virtio_net_supported_guest_offloads(VirtIONet *n)
 return virtio_net_guest_offloads_by_feat

[Qemu-devel] [PATCH RFC v5 02/19] virtio: cull virtio_bus_set_vdev_features

2014-12-02 Thread Cornelia Huck

The only user of this function was virtio-ccw, and it should use
virtio_set_features() like everybody else: We need to make sure
that bad features are masked out properly, which this function did
not do.

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c  |3 +--
 hw/virtio/virtio-bus.c |   14 --
 include/hw/virtio/virtio-bus.h |3 ---
 3 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ea236c9..84f17bc 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -400,8 +400,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
ccw.cda + sizeof(features.features));
 features.features = ldl_le_phys(&address_space_memory, ccw.cda);
 if (features.index < ARRAY_SIZE(dev->host_features)) {
-virtio_bus_set_vdev_features(&dev->bus, features.features);
-vdev->guest_features = features.features;
+virtio_set_features(vdev, features.features);
 } else {
 /*
  * If the guest supports more feature bits, assert that it
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index eb77019..a8ffa07 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -109,20 +109,6 @@ uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus,
 return k->get_features(vdev, requested_features);
 }
 
-/* Set the features of the plugged device. */
-void virtio_bus_set_vdev_features(VirtioBusState *bus,
-  uint32_t requested_features)
-{
-VirtIODevice *vdev = virtio_bus_get_device(bus);
-VirtioDeviceClass *k;
-
-assert(vdev != NULL);
-k = VIRTIO_DEVICE_GET_CLASS(vdev);
-if (k->set_features != NULL) {
-k->set_features(vdev, requested_features);
-}
-}
-
 /* Get bad features of the plugged device. */
 uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus)
 {
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 0756545..0d2e7b4 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -84,9 +84,6 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus);
 /* Get the features of the plugged device. */
 uint32_t virtio_bus_get_vdev_features(VirtioBusState *bus,
 uint32_t requested_features);
-/* Set the features of the plugged device. */
-void virtio_bus_set_vdev_features(VirtioBusState *bus,
-  uint32_t requested_features);
 /* Get bad features of the plugged device. */
 uint32_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus);
 /* Get config of the plugged device. */
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 06/19] virtio: endianness checks for virtio 1.0 devices

2014-12-02 Thread Cornelia Huck

Add code that checks for the VERSION_1 feature bit in order to make
decisions about the device's endianness. This allows us to support
transitional devices.

Signed-off-by: Cornelia Huck 
---
 hw/virtio/virtio.c|6 +-
 include/hw/virtio/virtio-access.h |4 
 include/hw/virtio/virtio.h|8 ++--
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7f74ae5..8f69ffa 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -881,7 +881,11 @@ static bool virtio_device_endian_needed(void *opaque)
 VirtIODevice *vdev = opaque;
 
 assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
-return vdev->device_endian != virtio_default_endian();
+if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+return vdev->device_endian != virtio_default_endian();
+}
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return vdev->device_endian != VIRTIO_DEVICE_ENDIAN_LITTLE;
 }
 
 static const VMStateDescription vmstate_virtio_device_endian = {
diff --git a/include/hw/virtio/virtio-access.h 
b/include/hw/virtio/virtio-access.h
index 46456fd..ee28c21 100644
--- a/include/hw/virtio/virtio-access.h
+++ b/include/hw/virtio/virtio-access.h
@@ -19,6 +19,10 @@
 
 static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
 {
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return false;
+}
 #if defined(TARGET_IS_BIENDIAN)
 return virtio_is_big_endian(vdev);
 #elif defined(TARGET_WORDS_BIGENDIAN)
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 08141c7..68c40db 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -297,7 +297,11 @@ static inline bool virtio_has_feature(VirtIODevice *vdev, 
unsigned int fbit)
 
 static inline bool virtio_is_big_endian(VirtIODevice *vdev)
 {
-assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
-return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN);
+return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG;
+}
+/* Devices conforming to VIRTIO 1.0 or later are always LE. */
+return false;
 }
 #endif
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 08/19] dataplane: allow virtio-1 devices

2014-12-02 Thread Cornelia Huck

Handle endianness conversion for virtio-1 virtqueues correctly.

Note that dataplane now needs to be built per-target.

Signed-off-by: Cornelia Huck 
---
 hw/block/dataplane/virtio-blk.c   |4 +-
 hw/scsi/virtio-scsi-dataplane.c   |2 +-
 hw/virtio/Makefile.objs   |2 +-
 hw/virtio/dataplane/Makefile.objs |2 +-
 hw/virtio/dataplane/vring.c   |   86 ++---
 include/hw/virtio/dataplane/vring-accessors.h |   75 +
 include/hw/virtio/dataplane/vring.h   |   14 +---
 7 files changed, 131 insertions(+), 54 deletions(-)
 create mode 100644 include/hw/virtio/dataplane/vring-accessors.h

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 1222a37..2d8cc15 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -16,7 +16,9 @@
 #include "qemu/iov.h"
 #include "qemu/thread.h"
 #include "qemu/error-report.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/dataplane/vring.h"
+#include "hw/virtio/dataplane/vring-accessors.h"
 #include "sysemu/block-backend.h"
 #include "hw/virtio/virtio-blk.h"
 #include "virtio-blk.h"
@@ -75,7 +77,7 @@ static void complete_request_vring(VirtIOBlockReq *req, 
unsigned char status)
 VirtIOBlockDataPlane *s = req->dev->dataplane;
 stb_p(&req->in->status, status);
 
-vring_push(&req->dev->dataplane->vring, &req->elem,
+vring_push(s->vdev, &req->dev->dataplane->vring, &req->elem,
req->qiov.size + sizeof(*req->in));
 
 /* Suppress notification to guest by BH and its scheduled
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 03a1e8c..418d73b 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -94,7 +94,7 @@ void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(req->vring->parent);
 
-vring_push(&req->vring->vring, &req->elem,
+vring_push(vdev, &req->vring->vring, &req->elem,
req->qsgl.size + req->resp_iov.size);
 
 if (vring_should_notify(vdev, &req->vring->vring)) {
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index d21c397..19b224a 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y += virtio-rng.o
 common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 common-obj-y += virtio-bus.o
 common-obj-y += virtio-mmio.o
-common-obj-$(CONFIG_VIRTIO) += dataplane/
+obj-$(CONFIG_VIRTIO) += dataplane/
 
 obj-y += virtio.o virtio-balloon.o 
 obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
diff --git a/hw/virtio/dataplane/Makefile.objs 
b/hw/virtio/dataplane/Makefile.objs
index 9a8cfc0..753a9ca 100644
--- a/hw/virtio/dataplane/Makefile.objs
+++ b/hw/virtio/dataplane/Makefile.objs
@@ -1 +1 @@
-common-obj-y += vring.o
+obj-y += vring.o
diff --git a/hw/virtio/dataplane/vring.c b/hw/virtio/dataplane/vring.c
index 6e283fc..a44c8c8 100644
--- a/hw/virtio/dataplane/vring.c
+++ b/hw/virtio/dataplane/vring.c
@@ -18,7 +18,9 @@
 #include "hw/hw.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/dataplane/vring.h"
+#include "hw/virtio/dataplane/vring-accessors.h"
 #include "qemu/error-report.h"
 
 /* vring_map can be coupled with vring_unmap or (if you still have the
@@ -83,7 +85,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 vring_init(&vring->vr, virtio_queue_get_num(vdev, n), vring_ptr, 4096);
 
 vring->last_avail_idx = virtio_queue_get_last_avail_idx(vdev, n);
-vring->last_used_idx = vring->vr.used->idx;
+vring->last_used_idx = vring_get_used_idx(vdev, vring);
 vring->signalled_used = 0;
 vring->signalled_used_valid = false;
 
@@ -104,7 +106,7 @@ void vring_teardown(Vring *vring, VirtIODevice *vdev, int n)
 void vring_disable_notification(VirtIODevice *vdev, Vring *vring)
 {
 if (!virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
-vring->vr.used->flags |= VRING_USED_F_NO_NOTIFY;
+vring_set_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
 }
 }
 
@@ -117,10 +119,10 @@ bool vring_enable_notification(VirtIODevice *vdev, Vring 
*vring)
 if (virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
 vring_avail_event(&vring->vr) = vring->vr.avail->idx;
 } else {
-vring->vr.used->flags &= ~VRING_USED_F_NO_NOTIFY;
+vring_clear_used_flags(vdev, vring, VRING_USED_F_NO_NOTIFY);
 }
 smp_mb(); /* ensure update is seen before reading avail_idx */
-return !vring_more_avail(vring);
+return !vring_more_avail(vdev, vring);
 }
 
 /* This is stolen from linux/drivers/vhost/vhost.c:vhost_notify() */
@@ -134,12 +136,13 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
 smp_mb();
 
 if (virtio_has_feature(vdev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
-unlikely(vring->vr.avail->idx == vring->last_av

[Qemu-devel] [PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Cornelia Huck

For virtio-1 devices, we allow a more complex queue layout that doesn't
require descriptor table and rings on a physically-contigous memory area:
add virtio_queue_set_rings() to allow transports to set this up.

Signed-off-by: Cornelia Huck 
---
 hw/virtio/virtio.c |   16 
 include/hw/virtio/virtio.h |2 ++
 2 files changed, 18 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8f69ffa..508dccf 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq)
 {
 hwaddr pa = vq->pa;
 
+if (pa == -1ULL) {
+/*
+ * This is a virtio-1 style vq that has already been setup
+ * in virtio_queue_set.
+ */
+return;
+}
 vq->vring.desc = pa;
 vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
 vq->vring.used = vring_align(vq->vring.avail +
@@ -717,6 +724,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
 return vdev->vq[n].pa;
 }
 
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used)
+{
+vdev->vq[n].pa = -1ULL;
+vdev->vq[n].vring.desc = desc;
+vdev->vq[n].vring.avail = avail;
+vdev->vq[n].vring.used = used;
+}
+
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
 {
 /* Don't allow guest to flip queue between existent and
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 68c40db..80ee313 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -224,6 +224,8 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, 
hwaddr addr);
 hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n);
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num);
 int virtio_queue_get_num(VirtIODevice *vdev, int n);
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used);
 void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 12/19] virtio: disallow late feature changes for virtio-1

2014-12-02 Thread Cornelia Huck

For virtio-1 devices, the driver must not attempt to set feature bits
after it set FEATURES_OK in the device status. Simply reject it in
that case.

Signed-off-by: Cornelia Huck 
---
 hw/virtio/virtio.c |   16 ++--
 include/hw/virtio/virtio.h |2 ++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 508dccf..4f2dc48 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -980,7 +980,7 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 vmstate_save_state(f, &vmstate_virtio, vdev);
 }
 
-int virtio_set_features(VirtIODevice *vdev, uint64_t val)
+static int __virtio_set_features(VirtIODevice *vdev, uint64_t val)
 {
 BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
 VirtioBusClass *vbusk = VIRTIO_BUS_GET_CLASS(qbus);
@@ -996,6 +996,18 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 return bad ? -1 : 0;
 }
 
+int virtio_set_features(VirtIODevice *vdev, uint64_t val)
+{
+   /*
+ * The driver must not attempt to set features after feature negotiation
+ * has finished.
+ */
+if (vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) {
+return -EINVAL;
+}
+return __virtio_set_features(vdev, val);
+}
+
 int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 {
 int i, ret;
@@ -1028,7 +1040,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
version_id)
 qemu_get_be32s(f, &features);
 
 /* XXX features >= 32 */
-if (virtio_set_features(vdev, features) < 0) {
+if (__virtio_set_features(vdev, features) < 0) {
 supported_features = k->get_features(qbus->parent);
 error_report("Features 0x%x unsupported. Allowed features: 0x%lx",
  features, supported_features);
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 80ee313..9a984c2 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -32,6 +32,8 @@
 #define VIRTIO_CONFIG_S_DRIVER  2
 /* Driver has used its parts of the config, and is happy */
 #define VIRTIO_CONFIG_S_DRIVER_OK   4
+/* Driver has finished configuring features */
+#define VIRTIO_CONFIG_S_FEATURES_OK 8
 /* We've given up on this device. */
 #define VIRTIO_CONFIG_S_FAILED  0x80
 
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 09/19] s390x/css: Add a callback for when subchannel gets disabled

2014-12-02 Thread Cornelia Huck

From: Thomas Huth 

We need a possibility to run code when a subchannel gets disabled.
This patch adds the necessary infrastructure.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/css.c |   12 
 hw/s390x/css.h |1 +
 2 files changed, 13 insertions(+)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index b67c039..735ec55 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -588,6 +588,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 {
 SCSW *s = &sch->curr_status.scsw;
 PMCW *p = &sch->curr_status.pmcw;
+uint16_t oldflags;
 int ret;
 SCHIB schib;
 
@@ -610,6 +611,7 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 copy_schib_from_guest(&schib, orig_schib);
 /* Only update the program-modifiable fields. */
 p->intparm = schib.pmcw.intparm;
+oldflags = p->flags;
 p->flags &= ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA |
   PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME |
   PMCW_FLAGS_MASK_MP);
@@ -625,6 +627,12 @@ int css_do_msch(SubchDev *sch, SCHIB *orig_schib)
 (PMCW_CHARS_MASK_MBFC | PMCW_CHARS_MASK_CSENSE);
 sch->curr_status.mba = schib.mba;
 
+/* Has the channel been disabled? */
+if (sch->disable_cb && (oldflags & PMCW_FLAGS_MASK_ENA) != 0
+&& (p->flags & PMCW_FLAGS_MASK_ENA) == 0) {
+sch->disable_cb(sch);
+}
+
 ret = 0;
 
 out:
@@ -1443,6 +1451,10 @@ void css_reset_sch(SubchDev *sch)
 {
 PMCW *p = &sch->curr_status.pmcw;
 
+if ((p->flags & PMCW_FLAGS_MASK_ENA) != 0 && sch->disable_cb) {
+sch->disable_cb(sch);
+}
+
 p->intparm = 0;
 p->flags &= ~(PMCW_FLAGS_MASK_ISC | PMCW_FLAGS_MASK_ENA |
   PMCW_FLAGS_MASK_LM | PMCW_FLAGS_MASK_MME |
diff --git a/hw/s390x/css.h b/hw/s390x/css.h
index 33104ac..7fa807b 100644
--- a/hw/s390x/css.h
+++ b/hw/s390x/css.h
@@ -81,6 +81,7 @@ struct SubchDev {
 uint8_t ccw_no_data_cnt;
 /* transport-provided data: */
 int (*ccw_cb) (SubchDev *, CCW1);
+void (*disable_cb)(SubchDev *);
 SenseId id;
 void *driver_data;
 };
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 11/19] s390x/virtio-ccw: support virtio-1 set_vq format

2014-12-02 Thread Cornelia Huck

Support the new CCW_CMD_SET_VQ format for virtio-1 devices.

While we're at it, refactor the code a bit and enforce big endian
fields (which had always been required, even for legacy).

Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c |  114 ++---
 1 file changed, 80 insertions(+), 34 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 5311d9f..75c9ff9 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -238,11 +238,20 @@ VirtualCssBus *virtual_css_bus_init(void)
 }
 
 /* Communication blocks used by several channel commands. */
-typedef struct VqInfoBlock {
+typedef struct VqInfoBlockLegacy {
 uint64_t queue;
 uint32_t align;
 uint16_t index;
 uint16_t num;
+} QEMU_PACKED VqInfoBlockLegacy;
+
+typedef struct VqInfoBlock {
+uint64_t desc;
+uint32_t res0;
+uint16_t index;
+uint16_t num;
+uint64_t avail;
+uint64_t used;
 } QEMU_PACKED VqInfoBlock;
 
 typedef struct VqConfigBlock {
@@ -269,17 +278,20 @@ typedef struct VirtioRevInfo {
 } QEMU_PACKED VirtioRevInfo;
 
 /* Specify where the virtqueues for the subchannel are in guest memory. */
-static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
-  uint16_t index, uint16_t num)
+static int virtio_ccw_set_vqs(SubchDev *sch, VqInfoBlock *info,
+  VqInfoBlockLegacy *linfo)
 {
 VirtIODevice *vdev = virtio_ccw_get_vdev(sch);
+uint16_t index = info ? info->index : linfo->index;
+uint16_t num = info ? info->num : linfo->num;
+uint64_t desc = info ? info->desc : linfo->queue;
 
 if (index > VIRTIO_PCI_QUEUE_MAX) {
 return -EINVAL;
 }
 
 /* Current code in virtio.c relies on 4K alignment. */
-if (addr && (align != 4096)) {
+if (linfo && desc && (linfo->align != 4096)) {
 return -EINVAL;
 }
 
@@ -287,8 +299,12 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t 
addr, uint32_t align,
 return -EINVAL;
 }
 
-virtio_queue_set_addr(vdev, index, addr);
-if (!addr) {
+if (info) {
+virtio_queue_set_rings(vdev, index, desc, info->avail, info->used);
+} else {
+virtio_queue_set_addr(vdev, index, desc);
+}
+if (!desc) {
 virtio_queue_set_vector(vdev, index, 0);
 } else {
 /* Fail if we don't have a big enough queue. */
@@ -303,10 +319,66 @@ static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t 
addr, uint32_t align,
 return 0;
 }
 
-static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
+static int virtio_ccw_handle_set_vq(SubchDev *sch, CCW1 ccw, bool check_len,
+bool is_legacy)
 {
 int ret;
 VqInfoBlock info;
+VqInfoBlockLegacy linfo;
+size_t info_len = is_legacy ? sizeof(linfo) : sizeof(info);
+
+if (check_len) {
+if (ccw.count != info_len) {
+return -EINVAL;
+}
+} else if (ccw.count < info_len) {
+/* Can't execute command. */
+return -EINVAL;
+}
+if (!ccw.cda) {
+return -EFAULT;
+}
+if (is_legacy) {
+linfo.queue = ldq_be_phys(&address_space_memory, ccw.cda);
+linfo.align = ldl_be_phys(&address_space_memory,
+  ccw.cda + sizeof(linfo.queue));
+linfo.index = lduw_be_phys(&address_space_memory,
+   ccw.cda + sizeof(linfo.queue)
+   + sizeof(linfo.align));
+linfo.num = lduw_be_phys(&address_space_memory,
+ ccw.cda + sizeof(linfo.queue)
+ + sizeof(linfo.align)
+ + sizeof(linfo.index));
+ret = virtio_ccw_set_vqs(sch, NULL, &linfo);
+} else {
+info.desc = ldq_be_phys(&address_space_memory, ccw.cda);
+info.index = lduw_be_phys(&address_space_memory,
+  ccw.cda + sizeof(info.desc)
+  + sizeof(info.res0));
+info.num = lduw_be_phys(&address_space_memory,
+ccw.cda + sizeof(info.desc)
+  + sizeof(info.res0)
+  + sizeof(info.index));
+info.avail = ldq_be_phys(&address_space_memory,
+ ccw.cda + sizeof(info.desc)
+ + sizeof(info.res0)
+ + sizeof(info.index)
+ + sizeof(info.num));
+info.used = ldq_be_phys(&address_space_memory,
+ccw.cda + sizeof(info.desc)
++ sizeof(info.res0)
++ sizeof(info.index)
++ sizeof(info.num)
++ sizeof(info.avail));
+ret = virtio_ccw_set_vqs(sch, &info, NULL);
+}
+

[Qemu-devel] [PATCH RFC v5 14/19] s390x/virtio-ccw: enable virtio 1.0

2014-12-02 Thread Cornelia Huck

virtio-ccw should now have everything in place to operate virtio 1.0
devices, so let's enable revision 1.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
index fe5c782..d40e3be 100644
--- a/hw/s390x/virtio-ccw.h
+++ b/hw/s390x/virtio-ccw.h
@@ -70,7 +70,7 @@ typedef struct VirtIOCCWDeviceClass {
 } VirtIOCCWDeviceClass;
 
 /* The maximum virtio revision we support. */
-#define VIRTIO_CCW_REV_MAX 0
+#define VIRTIO_CCW_REV_MAX 1
 
 /* Performance improves when virtqueue kick processing is decoupled from the
  * vcpu thread using ioeventfd for some devices. */
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 16/19] virtio-net: support longer header

2014-12-02 Thread Cornelia Huck

virtio-1 devices always use num_buffers in the header, even if
mergeable rx buffers have not been negotiated.

Signed-off-by: Cornelia Huck 
---
 hw/net/virtio-net.c |   21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ebbea60..7ee2bd6 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -373,15 +373,21 @@ static int peer_has_ufo(VirtIONet *n)
 return n->has_ufo;
 }
 
-static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs)
+static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
+   int version_1)
 {
 int i;
 NetClientState *nc;
 
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
-n->guest_hdr_len = n->mergeable_rx_bufs ?
-sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct 
virtio_net_hdr);
+if (version_1) {
+n->guest_hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+} else {
+n->guest_hdr_len = n->mergeable_rx_bufs ?
+sizeof(struct virtio_net_hdr_mrg_rxbuf) :
+sizeof(struct virtio_net_hdr);
+}
 
 for (i = 0; i < n->max_queues; i++) {
 nc = qemu_get_subqueue(n->nic, i);
@@ -525,7 +531,9 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
uint64_t features)
 
 virtio_net_set_mrg_rx_bufs(n,
__virtio_has_feature(features,
-VIRTIO_NET_F_MRG_RXBUF));
+VIRTIO_NET_F_MRG_RXBUF),
+   __virtio_has_feature(features,
+VIRTIO_F_VERSION_1));
 
 if (n->has_vnet_hdr) {
 n->curr_guest_offloads =
@@ -1407,7 +1415,8 @@ static int virtio_net_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 qemu_get_buffer(f, n->mac, ETH_ALEN);
 n->vqs[0].tx_waiting = qemu_get_be32(f);
 
-virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
+virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f),
+   virtio_has_feature(vdev, VIRTIO_F_VERSION_1));
 
 if (version_id >= 3)
 n->status = qemu_get_be16(f);
@@ -1653,7 +1662,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 
 n->vqs[0].tx_waiting = 0;
 n->tx_burst = n->net_conf.txburst;
-virtio_net_set_mrg_rx_bufs(n, 0);
+virtio_net_set_mrg_rx_bufs(n, 0, 0);
 n->promisc = 1; /* for compatibility */
 
 n->mac_table.macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 17/19] virtio-net: enable virtio 1.0

2014-12-02 Thread Cornelia Huck

virtio-net (non-vhost) now should have everything in place to support
virtio 1.0: let's enable the feature bit for it.

Note that VIRTIO_F_VERSION_1 is technically a transport feature; once
every device is ready for virtio 1.0, we can move setting this
feature bit out of the individual devices.

Signed-off-by: Cornelia Huck 
---
 hw/net/virtio-net.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7ee2bd6..b5dd356 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -473,6 +473,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features)
 }
 
 if (!get_vhost_net(nc->peer)) {
+virtio_add_feature(&features, VIRTIO_F_VERSION_1);
 return features;
 }
 return vhost_net_get_features(get_vhost_net(nc->peer), features);
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 10/19] s390x/virtio-ccw: add virtio set-revision call

2014-12-02 Thread Cornelia Huck

From: Thomas Huth 

Handle the virtio-ccw revision according to what the guest sets.
When revision 1 is selected, we have a virtio-1 standard device
with byteswapping for the virtio rings.

When a channel gets disabled, we have to revert to the legacy behavior
in case the next user of the device does not negotiate the revision 1
anymore (e.g. the boot firmware uses revision 1, but the operating
system only uses the legacy mode).

Note that revisions > 0 are still disabled.

Signed-off-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c |   52 +
 hw/s390x/virtio-ccw.h |5 +
 2 files changed, 57 insertions(+)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index e434718..5311d9f 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -20,9 +20,11 @@
 #include "hw/virtio/virtio-net.h"
 #include "hw/sysbus.h"
 #include "qemu/bitops.h"
+#include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/s390x/adapter.h"
 #include "hw/s390x/s390_flic.h"
+#include "linux/virtio_config.h"
 
 #include "ioinst.h"
 #include "css.h"
@@ -260,6 +262,12 @@ typedef struct VirtioThinintInfo {
 uint8_t isc;
 } QEMU_PACKED VirtioThinintInfo;
 
+typedef struct VirtioRevInfo {
+uint16_t revision;
+uint16_t length;
+uint8_t data[0];
+} QEMU_PACKED VirtioRevInfo;
+
 /* Specify where the virtqueues for the subchannel are in guest memory. */
 static int virtio_ccw_set_vqs(SubchDev *sch, uint64_t addr, uint32_t align,
   uint16_t index, uint16_t num)
@@ -299,6 +307,7 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 {
 int ret;
 VqInfoBlock info;
+VirtioRevInfo revinfo;
 uint8_t status;
 VirtioFeatDesc features;
 void *config;
@@ -375,6 +384,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 features.features = (uint32_t)dev->host_features;
 } else if (features.index == 1) {
 features.features = (uint32_t)(dev->host_features >> 32);
+/*
+ * Don't offer version 1 to the guest if it did not
+ * negotiate at least revision 1.
+ */
+if (dev->revision <= 0) {
+features.features &= ~(1 << (VIRTIO_F_VERSION_1 - 32));
+}
 } else {
 /* Return zeroes if the guest supports more feature bits. */
 features.features = 0;
@@ -406,6 +422,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 (vdev->guest_features & 
0x) |
 features.features);
 } else if (features.index == 1) {
+/*
+ * The guest should not set version 1 if it didn't
+ * negotiate a revision >= 1.
+ */
+if (dev->revision <= 0) {
+features.features &= ~(1 << (VIRTIO_F_VERSION_1 - 32));
+}
 virtio_set_features(vdev,
 (vdev->guest_features & 
0x) |
 ((uint64_t)features.features << 32));
@@ -608,6 +631,25 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 }
 }
 break;
+case CCW_CMD_SET_VIRTIO_REV:
+len = sizeof(revinfo);
+if (ccw.count < len || (check_len && ccw.count > len)) {
+ret = -EINVAL;
+break;
+}
+if (!ccw.cda) {
+ret = -EFAULT;
+break;
+}
+cpu_physical_memory_read(ccw.cda, &revinfo, len);
+if (dev->revision >= 0 ||
+revinfo.revision > VIRTIO_CCW_REV_MAX) {
+ret = -ENOSYS;
+break;
+}
+ret = 0;
+dev->revision = revinfo.revision;
+break;
 default:
 ret = -ENOSYS;
 break;
@@ -615,6 +657,13 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 return ret;
 }
 
+static void virtio_sch_disable_cb(SubchDev *sch)
+{
+VirtioCcwDevice *dev = sch->driver_data;
+
+dev->revision = -1;
+}
+
 static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev)
 {
 unsigned int cssid = 0;
@@ -740,6 +789,7 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, 
VirtIODevice *vdev)
 css_sch_build_virtual_schib(sch, 0, VIRTIO_CCW_CHPID_TYPE);
 
 sch->ccw_cb = virtio_ccw_cb;
+sch->disable_cb = virtio_sch_disable_cb;
 
 /* Build senseid data. */
 memset(&sch->id, 0, sizeof(SenseId));
@@ -747,6 +797,8 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, 
VirtIODevice *vdev)
 sch->id.cu_type = VIRTIO_CCW_CU_TYPE;
 sch->id.cu_model = vdev->device_id;
 
+dev->revision = -1;
+
 /* Set default feature bits that are offered by the host. */
 dev->host_features = 0;
 virtio_add_feature(&dev->host_featur

[Qemu-devel] [PATCH RFC v5 18/19] virtio: support revision-specific features

2014-12-02 Thread Cornelia Huck

Devices may support different sets of feature bits depending on which
revision they're operating at. Let's give the transport a way to
re-query the device about its features when the revision has been
changed.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c  |   12 ++--
 hw/virtio/virtio-bus.c |   14 --
 include/hw/virtio/virtio-bus.h |3 +++
 include/hw/virtio/virtio.h |3 +++
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index ec492b8..3826074 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -699,6 +699,10 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 }
 ret = 0;
 dev->revision = revinfo.revision;
+/* Re-evaluate which features the device wants to offer. */
+dev->host_features =
+virtio_bus_get_vdev_features_rev(&dev->bus, dev->host_features,
+ dev->revision >= 1 ? 1 : 0);
 break;
 default:
 ret = -ENOSYS;
@@ -712,6 +716,9 @@ static void virtio_sch_disable_cb(SubchDev *sch)
 VirtioCcwDevice *dev = sch->driver_data;
 
 dev->revision = -1;
+/* Reset the device's features to legacy. */
+dev->host_features =
+virtio_bus_get_vdev_features_rev(&dev->bus, dev->host_features, 0);
 }
 
 static int virtio_ccw_device_init(VirtioCcwDevice *dev, VirtIODevice *vdev)
@@ -854,8 +861,9 @@ static int virtio_ccw_device_init(VirtioCcwDevice *dev, 
VirtIODevice *vdev)
 virtio_add_feature(&dev->host_features, VIRTIO_F_NOTIFY_ON_EMPTY);
 virtio_add_feature(&dev->host_features, VIRTIO_F_BAD_FEATURE);
 
-dev->host_features = virtio_bus_get_vdev_features(&dev->bus,
-  dev->host_features);
+/* All devices start in legacy mode. */
+dev->host_features =
+virtio_bus_get_vdev_features_rev(&dev->bus, dev->host_features, 0);
 
 css_generate_sch_crws(sch->cssid, sch->ssid, sch->schid,
   parent->hotplugged, 1);
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 32e3fab..a30826c 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -97,18 +97,28 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus)
 }
 
 /* Get the features of the plugged device. */
-uint64_t virtio_bus_get_vdev_features(VirtioBusState *bus,
-  uint64_t requested_features)
+uint64_t virtio_bus_get_vdev_features_rev(VirtioBusState *bus,
+  uint64_t requested_features,
+  unsigned int revision)
 {
 VirtIODevice *vdev = virtio_bus_get_device(bus);
 VirtioDeviceClass *k;
 
 assert(vdev != NULL);
 k = VIRTIO_DEVICE_GET_CLASS(vdev);
+if (revision > 0 && k->get_features_rev) {
+return k->get_features_rev(vdev, requested_features, revision);
+}
 assert(k->get_features != NULL);
 return k->get_features(vdev, requested_features);
 }
 
+uint64_t virtio_bus_get_vdev_features(VirtioBusState *bus,
+  uint64_t requested_features)
+{
+return virtio_bus_get_vdev_features_rev(bus, requested_features, 0);
+}
+
 /* Get bad features of the plugged device. */
 uint64_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus)
 {
diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 0a4dde1..f0916ef 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -84,6 +84,9 @@ size_t virtio_bus_get_vdev_config_len(VirtioBusState *bus);
 /* Get the features of the plugged device. */
 uint64_t virtio_bus_get_vdev_features(VirtioBusState *bus,
   uint64_t requested_features);
+uint64_t virtio_bus_get_vdev_features_rev(VirtioBusState *bus,
+  uint64_t requested_features,
+  unsigned int revision);
 /* Get bad features of the plugged device. */
 uint64_t virtio_bus_get_vdev_bad_features(VirtioBusState *bus);
 /* Get config of the plugged device. */
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index e7bedd1..f31e3df 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -147,6 +147,9 @@ typedef struct VirtioDeviceClass {
 DeviceRealize realize;
 DeviceUnrealize unrealize;
 uint64_t (*get_features)(VirtIODevice *vdev, uint64_t requested_features);
+uint64_t (*get_features_rev)(VirtIODevice *vdev,
+ uint64_t requested_features,
+ unsigned int revision);
 uint64_t (*bad_features)(VirtIODevice *vdev);
 void (*set_features)(VirtIODevice *vdev, uint64_t val);
 int (*validate_features)(VirtIODevice *vdev);
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 19/19] virtio-blk: revision specific feature bits

2014-12-02 Thread Cornelia Huck

Wire up virtio-blk to provide different feature bit sets depending
on whether legacy or v1.0 has been requested.

Note that VERSION_1 is still disabled due to missing ANY_LAYOUT support.

Signed-off-by: Cornelia Huck 
---
 hw/block/virtio-blk.c |   19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9cfae66..fdc236a 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -587,6 +587,24 @@ static uint64_t virtio_blk_get_features(VirtIODevice 
*vdev, uint64_t features)
 return features;
 }
 
+static uint64_t virtio_blk_get_features_rev(VirtIODevice *vdev,
+uint64_t features,
+unsigned int revision)
+{
+if (revision == 0) {
+/* legacy */
+virtio_clear_feature(&features, VIRTIO_F_VERSION_1);
+return virtio_blk_get_features(vdev, features);
+}
+/* virtio 1.0 or later */
+virtio_clear_feature(&features, VIRTIO_BLK_F_SCSI);
+virtio_clear_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
+virtio_clear_feature(&features, VIRTIO_BLK_F_WCE);
+/* we're still missing ANY_LAYOUT */
+/* virtio_add_feature(&features, VIRTIO_F_VERSION_1); */
+return features;
+}
+
 static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
@@ -821,6 +839,7 @@ static void virtio_blk_class_init(ObjectClass *klass, void 
*data)
 vdc->get_config = virtio_blk_update_config;
 vdc->set_config = virtio_blk_set_config;
 vdc->get_features = virtio_blk_get_features;
+vdc->get_features_rev = virtio_blk_get_features_rev;
 vdc->set_status = virtio_blk_set_status;
 vdc->reset = virtio_blk_reset;
 vdc->save = virtio_blk_save_device;
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 13/19] virtio: allow to fail setting status

2014-12-02 Thread Cornelia Huck

virtio-1 allow setting of the FEATURES_OK status bit to fail if
the negotiated feature bits are inconsistent: let's fail
virtio_set_status() in that case and update virtio-ccw to post an
error to the guest.

Signed-off-by: Cornelia Huck 
---
 hw/s390x/virtio-ccw.c  |   20 
 hw/virtio/virtio.c |   24 +++-
 include/hw/virtio/virtio.h |3 ++-
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 75c9ff9..ec492b8 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -555,15 +555,19 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
 virtio_ccw_stop_ioeventfd(dev);
 }
-virtio_set_status(vdev, status);
-if (vdev->status == 0) {
-virtio_reset(vdev);
-}
-if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
-virtio_ccw_start_ioeventfd(dev);
+if (virtio_set_status(vdev, status) == 0) {
+if (vdev->status == 0) {
+virtio_reset(vdev);
+}
+if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+virtio_ccw_start_ioeventfd(dev);
+}
+sch->curr_status.scsw.count = ccw.count - sizeof(status);
+ret = 0;
+} else {
+/* Trigger a command reject. */
+ret = -ENOSYS;
 }
-sch->curr_status.scsw.count = ccw.count - sizeof(status);
-ret = 0;
 }
 break;
 case CCW_CMD_SET_IND:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 4f2dc48..be128f7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -548,15 +548,37 @@ void virtio_update_irq(VirtIODevice *vdev)
 virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
 }
 
-void virtio_set_status(VirtIODevice *vdev, uint8_t val)
+static int virtio_validate_features(VirtIODevice *vdev)
+{
+VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
+
+if (k->validate_features) {
+return k->validate_features(vdev);
+} else {
+return 0;
+}
+}
+
+int virtio_set_status(VirtIODevice *vdev, uint8_t val)
 {
 VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
 trace_virtio_set_status(vdev, val);
 
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+if (!(vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) &&
+val & VIRTIO_CONFIG_S_FEATURES_OK) {
+int ret = virtio_validate_features(vdev);
+
+if (ret) {
+return ret;
+}
+}
+}
 if (k->set_status) {
 k->set_status(vdev, val);
 }
 vdev->status = val;
+return 0;
 }
 
 bool target_words_bigendian(void);
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 9a984c2..e7bedd1 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -149,6 +149,7 @@ typedef struct VirtioDeviceClass {
 uint64_t (*get_features)(VirtIODevice *vdev, uint64_t requested_features);
 uint64_t (*bad_features)(VirtIODevice *vdev);
 void (*set_features)(VirtIODevice *vdev, uint64_t val);
+int (*validate_features)(VirtIODevice *vdev);
 void (*get_config)(VirtIODevice *vdev, uint8_t *config);
 void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
 void (*reset)(VirtIODevice *vdev);
@@ -232,7 +233,7 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int 
align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
 void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
-void virtio_set_status(VirtIODevice *vdev, uint8_t val);
+int virtio_set_status(VirtIODevice *vdev, uint8_t val);
 void virtio_reset(void *opaque);
 void virtio_update_irq(VirtIODevice *vdev);
 int virtio_set_features(VirtIODevice *vdev, uint64_t val);
-- 
1.7.9.5

[Qemu-devel] [PATCH RFC v5 15/19] virtio-net: no writeable mac for virtio-1

2014-12-02 Thread Cornelia Huck

Devices operating as virtio 1.0 may not allow writes to the mac
address in config space.

Signed-off-by: Cornelia Huck 
---
 hw/net/virtio-net.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index d6d1b98..ebbea60 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -87,6 +87,7 @@ static void virtio_net_set_config(VirtIODevice *vdev, const 
uint8_t *config)
 memcpy(&netcfg, config, n->config_size);
 
 if (!virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR) &&
+!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
 memcmp(netcfg.mac, n->mac, ETH_ALEN)) {
 memcpy(n->mac, netcfg.mac, ETH_ALEN);
 qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH v2 5/7] coroutine: rewrite pool to avoid mutex

2014-12-02 Thread Kevin Wolf

Am 02.12.2014 um 13:13 hat Paolo Bonzini geschrieben:
> 
> 
> On 02/12/2014 13:09, Peter Lieven wrote:
> >>
> >> -static void __attribute__((destructor)) coroutine_pool_cleanup(void)
> >> -{
> >> -Coroutine *co;
> >> -Coroutine *tmp;
> >> -
> >> -QSLIST_FOREACH_SAFE(co, &pool, pool_next, tmp) {
> >> -QSLIST_REMOVE_HEAD(&pool, pool_next);
> >> -qemu_coroutine_delete(co);
> >> -}
> >> -
> >> -qemu_mutex_destroy(&pool_lock);
> >> -}
> >> -
> > 
> > I still feel we should leave this destructor in to clean up the
> > release_pool.
> 
> Why?  If you run QEMU under valgrind, there are thousands of blocks that
> we do not free.  Stefan/Kevin, what do you think?

The destructor doesn't seem to be doing anything but freeing memory,
which the OS can indeed do for us. I don't mind either way.

Kevin

Re: [Qemu-devel] QEMU Advent Calendar has begun - Day 1

2014-12-02 Thread Kashyap Chamarthy

On Mon, Dec 01, 2014 at 04:51:58PM +, Stefan Hajnoczi wrote:
> Today is the first day of QEMU Advent Calendar 2014, where an
> interesting and fun QEMU disk image is published every day.
> 
> http://www.qemu-advent-calendar.org/
> 
> I won't spam the mailing list every day but I wanted to let you know
> that from now until Christmas we will publish a daily image for your
> enjoyment.
> 
> The first image was contributed by Gerd Hoffmann and showcases an
> amazing Slackware Linux 1.0 distribution with a pre-1.0 Linux kernel
> from 1993!

Nice!

Just a small note: I added the below options to the 'Run' script's QEMU
invocation, so I can get a nice shell right away :-)

-nographic -nodefconfig -nodefaults

$ ./Run 
+ exec qemu-system-x86_64 -enable-kvm -m 16M -drive
if=ide,format=qcow2,file=slackware.qcow2 -netdev user,id=slirp -device
ne2k_isa,netdev=slirp -serial stdio -nographic -nodefconfig
-nodefaults

Welcome to Linux 0.99pl12.

slack login: root
Linux 0.99pl12. (Posix).
No mail.
slack:/# 


Thanks for doing this!
 
-- 
/kashyap

Re: [Qemu-devel] [PATCH v3] qtest/bios-tables: Add DMAR aml file and enable intel_iommu for q35

2014-12-02 Thread Marcel Apfelbaum

On Tue, 2014-12-02 at 13:26 +0100, Vasilis Liaskovitis wrote:
> We generate the expected DMAR table and enable intel_iommu on q35 to test the
> table.
> 
> Signed-off-by: Vasilis Liaskovitis 
> ---
>  tests/acpi-test-data/q35/DMAR | Bin 0 -> 64 bytes
>  tests/bios-tables-test.c  |   2 +-
>  2 files changed, 1 insertion(+), 1 deletion(-)
>  create mode 100644 tests/acpi-test-data/q35/DMAR
> 
> diff --git a/tests/acpi-test-data/q35/DMAR b/tests/acpi-test-data/q35/DMAR
> new file mode 100644
> index 
> ..6def4553381f4e48b80ead11af2adf6ce09c8c7e
> GIT binary patch
> literal 64
> ycmZ?qbqsP~U|?XZbMklg2v%^42yk`*iZKGkKx`1L2E+&;zyK0sV7U1YL;?U|dI 
> literal 0
> HcmV?d1
> 
> diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
> index 9e4d205..6975d49 100644
> --- a/tests/bios-tables-test.c
> +++ b/tests/bios-tables-test.c
> @@ -779,7 +779,7 @@ static void test_acpi_tcg(void)
>  
>  memset(&data, 0, sizeof(data));
>  data.machine = MACHINE_Q35;
> -test_acpi_one("-machine q35,accel=tcg", &data);
> +test_acpi_one("-machine q35,accel=tcg,iommu=on", &data);
>  free_test_data(&data);
>  }
>  

Reviewed-by: Marcel Apfelbaum

[Qemu-devel] [RFC PATCH 1/3] block: add accounting for merged requests

2014-12-02 Thread Peter Lieven

Signed-off-by: Peter Lieven 
---
 block.c|2 ++
 block/accounting.c |7 +++
 block/qapi.c   |2 ++
 hmp.c  |6 +-
 include/block/accounting.h |3 +++
 qapi/block-core.json   |   10 +-
 6 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index a612594..75450e3 100644
--- a/block.c
+++ b/block.c
@@ -4508,6 +4508,8 @@ static int multiwrite_merge(BlockDriverState *bs, 
BlockRequest *reqs,
 }
 }
 
+block_acct_merge_done(&bs->stats, BLOCK_ACCT_WRITE, num_reqs - outidx - 1);
+
 return outidx + 1;
 }
 
diff --git a/block/accounting.c b/block/accounting.c
index edbb1cc..d1afcd9 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -52,3 +52,10 @@ void block_acct_highest_sector(BlockAcctStats *stats, 
int64_t sector_num,
 stats->wr_highest_sector = sector_num + nb_sectors - 1;
 }
 }
+
+void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
+  int num_requests)
+{
+assert(type < BLOCK_MAX_IOTYPE);
+stats->merged[type] += num_requests;
+}
diff --git a/block/qapi.c b/block/qapi.c
index a87a34a..ec28240 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -316,6 +316,8 @@ static BlockStats *bdrv_query_stats(const BlockDriverState 
*bs)
 s->stats->wr_bytes = bs->stats.nr_bytes[BLOCK_ACCT_WRITE];
 s->stats->rd_operations = bs->stats.nr_ops[BLOCK_ACCT_READ];
 s->stats->wr_operations = bs->stats.nr_ops[BLOCK_ACCT_WRITE];
+s->stats->rd_merged = bs->stats.merged[BLOCK_ACCT_READ];
+s->stats->wr_merged = bs->stats.merged[BLOCK_ACCT_WRITE];
 s->stats->wr_highest_offset =
 bs->stats.wr_highest_sector * BDRV_SECTOR_SIZE;
 s->stats->flush_operations = bs->stats.nr_ops[BLOCK_ACCT_FLUSH];
diff --git a/hmp.c b/hmp.c
index 63d7686..baaa9e5 100644
--- a/hmp.c
+++ b/hmp.c
@@ -419,6 +419,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
" wr_total_time_ns=%" PRId64
" rd_total_time_ns=%" PRId64
" flush_total_time_ns=%" PRId64
+   " rd_merged=%" PRId64
+   " wr_merged=%" PRId64
"\n",
stats->value->stats->rd_bytes,
stats->value->stats->wr_bytes,
@@ -427,7 +429,9 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
stats->value->stats->flush_operations,
stats->value->stats->wr_total_time_ns,
stats->value->stats->rd_total_time_ns,
-   stats->value->stats->flush_total_time_ns);
+   stats->value->stats->flush_total_time_ns,
+   stats->value->stats->rd_merged,
+   stats->value->stats->wr_merged);
 }
 
 qapi_free_BlockStatsList(stats_list);
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 50b42b3..4c406cf 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -39,6 +39,7 @@ typedef struct BlockAcctStats {
 uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
 uint64_t nr_ops[BLOCK_MAX_IOTYPE];
 uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
+uint64_t merged[BLOCK_MAX_IOTYPE];
 uint64_t wr_highest_sector;
 } BlockAcctStats;
 
@@ -53,5 +54,7 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie 
*cookie,
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie);
 void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num,
unsigned int nb_sectors);
+void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
+   int num_requests);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a14e6ab..b64ffb6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -389,13 +389,21 @@
 # growable sparse files (like qcow2) that are used on top
 # of a physical device.
 #
+# @rd_merged: Number of read requests that have been merged into another 
request
+# (Since 2.3).
+#
+# @wr_merged: Number of write requests that have been merged into another 
request
+# (Since 2.3).
+#
+#
 # Since: 0.14.0
 ##
 { 'type': 'BlockDeviceStats',
   'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'rd_operations': 'int',
'wr_operations': 'int', 'flush_operations': 'int',
'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
-   'rd_total_time_ns': 'int', 'wr_highest_offset': 'int' } }
+   'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
+   'rd_merged': 'int', 'wr_merged': 'int' } }
 
 ##
 # @BlockStats:
-- 
1.7.9.5

[Qemu-devel] [RFC PATCH 3/3] virtio-blk: introduce multiread

2014-12-02 Thread Peter Lieven

this patch finally introduce multiread support to virtio-blk while
multiwrite support was there for a long time read support was missing.

To achieve this the patch does serveral things which might need futher
explaination:

 - the whole merge and multireq logic is moved from block.c into
   virtio-blk. This is move is a preparation for directly creating a
   coroutine out of virtio-blk.

 - requests are only merged if they are strictly sequential and no
   longer sorted. This simplification decreases overhead and reduces
   latency. It will also merge some requests which were unmergable before.

   The old algorithm took up to 32 requests sorted them and tried to merge
   them. The outcome was anything between 1 and 32 requests. In case of
   32 requests there were 31 requests unnecessarily delayed.

   On the other hand lets imagine e.g. 16 unmergeable requests followed
   by 32 mergable requests. The latter 32 requests would have been split
   into two 16 byte requests.

   Last the simplified logic allows for a fast path if we have only a
   single request in the multirequest. In this case the request is sent as
   ordinary request without mulltireq callbacks.

As a first benchmark I installed Ubuntu 14.04.1 on a ramdisk. The number of
merged requests is in the same order while the latency is slightly decreased.
One should not stick to much to the numbers because the number of wr_requests
are highly fluctuant. I hope the numbers show that this patch is at least
not causing to big harm:

Cmdline:
qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom 
ubuntu-14.04.1-server-amd64.iso \
 -drive if=virtio,file=/tmp/ubuntu.raw -monitor stdio

Before:
virtio0: rd_bytes=150979072 wr_bytes=2591138816 rd_operations=18475 
wr_operations=69216
 flush_operations=15343 wr_total_time_ns=26969283701 
rd_total_time_ns=1018449432
 flush_total_time_ns=562596819 rd_merged=0 wr_merged=10453

After:
virtio0: rd_bytes=148112896 wr_bytes=2845834240 rd_operations=17535 
wr_operations=72197
 flush_operations=15760 wr_total_time_ns=26104971623 
rd_total_time_ns=1012926283
 flush_total_time_ns=564414752 rd_merged=517 wr_merged=9859

Some first numbers of improved read performance while booting:

The Ubuntu 14.04.1 vServer from above:
virtio0: rd_bytes=86158336 wr_bytes=688128 rd_operations=5851 wr_operations=70
 flush_operations=4 wr_total_time_ns=10886705 rd_total_time_ns=416688595
 flush_total_time_ns=288776 rd_merged=1297 wr_merged=2

Windows 2012R2:
virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 
wr_operations=360
 flush_operations=68 wr_total_time_ns=34344992718 
rd_total_time_ns=134386844669
 flush_total_time_ns=18115517 rd_merged=641 wr_merged=216

Signed-off-by: Peter Lieven 
---
 hw/block/dataplane/virtio-blk.c |   10 +-
 hw/block/virtio-blk.c   |  222 ---
 include/hw/virtio/virtio-blk.h  |   23 ++--
 3 files changed, 156 insertions(+), 99 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 1222a37..aa4ad91 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -96,9 +96,8 @@ static void handle_notify(EventNotifier *e)
 event_notifier_test_and_clear(&s->host_notifier);
 blk_io_plug(s->conf->conf.blk);
 for (;;) {
-MultiReqBuffer mrb = {
-.num_writes = 0,
-};
+MultiReqBuffer mrb_rd = {};
+MultiReqBuffer mrb_wr = {.is_write = 1};
 int ret;
 
 /* Disable guest->host notifies to avoid unnecessary vmexits */
@@ -117,10 +116,11 @@ static void handle_notify(EventNotifier *e)
 req->elem.in_num,
 req->elem.index);
 
-virtio_blk_handle_request(req, &mrb);
+virtio_blk_handle_request(req, &mrb_wr, &mrb_rd);
 }
 
-virtio_submit_multiwrite(s->conf->conf.blk, &mrb);
+virtio_submit_multireq(s->conf->conf.blk, &mrb_wr);
+virtio_submit_multireq(s->conf->conf.blk, &mrb_rd);
 
 if (likely(ret == -EAGAIN)) { /* vring emptied */
 /* Re-enable guest->host notifies and stop processing the vring.
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 490f961..f522bfc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -22,12 +22,15 @@
 #include "dataplane/virtio-blk.h"
 #include "migration/migration.h"
 #include "block/scsi.h"
+#include "block/block_int.h"
 #ifdef __linux__
 # include 
 #endif
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 
+/* #define DEBUG_MULTIREQ */
+
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 {
 VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
@@ -88,6 +91,11 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 
 trace_virtio_blk_rw_complete(req, ret);
 
+#ifdef DEBUG_MULTIREQ
+pri

[Qemu-devel] [RFC PATCH 0/3] virtio-blk: add multiread support

2014-12-02 Thread Peter Lieven

this series adds the long missing multiread support to virtio-blk.

some remarks:
 - i introduced rd_merged and wr_merged block accounting stats to
   blockstats as a generic interface which can be set from any
   driver that will introduce multirequst merging in the future.
 - the knob to disable request merging is not yet there. I would
   add it to the device properties also as a generic interface
   to have the same switch on for any driver that might introduce
   request merging in the future
 - there is cleanup and iotest adjustion missing.

Peter Lieven (3):
  block: add accounting for merged requests
  hw/virtio-blk: add a constant for max number of merged requests
  virtio-blk: introduce multiread

 block.c |2 +
 block/accounting.c  |7 ++
 block/qapi.c|2 +
 hmp.c   |6 +-
 hw/block/dataplane/virtio-blk.c |   10 +-
 hw/block/virtio-blk.c   |  222 ---
 include/block/accounting.h  |3 +
 include/hw/virtio/virtio-blk.h  |   21 ++--
 qapi/block-core.json|   10 +-
 9 files changed, 184 insertions(+), 99 deletions(-)

-- 
1.7.9.5

[Qemu-devel] [RFC PATCH 2/3] hw/virtio-blk: add a constant for max number of merged requests

2014-12-02 Thread Peter Lieven

As it was not obvious (at least for me) where the 32 comes from;
add a constant for it.

Signed-off-by: Peter Lieven 
---
 hw/block/virtio-blk.c  |2 +-
 include/hw/virtio/virtio-blk.h |4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b19b102..490f961 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -326,7 +326,7 @@ static void virtio_blk_handle_write(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 block_acct_start(blk_get_stats(req->dev->blk), &req->acct, req->qiov.size,
  BLOCK_ACCT_WRITE);
 
-if (mrb->num_writes == 32) {
+if (mrb->num_writes == VIRTIO_BLK_MAX_MERGE_REQS) {
 virtio_submit_multiwrite(req->dev->blk, mrb);
 }
 
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 3979dc4..3f2652f 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -134,8 +134,10 @@ typedef struct VirtIOBlock {
 struct VirtIOBlockDataPlane *dataplane;
 } VirtIOBlock;
 
+#define VIRTIO_BLK_MAX_MERGE_REQS 32
+
 typedef struct MultiReqBuffer {
-BlockRequestblkreq[32];
+BlockRequestblkreq[VIRTIO_BLK_MAX_MERGE_REQS];
 unsigned intnum_writes;
 } MultiReqBuffer;
 
-- 
1.7.9.5

Re: [Qemu-devel] [PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Michael S. Tsirkin

On Tue, Dec 02, 2014 at 02:00:15PM +0100, Cornelia Huck wrote:
> For virtio-1 devices, we allow a more complex queue layout that doesn't
> require descriptor table and rings on a physically-contigous memory area:
> add virtio_queue_set_rings() to allow transports to set this up.
> 
> Signed-off-by: Cornelia Huck 
> ---
>  hw/virtio/virtio.c |   16 
>  include/hw/virtio/virtio.h |2 ++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 8f69ffa..508dccf 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq)
>  {
>  hwaddr pa = vq->pa;
>  
> +if (pa == -1ULL) {
> +/*
> + * This is a virtio-1 style vq that has already been setup
> + * in virtio_queue_set.
> + */
> +return;
> +}
>  vq->vring.desc = pa;
>  vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
>  vq->vring.used = vring_align(vq->vring.avail +
> @@ -717,6 +724,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
>  return vdev->vq[n].pa;
>  }
>  
> +void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> +hwaddr avail, hwaddr used)
> +{
> +vdev->vq[n].pa = -1ULL;
> +vdev->vq[n].vring.desc = desc;
> +vdev->vq[n].vring.avail = avail;
> +vdev->vq[n].vring.used = used;
> +}
> +
>  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
>  {
>  /* Don't allow guest to flip queue between existent and

pa == -1ULL tricks look quite ugly.
Can't we set desc/avail/used unconditionally, and drop
the pa value?

> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 68c40db..80ee313 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -224,6 +224,8 @@ void virtio_queue_set_addr(VirtIODevice *vdev, int n, 
> hwaddr addr);
>  hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n);
>  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num);
>  int virtio_queue_get_num(VirtIODevice *vdev, int n);
> +void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> +hwaddr avail, hwaddr used);
>  void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
>  void virtio_queue_notify(VirtIODevice *vdev, int n);
>  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
> -- 
> 1.7.9.5

Re: [Qemu-devel] [PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Cornelia Huck

On Tue, 2 Dec 2014 16:46:28 +0200
"Michael S. Tsirkin"  wrote:

> On Tue, Dec 02, 2014 at 02:00:15PM +0100, Cornelia Huck wrote:
> > For virtio-1 devices, we allow a more complex queue layout that doesn't
> > require descriptor table and rings on a physically-contigous memory area:
> > add virtio_queue_set_rings() to allow transports to set this up.
> > 
> > Signed-off-by: Cornelia Huck 
> > ---
> >  hw/virtio/virtio.c |   16 
> >  include/hw/virtio/virtio.h |2 ++
> >  2 files changed, 18 insertions(+)
> > 
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 8f69ffa..508dccf 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -96,6 +96,13 @@ static void virtqueue_init(VirtQueue *vq)
> >  {
> >  hwaddr pa = vq->pa;
> >  
> > +if (pa == -1ULL) {
> > +/*
> > + * This is a virtio-1 style vq that has already been setup
> > + * in virtio_queue_set.
> > + */
> > +return;
> > +}
> >  vq->vring.desc = pa;
> >  vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
> >  vq->vring.used = vring_align(vq->vring.avail +
> > @@ -717,6 +724,15 @@ hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
> >  return vdev->vq[n].pa;
> >  }
> >  
> > +void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
> > +hwaddr avail, hwaddr used)
> > +{
> > +vdev->vq[n].pa = -1ULL;
> > +vdev->vq[n].vring.desc = desc;
> > +vdev->vq[n].vring.avail = avail;
> > +vdev->vq[n].vring.used = used;
> > +}
> > +
> >  void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
> >  {
> >  /* Don't allow guest to flip queue between existent and
> 
> pa == -1ULL tricks look quite ugly.
> Can't we set desc/avail/used unconditionally, and drop
> the pa value?

And have virtio_queue_get_addr() return desc? Let me see if I can come
up with a patch.

Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds threshold

2014-12-02 Thread Stefan Hajnoczi

On Fri, Nov 28, 2014 at 01:31:07PM +0100, Francesco Romani wrote:
> @@ -82,6 +83,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState 
> *bs)
>  info->iops_size = cfg.op_size;
>  }
>  
> +info->write_threshold = bdrv_usage_threshold_get(bs);

Overall looks good but I notice that "write_threshold" and
"usage_threshold" are both used.  Please use just one consistently (I
think "write_threshold" is clearer).

pgpoo5lthMtKa.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] nvme: 64kB page size fixes

2014-12-02 Thread Stefan Hajnoczi

On Thu, Nov 27, 2014 at 02:39:21PM +1100, Anton Blanchard wrote:
> Initialise our maximum page size capability to 64kB and increase
> the page_size variable from 16 to 32 bits.
> 
> Signed-off-by: Anton Blanchard 
> --

Covered in NVM Express 1.1 specification "3.1.1 Offset 00h: CAP -
Controller Capabilities".
http://www.nvmexpress.org/wp-content/uploads/NVM-Express-1_1.pdf

Reviewed-by: Stefan Hajnoczi 

pgpJUzhjs9gXJ.pgp
Description: PGP signature

Re: [Qemu-devel] [Bug 1396052] [NEW] migration failed when running BurnInTest in guest

2014-12-02 Thread Stefan Hajnoczi

On Thu, Nov 27, 2014 at 02:50:59PM +0530, Amit Shah wrote:
> On (Tue) 25 Nov 2014 [09:32:58], z08687 wrote:
> > Public bug reported:
> > 
> > Hi,  
> > I found a live migration problem and have found out the reason, but I can't 
> > fix it up myself. I really need help.
> > When live migration vm and it's block device in save time, it will occur 
> > probabilistic .
> > 
> > Step:
> > 1.  start a windows vm,and run burnInTest(it will write dirty data to block 
> > device in migration)
> > 2.  migrate vm with it's block device.
> > 3.  a few minutes later,  dest vm was killed and migration will be failed 
> > (probabilistic )
> 
> That's a nice test case, as well as good analysis.

This is a known issue:
https://lists.gnu.org/archive/html/qemu-devel/2014-09/msg02959.html

Please try this patch:

  commit 7ea2d269cb84ca7a2f4b7c3735634176f7c1dc35
  Author: Alexey Kardashevskiy 
  Date:   Thu Oct 9 13:50:46 2014 +1100

block/migration: Disable cache invalidate for incoming migration

Stefan


pgppc79oaT36M.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds threshold

2014-12-02 Thread Francesco Romani



- Original Message -
> From: "Stefan Hajnoczi" 
> To: "Francesco Romani" 
> Cc: kw...@redhat.com, mdr...@linux.vnet.ibm.com, qemu-devel@nongnu.org, 
> lcapitul...@redhat.com
> Sent: Tuesday, December 2, 2014 4:01:17 PM
> Subject: Re: [Qemu-devel] [PATCHv3] block: add event when disk usage exceeds  
> threshold
> 
> On Fri, Nov 28, 2014 at 01:31:07PM +0100, Francesco Romani wrote:
> > @@ -82,6 +83,8 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState
> > *bs)
> >  info->iops_size = cfg.op_size;
> >  }
> >  
> > +info->write_threshold = bdrv_usage_threshold_get(bs);
> 
> Overall looks good but I notice that "write_threshold" and
> "usage_threshold" are both used.  Please use just one consistently (I
> think "write_threshold" is clearer).

Agreed. Will use "write_threshold" everywhere, file names included.

Bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani

[Qemu-devel] Vhost-user - multi queue support

2014-12-02 Thread Long, Thomas

Hi All,
I'm just wondering what the status is with regards to supporting multi-queue in 
Vhost-user?

I see that Nikolaev has developed a patch to support this feature:
https://github.com/SnabbCo/qemu/commit/f41eeccf4ab6ea5970e2941ce2de0aae893b10f9

Are there any current plans to pull either this patch or a different patch into 
QEMU ?

Regards,
Tommy

[Qemu-devel] [PATCH] target-i386: Intel xsaves

2014-12-02 Thread Wanpeng Li

Add xsaves related definition, it also add corresponding part to 
kvm_get/put, and vmstate.

Signed-off-by: Wanpeng Li 
---
 target-i386/cpu.h |  2 ++
 target-i386/kvm.c | 15 +++
 target-i386/machine.c |  3 ++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 015f5b5..cff7433 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -389,6 +389,7 @@
 #define MSR_VM_HSAVE_PA 0xc0010117
 
 #define MSR_IA32_BNDCFGS0x0d90
+#define MSR_IA32_XSS0x0da0
 
 #define XSTATE_FP   (1ULL << 0)
 #define XSTATE_SSE  (1ULL << 1)
@@ -1019,6 +1020,7 @@ typedef struct CPUX86State {
 uint64_t xstate_bv;
 
 uint64_t xcr0;
+uint64_t xss;
 
 TPRAccess tpr_access_type;
 } CPUX86State;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ccf36e8..c6fc417 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -80,6 +80,7 @@ static bool has_msr_hv_hypercall;
 static bool has_msr_hv_vapic;
 static bool has_msr_hv_tsc;
 static bool has_msr_mtrr;
+static bool has_msr_xss;
 
 static bool has_msr_architectural_pmu;
 static uint32_t num_architectural_pmu_counters;
@@ -826,6 +827,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_bndcfgs = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == MSR_IA32_XSS) {
+has_msr_xss = true;
+continue;
+}
 }
 }
 
@@ -1224,6 +1229,9 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (has_msr_bndcfgs) {
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_BNDCFGS, env->msr_bndcfgs);
 }
+if (has_msr_xss) {
+kvm_msr_entry_set(&msrs[n++], MSR_IA32_XSS, env->xss);
+}
 #ifdef TARGET_X86_64
 if (lm_capable_kernel) {
 kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -1570,6 +1578,10 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (has_msr_bndcfgs) {
 msrs[n++].index = MSR_IA32_BNDCFGS;
 }
+if (has_msr_xss) {
+msrs[n++].index = MSR_IA32_XSS;
+}
+
 
 if (!env->tsc_valid) {
 msrs[n++].index = MSR_IA32_TSC;
@@ -1717,6 +1729,9 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_BNDCFGS:
 env->msr_bndcfgs = msrs[i].data;
 break;
+case MSR_IA32_XSS:
+env->xss = msrs[i].data;
+break;
 default:
 if (msrs[i].index >= MSR_MC0_CTL &&
 msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 1c13b14..43af33f 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -689,7 +689,7 @@ static const VMStateDescription vmstate_avx512 = {
 
 VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
-.version_id = 12,
+.version_id = 13,
 .minimum_version_id = 3,
 .pre_save = cpu_pre_save,
 .post_load = cpu_post_load,
@@ -786,6 +786,7 @@ VMStateDescription vmstate_x86_cpu = {
 VMSTATE_UINT64_V(env.xcr0, X86CPU, 12),
 VMSTATE_UINT64_V(env.xstate_bv, X86CPU, 12),
 VMSTATE_YMMH_REGS_VARS(env.ymmh_regs, X86CPU, CPU_NB_REGS, 12),
+VMSTATE_UINT64_V(env.xss, X86CPU, 13),
 VMSTATE_END_OF_LIST()
 /* The above list is not sorted /wrt version numbers, watch out! */
 },
-- 
1.9.1

Re: [Qemu-devel] Update on TCG Multithreading

2014-12-02 Thread Kirill Batuzov

On Mon, 1 Dec 2014, Mark Burton wrote:
> 
> One issue I’d like to see more opinions on is the question of a cache per 
> core, or a shared cache.
> I have heard anecdotal evidence that a shared cache gives a major performance 
> benefit….
> Does anybody have anything more concrete?

There is a theoretical and experimental comparison of these approaches in
PQEMU article (you've cited it on wiki page). Only the authors call them
differently: they call cache-per-core "Separate Code Cache" (SCC) and
they call shared cache "Unified Code Cache" (UCC).

-- 
Kirill

Re: [Qemu-devel] [PATCH] nvme: 64kB page size fixes

2014-12-02 Thread Kevin Wolf

Am 02.12.2014 um 16:04 hat Stefan Hajnoczi geschrieben:
> On Thu, Nov 27, 2014 at 02:39:21PM +1100, Anton Blanchard wrote:
> > Initialise our maximum page size capability to 64kB and increase
> > the page_size variable from 16 to 32 bits.
> > 
> > Signed-off-by: Anton Blanchard 
> > --
> 
> Covered in NVM Express 1.1 specification "3.1.1 Offset 00h: CAP -
> Controller Capabilities".
> http://www.nvmexpress.org/wp-content/uploads/NVM-Express-1_1.pdf
> 
> Reviewed-by: Stefan Hajnoczi 

Thanks, applied to block-next.

Kevin


pgpZQ0LLHasPP.pgp
Description: PGP signature

Re: [Qemu-devel] AMD video card passthrough reset issues

2014-12-02 Thread Alex Williamson

On Tue, 2014-12-02 at 13:53 +0100, Lucio Andrés Illanes Albornoz wrote:
> Hello,
> 
>   I'm doing secondary VGA passthrough with an AMD Radeon R7 260X using
> QEMU v2.1.2 w/ KVM and VFIO on Debian v7.7 (wheezy) (qemu v2.1
> +dfsg-5~bpo70+1 from wheezy-backports) and kernel version 3.16.5 (from
> wheezy-backports as well) and Windows 8.1 Update 1 (x64) as the guest
> OS.
> 
> At present, rebooting the VM reproducibly has Windows fail to
> enable/start said video card upon bootup w/ an error code of 43, as
> seems to be the case w/ mostly everyone else running a comparable
> configuration; disabling/ejecting it before rebooting/powering down
> the VM from within the guest, as with everyone else, has proven to be
> a reliable mitigation. However, being that there are scenarios where
> this is either not feasible or impossible altogether, short of if done
> through a service or kernel-mode driver (and even then,) I had
> intended to investigate the causes behind this issue.
> 
> Unfortunately, the flu got to me first (so to speak.) I did notice
> that simply removing the PCI device in question and then causing a PCI
> bus (re)scan (both) through sysfs on the host in between VM
> reboots/power cycles is effectively equivalent to disabling it within
> the guest. Thus, I find myself wondering precisely what it is that
> does take place when doing so vs. when QEMU performs a `hot reset'
> through the corresponding interface in drivers/vfio/pci/; evidently,
> the difference must be of sufficient importance since the latter
> mechanism ends up leaving my video card unavailable for subsequent VM
> operation until the next host reboot.
> 
> I should very much appreciate any hints concerning whether it would be
> possible to have QEMU/VFIO perform whatever need be done itself or if
> it should be possible to have this be done by either itself.

All of the Bonaire-based AMD GPUs seems to have issues with reset
(R7790, R7 260/X).  I've tried to engage AMD on this, but haven't gotten
any response on this topic yet.  For devices like this that don't
support any kind of function level reset (FLR), VFIO will try to do a
PCI bus reset on guest reboot.  This is as close as we can get to how
the BIOS resets the device on a host reboot.  Unfortunately on these
cards there seems to be some sort of disconnect between the PCI bus
interface reset and resetting the rest of the GPU.  I believe I've even
seen cases where a PCI bus reset appears to have no affect on the GPU
when running in VGA mode.  My best guess is that some firmware running
in the card isn't clearing itself on reset an attempting to reload it
causes errors.  Note that a guest can be reset multiple times and the
device continues to work if the guest is restricted to standard VGA
drivers (in VGA passthrough mode of course).

In your experiment with removing and rescanning the device, are you
simply doing 'echo 1 > remove; echo 1 > /sys/bus/pci/rescan'?  Thanks,

Alex

Re: [Qemu-devel] [PATCH RFC v5 07/19] virtio: allow virtio-1 queue layout

2014-12-02 Thread Cornelia Huck

On Tue, 2 Dec 2014 15:54:44 +0100
Cornelia Huck  wrote:

> On Tue, 2 Dec 2014 16:46:28 +0200
> "Michael S. Tsirkin"  wrote:

> > pa == -1ULL tricks look quite ugly.
> > Can't we set desc/avail/used unconditionally, and drop
> > the pa value?
> 
> And have virtio_queue_get_addr() return desc? Let me see if I can come
> up with a patch.

I came up with the following (untested) patch, which should hopefully
suit mmio as well. I haven't cared about migration yet.

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8f69ffa..ac3c615 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -69,7 +69,6 @@ typedef struct VRing
 struct VirtQueue
 {
 VRing vring;
-hwaddr pa;
 uint16_t last_avail_idx;
 /* Last used index value we have signalled on */
 uint16_t signalled_used;
@@ -92,12 +91,13 @@ struct VirtQueue
 };
 
 /* virt queue functions */
-static void virtqueue_init(VirtQueue *vq)
+static void virtqueue_update_rings(VirtQueue *vq)
 {
-hwaddr pa = vq->pa;
-
-vq->vring.desc = pa;
-vq->vring.avail = pa + vq->vring.num * sizeof(VRingDesc);
+if (!vq->vring.desc) {
+/* not yet setup -> nothing to do */
+return;
+}
+vq->vring.avail = vq->vring.desc + vq->vring.num * sizeof(VRingDesc);
 vq->vring.used = vring_align(vq->vring.avail +
  offsetof(VRingAvail, ring[vq->vring.num]),
  vq->vring.align);
@@ -605,7 +605,6 @@ void virtio_reset(void *opaque)
 vdev->vq[i].vring.avail = 0;
 vdev->vq[i].vring.used = 0;
 vdev->vq[i].last_avail_idx = 0;
-vdev->vq[i].pa = 0;
 vdev->vq[i].vector = VIRTIO_NO_VECTOR;
 vdev->vq[i].signalled_used = 0;
 vdev->vq[i].signalled_used_valid = false;
@@ -708,17 +707,34 @@ void virtio_config_writel(VirtIODevice *vdev, uint32_t 
addr, uint32_t data)
 
 void virtio_queue_set_addr(VirtIODevice *vdev, int n, hwaddr addr)
 {
-vdev->vq[n].pa = addr;
-virtqueue_init(&vdev->vq[n]);
+vdev->vq[n].vring.desc = addr;
+virtqueue_update_rings(&vdev->vq[n]);
 }
 
 hwaddr virtio_queue_get_addr(VirtIODevice *vdev, int n)
 {
-return vdev->vq[n].pa;
+return vdev->vq[n].vring.desc;
+}
+
+void virtio_queue_set_rings(VirtIODevice *vdev, int n, hwaddr desc,
+hwaddr avail, hwaddr used)
+{
+vdev->vq[n].vring.desc = desc;
+vdev->vq[n].vring.avail = avail;
+vdev->vq[n].vring.used = used;
 }
 
 void virtio_queue_set_num(VirtIODevice *vdev, int n, int num)
 {
+/*
+ * For virtio-1 devices, the number of buffers may only be
+ * updated if the ring addresses have not yet been set up.
+ */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
+vdev->vq[n].vring.desc) {
+error_report("tried to modify buffer num for virtio-1 device");
+return;
+}
 /* Don't allow guest to flip queue between existent and
  * nonexistent states, or to set it to an invalid size.
  */
@@ -728,7 +744,7 @@ void virtio_queue_set_num(VirtIODevice *vdev, int n, int 
num)
 return;
 }
 vdev->vq[n].vring.num = num;
-virtqueue_init(&vdev->vq[n]);
+virtqueue_update_rings(&vdev->vq[n]);
 }
 
 int virtio_queue_get_num(VirtIODevice *vdev, int n)
@@ -748,6 +764,11 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int 
align)
 BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 
+/* virtio-1 compliant devices cannot change the aligment */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+error_report("tried to modify queue alignment for virtio-1 device");
+return;
+}
 /* Check that the transport told us it was going to do this
  * (so a buggy transport will immediately assert rather than
  * silently failing to migrate this state)
@@ -755,7 +776,7 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int 
align)
 assert(k->has_variable_vring_alignment);
 
 vdev->vq[n].vring.align = align;
-virtqueue_init(&vdev->vq[n]);
+virtqueue_update_rings(&vdev->vq[n]);
 }
 
 void virtio_queue_notify_vq(VirtQueue *vq)
@@ -949,7 +970,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 if (k->has_variable_vring_alignment) {
 qemu_put_be32(f, vdev->vq[i].vring.align);
 }
-qemu_put_be64(f, vdev->vq[i].pa);
+/* XXX virtio-1 devices */
+qemu_put_be64(f, vdev->vq[i].vring.desc);
 qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
 if (k->save_queue) {
 k->save_queue(qbus->parent, i, f);
@@ -1044,13 +1066,14 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
version_id)
 if (k->has_variable_vring_alignment) {
 vdev->vq[i].vring.align = qemu_get_be32(f);
 }
-vdev->vq[i].pa = qemu_get_be64(f);
+vdev->vq[i].vring.desc = qemu_get_be64(f);
 qemu_get_be16s(f, &vdev->vq[i].last_avail_idx);

[Qemu-devel] checkpatch hang (was: [PATCH] target-mips/op_helper.c: Restore the order of helpers)

2014-12-02 Thread Maciej W. Rozycki

Hi Blue Swirl,

On Wed, 5 Nov 2014, Maciej W. Rozycki wrote:

> Restore the order of helpers that used to be: unary operations (generic, 
> then MIPS-specific), binary operations (generic, then MIPS-specific), 
> compare operations.  At one point FMA operations were inserted at a 
> random place in the file, disregarding the preexisting order, and later 
> on even more operations sprinkled across the file.  Revert the mess by 
> moving FMA operations to a new ternary class inserted after the binary 
> class and move the misplaced unary and binary operations to where they 
> belong.
> 
> Signed-off-by: Maciej W. Rozycki 
> ---
>  I hope there is no question about this either.  No functional change.  
> Please apply.
> 
>  NB scripts/checkpatch.pl hangs on this patch -- can I help debugging it 
> somehow?

 It was brought to my attention that there is a checkpatch maintainer to 
contact on issues with this script.  My Perl-fu is regrettably lacking (to 
the extent I have never succeeded in figuring out how to debug or trace 
scripts), but perhaps you could have a look into the patch below and see 
why it confuses the script enough that it loops forever?

 Thanks,

  Maciej

qemu-mips-op-helper-xop.diff
Index: qemu-git-trunk/target-mips/op_helper.c
===
--- qemu-git-trunk.orig/target-mips/op_helper.c 2014-11-09 17:11:16.438953903 
+
+++ qemu-git-trunk/target-mips/op_helper.c  2014-11-09 17:11:19.937862467 
+
@@ -2935,110 +2935,6 @@ FLOAT_UNOP(abs)
 FLOAT_UNOP(chs)
 #undef FLOAT_UNOP
 
-#define FLOAT_FMADDSUB(name, bits, muladd_arg)  \
-uint ## bits ## _t helper_float_ ## name (CPUMIPSState *env,\
-  uint ## bits ## _t fs,\
-  uint ## bits ## _t ft,\
-  uint ## bits ## _t fd)\
-{   \
-uint ## bits ## _t fdret;   \
-\
-fdret = float ## bits ## _muladd(fs, ft, fd, muladd_arg,\
- &env->active_fpu.fp_status);   \
-update_fcr31(env, GETPC()); \
-return fdret;   \
-}
-
-FLOAT_FMADDSUB(maddf_s, 32, 0)
-FLOAT_FMADDSUB(maddf_d, 64, 0)
-FLOAT_FMADDSUB(msubf_s, 32, float_muladd_negate_product)
-FLOAT_FMADDSUB(msubf_d, 64, float_muladd_negate_product)
-#undef FLOAT_FMADDSUB
-
-#define FLOAT_MINMAX(name, bits, minmaxfunc)\
-uint ## bits ## _t helper_float_ ## name (CPUMIPSState *env,\
-  uint ## bits ## _t fs,\
-  uint ## bits ## _t ft)\
-{   \
-uint ## bits ## _t fdret;   \
-\
-fdret = float ## bits ## _ ## minmaxfunc(fs, ft,\
-   &env->active_fpu.fp_status); \
-update_fcr31(env, GETPC()); \
-return fdret;   \
-}
-
-FLOAT_MINMAX(max_s, 32, maxnum)
-FLOAT_MINMAX(max_d, 64, maxnum)
-FLOAT_MINMAX(maxa_s, 32, maxnummag)
-FLOAT_MINMAX(maxa_d, 64, maxnummag)
-
-FLOAT_MINMAX(min_s, 32, minnum)
-FLOAT_MINMAX(min_d, 64, minnum)
-FLOAT_MINMAX(mina_s, 32, minnummag)
-FLOAT_MINMAX(mina_d, 64, minnummag)
-#undef FLOAT_MINMAX
-
-#define FLOAT_RINT(name, bits)  \
-uint ## bits ## _t helper_float_ ## name (CPUMIPSState *env,\
-  uint ## bits ## _t fs)\
-{   \
-uint ## bits ## _t fdret;   \
-\
-fdret = float ## bits ## _round_to_int(fs, &env->active_fpu.fp_status); \
-update_fcr31(env, GETPC()); \
-return fdret;   \
-}
-
-FLOAT_RINT(rint_s, 32)
-FLOAT_RINT(rint_d, 64)
-#undef FLOAT_RINT
-
-#define FLOAT_CLASS_SIGNALING_NAN  0x001
-#define FLOAT_CLASS_QUIET_NAN  0x002
-#define FLOAT_CLASS_NEGATIVE_INFINITY  0x004
-#define FLOAT_CLASS_NEGATIVE_NORMAL0x008
-#define FLOAT_CLASS_NEGATIVE_SUBNORMAL 0x010
-#define FLOAT_CLASS_NEGATIVE_ZERO  0x020
-#define FLOAT_CLASS_POSITIVE_INFINITY  0x040
-#define FLOAT_CLASS_POSITIVE_NORMAL0x080
-#define FLOAT_CLAS

Re: [Qemu-devel] [PATCH] target-arm: ARM64: Adding EL1 AARCH32 guest support for KVM.

2014-12-02 Thread Peter Maydell

On 28 November 2014 at 13:06, Pranavkumar Sawargaonkar
 wrote:
> In KVM ARM64 one can choose to run guest in 32bit mode i.e EL1 in AARCH32 
> mode.
> This patch adds qemu support for running guest EL1 in AARCH32 mode with
> virt as a machine model.

Thanks for sending this patch.

> This patch also adds a support to run Image (along with zImage) for arm32.

I'm a bit confused by this -- we already support running Images
and zImages on 32 bit. We shouldn't need any extra "is this a zImage"
detection code to handle this, I don't think.

In any case, if we do need something extra here it should probably
be in its own patch.

> One can specify about 32bit kernel Image by using -cpu host,el1_aarch32 
> argument.
>
> e.g.
> "./qemu/aarch64-softmmu/qemu-system-aarch64  -nographic -display none \
>  -serial stdio -kernel ./Image  -m 512 -M virt -cpu host,el1_aarch32 \
>  -initrd rootfs.img  -append "console=ttyAMA0 root=/dev/ram" -enable-kvm"
>
> Signed-off-by: Pranavkumar Sawargaonkar 
> ---
>  hw/arm/boot.c  | 44 
>  hw/arm/virt.c  | 30 +-
>  target-arm/cpu.c   |  5 ++--
>  target-arm/cpu.h   |  2 ++
>  target-arm/kvm64.c | 73 
> ++
>  5 files changed, 146 insertions(+), 8 deletions(-)
>
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 0014c34..da8cdc8 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -476,6 +476,32 @@ static void do_cpu_reset(void *opaque)
>  }
>  }
>
> +static int check_load_zimage(const char *filename)
> +{
> +int fd;
> +uint8_t buf[40];
> +uint32_t *p = (uint32_t *) &buf[36];
> +
> +fd = open(filename, O_RDONLY | O_BINARY);
> +if (fd < 0) {
> +perror(filename);
> +return -1;
> +}
> +
> +memset(buf, 0, sizeof(buf));
> +if (read(fd, buf, sizeof(buf)) != sizeof(buf)) {
> +close(fd);
> +return -1;
> +}
> +
> +/* Check for zImage magic number */
> +if (*p == 0x016F2818) {
> +return 1;
> +}
> +
> +   return 0;
> +}
> +
>  void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
>  {
>  CPUState *cs;
> @@ -515,15 +541,23 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
> *info)
>  return;
>  }
>
> -if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
> +if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64) &&
> +(!cpu->env.el1_aarch32)) {
>  primary_loader = bootloader_aarch64;
>  kernel_load_offset = KERNEL64_LOAD_ADDR;
>  elf_machine = EM_AARCH64;
>  } else {
> -primary_loader = bootloader;
> -kernel_load_offset = KERNEL_LOAD_ADDR;
> -elf_machine = EM_ARM;
> -}
> +if (check_load_zimage(info->kernel_filename)) {
> +primary_loader = bootloader;
> +kernel_load_offset = KERNEL_LOAD_ADDR;
> +elf_machine = EM_ARM;
> +} else {
> +primary_loader = bootloader;
> +/* Assuming we are loading Image hence aligning it to 0x8000 */
> +kernel_load_offset = KERNEL_LOAD_ADDR - 0x8000;
> +elf_machine = EM_ARM;
> +}
> +   }
>
>  info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 314e55b..64213e6 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -204,7 +204,8 @@ static void fdt_add_psci_node(const VirtBoardInfo *vbi)
>  qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
>
>  cpu_off_fn = QEMU_PSCI_0_2_FN_CPU_OFF;
> -if (arm_feature(&armcpu->env, ARM_FEATURE_AARCH64)) {
> +if (arm_feature(&armcpu->env, ARM_FEATURE_AARCH64) &&
> +(!armcpu->env.el1_aarch32)) {
>  cpu_suspend_fn = QEMU_PSCI_0_2_FN64_CPU_SUSPEND;
>  cpu_on_fn = QEMU_PSCI_0_2_FN64_CPU_ON;
>  migrate_fn = QEMU_PSCI_0_2_FN64_MIGRATE;
> @@ -527,6 +528,24 @@ static void *machvirt_dtb(const struct arm_boot_info 
> *binfo, int *fdt_size)
>  return board->fdt;
>  }
>
> +#if defined(TARGET_AARCH64) && !defined(CONFIG_USER_ONLY)
> +static void check_special_cpu_model_flags(const char *cpu_model,
> +  Object *cpuobj)
> +{
> +ARMCPU *cpu = ARM_CPU(cpuobj);
> +
> +if (!cpu) {
> +return;
> +}
> +
> +if (strcmp(cpu_model, "host,el1_aarch32") == 0) {

This looks wrong -- we should support the "32 bit EL1" flag
for all 64 bit CPU types, not just "host". It also should
not be in the virt board model, but in target-arm/ code
somewhere.

> +cpu->env.el1_aarch32 = 1;
> +} else {
> +cpu->env.el1_aarch32 = 0;
> +}
> +}
> +#endif
> +
>  static void machvirt_init(MachineState *machine)
>  {
>  qemu_irq pic[NUM_IRQS];
> @@ -540,6 +559,12 @@ static void machvirt_init(MachineState *machine)
>  cpu_model = "cortex-a15";
>  }

Re: [Qemu-devel] AMD video card passthrough reset issues

2014-12-02 Thread Lucio Andrés Illanes Albornoz

On Tue, 02 Dec 2014 08:26:20 -0700 Alex Williamson  
wrote:
> All of the Bonaire-based AMD GPUs seems to have issues with reset
> (R7790, R7 260/X).  I've tried to engage AMD on this, but haven't gotten
> any response on this topic yet.  For devices like this that don't
> support any kind of function level reset (FLR), VFIO will try to do a
> PCI bus reset on guest reboot.  This is as close as we can get to how
> the BIOS resets the device on a host reboot.  Unfortunately on these
> cards there seems to be some sort of disconnect between the PCI bus
> interface reset and resetting the rest of the GPU.  I believe I've even
> seen cases where a PCI bus reset appears to have no affect on the GPU
> when running in VGA mode.  My best guess is that some firmware running
> in the card isn't clearing itself on reset an attempting to reload it
> causes errors.  Note that a guest can be reset multiple times and the
> device continues to work if the guest is restricted to standard VGA
> drivers (in VGA passthrough mode of course).
My experience is consistent with that description; the bus reset initiated 
through the hotplug reset interface appears to leave whichever part(s) of my 
video card in a state the AMD driver is not prepared to handle upon 2nd bootup 
(e.g. first VM reboot) and thereafter, it's completely gone: endless amounts of 
IOTLB_INV_TIMEOUT and `Completion-Wait loop timed out' kernel messages and 
particularly, no VGA output at all when doing primary passthrough (which I no 
longer require since vgacon isn't too fond of that,) and possibly even hangs 
upon running lspci (8) afterwards (if I remember correctly, that is.) 

I had originally intended to have QEMU trace MMIO in general and PCI{,-E} 
bus/device traffic (as relevant) in order to establish what arcane incantations 
Windows could possibly be performing, but that only ended up showing me PCI 
configuration space read I/O and IRQ reassignments upon disabling my video 
card; WinDbg/Kd* is far too slow to facilitate tracing PCI{,-E} traffic through 
breakpoints and were I to possess the Windows Research Kernel source code, 
speaking completely hypothetically here, I would then unfortunately have to 
find out that QEMU w/ KVM plus AMD's drivers doesn't go along too well w/ 
Windows Server 2003. I then figured that having drivers/vfio/pci/* produce that 
information should ultimately lead me towards the solution but I can't quite 
see to that just yet; the remove/rescan dance is the only thing that, 
pragmatically speaking, actually works for me at present.

> In your experiment with removing and rescanning the device, are you
> simply doing 'echo 1 > remove; echo 1 > /sys/bus/pci/rescan'?  Thanks,
Yes.

Re: [Qemu-devel] [PATCH v2 01/13] block: Make essential BlockDriver objects public

2014-12-02 Thread Eric Blake

On 12/02/2014 02:11 AM, Max Reitz wrote:

>>>   -static BlockDriver bdrv_qcow2 = {
>>> +BlockDriver *bdrv_qcow2 = &(BlockDriver){
>> Do we want any use of 'const', to avoid accidental manipulation of the
>> pointer and/or pointed-to contents?
> 
> Sounds good at first, but for instance qemu_opts_create() (which is
> often called with bdrv_qcow2->create_opts and the like) don't take a
> const pointer. We could fix all those functions, but trying to fix the
> const-ness of the block layer sounds like really tedious work to me...
> Also, bdrv_find_format() returns a non-const pointer so it's at least
> not more broken than it was before.

Fair enough - no need to hold up this patch on what would turn into a
much larger task of chasing const-correctness.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] tests: Fix bios-tables-test -Werror compilation error

2014-12-02 Thread Maciej W. Rozycki

Fix:

.../tests/bios-tables-test.c: In function 'main':
.../tests/bios-tables-test.c:796:11: error: ignoring return value of 'fwrite', 
declared with attribute warn_unused_result [-Werror=unused-result]
cc1: all warnings being treated as errors
make: *** [tests/bios-tables-test.o] Error 1

happening when building QEMU with the `--enable-werror' configuration 
option present.  Report any failure from `fwrite'.

Signed-off-by: Maciej W. Rozycki 
---
qemu-test-bios-tables-fwrite.diff
Index: qemu-git-trunk/tests/bios-tables-test.c
===
--- qemu-git-trunk.orig/tests/bios-tables-test.c2014-11-21 
11:56:04.0 +
+++ qemu-git-trunk/tests/bios-tables-test.c 2014-12-02 15:52:27.868971680 
+
@@ -787,14 +787,21 @@ int main(int argc, char *argv[])
 {
 const char *arch = qtest_get_arch();
 FILE *f = fopen(disk, "w");
+size_t s;
+int err;
 int ret;
 
 if (!f) {
 fprintf(stderr, "Couldn't open \"%s\": %s", disk, strerror(errno));
 return 1;
 }
-fwrite(boot_sector, 1, sizeof boot_sector, f);
+s = fwrite(boot_sector, 1, sizeof boot_sector, f);
+err = errno;
 fclose(f);
+if (s != sizeof boot_sector) {
+fprintf(stderr, "Couldn't write \"%s\": %s", disk, strerror(err));
+return 1;
+}
 
 g_test_init(&argc, &argv, NULL);

Re: [Qemu-devel] Vhost-user - multi queue support

2014-12-02 Thread Olivier MATZ


Hi All,

On 12/02/2014 01:24 PM, Michael S. Tsirkin wrote:

On Tue, Dec 02, 2014 at 11:42:22AM +, Long, Thomas wrote:

Hi All,

I’m just wondering what the status is with regards to supporting multi-queue in
Vhost-user?

I see that Nikolaev has developed a patch to support this feature:

https://github.com/SnabbCo/qemu/commit/f41eeccf4ab6ea5970e2941ce2de0aae893b10f9

Are there any current plans to pull either this patch or a different patch into
QEMU ?


I reviewed the patch and it looks good to me. One small comment: if
nc->info_str is set to ("vhost-user%d to %s", i, chr->label), it would
make sense to display it when the event callback:

--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -122,36 +122,39 @@ static void net_vhost_user_event(void *opaque, int 
event)

 case CHR_EVENT_OPENED:
 vhost_user_start(s);
 net_vhost_link_down(s, false);
-error_report("chardev \"%s\" went up\n", s->chr->label);
+error_report("chardev \"%s\" went up\n", s->nc.info_str);
 break;
 case CHR_EVENT_CLOSED:
 net_vhost_link_down(s, true);
 vhost_user_stop(s);
-error_report("chardev \"%s\" went down\n", s->chr->label);
+error_report("chardev \"%s\" went down\n", s->nc.info_str);
 break;
 }
 }


If it helps, I can submit a patch based on Nikolaev's on the list.

Regards,
Olivier

1 2 >

1 - 100 of 165 matches

Mail list logo