Re: [Qemu-devel] [PATCH] [v2 PATCH] qemu-img: sort block formats in help message

2014-05-16 Thread Mike Day
On Fri, May 16, 2014 at 9:59 AM, Laurent Desnogues
 wrote:
>
> just noticed the use of some too recent glib features: g_strcmp0
> and GSequence related ones.

There is a patch upstream that removes g_sequence_lookup, which was
not linking on one platform. I haven't seen any other reports of
breakage. g_strcmp would be trivial to replace, of course.

thanks!

Mike



Re: [Qemu-devel] [PATCH v2] qemu-img fails to delete last snapshot

2014-05-15 Thread Mike Day
On Wed, May 14, 2014 at 11:15 AM, Kevin Wolf  wrote:
>> freeing the extra clusters.
>
> Do you have an easy reproducer? Because I can't see the bug.

Thanks for the review! I was having a hard time reproducing this until
I did a bisect. This bug was fixed by 65f33bc which was merged at or
after  the time I submitted the patch:

 qcow2: Fix alloc_clusters_noref() overflow detection

I can reproduce the bug by checking out the immediate ancestor
43cbeffb1, creating a single snapshot in a qcow2 image, and then
attempting to delete that snapshot.

The error I get is:
qemu-img: Could not delete snapshot 'snapone': (Failed to remove
snapshot from snapshot list: File too large)

This is the error that is fixed by 65f33bc

Mike



Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message

2014-05-14 Thread Mike Day
On Wed, May 14, 2014 at 9:43 AM, Jeff Cody  wrote:
> Prior to a recent commit, this function did not make distinction on
> duplicates.  As of commit e855e4fb7, duplicates are not longer printed
> in the help message:
>
> e855e4fb7: Ignore duplicate or NULL format_name in bdrv_iterate_format):
>

That makes sense, thanks for the background!

Mike



Re: [Qemu-devel] [PATCH] qemu-img: sort block formats in help message

2014-05-14 Thread Mike Day
On Wed, May 14, 2014 at 9:02 AM, Stefan Hajnoczi  wrote:
> Yes, we cannot change the git commit history once a commit is in the
> public qemu.git repository.  The only options are to:
>
> 1. Add a patch on top that fixes the issue.
> 2. Use git-revert(1) to undo your commit entirely (this adds a new patch
>on top the applies the reverse diff).
>
> In this case #1 seems like a good choice.

I followed your advice and went with #1, I think its already been pulled.

Mike



Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message

2014-05-14 Thread Mike Day
On Wed, May 14, 2014 at 8:35 AM, Stefan Hajnoczi  wrote:
> Jeff Cody recently wanted to eliminate duplicate entries in the list.  I
> thought part of your intention was to address the duplicates with your
> patch.
>
> We can back out the sequence API if it's not supported on older glib but
> it would be nice to eliminate duplicates later, too.

I agree. I can submit an additional patch that uses an older API.
What, exactly is the cause of duplicate entries in the list? I've only
seen it one time but its disconcerting when it happens.

Mike



[Qemu-devel] [PATCH] Remove g_sequence_lookup from qemu-img help function

2014-05-13 Thread Mike Day
g_sequence_lookup is not supported by glib < 2.28. The usage
of g_sequence_lookup is not essential in this context (it's a
safeguard against duplicate values in the help message).
Removing the call enables the build on all platforms and
does not change the operation of the help function.

Signed-off-by: Mike Day 
---
 qemu-img.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 04ce02a..1ad899e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -70,11 +70,8 @@ static void add_format_to_seq(void *opaque, const char 
*fmt_name)
 {
 GSequence *seq = opaque;
 
-if (!g_sequence_lookup(seq, (gpointer)fmt_name,
-   compare_data, NULL)) {
-g_sequence_insert_sorted(seq, (gpointer)fmt_name,
- compare_data, NULL);
-}
+g_sequence_insert_sorted(seq, (gpointer)fmt_name,
+ compare_data, NULL);
 }
 
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
-- 
1.9.0




[Qemu-devel] [PATCH] Remove g_sequence_lookup from qemu-img help function

2014-05-13 Thread Mike Day
g_sequence_lookup is not supported by glib < 2.28. The usage
of g_sequence_lookup is not essential in this context (its a
safeguard against duplicate values in the help message).
Removing the call enables the build on all platforms and
does not change the operation of the help function.

Signed-off-by: Mike Day 
---
 qemu-img.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 04ce02a..bf5e74c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -70,11 +70,8 @@ static void add_format_to_seq(void *opaque, const char 
*fmt_name)
 {
 GSequence *seq = opaque;
 
-if (!g_sequence_lookup(seq, (gpointer)fmt_name,
-   compare_data, NULL)) {
 g_sequence_insert_sorted(seq, (gpointer)fmt_name,
  compare_data, NULL);
-}
 }
 
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
-- 
1.9.0




Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message

2014-05-13 Thread Mike Day
On Tue, May 13, 2014 at 10:18 AM, Cornelia Huck
 wrote:
>
> qemu-img.o: In function `add_format_to_seq':
> /home/cohuck/git/qemu/qemu-img.c:73: undefined reference to 
> `g_sequence_lookup'
> collect2: ld returned 1 exit status
>
> g_sequence_lookup has been added with glib 2.28, and this box has
> 2.22.5. configure looks for glib >= 2.12 (2.20 for mingw).

g_sequence_lookup is there because I was getting duplicate formats in
the format list, as in "vfat vfat vfat qcow." That was a temporary
situation. the lookup serves to guarantee each format is unique in the
list. It may not be needed.

Mike



[Qemu-devel] [PATCH] qemu-img: sort block formats in help message

2014-05-13 Thread Mike Day
The help message for qemu-img lists the supported block formats, of
which there are 27 as of version 2.0.50. The formats are printed in
the order of their driver's position in a linked list, which appears
random. This patch prints the formats in sorted order, making it
easier to read and to find a specific format in the list.

[Added suggestions from Fam Zheng  to declare variables
at the top of the scope in help() and to omit explicit cast for void*
opaque.
--Stefan]

[Removed call to g_sequence_lookup because it breaks the build on
 machines with glib < 2.28.
--Mike]

Signed-off-by: Mike Day 
Signed-off-by: Stefan Hajnoczi 
---
 qemu-img.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 96f4463..93e51d1 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -32,6 +32,7 @@
 #include "block/block_int.h"
 #include "block/qapi.h"
 #include 
+#include 
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION \
   ", Copyright (c) 2004-2008 Fabrice Bellard\n"
@@ -55,9 +56,22 @@ typedef enum OutputFormat {
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
 #define BDRV_DEFAULT_CACHE "writeback"
 
-static void format_print(void *opaque, const char *name)
+static gint compare_data(gconstpointer a, gconstpointer b, gpointer user)
 {
-printf(" %s", name);
+return g_strcmp0(a, b);
+}
+
+static void print_format(gpointer data, gpointer user)
+{
+printf(" %s", (char *)data);
+}
+
+static void add_format_to_seq(void *opaque, const char *fmt_name)
+{
+GSequence *seq = opaque;
+
+g_sequence_insert_sorted(seq, (gpointer)fmt_name,
+ compare_data, NULL);
 }
 
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
@@ -142,10 +156,15 @@ static void QEMU_NORETURN help(void)
"  '-f' first image format\n"
"  '-F' second image format\n"
"  '-s' run in Strict mode - fail on different image size or sector 
allocation\n";
+GSequence *seq;
 
 printf("%s\nSupported formats:", help_msg);
-bdrv_iterate_format(format_print, NULL);
+seq = g_sequence_new(NULL);
+bdrv_iterate_format(add_format_to_seq, seq);
+g_sequence_foreach(seq, print_format, NULL);
 printf("\n");
+g_sequence_free(seq);
+
 exit(EXIT_SUCCESS);
 }
 
-- 
1.9.0




[Qemu-devel] [PATCH v2] qemu-img fails to delete last snapshot

2014-05-12 Thread Mike Day
When deleting the last snapshot, copying the resulting snapshot table
currently fails, causing the delete operation to also fail. Fix the
failure by skipping the copy and just writing the snapshot header and
freeing the extra clusters.

There are two specific problems in the current code. First is a lack of
parenthesis in the calculation of the memmove size parameter:

s->nb_snapshots - snapshot_index - 1

When s->nb_snapshots is 0, snapshot_index is 1.

0 - 1 - 1 = 0xfffe

it should be:

0 - (1 - 1) = 0x00

The second problem is shifting the snapshot table to the left. After
removing the last snapshot there are no existing snapshots to be
shifted. All that needs to be done is to write the header and
unallocate the blocks.

Signed-off-by: Mike Day 
Reviewed-by: Eric Blake 
---
v2: improved the git log entry
added Eric Blake as a reviewer

 block/qcow2-snapshot.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0aa9def..c8b842c 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -165,9 +165,11 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 
 assert(offset <= INT_MAX);
 snapshots_size = offset;
-
 /* Allocate space for the new snapshot list */
-snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+snapshots_offset = 0;
+if (snapshots_size) {
+snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+}
 offset = snapshots_offset;
 if (offset < 0) {
 ret = offset;
@@ -180,12 +182,13 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 
 /* The snapshot list position has not yet been updated, so these clusters
  * must indeed be completely free */
-ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
-if (ret < 0) {
-goto fail;
+if (snapshots_size) {
+ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
+if (ret < 0) {
+goto fail;
+}
 }
 
-
 /* Write all snapshots to the new list */
 for(i = 0; i < s->nb_snapshots; i++) {
 sn = s->snapshots + i;
@@ -590,12 +593,14 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
 return -ENOENT;
 }
 sn = s->snapshots[snapshot_index];
-
 /* Remove it from the snapshot list */
-memmove(s->snapshots + snapshot_index,
-s->snapshots + snapshot_index + 1,
-(s->nb_snapshots - snapshot_index - 1) * sizeof(sn));
 s->nb_snapshots--;
+if (s->nb_snapshots) {
+memmove(s->snapshots + snapshot_index,
+s->snapshots + snapshot_index + 1,
+(s->nb_snapshots - (snapshot_index - 1)) * sizeof(sn));
+}
+
 ret = qcow2_write_snapshots(bs);
 if (ret < 0) {
 error_setg_errno(errp, -ret,
-- 
1.9.0




Re: [Qemu-devel] [PATCH] qemu-img fails to delete last snapshot

2014-05-12 Thread Mike Day
On Mon, May 12, 2014 at 12:39 PM, Eric Blake  wrote:
> This information is actually quite useful in understanding the patch,
> and I would have included it prior to the --- for inclusion in git,
> rather than in the reviewer-only commentary that gets stripped during
> 'git am'.
>
>>
>>  block/qcow2-snapshot.c | 25 +++--
>>  1 file changed, 15 insertions(+), 10 deletions(-)
>
> I'd suggest posting a v2 with a better commit message; but the code
> itself seems fine, so you can add:
>
> Reviewed-by: Eric Blake 

Got it, thanks!

Mike



Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.

2014-05-12 Thread Mike Day
On Mon, May 12, 2014 at 12:36 PM, Kevin Wolf  wrote:
> What would qemu-img rebase do with -o? It is just for (safely) changing
> the backing file, not for updating options. There is qemu-img amend for
> that, and it does have an -o option.

A little background ... I'm writing a 4-day  KVM training course for
the Linux foundation. From time to time I come across undocumented,
buggy, or confusing options and if I can I try to fix them. In the
case of qemu-img I looked in a lot of places for documentation and
ended up at the unofficial web page that documented the -b backing
file option which works just fine for create and rebase. If an
official page had the information I needed in one place instead of
bits and pieces scattered everywhere I would have used it, naturally.

Here's why the help is confusing. On the top level help, all the
options except for -b are documented. A couple subcommands are even
listed in the top level help with their options. I expected that -b
would be documented on the top level with the others. The fact that -b
 is documented for the rebase subcommand reenforced my
expectation. I was assuming there is some consistency in using options
with backing files. (Especially if the option is called
"backingfile.")

I'm not advocating anything here, and I think its fine to leave things
as they are. I just don't want to be the guy who complains and never
fixes anything.

Mike



Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.

2014-05-12 Thread Mike Day
On Mon, May 12, 2014 at 11:53 AM, Eric Blake  wrote:
> Ah, but it is:
>
> $ qemu-img create -f qcow2 -o help
> Supported options:
> size Virtual disk size
> compat   Compatibility level (0.10 or 1.1)
> backing_file File name of a base image
> backing_fmt  Image format of the base image
> encryption   Encrypt the image
> cluster_size qcow2 cluster size
> preallocationPreallocation mode (allowed values: off, metadata)
> lazy_refcounts   Postpone refcount updates
>
> and more importantly, it only appears in the help output of -f modes
> that actually support a backing file.  Contrast:
>
> $ qemu-img create -f raw -o help
> Supported options:
> size Virtual disk size

This is much more prominent in the top-level help:

rebase [-q] [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F
backing_fmt] filename



Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.

2014-05-12 Thread Mike Day
On Mon, May 12, 2014 at 11:20 AM, Eric Blake  wrote:
> Online where?  In 'qemu-img --help', or on some web page (at what URL)?

http://qemu.weilnetz.de/qemu-doc.html#vm_005fsnapshots

-o backing file is not documented in the help command.

Mike



[Qemu-devel] [PATCH] Add backing file option to qemu-img create help.

2014-05-12 Thread Mike Day
For the create subcommand the backing file (-b) option is documented
on-line but not in the binary. Add it.

Signed-off-by: Mike Day 
---
 qemu-img-cmds.hx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index d029609..7724709 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -16,9 +16,9 @@ STEXI
 ETEXI
 
 DEF("create", img_create,
-"create [-q] [-f fmt] [-o options] filename [size]")
+"create [-q] [-f fmt] [-b backing-file] [-o options] filename [size]")
 STEXI
-@item create [-q] [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}]
+@item create [-q] [-f @var{fmt}] [-b @var{backing_file}] [-o @var{options}] 
@var{filename} [@var{size}]
 ETEXI
 
 DEF("commit", img_commit,
-- 
1.9.0




[Qemu-devel] [PATCH] qemu-img fails to delete last snapshot

2014-05-09 Thread Mike Day
When deleting the last snapshot, copying the resulting snapshot table
currently fails, causing the delete operation to also fail. Fix the
failure by skipping the copy and just writing the snapshot header and
freeing the extra clusters.

Signed-off-by: Mike Day 
---

There are two specific problems in the curent code. First is a lack of
parenthesis in the calculation of a memmove parameter:

s->nb_snapshots - snapshot_index - 1

When s->nb_snapshots is 0, snapshot_index is 1. 

0 - 1 - 1 = 0xfffe

it should be:

0 - (1 - 1) = 0x00

The second problem is shifting the snapshot table to the left. After
removing the last snapshot there are no existing snapshots to be
shifted. All that needs to be done is to write the header and
unallocate the blocks.

 block/qcow2-snapshot.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0aa9def..c8b842c 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -165,9 +165,11 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 
 assert(offset <= INT_MAX);
 snapshots_size = offset;
-
 /* Allocate space for the new snapshot list */
-snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+snapshots_offset = 0;
+if (snapshots_size) {
+snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+}
 offset = snapshots_offset;
 if (offset < 0) {
 ret = offset;
@@ -180,12 +182,13 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 
 /* The snapshot list position has not yet been updated, so these clusters
  * must indeed be completely free */
-ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
-if (ret < 0) {
-goto fail;
+if (snapshots_size) {
+ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size);
+if (ret < 0) {
+goto fail;
+}
 }
 
-
 /* Write all snapshots to the new list */
 for(i = 0; i < s->nb_snapshots; i++) {
 sn = s->snapshots + i;
@@ -590,12 +593,14 @@ int qcow2_snapshot_delete(BlockDriverState *bs,
 return -ENOENT;
 }
 sn = s->snapshots[snapshot_index];
-
 /* Remove it from the snapshot list */
-memmove(s->snapshots + snapshot_index,
-s->snapshots + snapshot_index + 1,
-(s->nb_snapshots - snapshot_index - 1) * sizeof(sn));
 s->nb_snapshots--;
+if (s->nb_snapshots) {
+memmove(s->snapshots + snapshot_index,
+s->snapshots + snapshot_index + 1,
+(s->nb_snapshots - (snapshot_index - 1)) * sizeof(sn));
+}
+
 ret = qcow2_write_snapshots(bs);
 if (ret < 0) {
 error_setg_errno(errp, -ret,
-- 
1.9.0




Re: [Qemu-devel] [PATCH 1/6] xics: add flags for interrupts

2014-05-09 Thread Mike Day
On Thu, May 8, 2014 at 11:12 PM, Alexey Kardashevskiy  wrote:
>> There are
>> a couple ways to mitigate this type of situation by using alternative
>> data structures to inform the loop traversal. I don't know if it is
>> worth the effort, though.
>
> Here I lost you :)

If I read the code correctly, the problem I'm wondering  about is that
the loop will waste time traversing the array  when there are only
unallocated interrupts from the current index to the end. For example,
if the interrupt array is 256 entries and the highest index that is
allocated is 16, the outside loop will traverse all 256 entries while
it should have exited after the 16th.

To mitigate this you could keep a shadow index of the current highest
allocated index and check for that in the outside loop. Or you could
maintain a shadow linked list that only includes allocated array
entries and just traverse that list. Each element on the list would be
an allocated entry in the interrupt array.

> btw I just realized that in patch#2 it should be xics_find_source instead
> of xics_find_server. There are many interrupt servers already and just one
> interrupt source (we could have many like one per PHB or something like
> that but we are not there yet), this is what I meant.



Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-08 Thread Mike Day
On Thu, May 8, 2014 at 10:02 AM, Peter Maydell  wrote:
>>> total: 0 errors, 0 warnings, 79 lines checked
>>>
>>> /tmp/a has no obvious style problems and is ready for submission.
>>> Check 500, Bad 52
>>
>> How does your new checkpatch compare to our current one?
>
> Yes; we do sometimes let checkpatch-failing commits through,
> so it would be more interesting to know:
>  * how many commits passed old-checkpatch but fail the new one
>  * how many commits failed with old-checkpatch but pass now
>  * in both cases, how many of the failures are false positives?
>
> (the last of those is tricky to answer without eyeballing
> the errors, unfortunately)

That seems like a good test regime and not a lot of effort using the
bash script Don shared.

There appears to be some interest in this patch. I'll work on a 2nd
revision, following Peter's suggestion to separate the QEMU rules into
a separate, smaller patch (and with the correct handling of brackets).

Mike



Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-08 Thread Mike Day
On Wed, May 7, 2014 at 5:16 PM, Hani Benhabiles  wrote:
> FWIW, this new version doesn't trigger a false positive I was having with
> patches 05-07 in [1].
>
> However, from a quick test for this patch on a couple of patches, I am getting
> warnings like this: "WARNING: braces {} are not necessary for single statement
> blocks" so there are still some Qemu related changes missing.

Thanks for the feedback! As Peter pointed out (and you confirm), I
missed forward-porting QEMU's brace rules.



Re: [Qemu-devel] [PATCH 2/6] xics: add find_server

2014-05-07 Thread Mike Day

Alexey Kardashevskiy  writes:

> PAPR allows having multiple interrupr servers.
>

typo above in the commit log entry,

Mike
-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] [PATCH 1/6] xics: add flags for interrupts

2014-05-07 Thread Mike Day

>  
>  for (i = 0; i < ics->nr_irqs; i++) {
>  /* FIXME: filter by server#? */
> -if (ics->islsi[i]) {
> +if (ics->irqs[i].flags & XICS_FLAGS_LSI) {
>  resend_lsi(ics, i);

Not part of your patch, but I'm curious about this FIXME (there's an
identical FIXME in resend_msi). Has this proved to be a problem? With
these patches you could have many unallocated interrupts in array AFTER
the last allocated interrupt, correct? In that case, the loop would
continue beyond the last allocated interrupt for no purpose.  There are
a couple ways to mitigate this type of situation by using alternative
data structures to inform the loop traversal. I don't know if it is
worth the effort, though.


> +/* @flags == 0 measn the interrupt is not allocated */
> +#define XICS_FLAGS_LSI 0x1
> +#define XICS_FLAGS_MSI         0x2

(nit) typo in the above comment

Mike

-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-06 Thread Mike Day
On Tue, May 6, 2014 at 2:28 PM, Peter Maydell  wrote:
> I think this is going to be difficult to review, to say the least.
>
> Where does your patch come from? Is the kernel's checkpatch.pl
> just a single commit between 0.31 and 0.32 (surely not) or
> a series of fixes?

Yes, this is today's version of the kernel's checkpatch.pl.  Version
0.32 is the product of many patches onto 0.31

> A couple of ideas about how we could approach this:
>  (1) make a commit which is simply copying the kernel's 0.32
>   into our repo; then follow that with a series of commits which
>   re-apply our local changes.
>  (2) apply all the individual commits from the kernel between 0.31
>  and 0.32 to our repo
>  (3) give up and stick with 0.31...

idea (1) makes perfect sense to me, and the local changes will be easy
to review.

> It might also be helpful if you could describe the benefits
> we get from this update (any bugfixes for false positives we
> tend to run into? useful new checks?)

I think its a valid question whether to forward-port the Qemu checks
to 0.32. I'm not sure, myself, but this gives folks an idea of what
the cost/benefit is.

Yesterday I was bitten by the StudlyCaps syndrome in a patch I
submitted which was checkpatch-clean. Afterward I noticed that version
0.31 has the check for StudlyCaps commented out. This is corrected in
version 0.32 with a more ambitious check for "CamelCase."

The CamelCase check in v0.32 tries to determine if the patch
introduced StudlyCaps or if they already exist in the unpatched source
file. (I was hoping for a three-line check I could paste into v0.31.)

I noticed some other new features - it uses an optional configuration
file, checks for validly formed email addresses. The other changes I
believe, are either experimental or providing tools for filtering or
typing messages.

Mike



[Qemu-devel] [PATCH] [v2 PATCH] qemu-img: sort block formats in help message

2014-05-05 Thread Mike Day
The help message for qemu-img lists the supported block formats, of
which there are 27 as of version 2.0.50. The formats are printed in
the order of their driver's position in a linked list, which appears
random. This patch prints the formats in sorted order, making it
easier to read and to find a specific format in the list.

v2: Incorporated feedback from Stefan Hajnoczi

Signed-off-by: Mike Day 
---
 qemu-img.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 96f4463..a559108 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -32,6 +32,7 @@
 #include "block/block_int.h"
 #include "block/qapi.h"
 #include 
+#include 
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION \
   ", Copyright (c) 2004-2008 Fabrice Bellard\n"
@@ -55,9 +56,25 @@ typedef enum OutputFormat {
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
 #define BDRV_DEFAULT_CACHE "writeback"
 
-static void format_print(void *opaque, const char *name)
+static gint compare_data(gconstpointer a, gconstpointer b, gpointer user)
 {
-printf(" %s", name);
+return g_strcmp0(a, b);
+}
+
+static void print_format(gpointer data, gpointer user)
+{
+printf(" %s", (char *)data);
+}
+
+static void add_format_to_seq(void *opaque, const char *fmt_name)
+{
+GSequence *seq = (GSequence *)opaque;
+
+if (!g_sequence_lookup(seq, (gpointer)fmt_name,
+   compare_data, NULL)) {
+g_sequence_insert_sorted(seq, (gpointer)fmt_name,
+ compare_data, NULL);
+}
 }
 
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
@@ -144,8 +161,12 @@ static void QEMU_NORETURN help(void)
"  '-s' run in Strict mode - fail on different image size or sector 
allocation\n";
 
 printf("%s\nSupported formats:", help_msg);
-bdrv_iterate_format(format_print, NULL);
+GSequence *seq = g_sequence_new(NULL);
+bdrv_iterate_format(add_format_to_seq, seq);
+g_sequence_foreach(seq, print_format, NULL);
 printf("\n");
+g_sequence_free(seq);
+
 exit(EXIT_SUCCESS);
 }
 
-- 
1.9.0




Re: [Qemu-devel] [PATCH] qemu-img: sort block formats in help message

2014-05-05 Thread Mike Day
Thanks for the review! I've been able to shorten the next version quite a bit.

On Mon, May 5, 2014 at 9:42 AM, Stefan Hajnoczi  wrote:
>>  +static void GFunc_print_format(gpointer data, gpointer user)
>
> QEMU coding style is lowercase function and variable names.  The
> scripts/checkpatch.pl script should identify coding style violations,
> please run it.

Yeah this is ugly. I've long had checkpatch.pl configured as my
pre-commit hook. I've confirmed again this morning that these ugly
caps pass through with no error or warning. I'll look at the script.

>>  +
>>  +static void add_format_to_seq(void *opaque, const char *fmt_name)
>>  +{
>>  +GSequence *seq = (GSequence *)opaque;
>>  +
>>  +if (!g_sequence_lookup(seq, (gpointer)fmt_name,
>>  +  compare_data, NULL)) {
>>  +g_sequence_insert_sorted(seq, (gpointer)fmt_name,
>>  + compare_data, NULL);
>>  +}
>
> The type casts in this patch aren't necessary.  In C void* casts to and
> from any type without an explicit cast.  Only C++ demands explicit
> casts of void*.

The casts are ugly, and I don't know how to get rid of them here
beyond a pragma.  Due to the block and glib APIs I have to cast away a
const in the second parameter. I'm bracketed on both ends:
g_sequence_* needs that second parameter to be void * (pointer), while
the the block api (bdrv_iterate_format) defines the this function
pointer type as (*)(void *, const char *). Ideas?

Mike



[Qemu-devel] [PATCH] qemu-img: sort block formats in help message

2014-05-02 Thread Mike Day
The help message for qemu-img lists the supported block formats, of
which there are 27 as of version 2.0.50. The formats are printed in
the order of their driver's position in a linked list, which appears
random. This patch prints the formats in sorted order, making it
easier to read and to find a specific format in the list.

Signed-off-by: Mike Day 
---
 qemu-img.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

 diff --git a/qemu-img.c b/qemu-img.c
 index 96f4463..d8b7ef4 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -32,6 +32,7 @@
 #include "block/block_int.h"
 #include "block/qapi.h"
 #include 
 +#include 
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION \
 ", Copyright (c) 2004-2008 Fabrice Bellard\n"
 @@ -60,6 +61,32 @@ static void format_print(void *opaque, const char *name)
 printf(" %s", name);
 }
 
 +static gint compare_data(gconstpointer a, gconstpointer b, gpointer user)
 +{
 +return g_strcmp0((const char *)a, (const char *)b);
 +}
 +
 +static void GFunc_print_format(gpointer data, gpointer user)
 +{
 +format_print(user, data);
 +}
 +
 +static GSequence *init_sequence(void)
 +{
 +return g_sequence_new(NULL);
 +}
 +
 +static void add_format_to_seq(void *opaque, const char *fmt_name)
 +{
 +GSequence *seq = (GSequence *)opaque;
 +
 +if (!g_sequence_lookup(seq, (gpointer)fmt_name,
 +  compare_data, NULL)) {
 +g_sequence_insert_sorted(seq, (gpointer)fmt_name,
 + compare_data, NULL);
 +}
 +}
 +
 static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...)
 {
 va_list ap;
 @@ -144,8 +171,12 @@ static void QEMU_NORETURN help(void)
 "  '-s' run in Strict mode - fail on different image size or sector 
allocation\n";
 
 printf("%s\nSupported formats:", help_msg);
 -bdrv_iterate_format(format_print, NULL);
 +GSequence *seq = init_sequence();
 +bdrv_iterate_format(add_format_to_seq, seq);
 +g_sequence_foreach(seq, GFunc_print_format, NULL);
 printf("\n");
 +g_sequence_free(seq);
 +
 exit(EXIT_SUCCESS);
 }
 
 -- 
 1.9.0




Re: [Qemu-devel] [PULL 14/16] qemu-img: Improve error messages

2014-04-30 Thread Mike Day
On Wed, Apr 30, 2014 at 10:16 AM, Eric Blake  wrote:
> Additional patches:
> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04074.html
> https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04468.html
> both already on the queue for the next PULL request of the block branch.

Excellent, thank you!

Mike



Re: [Qemu-devel] [PULL 14/16] qemu-img: Improve error messages

2014-04-30 Thread Mike Day
On Sun, Apr 27, 2014 at 9:55 PM, Fam Zheng  wrote:
>> >  /* not found */
>> > -help();
>> > -return 0;
>> > +error_exit("Command not found: %s", cmdname);
>>
>> Looks like we just relied previously on the default 'not found' case
>> for help() to provide the "--help" option.
>>
>
> Oops! Thanks for noticing this and sending the fix.

One of you guys going to send in the trivial patch to restore help?
(Or is it upstream in another patch?)

thanks, Mike



Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio

2014-04-29 Thread Mike Day
On Tue, Apr 29, 2014 at 1:02 AM, Michael Tokarev  wrote:
> I guess we should have some global variable like "stdio_occuped", set
> it to 1 when -daemonize is specified, and set and check it each time
> we try to use stdio for something.  This way we'll prevent various
> parts of qemu from fighting for stdio.

That does seem to be a more fundamental solution. Thanks for the review.

Mike



Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio

2014-04-29 Thread Mike Day
On Tue, Apr 29, 2014 at 3:09 AM, Gerd Hoffmann  wrote:
>
> I don't feel like adding more band-aid to paper over the fundamental
> issue.  Too much band-aid tends to cause other unwanted side effects.

Fair enough, I tend to agree. thanks for the review.

Mike



Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio

2014-04-28 Thread Mike Day
On Mon, Apr 28, 2014 at 3:53 PM, Michael Tokarev  wrote:
> Gosh. This is uuugly.
>
> Maybe, at least, we can add some variable which gets incremented for
> each -mon chardev= ?  I dunno, that's about the same ugliness.  Oh
> well... :(

Suggestions much appreciated, I think its ugly versus intrusive, but
I'm not familiar with this part of qemu.

There is another case with -monitor where you can configure two
chardevs for stdio and it does the same thing. I left that one alone,
its more complicated.

Mike



[Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio

2014-04-28 Thread Mike Day
When the deprecated -nographic option is used with the -mon option in
readline mode, qemu will create a second character device for stdio
and place it over the stdio chardev put into place by the -mon
option. This causes the terminal to stop echoeing characters upon exit
from Qemu.

Fix by checking for the existing chardev before adding another.
Signed-off-by: Mike Day 

---

To reproduce, use -mon and -nographic together. I was able to
reproduce it using

# qemu-system-x86_64 -enable-kvm -m 1G  -chardev stdio,id=mon0 \
# -mon chardev=mon0,mode=readline -nographic



---
 vl.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/vl.c b/vl.c
index 236f95e..3104b20 100644
--- a/vl.c
+++ b/vl.c
@@ -93,6 +93,7 @@ int main(int argc, char **argv)
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
+#include "qemu/option_int.h"
 #include "qemu/config-file.h"
 #include "qemu-options.h"
 #include "qmp-commands.h"
@@ -524,6 +525,20 @@ static QemuOptsList qemu_mem_opts = {
 },
 };
 
+static int qemu_opts_loopfun(QemuOpts *opts, void *opaque)
+{
+QemuOpt *opt;
+
+if (!strncmp(((QemuOpt *)opts->list)->name, "mon", 3)) {
+QTAILQ_FOREACH(opt, &opts->head, next) {
+if (!strncmp(opt->name, "chardev", 7)) {
+return 1;
+}
+}
+}
+return 0;
+}
+
 /**
  * Get machine options
  *
@@ -4113,6 +4128,11 @@ int main(int argc, char **argv, char **envp)
 }
 
 if (display_type == DT_NOGRAPHIC) {
+int have_stdio = 0;
+QemuOptsList *opts = qemu_find_opts("mon");
+if (opts) {
+have_stdio = qemu_opts_foreach(opts, qemu_opts_loopfun, NULL, 0);
+}
 if (default_parallel)
 add_device_config(DEV_PARALLEL, "null");
 if (default_serial && default_monitor) {
@@ -4122,7 +4142,7 @@ int main(int argc, char **argv, char **envp)
 } else if (default_sclp && default_monitor) {
 add_device_config(DEV_SCLP, "mon:stdio");
 } else {
-if (default_serial)
+if (default_serial && !have_stdio)
 add_device_config(DEV_SERIAL, "stdio");
 if (default_virtcon)
 add_device_config(DEV_VIRTCON, "stdio");
-- 
1.9.0




Re: [Qemu-devel] Monitor Readline - no terminal echo after exit

2014-04-24 Thread Mike Day
On Thu, Apr 24, 2014 at 3:31 AM, Markus Armbruster  wrote:
>> I believe someone on the list mentioned they are seeing a couple
>> problems entering and exiting the Monitor. I'd like to look at this more
>> closely, starting with my most pending issue: losing the terminal echo
>> after exiting the Monitor.

Thanks for the reply Markus.

> Reproducer?

I've found a couple of ways to reproduce the problem. The easiest is
to use -nographic on the qemu command line when starting a qemu
session. In this case the monitor opens stdio but there is no visible
input or output.

Another way is to use -nographic along with -mon ,mode=readline. In this case the monitor works, but when you
exit from the monitor your terminal will not echo characters.

For reference, here are the chardev and mon options I use:

 -chardev stdio,id=mon0 -mon chardev=mon0,mode=readline

I see that -nographic is a deprecated option, fwiw.

> The monitor runs on top of a QEMU chardev.  Suggest to start digging at
> monitor_init(), both into the monitor itself, and into the
> CharDriverState object.

Thus far I've confirmed that when the -nographic option is passed, the
mon_init_func does not get called (as it does for readline mode). I
know why this is, but I'm not yet sure the right way to fix it.  Also,
with -nographic and mon:stdio  monitor_flush is called for every line
entered execpt for the last line. Normally monitor_flush is called for
every line including the last line, at least in readline mode.

I've run out of time looking at this today, but would but would be
happy if anyone has further ideas.

Mike



[Qemu-devel] Monitor Readline - no terminal echo after exit

2014-04-23 Thread Mike Day

I believe someone on the list mentioned they are seeing a couple
problems entering and exiting the Monitor. I'd like to look at this more
closely, starting with my most pending issue: losing the terminal echo
after exiting the Monitor.

Does anyone have a quick pointer as to where I should look for this
code? Otherwise I'll start looking through main_loop and friends and
vl.c for init and destroy routines. 

Mike

-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] [PATCH v6] spapr_hcall: add address-translation-mode-on-interrupt resource in H_SET_MODE

2014-04-15 Thread Mike Day
On Tue, Apr 8, 2014 at 2:28 AM, Alexander Graf  wrote:
> On 22.03.14 14:03, Alexey Kardashevskiy wrote:
>>
>> This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from
>> the H_SET_MODE, for POWER8 (PowerISA 2.07) only.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> Reviewed-by: Mike Day 
>
>
> ... but I can comment on style issues regardless for now ;)

Alex, would a document under the CDA we have with SUSE be acceptable
to you? That is the fastest option available at the moment.

Mike



Re: [Qemu-devel] [PATCH v5 3/3] spapr_hcall: add address-translation-mode-on-interrupt resource in H_SET_MODE

2014-03-20 Thread Mike Day

Alexey Kardashevskiy  writes:

> This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from
> the H_SET_MODE, for POWER8 (PowerISA 2.07) only.
>
> Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Mike Day 
> ---
>  hw/ppc/spapr_hcall.c | 26 ++
>  target-ppc/cpu.h |  2 ++
>  2 files changed, 28 insertions(+)
>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index fc5211b..fb23730 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -747,6 +747,32 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  default:
>  ret = H_UNSUPPORTED_FLAG;
>  }
> +} else if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) {
> +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +
> +if (!(pcc->insns_flags2 & PPC2_ISA207S)) {
> +return H_P2;
   
   ret = H_P2;
   goto out; 
   
Just a nit to make for easier review. The above would be more
consistent. (Even though ret is already initialized to H_P2.)

> +}
> +if (value1) {
> +ret = H_P3;
> +goto out;
> +}
> +if (value2) {
> +ret = H_P4;
> +goto out;
> +}
> +switch (mflags) {
> +case 0:
> +case 2:
> +case 3:
> +CPU_FOREACH(cs) {
> +set_spr(cs, SPR_LPCR, mflags << LPCR_AIL_SH, LPCR_AIL);
> +}
> +return H_SUCCESS;
> +
> +default:
> +return H_UNSUPPORTED_FLAG;
> +}
>  }
>  
>  out:
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 72cb546..577193a 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -462,6 +462,8 @@ struct ppc_slb_t {
>  #define MSR_LE   0  /* Little-endian mode   1 hflags 
> */
>  
>  #define LPCR_ILE (1 << (63-38))
> +#define LPCR_AIL  0x0180  /* Alternate interrupt location */
> +#define LPCR_AIL_SH   (63-40)
>  
>  #define msr_sf   ((env->msr >> MSR_SF)   & 1)
>  #define msr_isf  ((env->msr >> MSR_ISF)  & 1)
> -- 
> 1.8.4.rc4
>
>

-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] qemu patch for adding functionality to rtas_ibm_get_system_parameter [Version 2]

2014-03-14 Thread Mike Day

Tomohiro B Berry  writes:

> Hi again,
>
> I believe I have added the appropriate format changes and made a couple 
> changes to the code.  This patch should add functionality to the function 
> rtas_ibm_get_system_parameter to return a string containing the values for 
> partition_max_entitled_capacity and system_potential_processors. 

The text before the diff goes into the git commit log. The patch
guidelines say the commit log entry should describe the patch and
anything else should go below "--" which will keep it out of the commit
log. "I believe I have..." is an example of the type of text that
shouldn't be in the commit log.

There is still a line wrap - If you are not doing so consider running
your patch through scripts/chechpatch.pl before submitting it.

Also consider using git-format-patch to generate the patch you will
send if you're not doing so already.

Mike

> Regards, 
> Tomo Berry
>
> Signed-off-by: Tomo Berry 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 1cb276d..931ba06 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -225,6 +225,9 @@ static void rtas_stop_self(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  }
>  
>  #define DIAGNOSTICS_RUN_MODE42
> +#define SPLPAR_CHARACTERISTICS_TOKEN 20
> +#define CMO_CHARACTERISTICS_TOKEN 44
> +#define CEDE_LATENCY_TOKEN 45
>  
>  static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
>sPAPREnvironment *spapr,
> @@ -236,6 +239,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU 
> *cpu,
>  target_ulong buffer = rtas_ld(args, 1);
>  target_ulong length = rtas_ld(args, 2);
>  target_ulong ret = RTAS_OUT_NOT_SUPPORTED;
> +char *local_buffer;
>  
>  switch (parameter) {
>  case DIAGNOSTICS_RUN_MODE:
> @@ -244,6 +248,36 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU 
> *cpu,
>  ret = RTAS_OUT_SUCCESS;
>  }
>  break;
> +case SPLPAR_CHARACTERISTICS_TOKEN:
> +
> +/*  Create a string of length bytes locally to copy to buffer  */
> +
> +local_buffer = calloc(length, 1);
> +
> +/*  These are the current system parameters supported.  The first
> + *  16 bits of the buffer must contain the length of the string.
> + *  These 16 bits are not included in the length of the string. 
> */
> +
> +((uint16_t *)local_buffer)[0] = cpu_to_be16(snprintf(local_buffer 
> + 2,
> + length - 2, "MaxEntCap=%d,MaxPlatProcs=%d", max_cpus, 
> smp_cpus));
> +
> +/*  Copy the string into buffer using rtas_st_buffer to
> + *  convert the addresses. */
> +
> +rtas_st_buffer(buffer, length, (uint8_t *)local_buffer);
> +
> +free(local_buffer);
> +ret = RTAS_OUT_SUCCESS;
> +break;
> +case CMO_CHARACTERISTICS_TOKEN:
> +ret = RTAS_OUT_NOT_SUPPORTED;
> +break;
> +case CEDE_LATENCY_TOKEN:
> +ret = RTAS_OUT_NOT_SUPPORTED;
> +break;
> +default:
> +ret = RTAS_OUT_NOT_SUPPORTED;
> +break;
>  }
>  
>  rtas_st(rets, 0, ret);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b2f11e9..ee6ed2d 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -356,6 +356,18 @@ static inline void rtas_st(target_ulong phys, int n, 
> uint32_t val)
>  stl_be_phys(ppc64_phys_to_real(phys + 4*n), val);
>  }
>  
> +/*  This function will store a buffer 1 byte at a time into the
> + *  address at phys up to a maximum of phys_len bytes.*/
> +
> +static inline void rtas_st_buffer(target_ulong phys,
> +  target_ulong phys_len,
> +  uint8_t *buffer) {
> +uint32_t i;
> +for (i = 0; i < phys_len; i++) {
> +stb_phys(ppc64_phys_to_real(phys + i), buffer[i]);
> +}
> +}
> +
>  typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>uint32_t token,
>uint32_t nargs, target_ulong args,
>
>
>
> From:   Mike Day 
> To: Tomohiro B Berry/Austin/IBM@IBMUS, qemu-devel@nongnu.org, 
> Date:   03/13/2014 03:24 PM
> Subject:Re: [Qemu-devel] qemu patch for adding functionality to 
> rtas_ibm_get_system_parameter
>
>
>
>
> Tomohiro, 
>
> Please follow the guidelines for submitting a patch to Qemu that are
> found in:
>
> http://wiki.qemu.org/Contribute/SubmitAPatch
>
> This patch has an inappropriate commit log, is missing a signed-off-by:
> tag, and some of the lines wrapped in my rea

Re: [Qemu-devel] qemu patch for adding functionality to rtas_ibm_get_system_parameter

2014-03-13 Thread Mike Day

Tomohiro, 

Please follow the guidelines for submitting a patch to Qemu that are
found in:

http://wiki.qemu.org/Contribute/SubmitAPatch

This patch has an inappropriate commit log, is missing a signed-off-by:
tag, and some of the lines wrapped in my reader. These are explained in
the document above. You should be able to fix these issues quickly and
resubmit this patch. 

Mike

Tomohiro B Berry  writes:

> Hi all,
>
> rtas_ibm_get_system_parameter did not previously have the functionality to 
> return the appropriate string when called with the 
> SPLPAR_CHARACTERISTICS_TOKEN.  I am proposing the following patch to add 
> that functionality.  I am including the cases for 
> CMO_CHARACTERISTICS_TOKEN and CEDE_LATENCY_TOKEN because the function gets 
> called with those tokens during the boot process, but I understand that 
> they are currently redundant with the default case.  I just figured they 
> would be useful for future development, but if we don't want them in there 
> right now I think that would be okay, too. 
>
> Regards,
> Tomo Berry
>
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 1cb276d..318fdcd 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -225,6 +225,9 @@ static void rtas_stop_self(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  }
>  
>  #define DIAGNOSTICS_RUN_MODE42
> +#define SPLPAR_CHARACTERISTICS_TOKEN 20
> +#define CMO_CHARACTERISTICS_TOKEN 44
> +#define CEDE_LATENCY_TOKEN 45
>  
>  static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
>sPAPREnvironment *spapr,
> @@ -236,6 +239,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU 
> *cpu,
>  target_ulong buffer = rtas_ld(args, 1);
>  target_ulong length = rtas_ld(args, 2);
>  target_ulong ret = RTAS_OUT_NOT_SUPPORTED;
> +char *local_buffer;
>  
>  switch (parameter) {
>  case DIAGNOSTICS_RUN_MODE:
> @@ -244,6 +248,42 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU 
> *cpu,
>  ret = RTAS_OUT_SUCCESS;
>  }
>  break;
> +case SPLPAR_CHARACTERISTICS_TOKEN:
> +
> +   /*  Create a string locally to copy to buffer  */
> + 
> +   local_buffer=(char*)malloc(length*sizeof(char));
> +   memset(local_buffer,0,length);
> +
> +   /*  These are the current system parameters supported.  The spaces 
> at the 
> +*  start of the string are place holders for the string length. 
> */
> +
> +   snprintf(local_buffer,length," 
> MaxEntCap=%d,MaxPlatProcs=%d",max_cpus,smp_cpus);
> +
> +   /*  The first 16 bits of the buffer must contain the length of the 
> string.
> +*  These 16 bits are not included in the length of the string. */
> + 
> + 
> ((uint16_t*)local_buffer)[0]=cpu_to_be16((uint16_t)strlen(&local_buffer[2]));
> + 
> +   /*  Copy the string into buffer using rtas_st_buffer to 
> +*  convert the addresses. */
> + 
> +   rtas_st_buffer(buffer,length,(uint8_t*)local_buffer);
> +
> +   /*  Free the local buffer and return success. */ 
> + 
> +   free(local_buffer);
> +   ret = RTAS_OUT_SUCCESS;
> +   break;
> +case CMO_CHARACTERISTICS_TOKEN:
> +   ret = RTAS_OUT_NOT_SUPPORTED;
> +   break;
> +case CEDE_LATENCY_TOKEN:
> +   ret = RTAS_OUT_NOT_SUPPORTED;
> +   break;
> +default:
> +   ret = RTAS_OUT_NOT_SUPPORTED;
> +   break;
>  }
>  
>  rtas_st(rets, 0, ret);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b2f11e9..87c039c 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -356,6 +356,16 @@ static inline void rtas_st(target_ulong phys, int n, 
> uint32_t val)
>  stl_be_phys(ppc64_phys_to_real(phys + 4*n), val);
>  }
>  
> +/*  This function will store a buffer 1 byte at a time into the 
> + *  address at phys up to a maximum of phys_len bytes. */
> +
> +static inline void rtas_st_buffer(target_ulong phys, target_ulong 
> phys_len, uint8_t* buffer){
> +uint32_t i; 
> +for(i=0;i +stb_phys(ppc64_phys_to_real(phys + i),buffer[i]);
> +}
> +}
> +
>  typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>uint32_t token,
>uint32_t nargs, target_ulong args,
> diff --git a/pixman b/pixman
> --- a/pixman
> +++ b/pixman
> @@ -1 +1 @@
> -Subproject commit 97336fad32acf802003855cd8bd6477fa49a12e3
> +Subproject commit 97336fad32acf802003855cd8bd6477fa49a12e3-dirty
-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] spapr-iommu: extend SPAPR_TCE_TABLE class

2014-03-13 Thread Mike Day
On 20/11/13 16:39 +1100, Alexey Kardashevskiy wrote:
> This adds a put_tce() callback to the SPAPR TCE TABLE device class.
> The new callback allows to have different IOMMU types such as upcoming
> VFIO IOMMU and it will be used more by the upcoming Multi-TCE support.
> 
> This reworks the H_PUT_TCE handler to make use of the new put_tce()
> callback.
> 
> Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Mike Day 



> ---
>  hw/ppc/spapr_iommu.c   | 21 +
>  include/hw/ppc/spapr.h | 13 +
>  2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index ef45f4f..0016c13 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -207,7 +207,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, 
> target_ulong ioba,
>  IOMMUTLBEntry entry;
>  
>  if (ioba >= tcet->window_size) {
> -hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x"
> +hcall_dprintf("spapr put_tce_emu on out-of-bounds IOBA 0x"
>TARGET_FMT_lx "\n", ioba);
>  return H_PARAMETER;
>  }
> @@ -232,12 +232,21 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  target_ulong tce = args[2];
>  target_ulong ret = H_PARAMETER;
>  sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> +sPAPRTCETableClass *info;
> +
> +if (!tcet) {
> +return H_PARAMETER;
> +}
> +
> +info = SPAPR_TCE_TABLE_GET_CLASS(tcet);
> +if (!info || !info->put_tce) {
> +return H_PARAMETER;
> +}
>  
>  ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1);
>  
> -if (tcet) {
> -ret = put_tce_emu(tcet, ioba, tce);
> -}
> +ret = info->put_tce(tcet, ioba, tce);
> +
>  trace_spapr_iommu_put(liobn, ioba, tce, ret);
>  
>  return ret;
> @@ -287,9 +296,12 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const 
> char *propname,
>  static void spapr_tce_table_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> +sPAPRTCETableClass *stc = SPAPR_TCE_TABLE_CLASS(klass);
> +
>  dc->vmsd = &vmstate_spapr_tce_table;
>  dc->init = spapr_tce_table_realize;
>  dc->reset = spapr_tce_reset;
> +stc->put_tce = put_tce_emu;
>  
>  QLIST_INIT(&spapr_tce_tables);
>  
> @@ -302,6 +314,7 @@ static TypeInfo spapr_tce_table_info = {
>  .parent = TYPE_DEVICE,
>  .instance_size = sizeof(sPAPRTCETable),
>  .class_init = spapr_tce_table_class_init,
> +.class_size = sizeof(sPAPRTCETableClass),
>  .instance_finalize = spapr_tce_table_finalize,
>  };
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index fdaab2d..827cda2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -367,12 +367,25 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr 
> rtas_addr,
>  
>  #define RTAS_ERROR_LOG_MAX  2048
>  
> +typedef struct sPAPRTCETableClass sPAPRTCETableClass;
>  typedef struct sPAPRTCETable sPAPRTCETable;
>  
>  #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table"
>  #define SPAPR_TCE_TABLE(obj) \
>  OBJECT_CHECK(sPAPRTCETable, (obj), TYPE_SPAPR_TCE_TABLE)
>  
> +#define SPAPR_TCE_TABLE_CLASS(klass) \
> + OBJECT_CLASS_CHECK(sPAPRTCETableClass, (klass), TYPE_SPAPR_TCE_TABLE)
> +#define SPAPR_TCE_TABLE_GET_CLASS(obj) \
> + OBJECT_GET_CLASS(sPAPRTCETableClass, (obj), TYPE_SPAPR_TCE_TABLE)
> +
> +struct sPAPRTCETableClass {
> +DeviceClass parent_class;
> +
> +target_ulong (*put_tce)(sPAPRTCETable *tcet, target_ulong ioba,
> +target_ulong tce);
> +};
> +
>  struct sPAPRTCETable {
>  DeviceState parent;
>  uint32_t liobn;
> -- 
> 1.8.4.rc4
> 
> 
> 
> 



Re: [Qemu-devel] spapr-pci: converts fprintf to error_report

2014-03-12 Thread Mike Day
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote:
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Mike Day 

> ---
>  hw/ppc/spapr_pci.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 963841c..d102d82 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -293,7 +293,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  ret_intr_type = RTAS_TYPE_MSIX;
>  break;
>  default:
> -fprintf(stderr, "rtas_ibm_change_msi(%u) is not implemented\n", 
> func);
> +error_report("rtas_ibm_change_msi(%u) is not implemented", func);
>  rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>  return;
>  }
> @@ -327,7 +327,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  /* Find a device number in the map to add or reuse the existing one */
>  ndev = spapr_msicfg_find(phb, config_addr, true);
>  if (ndev >= SPAPR_MSIX_MAX_DEVS || ndev < 0) {
> -fprintf(stderr, "No free entry for a new MSI device\n");
> +error_report("No free entry for a new MSI device");
>  rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>  return;
>  }
> @@ -336,7 +336,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  /* Check if there is an old config and MSI number has not changed */
>  if (phb->msi_table[ndev].nvec && (req_num != phb->msi_table[ndev].nvec)) 
> {
>  /* Unexpected behaviour */
> -fprintf(stderr, "Cannot reuse MSI config for device#%d", ndev);
> +error_report("Cannot reuse MSI config for device#%d", ndev);
>  rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>  return;
>  }
> @@ -346,7 +346,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  irq = spapr_allocate_irq_block(req_num, false,
> ret_intr_type == RTAS_TYPE_MSI);
>  if (irq < 0) {
> -fprintf(stderr, "Cannot allocate MSIs for device#%d", ndev);
> +error_report("Cannot allocate MSIs for device#%d", ndev);
>  rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>  return;
>  }
> -- 
> 1.8.4.rc4
> 
> 
> 
> 



Re: [Qemu-devel] spapr-pci: introduce a finish_realize() callback

2014-03-12 Thread Mike Day
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote:
> The spapr-pci PHB initializes IOMMU for emulataed devices only.
> The upcoming VFIO support will do it different. However both emulated
> and VFIO PHB types share most of the initialization code.
> For the type specific things a new finish_realize() callback is
> introduced.
> 
> This introduces sPAPRPHBClass derived from PCIHostBridgeClass and
> adds the callback pointer.
> 
> This implements finish_realize() for emulated devices.
> 
> This changes initialization steps order to have the finish_realize()
> call at the end of the spapr_finalize().
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Mike Day 

> ---
> Changes:
> v5:
> * this is a new patch in the series, it was a part of a previous patch
> ---
>  hw/ppc/spapr_pci.c  | 46 
> +
>  include/hw/pci-host/spapr.h | 18 --
>  2 files changed, 46 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index aeb012d..963841c 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -500,6 +500,7 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  SysBusDevice *s = SYS_BUS_DEVICE(dev);
>  sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>  PCIHostState *phb = PCI_HOST_BRIDGE(s);
> +sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s);
>  const char *busname;
>  char *namebuf;
>  int i;
> @@ -609,22 +610,6 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
> PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>  phb->bus = bus;
>  
> -sphb->dma_window_start = 0;
> -sphb->dma_window_size = 0x4000;
> -sphb->tcet = spapr_tce_new_table(dev, sphb->dma_liobn,
> - sphb->dma_window_size);
> -if (!sphb->tcet) {
> -error_setg(errp, "Unable to create TCE table for %s",
> -   sphb->dtbusname);
> -return;
> -}
> -address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet),
> -   sphb->dtbusname);
> -
> -pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb);
> -
> -pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
> -
>  QLIST_INSERT_HEAD(&spapr->phbs, sphb, list);
>  
>  /* Initialize the LSI table */
> @@ -639,6 +624,32 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  
>  sphb->lsi_table[i].irq = irq;
>  }
> +
> +pci_setup_iommu(sphb->parent_obj.bus, spapr_pci_dma_iommu, sphb);
> +
> +pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq);
> +
> +if (!info->finish_realize) {
> +error_setg(errp, "finish_realize not defined");
> +return;
> +}
> +
> +info->finish_realize(sphb, errp);
> +}
> +
> +static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp)
> +{
> +sphb->dma_window_start = 0;
> +sphb->dma_window_size = 0x4000;
> +sphb->tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> + sphb->dma_window_size);
> +if (!sphb->tcet) {
> +error_setg(errp, "Unable to create TCE table for %s",
> +   sphb->dtbusname);
> +return ;
> +}
> +address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet),
> +   sphb->dtbusname);
>  }
>  
>  static void spapr_phb_reset(DeviceState *qdev)
> @@ -722,12 +733,14 @@ static void spapr_phb_class_init(ObjectClass *klass, 
> void *data)
>  {
>  PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
>  DeviceClass *dc = DEVICE_CLASS(klass);
> +sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass);
>  
>  hc->root_bus_path = spapr_phb_root_bus_path;
>  dc->realize = spapr_phb_realize;
>  dc->props = spapr_phb_properties;
>  dc->reset = spapr_phb_reset;
>  dc->vmsd = &vmstate_spapr_pci;
> +spc->finish_realize = spapr_phb_finish_realize;
>  }
>  
>  static const TypeInfo spapr_phb_info = {
> @@ -735,6 +748,7 @@ static const TypeInfo spapr_phb_info = {
>  .parent= TYPE_PCI_HOST_BRIDGE,
>  .instance_size = sizeof(sPAPRPHBState),
>  .class_init= spapr_phb_class_init,
> +.class_size= sizeof(sPAPRPHBClass),
>  };
>  
>  PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index)
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 970b4a9

Re: [Qemu-devel] spapr-pci: convert init() callback to realize()

2014-03-12 Thread Mike Day
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote:
> This converts the old-style init() callback to a new style realize()
> callback as init() now is supposed to do only trivial initialization.
> 
> As a part of convertion, this replaces fprintf(stderr) with error_setg()
> as realize() does not "return" any value, instead it puts the extended
> error into **errp.
> 
> Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Mike Day 

> ---
> Changes:
> v5:
> * finish_finalize() moved to a separate patch
> ---
>  hw/ppc/spapr_pci.c | 44 ++--
>  1 file changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 7763149..aeb012d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -32,6 +32,7 @@
>  #include "exec/address-spaces.h"
>  #include 
>  #include "trace.h"
> +#include "qemu/error-report.h"
>  
>  #include "hw/pci/pci_bus.h"
>  
> @@ -494,9 +495,9 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, 
> void *opaque, int devfn)
>  return &phb->iommu_as;
>  }
>  
> -static int spapr_phb_init(SysBusDevice *s)
> +static void spapr_phb_realize(DeviceState *dev, Error **errp)
>  {
> -DeviceState *dev = DEVICE(s);
> +SysBusDevice *s = SYS_BUS_DEVICE(dev);
>  sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>  PCIHostState *phb = PCI_HOST_BRIDGE(s);
>  const char *busname;
> @@ -510,9 +511,9 @@ static int spapr_phb_init(SysBusDevice *s)
>  if ((sphb->buid != -1) || (sphb->dma_liobn != -1)
>  || (sphb->mem_win_addr != -1)
>  || (sphb->io_win_addr != -1)) {
> -fprintf(stderr, "Either \"index\" or other parameters must"
> -" be specified for PAPR PHB, not both\n");
> -return -1;
> +error_setg(errp, "Either \"index\" or other parameters must"
> +   " be specified for PAPR PHB, not both");
> +return;
 
shouldn't these return an int? Or if the error is returned by pointer
perhaps the function should have a void return?

>  }
>  
>  sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
> @@ -525,28 +526,28 @@ static int spapr_phb_init(SysBusDevice *s)
>  }
>  
>  if (sphb->buid == -1) {
> -fprintf(stderr, "BUID not specified for PHB\n");
> -return -1;
> +error_setg(errp, "BUID not specified for PHB");
> +return;
>  }
>  
>  if (sphb->dma_liobn == -1) {
> -fprintf(stderr, "LIOBN not specified for PHB\n");
> -return -1;
> +error_setg(errp, "LIOBN not specified for PHB");
> +return;
>  }
>  
>  if (sphb->mem_win_addr == -1) {
> -fprintf(stderr, "Memory window address not specified for PHB\n");
> -return -1;
> +error_setg(errp, "Memory window address not specified for PHB");
> +return;
>  }
>  
>  if (sphb->io_win_addr == -1) {
> -fprintf(stderr, "IO window address not specified for PHB\n");
> -return -1;
> +error_setg(errp, "IO window address not specified for PHB");
> +return;
>  }
>  
>  if (find_phb(spapr, sphb->buid)) {
> -fprintf(stderr, "PCI host bridges must have unique BUIDs\n");
> -return -1;
> +error_setg(errp, "PCI host bridges must have unique BUIDs");
> +return;
>  }
>  
>  sphb->dtbusname = g_strdup_printf("pci@%" PRIx64, sphb->buid);
> @@ -613,8 +614,9 @@ static int spapr_phb_init(SysBusDevice *s)
>  sphb->tcet = spapr_tce_new_table(dev, sphb->dma_liobn,
>   sphb->dma_window_size);
>  if (!sphb->tcet) {
> -fprintf(stderr, "Unable to create TCE table for %s\n", 
> sphb->dtbusname);
> -return -1;
> +error_setg(errp, "Unable to create TCE table for %s",
> +   sphb->dtbusname);
> +return;
>  }
>  address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet),
> sphb->dtbusname);
> @@ -631,13 +633,12 @@ static int spapr_phb_init(SysBusDevice *s)
>  
>  irq = spapr_allocate_lsi(0);
>  if (!irq) {
> -return -1;
> +error_setg(errp, "spapr_allocate_lsi failed");
> +return;
>  }
>  
>  s

Re: [Qemu-devel] spapr-pci: convert init() callback to realize()

2014-03-12 Thread Mike Day
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote:
> This converts the old-style init() callback to a new style realize()
> callback as init() now is supposed to do only trivial initialization.
> 
> As a part of convertion, this replaces fprintf(stderr) with error_setg()
> as realize() does not "return" any value, instead it puts the extended
> error into **errp.
> 
> Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Mike Day 

> ---
> Changes:
> v5:
> * finish_finalize() moved to a separate patch
> ---
>  hw/ppc/spapr_pci.c | 44 ++--
>  1 file changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 7763149..aeb012d 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -32,6 +32,7 @@
>  #include "exec/address-spaces.h"
>  #include 
>  #include "trace.h"
> +#include "qemu/error-report.h"
>  
>  #include "hw/pci/pci_bus.h"
>  
> @@ -494,9 +495,9 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, 
> void *opaque, int devfn)
>  return &phb->iommu_as;
>  }
>  
> -static int spapr_phb_init(SysBusDevice *s)
> +static void spapr_phb_realize(DeviceState *dev, Error **errp)
>  {
> -DeviceState *dev = DEVICE(s);
> +SysBusDevice *s = SYS_BUS_DEVICE(dev);
>  sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s);
>  PCIHostState *phb = PCI_HOST_BRIDGE(s);
>  const char *busname;
> @@ -510,9 +511,9 @@ static int spapr_phb_init(SysBusDevice *s)
>  if ((sphb->buid != -1) || (sphb->dma_liobn != -1)
>  || (sphb->mem_win_addr != -1)
>  || (sphb->io_win_addr != -1)) {
> -fprintf(stderr, "Either \"index\" or other parameters must"
> -" be specified for PAPR PHB, not both\n");
> -return -1;
> +error_setg(errp, "Either \"index\" or other parameters must"
> +   " be specified for PAPR PHB, not both");
> +return;
>  }

Seems like these exit clauses should return an integer here and below.
  
>  sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
> @@ -525,28 +526,28 @@ static int spapr_phb_init(SysBusDevice *s)
>  }
>  
>  if (sphb->buid == -1) {
> -fprintf(stderr, "BUID not specified for PHB\n");
> -return -1;
> +error_setg(errp, "BUID not specified for PHB");
> +return;
>  }
>  
>  if (sphb->dma_liobn == -1) {
> -fprintf(stderr, "LIOBN not specified for PHB\n");
> -return -1;
> +error_setg(errp, "LIOBN not specified for PHB");
> +return;
>  }
>  
>  if (sphb->mem_win_addr == -1) {
> -fprintf(stderr, "Memory window address not specified for PHB\n");
> -return -1;
> +error_setg(errp, "Memory window address not specified for PHB");
> +return;
>  }
>  
>  if (sphb->io_win_addr == -1) {
> -fprintf(stderr, "IO window address not specified for PHB\n");
> -return -1;
> +error_setg(errp, "IO window address not specified for PHB");
> +return;
>  }
>  
>  if (find_phb(spapr, sphb->buid)) {
> -fprintf(stderr, "PCI host bridges must have unique BUIDs\n");
> -return -1;
> +error_setg(errp, "PCI host bridges must have unique BUIDs");
> +return;
>  }
>  
>  sphb->dtbusname = g_strdup_printf("pci@%" PRIx64, sphb->buid);
> @@ -613,8 +614,9 @@ static int spapr_phb_init(SysBusDevice *s)
>  sphb->tcet = spapr_tce_new_table(dev, sphb->dma_liobn,
>   sphb->dma_window_size);
>  if (!sphb->tcet) {
> -fprintf(stderr, "Unable to create TCE table for %s\n", 
> sphb->dtbusname);
> -return -1;
> +error_setg(errp, "Unable to create TCE table for %s",
> +   sphb->dtbusname);
> +return;
>  }
>  address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet),
> sphb->dtbusname);
> @@ -631,13 +633,12 @@ static int spapr_phb_init(SysBusDevice *s)
>  
>  irq = spapr_allocate_lsi(0);
>  if (!irq) {
> -return -1;
> +error_setg(errp, "spapr_allocate_lsi failed");
> +return;
>  }
>  
>  sphb->lsi_table[i].irq = irq;
>  }
> -
> -return 0;
>  }
>  
>  static void spapr_phb_reset(DeviceState *qdev)
> @@ -720,11 +721,10 @@ static const char *spapr_phb_root_bus_path(PCIHostState 
> *host_bridge,
>  static void spapr_phb_class_init(ObjectClass *klass, void *data)
>  {
>  PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
> -SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  hc->root_bus_path = spapr_phb_root_bus_path;
> -sdc->init = spapr_phb_init;
> +dc->realize = spapr_phb_realize;
>  dc->props = spapr_phb_properties;
>  dc->reset = spapr_phb_reset;
>  dc->vmsd = &vmstate_spapr_pci;
> -- 
> 1.8.4.rc4
> 
> 
> 
> 



[Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V3

2014-03-06 Thread Mike Day
timerlists is a list of lists that holds active timers, among other
items. It is read-mostly. This patch converts read access to the
timerlists to use RCU.

Rather than introduce a second mutex for timerlists, which would
require nested mutexes to in order to write to the timerlists, use one
QemuMutex in the QemuClock structure for all write access to any list
hanging off the QemuClock structure.

This patch applies against Paolo Bonzini's rcu branch:
https://github.com/bonzini/qemu/tree/rcu and also requires the
previously submitted patch ae11e1c "Convert active timers list to use
RCU for read operations V3."

V3:
- timerlists modifications split to a separate patch (this one).
- Addressed comments from Alex Bligh and Paolo Bonzini.

Signed-off-by: Mike Day 
---
 qemu-timer.c | 31 ++-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index 57a1545..4144e54 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -74,6 +74,7 @@ struct QEMUTimerList {
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
 QemuEvent timers_done_ev;
+struct rcu_head rcu;
 };
 
 /**
@@ -111,6 +112,13 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 return timer_list;
 }
 
+static void reclaim_timerlist(struct rcu_head *rcu)
+{
+QEMUTimerList *tl = container_of(rcu, QEMUTimerList, rcu);
+g_free(tl);
+}
+
+
 void timerlist_free(QEMUTimerList *timer_list)
 {
 assert(!timerlist_has_timers(timer_list));
@@ -118,7 +126,7 @@ void timerlist_free(QEMUTimerList *timer_list)
 QLIST_REMOVE(timer_list, list);
 }
 qemu_event_destroy(&timer_list->timers_done_ev);
-g_free(timer_list);
+call_rcu1(&timer_list->rcu, reclaim_timerlist);
 }
 
 static void qemu_clock_init(QEMUClockType type)
@@ -143,9 +151,11 @@ void qemu_clock_notify(QEMUClockType type)
 {
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, &clock->timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) {
 timerlist_notify(timer_list);
 }
+rcu_read_unlock();
 }
 
 void qemu_clock_enable(QEMUClockType type, bool enabled)
@@ -157,9 +167,11 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 if (enabled && !old) {
 qemu_clock_notify(type);
 } else if (!enabled && old) {
-QLIST_FOREACH(tl, &clock->timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(tl, &clock->timerlists, list) {
 qemu_event_wait(&tl->timers_done_ev);
 }
+rcu_read_unlock();
 }
 }
 
@@ -243,10 +255,12 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type)
 int64_t deadline = -1;
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, &clock->timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) {
 deadline = qemu_soonest_timeout(deadline,
 timerlist_deadline_ns(timer_list));
 }
+rcu_read_unlock();
 return deadline;
 }
 
@@ -262,11 +276,13 @@ QEMUTimerList 
*qemu_clock_get_main_loop_timerlist(QEMUClockType type)
 
 void timerlist_notify(QEMUTimerList *timer_list)
 {
-if (timer_list->notify_cb) {
+rcu_read_lock();
+if (atomic_rcu_read(&timer_list->notify_cb)) {
 timer_list->notify_cb(timer_list->notify_opaque);
 } else {
 qemu_notify_event();
 }
+rcu_read_unlock();
 }
 
 /* Transition function to convert a nanosecond timeout to ms
@@ -585,13 +601,18 @@ void qemu_clock_register_reset_notifier(QEMUClockType 
type,
 Notifier *notifier)
 {
 QEMUClock *clock = qemu_clock_ptr(type);
+qemu_mutex_lock(&clock->timer_lock);
 notifier_list_add(&clock->reset_notifiers, notifier);
+qemu_mutex_unlock(&clock->timer_lock);
 }
 
 void qemu_clock_unregister_reset_notifier(QEMUClockType type,
   Notifier *notifier)
 {
+QEMUClock *clock = qemu_clock_ptr(type);
+qemu_mutex_lock(&clock->timer_lock);
 notifier_remove(notifier);
+qemu_mutex_unlock(&clock->timer_lock);
 }
 
 void init_clocks(void)
-- 
1.9.0




[Qemu-devel] [PATCH 1/2] [RFC] Convert active timers list to use RCU V3

2014-03-06 Thread Mike Day
active_timers is a list of timer lists, associated with a Qemu Clock,
that is read-mostly. This patch converts read accesses to the list to
use RCU, which should hopefully mitigate most of the synchronization
overhead. Write accesses are now via a mutex in the parent data
structure. Functions that change the change the parent data structure
are also protected by the mutex.

This patch applies against Paolo Bonzini's rcu branch:
https://github.com/bonzini/qemu/tree/rcu

V3:
- Address comments from Alex Bligh and Paolo Bonzini
- Move the mutex into the parent QemuClock structure

Signed-off-by: Mike Day 
---
 include/qemu/timer.h |  19 +
 qemu-timer.c | 111 +--
 2 files changed, 72 insertions(+), 58 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 5afcffc..f69ec49 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -5,8 +5,15 @@
 #include "qemu-common.h"
 #include "qemu/notify.h"
 
-/* timers */
+#ifdef __GNUC__
+#ifndef __ATOMIC_RELEASE
+#define __ATOMIC_RELEASE
+#endif
+#endif
+#include "qemu/atomic.h"
+#include "qemu/rcu.h"
 
+/* timers */
 #define SCALE_MS 100
 #define SCALE_US 1000
 #define SCALE_NS 1
@@ -61,6 +68,7 @@ struct QEMUTimer {
 void *opaque;
 QEMUTimer *next;
 int scale;
+struct rcu_head rcu;
 };
 
 extern QEMUTimerListGroup main_loop_tlg;
@@ -189,12 +197,6 @@ void qemu_clock_notify(QEMUClockType type);
  * @enabled: true to enable, false to disable
  *
  * Enable or disable a clock
- * Disabling the clock will wait for related timerlists to stop
- * executing qemu_run_timers.  Thus, this functions should not
- * be used from the callback of a timer that is based on @clock.
- * Doing so would cause a deadlock.
- *
- * Caller should hold BQL.
  */
 void qemu_clock_enable(QEMUClockType type, bool enabled);
 
@@ -543,7 +545,6 @@ void timer_del(QEMUTimer *ts);
  * freed while this function is running.
  */
 void timer_mod_ns(QEMUTimer *ts, int64_t expire_time);
-
 /**
  * timer_mod_anticipate_ns:
  * @ts: the timer
@@ -685,9 +686,7 @@ static inline int64_t qemu_soonest_timeout(int64_t 
timeout1, int64_t timeout2)
 void init_clocks(void);
 
 int64_t cpu_get_ticks(void);
-/* Caller must hold BQL */
 void cpu_enable_ticks(void);
-/* Caller must hold BQL */
 void cpu_disable_ticks(void);
 
 static inline int64_t get_ticks_per_sec(void)
diff --git a/qemu-timer.c b/qemu-timer.c
index fb9e680..57a1545 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -29,6 +29,7 @@
 #include "hw/hw.h"
 
 #include "qemu/timer.h"
+#include "qemu/rcu_queue.h"
 #ifdef CONFIG_POSIX
 #include 
 #endif
@@ -45,12 +46,10 @@
 /* timers */
 
 typedef struct QEMUClock {
-/* We rely on BQL to protect the timerlists */
 QLIST_HEAD(, QEMUTimerList) timerlists;
-
+QemuMutex timer_lock;
 NotifierList reset_notifiers;
 int64_t last;
-
 QEMUClockType type;
 bool enabled;
 
@@ -70,11 +69,11 @@ QEMUClock qemu_clocks[QEMU_CLOCK_MAX];
 
 struct QEMUTimerList {
 QEMUClock *clock;
-QemuMutex active_timers_lock;
 QEMUTimer *active_timers;
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
+QemuEvent timers_done_ev;
 };
 
 /**
@@ -87,6 +86,7 @@ struct QEMUTimerList {
  */
 static inline QEMUClock *qemu_clock_ptr(QEMUClockType type)
 {
+smp_rmb();
 return &qemu_clocks[type];
 }
 
@@ -107,7 +107,6 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 timer_list->clock = clock;
 timer_list->notify_cb = cb;
 timer_list->notify_opaque = opaque;
-qemu_mutex_init(&timer_list->active_timers_lock);
 QLIST_INSERT_HEAD(&clock->timerlists, timer_list, list);
 return timer_list;
 }
@@ -118,7 +117,7 @@ void timerlist_free(QEMUTimerList *timer_list)
 if (timer_list->clock) {
 QLIST_REMOVE(timer_list, list);
 }
-qemu_mutex_destroy(&timer_list->active_timers_lock);
+qemu_event_destroy(&timer_list->timers_done_ev);
 g_free(timer_list);
 }
 
@@ -129,6 +128,7 @@ static void qemu_clock_init(QEMUClockType type)
 clock->type = type;
 clock->enabled = true;
 clock->last = INT64_MIN;
+qemu_mutex_init(&clock->timer_lock);
 QLIST_INIT(&clock->timerlists);
 notifier_list_init(&clock->reset_notifiers);
 main_loop_tlg.tl[type] = timerlist_new(type, NULL, NULL);
@@ -148,13 +148,6 @@ void qemu_clock_notify(QEMUClockType type)
 }
 }
 
-/* Disabling the clock will wait for related timerlists to stop
- * executing qemu_run_timers.  Thus, this functions should not
- * be used from the callback of a timer that is based on @clock.
- * Doing so would cause a deadlock.
- *
- * Caller should hold BQL.
- */
 void qemu_clock_enable(QEMUClockType type, bool enabled)
 {
 QEMUClock *clock = qemu_clock_ptr(type);
@@ -172,7 +

[Qemu-devel] [PATCH 0/2][RFC] Convert Clock lists to RCU (V3)

2014-03-06 Thread Mike Day
The first patch in the series converts the
clock->timerlists->active_timers list to RCU. It also creates
QemuMutex in the parent QemuClock data structure.

The second patch converts clock->timerlists to RCU.

Mike Day (2):
  [RFC] Convert active timers list to use RCU V3
  [RFC] Convert Clock Timerlists to RCU V3

 include/qemu/timer.h |  19 ---
 qemu-timer.c | 142 ---
 2 files changed, 98 insertions(+), 63 deletions(-)

-- 
1.9.0




Re: [Qemu-devel] [PATCH] usb-ohci: add vmstate descriptor

2014-03-06 Thread Mike Day

Alexey Kardashevskiy  writes:

> This adds migration support for OHCI.
>
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Mike Day 

> ---
>  hw/usb/hcd-ohci.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> index e38cdeb..c42e091 100644
> --- a/hw/usb/hcd-ohci.c
> +++ b/hw/usb/hcd-ohci.c
> @@ -1984,6 +1984,17 @@ static Property ohci_pci_properties[] = {
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static const VMStateDescription vmstate_ohci = {
> +.name = "ohci",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.minimum_version_id_old = 1,
> +.fields  = (VMStateField[]) {
> +VMSTATE_PCI_DEVICE(parent_obj, OHCIPCIState),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  static void ohci_pci_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -1997,6 +2008,7 @@ static void ohci_pci_class_init(ObjectClass *klass, 
> void *data)
>  set_bit(DEVICE_CATEGORY_USB, dc->categories);
>  dc->desc = "Apple USB Controller";
>  dc->props = ohci_pci_properties;
> +dc->vmsd = &vmstate_ohci;
>  }
>  
>  static const TypeInfo ohci_pci_info = {
> -- 
> 1.8.4.rc4
>
>




Re: [Qemu-devel] [PATCH v7 2/2] target-ppc: spapr: e500: fix to use cpu_dt_id

2014-03-03 Thread Mike Day
On Sat, Feb 1, 2014 at 9:45 AM, Alexey Kardashevskiy  wrote:
> This makes use of @cpu_dt_id and related API in:
> 1. emulated XICS hypercall handlers as they receive fixed CPU indexes;
> 2. XICS-KVM to enable in-kernel XICS on right CPU;
> 3. device-tree renderer.
>
> This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
> can accept command-line CPU indexes again.
>
> This changes kvm_arch_vcpu_id() to use ppc_get_vcpu_dt_id() as at the moment
> KVM CPU id and device tree ID are calculated using the same algorithm.
>
Signed-off-by: Alexey Kardashevskiy 
Acked-by: Mike Day 

> ---
> Changes:
> v7:
> * replaced referencing to PowerPCCPU::parent_obj with the CPU macro
> ---
>  hw/intc/openpic_kvm.c   |  2 +-
>  hw/intc/xics.c  | 15 +--
>  hw/intc/xics_kvm.c  | 10 +-
>  hw/ppc/e500.c   |  7 +--
>  hw/ppc/spapr.c  |  9 +
>  hw/ppc/spapr_hcall.c|  6 +++---
>  hw/ppc/spapr_rtas.c | 14 +++---
>  target-ppc/kvm.c|  2 +-
>  target-ppc/translate_init.c |  1 +
>  9 files changed, 41 insertions(+), 25 deletions(-)
>
> diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c
> index c7f7b84..87fdb12 100644
> --- a/hw/intc/openpic_kvm.c
> +++ b/hw/intc/openpic_kvm.c
> @@ -228,7 +228,7 @@ int kvm_openpic_connect_vcpu(DeviceState *d, CPUState *cs)
>
>  encap.cap = KVM_CAP_IRQ_MPIC;
>  encap.args[0] = opp->fd;
> -encap.args[1] = cs->cpu_index;
> +encap.args[1] = kvm_arch_vcpu_id(cs);
>
>  return kvm_vcpu_ioctl(cs, KVM_ENABLE_CAP, &encap);
>  }
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index b437563..64aabe7 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -33,6 +33,17 @@
>  #include "qemu/error-report.h"
>  #include "qapi/visitor.h"
>
> +static int get_cpu_index_by_dt_id(int cpu_dt_id)
> +{
> +PowerPCCPU *cpu = ppc_get_vcpu_by_dt_id(cpu_dt_id);
> +
> +if (cpu) {
> +return cpu->parent_obj.cpu_index;
> +}
> +
> +return -1;
> +}
> +
>  void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
>  {
>  CPUState *cs = CPU(cpu);
> @@ -659,7 +670,7 @@ static target_ulong h_cppr(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  static target_ulong h_ipi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>target_ulong opcode, target_ulong *args)
>  {
> -target_ulong server = args[0];
> +target_ulong server = get_cpu_index_by_dt_id(args[0]);
>  target_ulong mfrr = args[1];
>
>  if (server >= spapr->icp->nr_servers) {
> @@ -728,7 +739,7 @@ static void rtas_set_xive(PowerPCCPU *cpu, 
> sPAPREnvironment *spapr,
>  }
>
>  nr = rtas_ld(args, 0);
> -server = rtas_ld(args, 1);
> +server = get_cpu_index_by_dt_id(rtas_ld(args, 1));
>  priority = rtas_ld(args, 2);
>
>  if (!ics_valid_irq(ics, nr) || (server >= ics->icp->nr_servers)
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index c203646..a5bbc24 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -65,7 +65,7 @@ static void icp_get_kvm_state(ICPState *ss)
>  ret = kvm_vcpu_ioctl(ss->cs, KVM_GET_ONE_REG, ®);
>  if (ret != 0) {
>  error_report("Unable to retrieve KVM interrupt controller state"
> -" for CPU %d: %s", ss->cs->cpu_index, strerror(errno));
> +" for CPU %ld: %s", kvm_arch_vcpu_id(ss->cs), 
> strerror(errno));
>  exit(1);
>  }
>
> @@ -97,7 +97,7 @@ static int icp_set_kvm_state(ICPState *ss, int version_id)
>  ret = kvm_vcpu_ioctl(ss->cs, KVM_SET_ONE_REG, ®);
>  if (ret != 0) {
>  error_report("Unable to restore KVM interrupt controller state (0x%"
> -PRIx64 ") for CPU %d: %s", state, ss->cs->cpu_index,
> +PRIx64 ") for CPU %ld: %s", state, kvm_arch_vcpu_id(ss->cs),
>  strerror(errno));
>  return ret;
>  }
> @@ -325,15 +325,15 @@ static void xics_kvm_cpu_setup(XICSState *icp, 
> PowerPCCPU *cpu)
>  struct kvm_enable_cap xics_enable_cap = {
>  .cap = KVM_CAP_IRQ_XICS,
>  .flags = 0,
> -.args = {icpkvm->kernel_xics_fd, cs->cpu_index, 0, 0},
> +.args = {icpkvm->kernel_xics_fd, kvm_arch_vcpu_id(cs), 0, 0},
>  };
>
>  ss->cs = cs;
>
>  ret = kvm_vcpu_ioctl(ss->cs, KVM_ENABLE_CAP, &xics_enable_cap);
>  if (ret < 0) {
> -error_report("Unable to connect CPU%d t

Re: [Qemu-devel] [PATCH v7 1/2] target-ppc: add PowerPCCPU::cpu_dt_id

2014-03-03 Thread Mike Day
On Sat, Feb 1, 2014 at 9:45 AM, Alexey Kardashevskiy  wrote:
> Normally CPUState::cpu_index is used to pick the right CPU for various
> operations. However default consecutive numbering does not always work
> for POWERPC.
>
> These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX
> and used to call KVM VCPU's ioctls. In order to achieve this,
> kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies
> cpu_index by the number of threads per core.
>
> This approach has disadvantages such as:
> 1. NUMA configuration stays broken after the fixup;
> 2. CPU-targeted commands from the QEMU Monitor do not work properly as
> CPU indexes have been fixed and there is no clear way for the user to
> know what the new CPU indexes are.
>
> This introduces a @cpu_dt_id field in the CPUPPCState struct which
> is initialized from @cpu_index by default and can be fixed later
> to meet the device tree requirements.
>
> This adds an API to handle @cpu_dt_id.
>
> This removes kvmppc_fixup_cpu() as it is not more needed, @cpu_dt_id
> is calculated in ppc_cpu_realize().
>
> This will be used later in machine code.
>
Signed-off-by: Alexey Kardashevskiy 
Acked-by: Mike Day 

> ---
> Changes:
> v6: inlined kvmppc_fixup_cpu()
> ---
>  hw/ppc/ppc.c| 22 ++
>  target-ppc/cpu-qom.h|  2 ++
>  target-ppc/cpu.h| 18 ++
>  target-ppc/kvm.c| 13 -
>  target-ppc/kvm_ppc.h|  6 --
>  target-ppc/translate_init.c | 10 --
>  6 files changed, 46 insertions(+), 25 deletions(-)
>
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 114be64..0e82719 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -26,6 +26,7 @@
>  #include "hw/ppc/ppc_e500.h"
>  #include "qemu/timer.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/cpus.h"
>  #include "hw/timer/m48t59.h"
>  #include "qemu/log.h"
>  #include "hw/loader.h"
> @@ -1362,3 +1363,24 @@ int PPC_NVRAM_set_params (nvram_t *nvram, uint16_t 
> NVRAM_size,
>
>  return 0;
>  }
> +
> +/* CPU device-tree ID helpers */
> +int ppc_get_vcpu_dt_id(PowerPCCPU *cpu)
> +{
> +return cpu->cpu_dt_id;
> +}
> +
> +PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id)
> +{
> +CPUState *cs;
> +
> +CPU_FOREACH(cs) {
> +PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +if (cpu->cpu_dt_id == cpu_dt_id) {
> +return cpu;
> +}
> +}
> +
> +return NULL;
> +}
> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
> index 72b2232..b17c024 100644
> --- a/target-ppc/cpu-qom.h
> +++ b/target-ppc/cpu-qom.h
> @@ -79,6 +79,7 @@ typedef struct PowerPCCPUClass {
>  /**
>   * PowerPCCPU:
>   * @env: #CPUPPCState
> + * @cpu_dt_id: CPU index used in the device tree. KVM uses this index too
>   *
>   * A PowerPC CPU.
>   */
> @@ -88,6 +89,7 @@ typedef struct PowerPCCPU {
>  /*< public >*/
>
>  CPUPPCState env;
> +int cpu_dt_id;
>  } PowerPCCPU;
>
>  static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env)
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 51bcd4a..d8577ae 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -2154,4 +2154,22 @@ static inline bool cpu_has_work(CPUState *cpu)
>
>  void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env);
>
> +/**
> + * ppc_get_vcpu_dt_id:
> + * @cs: a PowerPCCPU struct.
> + *
> + * Returns a device-tree ID for a CPU.
> + */
> +int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
> +
> +/**
> + * ppc_get_vcpu_by_dt_id:
> + * @cpu_dt_id: a device tree id
> + *
> + * Searches for a CPU by @cpu_dt_id.
> + *
> + * Returns: a PowerPCCPU struct
> + */
> +PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
> +
>  #endif /* !defined (__CPU_PPC_H__) */
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 781b72f..8bcc5fb 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -1766,19 +1766,6 @@ static void kvmppc_host_cpu_class_init(ObjectClass 
> *oc, void *data)
>  }
>  }
>
> -int kvmppc_fixup_cpu(PowerPCCPU *cpu)
> -{
> -CPUState *cs = CPU(cpu);
> -int smt;
> -
> -/* Adjust cpu index for SMT */
> -smt = kvmppc_smt_threads();
> -cs->cpu_index = (cs->cpu_index / smp_threads) * smt
> -+ (cs->cpu_index % smp_threads);
> -
> -return 0;
> -}
> -
>  bool kvmppc_has_cap_epr(void)
>  {
>  return cap_epr;
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 5f78e4b..f3afcdb 1

Re: [Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V2

2014-03-03 Thread Mike Day
On Fri, Feb 28, 2014 at 3:05 PM, Alex Bligh  wrote:

>> Rather than introduce a second mutex for timerlists, which would
>> require nested mutexes to in orderwrite to the timerlists, use one
>> QemuMutex in the QemuClock structure for all write access to any list
>> hanging off the QemuClock structure.
>
> I sort of wonder why you don't just use one Mutex across the whole
> of QEMU rather than one per clock.
>
> This would save worrying about whether when you do things like:
>   qemu_mutex_lock(&timer_list->clock->clock_mutex);

I like it, it solves another problem I spotted after I submitted the patch.

>
>> @@ -108,19 +108,24 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
>> timer_list->clock = clock;
>> timer_list->notify_cb = cb;
>> timer_list->notify_opaque = opaque;
>> -qemu_mutex_init(&timer_list->active_timers_lock);
>> +qemu_mutex_init(&clock->clock_mutex);
>> QLIST_INSERT_HEAD(&clock->timerlists, timer_list, list);
>> return timer_list;
>> }
>
> That can't be right, you are calling qemu_mutex_init for each
> timerlist, but the timerlists share the mutex which is now in the
> clock. The mutex should therefore be initialized in qemu_clock_init.

> Also, clock_mutex appears never to get destroyed.

Yes, thank you, on both points.

>
>> QEMUClockType timerlist_get_clock(QEMUTimerList *timer_list)
>> {
>> -return timer_list->clock->type;
>> +return atomic_rcu_read(&timer_list->clock->type);
>> }
>
> I don't think this works because of the double indirection. It's
> The '&' means it's not actually dereferencing clock->type, but it
> is dereferencing timer_list->clock outside of either an rcu read
> lock or an atomic read. I think you'd be better with than
> rcu_read_lock() etc. (sadly).

I should fix this with parenthesis as in &(timer_list->clock->type)

>> QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type)
>> @@ -261,11 +272,13 @@ QEMUTimerList 
>> *qemu_clock_get_main_loop_timerlist(QEMUClockType type)
>>
>> void timerlist_notify(QEMUTimerList *timer_list)
>> {
>> -if (timer_list->notify_cb) {
>> +rcu_read_lock();
>> +if (atomic_rcu_read(&timer_list->notify_cb)) {
>> timer_list->notify_cb(timer_list->notify_opaque);
>> } else {
>> qemu_notify_event();
>> }
>> +rcu_read_unlock();
>> }
>
> If you have the rcu_read_lock why do you need the atomic_rcu_read?
> And if you need it, why do you not need it on the following line?

rcu_read_lock/unlock and atomic_rcu_read/set do different things and
frequently need to be used together. rcu_read_lock/unlock causes the
thread to enter an RCU critical section by getting a counter out of
TLS. atomic_rcu_read/set will use memory barriers to flush caches
(depending on the memory model and platform) and ensure that reads and
writes are ordered. You typically would use atomic_rcu_read on the
first read, thereafter the memory is up-to-date and writes have been
flushed. And you typically use atomic_rcu_set on the last write, when
the data structure is complete. You don't need to use them on every
update. Just for completeness, rcu_read_unlock ends the rcu critical
section but its usually a no-op.

> However, as any QEMUTimerList can (now) be reclaimed, surely all
> callers of this function are going to need to hold the rcu_read_lock
> anyway?

Yes

> I think this is used only by timerlist_rearm and qemu_clock_notify.
> Provided these call this function holding the rcu_read_lock
> I don't think this function needs changing.
>
>> /* Transition function to convert a nanosecond timeout to ms
>> @@ -389,24 +402,30 @@ static void timerlist_rearm(QEMUTimerList *timer_list)
>> /* stop a timer, but do not dealloc it */
>> void timer_del(QEMUTimer *ts)
>> {
>> -QEMUTimerList *timer_list = ts->timer_list;
>> +QEMUTimerList *timer_list;
>>
>> -qemu_mutex_lock(&timer_list->active_timers_lock);
>> +timer_list = atomic_rcu_read(&ts->timer_list);
>> +rcu_read_lock();
>> +qemu_mutex_lock(&timer_list->clock->clock_mutex);
>> +rcu_read_unlock();
>> timer_del_locked(timer_list, ts);
>> -qemu_mutex_unlock(&timer_list->active_timers_lock);
>> +qemu_mutex_unlock(&timer_list->clock->clock_mutex);
>> }
>
> Again in my ignorance of RCU I don't know whether taking a mutex
> implicitly takes an rcu read lock. If not, that rcu_read_unlock
> should be after the mutex unlock.

I am going to change this because of a race condition, To answer your
question, holding a mutex does not imply holding an rcu read lock. It
does indicate the potential need for readers to use rcu to read the
list because they can do so when the mutex holder is updating the
list. And, I'm working under the assumption that holding a mutex
implies a memory barrier so there is no need for atomic_rcu_read/set.

>
>> @@ -447,7 +468,10 @@ void timer_mod_anticipate(QEMUTimer *ts, int64_t 
>> expire_time)
>>
>> bool timer_pending(QEMUTimer *ts)
>> {
>> -return ts->expire_time >= 0;
>> +int64

[Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V2

2014-02-27 Thread Mike Day
timerlists is a list of lists that holds active timers, among other
items. It is read-mostly. This patch converts read access to the
timerlists to use RCU.

Rather than introduce a second mutex for timerlists, which would
require nested mutexes to in orderwrite to the timerlists, use one
QemuMutex in the QemuClock structure for all write access to any list
hanging off the QemuClock structure.

This patch also protects timerlists->active_timers->timer_list by the
new clock mutex for write access and by RCU for read access.

This patch applies against Paolo Bonzini's rcu branch:
https://github.com/bonzini/qemu/tree/rcu and also requires the
previously submitted patch 03fba95 "Convert active timers list to use
RCU for read operations V2."

V2:
- timerlists modifications split to a separate patch (this one).
- Addressed Alex Blighs comments.

Signed-off-by: Mike Day 
---
 qemu-timer.c | 79 +---
 1 file changed, 54 insertions(+), 25 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index dad30a3..e335a7a 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -47,7 +47,7 @@
 
 typedef struct QEMUClock {
 QLIST_HEAD(, QEMUTimerList) timerlists;
-
+QemuMutex clock_mutex;
 NotifierList reset_notifiers;
 int64_t last;
 QEMUClockType type;
@@ -69,12 +69,12 @@ QEMUClock qemu_clocks[QEMU_CLOCK_MAX];
 
 struct QEMUTimerList {
 QEMUClock *clock;
-QemuMutex active_timers_lock;
 QEMUTimer *active_timers;
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
 QemuEvent timers_done_ev;
+struct rcu_head rcu;
 };
 
 /**
@@ -108,19 +108,24 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 timer_list->clock = clock;
 timer_list->notify_cb = cb;
 timer_list->notify_opaque = opaque;
-qemu_mutex_init(&timer_list->active_timers_lock);
+qemu_mutex_init(&clock->clock_mutex);
 QLIST_INSERT_HEAD(&clock->timerlists, timer_list, list);
 return timer_list;
 }
 
+static void reclaim_timerlist(struct rcu_head *rcu)
+{
+QEMUTimerList *tl = container_of(rcu, QEMUTimerList, rcu);
+g_free(tl);
+}
+
 void timerlist_free(QEMUTimerList *timer_list)
 {
 assert(!timerlist_has_timers(timer_list));
 if (timer_list->clock) {
 QLIST_REMOVE(timer_list, list);
 }
-qemu_mutex_destroy(&timer_list->active_timers_lock);
-g_free(timer_list);
+call_rcu1(&timer_list->rcu, reclaim_timerlist);
 }
 
 static void qemu_clock_init(QEMUClockType type)
@@ -144,9 +149,11 @@ void qemu_clock_notify(QEMUClockType type)
 {
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, &clock->timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) {
 timerlist_notify(timer_list);
 }
+rcu_read_unlock();
 }
 
 void qemu_clock_enable(QEMUClockType type, bool enabled)
@@ -154,13 +161,15 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 QEMUClock *clock = qemu_clock_ptr(type);
 QEMUTimerList *tl;
 bool old = clock->enabled;
-clock->enabled = enabled;
+atomic_rcu_set(&clock->enabled, enabled);
 if (enabled && !old) {
 qemu_clock_notify(type);
 } else if (!enabled && old) {
-QLIST_FOREACH(tl, &clock->timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(tl, &clock->timerlists, list) {
 qemu_event_wait(&tl->timers_done_ev);
 }
+rcu_read_unlock();
 }
 }
 
@@ -242,16 +251,18 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type)
 int64_t deadline = -1;
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, &clock->timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) {
 deadline = qemu_soonest_timeout(deadline,
 timerlist_deadline_ns(timer_list));
 }
+rcu_read_unlock();
 return deadline;
 }
 
 QEMUClockType timerlist_get_clock(QEMUTimerList *timer_list)
 {
-return timer_list->clock->type;
+return atomic_rcu_read(&timer_list->clock->type);
 }
 
 QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type)
@@ -261,11 +272,13 @@ QEMUTimerList 
*qemu_clock_get_main_loop_timerlist(QEMUClockType type)
 
 void timerlist_notify(QEMUTimerList *timer_list)
 {
-if (timer_list->notify_cb) {
+rcu_read_lock();
+if (atomic_rcu_read(&timer_list->notify_cb)) {
 timer_list->notify_cb(timer_list->notify_opaque);
 } else {
 qemu_notify_event();
 }
+rcu_read_unlock();
 }
 
 /* Transition function to convert a nanosecond timeout to ms
@@ -389,24 +402,30 @@ static void timerlist

[Qemu-devel] [PATCH 0/2] [RFC] Convert Clock lists to RCU (V2)

2014-02-27 Thread Mike Day
The first patch in the series convers the
clock->timerlists->active_timers list to RCU. The second patch
converts clock->timerlists to RCU and also protects access to
timerlists->active_timers->timer_list.

Mike Day (2):
  [RFC] Convert active timers list to use RCU V2
  [RFC] Convert Clock Timerlists to RCU V2

 include/qemu/timer.h |  19 ---
 qemu-timer.c | 148 +++
 2 files changed, 98 insertions(+), 69 deletions(-)

-- 
1.8.5.3




[Qemu-devel] [PATCH 1/2] [RFC] Convert active timers list to use RCU V2

2014-02-27 Thread Mike Day
active_timers is a list of timer lists, associated with a Qemu Clock,
that is read-mostly. This patch converts read accesses to the list to
use RCU, which should hopefully mitigate most of the synchronization
overhead.

This patch applies against Paolo Bonzini's rcu branch:
https://github.com/bonzini/qemu/tree/rcu

V2:
- Addresses comments from Alex Bligh

Signed-off-by: Mike Day 
---
 include/qemu/timer.h | 19 +++
 qemu-timer.c | 69 ++--
 2 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 5afcffc..f69ec49 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -5,8 +5,15 @@
 #include "qemu-common.h"
 #include "qemu/notify.h"
 
-/* timers */
+#ifdef __GNUC__
+#ifndef __ATOMIC_RELEASE
+#define __ATOMIC_RELEASE
+#endif
+#endif
+#include "qemu/atomic.h"
+#include "qemu/rcu.h"
 
+/* timers */
 #define SCALE_MS 100
 #define SCALE_US 1000
 #define SCALE_NS 1
@@ -61,6 +68,7 @@ struct QEMUTimer {
 void *opaque;
 QEMUTimer *next;
 int scale;
+struct rcu_head rcu;
 };
 
 extern QEMUTimerListGroup main_loop_tlg;
@@ -189,12 +197,6 @@ void qemu_clock_notify(QEMUClockType type);
  * @enabled: true to enable, false to disable
  *
  * Enable or disable a clock
- * Disabling the clock will wait for related timerlists to stop
- * executing qemu_run_timers.  Thus, this functions should not
- * be used from the callback of a timer that is based on @clock.
- * Doing so would cause a deadlock.
- *
- * Caller should hold BQL.
  */
 void qemu_clock_enable(QEMUClockType type, bool enabled);
 
@@ -543,7 +545,6 @@ void timer_del(QEMUTimer *ts);
  * freed while this function is running.
  */
 void timer_mod_ns(QEMUTimer *ts, int64_t expire_time);
-
 /**
  * timer_mod_anticipate_ns:
  * @ts: the timer
@@ -685,9 +686,7 @@ static inline int64_t qemu_soonest_timeout(int64_t 
timeout1, int64_t timeout2)
 void init_clocks(void);
 
 int64_t cpu_get_ticks(void);
-/* Caller must hold BQL */
 void cpu_enable_ticks(void);
-/* Caller must hold BQL */
 void cpu_disable_ticks(void);
 
 static inline int64_t get_ticks_per_sec(void)
diff --git a/qemu-timer.c b/qemu-timer.c
index fb9e680..dad30a3 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -29,6 +29,7 @@
 #include "hw/hw.h"
 
 #include "qemu/timer.h"
+#include "qemu/rcu_queue.h"
 #ifdef CONFIG_POSIX
 #include 
 #endif
@@ -45,12 +46,10 @@
 /* timers */
 
 typedef struct QEMUClock {
-/* We rely on BQL to protect the timerlists */
 QLIST_HEAD(, QEMUTimerList) timerlists;
 
 NotifierList reset_notifiers;
 int64_t last;
-
 QEMUClockType type;
 bool enabled;
 
@@ -75,6 +74,7 @@ struct QEMUTimerList {
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
+QemuEvent timers_done_ev;
 };
 
 /**
@@ -87,6 +87,7 @@ struct QEMUTimerList {
  */
 static inline QEMUClock *qemu_clock_ptr(QEMUClockType type)
 {
+smp_rmb();
 return &qemu_clocks[type];
 }
 
@@ -148,13 +149,6 @@ void qemu_clock_notify(QEMUClockType type)
 }
 }
 
-/* Disabling the clock will wait for related timerlists to stop
- * executing qemu_run_timers.  Thus, this functions should not
- * be used from the callback of a timer that is based on @clock.
- * Doing so would cause a deadlock.
- *
- * Caller should hold BQL.
- */
 void qemu_clock_enable(QEMUClockType type, bool enabled)
 {
 QEMUClock *clock = qemu_clock_ptr(type);
@@ -172,7 +166,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 
 bool timerlist_has_timers(QEMUTimerList *timer_list)
 {
-return !!timer_list->active_timers;
+return !!atomic_rcu_read(&timer_list->active_timers);
 }
 
 bool qemu_clock_has_timers(QEMUClockType type)
@@ -184,16 +178,17 @@ bool qemu_clock_has_timers(QEMUClockType type)
 bool timerlist_expired(QEMUTimerList *timer_list)
 {
 int64_t expire_time;
+bool ret;
 
-qemu_mutex_lock(&timer_list->active_timers_lock);
-if (!timer_list->active_timers) {
-qemu_mutex_unlock(&timer_list->active_timers_lock);
+rcu_read_lock();
+if (!atomic_rcu_read(&timer_list->active_timers)) {
+rcu_read_unlock();
 return false;
 }
 expire_time = timer_list->active_timers->expire_time;
-qemu_mutex_unlock(&timer_list->active_timers_lock);
-
-return expire_time < qemu_clock_get_ns(timer_list->clock->type);
+ret = (expire_time < qemu_clock_get_ns(timer_list->clock->type));
+rcu_read_unlock();
+return ret;
 }
 
 bool qemu_clock_expired(QEMUClockType type)
@@ -220,16 +215,16 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list)
  * value but ->notify_cb() is called when the deadline changes.  Therefore
  * the caller should notice the change and there is no race condition.
  */
-

Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU

2014-02-17 Thread Mike Day
> 1. You seem to be removing the use of the active_timers_lock and replacing it 
> by
>rcu (fine). However, you seem to have left the qemu_mutex_destroy in
>timerlist_free, and left the mutex in QEMUTimerList. Any reason why we 
> need  both?
>

I responded incorrectly to this yesterday. We still need the mutex
here (active_timers_lock) to provide synchronization for list updates.
The difference now is that we don't need to hold the mutex for
traversing the list. But to update the list we still need to hold the
mutex.

Mike



Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU

2014-02-15 Thread Mike Day
first not be a
>rcu_read_lock()?
Yes, thank you.

> 8. Again, may be my reading of the diff, but you appear to have
>introduced a rcu_read_lock() outside the loop, but then
>unlock it within the loop. If the loop iterates more than once,
>won't it do more unlocks than locks?

Yes, I think the right fix is to omit the last rcu_read_unlock.

-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU

2014-02-13 Thread Mike Day
On Thu, Feb 13, 2014 at 4:25 AM, Paolo Bonzini  wrote:
> Il 13/02/2014 10:11, Alex Bligh ha scritto:
>>
>> I'll certainly have a look through this. However before I do, what
>> problem is this trying to solve? Do we think there is possibility
>> of contention on the active timers lock? I used to think this was
>> taken (let alone contented) relatively infrequently, but Rob Herring's
>> recent email suggests to me the list is being modified in some
>> circumstances rather more frequently than I thought.
>
> I think that, more than contention, it tries to reduce the cost of
> synchronization primitives, especially the locking and unlocking of the list
> around the invocation of timer callbacks.

Yes, the assumption is that the active timers are a read-mostly list,
so rcu is a win.

Mike



[Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU

2014-02-12 Thread Mike Day

Allow readers to use RCU when reading Qemu timer lists. Applies to
Paolo Bonzini's RCU branch, https://github.com/bonzini/qemu/tree/rcu.

This patch is for comment and review only.  The rcu branch needs to be
rebased on upstream.

Signed-off-by: Mike Day 
---
 include/qemu/timer.h |  9 +-
 qemu-timer.c | 88 +++-
 2 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index b58903b..5aaa213 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -4,7 +4,13 @@
 #include "qemu/typedefs.h"
 #include "qemu-common.h"
 #include "qemu/notify.h"
-
+#ifdef __GNUC__
+#ifndef __ATOMIC_RELEASE
+#define __ATOMIC_RELEASE
+#endif
+#endif
+#include "qemu/atomic.h"
+#include "qemu/rcu.h"
 /* timers */
 
 #define SCALE_MS 100
@@ -61,6 +67,7 @@ struct QEMUTimer {
 void *opaque;
 QEMUTimer *next;
 int scale;
+struct rcu_head rcu;
 };
 
 extern QEMUTimerListGroup main_loop_tlg;
diff --git a/qemu-timer.c b/qemu-timer.c
index 6b62e88..27285ca 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -29,6 +29,7 @@
 #include "hw/hw.h"
 
 #include "qemu/timer.h"
+#include "qemu/rcu_queue.h"
 #ifdef CONFIG_POSIX
 #include 
 #endif
@@ -49,7 +50,6 @@ typedef struct QEMUClock {
 
 NotifierList reset_notifiers;
 int64_t last;
-
 QEMUClockType type;
 bool enabled;
 } QEMUClock;
@@ -71,6 +71,7 @@ struct QEMUTimerList {
 QLIST_ENTRY(QEMUTimerList) list;
 QEMUTimerListNotifyCB *notify_cb;
 void *notify_opaque;
+struct rcu_head rcu;
 };
 
 /**
@@ -107,6 +108,12 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
 return timer_list;
 }
 
+static void reclaim_timer_list(struct rcu_head *rcu)
+{
+QEMUTimerList *timer_list = container_of(rcu, QEMUTimerList, rcu);
+g_free(timer_list);
+}
+
 void timerlist_free(QEMUTimerList *timer_list)
 {
 assert(!timerlist_has_timers(timer_list));
@@ -114,7 +121,7 @@ void timerlist_free(QEMUTimerList *timer_list)
 QLIST_REMOVE(timer_list, list);
 }
 qemu_mutex_destroy(&timer_list->active_timers_lock);
-g_free(timer_list);
+call_rcu1(&timer_list->rcu, reclaim_timer_list);
 }
 
 static void qemu_clock_init(QEMUClockType type)
@@ -138,9 +145,11 @@ void qemu_clock_notify(QEMUClockType type)
 {
 QEMUTimerList *timer_list;
 QEMUClock *clock = qemu_clock_ptr(type);
-QLIST_FOREACH(timer_list, &clock->timerlists, list) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) {
 timerlist_notify(timer_list);
 }
+rcu_read_unlock();
 }
 
 void qemu_clock_enable(QEMUClockType type, bool enabled)
@@ -155,7 +164,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
 
 bool timerlist_has_timers(QEMUTimerList *timer_list)
 {
-return !!timer_list->active_timers;
+return !!atomic_rcu_read(&timer_list->active_timers);
 }
 
 bool qemu_clock_has_timers(QEMUClockType type)
@@ -168,15 +177,15 @@ bool timerlist_expired(QEMUTimerList *timer_list)
 {
 int64_t expire_time;
 
-qemu_mutex_lock(&timer_list->active_timers_lock);
-if (!timer_list->active_timers) {
-qemu_mutex_unlock(&timer_list->active_timers_lock);
+rcu_read_lock();
+if (atomic_rcu_read(&timer_list->active_timers)) {
+rcu_read_unlock();
 return false;
 }
-expire_time = timer_list->active_timers->expire_time;
-qemu_mutex_unlock(&timer_list->active_timers_lock);
 
+expire_time = timer_list->active_timers->expire_time;
 return expire_time < qemu_clock_get_ns(timer_list->clock->type);
+rcu_read_unlock();
 }
 
 bool qemu_clock_expired(QEMUClockType type)
@@ -194,8 +203,10 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list)
 {
 int64_t delta;
 int64_t expire_time;
-
-if (!timer_list->clock->enabled) {
+bool enabled;
+rcu_read_lock();
+enabled = atomic_rcu_read(&timer_list->clock->enabled);
+if (!enabled) {
 return -1;
 }
 
@@ -203,16 +214,13 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list)
  * value but ->notify_cb() is called when the deadline changes.  Therefore
  * the caller should notice the change and there is no race condition.
  */
-qemu_mutex_lock(&timer_list->active_timers_lock);
 if (!timer_list->active_timers) {
-qemu_mutex_unlock(&timer_list->active_timers_lock);
+rcu_read_unlock();
 return -1;
 }
 expire_time = timer_list->active_timers->expire_time;
-qemu_mutex_unlock(&timer_list->active_timers_lock);
-
 delta = expire_time - qemu_clock_get_ns(timer_list->clock->type);
-
+rcu_read_unlock();
 if (delta <= 0) {
 return 0;
 }
@@ -230,16 +238,22 @@ int64

Re: [Qemu-devel] [PATCH v3] virtio: Introduce virtio-testdev

2014-01-30 Thread Mike Day

Andrew Jones  writes:

> This is a virtio version of hw/misc/debugexit and should evolve into a
> virtio version of pc-testdev. pc-testdev uses the PC's ISA bus, whereas
> this testdev can be plugged into a virtio-mmio transport, which is
> needed for kvm-unit-tests/arm. virtio-testdev uses the virtio device
> config space as a communication channel, and implements an RTAS-like
> protocol through it allowing guests to execute commands. Only three
> commands are currently implemented;
> 1) VERSION: for version compatibility checks
> 2) CLEAR:   set all the config space back to zero
> 3) EXIT:exit() from qemu with a status code

> +static uint32_t virtio_testdev_get_features(VirtIODevice *vdev, uint32_t f)
> +{
> +return f;
> +}
> +

Is this meant to be a stub currently? 

Mike

-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node

2014-01-20 Thread Mike Day
On Mon, Jan 20, 2014 at 12:24 PM, Michael Roth
 wrote:
> Quoting Alexey Kardashevskiy (2014-01-19 20:58:20)
>
> Would need to look at it a bit more closely to say for certain, but after
> discussing it a bit Tyrel/Mike, I think the main considerations would be:
>
> 1) PHB hotplug/unplug would need to signal a different event type in it's
>check-exception/epow message, we have stubs in place for a PHB event type,
>so that's mostly a matter of adding special-casing in the hotplug callback
>for spapr-pci-host-bridge devices
> 2) The required properties for the OF node corresponding PHB will be 
> different.
>Currently these are generated as part of the hotplug callback, and attached
>to the corresponding ConfigureConnectorState node to be fed to the guest
>via subsequent ibm,configure-connector RTAS calls, so we'd just hook the
>PHB's OF node generation code in there as.
> 3) The sysctl/kernel interface for handling PHB hotplug would be different,
>we'd be relying on the rpadlar kernel module
>(/sys/bus/pci/slots/control/add_slot) rather than rpaphp
>(/sys/bus/pci/slots//power) or the PCI rescan fallback.
>This is mostly a matter of modifying the handling in the guest tools, 
> namely
>in rtas_errd, to handle the event accordingly.
>
> We also haven't done anything extensive using rpadlpar operations within qemu
> guests, so there may be various odds/ends and possibly kernel changes needed 
> to
> get that working properly (as was the case for rpaphp, though thanks to the 
> PCI
> rescan workaround a new kernel isn't required for existing guests... a similar
> fallback likely won't be available for rpadlpar)
>
> But from a high-level view at least it seems fairly straight-forward. I'll see
> if we can get a prototype working.

The fact that it "just works" now by rescanning the pci filesystem is
a significant benefit. I don't think we want to lose it. There can be
many PHBs on one of these systems. Maybe we could make the PHB
hot-pluggable and also always have one PHB plugged in at startup. Then
the guest will see the bus when it starts and it will build the pci
file system.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-20 Thread Mike Day
Do you know which device is writing to the BAR below? From the trace
it appears it should be restoring the memory address to the BAR after
writing all 1s to the BAR and reading back the contents. (the protocol
for finding the length of the bar memory.)

On Thu, Jan 9, 2014 at 12:24 PM, Alex Williamson
 wrote:
> On Wed, 2013-12-11 at 20:30 +0200, Michael S. Tsirkin wrote:
>> From: Paolo Bonzini 
> vfio: vfio_pci_read_config(:01:10.0, @0x10, len=0x4) febe0004
> (save lower 32bits of BAR)
> vfio: vfio_pci_write_config(:01:10.0, @0x10, 0x, len=0x4)
> (write mask to BAR)

Here the device should restore the memory address (original contents)
to the BAR.

> vfio: region_del febe - febe3fff
> (memory region gets unmapped)
> vfio: vfio_pci_read_config(:01:10.0, @0x10, len=0x4) c004
> (read size mask)
> vfio: vfio_pci_write_config(:01:10.0, @0x10, 0xfebe0004, len=0x4)
> (restore BAR)
> vfio: region_add febe - febe3fff [0x7fcf3654d000]
> (memory region re-mapped)
> vfio: vfio_pci_read_config(:01:10.0, @0x14, len=0x4) 0
> (save upper 32bits of BAR)
> vfio: vfio_pci_write_config(:01:10.0, @0x14, 0x, len=0x4)
> (write mask to BAR)

and here ...

> vfio: region_del febe - febe3fff
> (memory region gets unmapped)
> vfio: region_add febe - febe3fff [0x7fcf3654d000]
> (memory region gets re-mapped with new address)
> qemu-system-x86_64: vfio_dma_map(0x7fcf38861710, 0xfebe, 0x4000, 
> 0x7fcf3654d000) = -14 (Bad address)
> (iommu barfs because it can only handle 48bit physical addresses)

I looked around some but I couldn't find an obvious culprit. Could it
be that the BAR is getting unmapped automatically due to
x-intx-mmap-timeout-ms before the device has a chance to finish
restoring the correct value to the BAR?

Mike



Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node

2014-01-20 Thread Mike Day
On Sun, Jan 19, 2014 at 9:58 PM, Alexey Kardashevskiy  wrote:
>
> I did not realize DRC is not just for PCI. How hard would it be to add hot
> plug support for a whole PHB? The current QEMU trend is to make QEMU
> monitor's "device_add" equal to the command line's "-device" which is not
> (yet) true for PHB but could be. Thanks.

We discussed this approach (hot-plug the whole bus) during the design
phase and at one point started to work on it. I don't think we
established whether or not the Linux sys/bus/pci/* file system would
work with it.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day
>>
>> The address above has already been masked. What you need to do is read
>> the BAR. If the value from the BAR end in '1', its MMIO. If it ends in
>> '10', its RAM. If it ends in '0n' its disabled. The first thing that
>> the PCI software does after reading the BAR is mask off the two low
>> bits.
>
> Are you perhaps confusing MMIO and I/O port?  I/O port cannot be mmap'd
> on x86, so it can't be directly mapped.  It also doesn't come through
> the address_space_memory filter.  I/O port is deprecated, or at least
> discouraged, MMIO is not.  Thanks,

You're right, sorry I missed that. It doesn't solve the problem.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day
On Tue, Jan 14, 2014 at 12:49 PM, Mike Day  wrote:
>>> > > >>>>>>>
>>> > > >>>>>>>>> Prior to this change, there was no re-map with the 
>>> > > >>>>>>>>> febe
>
>> If we choose not to map them, how do we distinguish them from guest RAM?
>> There's no MemoryRegion flag that I'm aware of to distinguish a ram_ptr
>> that points to a chunk of guest memory from one that points to the mmap
>> of a device BAR.  I think I'd need to explicitly walk all of the vfio
>> device and try to match the MemoryRegion pointer to one of my devices.
>> That only solves the problem for vfio devices and not ivshmem devices or
>> pci-assign devices.
>>
>
> I don't know if this will save you doing your memory region search or
> not. But a BAR that ends with the low bit set is MMIO, and BAR that
> ends with the low bit clear is RAM. So the address above is RAM as was
> pointed out earlier in the thread. If you got an ambitious address in
> the future you could test the low bit. But MMIO is deprecated
> according to http://wiki.osdev.org/PCI so you probably won't see it,
> at least for 64-bit addresses.

s/ambitious/ambiguous/

The address above has already been masked. What you need to do is read
the BAR. If the value from the BAR end in '1', its MMIO. If it ends in
'10', its RAM. If it ends in '0n' its disabled. The first thing that
the PCI software does after reading the BAR is mask off the two low
bits.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day
>> > > >>>
>> > > > Prior to this change, there was no re-map with the 
>> > > > febe

> If we choose not to map them, how do we distinguish them from guest RAM?
> There's no MemoryRegion flag that I'm aware of to distinguish a ram_ptr
> that points to a chunk of guest memory from one that points to the mmap
> of a device BAR.  I think I'd need to explicitly walk all of the vfio
> device and try to match the MemoryRegion pointer to one of my devices.
> That only solves the problem for vfio devices and not ivshmem devices or
> pci-assign devices.
>

I don't know if this will save you doing your memory region search or
not. But a BAR that ends with the low bit set is MMIO, and BAR that
ends with the low bit clear is RAM. So the address above is RAM as was
pointed out earlier in the thread. If you got an ambitious address in
the future you could test the low bit. But MMIO is deprecated
according to http://wiki.osdev.org/PCI so you probably won't see it,
at least for 64-bit addresses.

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day
On Tue, Jan 14, 2014 at 9:05 AM, Michael S. Tsirkin  wrote:
> On Tue, Jan 14, 2014 at 08:50:54AM -0500, Mike Day wrote:

>>
>> Also, both 32-bit and 64-bit BARs are required to be supported. It is
>> legal to construct a 64-bit BAR by masking all the high bits to
>> zero. Presumably it would be OK to mask the 16 high bits to zero as
>> well, constructing a 48-bit address.

> The question was whether addresses such as
> 0xfec0 can be a valid BAR value on these
> platforms, whether it's accessible to the CPU and
> to other PCI devices.

The answer has to be no at least for Linux. Linux uses the high bit of
the page table address as state to indicate a huge page and uses
48-bit addresses. Each PCI device is different but right now Power7
supports 16TB of RAM so I don't think the PCI bridge would necessarily
decode the high 16 bits of the memory address. For two PCI devices to
communicate with each other using 64-bit addresses they both need to
support 64-bit memory in the same address range, which is possible.
All this info subject to Paul Mackerras or Alexy …

Mike



Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide

2014-01-14 Thread Mike Day

"Michael S. Tsirkin"  writes:

> On Fri, Jan 10, 2014 at 08:31:36AM -0700, Alex Williamson wrote:

> Short term, just assume 48 bits on x86.
>
> We need to figure out what's the limitation on ppc and arm -
> maybe there's none and it can address full 64 bit range.
>
> Cc some people who might know about these platforms.

The document you need is here: 

http://goo.gl/fJYxdN

"PCI Bus Binding To: IEEE Std 1275-1994"

The short answer is that Power (OpenFirmware-to-PCI) supports both MMIO
and Memory mappings for BARs.

Also, both 32-bit and 64-bit BARs are required to be supported. It is
legal to construct a 64-bit BAR by masking all the high bits to
zero. Presumably it would be OK to mask the 16 high bits to zero as
well, constructing a 48-bit address.

Mike

-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-10 Thread Mike Day
On Fri, Jan 10, 2014 at 9:25 AM, Alexander Graf  wrote:
>>> On 01/11/2014 01:00 AM, Alexander Graf wrote:
>>
 Can't we determine the number of "default threads" at a common place,
 preferably derived from cpu type?
>>>
>>> We can do anything. I asked how exactly as I really (really) do not
>>> understand the details.
>>>
>>
>> Are you suggesting we create a dictionary with all the cpu type
>> information stored in it
>> (stepping, cores, threads, memory channels, caches) that we need to
>> keep updated?
>
> We can always talk in extremes :). Today we have a dictionary of core types 
> in QEMU. If a certain core type comes with a specific number of threads, 
> that's a property of the core, no?

Yes, very true. I'm just considering the eventual situation where each
cpu type has several or more child classes to represent different
configurations (threads, cores). That would be a lot more complicated
than now.

Mike



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-10 Thread Mike Day
On Fri, Jan 10, 2014 at 9:13 AM, Alexey Kardashevskiy  wrote:
> On 01/11/2014 01:00 AM, Alexander Graf wrote:

>> Can't we determine the number of "default threads" at a common place,
>> preferably derived from cpu type?
>
> We can do anything. I asked how exactly as I really (really) do not
> understand the details.
>

Are you suggesting we create a dictionary with all the cpu type
information stored in it
(stepping, cores, threads, memory channels, caches) that we need to
keep updated?



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-10 Thread Mike Day

Alexander Graf  writes:

>> The patch as it its now is very simple and well-contained. I wonder how
>> much it would expand if we added a max thread count to the cpu class. It
>> seems like the need for a max thread count is idiomatic to powerpc. 
>
> It's only ever useful on IBM POWER. Any other PowerPC system can partition 
> vcpus by host threads, it's only IBM POWER hardware that's as broken as it is.


> I do see that the user experience is slightly suboptimal, but by creating 
> this special case there's a good chance you're doing more harm than good.

All the problems you raise about special cases are true. But, as you
point out, ibm power is uniquely broken, which would possibly justify a
special case, unless there is a better general solution. In other words,
special cases exist for unique circumstances and if I understand you
correctly this is a unique circumstance.

Alexy explained the use case in his initial posting. The user needs to
allocate all threads of a core to an instance of KVM, but has no easy
way to obtain that information (threads per core) for the Qemu threads
option. So specifying threads="max" is a more user-friendly option. In
my understanding this is strictly a usability issue.

Mike
-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-10 Thread Mike Day

Alexey Kardashevskiy  writes:

> On 01/10/2014 10:40 AM, Alexander Graf wrote:
>> 
>
>> What if we make the max thread count a property of our cpu class? The
>> we
> can add a threads=max option which will be identical between kvm and tcg.
>
>
> You lost me here :)
> Right now the sequence is:
> 1. smp_parse
> 2. config_accelerator
> 3. machine_init
>
> I proposed
> 1. config_accelerator - reads max threads from KVM (and initializes "host"
> type)
> 2. smp_parse - does the parsing using smp_threads tweaked in 1)
> 3. machine_init - creates CPUs which may or may be not "host".

The patch as it its now is very simple and well-contained. I wonder how
much it would expand if we added a max thread count to the cpu class. It
seems like the need for a max thread count is idiomatic to powerpc. 

Mike

-- 
Mike Day | "Endurance is a Virtue"



Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core

2014-01-09 Thread Mike Day

Alexey Kardashevskiy  writes:

>  /* compute missing values, prefer sockets over cores over threads */
>  if (cpus == 0 || sockets == 0) {
>  sockets = sockets > 0 ? sockets : 1;
>  cores = cores > 0 ? cores : 1;
> -threads = threads > 0 ? threads : 1;
> +if (threads_max) {
> +if (threads > 0) {
> +fprintf(stderr, "Use either threads or threads_max\n");
> +exit(1);

If you went ahead with the threads="max" string option you wouldn't need
to check here for mutual excusivity and the user wouldn't need to worry
about an extra command options.

> +}
> +threads = smp_threads > 0 ? smp_threads : 1;
> +} else {
> +threads = threads > 0 ? threads : 1;
> +}
>  if (cpus == 0) {
>  cpus = cores * threads * sockets;
>  }

-- 
Mike Day | "Endurance is a Virtue"



[Qemu-devel] [PATCH] A hexdump function that also displays UTF-8 strings contained in the dumped buffer.

2013-11-11 Thread Mike Day
This function is used by a forthcomingQemu monitor command that dumps
contents of OpenFirmware Device Trees. It dumps contents of a buffer
as hex in the same format as the existing function but also also
appends any UTF-8 strings in human-readable format.

Like the existing hexdump function, this function may be used
elsewhere in Qemu, and it shares the same prototype as the existing
function.

In both functions, check for a NULL prefix parameter and omit printing
the prefix if it is null.

Here is a sample of the output of both functions with no prefix string:

:  61 62 33 64  62 65 65 66  65 62 34 64  66 62 65 03
0010:  67 62 35 64  68 01 05 03  69 62 36 64  6a 01 06 03
0020:  6b 62 37 64  6c 01 07 03  6d 62 38 64  6e 01 08 03
0030:  6f 62 39 64  70 01 09 03  71 62 78 64

:  61 62 33 64  62 65 65 66  65 62 34 64  66 62 65 03   ab3dbeefeb4dfbe.
0010:  67 62 35 64  68 01 05 03  69 62 36 64  6a 01 06 03   gb5dh...ib6dj...
0020:  6b 62 37 64  6c 01 07 03  6d 62 38 64  6e 01 08 03   kb7dl...mb8dn...
0030:  6f 62 39 64  70 01 09 03  71 62 78 64ob9dp...qbxd

Signed-off-by: Mike Day 
---
 include/qemu-common.h |  2 ++
 util/hexdump.c| 48 +++-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 5054836..7b8e2b9 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -435,6 +435,8 @@ int mod_utf8_codepoint(const char *s, size_t n, char **end);
  */
 
 void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
+/* include any strings alongside the hex output */
+void qemu_hexdump_str(gchar *buf, FILE *fp, const gchar *prefix, size_t len);
 
 /* vector definitions */
 #ifdef __ALTIVEC__
diff --git a/util/hexdump.c b/util/hexdump.c
index 969b340..a920c81 100644
--- a/util/hexdump.c
+++ b/util/hexdump.c
@@ -21,7 +21,11 @@ void qemu_hexdump(const char *buf, FILE *fp, const char 
*prefix, size_t size)
 
 for (b = 0; b < size; b++) {
 if ((b % 16) == 0) {
-fprintf(fp, "%s: %04x:", prefix, b);
+if (prefix) {
+fprintf(fp, "%s: %04x:", prefix, b);
+} else {
+fprintf(fp, "%04x:", b);
+}
 }
 if ((b % 4) == 0) {
 fprintf(fp, " ");
@@ -35,3 +39,45 @@ void qemu_hexdump(const char *buf, FILE *fp, const char 
*prefix, size_t size)
 fprintf(fp, "\n");
 }
 }
+
+/* print any strings along side the hex dump */
+void qemu_hexdump_str(gchar *buf, FILE *fp, const gchar *prefix, size_t len)
+{
+
+gchar *inp, *linep;
+int i, offset = 0;
+inp = linep = buf;
+
+do {
+if (prefix) {
+fprintf(fp, "%s: %04x:  ", prefix, offset);
+} else {
+fprintf(fp, "%04x:  ", offset);
+}
+for (i = 0; i < 16 && len > 0; i++, len--, offset++, inp++) {
+if (i && !(i % 4)) {
+fprintf(fp, " ");
+}
+fprintf(fp, "%02hx ", *inp);
+}
+int j;
+if (i < 16) {
+for (j = 16 - i; j; --j) {
+fprintf(fp, "   ");
+if (j && (!(j % 4))) {
+fprintf(fp, " ");
+}
+}
+}
+fprintf(fp, "  ");
+for (j = 0; j < i; j++) {
+if (*(linep + j) < 0x20 || *(linep + j) > 0x7e) {
+fprintf(fp, "%c", '.');
+} else {
+fprintf(fp, "%c", *(linep + j));
+}
+}
+fprintf(fp, "\n");
+linep = inp;
+} while (len);
+}



Re: [Qemu-devel] [RFC] SPAPR-PCI Hotplug Support in Qemu

2013-10-10 Thread Mike Day
Adding Anthony's corrected address.

On Thu, Oct 10, 2013 at 6:25 PM, Mike Day  wrote:
[RFC] SPAPR-PCI Hotplug Support in Qemu
>
> Background:
> ppc64 has a unique bus structure for PCI slots: each slot is connected
> to its PHB by a pci switch. This is true in some IBM hardware as well as
> paravirtual hardware (PAPR).
>
> SLOF firmware normally scans the hardware bus and creates the correct
> slot/PCI switch -complex in the open firmware device tree. It also
> configures the slot and PCI switch (BARs, etc.)
>
> For devices set up by platform firmware, each PCI device is attached to
> its PHB and correctly configured.
>
> For Linux hot-plugged devices running under PowerVM today, each device
> is created with a PCI switch hanging off the dev->subordinate
> pointer. (PowerVM gets this info from the open firmware device tree in
> rtas.)
>
> Problem:
>
> The Qemu hot-plug path doesn't anticipate a PCI switch being attached
> to every PHB slot.
>
> When hot-plugging a device, Qemu qdev creates the device, which allows
> the device to initialize itself. Qemu then passes this initialized
> device to the ppc PHB via the hot-plug path.[1]
>
> The current ppc hot-plug code then creates a device tree node for the
> device [2], and allocates resources (BARs etc) for the new device. [3]
>
> The ppc64 kernel expects each hot-plugged PCI device structure to
> point to a subordinate bus dev->subordinate. This assumption is held
> throughout the ppc PCI code, and there are numerous opportunities for
> panics when the device gets passed to a kernel routine with a
> subordinate pointer. [4]
>
>
> Proposed Solutions:
>
> (1) Create and hook an inert PCI switch to every hot-plugged PCI
> device in Qemu.
>
> (a) After the device has initialized itself, at hot-plug time, create a
> new PCI switch, configure the switch, allocate BARs, and attach the
> switch to the hot-plugged devices (dev->subordinate).
>
> (b) create a new device tree node that begins with the PCI switch and
> the parent of the hot-plugged device. Add the PCI switch/device
> complex to the device tree under the PHB.
>
> (2) Add each hot-plugged PCI device to its own complex of PHB
> (Processor Host Bus) and PCI switch.
>
> Simplify (1) by creating a new PHB for each hot-plugged device.
>
> (a) At PHB creation time, create a PCI switch device node for each PHB
> slot.
>
> (b) At hot-plug time, create and configure a new PHB and add the
> hot-plugged device to one of the slots. Configure and allocate
> resources as normally.
>
> Comments:
>
> The current code has only one PHB. We know we need to support more
> than one PHB ultimately. Solution #2 is consistent with this approach.
>
>
> [1]
https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c
>
> [2] ibm,rtas_configure_connector:
>
https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L575
>
> [3] spapr_phb_add_pci_dt
>
https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L900
>
> [4] dlpar_pci_add_bus
>
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/pci/hotplug/rpadlpar_core.c?id=8bf3379a74bc9132751bfa685bad2da318fd59d7#n165
>
>
> --
>
> Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
> "Endurance is a Virtue"


[Qemu-devel] [RFC] SPAPR-PCI Hotplug Support in Qemu

2013-10-10 Thread Mike Day

[RFC] SPAPR-PCI Hotplug Support in Qemu

Background:
ppc64 has a unique bus structure for PCI slots: each slot is connected
to its PHB by a pci switch. This is true in some IBM hardware as well as
paravirtual hardware (PAPR).

SLOF firmware normally scans the hardware bus and creates the correct
slot/PCI switch -complex in the open firmware device tree. It also
configures the slot and PCI switch (BARs, etc.)

For devices set up by platform firmware, each PCI device is attached to
its PHB and correctly configured.

For Linux hot-plugged devices running under PowerVM today, each device
is created with a PCI switch hanging off the dev->subordinate
pointer. (PowerVM gets this info from the open firmware device tree in
rtas.)

Problem:

The Qemu hot-plug path doesn't anticipate a PCI switch being attached
to every PHB slot. 

When hot-plugging a device, Qemu qdev creates the device, which allows
the device to initialize itself. Qemu then passes this initialized
device to the ppc PHB via the hot-plug path.[1]

The current ppc hot-plug code then creates a device tree node for the
device [2], and allocates resources (BARs etc) for the new device. [3]

The ppc64 kernel expects each hot-plugged PCI device structure to
point to a subordinate bus dev->subordinate. This assumption is held
throughout the ppc PCI code, and there are numerous opportunities for
panics when the device gets passed to a kernel routine with a
subordinate pointer. [4]


Proposed Solutions:

(1) Create and hook an inert PCI switch to every hot-plugged PCI
device in Qemu. 

(a) After the device has initialized itself, at hot-plug time, create a
new PCI switch, configure the switch, allocate BARs, and attach the
switch to the hot-plugged devices (dev->subordinate).

(b) create a new device tree node that begins with the PCI switch and
the parent of the hot-plugged device. Add the PCI switch/device
complex to the device tree under the PHB.

(2) Add each hot-plugged PCI device to its own complex of PHB
(Processor Host Bus) and PCI switch.

Simplify (1) by creating a new PHB for each hot-plugged device. 

(a) At PHB creation time, create a PCI switch device node for each PHB
slot. 

(b) At hot-plug time, create and configure a new PHB and add the
hot-plugged device to one of the slots. Configure and allocate
resources as normally. 

Comments:

The current code has only one PHB. We know we need to support more
than one PHB ultimately. Solution #2 is consistent with this approach.


[1] https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c

[2] ibm,rtas_configure_connector:
https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L575

[3] spapr_phb_add_pci_dt
https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L900

[4] dlpar_pci_add_bus
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/pci/hotplug/rpadlpar_core.c?id=8bf3379a74bc9132751bfa685bad2da318fd59d7#n165


-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
"Endurance is a Virtue"



Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-10-07 Thread Mike Day

Paolo Bonzini  writes:

> Il 30/09/2013 15:34, Alex Bligh ha scritto:
>> 
>> I think the most likely change here is that the walkers might
>> move outside the BQL. Given modification of this list is so rare,
>> the lock would be very very read heavy, so RCU is probably a
>> sensible option.
>
> I agree.  Keeping the write side on the BQL is sane, but RCU-protecting
> the read side actually makes the rules simpler.
>
> Mike, would you like to give it a shot?

Yes, I will. I'll have a patchset for review within a couple of days. 

Mike
-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
"Endurance is a Virtue"



Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-09-30 Thread Mike Day
On Mon, Sep 30, 2013 at 9:34 AM, Alex Bligh  wrote:
>>> > void qemu_clock_notify(QEMUClockType type)
>
> ...
>>>
>>> > int64_t qemu_clock_deadline_ns_all(QEMUClockType type)
>
> ...
>
>>> > I think these functions are always called now with the BQL held, so I
>>> > wonder if they are good candidates for RCU?
>>>
>>> These routines iterate through the list of timerlists held by
>>> a clock.
>>>
>>> They do not iterate through the list of active timers in a timer
>>> list. I believe the latter is what active_timers_lock protects.
>>>
>>> The list of timers attached to a clock is only modified when timers
>>> are created and deleted which is (currently) under the BQL.
>>
>> Sorry, and thanks for the correction re: active_timers_lock. I should
>> have said that clock->timerlists may need its own mutex if and when we
>> remove the BQL from the timer code.
>
>
> Correct. The list of timerlists is only modified when a
> QEMUTimerListGroup is created or destroyed (currently when an
> AioContext is created or destroyed), and that is done with the
> BQL held.
>
> As you point out, it's currently walked by a couple of functions
> that also have the BQL held.
>
> I think the most likely change here is that the walkers might
> move outside the BQL. Given modification of this list is so rare,
> the lock would be very very read heavy, so RCU is probably a
> sensible option.
>
> This link may (or may not) help in understanding:
>  http://blog.alex.org.uk/2013/08/24/changes-to-qemus-timer-system/

Alex, thanks for the link -

Mike


Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-09-30 Thread Mike Day
>
>
> On Mon, Sep 30, 2013 at 8:55 AM, Alex Bligh  wrote:
> >
> >
> > On 30 Sep 2013, at 13:45, Mike Day wrote:
> >
> > > I've applied this set to Paolo's rcu tree - I see a couple of routines
> > > that appear to need the active_timers_lock:
> > >
> > > (line 137 of qemu-timer.c in my tree)
> > > void qemu_clock_notify(QEMUClockType type)
> > > {
> > >QEMUTimerList *timer_list;
> > >QEMUClock *clock = qemu_clock_ptr(type);
> > >QLIST_FOREACH(timer_list, &clock->timerlists, list) {
> > >timerlist_notify(timer_list);
> > >}
> > > }
> > >
> > > (line 228 of qemu-timer.c in my tree)
> > > int64_t qemu_clock_deadline_ns_all(QEMUClockType type)
> > > {
> > >int64_t deadline = -1;
> > >QEMUTimerList *timer_list;
> > >QEMUClock *clock = qemu_clock_ptr(type);
> > >QLIST_FOREACH(timer_list, &clock->timerlists, list) {
> > >deadline = qemu_soonest_timeout(deadline,
> > >
>  timerlist_deadline_ns(timer_list));
> > >}
> > >return deadline;
> > > }
> > >
> > > I think these functions are always called now with the BQL held, so I
> > > wonder if they are good candidates for RCU?
> >
> > These routines iterate through the list of timerlists held by
> > a clock.
> >
> > They do not iterate through the list of active timers in a timer
> > list. I believe the latter is what active_timers_lock protects.
> >
> > The list of timers attached to a clock is only modified when timers
> > are created and deleted which is (currently) under the BQL.
> >
>
>
Sorry, and thanks for the correction re: active_timers_lock. I should have
said that clock->timerlists may need its own mutex if and when we remove
the BQL from the timer code.

Mike


Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-09-30 Thread Mike Day
ive_timers_lock);
> +for (t = timer_list->active_timers; t != NULL; t = t->next) {
>  if (t == ts) {
> -return true;
> +found = true;
> +break;
>  }
>  }
> -return false;
> +qemu_mutex_unlock(&timer_list->active_timers_lock);
> +return found;
>  }
>  
>  bool timer_expired(QEMUTimer *timer_head, int64_t current_time)
> @@ -369,23 +410,31 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
>  QEMUTimer *ts;
>  int64_t current_time;
>  bool progress = false;
> -   
> +QEMUTimerCB *cb;
> +void *opaque;
> +
>  if (!timer_list->clock->enabled) {
>  return progress;
>  }
>  
>  current_time = qemu_clock_get_ns(timer_list->clock->type);
>  for(;;) {
> +qemu_mutex_lock(&timer_list->active_timers_lock);
>  ts = timer_list->active_timers;
>  if (!timer_expired_ns(ts, current_time)) {
> +qemu_mutex_unlock(&timer_list->active_timers_lock);
>  break;
>  }
> +
>  /* remove timer from the list before calling the callback */
>  timer_list->active_timers = ts->next;
>  ts->next = NULL;
> +cb = ts->cb;
> +opaque = ts->opaque;
> +qemu_mutex_unlock(&timer_list->active_timers_lock);
>  
>  /* run the callback (the timer list can be modified) */
> -ts->cb(ts->opaque);
> +cb(opaque);
>  progress = true;
>  }
>  return progress;
> -- 
> 1.8.3.1
>
>

-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
"Endurance is a Virtue"



Re: [Qemu-devel] [PATCH v5 0/5] bugs fix for hpet

2013-09-26 Thread Mike Day

Paolo Bonzini  writes:

> Il 25/09/2013 08:27, liu ping fan ha scritto:
>> Hi, is hpet orphan? Or who can help me to merge this patch-set if my
>> patch is fine.
>
> Anthony, Michael?

Sorry, wrong Michael - 

Mike

-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
"Endurance is a Virtue"



Re: [Qemu-devel] [PATCH v5 0/5] bugs fix for hpet

2013-09-26 Thread Mike Day

Paolo Bonzini  writes:

> Il 25/09/2013 08:27, liu ping fan ha scritto:
>> Hi, is hpet orphan? Or who can help me to merge this patch-set if my
>> patch is fine.
>
> Anthony, Michael?

Yes, happy to help out with this. I'll start looking at it now and work
with Liu Ping, 

Mike

-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
"Endurance is a Virtue"



Re: [Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4,2

2013-09-09 Thread Mike Day
On Mon, Sep 9, 2013 at 12:21 PM, Paolo Bonzini  wrote:
>
> > @@ -601,12 +608,22 @@ static void reset_ram_globals(void)
> >  last_seen_block = NULL;
> >  last_sent_block = NULL;
> >  last_offset = 0;
> > -last_version = ram_list.version;
> >  ram_bulk_stage = true;
> > +smp_wmb();
> > +last_version = ram_list.version;
>
> This barrier is still not needed.

Yes, I agree, because of the calling context. It is brittle though because
reset_ram_globals is a static function and may be called differently in the
future (or by new code at a different location). The need for a barrier may
change and it would be opaque to the internals of the reset function. Also,
the globals are writable elsewhere in the file. It needs more organization
but I agree that should be a discrete patch.

> Can you take a look at my rcu branch?

The comments clarify to me why writing to last_mru does not need a write
barrier, thank you.

> I pushed there the conversion of mru_block to RCU (based on 4.1), and
> clarified a bit more the locking conventions.  Basically we have:
>
> - functions called under iothread lock (e.g. qemu_ram_alloc)
>
> - functions called under RCU or iothread lock (e.g. qemu_get_ram_ptr)
>
> - functions called under RCU or iothread lock, or while holding a
> reference to the MemoryRegion that is looked up again (e.g.
> qemu_ram_addr_from_host).
>
> The difference between the second and third group is simply that the
> second will not call rcu_read_lock()/rcu_read_unlock(), the third will.
>
> Does it make sense?  Should we simplify it by always calling
> rcu_read_lock()/rcu_read_unlock(), which removes the second group?

I think the benefits of simplification and future reliability are greater
than the performance cost of the rcu_read_lock. The latter should be
something we verify, though.

Thank you!

Mike


[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4,2

2013-09-09 Thread Mike Day
Changes from V4.1:

* Correct memory barriers for ram_list globals.

Changes from V4:

* rebased on https://github.com/bonzini/qemu/tree/rcu
  commit 965f3b2aac93bca6df50c86fb17a06b3c856fa30

Changes from V3:

* now passes virt-test -t qemu
* uses call_rcu instead of call_rcu1
* completely removed the ram_list mutex and its locking functions
* cleaned up some comments

Changes from V2:

* added rcu reclaim function to free ram blocks
* reduced the number of comments
* made rcu locking policy consistent for different caller scenarios
* rcu updates to ram_block now are protected only by the iothread mutex

Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow "unlocked" reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has been virt-tested using KVM -t qemu and
passes 15/15 migration tests.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day 

Signed-off-by: Mike Day 
---
 arch_init.c| 107 ++-
 exec.c | 167 +++--
 include/exec/cpu-all.h |  13 ++--
 3 files changed, 161 insertions(+), 126 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 0471cd5..e24d29e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/cpu-all.h"
 #include "hw/acpi/acpi.h"
+#include "qemu/rcu_queue.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -399,7 +400,8 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(&address_space_memory);
 
-QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block->mr,
addr, TARGET_PAGE_SIZE,
@@ -408,6 +410,8 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -447,6 +451,8 @@ static void migration_bitmap_sync(void)
  *
  * Returns:  The number of bytes written.
  *   0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -458,8 +464,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+
 if (!block)
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 
 while (true) {
 mr = block->mr;
@@ -470,9 +477,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset >= block->length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -527,9 +534,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+
 last_seen_block = block;
 last_offset = offset;
-
 return bytes_sent;
 }
 
@@ -566,10 +573,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, &ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next)
 total += block->length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -601,12 +608,22 @@ static void reset_ram_globals(void)
 last_seen_block = NULL;
 last_sent_block = NULL;
 last_offset = 0;
-last_version = ram_list.version;
 ram_bulk_stage = true;
+smp_wmb();
+last_version = ram_list.version;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each  implement a long-running RCU critical
+ * section. When rcu-reclaims in the code start to become numerous
+ * it will be ne

[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4.1

2013-09-05 Thread Mike Day
Changes from V4:

rebased on https://github.com/bonzini/qemu/tree/rcu
commit 965f3b2aac93bca6df50c86fb17a06b3c856fa30

This brings the patchset down to three files.

Changes from V3:

* now passes virt-test -t qemu
* uses call_rcu instead of call_rcu1
* completely removed the ram_list mutex and its locking functions
* cleaned up some comments

Changes from V2:

* added rcu reclaim function to free ram blocks
* reduced the number of comments
* made rcu locking policy consistent for different caller scenarios
* rcu updates to ram_block now are protected only by the iothread mutex

Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow "unlocked" reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has been virt-tested using KVM -t qemu and
passes 15/15 migration tests.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day 
---
 arch_init.c| 105 ++-
 exec.c | 164 +++--
 include/exec/cpu-all.h |  13 ++--
 3 files changed, 158 insertions(+), 124 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 0471cd5..29f1da2 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/cpu-all.h"
 #include "hw/acpi/acpi.h"
+#include "qemu/rcu_queue.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -399,7 +400,8 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(&address_space_memory);
 
-QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block->mr,
addr, TARGET_PAGE_SIZE,
@@ -408,6 +410,8 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -447,6 +451,8 @@ static void migration_bitmap_sync(void)
  *
  * Returns:  The number of bytes written.
  *   0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -458,8 +464,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+
 if (!block)
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 
 while (true) {
 mr = block->mr;
@@ -470,9 +477,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset >= block->length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -527,9 +534,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+
 last_seen_block = block;
 last_offset = offset;
-
 return bytes_sent;
 }
 
@@ -566,10 +573,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, &ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next)
 total += block->length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -603,10 +610,20 @@ static void reset_ram_globals(void)
 last_offset = 0;
 last_version = ram_list.version;
 ram_bulk_stage = true;
+smp_wmb();
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each  implement a long-running RCU critical
+ * section. When rcu-reclaims in the code start to become numerous
+ * it will be necessary to reduce the granularity of these critical
+ * sections.
+ *
+ * (ram_save_setup, ram_save_iterate, and ram_save_complete.)
+ */

Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V4

2013-09-04 Thread Mike Day
On Wed, Sep 4, 2013 at 3:58 PM, Paolo Bonzini  wrote:
>
> > @@ -1323,23 +1325,21 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t
addr)
> >  {
> >  RAMBlock *block;
> >
> > -/* The list is protected by the iothread lock here.  */
> > +/* This assumes the iothread lock is taken here too. */
> > +
> >  block = ram_list.mru_block;
> >  if (block && addr - block->offset < block->length) {
> > -goto found;
> > +return block;
> >  }
> > -QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> > +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> >  if (addr - block->offset < block->length) {
> > -goto found;
> > +atomic_rcu_set(&ram_list.mru_block, block);
>
> I think this is not needed, block was already published into the block
> list.  What is important is to order mru_block == NULL so that it
> happens before the node is removed.  Which we don't do, but we can fix
> later.

I am thinking of writing some macros and possibly reorganizing the ram
globals into a struct so that we can update it by exchanging pointers
atomically. It seems to disjoint right and error-prone now that we are
making it rcu-enabled.

thanks!

Mike


[Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V4

2013-09-04 Thread Mike Day
* now passes virt-test -t qemu
* uses call_rcu instead of call_rcu1
* completely removed the ram_list mutex and its locking functions
* cleaned up some comments

Changes from V3:

* added rcu reclaim function to free ram blocks
* reduced the number of comments
* made rcu locking policy consistent for different caller scenarios
* rcu updates to ram_block now are protected only by the iothread mutex

Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow "unlocked" reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has been virt-tested using KVM -t qemu and
passes 15/15 migration tests.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day 
---
 arch_init.c  |  96 +---
 exec.c   | 162 +--
 include/exec/cpu-all.h   |  13 ++--
 include/qemu/rcu_queue.h |   8 +++
 4 files changed, 160 insertions(+), 119 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 68a7ab7..76ef5c9 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/cpu-all.h"
 #include "hw/acpi/acpi.h"
+#include "qemu/rcu_queue.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -398,7 +399,8 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(&address_space_memory);
 
-QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block->mr,
addr, TARGET_PAGE_SIZE,
@@ -407,6 +409,8 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -446,6 +450,8 @@ static void migration_bitmap_sync(void)
  *
  * Returns:  The number of bytes written.
  *   0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -458,7 +464,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 ram_addr_t current_addr;
 
 if (!block)
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 
 while (true) {
 mr = block->mr;
@@ -469,9 +475,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset >= block->length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -565,10 +571,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, &ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next)
 total += block->length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -602,10 +608,20 @@ static void reset_ram_globals(void)
 last_offset = 0;
 last_version = ram_list.version;
 ram_bulk_stage = true;
+smp_wmb();
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each  implement a long-running RCU critical
+ * section. When rcu-reclaims in the code start to become numerous
+ * it will be necessary to reduce the granularity of these critical
+ * sections.
+ *
+ * (ram_save_setup, ram_save_iterate, and ram_save_complete.)
+ */
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
 RAMBlock *block;
@@ -631,7 +647,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+rcu_read_lock();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -641,13 +657,13 @@ static int ram_save_setup(QEMUFile *f

[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V3

2013-09-04 Thread Mike Day
Changes from V2:

* added rcu reclaim function to free ram blocks
* reduced the number of comments
* made rcu locking policy consistent for different caller scenarios
* rcu updates to ram_block now are protected only by the iothread mutex

Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow "unlocked" reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has not been tested further than that at
this point.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day 
---
 arch_init.c  | 103 ++
 exec.c   | 160 +--
 include/exec/cpu-all.h   |   6 +-
 include/qemu/rcu_queue.h |   8 +++
 4 files changed, 173 insertions(+), 104 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 68a7ab7..4cb1543 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/cpu-all.h"
 #include "hw/acpi/acpi.h"
+#include "qemu/rcu_queue.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -398,7 +399,8 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(&address_space_memory);
 
-QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block->mr,
addr, TARGET_PAGE_SIZE,
@@ -407,6 +409,8 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
+
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -446,6 +450,8 @@ static void migration_bitmap_sync(void)
  *
  * Returns:  The number of bytes written.
  *   0 means no dirty pages
+ *
+ * assumes that the caller has rcu-locked the ram_list
  */
 
 static int ram_save_block(QEMUFile *f, bool last_stage)
@@ -457,8 +463,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+
 if (!block)
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 
 while (true) {
 mr = block->mr;
@@ -469,9 +476,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset >= block->length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -526,9 +533,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+
 last_seen_block = block;
 last_offset = offset;
-
 return bytes_sent;
 }
 
@@ -565,10 +572,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, &ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next)
 total += block->length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -602,10 +609,20 @@ static void reset_ram_globals(void)
 last_offset = 0;
 last_version = ram_list.version;
 ram_bulk_stage = true;
+smp_wmb();
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+
+/* ram_save_* functions each  implement a long-running RCU critical
+ * section. When rcu-reclaims in the code start to become numerous
+ * it will be necessary to reduce the granularity of these critical
+ * sections.
+ *
+ * (ram_save_setup, ram_save_iterate, and ram_save_complete.)
+ */
+
 static int ram_save_setup(QEMUFile *f, void *opaque)
 {
 RAMBlock *block;
@@ -631,7 +648,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+rcu_read_lock();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -641,13 +658,13 @@ static 

Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2

2013-09-03 Thread Mike Day
On Tue, Sep 3, 2013 at 10:09 AM, Paolo Bonzini  wrote:
>
> Il 03/09/2013 15:56, Mike Day ha scritto:
> >> > +/* this implements a long-running RCU critical section.
> >> > + * When rcu reclaims in the code start to become numerous
> >> > + * it will be necessary to reduce the granularity of this
critical
> >> > + * section.
> >> > + */
> >>
> >> Please add the same comment (and a rcu_read_lock/unlock pair replacing
> >> the ramlist mutex) in ram_save_iterate, too.
> >
> > Just double checking on this particular change. In practice ram_save
> > manipulates the ram_list indirectly through ram_save_block. But I'm
> > assuming you want this change because of the ram state info that
> > persists between calls to ram_save (ram_list version in particular).
>
> ram_list.version is not really a problem, but last_seen_block has to
> persist across ram_save_block calls.

Got it. that's a subtle point.

> > Also, there is potential for the callback functions
> > ram_control_*_iterate to manipulate the ram_list.
>
> I think that's right now not possible (and they could use
> rcu_read_lock/unlock as well).

Yeah. So how about we say for now that the rcu critical section status upon
entry to the ram_control_*_iterate functions is undefined. I'll make some
updates.

Mike


Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2

2013-09-03 Thread Mike Day
On Fri, Aug 30, 2013 at 12:38 PM, Paolo Bonzini  wrote:
>
> > @@ -867,7 +879,12 @@ static int ram_load(QEMUFile *f, void *opaque, int
version_id)
> >  if (version_id < 4 || version_id > 4) {
> >  return -EINVAL;
> >  }
> > -
> > +/* this implements a long-running RCU critical section.
> > + * When rcu reclaims in the code start to become numerous
> > + * it will be necessary to reduce the granularity of this critical
> > + * section.
> > + */
>
> Please add the same comment (and a rcu_read_lock/unlock pair replacing
> the ramlist mutex) in ram_save_iterate, too.

Just double checking on this particular change. In practice ram_save
manipulates the ram_list indirectly through ram_save_block. But I'm
assuming you want this change because of the ram state info that persists
between calls to ram_save (ram_list version in particular). Also, there is
potential for the callback functions ram_control_*_iterate to manipulate
the ram_list.

I'm adding the rcu_read_lock/unlock pair in ram_load. It will be recursive
with the same calls in ram_save_block, but as you pointed out this is low
overhead.

With this change in my working code, ram_control_*_iterate are called from
within an rcu critical section.

Mike


[Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2

2013-08-30 Thread Mike Day
Changes from V1:

* Omitted locks or rcu critical sections within Some functions that
  read or write the ram_list but are called in a protected context
  (the caller holds the iothread lock, the ram_list mutex, or an rcu
  critical section).

Allow "unlocked" reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has not been tested further than that at
this point.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day 
---
 arch_init.c   |  80 +++---
 exec.c| 142 ++
 hw/9pfs/virtio-9p-synth.c |   2 +-
 include/exec/cpu-all.h|   4 +-
 include/qemu/rcu_queue.h  |   8 +++
 5 files changed, 151 insertions(+), 85 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 68a7ab7..3f4d676 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/cpu-all.h"
 #include "hw/acpi/acpi.h"
+#include "qemu/rcu_queue.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -398,7 +399,11 @@ static void migration_bitmap_sync(void)
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(&address_space_memory);
 
-QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+ /* This assumes the iothread lock or the ram_list mutex is taken.
+ * if that changes, accesses to ram_list need to be protected
+ * by a mutex (writes) or an rcu read lock (reads)
+ */
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block->mr,
addr, TARGET_PAGE_SIZE,
@@ -457,8 +462,13 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+/* Sometimes called with the ram_list mutex held (ram_save_complete)
+ * also called WITHOUT the ram_list mutex held. (ram_save_iterate).
+ * Protect ram_list with an rcu critical section.
+ */
+rcu_read_lock();
 if (!block)
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 
 while (true) {
 mr = block->mr;
@@ -469,9 +479,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset >= block->length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -526,6 +536,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+rcu_read_unlock();
 last_seen_block = block;
 last_offset = offset;
 
@@ -565,10 +576,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, &ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next)
 total += block->length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -631,7 +642,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+rcu_read_lock();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -641,13 +652,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
-QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 qemu_put_byte(f, strlen(block->idstr));
 qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
 qemu_put_be64(f, block->length);
 }
 
-qemu_mutex_unlock_ramlist();
+rcu_read_unlock();
 
 ram_control_before_iterate(f, RAM_CONTROL_SETUP);
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
@@ -664,8 +675,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 int64_t t0;
 int total_sent = 0;
 
-qemu_mutex_lock_ramlist();
-
 if (ram_list.version != last_version) {
 reset_ram_globals();
 }
@@ -701,8 +710,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 i++;
 }
 
-qemu_mutex_unlock_ramlist();
-
 /*

Re: [Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ

2013-08-30 Thread Mike Day
On Fri, Aug 30, 2013 at 4:19 AM, Paolo Bonzini  wrote:
>
>
> I'm not sure about that; returning an RCU-protected variable after
> rcu_read_unlock() seems wrong to me because the pointer may not be valid
> at that point.  I suggest using a comment that asks to call
> host_from_stream_offset within rcu_read_lock()/rcu_read_unlock().
> However, if existing practice in the kernel is different, I'll bow to
that.

In this case the only caller is ram_load so I'm removing the critical
section from within host_from_stream_offset, and adding comments to note
that the ram_list needs to be protected by the caller. The current
docs/rcu.txt information indicates that rcu critical sections can be
"nested or overlapping." But your suggestion results in cleaner code - we
will have to go back to this later as you noted earlier.

Mike


Re: [Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ

2013-08-29 Thread Mike Day
On Wed, Aug 28, 2013 at 12:35 PM, Paolo Bonzini  wrote:
>
> > @@ -828,13 +829,18 @@ static inline void
*host_from_stream_offset(QEMUFile *f,
> >  qemu_get_buffer(f, (uint8_t *)id, len);
> >  id[len] = 0;
> >
> > -QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> > -if (!strncmp(id, block->idstr, sizeof(id)))
> > -return memory_region_get_ram_ptr(block->mr) + offset;
> > +rcu_read_lock();
> > +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> > +if (!strncmp(id, block->idstr, sizeof(id))) {
> > +ptr = memory_region_get_ram_ptr(block->mr) + offset;
> > +goto unlock_out;
> > +}
> >  }
> >
> >  fprintf(stderr, "Can't find block %s!\n", id);
> > -return NULL;
> > +unlock_out:
> > +rcu_read_unlock();
> > +return ptr;
> >  }
>
>
> Similarly, here the critical section includes the caller, and block is
> living across calls to host.  Again, for now just put all of ram_load
> under a huge RCU critical section.  Later we can use ram_list.version to
> refresh the list and make the critical sections smaller.

And: rcu_read_lock() and rcu_read_unlock() can be called recursively, so I
can still leave the lock/unlock pair in host_from_stream_offset.


[Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ

2013-08-28 Thread Mike Day
Allow "unlocked" reads of the ram_list by using an RCU-enabled
DQ. Most readers of the list no longer require holding the list mutex.

The ram_list now uses a QLIST instead of a QTAILQ. The difference is
minimal.

This patch has been built and make-checked for the x86_64, ppc64,
s390x, and arm targets. It has not been tested further than that at
this point.

To apply this patch, you must base upon Paolo Bonzini's rcu tree and
also apply the RCU DQ patch (below).

https://github.com/bonzini/qemu/tree/rcu
http://article.gmane.org/gmane.comp.emulators.qemu/230159/

Signed-off-by: Mike Day 
---
 arch_init.c   |  51 -
 exec.c| 111 +-
 hw/9pfs/virtio-9p-synth.c |   2 +-
 include/exec/cpu-all.h|   4 +-
 include/qemu/rcu_queue.h  |   8 
 5 files changed, 111 insertions(+), 65 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 68a7ab7..5c7b284 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -49,6 +49,7 @@
 #include "trace.h"
 #include "exec/cpu-all.h"
 #include "hw/acpi/acpi.h"
+#include "qemu/rcu_queue.h"
 
 #ifdef DEBUG_ARCH_INIT
 #define DPRINTF(fmt, ...) \
@@ -397,8 +398,8 @@ static void migration_bitmap_sync(void)
 
 trace_migration_bitmap_sync_start();
 address_space_sync_dirty_bitmap(&address_space_memory);
-
-QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) {
 if (memory_region_test_and_clear_dirty(block->mr,
addr, TARGET_PAGE_SIZE,
@@ -407,6 +408,7 @@ static void migration_bitmap_sync(void)
 }
 }
 }
+rcu_read_unlock();
 trace_migration_bitmap_sync_end(migration_dirty_pages
 - num_dirty_pages_init);
 num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
@@ -457,8 +459,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 MemoryRegion *mr;
 ram_addr_t current_addr;
 
+rcu_read_lock();
 if (!block)
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 
 while (true) {
 mr = block->mr;
@@ -469,9 +472,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 if (offset >= block->length) {
 offset = 0;
-block = QTAILQ_NEXT(block, next);
+block = QLIST_NEXT_RCU(block, next);
 if (!block) {
-block = QTAILQ_FIRST(&ram_list.blocks);
+block = QLIST_FIRST_RCU(&ram_list.blocks);
 complete_round = true;
 ram_bulk_stage = false;
 }
@@ -526,6 +529,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
 }
 }
 }
+rcu_read_unlock();
 last_seen_block = block;
 last_offset = offset;
 
@@ -565,10 +569,10 @@ uint64_t ram_bytes_total(void)
 {
 RAMBlock *block;
 uint64_t total = 0;
-
-QTAILQ_FOREACH(block, &ram_list.blocks, next)
+rcu_read_lock();
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next)
 total += block->length;
-
+rcu_read_unlock();
 return total;
 }
 
@@ -631,7 +635,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-qemu_mutex_lock_ramlist();
+rcu_read_lock();
 bytes_transferred = 0;
 reset_ram_globals();
 
@@ -641,13 +645,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 
 qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
 
-QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 qemu_put_byte(f, strlen(block->idstr));
 qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
 qemu_put_be64(f, block->length);
 }
 
-qemu_mutex_unlock_ramlist();
+rcu_read_unlock();
 
 ram_control_before_iterate(f, RAM_CONTROL_SETUP);
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
@@ -664,8 +668,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 int64_t t0;
 int total_sent = 0;
 
-qemu_mutex_lock_ramlist();
-
 if (ram_list.version != last_version) {
 reset_ram_globals();
 }
@@ -701,8 +703,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 i++;
 }
 
-qemu_mutex_unlock_ramlist();
-
 /*
  * Must occur before EOS (or any QEMUFile operation)
  * because of RDMA protocol.
@@ -814,6 +814,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 static RAMBlock *block = NULL;
 char id[256];
 uint8_t len;
+void *ptr = NULL;
 
 if (flags & RAM_SAVE_FLAG_CONTINUE) {
 if (!block) {
@@ -

[Qemu-devel] [RFC PATCH V3 ] Introduce RCU-enabled DQs

2013-08-27 Thread Mike Day
Add RCU-enabled variants on the existing bsd DQ facility. Each Q
operation has the same interface as the existing (non-RCU)
version. Also, each operation is implemented as macro for now.

Using the RCU-enabled DQ, existing DQ users will be able to convert to
RCU without using a different list interface.

Changes since V2:

* reverted QLIST_FOREACH_REVERSE_RCU, it's not needed in the interface
  file. (But it is used in the test program.)  

* added g_test support to the test program

To accompany the RCU-enabled DQ, there is also a test file that uses
concurrent readers to contend with a single updater.

This patchset builds on top of Paolo Bonzini's rcu tree:
https://github.com/bonzini/qemu/tree/rcu


Signed-off-by: Mike Day 
---
 docs/rcu.txt |   2 +-
 include/qemu/queue.h |  11 --
 include/qemu/rcu_queue.h | 137 +++
 tests/Makefile   |   5 +-
 tests/rcuq_test.c| 337 +++
 5 files changed, 479 insertions(+), 13 deletions(-)
 create mode 100644 include/qemu/rcu_queue.h
 create mode 100644 tests/rcuq_test.c

diff --git a/docs/rcu.txt b/docs/rcu.txt
index b3c593c..de59896 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -106,7 +106,7 @@ The core RCU API is small:
 so that the reclaimer function can fetch the struct foo address
 and free it:
 
-call_rcu1(foo_reclaim, &foo.rcu);
+call_rcu1(&foo.rcu, foo_reclaim);
 
 void foo_reclaim(struct rcu_head *rp)
 {
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 847ddd1..f6f0636 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -139,17 +139,6 @@ struct {   
 \
 (elm)->field.le_prev = &(head)->lh_first;   \
 } while (/*CONSTCOND*/0)
 
-#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\
-(elm)->field.le_prev = &(head)->lh_first;   \
-(elm)->field.le_next = (head)->lh_first;\
-smp_wmb(); /* fill elm before linking it */ \
-if ((head)->lh_first != NULL)  {\
-(head)->lh_first->field.le_prev = &(elm)->field.le_next;\
-}   \
-(head)->lh_first = (elm);   \
-smp_wmb();  \
-} while (/* CONSTCOND*/0)
-
 #define QLIST_REMOVE(elm, field) do {   \
 if ((elm)->field.le_next != NULL)   \
 (elm)->field.le_next->field.le_prev =   \
diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h
new file mode 100644
index 000..e2b8ba5
--- /dev/null
+++ b/include/qemu/rcu_queue.h
@@ -0,0 +1,137 @@
+#ifndef QEMU_RCU_SYS_QUEUE_H
+#define QEMU_RCU_SYS_QUEUE_H
+
+/*
+ * rcu_queue.h
+ *
+ * Userspace RCU QSBR header.
+ *
+ * LGPL-compatible code should include this header with :
+ *
+ * #define _LGPL_SOURCE
+ * #include 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Copyright (c) 2013 Mike D. Day, IBM Corporation.
+ *
+ * IBM's contributions to this file may be relicensed under LGPLv2 or later.
+ */
+
+#include "qemu/rcu.h"  /* rcu.h includes qemu/queue.h and qemu/atomic.h */
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * List functions.
+ */
+
+
+/*
+ *  The difference between atomic_read/set and atomic_rcu_read/set
+ *  is in the including of a read/write memory barrier to the volatile
+ *  access. atomic_rcu_* macros include the memory barrier, the
+ *  plain atomic macros do not. Therefore, it should be correct to
+ *  issue a series of reads or writes to the same element using only
+ *  the atomic_* macro, until the last read or write, which should be
+ *  atomic_rcu_* to introduce a read or write memory barrier as
+ *  appropriate.
+ */
+
+/* Upon publication of the listelm->next value, list readers
+ * will see the new node when following next pointers from
+ * anteced

Re: [Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)

2013-08-26 Thread Mike Day
On Sun, Aug 25, 2013 at 3:18 PM, Mathieu Desnoyers <
mathieu.desnoy...@efficios.com> wrote:
>
> I'm not very comfortable with your DQ implementation not providing any
> kind of guarantee to a forward traversal followed by backward traversal,
> nor for backward followed by forward traversal. If a list add is
> executed concurrently with traversals, and we can ensure there are no
> list del of the node, if a traversal sees the added node when doing
> forward iteration, I would clearly expect to still see it if a backward
> iteration follows.
>
> I took the liberty of implementing a couple of ideas I had to provide
> a RCU DQ with those guarantees. I just pushed the code here (beware, I
> just did some basic single-threaded unit tests so far, so consider this
> code as largely untested concurrency-wise):
>
>   git clone git://git.urcu.io/urcu.git
>   branch: rcudq
>   file: urcu/rcudq.h
>
> Direct link to the file via gitweb:
>
http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rcudq.h;h=4a8d7b0d5143a958514cf130b1c124d99f3194ca;hb=refs/heads/rcudq

Mathieu - Thanks for the review! And thanks for the code, I'm working with
it right now. I like the idea of using a flag to provide a form of
atomicity for the doubly-linked list elements. I'm also planning on running
some timing tests to see of the additional memory barriers and atomic
accesses make *any* difference whatsoever.

Mike

Mike Day | ncm...@ncultra.org | +1 919 371-8786


Re: [Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)

2013-08-25 Thread Mike Day

Paolo Bonzini  writes:

> Just a couple of questions, one of them on the new macro...
>
>> +/* prior to publication of the elm->prev->next value, some list
>> + * readers  may still see the removed element when following
>> + * the antecedent's next pointer.
>> + */
>> +
>> +#define QLIST_REMOVE_RCU(elm, field) do {   \
>> +if ((elm)->field.le_next != NULL) { \
>> +   (elm)->field.le_next->field.le_prev =\
>> +(elm)->field.le_prev;   \
>> +}   \
>> +atomic_rcu_set((elm)->field.le_prev, (elm)->field.le_next); \
>> +} while (/*CONSTCOND*/0)
>
> Why is the barrier needed here, but not in Linux's list_del_rcu?
>
> I think it is not needed because all involved elements have already been
> published and just have their pointers shuffled.

I read this as more than shuffling pointers. The intent here is 
that the previous element's next pointer is being updated to omit the
current element from the list.

atomic_set always deferences the pointer passed to it, and
(field)->le_pre is a double pointer. So looking at the macro:

#define atomic_set(ptr, i) ((*(__typeof__(*ptr) *volatile) (ptr)) = (i))

It translates to: 

( ( * (__typeof(*elm->field.le_prev) *volatile) (elm)->field.le_prev)  = 
elm->field.le_next; ) 

Which is: 

 *((struct *elm) *volatile)(elm)->field.le_prev = elm->field.le_next; 

Which is:

*(elm)->field.le_prev = elm->field.le_next;

Because field.le_prev is a double pointer that has previously been set
to &prev (the address of the previous list element) this is assiging the
*previous* element's next pointer, the way I read it.

The Linux list_del_rcu is dealing with a singly linked list and
therefore does not set a value in the previous node's element. 

But I'm still unclear on whether or not the memory barrier is needed
because the deleted element won't be reclaimed right away.

Mike

-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
"Endurance is a Virtue"



[Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)

2013-08-24 Thread Mike Day
Add RCU-enabled variants on the existing bsd DQ facility. Each Q
operation has the same interface as the existing (non-RCU)
version. Also, each operation is implemented as macro for now.

Using the RCU-enabled DQ, existing DQ users will be able to convert to
RCU without using a different list interface.

This version (2) adds a macro to walk a Q in reverse:

QLIST_FOREACH_REVERSE_RCU(el, head, field)

Accordingly the reader threads in the test program walk the Q in
reverse in addition to walking forward.

To accompany the RCU-enabled DQ, there is also a test file that uses
concurrent readers to contend with a single updater.

This patchset builds on top of Paolo Bonzini's rcu tree:
https://github.com/bonzini/qemu/tree/rcu

Signed-off-by: Mike Day 
---
 docs/rcu.txt |   2 +-
 include/qemu/queue.h |  11 --
 include/qemu/rcu_queue.h | 145 
 tests/Makefile   |   6 +-
 tests/rcuq_test.c| 290 +++
 5 files changed, 440 insertions(+), 14 deletions(-)
 create mode 100644 include/qemu/rcu_queue.h
 create mode 100644 tests/rcuq_test.c

diff --git a/docs/rcu.txt b/docs/rcu.txt
index b3c593c..de59896 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -106,7 +106,7 @@ The core RCU API is small:
 so that the reclaimer function can fetch the struct foo address
 and free it:
 
-call_rcu1(foo_reclaim, &foo.rcu);
+call_rcu1(&foo.rcu, foo_reclaim);
 
 void foo_reclaim(struct rcu_head *rp)
 {
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 847ddd1..f6f0636 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -139,17 +139,6 @@ struct {   
 \
 (elm)->field.le_prev = &(head)->lh_first;   \
 } while (/*CONSTCOND*/0)
 
-#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\
-(elm)->field.le_prev = &(head)->lh_first;   \
-(elm)->field.le_next = (head)->lh_first;\
-smp_wmb(); /* fill elm before linking it */ \
-if ((head)->lh_first != NULL)  {\
-(head)->lh_first->field.le_prev = &(elm)->field.le_next;\
-}   \
-(head)->lh_first = (elm);   \
-smp_wmb();  \
-} while (/* CONSTCOND*/0)
-
 #define QLIST_REMOVE(elm, field) do {   \
 if ((elm)->field.le_next != NULL)   \
 (elm)->field.le_next->field.le_prev =   \
diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h
new file mode 100644
index 000..198a87d
--- /dev/null
+++ b/include/qemu/rcu_queue.h
@@ -0,0 +1,145 @@
+#ifndef QEMU_RCU_SYS_QUEUE_H
+#define QEMU_RCU_SYS_QUEUE_H
+
+/*
+ * rc_queue.h
+ *
+ * Userspace RCU QSBR header.
+ *
+ * LGPL-compatible code should include this header with :
+ *
+ * #define _LGPL_SOURCE
+ * #include 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Copyright (c) 2013 Mike D. Day, IBM Corporation.
+ *
+ * IBM's contributions to this file may be relicensed under LGPLv2 or later.
+ */
+
+#include "qemu/rcu.h"  /* rcu.h includes qemu/queue.h and qemu/atomic.h */
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * List functions.
+ */
+
+
+/*
+ *  The difference between atomic_read/set and atomic_rcu_read/set
+ *  is in the including of a read/write memory barrier to the volatile
+ *  access. atomic_rcu_* macros include the memory barrier, the
+ *  plain atomic macros do not. Therefore, it should be correct to
+ *  issue a series of reads or writes to the same element using only
+ *  the atomic_* macro, until the last read or write, which should be
+ *  atomic_rcu_* to introduce a read or write memory barrier as
+ *  appropriate.
+ */
+
+/* Upon publication of the listelm->next value, list readers
+ * will see the new node when following next pointer

[Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs.

2013-08-23 Thread Mike Day
Add RCU-enabled variants on the existing bsd DQ facility. Each Q
operation has the same interface as the existing (non-RCU)
version. Also, each operation is implemented as macro for now.

Using the RCU-enabled DQ, existing DQ users will be able to convert to
RCU without using a different list interface.

To accompany the RCU-enabled DQ, there is also a test file that uses
concurrent readers to contend with a single updater.

This patchset builds on top of Paolo Bonzini's rcu tree:
https://github.com/bonzini/qemu/tree/rcu

Signed-off-by: Mike Day 
---
 docs/rcu.txt |   2 +-
 include/qemu/queue.h |  11 --
 include/qemu/rcu_queue.h | 137 +++
 tests/Makefile   |  14 ++-
 tests/rcuq_test.c| 279 +++
 5 files changed, 430 insertions(+), 13 deletions(-)

diff --git a/docs/rcu.txt b/docs/rcu.txt
index b3c593c..de59896 100644
--- a/docs/rcu.txt
+++ b/docs/rcu.txt
@@ -106,7 +106,7 @@ The core RCU API is small:
 so that the reclaimer function can fetch the struct foo address
 and free it:
 
-call_rcu1(foo_reclaim, &foo.rcu);
+call_rcu1(&foo.rcu, foo_reclaim);
 
 void foo_reclaim(struct rcu_head *rp)
 {
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 847ddd1..f6f0636 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -139,17 +139,6 @@ struct {   
 \
 (elm)->field.le_prev = &(head)->lh_first;   \
 } while (/*CONSTCOND*/0)
 
-#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\
-(elm)->field.le_prev = &(head)->lh_first;   \
-(elm)->field.le_next = (head)->lh_first;\
-smp_wmb(); /* fill elm before linking it */ \
-if ((head)->lh_first != NULL)  {\
-(head)->lh_first->field.le_prev = &(elm)->field.le_next;\
-}   \
-(head)->lh_first = (elm);   \
-smp_wmb();  \
-} while (/* CONSTCOND*/0)
-
 #define QLIST_REMOVE(elm, field) do {   \
 if ((elm)->field.le_next != NULL)   \
 (elm)->field.le_next->field.le_prev =   \
diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h
new file mode 100644
index 000..e82196d
--- /dev/null
+++ b/include/qemu/rcu_queue.h
@@ -0,0 +1,137 @@
+#ifndef QEMU_RCU_SYS_QUEUE_H
+#define QEMU_RCU_SYS_QUEUE_H
+
+/*
+ * rc_queue.h
+ *
+ * Userspace RCU QSBR header.
+ *
+ * LGPL-compatible code should include this header with :
+ *
+ * #define _LGPL_SOURCE
+ * #include 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ * Copyright (c) 2013 Mike D. Day, IBM Corporation.
+ *
+ * IBM's contributions to this file may be relicensed under LGPLv2 or later.
+ */
+
+#include "qemu/rcu.h"  /* rcu.h includes qemu/queue.h and qemu/atomic.h */
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * List functions.
+ */
+
+
+/* 
+ *  The difference between atomic_read/set and atomic_rcu_read/set
+ *  is in the including of a read/write memory barrier to the volatile
+ *  access. atomic_rcu_* macros include the memory barrier, the 
+ *  plain atomic macros do not. Therefore, it should be correct to 
+ *  issue a series of reads or writes to the same element using only 
+ *  the atomic_* macro, until the last read or write, which should be 
+ *  atomic_rcu_* to introduce a read or write memory barrier as 
+ *  appropriate.  
+ */
+
+/* Upon publication of the listelm->next value, list readers
+ * will see the new node when following next pointers from 
+ * antecedent nodes, but may not see the new node when following 
+ * prev pointers from subsequent nodes until after the rcu grace 
+ * period expires.   
+ * see linux/include/rculist.h __list_add_rcu(new, prev, next)
+ */
+#define QLIST_INSERT_AFTER_RCU(listelm,

Re: [Qemu-devel] [PATCH] Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system. These patches change from using Linux kernel style upcasts to typesafe object o

2013-08-19 Thread Mike Day

Peter Maydell  writes:

> On 19 August 2013 19:33, Mike Day  wrote:
>> These patches apply to Paolo Bonzini's rcu tree:
>>
>> https://github.com/bonzini/qemu/tree/rcu
>> commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135
>>
>> Signed-off-by: Mike Day 
>> ---
>>  hw/misc/ivshmem.c  | 2 +-
>>  hw/pci-bridge/pci_bridge_dev.c | 6 +++---
>>  hw/pci/pci_bridge.c| 2 +-
>>  3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index ebcb52a..46d8c27 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
>>
>>  static void pci_ivshmem_instance_finalize(Object *obj)
>>  {
>> -IVShmemState *s = IVSHMEM(dev);
>> +IVShmemState *s = IVSHMEM(obj);
>
> This should have been a flat-out compiler error, right?

Yes, correct, but Paolo hasn't previously submitted this specific change
code afaik.

>>  if (s->migration_blocker) {
>>  migrate_del_blocker(s->migration_blocker);
>> diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
>> index c995d5d..22caf14 100644
>> --- a/hw/pci-bridge/pci_bridge_dev.c
>> +++ b/hw/pci-bridge/pci_bridge_dev.c
>> @@ -87,7 +87,6 @@ shpc_error:
>>  bridge_error:
>>  return err;
>>  }
>> -
>
> Stray blank line change.

Thanks - and git just found it for me too. Apologies.

Mike
-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
"Endurance is a Virtue"



[Qemu-devel] [PATCH RESEND] RCU implementation for Qemu. Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system.

2013-08-19 Thread Mike Day

These patches change from using Linux kernel style upcasts to typesafe
object oriented casts with runtime checking semantics.

These patches apply to Paolo Bonzini's rcu tree:

https://github.com/bonzini/qemu/tree/rcu
commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135

Signed-off-by: Mike Day 
---
 hw/misc/ivshmem.c  | 2 +-
 hw/pci-bridge/pci_bridge_dev.c | 6 +++---
 hw/pci/pci_bridge.c| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index ebcb52a..46d8c27 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
 
 static void pci_ivshmem_instance_finalize(Object *obj)
 {
-IVShmemState *s = IVSHMEM(dev);
+IVShmemState *s = IVSHMEM(obj);
 
 if (s->migration_blocker) {
 migrate_del_blocker(s->migration_blocker);
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index c995d5d..22caf14 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -87,7 +87,6 @@ shpc_error:
 bridge_error:
 return err;
 }
-
 static void pci_bridge_dev_exitfn(PCIDevice *dev)
 {
 PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev);
@@ -102,8 +101,9 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev)
 static void pci_bridge_dev_instance_finalize(Object *obj)
 {
 PCIDevice *dev = PCI_DEVICE(obj);
-PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
-PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br);
+PCIBridge *br = PCI_BRIDGE(dev);
+PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(br);
+
 shpc_free(dev);
 memory_region_destroy(&bridge_dev->bar);
 pci_bridge_free(dev);
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 63f9912..307e076 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -391,7 +391,7 @@ void pci_bridge_exitfn(PCIDevice *pci_dev)
 
 void pci_bridge_free(PCIDevice *pci_dev)
 {
-PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev);
+PCIBridge *s = PCI_BRIDGE(pci_dev);
 pci_bridge_region_cleanup(s, s->windows);
 memory_region_destroy(&s->address_space_mem);
 memory_region_destroy(&s->address_space_io);
-- 
1.8.3.1




[Qemu-devel] [PATCH] Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system. These patches change from using Linux kernel style upcasts to typesafe object orien

2013-08-19 Thread Mike Day
These patches apply to Paolo Bonzini's rcu tree:

https://github.com/bonzini/qemu/tree/rcu
commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135

Signed-off-by: Mike Day 
---
 hw/misc/ivshmem.c  | 2 +-
 hw/pci-bridge/pci_bridge_dev.c | 6 +++---
 hw/pci/pci_bridge.c| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index ebcb52a..46d8c27 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
 
 static void pci_ivshmem_instance_finalize(Object *obj)
 {
-IVShmemState *s = IVSHMEM(dev);
+IVShmemState *s = IVSHMEM(obj);
 
 if (s->migration_blocker) {
 migrate_del_blocker(s->migration_blocker);
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index c995d5d..22caf14 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -87,7 +87,6 @@ shpc_error:
 bridge_error:
 return err;
 }
-
 static void pci_bridge_dev_exitfn(PCIDevice *dev)
 {
 PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev);
@@ -102,8 +101,9 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev)
 static void pci_bridge_dev_instance_finalize(Object *obj)
 {
 PCIDevice *dev = PCI_DEVICE(obj);
-PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
-PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br);
+PCIBridge *br = PCI_BRIDGE(dev);
+PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(br);
+
 shpc_free(dev);
 memory_region_destroy(&bridge_dev->bar);
 pci_bridge_free(dev);
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 63f9912..307e076 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -391,7 +391,7 @@ void pci_bridge_exitfn(PCIDevice *pci_dev)
 
 void pci_bridge_free(PCIDevice *pci_dev)
 {
-PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev);
+PCIBridge *s = PCI_BRIDGE(pci_dev);
 pci_bridge_region_cleanup(s, s->windows);
 memory_region_destroy(&s->address_space_mem);
 memory_region_destroy(&s->address_space_io);
-- 
1.8.3.1




Re: [Qemu-devel] [RFC PATCH 0/3] v2.2 RCU Implementation for QEMU

2013-08-16 Thread Mike Day

Andreas Färber  writes:


>> https://github.com/ncultra/qemu/tree/rcu-for-1.7
>> 
>> Mike Day (2):
>>   fixup changes from commit f63ca950
>>   fixup changes from commit f62a6b2f from Paulo Bonzini
>
> These two patches are certainly not acceptable for upstream, lacking any
> textual explanation and with those subjects.

Thanks. The entire series is not ready yet - it needs to be
refactored and tested, hence the RFC tag. I'm trying to keep an rebased
version of the Paolo's May RCU tree available for those who are
interested. When its ready to be considered for an upstream merge it
will be as expected. 

The the next step for me is to start working on leveraging the RCU code. 

Mike

-- 

Mike Day | + 1 919 371-8786 | ncm...@ncultra.org
"Endurance is a Virtue"



  1   2   >