Re: [Qemu-devel] [PATCH] [v2 PATCH] qemu-img: sort block formats in help message
On Fri, May 16, 2014 at 9:59 AM, Laurent Desnogues wrote: > > just noticed the use of some too recent glib features: g_strcmp0 > and GSequence related ones. There is a patch upstream that removes g_sequence_lookup, which was not linking on one platform. I haven't seen any other reports of breakage. g_strcmp would be trivial to replace, of course. thanks! Mike
Re: [Qemu-devel] [PATCH v2] qemu-img fails to delete last snapshot
On Wed, May 14, 2014 at 11:15 AM, Kevin Wolf wrote: >> freeing the extra clusters. > > Do you have an easy reproducer? Because I can't see the bug. Thanks for the review! I was having a hard time reproducing this until I did a bisect. This bug was fixed by 65f33bc which was merged at or after the time I submitted the patch: qcow2: Fix alloc_clusters_noref() overflow detection I can reproduce the bug by checking out the immediate ancestor 43cbeffb1, creating a single snapshot in a qcow2 image, and then attempting to delete that snapshot. The error I get is: qemu-img: Could not delete snapshot 'snapone': (Failed to remove snapshot from snapshot list: File too large) This is the error that is fixed by 65f33bc Mike
Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message
On Wed, May 14, 2014 at 9:43 AM, Jeff Cody wrote: > Prior to a recent commit, this function did not make distinction on > duplicates. As of commit e855e4fb7, duplicates are not longer printed > in the help message: > > e855e4fb7: Ignore duplicate or NULL format_name in bdrv_iterate_format): > That makes sense, thanks for the background! Mike
Re: [Qemu-devel] [PATCH] qemu-img: sort block formats in help message
On Wed, May 14, 2014 at 9:02 AM, Stefan Hajnoczi wrote: > Yes, we cannot change the git commit history once a commit is in the > public qemu.git repository. The only options are to: > > 1. Add a patch on top that fixes the issue. > 2. Use git-revert(1) to undo your commit entirely (this adds a new patch >on top the applies the reverse diff). > > In this case #1 seems like a good choice. I followed your advice and went with #1, I think its already been pulled. Mike
Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message
On Wed, May 14, 2014 at 8:35 AM, Stefan Hajnoczi wrote: > Jeff Cody recently wanted to eliminate duplicate entries in the list. I > thought part of your intention was to address the duplicates with your > patch. > > We can back out the sequence API if it's not supported on older glib but > it would be nice to eliminate duplicates later, too. I agree. I can submit an additional patch that uses an older API. What, exactly is the cause of duplicate entries in the list? I've only seen it one time but its disconcerting when it happens. Mike
[Qemu-devel] [PATCH] Remove g_sequence_lookup from qemu-img help function
g_sequence_lookup is not supported by glib < 2.28. The usage of g_sequence_lookup is not essential in this context (it's a safeguard against duplicate values in the help message). Removing the call enables the build on all platforms and does not change the operation of the help function. Signed-off-by: Mike Day --- qemu-img.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 04ce02a..1ad899e 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -70,11 +70,8 @@ static void add_format_to_seq(void *opaque, const char *fmt_name) { GSequence *seq = opaque; -if (!g_sequence_lookup(seq, (gpointer)fmt_name, - compare_data, NULL)) { -g_sequence_insert_sorted(seq, (gpointer)fmt_name, - compare_data, NULL); -} +g_sequence_insert_sorted(seq, (gpointer)fmt_name, + compare_data, NULL); } static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...) -- 1.9.0
[Qemu-devel] [PATCH] Remove g_sequence_lookup from qemu-img help function
g_sequence_lookup is not supported by glib < 2.28. The usage of g_sequence_lookup is not essential in this context (its a safeguard against duplicate values in the help message). Removing the call enables the build on all platforms and does not change the operation of the help function. Signed-off-by: Mike Day --- qemu-img.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 04ce02a..bf5e74c 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -70,11 +70,8 @@ static void add_format_to_seq(void *opaque, const char *fmt_name) { GSequence *seq = opaque; -if (!g_sequence_lookup(seq, (gpointer)fmt_name, - compare_data, NULL)) { g_sequence_insert_sorted(seq, (gpointer)fmt_name, compare_data, NULL); -} } static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...) -- 1.9.0
Re: [Qemu-devel] [PULL 03/17] qemu-img: sort block formats in help message
On Tue, May 13, 2014 at 10:18 AM, Cornelia Huck wrote: > > qemu-img.o: In function `add_format_to_seq': > /home/cohuck/git/qemu/qemu-img.c:73: undefined reference to > `g_sequence_lookup' > collect2: ld returned 1 exit status > > g_sequence_lookup has been added with glib 2.28, and this box has > 2.22.5. configure looks for glib >= 2.12 (2.20 for mingw). g_sequence_lookup is there because I was getting duplicate formats in the format list, as in "vfat vfat vfat qcow." That was a temporary situation. the lookup serves to guarantee each format is unique in the list. It may not be needed. Mike
[Qemu-devel] [PATCH] qemu-img: sort block formats in help message
The help message for qemu-img lists the supported block formats, of which there are 27 as of version 2.0.50. The formats are printed in the order of their driver's position in a linked list, which appears random. This patch prints the formats in sorted order, making it easier to read and to find a specific format in the list. [Added suggestions from Fam Zheng to declare variables at the top of the scope in help() and to omit explicit cast for void* opaque. --Stefan] [Removed call to g_sequence_lookup because it breaks the build on machines with glib < 2.28. --Mike] Signed-off-by: Mike Day Signed-off-by: Stefan Hajnoczi --- qemu-img.c | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 96f4463..93e51d1 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -32,6 +32,7 @@ #include "block/block_int.h" #include "block/qapi.h" #include +#include #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION \ ", Copyright (c) 2004-2008 Fabrice Bellard\n" @@ -55,9 +56,22 @@ typedef enum OutputFormat { #define BDRV_O_FLAGS BDRV_O_CACHE_WB #define BDRV_DEFAULT_CACHE "writeback" -static void format_print(void *opaque, const char *name) +static gint compare_data(gconstpointer a, gconstpointer b, gpointer user) { -printf(" %s", name); +return g_strcmp0(a, b); +} + +static void print_format(gpointer data, gpointer user) +{ +printf(" %s", (char *)data); +} + +static void add_format_to_seq(void *opaque, const char *fmt_name) +{ +GSequence *seq = opaque; + +g_sequence_insert_sorted(seq, (gpointer)fmt_name, + compare_data, NULL); } static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...) @@ -142,10 +156,15 @@ static void QEMU_NORETURN help(void) " '-f' first image format\n" " '-F' second image format\n" " '-s' run in Strict mode - fail on different image size or sector allocation\n"; +GSequence *seq; printf("%s\nSupported formats:", help_msg); -bdrv_iterate_format(format_print, NULL); +seq = g_sequence_new(NULL); +bdrv_iterate_format(add_format_to_seq, seq); +g_sequence_foreach(seq, print_format, NULL); printf("\n"); +g_sequence_free(seq); + exit(EXIT_SUCCESS); } -- 1.9.0
[Qemu-devel] [PATCH v2] qemu-img fails to delete last snapshot
When deleting the last snapshot, copying the resulting snapshot table currently fails, causing the delete operation to also fail. Fix the failure by skipping the copy and just writing the snapshot header and freeing the extra clusters. There are two specific problems in the current code. First is a lack of parenthesis in the calculation of the memmove size parameter: s->nb_snapshots - snapshot_index - 1 When s->nb_snapshots is 0, snapshot_index is 1. 0 - 1 - 1 = 0xfffe it should be: 0 - (1 - 1) = 0x00 The second problem is shifting the snapshot table to the left. After removing the last snapshot there are no existing snapshots to be shifted. All that needs to be done is to write the header and unallocate the blocks. Signed-off-by: Mike Day Reviewed-by: Eric Blake --- v2: improved the git log entry added Eric Blake as a reviewer block/qcow2-snapshot.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index 0aa9def..c8b842c 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -165,9 +165,11 @@ static int qcow2_write_snapshots(BlockDriverState *bs) assert(offset <= INT_MAX); snapshots_size = offset; - /* Allocate space for the new snapshot list */ -snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size); +snapshots_offset = 0; +if (snapshots_size) { +snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size); +} offset = snapshots_offset; if (offset < 0) { ret = offset; @@ -180,12 +182,13 @@ static int qcow2_write_snapshots(BlockDriverState *bs) /* The snapshot list position has not yet been updated, so these clusters * must indeed be completely free */ -ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size); -if (ret < 0) { -goto fail; +if (snapshots_size) { +ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size); +if (ret < 0) { +goto fail; +} } - /* Write all snapshots to the new list */ for(i = 0; i < s->nb_snapshots; i++) { sn = s->snapshots + i; @@ -590,12 +593,14 @@ int qcow2_snapshot_delete(BlockDriverState *bs, return -ENOENT; } sn = s->snapshots[snapshot_index]; - /* Remove it from the snapshot list */ -memmove(s->snapshots + snapshot_index, -s->snapshots + snapshot_index + 1, -(s->nb_snapshots - snapshot_index - 1) * sizeof(sn)); s->nb_snapshots--; +if (s->nb_snapshots) { +memmove(s->snapshots + snapshot_index, +s->snapshots + snapshot_index + 1, +(s->nb_snapshots - (snapshot_index - 1)) * sizeof(sn)); +} + ret = qcow2_write_snapshots(bs); if (ret < 0) { error_setg_errno(errp, -ret, -- 1.9.0
Re: [Qemu-devel] [PATCH] qemu-img fails to delete last snapshot
On Mon, May 12, 2014 at 12:39 PM, Eric Blake wrote: > This information is actually quite useful in understanding the patch, > and I would have included it prior to the --- for inclusion in git, > rather than in the reviewer-only commentary that gets stripped during > 'git am'. > >> >> block/qcow2-snapshot.c | 25 +++-- >> 1 file changed, 15 insertions(+), 10 deletions(-) > > I'd suggest posting a v2 with a better commit message; but the code > itself seems fine, so you can add: > > Reviewed-by: Eric Blake Got it, thanks! Mike
Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.
On Mon, May 12, 2014 at 12:36 PM, Kevin Wolf wrote: > What would qemu-img rebase do with -o? It is just for (safely) changing > the backing file, not for updating options. There is qemu-img amend for > that, and it does have an -o option. A little background ... I'm writing a 4-day KVM training course for the Linux foundation. From time to time I come across undocumented, buggy, or confusing options and if I can I try to fix them. In the case of qemu-img I looked in a lot of places for documentation and ended up at the unofficial web page that documented the -b backing file option which works just fine for create and rebase. If an official page had the information I needed in one place instead of bits and pieces scattered everywhere I would have used it, naturally. Here's why the help is confusing. On the top level help, all the options except for -b are documented. A couple subcommands are even listed in the top level help with their options. I expected that -b would be documented on the top level with the others. The fact that -b is documented for the rebase subcommand reenforced my expectation. I was assuming there is some consistency in using options with backing files. (Especially if the option is called "backingfile.") I'm not advocating anything here, and I think its fine to leave things as they are. I just don't want to be the guy who complains and never fixes anything. Mike
Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.
On Mon, May 12, 2014 at 11:53 AM, Eric Blake wrote: > Ah, but it is: > > $ qemu-img create -f qcow2 -o help > Supported options: > size Virtual disk size > compat Compatibility level (0.10 or 1.1) > backing_file File name of a base image > backing_fmt Image format of the base image > encryption Encrypt the image > cluster_size qcow2 cluster size > preallocationPreallocation mode (allowed values: off, metadata) > lazy_refcounts Postpone refcount updates > > and more importantly, it only appears in the help output of -f modes > that actually support a backing file. Contrast: > > $ qemu-img create -f raw -o help > Supported options: > size Virtual disk size This is much more prominent in the top-level help: rebase [-q] [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] filename
Re: [Qemu-devel] [PATCH] Add backing file option to qemu-img create help.
On Mon, May 12, 2014 at 11:20 AM, Eric Blake wrote: > Online where? In 'qemu-img --help', or on some web page (at what URL)? http://qemu.weilnetz.de/qemu-doc.html#vm_005fsnapshots -o backing file is not documented in the help command. Mike
[Qemu-devel] [PATCH] Add backing file option to qemu-img create help.
For the create subcommand the backing file (-b) option is documented on-line but not in the binary. Add it. Signed-off-by: Mike Day --- qemu-img-cmds.hx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index d029609..7724709 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -16,9 +16,9 @@ STEXI ETEXI DEF("create", img_create, -"create [-q] [-f fmt] [-o options] filename [size]") +"create [-q] [-f fmt] [-b backing-file] [-o options] filename [size]") STEXI -@item create [-q] [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}] +@item create [-q] [-f @var{fmt}] [-b @var{backing_file}] [-o @var{options}] @var{filename} [@var{size}] ETEXI DEF("commit", img_commit, -- 1.9.0
[Qemu-devel] [PATCH] qemu-img fails to delete last snapshot
When deleting the last snapshot, copying the resulting snapshot table currently fails, causing the delete operation to also fail. Fix the failure by skipping the copy and just writing the snapshot header and freeing the extra clusters. Signed-off-by: Mike Day --- There are two specific problems in the curent code. First is a lack of parenthesis in the calculation of a memmove parameter: s->nb_snapshots - snapshot_index - 1 When s->nb_snapshots is 0, snapshot_index is 1. 0 - 1 - 1 = 0xfffe it should be: 0 - (1 - 1) = 0x00 The second problem is shifting the snapshot table to the left. After removing the last snapshot there are no existing snapshots to be shifted. All that needs to be done is to write the header and unallocate the blocks. block/qcow2-snapshot.c | 25 +++-- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index 0aa9def..c8b842c 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -165,9 +165,11 @@ static int qcow2_write_snapshots(BlockDriverState *bs) assert(offset <= INT_MAX); snapshots_size = offset; - /* Allocate space for the new snapshot list */ -snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size); +snapshots_offset = 0; +if (snapshots_size) { +snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size); +} offset = snapshots_offset; if (offset < 0) { ret = offset; @@ -180,12 +182,13 @@ static int qcow2_write_snapshots(BlockDriverState *bs) /* The snapshot list position has not yet been updated, so these clusters * must indeed be completely free */ -ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size); -if (ret < 0) { -goto fail; +if (snapshots_size) { +ret = qcow2_pre_write_overlap_check(bs, 0, offset, snapshots_size); +if (ret < 0) { +goto fail; +} } - /* Write all snapshots to the new list */ for(i = 0; i < s->nb_snapshots; i++) { sn = s->snapshots + i; @@ -590,12 +593,14 @@ int qcow2_snapshot_delete(BlockDriverState *bs, return -ENOENT; } sn = s->snapshots[snapshot_index]; - /* Remove it from the snapshot list */ -memmove(s->snapshots + snapshot_index, -s->snapshots + snapshot_index + 1, -(s->nb_snapshots - snapshot_index - 1) * sizeof(sn)); s->nb_snapshots--; +if (s->nb_snapshots) { +memmove(s->snapshots + snapshot_index, +s->snapshots + snapshot_index + 1, +(s->nb_snapshots - (snapshot_index - 1)) * sizeof(sn)); +} + ret = qcow2_write_snapshots(bs); if (ret < 0) { error_setg_errno(errp, -ret, -- 1.9.0
Re: [Qemu-devel] [PATCH 1/6] xics: add flags for interrupts
On Thu, May 8, 2014 at 11:12 PM, Alexey Kardashevskiy wrote: >> There are >> a couple ways to mitigate this type of situation by using alternative >> data structures to inform the loop traversal. I don't know if it is >> worth the effort, though. > > Here I lost you :) If I read the code correctly, the problem I'm wondering about is that the loop will waste time traversing the array when there are only unallocated interrupts from the current index to the end. For example, if the interrupt array is 256 entries and the highest index that is allocated is 16, the outside loop will traverse all 256 entries while it should have exited after the 16th. To mitigate this you could keep a shadow index of the current highest allocated index and check for that in the outside loop. Or you could maintain a shadow linked list that only includes allocated array entries and just traverse that list. Each element on the list would be an allocated entry in the interrupt array. > btw I just realized that in patch#2 it should be xics_find_source instead > of xics_find_server. There are many interrupt servers already and just one > interrupt source (we could have many like one per PHB or something like > that but we are not there yet), this is what I meant.
Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version
On Thu, May 8, 2014 at 10:02 AM, Peter Maydell wrote: >>> total: 0 errors, 0 warnings, 79 lines checked >>> >>> /tmp/a has no obvious style problems and is ready for submission. >>> Check 500, Bad 52 >> >> How does your new checkpatch compare to our current one? > > Yes; we do sometimes let checkpatch-failing commits through, > so it would be more interesting to know: > * how many commits passed old-checkpatch but fail the new one > * how many commits failed with old-checkpatch but pass now > * in both cases, how many of the failures are false positives? > > (the last of those is tricky to answer without eyeballing > the errors, unfortunately) That seems like a good test regime and not a lot of effort using the bash script Don shared. There appears to be some interest in this patch. I'll work on a 2nd revision, following Peter's suggestion to separate the QEMU rules into a separate, smaller patch (and with the correct handling of brackets). Mike
Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version
On Wed, May 7, 2014 at 5:16 PM, Hani Benhabiles wrote: > FWIW, this new version doesn't trigger a false positive I was having with > patches 05-07 in [1]. > > However, from a quick test for this patch on a couple of patches, I am getting > warnings like this: "WARNING: braces {} are not necessary for single statement > blocks" so there are still some Qemu related changes missing. Thanks for the feedback! As Peter pointed out (and you confirm), I missed forward-porting QEMU's brace rules.
Re: [Qemu-devel] [PATCH 2/6] xics: add find_server
Alexey Kardashevskiy writes: > PAPR allows having multiple interrupr servers. > typo above in the commit log entry, Mike -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH 1/6] xics: add flags for interrupts
> > for (i = 0; i < ics->nr_irqs; i++) { > /* FIXME: filter by server#? */ > -if (ics->islsi[i]) { > +if (ics->irqs[i].flags & XICS_FLAGS_LSI) { > resend_lsi(ics, i); Not part of your patch, but I'm curious about this FIXME (there's an identical FIXME in resend_msi). Has this proved to be a problem? With these patches you could have many unallocated interrupts in array AFTER the last allocated interrupt, correct? In that case, the loop would continue beyond the last allocated interrupt for no purpose. There are a couple ways to mitigate this type of situation by using alternative data structures to inform the loop traversal. I don't know if it is worth the effort, though. > +/* @flags == 0 measn the interrupt is not allocated */ > +#define XICS_FLAGS_LSI 0x1 > +#define XICS_FLAGS_MSI 0x2 (nit) typo in the above comment Mike -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version
On Tue, May 6, 2014 at 2:28 PM, Peter Maydell wrote: > I think this is going to be difficult to review, to say the least. > > Where does your patch come from? Is the kernel's checkpatch.pl > just a single commit between 0.31 and 0.32 (surely not) or > a series of fixes? Yes, this is today's version of the kernel's checkpatch.pl. Version 0.32 is the product of many patches onto 0.31 > A couple of ideas about how we could approach this: > (1) make a commit which is simply copying the kernel's 0.32 > into our repo; then follow that with a series of commits which > re-apply our local changes. > (2) apply all the individual commits from the kernel between 0.31 > and 0.32 to our repo > (3) give up and stick with 0.31... idea (1) makes perfect sense to me, and the local changes will be easy to review. > It might also be helpful if you could describe the benefits > we get from this update (any bugfixes for false positives we > tend to run into? useful new checks?) I think its a valid question whether to forward-port the Qemu checks to 0.32. I'm not sure, myself, but this gives folks an idea of what the cost/benefit is. Yesterday I was bitten by the StudlyCaps syndrome in a patch I submitted which was checkpatch-clean. Afterward I noticed that version 0.31 has the check for StudlyCaps commented out. This is corrected in version 0.32 with a more ambitious check for "CamelCase." The CamelCase check in v0.32 tries to determine if the patch introduced StudlyCaps or if they already exist in the unpatched source file. (I was hoping for a three-line check I could paste into v0.31.) I noticed some other new features - it uses an optional configuration file, checks for validly formed email addresses. The other changes I believe, are either experimental or providing tools for filtering or typing messages. Mike
[Qemu-devel] [PATCH] [v2 PATCH] qemu-img: sort block formats in help message
The help message for qemu-img lists the supported block formats, of which there are 27 as of version 2.0.50. The formats are printed in the order of their driver's position in a linked list, which appears random. This patch prints the formats in sorted order, making it easier to read and to find a specific format in the list. v2: Incorporated feedback from Stefan Hajnoczi Signed-off-by: Mike Day --- qemu-img.c | 27 --- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 96f4463..a559108 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -32,6 +32,7 @@ #include "block/block_int.h" #include "block/qapi.h" #include +#include #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION \ ", Copyright (c) 2004-2008 Fabrice Bellard\n" @@ -55,9 +56,25 @@ typedef enum OutputFormat { #define BDRV_O_FLAGS BDRV_O_CACHE_WB #define BDRV_DEFAULT_CACHE "writeback" -static void format_print(void *opaque, const char *name) +static gint compare_data(gconstpointer a, gconstpointer b, gpointer user) { -printf(" %s", name); +return g_strcmp0(a, b); +} + +static void print_format(gpointer data, gpointer user) +{ +printf(" %s", (char *)data); +} + +static void add_format_to_seq(void *opaque, const char *fmt_name) +{ +GSequence *seq = (GSequence *)opaque; + +if (!g_sequence_lookup(seq, (gpointer)fmt_name, + compare_data, NULL)) { +g_sequence_insert_sorted(seq, (gpointer)fmt_name, + compare_data, NULL); +} } static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...) @@ -144,8 +161,12 @@ static void QEMU_NORETURN help(void) " '-s' run in Strict mode - fail on different image size or sector allocation\n"; printf("%s\nSupported formats:", help_msg); -bdrv_iterate_format(format_print, NULL); +GSequence *seq = g_sequence_new(NULL); +bdrv_iterate_format(add_format_to_seq, seq); +g_sequence_foreach(seq, print_format, NULL); printf("\n"); +g_sequence_free(seq); + exit(EXIT_SUCCESS); } -- 1.9.0
Re: [Qemu-devel] [PATCH] qemu-img: sort block formats in help message
Thanks for the review! I've been able to shorten the next version quite a bit. On Mon, May 5, 2014 at 9:42 AM, Stefan Hajnoczi wrote: >> +static void GFunc_print_format(gpointer data, gpointer user) > > QEMU coding style is lowercase function and variable names. The > scripts/checkpatch.pl script should identify coding style violations, > please run it. Yeah this is ugly. I've long had checkpatch.pl configured as my pre-commit hook. I've confirmed again this morning that these ugly caps pass through with no error or warning. I'll look at the script. >> + >> +static void add_format_to_seq(void *opaque, const char *fmt_name) >> +{ >> +GSequence *seq = (GSequence *)opaque; >> + >> +if (!g_sequence_lookup(seq, (gpointer)fmt_name, >> + compare_data, NULL)) { >> +g_sequence_insert_sorted(seq, (gpointer)fmt_name, >> + compare_data, NULL); >> +} > > The type casts in this patch aren't necessary. In C void* casts to and > from any type without an explicit cast. Only C++ demands explicit > casts of void*. The casts are ugly, and I don't know how to get rid of them here beyond a pragma. Due to the block and glib APIs I have to cast away a const in the second parameter. I'm bracketed on both ends: g_sequence_* needs that second parameter to be void * (pointer), while the the block api (bdrv_iterate_format) defines the this function pointer type as (*)(void *, const char *). Ideas? Mike
[Qemu-devel] [PATCH] qemu-img: sort block formats in help message
The help message for qemu-img lists the supported block formats, of which there are 27 as of version 2.0.50. The formats are printed in the order of their driver's position in a linked list, which appears random. This patch prints the formats in sorted order, making it easier to read and to find a specific format in the list. Signed-off-by: Mike Day --- qemu-img.c | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/qemu-img.c b/qemu-img.c index 96f4463..d8b7ef4 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -32,6 +32,7 @@ #include "block/block_int.h" #include "block/qapi.h" #include +#include #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION \ ", Copyright (c) 2004-2008 Fabrice Bellard\n" @@ -60,6 +61,32 @@ static void format_print(void *opaque, const char *name) printf(" %s", name); } +static gint compare_data(gconstpointer a, gconstpointer b, gpointer user) +{ +return g_strcmp0((const char *)a, (const char *)b); +} + +static void GFunc_print_format(gpointer data, gpointer user) +{ +format_print(user, data); +} + +static GSequence *init_sequence(void) +{ +return g_sequence_new(NULL); +} + +static void add_format_to_seq(void *opaque, const char *fmt_name) +{ +GSequence *seq = (GSequence *)opaque; + +if (!g_sequence_lookup(seq, (gpointer)fmt_name, + compare_data, NULL)) { +g_sequence_insert_sorted(seq, (gpointer)fmt_name, + compare_data, NULL); +} +} + static void QEMU_NORETURN GCC_FMT_ATTR(1, 2) error_exit(const char *fmt, ...) { va_list ap; @@ -144,8 +171,12 @@ static void QEMU_NORETURN help(void) " '-s' run in Strict mode - fail on different image size or sector allocation\n"; printf("%s\nSupported formats:", help_msg); -bdrv_iterate_format(format_print, NULL); +GSequence *seq = init_sequence(); +bdrv_iterate_format(add_format_to_seq, seq); +g_sequence_foreach(seq, GFunc_print_format, NULL); printf("\n"); +g_sequence_free(seq); + exit(EXIT_SUCCESS); } -- 1.9.0
Re: [Qemu-devel] [PULL 14/16] qemu-img: Improve error messages
On Wed, Apr 30, 2014 at 10:16 AM, Eric Blake wrote: > Additional patches: > https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04074.html > https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg04468.html > both already on the queue for the next PULL request of the block branch. Excellent, thank you! Mike
Re: [Qemu-devel] [PULL 14/16] qemu-img: Improve error messages
On Sun, Apr 27, 2014 at 9:55 PM, Fam Zheng wrote: >> > /* not found */ >> > -help(); >> > -return 0; >> > +error_exit("Command not found: %s", cmdname); >> >> Looks like we just relied previously on the default 'not found' case >> for help() to provide the "--help" option. >> > > Oops! Thanks for noticing this and sending the fix. One of you guys going to send in the trivial patch to restore help? (Or is it upstream in another patch?) thanks, Mike
Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio
On Tue, Apr 29, 2014 at 1:02 AM, Michael Tokarev wrote: > I guess we should have some global variable like "stdio_occuped", set > it to 1 when -daemonize is specified, and set and check it each time > we try to use stdio for something. This way we'll prevent various > parts of qemu from fighting for stdio. That does seem to be a more fundamental solution. Thanks for the review. Mike
Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio
On Tue, Apr 29, 2014 at 3:09 AM, Gerd Hoffmann wrote: > > I don't feel like adding more band-aid to paper over the fundamental > issue. Too much band-aid tends to cause other unwanted side effects. Fair enough, I tend to agree. thanks for the review. Mike
Re: [Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio
On Mon, Apr 28, 2014 at 3:53 PM, Michael Tokarev wrote: > Gosh. This is uuugly. > > Maybe, at least, we can add some variable which gets incremented for > each -mon chardev= ? I dunno, that's about the same ugliness. Oh > well... :( Suggestions much appreciated, I think its ugly versus intrusive, but I'm not familiar with this part of qemu. There is another case with -monitor where you can configure two chardevs for stdio and it does the same thing. I left that one alone, its more complicated. Mike
[Qemu-devel] [PATCH] -nographic sometimes adds an extra chardev for stdio
When the deprecated -nographic option is used with the -mon option in readline mode, qemu will create a second character device for stdio and place it over the stdio chardev put into place by the -mon option. This causes the terminal to stop echoeing characters upon exit from Qemu. Fix by checking for the existing chardev before adding another. Signed-off-by: Mike Day --- To reproduce, use -mon and -nographic together. I was able to reproduce it using # qemu-system-x86_64 -enable-kvm -m 1G -chardev stdio,id=mon0 \ # -mon chardev=mon0,mode=readline -nographic --- vl.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/vl.c b/vl.c index 236f95e..3104b20 100644 --- a/vl.c +++ b/vl.c @@ -93,6 +93,7 @@ int main(int argc, char **argv) #include "sysemu/kvm.h" #include "qapi/qmp/qjson.h" #include "qemu/option.h" +#include "qemu/option_int.h" #include "qemu/config-file.h" #include "qemu-options.h" #include "qmp-commands.h" @@ -524,6 +525,20 @@ static QemuOptsList qemu_mem_opts = { }, }; +static int qemu_opts_loopfun(QemuOpts *opts, void *opaque) +{ +QemuOpt *opt; + +if (!strncmp(((QemuOpt *)opts->list)->name, "mon", 3)) { +QTAILQ_FOREACH(opt, &opts->head, next) { +if (!strncmp(opt->name, "chardev", 7)) { +return 1; +} +} +} +return 0; +} + /** * Get machine options * @@ -4113,6 +4128,11 @@ int main(int argc, char **argv, char **envp) } if (display_type == DT_NOGRAPHIC) { +int have_stdio = 0; +QemuOptsList *opts = qemu_find_opts("mon"); +if (opts) { +have_stdio = qemu_opts_foreach(opts, qemu_opts_loopfun, NULL, 0); +} if (default_parallel) add_device_config(DEV_PARALLEL, "null"); if (default_serial && default_monitor) { @@ -4122,7 +4142,7 @@ int main(int argc, char **argv, char **envp) } else if (default_sclp && default_monitor) { add_device_config(DEV_SCLP, "mon:stdio"); } else { -if (default_serial) +if (default_serial && !have_stdio) add_device_config(DEV_SERIAL, "stdio"); if (default_virtcon) add_device_config(DEV_VIRTCON, "stdio"); -- 1.9.0
Re: [Qemu-devel] Monitor Readline - no terminal echo after exit
On Thu, Apr 24, 2014 at 3:31 AM, Markus Armbruster wrote: >> I believe someone on the list mentioned they are seeing a couple >> problems entering and exiting the Monitor. I'd like to look at this more >> closely, starting with my most pending issue: losing the terminal echo >> after exiting the Monitor. Thanks for the reply Markus. > Reproducer? I've found a couple of ways to reproduce the problem. The easiest is to use -nographic on the qemu command line when starting a qemu session. In this case the monitor opens stdio but there is no visible input or output. Another way is to use -nographic along with -mon ,mode=readline. In this case the monitor works, but when you exit from the monitor your terminal will not echo characters. For reference, here are the chardev and mon options I use: -chardev stdio,id=mon0 -mon chardev=mon0,mode=readline I see that -nographic is a deprecated option, fwiw. > The monitor runs on top of a QEMU chardev. Suggest to start digging at > monitor_init(), both into the monitor itself, and into the > CharDriverState object. Thus far I've confirmed that when the -nographic option is passed, the mon_init_func does not get called (as it does for readline mode). I know why this is, but I'm not yet sure the right way to fix it. Also, with -nographic and mon:stdio monitor_flush is called for every line entered execpt for the last line. Normally monitor_flush is called for every line including the last line, at least in readline mode. I've run out of time looking at this today, but would but would be happy if anyone has further ideas. Mike
[Qemu-devel] Monitor Readline - no terminal echo after exit
I believe someone on the list mentioned they are seeing a couple problems entering and exiting the Monitor. I'd like to look at this more closely, starting with my most pending issue: losing the terminal echo after exiting the Monitor. Does anyone have a quick pointer as to where I should look for this code? Otherwise I'll start looking through main_loop and friends and vl.c for init and destroy routines. Mike -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH v6] spapr_hcall: add address-translation-mode-on-interrupt resource in H_SET_MODE
On Tue, Apr 8, 2014 at 2:28 AM, Alexander Graf wrote: > On 22.03.14 14:03, Alexey Kardashevskiy wrote: >> >> This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from >> the H_SET_MODE, for POWER8 (PowerISA 2.07) only. >> >> Signed-off-by: Alexey Kardashevskiy >> Reviewed-by: Mike Day > > > ... but I can comment on style issues regardless for now ;) Alex, would a document under the CDA we have with SUSE be acceptable to you? That is the fastest option available at the moment. Mike
Re: [Qemu-devel] [PATCH v5 3/3] spapr_hcall: add address-translation-mode-on-interrupt resource in H_SET_MODE
Alexey Kardashevskiy writes: > This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from > the H_SET_MODE, for POWER8 (PowerISA 2.07) only. > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: Mike Day > --- > hw/ppc/spapr_hcall.c | 26 ++ > target-ppc/cpu.h | 2 ++ > 2 files changed, 28 insertions(+) > > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c > index fc5211b..fb23730 100644 > --- a/hw/ppc/spapr_hcall.c > +++ b/hw/ppc/spapr_hcall.c > @@ -747,6 +747,32 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > default: > ret = H_UNSUPPORTED_FLAG; > } > +} else if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) { > +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); > + > +if (!(pcc->insns_flags2 & PPC2_ISA207S)) { > +return H_P2; ret = H_P2; goto out; Just a nit to make for easier review. The above would be more consistent. (Even though ret is already initialized to H_P2.) > +} > +if (value1) { > +ret = H_P3; > +goto out; > +} > +if (value2) { > +ret = H_P4; > +goto out; > +} > +switch (mflags) { > +case 0: > +case 2: > +case 3: > +CPU_FOREACH(cs) { > +set_spr(cs, SPR_LPCR, mflags << LPCR_AIL_SH, LPCR_AIL); > +} > +return H_SUCCESS; > + > +default: > +return H_UNSUPPORTED_FLAG; > +} > } > > out: > diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h > index 72cb546..577193a 100644 > --- a/target-ppc/cpu.h > +++ b/target-ppc/cpu.h > @@ -462,6 +462,8 @@ struct ppc_slb_t { > #define MSR_LE 0 /* Little-endian mode 1 hflags > */ > > #define LPCR_ILE (1 << (63-38)) > +#define LPCR_AIL 0x0180 /* Alternate interrupt location */ > +#define LPCR_AIL_SH (63-40) > > #define msr_sf ((env->msr >> MSR_SF) & 1) > #define msr_isf ((env->msr >> MSR_ISF) & 1) > -- > 1.8.4.rc4 > > -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] qemu patch for adding functionality to rtas_ibm_get_system_parameter [Version 2]
Tomohiro B Berry writes: > Hi again, > > I believe I have added the appropriate format changes and made a couple > changes to the code. This patch should add functionality to the function > rtas_ibm_get_system_parameter to return a string containing the values for > partition_max_entitled_capacity and system_potential_processors. The text before the diff goes into the git commit log. The patch guidelines say the commit log entry should describe the patch and anything else should go below "--" which will keep it out of the commit log. "I believe I have..." is an example of the type of text that shouldn't be in the commit log. There is still a line wrap - If you are not doing so consider running your patch through scripts/chechpatch.pl before submitting it. Also consider using git-format-patch to generate the patch you will send if you're not doing so already. Mike > Regards, > Tomo Berry > > Signed-off-by: Tomo Berry > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c > index 1cb276d..931ba06 100644 > --- a/hw/ppc/spapr_rtas.c > +++ b/hw/ppc/spapr_rtas.c > @@ -225,6 +225,9 @@ static void rtas_stop_self(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > } > > #define DIAGNOSTICS_RUN_MODE42 > +#define SPLPAR_CHARACTERISTICS_TOKEN 20 > +#define CMO_CHARACTERISTICS_TOKEN 44 > +#define CEDE_LATENCY_TOKEN 45 > > static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu, >sPAPREnvironment *spapr, > @@ -236,6 +239,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU > *cpu, > target_ulong buffer = rtas_ld(args, 1); > target_ulong length = rtas_ld(args, 2); > target_ulong ret = RTAS_OUT_NOT_SUPPORTED; > +char *local_buffer; > > switch (parameter) { > case DIAGNOSTICS_RUN_MODE: > @@ -244,6 +248,36 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU > *cpu, > ret = RTAS_OUT_SUCCESS; > } > break; > +case SPLPAR_CHARACTERISTICS_TOKEN: > + > +/* Create a string of length bytes locally to copy to buffer */ > + > +local_buffer = calloc(length, 1); > + > +/* These are the current system parameters supported. The first > + * 16 bits of the buffer must contain the length of the string. > + * These 16 bits are not included in the length of the string. > */ > + > +((uint16_t *)local_buffer)[0] = cpu_to_be16(snprintf(local_buffer > + 2, > + length - 2, "MaxEntCap=%d,MaxPlatProcs=%d", max_cpus, > smp_cpus)); > + > +/* Copy the string into buffer using rtas_st_buffer to > + * convert the addresses. */ > + > +rtas_st_buffer(buffer, length, (uint8_t *)local_buffer); > + > +free(local_buffer); > +ret = RTAS_OUT_SUCCESS; > +break; > +case CMO_CHARACTERISTICS_TOKEN: > +ret = RTAS_OUT_NOT_SUPPORTED; > +break; > +case CEDE_LATENCY_TOKEN: > +ret = RTAS_OUT_NOT_SUPPORTED; > +break; > +default: > +ret = RTAS_OUT_NOT_SUPPORTED; > +break; > } > > rtas_st(rets, 0, ret); > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index b2f11e9..ee6ed2d 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -356,6 +356,18 @@ static inline void rtas_st(target_ulong phys, int n, > uint32_t val) > stl_be_phys(ppc64_phys_to_real(phys + 4*n), val); > } > > +/* This function will store a buffer 1 byte at a time into the > + * address at phys up to a maximum of phys_len bytes.*/ > + > +static inline void rtas_st_buffer(target_ulong phys, > + target_ulong phys_len, > + uint8_t *buffer) { > +uint32_t i; > +for (i = 0; i < phys_len; i++) { > +stb_phys(ppc64_phys_to_real(phys + i), buffer[i]); > +} > +} > + > typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr, >uint32_t token, >uint32_t nargs, target_ulong args, > > > > From: Mike Day > To: Tomohiro B Berry/Austin/IBM@IBMUS, qemu-devel@nongnu.org, > Date: 03/13/2014 03:24 PM > Subject:Re: [Qemu-devel] qemu patch for adding functionality to > rtas_ibm_get_system_parameter > > > > > Tomohiro, > > Please follow the guidelines for submitting a patch to Qemu that are > found in: > > http://wiki.qemu.org/Contribute/SubmitAPatch > > This patch has an inappropriate commit log, is missing a signed-off-by: > tag, and some of the lines wrapped in my rea
Re: [Qemu-devel] qemu patch for adding functionality to rtas_ibm_get_system_parameter
Tomohiro, Please follow the guidelines for submitting a patch to Qemu that are found in: http://wiki.qemu.org/Contribute/SubmitAPatch This patch has an inappropriate commit log, is missing a signed-off-by: tag, and some of the lines wrapped in my reader. These are explained in the document above. You should be able to fix these issues quickly and resubmit this patch. Mike Tomohiro B Berry writes: > Hi all, > > rtas_ibm_get_system_parameter did not previously have the functionality to > return the appropriate string when called with the > SPLPAR_CHARACTERISTICS_TOKEN. I am proposing the following patch to add > that functionality. I am including the cases for > CMO_CHARACTERISTICS_TOKEN and CEDE_LATENCY_TOKEN because the function gets > called with those tokens during the boot process, but I understand that > they are currently redundant with the default case. I just figured they > would be useful for future development, but if we don't want them in there > right now I think that would be okay, too. > > Regards, > Tomo Berry > > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c > index 1cb276d..318fdcd 100644 > --- a/hw/ppc/spapr_rtas.c > +++ b/hw/ppc/spapr_rtas.c > @@ -225,6 +225,9 @@ static void rtas_stop_self(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > } > > #define DIAGNOSTICS_RUN_MODE42 > +#define SPLPAR_CHARACTERISTICS_TOKEN 20 > +#define CMO_CHARACTERISTICS_TOKEN 44 > +#define CEDE_LATENCY_TOKEN 45 > > static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu, >sPAPREnvironment *spapr, > @@ -236,6 +239,7 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU > *cpu, > target_ulong buffer = rtas_ld(args, 1); > target_ulong length = rtas_ld(args, 2); > target_ulong ret = RTAS_OUT_NOT_SUPPORTED; > +char *local_buffer; > > switch (parameter) { > case DIAGNOSTICS_RUN_MODE: > @@ -244,6 +248,42 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU > *cpu, > ret = RTAS_OUT_SUCCESS; > } > break; > +case SPLPAR_CHARACTERISTICS_TOKEN: > + > + /* Create a string locally to copy to buffer */ > + > + local_buffer=(char*)malloc(length*sizeof(char)); > + memset(local_buffer,0,length); > + > + /* These are the current system parameters supported. The spaces > at the > +* start of the string are place holders for the string length. > */ > + > + snprintf(local_buffer,length," > MaxEntCap=%d,MaxPlatProcs=%d",max_cpus,smp_cpus); > + > + /* The first 16 bits of the buffer must contain the length of the > string. > +* These 16 bits are not included in the length of the string. */ > + > + > ((uint16_t*)local_buffer)[0]=cpu_to_be16((uint16_t)strlen(&local_buffer[2])); > + > + /* Copy the string into buffer using rtas_st_buffer to > +* convert the addresses. */ > + > + rtas_st_buffer(buffer,length,(uint8_t*)local_buffer); > + > + /* Free the local buffer and return success. */ > + > + free(local_buffer); > + ret = RTAS_OUT_SUCCESS; > + break; > +case CMO_CHARACTERISTICS_TOKEN: > + ret = RTAS_OUT_NOT_SUPPORTED; > + break; > +case CEDE_LATENCY_TOKEN: > + ret = RTAS_OUT_NOT_SUPPORTED; > + break; > +default: > + ret = RTAS_OUT_NOT_SUPPORTED; > + break; > } > > rtas_st(rets, 0, ret); > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index b2f11e9..87c039c 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -356,6 +356,16 @@ static inline void rtas_st(target_ulong phys, int n, > uint32_t val) > stl_be_phys(ppc64_phys_to_real(phys + 4*n), val); > } > > +/* This function will store a buffer 1 byte at a time into the > + * address at phys up to a maximum of phys_len bytes. */ > + > +static inline void rtas_st_buffer(target_ulong phys, target_ulong > phys_len, uint8_t* buffer){ > +uint32_t i; > +for(i=0;i +stb_phys(ppc64_phys_to_real(phys + i),buffer[i]); > +} > +} > + > typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPREnvironment *spapr, >uint32_t token, >uint32_t nargs, target_ulong args, > diff --git a/pixman b/pixman > --- a/pixman > +++ b/pixman > @@ -1 +1 @@ > -Subproject commit 97336fad32acf802003855cd8bd6477fa49a12e3 > +Subproject commit 97336fad32acf802003855cd8bd6477fa49a12e3-dirty -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] spapr-iommu: extend SPAPR_TCE_TABLE class
On 20/11/13 16:39 +1100, Alexey Kardashevskiy wrote: > This adds a put_tce() callback to the SPAPR TCE TABLE device class. > The new callback allows to have different IOMMU types such as upcoming > VFIO IOMMU and it will be used more by the upcoming Multi-TCE support. > > This reworks the H_PUT_TCE handler to make use of the new put_tce() > callback. > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: Mike Day > --- > hw/ppc/spapr_iommu.c | 21 + > include/hw/ppc/spapr.h | 13 + > 2 files changed, 30 insertions(+), 4 deletions(-) > > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c > index ef45f4f..0016c13 100644 > --- a/hw/ppc/spapr_iommu.c > +++ b/hw/ppc/spapr_iommu.c > @@ -207,7 +207,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, > target_ulong ioba, > IOMMUTLBEntry entry; > > if (ioba >= tcet->window_size) { > -hcall_dprintf("spapr_vio_put_tce on out-of-bounds IOBA 0x" > +hcall_dprintf("spapr put_tce_emu on out-of-bounds IOBA 0x" >TARGET_FMT_lx "\n", ioba); > return H_PARAMETER; > } > @@ -232,12 +232,21 @@ static target_ulong h_put_tce(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > target_ulong tce = args[2]; > target_ulong ret = H_PARAMETER; > sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn); > +sPAPRTCETableClass *info; > + > +if (!tcet) { > +return H_PARAMETER; > +} > + > +info = SPAPR_TCE_TABLE_GET_CLASS(tcet); > +if (!info || !info->put_tce) { > +return H_PARAMETER; > +} > > ioba &= ~(SPAPR_TCE_PAGE_SIZE - 1); > > -if (tcet) { > -ret = put_tce_emu(tcet, ioba, tce); > -} > +ret = info->put_tce(tcet, ioba, tce); > + > trace_spapr_iommu_put(liobn, ioba, tce, ret); > > return ret; > @@ -287,9 +296,12 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const > char *propname, > static void spapr_tce_table_class_init(ObjectClass *klass, void *data) > { > DeviceClass *dc = DEVICE_CLASS(klass); > +sPAPRTCETableClass *stc = SPAPR_TCE_TABLE_CLASS(klass); > + > dc->vmsd = &vmstate_spapr_tce_table; > dc->init = spapr_tce_table_realize; > dc->reset = spapr_tce_reset; > +stc->put_tce = put_tce_emu; > > QLIST_INIT(&spapr_tce_tables); > > @@ -302,6 +314,7 @@ static TypeInfo spapr_tce_table_info = { > .parent = TYPE_DEVICE, > .instance_size = sizeof(sPAPRTCETable), > .class_init = spapr_tce_table_class_init, > +.class_size = sizeof(sPAPRTCETableClass), > .instance_finalize = spapr_tce_table_finalize, > }; > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index fdaab2d..827cda2 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -367,12 +367,25 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr > rtas_addr, > > #define RTAS_ERROR_LOG_MAX 2048 > > +typedef struct sPAPRTCETableClass sPAPRTCETableClass; > typedef struct sPAPRTCETable sPAPRTCETable; > > #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table" > #define SPAPR_TCE_TABLE(obj) \ > OBJECT_CHECK(sPAPRTCETable, (obj), TYPE_SPAPR_TCE_TABLE) > > +#define SPAPR_TCE_TABLE_CLASS(klass) \ > + OBJECT_CLASS_CHECK(sPAPRTCETableClass, (klass), TYPE_SPAPR_TCE_TABLE) > +#define SPAPR_TCE_TABLE_GET_CLASS(obj) \ > + OBJECT_GET_CLASS(sPAPRTCETableClass, (obj), TYPE_SPAPR_TCE_TABLE) > + > +struct sPAPRTCETableClass { > +DeviceClass parent_class; > + > +target_ulong (*put_tce)(sPAPRTCETable *tcet, target_ulong ioba, > +target_ulong tce); > +}; > + > struct sPAPRTCETable { > DeviceState parent; > uint32_t liobn; > -- > 1.8.4.rc4 > > > >
Re: [Qemu-devel] spapr-pci: converts fprintf to error_report
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote: > Signed-off-by: Alexey Kardashevskiy Reviewed-by: Mike Day > --- > hw/ppc/spapr_pci.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index 963841c..d102d82 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -293,7 +293,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > ret_intr_type = RTAS_TYPE_MSIX; > break; > default: > -fprintf(stderr, "rtas_ibm_change_msi(%u) is not implemented\n", > func); > +error_report("rtas_ibm_change_msi(%u) is not implemented", func); > rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR); > return; > } > @@ -327,7 +327,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > /* Find a device number in the map to add or reuse the existing one */ > ndev = spapr_msicfg_find(phb, config_addr, true); > if (ndev >= SPAPR_MSIX_MAX_DEVS || ndev < 0) { > -fprintf(stderr, "No free entry for a new MSI device\n"); > +error_report("No free entry for a new MSI device"); > rtas_st(rets, 0, RTAS_OUT_HW_ERROR); > return; > } > @@ -336,7 +336,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > /* Check if there is an old config and MSI number has not changed */ > if (phb->msi_table[ndev].nvec && (req_num != phb->msi_table[ndev].nvec)) > { > /* Unexpected behaviour */ > -fprintf(stderr, "Cannot reuse MSI config for device#%d", ndev); > +error_report("Cannot reuse MSI config for device#%d", ndev); > rtas_st(rets, 0, RTAS_OUT_HW_ERROR); > return; > } > @@ -346,7 +346,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > irq = spapr_allocate_irq_block(req_num, false, > ret_intr_type == RTAS_TYPE_MSI); > if (irq < 0) { > -fprintf(stderr, "Cannot allocate MSIs for device#%d", ndev); > +error_report("Cannot allocate MSIs for device#%d", ndev); > rtas_st(rets, 0, RTAS_OUT_HW_ERROR); > return; > } > -- > 1.8.4.rc4 > > > >
Re: [Qemu-devel] spapr-pci: introduce a finish_realize() callback
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote: > The spapr-pci PHB initializes IOMMU for emulataed devices only. > The upcoming VFIO support will do it different. However both emulated > and VFIO PHB types share most of the initialization code. > For the type specific things a new finish_realize() callback is > introduced. > > This introduces sPAPRPHBClass derived from PCIHostBridgeClass and > adds the callback pointer. > > This implements finish_realize() for emulated devices. > > This changes initialization steps order to have the finish_realize() > call at the end of the spapr_finalize(). > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: Mike Day > --- > Changes: > v5: > * this is a new patch in the series, it was a part of a previous patch > --- > hw/ppc/spapr_pci.c | 46 > + > include/hw/pci-host/spapr.h | 18 -- > 2 files changed, 46 insertions(+), 18 deletions(-) > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index aeb012d..963841c 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -500,6 +500,7 @@ static void spapr_phb_realize(DeviceState *dev, Error > **errp) > SysBusDevice *s = SYS_BUS_DEVICE(dev); > sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s); > PCIHostState *phb = PCI_HOST_BRIDGE(s); > +sPAPRPHBClass *info = SPAPR_PCI_HOST_BRIDGE_GET_CLASS(s); > const char *busname; > char *namebuf; > int i; > @@ -609,22 +610,6 @@ static void spapr_phb_realize(DeviceState *dev, Error > **errp) > PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS); > phb->bus = bus; > > -sphb->dma_window_start = 0; > -sphb->dma_window_size = 0x4000; > -sphb->tcet = spapr_tce_new_table(dev, sphb->dma_liobn, > - sphb->dma_window_size); > -if (!sphb->tcet) { > -error_setg(errp, "Unable to create TCE table for %s", > - sphb->dtbusname); > -return; > -} > -address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet), > - sphb->dtbusname); > - > -pci_setup_iommu(bus, spapr_pci_dma_iommu, sphb); > - > -pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq); > - > QLIST_INSERT_HEAD(&spapr->phbs, sphb, list); > > /* Initialize the LSI table */ > @@ -639,6 +624,32 @@ static void spapr_phb_realize(DeviceState *dev, Error > **errp) > > sphb->lsi_table[i].irq = irq; > } > + > +pci_setup_iommu(sphb->parent_obj.bus, spapr_pci_dma_iommu, sphb); > + > +pci_bus_set_route_irq_fn(bus, spapr_route_intx_pin_to_irq); > + > +if (!info->finish_realize) { > +error_setg(errp, "finish_realize not defined"); > +return; > +} > + > +info->finish_realize(sphb, errp); > +} > + > +static void spapr_phb_finish_realize(sPAPRPHBState *sphb, Error **errp) > +{ > +sphb->dma_window_start = 0; > +sphb->dma_window_size = 0x4000; > +sphb->tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn, > + sphb->dma_window_size); > +if (!sphb->tcet) { > +error_setg(errp, "Unable to create TCE table for %s", > + sphb->dtbusname); > +return ; > +} > +address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet), > + sphb->dtbusname); > } > > static void spapr_phb_reset(DeviceState *qdev) > @@ -722,12 +733,14 @@ static void spapr_phb_class_init(ObjectClass *klass, > void *data) > { > PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass); > DeviceClass *dc = DEVICE_CLASS(klass); > +sPAPRPHBClass *spc = SPAPR_PCI_HOST_BRIDGE_CLASS(klass); > > hc->root_bus_path = spapr_phb_root_bus_path; > dc->realize = spapr_phb_realize; > dc->props = spapr_phb_properties; > dc->reset = spapr_phb_reset; > dc->vmsd = &vmstate_spapr_pci; > +spc->finish_realize = spapr_phb_finish_realize; > } > > static const TypeInfo spapr_phb_info = { > @@ -735,6 +748,7 @@ static const TypeInfo spapr_phb_info = { > .parent= TYPE_PCI_HOST_BRIDGE, > .instance_size = sizeof(sPAPRPHBState), > .class_init= spapr_phb_class_init, > +.class_size= sizeof(sPAPRPHBClass), > }; > > PCIHostState *spapr_create_phb(sPAPREnvironment *spapr, int index) > diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h > index 970b4a9
Re: [Qemu-devel] spapr-pci: convert init() callback to realize()
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote: > This converts the old-style init() callback to a new style realize() > callback as init() now is supposed to do only trivial initialization. > > As a part of convertion, this replaces fprintf(stderr) with error_setg() > as realize() does not "return" any value, instead it puts the extended > error into **errp. > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: Mike Day > --- > Changes: > v5: > * finish_finalize() moved to a separate patch > --- > hw/ppc/spapr_pci.c | 44 ++-- > 1 file changed, 22 insertions(+), 22 deletions(-) > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index 7763149..aeb012d 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -32,6 +32,7 @@ > #include "exec/address-spaces.h" > #include > #include "trace.h" > +#include "qemu/error-report.h" > > #include "hw/pci/pci_bus.h" > > @@ -494,9 +495,9 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, > void *opaque, int devfn) > return &phb->iommu_as; > } > > -static int spapr_phb_init(SysBusDevice *s) > +static void spapr_phb_realize(DeviceState *dev, Error **errp) > { > -DeviceState *dev = DEVICE(s); > +SysBusDevice *s = SYS_BUS_DEVICE(dev); > sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s); > PCIHostState *phb = PCI_HOST_BRIDGE(s); > const char *busname; > @@ -510,9 +511,9 @@ static int spapr_phb_init(SysBusDevice *s) > if ((sphb->buid != -1) || (sphb->dma_liobn != -1) > || (sphb->mem_win_addr != -1) > || (sphb->io_win_addr != -1)) { > -fprintf(stderr, "Either \"index\" or other parameters must" > -" be specified for PAPR PHB, not both\n"); > -return -1; > +error_setg(errp, "Either \"index\" or other parameters must" > + " be specified for PAPR PHB, not both"); > +return; shouldn't these return an int? Or if the error is returned by pointer perhaps the function should have a void return? > } > > sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index; > @@ -525,28 +526,28 @@ static int spapr_phb_init(SysBusDevice *s) > } > > if (sphb->buid == -1) { > -fprintf(stderr, "BUID not specified for PHB\n"); > -return -1; > +error_setg(errp, "BUID not specified for PHB"); > +return; > } > > if (sphb->dma_liobn == -1) { > -fprintf(stderr, "LIOBN not specified for PHB\n"); > -return -1; > +error_setg(errp, "LIOBN not specified for PHB"); > +return; > } > > if (sphb->mem_win_addr == -1) { > -fprintf(stderr, "Memory window address not specified for PHB\n"); > -return -1; > +error_setg(errp, "Memory window address not specified for PHB"); > +return; > } > > if (sphb->io_win_addr == -1) { > -fprintf(stderr, "IO window address not specified for PHB\n"); > -return -1; > +error_setg(errp, "IO window address not specified for PHB"); > +return; > } > > if (find_phb(spapr, sphb->buid)) { > -fprintf(stderr, "PCI host bridges must have unique BUIDs\n"); > -return -1; > +error_setg(errp, "PCI host bridges must have unique BUIDs"); > +return; > } > > sphb->dtbusname = g_strdup_printf("pci@%" PRIx64, sphb->buid); > @@ -613,8 +614,9 @@ static int spapr_phb_init(SysBusDevice *s) > sphb->tcet = spapr_tce_new_table(dev, sphb->dma_liobn, > sphb->dma_window_size); > if (!sphb->tcet) { > -fprintf(stderr, "Unable to create TCE table for %s\n", > sphb->dtbusname); > -return -1; > +error_setg(errp, "Unable to create TCE table for %s", > + sphb->dtbusname); > +return; > } > address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet), > sphb->dtbusname); > @@ -631,13 +633,12 @@ static int spapr_phb_init(SysBusDevice *s) > > irq = spapr_allocate_lsi(0); > if (!irq) { > -return -1; > +error_setg(errp, "spapr_allocate_lsi failed"); > +return; > } > > s
Re: [Qemu-devel] spapr-pci: convert init() callback to realize()
On 21/11/13 15:08 +1100, Alexey Kardashevskiy wrote: > This converts the old-style init() callback to a new style realize() > callback as init() now is supposed to do only trivial initialization. > > As a part of convertion, this replaces fprintf(stderr) with error_setg() > as realize() does not "return" any value, instead it puts the extended > error into **errp. > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: Mike Day > --- > Changes: > v5: > * finish_finalize() moved to a separate patch > --- > hw/ppc/spapr_pci.c | 44 ++-- > 1 file changed, 22 insertions(+), 22 deletions(-) > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > index 7763149..aeb012d 100644 > --- a/hw/ppc/spapr_pci.c > +++ b/hw/ppc/spapr_pci.c > @@ -32,6 +32,7 @@ > #include "exec/address-spaces.h" > #include > #include "trace.h" > +#include "qemu/error-report.h" > > #include "hw/pci/pci_bus.h" > > @@ -494,9 +495,9 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, > void *opaque, int devfn) > return &phb->iommu_as; > } > > -static int spapr_phb_init(SysBusDevice *s) > +static void spapr_phb_realize(DeviceState *dev, Error **errp) > { > -DeviceState *dev = DEVICE(s); > +SysBusDevice *s = SYS_BUS_DEVICE(dev); > sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(s); > PCIHostState *phb = PCI_HOST_BRIDGE(s); > const char *busname; > @@ -510,9 +511,9 @@ static int spapr_phb_init(SysBusDevice *s) > if ((sphb->buid != -1) || (sphb->dma_liobn != -1) > || (sphb->mem_win_addr != -1) > || (sphb->io_win_addr != -1)) { > -fprintf(stderr, "Either \"index\" or other parameters must" > -" be specified for PAPR PHB, not both\n"); > -return -1; > +error_setg(errp, "Either \"index\" or other parameters must" > + " be specified for PAPR PHB, not both"); > +return; > } Seems like these exit clauses should return an integer here and below. > sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index; > @@ -525,28 +526,28 @@ static int spapr_phb_init(SysBusDevice *s) > } > > if (sphb->buid == -1) { > -fprintf(stderr, "BUID not specified for PHB\n"); > -return -1; > +error_setg(errp, "BUID not specified for PHB"); > +return; > } > > if (sphb->dma_liobn == -1) { > -fprintf(stderr, "LIOBN not specified for PHB\n"); > -return -1; > +error_setg(errp, "LIOBN not specified for PHB"); > +return; > } > > if (sphb->mem_win_addr == -1) { > -fprintf(stderr, "Memory window address not specified for PHB\n"); > -return -1; > +error_setg(errp, "Memory window address not specified for PHB"); > +return; > } > > if (sphb->io_win_addr == -1) { > -fprintf(stderr, "IO window address not specified for PHB\n"); > -return -1; > +error_setg(errp, "IO window address not specified for PHB"); > +return; > } > > if (find_phb(spapr, sphb->buid)) { > -fprintf(stderr, "PCI host bridges must have unique BUIDs\n"); > -return -1; > +error_setg(errp, "PCI host bridges must have unique BUIDs"); > +return; > } > > sphb->dtbusname = g_strdup_printf("pci@%" PRIx64, sphb->buid); > @@ -613,8 +614,9 @@ static int spapr_phb_init(SysBusDevice *s) > sphb->tcet = spapr_tce_new_table(dev, sphb->dma_liobn, > sphb->dma_window_size); > if (!sphb->tcet) { > -fprintf(stderr, "Unable to create TCE table for %s\n", > sphb->dtbusname); > -return -1; > +error_setg(errp, "Unable to create TCE table for %s", > + sphb->dtbusname); > +return; > } > address_space_init(&sphb->iommu_as, spapr_tce_get_iommu(sphb->tcet), > sphb->dtbusname); > @@ -631,13 +633,12 @@ static int spapr_phb_init(SysBusDevice *s) > > irq = spapr_allocate_lsi(0); > if (!irq) { > -return -1; > +error_setg(errp, "spapr_allocate_lsi failed"); > +return; > } > > sphb->lsi_table[i].irq = irq; > } > - > -return 0; > } > > static void spapr_phb_reset(DeviceState *qdev) > @@ -720,11 +721,10 @@ static const char *spapr_phb_root_bus_path(PCIHostState > *host_bridge, > static void spapr_phb_class_init(ObjectClass *klass, void *data) > { > PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass); > -SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass); > DeviceClass *dc = DEVICE_CLASS(klass); > > hc->root_bus_path = spapr_phb_root_bus_path; > -sdc->init = spapr_phb_init; > +dc->realize = spapr_phb_realize; > dc->props = spapr_phb_properties; > dc->reset = spapr_phb_reset; > dc->vmsd = &vmstate_spapr_pci; > -- > 1.8.4.rc4 > > > >
[Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V3
timerlists is a list of lists that holds active timers, among other items. It is read-mostly. This patch converts read access to the timerlists to use RCU. Rather than introduce a second mutex for timerlists, which would require nested mutexes to in order to write to the timerlists, use one QemuMutex in the QemuClock structure for all write access to any list hanging off the QemuClock structure. This patch applies against Paolo Bonzini's rcu branch: https://github.com/bonzini/qemu/tree/rcu and also requires the previously submitted patch ae11e1c "Convert active timers list to use RCU for read operations V3." V3: - timerlists modifications split to a separate patch (this one). - Addressed comments from Alex Bligh and Paolo Bonzini. Signed-off-by: Mike Day --- qemu-timer.c | 31 ++- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/qemu-timer.c b/qemu-timer.c index 57a1545..4144e54 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -74,6 +74,7 @@ struct QEMUTimerList { QEMUTimerListNotifyCB *notify_cb; void *notify_opaque; QemuEvent timers_done_ev; +struct rcu_head rcu; }; /** @@ -111,6 +112,13 @@ QEMUTimerList *timerlist_new(QEMUClockType type, return timer_list; } +static void reclaim_timerlist(struct rcu_head *rcu) +{ +QEMUTimerList *tl = container_of(rcu, QEMUTimerList, rcu); +g_free(tl); +} + + void timerlist_free(QEMUTimerList *timer_list) { assert(!timerlist_has_timers(timer_list)); @@ -118,7 +126,7 @@ void timerlist_free(QEMUTimerList *timer_list) QLIST_REMOVE(timer_list, list); } qemu_event_destroy(&timer_list->timers_done_ev); -g_free(timer_list); +call_rcu1(&timer_list->rcu, reclaim_timerlist); } static void qemu_clock_init(QEMUClockType type) @@ -143,9 +151,11 @@ void qemu_clock_notify(QEMUClockType type) { QEMUTimerList *timer_list; QEMUClock *clock = qemu_clock_ptr(type); -QLIST_FOREACH(timer_list, &clock->timerlists, list) { +rcu_read_lock(); +QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) { timerlist_notify(timer_list); } +rcu_read_unlock(); } void qemu_clock_enable(QEMUClockType type, bool enabled) @@ -157,9 +167,11 @@ void qemu_clock_enable(QEMUClockType type, bool enabled) if (enabled && !old) { qemu_clock_notify(type); } else if (!enabled && old) { -QLIST_FOREACH(tl, &clock->timerlists, list) { +rcu_read_lock(); +QLIST_FOREACH_RCU(tl, &clock->timerlists, list) { qemu_event_wait(&tl->timers_done_ev); } +rcu_read_unlock(); } } @@ -243,10 +255,12 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type) int64_t deadline = -1; QEMUTimerList *timer_list; QEMUClock *clock = qemu_clock_ptr(type); -QLIST_FOREACH(timer_list, &clock->timerlists, list) { +rcu_read_lock(); +QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) { deadline = qemu_soonest_timeout(deadline, timerlist_deadline_ns(timer_list)); } +rcu_read_unlock(); return deadline; } @@ -262,11 +276,13 @@ QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type) void timerlist_notify(QEMUTimerList *timer_list) { -if (timer_list->notify_cb) { +rcu_read_lock(); +if (atomic_rcu_read(&timer_list->notify_cb)) { timer_list->notify_cb(timer_list->notify_opaque); } else { qemu_notify_event(); } +rcu_read_unlock(); } /* Transition function to convert a nanosecond timeout to ms @@ -585,13 +601,18 @@ void qemu_clock_register_reset_notifier(QEMUClockType type, Notifier *notifier) { QEMUClock *clock = qemu_clock_ptr(type); +qemu_mutex_lock(&clock->timer_lock); notifier_list_add(&clock->reset_notifiers, notifier); +qemu_mutex_unlock(&clock->timer_lock); } void qemu_clock_unregister_reset_notifier(QEMUClockType type, Notifier *notifier) { +QEMUClock *clock = qemu_clock_ptr(type); +qemu_mutex_lock(&clock->timer_lock); notifier_remove(notifier); +qemu_mutex_unlock(&clock->timer_lock); } void init_clocks(void) -- 1.9.0
[Qemu-devel] [PATCH 1/2] [RFC] Convert active timers list to use RCU V3
active_timers is a list of timer lists, associated with a Qemu Clock, that is read-mostly. This patch converts read accesses to the list to use RCU, which should hopefully mitigate most of the synchronization overhead. Write accesses are now via a mutex in the parent data structure. Functions that change the change the parent data structure are also protected by the mutex. This patch applies against Paolo Bonzini's rcu branch: https://github.com/bonzini/qemu/tree/rcu V3: - Address comments from Alex Bligh and Paolo Bonzini - Move the mutex into the parent QemuClock structure Signed-off-by: Mike Day --- include/qemu/timer.h | 19 + qemu-timer.c | 111 +-- 2 files changed, 72 insertions(+), 58 deletions(-) diff --git a/include/qemu/timer.h b/include/qemu/timer.h index 5afcffc..f69ec49 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -5,8 +5,15 @@ #include "qemu-common.h" #include "qemu/notify.h" -/* timers */ +#ifdef __GNUC__ +#ifndef __ATOMIC_RELEASE +#define __ATOMIC_RELEASE +#endif +#endif +#include "qemu/atomic.h" +#include "qemu/rcu.h" +/* timers */ #define SCALE_MS 100 #define SCALE_US 1000 #define SCALE_NS 1 @@ -61,6 +68,7 @@ struct QEMUTimer { void *opaque; QEMUTimer *next; int scale; +struct rcu_head rcu; }; extern QEMUTimerListGroup main_loop_tlg; @@ -189,12 +197,6 @@ void qemu_clock_notify(QEMUClockType type); * @enabled: true to enable, false to disable * * Enable or disable a clock - * Disabling the clock will wait for related timerlists to stop - * executing qemu_run_timers. Thus, this functions should not - * be used from the callback of a timer that is based on @clock. - * Doing so would cause a deadlock. - * - * Caller should hold BQL. */ void qemu_clock_enable(QEMUClockType type, bool enabled); @@ -543,7 +545,6 @@ void timer_del(QEMUTimer *ts); * freed while this function is running. */ void timer_mod_ns(QEMUTimer *ts, int64_t expire_time); - /** * timer_mod_anticipate_ns: * @ts: the timer @@ -685,9 +686,7 @@ static inline int64_t qemu_soonest_timeout(int64_t timeout1, int64_t timeout2) void init_clocks(void); int64_t cpu_get_ticks(void); -/* Caller must hold BQL */ void cpu_enable_ticks(void); -/* Caller must hold BQL */ void cpu_disable_ticks(void); static inline int64_t get_ticks_per_sec(void) diff --git a/qemu-timer.c b/qemu-timer.c index fb9e680..57a1545 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -29,6 +29,7 @@ #include "hw/hw.h" #include "qemu/timer.h" +#include "qemu/rcu_queue.h" #ifdef CONFIG_POSIX #include #endif @@ -45,12 +46,10 @@ /* timers */ typedef struct QEMUClock { -/* We rely on BQL to protect the timerlists */ QLIST_HEAD(, QEMUTimerList) timerlists; - +QemuMutex timer_lock; NotifierList reset_notifiers; int64_t last; - QEMUClockType type; bool enabled; @@ -70,11 +69,11 @@ QEMUClock qemu_clocks[QEMU_CLOCK_MAX]; struct QEMUTimerList { QEMUClock *clock; -QemuMutex active_timers_lock; QEMUTimer *active_timers; QLIST_ENTRY(QEMUTimerList) list; QEMUTimerListNotifyCB *notify_cb; void *notify_opaque; +QemuEvent timers_done_ev; }; /** @@ -87,6 +86,7 @@ struct QEMUTimerList { */ static inline QEMUClock *qemu_clock_ptr(QEMUClockType type) { +smp_rmb(); return &qemu_clocks[type]; } @@ -107,7 +107,6 @@ QEMUTimerList *timerlist_new(QEMUClockType type, timer_list->clock = clock; timer_list->notify_cb = cb; timer_list->notify_opaque = opaque; -qemu_mutex_init(&timer_list->active_timers_lock); QLIST_INSERT_HEAD(&clock->timerlists, timer_list, list); return timer_list; } @@ -118,7 +117,7 @@ void timerlist_free(QEMUTimerList *timer_list) if (timer_list->clock) { QLIST_REMOVE(timer_list, list); } -qemu_mutex_destroy(&timer_list->active_timers_lock); +qemu_event_destroy(&timer_list->timers_done_ev); g_free(timer_list); } @@ -129,6 +128,7 @@ static void qemu_clock_init(QEMUClockType type) clock->type = type; clock->enabled = true; clock->last = INT64_MIN; +qemu_mutex_init(&clock->timer_lock); QLIST_INIT(&clock->timerlists); notifier_list_init(&clock->reset_notifiers); main_loop_tlg.tl[type] = timerlist_new(type, NULL, NULL); @@ -148,13 +148,6 @@ void qemu_clock_notify(QEMUClockType type) } } -/* Disabling the clock will wait for related timerlists to stop - * executing qemu_run_timers. Thus, this functions should not - * be used from the callback of a timer that is based on @clock. - * Doing so would cause a deadlock. - * - * Caller should hold BQL. - */ void qemu_clock_enable(QEMUClockType type, bool enabled) { QEMUClock *clock = qemu_clock_ptr(type); @@ -172,7 +
[Qemu-devel] [PATCH 0/2][RFC] Convert Clock lists to RCU (V3)
The first patch in the series converts the clock->timerlists->active_timers list to RCU. It also creates QemuMutex in the parent QemuClock data structure. The second patch converts clock->timerlists to RCU. Mike Day (2): [RFC] Convert active timers list to use RCU V3 [RFC] Convert Clock Timerlists to RCU V3 include/qemu/timer.h | 19 --- qemu-timer.c | 142 --- 2 files changed, 98 insertions(+), 63 deletions(-) -- 1.9.0
Re: [Qemu-devel] [PATCH] usb-ohci: add vmstate descriptor
Alexey Kardashevskiy writes: > This adds migration support for OHCI. > > Signed-off-by: Alexey Kardashevskiy Reviewed-by: Mike Day > --- > hw/usb/hcd-ohci.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c > index e38cdeb..c42e091 100644 > --- a/hw/usb/hcd-ohci.c > +++ b/hw/usb/hcd-ohci.c > @@ -1984,6 +1984,17 @@ static Property ohci_pci_properties[] = { > DEFINE_PROP_END_OF_LIST(), > }; > > +static const VMStateDescription vmstate_ohci = { > +.name = "ohci", > +.version_id = 1, > +.minimum_version_id = 1, > +.minimum_version_id_old = 1, > +.fields = (VMStateField[]) { > +VMSTATE_PCI_DEVICE(parent_obj, OHCIPCIState), > +VMSTATE_END_OF_LIST() > +} > +}; > + > static void ohci_pci_class_init(ObjectClass *klass, void *data) > { > DeviceClass *dc = DEVICE_CLASS(klass); > @@ -1997,6 +2008,7 @@ static void ohci_pci_class_init(ObjectClass *klass, > void *data) > set_bit(DEVICE_CATEGORY_USB, dc->categories); > dc->desc = "Apple USB Controller"; > dc->props = ohci_pci_properties; > +dc->vmsd = &vmstate_ohci; > } > > static const TypeInfo ohci_pci_info = { > -- > 1.8.4.rc4 > >
Re: [Qemu-devel] [PATCH v7 2/2] target-ppc: spapr: e500: fix to use cpu_dt_id
On Sat, Feb 1, 2014 at 9:45 AM, Alexey Kardashevskiy wrote: > This makes use of @cpu_dt_id and related API in: > 1. emulated XICS hypercall handlers as they receive fixed CPU indexes; > 2. XICS-KVM to enable in-kernel XICS on right CPU; > 3. device-tree renderer. > > This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor > can accept command-line CPU indexes again. > > This changes kvm_arch_vcpu_id() to use ppc_get_vcpu_dt_id() as at the moment > KVM CPU id and device tree ID are calculated using the same algorithm. > Signed-off-by: Alexey Kardashevskiy Acked-by: Mike Day > --- > Changes: > v7: > * replaced referencing to PowerPCCPU::parent_obj with the CPU macro > --- > hw/intc/openpic_kvm.c | 2 +- > hw/intc/xics.c | 15 +-- > hw/intc/xics_kvm.c | 10 +- > hw/ppc/e500.c | 7 +-- > hw/ppc/spapr.c | 9 + > hw/ppc/spapr_hcall.c| 6 +++--- > hw/ppc/spapr_rtas.c | 14 +++--- > target-ppc/kvm.c| 2 +- > target-ppc/translate_init.c | 1 + > 9 files changed, 41 insertions(+), 25 deletions(-) > > diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c > index c7f7b84..87fdb12 100644 > --- a/hw/intc/openpic_kvm.c > +++ b/hw/intc/openpic_kvm.c > @@ -228,7 +228,7 @@ int kvm_openpic_connect_vcpu(DeviceState *d, CPUState *cs) > > encap.cap = KVM_CAP_IRQ_MPIC; > encap.args[0] = opp->fd; > -encap.args[1] = cs->cpu_index; > +encap.args[1] = kvm_arch_vcpu_id(cs); > > return kvm_vcpu_ioctl(cs, KVM_ENABLE_CAP, &encap); > } > diff --git a/hw/intc/xics.c b/hw/intc/xics.c > index b437563..64aabe7 100644 > --- a/hw/intc/xics.c > +++ b/hw/intc/xics.c > @@ -33,6 +33,17 @@ > #include "qemu/error-report.h" > #include "qapi/visitor.h" > > +static int get_cpu_index_by_dt_id(int cpu_dt_id) > +{ > +PowerPCCPU *cpu = ppc_get_vcpu_by_dt_id(cpu_dt_id); > + > +if (cpu) { > +return cpu->parent_obj.cpu_index; > +} > + > +return -1; > +} > + > void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu) > { > CPUState *cs = CPU(cpu); > @@ -659,7 +670,7 @@ static target_ulong h_cppr(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > static target_ulong h_ipi(PowerPCCPU *cpu, sPAPREnvironment *spapr, >target_ulong opcode, target_ulong *args) > { > -target_ulong server = args[0]; > +target_ulong server = get_cpu_index_by_dt_id(args[0]); > target_ulong mfrr = args[1]; > > if (server >= spapr->icp->nr_servers) { > @@ -728,7 +739,7 @@ static void rtas_set_xive(PowerPCCPU *cpu, > sPAPREnvironment *spapr, > } > > nr = rtas_ld(args, 0); > -server = rtas_ld(args, 1); > +server = get_cpu_index_by_dt_id(rtas_ld(args, 1)); > priority = rtas_ld(args, 2); > > if (!ics_valid_irq(ics, nr) || (server >= ics->icp->nr_servers) > diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c > index c203646..a5bbc24 100644 > --- a/hw/intc/xics_kvm.c > +++ b/hw/intc/xics_kvm.c > @@ -65,7 +65,7 @@ static void icp_get_kvm_state(ICPState *ss) > ret = kvm_vcpu_ioctl(ss->cs, KVM_GET_ONE_REG, ®); > if (ret != 0) { > error_report("Unable to retrieve KVM interrupt controller state" > -" for CPU %d: %s", ss->cs->cpu_index, strerror(errno)); > +" for CPU %ld: %s", kvm_arch_vcpu_id(ss->cs), > strerror(errno)); > exit(1); > } > > @@ -97,7 +97,7 @@ static int icp_set_kvm_state(ICPState *ss, int version_id) > ret = kvm_vcpu_ioctl(ss->cs, KVM_SET_ONE_REG, ®); > if (ret != 0) { > error_report("Unable to restore KVM interrupt controller state (0x%" > -PRIx64 ") for CPU %d: %s", state, ss->cs->cpu_index, > +PRIx64 ") for CPU %ld: %s", state, kvm_arch_vcpu_id(ss->cs), > strerror(errno)); > return ret; > } > @@ -325,15 +325,15 @@ static void xics_kvm_cpu_setup(XICSState *icp, > PowerPCCPU *cpu) > struct kvm_enable_cap xics_enable_cap = { > .cap = KVM_CAP_IRQ_XICS, > .flags = 0, > -.args = {icpkvm->kernel_xics_fd, cs->cpu_index, 0, 0}, > +.args = {icpkvm->kernel_xics_fd, kvm_arch_vcpu_id(cs), 0, 0}, > }; > > ss->cs = cs; > > ret = kvm_vcpu_ioctl(ss->cs, KVM_ENABLE_CAP, &xics_enable_cap); > if (ret < 0) { > -error_report("Unable to connect CPU%d t
Re: [Qemu-devel] [PATCH v7 1/2] target-ppc: add PowerPCCPU::cpu_dt_id
On Sat, Feb 1, 2014 at 9:45 AM, Alexey Kardashevskiy wrote: > Normally CPUState::cpu_index is used to pick the right CPU for various > operations. However default consecutive numbering does not always work > for POWERPC. > > These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX > and used to call KVM VCPU's ioctls. In order to achieve this, > kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies > cpu_index by the number of threads per core. > > This approach has disadvantages such as: > 1. NUMA configuration stays broken after the fixup; > 2. CPU-targeted commands from the QEMU Monitor do not work properly as > CPU indexes have been fixed and there is no clear way for the user to > know what the new CPU indexes are. > > This introduces a @cpu_dt_id field in the CPUPPCState struct which > is initialized from @cpu_index by default and can be fixed later > to meet the device tree requirements. > > This adds an API to handle @cpu_dt_id. > > This removes kvmppc_fixup_cpu() as it is not more needed, @cpu_dt_id > is calculated in ppc_cpu_realize(). > > This will be used later in machine code. > Signed-off-by: Alexey Kardashevskiy Acked-by: Mike Day > --- > Changes: > v6: inlined kvmppc_fixup_cpu() > --- > hw/ppc/ppc.c| 22 ++ > target-ppc/cpu-qom.h| 2 ++ > target-ppc/cpu.h| 18 ++ > target-ppc/kvm.c| 13 - > target-ppc/kvm_ppc.h| 6 -- > target-ppc/translate_init.c | 10 -- > 6 files changed, 46 insertions(+), 25 deletions(-) > > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c > index 114be64..0e82719 100644 > --- a/hw/ppc/ppc.c > +++ b/hw/ppc/ppc.c > @@ -26,6 +26,7 @@ > #include "hw/ppc/ppc_e500.h" > #include "qemu/timer.h" > #include "sysemu/sysemu.h" > +#include "sysemu/cpus.h" > #include "hw/timer/m48t59.h" > #include "qemu/log.h" > #include "hw/loader.h" > @@ -1362,3 +1363,24 @@ int PPC_NVRAM_set_params (nvram_t *nvram, uint16_t > NVRAM_size, > > return 0; > } > + > +/* CPU device-tree ID helpers */ > +int ppc_get_vcpu_dt_id(PowerPCCPU *cpu) > +{ > +return cpu->cpu_dt_id; > +} > + > +PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id) > +{ > +CPUState *cs; > + > +CPU_FOREACH(cs) { > +PowerPCCPU *cpu = POWERPC_CPU(cs); > + > +if (cpu->cpu_dt_id == cpu_dt_id) { > +return cpu; > +} > +} > + > +return NULL; > +} > diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h > index 72b2232..b17c024 100644 > --- a/target-ppc/cpu-qom.h > +++ b/target-ppc/cpu-qom.h > @@ -79,6 +79,7 @@ typedef struct PowerPCCPUClass { > /** > * PowerPCCPU: > * @env: #CPUPPCState > + * @cpu_dt_id: CPU index used in the device tree. KVM uses this index too > * > * A PowerPC CPU. > */ > @@ -88,6 +89,7 @@ typedef struct PowerPCCPU { > /*< public >*/ > > CPUPPCState env; > +int cpu_dt_id; > } PowerPCCPU; > > static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env) > diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h > index 51bcd4a..d8577ae 100644 > --- a/target-ppc/cpu.h > +++ b/target-ppc/cpu.h > @@ -2154,4 +2154,22 @@ static inline bool cpu_has_work(CPUState *cpu) > > void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env); > > +/** > + * ppc_get_vcpu_dt_id: > + * @cs: a PowerPCCPU struct. > + * > + * Returns a device-tree ID for a CPU. > + */ > +int ppc_get_vcpu_dt_id(PowerPCCPU *cpu); > + > +/** > + * ppc_get_vcpu_by_dt_id: > + * @cpu_dt_id: a device tree id > + * > + * Searches for a CPU by @cpu_dt_id. > + * > + * Returns: a PowerPCCPU struct > + */ > +PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id); > + > #endif /* !defined (__CPU_PPC_H__) */ > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index 781b72f..8bcc5fb 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -1766,19 +1766,6 @@ static void kvmppc_host_cpu_class_init(ObjectClass > *oc, void *data) > } > } > > -int kvmppc_fixup_cpu(PowerPCCPU *cpu) > -{ > -CPUState *cs = CPU(cpu); > -int smt; > - > -/* Adjust cpu index for SMT */ > -smt = kvmppc_smt_threads(); > -cs->cpu_index = (cs->cpu_index / smp_threads) * smt > -+ (cs->cpu_index % smp_threads); > - > -return 0; > -} > - > bool kvmppc_has_cap_epr(void) > { > return cap_epr; > diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h > index 5f78e4b..f3afcdb 1
Re: [Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V2
On Fri, Feb 28, 2014 at 3:05 PM, Alex Bligh wrote: >> Rather than introduce a second mutex for timerlists, which would >> require nested mutexes to in orderwrite to the timerlists, use one >> QemuMutex in the QemuClock structure for all write access to any list >> hanging off the QemuClock structure. > > I sort of wonder why you don't just use one Mutex across the whole > of QEMU rather than one per clock. > > This would save worrying about whether when you do things like: > qemu_mutex_lock(&timer_list->clock->clock_mutex); I like it, it solves another problem I spotted after I submitted the patch. > >> @@ -108,19 +108,24 @@ QEMUTimerList *timerlist_new(QEMUClockType type, >> timer_list->clock = clock; >> timer_list->notify_cb = cb; >> timer_list->notify_opaque = opaque; >> -qemu_mutex_init(&timer_list->active_timers_lock); >> +qemu_mutex_init(&clock->clock_mutex); >> QLIST_INSERT_HEAD(&clock->timerlists, timer_list, list); >> return timer_list; >> } > > That can't be right, you are calling qemu_mutex_init for each > timerlist, but the timerlists share the mutex which is now in the > clock. The mutex should therefore be initialized in qemu_clock_init. > Also, clock_mutex appears never to get destroyed. Yes, thank you, on both points. > >> QEMUClockType timerlist_get_clock(QEMUTimerList *timer_list) >> { >> -return timer_list->clock->type; >> +return atomic_rcu_read(&timer_list->clock->type); >> } > > I don't think this works because of the double indirection. It's > The '&' means it's not actually dereferencing clock->type, but it > is dereferencing timer_list->clock outside of either an rcu read > lock or an atomic read. I think you'd be better with than > rcu_read_lock() etc. (sadly). I should fix this with parenthesis as in &(timer_list->clock->type) >> QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type) >> @@ -261,11 +272,13 @@ QEMUTimerList >> *qemu_clock_get_main_loop_timerlist(QEMUClockType type) >> >> void timerlist_notify(QEMUTimerList *timer_list) >> { >> -if (timer_list->notify_cb) { >> +rcu_read_lock(); >> +if (atomic_rcu_read(&timer_list->notify_cb)) { >> timer_list->notify_cb(timer_list->notify_opaque); >> } else { >> qemu_notify_event(); >> } >> +rcu_read_unlock(); >> } > > If you have the rcu_read_lock why do you need the atomic_rcu_read? > And if you need it, why do you not need it on the following line? rcu_read_lock/unlock and atomic_rcu_read/set do different things and frequently need to be used together. rcu_read_lock/unlock causes the thread to enter an RCU critical section by getting a counter out of TLS. atomic_rcu_read/set will use memory barriers to flush caches (depending on the memory model and platform) and ensure that reads and writes are ordered. You typically would use atomic_rcu_read on the first read, thereafter the memory is up-to-date and writes have been flushed. And you typically use atomic_rcu_set on the last write, when the data structure is complete. You don't need to use them on every update. Just for completeness, rcu_read_unlock ends the rcu critical section but its usually a no-op. > However, as any QEMUTimerList can (now) be reclaimed, surely all > callers of this function are going to need to hold the rcu_read_lock > anyway? Yes > I think this is used only by timerlist_rearm and qemu_clock_notify. > Provided these call this function holding the rcu_read_lock > I don't think this function needs changing. > >> /* Transition function to convert a nanosecond timeout to ms >> @@ -389,24 +402,30 @@ static void timerlist_rearm(QEMUTimerList *timer_list) >> /* stop a timer, but do not dealloc it */ >> void timer_del(QEMUTimer *ts) >> { >> -QEMUTimerList *timer_list = ts->timer_list; >> +QEMUTimerList *timer_list; >> >> -qemu_mutex_lock(&timer_list->active_timers_lock); >> +timer_list = atomic_rcu_read(&ts->timer_list); >> +rcu_read_lock(); >> +qemu_mutex_lock(&timer_list->clock->clock_mutex); >> +rcu_read_unlock(); >> timer_del_locked(timer_list, ts); >> -qemu_mutex_unlock(&timer_list->active_timers_lock); >> +qemu_mutex_unlock(&timer_list->clock->clock_mutex); >> } > > Again in my ignorance of RCU I don't know whether taking a mutex > implicitly takes an rcu read lock. If not, that rcu_read_unlock > should be after the mutex unlock. I am going to change this because of a race condition, To answer your question, holding a mutex does not imply holding an rcu read lock. It does indicate the potential need for readers to use rcu to read the list because they can do so when the mutex holder is updating the list. And, I'm working under the assumption that holding a mutex implies a memory barrier so there is no need for atomic_rcu_read/set. > >> @@ -447,7 +468,10 @@ void timer_mod_anticipate(QEMUTimer *ts, int64_t >> expire_time) >> >> bool timer_pending(QEMUTimer *ts) >> { >> -return ts->expire_time >= 0; >> +int64
[Qemu-devel] [PATCH 2/2] [RFC] Convert Clock Timerlists to RCU V2
timerlists is a list of lists that holds active timers, among other items. It is read-mostly. This patch converts read access to the timerlists to use RCU. Rather than introduce a second mutex for timerlists, which would require nested mutexes to in orderwrite to the timerlists, use one QemuMutex in the QemuClock structure for all write access to any list hanging off the QemuClock structure. This patch also protects timerlists->active_timers->timer_list by the new clock mutex for write access and by RCU for read access. This patch applies against Paolo Bonzini's rcu branch: https://github.com/bonzini/qemu/tree/rcu and also requires the previously submitted patch 03fba95 "Convert active timers list to use RCU for read operations V2." V2: - timerlists modifications split to a separate patch (this one). - Addressed Alex Blighs comments. Signed-off-by: Mike Day --- qemu-timer.c | 79 +--- 1 file changed, 54 insertions(+), 25 deletions(-) diff --git a/qemu-timer.c b/qemu-timer.c index dad30a3..e335a7a 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -47,7 +47,7 @@ typedef struct QEMUClock { QLIST_HEAD(, QEMUTimerList) timerlists; - +QemuMutex clock_mutex; NotifierList reset_notifiers; int64_t last; QEMUClockType type; @@ -69,12 +69,12 @@ QEMUClock qemu_clocks[QEMU_CLOCK_MAX]; struct QEMUTimerList { QEMUClock *clock; -QemuMutex active_timers_lock; QEMUTimer *active_timers; QLIST_ENTRY(QEMUTimerList) list; QEMUTimerListNotifyCB *notify_cb; void *notify_opaque; QemuEvent timers_done_ev; +struct rcu_head rcu; }; /** @@ -108,19 +108,24 @@ QEMUTimerList *timerlist_new(QEMUClockType type, timer_list->clock = clock; timer_list->notify_cb = cb; timer_list->notify_opaque = opaque; -qemu_mutex_init(&timer_list->active_timers_lock); +qemu_mutex_init(&clock->clock_mutex); QLIST_INSERT_HEAD(&clock->timerlists, timer_list, list); return timer_list; } +static void reclaim_timerlist(struct rcu_head *rcu) +{ +QEMUTimerList *tl = container_of(rcu, QEMUTimerList, rcu); +g_free(tl); +} + void timerlist_free(QEMUTimerList *timer_list) { assert(!timerlist_has_timers(timer_list)); if (timer_list->clock) { QLIST_REMOVE(timer_list, list); } -qemu_mutex_destroy(&timer_list->active_timers_lock); -g_free(timer_list); +call_rcu1(&timer_list->rcu, reclaim_timerlist); } static void qemu_clock_init(QEMUClockType type) @@ -144,9 +149,11 @@ void qemu_clock_notify(QEMUClockType type) { QEMUTimerList *timer_list; QEMUClock *clock = qemu_clock_ptr(type); -QLIST_FOREACH(timer_list, &clock->timerlists, list) { +rcu_read_lock(); +QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) { timerlist_notify(timer_list); } +rcu_read_unlock(); } void qemu_clock_enable(QEMUClockType type, bool enabled) @@ -154,13 +161,15 @@ void qemu_clock_enable(QEMUClockType type, bool enabled) QEMUClock *clock = qemu_clock_ptr(type); QEMUTimerList *tl; bool old = clock->enabled; -clock->enabled = enabled; +atomic_rcu_set(&clock->enabled, enabled); if (enabled && !old) { qemu_clock_notify(type); } else if (!enabled && old) { -QLIST_FOREACH(tl, &clock->timerlists, list) { +rcu_read_lock(); +QLIST_FOREACH_RCU(tl, &clock->timerlists, list) { qemu_event_wait(&tl->timers_done_ev); } +rcu_read_unlock(); } } @@ -242,16 +251,18 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type) int64_t deadline = -1; QEMUTimerList *timer_list; QEMUClock *clock = qemu_clock_ptr(type); -QLIST_FOREACH(timer_list, &clock->timerlists, list) { +rcu_read_lock(); +QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) { deadline = qemu_soonest_timeout(deadline, timerlist_deadline_ns(timer_list)); } +rcu_read_unlock(); return deadline; } QEMUClockType timerlist_get_clock(QEMUTimerList *timer_list) { -return timer_list->clock->type; +return atomic_rcu_read(&timer_list->clock->type); } QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type) @@ -261,11 +272,13 @@ QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type) void timerlist_notify(QEMUTimerList *timer_list) { -if (timer_list->notify_cb) { +rcu_read_lock(); +if (atomic_rcu_read(&timer_list->notify_cb)) { timer_list->notify_cb(timer_list->notify_opaque); } else { qemu_notify_event(); } +rcu_read_unlock(); } /* Transition function to convert a nanosecond timeout to ms @@ -389,24 +402,30 @@ static void timerlist
[Qemu-devel] [PATCH 0/2] [RFC] Convert Clock lists to RCU (V2)
The first patch in the series convers the clock->timerlists->active_timers list to RCU. The second patch converts clock->timerlists to RCU and also protects access to timerlists->active_timers->timer_list. Mike Day (2): [RFC] Convert active timers list to use RCU V2 [RFC] Convert Clock Timerlists to RCU V2 include/qemu/timer.h | 19 --- qemu-timer.c | 148 +++ 2 files changed, 98 insertions(+), 69 deletions(-) -- 1.8.5.3
[Qemu-devel] [PATCH 1/2] [RFC] Convert active timers list to use RCU V2
active_timers is a list of timer lists, associated with a Qemu Clock, that is read-mostly. This patch converts read accesses to the list to use RCU, which should hopefully mitigate most of the synchronization overhead. This patch applies against Paolo Bonzini's rcu branch: https://github.com/bonzini/qemu/tree/rcu V2: - Addresses comments from Alex Bligh Signed-off-by: Mike Day --- include/qemu/timer.h | 19 +++ qemu-timer.c | 69 ++-- 2 files changed, 44 insertions(+), 44 deletions(-) diff --git a/include/qemu/timer.h b/include/qemu/timer.h index 5afcffc..f69ec49 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -5,8 +5,15 @@ #include "qemu-common.h" #include "qemu/notify.h" -/* timers */ +#ifdef __GNUC__ +#ifndef __ATOMIC_RELEASE +#define __ATOMIC_RELEASE +#endif +#endif +#include "qemu/atomic.h" +#include "qemu/rcu.h" +/* timers */ #define SCALE_MS 100 #define SCALE_US 1000 #define SCALE_NS 1 @@ -61,6 +68,7 @@ struct QEMUTimer { void *opaque; QEMUTimer *next; int scale; +struct rcu_head rcu; }; extern QEMUTimerListGroup main_loop_tlg; @@ -189,12 +197,6 @@ void qemu_clock_notify(QEMUClockType type); * @enabled: true to enable, false to disable * * Enable or disable a clock - * Disabling the clock will wait for related timerlists to stop - * executing qemu_run_timers. Thus, this functions should not - * be used from the callback of a timer that is based on @clock. - * Doing so would cause a deadlock. - * - * Caller should hold BQL. */ void qemu_clock_enable(QEMUClockType type, bool enabled); @@ -543,7 +545,6 @@ void timer_del(QEMUTimer *ts); * freed while this function is running. */ void timer_mod_ns(QEMUTimer *ts, int64_t expire_time); - /** * timer_mod_anticipate_ns: * @ts: the timer @@ -685,9 +686,7 @@ static inline int64_t qemu_soonest_timeout(int64_t timeout1, int64_t timeout2) void init_clocks(void); int64_t cpu_get_ticks(void); -/* Caller must hold BQL */ void cpu_enable_ticks(void); -/* Caller must hold BQL */ void cpu_disable_ticks(void); static inline int64_t get_ticks_per_sec(void) diff --git a/qemu-timer.c b/qemu-timer.c index fb9e680..dad30a3 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -29,6 +29,7 @@ #include "hw/hw.h" #include "qemu/timer.h" +#include "qemu/rcu_queue.h" #ifdef CONFIG_POSIX #include #endif @@ -45,12 +46,10 @@ /* timers */ typedef struct QEMUClock { -/* We rely on BQL to protect the timerlists */ QLIST_HEAD(, QEMUTimerList) timerlists; NotifierList reset_notifiers; int64_t last; - QEMUClockType type; bool enabled; @@ -75,6 +74,7 @@ struct QEMUTimerList { QLIST_ENTRY(QEMUTimerList) list; QEMUTimerListNotifyCB *notify_cb; void *notify_opaque; +QemuEvent timers_done_ev; }; /** @@ -87,6 +87,7 @@ struct QEMUTimerList { */ static inline QEMUClock *qemu_clock_ptr(QEMUClockType type) { +smp_rmb(); return &qemu_clocks[type]; } @@ -148,13 +149,6 @@ void qemu_clock_notify(QEMUClockType type) } } -/* Disabling the clock will wait for related timerlists to stop - * executing qemu_run_timers. Thus, this functions should not - * be used from the callback of a timer that is based on @clock. - * Doing so would cause a deadlock. - * - * Caller should hold BQL. - */ void qemu_clock_enable(QEMUClockType type, bool enabled) { QEMUClock *clock = qemu_clock_ptr(type); @@ -172,7 +166,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled) bool timerlist_has_timers(QEMUTimerList *timer_list) { -return !!timer_list->active_timers; +return !!atomic_rcu_read(&timer_list->active_timers); } bool qemu_clock_has_timers(QEMUClockType type) @@ -184,16 +178,17 @@ bool qemu_clock_has_timers(QEMUClockType type) bool timerlist_expired(QEMUTimerList *timer_list) { int64_t expire_time; +bool ret; -qemu_mutex_lock(&timer_list->active_timers_lock); -if (!timer_list->active_timers) { -qemu_mutex_unlock(&timer_list->active_timers_lock); +rcu_read_lock(); +if (!atomic_rcu_read(&timer_list->active_timers)) { +rcu_read_unlock(); return false; } expire_time = timer_list->active_timers->expire_time; -qemu_mutex_unlock(&timer_list->active_timers_lock); - -return expire_time < qemu_clock_get_ns(timer_list->clock->type); +ret = (expire_time < qemu_clock_get_ns(timer_list->clock->type)); +rcu_read_unlock(); +return ret; } bool qemu_clock_expired(QEMUClockType type) @@ -220,16 +215,16 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list) * value but ->notify_cb() is called when the deadline changes. Therefore * the caller should notice the change and there is no race condition. */ -
Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU
> 1. You seem to be removing the use of the active_timers_lock and replacing it > by >rcu (fine). However, you seem to have left the qemu_mutex_destroy in >timerlist_free, and left the mutex in QEMUTimerList. Any reason why we > need both? > I responded incorrectly to this yesterday. We still need the mutex here (active_timers_lock) to provide synchronization for list updates. The difference now is that we don't need to hold the mutex for traversing the list. But to update the list we still need to hold the mutex. Mike
Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU
first not be a >rcu_read_lock()? Yes, thank you. > 8. Again, may be my reading of the diff, but you appear to have >introduced a rcu_read_lock() outside the loop, but then >unlock it within the loop. If the loop iterates more than once, >won't it do more unlocks than locks? Yes, I think the right fix is to omit the last rcu_read_unlock. -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU
On Thu, Feb 13, 2014 at 4:25 AM, Paolo Bonzini wrote: > Il 13/02/2014 10:11, Alex Bligh ha scritto: >> >> I'll certainly have a look through this. However before I do, what >> problem is this trying to solve? Do we think there is possibility >> of contention on the active timers lock? I used to think this was >> taken (let alone contented) relatively infrequently, but Rob Herring's >> recent email suggests to me the list is being modified in some >> circumstances rather more frequently than I thought. > > I think that, more than contention, it tries to reduce the cost of > synchronization primitives, especially the locking and unlocking of the list > around the invocation of timer callbacks. Yes, the assumption is that the active timers are a read-mostly list, so rcu is a win. Mike
[Qemu-devel] [PATCH] [RFC] Convert Qemu Timer List and Active Timers to RCU
Allow readers to use RCU when reading Qemu timer lists. Applies to Paolo Bonzini's RCU branch, https://github.com/bonzini/qemu/tree/rcu. This patch is for comment and review only. The rcu branch needs to be rebased on upstream. Signed-off-by: Mike Day --- include/qemu/timer.h | 9 +- qemu-timer.c | 88 +++- 2 files changed, 67 insertions(+), 30 deletions(-) diff --git a/include/qemu/timer.h b/include/qemu/timer.h index b58903b..5aaa213 100644 --- a/include/qemu/timer.h +++ b/include/qemu/timer.h @@ -4,7 +4,13 @@ #include "qemu/typedefs.h" #include "qemu-common.h" #include "qemu/notify.h" - +#ifdef __GNUC__ +#ifndef __ATOMIC_RELEASE +#define __ATOMIC_RELEASE +#endif +#endif +#include "qemu/atomic.h" +#include "qemu/rcu.h" /* timers */ #define SCALE_MS 100 @@ -61,6 +67,7 @@ struct QEMUTimer { void *opaque; QEMUTimer *next; int scale; +struct rcu_head rcu; }; extern QEMUTimerListGroup main_loop_tlg; diff --git a/qemu-timer.c b/qemu-timer.c index 6b62e88..27285ca 100644 --- a/qemu-timer.c +++ b/qemu-timer.c @@ -29,6 +29,7 @@ #include "hw/hw.h" #include "qemu/timer.h" +#include "qemu/rcu_queue.h" #ifdef CONFIG_POSIX #include #endif @@ -49,7 +50,6 @@ typedef struct QEMUClock { NotifierList reset_notifiers; int64_t last; - QEMUClockType type; bool enabled; } QEMUClock; @@ -71,6 +71,7 @@ struct QEMUTimerList { QLIST_ENTRY(QEMUTimerList) list; QEMUTimerListNotifyCB *notify_cb; void *notify_opaque; +struct rcu_head rcu; }; /** @@ -107,6 +108,12 @@ QEMUTimerList *timerlist_new(QEMUClockType type, return timer_list; } +static void reclaim_timer_list(struct rcu_head *rcu) +{ +QEMUTimerList *timer_list = container_of(rcu, QEMUTimerList, rcu); +g_free(timer_list); +} + void timerlist_free(QEMUTimerList *timer_list) { assert(!timerlist_has_timers(timer_list)); @@ -114,7 +121,7 @@ void timerlist_free(QEMUTimerList *timer_list) QLIST_REMOVE(timer_list, list); } qemu_mutex_destroy(&timer_list->active_timers_lock); -g_free(timer_list); +call_rcu1(&timer_list->rcu, reclaim_timer_list); } static void qemu_clock_init(QEMUClockType type) @@ -138,9 +145,11 @@ void qemu_clock_notify(QEMUClockType type) { QEMUTimerList *timer_list; QEMUClock *clock = qemu_clock_ptr(type); -QLIST_FOREACH(timer_list, &clock->timerlists, list) { +rcu_read_lock(); +QLIST_FOREACH_RCU(timer_list, &clock->timerlists, list) { timerlist_notify(timer_list); } +rcu_read_unlock(); } void qemu_clock_enable(QEMUClockType type, bool enabled) @@ -155,7 +164,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled) bool timerlist_has_timers(QEMUTimerList *timer_list) { -return !!timer_list->active_timers; +return !!atomic_rcu_read(&timer_list->active_timers); } bool qemu_clock_has_timers(QEMUClockType type) @@ -168,15 +177,15 @@ bool timerlist_expired(QEMUTimerList *timer_list) { int64_t expire_time; -qemu_mutex_lock(&timer_list->active_timers_lock); -if (!timer_list->active_timers) { -qemu_mutex_unlock(&timer_list->active_timers_lock); +rcu_read_lock(); +if (atomic_rcu_read(&timer_list->active_timers)) { +rcu_read_unlock(); return false; } -expire_time = timer_list->active_timers->expire_time; -qemu_mutex_unlock(&timer_list->active_timers_lock); +expire_time = timer_list->active_timers->expire_time; return expire_time < qemu_clock_get_ns(timer_list->clock->type); +rcu_read_unlock(); } bool qemu_clock_expired(QEMUClockType type) @@ -194,8 +203,10 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list) { int64_t delta; int64_t expire_time; - -if (!timer_list->clock->enabled) { +bool enabled; +rcu_read_lock(); +enabled = atomic_rcu_read(&timer_list->clock->enabled); +if (!enabled) { return -1; } @@ -203,16 +214,13 @@ int64_t timerlist_deadline_ns(QEMUTimerList *timer_list) * value but ->notify_cb() is called when the deadline changes. Therefore * the caller should notice the change and there is no race condition. */ -qemu_mutex_lock(&timer_list->active_timers_lock); if (!timer_list->active_timers) { -qemu_mutex_unlock(&timer_list->active_timers_lock); +rcu_read_unlock(); return -1; } expire_time = timer_list->active_timers->expire_time; -qemu_mutex_unlock(&timer_list->active_timers_lock); - delta = expire_time - qemu_clock_get_ns(timer_list->clock->type); - +rcu_read_unlock(); if (delta <= 0) { return 0; } @@ -230,16 +238,22 @@ int64
Re: [Qemu-devel] [PATCH v3] virtio: Introduce virtio-testdev
Andrew Jones writes: > This is a virtio version of hw/misc/debugexit and should evolve into a > virtio version of pc-testdev. pc-testdev uses the PC's ISA bus, whereas > this testdev can be plugged into a virtio-mmio transport, which is > needed for kvm-unit-tests/arm. virtio-testdev uses the virtio device > config space as a communication channel, and implements an RTAS-like > protocol through it allowing guests to execute commands. Only three > commands are currently implemented; > 1) VERSION: for version compatibility checks > 2) CLEAR: set all the config space back to zero > 3) EXIT:exit() from qemu with a status code > +static uint32_t virtio_testdev_get_features(VirtIODevice *vdev, uint32_t f) > +{ > +return f; > +} > + Is this meant to be a stub currently? Mike -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
On Mon, Jan 20, 2014 at 12:24 PM, Michael Roth wrote: > Quoting Alexey Kardashevskiy (2014-01-19 20:58:20) > > Would need to look at it a bit more closely to say for certain, but after > discussing it a bit Tyrel/Mike, I think the main considerations would be: > > 1) PHB hotplug/unplug would need to signal a different event type in it's >check-exception/epow message, we have stubs in place for a PHB event type, >so that's mostly a matter of adding special-casing in the hotplug callback >for spapr-pci-host-bridge devices > 2) The required properties for the OF node corresponding PHB will be > different. >Currently these are generated as part of the hotplug callback, and attached >to the corresponding ConfigureConnectorState node to be fed to the guest >via subsequent ibm,configure-connector RTAS calls, so we'd just hook the >PHB's OF node generation code in there as. > 3) The sysctl/kernel interface for handling PHB hotplug would be different, >we'd be relying on the rpadlar kernel module >(/sys/bus/pci/slots/control/add_slot) rather than rpaphp >(/sys/bus/pci/slots//power) or the PCI rescan fallback. >This is mostly a matter of modifying the handling in the guest tools, > namely >in rtas_errd, to handle the event accordingly. > > We also haven't done anything extensive using rpadlpar operations within qemu > guests, so there may be various odds/ends and possibly kernel changes needed > to > get that working properly (as was the case for rpaphp, though thanks to the > PCI > rescan workaround a new kernel isn't required for existing guests... a similar > fallback likely won't be available for rpadlpar) > > But from a high-level view at least it seems fairly straight-forward. I'll see > if we can get a prototype working. The fact that it "just works" now by rescanning the pci filesystem is a significant benefit. I don't think we want to lose it. There can be many PHBs on one of these systems. Maybe we could make the PHB hot-pluggable and also always have one PHB plugged in at startup. Then the guest will see the bus when it starts and it will build the pci file system. Mike
Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
Do you know which device is writing to the BAR below? From the trace it appears it should be restoring the memory address to the BAR after writing all 1s to the BAR and reading back the contents. (the protocol for finding the length of the bar memory.) On Thu, Jan 9, 2014 at 12:24 PM, Alex Williamson wrote: > On Wed, 2013-12-11 at 20:30 +0200, Michael S. Tsirkin wrote: >> From: Paolo Bonzini > vfio: vfio_pci_read_config(:01:10.0, @0x10, len=0x4) febe0004 > (save lower 32bits of BAR) > vfio: vfio_pci_write_config(:01:10.0, @0x10, 0x, len=0x4) > (write mask to BAR) Here the device should restore the memory address (original contents) to the BAR. > vfio: region_del febe - febe3fff > (memory region gets unmapped) > vfio: vfio_pci_read_config(:01:10.0, @0x10, len=0x4) c004 > (read size mask) > vfio: vfio_pci_write_config(:01:10.0, @0x10, 0xfebe0004, len=0x4) > (restore BAR) > vfio: region_add febe - febe3fff [0x7fcf3654d000] > (memory region re-mapped) > vfio: vfio_pci_read_config(:01:10.0, @0x14, len=0x4) 0 > (save upper 32bits of BAR) > vfio: vfio_pci_write_config(:01:10.0, @0x14, 0x, len=0x4) > (write mask to BAR) and here ... > vfio: region_del febe - febe3fff > (memory region gets unmapped) > vfio: region_add febe - febe3fff [0x7fcf3654d000] > (memory region gets re-mapped with new address) > qemu-system-x86_64: vfio_dma_map(0x7fcf38861710, 0xfebe, 0x4000, > 0x7fcf3654d000) = -14 (Bad address) > (iommu barfs because it can only handle 48bit physical addresses) I looked around some but I couldn't find an obvious culprit. Could it be that the BAR is getting unmapped automatically due to x-intx-mmap-timeout-ms before the device has a chance to finish restoring the correct value to the BAR? Mike
Re: [Qemu-devel] [PATCH v2 01/14] spapr: populate DRC entries for root dt node
On Sun, Jan 19, 2014 at 9:58 PM, Alexey Kardashevskiy wrote: > > I did not realize DRC is not just for PCI. How hard would it be to add hot > plug support for a whole PHB? The current QEMU trend is to make QEMU > monitor's "device_add" equal to the command line's "-device" which is not > (yet) true for PHB but could be. Thanks. We discussed this approach (hot-plug the whole bus) during the design phase and at one point started to work on it. I don't think we established whether or not the Linux sys/bus/pci/* file system would work with it. Mike
Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
>> >> The address above has already been masked. What you need to do is read >> the BAR. If the value from the BAR end in '1', its MMIO. If it ends in >> '10', its RAM. If it ends in '0n' its disabled. The first thing that >> the PCI software does after reading the BAR is mask off the two low >> bits. > > Are you perhaps confusing MMIO and I/O port? I/O port cannot be mmap'd > on x86, so it can't be directly mapped. It also doesn't come through > the address_space_memory filter. I/O port is deprecated, or at least > discouraged, MMIO is not. Thanks, You're right, sorry I missed that. It doesn't solve the problem. Mike
Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
On Tue, Jan 14, 2014 at 12:49 PM, Mike Day wrote: >>> > > >>>>>>> >>> > > >>>>>>>>> Prior to this change, there was no re-map with the >>> > > >>>>>>>>> febe > >> If we choose not to map them, how do we distinguish them from guest RAM? >> There's no MemoryRegion flag that I'm aware of to distinguish a ram_ptr >> that points to a chunk of guest memory from one that points to the mmap >> of a device BAR. I think I'd need to explicitly walk all of the vfio >> device and try to match the MemoryRegion pointer to one of my devices. >> That only solves the problem for vfio devices and not ivshmem devices or >> pci-assign devices. >> > > I don't know if this will save you doing your memory region search or > not. But a BAR that ends with the low bit set is MMIO, and BAR that > ends with the low bit clear is RAM. So the address above is RAM as was > pointed out earlier in the thread. If you got an ambitious address in > the future you could test the low bit. But MMIO is deprecated > according to http://wiki.osdev.org/PCI so you probably won't see it, > at least for 64-bit addresses. s/ambitious/ambiguous/ The address above has already been masked. What you need to do is read the BAR. If the value from the BAR end in '1', its MMIO. If it ends in '10', its RAM. If it ends in '0n' its disabled. The first thing that the PCI software does after reading the BAR is mask off the two low bits. Mike
Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
>> > > >>> >> > > > Prior to this change, there was no re-map with the >> > > > febe > If we choose not to map them, how do we distinguish them from guest RAM? > There's no MemoryRegion flag that I'm aware of to distinguish a ram_ptr > that points to a chunk of guest memory from one that points to the mmap > of a device BAR. I think I'd need to explicitly walk all of the vfio > device and try to match the MemoryRegion pointer to one of my devices. > That only solves the problem for vfio devices and not ivshmem devices or > pci-assign devices. > I don't know if this will save you doing your memory region search or not. But a BAR that ends with the low bit set is MMIO, and BAR that ends with the low bit clear is RAM. So the address above is RAM as was pointed out earlier in the thread. If you got an ambitious address in the future you could test the low bit. But MMIO is deprecated according to http://wiki.osdev.org/PCI so you probably won't see it, at least for 64-bit addresses. Mike
Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
On Tue, Jan 14, 2014 at 9:05 AM, Michael S. Tsirkin wrote: > On Tue, Jan 14, 2014 at 08:50:54AM -0500, Mike Day wrote: >> >> Also, both 32-bit and 64-bit BARs are required to be supported. It is >> legal to construct a 64-bit BAR by masking all the high bits to >> zero. Presumably it would be OK to mask the 16 high bits to zero as >> well, constructing a 48-bit address. > The question was whether addresses such as > 0xfec0 can be a valid BAR value on these > platforms, whether it's accessible to the CPU and > to other PCI devices. The answer has to be no at least for Linux. Linux uses the high bit of the page table address as state to indicate a huge page and uses 48-bit addresses. Each PCI device is different but right now Power7 supports 16TB of RAM so I don't think the PCI bridge would necessarily decode the high 16 bits of the memory address. For two PCI devices to communicate with each other using 64-bit addresses they both need to support 64-bit memory in the same address range, which is possible. All this info subject to Paul Mackerras or Alexy … Mike
Re: [Qemu-devel] [PULL 14/28] exec: make address spaces 64-bit wide
"Michael S. Tsirkin" writes: > On Fri, Jan 10, 2014 at 08:31:36AM -0700, Alex Williamson wrote: > Short term, just assume 48 bits on x86. > > We need to figure out what's the limitation on ppc and arm - > maybe there's none and it can address full 64 bit range. > > Cc some people who might know about these platforms. The document you need is here: http://goo.gl/fJYxdN "PCI Bus Binding To: IEEE Std 1275-1994" The short answer is that Power (OpenFirmware-to-PCI) supports both MMIO and Memory mappings for BARs. Also, both 32-bit and 64-bit BARs are required to be supported. It is legal to construct a 64-bit BAR by masking all the high bits to zero. Presumably it would be OK to mask the 16 high bits to zero as well, constructing a 48-bit address. Mike -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core
On Fri, Jan 10, 2014 at 9:25 AM, Alexander Graf wrote: >>> On 01/11/2014 01:00 AM, Alexander Graf wrote: >> Can't we determine the number of "default threads" at a common place, preferably derived from cpu type? >>> >>> We can do anything. I asked how exactly as I really (really) do not >>> understand the details. >>> >> >> Are you suggesting we create a dictionary with all the cpu type >> information stored in it >> (stepping, cores, threads, memory channels, caches) that we need to >> keep updated? > > We can always talk in extremes :). Today we have a dictionary of core types > in QEMU. If a certain core type comes with a specific number of threads, > that's a property of the core, no? Yes, very true. I'm just considering the eventual situation where each cpu type has several or more child classes to represent different configurations (threads, cores). That would be a lot more complicated than now. Mike
Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core
On Fri, Jan 10, 2014 at 9:13 AM, Alexey Kardashevskiy wrote: > On 01/11/2014 01:00 AM, Alexander Graf wrote: >> Can't we determine the number of "default threads" at a common place, >> preferably derived from cpu type? > > We can do anything. I asked how exactly as I really (really) do not > understand the details. > Are you suggesting we create a dictionary with all the cpu type information stored in it (stepping, cores, threads, memory channels, caches) that we need to keep updated?
Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core
Alexander Graf writes: >> The patch as it its now is very simple and well-contained. I wonder how >> much it would expand if we added a max thread count to the cpu class. It >> seems like the need for a max thread count is idiomatic to powerpc. > > It's only ever useful on IBM POWER. Any other PowerPC system can partition > vcpus by host threads, it's only IBM POWER hardware that's as broken as it is. > I do see that the user experience is slightly suboptimal, but by creating > this special case there's a good chance you're doing more harm than good. All the problems you raise about special cases are true. But, as you point out, ibm power is uniquely broken, which would possibly justify a special case, unless there is a better general solution. In other words, special cases exist for unique circumstances and if I understand you correctly this is a unique circumstance. Alexy explained the use case in his initial posting. The user needs to allocate all threads of a core to an instance of KVM, but has no easy way to obtain that information (threads per core) for the Qemu threads option. So specifying threads="max" is a more user-friendly option. In my understanding this is strictly a usability issue. Mike -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core
Alexey Kardashevskiy writes: > On 01/10/2014 10:40 AM, Alexander Graf wrote: >> > >> What if we make the max thread count a property of our cpu class? The >> we > can add a threads=max option which will be identical between kvm and tcg. > > > You lost me here :) > Right now the sequence is: > 1. smp_parse > 2. config_accelerator > 3. machine_init > > I proposed > 1. config_accelerator - reads max threads from KVM (and initializes "host" > type) > 2. smp_parse - does the parsing using smp_threads tweaked in 1) > 3. machine_init - creates CPUs which may or may be not "host". The patch as it its now is very simple and well-contained. I wonder how much it would expand if we added a max thread count to the cpu class. It seems like the need for a max thread count is idiomatic to powerpc. Mike -- Mike Day | "Endurance is a Virtue"
Re: [Qemu-devel] [RFC PATCH v2] PPC: smp: autodetect numbers of threads per core
Alexey Kardashevskiy writes: > /* compute missing values, prefer sockets over cores over threads */ > if (cpus == 0 || sockets == 0) { > sockets = sockets > 0 ? sockets : 1; > cores = cores > 0 ? cores : 1; > -threads = threads > 0 ? threads : 1; > +if (threads_max) { > +if (threads > 0) { > +fprintf(stderr, "Use either threads or threads_max\n"); > +exit(1); If you went ahead with the threads="max" string option you wouldn't need to check here for mutual excusivity and the user wouldn't need to worry about an extra command options. > +} > +threads = smp_threads > 0 ? smp_threads : 1; > +} else { > +threads = threads > 0 ? threads : 1; > +} > if (cpus == 0) { > cpus = cores * threads * sockets; > } -- Mike Day | "Endurance is a Virtue"
[Qemu-devel] [PATCH] A hexdump function that also displays UTF-8 strings contained in the dumped buffer.
This function is used by a forthcomingQemu monitor command that dumps contents of OpenFirmware Device Trees. It dumps contents of a buffer as hex in the same format as the existing function but also also appends any UTF-8 strings in human-readable format. Like the existing hexdump function, this function may be used elsewhere in Qemu, and it shares the same prototype as the existing function. In both functions, check for a NULL prefix parameter and omit printing the prefix if it is null. Here is a sample of the output of both functions with no prefix string: : 61 62 33 64 62 65 65 66 65 62 34 64 66 62 65 03 0010: 67 62 35 64 68 01 05 03 69 62 36 64 6a 01 06 03 0020: 6b 62 37 64 6c 01 07 03 6d 62 38 64 6e 01 08 03 0030: 6f 62 39 64 70 01 09 03 71 62 78 64 : 61 62 33 64 62 65 65 66 65 62 34 64 66 62 65 03 ab3dbeefeb4dfbe. 0010: 67 62 35 64 68 01 05 03 69 62 36 64 6a 01 06 03 gb5dh...ib6dj... 0020: 6b 62 37 64 6c 01 07 03 6d 62 38 64 6e 01 08 03 kb7dl...mb8dn... 0030: 6f 62 39 64 70 01 09 03 71 62 78 64ob9dp...qbxd Signed-off-by: Mike Day --- include/qemu-common.h | 2 ++ util/hexdump.c| 48 +++- 2 files changed, 49 insertions(+), 1 deletion(-) diff --git a/include/qemu-common.h b/include/qemu-common.h index 5054836..7b8e2b9 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -435,6 +435,8 @@ int mod_utf8_codepoint(const char *s, size_t n, char **end); */ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size); +/* include any strings alongside the hex output */ +void qemu_hexdump_str(gchar *buf, FILE *fp, const gchar *prefix, size_t len); /* vector definitions */ #ifdef __ALTIVEC__ diff --git a/util/hexdump.c b/util/hexdump.c index 969b340..a920c81 100644 --- a/util/hexdump.c +++ b/util/hexdump.c @@ -21,7 +21,11 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size) for (b = 0; b < size; b++) { if ((b % 16) == 0) { -fprintf(fp, "%s: %04x:", prefix, b); +if (prefix) { +fprintf(fp, "%s: %04x:", prefix, b); +} else { +fprintf(fp, "%04x:", b); +} } if ((b % 4) == 0) { fprintf(fp, " "); @@ -35,3 +39,45 @@ void qemu_hexdump(const char *buf, FILE *fp, const char *prefix, size_t size) fprintf(fp, "\n"); } } + +/* print any strings along side the hex dump */ +void qemu_hexdump_str(gchar *buf, FILE *fp, const gchar *prefix, size_t len) +{ + +gchar *inp, *linep; +int i, offset = 0; +inp = linep = buf; + +do { +if (prefix) { +fprintf(fp, "%s: %04x: ", prefix, offset); +} else { +fprintf(fp, "%04x: ", offset); +} +for (i = 0; i < 16 && len > 0; i++, len--, offset++, inp++) { +if (i && !(i % 4)) { +fprintf(fp, " "); +} +fprintf(fp, "%02hx ", *inp); +} +int j; +if (i < 16) { +for (j = 16 - i; j; --j) { +fprintf(fp, " "); +if (j && (!(j % 4))) { +fprintf(fp, " "); +} +} +} +fprintf(fp, " "); +for (j = 0; j < i; j++) { +if (*(linep + j) < 0x20 || *(linep + j) > 0x7e) { +fprintf(fp, "%c", '.'); +} else { +fprintf(fp, "%c", *(linep + j)); +} +} +fprintf(fp, "\n"); +linep = inp; +} while (len); +}
Re: [Qemu-devel] [RFC] SPAPR-PCI Hotplug Support in Qemu
Adding Anthony's corrected address. On Thu, Oct 10, 2013 at 6:25 PM, Mike Day wrote: [RFC] SPAPR-PCI Hotplug Support in Qemu > > Background: > ppc64 has a unique bus structure for PCI slots: each slot is connected > to its PHB by a pci switch. This is true in some IBM hardware as well as > paravirtual hardware (PAPR). > > SLOF firmware normally scans the hardware bus and creates the correct > slot/PCI switch -complex in the open firmware device tree. It also > configures the slot and PCI switch (BARs, etc.) > > For devices set up by platform firmware, each PCI device is attached to > its PHB and correctly configured. > > For Linux hot-plugged devices running under PowerVM today, each device > is created with a PCI switch hanging off the dev->subordinate > pointer. (PowerVM gets this info from the open firmware device tree in > rtas.) > > Problem: > > The Qemu hot-plug path doesn't anticipate a PCI switch being attached > to every PHB slot. > > When hot-plugging a device, Qemu qdev creates the device, which allows > the device to initialize itself. Qemu then passes this initialized > device to the ppc PHB via the hot-plug path.[1] > > The current ppc hot-plug code then creates a device tree node for the > device [2], and allocates resources (BARs etc) for the new device. [3] > > The ppc64 kernel expects each hot-plugged PCI device structure to > point to a subordinate bus dev->subordinate. This assumption is held > throughout the ppc PCI code, and there are numerous opportunities for > panics when the device gets passed to a kernel routine with a > subordinate pointer. [4] > > > Proposed Solutions: > > (1) Create and hook an inert PCI switch to every hot-plugged PCI > device in Qemu. > > (a) After the device has initialized itself, at hot-plug time, create a > new PCI switch, configure the switch, allocate BARs, and attach the > switch to the hot-plugged devices (dev->subordinate). > > (b) create a new device tree node that begins with the PCI switch and > the parent of the hot-plugged device. Add the PCI switch/device > complex to the device tree under the PHB. > > (2) Add each hot-plugged PCI device to its own complex of PHB > (Processor Host Bus) and PCI switch. > > Simplify (1) by creating a new PHB for each hot-plugged device. > > (a) At PHB creation time, create a PCI switch device node for each PHB > slot. > > (b) At hot-plug time, create and configure a new PHB and add the > hot-plugged device to one of the slots. Configure and allocate > resources as normally. > > Comments: > > The current code has only one PHB. We know we need to support more > than one PHB ultimately. Solution #2 is consistent with this approach. > > > [1] https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c > > [2] ibm,rtas_configure_connector: > https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L575 > > [3] spapr_phb_add_pci_dt > https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L900 > > [4] dlpar_pci_add_bus > http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/pci/hotplug/rpadlpar_core.c?id=8bf3379a74bc9132751bfa685bad2da318fd59d7#n165 > > > -- > > Mike Day | + 1 919 371-8786 | ncm...@ncultra.org > "Endurance is a Virtue"
[Qemu-devel] [RFC] SPAPR-PCI Hotplug Support in Qemu
[RFC] SPAPR-PCI Hotplug Support in Qemu Background: ppc64 has a unique bus structure for PCI slots: each slot is connected to its PHB by a pci switch. This is true in some IBM hardware as well as paravirtual hardware (PAPR). SLOF firmware normally scans the hardware bus and creates the correct slot/PCI switch -complex in the open firmware device tree. It also configures the slot and PCI switch (BARs, etc.) For devices set up by platform firmware, each PCI device is attached to its PHB and correctly configured. For Linux hot-plugged devices running under PowerVM today, each device is created with a PCI switch hanging off the dev->subordinate pointer. (PowerVM gets this info from the open firmware device tree in rtas.) Problem: The Qemu hot-plug path doesn't anticipate a PCI switch being attached to every PHB slot. When hot-plugging a device, Qemu qdev creates the device, which allows the device to initialize itself. Qemu then passes this initialized device to the ppc PHB via the hot-plug path.[1] The current ppc hot-plug code then creates a device tree node for the device [2], and allocates resources (BARs etc) for the new device. [3] The ppc64 kernel expects each hot-plugged PCI device structure to point to a subordinate bus dev->subordinate. This assumption is held throughout the ppc PCI code, and there are numerous opportunities for panics when the device gets passed to a kernel routine with a subordinate pointer. [4] Proposed Solutions: (1) Create and hook an inert PCI switch to every hot-plugged PCI device in Qemu. (a) After the device has initialized itself, at hot-plug time, create a new PCI switch, configure the switch, allocate BARs, and attach the switch to the hot-plugged devices (dev->subordinate). (b) create a new device tree node that begins with the PCI switch and the parent of the hot-plugged device. Add the PCI switch/device complex to the device tree under the PHB. (2) Add each hot-plugged PCI device to its own complex of PHB (Processor Host Bus) and PCI switch. Simplify (1) by creating a new PHB for each hot-plugged device. (a) At PHB creation time, create a PCI switch device node for each PHB slot. (b) At hot-plug time, create and configure a new PHB and add the hot-plugged device to one of the slots. Configure and allocate resources as normally. Comments: The current code has only one PHB. We know we need to support more than one PHB ultimately. Solution #2 is consistent with this approach. [1] https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c [2] ibm,rtas_configure_connector: https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L575 [3] spapr_phb_add_pci_dt https://github.com/mdroth/qemu/blob/spapr-pci-hotplug/hw/ppc/spapr_pci.c#L900 [4] dlpar_pci_add_bus http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/pci/hotplug/rpadlpar_core.c?id=8bf3379a74bc9132751bfa685bad2da318fd59d7#n165 -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe
Paolo Bonzini writes: > Il 30/09/2013 15:34, Alex Bligh ha scritto: >> >> I think the most likely change here is that the walkers might >> move outside the BQL. Given modification of this list is so rare, >> the lock would be very very read heavy, so RCU is probably a >> sensible option. > > I agree. Keeping the write side on the BQL is sane, but RCU-protecting > the read side actually makes the rules simpler. > > Mike, would you like to give it a shot? Yes, I will. I'll have a patchset for review within a couple of days. Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe
On Mon, Sep 30, 2013 at 9:34 AM, Alex Bligh wrote: >>> > void qemu_clock_notify(QEMUClockType type) > > ... >>> >>> > int64_t qemu_clock_deadline_ns_all(QEMUClockType type) > > ... > >>> > I think these functions are always called now with the BQL held, so I >>> > wonder if they are good candidates for RCU? >>> >>> These routines iterate through the list of timerlists held by >>> a clock. >>> >>> They do not iterate through the list of active timers in a timer >>> list. I believe the latter is what active_timers_lock protects. >>> >>> The list of timers attached to a clock is only modified when timers >>> are created and deleted which is (currently) under the BQL. >> >> Sorry, and thanks for the correction re: active_timers_lock. I should >> have said that clock->timerlists may need its own mutex if and when we >> remove the BQL from the timer code. > > > Correct. The list of timerlists is only modified when a > QEMUTimerListGroup is created or destroyed (currently when an > AioContext is created or destroyed), and that is done with the > BQL held. > > As you point out, it's currently walked by a couple of functions > that also have the BQL held. > > I think the most likely change here is that the walkers might > move outside the BQL. Given modification of this list is so rare, > the lock would be very very read heavy, so RCU is probably a > sensible option. > > This link may (or may not) help in understanding: > http://blog.alex.org.uk/2013/08/24/changes-to-qemus-timer-system/ Alex, thanks for the link - Mike
Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe
> > > On Mon, Sep 30, 2013 at 8:55 AM, Alex Bligh wrote: > > > > > > On 30 Sep 2013, at 13:45, Mike Day wrote: > > > > > I've applied this set to Paolo's rcu tree - I see a couple of routines > > > that appear to need the active_timers_lock: > > > > > > (line 137 of qemu-timer.c in my tree) > > > void qemu_clock_notify(QEMUClockType type) > > > { > > >QEMUTimerList *timer_list; > > >QEMUClock *clock = qemu_clock_ptr(type); > > >QLIST_FOREACH(timer_list, &clock->timerlists, list) { > > >timerlist_notify(timer_list); > > >} > > > } > > > > > > (line 228 of qemu-timer.c in my tree) > > > int64_t qemu_clock_deadline_ns_all(QEMUClockType type) > > > { > > >int64_t deadline = -1; > > >QEMUTimerList *timer_list; > > >QEMUClock *clock = qemu_clock_ptr(type); > > >QLIST_FOREACH(timer_list, &clock->timerlists, list) { > > >deadline = qemu_soonest_timeout(deadline, > > > > timerlist_deadline_ns(timer_list)); > > >} > > >return deadline; > > > } > > > > > > I think these functions are always called now with the BQL held, so I > > > wonder if they are good candidates for RCU? > > > > These routines iterate through the list of timerlists held by > > a clock. > > > > They do not iterate through the list of active timers in a timer > > list. I believe the latter is what active_timers_lock protects. > > > > The list of timers attached to a clock is only modified when timers > > are created and deleted which is (currently) under the BQL. > > > > Sorry, and thanks for the correction re: active_timers_lock. I should have said that clock->timerlists may need its own mutex if and when we remove the BQL from the timer code. Mike
Re: [Qemu-devel] [PATCH v4 2/3] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe
ive_timers_lock); > +for (t = timer_list->active_timers; t != NULL; t = t->next) { > if (t == ts) { > -return true; > +found = true; > +break; > } > } > -return false; > +qemu_mutex_unlock(&timer_list->active_timers_lock); > +return found; > } > > bool timer_expired(QEMUTimer *timer_head, int64_t current_time) > @@ -369,23 +410,31 @@ bool timerlist_run_timers(QEMUTimerList *timer_list) > QEMUTimer *ts; > int64_t current_time; > bool progress = false; > - > +QEMUTimerCB *cb; > +void *opaque; > + > if (!timer_list->clock->enabled) { > return progress; > } > > current_time = qemu_clock_get_ns(timer_list->clock->type); > for(;;) { > +qemu_mutex_lock(&timer_list->active_timers_lock); > ts = timer_list->active_timers; > if (!timer_expired_ns(ts, current_time)) { > +qemu_mutex_unlock(&timer_list->active_timers_lock); > break; > } > + > /* remove timer from the list before calling the callback */ > timer_list->active_timers = ts->next; > ts->next = NULL; > +cb = ts->cb; > +opaque = ts->opaque; > +qemu_mutex_unlock(&timer_list->active_timers_lock); > > /* run the callback (the timer list can be modified) */ > -ts->cb(ts->opaque); > +cb(opaque); > progress = true; > } > return progress; > -- > 1.8.3.1 > > -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH v5 0/5] bugs fix for hpet
Paolo Bonzini writes: > Il 25/09/2013 08:27, liu ping fan ha scritto: >> Hi, is hpet orphan? Or who can help me to merge this patch-set if my >> patch is fine. > > Anthony, Michael? Sorry, wrong Michael - Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH v5 0/5] bugs fix for hpet
Paolo Bonzini writes: > Il 25/09/2013 08:27, liu ping fan ha scritto: >> Hi, is hpet orphan? Or who can help me to merge this patch-set if my >> patch is fine. > > Anthony, Michael? Yes, happy to help out with this. I'll start looking at it now and work with Liu Ping, Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
Re: [Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4,2
On Mon, Sep 9, 2013 at 12:21 PM, Paolo Bonzini wrote: > > > @@ -601,12 +608,22 @@ static void reset_ram_globals(void) > > last_seen_block = NULL; > > last_sent_block = NULL; > > last_offset = 0; > > -last_version = ram_list.version; > > ram_bulk_stage = true; > > +smp_wmb(); > > +last_version = ram_list.version; > > This barrier is still not needed. Yes, I agree, because of the calling context. It is brittle though because reset_ram_globals is a static function and may be called differently in the future (or by new code at a different location). The need for a barrier may change and it would be opaque to the internals of the reset function. Also, the globals are writable elsewhere in the file. It needs more organization but I agree that should be a discrete patch. > Can you take a look at my rcu branch? The comments clarify to me why writing to last_mru does not need a write barrier, thank you. > I pushed there the conversion of mru_block to RCU (based on 4.1), and > clarified a bit more the locking conventions. Basically we have: > > - functions called under iothread lock (e.g. qemu_ram_alloc) > > - functions called under RCU or iothread lock (e.g. qemu_get_ram_ptr) > > - functions called under RCU or iothread lock, or while holding a > reference to the MemoryRegion that is looked up again (e.g. > qemu_ram_addr_from_host). > > The difference between the second and third group is simply that the > second will not call rcu_read_lock()/rcu_read_unlock(), the third will. > > Does it make sense? Should we simplify it by always calling > rcu_read_lock()/rcu_read_unlock(), which removes the second group? I think the benefits of simplification and future reliability are greater than the performance cost of the rcu_read_lock. The latter should be something we verify, though. Thank you! Mike
[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4,2
Changes from V4.1: * Correct memory barriers for ram_list globals. Changes from V4: * rebased on https://github.com/bonzini/qemu/tree/rcu commit 965f3b2aac93bca6df50c86fb17a06b3c856fa30 Changes from V3: * now passes virt-test -t qemu * uses call_rcu instead of call_rcu1 * completely removed the ram_list mutex and its locking functions * cleaned up some comments Changes from V2: * added rcu reclaim function to free ram blocks * reduced the number of comments * made rcu locking policy consistent for different caller scenarios * rcu updates to ram_block now are protected only by the iothread mutex Changes from V1: * Omitted locks or rcu critical sections within Some functions that read or write the ram_list but are called in a protected context (the caller holds the iothread lock, the ram_list mutex, or an rcu critical section). Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has been virt-tested using KVM -t qemu and passes 15/15 migration tests. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day Signed-off-by: Mike Day --- arch_init.c| 107 ++- exec.c | 167 +++-- include/exec/cpu-all.h | 13 ++-- 3 files changed, 161 insertions(+), 126 deletions(-) diff --git a/arch_init.c b/arch_init.c index 0471cd5..e24d29e 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -399,7 +400,8 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -408,6 +410,8 @@ static void migration_bitmap_sync(void) } } } +rcu_read_unlock(); + trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; @@ -447,6 +451,8 @@ static void migration_bitmap_sync(void) * * Returns: The number of bytes written. * 0 means no dirty pages + * + * assumes that the caller has rcu-locked the ram_list */ static int ram_save_block(QEMUFile *f, bool last_stage) @@ -458,8 +464,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) MemoryRegion *mr; ram_addr_t current_addr; + if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -470,9 +477,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -527,9 +534,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } } } + last_seen_block = block; last_offset = offset; - return bytes_sent; } @@ -566,10 +573,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -601,12 +608,22 @@ static void reset_ram_globals(void) last_seen_block = NULL; last_sent_block = NULL; last_offset = 0; -last_version = ram_list.version; ram_bulk_stage = true; +smp_wmb(); +last_version = ram_list.version; } #define MAX_WAIT 50 /* ms, half buffered_file limit */ + +/* ram_save_* functions each implement a long-running RCU critical + * section. When rcu-reclaims in the code start to become numerous + * it will be ne
[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V4.1
Changes from V4: rebased on https://github.com/bonzini/qemu/tree/rcu commit 965f3b2aac93bca6df50c86fb17a06b3c856fa30 This brings the patchset down to three files. Changes from V3: * now passes virt-test -t qemu * uses call_rcu instead of call_rcu1 * completely removed the ram_list mutex and its locking functions * cleaned up some comments Changes from V2: * added rcu reclaim function to free ram blocks * reduced the number of comments * made rcu locking policy consistent for different caller scenarios * rcu updates to ram_block now are protected only by the iothread mutex Changes from V1: * Omitted locks or rcu critical sections within Some functions that read or write the ram_list but are called in a protected context (the caller holds the iothread lock, the ram_list mutex, or an rcu critical section). Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has been virt-tested using KVM -t qemu and passes 15/15 migration tests. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c| 105 ++- exec.c | 164 +++-- include/exec/cpu-all.h | 13 ++-- 3 files changed, 158 insertions(+), 124 deletions(-) diff --git a/arch_init.c b/arch_init.c index 0471cd5..29f1da2 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -399,7 +400,8 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -408,6 +410,8 @@ static void migration_bitmap_sync(void) } } } +rcu_read_unlock(); + trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; @@ -447,6 +451,8 @@ static void migration_bitmap_sync(void) * * Returns: The number of bytes written. * 0 means no dirty pages + * + * assumes that the caller has rcu-locked the ram_list */ static int ram_save_block(QEMUFile *f, bool last_stage) @@ -458,8 +464,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) MemoryRegion *mr; ram_addr_t current_addr; + if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -470,9 +477,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -527,9 +534,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } } } + last_seen_block = block; last_offset = offset; - return bytes_sent; } @@ -566,10 +573,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -603,10 +610,20 @@ static void reset_ram_globals(void) last_offset = 0; last_version = ram_list.version; ram_bulk_stage = true; +smp_wmb(); } #define MAX_WAIT 50 /* ms, half buffered_file limit */ + +/* ram_save_* functions each implement a long-running RCU critical + * section. When rcu-reclaims in the code start to become numerous + * it will be necessary to reduce the granularity of these critical + * sections. + * + * (ram_save_setup, ram_save_iterate, and ram_save_complete.) + */
Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V4
On Wed, Sep 4, 2013 at 3:58 PM, Paolo Bonzini wrote: > > > @@ -1323,23 +1325,21 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t addr) > > { > > RAMBlock *block; > > > > -/* The list is protected by the iothread lock here. */ > > +/* This assumes the iothread lock is taken here too. */ > > + > > block = ram_list.mru_block; > > if (block && addr - block->offset < block->length) { > > -goto found; > > +return block; > > } > > -QTAILQ_FOREACH(block, &ram_list.blocks, next) { > > +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > > if (addr - block->offset < block->length) { > > -goto found; > > +atomic_rcu_set(&ram_list.mru_block, block); > > I think this is not needed, block was already published into the block > list. What is important is to order mru_block == NULL so that it > happens before the node is removed. Which we don't do, but we can fix > later. I am thinking of writing some macros and possibly reorganizing the ram globals into a struct so that we can update it by exchanging pointers atomically. It seems to disjoint right and error-prone now that we are making it rcu-enabled. thanks! Mike
[Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V4
* now passes virt-test -t qemu * uses call_rcu instead of call_rcu1 * completely removed the ram_list mutex and its locking functions * cleaned up some comments Changes from V3: * added rcu reclaim function to free ram blocks * reduced the number of comments * made rcu locking policy consistent for different caller scenarios * rcu updates to ram_block now are protected only by the iothread mutex Changes from V1: * Omitted locks or rcu critical sections within Some functions that read or write the ram_list but are called in a protected context (the caller holds the iothread lock, the ram_list mutex, or an rcu critical section). Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has been virt-tested using KVM -t qemu and passes 15/15 migration tests. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c | 96 +--- exec.c | 162 +-- include/exec/cpu-all.h | 13 ++-- include/qemu/rcu_queue.h | 8 +++ 4 files changed, 160 insertions(+), 119 deletions(-) diff --git a/arch_init.c b/arch_init.c index 68a7ab7..76ef5c9 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -398,7 +399,8 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -407,6 +409,8 @@ static void migration_bitmap_sync(void) } } } +rcu_read_unlock(); + trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; @@ -446,6 +450,8 @@ static void migration_bitmap_sync(void) * * Returns: The number of bytes written. * 0 means no dirty pages + * + * assumes that the caller has rcu-locked the ram_list */ static int ram_save_block(QEMUFile *f, bool last_stage) @@ -458,7 +464,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) ram_addr_t current_addr; if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -469,9 +475,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -565,10 +571,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -602,10 +608,20 @@ static void reset_ram_globals(void) last_offset = 0; last_version = ram_list.version; ram_bulk_stage = true; +smp_wmb(); } #define MAX_WAIT 50 /* ms, half buffered_file limit */ + +/* ram_save_* functions each implement a long-running RCU critical + * section. When rcu-reclaims in the code start to become numerous + * it will be necessary to reduce the granularity of these critical + * sections. + * + * (ram_save_setup, ram_save_iterate, and ram_save_complete.) + */ + static int ram_save_setup(QEMUFile *f, void *opaque) { RAMBlock *block; @@ -631,7 +647,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } qemu_mutex_lock_iothread(); -qemu_mutex_lock_ramlist(); +rcu_read_lock(); bytes_transferred = 0; reset_ram_globals(); @@ -641,13 +657,13 @@ static int ram_save_setup(QEMUFile *f
[Qemu-devel] [PATCH] Convert ram_list to RCU DQ V3
Changes from V2: * added rcu reclaim function to free ram blocks * reduced the number of comments * made rcu locking policy consistent for different caller scenarios * rcu updates to ram_block now are protected only by the iothread mutex Changes from V1: * Omitted locks or rcu critical sections within Some functions that read or write the ram_list but are called in a protected context (the caller holds the iothread lock, the ram_list mutex, or an rcu critical section). Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has not been tested further than that at this point. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c | 103 ++ exec.c | 160 +-- include/exec/cpu-all.h | 6 +- include/qemu/rcu_queue.h | 8 +++ 4 files changed, 173 insertions(+), 104 deletions(-) diff --git a/arch_init.c b/arch_init.c index 68a7ab7..4cb1543 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -398,7 +399,8 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -407,6 +409,8 @@ static void migration_bitmap_sync(void) } } } +rcu_read_unlock(); + trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; @@ -446,6 +450,8 @@ static void migration_bitmap_sync(void) * * Returns: The number of bytes written. * 0 means no dirty pages + * + * assumes that the caller has rcu-locked the ram_list */ static int ram_save_block(QEMUFile *f, bool last_stage) @@ -457,8 +463,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) MemoryRegion *mr; ram_addr_t current_addr; + if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -469,9 +476,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -526,9 +533,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } } } + last_seen_block = block; last_offset = offset; - return bytes_sent; } @@ -565,10 +572,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -602,10 +609,20 @@ static void reset_ram_globals(void) last_offset = 0; last_version = ram_list.version; ram_bulk_stage = true; +smp_wmb(); } #define MAX_WAIT 50 /* ms, half buffered_file limit */ + +/* ram_save_* functions each implement a long-running RCU critical + * section. When rcu-reclaims in the code start to become numerous + * it will be necessary to reduce the granularity of these critical + * sections. + * + * (ram_save_setup, ram_save_iterate, and ram_save_complete.) + */ + static int ram_save_setup(QEMUFile *f, void *opaque) { RAMBlock *block; @@ -631,7 +648,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } qemu_mutex_lock_iothread(); -qemu_mutex_lock_ramlist(); +rcu_read_lock(); bytes_transferred = 0; reset_ram_globals(); @@ -641,13 +658,13 @@ static
Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2
On Tue, Sep 3, 2013 at 10:09 AM, Paolo Bonzini wrote: > > Il 03/09/2013 15:56, Mike Day ha scritto: > >> > +/* this implements a long-running RCU critical section. > >> > + * When rcu reclaims in the code start to become numerous > >> > + * it will be necessary to reduce the granularity of this critical > >> > + * section. > >> > + */ > >> > >> Please add the same comment (and a rcu_read_lock/unlock pair replacing > >> the ramlist mutex) in ram_save_iterate, too. > > > > Just double checking on this particular change. In practice ram_save > > manipulates the ram_list indirectly through ram_save_block. But I'm > > assuming you want this change because of the ram state info that > > persists between calls to ram_save (ram_list version in particular). > > ram_list.version is not really a problem, but last_seen_block has to > persist across ram_save_block calls. Got it. that's a subtle point. > > Also, there is potential for the callback functions > > ram_control_*_iterate to manipulate the ram_list. > > I think that's right now not possible (and they could use > rcu_read_lock/unlock as well). Yeah. So how about we say for now that the rcu critical section status upon entry to the ram_control_*_iterate functions is undefined. I'll make some updates. Mike
Re: [Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2
On Fri, Aug 30, 2013 at 12:38 PM, Paolo Bonzini wrote: > > > @@ -867,7 +879,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) > > if (version_id < 4 || version_id > 4) { > > return -EINVAL; > > } > > - > > +/* this implements a long-running RCU critical section. > > + * When rcu reclaims in the code start to become numerous > > + * it will be necessary to reduce the granularity of this critical > > + * section. > > + */ > > Please add the same comment (and a rcu_read_lock/unlock pair replacing > the ramlist mutex) in ram_save_iterate, too. Just double checking on this particular change. In practice ram_save manipulates the ram_list indirectly through ram_save_block. But I'm assuming you want this change because of the ram state info that persists between calls to ram_save (ram_list version in particular). Also, there is potential for the callback functions ram_control_*_iterate to manipulate the ram_list. I'm adding the rcu_read_lock/unlock pair in ram_load. It will be recursive with the same calls in ram_save_block, but as you pointed out this is low overhead. With this change in my working code, ram_control_*_iterate are called from within an rcu critical section. Mike
[Qemu-devel] [RFC PATCH] Convert ram_list to RCU DQ V2
Changes from V1: * Omitted locks or rcu critical sections within Some functions that read or write the ram_list but are called in a protected context (the caller holds the iothread lock, the ram_list mutex, or an rcu critical section). Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has not been tested further than that at this point. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c | 80 +++--- exec.c| 142 ++ hw/9pfs/virtio-9p-synth.c | 2 +- include/exec/cpu-all.h| 4 +- include/qemu/rcu_queue.h | 8 +++ 5 files changed, 151 insertions(+), 85 deletions(-) diff --git a/arch_init.c b/arch_init.c index 68a7ab7..3f4d676 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -398,7 +399,11 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { + /* This assumes the iothread lock or the ram_list mutex is taken. + * if that changes, accesses to ram_list need to be protected + * by a mutex (writes) or an rcu read lock (reads) + */ +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -457,8 +462,13 @@ static int ram_save_block(QEMUFile *f, bool last_stage) MemoryRegion *mr; ram_addr_t current_addr; +/* Sometimes called with the ram_list mutex held (ram_save_complete) + * also called WITHOUT the ram_list mutex held. (ram_save_iterate). + * Protect ram_list with an rcu critical section. + */ +rcu_read_lock(); if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -469,9 +479,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -526,6 +536,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } } } +rcu_read_unlock(); last_seen_block = block; last_offset = offset; @@ -565,10 +576,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -631,7 +642,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } qemu_mutex_lock_iothread(); -qemu_mutex_lock_ramlist(); +rcu_read_lock(); bytes_transferred = 0; reset_ram_globals(); @@ -641,13 +652,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque) qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { qemu_put_byte(f, strlen(block->idstr)); qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); qemu_put_be64(f, block->length); } -qemu_mutex_unlock_ramlist(); +rcu_read_unlock(); ram_control_before_iterate(f, RAM_CONTROL_SETUP); ram_control_after_iterate(f, RAM_CONTROL_SETUP); @@ -664,8 +675,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) int64_t t0; int total_sent = 0; -qemu_mutex_lock_ramlist(); - if (ram_list.version != last_version) { reset_ram_globals(); } @@ -701,8 +710,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) i++; } -qemu_mutex_unlock_ramlist(); - /*
Re: [Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ
On Fri, Aug 30, 2013 at 4:19 AM, Paolo Bonzini wrote: > > > I'm not sure about that; returning an RCU-protected variable after > rcu_read_unlock() seems wrong to me because the pointer may not be valid > at that point. I suggest using a comment that asks to call > host_from_stream_offset within rcu_read_lock()/rcu_read_unlock(). > However, if existing practice in the kernel is different, I'll bow to that. In this case the only caller is ram_load so I'm removing the critical section from within host_from_stream_offset, and adding comments to note that the ram_list needs to be protected by the caller. The current docs/rcu.txt information indicates that rcu critical sections can be "nested or overlapping." But your suggestion results in cleaner code - we will have to go back to this later as you noted earlier. Mike
Re: [Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ
On Wed, Aug 28, 2013 at 12:35 PM, Paolo Bonzini wrote: > > > @@ -828,13 +829,18 @@ static inline void *host_from_stream_offset(QEMUFile *f, > > qemu_get_buffer(f, (uint8_t *)id, len); > > id[len] = 0; > > > > -QTAILQ_FOREACH(block, &ram_list.blocks, next) { > > -if (!strncmp(id, block->idstr, sizeof(id))) > > -return memory_region_get_ram_ptr(block->mr) + offset; > > +rcu_read_lock(); > > +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > > +if (!strncmp(id, block->idstr, sizeof(id))) { > > +ptr = memory_region_get_ram_ptr(block->mr) + offset; > > +goto unlock_out; > > +} > > } > > > > fprintf(stderr, "Can't find block %s!\n", id); > > -return NULL; > > +unlock_out: > > +rcu_read_unlock(); > > +return ptr; > > } > > > Similarly, here the critical section includes the caller, and block is > living across calls to host. Again, for now just put all of ram_load > under a huge RCU critical section. Later we can use ram_list.version to > refresh the list and make the critical sections smaller. And: rcu_read_lock() and rcu_read_unlock() can be called recursively, so I can still leave the lock/unlock pair in host_from_stream_offset.
[Qemu-devel] [RFC PATCH] convert ram_list to RCU DQ
Allow "unlocked" reads of the ram_list by using an RCU-enabled DQ. Most readers of the list no longer require holding the list mutex. The ram_list now uses a QLIST instead of a QTAILQ. The difference is minimal. This patch has been built and make-checked for the x86_64, ppc64, s390x, and arm targets. It has not been tested further than that at this point. To apply this patch, you must base upon Paolo Bonzini's rcu tree and also apply the RCU DQ patch (below). https://github.com/bonzini/qemu/tree/rcu http://article.gmane.org/gmane.comp.emulators.qemu/230159/ Signed-off-by: Mike Day --- arch_init.c | 51 - exec.c| 111 +- hw/9pfs/virtio-9p-synth.c | 2 +- include/exec/cpu-all.h| 4 +- include/qemu/rcu_queue.h | 8 5 files changed, 111 insertions(+), 65 deletions(-) diff --git a/arch_init.c b/arch_init.c index 68a7ab7..5c7b284 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "qemu/rcu_queue.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -397,8 +398,8 @@ static void migration_bitmap_sync(void) trace_migration_bitmap_sync_start(); address_space_sync_dirty_bitmap(&address_space_memory); - -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { for (addr = 0; addr < block->length; addr += TARGET_PAGE_SIZE) { if (memory_region_test_and_clear_dirty(block->mr, addr, TARGET_PAGE_SIZE, @@ -407,6 +408,7 @@ static void migration_bitmap_sync(void) } } } +rcu_read_unlock(); trace_migration_bitmap_sync_end(migration_dirty_pages - num_dirty_pages_init); num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init; @@ -457,8 +459,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) MemoryRegion *mr; ram_addr_t current_addr; +rcu_read_lock(); if (!block) -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); while (true) { mr = block->mr; @@ -469,9 +472,9 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } if (offset >= block->length) { offset = 0; -block = QTAILQ_NEXT(block, next); +block = QLIST_NEXT_RCU(block, next); if (!block) { -block = QTAILQ_FIRST(&ram_list.blocks); +block = QLIST_FIRST_RCU(&ram_list.blocks); complete_round = true; ram_bulk_stage = false; } @@ -526,6 +529,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage) } } } +rcu_read_unlock(); last_seen_block = block; last_offset = offset; @@ -565,10 +569,10 @@ uint64_t ram_bytes_total(void) { RAMBlock *block; uint64_t total = 0; - -QTAILQ_FOREACH(block, &ram_list.blocks, next) +rcu_read_lock(); +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) total += block->length; - +rcu_read_unlock(); return total; } @@ -631,7 +635,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } qemu_mutex_lock_iothread(); -qemu_mutex_lock_ramlist(); +rcu_read_lock(); bytes_transferred = 0; reset_ram_globals(); @@ -641,13 +645,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque) qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); -QTAILQ_FOREACH(block, &ram_list.blocks, next) { +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { qemu_put_byte(f, strlen(block->idstr)); qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); qemu_put_be64(f, block->length); } -qemu_mutex_unlock_ramlist(); +rcu_read_unlock(); ram_control_before_iterate(f, RAM_CONTROL_SETUP); ram_control_after_iterate(f, RAM_CONTROL_SETUP); @@ -664,8 +668,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) int64_t t0; int total_sent = 0; -qemu_mutex_lock_ramlist(); - if (ram_list.version != last_version) { reset_ram_globals(); } @@ -701,8 +703,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) i++; } -qemu_mutex_unlock_ramlist(); - /* * Must occur before EOS (or any QEMUFile operation) * because of RDMA protocol. @@ -814,6 +814,7 @@ static inline void *host_from_stream_offset(QEMUFile *f, static RAMBlock *block = NULL; char id[256]; uint8_t len; +void *ptr = NULL; if (flags & RAM_SAVE_FLAG_CONTINUE) { if (!block) { @@ -
[Qemu-devel] [RFC PATCH V3 ] Introduce RCU-enabled DQs
Add RCU-enabled variants on the existing bsd DQ facility. Each Q operation has the same interface as the existing (non-RCU) version. Also, each operation is implemented as macro for now. Using the RCU-enabled DQ, existing DQ users will be able to convert to RCU without using a different list interface. Changes since V2: * reverted QLIST_FOREACH_REVERSE_RCU, it's not needed in the interface file. (But it is used in the test program.) * added g_test support to the test program To accompany the RCU-enabled DQ, there is also a test file that uses concurrent readers to contend with a single updater. This patchset builds on top of Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu Signed-off-by: Mike Day --- docs/rcu.txt | 2 +- include/qemu/queue.h | 11 -- include/qemu/rcu_queue.h | 137 +++ tests/Makefile | 5 +- tests/rcuq_test.c| 337 +++ 5 files changed, 479 insertions(+), 13 deletions(-) create mode 100644 include/qemu/rcu_queue.h create mode 100644 tests/rcuq_test.c diff --git a/docs/rcu.txt b/docs/rcu.txt index b3c593c..de59896 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -106,7 +106,7 @@ The core RCU API is small: so that the reclaimer function can fetch the struct foo address and free it: -call_rcu1(foo_reclaim, &foo.rcu); +call_rcu1(&foo.rcu, foo_reclaim); void foo_reclaim(struct rcu_head *rp) { diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 847ddd1..f6f0636 100644 --- a/include/qemu/queue.h +++ b/include/qemu/queue.h @@ -139,17 +139,6 @@ struct { \ (elm)->field.le_prev = &(head)->lh_first; \ } while (/*CONSTCOND*/0) -#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\ -(elm)->field.le_prev = &(head)->lh_first; \ -(elm)->field.le_next = (head)->lh_first;\ -smp_wmb(); /* fill elm before linking it */ \ -if ((head)->lh_first != NULL) {\ -(head)->lh_first->field.le_prev = &(elm)->field.le_next;\ -} \ -(head)->lh_first = (elm); \ -smp_wmb(); \ -} while (/* CONSTCOND*/0) - #define QLIST_REMOVE(elm, field) do { \ if ((elm)->field.le_next != NULL) \ (elm)->field.le_next->field.le_prev = \ diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h new file mode 100644 index 000..e2b8ba5 --- /dev/null +++ b/include/qemu/rcu_queue.h @@ -0,0 +1,137 @@ +#ifndef QEMU_RCU_SYS_QUEUE_H +#define QEMU_RCU_SYS_QUEUE_H + +/* + * rcu_queue.h + * + * Userspace RCU QSBR header. + * + * LGPL-compatible code should include this header with : + * + * #define _LGPL_SOURCE + * #include + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013 Mike D. Day, IBM Corporation. + * + * IBM's contributions to this file may be relicensed under LGPLv2 or later. + */ + +#include "qemu/rcu.h" /* rcu.h includes qemu/queue.h and qemu/atomic.h */ + + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * List functions. + */ + + +/* + * The difference between atomic_read/set and atomic_rcu_read/set + * is in the including of a read/write memory barrier to the volatile + * access. atomic_rcu_* macros include the memory barrier, the + * plain atomic macros do not. Therefore, it should be correct to + * issue a series of reads or writes to the same element using only + * the atomic_* macro, until the last read or write, which should be + * atomic_rcu_* to introduce a read or write memory barrier as + * appropriate. + */ + +/* Upon publication of the listelm->next value, list readers + * will see the new node when following next pointers from + * anteced
Re: [Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)
On Sun, Aug 25, 2013 at 3:18 PM, Mathieu Desnoyers < mathieu.desnoy...@efficios.com> wrote: > > I'm not very comfortable with your DQ implementation not providing any > kind of guarantee to a forward traversal followed by backward traversal, > nor for backward followed by forward traversal. If a list add is > executed concurrently with traversals, and we can ensure there are no > list del of the node, if a traversal sees the added node when doing > forward iteration, I would clearly expect to still see it if a backward > iteration follows. > > I took the liberty of implementing a couple of ideas I had to provide > a RCU DQ with those guarantees. I just pushed the code here (beware, I > just did some basic single-threaded unit tests so far, so consider this > code as largely untested concurrency-wise): > > git clone git://git.urcu.io/urcu.git > branch: rcudq > file: urcu/rcudq.h > > Direct link to the file via gitweb: > http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rcudq.h;h=4a8d7b0d5143a958514cf130b1c124d99f3194ca;hb=refs/heads/rcudq Mathieu - Thanks for the review! And thanks for the code, I'm working with it right now. I like the idea of using a flag to provide a form of atomicity for the doubly-linked list elements. I'm also planning on running some timing tests to see of the additional memory barriers and atomic accesses make *any* difference whatsoever. Mike Mike Day | ncm...@ncultra.org | +1 919 371-8786
Re: [Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)
Paolo Bonzini writes: > Just a couple of questions, one of them on the new macro... > >> +/* prior to publication of the elm->prev->next value, some list >> + * readers may still see the removed element when following >> + * the antecedent's next pointer. >> + */ >> + >> +#define QLIST_REMOVE_RCU(elm, field) do { \ >> +if ((elm)->field.le_next != NULL) { \ >> + (elm)->field.le_next->field.le_prev =\ >> +(elm)->field.le_prev; \ >> +} \ >> +atomic_rcu_set((elm)->field.le_prev, (elm)->field.le_next); \ >> +} while (/*CONSTCOND*/0) > > Why is the barrier needed here, but not in Linux's list_del_rcu? > > I think it is not needed because all involved elements have already been > published and just have their pointers shuffled. I read this as more than shuffling pointers. The intent here is that the previous element's next pointer is being updated to omit the current element from the list. atomic_set always deferences the pointer passed to it, and (field)->le_pre is a double pointer. So looking at the macro: #define atomic_set(ptr, i) ((*(__typeof__(*ptr) *volatile) (ptr)) = (i)) It translates to: ( ( * (__typeof(*elm->field.le_prev) *volatile) (elm)->field.le_prev) = elm->field.le_next; ) Which is: *((struct *elm) *volatile)(elm)->field.le_prev = elm->field.le_next; Which is: *(elm)->field.le_prev = elm->field.le_next; Because field.le_prev is a double pointer that has previously been set to &prev (the address of the previous list element) this is assiging the *previous* element's next pointer, the way I read it. The Linux list_del_rcu is dealing with a singly linked list and therefore does not set a value in the previous node's element. But I'm still unclear on whether or not the memory barrier is needed because the deleted element won't be reclaimed right away. Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
[Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs (v2)
Add RCU-enabled variants on the existing bsd DQ facility. Each Q operation has the same interface as the existing (non-RCU) version. Also, each operation is implemented as macro for now. Using the RCU-enabled DQ, existing DQ users will be able to convert to RCU without using a different list interface. This version (2) adds a macro to walk a Q in reverse: QLIST_FOREACH_REVERSE_RCU(el, head, field) Accordingly the reader threads in the test program walk the Q in reverse in addition to walking forward. To accompany the RCU-enabled DQ, there is also a test file that uses concurrent readers to contend with a single updater. This patchset builds on top of Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu Signed-off-by: Mike Day --- docs/rcu.txt | 2 +- include/qemu/queue.h | 11 -- include/qemu/rcu_queue.h | 145 tests/Makefile | 6 +- tests/rcuq_test.c| 290 +++ 5 files changed, 440 insertions(+), 14 deletions(-) create mode 100644 include/qemu/rcu_queue.h create mode 100644 tests/rcuq_test.c diff --git a/docs/rcu.txt b/docs/rcu.txt index b3c593c..de59896 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -106,7 +106,7 @@ The core RCU API is small: so that the reclaimer function can fetch the struct foo address and free it: -call_rcu1(foo_reclaim, &foo.rcu); +call_rcu1(&foo.rcu, foo_reclaim); void foo_reclaim(struct rcu_head *rp) { diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 847ddd1..f6f0636 100644 --- a/include/qemu/queue.h +++ b/include/qemu/queue.h @@ -139,17 +139,6 @@ struct { \ (elm)->field.le_prev = &(head)->lh_first; \ } while (/*CONSTCOND*/0) -#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\ -(elm)->field.le_prev = &(head)->lh_first; \ -(elm)->field.le_next = (head)->lh_first;\ -smp_wmb(); /* fill elm before linking it */ \ -if ((head)->lh_first != NULL) {\ -(head)->lh_first->field.le_prev = &(elm)->field.le_next;\ -} \ -(head)->lh_first = (elm); \ -smp_wmb(); \ -} while (/* CONSTCOND*/0) - #define QLIST_REMOVE(elm, field) do { \ if ((elm)->field.le_next != NULL) \ (elm)->field.le_next->field.le_prev = \ diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h new file mode 100644 index 000..198a87d --- /dev/null +++ b/include/qemu/rcu_queue.h @@ -0,0 +1,145 @@ +#ifndef QEMU_RCU_SYS_QUEUE_H +#define QEMU_RCU_SYS_QUEUE_H + +/* + * rc_queue.h + * + * Userspace RCU QSBR header. + * + * LGPL-compatible code should include this header with : + * + * #define _LGPL_SOURCE + * #include + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013 Mike D. Day, IBM Corporation. + * + * IBM's contributions to this file may be relicensed under LGPLv2 or later. + */ + +#include "qemu/rcu.h" /* rcu.h includes qemu/queue.h and qemu/atomic.h */ + + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * List functions. + */ + + +/* + * The difference between atomic_read/set and atomic_rcu_read/set + * is in the including of a read/write memory barrier to the volatile + * access. atomic_rcu_* macros include the memory barrier, the + * plain atomic macros do not. Therefore, it should be correct to + * issue a series of reads or writes to the same element using only + * the atomic_* macro, until the last read or write, which should be + * atomic_rcu_* to introduce a read or write memory barrier as + * appropriate. + */ + +/* Upon publication of the listelm->next value, list readers + * will see the new node when following next pointer
[Qemu-devel] [RFC PATCH] Introduce RCU-enabled DQs.
Add RCU-enabled variants on the existing bsd DQ facility. Each Q operation has the same interface as the existing (non-RCU) version. Also, each operation is implemented as macro for now. Using the RCU-enabled DQ, existing DQ users will be able to convert to RCU without using a different list interface. To accompany the RCU-enabled DQ, there is also a test file that uses concurrent readers to contend with a single updater. This patchset builds on top of Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu Signed-off-by: Mike Day --- docs/rcu.txt | 2 +- include/qemu/queue.h | 11 -- include/qemu/rcu_queue.h | 137 +++ tests/Makefile | 14 ++- tests/rcuq_test.c| 279 +++ 5 files changed, 430 insertions(+), 13 deletions(-) diff --git a/docs/rcu.txt b/docs/rcu.txt index b3c593c..de59896 100644 --- a/docs/rcu.txt +++ b/docs/rcu.txt @@ -106,7 +106,7 @@ The core RCU API is small: so that the reclaimer function can fetch the struct foo address and free it: -call_rcu1(foo_reclaim, &foo.rcu); +call_rcu1(&foo.rcu, foo_reclaim); void foo_reclaim(struct rcu_head *rp) { diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 847ddd1..f6f0636 100644 --- a/include/qemu/queue.h +++ b/include/qemu/queue.h @@ -139,17 +139,6 @@ struct { \ (elm)->field.le_prev = &(head)->lh_first; \ } while (/*CONSTCOND*/0) -#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {\ -(elm)->field.le_prev = &(head)->lh_first; \ -(elm)->field.le_next = (head)->lh_first;\ -smp_wmb(); /* fill elm before linking it */ \ -if ((head)->lh_first != NULL) {\ -(head)->lh_first->field.le_prev = &(elm)->field.le_next;\ -} \ -(head)->lh_first = (elm); \ -smp_wmb(); \ -} while (/* CONSTCOND*/0) - #define QLIST_REMOVE(elm, field) do { \ if ((elm)->field.le_next != NULL) \ (elm)->field.le_next->field.le_prev = \ diff --git a/include/qemu/rcu_queue.h b/include/qemu/rcu_queue.h new file mode 100644 index 000..e82196d --- /dev/null +++ b/include/qemu/rcu_queue.h @@ -0,0 +1,137 @@ +#ifndef QEMU_RCU_SYS_QUEUE_H +#define QEMU_RCU_SYS_QUEUE_H + +/* + * rc_queue.h + * + * Userspace RCU QSBR header. + * + * LGPL-compatible code should include this header with : + * + * #define _LGPL_SOURCE + * #include + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013 Mike D. Day, IBM Corporation. + * + * IBM's contributions to this file may be relicensed under LGPLv2 or later. + */ + +#include "qemu/rcu.h" /* rcu.h includes qemu/queue.h and qemu/atomic.h */ + + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * List functions. + */ + + +/* + * The difference between atomic_read/set and atomic_rcu_read/set + * is in the including of a read/write memory barrier to the volatile + * access. atomic_rcu_* macros include the memory barrier, the + * plain atomic macros do not. Therefore, it should be correct to + * issue a series of reads or writes to the same element using only + * the atomic_* macro, until the last read or write, which should be + * atomic_rcu_* to introduce a read or write memory barrier as + * appropriate. + */ + +/* Upon publication of the listelm->next value, list readers + * will see the new node when following next pointers from + * antecedent nodes, but may not see the new node when following + * prev pointers from subsequent nodes until after the rcu grace + * period expires. + * see linux/include/rculist.h __list_add_rcu(new, prev, next) + */ +#define QLIST_INSERT_AFTER_RCU(listelm,
Re: [Qemu-devel] [PATCH] Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system. These patches change from using Linux kernel style upcasts to typesafe object o
Peter Maydell writes: > On 19 August 2013 19:33, Mike Day wrote: >> These patches apply to Paolo Bonzini's rcu tree: >> >> https://github.com/bonzini/qemu/tree/rcu >> commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135 >> >> Signed-off-by: Mike Day >> --- >> hw/misc/ivshmem.c | 2 +- >> hw/pci-bridge/pci_bridge_dev.c | 6 +++--- >> hw/pci/pci_bridge.c| 2 +- >> 3 files changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c >> index ebcb52a..46d8c27 100644 >> --- a/hw/misc/ivshmem.c >> +++ b/hw/misc/ivshmem.c >> @@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) >> >> static void pci_ivshmem_instance_finalize(Object *obj) >> { >> -IVShmemState *s = IVSHMEM(dev); >> +IVShmemState *s = IVSHMEM(obj); > > This should have been a flat-out compiler error, right? Yes, correct, but Paolo hasn't previously submitted this specific change code afaik. >> if (s->migration_blocker) { >> migrate_del_blocker(s->migration_blocker); >> diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c >> index c995d5d..22caf14 100644 >> --- a/hw/pci-bridge/pci_bridge_dev.c >> +++ b/hw/pci-bridge/pci_bridge_dev.c >> @@ -87,7 +87,6 @@ shpc_error: >> bridge_error: >> return err; >> } >> - > > Stray blank line change. Thanks - and git just found it for me too. Apologies. Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"
[Qemu-devel] [PATCH RESEND] RCU implementation for Qemu. Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system.
These patches change from using Linux kernel style upcasts to typesafe object oriented casts with runtime checking semantics. These patches apply to Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135 Signed-off-by: Mike Day --- hw/misc/ivshmem.c | 2 +- hw/pci-bridge/pci_bridge_dev.c | 6 +++--- hw/pci/pci_bridge.c| 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index ebcb52a..46d8c27 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) static void pci_ivshmem_instance_finalize(Object *obj) { -IVShmemState *s = IVSHMEM(dev); +IVShmemState *s = IVSHMEM(obj); if (s->migration_blocker) { migrate_del_blocker(s->migration_blocker); diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c index c995d5d..22caf14 100644 --- a/hw/pci-bridge/pci_bridge_dev.c +++ b/hw/pci-bridge/pci_bridge_dev.c @@ -87,7 +87,6 @@ shpc_error: bridge_error: return err; } - static void pci_bridge_dev_exitfn(PCIDevice *dev) { PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev); @@ -102,8 +101,9 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev) static void pci_bridge_dev_instance_finalize(Object *obj) { PCIDevice *dev = PCI_DEVICE(obj); -PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev); -PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br); +PCIBridge *br = PCI_BRIDGE(dev); +PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(br); + shpc_free(dev); memory_region_destroy(&bridge_dev->bar); pci_bridge_free(dev); diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index 63f9912..307e076 100644 --- a/hw/pci/pci_bridge.c +++ b/hw/pci/pci_bridge.c @@ -391,7 +391,7 @@ void pci_bridge_exitfn(PCIDevice *pci_dev) void pci_bridge_free(PCIDevice *pci_dev) { -PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev); +PCIBridge *s = PCI_BRIDGE(pci_dev); pci_bridge_region_cleanup(s, s->windows); memory_region_destroy(&s->address_space_mem); memory_region_destroy(&s->address_space_io); -- 1.8.3.1
[Qemu-devel] [PATCH] Fixup some dynamic casts in the Qemu device tree to correspond to the QOM type-checking system. These patches change from using Linux kernel style upcasts to typesafe object orien
These patches apply to Paolo Bonzini's rcu tree: https://github.com/bonzini/qemu/tree/rcu commit 781e47bf1693a80b84eec298a6a1c7b29ab2c135 Signed-off-by: Mike Day --- hw/misc/ivshmem.c | 2 +- hw/pci-bridge/pci_bridge_dev.c | 6 +++--- hw/pci/pci_bridge.c| 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index ebcb52a..46d8c27 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -789,7 +789,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) static void pci_ivshmem_instance_finalize(Object *obj) { -IVShmemState *s = IVSHMEM(dev); +IVShmemState *s = IVSHMEM(obj); if (s->migration_blocker) { migrate_del_blocker(s->migration_blocker); diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c index c995d5d..22caf14 100644 --- a/hw/pci-bridge/pci_bridge_dev.c +++ b/hw/pci-bridge/pci_bridge_dev.c @@ -87,7 +87,6 @@ shpc_error: bridge_error: return err; } - static void pci_bridge_dev_exitfn(PCIDevice *dev) { PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(dev); @@ -102,8 +101,9 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev) static void pci_bridge_dev_instance_finalize(Object *obj) { PCIDevice *dev = PCI_DEVICE(obj); -PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev); -PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br); +PCIBridge *br = PCI_BRIDGE(dev); +PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(br); + shpc_free(dev); memory_region_destroy(&bridge_dev->bar); pci_bridge_free(dev); diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index 63f9912..307e076 100644 --- a/hw/pci/pci_bridge.c +++ b/hw/pci/pci_bridge.c @@ -391,7 +391,7 @@ void pci_bridge_exitfn(PCIDevice *pci_dev) void pci_bridge_free(PCIDevice *pci_dev) { -PCIBridge *s = DO_UPCAST(PCIBridge, dev, pci_dev); +PCIBridge *s = PCI_BRIDGE(pci_dev); pci_bridge_region_cleanup(s, s->windows); memory_region_destroy(&s->address_space_mem); memory_region_destroy(&s->address_space_io); -- 1.8.3.1
Re: [Qemu-devel] [RFC PATCH 0/3] v2.2 RCU Implementation for QEMU
Andreas Färber writes: >> https://github.com/ncultra/qemu/tree/rcu-for-1.7 >> >> Mike Day (2): >> fixup changes from commit f63ca950 >> fixup changes from commit f62a6b2f from Paulo Bonzini > > These two patches are certainly not acceptable for upstream, lacking any > textual explanation and with those subjects. Thanks. The entire series is not ready yet - it needs to be refactored and tested, hence the RFC tag. I'm trying to keep an rebased version of the Paolo's May RCU tree available for those who are interested. When its ready to be considered for an upstream merge it will be as expected. The the next step for me is to start working on leveraging the RCU code. Mike -- Mike Day | + 1 919 371-8786 | ncm...@ncultra.org "Endurance is a Virtue"