Re: [Qemu-devel] [PATCH] hw/core/null-machine: Add the possibility to instantiate a CPU, RAM and kernel

2017-01-16 Thread Thomas Huth
On 14.01.2017 12:03, Laurent Vivier wrote:
> Le 14/01/2017 à 07:51, Thomas Huth a écrit :
>> Sometimes it is useful to have just a machine with CPU and RAM, without
>> any further hardware in it, e.g. if you just want to do some instruction
>> debugging for TCG with a remote GDB attached to QEMU, or run some embedded
>> code with the "-semihosting" QEMU parameter. qemu-system-m68k already
>> features a "dummy" machine, and xtensa a "sim" machine for exactly this
>> purpose.
>> All target architectures nowadays also have a "none" machine, which would
>> be a perfect match for this, too - but it currently does not allow adding a
>> CPU, RAM or a kernel yet. Thus let's add these possibilities in a generic
>> way to the "none" machine, too, so that we hopefully do not need additional
>> "dummy" machines in the future anymore (and maybe can also get rid of the
>> already existing "dummy"/"sim" machines one day).
>> Note that the default behaviour of the "none" machine is not changed, i.e.
>> no CPU and no RAM is instantiated by default. You've explicitly got to
>> specify the CPU model with "-cpu" and the amount of RAM with "-m" to get
>> these new features.
> 
> Did you try to use the generic-loader to load the kernel?
> 
> Something like "-device loader,file=vmlinux" instead of adding this part
> in the none machine?

It does not work by default - because we need a way to set the CPU's
program counter to the entry point of the ELF file.
But I think users also expect the "-kernel" parameter to work,
so I think we should add the loader code in null-machine.c anyway.
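For concreteness, the two approaches under discussion would look roughly like this on the command line (the CPU model and kernel path are placeholders, and these invocations are illustrative sketches, not tested commands):

```
# With the patch: "none" machine plus CPU, RAM and kernel; -kernel also
# sets the program counter to the ELF entry point.
qemu-system-m68k -M none -cpu m68040 -m 32 -kernel vmlinux -semihosting

# Generic-loader alternative: loads the ELF into RAM, but by default
# leaves the CPU's program counter untouched.
qemu-system-m68k -M none -cpu m68040 -m 32 -device loader,file=vmlinux
```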

> Perhaps we could also add a CPU this way, as CPUs are already available
> in the device list for machines that allow hotplug.

The only machine that allows hot-plugging of CPUs with the device
interface is ppc64 spapr, as far as I know. So this does not help with
embedded boards like m68k or xtensa. Also I think you need some
additional magic like a bus where the CPUs could be attached, and maybe
firmware interfaces, so this does not fit the idea of an embedded board
very well.

> With the same idea, we could also have a "-device ram,size=XXX" to add
> ram (not DIMM).

That might be useful indeed, but mainly because some embedded boards
expect some additional RAM at higher, non-zero addresses, too.

> I think this is the idea behind the none machine:
[...]
> This
> also provides a mode that we can use in the future to construct machines
> entirely through QMP commands.

We're still very far away from the possibility that everything could be
constructed on the fly. (Embedded) CPUs, RAM, buses ... all of that is not
really pluggable yet. So I think my patch is a good first step in this
direction to get at least an initial playground for a machine that can
be defined by the user.

 Thomas




Re: [Qemu-devel] [PATCH RFC v3 11/14] intel_iommu: provide its own replay() callback

2017-01-16 Thread Peter Xu
On Mon, Jan 16, 2017 at 03:47:08PM +0800, Jason Wang wrote:
> 
> 
> >On 2017-01-16 15:31, Peter Xu wrote:
> >On Fri, Jan 13, 2017 at 05:26:06PM +0800, Jason Wang wrote:
> >>
> >>On 2017-01-13 11:06, Peter Xu wrote:
> >>>The default replay() doesn't work for VT-d since VT-d will have a huge
> >>>default memory region which covers the address range 0-(2^64-1). This will
> >>>normally cause a dead loop when the guest starts.
> >>I think it just takes too much time instead of a dead loop?
> >Hmm, I can touch up the commit message above to make it more precise.
> >
> >>>The solution is simple - we don't walk over all the regions. Instead, we
> >>>jump over the regions when we find that the page directories are empty.
> >>>It'll greatly reduce the time to walk the whole region.
> >>Yes, the problem is that memory_region_iommu_replay() is not smart because:
> >>
> >>- It doesn't understand large pages
> >>- It tries to go over all possible IOVAs
> >>
> >>So I'm thinking of introducing something like iommu_ops->iova_iterate() which
> >>
> >>1) accepts a start IOVA and returns the next existing map
> >>2) understands large pages
> >>3) skips unmapped IOVAs
> >Though I haven't tested with huge pages yet, this patch should
> >solve both of the above issues? I don't know whether you went over the page
> >walk logic - it should support huge pages, and it will skip
> >unmapped IOVA ranges (at least that's my goal with this patch). In
> >that case, it looks like this patch is solving the same problem? :)
> >(though without introducing an iova_iterate() interface)
> >
> >Please correct me if I misunderstood it.
> 
> Kind of :) I'm fine with this patch, but just want:
> 
> - to reuse most of the code in the patch
> - the current memory_region_iommu_replay() logic
> 
> So what I'm suggesting is just a slight change of API which lets the caller
> decide what to do with each range of IOVA. So it could be reused for
> other things besides replaying.
> 
> But if you'd like to keep this patch as is, I don't object to it.

I see. Then I can understand your meaning here. I had the same thought
before; that's why I exposed vtd_page_walk with a hook. If you
check the page_walk function comment:

/**
 * vtd_page_walk - walk specific IOVA range, and call the hook
 *
 * @ce: context entry to walk upon
 * @start: IOVA address to start the walk
 * @end: IOVA range end address (start <= addr < end)
 * @hook_fn: the hook to be called for each detected area
 * @private: private data for the hook function
 */

So I didn't implement the notification in page_walk at all - but in
the hook_fn. If any caller is interested in doing something else
rather than the notification, we can simply export the page walk
interface and provide their own "hook_fn", and then it'll be triggered
for each valid page (no matter whether a huge or small one).

If we can have a more general interface in the future - no matter
whether we call it iova_iterate() or something else (I'd prefer the
hook way to do it, so maybe a common page walker with a hook
function), we can do it simply (at least for the Intel platform) based on
this vtd_page_walk thing.

Thanks,

-- peterx



Re: [Qemu-devel] [PATCH RFC v3 13/14] intel_iommu: allow dynamic switch of IOMMU region

2017-01-16 Thread Jason Wang



On 2017-01-16 15:50, Peter Xu wrote:

On Mon, Jan 16, 2017 at 02:20:31PM +0800, Jason Wang wrote:

[...]


diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fd75112..2596f11 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1343,9 +1343,49 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
  vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
  }
+static void vtd_switch_address_space(VTDAddressSpace *as, bool iommu_enabled)

Looks like you can check s->dmar_enabled here?

Yes, we need to check old state in case we don't need a switch at all.
Actually I checked it...



I mean, is there a chance that iommu_enabled (a better name would be 
dmar_enabled) is not equal to s->dmar_enabled? Looks not.


vtd_handle_gcmd_te() did:

...
if (en) {
s->dmar_enabled = true;
/* Ok - report back to driver */
vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_TES);
} else {
s->dmar_enabled = false;
...

You can call vtd_switch_address_space_all(s, en) after this, which will call 
this function. And the other caller, as you've pointed out, already calls this 
with s->dmar_enabled. So en here is always s->dmar_enabled?


Thanks




Re: [Qemu-devel] [PATCH RFC v3 12/14] intel_iommu: do replay when context invalidate

2017-01-16 Thread Peter Xu
On Mon, Jan 16, 2017 at 03:52:10PM +0800, Jason Wang wrote:
> 
> 
> >On 2017-01-16 15:43, Peter Xu wrote:
> >On Mon, Jan 16, 2017 at 01:53:54PM +0800, Jason Wang wrote:
> >>
> >>On 2017-01-13 11:06, Peter Xu wrote:
> >>>Before this one we only invalidate context cache when we receive context
> >>>entry invalidations. However it's possible that the invalidation also
> >>>contains a domain switch (only if cache-mode is enabled for vIOMMU).
> >>So let's check for CM before replaying?
> >When CM is not set, there should be no device that needs
> >IOMMU_NOTIFIER_MAP notifies. So IMHO it won't hurt if we replay here
> >(so the notifier_list will only contain UNMAP notifiers at most, and
> >sending UNMAP to those devices should not affect them at all).
> >
> >If we check CM before replay, it'll be faster when the guest changes the
> >iommu domain for a specific device. But after all, this kind of operation is
> >extremely rare, while if we check the CM bit, we have an "assumption" in
> >the code that MAP depends on CM. In that case, to make the code
> >cleaner, I'd slightly prefer not to check it here. What do you think?
> 
> Ok, I think maybe it's better to add a comment here.

Will do it. Thanks!

-- peterx



Re: [Qemu-devel] [PATCH RFC v3 11/14] intel_iommu: provide its own replay() callback

2017-01-16 Thread Jason Wang



On 2017-01-16 15:59, Peter Xu wrote:

On Mon, Jan 16, 2017 at 03:47:08PM +0800, Jason Wang wrote:


On 2017-01-16 15:31, Peter Xu wrote:

On Fri, Jan 13, 2017 at 05:26:06PM +0800, Jason Wang wrote:

On 2017-01-13 11:06, Peter Xu wrote:

The default replay() doesn't work for VT-d since VT-d will have a huge
default memory region which covers the address range 0-(2^64-1). This will
normally cause a dead loop when the guest starts.

I think it just takes too much time instead of a dead loop?

Hmm, I can touch up the commit message above to make it more precise.


The solution is simple - we don't walk over all the regions. Instead, we
jump over the regions when we find that the page directories are empty.
It'll greatly reduce the time to walk the whole region.

Yes, the problem is that memory_region_iommu_replay() is not smart because:

- It doesn't understand large pages
- It tries to go over all possible IOVAs

So I'm thinking of introducing something like iommu_ops->iova_iterate() which

1) accepts a start IOVA and returns the next existing map
2) understands large pages
3) skips unmapped IOVAs

Though I haven't tested with huge pages yet, this patch should
solve both of the above issues? I don't know whether you went over the page
walk logic - it should support huge pages, and it will skip
unmapped IOVA ranges (at least that's my goal with this patch). In
that case, it looks like this patch is solving the same problem? :)
(though without introducing an iova_iterate() interface)

Please correct me if I misunderstood it.

Kind of :) I'm fine with this patch, but just want:

- to reuse most of the code in the patch
- the current memory_region_iommu_replay() logic

So what I'm suggesting is just a slight change of API which lets the caller
decide what to do with each range of IOVA. So it could be reused for
other things besides replaying.

But if you'd like to keep this patch as is, I don't object to it.

I see. Then I can understand your meaning here. I had the same thought
before; that's why I exposed vtd_page_walk with a hook. If you
check the page_walk function comment:

/**
  * vtd_page_walk - walk specific IOVA range, and call the hook
  *
  * @ce: context entry to walk upon
  * @start: IOVA address to start the walk
  * @end: IOVA range end address (start <= addr < end)
  * @hook_fn: the hook to be called for each detected area
  * @private: private data for the hook function
  */

So I didn't implement the notification in page_walk at all - but in
the hook_fn. If any caller is interested in doing something else
rather than the notification, we can simply export the page walk
interface and provide their own "hook_fn", and then it'll be triggered
for each valid page (no matter whether a huge or small one).

If we can have a more general interface in the future - no matter
whether we call it iova_iterate() or something else (I'd prefer the
hook way to do it, so maybe a common page walker with a hook
function), we can do it simply (at least for the Intel platform) based on
this vtd_page_walk thing.

Thanks,

-- peterx


Yes, but the problem is that hook_fn is only visible inside the Intel IOMMU code.

Thanks



Re: [Qemu-devel] [PATCH RFC v3 11/14] intel_iommu: provide its own replay() callback

2017-01-16 Thread Peter Xu
On Mon, Jan 16, 2017 at 04:03:22PM +0800, Jason Wang wrote:

[...]

> >>>Though I haven't tested with huge pages yet, this patch should
> >>>solve both of the above issues? I don't know whether you went over the page
> >>>walk logic - it should support huge pages, and it will skip
> >>>unmapped IOVA ranges (at least that's my goal with this patch). In
> >>>that case, it looks like this patch is solving the same problem? :)
> >>>(though without introducing an iova_iterate() interface)
> >>>
> >>>Please correct me if I misunderstood it.
> >>Kind of :) I'm fine with this patch, but just want:
> >>
> >>- to reuse most of the code in the patch
> >>- the current memory_region_iommu_replay() logic
> >>
> >>So what I'm suggesting is just a slight change of API which lets the caller
> >>decide what to do with each range of IOVA. So it could be reused for
> >>other things besides replaying.
> >>
> >>But if you'd like to keep this patch as is, I don't object to it.
> >I see. Then I can understand your meaning here. I had the same thought
> >before; that's why I exposed vtd_page_walk with a hook. If you
> >check the page_walk function comment:
> >
> >/**
> >  * vtd_page_walk - walk specific IOVA range, and call the hook
> >  *
> >  * @ce: context entry to walk upon
> >  * @start: IOVA address to start the walk
> >  * @end: IOVA range end address (start <= addr < end)
> >  * @hook_fn: the hook to be called for each detected area
> >  * @private: private data for the hook function
> >  */
> >
> >So I didn't implement the notification in page_walk at all - but in
> >the hook_fn. If any caller is interested in doing something else
> >rather than the notification, we can simply export the page walk
> >interface and provide their own "hook_fn", and then it'll be triggered
> >for each valid page (no matter whether a huge or small one).
> >
> >If we can have a more general interface in the future - no matter
> >whether we call it iova_iterate() or something else (I'd prefer the
> >hook way to do it, so maybe a common page walker with a hook
> >function), we can do it simply (at least for the Intel platform) based on
> >this vtd_page_walk thing.
> >
> >Thanks,
> >
> >-- peterx
> 
> Yes, but the problem is that hook_fn is only visible inside the Intel IOMMU code.

Right.

Btw, do we have an existing issue that could leverage this interface
besides replay?

-- peterx



Re: [Qemu-devel] [PULL 000/180] QAPI patches for 2017-01-13

2017-01-16 Thread Markus Armbruster
Eric Blake  writes:

> On 01/13/2017 10:44 AM, Markus Armbruster wrote:
>> This is Marc-André's "[PATCH v8 00/21] qapi doc generation (whole
>> version, squashed)" with a few commit messages tweaked, and "[PATCH v8
>> 14/21] (SQUASHED) move doc to schema" unsquashed into 161 patches.
>> 
>> We did all the respins in this squashed form to reduce noise.
>> However, since the unsquashed form is better suited for review, and
>> probably nicer if we have to revisit this part of the work down the
>> road, I'm proposing to merge this unsquashed.
>> 
>> If you want me to post the unsquashed patches, I'm happy to redo this
>> pull request.
>> 
>> If you'd rather pull the squashed version, likewise.
>> 
>> I'm afraid this is a bit of a doc conflict magnet.  The sooner we can
>> get it in, the easier for Marc-André and me.
>
> Indeed - there's already a merge conflict with commit e1ff3c6, which
> landed between the time you created the pull request and now.
>
> Also, in trying to merge your branch locally, I get a rejection message
> from my git hooks:
>
> $ git commit
> tests/qapi-schema/comments.out:7: new blank line at EOF.
> tests/qapi-schema/event-case.out:8: new blank line at EOF.
> tests/qapi-schema/ident-with-escape.out:10: new blank line at EOF.
> tests/qapi-schema/include-relpath.out:7: new blank line at EOF.
> tests/qapi-schema/include-repetition.out:7: new blank line at EOF.
> tests/qapi-schema/include-simple.out:7: new blank line at EOF.
> tests/qapi-schema/indented-expr.out:13: new blank line at EOF.
> tests/qapi-schema/qapi-schema-test.out:446: new blank line at EOF.
>
> Worth respinning to fix those issues?

Since I have to respin anyway, why not.

>>   qmp-commands: move 'query-memdev' doc to schema
>
> This one is the conflict with current master.

Thanks for the heads-up, v2 coming.



Re: [Qemu-devel] [PATCH RFC v3 13/14] intel_iommu: allow dynamic switch of IOMMU region

2017-01-16 Thread Peter Xu
On Mon, Jan 16, 2017 at 04:01:00PM +0800, Jason Wang wrote:
> 
> 
> >On 2017-01-16 15:50, Peter Xu wrote:
> >On Mon, Jan 16, 2017 at 02:20:31PM +0800, Jason Wang wrote:
> >
> >[...]
> >
> >>>diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >>>index fd75112..2596f11 100644
> >>>--- a/hw/i386/intel_iommu.c
> >>>+++ b/hw/i386/intel_iommu.c
> >>>@@ -1343,9 +1343,49 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState 
> >>>*s)
> >>>  vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
> >>>  }
> >>>+static void vtd_switch_address_space(VTDAddressSpace *as, bool 
> >>>iommu_enabled)
> >>Looks like you can check s->dmar_enabled here?
> >Yes, we need to check old state in case we don't need a switch at all.
> >Actually I checked it...
> >
> 
> I mean, is there a chance that iommu_enabled (a better name would be
> dmar_enabled) is not equal to s->dmar_enabled? Looks not.
> 
> vtd_handle_gcmd_te() did:
> 
> ...
> if (en) {
> s->dmar_enabled = true;
> /* Ok - report back to driver */
> vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_TES);
> } else {
> s->dmar_enabled = false;
> ...
> 
> You can call vtd_switch_address_space_all(s, en) after this, which will call this
> function. And the other caller, as you've pointed out, already calls this
> with s->dmar_enabled. So en here is always s->dmar_enabled?

Hmm, yes...

(I would still prefer keeping this parameter for readability.
 Though, I prefer your suggestion to rename it to dmar_enabled.)

-- peterx



Re: [Qemu-devel] [PATCH RFC v3 12/14] intel_iommu: do replay when context invalidate

2017-01-16 Thread Peter Xu
On Mon, Jan 16, 2017 at 03:52:10PM +0800, Jason Wang wrote:
> 
> 
> >On 2017-01-16 15:43, Peter Xu wrote:
> >On Mon, Jan 16, 2017 at 01:53:54PM +0800, Jason Wang wrote:
> >>
> >>On 2017-01-13 11:06, Peter Xu wrote:
> >>>Before this one we only invalidate context cache when we receive context
> >>>entry invalidations. However it's possible that the invalidation also
> >>>contains a domain switch (only if cache-mode is enabled for vIOMMU).
> >>So let's check for CM before replaying?
> >When CM is not set, there should be no device that needs
> >IOMMU_NOTIFIER_MAP notifies. So IMHO it won't hurt if we replay here
> >(so the notifier_list will only contain UNMAP notifiers at most, and
> >sending UNMAP to those devices should not affect them at all).
> >
> >If we check CM before replay, it'll be faster when the guest changes the
> >iommu domain for a specific device. But after all, this kind of operation is
> >extremely rare, while if we check the CM bit, we have an "assumption" in
> >the code that MAP depends on CM. In that case, to make the code
> >cleaner, I'd slightly prefer not to check it here. What do you think?
> 
> Ok, I think maybe it's better to add a comment here.

How about this?

+/*
+ * So a device is moving out of (or moving into) a
+ * domain; a replay() suits here to notify all the
+ * IOMMU_NOTIFIER_MAP registrants about this change.
+ * This does no harm even if we have no such
+ * notifier registered - the IOMMU notification
+ * framework will skip MAP notifications if that
+ * happens.
+ */
 memory_region_iommu_replay_all(&vtd_as->iommu);

Thanks,

-- peterx



Re: [Qemu-devel] [PATCH v2 11/11] aspeed/smc: handle dummy bytes when doing fast reads in command mode

2017-01-16 Thread Cédric Le Goater
> I did not notice that this function is also called for writes, isn't it?
> If yes, dummy cycles are used only during reads, so probably CTRL_FREADMODE
> needs to be tested.

yes. I can take care of that in a follow-up patchset for 
dummy support.

Dummies in user mode are a bit painful to implement, as I had 
to snoop into the command flow to catch the fast read op.
Not sure this is the right approach, so I kept it for later.


Did you have time to take a look at the other patches adding 
Command mode and extending the tests? I should have addressed
your comments there.

Thanks,

C. 



Re: [Qemu-devel] [PATCH RFC v3 11/14] intel_iommu: provide its own replay() callback

2017-01-16 Thread Jason Wang



On 2017-01-16 16:06, Peter Xu wrote:

On Mon, Jan 16, 2017 at 04:03:22PM +0800, Jason Wang wrote:

[...]


Though I haven't tested with huge pages yet, this patch should
solve both of the above issues? I don't know whether you went over the page
walk logic - it should support huge pages, and it will skip
unmapped IOVA ranges (at least that's my goal with this patch). In
that case, it looks like this patch is solving the same problem? :)
(though without introducing an iova_iterate() interface)

Please correct me if I misunderstood it.

Kind of :) I'm fine with this patch, but just want:

- to reuse most of the code in the patch
- the current memory_region_iommu_replay() logic

So what I'm suggesting is just a slight change of API which lets the caller
decide what to do with each range of IOVA. So it could be reused for
other things besides replaying.

But if you'd like to keep this patch as is, I don't object to it.

I see. Then I can understand your meaning here. I had the same thought
before; that's why I exposed vtd_page_walk with a hook. If you
check the page_walk function comment:

/**
  * vtd_page_walk - walk specific IOVA range, and call the hook
  *
  * @ce: context entry to walk upon
  * @start: IOVA address to start the walk
  * @end: IOVA range end address (start <= addr < end)
  * @hook_fn: the hook to be called for each detected area
  * @private: private data for the hook function
  */

So I didn't implement the notification in page_walk at all - but in
the hook_fn. If any caller is interested in doing something else
rather than the notification, we can simply export the page walk
interface and provide their own "hook_fn", and then it'll be triggered
for each valid page (no matter whether a huge or small one).

If we can have a more general interface in the future - no matter
whether we call it iova_iterate() or something else (I'd prefer the
hook way to do it, so maybe a common page walker with a hook
function), we can do it simply (at least for the Intel platform) based on
this vtd_page_walk thing.

Thanks,

-- peterx

Yes, but the problem is that hook_fn is only visible inside the Intel IOMMU code.

Right.

Btw, do we have an existing issue that could leverage this interface
besides replay?

-- peterx


Seems not, so I'm fine with the current code; I just wanted to show the 
possibility for it to be reused in the future.


Thanks



Re: [Qemu-devel] [PATCH RFC v3 13/14] intel_iommu: allow dynamic switch of IOMMU region

2017-01-16 Thread Jason Wang



On 2017-01-16 16:12, Peter Xu wrote:

On Mon, Jan 16, 2017 at 04:01:00PM +0800, Jason Wang wrote:


On 2017-01-16 15:50, Peter Xu wrote:

On Mon, Jan 16, 2017 at 02:20:31PM +0800, Jason Wang wrote:

[...]


diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fd75112..2596f11 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1343,9 +1343,49 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
  vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
  }
+static void vtd_switch_address_space(VTDAddressSpace *as, bool iommu_enabled)

Looks like you can check s->dmar_enabled here?

Yes, we need to check old state in case we don't need a switch at all.
Actually I checked it...


I mean, is there a chance that iommu_enabled (a better name would be
dmar_enabled) is not equal to s->dmar_enabled? Looks not.

vtd_handle_gcmd_te() did:

 ...
 if (en) {
 s->dmar_enabled = true;
 /* Ok - report back to driver */
 vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_TES);
 } else {
 s->dmar_enabled = false;
 ...

You can call vtd_switch_address_space_all(s, en) after this, which will call this
function. And the other caller, as you've pointed out, already calls this
with s->dmar_enabled. So en here is always s->dmar_enabled?

Hmm, yes...

(I would still prefer keeping this parameter for readability.
  Though, I prefer your suggestion to rename it to dmar_enabled.)

-- peterx


I think this does not give more readability :) Maybe I was wrong; let's 
leave this to the maintainer.


Thanks :)



Re: [Qemu-devel] [PATCH v5 3/4] migration: disallow migrate_add_blocker during migration

2017-01-16 Thread Ashijeet Acharya
On Mon, Jan 16, 2017 at 1:17 AM, Greg Kurz  wrote:
> On Thu, 12 Jan 2017 21:22:33 +0530
> Ashijeet Acharya  wrote:
>
>> If a migration is already in progress and somebody attempts
>> to add a migration blocker, this should rightly fail.
>>
>> Add an errp parameter and a retcode return value to migrate_add_blocker.
>>
>> Signed-off-by: John Snow 
>> Signed-off-by: Ashijeet Acharya 
>> ---
>>  block/qcow.c  |  7 ++-
>>  block/vdi.c   |  7 ++-
>>  block/vhdx.c  | 16 ++--
>>  block/vmdk.c  |  8 +++-
>>  block/vpc.c   | 10 +++---
>>  block/vvfat.c | 20 
>>  hw/9pfs/9p.c  | 31 ---
>>  hw/display/virtio-gpu.c   | 31 ++-
>>  hw/intc/arm_gic_kvm.c | 16 ++--
>>  hw/intc/arm_gicv3_its_kvm.c   | 19 ---
>>  hw/intc/arm_gicv3_kvm.c   | 18 +++---
>>  hw/misc/ivshmem.c | 13 +
>>  hw/scsi/vhost-scsi.c  | 24 ++--
>>  hw/virtio/vhost.c |  7 ++-
>>  include/migration/migration.h |  7 ++-
>>  migration/migration.c | 38 --
>>  stubs/migr-blocker.c  |  3 ++-
>>  target-i386/kvm.c | 15 ---
>>  18 files changed, 208 insertions(+), 82 deletions(-)
>>
>> diff --git a/block/qcow.c b/block/qcow.c
>> index 7540f43..90ebe78 100644
>> --- a/block/qcow.c
>> +++ b/block/qcow.c
>> @@ -104,6 +104,7 @@ static int qcow_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  unsigned int len, i, shift;
>>  int ret;
>>  QCowHeader header;
>> +Error *local_err = NULL;
>>
>>  ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
>>  if (ret < 0) {
>> @@ -252,7 +253,11 @@ static int qcow_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  error_setg(&s->migration_blocker, "The qcow format used by node '%s' "
>> "does not support live migration",
>> bdrv_get_device_or_node_name(bs));
>> -migrate_add_blocker(s->migration_blocker);
>> +ret = migrate_add_blocker(s->migration_blocker, &local_err);
>> +if (local_err) {
>> +error_propagate(errp, local_err);
>> +goto fail;
>> +}
>>
>>  qemu_co_mutex_init(&s->lock);
>>  return 0;
>> diff --git a/block/vdi.c b/block/vdi.c
>> index 96b78d5..2cb8ef0 100644
>> --- a/block/vdi.c
>> +++ b/block/vdi.c
>> @@ -361,6 +361,7 @@ static int vdi_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  VdiHeader header;
>>  size_t bmap_size;
>>  int ret;
>> +Error *local_err = NULL;
>>
>>  logout("\n");
>>
>> @@ -471,7 +472,11 @@ static int vdi_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  error_setg(&s->migration_blocker, "The vdi format used by node '%s' "
>> "does not support live migration",
>> bdrv_get_device_or_node_name(bs));
>> -migrate_add_blocker(s->migration_blocker);
>> +ret = migrate_add_blocker(s->migration_blocker, &local_err);
>> +if (local_err) {
>> +error_propagate(errp, local_err);
>> +goto fail_free_bmap;
>> +}
>>
>>  qemu_co_mutex_init(&s->write_lock);
>>
>> diff --git a/block/vhdx.c b/block/vhdx.c
>> index 0ba2f0a..77ced7c 100644
>> --- a/block/vhdx.c
>> +++ b/block/vhdx.c
>> @@ -991,6 +991,16 @@ static int vhdx_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  }
>>  }
>>
>> +/* Disable migration when VHDX images are used */
>> +error_setg(&s->migration_blocker, "The vhdx format used by node '%s' "
>> +   "does not support live migration",
>> +   bdrv_get_device_or_node_name(bs));
>> +ret = migrate_add_blocker(s->migration_blocker, &local_err);
>> +if (local_err) {
>> +error_propagate(errp, local_err);
>> +goto fail;
>> +}
>> +
>>  if (flags & BDRV_O_RDWR) {
>>  ret = vhdx_update_headers(bs, s, false, NULL);
>>  if (ret < 0) {
>> @@ -1000,12 +1010,6 @@ static int vhdx_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>
>>  /* TODO: differencing files */
>>
>> -/* Disable migration when VHDX images are used */
>> -error_setg(&s->migration_blocker, "The vhdx format used by node '%s' "
>> -   "does not support live migration",
>> -   bdrv_get_device_or_node_name(bs));
>> -migrate_add_blocker(s->migration_blocker);
>> -
>>  return 0;
>>  fail:
>>  vhdx_close(bs);
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index a11c27a..f1d75ae 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -941,6 +941,7 @@ static int vmdk_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  int ret;
>>  BDRVVmdkState *s = bs->opaque;
>>  uint32_t magic;
>> +Error *local_err = NULL;
>>
>>  buf = vmd

Re: [Qemu-devel] [PULL 10/65] tcg/s390: Expose host facilities to tcg-target.h

2017-01-16 Thread Christian Borntraeger
On 01/13/2017 10:18 AM, Christian Borntraeger wrote:
> On 01/11/2017 03:17 AM, Richard Henderson wrote:
>> This lets us expose facilities to TCG_TARGET_HAS_* defines
>> directly, rather than hiding behind function calls.
>>
>> Signed-off-by: Richard Henderson 
>> ---
>>  tcg/s390/tcg-target.h | 126 
>> --
>>  tcg/s390/tcg-target.inc.c |  74 +++
>>  2 files changed, 96 insertions(+), 104 deletions(-)
> 
> This broke compilation. Seems that you forgot one replacement:
> 
> In file included from /home/cborntra/REPOS/qemu/tcg/tcg.c:253:0:
> /home/cborntra/REPOS/qemu/tcg/s390/tcg-target.inc.c: In function ‘tgen_cmp’:
> /home/cborntra/REPOS/qemu/tcg/s390/tcg-target.inc.c:1096:19: error: 
> ‘facilities’ undeclared (first use in this function)
>  if (!(facilities & FACILITY_EXT_IMM)) {
>^~
> /home/cborntra/REPOS/qemu/tcg/s390/tcg-target.inc.c:1096:19: note: each 
> undeclared identifier is reported only once for each function it appears in
> /home/cborntra/REPOS/qemu/rules.mak:64: recipe for target 'tcg/tcg.o' failed
> make[1]: *** [tcg/tcg.o] Error 1
> Makefile:203: recipe for target 'subdir-s390x-softmmu' failed
> make: *** [subdir-s390x-softmmu] Error 2
> make: Leaving directory '/home/cborntra/REPOS/qemu/build'
> 
> 
> 

Can we get a fix for this? Seems that a simple replacement of
facilities -> s390_facilities would do the trick.
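Judging from the compile error quoted above (tcg-target.inc.c:1096 in tgen_cmp), the suggested replacement would presumably look like this one-liner (an untested sketch, not a submitted patch):

```diff
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ tgen_cmp @@
-    if (!(facilities & FACILITY_EXT_IMM)) {
+    if (!(s390_facilities & FACILITY_EXT_IMM)) {
```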

Christian




Re: [Qemu-devel] [PATCH RFC v3 12/14] intel_iommu: do replay when context invalidate

2017-01-16 Thread Jason Wang



On 2017-01-16 16:18, Peter Xu wrote:

On Mon, Jan 16, 2017 at 03:52:10PM +0800, Jason Wang wrote:


On 2017-01-16 15:43, Peter Xu wrote:

On Mon, Jan 16, 2017 at 01:53:54PM +0800, Jason Wang wrote:

On 2017-01-13 11:06, Peter Xu wrote:

Before this one we only invalidate context cache when we receive context
entry invalidations. However it's possible that the invalidation also
contains a domain switch (only if cache-mode is enabled for vIOMMU).

So let's check for CM before replaying?

When CM is not set, there should be no device that needs
IOMMU_NOTIFIER_MAP notifies. So IMHO it won't hurt if we replay here
(so the notifier_list will only contain UNMAP notifiers at most, and
sending UNMAP to those devices should not affect them at all).

If we check CM before replay, it'll be faster when the guest changes the
iommu domain for a specific device. But after all, this kind of operation is
extremely rare, while if we check the CM bit, we have an "assumption" in
the code that MAP depends on CM. In that case, to make the code
cleaner, I'd slightly prefer not to check it here. What do you think?

Ok, I think maybe it's better to add a comment here.

How about this?

+/*
+ * So a device is moving out of (or moving into) a
+ * domain; a replay() suits here to notify all the
+ * IOMMU_NOTIFIER_MAP registrants about this change.
+ * This does no harm even if we have no such
+ * notifier registered - the IOMMU notification
+ * framework will skip MAP notifications if that
+ * happens.
+ */
  memory_region_iommu_replay_all(&vtd_as->iommu);

Thanks,

-- peterx


I'm fine with this.

Thanks
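The notifier-skip behaviour discussed above can be modelled with a minimal sketch (plain Python, not QEMU code; all names are illustrative): a replay emits MAP events, and a notifier that registered only for UNMAP simply never sees them, so replaying with no MAP notifiers registered is harmless.

```python
# Minimal model of IOMMU notifier filtering: a replay delivers MAP
# events, but a notifier registered only for UNMAP never receives them.
NOTIFIER_MAP = 1
NOTIFIER_UNMAP = 2

class Notifier:
    def __init__(self, flags):
        self.flags = flags
        self.events = []

def notify(notifiers, event_type, entry):
    for n in notifiers:
        if n.flags & event_type:  # skip notifiers that didn't ask for this type
            n.events.append((event_type, entry))

def replay(notifiers, mappings):
    # a replay walks the current mappings and emits a MAP event for each
    for m in mappings:
        notify(notifiers, NOTIFIER_MAP, m)

unmap_only = Notifier(NOTIFIER_UNMAP)
replay([unmap_only], [(0x1000, 0x2000)])
print(len(unmap_only.events))  # 0 - the MAP notification was skipped
```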



Re: [Qemu-devel] [PATCH RFC v3 13/14] intel_iommu: allow dynamic switch of IOMMU region

2017-01-16 Thread Peter Xu
On Mon, Jan 16, 2017 at 04:25:35PM +0800, Jason Wang wrote:
> 
> 
> On 2017年01月16日 16:12, Peter Xu wrote:
> >On Mon, Jan 16, 2017 at 04:01:00PM +0800, Jason Wang wrote:
> >>
> >>On 2017年01月16日 15:50, Peter Xu wrote:
> >>>On Mon, Jan 16, 2017 at 02:20:31PM +0800, Jason Wang wrote:
> >>>
> >>>[...]
> >>>
> >diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >index fd75112..2596f11 100644
> >--- a/hw/i386/intel_iommu.c
> >+++ b/hw/i386/intel_iommu.c
> >@@ -1343,9 +1343,49 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState 
> >*s)
> >  vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
> >  }
> >+static void vtd_switch_address_space(VTDAddressSpace *as, bool 
> >iommu_enabled)
> Looks like you can check s->dmar_enabled here?
> >>>Yes, we need to check old state in case we don't need a switch at all.
> >>>Actually I checked it...
> >>>
> >>I mean, is there a chance that iommu_enabled (a better name would be
> >>dmar_enabled) is not equal to s->dmar_enabled? Looks not.
> >>
> >>vtd_handle_gcmd_te() did:
> >>
> >> ...
> >> if (en) {
> >> s->dmar_enabled = true;
> >> /* Ok - report back to driver */
> >> vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_TES);
> >> } else {
> >> s->dmar_enabled = false;
> >> ...
> >>
> >>You can call vtd_switch_address_space_all(s, en) after this, which will
> >>call this function. And the other caller, as you've pointed out, already
> >>calls it through s->dmar_enabled. So en here is always s->dmar_enabled?
> >Hmm, yes...
> >
> >(I would still prefer keeping this parameter for readability.
> >  Though, I prefer your suggestion to rename it to dmar_enabled)
> >
> >-- peterx
> 
> I think this does not give more readability :) Maybe I was wrong, let's
> leave this to the maintainer.
> 
> Thanks :)

Thanks for reviewing this series so fast!

I have no strong opinion as well. Maybe you are right. :-)

Michael, please let me know if you dislike this, so I can remove this
parameter (it equals as->iommu_state->dmar_enabled).

Thanks,

-- peterx



[Qemu-devel] Towards an ivshmem 2.0?

2017-01-16 Thread Jan Kiszka
Hi,

some of you may know that we are using a shared memory device similar to
ivshmem in the partitioning hypervisor Jailhouse [1].

We started out compatible with the original ivshmem that QEMU
implements, but we quickly deviated in some details, and in recent
months even more. Some of the deviations are related to making the
implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
aiming at safety-critical systems and, therefore, a small code base.
Other changes address deficits in the original design, like missing
life-cycle management.

Now the question is whether there is interest in defining a common new
revision of this device, and maybe also of some protocols used on top,
such as virtual network links. Ideally, this would enable us to share
Linux drivers. We will definitely go for upstreaming at least a network
driver such as [2], a UIO driver and maybe also a serial port/console.

I've attached a first draft of the specification of our new ivshmem
device. A working implementation can be found in the wip/ivshmem2 branch
of Jailhouse [3], the corresponding ivshmem-net driver in [4].

Deviations from the original design:

- Only two peers per link

  This simplifies the implementation and also the interfaces (think of
  life-cycle management in a multi-peer environment). Moreover, we do
  not have an urgent use case for multiple peers, and thus also no
  reference protocol that could be used in such setups. If someone
  else happens to have such a protocol to share, it would be possible
  to discuss potential extensions and their implications.

- Side-band registers to discover and configure shared memory regions

  This was one of the first changes: We removed the memory regions from
  the PCI BARs and gave them special configuration space registers. By
  now, these registers are embedded in a PCI capability. The reasons are
  that Jailhouse does not allow relocating the regions in guest address
  space (but other hypervisors may if they like to) and that we now have
  up to three of them.

- Changed PCI base class code to 0xff (unspecified class)

  This allows us to define our own subclasses and interfaces. That is
  now exploited for specifying the shared memory protocol the two
  connected peers should use. It also allows the Linux drivers to match
  on that.

- INTx interrupts support is back

  This is needed on target platforms without MSI controllers, i.e.
  without the required guest support. Namely some PCI-less ARM SoCs
  required the reintroduction. While doing this, we also took care of
  keeping the MMIO registers free of privileged controls so that a
  guest OS can map them safely into a guest userspace application.

And then there are some extensions of the original ivshmem:

- Multiple shared memory regions, including unidirectional ones

  It is now possible to expose up to three different shared memory
  regions: The first one is read/writable for both sides. The second
  region is read/writable for the local peer and read-only for the
  remote peer (useful for output queues). And the third is read-only
  locally but read/writable remotely (i.e. for input queues).
  Unidirectional regions prevent the receiver of some data from
  interfering with the sender while it is still building the message - a
  property that, we are sure, is useful not only for safety-critical
  communication.
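The access rights of the three regions can be sketched as a small model (purely illustrative, not the register layout from the draft spec): region 0 is read/write for both peers, region 1 is the local peer's output queue, region 2 its input queue.

```python
# Conceptual model of the three ivshmem 2.0 shared memory regions and
# their per-peer access rights. Peer 0 is "local", peer 1 is "remote".
RW, RO = "rw", "ro"

def region_access(region, peer):
    table = {0: (RW, RW),   # shared read/write region
             1: (RW, RO),   # local output queue (RO for the remote peer)
             2: (RO, RW)}   # local input queue (RW for the remote peer)
    return table[region][peer]

def try_write(region, peer):
    # a unidirectional region rejects writes from the read-only side
    if region_access(region, peer) != RW:
        raise PermissionError("read-only mapping for this peer")

try_write(1, 0)             # the sender fills its output queue: OK
print(region_access(1, 1))  # ro - the receiver cannot corrupt it
```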

- Life-cycle management via local and remote state

  Each device can now signal its own state in the form of a value to the
  remote side, which triggers an event there. Moreover, state changes
  done by the hypervisor to one peer are signalled to the other side.
  And we introduced a write-to-shared-memory mechanism for the
  respective remote state so that guests do not have to issue an MMIO
  access in order to check the state.
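The write-to-shared-memory state mechanism described above can be modelled in a few lines (names are hypothetical - the draft spec defines the real interface): when one peer updates its state, the hypervisor mirrors the value into the other peer's shared memory and raises an event there, so the remote side reads the state from memory instead of issuing an MMIO access.

```python
# Sketch of ivshmem 2.0 life-cycle signalling: a peer's state is
# mirrored into the remote peer's shared memory by the hypervisor.
class Peer:
    def __init__(self):
        self.state = 0          # local state value (written via MMIO)
        self.remote_state = 0   # remote peer's state, mirrored into shmem
        self.events = 0         # state-change events received

def set_state(writer, other, value):
    writer.state = value
    other.remote_state = value  # write-to-shared-memory mechanism
    other.events += 1           # triggers an event at the remote side

a, b = Peer(), Peer()
set_state(a, b, 1)              # peer A announces e.g. "ready"
print(b.remote_state, b.events) # 1 1 - B sees it without an MMIO read
```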

So, this is our proposal. It would be great to hear some opinions on
whether you see value in adding support for such an "ivshmem 2.0"
device to QEMU as well and expanding its ecosystem towards Linux
upstream, maybe also DPDK again. If you see problems in the new design
w.r.t. what QEMU provides so far with its ivshmem device, let's discuss
how to resolve them. Looking forward to any feedback!

Jan

[1] https://github.com/siemens/jailhouse
[2]
http://git.kiszka.org/?p=linux.git;a=blob;f=drivers/net/ivshmem-net.c;h=0e770ca293a4aca14a55ac0e66871b09c82647af;hb=refs/heads/queues/jailhouse
[3] https://github.com/siemens/jailhouse/commits/wip/ivshmem2
[4]
http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/jailhouse-ivshmem2

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
IVSHMEM Device Specification


The Inter-VM Shared Memory device provides the following features to its users:

- Interconnection between two peers

- Up to three shared memory regions per connection

- one read/writable for both sides

- two unidirectional, i.e. read/writable for one side and only readable for
  the 

[Qemu-devel] [PATCH v4 1/2] memory: tune mtree_print_mr() to dump mr type

2017-01-16 Thread Peter Xu
We were dumping the RW bits for each memory region, which might be
confusing. It makes more sense to dump the memory region type directly
rather than the RW bits, since that is what the bits are derived from.

Meanwhile, do some slight cleanup in the function.

Signed-off-by: Peter Xu 
---
 memory.c | 48 +++-
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/memory.c b/memory.c
index 2bfc37f..c42bde4 100644
--- a/memory.c
+++ b/memory.c
@@ -2450,6 +2450,21 @@ void address_space_destroy(AddressSpace *as)
 call_rcu(as, do_address_space_destroy, rcu);
 }
 
+static const char *memory_region_type(MemoryRegion *mr)
+{
+if (memory_region_is_ram_device(mr)) {
+return "ramd";
+} else if (memory_region_is_romd(mr)) {
+return "romd";
+} else if (memory_region_is_rom(mr)) {
+return "rom";
+} else if (memory_region_is_ram(mr)) {
+return "ram";
+} else {
+return "i/o";
+}
+}
+
 typedef struct MemoryRegionList MemoryRegionList;
 
 struct MemoryRegionList {
@@ -2459,6 +2474,10 @@ struct MemoryRegionList {
 
 typedef QTAILQ_HEAD(queue, MemoryRegionList) MemoryRegionListHead;
 
+#define MR_SIZE(size) (int128_nz(size) ? (hwaddr)int128_get64( \
+   int128_sub((size), int128_one())) : 0)
+#define MTREE_INDENT "  "
+
 static void mtree_print_mr(fprintf_function mon_printf, void *f,
const MemoryRegion *mr, unsigned int level,
hwaddr base,
@@ -2474,7 +2493,7 @@ static void mtree_print_mr(fprintf_function mon_printf, 
void *f,
 }
 
 for (i = 0; i < level; i++) {
-mon_printf(f, "  ");
+mon_printf(f, MTREE_INDENT);
 }
 
 if (mr->alias) {
@@ -2494,37 +2513,24 @@ static void mtree_print_mr(fprintf_function mon_printf, 
void *f,
 QTAILQ_INSERT_TAIL(alias_print_queue, ml, queue);
 }
 mon_printf(f, TARGET_FMT_plx "-" TARGET_FMT_plx
-   " (prio %d, %c%c): alias %s @%s " TARGET_FMT_plx
+   " (prio %d, %s): alias %s @%s " TARGET_FMT_plx
"-" TARGET_FMT_plx "%s\n",
base + mr->addr,
-   base + mr->addr
-   + (int128_nz(mr->size) ?
-  (hwaddr)int128_get64(int128_sub(mr->size,
-  int128_one())) : 0),
+   base + mr->addr + MR_SIZE(mr->size),
mr->priority,
-   mr->romd_mode ? 'R' : '-',
-   !mr->readonly && !(mr->rom_device && mr->romd_mode) ? 'W'
-   : '-',
+   memory_region_type((MemoryRegion *)mr),
memory_region_name(mr),
memory_region_name(mr->alias),
mr->alias_offset,
-   mr->alias_offset
-   + (int128_nz(mr->size) ?
-  (hwaddr)int128_get64(int128_sub(mr->size,
-  int128_one())) : 0),
+   mr->alias_offset + MR_SIZE(mr->size),
mr->enabled ? "" : " [disabled]");
 } else {
 mon_printf(f,
-   TARGET_FMT_plx "-" TARGET_FMT_plx " (prio %d, %c%c): 
%s%s\n",
+   TARGET_FMT_plx "-" TARGET_FMT_plx " (prio %d, %s): %s%s\n",
base + mr->addr,
-   base + mr->addr
-   + (int128_nz(mr->size) ?
-  (hwaddr)int128_get64(int128_sub(mr->size,
-  int128_one())) : 0),
+   base + mr->addr + MR_SIZE(mr->size),
mr->priority,
-   mr->romd_mode ? 'R' : '-',
-   !mr->readonly && !(mr->rom_device && mr->romd_mode) ? 'W'
-   : '-',
+   memory_region_type((MemoryRegion *)mr),
memory_region_name(mr),
mr->enabled ? "" : " [disabled]");
 }
-- 
2.7.4
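For reference, the inclusive-end computation that the MR_SIZE macro in this patch encapsulates can be sketched as follows (simplified to plain integers, ignoring QEMU's Int128 arithmetic):

```python
# MR_SIZE yields the offset of the *inclusive* end address of a region:
# size - 1 for a non-zero size, and 0 for an empty region so the dump
# never underflows.
def mr_size(size):
    return size - 1 if size != 0 else 0

base = 0x1000
size = 0x1000
# the dump prints base .. base + MR_SIZE(size), both ends inclusive
print("%08x-%08x" % (base, base + mr_size(size)))  # 00001000-00001fff
```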




[Qemu-devel] [PATCH v4 0/2] memory: extend "info mtree" with flat view dump

2017-01-16 Thread Peter Xu
v4:
- do unref of flatview when no view is there [Dave]

v3:
- rather than dump flatview directly in "info mtree", provide a new
  parameter "-f" for it. [Paolo]
- replace "RW" chars with the type of memory region [Paolo]
- (cc dave as well since it touches HMP interface)

v2:
- fix a size error in patch 2
- add r-b for Marc-André in patch 1

Each address space has its own flatview. It's another way to observe
memory info besides the default memory region hierarchy; for example,
if we want to know which memory region will handle a write to a
specific address, a flatview suits this better than the default
hierarchical dump.
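The idea of rendering a region hierarchy into a flat view can be illustrated with a toy model (this is only a conceptual sketch of what the dump shows, not QEMU's actual rendering algorithm): for each address, the region with the highest priority wins.

```python
# Toy flat-view rendering: split the space at all region boundaries,
# pick the highest-priority owner of each interval, and merge adjacent
# intervals owned by the same region. End addresses are exclusive.
def render_flatview(regions):
    """regions: list of (name, start, size, priority)."""
    points = sorted({p for _, start, size, _ in regions
                       for p in (start, start + size)})
    view = []
    for lo, hi in zip(points, points[1:]):
        owners = [r for r in regions if r[1] <= lo < r[1] + r[2]]
        if not owners:
            continue  # hole in the address space
        name = max(owners, key=lambda r: r[3])[0]
        if view and view[-1][0] == name and view[-1][2] == lo:
            view[-1] = (name, view[-1][1], hi)  # merge adjacent ranges
        else:
            view.append((name, lo, hi))
    return view

# A higher-priority MMIO region overlapping RAM splits the RAM range:
flat = render_flatview([("ram", 0x0, 0x10000, 0),
                        ("mmio", 0x4000, 0x1000, 1)])
print(flat)  # [('ram', 0, 0x4000), ('mmio', 0x4000, 0x5000), ('ram', 0x5000, 0x10000)]
```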

Please review. Thanks,

Peter Xu (2):
  memory: tune mtree_print_mr() to dump mr type
  memory: hmp: add "-f" for "info mtree"

 hmp-commands-info.hx  |  6 ++--
 include/exec/memory.h |  2 +-
 memory.c  | 89 ++-
 monitor.c |  4 ++-
 4 files changed, 74 insertions(+), 27 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v4 2/2] memory: hmp: add "-f" for "info mtree"

2017-01-16 Thread Peter Xu
Adding one more option "-f" for "info mtree" to dump the flat views of
all the address spaces.

This will be useful for debugging the memory rendering logic; it also
makes it much easier to know which memory region is handling which
address range.

Signed-off-by: Peter Xu 
---
 hmp-commands-info.hx  |  6 +++---
 include/exec/memory.h |  2 +-
 memory.c  | 41 -
 monitor.c |  4 +++-
 4 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 55d50c4..b0f35e6 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -249,9 +249,9 @@ ETEXI
 
 {
 .name   = "mtree",
-.args_type  = "",
-.params = "",
-.help   = "show memory tree",
+.args_type  = "flatview:-f",
+.params = "[-f]",
+.help   = "show memory tree (-f: dump flat view for address 
spaces)",
 .cmd= hmp_info_mtree,
 },
 
diff --git a/include/exec/memory.h b/include/exec/memory.h
index bec9756..71db380 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1254,7 +1254,7 @@ void memory_global_dirty_log_start(void);
  */
 void memory_global_dirty_log_stop(void);
 
-void mtree_info(fprintf_function mon_printf, void *f);
+void mtree_info(fprintf_function mon_printf, void *f, bool flatview);
 
 /**
  * memory_region_dispatch_read: perform a read directly to the specified
diff --git a/memory.c b/memory.c
index c42bde4..6498727 100644
--- a/memory.c
+++ b/memory.c
@@ -2564,12 +2564,51 @@ static void mtree_print_mr(fprintf_function mon_printf, 
void *f,
 }
 }
 
-void mtree_info(fprintf_function mon_printf, void *f)
+static void mtree_print_flatview(fprintf_function p, void *f,
+ AddressSpace *as)
+{
+FlatView *view = address_space_get_flatview(as);
+FlatRange *range = &view->ranges[0];
+MemoryRegion *mr;
+int n = view->nr;
+
+if (n <= 0) {
+p(f, MTREE_INDENT "No rendered FlatView for "
+  "address space '%s'\n", as->name);
+flatview_unref(view);
+return;
+}
+
+while (n--) {
+mr = range->mr;
+p(f, MTREE_INDENT TARGET_FMT_plx "-"
+  TARGET_FMT_plx " (prio %d, %s): %s\n",
+  int128_get64(range->addr.start),
+  int128_get64(range->addr.start) + MR_SIZE(range->addr.size),
+  mr->priority,
+  memory_region_type(mr),
+  memory_region_name(mr));
+range++;
+}
+
+flatview_unref(view);
+}
+
+void mtree_info(fprintf_function mon_printf, void *f, bool flatview)
 {
 MemoryRegionListHead ml_head;
 MemoryRegionList *ml, *ml2;
 AddressSpace *as;
 
+if (flatview) {
+QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+mon_printf(f, "address-space (flat view): %s\n", as->name);
+mtree_print_flatview(mon_printf, f, as);
+mon_printf(f, "\n");
+}
+return;
+}
+
 QTAILQ_INIT(&ml_head);
 
 QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
diff --git a/monitor.c b/monitor.c
index 0841d43..679cd52 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1529,7 +1529,9 @@ static void hmp_boot_set(Monitor *mon, const QDict *qdict)
 
 static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 {
-mtree_info((fprintf_function)monitor_printf, mon);
+bool flatview = qdict_get_try_bool(qdict, "flatview", false);
+
+mtree_info((fprintf_function)monitor_printf, mon, flatview);
 }
 
 static void hmp_info_numa(Monitor *mon, const QDict *qdict)
-- 
2.7.4




Re: [Qemu-devel] Virtual Machine Generation ID

2017-01-16 Thread Igor Mammedov
On Sat, 14 Jan 2017 22:17:53 -0800
Ben Warren  wrote:

> Hi Michael,
> 
> > On Dec 10, 2016, at 7:28 PM, Michael S. Tsirkin  wrote:
> > 
> > On Tue, Dec 06, 2016 at 06:15:34PM -0800, Ben Warren wrote:  
> >> Hi Michael,
> >> 
> >> I’m well on my way to implementing this, but I am really new to the QEMU 
> >> code base and am struggling with some concepts.  Please see below:  
> >>> On Oct 5, 2016, at 6:29 PM, Michael S. Tsirkin  wrote:
> >>> 
> >>> On Tue, Oct 04, 2016 at 03:51:40PM -0700, Ed Swierk wrote:  
>  On Thu, Sep 15, 2016 at 5:36 PM, Michael S. Tsirkin  
>  wrote:  
> > On Thu, Sep 15, 2016 at 05:23:28PM -0700, Ed Swierk wrote:  
> >> I'm wondering what it will take to finish up work on vmgenid.
> >> 
> >> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg05599.html  
> > 
> > We have ACPI_BUILD_TPMLOG_FILE in tree now and I think it could be
> > allocated in a similar way.
> > Integrate patch "fw-cfg: support writeable blobs" to communicate the
> > allocated address back to QEMU.  
>  
>  Starting with Igor's last version at
>  https://github.com/imammedo/qemu/commits/vmgen_wip , it's not clear to
>  me which changes need to be ported, which changes are obsoleted by
>  your new fw-cfg stuff and/or upstream churn in ACPI, device
>  properties, etc. In particular ACPI is still a total mystery to me,
>  though passing a single address from guest to host can't be that hard,
>  can it?
>  
>  Any clues would be appreciated.
>  
>  --Ed  
> >>> 
> >>> It might be best to just re-start from the beginning.
> >>> So the idea is that ACPI should be about supplying the address
> >>> to guest. To supply address to host we'll use fw cfg.
> >>> This would be new I think:
> >>> 
> >>> - add support for writeable fw cfg blobs  
> >> patch applied  
> >>> - add linker/loader command to write address of a blob into
> >>> such a fw cfg file
> >>> - add a new file used for vm gen id, use loader command above
> >>> to pass the address of a blob allocated for it to host  
> >> I don’t really understand the meaning of “file” in this context.  It seems 
> >> to be a way of specifying individual fw_cfg entries without explicitly 
> >> giving an index, but is not something that is visible in either the host 
> >> or guest file system.  Is this about right?  In my code I’m using 
> >> “/etc/vmgenid”  
> > 
> > yes
> >   
> >> As for the blob, I’m thinking this is where my main problem is.  The 
> >> ‘fw_cfg_add_*()’ functions take a data pointer but don’t seem to copy 
> >> the data anywhere.  We pass essentially a pointer via ACPI to the guest, 
> >> so what it points to needs to be in an accessible region.  I don’t get how 
> >> to define the blob contents.  There are command-line ‘fw-cfg’ options 
> >> where you can specify a file, but it’s not clear to me how to use them.  
> >> Maybe I reserve some IO memory or something?  
> > 
> > Not sure I understand the question. fw cfg device will make
> > memory accessible to guest. Put the guest physical address there.
> > the address needs to be calculated by linker.
> >   
> I’m almost ready to submit a V2 of the patch set, but there’s still one issue 
> that I can’t figure out.  From the guest, I can read the contents of the 
> blob.  If I make a change to the contents of the blob (via QMP) the guest 
> does not see the changes.  Is there something I need to do on the QEMU side 
> to “push” the updated fw_cfg contents to the guest?  I’ve noticed this both 
> when writing a qtest for the feature, and also in a Linux kernel module I 
> wrote that reads the ACPI contents in a guest.
Post a link to your current repo here so I can check it.

> 
> thanks,
> Ben



Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices

2017-01-16 Thread Peter Xu
On Mon, Jan 16, 2017 at 02:30:20PM +0800, Jason Wang wrote:
> 
> 
> On 2017年01月13日 11:06, Peter Xu wrote:
> >This patch is based on Aviv Ben-David ()'s patch
> >upstream:
> >
> >   "IOMMU: enable intel_iommu map and unmap notifiers"
> >   https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html
> >
> >However I removed/fixed some content, and added my own codes.
> >
> >Instead of translate() every page for iotlb invalidations (which is
> >slower), we walk the pages when needed and notify in a hook function.
> >
> >This patch enables vfio devices for VT-d emulation.
> >
> >Signed-off-by: Peter Xu 
> >---
> >  hw/i386/intel_iommu.c | 68 
> > +--
> >  include/hw/i386/intel_iommu.h |  8 +
> >  2 files changed, 67 insertions(+), 9 deletions(-)
> >
> >diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >index 2596f11..104200b 100644
> >--- a/hw/i386/intel_iommu.c
> >+++ b/hw/i386/intel_iommu.c
> >@@ -839,7 +839,8 @@ next:
> >   * @private: private data for the hook function
> >   */
> >  static int vtd_page_walk(VTDContextEntry *ce, uint64_t start, uint64_t end,
> >- vtd_page_walk_hook hook_fn, void *private)
> >+ vtd_page_walk_hook hook_fn, void *private,
> >+ bool notify_unmap)
> >  {
> >  dma_addr_t addr = vtd_get_slpt_base_from_context(ce);
> >  uint32_t level = vtd_get_level_from_context_entry(ce);
> >@@ -858,7 +859,7 @@ static int vtd_page_walk(VTDContextEntry *ce, uint64_t 
> >start, uint64_t end,
> >  trace_vtd_page_walk(ce->hi, ce->lo, start, end);
> >  return vtd_page_walk_level(addr, start, end, hook_fn, private,
> >-   level, true, true, NULL, false);
> >+   level, true, true, NULL, notify_unmap);
> >  }
> >  /* Map a device to its corresponding domain (context-entry) */
> >@@ -1212,6 +1213,34 @@ static void 
> >vtd_iotlb_domain_invalidate(IntelIOMMUState *s, uint16_t domain_id)
> >  &domain_id);
> >  }
> >+static int vtd_page_invalidate_notify_hook(IOMMUTLBEntry *entry,
> >+   void *private)
> >+{
> >+memory_region_notify_iommu((MemoryRegion *)private, *entry);
> >+return 0;
> >+}
> >+
> >+static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> >+   uint16_t domain_id, hwaddr addr,
> >+   uint8_t am)
> >+{
> >+IntelIOMMUNotifierNode *node;
> >+VTDContextEntry ce;
> >+int ret;
> >+
> >+QLIST_FOREACH(node, &(s->notifiers_list), next) {
> >+VTDAddressSpace *vtd_as = node->vtd_as;
> >+ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> >+   vtd_as->devfn, &ce);
> >+if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
> >+vtd_page_walk(&ce, addr, addr + (1 << am) * VTD_PAGE_SIZE,
> >+  vtd_page_invalidate_notify_hook,
> >+  (void *)&vtd_as->iommu, true);
> >+}
> >+}
> >+}
> >+
> >+
> >  static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t 
> > domain_id,
> >hwaddr addr, uint8_t am)
> >  {
> >@@ -1222,6 +1251,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState 
> >*s, uint16_t domain_id,
> >  info.addr = addr;
> >  info.mask = ~((1 << am) - 1);
> >  g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> >+vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> 
> Is the case of GLOBAL or DSI flush missed, or we don't care it at all?

IMHO we don't. For device assignment, since we have CM=1 here, we
will get explicit page invalidations even if the guest sends
global/domain invalidations.

Thanks,

-- peterx
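The page-selective invalidation range used in vtd_iotlb_page_invalidate() above can be sketched quickly (a plain-Python model, not QEMU code; VTD_PAGE_SIZE assumed to be 4 KiB): the am field selects a power-of-two number of pages, and the hash-table mask is derived from it.

```python
# VT-d page-selective IOTLB invalidation covers 2^am pages at addr.
VTD_PAGE_SIZE = 4096

def invalidation_range(addr, am):
    """Return (start, end) of the invalidated window, end exclusive."""
    return addr, addr + (1 << am) * VTD_PAGE_SIZE

def invalidation_mask(am):
    """The page-frame mask used to match cached IOTLB entries."""
    return ~((1 << am) - 1)

print(invalidation_range(0x10000, 0))  # (65536, 69632) - a single page
print(invalidation_range(0x10000, 3))  # eight pages
print(invalidation_mask(0) & 0xff)     # 0xff - am=0 keeps all frame bits
```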



Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices

2017-01-16 Thread Peter Xu
On Mon, Jan 16, 2017 at 02:30:20PM +0800, Jason Wang wrote:

[...]

> >  }
> >  /* Flush IOTLB
> >@@ -2244,15 +2274,34 @@ static void 
> >vtd_iommu_notify_flag_changed(MemoryRegion *iommu,
> >IOMMUNotifierFlag new)
> >  {
> >  VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
> >+IntelIOMMUState *s = vtd_as->iommu_state;
> >+IntelIOMMUNotifierNode *node = NULL;
> >+IntelIOMMUNotifierNode *next_node = NULL;
> >-if (new & IOMMU_NOTIFIER_MAP) {
> >-error_report("Device at bus %s addr %02x.%d requires iommu "
> >- "notifier which is currently not supported by "
> >- "intel-iommu emulation",
> >- vtd_as->bus->qbus.name, PCI_SLOT(vtd_as->devfn),
> >- PCI_FUNC(vtd_as->devfn));
> >+if (!s->cache_mode_enabled && new & IOMMU_NOTIFIER_MAP) {
> >+error_report("We need to set cache_mode=1 for intel-iommu to enable 
> >"
> >+ "device assignment with IOMMU protection.");
> >  exit(1);
> >  }
> >+
> >+/* Add new ndoe if no mapping was exising before this call */
> 
> "node"?

Sorry I missed this one - let me just remove the above comment, since it
only describes what the code below does.

Thanks,

> 
> >+if (old == IOMMU_NOTIFIER_NONE) {
> >+node = g_malloc0(sizeof(*node));
> >+node->vtd_as = vtd_as;
> >+QLIST_INSERT_HEAD(&s->notifiers_list, node, next);
> >+return;
> >+}

-- peterx



[Qemu-devel] [PATCH RFC 1/6] headers: update linux headers

2017-01-16 Thread Shannon Zhao
From: Shannon Zhao 

Signed-off-by: Shannon Zhao 
---
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/linux/kvm.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index fd5a276..f914eac 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -97,6 +97,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
 #define KVM_ARM_VCPU_PSCI_0_2  2 /* CPU uses PSCI v0.2 */
 #define KVM_ARM_VCPU_PMU_V33 /* Support guest PMUv3 */
+#define KVM_ARM_VCPU_CROSS 4 /* Support cross type vCPU */
 
 struct kvm_vcpu_init {
__u32 target;
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index bb0ed71..ea9e288 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -870,6 +870,8 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_USER_INSTR0 130
 #define KVM_CAP_MSI_DEVID 131
 #define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_ARM_CROSS_VCPU 133
+#define KVM_CAP_ARM_HETEROGENEOUS 134
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.0.4





[Qemu-devel] [PATCH RFC 6/6] target-arm: cpu64: Add support for Cortex-A72

2017-01-16 Thread Shannon Zhao
From: Shannon Zhao 

Add the ARM Cortex-A72 processor definition. It is similar to the A57.

Signed-off-by: Shannon Zhao 
---
 hw/arm/virt.c  |  1 +
 target/arm/cpu64.c | 56 ++
 2 files changed, 57 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 49b7b65..2ba93e3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -166,6 +166,7 @@ static const char *valid_cpus[] = {
 "cortex-a15",
 "cortex-a53",
 "cortex-a57",
+"cortex-a72",
 "generic",
 "host",
 NULL
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 223f31e..4f00ceb 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -204,6 +204,61 @@ static void aarch64_a53_initfn(Object *obj)
 define_arm_cp_regs(cpu, cortex_a57_a53_cp_reginfo);
 }
 
+static void aarch64_a72_initfn(Object *obj)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,cortex-a72";
+set_feature(&cpu->env, ARM_FEATURE_V8);
+set_feature(&cpu->env, ARM_FEATURE_VFP4);
+set_feature(&cpu->env, ARM_FEATURE_NEON);
+set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
+set_feature(&cpu->env, ARM_FEATURE_AARCH64);
+set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
+set_feature(&cpu->env, ARM_FEATURE_V8_AES);
+set_feature(&cpu->env, ARM_FEATURE_V8_SHA1);
+set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
+set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
+set_feature(&cpu->env, ARM_FEATURE_CRC);
+set_feature(&cpu->env, ARM_FEATURE_EL3);
+cpu->kvm_target = QEMU_KVM_ARM_TARGET_GENERIC_V8;
+cpu->midr = 0x410fd081;
+cpu->revidr = 0x;
+cpu->reset_fpsid = 0x41034080;
+cpu->mvfr0 = 0x10110222;
+cpu->mvfr1 = 0x1211;
+cpu->mvfr2 = 0x0043;
+cpu->ctr = 0x8444c004;
+cpu->reset_sctlr = 0x00c50838;
+cpu->id_pfr0 = 0x0131;
+cpu->id_pfr1 = 0x00011011;
+cpu->id_dfr0 = 0x03010066;
+cpu->id_afr0 = 0x;
+cpu->id_mmfr0 = 0x10201105;
+cpu->id_mmfr1 = 0x4000;
+cpu->id_mmfr2 = 0x0126;
+cpu->id_mmfr3 = 0x02102211;
+cpu->id_isar0 = 0x02101110;
+cpu->id_isar1 = 0x13112111;
+cpu->id_isar2 = 0x21232042;
+cpu->id_isar3 = 0x01112131;
+cpu->id_isar4 = 0x00011142;
+cpu->id_isar5 = 0x00011121;
+cpu->id_aa64pfr0 = 0x;
+cpu->id_aa64dfr0 = 0x10305106;
+cpu->pmceid0 = 0x;
+cpu->pmceid1 = 0x;
+cpu->id_aa64isar0 = 0x00011120;
+cpu->id_aa64mmfr0 = 0x1124;
+cpu->dbgdidr = 0x3516d000;
+cpu->clidr = 0x0a200023;
+cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
+cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
+cpu->ccsidr[2] = 0x71ffe07a; /* 4096KB L2 cache */
+cpu->dcz_blocksize = 4; /* 64 bytes */
+define_arm_cp_regs(cpu, cortex_a57_a53_cp_reginfo);
+}
+
 static void aarch64_generic_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
@@ -285,6 +340,7 @@ typedef struct ARMCPUInfo {
 static const ARMCPUInfo aarch64_cpus[] = {
 { .name = "cortex-a57", .initfn = aarch64_a57_initfn },
 { .name = "cortex-a53", .initfn = aarch64_a53_initfn },
+{ .name = "cortex-a72", .initfn = aarch64_a72_initfn },
 { .name = "generic",.initfn = aarch64_generic_initfn },
 #ifdef CONFIG_USER_ONLY
 { .name = "any", .initfn = aarch64_any_initfn },
-- 
2.0.4
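As a sanity check on the cache-size comments in this patch, the CCSIDR values can be decoded with a short script (assuming the pre-FEAT_CCIDX ARMv8 CCSIDR_EL1 layout: LineSize in bits [2:0], Associativity-1 in bits [12:3], NumSets-1 in bits [27:13]):

```python
# Decode a CCSIDR_EL1 value into a total cache size in bytes.
def ccsidr_size_bytes(ccsidr):
    line_bytes = 1 << ((ccsidr & 0x7) + 4)        # log2(words) - 2 encoding
    assoc = ((ccsidr >> 3) & 0x3FF) + 1
    sets = ((ccsidr >> 13) & 0x7FFF) + 1
    return line_bytes * assoc * sets

print(ccsidr_size_bytes(0x701fe00a) // 1024)  # 32   - L1 dcache, KB
print(ccsidr_size_bytes(0x201fe012) // 1024)  # 48   - L1 icache, KB
print(ccsidr_size_bytes(0x71ffe07a) // 1024)  # 4096 - L2 cache, KB
```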





[Qemu-devel] [PATCH RFC 5/6] arm: virt: Enable generic type CPU in virt machine

2017-01-16 Thread Shannon Zhao
From: Shannon Zhao 

Signed-off-by: Shannon Zhao 
---
 hw/arm/virt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4b301c2..49b7b65 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -166,6 +166,7 @@ static const char *valid_cpus[] = {
 "cortex-a15",
 "cortex-a53",
 "cortex-a57",
+"generic",
 "host",
 NULL
 };
-- 
2.0.4





[Qemu-devel] [PATCH RFC 3/6] arm: kvm64: Check if kvm supports cross type vCPU

2017-01-16 Thread Shannon Zhao
From: Shannon Zhao 

If the user requests a specific vCPU type which is not the same as the
physical one, and if KVM supports cross-type vCPUs, we set the
KVM_ARM_VCPU_CROSS bit and set the CPU ID registers.

Signed-off-by: Shannon Zhao 
---
 target/arm/kvm64.c | 182 +
 1 file changed, 182 insertions(+)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 609..70442ea 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -481,7 +481,151 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUClass *ahcc)
 return true;
 }
 
+#define ARM_CPU_ID_MIDR3, 0, 0, 0, 0
 #define ARM_CPU_ID_MPIDR   3, 0, 0, 0, 5
+/* ID group 1 registers */
+#define ARM_CPU_ID_REVIDR  3, 0, 0, 0, 6
+#define ARM_CPU_ID_AIDR3, 1, 0, 0, 7
+
+/* ID group 2 registers */
+#define ARM_CPU_ID_CCSIDR  3, 1, 0, 0, 0
+#define ARM_CPU_ID_CLIDR   3, 1, 0, 0, 1
+#define ARM_CPU_ID_CSSELR  3, 2, 0, 0, 0
+#define ARM_CPU_ID_CTR 3, 3, 0, 0, 1
+
+/* ID group 3 registers */
+#define ARM_CPU_ID_PFR03, 0, 0, 1, 0
+#define ARM_CPU_ID_PFR13, 0, 0, 1, 1
+#define ARM_CPU_ID_DFR03, 0, 0, 1, 2
+#define ARM_CPU_ID_AFR03, 0, 0, 1, 3
+#define ARM_CPU_ID_MMFR0   3, 0, 0, 1, 4
+#define ARM_CPU_ID_MMFR1   3, 0, 0, 1, 5
+#define ARM_CPU_ID_MMFR2   3, 0, 0, 1, 6
+#define ARM_CPU_ID_MMFR3   3, 0, 0, 1, 7
+#define ARM_CPU_ID_ISAR0   3, 0, 0, 2, 0
+#define ARM_CPU_ID_ISAR1   3, 0, 0, 2, 1
+#define ARM_CPU_ID_ISAR2   3, 0, 0, 2, 2
+#define ARM_CPU_ID_ISAR3   3, 0, 0, 2, 3
+#define ARM_CPU_ID_ISAR4   3, 0, 0, 2, 4
+#define ARM_CPU_ID_ISAR5   3, 0, 0, 2, 5
+#define ARM_CPU_ID_MMFR4   3, 0, 0, 2, 6
+#define ARM_CPU_ID_MVFR0   3, 0, 0, 3, 0
+#define ARM_CPU_ID_MVFR1   3, 0, 0, 3, 1
+#define ARM_CPU_ID_MVFR2   3, 0, 0, 3, 2
+#define ARM_CPU_ID_AA64PFR03, 0, 0, 4, 0
+#define ARM_CPU_ID_AA64PFR13, 0, 0, 4, 1
+#define ARM_CPU_ID_AA64DFR03, 0, 0, 5, 0
+#define ARM_CPU_ID_AA64DFR13, 0, 0, 5, 1
+#define ARM_CPU_ID_AA64AFR03, 0, 0, 5, 4
+#define ARM_CPU_ID_AA64AFR13, 0, 0, 5, 5
+#define ARM_CPU_ID_AA64ISAR0   3, 0, 0, 6, 0
+#define ARM_CPU_ID_AA64ISAR1   3, 0, 0, 6, 1
+#define ARM_CPU_ID_AA64MMFR0   3, 0, 0, 7, 0
+#define ARM_CPU_ID_AA64MMFR1   3, 0, 0, 7, 1
+#define ARM_CPU_ID_MAX 36
+
+static int kvm_arm_set_id_registers(CPUState *cs)
+{
+int ret = 0;
+uint32_t i;
+ARMCPU *cpu = ARM_CPU(cs);
+struct kvm_one_reg id_registers[ARM_CPU_ID_MAX];
+
+memset(id_registers, 0, ARM_CPU_ID_MAX * sizeof(struct kvm_one_reg));
+
+id_registers[0].id = ARM64_SYS_REG(ARM_CPU_ID_MIDR);
+id_registers[0].addr = (uintptr_t)&cpu->midr;
+
+id_registers[1].id = ARM64_SYS_REG(ARM_CPU_ID_REVIDR);
+id_registers[1].addr = (uintptr_t)&cpu->revidr;
+
+id_registers[2].id = ARM64_SYS_REG(ARM_CPU_ID_MVFR0);
+id_registers[2].addr = (uintptr_t)&cpu->mvfr0;
+
+id_registers[3].id = ARM64_SYS_REG(ARM_CPU_ID_MVFR1);
+id_registers[3].addr = (uintptr_t)&cpu->mvfr1;
+
+id_registers[4].id = ARM64_SYS_REG(ARM_CPU_ID_MVFR2);
+id_registers[4].addr = (uintptr_t)&cpu->mvfr2;
+
+id_registers[5].id = ARM64_SYS_REG(ARM_CPU_ID_PFR0);
+id_registers[5].addr = (uintptr_t)&cpu->id_pfr0;
+
+id_registers[6].id = ARM64_SYS_REG(ARM_CPU_ID_PFR1);
+id_registers[6].addr = (uintptr_t)&cpu->id_pfr1;
+
+id_registers[7].id = ARM64_SYS_REG(ARM_CPU_ID_DFR0);
+id_registers[7].addr = (uintptr_t)&cpu->id_dfr0;
+
+id_registers[8].id = ARM64_SYS_REG(ARM_CPU_ID_AFR0);
+id_registers[8].addr = (uintptr_t)&cpu->id_afr0;
+
+id_registers[9].id = ARM64_SYS_REG(ARM_CPU_ID_MMFR0);
+id_registers[9].addr = (uintptr_t)&cpu->id_mmfr0;
+
+id_registers[10].id = ARM64_SYS_REG(ARM_CPU_ID_MMFR1);
+id_registers[10].addr = (uintptr_t)&cpu->id_mmfr1;
+
+id_registers[11].id = ARM64_SYS_REG(ARM_CPU_ID_MMFR2);
+id_registers[11].addr = (uintptr_t)&cpu->id_mmfr2;
+
+id_registers[12].id = ARM64_SYS_REG(ARM_CPU_ID_MMFR3);
+id_registers[12].addr = (uintptr_t)&cpu->id_mmfr3;
+
+id_registers[13].id = ARM64_SYS_REG(ARM_CPU_ID_ISAR0);
+id_registers[13].addr = (uintptr_t)&cpu->id_isar0;
+
+id_registers[14].id = ARM64_SYS_REG(ARM_CPU_ID_ISAR1);
+id_registers[14].addr = (uintptr_t)&cpu->id_isar1;
+
+id_registers[15].id = ARM64_SYS_REG(ARM_CPU_ID_ISAR2);
+id_registers[15].addr = (uintptr_t)&cpu->id_isar2;
+
+id_registers[16].id = ARM64_SYS_REG(ARM_CPU_ID_ISAR3);
+id_registers[16].addr = (uintptr_t)&cpu->id_isar3;
+
+id_registers[17].id = ARM64_SYS_REG(ARM_CPU_ID_ISAR4);
+id_registers[17].addr = (uintptr_t)&cpu->id_isar4;
+
+id_registers[18].id = ARM64_SYS_REG(ARM_CPU_ID_ISAR5);
+id_registers[18].addr = (uintptr_t)&cpu->id_isar5;
+
+id_registers[19].id = ARM64_SYS_REG(ARM_CPU_ID_AA64PFR0);
+id_registers[19].addr = (uintptr_t)&cpu->id_aa64pfr0;
+
+id_registers[20].id = ARM64_SYS_REG(ARM_CPU_ID_AA
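The twenty-odd hand-written assignments above (the message is truncated here) all follow one fixed pattern, so the list could also be built from a table. The sketch below shows that idea with simplified stand-in structs and a hypothetical sys-register encoding; it is not the actual KVM `struct kvm_one_reg` API or QEMU's `ARM64_SYS_REG()` macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the KVM structures; the real code uses
 * struct kvm_one_reg and ARM64_SYS_REG() from the kernel headers. */
struct one_reg {
    uint64_t id;
    uint64_t addr;
};

struct cpu_state {
    uint32_t midr;
    uint32_t revidr;
    uint32_t id_pfr0;
};

/* Hypothetical encoding packing op0/op1/crn/crm/op2 into one value,
 * mirroring in spirit what ARM64_SYS_REG() does. */
static uint64_t sys_reg(int op0, int op1, int crn, int crm, int op2)
{
    return ((uint64_t)op0 << 19) | ((uint64_t)op1 << 16) |
           ((uint64_t)crn << 12) | ((uint64_t)crm << 8) | (uint64_t)op2;
}

/* Build the register list from a table instead of one pair of
 * assignments per register, as in the patch under discussion. */
static size_t build_id_reg_list(struct cpu_state *cpu, struct one_reg *regs)
{
    const struct { uint64_t id; uint32_t *field; } table[] = {
        { sys_reg(3, 0, 0, 0, 0), &cpu->midr },     /* MIDR_EL1    */
        { sys_reg(3, 0, 0, 0, 6), &cpu->revidr },   /* REVIDR_EL1  */
        { sys_reg(3, 0, 0, 1, 0), &cpu->id_pfr0 },  /* ID_PFR0_EL1 */
    };
    size_t n = sizeof(table) / sizeof(table[0]);

    for (size_t i = 0; i < n; i++) {
        regs[i].id = table[i].id;
        regs[i].addr = (uintptr_t)table[i].field;
    }
    return n;
}
```

Extending the table to the full ID-register set would replace the repetitive assignments with one entry per register.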

[Qemu-devel] [PATCH RFC 4/6] target: arm: Add a generic type cpu

2017-01-16 Thread Shannon Zhao
From: Shannon Zhao 

Add a generic CPU type; it is useful for migration when running on
different hardware.

Signed-off-by: Shannon Zhao 
---
 target/arm/cpu64.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 549cb1e..223f31e 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -204,6 +204,59 @@ static void aarch64_a53_initfn(Object *obj)
 define_arm_cp_regs(cpu, cortex_a57_a53_cp_reginfo);
 }
 
+static void aarch64_generic_initfn(Object *obj)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+
+cpu->dtb_compatible = "arm,armv8";
+set_feature(&cpu->env, ARM_FEATURE_V8);
+set_feature(&cpu->env, ARM_FEATURE_VFP4);
+set_feature(&cpu->env, ARM_FEATURE_NEON);
+set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
+set_feature(&cpu->env, ARM_FEATURE_AARCH64);
+set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
+set_feature(&cpu->env, ARM_FEATURE_V8_AES);
+set_feature(&cpu->env, ARM_FEATURE_V8_SHA1);
+set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
+set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
+set_feature(&cpu->env, ARM_FEATURE_CRC);
+set_feature(&cpu->env, ARM_FEATURE_EL3);
+cpu->kvm_target = QEMU_KVM_ARM_TARGET_GENERIC_V8;
+cpu->midr = 0x410fd000; /* FIXME: this needs adjusting */
+cpu->revidr = 0x;
+cpu->reset_fpsid = 0x41034070;
+cpu->mvfr0 = 0x10110222;
+cpu->mvfr1 = 0x1211;
+cpu->mvfr2 = 0x0043;
+cpu->ctr = 0x84448004; /* L1Ip = VIPT */
+cpu->reset_sctlr = 0x00c50838;
+cpu->id_pfr0 = 0x0131;
+cpu->id_pfr1 = 0x00011011;
+cpu->id_dfr0 = 0x03010066;
+cpu->id_afr0 = 0x;
+cpu->id_mmfr0 = 0x10101105;
+cpu->id_mmfr1 = 0x4000;
+cpu->id_mmfr2 = 0x0126;
+cpu->id_mmfr3 = 0x02102211;
+cpu->id_isar0 = 0x02101110;
+cpu->id_isar1 = 0x13112111;
+cpu->id_isar2 = 0x21232042;
+cpu->id_isar3 = 0x01112131;
+cpu->id_isar4 = 0x00011142;
+cpu->id_isar5 = 0x00011121;
+cpu->id_aa64pfr0 = 0x;
+cpu->id_aa64dfr0 = 0x10305106;
+cpu->id_aa64isar0 = 0x00011120;
+cpu->id_aa64mmfr0 = 0x0f001101; /* only support 4k page, 36 bit physical addr */
+cpu->dbgdidr = 0x3516d000;
+cpu->clidr = 0x0a200023;
+cpu->ccsidr[0] = 0x7003e01a; /* 8KB L1 dcache */
+cpu->ccsidr[1] = 0x2007e00a; /* 8KB L1 icache */
+cpu->ccsidr[2] = 0x700fe07a; /* 128KB L2 cache */
+cpu->dcz_blocksize = 4; /* 64 bytes */
+define_arm_cp_regs(cpu, cortex_a57_a53_cp_reginfo);
+}
+
 #ifdef CONFIG_USER_ONLY
 static void aarch64_any_initfn(Object *obj)
 {
@@ -232,6 +285,7 @@ typedef struct ARMCPUInfo {
 static const ARMCPUInfo aarch64_cpus[] = {
 { .name = "cortex-a57", .initfn = aarch64_a57_initfn },
 { .name = "cortex-a53", .initfn = aarch64_a53_initfn },
+{ .name = "generic",.initfn = aarch64_generic_initfn },
 #ifdef CONFIG_USER_ONLY
 { .name = "any", .initfn = aarch64_any_initfn },
 #endif
-- 
2.0.4





[Qemu-devel] [PULL v2 000/180] QAPI patches for 2017-01-16

2017-01-16 Thread Markus Armbruster
This is Marc-André's "[PATCH v8 00/21] qapi doc generation (whole
version, squashed)" with a few commit messages tweaked, and "[PATCH v8
14/21] (SQUASHED) move doc to schema" unsquashed into 161 patches.

We did all the respins in this squashed form to reduce noise.
However, since the unsquashed form is better suited for review, and
probably nicer if we have to revisit this part of the work down the
road, I'm proposing to merge this unsquashed.

If you want me to post the unsquashed patches, I'm happy to redo this
pull request.

If you'd rather pull the squashed version, likewise.

I'm afraid this is a bit of a doc conflict magnet.  The sooner we can
get it in, the easier for Marc-André and me.

v2:
* Rebased (v1 conflicts with commit e1ff3c6)
* test-qapi.py tweaked to avoid trailing empty lines in .out

The following changes since commit b6af8ea60282df514f87d32e36afd1c9aeee28c8:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging (2017-01-13 14:38:21 +)

are available in the git repository at:

  git://repo.or.cz/qemu/armbru.git tags/pull-qapi-2017-01-16

for you to fetch changes up to 56e8bdd46a8a42d89b0afea9da83ae7679cc0439:

  build-sys: add qapi doc generation targets (2017-01-16 10:11:43 +0100)


QAPI patches for 2017-01-16


Marc-André Lureau (180):
  qapi: replace 'o' for list items
  qapi: move QKeyCode doc body at the top
  qapi: Format TODO comments more consistently
  qapi: improve device_add schema
  qapi: improve TransactionAction doc
  qga/schema: improve guest-set-vcpus Returns: section
  qapi: Reorder doc comments for future doc generator
  qapi: Move "command is experimental" notes down
  qapi: add some sections in docs
  docs: add master qapi texi files
  qapi: rework qapi Exception
  texi2pod: learn quotation, deftp and deftypefn
  qmp-commands: move 'add_client' doc to schema
  qmp-commands: move 'query-name' doc to schema
  qmp-commands: move 'query-kvm' doc to schema
  qmp-commands: move 'query-status' doc to schema
  qmp-commands: move 'query-uuid' doc to schema
  qmp-commands: move 'query-chardev' doc to schema
  qmp-commands: move 'query-chardev-backends' doc to schema
  qmp-commands: move 'ringbuf-write' doc to schema
  qmp-commands: move 'ringbuf-read' doc to schema
  qmp-commands: move 'query-events' doc to schema
  qmp-commands: move 'query-migrate' doc to schema
  qmp-commands: move 'migrate-set-capabilities' doc to schema
  qmp-commands: move 'query-migrate-capabilities' doc to schema
  qmp-commands: move 'migrate-set-parameters' doc to schema
  qmp-commands: move 'query-migrate-parameters' doc to schema
  qmp-commands: move 'client_migrate_info' doc to schema
  qmp-commands: move 'migrate-start-postcopy' doc to schema
  qmp-commands: move 'query-mice' doc to schema
  qmp-commands: move 'query-cpus' doc to schema
  qmp-commands: move 'query-iothreads' doc to schema
  qmp-commands: move 'query-vnc' doc to schema
  qmp-commands: move 'query-spice' doc to schema
  qmp-commands: move 'query-balloon' doc to schema
  qmp-commands: move 'query-pci' doc to schema
  qmp-commands: move 'quit' doc to schema
  qmp-commands: move 'stop' doc to schema
  qmp-commands: move 'system_reset' doc to schema
  qmp-commands: move 'system_powerdown' doc to schema
  qmp-commands: move 'cpu-add' doc to schema
  qmp-commands: move 'memsave' doc to schema
  qmp-commands: move 'pmemsave' doc to schema
  qmp-commands: move 'cont' doc to schema
  qmp-commands: move 'system_wakeup' doc to schema
  qmp-commands: move 'inject-nmi' doc to schema
  qmp-commands: move 'set_link' doc to schema
  qmp-commands: move 'balloon' doc to schema
  qmp-commands: move 'transaction' doc to schema
  qmp-commands: move 'human-monitor-command' doc to schema
  qmp-commands: move 'migrate_cancel' doc to schema
  qmp-commands: move 'migrate_set_downtime' doc to schema
  qmp-commands: move 'migrate_set_speed' doc to schema
  qmp-commands: move 'query-migrate-cache-size' doc to schema
  qmp-commands: move 'set_password' doc to schema
  qmp-commands: move 'expire_password' doc to schema
  qmp-commands: move 'change' doc to schema
  qmp-commands: move 'migrate' doc to schema
  qmp-commands: move 'migrate-incoming' doc to schema
  qmp-commands: move 'xen-save-devices-state' doc to schema
  qmp-commands: move 'xen-set-global-dirty-log' doc to schema
  qmp-commands: move 'device_del' doc to schema
  qmp-commands: move 'dump-guest-memory' doc to schema
  qmp-commands: move 'query-dump-guest-memory-capability' doc to schema
  qmp-commands: move 'dump-skeys' doc to schema
  qmp-commands: move 'netd

[Qemu-devel] [PATCH RFC 2/6] target: arm: Add the qemu target for KVM_ARM_TARGET_GENERIC_V8

2017-01-16 Thread Shannon Zhao
From: Shannon Zhao 

Signed-off-by: Shannon Zhao 
---
 target/arm/kvm-consts.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/kvm-consts.h b/target/arm/kvm-consts.h
index a2c9518..fc01ac5 100644
--- a/target/arm/kvm-consts.h
+++ b/target/arm/kvm-consts.h
@@ -128,6 +128,7 @@ MISMATCH_CHECK(QEMU_PSCI_RET_DISABLED, PSCI_RET_DISABLED)
 #define QEMU_KVM_ARM_TARGET_CORTEX_A57 2
 #define QEMU_KVM_ARM_TARGET_XGENE_POTENZA 3
 #define QEMU_KVM_ARM_TARGET_CORTEX_A53 4
+#define QEMU_KVM_ARM_TARGET_GENERIC_V8 5
 
 /* There's no kernel define for this: sentinel value which
  * matches no KVM target value for either 64 or 32 bit
@@ -140,6 +141,7 @@ MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_FOUNDATION_V8, KVM_ARM_TARGET_FOUNDATION_V8)
 MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_CORTEX_A57, KVM_ARM_TARGET_CORTEX_A57)
 MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_XGENE_POTENZA, KVM_ARM_TARGET_XGENE_POTENZA)
 MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_CORTEX_A53, KVM_ARM_TARGET_CORTEX_A53)
+MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_GENERIC_V8, KVM_ARM_TARGET_GENERIC_V8)
 #else
 MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_CORTEX_A15, KVM_ARM_TARGET_CORTEX_A15)
 MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_CORTEX_A7, KVM_ARM_TARGET_CORTEX_A7)
-- 
2.0.4
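The MISMATCH_CHECK lines in the patch above keep QEMU's mirrored constants in sync with the kernel's definitions at compile time. A minimal sketch of that idea using C11 `_Static_assert` follows; QEMU's actual macro is defined elsewhere in kvm-consts.h and may use a different mechanism, and the kernel-side value here is a stand-in rather than one taken from real kernel headers.

```c
#include <assert.h>

/* QEMU mirrors kernel ABI constants so it can build even against older
 * headers; a compile-time assertion catches any divergence. */
#define QEMU_KVM_ARM_TARGET_GENERIC_V8 5

/* Stand-in for the value that would come from linux-headers. */
#define KVM_ARM_TARGET_GENERIC_V8 5

/* Sketch of MISMATCH_CHECK: fail the build if the two copies differ. */
#define MISMATCH_CHECK(x, y) _Static_assert((x) == (y), #x " != " #y)

MISMATCH_CHECK(QEMU_KVM_ARM_TARGET_GENERIC_V8, KVM_ARM_TARGET_GENERIC_V8);
```

If the kernel headers ever change the value, the build fails at this line instead of producing a silently wrong target number at runtime.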





[Qemu-devel] [PATCH RFC 0/6] target-arm: KVM64: Cross type vCPU support

2017-01-16 Thread Shannon Zhao
From: Shannon Zhao 

This patch set supports using cross-type vCPUs with KVM on ARM and adds
two new CPU types: generic and cortex-a72.

You can test this patch set with QEMU using
-cpu cortex-a53/cortex-a57/generic/cortex-a72

These patches can be fetched from:
https://git.linaro.org/people/shannon.zhao/qemu.git cross_vcpu_rfc

The corresponding KVM patches can be fetched from:
https://git.linaro.org/people/shannon.zhao/linux-mainline.git cross_vcpu_rfc

Shannon Zhao (6):
  headers: update linux headers
  target: arm: Add the qemu target for KVM_ARM_TARGET_GENERIC_V8
  arm: kvm64: Check if kvm supports cross type vCPU
  target: arm: Add a generic type cpu
  arm: virt: Enable generic type CPU in virt machine
  target-arm: cpu64: Add support for Cortex-A72

 hw/arm/virt.c |   2 +
 linux-headers/asm-arm64/kvm.h |   1 +
 linux-headers/linux/kvm.h |   2 +
 target/arm/cpu64.c| 110 +
 target/arm/kvm-consts.h   |   2 +
 target/arm/kvm64.c| 182 ++
 6 files changed, 299 insertions(+)

-- 
2.0.4





Re: [Qemu-devel] [PULL v2 02/11] ui: use evdev keymap when running under wayland

2017-01-16 Thread Daniel P. Berrange
CC qemu-stable

This patch should go into the active stable branches, given that Wayland is
now the default on some OS distros.

On Mon, Jan 09, 2017 at 02:09:36PM +0100, Gerd Hoffmann wrote:
> From: "Daniel P. Berrange" 
> 
> Wayland always uses evdev as its input source, so QEMU
> can use the existing evdev keymap data
> 
> Signed-off-by: Daniel P. Berrange 
> Tested-by: Stefan Hajnoczi 
> Message-id: 20161201094117.16407-1-berra...@redhat.com
> Signed-off-by: Gerd Hoffmann 
> ---
>  include/ui/gtk.h | 4 
>  ui/gtk.c | 7 +++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/include/ui/gtk.h b/include/ui/gtk.h
> index 42ca0fe..b3b5005 100644
> --- a/include/ui/gtk.h
> +++ b/include/ui/gtk.h
> @@ -18,6 +18,10 @@
>  #include 
>  #endif
>  
> +#ifdef GDK_WINDOWING_WAYLAND
> +#include 
> +#endif
> +
>  #if defined(CONFIG_OPENGL)
>  #include "ui/egl-helpers.h"
>  #include "ui/egl-context.h"
> diff --git a/ui/gtk.c b/ui/gtk.c
> index 406de4f..356f400 100644
> --- a/ui/gtk.c
> +++ b/ui/gtk.c
> @@ -90,6 +90,9 @@
>  #ifndef GDK_IS_X11_DISPLAY
>  #define GDK_IS_X11_DISPLAY(dpy) (dpy == dpy)
>  #endif
> +#ifndef GDK_IS_WAYLAND_DISPLAY
> +#define GDK_IS_WAYLAND_DISPLAY(dpy) (dpy == dpy)
> +#endif
>  #ifndef GDK_IS_WIN32_DISPLAY
>  #define GDK_IS_WIN32_DISPLAY(dpy) (dpy == dpy)
>  #endif
> @@ -1054,6 +1057,10 @@ static int gd_map_keycode(GtkDisplayState *s, GdkDisplay *dpy, int gdk_keycode)
>  qemu_keycode = translate_xfree86_keycode(gdk_keycode - 97);
>  }
>  #endif
> +#ifdef GDK_WINDOWING_WAYLAND
> +} else if (GDK_IS_WAYLAND_DISPLAY(dpy) && gdk_keycode < 158) {
> +qemu_keycode = translate_evdev_keycode(gdk_keycode - 97);
> +#endif
>  } else if (gdk_keycode == 208) { /* Hiragana_Katakana */
>  qemu_keycode = 0x70;
>  } else if (gdk_keycode == 211) { /* backslash */
> -- 
> 1.8.3.1
> 

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|
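The patch nominated for stable above extends gd_map_keycode()'s evdev translation path to Wayland displays, since Wayland always uses evdev as its input source. The sketch below models that translation flow in isolation; the two-entry lookup table is a made-up stand-in, not QEMU's real translate_evdev_keycode() data.

```c
#include <assert.h>

/* Tiny hypothetical stand-in for QEMU's evdev keymap table,
 * indexed by (gdk_keycode - 97). */
static const int evdev_table[] = {
    0x1c,  /* hypothetical entry for GDK keycode 97 */
    0x1d,  /* hypothetical entry for GDK keycode 98 */
};

/* Simplified model of gd_map_keycode()'s evdev branch: keycodes below 9
 * are invalid, codes below 97 map directly as "keycode - 8", and the
 * 97..157 range goes through a translation table.  Anything the table
 * does not cover maps to 0 (untranslatable). */
static int map_keycode_evdev(int gdk_keycode)
{
    int table_len = (int)(sizeof(evdev_table) / sizeof(evdev_table[0]));

    if (gdk_keycode < 9) {
        return 0;
    } else if (gdk_keycode < 97) {
        return gdk_keycode - 8;
    } else if (gdk_keycode < 97 + table_len) {
        return evdev_table[gdk_keycode - 97];
    }
    return 0;
}
```

The bug being fixed is exactly the final `return 0` case: before the patch, Wayland sessions fell through to it for keycodes 97..157 (Home, Page Up, Page Down, End, and so on), so those keys reached the guest as the invalid keycode 0.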



Re: [Qemu-devel] [PATCH v8 15/21] qapi: add qapi2texi script

2017-01-16 Thread Markus Armbruster
Marc-André Lureau  writes:

> As the name suggests, the qapi2texi script converts JSON QAPI
> description into a texi file suitable for different target
> formats (info/man/txt/pdf/html...).
>
> It parses the following kind of blocks:
>
> Free-form:
>
>   ##
>   # = Section
>   # == Subsection
>   #
>   # Some text foo with *emphasis*
>   # 1. with a list
>   # 2. like that
>   #
>   # And some code:
>   # | $ echo foo
>   # | -> do this
>   # | <- get that
>   #
>   ##
>
> Symbol description:
>
>   ##
>   # @symbol:
>   #
>   # Symbol body ditto ergo sum. Foo bar
>   # baz ding.
>   #
>   # @param1: the frob to frobnicate
>   # @param2: #optional how hard to frobnicate
>   #
>   # Returns: the frobnicated frob.
>   #  If frob isn't frobnicatable, GenericError.
>   #
>   # Since: version
>   # Notes: notes, comments can have
>   #- itemized list
>   #- like this
>   #
>   # Example:
>   #
>   # -> { "execute": "quit" }
>   # <- { "return": {} }
>   #
>   ##
>
> That's roughly following the following EBNF grammar:
>
> api_comment = "##\n" comment "##\n"
> comment = freeform_comment | symbol_comment
> freeform_comment = { "# " text "\n" | "#\n" }
> symbol_comment = "# @" name ":\n" { member | tag_section | freeform_comment }
> member = "# @" name ':' [ text ] "\n" freeform_comment
> tag_section = "# " ( "Returns:", "Since:", "Note:", "Notes:", "Example:", 
> "Examples:" ) [ text ]  "\n" freeform_comment
> text = free text with markup
>
> Note that the grammar is ambiguous: a line "# @foo:\n" can be parsed
> both as freeform_comment and as symbol_comment.  The actual parser
> recognizes symbol_comment.
>
> See docs/qapi-code-gen.txt for more details.
>
> Deficiencies and limitations:
> - the generated QMP documentation includes internal types
> - union type support is lacking
> - type information is lacking in generated documentation
> - doc comment error message positions are imprecise, they point
>   to the beginning of the comment.
> - a few minor issues, all marked TODO/FIXME in the code
>
> Signed-off-by: Marc-André Lureau 

Squashing in the appended patch along with updates to expected output to
avoid git complaining about trailing blank lines like this:

tests/qapi-schema/comments.out:7: new blank line at EOF.
tests/qapi-schema/event-case.out:8: new blank line at EOF.
tests/qapi-schema/ident-with-escape.out:10: new blank line at EOF.
tests/qapi-schema/include-relpath.out:7: new blank line at EOF.
tests/qapi-schema/include-repetition.out:7: new blank line at EOF.
tests/qapi-schema/include-simple.out:7: new blank line at EOF.
tests/qapi-schema/indented-expr.out:13: new blank line at EOF.
tests/qapi-schema/qapi-schema-test.out:446: new blank line at EOF.

diff --git a/tests/qapi-schema/test-qapi.py b/tests/qapi-schema/test-qapi.py
index 39b55b9..b4cde4f 100644
--- a/tests/qapi-schema/test-qapi.py
+++ b/tests/qapi-schema/test-qapi.py
@@ -66,4 +66,6 @@ for doc in schema.docs:
 print 'arg=%s\n%s' % (arg, section)
 for section in doc.sections:
 print 'section=%s\n%s' % (section.name, section)
-print 'body=\n%s' % doc.body
+body = str(doc.body)
+if body:
+print 'body=\n%s' % body
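The grammar quoted earlier in this message notes an ambiguity: a line "# @foo:" can start either a freeform_comment or a symbol_comment, and the parser resolves it as symbol_comment. A toy classifier for that first-line decision is sketched below; this is an illustration of the rule, not qapi2texi's actual parser (which is written in Python).

```c
#include <assert.h>
#include <string.h>

/* Classify a doc-comment block by its first content line, following the
 * grammar's rule: symbol_comment = "# @" name ":" ...  A block whose
 * first line does not match that shape is treated as freeform. */
static int is_symbol_comment(const char *first_line)
{
    const char *p, *colon;

    if (strncmp(first_line, "# @", 3) != 0) {
        return 0;
    }
    p = first_line + 3;
    colon = strchr(p, ':');
    /* require a non-empty name before the ':' */
    return colon != NULL && colon > p;
}
```

Lines such as "# = Section" or plain "# text" fall through to the freeform case, matching the note that the real parser prefers the symbol_comment reading whenever the "# @name:" shape is present.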



Re: [Qemu-devel] [PATCH RFC 0/6] target-arm: KVM64: Cross type vCPU support

2017-01-16 Thread no-reply
Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1484558821-15512-1-git-send-email-zhaoshengl...@huawei.com
Subject: [Qemu-devel] [PATCH RFC 0/6] target-arm: KVM64: Cross type vCPU support

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/1484558821-15512-1-git-send-email-zhaoshengl...@huawei.com -> patchew/1484558821-15512-1-git-send-email-zhaoshengl...@huawei.com
Switched to a new branch 'test'
ad5e87b target-arm: cpu64: Add support for Cortex-A72
d4f6c86 arm: virt: Enable generic type CPU in virt machine
db2426e target: arm: Add a generic type cpu
042dff8 arm: kvm64: Check if kvm supports cross type vCPU
d8b8db1 target: arm: Add the qemu target for KVM_ARM_TARGET_GENERIC_V8
71eb07d headers: update linux headers

=== OUTPUT BEGIN ===
Checking PATCH 1/6: headers: update linux headers...
Checking PATCH 2/6: target: arm: Add the qemu target for KVM_ARM_TARGET_GENERIC_V8...
Checking PATCH 3/6: arm: kvm64: Check if kvm supports cross type vCPU...
ERROR: Macros with complex values should be enclosed in parenthesis
#21: FILE: target/arm/kvm64.c:484:
+#define ARM_CPU_ID_MIDR    3, 0, 0, 0, 0

ERROR: Macros with complex values should be enclosed in parenthesis
#24: FILE: target/arm/kvm64.c:487:
+#define ARM_CPU_ID_REVIDR  3, 0, 0, 0, 6

ERROR: Macros with complex values should be enclosed in parenthesis
#25: FILE: target/arm/kvm64.c:488:
+#define ARM_CPU_ID_AIDR    3, 1, 0, 0, 7

ERROR: Macros with complex values should be enclosed in parenthesis
#28: FILE: target/arm/kvm64.c:491:
+#define ARM_CPU_ID_CCSIDR  3, 1, 0, 0, 0

ERROR: Macros with complex values should be enclosed in parenthesis
#29: FILE: target/arm/kvm64.c:492:
+#define ARM_CPU_ID_CLIDR   3, 1, 0, 0, 1

ERROR: Macros with complex values should be enclosed in parenthesis
#30: FILE: target/arm/kvm64.c:493:
+#define ARM_CPU_ID_CSSELR  3, 2, 0, 0, 0

ERROR: Macros with complex values should be enclosed in parenthesis
#31: FILE: target/arm/kvm64.c:494:
+#define ARM_CPU_ID_CTR 3, 3, 0, 0, 1

ERROR: Macros with complex values should be enclosed in parenthesis
#34: FILE: target/arm/kvm64.c:497:
+#define ARM_CPU_ID_PFR0    3, 0, 0, 1, 0

ERROR: Macros with complex values should be enclosed in parenthesis
#35: FILE: target/arm/kvm64.c:498:
+#define ARM_CPU_ID_PFR1    3, 0, 0, 1, 1

ERROR: Macros with complex values should be enclosed in parenthesis
#36: FILE: target/arm/kvm64.c:499:
+#define ARM_CPU_ID_DFR0    3, 0, 0, 1, 2

ERROR: Macros with complex values should be enclosed in parenthesis
#37: FILE: target/arm/kvm64.c:500:
+#define ARM_CPU_ID_AFR0    3, 0, 0, 1, 3

ERROR: Macros with complex values should be enclosed in parenthesis
#38: FILE: target/arm/kvm64.c:501:
+#define ARM_CPU_ID_MMFR0   3, 0, 0, 1, 4

ERROR: Macros with complex values should be enclosed in parenthesis
#39: FILE: target/arm/kvm64.c:502:
+#define ARM_CPU_ID_MMFR1   3, 0, 0, 1, 5

ERROR: Macros with complex values should be enclosed in parenthesis
#40: FILE: target/arm/kvm64.c:503:
+#define ARM_CPU_ID_MMFR2   3, 0, 0, 1, 6

ERROR: Macros with complex values should be enclosed in parenthesis
#41: FILE: target/arm/kvm64.c:504:
+#define ARM_CPU_ID_MMFR3   3, 0, 0, 1, 7

ERROR: Macros with complex values should be enclosed in parenthesis
#42: FILE: target/arm/kvm64.c:505:
+#define ARM_CPU_ID_ISAR0   3, 0, 0, 2, 0

ERROR: Macros with complex values should be enclosed in parenthesis
#43: FILE: target/arm/kvm64.c:506:
+#define ARM_CPU_ID_ISAR1   3, 0, 0, 2, 1

ERROR: Macros with complex values should be enclosed in parenthesis
#44: FILE: target/arm/kvm64.c:507:
+#define ARM_CPU_ID_ISAR2   3, 0, 0, 2, 2

ERROR: Macros with complex values should be enclosed in parenthesis
#45: FILE: target/arm/kvm64.c:508:
+#define ARM_CPU_ID_ISAR3   3, 0, 0, 2, 3

ERROR: Macros with complex values should be enclosed in parenthesis
#46: FILE: target/arm/kvm64.c:509:
+#define ARM_CPU_ID_ISAR4   3, 0, 0, 2, 4

ERROR: Macros with complex values should be enclosed in parenthesis
#47: FILE: target/arm/kvm64.c:510:
+#define ARM_CPU_ID_ISAR5   3, 0, 0, 2, 5

ERROR: Macros with complex values should be enclosed in parenthesis
#48: FILE: target/arm/kvm64.c:511:
+#define ARM_CPU_ID_MMFR4   3, 0, 0, 2, 6

ERROR: Macros with complex values should be enclosed in parenthe

Re: [Qemu-devel] [PULL 00/30] target-sparc sun4v support

2017-01-16 Thread Artyom Tarasenko
On Fri, Jan 13, 2017 at 3:01 PM, Peter Maydell  wrote:
> On 12 January 2017 at 02:55, Richard Henderson  wrote:
>> Mark Cave-Ayland asked me to handle the merge of this patch set
>> while he is traveling.
>>
>> This is the v2 that Artyom posted today.  I had reviewed the majority
>> of v1 earlier.  I re-read through the rebase and saw nothing amiss.
>> It passes my tests for sparc32, and does run the OpenSolaris image
>> to which Artyom links.
>>
>>
>> r~
>>
>>
>> The following changes since commit 41a0e54756a9ae6b60be34bb33302a7e085fdb07:
>>
>>   Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
>> (2017-01-10 10:46:21 +)
>>
>> are available in the git repository at:
>>
>>   git://github.com/rth7680/qemu.git tags/pull-sparc-20170111
>>
>> for you to fetch changes up to 224be7cc93a37ccd38342811a8925de889de1a49:
>>
>>   target-sparc: fix up niagara machine (2017-01-11 12:23:58 -0800)
>>
>> 
>> Sun4v support
>
> This hits clang's unused-function error;
>
> /home/petmay01/linaro/qemu-for-merges/target/sparc/ldst_helper.c:346:20:
> error: unused function 'do_check_asi' [-Werror,-Wunused-function]
> static inline void do_check_asi(CPUSPARCState *env, int asi, uintptr_t ra)
>^
> 1 error generated.
>

My bad. Will add an #ifdef and re-submit.

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu



[Qemu-devel] [Bug 1578192] Re: GTK+ interface doesn't translate keycodes properly with Wayland backend

2017-01-16 Thread Daniel Berrange
This issue was fixed already in git master with:

commit a8ffb372a2202c65f42fdb69891ea68a2f274b55
Author: Daniel P. Berrange 
Date:   Thu Dec 1 09:41:17 2016 +

ui: use evdev keymap when running under wayland

Wayland always uses evdev as its input source, so QEMU
can use the existing evdev keymap data

Signed-off-by: Daniel P. Berrange 
Tested-by: Stefan Hajnoczi 
Message-id: 20161201094117.16407-1-berra...@redhat.com
Signed-off-by: Gerd Hoffmann 


** Changed in: qemu
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1578192

Title:
  GTK+ interface doesn't translate keycodes properly with Wayland
  backend

Status in QEMU:
  Fix Committed

Bug description:
  I already posted this on the mailing list
  (https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg00119.html) but I decided to do a formal bug report
  so it can be tracked and doesn't get lost.

  ... I'm no expert, but it looks like GTK+ key events come in at the
  ui/gtk.c:gd_key_event callback function, which calls
  ui/gtk.c:gd_map_keycode to translate the GTK+ keycode into the Qemu
  keycode before sending it on using qemu_input_event_send_key_number.
  The problem is that gd_map_keycode is incomplete when GTK+ is running
  on a backend other than X11.

  static int gd_map_keycode(GtkDisplayState *s, GdkDisplay *dpy, int gdk_keycode)
  {
  int qemu_keycode;

  #ifdef GDK_WINDOWING_WIN32
  if (GDK_IS_WIN32_DISPLAY(dpy)) {
  qemu_keycode = MapVirtualKey(gdk_keycode, MAPVK_VK_TO_VSC);
  switch (qemu_keycode) {
  case 103:   /* alt gr */
  qemu_keycode = 56 | SCANCODE_GREY;
  break;
  }
  return qemu_keycode;
  }
  #endif

  if (gdk_keycode < 9) {
  qemu_keycode = 0;
  } else if (gdk_keycode < 97) {
  qemu_keycode = gdk_keycode - 8;
  #ifdef GDK_WINDOWING_X11
  } else if (GDK_IS_X11_DISPLAY(dpy) && gdk_keycode < 158) {
  if (s->has_evdev) {
  qemu_keycode = translate_evdev_keycode(gdk_keycode - 97);
  } else {
  qemu_keycode = translate_xfree86_keycode(gdk_keycode - 97);
  }
  #endif
  } else if (gdk_keycode == 208) { /* Hiragana_Katakana */
  qemu_keycode = 0x70;
  } else if (gdk_keycode == 211) { /* backslash */
  qemu_keycode = 0x73;
  } else {
  qemu_keycode = 0;
  }

  return qemu_keycode;
  }

  In my case, I'm using GTK+'s Wayland backend, so keycodes 97 through
  157 (this includes KEY_HOME(102), KEY_PAGEUP(104), KEY_PAGEDOWN(109),
  KEY_END(107), etc.) are never translated into a qemu_keycode, and the
  final 'else' block is hit, causing gd_map_keycode to return 0, which
  is an invalid keycode and thus cannot be handled by xen-kbdfront. At
  least that's my best guess as to what is happening.

  The solution that comes to mind is provide an alternative to
  translate_{evdev,xfree86}_keycode that is compatable with
  Wayland/libinput, but I don't know exactly which API would provide
  this functionality, much less do I have a patch. Intuition tells me
  that translate_evdev_keycode would probably work under Wayland because
  Weston uses libinput which uses evdev as its backend, but I don't know
  this for a fact, and I don't know if it would be the Right Way (i.e.
  Wayland or libinput might provide an API for this purpose, but I don't
  know).

  I may try to do some testing with translate_evdev_keycode on Wayland
  and also look into any possible APIs for keycode translation, but I
  just wanted to put it out there and get some feedback awhile.

  Qemu 2.2.1 from Xen 4.6.1 (relevant code appears unchanged in Qemu master)
  GTK+ 3.18.7
  Wayland 1.9.0
  Weston 1.9.0
  libinput 1.2.3

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1578192/+subscriptions



Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices

2017-01-16 Thread Jason Wang



On 16/01/2017 17:18, Peter Xu wrote:

  static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
hwaddr addr, uint8_t am)
  {
@@ -1222,6 +1251,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
  info.addr = addr;
  info.mask = ~((1 << am) - 1);
  g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
+vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);

Is the case of a GLOBAL or DSI flush missed, or do we not care about it at all?

IMHO we don't. For device assignment, since we have CM=1 here,
we should get explicit page invalidations even if the guest sends
global/domain invalidations.

Thanks,

-- peterx


Is this required by the spec? BTW, it looks to me that both DSI and GLOBAL are 
indeed explicit flushes.


I just had a quick look through the driver code and found something 
interesting in intel_iommu_flush_iotlb_psi():


...
/*
 * Fallback to domain selective flush if no PSI support or the size is
 * too big.
 * PSI requires page size to be 2 ^ x, and the base address is naturally
 * aligned to the size
 */
if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
iommu->flush.flush_iotlb(iommu, did, 0, 0,
DMA_TLB_DSI_FLUSH);
else
iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
DMA_TLB_PSI_FLUSH);
...

It looks like DSI_FLUSH is possible even for CM on.

And in flush_unmaps():

...
/* In caching mode, global flushes turn emulation expensive */
if (!cap_caching_mode(iommu->cap))
iommu->flush.flush_iotlb(iommu, 0, 0, 0,
 DMA_TLB_GLOBAL_FLUSH);
...

If I understand the comments correctly, GLOBAL is OK for CM too (though 
the code does not do it, for performance reasons).


Thanks
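For reference, the PSI range semantics discussed in this thread come from the "am" (address-mask) field: an invalidation covers 2^am pages, and the earlier hunk computes info.mask = ~((1 << am) - 1) to match cached entries against it. The sketch below models just that containment test on guest frame numbers; it is an illustration of the mask arithmetic, not QEMU's vtd_hash_remove_by_page() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Does a page-selective invalidation rooted at inv_gfn with address-mask
 * value "am" (i.e. covering 2^am pages) include the page "gfn"?
 * Mirrors the computation: info.mask = ~((1 << am) - 1). */
static int psi_covers(uint64_t inv_gfn, uint8_t am, uint64_t gfn)
{
    uint64_t mask = ~(((uint64_t)1 << am) - 1);

    return (gfn & mask) == (inv_gfn & mask);
}
```

With am = 0 the request names exactly one page; the driver's fallback quoted above kicks in precisely when the range is not a naturally aligned power-of-two block, in which case it issues a DSI flush for the whole domain instead.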



Re: [Qemu-devel] [PATCH] pflash_cfi01: fix per device sector length in CFI table

2017-01-16 Thread Dr. David Alan Gilbert
* Andrew Jones (drjo...@redhat.com) wrote:
> On Thu, Jan 12, 2017 at 10:42:41AM +, Peter Maydell wrote:
> > On 12 January 2017 at 10:35, David Engraf  wrote:
> > > The CFI entry for sector length must be set to sector length per device.
> > > This is important for boards using multiple devices like the ARM Vexpress
> > > board (width = 4, device-width = 2).
> > >
> > > Linux and u-boot calculate the size ratio by dividing both values:
> > >
> > > size_ratio = info->portwidth / info->chipwidth;
> > >
> > > After that the sector length will be multiplied by the size_ratio, thus 
> > > the
> > > CFI entry for sector length is doubled. When Linux or u-boot send a sector
> > > erase, they expect to erase the doubled sector length, but QEMU only 
> > > erases
> > > the board specified sector length.
> > >
> > > This patch fixes the sector length in the CFI table to match the length 
> > > per
> > > device, equal to blocks_per_device.
> > 
> > Thanks for the patch. I haven't checked against the pflash spec yet,
> > but this looks like it's probably the right thing.
> > 
> > The only two machines which use a setup with multiple devices (ie
> > which specify device_width to the pflash_cfi01) are vexpress and virt.
> > For all other machines this patch leaves the behaviour unchanged.
> > 
> > Q: do we need to have some kind of nasty hack so that pre-2.9 virt
> > still gets the old broken values in the CFI table, for version and
> > migration compatibility? Ccing Drew for an opinion...
> >
> 
> I'm pretty sure we need the nasty hack, but I'm also Ccing David for
> his opinion.

Hmm I don't understand enough about pflash to understand the change here;
but generally if you need to keep something unchanged for older
machine types, add a property and then set that property in
the compatibility macros.

See include/hw/compat.h. I think you'd add the entry to HW_COMPAT_2_8
and follow a simple example like old_msi_addr in intel-hda.c;
that should get picked up by the aarch, x86, ppc etc. versioned machine types.

Dave


> Thanks,
> drew
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
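The arithmetic behind the pflash bug discussed above is small enough to state directly: the guest reads the per-device sector length from the CFI table and multiplies it by size_ratio = portwidth / chipwidth, so if QEMU advertises the full-bank sector length instead of the per-device one, the guest's computed erase size comes out doubled on boards like vexpress (width 4, device-width 2). A sketch of that guest-side computation, with illustrative sizes:

```c
#include <assert.h>
#include <stdint.h>

/* Guest-side view (as in Linux/u-boot): scale the CFI-advertised
 * per-device sector length by the bank-to-chip width ratio to get the
 * erase size it will request. */
static uint32_t guest_erase_size(uint32_t cfi_sector_len,
                                 uint32_t portwidth, uint32_t chipwidth)
{
    uint32_t size_ratio = portwidth / chipwidth;

    return cfi_sector_len * size_ratio;
}
```

With a hypothetical 256 KiB (0x40000) per-device sector and the vexpress ratio of 2, the guest asks to erase 512 KiB; before the patch QEMU both advertised the doubled value and erased only the board sector length, so the two sides disagreed.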



Re: [Qemu-devel] [PATCH v4 6/6] hypertrace: Add guest-side Linux module

2017-01-16 Thread Stefan Hajnoczi
On Sun, Jan 15, 2017 at 03:10:13AM +0100, Lluís Vilanova wrote:
> Stefan Hajnoczi writes:
> > On Mon, Dec 26, 2016 at 09:35:10PM +0100, Lluís Vilanova wrote:
> >> Provides guest Linux kernel module "qemu-hypertrace.ko" to abstract
> >> access to the hypertrace channel.
> >> 
> >> Signed-off-by: Lluís Vilanova 
> >> ---
> >> Makefile   |4 -
> >> configure  |4 +
> >> hypertrace/guest/linux-module/Kbuild.in|7 +
> >> hypertrace/guest/linux-module/Makefile |   23 +++
> >> .../include/linux/qemu-hypertrace-internal.h   |   46 ++
> >> .../linux-module/include/linux/qemu-hypertrace.h   |   73 ++
> >> hypertrace/guest/linux-module/qemu-hypertrace.c|  149 
> >> 
> >> 7 files changed, 305 insertions(+), 1 deletion(-)
> >> create mode 100644 hypertrace/guest/linux-module/Kbuild.in
> >> create mode 100644 hypertrace/guest/linux-module/Makefile
> >> create mode 100644 
> >> hypertrace/guest/linux-module/include/linux/qemu-hypertrace-internal.h
> >> create mode 100644 
> >> hypertrace/guest/linux-module/include/linux/qemu-hypertrace.h
> >> create mode 100644 hypertrace/guest/linux-module/qemu-hypertrace.c
> 
> > This should be submitted to Linux, the module cannot live in the QEMU
> > tree (we don't carry out-of-tree kernel modules).
> 
> I thought it made sense to have it here, the same way we have the user-mode
> guest library. If not, I will split it into a separate repo and point at it 
> from
> the documentation file. Once the rest is merged into QEMU, I can post the 
> kernel
> module repo to the linux list.

Linux doesn't have a stable kernel module API.  Out-of-tree modules
bitrot if they are not actively rebased.  Getting the module into
linux.git avoids these problems.

The Linux maintainers may give you feedback that requires changes to the
current design, so please send the patch to them now instead of waiting.

Stefan




Re: [Qemu-devel] [PATCH 0/2] virtio: revert virtio_queue_set_notification() nesting

2017-01-16 Thread Stefan Hajnoczi
On Fri, Jan 13, 2017 at 09:15:49AM -0600, Doug Goldstein wrote:
> On 1/13/17 6:02 AM, Stefan Hajnoczi wrote:
> > On Thu, Jan 12, 2017 at 10:57:53AM -0600, Doug Goldstein wrote:
> >> On 1/12/17 5:46 AM, Stefan Hajnoczi wrote:
> >>> The virtio_queue_set_notification() nesting introduced for AioContext 
> >>> polling
> >>> raised an assertion with virtio-net (even in non-polling mode).  
> >>> Converting
> >>> virtio-net and virtio-crypto to use virtio_queue_set_notification() in a
> >>> nesting fashion would be invasive and isn't worth it.
> >>>
> >>> Patch 1 contains the revert to resolve the bug that Doug noticed.
> >>>
> >>> Patch 2 is a less efficient but safe alternative.
> >>>
> >>> Stefan Hajnoczi (2):
> >>>   Revert "virtio: turn vq->notification into a nested counter"
> >>>   virtio: disable notifications again after poll succeeded
> >>>
> >>>  hw/virtio/virtio.c | 21 +
> >>>  1 file changed, 9 insertions(+), 12 deletions(-)
> >>>
> >>
> >> So I just gave this series a whirl and it fixes the assert but causes
> >> another issue for me. While iPXE is getting a DHCP address the screen
> >> immediately flashes over to the UEFI shell. It's like a timeout is
> >> getting hit and just dropping me to the shell.
> > 
> > Sounds like a separate problem.
> > 
> > Stefan
> > 
> 
> Is there any debug output that I can provide to help troubleshoot it?
> I've built 23425cc and UEFI via OVMF is able to get an IP address via
> DHCP inside of iPXE. I've also taken master and only applied the first
> patch in this series (the revert) and it too works. It's only when I add
> the 2nd patch into the mix or don't revert out the "virtio: turn
> vq->notification into a nested counter" patch that it fails.

The code in Patch 2 should not be executed in your QEMU configuration,
so I wonder how Patch 2 can cause the DHCP failure.

Please verify as follows:

  $ gdb --args path/to/qemu-system-x86_64 ...
  (gdb) handle SIGUSR1 noprint nostop pass
  (gdb) handle SIGPIPE noprint nostop pass
  (gdb) b virtio_queue_host_notifier_aio_poll_begin
  (gdb) r

I predict the breakpoint will not be hit.

Stefan




Re: [Qemu-devel] [PATCH] target-i386: Remove AMD feature flag aliases from Opteron models

2017-01-16 Thread Igor Mammedov
On Fri, 13 Jan 2017 17:00:57 -0200
Eduardo Habkost  wrote:

> When CPU vendor is set to AMD, the AMD feature alias bits on
> CPUID[0x8001].EDX are already automatically copied from CPUID[1].EDX
> on x86_cpu_realizefn(). When CPU vendor is Intel, those bits are
> reserved and should be zero. On either case, those bits shouldn't be set
> in the CPU model table.
> 
> Commit 726a8ff68677d8d5fba17eb0ffb85076bfb598dc removed those
> bits from most CPU models, but the Opteron_* entries still have
> them. Remove the alias bits from Opteron_* too.
> 
> Add an assert() to x86_register_cpudef_type() to ensure we don't
> make the same mistake again.
> 
> Signed-off-by: Eduardo Habkost 
Reviewed-by: Igor Mammedov 

> ---
>  target/i386/cpu.c | 46 --
>  1 file changed, 12 insertions(+), 34 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index a149c8dc42..af726a8fd0 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1339,12 +1339,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  .features[FEAT_1_ECX] =
>  CPUID_EXT_SSE3,
>  .features[FEAT_8000_0001_EDX] =
> -CPUID_EXT2_LM | CPUID_EXT2_FXSR | CPUID_EXT2_MMX |
> -CPUID_EXT2_NX | CPUID_EXT2_PSE36 | CPUID_EXT2_PAT |
> -CPUID_EXT2_CMOV | CPUID_EXT2_MCA | CPUID_EXT2_PGE |
> -CPUID_EXT2_MTRR | CPUID_EXT2_SYSCALL | CPUID_EXT2_APIC |
> -CPUID_EXT2_CX8 | CPUID_EXT2_MCE | CPUID_EXT2_PAE | 
> CPUID_EXT2_MSR |
> -CPUID_EXT2_TSC | CPUID_EXT2_PSE | CPUID_EXT2_DE | CPUID_EXT2_FPU,
> +CPUID_EXT2_LM | CPUID_EXT2_NX | CPUID_EXT2_SYSCALL,
>  .xlevel = 0x8008,
>  .model_id = "AMD Opteron 240 (Gen 1 Class Opteron)",
>  },
> @@ -1365,13 +1360,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  CPUID_EXT_CX16 | CPUID_EXT_SSE3,
>  /* Missing: CPUID_EXT2_RDTSCP */
>  .features[FEAT_8000_0001_EDX] =
> -CPUID_EXT2_LM | CPUID_EXT2_FXSR |
> -CPUID_EXT2_MMX | CPUID_EXT2_NX | CPUID_EXT2_PSE36 |
> -CPUID_EXT2_PAT | CPUID_EXT2_CMOV | CPUID_EXT2_MCA |
> -CPUID_EXT2_PGE | CPUID_EXT2_MTRR | CPUID_EXT2_SYSCALL |
> -CPUID_EXT2_APIC | CPUID_EXT2_CX8 | CPUID_EXT2_MCE |
> -CPUID_EXT2_PAE | CPUID_EXT2_MSR | CPUID_EXT2_TSC | 
> CPUID_EXT2_PSE |
> -CPUID_EXT2_DE | CPUID_EXT2_FPU,
> +CPUID_EXT2_LM | CPUID_EXT2_NX | CPUID_EXT2_SYSCALL,
>  .features[FEAT_8000_0001_ECX] =
>  CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
>  .xlevel = 0x8008,
> @@ -1395,13 +1384,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  CPUID_EXT_SSE3,
>  /* Missing: CPUID_EXT2_RDTSCP */
>  .features[FEAT_8000_0001_EDX] =
> -CPUID_EXT2_LM | CPUID_EXT2_FXSR |
> -CPUID_EXT2_MMX | CPUID_EXT2_NX | CPUID_EXT2_PSE36 |
> -CPUID_EXT2_PAT | CPUID_EXT2_CMOV | CPUID_EXT2_MCA |
> -CPUID_EXT2_PGE | CPUID_EXT2_MTRR | CPUID_EXT2_SYSCALL |
> -CPUID_EXT2_APIC | CPUID_EXT2_CX8 | CPUID_EXT2_MCE |
> -CPUID_EXT2_PAE | CPUID_EXT2_MSR | CPUID_EXT2_TSC | 
> CPUID_EXT2_PSE |
> -CPUID_EXT2_DE | CPUID_EXT2_FPU,
> +CPUID_EXT2_LM | CPUID_EXT2_NX | CPUID_EXT2_SYSCALL,
>  .features[FEAT_8000_0001_ECX] =
>  CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A |
>  CPUID_EXT3_ABM | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
> @@ -1428,13 +1411,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  CPUID_EXT_SSE3,
>  /* Missing: CPUID_EXT2_RDTSCP */
>  .features[FEAT_8000_0001_EDX] =
> -CPUID_EXT2_LM |
> -CPUID_EXT2_PDPE1GB | CPUID_EXT2_FXSR | CPUID_EXT2_MMX |
> -CPUID_EXT2_NX | CPUID_EXT2_PSE36 | CPUID_EXT2_PAT |
> -CPUID_EXT2_CMOV | CPUID_EXT2_MCA | CPUID_EXT2_PGE |
> -CPUID_EXT2_MTRR | CPUID_EXT2_SYSCALL | CPUID_EXT2_APIC |
> -CPUID_EXT2_CX8 | CPUID_EXT2_MCE | CPUID_EXT2_PAE | 
> CPUID_EXT2_MSR |
> -CPUID_EXT2_TSC | CPUID_EXT2_PSE | CPUID_EXT2_DE | CPUID_EXT2_FPU,
> +CPUID_EXT2_LM | CPUID_EXT2_PDPE1GB | CPUID_EXT2_NX |
> +CPUID_EXT2_SYSCALL,
>  .features[FEAT_8000_0001_ECX] =
>  CPUID_EXT3_FMA4 | CPUID_EXT3_XOP |
>  CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_MISALIGNSSE |
> @@ -1464,13 +1442,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
>  CPUID_EXT_SSSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSE3,
>  /* Missing: CPUID_EXT2_RDTSCP */
>  .features[FEAT_8000_0001_EDX] =
> -CPUID_EXT2_LM |
> -CPUID_EXT2_PDPE1GB | CPUID_EXT2_FXSR | CPUID_EXT2_MMX |
> -CPUID_EXT2_NX | CPUID_EXT2_PSE36 | CPUID_EXT2_PAT |
> -CPUID_EXT2_CMOV | CPUID_EXT2_MCA | CPUID_EXT2_PGE |
> -CPUID_EXT2_MTRR | CPUID_EXT2_SYSCALL | CPUID_EXT2_APIC |
> -

Re: [Qemu-devel] [PATCH] hw/i386: check if nvdimm is enabled before plugging

2017-01-16 Thread Stefan Hajnoczi
On Mon, Jan 16, 2017 at 01:55:34PM +0800, Xiao Guangrong wrote:
> 
> 
> On 01/14/2017 02:02 AM, Eduardo Habkost wrote:
> > On Fri, Jan 13, 2017 at 01:17:27PM +, Stefan Hajnoczi wrote:
> > > On Fri, Jan 13, 2017 at 07:56:51PM +0800, Haozhong Zhang wrote:
> > > > The missing of 'nvdimm' in the machine type option '-M' means NVDIMM
> > > > is disabled. QEMU should refuse to plug any NVDIMM device in this case
> > > > and report the misconfiguration.
> > > > 
> > > > Reported-by: Stefan Hajnoczi 
> > > > Signed-off-by: Haozhong Zhang 
> > > > Message-Id: 20170112110928.GF4621@stefanha-x1.localdomain
> > > > Message-Id: 20170111093630.2088-1-stefa...@redhat.com
> > > > ---
> > > >  hw/i386/pc.c | 5 +
> > > >  1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > > index 25e8586..3907609 100644
> > > > --- a/hw/i386/pc.c
> > > > +++ b/hw/i386/pc.c
> > > > @@ -1715,6 +1715,11 @@ static void pc_dimm_plug(HotplugHandler 
> > > > *hotplug_dev,
> > > >  }
> > > > 
> > > >  if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
> > > > +if (!pcms->acpi_nvdimm_state.is_enabled) {
> > > > +error_setg(&local_err,
> > > > +   "nvdimm is not enabled: missing 'nvdimm' in 
> > > > '-M'");
> > > > +goto out;
> > > > +}
> > > 
> > > A warning is definitely useful to notify users of a possible
> > > configuration error.
> > > 
> > > I wonder what happens when you plug an NVDIMM into a motherboard where
> > > the firmware lacks support.  Does it:
> > >  * Refuse to boot?
> > >  * Treat the DIMM as regular RAM?
> > >  * Boot but the DIMM will not be used by firmware and kernel?
> > > 
> > > QEMU should act the same way as real hardware.
> > 
> > If real hardware behavior is not useful in any way (e.g. first
> > and third options above), is there a good reason for QEMU to not
> > implement an additional safety mechanism preventing NVDIMM from
> > being connected to a machine that doesn't support it?
> > 
> 
> Yes, I agree with Eduardo.
> 
> For real hardware the behavior may differ between vendors; we
> are asking our HW people to check what will happen on Intel's hardware
> in this case.

Let's find out what real hardware/firmware does.  I guess it's the Intel
MRC component that handles memory initialization.

Stefan




Re: [Qemu-devel] [PATCH 01/16] aio: introduce aio_co_schedule and aio_co_wake

2017-01-16 Thread Fam Zheng
On Fri, 01/13 14:17, Paolo Bonzini wrote:
> aio_co_wake provides the infrastructure to start a coroutine on a "home"
> AioContext.  It will be used by CoMutex and CoQueue, so that coroutines
> don't jump from one context to another when they go to sleep on a
> mutex or waitqueue.  However, it can also be used as a more efficient
> alternative to one-shot bottom halves, and saves the effort of tracking
> which AioContext a coroutine is running on.
> 
> aio_co_schedule is the part of aio_co_wake that starts a coroutine
> on a remove AioContext, but it is also useful to implement e.g.

s/remove/remote/ and maybe s/but/and/ ?

> bdrv_set_aio_context callbacks.
> 
> The implementation of aio_co_schedule is based on a lock-free
> multiple-producer, single-consumer queue.  The multiple producers use
> cmpxchg to add to a LIFO stack.  The consumer (a per-AioContext bottom
> half) grabs all items added so far, inverts the list to make it FIFO,
> and goes through it one item at a time until it's empty.  The data
> structure was inspired by OSv, which uses it in the very code we'll
> "port" to QEMU for the thread-safe CoMutex.
> 
> Most of the new code is really tests.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  async.c  |  65 +
>  include/block/aio.h  |  32 +++
>  include/qemu/coroutine_int.h |  10 +-
>  tests/Makefile.include   |  13 ++-
>  tests/iothread.c |  91 ++
>  tests/iothread.h |  25 +
>  tests/test-aio-multithread.c | 213 
> +++
>  tests/test-vmstate.c |  11 ---
>  trace-events |   4 +
>  util/qemu-coroutine.c|   8 ++
>  10 files changed, 456 insertions(+), 16 deletions(-)
>  create mode 100644 tests/iothread.c
>  create mode 100644 tests/iothread.h
>  create mode 100644 tests/test-aio-multithread.c
> 
> diff --git a/async.c b/async.c
> index 0d218ab..1338682 100644
> --- a/async.c
> +++ b/async.c
> @@ -30,6 +30,8 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/atomic.h"
>  #include "block/raw-aio.h"
> +#include "trace/generated-tracers.h"
> +#include "qemu/coroutine_int.h"
>  
>  /***/
>  /* bottom halves (can be seen as timers which expire ASAP) */
> @@ -274,6 +276,9 @@ aio_ctx_finalize(GSource *source)
>  }
>  #endif
>  
> +assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
> +qemu_bh_delete(ctx->co_schedule_bh);
> +
>  qemu_lockcnt_lock(&ctx->list_lock);
>  assert(!qemu_lockcnt_count(&ctx->list_lock));
>  while (ctx->first_bh) {
> @@ -363,6 +368,28 @@ static bool event_notifier_poll(void *opaque)
>  return atomic_read(&ctx->notified);
>  }
>  
> +static void co_schedule_bh_cb(void *opaque)
> +{
> +AioContext *ctx = opaque;
> +QSLIST_HEAD(, Coroutine) straight, reversed;
> +
> +QSLIST_MOVE_ATOMIC(&reversed, &ctx->scheduled_coroutines);
> +QSLIST_INIT(&straight);

Worth special-casing the one-element case?

> +
> +while (!QSLIST_EMPTY(&reversed)) {
> +Coroutine *co = QSLIST_FIRST(&reversed);
> +QSLIST_REMOVE_HEAD(&reversed, co_scheduled_next);
> +QSLIST_INSERT_HEAD(&straight, co, co_scheduled_next);
> +}
> +
> +while (!QSLIST_EMPTY(&straight)) {
> +Coroutine *co = QSLIST_FIRST(&straight);
> +QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
> +trace_aio_co_schedule_bh_cb(ctx, co);
> +qemu_coroutine_enter(co);
> +}
> +}
> +
> diff --git a/tests/iothread.c b/tests/iothread.c
> new file mode 100644
> index 000..777d9ee
> --- /dev/null
> +++ b/tests/iothread.c
> @@ -0,0 +1,91 @@
> +/*
> + * Event loop thread implementation for unit tests

Curious: what is preventing us from using (perhaps after enhancing it) the
top-level iothread.c implementation?

> + *
> + * Copyright Red Hat Inc., 2013, 2016
> + *
> + * Authors:
> + *  Stefan Hajnoczi   
> + *  Paolo Bonzini 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "block/aio.h"
> +#include "qemu/main-loop.h"
> +#include "qemu/rcu.h"
> +#include "iothread.h"
> +
> +struct IOThread {
> +AioContext *ctx;
> +
> +QemuThread thread;
> +QemuMutex init_done_lock;
> +QemuCond init_done_cond;/* is thread initialization done? */
> +bool stopping;
> +};
> +
> +static __thread IOThread *my_iothread;
> +
> +AioContext *qemu_get_current_aio_context(void)
> +{
> +return my_iothread ? my_iothread->ctx : qemu_get_aio_context();
> +}
> +
> +static void *iothread_run(void *opaque)
> +{
> +IOThread *iothread = opaque;
> +
> +rcu_register_thread();
> +
> +my_iothread = iothread;
> +qemu_mutex_lock(&iothread->init_done_lock);
> +iothread->ctx = aio_context_new(&error_abort);
> +qemu_cond_signal(&iothread->init_done_cond);
> +qe
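The lock-free scheme the commit message describes — producers pushing onto a LIFO with cmpxchg, and the single consumer (the per-AioContext bottom half) grabbing the whole stack at once and inverting it to FIFO — can be sketched in standalone C11; this is an illustration of the technique, not QEMU's actual QSLIST code, and Node stands in for Coroutine:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Node stands in for Coroutine; next is the co_scheduled_next link. */
typedef struct Node {
    struct Node *next;
    int id;
} Node;

static _Atomic(Node *) head;   /* the shared LIFO stack */

/* Multiple producers: push with a compare-and-swap loop. */
static void push(Node *n)
{
    Node *old = atomic_load(&head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&head, &old, n));
}

/* Single consumer: take the whole stack atomically (the analogue of
 * QSLIST_MOVE_ATOMIC), then reverse it so items come out in
 * submission (FIFO) order. */
static Node *take_all_fifo(void)
{
    Node *lifo = atomic_exchange(&head, NULL);
    Node *fifo = NULL;
    while (lifo) {
        Node *n = lifo;
        lifo = n->next;
        n->next = fifo;
        fifo = n;
    }
    return fifo;
}
```

Taking the whole stack with one atomic exchange keeps the consumer wait-free with respect to producers; the O(n) reversal then restores submission order before each item (coroutine) is processed.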

Re: [Qemu-devel] [PATCH 04/16] io: add methods to set I/O handlers on AioContext

2017-01-16 Thread Fam Zheng
On Fri, 01/13 14:17, Paolo Bonzini wrote:
> This is in preparation for making qio_channel_yield work on
> AioContexts other than the main one.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  include/io/channel.h | 30 ++
>  io/channel-command.c | 13 +
>  io/channel-file.c| 11 +++
>  io/channel-socket.c  | 16 +++-
>  io/channel-tls.c | 12 
>  io/channel-watch.c   |  6 ++
>  io/channel.c | 11 +++
>  7 files changed, 94 insertions(+), 5 deletions(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 32a9470..665edd7 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -23,6 +23,7 @@
>  
>  #include "qemu-common.h"
>  #include "qom/object.h"
> +#include "block/aio.h"
>  
>  #define TYPE_QIO_CHANNEL "qio-channel"
>  #define QIO_CHANNEL(obj)\
> @@ -58,6 +59,8 @@ typedef gboolean (*QIOChannelFunc)(QIOChannel *ioc,
> GIOCondition condition,
> gpointer data);
>  
> +typedef struct QIOChannelRestart QIOChannelRestart;
> +
>  /**
>   * QIOChannel:
>   *
> @@ -80,6 +83,9 @@ struct QIOChannel {
>  Object parent;
>  unsigned int features; /* bitmask of QIOChannelFeatures */
>  char *name;
> +AioContext *ctx;
> +QIOChannelRestart *read_coroutine;
> +QIOChannelRestart *write_coroutine;
>  #ifdef _WIN32
>  HANDLE event; /* For use with GSource on Win32 */
>  #endif
> @@ -132,6 +138,11 @@ struct QIOChannelClass {
>   off_t offset,
>   int whence,
>   Error **errp);
> +void (*io_set_aio_fd_handler)(QIOChannel *ioc,
> +  AioContext *ctx,
> +  IOHandler *io_read,
> +  IOHandler *io_write,
> +  void *opaque);
>  };
>  
>  /* General I/O handling functions */
> @@ -525,4 +536,23 @@ void qio_channel_yield(QIOChannel *ioc,
>  void qio_channel_wait(QIOChannel *ioc,
>GIOCondition condition);
>  
> +/**
> + * qio_channel_set_aio_fd_handler:
> + * @ioc: the channel object
> + * @ctx: the AioContext to set the handlers on
> + * @io_read: the read handler
> + * @io_write: the write handler
> + * @opaque: the opaque value passed to the handler
> + *
> + * This is used internally by qio_channel_yield().  It can
> + * be used by channel implementations to forward the handlers
> + * to another channel (e.g. from #QIOChannelTLS to the
> + * underlying socket).
> + */
> +void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
> +AioContext *ctx,
> +IOHandler *io_read,
> +IOHandler *io_write,
> +void *opaque);
> +
>  #endif /* QIO_CHANNEL_H */
> diff --git a/io/channel-command.c b/io/channel-command.c
> index ad25313..4000b61 100644
> --- a/io/channel-command.c
> +++ b/io/channel-command.c
> @@ -328,6 +328,18 @@ static int qio_channel_command_close(QIOChannel *ioc,
>  }
>  
>  
> +static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc,
> +AioContext *ctx,
> +IOHandler *io_read,
> +IOHandler *io_write,
> +void *opaque)

Alignment is a bit off.

> +{
> +QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
> +aio_set_fd_handler(ctx, cioc->readfd, false, io_read, NULL, NULL, 
> opaque);
> +aio_set_fd_handler(ctx, cioc->writefd, false, NULL, io_write, NULL, 
> opaque);
> +}
> +
> +
>  static GSource *qio_channel_command_create_watch(QIOChannel *ioc,
>   GIOCondition condition)
>  {
> @@ -349,6 +361,7 @@ static void qio_channel_command_class_init(ObjectClass 
> *klass,
>  ioc_klass->io_set_blocking = qio_channel_command_set_blocking;
>  ioc_klass->io_close = qio_channel_command_close;
>  ioc_klass->io_create_watch = qio_channel_command_create_watch;
> +ioc_klass->io_set_aio_fd_handler = 
> qio_channel_command_set_aio_fd_handler;
>  }
>  
>  static const TypeInfo qio_channel_command_info = {
> diff --git a/io/channel-file.c b/io/channel-file.c
> index e1da243..b383273 100644
> --- a/io/channel-file.c
> +++ b/io/channel-file.c
> @@ -186,6 +186,16 @@ static int qio_channel_file_close(QIOChannel *ioc,
>  }
>  
>  
> +static void qio_channel_file_set_aio_fd_handler(QIOChannel *ioc,
> +AioContext *ctx,
> +IOHandler *io_read,
> +IOHandler *io_write,
> +void *opaque)
> +{
> +QIOC

[Qemu-devel] [PATCH v6 0/5] Introduce a new --only-migratable option

2017-01-16 Thread Ashijeet Acharya
Previously posted series patches:
http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg02324.html
http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg01277.html
http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg00320.html
http://lists.nongnu.org/archive/html/qemu-devel/2016-12/msg02391.html
http://lists.nongnu.org/archive/html/qemu-devel/2016-12/msg02062.html

This series adds a new command-line option, "--only-migratable", which allows
only migratable devices to be added to a QEMU instance, so that QEMU does not
abruptly fail after migration.

Patch 1 is a preparatory patch to remove an undesirable comment and make patchew
happy.

Patch 2 adds the new option "-only-migratable".

Patch 3 makes the check cover the various "device add" options, both on the
command line and via hotplug.

Patch 4 makes migrate_add_blocker fail if a migration is already in
progress, since it can no longer be blocked.
Note: This patch was originally written by John Snow and I have only made a few
changes.

Patch 5 handles the special case of devices which become unmigratable
dynamically by calling "migrate_add_blocker". Here we fail the
migration blocker if --only-migratable was specified.
E.g.: 9pfs fails to mount the filesystem.

Note: I have not been able to compile and test the ARM KVM drivers. They
are:
hw/intc/arm_gic_kvm.c
hw/intc/arm_gicv3_its_kvm.c
hw/intc/arm_gicv3_kvm.c

Changes in v6:
-make Error *local_err = NULL at some places
-free reason at call sites
-rebase
-pass error_copy(reason) in error_propagate()

Changes in v5:
-drop =0 for global variable (Dave)
-print error message if klass=NULL (Dave)
-deal with migration blocker before pdu_marshal (Greg)
-call migrate_add_blocker before kvm_create_device (Dave)
-fix typo in subject line for 4/4 (Eric)
-fix error_free in migrate_add_blocker (Dave)
-make migrate_add_blocker to deal with error messages itself (Peter, Greg, Dave)

Changes in v4:
- drop diff in 9pfs for patch 4/4
- call clunk_fid() after freeing migration_blocker
- drop ret and use err directly

Changes in v3:
- set s->root_fid after migrate_add_blocker
- free migration_blocker inside v9fs_attach()
- change back ret<0 to just ret
- free local_err

Changes in v2:
- change the documentation for the new option
- add a NULL check for ObjectClass
- break patch 3 into patch 3 and 4
- use error_append_hint
- return -EACCES for only-migratable
- fix the error messages

Ashijeet Acharya (5):
  block/vvfat: Remove the undesirable comment
  migration: Add a new option to enable only-migratable
  migration: Allow "device add" options to only add migratable devices
  migration: disallow migrate_add_blocker during migration
  migration: Fail migration blocker for --only-migratable

 block/qcow.c  |  8 +++-
 block/vdi.c   |  8 +++-
 block/vhdx.c  | 17 +++--
 block/vmdk.c  |  9 -
 block/vpc.c   | 11 ---
 block/vvfat.c | 20 
 hw/9pfs/9p.c  | 33 +---
 hw/display/virtio-gpu.c   | 32 ++-
 hw/intc/arm_gic_kvm.c | 17 +++--
 hw/intc/arm_gicv3_its_kvm.c   | 20 +---
 hw/intc/arm_gicv3_kvm.c   | 19 ---
 hw/misc/ivshmem.c | 14 ++
 hw/scsi/vhost-scsi.c  | 25 ++--
 hw/usb/bus.c  | 19 +++
 hw/virtio/vhost.c |  8 +++-
 include/migration/migration.h | 10 +-
 migration/migration.c | 44 +--
 qdev-monitor.c|  9 +
 qemu-options.hx   |  9 +
 stubs/migr-blocker.c  |  3 ++-
 target/i386/kvm.c | 16 +---
 vl.c  |  4 
 22 files changed, 273 insertions(+), 82 deletions(-)

-- 
2.6.2




[Qemu-devel] [PATCH v6 1/5] block/vvfat: Remove the undesirable comment

2017-01-16 Thread Ashijeet Acharya
Remove the "// assert(is_consistent(s))" comment in block/vvfat.c

Signed-off-by: Ashijeet Acharya 
---
 block/vvfat.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index ded2109..7b706dc 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1189,7 +1189,6 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
 init_mbr(s, cyls, heads, secs);
 }
 
-//assert(is_consistent(s));
 qemu_co_mutex_init(&s->lock);
 
 /* Disable migration when vvfat is used rw */
-- 
2.6.2




[Qemu-devel] [PATCH v6 2/5] migration: Add a new option to enable only-migratable

2017-01-16 Thread Ashijeet Acharya
Add a new option, "--only-migratable", to QEMU which allows adding
only those devices which will not cause QEMU to fail after migration.
Devices whose VMState is flagged 'unmigratable' cannot be added when
this option is used.

Signed-off-by: Ashijeet Acharya 
---
 include/migration/migration.h | 3 +++
 qemu-options.hx   | 9 +
 vl.c  | 4 
 3 files changed, 16 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index c309d23..40b3697 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -38,6 +38,9 @@
 #define QEMU_VM_COMMAND  0x08
 #define QEMU_VM_SECTION_FOOTER   0x7e
 
+/* for vl.c */
+extern int only_migratable;
+
 struct MigrationParams {
 bool blk;
 bool shared;
diff --git a/qemu-options.hx b/qemu-options.hx
index c534a2f..1e16ae8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3574,6 +3574,15 @@ be used to change settings (such as migration 
parameters) prior to issuing
 the migrate_incoming to allow the migration to begin.
 ETEXI
 
+DEF("only-migratable", 0, QEMU_OPTION_only_migratable, \
+"-only-migratable allow only migratable devices\n", QEMU_ARCH_ALL)
+STEXI
+@item -only-migratable
+@findex -only-migratable
+Only allow migratable devices. Devices will not be allowed to enter an
+unmigratable state.
+ETEXI
+
 DEF("nodefaults", 0, QEMU_OPTION_nodefaults, \
 "-nodefaults don't create default devices\n", QEMU_ARCH_ALL)
 STEXI
diff --git a/vl.c b/vl.c
index c643d3f..cbcb459 100644
--- a/vl.c
+++ b/vl.c
@@ -180,6 +180,7 @@ bool boot_strict;
 uint8_t *boot_splash_filedata;
 size_t boot_splash_filedata_size;
 uint8_t qemu_extra_params_fw[2];
+int only_migratable; /* turn it off unless user states otherwise */
 
 int icount_align_option;
 
@@ -3910,6 +3911,9 @@ int main(int argc, char **argv, char **envp)
 }
 incoming = optarg;
 break;
+case QEMU_OPTION_only_migratable:
+only_migratable = 1;
+break;
 case QEMU_OPTION_nodefaults:
 has_defaults = 0;
 break;
-- 
2.6.2




[Qemu-devel] [PATCH v6 3/5] migration: Allow "device add" options to only add migratable devices

2017-01-16 Thread Ashijeet Acharya
Introduce checks for the unmigratable flag in the VMStateDescription
structs of the respective devices when the user attempts to add them. If
"--only-migratable" was specified, all unmigratable devices will
rightly fail to be added. This check covers both the "-device"
and "-usbdevice" command-line options as well as their HMP and QMP
counterparts.

Signed-off-by: Ashijeet Acharya 
---
 hw/usb/bus.c   | 19 +++
 qdev-monitor.c |  9 +
 2 files changed, 28 insertions(+)

diff --git a/hw/usb/bus.c b/hw/usb/bus.c
index 25913ad..1dcc35c 100644
--- a/hw/usb/bus.c
+++ b/hw/usb/bus.c
@@ -8,6 +8,7 @@
 #include "monitor/monitor.h"
 #include "trace.h"
 #include "qemu/cutils.h"
+#include "migration/migration.h"
 
 static void usb_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent);
 
@@ -686,6 +687,8 @@ USBDevice *usbdevice_create(const char *cmdline)
 const char *params;
 int len;
 USBDevice *dev;
+ObjectClass *klass;
+DeviceClass *dc;
 
 params = strchr(cmdline,':');
 if (params) {
@@ -720,6 +723,22 @@ USBDevice *usbdevice_create(const char *cmdline)
 return NULL;
 }
 
+klass = object_class_by_name(f->name);
+if (klass == NULL) {
+error_report("Device '%s' not found", f->name);
+return NULL;
+}
+
+dc = DEVICE_CLASS(klass);
+
+if (only_migratable) {
+if (dc->vmsd->unmigratable) {
+error_report("Device %s is not migratable, but --only-migratable "
+ "was specified", f->name);
+return NULL;
+}
+}
+
 if (f->usbdevice_init) {
 dev = f->usbdevice_init(bus, params);
 } else {
diff --git a/qdev-monitor.c b/qdev-monitor.c
index c73410c..81d01df 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -29,6 +29,7 @@
 #include "qemu/error-report.h"
 #include "qemu/help_option.h"
 #include "sysemu/block-backend.h"
+#include "migration/migration.h"
 
 /*
  * Aliases were a bad idea from the start.  Let's keep them
@@ -577,6 +578,14 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 return NULL;
 }
 
+if (only_migratable) {
+if (dc->vmsd->unmigratable) {
+error_setg(errp, "Device %s is not migratable, but "
+   "--only-migratable was specified", driver);
+return NULL;
+}
+}
+
 /* find bus */
 path = qemu_opt_get(opts, "bus");
 if (path != NULL) {
-- 
2.6.2




[Qemu-devel] [PATCH v6 4/5] migration: disallow migrate_add_blocker during migration

2017-01-16 Thread Ashijeet Acharya
If a migration is already in progress and somebody attempts
to add a migration blocker, this should rightly fail.

Add an errp parameter and a retcode return value to migrate_add_blocker.

Signed-off-by: John Snow 
Signed-off-by: Ashijeet Acharya 
---
 block/qcow.c  |  8 +++-
 block/vdi.c   |  8 +++-
 block/vhdx.c  | 17 +++--
 block/vmdk.c  |  9 -
 block/vpc.c   | 11 ---
 block/vvfat.c | 19 ---
 hw/9pfs/9p.c  | 33 ++---
 hw/display/virtio-gpu.c   | 32 +++-
 hw/intc/arm_gic_kvm.c | 17 +++--
 hw/intc/arm_gicv3_its_kvm.c   | 20 +---
 hw/intc/arm_gicv3_kvm.c   | 19 ---
 hw/misc/ivshmem.c | 14 ++
 hw/scsi/vhost-scsi.c  | 25 +++--
 hw/virtio/vhost.c |  8 +++-
 include/migration/migration.h |  7 ++-
 migration/migration.c | 37 +++--
 stubs/migr-blocker.c  |  3 ++-
 target/i386/kvm.c | 16 +---
 18 files changed, 222 insertions(+), 81 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index 7540f43..fb738fc 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -104,6 +104,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, 
int flags,
 unsigned int len, i, shift;
 int ret;
 QCowHeader header;
+Error *local_err = NULL;
 
 ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
 if (ret < 0) {
@@ -252,7 +253,12 @@ static int qcow_open(BlockDriverState *bs, QDict *options, 
int flags,
 error_setg(&s->migration_blocker, "The qcow format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-migrate_add_blocker(s->migration_blocker);
+ret = migrate_add_blocker(s->migration_blocker, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_free(s->migration_blocker);
+goto fail;
+}
 
 qemu_co_mutex_init(&s->lock);
 return 0;
diff --git a/block/vdi.c b/block/vdi.c
index 96b78d5..0aeb940 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -361,6 +361,7 @@ static int vdi_open(BlockDriverState *bs, QDict *options, 
int flags,
 VdiHeader header;
 size_t bmap_size;
 int ret;
+Error *local_err = NULL;
 
 logout("\n");
 
@@ -471,7 +472,12 @@ static int vdi_open(BlockDriverState *bs, QDict *options, 
int flags,
 error_setg(&s->migration_blocker, "The vdi format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-migrate_add_blocker(s->migration_blocker);
+ret = migrate_add_blocker(s->migration_blocker, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_free(s->migration_blocker);
+goto fail_free_bmap;
+}
 
 qemu_co_mutex_init(&s->write_lock);
 
diff --git a/block/vhdx.c b/block/vhdx.c
index 0ba2f0a..68db9e0 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -991,6 +991,17 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 }
 
+/* Disable migration when VHDX images are used */
+error_setg(&s->migration_blocker, "The vhdx format used by node '%s' "
+   "does not support live migration",
+   bdrv_get_device_or_node_name(bs));
+ret = migrate_add_blocker(s->migration_blocker, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_free(s->migration_blocker);
+goto fail;
+}
+
 if (flags & BDRV_O_RDWR) {
 ret = vhdx_update_headers(bs, s, false, NULL);
 if (ret < 0) {
@@ -1000,12 +1011,6 @@ static int vhdx_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 /* TODO: differencing files */
 
-/* Disable migration when VHDX images are used */
-error_setg(&s->migration_blocker, "The vhdx format used by node '%s' "
-   "does not support live migration",
-   bdrv_get_device_or_node_name(bs));
-migrate_add_blocker(s->migration_blocker);
-
 return 0;
 fail:
 vhdx_close(bs);
diff --git a/block/vmdk.c b/block/vmdk.c
index a11c27a..7750212 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -941,6 +941,7 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, 
int flags,
 int ret;
 BDRVVmdkState *s = bs->opaque;
 uint32_t magic;
+Error *local_err = NULL;
 
 buf = vmdk_read_desc(bs->file, 0, errp);
 if (!buf) {
@@ -976,7 +977,13 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, 
int flags,
 error_setg(&s->migration_blocker, "The vmdk format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-migrate_add_blocke

[Qemu-devel] [PATCH v6 5/5] migration: Fail migration blocker for --only-migratable

2017-01-16 Thread Ashijeet Acharya
migrate_add_blocker should rightly fail if the '--only-migratable'
option was specified, so that a device in use cannot perform an
action which would result in an unmigratable VM.
Make migrate_add_blocker return -EACCES in this case.

Signed-off-by: Ashijeet Acharya 
---
 include/migration/migration.h | 2 +-
 migration/migration.c | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index bcbdb03..7881e89 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -291,7 +291,7 @@ int ram_postcopy_incoming_init(MigrationIncomingState *mis);
  *
  * @errp - [out] The reason (if any) we cannot block migration right now.
  *
- * @returns - 0 on success, -EBUSY on failure, with errp set.
+ * @returns - 0 on success, -EBUSY/-EACCES on failure, with errp set.
  */
 int migrate_add_blocker(Error *reason, Error **errp);
 
diff --git a/migration/migration.c b/migration/migration.c
index 0d88286..7dcb7d7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1113,6 +1113,13 @@ static GSList *migration_blockers;
 
 int migrate_add_blocker(Error *reason, Error **errp)
 {
+if (only_migratable) {
+error_propagate(errp, error_copy(reason));
+error_prepend(errp, "disallowing migration blocker "
+  "(--only_migratable) for: ");
+return -EACCES;
+}
+
 if (migration_is_idle(NULL)) {
 migration_blockers = g_slist_prepend(migration_blockers, reason);
 return 0;
-- 
2.6.2
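Taken together with the vhdx/vmdk hunks earlier in this digest, the calling convention this series converges on can be sketched with a minimal mock. Everything below is a simplified stand-in, not the real QEMU API: `Error` here is just a message holder, and `demo_open` plays the role of a block driver's open function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins: the real Error type and blocker list live in
 * include/qapi/error.h and migration/migration.c. */
typedef struct Error { const char *msg; } Error;

static int only_migratable;   /* would be set by --only-migratable */
static int n_blockers;        /* stand-in for the migration_blockers list */

static int migrate_add_blocker(Error *reason, Error **errp)
{
    if (only_migratable) {
        *errp = reason;       /* the real code copies reason and prepends */
        return -EACCES;
    }
    /* the real code also refuses with -EBUSY unless migration is idle */
    n_blockers++;
    return 0;
}

/* Device-open pattern after the series: add the blocker first,
 * and fail the open when the blocker is rejected. */
static int demo_open(Error **errp)
{
    static Error blocker = { "format does not support live migration" };
    int ret = migrate_add_blocker(&blocker, errp);
    if (ret < 0) {
        return ret;           /* corresponds to "goto fail" above */
    }
    return 0;
}
```

The point of the reordering in the vhdx/vmdk patches is visible here: the blocker must be registered before the open succeeds, so a rejected blocker can still abort the open cleanly.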




Re: [Qemu-devel] [PATCH 05/16] io: make qio_channel_yield aware of AioContexts

2017-01-16 Thread Fam Zheng
On Fri, 01/13 14:17, Paolo Bonzini wrote:
> Support separate coroutines for reading and writing, and place the
> read/write handlers on the AioContext that the QIOChannel is registered
> with.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  include/io/channel.h   | 37 ++
>  io/channel.c   | 86 
> ++
>  tests/Makefile.include |  2 +-
>  3 files changed, 96 insertions(+), 29 deletions(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 665edd7..d7bad94 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -23,6 +23,7 @@
>  
>  #include "qemu-common.h"
>  #include "qom/object.h"
> +#include "qemu/coroutine.h"
>  #include "block/aio.h"
>  
>  #define TYPE_QIO_CHANNEL "qio-channel"
> @@ -59,8 +60,6 @@ typedef gboolean (*QIOChannelFunc)(QIOChannel *ioc,
> GIOCondition condition,
> gpointer data);
>  
> -typedef struct QIOChannelRestart QIOChannelRestart;
> -
>  /**
>   * QIOChannel:
>   *
> @@ -84,8 +83,8 @@ struct QIOChannel {
>  unsigned int features; /* bitmask of QIOChannelFeatures */
>  char *name;
>  AioContext *ctx;
> -QIOChannelRestart *read_coroutine;
> -QIOChannelRestart *write_coroutine;
> +Coroutine *read_coroutine;
> +Coroutine *write_coroutine;
>  #ifdef _WIN32
>  HANDLE event; /* For use with GSource on Win32 */
>  #endif
> @@ -508,13 +507,37 @@ guint qio_channel_add_watch(QIOChannel *ioc,
>  
>  
>  /**
> + * qio_channel_set_aio_context:
> + * @ioc: the channel object
> + * @ctx: the #AioContext to set the handlers on
> + *
> + * Request that qio_channel_yield() sets I/O handlers on
> + * the given #AioContext.  If @ctx is %NULL, qio_channel_yield()
> + * uses QEMU's main thread event loop.
> + */
> +void qio_channel_set_aio_context(QIOChannel *ioc,
> + AioContext *ctx);
> +
> +/**
> + * qio_channel_detach_aio_context:
> + * @ioc: the channel object
> + *
> + * Disable any I/O handlers set by qio_channel_yield().  With the
> + * help of aio_co_schedule(), this allows moving a coroutine that was
> + * paused by qio_channel_yield() to another context.
> + */
> +void qio_channel_detach_aio_context(QIOChannel *ioc);
> +
> +/**
>   * qio_channel_yield:
>   * @ioc: the channel object
>   * @condition: the I/O condition to wait for
>   *
> - * Yields execution from the current coroutine until
> - * the condition indicated by @condition becomes
> - * available.
> + * Yields execution from the current coroutine until the condition
> + * indicated by @condition becomes available.  @condition must
> + * be either %G_IO_IN or %G_IO_OUT; it cannot contain both.  In
> + * addition, no two coroutines can be waiting on the same condition
> + * and channel at the same time.
>   *
>   * This must only be called from coroutine context
>   */
> diff --git a/io/channel.c b/io/channel.c
> index ce470d7..1e043bf 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -21,7 +21,7 @@
>  #include "qemu/osdep.h"
>  #include "io/channel.h"
>  #include "qapi/error.h"
> -#include "qemu/coroutine.h"
> +#include "qemu/main-loop.h"
>  
>  bool qio_channel_has_feature(QIOChannel *ioc,
>   QIOChannelFeature feature)
> @@ -238,36 +238,80 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
>  }
>  
>  
> -typedef struct QIOChannelYieldData QIOChannelYieldData;
> -struct QIOChannelYieldData {
> -QIOChannel *ioc;
> -Coroutine *co;
> -};
> +static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc);
> +
> +static void qio_channel_restart_read(void *opaque)
> +{
> +QIOChannel *ioc = opaque;
> +Coroutine *co = ioc->read_coroutine;
>  
> +ioc->read_coroutine = NULL;
> +qio_channel_set_aio_fd_handlers(ioc);
> +aio_co_wake(co);
> +}
>  
> -static gboolean qio_channel_yield_enter(QIOChannel *ioc,
> -GIOCondition condition,
> -gpointer opaque)
> +static void qio_channel_restart_write(void *opaque)
>  {
> -QIOChannelYieldData *data = opaque;
> -qemu_coroutine_enter(data->co);
> -return FALSE;
> +QIOChannel *ioc = opaque;
> +Coroutine *co = ioc->write_coroutine;
> +
> +ioc->write_coroutine = NULL;
> +qio_channel_set_aio_fd_handlers(ioc);
> +aio_co_wake(co);
>  }
>  
> +static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc)
> +{
> +IOHandler *rd_handler = NULL, *wr_handler = NULL;
> +AioContext *ctx;
> +
> +if (ioc->read_coroutine) {
> + rd_handler = qio_channel_restart_read;

s/\t//

> +}
> +if (ioc->write_coroutine) {
> + rd_handler = qio_channel_restart_write;

s/\t//

> +}
> +
> +ctx = ioc->ctx ? ioc->ctx : iohandler_get_aio_context();
> +qio_channel_set_aio_fd_handler(ioc, ctx, rd_handler, wr_handler, ioc);
> +}
> +
> +void qio_channel_set_aio_context(QIOChannel *ioc,
> +  

[Qemu-devel] [Bug 1643537] Re: target-ppc/int_helper.c: 2 * bad array index

2017-01-16 Thread Thomas Huth
Released with version 2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1643537

Title:
  target-ppc/int_helper.c: 2 * bad array index

Status in QEMU:
  Fix Released

Bug description:
  1.

  [qemu/target-ppc/int_helper.c:2575]: (error) Array 'reg.u16[8]'
  accessed at index 8, which is out of bounds.

  Source code is

 return reg->u16[8 - n];

  and

  qemu/target-ppc/cpu.h:uint16_t u16[8];

  but at least once, n is zero, for example line 2725 in the
  int_helper.c file:

  uint16_t sgnb = get_national_digit(b, 0);

  2.

  [qemu/target-ppc/int_helper.c:2584]: (error) Array 'reg.u16[8]'
  accessed at index 8, which is out of bounds.

  Duplicate

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1643537/+subscriptions
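The off-by-one in this report can be reproduced outside QEMU with a minimal model of the indexing scheme (the union shape comes from the quoted cpu.h line; the helper name is illustrative, not the real code):

```c
#include <assert.h>
#include <stdint.h>

/* Shape of the register in question (cpu.h: uint16_t u16[8]). */
typedef union {
    uint16_t u16[8];
} ppc_avr_t;

/* Mirrors the quoted expression reg->u16[8 - n]: the index is only in
 * bounds for n in 1..8, while the quoted call site passes n == 0,
 * which yields index 8, one past the end of u16[8]. */
static int national_index_ok(int n)
{
    int idx = 8 - n;
    return idx >= 0 && idx < 8;
}
```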



[Qemu-devel] [Bug 1630527] Re: qemu/hw/i386/amd_iommu.c:188: possible bad shift ?

2017-01-16 Thread Thomas Huth
Released with v2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
https://bugs.launchpad.net/bugs/1630527

Title:
  qemu/hw/i386/amd_iommu.c:188: possible bad shift ?

Status in QEMU:
  Fix Released

Bug description:
  [qemu/hw/i386/amd_iommu.c:188]: (error) Shifting 32-bit value by 64
  bits is undefined behaviour

  Source code is

  uint64_t mask = ((1 << length) - 1) << bitpos;

  Maybe better code

  uint64_t mask = ((1ULL << length) - 1) << bitpos;

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1630527/+subscriptions
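The suggested fix can be checked with a small sketch (the function name is illustrative; the expression is the one from the report):

```c
#include <assert.h>
#include <stdint.h>

/* With the plain int literal, "1 << length" is undefined behaviour
 * once length reaches 32; widening the literal keeps the whole shift
 * in 64 bits (still assuming length < 64). */
static uint64_t make_mask(int bitpos, int length)
{
    return ((1ULL << length) - 1) << bitpos;
}
```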



[Qemu-devel] [Bug 1640525] Re: -net socket, connect/listen does not work in 2.7.0

2017-01-16 Thread Thomas Huth
Fix has been released with QEMU v2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
https://bugs.launchpad.net/bugs/1640525

Title:
  -net socket,connect/listen does not work in 2.7.0

Status in QEMU:
  Fix Released

Bug description:
  Using 2.7.0 release on Debian Sid. What I did: start one VM with:
  
  /home/pierre/build/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
  -smp 4 \
  -cpu Nehalem \
  -soundhw ac97 \
  -k fr \
  -localtime \
  -enable-kvm \
  -m 4099 \
  -drive file=/mnt/virtualMachines/qemu/lfs-7.10-porg.qcow2,cache=writeback \
  -cdrom /mnt/virtualMachines/qemu/grub-img.iso \
  -boot order=c,once=d,menu=on \
  -vga std \
  -serial mon:stdio \
  -net nic,vlan=0,model=e1000,macaddr=52:54:00:12:34:58 \
  -net user,vlan=0,hostfwd=tcp::2223-10.0.2.9:22 \
  -net nic,vlan=1,model=e1000,macaddr=52:54:00:12:34:56 \
  -net socket,vlan=1,listen=:4321
  
  Start another one with:
  
  /home/pierre/build/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
  -smp 4 \
  -cpu Nehalem \
  -soundhw ac97 \
  -k fr \
  -localtime \
  -enable-kvm \
  -m 4099 \
  -drive file=/mnt/virtualMachines/qemu/lfs-7.10-october.qcow2,cache=writeback \
  -cdrom /mnt/virtualMachines/qemu/grub-img.iso \
  -boot order=c \
  -serial mon:stdio \
  -vga std \
  -net nic,vlan=0,model=e1000,macaddr=52:54:00:12:34:57 \
  -net socket,vlan=0,connect=localhost:4321
  
  The network settings of the first machine are:
  
  1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
default qlen 1
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
  inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
  2: enp0s3:  mtu 1500 qdisc pfifo_fast master 
br0 state UP group default qlen 1000
  link/ether 52:54:00:12:34:58 brd ff:ff:ff:ff:ff:ff
  3: enp0s4:  mtu 1500 qdisc pfifo_fast master 
br0 state UP group default qlen 1000
  link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
  4: br0:  mtu 1500 qdisc noqueue state UP 
group default qlen 1000
  link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
  inet 10.0.2.9/24 brd 10.0.2.255 scope global br0
 valid_lft forever preferred_lft forever
  
  The network settings on the second machine are:
  
  1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
default qlen 1
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
  inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
  2: enp0s3:  mtu 1500 qdisc pfifo_fast state 
UP group default qlen 1000
  link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
  inet 10.0.2.10/24 brd 10.0.2.255 scope global enp0s3
 valid_lft forever preferred_lft forever
  
  typing "ping -c 1 10.0.2.10" on the first machine returns:
  
  PING 10.0.2.10 (10.0.2.10): 56 data bytes
  92 bytes from virtuallfs (10.0.2.9): Destination Host Unreachable
  --- 10.0.2.10 ping statistics ---
  1 packets transmitted, 0 packets received, 100% packet loss
  
  and something similar when typing "ping -c 1 10.0.2.9" on the second machine.

  This very same setting works as expected in version 2.6.0. I could bisect, 
and the offending commit is 16a3df403b1:
  
  commit 16a3df403b10c4ac347159e39005fd520b2648bb
  Author: Zhang Chen 
  Date:   Fri May 13 15:35:19 2016 +0800

  net/net: Add SocketReadState for reuse codes
  
  This function is from net/socket.c, move it to net.c and net.h.
  Add SocketReadState to make others reuse net_fill_rstate().
  suggestion from jason.
  
  v4:
   - move 'rs->finalize = finalize' to rs_init()
  
  v3:
   - remove SocketReadState init callback
   - put finalize callback to net_fill_rstate()
  
  v2:
   - rename ReadState to SocketReadState
   - add SocketReadState init and finalize callback
  
  v1:
   - init patch
  
  Signed-off-by: Zhang Chen 
  Signed-off-by: Li Zhijian 
  Signed-off-by: Wen Congyang 
  Signed-off-by: Jason Wang 
  

  BTW, the systems on both VM are built from
  http://www.linuxfromscratch.org. But I do not think this is important,
  since I could do the bisect. Of course, I'll be happy to try other
  VMs, if you point me to some.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1640525/+subscriptions



[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers

2017-01-16 Thread Thomas Huth
Commit 0d34fbabc13 has been released with QEMU v2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
https://bugs.launchpad.net/bugs/1626972

Title:
  QEMU memfd_create fallback mechanism change for security drivers

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive mitaka series:
  Fix Committed
Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Xenial:
  Fix Committed
Status in qemu source package in Yakkety:
  Fix Committed
Status in qemu source package in Zesty:
  Fix Released

Bug description:
  [Impact]

   * Updated QEMU (from UCA) live migration doesn't work with 3.13 kernels.
   * QEMU code wrongly checks whether it can create /tmp/memfd-XXX files.
   * Apparmor will block access to /tmp/ and QEMU will fail migrating.

  [Test Case]

   * Install 2 Ubuntu Trusty (3.13) + UCA Mitaka + apparmor rules.
   * Try to live-migration from one to another. 
   * Apparmor will block creation of /tmp/memfd-XXX files.

  [Regression Potential]

   Pros:
   * Exhaustively tested this.
   * Worked with upstream on this fix. 
   * I'm implementing new vhost log mechanism for upstream.
   * One line change to a blocker that is already broken.

   Cons:
   * To break live migration in other circumstances. 

  [Other Info]

   * Christian Ehrhardt has been following this.

  ORIGINAL DESCRIPTION:

  When libvirt starts using apparmor, creating apparmor profiles for
  every virtual machine created on the compute nodes, mitaka qemu (2.5,
  and upstream also) uses a fallback mechanism for creating shared
  memory for live-migrations. This fallback mechanism, on 3.13 kernels
  that don't have the memfd_create() system call, tries to create files
  in the /tmp/ directory and fails, causing live-migration not to work.

  Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability =
  can't live migrate.

  From qemu 2.5, the logic is:

  void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int 
*fd)
  {
  if (memfd_create)... ### only works with HWE kernels

  else ### 3.13 kernels, gets blocked by apparmor
     tmpdir = g_get_tmp_dir
     ...
     mfd = mkstemp(fname)
  }
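  The fallback branch in that pseudocode can be written out as a
  standalone helper. This is an illustrative sketch only: the helper
  name and the hard-coded /tmp path are assumptions, but the
  mkstemp-then-unlink-then-mmap sequence is exactly the open/mknod in
  /tmp that the AppArmor denials below show being blocked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Sketch of the fallback path: create an anonymous temp file,
 * unlink it immediately so only the fd remains, size it, map it. */
static void *tmpfile_alloc(size_t size, int *fd_out)
{
    char fname[] = "/tmp/memfd-XXXXXX";   /* g_get_tmp_dir() in QEMU */
    int fd = mkstemp(fname);
    if (fd < 0) {
        return NULL;                      /* AppArmor denial lands here */
    }
    unlink(fname);                        /* keep only the open fd */
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return NULL;
    }
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return ptr;
}
```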

  And you can see the errors:

  From the host trying to send the virtual machine:

  2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver 
[req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 
133ebc3585c041aebaead8c062cd6511 - - -] [instance: 
2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
  2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver 
[req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 
133ebc3585c041aebaead8c062cd6511 - - -] [instance: 
2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: 
unable to execute QEMU command 'migrate': Migration disabled: failed to 
allocate shared memory

  From the host trying to receive the virtual machine:

  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 
audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" 
profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" 
pid=12565 comm="apparmor_parser"
  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 
audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" 
profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 
audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" 
profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" 
pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 
audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" 
profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 
audit(1471289780.407:76): apparmor="DENIED" operation="mknod" 
profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" 
pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 
ouid=107
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 
audit(1471289780.411:77): apparmor="DENIED" operation="open" 
profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 
comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 
audit(1471289780.411:78): apparmor="DENIED" operation="open" 
profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" 
pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 
ouid=0

  When leaving libvirt without apparmor capabilities (thus not confining
  virtual machines on comp

[Qemu-devel] [Bug 1194954] Re: Windows 95 guest reboots itself on qemu 1.5.0 & 1.5.50 (GIT)

2017-01-16 Thread Thomas Huth
Triaging old bug tickets ... can you still reproduce this problem with
the latest version of QEMU (currently v2.8)?

** Changed in: qemu
   Status: New => Incomplete

-- 
https://bugs.launchpad.net/bugs/1194954

Title:
  Windows 95 guest reboots itself on qemu 1.5.0 & 1.5.50 (GIT)

Status in QEMU:
  Incomplete

Bug description:
  When I run a Windows 95 guest on these releases of qemu, it
  reboots itself repeatedly on its own (i.e. without being shut
  down properly), and when I'm installing Netscape 4.08 at, for
  example, 46% or 75%, it still reboots itself without completing the
  installation of the web browser. Is this an issue in main-loop.c?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1194954/+subscriptions



[Qemu-devel] [Bug 1625987] Re: target-arm/translate-a64.c:2028: possible coding error ?

2017-01-16 Thread Thomas Huth
Released with v2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
https://bugs.launchpad.net/bugs/1625987

Title:
  target-arm/translate-a64.c:2028: possible coding error ?

Status in QEMU:
  Fix Released

Bug description:
  target-arm/translate-a64.c:2028:37: warning: ?: using integer
  constants in boolean context [-Wint-in-bool-context]

  Source code is

  bool iss_sf = opc == 0 ? 32 : 64;

  Maybe better code

  bool iss_sf = (opc == 0) ? 32 : 64;

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1625987/+subscriptions
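The parenthesised variant suggested in the report does not actually change anything: 32 and 64 are both nonzero, so after conversion to bool the expression is true for every opc. A small demonstration (the `sf_from_opc` form is purely illustrative of a comparison that can yield both values; it is not the actual upstream fix, which derives the flag from the opcode fields):

```c
#include <assert.h>
#include <stdbool.h>

/* The reported expression: both arms are nonzero, so the result is
 * always true regardless of opc (parentheses do not help). */
static bool buggy_iss_sf(int opc)
{
    return opc == 0 ? 32 : 64;
}

/* Hypothetical corrected shape: an actual boolean comparison. */
static bool sf_from_opc(int opc)
{
    return opc != 0;
}
```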



Re: [Qemu-devel] [PATCH] Drop duplicate display option documentation

2017-01-16 Thread Marc-André Lureau
Hi

On Mon, Jan 16, 2017 at 12:30 AM Samuel Thibault 
wrote:

> The curses and none possibilities are already documented on a separate
> line,
> so documenting it on the sdl line was both unneeded and confusing.
>
> Signed-off-by: Samuel Thibault 
>

Introduced in commit f04ec5afbb7d60a56863add800fd90ceee66f362


Reviewed-by: Marc-André Lureau 




> diff --git a/qemu-options.hx b/qemu-options.hx
> index c534a2f7f9..9c81d3752d 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -927,7 +927,7 @@ ETEXI
>
>  DEF("display", HAS_ARG, QEMU_OPTION_display,
>  "-display sdl[,frame=on|off][,alt_grab=on|off][,ctrl_grab=on|off]\n"
> -"[,window_close=on|off][,gl=on|off]|curses|none|\n"
> +"[,window_close=on|off][,gl=on|off]\n"
>  "-display gtk[,grab_on_hover=on|off][,gl=on|off]|\n"
>  "-display vnc=[,]\n"
>  "-display curses\n"
>
> --
Marc-André Lureau


Re: [Qemu-devel] [PATCH 01/16] aio: introduce aio_co_schedule and aio_co_wake

2017-01-16 Thread Paolo Bonzini


On 16/01/2017 12:09, Fam Zheng wrote:
> On Fri, 01/13 14:17, Paolo Bonzini wrote:
>> aio_co_wake provides the infrastructure to start a coroutine on a "home"
>> AioContext.  It will be used by CoMutex and CoQueue, so that coroutines
>> don't jump from one context to another when they go to sleep on a
>> mutex or waitqueue.  However, it can also be used as a more efficient
>> alternative to one-shot bottom halves, and saves the effort of tracking
>> which AioContext a coroutine is running on.
>>
>> aio_co_schedule is the part of aio_co_wake that starts a coroutine
>> on a remove AioContext, but it is also useful to implement e.g.
> 
> s/remove/remote/ and maybe s/but/and/ ?
> 
>> bdrv_set_aio_context callbacks.
>>
>> The implementation of aio_co_schedule is based on a lock-free
>> multiple-producer, single-consumer queue.  The multiple producers use
>> cmpxchg to add to a LIFO stack.  The consumer (a per-AioContext bottom
>> half) grabs all items added so far, inverts the list to make it FIFO,
>> and goes through it one item at a time until it's empty.  The data
>> structure was inspired by OSv, which uses it in the very code we'll
>> "port" to QEMU for the thread-safe CoMutex.
>>
>> Most of the new code is really tests.
>>
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  async.c  |  65 +
>>  include/block/aio.h  |  32 +++
>>  include/qemu/coroutine_int.h |  10 +-
>>  tests/Makefile.include   |  13 ++-
>>  tests/iothread.c |  91 ++
>>  tests/iothread.h |  25 +
>>  tests/test-aio-multithread.c | 213 
>> +++
>>  tests/test-vmstate.c |  11 ---
>>  trace-events |   4 +
>>  util/qemu-coroutine.c|   8 ++
>>  10 files changed, 456 insertions(+), 16 deletions(-)
>>  create mode 100644 tests/iothread.c
>>  create mode 100644 tests/iothread.h
>>  create mode 100644 tests/test-aio-multithread.c
>>
>> diff --git a/async.c b/async.c
>> index 0d218ab..1338682 100644
>> --- a/async.c
>> +++ b/async.c
>> @@ -30,6 +30,8 @@
>>  #include "qemu/main-loop.h"
>>  #include "qemu/atomic.h"
>>  #include "block/raw-aio.h"
>> +#include "trace/generated-tracers.h"
>> +#include "qemu/coroutine_int.h"
>>  
>>  /***/
>>  /* bottom halves (can be seen as timers which expire ASAP) */
>> @@ -274,6 +276,9 @@ aio_ctx_finalize(GSource *source)
>>  }
>>  #endif
>>  
>> +assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
>> +qemu_bh_delete(ctx->co_schedule_bh);
>> +
>>  qemu_lockcnt_lock(&ctx->list_lock);
>>  assert(!qemu_lockcnt_count(&ctx->list_lock));
>>  while (ctx->first_bh) {
>> @@ -363,6 +368,28 @@ static bool event_notifier_poll(void *opaque)
>>  return atomic_read(&ctx->notified);
>>  }
>>  
>> +static void co_schedule_bh_cb(void *opaque)
>> +{
>> +AioContext *ctx = opaque;
>> +QSLIST_HEAD(, Coroutine) straight, reversed;
>> +
>> +QSLIST_MOVE_ATOMIC(&reversed, &ctx->scheduled_coroutines);
>> +QSLIST_INIT(&straight);
> 
> Worth special casing 1 element case?

Sounds like premature optimization; the QSLIST_MOVE_ATOMIC is going to
be pretty expensive anyway.  Do you mean something like:

if (QSLIST_EMPTY(&reversed)) {
return;
}
Coroutine *co = QSLIST_FIRST(&reversed);
if (!QSLIST_NEXT(co, co_scheduled_next)) {
straight = reversed;
} else {
do {
...
} while (!QSLIST_EMPTY(&reversed));
}

do {
...
} while (!QSLIST_EMPTY(&straight));

?  Looks a bit busy.  However, removing the QSLIST_EMPTY case and then
using do/while may be a nice middle ground.

Paolo
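For reference, the reverse-then-drain step under discussion can be sketched outside QEMU with a plain singly linked list (simplified: no atomics, no bottom halves, no coroutines; the names are illustrative, not the QSLIST macros):

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int val;
    struct node *next;
};

/* Producers push onto a LIFO stack; the consumer grabs the whole
 * stack, reverses it once to recover FIFO order, then drains it. */
static struct node *reverse(struct node *head)
{
    struct node *out = NULL;
    while (head) {
        struct node *next = head->next;
        head->next = out;
        out = head;
        head = next;
    }
    return out;
}
```

With items pushed in the order 1, 2, 3, the stack head is 3; one reversal restores 1, 2, 3 for the drain loop.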

>> +
>> +while (!QSLIST_EMPTY(&reversed)) {
>> +Coroutine *co = QSLIST_FIRST(&reversed);
>> +QSLIST_REMOVE_HEAD(&reversed, co_scheduled_next);
>> +QSLIST_INSERT_HEAD(&straight, co, co_scheduled_next);
>> +}
>> +
>> +while (!QSLIST_EMPTY(&straight)) {
>> +Coroutine *co = QSLIST_FIRST(&straight);
>> +QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
>> +trace_aio_co_schedule_bh_cb(ctx, co);
>> +qemu_coroutine_enter(co);
>> +}
>> +}
>> +
>> diff --git a/tests/iothread.c b/tests/iothread.c
>> new file mode 100644
>> index 000..777d9ee
>> --- /dev/null
>> +++ b/tests/iothread.c
>> @@ -0,0 +1,91 @@
>> +/*
>> + * Event loop thread implementation for unit tests
> 
> Curious: what is preventing from (perhaps enhancing and then) using the top
> iothread.c implementation?

Mostly the dependency of iothread.c on QOM.  iothread_new is much
simpler than creating a new object, adding it to the QOM tree, calling
user_creatable_complete, etc.  A wrapper wouldn't be much smaller than
this file.

Paolo

>> + *
>> + * Copyright Red Hat Inc., 2013, 2016
>> + *
>> + * Authors:
>> + *  Stefan Hajnoczi   
>> + *  Paolo Bonzini 
>> + *
>> + * This wo

Re: [Qemu-devel] [PATCH 05/16] io: make qio_channel_yield aware of AioContexts

2017-01-16 Thread Paolo Bonzini


On 16/01/2017 12:38, Fam Zheng wrote:
>> +void qio_channel_detach_aio_context(QIOChannel *ioc)
>> +{
>> +ioc->read_coroutine = NULL;
>> +ioc->write_coroutine = NULL;
>> +qio_channel_set_aio_fd_handlers(ioc);
>> +ioc->ctx = NULL;
>
> Why is qio_channel_set_aio_fd_handler not needed here?

Because there are no read_coroutine and write_coroutine anymore.  The
caller needs to schedule them on the right AioContext after calling
qio_channel_set_aio_context.  See nbd_client_attach_aio_context in the
next patch for an example.

>> -tests/test-char$(EXESUF): tests/test-char.o qemu-char.o qemu-timer.o 
>> $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y)
>> +tests/test-char$(EXESUF): tests/test-char.o qemu-char.o qemu-timer.o 
>> $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y) $(test-block-obj-y)
> 
> I guess this is a hint for moving coroutine code into a lower level library 
> like
> util.

Coroutine, or AioContext?  The reason for this is that io/ now uses
aio_co_wake.

Paolo



[Qemu-devel] [Bug 1568356] Re: ERROR:ui/sdl2-2d.c:120:sdl2_2d_switch:

2017-01-16 Thread Thomas Huth
Commit 435deffefbb07d9a0cafef4 has been released with QEMU v2.7.0
already

** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
https://bugs.launchpad.net/bugs/1568356

Title:
  ERROR:ui/sdl2-2d.c:120:sdl2_2d_switch:

Status in QEMU:
  Fix Released

Bug description:
  When display sdl is selected and the guest switches display
  resolution, qemu exits with a core dump with this message:

  
  ERROR:ui/sdl2-2d.c:120:sdl2_2d_switch: code should not be reached

  My Machine is a Cyrus+ PowerPc P5020 4gb ram Radeon 6570 2Gb

  This issue affected the PowerMac G5 quad too.

  My distro is Mate 16.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1568356/+subscriptions



Re: [Qemu-devel] [PATCH] Drop duplicate display option documentation

2017-01-16 Thread Paolo Bonzini
Cc: qemu-triv...@nongnu.org

On 15/01/2017 21:30, Samuel Thibault wrote:
> The curses and none possibilities are already documented on a separate line,
> so documenting it on the sdl line was both unneeded and confusing.
> 
> Signed-off-by: Samuel Thibault 
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index c534a2f7f9..9c81d3752d 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -927,7 +927,7 @@ ETEXI
>  
>  DEF("display", HAS_ARG, QEMU_OPTION_display,
>  "-display sdl[,frame=on|off][,alt_grab=on|off][,ctrl_grab=on|off]\n"
> -"[,window_close=on|off][,gl=on|off]|curses|none|\n"
> +"[,window_close=on|off][,gl=on|off]\n"
>  "-display gtk[,grab_on_hover=on|off][,gl=on|off]|\n"
>  "-display vnc=[,]\n"
>  "-display curses\n"
> 
> 



Re: [Qemu-devel] [PATCH] Further tidy-up on block status

2017-01-16 Thread Vladimir Sementsov-Ogievskiy

13.01.2017 13:29, Alex Bligh wrote:

On 13 Jan 2017, at 09:48, Vladimir Sementsov-Ogievskiy 
 wrote:

12.01.2017 16:11, Alex Bligh wrote:

On 12 Jan 2017, at 07:05, Vladimir Sementsov-Ogievskiy 
 wrote:

Yes, this is better. But is it actually necessary to force contexts to have some 
safe default? If a context wants one, it may define such a default without this 
requirement. So, should it be a requirement at all?

I've changed this to:

 of the file), a server MAY reply with a single block status
 descriptor with *length* matching the requested length, rather than
 reporting the error; in this case the context MAY mandate the
 status returned.



Hmm, I don't understand. So, it MAY mandate it and therefore MAY also not do it? 
And what should the client think if the server replies with one chunk matching 
the requested length and does not mandate the status?

Some contexts may mandate a particular value (so for instance the allocation 
context might mandate 0).

Some contexts may not mandate a particular value, in which case the 
interpretation is dependent upon the context (just like any other status 
value). E.g. a context which returned a status of 7 if the range contained a 
prime number, and 3 otherwise, could carry on doing that.

As it doesn't make sense to interpret status returns without an understanding 
of the particular context, we might as well simply extend this to 'beyond the 
range' returns - as I think you pointed out!



>>> The status flags are intentionally defined so that a server MAY 
always safely report a status of 0 for any block


- Actually, status flags are _not_ defined (each context defines its 
own status flags).



--
Best regards,
Vladimir




Re: [Qemu-devel] [RFC PATCH 02/17] hw/ppc/spapr: Add POWER9 to pseries cpu models

2017-01-16 Thread David Gibson
On Fri, Jan 13, 2017 at 05:28:08PM +1100, Suraj Jitindar Singh wrote:
> Add POWER9 cpu to list of spapr core models which allows it to be specified
> as the cpu model for a pseries guest (e.g. -machine pseries -cpu POWER9).
> 
> Signed-off-by: Suraj Jitindar Singh 

I realize you need this first for development, but for submission,
please move it to the end of the series.  The idea is if you're
bisecting or iterating through commits, then when the POWER9 option
appears, it will actually work.  That won't be the case without the
rest of the patches in the series.

> ---
>  hw/ppc/spapr_cpu_core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index c18632b..8cc7058 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -358,6 +358,9 @@ static const char *spapr_core_models[] = {
>  
>  /* POWER8NVL */
>  "POWER8NVL_v1.0",
> +
> +/* POWER9 */
> +"POWER9_v1.0",
>  };
>  
>  void spapr_cpu_core_class_init(ObjectClass *oc, void *data)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] Towards an ivshmem 2.0?

2017-01-16 Thread Marc-André Lureau
Hi

On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka  wrote:

> Hi,
>
> some of you may know that we are using a shared memory device similar to
> ivshmem in the partitioning hypervisor Jailhouse [1].
>
> We started as being compatible to the original ivshmem that QEMU
> implements, but we quickly deviated in some details, and in the recent
> months even more. Some of the deviations are related to making the
> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> aiming at safety critical systems and, therefore, a small code base.
> Other changes address deficits in the original design, like missing
> life-cycle management.
>
> Now the question is if there is interest in defining a common new
> revision of this device and maybe also of some protocols used on top,
> such as virtual network links. Ideally, this would enable us to share
> Linux drivers. We will definitely go for upstreaming at least a network
> driver such as [2], a UIO driver and maybe also a serial port/console.
>
>
This sounds like duplicating efforts done with virtio and vhost-pci. Have
you looked at Wei Wang's proposal?

I've attached a first draft of the specification of our new ivshmem
> device. A working implementation can be found in the wip/ivshmem2 branch
> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>

You don't have a qemu branch, right?


>
> Deviations from the original design:
>
> - Only two peers per link
>
>
sounds sane, that's also what vhost-pci aims at afaik


>   This simplifies the implementation and also the interfaces (think of
>   life-cycle management in a multi-peer environment). Moreover, we do
>   not have an urgent use case for multiple peers, thus also not
>   reference for a protocol that could be used in such setups. If someone
>   else happens to share such a protocol, it would be possible to discuss
>   potential extensions and their implications.
>
> - Side-band registers to discover and configure shared memory regions
>
>   This was one of the first changes: We removed the memory regions from
>   the PCI BARs and gave them special configuration space registers. By
>   now, these registers are embedded in a PCI capability. The reasons are
>   that Jailhouse does not allow relocating the regions in guest address
>   space (but other hypervisors may if they like to) and that we now have
>   up to three of them.
>

 Sorry, I can't comment on that.


> - Changed PCI base class code to 0xff (unspecified class)
>
>   This allows us to define our own sub classes and interfaces. That is
>   now exploited for specifying the shared memory protocol the two
>   connected peers should use. It also allows the Linux drivers to match
>   on that.
>
>
Why not, but it worries me that you may end up inventing protocols similar
to the virtio devices.


> - INTx interrupts support is back
>
>   This is needed on target platforms without MSI controllers, i.e.
>   without the required guest support. Namely some PCI-less ARM SoCs
>   required the reintroduction. While doing this, we also took care of
>   keeping the MMIO registers free of privileged controls so that a
>   guest OS can map them safely into a guest userspace application.
>
>
Right, it's not completely removed from ivshmem qemu upstream, although it
should probably be allowed to set up a doorbell-ivshmem with msi=off (this
may be quite trivial to add back)


> And then there are some extensions of the original ivshmem:
>
> - Multiple shared memory regions, including unidirectional ones
>
>   It is now possible to expose up to three different shared memory
>   regions: The first one is read/writable for both sides. The second
>   region is read/writable for the local peer and read-only for the
>   remote peer (useful for output queues). And the third is read-only
>   locally but read/writable remotely (i.e. for input queues).
>   Unidirectional regions prevent the receiver of some data from
>   interfering with the sender while it is still building the message, a
>   property that we are sure is useful not only for safety-critical
>   communication.
>

Sounds like a good idea, and something we may want in virtio too

>
> - Life-cycle management via local and remote state
>
>   Each device can now signal its own state in form of a value to the
>   remote side, which triggers an event there. Moreover, state changes
>   done by the hypervisor to one peer are signalled to the other side.
>   And we introduced a write-to-shared-memory mechanism for the
>   respective remote state so that guests do not have to issue an MMIO
>   access in order to check the state.
>

There is also ongoing work to better support disconnect/reconnect in
virtio.


>
> So, this is our proposal. Would be great to hear some opinions if you
> see value in adding support for such an "ivshmem 2.0" device to QEMU as
> well and expand its ecosystem towards Linux upstream, maybe also DPDK
> again. If you see problems in the new design w.r.t. what QEMU provides so
> far 

Re: [Qemu-devel] [PULL 0/4] tcg fixes

2017-01-16 Thread Peter Maydell
On 13 January 2017 at 20:05, Richard Henderson  wrote:
> Two problems found with my most recent tcg-2.9 queued patches;
> two patches for tcg/aarch64 that had been submitted but somehow
> dropped off the patch queue.
>
> With this, aarch64 risu passes on aarch64 again.
>
>
> r~
>
>
> The following changes since commit b6af8ea60282df514f87d32e36afd1c9aeee28c8:
>
>   Merge remote-tracking branch 
> 'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging (2017-01-13 
> 14:38:21 +)
>
> are available in the git repository at:
>
>   git://github.com/rth7680/qemu.git tags/pull-tcg-20170113
>
> for you to fetch changes up to 8cf9a3d3f7a4b95f33e0bda5416b9c93ec887dd3:
>
>   tcg/aarch64: Fix tcg_out_movi (2017-01-13 11:47:29 -0800)
>
> 
> Fixes and more queued patches
>

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH 01/16] aio: introduce aio_co_schedule and aio_co_wake

2017-01-16 Thread Fam Zheng
On Mon, 01/16 13:19, Paolo Bonzini wrote:
> >> +static void co_schedule_bh_cb(void *opaque)
> >> +{
> >> +AioContext *ctx = opaque;
> >> +QSLIST_HEAD(, Coroutine) straight, reversed;
> >> +
> >> +QSLIST_MOVE_ATOMIC(&reversed, &ctx->scheduled_coroutines);
> >> +QSLIST_INIT(&straight);
> > 
> > Worth special casing 1 element case?
> 
> Sounds like premature optimization; the QSLIST_MOVE_ATOMIC is going to
> be pretty expensive anyway.  Do you mean something like:
> 
>   if (QSLIST_EMPTY(&reversed)) {
>   return;
>   }
>   Coroutine *co = QSLIST_FIRST(&reversed);
>   if (!QSLIST_NEXT(co, co_scheduled_next)) {
>   straight = reversed;
>   } else {
>   do {
> ...
>   } while (!QSLIST_EMPTY(&reversed));
>   }
> 
>   do {
>   ...
>   } while (!QSLIST_EMPTY(&straight));
> 
> ?  Looks a bit busy.  However, removing the QSLIST_EMPTY case and then
> using do/while may be a nice middle.

I think the QSLIST_EMPTY case is very unusual. I don't know if these are
premature or not; I just asked because the !QSLIST_NEXT() case will be the most
common one.

Fam



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v15 0/2] virtio-crypto: virtio crypto device specification

2017-01-16 Thread Gonglei (Arei)
Hi Michael and others,

I'd like to redefine struct virtio_crypto_op_data_req as below:

struct virtio_crypto_op_data_req {
struct virtio_crypto_op_header header;

union {
struct virtio_crypto_sym_data_req   sym_req;
struct virtio_crypto_hash_data_req  hash_req;
struct virtio_crypto_mac_data_req   mac_req;
struct virtio_crypto_aead_data_req  aead_req;
struct virtio_crypto_sym_data_req_non_sess   sym_non_sess_req;
struct virtio_crypto_hash_data_req_non_sess  hash_non_sess_req;
struct virtio_crypto_mac_data_req_non_sess   mac_non_sess_req;
struct virtio_crypto_aead_data_req_non_sess  aead_non_sess_req;
__u8 padding[72];
} u;
};

The length of the union in the structure will change from the current 48 bytes
to 72 bytes.

We couldn't define a separate structure named virtio_crypto_op_data_req_non_sess,
because the feature bits are for crypto services, not for the wrapper packet
request.

It's impossible to mix the use of struct virtio_crypto_op_data_req and
struct virtio_crypto_op_data_req_non_sess on one data virtqueue.

For driver compatibility, I can submit patches for the Linux driver and QEMU
to change the length of struct virtio_crypto_op_data_req.u.

Is this approach acceptable?

Thanks,
-Gonglei


> -Original Message-
> From: Gonglei (Arei)
> Sent: Saturday, January 14, 2017 9:21 AM
> To: 'Michael S. Tsirkin'
> Cc: Halil Pasic; qemu-devel@nongnu.org; virtio-...@lists.oasis-open.org;
> Huangweidong (C); john.grif...@intel.com; cornelia.h...@de.ibm.com;
> Zhoujian (jay, Euler); varun.se...@freescale.com; denglin...@chinamobile.com;
> arei.gong...@hotmail.com; ag...@suse.de; nmo...@kalray.eu;
> vincent.jar...@6wind.com; ola.liljed...@arm.com; Luonengjun;
> xin.z...@intel.com; liang.j...@intel.com; stefa...@redhat.com; Shiqing Fan;
> Jani Kokkonen; brian.a.keat...@intel.com; Claudio Fontana;
> mike.cara...@nxp.com; Wubin (H)
> Subject: RE: [virtio-dev] Re: [Qemu-devel] [PATCH v15 0/2] virtio-crypto: 
> virtio
> crypto device specification
> 
> >
> > On Fri, Jan 13, 2017 at 02:54:29AM +, Gonglei (Arei) wrote:
> > >
> > > >
> > > > On Thu, Jan 12, 2017 at 12:26:24PM +, Gonglei (Arei) wrote:
> > > > > Hi,
> > > > >
> > > > >
> > > > > >
> > > > > > On 01/04/2017 11:10 AM, Gonglei (Arei) wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I attach the diff files between v14 and v15 for better review.
> > > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > only had a quick look. Will try to come back to this later.
> > > > > >
> > > > > That's cool.
> > > > >
> > > > > > > diff --git a/virtio-crypto.tex b/virtio-crypto.tex
> > > > > > > index 9f7faf0..884ee95 100644
> > > > > > > --- a/virtio-crypto.tex
> > > > > > > +++ b/virtio-crypto.tex
> > > > > > > @@ -2,8 +2,8 @@
> > > > > > >
> > > > > > >  The virtio crypto device is a virtual cryptography device as 
> > > > > > > well as a
> > kind
> > > > of
> > > > > > >  virtual hardware accelerator for virtual machines. The encryption
> > and
> > > > > > > -decryption requests are placed in the data queue and are 
> > > > > > > ultimately
> > > > handled
> > > > > > by the
> > > > > > > -backend crypto accelerators. The second queue is the control 
> > > > > > > queue
> > used
> > > > to
> > > > > > create
> > > > > > > +decryption requests are placed in any of the data active queues 
> > > > > > > and
> > are
> > > > > > ultimately handled by the
> > > > > > s/data active/active data/
> > > > > > > +backend crypto accelerators. The second kind of queue is the
> control
> > > > queue
> > > > > > used to create
> > > > > > >  or destroy sessions for symmetric algorithms and will control 
> > > > > > > some
> > > > > > advanced
> > > > > > >  features in the future. The virtio crypto device provides the
> following
> > > > crypto
> > > > > > >  services: CIPHER, MAC, HASH, and AEAD.
> > > > > >
> > > > > > [..]
> > > > > >
> > > > > > > ===The below diff shows the changes of add
> > non-session
> > > > mode
> > > > > > support:
> > > > > > >
> > > > > > > diff --git a/virtio-crypto.tex b/virtio-crypto.tex
> > > > > > > index 884ee95..44819f9 100644
> > > > > > > --- a/virtio-crypto.tex
> > > > > > > +++ b/virtio-crypto.tex
> > > > > > > @@ -26,7 +26,10 @@ N is set by \field{max_dataqueues}.
> > > > > > >
> > > > > > >  \subsection{Feature bits}\label{sec:Device Types / Crypto Device 
> > > > > > > /
> > > > Feature
> > > > > > bits}
> > > > > > >
> > > > > > > -None currently defined.
> > > > > > > +VIRTIO_CRYPTO_F_CIPHER_SESSION_MODE (1) Session mode is
> > > > available
> > > > > > for CIPHER service.
> > > > > > > +VIRTIO_CRYPTO_F_HASH_SESSION_MODE (2) Session mode is
> > available
> > > > for
> > > > > > HASH service.
> > > > > > > +VIRTIO_CRYPTO_F_MAC_SESSION_MODE (3) Session mode is
> > available
> > > > for
> > > > > > MAC service.
> > > > > > > +VIRTIO_CRYPTO_F_AEAD_SESSION_MODE (4) Session mode is
> > available
> > > > for
> > > > > > AEAD ser

Re: [Qemu-devel] [PATCH 05/16] io: make qio_channel_yield aware of AioContexts

2017-01-16 Thread Fam Zheng
On Mon, 01/16 13:24, Paolo Bonzini wrote:
> 
> 
> On 16/01/2017 12:38, Fam Zheng wrote:
> >> +void qio_channel_detach_aio_context(QIOChannel *ioc)
> >> +{
> >> +ioc->read_coroutine = NULL;
> >> +ioc->write_coroutine = NULL;
> >> +qio_channel_set_aio_fd_handlers(ioc);
> >> +ioc->ctx = NULL;
> >
> > Why is qio_channel_set_aio_fd_handler not needed here?
> 
> Because there are no read_coroutine and write_coroutine anymore.  The
> caller needs to schedule them on the right AioContext after calling
> qio_channel_set_aio_context.  See nbd_client_attach_aio_context in the
> next patch for an example.
> 
> >> -tests/test-char$(EXESUF): tests/test-char.o qemu-char.o qemu-timer.o 
> >> $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y)
> >> +tests/test-char$(EXESUF): tests/test-char.o qemu-char.o qemu-timer.o 
> >> $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y) $(test-block-obj-y)
> > 
> > I guess this is a hint for moving coroutine code into a lower level library 
> > like
> > util.
> 
> Coroutine, or AioContext?  The reason for this is that io/ now uses
> aio_co_wake.

Or both. It just feels a bit odd to see *char* depend on *block*; maybe there
are more such dependencies to come, even outside tests/?

Not necessarily for this series, of course.

Fam



Re: [Qemu-devel] [PATCH 06/16] nbd: do not block on partial reply header reads

2017-01-16 Thread Fam Zheng
On Fri, 01/13 14:17, Paolo Bonzini wrote:
> Read the replies from a coroutine, switching the read side between the
> "read header" coroutine and the I/O coroutine that reads the body of
> the reply.
> 
> qio_channel_yield is used so that the right coroutine is restarted
> automatically, eliminating the need for send_coroutine in
> NBDClientSession.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/nbd-client.c | 108 
> +
>  block/nbd-client.h |   2 +-
>  nbd/client.c   |   2 +-
>  nbd/common.c   |   9 +
>  4 files changed, 45 insertions(+), 76 deletions(-)
> 
> @@ -65,54 +67,34 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>  client->ioc = NULL;
>  }
>  
> -static void nbd_reply_ready(void *opaque)
> +static void nbd_read_reply_entry(void *opaque)
>  {
> -BlockDriverState *bs = opaque;
> -NBDClientSession *s = nbd_get_client_session(bs);
> +NBDClientSession *s = opaque;
>  uint64_t i;
>  int ret;
>  
> -if (!s->ioc) { /* Already closed */
> -return;
> -}
> -
> -if (s->reply.handle == 0) {
> -/* No reply already in flight.  Fetch a header.  It is possible
> - * that another thread has done the same thing in parallel, so
> - * the socket is not readable anymore.
> - */
> +for (;;) {
> +assert(s->reply.handle == 0);
>  ret = nbd_receive_reply(s->ioc, &s->reply);
> -if (ret == -EAGAIN) {
> -return;
> -}
>  if (ret < 0) {
> -s->reply.handle = 0;
> -goto fail;
> +break;
>  }
> -}
> -
> -/* There's no need for a mutex on the receive side, because the
> - * handler acts as a synchronization point and ensures that only
> - * one coroutine is called until the reply finishes.  */
> -i = HANDLE_TO_INDEX(s, s->reply.handle);
> -if (i >= MAX_NBD_REQUESTS) {
> -goto fail;
> -}
>  
> -if (s->recv_coroutine[i]) {
> -qemu_coroutine_enter(s->recv_coroutine[i]);
> -return;
> -}
> -
> -fail:
> -nbd_teardown_connection(bs);
> -}
> +/* There's no need for a mutex on the receive side, because the
> + * handler acts as a synchronization point and ensures that only
> + * one coroutine is called until the reply finishes.
> + */
> +i = HANDLE_TO_INDEX(s, s->reply.handle);
> +if (i >= MAX_NBD_REQUESTS || !s->recv_coroutine[i]) {
> +break;
> +}
>  
> -static void nbd_restart_write(void *opaque)
> -{
> -BlockDriverState *bs = opaque;
> +aio_co_wake(s->recv_coroutine[i]);
>  
> -qemu_coroutine_enter(nbd_get_client_session(bs)->send_coroutine);
> +/* We're woken up by the recv_coroutine itself.  */

"Wait until we're woken by ..." ?

> +qemu_coroutine_yield();
> +}
> +s->read_reply_co = NULL;
>  }
>  
>  static int nbd_co_send_request(BlockDriverState *bs,
> @@ -120,7 +102,6 @@ static int nbd_co_send_request(BlockDriverState *bs,
> QEMUIOVector *qiov)
>  {
>  NBDClientSession *s = nbd_get_client_session(bs);
> -AioContext *aio_context;
>  int rc, ret, i;
>  
>  qemu_co_mutex_lock(&s->send_mutex);
> @@ -141,11 +122,6 @@ static int nbd_co_send_request(BlockDriverState *bs,
>  return -EPIPE;
>  }
>  
> -s->send_coroutine = qemu_coroutine_self();
> -aio_context = bdrv_get_aio_context(bs);
> -
> -aio_set_fd_handler(aio_context, s->sioc->fd, false,
> -   nbd_reply_ready, nbd_restart_write, NULL, bs);
>  if (qiov) {
>  qio_channel_set_cork(s->ioc, true);
>  rc = nbd_send_request(s->ioc, request);
> @@ -160,9 +136,6 @@ static int nbd_co_send_request(BlockDriverState *bs,
>  } else {
>  rc = nbd_send_request(s->ioc, request);
>  }
> -aio_set_fd_handler(aio_context, s->sioc->fd, false,
> -   nbd_reply_ready, NULL, NULL, bs);
> -s->send_coroutine = NULL;
>  qemu_co_mutex_unlock(&s->send_mutex);
>  return rc;
>  }
> @@ -174,8 +147,7 @@ static void nbd_co_receive_reply(NBDClientSession *s,
>  {
>  int ret;
>  
> -/* Wait until we're woken up by the read handler.  TODO: perhaps
> - * peek at the next reply and avoid yielding if it's ours?  */
> +/* Wait until we're woken up by nbd_read_reply_entry.  */
>  qemu_coroutine_yield();
>  *reply = s->reply;
>  if (reply->handle != request->handle ||
> @@ -209,14 +181,18 @@ static void nbd_coroutine_start(NBDClientSession *s,
>  /* s->recv_coroutine[i] is set as soon as we get the send_lock.  */
>  }
>  
> -static void nbd_coroutine_end(NBDClientSession *s,
> +static void nbd_coroutine_end(BlockDriverState *bs,
>NBDRequest *request)
>  {
> +NBDClientSession *s = nbd_get_client_session(bs);
>  int i = HANDLE_TO_INDEX(s, request->handle);
> +
>  s->recv_coro

Re: [Qemu-devel] [PATCH 04/16] io: add methods to set I/O handlers on AioContext

2017-01-16 Thread Daniel P. Berrange
On Fri, Jan 13, 2017 at 02:17:19PM +0100, Paolo Bonzini wrote:
> This is in preparation for making qio_channel_yield work on
> AioContexts other than the main one.
> 
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Daniel P. Berrange 

> diff --git a/io/channel-command.c b/io/channel-command.c
> index ad25313..4000b61 100644
> --- a/io/channel-command.c
> +++ b/io/channel-command.c
> @@ -328,6 +328,18 @@ static int qio_channel_command_close(QIOChannel *ioc,
>  }
>  
>  
> +static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc,
> +AioContext *ctx,
> +IOHandler *io_read,
> +IOHandler *io_write,
> +void *opaque)

nitpick alignment


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH 04/16] io: add methods to set I/O handlers on AioContext

2017-01-16 Thread Daniel P. Berrange
On Fri, Jan 13, 2017 at 02:17:19PM +0100, Paolo Bonzini wrote:
> This is in preparation for making qio_channel_yield work on
> AioContexts other than the main one.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  include/io/channel.h | 30 ++
>  io/channel-command.c | 13 +
>  io/channel-file.c| 11 +++
>  io/channel-socket.c  | 16 +++-
>  io/channel-tls.c | 12 
>  io/channel-watch.c   |  6 ++
>  io/channel.c | 11 +++
>  7 files changed, 94 insertions(+), 5 deletions(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 32a9470..665edd7 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h

> @@ -80,6 +83,9 @@ struct QIOChannel {
>  Object parent;
>  unsigned int features; /* bitmask of QIOChannelFeatures */
>  char *name;
> +AioContext *ctx;
> +QIOChannelRestart *read_coroutine;
> +QIOChannelRestart *write_coroutine;

This hunk belongs in the next patch since its not used here and you also
change the data types in the next patch.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH 05/16] io: make qio_channel_yield aware of AioContexts

2017-01-16 Thread Daniel P. Berrange
On Mon, Jan 16, 2017 at 07:38:24PM +0800, Fam Zheng wrote:
> On Fri, 01/13 14:17, Paolo Bonzini wrote:
> > Support separate coroutines for reading and writing, and place the
> > read/write handlers on the AioContext that the QIOChannel is registered
> > with.
> > 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  include/io/channel.h   | 37 ++
> >  io/channel.c   | 86 
> > ++
> >  tests/Makefile.include |  2 +-
> >  3 files changed, 96 insertions(+), 29 deletions(-)
> > 
> > diff --git a/include/io/channel.h b/include/io/channel.h
> > index 665edd7..d7bad94 100644
> > --- a/include/io/channel.h
> > +++ b/include/io/channel.h
> > @@ -23,6 +23,7 @@
> >  
> >  #include "qemu-common.h"
> >  #include "qom/object.h"
> > +#include "qemu/coroutine.h"
> >  #include "block/aio.h"
> >  
> >  #define TYPE_QIO_CHANNEL "qio-channel"
> > @@ -59,8 +60,6 @@ typedef gboolean (*QIOChannelFunc)(QIOChannel *ioc,
> > GIOCondition condition,
> > gpointer data);
> >  
> > -typedef struct QIOChannelRestart QIOChannelRestart;
> > -
> >  /**
> >   * QIOChannel:
> >   *
> > @@ -84,8 +83,8 @@ struct QIOChannel {
> >  unsigned int features; /* bitmask of QIOChannelFeatures */
> >  char *name;
> >  AioContext *ctx;
> > -QIOChannelRestart *read_coroutine;
> > -QIOChannelRestart *write_coroutine;
> > +Coroutine *read_coroutine;
> > +Coroutine *write_coroutine;
> >  #ifdef _WIN32
> >  HANDLE event; /* For use with GSource on Win32 */
> >  #endif
> > @@ -508,13 +507,37 @@ guint qio_channel_add_watch(QIOChannel *ioc,
> >  
> >  
> >  /**
> > + * qio_channel_set_aio_context:
> > + * @ioc: the channel object
> > + * @ctx: the #AioContext to set the handlers on
> > + *
> > + * Request that qio_channel_yield() sets I/O handlers on
> > + * the given #AioContext.  If @ctx is %NULL, qio_channel_yield()
> > + * uses QEMU's main thread event loop.
> > + */
> > +void qio_channel_set_aio_context(QIOChannel *ioc,
> > + AioContext *ctx);
> > +
> > +/**
> > + * qio_channel_detach_aio_context:
> > + * @ioc: the channel object
> > + *
> > + * Disable any I/O handlers set by qio_channel_yield().  With the
> > + * help of aio_co_schedule(), this allows moving a coroutine that was
> > + * paused by qio_channel_yield() to another context.
> > + */
> > +void qio_channel_detach_aio_context(QIOChannel *ioc);
> > +
> > +/**
> >   * qio_channel_yield:
> >   * @ioc: the channel object
> >   * @condition: the I/O condition to wait for
> >   *
> > - * Yields execution from the current coroutine until
> > - * the condition indicated by @condition becomes
> > - * available.
> > + * Yields execution from the current coroutine until the condition
> > + * indicated by @condition becomes available.  @condition must
> > + * be either %G_IO_IN or %G_IO_OUT; it cannot contain both.  In
> > + * addition, no two coroutine can be waiting on the same condition
> > + * and channel at the same time.
> >   *
> >   * This must only be called from coroutine context
> >   */
> > diff --git a/io/channel.c b/io/channel.c
> > index ce470d7..1e043bf 100644
> > --- a/io/channel.c
> > +++ b/io/channel.c
> > @@ -21,7 +21,7 @@
> >  #include "qemu/osdep.h"
> >  #include "io/channel.h"
> >  #include "qapi/error.h"
> > -#include "qemu/coroutine.h"
> > +#include "qemu/main-loop.h"
> >  
> >  bool qio_channel_has_feature(QIOChannel *ioc,
> >   QIOChannelFeature feature)
> > @@ -238,36 +238,80 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
> >  }
> >  
> >  
> > -typedef struct QIOChannelYieldData QIOChannelYieldData;
> > -struct QIOChannelYieldData {
> > -QIOChannel *ioc;
> > -Coroutine *co;
> > -};
> > +static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc);
> > +
> > +static void qio_channel_restart_read(void *opaque)
> > +{
> > +QIOChannel *ioc = opaque;
> > +Coroutine *co = ioc->read_coroutine;
> >  
> > +ioc->read_coroutine = NULL;
> > +qio_channel_set_aio_fd_handlers(ioc);
> > +aio_co_wake(co);
> > +}
> >  
> > -static gboolean qio_channel_yield_enter(QIOChannel *ioc,
> > -GIOCondition condition,
> > -gpointer opaque)
> > +static void qio_channel_restart_write(void *opaque)
> >  {
> > -QIOChannelYieldData *data = opaque;
> > -qemu_coroutine_enter(data->co);
> > -return FALSE;
> > +QIOChannel *ioc = opaque;
> > +Coroutine *co = ioc->write_coroutine;
> > +
> > +ioc->write_coroutine = NULL;
> > +qio_channel_set_aio_fd_handlers(ioc);
> > +aio_co_wake(co);
> >  }
> >  
> > +static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc)
> > +{
> > +IOHandler *rd_handler = NULL, *wr_handler = NULL;
> > +AioContext *ctx;
> > +
> > +if (ioc->read_coroutine) {
> > +   rd_handler = qio_channel_restart_read;
> 
> s/\t/   

Re: [Qemu-devel] [PATCH 05/16] io: make qio_channel_yield aware of AioContexts

2017-01-16 Thread Daniel P. Berrange
On Fri, Jan 13, 2017 at 02:17:20PM +0100, Paolo Bonzini wrote:
> Support separate coroutines for reading and writing, and place the
> read/write handlers on the AioContext that the QIOChannel is registered
> with.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  include/io/channel.h   | 37 ++
>  io/channel.c   | 86 
> ++
>  tests/Makefile.include |  2 +-
>  3 files changed, 96 insertions(+), 29 deletions(-)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 665edd7..d7bad94 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -23,6 +23,7 @@
>  
>  #include "qemu-common.h"
>  #include "qom/object.h"
> +#include "qemu/coroutine.h"
>  #include "block/aio.h"
>  
>  #define TYPE_QIO_CHANNEL "qio-channel"
> @@ -59,8 +60,6 @@ typedef gboolean (*QIOChannelFunc)(QIOChannel *ioc,
> GIOCondition condition,
> gpointer data);
>  
> -typedef struct QIOChannelRestart QIOChannelRestart;
> -
>  /**
>   * QIOChannel:
>   *
> @@ -84,8 +83,8 @@ struct QIOChannel {
>  unsigned int features; /* bitmask of QIOChannelFeatures */
>  char *name;
>  AioContext *ctx;
> -QIOChannelRestart *read_coroutine;
> -QIOChannelRestart *write_coroutine;
> +Coroutine *read_coroutine;
> +Coroutine *write_coroutine;

Need to squash in part of previous patch here.

>  #ifdef _WIN32
>  HANDLE event; /* For use with GSource on Win32 */
>  #endif
> @@ -508,13 +507,37 @@ guint qio_channel_add_watch(QIOChannel *ioc,
>  
>  
>  /**
> + * qio_channel_set_aio_context:
> + * @ioc: the channel object
> + * @ctx: the #AioContext to set the handlers on
> + *
> + * Request that qio_channel_yield() sets I/O handlers on
> + * the given #AioContext.  If @ctx is %NULL, qio_channel_yield()
> + * uses QEMU's main thread event loop.
> + */

Can you note that it is explicitly permitted to call this while
inside a qio_channel_yield().

> +void qio_channel_set_aio_context(QIOChannel *ioc,
> + AioContext *ctx);
> +
> +/**
> + * qio_channel_detach_aio_context:
> + * @ioc: the channel object
> + *
> + * Disable any I/O handlers set by qio_channel_yield().  With the
> + * help of aio_co_schedule(), this allows moving a coroutine that was
> + * paused by qio_channel_yield() to another context.
> + */
> +void qio_channel_detach_aio_context(QIOChannel *ioc);

> diff --git a/io/channel.c b/io/channel.c
> index ce470d7..1e043bf 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -21,7 +21,7 @@
>  #include "qemu/osdep.h"
>  #include "io/channel.h"
>  #include "qapi/error.h"
> -#include "qemu/coroutine.h"
> +#include "qemu/main-loop.h"
>  
>  bool qio_channel_has_feature(QIOChannel *ioc,
>   QIOChannelFeature feature)
> @@ -238,36 +238,80 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
>  }
>  
>  
> -typedef struct QIOChannelYieldData QIOChannelYieldData;
> -struct QIOChannelYieldData {
> -QIOChannel *ioc;
> -Coroutine *co;
> -};
> +static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc);
> +
> +static void qio_channel_restart_read(void *opaque)
> +{
> +QIOChannel *ioc = opaque;
> +Coroutine *co = ioc->read_coroutine;
>  
> +ioc->read_coroutine = NULL;
> +qio_channel_set_aio_fd_handlers(ioc);
> +aio_co_wake(co);
> +}
>  
> -static gboolean qio_channel_yield_enter(QIOChannel *ioc,
> -GIOCondition condition,
> -gpointer opaque)
> +static void qio_channel_restart_write(void *opaque)
>  {
> -QIOChannelYieldData *data = opaque;
> -qemu_coroutine_enter(data->co);
> -return FALSE;
> +QIOChannel *ioc = opaque;
> +Coroutine *co = ioc->write_coroutine;
> +
> +ioc->write_coroutine = NULL;
> +qio_channel_set_aio_fd_handlers(ioc);
> +aio_co_wake(co);
>  }
>  
> +static void qio_channel_set_aio_fd_handlers(QIOChannel *ioc)
> +{
> +IOHandler *rd_handler = NULL, *wr_handler = NULL;
> +AioContext *ctx;
> +
> +if (ioc->read_coroutine) {
> + rd_handler = qio_channel_restart_read;
> +}
> +if (ioc->write_coroutine) {
> + rd_handler = qio_channel_restart_write;
> +}

Tab damage.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH 05/16] io: make qio_channel_yield aware of AioContexts

2017-01-16 Thread Daniel P. Berrange
On Mon, Jan 16, 2017 at 08:47:28PM +0800, Fam Zheng wrote:
> On Mon, 01/16 13:24, Paolo Bonzini wrote:
> > 
> > 
> > On 16/01/2017 12:38, Fam Zheng wrote:
> > >> +void qio_channel_detach_aio_context(QIOChannel *ioc)
> > >> +{
> > >> +ioc->read_coroutine = NULL;
> > >> +ioc->write_coroutine = NULL;
> > >> +qio_channel_set_aio_fd_handlers(ioc);
> > >> +ioc->ctx = NULL;
> > >
> > > Why is qio_channel_set_aio_fd_handler not needed here?
> > 
> > Because there are no read_coroutine and write_coroutine anymore.  The
> > caller needs to schedule them on the right AioContext after calling
> > qio_channel_set_aio_context.  See nbd_client_attach_aio_context in the
> > next patch for an example.
> > 
> > >> -tests/test-char$(EXESUF): tests/test-char.o qemu-char.o qemu-timer.o 
> > >> $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y)
> > >> +tests/test-char$(EXESUF): tests/test-char.o qemu-char.o qemu-timer.o 
> > >> $(test-util-obj-y) $(qtest-obj-y) $(test-io-obj-y) $(test-block-obj-y)
> > > 
> > > I guess this is a hint for moving coroutine code into a lower level 
> > > library like
> > > util.
> > 
> > Coroutine, or AioContext?  The reason for this is that io/ now uses
> > aio_co_wake.
> 
> Or both. It just feels a bit odd to see *char* depend on *block*; maybe
> there are more such dependencies to come, even outside tests/?
> 
> Not necessarily for this series, of course.

On the contrary, I think it should be in this series - we shouldn't
introduce a dependency on the block layer from the io layer.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH 10/16] block: explicitly acquire aiocontext in timers that need it

2017-01-16 Thread Fam Zheng
On Fri, 01/13 14:17, Paolo Bonzini wrote:
> diff --git a/block/qed.c b/block/qed.c
> index 7f1c508..a21d025 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -345,10 +345,22 @@ static void qed_need_check_timer_cb(void *opaque)
>  
>  trace_qed_need_check_timer_cb(s);
>  
> +qed_acquire(s);
>  qed_plug_allocating_write_reqs(s);
>  
>  /* Ensure writes are on disk before clearing flag */
>  bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s);
> +qed_release(s);
> +}
> +
> +void qed_acquire(BDRVQEDState *s)
> +{
> +aio_context_acquire(bdrv_get_aio_context(s->bs));
> +}
> +
> +void qed_release(BDRVQEDState *s)
> +{
> +aio_context_release(bdrv_get_aio_context(s->bs));
>  }
>  
>  static void qed_start_need_check_timer(BDRVQEDState *s)
> diff --git a/block/qed.h b/block/qed.h
> index 9676ab9..ce8c314 100644
> --- a/block/qed.h
> +++ b/block/qed.h
> @@ -198,6 +198,9 @@ enum {
>   */
>  typedef void QEDFindClusterFunc(void *opaque, int ret, uint64_t offset, 
> size_t len);
>  
> +void qed_acquire(BDRVQEDState *s);
> +void qed_release(BDRVQEDState *s);
> +

Why cannot these be local (static) functions, in block/qed.c?

>  /**
>   * Generic callback for chaining async callbacks
>   */

Fam



Re: [Qemu-devel] Towards an ivshmem 2.0?

2017-01-16 Thread Jan Kiszka
Hi Marc-André,

On 2017-01-16 13:41, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka  > wrote:
> 
> Hi,
> 
> some of you may know that we are using a shared memory device similar to
> ivshmem in the partitioning hypervisor Jailhouse [1].
> 
> We started as being compatible to the original ivshmem that QEMU
> implements, but we quickly deviated in some details, and in the recent
> months even more. Some of the deviations are related to making the
> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
> aiming at safety critical systems and, therefore, a small code base.
> Other changes address deficits in the original design, like missing
> life-cycle management.
> 
> Now the question is if there is interest in defining a common new
> revision of this device and maybe also of some protocols used on top,
> such as virtual network links. Ideally, this would enable us to share
> Linux drivers. We will definitely go for upstreaming at least a network
> driver such as [2], a UIO driver and maybe also a serial port/console.
> 
> 
> This sounds like duplicating efforts done with virtio and vhost-pci.
> Have you looked at Wei Wang proposal?

I didn't follow it recently, but the original concept was about
introducing an IOMMU model to the picture, and that's complexity-wise a
no-go for us (we can do this whole thing in less than 500 lines, even
virtio itself is more complex). IIUC, the alternative to an IOMMU is
mapping the whole frontend VM memory into the backend VM - that's
security/safety-wise an absolute no-go.

> 
> I've attached a first draft of the specification of our new ivshmem
> device. A working implementation can be found in the wip/ivshmem2 branch
> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
> 
> 
> You don't have qemu branch, right?

Yes, not yet. I would look into creating a QEMU device model if there is
serious interest.

>  
> 
> 
> Deviations from the original design:
> 
> - Only two peers per link
> 
> 
> sounds sane, that's also what vhost-pci aims for, AFAIK
>  
> 
>   This simplifies the implementation and also the interfaces (think of
>   life-cycle management in a multi-peer environment). Moreover, we do
>   not have an urgent use case for multiple peers, thus also no
>   reference for a protocol that could be used in such setups. If someone
>   else happens to share such a protocol, it would be possible to discuss
>   potential extensions and their implications.
> 
> - Side-band registers to discover and configure shared memory regions
> 
>   This was one of the first changes: We removed the memory regions from
>   the PCI BARs and gave them special configuration space registers. By
>   now, these registers are embedded in a PCI capability. The reasons are
>   that Jailhouse does not allow relocating the regions in guest address
>   space (but other hypervisors may if they like to) and that we now have
>   up to three of them.
> 
> 
>  Sorry, I can't comment on that.
> 
> 
> - Changed PCI base class code to 0xff (unspecified class)
> 
>   This allows us to define our own subclasses and interfaces. That is
>   now exploited for specifying the shared memory protocol the two
>   connected peers should use. It also allows the Linux drivers to match
>   on that.
> 
> 
> Why not, but it worries me that you are going to end up inventing
> protocols similar to the virtio devices.

That partly comes with the desire to simplify the transport (pure shared
memory). With ivshmem-net, we are at least reusing virtio rings and will
try to do this with the new (and faster) virtio ring format as well.

>  
> 
> - INTx interrupts support is back
> 
>   This is needed on target platforms without MSI controllers, i.e.
>   without the required guest support. Namely some PCI-less ARM SoCs
>   required the reintroduction. While doing this, we also took care of
>   keeping the MMIO registers free of privileged controls so that a
>   guest OS can map them safely into a guest userspace application.
> 
> 
> Right, it's not completely removed from ivshmem qemu upstream, although
> it should probably be allowed to set up a doorbell-ivshmem with msi=off
> (this may be quite trivial to add back)
>  
> 
> And then there are some extensions of the original ivshmem:
> 
> - Multiple shared memory regions, including unidirectional ones
> 
>   It is now possible to expose up to three different shared memory
>   regions: The first one is read/writable for both sides. The second
>   region is read/writable for the local peer and read-only for the
>   remote peer (useful for output queues). And the third is read-only
>   locally but read/writable remotely (i.e. for input queues).
>   Unidirectional regions prevent that the receiver o

Re: [Qemu-devel] Performance about x-data-plane

2017-01-16 Thread Stefan Hajnoczi
On Tue, Jan 03, 2017 at 12:02:14PM -0500, Weiwei Jia wrote:
> > The expensive part is the virtqueue kick.  Recently we tried polling the
> > virtqueue instead of waiting for the ioeventfd file descriptor and got
> > double-digit performance improvements:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00148.html
> >
> > If you want to understand the performance of your benchmark you'll have to
> > compare host/guest disk stats (e.g. request lifetime, disk utilization,
> > queue depth, average request size) to check that the bare metal and
> > guest workloads are really sending comparable I/O patterns to the
> > physical disk.
> >
> > Then you can use Linux and/or QEMU tracing to analyze the request latency
> > by looking at interesting points in the request lifecycle like virtqueue
> > kick, host Linux AIO io_submit(2), etc.
> >
> 
> Thank you. I will look into "polling the virtqueue" as you said above.
> Currently, I just use blktrace to see disk stats and add logs in the
> I/O workload to see the time latency for each request. What kind of
> tools are you using to analyze request lifecycle like virtqueue kick,
> host Linux AIO io_submit, etc.?
> 
> Do you trace the lifecycle like this
> (http://www.linux-kvm.org/page/Virtio/Block/Latency#Performance_data)?
> It seems to be out of date. Does it
> (http://repo.or.cz/qemu-kvm/stefanha.git/shortlog/refs/heads/tracing-dev-0.12.4)
> still work on QEMU 2.4.1?

The details are out of date but the general approach to tracing the I/O
request lifecycle still applies.

There are multiple tracing tools that can do what you need.  I've CCed
Karl Rister who did the latest virtio-blk dataplane tracing.

"perf record -a -e kvm:\*" is a good start.  You can use "perf probe" to
trace QEMU's trace events (recent versions have sdt support, which means
SystemTap tracepoints work) and also trace any function in QEMU:
http://blog.vmsplice.net/2011/03/how-to-use-perf-probe.html

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 3/3] nbd: do not block on partial reply header reads

2017-01-16 Thread Stefan Hajnoczi
On Fri, Dec 23, 2016 at 07:26:41PM +0100, Paolo Bonzini wrote:
> @@ -65,54 +67,34 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>  client->ioc = NULL;
>  }
>  
> -static void nbd_reply_ready(void *opaque)
> +static void nbd_read_reply_entry(void *opaque)

Please use the coroutine_fn attribute on all functions that must be
called from coroutine context.


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PULL 0/4] M68k for 2.9 patches

2017-01-16 Thread Peter Maydell
On 12 January 2017 at 18:08, Laurent Vivier  wrote:
> The following changes since commit 4201e616c0fa48de99a505a6ade979f0c2c65e28:
>
>   Merge remote-tracking branch 'remotes/kraxel/tags/pull-audio-20170111-1' 
> into staging (2017-01-12 15:57:18 +)
>
> are available in the git repository at:
>
>   git://github.com/vivier/qemu-m68k.git tags/m68k-for-2.9-pull-request
>
> for you to fetch changes up to ad9075ac2cc6cfe4003f5e47d421c0e5b2ad08a3:
>
>   m68k: Remove PCI and USB from config file (2017-01-12 18:49:09 +0100)
>
> 
> This is the list of the remaining patches that were part of the series
> reviewed during the freeze period. They use the new deposit
> and extract ops.
> There is also a cleanup patch from Thomas.
> 

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH v15 1/2] virtio-crypto: Add virtio crypto device specification

2017-01-16 Thread Stefan Hajnoczi
On Wed, Jan 04, 2017 at 06:03:04PM +0800, Gonglei wrote:
> +The general header of dataq:
> +
> +\begin{lstlisting}
> +struct virtio_crypto_op_header {
> +#define VIRTIO_CRYPTO_CIPHER_ENCRYPT \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x00)
> +#define VIRTIO_CRYPTO_CIPHER_DECRYPT \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x01)
> +#define VIRTIO_CRYPTO_HASH \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x00)
> +#define VIRTIO_CRYPTO_MAC \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_ENCRYPT \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_DECRYPT \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
> +le32 opcode;
> +/* algo should be service-specific algorithms */
> +le32 algo;
> +/* session_id should be service-specific algorithms */
> +le64 session_id;
> +#define VIRTIO_CRYPTO_FLAG_SESSION_MODE 1
> +#define VIRTIO_CRYPTO_FLAG_NONE_SESSION_MODE 2

This constant isn't mentioned anywhere in the spec; am I missing
something?

Regarding naming: "none session mode" != "non-session mode"

I think this constant should be called
VIRTIO_CRYPTO_FLAG_NON_SESSION_MODE or
VIRTIO_CRYPTO_FLAG_SESSIONLESS_MODE.


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] virtio: drop an obsolete comment

2017-01-16 Thread Stefan Hajnoczi
On Thu, Jan 12, 2017 at 11:31:56PM +0200, Michael S. Tsirkin wrote:
> virtio core has code to revert queue number
> to maximum on reset. Drop TODO to add that.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/virtio/virtio-pci.c | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 06/16] nbd: do not block on partial reply header reads

2017-01-16 Thread Paolo Bonzini


On 16/01/2017 13:52, Fam Zheng wrote:
> +/* Kick the read_reply_co to get the next reply.  */
> +aio_co_wake(s->read_reply_co);
> 
> Can't s->read_reply_co be NULL? nbd_read_reply_entry unsets it. (Surprisingly
> this file is rather unfamiliar to me; it's possible I'm missing something.)

Yes, that can happen depending on how the coroutines are scheduled when
the server goes down.

Paolo



Re: [Qemu-devel] [PATCH] nvdimm: allow read/write zero-size namespace label

2017-01-16 Thread Stefan Hajnoczi
On Sat, Jan 14, 2017 at 07:22:28PM +0800, Li Qiang wrote:
> Hello Guangrong,
> 
> 
> 2017-01-13 17:00 GMT+08:00 Xiao Guangrong :
> 
> >
> >
> > On 01/13/2017 11:02 AM, Li Qiang wrote:
> >
> >> From: Li Qiang 
> >>
> >> The spec doesn't say the namespace label size can't be zero
> >> when reading or writing it. As this is not harmful, just allow
> >> it.
> >>
> >>
> > WHY?
> >
> > The spec said that the label should be at least 128K.
> >
> 
> Yes, the label size has a limit, but in NVDIMM_DSM_Interface_Example.pdf
> section 4.5.1, when the guest gets namespace label data, the 'Length' is
> not limited; if it is 0, it will trigger this assertion:
> 
> static void nvdimm_validate_rw_label_data(NVDIMMDevice *nvdimm,
>                                           uint64_t size, uint64_t offset)
> {
>     assert((nvdimm->label_size >= size + offset) &&
>            (offset + size > offset));
> }
> 
> Though I don't know the exact behavior of this action on real hardware,
> I just think it should not trigger an assertion and exit when the guest
> requests 0-size label data.
> 
> Anyway, this is just a suggestion; if my understanding is wrong, just
> ignore it.

QEMU must prevent guests from triggering assertions.  If the assertion
causes a core dump then host resources are consumed and this could be a
denial-of-service.  An assertion failure in nested virtualization can
kill sibling VMs and is therefore also a denial-of-service.

The size=0 case must be handled in some way (either an error or a nop).

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 10/16] block: explicitly acquire aiocontext in timers that need it

2017-01-16 Thread Paolo Bonzini


On 16/01/2017 14:07, Fam Zheng wrote:
> On Fri, 01/13 14:17, Paolo Bonzini wrote:
>> diff --git a/block/qed.c b/block/qed.c
>> index 7f1c508..a21d025 100644
>> --- a/block/qed.c
>> +++ b/block/qed.c
>> @@ -345,10 +345,22 @@ static void qed_need_check_timer_cb(void *opaque)
>>  
>>  trace_qed_need_check_timer_cb(s);
>>  
>> +qed_acquire(s);
>>  qed_plug_allocating_write_reqs(s);
>>  
>>  /* Ensure writes are on disk before clearing flag */
>>  bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s);
>> +qed_release(s);
>> +}
>> +
>> +void qed_acquire(BDRVQEDState *s)
>> +{
>> +aio_context_acquire(bdrv_get_aio_context(s->bs));
>> +}
>> +
>> +void qed_release(BDRVQEDState *s)
>> +{
>> +aio_context_release(bdrv_get_aio_context(s->bs));
>>  }
>>  
>>  static void qed_start_need_check_timer(BDRVQEDState *s)
>> diff --git a/block/qed.h b/block/qed.h
>> index 9676ab9..ce8c314 100644
>> --- a/block/qed.h
>> +++ b/block/qed.h
>> @@ -198,6 +198,9 @@ enum {
>>   */
>>  typedef void QEDFindClusterFunc(void *opaque, int ret, uint64_t offset, 
>> size_t len);
>>  
>> +void qed_acquire(BDRVQEDState *s);
>> +void qed_release(BDRVQEDState *s);
>> +
> 
> Why can't these be local (static) functions in block/qed.c?

Patch 13 uses them elsewhere.  Should I put them in a separate patch?

Paolo

>>  /**
>>   * Generic callback for chaining async callbacks
>>   */
> 
> Fam
> 
> 



Re: [Qemu-devel] [Bug 1656234] [NEW] Qemu core dumped if using virtio-net

2017-01-16 Thread Stefan Hajnoczi
On Fri, Jan 13, 2017 at 08:40:22AM -, Robert Hu wrote:
> Public bug reported:
> 
> System Environment
> ===
> Qemu commit/branch: e92fbc75
> Host OS: RHEL7.2 ia32e
> Host Kernel: 4.9.0
> Guest OS: RHEL7.2 ia32e
> Guest Kernel: 4.9.0
> 
> Bug detailed description
> ===
> While creating a KVM guest using virtio-net, QEMU will core dump,
> showing "Aborted (core dumped)".
> Reproduce Steps
> ==
> create a guest:
> qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 -device 
> virtio-net-pci,netdev=nic0,mac=00:16:3e:49:be:24 -netdev 
> tap,id=nic0,script=/etc/kvm/qemu-ifup -drive 
> file=/ia32e_rhel7u2_kvm.qcow2,if=none,id=virtio-disk0 -device 
> virtio-blk-pci,drive=virtio-disk0 -serial file:serial.log
> 
> Current Result:
> ==
> 
> qemu-system-x86_64: 
> /workspace/ia32e/nightly/kvm-next-20170105-ef85b6-e92fbc/kvm/qemu/hw/virtio/virtio.c:214:
>  virtio_queue_set_notification: Assertion `vq->notification_disabled > 0' 
> failed.
> Aborted (core dumped)

Thanks for reporting this assertion failure.  A patch is queued and will
be merged soon:
https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg02277.html

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 11/16] block: explicitly acquire aiocontext in callbacks that need it

2017-01-16 Thread Fam Zheng
On Fri, 01/13 14:17, Paolo Bonzini wrote:
> diff --git a/nbd/server.c b/nbd/server.c
> index efe5cb8..08fb720 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1366,6 +1366,10 @@ static void nbd_restart_write(void *opaque)
>  static void nbd_set_handlers(NBDClient *client)
>  {
>  if (client->exp && client->exp->ctx) {
> +/* Note that the handlers do not expect any concurrency; qemu-nbd
> + * does not instantiate multiple AioContexts yet, nor does it call
> + * aio_poll/aio_dispatch from multiple threads.
> + */
>  aio_set_fd_handler(client->exp->ctx, client->sioc->fd, true,
> client->can_read ? nbd_read : NULL,
> client->send_coroutine ? nbd_restart_write : NULL,

What about the built-in server (QMP nbd_server_start)?

> -- 
> 2.9.3
> 
> 



[Qemu-devel] [PULL 04/11] aio: make ctx->list_lock a QemuLockCnt, subsuming ctx->walking_bh

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

This will make it possible to walk the list of bottom halves without
holding the AioContext lock---and in turn to call bottom half
handlers without holding the lock.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Message-id: 20170112180800.21085-4-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h | 12 +---
 async.c | 35 ---
 2 files changed, 21 insertions(+), 26 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 013d400..be3adfe 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -90,17 +90,15 @@ struct AioContext {
  */
 uint32_t notify_me;
 
-/* lock to protect between bh's adders and deleter */
-QemuMutex list_lock;
+/* A lock to protect between bh's adders and deleter, and to ensure
+ * that no callbacks are removed while we're walking and dispatching
+ * them.
+ */
+QemuLockCnt list_lock;
 
 /* Anchor of the list of Bottom Halves belonging to the context */
 struct QEMUBH *first_bh;
 
-/* A simple lock used to protect the first_bh list, and ensure that
- * no callbacks are removed while we're walking and dispatching callbacks.
- */
-int walking_bh;
-
 /* Used by aio_notify.
  *
  * "notified" is used to avoid expensive event_notifier_test_and_clear
diff --git a/async.c b/async.c
index 69292fa..2305e11 100644
--- a/async.c
+++ b/async.c
@@ -53,14 +53,14 @@ void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc 
*cb, void *opaque)
 .cb = cb,
 .opaque = opaque,
 };
-qemu_mutex_lock(&ctx->list_lock);
+qemu_lockcnt_lock(&ctx->list_lock);
 bh->next = ctx->first_bh;
 bh->scheduled = 1;
 bh->deleted = 1;
 /* Make sure that the members are ready before putting bh into list */
 smp_wmb();
 ctx->first_bh = bh;
-qemu_mutex_unlock(&ctx->list_lock);
+qemu_lockcnt_unlock(&ctx->list_lock);
 aio_notify(ctx);
 }
 
@@ -73,12 +73,12 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void 
*opaque)
 .cb = cb,
 .opaque = opaque,
 };
-qemu_mutex_lock(&ctx->list_lock);
+qemu_lockcnt_lock(&ctx->list_lock);
 bh->next = ctx->first_bh;
 /* Make sure that the members are ready before putting bh into list */
 smp_wmb();
 ctx->first_bh = bh;
-qemu_mutex_unlock(&ctx->list_lock);
+qemu_lockcnt_unlock(&ctx->list_lock);
 return bh;
 }
 
@@ -93,13 +93,11 @@ int aio_bh_poll(AioContext *ctx)
 QEMUBH *bh, **bhp, *next;
 int ret;
 
-ctx->walking_bh++;
+qemu_lockcnt_inc(&ctx->list_lock);
 
 ret = 0;
-for (bh = ctx->first_bh; bh; bh = next) {
-/* Make sure that fetching bh happens before accessing its members */
-smp_read_barrier_depends();
-next = bh->next;
+for (bh = atomic_rcu_read(&ctx->first_bh); bh; bh = next) {
+next = atomic_rcu_read(&bh->next);
 /* The atomic_xchg is paired with the one in qemu_bh_schedule.  The
  * implicit memory barrier ensures that the callback sees all writes
  * done by the scheduling thread.  It also ensures that the scheduling
@@ -116,11 +114,8 @@ int aio_bh_poll(AioContext *ctx)
 }
 }
 
-ctx->walking_bh--;
-
 /* remove deleted bhs */
-if (!ctx->walking_bh) {
-qemu_mutex_lock(&ctx->list_lock);
+if (qemu_lockcnt_dec_and_lock(&ctx->list_lock)) {
 bhp = &ctx->first_bh;
 while (*bhp) {
 bh = *bhp;
@@ -131,7 +126,7 @@ int aio_bh_poll(AioContext *ctx)
 bhp = &bh->next;
 }
 }
-qemu_mutex_unlock(&ctx->list_lock);
+qemu_lockcnt_unlock(&ctx->list_lock);
 }
 
 return ret;
@@ -187,7 +182,8 @@ aio_compute_timeout(AioContext *ctx)
 int timeout = -1;
 QEMUBH *bh;
 
-for (bh = ctx->first_bh; bh; bh = bh->next) {
+for (bh = atomic_rcu_read(&ctx->first_bh); bh;
+ bh = atomic_rcu_read(&bh->next)) {
 if (bh->scheduled) {
 if (bh->idle) {
 /* idle bottom halves will be polled at least
@@ -270,7 +266,8 @@ aio_ctx_finalize(GSource *source)
 }
 #endif
 
-qemu_mutex_lock(&ctx->list_lock);
+qemu_lockcnt_lock(&ctx->list_lock);
+assert(!qemu_lockcnt_count(&ctx->list_lock));
 while (ctx->first_bh) {
 QEMUBH *next = ctx->first_bh->next;
 
@@ -280,12 +277,12 @@ aio_ctx_finalize(GSource *source)
 g_free(ctx->first_bh);
 ctx->first_bh = next;
 }
-qemu_mutex_unlock(&ctx->list_lock);
+qemu_lockcnt_unlock(&ctx->list_lock);
 
 aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
 event_notifier_cleanup(&ctx->notifier);
 qemu_rec_mutex_destroy(&ctx->lock);
-qemu_mutex_destroy(&ctx->list_lock);
+qemu_lockcnt_destroy(&ctx->list_lock);
 timerlistgroup_deinit(&ctx->tlg);
 }
 
@@ -372,6 +369,7 @@ AioContext *aio_context_new(Err

[Qemu-devel] [PULL 02/11] aio: rename bh_lock to list_lock

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

This will be used for AioHandlers too.  There is going to be little
or no contention, so it is better to reuse the same lock.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Message-id: 20170112180800.21085-2-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h |  2 +-
 async.c | 20 ++--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 4dca54d..013d400 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -91,7 +91,7 @@ struct AioContext {
 uint32_t notify_me;
 
 /* lock to protect between bh's adders and deleter */
-QemuMutex bh_lock;
+QemuMutex list_lock;
 
 /* Anchor of the list of Bottom Halves belonging to the context */
 struct QEMUBH *first_bh;
diff --git a/async.c b/async.c
index 2960171..69292fa 100644
--- a/async.c
+++ b/async.c
@@ -53,14 +53,14 @@ void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc 
*cb, void *opaque)
 .cb = cb,
 .opaque = opaque,
 };
-qemu_mutex_lock(&ctx->bh_lock);
+qemu_mutex_lock(&ctx->list_lock);
 bh->next = ctx->first_bh;
 bh->scheduled = 1;
 bh->deleted = 1;
 /* Make sure that the members are ready before putting bh into list */
 smp_wmb();
 ctx->first_bh = bh;
-qemu_mutex_unlock(&ctx->bh_lock);
+qemu_mutex_unlock(&ctx->list_lock);
 aio_notify(ctx);
 }
 
@@ -73,12 +73,12 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void 
*opaque)
 .cb = cb,
 .opaque = opaque,
 };
-qemu_mutex_lock(&ctx->bh_lock);
+qemu_mutex_lock(&ctx->list_lock);
 bh->next = ctx->first_bh;
 /* Make sure that the members are ready before putting bh into list */
 smp_wmb();
 ctx->first_bh = bh;
-qemu_mutex_unlock(&ctx->bh_lock);
+qemu_mutex_unlock(&ctx->list_lock);
 return bh;
 }
 
@@ -120,7 +120,7 @@ int aio_bh_poll(AioContext *ctx)
 
 /* remove deleted bhs */
 if (!ctx->walking_bh) {
-qemu_mutex_lock(&ctx->bh_lock);
+qemu_mutex_lock(&ctx->list_lock);
 bhp = &ctx->first_bh;
 while (*bhp) {
 bh = *bhp;
@@ -131,7 +131,7 @@ int aio_bh_poll(AioContext *ctx)
 bhp = &bh->next;
 }
 }
-qemu_mutex_unlock(&ctx->bh_lock);
+qemu_mutex_unlock(&ctx->list_lock);
 }
 
 return ret;
@@ -270,7 +270,7 @@ aio_ctx_finalize(GSource *source)
 }
 #endif
 
-qemu_mutex_lock(&ctx->bh_lock);
+qemu_mutex_lock(&ctx->list_lock);
 while (ctx->first_bh) {
 QEMUBH *next = ctx->first_bh->next;
 
@@ -280,12 +280,12 @@ aio_ctx_finalize(GSource *source)
 g_free(ctx->first_bh);
 ctx->first_bh = next;
 }
-qemu_mutex_unlock(&ctx->bh_lock);
+qemu_mutex_unlock(&ctx->list_lock);
 
 aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
 event_notifier_cleanup(&ctx->notifier);
 qemu_rec_mutex_destroy(&ctx->lock);
-qemu_mutex_destroy(&ctx->bh_lock);
+qemu_mutex_destroy(&ctx->list_lock);
 timerlistgroup_deinit(&ctx->tlg);
 }
 
@@ -381,7 +381,7 @@ AioContext *aio_context_new(Error **errp)
 ctx->linux_aio = NULL;
 #endif
 ctx->thread_pool = NULL;
-qemu_mutex_init(&ctx->bh_lock);
+qemu_mutex_init(&ctx->list_lock);
 qemu_rec_mutex_init(&ctx->lock);
 timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
-- 
2.9.3




[Qemu-devel] [PULL 10/11] aio: document locking

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Message-id: 20170112180800.21085-10-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 docs/multiple-iothreads.txt | 13 +
 include/block/aio.h | 32 
 2 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
index 0e7cdb2..e4d340b 100644
--- a/docs/multiple-iothreads.txt
+++ b/docs/multiple-iothreads.txt
@@ -84,9 +84,8 @@ How to synchronize with an IOThread
 AioContext is not thread-safe so some rules must be followed when using file
 descriptors, event notifiers, timers, or BHs across threads:
 
-1. AioContext functions can be called safely from file descriptor, event
-notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
-necessary.
+1. AioContext functions can always be called safely.  They handle their
+own locking internally.
 
 2. Other threads wishing to access the AioContext must use
 aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
@@ -94,16 +93,14 @@ context is acquired no other thread can access it or run 
event loop iterations
 in this AioContext.
 
 aio_context_acquire()/aio_context_release() calls may be nested.  This
-means you can call them if you're not sure whether #1 applies.
+means you can call them if you're not sure whether #2 applies.
 
 There is currently no lock ordering rule if a thread needs to acquire multiple
 AioContexts simultaneously.  Therefore, it is only safe for code holding the
 QEMU global mutex to acquire other AioContexts.
 
-Side note: the best way to schedule a function call across threads is to create
-a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
-acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
-sure to acquire the AioContext for aio_bh_new() if necessary.
+Side note: the best way to schedule a function call across threads is to call
+aio_bh_schedule_oneshot().  No acquire/release or locking is needed.
 
 AioContext and the block layer
 --
diff --git a/include/block/aio.h b/include/block/aio.h
index be3adfe..7df271d 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -53,18 +53,12 @@ struct LinuxAioState;
 struct AioContext {
 GSource source;
 
-/* Protects all fields from multi-threaded access */
+/* Used by AioContext users to protect from multi-threaded access.  */
 QemuRecMutex lock;
 
-/* The list of registered AIO handlers */
+/* The list of registered AIO handlers.  Protected by ctx->list_lock. */
 QLIST_HEAD(, AioHandler) aio_handlers;
 
-/* This is a simple lock used to protect the aio_handlers list.
- * Specifically, it's used to ensure that no callbacks are removed while
- * we're walking and dispatching callbacks.
- */
-int walking_handlers;
-
 /* Used to avoid unnecessary event_notifier_set calls in aio_notify;
  * accessed with atomic primitives.  If this field is 0, everything
  * (file descriptors, bottom halves, timers) will be re-evaluated
@@ -90,9 +84,9 @@ struct AioContext {
  */
 uint32_t notify_me;
 
-/* A lock to protect between bh's adders and deleter, and to ensure
- * that no callbacks are removed while we're walking and dispatching
- * them.
+/* A lock to protect between QEMUBH and AioHandler adders and deleter,
+ * and to ensure that no callbacks are removed while we're walking and
+ * dispatching them.
  */
 QemuLockCnt list_lock;
 
@@ -114,7 +108,9 @@ struct AioContext {
 bool notified;
 EventNotifier notifier;
 
-/* Thread pool for performing work and receiving completion callbacks */
+/* Thread pool for performing work and receiving completion callbacks.
+ * Has its own locking.
+ */
 struct ThreadPool *thread_pool;
 
 #ifdef CONFIG_LINUX_AIO
@@ -124,7 +120,9 @@ struct AioContext {
 struct LinuxAioState *linux_aio;
 #endif
 
-/* TimerLists for calling timers - one per clock type */
+/* TimerLists for calling timers - one per clock type.  Has its own
+ * locking.
+ */
 QEMUTimerListGroup tlg;
 
 int external_disable_cnt;
@@ -178,9 +176,11 @@ void aio_context_unref(AioContext *ctx);
  * automatically takes care of calling aio_context_acquire and
  * aio_context_release.
  *
- * Access to timers and BHs from a thread that has not acquired AioContext
- * is possible.  Access to callbacks for now must be done while the AioContext
- * is owned by the thread (FIXME).
+ * Note that this is separate from bdrv_drained_begin/bdrv_drained_end.  A
+ * thread still has to call those to avoid being interrupted by the guest.
+ *
+ * Bottom halves, timers and callbacks can be created or removed without
+ * acquiring the AioContext.
  */
 void aio_context_acquire(AioContext *ctx);
 
-- 
2.9.3




[Qemu-devel] [PULL 01/11] block: get rid of bdrv_io_unplugged_begin/end

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

bdrv_io_plug and bdrv_io_unplug are only called (via their
BlockBackend equivalents) after starting asynchronous I/O.
bdrv_drain is not going to be called while they are running,
because---even if a coroutine runs for some reason---it will
only drain in the next iteration of the event loop through
bdrv_co_yield_to_drain.

So this mechanism is unnecessary, get rid of it.

Signed-off-by: Paolo Bonzini 
Message-id: 20161129113334.605-1-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block.h |  2 --
 include/block/block_int.h |  3 +--
 block/io.c| 41 ++---
 3 files changed, 3 insertions(+), 43 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 49bb0b2..8b0dcda 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -526,8 +526,6 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry 
*geo);
 
 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
-void bdrv_io_unplugged_begin(BlockDriverState *bs);
-void bdrv_io_unplugged_end(BlockDriverState *bs);
 
 /**
  * bdrv_drained_begin:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4e4562d..2d92d7e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -526,9 +526,8 @@ struct BlockDriverState {
 uint64_t write_threshold_offset;
 NotifierWithReturn write_threshold_notifier;
 
-/* counters for nested bdrv_io_plug and bdrv_io_unplugged_begin */
+/* counter for nested bdrv_io_plug */
 unsigned io_plugged;
-unsigned io_plug_disabled;
 
 int quiesce_counter;
 };
diff --git a/block/io.c b/block/io.c
index 4f00562..c42b34a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -228,9 +228,7 @@ void bdrv_drained_begin(BlockDriverState *bs)
 bdrv_parent_drained_begin(bs);
 }
 
-bdrv_io_unplugged_begin(bs);
 bdrv_drain_recurse(bs);
-bdrv_io_unplugged_end(bs);
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
@@ -302,7 +300,6 @@ void bdrv_drain_all_begin(void)
 
 aio_context_acquire(aio_context);
 bdrv_parent_drained_begin(bs);
-bdrv_io_unplugged_begin(bs);
 aio_disable_external(aio_context);
 aio_context_release(aio_context);
 
@@ -347,7 +344,6 @@ void bdrv_drain_all_end(void)
 
 aio_context_acquire(aio_context);
 aio_enable_external(aio_context);
-bdrv_io_unplugged_end(bs);
 bdrv_parent_drained_end(bs);
 aio_context_release(aio_context);
 }
@@ -2650,7 +2646,7 @@ void bdrv_io_plug(BlockDriverState *bs)
 bdrv_io_plug(child->bs);
 }
 
-if (bs->io_plugged++ == 0 && bs->io_plug_disabled == 0) {
+if (bs->io_plugged++ == 0) {
 BlockDriver *drv = bs->drv;
 if (drv && drv->bdrv_io_plug) {
 drv->bdrv_io_plug(bs);
@@ -2663,7 +2659,7 @@ void bdrv_io_unplug(BlockDriverState *bs)
 BdrvChild *child;
 
 assert(bs->io_plugged);
-if (--bs->io_plugged == 0 && bs->io_plug_disabled == 0) {
+if (--bs->io_plugged == 0) {
 BlockDriver *drv = bs->drv;
 if (drv && drv->bdrv_io_unplug) {
 drv->bdrv_io_unplug(bs);
@@ -2674,36 +2670,3 @@ void bdrv_io_unplug(BlockDriverState *bs)
 bdrv_io_unplug(child->bs);
 }
 }
-
-void bdrv_io_unplugged_begin(BlockDriverState *bs)
-{
-BdrvChild *child;
-
-if (bs->io_plug_disabled++ == 0 && bs->io_plugged > 0) {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_io_unplug) {
-drv->bdrv_io_unplug(bs);
-}
-}
-
-QLIST_FOREACH(child, &bs->children, next) {
-bdrv_io_unplugged_begin(child->bs);
-}
-}
-
-void bdrv_io_unplugged_end(BlockDriverState *bs)
-{
-BdrvChild *child;
-
-assert(bs->io_plug_disabled);
-QLIST_FOREACH(child, &bs->children, next) {
-bdrv_io_unplugged_end(child->bs);
-}
-
-if (--bs->io_plug_disabled == 0 && bs->io_plugged > 0) {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_io_plug) {
-drv->bdrv_io_plug(bs);
-}
-}
-}
-- 
2.9.3




[Qemu-devel] [PULL 06/11] aio-posix: split aio_dispatch_handlers out of aio_dispatch

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

This simplifies the handling of dispatch_fds.

Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
Message-id: 20170112180800.21085-6-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c | 43 +--
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 1585571..25198d9 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -367,31 +367,16 @@ bool aio_pending(AioContext *ctx)
 return false;
 }
 
-/*
- * Note that dispatch_fds == false has the side-effect of post-poning the
- * freeing of deleted handlers.
- */
-bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
+static bool aio_dispatch_handlers(AioContext *ctx)
 {
-AioHandler *node = NULL;
+AioHandler *node;
 bool progress = false;
 
 /*
- * If there are callbacks left that have been queued, we need to call them.
- * Do not call select in this case, because it is possible that the caller
- * does not need a complete flush (as is the case for aio_poll loops).
- */
-if (aio_bh_poll(ctx)) {
-progress = true;
-}
-
-/*
  * We have to walk very carefully in case aio_set_fd_handler is
  * called while we're walking.
  */
-if (dispatch_fds) {
-node = QLIST_FIRST(&ctx->aio_handlers);
-}
+node = QLIST_FIRST(&ctx->aio_handlers);
 while (node) {
 AioHandler *tmp;
 int revents;
@@ -431,6 +416,28 @@ bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
 }
 }
 
+return progress;
+}
+
+/*
+ * Note that dispatch_fds == false has the side-effect of post-poning the
+ * freeing of deleted handlers.
+ */
+bool aio_dispatch(AioContext *ctx, bool dispatch_fds)
+{
+bool progress;
+
+/*
+ * If there are callbacks left that have been queued, we need to call them.
+ * Do not call select in this case, because it is possible that the caller
+ * does not need a complete flush (as is the case for aio_poll loops).
+ */
+progress = aio_bh_poll(ctx);
+
+if (dispatch_fds) {
+progress |= aio_dispatch_handlers(ctx);
+}
+
 /* Run our timers */
 progress |= timerlistgroup_run_timers(&ctx->tlg);
 
-- 
2.9.3
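The refactoring above can be illustrated outside QEMU with a toy version: the fd-handler walk moves into its own helper, so the `dispatch_fds` flag guards a single call instead of being threaded through the loop body. This is a hypothetical sketch with illustrative names and simplified state, not QEMU's actual code.

```c
#include <stdbool.h>

/* Toy stand-in for AioContext state: counts of pending work. */
typedef struct Ctx {
    int pending_bh;   /* queued bottom halves to run   */
    int pending_fds;  /* fd handlers with events ready */
} Ctx;

static bool run_bottom_halves(Ctx *ctx)
{
    bool progress = ctx->pending_bh > 0;
    ctx->pending_bh = 0;
    return progress;
}

/* Corresponds to the new aio_dispatch_handlers(): one job only. */
static bool dispatch_handlers(Ctx *ctx)
{
    bool progress = ctx->pending_fds > 0;
    ctx->pending_fds = 0;
    return progress;
}

/* Corresponds to aio_dispatch() after the patch: the flag now guards
 * a single call rather than being checked inside the walking loop. */
bool dispatch(Ctx *ctx, bool dispatch_fds)
{
    bool progress = run_bottom_halves(ctx);
    if (dispatch_fds) {
        progress |= dispatch_handlers(ctx);
    }
    return progress;
}
```

When `dispatch_fds` is false, the fd handlers are simply never walked, which also postpones freeing deleted handlers, matching the comment kept in the patch.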




[Qemu-devel] [PULL 08/11] aio-posix: remove walking_handlers, protecting AioHandler list with list_lock

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
Message-id: 20170112180800.21085-8-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c | 65 -
 1 file changed, 39 insertions(+), 26 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index f83b7af..9453d83 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -16,7 +16,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "block/block.h"
-#include "qemu/queue.h"
+#include "qemu/rcu_queue.h"
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
 #include "trace.h"
@@ -66,7 +66,7 @@ static bool aio_epoll_try_enable(AioContext *ctx)
 AioHandler *node;
 struct epoll_event event;
 
-QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 int r;
 if (node->deleted || !node->pfd.events) {
 continue;
@@ -212,24 +212,27 @@ void aio_set_fd_handler(AioContext *ctx,
 bool is_new = false;
 bool deleted = false;
 
+qemu_lockcnt_lock(&ctx->list_lock);
+
 node = find_aio_handler(ctx, fd);
 
 /* Are we deleting the fd handler? */
 if (!io_read && !io_write && !io_poll) {
 if (node == NULL) {
+qemu_lockcnt_unlock(&ctx->list_lock);
 return;
 }
 
 g_source_remove_poll(&ctx->source, &node->pfd);
 
 /* If the lock is held, just mark the node as deleted */
-if (ctx->walking_handlers) {
+if (qemu_lockcnt_count(&ctx->list_lock)) {
 node->deleted = 1;
 node->pfd.revents = 0;
 } else {
 /* Otherwise, delete it for real.  We can't just mark it as
- * deleted because deleted nodes are only cleaned up after
- * releasing the walking_handlers lock.
+ * deleted because deleted nodes are only cleaned up while
+ * no one is walking the handlers list.
  */
 QLIST_REMOVE(node, node);
 deleted = true;
@@ -243,7 +246,7 @@ void aio_set_fd_handler(AioContext *ctx,
 /* Alloc and insert if it's not already there */
 node = g_new0(AioHandler, 1);
 node->pfd.fd = fd;
-QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
+QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
 
 g_source_add_poll(&ctx->source, &node->pfd);
 is_new = true;
@@ -265,6 +268,7 @@ void aio_set_fd_handler(AioContext *ctx,
 }
 
 aio_epoll_update(ctx, node, is_new);
+qemu_lockcnt_unlock(&ctx->list_lock);
 aio_notify(ctx);
 
 if (deleted) {
@@ -316,8 +320,8 @@ static void poll_set_started(AioContext *ctx, bool started)
 
 ctx->poll_started = started;
 
-ctx->walking_handlers++;
-QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+qemu_lockcnt_inc(&ctx->list_lock);
+QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 IOHandler *fn;
 
 if (node->deleted) {
@@ -334,7 +338,7 @@ static void poll_set_started(AioContext *ctx, bool started)
 fn(node->opaque);
 }
 }
-ctx->walking_handlers--;
+qemu_lockcnt_dec(&ctx->list_lock);
 }
 
 
@@ -349,22 +353,32 @@ bool aio_prepare(AioContext *ctx)
 bool aio_pending(AioContext *ctx)
 {
 AioHandler *node;
+bool result = false;
 
-QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+/*
+ * We have to walk very carefully in case aio_set_fd_handler is
+ * called while we're walking.
+ */
+qemu_lockcnt_inc(&ctx->list_lock);
+
+QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 int revents;
 
 revents = node->pfd.revents & node->pfd.events;
 if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read &&
 aio_node_check(ctx, node->is_external)) {
-return true;
+result = true;
+break;
 }
 if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write &&
 aio_node_check(ctx, node->is_external)) {
-return true;
+result = true;
+break;
 }
 }
+qemu_lockcnt_dec(&ctx->list_lock);
 
-return false;
+return result;
 }
 
 static bool aio_dispatch_handlers(AioContext *ctx)
@@ -376,9 +390,9 @@ static bool aio_dispatch_handlers(AioContext *ctx)
  * We have to walk very carefully in case aio_set_fd_handler is
  * called while we're walking.
  */
-ctx->walking_handlers++;
+qemu_lockcnt_inc(&ctx->list_lock);
 
-QLIST_FOREACH_SAFE(node, &ctx->aio_handlers, node, tmp) {
+QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) {
 int revents;
 
 revents = node->pfd.revents & node->pfd.events;
@@ -404,16 +418,15 @@ static bool aio_dispatch_handlers(AioContext *ctx)
 }
 
 if (node->deleted) {
-ctx->walking_handlers--;
-if (!ctx->walking_handlers)

[Qemu-devel] [PULL 00/11] Block patches

2017-01-16 Thread Stefan Hajnoczi
The following changes since commit 2ccede18bd24fce5db83fef3674563a1f256717b:

  Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-2.9-pull-request' 
into staging (2017-01-16 12:41:35 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 7d506c90afa47facdb993bc19c15863eef584f1d:

  async: optimize aio_bh_poll (2017-01-16 13:25:18 +)





Paolo Bonzini (11):
  block: get rid of bdrv_io_unplugged_begin/end
  aio: rename bh_lock to list_lock
  qemu-thread: introduce QemuLockCnt
  aio: make ctx->list_lock a QemuLockCnt, subsuming ctx->walking_bh
  qemu-thread: optimize QemuLockCnt with futexes on Linux
  aio-posix: split aio_dispatch_handlers out of aio_dispatch
  aio: tweak walking in dispatch phase
  aio-posix: remove walking_handlers, protecting AioHandler list with
list_lock
  aio-win32: remove walking_handlers, protecting AioHandler list with
list_lock
  aio: document locking
  async: optimize aio_bh_poll

 docs/lockcnt.txt| 277 +++
 docs/multiple-iothreads.txt |  13 +-
 util/Makefile.objs  |   1 +
 include/block/aio.h |  38 ++---
 include/block/block.h   |   2 -
 include/block/block_int.h   |   3 +-
 include/qemu/futex.h|  36 
 include/qemu/thread.h   | 112 +
 aio-posix.c | 118 +++--
 aio-win32.c |  83 +
 async.c |  45 ++---
 block/io.c  |  41 +
 util/lockcnt.c  | 397 
 util/qemu-thread-posix.c|  35 +---
 util/qemu-thread-win32.c|   2 +-
 util/trace-events   |  10 ++
 16 files changed, 1010 insertions(+), 203 deletions(-)
 create mode 100644 docs/lockcnt.txt
 create mode 100644 include/qemu/futex.h
 create mode 100644 util/lockcnt.c

-- 
2.9.3
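The core pattern that the QemuLockCnt patches in this series introduce (visitors increment a counter; a writer may only reclaim after atomically dropping the count to zero while holding the lock) can be sketched as a toy, single-threaded model. The struct and semantics below are illustrative assumptions, not QEMU's futex-based implementation.

```c
#include <stdbool.h>

/* Toy lock/counter pair modelling the QemuLockCnt API shape. */
typedef struct {
    int count;    /* active visitors */
    bool locked;  /* reclaim lock    */
} LockCnt;

void lockcnt_inc(LockCnt *lc) { lc->count++; }
void lockcnt_dec(LockCnt *lc) { lc->count--; }

/* Like qemu_lockcnt_dec_and_lock() in spirit: decrement, and if that
 * made the count zero, take the lock and report that reclamation of
 * nodes marked deleted may proceed. */
bool lockcnt_dec_and_lock(LockCnt *lc)
{
    if (--lc->count == 0) {
        lc->locked = true;
        return true;
    }
    return false;
}

void lockcnt_unlock(LockCnt *lc) { lc->locked = false; }
```

The point of the abstraction is that the common case (other visitors still active) touches only the counter, while the mutex is reserved for the rare reclamation step.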




[Qemu-devel] [PULL 05/11] qemu-thread: optimize QemuLockCnt with futexes on Linux

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

This is complex, but I think it is reasonably documented in the source.

Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
Message-id: 20170112180800.21085-5-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 docs/lockcnt.txt |   9 +-
 include/qemu/futex.h |  36 ++
 include/qemu/thread.h|   2 +
 util/lockcnt.c   | 283 +++
 util/qemu-thread-posix.c |  35 +-
 util/qemu-thread-win32.c |   2 +-
 util/trace-events|  10 ++
 7 files changed, 342 insertions(+), 35 deletions(-)
 create mode 100644 include/qemu/futex.h

diff --git a/docs/lockcnt.txt b/docs/lockcnt.txt
index 25a8091..2a79b32 100644
--- a/docs/lockcnt.txt
+++ b/docs/lockcnt.txt
@@ -142,12 +142,11 @@ can also be more efficient in two ways:
 - it avoids taking the lock for many operations (for example
   incrementing the counter while it is non-zero);
 
-- on some platforms, one could implement QemuLockCnt to hold the
-  lock and the mutex in a single word, making it no more expensive
+- on some platforms, one can implement QemuLockCnt to hold the lock
+  and the mutex in a single word, making the fast path no more expensive
   than simply managing a counter using atomic operations (see
-  docs/atomics.txt).  This is not implemented yet, but can be
-  very helpful if concurrent access to the data structure is
-  expected to be rare.
+  docs/atomics.txt).  This can be very helpful if concurrent access to
+  the data structure is expected to be rare.
 
 
 Using the same mutex for frees and writes can still incur some small
diff --git a/include/qemu/futex.h b/include/qemu/futex.h
new file mode 100644
index 000..bb7dc9e
--- /dev/null
+++ b/include/qemu/futex.h
@@ -0,0 +1,36 @@
+/*
+ * Wrappers around Linux futex syscall
+ *
+ * Copyright Red Hat, Inc. 2017
+ *
+ * Author:
+ *  Paolo Bonzini 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include 
+#include 
+
+#define qemu_futex(...)  syscall(__NR_futex, __VA_ARGS__)
+
+static inline void qemu_futex_wake(void *f, int n)
+{
+qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
+}
+
+static inline void qemu_futex_wait(void *f, unsigned val)
+{
+while (qemu_futex(f, FUTEX_WAIT, (int) val, NULL, NULL, 0)) {
+switch (errno) {
+case EWOULDBLOCK:
+return;
+case EINTR:
+break; /* get out of switch and retry */
+default:
+abort();
+}
+}
+}
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 5f7de7b..9910f49 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -100,7 +100,9 @@ static inline void qemu_spin_unlock(QemuSpin *spin)
 }
 
 struct QemuLockCnt {
+#ifndef CONFIG_LINUX
 QemuMutex mutex;
+#endif
 unsigned count;
 };
 
diff --git a/util/lockcnt.c b/util/lockcnt.c
index da1de77..4f88dcf 100644
--- a/util/lockcnt.c
+++ b/util/lockcnt.c
@@ -9,7 +9,289 @@
 #include "qemu/osdep.h"
 #include "qemu/thread.h"
 #include "qemu/atomic.h"
+#include "trace.h"
 
+#ifdef CONFIG_LINUX
+#include "qemu/futex.h"
+
+/* On Linux, bits 0-1 are a futex-based lock, bits 2-31 are the counter.
+ * For the mutex algorithm see Ulrich Drepper's "Futexes Are Tricky" (ok,
+ * this is not the most relaxing citation I could make...).  It is similar
+ * to mutex2 in the paper.
+ */
+
+#define QEMU_LOCKCNT_STATE_MASK3
+#define QEMU_LOCKCNT_STATE_FREE0   /* free, uncontended */
+#define QEMU_LOCKCNT_STATE_LOCKED  1   /* locked, uncontended */
+#define QEMU_LOCKCNT_STATE_WAITING 2   /* locked, contended */
+
+#define QEMU_LOCKCNT_COUNT_STEP4
+#define QEMU_LOCKCNT_COUNT_SHIFT   2
+
+void qemu_lockcnt_init(QemuLockCnt *lockcnt)
+{
+lockcnt->count = 0;
+}
+
+void qemu_lockcnt_destroy(QemuLockCnt *lockcnt)
+{
+}
+
+/* *val is the current value of lockcnt->count.
+ *
+ * If the lock is free, try a cmpxchg from *val to new_if_free; return
+ * true and set *val to the old value found by the cmpxchg in
+ * lockcnt->count.
+ *
+ * If the lock is taken, wait for it to be released and return false
+ * *without trying again to take the lock*.  Again, set *val to the
+ * new value of lockcnt->count.
+ *
+ * If *waited is true on return, new_if_free's bottom two bits must not
+ * be QEMU_LOCKCNT_STATE_LOCKED on subsequent calls, because the caller
+ * does not know if there are other waiters.  Furthermore, after *waited
+ * is set the caller has effectively acquired the lock.  If it returns
+ * with the lock not taken, it must wake another futex waiter.
+ */
+static bool qemu_lockcnt_cmpxchg_or_wait(QemuLockCnt *lockcnt, int *val,
+ int new_if_free, bool *waited)
+{
+/* Fast path for when the lock is free.  */
+if ((*val & QEMU_LOCKCNT_STATE_MASK) == QEMU_LOCKCNT_STATE_FREE) {
+int expected = *val;
+
+trace_lockcn

[Qemu-devel] [PULL 11/11] async: optimize aio_bh_poll

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

Avoid entering the slow path of qemu_lockcnt_dec_and_lock if
no bottom half has to be deleted.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Message-id: 20170112180800.21085-11-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 async.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/async.c b/async.c
index 2305e11..0d218ab 100644
--- a/async.c
+++ b/async.c
@@ -92,6 +92,7 @@ int aio_bh_poll(AioContext *ctx)
 {
 QEMUBH *bh, **bhp, *next;
 int ret;
+bool deleted = false;
 
 qemu_lockcnt_inc(&ctx->list_lock);
 
@@ -112,9 +113,17 @@ int aio_bh_poll(AioContext *ctx)
 bh->idle = 0;
 aio_bh_call(bh);
 }
+if (bh->deleted) {
+deleted = true;
+}
 }
 
 /* remove deleted bhs */
+if (!deleted) {
+qemu_lockcnt_dec(&ctx->list_lock);
+return ret;
+}
+
 if (qemu_lockcnt_dec_and_lock(&ctx->list_lock)) {
 bhp = &ctx->first_bh;
 while (*bhp) {
@@ -128,7 +137,6 @@ int aio_bh_poll(AioContext *ctx)
 }
 qemu_lockcnt_unlock(&ctx->list_lock);
 }
-
 return ret;
 }
 
-- 
2.9.3
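The fast path this patch adds can be modelled outside QEMU: remember during the walk whether any bottom half was marked deleted, and only enter the dec-and-lock slow path (which may take a mutex) when a node actually needs to be reclaimed. The sketch below uses a plain singly linked list and a counter as a stand-in for the lock; names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct BH {
    bool scheduled;
    bool deleted;
    struct BH *next;
} BH;

typedef struct {
    BH *first;
    int lock_taken;  /* counts how often the slow path ran */
} List;

int poll_bhs(List *l)
{
    int ret = 0;
    bool deleted = false;

    /* First pass: run scheduled bottom halves, note deletions. */
    for (BH *bh = l->first; bh; bh = bh->next) {
        if (bh->scheduled) {
            bh->scheduled = false;
            ret++;
        }
        if (bh->deleted) {
            deleted = true;
        }
    }

    if (!deleted) {
        return ret;  /* fast path: nothing to reclaim, no lock */
    }

    l->lock_taken++;  /* stand-in for qemu_lockcnt_dec_and_lock() */
    for (BH **bhp = &l->first; *bhp; ) {
        if ((*bhp)->deleted) {
            *bhp = (*bhp)->next;  /* unlink deleted node */
        } else {
            bhp = &(*bhp)->next;
        }
    }
    return ret;
}
```

With no deletions pending, the function returns without ever touching the lock, which is exactly the case the commit message optimizes for.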




[Qemu-devel] [PULL 03/11] qemu-thread: introduce QemuLockCnt

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

A QemuLockCnt comprises a counter and a mutex, with primitives
to increment and decrement the counter, and to take and release the
mutex.  It can be used to do lock-free visits to a data structure
whenever mutexes would be too heavy-weight and the critical section
is too long for RCU.

This could be implemented simply by protecting the counter with the
mutex, but QemuLockCnt is harder to misuse and more efficient.

Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
Message-id: 20170112180800.21085-3-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 docs/lockcnt.txt  | 278 ++
 util/Makefile.objs|   1 +
 include/qemu/thread.h | 110 
 util/lockcnt.c| 114 +
 4 files changed, 503 insertions(+)
 create mode 100644 docs/lockcnt.txt
 create mode 100644 util/lockcnt.c

diff --git a/docs/lockcnt.txt b/docs/lockcnt.txt
new file mode 100644
index 000..25a8091
--- /dev/null
+++ b/docs/lockcnt.txt
@@ -0,0 +1,278 @@
+DOCUMENTATION FOR LOCKED COUNTERS (aka QemuLockCnt)
+===
+
+QEMU often uses reference counts to track data structures that are being
+accessed and should not be freed.  For example, a loop that invoke
+callbacks like this is not safe:
+
+QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
+if (ioh->revents & G_IO_OUT) {
+ioh->fd_write(ioh->opaque);
+}
+}
+
+QLIST_FOREACH_SAFE protects against deletion of the current node (ioh)
+by stashing away its "next" pointer.  However, ioh->fd_write could
+actually delete the next node from the list.  The simplest way to
+avoid this is to mark the node as deleted, and remove it from the
+list in the above loop:
+
+QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
+if (ioh->deleted) {
+QLIST_REMOVE(ioh, next);
+g_free(ioh);
+} else {
+if (ioh->revents & G_IO_OUT) {
+ioh->fd_write(ioh->opaque);
+}
+}
+}
+
+If however this loop must also be reentrant, i.e. it is possible that
+ioh->fd_write invokes the loop again, some kind of counting is needed:
+
+walking_handlers++;
+QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
+if (ioh->deleted) {
+if (walking_handlers == 1) {
+QLIST_REMOVE(ioh, next);
+g_free(ioh);
+}
+} else {
+if (ioh->revents & G_IO_OUT) {
+ioh->fd_write(ioh->opaque);
+}
+}
+}
+walking_handlers--;
+
+One may think of using the RCU primitives, rcu_read_lock() and
+rcu_read_unlock(); effectively, the RCU nesting count would take
+the place of the walking_handlers global variable.  Indeed,
+reference counting and RCU have similar purposes, but their usage in
+general is complementary:
+
+- reference counting is fine-grained and limited to a single data
+  structure; RCU delays reclamation of *all* RCU-protected data
+  structures;
+
+- reference counting works even in the presence of code that keeps
+  a reference for a long time; RCU critical sections in principle
+  should be kept short;
+
+- reference counting is often applied to code that is not thread-safe
+  but is reentrant; in fact, usage of reference counting in QEMU predates
+  the introduction of threads by many years.  RCU is generally used to
+  protect readers from other threads freeing memory after concurrent
+  modifications to a data structure.
+
+- reclaiming data can be done by a separate thread in the case of RCU;
+  this can improve performance, but also delay reclamation undesirably.
+  With reference counting, reclamation is deterministic.
+
+This file documents QemuLockCnt, an abstraction for using reference
+counting in code that has to be both thread-safe and reentrant.
+
+
+QemuLockCnt concepts
+
+
+A QemuLockCnt comprises both a counter and a mutex; it has primitives
+to increment and decrement the counter, and to take and release the
+mutex.  The counter notes how many visits to the data structures are
+taking place (the visits could be from different threads, or there could
+be multiple reentrant visits from the same thread).  The basic rules
+governing the counter/mutex pair then are the following:
+
+- Data protected by the QemuLockCnt must not be freed unless the
+  counter is zero and the mutex is taken.
+
+- A new visit cannot be started while the counter is zero and the
+  mutex is taken.
+
+Most of the time, the mutex protects all writes to the data structure,
+not just frees, though there could be cases where this is not necessary.
+
+Reads, instead, can be done without taking the mutex, as long as the
+readers and writers use the same macros that are used for RCU, for
+example atomic_rcu_read, atomic_rcu_set, QLIST_FOREACH_RCU, etc.  This is
+because the reads are done outsi

[Qemu-devel] [PULL 09/11] aio-win32: remove walking_handlers, protecting AioHandler list with list_lock

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Message-id: 20170112180800.21085-9-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 aio-win32.c | 73 +++--
 1 file changed, 47 insertions(+), 26 deletions(-)

diff --git a/aio-win32.c b/aio-win32.c
index 1ad459d..900524c 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -21,6 +21,7 @@
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
 #include "qapi/error.h"
+#include "qemu/rcu_queue.h"
 
 struct AioHandler {
 EventNotifier *e;
@@ -45,6 +46,7 @@ void aio_set_fd_handler(AioContext *ctx,
 /* fd is a SOCKET in our case */
 AioHandler *node;
 
+qemu_lockcnt_lock(&ctx->list_lock);
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
 if (node->pfd.fd == fd && !node->deleted) {
 break;
@@ -54,14 +56,14 @@ void aio_set_fd_handler(AioContext *ctx,
 /* Are we deleting the fd handler? */
 if (!io_read && !io_write) {
 if (node) {
-/* If the lock is held, just mark the node as deleted */
-if (ctx->walking_handlers) {
+/* If aio_poll is in progress, just mark the node as deleted */
+if (qemu_lockcnt_count(&ctx->list_lock)) {
 node->deleted = 1;
 node->pfd.revents = 0;
 } else {
 /* Otherwise, delete it for real.  We can't just mark it as
  * deleted because deleted nodes are only cleaned up after
- * releasing the walking_handlers lock.
+ * releasing the list_lock.
  */
 QLIST_REMOVE(node, node);
 g_free(node);
@@ -74,7 +76,7 @@ void aio_set_fd_handler(AioContext *ctx,
 /* Alloc and insert if it's not already there */
 node = g_new0(AioHandler, 1);
 node->pfd.fd = fd;
-QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
+QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
 }
 
 node->pfd.events = 0;
@@ -99,6 +101,7 @@ void aio_set_fd_handler(AioContext *ctx,
FD_CONNECT | FD_WRITE | FD_OOB);
 }
 
+qemu_lockcnt_unlock(&ctx->list_lock);
 aio_notify(ctx);
 }
 
@@ -117,6 +120,7 @@ void aio_set_event_notifier(AioContext *ctx,
 {
 AioHandler *node;
 
+qemu_lockcnt_lock(&ctx->list_lock);
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
 if (node->e == e && !node->deleted) {
 break;
@@ -128,14 +132,14 @@ void aio_set_event_notifier(AioContext *ctx,
 if (node) {
 g_source_remove_poll(&ctx->source, &node->pfd);
 
-/* If the lock is held, just mark the node as deleted */
-if (ctx->walking_handlers) {
+/* aio_poll is in progress, just mark the node as deleted */
+if (qemu_lockcnt_count(&ctx->list_lock)) {
 node->deleted = 1;
 node->pfd.revents = 0;
 } else {
 /* Otherwise, delete it for real.  We can't just mark it as
  * deleted because deleted nodes are only cleaned up after
- * releasing the walking_handlers lock.
+ * releasing the list_lock.
  */
 QLIST_REMOVE(node, node);
 g_free(node);
@@ -149,7 +153,7 @@ void aio_set_event_notifier(AioContext *ctx,
 node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
 node->pfd.events = G_IO_IN;
 node->is_external = is_external;
-QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
+QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
 
 g_source_add_poll(&ctx->source, &node->pfd);
 }
@@ -157,6 +161,7 @@ void aio_set_event_notifier(AioContext *ctx,
 node->io_notify = io_notify;
 }
 
+qemu_lockcnt_unlock(&ctx->list_lock);
 aio_notify(ctx);
 }
 
@@ -175,10 +180,16 @@ bool aio_prepare(AioContext *ctx)
 bool have_select_revents = false;
 fd_set rfds, wfds;
 
+/*
+ * We have to walk very carefully in case aio_set_fd_handler is
+ * called while we're walking.
+ */
+qemu_lockcnt_inc(&ctx->list_lock);
+
 /* fill fd sets */
 FD_ZERO(&rfds);
 FD_ZERO(&wfds);
-QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 if (node->io_read) {
 FD_SET ((SOCKET)node->pfd.fd, &rfds);
 }
@@ -188,7 +199,7 @@ bool aio_prepare(AioContext *ctx)
 }
 
 if (select(0, &rfds, &wfds, NULL, &tv0) > 0) {
-QLIST_FOREACH(node, &ctx->aio_handlers, node) {
+QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 node->pfd.revents = 0;
 if (FD_ISSET(node->pfd.fd, &rfds)) {
 node->pfd.revents |= G_IO_IN;
@@ -202,41 +213,53 @@ bool aio_prepare(AioContext *ctx)
 }

[Qemu-devel] [PULL 1/4] trace-events: spelling fix

2017-01-16 Thread Stefan Hajnoczi
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
Message-id: 20161212221759.28949-1-marcandre.lur...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 trace-events | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/trace-events b/trace-events
index 1181486..2c66780 100644
--- a/trace-events
+++ b/trace-events
@@ -53,7 +53,7 @@ qemu_system_shutdown_request(void) ""
 qemu_system_powerdown_request(void) ""
 
 # spice-qemu-char.c
-spice_vmc_write(ssize_t out, int len) "spice wrottn %zd of requested %d"
+spice_vmc_write(ssize_t out, int len) "spice wrote %zd of requested %d"
 spice_vmc_read(int bytes, int len) "spice read %d of requested %d"
 spice_vmc_register_interface(void *scd) "spice vmc registered interface %p"
 spice_vmc_unregister_interface(void *scd) "spice vmc unregistered interface %p"
-- 
2.9.3




[Qemu-devel] [PULL 07/11] aio: tweak walking in dispatch phase

2017-01-16 Thread Stefan Hajnoczi
From: Paolo Bonzini 

Preparing for the following patch, use QLIST_FOREACH_SAFE and
modify the placement of walking_handlers increment/decrement.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Message-id: 20170112180800.21085-7-pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c | 26 --
 aio-win32.c | 26 --
 2 files changed, 24 insertions(+), 28 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 25198d9..f83b7af 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -369,19 +369,17 @@ bool aio_pending(AioContext *ctx)
 
 static bool aio_dispatch_handlers(AioContext *ctx)
 {
-AioHandler *node;
+AioHandler *node, *tmp;
 bool progress = false;
 
 /*
  * We have to walk very carefully in case aio_set_fd_handler is
  * called while we're walking.
  */
-node = QLIST_FIRST(&ctx->aio_handlers);
-while (node) {
-AioHandler *tmp;
-int revents;
+ctx->walking_handlers++;
 
-ctx->walking_handlers++;
+QLIST_FOREACH_SAFE(node, &ctx->aio_handlers, node, tmp) {
+int revents;
 
 revents = node->pfd.revents & node->pfd.events;
 node->pfd.revents = 0;
@@ -405,17 +403,17 @@ static bool aio_dispatch_handlers(AioContext *ctx)
 progress = true;
 }
 
-tmp = node;
-node = QLIST_NEXT(node, node);
-
-ctx->walking_handlers--;
-
-if (!ctx->walking_handlers && tmp->deleted) {
-QLIST_REMOVE(tmp, node);
-g_free(tmp);
+if (node->deleted) {
+ctx->walking_handlers--;
+if (!ctx->walking_handlers) {
+QLIST_REMOVE(node, node);
+g_free(node);
+}
+ctx->walking_handlers++;
 }
 }
 
+ctx->walking_handlers--;
 return progress;
 }
 
diff --git a/aio-win32.c b/aio-win32.c
index d19dc42..1ad459d 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -227,20 +227,18 @@ bool aio_pending(AioContext *ctx)
 
 static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
 {
-AioHandler *node;
+AioHandler *node, *tmp;
 bool progress = false;
 
+ctx->walking_handlers++;
+
 /*
  * We have to walk very carefully in case aio_set_fd_handler is
  * called while we're walking.
  */
-node = QLIST_FIRST(&ctx->aio_handlers);
-while (node) {
-AioHandler *tmp;
+QLIST_FOREACH_SAFE(node, &ctx->aio_handlers, node, tmp) {
 int revents = node->pfd.revents;
 
-ctx->walking_handlers++;
-
 if (!node->deleted &&
 (revents || event_notifier_get_handle(node->e) == event) &&
 node->io_notify) {
@@ -275,17 +273,17 @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
 }
 }
 
-tmp = node;
-node = QLIST_NEXT(node, node);
-
-ctx->walking_handlers--;
-
-if (!ctx->walking_handlers && tmp->deleted) {
-QLIST_REMOVE(tmp, node);
-g_free(tmp);
+if (node->deleted) {
+ctx->walking_handlers--;
+if (!ctx->walking_handlers) {
+QLIST_REMOVE(node, node);
+g_free(node);
+}
+ctx->walking_handlers++;
 }
 }
 
+ctx->walking_handlers--;
 return progress;
 }
 
-- 
2.9.3
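The deferred-deletion pattern these hunks rearrange can be shown in isolation: a nesting counter (`walking_handlers` in QEMU) records how many walks are in progress, and only the outermost walk may unlink and free nodes marked deleted; inner walks leave them for later. This is a simplified, single-threaded sketch with hypothetical names.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node {
    bool deleted;
    struct Node *next;
} Node;

typedef struct {
    Node *head;
    int walking;  /* nesting depth of concurrent walks */
} HandlerList;

void walk(HandlerList *l)
{
    l->walking++;
    Node **np = &l->head;
    while (*np) {
        Node *node = *np;
        /* ... dispatch node here ... */
        if (node->deleted && l->walking == 1) {
            /* Outermost walk: safe to unlink and free, since no
             * nested walk still holds a pointer into the list. */
            *np = node->next;
            free(node);
        } else {
            np = &node->next;
        }
    }
    l->walking--;
}
```

Moving the increment/decrement outside the loop, as the patch does, makes this invariant easier to see and prepares for replacing the plain counter with list_lock in the follow-up patches.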



