Re: [Qemu-devel] [PATCH for 2.8 11/11] vhost_net: device IOTLB support

2016-09-01 Thread Peter Xu
On Thu, Sep 01, 2016 at 03:36:40PM +0800, Jason Wang wrote:
> The synchronization is done in the kernel:
> 
> - message dequeue and enqueue are protected by a spinlock
> - the data path and control path (e.g. IOTLB updating) are synchronized
> through a mutex
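
For illustration, the locking pattern described above looks roughly like this
(a sketch with invented names, not the actual vhost kernel code):

#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/list.h>

struct iotlb_msg_node {
    struct list_head node;
    /* ... IOTLB message payload ... */
};

static LIST_HEAD(msg_queue);
static DEFINE_SPINLOCK(msg_lock);   /* protects message enqueue/dequeue */
static DEFINE_MUTEX(iotlb_mutex);   /* serializes data path vs. IOTLB updates */

static void msg_enqueue(struct iotlb_msg_node *m)
{
    spin_lock(&msg_lock);
    list_add_tail(&m->node, &msg_queue);
    spin_unlock(&msg_lock);
}

static void iotlb_update(void)
{
    mutex_lock(&iotlb_mutex);
    /* apply the translation change; the data path takes the same mutex
     * before translating, so it never races with an update */
    mutex_unlock(&iotlb_mutex);
}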

Yes I didn't notice it's the vhost chr device... Thanks!

-- peterx



Re: [Qemu-devel] [PATCH v2 1/1] block-backend: allow flush on devices with open tray

2016-09-01 Thread Markus Armbruster
John Snow  writes:

> If a device still has an attached BDS because the medium has not yet
> been removed, we will be unable to migrate to a new host because
> blk_flush will return an error for that backend.
>
> Replace the call to blk_is_available with blk_is_inserted to weaken
> the check and allow flushes from the backend to work, while still
> disallowing flushes from the frontend/device model.
>
> This fixes a regression present in 2.6.0 caused by the following commit:
> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
> block: Move some bdrv_*_all() functions to BB
>
> Signed-off-by: John Snow 
> ---
>  block/block-backend.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index effa038..36a32c3 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1058,7 +1058,7 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t 
> offset,
>  BlockAIOCB *blk_aio_flush(BlockBackend *blk,
>BlockCompletionFunc *cb, void *opaque)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
>  }
>  
> @@ -1118,7 +1118,7 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, 
> int count)
>  
>  int blk_co_flush(BlockBackend *blk)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return -ENOMEDIUM;
>  }
>  
> @@ -1127,7 +1127,7 @@ int blk_co_flush(BlockBackend *blk)
>  
>  int blk_flush(BlockBackend *blk)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return -ENOMEDIUM;
>  }

Naive question: should we flush before we open the tray?
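
For context, the difference between the two checks in block/block-backend.c of
this era is roughly that blk_is_available() also requires the tray to be
closed (paraphrased, not a quote of the exact source):

bool blk_is_available(BlockBackend *blk)
{
    /* stronger check: medium inserted AND tray closed */
    return blk_is_inserted(blk) && !blk_dev_is_tray_open(blk);
}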



Re: [Qemu-devel] [PATCH for 2.8 10/11] Revert "intel_iommu: Throw hw_error on notify_started"

2016-09-01 Thread Peter Xu
On Fri, Sep 02, 2016 at 02:15:04PM +1000, David Gibson wrote:
> What!?  I see no reason you need a different notifier, just fix the
> implementation of the current one.  As a bonus this will also give you
> working VFIO passthrough with vIOMMU on x86, something which should
> work already, but doesn't.

Hi, David,

Do you mean that we can enhance the interface to suit the two needs?
E.g., adding a "IOTLB notification type" definition:

- "full": for those who is listening on all mapping changes including
  additions (VFIO use case)

- "cache_only": for those who only cares about cache invalidations
  (device IOTLB, aka, vhost use case)

We can:

- add a notify type when we register the notifiers (e.g., when VFIO
  registers its IOMMU notifier, it should specify the type as "full", so
  it won't receive notifications for device IOTLB invalidations).

- pass this type when triggering the notification, so each IOMMU
  notify handler can selectively discard the notification.

Not sure whether the above makes sense.
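
A rough sketch of what such a typed notifier could look like (names are
invented for illustration; this is not QEMU's actual API):

#include "exec/memory.h"   /* IOMMUTLBEntry */

typedef enum {
    IOMMU_NOTIFY_FULL,        /* all mapping changes, incl. additions (VFIO) */
    IOMMU_NOTIFY_CACHE_ONLY,  /* cache invalidations only (vhost device IOTLB) */
} IOMMUNotifyType;

typedef struct IOMMUNotifier {
    void (*notify)(struct IOMMUNotifier *n, IOMMUTLBEntry *entry);
    IOMMUNotifyType type;     /* declared when the notifier is registered */
} IOMMUNotifier;

/* The producer tags each notification with a type; a handler registered for
 * a different type simply discards it. */
static void iommu_notify_one(IOMMUNotifier *n, IOMMUTLBEntry *entry,
                             IOMMUNotifyType event)
{
    if (n->type != event) {
        /* e.g. VFIO ("full") skips device-IOTLB invalidations, and vhost
         * ("cache_only") skips ordinary mapping updates */
        return;
    }
    n->notify(n, entry);
}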

-- peterx



Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Kirti Wankhede


On 9/2/2016 10:18 AM, Michal Privoznik wrote:
> On 01.09.2016 18:59, Alex Williamson wrote:
>> On Thu, 1 Sep 2016 18:47:06 +0200
>> Michal Privoznik  wrote:
>>
>>> On 31.08.2016 08:12, Tian, Kevin wrote:
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 31, 2016 12:17 AM
>
> Hi folks,
>
> At KVM Forum we had a BoF session primarily around the mediated device
> sysfs interface.  I'd like to share what I think we agreed on and the
> "problem areas" that still need some work so we can get the thoughts
> and ideas from those who weren't able to attend.
>
> DanPB expressed some concern about the mdev_supported_types sysfs
> interface, which exposes a flat csv file with fields like "type",
> "number of instance", "vendor string", and then a bunch of type
> specific fields like "framebuffer size", "resolution", "frame rate
> limit", etc.  This is not entirely machine parsing friendly and sort of
> abuses the sysfs concept of one value per file.  Example output taken
> from Neo's libvirt RFC:
>
> cat /sys/bus/pci/devices/0000:86:00.0/mdev_supported_types
> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
> framebuffer,
> max_resolution
> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
>
> The create/destroy then looks like this:
>
> echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_create
>
> echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_destroy
>
> "vendor_specific_argument_list" is nebulous.
>
> So the idea to fix this is to explode this into a directory structure,
> something like:
>
> ├── mdev_destroy
> └── mdev_supported_types
>     ├── 11
>     │   ├── create
>     │   ├── description
>     │   └── max_instances
>     ├── 12
>     │   ├── create
>     │   ├── description
>     │   └── max_instances
>     └── 13
>         ├── create
>         ├── description
>         └── max_instances
>
> Note that I'm only exposing the minimal attributes here for simplicity,
> the other attributes would be included in separate files and we would
> require vendors to create standard attributes for common device classes.  
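
For illustration, creating an instance through the proposed per-type "create"
node would then be a plain sysfs write; a hypothetical helper (path layout as
sketched above, not a merged interface):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* write a UUID into <type_dir>/create, e.g.
 * /sys/bus/pci/devices/0000:86:00.0/mdev_supported_types/11/create */
static int mdev_create(const char *type_dir, const char *uuid)
{
    char path[256];
    int fd;
    ssize_t n;

    snprintf(path, sizeof(path), "%s/create", type_dir);
    fd = open(path, O_WRONLY);
    if (fd < 0) {
        return -1;
    }
    n = write(fd, uuid, strlen(uuid));
    close(fd);
    return n < 0 ? -1 : 0;
}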

 I like this idea. All standard attributes are reflected into this 
 hierarchy.
 In the meantime, can we still allow optional vendor string in create 
 interface? libvirt doesn't need to know the meaning, but allows upper
 layer to do some vendor specific tweak if necessary.  
>>>
>>> This is not the best idea IMO. Libvirt is there to shadow differences
>>> between hypervisors. While doing that, we often hide differences between
>>> various types of HW too. Therefore in order to provide good abstraction
>>> we should make vendor specific string as small as possible (ideally an
>>> empty string). I mean I see it as bad idea to expose "vgpu_type_id" from
>>> example above in domain XML. What I think the better idea is if we let
>>> users chose resolution and frame buffer size, e.g.: >> resolution="1024x768" framebuffer="16"/> (just the first idea that came
>>> to my mind while writing this e-mail). The point is, XML part is
>>> completely free of any vendor-specific knobs.
>>
>> That's not really what you want though, a user actually cares whether
>> they get an Intel of NVIDIA vGPU, we can't specify it as just a
>> resolution and framebuffer size.  The user also doesn't want the model
>> changing each time the VM is started, so not only do you *need* to know
>> the vendor, you need to know the vendor model.  This is the only way to
>> provide a consistent VM.  So as we discussed at the BoF, the libvirt
>> xml will likely reference the vendor string, which will be a unique
>> identifier that encompasses all the additional attributes we expose.
>> Really the goal of the attributes is simply so you don't need a per
>> vendor magic decoder ring to figure out the basic features of a given
>> vendor string.  Thanks,
> 
> Okay, maybe I'm misunderstanding something. I just thought that users
> will consult libvirt's nodedev driver (e.g. virsh nodedev-list && virsh
> nodedev-dumpxml $id) to fetch vGPU capabilities and then use that info
to construct domain XML.

Re: [Qemu-devel] [PATCH] vhost-user: return if no net clients found

2016-09-01 Thread Chen Hanxiao

At 2016-09-01 20:52:44, "Marc-André Lureau"  wrote:
>Hi
>
>On Thu, Sep 1, 2016 at 4:00 PM Chen Hanxiao  wrote:
>
>>
>> Hi, here is the backtrace:
>>
>> #0  net_vhost_user_event (opaque=0x7fc2f6893be0, event=5) at
>> net/vhost-user.c:196
>> #1  0x7fc2f4ebfb2b in tcp_chr_disconnect (chr=0x7fc2f68cc400) at
>> qemu-char.c:2837
>> #2  0x7fc2f4ebfba9 in tcp_chr_sync_read (chr=0x7fc2f68cc400,
>> buf=, len=) at qemu-char.c:2888
>> #3  0x7fc2f4ec106d in qemu_chr_fe_read_all (s=0x7fc2f68cc400,
>> buf=buf@entry=0x7fff5fda25b7 "", len=len@entry=1) at qemu-char.c:264
>> #4  0x7fc2f4f9a43a in net_vhost_user_watch (chan=,
>> cond=, opaque=) at net/vhost-user.c:190
>> #5  0x7fc2f321999a in g_main_context_dispatch () from
>> /lib64/libglib-2.0.so.0
>> #6  0x7fc2f4fd8fe8 in glib_pollfds_poll () at main-loop.c:209
>> #7  os_host_main_loop_wait (timeout=) at main-loop.c:254
>> #8  main_loop_wait (nonblocking=) at main-loop.c:503
>> #9  0x7fc2f4dd7b1e in main_loop () at vl.c:1818
>> #10 main (argc=, argv=, envp=> out>) at vl.c:4394
>>
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> net_vhost_user_event (opaque=0x7fc2f6893be0, event=5) at
>> net/vhost-user.c:207
>> 207        trace_vhost_user_event(s->chr->label, event);
>>
>>
>thanks for the backtrace, that helps
>
>However, I fail to understand how that can happen, as there has to be at
>least one net_client to start qemu with vhost-user and that callback must
>have at least the first netclient still around because the opaque pointer
>is shared with the netclient struct. So it looks like something destroyed
>the netclient before the callback, and in this case, the opaque pointer is
>invalid, and things are going all wrong. But it can't be host-net-remove,
>since the net-client is not on a registered hub.

The callback gives qemu_find_net_clients_except() an id == 'filename',
but it could not find a matching net client.
Then ncs[i] does not hold a valid net client, and we get a segfault.
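
A sketch of the kind of guard the patch subject suggests, using the
net/vhost-user.c names of this era (a fragment of net_vhost_user_event(),
not necessarily the final fix):

    queues = qemu_find_net_clients_except(name, ncs,
                                          NET_CLIENT_OPTIONS_KIND_NIC,
                                          MAX_QUEUE_NUM);
    if (queues == 0) {
        /* no net client matched the chardev label: bail out instead of
         * dereferencing ncs[0] */
        return;
    }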

>
>Could you try to find a simple reproducer using qemu only?

I'll try.

Regards,
- Chen

>
>thanks
>
>-- 
>Marc-André Lureau


Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Michal Privoznik
On 01.09.2016 18:59, Alex Williamson wrote:
> On Thu, 1 Sep 2016 18:47:06 +0200
> Michal Privoznik  wrote:
> 
>> On 31.08.2016 08:12, Tian, Kevin wrote:
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, August 31, 2016 12:17 AM

 Hi folks,

 At KVM Forum we had a BoF session primarily around the mediated device
 sysfs interface.  I'd like to share what I think we agreed on and the
 "problem areas" that still need some work so we can get the thoughts
 and ideas from those who weren't able to attend.

 DanPB expressed some concern about the mdev_supported_types sysfs
 interface, which exposes a flat csv file with fields like "type",
 "number of instance", "vendor string", and then a bunch of type
 specific fields like "framebuffer size", "resolution", "frame rate
 limit", etc.  This is not entirely machine parsing friendly and sort of
 abuses the sysfs concept of one value per file.  Example output taken
 from Neo's libvirt RFC:

 cat /sys/bus/pci/devices/0000:86:00.0/mdev_supported_types
 # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
 framebuffer,
 max_resolution
 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160

 The create/destroy then looks like this:

 echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_create

 echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_destroy

 "vendor_specific_argument_list" is nebulous.

 So the idea to fix this is to explode this into a directory structure,
 something like:

 ├── mdev_destroy
 └── mdev_supported_types
     ├── 11
     │   ├── create
     │   ├── description
     │   └── max_instances
     ├── 12
     │   ├── create
     │   ├── description
     │   └── max_instances
     └── 13
         ├── create
         ├── description
         └── max_instances

 Note that I'm only exposing the minimal attributes here for simplicity,
 the other attributes would be included in separate files and we would
 require vendors to create standard attributes for common device classes.  
>>>
>>> I like this idea. All standard attributes are reflected into this hierarchy.
>>> In the meantime, can we still allow optional vendor string in create 
>>> interface? libvirt doesn't need to know the meaning, but allows upper
>>> layer to do some vendor specific tweak if necessary.  
>>
>> This is not the best idea IMO. Libvirt is there to shadow differences
>> between hypervisors. While doing that, we often hide differences between
>> various types of HW too. Therefore in order to provide good abstraction
>> we should make vendor specific string as small as possible (ideally an
>> empty string). I mean I see it as bad idea to expose "vgpu_type_id" from
>> example above in domain XML. What I think the better idea is if we let
>> users chose resolution and frame buffer size, e.g.: > resolution="1024x768" framebuffer="16"/> (just the first idea that came
>> to my mind while writing this e-mail). The point is, XML part is
>> completely free of any vendor-specific knobs.
> 
> That's not really what you want though, a user actually cares whether
> they get an Intel of NVIDIA vGPU, we can't specify it as just a
> resolution and framebuffer size.  The user also doesn't want the model
> changing each time the VM is started, so not only do you *need* to know
> the vendor, you need to know the vendor model.  This is the only way to
> provide a consistent VM.  So as we discussed at the BoF, the libvirt
> xml will likely reference the vendor string, which will be a unique
> identifier that encompasses all the additional attributes we expose.
> Really the goal of the attributes is simply so you don't need a per
> vendor magic decoder ring to figure out the basic features of a given
> vendor string.  Thanks,

Okay, maybe I'm misunderstanding something. I just thought that users
will consult libvirt's nodedev driver (e.g. virsh nodedev-list && virsh
nodedev-dumpxml $id) to fetch vGPU capabilities and then use that info
to construct domain XML.
Also, I guess libvirt will need some sort of understanding of vGPUs in
sense that if there are two vGPUs in the system (say both INTEL and
NVIDIA) 

Re: [Qemu-devel] [libvirt] qapi DEVICE_DELETED event issued *before* instance_finalize?!

2016-09-01 Thread Michal Privoznik
On 02.09.2016 01:11, Alex Williamson wrote:
> Hey,
> 

> 
> It appears that DEVICE_DELETED only means the VM is done with the
> device but libvirt is interpreting it as QEMU is done with the device.
> Which is correct?  Do we need a new event or do we need to fix the
> ordering of this event?  An ordering fix would be more compatible with
> existing libvirt.  Thanks,

What an interesting race. I think the event should be sent only after
both guest and qemu are done with the device. Having two events looks
like too much granularity to me. I mean, even if libvirt learns that
the guest has detached the device, it still can't do anything until qemu
clears its internal state.

On the other hand, it is clearly documented that the DEVICE_DELETED
event is sent as soon as the guest acknowledges completion of device
removal. So libvirt is buggy if we follow the documentation strictly. But
then again, I don't see much informational value in a "guest has detached
the device but qemu hasn't yet" event. Libvirt would ignore such an event.

Michal



Re: [Qemu-devel] [PATCH for 2.8 10/11] Revert "intel_iommu: Throw hw_error on notify_started"

2016-09-01 Thread David Gibson
On Thu, 1 Sep 2016 11:58:48 +0800
Peter Xu  wrote:

> On Wed, Aug 31, 2016 at 08:43:42PM -0600, Alex Williamson wrote:
> > > > >>This reverts commit 3cb3b1549f5401dc3a5e1d073e34063dc274136f. Vhost
> > > > >>device IOTLB API will get notified and send invalidation request to
> > > > >>vhost through this notifier.
> > > > >AFAICT this series does not address the original problem for which
> > > > >commit 3cb3b1549f54 was added.  We've only addressed the very narrow
> > > > >use case of a device iotlb firing the iommu notifier therefore this
> > > > >change is a regression versus 2.7 since it allows invalid
> > > > >configurations with a physical iommu which will never receive the
> > > > >necessary notifies from intel-iommu emulation to work properly.  
> > > > >Thanks,
> > > > >
> > > > >Alex
> > > > 
> > > > Looking at vfio, it cares about map but vhost only cares about IOTLB
> > > > invalidation. Then I think we probably need another kind of notifier in 
> > > > this
> > > > case to avoid this.
> > > 
> > > Shall we leverage IOMMUTLBEntry.perm == IOMMU_NONE as a sign for
> > > invalidation? If so, we can use the same IOTLB interface as before.
> > > IMHO these two interfaces are not conflicting?
> > > 
> > > Alex,
> > > 
> > > Do you mean we should still disallow user from passing through devices
> > > while Intel IOMMU enabled? If so, not sure whether patch below can
> > > solve the issue.
> > > 
> > > It seems that we need a "name" for either IOMMU notifier
> > > provider/consumer, and we should not allow (provider==Intel &&
> > > consumer==VFIO) happen. In the following case, I added a name for
> > > provider, and VFIO checks it.  
> > 
> > Absolutely not, intel-iommu emulation is simply incomplete, the IOMMU
> > notifier is never called for mappings.  There's a whole aspect of
> > iommu notifiers that intel-iommu simply hasn't bothered to implement.
> > Don't punish vfio for actually making use of the interface as it was
> > intended to be used.  AFAICT you're implementing the unmap/invalidation
> > half, without the actual mapping half of the interface.  It's broken
> > and incompatible with any iommu notifiers that expect to see both
> > sides.  Thanks,  
> 
> Yeah I think I got your point. Thanks for the explanation.
> 
> Now I agree with Jason that we may need another notifier mechanism.

What!?  I see no reason you need a different notifier, just fix the
implementation of the current one.  As a bonus this will also give you
working VFIO passthrough with vIOMMU on x86, something which should
work already, but doesn't.


-- 
David Gibson 
Senior Software Engineer, Virtualization, Red Hat




[Qemu-devel] [PATCH for-2.7] vnc: fix qemu crash because of SIGSEGV

2016-09-01 Thread Gonglei
The backtrace is:

0x7f0b75cdf880 in pixman_image_get_stride () from /lib64/libpixman-1.so.0
0x7f0b77bcb3cf in vnc_server_fb_stride (vd=0x7f0b7a1a2bb0) at ui/vnc.c:680
vnc_dpy_copy (dcl=0x7f0b7a1a2c00, src_x=224, src_y=263, dst_x=319, dst_y=363, 
w=1, h=1) at ui/vnc.c:915
0x7f0b77bbcc35 in dpy_gfx_copy (con=0x7f0b7a146210, src_x=src_x@entry=224, 
src_y=src_y@entry=263, dst_x=dst_x@entry=319,
dst_y=dst_y@entry=363, w=1, h=1) at ui/console.c:1575
0x7f0b77bbda4e in qemu_console_copy (con=, 
src_x=src_x@entry=224, src_y=src_y@entry=263, dst_x=dst_x@entry=319,
dst_y=dst_y@entry=363, w=, h=) at 
ui/console.c:2111
0x7f0b77ac0980 in cirrus_do_copy (h=, w=, 
src=, dst=, s=0x7f0b7b086090) at 
hw/display/cirrus_vga.c:774
cirrus_bitblt_videotovideo_copy (s=0x7f0b7b086090) at 
hw/display/cirrus_vga.c:793
cirrus_bitblt_videotovideo (s=0x7f0b7b086090) at hw/display/cirrus_vga.c:915
cirrus_bitblt_start (s=0x7f0b7b086090) at hw/display/cirrus_vga.c:1056
0x7f0b77965cfb in memory_region_write_accessor (mr=0x7f0b7b096e40, 
addr=320, value=, size=1, shift=,mask=, attrs=...) at /root/rpmbuild/BUILD/master/qemu/memory.c:525
0x7f0b77963f59 in access_with_adjusted_size (addr=addr@entry=320, 
value=value@entry=0x7f0b69a268d8, size=size@entry=4,
access_size_min=, access_size_max=, 
access=access@entry=0x7f0b77965c80 ,
mr=mr@entry=0x7f0b7b096e40, attrs=attrs@entry=...) at 
/root/rpmbuild/BUILD/master/qemu/memory.c:591
0x7f0b77968315 in memory_region_dispatch_write (mr=mr@entry=0x7f0b7b096e40, 
addr=addr@entry=320, data=18446744073709551362,
size=size@entry=4, attrs=attrs@entry=...) at 
/root/rpmbuild/BUILD/master/qemu/memory.c:1262
0x7f0b779256a9 in address_space_write_continue (mr=0x7f0b7b096e40, l=4, 
addr1=320, len=4, buf=0x7f0b77713028 "\002\377\377\377",
attrs=..., addr=4273930560, as=0x7f0b7827d280 ) at 
/root/rpmbuild/BUILD/master/qemu/exec.c:2544
address_space_write (as=, addr=, attrs=..., 
buf=, len=) at 
/root/rpmbuild/BUILD/master/qemu/exec.c:2601
0x7f0b77925c1d in address_space_rw (as=, addr=, attrs=..., attrs@entry=...,
buf=buf@entry=0x7f0b77713028 "\002\377\377\377", len=, 
is_write=) at /root/rpmbuild/BUILD/master/qemu/exec.c:2703
0x7f0b77962f53 in kvm_cpu_exec (cpu=cpu@entry=0x7f0b79fcc2d0) at 
/root/rpmbuild/BUILD/master/qemu/kvm-all.c:1965
0x7f0b77950cc6 in qemu_kvm_cpu_thread_fn (arg=0x7f0b79fcc2d0) at 
/root/rpmbuild/BUILD/master/qemu/cpus.c:1078
0x7f0b744b3dc5 in start_thread (arg=0x7f0b69a27700) at pthread_create.c:308
0x7f0b70d3d66d in clone () from /lib64/libc.so.6

The code path that hits the segfault:
 vnc_dpy_copy
   vnc_update_client
     vnc_disconnect_finish [entered because vnc_disconnect_start() was
                            invoked after something went wrong]
       vnc_update_server_surface
         vd->server = NULL;
   vnc_server_fb_stride
     pixman_image_get_stride(vd->server)

Let's add a non-NULL check before calling vnc_server_fb_stride() to avoid
the segmentation fault.

Cc: Gerd Hoffmann 
Cc: Daniel P. Berrange 
Reported-by: Yanying Zhuang 
Signed-off-by: Gonglei 
---
 ui/vnc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/ui/vnc.c b/ui/vnc.c
index d1087c9..76a3273 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -911,6 +911,10 @@ static void vnc_dpy_copy(DisplayChangeListener *dcl,
 }
 }
 
+if (!vd->server) {
+/* no client connected */
+return;
+}
 /* do bitblit op on the local surface too */
 pitch = vnc_server_fb_stride(vd);
 src_row = vnc_server_fb_ptr(vd, src_x, src_y);
-- 
1.7.12.4





Re: [Qemu-devel] [PATCH v2 4/6] virtio-balloon: keep collecting stats on save/restore

2016-09-01 Thread Roman Kagan
On Thu, Sep 01, 2016 at 05:43:05PM +0200, Ladi Prosek wrote:
> On Thu, Sep 1, 2016 at 4:17 PM, Roman Kagan  wrote:
> > I'm not happy with this patch: it tries to solve the problem on the
> > "save" side and therefore doesn't fix the bug when migrating from an
> > earlier QEMU version.
> >
> > I wonder if we can do better and solve it on the "load" side.  (At first
> > I thought that your patch did that but on a closer look it turned out
> > not the case).
> >
> > In particular, with Stefan's patch to restore VirtQueue->inuse, we
> > should be able to just rewind ->last_avail_idx by ->inuse during "load",
> > which AFAICS would also fix the bug.  What do you think?
> 
> Stefan was actually going to do exactly that in
> 
> https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg02455.html

Hmm, obviously my mail search skills need to be improved :(  I already
coded a patch before I saw this.

> but then dropped the patch in favor of this "save" side approach.
> Going with Stefan's fix seems kind of unfortunate from a future-proof
> perspective. It wouldn't be easy to revert it if we ever decide that
> we want to leave something in the queue. But I understand that solving
> the problem on the "load" side would be more helpful for real-world
> deployments. Maybe we could do both but gate the load side hack with a
> source QEMU version check? I wonder what Stefan thinks.

I wouldn't say it looked like a hack so I wouldn't bother with a version
check.  Anyway let's look at the code and decide.
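
For reference, the "load side" rewind being discussed would amount to
something like this (field names as in hw/virtio/virtio.c's VirtQueue; a
sketch, not the actual patch):

static void virtqueue_rewind_inuse(VirtQueue *vq)
{
    /* make the element(s) the device still held at save time available
     * again, so the balloon re-pops its stats element after migration */
    vq->last_avail_idx -= vq->inuse;
    vq->inuse = 0;
}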

> >> Could you please resubmit and use the set_status callback instead
> >> of adding another VM state change handler?
> >>
> >> static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> >> {
> >> VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> >>
> >> if (!vdev->vm_running && s->stats_vq_elem) {
> >> balloon_stats_push_elem(s);
> >> }
> >> }
> >>
> >> and
> >>
> >> vdc->set_status = virtio_balloon_set_status;
> >> in virtio_balloon_class_init.
> >
> > If the scheme I described above works out this won't be needed at all.

Sure I was wrong here, I used it to kick the stats collection on load.

I'll post the patch in a minute.

Roman.



Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-01 Thread Gonglei (Arei)
Hi Alex,


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, September 01, 2016 9:37 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
> specification
> 
> On 08/30/2016 02:12 PM, Gonglei wrote:
> > The virtio crypto device is a virtual crypto device (ie. hardware
> > crypto accelerator card). The virtio crypto device can provide the
> > following crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >
> > In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> 
> I have mostly a few high level comments.
> 
> For starters, a lot of the structs rely on the compiler to pad them to
> natural alignment. That may get us into trouble when trying to emulate a
> virtio device on a host with different guest architecture (like arm on
> x86). It'd be a good idea to manually pad everything to be 64bit aligned
> - then all fields are always at the same spot.
> 
Good point! I'll do this in the next version. Thanks!
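
For illustration, explicit padding keeps every field at a fixed offset
regardless of compiler or architecture; a hypothetical request layout (not
taken from the spec):

#include <stdint.h>

struct virtio_crypto_example_req {
    uint32_t opcode;
    uint32_t padding;     /* pad explicitly so session_id is 8-byte aligned */
    uint64_t session_id;
    uint32_t iv_len;
    uint32_t reserved;    /* keep the overall size a multiple of 8 bytes */
};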

> I also have a hard time getting my head around the final flow of
> everything. Do I always have to create a session to be able to emit a
> command? In that case, doesn't that slow down everything, since a
> request would then need to wait for the host reply to receive its
> session id? There should be some way to fire off a simple non-iv
> operation without any session set up imho.
> 
For symmetric algorithms, we'd better create a session before executing
encryption or decryption, because the session is usually kept for a specific
algorithm with a specific key in the production environment. And if we only
change the IV, we don't need to re-create the session.

For the asymmetric algorithms, we don't need to create a session IIRC.

So, I don't think this is a performance degradation, but a performance 
enhancement.

> Also, I don't fully understand the split between control and data
> queues. As far as I read things, the control queue is used to create
> session ids and the data queues can then be used to push data. Is there
> any particular reason for the split? One thing that seems natural to me
> would be to have sessions be per-queue, so you would create a session on
> a particular queue and only have it ever be available there. That way
> you get rid of any locking for sessions.
> 
We want to keep a unified request type (structure) for the data queue, so we
can keep the session operations in the control queue. Of course the control
queue is only used to create sessions currently, but we can extend its
functions if needed in the future.

> 
> Alex

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH] spapr: implement H_CHANGE_LOGICAL_LAN_MAC h_call

2016-09-01 Thread David Gibson
On Thu, Sep 01, 2016 at 10:10:49AM +0200, Laurent Vivier wrote:
> Since kernel v4.0, linux uses H_CHANGE_LOGICAL_LAN_MAC to change lively
> the MAC address of an ibmveth interface.
> 
> As QEMU doesn't implement this h_call, we can't change anymore the
> MAC address of an spapr-vlan interface.
> 
> Signed-off-by: Laurent Vivier 

Mostly looks good, but I have a couple of queries.

> ---
>  hw/net/spapr_llan.c | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
> index b273eda..4bb95a5 100644
> --- a/hw/net/spapr_llan.c
> +++ b/hw/net/spapr_llan.c
> @@ -106,6 +106,7 @@ typedef struct VIOsPAPRVLANDevice {
>  VIOsPAPRDevice sdev;
>  NICConf nicconf;
>  NICState *nic;
> +MACAddr perm_mac;

Looking at virtio-net, I see that it copies the mac address from
nic->conf on reset.  Could we do that, to avoid creating an extra
field in the state?
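
For reference, the virtio-net reset path being referred to restores the MAC
from the NIC configuration roughly like this (paraphrased from
hw/net/virtio-net.c; exact field names may differ):

static void virtio_net_reset(VirtIODevice *vdev)
{
    VirtIONet *n = VIRTIO_NET(vdev);

    /* ... */
    memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
    /* ... */
}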

>  bool isopen;
>  hwaddr buf_list;
>  uint32_t add_buf_ptr, use_buf_ptr, rx_bufs;
> @@ -316,6 +317,10 @@ static void spapr_vlan_reset(VIOsPAPRDevice *sdev)
>  spapr_vlan_reset_rx_pool(dev->rx_pool[i]);
>  }
>  }
> +
> +memcpy(>nicconf.macaddr.a, >perm_mac.a,
> +   sizeof(dev->nicconf.macaddr.a));
> +qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
> dev->nicconf.macaddr.a);
>  }
>  
>  static void spapr_vlan_realize(VIOsPAPRDevice *sdev, Error **errp)
> @@ -324,6 +329,8 @@ static void spapr_vlan_realize(VIOsPAPRDevice *sdev, 
> Error **errp)
>  
>  qemu_macaddr_default_if_unset(>nicconf.macaddr);
>  
> +memcpy(>perm_mac.a, >nicconf.macaddr.a, 
> sizeof(dev->perm_mac.a));
> +
>  dev->nic = qemu_new_nic(_spapr_vlan_info, >nicconf,
>  object_get_typename(OBJECT(sdev)), 
> sdev->qdev.id, dev);
>  qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
> dev->nicconf.macaddr.a);
> @@ -756,6 +763,27 @@ static target_ulong h_multicast_ctrl(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_change_logical_lan_mac(PowerPCCPU *cpu,
> + sPAPRMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> +target_ulong reg = args[0];
> +target_ulong macaddr = args[1];
> +VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +VIOsPAPRVLANDevice *dev = VIO_SPAPR_VLAN_DEVICE(sdev);
> +int i;
> +
> +for (i = 0; i < ETH_ALEN; i++) {
> +dev->nicconf.macaddr.a[ETH_ALEN - i - 1] = macaddr & 0xff;
> +macaddr >>= 8;
> +}
> +
> +qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
> dev->nicconf.macaddr.a);
> +
> +return H_SUCCESS;
> +}
> +
>  static Property spapr_vlan_properties[] = {
>  DEFINE_SPAPR_PROPERTIES(VIOsPAPRVLANDevice, sdev),
>  DEFINE_NIC_PROPERTIES(VIOsPAPRVLANDevice, nicconf),
> @@ -854,6 +882,8 @@ static void spapr_vlan_register_types(void)
>  spapr_register_hypercall(H_ADD_LOGICAL_LAN_BUFFER,
>   h_add_logical_lan_buffer);
>  spapr_register_hypercall(H_MULTICAST_CTRL, h_multicast_ctrl);
> +spapr_register_hypercall(H_CHANGE_LOGICAL_LAN_MAC,
> + h_change_logical_lan_mac);
>  type_register_static(_vlan_info);
>  }

Now that the MAC is guest changeable, do we need to add something to
let it be migrated?  Or is that already included in the migration
state for the base NIC info?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] DAX can not work on virtual nvdimm device

2016-09-01 Thread Ross Zwisler
On Wed, Aug 31, 2016 at 04:44:47PM +0800, Xiao Guangrong wrote:
> On 08/31/2016 01:09 AM, Dan Williams wrote:
> > 
> > Can you post your exact reproduction steps?  This test is not failing for 
> > me.
> > 
> 
> Sure.
> 
> 1. make the guest kernel based on your tree, the top commit is
>10d7902fa0e82b (dax: unmap/truncate on device shutdown) and
>the config file can be found in this thread.
> 
> 2. add guest kernel command line: memmap=6G!10G
> 
> 3: start the guest:
>x86_64-softmmu/qemu-system-x86_64 -machine pc,nvdimm --enable-kvm \
>-smp 16 -m 32G,maxmem=100G,slots=100 /other/VMs/centos6.img -monitor stdio
> 
> 4: in guest:
>mkfs.ext4 /dev/pmem0
>mount -o dax /dev/pmem0  /mnt/pmem/
>echo > /mnt/pmem/xxx
>./mmap /mnt/pmem/xxx
>./read /mnt/pmem/xxx
> 
>   The source code of mmap and read has been attached in this mail.
> 
>   Hopefully, you can detect the error triggered by read test.
> 
> Thanks!
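
The attached mmap/read test programs are not reproduced in this digest;
assuming they simply map the DAX file, write to it, and read it back, a
minimal equivalent looks like:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char buf[8] = { 0 };
    char *p;
    int fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file on DAX mount>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* make sure one page is allocated, then store through a DAX mapping */
    ftruncate(fd, 4096);
    p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memcpy(p, "hello", 6);
    munmap(p, 4096);

    /* read it back through read(2); this is the kind of read path that
     * reportedly fails with "Bad address" on the old CentOS 6 ext4 setup */
    if (pread(fd, buf, 6, 0) < 0) {
        perror("pread");
        return 1;
    }
    printf("%s\n", buf);
    close(fd);
    return 0;
}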

Okay, I think I've isolated this issue.  Xiao's VM was an old CentOS 6 system,
and for some reason ext4+DAX with the old tools found in that VM fails.  I was
able to reproduce this failure with a freshly installed CentOS 6.8 VM.

You can see the failure with his tests, or perhaps more easily with this
series of commands:

  # mkfs.ext4 /dev/pmem0
  # mount -o dax /dev/pmem0  /mnt/pmem/
  # touch /mnt/pmem/x
  # md5sum /mnt/pmem/x
  md5sum: /mnt/pmem/x: Bad address

This sequence of commands works fine in the old CentOS 6 system if you use XFS
instead of ext4, and it works fine with both ext4 and XFS in CentOS 7 and
with recent versions of Fedora.

I've added the ext4 folks to this mail in case they care, but my guess is that
the tools in CentOS 6 are so old that it's not worth worrying about.  For
reference, the kernel in CentOS 6 is based on 2.6.32.  :)  DAX was introduced
in v4.0.



[Qemu-devel] [PATCH RESEND v3] target-i386: present virtual L3 cache info for vcpus

2016-09-01 Thread Longpeng(Mike)
Some software algorithms are based on the hardware's cache info. For example,
in the x86 Linux kernel, when cpu1 wants to wake up a task on cpu2, cpu1 will
trigger a resched IPI and tell cpu2 to do the wakeup if they don't share a
low-level cache; conversely, cpu1 will access cpu2's runqueue directly if they
share the LLC. The relevant linux-kernel code is as below:

static void ttwu_queue(struct task_struct *p, int cpu)
{
struct rq *rq = cpu_rq(cpu);
..
if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
..
ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
return;
}
..
ttwu_do_activate(rq, p, 0); /* access target's rq directly */
..
}

In real hardware, the cpus on the same socket share the L3 cache, so one won't
trigger resched IPIs when waking up a task on another. But QEMU doesn't present
virtual L3 cache info for the VM, so the Linux guest will trigger lots of RES
IPIs under some workloads even if the virtual cpus belong to the same virtual
socket.
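
The guest derives cache sharing from CPUID leaf 4; for illustration, a small
userspace check of the L3 descriptor (using GCC's cpuid.h rather than the
kernel's helper):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.(EAX=4, ECX=3) describes the level-3 cache, if one is reported */
    __cpuid_count(4, 3, eax, ebx, ecx, edx);
    if ((eax & 0x1f) == 0) {
        printf("no L3 cache reported\n");
        return 0;
    }
    /* EAX[25:14] is the maximum number of logical CPUs sharing this cache,
     * minus one; cpus within that range are treated as sharing the LLC */
    printf("L3 shared by up to %u logical CPUs\n",
           ((eax >> 14) & 0xfff) + 1);
    return 0;
}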

For KVM, this degrades performance, because there will be lots of vmexits due
to the guest sending IPIs.

The workload is a SAP HANA testsuite; we run it for one round (about 40
minutes) and observe the (SUSE 11 SP3) guest's number of RES IPIs triggered
during that period:

No-L3   With-L3(applied this patch)
cpu0:   363890  44582
cpu1:   373405  43109
cpu2:   340783  43797
cpu3:   333854  43409
cpu4:   327170  40038
cpu5:   325491  39922
cpu6:   319129  42391
cpu7:   306480  41035
cpu8:   161139  32188
cpu9:   164649  31024
cpu10:  149823  30398
cpu11:  149823  32455
cpu12:  164830  35143
cpu13:  172269  35805
cpu14:  179979  33898
cpu15:  194505  32754
avg:    268963.6    40129.8

The VM's topology is "1*socket 8*cores 2*threads".
After presenting virtual L3 cache info to the VM, the number of RES IPIs in the
guest drops by 85%.

What's more, for KVM, vcpus sending IPIs will cause vmexits, which are expensive.
We also tested the overall system performance when vcpus actually run on separate
physical sockets. With L3 cache, the performance improves by 7.2%~33.1% (avg: 15.7%).

Signed-off-by: Longpeng(Mike) 
---
Changes since v2:
  - add a more useful commit message.
  - rename "compat-cache" to "l3-cache-shared".

Changes since v1:
  - fix the compat problem: set compat_props on PC_COMPAT_2_7.
  - fix an "intentionally introduced bug": make Intel's and AMD's consistent.
  - fix the CPUID.(EAX=4, ECX=3):EAX[25:14].
  - test the performance if vcpus run on separate sockets: with L3 cache,
    the performance improves by 7.2%~33.1% (avg: 15.7%).
---
 include/hw/i386/pc.h |  8 
 target-i386/cpu.c| 49 -
 target-i386/cpu.h|  5 +
 3 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 74c175c..c92c54e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -367,7 +367,15 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
 int e820_get_num_entries(void);
 bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 
+#define PC_COMPAT_2_7 \
+{\
+.driver   = TYPE_X86_CPU,\
+.property = "l3-cache-shared",\
+.value= "off",\
+},
+
 #define PC_COMPAT_2_6 \
+PC_COMPAT_2_7 \
 HW_COMPAT_2_6 \
 {\
 .driver   = "fw_cfg_io",\
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6a1afab..4f93922 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -57,6 +57,7 @@
 #define CPUID_2_L1D_32KB_8WAY_64B 0x2c
 #define CPUID_2_L1I_32KB_8WAY_64B 0x30
 #define CPUID_2_L2_2MB_8WAY_64B   0x7d
+#define CPUID_2_L3_16MB_16WAY_64B 0x4d
 
 
 /* CPUID Leaf 4 constants: */
@@ -131,11 +132,18 @@
 #define L2_LINES_PER_TAG   1
 #define L2_SIZE_KB_AMD   512
 
-/* No L3 cache: */
+/* Level 3 unified cache: */
 #define L3_SIZE_KB 0 /* disabled */
 #define L3_ASSOCIATIVITY   0 /* disabled */
 #define L3_LINES_PER_TAG   0 /* disabled */
 #define L3_LINE_SIZE   0 /* disabled */
+#define L3_N_LINE_SIZE 64
+#define L3_N_ASSOCIATIVITY 16
+#define L3_N_SETS   16384
+#define L3_N_PARTITIONS 1
+#define L3_N_DESCRIPTOR CPUID_2_L3_16MB_16WAY_64B
+#define L3_N_LINES_PER_TAG  1
+#define L3_N_SIZE_KB_AMD16384
 
 /* TLB definitions: */
 
@@ -2275,6 +2283,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 {
 X86CPU *cpu = x86_env_get_cpu(env);
 CPUState *cs = CPU(cpu);
+uint32_t pkg_offset;
 
 /* test if maximum index reached */
 if (index & 0x8000) {
@@ -2328,7 +2337,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 }
  

Re: [Qemu-devel] [PATCH v3] target-i386: present virtual L3 cache info for vcpus

2016-09-01 Thread Gonglei (Arei)
> -Original Message-
> From: longpeng
> Sent: Friday, September 02, 2016 10:23 AM
> To: ehabk...@redhat.com; r...@twiddle.net; pbonz...@redhat.com;
> m...@redhat.com
> Cc: Zhaoshenglong; Gonglei (Arei); Huangpeng (Peter); Herongguang (Stephen);
> qemu-devel@nongnu.org; Longpeng(Mike)
> Subject: [PATCH v3] target-i386: present virtual L3 cache info for vcpus
> 
> From: "Longpeng(Mike)" 
> 

There is a typo in the email address, please resend v3.

> Some software algorithms are based on the hardware's cache info, for
> example,
> for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will
> trigger
> a resched IPI and told cpu2 to do the wakeup if they don't share low level
> cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share 
> llc.
> The relevant linux-kernel code as bellow:
> 
>   static void ttwu_queue(struct task_struct *p, int cpu)
>   {
>   struct rq *rq = cpu_rq(cpu);
>   ..
>   if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
>   ..
>   ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
>   return;
>   }
>   ..
>   ttwu_do_activate(rq, p, 0); /* access target's rq directly */
>   ..
>   }
> 
> In real hardware, the cpus on the same socket share L3 cache, so one won't
> trigger a resched IPIs when wakeup a task on others. But QEMU doesn't
> present a
> virtual L3 cache info for VM, then the linux guest will trigger lots of RES 
> IPIs
> under some workloads even if the virtual cpus belongs to the same virtual
> socket.
> 
> For KVM, this degrades performance, because there will be lots of vmexit due
> to
> guest send IPIs.
> 
> The workload is a SAP HANA's testsuite, we run it one round(about 40
> minuates)
> and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering
> during
> the period:
> 
> No-L3   With-L3(applied this patch)
> cpu0: 363890  44582
> cpu1: 373405  43109
> cpu2: 340783  43797
> cpu3: 333854  43409
> cpu4: 327170  40038
> cpu5: 325491  39922
> cpu6: 319129  42391
> cpu7: 306480  41035
> cpu8: 161139  32188
> cpu9: 164649  31024
> cpu10:149823  30398
> cpu11:149823  32455
> cpu12:164830  35143
> cpu13:172269  35805
> cpu14:179979  33898
> cpu15:194505  32754
> avg:  268963.6    40129.8
> 
> The VM's topology is "1*socket 8*cores 2*threads".
> After present virtual L3 cache info for VM, the amounts of RES IPIs in guest
> reduce 85%.
> 
> What's more, for KVM, vcpus send IPIs will cause vmexit which is expensive.
> We had tested the overall system performance if vcpus actually run on sparate
> physical socket. With L3 cache, the performance improves
> 7.2%~33.1%(avg:15.7%).
> 
> Signed-off-by: Longpeng(Mike) 
>
Here as well.

Regards,
-Gonglei



[Qemu-devel] [PATCH v3] target-i386: present virtual L3 cache info for vcpus

2016-09-01 Thread Longpeng(Mike)
From: "Longpeng(Mike)" 

Some software algorithms are based on the hardware's cache info. For example,
in the x86 Linux kernel, when cpu1 wants to wake up a task on cpu2, cpu1 will
trigger a resched IPI and tell cpu2 to do the wakeup if they don't share a
low-level cache; conversely, cpu1 will access cpu2's runqueue directly if they
share the LLC. The relevant linux-kernel code is as below:

static void ttwu_queue(struct task_struct *p, int cpu)
{
struct rq *rq = cpu_rq(cpu);
..
if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
..
ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
return;
}
..
ttwu_do_activate(rq, p, 0); /* access target's rq directly */
..
}

In real hardware, the cpus on the same socket share the L3 cache, so one won't
trigger resched IPIs when waking up a task on another. But QEMU doesn't present
virtual L3 cache info for the VM, so the Linux guest will trigger lots of RES
IPIs under some workloads even if the virtual cpus belong to the same virtual
socket.

For KVM, this degrades performance, because there will be lots of vmexits due
to the guest sending IPIs.

The workload is a SAP HANA testsuite; we run it for one round (about 40
minutes) and observe the (SUSE 11 SP3) guest's number of RES IPIs triggered
during that period:

No-L3   With-L3(applied this patch)
cpu0:   363890  44582
cpu1:   373405  43109
cpu2:   340783  43797
cpu3:   333854  43409
cpu4:   327170  40038
cpu5:   325491  39922
cpu6:   319129  42391
cpu7:   306480  41035
cpu8:   161139  32188
cpu9:   164649  31024
cpu10:  149823  30398
cpu11:  149823  32455
cpu12:  164830  35143
cpu13:  172269  35805
cpu14:  179979  33898
cpu15:  194505  32754
avg:    268963.6    40129.8

The VM's topology is "1*socket 8*cores 2*threads".
After presenting virtual L3 cache info to the VM, the number of RES IPIs in the
guest drops by 85%.

What's more, for KVM, vcpus sending IPIs will cause vmexits, which are expensive.
We also tested the overall system performance when vcpus actually run on separate
physical sockets. With L3 cache, the performance improves by 7.2%~33.1% (avg: 15.7%).

Signed-off-by: Longpeng(Mike) 
---
Changes since v2:
  - add a more useful commit message.
  - rename "compat-cache" to "l3-cache-shared".

Changes since v1:
  - fix the compat problem: set compat_props on PC_COMPAT_2_7.
  - fix an "intentionally introduced bug": make Intel's and AMD's consistent.
  - fix the CPUID.(EAX=4, ECX=3):EAX[25:14].
  - test the performance if vcpus run on separate sockets: with L3 cache,
    the performance improves by 7.2%~33.1% (avg: 15.7%).
---
 include/hw/i386/pc.h |  8 
 target-i386/cpu.c| 49 -
 target-i386/cpu.h|  5 +
 3 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 74c175c..c92c54e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -367,7 +367,15 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
 int e820_get_num_entries(void);
 bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 
+#define PC_COMPAT_2_7 \
+{\
+.driver   = TYPE_X86_CPU,\
+.property = "l3-cache-shared",\
+.value= "off",\
+},
+
 #define PC_COMPAT_2_6 \
+PC_COMPAT_2_7 \
 HW_COMPAT_2_6 \
 {\
 .driver   = "fw_cfg_io",\
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6a1afab..4f93922 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -57,6 +57,7 @@
 #define CPUID_2_L1D_32KB_8WAY_64B 0x2c
 #define CPUID_2_L1I_32KB_8WAY_64B 0x30
 #define CPUID_2_L2_2MB_8WAY_64B   0x7d
+#define CPUID_2_L3_16MB_16WAY_64B 0x4d
 
 
 /* CPUID Leaf 4 constants: */
@@ -131,11 +132,18 @@
 #define L2_LINES_PER_TAG   1
 #define L2_SIZE_KB_AMD   512
 
-/* No L3 cache: */
+/* Level 3 unified cache: */
 #define L3_SIZE_KB 0 /* disabled */
 #define L3_ASSOCIATIVITY   0 /* disabled */
 #define L3_LINES_PER_TAG   0 /* disabled */
 #define L3_LINE_SIZE   0 /* disabled */
+#define L3_N_LINE_SIZE 64
+#define L3_N_ASSOCIATIVITY 16
+#define L3_N_SETS   16384
+#define L3_N_PARTITIONS 1
+#define L3_N_DESCRIPTOR CPUID_2_L3_16MB_16WAY_64B
+#define L3_N_LINES_PER_TAG  1
+#define L3_N_SIZE_KB_AMD16384
 
 /* TLB definitions: */
 
@@ -2275,6 +2283,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 {
 X86CPU *cpu = x86_env_get_cpu(env);
 CPUState *cs = CPU(cpu);
+uint32_t pkg_offset;
 
 /* test if maximum index reached */
 if (index & 0x8000) {
@@ -2328,7 +2337,11 @@ void cpu_x86_cpuid(CPUX86State 

Re: [Qemu-devel] [Xen-devel] [PATCH 2/2] xen_platform: SUSE xenlinux unplug for emulated PCI

2016-09-01 Thread Stefano Stabellini
On Thu, 1 Sep 2016, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 01, 2016 at 02:11:31PM +0200, Olaf Hering wrote:
> > Implement SUSE specific unplug protocol for emulated PCI devices
> > in PVonHVM guests. It's a simple 'outl(1, (ioaddr + 4));'.
> > This protocol was implemented and used since Xen 3.0.4.
> > It is used in all SUSE/SLES/openSUSE releases up to SLES11SP3 and
> > openSUSE 12.3.
> 
> Should this be documented in the protocol?

Yes, please. The file is docs/misc/hvm-emulated-unplug.markdown in the
Xen repository. Please also document the behavior with SCSI disks, which
is currently missing.


> > 
> > Signed-off-by: Olaf Hering 
> > ---
> >  hw/i386/xen/xen_platform.c | 31 ++-
> >  1 file changed, 30 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> > index d94b53c..8802482 100644
> > --- a/hw/i386/xen/xen_platform.c
> > +++ b/hw/i386/xen/xen_platform.c
> > @@ -314,13 +314,42 @@ static void xen_platform_ioport_writeb(void *opaque, 
> > hwaddr addr,
> > uint64_t val, unsigned int size)
> >  {
> >  PCIXenPlatformState *s = opaque;
> > +PCIDevice *pci_dev = PCI_DEVICE(s);
> >  
> >  switch (addr) {
> >  case 0: /* Platform flags */
> >  platform_fixed_ioport_writeb(opaque, 0, (uint32_t)val);
> >  break;
> > +case 4:
> > +if (val == 1) {
> > +/*
> > + * SUSE unplug for Xenlinux
> > + * xen-kmp used this since xen-3.0.4, instead the official 
> > protocol from xen-3.3+
> > + * It did an unconditional "outl(1, (ioaddr + 4));"
> > + * Pre VMDP 1.7 made use of 4 and 8 depending on how VMDP was 
> > configured.
> > + * If VMDP was to control both disk and LAN it would use 4.
> > + * If it controlled just disk or just LAN, it would use 8 
> > below.
> > + */
> > +blk_drain_all();
> > +blk_flush_all();
> > +pci_unplug_disks(pci_dev->bus);
> > +pci_unplug_nics(pci_dev->bus);
> > +}
> > +break;
> >  case 8:
> > -log_writeb(s, (uint32_t)val);
> > +switch (val) {
> > +case 1:
> > +blk_drain_all();
> > +blk_flush_all();
> > +pci_unplug_disks(pci_dev->bus);
> > +break;
> > +case 2:
> > +pci_unplug_nics(pci_dev->bus);
> > +break;
> > +default:
> > +log_writeb(s, (uint32_t)val);
> > +break;
> > +}
> >  break;
> >  default:
> >  break;
> > 
> > ___
> > Xen-devel mailing list
> > xen-de...@lists.xen.org
> > https://lists.xen.org/xen-devel
> 



Re: [Qemu-devel] [PATCH v2] target-i386: present virtual L3 cache info for vcpus

2016-09-01 Thread Longpeng (Mike)
Hi Michael,

On 2016/9/1 21:27, Michael S. Tsirkin wrote:

> On Thu, Sep 01, 2016 at 02:58:05PM +0800, l00371263 wrote:
>> From: "Longpeng(Mike)" 
>>
>> Some software algorithms are based on the hardware's cache info, for example,
>> for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will 
>> trigger
>> a resched IPI and told cpu2 to do the wakeup if they don't share low level
>> cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share 
>> llc.
>> The relevant linux-kernel code as bellow:
>>
>>  static void ttwu_queue(struct task_struct *p, int cpu)
>>  {
>>  struct rq *rq = cpu_rq(cpu);
>>  ..
>>  if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
>>  ..
>>  ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
>>  return;
>>  }
>>  ..
>>  ttwu_do_activate(rq, p, 0); /* access target's rq directly */
>>  ..
>>  }
>>
>> In real hardware, the cpus on the same socket share L3 cache, so one won't
>> trigger a resched IPIs when wakeup a task on others. But QEMU doesn't 
>> present a
>> virtual L3 cache info for VM, then the linux guest will trigger lots of RES 
>> IPIs
>> under some workloads even if the virtual cpus belongs to the same virtual 
>> socket.
>>
>> For KVM, this degrades performance, because there will be lots of vmexit due 
>> to
>> guest send IPIs.
>>
>> The workload is a SAP HANA's testsuite, we run it one round(about 40 
>> minuates)
>> and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering 
>> during
>> the period:
>>
>> No-L3   With-L3(applied this patch)
>> cpu0:363890  44582
>> cpu1:373405  43109
>> cpu2:340783  43797
>> cpu3:333854  43409
>> cpu4:327170  40038
>> cpu5:325491  39922
>> cpu6:319129  42391
>> cpu7:306480  41035
>> cpu8:161139  32188
>> cpu9:164649  31024
>> cpu10:   149823  30398
>> cpu11:   149823  32455
>> cpu12:   164830  35143
>> cpu13:   172269  35805
>> cpu14:   179979  33898
>> cpu15:   194505  32754
>> avg: 268963.6    40129.8
>>
>> The VM's topology is "1*socket 8*cores 2*threads".
>> After present virtual L3 cache info for VM, the amounts of RES IPI in guest
>> reduce 85%.
>>
>> And we also test the overall system performance if vcpus actually run on
>> sparate physical sockets. With L3 cache, the performance improves 7.2%~33.1%
>> (avg: 15.7%).
> 
> Any idea why?  I'm guessing that on bare metal, it is
> sometimes cheaper to send IPIs with a separate cache, but on KVM,
> it is always cheaper to use memory, as this reduces the # of exits.
> Is this it?

Yeah, I think so; vmexits due to vcpus sending IPIs are expensive.

> 
> It's worth listing here so that e.g. if it ever becomes possible to send
> IPIs without exits, we know we need to change this code.
> 

OK! I will add in next version.

>> Signed-off-by: Longpeng(Mike) 
>> ---
>> Hi Eduardo, 
>>
>> Changes since v1:
>>   - fix the compat problem: set compat_props on PC_COMPAT_2_7.
>>   - fix a "intentionally introducde bug": make intel's and amd's 
>> consistently.
>>   - fix the CPUID.(EAX=4, ECX=3):EAX[25:14].
>>   - test the performance if vcpus running on sparate sockets: with L3 cache,
>> the performance improves 7.2%~33.1%(avg: 15.7%).
>> ---
>>  include/hw/i386/pc.h |  8 
>>  target-i386/cpu.c| 49 -
>>  target-i386/cpu.h|  3 +++
>>  3 files changed, 55 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 74c175c..6072625 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -367,7 +367,15 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
>>  int e820_get_num_entries(void);
>>  bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>>  
>> +#define PC_COMPAT_2_7 \
>> +{\
>> +.driver   = TYPE_X86_CPU,\
>> +.property = "compat-cache",\
>> +.value= "on",\
>> +},
>> +
>>  #define PC_COMPAT_2_6 \
>> +PC_COMPAT_2_7 \
>>  HW_COMPAT_2_6 \
>>  {\
>>  .driver   = "fw_cfg_io",\
> 
> 
> Could this get a more informative name?
> E.g. l3-cache-shared ?

Thanks! I will take your suggestion in next version.

> 
>> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
>> index 6a1afab..224d967 100644
>> --- a/target-i386/cpu.c
>> +++ b/target-i386/cpu.c
>> @@ -57,6 +57,7 @@
>>  #define CPUID_2_L1D_32KB_8WAY_64B 0x2c
>>  #define CPUID_2_L1I_32KB_8WAY_64B 0x30
>>  #define CPUID_2_L2_2MB_8WAY_64B   0x7d
>> +#define CPUID_2_L3_16MB_16WAY_64B 0x4d
>>  
>>  
>>  /* CPUID Leaf 4 constants: */
>> @@ -131,11 +132,18 @@
>>  

Re: [Qemu-devel] [virtio-comment] Re: [PATCH] *** Vhost-pci RFC v2 ***

2016-09-01 Thread Wei Wang

On 09/01/2016 09:05 PM, Marc-André Lureau wrote:
On Thu, Sep 1, 2016 at 4:13 PM Wei Wang wrote:


On 09/01/2016 04:49 PM, Marc-André Lureau wrote:
> Hi
>
> On Thu, Sep 1, 2016 at 12:19 PM Wei Wang wrote:
>
> On 08/31/2016 08:30 PM, Marc-André Lureau wrote:
>
>> - If it could be made not pci-specific, a better name for the
>> device could be simply "driver": the driver of a virtio device.
>> Or the "slave" in vhost-user terminology - consumer of virtq. I
>> think you prefer to call it "backend" in general, but I find it
>> more confusing.
>
> Not really. A virtio device has it own driver (e.g. a virtio-net
> driver for a virtio-net device). A vhost-pci device plays
the role
> of a backend (just like vhost_net, vhost_user) for a virtio
> device. If we use the "device/driver" naming convention, the
> vhost-pci device is part of the "device". But I actually
prefer to
> use "frontend/backend" :) If we check the QEMU's
> doc/specs/vhost-user.txt, it also uses "backend" to describe.
>
>
> Yes, but it uses "backend" freely without any definition and to name
> eventually different things. (at least "slave" is being defined
as the
> consumer of virtq, but I think some people don't like to use
that word).
>

I think most people know the concept of backend/frontend, that's
probably the reason why they usually don't explicitly explain it in a
doc. If you guys don't have an objection, I suggest to use it in the
discussion :)  The goal here is to get the design finalized first.
When
it comes to the final spec wording phase, we can decide which
description is more proper.


"backend" is too broad for me. Instead I would stick to something 
closer to what we want to name and define. If it's the consumer of 
virtq, then why not call it that way.


OK. Let me get used to it (provider VM - frontend, consumer VM - 
backend).




> Have you thought about making the device not pci specific? I don't
> know much about mmio devices nor s/390, but if devices can hotplug
> their own memory (I believe mmio can), then it should be possible to
> define a device generic enough.

Not yet. I think the main difference would be the way to map the
frontend VM's memory (in our case, we use a BAR). Other things
should be
generic.


I hope some more knowledgeable people will chime in.


That would be great.




>
>> - Why is it required or beneficial to support multiple
"frontend"
>> devices over the same "vhost-pci" device? It could simplify
>> things if it was a single device. If necessary, that could also
>> be interesting as a vhost-user extension.
>
> We call it "multiple backend functionalities" (e.g.
vhost-pci-net,
> vhost-pci-scsi..). A vhost-pci driver contains multiple such
> backend functionalities, because in this way they can reuse
> (share) the same memory mapping. To be more precise, a vhost-pci
> device supplies the memory of a frontend VM, and all the backend
> functionalities need to access the same frontend VM memory,
so we
> consolidate them into one vhost-pci driver to use one vhost-pci
> device.
>
>
> That's what I imagined. Do you have a use case for that?

Currently, we only have the network use cases. I think we can
design it
that way (multple backend functionalities), which is more generic (not
just limited to network usages). When implementing it, we can
first have
the network backend functionality (i.e. vhost-pci-net) implemented. In
the future, if people are interested in other backend
functionalities, I
think it should be easy to add them.


My question is not about the support of various kind of devices (that 
is clearly a worthy goal to me) but to have support simultaneously of 
several frontend/provider devices on the same vhost-pci device: is 
this required or necessary? I think it would simplify things if it was 
1-1 instead, I would like to understand why you propose a different 
design.


It is not required, but necessary, I think. As mentioned above, those 
consumer-side functionalities basically access the same provider VM's 
memory, so I think one vhost-pci device is enough to hold that memory. 
When it comes to the consumer guest kernel, we only need to ioremap that 
memory once. Also, a pair of controlq-s is enough to handle the control 
path messages between all those functionalities and the QEMU. I think 
the design also looks compact in this way. what do you think?


If we make it an N-N model (each functionality has a vhost-pci 

Re: [Qemu-devel] [Xen-devel] [PATCH 2/2] xen_platform: SUSE xenlinux unplug for emulated PCI

2016-09-01 Thread Konrad Rzeszutek Wilk
On Thu, Sep 01, 2016 at 02:11:31PM +0200, Olaf Hering wrote:
> Implement SUSE specific unplug protocol for emulated PCI devices
> in PVonHVM guests. It's a simple 'outl(1, (ioaddr + 4));'.
> This protocol was implemented and used since Xen 3.0.4.
> It is used in all SUSE/SLES/openSUSE releases up to SLES11SP3 and
> openSUSE 12.3.

Should this be documented in the protocol?

> 
> Signed-off-by: Olaf Hering 
> ---
>  hw/i386/xen/xen_platform.c | 31 ++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> index d94b53c..8802482 100644
> --- a/hw/i386/xen/xen_platform.c
> +++ b/hw/i386/xen/xen_platform.c
> @@ -314,13 +314,42 @@ static void xen_platform_ioport_writeb(void *opaque, 
> hwaddr addr,
> uint64_t val, unsigned int size)
>  {
>  PCIXenPlatformState *s = opaque;
> +PCIDevice *pci_dev = PCI_DEVICE(s);
>  
>  switch (addr) {
>  case 0: /* Platform flags */
>  platform_fixed_ioport_writeb(opaque, 0, (uint32_t)val);
>  break;
> +case 4:
> +if (val == 1) {
> +/*
> + * SUSE unplug for Xenlinux
> + * xen-kmp used this since xen-3.0.4, instead the official 
> protocol from xen-3.3+
> + * It did an unconditional "outl(1, (ioaddr + 4));"
> + * Pre VMDP 1.7 made use of 4 and 8 depending on how VMDP was 
> configured.
> + * If VMDP was to control both disk and LAN it would use 4.
> + * If it controlled just disk or just LAN, it would use 8 below.
> + */
> +blk_drain_all();
> +blk_flush_all();
> +pci_unplug_disks(pci_dev->bus);
> +pci_unplug_nics(pci_dev->bus);
> +}
> +break;
>  case 8:
> -log_writeb(s, (uint32_t)val);
> +switch (val) {
> +case 1:
> +blk_drain_all();
> +blk_flush_all();
> +pci_unplug_disks(pci_dev->bus);
> +break;
> +case 2:
> +pci_unplug_nics(pci_dev->bus);
> +break;
> +default:
> +log_writeb(s, (uint32_t)val);
> +break;
> +}
>  break;
>  default:
>  break;
> 



Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/1] block-backend: allow flush on devices with open tray

2016-09-01 Thread Jeff Cody
On Thu, Sep 01, 2016 at 05:56:52PM -0400, John Snow wrote:
> If a device still has an attached BDS because the medium has not yet
> been removed, we will be unable to migrate to a new host because
> blk_flush will return an error for that backend.
> 
> Replace the call to blk_is_available to blk_is_inserted to weaken
> the check and allow flushes from the backend to work, while still
> disallowing flushes from the frontend/device model to work.
> 
> This fixes a regression present in 2.6.0 caused by the following commit:
> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
> block: Move some bdrv_*_all() functions to BB
> 
> Signed-off-by: John Snow 

Reviewed-by: Jeff Cody 


I don't have any real suggestions for a test either, except to point you to
test 091 which does a live migration test (which is not very helpful, I
know).


> ---
>  block/block-backend.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index effa038..36a32c3 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1058,7 +1058,7 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t 
> offset,
>  BlockAIOCB *blk_aio_flush(BlockBackend *blk,
>BlockCompletionFunc *cb, void *opaque)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
>  }
>  
> @@ -1118,7 +1118,7 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, 
> int count)
>  
>  int blk_co_flush(BlockBackend *blk)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return -ENOMEDIUM;
>  }
>  
> @@ -1127,7 +1127,7 @@ int blk_co_flush(BlockBackend *blk)
>  
>  int blk_flush(BlockBackend *blk)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return -ENOMEDIUM;
>  }
>  
> -- 
> 2.7.4
> 
> 



[Qemu-devel] qapi DEVICE_DELETED event issued *before* instance_finalize?!

2016-09-01 Thread Alex Williamson
Hey,

I'm out of my QOM depth, so I'll just beg for help in advance.  I
noticed in testing vfio-pci hotunplug that the host seems to be trying
to reclaim the device before QEMU is actually done with it, there's a
very short race where libvirt has seen the DEVICE_DELETED event and
tries to unbind the physical device from vfio-pci, the use count is
clearly non-zero because the host driver tries to send a device
request, but that event channel has already been torn down.  Nearly
immediately after, QEMU finally releases the device, but we can't do a
proper reset due to some issues with device references in the kernel.

When I run gdb on QEMU with breakpoints at
qapi_event_send_device_deleted() and vfio_instance_finalize(),  the
QAPI event happens first.  Clearly this is horribly wrong, right?  I
can't unmap my references to the vfio device file until my
instance_finalize is called, so I'm always going to have that open when
libvirt takes the DEVICE_DELETED event as a cue to return the device to
host drivers.  The call chains look like this:

#0  qapi_event_send_device_deleted (has_device=true, 
device=0x7f5ca3e36fb0 "hostdev0", 
path=0x7f5c89e84fe0 "/machine/peripheral/hostdev0", 
errp=0x7f5ca241f9e8 ) at qapi-event.c:412
#1  0x7f5ca1701608 in device_unparent (obj=0x7f5ca43ffc00)
at hw/core/qdev.c:1115
#2  0x7f5ca18b7891 in object_finalize_child_property (obj=0x7f5ca380f500, 
name=0x7f5ca3f21da0 "hostdev0", opaque=0x7f5ca43ffc00) at qom/object.c:1362
#3  0x7f5ca18b56b2 in object_property_del_child (obj=0x7f5ca380f500, 
child=0x7f5ca43ffc00, errp=0x0) at qom/object.c:422
#4  0x7f5ca18b5790 in object_unparent (obj=0x7f5ca43ffc00)
at qom/object.c:441
#5  0x7f5ca16c1f31 in acpi_pcihp_eject_slot (s=0x7f5ca4c41268, bsel=0, 
slots=4) at hw/acpi/pcihp.c:139


#0  vfio_instance_finalize (obj=0x7f5ca43ffc00)
at /net/gimli/home/alwillia/Work/qemu.git/hw/vfio/pci.c:2731
#1  0x7f5ca18b57c0 in object_deinit (obj=0x7f5ca43ffc00, 
type=0x7f5ca376f490) at qom/object.c:448
#2  0x7f5ca18b5831 in object_finalize (data=0x7f5ca43ffc00)
at qom/object.c:462
#3  0x7f5ca18b6782 in object_unref (obj=0x7f5ca43ffc00) at qom/object.c:896
#4  0x7f5ca1550cc0 in memory_region_unref (mr=0x7f5ca43fff00)
at /net/gimli/home/alwillia/Work/qemu.git/memory.c:1476
#5  0x7f5ca1553886 in do_address_space_destroy (as=0x7f5ca43ffe10)
at /net/gimli/home/alwillia/Work/qemu.git/memory.c:2272


It appears that DEVICE_DELETED only means the VM is done with the
device but libvirt is interpreting it as QEMU is done with the device.
Which is correct?  Do we need a new event or do we need to fix the
ordering of this event?  An ordering fix would be more compatible with
existing libvirt.  Thanks,

Alex
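
For what it's worth, the "fix the ordering" option could look roughly like the
sketch below: stop emitting the event from device_unparent(), cache the path
and set a flag there instead, and emit the event from the instance finalizer
once the last reference is gone. This is only an illustration of the idea, not
a tested patch; the fields used here (pending_deleted_event, deleted_path) are
hypothetical, and the path must be captured before unparenting because it is
no longer available at finalize time.

/* Sketch only -- defer DEVICE_DELETED until the device is really gone.
 * dev->pending_deleted_event and dev->deleted_path are hypothetical fields. */
static void device_unparent(Object *obj)
{
    DeviceState *dev = DEVICE(obj);

    /* ... existing unrealize/bus cleanup kept as-is ... */

    /* capture the QOM path while it is still valid, but do not emit yet */
    dev->deleted_path = object_get_canonical_path(obj);
    dev->pending_deleted_event = true;
}

static void device_finalize(Object *obj)
{
    DeviceState *dev = DEVICE(obj);

    if (dev->pending_deleted_event) {
        /* by now vfio_instance_finalize() and friends have already run */
        qapi_event_send_device_deleted(dev->id != NULL, dev->id,
                                       dev->deleted_path, &error_abort);
        g_free(dev->deleted_path);
    }

    /* ... existing finalize code ... */
}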



Re: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-09-01 Thread Bandan Das
Eduardo Habkost  writes:

> On Thu, Sep 01, 2016 at 02:39:37PM -0300, Eduardo Habkost wrote:
>> On Thu, Sep 01, 2016 at 05:41:52PM +0200, Paolo Bonzini wrote:
>> > On 01/09/2016 17:10, Eduardo Habkost wrote:
>> > > Ouch. It looks like the ordering requirements are messier than I
>> > > thought. vhost-user depends on the memory backends to be already
>> > > initialized.
>> > 
>> > You could also look at delaying initialization of vhost-user, not
>> > sending anything on the wire until after machine creation.
>> 
>> I was wishing the bug could be fixed without the need to touch
>> vhost, but I will take a look.
>> 
>> BTW, the vhost error is actually happening inside a VCPU thread,
>> after everything was supposed to be fully initialized.  Maybe the
>> memory listener logic in vhost.c is broken somehow?
>
> This is getting hairier.

Just started looking... I understand that chardev init can't be moved
up because it might depend on object init but why can't we delay
vhost-user initialization?

> Summary:
>
> 1) vhost_user_set_mem_table() fails because dev->mem->nregions is 0
> 2) dev->mem->nregions is supposed to get new entries based on the
>memory listener callbacks
> 3) vhost_region_add() gets called properly, and calls
>vhost_set_memory(), but:
> 4) vhost_set_memory() forces add=false if
>memory_region_get_dirty_log_mask(section->mr) & ~(1 << 
> DIRTY_MEMORY_MIGRATION)
>(I have no idea why)
> 5) memory_region_init_ram_from_file() sets:
>mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;

The commit message that added it says that DIRTY_MEMORY_CODE is only used
with TCG. However, the above check is saying that as long as any bit
except DIRTY_MEMORY_MIGRATION is set, don't add the region (because that is
handled elsewhere). Should this apply to DIRTY_MEMORY_CODE too?
This check is at some other places too but it seems none of them can have
DIRTY_MEMORY_CODE set since they will always be called in non-tcg context.

>(I don't understand what are the consequences of this)
> 6) The tcg_enabled() check above is broken if the memory region
>is created before configure_accelerator() is called. My patch
>moves memory backend initialization after
>configure_accelerator()

Indeed, this seems broken and will always add the region.
I am not sure, however, whether DIRTY_MEMORY_CODE is something
that needs to be tracked.

> I'm very confused. My patch seems to fix the dirty_log_mask
> initialization at (5) by accident? But for some reason this
> breaks vhost-user and makes it ignore all memory regions if using
> TCG? (vhost-user-test forces accel=tcg, BTW)

Right, because this bit will be set for all memory regions in the
tcg case. From what I understand, either this shouldn't be set for
all memory regions universally or vhost can simply ignore checking
for this bit in the mask?

bandan

> As I have no idea why vhost_set_memory() ignores the memory
> region based on memory_region_get_dirty_log_mask(), I don't know
> what's supposed to be happening here. Any help would be
> appreciated.



Re: [Qemu-devel] [PATCH v2 1/1] block-backend: allow flush on devices with open tray

2016-09-01 Thread John Snow



On 09/01/2016 06:11 PM, Eric Blake wrote:

Sounds slightly nicer if you stop short: s/ to work//


I'm happy with this edit if a maintainer would like to make it.



Re: [Qemu-devel] [Qemu-block] [PATCH] atapi: allow 0 transfer bytes for read_cd command

2016-09-01 Thread John Snow



On 08/21/2016 05:16 PM, Hervé Poussineau wrote:

On 18/08/2016 at 16:24, Kevin Wolf wrote:

Hm, which of the paths in cmd_read_cd() does this hit? Is it the one
that directly calls ide_atapi_cmd_ok() without doing anything?


This is in ide_atapi_cmd, at line:
if (cmd->handler && !(cmd->flags & NONDATA)) {
handler is cmd_read_cd and flags doesn't contain NONDATA and
atapi_byte_count_limit is 0 and atapi_dma is false, so command is aborted.
Adding NONDATA flag prevents this command abort.



I think adding NONDATA is okay, but we may need to add explicit
atapi_byte_count_limit() == 0 checks to those paths that do transfer
some data. At least at first sight I'm not sure that
ide_atapi_cmd_read() can handle this.



ATAPI packet is:
ATAPI limit=0x0 packet: be 00 00 00 00 00 00 00 00 00 00 00
Note that byte count limit is 0x0.
I also checked that s->packet_dma is false.

cmd_read_cd calculates nb_sectors using buf[6], buf[7] and buf[8] =>
nb_sectors = 0.
There is a specific case in cmd_read_cd if nb_sectors == 0, which
succeeds the command.

So, we have four cases:
a) byte limit == 0 && nb_sectors == 0 -> used by NT4, currently is
aborting the command in ide_atapi_cmd
b) byte limit == 0 && nb_sectors != 0 -> command is aborted in
ide_atapi_cmd
c) byte limit != 0 && nb_sectors == 0 -> command succeeds in cmd_read_cd
d) byte limit != 0 && nb_sectors != 0 -> usual case, works fine

Maybe we should add NONDATA flag for cmd_read_cd command, and add on top
of cmd_read_cd
- if nb_sectors == 0, succeed command (for cases a and c)
- if byte limit == 0 && nb_sectors != 0, abort command (for case b)
- otherwise, process as usual (for case d)

Hervé


What a mess. I guess there's really no way to -- at the command dispatch 
level -- tell whether or not having a BCL of 0 is a problem. It's 
literally up to the command handlers themselves.


That's really unfortunate.

So at this point, it's important to rename this flag. When I introduced 
it, I was under the impression that commands could either return data or 
not, and if they did, that the byte limit must be set.


If there are commands which can do both, "NONDATA" gets a lot less 
meaningful.


I might ask that we rename it BCL0 or something -- this command is 
allowed to process commands with a BCL of zero -- and then those 
commands will also be responsible for further checking if that's truly OK.


If you respin, please cc stable for 2.7.1. Sorry for the long delay.

--js
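
To make the above concrete, Hervé's case analysis could translate into a
prologue at the top of cmd_read_cd() along these lines -- a sketch under the
thread's assumptions (the command also gets the NONDATA flag so dispatch no
longer rejects a BCL of 0 up front), not a tested patch:

    /* nb_sectors is built from buf[6..8] as described above */
    unsigned int nb_sectors = (buf[6] << 16) | (buf[7] << 8) | buf[8];

    if (nb_sectors == 0) {
        /* cases a) and c): nothing to transfer, report success */
        ide_atapi_cmd_ok(s);
        return;
    }
    if (atapi_byte_count_limit(s) == 0 && !s->atapi_dma) {
        /* case b): data requested but no way to return it */
        ide_atapi_cmd_error(s, ILLEGAL_REQUEST, ASC_INV_FIELD_IN_CMD_PACKET);
        return;
    }
    /* case d): continue with the existing READ CD handling */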




Re: [Qemu-devel] [PATCH v2 1/1] block-backend: allow flush on devices with open tray

2016-09-01 Thread Eric Blake
On 09/01/2016 04:56 PM, John Snow wrote:
> If a device still has an attached BDS because the medium has not yet
> been removed, we will be unable to migrate to a new host because
> blk_flush will return an error for that backend.
> 
> Replace the call to blk_is_available to blk_is_inserted to weaken
> the check and allow flushes from the backend to work, while still
> disallowing flushes from the frontend/device model to work.

Sounds slightly nicer if you stop short: s/ to work//

> 
> This fixes a regression present in 2.6.0 caused by the following commit:
> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
> block: Move some bdrv_*_all() functions to BB
> 
> Signed-off-by: John Snow 
> ---
>  block/block-backend.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Eric Blake 

However, I don't have an offhand suggestion on how best to test it.

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH v2 1/1] block-backend: allow flush on devices with open tray

2016-09-01 Thread John Snow
If a device still has an attached BDS because the medium has not yet
been removed, we will be unable to migrate to a new host because
blk_flush will return an error for that backend.

Replace the call to blk_is_available to blk_is_inserted to weaken
the check and allow flushes from the backend to work, while still
disallowing flushes from the frontend/device model to work.

This fixes a regression present in 2.6.0 caused by the following commit:
fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
block: Move some bdrv_*_all() functions to BB

Signed-off-by: John Snow 
---
 block/block-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index effa038..36a32c3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1058,7 +1058,7 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t 
offset,
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque)
 {
-if (!blk_is_available(blk)) {
+if (!blk_is_inserted(blk)) {
 return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
 }
 
@@ -1118,7 +1118,7 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, 
int count)
 
 int blk_co_flush(BlockBackend *blk)
 {
-if (!blk_is_available(blk)) {
+if (!blk_is_inserted(blk)) {
 return -ENOMEDIUM;
 }
 
@@ -1127,7 +1127,7 @@ int blk_co_flush(BlockBackend *blk)
 
 int blk_flush(BlockBackend *blk)
 {
-if (!blk_is_available(blk)) {
+if (!blk_is_inserted(blk)) {
 return -ENOMEDIUM;
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH v2 0/1] block-backend: allow flush on devices with open tray

2016-09-01 Thread John Snow
This one fell through the cracks. This is therefore for 2.7.1
and the 2.8 development tree instead.

I have not written a test for this just yet; does anyone have
a suggestion on where it belongs?

John Snow (1):
  block-backend: allow flush on devices with open tray

 block/block-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4




[Qemu-devel] hostdev / iSCSI issue in QEMU 2.7

2016-09-01 Thread Holger Schranz

Hello,

we need help with an issue we have had since QEMU 2.7.
Maybe we are using the wrong mailing list. If so, please let me know which
mailing list we should use to report problems in QEMU.

Best regards

Holger

===

We use a libvirt description for a tape device and a changer device
(the XML markup was stripped by the list archive).

We use tgt as the iSCSI server.

With qemu 2.6 it works fine.

With qemu 2.7RC4 we got the following error:

Thread 1 "qemu-system-x86" received signal SIGFPE, Arithmetic exception.
0x55c207e2 in iscsi_refresh_limits (bs=0x568236f0,
errp=0x7fffcca8) at
/home/kvm/SOURCES/qemu-2.7.0-rc4/block/iscsi.c:1838
1838if (iscsilun->bl.max_ws_len < 0x /
iscsilun->block_size) {

The function iscsi_refresh_limits was changed in 2.7 RC4 and divides by
iscsilun->block_size,
but the block size is filled only for SCSI-type TYPE_DISK and TYPE_ROM in
/home/kvm/SOURCES/qemu-2.7.0-rc4/block/iscsi.c:1378 (function
iscsi_readcapacity_sync)

If the libvirt configuration (see above) is correct, there is no way
in qemu 2.7
to use a tape device or changer via iSCSI.

!!! If we should change the definition, please let us know. !!!
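
On the QEMU side, a possible stop-gap would be to guard the division on a
non-zero block size, since block_size stays 0 for LUN types other than
TYPE_DISK/TYPE_ROM. A sketch only (assuming the limit constant is effectively
UINT32_MAX); the real fix may rather be to skip the write-same/discard limit
calculation entirely for tape and changer LUNs:

/* block/iscsi.c, iscsi_refresh_limits() -- sketch, not a tested patch:
 * only divide when a block size is actually known. */
if (iscsilun->block_size != 0 &&
    iscsilun->bl.max_ws_len < UINT32_MAX / iscsilun->block_size) {
    /* existing clamping of the write-same limit stays in here */
}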

Backtrace:
 Thread 1 "qemu-system-x86" received signal SIGFPE, Arithmetic exception.
0x55c207e2 in iscsi_refresh_limits (bs=0x568236f0,
errp=0x7fffcca8) at
/home/kvm/SOURCES/qemu-2.7.0-rc4/block/iscsi.c:1838
1838if (iscsilun->bl.max_ws_len < 0x /
iscsilun->block_size) {
(gdb) bt
#0  0x55c207e2 in iscsi_refresh_limits (bs=0x568236f0,
errp=0x7fffcca8) at /home/kvm/SOURCES/qemu-2.7.0-rc4/block/iscsi.c:1838
#1  0x55c018d4 in bdrv_refresh_limits (bs=0x568236f0,
errp=0x7fffcca8) at /home/kvm/SOURCES/qemu-2.7.0-rc4/block/io.c:121
#2  0x55b9627a in bdrv_open_common (bs=0x568236f0, file=0x0,
options=0x56827950, errp=0x7fffcd58)
at /home/kvm/SOURCES/qemu-2.7.0-rc4/block.c:1013
#3  0x55b97e3b in bdrv_open_inherit (filename=0x567de840
"iscsi://192.168.200.53:3260/iqn.2008-09.net.fsc%3Aserver.LT260_61005/1",
reference=0x0, options=0x56827950, flags=57346,
parent=0x5681d450, child_role=
0x56165360 , errp=0x7fffcea8)
at /home/kvm/SOURCES/qemu-2.7.0-rc4/block.c:1687
#4  0x55b97524 in bdrv_open_child (filename=0x567de840
"iscsi://192.168.200.53:3260/iqn.2008-09.net.fsc%3Aserver.LT260_61005/1",
options=0x568216b0, bdref_key=0x55d78112 "file",
parent=0x5681d450, child_role=
0x56165360 , allow_none=true, errp=0x7fffcea8)
at /home/kvm/SOURCES/qemu-2.7.0-rc4/block.c:1449
#5  0x55b97cb9 in bdrv_open_inherit (filename=0x567de840
"iscsi://192.168.200.53:3260/iqn.2008-09.net.fsc%3Aserver.LT260_61005/1",
reference=0x0, options=0x568216b0, flags=8194, parent=0x0, child_role=
0x0, errp=0x7fffd1d8) at
/home/kvm/SOURCES/qemu-2.7.0-rc4/block.c:1648
#6  0x55b9825c in bdrv_open (filename=0x567de840
"iscsi://192.168.200.53:3260/iqn.2008-09.net.fsc%3Aserver.LT260_61005/1",
reference=0x0, options=0x567ed240, flags=2, errp=0x7fffd1d8)
at /home/kvm/SOURCES/qemu-2.7.0-rc4/block.c:1778
#7  0x55beef4f in blk_new_open (filename=0x567de840
"iscsi://192.168.200.53:3260/iqn.2008-09.net.fsc%3Aserver.LT260_61005/1",
reference=0x0, options=0x567ed240, flags=2, errp=0x7fffd1d8)
at /home/kvm/SOURCES/qemu-2.7.0-rc4/block/block-backend.c:160
#8  0x558ce2bb in blockdev_init (file=0x567de840
"iscsi://192.168.200.53:3260/iqn.2008-09.net.fsc%3Aserver.LT260_61005/1",
bs_opts=0x567ed240, errp=0x7fffd1d8) at
/home/kvm/SOURCES/qemu-2.7.0-rc4/blockdev.c:597
#9  0x558cf576 in drive_new (all_opts=0x56766fd0,
block_default_type=IF_IDE) at
/home/kvm/SOURCES/qemu-2.7.0-rc4/blockdev.c:1132
#10 0x558eaec9 in drive_init_func (opaque=0x567787b0,
opts=0x56766fd0, errp=0x0) at /home/kvm/SOURCES/qemu-2.7.0-rc4/vl.c:1138
#11 0x55c9ab9d in qemu_opts_foreach (list=0x56184fe0
, func=0x558eae8a ,
opaque=0x567787b0, errp=0x0)
at /home/kvm/SOURCES/qemu-2.7.0-rc4/util/qemu-option.c:1116
#12 0x558f434a in main (argc=85, argv=0x7fffd6e8,
envp=0x7fffd998) at /home/kvm/SOURCES/qemu-2.7.0-rc4/vl.c:4401





Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Alex Williamson
On Thu, 1 Sep 2016 23:52:02 +0530
Kirti Wankhede  wrote:

> Alex,
> Thanks for summarizing the discussion.
> 
> On 8/31/2016 9:18 PM, Alex Williamson wrote:
> > On Wed, 31 Aug 2016 15:04:13 +0800
> > Jike Song  wrote:
> >   
> >> On 08/31/2016 02:12 PM, Tian, Kevin wrote:  
>  From: Alex Williamson [mailto:alex.william...@redhat.com]
>  Sent: Wednesday, August 31, 2016 12:17 AM
> 
>  Hi folks,
> 
>  At KVM Forum we had a BoF session primarily around the mediated device
>  sysfs interface.  I'd like to share what I think we agreed on and the
>  "problem areas" that still need some work so we can get the thoughts
>  and ideas from those who weren't able to attend.
> 
>  DanPB expressed some concern about the mdev_supported_types sysfs
>  interface, which exposes a flat csv file with fields like "type",
>  "number of instance", "vendor string", and then a bunch of type
>  specific fields like "framebuffer size", "resolution", "frame rate
>  limit", etc.  This is not entirely machine parsing friendly and sort of
>  abuses the sysfs concept of one value per file.  Example output taken
>  from Neo's libvirt RFC:
> 
>  cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
>  # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
>  framebuffer,
>  max_resolution
>  11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
>  12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
>  13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
>  14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
>  15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
>  16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
>  17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
>  18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> 
>  The create/destroy then looks like this:
> 
>  echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_create
> 
>  echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_destroy
> 
>  "vendor_specific_argument_list" is nebulous.
> 
>  So the idea to fix this is to explode this into a directory structure,
>  something like:
> 
>  ├── mdev_destroy
>  └── mdev_supported_types
>  ├── 11
>  │   ├── create
>  │   ├── description
>  │   └── max_instances
>  ├── 12
>  │   ├── create
>  │   ├── description
>  │   └── max_instances
>  └── 13
>  ├── create
>  ├── description
>  └── max_instances
> 
>  Note that I'm only exposing the minimal attributes here for simplicity,
>  the other attributes would be included in separate files and we would
>  require vendors to create standard attributes for common device classes. 
> 
> >>>
> >>> I like this idea. All standard attributes are reflected into this 
> >>> hierarchy.
> >>> In the meantime, can we still allow optional vendor string in create 
> >>> interface? libvirt doesn't need to know the meaning, but allows upper
> >>> layer to do some vendor specific tweak if necessary.
> >>> 
> >>
> >> Not sure whether this can done within MDEV framework (attrs provided by
> >> vendor driver of course), or must be within the vendor driver.  
> > 
> > The purpose of the sub-directories is that libvirt doesn't need to pass
> > arbitrary vendor strings to the create function, the attributes of the
> > mdev device created are defined by the attributes in the sysfs
> > directory where the create is done.  The user only provides a uuid for
> > the device.  Arbitrary vendor parameters are a barrier, libvirt may not
> > need to know the meaning, but would need to know when to apply them,
> > which is just as bad.  Ultimately we want libvirt to be able to
> > interact with sysfs without having any vendor-specific knowledge.
> >   
> 
> The above directory hierarchy looks fine to me. Along with the fixed set of
> parameters, an optional field for extra parameters is also required. Such
> parameters are needed for some specific testing or for running benchmarks,
> for example to disable FRL (framerate limiter) or to disable console VNC
> when not required. Libvirt doesn't need to know the details; it's just a
> string that the user can provide, and libvirt needs to pass the string as
> is to the vendor driver, which would act accordingly.

Wouldn't it make more sense to enable these through the vendor driver
which would then provide additional types through the sysfs interface
that could be selected by libvirt?  Or simply transparently change
these parameters within the existing types?  I think we really want to

[Qemu-devel] how to know ept is used by kvm

2016-09-01 Thread Yuxin Ren
Hi all,

How can I know whether EPT (extended page tables) is being used by KVM?
Do I need any special configuration or options to enable EPT in KVM?

thanks
Yuxin



Re: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-09-01 Thread Eduardo Habkost
On Thu, Sep 01, 2016 at 02:39:37PM -0300, Eduardo Habkost wrote:
> On Thu, Sep 01, 2016 at 05:41:52PM +0200, Paolo Bonzini wrote:
> > On 01/09/2016 17:10, Eduardo Habkost wrote:
> > > Ouch. It looks like the ordering requirements are messier than I
> > > thought. vhost-user depends on the memory backends to be already
> > > initialized.
> > 
> > You could also look at delaying initialization of vhost-user, not
> > sending anything on the wire until after machine creation.
> 
> I was wishing the bug could be fixed without the need to touch
> vhost, but I will take a look.
> 
> BTW, the vhost error is actually happening inside a VCPU thread,
> after everything was supposed to be fully initialized.  Maybe the
> memory listener logic in vhost.c is broken somehow?

This is getting hairier.

Summary:

1) vhost_user_set_mem_table() fails because dev->mem->nregions is 0
2) dev->mem->nregions is supposed to get new entries based on the
   memory listener callbacks
3) vhost_region_add() gets called properly, and calls
   vhost_set_memory(), but:
4) vhost_set_memory() forces add=false if
   memory_region_get_dirty_log_mask(section->mr) & ~(1 << 
DIRTY_MEMORY_MIGRATION)
   (I have no idea why)
5) memory_region_init_ram_from_file() sets:
   mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
   (I don't understand what are the consequences of this)
6) The tcg_enabled() check above is broken if the memory region
   is created before configure_accelerator() is called. My patch
   moves memory backend initialization after
   configure_accelerator()

I'm very confused. My patch seems to fix the dirty_log_mask
initialization at (5) by accident? But for some reason this
breaks vhost-user and makes it ignore all memory regions if using
TCG? (vhost-user-test forces accel=tcg, BTW)

As I have no idea why vhost_set_memory() ignores the memory
region based on memory_region_get_dirty_log_mask(), I don't know
what's supposed to be happening here. Any help would be
appreciated.

-- 
Eduardo
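
For reference, the two pieces that interact in (4)-(6) look roughly like this
(reconstructed from memory of the tree, so the details may differ slightly):

/* memory.c, memory_region_init_ram_from_file(): a backend created before
 * configure_accelerator() sees tcg_enabled() == false, so the mask stays 0;
 * created afterwards (as with the patch), it gets DIRTY_MEMORY_CODE set. */
mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;

/* hw/virtio/vhost.c, vhost_set_memory(): any dirty-log bit other than
 * MIGRATION makes vhost treat the section as dirty-logged and it refuses
 * to add it, which is why dev->mem->nregions ends up 0 under TCG. */
bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
                 ~(1 << DIRTY_MEMORY_MIGRATION);
if (log_dirty) {
    add = false;
}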



[Qemu-devel] [Bug 1619438] [NEW] GTK+ UI, delete key deletes to the left in the monitor

2016-09-01 Thread Bug Report
Public bug reported:

it must delete characters to the right, otherwise it is like having two
backspaces

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1619438

Title:
  GTK+ UI, delete key deletes to the left in the monitor

Status in QEMU:
  New

Bug description:
  it must delete characters to the right, otherwise it is like having
  two backspaces

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1619438/+subscriptions



Re: [Qemu-devel] [PATCH 2/2] virtio-balloon: resume collecting stats on vmload

2016-09-01 Thread Michael S. Tsirkin
On Thu, Sep 01, 2016 at 09:14:00PM +0300, Roman Kagan wrote:
> Upon save/restore virtio-balloon stats acquisition stops.  The reason is
> that the in-use virtqueue element is not saved, and upon restore it's
> not released to the guest, making further progress impossible.
> 
> Saving the information about the used element would introduce
> unjustified vmstate incompatibility.
> 
> However, the number of virtqueue in-use elements can be deduced from the
> data available on restore (see bccdef6b1a204db0f41ffb6e24ce373e4d7890d4
> "virtio: recalculate vq->inuse after migration").
> 
> So make the stats virtqueue forget that an element was popped from it,
> and start over.  As this tackles the problem on the "load" side, it
> is compatible with the state saved by earlier QEMU versions.
> 
> Signed-off-by: Roman Kagan 
> Cc: "Michael S. Tsirkin" 
> Cc: Ladi Prosek 
> Cc: Stefan Hajnoczi 
> ---
>  hw/virtio/virtio-balloon.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 5af429a..8660052 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -406,7 +406,13 @@ static void virtio_balloon_save_device(VirtIODevice 
> *vdev, QEMUFile *f)
>  
>  static int virtio_balloon_load(QEMUFile *f, void *opaque, size_t size)
>  {
> -return virtio_load(VIRTIO_DEVICE(opaque), f, 1);
> +VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
> +VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> +int ret = virtio_load(vdev, f, 1);
> +/* rewind needs vq->inuse populated which happens in virtio_load() after
> + * vdev->load */
> +virtqueue_rewind(s->svq);
> +return ret;
>  }
>  
>  static int virtio_balloon_load_device(VirtIODevice *vdev, QEMUFile *f,
> @@ -468,6 +474,18 @@ static void virtio_balloon_device_reset(VirtIODevice 
> *vdev)
>  }
>  }
>  
> +static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> +{
> +VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> +
> +if (vdev->vm_running && balloon_stats_supported(s) &&
> +(status & VIRTIO_CONFIG_S_DRIVER_OK) && !s->stats_vq_elem) {
> +/* poll stats queue for the element we may have discarded when the VM
> + * was stopped */
> +virtio_balloon_receive_stats(vdev, s->svq);
> +}
> +}
> +
>  static void virtio_balloon_instance_init(Object *obj)
>  {
>  VirtIOBalloon *s = VIRTIO_BALLOON(obj);
> @@ -505,6 +523,7 @@ static void virtio_balloon_class_init(ObjectClass *klass, 
> void *data)
>  vdc->get_features = virtio_balloon_get_features;
>  vdc->save = virtio_balloon_save_device;
>  vdc->load = virtio_balloon_load_device;
> +vdc->set_status = virtio_balloon_set_status;
>  }
>  
>  static const TypeInfo virtio_balloon_info = {


I'm sorry - I don't like this patch. This means that
virtio_balloon_receive_stats will be called and will poke
at the ring even if the ring was never kicked.

This is why in my opinion it is cleaner to have
virtqueue_rewind return a true/false status,
and only process the ring if the ring was rewound.

I think Stefan's patches did something similar.

> -- 
> 2.7.4
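
For illustration, the suggested variant could look something like this on top
of the posted patches (a sketch, untested):

/* Let the caller know whether anything was actually rewound. */
bool virtqueue_rewind(VirtQueue *vq)
{
    if (vq->inuse == 0) {
        return false;
    }
    vq->last_avail_idx -= vq->inuse;
    vq->inuse = 0;
    return true;
}

static int virtio_balloon_load(QEMUFile *f, void *opaque, size_t size)
{
    VirtIOBalloon *s = VIRTIO_BALLOON(opaque);
    int ret = virtio_load(VIRTIO_DEVICE(opaque), f, 1);

    /* only poke the stats queue if an in-flight element was discarded */
    if (virtqueue_rewind(s->svq)) {
        virtio_balloon_receive_stats(VIRTIO_DEVICE(opaque), s->svq);
    }
    return ret;
}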



[Qemu-devel] [Bug 1615212] Re: SDL UI switching to monitor half-broken and scrolling broken

2016-09-01 Thread Bug Report
I see, but there are still problems with the user interfaces.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1615212

Title:
  SDL UI switching to monitor half-broken and scrolling broken

Status in QEMU:
  New

Bug description:
  ctrl+alt+2 must be pressed 2 or more times for the monitor console
  window to appear with -sdl, the window flashes and disappears also
  before finally staying open

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1615212/+subscriptions



Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Kirti Wankhede

Alex,
Thanks for summarizing the discussion.

On 8/31/2016 9:18 PM, Alex Williamson wrote:
> On Wed, 31 Aug 2016 15:04:13 +0800
> Jike Song  wrote:
> 
>> On 08/31/2016 02:12 PM, Tian, Kevin wrote:
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, August 31, 2016 12:17 AM

 Hi folks,

 At KVM Forum we had a BoF session primarily around the mediated device
 sysfs interface.  I'd like to share what I think we agreed on and the
 "problem areas" that still need some work so we can get the thoughts
 and ideas from those who weren't able to attend.

 DanPB expressed some concern about the mdev_supported_types sysfs
 interface, which exposes a flat csv file with fields like "type",
 "number of instance", "vendor string", and then a bunch of type
 specific fields like "framebuffer size", "resolution", "frame rate
 limit", etc.  This is not entirely machine parsing friendly and sort of
 abuses the sysfs concept of one value per file.  Example output taken
 from Neo's libvirt RFC:

 cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
 # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
 framebuffer,
 max_resolution
 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160

 The create/destroy then looks like this:

 echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_create

 echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_destroy

 "vendor_specific_argument_list" is nebulous.

 So the idea to fix this is to explode this into a directory structure,
 something like:

 ├── mdev_destroy
 └── mdev_supported_types
 ├── 11
 │   ├── create
 │   ├── description
 │   └── max_instances
 ├── 12
 │   ├── create
 │   ├── description
 │   └── max_instances
 └── 13
 ├── create
 ├── description
 └── max_instances

 Note that I'm only exposing the minimal attributes here for simplicity,
 the other attributes would be included in separate files and we would
 require vendors to create standard attributes for common device classes.  
>>>
>>> I like this idea. All standard attributes are reflected into this hierarchy.
>>> In the meantime, can we still allow optional vendor string in create 
>>> interface? libvirt doesn't need to know the meaning, but allows upper
>>> layer to do some vendor specific tweak if necessary.
>>>   
>>
>> Not sure whether this can done within MDEV framework (attrs provided by
>> vendor driver of course), or must be within the vendor driver.
> 
> The purpose of the sub-directories is that libvirt doesn't need to pass
> arbitrary vendor strings to the create function, the attributes of the
> mdev device created are defined by the attributes in the sysfs
> directory where the create is done.  The user only provides a uuid for
> the device.  Arbitrary vendor parameters are a barrier, libvirt may not
> need to know the meaning, but would need to know when to apply them,
> which is just as bad.  Ultimately we want libvirt to be able to
> interact with sysfs without having any vendor-specific knowledge.
> 

The above directory hierarchy looks fine to me. Along with the fixed set of
parameters, an optional field for extra parameters is also required. Such
parameters are needed for some specific testing or for running benchmarks,
for example to disable FRL (framerate limiter) or to disable console VNC
when not required. Libvirt doesn't need to know the details; it's just a
string that the user can provide, and libvirt needs to pass the string as
is to the vendor driver, which would act accordingly.


 For vGPUs like NVIDIA where we don't support multiple types
 concurrently, this directory structure would update as mdev devices are
 created, removing no longer available types.  I carried forward  
>>>
>>> or keep the type with max_instances cleared to ZERO.
>>>  
>>
>> +1 :)
> 
> Possible yes, but why would the vendor driver report types that the
> user cannot create?  It just seems like superfluous information (well,
> except for the use I discover below).
> 

The directory structure for a physical GPU will be defined when device
is registered to 

[Qemu-devel] [PATCH 0/2] virtio-balloon: resume collecting stats on vmload

2016-09-01 Thread Roman Kagan
Fix virtio-balloon stats acquisition which stops upon save/restore.

Roman Kagan (2):
  virtio: add virtqueue_rewind
  virtio-balloon: resume collecting stats on vmload

Signed-off-by: Roman Kagan 
Cc: "Michael S. Tsirkin" 
Cc: Ladi Prosek 
Cc: Stefan Hajnoczi 
---
 hw/virtio/virtio-balloon.c | 21 -
 hw/virtio/virtio.c |  6 ++
 include/hw/virtio/virtio.h |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

-- 
2.7.4




[Qemu-devel] [PATCH 2/2] virtio-balloon: resume collecting stats on vmload

2016-09-01 Thread Roman Kagan
Upon save/restore virtio-balloon stats acquisition stops.  The reason is
that the in-use virtqueue element is not saved, and upon restore it's
not released to the guest, making further progress impossible.

Saving the information about the used element would introduce
unjustified vmstate incompatibility.

However, the number of virtqueue in-use elements can be deduced from the
data available on restore (see bccdef6b1a204db0f41ffb6e24ce373e4d7890d4
"virtio: recalculate vq->inuse after migration").

So make the stats virtqueue forget that an element was popped from it,
and start over.  As this tackles the problem on the "load" side, it
is compatible with the state saved by earlier QEMU versions.

Signed-off-by: Roman Kagan 
Cc: "Michael S. Tsirkin" 
Cc: Ladi Prosek 
Cc: Stefan Hajnoczi 
---
 hw/virtio/virtio-balloon.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 5af429a..8660052 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -406,7 +406,13 @@ static void virtio_balloon_save_device(VirtIODevice *vdev, 
QEMUFile *f)
 
 static int virtio_balloon_load(QEMUFile *f, void *opaque, size_t size)
 {
-return virtio_load(VIRTIO_DEVICE(opaque), f, 1);
+VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+int ret = virtio_load(vdev, f, 1);
+/* rewind needs vq->inuse populated which happens in virtio_load() after
+ * vdev->load */
+virtqueue_rewind(s->svq);
+return ret;
 }
 
 static int virtio_balloon_load_device(VirtIODevice *vdev, QEMUFile *f,
@@ -468,6 +474,18 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 }
 }
 
+static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
+{
+VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+
+if (vdev->vm_running && balloon_stats_supported(s) &&
+(status & VIRTIO_CONFIG_S_DRIVER_OK) && !s->stats_vq_elem) {
+/* poll stats queue for the element we may have discarded when the VM
+ * was stopped */
+virtio_balloon_receive_stats(vdev, s->svq);
+}
+}
+
 static void virtio_balloon_instance_init(Object *obj)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(obj);
@@ -505,6 +523,7 @@ static void virtio_balloon_class_init(ObjectClass *klass, 
void *data)
 vdc->get_features = virtio_balloon_get_features;
 vdc->save = virtio_balloon_save_device;
 vdc->load = virtio_balloon_load_device;
+vdc->set_status = virtio_balloon_set_status;
 }
 
 static const TypeInfo virtio_balloon_info = {
-- 
2.7.4




[Qemu-devel] [PATCH 1/2] virtio: add virtqueue_rewind

2016-09-01 Thread Roman Kagan
If the in-use elements aren't migrated (like is the case with
virtio-balloon stats vq), make virtqueue forget about them and pretend
they haven't been popped yet, to allow to start over on load.

Signed-off-by: Roman Kagan 
Cc: "Michael S. Tsirkin" 
Cc: Ladi Prosek 
Cc: Stefan Hajnoczi 
---
 hw/virtio/virtio.c | 6 ++
 include/hw/virtio/virtio.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 74c085c..844d2a1 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -272,6 +272,12 @@ void virtqueue_discard(VirtQueue *vq, const 
VirtQueueElement *elem,
 virtqueue_unmap_sg(vq, elem, len);
 }
 
+void virtqueue_rewind(VirtQueue *vq)
+{
+vq->last_avail_idx -= vq->inuse;
+vq->inuse = 0;
+}
+
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int len, unsigned int idx)
 {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index d2490c1..9f49fa4 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -154,6 +154,7 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement 
*elem,
 void virtqueue_flush(VirtQueue *vq, unsigned int count);
 void virtqueue_discard(VirtQueue *vq, const VirtQueueElement *elem,
unsigned int len);
+void virtqueue_rewind(VirtQueue *vq);
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int len, unsigned int idx);
 
-- 
2.7.4




[Qemu-devel] [RFC] e1000: Don't save writes to ICS/ICR masked by IMS

2016-09-01 Thread Ed Swierk
Windows 8, 10 and Server 2012 guests hang intermittently while booting
on Xen 4.5.3 with 1 vCPU and 4 e1000 vNICs, shortly after the Windows
logo appears and the little dots start spinning.

Running strace on qemu shows its main thread doing the following every
couple of milliseconds:

 ppoll([..., {fd=30, events=POLLIN|POLLERR|POLLHUP},
...], ...) = 1 ([{fd=30, revents=POLLIN}], ...)
 read(30, "^\0\0\0", 4) = 4
 write(30, "^\0\0\0", 4) = 4
 ioctl(30, IOCTL_EVTCHN_NOTIFY, 0x7f1f9449d310) = 0
 clock_gettime(CLOCK_MONOTONIC, {6937, 449468262}) = 0
 clock_gettime(CLOCK_MONOTONIC, {6937, 449582903}) = 0
 gettimeofday({1472251376, 673434}, NULL) = 0
 clock_gettime(CLOCK_MONOTONIC, {6937, 449856205}) = 0
 gettimeofday({1472251376, 673679}, NULL) = 0

The event channel (identified by '^' or 94 in this example) is always
the third of the domain's four channels.

Two recent qemu patches (http://git.qemu.org/?p=qemu.git;h=9596ef7c and
http://git.qemu.org/?p=qemu.git;h=74004e8c) seem to address similar
issues, but don't help in this case.

The proposed fix from
https://bugzilla.redhat.com/show_bug.cgi?id=874406#c78 makes the hang
go away. It's not clear to me why it works, or if it's just papering
over a bug elsewhere, or if there are any possible side effects.

Suggested-by: Andrew Jones 
Signed-off-by: Ed Swierk 

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 6eac66d..c891b67 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -293,6 +293,8 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
 uint32_t pending_ints;
 uint32_t mit_delay;
 
+val &= s->mac_reg[IMS];
+
 s->mac_reg[ICR] = val;
 
 /*
@@ -305,7 +307,7 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
  */
 s->mac_reg[ICS] = val;
 
-pending_ints = (s->mac_reg[IMS] & s->mac_reg[ICR]);
+pending_ints = s->mac_reg[ICR];
 if (!s->mit_irq_level && pending_ints) {
 /*
  * Here we detect a potential raising edge. We postpone raising the



Re: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-09-01 Thread Eduardo Habkost
On Thu, Sep 01, 2016 at 05:41:52PM +0200, Paolo Bonzini wrote:
> On 01/09/2016 17:10, Eduardo Habkost wrote:
> > Ouch. It looks like the ordering requirements are messier than I
> > thought. vhost-user depends on the memory backends to be already
> > initialized.
> 
> You could also look at delaying initialization of vhost-user, not
> sending anything on the wire until after machine creation.

I was wishing the bug could be fixed without the need to touch
vhost, but I will take a look.

BTW, the vhost error is actually happening inside a VCPU thread,
after everything was supposed to be fully initialized.  Maybe the
memory listener logic in vhost.c is broken somehow?


Backtrace (after manually adding an abort() to help debugging):

#2  0x562ebf27feb5 in vhost_user_set_mem_table (dev=0x562ec0189630, 
mem=) at 
/home/ehabkost/rh/proj/virt/qemu/hw/virtio/vhost-user.c:308
#3  0x562ebf27e524 in vhost_dev_start (hdev=hdev@entry=0x562ec0189630, 
vdev=vdev@entry=0x562ec19aa4c0) at 
/home/ehabkost/rh/proj/virt/qemu/hw/virtio/vhost.c:1304
#4  0x562ebf264a6b in vhost_net_start (dev=0x562ec19aa4c0, 
net=0x562ec0189630) at /home/ehabkost/rh/proj/virt/qemu/hw/net/vhost_net.c:232
#5  0x562ebf264a6b in vhost_net_start (dev=dev@entry=0x562ec19aa4c0, 
ncs=0x562ec19f3750, total_queues=total_queues@entry=1)
at /home/ehabkost/rh/proj/virt/qemu/hw/net/vhost_net.c:324
#6  0x562ebf261543 in virtio_net_set_status (status=6 '\006', 
n=0x562ec19aa4c0) at /home/ehabkost/rh/proj/virt/qemu/hw/net/virtio-net.c:151
#7  0x562ebf261543 in virtio_net_set_status (vdev=, 
status=) at 
/home/ehabkost/rh/proj/virt/qemu/hw/net/virtio-net.c:224
#8  0x562ebf278fc3 in virtio_set_status (vdev=vdev@entry=0x562ec19aa4c0, 
val=val@entry=6 '\006') at 
/home/ehabkost/rh/proj/virt/qemu/hw/virtio/virtio.c:760
#9  0x562ebf450cbe in virtio_pci_config_write (val=6, addr=18, 
opaque=0x562ec19a2180) at hw/virtio/virtio-pci.c:400
#10 0x562ebf450cbe in virtio_pci_config_write (opaque=0x562ec19a2180, 
addr=18, val=6, size=) at hw/virtio/virtio-pci.c:525
#11 0x562ebf234b98 in memory_region_write_accessor (mr=0x562ec19a2a10, 
addr=18, value=, size=1, shift=, mask=, attrs=...) at /home/ehabkost/rh/proj/virt/qemu/memory.c:525
#12 0x562ebf23309d in access_with_adjusted_size (addr=addr@entry=18, 
value=value@entry=0x7f1917a1c2c8, size=size@entry=1, access_size_min=, access_size_max=, access=0x562ebf234b20 
, mr=0x562ec19a2a10, attrs=...) at 
/home/ehabkost/rh/proj/virt/qemu/memory.c:591
#13 0x562ebf236f4c in memory_region_dispatch_write 
(mr=mr@entry=0x562ec19a2a10, addr=18, data=, size=size@entry=1, 
attrs=attrs@entry=...)
at /home/ehabkost/rh/proj/virt/qemu/memory.c:1275
#14 0x562ebf1f23b7 in address_space_write (mr=0x562ec19a2a10, l=, addr1=, len=1, buf=0x7f1917a1c3a7 "\006", attrs=..., 
addr=49170, as=0x562ebfb52aa0 ) at 
/home/ehabkost/rh/proj/virt/qemu/exec.c:2556
#15 0x562ebf1f23b7 in address_space_write (as=0x562ebfb52aa0 
, addr=, attrs=..., buf=, 
len=)
at /home/ehabkost/rh/proj/virt/qemu/exec.c:2601
#16 0x562ebf1f295d in address_space_rw (as=, addr=, attrs=..., buf=buf@entry=0x7f1917a1c3a7 "\006", len=len@entry=1, 
is_write=is_write@entry=true) at /home/ehabkost/rh/proj/virt/qemu/exec.c:2703
#17 0x562ebf1f61b6 in address_space_stb (as=, 
addr=, val=, attrs=..., result=result@entry=0x0)
at /home/ehabkost/rh/proj/virt/qemu/exec.c:3443
#18 0x562ebf2d6731 in helper_outb (env=, port=, data=) at 
/home/ehabkost/rh/proj/virt/qemu/target-i386/misc_helper.c:32
#19 0x7f193a4b166d in code_gen_buffer ()
#20 0x562ebf1f96e3 in cpu_exec (itb=0x7f1937d85b50, itb=0x7f1937d85b50, 
cpu=0x562ec0199e80) at /home/ehabkost/rh/proj/virt/qemu/cpu-exec.c:166
#21 0x562ebf1f96e3 in cpu_exec (sc=0x7f1917a1c8e0, tb_exit=, last_tb=, tb=0x7f1937d85b50, cpu=0x562ec0199e80)
at /home/ehabkost/rh/proj/virt/qemu/cpu-exec.c:530
#22 0x562ebf1f96e3 in cpu_exec (cpu=cpu@entry=0x562ec0191c00) at 
/home/ehabkost/rh/proj/virt/qemu/cpu-exec.c:625
#23 0x562ebf21f66f in qemu_tcg_cpu_thread_fn (cpu=0x562ec0191c00) at 
/home/ehabkost/rh/proj/virt/qemu/cpus.c:1541
#24 0x562ebf21f66f in qemu_tcg_cpu_thread_fn () at 
/home/ehabkost/rh/proj/virt/qemu/cpus.c:1574
#25 0x562ebf21f66f in qemu_tcg_cpu_thread_fn (arg=) at 
/home/ehabkost/rh/proj/virt/qemu/cpus.c:1171
#26 0x7f195417d5ca in start_thread () at /lib64/libpthread.so.0
#27 0x7f194f0f4ead in clone () at /lib64/libc.so.6
(gdb) up
#2  0x562ebf27feb5 in vhost_user_set_mem_table (dev=0x562ec0189630, 
mem=) at 
/home/ehabkost/rh/proj/virt/qemu/hw/virtio/vhost-user.c:308
308 abort();
(gdb) p dev->mem->nregions 
$1 = 0
(gdb) 

-- 
Eduardo



Re: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-09-01 Thread Eduardo Habkost
On Thu, Sep 01, 2016 at 06:52:42PM +0200, Michal Privoznik wrote:
> On 01.09.2016 17:10, Eduardo Habkost wrote:
> > On Wed, Aug 31, 2016 at 02:47:21PM -0700, 
> > no-re...@ec2-52-6-146-230.compute-1.amazonaws.com wrote:
> > [...]
> >> GTESTER check-qtest-x86_64
> >> qemu-system-x86_64: Failed initializing vhost-user memory map, consider 
> >> using -object memory-backend-file share=on
> >> qemu-system-x86_64: vhost_set_mem_table failed: Success (0)
> > [...]
> >> **
> >> ERROR:/tmp/qemu-test/src/tests/vhost-user-test.c:149:wait_for_fds: 
> >> assertion failed: (s->fds_num)
> > 
> > Ouch. It looks like the ordering requirements are messier than I
> > thought. vhost-user depends on the memory backends to be already
> > initialized.
> > 
> > We can't use early initialization because prealloc delays chardev
> > init too much. We can't delay initialization because it is done
> > after netdevs.
> > 
> > We _really_ need to change this to simply use the ordering used
> > on the command-line/config instead of hardcoding messy ordering
> > requirements, but I wouldn't like to wait for a QemuOpts
> > refactoring to fix the bug. I will take a look at the memory
> > regions initialization path, and try to trigger the
> > memory-backend prealloc code there.
> > 
> 
> What I don't understand here is, if kernel already has a pool of
> hugepages from which qemu tries to allocate some, why does allocation
> take up to 1 minute? I would understand if it was during building of the
> pool, but once those pages are reserved allocating them should take no
> time. Isn't this a problem in kernel (too)?

I believe the pages are reserved, but are zeroed when actually
allocated by userspace.

-- 
Eduardo
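
For context, the reservation itself is cheap; the time goes into faulting
every page in during preallocation, which is roughly this loop (paraphrased
from util/oslib-posix.c, details may differ):

/* os_mem_prealloc(), approximately: fault in each (huge)page up front,
 * one write per page -- this is what takes the time, not the reservation. */
for (i = 0; i < numpages; i++) {
    memset(area + (hpagesize * i), 0, 1);
}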



Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/7] ppc/pnv: add a PnvChip object

2016-09-01 Thread Greg Kurz
On Wed, 31 Aug 2016 18:34:10 +0200
Cédric Le Goater  wrote:

> This is an abstraction of a POWER8 chip which is a set of cores
> plus other 'units', like the pervasive unit, the interrupt controller,
> the memory controller, the on-chip microcontroller, etc. The whole can
> be seen as a socket. It depends on a cpu model and its characteristics,
> max cores, specific init are defined in a PnvChipClass.
> 
> We start with a near-empty PnvChip with only a few cpu constants
> which we will grow in the subsequent patches with the controllers
> required to run the system.
> 
> Signed-off-by: Cédric Le Goater 
> ---
> 
>  Changes since v1:
>  
>  - introduced a PnvChipClass depending on the cpu model. It also
>provides some chip constants used by devices, like the cpu model hw
>id (f000f), a enum type (not sure this is useful yet), a custom
>realize ops for customization.
>  - the num-chips property can be configured on the command line.
>  
>  Maybe this object deserves its own file hw/ppc/pnv_chip.c ? 
> 
>  hw/ppc/pnv.c | 154 
> +++
>  include/hw/ppc/pnv.h |  71 
>  2 files changed, 225 insertions(+)
> 
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index 70413e3c5740..06051268e200 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -168,6 +168,8 @@ static void ppc_powernv_init(MachineState *machine)
>  char *fw_filename;
>  long fw_size;
>  long kernel_size;
> +int i;
> +char *chip_typename;
>  
>  /* allocate RAM */
>  if (ram_size < (1 * G_BYTE)) {
> @@ -212,6 +214,153 @@ static void ppc_powernv_init(MachineState *machine)
>  exit(1);
>  }
>  }
> +
> +/* Create the processor chips */
> +chip_typename = g_strdup_printf(TYPE_PNV_CHIP "-%s", machine->cpu_model);
> +

You should check that the CPU model is supported by this machine (with
object_class_by_name()) and error out if it is not, or...

> +pnv->chips = g_new0(PnvChip *, pnv->num_chips);
> +for (i = 0; i < pnv->num_chips; i++) {
> +Object *chip = object_new(chip_typename);

... this call to object_new() will abort.
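
Concretely, the check could be something along these lines right before the
chip loop (a sketch, untested):

/* reject unknown/unsupported cpu models before object_new() aborts */
if (object_class_by_name(chip_typename) == NULL) {
    error_report("invalid CPU model '%s' for the powernv machine",
                 machine->cpu_model);
    exit(1);
}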

> +object_property_set_int(chip, CHIP_HWID(i), "chip-id", _abort);
> +object_property_set_bool(chip, true, "realized", _abort);
> +pnv->chips[i] = PNV_CHIP(chip);
> +}
> +g_free(chip_typename);
> +}
> +
> +static void pnv_chip_power8nvl_realize(PnvChip *chip, Error **errp)
> +{
> +;
> +}
> +
> +static void pnv_chip_power8nvl_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +PnvChipClass *k = PNV_CHIP_CLASS(klass);
> +
> +k->realize = pnv_chip_power8nvl_realize;
> +k->cpu_model = "POWER8NVL";
> +k->chip_type = PNV_CHIP_P8NVL;
> +k->chip_f000f = 0x120d30498000ull;
> +dc->desc = "PowerNV Chip POWER8NVL";
> +}
> +
> +static const TypeInfo pnv_chip_power8nvl_info = {
> +.name  = TYPE_PNV_CHIP_POWER8NVL,
> +.parent= TYPE_PNV_CHIP,
> +.instance_size = sizeof(PnvChipPower8NVL),
> +.class_init= pnv_chip_power8nvl_class_init,
> +};
> +
> +static void pnv_chip_power8_realize(PnvChip *chip, Error **errp)
> +{
> +;
> +}
> +
> +static void pnv_chip_power8_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +PnvChipClass *k = PNV_CHIP_CLASS(klass);
> +
> +k->realize = pnv_chip_power8_realize;
> +k->cpu_model = "POWER8";
> +k->chip_type = PNV_CHIP_P8;
> +k->chip_f000f = 0x220ea0498000ull;
> +dc->desc = "PowerNV Chip POWER8";
> +}
> +
> +static const TypeInfo pnv_chip_power8_info = {
> +.name  = TYPE_PNV_CHIP_POWER8,
> +.parent= TYPE_PNV_CHIP,
> +.instance_size = sizeof(PnvChipPower8),
> +.class_init= pnv_chip_power8_class_init,
> +};
> +
> +static void pnv_chip_power8e_realize(PnvChip *chip, Error **errp)
> +{
> +;
> +}
> +
> +static void pnv_chip_power8e_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +PnvChipClass *k = PNV_CHIP_CLASS(klass);
> +
> +k->realize = pnv_chip_power8e_realize;
> +k->cpu_model = "POWER8E";
> +k->chip_type = PNV_CHIP_P8E;
> +k->chip_f000f = 0x221ef0498000ull;
> +dc->desc = "PowerNV Chip POWER8E";
> +}
> +
> +static const TypeInfo pnv_chip_power8e_info = {
> +.name  = TYPE_PNV_CHIP_POWER8E,
> +.parent= TYPE_PNV_CHIP,
> +.instance_size = sizeof(PnvChipPower8e),
> +.class_init= pnv_chip_power8e_class_init,
> +};
> +
> +static void pnv_chip_realize(DeviceState *dev, Error **errp)
> +{
> +PnvChip *chip = PNV_CHIP(dev);
> +PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
> +
> +pcc->realize(chip, errp);
> +}
> +
> +static Property pnv_chip_properties[] = {
> +DEFINE_PROP_UINT32("chip-id", PnvChip, chip_id, 0),
> +DEFINE_PROP_END_OF_LIST(),
> +};
> +
> 

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-01 Thread Laszlo Ersek
On 09/01/16 15:51, Marcel Apfelbaum wrote:
> On 09/01/2016 04:27 PM, Peter Maydell wrote:
>> On 1 September 2016 at 14:22, Marcel Apfelbaum  wrote:
>>> Proposes best practices on how to use PCIe/PCI device
>>> in PCIe based machines and explain the reasoning behind them.
>>>
>>> Signed-off-by: Marcel Apfelbaum 
>>> ---
>>>
>>> Hi,
>>>
>>> Please add your comments on what to add/remove/edit to make this doc
>>> usable.
>>
> 
> Hi Peter,
> 
>> As somebody who doesn't really understand the problem space, my
>> thoughts:
>>
>> (1) is this intended as advice for developers writing machine
>> models and adding pci controllers to them, or is it intended as
>> advice for users (and libvirt-style management layers) about
>> how to configure QEMU?
>>
> 
> It is intended for management layers, as they have no way to
> understand how to "consume" the Q35 machine,
> but also for firmware developers (OVMF/SeaBIOS) to help them
> understand the usage model so they can optimize IO/MEM
> resource allocation for both boot time and hot-plug.
> 
> QEMU users/developers can also benefit from it as the PCIe arch
> is more complex supporting both PCI/PCIe devices and
> several PCI/PCIe controllers with no clear rules on what goes where.
> 
>> (2) it seems to be a bit short on concrete advice (either
>> "you should do this" instructions to machine model developers,
>> or "use command lines like this" instructions to end-users.
>>
> 
> Thanks for the point. I'll be sure to add detailed command line examples
> to the next version.

I think that would be a huge benefit!

(I'll try to read the document later, and come back with remarks.)

Thanks!
Laszlo




Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Alex Williamson
On Thu, 1 Sep 2016 18:47:06 +0200
Michal Privoznik  wrote:

> On 31.08.2016 08:12, Tian, Kevin wrote:
> >> From: Alex Williamson [mailto:alex.william...@redhat.com]
> >> Sent: Wednesday, August 31, 2016 12:17 AM
> >>
> >> Hi folks,
> >>
> >> At KVM Forum we had a BoF session primarily around the mediated device
> >> sysfs interface.  I'd like to share what I think we agreed on and the
> >> "problem areas" that still need some work so we can get the thoughts
> >> and ideas from those who weren't able to attend.
> >>
> >> DanPB expressed some concern about the mdev_supported_types sysfs
> >> interface, which exposes a flat csv file with fields like "type",
> >> "number of instance", "vendor string", and then a bunch of type
> >> specific fields like "framebuffer size", "resolution", "frame rate
> >> limit", etc.  This is not entirely machine parsing friendly and sort of
> >> abuses the sysfs concept of one value per file.  Example output taken
> >> from Neo's libvirt RFC:
> >>
> >> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
> >> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
> >> framebuffer,
> >> max_resolution
> >> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> >> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> >> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> >> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> >> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> >> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> >> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> >> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> >>
> >> The create/destroy then looks like this:
> >>
> >> echo "$mdev_UUID:vendor_specific_argument_list" >
> >>/sys/bus/pci/devices/.../mdev_create
> >>
> >> echo "$mdev_UUID:vendor_specific_argument_list" >
> >>/sys/bus/pci/devices/.../mdev_destroy
> >>
> >> "vendor_specific_argument_list" is nebulous.
> >>
> >> So the idea to fix this is to explode this into a directory structure,
> >> something like:
> >>
> >> ├── mdev_destroy
> >> └── mdev_supported_types
> >> ├── 11
> >> │   ├── create
> >> │   ├── description
> >> │   └── max_instances
> >> ├── 12
> >> │   ├── create
> >> │   ├── description
> >> │   └── max_instances
> >> └── 13
> >> ├── create
> >> ├── description
> >> └── max_instances
> >>
> >> Note that I'm only exposing the minimal attributes here for simplicity,
> >> the other attributes would be included in separate files and we would
> >> require vendors to create standard attributes for common device classes.  
> > 
> > I like this idea. All standard attributes are reflected into this hierarchy.
> > In the meantime, can we still allow optional vendor string in create 
> > interface? libvirt doesn't need to know the meaning, but allows upper
> > layer to do some vendor specific tweak if necessary.  
> 
> This is not the best idea IMO. Libvirt is there to shadow differences
> between hypervisors. While doing that, we often hide differences between
> various types of HW too. Therefore in order to provide good abstraction
> we should make vendor specific string as small as possible (ideally an
> empty string). I mean I see it as bad idea to expose "vgpu_type_id" from
> example above in domain XML. What I think the better idea is if we let
> users choose resolution and frame buffer size, e.g.: <hostdev resolution="1024x768" framebuffer="16"/> (just the first idea that came
> to my mind while writing this e-mail). The point is, XML part is
> completely free of any vendor-specific knobs.

That's not really what you want though, a user actually cares whether
they get an Intel of NVIDIA vGPU, we can't specify it as just a
resolution and framebuffer size.  The user also doesn't want the model
changing each time the VM is started, so not only do you *need* to know
the vendor, you need to know the vendor model.  This is the only way to
provide a consistent VM.  So as we discussed at the BoF, the libvirt
xml will likely reference the vendor string, which will be a unique
identifier that encompasses all the additional attributes we expose.
Really the goal of the attributes is simply so you don't need a per
vendor magic decoder ring to figure out the basic features of a given
vendor string.  Thanks,

Alex



Re: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-09-01 Thread Michal Privoznik
On 01.09.2016 17:10, Eduardo Habkost wrote:
> On Wed, Aug 31, 2016 at 02:47:21PM -0700, 
> no-re...@ec2-52-6-146-230.compute-1.amazonaws.com wrote:
> [...]
>> GTESTER check-qtest-x86_64
>> qemu-system-x86_64: Failed initializing vhost-user memory map, consider 
>> using -object memory-backend-file share=on
>> qemu-system-x86_64: vhost_set_mem_table failed: Success (0)
> [...]
>> **
>> ERROR:/tmp/qemu-test/src/tests/vhost-user-test.c:149:wait_for_fds: assertion 
>> failed: (s->fds_num)
> 
> Ouch. It looks like the ordering requirements are messier than I
> thought. vhost-user depends on the memory backends to be already
> initialized.
> 
> We can't use early initialization because prealloc delays chardev
> init too much. We can't delay initialization because it is done
> after netdevs.
> 
> We _really_ need to change this to simply use the ordering used
> on the command-line/config instead of hardcoding messy ordering
> requirements, but I wouldn't like to wait for a QemuOpts
> refactoring to fix the bug. I will take a look at the memory
> regions initialization path, and try to trigger the
> memory-backend prealloc code there.
> 

What I don't understand here is, if kernel already has a pool of
hugepages from which qemu tries to allocate some, why does allocation
take up to 1 minute? I would understand if it was during building of the
pool, but once those pages are reserved allocating them should take no
time. Isn't this a problem in kernel (too)?

Michal



Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Michal Privoznik
On 31.08.2016 08:12, Tian, Kevin wrote:
>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>> Sent: Wednesday, August 31, 2016 12:17 AM
>>
>> Hi folks,
>>
>> At KVM Forum we had a BoF session primarily around the mediated device
>> sysfs interface.  I'd like to share what I think we agreed on and the
>> "problem areas" that still need some work so we can get the thoughts
>> and ideas from those who weren't able to attend.
>>
>> DanPB expressed some concern about the mdev_supported_types sysfs
>> interface, which exposes a flat csv file with fields like "type",
>> "number of instance", "vendor string", and then a bunch of type
>> specific fields like "framebuffer size", "resolution", "frame rate
>> limit", etc.  This is not entirely machine parsing friendly and sort of
>> abuses the sysfs concept of one value per file.  Example output taken
>> from Neo's libvirt RFC:
>>
>> cat /sys/bus/pci/devices/0000:86:00.0/mdev_supported_types
>> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer,
>> max_resolution
>> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
>> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
>> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
>> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
>> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
>> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
>> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
>> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
>>
>> The create/destroy then looks like this:
>>
>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>  /sys/bus/pci/devices/.../mdev_create
>>
>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>  /sys/bus/pci/devices/.../mdev_destroy
>>
>> "vendor_specific_argument_list" is nebulous.
>>
>> So the idea to fix this is to explode this into a directory structure,
>> something like:
>>
>> ├── mdev_destroy
>> └── mdev_supported_types
>> ├── 11
>> │   ├── create
>> │   ├── description
>> │   └── max_instances
>> ├── 12
>> │   ├── create
>> │   ├── description
>> │   └── max_instances
>> └── 13
>> ├── create
>> ├── description
>> └── max_instances
>>
>> Note that I'm only exposing the minimal attributes here for simplicity,
>> the other attributes would be included in separate files and we would
>> require vendors to create standard attributes for common device classes.
> 
> I like this idea. All standard attributes are reflected into this hierarchy.
> In the meantime, can we still allow optional vendor string in create 
> interface? libvirt doesn't need to know the meaning, but allows upper
> layer to do some vendor specific tweak if necessary.

This is not the best idea IMO. Libvirt is there to shadow differences
between hypervisors. While doing that, we often hide differences between
various types of HW too. Therefore in order to provide good abstraction
we should make vendor specific string as small as possible (ideally an
empty string). I mean I see it as bad idea to expose "vgpu_type_id" from
example above in domain XML. What I think the better idea is if we let
users choose resolution and frame buffer size, e.g.: <hostdev resolution="1024x768" framebuffer="16"/> (just the first idea that came
to my mind while writing this e-mail). The point is, XML part is
completely free of any vendor-specific knobs.

Michal



Re: [Qemu-devel] [kvm-unit-tests PATCH v3 08/10] arm/arm64: gicv2: add an IPI test

2016-09-01 Thread Auger Eric
Hi Drew,

On 15/07/2016 15:00, Andrew Jones wrote:
> Signed-off-by: Andrew Jones 
> ---
> v2: add more details in the output if a test fails,
> report spurious interrupts if we get them
> ---
>  arm/Makefile.common |   6 +-
>  arm/gic.c   | 194 
> 
>  arm/unittests.cfg   |   7 ++
>  3 files changed, 204 insertions(+), 3 deletions(-)
>  create mode 100644 arm/gic.c
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index 41239c37e0920..bc38183ab86e0 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -9,9 +9,9 @@ ifeq ($(LOADADDR),)
>   LOADADDR = 0x4000
>  endif
>  
> -tests-common = \
> - $(TEST_DIR)/selftest.flat \
> - $(TEST_DIR)/spinlock-test.flat
> +tests-common  = $(TEST_DIR)/selftest.flat
> +tests-common += $(TEST_DIR)/spinlock-test.flat
> +tests-common += $(TEST_DIR)/gic.flat
>  
>  all: test_cases
>  
> diff --git a/arm/gic.c b/arm/gic.c
> new file mode 100644
> index 0..cf7ec1c90413c
> --- /dev/null
> +++ b/arm/gic.c
> @@ -0,0 +1,194 @@
> +/*
> + * GIC tests
> + *
> + * GICv2
> + *   . test sending/receiving IPIs
> + *
> + * Copyright (C) 2016, Red Hat Inc, Andrew Jones 
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static int gic_version;
> +static int acked[NR_CPUS], spurious[NR_CPUS];
> +static cpumask_t ready;
> +
> +static void nr_cpu_check(int nr)
> +{
> + if (nr_cpus < nr)
> + report_abort("At least %d cpus required", nr);
> +}
> +
> +static void wait_on_ready(void)
> +{
> + cpumask_set_cpu(smp_processor_id(), &ready);
> + while (!cpumask_full(&ready))
> + cpu_relax();
> +}
> +
> +static void check_acked(cpumask_t *mask)
> +{
> + int missing = 0, extra = 0, unexpected = 0;
> + int nr_pass, cpu, i;
> +
> + /* Wait up to 5s for all interrupts to be delivered */
> + for (i = 0; i < 50; ++i) {
> + mdelay(100);
> + nr_pass = 0;
> + for_each_present_cpu(cpu) {
Couldn't we use for_each_cpu(cpu, mask)?
> + smp_rmb();
> + nr_pass += cpumask_test_cpu(cpu, mask) ?
> + acked[cpu] == 1 : acked[cpu] == 0;
> + }
> + if (nr_pass == nr_cpus) {
> + report("Completed in %d ms", true, ++i * 100);
> + return;
> + }
> + }
> +
> + for_each_present_cpu(cpu) {
> + if (cpumask_test_cpu(cpu, mask)) {
here we can't since we count unexpected

> + if (!acked[cpu])
> + ++missing;
> + else if (acked[cpu] > 1)
> + ++extra;
> + } else {
> + if (acked[cpu])
> + ++unexpected;
> + }
> + }
> +
> + report("Timed-out (5s). ACKS: missing=%d extra=%d unexpected=%d",
> +false, missing, extra, unexpected);
> +}
> +
> +static void ipi_handler(struct pt_regs *regs __unused)
> +{
> + u32 iar = readl(gicv2_cpu_base() + GIC_CPU_INTACK);
> +
> + if (iar != GICC_INT_SPURIOUS) {
> + writel(iar, gicv2_cpu_base() + GIC_CPU_EOI);
if EOIMode is set may need to DIR as well.
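Just to make "DIR" concrete -- a rough sketch, assuming the header here
exposes the GICC_DIR offset as GIC_CPU_DEACTIVATE (0x1000); the name is my
assumption, not necessarily what this tree calls it:

    /* only needed when GICC_CTLR.EOImode == 1: the EOI write above then
     * only drops priority, and this write does the actual deactivation */
    writel(iar, gicv2_cpu_base() + GIC_CPU_DEACTIVATE);
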
> + smp_rmb(); /* pairs with wmb in ipi_test functions */
> + ++acked[smp_processor_id()];
> + smp_wmb(); /* pairs with rmb in check_acked */
> + } else {
> + ++spurious[smp_processor_id()];
> + smp_wmb();
> + }
> +}
> +
> +static void ipi_test_self(void)
> +{
> + cpumask_t mask;
> +
> + report_prefix_push("self");
> + memset(acked, 0, sizeof(acked));
> + smp_wmb();
> + cpumask_clear(&mask);
> + cpumask_set_cpu(0, &mask);
> + writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT);
> + check_acked(&mask);
> + report_prefix_pop();
> +}
> +
> +static void ipi_test_smp(void)
> +{
> + cpumask_t mask;
> + unsigned long tlist;
> +
> + report_prefix_push("target-list");
> + memset(acked, 0, sizeof(acked));
> + smp_wmb();
> + tlist = cpumask_bits(&cpu_present_mask)[0] & 0xaa;
> + cpumask_bits(&mask)[0] = tlist;
> + writel((u8)tlist << 16, gicv2_dist_base() + GIC_DIST_SOFTINT);
> + check_acked(&mask);
> + report_prefix_pop();
> +
> + report_prefix_push("broadcast");
> + memset(acked, 0, sizeof(acked));
> + smp_wmb();
> + cpumask_copy(&mask, &cpu_present_mask);
> + cpumask_clear_cpu(0, &mask);
> + writel(1 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT);
> + check_acked(&mask);
> + report_prefix_pop();
> +}
> +
> +static void ipi_enable(void)
> +{
> + gicv2_enable_defaults();
> +#ifdef __arm__
> + install_exception_handler(EXCPTN_IRQ, ipi_handler);
> +#else
> + install_irq_handler(EL1H_IRQ, ipi_handler);
> +#endif
> + 

Re: [Qemu-devel] [kvm-unit-tests PATCH v3 09/10] arm/arm64: gicv3: add an IPI test

2016-09-01 Thread Auger Eric
Hi Drew,

On 15/07/2016 15:00, Andrew Jones wrote:
> Signed-off-by: Andrew Jones 
> 
> ---
> v2: use IRM for gicv3 broadcast
> ---
>  arm/gic.c | 157 
> ++
>  arm/unittests.cfg |   6 +++
>  2 files changed, 154 insertions(+), 9 deletions(-)
> 
> diff --git a/arm/gic.c b/arm/gic.c
> index cf7ec1c90413c..fc7ef241de3e2 100644
> --- a/arm/gic.c
> +++ b/arm/gic.c
> @@ -3,6 +3,8 @@
>   *
>   * GICv2
>   *   . test sending/receiving IPIs
> + * GICv3
> + *   . test sending/receiving IPIs
>   *
>   * Copyright (C) 2016, Red Hat Inc, Andrew Jones 
>   *
> @@ -16,6 +18,18 @@
>  #include 
>  #include 
>  
> +struct gic {
> + struct {
> + void (*enable)(void);
> + void (*send_self)(void);
> + void (*send_tlist)(cpumask_t *);
> + void (*send_broadcast)(void);
> + } ipi;
> + u32 (*read_iar)(void);
> + void (*write_eoi)(u32);
> +};
> +
> +static struct gic *gic;
>  static int gic_version;
>  static int acked[NR_CPUS], spurious[NR_CPUS];
>  static cpumask_t ready;
> @@ -69,12 +83,22 @@ static void check_acked(cpumask_t *mask)
>  false, missing, extra, unexpected);
>  }
>  
> +static u32 gicv2_read_iar(void)
> +{
> + return readl(gicv2_cpu_base() + GIC_CPU_INTACK);
> +}
> +
> +static void gicv2_write_eoi(u32 irq)
> +{
> + writel(irq, gicv2_cpu_base() + GIC_CPU_EOI);
> +}
> +
>  static void ipi_handler(struct pt_regs *regs __unused)
>  {
> - u32 iar = readl(gicv2_cpu_base() + GIC_CPU_INTACK);
> + u32 iar = gic->read_iar();
>  
>   if (iar != GICC_INT_SPURIOUS) {
> - writel(iar, gicv2_cpu_base() + GIC_CPU_EOI);
> + gic->write_eoi(iar);
>   smp_rmb(); /* pairs with wmb in ipi_test functions */
>   ++acked[smp_processor_id()];
>   smp_wmb(); /* pairs with rmb in check_acked */
> @@ -84,6 +108,89 @@ static void ipi_handler(struct pt_regs *regs __unused)
>   }
>  }
>  
> +static void gicv2_ipi_send_self(void)
> +{
> + writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT);
> +}
> +
> +static void gicv2_ipi_send_tlist(cpumask_t *mask)
> +{
> + u8 tlist = (u8)cpumask_bits(mask)[0];
> +
> + writel(tlist << 16, gicv2_dist_base() + GIC_DIST_SOFTINT);
> +}
> +
> +static void gicv2_ipi_send_broadcast(void)
> +{
> + writel(1 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT);
> +}
> +
> +#define ICC_SGI1R_AFFINITY_1_SHIFT   16
> +#define ICC_SGI1R_AFFINITY_2_SHIFT   32
> +#define ICC_SGI1R_AFFINITY_3_SHIFT   48
> +#define MPIDR_TO_SGI_AFFINITY(cluster_id, level) \
> + (MPIDR_AFFINITY_LEVEL(cluster_id, level) << ICC_SGI1R_AFFINITY_## level 
> ## _SHIFT)
> +
> +static void gicv3_ipi_send_tlist(cpumask_t *mask)
> +{
This function really is indigestible to me. Can't we reuse the linux
kernel gic_compute_target_list routine. Since the code looks similar why
not trying to be as close as possible to the kernel?

Thanks

Eric
> + u16 tlist;
> + int cpu;
> +
> + for_each_cpu(cpu, mask) {
> + u64 mpidr = cpus[cpu], sgi1r;
> + u64 cluster_id = mpidr & ~0xffUL;
> +
> + tlist = 0;
> +
> + while (cpu < nr_cpus) {
> + if ((mpidr & 0xff) >= 16) {
> + printf("cpu%d MPIDR:aff0 is %d (>= 16)!\n",
> + cpu, (int)(mpidr & 0xff));
> + break;
> + }
> +
> + tlist |= 1 << (mpidr & 0xf);
> +
> + cpu = cpumask_next(cpu, mask);
> + if (cpu >= nr_cpus)
> + break;
> +
> + mpidr = cpus[cpu];
> +
> + if (cluster_id != (mpidr & ~0xffUL)) {
> + --cpu;
> + break;
> + }
> + }
> +
> + sgi1r = (MPIDR_TO_SGI_AFFINITY(cluster_id, 3)   |
> +  MPIDR_TO_SGI_AFFINITY(cluster_id, 2)   |
> +  /* irq << 24   | */
> +  MPIDR_TO_SGI_AFFINITY(cluster_id, 1)   |
> +  tlist);
> +
> + gicv3_write_sgi1r(sgi1r);
> + }
> +
> + /* Force the above writes to ICC_SGI1R_EL1 to be executed */
> + isb();
> +}
> +
> +static void gicv3_ipi_send_self(void)
> +{
> + cpumask_t mask;
> +
> + cpumask_clear(&mask);
> + cpumask_set_cpu(smp_processor_id(), &mask);
> + gicv3_ipi_send_tlist(&mask);
> +}
> +
> +static void gicv3_ipi_send_broadcast(void)
> +{
> + gicv3_write_sgi1r(1ULL << 40);
> + isb();
> +}
> +
>  static void ipi_test_self(void)
>  {
>   cpumask_t mask;
> @@ -93,7 +200,7 @@ static void ipi_test_self(void)
>   smp_wmb();
>   cpumask_clear(&mask);
>   cpumask_set_cpu(0, &mask);
> - writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT);
> + gic->ipi.send_self();
>   

Re: [Qemu-devel] [PATCH] spapr: implement H_CHANGE_LOGICAL_LAN_MAC h_call

2016-09-01 Thread Laurent Vivier


On 01/09/2016 15:13, Thomas Huth wrote:
> On 01.09.2016 10:10, Laurent Vivier wrote:
>> Since kernel v4.0, linux uses H_CHANGE_LOGICAL_LAN_MAC to change the
>> MAC address of an ibmveth interface on the fly.
>>
>> As QEMU doesn't implement this h_call, we can no longer change the
>> MAC address of an spapr-vlan interface.
>>
>> Signed-off-by: Laurent Vivier 
>> ---
>>  hw/net/spapr_llan.c | 30 ++
>>  1 file changed, 30 insertions(+)
>>
>> diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
>> index b273eda..4bb95a5 100644
>> --- a/hw/net/spapr_llan.c
>> +++ b/hw/net/spapr_llan.c
>> @@ -106,6 +106,7 @@ typedef struct VIOsPAPRVLANDevice {
>>  VIOsPAPRDevice sdev;
>>  NICConf nicconf;
>>  NICState *nic;
>> +MACAddr perm_mac;
>>  bool isopen;
>>  hwaddr buf_list;
>>  uint32_t add_buf_ptr, use_buf_ptr, rx_bufs;
>> @@ -316,6 +317,10 @@ static void spapr_vlan_reset(VIOsPAPRDevice *sdev)
>>  spapr_vlan_reset_rx_pool(dev->rx_pool[i]);
>>  }
>>  }
>> +
>> +memcpy(&dev->nicconf.macaddr.a, &dev->perm_mac.a,
>> +   sizeof(dev->nicconf.macaddr.a));
>> +qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
>> dev->nicconf.macaddr.a);
>>  }
>>  
>>  static void spapr_vlan_realize(VIOsPAPRDevice *sdev, Error **errp)
>> @@ -324,6 +329,8 @@ static void spapr_vlan_realize(VIOsPAPRDevice *sdev, 
>> Error **errp)
>>  
>>  qemu_macaddr_default_if_unset(&dev->nicconf.macaddr);
>>  
>> +memcpy(&dev->perm_mac.a, &dev->nicconf.macaddr.a, 
>> sizeof(dev->perm_mac.a));
>> +
>>  dev->nic = qemu_new_nic(&net_spapr_vlan_info, &dev->nicconf,
>>  object_get_typename(OBJECT(sdev)), 
>> sdev->qdev.id, dev);
>>  qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
>> dev->nicconf.macaddr.a);
>> @@ -756,6 +763,27 @@ static target_ulong h_multicast_ctrl(PowerPCCPU *cpu, 
>> sPAPRMachineState *spapr,
>>  return H_SUCCESS;
>>  }
>>  
>> +static target_ulong h_change_logical_lan_mac(PowerPCCPU *cpu,
>> + sPAPRMachineState *spapr,
>> + target_ulong opcode,
>> + target_ulong *args)
>> +{
>> +target_ulong reg = args[0];
>> +target_ulong macaddr = args[1];
>> +VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
>> +VIOsPAPRVLANDevice *dev = VIO_SPAPR_VLAN_DEVICE(sdev);
>> +int i;
> 
> As an additional sanity check, you could test whether the MAC address is
> a proper unicast address, i.e. that the broadcast bit is not set.

According to LoPAPR_DRAFT_v11_24March2016_cmt.pdf,
 "16.4.3.7 H_CHANGE_LOGICAL_LAN_MAC":
...
Semantics:
* Validates the unit address, else H_Parameter
* Records the low order 6 bytes of mac-address for filtering future
incoming messages
* Returns H_Success

So H_PARAMETER is only for "reg" and we don't have to check mac-address.

Anyway, the address is checked by the kernel, with "is_valid_ether_addr()".
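If we wanted the extra sanity check anyway, a minimal sketch could look like
this (purely illustrative; returning H_PARAMETER for this case is my
assumption, not something LoPAPR mandates):

    /* optional, not in the patch: reject multicast/broadcast addresses;
     * the group bit is the LSB of the first octet, i.e. bit 40 of the
     * low-order 6 bytes held in "macaddr" */
    if (macaddr & 0x010000000000ULL) {
        return H_PARAMETER;
    }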

>> +for (i = 0; i < ETH_ALEN; i++) {
>> +dev->nicconf.macaddr.a[ETH_ALEN - i - 1] = macaddr & 0xff;
>> +macaddr >>= 8;
>> +}
>> +
>> +qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
>> dev->nicconf.macaddr.a);
>> +
>> +return H_SUCCESS;
>> +}
>> +
>>  static Property spapr_vlan_properties[] = {
>>  DEFINE_SPAPR_PROPERTIES(VIOsPAPRVLANDevice, sdev),
>>  DEFINE_NIC_PROPERTIES(VIOsPAPRVLANDevice, nicconf),
>> @@ -854,6 +882,8 @@ static void spapr_vlan_register_types(void)
>>  spapr_register_hypercall(H_ADD_LOGICAL_LAN_BUFFER,
>>   h_add_logical_lan_buffer);
>>  spapr_register_hypercall(H_MULTICAST_CTRL, h_multicast_ctrl);
>> +spapr_register_hypercall(H_CHANGE_LOGICAL_LAN_MAC,
>> + h_change_logical_lan_mac);
>>  type_register_static(&spapr_vlan_info);
>>  }
> 
> Patch looks basically fine to me. One more thought though:
> What about migration? Don't we need to migrate the perm_mac array, too,
> or is this already covered by the dev->nicconf.macaddr array?

We don't need to migrate perm_mac because it is initialized from the
command line (it is only used to reset the card) on the destination side
as it is on the source side.

I've tested migration and all seems to work fine, but if you want to
double-check don't hesitate :)

Thanks,
Laurent



Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/7] ppc/pnv: add skeleton PowerNV platform

2016-09-01 Thread Greg Kurz
On Wed, 31 Aug 2016 18:34:09 +0200
Cédric Le Goater  wrote:

> From: Benjamin Herrenschmidt 
> 
> The goal is to emulate a PowerNV system at the level of the skiboot
> firmware, which loads the OS and provides some runtime services. Power
> Systems have a lower firmware (HostBoot) that does low level system
> initialization, like DRAM training. This is beyond the scope of what
> qemu will address in a PowerNV guest.
> 
> No devices yet, not even an interrupt controller. Just to get started,
> some RAM to load the skiboot firmware, the kernel and initrd. The
> device tree is fully created in the machine reset op.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> [clg: - updated for qemu-2.7
>   - replaced fprintf by error_report
>   - used a common definition of _FDT macro
>   - removed VMStateDescription as migration is not yet supported
>   - added IBM Copyright statements
>   - reworked kernel_filename handling
>   - merged PnvSystem and sPowerNVMachineState
>   - removed PHANDLE_XICP
>   - added ppc_create_page_sizes_prop helper
>   - removed nmi support
>   - removed kvm support
>   - updated powernv machine to version 2.8
>   - removed chips and cpus, They will be provided in another patches
>   - added a machine reset routine to initialize the device tree (also)
>   - french has a squelette and english a skeleton.
>   - improved commit log.
>   - reworked prototypes parameters
>   - added a check on the ram size (thanks to Michael Ellerman)
>   - fixed chip-id cell
>   - changed MAX_CPUS to 2048
>   - simplified memory node creation to one node only
>   - removed machine version
>   - rewrote the device tree creation with the fdt "rw" routines
>   - s/sPowerNVMachineState/PnvMachineState/
>   - etc.
> ]
> Signed-off-by: Cédric Le Goater 
> ---
> 
>  Changes since v1:
> 
>  - changed MAX_CPUS to 2048
>  - simplified memory node creation to one node only
>  - removed machine version 
>  - rewrote the device tree creation with the fdt "rw" routines
>  - s/sPowerNVMachineState/PnvMachineState/
>  - block_default_type is back to IF_IDE because of the AHCI device
> 
>  default-configs/ppc64-softmmu.mak |   1 +
>  hw/ppc/Makefile.objs  |   2 +
>  hw/ppc/pnv.c  | 244 
> ++
>  include/hw/ppc/pnv.h  |  37 ++
>  4 files changed, 284 insertions(+)
>  create mode 100644 hw/ppc/pnv.c
>  create mode 100644 include/hw/ppc/pnv.h
> 
> diff --git a/default-configs/ppc64-softmmu.mak 
> b/default-configs/ppc64-softmmu.mak
> index c4be59f638ed..516a6e25aba3 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -40,6 +40,7 @@ CONFIG_I8259=y
>  CONFIG_XILINX=y
>  CONFIG_XILINX_ETHLITE=y
>  CONFIG_PSERIES=y
> +CONFIG_POWERNV=y
>  CONFIG_PREP=y
>  CONFIG_MAC=y
>  CONFIG_E500=y
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index 99a0d4e581bf..8105db7d5600 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -5,6 +5,8 @@ obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
>  obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
> +# IBM PowerNV
> +obj-$(CONFIG_POWERNV) += pnv.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>  obj-y += spapr_pci_vfio.o
>  endif
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> new file mode 100644
> index ..70413e3c5740
> --- /dev/null
> +++ b/hw/ppc/pnv.c
> @@ -0,0 +1,244 @@
> +/*
> + * QEMU PowerPC PowerNV model
> + *
> + * Copyright (c) 2004-2007 Fabrice Bellard
> + * Copyright (c) 2007 Jocelyn Mayer
> + * Copyright (c) 2010 David Gibson, IBM Corporation.
> + * Copyright (c) 2014-2016 BenH, IBM Corporation.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 

Re: [Qemu-devel] [PATCH 0/2] Xen HVM unplug changes

2016-09-01 Thread Paolo Bonzini


On 01/09/2016 14:11, Olaf Hering wrote:
> Update unplug in Xen HVM guests to cover more cases.
> Please review.
> 
> Olaf
> 
> Olaf Hering (2):
>   xen_platform: unplug also SCSI disks
>   xen_platform: SUSE xenlinux unplug for emulated PCI
> 
>  hw/i386/xen/xen_platform.c | 36 +++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 

Looks good with the checkpatch complaints fixed.

Paolo



Re: [Qemu-devel] [PATCH v2 4/6] virtio-balloon: keep collecting stats on save/restore

2016-09-01 Thread Michael S. Tsirkin
On Thu, Sep 01, 2016 at 05:43:05PM +0200, Ladi Prosek wrote:
> On Thu, Sep 1, 2016 at 4:17 PM, Roman Kagan  wrote:
> > On Thu, Sep 01, 2016 at 10:35:49AM +0200, Ladi Prosek wrote:
> >> On Fri, Aug 19, 2016 at 3:39 PM, Roman Kagan  wrote:
> >> > Upon save/restore virtio-balloon stats acquisition stops.  The reason is
> >> > that the fact that the (only) virtqueue element is being used by QEMU is
> >> > not recorded anywhere on save, so upon restore it's not released to the
> >> > guest, making further progress impossible.
> >> >
> >> > Saving the information about the used element would introduce unjustified
> >> > vmstate incompatibility.
> >> >
> >> > So instead just make sure the element is pushed before save, leaving the
> >> > ball on the guest side.  For that, add vm state change handler to
> >> > virtio-ballon which would take care of pushing the element if there is
> >> > one.
> >> >
> >> > Signed-off-by: Roman Kagan 
> >> > Cc: "Michael S. Tsirkin" 
> >> > Cc: Ladi Prosek 
> >> > Cc: Stefan Hajnoczi 
> >> > ---
> >> >  hw/virtio/virtio-balloon.c | 27 ++-
> >> >  include/hw/virtio/virtio-balloon.h |  1 +
> >> >  2 files changed, 23 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> >> > index 6d4c57c..f00ad8e 100644
> >> > --- a/hw/virtio/virtio-balloon.c
> >> > +++ b/hw/virtio/virtio-balloon.c
> >> > @@ -88,10 +88,19 @@ static void balloon_stats_change_timer(VirtIOBalloon 
> >> > *s, int64_t secs)
> >> >  timer_mod(s->stats_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 
> >> > secs * 1000);
> >> >  }
> >> >
> >> > +static void balloon_stats_push_elem(VirtIOBalloon *s)
> >> > +{
> >> > +VirtIODevice *vdev = VIRTIO_DEVICE(s);
> >> > +
> >> > +virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
> >> > +virtio_notify(vdev, s->svq);
> >> > +g_free(s->stats_vq_elem);
> >> > +s->stats_vq_elem = NULL;
> >> > +}
> >> > +
> >> >  static void balloon_stats_poll_cb(void *opaque)
> >> >  {
> >> >  VirtIOBalloon *s = opaque;
> >> > -VirtIODevice *vdev = VIRTIO_DEVICE(s);
> >> >
> >> >  if (!s->stats_vq_elem) {
> >> >  /* The guest hasn't sent the stats yet (either not enabled or 
> >> > we came
> >> > @@ -100,10 +109,7 @@ static void balloon_stats_poll_cb(void *opaque)
> >> >  return;
> >> >  }
> >> >
> >> > -virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
> >> > -virtio_notify(vdev, s->svq);
> >> > -g_free(s->stats_vq_elem);
> >> > -s->stats_vq_elem = NULL;
> >> > +balloon_stats_push_elem(s);
> >> >  }
> >> >
> >> >  static void balloon_stats_get_all(Object *obj, Visitor *v, const char 
> >> > *name,
> >> > @@ -414,6 +420,15 @@ static int virtio_balloon_load_device(VirtIODevice 
> >> > *vdev, QEMUFile *f,
> >> >  return 0;
> >> >  }
> >> >
> >> > +static void balloon_vm_state_change(void *opaque, int running, RunState 
> >> > state)
> >> > +{
> >> > +VirtIOBalloon *s = opaque;
> >> > +
> >> > +if (!running && s->stats_vq_elem) {
> >> > +balloon_stats_push_elem(s);
> >> > +}
> >> > +}
> >> > +
> >> >  static void virtio_balloon_device_realize(DeviceState *dev, Error 
> >> > **errp)
> >> >  {
> >> >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >> > @@ -436,6 +451,7 @@ static void 
> >> > virtio_balloon_device_realize(DeviceState *dev, Error **errp)
> >> >  s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
> >> >  s->svq = virtio_add_queue(vdev, 1, virtio_balloon_receive_stats);
> >> >
> >> > +s->change = 
> >> > qemu_add_vm_change_state_handler(balloon_vm_state_change, s);
> >> >  reset_stats(s);
> >> >  }
> >> >
> >> > @@ -444,6 +460,7 @@ static void 
> >> > virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
> >> >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >> >  VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> >> >
> >> > +qemu_del_vm_change_state_handler(s->change);
> >> >  balloon_stats_destroy_timer(s);
> >> >  qemu_remove_balloon_handler(s);
> >> >  virtio_cleanup(vdev);
> >> > diff --git a/include/hw/virtio/virtio-balloon.h 
> >> > b/include/hw/virtio/virtio-balloon.h
> >> > index 1ea13bd..d72ff7f 100644
> >> > --- a/include/hw/virtio/virtio-balloon.h
> >> > +++ b/include/hw/virtio/virtio-balloon.h
> >> > @@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
> >> >  int64_t stats_last_update;
> >> >  int64_t stats_poll_interval;
> >> >  uint32_t host_features;
> >> > +VMChangeStateEntry *change;
> >> >  } VirtIOBalloon;
> >> >
> >> >  #endif
> >> > --
> >> > 2.7.4
> >> >
> >>
> >> Hi Roman,
> >>
> >> I talked to Michael Tsirkin and he agrees with merging this patch for
> >> 2.7.
> >
> > I'm not happy with this patch: it tries to solve the problem on the
> > "save" side and therefore 

[Qemu-devel] how to monitor vm exit

2016-09-01 Thread Yuxin Ren
Hi All,

I have two questions.
1. How can I track how many vm exit happens during vm execution using
KVM under Ubuntu?
2. How can I config KVM to enable/disable conditional vm exiting. For
example, how can I disable WBINVD exiting?

Thanks a lot
Yuxin



Re: [Qemu-devel] Deprecate/remove AIX host support? Solaris?

2016-09-01 Thread Karel Gardas
From time to time I'm using QEMU on my Solaris 11/amd64 workstation.
But as in the case of AIX, the Solaris build wasn't straightforward either;
IIRC I also had to tweak configure in a place or two. So if you keep it, good;
if not, it's not the end of the world, as I'm planning to migrate to some
other OS anyway in the future, because I feel the quality of this OS has been
going downhill since 11.0 or so -- at least for me and my application.

On Thu, Sep 1, 2016 at 5:57 PM, Peter Maydell  wrote:
> Hi; I was wondering whether we should remove the probably-broken
> and certainly-untested support for some of the more obscure host
> OSes we have ifdefs lurking around for.
>
> AIX host support seems like an easy one to me. As far as I
> can tell it's been impossible to build QEMU since 2013 without
> editing configure (in commit e3608d66cea31 rth made "cpu=ppc64"
> result in adding a "-m64" option to CFLAGS, and AIX gcc doesn't
> understand -m64). Since nobody has complained I think we can
> assume that nobody's actually using this host. I have tentatively
> added the note
> "Support for building on AIX is deprecated. We think it has been
>  broken for some time and plan to remove it entirely in 2.8."
> to the 2.7 ChangeLog. (If people disagree or somebody pops up to
> volunteer to test and maintain AIX host support I'll delete the
> changelog text, obviously.)
>
> Possibly more controversially, Solaris host support? Is anybody
> using and testing this?
>
> thanks
> -- PMM
>



Re: [Qemu-devel] [PATCH v5 2/5] target-ppc: add vector extract instructions

2016-09-01 Thread Richard Henderson

On 08/31/2016 11:36 PM, Rajalakshmi Srinivasaraghavan wrote:

+#if defined(HOST_WORDS_BIGENDIAN)
+#define VEXTRACT(suffix, element)\
+void helper_vextract##suffix(ppc_avr_t *r, ppc_avr_t *b, uint32_t index) \
+{\
+r->u64[0] = r->u64[1] = 0;   \
+memmove(&r->u8[8 - sizeof(r->element)], &b->u8[index],   \
+   sizeof(r->element[0]));   \


Again, you must consider R == B.  I made this same comment wrt v2, when you 
still had a memcpy here.


This is trivial:

  (1) Use memmove to set first sizeof(r->element[0]) bytes,
  (2) Use memset 0 to clean last (16 - sizeof(r->element[0]) bytes.

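For illustration only, a sketch of that ordering which stays correct even
when r == b ("es" stands for the element size the macro substitutes, and
ppc_avr_t is the vector union from target-ppc/cpu.h):

    memmove(&r->u8[8 - es], &b->u8[index], es);  /* (1) move the element first */
    memset(&r->u8[0], 0, 8 - es);                /* (2) then zero the rest ... */
    memset(&r->u8[8], 0, 8);                     /* ... without touching it    */
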
You have some test cases for these insns, don't you?


r~



Re: [Qemu-devel] [PATCH] doc/rcu: fix typo

2016-09-01 Thread Paolo Bonzini


On 01/09/2016 04:21, Cao jin wrote:
> Signed-off-by: Cao jin 
> ---
>  docs/rcu.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/docs/rcu.txt b/docs/rcu.txt
> index 2f70954..a70b72c 100644
> --- a/docs/rcu.txt
> +++ b/docs/rcu.txt
> @@ -37,7 +37,7 @@ do not matter; as soon as all previous critical sections 
> have finished,
>  there cannot be any readers who hold references to the data structure,
>  and these can now be safely reclaimed (e.g., freed or unref'ed).
>  
> -Here is a picutre:
> +Here is a picture:
>  
>  thread 1  thread 2  thread 3
>  ------
> 

Queued for 2.8, thanks.

Paolo



[Qemu-devel] Deprecate/remove AIX host support? Solaris?

2016-09-01 Thread Peter Maydell
Hi; I was wondering whether we should remove the probably-broken
and certainly-untested support for some of the more obscure host
OSes we have ifdefs lurking around for.

AIX host support seems like an easy one to me. As far as I
can tell it's been impossible to build QEMU since 2013 without
editing configure (in commit e3608d66cea31 rth made "cpu=ppc64"
result in adding a "-m64" option to CFLAGS, and AIX gcc doesn't
understand -m64). Since nobody has complained I think we can
assume that nobody's actually using this host. I have tentatively
added the note
"Support for building on AIX is deprecated. We think it has been
 broken for some time and plan to remove it entirely in 2.8."
to the 2.7 ChangeLog. (If people disagree or somebody pops up to
volunteer to test and maintain AIX host support I'll delete the
changelog text, obviously.)

Possibly more controversially, Solaris host support? Is anybody
using and testing this?

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2 4/6] virtio-balloon: keep collecting stats on save/restore

2016-09-01 Thread Roman Kagan
On Thu, Sep 01, 2016 at 10:35:49AM +0200, Ladi Prosek wrote:
> On Fri, Aug 19, 2016 at 3:39 PM, Roman Kagan  wrote:
> > Upon save/restore virtio-balloon stats acquisition stops.  The reason is
> > that the fact that the (only) virtqueue element is being used by QEMU is
> > not recorded anywhere on save, so upon restore it's not released to the
> > guest, making further progress impossible.
> >
> > Saving the information about the used element would introduce unjustified
> > vmstate incompatibility.
> >
> > So instead just make sure the element is pushed before save, leaving the
> > ball on the guest side.  For that, add vm state change handler to
> > virtio-ballon which would take care of pushing the element if there is
> > one.
> >
> > Signed-off-by: Roman Kagan 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Ladi Prosek 
> > Cc: Stefan Hajnoczi 
> > ---
> >  hw/virtio/virtio-balloon.c | 27 ++-
> >  include/hw/virtio/virtio-balloon.h |  1 +
> >  2 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 6d4c57c..f00ad8e 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -88,10 +88,19 @@ static void balloon_stats_change_timer(VirtIOBalloon 
> > *s, int64_t secs)
> >  timer_mod(s->stats_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + secs 
> > * 1000);
> >  }
> >
> > +static void balloon_stats_push_elem(VirtIOBalloon *s)
> > +{
> > +VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > +
> > +virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
> > +virtio_notify(vdev, s->svq);
> > +g_free(s->stats_vq_elem);
> > +s->stats_vq_elem = NULL;
> > +}
> > +
> >  static void balloon_stats_poll_cb(void *opaque)
> >  {
> >  VirtIOBalloon *s = opaque;
> > -VirtIODevice *vdev = VIRTIO_DEVICE(s);
> >
> >  if (!s->stats_vq_elem) {
> >  /* The guest hasn't sent the stats yet (either not enabled or we 
> > came
> > @@ -100,10 +109,7 @@ static void balloon_stats_poll_cb(void *opaque)
> >  return;
> >  }
> >
> > -virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
> > -virtio_notify(vdev, s->svq);
> > -g_free(s->stats_vq_elem);
> > -s->stats_vq_elem = NULL;
> > +balloon_stats_push_elem(s);
> >  }
> >
> >  static void balloon_stats_get_all(Object *obj, Visitor *v, const char 
> > *name,
> > @@ -414,6 +420,15 @@ static int virtio_balloon_load_device(VirtIODevice 
> > *vdev, QEMUFile *f,
> >  return 0;
> >  }
> >
> > +static void balloon_vm_state_change(void *opaque, int running, RunState 
> > state)
> > +{
> > +VirtIOBalloon *s = opaque;
> > +
> > +if (!running && s->stats_vq_elem) {
> > +balloon_stats_push_elem(s);
> > +}
> > +}
> > +
> >  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
> >  {
> >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > @@ -436,6 +451,7 @@ static void virtio_balloon_device_realize(DeviceState 
> > *dev, Error **errp)
> >  s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
> >  s->svq = virtio_add_queue(vdev, 1, virtio_balloon_receive_stats);
> >
> > +s->change = qemu_add_vm_change_state_handler(balloon_vm_state_change, 
> > s);
> >  reset_stats(s);
> >  }
> >
> > @@ -444,6 +460,7 @@ static void virtio_balloon_device_unrealize(DeviceState 
> > *dev, Error **errp)
> >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >  VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> >
> > +qemu_del_vm_change_state_handler(s->change);
> >  balloon_stats_destroy_timer(s);
> >  qemu_remove_balloon_handler(s);
> >  virtio_cleanup(vdev);
> > diff --git a/include/hw/virtio/virtio-balloon.h 
> > b/include/hw/virtio/virtio-balloon.h
> > index 1ea13bd..d72ff7f 100644
> > --- a/include/hw/virtio/virtio-balloon.h
> > +++ b/include/hw/virtio/virtio-balloon.h
> > @@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
> >  int64_t stats_last_update;
> >  int64_t stats_poll_interval;
> >  uint32_t host_features;
> > +VMChangeStateEntry *change;
> >  } VirtIOBalloon;
> >
> >  #endif
> > --
> > 2.7.4
> >
> 
> Hi Roman,
> 
> I talked to Michael Tsirkin and he agrees with merging this patch for
> 2.7.

I'm not happy with this patch: it tries to solve the problem on the
"save" side and therefore doesn't fix the bug when migrating from an
earlier QEMU version.

I wonder if we can do better and solve it on the "load" side.  (At first
I thought that your patch did that but on a closer look it turned out
not the case).

In particular, with Stefan's patch to restore VirtQueue->inuse, we
should be able to just rewind ->last_avail_idx by ->inuse during "load",
which AFAICS would also fix the bug.  What do you think?
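Roughly what I have in mind, as a sketch only (the exact field and helper
names on the load path would need checking against Stefan's series):

    /* after load: instead of keeping a reference to the in-flight element,
     * hand it back to the guest by rewinding the avail index */
    if (vq->inuse) {
        vq->last_avail_idx -= vq->inuse;
        vq->inuse = 0;
    }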

> Could you please resubmit and use the set_status callback instead
> of adding another VM state change 

Re: [Qemu-devel] [PATCH] Fix memory leak in ide_register_restart_cb()

2016-09-01 Thread Ashijeet Acharya
On Thu, Sep 1, 2016 at 9:13 PM, Paolo Bonzini  wrote:
> On 01/09/2016 07:31, Ashijeet Acharya wrote:
>> I am still waiting for review on this one.
>
> Hi,
>
> QEMU is in hard freeze now so it's normal to have some delay in patch
> review.  Maintainers often use this time to work on their own features.
>
> I'm sure John will get to it in short time.
>
> Paolo

Alright thanks. No problem!

Ashijeet
>
>> On Tue, Aug 16, 2016 at 10:40 PM, Ashijeet Acharya
>>  wrote:
>>> Fix a memory leak in ide_register_restart_cb() in hw/ide/core.c and add 
>>> idebus_unrealize() in hw/ide/qdev.c to have calls to 
>>> qemu_del_vm_change_state_handler() to deal with the dangling change state 
>>> handler during hot-unplugging ide devices which might lead to a crash.
>>>
>>> Signed-off-by: Ashijeet Acharya 
>>> ---
>>>  hw/ide/core.c |  2 +-
>>>  hw/ide/qdev.c | 14 ++
>>>  include/hw/ide/internal.h |  1 +
>>>  3 files changed, 16 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>> index 45b6df1..eecbb47 100644
>>> --- a/hw/ide/core.c
>>> +++ b/hw/ide/core.c
>>> @@ -2582,7 +2582,7 @@ static void ide_restart_cb(void *opaque, int running, 
>>> RunState state)
>>>  void ide_register_restart_cb(IDEBus *bus)
>>>  {
>>>  if (bus->dma->ops->restart_dma) {
>>> -qemu_add_vm_change_state_handler(ide_restart_cb, bus);
>>> +bus->vmstate = qemu_add_vm_change_state_handler(ide_restart_cb, 
>>> bus);
>>>  }
>>>  }
>>>
>>> diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
>>> index 67c76bf..6f75f77 100644
>>> --- a/hw/ide/qdev.c
>>> +++ b/hw/ide/qdev.c
>>> @@ -31,6 +31,7 @@
>>>  /* - */
>>>
>>>  static char *idebus_get_fw_dev_path(DeviceState *dev);
>>> +static void idebus_unrealize(DeviceState *qdev, Error **errp);
>>>
>>>  static Property ide_props[] = {
>>>  DEFINE_PROP_UINT32("unit", IDEDevice, unit, -1),
>>> @@ -345,6 +346,7 @@ static void ide_device_class_init(ObjectClass *klass, 
>>> void *data)
>>>  k->init = ide_qdev_init;
>>>  set_bit(DEVICE_CATEGORY_STORAGE, k->categories);
>>>  k->bus_type = TYPE_IDE_BUS;
>>> +k->unrealize = idebus_unrealize;
>>>  k->props = ide_props;
>>>  }
>>>
>>> @@ -368,3 +370,15 @@ static void ide_register_types(void)
>>>  }
>>>
>>>  type_init(ide_register_types)
>>> +
>>> +static void idebus_unrealize(DeviceState *qdev, Error **errp)
>>> +{
>>> +IDEBus *bus = DO_UPCAST(IDEBus, qbus, qdev->parent_bus);
>>> +
>>> +if (bus->dma->ops->restart_dma) {
>>> +if (bus->vmstate) {
>>> +qemu_del_vm_change_state_handler(bus->vmstate);
>>> +}
>>> +}
>>> +}
>>>
>>> diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
>>> index 7824bc3..2103261 100644
>>> --- a/include/hw/ide/internal.h
>>> +++ b/include/hw/ide/internal.h
>>> @@ -480,6 +480,7 @@ struct IDEBus {
>>>  uint8_t retry_unit;
>>>  int64_t retry_sector_num;
>>>  uint32_t retry_nsector;
>>> +VMChangeStateEntry *vmstate;
>>>  };
>>>
>>>  #define TYPE_IDE_DEVICE "ide-device"
>>> --
>>> 2.6.2
>>>



Re: [Qemu-devel] [PATCH v2 4/6] virtio-balloon: keep collecting stats on save/restore

2016-09-01 Thread Ladi Prosek
On Thu, Sep 1, 2016 at 4:17 PM, Roman Kagan  wrote:
> On Thu, Sep 01, 2016 at 10:35:49AM +0200, Ladi Prosek wrote:
>> On Fri, Aug 19, 2016 at 3:39 PM, Roman Kagan  wrote:
>> > Upon save/restore virtio-balloon stats acquisition stops.  The reason is
>> > that the fact that the (only) virtqueue element is being used by QEMU is
>> > not recorded anywhere on save, so upon restore it's not released to the
>> > guest, making further progress impossible.
>> >
>> > Saving the information about the used element would introduce unjustified
>> > vmstate incompatibility.
>> >
>> > So instead just make sure the element is pushed before save, leaving the
>> > ball on the guest side.  For that, add vm state change handler to
>> > virtio-ballon which would take care of pushing the element if there is
>> > one.
>> >
>> > Signed-off-by: Roman Kagan 
>> > Cc: "Michael S. Tsirkin" 
>> > Cc: Ladi Prosek 
>> > Cc: Stefan Hajnoczi 
>> > ---
>> >  hw/virtio/virtio-balloon.c | 27 ++-
>> >  include/hw/virtio/virtio-balloon.h |  1 +
>> >  2 files changed, 23 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> > index 6d4c57c..f00ad8e 100644
>> > --- a/hw/virtio/virtio-balloon.c
>> > +++ b/hw/virtio/virtio-balloon.c
>> > @@ -88,10 +88,19 @@ static void balloon_stats_change_timer(VirtIOBalloon 
>> > *s, int64_t secs)
>> >  timer_mod(s->stats_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 
>> > secs * 1000);
>> >  }
>> >
>> > +static void balloon_stats_push_elem(VirtIOBalloon *s)
>> > +{
>> > +VirtIODevice *vdev = VIRTIO_DEVICE(s);
>> > +
>> > +virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
>> > +virtio_notify(vdev, s->svq);
>> > +g_free(s->stats_vq_elem);
>> > +s->stats_vq_elem = NULL;
>> > +}
>> > +
>> >  static void balloon_stats_poll_cb(void *opaque)
>> >  {
>> >  VirtIOBalloon *s = opaque;
>> > -VirtIODevice *vdev = VIRTIO_DEVICE(s);
>> >
>> >  if (!s->stats_vq_elem) {
>> >  /* The guest hasn't sent the stats yet (either not enabled or we 
>> > came
>> > @@ -100,10 +109,7 @@ static void balloon_stats_poll_cb(void *opaque)
>> >  return;
>> >  }
>> >
>> > -virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
>> > -virtio_notify(vdev, s->svq);
>> > -g_free(s->stats_vq_elem);
>> > -s->stats_vq_elem = NULL;
>> > +balloon_stats_push_elem(s);
>> >  }
>> >
>> >  static void balloon_stats_get_all(Object *obj, Visitor *v, const char 
>> > *name,
>> > @@ -414,6 +420,15 @@ static int virtio_balloon_load_device(VirtIODevice 
>> > *vdev, QEMUFile *f,
>> >  return 0;
>> >  }
>> >
>> > +static void balloon_vm_state_change(void *opaque, int running, RunState 
>> > state)
>> > +{
>> > +VirtIOBalloon *s = opaque;
>> > +
>> > +if (!running && s->stats_vq_elem) {
>> > +balloon_stats_push_elem(s);
>> > +}
>> > +}
>> > +
>> >  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>> >  {
>> >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> > @@ -436,6 +451,7 @@ static void virtio_balloon_device_realize(DeviceState 
>> > *dev, Error **errp)
>> >  s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>> >  s->svq = virtio_add_queue(vdev, 1, virtio_balloon_receive_stats);
>> >
>> > +s->change = qemu_add_vm_change_state_handler(balloon_vm_state_change, 
>> > s);
>> >  reset_stats(s);
>> >  }
>> >
>> > @@ -444,6 +460,7 @@ static void 
>> > virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
>> >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> >  VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>> >
>> > +qemu_del_vm_change_state_handler(s->change);
>> >  balloon_stats_destroy_timer(s);
>> >  qemu_remove_balloon_handler(s);
>> >  virtio_cleanup(vdev);
>> > diff --git a/include/hw/virtio/virtio-balloon.h 
>> > b/include/hw/virtio/virtio-balloon.h
>> > index 1ea13bd..d72ff7f 100644
>> > --- a/include/hw/virtio/virtio-balloon.h
>> > +++ b/include/hw/virtio/virtio-balloon.h
>> > @@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
>> >  int64_t stats_last_update;
>> >  int64_t stats_poll_interval;
>> >  uint32_t host_features;
>> > +VMChangeStateEntry *change;
>> >  } VirtIOBalloon;
>> >
>> >  #endif
>> > --
>> > 2.7.4
>> >
>>
>> Hi Roman,
>>
>> I talked to Michael Tsirkin and he agrees with merging this patch for
>> 2.7.
>
> I'm not happy with this patch: it tries to solve the problem on the
> "save" side and therefore doesn't fix the bug when migrating from an
> earlier QEMU version.
>
> I wonder if we can do better and solve it on the "load" side.  (At first
> I thought that your patch did that but on a closer look it turned out
> not the case).
>
> In particular, with Stefan's patch to restore VirtQueue->inuse, we
> should be able to just 

Re: [Qemu-devel] [PATCH] Fix memory leak in ide_register_restart_cb()

2016-09-01 Thread Paolo Bonzini
On 01/09/2016 07:31, Ashijeet Acharya wrote:
> I am still waiting for review on this one.

Hi,

QEMU is in hard freeze now so it's normal to have some delay in patch
review.  Maintainers often use this time to work on their own features.

I'm sure John will get to it in short time.

Paolo

> On Tue, Aug 16, 2016 at 10:40 PM, Ashijeet Acharya
>  wrote:
>> Fix a memory leak in ide_register_restart_cb() in hw/ide/core.c and add 
>> idebus_unrealize() in hw/ide/qdev.c to have calls to 
>> qemu_del_vm_change_state_handler() to deal with the dangling change state 
>> handler during hot-unplugging ide devices which might lead to a crash.
>>
>> Signed-off-by: Ashijeet Acharya 
>> ---
>>  hw/ide/core.c |  2 +-
>>  hw/ide/qdev.c | 14 ++
>>  include/hw/ide/internal.h |  1 +
>>  3 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 45b6df1..eecbb47 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -2582,7 +2582,7 @@ static void ide_restart_cb(void *opaque, int running, 
>> RunState state)
>>  void ide_register_restart_cb(IDEBus *bus)
>>  {
>>  if (bus->dma->ops->restart_dma) {
>> -qemu_add_vm_change_state_handler(ide_restart_cb, bus);
>> +bus->vmstate = qemu_add_vm_change_state_handler(ide_restart_cb, 
>> bus);
>>  }
>>  }
>>
>> diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
>> index 67c76bf..6f75f77 100644
>> --- a/hw/ide/qdev.c
>> +++ b/hw/ide/qdev.c
>> @@ -31,6 +31,7 @@
>>  /* - */
>>
>>  static char *idebus_get_fw_dev_path(DeviceState *dev);
>> +static void idebus_unrealize(DeviceState *qdev, Error **errp);
>>
>>  static Property ide_props[] = {
>>  DEFINE_PROP_UINT32("unit", IDEDevice, unit, -1),
>> @@ -345,6 +346,7 @@ static void ide_device_class_init(ObjectClass *klass, 
>> void *data)
>>  k->init = ide_qdev_init;
>>  set_bit(DEVICE_CATEGORY_STORAGE, k->categories);
>>  k->bus_type = TYPE_IDE_BUS;
>> +k->unrealize = idebus_unrealize;
>>  k->props = ide_props;
>>  }
>>
>> @@ -368,3 +370,15 @@ static void ide_register_types(void)
>>  }
>>
>>  type_init(ide_register_types)
>> +
>> +static void idebus_unrealize(DeviceState *qdev, Error **errp)
>> +{
>> +IDEBus *bus = DO_UPCAST(IDEBus, qbus, qdev->parent_bus);
>> +
>> +if (bus->dma->ops->restart_dma) {
>> +if (bus->vmstate) {
>> +qemu_del_vm_change_state_handler(bus->vmstate);
>> +}
>> +}
>> +}
>>
>> diff --git a/include/hw/ide/internal.h b/include/hw/ide/internal.h
>> index 7824bc3..2103261 100644
>> --- a/include/hw/ide/internal.h
>> +++ b/include/hw/ide/internal.h
>> @@ -480,6 +480,7 @@ struct IDEBus {
>>  uint8_t retry_unit;
>>  int64_t retry_sector_num;
>>  uint32_t retry_nsector;
>> +VMChangeStateEntry *vmstate;
>>  };
>>
>>  #define TYPE_IDE_DEVICE "ide-device"
>> --
>> 2.6.2
>>



Re: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-09-01 Thread Paolo Bonzini


On 01/09/2016 17:10, Eduardo Habkost wrote:
> Ouch. It looks like the ordering requirements are messier than I
> thought. vhost-user depends on the memory backends to be already
> initialized.

You could also look at delaying initialization of vhost-user, not
sending anything on the wire until after machine creation.

Paolo



Re: [Qemu-devel] [PATCH RFC v2 05/22] block/pcache: add aio requests into cache

2016-09-01 Thread Kevin Wolf
Am 29.08.2016 um 19:10 hat Pavel Butsykin geschrieben:
> For storing requests an rbtree is used; basic operations on the rbtree
> are added here to work with cache nodes.
> 
> Signed-off-by: Pavel Butsykin 
> ---
>  block/pcache.c | 190 
> -
>  1 file changed, 189 insertions(+), 1 deletion(-)
> 
> diff --git a/block/pcache.c b/block/pcache.c
> index 7f221d6..f5022f9 100644
> --- a/block/pcache.c
> +++ b/block/pcache.c
> @@ -27,6 +27,7 @@
>  #include "block/raw-aio.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qstring.h"
> +#include "qemu/rbtree.h"
>  
>  #define PCACHE_DEBUG
>  
> @@ -37,9 +38,53 @@
>  #define DPRINTF(fmt, ...) do { } while (0)
>  #endif
>  
> +typedef struct RbNodeKey {
> +    uint64_t    num;
> +    uint32_t    size;
> +} RbNodeKey;
> +
> +typedef struct BlockNode {
> +struct RbNode   rb_node;
> +union {
> +RbNodeKey   key;
> +struct {
> +            uint64_t    sector_num;
> +            uint32_t    nb_sectors;
> +};
> +};

What's the deal with this union?

It just adds an alias, and 'sector_num'/'nb_sectors' are actually longer
to type than 'key.num' and 'key.size', so the only advantage that I
could image doesn't really apply.

But it brings problems: We always have to be careful to keep the two
structs in sync, and grepping for the field name will only bring up half
of the users because the other half uses the other alias.
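E.g. just (sketch):

    typedef struct BlockNode {
        struct RbNode   rb_node;
        RbNodeKey       key;    /* use node->key.num / node->key.size everywhere */
        QTAILQ_ENTRY(BlockNode) entry;
    } BlockNode;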

> +QTAILQ_ENTRY(BlockNode) entry;
> +} BlockNode;
> +
> +typedef struct PCNode {
> +BlockNode cm;
> +
> +uint8_t  *data;
> +} PCNode;

What do 'PC' and 'cm' mean?

> +typedef struct ReqStor {
> +struct {
> +struct RbRoot root;
> +CoMutex   lock;
> +} tree;
> +
> +uint32_t curr_size;
> +} ReqStor;

Same question for ReqStor. For an identifier that is used only three
times, it could be a bit more descriptive.

What unit has curr_size or what does it count? The nodes in the tree?
Also, cur_ seems to be more common as a prefix than curr_.

> +typedef struct BDRVPCacheState {
> +BlockDriverState **bs;

This is unused. (Good thing, it looks weird.)

> +ReqStor pcache;
> +
> +struct {
> +QTAILQ_HEAD(pcache_head, BlockNode) head;
> +CoMutex lock;
> +} list;
> +} BDRVPCacheState;
> +
>  typedef struct PrefCacheAIOCB {
>  BlockAIOCB common;
>  
> +BDRVPCacheState *s;

Not really needed, you already have acb->common.bs.

>  QEMUIOVector *qiov;
>  uint64_t sector_num;
>  uint32_t nb_sectors;
> @@ -64,6 +109,124 @@ static QemuOptsList runtime_opts = {
>  },
>  };
>  
> +#define PCNODE(_n) ((PCNode *)(_n))

container_of() would be preferable for type safety.
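e.g. (sketch, assuming the macro is only ever applied to a pointer to the
embedded BlockNode member "cm"):

    #define PCNODE(_n) container_of(_n, PCNode, cm)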

> +static int pcache_key_cmp(const RbNodeKey *key1, const RbNodeKey *key2)
> +{
> +assert(key1 != NULL);
> +assert(key2 != NULL);
> +
> +if (key1->num >= key2->num + key2->size) {
> +return 1;
> +}
> +if (key1->num + key1->size <= key2->num) {
> +return -1;
> +}
> +
> +return 0;
> +}
> +
> +static void *node_insert(struct RbRoot *root, BlockNode *node)
> +{
> +struct RbNode **new = &(root->rb_node), *parent = NULL;
> +
> +/* Figure out where to put new node */
> +while (*new) {
> +BlockNode *this = container_of(*new, BlockNode, rb_node);
> +int result = pcache_key_cmp(&node->key, &this->key);
> +if (result == 0) {
> +return this;
> +}
> +parent = *new;
> +new = result < 0 ? &((*new)->rb_left) : &((*new)->rb_right);
> +}
> +/* Add new node and rebalance tree. */
> +rb_link_node(&node->rb_node, parent, new);
> +rb_insert_color(&node->rb_node, root);
> +
> +return node;
> +}
> +
> +static inline PCNode *pcache_node_insert(struct RbRoot *root, PCNode *node)
> +{
> +return node_insert(root, &node->cm);
> +}
> +
> +static inline void pcache_node_free(PCNode *node)
> +{
> +g_free(node->data);
> +g_slice_free1(sizeof(*node), node);
> +}

We moved away from g_slice_* because it turned out that it hurt more
than it helped.
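For reference, the plain GLib equivalent would be something like (sketch):

    static inline void *pcache_node_alloc(RbNodeKey *key)
    {
        PCNode *node = g_new(PCNode, 1);

        node->cm.key = *key;
        node->data = g_malloc(node->cm.nb_sectors << BDRV_SECTOR_BITS);
        return node;
    }

    static inline void pcache_node_free(PCNode *node)
    {
        g_free(node->data);
        g_free(node);
    }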

> +static inline void *pcache_node_alloc(RbNodeKey* key)
> +{
> +PCNode *node = g_slice_alloc(sizeof(*node));
> +
> +node->cm.sector_num = key->num;
> +node->cm.nb_sectors = key->size;

In other words, node->cm.key = *key;

> +node->data = g_malloc(node->cm.nb_sectors << BDRV_SECTOR_BITS);
> +
> +return node;
> +}
> +
> +static bool pcache_node_find_and_create(PrefCacheAIOCB *acb, RbNodeKey *key,
> +PCNode **out_node)
> +{
> +BDRVPCacheState *s = acb->s;
> +PCNode *new_node = pcache_node_alloc(key);
> +PCNode *found;
> +
> +qemu_co_mutex_lock(&s->pcache.tree.lock);
> +found = pcache_node_insert(&s->pcache.tree.root, new_node);
> +qemu_co_mutex_unlock(&s->pcache.tree.lock);

pcache_node_insert() doesn't yield, so the CoMutex is unnecessary.

> +if 

Re: [Qemu-devel] [PATCH RFC v2 00/22] I/O prefetch cache

2016-09-01 Thread Avi Kivity

On 08/29/2016 08:09 PM, Pavel Butsykin wrote:

The prefetch cache aims to improve the performance of sequential reads.
Of most interest here are small sequential read requests; such requests can be
optimized by extending them and serving them from the prefetch cache. However,
there are 2 issues:
  - In aggregate only a small portion of requests is sequential, so the delays
    caused by reading larger volumes of data can lead to an overall decrease
    in performance.
  - With a large number of random requests, redundant data accumulates in the
    cache memory.
This pcache implementation solves the above and other problems of prefetching
data. The pcache algorithm can be summarised by the following main steps.

1. Monitor I/O requests to identify typical sequences.
This implementation of the prefetch cache works at the storage system level and
has information only about the physical block addresses of I/O requests.
Statistics are collected only from read requests up to a maximum size of 32kb
(by default); each request that matches the criteria falls into a pool of
requests. An rb-tree (lreq.tree) is used to store the request statistics; it is
a simple but, for this purpose, quite efficient data structure.

2. Identifying sequential I/O streams.
For each read request an attempt is made to lift the chain of sequential requests
from lreq.tree in which this request would be an element. The key used to search
for consecutive requests is the area of sectors preceding the current request.
The size of this area should not be too small, to avoid false readahead.
Sequential data streams can be identified even among a large number of random
requests. For example, if there are accesses to blocks 100, 1157, 27520, 4, 101,
312, 1337, 102, then in the context of processing request 102 the chain of
sequential requests 100, 101, 102 will be identified, and a decision can then be
made to do readahead. A situation may also arise where multiple applications
A, B, C simultaneously perform sequential reads. Each application on its own reads
sequentially - A(100, 101, 102), B(300, 301, 302), C(700, 701, 702) - but to the
block device it may look like random reading: 100,300,700,101,301,701,102,302,702.
In this case the sequential streams will also be recognised, because the location
of the requests in the rb-tree makes it possible to separate the sequential I/O
streams.
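
(Illustration of the lookup just described - lreq_tree_lookup() and the look-back
constant are placeholders, not names from the patch:)

    /* If the sectors immediately preceding this request were recently read,
     * the request extends a sequential chain (e.g. 100, 101 -> 102) and
     * readahead should be considered. */
    static bool pcache_is_sequential(BDRVPCacheState *s, int64_t sector_num)
    {
        RbNodeKey key;

        key.size = PCACHE_LOOKBACK_SECTORS;   /* not too small, to avoid false readahead */
        key.num  = sector_num - key.size;

        return lreq_tree_lookup(&s->lreq.tree.root, &key) != NULL;
    }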

3. Do the readahead into the cache for recognized sequential data streams.
Once the detection of the pcache case has been resolved, larger requests are needed
to bring data into the cache. In this implementation pcache uses readahead instead
of extending the request, so the original request goes through as-is. There is no
reason to put data into the cache that will never be picked up, but that would
always happen with extended requests. The areas of cached blocks are likewise
stored in an rb-tree (pcache.tree), a simple but, for this purpose, quite efficient
data structure.

4. Control the size of the prefetch cache pool and the request statistics pool
To control the bounds of the request statistics pool, request data is placed and
replaced according to the FIFO principle - everything is simple.
To control the bounds of the cache memory an LRU list is used, which limits the
maximum amount of memory that can be allocated for pcache. But the LRU is there
mainly to prevent eviction of cache blocks that have been read only partially.
The main path is that memory is pushed out immediately after use: as soon as a
chunk of cache memory has been completely read it is dropped, since the probability
of the request being repeated is very low. Cases where one and the same portion of
the cache memory is read several times are not optimized and do not belong to the
cases that pcache can optimize. Thus, using a small amount of cache memory, by
optimizing the readahead and memory-release operations, we can read entire volumes
of data while providing a 100% cache hit, without decreasing the effectiveness of
random read requests.

PCache is implemented as a qemu block filter driver and has some configurable
parameters, such as: total cache size, readahead size, maximum size of a block
that can be processed.

For performance evaluation, several test cases with different sequential and random
read patterns have been run on an SSD disk. Here are the results of the tests and
the qemu parameters:

qemu parameters:
-M pc-i440fx-2.4 --enable-kvm -smp 4 -m 1024
-drive file=centos7.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,
aio=native,pcache-full-size=4MB,pcache-readahead-size=128KB,
pcache-max-aio-size=32KB
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,
 id=virtio-disk0
(-set device.virtio-disk0.x-data-plane=on)


* Testcase* Results in iops 

Re: [Qemu-devel] Implementation of BusLogic SCSI host adapter (BT-958)

2016-09-01 Thread Paolo Bonzini


On 01/09/2016 15:17, Denis Dmitriev wrote:
> 
> I think that I am missing some initialization or something like that.
> To work with the registers I create a memory region, and associate it
> with the read and write handlers (as an example I used lsi53c895a.c).
> Maybe work with memory in which mailboxes are located must be built
> the same way?

No, it doesn't.

> To me, it looks like I'm trying to get the data to the right place but
> I do not see them because there is something wrong with the mapping.
> The basis of the initialization function I took from lsi53c895a.c.
> Perhaps the root of the evil is in the lines I commented out? If so, how
> can I understand what parameters should be passed to the
> memory_region_init_io and pci_register_bar functions?
> 
> void buslogic_scsi_realize(PCIDevice *dev, Error **errp)
> {
> BuslogicState *s = BUSLOGIC_BT958(dev);
> DeviceState *d = DEVICE(dev);
> uint8_t *pci_conf;
> 
> pci_conf = dev->config;
> 
> /* PCI latency timer = 255 */
> pci_conf[PCI_LATENCY_TIMER] = 0xff;
> /* Interrupt pin A */
> pci_conf[PCI_INTERRUPT_PIN] = 0x01;
> 
> 
> 
> > memory_region_init_io(&s->port_io, OBJECT(s), &buslogic_port_ops, s,
> >   "BusLogic", 0x4);

If the BT958 has a single BAR, that would be fine.  But is the length
(0x4) correct?

Paolo

> > // memory_region_init_io(&s->ram_io, OBJECT(s), &buslogic_ram_ops, s,
> > //   "BusLogic-ram", 0x2000);
> > // memory_region_init_io(&s->io_io, OBJECT(s), &buslogic_io_ops, s,
> > //   "BusLogic-io", 256);
> > 
> > pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->port_io);
> > // pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mmio_io);
> > // pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->ram_io);
> > 
> > QTAILQ_INIT(&s->queue);
> > 
> > scsi_bus_new(&s->bus, sizeof(s->bus), d, &buslogic_scsi_info, NULL);
> > if (!d->hotplugged) {
> > scsi_bus_legacy_handle_cmdline(&s->bus, errp);
> }



Re: [Qemu-devel] [PATCH] aio: Remove spurious smp_read_barrier_depends()

2016-09-01 Thread Paolo Bonzini


On 01/09/2016 16:40, Pranith Kumar wrote:
>> >
>> > bh is shared since it is equal to ctx->first_bh or
>> > ctx->first_bh->...->next.  While the compiler will always order the load
>> > of bh->next after the load of ctx->first_bh and any previous load of
>> > bh->next, this may not be the case for the processor.  Only the DEC
>> > Alpha has this behavior, but it _can_ happen.
> In that case there are a bunch of places in the same file where we
> need to insert this barrier. I also guarantee that there are many more
> files which would need this barrier if we examine closely.

If it's protected by a lock, it's not necessary.

If you don't like smp_read_barrier_depends, you can introduce a synonym
for atomic_rcu_read with another name.  But the functionality is required.
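
For illustration, the walk over the bh list then reads like this (a sketch of the
pattern only, not a proposed patch):

    /* bh list nodes are published concurrently, so every pointer loaded from
     * the shared list needs the dependency ordering that atomic_rcu_read()
     * already includes. */
    for (bh = atomic_rcu_read(&ctx->first_bh); bh; bh = next) {
        next = atomic_rcu_read(&bh->next);
        if (bh->scheduled) {
            /* ... run the bottom half ... */
        }
    }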

Paolo



Re: [Qemu-devel] [PATCH] vl: Delay initialization of memory backends

2016-09-01 Thread Eduardo Habkost
On Wed, Aug 31, 2016 at 02:47:21PM -0700, 
no-re...@ec2-52-6-146-230.compute-1.amazonaws.com wrote:
[...]
> GTESTER check-qtest-x86_64
> qemu-system-x86_64: Failed initializing vhost-user memory map, consider using 
> -object memory-backend-file share=on
> qemu-system-x86_64: vhost_set_mem_table failed: Success (0)
[...]
> **
> ERROR:/tmp/qemu-test/src/tests/vhost-user-test.c:149:wait_for_fds: assertion 
> failed: (s->fds_num)

Ouch. It looks like the ordering requirements are messier than I
thought. vhost-user depends on the memory backends to be already
initialized.

We can't use early initialization because prealloc delays chardev
init too much. We can't delay initialization because it is done
after netdevs.

We _really_ need to change this to simply use the ordering used
on the command-line/config instead of hardcoding messy ordering
requirements, but I wouldn't like to wait for a QemuOpts
refactoring to fix the bug. I will take a look at the memory
regions initialization path, and try to trigger the
memory-backend prealloc code there.

-- 
Eduardo



Re: [Qemu-devel] [PATCH] aio: Remove spurious smp_read_barrier_depends()

2016-09-01 Thread Pranith Kumar
On Thu, Sep 1, 2016 at 7:01 AM, Paolo Bonzini  wrote:
>
>
> On 01/09/2016 00:29, Pranith Kumar wrote:
>> smp_read_barrier_depends() should be used only if you are reading
>> dependent pointers which are shared.
>
> bh is shared since it is equal to ctx->first_bh or
> ctx->first_bh->...->next.  While the compiler will always order the load
> of bh->next after the load of ctx->first_bh and any previous load of
> bh->next, this may not be the case for the processor.  Only the DEC
> Alpha has this behavior, but it _can_ happen.

In that case there are a bunch of places in the same file where we
need to insert this barrier. I also guarantee that there are many more
files which would need this barrier if we examine closely.

I really doubt if we see this kind of re-ordering. Did we ever get any
bug reports about running qemu on an SMP Alpha?

smp_read_barrier_depends() is evil and I would prefer we get rid of it.

Thanks,
-- 
Pranith



Re: [Qemu-devel] [PATCH RFC v2 03/22] util/rbtree: add rbtree from linux kernel

2016-09-01 Thread Kevin Wolf
Am 29.08.2016 um 19:10 hat Pavel Butsykin geschrieben:
> Why don't we use the rbtree from glib? We need a pointer to the parent node.
> For an optimal implementation of storing cached chunks in the rbtree we need
> to get the next and previous nodes, and the content of the parent node
> is very useful for an effective implementation of these functions. In this
> implementation of the rbtree (unlike glib's rbtree) the node contains a pointer
> to its parent node.  Moreover, this rbtree allows more flexibility when working
> with an algorithm, because to use these rbtrees you have to implement
> your own insert and search cores. This avoids callbacks and keeps us from
> dropping performance dramatically.
> 
> Signed-off-by: Pavel Butsykin 

General note (only having looked at the diffstat): We need to make sure
that new files that this series adds are covered by MAINTAINERS entries.

Kevin



Re: [Qemu-devel] [PATCH RFC v2 01/22] block/pcache: empty pcache driver filter

2016-09-01 Thread Kevin Wolf
Am 29.08.2016 um 19:10 hat Pavel Butsykin geschrieben:
> The basic version of pcache driver for easy preparation of a patch set.
> 
> Signed-off-by: Pavel Butsykin 

> +.bdrv_aio_readv = pcache_aio_readv,
> +.bdrv_aio_writev= pcache_aio_writev,

Can you please use .bdrv_co_preadv/pwritev instead and make everything
based on bytes rather than sectors?

Internally you can still spawn AIO requests to achieve the same
parallelism as you have now (we'll just need new byte-based
bdrv_co_aio_prw_vector() wrappers, but the functionality is there) and I
don't think making the outer layer coroutine based would be too hard. In
fact it might even simplify some code.
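
For reference, the byte-based entry points would be along these lines (signatures
as in the current block layer; the pass-through bodies are only a sketch of the
minimal conversion, not the actual series):

    static int coroutine_fn pcache_co_preadv(BlockDriverState *bs, uint64_t offset,
                                             uint64_t bytes, QEMUIOVector *qiov,
                                             int flags)
    {
        /* cache lookup / readahead decisions would go here, all in bytes */
        return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
    }

    static int coroutine_fn pcache_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                              uint64_t bytes, QEMUIOVector *qiov,
                                              int flags)
    {
        return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
    }

with .bdrv_co_preadv/.bdrv_co_pwritev wired into the BlockDriver instead of the
AIO callbacks.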

Kevin



Re: [Qemu-devel] [PATCH] tests: Fix broken tcg test compilation

2016-09-01 Thread Eric Blake
On 09/01/2016 04:02 AM, Michal Privoznik wrote:
> The first build error to be seen is that linux-test.c fails to
> include cutils.h:
> 
> linux-test.c:42:25: fatal error: qemu/cutils.h: No such file or directory
> 
> This is because toplevel include/ dir is not put onto compiler's
> command line. After that:
> 
> qemu.git/include/qemu/cutils.h:171:1: error: unknown type name ‘bool’
> 
> So we need to include "qemu/osdep.h" which will define bool type
> for us. However, osdep.h eventually includes glib.h from system,
> therefore we need to put GLIB_CFLAGS onto compiler's command line
> too.
> 
> Lastly, getrusage is used in linux-test.c. This function and a
> struct it uses are defined in sys/resource.h:
> 
> linux-test.c:247:5: warning: implicit declaration of function ‘getrusage’
> 
> Signed-off-by: Michal Privoznik 
> ---
> 

> +++ b/tests/tcg/linux-test.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include <sys/resource.h>
>  #include 
>  #include 
>  #include 
> @@ -39,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include "qemu/osdep.h"

osdep.h has to be included first, before any system headers (in case it
sets defines that modify the behavior of those system headers).
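
I.e. the top of linux-test.c would need to look roughly like this (a sketch; the
rest of the existing includes stay unchanged):

    #include "qemu/osdep.h"      /* must be the very first include */
    #include <sys/resource.h>    /* getrusage() */
    /* ...the existing system headers follow... */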

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH RFC v2 00/22] I/O prefetch cache

2016-09-01 Thread Kevin Wolf
Am 29.08.2016 um 19:09 hat Pavel Butsykin geschrieben:
> The prefetch cache aims to improve the performance of sequential reads.
> Of most interest here are small sequential read requests; such requests can be
> optimized by extending them and serving them from the prefetch cache.
> [...]

Before I start actually looking into your code, I read both this cover
letter and your KVM Forum slides, and as far as I can say, the
fundamental idea and your design look sound to me. It was a good read,
too, so thanks for writing all the explanations!

One thing that came to mind is that we have more caches elsewhere, most
notably the qcow2 metadata cache, and I still have that private branch
that adds a qcow2 data cache, too (for merging small allocating writes,
if you remember my talk from last year). However, the existing
Qcow2Cache has a few problems like being tied to the cluster size.

So I wondered how hard you think it would be to split pcache into a
reusable cache core that just manages the contents based on calls like
"allocate/drop/get cached memory for bytes x...y", and the actual
pcache code that implements the read-ahead policy. Then qcow2 could
reuse the core and use its own policy about what metadata to cache etc.
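
Just to make that idea concrete, the reusable core's interface could be as small
as something like the following (names invented here, not an existing API):

    typedef struct BlockCache BlockCache;

    BlockCache *block_cache_new(uint64_t max_size);
    /* reserve cache memory covering bytes [offset, offset + bytes) */
    void *block_cache_alloc(BlockCache *c, uint64_t offset, uint64_t bytes);
    /* return the cached buffer for the range, or NULL on a miss */
    void *block_cache_get(BlockCache *c, uint64_t offset, uint64_t bytes);
    /* drop the range, e.g. once it has been fully consumed or written back */
    void block_cache_drop(BlockCache *c, uint64_t offset, uint64_t bytes);

with the read-ahead policy (pcache) and the metadata policy (qcow2) layered on top.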

Of course, this can be done incrementally on top and should by no means
block the inclusion of your code, but if it's possible, it might be an
interesting thought to keep in mind.

Kevin



Re: [Qemu-devel] proposed release timetable for 2.8

2016-09-01 Thread Stefan Hajnoczi
On Thu, Sep 01, 2016 at 12:18:10PM +0100, Peter Maydell wrote:
> I know 2.7 isn't quite out the door yet, but I figured we should
> kick off the discussion of 2.8's schedule. At the QEMU Summit there
> was some discussion on how we're doing with releases, and I think
> the consensus view was that we should try to cut down the softfreeze
> period and also be stricter about (a) making sure pull requests get
> in in a timely way before rc0 and (b) we don't take new features
> during softfreeze.
> 
> (I'm not entirely sure I have those right, and in any case they're
> not pre-decided conclusions, so corrections and further opinion
> welcome.)
> 
> As a strawman, here's a timetable which results in a final
> release in December at the usual sort of time (ie allowing for
> the usual slippage without it hitting the holiday season):
> 
> 
> 2016-10-25 softfreeze, if you think we need 3 weeks, or:
> 2016-11-01 if you think we can do a 2 week softfreeze
> 2016-11-08 deadline for getting pull requests on list before hardfreeze?
> 2016-11-15 rc0 (start of hardfreeze)
> 2016-11-22 rc1
> 2016-11-29 rc2
> 2016-12-06 rc3
> 2016-12-13 final v2.8.0

I suggest we do the schedule above with a firm hardfreeze deadline where
no more feature pull requests are allowed.  This means a 2 week
softfreeze and time before -rc0 for the maintainer to merge and test
pull requests:

2016-10-25 softfreeze
2016-11-08 hardfreeze
2016-11-15 rc0
2016-11-22 rc1
2016-11-29 rc2
2016-12-06 rc3
2016-12-13 final v2.8.0


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-01 Thread Marcel Apfelbaum

On 09/01/2016 04:27 PM, Peter Maydell wrote:

On 1 September 2016 at 14:22, Marcel Apfelbaum  wrote:

Proposes best practices on how to use PCIe/PCI device
in PCIe based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

Please add your comments on what to add/remove/edit to make this doc usable.




Hi Peter,


As somebody who doesn't really understand the problem space, my
thoughts:

(1) is this intended as advice for developers writing machine
models and adding pci controllers to them, or is it intended as
advice for users (and libvirt-style management layers) about
how to configure QEMU?



It is intended for management layers, as they have no way to
understand how to "consume" the Q35 machine,
but also for firmware developers (OVMF/SeaBIOS), to help them
understand the usage model so they can optimize IO/MEM
resource allocation for both boot time and hot-plug.

QEMU users/developers can also benefit from it, as the PCIe arch
is more complex, supporting both PCI/PCIe devices and
several PCI/PCIe controllers with no clear rules on what goes where.


(2) it seems to be a bit short on concrete advice (either
"you should do this" instructions to machine model developers,
or "use command lines like this" instructions to end-users.



Thanks for the point. I'll be sure to add detailed command line examples
to the next version.
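
(Roughly along these lines - a PCIe device behind a root port, and a legacy PCI
device behind a DMI-PCI bridge plus PCI-PCI bridge; treat the exact option list
as a draft to be double-checked for the doc:)

    -M q35 \
    -device ioh3420,id=root_port1,chassis=1,slot=1,bus=pcie.0 \
    -device virtio-net-pci,bus=root_port1 \
    -device i82801b11-bridge,id=dmi_pci_bridge,bus=pcie.0 \
    -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge,chassis_nr=2 \
    -device e1000,bus=pci_bridge1,addr=0x1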

Thanks,
Marcel


thanks
-- PMM






Re: [Qemu-devel] [PULL 3/3] vhost-user: Attempt to fix a race with set_mem_table.

2016-09-01 Thread Michael S. Tsirkin
On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
> 
> 
> On 08/14/2016 11:42 AM, Prerna Saxena wrote:
> > On 14/08/16 8:21 am, "Michael S. Tsirkin"  wrote:
> > 
> > 
> > > On Fri, Aug 12, 2016 at 07:16:34AM +, Prerna Saxena wrote:
> > > > 
> > > > On 12/08/16 12:08 pm, "Fam Zheng"  wrote:
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > > On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> > > > > > From: Prerna Saxena 
> > > > > > 
> > > > > > The set_mem_table command currently does not seek a reply. Hence, 
> > > > > > there is
> > > > > > no easy way for a remote application to notify to QEMU when it 
> > > > > > finished
> > > > > > setting up memory, or if there were errors doing so.
> > > > > > 
> > > > > > As an example:
> > > > > > (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> > > > > > application). SET_MEM_TABLE does not require a reply according to 
> > > > > > the spec.
> > > > > > (2) Qemu commits the memory to the guest.
> > > > > > (3) Guest issues an I/O operation over a new memory region which 
> > > > > > was configured on (1).
> > > > > > (4) The application has not yet remapped the memory, but it sees 
> > > > > > the I/O request.
> > > > > > (5) The application cannot satisfy the request because it does not 
> > > > > > know about those GPAs.
> > > > > > 
> > > > > > While a guaranteed fix would require a protocol extension 
> > > > > > (committed separately),
> > > > > > a best-effort workaround for existing applications is to send a 
> > > > > > GET_FEATURES
> > > > > > message before completing the vhost_user_set_mem_table() call.
> > > > > > Since GET_FEATURES requires a reply, an application that processes 
> > > > > > vhost-user
> > > > > > messages synchronously would probably have completed the 
> > > > > > SET_MEM_TABLE before replying.
> > > > > > 
> > > > > > Signed-off-by: Prerna Saxena 
> > > > > > Reviewed-by: Michael S. Tsirkin 
> > > > > > Signed-off-by: Michael S. Tsirkin 
> > > > > 
> > > > > Sporadic hangs are seen with test-vhost-user after this patch:
> > > > > 
> > > > > https://travis-ci.org/qemu/qemu/builds
> > > > > 
> > > > > Reverting seems to fix it for me.
> > > > > 
> > > > > Is this a known problem?
> > > > > 
> > > > > Fam
> > > > 
> > > > Hi Fam,
> > > > Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass 
> > > > on my Centos 6 environment, so missed this.
> > > > I am setting up the docker test env to repro this, but I think I can 
> > > > guess the problem :
> > > > 
> > > > In tests/vhost-user-test.c:
> > > > 
> > > > static void chr_read(void *opaque, const uint8_t *buf, int size)
> > > > {
> > > > ..[snip]..
> > > > 
> > > > case VHOST_USER_SET_MEM_TABLE:
> > > >/* received the mem table */
> > > >memcpy(&s->memory, &msg.payload.memory, 
> > > > sizeof(msg.payload.memory));
> > > >s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, 
> > > > G_N_ELEMENTS(s->fds));
> > > > 
> > > > 
> > > >/* signal the test that it can continue */
> > > >g_cond_signal(&s->data_cond);
> > > >break;
> > > > ..[snip]..
> > > > }
> > > > 
> > > > 
> > > > The test seems to be marked complete as soon as mem_table is copied.
> > > > However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE 
> > > > vhost command implementation with qemu. SET_MEM_TABLE now sends out a 
> > > > new message GET_FEATURES, and the call is only completed once it 
> > > > receives features from the remote application. (or the test framework, 
> > > > as is the case here.)
> > > 
> > > Hmm but why does it matter that data_cond is woken up?
> > 
> > Michael, sorry, I didn’t quite understand that. Could you pls explain ?
> > 
> > > 
> > > 
> > > > While the test itself can be modified (Do not signal completion until 
> > > > we’ve sent a follow-up response to GET_FEATURES), I am now wondering if 
> > > > this patch may break existing vhost applications too ? If so, reverting 
> > > > it possibly better.
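
(In code, the suggested test-side change would amount to something like this in
chr_read() - a sketch, not a tested patch: keep copying the mem table, but only
raise data_cond once the follow-up GET_FEATURES has been answered:)

    case VHOST_USER_SET_MEM_TABLE:
        /* received the mem table */
        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
        /* do not signal data_cond here any more */
        break;

    case VHOST_USER_GET_FEATURES:
        /* ... send the features reply first ... */
        /* QEMU's vhost_user_set_mem_table() has now completed */
        g_cond_signal(&s->data_cond);
        break;
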
> > > 
> > > What bothers me is that the new feature might cause the same
> > > issue once we enable it in the test.
> > 
> > No it wont. The new feature is a protocol extension, and only works if it 
> > has been negotiated with. If not negotiated, that part of code is never 
> > executed.
> > 
> > > 
> > > How about a patch to tests/vhost-user-test.c adding the new
> > > protocol feature? I would be quite interested to see what
> > > is going on with it.
> > 
> > Yes that can be done. But you can see that the protocol extension patch 
> > will not change the behaviour of the _existing_ test.
> > 
> > > 
> > > 
> > > > What confuses me is why it doesn’t fail all the time, but only about 
> > > > 20% to 30% time as Fam reports.
> > > 
> > > And succeeds every time on my systems :(
> > 
> > +1 to that :( I have had no luck repro’ing it.
> > 
> > > 
> > > > 
> > > > Thoughts : 

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-01 Thread Alexander Graf

On 08/30/2016 02:12 PM, Gonglei wrote:

The virtio crypto device is a virtual crypto device (ie. hardware
crypto accelerator card). The virtio crypto device can provide
five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.

In this patch, CIPHER, MAC, HASH, AEAD services are introduced.


I have mostly a few high level comments.

For starters, a lot of the structs rely on the compiler to pad them to 
natural alignment. That may get us into trouble when trying to emulate a 
virtio device on a host with different guest architecture (like arm on 
x86). It'd be a good idea to manually pad everything to be 64bit aligned 
- then all fields are always at the same spot.
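
As an illustration of the manual padding (field names invented here, not taken
from the spec draft):

    struct virtio_crypto_cipher_session_para {
        le32 algo;
        le32 keylen;
        le32 op;
        le32 padding;   /* explicit pad: keeps the struct a multiple of 64 bits so
                           every following field lands at the same offset on all
                           architectures, instead of relying on compiler padding */
    };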


I also have a hard time getting my head around the final flow of 
everything. Do I always have to create a session to be able to emit a 
command? In that case, doesn't that slow down everything, since a 
request would then need to wait for the host reply to receive its 
session id? There should be some way to fire off a simple non-iv 
operation without any session set up imho.


Also, I don't fully understand the split between control and data 
queues. As far as I read things, the control queue is used to create 
session ids and the data queues can then be used to push data. Is there 
any particular reason for the split? One thing that seems natural to me 
would be to have sessions be per-queue, so you would create a session on 
a particular queue and only have it ever be available there. That way 
you get rid of any locking for sessions.



Alex




Re: [Qemu-devel] [PATCH v2] target-i386: present virtual L3 cache info for vcpus

2016-09-01 Thread Michael S. Tsirkin
On Thu, Sep 01, 2016 at 02:58:05PM +0800, l00371263 wrote:
> From: "Longpeng(Mike)" 
> 
> Some software algorithms are based on the hardware's cache info; for example,
> in the x86 linux kernel, when cpu1 wants to wake up a task on cpu2, cpu1 will
> trigger a resched IPI and tell cpu2 to do the wakeup if they don't share a
> low-level cache. Conversely, cpu1 will access cpu2's runqueue directly if they
> share the llc.
> The relevant linux-kernel code is below:
> 
>   static void ttwu_queue(struct task_struct *p, int cpu)
>   {
>   struct rq *rq = cpu_rq(cpu);
>   ..
>   if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
>   ..
>   ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
>   return;
>   }
>   ..
>   ttwu_do_activate(rq, p, 0); /* access target's rq directly */
>   ..
>   }
> 
> In real hardware, the cpus on the same socket share the L3 cache, so one won't
> trigger resched IPIs when waking up a task on another one. But QEMU doesn't present
> virtual L3 cache info for the VM, so the linux guest will trigger lots of RES
> IPIs under some workloads even if the virtual cpus belong to the same virtual
> socket.
> 
> For KVM, this degrades performance, because there will be lots of vmexits due
> to the guest sending IPIs.
> 
> The workload is SAP HANA's testsuite; we run it for one round (about 40 minutes)
> and observe the (Suse11sp3) guest's number of RES IPIs triggered during
> the period:
> 
> No-L3   With-L3(applied this patch)
> cpu0: 363890  44582
> cpu1: 373405  43109
> cpu2: 340783  43797
> cpu3: 333854  43409
> cpu4: 327170  40038
> cpu5: 325491  39922
> cpu6: 319129  42391
> cpu7: 306480  41035
> cpu8: 161139  32188
> cpu9: 164649  31024
> cpu10:149823  30398
> cpu11:149823  32455
> cpu12:164830  35143
> cpu13:172269  35805
> cpu14:179979  33898
> cpu15:194505  32754
> avg:  268963.640129.8
> 
> The VM's topology is "1*socket 8*cores 2*threads".
> After presenting virtual L3 cache info to the VM, the number of RES IPIs in the
> guest is reduced by 85%.
> 
> We also tested the overall system performance when the vcpus actually run on
> separate physical sockets. With the L3 cache, performance improves by 7.2%~33.1%
> (avg: 15.7%).

Any idea why?  I'm guessing that on bare metal, it is
sometimes cheaper to send IPIs with a separate cache, but on KVM,
it is always cheaper to use memory, as this reduces the # of exits.
Is this it?

It's worth listing here so that e.g. if it ever becomes possible to send
IPIs without exits, we know we need to change this code.

> Signed-off-by: Longpeng(Mike) 
> ---
> Hi Eduardo, 
> 
> Changes since v1:
>   - fix the compat problem: set compat_props on PC_COMPAT_2_7.
>   - fix an "intentionally introduced bug": make intel's and amd's consistent.
>   - fix the CPUID.(EAX=4, ECX=3):EAX[25:14].
>   - test the performance when vcpus run on separate sockets: with L3 cache,
> the performance improves 7.2%~33.1%(avg: 15.7%).
> ---
>  include/hw/i386/pc.h |  8 
>  target-i386/cpu.c| 49 -
>  target-i386/cpu.h|  3 +++
>  3 files changed, 55 insertions(+), 5 deletions(-)
> 
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 74c175c..6072625 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -367,7 +367,15 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
>  int e820_get_num_entries(void);
>  bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>  
> +#define PC_COMPAT_2_7 \
> +{\
> +.driver   = TYPE_X86_CPU,\
> +.property = "compat-cache",\
> +.value= "on",\
> +},
> +
>  #define PC_COMPAT_2_6 \
> +PC_COMPAT_2_7 \
>  HW_COMPAT_2_6 \
>  {\
>  .driver   = "fw_cfg_io",\


Could this get a more informative name?
E.g. l3-cache-shared ?
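
I.e., if the property is named after the new behaviour, the compat entry would
plausibly become disabling it for older machine types rather than turning a
"compat" knob on (illustration only, not the posted patch):

    #define PC_COMPAT_2_7 \
    {\
        .driver   = TYPE_X86_CPU,\
        .property = "l3-cache-shared",\
        .value    = "off",\
    },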

> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 6a1afab..224d967 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -57,6 +57,7 @@
>  #define CPUID_2_L1D_32KB_8WAY_64B 0x2c
>  #define CPUID_2_L1I_32KB_8WAY_64B 0x30
>  #define CPUID_2_L2_2MB_8WAY_64B   0x7d
> +#define CPUID_2_L3_16MB_16WAY_64B 0x4d
>  
>  
>  /* CPUID Leaf 4 constants: */
> @@ -131,11 +132,18 @@
>  #define L2_LINES_PER_TAG   1
>  #define L2_SIZE_KB_AMD   512
>  
> -/* No L3 cache: */
> +/* Level 3 unified cache: */
>  #define L3_SIZE_KB 0 /* disabled */
>  #define L3_ASSOCIATIVITY   0 /* disabled */
>  #define L3_LINES_PER_TAG   0 /* disabled */
>  #define L3_LINE_SIZE   0 /* disabled */
> +#define L3_N_LINE_SIZE 64
> +#define L3_N_ASSOCIATIVITY 16
> 

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-01 Thread Peter Maydell
On 1 September 2016 at 14:22, Marcel Apfelbaum  wrote:
> Proposes best practices on how to use PCIe/PCI device
> in PCIe based machines and explain the reasoning behind them.
>
> Signed-off-by: Marcel Apfelbaum 
> ---
>
> Hi,
>
> Please add your comments on what to add/remove/edit to make this doc usable.

As somebody who doesn't really understand the problem space, my
thoughts:

(1) is this intended as advice for developers writing machine
models and adding pci controllers to them, or is it intended as
advice for users (and libvirt-style management layers) about
how to configure QEMU?

(2) it seems to be a bit short on concrete advice (either
"you should do this" instructions to machine model developers,
or "use command lines like this" instructions to end-users.

thanks
-- PMM



[Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-01 Thread Marcel Apfelbaum
Proposes best practices on how to use PCIe/PCI device
in PCIe based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

Please add your comments on what to add/remove/edit to make this doc usable.

Thanks,
Marcel

 docs/pcie.txt | 145 ++
 1 file changed, 145 insertions(+)
 create mode 100644 docs/pcie.txt

diff --git a/docs/pcie.txt b/docs/pcie.txt
new file mode 100644
index 000..52a8830
--- /dev/null
+++ b/docs/pcie.txt
@@ -0,0 +1,145 @@
+PCI EXPRESS GUIDELINES
+==
+
+1. Introduction
+
+The doc proposes best practices on how to use PCIe/PCI device
+in PCIe based machines and explains the reasoning behind them.
+
+
+2. Device placement strategy
+
+QEMU does not have a clear socket-device matching mechanism
+and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
+Plugging a PCI device into a PCIe device might not always work and
+is weird anyway since it cannot be done for "bare metal".
+Plugging a PCIe device into a PCI slot will hide the Extended
+Configuration Space thus is also not recommended.
+
+The recommendation is to separate the PCIe and PCI hierarchies.
+PCIe devices should be plugged only into PCIe Root Ports and
+PCIe Downstream ports (let's call them PCIe ports).
+
+2.1 Root Bus (pcie.0)
+=
+Plug only legacy PCI devices as Root Complex Integrated Devices
+even if the PCIe spec does not forbid PCIe devices. The existing
+hardware uses mostly PCI devices as Integrated Endpoints. In this
+way we may avoid some strange guest OS behaviour.
+Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
+or DMI-PCI bridges to start legacy PCI hierarchies.
+
+
+   pcie.0 bus
+   --
+|||   |
+   ---   --   --  --
+   | PCI Dev |   | PCIe Root Port |   |  Upstream Port |  | DMI-PCI bridge |
+   ---   --   --  --
+
+2.2 PCIe only hierarchy
+===
+Always use PCIe Root ports to start a PCIe hierarchy. Use PCIe switches 
(Upstream
+Ports + several Downstream Ports) if out of PCIe Root Ports slots. PCIe 
switches
+can be nested until a depth of 6-7. Plug only PCIe devices into PCIe Ports.
+
+
+   pcie.0 bus
+   
+||   |
+   -   -   -
+   | Root Port |   | Root Port |   | Root Port |
+      --   -
+ |   |
+ -
+| PCIe Dev | | Upstream Port |
+ -
+  ||
+ ------
+ | Downstream Port || Downstream Port |
+ ------
+ |
+ 
+ | PCIe Dev |
+ 
+
+2.3 PCI only hierarchy
+==
+Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
+into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
+and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
+only into pcie.0 bus.
+
+   pcie.0 bus
+   --
+||
+   ---   --
+   | PCI Dev |   | DMI-PCI BRIDGE |
+   ----
+   ||
+-----
+| PCI Dev || PCI-PCI Bridge |
+-----
+ |   |
+  --- ---
+  | PCI Dev | | PCI Dev |
+  --- ---
+
+
+
+3. IO space issues
+===
+PCIe Ports are seen by Firmware/Guest OS as PCI bridges and
+as required by PCI spec will reserve a 4K IO range for each.
+The firmware used by QEMU (SeaBIOS/OVMF) will further optimize
+it by allocation the IO space only if there is at least a device
+with IO BARs plugged into the bridge.
+Behind a PCIe PORT only one device may be plugged, resulting in
+the allocation of a whole 4K range for each device.
+The IO space is limited resulting in ~10 PCIe ports per system
+if devices with IO BARs are plugged into IO ports.
+
+Using the proposed device 

Re: [Qemu-devel] Implementation of BusLogic SCSI host adapter (BT-958)

2016-09-01 Thread Denis Dmitriev
2016-09-01 14:11 GMT+03:00 Paolo Bonzini :
>
>
>
> On 31/08/2016 15:48, Денис Дмитриев wrote:
> > uint64_t buslogicReadOutgoingMailbox(BuslogicState *s, BUSLOGICTASKSTATE
> > *TaskState)
> > {
> > uint64_tGCMailbox;
> > Mailbox24   Mbx24;
> > Mbx24.uCmdState = 0;
> > PCIDevice *pci_dev = PCI_DEVICE(s);
> > if (s->fMbxIs24Bit)
> > {
> > //try to calculate mailbox address
> > GCMailbox = s->GCPhysAddrMailboxOutgoingBase +
> > (s->uMailboxOutgoingPositionCurrent * sizeof(Mailbox24));
> > //try to read mailbox
> > pci_dma_read(pci_dev, GCMailbox, &Mbx24, sizeof(Mailbox24));
> > //after that i  have empty buffer
> > TaskState->MailboxGuest.u32PhysAddrCCB=
> > ADDR_TO_U32(Mbx24.aPhysAddrCCB);
> > TaskState->MailboxGuest.u.out.uActionCode = Mbx24.uCmdState;
> > }
> > else
> > {
> > GCMailbox = s->GCPhysAddrMailboxOutgoingBase +
> > (s->uMailboxOutgoingPositionCurrent * sizeof(Mailbox32));
> > pci_dma_read(pci_dev, GCMailbox, &TaskState->MailboxGuest,
> > sizeof(Mailbox32));
> > }
> > return GCMailbox;
> > }
>
> This seems okay, so I am afraid you'll have to debug it. :(


I think that I am missing some initialization or something like that.
To work with the registers I create a memory region, and associate it
with the read and write handlers (as an example I used lsi53c895a.c).
Maybe work with memory in which mailboxes are located must be built
the same way?
To me, it looks like I'm trying to get the data to the right place but
I do not see them because there is something wrong with the mapping.
The basis of the initialization function I took from lsi53c895a.c.
Perhaps the root of the evil is in the lines I commented out? If so, how
can I understand what parameters should be passed to the
memory_region_init_io and pci_register_bar functions?

void buslogic_scsi_realize(PCIDevice *dev, Error **errp)
{
BuslogicState *s = BUSLOGIC_BT958(dev);
DeviceState *d = DEVICE(dev);
uint8_t *pci_conf;

pci_conf = dev->config;

/* PCI latency timer = 255 */
pci_conf[PCI_LATENCY_TIMER] = 0xff;
/* Interrupt pin A */
pci_conf[PCI_INTERRUPT_PIN] = 0x01;



memory_region_init_io(&s->port_io, OBJECT(s), &buslogic_port_ops, s,
  "BusLogic", 0x4);
// memory_region_init_io(&s->ram_io, OBJECT(s), &buslogic_ram_ops, s,
//   "BusLogic-ram", 0x2000);
// memory_region_init_io(&s->io_io, OBJECT(s), &buslogic_io_ops, s,
//   "BusLogic-io", 256);

pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->port_io);
// pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mmio_io);
// pci_register_bar(dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->ram_io);

QTAILQ_INIT(&s->queue);

scsi_bus_new(&s->bus, sizeof(s->bus), d, &buslogic_scsi_info, NULL);
if (!d->hotplugged) {
scsi_bus_legacy_handle_cmdline(&s->bus, errp);
}
}

-- 
Sincerely, Denis Dmitriev.



Re: [Qemu-devel] [PATCH] spapr: implement H_CHANGE_LOGICAL_LAN_MAC h_call

2016-09-01 Thread Thomas Huth
On 01.09.2016 10:10, Laurent Vivier wrote:
> Since kernel v4.0, linux uses H_CHANGE_LOGICAL_LAN_MAC to change lively
> the MAC address of an ibmveth interface.
> 
> As QEMU doesn't implement this h_call, we can't change anymore the
> MAC address of an spapr-vlan interface.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/net/spapr_llan.c | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
> index b273eda..4bb95a5 100644
> --- a/hw/net/spapr_llan.c
> +++ b/hw/net/spapr_llan.c
> @@ -106,6 +106,7 @@ typedef struct VIOsPAPRVLANDevice {
>  VIOsPAPRDevice sdev;
>  NICConf nicconf;
>  NICState *nic;
> +MACAddr perm_mac;
>  bool isopen;
>  hwaddr buf_list;
>  uint32_t add_buf_ptr, use_buf_ptr, rx_bufs;
> @@ -316,6 +317,10 @@ static void spapr_vlan_reset(VIOsPAPRDevice *sdev)
>  spapr_vlan_reset_rx_pool(dev->rx_pool[i]);
>  }
>  }
> +
> +memcpy(&dev->nicconf.macaddr.a, &dev->perm_mac.a,
> +   sizeof(dev->nicconf.macaddr.a));
> +qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
> dev->nicconf.macaddr.a);
>  }
>  
>  static void spapr_vlan_realize(VIOsPAPRDevice *sdev, Error **errp)
> @@ -324,6 +329,8 @@ static void spapr_vlan_realize(VIOsPAPRDevice *sdev, 
> Error **errp)
>  
> +qemu_macaddr_default_if_unset(&dev->nicconf.macaddr);
>  
> +memcpy(&dev->perm_mac.a, &dev->nicconf.macaddr.a, 
> sizeof(dev->perm_mac.a));
> +
> +dev->nic = qemu_new_nic(&net_spapr_vlan_info, &dev->nicconf,
>  object_get_typename(OBJECT(sdev)), 
> sdev->qdev.id, dev);
>  qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
> dev->nicconf.macaddr.a);
> @@ -756,6 +763,27 @@ static target_ulong h_multicast_ctrl(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  return H_SUCCESS;
>  }
>  
> +static target_ulong h_change_logical_lan_mac(PowerPCCPU *cpu,
> + sPAPRMachineState *spapr,
> + target_ulong opcode,
> + target_ulong *args)
> +{
> +target_ulong reg = args[0];
> +target_ulong macaddr = args[1];
> +VIOsPAPRDevice *sdev = spapr_vio_find_by_reg(spapr->vio_bus, reg);
> +VIOsPAPRVLANDevice *dev = VIO_SPAPR_VLAN_DEVICE(sdev);
> +int i;

As an additional sanity check, you could test whether the MAC address is
a proper unicast address, i.e. that the broadcast bit is not set.
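
(Something along these lines before the copy loop - a sketch, not tested; bit 0
of the first octet, i.e. bit 40 of the 48-bit value passed in args[1], is the
multicast/broadcast indicator:)

    if (macaddr & (1ULL << 40)) {
        return H_PARAMETER;    /* not a unicast address */
    }
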

> +for (i = 0; i < ETH_ALEN; i++) {
> +dev->nicconf.macaddr.a[ETH_ALEN - i - 1] = macaddr & 0xff;
> +macaddr >>= 8;
> +}
> +
> +qemu_format_nic_info_str(qemu_get_queue(dev->nic), 
> dev->nicconf.macaddr.a);
> +
> +return H_SUCCESS;
> +}
> +
>  static Property spapr_vlan_properties[] = {
>  DEFINE_SPAPR_PROPERTIES(VIOsPAPRVLANDevice, sdev),
>  DEFINE_NIC_PROPERTIES(VIOsPAPRVLANDevice, nicconf),
> @@ -854,6 +882,8 @@ static void spapr_vlan_register_types(void)
>  spapr_register_hypercall(H_ADD_LOGICAL_LAN_BUFFER,
>   h_add_logical_lan_buffer);
>  spapr_register_hypercall(H_MULTICAST_CTRL, h_multicast_ctrl);
> +spapr_register_hypercall(H_CHANGE_LOGICAL_LAN_MAC,
> + h_change_logical_lan_mac);
>  type_register_static(_vlan_info);
>  }

Patch looks basically fine to me. One more thought though:
What about migration? Don't we need to migrate the perm_mac array, too,
or is this already covered by the dev->nicconf.macaddr array?

 Thomas




Re: [Qemu-devel] [virtio-comment] Re: [PATCH] *** Vhost-pci RFC v2 ***

2016-09-01 Thread Marc-André Lureau
Hi

On Thu, Sep 1, 2016 at 4:13 PM Wei Wang  wrote:

> On 09/01/2016 04:49 PM, Marc-André Lureau wrote:
> > Hi
> >
> > On Thu, Sep 1, 2016 at 12:19 PM Wei Wang wrote:
> >
> > On 08/31/2016 08:30 PM, Marc-André Lureau wrote:
> >
> >> - If it could be made not pci-specific, a better name for the
> >> device could be simply "driver": the driver of a virtio device.
> >> Or the "slave" in vhost-user terminology - consumer of virtq. I
> >> think you prefer to call it "backend" in general, but I find it
> >> more confusing.
> >
> > Not really. A virtio device has it own driver (e.g. a virtio-net
> > driver for a virtio-net device). A vhost-pci device plays the role
> > of a backend (just like vhost_net, vhost_user) for a virtio
> > device. If we use the "device/driver" naming convention, the
> > vhost-pci device is part of the "device". But I actually prefer to
> > use "frontend/backend" :) If we check the QEMU's
> > doc/specs/vhost-user.txt, it also uses "backend" to describe.
> >
> >
> > Yes, but it uses "backend" freely without any definition and to name
> > eventually different things. (at least "slave" is being defined as the
> > consumer of virtq, but I think some people don't like to use that word).
> >
>
> I think most people know the concept of backend/frontend, that's
> probably the reason why they usually don't explicitly explain it in a
> doc. If you guys don't have an objection, I suggest to use it in the
> discussion :)  The goal here is to get the design finalized first. When
> it comes to the final spec wording phase, we can decide which
> description is more proper.
>

"backend" is too broad for me. Instead I would stick to something closer to
what we want to name and define. If it's the consumer of virtq, then why
not call it that way.


> > Have you thought about making the device not pci specific? I don't
> > know much about mmio devices nor s/390, but if devices can hotplug
> > their own memory (I believe mmio can), then it should be possible to
> > define a device generic enough.
>
> Not yet. I think the main difference would be the way to map the
> frontend VM's memory (in our case, we use a BAR). Other things should be
> generic.
>

I hope some more knowledgeable people will chime in.


>
> >
> >> - Why is it required or beneficial to support multiple "frontend"
> >> devices over the same "vhost-pci" device? It could simplify
> >> things if it was a single device. If necessary, that could also
> >> be interesting as a vhost-user extension.
> >
> > We call it "multiple backend functionalities" (e.g. vhost-pci-net,
> > vhost-pci-scsi..). A vhost-pci driver contains multiple such
> > backend functionalities, because in this way they can reuse
> > (share) the same memory mapping. To be more precise, a vhost-pci
> > device supplies the memory of a frontend VM, and all the backend
> > functionalities need to access the same frontend VM memory, so we
> > consolidate them into one vhost-pci driver to use one vhost-pci
> > device.
> >
> >
> > That's what I imagined. Do you have a use case for that?
>
> Currently, we only have the network use cases. I think we can design it
> that way (multple backend functionalities), which is more generic (not
> just limited to network usages). When implementing it, we can first have
> the network backend functionality (i.e. vhost-pci-net) implemented. In
> the future, if people are interested in other backend functionalities, I
> think it should be easy to add them.
>

My question is not about the support of various kinds of devices (that is
clearly a worthy goal to me) but about supporting several frontend/provider
devices simultaneously on the same vhost-pci device: is this required or
beneficial? I think it would simplify things if it were 1-1 instead; I would
like to understand why you propose a different design.


>
> >
> > Given that it's in a VM (no caching issues?), how is it a problem to
> > map the same memory multiple times? Is there a memory limit?
> >
>
> I need to explain this a little bit more :)  - the backend VM doesn't
> need to map the same memory multiple times. It maps the entire memory of
> a frontend VM using a vhost-pci device (it's a one-time mapping
> happening at the setup phase). Those backend functionalities reside in
> the same vhost-pci driver, so the bar is ioremap()-ed only once, by the
> vhost-pci driver. The backend functionalities are not created together
> in the driver probe() function. A backend functionality is created when
> the vhost-pci driver receives a controlq message asking for creating one
> (the message indicates the type - net, scsi, console etc.).
>
> I haven't seen any caching issues so far.
>
> IIRC, the memory mapping has a limit (512GB or 1T), but that should be
> enough (a guest usually has a much smaller memory size).
>
> >> - 

Re: [Qemu-devel] Implementation of BusLogic SCSI host adapter (BT-958)

2016-09-01 Thread Peter Maydell
On 1 September 2016 at 13:59, Paolo Bonzini  wrote:
> On 01/09/2016 13:19, Peter Maydell wrote:
>> On 1 September 2016 at 12:11, Paolo Bonzini  wrote:
>>> That said, this looks very much like VirtualBox code.  Do not use it if
>>> you want to contribute code to QEMU, because QEMU does not accept
>>> GPLv2-only code.
>>
>> I don't think this is the consensus view. We'd prefer v2-or-later,
>> sure, but we haven't stopped taking v2-only code if that's all
>> we have.
>
> Yes, I was simplifying a bit.  LICENSE says: "As of July 2013,
> contributions under version 2 of the GNU General Public License (and no
> later version) are only accepted for the following files or directories:
> bsd-user/, linux-user/, hw/vfio/, hw/xen/xen_pt*".

I don't personally see the point in restricting to those
directories. We are in practice never going be able to relicense
to v2+ for the whole of QEMU.

thanks
-- PMM



Re: [Qemu-devel] [PATCH] vhost-user: return if no net clients found

2016-09-01 Thread Marc-André Lureau
Hi

On Thu, Sep 1, 2016 at 4:00 PM Chen Hanxiao  wrote:

>
> Hi, here is the backtrace:
>
> #0  net_vhost_user_event (opaque=0x7fc2f6893be0, event=5) at
> net/vhost-user.c:196
> #1  0x7fc2f4ebfb2b in tcp_chr_disconnect (chr=0x7fc2f68cc400) at
> qemu-char.c:2837
> #2  0x7fc2f4ebfba9 in tcp_chr_sync_read (chr=0x7fc2f68cc400,
> buf=<optimized out>, len=<optimized out>) at qemu-char.c:2888
> #3  0x7fc2f4ec106d in qemu_chr_fe_read_all (s=0x7fc2f68cc400,
> buf=buf@entry=0x7fff5fda25b7 "", len=len@entry=1) at qemu-char.c:264
> #4  0x7fc2f4f9a43a in net_vhost_user_watch (chan=<optimized out>,
> cond=<optimized out>, opaque=<optimized out>) at net/vhost-user.c:190
> #5  0x7fc2f321999a in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> #6  0x7fc2f4fd8fe8 in glib_pollfds_poll () at main-loop.c:209
> #7  os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:254
> #8  main_loop_wait (nonblocking=<optimized out>) at main-loop.c:503
> #9  0x7fc2f4dd7b1e in main_loop () at vl.c:1818
> #10 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4394
>
>
> Program received signal SIGSEGV, Segmentation fault.
> net_vhost_user_event (opaque=0x7fc2f6893be0, event=5) at
> net/vhost-user.c:207
> 207trace_vhost_user_event(s->chr->label, event);
>
>
thanks for the backtrace, that helps

However, I fail to understand how that can happen, as there has to be at
least one net_client to start qemu with vhost-user and that callback must
have at least the first netclient still around because the opaque pointer
is shared with the netclient struct. So it looks like something destroyed
the netclient before the callback, and in this case, the opaque pointer is
invalid, and things are going all wrong. But it can't be host-net-remove,
since the net-client is not on a registered hub.

Could you try to find a simple reproducer using qemu only?

thanks

-- 
Marc-André Lureau


[Qemu-devel] [Bug 955379] Re: cmake hangs with qemu-arm-static

2016-09-01 Thread Luke Kim
Hello, Peter Maydell
we have new qemu-arm hang issue.
I guess you are busy for new qemu 2.7 release.
But, could you please help us if you have time?

https://bugs.launchpad.net/qemu/+bug/1617929

Very thank you in advance :-)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/955379

Title:
  cmake hangs with qemu-arm-static

Status in QEMU:
  Fix Committed
Status in Linaro QEMU:
  Confirmed
Status in qemu-linaro package in Ubuntu:
  Confirmed

Bug description:
  I'm using git commit 3e7ecd976b06f... configured with --target-list
  =arm-linux-user --static in a chroot environment to compile some
  things. I ran into this problem with both pcl and opencv-2.3.1. cmake
  consistently freezes at some point during its execution, though in a
  different spot each time, usually during a step when it's searching
  for some libraries. For instance, pcl most commonly stops after:

  [snip]
  -- Boost version: 1.46.1
  -- Found the following Boost libraries:
  --   system
  --   filesystem
  --   thread
  --   date_time
  -- checking for module 'eigen3'
  --   found eigen3, version 3.0.1

  which is perplexing because it freezes after finding what it wants,
  not during the search. When it does get past that point, it does so
  almost immediately but freezes somewhere else.

  I'm using 64-bit Ubuntu 11.10 with kernel release 3.0.0-16-generic
  with an Intel i5.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/955379/+subscriptions



Re: [Qemu-devel] Implementation of BusLogic SCSI host adapter (BT-958)

2016-09-01 Thread Paolo Bonzini


On 01/09/2016 13:19, Peter Maydell wrote:
> On 1 September 2016 at 12:11, Paolo Bonzini  wrote:
>> That said, this looks very much like VirtualBox code.  Do not use it if
>> you want to contribute code to QEMU, because QEMU does not accept
>> GPLv2-only code.
> 
> I don't think this is the consensus view. We'd prefer v2-or-later,
> sure, but we haven't stopped taking v2-only code if that's all
> we have.

Yes, I was simplifying a bit.  LICENSE says: "As of July 2013,
contributions under version 2 of the GNU General Public License (and no
later version) are only accepted for the following files or directories:
bsd-user/, linux-user/, hw/vfio/, hw/xen/xen_pt*".

Paolo



[Qemu-devel] [Bug 1617929] Re: qemu hangs in pselect syscall

2016-09-01 Thread Peter Maydell
Can you provide sufficient instructions for me to reproduce this on my
machine, please?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1617929

Title:
  qemu hangs in pselect syscall

Status in QEMU:
  New

Bug description:
  I'm using git commit d75aa4372f0414c9960534026a562b0302fcff29 (v2.7.0-rc4) 
configured with;
  --enable-linux-user \
  --disable-system \
  --disable-tools \
  --disable-guest-agent \
  --static --disable-linux-aio \
  --disable-fdt \
  --without-pixman \
  --disable-blobs \
  Stable version (v2.6.0) also have the same problem.

  In a chroot environment I ran below command-line to compile some things, 
different sources each time.
  /usr/bin/qemu-arm -0 /usr/bin/edje_cc /usr/bin/edje_cc -id 
/home/abuild/rpmbuild/BUILD/org.tizen.browser-1.6.2/services/SimpleUI/images_mob/
 -DBROWSER_RESOLUTION_720x1280=1 -DPROFILE_MOBILE=1 
/home/abuild/rpmbuild/BUILD/org.tizen.browser-1.6.2/services/SimpleUI/edc/TextPopup_mob.edc
 
/home/abuild/rpmbuild/BUILD/org.tizen.browser-1.6.2/build-tizen/services/SimpleUI/720x1280_TextPopup.edj

  Here is back trace with gdb;
  #0  safe_syscall_end () at 
/usr/src/debug/qemu-2.6.94/linux-user/host/i386/safe-syscall.inc.S:78
  #1  0x60049370 in safe_pselect6 (nfds=10, readfds=0xffa31b5c, 
writefds=0xffa31bdc, exceptfds=0xffa31c5c, timeout=0x0, sig=0x0)
  at /usr/src/debug/qemu-2.6.94/linux-user/syscall.c:855
  #2  0x6004b2fe in do_select (n=10, rfd_addr=1082122232, wfd_addr=1082122360, 
efd_addr=1082122488, target_tv_addr=0)
  at /usr/src/debug/qemu-2.6.94/linux-user/syscall.c:1386
  #3  0x6005e5ba in do_syscall (cpu_env=0x640d0454, num=142, arg1=10, 
arg2=1082122232, arg3=1082122360, arg4=1082122488, arg5=0, arg6=1087473216, 
arg7=0, 
  arg8=0) at /usr/src/debug/qemu-2.6.94/linux-user/syscall.c:9690
  #4  0x60045def in cpu_loop (env=0x640d0454) at 
/usr/src/debug/qemu-2.6.94/linux-user/main.c:876
  #5  0x60047640 in main (argc=10, argv=0xffa33c84, envp=0xffa33cb0) at 
/usr/src/debug/qemu-2.6.94/linux-user/main.c:4817

  Attached core file taken from gdb. To see the stack frame, you could try; 
  $ tar -xf reproduced_118_04.tar.bz2; gdb --core core.1823 qemu-arm

  And recent strace log for PID 1823(stucked one);
  79965 [  313s] 1823 :0x8e _newselect(10,[9,3,],[],[],NULL)
  79966 [  313s]  ==>[pselect6(0xa)=]
  79967 [  313s]  [pselect6=0x1]<==
  79968 [  313s] 1823 :0x8e _newselect(10,[9,],[],[],NULL)
  79969 [  313s] 1823 :0x8e =>  = 0x0001 ([9,],[],[],NULL)
  79970 [  313s] 1823 :0xfc epoll_wait(3,1082121456,32,0,1082121456,3)
  79971 [  313s] 1823 :0xfc epoll_wait(3,1082121456,32,0,1082121456,3)
  79972 [  313s] 1823 :0xfc =>  = 0
  79973 [  313s] 1823 :0x3 read(9,0x407fdeec,16)
  79974 [  313s] 1823 :0x3 read(9,0x407fdeec,16)
  79975 [  313s] 1823 :0x3 =>  = 8
  79976 [  313s] 1823 :0x107 
clock_gettime(1,1082122120,0,1082829144,1082827588,0)
  79977 [  313s] 1823 :0x107 
clock_gettime(1,1082122120,0,1082829144,1082827588,0)
  79978 [  313s] 1823 :0x107 =>  = 0
  79979 [  313s] 1823 :0x8e _newselect(10,[9,3,],[],[],NULL)
  79980 [  313s]  ==>[pselect6(0xa)=]

  I'm using 64-bit Ubuntu with kernel release Linux 3.19.0-25-generic 
#26~14.04.1-Ubuntu.
  Reproducibility is low. One occurrence out of 50+ trials.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1617929/+subscriptions



Re: [Qemu-devel] [PATCH 0/2] Xen HVM unplug changes

2016-09-01 Thread no-reply
Hi,

Your series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH 0/2] Xen HVM unplug changes
Type: series
Message-id: 20160901121131.16007-1-o...@aepfle.de

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20160901121131.16007-1-o...@aepfle.de -> 
patchew/20160901121131.16007-1-o...@aepfle.de
Switched to a new branch 'test'
bc12567 xen_platform: SUSE xenlinux unplug for emulated PCI
cfa2940 xen_platform: unplug also SCSI disks

=== OUTPUT BEGIN ===
Checking PATCH 1/2: xen_platform: unplug also SCSI disks...
ERROR: else should follow close brace '}'
#58: FILE: hw/i386/xen/xen_platform.c:118:
 }
+else if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==

total: 1 errors, 0 warnings, 11 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/2: xen_platform: SUSE xenlinux unplug for emulated PCI...
ERROR: line over 90 characters
#33: FILE: hw/i386/xen/xen_platform.c:327:
+ * xen-kmp used this since xen-3.0.4, instead the official 
protocol from xen-3.3+

WARNING: line over 80 characters
#35: FILE: hw/i386/xen/xen_platform.c:329:
+ * Pre VMDP 1.7 made use of 4 and 8 depending on how VMDP was 
configured.

total: 1 errors, 1 warnings, 43 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

[Qemu-devel] [PATCH 1/2] xen_platform: unplug also SCSI disks

2016-09-01 Thread Olaf Hering
Using 'vdev=sd[a-o]' will create an emulated LSI controller, which can
be used by the emulated BIOS to boot from disk. If the HVM domU has also
PV driver the disk may appear twice in the guest. To avoid this an
unplug of the emulated hardware is needed, similar to what is done for
IDE and NIC drivers already.

Since the SCSI controller provides only disks the entire controller can
be unplugged at once.

Impact of the change for classic and pvops based guest kernels:

 vdev=sda:disk0
before: pvops:   disk0=pv xvda + emulated sda
classic: disk0=pv sda  + emulated sdq
after:  pvops:   disk0=pv xvda
classic: disk0=pv sda

 vdev=hda:disk0, vdev=sda:disk1
before: pvops:   disk0=pv xvda
 disk1=emulated sda
classic: disk0=pv hda
 disk1=pv sda  + emulated sdq
after:  pvops:   disk0=pv xvda
 disk1=not accessible by blkfront, index hda==index sda
classic: disk0=pv hda
 disk1=pv sda

 vdev=hda:disk0, vdev=sda:disk1, vdev=sdb:disk2
before: pvops:   disk0=pv xvda
 disk1=emulated sda
 disk2=pv xvdb + emulated sdb
classic: disk0=pv hda
 disk1=pv sda  + emulated sdq
 disk2=pv sdb  + emulated sdr
after:  pvops:   disk0=pv xvda
 disk1=not accessible by blkfront, index hda==index sda
 disk2=pv xvdb
classic: disk0=pv hda
 disk1=pv sda
 disk2=pv sda

Signed-off-by: Olaf Hering 
---
 hw/i386/xen/xen_platform.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index aa78393..d94b53c 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -115,6 +115,11 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
 && strcmp(d->name, "xen-pci-passthrough") != 0) {
 pci_piix3_xen_ide_unplug(DEVICE(d));
 }
+else if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
+PCI_CLASS_STORAGE_SCSI
+&& strcmp(d->name, "xen-pci-passthrough") != 0) {
+object_unparent(OBJECT(d));
+}
 }
 
 static void pci_unplug_disks(PCIBus *bus)
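
(The checkpatch complaint reported by the automated reply earlier in the thread -
"else should follow close brace '}'" - would be addressed by cuddling the else,
e.g.:)

    } else if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
               PCI_CLASS_STORAGE_SCSI
               && strcmp(d->name, "xen-pci-passthrough") != 0) {
        object_unparent(OBJECT(d));
    }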



[Qemu-devel] [PATCH 2/4] linux-user: Fix TARGET_F_GETOWN definition for Mips

2016-09-01 Thread Aleksandar Markovic
From: Aleksandar Markovic 

For some reason, Qemu's TARGET_F_GETOWN constant for Mips does not
match the value of the corresponding F_GETOWN. This patch fixes
this problem.

For reference, see Mips' F_GETOWN definition in Linux kernel at
arch/mips/include/uapi/asm/fcntl.h#L44.

This patch also fixes some fcntl()-related LTP tests for Qemu
user mode for Mips.

Signed-off-by: Miodrag Dinic 
---
 linux-user/syscall_defs.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index e6171ae..27d93a8 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -2186,7 +2186,7 @@ struct target_statfs64 {
 #define TARGET_F_SETLK 6
 #define TARGET_F_SETLKW7
 #define TARGET_F_SETOWN24   /*  for sockets. */
-#define TARGET_F_GETOWN25   /*  for sockets. */
+#define TARGET_F_GETOWN23   /*  for sockets. */
 #else
 #define TARGET_F_GETLK 5
 #define TARGET_F_SETLK 6
-- 
1.7.9.5




[Qemu-devel] [PATCH 0/2] Xen HVM unplug changes

2016-09-01 Thread Olaf Hering
Update unplug in Xen HVM guests to cover more cases.
Please review.

Olaf

Olaf Hering (2):
  xen_platform: unplug also SCSI disks
  xen_platform: SUSE xenlinux unplug for emulated PCI

 hw/i386/xen/xen_platform.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)




Re: [Qemu-devel] [virtio-comment] Re: [PATCH] *** Vhost-pci RFC v2 ***

2016-09-01 Thread Wei Wang

On 09/01/2016 04:49 PM, Marc-André Lureau wrote:

Hi

On Thu, Sep 1, 2016 at 12:19 PM Wei Wang wrote:


On 08/31/2016 08:30 PM, Marc-André Lureau wrote:


- If it could be made not pci-specific, a better name for the
device could be simply "driver": the driver of a virtio device.
Or the "slave" in vhost-user terminology - consumer of virtq. I
think you prefer to call it "backend" in general, but I find it
more confusing.


Not really. A virtio device has its own driver (e.g. a virtio-net
driver for a virtio-net device). A vhost-pci device plays the role
of a backend (just like vhost_net or vhost_user) for a virtio
device. If we use the "device/driver" naming convention, the
vhost-pci device is part of the "device". But I actually prefer to
use "frontend/backend" :) If we check QEMU's
doc/specs/vhost-user.txt, it also uses "backend" in its descriptions.


Yes, but it uses "backend" loosely, without any definition, and to name 
different things. (At least "slave" is defined as the consumer of the 
virtq, but I think some people don't like to use that word.)




I think most people know the concept of backend/frontend; that's 
probably why it usually isn't explained explicitly in a doc. If you guys 
don't have an objection, I suggest we use it in the discussion :)  The 
goal here is to get the design finalized first. When it comes to the 
final spec wording phase, we can decide which description is more 
appropriate.


Have you thought about making the device not pci specific? I don't 
know much about mmio devices nor s/390, but if devices can hotplug 
their own memory (I believe mmio can), then it should be possible to 
define a device generic enough.


Not yet. I think the main difference would be the way to map the 
frontend VM's memory (in our case, we use a BAR). Other things should be 
generic.





- Why is it required or beneficial to support multiple "frontend"
devices over the same "vhost-pci" device? It could simplify
things if it was a single device. If necessary, that could also
be interesting as a vhost-user extension.


We call it "multiple backend functionalities" (e.g. vhost-pci-net,
vhost-pci-scsi..). A vhost-pci driver contains multiple such
backend functionalities, because in this way they can reuse
(share) the same memory mapping. To be more precise, a vhost-pci
device supplies the memory of a frontend VM, and all the backend
functionalities need to access the same frontend VM memory, so we
consolidate them into one vhost-pci driver to use one vhost-pci
device.


That's what I imagined. Do you have a use case for that?


Currently, we only have the network use cases. I think we can design it 
that way (multiple backend functionalities), which is more generic (not 
just limited to network usage). When implementing it, we can first have 
the network backend functionality (i.e. vhost-pci-net) implemented. In 
the future, if people are interested in other backend functionalities, I 
think it should be easy to add them.




Given that it's in a VM (no caching issues?), how is it a problem to 
map the same memory multiple times? Is there a memory limit?




I need to explain this a little bit more :)  - the backend VM doesn't 
need to map the same memory multiple times. It maps the entire memory of 
a frontend VM using a vhost-pci device (it's a one-time mapping that 
happens at the setup phase). The backend functionalities reside in the 
same vhost-pci driver, so the BAR is ioremap()-ed only once, by the 
vhost-pci driver. The backend functionalities are not all created 
together in the driver's probe() function. A backend functionality is 
created when the vhost-pci driver receives a controlq message asking to 
create one (the message indicates the type - net, scsi, console etc.).


I haven't seen any caching issues so far.

IIRC, the memory mapping has a limit (512GB or 1T), but that should be 
enough (a guest usually has a much smaller memory size).
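
A rough sketch of that flow, with made-up names (this is not actual
vhost-pci code; the BAR number and all identifiers below are assumptions
used only to illustrate the one-time mapping plus on-demand backend
creation):

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/pci.h>

/* Hypothetical controlq message types and backend constructors. */
enum { VHOST_PCI_BACKEND_NET = 1, VHOST_PCI_BACKEND_SCSI = 2 };
void create_net_backend(void __iomem *mem);
void create_scsi_backend(void __iomem *mem);

static void __iomem *frontend_mem;   /* frontend VM memory, mapped once */

static int vhost_pci_probe(struct pci_dev *pdev,
                           const struct pci_device_id *id)
{
    /* The BAR exposing the frontend VM memory is ioremap()-ed once. */
    frontend_mem = ioremap(pci_resource_start(pdev, 2),
                           pci_resource_len(pdev, 2));
    return frontend_mem ? 0 : -ENOMEM;
}

static void vhost_pci_handle_ctrlq(u16 type)
{
    /* Backend functionalities are created on demand, when a controlq
     * message asks for one; all of them share the same mapping. */
    switch (type) {
    case VHOST_PCI_BACKEND_NET:
        create_net_backend(frontend_mem);
        break;
    case VHOST_PCI_BACKEND_SCSI:
        create_scsi_backend(frontend_mem);
        break;
    }
}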



- no interrupt support, I suppose you mainly looked at poll-based
net devices


Yes. But I think it's also possible to add interrupt support.
For example, we can use ioeventfd (or a hypercall) to inject
interrupts into the frontend VM after transmitting packets.

I guess it would be a good idea to have this in the spec from the 
beginning, not as an afterthought


OK, will add it.

Best,
Wei




Re: [Qemu-devel] [PATCH] vhost-user: return if no net clients found

2016-09-01 Thread Chen Hanxiao

On 2016-09-01 19:43:31, "Marc-André Lureau" wrote:

Hi



On Thu, Sep 1, 2016 at 2:15 PM Chen Hanxiao  wrote:

From: Chen Hanxiao 

If we can't find a suitable net client, return directly.
Otherwise we will get a segmentation fault.

Signed-off-by: Chen Hanxiao 
---
 net/vhost-user.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index b0595f8..fb96db7 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -210,6 +210,9 @@ static void net_vhost_user_event(void *opaque, int event)
   MAX_QUEUE_NUM);
 assert(queues < MAX_QUEUE_NUM);

+if (queues < 1)
+return;
+



qemu coding style has mandatory {} braces.


I don't understand what this patch fixes. Even if queues == 0, there is no 
reason I can think of that it would crash. Could you provide a backtrace?


A qemu-only reproducer would be really useful.


Hi, here is the backtrace:


#0  net_vhost_user_event (opaque=0x7fc2f6893be0, event=5) at net/vhost-user.c:196
#1  0x7fc2f4ebfb2b in tcp_chr_disconnect (chr=0x7fc2f68cc400) at qemu-char.c:2837
#2  0x7fc2f4ebfba9 in tcp_chr_sync_read (chr=0x7fc2f68cc400, buf=<optimized out>, len=<optimized out>) at qemu-char.c:2888
#3  0x7fc2f4ec106d in qemu_chr_fe_read_all (s=0x7fc2f68cc400, buf=buf@entry=0x7fff5fda25b7 "", len=len@entry=1) at qemu-char.c:264
#4  0x7fc2f4f9a43a in net_vhost_user_watch (chan=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at net/vhost-user.c:190
#5  0x7fc2f321999a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#6  0x7fc2f4fd8fe8 in glib_pollfds_poll () at main-loop.c:209
#7  os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:254
#8  main_loop_wait (nonblocking=<optimized out>) at main-loop.c:503
#9  0x7fc2f4dd7b1e in main_loop () at vl.c:1818
#10 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4394


Program received signal SIGSEGV, Segmentation fault.
net_vhost_user_event (opaque=0x7fc2f6893be0, event=5) at net/vhost-user.c:207
207         trace_vhost_user_event(s->chr->label, event);


Regards,
- Chen
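
As a side note on the coding style remark above, the guard needs braces;
a minimal sketch of the intended check (not a tested patch) would be:

    if (queues < 1) {
        /* no net client matched, bail out before dereferencing anything */
        return;
    }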




[Qemu-devel] [PATCH 2/2] xen_platform: SUSE xenlinux unplug for emulated PCI

2016-09-01 Thread Olaf Hering
Implement the SUSE-specific unplug protocol for emulated PCI devices
in PVonHVM guests. It is a simple 'outl(1, (ioaddr + 4));'.
This protocol has been implemented and used since Xen 3.0.4.
It is used in all SUSE/SLES/openSUSE releases up to SLES11SP3 and
openSUSE 12.3.
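
For context, the guest side of this protocol is just a port write to the
platform device's I/O range. A hypothetical guest-side sketch (the
ioaddr parameter and function names are made up for illustration) would
look like:

#include <stdint.h>
#include <sys/io.h>

/* SUSE xenlinux / xen-kmp style: one write unplugs disks and NICs. */
static void suse_xenlinux_unplug(uint16_t ioaddr)
{
    outl(1, ioaddr + 4);
}

/* Pre-1.7 VMDP style: offset 8 selects disks (1) or NICs (2). */
static void vmdp_unplug(uint16_t ioaddr, int disks, int nics)
{
    if (disks) {
        outl(1, ioaddr + 8);
    }
    if (nics) {
        outl(2, ioaddr + 8);
    }
}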

Signed-off-by: Olaf Hering 
---
 hw/i386/xen/xen_platform.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index d94b53c..8802482 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -314,13 +314,42 @@ static void xen_platform_ioport_writeb(void *opaque, 
hwaddr addr,
uint64_t val, unsigned int size)
 {
 PCIXenPlatformState *s = opaque;
+PCIDevice *pci_dev = PCI_DEVICE(s);
 
 switch (addr) {
 case 0: /* Platform flags */
 platform_fixed_ioport_writeb(opaque, 0, (uint32_t)val);
 break;
+case 4:
+if (val == 1) {
+/*
+ * SUSE unplug for Xenlinux
+ * xen-kmp used this since xen-3.0.4, instead the official protocol from xen-3.3+
+ * It did an unconditional "outl(1, (ioaddr + 4));"
+ * Pre VMDP 1.7 made use of 4 and 8 depending on how VMDP was configured.
+ * If VMDP was to control both disk and LAN it would use 4.
+ * If it controlled just disk or just LAN, it would use 8 below.
+ */
+blk_drain_all();
+blk_flush_all();
+pci_unplug_disks(pci_dev->bus);
+pci_unplug_nics(pci_dev->bus);
+}
+break;
 case 8:
-log_writeb(s, (uint32_t)val);
+switch (val) {
+case 1:
+blk_drain_all();
+blk_flush_all();
+pci_unplug_disks(pci_dev->bus);
+break;
+case 2:
+pci_unplug_nics(pci_dev->bus);
+break;
+default:
+log_writeb(s, (uint32_t)val);
+break;
+}
 break;
 default:
 break;



[Qemu-devel] [PATCH 4/4] linux-user: Fix structure target_semid64_ds definition for Mips

2016-09-01 Thread Aleksandar Markovic
From: Aleksandar Markovic 

This patch corrects the target_semid64_ds structure definition for Mips.

See, for example, the definition of semid64_ds for Mips in the Linux kernel:
arch/mips/include/uapi/asm/sembuf.h#L13.
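
That definition is roughly the following (quoted from memory, so the
exact field names may differ; please verify against the tree):

  struct semid64_ds {
      struct ipc64_perm sem_perm;   /* permissions .. see ipc.h */
      __kernel_time_t   sem_otime;  /* last semop time */
      __kernel_time_t   sem_ctime;  /* last change time */
      unsigned long     sem_nsems;  /* no. of semaphores in array */
      unsigned long     __unused1;
      unsigned long     __unused2;
  };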

This patch also fixes certain semaphore-related LTP tests, if they are
executed in Qemu user mode for any Mips platform.

Signed-off-by: Miodrag Dinic 
---
 linux-user/mips/target_structs.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/linux-user/mips/target_structs.h b/linux-user/mips/target_structs.h
index fbd9955..5ba124d 100644
--- a/linux-user/mips/target_structs.h
+++ b/linux-user/mips/target_structs.h
@@ -45,4 +45,20 @@ struct target_shmid_ds {
 abi_ulong __unused2;
 };
 
+#define TARGET_SEMID64_DS
+
+/*
+ * The semid64_ds structure for the MIPS architecture.
+ * Note extra padding because this structure is passed back and forth
+ * between kernel and user space.
+ */
+struct target_semid64_ds {
+struct target_ipc_perm sem_perm;
+abi_ulong sem_otime;
+abi_ulong sem_ctime;
+abi_ulong sem_nsems;
+abi_ulong __unused3;
+abi_ulong __unused4;
+};
+
 #endif
-- 
1.7.9.5




[Qemu-devel] [PATCH 0/3] linux-user: Fix miscellaneous Mips-specific issues

2016-09-01 Thread Aleksandar Markovic
From: Aleksandar Markovic 

v1 -> v2:
  - Improved a comment in the patch about target_semid64_ds.
  - Added a patch that fixes TARGET_SIOCATMARK for Mips.

This series fixes several wrong definitions of preprocessor constants and
structures in Qemu user mode.

It also fixes a number of LTP test failures, if executed in Qemu
user mode for the Mips platform.

All patches from this series affect the Mips platform only.

Aleksandar Markovic (4):
  linux-user: Fix TARGET_SIOCATMARK definition for Mips
  linux-user: Fix TARGET_F_GETOWN definition for Mips
  linux-user: Fix structure target_flock definition for Mips
  linux-user: Fix structure target_semid64_ds definition for Mips

 linux-user/mips/target_structs.h |   16 
 linux-user/syscall_defs.h|   12 +++-
 2 files changed, 27 insertions(+), 1 deletion(-)

-- 
1.7.9.5




[Qemu-devel] [PATCH 3/4] linux-user: Fix structure target_flock definition for Mips

2016-09-01 Thread Aleksandar Markovic
From: Aleksandar Markovic 

The flock structure is defined for Mips in a way that differs from all
other platforms. For reference, see the Linux kernel source files:

arch/mips/include/uapi/asm/fcntl.h#L63 (for Mips)
include/uapi/asm-generic/fcntl.h#L195 (for all other platforms)
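
The Mips variant, quoted approximately from memory (please double-check
against the kernel tree), carries the extra l_sysid field and padding:

  struct flock {
      short            l_type;
      short            l_whence;
      __kernel_off_t   l_start;
      __kernel_off_t   l_len;
      long             l_sysid;
      __kernel_pid_t   l_pid;
      long             pad[4];
  };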

This patch fixes this problem by amending the target_flock structure,
for Mips only.

Besides, this patch fixes LTP tests fcntl11, fcntl17, fcntl19, fcntl20,
and fcntl21, which currently fail when executed in Qemu user mode
for Mips platforms.

Signed-off-by: Aleksandar Markovic 
---
 linux-user/syscall_defs.h |6 ++
 1 file changed, 6 insertions(+)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 27d93a8..63527a7 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -2355,7 +2355,13 @@ struct target_flock {
 short l_whence;
 abi_long l_start;
 abi_long l_len;
+#if defined(TARGET_MIPS)
+target_long l_sysid;
+#endif
 int l_pid;
+#if defined(TARGET_MIPS)
+target_long pad[4];
+#endif
 };
 
 struct target_flock64 {
-- 
1.7.9.5




[Qemu-devel] [PATCH 1/4] linux-user: Fix TARGET_SIOCATMARK definition for Mips

2016-09-01 Thread Aleksandar Markovic
From: Aleksandar Markovic 

This patch fixes the wrong definition of TARGET_SIOCATMARK for Mips.

The current definition is:

  #define SIOCATMARK  0x8905

while the correct definition is:

  #define SIOCATMARK  TARGET_IOR('s', 7, int)

See Linux kernel source file arch/mips/include/uapi/asm/sockios.h#L19
for reference.
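
That kernel line is (quoted from memory; please verify against the tree):

  #define SIOCATMARK      _IOR('s', 7, int)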

This patch also fixes an LTP test failure for the sockioctl01 test, for Mips.

Signed-off-by: Aleksandar Rikalo 
---
 linux-user/syscall_defs.h |4 
 1 file changed, 4 insertions(+)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index afd9191..e6171ae 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -926,7 +926,11 @@ struct target_pollfd {
#define TARGET_KDSETLED 0x4B32  /* set led state [lights, not flags] */
 #define TARGET_KDSIGACCEPT 0x4B4E
 
+#if defined(TARGET_MIPS)
+#define TARGET_SIOCATMARK  TARGET_IOR('s', 7, int)
+#else
 #define TARGET_SIOCATMARK  0x8905
+#endif
 
 /* Networking ioctls */
 #define TARGET_SIOCADDRT   0x890B  /* add routing table entry */
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH v3] scsi: check page count while initialising descriptor rings

2016-09-01 Thread Dmitry Fleytman

Reviewed-by: Dmitry Fleytman 

> On 1 Sep 2016, at 14:00 PM, P J P  wrote:
> 
> From: Prasad J Pandit 
> 
> Vmware Paravirtual SCSI emulation uses command descriptors to
> process SCSI commands. These descriptors come with their ring
> buffers. A guest could set the page count for these rings to
> an arbitrary value, leading to an infinite loop or OOB access.
> Add a check to avoid it.
> 
> Reported-by: Tom Victor 
> Reported-by: Li Qiang 
> Signed-off-by: Prasad J Pandit 
> ---
> hw/scsi/vmw_pvscsi.c | 21 ++---
> 1 file changed, 10 insertions(+), 11 deletions(-)
> 
> Update per review
>  -> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg00019.html
> 
> diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
> index 5116f4a..4245c15 100644
> --- a/hw/scsi/vmw_pvscsi.c
> +++ b/hw/scsi/vmw_pvscsi.c
> @@ -152,7 +152,7 @@ pvscsi_log2(uint32_t input)
> return log;
> }
> 
> -static int
> +static void
> pvscsi_ring_init_data(PVSCSIRingInfo *m, PVSCSICmdDescSetupRings *ri)
> {
> int i;
> @@ -160,10 +160,6 @@ pvscsi_ring_init_data(PVSCSIRingInfo *m, 
> PVSCSICmdDescSetupRings *ri)
> uint32_t req_ring_size, cmp_ring_size;
> m->rs_pa = ri->ringsStatePPN << VMW_PAGE_SHIFT;
> 
> -if ((ri->reqRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES)
> -|| (ri->cmpRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES)) {
> -return -1;
> -}
> req_ring_size = ri->reqRingNumPages * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE;
> cmp_ring_size = ri->cmpRingNumPages * PVSCSI_MAX_NUM_CMP_ENTRIES_PER_PAGE;
> txr_len_log2 = pvscsi_log2(req_ring_size - 1);
> @@ -195,8 +191,6 @@ pvscsi_ring_init_data(PVSCSIRingInfo *m, 
> PVSCSICmdDescSetupRings *ri)
> 
> /* Flush ring state page changes */
> smp_wmb();
> -
> -return 0;
> }
> 
> static int
> @@ -746,7 +740,7 @@ pvscsi_dbg_dump_tx_rings_config(PVSCSICmdDescSetupRings 
> *rc)
> 
> trace_pvscsi_tx_rings_num_pages("Confirm Ring", rc->cmpRingNumPages);
> for (i = 0; i < rc->cmpRingNumPages; i++) {
> -trace_pvscsi_tx_rings_ppn("Confirm Ring", rc->reqRingPPNs[i]);
> +trace_pvscsi_tx_rings_ppn("Confirm Ring", rc->cmpRingPPNs[i]);
> }
> }
> 
> @@ -779,10 +773,15 @@ pvscsi_on_cmd_setup_rings(PVSCSIState *s)
> 
> trace_pvscsi_on_cmd_arrived("PVSCSI_CMD_SETUP_RINGS");
> 
> +if (!rc->reqRingNumPages
> +|| rc->reqRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
> +|| !rc->cmpRingNumPages
> +|| rc->cmpRingNumPages > PVSCSI_SETUP_RINGS_MAX_NUM_PAGES) {
> +return PVSCSI_COMMAND_PROCESSING_FAILED;
> +}
> +
> pvscsi_dbg_dump_tx_rings_config(rc);
> -if (pvscsi_ring_init_data(&s->rings, rc) < 0) {
> -return PVSCSI_COMMAND_PROCESSING_FAILED;
> -}
> +pvscsi_ring_init_data(&s->rings, rc);
> 
> s->rings_info_valid = TRUE;
> return PVSCSI_COMMAND_PROCESSING_SUCCEEDED;
> -- 
> 2.5.5
> 



