date:20150622

Re: [Qemu-devel] [PATCH 1/7] error: De-duplicate code creating Error objects

2015-06-22 Thread Markus Armbruster

Eric Blake  writes:

> On 06/22/2015 01:26 PM, Markus Armbruster wrote:
>> Duplicated when commit 680d16d added error_set_errno(), and again when
>> commit 20840d4 added error_set_win32().
>> 
>> Make the original copy in error_set() reusable by factoring out
>> error_setv(), then rewrite error_set_errno() and error_set_win32() on
>> top of it.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  util/error.c | 69 
>> ++--
>>  1 file changed, 25 insertions(+), 44 deletions(-)
>
>> @@ -96,37 +90,24 @@ void error_setg_file_open(Error **errp, int os_errno, 
>> const char *filename)
>>  void error_set_win32(Error **errp, int win32_err, ErrorClass err_class,
>>   const char *fmt, ...)
>>  {
>
>>  if (win32_err != 0) {
>> -char *msg2 = g_win32_error_message(win32_err);
>> -err->msg = g_strdup_printf("%s: %s (error: %x)", msg1, msg2,
>> -   (unsigned)win32_err);
>> +msg1 = (*errp)->msg;
>> +msg2 = g_win32_error_message(win32_err);
>> +(*errp)->msg = g_strdup_printf("%s: %s", msg1, msg2);
>
> Loses " (error: %x)".  Do we care?  I don't, except maybe in the commit
> message...

Unintentional, I'll put it right back.

(I don't care either, but dropping it deserves its own commit)

> Reviewed-by: Eric Blake 

Thanks!

Re: [Qemu-devel] [PULL v2 58/60] qdev: add 64bit properties

2015-06-22 Thread Markus Armbruster

Gonglei  writes:

> On 2015/6/21 5:10, Paolo Bonzini wrote:
>> 
>> 
>> On 01/06/2015 14:25, Michael S. Tsirkin wrote:
>>> +static uint64_t qdev_get_prop_mask64(Property *prop)
>>> +{
>>> +assert(prop->info == &qdev_prop_bit);
>>> +return 0x1 << prop->bitnr;
>> 
>> This needs to be 1ull.
>> 
> Yes, and coverity spot it and some other OVERRUN problems.

Patch author(s) know?

Or are you taking care of them?

Re: [Qemu-devel] [PATCH qemu v8 14/14] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2015-06-22 Thread David Gibson

On Thu, Jun 18, 2015 at 09:37:36PM +1000, Alexey Kardashevskiy wrote:
> This adds support for Dynamic DMA Windows (DDW) option defined by
> the SPAPR specification which allows to have additional DMA window(s)
> 
> This implements DDW for emulated and VFIO devices. As all TCE root regions
> are mapped at 0 and 64bit long (and actual tables are child regions),
> this replaces memory_region_add_subregion() with _overlap() to make
> QEMU memory API happy.
> 
> This reserves RTAS token numbers for DDW calls.
> 
> This implements helpers to interact with VFIO kernel interface.
> 
> This changes the TCE table migration descriptor to support dynamic
> tables as from now on, PHB will create as many stub TCE table objects
> as PHB can possibly support but not all of them might be initialized at
> the time of migration because DDW might or might not be requested by
> the guest.
> 
> The "ddw" property is enabled by default on a PHB but for compatibility
> the pseries-2.3 machine and older disable it.
> 
> This implements DDW for VFIO. The host kernel support is required.
> This adds a "levels" property to PHB to control the number of levels
> in the actual TCE table allocated by the host kernel, 0 is the default
> value to tell QEMU to calculate the correct value. Current hardware
> supports up to 5 levels.
> 
> The existing linux guests try creating one additional huge DMA window
> with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
> the guest switches to dma_direct_ops and never calls TCE hypercalls
> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
> and not waste time on map/unmap later.
> 
> This adds 4 RTAS handlers:
> * ibm,query-pe-dma-window
> * ibm,create-pe-dma-window
> * ibm,remove-pe-dma-window
> * ibm,reset-pe-dma-window
> These are registered from type_init() callback.
> 
> These RTAS handlers are implemented in a separate file to avoid polluting
> spapr_iommu.c with PCI.
> 
> Signed-off-by: Alexey Kardashevskiy 

> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index c8ab06e..0b2ff6d 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -7,6 +7,9 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>  obj-y += spapr_pci_vfio.o
>  endif
> +ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES), yy)
> +obj-y += spapr_rtas_ddw.o
> +endif
>  # PowerPC 4xx boards
>  obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
>  obj-y += ppc4xx_pci.o
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5ca817c..d50d50b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1860,6 +1860,11 @@ static const TypeInfo spapr_machine_info = {
>  .driver   = "spapr-pci-host-bridge",\
>  .property = "dynamic-reconfiguration",\
>  .value= "off",\
> +},\
> +{\
> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
> +.property = "ddw",\
> +.value= stringify(off),\
>  },
>  
>  #define SPAPR_COMPAT_2_2 \
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 5e6bdb4..eaa1943 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -136,6 +136,15 @@ static IOMMUTLBEntry 
> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
>  return ret;
>  }
>  
> +static void spapr_tce_table_pre_save(void *opaque)
> +{
> +sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> +
> +tcet->migtable = tcet->table;
> +}
> +
> +static void spapr_tce_table_do_enable(sPAPRTCETable *tcet, bool vfio_accel);
> +
>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>  {
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> @@ -144,22 +153,43 @@ static int spapr_tce_table_post_load(void *opaque, int 
> version_id)
>  spapr_vio_set_bypass(tcet->vdev, tcet->bypass);
>  }
>  
> +if (!tcet->migtable) {
> +return 0;
> +}
> +
> +if (tcet->enabled) {
> +if (!tcet->table) {
> +tcet->enabled = false;
> +/* VFIO does not migrate so pass vfio_accel == false */
> +spapr_tce_table_do_enable(tcet, false);
> +}
> +memcpy(tcet->table, tcet->migtable,
> +   tcet->nb_table * sizeof(tcet->table[0]));
> +free(tcet->migtable);
> +tcet->migtable = NULL;
> +}
> +
>  return 0;
>  }
>  
>  static const VMStateDescription vmstate_spapr_tce_table = {
>  .name = "spapr_iommu",
> -.version_id = 2,
> +.version_id = 3,
>  .minimum_version_id = 2,
> +.pre_save = spapr_tce_table_pre_save,
>  .post_load = spapr_tce_table_post_load,
>  .fields  = (VMStateField []) {
>  /* Sanity check */
>  VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
> -VMSTATE_UINT32_EQUAL(nb_table, sPAPRTCETable),
>  
>  /* IOMMU state */
> +VMSTATE_BOOL_V(enabled, sPAPRTCETable, 3),
> +VMSTATE_UINT64_V(bus_offset, sPAPRTCETa

Re: [Qemu-devel] [PATCH qemu v8 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW)

2015-06-22 Thread David Gibson

On Thu, Jun 18, 2015 at 09:37:22PM +1000, Alexey Kardashevskiy wrote:
> 
> (cut-n-paste from kernel patchset)
> 
> Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
> where devices are allowed to do DMA. These ranges are called DMA windows.
> By default, there is a single DMA window, 1 or 2GB big, mapped at zero
> on a PCI bus.
> 
> PAPR defines a DDW RTAS API which allows pseries guests
> querying the hypervisor about DDW support and capabilities (page size mask
> for now). A pseries guest may request an additional (to the default)
> DMA windows using this RTAS API.
> The existing pseries Linux guests request an additional window as big as
> the guest RAM and map the entire guest window which effectively creates
> direct mapping of the guest memory to a PCI bus.
> 
> This patchset reworks PPC64 IOMMU code and adds necessary structures
> to support big windows.
> 
> Once a Linux guest discovers the presence of DDW, it does:
> 1. query hypervisor about number of available windows and page size masks;
> 2. create a window with the biggest possible page size (today 4K/64K/16M);
> 3. map the entire guest RAM via H_PUT_TCE* hypercalls;
> 4. switche dma_ops to direct_dma_ops on the selected PE.
> 
> Once this is done, H_PUT_TCE is not called anymore for 64bit devices and
> the guest does not waste time on DMA map/unmap operations.
> 
> Note that 32bit devices won't use DDW and will keep using the default
> DMA window so KVM optimizations will be required (to be posted later).
> 
> This patchset adds DDW support for pseries. The host kernel changes are
> required, posted as:
> 
> [PATCH kernel v11 00/34] powerpc/iommu/vfio: Enable Dynamic DMA windows
> 
> This patchset is based on git://github.com/dgibson/qemu.git spapr-next branch.

A couple of general queries - this touchs on the kernel part as well
as the qemu part:

 * Am I correct in thinking that the point in doing the
   pre-registration stuff is to allow the kernel to handle PUT_TCE
   in real mode?  i.e. that the advatage of doing preregistration
   rather than accounting on the DMA_MAP and DMA_UNMAP itself only
   appears once you have kernel KVM+VFIO acceleration?

 * Do you have test numbers to show that it's still worthwhile to have
   kernel acceleration once you have a guest using DDW?  With DDW in
   play, even if PUT_TCE is slow, it should be called a lot less
   often.

The reason I ask is that the preregistration handling is a pretty big
chunk of code that inserts itself into some pretty core kernel data
structures, all for one pretty specific use case.  We only want to do
that if there's a strong justification for it.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpJxQBpmEU8_.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v8 2/7] Qemu-Xen-vTPM: Create a new file xen_pvdev.c

2015-06-22 Thread Xu, Quan



> -Original Message-
> From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com]
> Sent: Tuesday, June 23, 2015 12:37 AM
> To: Xu, Quan
> Cc: stefano.stabell...@eu.citrix.com; qemu-devel@nongnu.org;
> stef...@linux.vnet.ibm.com; ebl...@redhat.com; wei.l...@citrix.com;
> dgde...@tycho.nsa.gov; xen-de...@lists.xen.org
> Subject: Re: [PATCH v8 2/7] Qemu-Xen-vTPM: Create a new file xen_pvdev.c
> 
> On Sun, 17 May 2015, Quan Xu wrote:
> > for some common part of xen frontend and backend, such as xendevs
> > queue and xenstore update functions.
> >
> > Signed-off-by: Quan Xu 
> 
> Hi Quan,
> 
> could you please separate out the code movement from any other changes?
> This patch would become two patches: the first would only move code from
> xen_backend.c to xen_pvdev.c, no other changes except for the ones actually
> required to build the code (Makefile.objs).  The second patch would rename
> xen_be_find_xendev to xen_find_xendev and any other changes to the code
> that you making here.  That way I can very easily go and look only at the 
> things
> you actually modify.
> 
> Thanks,
> 
> Stefano
> 
> 

Stefano, thanks for your comment.
Now I am focusing on ' VT-d async invalidation for Device-TLB' feature as high 
priority. 
I will fix all of your comments when I send out v1 of ' VT-d async invalidation 
for Device-TLB' feature.


Quan

Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target

2015-06-22 Thread Wen Congyang

On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
 +void bdrv_connect(BlockDriverState *bs, Error **errp)
 +{
 +BlockDriver *drv = bs->drv;
 +
 +if (drv && drv->bdrv_connect) {
 +drv->bdrv_connect(bs, errp);
 +} else if (bs->file) {
 +bdrv_connect(bs->file, errp);
 +} else {
 +error_setg(errp, "this feature or command is not currently 
 supported");
 +}
 +}
 +
 +void bdrv_disconnect(BlockDriverState *bs)
 +{
 +BlockDriver *drv = bs->drv;
 +
 +if (drv && drv->bdrv_disconnect) {
 +drv->bdrv_disconnect(bs);
 +} else if (bs->file) {
 +bdrv_disconnect(bs->file);
 +}
 +}
>>>
>>> Please add doc comments describing the semantics of these commands.
>>
>> Where should it be documented? In the header file?
> 
> block.h doesn't document prototypes in the header file, please document
> the function definition in block.c.  (QEMU is not consistent here, some
> places do it the other way around.)
> 
>>> Why are these operations needed when there is already a bs->drv == NULL
>>> case which means the BDS is not ready for read/write?
>>>
>>
>> The purpos is that: don't connect to nbd server when opening a nbd client.
>> connect/disconnect
>> to nbd server when we need to do it.
>>
>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>> connect/disconnect
>> means that connect/disconnect to remote target(The target may be in another
>> host).
> 
> Connect/disconnect puts something on the QEMU command-line that isn't
> ready at startup time.
> 
> How about using monitor commands to add objects when needed instead?

We have decieded use this way here:
http://lists.gnu.org/archive/html/qemu-devel/2015-04/msg02975.html

Thanks
Wen Congyang

> 
> That is cleaner because it doesn't introduce a new state (which is only
> implemented for nbd).
>

Re: [Qemu-devel] [PATCH v5 1/3] ich9: add TCO interface emulation

2015-06-22 Thread Michael S. Tsirkin

On Mon, Jun 22, 2015 at 08:10:27PM -0300, Paulo Alcantara wrote:
> This interface provides some registers within a 32-byte range and can be
> acessed through PCI-to-LPC bridge interface (PMBASE + 0x60).
> 
> It's commonly used as a watchdog timer to detect system lockups through
> SMIs that are generated -- if TCO_EN bit is set -- on every timeout. If
> NO_REBOOT bit is not set in GCS (General Control and Status register),
> the system will be resetted upon second timeout if TCO_RLD register
> wasn't previously written to prevent timeout.
> 
> This patch adds support to TCO watchdog logic and few other features
> like mapping NMIs to SMIs (NMI2SMI_EN bit), system intruder detection,
> etc. are not implemented yet.
> 
> Signed-off-by: Paulo Alcantara 

One useful feature to implement could be ability to set the NO_REBOOT
strapping pin to 1. And maybe it's even a better default - will have
to experiment with this feature some more.
Does not have to block this patchset, can be done by a patch on top.

-- 
MST

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-22 Thread Peter Lieven


Am 22.06.2015 um 23:54 schrieb John Snow:


On 06/22/2015 09:09 AM, Peter Lieven wrote:

Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:

On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven  wrote:

Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:

On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven  wrote:

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:

Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from
/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
nfds=3,
   timeout=4999424576) at qemu-timer.c:326
   ts = {tv_sec = 4, tv_nsec = 999424576}
   tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0,
blocking=true)
   at aio-posix.c:231
   node = 0x0
   was_dispatching = false
   ret = 1
   progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
offset=4292007936,
   qiov=0x7554f760, is_write=false, flags=0) at
block.c:2699
   aio_context = 0x563528e0
   co = 0x563888a0
   rwco = {bs = 0x5637eae0, offset = 4292007936,
 qiov = 0x7554f760, is_write = false, ret =
2147483647,
flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4, is_write=false,
flags=0)
   at block.c:2722
   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
size =
2048}
   iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0,
sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820,
sector_num=8382828,
   buf=0x744cc800 "(", nb_sectors=4) at
block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88,
lba=2095707,
   buf=0x744cc800 "(", sector_size=2048) at
hw/ide/atapi.c:116
   ret = 32767

Here is the problem: The ATAPI emulation uses synchronous
blk_read()
instead of the AIO or coroutine interfaces. This means that it
keeps
polling for request completion while it holds the BQL until the
request
is completed.

I will look at this.

I need some further help. My way to "emulate" a hung NFS Server is to
block it in the Firewall. Currently I face the problem that I
cannot mount
a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
tried with
a kernel NFS mount). It reads a few sectors and then stalls (maybe
another
bug):

(gdb) thread apply all bt full

Thread 3 (Thread 0x70c21700 (LWP 29710)):
#0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
util/qemu-thread-posix.c:120
  err = 
  __func__ = "qemu_cond_broadcast"
#1  0x55911164 in rfifolock_unlock
(r=r@entry=0x56259910) at
util/rfifolock.c:75
  __PRETTY_FUNCTION__ = "rfifolock_unlock"
#2  0x55875921 in aio_context_release
(ctx=ctx@entry=0x562598b0)
at async.c:329
No locals.
#3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
blocking=blocking@entry=true) at aio-posix.c:272
  node = 
  was_dispatching = false
  i = 
  ret = 
  progress = false
  timeout = 611734526
  __PRETTY_FUNCTION__ = "aio_poll"
#4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
block/io.c:552
  aio_context = 0x562598b0
  co = 
  rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
0x70c208f0, is_write = false, ret = 2147483647, flags =
(unknown: 0)}
#5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
  flags=flags@entry=(unknown: 0)) at block/io.c:575
  qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size
= 2048}
  iov = {iov_base = 0x57874800, iov_len = 2048}
#6  0x558bc593 in bdrv_read (bs=,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
nb_sectors=nb_sectors@entry=4) at block/io.c:583
No locals.
#7  0x558af75d in blk_read (blk=,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
  ret = 
#8  0x557abb88 in cd_read_sector (sector_size=,
buf=0x57874800 "(", lba=3437, s=0x5760db70) at
hw/ide/atapi.c:116
  ret = 
#9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190
  byte_count_limit = 
  size = 
  ret = 2

This is still the same scenario Kevin explained.

The ATAPI CD-ROM emulati

Re: [Qemu-devel] [PATCH 1/2] virito-pci: fix OVERRUN problem

2015-06-22 Thread Michael S. Tsirkin

Although the subject is virtio-pci. Fixed up.

[Qemu-devel] [PATCH] MAINTAINERS: add ACPI entry

2015-06-22 Thread Michael S. Tsirkin

Igor agreed to help review ACPI patches, add an entry to MAINTAINERS
with all ACPI stuff I could think of.
Note: I listed ARM ACPI files here just to make sure we are Cc'd, no
plan to maintain ACPI for ARM through my tree :)

Signed-off-by: Michael S. Tsirkin 
---
 MAINTAINERS | 12 
 1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7a13d68..5188b32 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -636,7 +636,19 @@ M: Michael S. Tsirkin 
 S: Supported
 F: include/hw/pci/*
 F: hw/pci/*
+
+ACPI
+M: Michael S. Tsirkin 
+M: Igor Mammedov 
+S: Supported
+F: include/hw/acpi/*
+F: hw/mem/*
 F: hw/acpi/*
+F: hw/i386/acpi-build.[hc]
+F: hw/i386/*dsl
+F: hw/arm/virt-acpi-build.c
+F: include/hw/arm/virt-acpi-build.h
+F: scripts/acpi*py
 
 ppc4xx
 M: Alexander Graf 
-- 
MST

Re: [Qemu-devel] [PATCH] thread-win32: fix GetThreadContext() permanently fails

2015-06-22 Thread Stefan Weil


Am 22.06.2015 um 23:54 schrieb Zavadovsky Yan:

Calling SuspendThread() is not enough to suspend Win32 thread.
We need to call GetThreadContext() after SuspendThread()
to make sure that OS have really suspended target thread.
But GetThreadContext() needs for THREAD_GET_CONTEXT
access right on thread object.

This patch adds THREAD_GET_CONTEXT to OpenThread() arguments
and change 'while(GetThreadContext() == SUCCESS)' to
'while(GetThreadContext() == FAILED)'.
So this 'while' loop will stop only after successful grabbing
of thread context(i.e. when thread is really suspended).
Not after the one failed GetThreadContext() call.

Signed-off-by: Zavadovsky Yan 
---
  cpus.c   | 2 +-
  util/qemu-thread-win32.c | 4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/cpus.c b/cpus.c
index b85fb5f..83d5eb5 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1097,7 +1097,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
   * suspended until we can get the context.
   */
  tcgContext.ContextFlags = CONTEXT_CONTROL;
-while (GetThreadContext(cpu->hThread, &tcgContext) != 0) {
+while (GetThreadContext(cpu->hThread, &tcgContext) == 0) {
  continue;
  }
  
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c

index 406b52f..823eca1 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -406,8 +406,8 @@ HANDLE qemu_thread_get_handle(QemuThread *thread)
  
  EnterCriticalSection(&data->cs);

  if (!data->exited) {
-handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME, FALSE,
-thread->tid);
+handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME | 
THREAD_GET_CONTEXT,
+FALSE, thread->tid);
  } else {
  handle = NULL;
  }



I added the contributers of the original code to the cc list.

The modifications look reasonable - if GetThreadContext is needed at all.
We should add an URL to reliable documentation which supports that
claim.

Is it a good idea to run a busy waiting loop? Or would a Sleep(0) in
the loop be better (it allows other threads to run, maybe it helps
them to suspend, too).

Regards
Stefan

Re: [Qemu-devel] [PATCH v3 1/2] vhost user: add support of live migration

2015-06-22 Thread Michael S. Tsirkin

On Wed, Jun 10, 2015 at 03:43:02PM +0200, Thibaut Collet wrote:
> Some vhost client/backend are able to support live migration.
> To provide this service the following features must be added:
> 1. GARP from guest after live migration:
>the VIRTIO_NET_F_GUEST_ANNOUNCE capability is added to vhost-net when 
> netdev
>backend is vhost-user to let virtio-net NIC manages GARP after migration.
> 2. RARP on vhost-user:
>After a live migration RARP are automatically sent to any interfaces. A
>receive callback is added to vhost-user to catch this packet. If guest
>supports VIRTIO_NET_F_GUEST_ANNOUNCE this packet is discarded to avoid
>message duplication (already done by virtio-net NIC). For legacy guest
>without VIRTIO_NET_F_GUEST_ANNOUNCE a warning message is displayed to
>alert the user. RARP must be sent to the vhost client/backend.
> 
> Signed-off-by: Thibaut Collet 

So let's do it gradually
patch 1. negotiate VIRTIO_NET_F_GUEST_ANNOUNCE, add empty recv
 callback to make it not crash
patch 2. more tricks for legacy guests

then I can apply patch 1 straight away.

> ---
>  hw/net/vhost_net.c  |   15 +++
>  include/net/vhost_net.h |1 +
>  net/vhost-user.c|   21 +++--
>  3 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 426b23e..4a42325 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -82,6 +82,8 @@ static const int user_feature_bits[] = {
>  VIRTIO_NET_F_CTRL_MAC_ADDR,
>  VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
>  
> +VIRTIO_NET_F_GUEST_ANNOUNCE,
> +
>  VIRTIO_NET_F_MQ,
>  
>  VHOST_INVALID_FEATURE_BIT
> @@ -365,6 +367,15 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState 
> *ncs,
>  assert(r >= 0);
>  }
>  
> +void vhost_net_inject_rarp(struct vhost_net *net, const uint8_t *buf, size_t 
> size)
> +{
> +if ((net->dev.acked_features & (1 << VIRTIO_NET_F_GUEST_ANNOUNCE)) == 0) 
> {
> +fprintf(stderr,
> +"Warning: Guest with no VIRTIO_NET_F_GUEST_ANNOUNCE support. 
> RARP must be sent by vhost-user backend\n");
> +fflush(stderr);
> +}
> +}
> +
>  void vhost_net_cleanup(struct vhost_net *net)
>  {
>  vhost_dev_cleanup(&net->dev);
> @@ -427,6 +438,10 @@ void vhost_net_stop(VirtIODevice *dev,
>  {
>  }
>  
> +void vhost_net_inject_rarp(struct vhost_net *net, const uint8_t *buf, size_t 
> size);
> +{
> +}
> +
>  void vhost_net_cleanup(struct vhost_net *net)
>  {
>  }
> diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
> index b1c18a3..e82a0f9 100644
> --- a/include/net/vhost_net.h
> +++ b/include/net/vhost_net.h
> @@ -20,6 +20,7 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev);
>  int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, int 
> total_queues);
>  void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs, int 
> total_queues);
>  
> +void vhost_net_inject_rarp(VHostNetState *net, const uint8_t *buf, size_t 
> size);
>  void vhost_net_cleanup(VHostNetState *net);
>  
>  unsigned vhost_net_get_features(VHostNetState *net, unsigned features);
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 8d26728..69c2313 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -66,6 +66,24 @@ static void vhost_user_stop(VhostUserState *s)
>  s->vhost_net = 0;
>  }
>  
> +static ssize_t vhost_user_receive(NetClientState *nc, const uint8_t *buf,
> +  size_t size)
> +{
> +if (size == 60) {
> +/* Assume it is a RARP request sent automatically after a
> + * live migration */
> +/* The RARP must be sent if guest does not support
> + * VIRTIO_NET_F_GUEST_ANNOUNCE */
> +VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> +
> +vhost_net_inject_rarp(s->vhost_net, buf, size);
> +} else {
> +fprintf(stderr,"Vhost user receives unexpected packets\n");
> +fflush(stderr);
> +}
> +return size;
> +}
> +
>  static void vhost_user_cleanup(NetClientState *nc)
>  {
>  VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
> @@ -91,6 +109,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
>  static NetClientInfo net_vhost_user_info = {
>  .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
>  .size = sizeof(VhostUserState),
> +.receive = vhost_user_receive,
>  .cleanup = vhost_user_cleanup,
>  .has_vnet_hdr = vhost_user_has_vnet_hdr,
>  .has_ufo = vhost_user_has_ufo,
> @@ -147,8 +166,6 @@ static int net_vhost_user_init(NetClientState *peer, 
> const char *device,
>  
>  s = DO_UPCAST(VhostUserState, nc, nc);
>  
> -/* We don't provide a receive callback */
> -s->nc.receive_disabled = 1;
>  s->chr = chr;
>  s->nc.queue_index = i;
>  
> -- 
> 1.7.10.4

Re: [Qemu-devel] [PATCH 0/2] fix two overflow problems

2015-06-22 Thread Michael S. Tsirkin

On Tue, Jun 23, 2015 at 09:53:03AM +0800, arei.gong...@huawei.com wrote:
> From: Gonglei 
> 
> 
> Gonglei (2):
>   virito-pci: fix OVERRUN problem
>   qdev: fix OVERFLOW_BEFORE_WIDEN


Applied, thanks a lot.

>  hw/core/qdev-properties.c | 2 +-
>  hw/virtio/virtio-pci.c| 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> -- 
> 1.7.12.4
>

Re: [Qemu-devel] [PATCH v3 2/2] vhost user: Add RARP injection for legacy guest

2015-06-22 Thread Michael S. Tsirkin

On Tue, Jun 23, 2015 at 10:12:17AM +0800, Jason Wang wrote:
> 
> 
> On 06/18/2015 11:16 PM, Thibaut Collet wrote:
> > On Tue, Jun 16, 2015 at 10:05 AM, Jason Wang  wrote:
> >>
> >> On 06/16/2015 03:24 PM, Thibaut Collet wrote:
> >>> If my understanding is correct, on a resume operation, we have the
> >>> following callback trace:
> >>> 1. virtio_pci_restore function that calls all restore call back of
> >>> virtio devices
> >>> 2. virtnet_restore that calls try_fill_recv function for each virtual 
> >>> queues
> >>> 3. try_fill_recv function kicks the virtual queue (through
> >>> virtqueue_kick function)
> >> Yes, but this happens only after pm resume not migration. Migration is
> >> totally transparent to guest.
> >>
> > Hi Jason,
> >
> > After a deeper look in the migration code of QEMU a resume event is
> > always sent when the live migration is finished.
> > On a live migration we have the following callback trace:
> > 1. The VM on the new host is set to the state RUN_STATE_INMIGRATE, the
> > autostart boolean to 1  and calls the qemu_start_incoming_migration
> > function (see function main of vl.c)
> > .
> > 2. call of process_incoming_migration function in
> > migration/migration.c file whatever the way to do the live migration
> > (tcp:, fd:, unix:, exec: ...)
> > 3. call of process_incoming_migration_co function in migration/migration.c
> > 4. call of vm_start function in vl.c (otherwise the migrated VM stay
> > in the pause state, the autostart boolean is set to 1 by the main
> > function in vl.c)
> > 5. call of vm_start function that sets the VM is the RUN_STATE_RUNNING 
> > state.
> > 6. call of qapi_event_send_resume function that ends a resume event to the 
> > VM
> 
> AFAIK, this function sends resume event to qemu monitor not VM.
> 
> >
> > So when a live migration is ended:
> > 1. a resume event is sent to the guest
> > 2. On the reception of this resume event the virtual queue are kicked
> > by the guest
> > 3. Backend vhost user catches this kick and can emit a RARP to guest
> > that does not support GUEST_ANNOUNCE
> >
> > This solution, as solution based on detection of DRIVER_OK status
> > suggested by Michael, allows backend to send the RARP to legacy guest
> > without involving QEMU and add ioctl to vhost-user.
> 
> A question here is did vhost-user code pass status to the backend? If
> not, how can userspace backend detect DRIVER_OK?

Sorry, I must have been unclear.
vhost core calls VHOST_NET_SET_BACKEND on DRIVER_OK.
Unfortunately vhost user currently translates it to VHOST_USER_NONE.

As a work around, I think kicking ioeventfds once you get
VHOST_NET_SET_BACKEND will work.



> >
> > Vhost user backend is free to implement one of this two solutions.
> >
> > The single drawback of these two solutions is useless RARP sending in
> > some case (reboot, ...). These messages are harmless and probably not
> > blocking for a "light" patch to support live migration with vhost
> > user.
> >
> > If you agree
> >
> > 1. The first patch must be updated by:
> >   - removing the warning trace
> >   - setting the error trace inside a static bool flag to only
> > print this once
> >   - removing the vhost_net_inject_rarp function (no more useful)
> > 2. The second patch can be removed. Management of legacy guest for
> > rarp sending can be done by modifications in backend without any
> > change in QEMU.
> >
> > Best regards.
> >
> > Thibaut.
> >

Re: [Qemu-devel] [PATCH qemu v8 02/14] vfio: spapr: Move SPAPR-related code to a separate file

2015-06-22 Thread David Gibson

On Thu, Jun 18, 2015 at 09:37:24PM +1000, Alexey Kardashevskiy wrote:
> This moves SPAPR bits to a separate file to avoid pollution of x86 code.
> 
> This enables spapr-vfio on CONFIG_SOFTMMU (not CONFIG_PSERIES) as
> the config options are only visible in makefiles and not in the source code
> so there is no an obvious way of implementing stubs if hw/vfio/spapr.c is
> not compiled.
> 
> This is a mechanical patch.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 

So, I sent a R-b for this earlier, and I think the patch won't break
anything.

But on consideration I don't really see the point of this.  The parts
that are split out into vfio/spapr.c aren't really spapr specific,
just used a bit more heavily with the spapr model.  That may not even
remain the case if guest-visible x86 IOMMUs become more common.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

pgpX47PGXDHLA.pgp
Description: PGP signature

[Qemu-devel] [Bug 1329956] Re: multi-core FreeBSD guest hangs after warm reboot

2015-06-22 Thread Vasiliy Tolstov

In my testing disable apicv does not needed, but i need latest stable
seabios from seabios site.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1329956

Title:
  multi-core FreeBSD guest hangs after warm reboot

Status in QEMU:
  Incomplete

Bug description:
  On some Linux KVM hosts in our environment, FreeBSD guests fail to
  reboot properly if they have more than one CPU (socket, core, and/or
  thread). They will boot fine the first time, but after issuing a
  "reboot" command via the OS the guest starts to boot but hangs during
  SMP initialization. Fully shutting down and restarting the guest works
  in all cases.

  The only meaningful difference between hosts with the problem and those 
without is the CPU. Hosts with Xeon E5-26xx v2 processors have the problem, 
including at least the "Intel(R) Xeon(R) CPU E5-2667 v2" and the "Intel(R) 
Xeon(R) CPU E5-2650 v2".
  Hosts with any other CPU, including "Intel(R) Xeon(R) CPU E5-2650 0", 
"Intel(R) Xeon(R) CPU E5-2620 0", or "AMD Opteron(TM) Processor 6274" do not 
have the problem. Note the "v2" in the names of the problematic CPUs.

  On hosts with a "v2" Xeon, I can reproduce the problem under Linux
  kernel 3.10 or 3.12 and Qemu 1.7.0 or 2.0.0.

  The problem occurs with all currently-supported versions of FreeBSD,
  including 8.4, 9.2, 10.0 and 11-CURRENT.

  On a Linux KVM host with a "v2" Xeon, this command line is adequate to
  reproduce the problem:

  /usr/bin/qemu-system-x86_64 -machine accel=kvm -name bsdtest -m 512
  -smp 2,sockets=1,cores=1,threads=2 -drive
  file=./20140613_FreeBSD_9.2-RELEASE_ufs.qcow2,if=none,id=drive0,format=qcow2
  -device virtio-blk-pci,scsi=off,drive=drive0 -vnc 0.0.0.0:0 -net none

  I have tried many variations including different models of -machine
  and -cpu for the guest with no visible difference.

  A native FreeBSD installation on a host with a "v2" Xeon does not have
  the problem, nor do a paravirtualized FreeBSD guests under bhyve (the
  BSD legacy-free hypervisor) using the same FreeBSD disk images as on
  the Linux hosts. So it seems unlikely the cause is on the FreeBSD side
  of things.

  I would greatly appreciate any feedback or developer attention to
  this. I am happy to provide additional details, test patches, etc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1329956/+subscriptions

Re: [Qemu-devel] [PATCH v7 0/1] balloon: add a feature bit to let Guest OS deflate

2015-06-22 Thread Michael S. Tsirkin

On Mon, Jun 22, 2015 at 02:09:48PM -0700, James Bottomley wrote:
> On Mon, 2015-06-22 at 15:56 +0300, Denis V. Lunev wrote:
> > On 15/06/15 13:52, Denis V. Lunev wrote:
> > > Excessive virtio_balloon inflation can cause invocation of OOM-killer,
> > > when Linux is under severe memory pressure. Various mechanisms are
> > > responsible for correct virtio_balloon memory management. Nevertheless it
> > > is often the case that these control tools does not have enough time to
> > > react on fast changing memory load. As a result OS runs out of memory and
> > > invokes OOM-killer. The balancing of memory by use of the virtio balloon
> > > should not cause the termination of processes while there are pages in the
> > > balloon. Now there is no way for virtio balloon driver to free memory at
> > > the last moment before some process get killed by OOM-killer.
> > >
> > > This does not provide a security breach as balloon itself is running
> > > inside Guest OS and is working in the cooperation with the host. Thus
> > > some improvements from Guest side should be considered as normal.
> > >
> > > To solve the problem, introduce a virtio_balloon callback which is
> > > expected to be called from the oom notifier call chain in out_of_memory()
> > > function. If virtio balloon could release some memory, it will make the
> > > system to return and retry the allocation that forced the out of memory
> > > killer to run.
> > >
> > > This behavior should be enabled if and only if appropriate feature bit
> > > is set on the device. It is off by default.
> > >
> > > This functionality was recently merged into vanilla Linux (actually in
> > > linux-next at the moment)
> > >
> > >commit 5a10b7dbf904bfe01bb9fcc6298f7df09eed77d5
> > >Author: Raushaniya Maksudova 
> > >Date:   Mon Nov 10 09:36:29 2014 +1030
> > >
> > > This patch adds respective control bits into QEMU. It introduces
> > > deflate-on-oom option for baloon device which do the trick.
> > >
> > > Changes from v6:
> > > - ported to virtio_add_feature
> > >
> > > Changes from v5:
> > > - ported to QEMU current
> > >
> > > Changes from v4:
> > > - spelling corrected according to suggestions from Eric Blake
> > >
> > > Changes from v3:
> > > - ported to git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git 
> > > tags/for_upstream_rebased
> > >
> > > Changes from v2:
> > > - fixed mistake with bit number in virtio_balloon_get_features
> > >
> > > Changes from v1:
> > > - From: in patch 1 according to the original ownership
> > > - feature processing in patch 2 as suggested by Michael. It could be done
> > >without additional field, but this will require to move the property
> > >level up, i.e. to PCI & CCW level.
> > >
> > > Signed-off-by: Raushaniya Maksudova 
> > > Signed-off-by: Denis V. Lunev 
> > > CC: Anthony Liguori 
> > > CC: Michael S. Tsirkin 
> > >
> > > P.S. Sorry for resend (if you have caught additional patch), I have
> > >   expirienced some troubles in the process
> > >
> > ping
> 
> We seem to have become bogged down in a dispute over whether this flag
> should be automatically enabled or disabled.  To be honest, we don't
> care, since we're going to be shipping qemu configured according to our
> requirements (as are most other distros anyway).
> 
> However, for the sake of getting the patch in, what about putting it in
> as is (default disable) because that has no impact to the status quo.
> If it later turns out everyone ships in a default enabled configuration,
> why then someone can patch upstream qemu to match.
> 
> James
> 

Great, so I get your ack on it?

Re: [Qemu-devel] [PATCH v4 0/4] Extend TPM support with a QEMU-external TPM

2015-06-22 Thread Michael S. Tsirkin

On Mon, Jun 08, 2015 at 07:17:33AM -0400, Stefan Berger wrote:
> The following series of patches extends TPM support with an
> external TPM that offers a Linux CUSE (character device in userspace)
> interface. This TPM lets each VM access its own private vTPM.
> The CUSE TPM supports suspend/resume and migration. Much
> out-of-band functionality necessary to control the CUSE TPM is
> implemented using ioctls.

I was hoping this can get a wider discussion, but apparently no one
noticed this.

This needs some thought: how do we decide which ioctls we support?
It's easier with kernel since we know distros ship it, but
will they do so with this tpm? We do want to reuse system components
but we don't want random parts of QEMU delegated to a random
out of tree module.

Couldn't you re-use in-kernel interfaces for the CUSE module?
Then existing pass-through in QEMU would more or less just work with it -
merely open a different chardev.

> Stefan Berger (4):
>   Provide support for the CUSE TPM
>   Introduce condition to notify waiters of completed command
>   Introduce condition in TPM backend for notification
>   Add support for VM suspend/resume for TPM TIS
> 
>  hmp.c|   6 +
>  hw/tpm/tpm_int.h |   4 +
>  hw/tpm/tpm_ioctl.h   | 209 ++
>  hw/tpm/tpm_passthrough.c | 409 
> +--
>  hw/tpm/tpm_tis.c | 151 +++-
>  hw/tpm/tpm_tis.h |   2 +
>  hw/tpm/tpm_util.c| 223 +++
>  hw/tpm/tpm_util.h|   7 +
>  include/sysemu/tpm_backend.h |  12 ++
>  qapi-schema.json |  18 +-
>  qemu-options.hx  |  21 ++-
>  qmp-commands.hx  |   2 +-
>  tpm.c|  11 +-
>  13 files changed, 1056 insertions(+), 19 deletions(-)
>  create mode 100644 hw/tpm/tpm_ioctl.h
> 
> -- 
> 1.9.3

[Qemu-devel] [PATCH QEMU] target-arm: A64: Print ELR when taking exceptions

2015-06-22 Thread Soren Brinkmann

When taking an exception print the content of the exception link
register. This is useful especially for synchronous exceptions because
in that case this registers holds the address of the instruction that
generated the exception.

Signed-off-by: Soren Brinkmann 
---
 target-arm/helper-a64.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index e30af0659e29..08c95a3f5202 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -533,6 +533,8 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
 
 env->condexec_bits = 0;
 }
+qemu_log_mask(CPU_LOG_INT, "...with ELR 0x%" PRIx64 "\n",
+  env->elr_el[new_el]);
 
 pstate_write(env, PSTATE_DAIF | new_mode);
 env->aarch64 = 1;
-- 
2.4.4.3.gd756881

Re: [Qemu-devel] [PATCH v5] i386: Introduce ARAT CPU feature

2015-06-22 Thread Jan Kiszka

On 2015-06-23 04:50, Wanpeng Li wrote:
> 
> 
> On 6/22/15 1:38 AM, Jan Kiszka wrote:
>> On 2015-06-18 22:21, Eduardo Habkost wrote:
>>> On Sun, Jun 07, 2015 at 11:15:08AM +0200, Jan Kiszka wrote:
 From: Jan Kiszka 

 ARAT signals that the APIC timer does not stop in power saving states.
 As our APICs are emulated, it's fine to expose this feature to guests,
 at least when asking for KVM host features or with CPU types that
 include the flag. The exact model number that introduced the feature is
 not known, but reports can be found that it's at least available since
 Sandy Bridge.

 Signed-off-by: Jan Kiszka 
>>> The code looks good now, but: what are the real consequences of
>>> enabling/disabling the flag? What exactly guests use it for?
>>>
>>> Isn't this going to make guests have additional expectations about the
>>> APIC timer that may be broken when live-migrating or pausing the VM?
>> ARAT only refers to stopping of the timer in certain power states (which
>> we do not even emulate IIRC). In that case, the OS is under risk of
>> sleeping forever, thus need to look for a different wakeup source.
> 
> HPET will always be the default broadcast event device I think.

But it's unused (under Linux) if per-cpu clockevents are unaffected by
CLOCK_EVT_FEAT_C3STOP (x86-only "none-feature"), i.e. have ARAT set. And
other guests may have other strategies to deal with missing ARAT.

Again, the scenario for me was not a regular setup but some Jailhouse
boot of Linux where neither a HPET nor a PIT are available as broadcast
sources and Linux therefore refuses to switch to hires mode - in
contrast to running on real hardware.

Jan

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] Re-attach usb device to kernel while usb_host_open fails

2015-06-22 Thread Lin Ma

Signed-off-by: Lin Ma 
---
 hw/usb/host-libusb.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
index 10f4735..7258c4d 100644
--- a/hw/usb/host-libusb.c
+++ b/hw/usb/host-libusb.c
@@ -888,6 +888,11 @@ static int usb_host_open(USBHostDevice *s, libusb_device 
*dev)
 fail:
 trace_usb_host_open_failure(bus_num, addr);
 if (s->dh != NULL) {
+qemu_remove_exit_notifier(&s->exit);
+QTAILQ_REMOVE(&hostdevs, s, next);
+usb_host_release_interfaces(s);
+libusb_reset_device(s->dh);
+usb_host_attach_kernel(s);
 libusb_close(s->dh);
 s->dh = NULL;
 s->dev = NULL;
-- 
2.1.4

Re: [Qemu-devel] [PATCH] rdma: fix memory leak

2015-06-22 Thread Amit Shah

On (Tue) 23 Jun 2015 [09:02:28], arei.gong...@huawei.com wrote:
> From: Gonglei 
> 
> Variable "r" going out of scope leaks the storage
> it points to in line 3268.

Interestingly Dave posted a different patch for the same issue.  I
prefer this approach.

Reviewed-by: Amit Shah 

Amit

Re: [Qemu-devel] [PATCH] iov: don't touch iov in iov_send_recv()

2015-06-22 Thread Wen Congyang

Ping again.

This patch is bugfix. The old discussion is here:
http://lists.nongnu.org/archive/html/qemu-devel/2015-02/msg00245.html

On 05/21/2015 09:50 AM, Wen Congyang wrote:
> Signed-off-by: Wen Congyang 
> ---
>  include/qemu/iov.h |  2 +-
>  util/iov.c | 14 +-
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qemu/iov.h b/include/qemu/iov.h
> index 68d25f2..569b2c2 100644
> --- a/include/qemu/iov.h
> +++ b/include/qemu/iov.h
> @@ -75,7 +75,7 @@ size_t iov_memset(const struct iovec *iov, const unsigned 
> int iov_cnt,
>   * For iov_send_recv() _whole_ area being sent or received
>   * should be within the iovec, not only beginning of it.
>   */
> -ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt,
> +ssize_t iov_send_recv(int sockfd, const struct iovec *iov, unsigned iov_cnt,
>size_t offset, size_t bytes, bool do_send);
>  #define iov_recv(sockfd, iov, iov_cnt, offset, bytes) \
>iov_send_recv(sockfd, iov, iov_cnt, offset, bytes, false)
> diff --git a/util/iov.c b/util/iov.c
> index 2fb18e6..a0d5934 100644
> --- a/util/iov.c
> +++ b/util/iov.c
> @@ -133,7 +133,7 @@ do_send_recv(int sockfd, struct iovec *iov, unsigned 
> iov_cnt, bool do_send)
>  #endif
>  }
>  
> -ssize_t iov_send_recv(int sockfd, struct iovec *iov, unsigned iov_cnt,
> +ssize_t iov_send_recv(int sockfd, const struct iovec *_iov, unsigned iov_cnt,
>size_t offset, size_t bytes,
>bool do_send)
>  {
> @@ -141,6 +141,16 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
> unsigned iov_cnt,
>  ssize_t ret;
>  size_t orig_len, tail;
>  unsigned niov;
> +struct iovec *local_iov, *iov;
> +
> +if (bytes <= 0) {
> +return 0;
> +}
> +
> +local_iov = g_new0(struct iovec, iov_cnt);
> +iov_copy(local_iov, iov_cnt, _iov, iov_cnt, offset, bytes);
> +offset = 0;
> +iov = local_iov;
>  
>  while (bytes > 0) {
>  /* Find the start position, skipping `offset' bytes:
> @@ -187,6 +197,7 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
> unsigned iov_cnt,
>  
>  if (ret < 0) {
>  assert(errno != EINTR);
> +g_free(local_iov);
>  if (errno == EAGAIN && total > 0) {
>  return total;
>  }
> @@ -205,6 +216,7 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
> unsigned iov_cnt,
>  bytes -= ret;
>  }
>  
> +g_free(local_iov);
>  return total;
>  }
>  
>

Re: [Qemu-devel] [PATCH] vhost: correctly pass error to caller in vhost_dev_enable_notifiers()

2015-06-22 Thread Jason Wang



On 05/29/2015 02:13 PM, Jason Wang wrote:
> We override the error value r in fail_vq, this will cause the caller
> can't detect the failure which may cause the caller may disable the
> notifiers twice if vhost is failed to start. Fix this by using another
> variable to keep track the return value of set_host_notifier().
>
> Fixes b0b3db79559e57db340b292621c397e7a6cdbdc5 ("vhost-net: cleanup
> host notifiers at last step")
>
> Cc: qemu-sta...@nongnu.org
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
> ---
>  hw/virtio/vhost.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 54851b7..a7858d3 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -921,7 +921,7 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
> VirtIODevice *vdev)
>  BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
>  VirtioBusState *vbus = VIRTIO_BUS(qbus);
>  VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
> -int i, r;
> +int i, r, e;
>  if (!k->set_host_notifier) {
>  fprintf(stderr, "binding does not support host notifiers\n");
>  r = -ENOSYS;
> @@ -939,12 +939,12 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
> VirtIODevice *vdev)
>  return 0;
>  fail_vq:
>  while (--i >= 0) {
> -r = k->set_host_notifier(qbus->parent, hdev->vq_index + i, false);
> -if (r < 0) {
> +e = k->set_host_notifier(qbus->parent, hdev->vq_index + i, false);
> +if (e < 0) {
>  fprintf(stderr, "vhost VQ %d notifier cleanup error: %d\n", i, 
> -r);
>  fflush(stderr);
>  }
> -assert (r >= 0);
> +assert (e >= 0);
>  }
>  fail:
>  return r;

Ping

Re: [Qemu-devel] Greate difference of disk I/O performance for guest on Qemu-2.30 of CentOS.

2015-06-22 Thread cauchy-love


thanks for your help. I don't think it is the filesystem that causes this 
problem since when I updated the kernel of CentOS6.5 to the CentOS 7.0's 
version and the disk io performance was nearly the same as CentOS 7.0.



--
发自我的网易邮箱手机智能版


在 2015-06-19 16:37:23，"Paolo Bonzini"  写道：
>
>
>On 19/06/2015 03:03, cauchy-love wrote:
>> I have tried your qemu cammand line but got no luck (the embedded os
>> have no virtio support but this is not the problem i think). The
>> problem probably lies in the host kernel version as it is the only
>> difference for my tests. I traced the guest kernel and found the ATA
>> drivers always used non-DMA mode (CentOS 7 and CentOS 6.5 are the
>> same at this point).
>
>You can use perf to see what's the difference.
>
>Perhaps CentOS 6.5 used ext4 and 7 uses XFS?  Though in general XFS is
>the faster one...
>
>Paolo
>

Re: [Qemu-devel] [RFC PATCH v4 1/5] spapr: Initialize hotplug memory address space

2015-06-22 Thread David Gibson

On Fri, Jun 19, 2015 at 03:47:53PM +0530, Bharata B Rao wrote:
> Initialize a hotplug memory region under which all the hotplugged
> memory is accommodated. Also enable memory hotplug by setting
> CONFIG_MEM_HOTPLUG.
> 
> Modelled on i386 memory hotplug.
> 
> Signed-off-by: Bharata B Rao 
> ---
>  default-configs/ppc64-softmmu.mak |  1 +
>  hw/ppc/spapr.c| 28 
>  include/hw/ppc/spapr.h| 12 
>  3 files changed, 41 insertions(+)
> 
> diff --git a/default-configs/ppc64-softmmu.mak 
> b/default-configs/ppc64-softmmu.mak
> index ab62cc7..e77cb1a 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -52,3 +52,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
>  # For PReP
>  CONFIG_MC146818RTC=y
>  CONFIG_ISA_TESTDEV=y
> +CONFIG_MEM_HOTPLUG=y
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5ca817c..87a29dc 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1549,6 +1549,34 @@ static void ppc_spapr_init(MachineState *machine)
>  memory_region_add_subregion(sysmem, 0, rma_region);
>  }
>  
> +/* initialize hotplug memory address space */
> +if (machine->ram_size < machine->maxram_size) {
> +ram_addr_t hotplug_mem_size = machine->maxram_size - 
> machine->ram_size;
> +
> +if (machine->ram_slots > SPAPR_MAX_RAM_SLOTS) {
> +error_report("unsupported amount of memory slots: %"PRIu64,
> +  machine->ram_slots);
> +exit(EXIT_FAILURE);
> +}
> +
> +spapr->hotplug_memory.base = ROUND_UP(machine->ram_size,
> +  SPAPR_HOTPLUG_MEM_ALIGN);
> +
> +hotplug_mem_size += SPAPR_HOTPLUG_MEM_ALIGN * machine->ram_slots;

I'm not sure what this adjustment is about.  Are you putting a gap of
size SPAPR_HOTPLUG_MEM_ALIGN between each memory slot?  That doesn't
see to match the DRC initialization code in the next patch which
assigns all the LMBs in the hotplug area contiguous addresses.

> +if ((spapr->hotplug_memory.base + hotplug_mem_size) <
> + hotplug_mem_size) {
> +error_report("unsupported amount of maximum memory: " 
> RAM_ADDR_FMT,
> + machine->maxram_size);
> +exit(EXIT_FAILURE);
> +}
> +
> +memory_region_init(&spapr->hotplug_memory.mr, OBJECT(spapr),
> +   "hotplug-memory", hotplug_mem_size);
> +memory_region_add_subregion(sysmem, spapr->hotplug_memory.base,
> +&spapr->hotplug_memory.mr);
> +}
> +
>  filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
>  if (!filename) {
>  error_report("Could not find LPAR rtas '%s'", "spapr-rtas.bin");
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 91a61ab..8a1929b 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -5,6 +5,7 @@
>  #include "hw/boards.h"
>  #include "hw/ppc/xics.h"
>  #include "hw/ppc/spapr_drc.h"
> +#include "hw/mem/pc-dimm.h"
>  
>  struct VIOsPAPRBus;
>  struct sPAPRPHBState;
> @@ -76,6 +77,7 @@ struct sPAPRMachineState {
>  
>  /*< public >*/
>  char *kvm_type;
> +MemoryHotplugState hotplug_memory;
>  };
>  
>  #define H_SUCCESS 0
> @@ -609,4 +611,14 @@ int spapr_rtc_import_offset(DeviceState *dev, int64_t 
> legacy_offset);
>  
>  #define SPAPR_MEMORY_BLOCK_SIZE (1 << 28) /* 256MB */
>  
> +/*
> + * This defines the maximum number of DIMM slots we can have for sPAPR
> + * guest. This is not defined by sPAPR but we are defining it to 32 slots
> + * based on default number of slots provided by PowerPC kernel.
> + */
> +#define SPAPR_MAX_RAM_SLOTS 32
> +
> +/* 1GB alignment for hotplug memory region */
> +#define SPAPR_HOTPLUG_MEM_ALIGN (1ULL << 30)
> +
>  #endif /* !defined (__HW_SPAPR_H__) */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgp3oQCEnH1xx.pgp
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH v4 4/5] spapr: Make hash table size a factor of maxram_size

2015-06-22 Thread David Gibson

On Fri, Jun 19, 2015 at 03:47:56PM +0530, Bharata B Rao wrote:
> The hash table size is dependent on ram_size, but since with hotplug
> the memory can grow till maxram_size. Hence make hash table size dependent
> on maxram_size.
> 
> This allows to hotplug huge amounts of memory to the guest.
> 
> Signed-off-by: Bharata B Rao 

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpMnvcXBTTh5.pgp
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH v4 3/5] spapr: Support ibm, dynamic-reconfiguration-memory

2015-06-22 Thread David Gibson

On Fri, Jun 19, 2015 at 03:47:55PM +0530, Bharata B Rao wrote:
> Parse ibm,architecture.vec table obtained from the guest and enable
> memory node configuration via ibm,dynamic-reconfiguration-memory if guest
> supports it. This is in preparation to support memory hotplug for
> sPAPR guests.
> 
> This changes the way memory node configuration is done. Currently all
> memory nodes are built upfront. But after this patch, only memory@0 node
> for RMA is built upfront. Guest kernel boots with just that and rest of
> the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory)
> are built when guest does ibm,client-architecture-support call.
> 
> Note: This patch needs a SLOF enhancement which is already part of
> SLOF binary in QEMU.
> 
> Signed-off-by: Bharata B Rao 

[snip]
> +int spapr_h_cas_compose_response(sPAPRMachineState *spapr,
> + target_ulong addr, target_ulong size,
> + bool cpu_update, bool memory_update)
> +{
> +void *fdt, *fdt_skel;
> +sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
> +
> +size -= sizeof(hdr);
> +
> +/* Create sceleton */
> +fdt_skel = g_malloc0(size);
> +_FDT((fdt_create(fdt_skel, size)));
> +_FDT((fdt_begin_node(fdt_skel, "")));
> +_FDT((fdt_end_node(fdt_skel)));
> +_FDT((fdt_finish(fdt_skel)));
> +fdt = g_malloc0(size);
> +_FDT((fdt_open_into(fdt_skel, fdt, size)));
> +g_free(fdt_skel);
> +
> +/* Fixup cpu nodes */
> +if (cpu_update) {
> +_FDT((spapr_fixup_cpu_dt(fdt, spapr)));
> +}

The cpu_update parameter seems like its not related to memory hotplug
at all.  I'm guessing it relates to CPU hotplug, in which case please
defer it until those patches are ready to go.

> +
> +/* Generate memory nodes or ibm,dynamic-reconfiguration-memory node */
> +if (memory_update && spapr->dr_lmb_enabled) {
> +_FDT((spapr_populate_drconf_memory(spapr, fdt)));
> +} else {
> +_FDT((spapr_populate_memory(spapr, fdt)));
> +}
> +
> +/* Pack resulting tree */
> +_FDT((fdt_pack(fdt)));
> +
> +if (fdt_totalsize(fdt) + sizeof(hdr) > size) {
> +trace_spapr_cas_failed(size);
> +return -1;
> +}
> +
> +cpu_physical_memory_write(addr, &hdr, sizeof(hdr));
> +cpu_physical_memory_write(addr + sizeof(hdr), fdt, fdt_totalsize(fdt));
> +trace_spapr_cas_continue(fdt_totalsize(fdt) + sizeof(hdr));
> +g_free(fdt);
> +
> +return 0;
> +}
> +
>  static void spapr_finalize_fdt(sPAPRMachineState *spapr,
> hwaddr fdt_addr,
> hwaddr rtas_addr,
> @@ -756,10 +866,16 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
>  /* open out the base tree into a temp buffer for the final tweaks */
>  _FDT((fdt_open_into(spapr->fdt_skel, fdt, FDT_MAX_SIZE)));
>  
> -ret = spapr_populate_memory(spapr, fdt);
> -if (ret < 0) {
> -fprintf(stderr, "couldn't setup memory nodes in fdt\n");
> -exit(1);
> +/*
> + * Add memory@0 node to represent RMA. Rest of the memory is either
> + * represented by memory nodes or ibm,dynamic-reconfiguration-memory
> + * node later during ibm,client-architecture-support call.
> + */
> +for (i = 0; i < nb_numa_nodes; ++i) {
> +if (numa_info[i].node_mem) {
> +spapr_populate_memory_node(fdt, i, 0, spapr->rma_size);
> +break;
> +}

?? The code doesn't seem to match the comment - you appear to be
creating a memory@0 node for every NUMA node, not just for the RMA,
which doesn't make much sense.

>  }
>  
>  ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 652ddf6..2caac82 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -808,6 +808,32 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  return ret;
>  }
>  
> +/*
> + * Return the offset to the requested option vector @vector in the
> + * option vector table @table.
> + */
> +static target_ulong cas_get_option_vector(int vector, target_ulong table)
> +{
> +int i;
> +char nr_vectors, nr_entries;
> +
> +if (!table) {
> +return 0;
> +}
> +
> +nr_vectors = (ldl_phys(&address_space_memory, table) >> 24) + 1;
> +if (!vector || vector > nr_vectors) {
> +return 0;
> +}
> +table++; /* skip nr option vectors */
> +
> +for (i = 0; i < vector - 1; i++) {
> +nr_entries = ldl_phys(&address_space_memory, table) >> 24;
> +table += nr_entries + 2;
> +}
> +return table;
> +}
> +
>  typedef struct {
>  PowerPCCPU *cpu;
>  uint32_t cpu_version;
> @@ -828,19 +854,22 @@ static void do_set_compat(void *arg)
>  ((cpuver) == CPU_POWERPC_LOGICAL_2_06_PLUS) ? 2061 : \
>  ((cpuver) == CPU_POWERPC_LOGICAL_2_07) ? 2070 : 0)
>  
> +#define OV5_DRCONF_MEMORY 0x20
> +
>  static targ

Re: [Qemu-devel] [RFC PATCH v4 5/5] spapr: Memory hotplug support

2015-06-22 Thread David Gibson

On Fri, Jun 19, 2015 at 03:47:57PM +0530, Bharata B Rao wrote:
> Make use of pc-dimm infrastructure to support memory hotplug
> for PowerPC.
> 
> Signed-off-by: Bharata B Rao 

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgppyQ0EuO5bA.pgp
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH v4 2/5] spapr: Add LMB DR connectors

2015-06-22 Thread David Gibson

On Fri, Jun 19, 2015 at 03:47:54PM +0530, Bharata B Rao wrote:
> Enable memory hotplug for pseries 2.4 and add LMB DR connectors.
> With memory hotplug, enforce NUMA node memory size and maxmem to be
> a multiple of SPAPR_MEMORY_BLOCK_SIZE (256M) since that's the granularity
> in which LMBs are represented and hot-added.
> 
> LMB DR connectors will be used by the memory hotplug code.
> 
> Signed-off-by: Bharata B Rao 
> Signed-off-by: Michael Roth 
>[spapr_drc_reset implementation]
> ---
>  hw/ppc/spapr.c | 78 
> ++
>  include/hw/ppc/spapr.h |  2 ++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 87a29dc..f9af89b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -59,6 +59,7 @@
>  #include "hw/nmi.h"
>  
>  #include "hw/compat.h"
> +#include "qemu-common.h"
>  
>  #include 
>  
> @@ -1436,10 +1437,76 @@ static void spapr_cpu_init(sPAPRMachineState *spapr, 
> PowerPCCPU *cpu)
>  qemu_register_reset(spapr_cpu_reset, cpu);
>  }
>  
> +static void spapr_drc_reset(void *opaque)

This function needs a different name, since it's only called for LMB
drcs, not all drcs.

> +{
> +sPAPRDRConnector *drc = opaque;
> +DeviceState *d = DEVICE(drc);
> +
> +if (d) {
> +device_reset(d);
> +}
> +}
> +
> +static void spapr_create_lmb_dr_connectors(sPAPRMachineState *spapr)
> +{
> +MachineState *machine = MACHINE(qdev_get_machine());
> +uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> +uint32_t nr_rma_lmbs = spapr->rma_size/lmb_size;
> +uint32_t nr_lmbs = machine->maxram_size/lmb_size - nr_rma_lmbs;
> +uint32_t nr_assigned_lmbs = machine->ram_size/lmb_size - nr_rma_lmbs;
> +int i;
> +
> +for (i = 0; i < nr_lmbs; i++) {
> +sPAPRDRConnector *drc;
> +uint64_t addr;
> +
> +if (i < nr_assigned_lmbs) {
> +addr = (i + nr_rma_lmbs) * lmb_size;
> +} else {
> +addr = (i - nr_assigned_lmbs) * lmb_size +
> +SPAPR_MACHINE(qdev_get_machine())->hotplug_memory.base;
> +}
> +
> +drc = spapr_dr_connector_new(qdev_get_machine(),
> +SPAPR_DR_CONNECTOR_TYPE_LMB, addr/lmb_size);
> +qemu_register_reset(spapr_drc_reset, drc);

Actually.. I'm not sure what spapr_drc_reset is needed for at all.
Won't the device reset hook get called through the normal qdev path
anyway?  The PCI hotplug code doesn't have an explicit register_reset,
so why does the memory hotplug code need it?

> +}
> +}
> +
> +/*
> + * If LMB DR is enabled node memory size and max memory size should
> + * be a multiple of SPAPR_MEMORY_BLOCK_SIZE (256M).
> + */
> +static void spapr_validate_node_memory(sPAPRMachineState *spapr)
> +{
> +int i;
> +MachineState *machine = MACHINE(qdev_get_machine());
> +
> +if (!spapr->dr_lmb_enabled) {
> +return;
> +}
> +
> +if (machine->maxram_size % SPAPR_MEMORY_BLOCK_SIZE) {
> +error_report("maxmem should be a multiple of %lld MB",
> +  SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
> +exit(EXIT_FAILURE);
> +}
> +
> +for (i = 0; i < nb_numa_nodes; i++) {
> +if (numa_info[i].node_mem &&
> +numa_info[i].node_mem % SPAPR_MEMORY_BLOCK_SIZE) {
> +error_report("Memory size on node %d should be a multiple "
> + "of %lld MB", i, SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
> +exit(EXIT_FAILURE);
> +}
> +}
> +}
> +
>  /* pSeries LPAR / sPAPR hardware init */
>  static void ppc_spapr_init(MachineState *machine)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
>  const char *kernel_filename = machine->kernel_filename;
>  const char *kernel_cmdline = machine->kernel_cmdline;
>  const char *initrd_filename = machine->initrd_filename;
> @@ -1518,6 +1585,9 @@ static void ppc_spapr_init(MachineState *machine)
> smp_threads),
>XICS_IRQS);
>  
> +spapr->dr_lmb_enabled = smc->dr_lmb_enabled;

I don't see any point to copying this value into the MachineState -
I'm guessing this is a leftover from sPAPREnvironment.  Anywhere you
have the MachineState you can get to the MachineClass and use the
value directly from there.

> +spapr_validate_node_memory(spapr);
> +
>  /* init CPUs */
>  if (machine->cpu_model == NULL) {
>  machine->cpu_model = kvm_enabled() ? "host" : "POWER7";
> @@ -1577,6 +1647,10 @@ static void ppc_spapr_init(MachineState *machine)
>  &spapr->hotplug_memory.mr);
>  }
>  
> +if (spapr->dr_lmb_enabled) {
> +spapr_create_lmb_dr_connectors(spapr);
> +}
> +
>  filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
>  if (!filename) {
>  error_report("Could not find LPAR

Re: [Qemu-devel] [PATCH v3 10/10] ui: convert VNC to use generic cipher API

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
>  if (!vs->vd->password) {
>  VNC_DEBUG("No password configured on server");
> @@ -2534,9 +2536,29 @@ static int protocol_client_auth_vnc(VncState *vs, 
> uint8_t *data, size_t len)
>  pwlen = strlen(vs->vd->password);
>  for (i=0; i  key[i] = ivd->password[i] : 0;
> -deskey(key, EN0);
> -for (j = 0; j < VNC_AUTH_CHALLENGE_SIZE; j += 8)
> -des(response+j, response+j);
> +
> +cipher = qcrypto_cipher_new(
> +QCRYPTO_CIPHER_ALG_DES_RFB,
> +QCRYPTO_CIPHER_MODE_ECB,
> +key, G_N_ELEMENTS(key),
> +&err);
> +if (!cipher) {
> +VNC_DEBUG("Cannot initialize cipher %s",
> +  error_get_pretty(err));
> +error_free(err);
> +goto reject;
> +}
> +
> +if (qcrypto_cipher_decrypt(cipher,
> +   vs->challenge,
> +   response,
> +   VNC_AUTH_CHALLENGE_SIZE,
> +   &err) < 0) {
> +VNC_DEBUG("Cannot encrypt challenge %s",
> +  error_get_pretty(err));
> +error_free(err);
> +goto reject;
> +}

Do we need change above VNC_DEBUGs to error_report() or something like that?
Anyway, it doesn't influence my R-b:
Reviewed-by: Gonglei

Re: [Qemu-devel] [PATCH v4] hmp: add info iothreads command

2015-06-22 Thread Ting Wang

Hi Luiz and Markus,

Would you like to pick up this patch, which has
been reviewed by Stefan and Fam?

Thanks.

Ting

On 2015-3-13 16:58, Ting Wang wrote:
> Make "info iothreads" available on the HMP monitor.
> 
> The results are as follows:
> id1: thread_id=thread_id1
> id2: thread_id=thread_id2
> 
> Signed-off-by: Ting Wang 
> ---
> v4: use the PRId64 format specifier macro for the int64_t thread_id
> v3: fix comment and the trailing whitespace
> v2: add braces for if
> ---
>  hmp-commands.hx |  2 ++
>  hmp.c   | 22 ++
>  hmp.h   |  1 +
>  monitor.c   |  7 +++
>  4 files changed, 32 insertions(+)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index d5022d8..67d76ed 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1746,6 +1746,8 @@ show roms
>  show the TPM device
>  @item info memory-devices
>  show the memory devices
> +@item info iothreads
> +show iothreads
>  @end table
>  ETEXI
>  
> diff --git a/hmp.c b/hmp.c
> index 71c28bc..445a8ad 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -821,6 +821,28 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
>  qapi_free_TPMInfoList(info_list);
>  }
>  
> +void hmp_info_iothreads(Monitor *mon, const QDict *qdict)
> +{
> +IOThreadInfoList *head = NULL, *elem = NULL;
> +
> +head = qmp_query_iothreads(NULL);
> +if (!head) {
> +monitor_printf(mon, "No iothread has been added\n");
> +return;
> +}
> +
> +elem = head;
> +while (elem) {
> +if (elem->value) {
> +monitor_printf(mon, "%s: thread_id=%" PRId64 "\n", 
> elem->value->id,
> +elem->value->thread_id);
> +}
> +elem = elem->next;
> +}
> +
> +qapi_free_IOThreadInfoList(head);
> +}
> +
>  void hmp_quit(Monitor *mon, const QDict *qdict)
>  {
>  monitor_suspend(mon);
> diff --git a/hmp.h b/hmp.h
> index 81177b2..d99090e 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -38,6 +38,7 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict);
>  void hmp_info_pci(Monitor *mon, const QDict *qdict);
>  void hmp_info_block_jobs(Monitor *mon, const QDict *qdict);
>  void hmp_info_tpm(Monitor *mon, const QDict *qdict);
> +void hmp_info_iothreads(Monitor *mon, const QDict *qdict);
>  void hmp_quit(Monitor *mon, const QDict *qdict);
>  void hmp_stop(Monitor *mon, const QDict *qdict);
>  void hmp_system_reset(Monitor *mon, const QDict *qdict);
> diff --git a/monitor.c b/monitor.c
> index c86a89e..076a306 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2924,6 +2924,13 @@ static mon_cmd_t info_cmds[] = {
>  .mhandler.cmd = hmp_info_memory_devices,
>  },
>  {
> +.name   = "iothreads",
> +.args_type  = "",
> +.params = "",
> +.help   = "show iothreads",
> +.mhandler.cmd = hmp_info_iothreads,
> +},
> +{
>  .name   = NULL,
>  },
>  };
>

Re: [Qemu-devel] [PATCH v3 09/10] block: convert qcow/qcow2 to use generic cipher API

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
> Switch the qcow/qcow2 block driver over to use the generic cipher
> API, this allows it to use the pluggable AES implementations,
> instead of being hardcoded to use QEMU's built-in impl.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/qcow.c  | 102 
> +-
>  block/qcow2-cluster.c |  46 ++-
>  block/qcow2.c |  95 +++---
>  block/qcow2.h |  13 +++
>  4 files changed, 165 insertions(+), 91 deletions(-)

Reviewed-by: Gonglei

Re: [Qemu-devel] [PATCH v3 08/10] ui: convert VNC websockets to use crypto APIs

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
> Remove the direct use of gnutls for hash processing in the
> websockets code, in favour of using the crypto APIs. This
> allows the websockets code to be built unconditionally
> removing countless conditional checks from the VNC code.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  configure| 19 +---
>  ui/Makefile.objs |  2 +-
>  ui/vnc-ws.c  | 22 +--
>  ui/vnc-ws.h  |  2 --
>  ui/vnc.c | 67 
> +++-
>  ui/vnc.h |  8 ---
>  6 files changed, 25 insertions(+), 95 deletions(-)

Reviewed-by: Gonglei

Re: [Qemu-devel] [PATCH v3 06/10] crypto: add a nettle cipher implementation

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
> If we are linking to gnutls already and gnutls is built against
> nettle, then we should use nettle as a cipher backend in
> preference to our built-in backend.
> 
> This will be used when linking against some GNUTLS 2.x versions
> and all GNUTLS 3.x versions.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  configure  |  35 -
>  crypto/cipher-nettle.c | 206 
> +
>  crypto/cipher.c|   2 +
>  3 files changed, 240 insertions(+), 3 deletions(-)
>  create mode 100644 crypto/cipher-nettle.c

Reviewed-by: Gonglei

Re: [Qemu-devel] [PATCH v3 07/10] block: convert quorum blockdrv to use crypto APIs

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
> Get rid of direct use of gnutls APIs in quorum blockdrv in
> favour of using the crypto APIs. This avoids the need to
> do conditional compilation of the quorum driver. It can
> simply report an error at file open file instead if the
> required hash algorithm isn't supported by QEMU.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/Makefile.objs |  2 +-
>  block/quorum.c  | 41 ++---
>  configure   | 39 ---
>  3 files changed, 23 insertions(+), 59 deletions(-)

Reviewed-by: Gonglei

Re: [Qemu-devel] [PATCH v3 04/10] crypto: introduce generic cipher API & built-in implementation

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
> Introduce a generic cipher API and an implementation of it that
> supports only the built-in AES and DES-RFB algorithms.
> 
> The test suite checks the supported algorithms + modes to
> validate that every backend implementation is actually correctly
> complying with the specs.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  crypto/Makefile.objs   |   1 +
>  crypto/cipher-builtin.c| 398 
> +
>  crypto/cipher.c|  50 ++
>  include/crypto/cipher.h| 210 
>  tests/.gitignore   |   1 +
>  tests/Makefile |   2 +
>  tests/test-crypto-cipher.c | 290 +
>  7 files changed, 952 insertions(+)
>  create mode 100644 crypto/cipher-builtin.c
>  create mode 100644 crypto/cipher.c
>  create mode 100644 include/crypto/cipher.h
>  create mode 100644 tests/test-crypto-cipher.c

Reviewed-by: Gonglei

Re: [Qemu-devel] [PATCH v3 03/10] crypto: move built-in D3DES implementation into crypto/

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
> To prepare for a generic internal cipher API, move the
> built-in D3DES implementation into the crypto/ directory.
> 
> This is not in fact a normal D3DES implementation, it is
> D3DES with double & triple length modes removed, and the
> key bytes in reversed bit order. IOW it is crippled
> specifically for the "benefit" of RFB, so call the new
> files desrfb.c instead of d3des.c to make it clear that
> it isn't a generally useful impl.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  crypto/Makefile.objs  | 1 +
>  ui/d3des.c => crypto/desrfb.c | 2 +-
>  ui/d3des.h => include/crypto/desrfb.h | 0
>  ui/Makefile.objs  | 2 +-
>  ui/vnc.c  | 2 +-
>  5 files changed, 4 insertions(+), 3 deletions(-)
>  rename ui/d3des.c => crypto/desrfb.c (99%)
>  rename ui/d3des.h => include/crypto/desrfb.h (100%)

Reviewed-by: Gonglei

Re: [Qemu-devel] [PATCH v3 2/2] vhost user: Add RARP injection for legacy guest

2015-06-22 Thread Jason Wang



On 06/18/2015 11:16 PM, Thibaut Collet wrote:
> On Tue, Jun 16, 2015 at 10:05 AM, Jason Wang  wrote:
>>
>> On 06/16/2015 03:24 PM, Thibaut Collet wrote:
>>> If my understanding is correct, on a resume operation, we have the
>>> following callback trace:
>>> 1. virtio_pci_restore function that calls all restore call back of
>>> virtio devices
>>> 2. virtnet_restore that calls try_fill_recv function for each virtual queues
>>> 3. try_fill_recv function kicks the virtual queue (through
>>> virtqueue_kick function)
>> Yes, but this happens only after pm resume not migration. Migration is
>> totally transparent to guest.
>>
> Hi Jason,
>
> After a deeper look in the migration code of QEMU a resume event is
> always sent when the live migration is finished.
> On a live migration we have the following callback trace:
> 1. The VM on the new host is set to the state RUN_STATE_INMIGRATE, the
> autostart boolean to 1  and calls the qemu_start_incoming_migration
> function (see function main of vl.c)
> .
> 2. call of process_incoming_migration function in
> migration/migration.c file whatever the way to do the live migration
> (tcp:, fd:, unix:, exec: ...)
> 3. call of process_incoming_migration_co function in migration/migration.c
> 4. call of vm_start function in vl.c (otherwise the migrated VM stay
> in the pause state, the autostart boolean is set to 1 by the main
> function in vl.c)
> 5. call of vm_start function that sets the VM is the RUN_STATE_RUNNING state.
> 6. call of qapi_event_send_resume function that ends a resume event to the VM

AFAIK, this function sends resume event to qemu monitor not VM.

>
> So when a live migration is ended:
> 1. a resume event is sent to the guest
> 2. On the reception of this resume event the virtual queue are kicked
> by the guest
> 3. Backend vhost user catches this kick and can emit a RARP to guest
> that does not support GUEST_ANNOUNCE
>
> This solution, as solution based on detection of DRIVER_OK status
> suggested by Michael, allows backend to send the RARP to legacy guest
> without involving QEMU and add ioctl to vhost-user.

A question here is did vhost-user code pass status to the backend? If
not, how can userspace backend detect DRIVER_OK?

>
> Vhost user backend is free to implement one of this two solutions.
>
> The single drawback of these two solutions is useless RARP sending in
> some case (reboot, ...). These messages are harmless and probably not
> blocking for a "light" patch to support live migration with vhost
> user.
>
> If you agree
>
> 1. The first patch must be updated by:
>   - removing the warning trace
>   - setting the error trace inside a static bool flag to only
> print this once
>   - removing the vhost_net_inject_rarp function (no more useful)
> 2. The second patch can be removed. Management of legacy guest for
> rarp sending can be done by modifications in backend without any
> change in QEMU.
>
> Best regards.
>
> Thibaut.
>

Re: [Qemu-devel] [PATCH v3 02/10] crypto: move built-in AES implementation into crypto/

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
> To prepare for a generic internal cipher API, move the
> built-in AES implementation into the crypto/ directory
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/qcow.c   | 2 +-
>  block/qcow2.c  | 1 -
>  block/qcow2.h  | 2 +-
>  crypto/Makefile.objs   | 1 +
>  {util => crypto}/aes.c | 2 +-
>  include/{qemu => crypto}/aes.h | 0
>  target-arm/crypto_helper.c | 2 +-
>  target-i386/fpu_helper.c   | 1 -
>  target-i386/ops_sse.h  | 2 +-
>  target-ppc/int_helper.c| 2 +-
>  util/Makefile.objs | 2 +-
>  11 files changed, 8 insertions(+), 9 deletions(-)
>  rename {util => crypto}/aes.c (99%)
>  rename include/{qemu => crypto}/aes.h (100%)

Reviewed-by: Gonglei

Re: [Qemu-devel] [PATCH v3 01/10] crypto: introduce new module for computing hash digests

2015-06-22 Thread Gonglei

On 2015/6/19 1:02, Daniel P. Berrange wrote:
> Introduce a new crypto/ directory that will (eventually) contain
> all the cryptographic related code. This initially defines a
> wrapper for initializing gnutls and for computing hashes with
> gnutls. The former ensures that gnutls is guaranteed to be
> initialized exactly once in QEMU regardless of CLI args. The
> block quorum code currently fails to initialize gnutls so it
> only works by luck, if VNC server TLS is not requested. The
> hash APIs avoids the need to litter the rest of the code with
> preprocessor checks and simplifies callers by allocating the
> correct amount of memory for the requested hash.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  MAINTAINERS  |   7 ++
>  Makefile.objs|   1 +
>  configure|  46 +++
>  crypto/Makefile.objs |   2 +
>  crypto/hash.c| 200 +
>  crypto/init.c|  60 ++
>  include/crypto/hash.h| 189 ++
>  include/crypto/init.h|  29 +++
>  tests/.gitignore |   1 +
>  tests/Makefile   |   2 +
>  tests/test-crypto-hash.c | 209 
> +++
>  vl.c |   7 ++
>  12 files changed, 753 insertions(+)
>  create mode 100644 crypto/Makefile.objs
>  create mode 100644 crypto/hash.c
>  create mode 100644 crypto/init.c
>  create mode 100644 include/crypto/hash.h
>  create mode 100644 include/crypto/init.h
>  create mode 100644 tests/test-crypto-hash.c

Reviewed-by: Gonglei

[Qemu-devel] [PATCH 2/2] qdev: fix OVERFLOW_BEFORE_WIDEN

2015-06-22 Thread arei.gonglei

From: Gonglei 

Potentially overflowing expression "1 << prop->bitnr" with
type "int" (32 bits, signed) is evaluated using 32-bit arithmetic,
and then used in a context that expects an expression of type
"uint64_t" (64 bits, unsigned).

Cc: Gerd Hoffmann 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Gonglei 
---
 hw/core/qdev-properties.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index a1606de..f78b335 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -130,7 +130,7 @@ PropertyInfo qdev_prop_bit = {
 static uint64_t qdev_get_prop_mask64(Property *prop)
 {
 assert(prop->info == &qdev_prop_bit);
-return 0x1 << prop->bitnr;
+return 0x1ull << prop->bitnr;
 }
 
 static void bit64_prop_set(DeviceState *dev, Property *props, bool val)
-- 
1.7.12.4

[Qemu-devel] [PATCH 1/2] virito-pci: fix OVERRUN problem

2015-06-22 Thread arei.gonglei

From: Gonglei 

Overrunning array "proxy->guest_features" of 2 4-byte
elements at element index 2 (byte offset 8) using index
"proxy->gfselect" (which evaluates to 2). Normally, the
Linux kernel driver just read/write '0' or '1' as the
"proxy->gfselect" values, so using '<' instead of '=<' to
make coverity happy and avoid potential harm.

Cc: Michael S. Tsirkin 
Signed-off-by: Gonglei 
---
 hw/virtio/virtio-pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d7cf34c..ce1c46e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -977,7 +977,7 @@ static uint64_t virtio_pci_common_read(void *opaque, hwaddr 
addr,
 val = proxy->gfselect;
 break;
 case VIRTIO_PCI_COMMON_GF:
-if (proxy->gfselect <= ARRAY_SIZE(proxy->guest_features)) {
+if (proxy->gfselect < ARRAY_SIZE(proxy->guest_features)) {
 val = proxy->guest_features[proxy->gfselect];
 }
 break;
@@ -1052,7 +1052,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr 
addr,
 proxy->gfselect = val;
 break;
 case VIRTIO_PCI_COMMON_GF:
-if (proxy->gfselect <= ARRAY_SIZE(proxy->guest_features)) {
+if (proxy->gfselect < ARRAY_SIZE(proxy->guest_features)) {
 proxy->guest_features[proxy->gfselect] = val;
 virtio_set_features(vdev,
 (((uint64_t)proxy->guest_features[1]) << 32) |
-- 
1.7.12.4

[Qemu-devel] [PATCH 0/2] fix two overflow problems

2015-06-22 Thread arei.gonglei

From: Gonglei 


Gonglei (2):
  virito-pci: fix OVERRUN problem
  qdev: fix OVERFLOW_BEFORE_WIDEN

 hw/core/qdev-properties.c | 2 +-
 hw/virtio/virtio-pci.c| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
1.7.12.4

Re: [Qemu-devel] [PATCH] configure: Add support for jemalloc

2015-06-22 Thread Fam Zheng

On Fri, 06/19 12:56, Alexandre Derumier wrote:
> This adds "--enable-jemalloc" and "--disable-jemalloc" to allow linking
> to jemalloc memory allocator.
> 
> We have already tcmalloc support,
> but it seem to not working well with a lot of iothreads/disks.
> 
> The main problem is that tcmalloc use a shared thread cache of 16MB
> by default.
> With more threads, this cache is shared, and some bad garbage collections
> can occur if the cache is too low.
> 
> It's possible to tcmalloc cache increase it with a env var:
> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=256MB
> 
> With default 16MB, performances are  really bad with more than 2 disks.
> Increasing to 256MB, it's helping but still have problem with 16 
> disks/iothreads.
> 
> Jemalloc don't have performance problem with default configuration.
> 
> Here the benchmark results in iops of 1 qemu vm randread 4K iodepth=32,
> with rbd block backend (librbd is doing a lot of memory allocation),
> 1 iothread by disk
> 
> glibc malloc
> 
> 
> 1 disk  29052
> 2 disks 55878
> 4 disks 127899
> 8 disks 240566
> 15 disks269976
> 
> jemalloc
> 
> 
> 1 disk  41278
> 2 disks 75781
> 4 disks 195351
> 8 disks 294241
> 15 disks298199
> 
> tcmalloc 2.2.1 default 16M cache
> 
> 
> 1 disk   37911
> 2 disks  67698
> 4 disks  41076
> 8 disks  43312
> 15 disks 37569
> 
> tcmalloc : 256M cache
> ---
> 
> 1 disk 33914
> 2 disks58839
> 4 disks148205
> 8 disks213298
> 15 disks   218383
> 
> Signed-off-by: Alexandre Derumier 
> ---
>  configure | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/configure b/configure
> index 222694f..2fe1e05 100755
> --- a/configure
> +++ b/configure
> @@ -336,6 +336,7 @@ vhdx=""
>  quorum=""
>  numa=""
>  tcmalloc="no"
> +jemalloc="no"
>  
>  # parse CC options first
>  for opt do
> @@ -1147,6 +1148,10 @@ for opt do
>;;
>--enable-tcmalloc) tcmalloc="yes"
>;;
> +  --disable-jemalloc) jemalloc="no"
> +  ;;
> +  --enable-jemalloc) jemalloc="yes"
> +  ;;
>*)
>echo "ERROR: unknown option $opt"
>echo "Try '$0 --help' for more information"
> @@ -1420,6 +1425,8 @@ Advanced options (experts only):
>--enable-numaenable libnuma support
>--disable-tcmalloc   disable tcmalloc support
>--enable-tcmallocenable tcmalloc support
> +  --disable-jemalloc   disable jemalloc support
> +  --enable-jemallocenable jemalloc support
>  
>  NOTE: The object files are built at the place where configure is launched
>  EOF
> @@ -3344,6 +3351,11 @@ EOF
>fi
>  fi
>  
> +if test "$tcmalloc" = "yes" && test "$jemalloc" = "yes" ; then
> +echo "ERROR: tcmalloc && jemalloc can't be used at the same time"
> +exit 1
> +fi
> +
>  ##
>  # tcmalloc probe
>  
> @@ -3361,6 +3373,22 @@ EOF
>  fi
>  
>  ##
> +# jemalloc probe
> +
> +if test "$jemalloc" = "yes" ; then
> +  cat > $TMPC << EOF
> +#include 
> +int main(void) { malloc(1); return 0; }
> +EOF
> +
> +  if compile_prog "" "-ljemalloc" ; then
> +LIBS="-ljemalloc $LIBS"
> +  else
> +feature_not_found "jemalloc" "install jemalloc devel"
> +  fi
> +fi
> +
> +##
>  # signalfd probe
>  signalfd="no"
>  cat > $TMPC << EOF
> @@ -4499,6 +4527,7 @@ echo "snappy support$snappy"
>  echo "bzip2 support $bzip2"
>  echo "NUMA host support $numa"
>  echo "tcmalloc support  $tcmalloc"
> +echo "jemalloc support  $jemalloc"
>  
>  if test "$sdl_too_old" = "yes"; then
>  echo "-> Your SDL version is too old - please upgrade to have SDL support"
> -- 
> 2.1.4
> 
> 

Reviewed-by: Fam Zheng

Re: [Qemu-devel] Adding new migration-parameters - any easier way?

2015-06-22 Thread zhanghailiang


On 2015/6/19 16:11, Markus Armbruster wrote:

zhanghailiang  writes:


Hi,

Is there any news about this discussion?
Is anyone working on it? ;)

Since the 'hard feature freeze' time is closer, we'd better to fix it in 2.4
before libvirt uses it.

I have sent a RFC patch "[RFC] migration: Re-implement 'migrate-set-parameters' to 
make it easily for extension"
http://patchwork.ozlabs.org/patch/482125/
Which just changes qmp command to
  '{ "execute": "migrate-set-parameters" , "arguments":
  { "compression": { "compress-level": 1 } } }',
Compared with its original way, it seems to be more easy for reading and 
extension, but
it can't address most problems discussed here...



From a QMP wire protocol point of view, @migrate-set-parameters is just

fine as it is:

 #
 # @migrate-set-parameters
 #
 # Set the following migration parameters
 #
 # @compress-level: compression level
 #
 # @compress-threads: compression thread count
 #
 # @decompress-threads: decompression thread count
 #
 # Since: 2.4
 ##
 { 'command': 'migrate-set-parameters',
   'data': { '*compress-level': 'int',
 '*compress-threads': 'int',
 '*decompress-threads': 'int'} }

However, as the number of parameters grows, the generated C interface
becomes more and more unwieldy:

 void qmp_migrate_set_parameters(bool has_compress_level,
 int64_t compress_level,
 bool has_compress_threads,
 int64_t compress_threads,
 bool has_decompress_threads,
 int64_t decompress_threads,
 ... more ...
 Error **errp)



Yes, this is the problem.


Your [RFC PATCH] hides away some of the parameters in a separate
MigrationCompressParameter (misnamed, should be plural).  Leads to

 void qmp_migrate_set_parameters(bool has_compress,
 MigrationCompressParameters *compress,
 ... more ...
 Error **errp)

Makes the QMP wire protocol more complex, because now you have to
collect some parameters in extra curlies.

Note the two levels of optionalness: one for the wrapping struct, one
for each of its members.  If you made the members non-optional, we'd
have to specify none or all, which is not what we want.  If you made the
wrapping struct non-optional, you'd have to add a stupid 'compress': {}
when you want to set something else.  So we need both.

In C, the relatively simple test has_compress_level becomes compress &&
compress->has_level.



Hmm, i see, thanks for your explanation.


If we ever want to set migration parameters from the command line, the
nesting will be in the way.  Our tool to parse command line into
QAPI-generated data types (OptsVisitor) by design doesn't cope with
nested structs.

Perhaps the extra complexity is worthwhile, but it's certainly not
obvious.

If all we want is making adding new parameters easier (this thread's
stated subject), then I guess having the function take a single struct
parameter for all its arguments would do:

 void qmp_migrate_set_parameters(MigrationParameters *parms,
 Error **errp)

To get that from the current QAPI code generator, we'd have to do
something like

 { 'command': 'migrate-set-parameters',
   'data': { 'parms': 'MigrationParameters' } }

Trouble is this messes up the QMP wire interface: you now have to wrap
the actual arguments in two curlies instead of one.

Not a backward-compatibility issue, because the command is new.

Ugly all the same.

To avoid the ugliness, we could change the QAPI generator.  Currently,

 { 'command': 'migrate-set-parameters',
   'data': 'MigrationParameters' }

generates the same interface as when you inline MigrationParameters,
namely

 void qmp_migrate_set_parameters(bool has_compress_level,
 int64_t compress_level,
 bool has_compress_threads,
 int64_t compress_threads,
 bool has_decompress_threads,
 int64_t decompress_threads,
 ... more ...
 Error **errp)

It could instead generate

 void qmp_migrate_set_parameters(MigrationParameters *parms,
 Error **errp)

No change to the wire protocol.  Fairly big, but relatively mechanical
change to the handler functions.  I'd be willing to give it a shot and
see how it turns out, but I can't do it for 2.4, sorry.



Great, that would be fine~ Thanks.

Re: [Qemu-devel] [PULL v2 58/60] qdev: add 64bit properties

2015-06-22 Thread Gonglei

On 2015/6/21 5:10, Paolo Bonzini wrote:
> 
> 
> On 01/06/2015 14:25, Michael S. Tsirkin wrote:
>> +static uint64_t qdev_get_prop_mask64(Property *prop)
>> +{
>> +assert(prop->info == &qdev_prop_bit);
>> +return 0x1 << prop->bitnr;
> 
> This needs to be 1ull.
> 
Yes, and coverity spot it and some other OVERRUN problems.

Regards,
-Gonglei

[Qemu-devel] [PATCH] rdma: fix memory leak

2015-06-22 Thread arei.gonglei

From: Gonglei 

Variable "r" going out of scope leaks the storage
it points to in line 3268.

Signed-off-by: Gonglei 
---
 migration/rdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index cf5de7e..de80860 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3262,12 +3262,13 @@ static const QEMUFileOps rdma_write_ops = {
 
 static void *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
 {
-QEMUFileRDMA *r = g_malloc0(sizeof(QEMUFileRDMA));
+QEMUFileRDMA *r = NULL;
 
 if (qemu_file_mode_is_not_valid(mode)) {
 return NULL;
 }
 
+r = g_malloc0(sizeof(QEMUFileRDMA));
 r->rdma = rdma;
 
 if (mode[0] == 'w') {
-- 
1.7.12.4

[Qemu-devel] [PATCH] ossaudio: fix memory leak

2015-06-22 Thread arei.gonglei

From: Gonglei 

Variable "conf" going out of scope leaks the storage
it points to in line 856.

Signed-off-by: Gonglei 
---
 audio/ossaudio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/audio/ossaudio.c b/audio/ossaudio.c
index 11e76a1..7dbe333 100644
--- a/audio/ossaudio.c
+++ b/audio/ossaudio.c
@@ -853,6 +853,7 @@ static void *oss_audio_init (void)
 
 if (access(conf->devpath_in, R_OK | W_OK) < 0 ||
 access(conf->devpath_out, R_OK | W_OK) < 0) {
+g_free(conf);
 return NULL;
 }
 return conf;
-- 
1.7.12.4

Re: [Qemu-devel] [PATCH v2 0/3] m68k: fix ColdFire support

2015-06-22 Thread Greg Ungerer

Hi Andreas,

On 23/06/15 02:49, Andreas Färber wrote:
> Am 20.06.2015 um 06:55 schrieb Greg Ungerer:
>> I have one more fix still to come for the qemu mcf_fec.c net driver.
>> It currently doesn't support the mdio actions or any attached phy and
>> this will cause the linux fec driver to fail probing.
> 
> I still have some mcf_* device conversions back from KVM Forum 2012
> lying around somewhere, waiting for further cleanups for submission...

Please dust of and send if you have the time. Would be
great to have more.

Regards
Greg

[Qemu-devel] [PATCH 11/16] ahci: add cmd header to ncq transfer state

2015-06-22 Thread John Snow

While the rest of the AHCI device can rely on a single bookmarked
pointer for the AHCI Command Header currently being processed, NCQ
is asynchronous and may have many commands in flight simultaneously.

Add a cmdh pointer to the ncq_tfs object and make the sglist prepare
function take an AHCICmdHeader pointer so we can be explicit about
where we'd like to build SGlists from.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 11 ++-
 hw/ide/ahci.h |  1 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 043b959..4cf792b 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -832,9 +832,8 @@ static int prdt_tbl_entry_size(const AHCI_SG *tbl)
 }
 
 static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist,
-int64_t limit, int32_t offset)
+AHCICmdHdr *cmd, int64_t limit, int32_t offset)
 {
-AHCICmdHdr *cmd = ad->cur_cmd;
 uint16_t opts = le16_to_cpu(cmd->opts);
 uint16_t prdtl = le16_to_cpu(cmd->prdtl);
 uint64_t cfis_addr = le64_to_cpu(cmd->tbl_addr);
@@ -1057,6 +1056,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 ncq_tfs->halt = false;
 ncq_tfs->drive = ad;
 ncq_tfs->slot = slot;
+ncq_tfs->cmdh = &((AHCICmdHdr *)ad->lst)[slot];
 ncq_tfs->cmd = ncq_fis->command;
 ncq_tfs->lba = ((uint64_t)ncq_fis->lba5 << 40) |
((uint64_t)ncq_fis->lba4 << 32) |
@@ -1091,7 +1091,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 ncq_tfs->sector_count = 0x1;
 }
 size = ncq_tfs->sector_count * 512;
-ahci_populate_sglist(ad, &ncq_tfs->sglist, size, 0);
+ahci_populate_sglist(ad, &ncq_tfs->sglist, ncq_tfs->cmdh, size, 0);
 
 if (ncq_tfs->sglist.size < size) {
 error_report("ahci: PRDT length for NCQ command (0x%zx) "
@@ -1361,7 +1361,8 @@ static int32_t ahci_dma_prepare_buf(IDEDMA *dma, int64_t 
limit, int is_write)
 AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
 IDEState *s = &ad->port.ifs[0];
 
-if (ahci_populate_sglist(ad, &s->sg, limit, s->io_buffer_offset) == -1) {
+if (ahci_populate_sglist(ad, &s->sg, ad->cur_cmd,
+ limit, s->io_buffer_offset) == -1) {
 DPRINTF(ad->port_no, "ahci_dma_prepare_buf failed.\n");
 return -1;
 }
@@ -1396,7 +1397,7 @@ static int ahci_dma_rw_buf(IDEDMA *dma, int is_write)
 uint8_t *p = s->io_buffer + s->io_buffer_index;
 int l = s->io_buffer_size - s->io_buffer_index;
 
-if (ahci_populate_sglist(ad, &s->sg, l, s->io_buffer_offset)) {
+if (ahci_populate_sglist(ad, &s->sg, ad->cur_cmd, l, s->io_buffer_offset)) 
{
 return 0;
 }
 
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 9090d3d..9f5b4d2 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -254,6 +254,7 @@ typedef struct AHCIDevice AHCIDevice;
 typedef struct NCQTransferState {
 AHCIDevice *drive;
 BlockAIOCB *aiocb;
+AHCICmdHdr *cmdh;
 QEMUSGList sglist;
 BlockAcctCookie acct;
 uint32_t sector_count;
-- 
2.1.0

[Qemu-devel] [PATCH 13/16] ahci: add get_cmd_header helper

2015-06-22 Thread John Snow

cur_cmd is an internal bookmark that points to the
current AHCI Command Header being processed by the
AHCI state machine. With NCQ needing to occasionally
rely on some of the same AHCI helpers, we cannot use
cur_cmd and will need to grab explicit pointers instead.

In an attempt to begin relying on the cur_cmd pointer
less, add a helper to let us specifically get the pointer
to the command header of particular interest.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index a29cf49..eeb48fb 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1115,11 +1115,20 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 execute_ncq_command(ncq_tfs);
 }
 
+static AHCICmdHdr *get_cmd_header(AHCIState *s, uint8_t port, uint8_t slot)
+{
+if (port > s->ports || slot > AHCI_MAX_CMDS) {
+return NULL;
+}
+
+return s->dev[port].lst ? &((AHCICmdHdr *)s->dev[port].lst)[slot] : NULL;
+}
+
 static void handle_reg_h2d_fis(AHCIState *s, int port,
uint8_t slot, uint8_t *cmd_fis)
 {
 IDEState *ide_state = &s->dev[port].port.ifs[0];
-AHCICmdHdr *cmd = s->dev[port].cur_cmd;
+AHCICmdHdr *cmd = get_cmd_header(s, port, slot);
 uint16_t opts = le16_to_cpu(cmd->opts);
 
 if (cmd_fis[1] & 0x0F) {
@@ -1218,7 +1227,7 @@ static int handle_cmd(AHCIState *s, int port, uint8_t 
slot)
 DPRINTF(port, "error: lst not given but cmd handled");
 return -1;
 }
-cmd = &((AHCICmdHdr *)s->dev[port].lst)[slot];
+cmd = get_cmd_header(s, port, slot);
 /* remember current slot handle for later */
 s->dev[port].cur_cmd = cmd;
 
@@ -1592,7 +1601,7 @@ static int ahci_state_post_load(void *opaque, int 
version_id)
 if (ncq_tfs->slot > AHCI_MAX_CMDS) {
 return -1;
 }
-ncq_tfs->cmdh = &((AHCICmdHdr *)ad->lst)[ncq_tfs->slot];
+ncq_tfs->cmdh = get_cmd_header(s, i, ncq_tfs->slot);
 ahci_populate_sglist(ncq_tfs->drive, &ncq_tfs->sglist,
  ncq_tfs->cmdh, ncq_tfs->sector_count * 512,
  0);
@@ -1618,7 +1627,7 @@ static int ahci_state_post_load(void *opaque, int 
version_id)
 if (ad->busy_slot < 0 || ad->busy_slot >= AHCI_MAX_CMDS) {
 return -1;
 }
-ad->cur_cmd = &((AHCICmdHdr *)ad->lst)[ad->busy_slot];
+ad->cur_cmd = get_cmd_header(s, i, ad->busy_slot);
 }
 }
 
-- 
2.1.0

[Qemu-devel] [PATCH 16/16] ahci: fix sdb fis semantics

2015-06-22 Thread John Snow

There are two things to fix here:

The first one is subtle: the PxSACT register in the AHCI HBA has different
semantics from the field it is shadowing, the ACT field in the
Set Device Bits FIS.

In the HBA register, PxSACT acts as a bitfield indicating outstanding
NCQ commands where a set bit indicates a pending NCQ operation. The FIS
field however operates as an RWC register update to PxSACT, where a set
bit indicates a *successfully* completed command.

Correct the FIS semantics. At the same time, move the "clear finished"
action to the SDB FIS generation instead of the register read to mimick
how the other shadow registers work, which always just report the last
reported value from a FIS, and not the most current values which may
not have been reported by a FIS yet.

Lastly and more simply, SATA 3.2 section 13.6.4.2 (and later sections)
all specify that the Interrupt bit for the SDB FIS should always be set
to one for NCQ commands. That's currently the only time we generate this
FIS, so set it on all the time.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index d344701..db62a53 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -106,8 +106,6 @@ static uint32_t  ahci_port_read(AHCIState *s, int port, int 
offset)
 val = pr->scr_err;
 break;
 case PORT_SCR_ACT:
-pr->scr_act &= ~s->dev[port].finished;
-s->dev[port].finished = 0;
 val = pr->scr_act;
 break;
 case PORT_CMD_ISSUE:
@@ -665,14 +663,14 @@ static void ahci_unmap_clb_address(AHCIDevice *ad)
 ad->lst = NULL;
 }
 
-static void ahci_write_fis_sdb(AHCIState *s, int port, uint32_t finished)
+static void ahci_write_fis_sdb(AHCIState *s, NCQTransferState *ncq_tfs)
 {
-AHCIDevice *ad = &s->dev[port];
+AHCIDevice *ad = ncq_tfs->drive;
 AHCIPortRegs *pr = &ad->port_regs;
 IDEState *ide_state;
 SDBFIS *sdb_fis;
 
-if (!s->dev[port].res_fis ||
+if (!ad->res_fis ||
 !(pr->cmd & PORT_CMD_FIS_RX)) {
 return;
 }
@@ -682,19 +680,22 @@ static void ahci_write_fis_sdb(AHCIState *s, int port, 
uint32_t finished)
 
 sdb_fis->type = SATA_FIS_TYPE_SDB;
 /* Interrupt pending & Notification bit */
-sdb_fis->flags = (ad->hba->control_regs.irqstatus ? (1 << 6) : 0);
+sdb_fis->flags = 0x40; /* Interrupt bit, always 1 for NCQ */
 sdb_fis->status = ide_state->status & 0x77;
 sdb_fis->error = ide_state->error;
 /* update SAct field in SDB_FIS */
-s->dev[port].finished |= finished;
 sdb_fis->payload = cpu_to_le32(ad->finished);
 
 /* Update shadow registers (except BSY 0x80 and DRQ 0x08) */
 pr->tfdata = (ad->port.ifs[0].error << 8) |
 (ad->port.ifs[0].status & 0x77) |
 (pr->tfdata & 0x88);
+pr->scr_act &= ~ad->finished;
+ad->finished = 0;
 
-ahci_trigger_irq(s, ad, PORT_IRQ_SDB_FIS);
+if (sdb_fis->flags & 0x40) {
+ahci_trigger_irq(s, ad, PORT_IRQ_SDB_FIS);
+}
 }
 
 static void ahci_write_fis_pio(AHCIDevice *ad, uint16_t len)
@@ -894,11 +895,14 @@ static void ncq_err(NCQTransferState *ncq_tfs)
 
 static void ncq_finish(NCQTransferState *ncq_tfs)
 {
-/* Clear bit for this tag in SActive */
-ncq_tfs->drive->port_regs.scr_act &= ~(1 << ncq_tfs->tag);
+/* If we didn't error out, set our finished bit. Errored commands
+ * do not get a bit set for the SDB FIS ACT register, nor do they
+ * clear the outstanding bit in scr_act (PxSACT). */
+if (!(ncq_tfs->drive->port_regs.scr_err & (1 << ncq_tfs->tag))) {
+ncq_tfs->drive->finished |= (1 << ncq_tfs->tag);
+}
 
-ahci_write_fis_sdb(ncq_tfs->drive->hba, ncq_tfs->drive->port_no,
-   (1 << ncq_tfs->tag));
+ahci_write_fis_sdb(ncq_tfs->drive->hba, ncq_tfs);
 
 DPRINTF(ncq_tfs->drive->port_no, "NCQ transfer tag %d finished\n",
 ncq_tfs->tag);
-- 
2.1.0

[Qemu-devel] [PATCH 01/16] ide: add limit to .prepare_buf()

2015-06-22 Thread John Snow

prepare_buf should not always grab as many descriptors
as it can, sometimes it should self-limit.

For example, an NCQ transfer of 1 sector with a PRDT that
describes 4GiB of data should not copy 4GiB of data, it
should just transfer that first 512 bytes.

PIO is not affected, because the dma_buf_rw dma helpers
already have a byte limit built-in to them, but DMA/NCQ
will exhaust the entire list regardless of requested size.

AHCI 1.3 specifies in section 6.1.6 Command List Underflow that
NCQ is not required to detect underflow conditions. Non-NCQ
pathways signal underflow by writing to the PRDBC field, which
will already occur by writing the actual transferred byte count
to the PRDBC, signaling the underflow.

Our NCQ pathways aren't required to detect underflow, but since our DMA
backend uses the size of the PRDT to determine the size of the transer,
if our PRDT is bigger than the transaction (the underflow condition) it
doesn't cost us anything to detect it and truncate the PRDT.

This is a recoverable error and is not signaled to the guest, in either
NCQ or normal DMA cases.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 27 ++-
 hw/ide/core.c |  5 +++--
 hw/ide/internal.h |  2 +-
 hw/ide/macio.c|  2 +-
 hw/ide/pci.c  |  5 -
 5 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 95d228f..f873ab1 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -49,7 +49,7 @@ static int handle_cmd(AHCIState *s,int port,int slot);
 static void ahci_reset_port(AHCIState *s, int port);
 static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis);
 static void ahci_init_d2h(AHCIDevice *ad);
-static int ahci_dma_prepare_buf(IDEDMA *dma, int is_write);
+static int ahci_dma_prepare_buf(IDEDMA *dma, int64_t limit, int is_write);
 static void ahci_commit_buf(IDEDMA *dma, uint32_t tx_bytes);
 static bool ahci_map_clb_address(AHCIDevice *ad);
 static bool ahci_map_fis_address(AHCIDevice *ad);
@@ -827,11 +827,12 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t 
*cmd_fis)
 
 static int prdt_tbl_entry_size(const AHCI_SG *tbl)
 {
+/* flags_size is zero-based */
 return (le32_to_cpu(tbl->flags_size) & AHCI_PRDT_SIZE_MASK) + 1;
 }
 
 static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList *sglist,
-int32_t offset)
+int64_t limit, int32_t offset)
 {
 AHCICmdHdr *cmd = ad->cur_cmd;
 uint16_t opts = le16_to_cpu(cmd->opts);
@@ -881,9 +882,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList 
*sglist,
 AHCI_SG *tbl = (AHCI_SG *)prdt;
 sum = 0;
 for (i = 0; i < prdtl; i++) {
-/* flags_size is zero-based */
 tbl_entry_size = prdt_tbl_entry_size(&tbl[i]);
-if (offset <= (sum + tbl_entry_size)) {
+if (offset < (sum + tbl_entry_size)) {
 off_idx = i;
 off_pos = offset - sum;
 break;
@@ -901,12 +901,13 @@ static int ahci_populate_sglist(AHCIDevice *ad, 
QEMUSGList *sglist,
 qemu_sglist_init(sglist, qbus->parent, (prdtl - off_idx),
  ad->hba->as);
 qemu_sglist_add(sglist, le64_to_cpu(tbl[off_idx].addr) + off_pos,
-prdt_tbl_entry_size(&tbl[off_idx]) - off_pos);
+MIN(prdt_tbl_entry_size(&tbl[off_idx]) - off_pos,
+limit));
 
-for (i = off_idx + 1; i < prdtl; i++) {
-/* flags_size is zero-based */
+for (i = off_idx + 1; i < prdtl && sglist->size < limit; i++) {
 qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
-prdt_tbl_entry_size(&tbl[i]));
+MIN(prdt_tbl_entry_size(&tbl[i]),
+limit - sglist->size));
 if (sglist->size > INT32_MAX) {
 error_report("AHCI Physical Region Descriptor Table describes "
  "more than 2 GiB.\n");
@@ -1024,8 +1025,8 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 
 ncq_tfs->sector_count = ((uint16_t)ncq_fis->sector_count_high << 8) |
 ncq_fis->sector_count_low;
-ahci_populate_sglist(ad, &ncq_tfs->sglist, 0);
 size = ncq_tfs->sector_count * 512;
+ahci_populate_sglist(ad, &ncq_tfs->sglist, size, 0);
 
 if (ncq_tfs->sglist.size < size) {
 error_report("ahci: PRDT length for NCQ command (0x%zx) "
@@ -1262,7 +1263,7 @@ static void ahci_start_transfer(IDEDMA *dma)
 goto out;
 }
 
-if (ahci_dma_prepare_buf(dma, is_write)) {
+if (ahci_dma_prepare_buf(dma, size, is_write)) {
 has_sglist = 1;
 }
 
@@ -1312,12 +1313,12 @@ static void ahci_restart_dma(IDEDMA *dma)
  * Not currently invoked by PIO R/W chains,
  * which invoke ahci_populate_sglist via ahci_start_transfer.
  */
-static int32_t ahci_dma_prepare_buf(

[Qemu-devel] [PATCH 10/16] qtest/ahci: halted NCQ test

2015-06-22 Thread John Snow

Signed-off-by: John Snow 
---
 tests/ahci-test.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 3f06fd9..c30837b 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -1200,13 +1200,13 @@ static void test_migrate_ncq(void)
 }
 
 /**
- * DMA Error Test
+ * Halted IO Error Test
  *
  * Simulate an error on first write, Try to write a pattern,
  * Confirm the VM has stopped, resume the VM, verify command
  * has completed, then read back the data and verify.
  */
-static void test_halted_dma(void)
+static void ahci_halted_io_test(uint8_t cmd_read, uint8_t cmd_write)
 {
 AHCIQState *ahci;
 uint8_t port;
@@ -1241,7 +1241,7 @@ static void test_halted_dma(void)
 memwrite(ptr, tx, bufsize);
 
 /* Attempt to write (and fail) */
-cmd = ahci_guest_io_halt(ahci, port, CMD_WRITE_DMA,
+cmd = ahci_guest_io_halt(ahci, port, cmd_write,
  ptr, bufsize, 0);
 
 /* Attempt to resume the command */
@@ -1249,7 +1249,7 @@ static void test_halted_dma(void)
 ahci_free(ahci, ptr);
 
 /* Read back and verify */
-ahci_io(ahci, port, CMD_READ_DMA, rx, bufsize, 0);
+ahci_io(ahci, port, cmd_read, rx, bufsize, 0);
 g_assert_cmphex(memcmp(tx, rx, bufsize), ==, 0);
 
 /* Cleanup and go home */
@@ -1258,6 +1258,16 @@ static void test_halted_dma(void)
 g_free(tx);
 }
 
+static void test_halted_dma(void)
+{
+ahci_halted_io_test(CMD_READ_DMA, CMD_WRITE_DMA);
+}
+
+static void test_halted_ncq(void)
+{
+ahci_halted_io_test(READ_FPDMA_QUEUED, WRITE_FPDMA_QUEUED);
+}
+
 /**
  * DMA Error Migration Test
  *
@@ -1677,6 +1687,7 @@ int main(int argc, char **argv)
 
 qtest_add_func("/ahci/io/ncq/simple", test_ncq_simple);
 qtest_add_func("/ahci/migrate/ncq/simple", test_migrate_ncq);
+qtest_add_func("/ahci/io/ncq/retry", test_halted_ncq);
 
 ret = g_test_run();
 
-- 
2.1.0

[Qemu-devel] [PATCH 14/16] ahci: Do not map cmd_fis to generate response

2015-06-22 Thread John Snow

The Register D2H FIS should copy the current values of
the registers instead of just parroting back the same
values the guest sent back to it.

In this case, the SECTOR COUNT variables are actually
not generally meaningful in terms of standard commands
(See ATA8-AC3 Section 9.2 Normal Outputs), so it actually
probably doesn't matter what we put in here.

Meanwhile, we do need to use the Register update FIS from
the NCQ pathways (in error cases), so getting rid of
references to cur_cmd here is a win for AHCI concurrency.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 50 +-
 1 file changed, 5 insertions(+), 45 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index eeb48fb..d344701 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -700,35 +700,13 @@ static void ahci_write_fis_sdb(AHCIState *s, int port, 
uint32_t finished)
 static void ahci_write_fis_pio(AHCIDevice *ad, uint16_t len)
 {
 AHCIPortRegs *pr = &ad->port_regs;
-uint8_t *pio_fis, *cmd_fis;
-uint64_t tbl_addr;
-dma_addr_t cmd_len = 0x80;
+uint8_t *pio_fis;
 IDEState *s = &ad->port.ifs[0];
 
 if (!ad->res_fis || !(pr->cmd & PORT_CMD_FIS_RX)) {
 return;
 }
 
-/* map cmd_fis */
-tbl_addr = le64_to_cpu(ad->cur_cmd->tbl_addr);
-cmd_fis = dma_memory_map(ad->hba->as, tbl_addr, &cmd_len,
- DMA_DIRECTION_TO_DEVICE);
-
-if (cmd_fis == NULL) {
-DPRINTF(ad->port_no, "dma_memory_map failed in ahci_write_fis_pio");
-ahci_trigger_irq(ad->hba, ad, PORT_IRQ_HBUS_ERR);
-return;
-}
-
-if (cmd_len != 0x80) {
-DPRINTF(ad->port_no,
-"dma_memory_map mapped too few bytes in ahci_write_fis_pio");
-dma_memory_unmap(ad->hba->as, cmd_fis, cmd_len,
- DMA_DIRECTION_TO_DEVICE, cmd_len);
-ahci_trigger_irq(ad->hba, ad, PORT_IRQ_HBUS_ERR);
-return;
-}
-
 pio_fis = &ad->res_fis[RES_FIS_PSFIS];
 
 pio_fis[0] = SATA_FIS_TYPE_PIO_SETUP;
@@ -744,8 +722,8 @@ static void ahci_write_fis_pio(AHCIDevice *ad, uint16_t len)
 pio_fis[9] = s->hob_lcyl;
 pio_fis[10] = s->hob_hcyl;
 pio_fis[11] = 0;
-pio_fis[12] = cmd_fis[12];
-pio_fis[13] = cmd_fis[13];
+pio_fis[12] = s->nsector & 0xFF;
+pio_fis[13] = (s->nsector >> 8) & 0xFF;
 pio_fis[14] = 0;
 pio_fis[15] = s->status;
 pio_fis[16] = len & 255;
@@ -762,9 +740,6 @@ static void ahci_write_fis_pio(AHCIDevice *ad, uint16_t len)
 }
 
 ahci_trigger_irq(ad->hba, ad, PORT_IRQ_PIOS_FIS);
-
-dma_memory_unmap(ad->hba->as, cmd_fis, cmd_len,
- DMA_DIRECTION_TO_DEVICE, cmd_len);
 }
 
 static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis)
@@ -772,22 +747,12 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t 
*cmd_fis)
 AHCIPortRegs *pr = &ad->port_regs;
 uint8_t *d2h_fis;
 int i;
-dma_addr_t cmd_len = 0x80;
-int cmd_mapped = 0;
 IDEState *s = &ad->port.ifs[0];
 
 if (!ad->res_fis || !(pr->cmd & PORT_CMD_FIS_RX)) {
 return;
 }
 
-if (!cmd_fis) {
-/* map cmd_fis */
-uint64_t tbl_addr = le64_to_cpu(ad->cur_cmd->tbl_addr);
-cmd_fis = dma_memory_map(ad->hba->as, tbl_addr, &cmd_len,
- DMA_DIRECTION_TO_DEVICE);
-cmd_mapped = 1;
-}
-
 d2h_fis = &ad->res_fis[RES_FIS_RFIS];
 
 d2h_fis[0] = SATA_FIS_TYPE_REGISTER_D2H;
@@ -803,8 +768,8 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t 
*cmd_fis)
 d2h_fis[9] = s->hob_lcyl;
 d2h_fis[10] = s->hob_hcyl;
 d2h_fis[11] = 0;
-d2h_fis[12] = cmd_fis[12];
-d2h_fis[13] = cmd_fis[13];
+d2h_fis[12] = s->nsector & 0xFF;
+d2h_fis[13] = (s->nsector >> 8) & 0xFF;
 for (i = 14; i < 20; i++) {
 d2h_fis[i] = 0;
 }
@@ -818,11 +783,6 @@ static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t 
*cmd_fis)
 }
 
 ahci_trigger_irq(ad->hba, ad, PORT_IRQ_D2H_REG_FIS);
-
-if (cmd_mapped) {
-dma_memory_unmap(ad->hba->as, cmd_fis, cmd_len,
- DMA_DIRECTION_TO_DEVICE, cmd_len);
-}
 }
 
 static int prdt_tbl_entry_size(const AHCI_SG *tbl)
-- 
2.1.0

[Qemu-devel] [PATCH 09/16] ahci: correct ncq sector count

2015-06-22 Thread John Snow

uint16_t isn't enough to hold the real sector count, since a value of
zero implies a full 64K sectors, so we need a uint32_t here.

We *could* cheat and pretend that this value is 0-based and fit it in
a uint16_t, but I'd rather waste 2 bytes instead of a future dev's
10 minutes when they forget to +1/-1 accordingly somewhere.

See SATA 3.2, section 13.6.4.1 "READ FPDMA QUEUED".

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 7 +--
 hw/ide/ahci.h | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 7fcc6a2..043b959 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1085,8 +1085,11 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 DPRINTF(port, "Warn: Unsupported attempt to use Rebuild Assist\n");
 }
 
-ncq_tfs->sector_count = ((uint16_t)ncq_fis->sector_count_high << 8) |
-ncq_fis->sector_count_low;
+ncq_tfs->sector_count = ((ncq_fis->sector_count_high << 8) |
+ ncq_fis->sector_count_low);
+if (!ncq_tfs->sector_count) {
+ncq_tfs->sector_count = 0x1;
+}
 size = ncq_tfs->sector_count * 512;
 ahci_populate_sglist(ad, &ncq_tfs->sglist, size, 0);
 
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index c728e3a..9090d3d 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -256,7 +256,7 @@ typedef struct NCQTransferState {
 BlockAIOCB *aiocb;
 QEMUSGList sglist;
 BlockAcctCookie acct;
-uint16_t sector_count;
+uint32_t sector_count;
 uint64_t lba;
 uint8_t tag;
 uint8_t cmd;
-- 
2.1.0

[Qemu-devel] [PATCH 15/16] qtest/ahci: halted ncq migration test

2015-06-22 Thread John Snow

Signed-off-by: John Snow 
---
 tests/ahci-test.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index c30837b..87d7691 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -1269,13 +1269,13 @@ static void test_halted_ncq(void)
 }
 
 /**
- * DMA Error Migration Test
+ * IO Error Migration Test
  *
  * Simulate an error on first write, Try to write a pattern,
  * Confirm the VM has stopped, migrate, resume the VM,
  * verify command has completed, then read back the data and verify.
  */
-static void test_migrate_halted_dma(void)
+static void ahci_migrate_halted_io(uint8_t cmd_read, uint8_t cmd_write)
 {
 AHCIQState *src, *dst;
 uint8_t port;
@@ -1321,14 +1321,14 @@ static void test_migrate_halted_dma(void)
 memwrite(ptr, tx, bufsize);
 
 /* Write, trigger the VM to stop, migrate, then resume. */
-cmd = ahci_guest_io_halt(src, port, CMD_WRITE_DMA,
+cmd = ahci_guest_io_halt(src, port, cmd_write,
  ptr, bufsize, 0);
 ahci_migrate(src, dst, uri);
 ahci_guest_io_resume(dst, cmd);
 ahci_free(dst, ptr);
 
 /* Read back */
-ahci_io(dst, port, CMD_READ_DMA, rx, bufsize, 0);
+ahci_io(dst, port, cmd_read, rx, bufsize, 0);
 
 /* Verify TX and RX are identical */
 g_assert_cmphex(memcmp(tx, rx, bufsize), ==, 0);
@@ -1340,6 +1340,16 @@ static void test_migrate_halted_dma(void)
 g_free(tx);
 }
 
+static void test_migrate_halted_dma(void)
+{
+ahci_migrate_halted_io(CMD_READ_DMA, CMD_WRITE_DMA);
+}
+
+static void test_migrate_halted_ncq(void)
+{
+ahci_migrate_halted_io(READ_FPDMA_QUEUED, WRITE_FPDMA_QUEUED);
+}
+
 /**
  * Migration test: Try to flush, migrate, then resume.
  */
@@ -1688,6 +1698,7 @@ int main(int argc, char **argv)
 qtest_add_func("/ahci/io/ncq/simple", test_ncq_simple);
 qtest_add_func("/ahci/migrate/ncq/simple", test_migrate_ncq);
 qtest_add_func("/ahci/io/ncq/retry", test_halted_ncq);
+qtest_add_func("/ahci/migrate/ncq/halted", test_migrate_halted_ncq);
 
 ret = g_test_run();
 
-- 
2.1.0

[Qemu-devel] [PATCH 12/16] ahci: ncq migration

2015-06-22 Thread John Snow

Migrate the NCQ queue. This is solely for the benefit of halted commands,
since anything else should have completed and had any relevant status
flushed to the HBA registers already.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 49 -
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 4cf792b..a29cf49 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1511,6 +1511,21 @@ void ahci_reset(AHCIState *s)
 }
 }
 
+static const VMStateDescription vmstate_ncq_tfs = {
+.name = "ncq state",
+.version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(sector_count, NCQTransferState),
+VMSTATE_UINT64(lba, NCQTransferState),
+VMSTATE_UINT8(tag, NCQTransferState),
+VMSTATE_UINT8(cmd, NCQTransferState),
+VMSTATE_UINT8(slot, NCQTransferState),
+VMSTATE_BOOL(used, NCQTransferState),
+VMSTATE_BOOL(halt, NCQTransferState),
+VMSTATE_END_OF_LIST()
+},
+};
+
 static const VMStateDescription vmstate_ahci_device = {
 .name = "ahci port",
 .version_id = 1,
@@ -1536,14 +1551,17 @@ static const VMStateDescription vmstate_ahci_device = {
 VMSTATE_BOOL(done_atapi_packet, AHCIDevice),
 VMSTATE_INT32(busy_slot, AHCIDevice),
 VMSTATE_BOOL(init_d2h_sent, AHCIDevice),
+VMSTATE_STRUCT_ARRAY(ncq_tfs, AHCIDevice, AHCI_MAX_CMDS,
+ 1, vmstate_ncq_tfs, NCQTransferState),
 VMSTATE_END_OF_LIST()
 },
 };
 
 static int ahci_state_post_load(void *opaque, int version_id)
 {
-int i;
+int i, j;
 struct AHCIDevice *ad;
+NCQTransferState *ncq_tfs;
 AHCIState *s = opaque;
 
 for (i = 0; i < s->ports; i++) {
@@ -1555,6 +1573,35 @@ static int ahci_state_post_load(void *opaque, int 
version_id)
 return -1;
 }
 
+for (j = 0; j < AHCI_MAX_CMDS; j++) {
+ncq_tfs = &ad->ncq_tfs[j];
+ncq_tfs->drive = ad;
+
+if (ncq_tfs->used != ncq_tfs->halt) {
+return -1;
+}
+if (!ncq_tfs->halt) {
+continue;
+}
+if (!is_ncq(ncq_tfs->cmd)) {
+return -1;
+}
+if (ncq_tfs->slot != ncq_tfs->tag) {
+return -1;
+}
+if (ncq_tfs->slot > AHCI_MAX_CMDS) {
+return -1;
+}
+ncq_tfs->cmdh = &((AHCICmdHdr *)ad->lst)[ncq_tfs->slot];
+ahci_populate_sglist(ncq_tfs->drive, &ncq_tfs->sglist,
+ ncq_tfs->cmdh, ncq_tfs->sector_count * 512,
+ 0);
+if (ncq_tfs->sector_count != ncq_tfs->sglist.size >> 9) {
+return -1;
+}
+}
+
+
 /*
  * If an error is present, ad->busy_slot will be valid and not -1.
  * In this case, an operation is waiting to resume and will re-check
-- 
2.1.0

[Qemu-devel] [PATCH 05/16] ahci: factor ncq_finish out of ncq_cb

2015-06-22 Thread John Snow

When we add werror=stop or rerror=stop support to NCQ,
we'll want to take a codepath where we don't actually
complete the command, so factor that out into a new routine.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 4f8cceb..71b5085 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -933,23 +933,11 @@ static void ncq_err(NCQTransferState *ncq_tfs)
 ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
 }
 
-static void ncq_cb(void *opaque, int ret)
+static void ncq_finish(NCQTransferState *ncq_tfs)
 {
-NCQTransferState *ncq_tfs = (NCQTransferState *)opaque;
-IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];
-
-if (ret == -ECANCELED) {
-return;
-}
 /* Clear bit for this tag in SActive */
 ncq_tfs->drive->port_regs.scr_act &= ~(1 << ncq_tfs->tag);
 
-if (ret < 0) {
-ncq_err(ncq_tfs);
-} else {
-ide_state->status = READY_STAT | SEEK_STAT;
-}
-
 ahci_write_fis_sdb(ncq_tfs->drive->hba, ncq_tfs->drive->port_no,
(1 << ncq_tfs->tag));
 
@@ -962,6 +950,24 @@ static void ncq_cb(void *opaque, int ret)
 ncq_tfs->used = 0;
 }
 
+static void ncq_cb(void *opaque, int ret)
+{
+NCQTransferState *ncq_tfs = (NCQTransferState *)opaque;
+IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];
+
+if (ret == -ECANCELED) {
+return;
+}
+
+if (ret < 0) {
+ncq_err(ncq_tfs);
+} else {
+ide_state->status = READY_STAT | SEEK_STAT;
+}
+
+ncq_finish(ncq_tfs);
+}
+
 static int is_ncq(uint8_t ata_cmd)
 {
 /* Based on SATA 3.2 section 13.6.3.2 */
-- 
2.1.0

[Qemu-devel] [PATCH 06/16] ahci: record ncq failures

2015-06-22 Thread John Snow

Handle NCQ failures for cases where we want to halt the VM on IO errors.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 17 +++--
 hw/ide/ahci.h |  1 +
 hw/ide/internal.h |  1 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 71b5085..a838317 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -959,13 +959,25 @@ static void ncq_cb(void *opaque, int ret)
 return;
 }
 
+ncq_tfs->halt = false;
 if (ret < 0) {
-ncq_err(ncq_tfs);
+bool is_read = ncq_tfs->cmd == READ_FPDMA_QUEUED;
+BlockErrorAction action = blk_get_error_action(ide_state->blk,
+   is_read, -ret);
+if (action == BLOCK_ERROR_ACTION_STOP) {
+ncq_tfs->halt = true;
+ide_state->bus->error_status = IDE_RETRY_HBA;
+} else if (action == BLOCK_ERROR_ACTION_REPORT) {
+ncq_err(ncq_tfs);
+}
+blk_error_action(ide_state->blk, action, is_read, -ret);
 } else {
 ide_state->status = READY_STAT | SEEK_STAT;
 }
 
-ncq_finish(ncq_tfs);
+if (!ncq_tfs->halt) {
+ncq_finish(ncq_tfs);
+}
 }
 
 static int is_ncq(uint8_t ata_cmd)
@@ -1042,6 +1054,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 }
 
 ncq_tfs->used = 1;
+ncq_tfs->halt = false;
 ncq_tfs->drive = ad;
 ncq_tfs->slot = slot;
 ncq_tfs->cmd = ncq_fis->command;
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 33607d7..47a3122 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -262,6 +262,7 @@ typedef struct NCQTransferState {
 uint8_t cmd;
 int slot;
 int used;
+bool halt;
 } NCQTransferState;
 
 struct AHCIDevice {
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 7a4a86d..5abee19 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -499,6 +499,7 @@ struct IDEDevice {
 #define IDE_RETRY_READ  0x20
 #define IDE_RETRY_FLUSH 0x40
 #define IDE_RETRY_TRIM 0x80
+#define IDE_RETRY_HBA  0x100
 
 static inline IDEState *idebus_active_if(IDEBus *bus)
 {
-- 
2.1.0

[Qemu-devel] [PATCH 08/16] ahci: correct types in NCQTransferState

2015-06-22 Thread John Snow

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 10 +-
 hw/ide/ahci.h |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 6233059..7fcc6a2 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -45,7 +45,7 @@ do { \
 } while (0)
 
 static void check_cmd(AHCIState *s, int port);
-static int handle_cmd(AHCIState *s,int port,int slot);
+static int handle_cmd(AHCIState *s, int port, uint8_t slot);
 static void ahci_reset_port(AHCIState *s, int port);
 static void ahci_write_fis_d2h(AHCIDevice *ad, uint8_t *cmd_fis);
 static void ahci_init_d2h(AHCIDevice *ad);
@@ -506,7 +506,7 @@ static void ahci_reg_init(AHCIState *s)
 static void check_cmd(AHCIState *s, int port)
 {
 AHCIPortRegs *pr = &s->dev[port].port_regs;
-int slot;
+uint8_t slot;
 
 if ((pr->cmd & PORT_CMD_START) && pr->cmd_issue) {
 for (slot = 0; (slot < 32) && pr->cmd_issue; slot++) {
@@ -1037,7 +1037,7 @@ static void execute_ncq_command(NCQTransferState *ncq_tfs)
 
 
 static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
-int slot)
+uint8_t slot)
 {
 AHCIDevice *ad = &s->dev[port];
 IDEState *ide_state = &ad->port.ifs[0];
@@ -1113,7 +1113,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 }
 
 static void handle_reg_h2d_fis(AHCIState *s, int port,
-   int slot, uint8_t *cmd_fis)
+   uint8_t slot, uint8_t *cmd_fis)
 {
 IDEState *ide_state = &s->dev[port].port.ifs[0];
 AHCICmdHdr *cmd = s->dev[port].cur_cmd;
@@ -1197,7 +1197,7 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
 ide_exec_cmd(&s->dev[port].port, cmd_fis[2]);
 }
 
-static int handle_cmd(AHCIState *s, int port, int slot)
+static int handle_cmd(AHCIState *s, int port, uint8_t slot)
 {
 IDEState *ide_state;
 uint64_t tbl_addr;
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 47a3122..c728e3a 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -260,8 +260,8 @@ typedef struct NCQTransferState {
 uint64_t lba;
 uint8_t tag;
 uint8_t cmd;
-int slot;
-int used;
+uint8_t slot;
+bool used;
 bool halt;
 } NCQTransferState;
 
-- 
2.1.0

[Qemu-devel] [PATCH 07/16] ahci: kick NCQ queue

2015-06-22 Thread John Snow

Upon a VM state change, retry the halted NCQ commands.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 18 ++
 hw/ide/core.c |  7 +++
 hw/ide/internal.h |  1 +
 3 files changed, 26 insertions(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index a838317..6233059 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1332,6 +1332,23 @@ static void ahci_restart_dma(IDEDMA *dma)
 }
 
 /**
+ * IDE/PIO restarts are handled by the core layer, but NCQ commands
+ * need an extra kick from the AHCI HBA.
+ */
+static void ahci_restart(IDEDMA *dma)
+{
+AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+int i;
+
+for (i = 0; i < AHCI_MAX_CMDS; i++) {
+NCQTransferState *ncq_tfs = &ad->ncq_tfs[i];
+if (ncq_tfs->halt) {
+execute_ncq_command(ncq_tfs);
+}
+}
+}
+
+/**
  * Called in DMA R/W chains to read the PRDT, utilizing ahci_populate_sglist.
  * Not currently invoked by PIO R/W chains,
  * which invoke ahci_populate_sglist via ahci_start_transfer.
@@ -1419,6 +1436,7 @@ static void ahci_irq_set(void *opaque, int n, int level)
 
 static const IDEDMAOps ahci_dma_ops = {
 .start_dma = ahci_start_dma,
+.restart = ahci_restart,
 .restart_dma = ahci_restart_dma,
 .start_transfer = ahci_start_transfer,
 .prepare_buf = ahci_dma_prepare_buf,
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 6eefb30..8c271cc 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2372,6 +2372,13 @@ static void ide_restart_bh(void *opaque)
  * called function can set a new error status. */
 bus->error_status = 0;
 
+/* The HBA has generically asked to be kicked on retry */
+if (error_status & IDE_RETRY_HBA) {
+if (s->bus->dma->ops->restart) {
+s->bus->dma->ops->restart(s->bus->dma);
+}
+}
+
 if (error_status & IDE_RETRY_DMA) {
 if (error_status & IDE_RETRY_TRIM) {
 ide_restart_dma(s, IDE_DMA_TRIM);
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 5abee19..adb968c 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -436,6 +436,7 @@ struct IDEDMAOps {
 DMAInt32Func *prepare_buf;
 DMAu32Func *commit_buf;
 DMAIntFunc *rw_buf;
+DMAVoidFunc *restart;
 DMAVoidFunc *restart_dma;
 DMAStopFunc *set_inactive;
 DMAVoidFunc *cmd_done;
-- 
2.1.0

[Qemu-devel] [PATCH 03/16] ahci: assert is_ncq for process_ncq

2015-06-22 Thread John Snow

We already checked this in the handle_cmd phase, so just
change this to an assertion and simplify the error logic.

(Also, fix the switch indent, because checkpatch.pl yelled.)
((Sorry for churn.))

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 60 ++-
 1 file changed, 26 insertions(+), 34 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index ce99e81..2d70104 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -987,6 +987,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 NCQTransferState *ncq_tfs = &ad->ncq_tfs[tag];
 size_t size;
 
+g_assert(is_ncq(ncq_fis->command));
 if (ncq_tfs->used) {
 /* error - already in use */
 fprintf(stderr, "%s: tag %d already used\n", __FUNCTION__, tag);
@@ -1049,44 +1050,35 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 ide_state->nb_sectors - 1);
 
 switch (ncq_tfs->cmd) {
-case READ_FPDMA_QUEUED:
-DPRINTF(port, "NCQ reading %d sectors from LBA %"PRId64", "
-"tag %d\n",
-ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
+case READ_FPDMA_QUEUED:
+DPRINTF(port, "NCQ reading %d sectors from LBA %"PRId64", tag %d\n",
+ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
 
-DPRINTF(port, "tag %d aio read %"PRId64"\n",
-ncq_tfs->tag, ncq_tfs->lba);
+DPRINTF(port, "tag %d aio read %"PRId64"\n",
+ncq_tfs->tag, ncq_tfs->lba);
 
-dma_acct_start(ide_state->blk, &ncq_tfs->acct,
-   &ncq_tfs->sglist, BLOCK_ACCT_READ);
-ncq_tfs->aiocb = dma_blk_read(ide_state->blk,
-  &ncq_tfs->sglist, ncq_tfs->lba,
-  ncq_cb, ncq_tfs);
-break;
-case WRITE_FPDMA_QUEUED:
-DPRINTF(port, "NCQ writing %d sectors to LBA %"PRId64", tag %d\n",
-ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
+dma_acct_start(ide_state->blk, &ncq_tfs->acct,
+   &ncq_tfs->sglist, BLOCK_ACCT_READ);
+ncq_tfs->aiocb = dma_blk_read(ide_state->blk, &ncq_tfs->sglist,
+  ncq_tfs->lba, ncq_cb, ncq_tfs);
+break;
+case WRITE_FPDMA_QUEUED:
+DPRINTF(port, "NCQ writing %d sectors to LBA %"PRId64", tag %d\n",
+ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
 
-DPRINTF(port, "tag %d aio write %"PRId64"\n",
-ncq_tfs->tag, ncq_tfs->lba);
+DPRINTF(port, "tag %d aio write %"PRId64"\n",
+ncq_tfs->tag, ncq_tfs->lba);
 
-dma_acct_start(ide_state->blk, &ncq_tfs->acct,
-   &ncq_tfs->sglist, BLOCK_ACCT_WRITE);
-ncq_tfs->aiocb = dma_blk_write(ide_state->blk,
-   &ncq_tfs->sglist, ncq_tfs->lba,
-   ncq_cb, ncq_tfs);
-break;
-default:
-if (is_ncq(cmd_fis[2])) {
-DPRINTF(port,
-"error: unsupported NCQ command (0x%02x) received\n",
-cmd_fis[2]);
-} else {
-DPRINTF(port,
-"error: tried to process non-NCQ command as NCQ\n");
-}
-qemu_sglist_destroy(&ncq_tfs->sglist);
-ncq_err(ncq_tfs);
+dma_acct_start(ide_state->blk, &ncq_tfs->acct,
+   &ncq_tfs->sglist, BLOCK_ACCT_WRITE);
+ncq_tfs->aiocb = dma_blk_write(ide_state->blk, &ncq_tfs->sglist,
+   ncq_tfs->lba, ncq_cb, ncq_tfs);
+break;
+default:
+DPRINTF(port, "error: unsupported NCQ command (0x%02x) received\n",
+ncq_tfs->cmd);
+qemu_sglist_destroy(&ncq_tfs->sglist);
+ncq_err(ncq_tfs);
 }
 }
 
-- 
2.1.0

[Qemu-devel] [PATCH 00/16] ahci: ncq cleanup, part 2

2015-06-22 Thread John Snow

requires: 1434470575-21625-1-git-send-email-js...@redhat.com
  1435016308-6150-1-git-send-email-js...@redhat.com
  [PATCH v2 0/4] ahci: misc fixes/tests for 2.4
  [PATCH v2 00/16] ahci: ncq cleanup, part 1

This chunk gets NCQ migration and and resume support working.

There's still some left to do, particularly around error handling
and FIS semantics, but this should get us most of the way there.

There is one RFC bit in this patch, inside of Patch #1, concerning
how to handle truncating PRDTs that are too large -- it looks like
we have attempted to address it in the past, and I accidentally
bumped up against it now. By actually trying to consume the PRDs
but ignoring any extra space they describe, I would break ide-test.

I'll post logs later, but judging by the test text itself, we don't
seem to know what the right behavior is.



For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch ahci-ncq-s2
https://github.com/jnsnow/qemu/tree/ahci-ncq-s2

This version is tagged ahci-ncq-s2-v1:
https://github.com/jnsnow/qemu/releases/tag/ahci-ncq-s2-v1

John Snow (16):
  ide: add limit to .prepare_buf()
  ahci: stash ncq command
  ahci: assert is_ncq for process_ncq
  ahci: refactor process_ncq_command
  ahci: factor ncq_finish out of ncq_cb
  ahci: record ncq failures
  ahci: kick NCQ queue
  ahci: correct types in NCQTransferState
  ahci: correct ncq sector count
  qtest/ahci: halted NCQ test
  ahci: add cmd header to ncq transfer state
  ahci: ncq migration
  ahci: add get_cmd_header helper
  ahci: Do not map cmd_fis to generate response
  qtest/ahci: halted ncq migration test
  ahci: fix sdb fis semantics

 hw/ide/ahci.c | 330 --
 hw/ide/ahci.h |   9 +-
 hw/ide/core.c |  12 +-
 hw/ide/internal.h |   4 +-
 hw/ide/macio.c|   2 +-
 hw/ide/pci.c  |   5 +-
 tests/ahci-test.c |  38 +--
 7 files changed, 252 insertions(+), 148 deletions(-)

-- 
2.1.0

[Qemu-devel] [PATCH 04/16] ahci: refactor process_ncq_command

2015-06-22 Thread John Snow

Split off execute_ncq_command so that we can call
it separately later if we desire.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 73 ++-
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 2d70104..4f8cceb 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -977,6 +977,47 @@ static int is_ncq(uint8_t ata_cmd)
 }
 }
 
+static void execute_ncq_command(NCQTransferState *ncq_tfs)
+{
+AHCIDevice *ad = ncq_tfs->drive;
+IDEState *ide_state = &ad->port.ifs[0];
+int port = ad->port_no;
+g_assert(is_ncq(ncq_tfs->cmd));
+
+switch (ncq_tfs->cmd) {
+case READ_FPDMA_QUEUED:
+DPRINTF(port, "NCQ reading %d sectors from LBA %"PRId64", tag %d\n",
+ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
+
+DPRINTF(port, "tag %d aio read %"PRId64"\n",
+ncq_tfs->tag, ncq_tfs->lba);
+
+dma_acct_start(ide_state->blk, &ncq_tfs->acct,
+   &ncq_tfs->sglist, BLOCK_ACCT_READ);
+ncq_tfs->aiocb = dma_blk_read(ide_state->blk, &ncq_tfs->sglist,
+  ncq_tfs->lba, ncq_cb, ncq_tfs);
+break;
+case WRITE_FPDMA_QUEUED:
+DPRINTF(port, "NCQ writing %d sectors to LBA %"PRId64", tag %d\n",
+ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
+
+DPRINTF(port, "tag %d aio write %"PRId64"\n",
+ncq_tfs->tag, ncq_tfs->lba);
+
+dma_acct_start(ide_state->blk, &ncq_tfs->acct,
+   &ncq_tfs->sglist, BLOCK_ACCT_WRITE);
+ncq_tfs->aiocb = dma_blk_write(ide_state->blk, &ncq_tfs->sglist,
+   ncq_tfs->lba, ncq_cb, ncq_tfs);
+break;
+default:
+DPRINTF(port, "error: unsupported NCQ command (0x%02x) received\n",
+ncq_tfs->cmd);
+qemu_sglist_destroy(&ncq_tfs->sglist);
+ncq_err(ncq_tfs);
+}
+}
+
+
 static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
 int slot)
 {
@@ -1049,37 +1090,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 ncq_tfs->lba, ncq_tfs->lba + ncq_tfs->sector_count - 1,
 ide_state->nb_sectors - 1);
 
-switch (ncq_tfs->cmd) {
-case READ_FPDMA_QUEUED:
-DPRINTF(port, "NCQ reading %d sectors from LBA %"PRId64", tag %d\n",
-ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
-
-DPRINTF(port, "tag %d aio read %"PRId64"\n",
-ncq_tfs->tag, ncq_tfs->lba);
-
-dma_acct_start(ide_state->blk, &ncq_tfs->acct,
-   &ncq_tfs->sglist, BLOCK_ACCT_READ);
-ncq_tfs->aiocb = dma_blk_read(ide_state->blk, &ncq_tfs->sglist,
-  ncq_tfs->lba, ncq_cb, ncq_tfs);
-break;
-case WRITE_FPDMA_QUEUED:
-DPRINTF(port, "NCQ writing %d sectors to LBA %"PRId64", tag %d\n",
-ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
-
-DPRINTF(port, "tag %d aio write %"PRId64"\n",
-ncq_tfs->tag, ncq_tfs->lba);
-
-dma_acct_start(ide_state->blk, &ncq_tfs->acct,
-   &ncq_tfs->sglist, BLOCK_ACCT_WRITE);
-ncq_tfs->aiocb = dma_blk_write(ide_state->blk, &ncq_tfs->sglist,
-   ncq_tfs->lba, ncq_cb, ncq_tfs);
-break;
-default:
-DPRINTF(port, "error: unsupported NCQ command (0x%02x) received\n",
-ncq_tfs->cmd);
-qemu_sglist_destroy(&ncq_tfs->sglist);
-ncq_err(ncq_tfs);
-}
+execute_ncq_command(ncq_tfs);
 }
 
 static void handle_reg_h2d_fis(AHCIState *s, int port,
-- 
2.1.0

[Qemu-devel] [PATCH 02/16] ahci: stash ncq command

2015-06-22 Thread John Snow

For migration and werror=stop/rerror=stop resume purposes,
it will be convenient to have the command handy inside of
ncq_tfs.

Eventually, we'd like to avoid reading from the FIS entirely
after the initial read, so this is a byte (hah!) sized step
in that direction.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 3 ++-
 hw/ide/ahci.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index f873ab1..ce99e81 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -996,6 +996,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 ncq_tfs->used = 1;
 ncq_tfs->drive = ad;
 ncq_tfs->slot = slot;
+ncq_tfs->cmd = ncq_fis->command;
 ncq_tfs->lba = ((uint64_t)ncq_fis->lba5 << 40) |
((uint64_t)ncq_fis->lba4 << 32) |
((uint64_t)ncq_fis->lba3 << 24) |
@@ -1047,7 +1048,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 ncq_tfs->lba, ncq_tfs->lba + ncq_tfs->sector_count - 1,
 ide_state->nb_sectors - 1);
 
-switch(ncq_fis->command) {
+switch (ncq_tfs->cmd) {
 case READ_FPDMA_QUEUED:
 DPRINTF(port, "NCQ reading %d sectors from LBA %"PRId64", "
 "tag %d\n",
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index b8872a4..33607d7 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -259,6 +259,7 @@ typedef struct NCQTransferState {
 uint16_t sector_count;
 uint64_t lba;
 uint8_t tag;
+uint8_t cmd;
 int slot;
 int used;
 } NCQTransferState;
-- 
2.1.0

Re: [Qemu-devel] [PATCH 09/11] kvm/x86: distingiush hyper-v guest crash notification

2015-06-22 Thread Peter Hornyack

On Mon, Jun 22, 2015 at 9:05 AM, Denis V. Lunev  wrote:
> From: Andrey Smetanin 
>
> Previous patches allowes userspace to setup Hyper-V crash ctl msr.
> This msr should expose HV_X64_MSR_CRASH_CTL_NOTIFY value to Hyper-V
> guest to allow to send crash data. Unfortunately Hyper-V guest notifies
> hardware about crash by writing the same HV_X64_MSR_CRASH_CTL_NOTIFY value
> into crash ctl msr. Thus both user space and guest writes inside ctl msr the
> same value and this patch distingiush the moment of actual guest crash
> by checking host initiated value from msr info.
>
> Signed-off-by: Andrey Smetanin 
> Signed-off-by: Denis V. Lunev 
> CC: Paolo Bonzini 
> CC: Gleb Natapov 
> ---
>  arch/x86/kvm/hyperv.c | 17 +
>  arch/x86/kvm/hyperv.h |  2 +-
>  arch/x86/kvm/x86.c|  3 ++-
>  3 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 6b18015..f49502a 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -54,12 +54,12 @@ static int kvm_hv_msr_get_crash_ctl(struct kvm_vcpu 
> *vcpu, u64 *pdata)
> return 0;
>  }
>
> -static int kvm_hv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u64 data)
> +static int kvm_hv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u64 data, bool 
> host)
>  {
> struct kvm_arch_hyperv *hv = &vcpu->kvm->arch.hyperv;
>
> hv->hv_crash_ctl = data;

Should we allow hv_crash_ctl to be set if !host? It's a small detail,
but it doesn't seem like the guest should be able to disable crash
actions that userspace has enabled in hv_crash_ctl.

> -   if ((data & HV_X64_MSR_CRASH_CTL_NOTIFY)) {
> +   if ((data & HV_X64_MSR_CRASH_CTL_NOTIFY) && !host) {
> vcpu_debug(vcpu, "hv crash (0x%llx 0x%llx 0x%llx 0x%llx "
>   "0x%llx)\n", hv->hv_crash_param[0],
>   hv->hv_crash_param[1],
> @@ -99,7 +99,8 @@ static int kvm_hv_msr_get_crash_data(struct kvm_vcpu *vcpu,
> return 0;
>  }
>
> -static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> +static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu,
> +u32 msr, u64 data, bool host)
>  {
> struct kvm *kvm = vcpu->kvm;
> struct kvm_arch_hyperv *hv = &kvm->arch.hyperv;
> @@ -156,7 +157,7 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 
> msr, u64 data)
>  msr - HV_X64_MSR_CRASH_P0,
>  data);
> case HV_X64_MSR_CRASH_CTL:
> -   return kvm_hv_msr_set_crash_ctl(vcpu, data);
> +   return kvm_hv_msr_set_crash_ctl(vcpu, data, host);
> default:
> vcpu_unimpl(vcpu, "Hyper-V unimpl wrmsr: 0x%x data 0x%llx\n",
>   msr, data);
> @@ -165,7 +166,7 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 
> msr, u64 data)
> return 0;
>  }
>
> -static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> +static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool 
> host)
>  {
> struct kvm_vcpu_arch_hyperv *hv = &vcpu->arch.hyperv;
>
> @@ -278,17 +279,17 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 
> msr, u64 *pdata)
> return 0;
>  }
>
> -int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> +int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool 
> host)
>  {
> if (kvm_hv_msr_partition_wide(msr)) {
> int r;
>
> mutex_lock(&vcpu->kvm->lock);
> -   r = kvm_hv_set_msr_pw(vcpu, msr, data);
> +   r = kvm_hv_set_msr_pw(vcpu, msr, data, host);
> mutex_unlock(&vcpu->kvm->lock);
> return r;
> } else
> -   return kvm_hv_set_msr(vcpu, msr, data);
> +   return kvm_hv_set_msr(vcpu, msr, data, host);
>  }
>
>  int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 39aee93..dc49527 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -22,7 +22,7 @@
>  #ifndef __ARCH_X86_KVM_HYPERV_H__
>  #define __ARCH_X86_KVM_HYPERV_H__
>
> -int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
> +int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool 
> host);
>  int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
>
>  bool kvm_hv_hypercall_enabled(struct kvm *kvm);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 111fa83..db4eecb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2210,7 +2210,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
> msr_data *msr_info)
> case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
> case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
> case HV_X64_MSR_CRASH_CTL:
> -   return kvm_hv_set_msr_c

Re: [Qemu-devel] [PATCH 07/11] kvm/x86: added hyper-v crash data and ctl msr's get/set'ers

2015-06-22 Thread Peter Hornyack

On Mon, Jun 22, 2015 at 9:05 AM, Denis V. Lunev  wrote:
> From: Andrey Smetanin 
>
> Added hyper-v crash msr's(HV_X64_MSR_CRASH*) data and control
> geters and setters.
>
> Signed-off-by: Andrey Smetanin 
> Signed-off-by: Denis V. Lunev 
> CC: Paolo Bonzini 
> CC: Gleb Natapov 
> ---
>  arch/x86/kvm/hyperv.c | 66 
> +++
>  arch/x86/kvm/x86.c|  4 
>  2 files changed, 70 insertions(+)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index f65fb622..0a7d373 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -46,6 +46,59 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
> return r;
>  }
>
> +static int kvm_hv_msr_get_crash_ctl(struct kvm_vcpu *vcpu, u64 *pdata)
> +{
> +   struct kvm_arch_hyperv *hv = &vcpu->kvm->arch.hyperv;
> +
> +   *pdata = hv->hv_crash_ctl;

I see that this is implemented so that userspace controls the Hyper-V
crash capabilities that are enabled - userspace must set
HV_X64_MSR_CRASH_CTL_NOTIFY before the guest reads
HV_X64_MSR_CRASH_CTL. I just want to check that you considered an
alternative: a simpler implementation would have the crash
capabilities statically defined by kvm, and
HV_X64_MSR_CRASH_CTL_CONTENTS could just be returned here (and
hv_crash_ctl could be removed from struct kvm_arch_hyperv).

The current implementation is potentially more flexible but makes the
MSR handling a little more awkward since the host_initiated bool needs
to be passed around (patch 09). I guess either approach seems ok to
me.

Also, if this patchset is used then it looks like
HV_X64_MSR_CRASH_CTL_CONTENTS can be removed.

> +   return 0;
> +}
> +
> +static int kvm_hv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u64 data)
> +{
> +   struct kvm_arch_hyperv *hv = &vcpu->kvm->arch.hyperv;
> +
> +   hv->hv_crash_ctl = data;
> +   if ((data & HV_X64_MSR_CRASH_CTL_NOTIFY)) {
> +   vcpu_debug(vcpu, "hv crash (0x%llx 0x%llx 0x%llx 0x%llx "
> + "0x%llx)\n", hv->hv_crash_param[0],
> + hv->hv_crash_param[1],
> + hv->hv_crash_param[2],
> + hv->hv_crash_param[3],
> + hv->hv_crash_param[4]);
> +
> +   /* Send notification about crash to user space */
> +   kvm_make_request(KVM_REQ_HV_CRASH, vcpu);
> +   return 0;

Returning from here seems unnecessary - if more crash capabilities are
added in the future, the guest might want to invoke multiple
capabilities at once, so we'd want to check for those here too.

> +   }
> +
> +   return 0;
> +}
> +
> +static int kvm_hv_msr_set_crash_data(struct kvm_vcpu *vcpu,
> +u32 index, u64 data)
> +{
> +   struct kvm_arch_hyperv *hv = &vcpu->kvm->arch.hyperv;
> +
> +   if (WARN_ON_ONCE(index >= ARRAY_SIZE(hv->hv_crash_param)))
> +   return -EINVAL;
> +
> +   hv->hv_crash_param[index] = data;
> +   return 0;
> +}
> +
> +static int kvm_hv_msr_get_crash_data(struct kvm_vcpu *vcpu,
> +u32 index, u64 *pdata)
> +{
> +   struct kvm_arch_hyperv *hv = &vcpu->kvm->arch.hyperv;
> +
> +   if (WARN_ON_ONCE(index >= ARRAY_SIZE(hv->hv_crash_param)))
> +   return -EINVAL;
> +
> +   *pdata = hv->hv_crash_param[index];
> +   return 0;
> +}
> +
>  static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  {
> struct kvm *kvm = vcpu->kvm;
> @@ -98,6 +152,12 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 
> msr, u64 data)
> mark_page_dirty(kvm, gfn);
> break;
> }
> +   case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
> +   return kvm_hv_msr_set_crash_data(vcpu,
> +msr - HV_X64_MSR_CRASH_P0,
> +data);
> +   case HV_X64_MSR_CRASH_CTL:
> +   return kvm_hv_msr_set_crash_ctl(vcpu, data);
> default:
> vcpu_unimpl(vcpu, "Hyper-V unimpl wrmsr: 0x%x data 0x%llx\n",
>   msr, data);
> @@ -170,6 +230,12 @@ static int kvm_hv_get_msr_pw(struct kvm_vcpu *vcpu, u32 
> msr, u64 *pdata)
> case HV_X64_MSR_REFERENCE_TSC:
> data = hv->hv_tsc_page;
> break;
> +   case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
> +   return kvm_hv_msr_get_crash_data(vcpu,
> +msr - HV_X64_MSR_CRASH_P0,
> +pdata);
> +   case HV_X64_MSR_CRASH_CTL:
> +   return kvm_hv_msr_get_crash_ctl(vcpu, pdata);
> default:
> vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
> return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2755c37..2046b78 100644
> -

[Qemu-devel] [PATCH] thread-win32: fix GetThreadContext() permanently fails

2015-06-22 Thread Zavadovsky Yan

Calling SuspendThread() is not enough to suspend Win32 thread.
We need to call GetThreadContext() after SuspendThread()
to make sure that OS have really suspended target thread.
But GetThreadContext() needs for THREAD_GET_CONTEXT
access right on thread object.

This patch adds THREAD_GET_CONTEXT to OpenThread() arguments
and change 'while(GetThreadContext() == SUCCESS)' to
'while(GetThreadContext() == FAILED)'.
So this 'while' loop will stop only after successful grabbing
of thread context(i.e. when thread is really suspended).
Not after the one failed GetThreadContext() call.

Signed-off-by: Zavadovsky Yan 
---
 cpus.c   | 2 +-
 util/qemu-thread-win32.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/cpus.c b/cpus.c
index b85fb5f..83d5eb5 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1097,7 +1097,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
  * suspended until we can get the context.
  */
 tcgContext.ContextFlags = CONTEXT_CONTROL;
-while (GetThreadContext(cpu->hThread, &tcgContext) != 0) {
+while (GetThreadContext(cpu->hThread, &tcgContext) == 0) {
 continue;
 }
 
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 406b52f..823eca1 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -406,8 +406,8 @@ HANDLE qemu_thread_get_handle(QemuThread *thread)
 
 EnterCriticalSection(&data->cs);
 if (!data->exited) {
-handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME, FALSE,
-thread->tid);
+handle = OpenThread(SYNCHRONIZE | THREAD_SUSPEND_RESUME | 
THREAD_GET_CONTEXT,
+FALSE, thread->tid);
 } else {
 handle = NULL;
 }
-- 
2.4.4.windows.2

[Qemu-devel] [PATCH v2 16/16] qtest/ahci: ncq migration test

2015-06-22 Thread John Snow

Signed-off-by: John Snow 
---
 tests/ahci-test.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 0d117fe..3f06fd9 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -1146,9 +1146,9 @@ static void test_migrate_sanity(void)
 }
 
 /**
- * DMA Migration test: Write a pattern, migrate, then read.
+ * Simple migration test: Write a pattern, migrate, then read.
  */
-static void test_migrate_dma(void)
+static void ahci_migrate_simple(uint8_t cmd_read, uint8_t cmd_write)
 {
 AHCIQState *src, *dst;
 uint8_t px;
@@ -1176,9 +1176,9 @@ static void test_migrate_dma(void)
 }
 
 /* Write, migrate, then read. */
-ahci_io(src, px, CMD_WRITE_DMA, tx, bufsize, 0);
+ahci_io(src, px, cmd_write, tx, bufsize, 0);
 ahci_migrate(src, dst, uri);
-ahci_io(dst, px, CMD_READ_DMA, rx, bufsize, 0);
+ahci_io(dst, px, cmd_read, rx, bufsize, 0);
 
 /* Verify pattern */
 g_assert_cmphex(memcmp(tx, rx, bufsize), ==, 0);
@@ -1189,6 +1189,16 @@ static void test_migrate_dma(void)
 g_free(tx);
 }
 
+static void test_migrate_dma(void)
+{
+ahci_migrate_simple(CMD_READ_DMA, CMD_WRITE_DMA);
+}
+
+static void test_migrate_ncq(void)
+{
+ahci_migrate_simple(READ_FPDMA_QUEUED, WRITE_FPDMA_QUEUED);
+}
+
 /**
  * DMA Error Test
  *
@@ -1666,6 +1676,7 @@ int main(int argc, char **argv)
 qtest_add_func("/ahci/reset", test_reset);
 
 qtest_add_func("/ahci/io/ncq/simple", test_ncq_simple);
+qtest_add_func("/ahci/migrate/ncq/simple", test_migrate_ncq);
 
 ret = g_test_run();
 
-- 
2.1.0

[Qemu-devel] [PATCH v2 13/16] libqos/ahci: set the NCQ tag on command_commit

2015-06-22 Thread John Snow

NCQ commands have the concept of a "TAG" that they need to set,
but in the AHCI world, it is mandated that the TAG always match
the command slot that you executed the NCQ from.

See AHCI 9.3.1.1.5.2 "Native Queued Commands".

Signed-off-by: John Snow 
---
 tests/libqos/ahci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 953a320..7cf667a 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -857,6 +857,11 @@ void ahci_command_commit(AHCIQState *ahci, AHCICommand 
*cmd, uint8_t port)
 cmd->port = port;
 cmd->slot = ahci_pick_cmd(ahci, port);
 
+if (cmd->props->ncq) {
+NCQFIS *nfis = (NCQFIS *)&cmd->fis;
+nfis->tag = (cmd->slot << 3) & 0xFC;
+}
+
 /* Create a buffer for the command table */
 prdtl = size_to_prdtl(cmd->xbytes, cmd->prd_size);
 table_size = CMD_TBL_SIZ(prdtl);
-- 
2.1.0

[Qemu-devel] [PATCH v2 14/16] libqos/ahci: Force all NCQ commands to be LBA48

2015-06-22 Thread John Snow

NCQ commands are LBA48 by definition.

See SATA 3.2 13.6.4.1 "READ FPDMA QUEUED", or
SATA 3.2 13.6.5.1 "WRITE FPDMA QUEUED."

Signed-off-by: John Snow 
---
 tests/libqos/ahci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 7cf667a..3d62cb6 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -781,7 +781,7 @@ void ahci_command_set_offset(AHCICommand *cmd, uint64_t 
lba_sect)
 RegH2DFIS *fis = &(cmd->fis);
 if (cmd->props->lba28) {
 g_assert_cmphex(lba_sect, <=, 0xFFF);
-} else if (cmd->props->lba48) {
+} else if (cmd->props->lba48 || cmd->props->ncq) {
 g_assert_cmphex(lba_sect, <=, 0x);
 } else {
 /* Can't set offset if we don't know the format. */
-- 
2.1.0

[Qemu-devel] [PATCH v2 10/16] libqos/ahci: add NCQ frame support

2015-06-22 Thread John Snow

NCQ frames are generated a little differently than
their non-NCQ cousins. Add support for them.

Signed-off-by: John Snow 
---
 tests/libqos/ahci.c | 44 +++-
 tests/libqos/ahci.h | 29 -
 2 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 922af8b..9a4d510 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -695,19 +695,34 @@ static void command_header_init(AHCICommand *cmd)
 static void command_table_init(AHCICommand *cmd)
 {
 RegH2DFIS *fis = &(cmd->fis);
+uint16_t sect_count = (cmd->xbytes / AHCI_SECTOR_SIZE);
 
 fis->fis_type = REG_H2D_FIS;
 fis->flags = REG_H2D_FIS_CMD; /* "Command" bit */
 fis->command = cmd->name;
-cmd->fis.feature_low = 0x00;
-cmd->fis.feature_high = 0x00;
-if (cmd->props->lba28 || cmd->props->lba48) {
-cmd->fis.device = ATA_DEVICE_LBA;
+
+if (cmd->props->ncq) {
+NCQFIS *ncqfis = (NCQFIS *)fis;
+/* NCQ is weird and re-uses FIS frames for unrelated data.
+ * See SATA 3.2, 13.6.4.1 READ FPDMA QUEUED for an example. */
+ncqfis->sector_low = sect_count & 0xFF;
+ncqfis->sector_hi = (sect_count >> 8) & 0xFF;
+ncqfis->device = NCQ_DEVICE_MAGIC;
+/* Force Unit Access is bit 7 in the device register */
+ncqfis->tag = 0;  /* bits 3-7 are the NCQ tag */
+ncqfis->prio = 0; /* bits 6,7 are a prio tag */
+/* RARC bit is bit 0 of TAG field */
+} else {
+fis->feature_low = 0x00;
+fis->feature_high = 0x00;
+if (cmd->props->lba28 || cmd->props->lba48) {
+fis->device = ATA_DEVICE_LBA;
+}
+fis->count = (cmd->xbytes / AHCI_SECTOR_SIZE);
 }
-cmd->fis.count = (cmd->xbytes / AHCI_SECTOR_SIZE);
-cmd->fis.icc = 0x00;
-cmd->fis.control = 0x00;
-memset(cmd->fis.aux, 0x00, ARRAY_SIZE(cmd->fis.aux));
+fis->icc = 0x00;
+fis->control = 0x00;
+memset(fis->aux, 0x00, ARRAY_SIZE(fis->aux));
 }
 
 AHCICommand *ahci_command_create(uint8_t command_name)
@@ -721,6 +736,7 @@ AHCICommand *ahci_command_create(uint8_t command_name)
 g_assert(!(props->lba28 && props->lba48));
 g_assert(!(props->read && props->write));
 g_assert(!props->size || props->data);
+g_assert(!props->ncq || (props->ncq && props->lba48));
 
 /* Defaults and book-keeping */
 cmd->props = props;
@@ -789,6 +805,8 @@ void ahci_command_set_buffer(AHCICommand *cmd, uint64_t 
buffer)
 void ahci_command_set_sizes(AHCICommand *cmd, uint64_t xbytes,
 unsigned prd_size)
 {
+uint16_t sect_count;
+
 /* Each PRD can describe up to 4MiB, and must not be odd. */
 g_assert_cmphex(prd_size, <=, 4096 * 1024);
 g_assert_cmphex(prd_size & 0x01, ==, 0x00);
@@ -796,7 +814,15 @@ void ahci_command_set_sizes(AHCICommand *cmd, uint64_t 
xbytes,
 cmd->prd_size = prd_size;
 }
 cmd->xbytes = xbytes;
-cmd->fis.count = (cmd->xbytes / AHCI_SECTOR_SIZE);
+sect_count = (cmd->xbytes / AHCI_SECTOR_SIZE);
+
+if (cmd->props->ncq) {
+NCQFIS *nfis = (NCQFIS *)&(cmd->fis);
+nfis->sector_low = sect_count & 0xFF;
+nfis->sector_hi = (sect_count >> 8) & 0xFF;
+} else {
+cmd->fis.count = sect_count;
+}
 cmd->header.prdtl = size_to_prdtl(cmd->xbytes, cmd->prd_size);
 }
 
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index ca8abee..594a1c9 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -291,8 +291,9 @@ enum {
 #define CMDH_PMP  (0xF000)
 
 /* ATA device register masks */
-#define ATA_DEVICE_MAGIC 0xA0
+#define ATA_DEVICE_MAGIC 0xA0 /* used in ata1-3 */
 #define ATA_DEVICE_LBA   0x40
+#define NCQ_DEVICE_MAGIC 0x40 /* for ncq device registers */
 #define ATA_DEVICE_DRIVE 0x10
 #define ATA_DEVICE_HEAD  0x0F
 
@@ -397,6 +398,32 @@ typedef struct RegH2DFIS {
 } __attribute__((__packed__)) RegH2DFIS;
 
 /**
+ * Register host-to-device FIS structure, for NCQ commands.
+ * Actually just a RegH2DFIS, but with fields repurposed.
+ * Repurposed fields are annotated below.
+ */
+typedef struct NCQFIS {
+/* DW0 */
+uint8_t fis_type;
+uint8_t flags;
+uint8_t command;
+uint8_t sector_low; /* H2D: Feature 7:0 */
+/* DW1 */
+uint8_t lba_lo[3];
+uint8_t device;
+/* DW2 */
+uint8_t lba_hi[3];
+uint8_t sector_hi; /* H2D: Feature 15:8 */
+/* DW3 */
+uint8_t tag;   /* H2D: Count 0:7 */
+uint8_t prio;  /* H2D: Count 15:8 */
+uint8_t icc;
+uint8_t control;
+/* DW4 */
+uint8_t aux[4];
+} __attribute__((__packed__)) NCQFIS;
+
+/**
  * Command List entry structure.
  * The command list contains between 1-32 of these structures.
  */
-- 
2.1.0

[Qemu-devel] [PATCH v2 07/16] ahci: ncq sector count correction

2015-06-22 Thread John Snow

This value should not be size-corrected, 0 sectors does not imply
1 sector(s). This is just debug information, but it's misleading!

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index c2e6642..95d228f 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1043,14 +1043,14 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 
 DPRINTF(port, "NCQ transfer LBA from %"PRId64" to %"PRId64", "
 "drive max %"PRId64"\n",
-ncq_tfs->lba, ncq_tfs->lba + ncq_tfs->sector_count - 2,
+ncq_tfs->lba, ncq_tfs->lba + ncq_tfs->sector_count - 1,
 ide_state->nb_sectors - 1);
 
 switch(ncq_fis->command) {
 case READ_FPDMA_QUEUED:
 DPRINTF(port, "NCQ reading %d sectors from LBA %"PRId64", "
 "tag %d\n",
-ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
+ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
 
 DPRINTF(port, "tag %d aio read %"PRId64"\n",
 ncq_tfs->tag, ncq_tfs->lba);
@@ -1063,7 +1063,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 break;
 case WRITE_FPDMA_QUEUED:
 DPRINTF(port, "NCQ writing %d sectors to LBA %"PRId64", tag %d\n",
-ncq_tfs->sector_count-1, ncq_tfs->lba, ncq_tfs->tag);
+ncq_tfs->sector_count, ncq_tfs->lba, ncq_tfs->tag);
 
 DPRINTF(port, "tag %d aio write %"PRId64"\n",
 ncq_tfs->tag, ncq_tfs->lba);
-- 
2.1.0

[Qemu-devel] [PATCH v2 12/16] libqos/ahci: adjust expected NCQ interrupts

2015-06-22 Thread John Snow

NCQ commands will expect the SDBS interrupt,
and in the normative case, do not expect to see
a D2H Register FIS unless something went wrong.

Signed-off-by: John Snow 
---
 tests/libqos/ahci.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index da02e2e..953a320 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -745,12 +745,15 @@ AHCICommand *ahci_command_create(uint8_t command_name)
 cmd->prd_size = 4096;
 cmd->buffer = 0xabad1dea;
 
-cmd->interrupts = AHCI_PX_IS_DHRS;
+if (!cmd->props->ncq) {
+cmd->interrupts = AHCI_PX_IS_DHRS;
+}
 /* BUG: We expect the DPS interrupt for data commands */
 /* cmd->interrupts |= props->data ? AHCI_PX_IS_DPS : 0; */
 /* BUG: We expect the DMA Setup interrupt for DMA commands */
 /* cmd->interrupts |= props->dma ? AHCI_PX_IS_DSS : 0; */
 cmd->interrupts |= props->pio ? AHCI_PX_IS_PSS : 0;
+cmd->interrupts |= props->ncq ? AHCI_PX_IS_SDBS : 0;
 
 command_header_init(cmd);
 command_table_init(cmd);
@@ -934,7 +937,9 @@ void ahci_command_verify(AHCIQState *ahci, AHCICommand *cmd)
 ahci_port_check_interrupts(ahci, port, cmd->interrupts);
 ahci_port_check_nonbusy(ahci, port, slot);
 ahci_port_check_cmd_sanity(ahci, cmd);
-ahci_port_check_d2h_sanity(ahci, port, slot);
+if (cmd->interrupts & AHCI_PX_IS_DHRS) {
+ahci_port_check_d2h_sanity(ahci, port, slot);
+}
 if (cmd->props->pio) {
 ahci_port_check_pio_sanity(ahci, port, slot, cmd->xbytes);
 }
-- 
2.1.0

[Qemu-devel] [PATCH v2 09/16] libqos/ahci: fix cmd_sanity for ncq

2015-06-22 Thread John Snow

NCQ commands should not / do not update the byte count
in the command header post command, so this field is
meaningless for NCQ tests.

Signed-off-by: John Snow 
---
 tests/libqos/ahci.c | 46 --
 tests/libqos/ahci.h |  3 +--
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 08e1c98..922af8b 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -73,6 +73,22 @@ AHCICommandProp ahci_command_properties[] = {
 { .cmd = CMD_FLUSH_CACHE,   .data = false }
 };
 
+struct AHCICommand {
+/* Test Management Data */
+uint8_t name;
+uint8_t port;
+uint8_t slot;
+uint32_t interrupts;
+uint64_t xbytes;
+uint32_t prd_size;
+uint64_t buffer;
+AHCICommandProp *props;
+/* Data to be transferred to the guest */
+AHCICommandHeader header;
+RegH2DFIS fis;
+void *atapi_cmd;
+};
+
 /**
  * Allocate space in the guest using information in the AHCIQState object.
  */
@@ -462,13 +478,15 @@ void ahci_port_check_pio_sanity(AHCIQState *ahci, uint8_t 
port,
 g_free(pio);
 }
 
-void ahci_port_check_cmd_sanity(AHCIQState *ahci, uint8_t port,
-uint8_t slot, size_t buffsize)
+void ahci_port_check_cmd_sanity(AHCIQState *ahci, AHCICommand *cmd)
 {
-AHCICommandHeader cmd;
+AHCICommandHeader cmdh;
 
-ahci_get_command_header(ahci, port, slot, &cmd);
-g_assert_cmphex(buffsize, ==, cmd.prdbc);
+ahci_get_command_header(ahci, cmd->port, cmd->slot, &cmdh);
+/* Physical Region Descriptor Byte Count is not required to work for NCQ. 
*/
+if (!cmd->props->ncq) {
+g_assert_cmphex(cmd->xbytes, ==, cmdh.prdbc);
+}
 }
 
 /* Get the command in #slot of port #port. */
@@ -612,22 +630,6 @@ void ahci_guest_io(AHCIQState *ahci, uint8_t port, uint8_t 
ide_cmd,
 ahci_command_free(cmd);
 }
 
-struct AHCICommand {
-/* Test Management Data */
-uint8_t name;
-uint8_t port;
-uint8_t slot;
-uint32_t interrupts;
-uint64_t xbytes;
-uint32_t prd_size;
-uint64_t buffer;
-AHCICommandProp *props;
-/* Data to be transferred to the guest */
-AHCICommandHeader header;
-RegH2DFIS fis;
-void *atapi_cmd;
-};
-
 static AHCICommandProp *ahci_command_find(uint8_t command_name)
 {
 int i;
@@ -901,7 +903,7 @@ void ahci_command_verify(AHCIQState *ahci, AHCICommand *cmd)
 ahci_port_check_error(ahci, port);
 ahci_port_check_interrupts(ahci, port, cmd->interrupts);
 ahci_port_check_nonbusy(ahci, port, slot);
-ahci_port_check_cmd_sanity(ahci, port, slot, cmd->xbytes);
+ahci_port_check_cmd_sanity(ahci, cmd);
 ahci_port_check_d2h_sanity(ahci, port, slot);
 if (cmd->props->pio) {
 ahci_port_check_pio_sanity(ahci, port, slot, cmd->xbytes);
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 779e812..ca8abee 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -512,8 +512,7 @@ void ahci_port_check_nonbusy(AHCIQState *ahci, uint8_t 
port, uint8_t slot);
 void ahci_port_check_d2h_sanity(AHCIQState *ahci, uint8_t port, uint8_t slot);
 void ahci_port_check_pio_sanity(AHCIQState *ahci, uint8_t port,
 uint8_t slot, size_t buffsize);
-void ahci_port_check_cmd_sanity(AHCIQState *ahci, uint8_t port,
-uint8_t slot, size_t buffsize);
+void ahci_port_check_cmd_sanity(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_get_command_header(AHCIQState *ahci, uint8_t port,
  uint8_t slot, AHCICommandHeader *cmd);
 void ahci_set_command_header(AHCIQState *ahci, uint8_t port,
-- 
2.1.0

[Qemu-devel] [PATCH v2 15/16] qtest/ahci: simple ncq data test

2015-06-22 Thread John Snow

Test the NCQ pathways for a simple IO RW test.
Also, test that libqos doesn't explode when
running NCQ commands :)

Signed-off-by: John Snow 
---
 tests/ahci-test.c   | 13 +
 tests/libqos/ahci.c | 46 +-
 tests/libqos/ahci.h | 27 +++
 3 files changed, 53 insertions(+), 33 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index ee1dc20..0d117fe 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -1401,6 +1401,17 @@ static void test_reset(void)
 ahci_shutdown(ahci);
 }
 
+static void test_ncq_simple(void)
+{
+AHCIQState *ahci;
+
+ahci = ahci_boot_and_enable(NULL);
+ahci_test_io_rw_simple(ahci, 4096, 0,
+   READ_FPDMA_QUEUED,
+   WRITE_FPDMA_QUEUED);
+ahci_shutdown(ahci);
+}
+
 
/**/
 /* AHCI I/O Test Matrix Definitions   
*/
 
@@ -1654,6 +1665,8 @@ int main(int argc, char **argv)
 qtest_add_func("/ahci/max", test_max);
 qtest_add_func("/ahci/reset", test_reset);
 
+qtest_add_func("/ahci/io/ncq/simple", test_ncq_simple);
+
 ret = g_test_run();
 
 /* Cleanup */
diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 3d62cb6..33ecd2a 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -50,27 +50,31 @@ typedef struct AHCICommandProp {
 } AHCICommandProp;
 
 AHCICommandProp ahci_command_properties[] = {
-{ .cmd = CMD_READ_PIO,  .data = true,  .pio = true,
-.lba28 = true, .read = true },
-{ .cmd = CMD_WRITE_PIO, .data = true,  .pio = true,
-.lba28 = true, .write = true },
-{ .cmd = CMD_READ_PIO_EXT,  .data = true,  .pio = true,
-.lba48 = true, .read = true },
-{ .cmd = CMD_WRITE_PIO_EXT, .data = true,  .pio = true,
-.lba48 = true, .write = true },
-{ .cmd = CMD_READ_DMA,  .data = true,  .dma = true,
-.lba28 = true, .read = true },
-{ .cmd = CMD_WRITE_DMA, .data = true,  .dma = true,
-.lba28 = true, .write = true },
-{ .cmd = CMD_READ_DMA_EXT,  .data = true,  .dma = true,
-.lba48 = true, .read = true },
-{ .cmd = CMD_WRITE_DMA_EXT, .data = true,  .dma = true,
-.lba48 = true, .write = true },
-{ .cmd = CMD_IDENTIFY,  .data = true,  .pio = true,
-.size = 512,   .read = true },
-{ .cmd = CMD_READ_MAX,  .lba28 = true },
-{ .cmd = CMD_READ_MAX_EXT,  .lba48 = true },
-{ .cmd = CMD_FLUSH_CACHE,   .data = false }
+{ .cmd = CMD_READ_PIO,   .data = true,  .pio = true,
+ .lba28 = true, .read = true },
+{ .cmd = CMD_WRITE_PIO,  .data = true,  .pio = true,
+ .lba28 = true, .write = true },
+{ .cmd = CMD_READ_PIO_EXT,   .data = true,  .pio = true,
+ .lba48 = true, .read = true },
+{ .cmd = CMD_WRITE_PIO_EXT,  .data = true,  .pio = true,
+ .lba48 = true, .write = true },
+{ .cmd = CMD_READ_DMA,   .data = true,  .dma = true,
+ .lba28 = true, .read = true },
+{ .cmd = CMD_WRITE_DMA,  .data = true,  .dma = true,
+ .lba28 = true, .write = true },
+{ .cmd = CMD_READ_DMA_EXT,   .data = true,  .dma = true,
+ .lba48 = true, .read = true },
+{ .cmd = CMD_WRITE_DMA_EXT,  .data = true,  .dma = true,
+ .lba48 = true, .write = true },
+{ .cmd = CMD_IDENTIFY,   .data = true,  .pio = true,
+ .size = 512,   .read = true },
+{ .cmd = READ_FPDMA_QUEUED,  .data = true,  .dma = true,
+ .lba48 = true, .read = true, .ncq = true },
+{ .cmd = WRITE_FPDMA_QUEUED, .data = true,  .dma = true,
+ .lba48 = true, .write = true, .ncq = true },
+{ .cmd = CMD_READ_MAX,   .lba28 = true },
+{ .cmd = CMD_READ_MAX_EXT,   .lba48 = true },
+{ .cmd = CMD_FLUSH_CACHE,.data = false }
 };
 
 struct AHCICommand {
diff --git a/tests/libqos/ahci.h b/tests/libqos/ahci.h
index 594a1c9..a08a9dd 100644
--- a/tests/libqos/ahci.h
+++ b/tests/libqos/ahci.h
@@ -263,20 +263,23 @@ enum {
 /* ATA Commands */
 enum {
 /* DMA */
-CMD_READ_DMA  = 0xC8,
-CMD_READ_DMA_EXT  = 0x25,
-CMD_WRITE_DMA = 0xCA,
-CMD_WRITE_DMA_EXT = 0x35,
+CMD_READ_DMA   = 0xC8,
+CMD_READ_DMA_EXT   = 0x25,
+CMD_WRITE_DMA  = 0xCA,
+CMD_WRITE_DMA_EXT  = 0x35,
 /* PIO */
-CMD_READ_PIO  = 0x20,
-CMD_READ_PIO_EXT  = 0x24,
-CMD_WRITE_PIO = 0x30,
-CMD_

[Qemu-devel] [PATCH v2 05/16] ahci: separate prdtl from opts

2015-06-22 Thread John Snow

There's no real reason to have it bundled together, and this way
is a little nicer to follow if you have the AHCI spec pulled up.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 23 ---
 hw/ide/ahci.h |  3 ++-
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 24c4832..14961a0 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -834,10 +834,11 @@ static int ahci_populate_sglist(AHCIDevice *ad, 
QEMUSGList *sglist,
 int32_t offset)
 {
 AHCICmdHdr *cmd = ad->cur_cmd;
-uint32_t opts = le32_to_cpu(cmd->opts);
-uint64_t prdt_addr = le64_to_cpu(cmd->tbl_addr) + 0x80;
-int sglist_alloc_hint = opts >> AHCI_CMD_HDR_PRDT_LEN;
-dma_addr_t prdt_len = (sglist_alloc_hint * sizeof(AHCI_SG));
+uint16_t opts = le16_to_cpu(cmd->opts);
+uint16_t prdtl = le16_to_cpu(cmd->prdtl);
+uint64_t cfis_addr = le64_to_cpu(cmd->tbl_addr);
+uint64_t prdt_addr = cfis_addr + 0x80;
+dma_addr_t prdt_len = (prdtl * sizeof(AHCI_SG));
 dma_addr_t real_prdt_len = prdt_len;
 uint8_t *prdt;
 int i;
@@ -857,7 +858,7 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList 
*sglist,
  * request for sector sizes up to 32K.
  */
 
-if (!sglist_alloc_hint) {
+if (!prdtl) {
 DPRINTF(ad->port_no, "no sg list given by guest: 0x%08x\n", opts);
 return -1;
 }
@@ -876,10 +877,10 @@ static int ahci_populate_sglist(AHCIDevice *ad, 
QEMUSGList *sglist,
 }
 
 /* Get entries in the PRDT, init a qemu sglist accordingly */
-if (sglist_alloc_hint > 0) {
+if (prdtl > 0) {
 AHCI_SG *tbl = (AHCI_SG *)prdt;
 sum = 0;
-for (i = 0; i < sglist_alloc_hint; i++) {
+for (i = 0; i < prdtl; i++) {
 /* flags_size is zero-based */
 tbl_entry_size = prdt_tbl_entry_size(&tbl[i]);
 if (offset <= (sum + tbl_entry_size)) {
@@ -897,12 +898,12 @@ static int ahci_populate_sglist(AHCIDevice *ad, 
QEMUSGList *sglist,
 goto out;
 }
 
-qemu_sglist_init(sglist, qbus->parent, (sglist_alloc_hint - off_idx),
+qemu_sglist_init(sglist, qbus->parent, (prdtl - off_idx),
  ad->hba->as);
 qemu_sglist_add(sglist, le64_to_cpu(tbl[off_idx].addr) + off_pos,
 prdt_tbl_entry_size(&tbl[off_idx]) - off_pos);
 
-for (i = off_idx + 1; i < sglist_alloc_hint; i++) {
+for (i = off_idx + 1; i < prdtl; i++) {
 /* flags_size is zero-based */
 qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
 prdt_tbl_entry_size(&tbl[i]));
@@ -1069,7 +1070,7 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
 {
 IDEState *ide_state = &s->dev[port].port.ifs[0];
 AHCICmdHdr *cmd = s->dev[port].cur_cmd;
-uint32_t opts = le32_to_cpu(cmd->opts);
+uint16_t opts = le16_to_cpu(cmd->opts);
 
 if (cmd_fis[1] & 0x0F) {
 DPRINTF(port, "Port Multiplier not supported."
@@ -1226,7 +1227,7 @@ static void ahci_start_transfer(IDEDMA *dma)
 IDEState *s = &ad->port.ifs[0];
 uint32_t size = (uint32_t)(s->data_end - s->data_ptr);
 /* write == ram -> device */
-uint32_t opts = le32_to_cpu(ad->cur_cmd->opts);
+uint16_t opts = le16_to_cpu(ad->cur_cmd->opts);
 int is_write = opts & AHCI_CMD_WRITE;
 int is_atapi = opts & AHCI_CMD_ATAPI;
 int has_sglist = 0;
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 6d167f2..b8872a4 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -236,7 +236,8 @@ typedef struct AHCIPortRegs {
 } AHCIPortRegs;
 
 typedef struct AHCICmdHdr {
-uint32_topts;
+uint16_topts;
+uint16_tprdtl;
 uint32_tstatus;
 uint64_ttbl_addr;
 uint32_treserved[4];
-- 
2.1.0

[Qemu-devel] [PATCH v2 11/16] libqos/ahci: edit wait to be ncq aware

2015-06-22 Thread John Snow

The wait command should check to make sure SACT is clear as well
as the Command Issue register.

Signed-off-by: John Snow 
---
 tests/libqos/ahci.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
index 9a4d510..da02e2e 100644
--- a/tests/libqos/ahci.c
+++ b/tests/libqos/ahci.c
@@ -908,11 +908,15 @@ void ahci_command_wait(AHCIQState *ahci, AHCICommand *cmd)
 /* We can't rely on STS_BSY until the command has started processing.
  * Therefore, we also use the Command Issue bit as indication of
  * a command in-flight. */
-while (BITSET(ahci_px_rreg(ahci, cmd->port, AHCI_PX_TFD),
-  AHCI_PX_TFD_STS_BSY) ||
-   BITSET(ahci_px_rreg(ahci, cmd->port, AHCI_PX_CI), (1 << 
cmd->slot))) {
+
+#define RSET(REG, MASK) (BITSET(ahci_px_rreg(ahci, cmd->port, (REG)), (MASK)))
+
+while (RSET(AHCI_PX_TFD, AHCI_PX_TFD_STS_BSY) ||
+   RSET(AHCI_PX_CI, 1 << cmd->slot) ||
+   (cmd->props->ncq && RSET(AHCI_PX_SACT, 1 << cmd->slot))) {
 usleep(50);
 }
+
 }
 
 void ahci_command_issue(AHCIQState *ahci, AHCICommand *cmd)
-- 
2.1.0

[Qemu-devel] [PATCH v2 02/16] ahci: use shorter variables

2015-06-22 Thread John Snow

Trivial cleanup that I didn't want to tack-on to anything else.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 26df2ca..14eccb8 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -972,9 +972,11 @@ static int is_ncq(uint8_t ata_cmd)
 static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
 int slot)
 {
+AHCIDevice *ad = &s->dev[port];
+IDEState *ide_state = &ad->port.ifs[0];
 NCQFrame *ncq_fis = (NCQFrame*)cmd_fis;
 uint8_t tag = ncq_fis->tag >> 3;
-NCQTransferState *ncq_tfs = &s->dev[port].ncq_tfs[tag];
+NCQTransferState *ncq_tfs = &ad->ncq_tfs[tag];
 
 if (ncq_tfs->used) {
 /* error - already in use */
@@ -983,7 +985,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 }
 
 ncq_tfs->used = 1;
-ncq_tfs->drive = &s->dev[port];
+ncq_tfs->drive = ad;
 ncq_tfs->slot = slot;
 ncq_tfs->lba = ((uint64_t)ncq_fis->lba5 << 40) |
((uint64_t)ncq_fis->lba4 << 32) |
@@ -1000,9 +1002,9 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 DPRINTF(port, "NCQ transfer LBA from %"PRId64" to %"PRId64", "
 "drive max %"PRId64"\n",
 ncq_tfs->lba, ncq_tfs->lba + ncq_tfs->sector_count - 2,
-s->dev[port].port.ifs[0].nb_sectors - 1);
+ide_state->nb_sectors - 1);
 
-ahci_populate_sglist(&s->dev[port], &ncq_tfs->sglist, 0);
+ahci_populate_sglist(ad, &ncq_tfs->sglist, 0);
 ncq_tfs->tag = tag;
 
 switch(ncq_fis->command) {
@@ -1014,9 +1016,9 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 DPRINTF(port, "tag %d aio read %"PRId64"\n",
 ncq_tfs->tag, ncq_tfs->lba);
 
-dma_acct_start(ncq_tfs->drive->port.ifs[0].blk, &ncq_tfs->acct,
+dma_acct_start(ide_state->blk, &ncq_tfs->acct,
&ncq_tfs->sglist, BLOCK_ACCT_READ);
-ncq_tfs->aiocb = dma_blk_read(ncq_tfs->drive->port.ifs[0].blk,
+ncq_tfs->aiocb = dma_blk_read(ide_state->blk,
   &ncq_tfs->sglist, ncq_tfs->lba,
   ncq_cb, ncq_tfs);
 break;
@@ -1027,9 +1029,9 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 DPRINTF(port, "tag %d aio write %"PRId64"\n",
 ncq_tfs->tag, ncq_tfs->lba);
 
-dma_acct_start(ncq_tfs->drive->port.ifs[0].blk, &ncq_tfs->acct,
+dma_acct_start(ide_state->blk, &ncq_tfs->acct,
&ncq_tfs->sglist, BLOCK_ACCT_WRITE);
-ncq_tfs->aiocb = dma_blk_write(ncq_tfs->drive->port.ifs[0].blk,
+ncq_tfs->aiocb = dma_blk_write(ide_state->blk,
&ncq_tfs->sglist, ncq_tfs->lba,
ncq_cb, ncq_tfs);
 break;
-- 
2.1.0

[Qemu-devel] [PATCH v2 08/16] ahci/qtest: Execute IDENTIFY prior to data commands

2015-06-22 Thread John Snow

If you try to execute an NCQ command before trying to engage with the
device by issuing an IDENTIFY command, the error bits that are part of
the signature will fool the test suite into thinking there was a failure.

Issue IDENTIFY first on "boot", which will clear the signature out of
the registers for us.

Signed-off-by: John Snow 
---
 tests/ahci-test.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 0a0ef2a..ee1dc20 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -228,6 +228,8 @@ static AHCIQState *ahci_boot_and_enable(const char *cli, 
...)
 {
 AHCIQState *ahci;
 va_list ap;
+uint16_t buff[256];
+uint8_t port;
 
 if (cli) {
 va_start(ap, cli);
@@ -239,6 +241,10 @@ static AHCIQState *ahci_boot_and_enable(const char *cli, 
...)
 
 ahci_pci_enable(ahci);
 ahci_hba_enable(ahci);
+/* Initialize test device */
+port = ahci_port_select(ahci);
+ahci_port_clear(ahci, port);
+ahci_io(ahci, port, CMD_IDENTIFY, &buff, sizeof(buff), 0);
 
 return ahci;
 }
-- 
2.1.0

[Qemu-devel] [PATCH v2 04/16] ahci: check for ncq prdtl overflow

2015-06-22 Thread John Snow

Don't attempt the NCQ transfer if the PRDT we were given is not big
enough to perform the entire transfer.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 375aa44..24c4832 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -983,6 +983,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 NCQFrame *ncq_fis = (NCQFrame*)cmd_fis;
 uint8_t tag = ncq_fis->tag >> 3;
 NCQTransferState *ncq_tfs = &ad->ncq_tfs[tag];
+size_t size;
 
 if (ncq_tfs->used) {
 /* error - already in use */
@@ -999,20 +1000,28 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
((uint64_t)ncq_fis->lba2 << 16) |
((uint64_t)ncq_fis->lba1 << 8) |
(uint64_t)ncq_fis->lba0;
+ncq_tfs->tag = tag;
 
-/* Note: We calculate the sector count, but don't currently rely on it.
- * The total size of the DMA buffer tells us the transfer size instead. */
 ncq_tfs->sector_count = ((uint16_t)ncq_fis->sector_count_high << 8) |
 ncq_fis->sector_count_low;
+ahci_populate_sglist(ad, &ncq_tfs->sglist, 0);
+size = ncq_tfs->sector_count * 512;
+
+if (ncq_tfs->sglist.size < size) {
+error_report("ahci: PRDT length for NCQ command (0x%zx) "
+ "is smaller than the requested size (0x%zx)",
+ ncq_tfs->sglist.size, size);
+qemu_sglist_destroy(&ncq_tfs->sglist);
+ncq_err(ncq_tfs);
+ahci_trigger_irq(ad->hba, ad, PORT_IRQ_OVERFLOW);
+return;
+}
 
 DPRINTF(port, "NCQ transfer LBA from %"PRId64" to %"PRId64", "
 "drive max %"PRId64"\n",
 ncq_tfs->lba, ncq_tfs->lba + ncq_tfs->sector_count - 2,
 ide_state->nb_sectors - 1);
 
-ahci_populate_sglist(ad, &ncq_tfs->sglist, 0);
-ncq_tfs->tag = tag;
-
 switch(ncq_fis->command) {
 case READ_FPDMA_QUEUED:
 DPRINTF(port, "NCQ reading %d sectors from LBA %"PRId64", "
@@ -1051,6 +1060,7 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 "error: tried to process non-NCQ command as NCQ\n");
 }
 qemu_sglist_destroy(&ncq_tfs->sglist);
+ncq_err(ncq_tfs);
 }
 }
 
-- 
2.1.0

[Qemu-devel] [PATCH v2 03/16] ahci: add ncq_err helper

2015-06-22 Thread John Snow

Set some appropriate error bits for NCQ for us.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 14eccb8..375aa44 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -922,6 +922,15 @@ out:
 return r;
 }
 
+static void ncq_err(NCQTransferState *ncq_tfs)
+{
+IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];
+
+ide_state->error = ABRT_ERR;
+ide_state->status = READY_STAT | ERR_STAT;
+ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
+}
+
 static void ncq_cb(void *opaque, int ret)
 {
 NCQTransferState *ncq_tfs = (NCQTransferState *)opaque;
@@ -934,10 +943,7 @@ static void ncq_cb(void *opaque, int ret)
 ncq_tfs->drive->port_regs.scr_act &= ~(1 << ncq_tfs->tag);
 
 if (ret < 0) {
-/* error */
-ide_state->error = ABRT_ERR;
-ide_state->status = READY_STAT | ERR_STAT;
-ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
+ncq_err(ncq_tfs);
 } else {
 ide_state->status = READY_STAT | SEEK_STAT;
 }
-- 
2.1.0

[Qemu-devel] [PATCH v2 01/16] ahci: Rename NCQFIS structure fields

2015-06-22 Thread John Snow

Several fields of the NCQFIS structure are ambiguously named. This patch
clarifies the intended (if unsupported) usage of the NCQ fields to aid
in creating more meaningful debug messages through the NCQ codepaths.

Signed-off-by: John Snow 
---
 hw/ide/ahci.h | 35 +--
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 501c002..6d167f2 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -195,6 +195,9 @@
 #define RECEIVE_FPDMA_QUEUED   0x65
 #define SEND_FPDMA_QUEUED  0x64
 
+#define NCQ_FIS_FUA_MASK   0x80
+#define NCQ_FIS_RARC_MASK  0x01
+
 #define RES_FIS_DSFIS  0x00
 #define RES_FIS_PSFIS  0x20
 #define RES_FIS_RFIS   0x40
@@ -312,27 +315,39 @@ extern const VMStateDescription vmstate_ahci;
 .offset = vmstate_offset_value(_state, _field, AHCIState),   \
 }
 
+/**
+ * NCQFrame is the same as a Register H2D FIS (described in SATA 3.2),
+ * but some fields have been re-mapped and re-purposed, as seen in
+ * SATA 3.2 section 13.6.4.1 ("READ FPDMA QUEUED")
+ *
+ * cmd_fis[3], feature 7:0, becomes sector count 7:0.
+ * cmd_fis[7], device 7:0, uses bit 7 as the Force Unit Access bit.
+ * cmd_fis[11], feature 15:8, becomes sector count 15:8.
+ * cmd_fis[12], count 7:0, becomes the NCQ TAG (7:3) and RARC bit (0)
+ * cmd_fis[13], count 15:8, becomes the priority value (7:6)
+ * bytes 16-19 become an le32 "auxiliary" field.
+ */
 typedef struct NCQFrame {
 uint8_t fis_type;
 uint8_t c;
 uint8_t command;
-uint8_t sector_count_low;
+uint8_t sector_count_low;  /* (feature 7:0) */
 uint8_t lba0;
 uint8_t lba1;
 uint8_t lba2;
-uint8_t fua;
+uint8_t fua;   /* (device 7:0) */
 uint8_t lba3;
 uint8_t lba4;
 uint8_t lba5;
-uint8_t sector_count_high;
-uint8_t tag;
-uint8_t reserved5;
-uint8_t reserved6;
+uint8_t sector_count_high; /* (feature 15:8) */
+uint8_t tag;   /* (count 0:7) */
+uint8_t prio;  /* (count 15:8) */
+uint8_t icc;
 uint8_t control;
-uint8_t reserved7;
-uint8_t reserved8;
-uint8_t reserved9;
-uint8_t reserved10;
+uint8_t aux0;
+uint8_t aux1;
+uint8_t aux2;
+uint8_t aux3;
 } QEMU_PACKED NCQFrame;
 
 typedef struct SDBFIS {
-- 
2.1.0

[Qemu-devel] [PATCH v2 06/16] ahci: add ncq debug checks

2015-06-22 Thread John Snow

Most of the time, these bits can be safely ignored. For the purposes
of debugging however, it's nice to know that they're not being used.

Signed-off-by: John Snow 
---
 hw/ide/ahci.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 14961a0..c2e6642 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1003,6 +1003,25 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
(uint64_t)ncq_fis->lba0;
 ncq_tfs->tag = tag;
 
+/* Sanity-check the NCQ packet */
+if (tag != slot) {
+DPRINTF(port, "Warn: NCQ slot (%d) did not match the given tag (%d)\n",
+slot, tag);
+}
+
+if (ncq_fis->aux0 || ncq_fis->aux1 || ncq_fis->aux2 || ncq_fis->aux3) {
+DPRINTF(port, "Warn: Attempt to use NCQ auxiliary fields.\n");
+}
+if (ncq_fis->prio || ncq_fis->icc) {
+DPRINTF(port, "Warn: Unsupported attempt to use PRIO/ICC fields\n");
+}
+if (ncq_fis->fua & NCQ_FIS_FUA_MASK) {
+DPRINTF(port, "Warn: Unsupported attempt to use Force Unit Access\n");
+}
+if (ncq_fis->tag & NCQ_FIS_RARC_MASK) {
+DPRINTF(port, "Warn: Unsupported attempt to use Rebuild Assist\n");
+}
+
 ncq_tfs->sector_count = ((uint16_t)ncq_fis->sector_count_high << 8) |
 ncq_fis->sector_count_low;
 ahci_populate_sglist(ad, &ncq_tfs->sglist, 0);
@@ -1016,6 +1035,10 @@ static void process_ncq_command(AHCIState *s, int port, 
uint8_t *cmd_fis,
 ncq_err(ncq_tfs);
 ahci_trigger_irq(ad->hba, ad, PORT_IRQ_OVERFLOW);
 return;
+} else if (ncq_tfs->sglist.size != size) {
+DPRINTF(port, "Warn: PRDTL (0x%zx)"
+" does not match requested size (0x%zx)",
+ncq_tfs->sglist.size, size);
 }
 
 DPRINTF(port, "NCQ transfer LBA from %"PRId64" to %"PRId64", "
-- 
2.1.0

[Qemu-devel] [PATCH v2 00/16] ahci: ncq cleanup, part 1

2015-06-22 Thread John Snow

requires: 1434470575-21625-1-git-send-email-js...@redhat.com
  [PATCH v2 0/4] ahci: misc fixes/tests for 2.4

This series adds a couple of tests to exercise the NCQ pathways
and establish a baseline for us.

Most of these patches are fairly short and should be relatively trivial
to review. this series will lay the groundwork for part 2,
which adds the rerror/werror=stop and migration support for that feature
as well.

===
v2:

 - Cleared out #ifdefs to prevent unneeded bitrot
 - Fixed migration test in patch 16
 - Removed the forceful ide_state->error clear from patch 08,
   replacing it instead with a change to the tests to issue
   IDENTIFY prior to attempting any data transfer.



For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch ahci-ncq-s1
https://github.com/jnsnow/qemu/tree/ahci-ncq-s1

This version is tagged ahci-ncq-s1-v2:
https://github.com/jnsnow/qemu/releases/tag/ahci-ncq-s1-v2

John Snow (16):
  ahci: Rename NCQFIS structure fields
  ahci: use shorter variables
  ahci: add ncq_err helper
  ahci: check for ncq prdtl overflow
  ahci: separate prdtl from opts
  ahci: add ncq debug checks
  ahci: ncq sector count correction
  ahci/qtest: Execute IDENTIFY prior to data commands
  libqos/ahci: fix cmd_sanity for ncq
  libqos/ahci: add NCQ frame support
  libqos/ahci: edit wait to be ncq aware
  libqos/ahci: adjust expected NCQ interrupts
  libqos/ahci: set the NCQ tag on command_commit
  libqos/ahci: Force all NCQ commands to be LBA48
  qtest/ahci: simple ncq data test
  qtest/ahci: ncq migration test

 hw/ide/ahci.c   | 102 +++--
 hw/ide/ahci.h   |  38 
 tests/ahci-test.c   |  38 ++--
 tests/libqos/ahci.c | 162 +---
 tests/libqos/ahci.h |  59 ++-
 5 files changed, 281 insertions(+), 118 deletions(-)

-- 
2.1.0

[Qemu-devel] [PATCH v5 3/3] tests: add testcase for TCO watchdog emulation

2015-06-22 Thread Paulo Alcantara

This patch adds a testcase that covers the following:
  1) TCO default values
  2) first and second TCO timeout
  3) watch and validate ticks counter through TCO_RLD register
  4) maximum supported TCO timeout (0x3ff)
  5) watchdog actions (pause/reset/shutdown/none) upon second TCO
 timeout
  6) set and get of TCO control and status bits

Signed-off-by: Paulo Alcantara 
---
v1 -> v2:
  * some cleanup
  * add test for TCO_LOCK bit

v2 -> v3:
  * add tests for TCO control & status bits
  * fix check of SECOND_TO_STS bit (it's set in TCO2_STS reg)

v3 -> v4:
  * add more description to commit log
  * use RCBA_BASE_ADDR macro defintion from hw/i386/ich9-cc.h instead

v4 -> v5:
  * use modified macros (now prefixed with ICH9_) from ich9-cc.h
  * move license to GPLv2+
---
 tests/Makefile   |   2 +
 tests/tco-test.c | 460 +++
 2 files changed, 462 insertions(+)
 create mode 100644 tests/tco-test.c

diff --git a/tests/Makefile b/tests/Makefile
index eff5e11..ef1e981 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -152,6 +152,7 @@ check-qtest-i386-y += tests/i440fx-test$(EXESUF)
 check-qtest-i386-y += tests/fw_cfg-test$(EXESUF)
 check-qtest-i386-y += tests/drive_del-test$(EXESUF)
 check-qtest-i386-y += tests/wdt_ib700-test$(EXESUF)
+check-qtest-i386-y += tests/tco-test$(EXESUF)
 gcov-files-i386-y += hw/watchdog/watchdog.c hw/watchdog/wdt_ib700.c
 check-qtest-i386-y += $(check-qtest-pci-y)
 gcov-files-i386-y += $(gcov-files-pci-y)
@@ -370,6 +371,7 @@ tests/eepro100-test$(EXESUF): tests/eepro100-test.o
 tests/vmxnet3-test$(EXESUF): tests/vmxnet3-test.o
 tests/ne2000-test$(EXESUF): tests/ne2000-test.o
 tests/wdt_ib700-test$(EXESUF): tests/wdt_ib700-test.o
+tests/tco-test$(EXESUF): tests/tco-test.o $(libqos-pc-obj-y)
 tests/virtio-balloon-test$(EXESUF): tests/virtio-balloon-test.o
 tests/virtio-blk-test$(EXESUF): tests/virtio-blk-test.o $(libqos-virtio-obj-y)
 tests/virtio-net-test$(EXESUF): tests/virtio-net-test.o $(libqos-pc-obj-y)
diff --git a/tests/tco-test.c b/tests/tco-test.c
new file mode 100644
index 000..79a673e
--- /dev/null
+++ b/tests/tco-test.c
@@ -0,0 +1,460 @@
+/*
+ * QEMU ICH9 TCO emulation tests
+ *
+ * Copyright (c) 2015 Paulo Alcantara 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include "libqtest.h"
+#include "libqos/pci.h"
+#include "libqos/pci-pc.h"
+#include "hw/pci/pci_regs.h"
+#include "hw/i386/ich9.h"
+#include "hw/i386/ich9-cc.h"
+#include "hw/acpi/ich9.h"
+#include "hw/acpi/tco.h"
+
+#define PM_IO_BASE_ADDR 0xb000
+
+enum {
+TCO_RLD_DEFAULT = 0x,
+TCO_DAT_IN_DEFAULT  = 0x00,
+TCO_DAT_OUT_DEFAULT = 0x00,
+TCO1_STS_DEFAULT= 0x,
+TCO2_STS_DEFAULT= 0x,
+TCO1_CNT_DEFAULT= 0x,
+TCO2_CNT_DEFAULT= 0x0008,
+TCO_MESSAGE1_DEFAULT= 0x00,
+TCO_MESSAGE2_DEFAULT= 0x00,
+TCO_WDCNT_DEFAULT   = 0x00,
+TCO_TMR_DEFAULT = 0x0004,
+SW_IRQ_GEN_DEFAULT  = 0x03,
+};
+
+#define TCO_SECS_TO_TICKS(secs) (((secs) * 10) / 6)
+#define TCO_TICKS_TO_SECS(ticks)(((ticks) * 6) / 10)
+
+typedef struct {
+const char *args;
+QPCIDevice *dev;
+void *lpc_base;
+void *tco_io_base;
+} TestData;
+
+static void test_init(TestData *d)
+{
+QPCIBus *bus;
+QTestState *qs;
+char *s;
+
+s = g_strdup_printf("-machine q35 %s", !d->args ? "" : d->args);
+qs = qtest_start(s);
+qtest_irq_intercept_in(qs, "ioapic");
+g_free(s);
+
+bus = qpci_init_pc();
+d->dev = qpci_device_find(bus, QPCI_DEVFN(0x1f, 0x00));
+g_assert(d->dev != NULL);
+
+/* map PCI-to-LPC bridge interface BAR */
+d->lpc_base = qpci_iomap(d->dev, 0, NULL);
+
+qpci_device_enable(d->dev);
+
+g_assert(d->lpc_base != NULL);
+
+/* set ACPI PM I/O space base address */
+qpci_config_writel(d->dev, (uintptr_t)d->lpc_base + ICH9_LPC_PMBASE,
+   PM_IO_BASE_ADDR | 0x1);
+/* enable ACPI I/O */
+qpci_config_writeb(d->dev, (uintptr_t)d->lpc_base + ICH9_LPC_ACPI_CTRL,
+   0x80);
+/* set Root Complex BAR */
+qpci_config_writel(d->dev, (uintptr_t)d->lpc_base + ICH9_LPC_RCBA,
+   ICH9_RCBA_BASE_ADDR | 0x1);
+
+d->tco_io_base = (void *)((uintptr_t)PM_IO_BASE_ADDR + 0x60);
+}
+
+static void stop_tco(const TestData *d)
+{
+uint32_t val;
+
+val = qpci_io_readw(d->dev, d->tco_io_base + TCO1_CNT);
+val |= TCO_TMR_HLT;
+qpci_io_writew(d->dev, d->tco_io_base + TCO1_CNT, val);
+}
+
+static void start_tco(const TestData *d)
+{
+uint32_t val;
+
+val = qpci_io_readw(d->dev, d->tco_io_base + TCO1_CNT);
+val &= ~TCO_TMR_HLT;
+qpci_io_writew(d->dev, d->tco_io_base + TCO1_CNT, val);
+}
+
+static void load_tco(const TestData *d)
+{
+qpci_io_writew(d->dev, d->tco_io_base + TCO

[Qemu-devel] [PATCH v5 1/3] ich9: add TCO interface emulation

2015-06-22 Thread Paulo Alcantara

This interface provides some registers within a 32-byte range and can be
acessed through PCI-to-LPC bridge interface (PMBASE + 0x60).

It's commonly used as a watchdog timer to detect system lockups through
SMIs that are generated -- if TCO_EN bit is set -- on every timeout. If
NO_REBOOT bit is not set in GCS (General Control and Status register),
the system will be resetted upon second timeout if TCO_RLD register
wasn't previously written to prevent timeout.

This patch adds support to TCO watchdog logic and few other features
like mapping NMIs to SMIs (NMI2SMI_EN bit), system intruder detection,
etc. are not implemented yet.

Signed-off-by: Paulo Alcantara 
---
v1 -> v2:
  * add migration support for TCO I/O device state
  * wake up only when total time expired instead of every 0.6s
  * some cleanup suggested by Paolo Bonzini

v2 -> v3:
  * set SECOND_TO_STS and BOOT_STS bits in TCO2_STS instead
  * improve handling of TCO_LOCK bit in TCO1_CNT register

v3 -> v4:
  * fix some conflicts in hw/acpi/ich9.c after rebasing against master
  * remove meaningless "use_tco" field from TCOIORegs structure
  * add a object property named "enable_tco" and only enable TCO support
on pc-q35-2.4 and later

v4 -> v5:
  * remove unused field (use_tco) in TCOIORegs structure
  * move license to GPLv2+
---
 hw/acpi/Makefile.objs  |   2 +-
 hw/acpi/ich9.c |  55 ++-
 hw/acpi/tco.c  | 264 +
 hw/i386/pc_q35.c   |   4 +-
 hw/isa/lpc_ich9.c  |  15 ++-
 include/hw/acpi/ich9.h |   7 +-
 include/hw/acpi/tco.h  |  82 +++
 include/hw/boards.h|   3 +-
 include/hw/i386/ich9.h |  10 +-
 include/hw/i386/pc.h   |   1 +
 10 files changed, 435 insertions(+), 8 deletions(-)
 create mode 100644 hw/acpi/tco.c
 create mode 100644 include/hw/acpi/tco.h

diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 29d46d8..3db1f07 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o ich9.o pcihp.o
+common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o ich9.o pcihp.o tco.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI) += acpi_interface.o
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 8a64ffb..d3d9953 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -30,6 +30,7 @@
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "hw/acpi/acpi.h"
+#include "hw/acpi/tco.h"
 #include "sysemu/kvm.h"
 #include "exec/address-spaces.h"
 
@@ -92,8 +93,16 @@ static void ich9_smi_writel(void *opaque, hwaddr addr, 
uint64_t val,
 unsigned width)
 {
 ICH9LPCPMRegs *pm = opaque;
+TCOIORegs *tr = &pm->tco_regs;
+uint64_t tco_en;
+
 switch (addr) {
 case 0:
+tco_en = pm->smi_en & ICH9_PMIO_SMI_EN_TCO_EN;
+/* once TCO_LOCK bit is set, TCO_EN bit cannot be overwritten */
+if (tr->tco.cnt1 & TCO_LOCK) {
+val = (val & ~ICH9_PMIO_SMI_EN_TCO_EN) | tco_en;
+}
 pm->smi_en &= ~pm->smi_en_wmask;
 pm->smi_en |= (val & pm->smi_en_wmask);
 break;
@@ -159,6 +168,25 @@ static const VMStateDescription vmstate_memhp_state = {
 }
 };
 
+static bool vmstate_test_use_tco(void *opaque)
+{
+ICH9LPCPMRegs *s = opaque;
+return s->enable_tco;
+}
+
+static const VMStateDescription vmstate_tco_io_state = {
+.name = "ich9_pm/tco",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.needed = vmstate_test_use_tco,
+.fields  = (VMStateField[]) {
+VMSTATE_STRUCT(tco_regs, ICH9LPCPMRegs, 1, vmstate_tco_io_sts,
+   TCOIORegs),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_ich9_pm = {
 .name = "ich9_pm",
 .version_id = 1,
@@ -179,6 +207,10 @@ const VMStateDescription vmstate_ich9_pm = {
 .subsections = (const VMStateDescription*[]) {
 &vmstate_memhp_state,
 NULL
+},
+.subsections = (const VMStateDescription*[]) {
+&vmstate_tco_io_state,
+NULL
 }
 };
 
@@ -209,7 +241,7 @@ static void pm_powerdown_req(Notifier *n, void *opaque)
 acpi_pm1_evt_power_down(&pm->acpi_regs);
 }
 
-void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
+void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, bool enable_tco,
   qemu_irq sci_irq)
 {
 memory_region_init(&pm->io, OBJECT(lpc_pci), "ich9-pm", ICH9_PMIO_SIZE);
@@ -231,6 +263,11 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
   "acpi-smi", 8);
 memory_region_add_subregion(&pm->io, ICH9_PMIO_SMI_EN, &pm->io_smi);
 
+pm->enable_tco = enable_tco;
+if (pm->enable_tco) {
+acpi_pm_tco_init(&pm->tco_regs, &pm->io);
+}
+
 pm->irq = sci_irq;
 qemu_register_reset(pm_reset, pm);
 pm->powerdown_notifier.notify = pm_

[Qemu-devel] [PATCH v5 2/3] target-i386: reserve RCRB mmio space in ACPI DSDT table

2015-06-22 Thread Paulo Alcantara

This block is mapped into memory space, using the Root Complex Base
Address (RCBA) register of the PCI-to-LPC bridge. Accesses in this space
must be limited to 32-(DW) bit quantities. Burst accesses are not
allowed.

All Chipset Configuration Registers are located in this 16KiB space.

Signed-off-by: Paulo Alcantara 
---
v1 -> v2:
  * s/PDRC/CCR/ for clarity and match ICH9 spec
  * remove unnecessary OperationRegion for RCRB

v2 -> v3: (no changes)

v3 -> v4:
  * quote RCRB description from ICH9 spec to commit log
  * fix indentation issue in _CRS() method declaration
  * create hw/i386/ich9-cc.h for chipset configuration register values
and use them in ASL

v4 -> v5:
  * prefix macros in ich9-cc.h with "ICH9_" for better readability and
make use of them in CCR device definition
---
 hw/i386/q35-acpi-dsdt.dsl |  16 
 include/hw/i386/ich9-cc.h |  15 +++
 tests/acpi-test-data/q35/DSDT | Bin 7666 -> 7723 bytes
 3 files changed, 31 insertions(+)
 create mode 100644 include/hw/i386/ich9-cc.h

diff --git a/hw/i386/q35-acpi-dsdt.dsl b/hw/i386/q35-acpi-dsdt.dsl
index 16eaca3..8f4bb6a 100644
--- a/hw/i386/q35-acpi-dsdt.dsl
+++ b/hw/i386/q35-acpi-dsdt.dsl
@@ -114,6 +114,22 @@ DefinitionBlock (
 }
 }
 
+#include "hw/i386/ich9-cc.h"
+
+/
+ * Chipset Configuration Registers
+ /
+Scope(\_SB.PCI0) {
+Device (CCR) {
+Name (_HID, EISAID("PNP0C02"))
+Name (_UID, 1)
+
+Name (_CRS, ResourceTemplate() {
+Memory32Fixed(ReadWrite, ICH9_RCBA_BASE_ADDR, ICH9_RCRB_SIZE)
+})
+}
+}
+
 #include "acpi-dsdt-hpet.dsl"
 
 
diff --git a/include/hw/i386/ich9-cc.h b/include/hw/i386/ich9-cc.h
new file mode 100644
index 000..d4918ff
--- /dev/null
+++ b/include/hw/i386/ich9-cc.h
@@ -0,0 +1,15 @@
+/*
+ * QEMU ICH9 Chipset Configuration Registers
+ *
+ * Copyright (c) 2015 Paulo Alcantara 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef HW_ICH9_CC_H
+#define HW_ICH9_CC_H
+
+#define ICH9_RCBA_BASE_ADDR0xfed1c000
+#define ICH9_RCRB_SIZE 0x4000
+
+#endif /* HW_ICH9_CC_H */
diff --git a/tests/acpi-test-data/q35/DSDT b/tests/acpi-test-data/q35/DSDT
index 
4723e5954dccb00995ccaf521b7daf6bf15cf1d4..f3bda7b54ea6d669b1498d9380e7781207fb6e49
 100644
GIT binary patch
delta 81
zcmexlz1oJ$CDxbTJfnq$UVN}qe1Nm3L3ERjvvW{9N4$rp3y|j(F#wU_n7HzBWz

[Qemu-devel] [PATCHv2 for-2.4] block: Auto-generate node_names for each BDS entry

2015-06-22 Thread Eric Blake

From: Jeff Cody 

Currently, node_name is only filled in when done so explicitly by the
user.  If no node_name is specified, then the node name field is not
populated.

If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field.  This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.

If a node name is specified, then it will not be automatically
generated for that BDS entry.

If it is automatically generated, it will be prefaced with "__qemu##",
followed by 8 characters of a unique number, followed by 8 random
ASCII characters in the range of 'A-Z'.  Some sample generated node-name
strings:
__qemu##IAIYNXXR
__qemu##0002METXTRBQ
__qemu##0001FMBORDWG

The prefix is to aid in identifying it as a qemu-generated name, the
numeric portion is to guarantee uniqueness in a given qemu session, and
the random characters are to further avoid any accidental collisions
with user-specified node-names.

Reviewed-by: Eric Blake 
Signed-off-by: Jeff Cody 
Message-Id: 
<9516d2684d419e1bfa2a95f66d2e9a70a4ff7eb7.1403723862.git.jc...@redhat.com>
[id_wellformed() rejects generated names, which means we can't collide]
Signed-off-by: Eric Blake 
---

v2: revive this patch; very little needed changing

I just posted a libvirt patch series that depends on this patch:
https://www.redhat.com/archives/libvir-list/2015-June/msg0.html

As the original was posted by Jeff (nearly a year ago!), it counts
as a patch that was submitted prior to soft freeze :)

[I will be on vacation the next 3 weeks, so I may be slow to respond
on replies to this message]

 block.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index dd4f58d..2532b70 100644
--- a/block.c
+++ b/block.c
@@ -765,16 +765,28 @@ static int bdrv_open_flags(BlockDriverState *bs, int 
flags)
 return open_flags;
 }

+#define GEN_NODE_NAME_PREFIX"__qemu##"
+#define GEN_NODE_NAME_MAX_LEN   (sizeof(GEN_NODE_NAME_PREFIX) + 8 + 8)
 static void bdrv_assign_node_name(BlockDriverState *bs,
   const char *node_name,
   Error **errp)
 {
+char gen_node_name[GEN_NODE_NAME_MAX_LEN];
+static uint32_t counter; /* simple counter to guarantee uniqueness */
+
+/* if node_name is NULL, auto-generate a node name */
 if (!node_name) {
-return;
-}
-
-/* Check for empty string or invalid characters */
-if (!id_wellformed(node_name)) {
+int len;
+snprintf(gen_node_name, GEN_NODE_NAME_MAX_LEN,
+ "%s%08x", GEN_NODE_NAME_PREFIX, counter++);
+len = strlen(gen_node_name);
+while (len < GEN_NODE_NAME_MAX_LEN - 1) {
+gen_node_name[len++] = g_random_int_range('A', 'Z');
+}
+gen_node_name[GEN_NODE_NAME_MAX_LEN - 1] = '\0';
+node_name = gen_node_name;
+} else if (!id_wellformed(node_name)) {
+/* Check for empty string or invalid characters */
 error_setg(errp, "Invalid node name");
 return;
 }
-- 
2.4.3

Re: [Qemu-devel] [PATCH] block/iscsi: add support for request timeouts

2015-06-22 Thread ronnie sahlberg

LGTM

It is good to finally have timeouts that work in libiscsi,  and a consumer
that can use and benefit from it.

On Tue, Jun 16, 2015 at 4:45 AM, Peter Lieven  wrote:

> libiscsi starting with 1.15 will properly support timeout of iscsi
> commands. The default will remain no timeout, but this can
> be changed via cmdline parameters, e.g.:
>
> qemu -iscsi timeout=30 -drive file=iscsi://...
>
> If a timeout occurs a reconnect is scheduled and the timed out command
> will be requeued for processing after a successful reconnect.
>
> The required API call iscsi_set_timeout is present since libiscsi
> 1.10 which was released in October 2013. However, due to some bugs
> in the libiscsi code the use is not recommended before version 1.15.
>
> Please note that this patch bumps the libiscsi requirement to 1.10
> to have all function and macros defined. The patch fixes also a
> off-by-one error in the NOP timeout calculation which was fixed
> while touching these code parts.
>
> Signed-off-by: Peter Lieven 
> ---
>  block/iscsi.c   | 87
> ++---
>  configure   |  6 ++--
>  qemu-options.hx |  4 +++
>  3 files changed, 72 insertions(+), 25 deletions(-)
>
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 14e97a6..f19a56a 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -69,6 +69,7 @@ typedef struct IscsiLun {
>  bool dpofua;
>  bool has_write_same;
>  bool force_next_flush;
> +bool request_timed_out;
>  } IscsiLun;
>
>  typedef struct IscsiTask {
> @@ -99,7 +100,8 @@ typedef struct IscsiAIOCB {
>  #endif
>  } IscsiAIOCB;
>
> -#define EVENT_INTERVAL 250
> +/* libiscsi uses time_t so its enough to process events every second */
> +#define EVENT_INTERVAL 1000
>  #define NOP_INTERVAL 5000
>  #define MAX_NOP_FAILURES 3
>  #define ISCSI_CMD_RETRIES ARRAY_SIZE(iscsi_retry_times)
> @@ -186,13 +188,18 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int
> status,
>  iTask->do_retry = 1;
>  goto out;
>  }
> -/* status 0x28 is SCSI_TASK_SET_FULL. It was first introduced
> - * in libiscsi 1.10.0. Hardcode this value here to avoid
> - * the need to bump the libiscsi requirement to 1.10.0 */
> -if (status == SCSI_STATUS_BUSY || status == 0x28) {
> +if (status == SCSI_STATUS_BUSY || status ==
> SCSI_STATUS_TIMEOUT ||
> +status == SCSI_STATUS_TASK_SET_FULL) {
>  unsigned retry_time =
>  exp_random(iscsi_retry_times[iTask->retries - 1]);
> -error_report("iSCSI Busy/TaskSetFull (retry #%u in %u
> ms): %s",
> +if (status == SCSI_STATUS_TIMEOUT) {
> +/* make sure the request is rescheduled AFTER the
> + * reconnect is initiated */
> +retry_time = EVENT_INTERVAL * 2;
> +iTask->iscsilun->request_timed_out = true;
> +}
> +error_report("iSCSI Busy/TaskSetFull/TimeOut"
> + " (retry #%u in %u ms): %s",
>   iTask->retries, retry_time,
>   iscsi_get_error(iscsi));
>  aio_timer_init(iTask->iscsilun->aio_context,
> @@ -276,20 +283,26 @@ iscsi_set_events(IscsiLun *iscsilun)
> iscsilun);
>  iscsilun->events = ev;
>  }
> -
> -/* newer versions of libiscsi may return zero events. In this
> - * case start a timer to ensure we are able to return to service
> - * once this situation changes. */
> -if (!ev) {
> -timer_mod(iscsilun->event_timer,
> -  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> EVENT_INTERVAL);
> -}
>  }
>
> -static void iscsi_timed_set_events(void *opaque)
> +static void iscsi_timed_check_events(void *opaque)
>  {
>  IscsiLun *iscsilun = opaque;
> +
> +/* check for timed out requests */
> +iscsi_service(iscsilun->iscsi, 0);
> +
> +if (iscsilun->request_timed_out) {
> +iscsilun->request_timed_out = false;
> +iscsi_reconnect(iscsilun->iscsi);
> +}
> +
> +/* newer versions of libiscsi may return zero events. Ensure we are
> able
> + * to return to service once this situation changes. */
>  iscsi_set_events(iscsilun);
> +
> +timer_mod(iscsilun->event_timer,
> +  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + EVENT_INTERVAL);
>  }
>
>  static void
> @@ -1096,16 +1109,37 @@ static char *parse_initiator_name(const char
> *target)
>  return iscsi_name;
>  }
>
> +static int parse_timeout(const char *target)
> +{
> +QemuOptsList *list;
> +QemuOpts *opts;
> +const char *timeout;
> +
> +list = qemu_find_opts("iscsi");
> +if (list) {
> +opts = qemu_opts_find(list, target);
> +if (!opts) {
> +opts = QTAILQ_FIRST(&list->head);
> +}
> +if (opts) {
> +timeout = qe

Re: [Qemu-devel] [PATCH v3 04/11] linux-user: arm: set CPSR.E correctly for BE8 mode

2015-06-22 Thread Peter Crosthwaite

On Thu, Jun 26, 2014 at 7:18 AM, Paolo Bonzini  wrote:
> Il 26/06/2014 16:15, Peter Maydell ha scritto:
>>
>> (There is code for handling CPSR_E in the kernel's start_thread()
>> macro but that is actually only called for starting new
>> processes, AFAICT.)
>
>
> Yes, you're right.
>

So I am struggling on figuring out the need to have this extra state
of signal_cpsr_e. Is it still needed and to follow up, would something
similar be needed for SCTLR.E0E on AA64 support?

Regards,
Peter

> Paolo
>

Re: [Qemu-devel] [PATCH] Peek dont read for vmdescription

2015-06-22 Thread Alexander Graf



On 22.06.15 16:49, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The VMDescription section maybe after the EOF mark, the current code
> does a 'qemu_get_byte' and either gets the header byte identifying the
> description or an error (which it ignores).  Doing the 'get' upsets
> RDMA which hangs on old machine types without the VMDescription.
> 
> Using 'qemu_peek_byte' avoids that.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Fun. I did actually use peek at first and then figured it's the same as
read in the qemu file implementation. Have you figured out why exactly
peek does make a difference for the RDMA case?


Alex

Re: [Qemu-devel] 4 CPUs with vexpress is slow?

2015-06-22 Thread Frederic Konrad


On 22/06/2015 22:57, Peter Maydell wrote:

On 22 June 2015 at 19:18, Frederic Konrad  wrote:

Testing MTTCG patch-set performance I found strange slowness with _upstream_
qemu (46bca5404b08201bb9df1ac32bc88fc7e6db1f74).

Basically booting a vexpress-a15 with "-smp 4" takes approximately forever
if I use a
vexpress-v2p-ca15-tc1.dtb (2x A15) dtb and 39 secs if I hack this dtb to
have 4 CPUs.

Wait, so if you change the dtb it boots in a non-infinite time,
but MTTCG boots both dtbs?

Yes MTTCG boots both dtbs.


This is definitely strange because if I use "-smp 2" with the same guest
image it takes
only 4 secs. And MTTCG patch-set seems to fix the issue as it boot in 6 secs
with
"-smp 4".

Is that a known issue or maybe it's my guest (linux 3.13.5)?

Dunno. I basically never run with SMP configs because they're
always slower than uniprocessor... Does your kernel boot on
KVM? It may be that it's configured to assume 2xSMP somehow,
given the h/w we're modelling here is 2xSMP.

Important question: when you boot this config, does /proc/cpuinfo
say you have two cores booted, or all four? If we've ended up with
the other two cores in the config spinning then that would explain
why MTTCG is doing better here, but it's not a very interesting
config to try to optimise :-)

All four cores are in /proc/cpuinfo.



There is an issue where IPIs may be inefficient:
http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg03824.html
but I'd expect that to merely slow things down a bit, not send
the boot time off to infinity.


This is definitely a bug :
Seems that removing "--nographic" option removes the problem it just 
boots as per

normal..

Fred



-- PMM

Re: [Qemu-devel] [PATCH 3/7] qga: Clean up unnecessarily dirty casts

2015-06-22 Thread Eric Blake

On 06/22/2015 01:26 PM, Markus Armbruster wrote:
> qga_vss_fsfreeze() casts error_set_win32() from
> 
> void (*)(Error **, int, ErrorClass, const char *, ...)
> 
> to
> 
> void (*)(void **, int, int, const char *, ...)
> 
> The result is later called.  Since the two types are not compatible,
> the call is undefined behavior.  It works in practice anyway.

Better than some other horrid casts we do, like monitor.c playing fast
and loose with 'Monitor *' vs. 'FILE *' in hmp_info_mtree() and friends.
 But that's a cleanup for another series.

> 
> However, there's no real need for trickery here.  Clean it up as
> follows:
> 
> * Declare struct Error, and fix the first parameter.
> 
> * Switch to error_setg_win32().  This gets rid of the troublesome
>   ErrorClass parameter.  Requires converting error_setg_win32() from
>   macro to function, but that's trivially easy, because this is the
>   only user of error_set_win32().
> 
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-22 Thread John Snow



On 06/22/2015 09:09 AM, Peter Lieven wrote:
> Am 22.06.2015 um 11:25 schrieb Stefan Hajnoczi:
>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven  wrote:
>>> Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:
 On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven  wrote:
> Am 18.06.2015 um 10:42 schrieb Kevin Wolf:
>> Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:
>>> Am 18.06.2015 um 09:45 schrieb Kevin Wolf:
 Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:
> Thread 2 (Thread 0x75550700 (LWP 2636)):
> #0  0x75d87aa3 in ppoll () from
> /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0,
> nfds=3,
>   timeout=4999424576) at qemu-timer.c:326
>   ts = {tv_sec = 4, tv_nsec = 999424576}
>   tvsec = 4
> #2  0x55956feb in aio_poll (ctx=0x563528e0,
> blocking=true)
>   at aio-posix.c:231
>   node = 0x0
>   was_dispatching = false
>   ret = 1
>   progress = false
> #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
> offset=4292007936,
>   qiov=0x7554f760, is_write=false, flags=0) at
> block.c:2699
>   aio_context = 0x563528e0
>   co = 0x563888a0
>   rwco = {bs = 0x5637eae0, offset = 4292007936,
> qiov = 0x7554f760, is_write = false, ret =
> 2147483647,
> flags = 0}
> #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
> sector_num=8382828,
>   buf=0x744cc800 "(", nb_sectors=4, is_write=false,
> flags=0)
>   at block.c:2722
>   qiov = {iov = 0x7554f780, niov = 1, nalloc = -1,
> size =
> 2048}
>   iov = {iov_base = 0x744cc800, iov_len = 2048}
> #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
> sector_num=8382828,
>   buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
> No locals.
> #6  0x5599acef in blk_read (blk=0x56376820,
> sector_num=8382828,
>   buf=0x744cc800 "(", nb_sectors=4) at
> block/block-backend.c:404
> No locals.
> #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
> lba=2095707,
>   buf=0x744cc800 "(", sector_size=2048) at
> hw/ide/atapi.c:116
>   ret = 32767
 Here is the problem: The ATAPI emulation uses synchronous
 blk_read()
 instead of the AIO or coroutine interfaces. This means that it
 keeps
 polling for request completion while it holds the BQL until the
 request
 is completed.
>>> I will look at this.
> I need some further help. My way to "emulate" a hung NFS Server is to
> block it in the Firewall. Currently I face the problem that I
> cannot mount
> a CD Iso via libnfs (nfs://) without hanging Qemu (i previously
> tried with
> a kernel NFS mount). It reads a few sectors and then stalls (maybe
> another
> bug):
>
> (gdb) thread apply all bt full
>
> Thread 3 (Thread 0x70c21700 (LWP 29710)):
> #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
> util/qemu-thread-posix.c:120
>  err = 
>  __func__ = "qemu_cond_broadcast"
> #1  0x55911164 in rfifolock_unlock
> (r=r@entry=0x56259910) at
> util/rfifolock.c:75
>  __PRETTY_FUNCTION__ = "rfifolock_unlock"
> #2  0x55875921 in aio_context_release
> (ctx=ctx@entry=0x562598b0)
> at async.c:329
> No locals.
> #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
> blocking=blocking@entry=true) at aio-posix.c:272
>  node = 
>  was_dispatching = false
>  i = 
>  ret = 
>  progress = false
>  timeout = 611734526
>  __PRETTY_FUNCTION__ = "aio_poll"
> #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
> offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
> block/io.c:552
>  aio_context = 0x562598b0
>  co = 
>  rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
> 0x70c208f0, is_write = false, ret = 2147483647, flags =
> (unknown: 0)}
> #5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
> sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>  flags=flags@entry=(unknown: 0)) at block/io.c:575
>  qiov = {iov = 0x70c208e0, niov = 1, nalloc =

Re: [Qemu-devel] [PATCH 08/16] ahci: clear error register before NCQ cmd

2015-06-22 Thread John Snow



On 06/22/2015 10:43 AM, John Snow wrote:
> 
> 
> On 06/22/2015 10:38 AM, Stefan Hajnoczi wrote:
>> On Fri, Jun 19, 2015 at 09:50:39PM -0400, John Snow wrote:
>>> The legacy ide command execution layer will clear any errors
>>> outstanding before execution, but the NCQ layer doesn't. Even on
>>> success, this register will remain clogged.
>>>
>>> Clear it out before each NCQ command so the guest can tell if
>>> the error code produced after completion is meaningful or not.
>>>
>>> Signed-off-by: John Snow  --- hw/ide/ahci.c | 2
>>> ++ 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c index 6bded67..e63ba9b
>>> 100644 --- a/hw/ide/ahci.c +++ b/hw/ide/ahci.c @@ -1048,6 +1048,8
>>> @@ static void process_ncq_command(AHCIState *s, int port,
>>> uint8_t *cmd_fis, ncq_tfs->lba, ncq_tfs->lba +
>>> ncq_tfs->sector_count - 1, ide_state->nb_sectors - 1);
>>>
>>> +ide_state->error = 0;
>>
>> I'm not sure it makes sense use ide_state at all in NCQ.
>>
>> ide_state is per-port and NCQ can issue multiple asynchronous
>> commands per port.  If process_ncq_command() modifies ide_state, it
>> may do that while other commands are still pending or about to be
>> processed.
>>
>> This will clobber ide_state->error.
>>
> 
> Good point. The real problem that we need to fix then is in the core
> IDE layer, which tends to set a "default error" and waits for
> ide_exec_cmd to clear it before execution.
> 
> If we don't clear that bit somehow, we will get failed commands.
> 
> I'll have to do a little research to see if it's safe to remove our
> startup error, since it might be part of an ATA signature handshake.
> 

It is indeed part of a startup signature and should not be removed.

> Hmm.
> 
> Maybe it's not actually a problem because any real OS should probably
> have issued ATA IDENTIFY (which will clear ERR) by the time they get
> to NCQ, so maybe we can scrap this, and I'll adjust the test accordingly
> to issue at least IDENTIFY before attempting NCQ.

It's not exquisitely clear what the NCQ state machine for SATA should
look like -- I need to figure out what a real device might do if you
attempt to send it an NCQ command before a "hello."

Or I'll just ignore this whole train of thought under the premise that
every OS alive will always send an IDENTIFY packet first, and be on my way.

[Qemu-devel] [PATCH v8 01/11] block: keep a list of block jobs

2015-06-22 Thread Alberto Garcia

The current way to obtain the list of existing block jobs is to
iterate over all root nodes and check which ones own a job.

Since we want to be able to support block jobs in other nodes as well,
this patch keeps a list of jobs that is updated every time one is
created or destroyed.

This also updates qmp_query_block_jobs() and bdrv_drain_all() to use
this new list.

Signed-off-by: Alberto Garcia 
Cc: Max Reitz 
Cc: Eric Blake 
Cc: Alexander Yarygin 
---
 block/io.c   | 22 +++---
 blockdev.c   | 19 ---
 blockjob.c   | 13 +
 include/block/blockjob.h | 14 ++
 4 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4a7a969..2fb89f2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -271,16 +271,19 @@ void bdrv_drain_all(void)
 /* Always run first iteration so any pending completion BHs run */
 bool busy = true;
 BlockDriverState *bs = NULL;
+BlockJob *job = NULL;
 GSList *aio_ctxs = NULL, *ctx;
 
-while ((bs = bdrv_next(bs))) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+while ((job = block_job_next(job))) {
+AioContext *aio_context = bdrv_get_aio_context(job->bs);
 
 aio_context_acquire(aio_context);
-if (bs->job) {
-block_job_pause(bs->job);
-}
+block_job_pause(job);
 aio_context_release(aio_context);
+}
+
+while ((bs = bdrv_next(bs))) {
+AioContext *aio_context = bdrv_get_aio_context(bs);
 
 if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
 aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
@@ -309,14 +312,11 @@ void bdrv_drain_all(void)
 }
 }
 
-bs = NULL;
-while ((bs = bdrv_next(bs))) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+while ((job = block_job_next(job))) {
+AioContext *aio_context = bdrv_get_aio_context(job->bs);
 
 aio_context_acquire(aio_context);
-if (bs->job) {
-block_job_resume(bs->job);
-}
+block_job_resume(job);
 aio_context_release(aio_context);
 }
 g_slist_free(aio_ctxs);
diff --git a/blockdev.c b/blockdev.c
index 41d7e0f..66f8515 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3088,21 +3088,18 @@ fail:
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = &head;
-BlockDriverState *bs;
+BlockJob *job;
 
-for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+for (job = block_job_next(NULL); job; job = block_job_next(job)) {
+BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
+AioContext *aio_context = bdrv_get_aio_context(job->bs);
 
 aio_context_acquire(aio_context);
-
-if (bs->job) {
-BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
-elem->value = block_job_query(bs->job);
-*p_next = elem;
-p_next = &elem->next;
-}
-
+elem->value = block_job_query(job);
 aio_context_release(aio_context);
+
+*p_next = elem;
+p_next = &elem->next;
 }
 
 return head;
diff --git a/blockjob.c b/blockjob.c
index 2755465..c46984d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -35,6 +35,16 @@
 #include "qemu/timer.h"
 #include "qapi-event.h"
 
+static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
+
+BlockJob *block_job_next(BlockJob *job)
+{
+if (!job) {
+return QLIST_FIRST(&block_jobs);
+}
+return QLIST_NEXT(job, job_list);
+}
+
 void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
int64_t speed, BlockCompletionFunc *cb,
void *opaque, Error **errp)
@@ -73,6 +83,8 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
 return NULL;
 }
 }
+
+QLIST_INSERT_HEAD(&block_jobs, job, job_list);
 return job;
 }
 
@@ -85,6 +97,7 @@ void block_job_completed(BlockJob *job, int ret)
 bs->job = NULL;
 bdrv_op_unblock_all(bs, job->blocker);
 error_free(job->blocker);
+QLIST_REMOVE(job, job_list);
 g_free(job);
 }
 
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 57d8ef1..5431dd2 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -102,6 +102,9 @@ struct BlockJob {
  */
 bool ready;
 
+/** Element of the list of block jobs */
+QLIST_ENTRY(BlockJob) job_list;
+
 /** Status that is published by the query-block-jobs QMP API */
 BlockDeviceIoStatus iostatus;
 
@@ -125,6 +128,17 @@ struct BlockJob {
 };
 
 /**
+ * block_job_next:
+ * @job: A block job, or %NULL.
+ *
+ * Get the next element from the list of block jobs after @job, or the
+ * first one if @job is %NULL.
+ *
+ * Returns the requested job, or %NULL if there are no more jobs left.
+ *

Re: [Qemu-devel] [PATCH 2/7] error: Make error_setg() a function

2015-06-22 Thread Eric Blake

On 06/22/2015 01:26 PM, Markus Armbruster wrote:
> Saves a tiny amount of code at every call site.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  include/qapi/error.h | 4 ++--
>  util/error.c | 9 +
>  2 files changed, 11 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

(Interesting that you turn it back into a macro later in the series...)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

1 2 3 4 >

1 - 100 of 309 matches

Mail list logo