Re: [Qemu-devel] [RFC/BUG] xen-mapcache: buggy invalidate map cache?

2017-04-12 Thread Herongguang (Stephen)



On 2017/4/13 7:51, Stefano Stabellini wrote:

On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:

On 2017/4/12 6:32, Stefano Stabellini wrote:

On Tue, 11 Apr 2017, hrg wrote:

On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
 wrote:

On Mon, 10 Apr 2017, Stefano Stabellini wrote:

On Mon, 10 Apr 2017, hrg wrote:

On Sun, Apr 9, 2017 at 11:55 PM, hrg  wrote:

On Sun, Apr 9, 2017 at 11:52 PM, hrg  wrote:

Hi,

In xen_map_cache_unlocked(), the mapping of guest memory may be in
entry->next instead of the first-level entry (if a mapping to ROM,
rather than to guest memory, comes first), while in
xen_invalidate_map_cache(), when the VM balloons out memory, QEMU does
not invalidate the cache entries in the linked list (entry->next). So
when the VM balloons the memory back in, the gfns are probably mapped
to different mfns; thus, if the guest asks a device to DMA to these
GPAs, QEMU may DMA to stale MFNs.

So I think in xen_invalidate_map_cache() linked lists should
also be
checked and invalidated.

What’s your opinion? Is this a bug? Is my analysis correct?

Yes, you are right. We need to go through the list for each element of
the array in xen_invalidate_map_cache. Can you come up with a patch?

I spoke too soon. In the regular case there should be no locked mappings
when xen_invalidate_map_cache is called (see the DPRINTF warning at the
beginning of the function). Without locked mappings, there should never
be more than one element in each list (see xen_map_cache_unlocked:
entry->lock == true is a necessary condition to append a new entry to
the list, otherwise it is just remapped).

Can you confirm that what you are seeing are locked mappings
when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
by turning it into a printf or by defining MAPCACHE_DEBUG.

In fact, I think the DPRINTF above is incorrect too. In
pci_add_option_rom(), rtl8139 rom is locked mapped in
pci_add_option_rom->memory_region_get_ram_ptr (after
memory_region_init_ram). So actually I think we should remove the
DPRINTF warning as it is normal.

Let me explain why the DPRINTF warning is there: emulated dma operations
can involve locked mappings. Once a dma operation completes, the related
mapping is unlocked and can be safely destroyed. But if we destroy a
locked mapping in xen_invalidate_map_cache, while a dma is still
ongoing, QEMU will crash. We cannot handle that case.

However, the scenario you described is different. It has nothing to do
with DMA. It looks like pci_add_option_rom calls
memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
locked mapping and it is never unlocked or destroyed.

It looks like "ptr" is not used after pci_add_option_rom returns. Does
the appended patch fix the problem you are seeing? For the proper fix, I
think we probably need some sort of memory_region_unmap wrapper or maybe
a call to address_space_unmap.


Yes, I think so, maybe this is the proper way to fix this.


Would you be up for sending a proper patch and testing it? We cannot call
xen_invalidate_map_cache_entry directly from pci.c though, it would need
to be one of the other functions like address_space_unmap for example.




Yes, I will look into this.




diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e6b08e1..04f98b7 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
   }
 pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
+xen_invalidate_map_cache_entry(ptr);
   }
 static void pci_del_option_rom(PCIDevice *pdev)





Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()

2017-04-12 Thread Paolo Bonzini


On 13/04/2017 09:11, Jeff Cody wrote:
>> It didn't make it into 2.9-rc4 because of limited time. :(
>>
>> Looks like there is no -rc5, we'll have to document this as a known issue.
>> Users should "block-job-complete/cancel" as soon as possible to avoid such a
>> hang.
>
> I'd argue for including a fix for 2.9, since this is both a regression, and
> a hard lock without possible recovery short of restarting the QEMU process.

It is a bit of a corner case (and jobs on I/O thread are relatively rare
too), so maybe it's not worth delaying 2.9.  It has been delayed already
quite a bit.  Another reason I prefer to wait is to ensure that
we have an entry in qemu-iotests to guard against future regressions.

Fam explained to me what happens: the root cause is that bdrv_drain
never does a release/acquire pair in this case, so the I/O thread
remains stuck in a callback that tries to acquire.  Ironically,
reintroducing RFifoLock would probably fix this (not 100% sure).  Oops.

His solution is a bit hacky, but we will hopefully be able to revert it
in 2.10 or whenever aio_context_acquire/release will go away.

Thanks,

Paolo



Re: [Qemu-devel] [PATCH 05/10] tcg: add jr opcode

2017-04-12 Thread Paolo Bonzini


On 12/04/2017 09:17, Emilio G. Cota wrote:
> This will be used by TCG targets to implement a fast path
> for indirect branches.
> 
> I have only implemented and tested this on an i386 host, so
> make this opcode optional and mark it as not implemented by
> the other TCG backends.

Please don't forget to document this in tcg/README.

Thanks,

Paolo



Re: [Qemu-devel] [PATCH] spapr-cpu-core: Release ICPState object during CPU unrealization

2017-04-12 Thread David Gibson
On Wed, Apr 12, 2017 at 01:45:07PM +0530, Bharata B Rao wrote:
> Recent commits that reorganized the ICPState object failed to destroy
> the object when a CPU is unrealized. Fix this so that CPU unplug
> doesn't abort QEMU.
> 
> Signed-off-by: Bharata B Rao 

Applied to ppc-for-2.10, thanks.

> ---
>  hw/ppc/spapr_cpu_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 2e689b5..4389ef4 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -127,6 +127,7 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, Error **errp)
>  PowerPCCPU *cpu = POWERPC_CPU(cs);
>  
>  spapr_cpu_destroy(cpu);
> +object_unparent(cpu->intc);
>  cpu_remove_sync(cs);
>  object_unparent(obj);
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-devel RFC v2 0/4] Add support for Smartfusion2 SoC

2017-04-12 Thread sundeep subbaraya
Hi Qemu-devel,

This is my first attempt at QEMU.
Please let me know whether my approach is correct.
The SoC is Cortex-M3 based, so there is no boot-ROM handling, and
unlike other SoCs QEMU need not load a dtb and kernel into DDR. Hence
I am using U-Boot (supplied with -kernel) as the bootloader in eNVM,
and it loads the kernel from SPI flash to DDR, just as on real hardware.
Also let me know if any other maintainers need to be CCed.

Thank you,
Sundeep

On Sun, Apr 9, 2017 at 4:49 PM, Subbaraya Sundeep
 wrote:
> Hi Qemu-devel,
>
> I am trying to add the Smartfusion2 SoC.
> The SoC is from Microsemi, and the System on Module (SOM)
> board is from Emcraft Systems. Smartfusion2 has a hardened
> microcontroller (Cortex-M3) subsystem and FPGA fabric.
> At the moment only the system timer, sysreg and SPI
> controller are modelled.
>
> Testing:
> ./arm-softmmu/qemu-system-arm -M smartfusion2-som -serial mon:stdio \
> -kernel u-boot.bin -display none -drive file=spi.bin,if=mtd,format=raw
>
> U-Boot is from Emcraft, with the SPI driver modified not to use PDMA.
> Linux is 4.5 with the Smartfusion2 SoC dts and a clocksource
> driver added by myself @
> https://github.com/Subbaraya-Sundeep/linux.git
>
> Baremetal elfs from Microsemi Softconsole IDE are also working.
>
> Changes from v1:
> Added SPI controller.
>
> Thanks,
> Sundeep
>
> Subbaraya Sundeep (4):
>   msf2: Add Smartfusion2 System timer
>   msf2: Microsemi Smartfusion2 System Register block.
>   msf2: Add Smartfusion2 SPI controller
>   msf2: Add Emcraft's Smartfusion2 SOM kit.
>
>  default-configs/arm-softmmu.mak |   1 +
>  hw/arm/Makefile.objs|   2 +-
>  hw/arm/msf2_soc.c   | 141 +
>  hw/misc/Makefile.objs   |   1 +
>  hw/misc/msf2_sysreg.c   | 168 +++
>  hw/ssi/Makefile.objs|   1 +
>  hw/ssi/msf2_spi.c   | 449 
> 
>  hw/timer/Makefile.objs  |   1 +
>  hw/timer/msf2_timer.c   | 273 
>  9 files changed, 1036 insertions(+), 1 deletion(-)
>  create mode 100644 hw/arm/msf2_soc.c
>  create mode 100644 hw/misc/msf2_sysreg.c
>  create mode 100644 hw/ssi/msf2_spi.c
>  create mode 100644 hw/timer/msf2_timer.c
>
> --
> 2.5.0
>



Re: [Qemu-devel] WinDbg module

2017-04-12 Thread Roman Kagan
On Wed, Apr 12, 2017 at 05:05:45PM +0300, Mihail Abakumov wrote:
> Hello.
> 
> We made a WinDbg debugger module (like the GDB stub) for QEMU. It
> replaces the remote stub in the Windows kernel, and is used for remote
> Windows kernel debugging without enabling debugging mode.
> 
> The latest build and instructions for the launch can be found here:
> https://github.com/ispras/qemu/releases/tag/v2.7.50-windbg
> 
> Currently only one way to create a remote debugging connection is
> supported: using a COM port with a named pipe.
> 
> Should I prepare patches for inclusion in the master branch? Or is the
> module too specific and not needed?

Please do!

Every once in a while, dealing with a Windows guest problem, I wished I
had this.  We at Virtuozzo looked into doing something like this but
never got around to it.

Thanks,
Roman.



[Qemu-devel] [PATCH v6] migration/block: use blk_pwrite_zeroes for each zero cluster

2017-04-12 Thread jemmy858585
From: Lidong Chen 

BLOCK_SIZE is (1 << 20), while the qcow2 cluster size is 65536 by
default; this may cause the qcow2 file size to grow after migration.
This patch checks each cluster, using blk_pwrite_zeroes for each
zero cluster.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Lidong Chen 
---
v6 changelog:
Fix up some grammar in the comment.
---
 migration/block.c | 35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 7734ff7..41c7a55 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -885,6 +885,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
 int64_t total_sectors = 0;
 int nr_sectors;
 int ret;
+BlockDriverInfo bdi;
+int cluster_size;
 
 do {
 addr = qemu_get_be64(f);
@@ -919,6 +921,15 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
 error_report_err(local_err);
 return -EINVAL;
 }
+
+ret = bdrv_get_info(blk_bs(blk), &bdi);
+if (ret == 0 && bdi.cluster_size > 0 &&
+bdi.cluster_size <= BLOCK_SIZE &&
+BLOCK_SIZE % bdi.cluster_size == 0) {
+cluster_size = bdi.cluster_size;
+} else {
+cluster_size = BLOCK_SIZE;
+}
 }
 
 if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
@@ -932,10 +943,30 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
 nr_sectors * BDRV_SECTOR_SIZE,
 BDRV_REQ_MAY_UNMAP);
 } else {
+int i;
+int64_t cur_addr;
+uint8_t *cur_buf;
+
 buf = g_malloc(BLOCK_SIZE);
 qemu_get_buffer(f, buf, BLOCK_SIZE);
-ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
- nr_sectors * BDRV_SECTOR_SIZE, 0);
+for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
+cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
+cur_buf = buf + i * cluster_size;
+
+if ((!block_mig_state.zero_blocks ||
+cluster_size < BLOCK_SIZE) &&
+buffer_is_zero(cur_buf, cluster_size)) {
+ret = blk_pwrite_zeroes(blk, cur_addr,
+cluster_size,
+BDRV_REQ_MAY_UNMAP);
+} else {
+ret = blk_pwrite(blk, cur_addr, cur_buf,
+ cluster_size, 0);
+}
+if (ret < 0) {
+break;
+}
+}
 g_free(buf);
 }
 
-- 
1.8.3.1




Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()

2017-04-12 Thread Jeff Cody
On Wed, Apr 12, 2017 at 09:11:09PM -0400, Jeff Cody wrote:
> On Thu, Apr 13, 2017 at 07:54:20AM +0800, Fam Zheng wrote:
> > On Wed, 04/12 18:22, Jeff Cody wrote:
> > > On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
> > > > 
> > > > 
> > > > On 04/12/2017 04:46 PM, Jeff Cody wrote:
> > > > > 
> > > > > This occurs on v2.9.0-rc4, but not on v2.8.0.
> > > > > 
> > > > > When running QEMU with an iothread, and then performing a 
> > > > > block-mirror, if
> > > > > we do a system-reset after the BLOCK_JOB_READY event has emitted, qemu
> > > > > becomes deadlocked.
> > > > > 
> > > > > The block job is not paused, nor cancelled, so we are stuck in the 
> > > > > while
> > > > > loop in block_job_detach_aio_context:
> > > > > 
> > > > > static void block_job_detach_aio_context(void *opaque)
> > > > > {
> > > > > BlockJob *job = opaque;
> > > > > 
> > > > > /* In case the job terminates during aio_poll()... */
> > > > > block_job_ref(job);
> > > > > 
> > > > > block_job_pause(job);
> > > > > 
> > > > > while (!job->paused && !job->completed) {
> > > > > block_job_drain(job);
> > > > > }
> > > > > 
> > > > 
> > > > Looks like when block_job_drain calls block_job_enter from this context
> > > > (the main thread, since we're trying to do a system_reset...), we cannot
> > > > enter the coroutine because it's the wrong context, so we schedule an
> > > > entry instead with
> > > > 
> > > > aio_co_schedule(ctx, co);
> > > > 
> > > > But that entry never happens, so the job never wakes up and we never
> > > > make enough progress in the coroutine to gracefully pause, so we wedge 
> > > > here.
> > > > 
> > > 
> > > 
> > > John Snow and I debugged this some over IRC.  Here is a summary:
> > > 
> > > Simply put, with iothreads the aio context is different.  When
> > > block_job_detach_aio_context() is called from the main thread via the 
> > > system
> > > reset (from main_loop_should_exit()), it calls block_job_drain() in a 
> > > while
> > > loop, with job->paused and job->completed as exit conditions.
> > > 
> > > block_job_drain() attempts to enter the coroutine (thus allowing job->paused
> > > or job->completed to change).  However, since the aio context is different
> > > with iothreads, we schedule the coroutine entry rather than directly
> > > entering it.
> > > 
> > > This means the job coroutine is never going to be re-entered, because we 
> > > are
> > > waiting for it to complete in a while loop from the main thread, which is
> > > blocking the qemu timers which would run the scheduled coroutine... hence,
> > > we become stuck.
> > 
> > John and I confirmed that this can be fixed by this pending patch:
> > 
> > [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
> > 
> > It didn't make it into 2.9-rc4 because of limited time. :(
> > 
> > Looks like there is no -rc5, we'll have to document this as a known issue.
> > Users should "block-job-complete/cancel" as soon as possible to avoid such a
> > hang.
> >
> 
> I'd argue for including a fix for 2.9, since this is both a regression, and
> a hard lock without possible recovery short of restarting the QEMU process.
> 
> -Jeff

BTW, I can add my verification that the patch you referenced fixed the
issue.

-Jeff



Re: [Qemu-devel] [PATCH 09/10] target/i386: optimize indirect branches with TCG's jr op

2017-04-12 Thread Emilio G. Cota
On Wed, Apr 12, 2017 at 11:43:45 +0800, Paolo Bonzini wrote:
> 
> 
> On 12/04/2017 09:17, Emilio G. Cota wrote:
> > 
> > The fact that NBench is not very sensitive to changes here is a
> > little surprising, especially given the significant improvements for
> > ARM shown in the previous commit. I wonder whether the compiler is doing
> > a better job compiling the x86_64 version (I'm using gcc 5.4.0), or I'm 
> > simply
> > missing some i386 instructions to which the jr optimization should
> > be applied.
> 
> Maybe it is "ret"?  That would be a straightforward "bx lr" on ARM, but
> it is missing in your i386 patch.

Yes I missed that. I added this fix-up:

diff --git a/target/i386/translate.c b/target/i386/translate.c
index aab5c13..f2b5a0f 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6430,7 +6430,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 /* Note that gen_pop_T0 uses a zero-extending load.  */
 gen_op_jmp_v(cpu_T0);
 gen_bnd_jmp(s);
-gen_eob(s);
+gen_jr(s, cpu_T0);
 break;
 case 0xc3: /* ret */
 ot = gen_pop_T0(s);
@@ -6438,7 +6438,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 /* Note that gen_pop_T0 uses a zero-extending load.  */
 gen_op_jmp_v(cpu_T0);
 gen_bnd_jmp(s);
-gen_eob(s);
+gen_jr(s, cpu_T0);
 break;
 case 0xca: /* lret im */
 val = cpu_ldsw_code(env, s->pc);

Any other instructions I should look into? Perhaps lret/lret im?

Anyway, nbench does not improve much with the above. The reason seems to be
that it's full of direct jumps (visible with -d in_asm). Also tried softmmu
to see whether these jumps are in-page or not: peak improvement is ~8%, so
I guess most of them are in-page. See http://imgur.com/EKRrYUz

I'm running new tests on a server with no other users and which has
frequency scaling disabled. This should help get less noisy numbers,
since I'm having trouble replicating my own results :> (I used my desktop
machine until now). Will post these numbers tomorrow (running SPECint
overnight, with both train and test input sizes).

Thanks,

Emilio



Re: [Qemu-devel] [PATCH 12/12] dirty-bitmap: Convert internal hbitmap size/granularity

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> Now that all callers are using byte-based interfaces, there's no
> reason for our internal hbitmap to remain with sector-based
> granularity.  It also simplifies our internal scaling, since we
> already know that hbitmap widens requests out to granularity
> boundaries.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/dirty-bitmap.c | 37 -
>  1 file changed, 12 insertions(+), 25 deletions(-)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index ef165eb..26ca084 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -37,7 +37,7 @@
>   * or enabled. A frozen bitmap can only abdicate() or reclaim().
>   */
>  struct BdrvDirtyBitmap {
> -HBitmap *bitmap;/* Dirty sector bitmap implementation */
> +HBitmap *bitmap;/* Dirty bitmap implementation */
>  HBitmap *meta;  /* Meta dirty bitmap */
>  BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
>  char *name; /* Optional non-empty unique ID */
> @@ -93,12 +93,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>  return NULL;
>  }
>  bitmap = g_new0(BdrvDirtyBitmap, 1);
> -/*
> - * TODO - let hbitmap track full granularity. For now, it is tracking
> - * only sector granularity, as a shortcut for our iterators.
> - */
> -bitmap->bitmap = hbitmap_alloc(bitmap_size >> BDRV_SECTOR_BITS,
> -   ctz32(granularity) - BDRV_SECTOR_BITS);
> +bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(granularity));
>  bitmap->size = bitmap_size;
>  bitmap->name = g_strdup(name);
>  bitmap->disabled = false;
> @@ -254,7 +249,7 @@ void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
>  QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
>  assert(!bdrv_dirty_bitmap_frozen(bitmap));
>  assert(!bitmap->active_iterators);
> -hbitmap_truncate(bitmap->bitmap, size >> BDRV_SECTOR_BITS);
> +hbitmap_truncate(bitmap->bitmap, size);
>  bitmap->size = size;
>  }
>  }
> @@ -336,7 +331,7 @@ bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>  int64_t offset)
>  {
>  if (bitmap) {
> -return hbitmap_get(bitmap->bitmap, offset >> BDRV_SECTOR_BITS);
> +return hbitmap_get(bitmap->bitmap, offset);
>  } else {
>  return false;
>  }
> @@ -364,7 +359,7 @@ uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs)
> 
>  uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
>  {
> -return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
> +return 1U << hbitmap_granularity(bitmap->bitmap);
>  }
> 
>  BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap)
> @@ -397,27 +392,21 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)
> 
>  int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
>  {
> -return hbitmap_iter_next(&iter->hbi) * BDRV_SECTOR_SIZE;
> +return hbitmap_iter_next(>hbi);
>  }
> 
>  void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> int64_t offset, int64_t bytes)
>  {
> -int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
> -
>  assert(bdrv_dirty_bitmap_enabled(bitmap));
> -hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
> -end_sector - (offset >> BDRV_SECTOR_BITS));
> +hbitmap_set(bitmap->bitmap, offset, bytes);
>  }
> 
>  void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>   int64_t offset, int64_t bytes)
>  {
> -int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
> -
>  assert(bdrv_dirty_bitmap_enabled(bitmap));
> -hbitmap_reset(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
> -  end_sector - (offset >> BDRV_SECTOR_BITS));
> +hbitmap_reset(bitmap->bitmap, offset, bytes);
>  }
> 
>  void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
> @@ -427,7 +416,7 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
>  hbitmap_reset_all(bitmap->bitmap);
>  } else {
>  HBitmap *backup = bitmap->bitmap;
> -bitmap->bitmap = hbitmap_alloc(bitmap->size >> BDRV_SECTOR_BITS,
> +bitmap->bitmap = hbitmap_alloc(bitmap->size,
> hbitmap_granularity(backup));
>  *out = backup;
>  }
> @@ -481,14 +470,12 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
>  void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes)
>  {
>  BdrvDirtyBitmap *bitmap;
> -int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
> 
>  QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
>  if (!bdrv_dirty_bitmap_enabled(bitmap)) {
>  continue;
>  }
> -

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-12 Thread Denis V. Lunev
On 04/12/2017 09:20 PM, Eric Blake wrote:
> On 04/12/2017 12:55 PM, Denis V. Lunev wrote:
>> Let me rephrase a bit.
>>
>> The proposal is looking very close to the following case:
>> - raw sparse file
>>
>> In this case all writes are very-very-very fast and from the
>> guest point of view all is OK. Sequential data is really sequential.
>> Though once we start to perform any sequential IO, we
>> have real pain. Each sequential operation becomes random
>> on the host file system and the IO becomes very slow. This
>> will not be observed with the test, but the performance will
>> degrade very soon.
>>
>> This is why raw sparse files are not used in real life. The
>> hypervisor must maintain guest OS invariants, and data that is
>> nearby from the guest point of view should be kept nearby on the
>> host.
>>
>> This is actually why 64kb data blocks are extremely
>> small :) OK, this is offtopic.
> Not necessarily. Using subclusters may allow you to ramp up to larger
> cluster sizes. We can also set up our allocation (and pre-allocation
> schemes) so that we always reserve an entire cluster on the host at the
> time we allocate the cluster, even if we only plan to write to
> particular subclusters within that cluster.  In fact, 32 subclusters to
> a 2M cluster results in 64k subclusters, where you are still writing at
> 64k data chunks but could now have guaranteed 2M locality, compared to
> the current qcow2 with 64k clusters that writes in 64k data chunks but
> with no locality.
>
> Just because we don't write the entire cluster up front does not mean
> that we don't have to allocate (or have a mode that allocates) the
> entire cluster at the time of the first subcluster use.

This is something that I do not understand. We reserve the entire cluster
at allocation. Why do we need sub-clusters at cluster "creation" without
COW? fallocate() and preallocation completely cover this stage for now,
and solve all the bottlenecks we have. 4k/8k granularity of the L2 cache
solves the metadata write problem. But IMHO it is not important; normally
we sync metadata at guest sync.

The only difference I am observing in this case is the "copy-on-write"
pattern of the load with a backing store or snapshot, where we copy only
a partial cluster. Thus we should clearly define that this is the only
area of improvement and start the discussion from this point. Simple
cluster creation is not the problem anymore. I think that this reduces
the scope of the proposal a lot.

Initial proposal starts from stating 2 problems:

"1) Reading from or writing to a qcow2 image involves reading the
   corresponding entry on the L2 table that maps the guest address to
   the host address. This is very slow because it involves two I/O
   operations: one on the L2 table and the other one on the actual
   data cluster.

2) A cluster is the smallest unit of allocation. Therefore writing a
   mere 512 bytes to an empty disk requires allocating a complete
   cluster and filling it with zeroes (or with data from the backing
   image if there is one). This wastes more disk space and also has a
   negative impact on I/O."

With pre-allocation, (2) would be exactly the same as now, and all the
gain from sub-clusters will be effectively 0, as we will have to
preallocate the entire cluster.

(1) is also questionable. I think that the root of the problem
is the cost of an L2 cache miss, which is enormous. With a 1 MB or 2 MB
cluster the cost of the cache miss is not acceptable at all.
With page granularity of the L2 cache this problem is seriously
reduced. We can switch to bigger blocks without much problem.
Again, the only problem is COW.

Thus I think that the proposal should be seriously re-analyzed and refined
with this input.

>> One can easily recreate this case using the following simple
>> test:
>> - write each even 4kb page of the disk, one by one
>> - write each odd 4 kb page of the disk
>> - run sequential read with f.e. 1 MB data block
>>
>> Normally we should still have native performance, but
>> with raw sparse files and (as far as I understand the
>> proposal) sub-clusters we will have a host IO pattern
>> that is exactly like random.
> Only if we don't pre-allocate entire clusters at the point that we first
> touch the cluster.
>
>> This seems like a big and inevitable problem of the approach
>> for me. We still have the potential to improve current
>> algorithms and not introduce non-compatible changes.
>>
>> Sorry if this is too emotional. We have learned above in a
>> very hard way.
> And your experience is useful, as a way to fine-tune this proposal.  But
> it doesn't mean we should entirely ditch this proposal.  I also
> appreciate that you have patches in the works to reduce bottlenecks
> (such as turning sub-cluster writes into 3 IOPs rather than 5, by doing
> read-head, read-tail, write-cluster, instead of the current read-head,
> write-head, write-body, read-tail, write-tail), but think that both
> approaches are complimentary, not orthogonal.
>
Thank you :) I just 

Re: [Qemu-devel] [PATCH 11/12] dirty-bitmap: Switch bdrv_set_dirty() to bytes

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> Both callers already had bytes available, but were scaling to
> sectors.  Move the scaling to internal code.  In the case of
> bdrv_aligned_pwritev(), we are now passing the exact offset
> rather than a rounded sector-aligned value, but that's okay
> as long as dirty bitmap widens start/bytes to granularity
> boundaries.

Yes, that shouldn't be a problem. Granularity math will make sure this
comes out in the wash.

> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/block_int.h | 2 +-
>  block/dirty-bitmap.c  | 8 +---
>  block/io.c| 6 ++
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 08063c1..0b737fd 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -917,7 +917,7 @@ void blk_dev_eject_request(BlockBackend *blk, bool force);
>  bool blk_dev_is_tray_open(BlockBackend *blk);
>  bool blk_dev_is_medium_locked(BlockBackend *blk);
> 
> -void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int64_t nr_sect);
> +void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes);
>  bool bdrv_requests_pending(BlockDriverState *bs);
> 
>  void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 8e7822c..ef165eb 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -478,15 +478,17 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
>  hbitmap_deserialize_finish(bitmap->bitmap);
>  }
> 
> -void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
> -int64_t nr_sectors)
> +void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes)
>  {
>  BdrvDirtyBitmap *bitmap;
> +int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
> +
>  QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
>  if (!bdrv_dirty_bitmap_enabled(bitmap)) {
>  continue;
>  }
> -hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
> +hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
> +end_sector - (offset >> BDRV_SECTOR_BITS));

Well, that's worse, but luckily you've got more patches. :)

>  }
>  }
> 
> diff --git a/block/io.c b/block/io.c
> index 9218329..d22d35f 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1328,7 +1328,6 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>  bool waited;
>  int ret;
> 
> -int64_t start_sector = offset >> BDRV_SECTOR_BITS;
>  int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
>  uint64_t bytes_remaining = bytes;
>  int max_transfer;
> @@ -1407,7 +1406,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>  bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
> 
>  ++bs->write_gen;
> -bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
> +bdrv_set_dirty(bs, offset, bytes);
> 
>  if (bs->wr_highest_offset < offset + bytes) {
>  bs->wr_highest_offset = offset + bytes;
> @@ -2535,8 +2534,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
>  ret = 0;
>  out:
>  ++bs->write_gen;
> -bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
> -   req.bytes >> BDRV_SECTOR_BITS);
> +bdrv_set_dirty(bs, req.offset, req.bytes);
>  tracked_request_end(&req);
>  bdrv_dec_in_flight(bs);
>  return ret;
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH 10/12] mirror: Switch mirror_dirty_init() to byte-based iteration

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> Now that we have adjusted the majority of the calls this function
> makes to be byte-based, it is easier to read the code if it makes
> passes over the image using bytes rather than sectors.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/mirror.c | 35 ++-
>  1 file changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 21b4f5d..846e392 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -601,15 +601,13 @@ static void mirror_throttle(MirrorBlockJob *s)
> 
>  static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>  {
> -int64_t sector_num, end;
> +int64_t offset;
>  BlockDriverState *base = s->base;
>  BlockDriverState *bs = s->source;
>  BlockDriverState *target_bs = blk_bs(s->target);
> -int ret, n;
> +int ret;
>  int64_t count;
> 
> -end = s->bdev_length / BDRV_SECTOR_SIZE;
> -
>  if (base == NULL && !bdrv_has_zero_init(target_bs)) {
>  if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>  bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
> @@ -617,9 +615,9 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>  }
> 
>  s->initial_zeroing_ongoing = true;
> -for (sector_num = 0; sector_num < end; ) {
> -int nb_sectors = MIN(end - sector_num,
> -QEMU_ALIGN_DOWN(INT_MAX, s->granularity) >> BDRV_SECTOR_BITS);
> +for (offset = 0; offset < s->bdev_length; ) {
> +int bytes = MIN(s->bdev_length - offset,
> +QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
> 
>  mirror_throttle(s);
> 
> @@ -635,9 +633,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
>  continue;
>  }
> 
> -mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
> -  nb_sectors * BDRV_SECTOR_SIZE, false);
> -sector_num += nb_sectors;
> +mirror_do_zero_or_discard(s, offset, bytes, false);
> +offset += bytes;
>  }
> 
>  mirror_wait_for_all_io(s);
> @@ -645,10 +642,10 @@ static int coroutine_fn 
> mirror_dirty_init(MirrorBlockJob *s)
>  }
> 
>  /* First part, loop on the sectors and initialize the dirty bitmap.  */
> -for (sector_num = 0; sector_num < end; ) {
> +for (offset = 0; offset < s->bdev_length; ) {
>  /* Just to make sure we are not exceeding int limit. */
> -int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
> - end - sector_num);
> +int bytes = MIN(s->bdev_length - offset,
> +QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
> 
>  mirror_throttle(s);
> 
> @@ -656,20 +653,16 @@ static int coroutine_fn 
> mirror_dirty_init(MirrorBlockJob *s)
>  return 0;
>  }
> 
> -ret = bdrv_is_allocated_above(bs, base, sector_num * 
> BDRV_SECTOR_SIZE,
> -  nb_sectors * BDRV_SECTOR_SIZE, &count);
> +ret = bdrv_is_allocated_above(bs, base, offset, bytes, &count);
>  if (ret < 0) {
>  return ret;
>  }
> 
> -n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
> -assert(n > 0);
> +count = QEMU_ALIGN_UP(count, BDRV_SECTOR_SIZE);
>  if (ret == 1) {
> -bdrv_set_dirty_bitmap(s->dirty_bitmap,
> -  sector_num * BDRV_SECTOR_SIZE,
> -  n * BDRV_SECTOR_SIZE);
> +bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, count);
>  }
> -sector_num += n;
> +offset += count;
>  }
>  return 0;
>  }
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()

2017-04-12 Thread Jeff Cody
On Thu, Apr 13, 2017 at 07:54:20AM +0800, Fam Zheng wrote:
> On Wed, 04/12 18:22, Jeff Cody wrote:
> > On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
> > > 
> > > 
> > > On 04/12/2017 04:46 PM, Jeff Cody wrote:
> > > > 
> > > > This occurs on v2.9.0-rc4, but not on v2.8.0.
> > > > 
> > > > When running QEMU with an iothread, and then performing a block-mirror, 
> > > > if
> > > > we do a system-reset after the BLOCK_JOB_READY event has emitted, qemu
> > > > becomes deadlocked.
> > > > 
> > > > The block job is not paused, nor cancelled, so we are stuck in the while
> > > > loop in block_job_detach_aio_context:
> > > > 
> > > > static void block_job_detach_aio_context(void *opaque)
> > > > {
> > > > BlockJob *job = opaque;
> > > > 
> > > > /* In case the job terminates during aio_poll()... */
> > > > block_job_ref(job);
> > > > 
> > > > block_job_pause(job);
> > > > 
> > > > while (!job->paused && !job->completed) {
> > > > block_job_drain(job);
> > > > }
> > > > 
> > > 
> > > Looks like when block_job_drain calls block_job_enter from this context
> > > (the main thread, since we're trying to do a system_reset...), we cannot
> > > enter the coroutine because it's the wrong context, so we schedule an
> > > entry instead with
> > > 
> > > aio_co_schedule(ctx, co);
> > > 
> > > But that entry never happens, so the job never wakes up and we never
> > > make enough progress in the coroutine to gracefully pause, so we wedge 
> > > here.
> > > 
> > 
> > 
> > John Snow and I debugged this some over IRC.  Here is a summary:
> > 
> > Simply put, with iothreads the aio context is different.  When
> > block_job_detach_aio_context() is called from the main thread via the system
> > reset (from main_loop_should_exit()), it calls block_job_drain() in a while
> > loop, with job->busy and job->completed as exit conditions.
> > 
> > block_job_drain() attempts to enter the coroutine (thus allowing job->busy
> > or job->completed to change).  However, since the aio context is different
> > with iothreads, we schedule the coroutine entry rather than directly
> > entering it.
> > 
> > This means the job coroutine is never going to be re-entered, because we are
> > waiting for it to complete in a while loop from the main thread, which is
> > blocking the qemu timers which would run the scheduled coroutine... hence,
> > we become stuck.
> 
> John and I confirmed that this can be fixed by this pending patch:
> 
> [PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html
> 
> It didn't make it into 2.9-rc4 because of limited time. :(
> 
> Looks like there is no -rc5, we'll have to document this as a known issue.
> Users should "block-job-complete/cancel" as soon as possible to avoid such a
> hang.
>

I'd argue for including a fix in 2.9, since this is both a regression and a
hard lock, with no recovery short of restarting the QEMU process.

-Jeff



Re: [Qemu-devel] [PATCH 08/12] dirty-bitmap: Change bdrv_get_dirty() to take bytes

2017-04-12 Thread Eric Blake
On 04/12/2017 07:25 PM, John Snow wrote:

>> +++ b/migration/block.c
>> @@ -537,8 +537,7 @@ static int mig_save_device_dirty(QEMUFile *f, 
>> BlkMigDevState *bmds,
>>  } else {
>>  blk_mig_unlock();
>>  }
>> -if (bdrv_get_dirty(bs, bmds->dirty_bitmap, sector)) {
>> -
>> +if (bdrv_get_dirty(bs, bmds->dirty_bitmap, sector * 
>> BDRV_SECTOR_SIZE)) {
> 
> This one is a little weirder now though, isn't it? We're asking for the
> dirty status of a single byte, technically. In practice, the scaling
> factor will always cover the entire sector, but it reads a lot jankier now.
> 
>>  if (total_sectors - sector < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>  nr_sectors = total_sectors - sector;
>>  } else {
>>
> 
> Oh well, it was always janky...

True. Think of it as "is the granularity (which may be a sector, a
cluster, or even some other size) that contains 'offset' dirty?".  I
really think migration/block.c will be easier to read once converted to
byte operations everywhere, but have not yet tackled it (it was hard
enough tackling mirror, backup, commit, and stream in parallel for the
previous series).

> 
> Reviewed-by: John Snow 
> 

Thanks for the reviews, by the way, even if the prerequisite patches
still haven't been fully reviewed.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] Fw: First contribution - Interested in Outreachy

2017-04-12 Thread Fam Zheng
On Tue, 04/11 16:44, Prerna Garg wrote:
> 
> 
> 
> 
> From: Prerna Garg
> Sent: Tuesday, April 11, 2017 9:57 PM
> To: Kevin Wolf
> Subject: Re: First contribution - Interested in Outreachy
> 
> 
> Hello,
> 
> 
> I am having difficulties sending the email through command line. So as a last
> resort I am sending the patch as an attachment.

Hi Prerna,

If it is git-send-email that doesn't work, it's probably because you haven't
configured the mail server info correctly. Follow the instructions for
git-send-email to point it at the SMTP server you use in your $HOME/.gitconfig:

https://git-scm.com/docs/git-send-email
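For reference, a typical `[sendemail]` section looks like the following; the server name, user, port, and encryption values here are placeholders, so check the git-send-email page above for the settings your mail provider actually needs:

```ini
[sendemail]
    # placeholders -- substitute your provider's values
    smtpServer = smtp.example.com
    smtpUser = you@example.com
    smtpEncryption = tls
    smtpServerPort = 587
```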

Let us know if you have any more difficulties.

Fam



Re: [Qemu-devel] [PATCH 09/12] dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> Some of the callers were already scaling bytes to sectors; others
> can be easily converted to pass byte offsets, all in our shift
> towards a consistent byte interface everywhere.  Making the change
> will also make it easier to write the hold-out callers to use byte
> rather than sectors for their iterations; it also makes it easier
> for a future dirty-bitmap patch to offload scaling over to the
> internal hbitmap.  Although all callers happen to pass
> sector-aligned values, make the internal scaling robust to any
> sub-sector requests.
> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/dirty-bitmap.h |  4 ++--
>  block/dirty-bitmap.c | 14 ++
>  block/mirror.c   | 16 
>  migration/block.c|  7 +--
>  4 files changed, 25 insertions(+), 16 deletions(-)
> 
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index b8434e5..fdff1e2 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -37,9 +37,9 @@ DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap 
> *bitmap);
>  bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
>  int64_t offset);
>  void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> -   int64_t cur_sector, int64_t nr_sectors);
> +   int64_t offset, int64_t bytes);
>  void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> - int64_t cur_sector, int64_t nr_sectors);
> + int64_t offset, int64_t bytes);
>  BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
>  BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap);
>  void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index c8100d2..8e7822c 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -401,17 +401,23 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
>  }
> 
>  void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> -   int64_t cur_sector, int64_t nr_sectors)
> +   int64_t offset, int64_t bytes)
>  {
> +int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
> +
>  assert(bdrv_dirty_bitmap_enabled(bitmap));
> -hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
> +hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
> +end_sector - (offset >> BDRV_SECTOR_BITS));
>  }
> 
>  void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> - int64_t cur_sector, int64_t nr_sectors)
> + int64_t offset, int64_t bytes)
>  {
> +int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
> +
>  assert(bdrv_dirty_bitmap_enabled(bitmap));
> -hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
> +hbitmap_reset(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
> +  end_sector - (offset >> BDRV_SECTOR_BITS));
>  }
> 
>  void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
> diff --git a/block/mirror.c b/block/mirror.c
> index 1e2e655..21b4f5d 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -141,8 +141,7 @@ static void mirror_write_complete(void *opaque, int ret)
>  if (ret < 0) {
>  BlockErrorAction action;
> 
> -bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> 
> BDRV_SECTOR_BITS,
> -  op->bytes >> BDRV_SECTOR_BITS);
> +bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
>  action = mirror_error_action(s, false, -ret);
>  if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
>  s->ret = ret;
> @@ -161,8 +160,7 @@ static void mirror_read_complete(void *opaque, int ret)
>  if (ret < 0) {
>  BlockErrorAction action;
> 
> -bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> 
> BDRV_SECTOR_BITS,
> -  op->bytes >> BDRV_SECTOR_BITS);
> +bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
>  action = mirror_error_action(s, true, -ret);
>  if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
>  s->ret = ret;
> @@ -380,8 +378,8 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>   * calling bdrv_get_block_status_above could yield - if some blocks are
>   * marked dirty in this window, we need to know.
>   */
> -bdrv_reset_dirty_bitmap(s->dirty_bitmap, offset >> BDRV_SECTOR_BITS,
> -nb_chunks * sectors_per_chunk);
> +bdrv_reset_dirty_bitmap(s->dirty_bitmap, offset,
> +nb_chunks * s->granularity);
>  bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
>  while (nb_chunks > 0 && offset < 

Re: [Qemu-devel] [PATCH 08/12] dirty-bitmap: Change bdrv_get_dirty() to take bytes

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> Half the callers were already scaling bytes to sectors; the other
> half can eventually be simplified to use byte iteration.  Both
> callers were already using the result as a bool, so make that
> explicit.  Making the change also makes it easier for a future
> dirty-bitmap patch to offload scaling over to the internal hbitmap.
> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/dirty-bitmap.h | 4 ++--
>  block/dirty-bitmap.c | 8 
>  block/mirror.c   | 3 +--
>  migration/block.c| 3 +--
>  4 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index efcec60..b8434e5 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -34,8 +34,8 @@ bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
>  bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
>  const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
>  DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
> -int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
> -   int64_t sector);
> +bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
> +int64_t offset);
>  void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> int64_t cur_sector, int64_t nr_sectors);
>  void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index e3c2e34..c8100d2 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -332,13 +332,13 @@ BlockDirtyInfoList 
> *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
>  return list;
>  }
> 
> -int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
> -   int64_t sector)
> +bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
> +int64_t offset)
>  {
>  if (bitmap) {
> -return hbitmap_get(bitmap->bitmap, sector);
> +return hbitmap_get(bitmap->bitmap, offset >> BDRV_SECTOR_BITS);
>  } else {
> -return 0;
> +return false;
>  }
>  }
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 1b98a77..1e2e655 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -359,8 +359,7 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  int64_t next_offset = offset + nb_chunks * s->granularity;
>  int64_t next_chunk = next_offset / s->granularity;
>  if (next_offset >= s->bdev_length ||
> -!bdrv_get_dirty(source, s->dirty_bitmap,
> -next_offset >> BDRV_SECTOR_BITS)) {
> +!bdrv_get_dirty(source, s->dirty_bitmap, next_offset)) {
>  break;
>  }
>  if (test_bit(next_chunk, s->in_flight_bitmap)) {
> diff --git a/migration/block.c b/migration/block.c
> index 3daa5c7..9e21aeb 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -537,8 +537,7 @@ static int mig_save_device_dirty(QEMUFile *f, 
> BlkMigDevState *bmds,
>  } else {
>  blk_mig_unlock();
>  }
> -if (bdrv_get_dirty(bs, bmds->dirty_bitmap, sector)) {
> -
> +if (bdrv_get_dirty(bs, bmds->dirty_bitmap, sector * 
> BDRV_SECTOR_SIZE)) {

This one is a little weirder now though, isn't it? We're asking for the
dirty status of a single byte, technically. In practice, the scaling
factor will always cover the entire sector, but it reads a lot jankier now.

>  if (total_sectors - sector < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>  nr_sectors = total_sectors - sector;
>  } else {
> 

Oh well, it was always janky...

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH 07/12] dirty-bitmap: Change bdrv_get_dirty_count() to report bytes

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> Thanks to recent cleanups, all callers were scaling a return value
> of sectors into bytes; do the scaling internally instead.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/dirty-bitmap.c |  4 ++--
>  block/mirror.c   | 13 +
>  migration/block.c|  2 +-
>  3 files changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 2f9f554..e3c2e34 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -319,7 +319,7 @@ BlockDirtyInfoList 
> *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
> QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
>  BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
>  BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
> -info->count = bdrv_get_dirty_count(bm) << BDRV_SECTOR_BITS;
> +info->count = bdrv_get_dirty_count(bm);
>  info->granularity = bdrv_dirty_bitmap_granularity(bm);
>  info->has_name = !!bm->name;
>  info->name = g_strdup(bm->name);
> @@ -494,7 +494,7 @@ void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, 
> int64_t offset)
> 
>  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
>  {
> -return hbitmap_count(bitmap->bitmap);
> +return hbitmap_count(bitmap->bitmap) << BDRV_SECTOR_BITS;
>  }
> 
>  int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap)
> diff --git a/block/mirror.c b/block/mirror.c
> index f404ff3..1b98a77 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -794,11 +794,10 @@ static void coroutine_fn mirror_run(void *opaque)
> 
>  cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>  /* s->common.offset contains the number of bytes already processed so
> - * far, cnt is the number of dirty sectors remaining and
> + * far, cnt is the number of dirty bytes remaining and
>   * s->bytes_in_flight is the number of bytes currently being
>   * processed; together those are the current total operation length 
> */
> -s->common.len = s->common.offset + s->bytes_in_flight +
> -cnt * BDRV_SECTOR_SIZE;
> +s->common.len = s->common.offset + s->bytes_in_flight + cnt;
> 
>  /* Note that even when no rate limit is applied we need to yield
>   * periodically with no pending I/O so that bdrv_drain_all() returns.
> @@ -810,8 +809,7 @@ static void coroutine_fn mirror_run(void *opaque)
>  s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
>  if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
>  (cnt == 0 && s->in_flight > 0)) {
> -trace_mirror_yield(s, cnt * BDRV_SECTOR_SIZE,
> -   s->buf_free_count, s->in_flight);
> +trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
>  mirror_wait_for_io(s);
>  continue;
>  } else if (cnt != 0) {
> @@ -852,7 +850,7 @@ static void coroutine_fn mirror_run(void *opaque)
>   * whether to switch to target check one last time if I/O has
>   * come in the meanwhile, and if not flush the data to disk.
>   */
> -trace_mirror_before_drain(s, cnt * BDRV_SECTOR_SIZE);
> +trace_mirror_before_drain(s, cnt);
> 
>  bdrv_drained_begin(bs);
>  cnt = bdrv_get_dirty_count(s->dirty_bitmap);
> @@ -871,8 +869,7 @@ static void coroutine_fn mirror_run(void *opaque)
>  }
> 
>  ret = 0;
> -trace_mirror_before_sleep(s, cnt * BDRV_SECTOR_SIZE,
> -  s->synced, delay_ns);
> +trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
>  if (!s->synced) {
> block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
> if (block_job_is_cancelled(&s->common)) {
> diff --git a/migration/block.c b/migration/block.c
> index 9a9c214..3daa5c7 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -672,7 +672,7 @@ static int64_t get_remaining_dirty(void)
>  aio_context_release(blk_get_aio_context(bmds->blk));
>  }
> 
> -return dirty << BDRV_SECTOR_BITS;
> +return dirty;
>  }
> 
>  /* Called with iothread lock taken.  */
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH 06/12] dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> Thanks to recent cleanups, all callers were scaling a return value
> of sectors into bytes; do the scaling internally instead.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/backup.c   | 2 +-
>  block/dirty-bitmap.c | 2 +-
>  block/mirror.c   | 8 
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index efa4896..6efd864 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -375,7 +375,7 @@ static int coroutine_fn 
> backup_run_incremental(BackupBlockJob *job)
>  dbi = bdrv_dirty_iter_new(job->sync_bitmap);
> 
>  /* Find the next dirty sector(s) */
> -while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
> +while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
>  cluster = offset / job->cluster_size;
> 
>  /* Fake progress updates for any clusters we skipped */
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 3fb4871..2f9f554 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -397,7 +397,7 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)
> 
>  int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
>  {
> -return hbitmap_iter_next(&iter->hbi);
> +return hbitmap_iter_next(&iter->hbi) * BDRV_SECTOR_SIZE;
>  }
> 
>  void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> diff --git a/block/mirror.c b/block/mirror.c
> index 7c1d6bf..f404ff3 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -335,10 +335,10 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  bool write_zeroes_ok = 
> bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
>  int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);
> 
> -offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
> +offset = bdrv_dirty_iter_next(s->dbi);
>  if (offset < 0) {
>  bdrv_set_dirty_iter(s->dbi, 0);
> -offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
> +offset = bdrv_dirty_iter_next(s->dbi);
>  trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
>BDRV_SECTOR_SIZE);
>  assert(offset >= 0);
> @@ -367,11 +367,11 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>  break;
>  }
> 
> -next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
> +next_dirty = bdrv_dirty_iter_next(s->dbi);
>  if (next_dirty > next_offset || next_dirty < 0) {
>  /* The bitmap iterator's cache is stale, refresh it */
>  bdrv_set_dirty_iter(s->dbi, next_offset);
> -next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
> +next_dirty = bdrv_dirty_iter_next(s->dbi);
>  }
>  assert(next_dirty == next_offset);
>  nb_chunks++;
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH] raspi: Add Raspberry Pi 1 support

2017-04-12 Thread Eric Blake
On 04/12/2017 06:57 PM, no-re...@patchew.org wrote:
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:

> === OUTPUT BEGIN ===
> Checking PATCH 1/1: raspi: Add Raspberry Pi 1 support...
> ERROR: line over 90 characters
> #189: FILE: hw/arm/raspi.c:34:
> + * 
> https://github.com/AndrewFromMelbourne/raspberry_pi_revision/blob/2764c19983fb7b1ef8cb21031e94c293703afa2e/README.md

You can shorten this:

https://github.com/AndrewFromMelbourne/raspberry_pi_revision/blob/2764c199/README.md

to fit under 90 characters while still resolving to the same place (collisions
with just an 8-char prefix are rare enough that this is generally not a problem
for git).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PULL 00/10] Block layer fixes for 2.9.0-rc4

2017-04-12 Thread Fam Zheng
On Wed, 04/12 16:51, no-re...@patchew.org wrote:
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Message-id: 1491572865-8549-1-git-send-email-kw...@redhat.com
> Subject: [Qemu-devel] [PULL 00/10] Block layer fixes for 2.9.0-rc4
> Type: series
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> 
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
> 
> # Useful git options
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> 
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
> echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
> if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; 
> then
> failed=1
> echo
> fi
> n=$((n+1))
> done
> 
> exit $failed
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> Switched to a new branch 'test'
> 4a9e7db mirror: Fix aio context of mirror_top_bs
> 79767f5 block: Assert attached child node has right aio context
> dfad3a5 block: Fix unpaired aio_disable_external in external snapshot
> 99be1dc block: Don't check permissions for copy on read
> 0e5e935 qemu-img: img_create does not support image-opts, fix docs
> 3afb9fa iotests: Add mirror tests for orphaned source
> 5ccc43d block/mirror: Fix use-after-free
> e65d767 commit: Set commit_top_bs->total_sectors
> 2dddcd2 commit: Set commit_top_bs->aio_context
> 5770f38 block: Ignore guest dev permissions during incoming migration
> 
> === OUTPUT BEGIN ===
> Checking PATCH 1/10: block: Ignore guest dev permissions during incoming 
> migration...
> Checking PATCH 2/10: commit: Set commit_top_bs->aio_context...
> Checking PATCH 3/10: commit: Set commit_top_bs->total_sectors...
> Checking PATCH 4/10: block/mirror: Fix use-after-free...
> Checking PATCH 5/10: iotests: Add mirror tests for orphaned source...
> Checking PATCH 6/10: qemu-img: img_create does not support image-opts, fix 
> docs...
> Checking PATCH 7/10: block: Don't check permissions for copy on read...
> ERROR: do not use C99 // comments
> #32: FILE: block/io.c:955:
> +// assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE));
> 
> total: 1 errors, 0 warnings, 15 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> Checking PATCH 8/10: block: Fix unpaired aio_disable_external in external 
> snapshot...
> Checking PATCH 9/10: block: Assert attached child node has right aio 
> context...
> Checking PATCH 10/10: mirror: Fix aio context of mirror_top_bs...
> === OUTPUT END ===
> 
> Test command exited with code: 1
> 
> 
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-de...@freelists.org

Hmm, patchew has stalled and a long queue of series is now being processed. I'm
going to disable email notifications for now until it catches up.

Fam



Re: [Qemu-devel] [PATCH 05/12] dirty-bitmap: Set iterator start by offset, not sector

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> All callers to bdrv_dirty_iter_new() passed 0 for their initial
> starting point, drop that parameter.
> 
> All callers to bdrv_set_dirty_iter() were scaling an offset to
> a sector number; move the scaling to occur internally to dirty
> bitmap code instead.
> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/dirty-bitmap.h | 5 ++---
>  block/backup.c   | 5 ++---
>  block/dirty-bitmap.c | 9 -
>  block/mirror.c   | 4 ++--
>  4 files changed, 10 insertions(+), 13 deletions(-)
> 
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index a83979d..efcec60 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -41,11 +41,10 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>  void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>   int64_t cur_sector, int64_t nr_sectors);
>  BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
> -BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
> - uint64_t first_sector);
> +BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap);
>  void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
>  int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
> -void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t sector_num);
> +void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t offset);
>  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
>  int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
>  void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
> diff --git a/block/backup.c b/block/backup.c
> index 63ca208..efa4896 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -372,7 +372,7 @@ static int coroutine_fn 
> backup_run_incremental(BackupBlockJob *job)
> 
>  granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
>  clusters_per_iter = MAX((granularity / job->cluster_size), 1);
> -dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
> +dbi = bdrv_dirty_iter_new(job->sync_bitmap);
> 
>  /* Find the next dirty sector(s) */
>  while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
> @@ -403,8 +403,7 @@ static int coroutine_fn 
> backup_run_incremental(BackupBlockJob *job)
>  /* If the bitmap granularity is smaller than the backup granularity,
>   * we need to advance the iterator pointer to the next cluster. */
>  if (granularity < job->cluster_size) {
> -bdrv_set_dirty_iter(dbi,
> -cluster * job->cluster_size / BDRV_SECTOR_SIZE);
> +bdrv_set_dirty_iter(dbi, cluster * job->cluster_size);
>  }
> 
>  last_cluster = cluster - 1;
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index a413df1..3fb4871 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -367,11 +367,10 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
>  return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
>  }
> 
> -BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
> - uint64_t first_sector)
> +BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap)
>  {
>  BdrvDirtyBitmapIter *iter = g_new(BdrvDirtyBitmapIter, 1);
> -hbitmap_iter_init(&iter->hbi, bitmap->bitmap, first_sector);
> +hbitmap_iter_init(&iter->hbi, bitmap->bitmap, 0);
>  iter->bitmap = bitmap;
>  bitmap->active_iterators++;
>  return iter;
> @@ -488,9 +487,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>  /**
>   * Advance a BdrvDirtyBitmapIter to an arbitrary offset.
>   */
> -void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t sector_num)
> +void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t offset)
>  {
> -hbitmap_iter_init(&iter->hbi, iter->hbi.hb, sector_num);
> +hbitmap_iter_init(&iter->hbi, iter->hbi.hb, offset >> BDRV_SECTOR_BITS);
>  }
> 
>  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
> diff --git a/block/mirror.c b/block/mirror.c
> index c92335a..7c1d6bf 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -370,7 +370,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>  next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
>  if (next_dirty > next_offset || next_dirty < 0) {
>  /* The bitmap iterator's cache is stale, refresh it */
> -bdrv_set_dirty_iter(s->dbi, next_offset >> BDRV_SECTOR_BITS);
> +bdrv_set_dirty_iter(s->dbi, next_offset);
>  next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
>  }
>  assert(next_dirty == next_offset);
> @@ -779,7 +779,7 @@ static void coroutine_fn mirror_run(void *opaque)
>  }
> 
>  assert(!s->dbi);
> -s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap, 0);
> +s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);

Re: [Qemu-devel] [PATCH] raspi: Add Raspberry Pi 1 support

2017-04-12 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20170408054318.19830-1-omar.riz...@gmail.com
Subject: [Qemu-devel] [PATCH] raspi: Add Raspberry Pi 1 support
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
082bb61 raspi: Add Raspberry Pi 1 support

=== OUTPUT BEGIN ===
Checking PATCH 1/1: raspi: Add Raspberry Pi 1 support...
ERROR: line over 90 characters
#189: FILE: hw/arm/raspi.c:34:
+ * 
https://github.com/AndrewFromMelbourne/raspberry_pi_revision/blob/2764c19983fb7b1ef8cb21031e94c293703afa2e/README.md

total: 1 errors, 0 warnings, 258 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()

2017-04-12 Thread Fam Zheng
On Wed, 04/12 18:22, Jeff Cody wrote:
> On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
> > 
> > 
> > On 04/12/2017 04:46 PM, Jeff Cody wrote:
> > > 
> > > This occurs on v2.9.0-rc4, but not on v2.8.0.
> > > 
> > > When running QEMU with an iothread, and then performing a block-mirror, if
> > > we do a system-reset after the BLOCK_JOB_READY event has emitted, qemu
> > > becomes deadlocked.
> > > 
> > > The block job is not paused, nor cancelled, so we are stuck in the while
> > > loop in block_job_detach_aio_context:
> > > 
> > > static void block_job_detach_aio_context(void *opaque)
> > > {
> > > BlockJob *job = opaque;
> > > 
> > > /* In case the job terminates during aio_poll()... */
> > > block_job_ref(job);
> > > 
> > > block_job_pause(job);
> > > 
> > > while (!job->paused && !job->completed) {
> > > block_job_drain(job);
> > > }
> > > 
> > 
> > Looks like when block_job_drain calls block_job_enter from this context
> > (the main thread, since we're trying to do a system_reset...), we cannot
> > enter the coroutine because it's the wrong context, so we schedule an
> > entry instead with
> > 
> > aio_co_schedule(ctx, co);
> > 
> > But that entry never happens, so the job never wakes up and we never
> > make enough progress in the coroutine to gracefully pause, so we wedge here.
> > 
> 
> 
> John Snow and I debugged this some over IRC.  Here is a summary:
> 
> Simply put, with iothreads the aio context is different.  When
> block_job_detach_aio_context() is called from the main thread via the system
> reset (from main_loop_should_exit()), it calls block_job_drain() in a while
> loop, with job->busy and job->completed as exit conditions.
> 
> block_job_drain() attempts to enter the coroutine (thus allowing job->busy
> or job->completed to change).  However, since the aio context is different
> with iothreads, we schedule the coroutine entry rather than directly
> entering it.
> 
> This means the job coroutine is never going to be re-entered, because we are
> waiting for it to complete in a while loop from the main thread, which is
> blocking the qemu timers which would run the scheduled coroutine... hence,
> we become stuck.

John and I confirmed that this can be fixed by this pending patch:

[PATCH for-2.9 4/5] block: Drain BH in bdrv_drained_begin

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01018.html

It didn't make it into 2.9-rc4 because of limited time. :(

Looks like there is no -rc5, we'll have to document this as a known issue.
Users should "block-job-complete/cancel" as soon as possible to avoid such a
hang.

Fam



Re: [Qemu-devel] Hacks for building on gcc 7 / Fedora 26

2017-04-12 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20170407143847.GM2138@work-vm
Subject: [Qemu-devel] Hacks for building on gcc 7 / Fedora 26
Type: series


Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
ad00ac3 Hacks for building on gcc 7 / Fedora 26

=== OUTPUT BEGIN ===
Checking PATCH 1/1: Hacks for building on gcc 7 / Fedora 26...
WARNING: line over 80 characters
#63: FILE: block/blkverify.c:311:
+ s->test_file->bs->exact_filename), <, sizeof(bs->exact_filename));

WARNING: line over 80 characters
#85: FILE: hw/usb/bus.c:411:
+g_assert_cmpint(snprintf(downstream->path, sizeof(downstream->path), "%s.%d",

ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 2 warnings, 64 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1



Re: [Qemu-devel] [PULL 00/10] Block layer fixes for 2.9.0-rc4

2017-04-12 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 1491572865-8549-1-git-send-email-kw...@redhat.com
Subject: [Qemu-devel] [PULL 00/10] Block layer fixes for 2.9.0-rc4
Type: series


Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
4a9e7db mirror: Fix aio context of mirror_top_bs
79767f5 block: Assert attached child node has right aio context
dfad3a5 block: Fix unpaired aio_disable_external in external snapshot
99be1dc block: Don't check permissions for copy on read
0e5e935 qemu-img: img_create does not support image-opts, fix docs
3afb9fa iotests: Add mirror tests for orphaned source
5ccc43d block/mirror: Fix use-after-free
e65d767 commit: Set commit_top_bs->total_sectors
2dddcd2 commit: Set commit_top_bs->aio_context
5770f38 block: Ignore guest dev permissions during incoming migration

=== OUTPUT BEGIN ===
Checking PATCH 1/10: block: Ignore guest dev permissions during incoming migration...
Checking PATCH 2/10: commit: Set commit_top_bs->aio_context...
Checking PATCH 3/10: commit: Set commit_top_bs->total_sectors...
Checking PATCH 4/10: block/mirror: Fix use-after-free...
Checking PATCH 5/10: iotests: Add mirror tests for orphaned source...
Checking PATCH 6/10: qemu-img: img_create does not support image-opts, fix docs...
Checking PATCH 7/10: block: Don't check permissions for copy on read...
ERROR: do not use C99 // comments
#32: FILE: block/io.c:955:
+// assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE));

total: 1 errors, 0 warnings, 15 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 8/10: block: Fix unpaired aio_disable_external in external snapshot...
Checking PATCH 9/10: block: Assert attached child node has right aio context...
Checking PATCH 10/10: mirror: Fix aio context of mirror_top_bs...
=== OUTPUT END ===

Test command exited with code: 1



Re: [Qemu-devel] [PATCH v2] hw/arm/virt: generate 64-bit addressable ACPI objects

2017-04-12 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20170407144138.12871-1-ard.biesheu...@linaro.org
Subject: [Qemu-devel] [PATCH v2] hw/arm/virt: generate 64-bit addressable ACPI objects
Type: series


Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
c63a33e hw/arm/virt: generate 64-bit addressable ACPI objects

=== OUTPUT BEGIN ===
Checking PATCH 1/1: hw/arm/virt: generate 64-bit addressable ACPI objects...
ERROR: open brace '{' following struct go on the same line
#154: FILE: include/hw/acpi/acpi-defs.h:244:
+struct AcpiXsdtDescriptorRev2
+{

total: 1 errors, 0 warnings, 125 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1



Re: [Qemu-devel] [RFC/BUG] xen-mapcache: buggy invalidate map cache?

2017-04-12 Thread Stefano Stabellini
On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
> On 2017/4/12 6:32, Stefano Stabellini wrote:
> > On Tue, 11 Apr 2017, hrg wrote:
> > > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
> > >  wrote:
> > > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
> > > > > On Mon, 10 Apr 2017, hrg wrote:
> > > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg  wrote:
> > > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg  wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > In xen_map_cache_unlocked(), the mapping to guest memory may be in
> > > > > > > > entry->next
> > > > > > > > instead of first level entry (if map to rom other than guest
> > > > > > > > memory
> > > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
> > > > > > > > ballooned
> > > > > > > > out memory, qemu did not invalidate cache entries in linked
> > > > > > > > list(entry->next), so when VM balloon back in memory, gfns
> > > > > > > > probably
> > > > > > > > mapped to different mfns, thus if guest asks device to DMA to
> > > > > > > > these
> > > > > > > > GPA, qemu may DMA to stale MFNs.
> > > > > > > > 
> > > > > > > > So I think in xen_invalidate_map_cache() linked lists should
> > > > > > > > also be
> > > > > > > > checked and invalidated.
> > > > > > > > 
> > > > > > > > What’s your opinion? Is this a bug? Is my analysis correct?
> > > > > Yes, you are right. We need to go through the list for each element of
> > > > > the array in xen_invalidate_map_cache. Can you come up with a patch?
> > > > I spoke too soon. In the regular case there should be no locked mappings
> > > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
> > > > beginning of the functions). Without locked mappings, there should never
> > > > be more than one element in each list (see xen_map_cache_unlocked:
> > > > entry->lock == true is a necessary condition to append a new entry to
> > > > the list, otherwise it is just remapped).
> > > > 
> > > > Can you confirm that what you are seeing are locked mappings
> > > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
> > > > by turning it into a printf or by defining MAPCACHE_DEBUG.
> > > In fact, I think the DPRINTF above is incorrect too. In
> > > pci_add_option_rom(), rtl8139 rom is locked mapped in
> > > pci_add_option_rom->memory_region_get_ram_ptr (after
> > > memory_region_init_ram). So actually I think we should remove the
> > > DPRINTF warning as it is normal.
> > Let me explain why the DPRINTF warning is there: emulated dma operations
> > can involve locked mappings. Once a dma operation completes, the related
> > mapping is unlocked and can be safely destroyed. But if we destroy a
> > locked mapping in xen_invalidate_map_cache, while a dma is still
> > ongoing, QEMU will crash. We cannot handle that case.
> > 
> > However, the scenario you described is different. It has nothing to do
> > with DMA. It looks like pci_add_option_rom calls
> > memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
> > locked mapping and it is never unlocked or destroyed.
> > 
> > It looks like "ptr" is not used after pci_add_option_rom returns. Does
> > the append patch fix the problem you are seeing? For the proper fix, I
> > think we probably need some sort of memory_region_unmap wrapper or maybe
> > a call to address_space_unmap.
> 
> Yes, I think so, maybe this is the proper way to fix this.

Would you be up for sending a proper patch and testing it? We cannot call
xen_invalidate_map_cache_entry directly from pci.c though, it would need
to be one of the other functions like address_space_unmap for example.


> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index e6b08e1..04f98b7 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool
> > is_default_rom,
> >   }
> > pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
> > +xen_invalidate_map_cache_entry(ptr);
> >   }
> > static void pci_del_option_rom(PCIDevice *pdev)


Re: [Qemu-devel] [PATCH v2 0/9] Provide support for the software TPM

2017-04-12 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 1491575431-32170-1-git-send-email-amarnath.vall...@intel.com
Subject: [Qemu-devel] [PATCH v2 0/9] Provide support for the software TPM
Type: series


Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
fb65a2b tpm: Added support for TPM emulator
0588e1b tpm-passthrough: move reusable code to utils
3dc626f tpm-backend: Move realloc_buffer() implementation to base class
3f0ba45 tpm-backend: Remove unneeded destroy() method from TpmDriverOps interface
2badc93 tmp backend: Add new api to read backend TpmInfo
8bb890f tpm-backend: Made few interface methods optional
e7a1f98 tpm-backend: Initialize and free data members in it's own methods
7e162a1 tpm-backend: Move thread handling inside TPMBackend
b739744 tpm-backend: Remove unneeded member variable from backend class

=== OUTPUT BEGIN ===
Checking PATCH 1/9: tpm-backend: Remove unneeded member variable from backend class...
Checking PATCH 2/9: tpm-backend: Move thread handling inside TPMBackend...
ERROR: space prohibited between function name and open parenthesis '('
#30: FILE: backends/tpm.c:27:
+assert (k->handle_request != NULL);

total: 1 errors, 0 warnings, 308 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/9: tpm-backend: Initialize and free data members in it's own methods...
Checking PATCH 4/9: tpm-backend: Made few interface methods optional...
Checking PATCH 5/9: tmp backend: Add new api to read backend TpmInfo...
Checking PATCH 6/9: tpm-backend: Remove unneeded destroy() method from TpmDriverOps interface...
Checking PATCH 7/9: tpm-backend: Move realloc_buffer() implementation to base class...
Checking PATCH 8/9: tpm-passthrough: move reusable code to utils...
Checking PATCH 9/9: tpm: Added support for TPM emulator...
ERROR: Macros with complex values should be enclosed in parenthesis
#206: FILE: hw/tpm/tpm_emulator.c:82:
+#define TPM_EMULATOR_IOCTL_TO_CMD(ioctlnum) \
+((ioctlnum >> _IOC_NRSHIFT) & _IOC_NRMASK) + 1

WARNING: line over 80 characters
#266: FILE: hw/tpm/tpm_emulator.c:142:
+ret = qio_channel_read(tpm_pt->data_ioc, (char *)out, (size_t)out_len, NULL);

WARNING: line over 80 characters
#567: FILE: hw/tpm/tpm_emulator.c:443:
+static gboolean tpm_emulator_fd_handler(QIOChannel *ioc, GIOCondition cnd, void *opaque)

ERROR: trailing whitespace
#586: FILE: hw/tpm/tpm_emulator.c:462:
+ $

ERROR: trailing whitespace
#601: FILE: hw/tpm/tpm_emulator.c:477:
+ $

ERROR: spaces required around that '+' (ctx:VxV)
#640: FILE: hw/tpm/tpm_emulator.c:516:
+char *argv[PARAM_MAX+1];
 ^

ERROR: braces {} are necessary for all arms of this statement
#643: FILE: hw/tpm/tpm_emulator.c:519:
+if (fds[0] >= 0)
[...]

ERROR: braces {} are necessary for all arms of this statement
#645: FILE: hw/tpm/tpm_emulator.c:521:
+if (ctrl_fds[0] >= 0)
[...]

ERROR: braces {} are necessary for all arms of this statement
#715: FILE: hw/tpm/tpm_emulator.c:591:
+if (data_fd >= 0)
[...]

ERROR: braces {} are necessary for all arms of this statement
#717: FILE: hw/tpm/tpm_emulator.c:593:
+if (ctrl_fd >= 0)
[...]

ERROR: braces {} are necessary for all arms of this statement
#728: FILE: hw/tpm/tpm_emulator.c:604:
+if (fds[1] >= 0)
[...]

ERROR: braces {} are necessary for all arms of this statement
#730: FILE: hw/tpm/tpm_emulator.c:606:
+if (ctrl_fds[1] >= 0)
[...]

ERROR: space required before the open parenthesis '('
#733: FILE: hw/tpm/tpm_emulator.c:609:
+while((rc = stat(TPM_EMULATOR_PIDFILE, )) < 0 && timeout--) {

ERROR: trailing whitespace
#735: FILE: hw/tpm/tpm_emulator.c:611:
+} $

WARNING: line over 80 characters
#750: FILE: hw/tpm/tpm_emulator.c:626:
+tpm_pt->ctrl_ioc = _iochannel_new(tpm_pt->ops->ctrl_path, ctrl_fds[0], NULL);

ERROR: spaces required around that '|' (ctx:VxV)
#761: FILE: hw/tpm/tpm_emulator.c:637:
+qio_channel_add_watch(tpm_pt->data_ioc, G_IO_HUP|G_IO_ERR,
 ^

total: 13 errors, 3 warnings, 1376 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===


Re: [Qemu-devel] [Xen-devel] [RFC/BUG] xen-mapcache: buggy invalidate map cache?

2017-04-12 Thread Stefano Stabellini
On Wed, 12 Apr 2017, Alexey G wrote:
> On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
> Stefano Stabellini  wrote:
> 
> > On Tue, 11 Apr 2017, hrg wrote:
> > > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
> > >  wrote:  
> > > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:  
> > > >> On Mon, 10 Apr 2017, hrg wrote:  
> > > >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg  wrote:  
> > > >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg  wrote: 
> > > >> > >  
> > > >> > >> Hi,
> > > >> > >>
> > > >> > >> In xen_map_cache_unlocked(), the mapping to guest memory may be in
> > > >> > >> entry->next instead of first level entry (if map to rom other than
> > > >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
> > > >> > >> when VM ballooned out memory, qemu did not invalidate cache 
> > > >> > >> entries
> > > >> > >> in linked list(entry->next), so when VM balloon back in memory,
> > > >> > >> gfns probably mapped to different mfns, thus if guest asks device
> > > >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
> > > >> > >>
> > > >> > >> So I think in xen_invalidate_map_cache() linked lists should also 
> > > >> > >> be
> > > >> > >> checked and invalidated.
> > > >> > >>
> > > >> > >> What’s your opinion? Is this a bug? Is my analysis correct?  
> > > >>
> > > >> Yes, you are right. We need to go through the list for each element of
> > > >> the array in xen_invalidate_map_cache. Can you come up with a patch?  
> > > >
> > > > I spoke too soon. In the regular case there should be no locked mappings
> > > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
> > > > beginning of the functions). Without locked mappings, there should never
> > > > be more than one element in each list (see xen_map_cache_unlocked:
> > > > entry->lock == true is a necessary condition to append a new entry to
> > > > the list, otherwise it is just remapped).
> > > >
> > > > Can you confirm that what you are seeing are locked mappings
> > > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
> > > > by turning it into a printf or by defining MAPCACHE_DEBUG.  
> > > 
> > > In fact, I think the DPRINTF above is incorrect too. In
> > > pci_add_option_rom(), rtl8139 rom is locked mapped in
> > > pci_add_option_rom->memory_region_get_ram_ptr (after
> > > memory_region_init_ram). So actually I think we should remove the
> > > DPRINTF warning as it is normal.  
> > 
> > Let me explain why the DPRINTF warning is there: emulated dma operations
> > can involve locked mappings. Once a dma operation completes, the related
> > mapping is unlocked and can be safely destroyed. But if we destroy a
> > locked mapping in xen_invalidate_map_cache, while a dma is still
> > ongoing, QEMU will crash. We cannot handle that case.
> > 
> > However, the scenario you described is different. It has nothing to do
> > with DMA. It looks like pci_add_option_rom calls
> > memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
> > locked mapping and it is never unlocked or destroyed.
> > 
> > It looks like "ptr" is not used after pci_add_option_rom returns. Does
> > the append patch fix the problem you are seeing? For the proper fix, I
> > think we probably need some sort of memory_region_unmap wrapper or maybe
> > a call to address_space_unmap.
> 
> Hmm, for some reason my message to the Xen-devel list got rejected but was 
> sent
> to Qemu-devel instead, without any notice. Sorry if I'm missing something
> obvious as a list newbie.
> 
> Stefano, hrg,
> 
> There is an issue with inconsistency between the list of normal 
> MapCacheEntry's
> and their 'reverse' counterparts - MapCacheRev's in locked_entries.
> When bad situation happens, there are multiple (locked) MapCacheEntry
> entries in the bucket's linked list along with a number of MapCacheRev's. And
> when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
> first list and calculates a wrong pointer from it which may then be caught 
> with
> the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
> this issue as well I think.
> 
> I'll try to provide a test code which can reproduce the issue from the
> guest side using an emulated IDE controller, though it's much simpler to 
> achieve
> this result with an AHCI controller using multiple NCQ I/O commands. So far 
> I've
> seen this issue only with Windows 7 (and above) guest on AHCI, but any block 
> I/O
> DMA should be enough I think.

That would be helpful. Please see if you can reproduce it after fixing
the other issue (http://marc.info/?l=qemu-devel&m=149195042500707&w=2).


Re: [Qemu-devel] [PATCH 03/12] dirty-bitmap: Drop unused functions

2017-04-12 Thread John Snow


On 04/12/2017 07:36 PM, Eric Blake wrote:
> On 04/12/2017 05:47 PM, John Snow wrote:
>>
>>
>> On 04/12/2017 01:49 PM, Eric Blake wrote:
>>> We had several functions that no one was using, and which used
>>> sector-based interfaces.  I'm trying to convert towards byte-based
>>> interfaces, so it's easier to just drop the unused functions:
>>>
>>> bdrv_dirty_bitmap_size
>>> bdrv_dirty_bitmap_get_meta
>>> bdrv_dirty_bitmap_reset_meta
>>> bdrv_dirty_bitmap_meta_granularity
>>>
>>> Signed-off-by: Eric Blake 
>>> ---
>>>  include/block/dirty-bitmap.h |  8 
>>>  block/dirty-bitmap.c | 34 --
>>>  2 files changed, 42 deletions(-)
>>>
> 
>>
>> I think it's likely Vladimir is or at least was relying on some of these
>> for his migration and persistence series.
>>
>> Might be nice to let him chime in to see how much of a hassle this is.
> 
> Then let's add him in cc ;)
> 

Err... I can't just summon people by mentioning them?

> I'm okay if these functions stay because they have a user, but it would
> also be nice if they were properly byte-based (like everything else in
> dirty-bitmap at the end of my series).  So even if we remove them here,
> we can revert the removal, and re-add them but with a sane interface.
> 

OK, but I will offer to do the work in the interest of not slowing
things down any further.

Do you use any of these, Vladimir?

--js



Re: [Qemu-devel] [PATCH 03/12] dirty-bitmap: Drop unused functions

2017-04-12 Thread Eric Blake
On 04/12/2017 05:47 PM, John Snow wrote:
> 
> 
> On 04/12/2017 01:49 PM, Eric Blake wrote:
>> We had several functions that no one was using, and which used
>> sector-based interfaces.  I'm trying to convert towards byte-based
>> interfaces, so it's easier to just drop the unused functions:
>>
>> bdrv_dirty_bitmap_size
>> bdrv_dirty_bitmap_get_meta
>> bdrv_dirty_bitmap_reset_meta
>> bdrv_dirty_bitmap_meta_granularity
>>
>> Signed-off-by: Eric Blake 
>> ---
>>  include/block/dirty-bitmap.h |  8 
>>  block/dirty-bitmap.c | 34 --
>>  2 files changed, 42 deletions(-)
>>

> 
> I think it's likely Vladimir is or at least was relying on some of these
> for his migration and persistence series.
> 
> Might be nice to let him chime in to see how much of a hassle this is.

Then let's add him in cc ;)

I'm okay if these functions stay because they have a user, but it would
also be nice if they were properly byte-based (like everything else in
dirty-bitmap at the end of my series).  So even if we remove them here,
we can revert the removal, and re-add them but with a sane interface.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 04/12] dirty-bitmap: Track size in bytes

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> We are still using an internal hbitmap that tracks a size in sectors,
> with the granularity scaled down accordingly, because it lets us
> use a shortcut for our iterators which are currently sector-based.
> But there's no reason we can't track the dirty bitmap size in bytes,
> since it is an internal-only variable.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/dirty-bitmap.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 32698d5..a413df1 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -41,7 +41,7 @@ struct BdrvDirtyBitmap {
>  HBitmap *meta;  /* Meta dirty bitmap */
>  BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
>  char *name; /* Optional non-empty unique ID */
> -int64_t size;   /* Size of the bitmap (Number of sectors) */
> +int64_t size;   /* Size of the bitmap, in bytes */
>  bool disabled;  /* Bitmap is read-only */
>  int active_iterators;   /* How many iterators are active */
>  QLIST_ENTRY(BdrvDirtyBitmap) list;
> @@ -79,24 +79,26 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
>  {
>  int64_t bitmap_size;
>  BdrvDirtyBitmap *bitmap;
> -uint32_t sector_granularity;
> 
> -assert((granularity & (granularity - 1)) == 0);
> +assert(is_power_of_2(granularity) && granularity >= BDRV_SECTOR_SIZE);
> 
>  if (name && bdrv_find_dirty_bitmap(bs, name)) {
>  error_setg(errp, "Bitmap already exists: %s", name);
>  return NULL;
>  }
> -sector_granularity = granularity >> BDRV_SECTOR_BITS;
> -assert(sector_granularity);
> -bitmap_size = bdrv_nb_sectors(bs);
> +bitmap_size = bdrv_getlength(bs);
>  if (bitmap_size < 0) {
>  error_setg_errno(errp, -bitmap_size, "could not get length of device");
>  errno = -bitmap_size;
>  return NULL;
>  }
>  bitmap = g_new0(BdrvDirtyBitmap, 1);
> -bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity));
> +/*
> + * TODO - let hbitmap track full granularity. For now, it is tracking
> + * only sector granularity, as a shortcut for our iterators.
> + */
> +bitmap->bitmap = hbitmap_alloc(bitmap_size >> BDRV_SECTOR_BITS,
> +   ctz32(granularity) - BDRV_SECTOR_BITS);
>  bitmap->size = bitmap_size;
>  bitmap->name = g_strdup(name);
>  bitmap->disabled = false;
> @@ -246,12 +248,13 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
>  void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
>  {
>  BdrvDirtyBitmap *bitmap;
> -uint64_t size = bdrv_nb_sectors(bs);
> +int64_t size = bdrv_getlength(bs);
> 
> +assert(size >= 0);
>  QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
>  assert(!bdrv_dirty_bitmap_frozen(bitmap));
>  assert(!bitmap->active_iterators);
> -hbitmap_truncate(bitmap->bitmap, size);
> +hbitmap_truncate(bitmap->bitmap, size >> BDRV_SECTOR_BITS);
>  bitmap->size = size;
>  }
>  }
> @@ -419,7 +422,7 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, 
> HBitmap **out)
>  hbitmap_reset_all(bitmap->bitmap);
>  } else {
>  HBitmap *backup = bitmap->bitmap;
> -bitmap->bitmap = hbitmap_alloc(bitmap->size,
> +bitmap->bitmap = hbitmap_alloc(bitmap->size >> BDRV_SECTOR_BITS,
> hbitmap_granularity(backup));
>  *out = backup;
>  }
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH 03/12] dirty-bitmap: Drop unused functions

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> We had several functions that no one was using, and which used
> sector-based interfaces.  I'm trying to convert towards byte-based
> interfaces, so it's easier to just drop the unused functions:
> 
> bdrv_dirty_bitmap_size
> bdrv_dirty_bitmap_get_meta
> bdrv_dirty_bitmap_reset_meta
> bdrv_dirty_bitmap_meta_granularity
> 
> Signed-off-by: Eric Blake 
> ---
>  include/block/dirty-bitmap.h |  8 
>  block/dirty-bitmap.c | 34 --
>  2 files changed, 42 deletions(-)
> 
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 9dea14b..a83979d 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -30,11 +30,9 @@ void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
>  BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
>  uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
>  uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
> -uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap);
>  bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
>  bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
>  const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
> -int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap);
>  DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
>  int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
> int64_t sector);
> @@ -42,12 +40,6 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
> int64_t cur_sector, int64_t nr_sectors);
>  void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
>   int64_t cur_sector, int64_t nr_sectors);
> -int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
> -   BdrvDirtyBitmap *bitmap, int64_t sector,
> -   int nb_sectors);
> -void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
> -  BdrvDirtyBitmap *bitmap, int64_t sector,
> -  int nb_sectors);
>  BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
>  BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
>   uint64_t first_sector);
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 6d8ce5f..32698d5 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -130,35 +130,6 @@ void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap 
> *bitmap)
>  bitmap->meta = NULL;
>  }
> 
> -int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
> -   BdrvDirtyBitmap *bitmap, int64_t sector,
> -   int nb_sectors)
> -{
> -uint64_t i;
> -int sectors_per_bit = 1 << hbitmap_granularity(bitmap->meta);
> -
> -/* To optimize: we can make hbitmap to internally check the range in a
> - * coarse level, or at least do it word by word. */
> -for (i = sector; i < sector + nb_sectors; i += sectors_per_bit) {
> -if (hbitmap_get(bitmap->meta, i)) {
> -return true;
> -}
> -}
> -return false;
> -}
> -
> -void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
> -  BdrvDirtyBitmap *bitmap, int64_t sector,
> -  int nb_sectors)
> -{
> -hbitmap_reset(bitmap->meta, sector, nb_sectors);
> -}
> -
> -int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap)
> -{
> -return bitmap->size;
> -}
> -
>  const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
>  {
>  return bitmap->name;
> @@ -393,11 +364,6 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap 
> *bitmap)
>  return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
>  }
> 
> -uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap)
> -{
> -return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->meta);
> -}
> -
>  BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
>   uint64_t first_sector)
>  {
> 

I think it's likely Vladimir is or at least was relying on some of these
for his migration and persistence series.

Might be nice to let him chime in to see how much of a hassle this is.



Re: [Qemu-devel] [PATCH 02/12] migration: Don't lose errno across aio context changes

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> set_drity_tracking() was assuming that the errno value set by

*cough*

> bdrv_create_dirty_bitmap() would not be corrupted by either
> blk_get_aio_context() or aio_context_release().  Rather than
> audit whether this assumption is safe, rewrite the code to just
> grab the value of errno sooner.
> 
> CC: qemu-sta...@nongnu.org
> Signed-off-by: Eric Blake 
> ---
>  migration/block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/block.c b/migration/block.c
> index 18d50ff..9a9c214 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -350,9 +350,9 @@ static int set_dirty_tracking(void)
>  aio_context_acquire(blk_get_aio_context(bmds->blk));
>  bmds->dirty_bitmap = bdrv_create_dirty_bitmap(blk_bs(bmds->blk),
>BLOCK_SIZE, NULL, 
> NULL);
> +ret = -errno;
>  aio_context_release(blk_get_aio_context(bmds->blk));
>  if (!bmds->dirty_bitmap) {
> -ret = -errno;
>  goto fail;
>  }
>  }
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH 01/12] dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented

2017-04-12 Thread John Snow


On 04/12/2017 01:49 PM, Eric Blake wrote:
> We've been documenting the value in bytes since its introduction
> in commit b9a9b3a4 (v1.3), where it was actually reported in bytes.
> 
> Commit e4654d2 (v2.0) then removed things from block/qapi.c, in
> preparation for a rewrite to a list of dirty sectors in the next
> commit 21b5683 in block.c, but the new code mistakenly started
> reporting in sectors.
> 
> Fixes: https://bugzilla.redhat.com/1441460
> 
> CC: qemu-sta...@nongnu.org
> Signed-off-by: Eric Blake 
> 
> ---
> Too late for 2.9, since the regression has been unnoticed for
> nine releases. But worth putting in 2.9.1.
> ---

Since before I started working here :)

I even documented the wrong thing in my two talks on the matter, but I
suppose I never committed it to bitmaps.md, so...

I guess this is technically correct?

http://i3.kym-cdn.com/photos/images/facebook/000/909/991/48c.jpg

>  block/dirty-bitmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 519737c..6d8ce5f 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -345,7 +345,7 @@ BlockDirtyInfoList 
> *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
>  QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
>  BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
>  BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
> -info->count = bdrv_get_dirty_count(bm);
> +info->count = bdrv_get_dirty_count(bm) << BDRV_SECTOR_BITS;
>  info->granularity = bdrv_dirty_bitmap_granularity(bm);
>  info->has_name = !!bm->name;
>  info->name = g_strdup(bm->name);
> 

This is strictly more useful than sectors anyway, so ...

Reviewed-by: John Snow 



Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()

2017-04-12 Thread Jeff Cody
On Wed, Apr 12, 2017 at 05:38:17PM -0400, John Snow wrote:
> 
> 
> On 04/12/2017 04:46 PM, Jeff Cody wrote:
> > 
> > This occurs on v2.9.0-rc4, but not on v2.8.0.
> > 
> > When running QEMU with an iothread, and then performing a block-mirror, if
> > we do a system-reset after the BLOCK_JOB_READY event has emitted, qemu
> > becomes deadlocked.
> > 
> > The block job is not paused, nor cancelled, so we are stuck in the while
> > loop in block_job_detach_aio_context:
> > 
> > static void block_job_detach_aio_context(void *opaque)
> > {
> > BlockJob *job = opaque;
> > 
> > /* In case the job terminates during aio_poll()... */
> > block_job_ref(job);
> > 
> > block_job_pause(job);
> > 
> > while (!job->paused && !job->completed) {
> > block_job_drain(job);
> > }
> > 
> 
> Looks like when block_job_drain calls block_job_enter from this context
> (the main thread, since we're trying to do a system_reset...), we cannot
> enter the coroutine because it's the wrong context, so we schedule an
> entry instead with
> 
> aio_co_schedule(ctx, co);
> 
> But that entry never happens, so the job never wakes up and we never
> make enough progress in the coroutine to gracefully pause, so we wedge here.
> 


John Snow and I debugged this some over IRC.  Here is a summary:

Simply put, with iothreads the aio context is different.  When
block_job_detach_aio_context() is called from the main thread via the system
reset (from main_loop_should_exit()), it calls block_job_drain() in a while
loop, with job->busy and job->completed as exit conditions.

block_job_drain() attempts to enter the coroutine (thus allowing job->busy
or job->completed to change).  However, since the aio context is different
with iothreads, we schedule the coroutine entry rather than directly
entering it.

This means the job coroutine is never going to be re-entered, because we are
waiting for it to complete in a while loop from the main thread, which is
blocking the qemu timers which would run the scheduled coroutine... hence,
we become stuck.



> > block_job_unref(job);
> > }
> > 
> 
> > 
> > Reproducer script and QAPI commands:
> > 
> > # QEMU script:
> > gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 
> > -object iothread,id=iothread0 -drive 
> > file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap 
> >  -device 
> > virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0
> >  -m 1024 -boot menu=on -qmp stdio -drive 
> > file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop
> >  -device 
> > virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
> >  
> > 
> > 
> > # QAPI commands:
> > { "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0", 
> > "target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths", 
> > "sync": "full", "speed": 10, "on-source-error": "stop", 
> > "on-target-error": "stop" } }
> > 
> > 
> > # after BLOCK_JOB_READY, do system reset
> > { "execute": "system_reset" }
> > 
> > 
> > 
> > 
> > 
> > gdb bt:
> > 
> > (gdb) bt
> > #0  0x55aa79f3 in bdrv_drain_recurse (bs=bs@entry=0x5783e900) 
> > at block/io.c:164
> > #1  0x55aa825d in bdrv_drained_begin (bs=bs@entry=0x5783e900) 
> > at block/io.c:231
> > #2  0x55aa8449 in bdrv_drain (bs=0x5783e900) at block/io.c:265
> > #3  0x55a9c356 in blk_drain (blk=) at 
> > block/block-backend.c:1383
> > #4  0x55aa3cfd in mirror_drain (job=) at 
> > block/mirror.c:1000
> > #5  0x55a66e11 in block_job_detach_aio_context 
> > (opaque=0x57a19a40) at blockjob.c:142
> > #6  0x55a62f4d in bdrv_detach_aio_context 
> > (bs=bs@entry=0x57839410) at block.c:4357
> > #7  0x55a63116 in bdrv_set_aio_context (bs=bs@entry=0x57839410, 
> > new_context=new_context@entry=0x5668bc20) at block.c:4418
> > #8  0x55a9d326 in blk_set_aio_context (blk=0x566db520, 
> > new_context=0x5668bc20) at block/block-backend.c:1662
> > #9  0x557e38da in virtio_blk_data_plane_stop (vdev=) 
> > at /home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
> > #10 0x559f9d5f in virtio_bus_stop_ioeventfd 
> > (bus=bus@entry=0x583089a8) at hw/virtio/virtio-bus.c:246
> > #11 0x559fa49b in virtio_bus_stop_ioeventfd 
> > (bus=bus@entry=0x583089a8) at hw/virtio/virtio-bus.c:238
> > #12 0x559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x58300510) 
> > at hw/virtio/virtio-pci.c:348
> > #13 0x559f6a18 in virtio_pci_reset (qdev=) at 
> > hw/virtio/virtio-pci.c:1872
> > #14 0x559139a9 in qdev_reset_one (dev=, 
> > opaque=) at hw/core/qdev.c:310
> > #15 0x55916738 in qbus_walk_children (bus=0x5693aa30, 
> > pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x559139a0 , 
> > post_busfn=0x559120f0 , opaque=0x0) at hw/core/bus.c:59

Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()

2017-04-12 Thread John Snow


On 04/12/2017 04:46 PM, Jeff Cody wrote:
> 
> This occurs on v2.9.0-rc4, but not on v2.8.0.
> 
> When running QEMU with an iothread, and then performing a block-mirror, if
> we do a system-reset after the BLOCK_JOB_READY event has emitted, qemu
> becomes deadlocked.
> 
> The block job is not paused, nor cancelled, so we are stuck in the while
> loop in block_job_detach_aio_context:
> 
> static void block_job_detach_aio_context(void *opaque)
> {
> BlockJob *job = opaque;
> 
> /* In case the job terminates during aio_poll()... */
> block_job_ref(job);
> 
> block_job_pause(job);
> 
> while (!job->paused && !job->completed) {
> block_job_drain(job);
> }
> 

Looks like when block_job_drain calls block_job_enter from this context
(the main thread, since we're trying to do a system_reset...), we cannot
enter the coroutine because it's the wrong context, so we schedule an
entry instead with

aio_co_schedule(ctx, co);

But that entry never happens, so the job never wakes up and we never
make enough progress in the coroutine to gracefully pause, so we wedge here.

> block_job_unref(job);
> }
> 

> 
> Reproducer script and QAPI commands:
> 
> # QEMU script:
> gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 
> -object iothread,id=iothread0 -drive 
> file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap  
> -device 
> virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0
>  -m 1024 -boot menu=on -qmp stdio -drive 
> file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop
>  -device 
> virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
>  
> 
> 
> # QAPI commands:
> { "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0", 
> "target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths", 
> "sync": "full", "speed": 10, "on-source-error": "stop", 
> "on-target-error": "stop" } }
> 
> 
> # after BLOCK_JOB_READY, do system reset
> { "execute": "system_reset" }
> 
> 
> 
> 
> 
> gdb bt:
> 
> (gdb) bt
> #0  0x55aa79f3 in bdrv_drain_recurse (bs=bs@entry=0x5783e900) at 
> block/io.c:164
> #1  0x55aa825d in bdrv_drained_begin (bs=bs@entry=0x5783e900) at 
> block/io.c:231
> #2  0x55aa8449 in bdrv_drain (bs=0x5783e900) at block/io.c:265
> #3  0x55a9c356 in blk_drain (blk=) at 
> block/block-backend.c:1383
> #4  0x55aa3cfd in mirror_drain (job=) at 
> block/mirror.c:1000
> #5  0x55a66e11 in block_job_detach_aio_context 
> (opaque=0x57a19a40) at blockjob.c:142
> #6  0x55a62f4d in bdrv_detach_aio_context 
> (bs=bs@entry=0x57839410) at block.c:4357
> #7  0x55a63116 in bdrv_set_aio_context (bs=bs@entry=0x57839410, 
> new_context=new_context@entry=0x5668bc20) at block.c:4418
> #8  0x55a9d326 in blk_set_aio_context (blk=0x566db520, 
> new_context=0x5668bc20) at block/block-backend.c:1662
> #9  0x557e38da in virtio_blk_data_plane_stop (vdev=) 
> at /home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
> #10 0x559f9d5f in virtio_bus_stop_ioeventfd 
> (bus=bus@entry=0x583089a8) at hw/virtio/virtio-bus.c:246
> #11 0x559fa49b in virtio_bus_stop_ioeventfd 
> (bus=bus@entry=0x583089a8) at hw/virtio/virtio-bus.c:238
> #12 0x559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x58300510) at 
> hw/virtio/virtio-pci.c:348
> #13 0x559f6a18 in virtio_pci_reset (qdev=) at 
> hw/virtio/virtio-pci.c:1872
> #14 0x559139a9 in qdev_reset_one (dev=, 
> opaque=) at hw/core/qdev.c:310
> #15 0x55916738 in qbus_walk_children (bus=0x5693aa30, 
> pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x559139a0 , 
> post_busfn=0x559120f0 , opaque=0x0) at hw/core/bus.c:59
> #16 0x55913318 in qdev_walk_children (dev=0x569387d0, 
> pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x559139a0 , 
> post_busfn=0x559120f0 , opaque=0x0) at hw/core/qdev.c:617
> #17 0x55916738 in qbus_walk_children (bus=0x56756f70, 
> pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x559139a0 , 
> post_busfn=0x559120f0 , opaque=0x0) at hw/core/bus.c:59
> #18 0x559168ca in qemu_devices_reset () at hw/core/reset.c:69
> #19 0x5581fcbb in pc_machine_reset () at 
> /home/jcody/work/upstream/qemu-kvm/hw/i386/pc.c:2234
> #20 0x558a4d96 in qemu_system_reset (report=) at 
> vl.c:1697
> #21 0x5577157a in main_loop_should_exit () at vl.c:1865
> #22 0x5577157a in main_loop () at vl.c:1902
> #23 0x5577157a in main (argc=, argv=, 
> envp=) at vl.c:4709
> 
> 
> -Jeff
> 

Here's a backtrace for an unoptimized build showing all threads:

https://paste.fedoraproject.org/paste/lLnm8jKeq2wLKF6yEaoEM15M1UNdIGYhyRLivL9gydE=


--js



Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Michael S. Tsirkin
On Wed, Apr 12, 2017 at 09:17:12PM +, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Apr 13, 2017 at 1:03 AM Ben Warren  wrote:
> 
> On Apr 12, 2017, at 1:47 PM, Marc-André Lureau <
> marcandre.lur...@gmail.com> wrote:
> 
> Hi
> 
> On Thu, Apr 13, 2017 at 12:25 AM Ben Warren 
> wrote:
> 
> On Apr 12, 2017, at 1:22 PM, Marc-André Lureau <
> marcandre.lur...@gmail.com> wrote:
> 
> Hi
> 
> On Thu, Apr 13, 2017 at 12:17 AM Ben Warren <
> b...@skyportsystems.com> wrote:
> 
> On Apr 12, 2017, at 1:06 PM, Marc-André Lureau <
> marcandre.lur...@gmail.com> wrote:
> 
> 
> +Device Usage:
> +-
> +
> > +The device has one property, which may only be
> > set using the command line:
> +
> +  guid - sets the value of the GUID.  A special
> value "auto" instructs
> +         QEMU to generate a new random GUID.
> +
> +For example:
> +
> +  QEMU  -device vmgenid,guid=
> "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> +  QEMU  -device vmgenid,guid=auto
> 
> 
> The default will keep uuid to null, should it be
> documented? Wouldn't it make sense to default to auto?
> 
> There is no default - you have to supply a value. It’s up
> to whatever software is managing VM lifecycle to decide
> what value to pass in.  Always setting to ‘auto’ will 
> cause
> a lot of churn within Windows that may or may not be
> acceptable to your use case.
> 
> 
> 
> Why would you have a vmgenid device if it's always null? Does
> that please some windows use-cases as well? 
>  
> 
> I don’t get what you mean by this.  What device is always null? 
> Either the device is instantiated or it isn’t.  If not there,
> Windows will not find a device and I don’t know how derived 
> objects
> (Invocation ID, etc.) are handled.
> 
> 
> If you start a VM without specifying guid argument, you'll always have
> a genid null uuid, even after a migration (this could have been
> handled by qemu without requiring management layer, no?). I don't
> understand why auto would create more churn than what the management
> layer would do by setting new uuid for each VM started. Could you
> explain?
> 
> 
> Looks like there’s a bug.  GUID should be a mandatory parameter. 
> 
> 
> Not necessarily a bug, if the guid can be changed when starting a "new" VM,
> which I think should work.

I think spec does not allow for a special "invalid" guid value ATM.


> However, I didn't manage to get your driver noticing the acpi event. I tried 
> to
> migrate/save & restore, and no vmgenid_notify kernel messages came out, nor
> notices got incremented. How did you test it?
>  
> 
> As for the churn, I’ll give you one example.  If an Active Directory 
> Domain
> Controller (ADDC) detects a change in VM Generation ID, it takes this to
> mean that the VM has been rolled back in time, and so its replication
> sequence numbers are “dirty”.  This has the effect of causing the Domain
> controller to perform a full “pull replication” with other ADDCs.  In 
> large
> deployments this can be costly.  VM Generation ID is used by other
> applications besides AD.
> 
> 
> 
> 
> I start to understand better the use case and how the device should be used.
>  
> thanks again
> 
> 
> 
>  
> 
> +The property may be queried via QMP/HMP:
> +
> +  (QEMU) query-vm-generation-id
> +  {"return": {"guid":
> "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
> +
> +Setting of this parameter is intentionally left
> out from the QMP/HMP
> +interfaces.  There are no known use cases for
> changing the GUID once QEMU is
> +running, and adding this capability would greatly
> increase the complexity.
> 
>  
> Is this supposed to be not permitted?
> 
> { 

Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Marc-André Lureau
Hi

On Thu, Apr 13, 2017 at 1:03 AM Ben Warren  wrote:

> On Apr 12, 2017, at 1:47 PM, Marc-André Lureau 
> wrote:
>
> Hi
>
> On Thu, Apr 13, 2017 at 12:25 AM Ben Warren 
> wrote:
>
> On Apr 12, 2017, at 1:22 PM, Marc-André Lureau 
> wrote:
>
> Hi
>
> On Thu, Apr 13, 2017 at 12:17 AM Ben Warren 
> wrote:
>
> On Apr 12, 2017, at 1:06 PM, Marc-André Lureau 
> wrote:
>
> +Device Usage:
> +-
> +
> +The device has one property, which may only be set using the command
> line:
> +
> +  guid - sets the value of the GUID.  A special value "auto" instructs
> + QEMU to generate a new random GUID.
> +
> +For example:
> +
> +  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> +  QEMU  -device vmgenid,guid=auto
>
>
> The default will keep uuid to null, should it be documented? Wouldn't it
> make sense to default to auto?
>
> There is no default - you have to supply a value. It’s up to whatever
> software is managing VM lifecycle to decide what value to pass in.  Always
> setting to ‘auto’ will cause a lot of churn within Windows that may or may
> not be acceptable to your use case.
>
>
> Why would you have a vmgenid device if it's always null? Does that please
> some windows use-cases as well?
>
>
> I don’t get what you mean by this.  What device is always null?  Either
> the device is instantiated or it isn’t.  If not there, Windows will not
> find a device and I don’t know how derived objects (Invocation ID, etc.)
> are handled.
>
>
> If you start a VM without specifying guid argument, you'll always have a
> genid null uuid, even after a migration (this could have been handled by
> qemu without requiring management layer, no?). I don't understand why auto
> would create more churn than what the management layer would do by setting
> new uuid for each VM started. Could you explain?
>
> Looks like there’s a bug.  GUID should be a mandatory parameter.
>

Not necessarily a bug, if the guid can be changed when starting a "new" VM,
which I think should work.

However, I didn't manage to get your driver noticing the acpi event. I
tried to migrate/save & restore, and no vmgenid_notify kernel messages came
out, nor notices got incremented. How did you test it?


> As for the churn, I’ll give you one example.  If an Active Directory
> Domain Controller (ADDC) detects a change in VM Generation ID, it takes
> this to mean that the VM has been rolled back in time, and so its
> replication sequence numbers are “dirty”.  This has the effect of causing
> the Domain controller to perform a full “pull replication” with other
> ADDCs.  In large deployments this can be costly.  VM Generation ID is used
> by other applications besides AD.
>
>

I start to understand better the use case and how the device should be used.

thanks again

>
>
>
> +The property may be queried via QMP/HMP:
> +
> +  (QEMU) query-vm-generation-id
> +  {"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
> +
> +Setting of this parameter is intentionally left out from the QMP/HMP
> +interfaces.  There are no known use cases for changing the GUID once QEMU
> is
> +running, and adding this capability would greatly increase the complexity.
>
>
> Is this supposed to be not permitted?
>
> { "execute": "qom-set", "arguments": { "path":
> "/machine/peripheral-anon/device[1]", "property": "guid", "value": "auto" }
> }
>
> Is there any linux kernel support being worked on?
>
> This isn’t really relevant to the Linux kernel, at least in any way I can
> think of.  What did you have in mind?
>
>
> Testing, but apparently we do have RFE for RHEL as Laszlo pointed out.
>
> OK, so you mean a guest driver.  I do have one that needs work to go
> upstream, but has been helpful to me in testing.
> https://github.com/ben-skyportsystems/vmgenid-test
>
>
> Thanks, that's exactly what I was looking for :)
>
>
> Good.  I wish I had the time to integrate this upstream, but it’s one of
> those things that is good enough, and so will have to wait for another time.
>
> --
> Marc-André Lureau
>
> --
Marc-André Lureau


Re: [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus

2017-04-12 Thread Eduardo Habkost
On Wed, Mar 22, 2017 at 02:32:35PM +0100, Igor Mammedov wrote:
> Introduce machine_set_cpu_numa_node() helper that stores
> node mapping for CPU in MachineState::possible_cpus.
> CPU and node it belongs to is specified by 'props' argument.
> 
> Patch doesn't remove old way of storing mapping in
> numa_info[X].node_cpu as removing it at the same time
> makes patch rather big. Instead it just mirrors mapping
> in possible_cpus and follow up per target patches will
> switch to possible_cpus and numa_info[X].node_cpu will
> be removed once there isn't any users left.
> 
> Signed-off-by: Igor Mammedov 

So, this patch is the one that makes "-numa" and "-numa cpu"
affect query-hotpluggable-cpus output.

Before this patch:

  $ qemu-system-x86_64 -smp 2 -m 2G -numa node -numa node -numa node -numa node
  [run qmp-shell]
  (QEMU) query-hotpluggable-cpus
  {
  "return": [
  {
  "qom-path": "/machine/unattached/device[2]",
  "type": "qemu64-x86_64-cpu",
  "vcpus-count": 1,
  "props": {
  "socket-id": 1,
  "core-id": 0,
  "thread-id": 0
  }
  },
  {
  "qom-path": "/machine/unattached/device[0]",
  "type": "qemu64-x86_64-cpu",
  "vcpus-count": 1,
  "props": {
  "socket-id": 0,
  "core-id": 0,
  "thread-id": 0
  }
  }
  ]
  }


After this patch:

  $ qemu-system-x86_64 -smp 2 -m 2G -numa node -numa node -numa node -numa node
  [run qmp-shell]
  (QEMU) query-hotpluggable-cpus
  {
  "return": [
  {
  "qom-path": "/machine/unattached/device[2]",
  "type": "qemu64-x86_64-cpu",
  "vcpus-count": 1,
  "props": {
  "socket-id": 1,
  "node-id": 1,
  "core-id": 0,
  "thread-id": 0
  }
  },
  {
  "qom-path": "/machine/unattached/device[0]",
  "type": "qemu64-x86_64-cpu",
  "vcpus-count": 1,
  "props": {
  "socket-id": 0,
  "node-id": 0,
  "core-id": 0,
  "thread-id": 0
  }
  }
  ]
  }


As noted in another message, I am not sure we really should make
"-numa" affect query-hotpluggable-cpus output unconditionally (I
believe we shouldn't). But if we do, we need to document this very
clearly.


> ---
>  include/hw/boards.h |  2 ++
>  hw/core/machine.c   | 68 
> +
>  numa.c  |  8 +++
>  3 files changed, 78 insertions(+)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 1dd0fde..40f30f1 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -42,6 +42,8 @@ bool machine_dump_guest_core(MachineState *machine);
>  bool machine_mem_merge(MachineState *machine);
>  void machine_register_compat_props(MachineState *machine);
>  HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
> +void machine_set_cpu_numa_node(MachineState *machine,
> +   CpuInstanceProperties *props, Error **errp);
>  
>  /**
>   * CPUArchId:
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 0d92672..6ff0b45 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -388,6 +388,74 @@ HotpluggableCPUList 
> *machine_query_hotpluggable_cpus(MachineState *machine)
>  return head;
>  }
>  
> +void machine_set_cpu_numa_node(MachineState *machine,
> +   CpuInstanceProperties *props, Error **errp)
> +{
> +MachineClass *mc = MACHINE_GET_CLASS(machine);
> +bool match = false;
> +int i;
> +
> +if (!mc->possible_cpu_arch_ids) {
> +error_setg(errp, "mapping of CPUs to NUMA node is not supported");
> +return;
> +}
> +
> +/* force board to initialize possible_cpus if it hasn't been done yet */
> +mc->possible_cpu_arch_ids(machine);
> +
> +for (i = 0; i < machine->possible_cpus->len; i++) {
> +CPUArchId *slot = &machine->possible_cpus->cpus[i];
> +
> +/* reject unsupported by board properties */
> +if (props->has_thread_id && !slot->props.has_thread_id) {
> +error_setg(errp, "thread-id is not supported");
> +return;
> +}
> +
> +if (props->has_core_id && !slot->props.has_core_id) {
> +error_setg(errp, "core-id is not supported");
> +return;
> +}
> +
> +if (props->has_socket_id && !slot->props.has_socket_id) {
> +error_setg(errp, "socket-id is not supported");
> +return;
> +}
> +
> +/* skip slots with explicit mismatch */
> +if (props->has_thread_id && props->thread_id != 
> slot->props.thread_id) {
> +continue;
> +}
> +
> +

Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Ben Warren via Qemu-devel

> On Apr 12, 2017, at 1:47 PM, Marc-André Lureau  
> wrote:
> 
> Hi
> 
> On Thu, Apr 13, 2017 at 12:25 AM Ben Warren  > wrote:
>> On Apr 12, 2017, at 1:22 PM, Marc-André Lureau > > wrote:
>> 
>> Hi
>> 
>> On Thu, Apr 13, 2017 at 12:17 AM Ben Warren > > wrote:
>>> On Apr 12, 2017, at 1:06 PM, Marc-André Lureau >> > wrote:
>>> 
>>> +Device Usage:
>>> +-
>>> +
>>> +The device has one property, which may only be set using the command 
>>> line:
>>> +
>>> +  guid - sets the value of the GUID.  A special value "auto" instructs
>>> + QEMU to generate a new random GUID.
>>> +
>>> +For example:
>>> +
>>> +  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
>>> +  QEMU  -device vmgenid,guid=auto
>>> 
>>> The default will keep uuid to null, should it be documented? Wouldn't it 
>>> make sense to default to auto?
>> 
>> There is no default - you have to supply a value. It’s up to whatever 
>> software is managing VM lifecycle to decide what value to pass in.  Always 
>> setting to ‘auto’ will cause a lot of churn within Windows that may or may 
>> not be acceptable to your use case.
>> 
>> 
>> Why would you have a vmgenid device if it's always null? Does that please 
>> some windows use-cases as well? 
>>  
> 
> I don’t get what you mean by this.  What device is always null?  Either the 
> device is instantiated or it isn’t.  If not there, Windows will not find a 
> device and I don’t know how derived objects (Invocation ID, etc.) are handled.
> 
> If you start a VM without specifying guid argument, you'll always have a 
> genid null uuid, even after a migration (this could have been handled by 
> qemu without requiring management layer, no?). I don't understand why auto 
> would create more churn than what the management layer would do by setting 
> new uuid for each VM started. Could you explain?
> 
Looks like there’s a bug.  GUID should be a mandatory parameter.  As for the 
churn, I’ll give you one example.  If an Active Directory Domain Controller 
(ADDC) detects a change in VM Generation ID, it takes this to mean that the VM 
has been rolled back in time, and so its replication sequence numbers are 
“dirty”.  This has the effect of causing the Domain controller to perform a 
full “pull replication” with other ADDCs.  In large deployments this can be 
costly.  VM Generation ID is used by other applications besides AD.
> 
>>>  
>>> +The property may be queried via QMP/HMP:
>>> +
>>> +  (QEMU) query-vm-generation-id
>>> +  {"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
>>> +
>>> +Setting of this parameter is intentionally left out from the QMP/HMP
>>> +interfaces.  There are no known use cases for changing the GUID once QEMU 
>>> is
>>> +running, and adding this capability would greatly increase the complexity.
>>>  
>>> Is this supposed to be not permitted?
>>> 
>>> { "execute": "qom-set", "arguments": { "path": 
>>> "/machine/peripheral-anon/device[1]", "property": "guid", "value": "auto" } 
>>> }
>>> 
>>> Is there any linux kernel support being worked on?
>> 
>> This isn’t really relevant to the Linux kernel, at least in any way I can 
>> think of.  What did you have in mind?
>> 
>> Testing, but apparently we do have RFE for RHEL as Laszlo pointed out.
> 
> OK, so you mean a guest driver.  I do have one that needs work to go 
> upstream, but has been helpful to me in testing.
> https://github.com/ben-skyportsystems/vmgenid-test 
> 
> 
> Thanks, that's exactly what I was looking for :) 

Good.  I wish I had the time to integrate this upstream, but it’s one of those 
things that is good enough, and so will have to wait for another time.
> -- 
> Marc-André Lureau



Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU

2017-04-12 Thread Eduardo Habkost
On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:
> it will allow switching from cpu_index to property based
> numa mapping in follow up patches.

I am not sure I understand all the consequences of this, so I
will give it a try:

"node-id" is an existing field in CpuInstanceProperties.
CpuInstanceProperties is used on both query-hotpluggable-cpus
output and in MachineState::possible_cpus.

We will start using MachineState::possible_cpus to keep track of
NUMA CPU affinity, and that means query-hotpluggable-cpus will
start reporting a "node-id" property when a NUMA mapping is
configured.

To allow query-hotpluggable-cpus to report "node-id", the CPU
objects must have a "node-id" property that can be set. This
patch adds the "node-id" property to X86CPU.

Is this description accurate? Is the presence of "node-id" in
query-hotpluggable-cpus the only reason we really need this
patch, or is there something else that requires the "node-id"
property?

Why exactly do we need to change the output of
query-hotpluggable-cpus for all machines to include "node-id", to
make "-numa cpu" work?  Did you consider saving node_id inside
CPUArchId and outside CpuInstanceProperties, so
query-hotpluggable-cpus output won't be affected by "-numa cpu"?

I'm asking this because I believe we will eventually need a
mechanism that lets management check what are the valid arguments
for "-numa cpu" for a given machine, and it looks like
query-hotpluggable-cpus is already the right mechanism for that.
But we can't make query-hotpluggable-cpus output depend on "-numa
cpu" input, if the "-numa cpu" input will also depend on
query-hotpluggable-cpus output.
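
To make the concern concrete, here is a sketch of what query-hotpluggable-cpus
output might look like once a NUMA mapping is configured (the property set is
illustrative, not taken from this patch):

```
(QEMU) query-hotpluggable-cpus
{"return": [
  {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
   "props": {"socket-id": 0, "core-id": 0, "thread-id": 0, "node-id": 0}},
  {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
   "props": {"socket-id": 1, "core-id": 0, "thread-id": 0, "node-id": 1}}
]}
```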

> 
> Signed-off-by: Igor Mammedov 
> ---
>  hw/i386/pc.c  | 17 +
>  target/i386/cpu.c |  1 +
>  2 files changed, 18 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 7031100..873bbfa 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1895,6 +1895,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>  DeviceState *dev, Error **errp)
>  {
>  int idx;
> +int node_id;
>  CPUState *cs;
>  CPUArchId *cpu_slot;
>  X86CPUTopoInfo topo;
> @@ -1984,6 +1985,22 @@ static void pc_cpu_pre_plug(HotplugHandler 
> *hotplug_dev,
>  
>  cs = CPU(cpu);
>  cs->cpu_index = idx;
> +
> +node_id = numa_get_node_for_cpu(cs->cpu_index);
> +if (node_id == nb_numa_nodes) {
> +/* by default CPUState::numa_node was 0 if it's not set via CLI
> + * keep it this way for now but in future we probably should
> + * refuse to start up with incomplete numa mapping */
> +node_id = 0;
> +}
> +if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
> +cs->numa_node = node_id;
> +} else if (cs->numa_node != node_id) {
> +error_setg(errp, "node-id %d must match numa node specified"
> +"with -numa option for cpu-index %d",
> +cs->numa_node, cs->cpu_index);
> +return;
> +}
>  }
>  
>  static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 7aa7622..d690244 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -3974,6 +3974,7 @@ static Property x86_cpu_properties[] = {
>  DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
>  DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
>  #endif
> +DEFINE_PROP_INT32("node-id", CPUState, numa_node, 
> CPU_UNSET_NUMA_NODE_ID),
>  DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
>  { .name  = "hv-spinlocks", .info  = &qdev_prop_spinlocks },
>  DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
> -- 
> 2.7.4
> 
> 

-- 
Eduardo



Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Marc-André Lureau
Hi

On Thu, Apr 13, 2017 at 12:25 AM Ben Warren  wrote:

> On Apr 12, 2017, at 1:22 PM, Marc-André Lureau 
> wrote:
>
> Hi
>
> On Thu, Apr 13, 2017 at 12:17 AM Ben Warren 
> wrote:
>
> On Apr 12, 2017, at 1:06 PM, Marc-André Lureau 
> wrote:
>
> +Device Usage:
> +-------------
> +
> +The device has one property, which may only be set using the command
> line:
> +
> +  guid - sets the value of the GUID.  A special value "auto" instructs
> + QEMU to generate a new random GUID.
> +
> +For example:
> +
> +  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> +  QEMU  -device vmgenid,guid=auto
>
>
> The default will keep uuid to null, should it be documented? Wouldn't it
> make sense to default to auto?
>
> There is no default - you have to supply a value. It’s up to whatever
> software is managing VM lifecycle to decide what value to pass in.  Always
> setting to ‘auto’ will cause a lot of churn within Windows that may or may
> not be acceptable to your use case.
>
>
> Why would you have a vmgenid device if it's always null? Does that please
> some windows use-cases as well?
>
>
> I don’t get what you mean by this.  What device is always null?  Either
> the device is instantiated or it isn’t.  If not there, Windows will not
> find a device and I don’t know how derived objects (Invocation ID, etc.)
> are handled.
>

If you start a VM without specifying guid argument, you'll always have a
genid null uuid, even after a migration (this could have been handled by
qemu without requiring management layer, no?). I don't understand why auto
would create more churn than what the management layer would do by setting
new uuid for each VM started. Could you explain?


>
>
> +The property may be queried via QMP/HMP:
> +
> +  (QEMU) query-vm-generation-id
> +  {"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
> +
> +Setting of this parameter is intentionally left out from the QMP/HMP
> +interfaces.  There are no known use cases for changing the GUID once QEMU
> is
> +running, and adding this capability would greatly increase the complexity.
>
>
> Is this supposed to be not permitted?
>
> { "execute": "qom-set", "arguments": { "path":
> "/machine/peripheral-anon/device[1]", "property": "guid", "value": "auto" }
> }
>
> Is there any linux kernel support being worked on?
>
> This isn’t really relevant to the Linux kernel, at least in any way I can
> think of.  What did you have in mind?
>
>
> Testing, but apparently we do have RFE for RHEL as Laszlo pointed out.
>
> OK, so you mean a guest driver.  I do have one that needs work to go
> upstream, but has been helpful to me in testing.
> https://github.com/ben-skyportsystems/vmgenid-test
>

Thanks, that's exactly what I was looking for :)
-- 
Marc-André Lureau


[Qemu-devel] Regression from 2.8: stuck in bdrv_drain()

2017-04-12 Thread Jeff Cody

This occurs on v2.9.0-rc4, but not on v2.8.0.

When running QEMU with an iothread, and then performing a block-mirror, if
we do a system-reset after the BLOCK_JOB_READY event has been emitted, qemu
becomes deadlocked.

The block job is not paused, nor cancelled, so we are stuck in the while
loop in block_job_detach_aio_context:

static void block_job_detach_aio_context(void *opaque)
{
BlockJob *job = opaque;

/* In case the job terminates during aio_poll()... */
block_job_ref(job);

block_job_pause(job);

while (!job->paused && !job->completed) {
block_job_drain(job);
}

block_job_unref(job);
}


Reproducer script and QAPI commands:

# QEMU script:
gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 
-object iothread,id=iothread0 -drive 
file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap  
-device 
virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0
 -m 1024 -boot menu=on -qmp stdio -drive 
file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop
 -device 
virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
 


# QAPI commands:
{ "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0", 
"target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths", 
"sync": "full", "speed": 10, "on-source-error": "stop", 
"on-target-error": "stop" } }


# after BLOCK_JOB_READY, do system reset
{ "execute": "system_reset" }





gbd bt:

(gdb) bt
#0  0x55aa79f3 in bdrv_drain_recurse (bs=bs@entry=0x5783e900) at 
block/io.c:164
#1  0x55aa825d in bdrv_drained_begin (bs=bs@entry=0x5783e900) at 
block/io.c:231
#2  0x55aa8449 in bdrv_drain (bs=0x5783e900) at block/io.c:265
#3  0x55a9c356 in blk_drain (blk=) at 
block/block-backend.c:1383
#4  0x55aa3cfd in mirror_drain (job=) at 
block/mirror.c:1000
#5  0x55a66e11 in block_job_detach_aio_context (opaque=0x57a19a40) 
at blockjob.c:142
#6  0x55a62f4d in bdrv_detach_aio_context (bs=bs@entry=0x57839410) 
at block.c:4357
#7  0x55a63116 in bdrv_set_aio_context (bs=bs@entry=0x57839410, 
new_context=new_context@entry=0x5668bc20) at block.c:4418
#8  0x55a9d326 in blk_set_aio_context (blk=0x566db520, 
new_context=0x5668bc20) at block/block-backend.c:1662
#9  0x557e38da in virtio_blk_data_plane_stop (vdev=) at 
/home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
#10 0x559f9d5f in virtio_bus_stop_ioeventfd 
(bus=bus@entry=0x583089a8) at hw/virtio/virtio-bus.c:246
#11 0x559fa49b in virtio_bus_stop_ioeventfd 
(bus=bus@entry=0x583089a8) at hw/virtio/virtio-bus.c:238
#12 0x559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x58300510) at 
hw/virtio/virtio-pci.c:348
#13 0x559f6a18 in virtio_pci_reset (qdev=) at 
hw/virtio/virtio-pci.c:1872
#14 0x559139a9 in qdev_reset_one (dev=, 
opaque=) at hw/core/qdev.c:310
#15 0x55916738 in qbus_walk_children (bus=0x5693aa30, 
pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x559139a0 , 
post_busfn=0x559120f0 , opaque=0x0) at hw/core/bus.c:59
#16 0x55913318 in qdev_walk_children (dev=0x569387d0, 
pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x559139a0 , 
post_busfn=0x559120f0 , opaque=0x0) at hw/core/qdev.c:617
#17 0x55916738 in qbus_walk_children (bus=0x56756f70, 
pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x559139a0 , 
post_busfn=0x559120f0 , opaque=0x0) at hw/core/bus.c:59
#18 0x559168ca in qemu_devices_reset () at hw/core/reset.c:69
#19 0x5581fcbb in pc_machine_reset () at 
/home/jcody/work/upstream/qemu-kvm/hw/i386/pc.c:2234
#20 0x558a4d96 in qemu_system_reset (report=) at 
vl.c:1697
#21 0x5577157a in main_loop_should_exit () at vl.c:1865
#22 0x5577157a in main_loop () at vl.c:1902
#23 0x5577157a in main (argc=, argv=, 
envp=) at vl.c:4709


-Jeff



Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Ben Warren via Qemu-devel

> On Apr 12, 2017, at 1:22 PM, Marc-André Lureau  
> wrote:
> 
> Hi
> 
>> On Thu, Apr 13, 2017 at 12:17 AM Ben Warren  wrote:
>> On Apr 12, 2017, at 1:06 PM, Marc-André Lureau  wrote:
>> 
>> +Device Usage:
>> +-------------
>> +
>> +The device has one property, which may only be set using the command 
>> line:
>> +
>> +  guid - sets the value of the GUID.  A special value "auto" instructs
>> + QEMU to generate a new random GUID.
>> +
>> +For example:
>> +
>> +  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
>> +  QEMU  -device vmgenid,guid=auto
>> 
>> The default will keep uuid to null, should it be documented? Wouldn't it 
>> make sense to default to auto?
> 
> There is no default - you have to supply a value. It’s up to whatever 
> software is managing VM lifecycle to decide what value to pass in.  Always 
> setting to ‘auto’ will cause a lot of churn within Windows that may or may 
> not be acceptable to your use case.
> 
> 
> Why would you have a vmgenid device if it's always null? Does that please 
> some windows use-cases as well? 
>  
I don’t get what you mean by this.  What device is always null?  Either the 
device is instantiated or it isn’t.  If not there, Windows will not find a 
device and I don’t know how derived objects (Invocation ID, etc.) are handled.
>>  
>> +The property may be queried via QMP/HMP:
>> +
>> +  (QEMU) query-vm-generation-id
>> +  {"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
>> +
>> +Setting of this parameter is intentionally left out from the QMP/HMP
>> +interfaces.  There are no known use cases for changing the GUID once QEMU is
>> +running, and adding this capability would greatly increase the complexity.
>>  
>> Is this supposed to be not permitted?
>> 
>> { "execute": "qom-set", "arguments": { "path": 
>> "/machine/peripheral-anon/device[1]", "property": "guid", "value": "auto" } }
>> 
>> Is there any linux kernel support being worked on?
> 
> This isn’t really relevant to the Linux kernel, at least in any way I can 
> think of.  What did you have in mind?
> 
> Testing, but apparently we do have RFE for RHEL as Laszlo pointed out.
OK, so you mean a guest driver.  I do have one that needs work to go upstream, 
but has been helpful to me in testing.
https://github.com/ben-skyportsystems/vmgenid-test 


> 
> Thanks
> -- 
> Marc-André Lureau



Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Marc-André Lureau
Hi

On Thu, Apr 13, 2017 at 12:17 AM Ben Warren  wrote:

> On Apr 12, 2017, at 1:06 PM, Marc-André Lureau 
> wrote:
>
> +Device Usage:
> +-------------
> +
> +The device has one property, which may only be set using the command
> line:
> +
> +  guid - sets the value of the GUID.  A special value "auto" instructs
> + QEMU to generate a new random GUID.
> +
> +For example:
> +
> +  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> +  QEMU  -device vmgenid,guid=auto
>
>
> The default will keep uuid to null, should it be documented? Wouldn't it
> make sense to default to auto?
>
> There is no default - you have to supply a value. It’s up to whatever
> software is managing VM lifecycle to decide what value to pass in.  Always
> setting to ‘auto’ will cause a lot of churn within Windows that may or may
> not be acceptable to your use case.
>
>
Why would you have a vmgenid device if it's always null? Does that please
some windows use-cases as well?


>
>
> +The property may be queried via QMP/HMP:
> +
> +  (QEMU) query-vm-generation-id
> +  {"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
> +
> +Setting of this parameter is intentionally left out from the QMP/HMP
> +interfaces.  There are no known use cases for changing the GUID once QEMU
> is
> +running, and adding this capability would greatly increase the complexity.
>
>
> Is this supposed to be not permitted?
>
> { "execute": "qom-set", "arguments": { "path":
> "/machine/peripheral-anon/device[1]", "property": "guid", "value": "auto" }
> }
>
> Is there any linux kernel support being worked on?
>
> This isn’t really relevant to the Linux kernel, at least in any way I can
> think of.  What did you have in mind?
>

Testing, but apparently we do have RFE for RHEL as Laszlo pointed out.

Thanks
-- 
Marc-André Lureau


Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Michael S. Tsirkin
On Wed, Apr 12, 2017 at 08:06:32PM +, Marc-André Lureau wrote:
> Hi
> 
> On Thu, Mar 2, 2017 at 10:22 AM Michael S. Tsirkin  wrote:
> 
> From: Ben Warren 
> 
> This patch is based off an earlier version by
> Gal Hammer (gham...@redhat.com)
> 
> Requirements section, ASCII diagrams and overall help
> provided by Laszlo Ersek (ler...@redhat.com)
> 
> Signed-off-by: Gal Hammer 
> Signed-off-by: Ben Warren 
> Reviewed-by: Laszlo Ersek 
> Reviewed-by: Igor Mammedov 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  docs/specs/vmgenid.txt | 245
> +
>  1 file changed, 245 insertions(+)
>  create mode 100644 docs/specs/vmgenid.txt
> 
> diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
> new file mode 100644
> index 000..aa9f518
> --- /dev/null
> +++ b/docs/specs/vmgenid.txt
> @@ -0,0 +1,245 @@
> +VIRTUAL MACHINE GENERATION ID
> +=============================
> +
> +Copyright (C) 2016 Red Hat, Inc.
> +Copyright (C) 2017 Skyport Systems, Inc.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +===
> +
> +The VM generation ID (vmgenid) device is an emulated device which
> +exposes a 128-bit, cryptographically random, integer value identifier,
> +referred to as a Globally Unique Identifier, or GUID.
> +
> +This allows management applications (e.g. libvirt) to notify the guest
> +operating system when the virtual machine is executed with a different
> +configuration (e.g. snapshot execution or creation from a template).  The
> +guest operating system notices the change, and is then able to react as
> +appropriate by marking its copies of distributed databases as dirty,
> +re-initializing its random number generator etc.
> +
> +
> +Requirements
> +------------
> +
> +These requirements are extracted from the "How to implement virtual
> machine
> +generation ID support in a virtualization platform" section of the
> +specification, dated August 1, 2012.
> +
> +
> +The document may be found on the web at:
> +  http://go.microsoft.com/fwlink/?LinkId=260709
> +
> +R1a. The generation ID shall live in an 8-byte aligned buffer.
> +
> +R1b. The buffer holding the generation ID shall be in guest RAM, ROM, or
> device
> +     MMIO range.
> +
> +R1c. The buffer holding the generation ID shall be kept separate from
> areas
> +     used by the operating system.
> +
> +R1d. The buffer shall not be covered by an AddressRangeMemory or
> +     AddressRangeACPI entry in the E820 or UEFI memory map.
> +
> +R1e. The generation ID shall not live in a page frame that could be 
> mapped
> with
> +     caching disabled. (In other words, regardless of whether the
> generation ID
> +     lives in RAM, ROM or MMIO, it shall only be mapped as cacheable.)
> +
> +R2 to R5. [These AML requirements are isolated well enough in the
> Microsoft
> +          specification for us to simply refer to them here.]
> +
> +R6. The hypervisor shall expose a _HID (hardware identifier) object in 
> the
> +    VMGenId device's scope that is unique to the hypervisor vendor.
> +
> +
> +QEMU Implementation
> +-------------------
> +
> +The above-mentioned specification does not dictate which ACPI descriptor
> table
> +will contain the VM Generation ID device.  Other implementations (Hyper-V
> and
> +Xen) put it in the main descriptor table (Differentiated System
> Description
> +Table or DSDT).  For ease of debugging and implementation, we have 
> decided
> to
> +put it in its own Secondary System Description Table, or SSDT.
> +
> +The following is a dump of the contents from a running system:
> +
> +# iasl -p ./SSDT -d /sys/firmware/acpi/tables/SSDT
> +
> +Intel ACPI Component Architecture
> +ASL+ Optimizing Compiler version 20150717-64
> +Copyright (c) 2000 - 2015 Intel Corporation
> +
> +Reading ACPI table from file /sys/firmware/acpi/tables/SSDT - Length
> +0198 (0xC6)
> +ACPI: SSDT 0x C6 (v01 BOCHS  VMGENID  0001 BXPC
> +0001)
> +Acpi table [SSDT] successfully installed and loaded
> +Pass 1 parse of [SSDT]
> +Pass 2 parse of [SSDT]
> +Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)
> +
> +Parsing completed
> +Disassembly completed
> +ASL Output:    ./SSDT.dsl - 1631 bytes
> +# cat SSDT.dsl
> +/*
> + * 

Re: [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option

2017-04-12 Thread Eduardo Habkost
On Wed, Mar 22, 2017 at 02:32:25PM +0100, Igor Mammedov wrote:
> Changes since RFC:
> * convert all targets that support numa (Eduardo)
> * add numa CLI tests
> * support wildcard matching with "-numa cpu,..." (Paolo)
> 
> Series introduces a new CLI option to allow mapping cpus to numa
> nodes using public properties [socket|core|thread]-ids instead of
> internal cpu_index and moving internal handling of cpu<->node
> mapping from cpu_index based global bitmaps to MachineState.
> 
> New '-numa cpu' option is supported only on PC and SPAPR
> machines that implement hotpluggable-cpus query.
> ARM machine user-facing interface stays cpu_index based due to
> lack of hotpluggable-cpus support, but internally cpu<->node
> mapping will be using the common for PC/SPAPR/ARM approach
> (i.e. store mapping info in MachineState:possible_cpus)
> 
> It only provides CLI interface to do mapping, there is no QMP
> one as I haven't found a suitable place/way to update/set mapping
> after machine_done for QEMU started with -S (stopped mode) so that
> mgmt could query hotpluggable-cpus first, then map them to numa nodes
> in runtime before actually allowing guest to run.
> 
> Another alternative I've been considering is to add CLI option
> similar to -S but that would pause initialization before machine_init()
> callback is run so that user can get CPU layout with hotpluggable-cpus,
> then map CPUs to numa nodes and unpause to let machine_init() initialize
> machine using previously predefined numa mapping.
> Such option might also be useful for other use cases.

I would support this approach. This would help on other use cases
as well, and it's what I suggested at KVM Forum last year:
http://www.linux-kvm.org/images/4/46/03x06A-Eduardo_HabkostMachine-type_Introspection_and_Configuration_Where_Are_We_Going.pdf

But I would treat it as a future plan, as it might take some time
until we refactor the main-loop/QMP code to allow this to happen.
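
The flow being proposed could look roughly like this (option and command names
below are hypothetical, for illustration only — nothing like them exists yet):

```
qemu-system-x86_64 -smp 4,sockets=4 -numa node -numa node \
    --pause-before-machine-init -qmp stdio
(QEMU) query-hotpluggable-cpus
... mgmt inspects the reported socket/core/thread ids ...
(QEMU) set-numa-cpu-mapping ...   # map each CPU to a node
(QEMU) continue-machine-init      # machine_init() runs with the mapping set
```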

-- 
Eduardo



Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Ben Warren via Qemu-devel

> On Apr 12, 2017, at 1:06 PM, Marc-André Lureau  
> wrote:
> 
> Hi
> 
> On Thu, Mar 2, 2017 at 10:22 AM Michael S. Tsirkin  wrote:
> From: Ben Warren 
> 
> This patch is based off an earlier version by
> Gal Hammer (gham...@redhat.com)
> 
> Requirements section, ASCII diagrams and overall help
> provided by Laszlo Ersek (ler...@redhat.com)
> 
> Signed-off-by: Gal Hammer 
> Signed-off-by: Ben Warren 
> Reviewed-by: Laszlo Ersek 
> Reviewed-by: Igor Mammedov 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  docs/specs/vmgenid.txt | 245 
> +
>  1 file changed, 245 insertions(+)
>  create mode 100644 docs/specs/vmgenid.txt
> 
> diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
> new file mode 100644
> index 000..aa9f518
> --- /dev/null
> +++ b/docs/specs/vmgenid.txt
> @@ -0,0 +1,245 @@
> +VIRTUAL MACHINE GENERATION ID
> +=============================
> +
> +Copyright (C) 2016 Red Hat, Inc.
> +Copyright (C) 2017 Skyport Systems, Inc.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +===
> +
> +The VM generation ID (vmgenid) device is an emulated device which
> +exposes a 128-bit, cryptographically random, integer value identifier,
> +referred to as a Globally Unique Identifier, or GUID.
> +
> +This allows management applications (e.g. libvirt) to notify the guest
> +operating system when the virtual machine is executed with a different
> +configuration (e.g. snapshot execution or creation from a template).  The
> +guest operating system notices the change, and is then able to react as
> +appropriate by marking its copies of distributed databases as dirty,
> +re-initializing its random number generator etc.
> +
> +
> +Requirements
> +------------
> +
> +These requirements are extracted from the "How to implement virtual machine
> +generation ID support in a virtualization platform" section of the
> +specification, dated August 1, 2012.
> +
> +
> +The document may be found on the web at:
> +  http://go.microsoft.com/fwlink/?LinkId=260709 
> 
> +
> +R1a. The generation ID shall live in an 8-byte aligned buffer.
> +
> +R1b. The buffer holding the generation ID shall be in guest RAM, ROM, or 
> device
> + MMIO range.
> +
> +R1c. The buffer holding the generation ID shall be kept separate from areas
> + used by the operating system.
> +
> +R1d. The buffer shall not be covered by an AddressRangeMemory or
> + AddressRangeACPI entry in the E820 or UEFI memory map.
> +
> +R1e. The generation ID shall not live in a page frame that could be mapped 
> with
> + caching disabled. (In other words, regardless of whether the generation 
> ID
> + lives in RAM, ROM or MMIO, it shall only be mapped as cacheable.)
> +
> +R2 to R5. [These AML requirements are isolated well enough in the Microsoft
> +  specification for us to simply refer to them here.]
> +
> +R6. The hypervisor shall expose a _HID (hardware identifier) object in the
> +VMGenId device's scope that is unique to the hypervisor vendor.
> +
> +
> +QEMU Implementation
> +-------------------
> +
> +The above-mentioned specification does not dictate which ACPI descriptor 
> table
> +will contain the VM Generation ID device.  Other implementations (Hyper-V and
> +Xen) put it in the main descriptor table (Differentiated System Description
> +Table or DSDT).  For ease of debugging and implementation, we have decided to
> +put it in its own Secondary System Description Table, or SSDT.
> +
> +The following is a dump of the contents from a running system:
> +
> +# iasl -p ./SSDT -d /sys/firmware/acpi/tables/SSDT
> +
> +Intel ACPI Component Architecture
> +ASL+ Optimizing Compiler version 20150717-64
> +Copyright (c) 2000 - 2015 Intel Corporation
> +
> +Reading ACPI table from file /sys/firmware/acpi/tables/SSDT - Length
> +0198 (0xC6)
> +ACPI: SSDT 0x C6 (v01 BOCHS  VMGENID  0001 BXPC
> +0001)
> +Acpi table [SSDT] successfully installed and loaded
> +Pass 1 parse of [SSDT]
> +Pass 2 parse of [SSDT]
> +Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)
> +
> +Parsing completed
> +Disassembly completed
> +ASL Output:./SSDT.dsl - 1631 bytes
> +# cat SSDT.dsl
> +/*
> + * Intel ACPI Component Architecture
> + * AML/ASL+ Disassembler version 20150717-64
> + * Copyright (c) 2000 - 2015 Intel Corporation
> + *
> 

Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Laszlo Ersek
On 04/12/17 22:06, Marc-André Lureau wrote:
> On Thu, Mar 2, 2017 at 10:22 AM Michael S. Tsirkin  wrote:

>> +Device Usage:
>> +-------------
>> +
>> +The device has one property, which may only be set using the command
>> line:
>> +
>> +  guid - sets the value of the GUID.  A special value "auto" instructs
>> + QEMU to generate a new random GUID.
>> +
>> +For example:
>> +
>> +  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
>> +  QEMU  -device vmgenid,guid=auto
>>
> 
> The default will keep uuid to null, should it be documented? Wouldn't it
> make sense to default to auto?

I guess it might.

> 
> 
>> +The property may be queried via QMP/HMP:
>> +
>> +  (QEMU) query-vm-generation-id
>> +  {"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
>> +
>> +Setting of this parameter is intentionally left out from the QMP/HMP
>> +interfaces.  There are no known use cases for changing the GUID once QEMU
>> is
>> +running, and adding this capability would greatly increase the complexity.
>>
> 
> Is this supposed to be not permitted?
> 
> { "execute": "qom-set", "arguments": { "path":
> "/machine/peripheral-anon/device[1]", "property": "guid", "value": "auto" }
> }

I don't know if qom-set can be disabled for individual devices / device
properties. Either way, setting a new GUID for VMGENID will definitely
take custom code.

> Is there any linux kernel support being worked on?

There's an RHBZ for it (alias "vmgenid-kernel"):

https://bugzilla.redhat.com/show_bug.cgi?id=1159983

(It is private because RHBZs for the kernel component are private by
default.)

Thanks
Laszlo



Re: [Qemu-devel] [PATCH v3 1/1] qga: Add 'guest-get-users' command

2017-04-12 Thread Michael Roth
Quoting Vinzenz 'evilissimo' Feenstra (2017-04-04 00:51:46)
> From: Vinzenz Feenstra 
> 
> A command that will list all currently logged in users, and the time
> since when they are logged in.
> 
> Examples:
> 
> virsh # qemu-agent-command F25 '{ "execute": "guest-get-users" }'
> {"return":[{"login-time":1490622289.903835,"user":"root"}]}
> 
> virsh # qemu-agent-command Win2k12r2 '{ "execute": "guest-get-users" }'
> {"return":[{"login-time":1490351044.670552,"domain":"LADIDA",
> "user":"Administrator"}]}
> 
> Signed-off-by: Vinzenz Feenstra 
> ---
>  configure |  2 +-
>  include/glib-compat.h |  5 +++
>  qga/commands-posix.c  | 54 +++
>  qga/commands-win32.c  | 89 
> +++
>  qga/qapi-schema.json  | 24 ++
>  5 files changed, 173 insertions(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index d1ce33b..779ebfd 100755
> --- a/configure
> +++ b/configure
> @@ -737,7 +737,7 @@ if test "$mingw32" = "yes" ; then
>sysconfdir="\${prefix}"
>local_statedir=
>confsuffix=""
> -  libs_qga="-lws2_32 -lwinmm -lpowrprof -liphlpapi -lnetapi32 $libs_qga"
> +  libs_qga="-lws2_32 -lwinmm -lpowrprof -lwtsapi32 -liphlpapi -lnetapi32 
> $libs_qga"
>  fi
> 
>  werror=""
> diff --git a/include/glib-compat.h b/include/glib-compat.h
> index 863c8cf..f8ee9dc 100644
> --- a/include/glib-compat.h
> +++ b/include/glib-compat.h
> @@ -217,6 +217,11 @@ static inline void g_hash_table_add(GHashTable 
> *hash_table, gpointer key)
>  {
>  g_hash_table_replace(hash_table, key, key);
>  }
> +
> +static gboolean g_hash_table_contains(GHashTable *hash_table, gpointer key)
> +{
> +return g_hash_table_lookup_extended(hash_table, key, NULL, NULL);
> +}
>  #endif
> 
>  #ifndef g_assert_true
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index 73d93eb..3081ee7 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "qga/guest-agent-core.h"
>  #include "qga-qmp-commands.h"
>  #include "qapi/qmp/qerror.h"
> @@ -2515,3 +2516,56 @@ void ga_command_state_init(GAState *s, GACommandState 
> *cs)
>  ga_command_state_add(cs, NULL, guest_fsfreeze_cleanup);
>  #endif
>  }
> +
> +#ifndef QGA_MICRO_SECOND_TO_SECOND
> +#   define QGA_MICRO_SECOND_TO_SECOND 100
> +#endif

I don't really understand the pattern here and elsewhere. Where else would
QGA_MICRO_SECOND_TO_SECOND ever be defined? I'd rather things break and
need to be reconciled explicitly if such a thing was ever introduced.

Everything else looks good though.

> +
> +static double ga_get_login_time(struct utmpx *user_info)
> +{
> +double seconds = (double)user_info->ut_tv.tv_sec;
> +double useconds = (double)user_info->ut_tv.tv_usec;
> +useconds /= QGA_MICRO_SECOND_TO_SECOND;
> +return seconds + useconds;
> +}
> +
> +GuestUserList *qmp_guest_get_users(Error **err)
> +{
> +GHashTable *cache = g_hash_table_new(g_str_hash, g_str_equal);
> +GuestUserList *head = NULL, *cur_item = NULL;
> +setutxent();
> +for (;;) {
> +struct utmpx *user_info = getutxent();
> +if (user_info == NULL) {
> +break;
> +} else if (user_info->ut_type != USER_PROCESS) {
> +continue;
> +} else if (g_hash_table_contains(cache, user_info->ut_user)) {
> +gpointer value = g_hash_table_lookup(cache, user_info->ut_user);
> +GuestUser *user = (GuestUser *)value;
> +double login_time = ga_get_login_time(user_info);
> +/* We're ensuring the earliest login time to be sent */
> +if (login_time < user->login_time) {
> +user->login_time = login_time;
> +}
> +continue;
> +}
> +
> +GuestUserList *item = g_new0(GuestUserList, 1);
> +item->value = g_new0(GuestUser, 1);
> +item->value->user = g_strdup(user_info->ut_user);
> +item->value->login_time = ga_get_login_time(user_info);
> +
> +g_hash_table_insert(cache, item->value->user, item->value);
> +
> +if (!cur_item) {
> +head = cur_item = item;
> +} else {
> +cur_item->next = item;
> +cur_item = item;
> +}
> +}
> +endutxent();
> +g_hash_table_destroy(cache);
> +return head;
> +}
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index 19d72b2..8b84a90 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -11,6 +11,9 @@
>   * See the COPYING file in the top-level directory.
>   */
> 
> +#ifndef _WIN32_WINNT
> +#   define _WIN32_WINNT 0x0600
> +#endif
>  #include "qemu/osdep.h"
>  #include 
>  #include 
> @@ -25,6 +28,7 @@
>  #include 
>  #endif
>  #include 
> +#include 
> 
>  #include "qga/guest-agent-core.h"
>  #include "qga/vss-win32.h"
> @@ -1536,3 +1540,88 @@ void 

Re: [Qemu-devel] vmbus bridge: machine property or device?

2017-04-12 Thread Eduardo Habkost
On Wed, Apr 12, 2017 at 05:18:51PM +0200, Markus Armbruster wrote:
> Cc'ing a few more people who might have a reasoned opinion.
> 
> Roman Kagan  writes:
> 
> > While hammering out the VMBus / storage series, we've been struggling to
> > figure out the best practices solution to the following problem:
> >
> > VMBus is provided by a vmbus bridge; it appears the most natural to have
> > it subclassed from SysBusDevice.  There can only be one VMBus in the
> > VM.
> 
> TYPE_DEVICE unless you actually need something TYPE_SYS_BUS_DEVICE
> provides.
> 
> > Now the question is how to add it to the system:
> >
> > 1) with a boolean machine property "vmbus" that would trigger the
> >creation of the VMBus bridge; its class would have
> >->cannot_instantiate_with_device_add_yet = true
> 
> This makes it an optional onboard device.  Similar ones exist already,
> e.g. various optional onboard USB host controllers controlled by machine
> property "usb".
> 
> > 2) with a regular -device option; this would require setting
> >->has_dynamic_sysbus = true for i440fx machines (q35 already have it)
> 
> This makes it a pluggable sysbus device.
> 
> I'd be tempted to leave the old i440FX to rot in peace, but your use case may
> not allow that.

I have sent an RFC some time ago that replaces the all-or-nothing
has_dynamic_sysbus flag with an explicit sysbus device whitelist,
so i440fx wouldn't be a big problem. But as you noted above, if
you don't need TYPE_SYS_BUS_DEVICE, you can just use TYPE_DEVICE.

> 
> >
> > 3) anything else
> >
> >
> > So far we went with 1) but since it's essentially the API the management
> > layer would have to use, we'd like to get it right from the beginning.
> 
> Asking for advice here is a good idea.
> 
> Anyone?
> 

I would go with (2) instead of (1): it allows more flexibility in
case the device needs additional arguments, and will
automatically benefit from (present and future) mechanisms for
reporting available device-types and buses. Asking QEMU if
"-device FOO" is supported is easy and reliable; the mechanisms
for asking QEMU about supported "-machine" options are obscure
and probably not well-tested.

-- 
Eduardo



Re: [Qemu-devel] [PULL 02/15] docs: VM Generation ID device description

2017-04-12 Thread Marc-André Lureau
Hi

On Thu, Mar 2, 2017 at 10:22 AM Michael S. Tsirkin  wrote:

> From: Ben Warren 
>
> This patch is based off an earlier version by
> Gal Hammer (gham...@redhat.com)
>
> Requirements section, ASCII diagrams and overall help
> provided by Laszlo Ersek (ler...@redhat.com)
>
> Signed-off-by: Gal Hammer 
> Signed-off-by: Ben Warren 
> Reviewed-by: Laszlo Ersek 
> Reviewed-by: Igor Mammedov 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  docs/specs/vmgenid.txt | 245
> +
>  1 file changed, 245 insertions(+)
>  create mode 100644 docs/specs/vmgenid.txt
>
> diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
> new file mode 100644
> index 000..aa9f518
> --- /dev/null
> +++ b/docs/specs/vmgenid.txt
> @@ -0,0 +1,245 @@
> +VIRTUAL MACHINE GENERATION ID
> +=
> +
> +Copyright (C) 2016 Red Hat, Inc.
> +Copyright (C) 2017 Skyport Systems, Inc.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +===
> +
> +The VM generation ID (vmgenid) device is an emulated device which
> +exposes a 128-bit, cryptographically random, integer value identifier,
> +referred to as a Globally Unique Identifier, or GUID.
> +
> +This allows management applications (e.g. libvirt) to notify the guest
> +operating system when the virtual machine is executed with a different
> +configuration (e.g. snapshot execution or creation from a template).  The
> +guest operating system notices the change, and is then able to react as
> +appropriate by marking its copies of distributed databases as dirty,
> +re-initializing its random number generator etc.
> +
> +
> +Requirements
> +
> +
> +These requirements are extracted from the "How to implement virtual
> machine
> +generation ID support in a virtualization platform" section of the
> +specification, dated August 1, 2012.
> +
> +
> +The document may be found on the web at:
> +  http://go.microsoft.com/fwlink/?LinkId=260709
> +
> +R1a. The generation ID shall live in an 8-byte aligned buffer.
> +
> +R1b. The buffer holding the generation ID shall be in guest RAM, ROM, or
> device
> + MMIO range.
> +
> +R1c. The buffer holding the generation ID shall be kept separate from
> areas
> + used by the operating system.
> +
> +R1d. The buffer shall not be covered by an AddressRangeMemory or
> + AddressRangeACPI entry in the E820 or UEFI memory map.
> +
> +R1e. The generation ID shall not live in a page frame that could be
> mapped with
> + caching disabled. (In other words, regardless of whether the
> generation ID
> + lives in RAM, ROM or MMIO, it shall only be mapped as cacheable.)
> +
> +R2 to R5. [These AML requirements are isolated well enough in the
> Microsoft
> +  specification for us to simply refer to them here.]
> +
> +R6. The hypervisor shall expose a _HID (hardware identifier) object in the
> +VMGenId device's scope that is unique to the hypervisor vendor.
> +
> +
> +QEMU Implementation
> +---
> +
> +The above-mentioned specification does not dictate which ACPI descriptor
> table
> +will contain the VM Generation ID device.  Other implementations (Hyper-V
> and
> +Xen) put it in the main descriptor table (Differentiated System
> Description
> +Table or DSDT).  For ease of debugging and implementation, we have
> decided to
> +put it in its own Secondary System Description Table, or SSDT.
> +
> +The following is a dump of the contents from a running system:
> +
> +# iasl -p ./SSDT -d /sys/firmware/acpi/tables/SSDT
> +
> +Intel ACPI Component Architecture
> +ASL+ Optimizing Compiler version 20150717-64
> +Copyright (c) 2000 - 2015 Intel Corporation
> +
> +Reading ACPI table from file /sys/firmware/acpi/tables/SSDT - Length
> +0198 (0xC6)
> +ACPI: SSDT 0x C6 (v01 BOCHS  VMGENID  0001 BXPC
> +0001)
> +Acpi table [SSDT] successfully installed and loaded
> +Pass 1 parse of [SSDT]
> +Pass 2 parse of [SSDT]
> +Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)
> +
> +Parsing completed
> +Disassembly completed
> +ASL Output:./SSDT.dsl - 1631 bytes
> +# cat SSDT.dsl
> +/*
> + * Intel ACPI Component Architecture
> + * AML/ASL+ Disassembler version 20150717-64
> + * Copyright (c) 2000 - 2015 Intel Corporation
> + *
> + * Disassembling to symbolic ASL+ operators
> + *
> + * Disassembly of /sys/firmware/acpi/tables/SSDT, Sun Feb  5 00:19:37 2017
> + *
> + * Original Table Header:
> + * Signature"SSDT"
> + * Length   0x00CA (202)
> + * Revision 0x01
> + * Checksum 0x4B
> + * OEM ID   "BOCHS "
> + * OEM Table ID "VMGENID"
> + * OEM Revision 0x0001 (1)
> + * 

Re: [Qemu-devel] [PATCH v3] qemu-ga: add guest-get-osrelease command

2017-04-12 Thread Michael Roth
On 04/03/2017 10:17 AM, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Mar 31, 2017 at 3:41 PM Eric Blake  wrote:
> 
>> On 03/31/2017 05:19 AM, Vinzenz 'evilissimo' Feenstra wrote:
>>> From: Vinzenz Feenstra 
>>>
>>> Add a new 'guest-get-osrelease' command to report OS information in
>>> the
>>> os-release format. As documented here:
>>> https://www.freedesktop.org/software/systemd/man/os-release.html
>>>
>>> The win32 implementation generates the information.
>>> On POSIX systems the /etc/os-release or /usr/lib/os-release files
>>> content is returned when available and gets extended with the
>>> fields:
>>> - QGA_UNAME_RELEASE which is the content of `uname -r`
>>> - QGA_UNAME_VERSION which is the content of `uname -v`
>>> - QGA_UNAME_MACHINE which is the content of `uname -m`
>>>
>>> Here an example for a Fedora 25 VM:
>>>
>>> virsh # qemu-agent-command F25 '{ "execute": "guest-get-osrelease"
>>> }'
>>> {"return":{"content":"NAME=Fedora\nVERSION=\"25 (Server Edition)\"\n
>>> ID=fedora\nVERSION_ID=25\nPRETTY_NAME=\"Fedora 25 (Server
>>> Edition)\"\n
>>> ANSI_COLOR=\"0;34\"\nCPE_NAME=\"cpe:/o:fedoraproject:fedora:25\"\n
>>> HOME_URL=\"https://fedoraproject.org/\"\n
>>> BUG_REPORT_URL=\"https://bugzilla.redhat.com/\"\n
>>> REDHAT_BUGZILLA_PRODUCT=\"Fedora\"\n
>>> REDHAT_BUGZILLA_PRODUCT_VERSION=25\nREDHAT_SUPPORT_PRODUCT=\"Fedora\"\n
>>> REDHAT_SUPPORT_PRODUCT_VERSION=25\n
>>> PRIVACY_POLICY_URL=https://fedoraproject.org/wiki/Legal:PrivacyPolicy\n
>>> VARIANT=\"Server Edition\"\nVARIANT_ID=server\n\n
>>> QGA_UNAME_RELEASE=\"4.8.6-300.fc25.x86_64\"\n
>>> QGA_UNAME_VERSION=\"#1 SMP Tue Nov 1 12:36:38 UTC 2016\"\n
>>> QGA_UNAME_MACHINE=\"x86_64\"\n"}}
>>
>> Uggh. This is a step backwards.  Now you are requiring the end user
>> to
>> parse a raw string, instead of giving them the information already
>> broken out as a JSON dictionary.
>>
> 
> yes otoh, it uses an existing standard to retrieve various guest OS
> release information, which existing tools may know how to handle.
> 
> (I feel partially guilty about it since I suggested it, mainly in a
> discussion over irc and Vinzenz adopted it)
> 
> The format is fairly straightforward to parse, but perhaps it should
> be
> sent as a JSON dict instead? However, that would mean that the list of
> keys is limited by what the QGA protocol defines, and the agent would
> have to parse the file itself. And we would have to duplicate the
> documentation etc.
> 
> I would rely on the XDG format instead, given its simplicity,
> extensibility
> and documentation that fits the job nicely imho.

I like the idea of using an existing standard, but if they really want
to get at a raw dump of /etc/os-release to use with existing tools then
I think guest-file-open/read are the more appropriate interfaces.

Knowing that they *can* get at information like that for a particular
guest, or do things like execute 'uname -m' via guest-exec, is where I
think an interface like this has its place.

So I think a more curated/limited set of identifiers is sufficient, and
still flexible enough to enable more OS-specific use-cases.

But I also don't like the idea of re-defining what terms like
"version_id", "variant", "variant_id", etc mean, so I think it's still
a good idea to use the os-release-documented fields as the basis for
the fields we decide to return in our dictionary, and note that
explicitly in the schema documentation.




Re: [Qemu-devel] [PATCH] Add 'none' as type for drive's if option

2017-04-12 Thread Craig Jellick
Is there an action I need to take for that?


From: Markus Armbruster 
Sent: Tuesday, April 11, 2017 6:17:10 AM
To: Craig Jellick
Cc: qemu-devel@nongnu.org; qemu-triv...@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] Add 'none' as type for drive's if option

Perhaps this can go via qemu-trivial.

Craig Jellick  writes:

> Signed-off-by: Craig Jellick 
> ---
>  qemu-options.hx | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 99af8ed..8291e64 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -591,7 +591,7 @@ Special files such as iSCSI devices can be specified 
> using protocol
>  specific URLs. See the section for "Device URL Syntax" for more information.
>  @item if=@var{interface}
>  This option defines on which type on interface the drive is connected.
> -Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
> +Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio, none.
>  @item bus=@var{bus},unit=@var{unit}
>  These options define where is connected the drive by defining the bus number 
> and
>  the unit id.


[Qemu-devel] First contribution - Interested in Outreachy

2017-04-12 Thread Prerna Garg
Hello,


This is my first patch submission. I am interested in the block filter project 
for this round of Outreachy.


diff --git a/backends/hostmem.c b/backends/hostmem.c
index 89feb9e..f056a25 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -263,7 +263,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
 uint64_t sz;

 if (bc->alloc) {
-bc->alloc(backend, &local_err);
+bc->g_alloc(backend, &local_err);
 if (local_err) {
 goto out;
 }




Re: [Qemu-devel] WinDbg module

2017-04-12 Thread Denis V. Lunev
On 04/12/2017 05:05 PM, Mihail Abakumov wrote:
> Hello.
>
> We made the debugger module WinDbg (like GDB) for QEMU. This is the
> replacement of the remote stub in Windows kernel. Used for remote
> Windows kernel debugging without debugging mode.
>
> The latest build and instructions for the launch can be found here:
> https://github.com/ispras/qemu/releases/tag/v2.7.50-windbg
>
Currently only one way to create a remote debugging connection is
supported: using a COM port with a named pipe.
>
> Should I prepare patches for inclusion in the master branch? Or is it
> too specific module and it is not needed?
>
this would indeed be useful. We have a lot of problems
with Windows debugging. In general, Windows behaviour
with and without a debugger is different.

Den



Re: [Qemu-devel] EXT :Re: Emulating external registers

2017-04-12 Thread Wu, Michael Y [US] (MS)
Thank you for the suggestions. I found out that the issue was with how the bare 
metal program configured memory.
The program is now able to read and write from a pointer.

I included your unimplemented device into the g3 machine (mac_oldworld.c) I am 
emulating using the following code.
create_unimplemented_device("unimp",0x0340,(4 << 20));

When I set a pointer to 0x0340, I can read and write to that location. 
However I have no outputs from the read and write callbacks. I have the '-d 
unimp' set in the command arguments as well.

I added the create_unimplemented_device at the end of the function 
ppc_heathrow_init (mac_oldworld.c). Is my usage correct in adding this device 
to an existing machine?
In unimp.h, the comments mention that the priority is at -1000. Does that mean
I can set an unimplemented device to a huge address range to pick up any
reads/writes?
 
-Original Message-
From: Peter Maydell [mailto:peter.mayd...@linaro.org] 
Sent: Thursday, April 06, 2017 10:38 AM
To: Wu, Michael Y [US] (MS)
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] EXT :Re: Emulating external registers

On 6 April 2017 at 18:23, Wu, Michael Y [US] (MS)  wrote:
> I changed my code to the following and added the appropriate read,write 
> functions and a MemoryRegionOps.
> memory_region_init_io(reg_memory, NULL, _ops, reg,
>   "reg_mem", 0x0040); //set to 64 bytes
> memory_region_add_subregion(sysmem, 0xFC00, reg_memory);
>
> For the read function I just returned a zero. So if I were to read from the 
> address 0xFC00 it should return a value of 0? The current issue I am 
> having is that gdb hangs when the pointer is accessed. I am starting to think 
> my bare-metal program is incorrect. I also added log messages in my read and 
> write functions. The read function was not accessed.

You'll probably find that what has happened is that your program has taken an 
exception which you don't have an exception handler installed for, and has then 
jumped off into nowhere or gone into an infinite loop of taking exceptions. 
(Probably hitting ^c in gdb will break into wherever it is.) It's a good idea 
in bare metal test code to write at least a minimal set of exception handlers 
that can print a message when an unexpected exception occurs and stop, so you 
don't get too confused.

You might also want to investigate QEMU's tracing:
-d in_asm,exec,cpu,int,nochain -D debug.log will write tracing to the debug.log 
file (a lot of it, and this set of trace flags slows down execution a lot, but 
if you're doing very small bare metal tests that should be ok). This can help 
in figuring out what your test program is doing. (Watch out that the in_asm 
shows when we first encounter a block of code, but if we execute the same bit 
of code a second time we'll only print the exec and cpu parts of the logging.)

thanks
-- PMM



Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-12 Thread Eric Blake
On 04/12/2017 12:55 PM, Denis V. Lunev wrote:
> Let me rephrase a bit.
> 
> The proposal is looking very close to the following case:
> - raw sparse file
> 
> In this case all writes are very-very-very fast and from the
> guest point of view all is OK. Sequential data is really sequential.
> Though once we start to perform any sequential IO, we
> have real pain. Each sequential operation becomes random
> on the host file system and the IO becomes very slow. This
> will not be observed with the test, but the performance will
> degrade very soon.
> 
> This is why raw sparse files are not used in the real life.
> Hypervisor must maintain guest OS invariants and the data,
> which is nearby from the guest point of view should be kept
> nearby in host.
> 
> This is why 64kb data blocks are actually extremely
> small :) OK, this is offtopic.

Not necessarily. Using subclusters may allow you to ramp up to larger
cluster sizes. We can also set up our allocation (and pre-allocation
schemes) so that we always reserve an entire cluster on the host at the
time we allocate the cluster, even if we only plan to write to
particular subclusters within that cluster.  In fact, 32 subclusters to
a 2M cluster results in 64k subclusters, where you are still writing at
64k data chunks but could now have guaranteed 2M locality, compared to
the current qcow2 with 64k clusters that writes in 64k data chunks but
with no locality.

Just because we don't write the entire cluster up front does not mean
that we don't have to allocate (or have a mode that allocates) the
entire cluster at the time of the first subcluster use.

> 
> One can easily recreate this case using the following simple
> test:
> - write each even 4kb page of the disk, one by one
> - write each odd 4 kb page of the disk
> - run sequential read with f.e. 1 MB data block
> 
> Normally we should still have native performance, but
> with raw sparse files and (as far as I understand the
> proposal) sub-clusters we will have the host IO pattern
> exactly like random.

Only if we don't pre-allocate entire clusters at the point that we first
touch the cluster.

> 
> This seems like a big and inevitable problem of the approach
> for me. We still have the potential to improve current
> algorithms and not introduce non-compatible changes.
> 
> Sorry if this is too emotional. We have learned above in a
> very hard way.

And your experience is useful, as a way to fine-tune this proposal.  But
it doesn't mean we should entirely ditch this proposal.  I also
appreciate that you have patches in the works to reduce bottlenecks
(such as turning sub-cluster writes into 3 IOPs rather than 5, by doing
read-head, read-tail, write-cluster, instead of the current read-head,
write-head, write-body, read-tail, write-tail), but think that both
approaches are complementary, not orthogonal.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-12 Thread Denis V. Lunev
On 04/06/2017 06:01 PM, Alberto Garcia wrote:
> Hi all,
>
> over the past couple of months I discussed with some of you the
> possibility to extend the qcow2 format in order to improve its
> performance and reduce its memory requirements (particularly with very
> large images).
>
> After some discussion in the mailing list and the #qemu IRC channel I
> decided to write a prototype of a new extension for qcow2 so I could
> understand better the scope of the changes and have some preliminary
> data about its effects.
>
> This e-mail is the formal presentation of my proposal to extend the
> on-disk qcow2 format. As you can see this is still an RFC. Due to the
> nature of the changes I would like to get as much feedback as possible
> before going forward.
>
> === Problem ===
>
> The original problem that I wanted to address is the memory
> requirements of qcow2 files if you want to have very large images and
> still keep good I/O performance. This is a consequence of its very
> structure, which I'm going to describe now.
>
> A qcow2 image is divided into units of constant size called clusters,
> and among other things it contains metadata that maps guest addresses
> to host addresses (the so-called L1 and L2 tables).
>
> There are two basic problems that result from this:
>
> 1) Reading from or writing to a qcow2 image involves reading the
>corresponding entry on the L2 table that maps the guest address to
>the host address. This is very slow because it involves two I/O
>operations: one on the L2 table and the other one on the actual
>data cluster.
>
> 2) A cluster is the smallest unit of allocation. Therefore writing a
>mere 512 bytes to an empty disk requires allocating a complete
>cluster and filling it with zeroes (or with data from the backing
>image if there is one). This wastes more disk space and also has a
>negative impact on I/O.
>
> Problem (1) can be solved by keeping in memory a cache of the L2
> tables (QEMU has an "l2_cache_size" parameter for this purpose). The
> amount of disk space used by L2 tables depends on two factors: the
> disk size and the cluster size.
>
> The cluster size can be configured when the image is created, and it
> can be any power of two between 512 bytes and 2 MB (it defaults to 64
> KB).
>
> The maximum amount of space needed for the L2 tables can be calculated
> with the following formula:
>
>max_l2_size = virtual_disk_size * 8 / cluster_size
>
> Large images require a large amount of metadata, and therefore a large
> amount of memory for the L2 cache. With the default cluster size
> (64KB) that's 128MB of L2 cache for a 1TB qcow2 image.
>
> The only way to reduce the size of the L2 tables is therefore
> increasing the cluster size, but this makes the image less efficient
> as described earlier in (2).
>
> === The proposal ===
>
> The idea of this proposal is to extend the qcow2 format by allowing
> subcluster allocation. There would be an optional feature that would
> need to be enabled when creating the image. The on-disk format would
> remain essentially the same, except that each data cluster would be
> internally divided into a number of subclusters of equal size.
>
> What this means in practice is that each entry on an L2 table would be
> accompanied by a bitmap indicating the allocation state of each one of
> the subclusters for that cluster. There are several alternatives for
> storing the bitmap, described below.
>
> Other than L2 entries, all other data structures would remain
> unchanged, but for data clusters the smallest unit of allocation would
> now be the subcluster. Reference counting would still be at the
> cluster level, because there's no way to reference individual
> subclusters. Copy-on-write on internal snapshots would need to copy
> complete clusters, so that scenario would not benefit from this
> change.
>
> I see two main use cases for this feature:
>
> a) The qcow2 image is not too large / the L2 cache is not a problem,
>but you want to increase the allocation performance. In this case
>you can have something like a 128KB cluster with 4KB subclusters
>(with 4KB being a common block size in ext4 and other filesystems)
>
> b) The qcow2 image is very large and you want to save metadata space
>in order to have a smaller L2 cache. In this case you can go for
>the maximum cluster size (2MB) but you want to have smaller
>subclusters to increase the allocation performance and optimize the
>disk usage. This was actually my original use case.
>
> === Test results ===
>
> I have a basic working prototype of this. It's still incomplete -and
> buggy :)- but it gives an idea of what we can expect from it. In my
> implementation each data cluster has 8 subclusters, but that's not set
> in stone (see below).
>
> I made all tests on an SSD drive, writing to an empty qcow2 image with
> a fully populated 40GB backing image, performing random writes using
> fio with a block size of 4KB.
>
> I tried 

Re: [Qemu-devel] [PATCH v4 4/4] qemu-img: copy *key-secret opts when opening newly created files

2017-04-12 Thread Eric Blake
On 04/12/2017 11:44 AM, Daniel P. Berrange wrote:
> The qemu-img dd/convert commands will create a image file and
> then try to open it. Historically it has been possible to open
> new files without passing any options. With encrypted files
> though, the *key-secret options are mandatory, so we need to
> provide those options when opening the newly created file.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  qemu-img.c | 41 +++--
>  1 file changed, 35 insertions(+), 6 deletions(-)

Reviewed-by: Eric Blake 


> @@ -332,6 +334,33 @@ static BlockBackend *img_open_file(const char *filename,
>  }
>  
>  
> +static int img_add_key_secrets(void *opaque,
> +   const char *name, const char *value,
> +   Error **errp)
> +{
> +QDict *options = opaque;
> +
> +if (g_str_has_suffix(name, "key-secret")) {
> +qdict_put(options, name, qstring_from_str(value));

If my patch to add qdict_put_str() lands (probably through Markus'
qapi-next tree) before yours, you can simplify this line. If yours lands
first, we just rerun my Coccinelle script to simplify it as part of my
patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH 11/12] dirty-bitmap: Switch bdrv_set_dirty() to bytes

2017-04-12 Thread Eric Blake
Both callers already had bytes available, but were scaling to
sectors.  Move the scaling to internal code.  In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as the dirty bitmap code widens start/bytes to granularity
boundaries.

Signed-off-by: Eric Blake 
---
 include/block/block_int.h | 2 +-
 block/dirty-bitmap.c  | 8 +---
 block/io.c| 6 ++
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 08063c1..0b737fd 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -917,7 +917,7 @@ void blk_dev_eject_request(BlockBackend *blk, bool force);
 bool blk_dev_is_tray_open(BlockBackend *blk);
 bool blk_dev_is_medium_locked(BlockBackend *blk);

-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int64_t nr_sect);
+void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes);
 bool bdrv_requests_pending(BlockDriverState *bs);

 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 8e7822c..ef165eb 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -478,15 +478,17 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap 
*bitmap)
 hbitmap_deserialize_finish(bitmap->bitmap);
 }

-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
-int64_t nr_sectors)
+void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
 BdrvDirtyBitmap *bitmap;
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
 if (!bdrv_dirty_bitmap_enabled(bitmap)) {
 continue;
 }
-hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
+end_sector - (offset >> BDRV_SECTOR_BITS));
 }
 }

diff --git a/block/io.c b/block/io.c
index 9218329..d22d35f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1328,7 +1328,6 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 bool waited;
 int ret;

-int64_t start_sector = offset >> BDRV_SECTOR_BITS;
 int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
 uint64_t bytes_remaining = bytes;
 int max_transfer;
@@ -1407,7 +1406,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);

 ++bs->write_gen;
-bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
+bdrv_set_dirty(bs, offset, bytes);

 if (bs->wr_highest_offset < offset + bytes) {
 bs->wr_highest_offset = offset + bytes;
@@ -2535,8 +2534,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, 
int64_t offset,
 ret = 0;
 out:
 ++bs->write_gen;
-bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
-   req.bytes >> BDRV_SECTOR_BITS);
+bdrv_set_dirty(bs, req.offset, req.bytes);
 tracked_request_end(&req);
 bdrv_dec_in_flight(bs);
 return ret;
-- 
2.9.3




[Qemu-devel] [PATCH 10/12] mirror: Switch mirror_dirty_init() to byte-based iteration

2017-04-12 Thread Eric Blake
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake 
---
 block/mirror.c | 35 ++-
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 21b4f5d..846e392 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -601,15 +601,13 @@ static void mirror_throttle(MirrorBlockJob *s)

 static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
-int64_t sector_num, end;
+int64_t offset;
 BlockDriverState *base = s->base;
 BlockDriverState *bs = s->source;
 BlockDriverState *target_bs = blk_bs(s->target);
-int ret, n;
+int ret;
 int64_t count;

-end = s->bdev_length / BDRV_SECTOR_SIZE;
-
 if (base == NULL && !bdrv_has_zero_init(target_bs)) {
 if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
 bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
@@ -617,9 +615,9 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 }

 s->initial_zeroing_ongoing = true;
-for (sector_num = 0; sector_num < end; ) {
-int nb_sectors = MIN(end - sector_num,
-QEMU_ALIGN_DOWN(INT_MAX, s->granularity) >> BDRV_SECTOR_BITS);
+for (offset = 0; offset < s->bdev_length; ) {
+int bytes = MIN(s->bdev_length - offset,
+QEMU_ALIGN_DOWN(INT_MAX, s->granularity));

 mirror_throttle(s);

@@ -635,9 +633,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 continue;
 }

-mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
-  nb_sectors * BDRV_SECTOR_SIZE, false);
-sector_num += nb_sectors;
+mirror_do_zero_or_discard(s, offset, bytes, false);
+offset += bytes;
 }

 mirror_wait_for_all_io(s);
@@ -645,10 +642,10 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
*s)
 }

 /* First part, loop on the sectors and initialize the dirty bitmap.  */
-for (sector_num = 0; sector_num < end; ) {
+for (offset = 0; offset < s->bdev_length; ) {
 /* Just to make sure we are not exceeding int limit. */
-int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
- end - sector_num);
+int bytes = MIN(s->bdev_length - offset,
+QEMU_ALIGN_DOWN(INT_MAX, s->granularity));

 mirror_throttle(s);

@@ -656,20 +653,16 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
*s)
 return 0;
 }

-ret = bdrv_is_allocated_above(bs, base, sector_num * BDRV_SECTOR_SIZE,
-  nb_sectors * BDRV_SECTOR_SIZE, );
+ret = bdrv_is_allocated_above(bs, base, offset, bytes, );
 if (ret < 0) {
 return ret;
 }

-n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
-assert(n > 0);
+count = QEMU_ALIGN_UP(count, BDRV_SECTOR_SIZE);
 if (ret == 1) {
-bdrv_set_dirty_bitmap(s->dirty_bitmap,
-  sector_num * BDRV_SECTOR_SIZE,
-  n * BDRV_SECTOR_SIZE);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, count);
 }
-sector_num += n;
+offset += count;
 }
 return 0;
 }
-- 
2.9.3




[Qemu-devel] [PATCH 08/12] dirty-bitmap: Change bdrv_get_dirty() to take bytes

2017-04-12 Thread Eric Blake
Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration.  Both
callers were already using the result as a bool, so make that
explicit.  Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.

Signed-off-by: Eric Blake 
---
 include/block/dirty-bitmap.h | 4 ++--
 block/dirty-bitmap.c | 8 
 block/mirror.c   | 3 +--
 migration/block.c| 3 +--
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index efcec60..b8434e5 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -34,8 +34,8 @@ bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
 DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
-int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
-   int64_t sector);
+bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+int64_t offset);
 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index e3c2e34..c8100d2 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -332,13 +332,13 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 return list;
 }

-int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
-   int64_t sector)
+bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+int64_t offset)
 {
 if (bitmap) {
-return hbitmap_get(bitmap->bitmap, sector);
+return hbitmap_get(bitmap->bitmap, offset >> BDRV_SECTOR_BITS);
 } else {
-return 0;
+return false;
 }
 }

diff --git a/block/mirror.c b/block/mirror.c
index 1b98a77..1e2e655 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -359,8 +359,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 int64_t next_offset = offset + nb_chunks * s->granularity;
 int64_t next_chunk = next_offset / s->granularity;
 if (next_offset >= s->bdev_length ||
-!bdrv_get_dirty(source, s->dirty_bitmap,
-next_offset >> BDRV_SECTOR_BITS)) {
+!bdrv_get_dirty(source, s->dirty_bitmap, next_offset)) {
 break;
 }
 if (test_bit(next_chunk, s->in_flight_bitmap)) {
diff --git a/migration/block.c b/migration/block.c
index 3daa5c7..9e21aeb 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -537,8 +537,7 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
 } else {
 blk_mig_unlock();
 }
-if (bdrv_get_dirty(bs, bmds->dirty_bitmap, sector)) {
-
+if (bdrv_get_dirty(bs, bmds->dirty_bitmap, sector * BDRV_SECTOR_SIZE)) {
 if (total_sectors - sector < BDRV_SECTORS_PER_DIRTY_CHUNK) {
 nr_sectors = total_sectors - sector;
 } else {
-- 
2.9.3




Re: [Qemu-devel] WinDbg module

2017-04-12 Thread Stefan Weil

On 12.04.2017 at 18:30, Roman Kagan wrote:

On Wed, Apr 12, 2017 at 05:05:45PM +0300, Mihail Abakumov wrote:

Hello.

We made the debugger module WinDbg (like GDB) for QEMU. This is the
replacement of the remote stub in Windows kernel. Used for remote Windows
kernel debugging without debugging mode.

The latest build and instructions for the launch can be found here:
https://github.com/ispras/qemu/releases/tag/v2.7.50-windbg

Currently only one way to create a remote debugging connection is
supported: using a COM port with a named pipe.

Should I prepare patches for inclusion in the master branch? Or is the
module too specific to be needed?


Please do!

Every once in a while, when dealing with a Windows guest problem, I
wished I had this.  We at Virtuozzo looked into doing something like
this but never got around to it.

Thanks,
Roman.



Hello Mihail,

from the previous answers you can see that there is a
need for WinDbg support.

If you want to prepare patches for the official QEMU,
you need Signed-off-by lines from all developers who contributed
(Dmitry Koltunov and you?).

Also make sure that no license-encumbered code (for example, extracts
from MS header files) gets added.

See http://wiki.qemu-project.org/Contribute/SubmitAPatch for
more hints.

As far as I saw, WinDbg support adds several thousand lines
of code. Are you willing to maintain that code after it has been
included in QEMU?

Cheers
Stefan




[Qemu-devel] [PATCH 09/12] dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes

2017-04-12 Thread Eric Blake
Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere.  The change will
also make it easier to rewrite the hold-out callers to use bytes
rather than sectors for their iterations, and it makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap.  Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.

Signed-off-by: Eric Blake 
---
 include/block/dirty-bitmap.h |  4 ++--
 block/dirty-bitmap.c | 14 ++
 block/mirror.c   | 16 
 migration/block.c|  7 +--
 4 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index b8434e5..fdff1e2 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -37,9 +37,9 @@ DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
 bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
 int64_t offset);
 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-   int64_t cur_sector, int64_t nr_sectors);
+   int64_t offset, int64_t bytes);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
- int64_t cur_sector, int64_t nr_sectors);
+ int64_t offset, int64_t bytes);
 BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index c8100d2..8e7822c 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -401,17 +401,23 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 }

 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-   int64_t cur_sector, int64_t nr_sectors)
+   int64_t offset, int64_t bytes)
 {
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+
 assert(bdrv_dirty_bitmap_enabled(bitmap));
-hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
+end_sector - (offset >> BDRV_SECTOR_BITS));
 }

 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
- int64_t cur_sector, int64_t nr_sectors)
+ int64_t offset, int64_t bytes)
 {
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+
 assert(bdrv_dirty_bitmap_enabled(bitmap));
-hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
+hbitmap_reset(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
+  end_sector - (offset >> BDRV_SECTOR_BITS));
 }

 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
diff --git a/block/mirror.c b/block/mirror.c
index 1e2e655..21b4f5d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -141,8 +141,7 @@ static void mirror_write_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;

-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
-  op->bytes >> BDRV_SECTOR_BITS);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
 action = mirror_error_action(s, false, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -161,8 +160,7 @@ static void mirror_read_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;

-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
-  op->bytes >> BDRV_SECTOR_BITS);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
 action = mirror_error_action(s, true, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -380,8 +378,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
  * calling bdrv_get_block_status_above could yield - if some blocks are
  * marked dirty in this window, we need to know.
  */
-bdrv_reset_dirty_bitmap(s->dirty_bitmap, offset >> BDRV_SECTOR_BITS,
-nb_chunks * sectors_per_chunk);
+bdrv_reset_dirty_bitmap(s->dirty_bitmap, offset,
+nb_chunks * s->granularity);
 bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
 while (nb_chunks > 0 && offset < s->bdev_length) {
 int64_t ret;
@@ -614,7 +612,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)

 if (base == NULL && !bdrv_has_zero_init(target_bs)) {
 if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
-

[Qemu-devel] [PATCH 05/12] dirty-bitmap: Set iterator start by offset, not sector

2017-04-12 Thread Eric Blake
All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, so drop that parameter.

All callers to bdrv_set_dirty_iter() were scaling an offset to
a sector number; move the scaling to occur internally to dirty
bitmap code instead.

Signed-off-by: Eric Blake 
---
 include/block/dirty-bitmap.h | 5 ++---
 block/backup.c   | 5 ++---
 block/dirty-bitmap.c | 9 -
 block/mirror.c   | 4 ++--
 4 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index a83979d..efcec60 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -41,11 +41,10 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t cur_sector, int64_t nr_sectors);
 BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
-BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
- uint64_t first_sector);
+BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
-void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t sector_num);
+void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
 int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
diff --git a/block/backup.c b/block/backup.c
index 63ca208..efa4896 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -372,7 +372,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)

 granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
 clusters_per_iter = MAX((granularity / job->cluster_size), 1);
-dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
+dbi = bdrv_dirty_iter_new(job->sync_bitmap);

 /* Find the next dirty sector(s) */
 while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
@@ -403,8 +403,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 /* If the bitmap granularity is smaller than the backup granularity,
  * we need to advance the iterator pointer to the next cluster. */
 if (granularity < job->cluster_size) {
-bdrv_set_dirty_iter(dbi,
-cluster * job->cluster_size / BDRV_SECTOR_SIZE);
+bdrv_set_dirty_iter(dbi, cluster * job->cluster_size);
 }

 last_cluster = cluster - 1;
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index a413df1..3fb4871 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -367,11 +367,10 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
 return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
 }

-BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
- uint64_t first_sector)
+BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap)
 {
 BdrvDirtyBitmapIter *iter = g_new(BdrvDirtyBitmapIter, 1);
-hbitmap_iter_init(>hbi, bitmap->bitmap, first_sector);
+hbitmap_iter_init(>hbi, bitmap->bitmap, 0);
 iter->bitmap = bitmap;
 bitmap->active_iterators++;
 return iter;
@@ -488,9 +487,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
 /**
  * Advance a BdrvDirtyBitmapIter to an arbitrary offset.
  */
-void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t sector_num)
+void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t offset)
 {
-hbitmap_iter_init(>hbi, iter->hbi.hb, sector_num);
+hbitmap_iter_init(>hbi, iter->hbi.hb, offset >> BDRV_SECTOR_BITS);
 }

 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
diff --git a/block/mirror.c b/block/mirror.c
index c92335a..7c1d6bf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -370,7 +370,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
 if (next_dirty > next_offset || next_dirty < 0) {
 /* The bitmap iterator's cache is stale, refresh it */
-bdrv_set_dirty_iter(s->dbi, next_offset >> BDRV_SECTOR_BITS);
+bdrv_set_dirty_iter(s->dbi, next_offset);
 next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
 }
 assert(next_dirty == next_offset);
@@ -779,7 +779,7 @@ static void coroutine_fn mirror_run(void *opaque)
 }

 assert(!s->dbi);
-s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap, 0);
+s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
 for (;;) {
 uint64_t delay_ns = 0;
 int64_t cnt, delta;
-- 
2.9.3




[Qemu-devel] [PATCH 07/12] dirty-bitmap: Change bdrv_get_dirty_count() to report bytes

2017-04-12 Thread Eric Blake
Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake 
---
 block/dirty-bitmap.c |  4 ++--
 block/mirror.c   | 13 +
 migration/block.c|  2 +-
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 2f9f554..e3c2e34 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -319,7 +319,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 QLIST_FOREACH(bm, >dirty_bitmaps, list) {
 BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
 BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
-info->count = bdrv_get_dirty_count(bm) << BDRV_SECTOR_BITS;
+info->count = bdrv_get_dirty_count(bm);
 info->granularity = bdrv_dirty_bitmap_granularity(bm);
 info->has_name = !!bm->name;
 info->name = g_strdup(bm->name);
@@ -494,7 +494,7 @@ void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t offset)

 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
 {
-return hbitmap_count(bitmap->bitmap);
+return hbitmap_count(bitmap->bitmap) << BDRV_SECTOR_BITS;
 }

 int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap)
diff --git a/block/mirror.c b/block/mirror.c
index f404ff3..1b98a77 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -794,11 +794,10 @@ static void coroutine_fn mirror_run(void *opaque)

 cnt = bdrv_get_dirty_count(s->dirty_bitmap);
 /* s->common.offset contains the number of bytes already processed so
- * far, cnt is the number of dirty sectors remaining and
+ * far, cnt is the number of dirty bytes remaining and
  * s->bytes_in_flight is the number of bytes currently being
  * processed; together those are the current total operation length */
-s->common.len = s->common.offset + s->bytes_in_flight +
-cnt * BDRV_SECTOR_SIZE;
+s->common.len = s->common.offset + s->bytes_in_flight + cnt;

 /* Note that even when no rate limit is applied we need to yield
  * periodically with no pending I/O so that bdrv_drain_all() returns.
@@ -810,8 +809,7 @@ static void coroutine_fn mirror_run(void *opaque)
 s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
 if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
 (cnt == 0 && s->in_flight > 0)) {
-trace_mirror_yield(s, cnt * BDRV_SECTOR_SIZE,
-   s->buf_free_count, s->in_flight);
+trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
 mirror_wait_for_io(s);
 continue;
 } else if (cnt != 0) {
@@ -852,7 +850,7 @@ static void coroutine_fn mirror_run(void *opaque)
  * whether to switch to target check one last time if I/O has
  * come in the meanwhile, and if not flush the data to disk.
  */
-trace_mirror_before_drain(s, cnt * BDRV_SECTOR_SIZE);
+trace_mirror_before_drain(s, cnt);

 bdrv_drained_begin(bs);
 cnt = bdrv_get_dirty_count(s->dirty_bitmap);
@@ -871,8 +869,7 @@ static void coroutine_fn mirror_run(void *opaque)
 }

 ret = 0;
-trace_mirror_before_sleep(s, cnt * BDRV_SECTOR_SIZE,
-  s->synced, delay_ns);
+trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
 if (!s->synced) {
 block_job_sleep_ns(>common, QEMU_CLOCK_REALTIME, delay_ns);
 if (block_job_is_cancelled(>common)) {
diff --git a/migration/block.c b/migration/block.c
index 9a9c214..3daa5c7 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -672,7 +672,7 @@ static int64_t get_remaining_dirty(void)
 aio_context_release(blk_get_aio_context(bmds->blk));
 }

-return dirty << BDRV_SECTOR_BITS;
+return dirty;
 }

 /* Called with iothread lock taken.  */
-- 
2.9.3




[Qemu-devel] [PATCH 03/12] dirty-bitmap: Drop unused functions

2017-04-12 Thread Eric Blake
We had several functions that no one was using, and which used
sector-based interfaces.  I'm trying to convert towards byte-based
interfaces, so it's easier to just drop the unused functions:

bdrv_dirty_bitmap_size
bdrv_dirty_bitmap_get_meta
bdrv_dirty_bitmap_reset_meta
bdrv_dirty_bitmap_meta_granularity

Signed-off-by: Eric Blake 
---
 include/block/dirty-bitmap.h |  8 
 block/dirty-bitmap.c | 34 --
 2 files changed, 42 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 9dea14b..a83979d 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -30,11 +30,9 @@ void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
 uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
 uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
-uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
-int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap);
 DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
int64_t sector);
@@ -42,12 +40,6 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t cur_sector, int64_t nr_sectors);
-int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
-   BdrvDirtyBitmap *bitmap, int64_t sector,
-   int nb_sectors);
-void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
-  BdrvDirtyBitmap *bitmap, int64_t sector,
-  int nb_sectors);
 BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
  uint64_t first_sector);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 6d8ce5f..32698d5 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -130,35 +130,6 @@ void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 bitmap->meta = NULL;
 }

-int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
-   BdrvDirtyBitmap *bitmap, int64_t sector,
-   int nb_sectors)
-{
-uint64_t i;
-int sectors_per_bit = 1 << hbitmap_granularity(bitmap->meta);
-
-/* To optimize: we can make hbitmap to internally check the range in a
- * coarse level, or at least do it word by word. */
-for (i = sector; i < sector + nb_sectors; i += sectors_per_bit) {
-if (hbitmap_get(bitmap->meta, i)) {
-return true;
-}
-}
-return false;
-}
-
-void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
-  BdrvDirtyBitmap *bitmap, int64_t sector,
-  int nb_sectors)
-{
-hbitmap_reset(bitmap->meta, sector, nb_sectors);
-}
-
-int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap)
-{
-return bitmap->size;
-}
-
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
 {
 return bitmap->name;
@@ -393,11 +364,6 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
 return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
 }

-uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap)
-{
-return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->meta);
-}
-
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
  uint64_t first_sector)
 {
-- 
2.9.3




[Qemu-devel] [PATCH 02/12] migration: Don't lose errno across aio context changes

2017-04-12 Thread Eric Blake
set_dirty_tracking() was assuming that the errno value set by
bdrv_create_dirty_bitmap() would not be corrupted by either
blk_get_aio_context() or aio_context_release().  Rather than
audit whether this assumption is safe, rewrite the code to just
grab the value of errno sooner.

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
---
 migration/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/block.c b/migration/block.c
index 18d50ff..9a9c214 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -350,9 +350,9 @@ static int set_dirty_tracking(void)
 aio_context_acquire(blk_get_aio_context(bmds->blk));
 bmds->dirty_bitmap = bdrv_create_dirty_bitmap(blk_bs(bmds->blk),
   BLOCK_SIZE, NULL, NULL);
+ret = -errno;
 aio_context_release(blk_get_aio_context(bmds->blk));
 if (!bmds->dirty_bitmap) {
-ret = -errno;
 goto fail;
 }
 }
-- 
2.9.3




[Qemu-devel] [PATCH 12/12] dirty-bitmap: Convert internal hbitmap size/granularity

2017-04-12 Thread Eric Blake
Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity.  It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.

Signed-off-by: Eric Blake 
---
 block/dirty-bitmap.c | 37 -
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index ef165eb..26ca084 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -37,7 +37,7 @@
  * or enabled. A frozen bitmap can only abdicate() or reclaim().
  */
 struct BdrvDirtyBitmap {
-HBitmap *bitmap;/* Dirty sector bitmap implementation */
+HBitmap *bitmap;/* Dirty bitmap implementation */
 HBitmap *meta;  /* Meta dirty bitmap */
 BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
 char *name; /* Optional non-empty unique ID */
@@ -93,12 +93,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
 return NULL;
 }
 bitmap = g_new0(BdrvDirtyBitmap, 1);
-/*
- * TODO - let hbitmap track full granularity. For now, it is tracking
- * only sector granularity, as a shortcut for our iterators.
- */
-bitmap->bitmap = hbitmap_alloc(bitmap_size >> BDRV_SECTOR_BITS,
-   ctz32(granularity) - BDRV_SECTOR_BITS);
+bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(granularity));
 bitmap->size = bitmap_size;
 bitmap->name = g_strdup(name);
 bitmap->disabled = false;
@@ -254,7 +249,7 @@ void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
 QLIST_FOREACH(bitmap, >dirty_bitmaps, list) {
 assert(!bdrv_dirty_bitmap_frozen(bitmap));
 assert(!bitmap->active_iterators);
-hbitmap_truncate(bitmap->bitmap, size >> BDRV_SECTOR_BITS);
+hbitmap_truncate(bitmap->bitmap, size);
 bitmap->size = size;
 }
 }
@@ -336,7 +331,7 @@ bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
 int64_t offset)
 {
 if (bitmap) {
-return hbitmap_get(bitmap->bitmap, offset >> BDRV_SECTOR_BITS);
+return hbitmap_get(bitmap->bitmap, offset);
 } else {
 return false;
 }
@@ -364,7 +359,7 @@ uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs)

 uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
 {
-return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
+return 1U << hbitmap_granularity(bitmap->bitmap);
 }

 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap)
@@ -397,27 +392,21 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)

 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 {
-return hbitmap_iter_next(>hbi) * BDRV_SECTOR_SIZE;
+return hbitmap_iter_next(>hbi);
 }

 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t offset, int64_t bytes)
 {
-int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
-
 assert(bdrv_dirty_bitmap_enabled(bitmap));
-hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
-end_sector - (offset >> BDRV_SECTOR_BITS));
+hbitmap_set(bitmap->bitmap, offset, bytes);
 }

 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t offset, int64_t bytes)
 {
-int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
-
 assert(bdrv_dirty_bitmap_enabled(bitmap));
-hbitmap_reset(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
-  end_sector - (offset >> BDRV_SECTOR_BITS));
+hbitmap_reset(bitmap->bitmap, offset, bytes);
 }

 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
@@ -427,7 +416,7 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
 hbitmap_reset_all(bitmap->bitmap);
 } else {
 HBitmap *backup = bitmap->bitmap;
-bitmap->bitmap = hbitmap_alloc(bitmap->size >> BDRV_SECTOR_BITS,
+bitmap->bitmap = hbitmap_alloc(bitmap->size,
hbitmap_granularity(backup));
 *out = backup;
 }
@@ -481,14 +470,12 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
 void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
 BdrvDirtyBitmap *bitmap;
-int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);

 QLIST_FOREACH(bitmap, >dirty_bitmaps, list) {
 if (!bdrv_dirty_bitmap_enabled(bitmap)) {
 continue;
 }
-hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
-end_sector - (offset >> BDRV_SECTOR_BITS));
+hbitmap_set(bitmap->bitmap, offset, bytes);
 }
 }

@@ -497,12 +484,12 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes)
  */
 void 

[Qemu-devel] [PATCH 06/12] dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset

2017-04-12 Thread Eric Blake
Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake 
---
 block/backup.c   | 2 +-
 block/dirty-bitmap.c | 2 +-
 block/mirror.c   | 8 
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index efa4896..6efd864 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -375,7 +375,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 dbi = bdrv_dirty_iter_new(job->sync_bitmap);

 /* Find the next dirty sector(s) */
-while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
+while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
 cluster = offset / job->cluster_size;

 /* Fake progress updates for any clusters we skipped */
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 3fb4871..2f9f554 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -397,7 +397,7 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)

 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 {
-return hbitmap_iter_next(>hbi);
+return hbitmap_iter_next(>hbi) * BDRV_SECTOR_SIZE;
 }

 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
diff --git a/block/mirror.c b/block/mirror.c
index 7c1d6bf..f404ff3 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -335,10 +335,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
 int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

-offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+offset = bdrv_dirty_iter_next(s->dbi);
 if (offset < 0) {
 bdrv_set_dirty_iter(s->dbi, 0);
-offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+offset = bdrv_dirty_iter_next(s->dbi);
 trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
   BDRV_SECTOR_SIZE);
 assert(offset >= 0);
@@ -367,11 +367,11 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 break;
 }

-next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+next_dirty = bdrv_dirty_iter_next(s->dbi);
 if (next_dirty > next_offset || next_dirty < 0) {
 /* The bitmap iterator's cache is stale, refresh it */
 bdrv_set_dirty_iter(s->dbi, next_offset);
-next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+next_dirty = bdrv_dirty_iter_next(s->dbi);
 }
 assert(next_dirty == next_offset);
 nb_chunks++;
-- 
2.9.3




[Qemu-devel] [PATCH 04/12] dirty-bitmap: Track size in bytes

2017-04-12 Thread Eric Blake
We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is an internal-only variable.

Signed-off-by: Eric Blake 
---
 block/dirty-bitmap.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 32698d5..a413df1 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -41,7 +41,7 @@ struct BdrvDirtyBitmap {
 HBitmap *meta;  /* Meta dirty bitmap */
 BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
 char *name; /* Optional non-empty unique ID */
-int64_t size;   /* Size of the bitmap (Number of sectors) */
+int64_t size;   /* Size of the bitmap, in bytes */
 bool disabled;  /* Bitmap is read-only */
 int active_iterators;   /* How many iterators are active */
 QLIST_ENTRY(BdrvDirtyBitmap) list;
@@ -79,24 +79,26 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
 {
 int64_t bitmap_size;
 BdrvDirtyBitmap *bitmap;
-uint32_t sector_granularity;

-assert((granularity & (granularity - 1)) == 0);
+assert(is_power_of_2(granularity) && granularity >= BDRV_SECTOR_SIZE);

 if (name && bdrv_find_dirty_bitmap(bs, name)) {
 error_setg(errp, "Bitmap already exists: %s", name);
 return NULL;
 }
-sector_granularity = granularity >> BDRV_SECTOR_BITS;
-assert(sector_granularity);
-bitmap_size = bdrv_nb_sectors(bs);
+bitmap_size = bdrv_getlength(bs);
 if (bitmap_size < 0) {
 error_setg_errno(errp, -bitmap_size, "could not get length of device");
 errno = -bitmap_size;
 return NULL;
 }
 bitmap = g_new0(BdrvDirtyBitmap, 1);
-bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity));
+/*
+ * TODO - let hbitmap track full granularity. For now, it is tracking
+ * only sector granularity, as a shortcut for our iterators.
+ */
+bitmap->bitmap = hbitmap_alloc(bitmap_size >> BDRV_SECTOR_BITS,
+   ctz32(granularity) - BDRV_SECTOR_BITS);
 bitmap->size = bitmap_size;
 bitmap->name = g_strdup(name);
 bitmap->disabled = false;
@@ -246,12 +248,13 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
 {
 BdrvDirtyBitmap *bitmap;
-uint64_t size = bdrv_nb_sectors(bs);
+int64_t size = bdrv_getlength(bs);

+assert(size >= 0);
 QLIST_FOREACH(bitmap, >dirty_bitmaps, list) {
 assert(!bdrv_dirty_bitmap_frozen(bitmap));
 assert(!bitmap->active_iterators);
-hbitmap_truncate(bitmap->bitmap, size);
+hbitmap_truncate(bitmap->bitmap, size >> BDRV_SECTOR_BITS);
 bitmap->size = size;
 }
 }
@@ -419,7 +422,7 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
 hbitmap_reset_all(bitmap->bitmap);
 } else {
 HBitmap *backup = bitmap->bitmap;
-bitmap->bitmap = hbitmap_alloc(bitmap->size,
+bitmap->bitmap = hbitmap_alloc(bitmap->size >> BDRV_SECTOR_BITS,
hbitmap_granularity(backup));
 *out = backup;
 }
-- 
2.9.3




[Qemu-devel] [PATCH 01/12] dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented

2017-04-12 Thread Eric Blake
We've been documenting the value in bytes since its introduction
in commit b9a9b3a4 (v1.3), where it was actually reported in bytes.

Commit e4654d2 (v2.0) then removed things from block/qapi.c, in
preparation for a rewrite to a list of dirty sectors in the next
commit 21b5683 in block.c, but the new code mistakenly started
reporting in sectors.

Fixes: https://bugzilla.redhat.com/1441460

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 

---
Too late for 2.9, since the regression has been unnoticed for
nine releases. But worth putting in 2.9.1.
---
 block/dirty-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 519737c..6d8ce5f 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -345,7 +345,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
 BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
 BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
-info->count = bdrv_get_dirty_count(bm);
+info->count = bdrv_get_dirty_count(bm) << BDRV_SECTOR_BITS;
 info->granularity = bdrv_dirty_bitmap_granularity(bm);
 info->has_name = !!bm->name;
 info->name = g_strdup(bm->name);
-- 
2.9.3




[Qemu-devel] [PATCH 00/12] make dirty-bitmap byte-based

2017-04-12 Thread Eric Blake
There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

This is part two of that conversion: dirty-bitmap. Other parts
include bdrv_is_allocated (previously submitted) and replacing
bdrv_get_block_status with a byte based callback in all the
drivers (still being written).

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-dirty-v1

It requires the following (v1 of bdrv_is_allocated, v9 of
blkdebug, and Max's block-next tree):
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01995.html
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01723.html
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html

Eric Blake (12):
  dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented
  migration: Don't lose errno across aio context changes
  dirty-bitmap: Drop unused functions
  dirty-bitmap: Track size in bytes
  dirty-bitmap: Set iterator start by offset, not sector
  dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset
  dirty-bitmap: Change bdrv_get_dirty_count() to report bytes
  dirty-bitmap: Change bdrv_get_dirty() to take bytes
  dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes
  mirror: Switch mirror_dirty_init() to byte-based iteration
  dirty-bitmap: Switch bdrv_set_dirty() to bytes
  dirty-bitmap: Convert internal hbitmap size/granularity

 include/block/block_int.h|  2 +-
 include/block/dirty-bitmap.h | 21 ---
 block/backup.c   |  7 ++--
 block/dirty-bitmap.c | 83 
 block/io.c   |  6 ++--
 block/mirror.c   | 73 +-
 migration/block.c| 14 
 7 files changed, 74 insertions(+), 132 deletions(-)

-- 
2.9.3




Re: [Qemu-devel] [PATCH 1/4] net: add FTGMAC100 support

2017-04-12 Thread Peter Maydell
On 12 April 2017 at 18:29, Cédric Le Goater  wrote:
> On 04/10/2017 03:43 PM, Peter Maydell wrote:
>> On 1 April 2017 at 13:57, Cédric Le Goater  wrote:
>>> +case FTGMAC100_MAC_LADR:
>>> +return (s->conf.macaddr.a[2] << 24) | (s->conf.macaddr.a[3] << 16) 
>>> |
>>> +   (s->conf.macaddr.a[4] << 8)  |  s->conf.macaddr.a[5];
>>
>> This will sign extend the high bit of macaddr.a[2] into bits 63..32 of
>> the return value, which probably isn't what you intended and will
>> make Coverity unhappy.
>
> indeed. What would you recommend? Simply:
>
> ((uint64_t) s->conf.macaddr.a[2] << 24) | ...

Cast to uint32_t is sufficient. We just need to avoid ending
up with a signed 32 bit value...

thanks
-- PMM



[Qemu-devel] [PATCH] arm: remove remaining cannot_destroy_with_object_finalize_yet

2017-04-12 Thread Laurent Vivier
With commit ce5b1bbf624b ("exec: move cpu_exec_init() calls to
realize functions"), we can now remove all the
remaining cannot_destroy_with_object_finalize_yet as
unsafe references have been moved to cpu_exec_realizefn().
(tested with QOM command provided by commit 4c315c27).

Suggested-by: Markus Armbruster 
Signed-off-by: Laurent Vivier 
---
 hw/arm/allwinner-a10.c | 6 --
 hw/arm/bcm2836.c   | 6 --
 hw/arm/digic.c | 6 --
 hw/arm/fsl-imx25.c | 5 -
 hw/arm/fsl-imx31.c | 5 -
 hw/arm/fsl-imx6.c  | 5 -
 hw/arm/xlnx-zynqmp.c   | 6 --
 7 files changed, 39 deletions(-)

diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index ca15d1c..f62a9a3 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -118,12 +118,6 @@ static void aw_a10_class_init(ObjectClass *oc, void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc->realize = aw_a10_realize;
-
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo aw_a10_type_info = {
diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index 8451190..8c43291 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -160,12 +160,6 @@ static void bcm2836_class_init(ObjectClass *oc, void *data)
 
 dc->props = bcm2836_props;
 dc->realize = bcm2836_realize;
-
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo bcm2836_type_info = {
diff --git a/hw/arm/digic.c b/hw/arm/digic.c
index d60ea39..94f3263 100644
--- a/hw/arm/digic.c
+++ b/hw/arm/digic.c
@@ -101,12 +101,6 @@ static void digic_class_init(ObjectClass *oc, void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc->realize = digic_realize;
-
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo digic_type_info = {
diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
index 2126f73..9056f27 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -290,11 +290,6 @@ static void fsl_imx25_class_init(ObjectClass *oc, void *data)
 
 dc->realize = fsl_imx25_realize;
 
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 dc->desc = "i.MX25 SOC";
 }
 
diff --git a/hw/arm/fsl-imx31.c b/hw/arm/fsl-imx31.c
index dd1c713..d7e2d83 100644
--- a/hw/arm/fsl-imx31.c
+++ b/hw/arm/fsl-imx31.c
@@ -262,11 +262,6 @@ static void fsl_imx31_class_init(ObjectClass *oc, void *data)
 
 dc->realize = fsl_imx31_realize;
 
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 dc->desc = "i.MX31 SOC";
 }
 
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
index 76dd8a4..6969e73 100644
--- a/hw/arm/fsl-imx6.c
+++ b/hw/arm/fsl-imx6.c
@@ -442,11 +442,6 @@ static void fsl_imx6_class_init(ObjectClass *oc, void *data)
 
 dc->realize = fsl_imx6_realize;
 
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 dc->desc = "i.MX6 SOC";
 }
 
diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index bc4e66b..4f67158 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -439,12 +439,6 @@ static void xlnx_zynqmp_class_init(ObjectClass *oc, void *data)
 
 dc->props = xlnx_zynqmp_props;
 dc->realize = xlnx_zynqmp_realize;
-
-/*
- * Reason: creates an ARM CPU, thus use after free(), see
- * arm_cpu_class_init()
- */
-dc->cannot_destroy_with_object_finalize_yet = true;
 }
 
 static const TypeInfo xlnx_zynqmp_type_info = {
-- 
2.9.3




Re: [Qemu-devel] [PATCH 1/4] net: add FTGMAC100 support

2017-04-12 Thread Cédric Le Goater
On 04/10/2017 03:43 PM, Peter Maydell wrote:
> On 1 April 2017 at 13:57, Cédric Le Goater  wrote:
>> The FTGMAC100 device is an Ethernet controller with DMA function that
>> can be found on Aspeed SoCs (which include NCSI).
>>
>> It is fully compliant with IEEE 802.3 specification for 10/100 Mbps
>> Ethernet and IEEE 802.3z specification for 1000 Mbps Ethernet and
>> includes Reduced Media Independent Interface (RMII) and Reduced
>> Gigabit Media Independent Interface (RGMII) interfaces. It adopts an
>> AHB bus interface and integrates a link list DMA engine with direct
>> M-Bus accesses for transmitting and receiving packets. It has
>> independent TX/RX fifos, supports half and full duplex (1000 Mbps mode
>> only supports full duplex), flow control for full duplex and
>> backpressure for half duplex.
>>
>> The FTGMAC100 also implements IP, TCP, UDP checksum offloads and
>> supports IEEE 802.1Q VLAN tag insertion and removal. It offers
>> high-priority transmit queue for QoS and CoS applications.
>>
>> This model is complete enough to satisfy two different Linux drivers
>> and a U-Boot driver. Not supported features are :
>>
>>  - IEEE 802.1Q VLAN
>>  - High Priority Transmit Queue
>>  - Wake-On-LAN functions
>>
>> The code is based on the Coldfire Fast Ethernet Controller model.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  default-configs/arm-softmmu.mak |   1 +
>>  hw/net/Makefile.objs|   1 +
>>  hw/net/ftgmac100.c  | 977 
>> 
>>  include/hw/net/ftgmac100.h  |  58 +++
>>  include/hw/net/mii.h|   6 +
>>  5 files changed, 1043 insertions(+)
>>  create mode 100644 hw/net/ftgmac100.c
>>  create mode 100644 include/hw/net/ftgmac100.h
>>
>> diff --git a/default-configs/arm-softmmu.mak 
>> b/default-configs/arm-softmmu.mak
>> index 15a53598d1c3..acc4c6c86297 100644
>> --- a/default-configs/arm-softmmu.mak
>> +++ b/default-configs/arm-softmmu.mak
>> @@ -30,6 +30,7 @@ CONFIG_LAN9118=y
>>  CONFIG_SMC91C111=y
>>  CONFIG_ALLWINNER_EMAC=y
>>  CONFIG_IMX_FEC=y
>> +CONFIG_FTGMAC100=y
>>  CONFIG_DS1338=y
>>  CONFIG_RX8900=y
>>  CONFIG_PFLASH_CFI01=y
>> diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
>> index 6a95d92d37f8..5ddaffe63a46 100644
>> --- a/hw/net/Makefile.objs
>> +++ b/hw/net/Makefile.objs
>> @@ -26,6 +26,7 @@ common-obj-$(CONFIG_IMX_FEC) += imx_fec.o
>>  common-obj-$(CONFIG_CADENCE) += cadence_gem.o
>>  common-obj-$(CONFIG_STELLARIS_ENET) += stellaris_enet.o
>>  common-obj-$(CONFIG_LANCE) += lance.o
>> +common-obj-$(CONFIG_FTGMAC100) += ftgmac100.o
>>
>>  obj-$(CONFIG_ETRAXFS) += etraxfs_eth.o
>>  obj-$(CONFIG_COLDFIRE) += mcf_fec.o
>> diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
>> new file mode 100644
>> index ..331e87391962
>> --- /dev/null
>> +++ b/hw/net/ftgmac100.c
>> @@ -0,0 +1,977 @@
>> +/*
>> + * Faraday FTGMAC100 Gigabit Ethernet
>> + *
>> + * Copyright (C) 2016 IBM Corp.
>> + *
>> + * Based on Coldfire Fast Ethernet Controller emulation.
>> + *
>> + * Copyright (c) 2007 CodeSourcery.
>> + *
>> + * This code is licensed under the GPL version 2 or later. See the
>> + * COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "hw/net/ftgmac100.h"
>> +#include "sysemu/dma.h"
>> +#include "qemu/log.h"
>> +#include "net/checksum.h"
>> +#include "net/eth.h"
>> +#include "hw/net/mii.h"
>> +
>> +/* For crc32 */
>> +#include 
>> +
>> +/*
>> + * FTGMAC100 registers
>> + */
>> +#define FTGMAC100_ISR 0x00
>> +#define FTGMAC100_IER 0x04
>> +#define FTGMAC100_MAC_MADR0x08
>> +#define FTGMAC100_MAC_LADR0x0c
>> +#define FTGMAC100_MATH0   0x10
>> +#define FTGMAC100_MATH1   0x14
>> +#define FTGMAC100_NPTXPD  0x18
>> +#define FTGMAC100_RXPD0x1C
>> +#define FTGMAC100_NPTXR_BADR  0x20
>> +#define FTGMAC100_RXR_BADR0x24
>> +#define FTGMAC100_HPTXPD  0x28 /* TODO */
>> +#define FTGMAC100_HPTXR_BADR  0x2c /* TODO */
> 
> What's TODO about a #define ?
> Maybe better to have a TODO comment and LOG_UNIMP at the register read/write point?

yes. I have changed that. 

> 
>> +#define FTGMAC100_ITC 0x30
>> +#define FTGMAC100_APTC0x34
>> +#define FTGMAC100_DBLAC   0x38
>> +#define FTGMAC100_REVR0x40
>> +#define FTGMAC100_FEAR1   0x44
>> +#define FTGMAC100_RBSR0x4c
>> +#define FTGMAC100_TPAFCR  0x48
>> +
>> +#define FTGMAC100_MACCR   0x50
>> +#define FTGMAC100_MACSR   0x54 /* TODO */
>> +#define FTGMAC100_PHYCR   0x60
>> +#define FTGMAC100_PHYDATA 0x64
>> +#define FTGMAC100_FCR 0x68
>> +
>> +/*
>> + * Interrupt status register & interrupt enable register
>> + */
>> +#define FTGMAC100_INT_RPKT_BUF(1 << 0)
>> +#define FTGMAC100_INT_RPKT_FIFO   (1 << 1)
>> +#define FTGMAC100_INT_NO_RXBUF(1 << 2)
>> +#define FTGMAC100_INT_RPKT_LOST   (1 

Re: [Qemu-devel] [RFC] Proposed qcow2 extension: subcluster allocation

2017-04-12 Thread Denis V. Lunev
On 04/06/2017 06:01 PM, Alberto Garcia wrote:
> Hi all,
>
> over the past couple of months I discussed with some of you the
> possibility to extend the qcow2 format in order to improve its
> performance and reduce its memory requirements (particularly with very
> large images).
>
> After some discussion in the mailing list and the #qemu IRC channel I
> decided to write a prototype of a new extension for qcow2 so I could
> understand better the scope of the changes and have some preliminary
> data about its effects.
>
> This e-mail is the formal presentation of my proposal to extend the
> on-disk qcow2 format. As you can see this is still an RFC. Due to the
> nature of the changes I would like to get as much feedback as possible
> before going forward.
>
> === Problem ===
>
> The original problem that I wanted to address is the memory
> requirements of qcow2 files if you want to have very large images and
> still keep good I/O performance. This is a consequence of its very
> structure, which I'm going to describe now.
>
> A qcow2 image is divided into units of constant size called clusters,
> and among other things it contains metadata that maps guest addresses
> to host addresses (the so-called L1 and L2 tables).
>
> There are two basic problems that result from this:
>
> 1) Reading from or writing to a qcow2 image involves reading the
>corresponding entry on the L2 table that maps the guest address to
>the host address. This is very slow because it involves two I/O
>operations: one on the L2 table and the other one on the actual
>data cluster.
>
> 2) A cluster is the smallest unit of allocation. Therefore writing a
>mere 512 bytes to an empty disk requires allocating a complete
>cluster and filling it with zeroes (or with data from the backing
>image if there is one). This wastes more disk space and also has a
>negative impact on I/O.
>
> Problem (1) can be solved by keeping in memory a cache of the L2
> tables (QEMU has an "l2_cache_size" parameter for this purpose). The
> amount of disk space used by L2 tables depends on two factors: the
> disk size and the cluster size.
>
> The cluster size can be configured when the image is created, and it
> can be any power of two between 512 bytes and 2 MB (it defaults to 64
> KB).
>
> The maximum amount of space needed for the L2 tables can be calculated
> with the following formula:
>
>max_l2_size = virtual_disk_size * 8 / cluster_size
>
> Large images require a large amount of metadata, and therefore a large
> amount of memory for the L2 cache. With the default cluster size
> (64KB) that's 128MB of L2 cache for a 1TB qcow2 image.
>
> The only way to reduce the size of the L2 tables is therefore
> increasing the cluster size, but this makes the image less efficient
> as described earlier in (2).
>
> === The proposal ===
>
> The idea of this proposal is to extend the qcow2 format by allowing
> subcluster allocation. There would be an optional feature that would
> need to be enabled when creating the image. The on-disk format would
> remain essentially the same, except that each data cluster would be
> internally divided into a number of subclusters of equal size.
>
> What this means in practice is that each entry on an L2 table would be
> accompanied by a bitmap indicating the allocation state of each one of
> the subclusters for that cluster. There are several alternatives for
> storing the bitmap, described below.
>
> Other than L2 entries, all other data structures would remain
> unchanged, but for data clusters the smallest unit of allocation would
> now be the subcluster. Reference counting would still be at the
> cluster level, because there's no way to reference individual
> subclusters. Copy-on-write on internal snapshots would need to copy
> complete clusters, so that scenario would not benefit from this
> change.
>
> I see two main use cases for this feature:
>
> a) The qcow2 image is not too large / the L2 cache is not a problem,
>but you want to increase the allocation performance. In this case
>you can have something like a 128KB cluster with 4KB subclusters
>(with 4KB being a common block size in ext4 and other filesystems)
>
> b) The qcow2 image is very large and you want to save metadata space
>in order to have a smaller L2 cache. In this case you can go for
>the maximum cluster size (2MB) but you want to have smaller
>subclusters to increase the allocation performance and optimize the
>disk usage. This was actually my original use case.
>
> === Test results ===
>
> I have a basic working prototype of this. It's still incomplete -and
> buggy :)- but it gives an idea of what we can expect from it. In my
> implementation each data cluster has 8 subclusters, but that's not set
> in stone (see below).
>
> I made all tests on an SSD drive, writing to an empty qcow2 image with
> a fully populated 40GB backing image, performing random writes using
> fio with a block size of 4KB.
>
> I tried 

[Qemu-devel] [PATCH v4 4/4] qemu-img: copy *key-secret opts when opening newly created files

2017-04-12 Thread Daniel P. Berrange
The qemu-img dd/convert commands will create a image file and
then try to open it. Historically it has been possible to open
new files without passing any options. With encrypted files
though, the *key-secret options are mandatory, so we need to
provide those options when opening the newly created file.

Signed-off-by: Daniel P. Berrange 
---
 qemu-img.c | 41 +++--
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 31c4923..3d9e7b3 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -305,15 +305,17 @@ static BlockBackend *img_open_opts(const char *optstr,
 }
 
 static BlockBackend *img_open_file(const char *filename,
+   QDict *options,
const char *fmt, int flags,
bool writethrough, bool quiet)
 {
 BlockBackend *blk;
 Error *local_err = NULL;
-QDict *options = NULL;
 
 if (fmt) {
-options = qdict_new();
+if (!options) {
+options = qdict_new();
+}
 qdict_put(options, "driver", qstring_from_str(fmt));
 }
 
@@ -332,6 +334,33 @@ static BlockBackend *img_open_file(const char *filename,
 }
 
 
+static int img_add_key_secrets(void *opaque,
+   const char *name, const char *value,
+   Error **errp)
+{
+QDict *options = opaque;
+
+if (g_str_has_suffix(name, "key-secret")) {
+qdict_put(options, name, qstring_from_str(value));
+}
+
+return 0;
+}
+
+static BlockBackend *img_open_new_file(const char *filename,
+   QemuOpts *create_opts,
+   const char *fmt, int flags,
+   bool writethrough, bool quiet)
+{
+QDict *options = NULL;
+
+options = qdict_new();
+qemu_opt_foreach(create_opts, img_add_key_secrets, options, NULL);
+
+return img_open_file(filename, options, fmt, flags, writethrough, quiet);
+}
+
+
 static BlockBackend *img_open(bool image_opts,
   const char *filename,
   const char *fmt, int flags, bool writethrough,
@@ -351,7 +380,7 @@ static BlockBackend *img_open(bool image_opts,
 }
 blk = img_open_opts(filename, opts, flags, writethrough, quiet);
 } else {
-blk = img_open_file(filename, fmt, flags, writethrough, quiet);
+blk = img_open_file(filename, NULL, fmt, flags, writethrough, quiet);
 }
 return blk;
 }
@@ -2301,8 +2330,8 @@ static int img_convert(int argc, char **argv)
  * That has to wait for bdrv_create to be improved
  * to allow filenames in option syntax
  */
-out_blk = img_open_file(out_filename, out_fmt,
-flags, writethrough, quiet);
+out_blk = img_open_new_file(out_filename, opts, out_fmt,
+flags, writethrough, quiet);
 }
 if (!out_blk) {
 ret = -1;
@@ -4351,7 +4380,7 @@ static int img_dd(int argc, char **argv)
  * with the bdrv_create() call above which does not
  * support image-opts style.
  */
-blk2 = img_open_file(out.filename, out_fmt, BDRV_O_RDWR,
+blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
  false, false);
 
 if (!blk2) {
-- 
2.9.3




[Qemu-devel] [PATCH v4 3/4] qemu-img: introduce --target-image-opts for 'convert' command

2017-04-12 Thread Daniel P. Berrange
The '--image-opts' flags indicates whether the source filename
includes options. The target filename has to remain in the
plain filename format though, since it needs to be passed to
bdrv_create().  When using --skip-create though, it would be
possible to use image-opts syntax. This adds --target-image-opts
to indicate that the target filename includes options. Currently
this mandates use of the --skip-create flag too.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-img-cmds.hx |  4 +--
 qemu-img.c   | 84 +++-
 qemu-img.texi| 12 ++--
 3 files changed, 71 insertions(+), 29 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 8ac7822..93b50ef 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -40,9 +40,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-"convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename [filename2 [...]] output_filename")
+"convert [--object objectdef] [--image-opts] [--target-image-opts] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [--object @var{objectdef}] [--image-opts] [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [--object @var{objectdef}] [--image-opts] [--target-image-opts] [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-T @var{src_cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_id_or_name}] [-l @var{snapshot_param}] [-S @var{sparse_size}] [-m @var{num_coroutines}] [-W] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("dd", img_dd,
diff --git a/qemu-img.c b/qemu-img.c
index 83aff5e..31c4923 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -59,6 +59,7 @@ enum {
 OPTION_PATTERN = 260,
 OPTION_FLUSH_INTERVAL = 261,
 OPTION_NO_DRAIN = 262,
+OPTION_TARGET_IMAGE_OPTS = 263,
 };
 
 typedef enum OutputFormat {
@@ -1921,7 +1922,7 @@ static int img_convert(int argc, char **argv)
 int progress = 0, flags, src_flags;
 bool writethrough, src_writethrough;
 const char *fmt, *out_fmt, *cache, *src_cache, *out_baseimg, *out_filename;
-BlockDriver *drv, *proto_drv;
+BlockDriver *drv = NULL, *proto_drv = NULL;
 BlockBackend **blk = NULL, *out_blk = NULL;
 BlockDriverState **bs = NULL, *out_bs = NULL;
 int64_t total_sectors;
@@ -1941,9 +1942,10 @@ static int img_convert(int argc, char **argv)
 bool image_opts = false;
 bool wr_in_order = true;
 long num_coroutines = 8;
+bool tgt_image_opts = false;
 
+out_fmt = NULL;
 fmt = NULL;
-out_fmt = "raw";
 cache = "unsafe";
 src_cache = BDRV_DEFAULT_CACHE;
 out_baseimg = NULL;
@@ -1954,6 +1956,7 @@ static int img_convert(int argc, char **argv)
 {"help", no_argument, 0, 'h'},
 {"object", required_argument, 0, OPTION_OBJECT},
 {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+{"target-image-opts", no_argument, 0, OPTION_TARGET_IMAGE_OPTS},
 {0, 0, 0, 0}
 };
 c = getopt_long(argc, argv, ":hf:O:B:ce6o:s:l:S:pt:T:qnm:W",
@@ -2075,9 +2078,16 @@ static int img_convert(int argc, char **argv)
 case OPTION_IMAGE_OPTS:
 image_opts = true;
 break;
+case OPTION_TARGET_IMAGE_OPTS:
+tgt_image_opts = true;
+break;
 }
 }
 
+if (!out_fmt && !tgt_image_opts) {
+out_fmt = "raw";
+}
+
 if (qemu_opts_foreach(&qemu_object_opts,
   user_creatable_add_opts_foreach,
   NULL, NULL)) {
@@ -2090,6 +2100,11 @@ static int img_convert(int argc, char **argv)
 goto fail_getopt;
 }
 
+if (tgt_image_opts && !skip_create) {
+error_report("--target-image-opts requires use of -n flag");
+goto fail_getopt;
+}
+
 /* Initialize before goto out */
 if (quiet) {
 progress = 0;
@@ -2100,8 +2115,13 @@ static int img_convert(int argc, char **argv)
 out_filename = bs_n >= 1 ? argv[argc - 1] : NULL;
 
 if (options && has_help_option(options)) {
-ret = print_block_option_help(out_filename, out_fmt);
-goto out;
+if (out_fmt) {
+ret = print_block_option_help(out_filename, out_fmt);
+goto out;
+} else {
+error_report("Option help requires a format be specified");
+   

[Qemu-devel] [PATCH v4 0/4 for-2.10] Improve convert and dd commands

2017-04-12 Thread Daniel P. Berrange
Update to

  v1: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg05699.html
  v2: https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00728.html
  v3: https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg04391.html

This series is in response to Max pointing out that you cannot
use 'convert' for an encrypted target image.

The 'convert' and 'dd' commands need to first create the image
and then open it. The bdrv_create() method takes a set of options
for creating the image, which let us provide a key-secret for the
encryption key. When the commands then open the new image, they
don't provide any options, so the image is unable to be opened
due to lack of encryption key. It is also not possible to use
the --image-opts argument to provide structured options in the
target image name - it must be a plain filename to satisfy the
bdrv_create() API contract.

This series addresses these problems to some extent

 - Adds a new --target-image-opts flag which is used to say
   that the target filename is using structured options.
   It is *only* permitted to use this when -n is also set.
   ie the target image must be pre-created so convert/dd
   don't need to run bdrv_create().

 - When --target-image-opts is not used, add special case
   code that identifies options passed to bdrv_create()
   named "*key-secret" and adds them to the options used
   to open the new image

In future it is desirable to make --target-image-opts work even when -n is
*not* given. This requires considerable work to create a new bdrv_create()
API impl.

The first patch fixes a bug in the 'dd' command while the second adds support
for the missing '--object' arg to 'dd', allowing it to reference secrets when
opening files.  The last two patches implement the new features described above
for the 'convert' command.

Changed in v4:

 - Refactor img_open_new_file in terms of img_open_file (Kevin)

Changed in v3:

 - Drop all patches affecting the 'dd' command except for the clear bug fix
   and the --object support. They can be re-considered once dd is rewritten
   to run ontop of convert.
 - Use consistent return/goto style in dd command (Max)
 - Fix error reporting when using compressed image and skip-create (Max)
 - Unconditionally create QDict when open files (Max)

Changed in v2:

 - Replace dd -n flag with support for conv=nocreat,notrunc
 - Misc typos (Eric, Fam)

Daniel P. Berrange (4):
  qemu-img: add support for --object with 'dd' command
  qemu-img: fix --image-opts usage with dd command
  qemu-img: introduce --target-image-opts for 'convert' command
  qemu-img: copy *key-secret opts when opening newly created files

 qemu-img-cmds.hx |   4 +-
 qemu-img.c   | 146 +++
 qemu-img.texi|  12 -
 3 files changed, 128 insertions(+), 34 deletions(-)

-- 
2.9.3




[Qemu-devel] [PATCH v4 2/4] qemu-img: fix --image-opts usage with dd command

2017-04-12 Thread Daniel P. Berrange
The --image-opts flag can only be used to affect the parsing
of the source image. The target image has to be specified in
the traditional style regardless, since it needs to be passed
to the bdrv_create() API which does not support the new style
opts.

Reviewed-by: Max Reitz 
Signed-off-by: Daniel P. Berrange 
---
 qemu-img.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 2249c21..83aff5e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4312,8 +4312,13 @@ static int img_dd(int argc, char **argv)
 goto out;
 }
 
-blk2 = img_open(image_opts, out.filename, out_fmt, BDRV_O_RDWR,
-false, false);
+/* TODO, we can't honour --image-opts for the target,
+ * since it needs to be given in a format compatible
+ * with the bdrv_create() call above which does not
+ * support image-opts style.
+ */
+blk2 = img_open_file(out.filename, out_fmt, BDRV_O_RDWR,
+ false, false);
 
 if (!blk2) {
 ret = -1;
-- 
2.9.3




[Qemu-devel] [PATCH v4 1/4] qemu-img: add support for --object with 'dd' command

2017-04-12 Thread Daniel P. Berrange
The qemu-img dd command added --image-opts support, but missed
the corresponding --object support. This prevented passing
secrets (eg auth passwords) needed by certain disk images.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
---
 qemu-img.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index b220cf7..2249c21 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4159,6 +4159,7 @@ static int img_dd(int argc, char **argv)
 };
 const struct option long_options[] = {
 { "help", no_argument, 0, 'h'},
+{ "object", required_argument, 0, OPTION_OBJECT},
 { "image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
 { 0, 0, 0, 0 }
 };
@@ -4183,6 +4184,15 @@ static int img_dd(int argc, char **argv)
 case 'h':
 help();
 break;
+case OPTION_OBJECT: {
+QemuOpts *opts;
+opts = qemu_opts_parse_noisily(&qemu_object_opts,
+   optarg, true);
+if (!opts) {
+ret = -1;
+goto out;
+}
+}   break;
 case OPTION_IMAGE_OPTS:
 image_opts = true;
 break;
@@ -4227,6 +4237,14 @@ static int img_dd(int argc, char **argv)
 ret = -1;
 goto out;
 }
+
+if (qemu_opts_foreach(&qemu_object_opts,
+  user_creatable_add_opts_foreach,
+  NULL, NULL)) {
+ret = -1;
+goto out;
+}
+
 blk1 = img_open(image_opts, in.filename, fmt, 0, false, false);
 
 if (!blk1) {
-- 
2.9.3




Re: [Qemu-devel] [PATCH v6] kvm: better MWAIT emulation for guests

2017-04-12 Thread Michael S. Tsirkin
On Wed, Apr 12, 2017 at 04:54:10PM +0200, Alexander Graf wrote:
> 
> 
> On 12.04.17 16:34, Jim Mattson wrote:
> > Actually, we have rejected commit 87c00572ba05aa8c ("kvm: x86: emulate
> > monitor and mwait instructions as nop"), so when we intercept
> > MONITOR/MWAIT, we synthesize #UD. Perhaps it is this difference from
> > vanilla kvm that motivates the following idea...
> 
> So you're not running upstream kvm? In that case, you can just not take this
> patch either :).
> 
> > Since we're still not going to report MONITOR support in CPUID, the
> > only guests of consequence are paravirtual guests. What if a
> 
> Only if someone actually implemented something for PV guests, yes.
> 
> The real motivation is to allow user space to force set the MONITOR CPUID
> flag. That way an admin can - if he really wants to - dedicate pCPUs to the
> VM.
> 
> I agree that we don't need the kvm pv flag for that. I'd be happy to drop
> that if everyone agrees.

I don't really agree we do not need the PV flag. mwait on kvm is
different from mwait on bare metal in that you are heavily penalized by
the scheduler for polling unless you configure the host just so.
HLT lets you give up the host CPU if you know you won't need
it for a long time.

So while many people can get by with monitor cpuid (those that isolate
host CPUs) and it's a valuable option to have, I think a PV flag is also
a valuable option and can be set for more configurations.

The guest has an idle driver that calls mwait on short waits and halt on
longer ones.  I'm in fact testing an idle driver using such a PV flag and
will post it when ready (after vacation, ~3 weeks from now probably).

> > paravirtual guest was aware of the fact that sometimes MONITOR/MWAIT
> > would work as architected, and sometimes they would raise #UD (or do
> > something else that's guest-visible, to indicate that the hypervisor is
> > intercepting the instructions). Such a guest could first try a
> > MONITOR/MWAIT-based idle loop and then fall back on a HLT-based idle
> > loop if the hypervisor rejected its use of MONITOR/MWAIT.
> 
> How would that work? That guest would have to atomically notify all other
> vCPUs that wakeup notifications now go via IPIs instead of cache line
> dirtying.
> 
> That's probably as much work to get right as it would be to just emulate
> MWAIT inside kvm ;).
> 
> > We already have the loose concept of "this pCPU has other things to
> > do," which is encoded in the variable-sized PLE window. With
> > MONITOR/MWAIT, the choice is binary, but a simple implementation could
> > tie the two together, by allowing the guest to use MONITOR/MWAIT
> > whenever the PLE window exceeds a certain threshold. Or the decision
> > could be left to the userspace agent.
> 
> I agree, and that's basically the idea I mentioned earlier with MWAIT
> emulation. We could (for well behaved guests) switch between emulating MWAIT
> and running native MWAIT.
> 
> 
> 
> Alex



Re: [Qemu-devel] [Bug 1682093] Re: aarch64-softmmu "bad ram pointer" crash

2017-04-12 Thread Peter Maydell
On 12 April 2017 at 16:02, Harry Wagstaff <1682...@bugs.launchpad.net> wrote:
> I've done some investigation and it appears that this bug is caused by
> the following:
>
> 1. The flash memory of the virt platform is initialised as a
> cfi.pflash01. It has a memory region with romd_mode = true and
> rom_device = true
>
> 2. Some code stored in the flash memory is executed. This causes the
> memory to be loaded into the TLB.
>
> 3. The code is overwritten. This causes the romd_mode of the flash
> memory to be reset. It also causes the code to be evicted from the TLB.
>
> 4. An attempt is made to execute the code again (cpu_exec(), cpu-exec.c:677)
> 4a. Eventually, QEMU attempts to refill the TLB (softmmu_template.h:127)
> 4b. We try to fill in the tlb entry (tlb_set_page_with_attrs, cputlb.c:602)
> 4b. The flash memory no longer appears to be a ram or romd (cputlb.c:632)
> 4c. QEMU decides that the flash memory is an IO device (cputlb.c:634)
> 4d. QEMU aborts while trying to fill in the rest of the TLB entry 
> (qemu_ram_addr_from_host_nofail)

Yeah, this is a known bug -- but fixing it would just mean that
we would print the slightly more helpful message about the
guest attempting to execute from something that isn't RAM
or ROM before exiting.

See for instance this thread from January.
https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg00674.html

> I have built a MWE (which I have attached) which produces this behaviour
> in git head. I'm not exactly sure what a fix for this should look like:
> AFAIK it's not technically valid to write into flash, but I'm not sure
> that QEMU crashing should be considered correct behaviour.

You should fix your guest so that it doesn't try to execute
from flash without putting the flash back into the mode you
can execute from...

Writing to the flash device is permitted -- it's how
you program it (you write command bytes to it, and
read back responses and status and so on).

thanks
-- PMM



Re: [Qemu-devel] WinDbg module

2017-04-12 Thread Marc-André Lureau
Hi

On Wed, Apr 12, 2017 at 6:44 PM Mihail Abakumov 
wrote:

> Hello.
>
> We made the debugger module WinDbg (like GDB) for QEMU. This is the
> replacement for the remote stub in the Windows kernel, used for remote
> Windows kernel debugging without enabling debugging mode.
>
> The latest build and instructions for the launch can be found here:
> https://github.com/ispras/qemu/releases/tag/v2.7.50-windbg
>
> Currently only one way to create a remote debugging connection is
> supported: using a COM port with a named pipe.
>
> Should I prepare patches for inclusion in the master branch? Or is it
> too specific a module and thus not needed?
>

Great news! A few years ago, I looked into this with a colleague to speed
up debugging, but we were stuck (can't remember the blocking factor, but
the lack of documentation was not helping). I think some Windows driver
developers would be quite happy with it, which makes it a good candidate
for inclusion. Please prepare patches!


-- 
Marc-André Lureau


Re: [Qemu-devel] [PATCH v6] kvm: better MWAIT emulation for guests

2017-04-12 Thread Jim Mattson via Qemu-devel
On Wed, Apr 12, 2017 at 7:54 AM, Alexander Graf  wrote:
>
>
> On 12.04.17 16:34, Jim Mattson wrote:
>>
>> Actually, we have rejected commit 87c00572ba05aa8c ("kvm: x86: emulate
>> monitor and mwait instructions as nop"), so when we intercept
>> MONITOR/MWAIT, we synthesize #UD. Perhaps it is this difference from
>> vanilla kvm that motivates the following idea...
>
>
> So you're not running upstream kvm? In that case, you can just not take this
> patch either :).
This patch should be harmless. :-)
>
>> Since we're still not going to report MONITOR support in CPUID, the
>> only guests of consequence are paravirtual guests. What if a
>
>
> Only if someone actually implemented something for PV guests, yes.
>
> The real motivation is to allow user space to force set the MONITOR CPUID
> flag. That way an admin can - if he really wants to - dedicate pCPUs to the
> VM.
>
> I agree that we don't need the kvm pv flag for that. I'd be happy to drop
> that if everyone agrees.
>
>> paravirtual guest was aware of the fact that sometimes MONITOR/MWAIT
>> would work as architected, and sometimes they would raise #UD (or do
>> something else that's guest-visible, to indicate that the hypervisor is
>> intercepting the instructions). Such a guest could first try a
>> MONITOR/MWAIT-based idle loop and then fall back on a HLT-based idle
>> loop if the hypervisor rejected its use of MONITOR/MWAIT.
>
>
> How would that work? That guest would have to atomically notify all other
> vCPUs that wakeup notifications now go via IPIs instead of cache line
> dirtying.
>
> That's probably as much work to get right as it would be to just emulate
> MWAIT inside kvm ;).
True. I don't have an easy solution to that problem.
>
>> We already have the loose concept of "this pCPU has other things to
>> do," which is encoded in the variable-sized PLE window. With
>> MONITOR/MWAIT, the choice is binary, but a simple implementation could
>> tie the two together, by allowing the guest to use MONITOR/MWAIT
>> whenever the PLE window exceeds a certain threshold. Or the decision
>> could be left to the userspace agent.
>
>
> I agree, and that's basically the idea I mentioned earlier with MWAIT
> emulation. We could (for well behaved guests) switch between emulating MWAIT
> and running native MWAIT.
Yes, that would probably be the preferred solution.
>
>
>
> Alex



Re: [Qemu-devel] vmbus bridge: machine property or device?

2017-04-12 Thread Markus Armbruster
Cc'ing a few more people who might have a reasoned opinion.

Roman Kagan  writes:

> While hammering out the VMBus / storage series, we've been struggling to
> figure out the best-practice solution to the following problem:
>
> VMBus is provided by a vmbus bridge; it appears the most natural to have
> it subclassed from SysBusDevice.  There can only be one VMBus in the
> VM.

TYPE_DEVICE unless you actually need something TYPE_SYS_BUS_DEVICE
provides.

> Now the question is how to add it to the system:
>
> 1) with a boolean machine property "vmbus" that would trigger the
>creation of the VMBus bridge; its class would have
>->cannot_instantiate_with_device_add_yet = true

This makes it an optional onboard device.  Similar ones exist already,
e.g. various optional onboard USB host controllers controlled by machine
property "usb".

> 2) with a regular -device option; this would require setting
>->has_dynamic_sysbus = true for i440fx machines (q35 already have it)

This makes it a pluggable sysbus device.

I'd be tempted to leave old i440FX rot in peace, but your use case may
not allow that.

>
> 3) anything else
>
>
> So far we went with 1), but since it's essentially the API the management
> layer would have to use, we'd like to get it right from the beginning.

Asking for advice here is a good idea.

Anyone?



[Qemu-devel] [Bug 1682093] Re: aarch64-softmmu "bad ram pointer" crash

2017-04-12 Thread Harry Wagstaff
I've done some investigation and it appears that this bug is caused by
the following:

1. The flash memory of the virt platform is initialised as a
cfi.pflash01. It has a memory region with romd_mode = true and
rom_device = true

2. Some code stored in the flash memory is executed. This causes the
memory to be loaded into the TLB.

3. The code is overwritten. This causes the romd_mode of the flash
memory to be reset. It also causes the code to be evicted from the TLB.

4. An attempt is made to execute the code again (cpu_exec(), cpu-exec.c:677)
4a. Eventually, QEMU attempts to refill the TLB (softmmu_template.h:127)
4b. We try to fill in the tlb entry (tlb_set_page_with_attrs, cputlb.c:602)
4b. The flash memory no longer appears to be a ram or romd (cputlb.c:632)
4c. QEMU decides that the flash memory is an IO device (cputlb.c:634)
4d. QEMU aborts while trying to fill in the rest of the TLB entry 
(qemu_ram_addr_from_host_nofail)

I have built a MWE (which I have attached) which produces this behaviour
in git head. I'm not exactly sure what a fix for this should look like:
AFAIK it's not technically valid to write into flash, but I'm not sure
that QEMU crashing should be considered correct behaviour.

** Attachment added: "mwe.tar.gz"
   
https://bugs.launchpad.net/qemu/+bug/1682093/+attachment/4860707/+files/mwe.tar.gz

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1682093

Title:
  aarch64-softmmu "bad ram pointer" crash

Status in QEMU:
  New

Bug description:
  I am developing a piece of software called SimBench which is a
  benchmarking system for full system simulators. I am currently porting
  this to aarch64, using QEMU as a test platform.

  I have encountered a 'bad ram pointer' crash. I've attempted to build
  a minimum test case, but I haven't managed to replicate the behaviour
  in isolation, so I've created a branch of my project which exhibits
  the crash: https://bitbucket.org/Awesomeclaw/simbench/get/qemu-
  bug.tar.gz

  The package can be compiled using:

  make

  and then run using:

  qemu-system-aarch64  -M virt -m 512 -cpu cortex-a57 -kernel
  out/armv8/virt/simbench -nographic

  I have replicated the issue in both qemu 2.8.1 and in 2.9.0-rc3, on
  Fedora 23. Please let me know if you need any more information or any
  logs/core dumps/etc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1682093/+subscriptions



Re: [Qemu-devel] [PATCH v2 1/4] Throttle: Create IOThrottle structure

2017-04-12 Thread Alberto Garcia
On Thu 30 Mar 2017 02:10:10 PM CEST, Pradeep Jagadeesh wrote:
> +##
> +# == QAPI IOThrottle definitions
> +##
> +# @IOThrottle:
> +#
> +# A set of parameters describing block
> +#

"describing block ..." ? There's missing text here.

Berto



Re: [Qemu-devel] [PATCH] test-keyval: fix leaks

2017-04-12 Thread Markus Armbruster
Marc-André Lureau  writes:

> Hi
>
> On Wed, Apr 12, 2017 at 3:19 PM Markus Armbruster  wrote:
>
>> Marc-André Lureau  writes:
>>
>> > Signed-off-by: Marc-André Lureau 
>> > ---
>> >  tests/test-keyval.c | 4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/tests/test-keyval.c b/tests/test-keyval.c
>> > index ba19560a22..141ee5d0c4 100644
>> > --- a/tests/test-keyval.c
>> > +++ b/tests/test-keyval.c
>> > @@ -628,6 +628,7 @@ static void test_keyval_visit_alternate(void)
>> >  visit_type_AltNumStr(v, "a", &ans, &error_abort);
>> >  g_assert_cmpint(ans->type, ==, QTYPE_QSTRING);
>> >  g_assert_cmpstr(ans->u.s, ==, "1");
>> > +qapi_free_AltNumStr(ans);
>> >  visit_type_AltNumInt(v, "a", &ani, &err);
>> >  error_free_or_abort(&err);
>> >  visit_end_struct(v, NULL);
>> > @@ -649,11 +650,14 @@ static void test_keyval_visit_any(void)
>> >  visit_type_any(v, "a", &any, &error_abort);
>>
>> @any becomes a strong reference (qobject_input_type_any() increments the
>> reference count).
>>
>> >  qlist = qobject_to_qlist(any);
>>
>> Reference count unchanged.
>>
>> >  g_assert(qlist);
>> > +qobject_decref(any);
>>
>> Uh, this is unnecessarily dirty: you relinquish the reference before
>> you're actually done with it.  Works only because there's *another*
>> reference hiding within @v.  Let's move this ...
>>
>> >  qstr = qobject_to_qstring(qlist_pop(qlist));
>> >  g_assert_cmpstr(qstring_get_str(qstr), ==, "null");
>> > +QDECREF(qstr);
>> >  qstr = qobject_to_qstring(qlist_pop(qlist));
>> >  g_assert_cmpstr(qstring_get_str(qstr), ==, "1");
>> >  g_assert(qlist_empty(qlist));
>> > +QDECREF(qstr);
>>
>> ... here.  Okay to do that on commit?
>>
>
>  sure, makes sense
>
>
>> >  visit_check_struct(v, _abort);
>> >  visit_end_struct(v, NULL);
>> >  visit_free(v);
>>
>> With the reference counting cleaned up:
>> Reviewed-by: Markus Armbruster 
>>
>>
> thanks

Applied to qapi-next, thanks!



Re: [Qemu-devel] [PATCH v6] kvm: better MWAIT emulation for guests

2017-04-12 Thread Alexander Graf



On 12.04.17 16:34, Jim Mattson wrote:

> Actually, we have rejected commit 87c00572ba05aa8c ("kvm: x86: emulate
> monitor and mwait instructions as nop"), so when we intercept
> MONITOR/MWAIT, we synthesize #UD. Perhaps it is this difference from
> vanilla kvm that motivates the following idea...


So you're not running upstream kvm? In that case, you can just not take 
this patch either :).



> Since we're still not going to report MONITOR support in CPUID, the
> only guests of consequence are paravirtual guests. What if a


Only if someone actually implemented something for PV guests, yes.

The real motivation is to allow user space to force set the MONITOR 
CPUID flag. That way an admin can - if he really wants to - dedicate 
pCPUs to the VM.


I agree that we don't need the kvm pv flag for that. I'd be happy to 
drop that if everyone agrees.



> paravirtual guest was aware of the fact that sometimes MONITOR/MWAIT
> would work as architected, and sometimes they would raise #UD (or do
> something else that's guest-visible, to indicate that the hypervisor is
> intercepting the instructions). Such a guest could first try a
> MONITOR/MWAIT-based idle loop and then fall back on a HLT-based idle
> loop if the hypervisor rejected its use of MONITOR/MWAIT.


How would that work? That guest would have to atomically notify all 
other vCPUs that wakeup notifications now go via IPIs instead of cache 
line dirtying.


That's probably as much work to get right as it would be to just emulate 
MWAIT inside kvm ;).



> We already have the loose concept of "this pCPU has other things to
> do," which is encoded in the variable-sized PLE window. With
> MONITOR/MWAIT, the choice is binary, but a simple implementation could
> tie the two together, by allowing the guest to use MONITOR/MWAIT
> whenever the PLE window exceeds a certain threshold. Or the decision
> could be left to the userspace agent.


I agree, and that's basically the idea I mentioned earlier with MWAIT 
emulation. We could (for well behaved guests) switch between emulating 
MWAIT and running native MWAIT.




Alex



Re: [Qemu-devel] [PATCH] event: Add signal information to SHUTDOWN

2017-04-12 Thread Eric Blake
On 04/12/2017 09:33 AM, Markus Armbruster wrote:

>>> Additional ways to terminate QEMU: HMP and QMP command "quit", and the
>>> various GUI controls such as "close SDL window".
>>
>> Good points. I have no idea what exit path those take (if they
>> raise(SIGINT) internally, it's quite easy - but if they go through some
>> other exit path, then I'll need to wire in something else).
> 
> Chasing down all host-initiated terminations looks like a fool's errand
> to me.  But if we can reliably detect guest-initiated, we don't have to,
> do we?

I don't know if we can reliably detect guest-initiated. Host-signal
initiated was easy because of the modification to a global variable.
But hopefully I'll find some easy common point, and maybe have to add
another global variable to track whether we went through that point (as
the reporting point is different than the request point).  At any rate,
this conversation is useful, to make sure I find the right place.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





[Qemu-devel] WinDbg module

2017-04-12 Thread Mihail Abakumov

Hello.

We made the debugger module WinDbg (like GDB) for QEMU. This is the 
replacement for the remote stub in the Windows kernel, used for remote 
Windows kernel debugging without enabling debugging mode.


The latest build and instructions for the launch can be found here: 
https://github.com/ispras/qemu/releases/tag/v2.7.50-windbg


Currently only one way to create a remote debugging connection is 
supported: using a COM port with a named pipe.


Should I prepare patches for inclusion in the master branch? Or is it 
too specific a module and thus not needed?


--
Thanks,
Mihail Abakumov



Re: [Qemu-devel] [PATCH v6] kvm: better MWAIT emulation for guests

2017-04-12 Thread Jim Mattson via Qemu-devel
Actually, we have rejected commit 87c00572ba05aa8c ("kvm: x86: emulate
monitor and mwait instructions as nop"), so when we intercept
MONITOR/MWAIT, we synthesize #UD. Perhaps it is this difference from
vanilla kvm that motivates the following idea...

Since we're still not going to report MONITOR support in CPUID, the
only guests of consequence are paravirtual guests. What if a
paravirtual guest was aware of the fact that sometimes MONITOR/MWAIT
would work as architected, and sometimes they would raise #UD (or do
something else that's guest-visible, to indicate that the hypervisor is
intercepting the instructions). Such a guest could first try a
MONITOR/MWAIT-based idle loop and then fall back on a HLT-based idle
loop if the hypervisor rejected its use of MONITOR/MWAIT.

We already have the loose concept of "this pCPU has other things to
do," which is encoded in the variable-sized PLE window. With
MONITOR/MWAIT, the choice is binary, but a simple implementation could
tie the two together, by allowing the guest to use MONITOR/MWAIT
whenever the PLE window exceeds a certain threshold. Or the decision
could be left to the userspace agent.

On Tue, Apr 11, 2017 at 11:23 AM, Alexander Graf  wrote:
>
>
>> On 11.04.2017 at 19:10, Jim Mattson wrote:
>>
>> This might be more useful if it could be dynamically toggled on and
>> off, depending on system load.
>
> What would trapping mwait (currently) buy you?
>
> As it stands today, before this patch, mwait is simply implemented as a nop, 
> so enabling the trap just means you're wasting as much cpu time but never 
> sending the pCPU idle. With this patch, the CPU at least has the chance to go 
> idle.
>
> Keep in mind that this patch does *not* advertise the mwait cpuid feature bit 
> to the guest.
>
> What you're referring to I guess is actual mwait emulation. That is indeed 
> more useful, but a bigger patch than this and needs some more thought on how 
> to properly cache the monitor'ed pages.
>
>
> Alex
>
>



Re: [Qemu-devel] [PATCH] event: Add signal information to SHUTDOWN

2017-04-12 Thread Markus Armbruster
Eric Blake  writes:

> On 04/12/2017 08:52 AM, Markus Armbruster wrote:
>>>> In other words, these three signals are polite requests to terminate
>>>> QEMU.
>>>>
>>>> Stefan, are there equivalent requests under Windows?  I guess there
>>>> might be one at least for SIGINT, namely whatever happens when you hit
>>>> ^C on the console.
>>>
>>> Mingw has SIGINT (C99 requires it), and that's presumably what happens
>>> for ^C,...
>>>
>>>> Could we arrange to run qemu_system_killed() then?
>>>
>>> ...but I don't know why it is not currently wired up to call
>>> qemu_system_killed(), nor do I have enough Windows programming expertise
>>> to try and write such a patch. But I think that is an orthogonal
>>> improvement.  On the other hand, mingw has a definition for SIGTERM (but
>>> I'm not sure how it gets triggered) and no definition at all for SIGHUP
>>> (as evidenced by the #ifdef'fery in the patch to get it to compile under
>>> docker targeting mingw).
>> 
>> If all we need is distinguishing host- and guest-initiated shutdown, then
>> detecting the latter reliably lets us stay away from OS-specific stuff.
>> Can we do that?
>
> I'll simplify what I can; I still can't guarantee that mingw will be
> setting the bool correctly in all cases, but setting a bool is easier
> than trying to set a signal name.
>
>>> There are other reasons too: a guest can request shutdown immediately
>>> before the host sends SIGINT. Based on when things are processed, you
>>> could see either the guest or the host as the initiator.  And the race
>>> is not entirely implausible - when trying to shut down a guest, libvirt
>>> first tries to inform the guest to initiate things (whether by interrupt
>>> or guest agent), but after a given amount of time, assumes the guest is
>>> unresponsive and resorts to a signal to qemu. A heavily loaded guest
>>> that takes its time in responding could easily overlap with the timeout
>>> resorting to a host-side action.
>> 
>> This race doesn't worry me.  If both host and guest have initiated a
>> shutdown, then reporting whichever of the two finishes first seems fair.
>
> So maybe I just tone down the docs and not even mention it.

Works for me.

>> Additional ways to terminate QEMU: HMP and QMP command "quit", and the
>> various GUI controls such "close SDL window".
>
> Good points. I have no idea what exit path those take (if they
> raise(SIGINT) internally, it's quite easy - but if they go through some
> other exit path, then I'll need to wire in something else).

Chasing down all host-initiated terminations looks like a fool's errand
to me.  But if we can reliably detect guest-initiated, we don't have to,
do we?


