date:20190925

Re: [PATCH v4 4/8] hw/i386: split PCMachineState deriving X86MachineState from it

2019-09-25 Philippe Mathieu-Daudé

On 9/24/19 3:40 PM, Philippe Mathieu-Daudé wrote:
> On 9/24/19 2:44 PM, Sergio Lopez wrote:
>> Split up PCMachineState and PCMachineClass and derive X86MachineState
>> and X86MachineClass from them. This allows sharing code with non-PC
>> machine types.
>>
>> Also, move shared functions from pc.c to x86.c.
>>
>> Signed-off-by: Sergio Lopez 
>> ---
>>  hw/acpi/cpu_hotplug.c |  10 +-
>>  hw/i386/Makefile.objs |   1 +
>>  hw/i386/acpi-build.c  |  31 +-
>>  hw/i386/amd_iommu.c   |   4 +-
>>  hw/i386/intel_iommu.c |   4 +-
>>  hw/i386/pc.c  | 796 +-
>>  hw/i386/pc_piix.c |  48 +--
>>  hw/i386/pc_q35.c  |  38 +-
>>  hw/i386/pc_sysfw.c|  60 +---
>>  hw/i386/x86.c | 788 +
>>  hw/intc/ioapic.c  |   3 +-
>>  include/hw/i386/pc.h  |  29 +-
>>  include/hw/i386/x86.h |  97 +
>>  13 files changed, 1045 insertions(+), 864 deletions(-)
>>  create mode 100644 hw/i386/x86.c
>>  create mode 100644 include/hw/i386/x86.h
>>
>> diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
>> index 6e8293aac9..3ac2045a95 100644
>> --- a/hw/acpi/cpu_hotplug.c
>> +++ b/hw/acpi/cpu_hotplug.c
>> @@ -128,7 +128,7 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>>  Aml *one = aml_int(1);
>>  MachineClass *mc = MACHINE_GET_CLASS(machine);
>>  const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
>> -PCMachineState *pcms = PC_MACHINE(machine);
>> +X86MachineState *x86ms = X86_MACHINE(machine);
>>  
>>  /*
>>   * _MAT method - creates an madt apic buffer
>> @@ -236,9 +236,9 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>>  /* The current AML generator can cover the APIC ID range [0..255],
>>   * inclusive, for VCPU hotplug. */
>>  QEMU_BUILD_BUG_ON(ACPI_CPU_HOTPLUG_ID_LIMIT > 256);
>> -if (pcms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
>> +if (x86ms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
>>  error_report("max_cpus is too large. APIC ID of last CPU is %u",
>> - pcms->apic_id_limit - 1);
>> + x86ms->apic_id_limit - 1);
>>  exit(1);
>>  }
>>  
>> @@ -315,8 +315,8 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>>   * ith up to 255 elements. Windows guests up to win2k8 fail when
>>   * VarPackageOp is used.
>>   */
>> -pkg = pcms->apic_id_limit <= 255 ? aml_package(pcms->apic_id_limit) :
>> -   aml_varpackage(pcms->apic_id_limit);
>> +pkg = x86ms->apic_id_limit <= 255 ? aml_package(x86ms->apic_id_limit) :
>> +   aml_varpackage(x86ms->apic_id_limit);
>>  
>>  for (i = 0, apic_idx = 0; i < apic_ids->len; i++) {
>>  int apic_id = apic_ids->cpus[i].arch_id;
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 149712db07..5b4b3a672e 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -1,6 +1,7 @@
>>  obj-$(CONFIG_KVM) += kvm/
>>  obj-y += multiboot.o
>>  obj-y += pvh.o
>> +obj-y += x86.o
>>  obj-y += pc.o
>>  obj-y += e820.o
>>  obj-$(CONFIG_I440FX) += pc_piix.o
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index e54e571a75..76e18d3285 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -29,6 +29,7 @@
>>  #include "hw/pci/pci.h"
>>  #include "hw/core/cpu.h"
>>  #include "target/i386/cpu.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/misc/pvpanic.h"
>>  #include "hw/timer/hpet.h"
>>  #include "hw/acpi/acpi-defs.h"
>> @@ -361,6 +362,7 @@ static void
>>  build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
>>  {
>>  MachineClass *mc = MACHINE_GET_CLASS(pcms);
>> +X86MachineState *x86ms = X86_MACHINE(pcms);
>>  const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(pcms));
>>  int madt_start = table_data->len;
>>  AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(pcms->acpi_dev);
>> @@ -390,7 +392,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
>>  io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
>>  io_apic->interrupt = cpu_to_le32(0);
>>  
>> -if (pcms->apic_xrupt_override) {
>> +if (x86ms->apic_xrupt_override) {
>>  intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
>>  intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
>>  intsrcovr->length = sizeof(*intsrcovr);
>> @@ -1817,8 +1819,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>  CrsRangeEntry *entry;
>>  Aml *dsdt, *sb_scope, *scope, *dev, *method, *field, *pkg, *crs;
>>  CrsRangeSet crs_range_set;
>> -PCMachineState *pcms = PC_MACHINE(machine);
>>  PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
>> +X86MachineState *x86ms = X86_MACHINE(machine);
>>  AcpiMcfgInfo mcfg;
>>  uint32_t nr_mem = machine->ra

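The split follows QEMU's usual pattern of deriving one state struct from another by embedding the parent as the first member, so shared code can take the x86 view of any machine. Below is a minimal plain-C sketch of that layout under stated assumptions: the field set is reduced, `max_cpus` and `pc_only_setting` are hypothetical fields, and `SKETCH_X86_MACHINE()`/`SKETCH_PC_MACHINE()` are unchecked stand-ins for QOM's `X86_MACHINE()`/`PC_MACHINE()`, which can additionally verify the object type.

```c
#include <assert.h>

/*
 * Hypothetical, stripped-down model of the hierarchy the patch builds:
 * MachineState <- X86MachineState <- PCMachineState. Each parent struct is
 * the first member of its child, so a pointer to the child can be converted
 * to a pointer to the parent (and back, if the object really is a child).
 */
typedef struct MachineState {
    unsigned max_cpus;              /* hypothetical generic field */
} MachineState;

typedef struct X86MachineState {
    MachineState parent_obj;        /* must stay first */
    unsigned apic_id_limit;         /* moved out of PCMachineState */
    int apic_xrupt_override;        /* moved out of PCMachineState */
} X86MachineState;

typedef struct PCMachineState {
    X86MachineState parent_obj;     /* must stay first */
    int pc_only_setting;            /* hypothetical PC-specific field */
} PCMachineState;

/* Unchecked stand-ins for QEMU's QOM cast macros. */
#define SKETCH_X86_MACHINE(obj) ((X86MachineState *)(obj))
#define SKETCH_PC_MACHINE(obj)  ((PCMachineState *)(obj))

/* Shared code needs only the x86 view, so non-PC machines can use it too. */
static int apic_ids_fit(MachineState *machine, unsigned id_limit)
{
    X86MachineState *x86ms = SKETCH_X86_MACHINE(machine);

    return x86ms->apic_id_limit <= id_limit;
}
```

A non-PC x86 machine would embed `X86MachineState` the same way and reuse `apic_ids_fit()` unchanged, which is the code-sharing the commit message is after.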
Re: [PATCH V3] target/riscv: Bugfix reserved bits in PTE for RV64

2019-09-25 Guo Ren

I think your thoughts are wrong.
The specification is very clear: these bits are not part of ppn, not
part of the translation target address. The current code is against
the riscv-privilege specification.

On Wed, Sep 25, 2019 at 11:20 PM Jonathan Behrens  wrote:
>
> Any code whose behavior is changed by this patch is wrong, so it doesn't seem 
> like it matters much whether this is merged or not. Personally I'd lean 
> towards making sure that attempts to use PTEs with the reserved bit set would 
> always fault, rather than wrapping around to low memory and perhaps silently 
> succeeding.
>
> Jonathan
>
> On Wed, Sep 25, 2019 at 8:29 AM Guo Ren  wrote:
>>
>> On Wed, Sep 25, 2019 at 1:19 PM Alistair Francis  wrote:
>> >
>> > On Tue, Sep 24, 2019 at 9:48 PM  wrote:
>> > >
>> > > From: Guo Ren 
>> > >
>> > > The highest 10 bits of the PTE are reserved in riscv-privileged (ref: [1]),
>> > > so we need to ignore them. They cannot be part of the PPN.
>> > >
>> > > 1: The RISC-V Instruction Set Manual, Volume II: Privileged Architecture
>> > >4.4 Sv39: Page-Based 39-bit Virtual-Memory System
>> > >4.5 Sv48: Page-Based 48-bit Virtual-Memory System
>> >
>> > Hey,
>> >
>> > As I mentioned on patch 2 I don't think this is right. It isn't up to
>> > HW to clear these bits, software should keep them clear.
>>
>> These bits are not part of ppn in spec, so we still need to ignore them for ppn.
>>
>> The patch is reasonable.
>>
>> --
>> Best Regards
>>  Guo Ren
>>
>> ML: https://lore.kernel.org/linux-csky/
>>


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

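To make the disagreement concrete: in Sv39/Sv48 the PTE keeps flags in bits 9..0, the PPN in bits 53..10, and reserves bits 63..54. The sketch below (hypothetical helper names, 64-bit arithmetic as on RV64) contrasts treating everything above the flags as the PPN with masking to the 44 PPN bits first. Because the PPN is shifted left by 12 to form the address, reserved bits 62 and 63 fall off the top in either version, while bits 61..54 leak into physical-address bits 63..56 only in the unmasked one.

```c
#include <assert.h>
#include <stdint.h>

/* Sv39/Sv48 PTE layout: flags 9..0, PPN 53..10, reserved 63..54. */
#define PTE_PPN_SHIFT 10
#define PTE_PPN_BITS  44
#define PTE_PPN_MASK  ((UINT64_C(1) << PTE_PPN_BITS) - 1)
#define PAGE_SHIFT    12

/* Everything above the flags is taken as the PPN (reserved bits included). */
static uint64_t pte_to_paddr_unmasked(uint64_t pte)
{
    return (pte >> PTE_PPN_SHIFT) << PAGE_SHIFT;
}

/* Reserved bits 63..54 are dropped before forming the address. */
static uint64_t pte_to_paddr_masked(uint64_t pte)
{
    return ((pte >> PTE_PPN_SHIFT) & PTE_PPN_MASK) << PAGE_SHIFT;
}
```

For PTEs whose reserved bits are zero both helpers return the same address, which is Jonathan's point that spec-conforming software sees no difference.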
Re: [PATCH V3] target/riscv: Bugfix reserved bits in PTE for RV64

2019-09-25 Jonathan Behrens

> The specification is very clear: these bits are not part of ppn, not
> part of the translation target address. The current code is against
> the riscv-privilege specification.

If all of the reserved bits are zero, then the patch changes nothing.
Further, the only normative mention of the reserved bits in the spec
says: "Bits 63–54 are reserved for future use and must be zeroed by
software for forward compatibility." Provided that software follows
the spec, current QEMU will behave properly. For software that ignores
that directive and sets some of those bits, the spec says nothing
about what hardware should do, so both the old and the new behavior
are fine.

Jonathan

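The stricter option Jonathan leans towards earlier in the thread, making any PTE with reserved bits set fault instead of translating it, could look roughly like this. The function name and the two-value result enum are hypothetical; a real page-table walker would also check permission bits and walk multiple levels.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V             UINT64_C(1)
#define PTE_RESERVED_MASK (~UINT64_C(0) << 54)   /* bits 63..54 */

typedef enum { WALK_OK, WALK_PAGE_FAULT } WalkResult;

/* Hypothetical leaf check: refuse reserved bits rather than ignore them. */
static WalkResult check_pte(uint64_t pte)
{
    if (!(pte & PTE_V)) {
        return WALK_PAGE_FAULT;          /* not a valid mapping */
    }
    if (pte & PTE_RESERVED_MASK) {
        return WALK_PAGE_FAULT;          /* software broke the "must be zero" rule */
    }
    return WALK_OK;
}
```

Since the spec leaves hardware behavior for nonzero reserved bits unspecified, this is one permissible choice, not the only one.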
Re: [PATCH v2 2/2] spapr/irq: Only claim VALID interrupts at the KVM level

2019-09-25 Greg Kurz

On Mon, 16 Sep 2019 10:44:32 +1000
David Gibson  wrote:

> On Wed, Sep 11, 2019 at 03:39:37PM +0200, Cédric Le Goater wrote:
> > A typical pseries VM with 16 vCPUs, one disk, one network adapter
> > uses less than 100 interrupts but the whole IRQ number space of the
> > QEMU machine is allocated at reset time and it is 8K wide. This is
> > wasting a considerable amount of interrupt numbers in the global IRQ
> > space which has 1M interrupts per socket on a POWER9.
> > 
> > To optimise the HW resources, only request at the KVM level interrupts
> > which have been claimed by the guest. This will help to increase the
> > maximum number of VMs per system and also help support nested guests
> > using the XIVE interrupt mode.
> > 
> > Signed-off-by: Cédric Le Goater 
> 
> Applied to ppc-for-4.2, thanks.
> 

While experimenting with 4.1->4.2 migration using your irq cleanup series,
I've hit this:

qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed: Group 3 attr 0x1300: Invalid argument
qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr'
qemu-system-ppc64: load of migration failed: Operation not permitted

The EINVAL failure when restoring the source config seems to come from
the following check in kvmppc_xive_native_set_source_config():

    if (!state->valid)
        return -EINVAL;

which makes sense since we haven't requested any interrupt yet.

We should hence request the interrupts at post load, before restoring
the source config.

I'll send a patch ASAP.
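The ordering problem described above can be modelled in a few lines. In this sketch all names are hypothetical: `kvm_set_source_config()` refuses a source that was never claimed, mirroring the `state->valid` check, and the post-load loop either claims each guest-valid source first or does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_IRQS 8

/* Hypothetical stand-in for the in-kernel per-source state. */
typedef struct {
    bool valid;        /* source claimed in KVM (kernel's state->valid) */
    bool configured;   /* source config restored */
} KvmSource;

/* Hypothetical stand-in for QEMU's per-IRQ EAS entry. */
typedef struct {
    bool eas_valid;    /* guest has claimed this IRQ */
} Eas;

static int kvm_source_reset_one(KvmSource *src)
{
    src->valid = true;
    return 0;
}

static int kvm_set_source_config(KvmSource *src)
{
    if (!src->valid) {
        return -EINVAL;   /* same check as the kernel code quoted above */
    }
    src->configured = true;
    return 0;
}

/* Destination-side post load: optionally claim before configuring. */
static int post_load(KvmSource *kvm, const Eas *eat, int n, bool claim_first)
{
    for (int i = 0; i < n; i++) {
        int ret;

        if (!eat[i].eas_valid) {
            continue;
        }
        if (claim_first) {
            ret = kvm_source_reset_one(&kvm[i]);
            if (ret < 0) {
                return ret;
            }
        }
        ret = kvm_set_source_config(&kvm[i]);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

With `claim_first == false` the destination reproduces the EINVAL from the log; claiming first lets the config restore succeed while still leaving unused IRQs unclaimed.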

> > ---
> >  hw/intc/spapr_xive_kvm.c | 29 ++---
> >  hw/intc/xics_kvm.c   |  8 
> >  2 files changed, 34 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> > index 17af4d19f54e..71b88d7797bc 100644
> > --- a/hw/intc/spapr_xive_kvm.c
> > +++ b/hw/intc/spapr_xive_kvm.c
> > @@ -255,11 +255,16 @@ void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
> >  
> >  static void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
> >  {
> > +SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
> >  int i;
> >  
> >  for (i = 0; i < xsrc->nr_irqs; i++) {
> >  Error *local_err = NULL;
> >  
> > +if (!xive_eas_is_valid(&xive->eat[i])) {
> > +continue;
> > +}
> > +
> >  kvmppc_xive_source_reset_one(xsrc, i, &local_err);
> >  if (local_err) {
> >  error_propagate(errp, local_err);
> > @@ -328,11 +333,18 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
> >  
> >  static void kvmppc_xive_source_get_state(XiveSource *xsrc)
> >  {
> > +SpaprXive *xive = SPAPR_XIVE(xsrc->xive);
> >  int i;
> >  
> >  for (i = 0; i < xsrc->nr_irqs; i++) {
> > +uint8_t pq;
> > +
> > +if (!xive_eas_is_valid(&xive->eat[i])) {
> > +continue;
> > +}
> > +
> >  /* Perform a load without side effect to retrieve the PQ bits */
> > -uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
> > +pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
> >  
> >  /* and save PQ locally */
> >  xive_source_esb_set(xsrc, i, pq);
> > @@ -521,9 +533,14 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
> >   */
> >  if (running) {
> >  for (i = 0; i < xsrc->nr_irqs; i++) {
> > -uint8_t pq = xive_source_esb_get(xsrc, i);
> > +uint8_t pq;
> >  uint8_t old_pq;
> >  
> > +if (!xive_eas_is_valid(&xive->eat[i])) {
> > +continue;
> > +}
> > +
> > +pq = xive_source_esb_get(xsrc, i);
> >  old_pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8));
> >  
> >  /*
> > @@ -545,7 +562,13 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
> >   * migration is in progress.
> >   */
> >  for (i = 0; i < xsrc->nr_irqs; i++) {
> > -uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
> > +uint8_t pq;
> > +
> > +if (!xive_eas_is_valid(&xive->eat[i])) {
> > +continue;
> > +}
> > +
> > +pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
> >  
> >  /*
> >   * PQ is set to PENDING to possibly catch a triggered
> > diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> > index a4d2e876cc5f..ba90d6dc966c 100644
> > --- a/hw/intc/xics_kvm.c
> > +++ b/hw/intc/xics_kvm.c
> > @@ -190,6 +190,10 @@ void ics_get_kvm_state(ICSState *ics)
> >  for (i = 0; i < ics->nr_irqs; i++) {
> >  ICSIRQState *irq = &ics->irqs[i];
> >  
> > +if (ics_irq_free(ics, i)) {
> > +continue;
> > +}
> > +
> >  kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
> >i + ics->offset, &state, false, &error_fatal);
> >  
> > @@ -301,6 +305,10 @@ int ics_set_kvm_state(ICSState *ics, Error **errp)
> >  Error *lo

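The loop shape this patch adds throughout spapr_xive_kvm.c is "skip any source whose EAS entry is not valid". A minimal sketch, with a hypothetical claim counter standing in for the KVM calls, shows how that keeps the number of KVM-level claims proportional to what the guest actually uses rather than to the 8K-wide IRQ number space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_IRQS 8192   /* the 8K-wide machine IRQ number space */

/* Hypothetical EAS entry: only the valid bit matters here. */
typedef struct {
    bool valid;
} SketchEas;

/* Hypothetical stand-in for kvmppc_xive_source_reset_one(). */
static size_t kvm_claims;
static void claim_in_kvm(int srcno)
{
    (void)srcno;
    kvm_claims++;
}

/* Reset only what the guest claimed, instead of the whole IRQ space. */
static void source_reset(const SketchEas *eat, int nr_irqs)
{
    for (int i = 0; i < nr_irqs; i++) {
        if (!eat[i].valid) {
            continue;
        }
        claim_in_kvm(i);
    }
}
```

For a guest using roughly 100 interrupts, this claims about 100 sources in KVM instead of 8192.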
Re: [PATCH v3 05/25] scripts: add coccinelle script to fix error_append_hint usage

2019-09-25 Vladimir Sementsov-Ogievskiy

24.09.2019 23:48, Eric Blake wrote:
> On 9/24/19 3:08 PM, Vladimir Sementsov-Ogievskiy wrote:
>> error_append_hint will not work if errp == &error_fatal, as the program
>> will exit before the error_append_hint call. Fix this by using the special
>> macro ERRP_FUNCTION_BEGIN.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
> 
> With the approach of a partial cleanup (rather than globally enforcing
> it for all functions with errp parameter), we'll probably be rerunning
> this Coccinelle script regularly, to track down any regressions.
> 
> 
>> +++ b/scripts/coccinelle/fix-error_append_hint-usage.cocci
>> @@ -0,0 +1,25 @@
>> +@rule0@
>> +// Add invocation to errp-functions
>> +identifier fn;
>> +@@
>> +
>> + fn(..., Error **errp, ...)
>> + {
>> ++   ERRP_FUNCTION_BEGIN();
>> +<+...
>> +error_append_hint(errp, ...);
>> +...+>
>> + }
> 
> Does not catch the case that we want to also use the macro for any use
> of *errp, but we can augment that later.

I don't want to include *errp here, as actually a lot of *errp dereferences in
the code are correct: they happen only after checking that errp is not NULL.
So, it's not related to plan B.

Still, I think we forgot about error_prepend :)))

I've checked: if I include error_prepend here, the series becomes 30 patches,
which is not significantly larger. So I think I'll cover error_prepend in v4.

> 
>> +
>> +@@
>> +// Drop doubled invocation
>> +identifier rule0.fn;
>> +@@
>> +
>> + fn(...)
>> +{
>> +ERRP_FUNCTION_BEGIN();
>> +-   ERRP_FUNCTION_BEGIN();
>> +...
>> +}
> 
> This is smaller than the script you posted in v2, and thus I'm a bit
> more confident in stating that it looks correct and idempotent.
> 
> Reviewed-by: Eric Blake 
> 


-- 
Best regards,
Vladimir

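The failure mode behind this script can be reproduced with a toy model. Everything below is a hypothetical simplification, not QEMU's real Error API: `sketch_error_setg()` "exits" (records the report instead of calling `exit()`) as soon as it sees `&error_fatal`, so a hint appended afterwards never reaches the user. Routing the error through a local `Error *` and propagating it at the end, which is roughly what the proposed `ERRP_FUNCTION_BEGIN()` is meant to automate, keeps the hint.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of QEMU's Error API; not the real one. */
typedef struct Error {
    char msg[64];
    char hint[64];
} Error;

static Error *error_fatal;          /* callers pass &error_fatal */
static Error error_pool[8];
static int error_pool_used;
static char fatal_report[128];      /* text that would be printed before exit(1) */

static void report_fatal(const Error *err)
{
    /* Real QEMU prints the error and exits; we record the text instead. */
    snprintf(fatal_report, sizeof(fatal_report), "%s%s", err->msg, err->hint);
}

static void sketch_error_setg(Error **errp, const char *msg)
{
    Error *err;

    assert(error_pool_used < 8);
    err = &error_pool[error_pool_used++];
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    err->hint[0] = '\0';
    if (errp == &error_fatal) {
        report_fatal(err);          /* "exits" here: later hints are lost */
        return;
    }
    if (errp) {
        *errp = err;
    }
}

static void sketch_error_append_hint(Error **errp, const char *hint)
{
    if (errp && *errp) {
        Error *err = *errp;
        strncat(err->hint, hint, sizeof(err->hint) - strlen(err->hint) - 1);
    }
}

static void sketch_error_propagate(Error **dst, Error *local_err)
{
    if (!local_err) {
        return;
    }
    if (dst == &error_fatal) {
        report_fatal(local_err);
    } else if (dst) {
        *dst = local_err;
    }
}

/* Uses errp directly: with &error_fatal the hint never reaches the user. */
static void check_value_direct(Error **errp)
{
    sketch_error_setg(errp, "value too large");
    sketch_error_append_hint(errp, " (try a smaller one)");
}

/* Local Error plus propagate: the pattern the macro would automate. */
static void check_value_local(Error **errp)
{
    Error *local_err = NULL;

    sketch_error_setg(&local_err, "value too large");
    sketch_error_append_hint(&local_err, " (try a smaller one)");
    sketch_error_propagate(errp, local_err);
}
```

Both functions behave identically for an ordinary `Error *` caller; they differ only when the caller passes `&error_fatal`.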
Re: [PATCH v4 09/16] cputlb: Move NOTDIRTY handling from I/O path to TLB path

2019-09-25 Alex Bennée



Richard Henderson  writes:

> Pages that we want to track for NOTDIRTY are RAM.  We do not
> really need to go through the I/O path to handle them.
>
> Acked-by: David Hildenbrand 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Richard Henderson 
> ---
>  include/exec/cpu-common.h |  2 --
>  accel/tcg/cputlb.c| 26 +---
>  exec.c| 50 ---
>  memory.c  | 16 -
>  4 files changed, 23 insertions(+), 71 deletions(-)

Nice clean-up ;)

Reviewed-by: Alex Bennée 


>
> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> index 1c0e03ddc2..81753bbb34 100644
> --- a/include/exec/cpu-common.h
> +++ b/include/exec/cpu-common.h
> @@ -100,8 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
>
>  void cpu_flush_icache_range(hwaddr start, hwaddr len);
>
> -extern struct MemoryRegion io_mem_notdirty;
> -
>  typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
>
>  int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index af9a44a847..05212ff244 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -904,7 +904,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>  mr = section->mr;
>  mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
>  cpu->mem_io_pc = retaddr;
> -if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> +if (!cpu->can_do_io) {
>  cpu_io_recompile(cpu, retaddr);
>  }
>
> @@ -945,7 +945,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>  section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
>  mr = section->mr;
>  mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> -if (mr != &io_mem_notdirty && !cpu->can_do_io) {
> +if (!cpu->can_do_io) {
>  cpu_io_recompile(cpu, retaddr);
>  }
>  cpu->mem_io_vaddr = addr;
> @@ -1607,7 +1607,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>  need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
>
>  /* Handle I/O access.  */
> -if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
> +if (tlb_addr & TLB_MMIO) {
>  io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
>op ^ (need_swap * MO_BSWAP));
>  return;
> @@ -1620,6 +1620,26 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
>
>  haddr = (void *)((uintptr_t)addr + entry->addend);
>
> +/* Handle clean RAM pages.  */
> +if (tlb_addr & TLB_NOTDIRTY) {
> +NotDirtyInfo ndi;
> +
> +/* We require mem_io_pc in tb_invalidate_phys_page_range.  */
> +env_cpu(env)->mem_io_pc = retaddr;
> +
> +memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
> +  addr + iotlbentry->addr, size);
> +
> +if (unlikely(need_swap)) {
> +store_memop(haddr, val, op ^ MO_BSWAP);
> +} else {
> +store_memop(haddr, val, op);
> +}
> +
> +memory_notdirty_write_complete(&ndi);
> +return;
> +}
> +
>  if (unlikely(need_swap)) {
>  store_memop(haddr, val, op ^ MO_BSWAP);
>  } else {
> diff --git a/exec.c b/exec.c
> index ea8c0b18ac..dc7001f115 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -88,7 +88,6 @@ static MemoryRegion *system_io;
>  AddressSpace address_space_io;
>  AddressSpace address_space_memory;
>
> -MemoryRegion io_mem_notdirty;
>  static MemoryRegion io_mem_unassigned;
>  #endif
>
> @@ -191,7 +190,6 @@ typedef struct subpage_t {
>  } subpage_t;
>
>  #define PHYS_SECTION_UNASSIGNED 0
> -#define PHYS_SECTION_NOTDIRTY 1
>
>  static void io_mem_init(void);
>  static void memory_map_init(void);
> @@ -1472,9 +1470,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
>  if (memory_region_is_ram(section->mr)) {
>  /* Normal RAM.  */
>  iotlb = memory_region_get_ram_addr(section->mr) + xlat;
> -if (!section->readonly) {
> -iotlb |= PHYS_SECTION_NOTDIRTY;
> -}
>  } else {
>  AddressSpaceDispatch *d;
>
> @@ -2783,42 +2778,6 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
>  }
>  }
>
> -/* Called within RCU critical section.  */
> -static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
> -   uint64_t val, unsigned size)
> -{
> -NotDirtyInfo ndi;
> -
> -memory_notdirty_write_prepare(&ndi, current_cpu, current_cpu->mem_io_vaddr,
> - ram_addr, size);
> -
> -stn_p(qemu_map_ram_ptr(NULL, ram_addr), size, val);
> -memory_notdirty_write_complete(&ndi);
> -}
> -
> -static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
> - unsigned size, bool is_write,
> -

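A reduced model of the store path after this patch, with hypothetical flag values and counters in place of real I/O: only `TLB_MMIO` pages take the I/O route, and clean RAM pages get their dirty-tracking hook called before the host memory is written directly. In this patch the hook is the prepare/complete pair wrapped around the store; the sketch calls a single hook before the store, as the later merged `notdirty_write` does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits carried in a TLB entry's write address. */
#define SK_TLB_MMIO     (1u << 0)
#define SK_TLB_NOTDIRTY (1u << 1)

typedef struct {
    int io_writes;       /* stores routed to the device (I/O) path */
    int notdirty_calls;  /* clean-page bookkeeping calls */
} SkCounters;

/* Stand-in for the dirty-tracking hook. */
static void sk_notdirty(SkCounters *c) { c->notdirty_calls++; }

/* Stand-in for io_writex(). */
static void sk_io_write(SkCounters *c) { c->io_writes++; }

/* Post-patch shape of store_helper(): only MMIO takes the I/O path. */
static void sk_store32(SkCounters *c, unsigned tlb_flags, void *haddr,
                       uint32_t val)
{
    if (tlb_flags & SK_TLB_MMIO) {
        sk_io_write(c);
        return;
    }
    if (tlb_flags & SK_TLB_NOTDIRTY) {
        sk_notdirty(c);   /* clean RAM: track it, then store normally */
    }
    memcpy(haddr, &val, sizeof(val));
}
```

The design point is that a clean RAM page still has a valid host pointer, so the store itself never needs the I/O dispatch; only the bookkeeping differs.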
Re: QEMU bitmap backup usability FAQ

2019-09-25 John Snow




On 9/25/19 11:11 AM, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2019 16:52, John Snow wrote:
>>
>>
>> On 8/20/19 6:25 PM, John Snow wrote:
>>> Hi, downstream here at Red Hat I've been fielding some questions about
>>> the usability and feature readiness of Bitmaps (and related features) in
>>> QEMU.
>>>
>>> Here are some questions I answered internally that I am copying to the
>>> list for two reasons:
>>>
>>> (1) To make sure my answers are actually correct, and
>>> (2) To share this pseudo-reference with the community at large.
>>>
>>> This is long, and mostly for reference. There's a summary at the bottom
>>> with some todo items and observations about the usability of the feature
>>> as it exists in QEMU.
>>>
>>> Before too long, I intend to send a more summarized "roadmap" mail which
>>> details all of the current and remaining work to be done in and around
>>> the bitmaps feature in QEMU.
>>>
>>>
>>> Questions:
>>>
>>> "What format(s) is/are required for this functionality?"
>>>
>>>  From the QEMU API, any format can be used to create and author
>>> incremental backups. The only known format limitations are:
>>>
>>> 1. Persistent bitmaps cannot be created on any format except qcow2,
>>> although there are hooks to add support to other formats at a later date
>>> if desired.
>>>
>>> DANGER CAVEAT #1: Adding bitmaps to QEMU by default creates transient
>>> bitmaps instead of persistent ones.
>>>
>>> Possible TODO: Allow users to 'upgrade' transient bitmaps to persistent
>>> ones in case they made a mistake.
>>>
>>>
>>> 2. When using push backups (blockdev-backup, drive-backup), you may use
>>> any format as a target format.
>>>
>>> DANGER CAVEAT #2: without backing file and/or filesystem-less sparse
>>> support, these images will be unusable.
>>>
>>> EXAMPLE: Backing up to a raw file loses allocation information, so we
>>> can no longer distinguish between zeroes and unallocated regions. The
>>> cluster size is also lost. This file will not be usable without
>>> additional metadata recorded elsewhere.*
>>>
>>> (* This is complicated, but it is in theory possible to do a push backup
>>> to e.g. an NBD target with custom server code that saves allocation
>>> information to a metadata file, which would allow you to reconstruct
>>> backups. For instance, recording in a .json file which extents were
>>> written out would allow you to -- with a custom binary -- write this
>>> information on top of a base file to reconstruct a backup.)
>>>
>>>
>>> 3. Any format can be used for either shared storage or live storage
>>> migrations. There are TWO distinct mechanisms for migrating bitmaps:
>>>
>>> A) The bitmap is flushed to storage and re-opened on the destination.
>>> This is only supported for qcow2 and shared-storage migrations.
>>>
>>> B) The bitmap is live-migrated to the destination. This is supported for
>>> any format and can be used for either shared storage or live storage
>>> migrations.
>>>
>>> DANGER CAVEAT #3: The second bitmap migration technique (B) relies on an
>>> optional migration capability that must be enabled explicitly.
>>> Otherwise, some migration combinations may drop bitmaps.
>>>
>>> Matrix:
>>>
>>> migrate = migrate_capability or (persistent and shared_storage)
>>>
>>> Enumerated:
>>>
>>> live-storage + raw : transient + no-capability: Dropped
>>> live-storage + raw : transient + bm-capability: Migrated
>>> live-storage + qcow2 : transient + no-capability: Dropped
>>> live-storage + qcow2 : transient + bm-capability: Migrated
>>> live-storage + qcow2 : persistent + no-capability: Dropped (!)
>>> live-storage + qcow2 : persistent + bm-capability: Migrated
>>>
>>> shared-storage + raw : transient + no-capability: Dropped
>>> shared-storage + raw : transient + bm-capability: Migrated
>>> shared-storage + qcow2 : transient + no-capability: Dropped
>>> shared-storage + qcow2 : transient + bm-capability: Migrated
>>> shared-storage + qcow2 : persistent + no-capability: Migrated
>>> shared-storage + qcow2 : persistent + bm-capability: Migrated
>>>
>>> Enabling the bitmap migration capability will ALWAYS migrate the bitmap.
>>> If it's disabled, we will only migrate the bitmaps for shared storage
>>> migrations where the bitmap is persistent, which is a qcow2-only case.
>>>
>>> There is no warning or error if you attempt to migrate in a manner that
>>> loses your bitmaps.
>>>
>>> (I might be persuaded to add a case for when you are doing a live
>>> storage migration of qcow2 with persistent bitmaps, which is a somewhat
>>> conflicting case: you've asked for the bitmap to be persistent, but it
>>> seems likely that if this ever happens in practice, it's because you
>>> have neglected to ask for it to be migrated to the new host.)
>>>
>>> See iotest 169 for more details on this.
>>>
>>> At present, these are the only format limitations I am consciously aware
>>> of. From a management API/GUI perspective, it makes sense to restrict
>>> the feature set to "qcow2 only" to minimize edge cases.
>

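The migration matrix in the FAQ above reduces to the one-line rule John gives. A sketch encoding it (hypothetical function and enum names) lets each enumerated combination be checked mechanically. Under this rule a transient qcow2 bitmap on shared storage without the capability is dropped, consistent with the prose that only persistent bitmaps survive when the capability is off.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { FMT_RAW, FMT_QCOW2 } Format;

/*
 * migrate = migrate_capability or (persistent and shared_storage)
 * Persistent bitmaps can only be created on qcow2, so a persistent
 * bitmap on any other format is treated as invalid input.
 */
static bool bitmap_survives_migration(Format fmt, bool persistent,
                                      bool shared_storage, bool bm_capability)
{
    assert(!persistent || fmt == FMT_QCOW2);
    (void)fmt;  /* only used by the assertion */
    return bm_capability || (persistent && shared_storage);
}
```

The "(!)" row, live storage with a persistent qcow2 bitmap and no capability, is the case where the function returns false despite the user having asked for persistence.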
Re: [PATCH v4 08/16] cputlb: Move ROM handling from I/O path to TLB path

2019-09-25 Alex Bennée



David Hildenbrand  writes:

> On 25.09.19 02:16, Alex Bennée wrote:
>>
>> Richard Henderson  writes:
>>
>>> It does not require going through the whole I/O path
>>> in order to discard a write.
>>>
>>> Reviewed-by: David Hildenbrand 
>>> Signed-off-by: Richard Henderson 
>>> ---
>>>  include/exec/cpu-all.h|  5 -
>>>  include/exec/cpu-common.h |  1 -
>>>  accel/tcg/cputlb.c| 35 +++--
>>>  exec.c| 41 +--
>>>  4 files changed, 25 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
>>> index d148bded35..26547cd6dd 100644
>>> --- a/include/exec/cpu-all.h
>>> +++ b/include/exec/cpu-all.h
>> 
>>> @@ -822,16 +821,17 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>>>
>>>  tn.addr_write = -1;
>>>  if (prot & PAGE_WRITE) {
>>> -if ((memory_region_is_ram(section->mr) && section->readonly)
>>> -|| memory_region_is_romd(section->mr)) {
>>> -/* Write access calls the I/O callback.  */
>>> -tn.addr_write = address | TLB_MMIO;
>>> -} else if (memory_region_is_ram(section->mr)
>>> -   && cpu_physical_memory_is_clean(
>>> -   memory_region_get_ram_addr(section->mr) + xlat)) {
>>> -tn.addr_write = address | TLB_NOTDIRTY;
>>> -} else {
>>> -tn.addr_write = address;
>>> +tn.addr_write = address;
>>> +if (memory_region_is_romd(section->mr)) {
>>> +/* Use the MMIO path so that the device can switch states. */
>>> +tn.addr_write |= TLB_MMIO;
>>> +} else if (memory_region_is_ram(section->mr)) {
>>> +if (section->readonly) {
>>> +tn.addr_write |= TLB_ROM;
>>> +} else if (cpu_physical_memory_is_clean(
>>> +memory_region_get_ram_addr(section->mr) + xlat)) {
>>> +tn.addr_write |= TLB_NOTDIRTY;
>>> +}
>>
>> This reads a bit weird because we are saying romd isn't a ROM but
>> something that identifies as RAM can be ROM rather than just a memory
>> protected piece of RAM.
>>
>
> I proposed a bunch of alternatives in reply to v3 (e.g.,
> TLB_DISCARD_WRITES), either Richard missed them or I missed his reply
> :)

That certainly passes the "does what it says on the tin" test.

>
>>>  }
>>>  if (prot & PAGE_WRITE_INV) {
>>>  tn.addr_write |= TLB_INVALID_MASK;
>>
>> So at the moment I don't see what the TLB_ROM flag gives us that setting
>> TLB_INVALID doesn't - either way we won't make the write to our
>> ram-not-ram-rom.
>
> TLB_INVALID will trigger a new MMU translation on every access to fill
> the TLB. TLB_ROM states that we have a valid entry, but that writes are
> to be discarded.

Ahh yes, I didn't notice it because it's hidden in the tlb_hit check.

--
Alex Bennée

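David's distinction between TLB_ROM and TLB_INVALID can be checked with a small model of the hit test. As in QEMU's `tlb_hit_page()`, the invalid bit takes part in the comparison while other flag bits are masked out; the flag values, struct, and function names here are hypothetical. A ROM entry keeps hitting and its writes are dropped, while an INVALID entry misses and is refilled on every access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_BITS    12
#define SK_PAGE_MASK    (~UINT64_C(0) << SK_PAGE_BITS)
/* Hypothetical flag bits living below the page offset, as in cputlb. */
#define SK_TLB_INVALID  (UINT64_C(1) << 11)
#define SK_TLB_ROM      (UINT64_C(1) << 10)

/* Same shape as tlb_hit_page(): INVALID takes part in the comparison,
 * other flags such as ROM are masked out. */
static bool sk_tlb_hit(uint64_t tlb_addr, uint64_t addr)
{
    return (addr & SK_PAGE_MASK) ==
           (tlb_addr & (SK_PAGE_MASK | SK_TLB_INVALID));
}

typedef struct {
    int refills;     /* MMU walks needed to (re)fill the entry */
    int discarded;   /* writes dropped because the page is ROM */
} SkStats;

/* One guest store through a single TLB entry. The refill installs the
 * same flags again, as it would for a page whose mapping is unchanged. */
static void sk_guest_store(SkStats *s, uint64_t *entry, uint64_t refill_value,
                           uint64_t addr)
{
    if (!sk_tlb_hit(*entry, addr)) {
        s->refills++;
        *entry = refill_value;
    }
    if (*entry & SK_TLB_ROM) {
        s->discarded++;   /* valid entry, write silently discarded */
    }
}
```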
[PATCH] spapr/irq: Fix migration of older machine types with XIVE

2019-09-25 Greg Kurz

Recent patch "spapr/irq: Only claim VALID interrupts at the KVM level"
broke migration of older machine types started with ic-mode=xive:

qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed: Group 3 attr 0x1300: Invalid argument
qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr'
qemu-system-ppc64: load of migration failed: Operation not permitted

This is because we should set the interrupt source in KVM at post load,
since we no longer do it unconditionally at reset time for all interrupts.

Signed-off-by: Greg Kurz 
---

David,

I guess you should probably fold this fix directly into Cedric's
patch (currently SHA1 966d526cdfd9 in ppc-for-4.2) to avoid
bisection breakage.
---
 hw/intc/spapr_xive_kvm.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 71b88d7797bc..2006f96aece1 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -678,6 +678,17 @@ int kvmppc_xive_post_load(SpaprXive *xive, int version_id)
 continue;
 }
 
+/*
+ * We can only restore the source config if the source has been
+ * previously set in KVM. Since we don't do that for all interrupts
+ * at reset time anymore, let's do it now.
+ */
+kvmppc_xive_source_reset_one(&xive->source, i, &local_err);
+if (local_err) {
+error_report_err(local_err);
+return -1;
+}
+
 kvmppc_xive_set_source_config(xive, i, &xive->eat[i], &local_err);
 if (local_err) {
 error_report_err(local_err);

Re: [PATCH v4 10/16] cputlb: Partially inline memory_region_section_get_iotlb

2019-09-25 Alex Bennée



Richard Henderson  writes:

> There is only one caller, tlb_set_page_with_attrs.  We cannot
> inline the entire function because the AddressSpaceDispatch
> structure is private to exec.c, and cannot easily be moved to
> include/exec/memory-internal.h.
>
> Compute is_ram and is_romd once within tlb_set_page_with_attrs.
> Fold the number of tests against these predicates.  Compute
> cpu_physical_memory_is_clean outside of the tlb lock region.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  include/exec/exec-all.h |  6 +---
>  accel/tcg/cputlb.c  | 68 ++---
>  exec.c  | 22 ++---
>  3 files changed, 47 insertions(+), 49 deletions(-)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 81b02eb2fe..49db07ba0b 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -509,11 +509,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
>hwaddr *xlat, hwaddr *plen,
>MemTxAttrs attrs, int *prot);
>  hwaddr memory_region_section_get_iotlb(CPUState *cpu,
> -   MemoryRegionSection *section,
> -   target_ulong vaddr,
> -   hwaddr paddr, hwaddr xlat,
> -   int prot,
> -   target_ulong *address);
> +   MemoryRegionSection *section);
>  #endif
>
>  /* vl.c */
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 05212ff244..05530a8b0c 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -704,13 +704,14 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>  MemoryRegionSection *section;
>  unsigned int index;
>  target_ulong address;
> -target_ulong code_address;
> +target_ulong write_address;
>  uintptr_t addend;
>  CPUTLBEntry *te, tn;
>  hwaddr iotlb, xlat, sz, paddr_page;
>  target_ulong vaddr_page;
>  int asidx = cpu_asidx_from_attrs(cpu, attrs);
>  int wp_flags;
> +bool is_ram, is_romd;
>
>  assert_cpu_is_self(cpu);
>
> @@ -739,18 +740,46 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>  if (attrs.byte_swap) {
>  address |= TLB_BSWAP;
>  }
> -if (!memory_region_is_ram(section->mr) &&
> -!memory_region_is_romd(section->mr)) {
> -/* IO memory case */
> -address |= TLB_MMIO;
> -addend = 0;
> -} else {
> +
> +is_ram = memory_region_is_ram(section->mr);
> +is_romd = memory_region_is_romd(section->mr);
> +
> +if (is_ram || is_romd) {
> +/* RAM and ROMD both have associated host memory. */
>  addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
> +} else {
> +/* I/O does not; force the host address to NULL. */
> +addend = 0;
> +}
> +
> +write_address = address;
> +if (is_ram) {
> +iotlb = memory_region_get_ram_addr(section->mr) + xlat;
> +/*
> + * Computing is_clean is expensive; avoid all that unless
> + * the page is actually writable.
> + */
> +if (prot & PAGE_WRITE) {
> +if (section->readonly) {
> +write_address |= TLB_ROM;
> +} else if (cpu_physical_memory_is_clean(iotlb)) {
> +write_address |= TLB_NOTDIRTY;
> +}
> +}
> +} else {
> +/* I/O or ROMD */
> +iotlb = memory_region_section_get_iotlb(cpu, section) + xlat;
> +/*
> + * Writes to romd devices must go through MMIO to enable write.
> + * Reads to romd devices go through the ram_ptr found above,
> + * but of course reads to I/O must go through MMIO.
> + */
> +write_address |= TLB_MMIO;
> +if (!is_romd) {
> +address = write_address;
> +}
>  }
>
> -code_address = address;
> -iotlb = memory_region_section_get_iotlb(cpu, section, vaddr_page,
> -paddr_page, xlat, prot, &address);
>  wp_flags = cpu_watchpoint_address_matches(cpu, vaddr_page,
>TARGET_PAGE_SIZE);
>
> @@ -790,8 +819,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
>  /*
>   * At this point iotlb contains a physical section number in the lower
>   * TARGET_PAGE_BITS, and either
> - *  + the ram_addr_t of the page base of the target RAM (if NOTDIRTY or ROM)
> - *  + the offset within section->mr of the page base (otherwise)
> + *  + the ram_addr_t of the page base of the target RAM (RAM)
> + *  + the offset within section->mr of the page base (I/O, ROMD)
>   * We subtract the vaddr_page (which is page aligned and thus won't
>   * disturb the low bi

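The flag decisions this patch folds into `tlb_set_page_with_attrs()` can be read as a small pure function. This sketch uses hypothetical names and bit values, with `is_clean` as a callback standing in for `cpu_physical_memory_is_clean()`; it keeps the patch's ordering, so the clean-bitmap lookup only runs for writable, non-readonly RAM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; cputlb uses other bit positions. */
#define SK_TLB_MMIO      (1u << 0)
#define SK_TLB_NOTDIRTY  (1u << 1)
#define SK_TLB_ROM       (1u << 2)
#define SK_PAGE_WRITE    (1u << 1)

typedef struct {
    unsigned read_flags;   /* flags that end up in the read address */
    unsigned write_flags;  /* flags that end up in write_address */
} SkTlbFlags;

/* Decision logic of the patched tlb_set_page_with_attrs(), flags only. */
static SkTlbFlags sk_tlb_flags(bool is_ram, bool is_romd, bool readonly,
                               bool (*is_clean)(void), unsigned prot)
{
    SkTlbFlags f = { 0, 0 };

    if (is_ram) {
        /* The clean-bitmap lookup is costly: only do it for writable RAM. */
        if (prot & SK_PAGE_WRITE) {
            if (readonly) {
                f.write_flags |= SK_TLB_ROM;
            } else if (is_clean()) {
                f.write_flags |= SK_TLB_NOTDIRTY;
            }
        }
    } else {
        /* I/O or ROMD: writes always take the MMIO path. */
        f.write_flags |= SK_TLB_MMIO;
        if (!is_romd) {
            /* Plain I/O has no host memory, so reads use MMIO as well. */
            f.read_flags = f.write_flags;
        }
    }
    return f;
}

/* Example callback that counts how often the lookup runs. */
static int clean_lookups;
static bool sk_page_is_clean(void)
{
    clean_lookups++;
    return true;
}
```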
Re: [PATCH v4 11/16] cputlb: Merge and move memory_notdirty_write_{prepare,complete}

2019-09-25 Alex Bennée



Richard Henderson  writes:

> Since 9458a9a1df1a, all readers of the dirty bitmaps wait
> for the rcu lock, which means that they wait until the end
> of any executing TranslationBlock.
>
> As a consequence, there is no need for the actual access
> to happen in between the _prepare and _complete.  Therefore,
> we can improve things by merging the two functions into
> notdirty_write and dropping the NotDirtyInfo structure.
>
> In addition, the only users of notdirty_write are in cputlb.c,
> so move the merged function there.  Pass in the CPUIOTLBEntry
> from which the ram_addr_t may be computed.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 
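The commit message argues that, because dirty-bitmap readers wait for the RCU lock, the prepare/complete pair can collapse into one call made before the store. The toy model below illustrates that single-call shape. It is a sketch under stated assumptions: the per-page bitmask, the client names and `toy_notdirty_write()` are hypothetical, and "invalidating" code here always leaves the page free of translated code, which real TB invalidation does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one dirty bit per client per page (hypothetical layout). */
enum { CLIENT_VGA = 1u << 0, CLIENT_CODE = 1u << 1, CLIENT_MIGRATION = 1u << 2 };
#define ALL_CLIENTS    (CLIENT_VGA | CLIENT_CODE | CLIENT_MIGRATION)
#define CLIENTS_NOCODE (CLIENT_VGA | CLIENT_MIGRATION)
#define NPAGES 4

static uint8_t dirty[NPAGES];
static int tb_flushes;              /* simulated TB invalidations */
static bool tlb_notdirty[NPAGES];   /* is the slow-path hook still installed? */

static void toy_invalidate_code(unsigned page)
{
    tb_flushes++;
    dirty[page] |= CLIENT_CODE;     /* toy assumption: no code left on the page */
}

/* One call replaces the old prepare/complete pair; the caller then stores. */
static void toy_notdirty_write(unsigned page)
{
    if (!(dirty[page] & CLIENT_CODE)) {
        toy_invalidate_code(page);
    }
    dirty[page] |= CLIENTS_NOCODE;
    /* Drop the slow-path hook once every client sees the page as dirty. */
    if (dirty[page] == ALL_CLIENTS) {
        tlb_notdirty[page] = false;
    }
}
```

A second write to the same page finds the code bit already set and does not flush again.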

> ---
>  include/exec/memory-internal.h | 65 -
>  accel/tcg/cputlb.c | 76 +++---
>  exec.c | 44 
>  3 files changed, 42 insertions(+), 143 deletions(-)
>
> diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
> index ef4fb92371..9fcc2af25c 100644
> --- a/include/exec/memory-internal.h
> +++ b/include/exec/memory-internal.h
> @@ -49,70 +49,5 @@ void address_space_dispatch_free(AddressSpaceDispatch *d);
>
>  void mtree_print_dispatch(struct AddressSpaceDispatch *d,
>MemoryRegion *root);
> -
> -struct page_collection;
> -
> -/* Opaque struct for passing info from memory_notdirty_write_prepare()
> - * to memory_notdirty_write_complete(). Callers should treat all fields
> - * as private, with the exception of @active.
> - *
> - * @active is a field which is not touched by either the prepare or
> - * complete functions, but which the caller can use if it wishes to
> - * track whether it has called prepare for this struct and so needs
> - * to later call the complete function.
> - */
> -typedef struct {
> -CPUState *cpu;
> -struct page_collection *pages;
> -ram_addr_t ram_addr;
> -vaddr mem_vaddr;
> -unsigned size;
> -bool active;
> -} NotDirtyInfo;
> -
> -/**
> - * memory_notdirty_write_prepare: call before writing to non-dirty memory
> - * @ndi: pointer to opaque NotDirtyInfo struct
> - * @cpu: CPU doing the write
> - * @mem_vaddr: virtual address of write
> - * @ram_addr: the ram address of the write
> - * @size: size of write in bytes
> - *
> - * Any code which writes to the host memory corresponding to
> - * guest RAM which has been marked as NOTDIRTY must wrap those
> - * writes in calls to memory_notdirty_write_prepare() and
> - * memory_notdirty_write_complete():
> - *
> - *  NotDirtyInfo ndi;
> - *  memory_notdirty_write_prepare(&ndi, );
> - *  ... perform write here ...
> - *  memory_notdirty_write_complete(&ndi);
> - *
> - * These calls will ensure that we flush any TCG translated code for
> - * the memory being written, update the dirty bits and (if possible)
> - * remove the slowpath callback for writing to the memory.
> - *
> - * This must only be called if we are using TCG; it will assert otherwise.
> - *
> - * We may take locks in the prepare call, so callers must ensure that
> - * they don't exit (via longjump or otherwise) without calling complete.
> - *
> - * This call must only be made inside an RCU critical section.
> - * (Note that while we're executing a TCG TB we're always in an
> - * RCU critical section, which is likely to be the case for callers
> - * of these functions.)
> - */
> -void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
> -   CPUState *cpu,
> -   vaddr mem_vaddr,
> -   ram_addr_t ram_addr,
> -   unsigned size);
> -/**
> - * memory_notdirty_write_complete: finish write to non-dirty memory
> - * @ndi: pointer to the opaque NotDirtyInfo struct which was initialized
> - * by memory_not_dirty_write_prepare().
> - */
> -void memory_notdirty_write_complete(NotDirtyInfo *ndi);
> -
>  #endif
>  #endif
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 05530a8b0c..09b0df87c6 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -33,6 +33,7 @@
>  #include "exec/helper-proto.h"
>  #include "qemu/atomic.h"
>  #include "qemu/atomic128.h"
> +#include "translate-all.h"
>
>  /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
>  /* #define DEBUG_TLB */
> @@ -1084,6 +1085,37 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
>  return qemu_ram_addr_from_host_nofail(p);
>  }
>
> +static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
> +   CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
> +{
> +ram_addr_t ram_addr = mem_vaddr + iotlbentry->addr;
> +
> +trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
> +
> +if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
> +struct page_collection *pages
> += page_collection_lock(ram_addr, ram_addr + s

Re: [PATCH V3] target/riscv: Bugfix reserved bits in PTE for RV64

2019-09-25 Thread Guo Ren

"Bits 63–54 are reserved for future use and must be
zeroed by software for forward compatibility."

That doesn't mean bits 63-54 belong to the ppn; they are reserved for future
use, and nobody knows whether 63-54 will become part of the ppn.
The current riscv qemu ppn implementation is obviously wrong. It shouldn't
depend on the software's behavior; please follow the spec.

On Wed, Sep 25, 2019 at 11:58 PM Jonathan Behrens  wrote:
>
> > The specification is very clear: these bits are not part of ppn, not
> > part of the translation target address. The current code is against
> > the riscv-privilege specification.
>
> If all of the reserved bits are zero then the patch changes nothing.
> Further the only normative mention of the reserved bits in the spec
> says they must be: "Bits 63–54 are reserved for future use and must be
> zeroed by software for forward compatibility." Provided that software
> follows the spec, current QEMU will behave properly. For software that
> ignores that directive and sets some of those bits, the spec says
> nothing about what hardware should do, so both the old and the new
> behavior are fine.
>
> Jonathan



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
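The disagreement comes down to two ways of extracting the PPN from a 64-bit PTE. In the privileged spec quoted above, Sv39/Sv48 PTEs keep the PPN in bits 53:10, with bits 63:54 reserved. The sketch below models both behaviours under discussion. The helper names are hypothetical and the constants are defined locally.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PPN_SHIFT 10   /* PPN starts at bit 10 */
#define PTE_PPN_BITS  44   /* bits 53..10 for Sv39/Sv48 */
#define PTE_PPN_MASK  (((UINT64_C(1) << PTE_PPN_BITS) - 1) << PTE_PPN_SHIFT)

/* Shift the whole PTE: any reserved bits 63..54 leak into the PPN. */
static uint64_t pte_ppn_unmasked(uint64_t pte)
{
    return pte >> PTE_PPN_SHIFT;
}

/* Mask first: reserved bits are ignored, as the patch proposes. */
static uint64_t pte_ppn_masked(uint64_t pte)
{
    return (pte & PTE_PPN_MASK) >> PTE_PPN_SHIFT;
}
```

When bits 63:54 are zero, both helpers return the same value, which is Jonathan's point. With bit 63 set, the unmasked version reports a PPN with bit 53 set.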

Re: [PATCH-for-4.2 v11 00/11] ARM virt: ACPI memory hotplug support

2019-09-25 Thread Michael S. Tsirkin

On Wed, Sep 25, 2019 at 05:37:53PM +0200, Igor Mammedov wrote:
> On Wed, 25 Sep 2019 11:28:42 -0400
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Sep 18, 2019 at 02:06:22PM +0100, Shameer Kolothum wrote:
> > > This series is an attempt to provide device memory hotplug support 
> > > on ARM virt platform. This is based on Eric's recent works here[1]
> > > and carries some of the pc-dimm related patches dropped from his
> > > series.
> > > 
> > > The kernel support for arm64 memory hot add was added recently by
> > > Robin and hence the guest kernel should be >= 5.0-rc1.
> > > 
> > > NVDIMM support is not included currently as we still have an unresolved
> > > issue while hot adding NVDIMM[2]. NVDIMM cold-plug patches could be
> > > included, but that is not done for now, to keep things simple.
> > > 
> > > This makes use of the GED device to send hotplug ACPI events to the
> > > guest. GED code is based on Nemu. Thanks to the efforts of Samuel and
> > > Sebastien to add the hardware-reduced support to Nemu using GED
> > > device[3]. (Please shout if I got the author/signed-off wrong for
> > > those patches or missed any names).
> > > 
> > > This has been sanity tested on a HiSilicon ARM64 platform; any further
> > > testing is appreciated.
> > > 
> > > Note:
> > > I attempted to add a dimm_pxm test case to bios-tables-test for arm/virt,
> > > but noticed the issue described here[5]. This is under investigation
> > > now.
> > > 
> > > Thanks,
> > > Shameer
> > 
> > 
> > Which tree is this going through? Mine or ARM?
> 
> I'd assume your tree???
> (You are the wizard who knows how to handle bios-tables-test-allowed-diff.h 
> on merge)

Sure. Peter if you agree, could you send your ack for that please?


> > 
> > 
> > > [1] https://patchwork.kernel.org/cover/10837565/
> > > [2] https://patchwork.kernel.org/cover/10783589/
> > > [3] https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c
> > > [4] 
> > > http://lists.infradead.org/pipermail/linux-arm-kernel/2019-May/651763.html
> > > [5] https://www.mail-archive.com/qemu-devel@nongnu.org/msg632651.html
> > > 
> > > v10 --> v11
> > > -Changed patch #10 to update bios-tables-test-allowed-diff.h with a
> > >  list of expected ACPI tables.
> > > -GED document changed to rst format (patch #9)
> > > -Addressed comments from Igor (patch #3 & #5)
> > > -Igor's R-by to #7, #8 & #11.
> > > 
> > > v9 --> v10
> > >  -Fix for "make check" failure on x86_64(Patch #1).
> > >  -Minor updates based on Eric's comments.
> > >  -Dropped patch "hw/arm/virt: Add 4.2 machine type" as this is already
> > >   in master now.
> > >  -Added R-by tags by Eric.
> > > 
> > > v8 --> v9
> > >  -Changes related to GED being a TYPE_SYS_BUS_DEVICE now.
> > >  -Re-arranged patches 8 and 9.
> > >  -Added GED ABI documentation(patch #10).
> > >  -Added numamem and memhp tests to arm/virt(#11 and #12)
> > >  -Dropped few R-by tags as code has changed a bit.
> > >  -Please see Individual patch history for details.
> > >  
> > > v7 --> v8
> > >  -Addressed comments from Igor. Please see individual patches.
> > >  -Updated bios-tables-test-allowed-diff.h to avoid "make check"
> > >   failure (patch #6) and dropped patch #10
> > >  -Added Igor's R-by to patches 4 & 5.
> > >  -Dropped Eric's R-by from patch #9 for now.
> > > 
> > > v6 --> v7
> > > - Added 4.2 machine support and restricted GED creation for < 4.2
> > >   This is to address the migration test fail reported by Eric.
> > > - Included "tests: Update DSDT ACPI table.." patch(#10) from Eric
> > >   to fix the "make check" bios-tables-test failure.
> > >   
> > > v5 --> v6
> > > 
> > > -Addressed comments from Eric.
> > > -Added R-by from Eric and Igor.
> > > 
> > > v4 --> v5
> > > -Removed gsi/ged-irq routing in virt.
> > > -Added Migration support.
> > > -Dropped support for DT coldplug case based on the discussions
> > >  here[4]
> > > -Added system_powerdown support through GED.
> > > 
> > > v3 --> v4
> > > Addressed comments from Igor and Eric,
> > > -Renamed "virt-acpi" to "acpi-ged".
> > > -Changed ged device parent to TYPE_DEVICE.
> > > -Introduced DT memory node property "hotpluggable" to resolve device
> > >  memory being treated as early boot memory issue(patch #7).
> > > -Combined patches #3 and #9 from v3 into #3.
> > > 
> > > v2 --> v3
> > > 
> > > Addressed comments from Igor and Eric,
> > > -Made virt acpi device platform independent and moved
> > >  to hw/acpi/generic_event_device.c
> > > -Moved ged specific code into hw/acpi/generic_event_device.c
> > > -Introduced an opt-in feature "fdt" to resolve device-memory being
> > >  treated as early boot memory.
> > > -Dropped patch #1 from v2.
> > > 
> > > RFC --> v2
> > > 
> > > -Use GED device instead of GPIO for ACPI hotplug events.
> > > -Removed NVDIMM support for now.
> > > -Includes dropped patches from Eric's v9 series.
> > > 
> > > Eric Auger (1):
> > >   hw/arm/virt: Add memory hotplug framework
> > > 
> > > Samuel Ortiz (2):
> > >   hw/acpi: Do not create memory hotplug method when h

Re: [PATCH v4 12/16] cputlb: Handle TLB_NOTDIRTY in probe_access

2019-09-25 Thread Alex Bennée



Richard Henderson  writes:

> We can use notdirty_write for the write and
> return a valid host pointer for this case.

nit: reflow the text

>
> Signed-off-by: Richard Henderson 
> ---
>  accel/tcg/cputlb.c | 26 +-
>  1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 09b0df87c6..d0bdef1eb3 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1167,16 +1167,24 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
>  return NULL;
>  }
>
> -/* Handle watchpoints.  */
> -if (tlb_addr & TLB_WATCHPOINT) {
> -cpu_check_watchpoint(env_cpu(env), addr, size,
> - env_tlb(env)->d[mmu_idx].iotlb[index].attrs,
> - wp_access, retaddr);
> -}
> +if (unlikely(tlb_addr & TLB_FLAGS_MASK)) {
> +CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];

I was going to say we compute this early but I'm assuming the compiler
can figure that out if it needs to.

Reviewed-by: Alex Bennée 


--
Alex Bennée
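The restructured `probe_access` tests all flag bits with one `unlikely()` check and handles each case only when some flag is set. The sketch below models that shape loosely. The flag values, the counters and `toy_probe_write()` are hypothetical. Rejecting MMIO with `NULL` is an assumption about surrounding code that the quoted hunk does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; QEMU's actual values differ. */
#define TLB_WATCHPOINT (1u << 0)
#define TLB_NOTDIRTY   (1u << 1)
#define TLB_MMIO       (1u << 2)
#define TLB_FLAGS_MASK (TLB_WATCHPOINT | TLB_NOTDIRTY | TLB_MMIO)

static int watchpoint_checks;   /* stands in for cpu_check_watchpoint() */
static int notdirty_writes;     /* stands in for notdirty_write() */

static void *toy_probe_write(uint32_t tlb_flags, void *host)
{
    /* A single test keeps the common no-flags case cheap. */
    if (tlb_flags & TLB_FLAGS_MASK) {
        if (tlb_flags & TLB_MMIO) {
            return NULL;        /* assumed: no host pointer for I/O */
        }
        if (tlb_flags & TLB_WATCHPOINT) {
            watchpoint_checks++;
        }
        if (tlb_flags & TLB_NOTDIRTY) {
            notdirty_writes++;  /* page made dirty; host pointer stays valid */
        }
    }
    return host;
}
```

The payoff of the patch is the `TLB_NOTDIRTY` case: instead of forcing a slow path, the page is marked dirty and the caller still gets a usable host pointer.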

Re: [PATCH v4 13/16] cputlb: Remove cpu->mem_io_vaddr

2019-09-25 Thread Alex Bennée



Richard Henderson  writes:

> With the merge of notdirty handling into store_helper,
> the last user of cpu->mem_io_vaddr was removed.
>
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  include/hw/core/cpu.h | 2 --
>  accel/tcg/cputlb.c| 2 --
>  hw/core/cpu.c | 1 -
>  3 files changed, 5 deletions(-)
>
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index c7cda65c66..031f587e51 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -338,7 +338,6 @@ struct qemu_work_item;
>   * @next_cpu: Next CPU sharing TB cache.
>   * @opaque: User data.
>   * @mem_io_pc: Host Program Counter at which the memory was accessed.
> - * @mem_io_vaddr: Target virtual address at which the memory was accessed.
>   * @kvm_fd: vCPU file descriptor for KVM.
>   * @work_mutex: Lock to prevent multiple access to queued_work_*.
>   * @queued_work_first: First asynchronous work pending.
> @@ -413,7 +412,6 @@ struct CPUState {
>   * we store some rarely used information in the CPU context.
>   */
>  uintptr_t mem_io_pc;
> -vaddr mem_io_vaddr;
>  /*
>   * This is only needed for the legacy cpu_unassigned_access() hook;
>   * when all targets using it have been converted to use
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index d0bdef1eb3..0ca6ee60b3 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -927,7 +927,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>  cpu_io_recompile(cpu, retaddr);
>  }
>
> -cpu->mem_io_vaddr = addr;
>  cpu->mem_io_access_type = access_type;
>
>  if (mr->global_locking && !qemu_mutex_iothread_locked()) {
> @@ -967,7 +966,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
>  if (!cpu->can_do_io) {
>  cpu_io_recompile(cpu, retaddr);
>  }
> -cpu->mem_io_vaddr = addr;
>  cpu->mem_io_pc = retaddr;
>
>  if (mr->global_locking && !qemu_mutex_iothread_locked()) {
> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
> index 0035845511..73b1ee34d0 100644
> --- a/hw/core/cpu.c
> +++ b/hw/core/cpu.c
> @@ -261,7 +261,6 @@ static void cpu_common_reset(CPUState *cpu)
>  cpu->interrupt_request = 0;
>  cpu->halted = 0;
>  cpu->mem_io_pc = 0;
> -cpu->mem_io_vaddr = 0;
>  cpu->icount_extra = 0;
>  atomic_set(&cpu->icount_decr_ptr->u32, 0);
>  cpu->can_do_io = 1;


--
Alex Bennée

Re: [PATCH v4 14/16] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access

2019-09-25 Thread Alex Bennée



Richard Henderson  writes:

> All callers pass false to this argument.  Remove it and pass the
> constant on to tb_invalidate_phys_page_range__locked.
>
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  accel/tcg/translate-all.h | 3 +--
>  accel/tcg/translate-all.c | 6 ++
>  exec.c| 4 ++--
>  3 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 64f5fd9a05..31f2117188 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -28,8 +28,7 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
>  void page_collection_unlock(struct page_collection *set);
>  void tb_invalidate_phys_page_fast(struct page_collection *pages,
>tb_page_addr_t start, int len);
> -void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> -   int is_cpu_write_access);
> +void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
>  void tb_check_watchpoint(CPUState *cpu);
>
>  #ifdef CONFIG_USER_ONLY
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 5d1e08b169..de4b697163 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1983,8 +1983,7 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>   *
>   * Called with mmap_lock held for user-mode emulation
>   */
> -void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
> -   int is_cpu_write_access)
> +void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
>  {
>  struct page_collection *pages;
>  PageDesc *p;
> @@ -1996,8 +1995,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>  return;
>  }
>  pages = page_collection_lock(start, end);
> -tb_invalidate_phys_page_range__locked(pages, p, start, end,
> -  is_cpu_write_access);
> +tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
>  page_collection_unlock(pages);
>  }
>
> diff --git a/exec.c b/exec.c
> index 7d835b1a2b..b3df826039 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1012,7 +1012,7 @@ const char *parse_cpu_option(const char *cpu_option)
>  void tb_invalidate_phys_addr(target_ulong addr)
>  {
>  mmap_lock();
> -tb_invalidate_phys_page_range(addr, addr + 1, 0);
> +tb_invalidate_phys_page_range(addr, addr + 1);
>  mmap_unlock();
>  }
>
> @@ -1039,7 +1039,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs)
>  return;
>  }
>  ram_addr = memory_region_get_ram_addr(mr) + addr;
> -tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
> +tb_invalidate_phys_page_range(ram_addr, ram_addr + 1);
>  rcu_read_unlock();
>  }


--
Alex Bennée

Re: [PATCH v4 15/16] cputlb: Pass retaddr to tb_invalidate_phys_page_fast

2019-09-25 Thread Alex Bennée



Richard Henderson  writes:

> Rather than rely on cpu->mem_io_pc, pass retaddr down directly.
>
> Within tb_invalidate_phys_page_range__locked, the is_cpu_write_access
> parameter is non-zero exactly when retaddr would be non-zero, so that
> is a simple replacement.
>
> Recognize that current_tb_not_found is true only when mem_io_pc
> (and now retaddr) are also non-zero, so remove a redundant test.
>
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 
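The commit message relies on two facts: a non-zero `retaddr` means the write came from a CPU, and the current TB needs to be looked up at most once. The model below illustrates this lazy, at-most-once lookup. `ToyTB`, `toy_tb_lookup()` and `toy_invalidate_all()` are hypothetical stand-ins for `TranslationBlock`, `tcg_tb_lookup()` and the invalidation loop, and every TB is assumed to overlap the invalidated range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TranslationBlock: its host code range. */
typedef struct {
    uintptr_t host_start, host_end;
} ToyTB;

static ToyTB toy_tbs[2] = { { 0x1000, 0x1100 }, { 0x2000, 0x2100 } };
static int tb_lookups;

/* Stand-in for tcg_tb_lookup(): map a host return address to its TB. */
static ToyTB *toy_tb_lookup(uintptr_t retaddr)
{
    tb_lookups++;
    for (size_t i = 0; i < 2; i++) {
        if (retaddr >= toy_tbs[i].host_start && retaddr < toy_tbs[i].host_end) {
            return &toy_tbs[i];
        }
    }
    return NULL;
}

/* Returns true if the currently executing TB is among those invalidated. */
static bool toy_invalidate_all(uintptr_t retaddr)
{
    /* retaddr == 0 means "not a CPU write": there is nothing to look up. */
    bool current_tb_not_found = retaddr != 0;
    bool current_tb_modified = false;
    ToyTB *current_tb = NULL;

    for (size_t i = 0; i < 2; i++) {   /* pretend every TB overlaps */
        if (current_tb_not_found) {
            current_tb_not_found = false;   /* look up at most once */
            current_tb = toy_tb_lookup(retaddr);
        }
        if (current_tb == &toy_tbs[i]) {
            current_tb_modified = true;
        }
    }
    return current_tb_modified;
}
```

Because the flag is derived from `retaddr` itself, the separate `if (cpu->mem_io_pc)` test inside the loop becomes redundant, which is the simplification the patch makes.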

> ---
>  accel/tcg/translate-all.h |  3 ++-
>  accel/tcg/cputlb.c|  6 +-
>  accel/tcg/translate-all.c | 39 +++
>  3 files changed, 22 insertions(+), 26 deletions(-)
>
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 31f2117188..135c1ea96a 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -27,7 +27,8 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
>   tb_page_addr_t end);
>  void page_collection_unlock(struct page_collection *set);
>  void tb_invalidate_phys_page_fast(struct page_collection *pages,
> -  tb_page_addr_t start, int len);
> +  tb_page_addr_t start, int len,
> +  uintptr_t retaddr);
>  void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
>  void tb_check_watchpoint(CPUState *cpu);
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 0ca6ee60b3..ea5d12c59d 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1093,11 +1093,7 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
>  if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
>  struct page_collection *pages
>  = page_collection_lock(ram_addr, ram_addr + size);
> -
> -/* We require mem_io_pc in tb_invalidate_phys_page_range.  */
> -cpu->mem_io_pc = retaddr;
> -
> -tb_invalidate_phys_page_fast(pages, ram_addr, size);
> +tb_invalidate_phys_page_fast(pages, ram_addr, size, retaddr);
>  page_collection_unlock(pages);
>  }
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index de4b697163..db77fb221b 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1889,7 +1889,7 @@ static void
>  tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>PageDesc *p, tb_page_addr_t start,
>tb_page_addr_t end,
> -  int is_cpu_write_access)
> +  uintptr_t retaddr)
>  {
>  TranslationBlock *tb;
>  tb_page_addr_t tb_start, tb_end;
> @@ -1897,9 +1897,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>  #ifdef TARGET_HAS_PRECISE_SMC
>  CPUState *cpu = current_cpu;
>  CPUArchState *env = NULL;
> -int current_tb_not_found = is_cpu_write_access;
> +bool current_tb_not_found = retaddr != 0;
> +bool current_tb_modified = false;
>  TranslationBlock *current_tb = NULL;
> -int current_tb_modified = 0;
>  target_ulong current_pc = 0;
>  target_ulong current_cs_base = 0;
>  uint32_t current_flags = 0;
> @@ -1931,24 +1931,21 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
>  if (!(tb_end <= start || tb_start >= end)) {
>  #ifdef TARGET_HAS_PRECISE_SMC
>  if (current_tb_not_found) {
> -current_tb_not_found = 0;
> -current_tb = NULL;
> -if (cpu->mem_io_pc) {
> -/* now we have a real cpu fault */
> -current_tb = tcg_tb_lookup(cpu->mem_io_pc);
> -}
> +current_tb_not_found = false;
> +/* now we have a real cpu fault */
> +current_tb = tcg_tb_lookup(retaddr);
>  }
>  if (current_tb == tb &&
>  (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
> -/* If we are modifying the current TB, we must stop
> -its execution. We could be more precise by checking
> -that the modification is after the current PC, but it
> -would require a specialized function to partially
> -restore the CPU state */
> -
> -current_tb_modified = 1;
> -cpu_restore_state_from_tb(cpu, current_tb,
> -  cpu->mem_io_pc, true);
> +/*
> + * If we are modifying the current TB, we must stop
> + * its execution. We could be more precise by checking
> + * that the modification is after the current PC, but it
> + * would require a specialized function to parti

Re: [PATCH v4 16/16] cputlb: Pass retaddr to tb_check_watchpoint

2019-09-25 Thread Alex Bennée



Richard Henderson  writes:

> Fixes the previous TLB_WATCHPOINT patches because we are currently
> failing to set cpu->mem_io_pc with the call to cpu_check_watchpoint.
> Pass down the retaddr directly because it's readily available.
>
> Fixes: 50b107c5d61
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  accel/tcg/translate-all.h | 2 +-
>  accel/tcg/translate-all.c | 6 +++---
>  exec.c| 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
> index 135c1ea96a..a557b4e2bb 100644
> --- a/accel/tcg/translate-all.h
> +++ b/accel/tcg/translate-all.h
> @@ -30,7 +30,7 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
>tb_page_addr_t start, int len,
>uintptr_t retaddr);
>  void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
> -void tb_check_watchpoint(CPUState *cpu);
> +void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
>
>  #ifdef CONFIG_USER_ONLY
>  int page_unprotect(target_ulong address, uintptr_t pc);
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index db77fb221b..66d4bc4341 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -2142,16 +2142,16 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
>  #endif
>
>  /* user-mode: call with mmap_lock held */
> -void tb_check_watchpoint(CPUState *cpu)
> +void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
>  {
>  TranslationBlock *tb;
>
>  assert_memory_lock();
>
> -tb = tcg_tb_lookup(cpu->mem_io_pc);
> +tb = tcg_tb_lookup(retaddr);
>  if (tb) {
>  /* We can use retranslation to find the PC.  */
> -cpu_restore_state_from_tb(cpu, tb, cpu->mem_io_pc, true);
> +cpu_restore_state_from_tb(cpu, tb, retaddr, true);
>  tb_phys_invalidate(tb, -1);
>  } else {
>  /* The exception probably happened in a helper.  The CPU state should
> diff --git a/exec.c b/exec.c
> index b3df826039..8a0a6613b1 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2758,7 +2758,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
>  cpu->watchpoint_hit = wp;
>
>  mmap_lock();
> -tb_check_watchpoint(cpu);
> +tb_check_watchpoint(cpu, ra);
>  if (wp->flags & BP_STOP_BEFORE_ACCESS) {
>  cpu->exception_index = EXCP_DEBUG;
>  mmap_unlock();


--
Alex Bennée

Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type

2019-09-25 Thread Paolo Bonzini

On 25/09/19 17:04, Sergio Lopez wrote:
> I'm going back to this level of the thread, because after your
> suggestion I took a deeper look at how things work around the PIC, and
> discovered I was completely wrong about my assumptions.
> 
> For virtio-mmio devices, given that we don't have the ability to
> configure vectors (as it's done in the PCI case) we're stuck with the
> ones provided by the platform PIC, which in the x86 case is the i8259
> (at least from Linux's perspective).
> 
> So we can get rid of the IOAPIC, but we need to keep the i8259 (we have
> both a userspace and a kernel implementation too, so it should be fine).

Hmm...  I would have thought the vectors are just GSIs, which will be
configured to the IOAPIC if it is present.  Maybe something is causing
Linux to ignore the IOAPIC?

> As for the PIT, we can omit it if we're running with KVM acceleration,
> as kvmclock will be used to calculate loops per jiffie and avoid the
> calibration, leaving it enabled otherwise.

Can you make it an OnOffAuto property, and default to on iff !KVM?

Paolo
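Paolo's suggestion, an on/off/auto property whose "auto" setting means "on unless KVM", could resolve along the lines sketched below. The enum mirrors the shape of QEMU's QAPI `OnOffAuto` but is redefined locally so the sketch compiles on its own. `microvm_want_pit()` is a hypothetical helper, not code from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of an on/off/auto tri-state, shaped like QAPI's OnOffAuto. */
typedef enum {
    ON_OFF_AUTO_AUTO,
    ON_OFF_AUTO_ON,
    ON_OFF_AUTO_OFF,
} OnOffAuto;

/*
 * Hypothetical helper: resolve the PIT property. With KVM, kvmclock lets
 * the guest skip PIT-based calibration, so "auto" means on iff !KVM.
 */
static bool microvm_want_pit(OnOffAuto pit, bool kvm_enabled)
{
    switch (pit) {
    case ON_OFF_AUTO_ON:
        return true;
    case ON_OFF_AUTO_OFF:
        return false;
    case ON_OFF_AUTO_AUTO:
    default:
        return !kvm_enabled;
    }
}
```

An explicit "on" or "off" from the user always wins. Only "auto" consults the accelerator.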




Re: [Qemu-devel] [PATCH v3 0/5] Automatic RCU read unlock

2019-09-25 Thread Paolo Bonzini

On 25/09/19 17:28, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonz...@redhat.com) wrote:
>> On 25/09/19 15:13, Dr. David Alan Gilbert wrote:
>>> * Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote:
 From: "Dr. David Alan Gilbert" 

 This patch uses glib's g_auto mechanism to automatically free
 rcu_read_lock's at the end of the block.  Given that humans
 have a habit of forgetting an error path somewhere it's
 best to leave it to the compiler.
>>>
>>> I've had to unqueue this - clang doesn't like the apparently unused
>>> auto variable; we need to find a way to make that happy.
>>
>> __attribute__((unused))?
> 
> I worry that if I do that, then it'll optimise it out.

It cannot, since the function passed to the cleanup attribute can have
side effects.

Paolo
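Paolo's point can be checked directly. glib's `g_auto` is built on the GCC/Clang `cleanup` attribute. Adding `unused` to such a variable only silences the unused-variable warning, and the compiler must still call the cleanup function because it may have side effects. The sketch below uses the raw attribute to avoid a glib dependency. `RCUGuard`, `RCU_GUARD_AUTO()` and the fake lock counter are hypothetical and are not the macros added by the series.

```c
#include <assert.h>

static int rcu_depth;   /* stand-in for the RCU read-side nesting count */

static void fake_rcu_read_lock(void)   { rcu_depth++; }
static void fake_rcu_read_unlock(void) { rcu_depth--; }

typedef struct { int dummy; } RCUGuard;   /* hypothetical guard type */

static inline RCUGuard rcu_guard_acquire(void)
{
    fake_rcu_read_lock();
    return (RCUGuard){ 0 };
}

/* Called automatically when the guard variable goes out of scope. */
static inline void rcu_guard_release(RCUGuard *g)
{
    (void)g;
    fake_rcu_read_unlock();
}

/* "unused" silences clang's warning; the cleanup still runs. */
#define RCU_GUARD_AUTO() \
    RCUGuard rcu_guard_ \
        __attribute__((cleanup(rcu_guard_release), unused)) = rcu_guard_acquire()

static int lookup_while_locked(int fail)
{
    RCU_GUARD_AUTO();
    if (fail) {
        return -1;        /* early exit: the cleanup still unlocks */
    }
    return rcu_depth;     /* evaluated before the cleanup runs */
}
```

Both the normal and the early-return path leave the nesting count back at zero, which is exactly the forgotten-error-path case the series is meant to catch.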

> 
> Dave
> 
>> Paolo
>>
>>> Dave
>>>
 v3
   Add block-head version of macro
   Rename
   Add docs
   Convert more cases using the block-head version

 Dr. David Alan Gilbert (5):
   rcu: Add automatically released rcu_read_lock variants
   migration: Fix missing rcu_read_unlock
   migration: Use automatic rcu_read unlock in ram.c
   migration: Use automatic rcu_read unlock in rdma.c
   rcu: Use automatic rc_read unlock in core memory/exec code

  docs/devel/rcu.txt  |  16 +++
  exec.c  | 120 +++-
  include/exec/ram_addr.h | 138 +--
  include/qemu/rcu.h  |  25 
  memory.c|  15 +-
  migration/ram.c | 295 +++-
  migration/rdma.c|  57 ++--
  7 files changed, 310 insertions(+), 356 deletions(-)

 -- 
 2.21.0


>>> --
>>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>>
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>

[PATCH 16/15 v13] block/block-copy: fix block_copy

2019-09-25 Thread Vladimir Sementsov-Ogievskiy

block_copy_reset_unallocated may yield, and during this yield someone
else may handle the dirty bits we are handling. Calling block_copy_with_*
functions on a non-dirty region will lead to copying updated data, which
is wrong.

To be sure that we call block_copy_with_* functions on a dirty region,
check the dirty bitmap _after_ block_copy_reset_unallocated.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi!

I suddenly realized that there is a bug in

[PATCH v13 15/15] block/backup: use backup-top instead of write notifiers
(queued at Max's https://git.xanclic.moe/XanClic/qemu/commits/branch/block)

And here is a fix, which may be squashed to
"block/backup: use backup-top instead of write notifiers" commit.

 block/block-copy.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 55bc360d22..430b88124f 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -292,7 +292,7 @@ int coroutine_fn block_copy(BlockCopyState *s,
 assert(QEMU_IS_ALIGNED(end, s->cluster_size));
 
 while (start < end) {
-int64_t dirty_end;
+int64_t chunk_end = end, dirty_end;
 
 if (!bdrv_dirty_bitmap_get(s->copy_bitmap, start)) {
 trace_block_copy_skip(s, start);
@@ -300,12 +300,6 @@ int coroutine_fn block_copy(BlockCopyState *s,
 continue; /* already copied */
 }
 
-dirty_end = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, start,
-(end - start));
-if (dirty_end < 0) {
-dirty_end = end;
-}
-
 if (s->skip_unallocated) {
 ret = block_copy_reset_unallocated(s, start, &status_bytes);
 if (ret == 0) {
@@ -313,20 +307,37 @@ int coroutine_fn block_copy(BlockCopyState *s,
 start += status_bytes;
 continue;
 }
+
+if (!bdrv_dirty_bitmap_get(s->copy_bitmap, start)) {
+/*
+ * Someone already handled this bit during yield in
+ * block_copy_reset_unallocated.
+ */
+trace_block_copy_skip(s, start);
+start += s->cluster_size;
+continue;
+}
+
 /* Clamp to known allocated region */
-dirty_end = MIN(dirty_end, start + status_bytes);
+chunk_end = MIN(chunk_end, start + status_bytes);
+}
+
+dirty_end = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, start,
+chunk_end - start);
+if (dirty_end >= 0) {
+chunk_end = MIN(chunk_end, dirty_end);
 }
 
 trace_block_copy_process(s, start);
 
 if (s->use_copy_range) {
-ret = block_copy_with_offload(s, start, dirty_end);
+ret = block_copy_with_offload(s, start, chunk_end);
 if (ret < 0) {
 s->use_copy_range = false;
 }
 }
 if (!s->use_copy_range) {
-ret = block_copy_with_bounce_buffer(s, start, dirty_end,
+ret = block_copy_with_bounce_buffer(s, start, chunk_end,
 error_is_read, &bounce_buffer);
 }
 if (ret < 0) {
-- 
2.21.0
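The race this fix closes can be reproduced with a toy bitmap that counts clusters instead of bytes. In the model below, the first call to the reset helper "yields", and a concurrent request copies clusters 0 and 3 in the meantime. The fixed loop re-checks the bit and computes the dirty extent only after the yield, so every cluster is copied exactly once. Had `dirty_end` been computed before the yield, as in the old code, clusters 0 and 3 would each have been copied twice. All names here are hypothetical, and the model always takes the `skip_unallocated` path.

```c
#include <assert.h>
#include <stdbool.h>

#define NCLUSTERS 8
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

static bool copy_bitmap[NCLUSTERS];  /* true: cluster still needs copying */
static int copied[NCLUSTERS];        /* times each cluster was copied */
static bool yield_pending = true;

/* Stand-in for block_copy_reset_unallocated(): the first call "yields" and
 * a concurrent request copies clusters 0 and 3. Everything is allocated. */
static void toy_reset_unallocated(int start, int *status_clusters)
{
    if (yield_pending) {
        yield_pending = false;
        copied[0]++; copy_bitmap[0] = false;
        copied[3]++; copy_bitmap[3] = false;
    }
    *status_clusters = NCLUSTERS - start;
}

/* First clean cluster in [start, end), or -1 (like ..._next_zero). */
static int toy_next_zero(int start, int end)
{
    for (int i = start; i < end; i++) {
        if (!copy_bitmap[i]) {
            return i;
        }
    }
    return -1;
}

static void toy_block_copy(int start, int end)
{
    while (start < end) {
        int chunk_end = end, status = 0, dirty_end;

        if (!copy_bitmap[start]) {
            start++;                            /* already copied */
            continue;
        }
        toy_reset_unallocated(start, &status);  /* may yield */
        if (!copy_bitmap[start]) {
            start++;                            /* handled during the yield */
            continue;
        }
        chunk_end = MIN(chunk_end, start + status);

        /* Compute the dirty extent only after any yield. */
        dirty_end = toy_next_zero(start, chunk_end);
        if (dirty_end >= 0) {
            chunk_end = MIN(chunk_end, dirty_end);
        }
        for (int i = start; i < chunk_end; i++) {
            copied[i]++;
            copy_bitmap[i] = false;
        }
        start = chunk_end;
    }
}
```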

Re: [PATCH v4 08/16] cputlb: Move ROM handling from I/O path to TLB path

2019-09-25 Thread Richard Henderson

On 9/24/19 11:59 PM, David Hildenbrand wrote:
>>> +if (section->readonly) {
>>> +tn.addr_write |= TLB_ROM;
>>> +} else if (cpu_physical_memory_is_clean(
>>> +memory_region_get_ram_addr(section->mr) + xlat)) {
>>> +tn.addr_write |= TLB_NOTDIRTY;
>>> +}
>>
>> This reads a bit weird because we are saying romd isn't a ROM but
>> something that identifies as RAM can be ROM rather than just a memory
>> protected piece of RAM.
>>
> 
> I proposed a bunch of alternatives as reply to v3 (e.g.,
> TLB_DISCARD_WRITES), either Richard missed them or I missed his reply :)

Missed it, sorry.


r~

Re: [PATCH v2 2/7] s390x/mmu: Move DAT protection handling out of mmu_translate_asce()

2019-09-25 Thread Thomas Huth

On 25/09/2019 14.52, David Hildenbrand wrote:
> We'll soon reuse the ilen and tec definitions in mmu_translate
> for all other DAT exceptions we inject as well. Move them to the caller,
> where we can later pair them up with other protection checks, like IEP.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 39 ---
>  1 file changed, 16 insertions(+), 23 deletions(-)
> 
> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
> index 6a7ad33c4d..847fb240fb 100644
> --- a/target/s390x/mmu_helper.c
> +++ b/target/s390x/mmu_helper.c
> @@ -48,20 +48,6 @@ static void trigger_access_exception(CPUS390XState *env, uint32_t type,
>  }
>  }
>  
> -static void trigger_prot_fault(CPUS390XState *env, target_ulong vaddr,
> -   uint64_t asc, int rw, bool exc)
> -{
> -uint64_t tec;
> -
> -tec = vaddr | (rw == MMU_DATA_STORE ? FS_WRITE : FS_READ) | 4 | asc >> 46;
> -
> -if (!exc) {
> -return;
> -}
> -
> -trigger_access_exception(env, PGM_PROTECTION, ILEN_AUTO, tec);
> -}
> -
>  static void trigger_page_fault(CPUS390XState *env, target_ulong vaddr,
> uint32_t type, uint64_t asc, int rw, bool exc)
>  {
> @@ -229,7 +215,6 @@ static int mmu_translate_asce(CPUS390XState *env, target_ulong vaddr,
>int *flags, int rw, bool exc)
>  {
>  int level;
> -int r;
>  
>  if (asce & ASCE_REAL_SPACE) {
>  /* direct mapping */
> @@ -277,14 +262,8 @@ static int mmu_translate_asce(CPUS390XState *env, target_ulong vaddr,
>  break;
>  }
>  
> -r = mmu_translate_region(env, vaddr, asc, asce, level, raddr, flags, rw,
> - exc);
> -if (!r && rw == MMU_DATA_STORE && !(*flags & PAGE_WRITE)) {
> -trigger_prot_fault(env, vaddr, asc, rw, exc);
> -return -1;
> -}
> -
> -return r;
> +return mmu_translate_region(env, vaddr, asc, asce, level, raddr, flags, 
> rw,
> +exc);
>  }
>  
>  static void mmu_handle_skey(target_ulong addr, int rw, int *flags)
> @@ -369,6 +348,10 @@ static void mmu_handle_skey(target_ulong addr, int rw, 
> int *flags)
>  int mmu_translate(CPUS390XState *env, target_ulong vaddr, int rw, uint64_t 
> asc,
>target_ulong *raddr, int *flags, bool exc)
>  {
> +/* Code accesses have an undefined ilc, let's use 2 bytes. */
> +const int ilen = (rw == MMU_INST_FETCH) ? 2 : ILEN_AUTO;
> +uint64_t tec = (vaddr & TARGET_PAGE_MASK) | (asc >> 46) |
> +   (rw == MMU_DATA_STORE ? FS_WRITE : FS_READ);
>  uint64_t asce;
>  int r;
>  
> @@ -421,6 +404,16 @@ int mmu_translate(CPUS390XState *env, target_ulong 
> vaddr, int rw, uint64_t asc,
>  return r;
>  }
>  
> +/* check for DAT protection */
> +if (unlikely(rw == MMU_DATA_STORE && !(*flags & PAGE_WRITE))) {
> +if (exc) {
> +/* DAT sets bit 61 only */
> +tec |= 0x4;
> +trigger_access_exception(env, PGM_PROTECTION, ilen, tec);
> +}
> +return -1;
> +}
> +
>  nodat:
>  /* Convert real address -> absolute address */
>  *raddr = mmu_real2abs(env, *raddr);
> 

Reviewed-by: Thomas Huth

Re: [PATCH v2 3/7] s390x/mmu: Inject DAT exceptions from a single place

2019-09-25 Thread Thomas Huth

On 25/09/2019 14.52, David Hildenbrand wrote:
> Let's return the PGM from the translation functions on error and inject
> based on that.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 63 +++
>  1 file changed, 17 insertions(+), 46 deletions(-)

Reviewed-by: Thomas Huth

Re: [PATCH v2 4/7] s390x/mmu: Inject PGM_ADDRESSING on bogus table addresses

2019-09-25 Thread Thomas Huth

On 25/09/2019 14.52, David Hildenbrand wrote:
> Let's document how it works and inject PGM_ADDRESSING if reading of
> table entries fails.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 28 
>  1 file changed, 24 insertions(+), 4 deletions(-)

Reviewed-by: Thomas Huth

Re: [PATCH 09/20] spapr: Clarify and fix handling of nr_irqs

2019-09-25 Thread Greg Kurz

On Wed, 25 Sep 2019 16:45:23 +1000
David Gibson  wrote:

> Both the XICS and XIVE interrupt backends have a "nr-irqs" property, but
> it means slightly different things.  For XICS (or, strictly, the ICS) it
> indicates the number of "real" external IRQs.  Those start at XICS_IRQ_BASE
> (0x1000) and don't include the special IPI vector.  For XIVE, however, it
> includes the whole IRQ space, including XIVE's many IPI vectors.
> 
> The spapr code currently doesn't handle this sensibly, with the nr_irqs
> value in SpaprIrq having different meanings depending on the backend.
> We fix this by renaming nr_irqs to nr_xirqs and making it always indicate
> just the number of external irqs, adjusting the value we pass to XIVE
> accordingly.  We also use move to using common constants in most of the
^^^
s/use//

> irq configurations, to make it clearer that the IRQ space looks the same
> to the guest (and emulated devices), even if the backend is different.
> 
> Signed-off-by: David Gibson 
> ---

Reviewed-by: Greg Kurz 

>  hw/ppc/spapr_irq.c | 48 +++---
>  include/hw/ppc/spapr_irq.h | 19 +--
>  2 files changed, 31 insertions(+), 36 deletions(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 8c26fa2d1e..5190a33e08 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -92,7 +92,7 @@ static void spapr_irq_init_kvm(SpaprMachineState *spapr,
>   * XICS IRQ backend.
>   */
>  
> -static void spapr_irq_init_xics(SpaprMachineState *spapr, int nr_irqs,
> +static void spapr_irq_init_xics(SpaprMachineState *spapr, int nr_xirqs,
>  Error **errp)
>  {
>  Object *obj;
> @@ -102,7 +102,7 @@ static void spapr_irq_init_xics(SpaprMachineState *spapr, 
> int nr_irqs,
>  object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort);
>  object_property_add_const_link(obj, ICS_PROP_XICS, OBJECT(spapr),
> &error_fatal);
> -object_property_set_int(obj, nr_irqs, "nr-irqs",  &error_fatal);
> +object_property_set_int(obj, nr_xirqs, "nr-irqs",  &error_fatal);
>  object_property_set_bool(obj, true, "realized", &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> @@ -234,13 +234,9 @@ static void spapr_irq_init_kvm_xics(SpaprMachineState 
> *spapr, Error **errp)
>  }
>  }
>  
> -#define SPAPR_IRQ_XICS_NR_IRQS 0x1000
> -#define SPAPR_IRQ_XICS_NR_MSIS \
> -(XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
> -
>  SpaprIrq spapr_irq_xics = {
> -.nr_irqs = SPAPR_IRQ_XICS_NR_IRQS,
> -.nr_msis = SPAPR_IRQ_XICS_NR_MSIS,
> +.nr_xirqs= SPAPR_NR_XIRQS,
> +.nr_msis = SPAPR_NR_MSIS,
>  .ov5 = SPAPR_OV5_XIVE_LEGACY,
>  
>  .init= spapr_irq_init_xics,
> @@ -260,7 +256,7 @@ SpaprIrq spapr_irq_xics = {
>  /*
>   * XIVE IRQ backend.
>   */
> -static void spapr_irq_init_xive(SpaprMachineState *spapr, int nr_irqs,
> +static void spapr_irq_init_xive(SpaprMachineState *spapr, int nr_xirqs,
>  Error **errp)
>  {
>  uint32_t nr_servers = spapr_max_server_number(spapr);
> @@ -268,7 +264,7 @@ static void spapr_irq_init_xive(SpaprMachineState *spapr, 
> int nr_irqs,
>  int i;
>  
>  dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
> -qdev_prop_set_uint32(dev, "nr-irqs", nr_irqs);
> +qdev_prop_set_uint32(dev, "nr-irqs", nr_xirqs + SPAPR_XIRQ_BASE);
>  /*
>   * 8 XIVE END structures per CPU. One for each available priority
>   */
> @@ -308,7 +304,7 @@ static qemu_irq spapr_qirq_xive(SpaprMachineState *spapr, 
> int irq)
>  {
>  SpaprXive *xive = spapr->xive;
>  
> -if (irq >= xive->nr_irqs) {
> +if ((irq < SPAPR_XIRQ_BASE) || (irq >= xive->nr_irqs)) {
>  return NULL;
>  }
>  
> @@ -409,12 +405,9 @@ static void spapr_irq_init_kvm_xive(SpaprMachineState 
> *spapr, Error **errp)
>   * with XICS.
>   */
>  
> -#define SPAPR_IRQ_XIVE_NR_IRQS 0x2000
> -#define SPAPR_IRQ_XIVE_NR_MSIS (SPAPR_IRQ_XIVE_NR_IRQS - SPAPR_IRQ_MSI)
> -
>  SpaprIrq spapr_irq_xive = {
> -.nr_irqs = SPAPR_IRQ_XIVE_NR_IRQS,
> -.nr_msis = SPAPR_IRQ_XIVE_NR_MSIS,
> +.nr_xirqs= SPAPR_NR_XIRQS,
> +.nr_msis = SPAPR_NR_MSIS,
>  .ov5 = SPAPR_OV5_XIVE_EXPLOIT,
>  
>  .init= spapr_irq_init_xive,
> @@ -450,18 +443,18 @@ static SpaprIrq *spapr_irq_current(SpaprMachineState 
> *spapr)
>  &spapr_irq_xive : &spapr_irq_xics;
>  }
>  
> -static void spapr_irq_init_dual(SpaprMachineState *spapr, int nr_irqs,
> +static void spapr_irq_init_dual(SpaprMachineState *spapr, int nr_xirqs,
>  Error **errp)
>  {
>  Error *local_err = NULL;
>  
> -spapr_irq_xics.init(spapr, spapr_irq_xics.nr_irqs, &local_err);
> +spapr_irq_xics.init(spapr, spapr_irq_xics.nr_xirqs, &local_err);
>  if (local_err) {
>  error_propagate(errp, lo

Re: [PATCH v2 5/7] s390x/mmu: Use TARGET_PAGE_MASK in mmu_translate_pte()

2019-09-25 Thread Thomas Huth

On 25/09/2019 14.52, David Hildenbrand wrote:
> While ASCE_ORIGIN is not wrong, it is certainly confusing. We want a
> page frame address.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
> index c9fde78614..20e9c13202 100644
> --- a/target/s390x/mmu_helper.c
> +++ b/target/s390x/mmu_helper.c
> @@ -126,7 +126,7 @@ static int mmu_translate_pte(CPUS390XState *env, 
> target_ulong vaddr,
>  *flags &= ~PAGE_WRITE;
>  }
>  
> -*raddr = pt_entry & ASCE_ORIGIN;
> +*raddr = pt_entry & TARGET_PAGE_MASK;
>  return 0;
>  }

Reviewed-by: Thomas Huth

Re: [PATCH 10/20] spapr: Eliminate nr_irqs parameter to SpaprIrq::init

2019-09-25 Thread Greg Kurz

On Wed, 25 Sep 2019 16:45:24 +1000
David Gibson  wrote:

> The only reason this parameter was needed was to work around the
> inconsistent meaning of nr_irqs between xics and xive.  Now that we've
> fixed that, we can consistently use the number directly in the SpaprIrq
> configuration.
> 
> Signed-off-by: David Gibson 
> ---

Reviewed-by: Greg Kurz 

>  hw/ppc/spapr_irq.c | 21 ++---
>  include/hw/ppc/spapr_irq.h |  2 +-
>  2 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 5190a33e08..300c65be3a 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -92,8 +92,7 @@ static void spapr_irq_init_kvm(SpaprMachineState *spapr,
>   * XICS IRQ backend.
>   */
>  
> -static void spapr_irq_init_xics(SpaprMachineState *spapr, int nr_xirqs,
> -Error **errp)
> +static void spapr_irq_init_xics(SpaprMachineState *spapr, Error **errp)
>  {
>  Object *obj;
>  Error *local_err = NULL;
> @@ -102,7 +101,8 @@ static void spapr_irq_init_xics(SpaprMachineState *spapr, 
> int nr_xirqs,
>  object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort);
>  object_property_add_const_link(obj, ICS_PROP_XICS, OBJECT(spapr),
> &error_fatal);
> -object_property_set_int(obj, nr_xirqs, "nr-irqs",  &error_fatal);
> +object_property_set_int(obj, spapr->irq->nr_xirqs,
> +"nr-irqs",  &error_fatal);
>  object_property_set_bool(obj, true, "realized", &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> @@ -256,15 +256,15 @@ SpaprIrq spapr_irq_xics = {
>  /*
>   * XIVE IRQ backend.
>   */
> -static void spapr_irq_init_xive(SpaprMachineState *spapr, int nr_xirqs,
> -Error **errp)
> +static void spapr_irq_init_xive(SpaprMachineState *spapr, Error **errp)
>  {
>  uint32_t nr_servers = spapr_max_server_number(spapr);
>  DeviceState *dev;
>  int i;
>  
>  dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
> -qdev_prop_set_uint32(dev, "nr-irqs", nr_xirqs + SPAPR_XIRQ_BASE);
> +qdev_prop_set_uint32(dev, "nr-irqs",
> + spapr->irq->nr_xirqs + SPAPR_XIRQ_BASE);
>  /*
>   * 8 XIVE END structures per CPU. One for each available priority
>   */
> @@ -443,18 +443,17 @@ static SpaprIrq *spapr_irq_current(SpaprMachineState 
> *spapr)
>  &spapr_irq_xive : &spapr_irq_xics;
>  }
>  
> -static void spapr_irq_init_dual(SpaprMachineState *spapr, int nr_xirqs,
> -Error **errp)
> +static void spapr_irq_init_dual(SpaprMachineState *spapr, Error **errp)
>  {
>  Error *local_err = NULL;
>  
> -spapr_irq_xics.init(spapr, spapr_irq_xics.nr_xirqs, &local_err);
> +spapr_irq_xics.init(spapr, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  return;
>  }
>  
> -spapr_irq_xive.init(spapr, spapr_irq_xive.nr_xirqs, &local_err);
> +spapr_irq_xive.init(spapr, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
>  return;
> @@ -683,7 +682,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error 
> **errp)
>  spapr_irq_msi_init(spapr, spapr->irq->nr_msis);
>  }
>  
> -spapr->irq->init(spapr, spapr->irq->nr_xirqs, errp);
> +spapr->irq->init(spapr, errp);
>  
>  spapr->qirqs = qemu_allocate_irqs(spapr->irq->set_irq, spapr,
>spapr->irq->nr_xirqs + 
> SPAPR_XIRQ_BASE);
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index a8f9a2ab11..7e26288fcd 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -41,7 +41,7 @@ typedef struct SpaprIrq {
>  uint32_tnr_msis;
>  uint8_t ov5;
>  
> -void (*init)(SpaprMachineState *spapr, int nr_irqs, Error **errp);
> +void (*init)(SpaprMachineState *spapr, Error **errp);
>  int (*claim)(SpaprMachineState *spapr, int irq, bool lsi, Error **errp);
>  void (*free)(SpaprMachineState *spapr, int irq, int num);
>  qemu_irq (*qirq)(SpaprMachineState *spapr, int irq);

Re: [PATCH v4 06/16] cputlb: Introduce TLB_BSWAP

2019-09-25 Thread Richard Henderson

On 9/24/19 11:25 AM, Alex Bennée wrote:
>> -
>> -/* The backing page may or may not require I/O.  */
>> -tlb_addr &= ~TLB_WATCHPOINT;
>> -if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
>> -goto do_aligned_access;
>> -}
>>  }
>>
>/* We don't apply MO_BSWAP to op here because we want to
> * ensure the compiler can always unfold and dead-code away
> * the final load_memop in the fast path. If you try the
> * you will find the assert will get you ;-)
> */

I added

+/*
+ * Keep these two load_memop separate to ensure that the compiler
+ * is able to fold the entire function to a single instruction.
+ * There is a build-time assert inside to remind you of this.  ;-)
+ */


r~

Re: [PATCH-for-4.2 v11 11/11] tests: Add bios tests to arm/virt

2019-09-25 Thread Igor Mammedov

On Wed, 25 Sep 2019 11:26:04 -0400
"Michael S. Tsirkin"  wrote:

> On Wed, Sep 18, 2019 at 02:06:33PM +0100, Shameer Kolothum wrote:
> > This adds numamem and memhp tests for arm/virt platform.
> > 
> > Signed-off-by: Shameer Kolothum 
> > Reviewed-by: Igor Mammedov 
> > ---
> > v10-->v11
> > 
> > Added Igor's R-by.
> > 
> > In order to avoid a "make check" failure, the files listed in patch #10
> > have to be added to the tests/data/acpi/virt folder before this patch.
> 
> So you can just add empty stubs.

Wouldn't IASL choke on such files?

> 
> > ---
> >  tests/bios-tables-test.c | 49 
> >  1 file changed, 49 insertions(+)
> > 
> > diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
> > index 9b3d8b0d1b..6d9e2e41b0 100644
> > --- a/tests/bios-tables-test.c
> > +++ b/tests/bios-tables-test.c
> > @@ -870,6 +870,53 @@ static void test_acpi_piix4_tcg_dimm_pxm(void)
> >  test_acpi_tcg_dimm_pxm(MACHINE_PC);
> >  }
> >  
> > +static void test_acpi_virt_tcg_memhp(void)
> > +{
> > +test_data data = {
> > +.machine = "virt",
> > +.accel = "tcg",
> > +.uefi_fl1 = "pc-bios/edk2-aarch64-code.fd",
> > +.uefi_fl2 = "pc-bios/edk2-arm-vars.fd",
> > +.cd = 
> > "tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2",
> > +.ram_start = 0x4000ULL,
> > +.scan_len = 256ULL * 1024 * 1024,
> > +};
> > +
> > +data.variant = ".memhp";
> > +test_acpi_one(" -cpu cortex-a57"
> > +  " -m 256M,slots=3,maxmem=1G"
> > +  " -object memory-backend-ram,id=ram0,size=128M"
> > +  " -object memory-backend-ram,id=ram1,size=128M"
> > +  " -numa node,memdev=ram0 -numa node,memdev=ram1"
> > +  " -numa dist,src=0,dst=1,val=21",
> > +  &data);
> > +
> > +free_test_data(&data);
> > +
> > +}
> > +
> > +static void test_acpi_virt_tcg_numamem(void)
> > +{
> > +test_data data = {
> > +.machine = "virt",
> > +.accel = "tcg",
> > +.uefi_fl1 = "pc-bios/edk2-aarch64-code.fd",
> > +.uefi_fl2 = "pc-bios/edk2-arm-vars.fd",
> > +.cd = 
> > "tests/data/uefi-boot-images/bios-tables-test.aarch64.iso.qcow2",
> > +.ram_start = 0x4000ULL,
> > +.scan_len = 128ULL * 1024 * 1024,
> > +};
> > +
> > +data.variant = ".numamem";
> > +test_acpi_one(" -cpu cortex-a57"
> > +  " -object memory-backend-ram,id=ram0,size=128M"
> > +  " -numa node,memdev=ram0",
> > +  &data);
> > +
> > +free_test_data(&data);
> > +
> > +}
> > +
> >  static void test_acpi_virt_tcg(void)
> >  {
> >  test_data data = {
> > @@ -916,6 +963,8 @@ int main(int argc, char *argv[])
> >  qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
> >  } else if (strcmp(arch, "aarch64") == 0) {
> >  qtest_add_func("acpi/virt", test_acpi_virt_tcg);
> > +qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem);
> > +qtest_add_func("acpi/virt/memhp", test_acpi_virt_tcg_memhp);
> >  }
> >  ret = g_test_run();
> >  boot_sector_cleanup(disk);
> > -- 
> > 2.17.1
> > 
>

[PULL 0/2] Block patches

2019-09-25 Thread Stefan Hajnoczi

The following changes since commit 240ab11fb72049d6373cbbec8d788f8e411a00bc:

  Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20190924' into 
staging (2019-09-24 15:36:31 +0100)

are available in the Git repository at:

  https://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to f9a7e3698a737ee75a7b0af34203303df982550f:

  virtio-blk: schedule virtio_notify_config to run on main context (2019-09-25 
18:06:36 +0100)


Pull request



Sergio Lopez (1):
  virtio-blk: schedule virtio_notify_config to run on main context

Vladimir Sementsov-Ogievskiy (1):
  util/iov.c: try to reassure Coverity about qemu_iovec_init_extended

 hw/block/virtio-blk.c | 16 +++-
 util/iov.c|  3 ++-
 2 files changed, 17 insertions(+), 2 deletions(-)

-- 
2.21.0

[PULL 2/2] virtio-blk: schedule virtio_notify_config to run on main context

2019-09-25 Thread Stefan Hajnoczi

From: Sergio Lopez 

virtio_notify_config() needs to acquire the global mutex, which isn't
allowed from an iothread, and may lead to a deadlock like this:

 - main thread
  * Has acquired: qemu_global_mutex.
  * Is trying to acquire: iothread AioContext lock via
AIO_WAIT_WHILE (after aio_poll).

 - iothread
  * Has acquired: AioContext lock.
  * Is trying to acquire: qemu_global_mutex (via
virtio_notify_config->prepare_mmio_access).

If virtio_blk_resize() is called from an iothread, schedule
virtio_notify_config() to be run in the main context BH.

[Removed unnecessary newline as suggested by Kevin Wolf.
--Stefan]

Signed-off-by: Sergio Lopez 
Reviewed-by: Kevin Wolf 
Message-id: 20190916112411.21636-1-...@redhat.com
Message-Id: <20190916112411.21636-1-...@redhat.com>
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 18851601cb..ed2ddebd2b 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -16,6 +16,7 @@
 #include "qemu/iov.h"
 #include "qemu/module.h"
 #include "qemu/error-report.h"
+#include "qemu/main-loop.h"
 #include "trace.h"
 #include "hw/block/block.h"
 #include "hw/qdev-properties.h"
@@ -1086,11 +1087,24 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 return 0;
 }
 
+static void virtio_resize_cb(void *opaque)
+{
+VirtIODevice *vdev = opaque;
+
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+virtio_notify_config(vdev);
+}
+
 static void virtio_blk_resize(void *opaque)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
 
-virtio_notify_config(vdev);
+/*
+ * virtio_notify_config() needs to acquire the global mutex,
+ * so it can't be called from an iothread. Instead, schedule
+ * it to be run in the main context BH.
+ */
+aio_bh_schedule_oneshot(qemu_get_aio_context(), virtio_resize_cb, vdev);
 }
 
 static const BlockDevOps virtio_block_ops = {
-- 
2.21.0

[PULL 1/2] util/iov.c: try to reassure Coverity about qemu_iovec_init_extended

2019-09-25 Thread Stefan Hajnoczi

From: Vladimir Sementsov-Ogievskiy 

Make it more obvious that filling qiov corresponds to qiov allocation,
which in turn corresponds to the total_niov calculation, based on mid_niov
(not mid_len). Still, add an assertion to show that there should be no
difference.

Reported-by: Coverity (CID 1405302)
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20190910090310.14032-1-vsement...@virtuozzo.com
Suggested-by: Peter Maydell 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20190910090310.14032-1-vsement...@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi 
---
 util/iov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/iov.c b/util/iov.c
index 5059e10431..a4689ff3c9 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -446,7 +446,8 @@ void qemu_iovec_init_extended(
 p++;
 }
 
-if (mid_len) {
+assert(!mid_niov == !mid_len);
+if (mid_niov) {
 memcpy(p, mid_iov, mid_niov * sizeof(*p));
 p[0].iov_base = (uint8_t *)p[0].iov_base + mid_head;
 p[0].iov_len -= mid_head;
-- 
2.21.0

Re: [PATCH v4 10/16] cputlb: Partially inline memory_region_section_get_iotlb

2019-09-25 Thread Richard Henderson

On 9/24/19 12:59 AM, David Hildenbrand wrote:
>> +is_ram = memory_region_is_ram(section->mr);
>> +is_romd = memory_region_is_romd(section->mr);
>> +
>> +if (is_ram || is_romd) {
>> +/* RAM and ROMD both have associated host memory. */
>>  addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
>> +} else {
>> +/* I/O does not; force the host address to NULL. */
>> +addend = 0;
>> +}
>> +
>> +write_address = address;
> 
> I guess the only "suboptimal" change is that you now check
> "prot & PAGE_WRITE" twice in the case of ram instead of once.

It's a single bit test on a register operand -- as cheap as can be.  If you
look at the entire code, there *must* be more than one test.  You can rearrange
the code to choose exactly where those tests are, but you'll have to have them
somewhere.

>> +/* I/O or ROMD */
>> +iotlb = memory_region_section_get_iotlb(cpu, section) + xlat;
>> +/*
>> + * Writes to romd devices must go through MMIO to enable write.
>> + * Reads to romd devices go through the ram_ptr found above,
>> + * but of course reads to I/O must go through MMIO.
>> + */
>> +write_address |= TLB_MMIO;
> 
> ... and here you calculate write_address even if probably unused.

Well... while the page might not be writable (but I'd bet that it is -- I/O
memory is almost never read-only), and therefore write_address is technically
unused, the variable is practically used in the next line:

    if (!is_romd) {
        address = write_address;
    }

which will compile to a conditional move.

> Can you move the calculation of the write_address completely into the
> "prot & PAGE_WRITE" case below?

We'd prefer not to, since the code below runs with the CPU TLB lock held,
and we want to keep all of the expensive operations outside that region.

r~

[Bug 1841990] Re: instruction 'denbcdq' misbehaving

2019-09-25 Thread Paul Clarke

I'm still trying to track down a BE system.  Everything I have which is
newer than POWER7 is LE, and POWER7 is not sufficient to run the test.

The test suite that produced the problem is from
https://github.com/open-power-sdk/pveclib.  The good news is that with your (v1)
changes, 275 tests no longer fail.  22 tests still fail, but I bet those are
different issues.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1841990

Title:
  instruction 'denbcdq' misbehaving

Status in QEMU:
  New

Bug description:
  Instruction 'denbcdq' appears to have no effect.  Test case attached.

  On ppc64le native:
  --
  gcc -g -O -mcpu=power9 bcdcfsq.c test-denbcdq.c -o test-denbcdq
  $ ./test-denbcdq
  0x
  0x000c
  0x2208
  $ ./test-denbcdq 1
  0x0001
  0x001c
  0x22080001
  $ ./test-denbcdq $(seq 0 99)
  0x0064
  0x100c
  0x22080080
  --

  With "qemu-ppc64le -cpu power9"
  --
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq
  0x
  0x000c
  0x000c
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq 1
  0x0001
  0x001c
  0x001c
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq $(seq 100)
  0x0064
  0x100c
  0x100c
  --

  I started looking at the code, but I got confused rather quickly.
  Could be related to endianness? I think denbcdq arrived on the scene
  before little-endian was a big deal.  Maybe something to do with
  utilizing implicit floating-point register pairs...  I don't think the
  right data is getting to helper_denbcdq, which would point back to the
  gen_fprp_ptr uses in dfp-impl.inc.c (GEN_DFP_T_FPR_I32_Rc).  (Maybe?)


[PATCH 0/3] iotests: Fix 125

2019-09-25 Thread Max Reitz

Hi,

iotest 125 is very broken.  It uses qemu-img info’s “disk size” to
determine an image’s on-disk size, but it does so in a wrong way: It
just fetches the first number ([0-9]+), but that isn’t very useful
because qemu-img info emits human-readable values that include units and
decimal points.

We should use stat -c %b instead.  That’s done in patch 3.
Unfortunately, doing so exposed more problems.

Patch 1 fixes a stupid bug in the test itself that we never noticed
because of what patch 3 fixes.  (Pull patch 3 before patch 1 and you’ll
see.)

The other thing is actually a bug in XFS.  Its fallocate()
implementation rounds up the length independently of the offset, so if
you try to fallocate an unaligned range, chances are that it might not
allocate the last block your range touches.  Patch 2 detects that case
and skips the test then.  (Pull patch 3 before patch 2 and you’ll see
the test fail on XFS.)


Max Reitz (3):
  iotests: Fix 125 for growth_mode = metadata
  iotests: Disable 125 on broken XFS versions
  iotests: Use stat -c %b in 125

 tests/qemu-iotests/125 | 45 +++---
 1 file changed, 42 insertions(+), 3 deletions(-)

-- 
2.21.0

[PATCH 1/3] iotests: Fix 125 for growth_mode = metadata

2019-09-25 Thread Max Reitz

If we use growth_mode = metadata, it is very much possible that the file
uses more disk space after we have written something to the added area.
We did indeed want to test for this case, but unfortunately we evidently
just copied the code from the "Test creation preallocation" section and
forgot to replace "$create_mode" by "$growth_mode".

We never noticed because we only read the first number from qemu-img
info's "disk size" output -- and that is effectively useless, because
qemu-img prints a human-readable value (which generally includes a
decimal point).  That will be fixed in the patch after the next one.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/125 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index dc4b8f5fb9..df328a63a6 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -111,7 +111,7 @@ for GROWTH_SIZE in 16 48 80; do
 if [ $file_length_2 -gt $file_length_1 ]; then
 echo "ERROR (grow): Image length has grown from 
$file_length_1 to $file_length_2"
 fi
-if [ $create_mode != metadata ]; then
+if [ $growth_mode != metadata ]; then
 # The host size should not have grown either
 if [ $host_size_2 -gt $host_size_1 ]; then
 echo "ERROR (grow): Host size has grown from 
$host_size_1 to $host_size_2"
-- 
2.21.0

[PATCH 2/3] iotests: Disable 125 on broken XFS versions

2019-09-25 Thread Max Reitz

And by that I mean all XFS versions, as far as I can tell.  All details
are in the comment below.

We never noticed this problem because we only read the first number from
qemu-img info's "disk size" output -- and that is effectively useless,
because qemu-img prints a human-readable value (which generally includes
a decimal point).  That will be fixed in the next patch.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/125 | 40 
 1 file changed, 40 insertions(+)

diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index df328a63a6..0ef51f1e21 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -49,6 +49,46 @@ if [ -z "$TEST_IMG_FILE" ]; then
 TEST_IMG_FILE=$TEST_IMG
 fi
 
+# Test whether we are running on a broken XFS version.  There is this
+# bug:
+
+# $ rm -f foo
+# $ touch foo
+# $ block_size=4096 # Your FS's block size
+# $ fallocate -o $((block_size / 2)) -l $block_size foo
+# $ LANG=C xfs_bmap foo | grep hole
+# 1: [8..15]: hole
+#
+# The problem is that the XFS driver rounds down the offset and
+# rounds up the length to the block size, but independently.  As
+# such, it only allocates the first block in the example above,
+# even though it should allocate the first two blocks (because our
+# request is to fallocate something that touches both the first
+# two blocks).
+#
+# This means that when you then write to the beginning of the
+# second block, the disk usage of the first two blocks grows.
+#
+# That is precisely what fallocate() promises, though: That when you
+# write to an area that you have fallocated, no new blocks will have
+# to be allocated.
+
+touch "$TEST_IMG_FILE"
+# Assuming there is no FS with a block size greater than 64k
+fallocate -o 65535 -l 2 "$TEST_IMG_FILE"
+len0=$(get_image_size_on_host)
+
+# Write to something that in theory we have just fallocated
+# (Thus, the on-disk size should not increase)
+poke_file "$TEST_IMG_FILE" 65536 42
+len1=$(get_image_size_on_host)
+
+if [ $len1 -gt $len0 ]; then
+_notrun "the test filesystem's fallocate() is broken"
+fi
+
+rm -f "$TEST_IMG_FILE"
+
 # Generally, we create some image with or without existing preallocation and
 # then resize it. Then we write some data into the image and verify that its
 # size does not change if we have used preallocation.
-- 
2.21.0

[PATCH 3/3] iotests: Use stat -c %b in 125

2019-09-25 Thread Max Reitz

125 should not use qemu-img to get the on-disk image size, because that
reports it in a human-readable format that is useless to us.  Just use
stat instead (like we do to get the image file length).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/125 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index 0ef51f1e21..4e31aa4e5f 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -34,8 +34,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 get_image_size_on_host()
 {
-$QEMU_IMG info -f "$IMGFMT" "$TEST_IMG" | grep "disk size" \
-| sed -e 's/^[^0-9]*\([0-9]\+\).*$/\1/'
+echo $(($(stat -c '%b * %B' "$TEST_IMG_FILE")))
 }
 
 # get standard environment and filters
-- 
2.21.0

[PULL 02/16] cputlb: Disable __always_inline__ without optimization

2019-09-25 Thread Richard Henderson

This forced inlining can result in missing symbols,
which makes a debugging build harder to follow.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: David Hildenbrand 
Reported-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/qemu/compiler.h | 11 +++
 accel/tcg/cputlb.c  |  4 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 09fc44cca4..20780e722d 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -170,6 +170,17 @@
 # define QEMU_NONSTRING
 #endif
 
+/*
+ * Forced inlining may be desired to encourage constant propagation
+ * of function parameters.  However, it can also make debugging harder,
+ * so disable it for a non-optimizing build.
+ */
+#if defined(__OPTIMIZE__)
+#define QEMU_ALWAYS_INLINE  __attribute__((always_inline))
+#else
+#define QEMU_ALWAYS_INLINE
+#endif
+
 /* Implement C11 _Generic via GCC builtins.  Example:
  *
  *QEMU_GENERIC(x, (float, sinf), (long double, sinl), sin) (x)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index abae79650c..b87764 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1281,7 +1281,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, 
target_ulong addr,
 typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
 TCGMemOpIdx oi, uintptr_t retaddr);
 
-static inline uint64_t __attribute__((always_inline))
+static inline uint64_t QEMU_ALWAYS_INLINE
 load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 uintptr_t retaddr, MemOp op, bool code_read,
 FullLoadHelper *full_load)
@@ -1530,7 +1530,7 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, 
target_ulong addr,
  * Store Helpers
  */
 
-static inline void __attribute__((always_inline))
+static inline void QEMU_ALWAYS_INLINE
 store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
  TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
 {
-- 
2.17.1

[PULL 05/16] cputlb: Split out load/store_memop

2019-09-25 Thread Richard Henderson

We will shortly be using these more than once.

Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 107 +++--
 1 file changed, 55 insertions(+), 52 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e31378bce3..eeba8c9847 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1281,6 +1281,29 @@ static void *atomic_mmu_lookup(CPUArchState *env, 
target_ulong addr,
 typedef uint64_t FullLoadHelper(CPUArchState *env, target_ulong addr,
 TCGMemOpIdx oi, uintptr_t retaddr);
 
+static inline uint64_t QEMU_ALWAYS_INLINE
+load_memop(const void *haddr, MemOp op)
+{
+switch (op) {
+case MO_UB:
+return ldub_p(haddr);
+case MO_BEUW:
+return lduw_be_p(haddr);
+case MO_LEUW:
+return lduw_le_p(haddr);
+case MO_BEUL:
+return (uint32_t)ldl_be_p(haddr);
+case MO_LEUL:
+return (uint32_t)ldl_le_p(haddr);
+case MO_BEQ:
+return ldq_be_p(haddr);
+case MO_LEQ:
+return ldq_le_p(haddr);
+default:
+qemu_build_not_reached();
+}
+}
+
 static inline uint64_t QEMU_ALWAYS_INLINE
 load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 uintptr_t retaddr, MemOp op, bool code_read,
@@ -1373,33 +1396,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 
  do_aligned_access:
 haddr = (void *)((uintptr_t)addr + entry->addend);
-switch (op) {
-case MO_UB:
-res = ldub_p(haddr);
-break;
-case MO_BEUW:
-res = lduw_be_p(haddr);
-break;
-case MO_LEUW:
-res = lduw_le_p(haddr);
-break;
-case MO_BEUL:
-res = (uint32_t)ldl_be_p(haddr);
-break;
-case MO_LEUL:
-res = (uint32_t)ldl_le_p(haddr);
-break;
-case MO_BEQ:
-res = ldq_be_p(haddr);
-break;
-case MO_LEQ:
-res = ldq_le_p(haddr);
-break;
-default:
-qemu_build_not_reached();
-}
-
-return res;
+return load_memop(haddr, op);
 }
 
 /*
@@ -1530,6 +1527,36 @@ tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
  * Store Helpers
  */
 
+static inline void QEMU_ALWAYS_INLINE
+store_memop(void *haddr, uint64_t val, MemOp op)
+{
+switch (op) {
+case MO_UB:
+stb_p(haddr, val);
+break;
+case MO_BEUW:
+stw_be_p(haddr, val);
+break;
+case MO_LEUW:
+stw_le_p(haddr, val);
+break;
+case MO_BEUL:
+stl_be_p(haddr, val);
+break;
+case MO_LEUL:
+stl_le_p(haddr, val);
+break;
+case MO_BEQ:
+stq_be_p(haddr, val);
+break;
+case MO_LEQ:
+stq_le_p(haddr, val);
+break;
+default:
+qemu_build_not_reached();
+}
+}
+
 static inline void QEMU_ALWAYS_INLINE
 store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
  TCGMemOpIdx oi, uintptr_t retaddr, MemOp op)
@@ -1657,31 +1684,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 
  do_aligned_access:
 haddr = (void *)((uintptr_t)addr + entry->addend);
-switch (op) {
-case MO_UB:
-stb_p(haddr, val);
-break;
-case MO_BEUW:
-stw_be_p(haddr, val);
-break;
-case MO_LEUW:
-stw_le_p(haddr, val);
-break;
-case MO_BEUL:
-stl_be_p(haddr, val);
-break;
-case MO_LEUL:
-stl_le_p(haddr, val);
-break;
-case MO_BEQ:
-stq_be_p(haddr, val);
-break;
-case MO_LEQ:
-stq_le_p(haddr, val);
-break;
-default:
-qemu_build_not_reached();
-}
+store_memop(haddr, val, op);
 }
 
 void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
-- 
2.17.1

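With this patch the MemOp switch lives in one place per direction, so later callers can reuse it. A self-contained sketch of the same shape follows. The `SK_MO_*` encoding (log2 of the size in bits 0-1, big-endian in bit 3) and the byte-assembly helpers are assumptions for illustration; QEMU's real MemOp values and its `ld*_p`/`st*_p` accessors are defined elsewhere, and `abort()` stands in for `qemu_build_not_reached()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical MemOp encoding for this sketch only. */
enum {
    SK_MO_8 = 0, SK_MO_16 = 1, SK_MO_32 = 2, SK_MO_64 = 3,
    SK_MO_BE = 8,
};
typedef unsigned SkMemOp;

/* Read 'size' bytes from p in the requested byte order. */
static uint64_t sk_ld(const uint8_t *p, unsigned size, int big)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < size; i++) {
        v |= (uint64_t)p[i] << (8 * (big ? size - 1 - i : i));
    }
    return v;
}

/* Write the low 'size' bytes of v to p in the requested byte order. */
static void sk_st(uint8_t *p, uint64_t v, unsigned size, int big)
{
    for (unsigned i = 0; i < size; i++) {
        p[i] = (uint8_t)(v >> (8 * (big ? size - 1 - i : i)));
    }
}

/* Single load dispatcher shared by every load path. */
static uint64_t sk_load_memop(const void *haddr, SkMemOp op)
{
    switch (op) {
    case SK_MO_8:             return sk_ld(haddr, 1, 0);
    case SK_MO_16:            return sk_ld(haddr, 2, 0);
    case SK_MO_16 | SK_MO_BE: return sk_ld(haddr, 2, 1);
    case SK_MO_32:            return sk_ld(haddr, 4, 0);
    case SK_MO_32 | SK_MO_BE: return sk_ld(haddr, 4, 1);
    case SK_MO_64:            return sk_ld(haddr, 8, 0);
    case SK_MO_64 | SK_MO_BE: return sk_ld(haddr, 8, 1);
    default:
        abort();  /* stand-in for qemu_build_not_reached() */
    }
}

/* Matching store dispatcher. */
static void sk_store_memop(void *haddr, uint64_t val, SkMemOp op)
{
    switch (op) {
    case SK_MO_8:             sk_st(haddr, val, 1, 0); break;
    case SK_MO_16:            sk_st(haddr, val, 2, 0); break;
    case SK_MO_16 | SK_MO_BE: sk_st(haddr, val, 2, 1); break;
    case SK_MO_32:            sk_st(haddr, val, 4, 0); break;
    case SK_MO_32 | SK_MO_BE: sk_st(haddr, val, 4, 1); break;
    case SK_MO_64:            sk_st(haddr, val, 8, 0); break;
    case SK_MO_64 | SK_MO_BE: sk_st(haddr, val, 8, 1); break;
    default:
        abort();  /* stand-in for qemu_build_not_reached() */
    }
}
```

Because the byte order is assembled explicitly, the results do not depend on host endianness.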
[PULL 03/16] qemu/compiler.h: Add qemu_build_not_reached

2019-09-25 Thread Richard Henderson

Use this as a compile-time assert that a particular
code path is not reachable.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/qemu/compiler.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 20780e722d..7b93c73340 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -221,4 +221,19 @@
 #define QEMU_GENERIC9(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC8(x, __VA_ARGS__))
 #define QEMU_GENERIC10(x, a0, ...) QEMU_GENERIC_IF(x, a0, QEMU_GENERIC9(x, __VA_ARGS__))
 
+/**
+ * qemu_build_not_reached()
+ *
+ * The compiler, during optimization, is expected to prove that a call
+ * to this function cannot be reached and remove it.  If the compiler
+ * supports QEMU_ERROR, this will be reported at compile time; otherwise
+ * this will be reported at link time due to the missing symbol.
+ */
+#ifdef __OPTIMIZE__
+extern void QEMU_NORETURN QEMU_ERROR("code path is reachable")
+qemu_build_not_reached(void);
+#else
+#define qemu_build_not_reached()  g_assert_not_reached()
+#endif
+
 #endif /* COMPILER_H */
-- 
2.17.1

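The technique is to declare a function that is never defined and let the optimizer prove every call to it dead. The minimal sketch below assumes GCC's `error` function attribute; clang and unoptimized builds take a runtime fallback, just as the non-`__OPTIMIZE__` branch in the patch falls back to `g_assert_not_reached()`. The names `sk_build_not_reached` and `sk_width` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Optimized GCC builds: a surviving call is diagnosed at compile time.
 * Everything else (clang, -O0, inlining disabled): abort at run time.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__OPTIMIZE__) && !defined(__NO_INLINE__)
extern void sk_build_not_reached(void)
    __attribute__((noreturn, error("code path is reachable")));
#else
#define sk_build_not_reached() abort()
#endif

/* Width in bytes of a hypothetical size code; callers pass constants. */
static inline __attribute__((always_inline)) unsigned sk_width(unsigned code)
{
    switch (code) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 4;
    case 3: return 8;
    default:
        sk_build_not_reached();
    }
}
```

In an optimized GCC build, a call such as `sk_width(n)` with a non-constant `n` keeps the `default:` branch alive and fails to compile, which is the compile-time check that load_helper and store_helper gain in `[PULL 04/16]`.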
[PULL 00/16] tcg patch queue

2019-09-25 Thread Richard Henderson

This is v4 of my notdirty + rom patch set with two suggested name
changes (qemu_build_not_reached, TLB_DISCARD_WRITE) from David and Alex.


r~


The following changes since commit 240ab11fb72049d6373cbbec8d788f8e411a00bc:

  Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20190924' into staging (2019-09-24 15:36:31 +0100)

are available in the Git repository at:

  https://github.com/rth7680/qemu.git tags/pull-tcg-20190925

for you to fetch changes up to ae57db63acf5a0399232f852acc5c1d83ef63400:

  cputlb: Pass retaddr to tb_check_watchpoint (2019-09-25 10:56:28 -0700)


Fixes for TLB_BSWAP
Conversion of NOTDIRTY and ROM handling to cputlb
Followup cleanups to cputlb


Richard Henderson (16):
  exec: Use TARGET_PAGE_BITS_MIN for TLB flags
  cputlb: Disable __always_inline__ without optimization
  qemu/compiler.h: Add qemu_build_not_reached
  cputlb: Use qemu_build_not_reached in load/store_helpers
  cputlb: Split out load/store_memop
  cputlb: Introduce TLB_BSWAP
  exec: Adjust notdirty tracing
  cputlb: Move ROM handling from I/O path to TLB path
  cputlb: Move NOTDIRTY handling from I/O path to TLB path
  cputlb: Partially inline memory_region_section_get_iotlb
  cputlb: Merge and move memory_notdirty_write_{prepare,complete}
  cputlb: Handle TLB_NOTDIRTY in probe_access
  cputlb: Remove cpu->mem_io_vaddr
  cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
  cputlb: Pass retaddr to tb_invalidate_phys_page_fast
  cputlb: Pass retaddr to tb_check_watchpoint

 accel/tcg/translate-all.h  |   8 +-
 include/exec/cpu-all.h |  23 ++-
 include/exec/cpu-common.h  |   3 -
 include/exec/exec-all.h|   6 +-
 include/exec/memory-internal.h |  65 
 include/hw/core/cpu.h  |   2 -
 include/qemu/compiler.h|  26 +++
 accel/tcg/cputlb.c | 348 +
 accel/tcg/translate-all.c  |  51 +++---
 exec.c | 158 +--
 hw/core/cpu.c  |   1 -
 memory.c   |  20 ---
 trace-events   |   4 +-
 13 files changed, 288 insertions(+), 427 deletions(-)

[PULL 04/16] cputlb: Use qemu_build_not_reached in load/store_helpers

2019-09-25 Thread Richard Henderson

Increase the current runtime assert to a compile-time assert.

Reviewed-by: David Hildenbrand 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index b87764..e31378bce3 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1396,7 +1396,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 res = ldq_le_p(haddr);
 break;
 default:
-g_assert_not_reached();
+qemu_build_not_reached();
 }
 
 return res;
@@ -1680,8 +1680,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 stq_le_p(haddr, val);
 break;
 default:
-g_assert_not_reached();
-break;
+qemu_build_not_reached();
 }
 }
 
-- 
2.17.1

[PULL 01/16] exec: Use TARGET_PAGE_BITS_MIN for TLB flags

2019-09-25 Thread Richard Henderson

These bits do not need to vary with the actual page size
used by the guest.

Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index d2d443c4f9..e0c8dc540c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -317,20 +317,24 @@ CPUArchState *cpu_copy(CPUArchState *env);
 
 #if !defined(CONFIG_USER_ONLY)
 
-/* Flags stored in the low bits of the TLB virtual address.  These are
- * defined so that fast path ram access is all zeros.
+/*
+ * Flags stored in the low bits of the TLB virtual address.
+ * These are defined so that fast path ram access is all zeros.
  * The flags all must be between TARGET_PAGE_BITS and
  * maximum address alignment bit.
+ *
+ * Use TARGET_PAGE_BITS_MIN so that these bits are constant
+ * when TARGET_PAGE_BITS_VARY is in effect.
  */
 /* Zero if TLB entry is valid.  */
-#define TLB_INVALID_MASK(1 << (TARGET_PAGE_BITS - 1))
+#define TLB_INVALID_MASK(1 << (TARGET_PAGE_BITS_MIN - 1))
 /* Set if TLB entry references a clean RAM page.  The iotlb entry will
contain the page physical address.  */
-#define TLB_NOTDIRTY(1 << (TARGET_PAGE_BITS - 2))
+#define TLB_NOTDIRTY(1 << (TARGET_PAGE_BITS_MIN - 2))
 /* Set if TLB entry is an IO callback.  */
-#define TLB_MMIO(1 << (TARGET_PAGE_BITS - 3))
+#define TLB_MMIO(1 << (TARGET_PAGE_BITS_MIN - 3))
 /* Set if TLB entry contains a watchpoint.  */
-#define TLB_WATCHPOINT  (1 << (TARGET_PAGE_BITS - 4))
+#define TLB_WATCHPOINT  (1 << (TARGET_PAGE_BITS_MIN - 4))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
-- 
2.17.1

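Deriving the flag bits from TARGET_PAGE_BITS_MIN keeps them compile-time constants while still guaranteeing that they sit inside the page offset for whatever page size the target picks at run time. The sketch below assumes a minimum of 10 page bits purely for illustration and shows why a plain RAM hit must have every flag clear: the fast path compares the page-masked guest address against the TLB comparator, so any set flag forces a mismatch and thus the slow path. All `SK_*` names and `sk_tlb_fast_hit` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimum page size of 1 KiB, for illustration only. */
#define SK_PAGE_BITS_MIN 10

#define SK_INVALID_MASK (1 << (SK_PAGE_BITS_MIN - 1))
#define SK_NOTDIRTY     (1 << (SK_PAGE_BITS_MIN - 2))
#define SK_MMIO         (1 << (SK_PAGE_BITS_MIN - 3))
#define SK_WATCHPOINT   (1 << (SK_PAGE_BITS_MIN - 4))
#define SK_FLAGS_MASK \
    (SK_INVALID_MASK | SK_NOTDIRTY | SK_MMIO | SK_WATCHPOINT)

/* The flags must lie inside the smallest possible page offset. */
_Static_assert((SK_FLAGS_MASK >> SK_PAGE_BITS_MIN) == 0,
               "TLB flags overlap the page number");

/*
 * Fast-path hit test: 'tlb_addr' holds the page address plus any flags;
 * 'page_bits' is the run-time page size, never below the minimum.
 */
static int sk_tlb_fast_hit(uint64_t tlb_addr, uint64_t addr, unsigned page_bits)
{
    uint64_t page_mask = ~(((uint64_t)1 << page_bits) - 1);

    assert(page_bits >= SK_PAGE_BITS_MIN);
    return (addr & page_mask) == tlb_addr;
}
```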
[PULL 08/16] cputlb: Move ROM handling from I/O path to TLB path

2019-09-25 Thread Richard Henderson

It does not require going through the whole I/O path
in order to discard a write.

Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h|  5 -
 include/exec/cpu-common.h |  1 -
 accel/tcg/cputlb.c| 36 --
 exec.c| 41 +--
 4 files changed, 26 insertions(+), 57 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index d148bded35..ad9ab85eb3 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -337,12 +337,15 @@ CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_WATCHPOINT  (1 << (TARGET_PAGE_BITS_MIN - 4))
 /* Set if TLB entry requires byte swap.  */
 #define TLB_BSWAP   (1 << (TARGET_PAGE_BITS_MIN - 5))
+/* Set if TLB entry writes ignored.  */
+#define TLB_DISCARD_WRITE   (1 << (TARGET_PAGE_BITS_MIN - 6))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
  */
 #define TLB_FLAGS_MASK \
-(TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
+(TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \
+| TLB_WATCHPOINT | TLB_BSWAP | TLB_DISCARD_WRITE)
 
 /**
  * tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index f7dbe75fbc..1c0e03ddc2 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -100,7 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
 
 void cpu_flush_icache_range(hwaddr start, hwaddr len);
 
-extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
 
 typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 028eebcb44..404ec57a4e 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -577,7 +577,8 @@ static void tlb_reset_dirty_range_locked(CPUTLBEntry *tlb_entry,
 {
 uintptr_t addr = tlb_entry->addr_write;
 
-if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+if ((addr & (TLB_INVALID_MASK | TLB_MMIO |
+ TLB_DISCARD_WRITE | TLB_NOTDIRTY)) == 0) {
 addr &= TARGET_PAGE_MASK;
 addr += tlb_entry->addend;
 if ((addr - start) < length) {
@@ -745,7 +746,6 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 address |= TLB_MMIO;
 addend = 0;
 } else {
-/* TLB_MMIO for rom/romd handled below */
 addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
 }
 
@@ -822,16 +822,17 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 
 tn.addr_write = -1;
 if (prot & PAGE_WRITE) {
-if ((memory_region_is_ram(section->mr) && section->readonly)
-|| memory_region_is_romd(section->mr)) {
-/* Write access calls the I/O callback.  */
-tn.addr_write = address | TLB_MMIO;
-} else if (memory_region_is_ram(section->mr)
-   && cpu_physical_memory_is_clean(
-   memory_region_get_ram_addr(section->mr) + xlat)) {
-tn.addr_write = address | TLB_NOTDIRTY;
-} else {
-tn.addr_write = address;
+tn.addr_write = address;
+if (memory_region_is_romd(section->mr)) {
+/* Use the MMIO path so that the device can switch states. */
+tn.addr_write |= TLB_MMIO;
+} else if (memory_region_is_ram(section->mr)) {
+if (section->readonly) {
+tn.addr_write |= TLB_DISCARD_WRITE;
+} else if (cpu_physical_memory_is_clean(
+memory_region_get_ram_addr(section->mr) + xlat)) {
+tn.addr_write |= TLB_NOTDIRTY;
+}
 }
 if (prot & PAGE_WRITE_INV) {
 tn.addr_write |= TLB_INVALID_MASK;
@@ -904,7 +905,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 mr = section->mr;
 mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
 cpu->mem_io_pc = retaddr;
-if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+if (mr != &io_mem_notdirty && !cpu->can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
 
@@ -945,7 +946,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
 mr = section->mr;
 mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
-if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+if (mr != &io_mem_notdirty && !cpu->can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
 cpu->mem_io_vaddr = addr;
@@ -1125,7 +1126,7 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
 }
 
 /* Reject I/O access, or other required slow-path.  */
-if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
+if (tlb_addr & (TLB_NOTDIRTY | TLB_MMI

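The write-side classification in the new `tlb_set_page_with_attrs` hunk, and the reason a ROM write can now be dropped without entering the I/O path, can be modelled as below. `SkSection` is a hypothetical stand-in for `MemoryRegionSection` plus the `cpu_physical_memory_is_clean()` result, the flag values are assumed, and the store helper only models a RAM-backed page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values (any distinct low bits would do). */
#define SK_NOTDIRTY       (1u << 8)
#define SK_MMIO           (1u << 7)
#define SK_DISCARD_WRITE  (1u << 4)

/* Hypothetical summary of one memory region section. */
typedef struct {
    bool is_ram;
    bool is_romd;
    bool readonly;
    bool clean;   /* stands in for cpu_physical_memory_is_clean() */
} SkSection;

/* Write-side flags, following the order of tests in the new hunk. */
static uint32_t sk_write_flags(const SkSection *s)
{
    if (s->is_romd) {
        return SK_MMIO;                  /* device may need to switch states */
    }
    if (!s->is_ram) {
        return SK_MMIO;                  /* plain I/O: 'address' already has it */
    }
    if (s->readonly) {
        return SK_DISCARD_WRITE;         /* ROM: the write is ignored */
    }
    return s->clean ? SK_NOTDIRTY : 0;   /* writable RAM */
}

/* Store one byte to a RAM-backed page; returns false if it was dropped. */
static bool sk_ram_store_byte(uint8_t *host, uint32_t write_flags, uint8_t val)
{
    if (write_flags & SK_DISCARD_WRITE) {
        return false;   /* dropped without any I/O round trip */
    }
    *host = val;
    return true;
}
```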
[PULL 07/16] exec: Adjust notdirty tracing

2019-09-25 Thread Richard Henderson

The memory_region_tb_read tracepoint is unreachable, since notdirty
is supposed to apply only to writes.  The memory_region_tb_write
tracepoint is mis-named, because notdirty is not only used for TB
invalidation.  It is also used for e.g. VGA RAM updates and migration.

Replace memory_region_tb_write with memory_notdirty_write_access,
and place it in memory_notdirty_write_prepare where it can catch
all of the instances.  Add memory_notdirty_set_dirty to log when
we no longer intercept writes to a page.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 exec.c   | 3 +++
 memory.c | 4 
 trace-events | 4 ++--
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 8b998974f8..5f2587b621 100644
--- a/exec.c
+++ b/exec.c
@@ -2755,6 +2755,8 @@ void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
 ndi->size = size;
 ndi->pages = NULL;
 
+trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
+
 assert(tcg_enabled());
 if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
 ndi->pages = page_collection_lock(ram_addr, ram_addr + size);
@@ -2779,6 +2781,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
 /* we remove the notdirty callback only if the code has been
flushed */
 if (!cpu_physical_memory_is_clean(ndi->ram_addr)) {
+trace_memory_notdirty_set_dirty(ndi->mem_vaddr);
 tlb_set_dirty(ndi->cpu, ndi->mem_vaddr);
 }
 }
diff --git a/memory.c b/memory.c
index b9dd6b94ca..57c44c97db 100644
--- a/memory.c
+++ b/memory.c
@@ -438,7 +438,6 @@ static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
 /* Accesses to code which has previously been translated into a TB show
  * up in the MMIO path, as accesses to the io_mem_notdirty
  * MemoryRegion. */
-trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -465,7 +464,6 @@ static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
 /* Accesses to code which has previously been translated into a TB show
  * up in the MMIO path, as accesses to the io_mem_notdirty
  * MemoryRegion. */
-trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -490,7 +488,6 @@ static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
 /* Accesses to code which has previously been translated into a TB show
  * up in the MMIO path, as accesses to the io_mem_notdirty
  * MemoryRegion. */
-trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -515,7 +512,6 @@ static MemTxResult memory_region_write_with_attrs_accessor(MemoryRegion *mr,
 /* Accesses to code which has previously been translated into a TB show
  * up in the MMIO path, as accesses to the io_mem_notdirty
  * MemoryRegion. */
-trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, size);
diff --git a/trace-events b/trace-events
index 823a4ae64e..20821ba545 100644
--- a/trace-events
+++ b/trace-events
@@ -52,14 +52,14 @@ dma_map_wait(void *dbs) "dbs=%p"
 find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
 find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
 ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
+memory_notdirty_write_access(uint64_t vaddr, uint64_t ram_addr, unsigned size) "0x%" PRIx64 " ram_addr 0x%" PRIx64 " size %u"
+memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
 
 # memory.c
 memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" s

[PULL 10/16] cputlb: Partially inline memory_region_section_get_iotlb

2019-09-25 Thread Richard Henderson

There is only one caller, tlb_set_page_with_attrs.  We cannot
inline the entire function because the AddressSpaceDispatch
structure is private to exec.c, and cannot easily be moved to
include/exec/memory-internal.h.

Compute is_ram and is_romd once within tlb_set_page_with_attrs.
Fold the number of tests against these predicates.  Compute
cpu_physical_memory_is_clean outside of the tlb lock region.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h |  6 +---
 accel/tcg/cputlb.c  | 68 ++---
 exec.c  | 22 ++---
 3 files changed, 47 insertions(+), 49 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 81b02eb2fe..49db07ba0b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -509,11 +509,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
   hwaddr *xlat, hwaddr *plen,
   MemTxAttrs attrs, int *prot);
 hwaddr memory_region_section_get_iotlb(CPUState *cpu,
-   MemoryRegionSection *section,
-   target_ulong vaddr,
-   hwaddr paddr, hwaddr xlat,
-   int prot,
-   target_ulong *address);
+   MemoryRegionSection *section);
 #endif
 
 /* vl.c */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 7e9a0f7ac8..4f118d2cc9 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -705,13 +705,14 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 MemoryRegionSection *section;
 unsigned int index;
 target_ulong address;
-target_ulong code_address;
+target_ulong write_address;
 uintptr_t addend;
 CPUTLBEntry *te, tn;
 hwaddr iotlb, xlat, sz, paddr_page;
 target_ulong vaddr_page;
 int asidx = cpu_asidx_from_attrs(cpu, attrs);
 int wp_flags;
+bool is_ram, is_romd;
 
 assert_cpu_is_self(cpu);
 
@@ -740,18 +741,46 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 if (attrs.byte_swap) {
 address |= TLB_BSWAP;
 }
-if (!memory_region_is_ram(section->mr) &&
-!memory_region_is_romd(section->mr)) {
-/* IO memory case */
-address |= TLB_MMIO;
-addend = 0;
-} else {
+
+is_ram = memory_region_is_ram(section->mr);
+is_romd = memory_region_is_romd(section->mr);
+
+if (is_ram || is_romd) {
+/* RAM and ROMD both have associated host memory. */
 addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
+} else {
+/* I/O does not; force the host address to NULL. */
+addend = 0;
+}
+
+write_address = address;
+if (is_ram) {
+iotlb = memory_region_get_ram_addr(section->mr) + xlat;
+/*
+ * Computing is_clean is expensive; avoid all that unless
+ * the page is actually writable.
+ */
+if (prot & PAGE_WRITE) {
+if (section->readonly) {
+write_address |= TLB_DISCARD_WRITE;
+} else if (cpu_physical_memory_is_clean(iotlb)) {
+write_address |= TLB_NOTDIRTY;
+}
+}
+} else {
+/* I/O or ROMD */
+iotlb = memory_region_section_get_iotlb(cpu, section) + xlat;
+/*
+ * Writes to romd devices must go through MMIO to enable write.
+ * Reads to romd devices go through the ram_ptr found above,
+ * but of course reads to I/O must go through MMIO.
+ */
+write_address |= TLB_MMIO;
+if (!is_romd) {
+address = write_address;
+}
 }
 
-code_address = address;
-iotlb = memory_region_section_get_iotlb(cpu, section, vaddr_page,
-paddr_page, xlat, prot, &address);
 wp_flags = cpu_watchpoint_address_matches(cpu, vaddr_page,
   TARGET_PAGE_SIZE);
 
@@ -791,8 +820,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 /*
  * At this point iotlb contains a physical section number in the lower
  * TARGET_PAGE_BITS, and either
- *  + the ram_addr_t of the page base of the target RAM (if NOTDIRTY or ROM)
- *  + the offset within section->mr of the page base (otherwise)
+ *  + the ram_addr_t of the page base of the target RAM (RAM)
+ *  + the offset within section->mr of the page base (I/O, ROMD)
  * We subtract the vaddr_page (which is page aligned and thus won't
  * disturb the low bits) to give an offset which can be added to the
  * (non-page-aligned) vaddr of the eventual memory access to get
@@ -815,25 +844,14 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 }
 
 if (prot & PAGE_EXEC) {
-tn.addr_

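The restructured hunk evaluates `is_ram` and `is_romd` once and derives the host pointer, the read comparator (`address`) and the write comparator (`write_address`) from them. The sketch below captures just that decision table; the RAM-only write flags (TLB_DISCARD_WRITE, TLB_NOTDIRTY) are left out, and `SkEntryFlags`, `sk_classify` and the flag value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MMIO (1u << 7)   /* hypothetical flag value */

typedef struct {
    uint32_t read_flags;    /* flags folded into 'address' */
    uint32_t write_flags;   /* flags folded into 'write_address' */
    bool has_host_ptr;      /* whether 'addend' points at host memory */
} SkEntryFlags;

/* Evaluate both predicates once and derive every decision from them. */
static SkEntryFlags sk_classify(bool is_ram, bool is_romd)
{
    /* RAM and ROMD both have associated host memory; I/O does not. */
    SkEntryFlags e = { 0, 0, is_ram || is_romd };

    if (!is_ram) {
        /* I/O or ROMD: writes must go through MMIO. */
        e.write_flags |= SK_MMIO;
        if (!is_romd) {
            /* Plain I/O: reads need MMIO too; ROMD reads use host memory. */
            e.read_flags = e.write_flags;
        }
    }
    return e;
}
```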
[PULL 06/16] cputlb: Introduce TLB_BSWAP

2019-09-25 Thread Richard Henderson

Handle bswap on ram directly in load/store_helper.  This fixes a
bug with the previous implementation in that one cannot use the
I/O path for RAM.

Fixes: a26fc6f5152b47f1
Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h |  4 ++-
 accel/tcg/cputlb.c | 72 +-
 2 files changed, 46 insertions(+), 30 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e0c8dc540c..d148bded35 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -335,12 +335,14 @@ CPUArchState *cpu_copy(CPUArchState *env);
 #define TLB_MMIO(1 << (TARGET_PAGE_BITS_MIN - 3))
 /* Set if TLB entry contains a watchpoint.  */
 #define TLB_WATCHPOINT  (1 << (TARGET_PAGE_BITS_MIN - 4))
+/* Set if TLB entry requires byte swap.  */
+#define TLB_BSWAP   (1 << (TARGET_PAGE_BITS_MIN - 5))
 
 /* Use this mask to check interception with an alignment mask
  * in a TCG backend.
  */
 #define TLB_FLAGS_MASK \
-(TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT)
+(TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO | TLB_WATCHPOINT | TLB_BSWAP)
 
 /**
  * tlb_hit_page: return true if page aligned @addr is a hit against the
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index eeba8c9847..028eebcb44 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -737,8 +737,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 address |= TLB_INVALID_MASK;
 }
 if (attrs.byte_swap) {
-/* Force the access through the I/O slow path.  */
-address |= TLB_MMIO;
+address |= TLB_BSWAP;
 }
 if (!memory_region_is_ram(section->mr) &&
 !memory_region_is_romd(section->mr)) {
@@ -901,10 +900,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 bool locked = false;
 MemTxResult r;
 
-if (iotlbentry->attrs.byte_swap) {
-op ^= MO_BSWAP;
-}
-
 section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
 mr = section->mr;
 mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
@@ -947,10 +942,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 bool locked = false;
 MemTxResult r;
 
-if (iotlbentry->attrs.byte_swap) {
-op ^= MO_BSWAP;
-}
-
 section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
 mr = section->mr;
 mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
@@ -1133,8 +1124,8 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
  wp_access, retaddr);
 }
 
-if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO)) {
-/* I/O access */
+/* Reject I/O access, or other required slow-path.  */
+if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP)) {
 return NULL;
 }
 
@@ -1344,6 +1335,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 /* Handle anything that isn't just a straight memory access.  */
 if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
 CPUIOTLBEntry *iotlbentry;
+bool need_swap;
 
 /* For anything that is unaligned, recurse through full_load.  */
 if ((addr & (size - 1)) != 0) {
@@ -1357,17 +1349,27 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 /* On watchpoint hit, this will longjmp out.  */
 cpu_check_watchpoint(env_cpu(env), addr, size,
  iotlbentry->attrs, BP_MEM_READ, retaddr);
-
-/* The backing page may or may not require I/O.  */
-tlb_addr &= ~TLB_WATCHPOINT;
-if ((tlb_addr & ~TARGET_PAGE_MASK) == 0) {
-goto do_aligned_access;
-}
 }
 
+need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
+
 /* Handle I/O access.  */
-return io_readx(env, iotlbentry, mmu_idx, addr,
-retaddr, access_type, op);
+if (likely(tlb_addr & TLB_MMIO)) {
+return io_readx(env, iotlbentry, mmu_idx, addr, retaddr,
+access_type, op ^ (need_swap * MO_BSWAP));
+}
+
+haddr = (void *)((uintptr_t)addr + entry->addend);
+
+/*
+ * Keep these two load_memop separate to ensure that the compiler
+ * is able to fold the entire function to a single instruction.
+ * There is a build-time assert inside to remind you of this.  ;-)
+ */
+if (unlikely(need_swap)) {
+return load_memop(haddr, op ^ MO_BSWAP);
+}
+return load_memop(haddr, op);
 }
 
 /* Handle slow unaligned access (it spans two pages or IO).  */
@@ -1394,7 +1396,6 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
 return res & MAKE_64BIT_MASK(0, size * 8);
 }
 
- do_aligned_access:
 haddr = (void *)((uintptr_t)addr + entry->addend);
 return

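With TLB_BSWAP, a byte-swapped RAM page is handled by flipping the endianness bit of the MemOp rather than by routing the access through the I/O path. The sketch below mirrors the `need_swap = size > 1 && (tlb_addr & TLB_BSWAP)` test and the two separate loads in the hunk; the MemOp layout (bit 3 meaning big-endian) and the flag value are assumptions made for the example. The I/O path in the patch applies the same idea in branch-free form, `op ^ (need_swap * MO_BSWAP)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encodings: bits 0-1 = log2(size), bit 3 = big-endian. */
enum { SK_MO_SIZE = 3, SK_MO_BSWAP = 8 };
#define SK_TLB_BSWAP (1u << 5)

/* Load in the byte order selected by 'op'. */
static uint64_t sk_load(const uint8_t *p, unsigned op)
{
    unsigned size = 1u << (op & SK_MO_SIZE);
    bool big = (op & SK_MO_BSWAP) != 0;
    uint64_t v = 0;

    for (unsigned i = 0; i < size; i++) {
        v |= (uint64_t)p[i] << (8 * (big ? size - 1 - i : i));
    }
    return v;
}

/*
 * A byte-swapped page flips the endianness bit of the access;
 * single-byte accesses have no byte order and are left alone.
 */
static uint64_t sk_load_with_tlb(const uint8_t *p, unsigned op, uint32_t tlb_flags)
{
    unsigned size = 1u << (op & SK_MO_SIZE);
    bool need_swap = size > 1 && (tlb_flags & SK_TLB_BSWAP);

    if (need_swap) {
        return sk_load(p, op ^ SK_MO_BSWAP);
    }
    return sk_load(p, op);
}
```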
[PULL 11/16] cputlb: Merge and move memory_notdirty_write_{prepare,complete}

2019-09-25 Thread Richard Henderson

Since 9458a9a1df1a, all readers of the dirty bitmaps wait
for the rcu lock, which means that they wait until the end
of any executing TranslationBlock.

As a consequence, there is no need for the actual access
to happen in between the _prepare and _complete.  Therefore,
we can improve things by merging the two functions into
notdirty_write and dropping the NotDirtyInfo structure.

In addition, the only users of notdirty_write are in cputlb.c,
so move the merged function there.  Pass in the CPUIOTLBEntry
from which the ram_addr_t may be computed.

Reviewed-by: David Hildenbrand 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/exec/memory-internal.h | 65 -
 accel/tcg/cputlb.c | 76 +++---
 exec.c | 44 
 3 files changed, 42 insertions(+), 143 deletions(-)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index ef4fb92371..9fcc2af25c 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -49,70 +49,5 @@ void address_space_dispatch_free(AddressSpaceDispatch *d);
 
 void mtree_print_dispatch(struct AddressSpaceDispatch *d,
   MemoryRegion *root);
-
-struct page_collection;
-
-/* Opaque struct for passing info from memory_notdirty_write_prepare()
- * to memory_notdirty_write_complete(). Callers should treat all fields
- * as private, with the exception of @active.
- *
- * @active is a field which is not touched by either the prepare or
- * complete functions, but which the caller can use if it wishes to
- * track whether it has called prepare for this struct and so needs
- * to later call the complete function.
- */
-typedef struct {
-CPUState *cpu;
-struct page_collection *pages;
-ram_addr_t ram_addr;
-vaddr mem_vaddr;
-unsigned size;
-bool active;
-} NotDirtyInfo;
-
-/**
- * memory_notdirty_write_prepare: call before writing to non-dirty memory
- * @ndi: pointer to opaque NotDirtyInfo struct
- * @cpu: CPU doing the write
- * @mem_vaddr: virtual address of write
- * @ram_addr: the ram address of the write
- * @size: size of write in bytes
- *
- * Any code which writes to the host memory corresponding to
- * guest RAM which has been marked as NOTDIRTY must wrap those
- * writes in calls to memory_notdirty_write_prepare() and
- * memory_notdirty_write_complete():
- *
- *  NotDirtyInfo ndi;
- *  memory_notdirty_write_prepare(&ndi, );
- *  ... perform write here ...
- *  memory_notdirty_write_complete(&ndi);
- *
- * These calls will ensure that we flush any TCG translated code for
- * the memory being written, update the dirty bits and (if possible)
- * remove the slowpath callback for writing to the memory.
- *
- * This must only be called if we are using TCG; it will assert otherwise.
- *
- * We may take locks in the prepare call, so callers must ensure that
- * they don't exit (via longjump or otherwise) without calling complete.
- *
- * This call must only be made inside an RCU critical section.
- * (Note that while we're executing a TCG TB we're always in an
- * RCU critical section, which is likely to be the case for callers
- * of these functions.)
- */
-void memory_notdirty_write_prepare(NotDirtyInfo *ndi,
-   CPUState *cpu,
-   vaddr mem_vaddr,
-   ram_addr_t ram_addr,
-   unsigned size);
-/**
- * memory_notdirty_write_complete: finish write to non-dirty memory
- * @ndi: pointer to the opaque NotDirtyInfo struct which was initialized
- * by memory_not_dirty_write_prepare().
- */
-void memory_notdirty_write_complete(NotDirtyInfo *ndi);
-
 #endif
 #endif
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 4f118d2cc9..3e91838519 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -33,6 +33,7 @@
 #include "exec/helper-proto.h"
 #include "qemu/atomic.h"
 #include "qemu/atomic128.h"
+#include "translate-all.h"
 
 /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
 /* #define DEBUG_TLB */
@@ -1085,6 +1086,37 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
 return qemu_ram_addr_from_host_nofail(p);
 }
 
+static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
+   CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
+{
+ram_addr_t ram_addr = mem_vaddr + iotlbentry->addr;
+
+trace_memory_notdirty_write_access(mem_vaddr, ram_addr, size);
+
+if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
+struct page_collection *pages
+= page_collection_lock(ram_addr, ram_addr + size);
+
+/* We require mem_io_pc in tb_invalidate_phys_page_range.  */
+cpu->mem_io_pc = retaddr;
+
+tb_invalidate_phys_page_fast(pages, ram_addr, size);
+page_collection_unlock(pages);
+}
+
+/*

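Once the access no longer has to sit between `_prepare` and `_complete`, the bookkeeping reduces to a single call made before the store. The following is a toy model of that flow: per-page booleans stand in for QEMU's dirty bitmaps, a counter stands in for translation-block invalidation, and all names (`SkDirtyState`, `sk_notdirty_write`) are hypothetical. It does not model locking or the RCU argument from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_BITS 12
#define SK_NPAGES    16

/* Hypothetical per-page state standing in for the dirty bitmaps. */
typedef struct {
    bool code_dirty[SK_NPAGES];   /* false: page may still hold translated code */
    bool ram_dirty[SK_NPAGES];    /* other clients, e.g. migration or display */
    bool tlb_notdirty[SK_NPAGES]; /* write slow-path flag cached in the TLB */
    unsigned invalidations;       /* number of simulated TB invalidations */
} SkDirtyState;

/* One call before the store replaces the old prepare/complete pair. */
static void sk_notdirty_write(SkDirtyState *s, uint64_t ram_addr)
{
    uint64_t page = ram_addr >> SK_PAGE_BITS;

    assert(page < SK_NPAGES);

    /* Flush any translated code on the page before it is modified. */
    if (!s->code_dirty[page]) {
        s->invalidations++;
        s->code_dirty[page] = true;
    }

    /* Mark the page dirty for the remaining clients... */
    s->ram_dirty[page] = true;

    /* ...so writes to it no longer need to be intercepted. */
    s->tlb_notdirty[page] = false;
}
```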
[PULL 13/16] cputlb: Remove cpu->mem_io_vaddr

2019-09-25 Thread Richard Henderson

With the merge of notdirty handling into store_helper,
the last user of cpu->mem_io_vaddr was removed.

Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 include/hw/core/cpu.h | 2 --
 accel/tcg/cputlb.c| 2 --
 hw/core/cpu.c | 1 -
 3 files changed, 5 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index c7cda65c66..031f587e51 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -338,7 +338,6 @@ struct qemu_work_item;
  * @next_cpu: Next CPU sharing TB cache.
  * @opaque: User data.
  * @mem_io_pc: Host Program Counter at which the memory was accessed.
- * @mem_io_vaddr: Target virtual address at which the memory was accessed.
  * @kvm_fd: vCPU file descriptor for KVM.
  * @work_mutex: Lock to prevent multiple access to queued_work_*.
  * @queued_work_first: First asynchronous work pending.
@@ -413,7 +412,6 @@ struct CPUState {
  * we store some rarely used information in the CPU context.
  */
 uintptr_t mem_io_pc;
-vaddr mem_io_vaddr;
 /*
  * This is only needed for the legacy cpu_unassigned_access() hook;
  * when all targets using it have been converted to use
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index b56e9ddf8c..4b24811ce7 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -928,7 +928,6 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 cpu_io_recompile(cpu, retaddr);
 }
 
-cpu->mem_io_vaddr = addr;
 cpu->mem_io_access_type = access_type;
 
 if (mr->global_locking && !qemu_mutex_iothread_locked()) {
@@ -968,7 +967,6 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 if (!cpu->can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
-cpu->mem_io_vaddr = addr;
 cpu->mem_io_pc = retaddr;
 
 if (mr->global_locking && !qemu_mutex_iothread_locked()) {
diff --git a/hw/core/cpu.c b/hw/core/cpu.c
index 0035845511..73b1ee34d0 100644
--- a/hw/core/cpu.c
+++ b/hw/core/cpu.c
@@ -261,7 +261,6 @@ static void cpu_common_reset(CPUState *cpu)
 cpu->interrupt_request = 0;
 cpu->halted = 0;
 cpu->mem_io_pc = 0;
-cpu->mem_io_vaddr = 0;
 cpu->icount_extra = 0;
 atomic_set(&cpu->icount_decr_ptr->u32, 0);
 cpu->can_do_io = 1;
-- 
2.17.1

[PULL 09/16] cputlb: Move NOTDIRTY handling from I/O path to TLB path

2019-09-25 Thread Richard Henderson

Pages that we want to track for NOTDIRTY are RAM.  We do not
really need to go through the I/O path to handle them.
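A minimal sketch of the flag dispatch described above, with every name hypothetical: `SKETCH_TLB_MMIO`, `SKETCH_TLB_NOTDIRTY`, `sketch_store()` and the flat `guest_ram` array are stand-ins, not QEMU's API. It assumes single-byte stores and models the DIRTY_MEMORY_CODE check as a per-page boolean. MMIO still goes down the I/O path, while a clean RAM page only triggers bookkeeping before the store lands directly in host memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, loosely modelled on TLB_MMIO / TLB_NOTDIRTY. */
#define SKETCH_TLB_MMIO     (1u << 0)
#define SKETCH_TLB_NOTDIRTY (1u << 1)

#define SKETCH_PAGE_SIZE 64u
#define SKETCH_NUM_PAGES 4u

typedef struct {
    unsigned flags;
} SketchTLBEntry;

static uint8_t guest_ram[SKETCH_PAGE_SIZE * SKETCH_NUM_PAGES];
static bool page_has_code[SKETCH_NUM_PAGES]; /* stands in for !DIRTY_MEMORY_CODE */
static int tb_invalidations;                 /* simulated TB invalidations */
static int io_writes;                        /* stores sent down the I/O path */

/* Device access: the only case that still needs the slow I/O path. */
static void sketch_io_write(uint32_t addr, uint8_t val)
{
    (void)addr;
    (void)val;
    io_writes++;
}

/* Clean RAM page: drop stale translations, then stop trapping this page. */
static void sketch_notdirty_write(SketchTLBEntry *e, uint32_t addr)
{
    uint32_t page = addr / SKETCH_PAGE_SIZE;

    if (page_has_code[page]) {
        tb_invalidations++;
        page_has_code[page] = false;
    }
    e->flags &= ~SKETCH_TLB_NOTDIRTY;
}

static void sketch_store(SketchTLBEntry *e, uint32_t addr, uint8_t val)
{
    assert(addr < sizeof(guest_ram));

    if (e->flags & SKETCH_TLB_MMIO) {
        sketch_io_write(addr, val);
        return;
    }
    if (e->flags & SKETCH_TLB_NOTDIRTY) {
        sketch_notdirty_write(e, addr);   /* bookkeeping only... */
    }
    guest_ram[addr] = val;                /* ...the store itself hits RAM */
}
```

The design point is that a clean RAM page never goes through the I/O path at all; only the bookkeeping differs, which is why the dedicated notdirty MemoryRegion and its MemoryRegionOps can be deleted.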

Acked-by: David Hildenbrand 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/exec/cpu-common.h |  2 --
 accel/tcg/cputlb.c| 26 +---
 exec.c| 50 ---
 memory.c  | 16 -
 4 files changed, 23 insertions(+), 71 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 1c0e03ddc2..81753bbb34 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -100,8 +100,6 @@ void qemu_flush_coalesced_mmio_buffer(void);
 
 void cpu_flush_icache_range(hwaddr start, hwaddr len);
 
-extern struct MemoryRegion io_mem_notdirty;
-
 typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
 
 int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 404ec57a4e..7e9a0f7ac8 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -905,7 +905,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 mr = section->mr;
 mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
 cpu->mem_io_pc = retaddr;
-if (mr != &io_mem_notdirty && !cpu->can_do_io) {
+if (!cpu->can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
 
@@ -946,7 +946,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 section = iotlb_to_section(cpu, iotlbentry->addr, iotlbentry->attrs);
 mr = section->mr;
 mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
-if (mr != &io_mem_notdirty && !cpu->can_do_io) {
+if (!cpu->can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
 cpu->mem_io_vaddr = addr;
@@ -1612,7 +1612,7 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 need_swap = size > 1 && (tlb_addr & TLB_BSWAP);
 
 /* Handle I/O access.  */
-if (likely(tlb_addr & (TLB_MMIO | TLB_NOTDIRTY))) {
+if (tlb_addr & TLB_MMIO) {
 io_writex(env, iotlbentry, mmu_idx, val, addr, retaddr,
   op ^ (need_swap * MO_BSWAP));
 return;
@@ -1625,6 +1625,26 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val,
 
 haddr = (void *)((uintptr_t)addr + entry->addend);
 
+/* Handle clean RAM pages.  */
+if (tlb_addr & TLB_NOTDIRTY) {
+NotDirtyInfo ndi;
+
+/* We require mem_io_pc in tb_invalidate_phys_page_range.  */
+env_cpu(env)->mem_io_pc = retaddr;
+
+memory_notdirty_write_prepare(&ndi, env_cpu(env), addr,
+  addr + iotlbentry->addr, size);
+
+if (unlikely(need_swap)) {
+store_memop(haddr, val, op ^ MO_BSWAP);
+} else {
+store_memop(haddr, val, op);
+}
+
+memory_notdirty_write_complete(&ndi);
+return;
+}
+
 /*
  * Keep these two store_memop separate to ensure that the compiler
  * is able to fold the entire function to a single instruction.
diff --git a/exec.c b/exec.c
index ea8c0b18ac..dc7001f115 100644
--- a/exec.c
+++ b/exec.c
@@ -88,7 +88,6 @@ static MemoryRegion *system_io;
 AddressSpace address_space_io;
 AddressSpace address_space_memory;
 
-MemoryRegion io_mem_notdirty;
 static MemoryRegion io_mem_unassigned;
 #endif
 
@@ -191,7 +190,6 @@ typedef struct subpage_t {
 } subpage_t;
 
 #define PHYS_SECTION_UNASSIGNED 0
-#define PHYS_SECTION_NOTDIRTY 1
 
 static void io_mem_init(void);
 static void memory_map_init(void);
@@ -1472,9 +1470,6 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
 if (memory_region_is_ram(section->mr)) {
 /* Normal RAM.  */
 iotlb = memory_region_get_ram_addr(section->mr) + xlat;
-if (!section->readonly) {
-iotlb |= PHYS_SECTION_NOTDIRTY;
-}
 } else {
 AddressSpaceDispatch *d;
 
@@ -2783,42 +2778,6 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
 }
 }
 
-/* Called within RCU critical section.  */
-static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
-   uint64_t val, unsigned size)
-{
-NotDirtyInfo ndi;
-
-memory_notdirty_write_prepare(&ndi, current_cpu, current_cpu->mem_io_vaddr,
- ram_addr, size);
-
-stn_p(qemu_map_ram_ptr(NULL, ram_addr), size, val);
-memory_notdirty_write_complete(&ndi);
-}
-
-static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
- unsigned size, bool is_write,
- MemTxAttrs attrs)
-{
-return is_write;
-}
-
-static const MemoryRegionOps notdirty_mem_ops = {
-.write = notdirty_mem_write,
-.valid.accepts = notdirty_mem_accepts,
-.endianness = DEVICE_NATIVE_ENDIAN,
-

Re: [PATCH v4 00/16] Move rom and notdirty handling to cputlb

2019-09-25 Thread Mark Cave-Ayland

On 23/09/2019 23:59, Richard Henderson wrote:

> Changes since v3:
>   * Don't accidentally include the TARGET_PAGE_BITS_VARY patch set.  ;-)
>   * Remove __has_attribute(__always_inline__).
>   * Use single load/store_memop function instead of separate small wrappers.
>   * Introduce optimize_away to assert the code folds away as expected.
> 
> Patches without review:
> 
> 0003-qemu-compiler.h-Add-optimize_away.patch
> 0004-cputlb-Use-optimize_away-in-load-store_helpers.patch
> 0005-cputlb-Split-out-load-store_memop.patch
> 0010-cputlb-Partially-inline-memory_region_section_get.patch
> 0011-cputlb-Merge-and-move-memory_notdirty_write_-prep.patch
> 0012-cputlb-Handle-TLB_NOTDIRTY-in-probe_access.patch
> 
> 
> r~
> 
> 
> Richard Henderson (16):
>   exec: Use TARGET_PAGE_BITS_MIN for TLB flags
>   cputlb: Disable __always_inline__ without optimization
>   qemu/compiler.h: Add optimize_away
>   cputlb: Use optimize_away in load/store_helpers
>   cputlb: Split out load/store_memop
>   cputlb: Introduce TLB_BSWAP
>   exec: Adjust notdirty tracing
>   cputlb: Move ROM handling from I/O path to TLB path
>   cputlb: Move NOTDIRTY handling from I/O path to TLB path
>   cputlb: Partially inline memory_region_section_get_iotlb
>   cputlb: Merge and move memory_notdirty_write_{prepare,complete}
>   cputlb: Handle TLB_NOTDIRTY in probe_access
>   cputlb: Remove cpu->mem_io_vaddr
>   cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
>   cputlb: Pass retaddr to tb_invalidate_phys_page_fast
>   cputlb: Pass retaddr to tb_check_watchpoint
> 
>  accel/tcg/translate-all.h  |   8 +-
>  include/exec/cpu-all.h |  23 ++-
>  include/exec/cpu-common.h  |   3 -
>  include/exec/exec-all.h|   6 +-
>  include/exec/memory-internal.h |  65 ---
>  include/hw/core/cpu.h  |   2 -
>  include/qemu/compiler.h|  26 +++
>  accel/tcg/cputlb.c | 340 +++--
>  accel/tcg/translate-all.c  |  51 +++--
>  exec.c | 158 +--
>  hw/core/cpu.c  |   1 -
>  memory.c   |  20 --
>  trace-events   |   4 +-
>  13 files changed, 279 insertions(+), 428 deletions(-)

Am I right in thinking that this is now the latest version of the patchset which
fixes up the byte swaps in RAM?

I'm not sure that I can offer much in the way of review, however is there any
testing I can do to help out here?


ATB,

Mark.

[PULL 12/16] cputlb: Handle TLB_NOTDIRTY in probe_access

2019-09-25 Thread Richard Henderson

We can use notdirty_write for the write and return a valid host
pointer for this case.
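The reordered checks in probe_access can be sketched as follows. This is an illustration under assumptions: the flag values, `sketch_probe_write()` and the `host_page` buffer are hypothetical, and the watchpoint and NOTDIRTY handlers are reduced to counters. What it shows is the order: accesses that genuinely need the slow path return NULL first, and only then do the watchpoint and clean-page bookkeeping run before a host pointer is handed back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; QEMU's real TLB_* values are different. */
#define P_TLB_MMIO          (1u << 0)
#define P_TLB_BSWAP         (1u << 1)
#define P_TLB_DISCARD_WRITE (1u << 2)
#define P_TLB_WATCHPOINT    (1u << 3)
#define P_TLB_NOTDIRTY      (1u << 4)
#define P_TLB_FLAGS_MASK    0x1fu

static uint8_t host_page[256];   /* stands in for the page behind entry->addend */
static int watchpoint_checks;    /* stands in for cpu_check_watchpoint() */
static int notdirty_writes;      /* stands in for notdirty_write() */

/* Return a host pointer, or NULL when the caller must take the slow path. */
static void *sketch_probe_write(unsigned tlb_flags, size_t offset)
{
    assert(offset < sizeof(host_page));

    if (tlb_flags & P_TLB_FLAGS_MASK) {
        /* Reject I/O and other accesses that cannot use a plain pointer. */
        if (tlb_flags & (P_TLB_MMIO | P_TLB_BSWAP | P_TLB_DISCARD_WRITE)) {
            return NULL;
        }
        if (tlb_flags & P_TLB_WATCHPOINT) {
            watchpoint_checks++;
        }
        /* Clean RAM: do the bookkeeping, then still return a pointer. */
        if (tlb_flags & P_TLB_NOTDIRTY) {
            notdirty_writes++;
        }
    }
    return &host_page[offset];
}
```

With both MMIO and NOTDIRTY set, the sketch returns NULL without touching the NOTDIRTY counter, mirroring how the patch moves the reject test ahead of the handlers.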

Reviewed-by: David Hildenbrand 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 3e91838519..b56e9ddf8c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1168,16 +1168,24 @@ void *probe_access(CPUArchState *env, target_ulong addr, int size,
 return NULL;
 }
 
-/* Handle watchpoints.  */
-if (tlb_addr & TLB_WATCHPOINT) {
-cpu_check_watchpoint(env_cpu(env), addr, size,
- env_tlb(env)->d[mmu_idx].iotlb[index].attrs,
- wp_access, retaddr);
-}
+if (unlikely(tlb_addr & TLB_FLAGS_MASK)) {
+CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
 
-/* Reject I/O access, or other required slow-path.  */
-if (tlb_addr & (TLB_NOTDIRTY | TLB_MMIO | TLB_BSWAP | TLB_DISCARD_WRITE)) {
-return NULL;
+/* Reject I/O access, or other required slow-path.  */
+if (tlb_addr & (TLB_MMIO | TLB_BSWAP | TLB_DISCARD_WRITE)) {
+return NULL;
+}
+
+/* Handle watchpoints.  */
+if (tlb_addr & TLB_WATCHPOINT) {
+cpu_check_watchpoint(env_cpu(env), addr, size,
+ iotlbentry->attrs, wp_access, retaddr);
+}
+
+/* Handle clean RAM pages.  */
+if (tlb_addr & TLB_NOTDIRTY) {
+notdirty_write(env_cpu(env), addr, size, iotlbentry, retaddr);
+}
 }
 
 return (void *)((uintptr_t)addr + entry->addend);
-- 
2.17.1

[PULL 14/16] cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access

2019-09-25 Thread Richard Henderson

All callers pass false to this argument.  Remove it and pass the
constant on to tb_invalidate_phys_page_range__locked.

Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.h | 3 +--
 accel/tcg/translate-all.c | 6 ++
 exec.c| 4 ++--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 64f5fd9a05..31f2117188 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -28,8 +28,7 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
 void page_collection_unlock(struct page_collection *set);
 void tb_invalidate_phys_page_fast(struct page_collection *pages,
   tb_page_addr_t start, int len);
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
-   int is_cpu_write_access);
+void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
 void tb_check_watchpoint(CPUState *cpu);
 
 #ifdef CONFIG_USER_ONLY
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5d1e08b169..de4b697163 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1983,8 +1983,7 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
  *
  * Called with mmap_lock held for user-mode emulation
  */
-void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
-   int is_cpu_write_access)
+void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end)
 {
 struct page_collection *pages;
 PageDesc *p;
@@ -1996,8 +1995,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 return;
 }
 pages = page_collection_lock(start, end);
-tb_invalidate_phys_page_range__locked(pages, p, start, end,
-  is_cpu_write_access);
+tb_invalidate_phys_page_range__locked(pages, p, start, end, 0);
 page_collection_unlock(pages);
 }
 
diff --git a/exec.c b/exec.c
index 7d835b1a2b..b3df826039 100644
--- a/exec.c
+++ b/exec.c
@@ -1012,7 +1012,7 @@ const char *parse_cpu_option(const char *cpu_option)
 void tb_invalidate_phys_addr(target_ulong addr)
 {
 mmap_lock();
-tb_invalidate_phys_page_range(addr, addr + 1, 0);
+tb_invalidate_phys_page_range(addr, addr + 1);
 mmap_unlock();
 }
 
@@ -1039,7 +1039,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs)
 return;
 }
 ram_addr = memory_region_get_ram_addr(mr) + addr;
-tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
+tb_invalidate_phys_page_range(ram_addr, ram_addr + 1);
 rcu_read_unlock();
 }
 
-- 
2.17.1

Re: [PATCH v3 15/33] tests/docker: reduce scary warnings by cleaning up clean up

2019-09-25 Thread Richard Henderson

On 9/24/19 2:00 PM, Alex Bennée wrote:
> There were scary warnings in the clean-up code caused by attempting to
> inspect images which had finished before we got there. Clean up the
> clean-up code by:
> 
>   - only track the one instance at a time
>   - use --filter for docker ps instead of doing it by hand
>   - just call docker rm -f to be done with it
>   - use uuid.uuid4() for a random uid
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - drop the try/except approach and be smarter
>   - use uuid4 as uuid1 can generate clashes in parallel builds
> 
> fixup! tests/docker: reduce scary warnings by cleaning up clean up
> ---
>  tests/docker/docker.py | 34 --
>  1 file changed, 16 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson 


r~

[PULL 15/16] cputlb: Pass retaddr to tb_invalidate_phys_page_fast

2019-09-25 Thread Richard Henderson

Rather than rely on cpu->mem_io_pc, pass retaddr down directly.

Within tb_invalidate_phys_page_range__locked, the is_cpu_write_access
parameter is non-zero exactly when retaddr would be non-zero, so that
is a simple replacement.

Recognize that current_tb_not_found is true only when mem_io_pc
(and now retaddr) are also non-zero, so remove a redundant test.
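The lazy lookup that the retaddr change relies on can be sketched like this. All names are hypothetical (`SketchTB`, `sketch_tb_lookup()`, `sketch_invalidate_range()`), and the TB table is a fixed array rather than QEMU's page-indexed lists. A zero retaddr means the write did not come from translated code, so the host-PC-to-TB lookup is skipped entirely; otherwise it is deferred until an overlapping TB is actually found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_TBS 3

/* Hypothetical stand-in for a translation block. */
typedef struct {
    uintptr_t guest_start, guest_end;  /* guest range it was translated from */
    uintptr_t host_start, host_end;    /* host code generated for it */
    bool valid;
} SketchTB;

static SketchTB tbs[SKETCH_NUM_TBS];
static int lookups;                    /* counts host-PC -> TB lookups */

/* Stand-in for tcg_tb_lookup(): map a host return address to its TB. */
static SketchTB *sketch_tb_lookup(uintptr_t retaddr)
{
    lookups++;
    for (size_t i = 0; i < SKETCH_NUM_TBS; i++) {
        if (retaddr >= tbs[i].host_start && retaddr < tbs[i].host_end) {
            return &tbs[i];
        }
    }
    return NULL;
}

/*
 * Invalidate TBs overlapping [start, end).  Returns true if the TB that
 * performed the write (identified via retaddr) was itself invalidated.
 */
static bool sketch_invalidate_range(uintptr_t start, uintptr_t end,
                                    uintptr_t retaddr)
{
    bool current_tb_not_found = retaddr != 0;
    bool current_tb_modified = false;
    SketchTB *current_tb = NULL;

    assert(start <= end);
    for (size_t i = 0; i < SKETCH_NUM_TBS; i++) {
        SketchTB *tb = &tbs[i];

        if (!tb->valid || tb->guest_end <= start || tb->guest_start >= end) {
            continue;
        }
        if (current_tb_not_found) {
            /* Deferred until an overlapping TB exists; done at most once. */
            current_tb_not_found = false;
            current_tb = sketch_tb_lookup(retaddr);
        }
        if (current_tb == tb) {
            current_tb_modified = true;
        }
        tb->valid = false;
    }
    return current_tb_modified;
}
```

Passing retaddr as a parameter, rather than reading it back from a CPUState field, also removes the hidden requirement that every caller set mem_io_pc first, which is the bug that "cputlb: Pass retaddr to tb_check_watchpoint" fixes in this series.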

Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.h |  3 ++-
 accel/tcg/cputlb.c|  6 +-
 accel/tcg/translate-all.c | 39 +++
 3 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 31f2117188..135c1ea96a 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -27,7 +27,8 @@ struct page_collection *page_collection_lock(tb_page_addr_t start,
  tb_page_addr_t end);
 void page_collection_unlock(struct page_collection *set);
 void tb_invalidate_phys_page_fast(struct page_collection *pages,
-  tb_page_addr_t start, int len);
+  tb_page_addr_t start, int len,
+  uintptr_t retaddr);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
 void tb_check_watchpoint(CPUState *cpu);
 
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 4b24811ce7..defc8d5929 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1094,11 +1094,7 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
 if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
 struct page_collection *pages
 = page_collection_lock(ram_addr, ram_addr + size);
-
-/* We require mem_io_pc in tb_invalidate_phys_page_range.  */
-cpu->mem_io_pc = retaddr;
-
-tb_invalidate_phys_page_fast(pages, ram_addr, size);
+tb_invalidate_phys_page_fast(pages, ram_addr, size, retaddr);
 page_collection_unlock(pages);
 }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index de4b697163..db77fb221b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1889,7 +1889,7 @@ static void
 tb_invalidate_phys_page_range__locked(struct page_collection *pages,
   PageDesc *p, tb_page_addr_t start,
   tb_page_addr_t end,
-  int is_cpu_write_access)
+  uintptr_t retaddr)
 {
 TranslationBlock *tb;
 tb_page_addr_t tb_start, tb_end;
@@ -1897,9 +1897,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
 #ifdef TARGET_HAS_PRECISE_SMC
 CPUState *cpu = current_cpu;
 CPUArchState *env = NULL;
-int current_tb_not_found = is_cpu_write_access;
+bool current_tb_not_found = retaddr != 0;
+bool current_tb_modified = false;
 TranslationBlock *current_tb = NULL;
-int current_tb_modified = 0;
 target_ulong current_pc = 0;
 target_ulong current_cs_base = 0;
 uint32_t current_flags = 0;
@@ -1931,24 +1931,21 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
 if (!(tb_end <= start || tb_start >= end)) {
 #ifdef TARGET_HAS_PRECISE_SMC
 if (current_tb_not_found) {
-current_tb_not_found = 0;
-current_tb = NULL;
-if (cpu->mem_io_pc) {
-/* now we have a real cpu fault */
-current_tb = tcg_tb_lookup(cpu->mem_io_pc);
-}
+current_tb_not_found = false;
+/* now we have a real cpu fault */
+current_tb = tcg_tb_lookup(retaddr);
 }
 if (current_tb == tb &&
 (tb_cflags(current_tb) & CF_COUNT_MASK) != 1) {
-/* If we are modifying the current TB, we must stop
-its execution. We could be more precise by checking
-that the modification is after the current PC, but it
-would require a specialized function to partially
-restore the CPU state */
-
-current_tb_modified = 1;
-cpu_restore_state_from_tb(cpu, current_tb,
-  cpu->mem_io_pc, true);
+/*
+ * If we are modifying the current TB, we must stop
+ * its execution. We could be more precise by checking
+ * that the modification is after the current PC, but it
+ * would require a specialized function to partially
+ * restore the CPU state.
+ */
+current_tb_modified = true;
+cpu_restore_state_from_tb(cpu, current_tb, retaddr, true);
 cpu_get_tb_cpu_state(env, &current_pc

Re: [PATCH v3 18/33] tests/tcg: re-enable linux-test for ppc64abi32

2019-09-25 Thread Richard Henderson

On 9/24/19 2:00 PM, Alex Bennée wrote:
> Now that we have fixed the signal delivery bug we can remove this horrible
> hack from the system.
> 
> Cc: Richard Henderson 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - drop un-needed cflags
> ---
>  tests/tcg/multiarch/Makefile.target | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [PATCH v4 00/16] Move rom and notdirty handling to cputlb

2019-09-25 Thread Mark Cave-Ayland

On 25/09/2019 19:52, Mark Cave-Ayland wrote:

> On 23/09/2019 23:59, Richard Henderson wrote:
> 
>> Changes since v3:
>>   * Don't accidentally include the TARGET_PAGE_BITS_VARY patch set.  ;-)
>>   * Remove __has_attribute(__always_inline__).
>>   * Use single load/store_memop function instead of separate small wrappers.
>>   * Introduce optimize_away to assert the code folds away as expected.
>>
>> Patches without review:
>>
>> 0003-qemu-compiler.h-Add-optimize_away.patch
>> 0004-cputlb-Use-optimize_away-in-load-store_helpers.patch
>> 0005-cputlb-Split-out-load-store_memop.patch
>> 0010-cputlb-Partially-inline-memory_region_section_get.patch
>> 0011-cputlb-Merge-and-move-memory_notdirty_write_-prep.patch
>> 0012-cputlb-Handle-TLB_NOTDIRTY-in-probe_access.patch
>>
>>
>> r~
>>
>>
>> Richard Henderson (16):
>>   exec: Use TARGET_PAGE_BITS_MIN for TLB flags
>>   cputlb: Disable __always_inline__ without optimization
>>   qemu/compiler.h: Add optimize_away
>>   cputlb: Use optimize_away in load/store_helpers
>>   cputlb: Split out load/store_memop
>>   cputlb: Introduce TLB_BSWAP
>>   exec: Adjust notdirty tracing
>>   cputlb: Move ROM handling from I/O path to TLB path
>>   cputlb: Move NOTDIRTY handling from I/O path to TLB path
>>   cputlb: Partially inline memory_region_section_get_iotlb
>>   cputlb: Merge and move memory_notdirty_write_{prepare,complete}
>>   cputlb: Handle TLB_NOTDIRTY in probe_access
>>   cputlb: Remove cpu->mem_io_vaddr
>>   cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_access
>>   cputlb: Pass retaddr to tb_invalidate_phys_page_fast
>>   cputlb: Pass retaddr to tb_check_watchpoint
>>
>>  accel/tcg/translate-all.h  |   8 +-
>>  include/exec/cpu-all.h |  23 ++-
>>  include/exec/cpu-common.h  |   3 -
>>  include/exec/exec-all.h|   6 +-
>>  include/exec/memory-internal.h |  65 ---
>>  include/hw/core/cpu.h  |   2 -
>>  include/qemu/compiler.h|  26 +++
>>  accel/tcg/cputlb.c | 340 +++--
>>  accel/tcg/translate-all.c  |  51 +++--
>>  exec.c | 158 +--
>>  hw/core/cpu.c  |   1 -
>>  memory.c   |  20 --
>>  trace-events   |   4 +-
>>  13 files changed, 279 insertions(+), 428 deletions(-)
> 
> Am I right in thinking that this is now the latest version of the patchset
> which fixes up the byte swaps in RAM?
>
> I'm not sure that I can offer much in the way of review, however is there
> any testing I can do to help out here?

Ha okay, I've just seen the TCG PR appear in my inbox so I'll assume that
everyone is happy and everything is working as intended :)


ATB,

Mark.

[PULL 16/16] cputlb: Pass retaddr to tb_check_watchpoint

2019-09-25 Thread Richard Henderson

Fixes the previous TLB_WATCHPOINT patches because we are currently
failing to set cpu->mem_io_pc with the call to cpu_check_watchpoint.
Pass down the retaddr directly because it's readily available.

Fixes: 50b107c5d61
Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.h | 2 +-
 accel/tcg/translate-all.c | 6 +++---
 exec.c| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index 135c1ea96a..a557b4e2bb 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -30,7 +30,7 @@ void tb_invalidate_phys_page_fast(struct page_collection *pages,
   tb_page_addr_t start, int len,
   uintptr_t retaddr);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end);
-void tb_check_watchpoint(CPUState *cpu);
+void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
 
 #ifdef CONFIG_USER_ONLY
 int page_unprotect(target_ulong address, uintptr_t pc);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index db77fb221b..66d4bc4341 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2142,16 +2142,16 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
 #endif
 
 /* user-mode: call with mmap_lock held */
-void tb_check_watchpoint(CPUState *cpu)
+void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr)
 {
 TranslationBlock *tb;
 
 assert_memory_lock();
 
-tb = tcg_tb_lookup(cpu->mem_io_pc);
+tb = tcg_tb_lookup(retaddr);
 if (tb) {
 /* We can use retranslation to find the PC.  */
-cpu_restore_state_from_tb(cpu, tb, cpu->mem_io_pc, true);
+cpu_restore_state_from_tb(cpu, tb, retaddr, true);
 tb_phys_invalidate(tb, -1);
 } else {
 /* The exception probably happened in a helper.  The CPU state should
diff --git a/exec.c b/exec.c
index b3df826039..8a0a6613b1 100644
--- a/exec.c
+++ b/exec.c
@@ -2758,7 +2758,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
 cpu->watchpoint_hit = wp;
 
 mmap_lock();
-tb_check_watchpoint(cpu);
+tb_check_watchpoint(cpu, ra);
 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
 cpu->exception_index = EXCP_DEBUG;
 mmap_unlock();
-- 
2.17.1

Re: [PATCH v3 33/33] tests/docker: remove debian-powerpc-user-cross

2019-09-25 Thread Richard Henderson

On 9/24/19 2:01 PM, Alex Bennée wrote:
> Despite our attempts in 4d26c7fef4 to keep this going it still gets in
> the way of "make docker-test-build" completing because of course we
> can't build a modern QEMU with the image. Let's put the thing out of
> its misery and remove it.
> 
> People who really care about building on powerpc can still use the
> binfmt_misc support to manually build an image (or just run the build
> from before this commit).
> 
> Signed-off-by: Alex Bennée 
> Cc: Mark Cave-Ayland 
> ---
>  tests/docker/Makefile.include |  9 
>  .../debian-powerpc-user-cross.docker  | 21 ---
>  2 files changed, 30 deletions(-)
>  delete mode 100644 tests/docker/dockerfiles/debian-powerpc-user-cross.docker

Reviewed-by: Richard Henderson 


r~

[Bug 1841990] Re: instruction 'denbcdq' misbehaving

2019-09-25 Thread Mark Cave-Ayland

That certainly sounds like progress. Did you see the follow up email
indicating the typo that I found in patch 6? It can be fixed by applying
the following diff on top:

diff --git a/target/ppc/dfp_helper.c b/target/ppc/dfp_helper.c
index c2d335e928..b801acbedc 100644
--- a/target/ppc/dfp_helper.c
+++ b/target/ppc/dfp_helper.c
@@ -1054,7 +1054,7 @@ static inline void dfp_set_sign_64(ppc_vsr_t *t, uint8_t sgn)
 static inline void dfp_set_sign_128(ppc_vsr_t *t, uint8_t sgn)
 {
 t->VsrD(0) <<= 4;
-t->VsrD(0) |= (t->VsrD(0) >> 60);
+t->VsrD(0) |= (t->VsrD(1) >> 60);
 t->VsrD(1) <<= 4;
 t->VsrD(1) |= (sgn & 0xF);
 }

Does that help any more tests to pass? Also the changes to the FP
register layout were made in QEMU 4.0 and so it seems to me that even if
some tests fail, if the results between QEMU 3.1 and QEMU git master
with the patchset applied are equivalent then we can assume that the
patchset functionality is correct.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1841990

Title:
  instruction 'denbcdq' misbehaving

Status in QEMU:
  New

Bug description:
  Instruction 'denbcdq' appears to have no effect.  Test case attached.

  On ppc64le native:
  --
  gcc -g -O -mcpu=power9 bcdcfsq.c test-denbcdq.c -o test-denbcdq
  $ ./test-denbcdq
  0x
  0x000c
  0x2208
  $ ./test-denbcdq 1
  0x0001
  0x001c
  0x22080001
  $ ./test-denbcdq $(seq 0 99)
  0x0064
  0x100c
  0x22080080
  --

  With "qemu-ppc64le -cpu power9"
  --
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq
  0x
  0x000c
  0x000c
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq 1
  0x0001
  0x001c
  0x001c
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq $(seq 100)
  0x0064
  0x100c
  0x100c
  --

  I started looking at the code, but I got confused rather quickly.
  Could be related to endianness? I think denbcdq arrived on the scene
  before little-endian was a big deal.  Maybe something to do with
  utilizing implicit floating-point register pairs...  I don't think the
  right data is getting to helper_denbcdq, which would point back to the
  gen_fprp_ptr uses in dfp-impl.inc.c (GEN_DFP_T_FPR_I32_Rc).  (Maybe?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1841990/+subscriptions

Re: [PATCH v3 23/33] docs/devel: add "check-tcg" to testing.rst

2019-09-25 Thread Richard Henderson

On 9/24/19 2:00 PM, Alex Bennée wrote:
> It was pointed out we haven't documented the check-tcg part of the
> build system. Attempt to rectify that now.
> 
> Signed-off-by: Alex Bennée 
> ---
>  docs/devel/testing.rst | 62 ++
>  1 file changed, 62 insertions(+)

Reviewed-by: Richard Henderson 


r~

Re: [PATCH v8 01/13] vfio: KABI for migration interface

2019-09-25 Thread Alex Williamson

On Tue, 24 Sep 2019 23:04:22 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Wednesday, September 25, 2019 2:03 AM
> > 
> > On Tue, 24 Sep 2019 02:19:15 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Tian, Kevin
> > > > Sent: Friday, September 13, 2019 7:00 AM
> > > >  
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Thursday, September 12, 2019 10:41 PM
> > > > >
> > > > > On Tue, 3 Sep 2019 06:57:27 +
> > > > > "Tian, Kevin"  wrote:
> > > > >  
> > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > > Sent: Saturday, August 31, 2019 12:33 AM
> > > > > > >
> > > > > > > On Fri, 30 Aug 2019 08:06:32 +
> > > > > > > "Tian, Kevin"  wrote:
> > > > > > >  
> > > > > > > > > From: Tian, Kevin
> > > > > > > > > Sent: Friday, August 30, 2019 3:26 PM
> > > > > > > > >  
> > > > > > > > [...]  
> > > > > > > > > > How does QEMU handle the fact that IOVAs are potentially
> > > > > > > > > > dynamic while performing the live portion of a migration?  For
> > > > > > > > > > example, each time a guest driver calls dma_map_page() or
> > > > > > > > > > dma_unmap_page(), a MemoryRegionSection pops in or out of the
> > > > > > > > > > AddressSpace for the device (I'm assuming a vIOMMU where the device
> > > > > > > > > > AddressSpace is not system_memory).  I don't see any QEMU code that
> > > > > > > > > > intercepts that change in the AddressSpace such that the IOVA dirty
> > > > > > > > > > pfns could be recorded and translated to GFNs.  The vendor driver
> > > > > > > > > > can't track these beyond getting an unmap notification since it
> > > > > > > > > > only knows the IOVA pfns, which can be re-used with different GFN
> > > > > > > > > > backing.  Once the DMA mapping is torn down, it seems those dirty
> > > > > > > > > > pfns are lost in the ether.  If this works in QEMU, please help me
> > > > > > > > > > find the code that handles it.
> > > > > > > > >
> > > > > > > > > I'm curious about this part too. Interestingly, I didn't find any
> > > > > > > > > log_sync callback registered by emulated devices in Qemu. Looks dirty
> > > > > > > > > pages by emulated DMAs are recorded in some implicit way. But KVM
> > > > > > > > > always reports dirty page in GFN instead of IOVA, regardless of the
> > > > > > > > > presence of vIOMMU. If Qemu also tracks dirty pages in GFN for
> > > > > > > > > emulated DMAs (translation can be done when DMA happens), then we
> > > > > > > > > don't need worry about transient mapping from IOVA to GFN. Along this
> > > > > > > > > way we also want GFN-based dirty bitmap being reported through VFIO,
> > > > > > > > > similar to what KVM does. For vendor drivers, it needs to translate
> > > > > > > > > from IOVA to HVA to GFN when tracking DMA activities on VFIO
> > > > > > > > > devices. IOVA->HVA is provided by VFIO. for HVA->GFN, it can be
> > > > > > > > > provided by KVM but I'm not sure whether it's exposed now.
> > > > > > > > >
> > > > > > > >
> > > > > > > > HVA->GFN can be done through hva_to_gfn_memslot in kvm_host.h.
> > > > > > >
> > > > > > > I thought it was bad enough that we have vendor drivers that depend
> > > > > > > on KVM, but designing a vfio interface that only supports a KVM
> > > > > > > interface is more undesirable.  I also note without comment that
> > > > > > > gfn_to_memslot() is a GPL symbol.  Thanks,
> > > > > >
> > > > > > yes it is bad, but sometimes inevitable. If you recall our discussions
> > > > > > back to 3yrs (when discussing the 1st mdev framework), there were
> > > > > > similar hypervisor dependencies in GVT-g, e.g. querying gpa->hpa when
> > > > > > creating some shadow structures. gpa->hpa is definitely hypervisor
> > > > > > specific knowledge, which is easy in KVM (gpa->hva->hpa), but needs
> > > > > > hypercall in Xen. but VFIO already makes assumption based on KVM-
> > > > > > only flavor when implementing vfio_{un}pin_page_external.
> > > > >
> > > > > Where's the KVM assumption there?  The MAP_DMA ioctl takes an IOVA
> > > > > and HVA.  When an mdev vendor driver calls vfio_pin_pages(), we GUP the
> > > > > HVA to get an HPA and provide an array of HPA pfns back to the caller.
> > > > > The other vGPU mdev vendor manages to make use of this without KVM...
> > > > > the KVM interface used by GVT-g is GPL-only.
> > > >
> > > > To be clear it's the assumption on the host-based hypervisors e.g. KVM.
> > > > GUP is

Re: [PATCH v2 2/7] s390x/mmu: Move DAT protection handling out of mmu_translate_asce()

2019-09-25 Thread Richard Henderson

On 9/25/19 5:52 AM, David Hildenbrand wrote:
> We'll soon reuse the ilen and tec definitions in mmu_translate for all
> other DAT exceptions we inject as well. Move it to the caller, where we
> can later pair it up with other protection checks, like IEP.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 39 ---
>  1 file changed, 16 insertions(+), 23 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [PATCH v2 1/7] s390x/mmu: Drop debug logging from MMU code

2019-09-25 Thread Richard Henderson

On 9/25/19 5:52 AM, David Hildenbrand wrote:
> Let's get it out of the way to make some further refactorings easier.
> Personally, I've never used these debug statements at all. And if I had
> to debug an issue, I used plain GDB instead (debug prints are just way too
> much noise in the MMU). We might want to introduce tracing at some point
> instead, so we can enable selected events on demand.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 51 ---
>  1 file changed, 51 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [PATCH v2 4/7] s390x/mmu: Inject PGM_ADDRESSING on boguous table addresses

2019-09-25 Thread Richard Henderson

On 9/25/19 5:52 AM, David Hildenbrand wrote:
> +static inline int read_table_entry(hwaddr gaddr, uint64_t *entry)
> +{
> +    /*
> +     * According to the PoP, these table addresses are "unpredictably real
> +     * or absolute". Also, "it is unpredictable whether the address wraps
> +     * or an addressing exception is recognized".
> +     *
> +     * We treat them as absolute addresses and don't wrap them.
> +     */
> +    if (unlikely(address_space_read(&address_space_memory, gaddr,
> +                 MEMTXATTRS_UNSPECIFIED, (uint8_t *)entry, sizeof(*entry)) !=
> +                 MEMTX_OK)) {
> +        return -EFAULT;
> +    }
> +    *entry = be64_to_cpu(*entry);
> +    return 0;
> +}

Maybe I've been away from the kernel too long, but I don't find returning
-EFAULT helpful.  I would return true/false for success/failure so that...


> +    if (read_table_entry(origin + offs, &pt_entry)) {
> +        return PGM_ADDRESSING;
> +    }

... this gets written

    if (!read_table_entry(...)) {
        return PGM_ADDRESSING;
    }

This statement, to me, reads "If we did not read_table_entry, return an
addressing exception."

If you *really* want to return non-zero on failure, I would prefer returning
PGM_ADDRESSING instead of the out-of-context -EFAULT.

> -    new_entry = ldq_phys(cs->as, origin + offs);
> +    if (read_table_entry(origin + offs, &new_entry)) {

Do you really want to replace cs->as with address_space_memory?


r~
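A minimal, self-contained sketch of the bool-returning convention suggested above, so that the caller reads "if we did not read the entry, return an addressing exception". It does not use QEMU's address_space_read(): a hypothetical flat array (`guest_mem`, `GUEST_MEM_SIZE`) stands in for guest memory, the big-endian entry is decoded by hand, and `read_table_entry_sketch()`, `translate_step()` and the value of `PGM_ADDRESSING_SKETCH` are illustrative assumptions, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for guest memory; QEMU would use an AddressSpace. */
#define GUEST_MEM_SIZE 4096
static uint8_t guest_mem[GUEST_MEM_SIZE];

/* Assumed program-interruption code, for illustration only. */
#define PGM_ADDRESSING_SKETCH 0x0005

/*
 * Read one 8-byte big-endian table entry.  Returns true on success and
 * false if the entry is not fully backed by memory (no wrap-around).
 */
static bool read_table_entry_sketch(uint64_t gaddr, uint64_t *entry)
{
    uint64_t val = 0;

    if (gaddr > GUEST_MEM_SIZE - sizeof(*entry)) {
        return false;
    }
    for (size_t i = 0; i < sizeof(*entry); i++) {
        val = (val << 8) | guest_mem[gaddr + i];   /* big-endian decode */
    }
    *entry = val;
    return true;
}

/* Hypothetical caller: 0 on success, otherwise a program-interrupt code. */
static int translate_step(uint64_t origin, uint64_t offs, uint64_t *pt_entry)
{
    assert(pt_entry != NULL);
    if (!read_table_entry_sketch(origin + offs, pt_entry)) {
        return PGM_ADDRESSING_SKETCH;
    }
    return 0;
}
```

With a bool, the error translation happens at the call site, which is where the program-interruption code has meaning.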

Re: [PATCH v13 00/15] backup-top filter driver for backup

2019-09-25 Thread Vladimir Sementsov-Ogievskiy

Ugh :(

And I realized that there is a bigger problem with the design:

Assume a failed copy in a filter request: we want to mark the bits dirty
again and release the range lock on the source. But if we have some write
requests in parallel, they may have already passed the backup-top filter
and are only waiting for the range lock. When the lock is freed they will
go on and will not see the bitmap changes.

That means that we can't use the range lock: waiting requests must wait at
the backup-top level, but the range lock will not work there, as they will
interfere with the original write request.

I have to rethink it somehow; a kind of "intersecting requests" will
possibly be kept. I still don't like that the current backup write-notifier
locks the whole region, even non-dirty bits; instead we should lock only
the region which we are handling at the moment.

Patches 01-11 are still good by themselves as preparation; let's keep them.

Patches 12-13 are good, but the range lock is not appropriate for backup.
Maybe they will be used for rewriting the copy-on-read filter to copy in
filter code. Still, I'm not sure, as COR should eventually work through
block-copy, and may possibly reuse the same locking.

On 20.09.2019 17:20, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> This series introduces the backup-top driver. It's a filter node which
> does the copy-before-write operation. Mirror uses a filter node for
> handling guest writes; let's move to a filter node (from write-notifiers)
> for backup too.
> 
> v11,v12 -> v13 changes:
> 
> [v12 was two separate fixes: [PATCH v12 0/2] backup: copy_range fixes]
> 
> 01: new in v12, in v13 change comment
> 02: in v12: add "Fixes: " to commit msg, in v13 add John's r-b
> 05: rebase on 01
> 07: rebase on 01. It is still a clean movement, keep r-b
> 
> Vladimir Sementsov-Ogievskiy (15):
>   block/backup: fix max_transfer handling for copy_range
>   block/backup: fix backup_cow_with_offload for last cluster
>   block/backup: split shareable copying part from backup_do_cow
>   block/backup: improve comment about image fleecing
>   block/backup: introduce BlockCopyState
>   block/backup: fix block-comment style
>   block: move block_copy from block/backup.c to separate file
>   block: teach bdrv_debug_breakpoint skip filters with backing
>   iotests: prepare 124 and 257 bitmap querying for backup-top filter
>   iotests: 257: drop unused Drive.device field
>   iotests: 257: drop device_add
>   block/io: refactor wait_serialising_requests
>   block: add lock/unlock range functions
>   block: introduce backup-top filter driver
>   block/backup: use backup-top instead of write notifiers
> 
>  qapi/block-core.json          |   8 +-
>  block/backup-top.h            |  37 ++
>  include/block/block-copy.h    |  84 
>  include/block/block_int.h     |   5 +
>  block.c                       |  34 +-
>  block/backup-top.c            | 240 
>  block/backup.c                | 440 -
>  block/block-copy.c            | 346 
>  block/io.c                    |  68 +++-
>  block/replication.c           |   2 +-
>  blockdev.c                    |   1 +
>  block/Makefile.objs           |   3 +
>  block/trace-events            |  14 +-
>  tests/qemu-iotests/056        |   8 +-
>  tests/qemu-iotests/124        |  83 ++--
>  tests/qemu-iotests/257        |  91 ++---
>  tests/qemu-iotests/257.out    | 714 ++
>  tests/qemu-iotests/iotests.py |  27 ++
>  18 files changed, 1287 insertions(+), 918 deletions(-)
>  create mode 100644 block/backup-top.h
>  create mode 100644 include/block/block-copy.h
>  create mode 100644 block/backup-top.c
>  create mode 100644 block/block-copy.c
>

Re: [PATCH v2 3/7] s390x/mmu: Inject DAT exceptions from a single place

2019-09-25 Thread Richard Henderson

On 9/25/19 5:52 AM, David Hildenbrand wrote:
> Let's return the PGM from the translation functions on error and inject
> based on that.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 63 +++
>  1 file changed, 17 insertions(+), 46 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [PATCH v13 00/15] backup-top filter driver for backup

2019-09-25 Thread Vladimir Sementsov-Ogievskiy

Oops, I sent an unfinished message.

On 25.09.2019 22:19, Vladimir Sementsov-Ogievskiy wrote:
> Ugh :(
> 
> And I realized that there is a bigger problem with the design:
> 
> Assume a failed copy in a filter request: we want to mark the bits dirty
> again and release the range lock on the source. But if we have some write
> requests in parallel, they may have already passed the backup-top filter
> and are only waiting for the range lock. When the lock is freed they will
> go on and will not see the bitmap changes.
> 
> That means that we can't use the range lock: waiting requests must wait at
> the backup-top level, but the range lock will not work there, as they will
> interfere with the original write request.

With such a design we can't mark bits dirty again. We can switch to another
behavior: on a failed block-copy in the filter, just cancel the whole
block job. But actually I think both behaviors should be available to the
user:
1. if the backup is important, better to fail guest writes if needed
2. if the guest is important, better to fail the backup job if
   copy-before-write fails
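A minimal sketch of how the two user-selectable behaviors listed above could be expressed. Everything here is hypothetical: `BackupCbwPolicy`, `BackupTopSketch` and `on_cbw_failure()` are illustrative names, and no such option existed in the series.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical policy for a failed copy-before-write inside the filter. */
typedef enum {
    CBW_POLICY_BREAK_GUEST_WRITE, /* backup is important: fail the write */
    CBW_POLICY_CANCEL_JOB,        /* guest is important: cancel the job */
} BackupCbwPolicy;

typedef struct {
    BackupCbwPolicy policy;
    int job_cancelled;            /* set when the backup job must stop */
} BackupTopSketch;

/*
 * Called when block-copy fails inside the filter.  Returns what the guest
 * write should complete with: the negative errno to fail it, or 0 to let it
 * proceed after the job has been marked as cancelled.
 */
static int on_cbw_failure(BackupTopSketch *s, int copy_ret)
{
    assert(copy_ret < 0);
    switch (s->policy) {
    case CBW_POLICY_BREAK_GUEST_WRITE:
        return copy_ret;
    case CBW_POLICY_CANCEL_JOB:
    default:
        s->job_cancelled = 1;
        return 0;
    }
}
```

Either way the bitmap never has to be re-dirtied after the failure, which avoids the race with writes already waiting on the range lock.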

> 
> I have to rethink it somehow; a kind of "intersecting requests" will
> possibly be kept. I still don't like that the current backup write-notifier
> locks the whole region, even non-dirty bits; instead we should lock only
> the region which we are handling at the moment.
> 
> Patches 01-11 are still good by themselves as preparation; let's keep them.
> 
> Patches 12-13 are good, but the range lock is not appropriate for backup.
> Maybe they will be used for rewriting the copy-on-read filter to copy in
> filter code. Still, I'm not sure, as COR should eventually work through
> block-copy, and may possibly reuse the same locking.

Better to drop 12-13 for now.

Patch 14 is good; let's keep it. It has a correct abort() in
backup_top_cbw(), it doesn't depend on 12-13, and it is waiting for a
corrected combination of backup-top, backup and block-copy.

And patch 15 is bad; I'll rewrite it. So, 16 is not needed either.


Re: [PATCH v2 5/7] s390x/mmu: Use TARGET_PAGE_MASK in mmu_translate_pte()

2019-09-25 Thread Richard Henderson

On 9/25/19 5:52 AM, David Hildenbrand wrote:
> While ASCE_ORIGIN is not wrong, it is certainly confusing. We want a
> page frame address.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/mmu_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson 


r~

Re: [PATCH v2 4/7] s390x/mmu: Inject PGM_ADDRESSING on boguous table addresses

2019-09-25 Thread David Hildenbrand

On 25.09.19 21:25, Richard Henderson wrote:
> On 9/25/19 5:52 AM, David Hildenbrand wrote:
>> +static inline int read_table_entry(hwaddr gaddr, uint64_t *entry)
>> +{
>> +    /*
>> +     * According to the PoP, these table addresses are "unpredictably real
>> +     * or absolute". Also, "it is unpredictable whether the address wraps
>> +     * or an addressing exception is recognized".
>> +     *
>> +     * We treat them as absolute addresses and don't wrap them.
>> +     */
>> +    if (unlikely(address_space_read(&address_space_memory, gaddr,
>> +                 MEMTXATTRS_UNSPECIFIED, (uint8_t *)entry, sizeof(*entry)) !=
>> +                 MEMTX_OK)) {
>> +        return -EFAULT;
>> +    }
>> +    *entry = be64_to_cpu(*entry);
>> +    return 0;
>> +}
> 
> Maybe I've been away from the kernel too long, but I don't find returning
> -EFAULT helpful.  I would return true/false for success/failure so that...
> 
> 
>> +    if (read_table_entry(origin + offs, &pt_entry)) {
>> +        return PGM_ADDRESSING;
>> +    }
> 
> ... this gets written
> 
>     if (!read_table_entry(...)) {
>         return PGM_ADDRESSING;
>     }
> 
> This statement, to me, reads "If we did not read_table_entry, return an
> addressing exception."
> 
> If you *really* want to return non-zero on failure, I would prefer returning
> PGM_ADDRESSING instead of the out-of-context -EFAULT.

I'll go for your suggestion with a bool!

> 
>> -    new_entry = ldq_phys(cs->as, origin + offs);
>> +    if (read_table_entry(origin + offs, &new_entry)) {
> 
> Do you really want to replace cs->as with address_space_memory?
> 

I guess it shouldn't make a difference (unless I am missing something),
but I can just keep using cs->as.

Thanks!

> 
> r~
> 


-- 

Thanks,

David / dhildenb

Re: [PATCH v4 10/16] cputlb: Partially inline memory_region_section_get_iotlb

2019-09-25 Thread David Hildenbrand

On 25.09.19 19:55, Richard Henderson wrote:
> On 9/24/19 12:59 AM, David Hildenbrand wrote:
>>> +    is_ram = memory_region_is_ram(section->mr);
>>> +    is_romd = memory_region_is_romd(section->mr);
>>> +
>>> +    if (is_ram || is_romd) {
>>> +        /* RAM and ROMD both have associated host memory. */
>>>          addend = (uintptr_t)memory_region_get_ram_ptr(section->mr) + xlat;
>>> +    } else {
>>> +        /* I/O does not; force the host address to NULL. */
>>> +        addend = 0;
>>> +    }
>>> +
>>> +    write_address = address;
>>
>> I guess the only "suboptimal" change is that you now have two checks for
>> "prot & PAGE_WRITE" in the case of ram instead of one.
> 
> It's a single bit test on a register operand -- as cheap as can be.  If you
> look at the entire code, there *must* be more than one test.  You can rearrange
> the code to choose exactly where those tests are, but you'll have to have them
> somewhere.
> 
>>> +        /* I/O or ROMD */
>>> +        iotlb = memory_region_section_get_iotlb(cpu, section) + xlat;
>>> +        /*
>>> +         * Writes to romd devices must go through MMIO to enable write.
>>> +         * Reads to romd devices go through the ram_ptr found above,
>>> +         * but of course reads to I/O must go through MMIO.
>>> +         */
>>> +        write_address |= TLB_MMIO;
>>
>> ... and here you calculate write_address even if probably unused.
> 
> Well... while the page might not be writable (but I'd bet that it is -- I/O
> memory is almost never read-only), and therefore write_address is technically
> unused, the variable is practically used in the next line:
> 
>     if (!is_romd) {
>         address = write_address;
>     }
> 
> which will compile to a conditional move.
> 
>> Can you move the calculation of the write_address completely into the
>> "prot & PAGE_WRITE" case below?
> 
> We'd prefer not to, since the code below is within the cpu tlb lock region.
> We'd prefer to keep all of the expensive operations outside that.

Makes sense to me then, and looks sane :)

> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb
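A simplified sketch of the read/write TLB address selection discussed above, reduced to RAM, ROMD and plain I/O. It omits dirty tracking, read-only sections and the iotlb value; `TLB_MMIO_SKETCH`, `RegionKind`, `TlbEntrySketch` and `fill_entry_sketch()` are hypothetical names, and the flag value is arbitrary rather than QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; QEMU's real TLB_MMIO value and layout differ. */
#define TLB_MMIO_SKETCH ((uint64_t)1 << 5)

typedef enum { REGION_RAM, REGION_ROMD, REGION_IO } RegionKind;

typedef struct {
    uint64_t read_address;   /* comparator used for loads */
    uint64_t write_address;  /* comparator used for stores */
    uintptr_t addend;        /* host pointer offset, 0 when there is none */
} TlbEntrySketch;

static TlbEntrySketch fill_entry_sketch(RegionKind kind, uint64_t address,
                                        uintptr_t host_base)
{
    bool is_ram = kind == REGION_RAM;
    bool is_romd = kind == REGION_ROMD;
    TlbEntrySketch e;

    /* RAM and ROMD have host memory behind them; plain I/O does not. */
    e.addend = (is_ram || is_romd) ? host_base : 0;

    uint64_t write_address = address;
    if (!is_ram) {
        /* Writes to ROMD and I/O must go through MMIO. */
        write_address |= TLB_MMIO_SKETCH;
        /* Reads from ROMD use host memory; reads from I/O use MMIO. */
        if (!is_romd) {
            address = write_address;   /* may compile to a conditional move */
        }
    }
    e.read_address = address;
    e.write_address = write_address;
    return e;
}
```

All of this is computed before any lock would be taken, in line with the point about keeping the work outside the cpu tlb lock region.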

Re: Debian support lifetime (was Re: [PATCH] docker: move tests from python2 to python3)

2019-09-25 Thread Eduardo Habkost

On Tue, Sep 24, 2019 at 08:35:13AM +0100, Daniel P. Berrangé wrote:
> On Mon, Sep 23, 2019 at 04:05:33PM -0300, Eduardo Habkost wrote:
[...]
> > Even for other long-lifetime distros, I really think "2 years
> > after the new major version is released" is too long, and I'd
> > like to shorten this to 1 year.
> 
> I guess this is OK, since this is still quite a long lifetime of
> support for distros. E.g. RHEL has a 3-4 year gap between major
> releases, which gives 4-5 years for each release being supported by
> QEMU. Other LTS distros are similar.

Do you mean the 2 years period is OK (and shouldn't be changed),
or that shortening it to 1 year is OK?

-- 
Eduardo

Re: [PATCH 11/20] spapr: Fix indexing of XICS irqs

2019-09-25 Thread Greg Kurz

On Wed, 25 Sep 2019 16:45:25 +1000
David Gibson  wrote:

> spapr global irq numbers are different from the source numbers on the ICS
> when using XICS - they're offset by XICS_IRQ_BASE (0x1000).  But
> spapr_irq_set_irq_xics() was passing through the global irq number to
> the ICS code unmodified.
> 
> We only got away with this because of a counteracting bug - we were
> incorrectly adjusting the qemu_irq we returned for a requested global irq
> number.
> 
> That approach mostly worked but is very confusing, incorrectly relies on
> the way the qemu_irq array is allocated, and undermines the intention of
> having the global array of qemu_irqs for spapr have a consistent meaning
> regardless of irq backend.
> 
> So, fix both set_irq and qemu_irq indexing.  We rename some parameters at
> the same time to make it clear that they are referring to spapr global
> irq numbers.
> 
> Signed-off-by: David Gibson 
> ---

Reviewed-by: Greg Kurz 

A further cleanup could be to have the XICS backend take only global
irq numbers and convert them to ICS source numbers internally. This
would put an end to the confusion between srcno/irq in the frontend
code.

>  hw/ppc/spapr_irq.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 300c65be3a..9a9e486eb5 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -153,10 +153,9 @@ static void spapr_irq_free_xics(SpaprMachineState *spapr, int irq, int num)
>  static qemu_irq spapr_qirq_xics(SpaprMachineState *spapr, int irq)
>  {
>      ICSState *ics = spapr->ics;
> -    uint32_t srcno = irq - ics->offset;
>  
>      if (ics_valid_irq(ics, irq)) {
> -        return spapr->qirqs[srcno];
> +        return spapr->qirqs[irq];
>      }
>  
>      return NULL;
> @@ -204,9 +203,10 @@ static int spapr_irq_post_load_xics(SpaprMachineState *spapr, int version_id)
>      return 0;
>  }
>  
> -static void spapr_irq_set_irq_xics(void *opaque, int srcno, int val)
> +static void spapr_irq_set_irq_xics(void *opaque, int irq, int val)
>  {
>      SpaprMachineState *spapr = opaque;
> +    uint32_t srcno = irq - spapr->ics->offset;
>  
>      ics_set_irq(spapr->ics, srcno, val);
>  }
> @@ -377,14 +377,14 @@ static void spapr_irq_reset_xive(SpaprMachineState *spapr, Error **errp)
>      spapr_xive_mmio_set_enabled(spapr->xive, true);
>  }
>  
> -static void spapr_irq_set_irq_xive(void *opaque, int srcno, int val)
> +static void spapr_irq_set_irq_xive(void *opaque, int irq, int val)
>  {
>      SpaprMachineState *spapr = opaque;
>  
>      if (kvm_irqchip_in_kernel()) {
> -        kvmppc_xive_source_set_irq(&spapr->xive->source, srcno, val);
> +        kvmppc_xive_source_set_irq(&spapr->xive->source, irq, val);
>      } else {
> -        xive_source_set_irq(&spapr->xive->source, srcno, val);
> +        xive_source_set_irq(&spapr->xive->source, irq, val);
>      }
>  }
>  
> @@ -563,11 +563,11 @@ static void spapr_irq_reset_dual(SpaprMachineState *spapr, Error **errp)
>      spapr_irq_current(spapr)->reset(spapr, errp);
>  }
>  
> -static void spapr_irq_set_irq_dual(void *opaque, int srcno, int val)
> +static void spapr_irq_set_irq_dual(void *opaque, int irq, int val)
>  {
>      SpaprMachineState *spapr = opaque;
>  
> -    spapr_irq_current(spapr)->set_irq(spapr, srcno, val);
> +    spapr_irq_current(spapr)->set_irq(spapr, irq, val);
>  }
>  
>  static const char *spapr_irq_get_nodename_dual(SpaprMachineState *spapr)
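A minimal sketch of the further cleanup suggested in the reply: the XICS backend accepts only spapr global irq numbers and converts them to ICS source numbers internally. The 0x1000 offset comes from the commit message; `IcsSketch`, `ics_sketch_valid_irq()`, `ics_sketch_set_irq()` and the fixed-size `level` array are hypothetical simplifications, not QEMU's ICSState API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the commit message: XICS source numbers are offset by 0x1000. */
#define XICS_IRQ_BASE_SKETCH 0x1000
#define ICS_SKETCH_MAX_SRCS  64

/* Hypothetical, simplified ICS holding only what the translation needs. */
typedef struct {
    uint32_t offset;                 /* first global irq owned by this ICS */
    uint32_t nr_irqs;                /* number of sources */
    int level[ICS_SKETCH_MAX_SRCS];  /* last level set, per source */
} IcsSketch;

static bool ics_sketch_valid_irq(const IcsSketch *ics, uint32_t irq)
{
    return irq >= ics->offset && irq - ics->offset < ics->nr_irqs;
}

/* The only entry point takes a *global* irq; srcno never leaks out. */
static void ics_sketch_set_irq(IcsSketch *ics, uint32_t irq, int val)
{
    uint32_t srcno;

    assert(ics_sketch_valid_irq(ics, irq));
    srcno = irq - ics->offset;
    assert(srcno < ICS_SKETCH_MAX_SRCS);
    ics->level[srcno] = val;
}
```

Keeping the subtraction in exactly one place is what removes the srcno/irq confusion from the frontend code.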

Re: [PATCH v3 25/33] tests/docker: Add fedora-win10sdk-cross image

2019-09-25 Thread Philippe Mathieu-Daudé

Hi Alex,

On 9/24/19 11:00 PM, Alex Bennée wrote:
> From: Philippe Mathieu-Daudé 
> 
> To build WHPX (Windows Hypervisor) binaries, we need the WHPX
> headers provided by the Windows SDK.

Justin is checking with his company whether this patch is OK with them;
I'd rather wait before merging it:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg646351.html

Can you unqueue this and the next patch (which depends on it) in the
meantime, please?

Thanks,

Phil.

> Add a script that fetches the required MSI/CAB files from the
> latest SDK (currently 10.0.18362.1).
> 
> Headers are accessible under /opt/win10sdk/include.
> 
> Set the QEMU_CONFIGURE_OPTS environment variable accordingly,
> enabling HAX and WHPX. Due to CPP warnings related to Microsoft
> specific #pragmas, we also need to use the '--disable-werror'
> configure flag.
> 
> Cc: Justin Terry 
> Signed-off-by: Philippe Mathieu-Daudé 
> Signed-off-by: Alex Bennée 
> Message-Id: <20190920113329.16787-3-phi...@redhat.com>
> ---
>  tests/docker/Makefile.include |  2 ++
>  .../dockerfiles/fedora-win10sdk-cross.docker  | 23 
>  tests/docker/dockerfiles/win10sdk-dl.sh   | 27 +++
>  3 files changed, 52 insertions(+)
>  create mode 100644 tests/docker/dockerfiles/fedora-win10sdk-cross.docker
>  create mode 100755 tests/docker/dockerfiles/win10sdk-dl.sh
> 
> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
> index 3fc7a863e51..e85e73025ba 100644
> --- a/tests/docker/Makefile.include
> +++ b/tests/docker/Makefile.include
> @@ -125,6 +125,8 @@ docker-image-debian-ppc64-cross: docker-image-debian10
>  docker-image-debian-riscv64-cross: docker-image-debian10
>  docker-image-debian-sh4-cross: docker-image-debian10
>  docker-image-debian-sparc64-cross: docker-image-debian10
> +docker-image-fedora-win10sdk-cross: docker-image-fedora
> +docker-image-fedora-win10sdk-cross: EXTRA_FILES:=$(DOCKER_FILES_DIR)/win10sdk-dl.sh
>  
>  docker-image-travis: NOUSER=1
>  
> diff --git a/tests/docker/dockerfiles/fedora-win10sdk-cross.docker b/tests/docker/dockerfiles/fedora-win10sdk-cross.docker
> new file mode 100644
> index 000..55ca933d40d
> --- /dev/null
> +++ b/tests/docker/dockerfiles/fedora-win10sdk-cross.docker
> @@ -0,0 +1,23 @@
> +#
> +# Docker MinGW64 cross-compiler target with WHPX header installed
> +#
> +# This docker target builds on the Fedora 30 base image.
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +#
> +FROM qemu:fedora
> +
> +RUN dnf install -y \
> +        cabextract \
> +        msitools \
> +        wget
> +
> +# Install WHPX headers from Windows Software Development Kit:
> +# https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk
> +ADD win10sdk-dl.sh /usr/local/bin/win10sdk-dl.sh
> +RUN /usr/local/bin/win10sdk-dl.sh
> +
> +ENV QEMU_CONFIGURE_OPTS ${QEMU_CONFIGURE_OPTS} \
> +    --cross-prefix=x86_64-w64-mingw32- \
> +    --extra-cflags=-I/opt/win10sdk/include --disable-werror \
> +    --enable-hax --enable-whpx
> diff --git a/tests/docker/dockerfiles/win10sdk-dl.sh b/tests/docker/dockerfiles/win10sdk-dl.sh
> new file mode 100755
> index 000..1c35c2a2524
> --- /dev/null
> +++ b/tests/docker/dockerfiles/win10sdk-dl.sh
> @@ -0,0 +1,27 @@
> +#!/bin/bash
> +#
> +# Install WHPX headers from Windows Software Development Kit
> +# https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +WINDIR=/opt/win10sdk
> +mkdir -p ${WINDIR}
> +pushd ${WINDIR}
> +# Get the bundle base for Windows SDK v10.0.18362.1
> +BASE_URL=$(curl --silent --include 'http://go.microsoft.com/fwlink/?prd=11966&pver=1.0&plcid=0x409&clcid=0x409&ar=Windows10&sar=SDK&o1=10.0.18362.1' | sed -nE 's_Location: (.*)/\r_\1_p')/Installers
> +# Fetch the MSI containing the headers
> +wget --no-verbose ${BASE_URL}/'Windows SDK Desktop Headers x86-x86_en-us.msi'
> +while true; do
> +    # Fetch all cabinets required by this MSI
> +    CAB_NAME=$(msiextract Windows\ SDK\ Desktop\ Headers\ x86-x86_en-us.msi 3>&1 2>&3 3>&-| sed -nE "s_.*Error opening file $PWD/(.*): No such file or directory_\1_p")
> +    test -z "${CAB_NAME}" && break
> +    wget --no-verbose ${BASE_URL}/${CAB_NAME}
> +done
> +rm *.{cab,msi}
> +mkdir /opt/win10sdk/include
> +# Only keep the WHPX headers
> +for inc in "${WINDIR}/Program Files/Windows Kits/10/Include/10.0.18362.0/um"/WinHv*; do
> +    ln -s "${inc}" /opt/win10sdk/include
> +done
> +popd > /dev/null
>

Re: [PATCH 12/20] spapr: Simplify spapr_qirq() handling

2019-09-25 Thread Greg Kurz

On Wed, 25 Sep 2019 16:45:26 +1000
David Gibson  wrote:

> Currently spapr_qirq(), used to find the qemu_irq for an spapr global irq
> number, redirects through the SpaprIrq::qirq method.  But the array of
> qemu_irqs is allocated in the PAPR layer, not the backends, and so the
> method implementations all return the same thing, just differing in the
> preliminary checks they make.
> 
> So, we can remove the method, and just implement spapr_qirq() directly,
> including all the relevant checks in one place.  We change all those
> checks into assert()s as well, since a failure here indicates an error in
> the calling code.
> 
> Signed-off-by: David Gibson 
> ---

Reviewed-by: Greg Kurz 

>  hw/ppc/spapr_irq.c | 47 ++
>  include/hw/ppc/spapr_irq.h |  1 -
>  2 files changed, 12 insertions(+), 36 deletions(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 9a9e486eb5..038bf4 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -150,17 +150,6 @@ static void spapr_irq_free_xics(SpaprMachineState *spapr, int irq, int num)
>      }
>  }
>  
> -static qemu_irq spapr_qirq_xics(SpaprMachineState *spapr, int irq)
> -{
> -    ICSState *ics = spapr->ics;
> -
> -    if (ics_valid_irq(ics, irq)) {
> -        return spapr->qirqs[irq];
> -    }
> -
> -    return NULL;
> -}
> -
>  static void spapr_irq_print_info_xics(SpaprMachineState *spapr, Monitor *mon)
>  {
>      CPUState *cs;
> @@ -242,7 +231,6 @@ SpaprIrq spapr_irq_xics = {
>      .init        = spapr_irq_init_xics,
>      .claim       = spapr_irq_claim_xics,
>      .free        = spapr_irq_free_xics,
> -    .qirq        = spapr_qirq_xics,
>      .print_info  = spapr_irq_print_info_xics,
>      .dt_populate = spapr_dt_xics,
>      .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
> @@ -300,20 +288,6 @@ static void spapr_irq_free_xive(SpaprMachineState *spapr, int irq, int num)
>      }
>  }
>  
> -static qemu_irq spapr_qirq_xive(SpaprMachineState *spapr, int irq)
> -{
> -    SpaprXive *xive = spapr->xive;
> -
> -    if ((irq < SPAPR_XIRQ_BASE) || (irq >= xive->nr_irqs)) {
> -        return NULL;
> -    }
> -
> -    /* The sPAPR machine/device should have claimed the IRQ before */
> -    assert(xive_eas_is_valid(&xive->eat[irq]));
> -
> -    return spapr->qirqs[irq];
> -}
> -
>  static void spapr_irq_print_info_xive(SpaprMachineState *spapr,
>                                        Monitor *mon)
>  {
> @@ -413,7 +387,6 @@ SpaprIrq spapr_irq_xive = {
>      .init        = spapr_irq_init_xive,
>      .claim       = spapr_irq_claim_xive,
>      .free        = spapr_irq_free_xive,
> -    .qirq        = spapr_qirq_xive,
>      .print_info  = spapr_irq_print_info_xive,
>      .dt_populate = spapr_dt_xive,
>      .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
> @@ -487,11 +460,6 @@ static void spapr_irq_free_dual(SpaprMachineState *spapr, int irq, int num)
>      spapr_irq_xive.free(spapr, irq, num);
>  }
>  
> -static qemu_irq spapr_qirq_dual(SpaprMachineState *spapr, int irq)
> -{
> -    return spapr_irq_current(spapr)->qirq(spapr, irq);
> -}
> -
>  static void spapr_irq_print_info_dual(SpaprMachineState *spapr, Monitor *mon)
>  {
>      spapr_irq_current(spapr)->print_info(spapr, mon);
> @@ -586,7 +554,6 @@ SpaprIrq spapr_irq_dual = {
>      .init        = spapr_irq_init_dual,
>      .claim       = spapr_irq_claim_dual,
>      .free        = spapr_irq_free_dual,
> -    .qirq        = spapr_qirq_dual,
>      .print_info  = spapr_irq_print_info_dual,
>      .dt_populate = spapr_irq_dt_populate_dual,
>      .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
> @@ -700,7 +667,18 @@ void spapr_irq_free(SpaprMachineState *spapr, int irq, int num)
>  
>  qemu_irq spapr_qirq(SpaprMachineState *spapr, int irq)
>  {
> -    return spapr->irq->qirq(spapr, irq);
> +    assert(irq >= SPAPR_XIRQ_BASE);
> +    assert(irq < (spapr->irq->nr_xirqs + SPAPR_XIRQ_BASE));
> +
> +    if (spapr->ics) {
> +        assert(ics_valid_irq(spapr->ics, irq));
> +    }
> +    if (spapr->xive) {
> +        assert(irq < spapr->xive->nr_irqs);
> +        assert(xive_eas_is_valid(&spapr->xive->eat[irq]));
> +    }
> +
> +    return spapr->qirqs[irq];
>  }
>  
>  int spapr_irq_post_load(SpaprMachineState *spapr, int version_id)
> @@ -803,7 +781,6 @@ SpaprIrq spapr_irq_xics_legacy = {
>      .init        = spapr_irq_init_xics,
>      .claim       = spapr_irq_claim_xics,
>      .free        = spapr_irq_free_xics,
> -    .qirq        = spapr_qirq_xics,
>      .print_info  = spapr_irq_print_info_xics,
>      .dt_populate = spapr_dt_xics,
>      .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index 7e26288fcd..a4e790ef60 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -44,7 +44,6 @@ typedef struct SpaprIrq {
>      void (*init)(SpaprMachineState *spapr, Error **errp);
>      int (*claim)(SpaprMachineState *spapr, i

Re: [PATCH v7 0/8] Add Qemu to SeaBIOS LCHS interface

2019-09-25 Thread John Snow




On 9/25/19 7:06 AM, Sam Eiderman via wrote:
> v1:
> 
> Non-standard logical geometries break under QEMU.
> 
> A virtual disk which contains an operating system which depends on
> logical geometries (consistent values being reported from BIOS INT13
> AH=08) will most likely break under QEMU/SeaBIOS if it has non-standard
> logical geometries - for example 56 SPT (sectors per track).
> No matter what QEMU guesses, SeaBIOS will use LBA translation for large
> enough disks, which will report 63 SPT instead.
> 
> In addition, we cannot force SeaBIOS to rely on physical geometries at
> all. A virtio-blk-pci virtual disk with 255 physical heads cannot
> report more than 16 physical heads when moved to an IDE controller, since
> the ATA spec allows a maximum of 16 heads - this is an artifact of
> virtualization.
> 
> By supplying the logical geometries directly we are able to support such
> "exotic" disks.
> 
> We will use fw_cfg to do just that.
> 
> v2:
> 
> Fix missing parenthesis check in
> "hd-geo-test: Add tests for lchs override"
> 
> v3:
> 
> * Rename fw_cfg key to "bios-geometry".
> * Remove "extendible" interface.
> * Add cpu_to_le32 fix as Laszlo suggested for big endian hosts
> * Fix last qtest commit - automatic docker tester for some reason does not
>   have qemu-img set
> 
> v4:
> 
> * Change fw_cfg interface from mixed textual/binary to textual only
> 
> v5:
> 
> * Fix line > 80 chars in tests/hd-geo-test.c
> 
> v6:
> 
> * Small fixes for issues pointed by Max
> * (&conf->conf)->lcyls to conf->conf.lcyls and so on
> * Remove scsi_unrealize from everything other than scsi-hd
> * Add proper include to sysemu.h
> * scsi_device_unrealize() after scsi_device_purge_requests()
> 
> v7:
> 
> * Adapted last commit (tests) to changes in qtest
> 
> Sam Eiderman (8):
>   block: Refactor macros - fix tabbing
>   block: Support providing LCHS from user
>   bootdevice: Add interface to gather LCHS
>   scsi: Propagate unrealize() callback to scsi-hd
>   bootdevice: Gather LCHS from all relevant devices
>   bootdevice: Refactor get_boot_devices_list
>   bootdevice: FW_CFG interface for LCHS values
>   hd-geo-test: Add tests for lchs override
> 
>  bootdevice.c | 148 --
>  hw/block/virtio-blk.c|   6 +
>  hw/ide/qdev.c|   7 +-
>  hw/nvram/fw_cfg.c|  14 +-
>  hw/scsi/scsi-bus.c   |  16 ++
>  hw/scsi/scsi-disk.c  |  12 +
>  include/hw/block/block.h |  22 +-
>  include/hw/scsi/scsi.h   |   1 +
>  include/sysemu/sysemu.h  |   4 +
>  tests/Makefile.include   |   2 +-
>  tests/hd-geo-test.c  | 589 +++
>  11 files changed, 780 insertions(+), 41 deletions(-)
> 

Thanks, applied to my IDE tree:

https://github.com/jnsnow/qemu/commits/ide
https://github.com/jnsnow/qemu.git

--js

Is that the right tree? Nope, but time's marching on without us. If any
other maintainer has an objection, you have until Friday before I send
the PR!
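To make the cover letter's geometry argument concrete, here is a sketch of the standard CHS-to-LBA conversion, LBA = (C * HPC + H) * SPT + (S - 1). The same C/H/S tuple maps to a different LBA once the reported SPT changes from 56 to 63, which is why a guest that relies on its original logical geometry breaks. `LogicalGeometry` and `chs_to_lba()` are illustrative names, and the 16-head geometry in the usage note is an assumption chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Logical geometry as a guest would see it via INT 13h AH=08h. */
typedef struct {
    uint32_t heads;    /* heads per cylinder (HPC) */
    uint32_t sectors;  /* sectors per track (SPT); sector numbers are 1-based */
} LogicalGeometry;

/* Standard conversion: LBA = (C * HPC + H) * SPT + (S - 1). */
static uint64_t chs_to_lba(LogicalGeometry g, uint32_t c, uint32_t h,
                           uint32_t s)
{
    assert(s >= 1 && s <= g.sectors && h < g.heads);
    return ((uint64_t)c * g.heads + h) * g.sectors + (s - 1);
}
```

For example, with 16 heads, C/H/S = 2/3/10 maps to LBA (2 * 16 + 3) * 56 + 9 = 1969 at 56 SPT, but to (2 * 16 + 3) * 63 + 9 = 2214 at 63 SPT, so CHS values recorded on disk for one geometry point at the wrong sectors under the other.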

Re: [PATCH 6/7] target/ppc: use existing VsrD() macro to eliminate HI_IDX and LO_IDX from dfp_helper.c

2019-09-25 Thread Mark Cave-Ayland

On 24/09/2019 22:46, Richard Henderson wrote:

> On 9/24/19 8:35 AM, Mark Cave-Ayland wrote:
>> Switch over all accesses to the decimal numbers held in struct PPC_DFP from
>> using HI_IDX and LO_IDX to using the VsrD() macro instead. Not only does this
>> allow the compiler to ensure that the various dfp_* functions are being passed
>> a ppc_vsr_t rather than an arbitrary uint64_t pointer, but it also allows the
>> host endian-specific HI_IDX and LO_IDX to be completely removed from
>> dfp_helper.c.
>>
>> Signed-off-by: Mark Cave-Ayland 
>> ---
>>  target/ppc/dfp_helper.c | 70 ++---
>>  1 file changed, 31 insertions(+), 39 deletions(-)
> 
> Ho hum, vs patch 5 that was me not realizing how many different places really
> want to manipulate a 128-bit value.  Do go ahead and keep ppc_vsr_t for now.

Yes, it's a little bit confusing in places as some operations are done on the
decNumber whilst others are done on the decimal representation. After trying a 
few
different approaches, using ppc_vsr_t seemed to be the easiest and most readable
solution here.

I see now that you've given R-b tags for patches 3-7, and having slept on it I'm
inclined to leave patches 1-2 as they are now, i.e. no code changes other than
introducing the get/set helpers, to help keep the patchset as mechanical as
possible. Do you think that seems a reasonable approach?

> It does look like we might be well served by using Int128 at some point, so
> that these operations can expand to int128_t on appropriate hw so that the
> compiler can DTRT with these.

Certainly ppc_vsr_t already has __uint128_t and Int128 elements, but the
impression I got from the #ifdef is that not all compilers would support it?
Although having said that, making such a change is not something that's really
on my radar.

ATB,

Mark.

Re: [PATCH] hw/core/loader: Fix possible crash in rom_copy()

2019-09-25 Thread Philippe Mathieu-Daudé

Hi Thomas,

On 9/25/19 3:03 PM, Thomas Huth wrote:
> Both "rom->addr" and "addr" are derived from the binary image
> that can be loaded with the "-kernel" parameter. The code in
> rom_copy() then calculates:
> 
> d = dest + (rom->addr - addr);
> 
> and uses "d" as destination in a memcpy() some lines later. Now with
> bad kernel images, it is possible that rom->addr is smaller than addr,
> thus "rom->addr - addr" becomes negative and the memcpy() then tries to
> copy contents from the image to a bad memory location. In the best case,
> this just crashes QEMU; in the worst case, this could maybe be used to
> inject code from the kernel image into the QEMU binary, so we had better fix
> it with an additional sanity check here.
> 
> Cc: qemu-sta...@nongnu.org
> Reported-by: Guangming Liu
> Buglink: https://bugs.launchpad.net/qemu/+bug/1844635

"This page does not exist, or you may not have permission to see it."

This seems security related. Shouldn't we open a CVE for this?
https://wiki.qemu.org/SecurityProcess#CVE_allocation

Let's say I have write access to a LAN TFTP server used by some PXE
bootloader where I can store my crafted nasty kernel, then I get this score:

https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator?vector=AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H/E:P/RL:O/RC:C&version=3.1

CVSS Base Score: 9.6
CVSS Temporal Score: 8.6

Which seems quite high.

> Signed-off-by: Thomas Huth 
> ---
>  hw/core/loader.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/core/loader.c b/hw/core/loader.c
> index 0d60219364..5099f27dc8 100644
> --- a/hw/core/loader.c
> +++ b/hw/core/loader.c
> @@ -1281,7 +1281,7 @@ int rom_copy(uint8_t *dest, hwaddr addr, size_t size)

$ git show 235f86ef014
Date:   Thu Nov 12 21:53:11 2009 +0100

This function is old and poorly documented.

>  if (rom->addr + rom->romsize < addr) {
>  continue;
>  }
> -if (rom->addr > end) {
> +if (rom->addr > end || rom->addr < addr) {

Reviewed-by: Philippe Mathieu-Daudé 

>  break;
>  }
>  
>
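To make the arithmetic in the commit message concrete, here is a minimal, self-contained sketch of the offset computation with the added sanity check. `SketchRom` and `sketch_rom_copy_one()` are hypothetical simplifications for illustration only: the real `rom_copy()` walks a list of ROMs, skips some of them, and zero-fills the tail when `romsize > datasize`. `uint64_t` stands in for `hwaddr`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for QEMU's Rom (not the real struct). */
typedef struct {
    uint64_t addr;          /* load address taken from the (untrusted) image */
    const uint8_t *data;    /* blob contents */
    size_t datasize;        /* number of bytes in data */
} SketchRom;

/*
 * Copy one blob into dest, where dest represents the address window
 * [addr, addr + size). Returns the number of bytes copied (0 if skipped).
 */
static size_t sketch_rom_copy_one(uint8_t *dest, uint64_t addr, size_t size,
                                  const SketchRom *rom)
{
    uint64_t end = addr + size;

    /*
     * The "rom->addr < addr" test is the one added by the patch: without it,
     * the unsigned rom->addr - addr wraps around and dest + offset points
     * far outside the destination buffer.
     */
    if (rom->addr > end || rom->addr < addr) {
        return 0;
    }

    uint64_t off = rom->addr - addr;    /* now 0 <= off <= size */
    assert(off <= size);

    size_t l = rom->datasize;
    if (l > size - off) {
        l = size - off;                 /* clamp to the destination window */
    }
    if (l > 0) {
        memcpy(dest + off, rom->data, l);
    }
    return l;
}
```

For a 16-byte window at 0x1000, a 4-byte blob at 0x1004 copies 4 bytes, a blob at 0x0ff0 is now skipped instead of being written in front of `dest`, and a blob at 0x100e is clamped to the 2 bytes that still fit.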

[PATCH v3] qga: add command guest-get-devices for reporting VirtIO devices

2019-09-25 Thread Tomáš Golembiovský

Add a command for reporting devices on a Windows guest. The intent is not so
much to report the devices but, more importantly, the driver (and its
version) that is assigned to each device. This gives the caller
information on whether VirtIO drivers are installed and/or whether an
inadequate driver is used on a device (e.g. a QXL device with the base VGA
driver).

Signed-off-by: Tomáš Golembiovský 
---
 qga/commands-posix.c |   9 ++
 qga/commands-win32.c | 204 ++-
 qga/qapi-schema.json |  32 +++
 3 files changed, 244 insertions(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index dfc05f5b8a..58e93feef9 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2757,6 +2757,8 @@ GList *ga_command_blacklist_init(GList *blacklist)
 blacklist = g_list_append(blacklist, g_strdup("guest-fstrim"));
 #endif
 
+blacklist = g_list_append(blacklist, g_strdup("guest-get-devices"));
+
 return blacklist;
 }
 
@@ -2977,3 +2979,10 @@ GuestOSInfo *qmp_guest_get_osinfo(Error **errp)
 
 return info;
 }
+
+GuestDeviceInfoList *qmp_guest_get_devices(Error **errp)
+{
+error_setg(errp, QERR_UNSUPPORTED);
+
+return NULL;
+}
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 6b67f16faf..139dbd7c9a 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -21,10 +21,11 @@
 #ifdef CONFIG_QGA_NTDDSCSI
 #include 
 #include 
+#endif
 #include 
 #include 
 #include 
-#endif
+#include 
 #include 
 #include 
 #include 
@@ -38,6 +39,36 @@
 #include "qemu/host-utils.h"
 #include "qemu/base64.h"
 
+/*
+ * The following should be in devpkey.h, but it isn't. The key names were
+ * prefixed to avoid (future) name clashes. Once the definitions get into
+ * mingw the following lines can be removed.
+ */
+DEFINE_DEVPROPKEY(qga_DEVPKEY_NAME, 0xb725f130, 0x47ef, 0x101a, 0xa5,
+0xf1, 0x02, 0x60, 0x8c, 0x9e, 0xeb, 0xac, 10);
+/* DEVPROP_TYPE_STRING */
+DEFINE_DEVPROPKEY(qga_DEVPKEY_Device_HardwareIds, 0xa45c254e, 0xdf1c,
+0x4efd, 0x80, 0x20, 0x67, 0xd1, 0x46, 0xa8, 0x50, 0xe0, 3);
+/* DEVPROP_TYPE_STRING_LIST */
+DEFINE_DEVPROPKEY(qga_DEVPKEY_Device_DriverDate, 0xa8b865dd, 0x2e3d,
+0x4094, 0xad, 0x97, 0xe5, 0x93, 0xa7, 0xc, 0x75, 0xd6, 2);
+/* DEVPROP_TYPE_FILETIME */
+DEFINE_DEVPROPKEY(qga_DEVPKEY_Device_DriverVersion, 0xa8b865dd, 0x2e3d,
+0x4094, 0xad, 0x97, 0xe5, 0x93, 0xa7, 0xc, 0x75, 0xd6, 3);
+/* DEVPROP_TYPE_STRING */
+/* The following should be in cfgmgr32.h, but it isn't */
+#ifndef CM_Get_DevNode_Property
+CMAPI CONFIGRET WINAPI CM_Get_DevNode_PropertyW(
+DEVINST  dnDevInst,
+CONST DEVPROPKEY * PropertyKey,
+DEVPROPTYPE  * PropertyType,
+PBYTEPropertyBuffer,
+PULONG   PropertyBufferSize,
+ULONGulFlags
+);
+#define CM_Get_DevNode_Property CM_Get_DevNode_PropertyW
+#endif
+
 #ifndef SHTDN_REASON_FLAG_PLANNED
 #define SHTDN_REASON_FLAG_PLANNED 0x8000
 #endif
@@ -92,6 +123,8 @@ static OpenFlags guest_file_open_modes[] = {
 g_free(suffix); \
 } while (0)
 
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(GuestDeviceInfo, qapi_free_GuestDeviceInfo)
+
 static OpenFlags *find_open_flag(const char *mode_str)
 {
 int mode;
@@ -2234,3 +2267,172 @@ GuestOSInfo *qmp_guest_get_osinfo(Error **errp)
 
 return info;
 }
+
+/*
+ * Safely get a device property. Returned strings use wide characters.
+ * Caller is responsible for freeing the buffer.
+ */
+static LPBYTE cm_get_property(DEVINST devInst, const DEVPROPKEY *propName,
+PDEVPROPTYPE propType)
+{
+CONFIGRET cr;
+g_autofree LPBYTE buffer = NULL;
+ULONG buffer_len = 0;
+
+/* First query for needed space */
+cr = CM_Get_DevNode_PropertyW(devInst, propName, propType,
+buffer, &buffer_len, 0);
+if (cr != CR_SUCCESS && cr != CR_BUFFER_SMALL) {
+
+slog("failed to get property size, error=0x%lx", cr);
+return NULL;
+}
+buffer = g_new0(BYTE, buffer_len + 1);
+cr = CM_Get_DevNode_PropertyW(devInst, propName, propType,
+buffer, &buffer_len, 0);
+if (cr != CR_SUCCESS) {
+slog("failed to get device property, error=0x%lx", cr);
+return NULL;
+}
+return g_steal_pointer(&buffer);
+}
+
+static GStrv ga_get_hardware_ids(DEVINST devInstance)
+{
+GStrv hw_ids = NULL;
+GArray *values = NULL;
+DEVPROPTYPE cm_type;
+LPWSTR id;
+g_autofree LPWSTR property = (LPWSTR)cm_get_property(devInstance,
+&qga_DEVPKEY_Device_HardwareIds, &cm_type);
+if (property == NULL) {
+slog("failed to get hardware IDs");
+return NULL;
+}
+if (*property == '\0') {
+/* empty list */
+return NULL;
+}
+values = g_array_new(TRUE, TRUE, sizeof(gchar*));
+for (id = property; '\0' != *id; id += lstrlenW(id) + 1) {
+gchar* id8 = g_utf16_to_utf8(id, -1, NULL, NULL, NULL);
+g_array_append_val(values, id8);
+}
+hw_ids = (GStrv)g_array_free(values, FALSE);
+
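`ga_get_hardware_ids()` walks a `DEVPROP_TYPE_STRING_LIST` value, which is a sequence of NUL-terminated strings ended by an additional empty string. Below is a portable sketch of the same loop, using plain `char` instead of `WCHAR` and `strlen()` instead of `lstrlenW()`; `split_multi_sz()` is a hypothetical helper that only illustrates the loop structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk a double-NUL-terminated list such as "a\0bb\0\0", storing pointers
 * to at most max entries in out. Returns the total number of entries.
 */
static size_t split_multi_sz(const char *list, const char **out, size_t max)
{
    size_t n = 0;
    const char *id;

    assert(list != NULL);
    /* Same shape as the patch: the empty entry terminates the list. */
    for (id = list; *id != '\0'; id += strlen(id) + 1) {
        if (n < max) {
            out[n] = id;
        }
        n++;
    }
    return n;
}
```

An empty list (a single NUL) yields zero entries, which corresponds to the patch's separate `*property == '\0'` early return.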

[PATCH v4] qga: add command guest-get-devices for reporting VirtIO devices

2019-09-25 Thread Tomáš Golembiovský

Add a command for reporting devices on a Windows guest. The intent is not so
much to report the devices but, more importantly, the driver (and its
version) that is assigned to each device. This gives the caller
information on whether VirtIO drivers are installed and/or whether an
inadequate driver is used on a device (e.g. a QXL device with the base VGA
driver).

Signed-off-by: Tomáš Golembiovský 
---
 qga/commands-posix.c |   9 ++
 qga/commands-win32.c | 204 ++-
 qga/qapi-schema.json |  32 +++
 3 files changed, 244 insertions(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index dfc05f5b8a..58e93feef9 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2757,6 +2757,8 @@ GList *ga_command_blacklist_init(GList *blacklist)
 blacklist = g_list_append(blacklist, g_strdup("guest-fstrim"));
 #endif
 
+blacklist = g_list_append(blacklist, g_strdup("guest-get-devices"));
+
 return blacklist;
 }
 
@@ -2977,3 +2979,10 @@ GuestOSInfo *qmp_guest_get_osinfo(Error **errp)
 
 return info;
 }
+
+GuestDeviceInfoList *qmp_guest_get_devices(Error **errp)
+{
+error_setg(errp, QERR_UNSUPPORTED);
+
+return NULL;
+}
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 6b67f16faf..ec07a5b3ef 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -21,10 +21,11 @@
 #ifdef CONFIG_QGA_NTDDSCSI
 #include 
 #include 
+#endif
 #include 
 #include 
 #include 
-#endif
+#include 
 #include 
 #include 
 #include 
@@ -38,6 +39,36 @@
 #include "qemu/host-utils.h"
 #include "qemu/base64.h"
 
+/*
+ * The following should be in devpkey.h, but it isn't. The key names were
+ * prefixed to avoid (future) name clashes. Once the definitions get into
+ * mingw the following lines can be removed.
+ */
+DEFINE_DEVPROPKEY(qga_DEVPKEY_NAME, 0xb725f130, 0x47ef, 0x101a, 0xa5,
+0xf1, 0x02, 0x60, 0x8c, 0x9e, 0xeb, 0xac, 10);
+/* DEVPROP_TYPE_STRING */
+DEFINE_DEVPROPKEY(qga_DEVPKEY_Device_HardwareIds, 0xa45c254e, 0xdf1c,
+0x4efd, 0x80, 0x20, 0x67, 0xd1, 0x46, 0xa8, 0x50, 0xe0, 3);
+/* DEVPROP_TYPE_STRING_LIST */
+DEFINE_DEVPROPKEY(qga_DEVPKEY_Device_DriverDate, 0xa8b865dd, 0x2e3d,
+0x4094, 0xad, 0x97, 0xe5, 0x93, 0xa7, 0xc, 0x75, 0xd6, 2);
+/* DEVPROP_TYPE_FILETIME */
+DEFINE_DEVPROPKEY(qga_DEVPKEY_Device_DriverVersion, 0xa8b865dd, 0x2e3d,
+0x4094, 0xad, 0x97, 0xe5, 0x93, 0xa7, 0xc, 0x75, 0xd6, 3);
+/* DEVPROP_TYPE_STRING */
+/* The following should be in cfgmgr32.h, but it isn't */
+#ifndef CM_Get_DevNode_Property
+CMAPI CONFIGRET WINAPI CM_Get_DevNode_PropertyW(
+DEVINST  dnDevInst,
+CONST DEVPROPKEY * PropertyKey,
+DEVPROPTYPE  * PropertyType,
+PBYTEPropertyBuffer,
+PULONG   PropertyBufferSize,
+ULONGulFlags
+);
+#define CM_Get_DevNode_Property CM_Get_DevNode_PropertyW
+#endif
+
 #ifndef SHTDN_REASON_FLAG_PLANNED
 #define SHTDN_REASON_FLAG_PLANNED 0x8000
 #endif
@@ -92,6 +123,8 @@ static OpenFlags guest_file_open_modes[] = {
 g_free(suffix); \
 } while (0)
 
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(GuestDeviceInfo, qapi_free_GuestDeviceInfo)
+
 static OpenFlags *find_open_flag(const char *mode_str)
 {
 int mode;
@@ -2234,3 +2267,172 @@ GuestOSInfo *qmp_guest_get_osinfo(Error **errp)
 
 return info;
 }
+
+/*
+ * Safely get a device property. Returned strings use wide characters.
+ * Caller is responsible for freeing the buffer.
+ */
+static LPBYTE cm_get_property(DEVINST devInst, const DEVPROPKEY *propName,
+PDEVPROPTYPE propType)
+{
+CONFIGRET cr;
+g_autofree LPBYTE buffer = NULL;
+ULONG buffer_len = 0;
+
+/* First query for needed space */
+cr = CM_Get_DevNode_PropertyW(devInst, propName, propType,
+buffer, &buffer_len, 0);
+if (cr != CR_SUCCESS && cr != CR_BUFFER_SMALL) {
+
+slog("failed to get property size, error=0x%lx", cr);
+return NULL;
+}
+buffer = g_new0(BYTE, buffer_len + 1);
+cr = CM_Get_DevNode_PropertyW(devInst, propName, propType,
+buffer, &buffer_len, 0);
+if (cr != CR_SUCCESS) {
+slog("failed to get device property, error=0x%lx", cr);
+return NULL;
+}
+return g_steal_pointer(&buffer);
+}
+
+static GStrv ga_get_hardware_ids(DEVINST devInstance)
+{
+GStrv hw_ids = NULL;
+GArray *values = NULL;
+DEVPROPTYPE cm_type;
+LPWSTR id;
+g_autofree LPWSTR property = (LPWSTR)cm_get_property(devInstance,
+&qga_DEVPKEY_Device_HardwareIds, &cm_type);
+if (property == NULL) {
+slog("failed to get hardware IDs");
+return NULL;
+}
+if (*property == '\0') {
+/* empty list */
+return NULL;
+}
+values = g_array_new(TRUE, TRUE, sizeof(gchar *));
+for (id = property; '\0' != *id; id += lstrlenW(id) + 1) {
+gchar *id8 = g_utf16_to_utf8(id, -1, NULL, NULL, NULL);
+g_array_append_val(values, id8);
+}
+hw_ids = (GStrv)g_array_free(values, FALSE);
+
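`cm_get_property()` uses the common "ask for the size, allocate, ask again" pattern: the first `CM_Get_DevNode_PropertyW()` call passes an empty buffer and accepts `CR_BUFFER_SMALL` together with the required length. As a portable illustration of the same two-call idiom (a swapped-in example, not the Windows API), C99 `vsnprintf()` returns the length it would have written when given a NULL buffer of size 0. `format_alloc()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper showing the "query size, allocate, fill" idiom.
 * Returns a malloc()ed string that the caller must free(), or NULL.
 */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    int needed, written;
    char *buf;

    assert(fmt != NULL);

    /* First call: no buffer, so vsnprintf() only reports the length. */
    va_start(ap, fmt);
    needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (needed < 0) {
        return NULL;
    }

    /* The returned length excludes the terminating NUL, hence + 1. */
    buf = malloc((size_t)needed + 1);
    if (buf == NULL) {
        return NULL;
    }

    /* Second call: same arguments, now with a buffer that is big enough. */
    va_start(ap, fmt);
    written = vsnprintf(buf, (size_t)needed + 1, fmt, ap);
    va_end(ap);
    if (written != needed) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The only difference from `cm_get_property()` is how "too small" is signalled: `CM_Get_DevNode_PropertyW()` returns `CR_BUFFER_SMALL` and updates the length argument, whereas `vsnprintf()` returns the required length directly.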

Re: [PATCH v3 25/33] tests/docker: Add fedora-win10sdk-cross image

2019-09-25 Thread Alex Bennée



Philippe Mathieu-Daudé  writes:

> Hi Alex,
>
> On 9/24/19 11:00 PM, Alex Bennée wrote:
>> From: Philippe Mathieu-Daudé 
>>
>> To build WHPX (Windows Hypervisor) binaries, we need the WHPX
>> headers provided by the Windows SDK.
>
> Justin is checking with his company if this patch is OK with them,
> I'd rather wait before merging it:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg646351.html
>
> Can you unqueue this and the next patch (which depends on it) meanwhile
> please?
>

OK, done.

> Thanks,
>
> Phil.
>
>> Add a script that fetches the required MSI/CAB files from the
>> latest SDK (currently 10.0.18362.1).
>>
>> Headers are accessible under /opt/win10sdk/include.
>>
>> Set the QEMU_CONFIGURE_OPTS environment variable accordingly,
>> enabling HAX and WHPX. Due to CPP warnings related to Microsoft
>> specific #pragmas, we also need to use the '--disable-werror'
>> configure flag.
>>
>> Cc: Justin Terry 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> Signed-off-by: Alex Bennée 
>> Message-Id: <20190920113329.16787-3-phi...@redhat.com>
>> ---
>>  tests/docker/Makefile.include |  2 ++
>>  .../dockerfiles/fedora-win10sdk-cross.docker  | 23 
>>  tests/docker/dockerfiles/win10sdk-dl.sh   | 27 +++
>>  3 files changed, 52 insertions(+)
>>  create mode 100644 tests/docker/dockerfiles/fedora-win10sdk-cross.docker
>>  create mode 100755 tests/docker/dockerfiles/win10sdk-dl.sh
>>
>> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
>> index 3fc7a863e51..e85e73025ba 100644
>> --- a/tests/docker/Makefile.include
>> +++ b/tests/docker/Makefile.include
>> @@ -125,6 +125,8 @@ docker-image-debian-ppc64-cross: docker-image-debian10
>>  docker-image-debian-riscv64-cross: docker-image-debian10
>>  docker-image-debian-sh4-cross: docker-image-debian10
>>  docker-image-debian-sparc64-cross: docker-image-debian10
>> +docker-image-fedora-win10sdk-cross: docker-image-fedora
>> +docker-image-fedora-win10sdk-cross: EXTRA_FILES:=$(DOCKER_FILES_DIR)/win10sdk-dl.sh
>>
>>  docker-image-travis: NOUSER=1
>>
>> diff --git a/tests/docker/dockerfiles/fedora-win10sdk-cross.docker b/tests/docker/dockerfiles/fedora-win10sdk-cross.docker
>> new file mode 100644
>> index 000..55ca933d40d
>> --- /dev/null
>> +++ b/tests/docker/dockerfiles/fedora-win10sdk-cross.docker
>> @@ -0,0 +1,23 @@
>> +#
>> +# Docker MinGW64 cross-compiler target with WHPX header installed
>> +#
>> +# This docker target builds on the Fedora 30 base image.
>> +#
>> +# SPDX-License-Identifier: GPL-2.0-or-later
>> +#
>> +FROM qemu:fedora
>> +
>> +RUN dnf install -y \
>> +cabextract \
>> +msitools \
>> +wget
>> +
>> +# Install WHPX headers from Windows Software Development Kit:
>> +# https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk
>> +ADD win10sdk-dl.sh /usr/local/bin/win10sdk-dl.sh
>> +RUN /usr/local/bin/win10sdk-dl.sh
>> +
>> +ENV QEMU_CONFIGURE_OPTS ${QEMU_CONFIGURE_OPTS} \
>> +--cross-prefix=x86_64-w64-mingw32- \
>> +--extra-cflags=-I/opt/win10sdk/include --disable-werror \
>> +--enable-hax --enable-whpx
>> diff --git a/tests/docker/dockerfiles/win10sdk-dl.sh b/tests/docker/dockerfiles/win10sdk-dl.sh
>> new file mode 100755
>> index 000..1c35c2a2524
>> --- /dev/null
>> +++ b/tests/docker/dockerfiles/win10sdk-dl.sh
>> @@ -0,0 +1,27 @@
>> +#!/bin/bash
>> +#
>> +# Install WHPX headers from Windows Software Development Kit
>> +# https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk
>> +#
>> +# SPDX-License-Identifier: GPL-2.0-or-later
>> +
>> +WINDIR=/opt/win10sdk
>> +mkdir -p ${WINDIR}
>> +pushd ${WINDIR}
>> +# Get the bundle base for Windows SDK v10.0.18362.1
>> +BASE_URL=$(curl --silent --include 'http://go.microsoft.com/fwlink/?prd=11966&pver=1.0&plcid=0x409&clcid=0x409&ar=Windows10&sar=SDK&o1=10.0.18362.1' | sed -nE 's_Location: (.*)/\r_\1_p')/Installers
>> +# Fetch the MSI containing the headers
>> +wget --no-verbose ${BASE_URL}/'Windows SDK Desktop Headers x86-x86_en-us.msi'
>> +while true; do
>> +# Fetch all cabinets required by this MSI
>> +CAB_NAME=$(msiextract Windows\ SDK\ Desktop\ Headers\ x86-x86_en-us.msi 3>&1 2>&3 3>&-| sed -nE "s_.*Error opening file $PWD/(.*): No such file or directory_\1_p")
>> +test -z "${CAB_NAME}" && break
>> +wget --no-verbose ${BASE_URL}/${CAB_NAME}
>> +done
>> +rm *.{cab,msi}
>> +mkdir /opt/win10sdk/include
>> +# Only keep the WHPX headers
>> +for inc in "${WINDIR}/Program Files/Windows Kits/10/Include/10.0.18362.0/um"/WinHv*; do
>> +ln -s "${inc}" /opt/win10sdk/include
>> +done
>> +popd > /dev/null
>>


--
Alex Bennée

Re: [PATCH v4] qga: add command guest-get-devices for reporting VirtIO devices

2019-09-25 Thread Eric Blake


On 9/25/19 4:03 PM, Tomáš Golembiovský wrote:

Add a command for reporting devices on a Windows guest. The intent is not so
much to report the devices but, more importantly, the driver (and its
version) that is assigned to each device. This gives the caller
information on whether VirtIO drivers are installed and/or whether an
inadequate driver is used on a device (e.g. a QXL device with the base VGA
driver).

Signed-off-by: Tomáš Golembiovský 
---


It's nice to mention here, after the --- separator, how v4 differs from 
earlier versions, to let reviewers that saw the earlier version check 
the differences.




+++ b/qga/qapi-schema.json
@@ -1242,3 +1242,35 @@
  ##
  { 'command': 'guest-get-osinfo',
'returns': 'GuestOSInfo' }
+
+##
+# @GuestDeviceInfo:
+#
+# @vendor-id: vendor ID
+# @device-id: device ID
+# @driver-name: name of the associated driver
+# @driver-date: driver release date in format YYYY-MM-DD
+# @driver-version: driver version
+#
+# Since: 4.2
+##
+{ 'struct': 'GuestDeviceInfo',
+  'data': {
+  'vendor-id': 'uint16',
+  'device-id': 'uint16',
+  'driver-name': 'str',
+  'driver-date': 'str',
+  'driver-version': 'str'
+  } }
+
+##
+# @guest-get-devices:
+#
+# Retrieve information about device drivers in Windows guest
+#
+# Returns: @GuestDeviceInfo
+#
+# Since: 4.2
+##
+{ 'command': 'guest-get-devices',
+  'returns': ['GuestDeviceInfo'] }
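The `DriverDate` device property is declared in this patch as `DEVPROP_TYPE_FILETIME`, while `@driver-date` is a string. A Windows `FILETIME` counts 100-nanosecond intervals since 1601-01-01 UTC, so turning it into a calendar date only needs an epoch shift and a division. The sketch below uses standard C `gmtime()`/`strftime()` rather than the Win32 conversion functions; `filetime_to_date()` is a hypothetical helper, and it assumes a 64-bit `time_t` and an ISO 8601 `YYYY-MM-DD` output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* 100 ns ticks between 1601-01-01 and 1970-01-01 (the Unix epoch). */
#define FILETIME_UNIX_EPOCH_DIFF 116444736000000000ULL
#define FILETIME_TICKS_PER_SEC   10000000ULL

/*
 * Format a FILETIME value (as a 64-bit tick count) as "YYYY-MM-DD" in UTC.
 * Returns 0 on success, -1 if the value predates 1970 or out is too small.
 */
static int filetime_to_date(uint64_t ft, char *out, size_t outlen)
{
    assert(out != NULL);
    if (ft < FILETIME_UNIX_EPOCH_DIFF) {
        return -1;  /* keep the sketch simple: no pre-1970 dates */
    }
    time_t secs = (time_t)((ft - FILETIME_UNIX_EPOCH_DIFF) /
                           FILETIME_TICKS_PER_SEC);
    struct tm *utc = gmtime(&secs);
    if (utc == NULL) {
        return -1;
    }
    /* strftime() returns 0 if the result plus its NUL does not fit. */
    return strftime(out, outlen, "%Y-%m-%d", utc) == 0 ? -1 : 0;
}
```

For example, the tick count 132138432000000000 is 2019-09-25 00:00:00 UTC and formats as `2019-09-25`; a buffer of at least 11 bytes is needed.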




I'm not spotting any obvious problems with the interface itself, but am 
not comfortable enough with the rest of the code for a full review.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 2/3] iotests: Disable 125 on broken XFS versions

2019-09-25 Thread Eric Blake


On 9/25/19 1:32 PM, Max Reitz wrote:

And by that I mean all XFS versions, as far as I can tell.  All details
are in the comment below.

We never noticed this problem because we only read the first number from
qemu-img info's "disk size" output -- and that is effectively useless,
because qemu-img prints a human-readable value (which generally includes
a decimal point).  That will be fixed in the next patch.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/125 | 40 
  1 file changed, 40 insertions(+)

diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index df328a63a6..0ef51f1e21 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -49,6 +49,46 @@ if [ -z "$TEST_IMG_FILE" ]; then
  TEST_IMG_FILE=$TEST_IMG
  fi
  
+# Test whether we are running on a broken XFS version.  There is this
+# bug:
+
+# $ rm -f foo
+# $ touch foo
+# $ block_size=4096 # Your FS's block size
+# $ fallocate -o $((block_size / 2)) -l $block_size foo
+# $ LANG=C xfs_bmap foo | grep hole
+# 1: [8..15]: hole
+#
+# The problem is that the XFS driver rounds down the offset and
+# rounds up the length to the block size, but independently.


Eww. I concur you uncovered a bug.  Have you reported this to xfs folks?
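To see why `fallocate -o $((block_size / 2)) -l $block_size` leaves a hole, compare aligning the start and end of the byte range with rounding the offset and length separately, which is how the comment in the patch describes the XFS behaviour. This is a hypothetical model of the arithmetic only, not XFS code; `Range`, `aligned_range()` and `independent_range()` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end). Illustrative type, not XFS code. */
typedef struct {
    uint64_t start;
    uint64_t end;
} Range;

static uint64_t round_down(uint64_t x, uint64_t bs)
{
    assert(bs > 0);
    return x - x % bs;
}

static uint64_t round_up(uint64_t x, uint64_t bs)
{
    return round_down(x + bs - 1, bs);
}

/* Expected behaviour: align the start down and the end (off + len) up. */
static Range aligned_range(uint64_t off, uint64_t len, uint64_t bs)
{
    Range r = { round_down(off, bs), round_up(off + len, bs) };
    return r;
}

/* Behaviour described in the comment: offset and length rounded separately. */
static Range independent_range(uint64_t off, uint64_t len, uint64_t bs)
{
    uint64_t start = round_down(off, bs);
    Range r = { start, start + round_up(len, bs) };
    return r;
}
```

With a 4096-byte block size, `independent_range(2048, 4096, 4096)` covers only bytes [0, 4096), so bytes [4096, 8192), i.e. 512-byte sectors 8..15, remain a hole, matching the `1: [8..15]: hole` line from `xfs_bmap`. The same model explains the probe in the test: `fallocate -o 65535 -l 2` should cover byte 65536, but with independent rounding it stops just short of it.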


+
+touch "$TEST_IMG_FILE"
+# Assuming there is no FS with a block size greater than 64k
+fallocate -o 65535 -l 2 "$TEST_IMG_FILE"
+len0=$(get_image_size_on_host)
+
+# Write to something that in theory we have just fallocated
+# (Thus, the on-disk size should not increase)
+poke_file "$TEST_IMG_FILE" 65536 42
+len1=$(get_image_size_on_host)
+
+if [ $len1 -gt $len0 ]; then
+_notrun "the test filesystem's fallocate() is broken"
+fi
+
+rm -f "$TEST_IMG_FILE"


Reviewed-by: Eric Blake 


+
  # Generally, we create some image with or without existing preallocation and
  # then resize it. Then we write some data into the image and verify that its
  # size does not change if we have used preallocation.



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/3] iotests: Fix 125 for growth_mode = metadata

2019-09-25 Thread Eric Blake


On 9/25/19 1:32 PM, Max Reitz wrote:

If we use growth_mode = metadata, it is very much possible that the file
uses more disk space after we have written something to the added area.
We did indeed want to test for this case, but unfortunately we evidently
just copied the code from the "Test creation preallocation" section and
forgot to replace "$create_mode" by "$growth_mode".

We never noticed because we only read the first number from qemu-img
info's "disk size" output -- and that is effectively useless, because
qemu-img prints a human-readable value (which generally includes a
decimal point).  That will be fixed in the patch after the next one.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/125 | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Eric Blake 


diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index dc4b8f5fb9..df328a63a6 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -111,7 +111,7 @@ for GROWTH_SIZE in 16 48 80; do
  if [ $file_length_2 -gt $file_length_1 ]; then
  echo "ERROR (grow): Image length has grown from $file_length_1 to $file_length_2"
  fi
-if [ $create_mode != metadata ]; then
+if [ $growth_mode != metadata ]; then
  # The host size should not have grown either
  if [ $host_size_2 -gt $host_size_1 ]; then
  echo "ERROR (grow): Host size has grown from $host_size_1 to $host_size_2"



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[Bug 1841990] Re: instruction 'denbcdq' misbehaving

2019-09-25 Thread Paul Clarke

> Did you see the follow-up email indicating the typo that I found in
> patch 6?

I did, then forgot to include it in my build.  I've included that change
now...

> Does that help any more tests to pass?

I'm down from 22 failures to 8.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1841990

Title:
  instruction 'denbcdq' misbehaving

Status in QEMU:
  New

Bug description:
  Instruction 'denbcdq' appears to have no effect.  Test case attached.

  On ppc64le native:
  --
  gcc -g -O -mcpu=power9 bcdcfsq.c test-denbcdq.c -o test-denbcdq
  $ ./test-denbcdq
  0x
  0x000c
  0x2208
  $ ./test-denbcdq 1
  0x0001
  0x001c
  0x22080001
  $ ./test-denbcdq $(seq 0 99)
  0x0064
  0x100c
  0x22080080
  --

  With "qemu-ppc64le -cpu power9"
  --
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq
  0x
  0x000c
  0x000c
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq 1
  0x0001
  0x001c
  0x001c
  $ qemu-ppc64le -cpu power9 -L [...] ./test-denbcdq $(seq 100)
  0x0064
  0x100c
  0x100c
  --

  I started looking at the code, but I got confused rather quickly.
  Could be related to endianness? I think denbcdq arrived on the scene
  before little-endian was a big deal.  Maybe something to do with
  utilizing implicit floating-point register pairs...  I don't think the
  right data is getting to helper_denbcdq, which would point back to the
  gen_fprp_ptr uses in dfp-impl.inc.c (GEN_DFP_T_FPR_I32_Rc).  (Maybe?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1841990/+subscriptions

Re: [PATCH 3/3] iotests: Use stat -c %b in 125

2019-09-25 Thread Eric Blake


On 9/25/19 1:32 PM, Max Reitz wrote:

125 should not use qemu-img to get the on-disk image size, because that
reports it in a human-readable format that is useless to us.  Just use
stat instead (like we do to get the image file length).

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/125 | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/125 b/tests/qemu-iotests/125
index 0ef51f1e21..4e31aa4e5f 100755
--- a/tests/qemu-iotests/125
+++ b/tests/qemu-iotests/125
@@ -34,8 +34,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
  
  get_image_size_on_host()
  {
-$QEMU_IMG info -f "$IMGFMT" "$TEST_IMG" | grep "disk size" \
-| sed -e 's/^[^0-9]*\([0-9]\+\).*$/\1/'
+echo $(($(stat -c '%b * %B' "$TEST_IMG_FILE")))


Cute use of $(()) around $().

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH v2 01/13] block-crypto: misc refactoring

2019-09-25 Thread Maxim Levitsky

* rename write_func to create_write_func,
  and init_func to create_init_func;
  this is preparation for another write_func that will
  be used to update the encryption keys.

No functional changes

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/crypto.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 7eb698774e..6e822c6e50 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -78,7 +78,7 @@ struct BlockCryptoCreateData {
 };
 
 
-static ssize_t block_crypto_write_func(QCryptoBlock *block,
+static ssize_t block_crypto_create_write_func(QCryptoBlock *block,
size_t offset,
const uint8_t *buf,
size_t buflen,
@@ -96,8 +96,7 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
 return ret;
 }
 
-
-static ssize_t block_crypto_init_func(QCryptoBlock *block,
+static ssize_t block_crypto_create_init_func(QCryptoBlock *block,
   size_t headerlen,
   void *opaque,
   Error **errp)
@@ -109,7 +108,8 @@ static ssize_t block_crypto_init_func(QCryptoBlock *block,
 return -EFBIG;
 }
 
-/* User provided size should reflect amount of space made
+/*
+ * User provided size should reflect amount of space made
  * available to the guest, so we must take account of that
  * which will be used by the crypto header
  */
@@ -279,8 +279,8 @@ static int block_crypto_co_create_generic(BlockDriverState *bs,
 };
 
 crypto = qcrypto_block_create(opts, NULL,
-  block_crypto_init_func,
-  block_crypto_write_func,
+  block_crypto_create_init_func,
+  block_crypto_create_write_func,
   &data,
   errp);
 
-- 
2.17.2

[PATCH v2 00/13] crypto/luks: preparation for encryption key management

2019-09-25 Thread Maxim Levitsky

Hi!

This patch series is the refactoring/preparation part of the
patch series I sent earlier, which adds support for LUKS
key management.

V2:
I addressed all the review comments.
I also added another minor patch to improve an error message
when trying to create a file that is too large, for which I have an
open bug waiting to be closed.
It is also a form of refactoring, so I guess it makes
sense to include it here.

Best regards,
Maxim Levitsky

Maxim Levitsky (13):
  block-crypto: misc refactoring
  qcrypto-luks: rename some fields in QCryptoBlockLUKSHeader
  qcrypto-luks: don't overwrite cipher_mode in header
  qcrypto-luks: simplify masterkey and masterkey length
  qcrypto-luks: pass keyslot index rather than pointer to the keyslot
  qcrypto-luks: use the parsed encryption settings in QCryptoBlockLUKS
  qcrypto-luks: purge unused error codes from open callback
  qcrypto-luks: extract store and load header
  qcrypto-luks: extract check and parse header
  qcrypto-luks: extract store key function
  qcrypto-luks: simplify the math used for keyslot locations
  qcrypto-luks: more rigorous header checking
  LUKS: better error message when creating too large files

 block/crypto.c  |   33 +-
 crypto/block-luks.c | 1025 +--
 2 files changed, 617 insertions(+), 441 deletions(-)

-- 
2.17.2

[PATCH v2 03/13] qcrypto-luks: don't overwrite cipher_mode in header

2019-09-25 Thread Maxim Levitsky

This way we can store the header we loaded, which
will be used in the key management code.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 crypto/block-luks.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index f12fa2d270..25f8a9f1c4 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -645,6 +645,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 QCryptoHashAlgorithm hash;
 QCryptoHashAlgorithm ivhash;
 g_autofree char *password = NULL;
+g_autofree char *cipher_mode = NULL;
 
 if (!(flags & QCRYPTO_BLOCK_OPEN_NO_IO)) {
 if (!options->u.luks.key_secret) {
@@ -701,6 +702,8 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 goto fail;
 }
 
+cipher_mode = g_strdup(luks->header.cipher_mode);
+
 /*
  * The cipher_mode header contains a string that we have
  * to further parse, of the format
@@ -709,11 +712,11 @@ qcrypto_block_luks_open(QCryptoBlock *block,
  *
  * eg  cbc-essiv:sha256, cbc-plain64
  */
-ivgen_name = strchr(luks->header.cipher_mode, '-');
+ivgen_name = strchr(cipher_mode, '-');
 if (!ivgen_name) {
 ret = -EINVAL;
 error_setg(errp, "Unexpected cipher mode string format %s",
-   luks->header.cipher_mode);
+   cipher_mode);
 goto fail;
 }
 *ivgen_name = '\0';
@@ -735,7 +738,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 }
 }
 
-ciphermode = qcrypto_block_luks_cipher_mode_lookup(luks->header.cipher_mode,
+ciphermode = qcrypto_block_luks_cipher_mode_lookup(cipher_mode,
&local_err);
 if (local_err) {
 ret = -ENOTSUP;
-- 
2.17.2

[PATCH v2 04/13] qcrypto-luks: simplify masterkey and masterkey length

2019-09-25 Thread Maxim Levitsky

Let the caller allocate the master key.
Always use the master key length from the header.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 crypto/block-luks.c | 44 +---
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 25f8a9f1c4..9e59a791a6 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -419,7 +419,6 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
 QCryptoCipherAlgorithm ivcipheralg,
 QCryptoHashAlgorithm ivhash,
 uint8_t *masterkey,
-size_t masterkeylen,
 QCryptoBlockReadFunc readfunc,
 void *opaque,
 Error **errp)
@@ -438,9 +437,9 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
 return 0;
 }
 
-splitkeylen = masterkeylen * slot->stripes;
+splitkeylen = luks->header.master_key_len * slot->stripes;
 splitkey = g_new0(uint8_t, splitkeylen);
-possiblekey = g_new0(uint8_t, masterkeylen);
+possiblekey = g_new0(uint8_t, luks->header.master_key_len);
 
 /*
  * The user password is used to generate a (possible)
@@ -453,7 +452,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
(const uint8_t *)password, strlen(password),
slot->salt, QCRYPTO_BLOCK_LUKS_SALT_LEN,
slot->iterations,
-   possiblekey, masterkeylen,
+   possiblekey, luks->header.master_key_len,
errp) < 0) {
 return -1;
 }
@@ -478,7 +477,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
 /* Setup the cipher/ivgen that we'll use to try to decrypt
  * the split master key material */
 cipher = qcrypto_cipher_new(cipheralg, ciphermode,
-possiblekey, masterkeylen,
+possiblekey, luks->header.master_key_len,
 errp);
 if (!cipher) {
 return -1;
@@ -489,7 +488,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
 ivgen = qcrypto_ivgen_new(ivalg,
   ivcipheralg,
   ivhash,
-  possiblekey, masterkeylen,
+  possiblekey, luks->header.master_key_len,
   errp);
 if (!ivgen) {
 return -1;
@@ -519,7 +518,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
  * it back together to get the actual master key.
  */
 if (qcrypto_afsplit_decode(hash,
-   masterkeylen,
+   luks->header.master_key_len,
slot->stripes,
splitkey,
masterkey,
@@ -537,11 +536,13 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
  * header
  */
 if (qcrypto_pbkdf2(hash,
-   masterkey, masterkeylen,
+   masterkey,
+   luks->header.master_key_len,
luks->header.master_key_salt,
QCRYPTO_BLOCK_LUKS_SALT_LEN,
luks->header.master_key_iterations,
-   keydigest, G_N_ELEMENTS(keydigest),
+   keydigest,
+   G_N_ELEMENTS(keydigest),
errp) < 0) {
 return -1;
 }
@@ -574,8 +575,7 @@ qcrypto_block_luks_find_key(QCryptoBlock *block,
 QCryptoIVGenAlgorithm ivalg,
 QCryptoCipherAlgorithm ivcipheralg,
 QCryptoHashAlgorithm ivhash,
-uint8_t **masterkey,
-size_t *masterkeylen,
+uint8_t *masterkey,
 QCryptoBlockReadFunc readfunc,
 void *opaque,
 Error **errp)
@@ -584,9 +584,6 @@ qcrypto_block_luks_find_key(QCryptoBlock *block,
 size_t i;
 int rv;
 
-*masterkey = g_new0(uint8_t, luks->header.master_key_len);
-*masterkeylen = luks->header.master_key_len;
-
 for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
 rv = qcrypto_block_luks_load_key(block,
  &luks->header.key_slots[i],
@@ -597,8 +594,7 @@ qcrypto_block_luks_find_key(QCryptoBlock *block,
  ivalg,
  ivcipheralg,
  ivhash,
- *masterkey,
- *masterkeylen,
+ masterkey,
  readfunc,

[PATCH v2 02/13] qcrypto-luks: rename some fields in QCryptoBlockLUKSHeader

2019-09-25 Thread Maxim Levitsky

* key_bytes -> master_key_len
* payload_offset -> payload_offset_sector (to emphasise that this is an
  offset in sectors, not in bytes)
* key_offset -> key_offset_sector (same as above, for the LUKS key slots)

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 crypto/block-luks.c | 91 +++--
 1 file changed, 47 insertions(+), 44 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 743949adbf..f12fa2d270 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -143,7 +143,7 @@ struct QCryptoBlockLUKSKeySlot {
 /* salt for PBKDF2 */
 uint8_t salt[QCRYPTO_BLOCK_LUKS_SALT_LEN];
 /* start sector of key material */
-uint32_t key_offset;
+uint32_t key_offset_sector;
 /* number of anti-forensic stripes */
 uint32_t stripes;
 };
@@ -172,10 +172,10 @@ struct QCryptoBlockLUKSHeader {
 char hash_spec[QCRYPTO_BLOCK_LUKS_HASH_SPEC_LEN];
 
 /* start offset of the volume data (in 512 byte sectors) */
-uint32_t payload_offset;
+uint32_t payload_offset_sector;
 
 /* Number of key bytes */
-uint32_t key_bytes;
+uint32_t master_key_len;
 
 /* master key checksum after PBKDF2 */
 uint8_t master_key_digest[QCRYPTO_BLOCK_LUKS_DIGEST_LEN];
@@ -466,7 +466,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
  * then encrypted.
  */
 rv = readfunc(block,
-  slot->key_offset * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
+  slot->key_offset_sector * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
   splitkey, splitkeylen,
   opaque,
   errp);
@@ -584,8 +584,8 @@ qcrypto_block_luks_find_key(QCryptoBlock *block,
 size_t i;
 int rv;
 
-*masterkey = g_new0(uint8_t, luks->header.key_bytes);
-*masterkeylen = luks->header.key_bytes;
+*masterkey = g_new0(uint8_t, luks->header.master_key_len);
+*masterkeylen = luks->header.master_key_len;
 
 for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
 rv = qcrypto_block_luks_load_key(block,
@@ -677,14 +677,14 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 /* The header is always stored in big-endian format, so
  * convert everything to native */
 be16_to_cpus(&luks->header.version);
-be32_to_cpus(&luks->header.payload_offset);
-be32_to_cpus(&luks->header.key_bytes);
+be32_to_cpus(&luks->header.payload_offset_sector);
+be32_to_cpus(&luks->header.master_key_len);
 be32_to_cpus(&luks->header.master_key_iterations);
 
 for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
 be32_to_cpus(&luks->header.key_slots[i].active);
 be32_to_cpus(&luks->header.key_slots[i].iterations);
-be32_to_cpus(&luks->header.key_slots[i].key_offset);
+be32_to_cpus(&luks->header.key_slots[i].key_offset_sector);
 be32_to_cpus(&luks->header.key_slots[i].stripes);
 }
 
@@ -743,10 +743,11 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 goto fail;
 }
 
-cipheralg = qcrypto_block_luks_cipher_name_lookup(luks->header.cipher_name,
-  ciphermode,
-  luks->header.key_bytes,
-  &local_err);
+cipheralg =
+qcrypto_block_luks_cipher_name_lookup(luks->header.cipher_name,
+  ciphermode,
+  luks->header.master_key_len,
+  &local_err);
 if (local_err) {
 ret = -ENOTSUP;
 error_propagate(errp, local_err);
@@ -838,7 +839,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 }
 
 block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
-block->payload_offset = luks->header.payload_offset *
+block->payload_offset = luks->header.payload_offset_sector *
 block->sector_size;
 
 luks->cipher_alg = cipheralg;
@@ -993,9 +994,11 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 strcpy(luks->header.cipher_mode, cipher_mode_spec);
 strcpy(luks->header.hash_spec, hash_alg);
 
-luks->header.key_bytes = qcrypto_cipher_get_key_len(luks_opts.cipher_alg);
+luks->header.master_key_len =
+qcrypto_cipher_get_key_len(luks_opts.cipher_alg);
+
 if (luks_opts.cipher_mode == QCRYPTO_CIPHER_MODE_XTS) {
-luks->header.key_bytes *= 2;
+luks->header.master_key_len *= 2;
 }
 
 /* Generate the salt used for hashing the master key
@@ -1008,9 +1011,9 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 }
 
 /* Generate random master key */
-masterkey = g_new0(uint8_t, luks->header.key_bytes);
+masterkey = g_new0(uint8_t, luks->header.master_key_len);
 if (qcrypto_random_bytes(masterkey,
- luks->header.key_bytes, errp) < 0) {
+ luks->header.master_key_len, errp) < 0) {
 goto error;
 }
 
@@ -
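
The renamed fields hold offsets in 512-byte sectors, and the code turns them
into byte offsets by multiplying by the sector size (for example
luks->header.payload_offset_sector * block->sector_size). A minimal standalone
sketch of that conversion follows. LUKS_SECTOR_SIZE and luks_sector_to_byte()
are hypothetical names, and the constant is assumed to match
QCRYPTO_BLOCK_LUKS_SECTOR_SIZE. Widening to 64 bits before multiplying keeps a
large 32-bit sector value from overflowing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant, assumed equal to QCRYPTO_BLOCK_LUKS_SECTOR_SIZE. */
#define LUKS_SECTOR_SIZE 512ULL

static_assert(LUKS_SECTOR_SIZE == 512, "LUKS1 offsets are in 512-byte sectors");

/*
 * Convert a sector offset as stored in payload_offset_sector or
 * key_offset_sector into a byte offset. The 32-bit value is widened
 * to 64 bits first, so UINT32_MAX sectors does not wrap around.
 */
static uint64_t luks_sector_to_byte(uint32_t sector)
{
    return (uint64_t)sector * LUKS_SECTOR_SIZE;
}
```

For example, a payload_offset_sector of 4096 corresponds to a byte offset of
2097152 (2 MiB).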

[PATCH v2 06/13] qcrypto-luks: use the parsed encryption settings in QCryptoBlockLUKS

2019-09-25 Thread Maxim Levitsky

Prior to this patch, the parsed encryption settings
were already stored in QCryptoBlockLUKS, but were not
used anywhere except in qcrypto_block_luks_get_info.

Using them simplifies the code.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 crypto/block-luks.c | 169 +---
 1 file changed, 79 insertions(+), 90 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index b759cc8d19..f3bfc921b2 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -199,13 +199,25 @@ QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSHeader) != 592);
 struct QCryptoBlockLUKS {
 QCryptoBlockLUKSHeader header;
 
-/* Cache parsed versions of what's in header fields,
- * as we can't rely on QCryptoBlock.cipher being
- * non-NULL */
+/* Main encryption algorithm used for encryption*/
 QCryptoCipherAlgorithm cipher_alg;
+
+/* Mode of encryption for the selected encryption algorithm */
 QCryptoCipherMode cipher_mode;
+
+/* Initialization vector generation algorithm */
 QCryptoIVGenAlgorithm ivgen_alg;
+
+/* Hash algorithm used for IV generation*/
 QCryptoHashAlgorithm ivgen_hash_alg;
+
+/*
+ * Encryption algorithm used for IV generation.
+ * Usually the same as main encryption algorithm
+ */
+QCryptoCipherAlgorithm ivgen_cipher_alg;
+
+/* Hash algorithm used in pbkdf2 function */
 QCryptoHashAlgorithm hash_alg;
 };
 
@@ -412,12 +424,6 @@ static int
 qcrypto_block_luks_load_key(QCryptoBlock *block,
 size_t slot_idx,
 const char *password,
-QCryptoCipherAlgorithm cipheralg,
-QCryptoCipherMode ciphermode,
-QCryptoHashAlgorithm hash,
-QCryptoIVGenAlgorithm ivalg,
-QCryptoCipherAlgorithm ivcipheralg,
-QCryptoHashAlgorithm ivhash,
 uint8_t *masterkey,
 QCryptoBlockReadFunc readfunc,
 void *opaque,
@@ -449,7 +455,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
  * the key is correct and validate the results of
  * decryption later.
  */
-if (qcrypto_pbkdf2(hash,
+if (qcrypto_pbkdf2(luks->hash_alg,
(const uint8_t *)password, strlen(password),
slot->salt, QCRYPTO_BLOCK_LUKS_SALT_LEN,
slot->iterations,
@@ -477,19 +483,23 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
 
 /* Setup the cipher/ivgen that we'll use to try to decrypt
  * the split master key material */
-cipher = qcrypto_cipher_new(cipheralg, ciphermode,
-possiblekey, luks->header.master_key_len,
+cipher = qcrypto_cipher_new(luks->cipher_alg,
+luks->cipher_mode,
+possiblekey,
+luks->header.master_key_len,
 errp);
 if (!cipher) {
 return -1;
 }
 
-niv = qcrypto_cipher_get_iv_len(cipheralg,
-ciphermode);
-ivgen = qcrypto_ivgen_new(ivalg,
-  ivcipheralg,
-  ivhash,
-  possiblekey, luks->header.master_key_len,
+niv = qcrypto_cipher_get_iv_len(luks->cipher_alg,
+luks->cipher_mode);
+
+ivgen = qcrypto_ivgen_new(luks->ivgen_alg,
+  luks->ivgen_cipher_alg,
+  luks->ivgen_hash_alg,
+  possiblekey,
+  luks->header.master_key_len,
   errp);
 if (!ivgen) {
 return -1;
@@ -518,7 +528,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
  * Now we've decrypted the split master key, join
  * it back together to get the actual master key.
  */
-if (qcrypto_afsplit_decode(hash,
+if (qcrypto_afsplit_decode(luks->hash_alg,
luks->header.master_key_len,
slot->stripes,
splitkey,
@@ -536,7 +546,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
  * then comparing that to the hash stored in the key slot
  * header
  */
-if (qcrypto_pbkdf2(hash,
+if (qcrypto_pbkdf2(luks->hash_alg,
masterkey,
luks->header.master_key_len,
luks->header.master_key_salt,
@@ -570,12 +580,6 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
 static int
 qcrypto_block_luks_find_key(QCryptoBlock *block,
 const char *password,
-QCryptoCipherAlgorithm cipheralg,
-QCryptoCipherMode ciphe

[PATCH v2 12/13] qcrypto-luks: more rigorous header checking

2019-09-25 Thread Maxim Levitsky

Check that keyslots don't overlap with the data,
and check that keyslots don't overlap with each other.
(This is done using naive O(n^2) nested loops,
but since there are just 8 keyslots, this doesn't really matter.)

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 crypto/block-luks.c | 52 +
 1 file changed, 52 insertions(+)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index a53d5d1916..4861db810c 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -530,6 +530,11 @@ qcrypto_block_luks_load_header(QCryptoBlock *block,
 static int
 qcrypto_block_luks_check_header(const QCryptoBlockLUKS *luks, Error **errp)
 {
+size_t i, j;
+
+unsigned int header_sectors = QCRYPTO_BLOCK_LUKS_KEY_SLOT_OFFSET /
+QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
+
 if (memcmp(luks->header.magic, qcrypto_block_luks_magic,
QCRYPTO_BLOCK_LUKS_MAGIC_LEN) != 0) {
 error_setg(errp, "Volume is not in LUKS format");
@@ -541,6 +546,53 @@ qcrypto_block_luks_check_header(const QCryptoBlockLUKS *luks, Error **errp)
luks->header.version);
 return -1;
 }
+
+/* Check all keyslots for corruption  */
+for (i = 0 ; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS ; i++) {
+
+const QCryptoBlockLUKSKeySlot *slot1 = &luks->header.key_slots[i];
+unsigned int start1 = slot1->key_offset_sector;
+unsigned int len1 =
+qcrypto_block_luks_splitkeylen_sectors(luks,
+   header_sectors,
+   slot1->stripes);
+
+if (slot1->stripes == 0) {
+error_setg(errp, "Keyslot %zu is corrupted (stripes == 0)", i);
+return -1;
+}
+
+if (slot1->active != QCRYPTO_BLOCK_LUKS_KEY_SLOT_DISABLED &&
+slot1->active != QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED) {
+error_setg(errp,
+   "Keyslot %zu state (active/disable) is corrupted", i);
+return -1;
+}
+
+if (start1 + len1 > luks->header.payload_offset_sector) {
+error_setg(errp,
+   "Keyslot %zu is overlapping with the encrypted payload",
+   i);
+return -1;
+}
+
+for (j = i + 1 ; j < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS ; j++) {
+const QCryptoBlockLUKSKeySlot *slot2 = &luks->header.key_slots[j];
+unsigned int start2 = slot2->key_offset_sector;
+unsigned int len2 =
+qcrypto_block_luks_splitkeylen_sectors(luks,
+   header_sectors,
+   slot2->stripes);
+
+if (start1 + len1 > start2 && start2 + len2 > start1) {
+error_setg(errp,
+   "Keyslots %zu and %zu are overlapping in the header",
+   i, j);
+return -1;
+}
+}
+
+}
 return 0;
 }
 
-- 
2.17.2
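
The pairwise test in the patch treats each key slot as a half-open sector
range [start, start + len) and reports an overlap when each range starts before
the other one ends. A standalone sketch of the same check follows.
struct slot_extent and find_bad_slot() are hypothetical names, and the lengths
are plain inputs here instead of being computed with
qcrypto_block_luks_splitkeylen_sectors(). As in the patch, start + len is
computed in unsigned int, so a hardened version might widen it to 64 bits to
avoid wrap-around on corrupt headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one key slot's extent on disk, in sectors. */
struct slot_extent {
    unsigned int start;   /* first sector of the key material */
    unsigned int len;     /* number of sectors it occupies */
};

/*
 * Half-open ranges [a.start, a.start + a.len) and [b.start, b.start + b.len)
 * overlap exactly when each one starts before the other one ends.
 */
static bool extents_overlap(struct slot_extent a, struct slot_extent b)
{
    return a.start + a.len > b.start && b.start + b.len > a.start;
}

/*
 * Returns the index of the first slot that runs into the payload or
 * overlaps a later slot, or -1 if the layout is consistent. Like the
 * patch, it uses O(n^2) pairwise comparisons, fine for a few slots.
 */
static int find_bad_slot(const struct slot_extent *slots, size_t n,
                         unsigned int payload_start)
{
    for (size_t i = 0; i < n; i++) {
        if (slots[i].start + slots[i].len > payload_start) {
            return (int)i;
        }
        for (size_t j = i + 1; j < n; j++) {
            if (extents_overlap(slots[i], slots[j])) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

Adjacent slots such as {8, 500} and {508, 500} do not overlap, because the
first one ends exactly where the second one begins.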

[PATCH v2 05/13] qcrypto-luks: pass keyslot index rather than pointer to the keyslot

2019-09-25 Thread Maxim Levitsky

Another minor refactoring

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 crypto/block-luks.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 9e59a791a6..b759cc8d19 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -410,7 +410,7 @@ qcrypto_block_luks_essiv_cipher(QCryptoCipherAlgorithm cipher,
  */
 static int
 qcrypto_block_luks_load_key(QCryptoBlock *block,
-QCryptoBlockLUKSKeySlot *slot,
+size_t slot_idx,
 const char *password,
 QCryptoCipherAlgorithm cipheralg,
 QCryptoCipherMode ciphermode,
@@ -424,6 +424,7 @@ qcrypto_block_luks_load_key(QCryptoBlock *block,
 Error **errp)
 {
 QCryptoBlockLUKS *luks = block->opaque;
+const QCryptoBlockLUKSKeySlot *slot = &luks->header.key_slots[slot_idx];
 g_autofree uint8_t *splitkey = NULL;
 size_t splitkeylen;
 g_autofree uint8_t *possiblekey = NULL;
@@ -580,13 +581,12 @@ qcrypto_block_luks_find_key(QCryptoBlock *block,
 void *opaque,
 Error **errp)
 {
-QCryptoBlockLUKS *luks = block->opaque;
 size_t i;
 int rv;
 
 for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
 rv = qcrypto_block_luks_load_key(block,
- &luks->header.key_slots[i],
+ i,
  password,
  cipheralg,
  ciphermode,
-- 
2.17.2

[PATCH v2 10/13] qcrypto-luks: extract store key function

2019-09-25 Thread Maxim Levitsky

This function will be used later to store
new keys in the LUKS metadata.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 crypto/block-luks.c | 304 ++--
 1 file changed, 181 insertions(+), 123 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fa799fd21d..6d4e9eb348 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -623,6 +623,176 @@ qcrypto_block_luks_parse_header(QCryptoBlockLUKS *luks, Error **errp)
 return 0;
 }
 
+/*
+ * Given a key slot,  user password, and the master key,
+ * will store the encrypted master key there, and update the
+ * in-memory header. User must then write the in-memory header
+ *
+ * Returns:
+ *0 if the keyslot was written successfully
+ *  with the provided password
+ *   -1 if a fatal error occurred while storing the key
+ */
+static int
+qcrypto_block_luks_store_key(QCryptoBlock *block,
+ unsigned int slot_idx,
+ const char *password,
+ uint8_t *masterkey,
+ uint64_t iter_time,
+ QCryptoBlockWriteFunc writefunc,
+ void *opaque,
+ Error **errp)
+{
+QCryptoBlockLUKS *luks = block->opaque;
+QCryptoBlockLUKSKeySlot *slot = &luks->header.key_slots[slot_idx];
+g_autofree uint8_t *splitkey = NULL;
+size_t splitkeylen;
+g_autofree uint8_t *slotkey = NULL;
+g_autoptr(QCryptoCipher) cipher = NULL;
+g_autoptr(QCryptoIVGen) ivgen = NULL;
+Error *local_err = NULL;
+uint64_t iters;
+int ret = -1;
+
+if (qcrypto_random_bytes(slot->salt,
+ QCRYPTO_BLOCK_LUKS_SALT_LEN,
+ errp) < 0) {
+goto cleanup;
+}
+
+splitkeylen = luks->header.master_key_len * slot->stripes;
+
+/*
+ * Determine how many iterations are required to
+ * hash the user password while consuming 1 second of compute
+ * time
+ */
+iters = qcrypto_pbkdf2_count_iters(luks->hash_alg,
+   (uint8_t *)password, strlen(password),
+   slot->salt,
+   QCRYPTO_BLOCK_LUKS_SALT_LEN,
+   luks->header.master_key_len,
+   &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto cleanup;
+}
+
+if (iters > (ULLONG_MAX / iter_time)) {
+error_setg_errno(errp, ERANGE,
+ "PBKDF iterations %llu too large to scale",
+ (unsigned long long)iters);
+goto cleanup;
+}
+
+/* iter_time was in millis, but count_iters reported for secs */
+iters = iters * iter_time / 1000;
+
+if (iters > UINT32_MAX) {
+error_setg_errno(errp, ERANGE,
+ "PBKDF iterations %llu larger than %u",
+ (unsigned long long)iters, UINT32_MAX);
+goto cleanup;
+}
+
+slot->iterations =
+MAX(iters, QCRYPTO_BLOCK_LUKS_MIN_SLOT_KEY_ITERS);
+
+
+/*
+ * Generate a key that we'll use to encrypt the master
+ * key, from the user's password
+ */
+slotkey = g_new0(uint8_t, luks->header.master_key_len);
+if (qcrypto_pbkdf2(luks->hash_alg,
+   (uint8_t *)password, strlen(password),
+   slot->salt,
+   QCRYPTO_BLOCK_LUKS_SALT_LEN,
+   slot->iterations,
+   slotkey, luks->header.master_key_len,
+   errp) < 0) {
+goto cleanup;
+}
+
+
+/*
+ * Setup the encryption objects needed to encrypt the
+ * master key material
+ */
+cipher = qcrypto_cipher_new(luks->cipher_alg,
+luks->cipher_mode,
+slotkey, luks->header.master_key_len,
+errp);
+if (!cipher) {
+goto cleanup;
+}
+
+ivgen = qcrypto_ivgen_new(luks->ivgen_alg,
+  luks->ivgen_cipher_alg,
+  luks->ivgen_hash_alg,
+  slotkey, luks->header.master_key_len,
+  errp);
+if (!ivgen) {
+goto cleanup;
+}
+
+/*
+ * Before storing the master key, we need to vastly
+ * increase its size, as protection against forensic
+ * disk data recovery
+ */
+splitkey = g_new0(uint8_t, splitkeylen);
+
+if (qcrypto_afsplit_encode(luks->hash_alg,
+   luks->header.master_key_len,
+   slot->stripes,
+   masterkey,
+   splitkey,
+   errp) < 0) {
+goto cleanup;
+}
+
+/*
+ * Now we
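
In qcrypto_block_luks_store_key above, qcrypto_pbkdf2_count_iters() reports
iterations per second, while iter_time is in milliseconds, so the count is
scaled by iter_time / 1000 with an overflow check before the multiplication
and a range check against the 32-bit on-disk field afterwards, then raised to
a minimum. A standalone sketch of just that arithmetic follows.
scale_pbkdf_iters() is a hypothetical name, the minimum is passed in rather
than taken from QCRYPTO_BLOCK_LUKS_MIN_SLOT_KEY_ITERS, and the zero check on
iter_time_ms is an addition of this sketch (the hunk divides by iter_time
without one).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a PBKDF2 benchmark (iterations per second) to a target time in
 * milliseconds. Returns 0 and stores the result in *out, or -1 if the
 * value cannot be computed or does not fit in 32 bits.
 */
static int scale_pbkdf_iters(uint64_t iters_per_sec, uint64_t iter_time_ms,
                             uint32_t min_iters, uint32_t *out)
{
    uint64_t iters;

    /* Added in this sketch: avoid dividing by zero below. */
    if (iter_time_ms == 0) {
        return -1;
    }
    /* iters_per_sec * iter_time_ms would overflow 64 bits. */
    if (iters_per_sec > UINT64_MAX / iter_time_ms) {
        return -1;
    }
    /* The benchmark is per second, the target is in milliseconds. */
    iters = iters_per_sec * iter_time_ms / 1000;

    /* The key slot stores the iteration count as a 32-bit value. */
    if (iters > UINT32_MAX) {
        return -1;
    }
    *out = iters < min_iters ? min_iters : (uint32_t)iters;
    return 0;
}
```

With a benchmark of 1000000 iterations per second and a 2000 ms target, the
slot would get 2000000 iterations.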

[PATCH v2 08/13] qcrypto-luks: extract store and load header

2019-09-25 Thread Maxim Levitsky

Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 crypto/block-luks.c | 155 ++--
 1 file changed, 93 insertions(+), 62 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index b8f9b9c20a..47371edf13 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -409,6 +409,97 @@ qcrypto_block_luks_essiv_cipher(QCryptoCipherAlgorithm cipher,
 }
 }
 
+/*
+ * Stores the main LUKS header, taking care of endianess
+ */
+static int
+qcrypto_block_luks_store_header(QCryptoBlock *block,
+QCryptoBlockWriteFunc writefunc,
+void *opaque,
+Error **errp)
+{
+const QCryptoBlockLUKS *luks = block->opaque;
+Error *local_err = NULL;
+size_t i;
+g_autofree QCryptoBlockLUKSHeader *hdr_copy = NULL;
+
+/* Create a copy of the header */
+hdr_copy = g_new0(QCryptoBlockLUKSHeader, 1);
+memcpy(hdr_copy, &luks->header, sizeof(QCryptoBlockLUKSHeader));
+
+/*
+ * Everything on disk uses Big Endian (tm), so flip header fields
+ * before writing them
+ */
+cpu_to_be16s(&hdr_copy->version);
+cpu_to_be32s(&hdr_copy->payload_offset_sector);
+cpu_to_be32s(&hdr_copy->master_key_len);
+cpu_to_be32s(&hdr_copy->master_key_iterations);
+
+for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
+cpu_to_be32s(&hdr_copy->key_slots[i].active);
+cpu_to_be32s(&hdr_copy->key_slots[i].iterations);
+cpu_to_be32s(&hdr_copy->key_slots[i].key_offset_sector);
+cpu_to_be32s(&hdr_copy->key_slots[i].stripes);
+}
+
+/* Write out the partition header and key slot headers */
+writefunc(block, 0, (const uint8_t *)hdr_copy, sizeof(*hdr_copy),
+  opaque, &local_err);
+
+if (local_err) {
+error_propagate(errp, local_err);
+return -1;
+}
+return 0;
+}
+
+/*
+ * Loads the main LUKS header,and byteswaps it to native endianess
+ * And run basic sanity checks on it
+ */
+static int
+qcrypto_block_luks_load_header(QCryptoBlock *block,
+QCryptoBlockReadFunc readfunc,
+void *opaque,
+Error **errp)
+{
+ssize_t rv;
+size_t i;
+QCryptoBlockLUKS *luks = block->opaque;
+
+/*
+ * Read the entire LUKS header, minus the key material from
+ * the underlying device
+ */
+rv = readfunc(block, 0,
+  (uint8_t *)&luks->header,
+  sizeof(luks->header),
+  opaque,
+  errp);
+if (rv < 0) {
+return rv;
+}
+
+/*
+ * The header is always stored in big-endian format, so
+ * convert everything to native
+ */
+be16_to_cpus(&luks->header.version);
+be32_to_cpus(&luks->header.payload_offset_sector);
+be32_to_cpus(&luks->header.master_key_len);
+be32_to_cpus(&luks->header.master_key_iterations);
+
+for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
+be32_to_cpus(&luks->header.key_slots[i].active);
+be32_to_cpus(&luks->header.key_slots[i].iterations);
+be32_to_cpus(&luks->header.key_slots[i].key_offset_sector);
+be32_to_cpus(&luks->header.key_slots[i].stripes);
+}
+
+return 0;
+}
+
 /*
  * Given a key slot, and user password, this will attempt to unlock
  * the master encryption key from the key slot.
@@ -622,7 +713,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 {
 QCryptoBlockLUKS *luks = NULL;
 Error *local_err = NULL;
-size_t i;
 g_autofree uint8_t *masterkey = NULL;
 char *ivgen_name, *ivhash_name;
 g_autofree char *password = NULL;
@@ -644,30 +734,10 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 luks = g_new0(QCryptoBlockLUKS, 1);
 block->opaque = luks;
 
-/* Read the entire LUKS header, minus the key material from
- * the underlying device */
-if (readfunc(block, 0,
- (uint8_t *)&luks->header,
- sizeof(luks->header),
- opaque,
- errp) < 0) {
+if (qcrypto_block_luks_load_header(block, readfunc, opaque, errp) < 0) {
 goto fail;
 }
 
-/* The header is always stored in big-endian format, so
- * convert everything to native */
-be16_to_cpus(&luks->header.version);
-be32_to_cpus(&luks->header.payload_offset_sector);
-be32_to_cpus(&luks->header.master_key_len);
-be32_to_cpus(&luks->header.master_key_iterations);
-
-for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
-be32_to_cpus(&luks->header.key_slots[i].active);
-be32_to_cpus(&luks->header.key_slots[i].iterations);
-be32_to_cpus(&luks->header.key_slots[i].key_offset_sector);
-be32_to_cpus(&luks->header.key_slots[i].stripes);
-}
-
 if (memcmp(luks->header.magic, qcrypto_block_luks_magic,
QCRYPTO_BLOCK_LUKS_MAGIC_LEN) != 0) {
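
qcrypto_block_luks_store_header() byte-swaps a heap copy of the header with
cpu_to_be16s()/cpu_to_be32s(), so the in-memory header stays in host order,
and qcrypto_block_luks_load_header() swaps the fields back with
be32_to_cpus() after reading. The sketch below shows the same round trip
using explicit byte packing instead of in-place swaps; this is a swapped-in
technique chosen so the example needs no QEMU headers, and struct mini_hdr,
put_be32() and get_be32() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Write v into dst as 4 big-endian bytes, regardless of host byte order. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/* Read 4 big-endian bytes from src into a host-order value. */
static uint32_t get_be32(const uint8_t *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
           ((uint32_t)src[2] << 8) | (uint32_t)src[3];
}

/* Hypothetical two-field header kept in host byte order in memory. */
struct mini_hdr {
    uint32_t payload_offset_sector;
    uint32_t master_key_len;
};

/* Serialize to the on-disk layout; the in-memory header is not modified. */
static void mini_hdr_store(const struct mini_hdr *h, uint8_t out[8])
{
    put_be32(out, h->payload_offset_sector);
    put_be32(out + 4, h->master_key_len);
}

/* Parse the on-disk layout back into host byte order. */
static void mini_hdr_load(struct mini_hdr *h, const uint8_t in[8])
{
    h->payload_offset_sector = get_be32(in);
    h->master_key_len = get_be32(in + 4);
}
```

A payload_offset_sector of 4096 (0x1000) is written as the bytes
00 00 10 00 on any host.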

[PATCH v2 07/13] qcrypto-luks: purge unused error codes from open callback

2019-09-25 Thread Maxim Levitsky

These values are not used by the generic crypto code anyway.

Signed-off-by: Maxim Levitsky 
---
 crypto/block-luks.c | 45 +
 1 file changed, 13 insertions(+), 32 deletions(-)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index f3bfc921b2..b8f9b9c20a 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -622,9 +622,7 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 {
 QCryptoBlockLUKS *luks = NULL;
 Error *local_err = NULL;
-int ret = 0;
 size_t i;
-ssize_t rv;
 g_autofree uint8_t *masterkey = NULL;
 char *ivgen_name, *ivhash_name;
 g_autofree char *password = NULL;
@@ -648,13 +646,11 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 
 /* Read the entire LUKS header, minus the key material from
  * the underlying device */
-rv = readfunc(block, 0,
-  (uint8_t *)&luks->header,
-  sizeof(luks->header),
-  opaque,
-  errp);
-if (rv < 0) {
-ret = rv;
+if (readfunc(block, 0,
+ (uint8_t *)&luks->header,
+ sizeof(luks->header),
+ opaque,
+ errp) < 0) {
 goto fail;
 }
 
@@ -675,13 +671,11 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 if (memcmp(luks->header.magic, qcrypto_block_luks_magic,
QCRYPTO_BLOCK_LUKS_MAGIC_LEN) != 0) {
 error_setg(errp, "Volume is not in LUKS format");
-ret = -EINVAL;
 goto fail;
 }
 if (luks->header.version != QCRYPTO_BLOCK_LUKS_VERSION) {
 error_setg(errp, "LUKS version %" PRIu32 " is not supported",
luks->header.version);
-ret = -ENOTSUP;
 goto fail;
 }
 
@@ -697,7 +691,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
  */
 ivgen_name = strchr(cipher_mode, '-');
 if (!ivgen_name) {
-ret = -EINVAL;
 error_setg(errp, "Unexpected cipher mode string format %s",
cipher_mode);
 goto fail;
@@ -715,7 +708,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 luks->ivgen_hash_alg = qcrypto_block_luks_hash_name_lookup(ivhash_name,
&local_err);
 if (local_err) {
-ret = -ENOTSUP;
 error_propagate(errp, local_err);
 goto fail;
 }
@@ -724,7 +716,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 luks->cipher_mode = qcrypto_block_luks_cipher_mode_lookup(cipher_mode,
   &local_err);
 if (local_err) {
-ret = -ENOTSUP;
 error_propagate(errp, local_err);
 goto fail;
 }
@@ -735,7 +726,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
   luks->header.master_key_len,
   &local_err);
 if (local_err) {
-ret = -ENOTSUP;
 error_propagate(errp, local_err);
 goto fail;
 }
@@ -744,7 +734,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 qcrypto_block_luks_hash_name_lookup(luks->header.hash_spec,
 &local_err);
 if (local_err) {
-ret = -ENOTSUP;
 error_propagate(errp, local_err);
 goto fail;
 }
@@ -752,14 +741,12 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 luks->ivgen_alg = qcrypto_block_luks_ivgen_name_lookup(ivgen_name,
&local_err);
 if (local_err) {
-ret = -ENOTSUP;
 error_propagate(errp, local_err);
 goto fail;
 }
 
 if (luks->ivgen_alg == QCRYPTO_IVGEN_ALG_ESSIV) {
 if (!ivhash_name) {
-ret = -EINVAL;
 error_setg(errp, "Missing IV generator hash specification");
 goto fail;
 }
@@ -768,7 +755,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 luks->ivgen_hash_alg,
 &local_err);
 if (local_err) {
-ret = -ENOTSUP;
 error_propagate(errp, local_err);
 goto fail;
 }
@@ -795,7 +781,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 masterkey,
 readfunc, opaque,
 errp) < 0) {
-ret = -EACCES;
 goto fail;
 }
 
@@ -813,19 +798,16 @@ qcrypto_block_luks_open(QCryptoBlock *block,
  luks->header.master_key_len,
  errp);
 if (!block->ivgen) {
-ret = -ENOTSUP;
 goto fail;
 }
 
-ret = qcrypto_block_init_cipher(block,
-luks->cipher_alg,
-luks->c

[PATCH v2 13/13] LUKS: better error message when creating too large files

2019-09-25 Thread Maxim Levitsky

Currently, if you attempt to create a file that is too large with LUKS,
you get the following error message:

Formatting 'test.luks', fmt=luks size=17592186044416 key-secret=sec0
qemu-img: test.luks: Could not resize file: File too large

By contrast, for the raw format the error message is:
qemu-img: test.img: The image size is too large for file format 'raw'

The reason for this is that qemu-img checks the errno of the failure
and presents the latter error message when it is -EFBIG.

However, the generic crypto code 'swallows' the errno and replaces it
with -EIO.

To improve this, this patch makes the LUKS driver detect -EFBIG
and, in that case, present a better error message.

The new error message is:

qemu-img: error creating test.luks: The requested file size is too large

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534898
Signed-off-by: Maxim Levitsky 
Reviewed-by: Daniel P. Berrangé 
---
 block/crypto.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 6e822c6e50..19c2ac602c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -102,10 +102,12 @@ static ssize_t block_crypto_create_init_func(QCryptoBlock *block,
   Error **errp)
 {
 struct BlockCryptoCreateData *data = opaque;
+Error *local_error = NULL;
+int ret;
 
 if (data->size > INT64_MAX || headerlen > INT64_MAX - data->size) {
-error_setg(errp, "The requested file size is too large");
-return -EFBIG;
+ret = -EFBIG;
+goto error;
 }
 
 /*
@@ -115,6 +117,21 @@ static ssize_t block_crypto_create_init_func(QCryptoBlock *block,
  */
 return blk_truncate(data->blk, data->size + headerlen, data->prealloc,
 errp);
+
+if (ret >= 0) {
+return ret;
+}
+
+error:
+if (ret == -EFBIG) {
+/* Replace the error message with a better one */
+error_free(local_error);
+error_setg(errp, "The requested file size is too large");
+} else {
+error_propagate(errp, local_error);
+}
+
+return ret;
 }
 
 
-- 
2.17.2
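
In the hunk as posted, the context line "return blk_truncate(..., errp);" is
unchanged, so the new "if (ret >= 0)" check after it is unreachable, and an
-EFBIG coming from the truncate itself (the "Could not resize file: File too
large" case quoted above) still returns with the original message. Only the
up-front size check reaches the error: label. The intended flow presumably
stores the truncate result in ret and collects its error in local_error.
Below is a standalone sketch of that flow; truncate_stub() and
create_init_sketch() are hypothetical names, and plain strings stand in for
QEMU's Error objects.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the truncate call: fails with -EFBIG past limit. */
static int truncate_stub(uint64_t size, uint64_t limit, char *err, size_t errlen)
{
    if (size > limit) {
        snprintf(err, errlen, "Could not resize file: File too large");
        return -EFBIG;
    }
    return 0;
}

/*
 * Both the size pre-check and the truncate call funnel into one error
 * path, so -EFBIG from either of them gets the clearer message.
 */
static int create_init_sketch(uint64_t size, uint64_t headerlen, uint64_t limit,
                              char *errmsg, size_t errlen)
{
    const uint64_t max = (uint64_t)INT64_MAX;
    char local_err[128] = "";
    int ret;

    assert(errmsg != NULL && errlen > 0);

    if (size > max || headerlen > max - size) {
        ret = -EFBIG;
        goto error;
    }

    /* Keep the result in ret instead of returning it directly. */
    ret = truncate_stub(size + headerlen, limit, local_err, sizeof(local_err));
    if (ret >= 0) {
        return ret;
    }

error:
    if (ret == -EFBIG) {
        /* Drop the generic message and report a clearer one. */
        snprintf(errmsg, errlen, "The requested file size is too large");
    } else {
        snprintf(errmsg, errlen, "%s", local_err);
    }
    return ret;
}
```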
