date:20170516

Re: [Qemu-devel] [PATCH 0/4] exec: address space translation cleanups

2017-05-16 Thread Peter Xu

On Wed, May 17, 2017 at 12:23:42PM +0800, Peter Xu wrote:
> On Tue, May 16, 2017 at 06:51:03PM +0200, Maxime Coquelin wrote:
> > Hi Peter,
> > 
> > On 05/16/2017 03:24 PM, Maxime Coquelin wrote:
> > >
> > >
> > >On 05/15/2017 10:50 AM, Peter Xu wrote:
> > >>The problem is that, address_space_get_iotlb_entry() shares a lot with
> > >>address_space_translate(). This patch tries to abstract the
> > >>shared elements.
> > >>
> > >>Originally, this work is derived from discussion from VT-d passthrough
> > >>series discussions [1]. But for sure we can just see this series as a
> > >>standalone cleanup. So I posted it separately here.
> > >>
> > >>Smoke tests are done with general VM boots, IOs, especially with vhost
> > >>dmar configurations.
> > >>
> > >>I believe with current series I can throw away the old patch [1],
> > >>which may be good. But before that, please kindly review. Thanks.
> > >
> > >I faced the problem the old patch fixes when declaring and attaching an
> > >IOMMU device, but booting the kernel with intel_iommu=off.
> > >
> > >I tested again with patches 1 & 4 of your series, and I confirm it fixes
> > >the issue:
> > >Tested-by: Maxime Coquelin 
> > 
> > I did some more testing with my "vhost-user IOMMU" setup, and the series
> > actually breaks with IOMMU device attached, and intel_iommu=on.
> > 
> > The main difference with the previous passing test is the guest RAM
> > size. In the working setup, it is 2G of 2M hugepages, vs. 4G of 2M
> > hugepages in the failing one. Note that I also reproduce with vhost-kernel
> > backend.
> > 
> > The error happens in the first vhost_device_iotlb_miss() call:
> > qemu-system-x86_64: Fail to lookup the translated address b5d7c000
> > 
> > I don't have the root cause yet, I'll keep you updated.
> 
> Maxime,
> 
> Thanks a lot for help testing this series!
> 
> I reproduced this problem, but the cause is not obvious to me yet.
> Let me investigate as well.
> 
> -- 
> Peter Xu

Maxime,

Could you try applying this change on top of the current series to see
whether it solves the problem?

diff --git a/exec.c b/exec.c
index 697d902..68576a2 100644
--- a/exec.c
+++ b/exec.c
@@ -521,6 +521,10 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
         goto iotlb_fail;
     }
 
+    /* Convert memory region offset into address space offset */
+    xlat += section.offset_within_address_space -
+        section.offset_within_region;
+
     if (plen == (hwaddr)-1) {
         /*
          * We use default page size here. Logically it only happens

Thanks in advance,

-- 
Peter Xu
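
The adjustment in the hunk above can be pictured with a small standalone
sketch. The struct below is a hypothetical, trimmed-down stand-in for the
section used in exec.c (only the two fields named in the patch are kept),
and the numbers in the usage note are illustrative, not taken from the
failing setup.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;

/* Hypothetical stand-in: only the two fields the patch refers to. */
struct section_sketch {
    hwaddr offset_within_region;        /* section start inside its memory region */
    hwaddr offset_within_address_space; /* same section start in the address space */
};

/* Rebase an offset that is relative to the memory region so that it
 * becomes an offset in the address space, as the added lines do. */
static hwaddr region_to_as_offset(const struct section_sketch *s, hwaddr xlat)
{
    assert(xlat >= s->offset_within_region);
    return xlat + s->offset_within_address_space - s->offset_within_region;
}
```

For a section whose two offsets are equal the function returns xlat
unchanged, which would be consistent with a smaller guest working without
the fix; that reading is an assumption, not something stated in the thread.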

Re: [Qemu-devel] [PULL 19/48] spapr: allocate the ICPState object from under sPAPRCPUCore

2017-05-16 Thread Cédric Le Goater

On 05/16/2017 06:10 PM, Greg Kurz wrote:
> On Tue, 16 May 2017 17:18:27 +0200
> Cédric Le Goater  wrote:
> 
>> On 05/16/2017 02:55 PM, Laurent Vivier wrote:
>>> On 16/05/2017 14:50, Cédric Le Goater wrote:  
>>>> On 05/16/2017 02:03 PM, Laurent Vivier wrote:
>>>>> On 26/04/2017 09:00, David Gibson wrote:
>>>>>> From: Cédric Le Goater
>>>>>>
>>>>>> Today, all the ICPs are created before the CPUs, stored in an array
>>>>>> under the sPAPR machine and linked to the CPU when the core threads
>>>>>> are realized. This modeling brings some complexity when a lookup in
>>>>>> the array is required and it can be simplified by allocating the ICPs
>>>>>> when the CPUs are.
>>>>>>
>>>>>> This is the purpose of this proposal which introduces a new 'icp_type'
>>>>>> field under the machine and creates the ICP objects of the right type
>>>>>> (KVM or not) before the PowerPCCPU object are.
>>>>>>
>>>>>> This change allows more cleanups : the removal of the icps array under
>>>>>> the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
>>>>>> helper.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater
>>>>>> Reviewed-by: David Gibson
>>>>>> Signed-off-by: David Gibson
>>>>>> ---
>>>>>>  hw/intc/xics.c          | 11 ---
>>>>>>  hw/ppc/spapr.c          | 47 ++-
>>>>>>  hw/ppc/spapr_cpu_core.c | 18 ++
>>>>>>  include/hw/ppc/spapr.h  |  2 +-
>>>>>>  include/hw/ppc/xics.h   |  2 --
>>>>>>  5 files changed, 29 insertions(+), 51 deletions(-)
>>>>>>
>>>>>
>>>>> This commit breaks CPU re-hotplugging with KVM
>>>>>
>>>>> the sequence "device_add, device_del, device_add" brings to the
>>>>> following error message:
>>>>>
>>>>> Unable to connect CPUx to kernel XICS: Device or resource busy
>>>>>
>>>>> It comes from icp_kvm_cpu_setup():
>>>>>
>>>>> ...
>>>>>     ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0, kernel_xics_fd,
>>>>>                               kvm_arch_vcpu_id(cs));
>>>>>     if (ret < 0) {
>>>>>         error_report("Unable to connect CPU%ld to kernel XICS: %s",
>>>>>                      kvm_arch_vcpu_id(cs), strerror(errno));
>>>>>         exit(1);
>>>>>     }
>>>>> ..
>>>>>
>>>>> It should be protected by cap_irq_xics_enabled:
>>>>>
>>>>> ...
>>>>>     /*
>>>>>      * If we are reusing a parked vCPU fd corresponding to the CPU
>>>>>      * which was hot-removed earlier we don't have to renable
>>>>>      * KVM_CAP_IRQ_XICS capability again.
>>>>>      */
>>>>>     if (icp->cap_irq_xics_enabled) {
>>>>>         return;
>>>>>     }
>>>>>
>>>>> ...
>>>>>     ret = kvm_vcpu_enable_cap(...);
>>>>> ...
>>>>>     icp->cap_irq_xics_enabled = true;
>>>>> ...
>>>>>
>>>>> But since this commit, "icp" is a new object on each call:
>>>>>
>>>>> spapr_cpu_core_realize_child()
>>>>>     ...
>>>>>     obj = object_new(spapr->icp_type);
>>>>>     ...
>>>>>     xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj));
>>>>>         ...
>>>>>         icpc->cpu_setup(icp, cpu); -> icp_kvm_cpu_setup()
>>>>>         ...
>>>>>     ...
>>>>>
>>>>> and "cap_irq_xics_enabled" is reinitialized.
>>>>>
>>>>> Any idea how to fix that?
>>>>
>>>> it seems that a cleanup is not done in the kernel. We are missing
>>>> a way to call kvmppc_xics_free_icp() from QEMU. Today the only
>>>> way is to destroy the vcpu.
>>>
>>> The commit introducing this hack, for reference:
>>>
>>> commit a45863bda90daa8ec39e5a312b9734fd4665b016
>>> Author: Bharata B Rao 
>>> Date:   Thu Jul 2 16:23:20 2015 +1000
>>>
>>> xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
>>> 
>>> When supporting CPU hot removal by parking the vCPU fd and reusing
>>> it during hotplug again, there can be cases where we try to reenable
>>> KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled.
>>> Introduce a boolean member in ICPState to track this and don't
>>> reenable the CAP if it was already enabled earlier.
>>> 
>>> Re-enabling this CAP should ideally work, but currently it results in
>>> kernel trying to create and associate ICP with this vCPU and that
>>> fails since there is already an ICP associated with it. Hence this
>>> patch is needed to work around this problem in the kernel.
>>> 
>>> This change allows CPU hot removal to work for sPAPR.
>>> 
>>> Signed-off-by: Bharata B Rao 
>>> Reviewed-by: David Gibson 
>>> Signed-off-by: David Gibson 
>>> Signed-off-by: Alexander Graf   
>>
>> OK. 
>>
>> Greg is looking at re-adding the ICPState array because of a 
>> migration issue with older machines. We might need to do so 
>> unconditionally ...
>>
> 
> That would be a pity to carry on with the pre-allocated ICPStates for
> new machine types just because of that... What about

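The failure mode described in this thread (a guard flag that lives in the
ICP object, while the kernel-side state lives with the parked vCPU) can be
reduced to a toy model. Everything below is hypothetical and only mirrors
the names used in the quoted snippets; it does not call KVM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: kernel-side state per vCPU, QEMU-side state per ICP. */
struct vcpu_sketch { bool kernel_icp_attached; };
struct icp_sketch  { bool cap_irq_xics_enabled; };

/* Models the kernel refusing a second attach ("Device or resource busy"). */
static int enable_cap_sketch(struct vcpu_sketch *v)
{
    if (v->kernel_icp_attached) {
        return -1;
    }
    v->kernel_icp_attached = true;
    return 0;
}

/* Models the guarded setup: the guard only helps if the ICP object persists. */
static int icp_cpu_setup_sketch(struct icp_sketch *icp, struct vcpu_sketch *v)
{
    if (icp->cap_irq_xics_enabled) {
        return 0;
    }
    if (enable_cap_sketch(v) < 0) {
        return -1;
    }
    icp->cap_irq_xics_enabled = true;
    return 0;
}
```

Calling the setup twice on the same icp_sketch succeeds both times, while a
freshly zeroed icp_sketch paired with the same vcpu_sketch fails on the
second attach, which is the "device_add, device_del, device_add" sequence
in miniature.
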
Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration

2017-05-16 Thread Wanpeng Li

Hi Zhoujian,
2017-05-17 10:20 GMT+08:00 Zhoujian (jay) :
> Hi Wanpeng,
>
>> > On 11/05/2017 14:07, Zhoujian (jay) wrote:
>> >> -* Scan sptes if dirty logging has been stopped, dropping those
>> >> -* which can be collapsed into a single large-page spte.  Later
>> >> -* page faults will create the large-page sptes.
>> >> +* Reset each vcpu's mmu, then page faults will create the large-page
>> >> +* sptes later.
>> >>  */
>> >> if ((change != KVM_MR_DELETE) &&
>> >> (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> >> -   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> >> -   kvm_mmu_zap_collapsible_sptes(kvm, new);
>>
>> This is an unlikely branch(unless guest live migration fails and continue
>> to run on the source machine) instead of hot path, do you have any
>> performance number for your real workloads?
>>
>
> Sorry to bother you again.
>
> Recently, I have tested the performance before migration and after migration failure
> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard performance
> evaluation tool.
>
> These are the results:
> **
> Before migration the score is 153, and the TLB miss statistics of the qemu process is:
> linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses, \
> dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10
>
> Performance counter stats for process id '26463':
>
>         698,938  dTLB-load-misses   #  0.13% of all dTLB cache hits   (50.46%)
>     543,303,875  dTLB-loads                                           (50.43%)
>         199,597  dTLB-store-misses                                    (16.51%)
>      60,128,561  dTLB-stores                                          (16.67%)
>          69,986  iTLB-load-misses   #  6.17% of all iTLB cache hits   (16.67%)
>       1,134,097  iTLB-loads                                           (33.33%)
>
>   10.000684064 seconds time elapsed
>
> After migration failure the score is 149, and the TLB miss statistics of the qemu process is:
> linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses, \
> dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10
>
> Performance counter stats for process id '26463':
>
>         765,400  dTLB-load-misses   #  0.14% of all dTLB cache hits   (50.50%)
>     540,972,144  dTLB-loads                                           (50.47%)
>         207,670  dTLB-store-misses                                    (16.50%)
>      58,363,787  dTLB-stores                                          (16.67%)
>         109,772  iTLB-load-misses   #  9.52% of all iTLB cache hits   (16.67%)
>       1,152,784  iTLB-loads                                           (33.32%)
>
>   10.000703078 seconds time elapsed
> **

Could you comment out the original "lazy collapse small sptes into
large sptes" codes in the function kvm_arch_commit_memory_region() and
post the results here?

Regards,
Wanpeng Li
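
For readers comparing the two perf runs quoted above, the percentages
printed next to the *-misses events can be recomputed from the raw counts.
The helper below is a hypothetical convenience, not part of perf or KVM,
and it ignores the counter multiplexing that the values in parentheses
indicate.

```c
#include <assert.h>

/* Miss ratio in percent: misses divided by accesses, times 100. */
static double miss_percent(unsigned long long misses,
                           unsigned long long accesses)
{
    assert(accesses > 0);
    return 100.0 * (double)misses / (double)accesses;
}
```

With the iTLB counts from the two runs (69,986 of 1,134,097 and 109,772 of
1,152,784) this gives roughly 6.17 and 9.52, matching the perf output.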

>
> These are the steps:
> ==
> (1) The kmod version is 4.4.11 (slightly modified) and the qemu version is 2.6.0
> (slightly modified). The kmod has the following patch applied, according to
> Paolo's advice:
>
> diff --git a/source/x86/x86.c b/source/x86/x86.c
> index 054a7d3..75a4bb3 100644
> --- a/source/x86/x86.c
> +++ b/source/x86/x86.c
> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>          */
>         if ((change != KVM_MR_DELETE) &&
>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> +       }
>
>         /*
>          * Set up write protection and/or dirty logging for the new slot.
>
> (2) I started a 10G VM (suse11sp3) with its memory pre-populated, so that the
> "RES" column in top shows 10G and the EPT tables are set up in advance.
> (3) I then ran the 429.mcf test case of spec cpu2006 before migration and after
> migration failure. 429.mcf is a memory-intensive workload, and the migration
> failure is constructed deliberately with the following patch to qemu:
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 5d725d0..88dfc59 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -625,6 +625,9 @@ static void process_incoming_migration_co(void *opaque)
>

Re: [Qemu-devel] [PATCH 06/13] vvfat: fix field names in FAT12/FAT16 boot sector

2017-05-16 Thread Hervé Poussineau


On 16/05/2017 at 16:39, Kevin Wolf wrote:

On 15.05.2017 at 22:31, Hervé Poussineau wrote:

Specification: "FAT: General overview of on-disk format" v1.03, page 11
Signed-off-by: Hervé Poussineau 
---
 block/vvfat.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index f60d2a3889..348cffe1c4 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -218,10 +218,12 @@ typedef struct bootsector_t {
     union {
         struct {
             uint8_t drive_number;
-            uint8_t current_head;
+            uint8_t reserved1;
             uint8_t signature;
             uint32_t id;
             uint8_t volume_label[11];
+            uint8_t fat_type[8];
+            uint8_t ignored[0x1c0];
         } QEMU_PACKED fat16;
         struct {
             uint32_t sectors_per_fat;
@@ -233,8 +235,6 @@ typedef struct bootsector_t {
             uint16_t ignored;
         } QEMU_PACKED fat32;
     } u;
-    uint8_t fat_type[8];
-    uint8_t ignored[0x1c0];
     uint8_t magic[2];
 } QEMU_PACKED bootsector_t;


At least, this makes it clear that .fat16 and .fat32 aren't the same
length. But maybe it would be cleaner to have a third union member
uint8_t bytes[0x1da] (if I calculated correctly) instead of relying on
the .fat16 branch to extend the space for .fat32?


I will also update the .fat32 bootsector to match the specification in the
same patch.
So, both members (.fat16 and .fat32) will have the same size.

BTW, FAT32 doesn't work at all, so that's not really important :)

Hervé
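
The size Kevin quotes for a third union member (0x1da bytes) can be checked
mechanically. The sketch below copies only the .fat16 branch as it appears
in the patch; PACKED_SKETCH is an assumed stand-in for QEMU_PACKED using
the GCC/Clang attribute, and the type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for QEMU_PACKED (GCC/Clang syntax). */
#define PACKED_SKETCH __attribute__((packed))

/* The FAT12/FAT16 branch with the fields in the order shown in the patch. */
struct fat16_tail_sketch {
    uint8_t  drive_number;
    uint8_t  reserved1;
    uint8_t  signature;
    uint32_t id;
    uint8_t  volume_label[11];
    uint8_t  fat_type[8];
    uint8_t  ignored[0x1c0];
} PACKED_SKETCH;

/* The suggested third member: a byte array that fixes the union size. */
union tail_sketch {
    struct fat16_tail_sketch fat16;
    uint8_t bytes[0x1da];
};

/* 1 + 1 + 1 + 4 + 11 + 8 + 0x1c0 = 474 = 0x1da */
_Static_assert(sizeof(struct fat16_tail_sketch) == 0x1da,
               "fat16 branch should be 0x1da bytes");
_Static_assert(sizeof(union tail_sketch) == 0x1da,
               "union should be sized by the byte array");
```

If both assertions compile, the arithmetic in the review comment holds for
the packed layout; the byte array would then keep the union size stable
regardless of which branch is longer.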

Re: [Qemu-devel] [PATCH 05/13] vvfat: introduce offset_to_bootsector, offset_to_fat and offset_to_root_dir

2017-05-16 Thread Hervé Poussineau


On 16/05/2017 at 16:16, Kevin Wolf wrote:

On 15.05.2017 at 22:31, Hervé Poussineau wrote:

- offset_to_bootsector is the number of sectors up to FAT bootsector
- offset_to_fat is the number of sectors up to first File Allocation Table
- offset_to_root_dir is the number of sectors up to root directory sector


Hm... These names make me think of byte offsets. Not completely opposed
to them, but if anyone can think of something better...?


Replace first_sectors_number - 1 by offset_to_bootsector.
Replace first_sectors_number by offset_to_fat.
Replace faked_sectors by offset_to_root_dir.

Signed-off-by: Hervé Poussineau 
---
 block/vvfat.c | 67 +++
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 4f4a63c03f..f60d2a3889 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -320,22 +320,24 @@ static void print_mapping(const struct mapping_t* mapping);
 typedef struct BDRVVVFATState {
 CoMutex lock;
 BlockDriverState* bs; /* pointer to parent */
-unsigned int first_sectors_number; /* 1 for a single partition, 0x40 for a disk with partition table */
 unsigned char first_sectors[0x40*0x200];

 int fat_type; /* 16 or 32 */
 array_t fat,directory,mapping;
 char volume_label[11];

+uint32_t offset_to_bootsector; /* 0 for floppy, 0x3f for disk */
+
 unsigned int cluster_size;
 unsigned int sectors_per_cluster;
 unsigned int sectors_per_fat;
 unsigned int sectors_of_root_directory;
 uint32_t last_cluster_of_root_directory;
-unsigned int faked_sectors; /* how many sectors are faked before file data */
 uint32_t sector_count; /* total number of sectors of the partition */
 uint32_t cluster_count; /* total number of clusters of this partition */
 uint32_t max_fat_value;
+uint32_t offset_to_fat;
+uint32_t offset_to_root_dir;

 int current_fd;
 mapping_t* current_mapping;
@@ -394,15 +396,15 @@ static void init_mbr(BDRVVVFATState *s, int cyls, int heads, int secs)
 partition->attributes=0x80; /* bootable */

 /* LBA is used when partition is outside the CHS geometry */
-lba  = sector2CHS(&partition->start_CHS, s->first_sectors_number - 1,
+lba  = sector2CHS(&partition->start_CHS, s->offset_to_bootsector,
   cyls, heads, secs);
 lba |= sector2CHS(&partition->end_CHS,   s->bs->total_sectors - 1,
   cyls, heads, secs);

 /*LBA partitions are identified only by start/length_sector_long not by CHS*/
-partition->start_sector_long  = cpu_to_le32(s->first_sectors_number - 1);
+partition->start_sector_long  = cpu_to_le32(s->offset_to_bootsector);
 partition->length_sector_long = cpu_to_le32(s->bs->total_sectors
-- s->first_sectors_number + 1);
+- s->offset_to_bootsector);

 /* FAT12/FAT16/FAT32 */
 /* DOS uses different types when partition is LBA,
@@ -823,12 +825,12 @@ static int read_directory(BDRVVVFATState* s, int mapping_index)

 static inline uint32_t sector2cluster(BDRVVVFATState* s,off_t sector_num)
 {
-return (sector_num-s->faked_sectors)/s->sectors_per_cluster;
+return (sector_num - s->offset_to_root_dir) / s->sectors_per_cluster;
 }

 static inline off_t cluster2sector(BDRVVVFATState* s, uint32_t cluster_num)
 {
-return s->faked_sectors + s->sectors_per_cluster * cluster_num;
+return s->offset_to_root_dir + s->sectors_per_cluster * cluster_num;
 }

 static int init_directories(BDRVVVFATState* s,
@@ -855,6 +857,9 @@ static int init_directories(BDRVVVFATState* s,
 i = 1+s->sectors_per_cluster*0x200*8/s->fat_type;
 s->sectors_per_fat=(s->sector_count+i)/i; /* round up */

+s->offset_to_fat = s->offset_to_bootsector + 1;
+s->offset_to_root_dir = s->offset_to_fat + s->sectors_per_fat * 2;
+
 array_init(&(s->mapping),sizeof(mapping_t));
 array_init(&(s->directory),sizeof(direntry_t));

@@ -868,7 +873,6 @@ static int init_directories(BDRVVVFATState* s,
 /* Now build FAT, and write back information into directory */
 init_fat(s);

-s->faked_sectors=s->first_sectors_number+s->sectors_per_fat*2;
 s->cluster_count=sector2cluster(s, s->sector_count);

 mapping = array_get_next(&(s->mapping));
@@ -946,7 +950,8 @@ static int init_directories(BDRVVVFATState* s,

 s->current_mapping = NULL;

-bootsector=(bootsector_t*)(s->first_sectors+(s->first_sectors_number-1)*0x200);
+bootsector = (bootsector_t *)(s->first_sectors
+  + s->offset_to_bootsector * 0x200);
 bootsector->jump[0]=0xeb;
 bootsector->jump[1]=0x3e;
 bootsector->jump[2]=0x90;
@@ -957,16 +962,16 @@ static int init_directories(BDRVVVFATState* s,
 bootsector->number_of_fats=0x2; /* number of FATs */
 bootsector->root_entries=cpu_to_le16(s->sectors_of_root_directory*0x10);

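The relationship between the three new fields, and the two conversion
helpers that depend on them, can be tried out in isolation. The struct
below is a hypothetical subset of BDRVVVFATState; all offsets are counted
in sectors, not bytes, which is the point Kevin raises about the names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of BDRVVVFATState; every offset is in sectors. */
struct vvfat_layout_sketch {
    uint32_t offset_to_bootsector; /* 0 for floppy, 0x3f for disk */
    uint32_t offset_to_fat;
    uint32_t offset_to_root_dir;
    uint32_t sectors_per_fat;
    uint32_t sectors_per_cluster;
};

/* Derive the two dependent offsets as init_directories() does in the patch. */
static void layout_init_sketch(struct vvfat_layout_sketch *s)
{
    s->offset_to_fat = s->offset_to_bootsector + 1;
    /* Two copies of the FAT sit between the boot sector and the root dir. */
    s->offset_to_root_dir = s->offset_to_fat + s->sectors_per_fat * 2;
}

static uint32_t sector2cluster_sketch(const struct vvfat_layout_sketch *s,
                                      uint64_t sector_num)
{
    assert(sector_num >= s->offset_to_root_dir);
    return (uint32_t)((sector_num - s->offset_to_root_dir)
                      / s->sectors_per_cluster);
}

static uint64_t cluster2sector_sketch(const struct vvfat_layout_sketch *s,
                                      uint32_t cluster_num)
{
    return s->offset_to_root_dir
           + (uint64_t)s->sectors_per_cluster * cluster_num;
}
```

With offset_to_bootsector = 0x3f the derived offset_to_fat is 0x40, the
value first_sectors_number had for a disk with a partition table, so the
new offset_to_root_dir equals the old faked_sectors.
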
Re: [Qemu-devel] [PATCH 03/13] vvfat: fix typos

2017-05-16 Thread Hervé Poussineau


On 16/05/2017 at 15:21, Kevin Wolf wrote:

On 15.05.2017 at 22:31, Hervé Poussineau wrote:

@@ -806,7 +806,7 @@ static int read_directory(BDRVVVFATState* s, int mapping_index)
 (ROOT_ENTRIES - cur) * sizeof(direntry_t));
 }

- /* reget the mapping, since s->mapping was possibly realloc()ed */
+/* reset the mapping, since s->mapping was possibly realloc()ed */


Are you sure that this was a typo? It seems to make more sense to me as
"re-get" (maybe easier to read with the hyphen).


IMO, both are valid. But I'll change it to "re-get" for v2.




 mapping = array_get(&(s->mapping), mapping_index);
 first_cluster += (s->directory.next - mapping->info.dir.first_dir_index)
 * 0x20 / s->cluster_size;


Kevin
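
The comment under discussion exists because a pointer into a growable
array stops being trustworthy once the array may have been realloc()ed.
A minimal sketch of that pattern follows; array_sketch and its helpers are
hypothetical and only loosely modelled on the array_t usage visible in the
diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. */
struct array_sketch {
    int *items;
    size_t next;     /* number of items in use */
    size_t capacity; /* number of items allocated */
};

/* Look up an element; the returned pointer is only valid until the next
 * append, because an append may move the storage. */
static int *array_get_sketch(struct array_sketch *a, size_t index)
{
    assert(index < a->next);
    return &a->items[index];
}

/* Append a value, growing the storage with realloc() when it is full. */
static int array_append_sketch(struct array_sketch *a, int value)
{
    if (a->next == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        int *p = realloc(a->items, new_cap * sizeof(*p));
        if (!p) {
            return -1;
        }
        a->items = p;
        a->capacity = new_cap;
    }
    a->items[a->next++] = value;
    return 0;
}
```

The safe pattern is to remember the index, not the pointer, and to call
array_get_sketch() again after any sequence of appends, which is what
"re-get the mapping" describes.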

Re: [Qemu-devel] [Qemu devel v5 PATCH 0/5] Add support for Smartfusion2 SoC

2017-05-16 Thread Philippe Mathieu-Daudé


Hi Sundeep,

This patchset is way cleaner!
I had a quick look and I like it, I'll try to make some time soon to 
review details and test it.


Is your work aimed at U-Boot, or more focused on the Linux kernel?

If you compile QEMU with libfdt support you can use the -dtb option to 
pass the blob to the kernel directly, bypassing the bootloader.


If you need a bootloader you may have a look at coreboot, which supports 
dts well; see how Vladimir Serbinenko used Linux's dt to boot a QEMU 
Versatile Express board:

https://mail.coreboot.org/pipermail/coreboot-gerrit/2016-February/040899.html

Regards,

Phil.

On 05/16/2017 12:38 PM, Subbaraya Sundeep wrote:

Hi Qemu-devel,

I am trying to add Smartfusion2 SoC.
SoC is from Microsemi and System on Module(SOM)
board is from Emcraft systems. Smartfusion2 has hardened
Microcontroller(Cortex-M3)based Sub System and FPGA fabric.
At the moment only system timer, sysreg and SPI
controller are modelled.

Testing:
./arm-softmmu/qemu-system-arm -M smartfusion2-som -serial mon:stdio \
-kernel u-boot.bin -display none -drive file=spi.bin,if=mtd,format=raw

Binaries u-boot.bin and spi.bin are at:
https://github.com/Subbaraya-Sundeep/qemu-test-binaries.git

U-boot is from Emcraft with modified
- SPI driver not to use PDMA.
- ugly hack to pass dtb to kernel in r1.
@
https://github.com/Subbaraya-Sundeep/emcraft-uboot-sf2.git

Linux is 4.5 linux with Smartfusion2 SoC dts and clocksource
driver added by myself @
https://github.com/Subbaraya-Sundeep/linux.git

v5:
As per Philippe comments:
Added abort in Sysreg if guest tries to remap memory
other than default mapping.
Use of CONFIG_MSF2 in Makefile for soc.c
Fixed incorrect logic in timer model.
Renamed msf2-timer.c -> mss-timer.c
msf2-spi.c -> mss-spi.c also type names
Renamed function msf2_init->emcraft_sf2_init in msf2-som.c
Added part-name,eNVM-size,eSRAM-size,pclk0 and pclk1
properties to soc.
Pass soc part-name,memory size and clock rate properties from som.
v4:
Fixed build failure by using PRIx macros.
v3:
Added SoC file and board file as per Alistair comments.
v2:
Added SPI controller so that u-boot loads kernel from spi flash.
v1:
Initial patch set with timer and sysreg

Thanks,
Sundeep

Subbaraya Sundeep (5):
  msf2: Add Smartfusion2 System timer
  msf2: Microsemi Smartfusion2 System Register block.
  msf2: Add Smartfusion2 SPI controller
  msf2: Add Smartfusion2 SoC.
  msf2: Add Emcraft's Smartfusion2 SOM kit.

 default-configs/arm-softmmu.mak |   1 +
 hw/arm/Makefile.objs            |   2 +
 hw/arm/msf2-soc.c               | 201 +
 hw/arm/msf2-som.c               |  89 ++
 hw/misc/Makefile.objs           |   1 +
 hw/misc/msf2-sysreg.c           | 161 +
 hw/ssi/Makefile.objs            |   1 +
 hw/ssi/mss-spi.c                | 378
 hw/timer/Makefile.objs          |   1 +
 hw/timer/mss-timer.c            | 249 ++
 include/hw/arm/msf2-soc.h       |  69
 include/hw/misc/msf2-sysreg.h   |  80 +
 include/hw/ssi/mss-spi.h        | 104 +++
 include/hw/timer/mss-timer.h    |  80 +
 14 files changed, 1417 insertions(+)
 create mode 100644 hw/arm/msf2-soc.c
 create mode 100644 hw/arm/msf2-som.c
 create mode 100644 hw/misc/msf2-sysreg.c
 create mode 100644 hw/ssi/mss-spi.c
 create mode 100644 hw/timer/mss-timer.c
 create mode 100644 include/hw/arm/msf2-soc.h
 create mode 100644 include/hw/misc/msf2-sysreg.h
 create mode 100644 include/hw/ssi/mss-spi.h
 create mode 100644 include/hw/timer/mss-timer.h

Re: [Qemu-devel] [PATCH 0/4] exec: address space translation cleanups

2017-05-16 Thread Peter Xu

On Tue, May 16, 2017 at 06:51:03PM +0200, Maxime Coquelin wrote:
> Hi Peter,
> 
> On 05/16/2017 03:24 PM, Maxime Coquelin wrote:
> >
> >
> >On 05/15/2017 10:50 AM, Peter Xu wrote:
> >>The problem is that, address_space_get_iotlb_entry() shares a lot with
> >>address_space_translate(). This patch tries to abstract the
> >>shared elements.
> >>
> >>Originally, this work is derived from discussion from VT-d passthrough
> >>series discussions [1]. But for sure we can just see this series as a
> >>standalone cleanup. So I posted it separately here.
> >>
> >>Smoke tests are done with general VM boots, IOs, especially with vhost
> >>dmar configurations.
> >>
> >>I believe with current series I can throw away the old patch [1],
> >>which may be good. But before that, please kindly review. Thanks.
> >
> >I faced the problem the old patch fixes when declaring and attaching an
> >IOMMU device, but booting the kernel with intel_iommu=off.
> >
> >I tested again with patches 1 & 4 of your series, and I confirm it fixes
> >the issue:
> >Tested-by: Maxime Coquelin 
> 
> I did some more testing with my "vhost-user IOMMU" setup, and the series
> actually breaks with IOMMU device attached, and intel_iommu=on.
> 
> The main difference with the previous passing test is the guest RAM
> size. In the working setup, it is 2G of 2M hugepages, vs. 4G of 2M
> hugepages in the failing one. Note that I also reproduce with vhost-kernel
> backend.
> 
> The error happens in the first vhost_device_iotlb_miss() call:
> qemu-system-x86_64: Fail to lookup the translated address b5d7c000
> 
> I don't have the root cause yet, I'll keep you updated.

Maxime,

Thanks a lot for help testing this series!

I reproduced this problem, but the cause is not obvious to me yet.
Let me investigate as well.

-- 
Peter Xu

Re: [Qemu-devel] [PATCH 6/6] spapr: fix migration of ICP objects from/to older QEMU

2017-05-16 Thread David Gibson

On Mon, May 15, 2017 at 02:22:32PM +0200, Cédric Le Goater wrote:
> On 05/15/2017 01:40 PM, Greg Kurz wrote:
> > Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
> > sPAPRCPUCore") moved ICP objects from the machine to CPU cores. This
> > is an improvement since we no longer allocate ICP objects that will
> > never be used. But it has the side-effect of breaking migration of
> > older machine types from older QEMU versions.
> > 
> > This patch introduces a compat flag in the sPAPR machine class so
> > that all pseries machine up to 2.9 go on with the previous behavior
> > of pre-allocating ICP objects.
> 
> I think this is quite an elegant way to handle the migration 
> regression. Thanks for taking care of it.
> 
> Have you tried to simply reparent the ICPs objects to OBJECT(spapr) 
> instead of the OBJECT(cpu)  ? 

I actually kind of hate changing the QOM tree structure based on
machine type compatibility.  Unfortunately, since we're matching up
the migration state based (essentially) on QOM path, I don't see any
easy alternative.  I really wish there was a mechanism for defining
"alias paths" or something to handle this kind of migration
compatibility shim.

> See some minor comments below.
> 
> > While here, we also ensure that object_property_add_child() errors cause
> > QEMU to abort for newer machines.
> > 
> > Signed-off-by: Greg Kurz 
> > ---
> >  hw/ppc/spapr.c  |   36 
> >  hw/ppc/spapr_cpu_core.c |   28 
> >  include/hw/ppc/spapr.h  |2 ++
> >  3 files changed, 58 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index c53989bb10b1..ab3683bcd677 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -126,6 +126,7 @@ error:
> >  static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
> >  {
> >  sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> > +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> >  Error *local_err = NULL;
> >  
> >  if (kvm_enabled()) {
> > @@ -151,6 +152,38 @@ static void xics_system_init(MachineState *machine, int nr_irqs, Error **errp)
> >                &local_err);
> >  }
> >  
> > +if (!spapr->ics) {
> > +goto out;
> > +}
> > +
> > +if (smc->must_pre_allocate_icps) {
> 
> I am not sure I like 'must', I think 'pre_allocate_icps' should be enough ? 
> or simply 'allocate_legacy_icps' ?

I'd actually prefer to make it explicit that this is a migration
compatibility shim and call it something like
'pre_2_10_icp_allocation'.

> > +int smt = kvmppc_smt_threads();
> > +int nr_servers = DIV_ROUND_UP(max_cpus * smt, smp_threads);
> 
> maybe we should reintroduce nr_servers at the machine level ? 
> 
> > +int i;
> > +
> > +spapr->legacy_icps = g_malloc0(nr_servers * sizeof(ICPState));

This isn't technically safe, although you'll probably get away with
it.  spapr->icp_type is parameterized, which means it could be a
sub-class with a larger state structure than base ICPState.
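
The concern above can be sketched without QOM: when the concrete type is
chosen at run time, both the allocation and the per-element stride need
the concrete instance size, not sizeof() of the base struct. The types and
helpers below are hypothetical; in QEMU the size would presumably come
from something like object_type_get_instance_size(), which appears later
in this same patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical base state and a larger derived state embedding it first. */
struct icp_base_sketch { int server; };
struct icp_kvm_sketch  { struct icp_base_sketch parent; long extra[4]; };

/* Allocate zeroed storage for n objects of a size known only at run time. */
static void *alloc_objects_sketch(size_t n, size_t instance_size)
{
    return calloc(n, instance_size);
}

/* Address of the i-th object, stepping by the concrete instance size. */
static void *object_at_sketch(void *base, size_t i, size_t instance_size)
{
    return (char *)base + i * instance_size;
}
```

Indexing the storage as an array of the base struct would step by the
smaller size, so consecutive derived objects would overlap; stepping by
instance_size avoids that.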

> > +for (i = 0; i < nr_servers; i++) {
> > > +void* obj = &spapr->legacy_icps[i];
> 
> 'void *'
> 
> > +
> > +object_initialize(obj, sizeof(ICPState), spapr->icp_type);
> > > +object_property_add_child(OBJECT(spapr), "icp[*]", obj,
> > > +  &error_abort);
> 
> David does not like the "icp[*]" syntax.
> 
> > +object_unref(obj);
> > > +object_property_add_const_link(obj, "xics", OBJECT(spapr),
> > > +   &error_abort);
> > > +object_property_set_bool(obj, true, "realized", &local_err);
> > +if (local_err) {
> > +while (i--) {
> > +object_unparent(obj);
> > +}
> > +g_free(spapr->legacy_icps);
> > +break;
> > +}
> > +}
> > +}
> > +
> > +out:
> >  error_propagate(errp, local_err);
> >  }
> >  
> > @@ -3256,8 +3289,11 @@ static void spapr_machine_2_9_instance_options(MachineState *machine)
> >  
> >  static void spapr_machine_2_9_class_options(MachineClass *mc)
> >  {
> > +sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> > +
> >  spapr_machine_2_10_class_options(mc);
> >  SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
> > +smc->must_pre_allocate_icps = true;
> >  }
> >  
> >  DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index 63d160f7e010..5476647efa06 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -119,6 +119,7 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, Error **errp)
> >  size_t size = object_type_get_instance_size(typename);
> >  CPUCore *cc = CPU_CORE(dev);
> >  int i;
> > +sPAPRMachineState *spapr =

Re: [Qemu-devel] [PATCH 6/6] spapr: fix migration of ICP objects from/to older QEMU

2017-05-16 Thread David Gibson

On Mon, May 15, 2017 at 06:20:06PM +0200, Greg Kurz wrote:
> On Mon, 15 May 2017 18:09:04 +0200
> Cédric Le Goater  wrote:
> 
> > On 05/15/2017 03:16 PM, Greg Kurz wrote:
> > > On Mon, 15 May 2017 14:22:32 +0200
> > > Cédric Le Goater  wrote:
> > >   
> > >> On 05/15/2017 01:40 PM, Greg Kurz wrote:  
> > >>> Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
> > >>> sPAPRCPUCore") moved ICP objects from the machine to CPU cores. This
> > >>> is an improvement since we no longer allocate ICP objects that will
> > >>> never be used. But it has the side-effect of breaking migration of
> > >>> older machine types from older QEMU versions.
> > >>>
> > >>> This patch introduces a compat flag in the sPAPR machine class so
> > >>> that all pseries machine up to 2.9 go on with the previous behavior
> > >>> of pre-allocating ICP objects.
> > >>
> > >> I think this is quite an elegant way to handle the migration 
> > >> regression. Thanks for taking care of it.
> > >>
> > >> Have you tried to simply reparent the ICPs objects to OBJECT(spapr) 
> > >> instead of the OBJECT(cpu)  ? 
> > >>  
> > > 
> > > Do you mean to reparent unconditionally to OBJECT(spapr) for all
> > > machine versions ?   
> > 
> > only in the case of smc->must_pre_allocate_icps
> > 
> > > I'm not sure this would be beneficial, but I might be missing 
> > > something...  
> > 
> > I think that we would not need to allocate the legacy_icps array. 
> > Parenting the icp object to the spapr machine should be enough. 
> > I might be wrong. my expertise on the migration stream is very 
> > basic.
> > 
> 
> I don't think this would work because an older QEMU would still
> send state for objects that don't exist in the destination.

Right.  We could create "dummy" objects that receive the ICP data,
then discard it.  But it's probably more trouble than it's worth.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 6/6] spapr: fix migration of ICP objects from/to older QEMU

2017-05-16 Thread David Gibson

On Mon, May 15, 2017 at 06:11:27PM +0200, Cédric Le Goater wrote:
> >>> +int smt = kvmppc_smt_threads();
> >>> +int nr_servers = DIV_ROUND_UP(max_cpus * smt, smp_threads);  
> >>
> >> maybe we should reintroduce nr_servers at the machine level ? 
> >>
> > 
> > I had reintroduced it but then I realized it was only used in this
> > function.
> 
> nr_servers is also used when the device tree is populated with the 
> interrupt controller nodes. No big deal.

Which is guest visible, so we should really make that stay the same
for older machine types.  I'd like to avoid re-introducing nr_servers
as a property if we can, but maybe we can't.
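
Since the value is guest visible, it helps to see exactly what the quoted
expression computes. The sketch below repeats the formula from the patch
with plain integers; the function name is hypothetical and the inputs in
the usage note are illustrative, not taken from a real configuration.

```c
#include <assert.h>

/* Integer division rounding up, as the macro of the same name does. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of interrupt server slots; parameter names follow the patch. */
static int nr_servers_sketch(int max_cpus, int smt, int smp_threads)
{
    assert(smp_threads > 0);
    return DIV_ROUND_UP(max_cpus * smt, smp_threads);
}
```

For example, max_cpus = 8 with smt = 8 and smp_threads = 1 yields 64,
while smp_threads = 2 halves that to 32; a non-divisible case such as
5 * 4 over 8 rounds up to 3.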

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 6/6] spapr: fix migration of ICP objects from/to older QEMU

2017-05-16 Thread David Gibson

On Mon, May 15, 2017 at 03:16:02PM +0200, Greg Kurz wrote:
> On Mon, 15 May 2017 14:22:32 +0200
> Cédric Le Goater  wrote:
> 
> > On 05/15/2017 01:40 PM, Greg Kurz wrote:
[snip]
> > > +
> > > +object_initialize(obj, sizeof(ICPState), spapr->icp_type);
> > > +object_property_add_child(OBJECT(spapr), "icp[*]", obj,
> > > +  &error_abort);  
> > 
> > David does not like the "icp[*]" syntax.
> > 
> 
> Ah... I wasn't aware of that. But I agree that I should probably create
> the object names based on 'i', rather than relying on the more complex
> logic in object_property_add().

Right, the non-obviousness of what the indices will end up being is
exactly why I dislike the "[*]" syntax.  Especially here, where it's
clearly important that the QOM paths end up exactly like in older qemu
versions, I think it's better to be explicit.
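
To make "explicit" concrete, here is a minimal sketch of building the child name from the loop index instead of relying on the "[*]" wildcard. The helper name icp_child_name() is hypothetical; the actual patch may construct the name differently.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the child property name for ICP number
 * `i` explicitly, so the resulting QOM path does not depend on which
 * index a "[*]" wildcard would happen to pick next.
 * Returns the value reported by snprintf().
 */
static int icp_child_name(char *buf, size_t len, int i)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "icp[%d]", i);
}
```

With this shape, the name for index 3 is always "icp[3]", whatever order the objects are created in.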

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v3] net/rocker: Convert to realize

2017-05-16 Thread Mao Zhongyi

Hi, Markus

On 05/16/2017 11:29 PM, Markus Armbruster wrote:

Mao Zhongyi  writes:

The rocker device still implements the old PCIDeviceClass .init()
instead of the new .realize(). All devices need to be converted to
.realize().

Thanks for chipping in!

.init() reports errors with fprintf() and returns 0 on success or a negative
number on failure. Meanwhile, when -device rocker fails, it first reports
a specific error, then a generic one, like this:

$ x86_64-softmmu/qemu-system-x86_64 -device rocker,name=qemu-rocker
rocker: name too long; please shorten to at most 9 chars
qemu-system-x86_64: -device rocker,name=qemu-rocker: Device initialization failed

Now, convert it to .realize() that passes errors to its callers via its
errp argument. Also avoid the superfluous second error message.

Recommend to show the error message after your patch here:

  qemu-system-x86_64: -device rocker,name=qemu-rocker: rocker: name too long; please shorten to at most 9 chars

Thanks, I think I got it.

Not least because that makes it blatantly obvious that keeping the
"rocker: " is not a good idea :)

Actually, I was always curious about why there are two "rocker" strings
in the report; it's superfluous. I only inherited the original style in
order to keep the log format consistent.

Will remove it in the next version.
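
For readers unfamiliar with the convention, the errp style discussed above can be sketched with stand-in types. This is a simplified model, not QEMU's real Error API: the Error struct, sketch_error_setg() and sketch_realize() are all hypothetical names, and only the 9-character limit and the message text come from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for QEMU's Error object (the real type is opaque). */
typedef struct Error {
    char msg[128];
} Error;

/* Simplified stand-in for error_setg(): store one message in *errp. */
static void sketch_error_setg(Error **errp, const char *msg)
{
    if (!errp) {
        return;                 /* caller chose to ignore errors */
    }
    assert(*errp == NULL);      /* only one error may be set */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

#define SKETCH_MAX_NAME_LEN 9

/* realize-style function: no return code, failure goes through errp. */
static void sketch_realize(const char *name, Error **errp)
{
    if (strlen(name) > SKETCH_MAX_NAME_LEN) {
        /* No device prefix: the caller adds the "-device ..." context. */
        sketch_error_setg(errp,
                          "name too long; please shorten to at most 9 chars");
        return;
    }
}
```

The caller prints the error exactly once, which is how the superfluous second message goes away.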

Cc: j...@resnulli.us
Cc: jasow...@redhat.com
Cc: f4...@amsat.org
Signed-off-by: Mao Zhongyi 
---
 hw/net/rocker/rocker.c | 35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 6e70fdd..c446cda 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1252,20 +1252,18 @@ rollback:
 return err;
 }

-static int rocker_msix_init(Rocker *r)
+static int rocker_msix_init(Rocker *r, Error **errp)
 {
 PCIDevice *dev = PCI_DEVICE(r);
 int err;
-Error *local_err = NULL;

 err = msix_init(dev, ROCKER_MSIX_VEC_COUNT(r->fp_ports),
 &r->msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_TABLE_OFFSET,
 &r->msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
-0, &local_err);
+0, errp);
 if (err) {
-error_report_err(local_err);
 return err;
 }

@@ -1301,7 +1299,7 @@ static World *rocker_world_type_by_name(Rocker *r, const char *name)
 return NULL;
 }

-static int pci_rocker_init(PCIDevice *dev)
+static void pci_rocker_realize(PCIDevice *dev, Error **errp)
 {
 Rocker *r = to_rocker(dev);
 const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
@@ -1315,7 +1313,7 @@ static int pci_rocker_init(PCIDevice *dev)

 for (i = 0; i < ROCKER_WORLD_TYPE_MAX; i++) {
 if (!r->worlds[i]) {
-err = -ENOMEM;
+error_setg(errp, "rocker: memory allocation for worlds failed");

r->worlds[i] is null when of_dpa_world_alloc() returns null.  It's a
wrapper around world_alloc(), which returns null only when g_malloc()
does.  It doesn't.  Please remove the dead error handling.  Ideally in a
separate cleanup patch before this one, to facilitate review.

Thanks very much for your detailed explanation.

After reading g_malloc0(), I am aware of this: g_malloc0(size_t size)
returns null only when size is 0. But it is a wrapper around
g_malloc0_n(1, size), which ignores the fact that g_malloc0() of 0 bytes
returns null. So it doesn't return null. Am I right?
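
The allocation behaviour under discussion can be modelled without GLib. The sketch below is an assumption-labelled stand-in: sketch_malloc0() imitates the documented g_malloc0() behaviour (abort on failure, NULL only for a zero-byte request), and SketchWorld / sketch_world_alloc() are hypothetical names, not the rocker code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical abort-on-failure allocator modelled on g_malloc0():
 * returns zero-filled memory, returns NULL only when size is 0, and
 * terminates the program instead of returning NULL on exhaustion.
 */
static void *sketch_malloc0(size_t size)
{
    void *p;

    if (size == 0) {
        return NULL;
    }
    p = calloc(1, size);
    if (!p) {
        abort();
    }
    return p;
}

typedef struct SketchWorld {
    int type;
} SketchWorld;

/* The request size is never 0 here, so the result is never NULL. */
static SketchWorld *sketch_world_alloc(int type)
{
    SketchWorld *w = sketch_malloc0(sizeof(*w));

    assert(w != NULL);  /* holds by construction, no error path needed */
    w->type = type;
    return w;
}
```

Under these assumptions, a NULL check after the allocation is dead code, which is the reason for dropping the error handling.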

Recommend to drop the "rocker: " prefix.  Same for all the other error
messages.

Thanks, will drop it entirely.

 goto err_world_alloc;
 }
 }
@@ -1326,10 +1324,9 @@ static int pci_rocker_init(PCIDevice *dev)

 r->world_dflt = rocker_world_type_by_name(r, r->world_name);
 if (!r->world_dflt) {
-fprintf(stderr,
-"rocker: requested world \"%s\" does not exist\n",
+error_setg(errp,
+"rocker: invalid argument, requested world %s does not exist",
 r->world_name);
-err = -EINVAL;
 goto err_world_type_by_name;
 }

@@ -1349,7 +1346,7 @@ static int pci_rocker_init(PCIDevice *dev)

 /* MSI-X init */

-err = rocker_msix_init(r);
+err = rocker_msix_init(r, errp);
 if (err) {
 goto err_msix_init;
 }
@@ -1361,7 +1358,7 @@ static int pci_rocker_init(PCIDevice *dev)
 }

 if (rocker_find(r->name)) {
-err = -EEXIST;
+error_setg(errp, "rocker: %s already exists", r->name);
 goto err_duplicate;
 }

@@ -1375,10 +1372,10 @@ static int pci_rocker_init(PCIDevice *dev)
 #define ROCKER_IFNAMSIZ 16
 #define MAX_ROCKER_NAME_LEN  (ROCKER_IFNAMSIZ - 1 - 3 - 3)
 if (strlen(r->name) > MAX_ROCKER_NAME_LEN) {
-fprintf(stderr,
-"rocker: name too long; please shorten to at most %d chars\n",
+error_setg(errp,
+

[Qemu-devel] [RFC PATCH v1 6/6] spapr: Fix migration of Radix guests

2017-05-16 Thread Bharata B Rao

Fix migration of radix guests by ensuring that we issue
KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.

Reported-by: Nageswara R Sastry 
Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c | 15 +++
 hw/ppc/spapr_hcall.c   |  1 +
 include/hw/ppc/spapr.h |  1 +
 3 files changed, 17 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 05abfc1..dd1d687 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1443,6 +1443,20 @@ static int spapr_post_load(void *opaque, int version_id)
 err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset);
 }
 
+if (spapr->patb_entry) {
+if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) {
+err = kvmppc_configure_v3_mmu(POWERPC_CPU(first_cpu),
+  spapr->patb_flags &
+  SPAPR_PROC_TABLE_RADIX,
+  spapr->patb_flags &
+  SPAPR_PROC_TABLE_GTSE,
+  spapr->patb_entry);
+} else {
+error_report("Radix guest is unsupported by the host");
+return -EINVAL;
+}
+}
+
 return err;
 }
 
@@ -1527,6 +1541,7 @@ static const VMStateDescription vmstate_spapr_patb_entry = {
 .needed = spapr_patb_entry_needed,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(patb_entry, sPAPRMachineState),
+VMSTATE_UINT64(patb_flags, sPAPRMachineState),
 VMSTATE_END_OF_LIST()
 },
 };
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 768aa57..b002fae 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -986,6 +986,7 @@ static target_ulong h_register_process_table(PowerPCCPU *cpu,
 spapr_check_setup_free_hpt(spapr, spapr->patb_entry, cproc);
 
 spapr->patb_entry = cproc; /* Save new process table */
+spapr->patb_flags = flags; /* Save the flags */
 
 /* Update the UPRT and GTSE bits in the LPCR for all cpus */
 CPU_FOREACH(cs) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5b39a26..c25a32e 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -75,6 +75,7 @@ struct sPAPRMachineState {
 void *htab;
 uint32_t htab_shift;
 uint64_t patb_entry; /* Process tbl registed in H_REGISTER_PROCESS_TABLE */
+uint64_t patb_flags;
 hwaddr rma_size;
 int vrma_adjust;
 ssize_t rtas_size;
-- 
2.7.4

[Qemu-devel] [RFC PATCH v1 2/6] migration: Introduce unregister_savevm_live()

2017-05-16 Thread Bharata B Rao

Introduce a new function unregister_savevm_live() to unregister the vmstate
handlers registered via register_savevm_live().

register_savevm() allocates SaveVMHandlers itself, while register_savevm_live()
is passed SaveVMHandlers by the caller. During unregistration, we want to
free SaveVMHandlers in the former case but not in the latter case.
Hence this new API is needed to differentiate the two.

This new API will be needed by PowerPC to unregister the HTAB savevm
handlers.
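
The ownership distinction can be sketched in isolation as follows. All names here (SketchHandlers, SketchEntry, sketch_register(), sketch_register_live(), sketch_unregister()) are hypothetical stand-ins; only the idea of a `live` flag deciding whether the handlers are freed comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct SketchHandlers {
    int version;
} SketchHandlers;

typedef struct SketchEntry {
    SketchHandlers *ops;
} SketchEntry;

/* Non-live registration: the registry allocates the handlers itself. */
static SketchEntry *sketch_register(int version)
{
    SketchEntry *se = calloc(1, sizeof(*se));

    assert(se != NULL);
    se->ops = calloc(1, sizeof(*se->ops));
    assert(se->ops != NULL);
    se->ops->version = version;
    return se;
}

/* Live registration: the caller keeps ownership of the handlers. */
static SketchEntry *sketch_register_live(SketchHandlers *ops)
{
    SketchEntry *se = calloc(1, sizeof(*se));

    assert(se != NULL);
    se->ops = ops;
    return se;
}

/* Free the handlers only when the registry allocated them. */
static void sketch_unregister(SketchEntry *se, bool live)
{
    if (!live) {
        free(se->ops);
    }
    free(se);
}
```

Freeing caller-owned handlers would be an invalid free when they are a static object, as is the case for the HTAB handlers.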

Signed-off-by: Bharata B Rao 
---
 hw/net/vmxnet3.c|  2 +-
 hw/s390x/s390-skeys.c   |  2 +-
 include/migration/vmstate.h |  4 +++-
 migration/savevm.c  | 12 ++--
 slirp/slirp.c   |  2 +-
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 8b1fab2..2b923be 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2350,7 +2350,7 @@ static void vmxnet3_pci_uninit(PCIDevice *pci_dev)
 
 VMW_CBPRN("Starting uninit...");
 
-unregister_savevm(dev, "vmxnet3-msix", s);
+unregister_savevm(dev, "vmxnet3-msix", s, false);
 
 vmxnet3_net_uninit(s);
 
diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
index e2d4e1a..32b6435 100644
--- a/hw/s390x/s390-skeys.c
+++ b/hw/s390x/s390-skeys.c
@@ -379,7 +379,7 @@ static inline void s390_skeys_set_migration_enabled(Object *obj, bool value,
 register_savevm(NULL, TYPE_S390_SKEYS, 0, 1, s390_storage_keys_save,
 s390_storage_keys_load, ss);
 } else {
-unregister_savevm(DEVICE(ss), TYPE_S390_SKEYS, ss);
+unregister_savevm(DEVICE(ss), TYPE_S390_SKEYS, ss, false);
 }
 }
 
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f4bf3f1..ba81b3e 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -78,7 +78,9 @@ int register_savevm_live(DeviceState *dev,
  SaveVMHandlers *ops,
  void *opaque);
 
-void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque);
+void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque,
+   bool live);
+void unregister_savevm_live(DeviceState *dev, const char *idstr, void *opaque);
 
 typedef struct VMStateInfo VMStateInfo;
 typedef struct VMStateDescription VMStateDescription;
diff --git a/migration/savevm.c b/migration/savevm.c
index 7a268ec..fa7c3db 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -630,7 +630,8 @@ int register_savevm(DeviceState *dev,
 ops, opaque);
 }
 
-void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
+void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque,
+   bool live)
 {
 SaveStateEntry *se, *new_se;
 char id[256] = "";
@@ -651,12 +652,19 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
 if (dev) {
 g_free(se->compat);
 }
-g_free(se->ops);
+if (!live) {
+g_free(se->ops);
+}
 g_free(se);
 }
 }
 }
 
+void unregister_savevm_live(DeviceState *dev, const char *idstr, void *opaque)
+{
+unregister_savevm(dev, idstr, opaque, true);
+}
+
 int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
const VMStateDescription *vmsd,
void *opaque, int alias_id,
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 2f2ec2c..108e669 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -333,7 +333,7 @@ void slirp_cleanup(Slirp *slirp)
 {
 QTAILQ_REMOVE(&slirp_instances, slirp, entry);
 
-unregister_savevm(NULL, "slirp", slirp);
+unregister_savevm(NULL, "slirp", slirp, false);
 
 ip_cleanup(slirp);
 ip6_cleanup(slirp);
-- 
2.7.4

[Qemu-devel] [RFC PATCH v1 5/6] spapr: Unregister HPT savevm handlers for radix guests

2017-05-16 Thread Bharata B Rao

The HPT gets created by default and later, when the guest turns out to be
a radix guest, the HPT is destroyed when the guest does the H_REGISTER_PROC_TBL
hcall. Let HTAB savevm handler registration and unregistration follow
the same model so that we don't end up with unneeded HTAB savevm
handlers for radix guests.

This also ensures that HTAB savevm handlers seamlessly get destroyed and
recreated like the HTAB itself when a hash guest reboots.

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c | 15 +--
 hw/ppc/spapr_hcall.c   |  1 +
 include/hw/ppc/spapr.h |  2 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 521eef1..05abfc1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1237,6 +1237,7 @@ static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
 
 /* Clean up any HPT info from a previous boot */
 spapr_free_hpt(spapr);
+spapr_htab_savevm_unregister(spapr);
 
 rc = kvmppc_reset_htab(shift);
 if (rc < 0) {
@@ -1275,6 +1276,7 @@ static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
 DIRTY_HPTE(HPTE(spapr->htab, i));
 }
 }
+spapr_htab_savevm_register(spapr);
 }
 
 void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr)
@@ -1874,6 +1876,17 @@ static SaveVMHandlers savevm_htab_handlers = {
 .load_state = htab_load,
 };
 
+void spapr_htab_savevm_register(sPAPRMachineState *spapr)
+{
+register_savevm_live(NULL, "spapr/htab", -1, 1,
+ &savevm_htab_handlers, spapr);
+}
+
+void spapr_htab_savevm_unregister(sPAPRMachineState *spapr)
+{
+unregister_savevm_live(NULL, "spapr/htab", spapr);
+}
+
 static void spapr_boot_set(void *opaque, const char *boot_device,
Error **errp)
 {
@@ -2336,8 +2349,6 @@ static void ppc_spapr_init(MachineState *machine)
  * interface, this is a legacy from the sPAPREnvironment structure
  * which predated MachineState but had a similar function */
 vmstate_register(NULL, 0, &vmstate_spapr, spapr);
-register_savevm_live(NULL, "spapr/htab", -1, 1,
- &savevm_htab_handlers, spapr);
 
 /* used by RTAS */
 QTAILQ_INIT(&spapr->ccs_list);
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index be79e3d..768aa57 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -914,6 +914,7 @@ static void spapr_check_setup_free_hpt(sPAPRMachineState *spapr,
 } else if (!(patbe_old & PATBE1_GR)) {
 /* HASH->RADIX : Free HPT */
 spapr_free_hpt(spapr);
+spapr_htab_savevm_unregister(spapr);
 } else if (!(patbe_new & PATBE1_GR)) {
 /* RADIX->HASH || NOTHING->HASH : Allocate HPT */
 spapr_setup_hpt_and_vrma(spapr);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 6f9cb85..5b39a26 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -636,6 +636,8 @@ void spapr_hotplug_req_remove_by_count_indexed(sPAPRDRConnectorType drc_type,
uint32_t count, uint32_t index);
 void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
 sPAPRMachineState *spapr);
+void spapr_htab_savevm_register(sPAPRMachineState *spapr);
+void spapr_htab_savevm_unregister(sPAPRMachineState *spapr);
 
 /* rtas-configure-connector state */
 struct sPAPRConfigureConnectorState {
-- 
2.7.4

[Qemu-devel] [RFC PATCH v1 4/6] spapr: Consolidate HPT freeing code into a routine

2017-05-16 Thread Bharata B Rao

Consolidate the code that frees HPT into a separate routine
spapr_free_hpt() as the same chunk of code is called from two places.

Signed-off-by: Bharata B Rao 
---
 hw/ppc/spapr.c | 13 +
 hw/ppc/spapr_hcall.c   |  5 +
 include/hw/ppc/spapr.h |  1 +
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1b7cada..521eef1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1222,16 +1222,21 @@ static int spapr_hpt_shift_for_ramsize(uint64_t ramsize)
 return shift;
 }
 
+void spapr_free_hpt(sPAPRMachineState *spapr)
+{
+g_free(spapr->htab);
+spapr->htab = NULL;
+spapr->htab_shift = 0;
+close_htab_fd(spapr);
+}
+
 static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
  Error **errp)
 {
 long rc;
 
 /* Clean up any HPT info from a previous boot */
-g_free(spapr->htab);
-spapr->htab = NULL;
-spapr->htab_shift = 0;
-close_htab_fd(spapr);
+spapr_free_hpt(spapr);
 
 rc = kvmppc_reset_htab(shift);
 if (rc < 0) {
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 3600b0e..be79e3d 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -913,10 +913,7 @@ static void spapr_check_setup_free_hpt(sPAPRMachineState *spapr,
 /* We assume RADIX, so this catches all the "Do Nothing" cases */
 } else if (!(patbe_old & PATBE1_GR)) {
 /* HASH->RADIX : Free HPT */
-g_free(spapr->htab);
-spapr->htab = NULL;
-spapr->htab_shift = 0;
-close_htab_fd(spapr);
+spapr_free_hpt(spapr);
 } else if (!(patbe_new & PATBE1_GR)) {
 /* RADIX->HASH || NOTHING->HASH : Allocate HPT */
 spapr_setup_hpt_and_vrma(spapr);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a692e63..6f9cb85 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -610,6 +610,7 @@ int spapr_h_cas_compose_response(sPAPRMachineState *sm,
  sPAPROptionVector *ov5_updates);
 void close_htab_fd(sPAPRMachineState *spapr);
 void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr);
+void spapr_free_hpt(sPAPRMachineState *spapr);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn);
 void spapr_tce_table_enable(sPAPRTCETable *tcet,
 uint32_t page_shift, uint64_t bus_offset,
-- 
2.7.4

[Qemu-devel] [RFC PATCH v1 3/6] spapr: Make h_register_process_table hcall flags global

2017-05-16 Thread Bharata B Rao

The flags used in h_register_process_table hcall are needed in spapr.c
and hence move them to a header file. While doing so, give them
slightly specific names.
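
As an illustration of how the renamed flags are meant to be used, here is a small self-contained sketch. The flag values are copied from this patch; the two helper functions are hypothetical and only restate the checks made in h_register_process_table().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by this patch. */
#define SPAPR_PROC_TABLE_MASK     0x01FULL
#define SPAPR_PROC_TABLE_MODIFY   0x10
#define SPAPR_PROC_TABLE_REGISTER 0x08
#define SPAPR_PROC_TABLE_RADIX    0x04
#define SPAPR_PROC_TABLE_HPT_PT   0x02
#define SPAPR_PROC_TABLE_GTSE     0x01

/* Hypothetical helper: true when no reserved bits are set. */
static bool proc_table_flags_valid(uint64_t flags)
{
    return (flags & ~SPAPR_PROC_TABLE_MASK) == 0;
}

/* Hypothetical helper: true when a new radix table is being registered. */
static bool proc_table_registers_radix(uint64_t flags)
{
    assert(proc_table_flags_valid(flags));
    return (flags & SPAPR_PROC_TABLE_MODIFY) &&
           (flags & SPAPR_PROC_TABLE_REGISTER) &&
           (flags & SPAPR_PROC_TABLE_RADIX);
}
```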

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 hw/ppc/spapr_hcall.c   | 31 ++-
 include/hw/ppc/spapr.h | 10 ++
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 0d608d6..3600b0e 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -924,13 +924,6 @@ static void spapr_check_setup_free_hpt(sPAPRMachineState *spapr,
 return;
 }
 
-#define FLAGS_MASK  0x01FULL
-#define FLAG_MODIFY 0x10
-#define FLAG_REGISTER   0x08
-#define FLAG_RADIX  0x04
-#define FLAG_HASH_PROC_TBL  0x02
-#define FLAG_GTSE   0x01
-
 static target_ulong h_register_process_table(PowerPCCPU *cpu,
  sPAPRMachineState *spapr,
  target_ulong opcode,
@@ -943,12 +936,13 @@ static target_ulong h_register_process_table(PowerPCCPU *cpu,
 target_ulong table_size = args[3];
 uint64_t cproc;
 
-if (flags & ~FLAGS_MASK) { /* Check no reserved bits are set */
+if (flags & ~SPAPR_PROC_TABLE_MASK) { /* Check no reserved bits are set */
 return H_PARAMETER;
 }
-if (flags & FLAG_MODIFY) {
-if (flags & FLAG_REGISTER) {
-if (flags & FLAG_RADIX) { /* Register new RADIX process table */
+if (flags & SPAPR_PROC_TABLE_MODIFY) {
+if (flags & SPAPR_PROC_TABLE_REGISTER) {
+if (flags & SPAPR_PROC_TABLE_RADIX) {
+/* Register new RADIX process table */
 if (proc_tbl & 0xfff || proc_tbl >> 60) {
 return H_P2;
 } else if (page_size) {
@@ -958,7 +952,8 @@ static target_ulong h_register_process_table(PowerPCCPU *cpu,
 }
 cproc = PATBE1_GR | proc_tbl | table_size;
 } else { /* Register new HPT process table */
-if (flags & FLAG_HASH_PROC_TBL) { /* Hash with Segment Tables */
+if (flags & SPAPR_PROC_TABLE_HPT_PT) {
+/* Hash with Segment Tables */
 /* TODO - Not Supported */
 /* Technically caused by flag bits => H_PARAMETER */
 return H_PARAMETER;
@@ -981,7 +976,8 @@ static target_ulong h_register_process_table(PowerPCCPU *cpu,
 cproc = spapr->patb_entry & PATBE1_GR;
 }
 } else { /* Maintain current registration */
-if (!(flags & FLAG_RADIX) != !(spapr->patb_entry & PATBE1_GR)) {
+if (!(flags & SPAPR_PROC_TABLE_RADIX) !=
+!(spapr->patb_entry & PATBE1_GR)) {
 /* Technically caused by flag bits => H_PARAMETER */
 return H_PARAMETER; /* Existing Process Table Mismatch */
 }
@@ -996,13 +992,14 @@ static target_ulong h_register_process_table(PowerPCCPU *cpu,
 /* Update the UPRT and GTSE bits in the LPCR for all cpus */
 CPU_FOREACH(cs) {
 set_spr(cs, SPR_LPCR, LPCR_UPRT | LPCR_GTSE,
-((flags & (FLAG_RADIX | FLAG_HASH_PROC_TBL)) ? LPCR_UPRT : 0) |
-((flags & FLAG_GTSE) ? LPCR_GTSE : 0));
+((flags & (SPAPR_PROC_TABLE_RADIX | SPAPR_PROC_TABLE_HPT_PT)) ?
+LPCR_UPRT : 0) | ((flags & SPAPR_PROC_TABLE_GTSE) ?
+LPCR_GTSE : 0));
 }
 
 if (kvm_enabled()) {
-return kvmppc_configure_v3_mmu(cpu, flags & FLAG_RADIX,
-   flags & FLAG_GTSE, cproc);
+return kvmppc_configure_v3_mmu(cpu, flags & SPAPR_PROC_TABLE_RADIX,
+   flags & SPAPR_PROC_TABLE_GTSE, cproc);
 }
 return H_SUCCESS;
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5802f88..a692e63 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -681,4 +681,14 @@ int spapr_rng_populate_dt(void *fdt);
 
 void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
 
+/*
+ * Defines for flag value used in H_REGISTER_PROC_TBL hcall.
+ */
+#define SPAPR_PROC_TABLE_MASK0x01FULL
+#define SPAPR_PROC_TABLE_MODIFY  0x10
+#define SPAPR_PROC_TABLE_REGISTER0x08
+#define SPAPR_PROC_TABLE_RADIX   0x04
+#define SPAPR_PROC_TABLE_HPT_PT  0x02
+#define SPAPR_PROC_TABLE_GTSE0x01
+
 #endif /* HW_SPAPR_H */
-- 
2.7.4

[Qemu-devel] [RFC PATCH v1 1/6] migration: Fix unregister_savevm()

2017-05-16 Thread Bharata B Rao

In unregister_savevm(), free se->compat only if it was allocated earlier.

Signed-off-by: Bharata B Rao 
---
 migration/savevm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 352a8f2..7a268ec 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -648,7 +648,9 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
 QTAILQ_FOREACH_SAFE(se, &savevm_state.handlers, entry, new_se) {
 if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
 QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
-g_free(se->compat);
+if (dev) {
+g_free(se->compat);
+}
 g_free(se->ops);
 g_free(se);
 }
-- 
2.7.4

[Qemu-devel] [RFC PATCH v1 0/6] ppc/spapr: Fix migration of radix guests

2017-05-16 Thread Bharata B Rao

This patchset fixes the migration of sPAPR radix guests.

Changes in v1:
--
- Added two patches to fix generic savevm unregistration issues.
- HTAB savevm handlers are now registered/unregistered when HTAB
  is created/destroyed instead of doing this in CAS call (as in v0).

TODO:
- I have checks in spapr_post_load() to detect and fail the migration
  of radix guest to a host that doesn't support radix. However I couldn't
  test this as I am hitting some other unrelated migration failure
  when testing this path.
- I have tested many scenarios like
  - tcg hash->hash, radix->radix, hash->radix and radix->hash reboot
  - kvm hash reboot and migration
  - kvm radix reboot and migration
  However boot->reboot->migration of radix guest doesn't complete
  and this seems to be a different issue to be fixed.

v0: https://lists.gnu.org/archive/html/qemu-ppc/2017-05/msg00197.html

Bharata B Rao (6):
  migration: Fix unregister_savevm()
  migration: Introduce unregister_savevm_live()
  spapr: Make h_register_process_table hcall flags global
  spapr: Consolidate HPT freeing code into a routine
  spapr: Unregister HPT savevm handlers for radix guests
  spapr: Fix migration of Radix guests

 hw/net/vmxnet3.c|  2 +-
 hw/ppc/spapr.c  | 43 +--
 hw/ppc/spapr_hcall.c| 38 +-
 hw/s390x/s390-skeys.c   |  2 +-
 include/hw/ppc/spapr.h  | 14 ++
 include/migration/vmstate.h |  4 +++-
 migration/savevm.c  | 16 +---
 slirp/slirp.c   |  2 +-
 8 files changed, 87 insertions(+), 34 deletions(-)

-- 
2.7.4

Re: [Qemu-devel] [RFC PATCH 2/2] spapr: Fix migration of Radix guests

2017-05-16 Thread Bharata B Rao

On Thu, May 11, 2017 at 11:02:20AM +1000, David Gibson wrote:
> On Mon, May 08, 2017 at 02:36:17PM +0530, Bharata B Rao wrote:
> > Currently HTAB savevm handlers get registered by default and migration
> > of radix guest will fail.
> > 
> > - Ensure that HTAB savevm handlers are not registered for radix case.
> > - Ensure that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post
> >   migration.
> > 
> > TODO: Right now I have delayed the HTAB savevm handler registration
> > to CAS call where we know if the guest is radix or hash. Another approach
> > is to let the HTAB handlers to be registered by default (as it is being
> > done currently, but unregister them from CAS when we discover radix
> > capability).
> 
> Option 2 there sounds messy.  I also suspect it could break if you try
> to migrate an (eventually) radix guest before it's done CAS.
> 
> Strictly speaking only registering at CAS time will break old hash
> guests that don't do CAS at all.  However such guests are really,
> really ancient, and I suspect we don't work with them already.
> 
> You do, however, need to deregister (and allow the choice to be made
> again) on guest reset.  On KVM we can only (for now) support either
> hash or radix guests.  Under TCG, however, we could run a radix guest
> then reboot to a hash guest or vice versa.

Took care of this in v1.

> 
> 
> > 
> > Reported-by: Nageswara R Sastry 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  hw/ppc/spapr.c | 18 +++---
> >  hw/ppc/spapr_hcall.c   |  5 +
> >  include/hw/ppc/spapr.h |  3 +++
> >  3 files changed, 23 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index e2dc77c..e14f55c 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1436,6 +1436,14 @@ static int spapr_post_load(void *opaque, int version_id)
> >  err = spapr_rtc_import_offset(&spapr->rtc, spapr->rtc_offset);
> >  }
> >  
> > +if (spapr->patb_entry && (spapr->patb_flags & SPAPR_PROC_TABLE_RADIX) &&
> 
> patb_entry already tells you whether the guest is radix or not
> (PATBE1_GR), you shouldn't need extra flags.
> 
> > +kvmppc_has_cap_mmu_radix() && kvm_enabled()) {
> 
> You should also fail the migration if you have an incoming radix
> guest, but the your new KVM host can't do radix.  Or the reverse, for
> that matter.

I have checks in v1 to fail migration of a radix guest to a host that doesn't
support radix. But I don't see how we can detect and fail the migration
of hash guests to hosts that don't support hash from here (i.e., from
spapr_post_load). The hash guest's migration stream would have htab
savevm entries and the target will fail as it knows nothing about htab
savevm entries.

Regards,
Bharata.

Re: [Qemu-devel] [PATCH 6/6] spec/vhost-user spec: Add IOMMU support

2017-05-16 Thread Jason Wang




On 2017年05月16日 23:16, Michael S. Tsirkin wrote:

On Mon, May 15, 2017 at 01:45:28PM +0800, Jason Wang wrote:


On 2017年05月13日 08:02, Michael S. Tsirkin wrote:

On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:

On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:

On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:

This patch specifies and implements the master/slave communication
to support device IOTLB in slave.

The vhost_iotlb_msg structure introduced for kernel backends is
re-used, making the design close between the two backends.

An exception is the use of the secondary channel to enable the
slave to send IOTLB miss requests to the master.

Signed-off-by: Maxime Coquelin 
---
docs/specs/vhost-user.txt | 75 +++
hw/virtio/vhost-user.c| 31 
2 files changed, 106 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 5fa7016..4a1f0c3 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -97,6 +97,23 @@ Depending on the request type, payload can be:
   log offset: offset from start of supplied file descriptor
   where logging starts (i.e. where guest address 0 would be logged)
+ * An IOTLB message
+   -
+   | iova | size | user address | permissions flags | type |
+   -
+
+   IOVA: a 64-bit guest I/O virtual address

guest -> VM

Ok.


+   Size: a 64-bit size

How do you specify "all memory"? give special meaning to size 0?

Good point, it does not support all memory currently.
It is not vhost-user specific, but general to the vhost implementation.

But iommu needs it to support passthrough.

Probably not, we will just pass the mappings in vhost_memory_region to
vhost. Its memory_size is also a __u64.

Thanks

That's different since that's chunks of qemu virtual memory.

IOMMU maps IOVA to GPA.



But we in fact cache IOVA -> HVA mappings in the remote IOTLB. When
passthrough mode is enabled, IOVA == GPA, so passing mappings in
vhost_memory_region should be fine.


The only possible "issue" with "all memory" is if you cannot use a
single TLB invalidation to invalidate all caches in the remote TLB. But this
is only a theoretical problem, since it only happens when we have a 1 byte
mapping [2^64 - 1, 2^64) cached in the remote TLB. Consider:


- E.g. the Intel IOMMU has a range limitation for invalidation (1G currently)
- It looks like all existing IOMMUs use page-aligned mappings

So it is probably not a big issue. And for safety we could use two
invalidations to make sure all caches are flushed remotely. Or just
change the protocol from start, size to start, end. Vhost-kernel is
probably too late for this change, but I'm still not quite sure it is
worthwhile.
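
To illustrate the limitation, here is a minimal sketch, assuming a plain (start, size) pair of 64-bit fields as in the IOTLB message. The type and function names are hypothetical, and splitting the space in two halves is just one possible way to issue the two invalidations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical (start, size) range, both 64-bit as in the IOTLB message. */
typedef struct SketchRange {
    uint64_t start;
    uint64_t size;
} SketchRange;

/* Last byte covered by a non-empty range; checks for 64-bit overflow. */
static uint64_t sketch_last_byte(SketchRange r)
{
    assert(r.size > 0);
    assert(r.start <= UINT64_MAX - (r.size - 1));
    return r.start + (r.size - 1);
}

/*
 * A size of 2^64 does not fit in a uint64_t, so one (start, size)
 * message cannot describe the whole address space. Two messages can:
 * here each one covers half of it.
 */
static void sketch_invalidate_all(SketchRange out[2])
{
    out[0].start = 0;
    out[0].size = UINT64_C(1) << 63;
    out[1].start = UINT64_C(1) << 63;
    out[1].size = UINT64_C(1) << 63;
}
```

A (start, end) encoding with an inclusive end avoids the problem, because 2^64 - 1 is representable.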


Thanks

Re: [Qemu-devel] [PATCH] nvme: Add support for Controller Memory Buffers

2017-05-16 Thread Stephen Bates

> Awesome, this looks great!
>
> Acked-by: Keith Busch 

Thanks Keith!

I still seem to be having issues getting my patches onto the qemu-* mailing 
lists. Does anyone have any idea how I go about rectifying that?

Stephen

Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration

2017-05-16 Thread Zhoujian (jay)

Hi Wanpeng,

> > On 11/05/2017 14:07, Zhoujian (jay) wrote:
> >> -* Scan sptes if dirty logging has been stopped, dropping those
> >> -* which can be collapsed into a single large-page spte.  Later
> >> -* page faults will create the large-page sptes.
> >> +* Reset each vcpu's mmu, then page faults will create the
> large-page
> >> +* sptes later.
> >>  */
> >> if ((change != KVM_MR_DELETE) &&
> >> (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> >> -   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >> -   kvm_mmu_zap_collapsible_sptes(kvm, new);
> 
> This is an unlikely branch(unless guest live migration fails and continue
> to run on the source machine) instead of hot path, do you have any
> performance number for your real workloads?
> 

Sorry to bother you again.

Recently, I tested the performance before migration and after migration
failure using SPEC CPU2006 (https://www.spec.org/cpu2006/), which is a
standard performance evaluation tool.

These are the results:
**
Before migration the score is 153, and the TLB miss statistics of the qemu
process are:
linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses, \
dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10

Performance counter stats for process id '26463':

       698,938  dTLB-load-misses   #  0.13% of all dTLB cache hits   (50.46%)
   543,303,875  dTLB-loads                                           (50.43%)
       199,597  dTLB-store-misses                                    (16.51%)
    60,128,561  dTLB-stores                                          (16.67%)
        69,986  iTLB-load-misses   #  6.17% of all iTLB cache hits   (16.67%)
     1,134,097  iTLB-loads                                           (33.33%)

  10.000684064 seconds time elapsed

After migration failure the score is 149, and the TLB miss statistics of
the qemu process are:
linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses, \
dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10

Performance counter stats for process id '26463':

       765,400  dTLB-load-misses   #  0.14% of all dTLB cache hits   (50.50%)
   540,972,144  dTLB-loads                                           (50.47%)
       207,670  dTLB-store-misses                                    (16.50%)
    58,363,787  dTLB-stores                                          (16.67%)
       109,772  iTLB-load-misses   #  9.52% of all iTLB cache hits   (16.67%)
     1,152,784  iTLB-loads                                           (33.32%)

  10.000703078 seconds time elapsed

These are the steps:
==
 (1) The kmod version is 4.4.11 (slightly modified) and the qemu version
is 2.6.0 (slightly modified). The kmod has the following patch applied,
according to Paolo's advice:

diff --git a/source/x86/x86.c b/source/x86/x86.c
index 054a7d3..75a4bb3 100644
--- a/source/x86/x86.c
+++ b/source/x86/x86.c
@@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 */
if ((change != KVM_MR_DELETE) &&
(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-   kvm_mmu_zap_collapsible_sptes(kvm, new);
+   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+   printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
+   kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+   }
 
/*
 * Set up write protection and/or dirty logging for the new slot.

(2) I started up a 10G VM (suse11sp3) with its memory preoccupied, which means
its "RES" column in top is 10G, in order to set up the EPT table in advance.
(3) Then I ran the 429.mcf test case of SPEC CPU2006 before migration and
after migration failure. 429.mcf is a memory-intensive workload, and the
migration failure is constructed deliberately with the following qemu patch:

diff --git a/migration/migration.c b/migration/migration.c
index 5d725d0..88dfc59 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -625,6 +625,9 @@ static void process_incoming_migration_co(void *opaque)
   MIGRATION_STATUS_ACTIVE);
 ret = qemu_loadvm_state(f);
 
+// deliberately construct the migration failure
+exit(EXIT_FAILURE); 
+
 ps = postcopy_state_get();
 trace_process_incoming_migration_co_end(ret, ps);
 if (ps != POSTCOPY_INCOMING_NONE) {
==


The scores and TLB miss rates are almost the same, and I am confused.
May I ask which tool you use to evaluate the performance?
And if my test steps are wrong, please

Re: [Qemu-devel] [PATCH] iotests: 147: Don't test inet6 if not available

2017-05-16 Thread Fam Zheng

On Fri, 05/05 18:21, Fam Zheng wrote:
> This is the case in our docker tests, as we use --net=none there. Skip
> this method.

Ping. Is this patch okay?

> 
> Signed-off-by: Fam Zheng 
> ---
>  tests/qemu-iotests/147 | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/tests/qemu-iotests/147 b/tests/qemu-iotests/147
> index 32afea6..db34838 100755
> --- a/tests/qemu-iotests/147
> +++ b/tests/qemu-iotests/147
> @@ -147,6 +147,13 @@ class BuiltinNBD(NBDBlockdevAddBase):
>  self._server_down()
>  
>  def test_inet6(self):
> +try:
> +socket.getaddrinfo("::0", "0", socket.AF_INET6,
> +   socket.SOCK_STREAM, socket.IPPROTO_TCP,
> +   socket.AI_ADDRCONFIG | socket.AI_CANONNAME)
> +except socket.gaierror:
> +# IPv6 not available, skip
> +return
>  address = { 'type': 'inet',
>  'data': {
>  'host': '::1',
> -- 
> 2.9.3
> 
>

[Qemu-devel] [PATCH V3 2/3] net/filter-mirror.c: Rename filter_mirror_send() and fix codestyle

2017-05-16 Thread Zhang Chen

Both filter_mirror_receive_iov() and filter_redirector_receive_iov() use
filter_mirror_send() to send packets, so rename filter_mirror_send() to the
more generic filter_send(). Also fix some code style issues.

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index fd0322f..8b1b069 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -43,9 +43,9 @@ typedef struct MirrorState {
 SocketReadState rs;
 } MirrorState;
 
-static int filter_mirror_send(CharBackend *chr_out,
-  const struct iovec *iov,
-  int iovcnt)
+static int filter_send(CharBackend *chr_out,
+   const struct iovec *iov,
+   int iovcnt)
 {
 int ret = 0;
 ssize_t size = 0;
@@ -141,9 +141,9 @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
 MirrorState *s = FILTER_MIRROR(nf);
 int ret;
 
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_send(&s->chr_out, iov, iovcnt);
 if (ret) {
-error_report("filter_mirror_send failed(%s)", strerror(-ret));
+error_report("filter mirror send failed(%s)", strerror(-ret));
 }
 
 /*
@@ -164,9 +164,9 @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
 int ret;
 
 if (qemu_chr_fe_get_driver(&s->chr_out)) {
-ret = filter_mirror_send(&s->chr_out, iov, iovcnt);
+ret = filter_send(&s->chr_out, iov, iovcnt);
 if (ret) {
-error_report("filter_mirror_send failed(%s)", strerror(-ret));
+error_report("filter redirector send failed(%s)", strerror(-ret));
 }
 return iov_size(iov, iovcnt);
 } else {
@@ -286,8 +286,9 @@ static char *filter_redirector_get_indev(Object *obj, Error **errp)
 return g_strdup(s->indev);
 }
 
-static void
-filter_redirector_set_indev(Object *obj, const char *value, Error **errp)
+static void filter_redirector_set_indev(Object *obj,
+const char *value,
+Error **errp)
 {
 MirrorState *s = FILTER_REDIRECTOR(obj);
 
@@ -302,8 +303,9 @@ static char *filter_mirror_get_outdev(Object *obj, Error **errp)
 return g_strdup(s->outdev);
 }
 
-static void
-filter_mirror_set_outdev(Object *obj, const char *value, Error **errp)
+static void filter_mirror_set_outdev(Object *obj,
+ const char *value,
+ Error **errp)
 {
 MirrorState *s = FILTER_MIRROR(obj);
 
@@ -323,8 +325,9 @@ static char *filter_redirector_get_outdev(Object *obj, Error **errp)
 return g_strdup(s->outdev);
 }
 
-static void
-filter_redirector_set_outdev(Object *obj, const char *value, Error **errp)
+static void filter_redirector_set_outdev(Object *obj,
+ const char *value,
+ Error **errp)
 {
 MirrorState *s = FILTER_REDIRECTOR(obj);
 
-- 
2.7.4

[Qemu-devel] [PATCH V3 3/3] net/filter-rewriter: Remove unused option in filter-rewriter

2017-05-16 Thread Zhang Chen

Signed-off-by: Zhang Chen 
---
 qemu-options.hx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index f806af9..cbec279 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4038,7 +4038,7 @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
 
-@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},rewriter-mode=@var{mode}[,queue=@var{all|rx|tx}]
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid}[,queue=@var{all|rx|tx}]
 
 Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
 secondary from primary to keep secondary tcp connection,and rewrite
-- 
2.7.4

[Qemu-devel] [PATCH V3 0/3] Optimize filter-mirror and filter-rewriter

2017-05-16 Thread Zhang Chen

Fix some duplicate codes and remove unused codes.

v3:
 - Remove the ',' in patch 3/3

v2:
 - Address Eric's comments: fix a typo and keep a long line in patch 3.

Zhang Chen (3):
  net/filter-mirror.c: Remove duplicate check code.
  net/filter-mirror.c: Rename filter_mirror_send() and fix codestyle
  net/filter-rewriter: Remove unused option in filter-rewriter

 net/filter-mirror.c | 35 ---
 qemu-options.hx |  2 +-
 2 files changed, 17 insertions(+), 20 deletions(-)

-- 
2.7.4

[Qemu-devel] [PATCH V3 1/3] net/filter-mirror.c: Remove duplicate check code.

2017-05-16 Thread Zhang Chen

s->outdev has already been checked in filter_mirror_set_outdev().

Signed-off-by: Zhang Chen 
---
 net/filter-mirror.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index 72fa7c2..fd0322f 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -194,12 +194,6 @@ static void filter_mirror_setup(NetFilterState *nf, Error **errp)
 MirrorState *s = FILTER_MIRROR(nf);
 Chardev *chr;
 
-if (!s->outdev) {
-error_setg(errp, "filter mirror needs 'outdev' "
-   "property set");
-return;
-}
-
 chr = qemu_chr_find(s->outdev);
 if (chr == NULL) {
 error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
-- 
2.7.4

Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 3/6] hw/ppc: migrating the DRC state of hotplugged devices

2017-05-16 Thread David Gibson

On Tue, May 16, 2017 at 10:46:23AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 05/12/2017 03:11 AM, David Gibson wrote:
> > On Fri, May 05, 2017 at 05:47:43PM -0300, Daniel Henrique Barboza wrote:
> > > In pseries, a firmware abstraction called Dynamic Reconfiguration
> > > Connector (DRC) is used to assign a particular dynamic resource
> > > to the guest and provide an interface to manage configuration/removal
> > > of the resource associated with it. In other words, DRC is the
> > > 'plugged state' of a device.
> > > 
> > > Before this patch, DRC wasn't being migrated. This causes
> > > post-migration problems due to DRC state mismatch between source and
> > > target. The DRC state of a device X in the source might
> > > change, while in the target the DRC state of X is still fresh. When
> > > migrating the guest, X will not have the same hotplugged state as it
> > > did in the source. This means that we can't hot unplug X in the
> > > target after migration is completed because its DRC state is not 
> > > consistent.
> > > https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1677552 is one
> > > bug that is caused by this DRC state mismatch between source and
> > > target.
> > > 
> > > To migrate the DRC state, we defined the VMStateDescription struct for
> > > spapr_drc to enable the transmission of spapr_drc state in migration.
> > > Not all the elements in the DRC state are migrated - only those
> > > that can be modified by guest actions or device add/remove
> > > operations:
> > > 
> > > - 'isolation_state', 'allocation_state' and 'indicator_state'
> > > are involved in the DR state transition diagram from
> > > PAPR+ 2.7, 13.4;
> > > 
> > > - 'configured', 'signalled', 'awaiting_release' and 'awaiting_allocation'
> > > are needed in attaching and detaching devices;
> > > 
> > > - 'indicator_state' provides users with hardware state information.
> > > 
> > > These are the DRC elements that are migrated.
> > > 
> > > In this patch the DRC state is migrated for PCI, LMB and CPU
> > > connector types. At this moment there is no support to migrate
> > > DRC for the PHB (PCI Host Bridge) type.
> > > 
> > > In the 'realize' function the DRC is registered using vmstate_register,
> > > similar to what hw/ppc/spapr_iommu.c does in 'spapr_tce_table_realize'.
> > > This approach works because  DRCs are bus-less and do not sit
> > > on a BusClass that implements bc->get_dev_path, so as a fallback the
> > > VMSD gets identified via "spapr_drc"/get_index(drc).
> > > 
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   hw/ppc/spapr_drc.c | 61 
> > > ++
> > >   1 file changed, 61 insertions(+)
> > > 
> > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > > index 1c72160..926b945 100644
> > > --- a/hw/ppc/spapr_drc.c
> > > +++ b/hw/ppc/spapr_drc.c
> > > @@ -519,6 +519,65 @@ static void reset(DeviceState *d)
> > >   }
> > >   }
> > > +static bool spapr_drc_needed(void *opaque)
> > > +{
> > > +sPAPRDRConnector *drc = (sPAPRDRConnector *)opaque;
> > > +sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +bool rc = false;
> > > +sPAPRDREntitySense value;
> > Blank line after the declarations, please.
> > 
> > > > +drck->entity_sense(drc, &value);
> > > > +/* If no dev is plugged in there is no need to migrate the DRC state */
> > > +if (value != SPAPR_DR_ENTITY_SENSE_PRESENT) {
> > > +return false;
> > > +}
> > > +
> > > +/*
> > > + * If there is dev plugged in, we need to migrate the DRC state when
> > > + * it is different from cold-plugged state
> > > + */
> > > +switch (drc->type) {
> > > +
> > No blank line here please.
> > 
> > > +case SPAPR_DR_CONNECTOR_TYPE_PCI:
> > > > +rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_UNISOLATED) &&
> > > > +   (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_USABLE) &&
> > > > +   drc->configured && drc->signalled && !drc->awaiting_release);
> > You don't do any more manipulation of the rc value, so you might as
> > well just 'return' directly here.
> > 
> > 
> > > +break;
> > > +
> > > +case SPAPR_DR_CONNECTOR_TYPE_LMB:
> > > > +rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> > > > +   (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> > > > +   drc->configured && drc->signalled && !drc->awaiting_release);
> > > +break;
> > > +
> > > +case SPAPR_DR_CONNECTOR_TYPE_CPU:
> > > > +rc = !((drc->isolation_state == SPAPR_DR_ISOLATION_STATE_ISOLATED) &&
> > > > +   (drc->allocation_state == SPAPR_DR_ALLOCATION_STATE_UNUSABLE) &&
> > > > +drc->configured && drc->signalled && !drc->awaiting_release);
> > > +break;
> > > +
> > > +default:
> > > +;
> > This should

Re: [Qemu-devel] [Qemu-ppc] [PATCH v9 4/6] hw/ppc/spapr.c: migrate pending_dimm_unplugs of spapr state

2017-05-16 Thread David Gibson

On Fri, May 12, 2017 at 04:54:57PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 05/12/2017 03:12 AM, David Gibson wrote:
> > On Fri, May 05, 2017 at 05:47:44PM -0300, Daniel Henrique Barboza wrote:
> > > To allow for a DIMM unplug event to resume its work if a migration
> > > occurs in the middle of it, this patch migrates the non-empty
> > > pending_dimm_unplugs QTAILQ that stores the DIMM information
> > > that the spapr_lmb_release() callback uses.
> > > 
> > > An approach was considered where the DIMM states would be restored
> > > on the post-_load after a migration. The problem is that there is
> > > no way of knowing, from the sPAPRMachineState, if a given DIMM is going
> > > through an unplug process and the callback needs the updated DIMM State.
> > > 
> > > We could migrate a flag indicating that there is an unplug event going
> > > on for a certain DIMM, fetching this information from the start of the
> > > spapr_del_lmbs call. But this would also require a scan on post_load to
> > > figure out how many nr_lmbs are left. At this point we can just
> > > migrate the nr_lmbs information as well, given that it is being calculated
> > > at spapr_del_lmbs already, and spare a scanning/discovery in the
> > > post-load. All that we need is inside the sPAPRDIMMState structure
> > > that is added to the pending_dimm_unplugs queue at the start of the
> > > spapr_del_lmbs, so it's convenient to just migrate this queue if it's
> > > not empty.
> > > 
> > > Signed-off-by: Daniel Henrique Barboza 
> > NACK.
> > 
> > As I believe I suggested previously, you can reconstruct this state on
> > the receiving side by doing a full scan of the DIMM and LMB DRC states.
> 
> Just had an idea that I think it's in the line of what you're suggesting.
> Given
> that the information we need is only created in the spapr_del_lmbs
> (as per patch 1), we can use the absence of this information in the
> release callback as a sort of a flag, an indication that a migration got
> in the way and we need to reconstruct the nr_lmbs states again, using
> the same scanning function I've used in v8.
> 
> The flow would be like this (considering the changes in the
> previous 3 patches so far):
> 
> 
> 
> /* Callback to be called during DRC release. */
> void spapr_lmb_release(DeviceState *dev)
> {
>  HotplugHandler *hotplug_ctrl;
> 
>  uint64_t addr = spapr_dimm_get_address(PC_DIMM(dev));
>  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, addr);
> 
> // no DIMM state found in spapr - re-create it to find out how many LMBs are left
> if (ds == NULL) {
> uint32 nr_lmbs  = ***call_scanning_LMB_DRCs_function(dev)***
> // recreate the sPAPRDIMMState element and add it back to spapr
> }
> 
> ( resume callback as usual )
> 
> ---

Yes, the above seems like a reasonable plan.

> Would this approach be adequate? Another alternative would be to use another
> way of detecting if an LMB unplug is happening and, if positive, do the same
> process in the post_load(). In this case I'll need to take a look in the
> code and
> see how we can detect an ongoing unplug besides what I've said above.

You could, but I think the lazy approach above is preferable.
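
The lazy approach agreed on above can be sketched outside QEMU as a find-or-rebuild lookup. This is only an illustration under assumptions: `DimmState`, `Machine`, `pending_find`, `count_awaiting_release` and `pending_find_or_rebuild` are hypothetical stand-ins, and the array of "awaiting release" flags replaces the real scan of the LMB DRC states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for sPAPRDIMMState and the pending-unplug queue. */
typedef struct {
    uint64_t addr;      /* DIMM base address, used as the lookup key */
    uint32_t nr_lmbs;   /* LMBs still waiting to be released */
    int in_use;         /* non-zero when this slot holds a tracked DIMM */
} DimmState;

enum { MAX_PENDING = 8 };

typedef struct {
    DimmState pending[MAX_PENDING];
} Machine;

/* Look up the tracked state for a DIMM; NULL when nothing is tracked. */
DimmState *pending_find(Machine *m, uint64_t addr)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (m->pending[i].in_use && m->pending[i].addr == addr) {
            return &m->pending[i];
        }
    }
    return NULL;
}

/* Stand-in for the DRC scan: count LMBs flagged as awaiting release. */
uint32_t count_awaiting_release(const int *awaiting, size_t n)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (awaiting[i]) {
            count++;
        }
    }
    return count;
}

/* Return the tracked state, rebuilding it from a scan when it is missing,
 * as it would be on the destination if the queue was not migrated. */
DimmState *pending_find_or_rebuild(Machine *m, uint64_t addr,
                                   const int *awaiting, size_t n)
{
    DimmState *ds = pending_find(m, addr);
    if (ds != NULL) {
        return ds;
    }
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!m->pending[i].in_use) {
            m->pending[i].addr = addr;
            m->pending[i].nr_lmbs = count_awaiting_release(awaiting, n);
            m->pending[i].in_use = 1;
            return &m->pending[i];
        }
    }
    return NULL; /* no free slot left */
}
```

The point of the design is that nothing extra travels in the migration stream: the release callback pays for the scan only in the rare case where the state is absent.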

> 
> Thanks,
> 
> 
> Daniel
> 
> > 
> > > ---
> > >   hw/ppc/spapr.c | 31 +++
> > >   1 file changed, 31 insertions(+)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index e190eb9..30f0b7b 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1437,6 +1437,36 @@ static bool version_before_3(void *opaque, int version_id)
> > >   return version_id < 3;
> > >   }
> > > +static bool spapr_pending_dimm_unplugs_needed(void *opaque)
> > > +{
> > > +sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> > > +return !QTAILQ_EMPTY(&spapr->pending_dimm_unplugs);
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_spapr_dimmstate = {
> > > +.name = "spapr_dimm_state",
> > > +.version_id = 1,
> > > +.minimum_version_id = 1,
> > > +.fields = (VMStateField[]) {
> > > +VMSTATE_UINT64(addr, sPAPRDIMMState),
> > > +VMSTATE_UINT32(nr_lmbs, sPAPRDIMMState),
> > > +VMSTATE_END_OF_LIST()
> > > +},
> > > +};
> > > +
> > > +static const VMStateDescription vmstate_spapr_pending_dimm_unplugs = {
> > > +.name = "spapr_pending_dimm_unplugs",
> > > +.version_id = 1,
> > > +.minimum_version_id = 1,
> > > +.needed = spapr_pending_dimm_unplugs_needed,
> > > +.fields = (VMStateField[]) {
> > > +VMSTATE_QTAILQ_V(pending_dimm_unplugs, sPAPRMachineState, 1,
> > > + vmstate_spapr_dimmstate, sPAPRDIMMState,
> > > + next),
> > > +VMSTATE_END_OF_LIST()
> > > +},
> > > +};
> > > +
> > >   static bool spapr_ov5_cas_needed(void *opaque)
> > >   {
>

Re: [Qemu-devel] [PATCH RESEND V2 0/3] Optimize filter-mirror and filter-rewriter

2017-05-16 Thread Zhang Chen




On 05/16/2017 10:42 PM, Eric Blake wrote:

On 05/16/2017 03:22 AM, Zhang Chen wrote:


What's the difference compared with V2? I think I should apply this
series?

In V2 I forgot to remove "," in patch 3/3. Yes, please apply this series.

If you changed the patches, even just to add a ',', then it's best to
post the resend as "v3", not "RESEND V2".  Also, documenting what
changed between versions, rather than just blindly titling a message
RESEND (or even explaining WHY you are resending without change, such as
to correct an incorrect cc to save people the problem of bounce replies
when replying to the thread), is good netiquette.


OK, I got your point, will send the V3 later.

Thanks
Zhang Chen





--
Thanks
Zhang Chen

Re: [Qemu-devel] [PATCH v2] virtio: Move memory_listener_unregister to .unrealize

2017-05-16 Thread Fam Zheng

On Tue, 05/16 14:44, Paolo Bonzini wrote:
> 
> 
> On 16/05/2017 14:25, Fam Zheng wrote:
> > You are right. Having had another look, I think it's because of this:
> > VirtIODevice is an embedded member of VirtIOSCSIPCI therefore it is never
> > "finalized" through QOM reference directly.  Am I right?
> 
> What I would expect is:
> 
> virtio_instance_init_common:
> - create the VirtIODevice with refcount 1
> - create a child property for the VirtIODevice (refcount is now 2)
> - unref the VirtIODevice (refcount is again 1)
> 
> virtio_pci_realize:
> - virtio_pci_bus_new creates the virtio bus
> - the virtio bus is added as a child property
> 
> virtio_scsi_pci_realize:
> - qdev_set_parent_bus links the device to the bus (bus and VirtIODevice
> refcounts are now 3)
> - the VirtIODevice is realized
> 
> ...
> at hot-unplug time:
> - the device is unrealized
> - the bus is unparented (calling bus_unparent)
>   - the device is unparented (calling device_unparent)
> - bus_remove_child is called (bus and VirtIODevice refcounts are now 1)
> - the VirtIODevice child property is deleted by object_unparent and
> the VirtIODevice is finalized
>   - the bus child property is deleted by object_unparent and the
> VirtIODevice is finalized

Sorry, I don't understand. From my debugging, VirtIODevice is not finalized,
because it is embedded as VirtIOSCSIPCI.vdev.parent_obj.parent_obj, at a
non-zero offset.

As I understand it, it's the VirtIOSCSIPCI instance that is being finalized, but
not VirtIODevice. Because the object_unparent is actually called on the
containing object:

Thread 4 "qemu-system-x86" hit Breakpoint 1, device_unparent (obj=0x559449824f40) at /stor/work/qemu/hw/core/qdev.c:1078
1078        DeviceState *dev = DEVICE(obj);
(gdb) bt
#0  0x5594455f11df in device_unparent (obj=0x559449824f40) at /stor/work/qemu/hw/core/qdev.c:1078
#1  0x5594457be582 in object_finalize_child_property (obj=0x559447f0d320, name=0x5594498306e0 "scsi1", opaque=0x559449824f40) at /stor/work/qemu/qom/object.c:1369
#2  0x5594457bc34a in object_property_del_child (obj=0x559447f0d320, child=0x559449824f40, errp=0x0) at /stor/work/qemu/qom/object.c:428
#3  0x5594457bc42a in object_unparent (obj=0x559449824f40) at /stor/work/qemu/qom/object.c:447
#4  0x5594455acf19 in acpi_pcihp_eject_slot (s=0x55944967c440, bsel=0, slots=16) at /stor/work/qemu/hw/acpi/pcihp.c:138
#5  0x5594455ad542 in pci_write (opaque=0x55944967c440, addr=8, data=16, size=4) at /stor/work/qemu/hw/acpi/pcihp.c:272
#6  0x559445437667 in memory_region_write_accessor (mr=0x55944967d050, addr=8, value=0x7fdfa7569838, size=4, shift=0, mask=4294967295, attrs=...) at /stor/work/qemu/memory.c:526
#7  0x55944543787f in access_with_adjusted_size (addr=8, value=0x7fdfa7569838, size=4, access_size_min=1, access_size_max=4, access=0x55944543757d, mr=0x55944967d050, attrs=...) at /stor/work/qemu/memory.c:592
#8  0x559445439fdb in memory_region_dispatch_write (mr=0x55944967d050, addr=8, data=16, size=4, attrs=...) at /stor/work/qemu/memory.c:1319
#9  0x5594453dfa77 in address_space_write_continue (as=0x559445f1fce0, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4, addr1=8, l=4, mr=0x55944967d050) at /stor/work/qemu/exec.c:2822
#10 0x5594453dfc31 in address_space_write (as=0x559445f1fce0, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4) at /stor/work/qemu/exec.c:2879
#11 0x5594453dffbd in address_space_rw (as=0x559445f1fce0, addr=44552, attrs=..., buf=0x7fdfb77d2000 "\020", len=4, is_write=true) at /stor/work/qemu/exec.c:2981
#12 0x559445433797 in kvm_handle_io (port=44552, attrs=..., data=0x7fdfb77d2000, direction=1, size=4, count=1) at /stor/work/qemu/kvm-all.c:1803
#13 0x559445433e87 in kvm_cpu_exec (cpu=0x559447f0d560) at /stor/work/qemu/kvm-all.c:2032
#14 0x559445419cc6 in qemu_kvm_cpu_thread_fn (arg=0x559447f0d560) at /stor/work/qemu/cpus.c:1118
#15 0x7fdfb56ba6ca in start_thread () at /lib64/libpthread.so.0
#16 0x7fdfb1382f7f in clone () at /lib64/libc.so.6
(gdb) p *obj.class.type
$1 = {name = 0x559447e57a20 "virtio-scsi-pci", class_size = 280, instance_size = 34688, class_init = 0x55944573573e, class_base_init = 0x0,
  class_finalize = 0x0, class_data = 0x0, instance_init = 0x559445735837, instance_post_init = 0x0, instance_finalize = 0x0,
  abstract = false, parent = 0x559447e57a40 "virtio-pci", parent_type = 0x559447e57520, class = 0x559447e7add0, num_interfaces = 0, interfaces = {{
  typename = 0x0} }}

Am I missing something?

Fam

Re: [Qemu-devel] [PATCH 4/5] target/sh4: ignore interrupts in a delay slot

2017-05-16 Thread Philippe Mathieu-Daudé


On 05/16/2017 07:47 PM, Aurelien Jarno wrote:

Delay slots are indivisible, therefore avoid scheduling an interrupt in
the delay slot. However exceptions are possible.

Signed-off-by: Aurelien Jarno 


Reviewed-by: Philippe Mathieu-Daudé 


---
 target/sh4/helper.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index d420931530..19d4ec5fb5 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -871,8 +871,16 @@ int cpu_sh4_is_cached(CPUSH4State * env, target_ulong addr)
 bool superh_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 {
 if (interrupt_request & CPU_INTERRUPT_HARD) {
-superh_cpu_do_interrupt(cs);
-return true;
+SuperHCPU *cpu = SUPERH_CPU(cs);
+CPUSH4State *env = &cpu->env;
+
+/* Delay slots are indivisible, ignore interrupts */
+if (env->flags & DELAY_SLOT_MASK) {
+return false;
+} else {
+superh_cpu_do_interrupt(cs);
+return true;
+}
 }
 return false;
 }
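
The decision the patch above introduces can be isolated as a small predicate. A minimal sketch, assuming the flag layout from the DELAY_SLOT_MASK patch (bits 0 and 1); `IRQ_HARD` and `should_take_interrupt` are hypothetical names for illustration and not QEMU API:

```c
#include <stdbool.h>
#include <stdint.h>

#define DELAY_SLOT             (1u << 0)
#define DELAY_SLOT_CONDITIONAL (1u << 1)
#define DELAY_SLOT_MASK        (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)

/* Hypothetical request bit standing in for a hard interrupt request. */
#define IRQ_HARD               (1u << 1)

/* Return true when a pending hard interrupt may be delivered now.
 * Delay slots are indivisible, so delivery is postponed while either
 * delay-slot flag is set; the request stays pending for a later check. */
bool should_take_interrupt(uint32_t env_flags, uint32_t interrupt_request)
{
    if (!(interrupt_request & IRQ_HARD)) {
        return false;
    }
    return (env_flags & DELAY_SLOT_MASK) == 0;
}
```

Returning false here does not drop the interrupt; it only means the caller should try again once execution has left the delay slot.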

Re: [Qemu-devel] [PATCH 3/5] target/sh4: introduce DELAY_SLOT_MASK

2017-05-16 Thread Philippe Mathieu-Daudé


On 05/16/2017 07:47 PM, Aurelien Jarno wrote:

This will make it easier to introduce a new flag in the next
patches.


This makes the code cleaner and easier to read, no further explanation needed ;)



Signed-off-by: Aurelien Jarno 


Reviewed-by: Philippe Mathieu-Daudé 


---
 target/sh4/cpu.h   |  3 ++-
 target/sh4/helper.c|  4 ++--
 target/sh4/translate.c | 17 -
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index 6c07c6b24b..7969c9af98 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -91,6 +91,7 @@
 #define FPSCR_RM_NEAREST   (0 << 0)
 #define FPSCR_RM_ZERO  (1 << 0)

+#define DELAY_SLOT_MASK0x3
 #define DELAY_SLOT (1 << 0)
 #define DELAY_SLOT_CONDITIONAL (1 << 1)

@@ -380,7 +381,7 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
 {
 *pc = env->pc;
 *cs_base = 0;
-*flags = (env->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) /* Bits 0-1 */
+*flags = (env->flags & DELAY_SLOT_MASK)/* Bits  0- 1 */
 | (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 19-21 */
 | (env->sr & ((1u << SR_MD) | (1u << SR_RB)))  /* Bits 29-30 */
 | (env->sr & (1u << SR_FD))/* Bit 15 */
diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index 5296e7cf4e..d420931530 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -172,11 +172,11 @@ void superh_cpu_do_interrupt(CPUState *cs)
 env->sgr = env->gregs[15];
 env->sr |= (1u << SR_BL) | (1u << SR_MD) | (1u << SR_RB);

-if (env->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) {
+if (env->flags & DELAY_SLOT_MASK) {
 /* Branch instruction should be executed again before delay slot. */
env->spc -= 2;
/* Clear flags for exception/interrupt routine. */
-env->flags &= ~(DELAY_SLOT | DELAY_SLOT_CONDITIONAL);
+env->flags &= ~DELAY_SLOT_MASK;
 }

 if (do_exp) {
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 0bc2f9ff19..aba316f593 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -217,8 +217,7 @@ static inline void gen_save_cpu_state(DisasContext *ctx, bool save_pc)
 if (ctx->delayed_pc != (uint32_t) -1) {
 tcg_gen_movi_i32(cpu_delayed_pc, ctx->delayed_pc);
 }
-if ((ctx->tbflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL))
-!= ctx->envflags) {
+if ((ctx->tbflags & DELAY_SLOT_MASK) != ctx->envflags) {
 tcg_gen_movi_i32(cpu_flags, ctx->envflags);
 }
 }
@@ -329,7 +328,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define DREG(x) FREG(x) /* Assumes lsb of (x) is always 0 */

 #define CHECK_NOT_DELAY_SLOT \
-if (ctx->envflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
+if (ctx->envflags & DELAY_SLOT_MASK) {   \
 gen_save_cpu_state(ctx, true);   \
 gen_helper_raise_slot_illegal_instruction(cpu_env);  \
 ctx->bstate = BS_EXCP;   \
@@ -339,7 +338,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define CHECK_PRIVILEGED \
 if (IS_USER(ctx)) {  \
 gen_save_cpu_state(ctx, true);   \
-if (ctx->envflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
+if (ctx->envflags & DELAY_SLOT_MASK) {   \
 gen_helper_raise_slot_illegal_instruction(cpu_env);  \
 } else { \
 gen_helper_raise_illegal_instruction(cpu_env);   \
@@ -351,7 +350,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define CHECK_FPU_ENABLED\
 if (ctx->tbflags & (1u << SR_FD)) {  \
 gen_save_cpu_state(ctx, true);   \
-if (ctx->envflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
+if (ctx->envflags & DELAY_SLOT_MASK) {   \
 gen_helper_raise_slot_fpu_disable(cpu_env);  \
 } else { \
 gen_helper_raise_fpu_disable(cpu_env);   \
@@ -1784,7 +1783,7 @@ static void _decode_opc(DisasContext * ctx)
 fflush(stderr);
 #endif
 gen_save_cpu_state(ctx, true);
-if (ctx->envflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) {
+if (ctx->envflags & DELAY_SLOT_MASK) {
 gen_helper_raise_slot_illegal_instruction(cpu_env);
 } else {
 gen_helper_raise_illegal_instruction(cpu_env);
@@ -1798,9 +1797,9 @@ static void decode_opc(DisasContext * ctx)

 _decode_opc(ctx);

-if (old_flags & (DELAY_SLOT |

Re: [Qemu-devel] [PATCH 3/4] target/cris: optimize swap

2017-05-16 Thread Philippe Mathieu-Daudé


On 05/16/2017 08:01 PM, Aurelien Jarno wrote:

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson. Also use one less temp.

Signed-off-by: Aurelien Jarno 
---
 target/cris/translate.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 0ee05ca02d..103b214233 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -433,20 +433,19 @@ static inline void t_gen_subx_carry(DisasContext *dc, TCGv d)
T0 = ((T0 << 8) & 0xff00ff00) | ((T0 >> 8) & 0x00ff00ff)  */
 static inline void t_gen_swapb(TCGv d, TCGv s)
 {
-TCGv t, org_s;
+TCGv t, m;

 t = tcg_temp_new();
-org_s = tcg_temp_new();
+m = tcg_const_tl(0x00ff00ff);

 /* d and s may refer to the same object.  */
-tcg_gen_mov_tl(org_s, s);
-tcg_gen_shli_tl(t, org_s, 8);
-tcg_gen_andi_tl(d, t, 0xff00ff00);
-tcg_gen_shri_tl(t, org_s, 8);
-tcg_gen_andi_tl(t, t, 0x00ff00ff);
+tcg_gen_shri_tl(t, s, 8);
+tcg_gen_and_tl(t, t, m);
+tcg_gen_and_tl(d, s, m);


Maybe add a comment: /* set d 0xff00ff00 */

Anyway,
Reviewed-by: Philippe Mathieu-Daudé 


+tcg_gen_shli_tl(d, d, 8);
 tcg_gen_or_tl(d, d, t);
+tcg_temp_free(m);
 tcg_temp_free(t);
-tcg_temp_free(org_s);
 }

 /* Swap the halfwords of the s operand.  */
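
The rewritten sequence can be checked in plain C against the formula in the function's comment. A minimal sketch, assuming a 32-bit operand; `swapb_two_masks` and `swapb_one_mask` are hypothetical names that mirror the old and new TCG op sequences rather than any QEMU function:

```c
#include <stdint.h>

/* Old sequence: shift first, then apply two different mask constants. */
uint32_t swapb_two_masks(uint32_t s)
{
    return ((s << 8) & 0xff00ff00u) | ((s >> 8) & 0x00ff00ffu);
}

/* New sequence: a single mask, applied before the left shift, so only
 * one constant has to be loaded. */
uint32_t swapb_one_mask(uint32_t s)
{
    const uint32_t m = 0x00ff00ffu;
    uint32_t t = (s >> 8) & m;  /* high byte of each halfword, moved down */
    uint32_t d = s & m;         /* low byte of each halfword */
    d <<= 8;                    /* moved up; d now occupies 0xff00ff00 */
    return d | t;
}
```

Masking before shifting works because (s & 0x00ff00ff) << 8 selects the same bits as (s << 8) & 0xff00ff00.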

Re: [Qemu-devel] [PATCH 1/5] target/sh4: log unauthorized accesses using qemu_log_mask

2017-05-16 Thread Philippe Mathieu-Daudé


On 05/16/2017 07:47 PM, Aurelien Jarno wrote:

qemu_log_mask() is preferred over fprintf() for logging errors.

Signed-off-by: Aurelien Jarno 


Reviewed-by: Philippe Mathieu-Daudé 


---
 target/sh4/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index 8f8ce81401..4c024f9529 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -420,7 +420,7 @@ static int get_physical_address(CPUSH4State * env, target_ulong * physical,
 if (!(env->sr & (1u << SR_MD))
&& (address < 0xe0000000 || address >= 0xe4000000)) {
/* Unauthorized access in user mode (only store queues are available) */
-   fprintf(stderr, "Unauthorized access\n");
+qemu_log_mask(LOG_GUEST_ERROR, "Unauthorized access\n");
if (rw == 0)
return MMU_DADDR_ERROR_READ;
else if (rw == 1)

Re: [Qemu-devel] [Qemu-arm] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16

2017-05-16 Thread Philippe Mathieu-Daudé


On 05/16/2017 08:01 PM, Aurelien Jarno wrote:

Instead of byteswapping individual 16-bit words one by one, work on the
whole register at the same time using shifts and mask. This is the same
strategy as the aarch32 version of rev16 and is much more efficient
in the case sf=1.

Signed-off-by: Aurelien Jarno 


Reviewed-by: Philippe Mathieu-Daudé 


---
 target/arm/translate-a64.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..ed15d21655 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4035,24 +4035,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
 TCGv_i64 tcg_tmp = tcg_temp_new_i64();
 TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);

-tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
-tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
-
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
-
-if (sf) {
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
-
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
-}
+TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
+tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
+tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
+tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
+tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
+tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);

 tcg_temp_free_i64(tcg_tmp);
 }
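
The whole-register strategy from the patch above, written as ordinary C for reference. This is a sketch rather than the translator code: `rev16` here is a hypothetical host-side function, with `sf` selecting the 64-bit (non-zero) or 32-bit (zero) form as in the patch.

```c
#include <stdint.h>

/* Swap the two bytes inside every 16-bit halfword of rn.
 * sf != 0: operate on all 64 bits; sf == 0: operate on the low 32 bits,
 * with the upper half of the result cleared by the narrower mask. */
uint64_t rev16(uint64_t rn, int sf)
{
    const uint64_t mask = sf ? 0x00ff00ff00ff00ffull : 0x00ff00ffull;
    uint64_t tmp = (rn >> 8) & mask;   /* high bytes moved to low positions */
    uint64_t rd = (rn & mask) << 8;    /* low bytes moved to high positions */
    return rd | tmp;
}
```

Five operations cover the whole register regardless of width, where the removed code needed a mask, a bswap16 and a deposit for each halfword.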

[Qemu-devel] [PATCH v8 12/13] vfio/ccw: update sense data if a unit check is pending

2017-05-16 Thread Dong Jia Shi

Concurrent-sense data is currently not delivered. This patch stores
the concurrent-sense data to the subchannel if a unit check is pending
and the concurrent-sense bit is enabled. Then a TSCH can retrieve the
right IRB data back to the guest.

Acked-by: Alex Williamson 
Signed-off-by: Dong Jia Shi 
---
 hw/vfio/ccw.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 007ce435f1..12d0262336 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -94,6 +94,7 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
 CcwDevice *ccw_dev = CCW_DEVICE(cdev);
 SubchDev *sch = ccw_dev->sch;
 SCSW *s = &sch->curr_status.scsw;
+PMCW *p = &sch->curr_status.pmcw;
 IRB irb;
 int size;
 
@@ -143,6 +144,12 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
 /* Update control block via irb. */
 copy_scsw_to_guest(s, &irb.scsw);
 
+/* If a unit check is pending, copy sense data. */
+if ((s->dstat & SCSW_DSTAT_UNIT_CHECK) &&
+(p->chars & PMCW_CHARS_MASK_CSENSE)) {
+memcpy(sch->sense_data, irb.ecw, sizeof(irb.ecw));
+}
+
 read_err:
 css_inject_io_interrupt(sch);
 }
-- 
2.11.2

[Qemu-devel] [PATCH v8 09/13] vfio/ccw: get irqs info and set the eventfd fd

2017-05-16 Thread Dong Jia Shi

vfio-ccw resorts to the eventfd mechanism to communicate with userspace.
We fetch the irqs info via the ioctl VFIO_DEVICE_GET_IRQ_INFO,
register an event notifier to get the eventfd fd which is sent
to kernel via the ioctl VFIO_DEVICE_SET_IRQS, then we can implement
read operation once kernel sends the signal.
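
For readers unfamiliar with eventfd, here is a minimal Linux-only sketch of
the signalling primitive this relies on. It is not the vfio-ccw code: both
the producer and the consumer live in one process, and the function name
eventfd_roundtrip is made up for the example.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Signal a fresh eventfd `signals` times, then read it once.
 * Returns the counter value seen by the reader, or -1 on error.
 */
static int64_t eventfd_roundtrip(unsigned int signals)
{
    uint64_t one = 1;
    uint64_t count = 0;
    unsigned int i;
    int fd;

    if (signals == 0) {
        return 0;   /* a read on an empty blocking eventfd would not return */
    }

    fd = eventfd(0, 0);
    if (fd < 0) {
        return -1;
    }

    /* Producer side: each 8-byte write adds to the kernel counter. */
    for (i = 0; i < signals; i++) {
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return -1;
        }
    }

    /* Consumer side: one read returns the sum and resets it to zero. */
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        close(fd);
        return -1;
    }

    close(fd);
    return (int64_t)count;
}
```

In the patch the producer is the kernel driver and the consumer is the fd
handler registered with qemu_set_fd_handler().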

Reviewed-by: Eric Auger 
Acked-by: Alex Williamson 
Signed-off-by: Dong Jia Shi 
---
 hw/vfio/ccw.c | 101 ++
 1 file changed, 101 insertions(+)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 7ddcfd7767..689a7724b6 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -22,6 +22,7 @@
 #include "hw/vfio/vfio-common.h"
 #include "hw/s390x/s390-ccw.h"
 #include "hw/s390x/ccw-device.h"
+#include "qemu/error-report.h"
 
 #define TYPE_VFIO_CCW "vfio-ccw"
 typedef struct VFIOCCWDevice {
@@ -30,6 +31,7 @@ typedef struct VFIOCCWDevice {
 uint64_t io_region_size;
 uint64_t io_region_offset;
 struct ccw_io_region *io_region;
+EventNotifier io_notifier;
 } VFIOCCWDevice;
 
 static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
@@ -54,6 +56,97 @@ static void vfio_ccw_reset(DeviceState *dev)
 ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
 }
 
+static void vfio_ccw_io_notifier_handler(void *opaque)
+{
+VFIOCCWDevice *vcdev = opaque;
+
+if (!event_notifier_test_and_clear(&vcdev->io_notifier)) {
+return;
+}
+}
+
+static void vfio_ccw_register_io_notifier(VFIOCCWDevice *vcdev, Error **errp)
+{
+VFIODevice *vdev = &vcdev->vdev;
+struct vfio_irq_info *irq_info;
+struct vfio_irq_set *irq_set;
+size_t argsz;
+int32_t *pfd;
+
+if (vdev->num_irqs < VFIO_CCW_IO_IRQ_INDEX + 1) {
+error_setg(errp, "vfio: unexpected number of io irqs %u",
+   vdev->num_irqs);
+return;
+}
+
+argsz = sizeof(*irq_set);
+irq_info = g_malloc0(argsz);
+irq_info->index = VFIO_CCW_IO_IRQ_INDEX;
+irq_info->argsz = argsz;
+if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
+  irq_info) < 0 || irq_info->count < 1) {
+error_setg_errno(errp, errno, "vfio: Error getting irq info");
+goto out_free_info;
+}
+
+if (event_notifier_init(&vcdev->io_notifier, 0)) {
+error_setg_errno(errp, errno,
+ "vfio: Unable to init event notifier for IO");
+goto out_free_info;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_CCW_IO_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *) &irq_set->data;
+
+*pfd = event_notifier_get_fd(&vcdev->io_notifier);
+qemu_set_fd_handler(*pfd, vfio_ccw_io_notifier_handler, NULL, vcdev);
+if (ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
+error_setg(errp, "vfio: Failed to set up io notification");
+qemu_set_fd_handler(*pfd, NULL, NULL, vcdev);
+event_notifier_cleanup(&vcdev->io_notifier);
+}
+
+g_free(irq_set);
+
+out_free_info:
+g_free(irq_info);
+}
+
+static void vfio_ccw_unregister_io_notifier(VFIOCCWDevice *vcdev)
+{
+struct vfio_irq_set *irq_set;
+size_t argsz;
+int32_t *pfd;
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_CCW_IO_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *) &irq_set->data;
+*pfd = -1;
+
+if (ioctl(vcdev->vdev.fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
+error_report("vfio: Failed to de-assign device io fd: %m");
+}
+
+qemu_set_fd_handler(event_notifier_get_fd(&vcdev->io_notifier),
+NULL, NULL, vcdev);
+event_notifier_cleanup(&vcdev->io_notifier);
+
+g_free(irq_set);
+}
+
 static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
 {
 VFIODevice *vdev = &vcdev->vdev;
@@ -173,8 +266,15 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
 goto out_region_err;
 }
 
+vfio_ccw_register_io_notifier(vcdev, &err);
+if (err) {
+goto out_notifier_err;
+}
+
 return;
 
+out_notifier_err:
+vfio_ccw_put_region(vcdev);
 out_region_err:
 vfio_put_device(vcdev);
 out_device_err:
@@ -195,6 +295,7 @@ static void vfio_ccw_unrealize(DeviceState *dev, Error **errp)
 S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
 VFIOGroup *group = vcdev->vdev.group;
 
+vfio_ccw_unregister_io_notifier(vcdev);
 vfio_ccw_put_region(vcdev);
 vfio_put_device(vcdev);
 vfio_put_group(group);
-- 
2.11.2

Re: [Qemu-devel] [Qemu-arm] [PATCH 1/4] target/arm: optimize aarch32 rev16

2017-05-16 Thread Philippe Mathieu-Daudé


Hi Aurelien,

On 05/16/2017 08:01 PM, Aurelien Jarno wrote:

> Use the same mask to avoid having to load two different constants, as
> suggested by Richard Henderson.

What about
Suggested-by: Richard Henderson  ?

> Signed-off-by: Aurelien Jarno

Reviewed-by: Philippe Mathieu-Daudé

---
 target/arm/translate.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 0b5a0bca06..5becb2bb89 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -339,11 +339,13 @@ static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
 static void gen_rev16(TCGv_i32 var)
 {
 TCGv_i32 tmp = tcg_temp_new_i32();
+TCGv_i32 mask = tcg_const_i32(0x00ff00ff);
 tcg_gen_shri_i32(tmp, var, 8);
-tcg_gen_andi_i32(tmp, tmp, 0x00ff00ff);
+tcg_gen_and_i32(tmp, tmp, mask);
+tcg_gen_and_i32(var, var, mask);
 tcg_gen_shli_i32(var, var, 8);
-tcg_gen_andi_i32(var, var, 0xff00ff00);
 tcg_gen_or_i32(var, var, tmp);
+tcg_temp_free_i32(mask);
 tcg_temp_free_i32(tmp);
 }
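
For reference, the single-mask trick from the patch can be written in plain
C as below. This is only an illustrative sketch of the arithmetic, not TCG
code; the helper names rev16_32 and rev16_64 are made up here.

```c
#include <stdint.h>

/* Swap the two bytes inside each 16-bit halfword of a 32-bit value. */
static uint32_t rev16_32(uint32_t x)
{
    const uint32_t mask = 0x00ff00ffu;    /* one constant for both halves */
    uint32_t hi_to_lo = (x >> 8) & mask;  /* high bytes moved down */
    uint32_t lo_to_hi = (x & mask) << 8;  /* low bytes moved up */

    return lo_to_hi | hi_to_lo;
}

/* Same operation on all four halfwords of a 64-bit value. */
static uint64_t rev16_64(uint64_t x)
{
    const uint64_t mask = 0x00ff00ff00ff00ffull;

    return ((x & mask) << 8) | ((x >> 8) & mask);
}
```

Masking before the left shift is what allows 0x00ff00ff to be reused;
masking after it would need 0xff00ff00 instead.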

[Qemu-devel] [PATCH v8 06/13] s390x/css: device support for s390-ccw passthrough

2017-05-16 Thread Dong Jia Shi

In order to support subchannels pass-through, we introduce a s390
subchannel device called "s390-ccw" to hold the real subchannel info.
The s390-ccw devices inherit from the abstract CcwDevice, which connects
to the existing virtual-css-bus.
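
The "real subchannel info" is taken from the sysfs path of the host device.
A rough standalone sketch of that parsing step follows; it works on a plain
string instead of calling realpath(), and the names parse_host_id and
struct host_id as well as the 256-byte path limit are assumptions for
illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed "cssid.ssid.devid" triple. */
struct host_id {
    unsigned int cssid;
    unsigned int ssid;
    unsigned int devid;
};

/*
 * Expect a path of the form ".../<cssid>.<ssid>.<devid>/<mdev-name>".
 * The last component is copied to mdev, the one before it is parsed.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_host_id(const char *path, struct host_id *id,
                         char *mdev, size_t mdev_len)
{
    char buf[256];
    char *last, *parent;

    if (strlen(path) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, path);

    last = strrchr(buf, '/');
    if (!last || last == buf) {
        return -1;
    }
    *last = '\0';                       /* cut off the final component */
    if (strlen(last + 1) >= mdev_len) {
        return -1;
    }
    strcpy(mdev, last + 1);

    parent = strrchr(buf, '/');
    parent = parent ? parent + 1 : buf;
    if (sscanf(parent, "%2x.%1x.%4x",
               &id->cssid, &id->ssid, &id->devid) != 3) {
        return -1;
    }
    return 0;
}
```

Splitting by hand avoids basename() and dirname(), which may modify their
argument.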

Reviewed-by: Eric Auger 
Signed-off-by: Dong Jia Shi 
---
 hw/s390x/Makefile.objs  |   1 +
 hw/s390x/s390-ccw.c | 141 
 include/hw/s390x/s390-ccw.h |  38 
 3 files changed, 180 insertions(+)
 create mode 100644 hw/s390x/s390-ccw.c
 create mode 100644 include/hw/s390x/s390-ccw.h

diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 36bd4b1645..a8e5575a8a 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -14,3 +14,4 @@ obj-y += ccw-device.o
 obj-y += s390-pci-bus.o s390-pci-inst.o
 obj-y += s390-skeys.o
 obj-$(CONFIG_KVM) += s390-skeys-kvm.o
+obj-y += s390-ccw.o
diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
new file mode 100644
index 0000000000..e2b1973fda
--- /dev/null
+++ b/hw/s390x/s390-ccw.c
@@ -0,0 +1,141 @@
+/*
+ * s390 CCW Assignment Support
+ *
+ * Copyright 2017 IBM Corp
+ * Author(s): Dong Jia Shi 
+ *Xiao Feng Ren 
+ *Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2
+ * or (at your option) any later version. See the COPYING file in the
+ * top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/sysbus.h"
+#include "libgen.h"
+#include "hw/s390x/css.h"
+#include "hw/s390x/css-bridge.h"
+#include "hw/s390x/s390-ccw.h"
+
+static void s390_ccw_get_dev_info(S390CCWDevice *cdev,
+  char *sysfsdev,
+  Error **errp)
+{
+unsigned int cssid, ssid, devid;
+char dev_path[PATH_MAX] = {0}, *tmp;
+
+if (!sysfsdev) {
+error_setg(errp, "No host device provided");
+error_append_hint(errp,
+  "Use -device vfio-ccw,sysfsdev=PATH_TO_DEVICE\n");
+return;
+}
+
+if (!realpath(sysfsdev, dev_path)) {
+error_setg_errno(errp, errno, "Host device '%s' not found", sysfsdev);
+return;
+}
+
+cdev->mdevid = g_strdup(basename(dev_path));
+
+tmp = basename(dirname(dev_path));
+if (sscanf(tmp, "%2x.%1x.%4x", &cssid, &ssid, &devid) != 3) {
+error_setg_errno(errp, errno, "Failed to read %s", tmp);
+return;
+}
+
+cdev->hostid.cssid = cssid;
+cdev->hostid.ssid = ssid;
+cdev->hostid.devid = devid;
+cdev->hostid.valid = true;
+}
+
+static void s390_ccw_realize(S390CCWDevice *cdev, char *sysfsdev, Error **errp)
+{
+CcwDevice *ccw_dev = CCW_DEVICE(cdev);
+CCWDeviceClass *ck = CCW_DEVICE_GET_CLASS(ccw_dev);
+DeviceState *parent = DEVICE(ccw_dev);
+BusState *qbus = qdev_get_parent_bus(parent);
+VirtualCssBus *cbus = VIRTUAL_CSS_BUS(qbus);
+SubchDev *sch;
+int ret;
+Error *err = NULL;
+
+s390_ccw_get_dev_info(cdev, sysfsdev, &err);
+if (err) {
+goto out_err_propagate;
+}
+
+sch = css_create_sch(ccw_dev->devno, false, cbus->squash_mcss, &err);
+if (!sch) {
+goto out_mdevid_free;
+}
+sch->driver_data = cdev;
+
+ccw_dev->sch = sch;
+ret = css_sch_build_schib(sch, &cdev->hostid);
+if (ret) {
+error_setg_errno(&err, -ret, "%s: Failed to build initial schib",
+ __func__);
+goto out_err;
+}
+
+ck->realize(ccw_dev, &err);
+if (err) {
+goto out_err;
+}
+
+css_generate_sch_crws(sch->cssid, sch->ssid, sch->schid,
+  parent->hotplugged, 1);
+return;
+
+out_err:
+css_subch_assign(sch->cssid, sch->ssid, sch->schid, sch->devno, NULL);
+ccw_dev->sch = NULL;
+g_free(sch);
+out_mdevid_free:
+g_free(cdev->mdevid);
+out_err_propagate:
+error_propagate(errp, err);
+}
+
+static void s390_ccw_unrealize(S390CCWDevice *cdev, Error **errp)
+{
+CcwDevice *ccw_dev = CCW_DEVICE(cdev);
+SubchDev *sch = ccw_dev->sch;
+
+if (sch) {
+css_subch_assign(sch->cssid, sch->ssid, sch->schid, sch->devno, NULL);
+g_free(sch);
+ccw_dev->sch = NULL;
+}
+
+g_free(cdev->mdevid);
+}
+
+static void s390_ccw_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+S390CCWDeviceClass *cdc = S390_CCW_DEVICE_CLASS(klass);
+
+dc->bus_type = TYPE_VIRTUAL_CSS_BUS;
+cdc->realize = s390_ccw_realize;
+cdc->unrealize = s390_ccw_unrealize;
+}
+
+static const TypeInfo s390_ccw_info = {
+.name  = TYPE_S390_CCW,
+.parent= TYPE_CCW_DEVICE,
+.instance_size = sizeof(S390CCWDevice),
+.class_size= sizeof(S390CCWDeviceClass),
+.class_init= s390_ccw_class_init,
+.abstract  = true,
+};
+
+static void register_s390_ccw_type(void)
+{
+

[Qemu-devel] [PATCH v8 13/13] MAINTAINERS: Add vfio-ccw maintainer

2017-05-16 Thread Dong Jia Shi

Add Cornelia Huck as the vfio-ccw maintainer.

Acked-by: Alex Williamson 
Signed-off-by: Dong Jia Shi 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index efdec47319..a4ae36b411 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -997,6 +997,13 @@ S: Supported
 F: hw/vfio/*
 F: include/hw/vfio/
 
+vfio-ccw
+M: Cornelia Huck 
+S: Supported
+F: hw/vfio/ccw.c
+F: hw/s390x/s390-ccw.c
+F: include/hw/s390x/s390-ccw.h
+
 vhost
 M: Michael S. Tsirkin 
 S: Supported
-- 
2.11.2

[Qemu-devel] [PATCH v8 07/13] vfio/ccw: vfio based subchannel passthrough driver

2017-05-16 Thread Dong Jia Shi

From: Xiao Feng Ren 

We use the IOMMU_TYPE1 of VFIO to realize subchannel passthrough,
and implement a vfio based subchannel passthrough driver called
"vfio-ccw".
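
The driver locates its VFIO group through the iommu_group symlink of the
mediated device. A small sketch of how a group number can be pulled from
such a link target is shown here. It is string-only, the function name
iommu_group_from_link is hypothetical, and the example paths in the comment
are invented.

```c
#include <stdio.h>
#include <string.h>

/*
 * Given a symlink target such as "../../../kernel/iommu_groups/3",
 * return the trailing group number, or -1 if the last path component
 * does not start with a decimal number.
 */
static int iommu_group_from_link(const char *target)
{
    const char *base = strrchr(target, '/');
    int groupid;

    base = base ? base + 1 : target;
    if (sscanf(base, "%d", &groupid) != 1) {
        return -1;
    }
    return groupid;
}
```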

Support qemu parameters in the style of:
"-device vfio-ccw,sysfsdev=$mdev_file_path,devno=xx.x.xxxx"

Reviewed-by: Eric Auger 
Acked-by: Alex Williamson 
Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
---
 default-configs/s390x-softmmu.mak |   1 +
 hw/vfio/Makefile.objs |   1 +
 hw/vfio/ccw.c | 187 ++
 include/hw/vfio/vfio-common.h |   1 +
 4 files changed, 190 insertions(+)
 create mode 100644 hw/vfio/ccw.c

diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
index 9615a48f80..18aed56fc0 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -5,4 +5,5 @@ CONFIG_SCLPCONSOLE=y
 CONFIG_TERMINAL3270=y
 CONFIG_S390_FLIC=y
 CONFIG_S390_FLIC_KVM=$(CONFIG_KVM)
+CONFIG_VFIO_CCW=$(CONFIG_LINUX)
 CONFIG_WDT_DIAG288=y
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 05e7fbb93f..c3ab9097f1 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,6 +1,7 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o pci-quirks.o
+obj-$(CONFIG_VFIO_CCW) += ccw.o
 obj-$(CONFIG_SOFTMMU) += platform.o
 obj-$(CONFIG_VFIO_XGMAC) += calxeda-xgmac.o
 obj-$(CONFIG_VFIO_AMD_XGBE) += amd-xgbe.o
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
new file mode 100644
index 0000000000..7d2497cee6
--- /dev/null
+++ b/hw/vfio/ccw.c
@@ -0,0 +1,187 @@
+/*
+ * vfio based subchannel assignment support
+ *
+ * Copyright 2017 IBM Corp.
+ * Author(s): Dong Jia Shi 
+ *Xiao Feng Ren 
+ *Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or(at
+ * your option) any version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include <linux/vfio.h>
+#include <sys/ioctl.h>
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/sysbus.h"
+#include "hw/vfio/vfio.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/s390x/s390-ccw.h"
+#include "hw/s390x/ccw-device.h"
+
+#define TYPE_VFIO_CCW "vfio-ccw"
+typedef struct VFIOCCWDevice {
+S390CCWDevice cdev;
+VFIODevice vdev;
+} VFIOCCWDevice;
+
+static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
+{
+vdev->needs_reset = false;
+}
+
+/*
+ * We don't need vfio_hot_reset_multi and vfio_eoi operations for
+ * vfio_ccw device now.
+ */
+struct VFIODeviceOps vfio_ccw_ops = {
+.vfio_compute_needs_reset = vfio_ccw_compute_needs_reset,
+};
+
+static void vfio_ccw_reset(DeviceState *dev)
+{
+CcwDevice *ccw_dev = DO_UPCAST(CcwDevice, parent_obj, dev);
+S390CCWDevice *cdev = DO_UPCAST(S390CCWDevice, parent_obj, ccw_dev);
+VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
+
+ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
+}
+
+static void vfio_put_device(VFIOCCWDevice *vcdev)
+{
+g_free(vcdev->vdev.name);
+vfio_put_base_device(&vcdev->vdev);
+}
+
+static VFIOGroup *vfio_ccw_get_group(S390CCWDevice *cdev, Error **errp)
+{
+char *tmp, group_path[PATH_MAX];
+ssize_t len;
+int groupid;
+
+tmp = g_strdup_printf("/sys/bus/css/devices/%x.%x.%04x/%s/iommu_group",
+  cdev->hostid.cssid, cdev->hostid.ssid,
+  cdev->hostid.devid, cdev->mdevid);
+len = readlink(tmp, group_path, sizeof(group_path));
+g_free(tmp);
+
+if (len <= 0 || len >= sizeof(group_path)) {
+error_setg(errp, "vfio: no iommu_group found");
+return NULL;
+}
+
+group_path[len] = 0;
+
+if (sscanf(basename(group_path), "%d", &groupid) != 1) {
+error_setg(errp, "vfio: failed to read %s", group_path);
+return NULL;
+}
+
+return vfio_get_group(groupid, &address_space_memory, errp);
+}
+
+static void vfio_ccw_realize(DeviceState *dev, Error **errp)
+{
+VFIODevice *vbasedev;
+VFIOGroup *group;
+CcwDevice *ccw_dev = DO_UPCAST(CcwDevice, parent_obj, dev);
+S390CCWDevice *cdev = DO_UPCAST(S390CCWDevice, parent_obj, ccw_dev);
+VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
+S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
+Error *err = NULL;
+
+/* Call the class init function for subchannel. */
+if (cdc->realize) {
+cdc->realize(cdev, vcdev->vdev.sysfsdev, &err);
+if (err) {
+goto out_err_propagate;
+}
+}
+
+group = vfio_ccw_get_group(cdev, &err);
+if (!group) {
+goto out_group_err;
+}
+
+vcdev->vdev.ops = &vfio_ccw_ops;
+vcdev->vdev.type = VFIO_DEVICE_TYPE_CCW;
+vcdev->vdev.name = g_strdup_printf("%x.%x.%04x", cdev->hostid.cssid,
+

[Qemu-devel] [PATCH v8 05/13] s390x/css: realize css_create_sch

2017-05-16 Thread Dong Jia Shi

The S390 virtual css support already has a mechanism to create a
virtual subchannel and provide it to the guest. However, to
pass-through subchannels to a guest, we need to introduce a new
mechanism to create the subchannel according to the real device
information. Thus we rework css_create_virtual_sch into a new
css_create_sch function to handle all these cases and do allocation
and initialization of the subchannel according to the device type
and machine configuration.
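
To make the selection rules easier to review, here is a reduced model of how
the cssid is chosen. It is a sketch only: the function pick_cssid, the
"full" array standing in for the free-subchannel search, and the values used
for VIRTUAL_CSSID and MAX_CSSID are assumptions for this example.

```c
#include <stdbool.h>

#define VIRTUAL_CSSID 0xfe   /* assumed value, illustration only */
#define MAX_CSSID     255    /* assumed value, illustration only */

/*
 * Pick the channel subsystem image for a new subchannel.
 * full[i] == true models "no free subchannel left in image i".
 * Returns the chosen cssid, or -1 on error.
 */
static int pick_cssid(bool id_given, unsigned int requested,
                      bool is_virtual, bool squash_mcss,
                      unsigned int default_cssid,
                      const bool full[MAX_CSSID])
{
    unsigned int cssid;

    if (id_given) {
        /* Virtual devices must use the virtual cssid, others must not. */
        if (is_virtual != (requested == VIRTUAL_CSSID)) {
            return -1;
        }
        return squash_mcss ? (int)default_cssid : (int)requested;
    }

    if (squash_mcss || is_virtual) {
        return (int)default_cssid;
    }

    /* Non-virtual device without an id: first image with room. */
    for (cssid = 0; cssid < MAX_CSSID; cssid++) {
        if (cssid == VIRTUAL_CSSID) {
            continue;
        }
        if (!full[cssid]) {
            return (int)cssid;
        }
    }
    return -1;   /* every non-virtual image is full */
}
```

Note that the "is full" error is reported after the loop in this sketch.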

Reviewed-by: Pierre Morel 
Signed-off-by: Dong Jia Shi 
---
 hw/s390x/3270-ccw.c   |  6 +-
 hw/s390x/css-bridge.c |  2 ++
 hw/s390x/css.c| 45 ---
 hw/s390x/s390-virtio-ccw.c| 11 ---
 hw/s390x/virtio-ccw.c |  6 +-
 include/hw/s390x/css-bridge.h |  1 +
 include/hw/s390x/css.h| 25 
 7 files changed, 76 insertions(+), 20 deletions(-)

diff --git a/hw/s390x/3270-ccw.c b/hw/s390x/3270-ccw.c
index a7a5b412e4..6e6eee4e90 100644
--- a/hw/s390x/3270-ccw.c
+++ b/hw/s390x/3270-ccw.c
@@ -98,9 +98,13 @@ static void emulated_ccw_3270_realize(DeviceState *ds, Error **errp)
 EmulatedCcw3270Class *ck = EMULATED_CCW_3270_GET_CLASS(dev);
 CcwDevice *cdev = CCW_DEVICE(ds);
 CCWDeviceClass *cdk = CCW_DEVICE_GET_CLASS(cdev);
-SubchDev *sch = css_create_virtual_sch(cdev->devno, errp);
+DeviceState *parent = DEVICE(cdev);
+BusState *qbus = qdev_get_parent_bus(parent);
+VirtualCssBus *cbus = VIRTUAL_CSS_BUS(qbus);
+SubchDev *sch;
 Error *err = NULL;
 
+sch = css_create_sch(cdev->devno, true, cbus->squash_mcss, errp);
 if (!sch) {
 return;
 }
diff --git a/hw/s390x/css-bridge.c b/hw/s390x/css-bridge.c
index b54ac01d37..823747fcd7 100644
--- a/hw/s390x/css-bridge.c
+++ b/hw/s390x/css-bridge.c
@@ -17,6 +17,7 @@
 #include "hw/s390x/css.h"
 #include "ccw-device.h"
 #include "hw/s390x/css-bridge.h"
+#include "cpu.h"
 
 /*
  * Invoke device-specific unplug handler, disable the subchannel
@@ -103,6 +104,7 @@ VirtualCssBus *virtual_css_bus_init(void)
 /* Create bus on bridge device */
 bus = qbus_create(TYPE_VIRTUAL_CSS_BUS, dev, "virtual-css");
 cbus = VIRTUAL_CSS_BUS(bus);
+cbus->squash_mcss = s390_get_squash_mcss();
 
 /* Enable hotplugging */
 qbus_set_hotplug_handler(bus, dev, &error_abort);
diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 2c8d0e7219..a8aed9cb3a 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1948,28 +1948,59 @@ PropertyInfo css_devid_ro_propinfo = {
 .get = get_css_devid,
 };
 
-SubchDev *css_create_virtual_sch(CssDevId bus_id, Error **errp)
+SubchDev *css_create_sch(CssDevId bus_id, bool is_virtual, bool squash_mcss,
+ Error **errp)
 {
 uint16_t schid = 0;
 SubchDev *sch;
 
 if (bus_id.valid) {
-/* Enforce use of virtual cssid. */
-if (bus_id.cssid != VIRTUAL_CSSID) {
-error_setg(errp, "cssid %hhx not valid for virtual devices",
-   bus_id.cssid);
+if (is_virtual != (bus_id.cssid == VIRTUAL_CSSID)) {
+error_setg(errp, "cssid %hhx not valid for %s devices",
+   bus_id.cssid,
+   (is_virtual ? "virtual" : "non-virtual"));
 return NULL;
 }
+}
+
+if (bus_id.valid) {
+if (squash_mcss) {
+bus_id.cssid = channel_subsys.default_cssid;
+} else if (!channel_subsys.css[bus_id.cssid]) {
+css_create_css_image(bus_id.cssid, false);
+}
+
 if (!css_find_free_subch_for_devno(bus_id.cssid, bus_id.ssid,
bus_id.devid, &schid, errp)) {
 return NULL;
 }
-} else {
-bus_id.cssid = VIRTUAL_CSSID;
+} else if (squash_mcss || is_virtual) {
+bus_id.cssid = channel_subsys.default_cssid;
+
 if (!css_find_free_subch_and_devno(bus_id.cssid, &bus_id.ssid,
&bus_id.devid, &schid, errp)) {
 return NULL;
 }
+} else {
+for (bus_id.cssid = 0; bus_id.cssid < MAX_CSSID; ++bus_id.cssid) {
+if (bus_id.cssid == VIRTUAL_CSSID) {
+continue;
+}
+
+if (!channel_subsys.css[bus_id.cssid]) {
+css_create_css_image(bus_id.cssid, false);
+}
+
+if (css_find_free_subch_and_devno(bus_id.cssid, &bus_id.ssid,
+&bus_id.devid, &schid,
+NULL)) {
+break;
+}
+if (bus_id.cssid == MAX_CSSID) {
+error_setg(errp, "Virtual channel subsystem is full!");
+return NULL;
+}
+}
 }
 
 sch = g_malloc0(sizeof(*sch));
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index

[Qemu-devel] [PATCH v8 10/13] s390x/css: introduce and realize ccw-request callback

2017-05-16 Thread Dong Jia Shi

From: Xiao Feng Ren 

Introduce a new callback on the subchannel to handle ccw-requests and
realize it in the vfio-ccw device. In addition, use the event notifier
handler to process the ccw-request results:
1. Pread the I/O results via the MMIO region.
2. Update the scsw info for the guest.
3. Inject an I/O interrupt to notify the guest of the I/O result.
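
The request side of this is a positional write of the whole region with a
retry on EAGAIN. A generic sketch of that pattern is given below. It is not
the patch code: the name pwrite_region is made up, and the sketch also
retries on EINTR and reports a short write as -EIO, which are choices made
for this example only.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly len bytes at the given offset.
 * Transient failures (EAGAIN, EINTR) are retried.
 * Returns 0 on success or a negative errno value.
 */
static int pwrite_region(int fd, const void *buf, size_t len, off_t offset)
{
    for (;;) {
        ssize_t ret = pwrite(fd, buf, len, offset);

        if (ret == (ssize_t)len) {
            return 0;
        }
        if (ret < 0 && (errno == EAGAIN || errno == EINTR)) {
            continue;   /* try the same write again */
        }
        /* Hard error, or a short write that we treat as an I/O error. */
        return ret < 0 ? -errno : -EIO;
    }
}
```

Capturing errno right after the failing call matters here, since a logging
call in between could overwrite it.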

Acked-by: Alex Williamson 
Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
---
 hw/s390x/css.c  |  4 +--
 hw/vfio/ccw.c   | 85 +
 include/hw/s390x/css.h  |  2 ++
 include/hw/s390x/s390-ccw.h |  1 +
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index a8aed9cb3a..462a768f9e 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -259,7 +259,7 @@ uint16_t css_build_subchannel_id(SubchDev *sch)
 return css_do_build_subchannel_id(sch->cssid, sch->ssid);
 }
 
-static void css_inject_io_interrupt(SubchDev *sch)
+void css_inject_io_interrupt(SubchDev *sch)
 {
 uint8_t isc = (sch->curr_status.pmcw.flags & PMCW_FLAGS_MASK_ISC) >> 11;
 
@@ -671,7 +671,7 @@ static void copy_pmcw_to_guest(PMCW *dest, const PMCW *src)
 dest->chars = cpu_to_be32(src->chars);
 }
 
-static void copy_scsw_to_guest(SCSW *dest, const SCSW *src)
+void copy_scsw_to_guest(SCSW *dest, const SCSW *src)
 {
 dest->flags = cpu_to_be16(src->flags);
 dest->ctrl = cpu_to_be16(src->ctrl);
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 689a7724b6..007ce435f1 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -47,6 +47,36 @@ struct VFIODeviceOps vfio_ccw_ops = {
 .vfio_compute_needs_reset = vfio_ccw_compute_needs_reset,
 };
 
+static int vfio_ccw_handle_request(ORB *orb, SCSW *scsw, void *data)
+{
+S390CCWDevice *cdev = data;
+VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
+struct ccw_io_region *region = vcdev->io_region;
+int ret;
+
+QEMU_BUILD_BUG_ON(sizeof(region->orb_area) != sizeof(ORB));
+QEMU_BUILD_BUG_ON(sizeof(region->scsw_area) != sizeof(SCSW));
+QEMU_BUILD_BUG_ON(sizeof(region->irb_area) != sizeof(IRB));
+
+memset(region, 0, sizeof(*region));
+
+memcpy(region->orb_area, orb, sizeof(ORB));
+memcpy(region->scsw_area, scsw, sizeof(SCSW));
+
+again:
+ret = pwrite(vcdev->vdev.fd, region,
+ vcdev->io_region_size, vcdev->io_region_offset);
+if (ret != vcdev->io_region_size) {
+if (errno == EAGAIN) {
+goto again;
+}
+error_report("vfio-ccw: write I/O region failed with errno=%d", errno);
+return -errno;
+}
+
+return region->ret_code;
+}
+
 static void vfio_ccw_reset(DeviceState *dev)
 {
 CcwDevice *ccw_dev = DO_UPCAST(CcwDevice, parent_obj, dev);
@@ -59,10 +89,62 @@ static void vfio_ccw_reset(DeviceState *dev)
 static void vfio_ccw_io_notifier_handler(void *opaque)
 {
 VFIOCCWDevice *vcdev = opaque;
+struct ccw_io_region *region = vcdev->io_region;
+S390CCWDevice *cdev = S390_CCW_DEVICE(vcdev);
+CcwDevice *ccw_dev = CCW_DEVICE(cdev);
+SubchDev *sch = ccw_dev->sch;
+SCSW *s = &sch->curr_status.scsw;
+IRB irb;
+int size;
 
 if (!event_notifier_test_and_clear(&vcdev->io_notifier)) {
 return;
 }
+
+size = pread(vcdev->vdev.fd, region, vcdev->io_region_size,
+ vcdev->io_region_offset);
+if (size == -1) {
+switch (errno) {
+case ENODEV:
+/* Generate a deferred cc 3 condition. */
+s->flags |= SCSW_FLAGS_MASK_CC;
+s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
+s->ctrl |= (SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND);
+goto read_err;
+case EFAULT:
+/* Memory problem, generate channel data check. */
+s->ctrl &= ~SCSW_ACTL_START_PEND;
+s->cstat = SCSW_CSTAT_DATA_CHECK;
+s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
+s->ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
+   SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
+goto read_err;
+default:
+/* Error, generate channel program check. */
+s->ctrl &= ~SCSW_ACTL_START_PEND;
+s->cstat = SCSW_CSTAT_PROG_CHECK;
+s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
+s->ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
+   SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
+goto read_err;
+}
+} else if (size != vcdev->io_region_size) {
+/* Information transfer error, generate channel-control check. */
+s->ctrl &= ~SCSW_ACTL_START_PEND;
+s->cstat = SCSW_CSTAT_CHN_CTRL_CHK;
+s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
+s->ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
+   SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
+goto read_err;
+}
+
+

[Qemu-devel] [PATCH v8 08/13] vfio/ccw: get io region info

2017-05-16 Thread Dong Jia Shi

vfio-ccw provides an MMIO region for I/O operations. We fetch its
information via ioctls here, so that we can later use it to perform
I/O instructions and retrieve I/O results.
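
The key safety step when fetching the region info is to compare the size
reported by the kernel with the size of the local structure before
allocating a buffer. A sketch of that check follows; struct io_region_sketch
and its field sizes are a hypothetical stand-in rather than the real kernel
header, and alloc_io_region is a made-up name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the I/O region layout; sizes are assumed. */
struct io_region_sketch {
    uint8_t orb_area[12];
    uint8_t scsw_area[12];
    uint8_t irb_area[96];
    uint32_t ret_code;
};

/*
 * Allocate a zeroed bounce buffer for the region, but only if the
 * size reported by the driver matches what this code was built for.
 * Returns NULL on mismatch or allocation failure.
 */
static struct io_region_sketch *alloc_io_region(uint64_t reported_size)
{
    if (reported_size != sizeof(struct io_region_sketch)) {
        return NULL;
    }
    return calloc(1, sizeof(struct io_region_sketch));
}
```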

Reviewed-by: Eric Auger 
Acked-by: Alex Williamson 
Signed-off-by: Dong Jia Shi 
---
 hw/vfio/ccw.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 7d2497cee6..7ddcfd7767 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/vfio.h>
+#include <linux/vfio_ccw.h>
 #include <sys/ioctl.h>
 
 #include "qemu/osdep.h"
@@ -26,6 +27,9 @@
 typedef struct VFIOCCWDevice {
 S390CCWDevice cdev;
 VFIODevice vdev;
+uint64_t io_region_size;
+uint64_t io_region_offset;
+struct ccw_io_region *io_region;
 } VFIOCCWDevice;
 
 static void vfio_ccw_compute_needs_reset(VFIODevice *vdev)
@@ -50,6 +54,48 @@ static void vfio_ccw_reset(DeviceState *dev)
 ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
 }
 
+static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
+{
+VFIODevice *vdev = &vcdev->vdev;
+struct vfio_region_info *info;
+int ret;
+
+/* Sanity check device */
+if (!(vdev->flags & VFIO_DEVICE_FLAGS_CCW)) {
+error_setg(errp, "vfio: Um, this isn't a vfio-ccw device");
+return;
+}
+
+if (vdev->num_regions < VFIO_CCW_CONFIG_REGION_INDEX + 1) {
+error_setg(errp, "vfio: Unexpected number of the I/O region %u",
+   vdev->num_regions);
+return;
+}
+
+ret = vfio_get_region_info(vdev, VFIO_CCW_CONFIG_REGION_INDEX, &info);
+if (ret) {
+error_setg_errno(errp, -ret, "vfio: Error getting config info");
+return;
+}
+
+vcdev->io_region_size = info->size;
+if (sizeof(*vcdev->io_region) != vcdev->io_region_size) {
+error_setg(errp, "vfio: Unexpected size of the I/O region");
+g_free(info);
+return;
+}
+
+vcdev->io_region_offset = info->offset;
+vcdev->io_region = g_malloc0(info->size);
+
+g_free(info);
+}
+
+static void vfio_ccw_put_region(VFIOCCWDevice *vcdev)
+{
+g_free(vcdev->io_region);
+}
+
 static void vfio_put_device(VFIOCCWDevice *vcdev)
 {
 g_free(vcdev->vdev.name);
@@ -122,8 +168,15 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
 goto out_device_err;
 }
 
+vfio_ccw_get_region(vcdev, &err);
+if (err) {
+goto out_region_err;
+}
+
 return;
 
+out_region_err:
+vfio_put_device(vcdev);
 out_device_err:
 vfio_put_group(group);
 out_group_err:
@@ -142,6 +195,7 @@ static void vfio_ccw_unrealize(DeviceState *dev, Error **errp)
 S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
 VFIOGroup *group = vcdev->vdev.group;
 
+vfio_ccw_put_region(vcdev);
 vfio_put_device(vcdev);
 vfio_put_group(group);
 
-- 
2.11.2

[Qemu-devel] [PATCH v8 04/13] s390x/css: realize css_sch_build_schib

2017-05-16 Thread Dong Jia Shi

From: Xiao Feng Ren 

The S390 virtual css support already has a mechanism to build a
virtual subchannel information block (schib) and provide virtual
subchannels to the guest. However, to pass-through subchannels to
a guest, we need to introduce a new mechanism to build its schib
according to the real device information. Thus we realize a new
css_sch_build_schib function to extract the path_masks, chpids, chpid
type from sysfs. To reuse the existing code, we refactor
css_add_virtual_chpid to css_add_chpid.
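
The sysfs attributes involved are short hexadecimal text files. As a sketch
of the parsing idea only, the following works on a string instead of an
open file; the names parse_path_masks and struct path_masks are invented,
and the range check is an addition of this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for the three path masks. */
struct path_masks {
    uint8_t pim;
    uint8_t pam;
    uint8_t pom;
};

/*
 * Parse three whitespace-separated hex bytes, e.g. "80 80 ff".
 * Returns 0 on success, -1 if fewer than three values are present
 * or a value does not fit into one byte.
 */
static int parse_path_masks(const char *text, struct path_masks *out)
{
    unsigned int pim, pam, pom;

    if (sscanf(text, "%x %x %x", &pim, &pam, &pom) != 3) {
        return -1;
    }
    if (pim > 0xff || pam > 0xff || pom > 0xff) {
        return -1;
    }

    out->pim = (uint8_t)pim;
    out->pam = (uint8_t)pam;
    out->pom = (uint8_t)pom;
    return 0;
}
```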

Reviewed-by: Pierre Morel 
Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
---
 hw/s390x/css.c | 152 -
 include/hw/s390x/css.h |  36 ++--
 2 files changed, 168 insertions(+), 20 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 15c4f4b249..2c8d0e7219 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -13,6 +13,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "hw/qdev.h"
+#include "qemu/error-report.h"
 #include "qemu/bitops.h"
 #include "exec/address-spaces.h"
 #include "cpu.h"
@@ -1326,7 +1327,8 @@ unsigned int css_find_free_chpid(uint8_t cssid)
 return MAX_CHPID + 1;
 }
 
-static int css_add_virtual_chpid(uint8_t cssid, uint8_t chpid, uint8_t type)
+static int css_add_chpid(uint8_t cssid, uint8_t chpid, uint8_t type,
+ bool is_virt)
 {
 CssImage *css;
 
@@ -1340,7 +1342,7 @@ static int css_add_virtual_chpid(uint8_t cssid, uint8_t chpid, uint8_t type)
 }
 css->chpids[chpid].in_use = 1;
 css->chpids[chpid].type = type;
-css->chpids[chpid].is_virtual = 1;
+css->chpids[chpid].is_virtual = is_virt;
 
 css_generate_chp_crws(cssid, chpid);
 
@@ -1364,7 +1366,7 @@ void css_sch_build_virtual_schib(SubchDev *sch, uint8_t chpid, uint8_t type)
 p->pam = 0x80;
 p->chpid[0] = chpid;
 if (!css->chpids[chpid].in_use) {
-css_add_virtual_chpid(sch->cssid, chpid, type);
+css_add_chpid(sch->cssid, chpid, type, true);
 }
 
 memset(s, 0, sizeof(SCSW));
@@ -1978,3 +1980,147 @@ SubchDev *css_create_virtual_sch(CssDevId bus_id, Error **errp)
 css_subch_assign(sch->cssid, sch->ssid, schid, sch->devno, sch);
 return sch;
 }
+
+static int css_sch_get_chpids(SubchDev *sch, CssDevId *dev_id)
+{
+char *fid_path;
+FILE *fd;
+uint32_t chpid[8];
+int i;
+PMCW *p = &sch->curr_status.pmcw;
+
+fid_path = g_strdup_printf("/sys/bus/css/devices/%x.%x.%04x/chpids",
+   dev_id->cssid, dev_id->ssid, dev_id->devid);
+fd = fopen(fid_path, "r");
+if (fd == NULL) {
+error_report("%s: open %s failed", __func__, fid_path);
+g_free(fid_path);
+return -EINVAL;
+}
+
+if (fscanf(fd, "%x %x %x %x %x %x %x %x",
+&chpid[0], &chpid[1], &chpid[2], &chpid[3],
+&chpid[4], &chpid[5], &chpid[6], &chpid[7]) != 8) {
+fclose(fd);
+g_free(fid_path);
+return -EINVAL;
+}
+
+for (i = 0; i < ARRAY_SIZE(p->chpid); i++) {
+p->chpid[i] = chpid[i];
+}
+
+fclose(fd);
+g_free(fid_path);
+
+return 0;
+}
+
+static int css_sch_get_path_masks(SubchDev *sch, CssDevId *dev_id)
+{
+char *fid_path;
+FILE *fd;
+uint32_t pim, pam, pom;
+PMCW *p = &sch->curr_status.pmcw;
+
+fid_path = g_strdup_printf("/sys/bus/css/devices/%x.%x.%04x/pimpampom",
+   dev_id->cssid, dev_id->ssid, dev_id->devid);
+fd = fopen(fid_path, "r");
+if (fd == NULL) {
+error_report("%s: open %s failed", __func__, fid_path);
+g_free(fid_path);
+return -EINVAL;
+}
+
+if (fscanf(fd, "%x %x %x", &pim, &pam, &pom) != 3) {
+fclose(fd);
+g_free(fid_path);
+return -EINVAL;
+}
+
+p->pim = pim;
+p->pam = pam;
+p->pom = pom;
+fclose(fd);
+g_free(fid_path);
+
+return 0;
+}
+
+static int css_sch_get_chpid_type(uint8_t chpid, uint32_t *type,
+  CssDevId *dev_id)
+{
+char *fid_path;
+FILE *fd;
+
+fid_path = g_strdup_printf("/sys/devices/css%x/chp0.%02x/type",
+   dev_id->cssid, chpid);
+fd = fopen(fid_path, "r");
+if (fd == NULL) {
+error_report("%s: open %s failed", __func__, fid_path);
+g_free(fid_path);
+return -EINVAL;
+}
+
+if (fscanf(fd, "%x", type) != 1) {
+fclose(fd);
+g_free(fid_path);
+return -EINVAL;
+}
+
+fclose(fd);
+g_free(fid_path);
+
+return 0;
+}
+
+/*
+ * We currently retrieve the real device information from sysfs to build the
+ * guest subchannel information block without considering the migration feature.
+ * We need to revisit this problem when we want to add migration support.
+ */
+int css_sch_build_schib(SubchDev *sch, CssDevId *dev_id)
+{
+CssImage *css = channel_subsys.css[sch->cssid];
+PMCW *p

[Qemu-devel] [PATCH v8 01/13] update-linux-headers: update for vfio-ccw

2017-05-16 Thread Dong Jia Shi

Add vfio_ccw.h.

Signed-off-by: Dong Jia Shi 
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 6a370a8669..2f906c4d16 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -113,7 +113,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h kvm_para.h vfio.h vhost.h \
+for header in kvm.h kvm_para.h vfio.h vfio_ccw.h vhost.h \
   psci.h userfaultfd.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
-- 
2.11.2

[Qemu-devel] [PATCH v8 11/13] s390x/css: ccw translation infrastructure

2017-05-16 Thread Dong Jia Shi

From: Xiao Feng Ren 

Implement a basic infrastructure for handling channel I/O instruction
interception for passed-through subchannels:
1. Branch the code path of instruction interception handling by
   SubChannel type.
2. For a passed-through subchannel, issue the ORB to the kernel to do ccw
   translation and perform an I/O operation.
3. Assign a different condition code based on the I/O result, or
   trigger a program check.
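
Step 3 can be pictured as a small mapping from the request's return value to
a condition code. The sketch below is an assumption-laden illustration, not
the code in target/s390x/ioinst.c: the function name ssch_result_to_cc is
made up, the cc 2 for -EBUSY is assumed, and -1 simply means "left to the
caller", which may raise a program check or set cc 1.

```c
#include <errno.h>

/*
 * Map the return value of a passthrough start request to a
 * condition code (0..3). Returns -1 for everything the caller
 * has to resolve itself (program check or cc 1).
 */
static int ssch_result_to_cc(int ret)
{
    switch (ret) {
    case 0:
        return 0;       /* request accepted */
    case -EBUSY:
        return 2;       /* assumed: subchannel busy */
    case -ENODEV:
    case -EACCES:
        return 3;       /* device not operational or not accessible */
    default:
        return -1;      /* caller decides: program check or cc 1 */
    }
}
```

Folding -EACCES into the cc 3 case mirrors the comment in the patch about
reflecting an inaccessible host device by cc 3.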

Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
---
 hw/s390x/css.c | 89 ++
 hw/s390x/s390-ccw.c| 12 +++
 hw/s390x/virtio-ccw.c  |  1 +
 include/hw/s390x/css.h |  4 +++
 target/s390x/ioinst.c  |  9 +
 5 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 462a768f9e..1e2f26b65a 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -524,7 +524,7 @@ static int css_interpret_ccw(SubchDev *sch, hwaddr ccw_addr,
 return ret;
 }
 
-static void sch_handle_start_func(SubchDev *sch, ORB *orb)
+static void sch_handle_start_func_virtual(SubchDev *sch, ORB *orb)
 {
 
 PMCW *p = &sch->curr_status.pmcw;
@@ -626,13 +626,58 @@ static void sch_handle_start_func(SubchDev *sch, ORB *orb)
 
 }
 
+static int sch_handle_start_func_passthrough(SubchDev *sch, ORB *orb)
+{
+
+PMCW *p = &sch->curr_status.pmcw;
+SCSW *s = &sch->curr_status.scsw;
+int ret;
+
+if (!(s->ctrl & SCSW_ACTL_SUSP)) {
+assert(orb != NULL);
+p->intparm = orb->intparm;
+}
+
+/*
+ * Only support prefetch enable mode.
+ * Only support 64bit addressing idal.
+ */
+if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
+!(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
+return -EINVAL;
+}
+
+ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
+switch (ret) {
+/* Currently we don't update control block and just return the cc code. */
+case 0:
+break;
+case -EBUSY:
+break;
+case -ENODEV:
+break;
+case -EACCES:
+/* Let's reflect an inaccessible host device by cc 3. */
+ret = -ENODEV;
+break;
+default:
+   /*
+* All other return codes will trigger a program check,
+* or set cc to 1.
+*/
+   break;
+};
+
+return ret;
+}
+
 /*
  * On real machines, this would run asynchronously to the main vcpus.
  * We might want to make some parts of the ssch handling (interpreting
  * read/writes) asynchronous later on if we start supporting more than
  * our current very simple devices.
  */
-static void do_subchannel_work(SubchDev *sch, ORB *orb)
+int do_subchannel_work_virtual(SubchDev *sch, ORB *orb)
 {
 
 SCSW *s = &sch->curr_status.scsw;
@@ -643,12 +688,45 @@ static void do_subchannel_work(SubchDev *sch, ORB *orb)
 sch_handle_halt_func(sch);
 } else if (s->ctrl & SCSW_FCTL_START_FUNC) {
 /* Triggered by both ssch and rsch. */
-sch_handle_start_func(sch, orb);
+sch_handle_start_func_virtual(sch, orb);
 } else {
 /* Cannot happen. */
-return;
+return 0;
 }
 css_inject_io_interrupt(sch);
+return 0;
+}
+
+int do_subchannel_work_passthrough(SubchDev *sch, ORB *orb)
+{
+int ret;
+SCSW *s = &sch->curr_status.scsw;
+
+if (s->ctrl & SCSW_FCTL_CLEAR_FUNC) {
+/* TODO: Clear handling */
+sch_handle_clear_func(sch);
+ret = 0;
+} else if (s->ctrl & SCSW_FCTL_HALT_FUNC) {
+/* TODO: Halt handling */
+sch_handle_halt_func(sch);
+ret = 0;
+} else if (s->ctrl & SCSW_FCTL_START_FUNC) {
+ret = sch_handle_start_func_passthrough(sch, orb);
+} else {
+/* Cannot happen. */
+return -ENODEV;
+}
+
+return ret;
+}
+
+static int do_subchannel_work(SubchDev *sch, ORB *orb)
+{
+if (sch->do_subchannel_work) {
+return sch->do_subchannel_work(sch, orb);
+} else {
+return -EINVAL;
+}
 }
 
 static void copy_pmcw_to_guest(PMCW *dest, const PMCW *src)
@@ -967,8 +1045,7 @@ int css_do_ssch(SubchDev *sch, ORB *orb)
 s->ctrl |= (SCSW_FCTL_START_FUNC | SCSW_ACTL_START_PEND);
 s->flags &= ~SCSW_FLAGS_MASK_PNO;
 
-do_subchannel_work(sch, orb);
-ret = 0;
+ret = do_subchannel_work(sch, orb);
 
 out:
 return ret;
diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
index e2b1973fda..8614dda6f8 100644
--- a/hw/s390x/s390-ccw.c
+++ b/hw/s390x/s390-ccw.c
@@ -18,6 +18,17 @@
 #include "hw/s390x/css-bridge.h"
 #include "hw/s390x/s390-ccw.h"
 
+int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
+{
+S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(data);
+
+if (cdc->handle_request) {
+return cdc->handle_request(orb, scsw, data);
+} else {
+return -ENOSYS;
+}
+}
+
 static void s390_ccw_get_dev_info(S390CCWDevice *cdev,
   char *sysfsdev,

[Qemu-devel] [PATCH v8 03/13] s390x/css: add s390-squash-mcss machine option

2017-05-16 Thread Dong Jia Shi

From: Xiao Feng Ren 

We want to support real (i.e. not virtual) channel devices
even for guests that do not support MCSS-E (where guests may
see devices from any channel subsystem image at once). As all
virtio-ccw devices are in css 0xfe (and show up in the default
css 0 for guests not activating MCSS-E), we need an option to
squash both the virtio subchannels and e.g. passed-through
subchannels from their real css (0-3, or 0 for hosts not
activating MCSS-E) into the default css. This will be
exploited in a later patch.
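
To make the intended effect concrete, here is a minimal sketch of how a
subchannel's guest-visible css could be chosen. The helper name and its
signature are hypothetical; only the values 0xfe (virtual css) and 0
(default css) come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_CSSID 0x00  /* css seen by guests without MCSS-E */
#define VIRTUAL_CSSID 0xfe  /* css that holds all virtio-ccw devices */

/*
 * Hypothetical helper: pick the css a subchannel is placed in.
 * With squashing enabled everything lands in the default css,
 * otherwise virtual devices go to 0xfe and passed-through devices
 * keep their host css.
 */
uint8_t guest_cssid(uint8_t host_cssid, bool is_virtual, bool squash_mcss)
{
    if (squash_mcss) {
        return DEFAULT_CSSID;
    }
    return is_virtual ? VIRTUAL_CSSID : host_cssid;
}
```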

Signed-off-by: Xiao Feng Ren 
Signed-off-by: Dong Jia Shi 
---
 hw/s390x/s390-virtio-ccw.c | 21 +
 include/hw/s390x/s390-virtio-ccw.h |  1 +
 qemu-options.hx|  6 +-
 target/s390x/cpu.h | 10 ++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index fdd4384ff0..cd007ca8cf 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -303,6 +303,20 @@ static void machine_set_loadparm(Object *obj, const char *val, Error **errp)
 ms->loadparm[i] = ' '; /* pad right with spaces */
 }
 }
+static inline bool machine_get_squash_mcss(Object *obj, Error **errp)
+{
+S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
+
+return ms->s390_squash_mcss;
+}
+
+static inline void machine_set_squash_mcss(Object *obj, bool value,
+   Error **errp)
+{
+S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
+
+ms->s390_squash_mcss = value;
+}
 
 static inline void s390_machine_initfn(Object *obj)
 {
@@ -328,6 +342,13 @@ static inline void s390_machine_initfn(Object *obj)
 " to upper case) to pass to machine loader, boot manager,"
 " and guest kernel",
 NULL);
+object_property_add_bool(obj, "s390-squash-mcss",
+ machine_get_squash_mcss,
+ machine_set_squash_mcss, NULL);
+object_property_set_description(obj, "s390-squash-mcss",
+"enable/disable squashing subchannels into the default css",
+NULL);
+object_property_set_bool(obj, false, "s390-squash-mcss", NULL);
 }
 
 static const TypeInfo ccw_machine_info = {
diff --git a/include/hw/s390x/s390-virtio-ccw.h b/include/hw/s390x/s390-virtio-ccw.h
index 7b8a3e4d74..3027555f6d 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -29,6 +29,7 @@ typedef struct S390CcwMachineState {
 bool aes_key_wrap;
 bool dea_key_wrap;
 uint8_t loadparm[8];
+bool s390_squash_mcss;
 } S390CcwMachineState;
 
 typedef struct S390CcwMachineClass {
diff --git a/qemu-options.hx b/qemu-options.hx
index f68829f3b0..090c6fe8a6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -42,7 +42,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "dea-key-wrap=on|off controls support for DEA key wrapping 
(default=on)\n"
 "suppress-vmdesc=on|off disables self-describing migration 
(default=off)\n"
 "nvdimm=on|off controls NVDIMM support (default=off)\n"
-"enforce-config-section=on|off enforce configuration 
section migration (default=off)\n",
+"enforce-config-section=on|off enforce configuration 
section migration (default=off)\n"
+"s390-squash-mcss=on|off controls support for squashing 
into default css (default=off)\n",
 QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -81,6 +82,9 @@ controls whether DEA wrapping keys will be created to allow
 execution of DEA cryptographic functions.  The default is on.
 @item nvdimm=on|off
 Enables or disables NVDIMM support. The default is off.
+@item s390-squash-mcss=on|off
+Enables or disables squashing subchannels into the default css.
+The default is off.
 @end table
 ETEXI
 
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 058ddad83a..c36789112e 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -1250,6 +1250,16 @@ static inline void s390_crypto_reset(void)
 }
 }
 
+static inline bool s390_get_squash_mcss(void)
+{
+if (object_property_get_bool(OBJECT(qdev_get_machine()), "s390-squash-mcss",
+ NULL)) {
+return true;
+}
+
+return false;
+}
+
 /* machine check interruption code */
 
 /* subclasses */
-- 
2.11.2

[Qemu-devel] [PATCH v8 00/13] basic channel IO passthrough infrastructure based on vfio

2017-05-16 Thread Dong Jia Shi

This patch series introduces a basic channel I/O passthrough
infrastructure based on vfio.
- Focus on supporting dasd-eckd(cu_type/dev_type = 0x3990/0x3390) as
  the target device. 
- Support new qemu parameters in the style of:
-machine s390-ccw-virtio(,s390-squash-mcss=on|off) \
-device vfio-ccw,sysfsdev=$MDEV_PATH
  We want to support real (i.e. not virtual) channel devices even for
  guests that do not support MCSS-E (where guests may see devices from
  any channel subsystem image at once). As all virtio-ccw devices are in
  css 0xfe (and show up in the default css 0 for guests not activating
  MCSS-E), we need an option to squash e.g. passed-through channel devices
  from their real css (0-3, or 0 for hosts not activating MCSS-E) into
  the default css; this is why the new machine option s390-squash-mcss is
  added.

Build and install:
1. kernel configuration
  CONFIG_S390_CCW_IOMMU=m
  CONFIG_VFIO=m
  CONFIG_VFIO_MDEV=m
  CONFIG_VFIO_MDEV_DEVICE=m
  CONFIG_VFIO_CCW=m
2. modules required
  modprobe vfio.ko
  modprobe mdev.ko
  modprobe vfio_mdev.ko
  modprobe vfio_iommu_type1.ko
  modprobe vfio_ccw.ko
3. find a subchannel(0.0."%schid") of a DASD-ECKD device and bind it to
  vfio_ccw driver
  #find the dasd you can use with lsdasd on your host. e.g.:
  devno="7e52"
  schid="16ca"
  #unbind the ccw device from the subchannel
  echo 0.0."$devno" > /sys/bus/ccw/devices/0.0."$devno"/driver/unbind
  #unbind the subchannel from io_subchannel driver
  echo 0.0."$schid" > /sys/bus/css/devices/0.0."$schid"/driver/unbind
  #bind the subchannel with vfio_ccw driver
  echo 0.0."$schid" > /sys/bus/css/drivers/vfio_ccw/bind
4. create a mediated device
  #generate a uuid with uuidgen. e.g.:
  uuid="6dfd3ec5-e8b3-4e18-a6fe-57bc9eceb920"
  echo "$uuid" > \
  /sys/bus/css/devices/0.0."$schid"/mdev_supported_types/vfio_ccw-io/create
5. pass-through this device to a vm
  -M s390-ccw-virtio,s390-squash-mcss=on \
  -device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/$uuid \
  ... ...

Change log:
v7 -> v8:
1. Rebased against master (commit: dd1559b), which contains the 3270
   changes.
2. Patch #4:
   Cosmetic changes for commit message and comments.
3. Patch #5:
   Cosmetic changes for commit message and comments.
   Removed an extra blank.
   For CSS 0xFE use cases, renamed virtio and non virtio to virtual
   and non virtual for the corresponding parameter, message, and
   comments.
   Used the new css_create_sch interface in hw/s390x/3270-ccw.c.
4. Patch #6:
   Moved hw/s390x/s390-ccw.h to include/hw/s390x/s390-ccw.h.
   Added a check for sscanf return.
5. Patch #9:
   Removed the set_error label, and renamed get_error to out_free_info.
6. Patch #13:
   Added hw/s390x/s390-ccw.c and include/hw/s390x/s390-ccw.h.
7. Added A-B and/or R-B tags to the patches that received them.

v6 -> v7:
1. Patch #6:
   Use error_setg_errno where possible.
   Use a local Error variable where possible.
   Free @sch on the error path.
2. Patch #7:
   Use a local Error variable where possible.
   Remove vfio_ccw_put_group.
3. Patch #8:
   Use a local Error variable where possible.
   Free @info on the error path.
4. Patch #9:
   Use error_setg_errno where possible.
   Use a local Error variable where possible.
5. Patch #10:
   Move handle_request from device to class.
   Generate channel-check for information transfer error.
   Improve pread logic.

v5 -> v6:
1. Rebase against git://github.com/cohuck/qemu s390-next.
2. Patch #6: correct error message: -vfio-ccw --> -device vfio-ccw
3. Patch #7:
   Rewrite vfio_ccw_get_group by:
 - removing unnecessary checking of path existence;
 - removing useless 'path' variable;
   Fix a typo: s/operationis/operations/
   In vfio_ccw_unrealize, move cdc->unrealize to the end.

v4 -> v5:
1. Rebase to git://github.com/cohuck/qemu s390-next.
2. New patch #1: update-linux-headers.
3. Patch #6: update s390_ccw_realize according to the new code base.
4. New patch #13: add maintainer for vfio-ccw.

v3 -> v4:
1. Adjustments of the s-o-b chains for some patches.

v2 -> v3:
1. Move vfio_ccw.h to uapi.
2. Adopt the vfio-ccw cmdline interface as vfio-pci with mdev devices.
3. Rename s390-map-css to s390-squash-mcss (patch 2), and update devno
   generation method (patch 5).
4. Patch 7: correct the validation of num_regions.
5. Patch 8: correct the validation of num_irqs.

v1 -> v2:
1. Rebase the implementation to the mdev framework approach.
2. Use pread and pwrite on an I/O region to issue I/O requests and
   receive results.
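
For readers unfamiliar with the approach in v2 item 2, the following is
a minimal sketch of the pwrite/pread round trip. The struct layout
mirrors the ccw_io_region definition from vfio_ccw.h in this series; the
helper names are hypothetical, and a temporary file stands in for the
real vfio device region so the sketch can run anywhere. It does not
model what the kernel does with the ORB.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ORB_AREA_SIZE 12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE 96

/* Same layout as struct ccw_io_region in vfio_ccw.h. */
struct ccw_io_region {
    uint8_t  orb_area[ORB_AREA_SIZE];
    uint8_t  scsw_area[SCSW_AREA_SIZE];
    uint8_t  irb_area[IRB_AREA_SIZE];
    uint32_t ret_code;
} __attribute__((packed));

/* Hypothetical helper: submit a request by writing the whole region. */
int region_submit(int fd, off_t offset, const struct ccw_io_region *r)
{
    ssize_t n = pwrite(fd, r, sizeof(*r), offset);
    return n == (ssize_t)sizeof(*r) ? 0 : -1;
}

/* Hypothetical helper: fetch the result by reading the region back. */
int region_fetch(int fd, off_t offset, struct ccw_io_region *r)
{
    ssize_t n = pread(fd, r, sizeof(*r), offset);
    return n == (ssize_t)sizeof(*r) ? 0 : -1;
}

/* Round trip through a temp file standing in for the device region. */
int region_roundtrip_demo(void)
{
    struct ccw_io_region out, in;
    FILE *f = tmpfile();
    int ok;

    if (!f) {
        return -1;
    }
    memset(&out, 0, sizeof(out));
    memset(&in, 0xff, sizeof(in));
    out.orb_area[0] = 0x12;
    ok = region_submit(fileno(f), 0, &out) == 0 &&
         region_fetch(fileno(f), 0, &in) == 0 &&
         memcmp(&out, &in, sizeof(out)) == 0;
    fclose(f);
    return ok ? 0 : -1;
}
```

Using explicit offsets with pread/pwrite avoids any dependency on the
file position, which matters once several regions share one descriptor.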

Dong Jia Shi (8):
  update-linux-headers: update for vfio-ccw
  vfio: linux-headers update for vfio-ccw
  s390x/css: realize css_create_sch
  s390x/css: device support for s390-ccw passthrough
  vfio/ccw: get io region info
  vfio/ccw: get irqs info and set the eventfd fd
  vfio/ccw: update sense data if a unit check is pending
  MAINTAINERS: Add vfio-ccw maintainer

Xiao Feng Ren (5):
  s390x/css: add s390-squash-mcss machine option
  s390x/css: realize css_sch_build_schib
  vfio/ccw: vfio based subchannel

[Qemu-devel] [PATCH v8 02/13] vfio: linux-headers update for vfio-ccw

2017-05-16 Thread Dong Jia Shi

This is a placeholder for a linux-headers update.

Signed-off-by: Dong Jia Shi 
---
 linux-headers/linux/vfio.h | 17 +
 linux-headers/linux/vfio_ccw.h | 28 
 2 files changed, 45 insertions(+)
 create mode 100644 linux-headers/linux/vfio_ccw.h

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 531cb2eda9..39a1d3b2e3 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -198,6 +198,7 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_PCI  (1 << 1)/* vfio-pci device */
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)/* vfio-platform device */
 #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)   /* vfio-amba device */
+#define VFIO_DEVICE_FLAGS_CCW   (1 << 4)   /* vfio-ccw device */
__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
 };
@@ -446,6 +447,22 @@ enum {
VFIO_PCI_NUM_IRQS
 };
 
+/*
+ * The VFIO-CCW bus driver makes use of the following fixed region and
+ * IRQ index mapping.  Unimplemented regions return a size of zero.
+ * Unimplemented IRQ types return a count of zero.
+ */
+
+enum {
+VFIO_CCW_CONFIG_REGION_INDEX,
+VFIO_CCW_NUM_REGIONS
+};
+
+enum {
+VFIO_CCW_IO_IRQ_INDEX,
+VFIO_CCW_NUM_IRQS
+};
+
 /**
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
  *   struct vfio_pci_hot_reset_info)
diff --git a/linux-headers/linux/vfio_ccw.h b/linux-headers/linux/vfio_ccw.h
new file mode 100644
index 0000000000..4ee74aedeb
--- /dev/null
+++ b/linux-headers/linux/vfio_ccw.h
@@ -0,0 +1,28 @@
+/*
+ * Interfaces for vfio-ccw
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ * Author(s): Dong Jia Shi 
+ */
+
+#ifndef _VFIO_CCW_H_
+#define _VFIO_CCW_H_
+
+#include <linux/types.h>
+
+struct ccw_io_region {
+#define ORB_AREA_SIZE 12
+   __u8orb_area[ORB_AREA_SIZE];
+#define SCSW_AREA_SIZE 12
+   __u8scsw_area[SCSW_AREA_SIZE];
+#define IRB_AREA_SIZE 96
+   __u8irb_area[IRB_AREA_SIZE];
+   __u32   ret_code;
+} __packed;
+
+#endif
-- 
2.11.2

[Qemu-devel] [PATCH 2/4] target/arm: simplify and optimize aarch64 rev16

2017-05-16 Thread Aurelien Jarno

Instead of byteswapping individual 16-bit words one by one, work on the
whole register at the same time using shifts and a mask. This is the same
strategy as the aarch32 version of rev16 and is much more efficient
in the case sf=1.
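
For reference, the shift-and-mask strategy can be modelled in plain C as
below. This is only a host-side sketch of the arithmetic, not TCG code;
the function name is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Swap the two bytes inside every 16-bit halfword of rn.
 * sf selects 64-bit (sf != 0) or 32-bit (sf == 0) operation; in the
 * 32-bit case the upper half of the result is zero.
 */
uint64_t rev16_a64(uint64_t rn, int sf)
{
    uint64_t mask = sf ? 0x00ff00ff00ff00ffULL : 0x00ff00ffULL;
    uint64_t hi_to_lo = (rn >> 8) & mask;   /* high bytes moved down */
    uint64_t lo_to_hi = (rn & mask) << 8;   /* low bytes moved up */

    return lo_to_hi | hi_to_lo;
}
```

The same five operations handle both register widths; only the mask
constant changes.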

Signed-off-by: Aurelien Jarno 
---
 target/arm/translate-a64.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d92c..ed15d21655 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4035,24 +4035,12 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
 TCGv_i64 tcg_tmp = tcg_temp_new_i64();
 TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
 
-tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
-tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
-
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
-
-if (sf) {
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
-
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
-}
+TCGv mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
+tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
+tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
+tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
+tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
+tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
 
 tcg_temp_free_i64(tcg_tmp);
 }
-- 
2.11.0

[Qemu-devel] [PATCH 1/4] target/arm: optimize aarch32 rev16

2017-05-16 Thread Aurelien Jarno

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson.
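
The reason a single constant suffices is that masking before the left
shift selects the same bits as masking after it with the complementary
constant. A small C model of both forms (function names are made up for
the example):

```c
#include <assert.h>
#include <stdint.h>

/* Old form: two different constants, mask applied after each shift. */
uint32_t rev16_two_masks(uint32_t x)
{
    return ((x << 8) & 0xff00ff00u) | ((x >> 8) & 0x00ff00ffu);
}

/* New form: one constant, applied before the left shift. */
uint32_t rev16_one_mask(uint32_t x)
{
    const uint32_t mask = 0x00ff00ffu;

    return ((x & mask) << 8) | ((x >> 8) & mask);
}
```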

Signed-off-by: Aurelien Jarno 
---
 target/arm/translate.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 0b5a0bca06..5becb2bb89 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -339,11 +339,13 @@ static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
 static void gen_rev16(TCGv_i32 var)
 {
 TCGv_i32 tmp = tcg_temp_new_i32();
+TCGv_i32 mask = tcg_const_i32(0x00ff00ff);
 tcg_gen_shri_i32(tmp, var, 8);
-tcg_gen_andi_i32(tmp, tmp, 0x00ff00ff);
+tcg_gen_and_i32(tmp, tmp, mask);
+tcg_gen_and_i32(var, var, mask);
 tcg_gen_shli_i32(var, var, 8);
-tcg_gen_andi_i32(var, var, 0xff00ff00);
 tcg_gen_or_i32(var, var, tmp);
+tcg_temp_free_i32(mask);
 tcg_temp_free_i32(tmp);
 }
 
-- 
2.11.0

[Qemu-devel] [PATCH 3/4] target/cris: optimize swap

2017-05-16 Thread Aurelien Jarno

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson. Also use one less temp.

Signed-off-by: Aurelien Jarno 
---
 target/cris/translate.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 0ee05ca02d..103b214233 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -433,20 +433,19 @@ static inline void t_gen_subx_carry(DisasContext *dc, TCGv d)
T0 = ((T0 << 8) & 0xff00ff00) | ((T0 >> 8) & 0x00ff00ff)  */
 static inline void t_gen_swapb(TCGv d, TCGv s)
 {
-TCGv t, org_s;
+TCGv t, m;
 
 t = tcg_temp_new();
-org_s = tcg_temp_new();
+m = tcg_const_tl(0x00ff00ff);
 
 /* d and s may refer to the same object.  */
-tcg_gen_mov_tl(org_s, s);
-tcg_gen_shli_tl(t, org_s, 8);
-tcg_gen_andi_tl(d, t, 0xff00ff00);
-tcg_gen_shri_tl(t, org_s, 8);
-tcg_gen_andi_tl(t, t, 0x00ff00ff);
+tcg_gen_shri_tl(t, s, 8);
+tcg_gen_and_tl(t, t, m);
+tcg_gen_and_tl(d, s, m);
+tcg_gen_shli_tl(d, d, 8);
 tcg_gen_or_tl(d, d, t);
+tcg_temp_free(m);
 tcg_temp_free(t);
-tcg_temp_free(org_s);
 }
 
 /* Swap the halfwords of the s operand.  */
-- 
2.11.0

[Qemu-devel] [PATCH 0/4] target/arm, cris, mips: optimize "swap bytes within words"

2017-05-16 Thread Aurelien Jarno

This patchset optimizes the "swap bytes within words" instructions on the
arm, cris and mips targets. It all started with the patchset from Philippe
Mathieu-Daudé optimizing TCG code by using the extract op. Looking at the
patch I have found that the aarch64 rev16 function can be optimized even
more. Richard Henderson then suggested an even more optimized version.

Aurelien Jarno (4):
  target/arm: optimize aarch32 rev16
  target/arm: simplify and optimize aarch64 rev16
  target/cris: optimize swap
  target/mips: optimize WSBH, DSBH and DSHD

 target/arm/translate-a64.c | 24 ++--
 target/arm/translate.c |  6 --
 target/cris/translate.c| 15 +++
 target/mips/translate.c| 18 --
 4 files changed, 29 insertions(+), 34 deletions(-)

-- 
2.11.0

[Qemu-devel] [PATCH 4/4] target/mips: optimize WSBH, DSBH and DSHD

2017-05-16 Thread Aurelien Jarno

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson.
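
As an illustration of the DSHD case, here is a host-side C model of the
new sequence. The function name is made up, and the mask value
0x0000ffff0000ffff is an assumption derived from the operation itself
(swapping halfwords within a doubleword).

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the four 16-bit halfwords in a 64-bit value. */
uint64_t dshd(uint64_t x)
{
    const uint64_t mask = 0x0000ffff0000ffffULL;
    uint64_t t1 = (x >> 16) & mask;     /* odd halfwords moved down */
    uint64_t t0 = (x & mask) << 16;     /* even halfwords moved up */

    t0 |= t1;                           /* halfwords swapped per word */
    return (t0 << 32) | (t0 >> 32);     /* then swap the two words */
}
```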

Signed-off-by: Aurelien Jarno 
---
 target/mips/translate.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 3022f349cb..c71eed498c 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -4572,12 +4572,14 @@ static void gen_bshfl (DisasContext *ctx, uint32_t op2, int rt, int rd)
 case OPC_WSBH:
 {
 TCGv t1 = tcg_temp_new();
+TCGv t2 = tcg_const_tl(0x00FF00FF);
 
 tcg_gen_shri_tl(t1, t0, 8);
-tcg_gen_andi_tl(t1, t1, 0x00FF00FF);
+tcg_gen_and_tl(t1, t1, t2);
+tcg_gen_and_tl(t0, t0, t2);
 tcg_gen_shli_tl(t0, t0, 8);
-tcg_gen_andi_tl(t0, t0, ~0x00FF00FF);
 tcg_gen_or_tl(t0, t0, t1);
+tcg_temp_free(t2);
 tcg_temp_free(t1);
 tcg_gen_ext32s_tl(cpu_gpr[rd], t0);
 }
@@ -4592,27 +4594,31 @@ static void gen_bshfl (DisasContext *ctx, uint32_t op2, int rt, int rd)
 case OPC_DSBH:
 {
 TCGv t1 = tcg_temp_new();
+TCGv t2 = tcg_const_tl(0x00FF00FF00FF00FFULL);
 
 tcg_gen_shri_tl(t1, t0, 8);
-tcg_gen_andi_tl(t1, t1, 0x00FF00FF00FF00FFULL);
+tcg_gen_and_tl(t1, t1, t2);
+tcg_gen_and_tl(t0, t0, t2);
 tcg_gen_shli_tl(t0, t0, 8);
-tcg_gen_andi_tl(t0, t0, ~0x00FF00FF00FF00FFULL);
 tcg_gen_or_tl(cpu_gpr[rd], t0, t1);
+tcg_temp_free(t2);
 tcg_temp_free(t1);
 }
 break;
 case OPC_DSHD:
 {
 TCGv t1 = tcg_temp_new();
+TCGv t2 = tcg_const_tl(0x0000FFFF0000FFFFULL);
 
 tcg_gen_shri_tl(t1, t0, 16);
-tcg_gen_andi_tl(t1, t1, 0x0000FFFF0000FFFFULL);
+tcg_gen_and_tl(t1, t1, t2);
+tcg_gen_and_tl(t0, t0, t2);
 tcg_gen_shli_tl(t0, t0, 16);
-tcg_gen_andi_tl(t0, t0, ~0x0000FFFF0000FFFFULL);
 tcg_gen_or_tl(t0, t0, t1);
 tcg_gen_shri_tl(t1, t0, 32);
 tcg_gen_shli_tl(t0, t0, 32);
 tcg_gen_or_tl(cpu_gpr[rd], t0, t1);
+tcg_temp_free(t2);
 tcg_temp_free(t1);
 }
 break;
-- 
2.11.0

[Qemu-devel] [PATCH 3/5] target/sh4: introduce DELAY_SLOT_MASK

2017-05-16 Thread Aurelien Jarno

This will make it easier to introduce a new flag in the next
patches.

Signed-off-by: Aurelien Jarno 
---
 target/sh4/cpu.h   |  3 ++-
 target/sh4/helper.c|  4 ++--
 target/sh4/translate.c | 17 -
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index 6c07c6b24b..7969c9af98 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -91,6 +91,7 @@
 #define FPSCR_RM_NEAREST   (0 << 0)
 #define FPSCR_RM_ZERO  (1 << 0)
 
+#define DELAY_SLOT_MASK0x3
 #define DELAY_SLOT (1 << 0)
 #define DELAY_SLOT_CONDITIONAL (1 << 1)
 
@@ -380,7 +381,7 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
 {
 *pc = env->pc;
 *cs_base = 0;
-*flags = (env->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) /* Bits 0-1 */
+*flags = (env->flags & DELAY_SLOT_MASK)/* Bits  0- 1 */
 | (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 19-21 */
 | (env->sr & ((1u << SR_MD) | (1u << SR_RB)))  /* Bits 29-30 */
 | (env->sr & (1u << SR_FD))/* Bit 15 */
diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index 5296e7cf4e..d420931530 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -172,11 +172,11 @@ void superh_cpu_do_interrupt(CPUState *cs)
 env->sgr = env->gregs[15];
 env->sr |= (1u << SR_BL) | (1u << SR_MD) | (1u << SR_RB);
 
-if (env->flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) {
+if (env->flags & DELAY_SLOT_MASK) {
 /* Branch instruction should be executed again before delay slot. */
env->spc -= 2;
/* Clear flags for exception/interrupt routine. */
-env->flags &= ~(DELAY_SLOT | DELAY_SLOT_CONDITIONAL);
+env->flags &= ~DELAY_SLOT_MASK;
 }
 
 if (do_exp) {
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 0bc2f9ff19..aba316f593 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -217,8 +217,7 @@ static inline void gen_save_cpu_state(DisasContext *ctx, bool save_pc)
 if (ctx->delayed_pc != (uint32_t) -1) {
 tcg_gen_movi_i32(cpu_delayed_pc, ctx->delayed_pc);
 }
-if ((ctx->tbflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL))
-!= ctx->envflags) {
+if ((ctx->tbflags & DELAY_SLOT_MASK) != ctx->envflags) {
 tcg_gen_movi_i32(cpu_flags, ctx->envflags);
 }
 }
@@ -329,7 +328,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define DREG(x) FREG(x) /* Assumes lsb of (x) is always 0 */
 
 #define CHECK_NOT_DELAY_SLOT \
-if (ctx->envflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
+if (ctx->envflags & DELAY_SLOT_MASK) {   \
 gen_save_cpu_state(ctx, true);   \
 gen_helper_raise_slot_illegal_instruction(cpu_env);  \
 ctx->bstate = BS_EXCP;   \
@@ -339,7 +338,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define CHECK_PRIVILEGED \
 if (IS_USER(ctx)) {  \
 gen_save_cpu_state(ctx, true);   \
-if (ctx->envflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
+if (ctx->envflags & DELAY_SLOT_MASK) {   \
 gen_helper_raise_slot_illegal_instruction(cpu_env);  \
 } else { \
 gen_helper_raise_illegal_instruction(cpu_env);   \
@@ -351,7 +350,7 @@ static inline void gen_store_fpr64 (TCGv_i64 t, int reg)
 #define CHECK_FPU_ENABLED\
 if (ctx->tbflags & (1u << SR_FD)) {  \
 gen_save_cpu_state(ctx, true);   \
-if (ctx->envflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) { \
+if (ctx->envflags & DELAY_SLOT_MASK) {   \
 gen_helper_raise_slot_fpu_disable(cpu_env);  \
 } else { \
 gen_helper_raise_fpu_disable(cpu_env);   \
@@ -1784,7 +1783,7 @@ static void _decode_opc(DisasContext * ctx)
 fflush(stderr);
 #endif
 gen_save_cpu_state(ctx, true);
-if (ctx->envflags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) {
+if (ctx->envflags & DELAY_SLOT_MASK) {
 gen_helper_raise_slot_illegal_instruction(cpu_env);
 } else {
 gen_helper_raise_illegal_instruction(cpu_env);
@@ -1798,9 +1797,9 @@ static void decode_opc(DisasContext * ctx)
 
 _decode_opc(ctx);
 
-if (old_flags & (DELAY_SLOT | DELAY_SLOT_CONDITIONAL)) {
+if (old_flags & DELAY_SLOT_MASK) {
 /* go out of the delay slot */
-ctx->envflags &= ~(DELAY_SLOT | DELAY_SLOT_CONDITIONAL);
+

[Qemu-devel] [PATCH 0/5] target/sh4: system emulation improvement

2017-05-16 Thread Aurelien Jarno

This patch series fixes two issues with the SH4 system emulation:
- reboot does not work when using -kernel and -initrd
- the RTE instruction is not correctly emulated in some very rare
  cases, causing userland processes to receive a segmentation fault.

Aurelien Jarno (5):
  target/sh4: log unauthorized accesses using qemu_log_mask
  target/sh4: fix reset when using a kernel and an initrd
  target/sh4: introduce DELAY_SLOT_MASK
  target/sh4: ignore interrupts in a delay slot
  target/sh4: fix RTE instruction delay slot

 target/sh4/cpu.h   | 12 ++--
 target/sh4/helper.c| 28 ++--
 target/sh4/translate.c | 25 ++---
 3 files changed, 46 insertions(+), 19 deletions(-)

-- 
2.11.0

[Qemu-devel] [PATCH 5/5] target/sh4: fix RTE instruction delay slot

2017-05-16 Thread Aurelien Jarno

The ReTurn from Exception (RTE) instruction loads the system register
(SR) with the saved system register (SSR). It has a delay slot, and
behaves specially according to the SH4 manual:

  The SR value accessed by the instruction in the RTE delay slot is the
  value restored from SSR by the RTE instruction. The SR and MD values
  defined prior to RTE execution are used to fetch the instruction in
  the RTE delay slot.

The instruction in the delay slot is often a NOP, so this doesn't cause
any issue most of the time, except in some rare cases where the NOP is
split into a different TB (for example when the TCG op buffer is full).
In that case the NOP is fetched with user permissions and causes an
instruction TLB protection violation exception.

This patch fixes that by introducing a new delay slot flag for the
RTE instruction. Given it's a privileged instruction, the RTE delay
slot instruction is always fetched in privileged mode. It is therefore
enough to check for this flag in cpu_mmu_index.
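
The resulting rule can be modelled outside of QEMU as in the sketch
below. The struct and function names are simplified stand-ins, not the
real CPUSH4State or cpu_mmu_index; SR_MD = 30 and MMU_KERNEL_IDX = 0 are
assumptions consistent with the "Bits 29-30" comment and with
MMU_USER_IDX being 1 in cpu.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_MD           30        /* assumed position of the MD bit */
#define DELAY_SLOT_RTE  (1 << 2)  /* flag introduced by this patch */
#define MMU_KERNEL_IDX  0         /* assumed name for the privileged index */
#define MMU_USER_IDX    1

/* Simplified stand-in for the CPU state, for illustration only. */
struct sh4_state {
    uint32_t sr;
    uint32_t flags;
};

/*
 * Instruction fetches in an RTE delay slot use the privileged index,
 * even though SR has already been restored to user mode. Data accesses
 * follow SR.MD as usual.
 */
int mmu_index(const struct sh4_state *env, bool ifetch)
{
    if (ifetch && (env->flags & DELAY_SLOT_RTE)) {
        return MMU_KERNEL_IDX;
    }
    return (env->sr & (1u << SR_MD)) ? MMU_KERNEL_IDX : MMU_USER_IDX;
}
```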

Signed-off-by: Aurelien Jarno 
---
 target/sh4/cpu.h   | 13 ++---
 target/sh4/translate.c |  8 ++--
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index 7969c9af98..ffb91687b8 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -91,9 +91,10 @@
 #define FPSCR_RM_NEAREST   (0 << 0)
 #define FPSCR_RM_ZERO  (1 << 0)
 
-#define DELAY_SLOT_MASK0x3
+#define DELAY_SLOT_MASK0x7
 #define DELAY_SLOT (1 << 0)
 #define DELAY_SLOT_CONDITIONAL (1 << 1)
+#define DELAY_SLOT_RTE (1 << 2)
 
 typedef struct tlb_t {
 uint32_t vpn;  /* virtual page number */
@@ -264,7 +265,13 @@ void cpu_load_tlb(CPUSH4State * env);
 #define MMU_USER_IDX 1
 static inline int cpu_mmu_index (CPUSH4State *env, bool ifetch)
 {
-return (env->sr & (1u << SR_MD)) == 0 ? 1 : 0;
+/* The instruction in a RTE delay slot is fetched in privileged
+   mode, but executed in user mode.  */
+if (ifetch && (env->flags & DELAY_SLOT_RTE)) {
+return 0;
+} else {
+return (env->sr & (1u << SR_MD)) == 0 ? 1 : 0;
+}
 }
 
 #include "exec/cpu-all.h"
@@ -381,7 +388,7 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
 {
 *pc = env->pc;
 *cs_base = 0;
-*flags = (env->flags & DELAY_SLOT_MASK)/* Bits  0- 1 */
+*flags = (env->flags & DELAY_SLOT_MASK)/* Bits  0- 2 */
 | (env->fpscr & (FPSCR_FR | FPSCR_SZ | FPSCR_PR))  /* Bits 19-21 */
 | (env->sr & ((1u << SR_MD) | (1u << SR_RB)))  /* Bits 29-30 */
 | (env->sr & (1u << SR_FD))/* Bit 15 */
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index aba316f593..8bc132b27b 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -185,6 +185,9 @@ void superh_cpu_dump_state(CPUState *cs, FILE *f,
 } else if (env->flags & DELAY_SLOT_CONDITIONAL) {
cpu_fprintf(f, "in conditional delay slot (delayed_pc=0x%08x)\n",
env->delayed_pc);
+} else if (env->flags & DELAY_SLOT_RTE) {
+cpu_fprintf(f, "in rte delay slot (delayed_pc=0x%08x)\n",
+env->delayed_pc);
 }
 }
 
@@ -427,8 +430,9 @@ static void _decode_opc(DisasContext * ctx)
CHECK_NOT_DELAY_SLOT
 gen_write_sr(cpu_ssr);
tcg_gen_mov_i32(cpu_delayed_pc, cpu_spc);
-ctx->envflags |= DELAY_SLOT;
+ctx->envflags |= DELAY_SLOT_RTE;
ctx->delayed_pc = (uint32_t) - 1;
+ctx->bstate = BS_STOP;
return;
 case 0x0058:   /* sets */
 tcg_gen_ori_i32(cpu_sr, cpu_sr, (1u << SR_S));
@@ -1804,7 +1808,7 @@ static void decode_opc(DisasContext * ctx)
 ctx->bstate = BS_BRANCH;
 if (old_flags & DELAY_SLOT_CONDITIONAL) {
gen_delayed_conditional_jump(ctx);
-} else if (old_flags & DELAY_SLOT) {
+} else {
 gen_jump(ctx);
}
 
-- 
2.11.0

[Qemu-devel] [PATCH 4/5] target/sh4: ignore interrupts in a delay slot

2017-05-16 Thread Aurelien Jarno

Delay slots are indivisible, therefore avoid scheduling an interrupt in
the delay slot. However exceptions are possible.

Signed-off-by: Aurelien Jarno 
---
 target/sh4/helper.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index d420931530..19d4ec5fb5 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -871,8 +871,16 @@ int cpu_sh4_is_cached(CPUSH4State * env, target_ulong addr)
 bool superh_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 {
 if (interrupt_request & CPU_INTERRUPT_HARD) {
-superh_cpu_do_interrupt(cs);
-return true;
+SuperHCPU *cpu = SUPERH_CPU(cs);
+CPUSH4State *env = &cpu->env;
+
+/* Delay slots are indivisible, ignore interrupts */
+if (env->flags & DELAY_SLOT_MASK) {
+return false;
+} else {
+superh_cpu_do_interrupt(cs);
+return true;
+}
 }
 return false;
 }
-- 
2.11.0

[Qemu-devel] [PATCH 2/5] target/sh4: fix reset when using a kernel and an initrd

2017-05-16 Thread Aurelien Jarno

When a masked exception happens, the SH4 CPU generates a non-masked
reset exception, which then jumps to the reset vector at address
0xA0000000. While this is emulated correctly in QEMU, this does not
work when using a kernel and initrd, as this address then contains an
illegal instruction (and there is no guarantee the kernel and initrd
haven't been overwritten).

Therefore call qemu_system_reset_request to reload the kernel and initrd
and load the program counter to the kernel entry point.

Signed-off-by: Aurelien Jarno 
---
 target/sh4/helper.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index 4c024f9529..5296e7cf4e 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -21,6 +21,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/log.h"
+#include "sysemu/sysemu.h"
 
 #if !defined(CONFIG_USER_ONLY)
 #include "hw/sh4/sh_intc.h"
@@ -92,7 +93,14 @@ void superh_cpu_do_interrupt(CPUState *cs)
 
 if (env->sr & (1u << SR_BL)) {
 if (do_exp && cs->exception_index != 0x1e0) {
-cs->exception_index = 0x000; /* masked exception -> reset */
+/* In theory a masked exception generates a reset exception,
+   which in turn jumps to the reset vector. However this only
+   works when using a bootloader. When using a kernel and an
+   initrd, they need to be reloaded and the program counter
+   should be loaded with the kernel entry point.
+   qemu_system_reset_request takes care of that.  */
+qemu_system_reset_request();
+return;
 }
 if (do_irq && !env->in_sleep) {
 return; /* masked */
-- 
2.11.0

[Qemu-devel] [PATCH 1/5] target/sh4: log unauthorized accesses using qemu_log_mask

2017-05-16 Thread Aurelien Jarno

qemu_log_mask() is preferred over fprintf() for logging errors.

Signed-off-by: Aurelien Jarno 
---
 target/sh4/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sh4/helper.c b/target/sh4/helper.c
index 8f8ce81401..4c024f9529 100644
--- a/target/sh4/helper.c
+++ b/target/sh4/helper.c
@@ -420,7 +420,7 @@ static int get_physical_address(CPUSH4State * env, 
target_ulong * physical,
 if (!(env->sr & (1u << SR_MD))
&& (address < 0xe000 || address >= 0xe400)) {
/* Unauthorized access in user mode (only store queues are 
available) */
-   fprintf(stderr, "Unauthorized access\n");
+qemu_log_mask(LOG_GUEST_ERROR, "Unauthorized access\n");
if (rw == 0)
return MMU_DADDR_ERROR_READ;
else if (rw == 1)
-- 
2.11.0

Re: [Qemu-devel] [virtio-dev] [PATCH v18 1/2] virtio-crypto: Add virtio crypto device specification

2017-05-16 Thread Halil Pasic



On 05/16/2017 05:33 PM, Stefan Hajnoczi wrote:
> On Sat, Apr 22, 2017 at 02:23:50PM +0800, Gonglei wrote:
>> +Dataq requests for both session and stateless modes are as follows:
>> +
>> +\begin{lstlisting}
>> +struct virtio_crypto_op_data_req_mux {
>> +struct virtio_crypto_op_header header;
>> +
>> +union {
>> +struct virtio_crypto_sym_data_req   sym_req;
>> +struct virtio_crypto_hash_data_req  hash_req;
>> +struct virtio_crypto_mac_data_req   mac_req;
>> +struct virtio_crypto_aead_data_req  aead_req;
>> +struct virtio_crypto_sym_data_req_stateless   sym_stateless_req;
>> +struct virtio_crypto_hash_data_req_stateless  hash_stateless_req;
>> +struct virtio_crypto_mac_data_req_stateless   mac_stateless_req;
>> +struct virtio_crypto_aead_data_req_stateless  aead_stateless_req;
>> +} u;
>> +};
>> +\end{lstlisting}
> 
> Halil touched on this in the discussion: this spec uses a C-like struct
> syntax but does not define whether unions really affect sizeof(mystruct)
> like they would in C or whether you just mean that any of the union
> fields can be used.  This distinction is important so device and driver
> authors understand the exact memory layout of requests and responses.
> 
> Please include an explanation about the meaning of "union" in the text.
> 

I do not think simply explaining the union will do. I think this
description is bleeding from more wounds. I tried to explain this
while reviewing the implementation here:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03876.html

Unfortunately some technical issues precluded me from posting it
in a timely manner.

@Stefan: Thanks for joining the discussion.

Regards,
Halil
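
To make the layout question concrete, here is a small sketch of what a union does to sizeof() in plain C. It assumes nothing about the real virtio-crypto structures: toy_header, toy_small_req, toy_large_req and toy_req_mux are made-up types used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up types for illustration; not the virtio-crypto definitions. */
struct toy_header    { uint32_t opcode; uint32_t flag; };
struct toy_small_req { uint8_t payload[8]; };
struct toy_large_req { uint8_t payload[40]; };

struct toy_req_mux {
    struct toy_header header;
    union {
        struct toy_small_req small;
        struct toy_large_req large;
    } u;
};

/* In C the union is as large as its largest member, whichever one is used. */
size_t toy_mux_size(void)
{
    return sizeof(struct toy_req_mux);
}

/* Offset at which every union member starts. */
size_t toy_union_offset(void)
{
    return offsetof(struct toy_req_mux, u);
}
```

With C semantics a request using the small member still occupies the space of the large one, which is exactly the point the spec text would need to settle one way or the other.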




Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-16 Thread Xu, Anthony

> > > > With the above change,   I got below data
> > > >
> > > > Platform  accel  count of restoring A20 to 0
> > > > Q35       kvm    96
> > > > Q35       tcg    271
> > > > PC        kvm    3
> > > > PC        tcg    3
> > >
> > > Okay, thanks.  I think the number of a20 switches is due to
> > > differences in option rom execution interacting with the fact that
> > > some mode switches were occurring before SeaBIOS set
> > > call16_override().
> > >
> > > > But I still see a lot of PORT_A20 accesses in QEMU as I expected
> > >
> > > Yes, but it should be possible to significantly reduce the number of
> > > outb() calls by limiting them to when A20 changes.  This should also
> > > be useful to reduce the number of outb() calls needed to disable NMIs.
> > > I sent a patch series to the seabios mailing list to demonstrate the
> > > idea.
> >
> > If both TCG and KVM work by ignoring A20,  why not remove all PORT_A20
> > access in SeaBios when CONFIG_DISABLE_A20 is not defined?
> > Do you see any impact?
> 
> The SeaBIOS CONFIG_DISABLE_A20 build option does not mean "disable
> support for A20"; it means "start the initial operating system
> bootloader with A20 disabled".  CONFIG_DISABLE_A20=y is a
> pessimization, not an optimization.
Makes sense, thanks for the explanation.

> 
> As for adding a new SeaBIOS build option to compile out support for
> A20 - that seems like a very small optimization that would risk memory
> corruption and hard to diagnose crashes.  SeaBIOS runs natively on
> real hardware (with coreboot and as a CSM on UEFI) as well as on
> QEMU/KVM.
I heard that new platforms don't support A20.
Which hardware that SeaBIOS runs natively on needs A20 support?

It is just a build option; we can disable A20 support for QEMU/KVM and
enable it for real hardware. Any concerns here?

BTW, QEMU/KVM ignores A20 even when SeaBIOS supports it.

Or we can add some logic in SeaBIOS to check whether the platform supports
A20; if it doesn't, SeaBIOS won't access PORT_A20 anymore.

The check logic is like,
write 0x55 to 0x00:0xeff0 (or other unused address)
disable A20
write 0xaa to 0x:0xf000
read from 0x00:0xeff0,
If the return value is 0x55, A20 is not supported by this platform.
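
The address arithmetic behind this check can be sketched as follows. This is a model of real-mode addressing only, not SeaBIOS code; the function names are made up, and the segment value 0xffff is an assumption, since the segment in the step above is garbled and 0xffff is the value that makes the two addresses alias.

```c
#include <stdbool.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset (can exceed 1 MiB). */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Address seen by memory: bit 20 is forced to 0 while the A20 gate is off. */
uint32_t apply_a20(uint32_t linear, bool a20_enabled)
{
    return a20_enabled ? linear : (linear & ~(UINT32_C(1) << 20));
}

/* True if two segment:offset pairs hit the same byte with A20 disabled. */
bool aliases_with_a20_off(uint16_t seg_a, uint16_t off_a,
                          uint16_t seg_b, uint16_t off_b)
{
    return apply_a20(linear_address(seg_a, off_a), false) ==
           apply_a20(linear_address(seg_b, off_b), false);
}
```

Under that assumption, 0xffff:0xf000 is linear 0x10eff0, which wraps to 0xeff0 when A20 is really disabled. If the 0x55 written at 0x00:0xeff0 survives the second write, no wraparound happened, so the platform did not honour the request to disable A20.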


Anthony

Re: [Qemu-devel] [RFC v1 6/9] virtio-crypto: rework virtio_crypto_handle_request

2017-05-16 Thread Halil Pasic



On 05/16/2017 04:52 AM, Gonglei (Arei) wrote:
>>
>> On 05/13/2017 03:16 AM, Gonglei (Arei) wrote:
>>>
 From: Halil Pasic [mailto:pa...@linux.vnet.ibm.com]
 Sent: Friday, May 12, 2017 7:02 PM


 On 05/08/2017 01:38 PM, Gonglei wrote:
> According to the new spec, we should use different
> request structures to store the data request based
> on whether VIRTIO_CRYPTO_F_MUX_MODE feature bit is
> negotiated or not.
>
> In this patch, we haven't supported stateless mode
> yet. The device reports an error if both
> VIRTIO_CRYPTO_F_MUX_MODE and
 VIRTIO_CRYPTO_F_CIPHER_STATELESS_MODE
> are negotiated, meanwhile the header.flag doesn't set
> to VIRTIO_CRYPTO_FLAG_SESSION_MODE.
>
> Let's handle this scenario in the following patches.
>
> Signed-off-by: Gonglei 
> ---
>  hw/virtio/virtio-crypto.c | 83
 ---
>  1 file changed, 71 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
> index 0353eb6..c4b8a2c 100644
> --- a/hw/virtio/virtio-crypto.c
> +++ b/hw/virtio/virtio-crypto.c
> @@ -577,6 +577,7 @@ virtio_crypto_handle_request(VirtIOCryptoReq
 *request)
>  VirtQueueElement *elem = &request->elem;
>  int queue_index =
 virtio_crypto_vq2q(virtio_get_queue_index(request->vq));
>  struct virtio_crypto_op_data_req req;
> +struct virtio_crypto_op_data_req_mux req_mux;
>  int ret;
>  struct iovec *in_iov;
>  struct iovec *out_iov;
> @@ -587,6 +588,9 @@ virtio_crypto_handle_request(VirtIOCryptoReq
 *request)
>  uint64_t session_id;
>  CryptoDevBackendSymOpInfo *sym_op_info = NULL;
>  Error *local_err = NULL;
> +bool mux_mode_is_negotiated;
> +struct virtio_crypto_op_header *header;
> +bool is_stateless_req = false;
>
>  if (elem->out_num < 1 || elem->in_num < 1) {
>  virtio_error(vdev, "virtio-crypto dataq missing headers");
> @@ -597,12 +601,28 @@ virtio_crypto_handle_request(VirtIOCryptoReq
 *request)
>  out_iov = elem->out_sg;
>  in_num = elem->in_num;
>  in_iov = elem->in_sg;
> -if (unlikely(iov_to_buf(out_iov, out_num, 0, &req, sizeof(req))
> -!= sizeof(req))) {
> -virtio_error(vdev, "virtio-crypto request outhdr too short");
> -return -1;
> +
> +mux_mode_is_negotiated =
> +virtio_vdev_has_feature(vdev,
>> VIRTIO_CRYPTO_F_MUX_MODE);
> +if (!mux_mode_is_negotiated) {
> +if (unlikely(iov_to_buf(out_iov, out_num, 0, &req, sizeof(req))
> +!= sizeof(req))) {
> +virtio_error(vdev, "virtio-crypto request outhdr too short");
> +return -1;
> +}
> +iov_discard_front(&out_iov, &out_num, sizeof(req));
> +
> +header = &req.header;
> +} else {
> +if (unlikely(iov_to_buf(out_iov, out_num, 0, &req_mux,
> +sizeof(req_mux)) != sizeof(req_mux))) {
> +virtio_error(vdev, "virtio-crypto request outhdr too short");
> +return -1;
> +}
> +iov_discard_front(&out_iov, &out_num, sizeof(req_mux));
> +
> +header = &req_mux.header;

 I wonder if this request length checking logic conforms to the
 most recent spec draft on the list ("[PATCH v18 0/2] virtio-crypto:
 virtio crypto device specification").

>>> Sure. Please see below normative formulation:
>>>
>>> '''
>>> \drivernormative{\paragraph}{Symmetric algorithms Operation}{Device Types
>> / Crypto Device / Device Operation / Symmetric algorithms Operation}
>>> ...
>>> \item If the VIRTIO_CRYPTO_F_MUX_MODE feature bit is negotiated, the
>> driver MUST use struct virtio_crypto_op_data_req_mux to wrap crypto
>> requests.
>>> Otherwise, the driver MUST use struct virtio_crypto_op_data_req.
>>> ...
>>> '''
>>>
>>
>> As far as I can remember, we have already agreed that in terms of the
>> spec sizeof(struct virtio_crypto_op_data_req) makes no sense! In your
> 
> Sorry, I don't think so. :(
> 
>> code you have a substantially different struct virtio_crypto_op_data_req
>> than in your spec! For instance in the spec virtio_crypto_op_data_req is
>> the full request and contains the data buffers (src_data and the
>> dest_data), while in your code it's effectively just a header and does
>> not contain any data buffers.
>>
> I said struct virtio_crypto_op_data_req in the spec is just a symbol.
> I didn't find a better way to express the src_data and dst_data etc. So
> I used u8[len] xxx_data to hold a place in the request.
> 

OK, tell me how the reader/implementer of the spec is supposed to figure
out that a 124-byte padded "header" needs to precede any "data".

Besides if you look at

+Stateless mode HASH service requests are as

[Qemu-devel] [PATCH] Memory: use memory address space for cpu-memory

2017-05-16 Thread Anthony Xu

If the cpu-memory address space is the same as the memory address space,
use the memory address space for the cpu-memory address space.

Any memory region change causes each address space to rebuild its
PhysPageMap, and rebuilding a PhysPageMap is very expensive.

Removing the cpu-memory address space reduces guest boot time and
memory usage.

Signed-off-by: Anthony Xu 
---
 cpus.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 740b8dc..15c7a6a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1748,8 +1748,13 @@ void qemu_init_vcpu(CPUState *cpu)
 /* If the target cpu hasn't set up any address spaces itself,
  * give it the default one.
  */
-AddressSpace *as = address_space_init_shareable(cpu->memory,
-"cpu-memory");
+AddressSpace *as;
+if (cpu->memory == address_space_memory.root) {
+address_space_memory.ref_count++;
+as = &address_space_memory;
+} else {
+as = address_space_init_shareable(cpu->memory, "cpu-memory");
+}
 cpu->num_ases = 1;
 cpu_address_space_init(cpu, as, 0);
 }
-- 
1.8.3.1
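
The sharing idea in this patch can be sketched with a toy reference-counted object. This is an illustration under stated assumptions only: ToyRegion, ToyAddressSpace, toy_as_for_root() and toy_as_unref() are hypothetical names, and the real QEMU address space lifecycle is more involved than this.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a memory region and an address space. */
typedef struct ToyRegion { int id; } ToyRegion;

typedef struct ToyAddressSpace {
    ToyRegion *root;
    int ref_count;
} ToyAddressSpace;

/*
 * Reuse the global address space when the requested root matches it,
 * otherwise allocate a private one. Returns NULL on allocation failure.
 */
ToyAddressSpace *toy_as_for_root(ToyAddressSpace *global, ToyRegion *root)
{
    ToyAddressSpace *as;

    if (root == global->root) {
        global->ref_count++;        /* share: no second page map to rebuild */
        return global;
    }
    as = calloc(1, sizeof(*as));
    if (as == NULL) {
        return NULL;
    }
    as->root = root;
    as->ref_count = 1;
    return as;
}

/* Drop one reference; only privately allocated spaces are freed. */
void toy_as_unref(ToyAddressSpace *global, ToyAddressSpace *as)
{
    if (--as->ref_count == 0 && as != global) {
        free(as);
    }
}
```

The design point is that every extra address space carries its own derived lookup structure, so sharing one object avoids rebuilding that structure twice for each memory region change.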

Re: [Qemu-devel] [PATCH 06/10] virtio-ccw: use vmstate way for config migration

2017-05-16 Thread Halil Pasic



On 05/15/2017 09:07 PM, Dr. David Alan Gilbert wrote:
> * Halil Pasic (pa...@linux.vnet.ibm.com) wrote:
>>
>>
>> On 05/08/2017 08:42 PM, Dr. David Alan Gilbert wrote:
>>> * Halil Pasic (pa...@linux.vnet.ibm.com) wrote:


 On 05/08/2017 07:59 PM, Dr. David Alan Gilbert wrote:
[..]

 Why not use virtio oddities? Because they are oddities. I have
 figured, it's a good idea to separate the migration of the 
 proxy from the rest: we have two QEMU Device objects and it
 should be good practice, that these are migrating themselves via
 DeviceClass.vmsd. That's what I get with this patch set, 
 for new machine versions (since we can not fix the past), and
 with the notable difference of config_vector, because it is
 defined as a common infrastructure (struct VirtIODevice) but
 ain't migrated as a common virtio infrastructure.
>>>
>>> Have you got a bit of a description of your classes/structure - it's
>>> a little hard to get my head around.
>>>
>>
>> Unfortunately I do not have any extra description besides the comments
>> and the commit messages. What exactly do you mean  by 'my
>> classes/structure'?  I would like to provide some helpful developer
>> documentation on how migration works for s390x. There were voices on the
>> internal mailing list too requesting something like that, but I find it
>> hard, because for me, the most challenging part was understanding how
>> qemu migration works in general and the virtio oddities come next. 
> 
> Yes, there are only about 2 people who have the overlap of understanding
> migration AND s390 IO.
> 
>> For example, I still don't understand why (virtio) load_config is
>> called like that, when what it mainly does is loading the state of the proxy,
>> which is basically the reification of the device side of what the virtio spec
>> calls the transport within QOM. (I say mainly, because of this
>> config_vector which resides in core but is migrated via a callback for
>> some strange reason I do not understand).
> 
> I think the idea is that virtio_load is trying to act as a generic
> save/load with a number of virtual components that are specialised for:
>   a) The device (e.g. rng, serial, gpu, net, blk)
>   b) The transport (PCI, MMIO, CCW etc)
>   c) The virtio queue content
>   d) But has a load of core stuff (features, the virtio ring management)
> 
> (a) & (b) are very much virtual-function like that doesn't fit that
> well with the migration macro structure.
> 
> The split between (a) & (c) isn't necessary clean - gpu does it a
> different way.
> And the order of a/b/c/d is very random (aka wrong).
> 

I mostly agree with your analysis. Honestly, I had forgotten about this
load_queue callback (I think it's c)), but it's a strange one too. What it
does is handle the vector of the queue, which is again common
infrastructure in the sense that it resides within VirtIODevice, but it may
need some proxy-specific handling.

In my understanding the virtio migration and the migration subsystem
(let's call it vmstate) are a misfit in the following aspect, most
importantly in the separation of concerns. In my understanding, for vmstate,
each device is supposed to load/save itself, and loading state and doing
stuff with the state we have loaded are separate concerns. I'm not sure
what the vmstate place is for code which is supposed to run as a part of
the migration logic but requires cooperation of devices (e.g. notify in
virtio_load, which basically generates an interrupt).


>> Could you tell me which (specific) questions I should answer?
>> It would make my job much easier.
>>
>> About the general approach. First step was to provide VMStateDescription
>> for the entities which have migration relevant state but no
>> VMStateDescription (patches 3, 4 and 5).  This is done so that
>> lots of qemu_put/qemu_get calls can be replaced with few
>> vmstate_save_state/vmstate_load_state calls (patch 6 and 7) on one hand,
>> and that state not migrated yet but needed is also included, if the
>> compat. switch (property) added in patch 2 is on. Then in patch 8, I add
>> ORB which is a state we wanted to add for some time now, but we needed
>> vmstate to add it without breaking migration. So we waited.
> 
> I'm most interested at this point in understanding which bits aren't
> changing behaviour - if we've got stuff that's just converting qemu_get
> to vmstate then lets go for it, no problem; easy to check.

The commit messages should be helpful. Up to patch 8 all I do is
converting qemu_get to vmstate as you said. 

> I'm just trying to make sure I understand the bit where you're
> converting from being a virtio device.
> 

By converting from being a virtio device you mean factoring out the
transport stuff into a separate section? That's happening in patch
9. Let me try to explain that patch.

The gist of that patch is the following:
* Prior to it css_migration_enabled() always returned false. After
patch 9 it returns true for new machine

Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-16 Thread Kevin O'Connor

On Tue, May 16, 2017 at 08:00:28PM +, Xu, Anthony wrote:
> > On Sat, May 13, 2017 at 01:24:30AM +, Xu, Anthony wrote:
> > > I think it is related to accel and platform, the result I gave before is
> > > for q35 tcg,
> > >
> > > With the above change,   I got below data
> > >
> > > Platform  accel  count of restoring A20 to 0
> > > Q35       kvm    96
> > > Q35       tcg    271
> > > PC        kvm    3
> > > PC        tcg    3
> > 
> > Okay, thanks.  I think the number of a20 switches is due to
> > differences in option rom execution interacting with the fact that
> > some mode switches were occurring before SeaBIOS set
> > call16_override().
> > 
> > > But I still see a lot of PORT_A20 accesses in QEMU as I expected
> > 
> > Yes, but it should be possible to significantly reduce the number of
> > outb() calls by limiting them to when A20 changes.  This should also
> > be useful to reduce the number of outb() calls needed to disable NMIs.
> > I sent a patch series to the seabios mailing list to demonstrate the
> > idea.
> 
> If both TCG and KVM work by ignoring A20,  why not remove all PORT_A20
> access in SeaBios when CONFIG_DISABLE_A20 is not defined?
> Do you see any impact?

The SeaBIOS CONFIG_DISABLE_A20 build option does not mean "disable
support for A20"; it means "start the initial operating system
bootloader with A20 disabled".  CONFIG_DISABLE_A20=y is a
pessimization, not an optimization.

As for adding a new SeaBIOS build option to compile out support for
A20 - that seems like a very small optimization that would risk memory
corruption and hard to diagnose crashes.  SeaBIOS runs natively on
real hardware (with coreboot and as a CSM on UEFI) as well as on
QEMU/KVM.

-Kevin

Re: [Qemu-devel] [PATCH] qapi-schema: Remove obsolete note from ObjectTypeInfo

2017-05-16 Thread Eric Blake

On 05/16/2017 03:53 PM, Eduardo Habkost wrote:
> The "This command is experimental" note in ObjectTypeInfo is obsolete
> since 2012.  Commit 5192082097549c5b3aa7c913c6853d97a68172cb removed the
> warning from the qom-list-types command documentation, but we forgot to
> remove the warning from ObjectTypeInfo.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  qapi-schema.json | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 80603cfc51..e6da88585a 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3035,8 +3035,6 @@
>  # @name: the type name found in the search
>  #
>  # Since: 1.1
> -#
> -# Notes: This command is experimental and may change syntax in future 
> releases.
>  ##
>  { 'struct': 'ObjectTypeInfo',
>'data': { 'name': 'str' } }
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[Qemu-devel] [PATCH v2 09/12] dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes

2017-05-16 Thread Eric Blake

Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere.  Making the change
will also make it easier to rewrite the hold-out callers to use bytes
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap.  Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.
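
The scaling described here can be sketched in isolation. This is a standalone model, not the QEMU code: the TOY_* macros mirror the 512-byte sector convention used in the diff below, and ToySectorRange and toy_bytes_to_sectors() are names invented for the example.

```c
#include <stdint.h>

#define TOY_SECTOR_BITS 9
#define TOY_SECTOR_SIZE (INT64_C(1) << TOY_SECTOR_BITS)
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

typedef struct ToySectorRange {
    int64_t first;  /* first sector touched by the byte range */
    int64_t count;  /* number of sectors touched */
} ToySectorRange;

/*
 * Convert a byte range into the sector range covering it: round the start
 * down and the end up, so sub-sector requests still mark whole sectors.
 * Assumes offset >= 0 and bytes > 0.
 */
ToySectorRange toy_bytes_to_sectors(int64_t offset, int64_t bytes)
{
    int64_t first = offset >> TOY_SECTOR_BITS;
    int64_t end = TOY_DIV_ROUND_UP(offset + bytes, TOY_SECTOR_SIZE);
    ToySectorRange range = { first, end - first };

    return range;
}
```

For example, a 2-byte request at offset 511 straddles a sector boundary and therefore covers sectors 0 and 1, whereas simply shifting both values right by 9 would have produced a count of 0.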

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: no change
---
 include/block/dirty-bitmap.h |  4 ++--
 block/dirty-bitmap.c | 14 ++
 block/mirror.c   | 16 
 migration/block.c|  7 +--
 4 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index b8434e5..fdff1e2 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -37,9 +37,9 @@ DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap 
*bitmap);
 bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
 int64_t offset);
 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-   int64_t cur_sector, int64_t nr_sectors);
+   int64_t offset, int64_t bytes);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
- int64_t cur_sector, int64_t nr_sectors);
+ int64_t offset, int64_t bytes);
 BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index c8100d2..8e7822c 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -401,17 +401,23 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 }

 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-   int64_t cur_sector, int64_t nr_sectors)
+   int64_t offset, int64_t bytes)
 {
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+
 assert(bdrv_dirty_bitmap_enabled(bitmap));
-hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
+end_sector - (offset >> BDRV_SECTOR_BITS));
 }

 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
- int64_t cur_sector, int64_t nr_sectors)
+ int64_t offset, int64_t bytes)
 {
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+
 assert(bdrv_dirty_bitmap_enabled(bitmap));
-hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
+hbitmap_reset(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
+  end_sector - (offset >> BDRV_SECTOR_BITS));
 }

 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
diff --git a/block/mirror.c b/block/mirror.c
index 8b36ec2..b4fe259 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -141,8 +141,7 @@ static void mirror_write_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;

-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
-  op->bytes >> BDRV_SECTOR_BITS);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
 action = mirror_error_action(s, false, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -161,8 +160,7 @@ static void mirror_read_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;

-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
-  op->bytes >> BDRV_SECTOR_BITS);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
 action = mirror_error_action(s, true, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -380,8 +378,8 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
  * calling bdrv_get_block_status_above could yield - if some blocks are
  * marked dirty in this window, we need to know.
  */
-bdrv_reset_dirty_bitmap(s->dirty_bitmap, offset >> BDRV_SECTOR_BITS,
-nb_chunks * sectors_per_chunk);
+bdrv_reset_dirty_bitmap(s->dirty_bitmap, offset,
+nb_chunks * s->granularity);
 bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
 while (nb_chunks > 0 && offset < s->bdev_length) {
 int64_t ret;
@@ -614,7 +612,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)

 if (base == NULL && !bdrv_has_zero_init(target_bs)) {
 if

[Qemu-devel] [PATCH v2 11/12] dirty-bitmap: Switch bdrv_set_dirty() to bytes

2017-05-16 Thread Eric Blake

Both callers already had bytes available, but were scaling to
sectors.  Move the scaling to internal code.  In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as dirty bitmap widens start/bytes to granularity
boundaries.
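
What "widening to granularity boundaries" means can be sketched as below. This is a hedged illustration, not the hbitmap implementation: ToyByteRange and toy_widen() are made-up names, and the sketch assumes the granularity is a power of two.

```c
#include <stdint.h>

typedef struct ToyByteRange {
    int64_t start;  /* start rounded down to a granularity boundary */
    int64_t bytes;  /* length after rounding the end up */
} ToyByteRange;

/*
 * Widen [offset, offset + bytes) so both ends sit on granularity
 * boundaries. Assumes granularity is a power of two, offset >= 0
 * and bytes > 0.
 */
ToyByteRange toy_widen(int64_t offset, int64_t bytes, int64_t granularity)
{
    int64_t mask = granularity - 1;
    int64_t start = offset & ~mask;
    int64_t end = (offset + bytes + mask) & ~mask;
    ToyByteRange range = { start, end - start };

    return range;
}
```

Because the widened range always contains the exact one, passing an unaligned offset cannot leave a written byte unmarked; at worst a few extra bytes are reported dirty.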

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: no change
---
 include/block/block_int.h | 2 +-
 block/dirty-bitmap.c  | 8 +---
 block/io.c| 6 ++
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8d3724c..eec9835 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -931,7 +931,7 @@ void blk_dev_eject_request(BlockBackend *blk, bool force);
 bool blk_dev_is_tray_open(BlockBackend *blk);
 bool blk_dev_is_medium_locked(BlockBackend *blk);

-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int64_t nr_sect);
+void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes);
 bool bdrv_requests_pending(BlockDriverState *bs);

 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 8e7822c..ef165eb 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -478,15 +478,17 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap 
*bitmap)
 hbitmap_deserialize_finish(bitmap->bitmap);
 }

-void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
-int64_t nr_sectors)
+void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
 BdrvDirtyBitmap *bitmap;
+int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
+
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
 if (!bdrv_dirty_bitmap_enabled(bitmap)) {
 continue;
 }
-hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
+hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
+end_sector - (offset >> BDRV_SECTOR_BITS));
 }
 }

diff --git a/block/io.c b/block/io.c
index d4f1925..e86a546 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1324,7 +1324,6 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 bool waited;
 int ret;

-int64_t start_sector = offset >> BDRV_SECTOR_BITS;
 int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
 uint64_t bytes_remaining = bytes;
 int max_transfer;
@@ -1395,7 +1394,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);

 ++bs->write_gen;
-bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
+bdrv_set_dirty(bs, offset, bytes);

 if (bs->wr_highest_offset < offset + bytes) {
 bs->wr_highest_offset = offset + bytes;
@@ -2529,8 +2528,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, 
int64_t offset,
 ret = 0;
 out:
 ++bs->write_gen;
-bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
-   req.bytes >> BDRV_SECTOR_BITS);
+bdrv_set_dirty(bs, req.offset, req.bytes);
 tracked_request_end(&req);
 bdrv_dec_in_flight(bs);
 return ret;
-- 
2.9.4

[Qemu-devel] [PATCH] qapi-schema: Remove obsolete note from ObjectTypeInfo

2017-05-16 Thread Eduardo Habkost

The "This command is experimental" note in ObjectTypeInfo is obsolete
since 2012.  Commit 5192082097549c5b3aa7c913c6853d97a68172cb removed the
warning from the qom-list-types command documentation, but we forgot to
remove the warning from ObjectTypeInfo.

Signed-off-by: Eduardo Habkost 
---
 qapi-schema.json | 2 --
 1 file changed, 2 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 80603cfc51..e6da88585a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3035,8 +3035,6 @@
 # @name: the type name found in the search
 #
 # Since: 1.1
-#
-# Notes: This command is experimental and may change syntax in future releases.
 ##
 { 'struct': 'ObjectTypeInfo',
   'data': { 'name': 'str' } }
-- 
2.11.0.259.g40922b1

[Qemu-devel] [PATCH v2 07/12] dirty-bitmap: Change bdrv_get_dirty_count() to report bytes

2017-05-16 Thread Eric Blake

Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: no change
---
 block/dirty-bitmap.c |  4 ++--
 block/mirror.c   | 13 +
 migration/block.c|  2 +-
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 2f9f554..e3c2e34 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -319,7 +319,7 @@ BlockDirtyInfoList 
*bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
 BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
 BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
-info->count = bdrv_get_dirty_count(bm) << BDRV_SECTOR_BITS;
+info->count = bdrv_get_dirty_count(bm);
 info->granularity = bdrv_dirty_bitmap_granularity(bm);
 info->has_name = !!bm->name;
 info->name = g_strdup(bm->name);
@@ -494,7 +494,7 @@ void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t 
offset)

 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
 {
-return hbitmap_count(bitmap->bitmap);
+return hbitmap_count(bitmap->bitmap) << BDRV_SECTOR_BITS;
 }

 int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap)
diff --git a/block/mirror.c b/block/mirror.c
index 8428733..b82bdce 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -794,11 +794,10 @@ static void coroutine_fn mirror_run(void *opaque)

 cnt = bdrv_get_dirty_count(s->dirty_bitmap);
 /* s->common.offset contains the number of bytes already processed so
- * far, cnt is the number of dirty sectors remaining and
+ * far, cnt is the number of dirty bytes remaining and
  * s->bytes_in_flight is the number of bytes currently being
  * processed; together those are the current total operation length */
-s->common.len = s->common.offset + s->bytes_in_flight +
-cnt * BDRV_SECTOR_SIZE;
+s->common.len = s->common.offset + s->bytes_in_flight + cnt;

 /* Note that even when no rate limit is applied we need to yield
  * periodically with no pending I/O so that bdrv_drain_all() returns.
@@ -810,8 +809,7 @@ static void coroutine_fn mirror_run(void *opaque)
 s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
 if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
 (cnt == 0 && s->in_flight > 0)) {
-trace_mirror_yield(s, cnt * BDRV_SECTOR_SIZE,
-   s->buf_free_count, s->in_flight);
+trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
 mirror_wait_for_io(s);
 continue;
 } else if (cnt != 0) {
@@ -852,7 +850,7 @@ static void coroutine_fn mirror_run(void *opaque)
  * whether to switch to target check one last time if I/O has
  * come in the meanwhile, and if not flush the data to disk.
  */
-trace_mirror_before_drain(s, cnt * BDRV_SECTOR_SIZE);
+trace_mirror_before_drain(s, cnt);

 bdrv_drained_begin(bs);
 cnt = bdrv_get_dirty_count(s->dirty_bitmap);
@@ -871,8 +869,7 @@ static void coroutine_fn mirror_run(void *opaque)
 }

 ret = 0;
-trace_mirror_before_sleep(s, cnt * BDRV_SECTOR_SIZE,
-  s->synced, delay_ns);
+trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
 if (!s->synced) {
 block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
 if (block_job_is_cancelled(&s->common)) {
diff --git a/migration/block.c b/migration/block.c
index ecc838a..5a545cf 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -672,7 +672,7 @@ static int64_t get_remaining_dirty(void)
 aio_context_release(blk_get_aio_context(bmds->blk));
 }

-return dirty << BDRV_SECTOR_BITS;
+return dirty;
 }

 /* Called with iothread lock taken.  */
-- 
2.9.4

[Qemu-devel] [PATCH v2 08/12] dirty-bitmap: Change bdrv_get_dirty() to take bytes

2017-05-16 Thread Eric Blake

Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration.  Both
callers were already using the result as a bool, so make that
explicit.  Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.

Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.
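
To make the granularity point concrete, here is a minimal standalone
sketch.  It is not QEMU code: toy_bitmap, toy_set_dirty() and
toy_get_dirty() are hypothetical names, and the 64-chunk limit is an
assumption chosen only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy bitmap: one bit per granularity-sized chunk, 64 chunks max. */
typedef struct {
    uint64_t bits;   /* bit i covers bytes [i << shift, (i + 1) << shift) */
    unsigned shift;  /* log2 of the granularity in bytes */
} toy_bitmap;

/* Mark every chunk touched by [offset, offset + bytes) as dirty. */
static void toy_set_dirty(toy_bitmap *bm, int64_t offset, int64_t bytes)
{
    assert(offset >= 0 && bytes > 0);
    int64_t first = offset >> bm->shift;
    int64_t last = (offset + bytes - 1) >> bm->shift;
    assert(last < 64);
    for (int64_t i = first; i <= last; i++) {
        bm->bits |= UINT64_C(1) << i;
    }
}

/* Asking about one byte really asks about the chunk that contains it. */
static bool toy_get_dirty(const toy_bitmap *bm, int64_t offset)
{
    assert(offset >= 0 && (offset >> bm->shift) < 64);
    return (bm->bits >> (offset >> bm->shift)) & 1;
}
```

With a 64 KiB granularity, dirtying the single byte at offset 70000
makes every offset from 65536 through 131071 report dirty.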

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: tweak commit message, no code change
---
 include/block/dirty-bitmap.h | 4 ++--
 block/dirty-bitmap.c         | 8 ++++----
 block/mirror.c               | 3 +--
 migration/block.c            | 3 +--
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index efcec60..b8434e5 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -34,8 +34,8 @@ bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
 DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
-int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
-   int64_t sector);
+bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+int64_t offset);
 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index e3c2e34..c8100d2 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -332,13 +332,13 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 return list;
 }

-int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
-   int64_t sector)
+bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+int64_t offset)
 {
 if (bitmap) {
-return hbitmap_get(bitmap->bitmap, sector);
+return hbitmap_get(bitmap->bitmap, offset >> BDRV_SECTOR_BITS);
 } else {
-return 0;
+return false;
 }
 }

diff --git a/block/mirror.c b/block/mirror.c
index b82bdce..8b36ec2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -359,8 +359,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 int64_t next_offset = offset + nb_chunks * s->granularity;
 int64_t next_chunk = next_offset / s->granularity;
 if (next_offset >= s->bdev_length ||
-!bdrv_get_dirty(source, s->dirty_bitmap,
-next_offset >> BDRV_SECTOR_BITS)) {
+!bdrv_get_dirty(source, s->dirty_bitmap, next_offset)) {
 break;
 }
 if (test_bit(next_chunk, s->in_flight_bitmap)) {
diff --git a/migration/block.c b/migration/block.c
index 5a545cf..3e3dec9 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -537,8 +537,7 @@ static int mig_save_device_dirty(QEMUFile *f, BlkMigDevState *bmds,
 } else {
 blk_mig_unlock();
 }
-if (bdrv_get_dirty(bs, bmds->dirty_bitmap, sector)) {
-
+if (bdrv_get_dirty(bs, bmds->dirty_bitmap, sector * BDRV_SECTOR_SIZE)) {
 if (total_sectors - sector < BDRV_SECTORS_PER_DIRTY_CHUNK) {
 nr_sectors = total_sectors - sector;
 } else {
-- 
2.9.4

[Qemu-devel] [PATCH v2 06/12] dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset

2017-05-16 Thread Eric Blake

Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.
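
Why the scaling can move inside the helper without touching the
callers' end-of-iteration checks can be sketched as follows.  This is
a standalone illustration, not QEMU code: toy_iter and both functions
are hypothetical, and it assumes the underlying sector iterator
signals exhaustion with -1.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512   /* stands in for BDRV_SECTOR_SIZE */

/* Hypothetical iterator over a sorted array of dirty sector numbers. */
typedef struct {
    const int64_t *sectors;
    int count;
    int pos;
} toy_iter;

/* Sector-based primitive: next dirty sector, or -1 when exhausted. */
static int64_t toy_iter_next_sector(toy_iter *it)
{
    assert(it->pos >= 0 && it->pos <= it->count);
    return it->pos < it->count ? it->sectors[it->pos++] : -1;
}

/*
 * Byte-based wrapper: scale once, here.  The -1 sentinel becomes -512,
 * which is still negative, so callers testing ">= 0" or "< 0" keep working.
 */
static int64_t toy_iter_next(toy_iter *it)
{
    return toy_iter_next_sector(it) * TOY_SECTOR_SIZE;
}
```

Callers that compare the result against exactly -1 would break under
this scheme; the callers in this patch only test the sign.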

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: no change
---
 block/backup.c       | 2 +-
 block/dirty-bitmap.c | 2 +-
 block/mirror.c       | 8 ++++----
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 70126b8..dc3c7f2 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -375,7 +375,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 dbi = bdrv_dirty_iter_new(job->sync_bitmap);

 /* Find the next dirty sector(s) */
-while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
+while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
 cluster = offset / job->cluster_size;

 /* Fake progress updates for any clusters we skipped */
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 3fb4871..2f9f554 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -397,7 +397,7 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)

 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 {
-return hbitmap_iter_next(&iter->hbi);
+return hbitmap_iter_next(&iter->hbi) * BDRV_SECTOR_SIZE;
 }

 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
diff --git a/block/mirror.c b/block/mirror.c
index 885cc29..8428733 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -335,10 +335,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
 int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

-offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+offset = bdrv_dirty_iter_next(s->dbi);
 if (offset < 0) {
 bdrv_set_dirty_iter(s->dbi, 0);
-offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+offset = bdrv_dirty_iter_next(s->dbi);
 trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
   BDRV_SECTOR_SIZE);
 assert(offset >= 0);
@@ -367,11 +367,11 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 break;
 }

-next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+next_dirty = bdrv_dirty_iter_next(s->dbi);
 if (next_dirty > next_offset || next_dirty < 0) {
 /* The bitmap iterator's cache is stale, refresh it */
 bdrv_set_dirty_iter(s->dbi, next_offset);
-next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+next_dirty = bdrv_dirty_iter_next(s->dbi);
 }
 assert(next_dirty == next_offset);
 nb_chunks++;
-- 
2.9.4

[Qemu-devel] [PATCH v2 02/12] migration: Don't lose errno across aio context changes

2017-05-16 Thread Eric Blake

set_dirty_tracking() was assuming that the errno value set by
bdrv_create_dirty_bitmap() would not be corrupted by either
blk_get_aio_context() or aio_context_release().  Rather than
audit whether this assumption is safe, rewrite the code to just
grab the value of errno sooner.
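
The pattern can be shown in isolation.  The sketch below is not the
migration code: toy_create(), toy_release() and toy_setup() are
hypothetical stand-ins for a call that fails with errno set, a
cleanup call that may clobber errno, and the function that must
report the original error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper that fails and reports the reason through errno. */
static void *toy_create(void)
{
    errno = ENOMEM;
    return NULL;
}

/* Hypothetical cleanup step that may overwrite errno as a side effect. */
static void toy_release(void)
{
    errno = EINTR;
}

/* Capture errno right after the call that set it, before any other call. */
static int toy_setup(void)
{
    void *obj = toy_create();
    int ret = -errno;   /* only meaningful when obj == NULL */

    toy_release();
    if (!obj) {
        return ret;     /* still -ENOMEM, not -EINTR */
    }
    return 0;
}
```

Reading errno after toy_release() instead would return -EINTR here,
which is the class of bug the patch guards against.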

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Juan Quintela 

---
v2: fix commit message typo, no code change
---
 migration/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/block.c b/migration/block.c
index 8d79d84..ecc838a 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -350,9 +350,9 @@ static int set_dirty_tracking(void)
 aio_context_acquire(blk_get_aio_context(bmds->blk));
 bmds->dirty_bitmap = bdrv_create_dirty_bitmap(blk_bs(bmds->blk),
   BLOCK_SIZE, NULL, NULL);
+ret = -errno;
 aio_context_release(blk_get_aio_context(bmds->blk));
 if (!bmds->dirty_bitmap) {
-ret = -errno;
 goto fail;
 }
 }
-- 
2.9.4

[Qemu-devel] [PATCH v2 12/12] dirty-bitmap: Convert internal hbitmap size/granularity

2017-05-16 Thread Eric Blake

Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity.  It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.
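
The two encodings of the granularity describe the same value, which
is what lets bdrv_dirty_bitmap_granularity() change its formula.  A
small standalone sketch, using hypothetical helper names and
TOY_SECTOR_BITS as a stand-in for BDRV_SECTOR_BITS (9):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_BITS 9
#define TOY_SECTOR_SIZE (1U << TOY_SECTOR_BITS)

/* Count trailing zero bits of a nonzero value (portable stand-in for ctz32()). */
static unsigned toy_ctz32(uint32_t v)
{
    unsigned n = 0;

    assert(v != 0);
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Old scheme: bitmap indexed by sector, shift relative to the sector size. */
static uint32_t granularity_from_sector_shift(unsigned shift)
{
    return TOY_SECTOR_SIZE << shift;
}

/* New scheme: bitmap indexed by byte, shift is log2 of the granularity. */
static uint32_t granularity_from_byte_shift(unsigned shift)
{
    return 1U << shift;
}
```

For a 64 KiB granularity the old shift is 16 - 9 = 7 and the new
shift is 16; both functions return 65536.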

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: no change
---
 block/dirty-bitmap.c | 37 ++++++++++++-------------------------
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index ef165eb..26ca084 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -37,7 +37,7 @@
  * or enabled. A frozen bitmap can only abdicate() or reclaim().
  */
 struct BdrvDirtyBitmap {
-HBitmap *bitmap;/* Dirty sector bitmap implementation */
+HBitmap *bitmap;/* Dirty bitmap implementation */
 HBitmap *meta;  /* Meta dirty bitmap */
 BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
 char *name; /* Optional non-empty unique ID */
@@ -93,12 +93,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
 return NULL;
 }
 bitmap = g_new0(BdrvDirtyBitmap, 1);
-/*
- * TODO - let hbitmap track full granularity. For now, it is tracking
- * only sector granularity, as a shortcut for our iterators.
- */
-bitmap->bitmap = hbitmap_alloc(bitmap_size >> BDRV_SECTOR_BITS,
-   ctz32(granularity) - BDRV_SECTOR_BITS);
+bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(granularity));
 bitmap->size = bitmap_size;
 bitmap->name = g_strdup(name);
 bitmap->disabled = false;
@@ -254,7 +249,7 @@ void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
 assert(!bdrv_dirty_bitmap_frozen(bitmap));
 assert(!bitmap->active_iterators);
-hbitmap_truncate(bitmap->bitmap, size >> BDRV_SECTOR_BITS);
+hbitmap_truncate(bitmap->bitmap, size);
 bitmap->size = size;
 }
 }
@@ -336,7 +331,7 @@ bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
 int64_t offset)
 {
 if (bitmap) {
-return hbitmap_get(bitmap->bitmap, offset >> BDRV_SECTOR_BITS);
+return hbitmap_get(bitmap->bitmap, offset);
 } else {
 return false;
 }
@@ -364,7 +359,7 @@ uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs)

 uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
 {
-return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
+return 1U << hbitmap_granularity(bitmap->bitmap);
 }

 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap)
@@ -397,27 +392,21 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)

 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 {
-return hbitmap_iter_next(&iter->hbi) * BDRV_SECTOR_SIZE;
+return hbitmap_iter_next(&iter->hbi);
 }

 void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t offset, int64_t bytes)
 {
-int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
-
 assert(bdrv_dirty_bitmap_enabled(bitmap));
-hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
-end_sector - (offset >> BDRV_SECTOR_BITS));
+hbitmap_set(bitmap->bitmap, offset, bytes);
 }

 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t offset, int64_t bytes)
 {
-int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
-
 assert(bdrv_dirty_bitmap_enabled(bitmap));
-hbitmap_reset(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
-  end_sector - (offset >> BDRV_SECTOR_BITS));
+hbitmap_reset(bitmap->bitmap, offset, bytes);
 }

 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
@@ -427,7 +416,7 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
 hbitmap_reset_all(bitmap->bitmap);
 } else {
 HBitmap *backup = bitmap->bitmap;
-bitmap->bitmap = hbitmap_alloc(bitmap->size >> BDRV_SECTOR_BITS,
+bitmap->bitmap = hbitmap_alloc(bitmap->size,
hbitmap_granularity(backup));
 *out = backup;
 }
@@ -481,14 +470,12 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
 void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
 BdrvDirtyBitmap *bitmap;
-int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);

 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
 if (!bdrv_dirty_bitmap_enabled(bitmap)) {
 continue;
 }
-hbitmap_set(bitmap->bitmap, offset >> BDRV_SECTOR_BITS,
-end_sector - (offset >> BDRV_SECTOR_BITS));
+hbitmap_set(bitmap->bitmap, offset, bytes);
 }
 }

@@ -497,12 +484,12 @@ void bdrv_set_dirty(BlockDriverState *bs,

[Qemu-devel] [PATCH v2 05/12] dirty-bitmap: Set iterator start by offset, not sector

2017-05-16 Thread Eric Blake

All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, so drop that parameter.

All callers to bdrv_set_dirty_iter() were scaling an offset to
a sector number; move the scaling to occur internally to dirty
bitmap code instead.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: no change
---
 include/block/dirty-bitmap.h | 5 ++---
 block/backup.c               | 5 ++---
 block/dirty-bitmap.c         | 9 ++++-----
 block/mirror.c               | 4 ++--
 4 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index a83979d..efcec60 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -41,11 +41,10 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t cur_sector, int64_t nr_sectors);
 BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
-BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
- uint64_t first_sector);
+BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
-void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t sector_num);
+void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
 int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
diff --git a/block/backup.c b/block/backup.c
index b8b76e5..70126b8 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -372,7 +372,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)

 granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
 clusters_per_iter = MAX((granularity / job->cluster_size), 1);
-dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
+dbi = bdrv_dirty_iter_new(job->sync_bitmap);

 /* Find the next dirty sector(s) */
 while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
@@ -403,8 +403,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 /* If the bitmap granularity is smaller than the backup granularity,
  * we need to advance the iterator pointer to the next cluster. */
 if (granularity < job->cluster_size) {
-bdrv_set_dirty_iter(dbi,
-cluster * job->cluster_size / BDRV_SECTOR_SIZE);
+bdrv_set_dirty_iter(dbi, cluster * job->cluster_size);
 }

 last_cluster = cluster - 1;
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index a413df1..3fb4871 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -367,11 +367,10 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
 return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
 }

-BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
- uint64_t first_sector)
+BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap)
 {
 BdrvDirtyBitmapIter *iter = g_new(BdrvDirtyBitmapIter, 1);
-hbitmap_iter_init(&iter->hbi, bitmap->bitmap, first_sector);
+hbitmap_iter_init(&iter->hbi, bitmap->bitmap, 0);
 iter->bitmap = bitmap;
 bitmap->active_iterators++;
 return iter;
@@ -488,9 +487,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
 /**
  * Advance a BdrvDirtyBitmapIter to an arbitrary offset.
  */
-void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t sector_num)
+void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t offset)
 {
-hbitmap_iter_init(&iter->hbi, iter->hbi.hb, sector_num);
+hbitmap_iter_init(&iter->hbi, iter->hbi.hb, offset >> BDRV_SECTOR_BITS);
 }

 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
diff --git a/block/mirror.c b/block/mirror.c
index 452d546..885cc29 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -370,7 +370,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
 if (next_dirty > next_offset || next_dirty < 0) {
 /* The bitmap iterator's cache is stale, refresh it */
-bdrv_set_dirty_iter(s->dbi, next_offset >> BDRV_SECTOR_BITS);
+bdrv_set_dirty_iter(s->dbi, next_offset);
 next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
 }
 assert(next_dirty == next_offset);
@@ -779,7 +779,7 @@ static void coroutine_fn mirror_run(void *opaque)
 }

 assert(!s->dbi);
-s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap, 0);
+s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
 for (;;) {
 uint64_t delay_ns = 0;
 int64_t cnt, delta;
-- 
2.9.4

[Qemu-devel] [PATCH v2 00/12] make dirty-bitmap byte-based

2017-05-16 Thread Eric Blake

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

This is part two of that conversion: dirty-bitmap. Other parts
include bdrv_is_allocated (at v2 [1]) and replacing
bdrv_get_block_status with a byte based callback in all the
drivers (at v1, needs a rebase [2]).

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-dirty-v2

No change to the code itself since v1 [3], just tweaking commit
messages and adding John's Reviewed-by tags, and making sure it
still rebases cleanly on top of Max's block branch.

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg02573.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02642.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02163.html

Eric Blake (12):
  dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented
  migration: Don't lose errno across aio context changes
  dirty-bitmap: Drop unused functions
  dirty-bitmap: Track size in bytes
  dirty-bitmap: Set iterator start by offset, not sector
  dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset
  dirty-bitmap: Change bdrv_get_dirty_count() to report bytes
  dirty-bitmap: Change bdrv_get_dirty() to take bytes
  dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes
  mirror: Switch mirror_dirty_init() to byte-based iteration
  dirty-bitmap: Switch bdrv_set_dirty() to bytes
  dirty-bitmap: Convert internal hbitmap size/granularity

 include/block/block_int.h|  2 +-
 include/block/dirty-bitmap.h | 21 ---
 block/backup.c   |  7 ++--
 block/dirty-bitmap.c | 83 
 block/io.c   |  6 ++--
 block/mirror.c   | 73 +-
 migration/block.c| 14 
 7 files changed, 74 insertions(+), 132 deletions(-)

-- 
2.9.4

[Qemu-devel] [PATCH v2 10/12] mirror: Switch mirror_dirty_init() to byte-based iteration

2017-05-16 Thread Eric Blake

Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.
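
The shape of the byte-based loop can be sketched on its own.  This is
an illustration rather than the mirror code: toy_align_down() and
toy_count_chunks() are hypothetical, and the sketch assumes a 32-bit
int.  Each pass handles at most INT_MAX bytes rounded down to a
multiple of the granularity, so the chunk length fits in an int and
chunk boundaries stay aligned.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Round n down to a multiple of m (m > 0), in the spirit of QEMU_ALIGN_DOWN. */
static int64_t toy_align_down(int64_t n, int64_t m)
{
    return n / m * m;
}

/*
 * Walk [0, length) in byte chunks that fit in an int and stay aligned to
 * the granularity.  Returns the number of chunks visited.
 */
static int toy_count_chunks(int64_t length, int64_t granularity)
{
    int chunks = 0;
    int64_t max_chunk = toy_align_down(INT_MAX, granularity);

    assert(granularity > 0 && max_chunk > 0);
    for (int64_t offset = 0; offset < length; ) {
        int64_t remaining = length - offset;
        int bytes = (int)(remaining < max_chunk ? remaining : max_chunk);

        /* ... process [offset, offset + bytes) here ... */
        offset += bytes;
        chunks++;
    }
    return chunks;
}
```

With a 64 KiB granularity the largest chunk is 2147418112 bytes, so a
5 GiB image is covered in three passes.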

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: no change
---
 block/mirror.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b4fe259..6cfa57c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -601,15 +601,13 @@ static void mirror_throttle(MirrorBlockJob *s)

 static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
-int64_t sector_num, end;
+int64_t offset;
 BlockDriverState *base = s->base;
 BlockDriverState *bs = s->source;
 BlockDriverState *target_bs = blk_bs(s->target);
-int ret, n;
+int ret;
 int64_t count;

-end = s->bdev_length / BDRV_SECTOR_SIZE;
-
 if (base == NULL && !bdrv_has_zero_init(target_bs)) {
 if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
 bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
@@ -617,9 +615,9 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 }

 s->initial_zeroing_ongoing = true;
-for (sector_num = 0; sector_num < end; ) {
-int nb_sectors = MIN(end - sector_num,
-QEMU_ALIGN_DOWN(INT_MAX, s->granularity) >> BDRV_SECTOR_BITS);
+for (offset = 0; offset < s->bdev_length; ) {
+int bytes = MIN(s->bdev_length - offset,
+QEMU_ALIGN_DOWN(INT_MAX, s->granularity));

 mirror_throttle(s);

@@ -635,9 +633,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 continue;
 }

-mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
-  nb_sectors * BDRV_SECTOR_SIZE, false);
-sector_num += nb_sectors;
+mirror_do_zero_or_discard(s, offset, bytes, false);
+offset += bytes;
 }

 mirror_wait_for_all_io(s);
@@ -645,10 +642,10 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 }

 /* First part, loop on the sectors and initialize the dirty bitmap.  */
-for (sector_num = 0; sector_num < end; ) {
+for (offset = 0; offset < s->bdev_length; ) {
 /* Just to make sure we are not exceeding int limit. */
-int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
- end - sector_num);
+int bytes = MIN(s->bdev_length - offset,
+QEMU_ALIGN_DOWN(INT_MAX, s->granularity));

 mirror_throttle(s);

@@ -656,20 +653,16 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 return 0;
 }

-ret = bdrv_is_allocated_above(bs, base, sector_num * BDRV_SECTOR_SIZE,
-  nb_sectors * BDRV_SECTOR_SIZE, &count);
+ret = bdrv_is_allocated_above(bs, base, offset, bytes, &count);
 if (ret < 0) {
 return ret;
 }

-n = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
-assert(n > 0);
+count = QEMU_ALIGN_UP(count, BDRV_SECTOR_SIZE);
 if (ret == 1) {
-bdrv_set_dirty_bitmap(s->dirty_bitmap,
-  sector_num * BDRV_SECTOR_SIZE,
-  n * BDRV_SECTOR_SIZE);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, count);
 }
-sector_num += n;
+offset += count;
 }
 return 0;
 }
-- 
2.9.4

[Qemu-devel] [PATCH v2 04/12] dirty-bitmap: Track size in bytes

2017-05-16 Thread Eric Blake

We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is an internal-only variable.

Use is_power_of_2() while at it, instead of open-coding that.
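
For reference, a sketch of the check being consolidated.  The names
toy_is_power_of_2() and toy_check_granularity() are hypothetical
stand-ins, not the QEMU helpers, and TOY_SECTOR_SIZE stands in for
BDRV_SECTOR_SIZE.  Note that the open-coded expression
(v & (v - 1)) == 0 is also true for v == 0, so a separate lower bound
is still needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512u   /* stands in for BDRV_SECTOR_SIZE */

/* True when exactly one bit is set; zero is rejected explicitly. */
static bool toy_is_power_of_2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Combined check: power of two, and at least one sector. */
static void toy_check_granularity(uint32_t granularity)
{
    assert(toy_is_power_of_2(granularity) && granularity >= TOY_SECTOR_SIZE);
}
```

A granularity of 65536 passes; 1536 fails the power-of-two test and
256 fails the lower bound.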

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: tweak commit message, no code change
---
 block/dirty-bitmap.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 32698d5..a413df1 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -41,7 +41,7 @@ struct BdrvDirtyBitmap {
 HBitmap *meta;  /* Meta dirty bitmap */
 BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
 char *name; /* Optional non-empty unique ID */
-int64_t size;   /* Size of the bitmap (Number of sectors) */
+int64_t size;   /* Size of the bitmap, in bytes */
 bool disabled;  /* Bitmap is read-only */
 int active_iterators;   /* How many iterators are active */
 QLIST_ENTRY(BdrvDirtyBitmap) list;
@@ -79,24 +79,26 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
 {
 int64_t bitmap_size;
 BdrvDirtyBitmap *bitmap;
-uint32_t sector_granularity;

-assert((granularity & (granularity - 1)) == 0);
+assert(is_power_of_2(granularity) && granularity >= BDRV_SECTOR_SIZE);

 if (name && bdrv_find_dirty_bitmap(bs, name)) {
 error_setg(errp, "Bitmap already exists: %s", name);
 return NULL;
 }
-sector_granularity = granularity >> BDRV_SECTOR_BITS;
-assert(sector_granularity);
-bitmap_size = bdrv_nb_sectors(bs);
+bitmap_size = bdrv_getlength(bs);
 if (bitmap_size < 0) {
 error_setg_errno(errp, -bitmap_size, "could not get length of device");
 errno = -bitmap_size;
 return NULL;
 }
 bitmap = g_new0(BdrvDirtyBitmap, 1);
-bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity));
+/*
+ * TODO - let hbitmap track full granularity. For now, it is tracking
+ * only sector granularity, as a shortcut for our iterators.
+ */
+bitmap->bitmap = hbitmap_alloc(bitmap_size >> BDRV_SECTOR_BITS,
+   ctz32(granularity) - BDRV_SECTOR_BITS);
 bitmap->size = bitmap_size;
 bitmap->name = g_strdup(name);
 bitmap->disabled = false;
@@ -246,12 +248,13 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
 {
 BdrvDirtyBitmap *bitmap;
-uint64_t size = bdrv_nb_sectors(bs);
+int64_t size = bdrv_getlength(bs);

+assert(size >= 0);
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
 assert(!bdrv_dirty_bitmap_frozen(bitmap));
 assert(!bitmap->active_iterators);
-hbitmap_truncate(bitmap->bitmap, size);
+hbitmap_truncate(bitmap->bitmap, size >> BDRV_SECTOR_BITS);
 bitmap->size = size;
 }
 }
@@ -419,7 +422,7 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
 hbitmap_reset_all(bitmap->bitmap);
 } else {
 HBitmap *backup = bitmap->bitmap;
-bitmap->bitmap = hbitmap_alloc(bitmap->size,
+bitmap->bitmap = hbitmap_alloc(bitmap->size >> BDRV_SECTOR_BITS,
hbitmap_granularity(backup));
 *out = backup;
 }
-- 
2.9.4

[Qemu-devel] [PATCH v2 03/12] dirty-bitmap: Drop unused functions

2017-05-16 Thread Eric Blake

We had several functions that no one is currently using, and which
use sector-based interfaces.  I'm trying to convert towards byte-based
interfaces, so it's easier to just drop the unused functions:

bdrv_dirty_bitmap_size
bdrv_dirty_bitmap_get_meta
bdrv_dirty_bitmap_reset_meta
bdrv_dirty_bitmap_meta_granularity

Vladimir may re-add bdrv_dirty_bitmap_size() for persistent
bitmaps, but has agreed to do so with byte rather than sector
access at the point where it is needed.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
v2: tweak commit message based on review, no code change
---
 include/block/dirty-bitmap.h |  8 --------
 block/dirty-bitmap.c         | 34 ----------------------------------
 2 files changed, 42 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 9dea14b..a83979d 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -30,11 +30,9 @@ void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
 uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs);
 uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap);
-uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap);
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap);
-int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap);
 DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap);
 int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
int64_t sector);
@@ -42,12 +40,6 @@ void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors);
 void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
  int64_t cur_sector, int64_t nr_sectors);
-int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
-   BdrvDirtyBitmap *bitmap, int64_t sector,
-   int nb_sectors);
-void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
-  BdrvDirtyBitmap *bitmap, int64_t sector,
-  int nb_sectors);
 BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap);
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
  uint64_t first_sector);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 6d8ce5f..32698d5 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -130,35 +130,6 @@ void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 bitmap->meta = NULL;
 }

-int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
-   BdrvDirtyBitmap *bitmap, int64_t sector,
-   int nb_sectors)
-{
-uint64_t i;
-int sectors_per_bit = 1 << hbitmap_granularity(bitmap->meta);
-
-/* To optimize: we can make hbitmap to internally check the range in a
- * coarse level, or at least do it word by word. */
-for (i = sector; i < sector + nb_sectors; i += sectors_per_bit) {
-if (hbitmap_get(bitmap->meta, i)) {
-return true;
-}
-}
-return false;
-}
-
-void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
-  BdrvDirtyBitmap *bitmap, int64_t sector,
-  int nb_sectors)
-{
-hbitmap_reset(bitmap->meta, sector, nb_sectors);
-}
-
-int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap)
-{
-return bitmap->size;
-}
-
 const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
 {
 return bitmap->name;
@@ -393,11 +364,6 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
 return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
 }

-uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap)
-{
-return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->meta);
-}
-
 BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
  uint64_t first_sector)
 {
-- 
2.9.4

[Qemu-devel] [PATCH v2 01/12] dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented

2017-05-16 Thread Eric Blake

We've been documenting the value in bytes since its introduction
in commit b9a9b3a4 (v1.3), where it was actually reported in bytes.

Commit e4654d2 (v2.0) then removed things from block/qapi.c, in
preparation for a rewrite to a list of dirty sectors in the next
commit 21b5683 in block.c, but the new code mistakenly started
reporting in sectors.

Fixes: https://bugzilla.redhat.com/1441460

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
Reviewed-by: John Snow 

---
Too late for 2.9, since the regression has been unnoticed for
nine releases. But worth putting in 2.9.1.

v2: no change
---
 block/dirty-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 519737c..6d8ce5f 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -345,7 +345,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
 BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
 BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
-info->count = bdrv_get_dirty_count(bm);
+info->count = bdrv_get_dirty_count(bm) << BDRV_SECTOR_BITS;
 info->granularity = bdrv_dirty_bitmap_granularity(bm);
 info->has_name = !!bm->name;
 info->name = g_strdup(bm->name);
-- 
2.9.4

Re: [Qemu-devel] [PATCH] nvme: Add support for Controller Memory Buffers

2017-05-16 Thread Keith Busch

On Tue, May 16, 2017 at 01:10:59PM -0600, sba...@raithlin.com wrote:
> From: Stephen Bates 
> 
> Implement NVMe Controller Memory Buffers (CMBs) which were added in
> version 1.2 of the NVMe Specification. This patch adds an optional
> argument (cmb_size_mb) which indicates the size of the CMB (in
> MB). Currently only the Submission Queue Support (SQS) is enabled
> which aligns with the current Linux driver for NVMe.
> 
> Signed-off-by: Stephen Bates 

Awesome, this looks great!

Acked-by: Keith Busch

Re: [Qemu-devel] [PATCH] SMM: disable smram region if smm is disabled

2017-05-16 Thread Xu, Anthony

> On 16/05/2017 03:21, Anthony Xu wrote:
> > when smm is disabled, smram is not used, so disable it
> >
> > Signed-off-by: Anthony Xu 
> 
> What is the benefit?
This patch removes one memory region on the i440 platform and three memory
regions on the q35 platform. That makes functions which iterate over the memory
region tree a little faster, even though the removed regions were already
disabled.


Anthony

Re: [Qemu-devel] [PATCH] target/i386: enable A20 automatically in system management mode

2017-05-16 Thread Xu, Anthony


> On Sat, May 13, 2017 at 01:24:30AM +, Xu, Anthony wrote:
> > I think it is related to accel and platform, the result I gave before is
> > for q35 tcg,
> >
> > With the above change,   I got below data
> >
> > Platform    accel   count of restoring A20 to 0
> > Q35         kvm     96
> > Q35         tcg     271
> > PC          kvm     3
> > PC          tcg     3
> 
> Okay, thanks.  I think the number of a20 switches is due to
> differences in option rom execution interacting with the fact that
> some mode switches were occurring before SeaBIOS set
> call16_override().
> 
> > But I still see a lot of PORT_A20 accesses in QEMU as I expected
> 
> Yes, but it should be possible to significantly reduce the number of
> outb() calls by limiting them to when A20 changes.  This should also
> be useful to reduce the number of outb() calls needed to disable NMIs.
> I sent a patch series to the seabios mailing list to demonstrate the
> idea.

If both TCG and KVM work while ignoring A20, why not remove all PORT_A20
accesses in SeaBIOS when CONFIG_DISABLE_A20 is not defined?
Do you see any impact?


-Anthony

[Qemu-devel] [PULL 4/4] xen: call qemu_set_cloexec instead of fcntl

2017-05-16 Thread Stefano Stabellini

Use the common utility function, which contains checks on return values
and first calls F_GETFD as recommended by POSIX.1-2001, instead of
manually calling fcntl.
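
The read-modify-write sequence described above looks roughly like the
sketch below.  toy_set_cloexec() is a hypothetical stand-in, not the
body of qemu_set_cloexec(); checking the return values with assert()
is an assumption made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Set FD_CLOEXEC on fd while preserving any other descriptor flags:
 * read the current flags with F_GETFD, then write them back with
 * FD_CLOEXEC added.  Both calls are checked.
 */
static void toy_set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    assert(flags != -1);
    flags = fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    assert(flags != -1);
}
```

A bare fcntl(fd, F_SETFD, FD_CLOEXEC) overwrites the whole flag set
and ignores failure, which is what the two call sites below did.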

CID: 1374831

Signed-off-by: Stefano Stabellini 
Reviewed-by: Eric Blake 
Reviewed-by: Greg Kurz 
CC: anthony.per...@citrix.com
CC: gr...@kaod.org
CC: aneesh.ku...@linux.vnet.ibm.com
CC: Eric Blake 
---
 hw/9pfs/xen-9p-backend.c | 2 +-
 hw/xen/xen_backend.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index a1fdede..5df97c9 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -380,7 +380,7 @@ static int xen_9pfs_connect(struct XenDevice *xendev)
 if (xen_9pdev->rings[i].evtchndev == NULL) {
 goto out;
 }
-fcntl(xenevtchn_fd(xen_9pdev->rings[i].evtchndev), F_SETFD, FD_CLOEXEC);
+qemu_set_cloexec(xenevtchn_fd(xen_9pdev->rings[i].evtchndev));
 xen_9pdev->rings[i].local_port = xenevtchn_bind_interdomain
 (xen_9pdev->rings[i].evtchndev,
  xendev->dom,
diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index c85f163..2cac47d 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -147,7 +147,7 @@ static struct XenDevice *xen_be_get_xendev(const char *type, int dom, int dev,
 qdev_unplug(DEVICE(xendev), NULL);
 return NULL;
 }
-fcntl(xenevtchn_fd(xendev->evtchndev), F_SETFD, FD_CLOEXEC);
+qemu_set_cloexec(xenevtchn_fd(xendev->evtchndev));
 
 if (ops->flags & DEVOPS_FLAG_NEED_GNTDEV) {
 xendev->gnttabdev = xengnttab_open(NULL, 0);
-- 
1.9.1

[Qemu-devel] [PULL 3/4] xen/9pfs: fix two resource leaks on error paths, discovered by Coverity

2017-05-16 Thread Stefano Stabellini

CID: 1374836

Signed-off-by: Stefano Stabellini 
Reviewed-by: Eric Blake 
Reviewed-by: Greg Kurz 
CC: anthony.per...@citrix.com
CC: gr...@kaod.org
CC: aneesh.ku...@linux.vnet.ibm.com
---
 hw/9pfs/xen-9p-backend.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 9c7f41a..a1fdede 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -332,12 +332,14 @@ static int xen_9pfs_connect(struct XenDevice *xendev)
 str = g_strdup_printf("ring-ref%u", i);
 if (xenstore_read_fe_int(&xen_9pdev->xendev, str,
  &xen_9pdev->rings[i].ref) == -1) {
+g_free(str);
 goto out;
 }
 g_free(str);
 str = g_strdup_printf("event-channel-%u", i);
 if (xenstore_read_fe_int(&xen_9pdev->xendev, str,
  &xen_9pdev->rings[i].evtchn) == -1) {
+g_free(str);
 goto out;
 }
 g_free(str);
-- 
1.9.1

[Qemu-devel] [PULL 2/4] configure: Remove -lxencall for Xen detection

2017-05-16 Thread Stefano Stabellini

From: Anthony PERARD 

QEMU does not depend on libxencall; it was added because it was a
missing link dependency of libxendevicemodel, but now the latter should
be built properly.

Signed-off-by: Anthony PERARD 
Reviewed-by: Stefano Stabellini 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 57b5ae6..139638e 100755
--- a/configure
+++ b/configure
@@ -2015,7 +2015,7 @@ if test "$xen" != "no" ; then
   else
 
 xen_libs="-lxenstore -lxenctrl -lxenguest"
-xen_stable_libs="-lxencall -lxenforeignmemory -lxengnttab -lxenevtchn"
+xen_stable_libs="-lxenforeignmemory -lxengnttab -lxenevtchn"
 
 # First we test whether Xen headers and libraries are available.
 # If no, we are done and there is no Xen support.
-- 
1.9.1

[Qemu-devel] [PULL 1/4] xen/mapcache: store dma information in revmapcache entries for debugging

2017-05-16 Thread Stefano Stabellini

The Xen mapcache is able to create long term mappings, which are called
"locked" mappings. The third parameter of the xen_map_cache call
specifies whether a mapping is a "locked" mapping.

From the QEMU point of view there are two kinds of long term mappings:

[a] device memory mappings, such as option roms and video memory
[b] dma mappings, created by dma_memory_map & friends

After certain operations, ballooning a VM in particular, Xen asks QEMU
kindly to destroy all mappings. However, [a] mappings are certainly
present and cannot be removed. That's not a problem as they are not
affected by ballooning. The *real* problem is that if there are any
mappings of type [b], any outstanding dma operations could fail. This is
a known shortcoming. In other words, when Xen asks QEMU to destroy all
mappings, it is an error if any [b] mappings exist.

However today we have no way of distinguishing [a] from [b]. Because of
that, we cannot even print a decent warning.

This patch introduces a new "dma" bool field to MapCacheRev entries, to
remember if a given mapping is for dma or is a long term device memory
mapping. When xen_invalidate_map_cache is called, we print a warning if
any [b] mappings exist. We ignore [a] mappings.

Mappings created by qemu_map_ram_ptr are assumed to be [a], while
mappings created by address_space_map->qemu_ram_ptr_length are assumed
to be [b].

The goal of the patch is to make debugging and system understanding
easier.
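
To make the intent concrete, here is a small standalone sketch of the
"warn only for [b] mappings" walk described above. It is a toy model and
not the QEMU code: RevEntry, warn_locked_dma() and the singly linked list
are hypothetical stand-ins for MapCacheRev and the QTAILQ used in
xen-mapcache.c, and the function returns a count purely so the behaviour
can be checked.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy reverse-mapping entry; "dma" marks a type [b] mapping. */
typedef struct RevEntry {
    uint64_t paddr_index;
    void *vaddr_req;
    bool dma;
    struct RevEntry *next;
} RevEntry;

/*
 * Walk the locked entries, skip device memory mappings (type [a]) and
 * print a warning for every DMA mapping (type [b]). Returns the number
 * of warnings printed.
 */
static size_t warn_locked_dma(const RevEntry *head)
{
    size_t warnings = 0;

    for (const RevEntry *e = head; e != NULL; e = e->next) {
        if (!e->dma) {
            continue;
        }
        fprintf(stderr, "Locked DMA mapping while invalidating mapcache!"
                " %016" PRIx64 " -> %p is present\n",
                e->paddr_index, e->vaddr_req);
        warnings++;
    }
    return warnings;
}
```

The point of the flag is only diagnostic: the walk does not change what
gets invalidated, it just makes the problematic case visible.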

Signed-off-by: Stefano Stabellini 
Acked-by: Paolo Bonzini 
Acked-by: Anthony PERARD 
---
 exec.c|  8 
 hw/i386/xen/xen-mapcache.c| 15 ++-
 include/sysemu/xen-mapcache.h |  5 +++--
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/exec.c b/exec.c
index eac6085..85769e1 100644
--- a/exec.c
+++ b/exec.c
@@ -2084,10 +2084,10 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
  * In that case just map until the end of the page.
  */
 if (block->offset == 0) {
-return xen_map_cache(addr, 0, 0);
+return xen_map_cache(addr, 0, 0, false);
 }
 
-block->host = xen_map_cache(block->offset, block->max_length, 1);
+block->host = xen_map_cache(block->offset, block->max_length, 1, false);
 }
 return ramblock_ptr(block, addr);
 }
@@ -2117,10 +2117,10 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
  * In that case just map the requested area.
  */
 if (block->offset == 0) {
-return xen_map_cache(addr, *size, 1);
+return xen_map_cache(addr, *size, 1, true);
 }
 
-block->host = xen_map_cache(block->offset, block->max_length, 1);
+block->host = xen_map_cache(block->offset, block->max_length, 1, true);
 }
 
 return ramblock_ptr(block, addr);
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index 31debdf..e60156c 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -62,6 +62,7 @@ typedef struct MapCacheRev {
 hwaddr paddr_index;
 hwaddr size;
 QTAILQ_ENTRY(MapCacheRev) next;
+bool dma;
 } MapCacheRev;
 
 typedef struct MapCache {
@@ -202,7 +203,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 }
 
 static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
-   uint8_t lock)
+   uint8_t lock, bool dma)
 {
 MapCacheEntry *entry, *pentry = NULL;
 hwaddr address_index;
@@ -289,6 +290,7 @@ tryagain:
 if (lock) {
 MapCacheRev *reventry = g_malloc0(sizeof(MapCacheRev));
 entry->lock++;
+reventry->dma = dma;
 reventry->vaddr_req = mapcache->last_entry->vaddr_base + address_offset;
 reventry->paddr_index = mapcache->last_entry->paddr_index;
 reventry->size = entry->size;
@@ -300,12 +302,12 @@ tryagain:
 }
 
 uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
-   uint8_t lock)
+   uint8_t lock, bool dma)
 {
 uint8_t *p;
 
 mapcache_lock();
-p = xen_map_cache_unlocked(phys_addr, size, lock);
+p = xen_map_cache_unlocked(phys_addr, size, lock, dma);
 mapcache_unlock();
 return p;
 }
@@ -426,8 +428,11 @@ void xen_invalidate_map_cache(void)
 mapcache_lock();
 
 QTAILQ_FOREACH(reventry, &mapcache->locked_entries, next) {
-DPRINTF("There should be no locked mappings at this time, "
-"but "TARGET_FMT_plx" -> %p is present\n",
+if (!reventry->dma) {
+continue;
+}
+fprintf(stderr, "Locked DMA mapping while invalidating mapcache!"
+" "TARGET_FMT_plx" -> %p is present\n",
 reventry->paddr_index, reventry->vaddr_req);
 }
 
diff --git a/include/sysemu/xen-mapcache.h b/include/sysemu/xen-mapcache.h
index b8c93b9..01daaad

[Qemu-devel] [PULL 0/4] please pull xen-20170516-tag

2017-05-16 Thread Stefano Stabellini

The following changes since commit cdece0467c7cf8e3f4b3c3f0b13bf2c4fea9:

  block/win32: fix 'ret not initialized' warning (2017-05-16 15:34:18 +0100)

are available in the git repository at:

  git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-20170516-tag

for you to fetch changes up to 01cd90b641e1aed40cf13a577e6a737af94d55e7:

  xen: call qemu_set_cloexec instead of fcntl (2017-05-16 11:51:25 -0700)


Xen 2017/05/16


Anthony PERARD (1):
  configure: Remove -lxencall for Xen detection

Stefano Stabellini (3):
  xen/mapcache: store dma information in revmapcache entries for debugging
  xen/9pfs: fix two resource leaks on error paths, discovered by Coverity
  xen: call qemu_set_cloexec instead of fcntl

 configure |  2 +-
 exec.c|  8 
 hw/9pfs/xen-9p-backend.c  |  4 +++-
 hw/i386/xen/xen-mapcache.c| 15 ++-
 hw/xen/xen_backend.c  |  2 +-
 include/sysemu/xen-mapcache.h |  5 +++--
 6 files changed, 22 insertions(+), 14 deletions(-)

Re: [Qemu-devel] [PATCH v1] target/s390x: Add support for the TEST BLOCK instruction

2017-05-16 Thread Richard Henderson


On 05/16/2017 02:28 AM, Thomas Huth wrote:

+void HELPER(testblock)(CPUS390XState *env, uint64_t addr)
+{
+CPUState *cs = CPU(s390_env_get_cpu(env));
+int i;
+
+addr = get_address(env, 0, 0, addr) & ~0xfffULL;
+for (i = 0; i < TARGET_PAGE_SIZE; i += 8) {
+stq_phys(cs->as, addr + i, 0);
+}
+env->cc_op = 0;
+}


This needs several changes: check that the physical page does indeed exist,
handle "low address protection", and return the cc code.



+DEF_HELPER_2(testblock, void, env, i64)


With cc returned, this becomes

  DEF_HELPER_FLAGS_2(testblock, TCG_CALL_NO_RWG, i32, env, i64)
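
As a rough illustration of two of the points above (checking that the
page exists and returning a condition code), here is a standalone toy
model. It is not the QEMU helper: ToyRam and toy_testblock() are
hypothetical, guest memory is a plain byte array, -1 is just a stand-in
for whatever exception the real helper would raise, and low-address
protection is not modelled at all.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Toy guest RAM: a flat buffer of "size" bytes starting at address 0. */
typedef struct ToyRam {
    uint8_t *base;
    uint64_t size;
} ToyRam;

/*
 * Clear the 4K block containing "addr". Returns 0 (used here as the
 * condition code for a usable block) on success, or -1 if the block is
 * not backed by the toy RAM.
 */
static int toy_testblock(ToyRam *ram, uint64_t addr)
{
    /* Align down to the start of the block. */
    addr &= ~(uint64_t)(TOY_PAGE_SIZE - 1);

    /* Check that the whole block exists before touching it. */
    if (ram->size < TOY_PAGE_SIZE || addr > ram->size - TOY_PAGE_SIZE) {
        return -1;
    }

    memset(ram->base + addr, 0, TOY_PAGE_SIZE);
    return 0;
}
```

Returning a value from the helper is what motivates the i32 return type
in the DEF_HELPER_FLAGS_2 declaration above.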


r~

Re: [Qemu-devel] [PATCH v3 0/3] arch_init: Move soundhw code to hw/audio/soundhw.c

2017-05-16 Thread Eduardo Habkost

Ping?

On Mon, May 08, 2017 at 05:57:32PM -0300, Eduardo Habkost wrote:
> Changes v2 -> v3:
> * Build fix: update hw/ppc/prep.c too
> 
> Changes v1 -> v2:
> * Rebase to latest qemu.git master
> 
> This moves the arch_init.c soundhw code to its own file, renames
> audio_init() to soundhw_init(), and renames hw/audio/audio.h to
> hw/audio/soundhw.h.
> 
> Eduardo Habkost (3):
>   audio: Move arch_init audio code to hw/audio/soundhw.c
>   audio: Rename audio_init() to soundhw_init()
>   audio: Rename hw/audio/audio.h to hw/audio/soundhw.h
> 
>  include/hw/audio/{audio.h => soundhw.h} |   3 +
>  include/sysemu/arch_init.h  |   2 -
>  arch_init.c | 126 +-
>  hw/audio/ac97.c |   2 +-
>  hw/audio/adlib.c|   2 +-
>  hw/audio/cs4231a.c  |   2 +-
>  hw/audio/es1370.c   |   2 +-
>  hw/audio/gus.c  |   2 +-
>  hw/audio/intel-hda.c|   2 +-
>  hw/audio/pcspk.c|   2 +-
>  hw/audio/sb16.c |   2 +-
>  hw/audio/soundhw.c  | 156 
> 
>  hw/ppc/prep.c   |   3 +-
>  vl.c|   3 +-
>  hw/audio/Makefile.objs  |   2 +
>  15 files changed, 174 insertions(+), 137 deletions(-)
>  rename include/hw/audio/{audio.h => soundhw.h} (81%)
>  create mode 100644 hw/audio/soundhw.c
> 
> -- 
> 2.11.0.259.g40922b1
> 
> 

-- 
Eduardo

Re: [Qemu-devel] [PATCH v2 2/3] Check the return value of fcntl in qemu_set_cloexec

2017-05-16 Thread Stefano Stabellini

On Thu, 11 May 2017, Paolo Bonzini wrote:
> On 09/05/2017 21:04, Stefano Stabellini wrote:
> > Assert that the return value is not an error. This issue was found by
> > Coverity.
> > 
> > CID: 1374831
> > 
> > Signed-off-by: Stefano Stabellini 
> > CC: gr...@kaod.org
> > CC: pbonz...@redhat.com
> > CC: Eric Blake 
> 
> Queued, thanks.

I am about to send a pull request with the rest of the series, but I'll
leave this one to you.

Cheers,

Stefano


> > ---
> >  util/oslib-posix.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > index 4d9189e..16894ad 100644
> > --- a/util/oslib-posix.c
> > +++ b/util/oslib-posix.c
> > @@ -182,7 +182,9 @@ void qemu_set_cloexec(int fd)
> >  {
> >  int f;
> >  f = fcntl(fd, F_GETFD);
> > -fcntl(fd, F_SETFD, f | FD_CLOEXEC);
> > +assert(f != -1);
> > +f = fcntl(fd, F_SETFD, f | FD_CLOEXEC);
> > +assert(f != -1);
> >  }
> >  
> >  /*
> > 
>

Re: [Qemu-devel] [PATCH 10/17] object: add uint property setter/getter

2017-05-16 Thread Markus Armbruster

Marc-André Lureau  writes:

> Signed-off-by: Marc-André Lureau 
> ---
>  include/qom/object.h | 23 +++
>  qom/object.c | 33 +
>  2 files changed, 56 insertions(+)
>
> diff --git a/include/qom/object.h b/include/qom/object.h
> index cd0f412ce9..abaeb8cf4e 100644
> --- a/include/qom/object.h
> +++ b/include/qom/object.h
> @@ -1094,6 +1094,29 @@ int64_t object_property_get_int(Object *obj, const 
> char *name,
>  Error **errp);
>  
>  /**
> + * object_property_set_uint:
> + * @value: the value to be written to the property
> + * @name: the name of the property
> + * @errp: returns an error if this function fails
> + *
> + * Writes an unsigned integer value to a property.
> + */
> +void object_property_set_uint(Object *obj, uint64_t value,
> +  const char *name, Error **errp);
> +
> +/**
> + * object_property_get_uint:
> + * @obj: the object
> + * @name: the name of the property
> + * @errp: returns an error if this function fails
> + *
> + * Returns: the value of the property, converted to an unsigned integer, or 0
> + * an error occurs (including when the property value is not an integer).
> + */
> +uint64_t object_property_get_uint(Object *obj, const char *name,
> +  Error **errp);
> +
> +/**
>   * object_property_get_enum:
>   * @obj: the object
>   * @name: the name of the property
> diff --git a/qom/object.c b/qom/object.c
> index c1644dbcb7..a9259e330d 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -1221,6 +1221,39 @@ int64_t object_property_get_int(Object *obj, const 
> char *name,
>  return retval;
>  }
>  
> +void object_property_set_uint(Object *obj, uint64_t value,
> + const char *name, Error **errp)
> +{
> +QNum *qn = qnum_from_uint(value);

Please call the variable @qnum, to match object_property_set_int() and
object_property_get_uint().

Blank line here.

> +object_property_set_qobject(obj, QOBJECT(qn), name, errp);
> +QDECREF(qn);
> +}
> +
> +uint64_t object_property_get_uint(Object *obj, const char *name,
> +  Error **errp)
> +{
> +QObject *ret = object_property_get_qobject(obj, name, errp);
> +Error *err = NULL;
> +QNum *qnum;
> +uint64_t retval;
> +
> +if (!ret) {
> +return 0;
> +}
> +qnum = qobject_to_qnum(ret);
> +if (qnum) {
> +retval = qnum_get_uint(qnum, &err);
> +}
> +
> +if (!qnum || err) {
> +error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name, "uint");
> +retval = 0;
> +}
> +
> +qobject_decref(ret);
> +return retval;
> +}
> +
>  typedef struct EnumProperty {
>  const char * const *strings;
>  int (*get)(Object *, Error **);

With the nits touched up:
Reviewed-by: Markus Armbruster

Re: [Qemu-devel] [PATCH 2/5] migration: Create block capability

2017-05-16 Thread Juan Quintela

Eric Blake  wrote:
> On 05/16/2017 11:42 AM, Markus Armbruster wrote:
>
>>>> Well, to suggest something, I'd first have to figure out WTF incremental
>>>> block migration does.  Your text helps me some, but not enough.  What
>>>> exactly is being migrated, and what exactly is assumed to be shared
>>>> between source and destination?
>>>>
>>>> Block migration is scandalously underdocumented.
>>>
>
>> Can you draft a documentation comment for @block-incremental?
>
> How about:
>
> @block-incremental: Affects how much storage is migrated when the block
> migration capability is enabled.  When false, the entire storage backing
> chain is migrated into a flattened image at the destination; when true,
> only the active qcow2 layer is migrated and the destination must already
> have access to the same backing chain as was used on the source.
> (since 2.10)

Changed.  Thanks.

Re: [Qemu-devel] [PATCH 09/17] qnum: fix get_int() with values > INT64_MAX

2017-05-16 Thread Markus Armbruster

Marc-André Lureau  writes:

> Now that the visitor has been switched to use qnum_uint, fix the bad
> get_int() to use get_uint() instead. Remove compatibility code.
>
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/i386/acpi-build.c | 2 +-
>  qobject/qnum.c   | 4 ++--
>  tests/check-qnum.c   | 9 -
>  3 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index ec3ae7fa85..767da5d78e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2585,7 +2585,7 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
>  if (!o) {
>  return false;
>  }
> -mcfg->mcfg_base = qnum_get_int(qobject_to_qnum(o), &error_abort);
> +mcfg->mcfg_base = qnum_get_uint(qobject_to_qnum(o), &error_abort);
>  qobject_decref(o);
>  
>  o = object_property_get_qobject(pci_host, PCIE_HOST_MCFG_SIZE, NULL);

The change makes sense because mcfg_base is uint64_t.  But why does it
belong to this patch?

> diff --git a/qobject/qnum.c b/qobject/qnum.c
> index 2f87952db8..be6307accf 100644
> --- a/qobject/qnum.c
> +++ b/qobject/qnum.c
> @@ -76,8 +76,8 @@ int64_t qnum_get_int(const QNum *qn, Error **errp)
>  return qn->u.i64;
>  case QNUM_U64:
>  if (qn->u.u64 > INT64_MAX) {
> -/* temporarily accepts to cast to i64 until visitor is switched */
> -error_report("The number is too large, use qnum_get_uint()");
> +error_setg(errp, "The number is too large, use qnum_get_uint()");
> +return 0;
>  }
>  return qn->u.u64;
>  case QNUM_DOUBLE:
> diff --git a/tests/check-qnum.c b/tests/check-qnum.c
> index 9a22af3d0e..8199546f99 100644
> --- a/tests/check-qnum.c
> +++ b/tests/check-qnum.c
> @@ -107,11 +107,10 @@ static void qnum_get_uint_test(void)
>  error_free_or_abort(&err);
>  QDECREF(qn);
>  
> -/* temporarily disabled until visitor is switched */
> -/* qn = qnum_from_uint(-1ULL); */
> -/* qnum_get_int(qn, &err); */
> -/* error_free_or_abort(&err); */
> -/* QDECREF(qn); */
> +qn = qnum_from_uint(-1ULL);
> +qnum_get_int(qn, &err);
> +error_free_or_abort(&err);
> +QDECREF(qn);
>  
>  /* invalid case */
>  qn = qnum_from_double(0.42);
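
For readers following along, the behaviour this patch is after (a signed
getter that fails for unsigned values above INT64_MAX instead of silently
casting) can be sketched in isolation like this. It is a toy model, not
the QNum implementation: ToyNum, toy_get_int() and toy_get_uint() are
hypothetical, and errors are reported through a bool return rather than
an Error object.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TOY_I64, TOY_U64 } ToyKind;

/* Toy tagged number holding either a signed or an unsigned 64-bit value. */
typedef struct ToyNum {
    ToyKind kind;
    union {
        int64_t i64;
        uint64_t u64;
    } u;
} ToyNum;

/* Signed getter: fails (returns false) if the value does not fit. */
static bool toy_get_int(const ToyNum *n, int64_t *out)
{
    switch (n->kind) {
    case TOY_I64:
        *out = n->u.i64;
        return true;
    case TOY_U64:
        if (n->u.u64 > INT64_MAX) {
            return false;   /* too large for int64_t */
        }
        *out = (int64_t)n->u.u64;
        return true;
    }
    return false;
}

/* Unsigned getter: fails (returns false) for negative values. */
static bool toy_get_uint(const ToyNum *n, uint64_t *out)
{
    switch (n->kind) {
    case TOY_I64:
        if (n->u.i64 < 0) {
            return false;   /* negative, not representable */
        }
        *out = (uint64_t)n->u.i64;
        return true;
    case TOY_U64:
        *out = n->u.u64;
        return true;
    }
    return false;
}
```

With this shape, a caller holding a value such as UINT64_MAX gets an
error from the signed getter and has to use the unsigned one, which is
what the re-enabled test case above checks.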

Re: [Qemu-devel] [PATCH 08/17] qapi: update the qobject visitor to use QUInt

2017-05-16 Thread Markus Armbruster

On the subject: there is no such thing as "QUInt".  I guess you mean
"uint type" (like in PATCH 06's subject).  Could also say "QNUM_U64".

Apropos subject: humor me, and start your subjects with a capital
letter, like this:

qapi: Update the qobject visitor ...

Marc-André Lureau  writes:

> Switch to use QNum/uint where appropriate to remove i64 limitation.
>
> The input visitor will cast i64 input to u64 for compatibility
> reasons (existing JSON QMP clients already use negative i64 for large
> u64, and expect an implicit cast in QEMU).
>
> Signed-off-by: Marc-André Lureau 
> ---
>  qapi/qobject-input-visitor.c| 13 +++--
>  qapi/qobject-output-visitor.c   |  3 +--
>  tests/test-qobject-output-visitor.c | 21 -
>  3 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
> index 785949ebab..72cefcf677 100644
> --- a/qapi/qobject-input-visitor.c
> +++ b/qapi/qobject-input-visitor.c
> @@ -420,9 +420,9 @@ static void qobject_input_type_int64_keyval(Visitor *v, 
> const char *name,
>  static void qobject_input_type_uint64(Visitor *v, const char *name,
>uint64_t *obj, Error **errp)
>  {
> -/* FIXME: qobject_to_qnum mishandles values over INT64_MAX */
>  QObjectInputVisitor *qiv = to_qiv(v);
>  QObject *qobj = qobject_input_get_object(qiv, name, true, errp);
> +Error *err = NULL;
>  QNum *qnum;
>  
>  if (!qobj) {
> @@ -435,7 +435,16 @@ static void qobject_input_type_uint64(Visitor *v, const 
> char *name,
>  return;
>  }
>  
> -*obj = qnum_get_int(qnum, errp);
> +/* XXX: compatibility case, accept negative values as u64 */

What does "XXX" signify?

> +*obj = qnum_get_int(qnum, &err);
> +

Shouldn't the comment go right here?

> +if (err) {
> +error_free(err);
> +err = NULL;
> +*obj = qnum_get_uint(qnum, &err);
> +}
> +
> +error_propagate(errp, err);
>  }
>  
>  static void qobject_input_type_uint64_keyval(Visitor *v, const char *name,
> diff --git a/qapi/qobject-output-visitor.c b/qapi/qobject-output-visitor.c
> index 2ca5093b22..70be84ccb5 100644
> --- a/qapi/qobject-output-visitor.c
> +++ b/qapi/qobject-output-visitor.c
> @@ -150,9 +150,8 @@ static void qobject_output_type_int64(Visitor *v, const 
> char *name,
>  static void qobject_output_type_uint64(Visitor *v, const char *name,
> uint64_t *obj, Error **errp)
>  {
> -/* FIXME values larger than INT64_MAX become negative */
>  QObjectOutputVisitor *qov = to_qov(v);
> -qobject_output_add(qov, name, qnum_from_int(*obj));
> +qobject_output_add(qov, name, qnum_from_uint(*obj));

Before the patch, uint64_t values above INT64_MAX are sent as negative
values, e.g. UINT64_MAX is sent as -1.

After the patch, they are sent unmodified.  Clearly a bug fix, but we
have to consider compatibility issues anyway.  Does libvirt expect large
integers to be sent as negative integers?  Does it cope with this fix
gracefully?  Eric, any idea?
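
To illustrate the wire-level difference being discussed, here is a small
standalone sketch. It is not QEMU code: legacy_encode() and
compat_decode() are hypothetical names that model, respectively, how a
large uint64_t ends up as a negative number when stored in an int64_t,
and how an input path can accept such a negative value back as u64.

```c
#include <stdint.h>
#include <string.h>

/*
 * Model of the old output behaviour: the 64 bits of an unsigned value
 * are reinterpreted as a signed value, so anything above INT64_MAX
 * shows up as a negative number.
 */
static int64_t legacy_encode(uint64_t value)
{
    int64_t wire;

    memcpy(&wire, &value, sizeof(wire));
    return wire;
}

/*
 * Model of the compatibility input path: a negative signed value is
 * accepted and converted back to the unsigned value it stood for.
 * Conversion from int64_t to uint64_t is well defined (modulo 2^64).
 */
static uint64_t compat_decode(int64_t wire)
{
    return (uint64_t)wire;
}
```

In this model UINT64_MAX goes out as -1 under the old scheme and is
recovered unchanged by the compatibility decode, which is why clients
relying on the old encoding keep working on the input side.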

>  }
>  
>  static void qobject_output_type_bool(Visitor *v, const char *name, bool *obj,
> diff --git a/tests/test-qobject-output-visitor.c 
> b/tests/test-qobject-output-visitor.c
> index 66a682d5a8..767818e393 100644
> --- a/tests/test-qobject-output-visitor.c
> +++ b/tests/test-qobject-output-visitor.c
> @@ -595,15 +595,26 @@ static void check_native_list(QObject *qobj,
>  qlist = qlist_copy(qobject_to_qlist(qdict_get(qdict, "data")));
>  
>  switch (kind) {
> -case USER_DEF_NATIVE_LIST_UNION_KIND_S8:
> -case USER_DEF_NATIVE_LIST_UNION_KIND_S16:
> -case USER_DEF_NATIVE_LIST_UNION_KIND_S32:
> -case USER_DEF_NATIVE_LIST_UNION_KIND_S64:
>  case USER_DEF_NATIVE_LIST_UNION_KIND_U8:
>  case USER_DEF_NATIVE_LIST_UNION_KIND_U16:
>  case USER_DEF_NATIVE_LIST_UNION_KIND_U32:
>  case USER_DEF_NATIVE_LIST_UNION_KIND_U64:
> -/* all integer elements in JSON arrays get stored into QNums when
> +for (i = 0; i < 32; i++) {
> +QObject *tmp;
> +QNum *qvalue;
> +tmp = qlist_peek(qlist);
> +g_assert(tmp);
> +qvalue = qobject_to_qnum(tmp);
> +g_assert_cmpuint(qnum_get_uint(qvalue, &error_abort), ==, i);
> +qobject_decref(qlist_pop(qlist));
> +}
> +break;
> +
> +case USER_DEF_NATIVE_LIST_UNION_KIND_S8:
> +case USER_DEF_NATIVE_LIST_UNION_KIND_S16:
> +case USER_DEF_NATIVE_LIST_UNION_KIND_S32:
> +case USER_DEF_NATIVE_LIST_UNION_KIND_S64:
> +/* all integer elements in JSON arrays get stored into QInts when
>   * we convert to QObjects, so we can check them all in the same
>   * fashion, so simply fall through here
>   */

Make that "All signed integer ...", and wing both ends of the comment.
Or simply drop the comment.
