Re: Race condition in overlayed qcow2?

2020-02-24 Thread dovgaluk

Vladimir Sementsov-Ogievskiy wrote on 2020-02-25 10:27:

25.02.2020 8:58, dovgaluk wrote:

Vladimir Sementsov-Ogievskiy wrote on 2020-02-21 16:23:

21.02.2020 15:35, dovgaluk wrote:

Vladimir Sementsov-Ogievskiy wrote on 2020-02-21 13:09:

21.02.2020 12:49, dovgaluk wrote:

Vladimir Sementsov-Ogievskiy wrote on 2020-02-20 12:36:


So, preadv in file-posix.c returns different results for the same
offset, for a file which is always opened in RO mode? Sounds impossible :)


True.
Maybe my logging is wrong?

static ssize_t qemu_preadv(int fd, const struct iovec *iov, int nr_iov,
                           off_t offset)
{
    ssize_t res = preadv(fd, iov, nr_iov, offset);
    qemu_log("preadv %x %"PRIx64"\n", fd, (uint64_t)offset);
    int i;
    uint32_t sum = 0;
    int cnt = 0;
    for (i = 0; i < nr_iov; ++i) {
        int j;
        for (j = 0; j < (int)iov[i].iov_len; ++j) {
            sum += ((uint8_t *)iov[i].iov_base)[j];
            ++cnt;
        }
    }
    qemu_log("size: %x sum: %x\n", cnt, sum);
    assert(cnt == res);
    return res;
}



Hmm, I don't see any issues here..

Are you absolutely sure that all these reads are from the backing file,
which is read-only and never changed (maybe by other processes)?


Yes, I made a copy and compared the files with binwalk.


2. The guest modifies buffers during the operation (you can catch it if you
allocate a private buffer for preadv, then calculate the checksum, then
memcpy to the guest buffer)


I added the following to the qemu_preadv:

    // do it again
    unsigned char *buf = g_malloc(cnt);
    struct iovec v = { buf, cnt };
    res = preadv(fd, &v, 1, offset);
    assert(cnt == res);
    uint32_t sum2 = 0;
    for (i = 0; i < cnt; ++i)
        sum2 += buf[i];
    g_free(buf);
    qemu_log("--- sum2 = %x\n", sum2);
    assert(sum2 == sum);

These two reads give different results.
But who can modify the buffer while the qcow2 workers are filling it with
data from the disk?




As far as I know, it's the guest's buffer, and the guest may modify it
during the operation. So, it may be winxp :)


True, but normally the guest won't do that.

But I noticed that the DMA operation which causes the problems has the
following set of buffers:

dma read sg size 2 offset: c000fe00
--- sg: base: 2eb1000 len: 1000
--- sg: base: 300 len: 1000
--- sg: base: 2eb2000 len: 3000
--- sg: base: 300 len: 1000
--- sg: base: 2eb5000 len: b000
--- sg: base: 304 len: 1000
--- sg: base: 2f41000 len: 3000
--- sg: base: 300 len: 1000
--- sg: base: 2f44000 len: 4000
--- sg: base: 300 len: 1000
--- sg: base: 2f48000 len: 2000
--- sg: base: 300 len: 1000
--- sg: base: 300 len: 1000
--- sg: base: 300 len: 1000


It means that one DMA transaction performs multiple reads into the same
address.

No race is possible when there is only one qcow2 worker; when there are
many of them, they can fill this buffer simultaneously.
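For illustration, the aliasing effect can be reproduced deterministically in plain C. Nothing below is QEMU code: the scratch-file path and the helper name are made up, and the demo relies on the (Linux) kernel filling iovec entries in order. When two iovec entries point at the same buffer, a checksum over the iovec list no longer matches the file content, which is exactly the mismatch seen in the logs above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read a known 12-byte file through preadv() into an iovec list in which
 * two entries alias the same buffer, and return the difference between a
 * checksum over the iovec list and the checksum of the file content.
 * With distinct buffers the difference would be 0; because the aliased
 * entry is filled last, it overwrites the earlier data and the sums
 * disagree -- the same effect as several workers (or guest DMA) sharing
 * one target buffer. */
static unsigned iov_alias_checksum_diff(void)
{
    const char *path = "/tmp/iov_alias_demo.bin";   /* scratch file */
    unsigned char data[12], a[4], shared[4];
    struct iovec iov[3] = {
        { a, sizeof(a) },           /* receives bytes 0..3 */
        { shared, sizeof(shared) }, /* receives bytes 4..7 */
        { shared, sizeof(shared) }, /* aliased: overwritten with 8..11 */
    };
    unsigned sum_iov = 0, sum_file = 0;
    int i, fd;

    for (i = 0; i < 12; i++) {
        data[i] = (unsigned char)i;
        sum_file += data[i];
    }
    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0 || write(fd, data, sizeof(data)) != (ssize_t)sizeof(data)) {
        return 0;
    }
    if (preadv(fd, iov, 3, 0) != (ssize_t)sizeof(data)) {
        close(fd);
        return 0;
    }
    for (i = 0; i < 3; i++) {
        size_t j;
        for (j = 0; j < iov[i].iov_len; j++) {
            sum_iov += ((unsigned char *)iov[i].iov_base)[j];
        }
    }
    close(fd);
    unlink(path);
    return sum_iov - sum_file;
}
```

With concurrent workers the overwrite happens nondeterministically, but the single-threaded aliased read already shows why summing over the scatter-gather list disagrees with a second, private-buffer read.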

Pavel Dovgalyuk



[PATCH] hw/smbios: add options for type 4 max_speed and current_speed

2020-02-24 Thread Heyi Guo
Common VM users sometimes care about CPU speed, so we add two new
options to allow VM vendors to present CPU speed to their users.
Normally this information can be fetched from the host SMBIOS.

Strictly speaking, the "max speed" and "current speed" fields in type 4
are not really the maximum and current speed of the processor: "max
speed" identifies a capability of the system, and "current speed"
identifies the processor's speed at boot (see the SMBIOS spec). Some
applications, however, do not distinguish between the two.
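For example, a VM vendor could then advertise host-like values on the command line. This is a hypothetical invocation: only the max_speed and current_speed options come from this patch; the machine, memory, and manufacturer arguments are placeholders.

```shell
# expose a 3.2 GHz capable socket currently clocked at 2.6 GHz
qemu-system-x86_64 \
    -machine q35 -smp 4 -m 2G \
    -smbios type=4,manufacturer=ACME,max_speed=3200,current_speed=2600
```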

Signed-off-by: Heyi Guo 

---
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
---
 hw/smbios/smbios.c | 22 +++---
 qemu-options.hx    |  3 ++-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index ffd98727ee..1d5439643d 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -94,6 +94,8 @@ static struct {
 
 static struct {
 const char *sock_pfx, *manufacturer, *version, *serial, *asset, *part;
+uint32_t max_speed;
+uint32_t current_speed;
 } type4;
 
 static struct {
@@ -272,6 +274,14 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = {
 .name = "version",
 .type = QEMU_OPT_STRING,
 .help = "version number",
+},{
+.name = "max_speed",
+.type = QEMU_OPT_NUMBER,
+.help = "max speed in MHz",
+},{
+.name = "current_speed",
+.type = QEMU_OPT_NUMBER,
+.help = "speed at system boot in MHz",
 },{
 .name = "serial",
 .type = QEMU_OPT_STRING,
@@ -586,9 +596,8 @@ static void smbios_build_type_4_table(MachineState *ms, 
unsigned instance)
 SMBIOS_TABLE_SET_STR(4, processor_version_str, type4.version);
 t->voltage = 0;
 t->external_clock = cpu_to_le16(0); /* Unknown */
-/* SVVP requires max_speed and current_speed to not be unknown. */
-t->max_speed = cpu_to_le16(2000); /* 2000 MHz */
-t->current_speed = cpu_to_le16(2000); /* 2000 MHz */
+t->max_speed = cpu_to_le16(type4.max_speed);
+t->current_speed = cpu_to_le16(type4.current_speed);
 t->status = 0x41; /* Socket populated, CPU enabled */
 t->processor_upgrade = 0x01; /* Other */
 t->l1_cache_handle = cpu_to_le16(0xFFFF); /* N/A */
@@ -1129,6 +1138,13 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
 save_opt(&type4.serial, opts, "serial");
 save_opt(&type4.asset, opts, "asset");
 save_opt(&type4.part, opts, "part");
+/*
+ * SVVP requires max_speed and current_speed to not be unknown, and
+ * we set the default value to 2000MHz as we did before.
+ */
+type4.max_speed = qemu_opt_get_number(opts, "max_speed", 2000);
+type4.current_speed = qemu_opt_get_number(opts, "current_speed",
+  2000);
 return;
 case 11:
 qemu_opts_validate(opts, qemu_smbios_type11_opts, &err);
diff --git a/qemu-options.hx b/qemu-options.hx
index ac315c1ac4..bc9ef0fda8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2233,6 +2233,7 @@ DEF("smbios", HAS_ARG, QEMU_OPTION_smbios,
 "specify SMBIOS type 3 fields\n"
 "-smbios 
type=4[,sock_pfx=str][,manufacturer=str][,version=str][,serial=str]\n"
 "  [,asset=str][,part=str]\n"
+"  [,max_speed=%d][,current_speed=%d]\n"
 "specify SMBIOS type 4 fields\n"
 "-smbios 
type=17[,loc_pfx=str][,bank=str][,manufacturer=str][,serial=str]\n"
 "   [,asset=str][,part=str][,speed=%d]\n"
@@ -2255,7 +2256,7 @@ Specify SMBIOS type 2 fields
 @item -smbios 
type=3[,manufacturer=@var{str}][,version=@var{str}][,serial=@var{str}][,asset=@var{str}][,sku=@var{str}]
 Specify SMBIOS type 3 fields
 
-@item -smbios 
type=4[,sock_pfx=@var{str}][,manufacturer=@var{str}][,version=@var{str}][,serial=@var{str}][,asset=@var{str}][,part=@var{str}]
+@item -smbios 
type=4[,sock_pfx=@var{str}][,manufacturer=@var{str}][,version=@var{str}][,serial=@var{str}][,asset=@var{str}][,part=@var{str}][,max_speed=@var{%d}][,current_speed=@var{%d}]
 Specify SMBIOS type 4 fields
 
 @item -smbios 
type=17[,loc_pfx=@var{str}][,bank=@var{str}][,manufacturer=@var{str}][,serial=@var{str}][,asset=@var{str}][,part=@var{str}][,speed=@var{%d}]
-- 
2.19.1




Re: [PATCH v6 13/18] spapr: Don't use weird units for MIN_RMA_SLOF

2020-02-24 Thread Cédric Le Goater
On 2/25/20 12:37 AM, David Gibson wrote:
> MIN_RMA_SLOF records the minimum amount of RMA that the SLOF firmware
> requires.  It lets us give a meaningful error if the RMA ends up too small,
> rather than just letting SLOF crash.
> 
> It's currently stored as a number of megabytes, which is strange for global
> constants.  Move that megabyte scaling into the definition of the constant
> like most other things use.
> 
> Change from M to MiB in the associated message while we're at it.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Cédric Le Goater 


> ---
>  hw/ppc/spapr.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 828e2cc135..272a270b7a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -103,7 +103,7 @@
>  #define FW_OVERHEAD 0x280
>  #define KERNEL_LOAD_ADDR FW_MAX_SIZE
>  
> -#define MIN_RMA_SLOF 128UL
> +#define MIN_RMA_SLOF (128 * MiB)
>  
>  #define PHANDLE_INTC0x
>  
> @@ -2959,10 +2959,10 @@ static void spapr_machine_init(MachineState *machine)
>  }
>  }
>  
> -if (spapr->rma_size < (MIN_RMA_SLOF * MiB)) {
> +if (spapr->rma_size < MIN_RMA_SLOF) {
>  error_report(
> -"pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode 
> Area memory)",
> -MIN_RMA_SLOF);
> +"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode 
> Area memory)",
> +MIN_RMA_SLOF / MiB);
>  exit(1);
>  }
>  
> 




Re: [PATCH v4 09/16] target/i386: Cleanup and use the EPYC mode topology functions

2020-02-24 Thread Igor Mammedov
On Mon, 24 Feb 2020 11:29:37 -0600
Babu Moger  wrote:

> On 2/24/20 2:52 AM, Igor Mammedov wrote:
> > On Thu, 13 Feb 2020 12:17:25 -0600
> > Babu Moger  wrote:
> >   
> >> Use the new functions from topology.h and delete the unused code. Given the
> >> sockets, nodes, cores and threads, the new functions generate apic id for 
> >> EPYC
> >> mode. Removes all the hardcoded values.
> >>
> >> Signed-off-by: Babu Moger   
> > 
> > modulo MAX() macro, looks fine to me  
>
> Igor, Sorry. What do you mean here?

I meant s/MAX(topo_info->nodes_per_pkg, 1)/topo_info->nodes_per_pkg/

after it's made sure that topo_info->nodes_per_pkg is always valid.


(I believe I've commented on that somewhere. The series isn't split nicely,
so I ended up applying it all and then reviewing, so comments might look
out of place sometimes; hopefully the next revision will be easier to
review.)

> >   
> >> ---
> >>  target/i386/cpu.c |  162 
> >> +++--
> >>  1 file changed, 35 insertions(+), 127 deletions(-)
> >>
> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >> index 5d6edfd09b..19675eb696 100644
> >> --- a/target/i386/cpu.c
> >> +++ b/target/i386/cpu.c
> >> @@ -338,68 +338,15 @@ static void encode_cache_cpuid8006(CPUCacheInfo 
> >> *l2,
> >>  }
> >>  }
> >>  
> >> -/*
> >> - * Definitions used for building CPUID Leaf 0x801D and 0x801E
> >> - * Please refer to the AMD64 Architecture Programmer’s Manual Volume 3.
> >> - * Define the constants to build the cpu topology. Right now, TOPOEXT
> >> - * feature is enabled only on EPYC. So, these constants are based on
> >> - * EPYC supported configurations. We may need to handle the cases if
> >> - * these values change in future.
> >> - */
> >> -/* Maximum core complexes in a node */
> >> -#define MAX_CCX 2
> >> -/* Maximum cores in a core complex */
> >> -#define MAX_CORES_IN_CCX 4
> >> -/* Maximum cores in a node */
> >> -#define MAX_CORES_IN_NODE 8
> >> -/* Maximum nodes in a socket */
> >> -#define MAX_NODES_PER_SOCKET 4
> >> -
> >> -/*
> >> - * Figure out the number of nodes required to build this config.
> >> - * Max cores in a node is 8
> >> - */
> >> -static int nodes_in_socket(int nr_cores)
> >> -{
> >> -int nodes;
> >> -
> >> -nodes = DIV_ROUND_UP(nr_cores, MAX_CORES_IN_NODE);
> >> -
> >> -   /* Hardware does not support config with 3 nodes, return 4 in that 
> >> case */
> >> -return (nodes == 3) ? 4 : nodes;
> >> -}
> >> -
> >> -/*
> >> - * Decide the number of cores in a core complex with the given nr_cores 
> >> using
> >> - * following set constants MAX_CCX, MAX_CORES_IN_CCX, MAX_CORES_IN_NODE 
> >> and
> >> - * MAX_NODES_PER_SOCKET. Maintain symmetry as much as possible
> >> - * L3 cache is shared across all cores in a core complex. So, this will 
> >> also
> >> - * tell us how many cores are sharing the L3 cache.
> >> - */
> >> -static int cores_in_core_complex(int nr_cores)
> >> -{
> >> -int nodes;
> >> -
> >> -/* Check if we can fit all the cores in one core complex */
> >> -if (nr_cores <= MAX_CORES_IN_CCX) {
> >> -return nr_cores;
> >> -}
> >> -/* Get the number of nodes required to build this config */
> >> -nodes = nodes_in_socket(nr_cores);
> >> -
> >> -/*
> >> - * Divide the cores accros all the core complexes
> >> - * Return rounded up value
> >> - */
> >> -return DIV_ROUND_UP(nr_cores, nodes * MAX_CCX);
> >> -}
> >> -
> >>  /* Encode cache info for CPUID[801D] */
> >> -static void encode_cache_cpuid801d(CPUCacheInfo *cache, CPUState *cs,
> >> -uint32_t *eax, uint32_t *ebx,
> >> -uint32_t *ecx, uint32_t *edx)
> >> +static void encode_cache_cpuid801d(CPUCacheInfo *cache,
> >> +   X86CPUTopoInfo *topo_info,
> >> +   uint32_t *eax, uint32_t *ebx,
> >> +   uint32_t *ecx, uint32_t *edx)
> >>  {
> >>  uint32_t l3_cores;
> >> +unsigned nodes = MAX(topo_info->nodes_per_pkg, 1);
> >> +
> >>  assert(cache->size == cache->line_size * cache->associativity *
> >>cache->partitions * cache->sets);
> >>  
> >> @@ -408,10 +355,13 @@ static void encode_cache_cpuid801d(CPUCacheInfo 
> >> *cache, CPUState *cs,
> >>  
> >>  /* L3 is shared among multiple cores */
> >>  if (cache->level == 3) {
> >> -l3_cores = cores_in_core_complex(cs->nr_cores);
> >> -*eax |= ((l3_cores * cs->nr_threads) - 1) << 14;
> >> +l3_cores = DIV_ROUND_UP((topo_info->dies_per_pkg *
> >> + topo_info->cores_per_die *
> >> + topo_info->threads_per_core),
> >> + nodes);
> >> +*eax |= (l3_cores - 1) << 14;
> >>  } else {
> >> -*eax |= ((cs->nr_threads - 1) << 14);
> >> +*eax |= 

Re: [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code

2020-02-24 Thread David Hildenbrand
On 24.02.20 23:49, Peter Xu wrote:
> On Fri, Feb 21, 2020 at 05:42:04PM +0100, David Hildenbrand wrote:
>> When we partially change mappings (esp., mmap over parts of an existing
>> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
>> registered, the handler will implicitly be unregistered from the parts that
>> changed.
>>
>> Trying to place pages onto mappings where there is no longer a handler
>> registered will fail. Let's make sure that any waiter is woken up - we
>> have to do that manually.
>>
>> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
>>
>> This is mainly a preparation for RAM blocks with resizable allocations,
>> where the mapping of the invalid RAM range will change. The source will
>> keep sending pages that are outside of the new (shrunk) RAM size. We have
>> to treat these pages like they would have been migrated, but can
>> essentially simply drop the content (ignore the placement error).
>>
>> Keep printing a warning on EINVAL, to avoid hiding other (programming)
>> issues. ENOENT is unique.
>>
>> Cc: "Dr. David Alan Gilbert" 
>> Cc: Juan Quintela 
>> Cc: Peter Xu 
>> Cc: Andrea Arcangeli 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  migration/postcopy-ram.c | 37 +
>>  1 file changed, 37 insertions(+)
>>
>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>> index c68caf4e42..f023830b9a 100644
>> --- a/migration/postcopy-ram.c
>> +++ b/migration/postcopy-ram.c
>> @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
>>  range_struct.start = (uintptr_t)host_addr;
>>  range_struct.len = length;
>>  
>> +/*
>> + * In case the mapping was partially changed since we enabled userfault
>> + * (e.g., via qemu_ram_remap()), the userfaultfd handler was already 
>> removed
>> + * for the mappings that changed. Unregistering will, however, still 
>> work
>> + * and ignore mappings without a registered handler.
>> + */
> 
> Ideally we should still only unregister what we have registered.
> After all we do have this information because we know what we
> registered, we know what has unmapped (in your new resize() hook, when
> postcopy_state==RUNNING).

Not in the case of qemu_ram_remap(). And whatever you propose will
require synchronization (see my other mail) and more complicated
handling than this. uffd allows you to handle races with mmap changes in
a very elegant way (e.g., -ENOENT, or unregister ignoring changed mappings).
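As a sketch, the placement-error policy the patch describes (drop ENOENT, warn on EINVAL, fail on everything else) could look like the helper below. The helper name and signature are hypothetical; in the actual patch the logic sits around qemu_ufd_copy_ioctl()'s error handling.

```c
#include <errno.h>
#include <stdio.h>

/* Sketch of the described policy: ENOENT from the placement ioctl means
 * the mapping changed under us (e.g. the RAM block shrank), so the page
 * is treated as if it had been migrated and the error is dropped.
 * EINVAL still produces a warning so real programming errors are not
 * hidden; all errors other than ENOENT are propagated to the caller. */
static int qemu_ufd_place_result(int ioctl_ret, int err)
{
    if (ioctl_ret == 0) {
        return 0;               /* page placed normally */
    }
    if (err == ENOENT) {
        return 0;               /* mapping gone: silently ignore */
    }
    if (err == EINVAL) {
        fprintf(stderr, "userfault placement: unexpected EINVAL\n");
    }
    return -err;                /* propagate the failure */
}
```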

> 
> An extreme example is when we register with pages in range [A, B),
> then shrink it to [A, C), then we mapped something else within [C, B)
> (note, with virtio-mem logically B can be very big and C can be very
> small, it means [B, C) can cover quite some address space). Then if:
> 
>   - [C, B) memory type is not compatible with uffd, or

That will never happen in the near future. Without resizable allocations:
- All memory is either anonymous or from a single fd

In addition, right now, only anonymous memory can be used for resizable
RAM. However, with resizable allocations we could have:
- All used_length memory is either anonymous or from a single fd
- All remaining memory is either anonymous or from a single fd

Everything else does not make any sense IMHO and I don't think this is
relevant long term. You cannot arbitrarily map things into the
used_length part of a RAMBlock. That would contradict its page_size
and its fd. E.g., you would break qemu_ram_remap().

> 
>   - [C, B) could be registered with uffd again due to some other
> reason (so far QEMU should not have such a reason)

Any code that wants to make use of uffd properly has to synchronize
against postcopy code either way IMHO. It just doesn't work otherwise.

E.g., once I would use it to protect unplugged memory in virtio-mem
(something I am looking into right now and teaching QEMU not to touch
all RAMBlock memory is complicated :) ), virtio-mem would unregister any
uffd handler when notified that postcopy will start, and re-register
after postcopy finished.

[...]


>>  
>> +static int qemu_ufd_wake_ioctl(int userfault_fd, void *host_addr,
>> +   uint64_t pagesize)
>> +{
>> +struct uffdio_range range = {
>> +.start = (uint64_t)(uintptr_t)host_addr,
>> +.len = pagesize,
>> +};
>> +
>> +return ioctl(userfault_fd, UFFDIO_WAKE, &range);
>> +}
>> +
>>  static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>> void *from_addr, uint64_t pagesize, RAMBlock 
>> *rb)
>>  {
>> @@ -1198,6 +1215,26 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void 
>> *host_addr,
>>  zero_struct.mode = 0;
>>  ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>>  }
>> +
>> +/*
>> + * When the mapping gets partially changed (e.g., qemu_ram_remap()) 
>> before
>> + * we try to place a page, the userfaultfd handler will be removed for 
>> the
>> +

Re: [PATCH v4 06/16] hw/i386: Update structures for nodes_per_pkg

2020-02-24 Thread Igor Mammedov
On Mon, 24 Feb 2020 11:12:41 -0600
Babu Moger  wrote:

> On 2/24/20 2:34 AM, Igor Mammedov wrote:
> > On Thu, 13 Feb 2020 12:17:04 -0600
> > Babu Moger  wrote:
> >   
> >> Update structures X86CPUTopoIDs and CPUX86State to hold the nodes_per_pkg.
> >> This is required to build EPYC mode topology.
> >>
> >> Signed-off-by: Babu Moger 
> >> ---
> >>  hw/i386/pc.c   |1 +
> >>  hw/i386/x86.c  |2 ++
> >>  include/hw/i386/topology.h |2 ++
> >>  include/hw/i386/x86.h  |1 +
> >>  target/i386/cpu.c  |1 +
> >>  target/i386/cpu.h  |1 +
> >>  6 files changed, 8 insertions(+)
> >>
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index f13721ac43..02fdb3d506 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -1753,6 +1753,7 @@ static void pc_cpu_pre_plug(HotplugHandler 
> >> *hotplug_dev,
> >>  init_topo_info(&topo_info, x86ms);
> >>  
> >>  env->nr_dies = x86ms->smp_dies;
> >> +env->nr_nodes = ms->numa_state->num_nodes / ms->smp.sockets;  
> > 
> > it would be better if the calculation resulted in a valid value
> > so you won't have to scatter MAX(env->nr_nodes, 1) everywhere later.  
> 
> Ok. Sure.
> > 
> > also I'd use the earlier initialized:
> >   env->nr_nodes = topo_info->nodes_per_pkg
> > to avoid repeating calculation  
> 
> yes. Will do it.
> 
> >   
> >>  /*
> >>   * If APIC ID is not set,
> >> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> >> index 083effb2f5..3d944f68e6 100644
> >> --- a/hw/i386/x86.c
> >> +++ b/hw/i386/x86.c
> >> @@ -89,11 +89,13 @@ void x86_cpu_new(X86MachineState *x86ms, int64_t 
> >> apic_id, Error **errp)
> >>  Object *cpu = NULL;
> >>  Error *local_err = NULL;
> >>  CPUX86State *env = NULL;
> >> +MachineState *ms = MACHINE(x86ms);
> >>  
> >>  cpu = object_new(MACHINE(x86ms)->cpu_type);
> >>  
> >>  env = &X86_CPU(cpu)->env;
> >>  env->nr_dies = x86ms->smp_dies;
> >> +env->nr_nodes = ms->numa_state->num_nodes / ms->smp.sockets;  
> > 
> > Is this really necessary?  (I think pc_cpu_pre_plug should take care of 
> > setting it)  
> 
> This does not seem necessary. I can add as a separate patch to remove env
> initialization from x86_cpu_new.

it doesn't have to be part of this series, but it's up to you
how to send it

> 
> >   
> >>  object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
> >>  object_property_set_bool(cpu, true, "realized", &local_err);
> >> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> >> index ef0ab0b6a3..522c77e6a9 100644
> >> --- a/include/hw/i386/topology.h
> >> +++ b/include/hw/i386/topology.h
> >> @@ -41,12 +41,14 @@
> >>  
> >>  #include "qemu/bitops.h"
> >>  #include "hw/i386/x86.h"
> >> +#include "sysemu/numa.h"
> >>  
> >>  static inline void init_topo_info(X86CPUTopoInfo *topo_info,
> >>const X86MachineState *x86ms)
> >>  {
> >>  MachineState *ms = MACHINE(x86ms);
> >>  
> >> +topo_info->nodes_per_pkg = ms->numa_state->num_nodes / 
> >> ms->smp.sockets;
> >>  topo_info->dies_per_pkg = x86ms->smp_dies;
> >>  topo_info->cores_per_die = ms->smp.cores;
> >>  topo_info->threads_per_core = ms->smp.threads;
> >> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> >> index ad62b01cf2..d76fd0bbb1 100644
> >> --- a/include/hw/i386/x86.h
> >> +++ b/include/hw/i386/x86.h
> >> @@ -48,6 +48,7 @@ typedef struct X86CPUTopoIDs {
> >>  } X86CPUTopoIDs;
> >>  
> >>  typedef struct X86CPUTopoInfo {
> >> +unsigned nodes_per_pkg;
> >>  unsigned dies_per_pkg;
> >>  unsigned cores_per_die;
> >>  unsigned threads_per_core;
> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >> index 7e630f47ac..5d6edfd09b 100644
> >> --- a/target/i386/cpu.c
> >> +++ b/target/i386/cpu.c
> >> @@ -6761,6 +6761,7 @@ static void x86_cpu_initfn(Object *obj)
> >>  FeatureWord w;
> >>  
> >>  env->nr_dies = 1;
> >> +env->nr_nodes = 1;
> >>  cpu_set_cpustate_pointers(cpu);
> >>  
> >>  object_property_add(obj, "family", "int",
> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >> index af282936a7..627a8cb9be 100644
> >> --- a/target/i386/cpu.h
> >> +++ b/target/i386/cpu.h
> >> @@ -1603,6 +1603,7 @@ typedef struct CPUX86State {
> >>  TPRAccess tpr_access_type;
> >>  
> >>  unsigned nr_dies;
> >> +unsigned nr_nodes;
> >>  } CPUX86State;
> >>  
> >>  struct kvm_msrs;
> >>  
> >   
> 




[PATCH 2/4] vhost-user-fs: convert to the new virtio_delete_queue function

2020-02-24 Thread Pan Nengyuan
Use the new virtio_delete_queue function for cleanup.

Signed-off-by: Pan Nengyuan 
Cc: "Dr. David Alan Gilbert" 
Cc: Stefan Hajnoczi 
---
 hw/virtio/vhost-user-fs.c | 15 +--
 include/hw/virtio/vhost-user-fs.h |  2 ++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 4554d123b7..6136768875 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -209,11 +209,12 @@ static void vuf_device_realize(DeviceState *dev, Error 
**errp)
 sizeof(struct virtio_fs_config));
 
 /* Hiprio queue */
-virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
+fs->hiprio_vq = virtio_add_queue(vdev, fs->conf.queue_size, 
vuf_handle_output);
 
 /* Request queues */
+fs->req_vqs = g_new(VirtQueue *, fs->conf.num_request_queues);
 for (i = 0; i < fs->conf.num_request_queues; i++) {
-virtio_add_queue(vdev, fs->conf.queue_size, vuf_handle_output);
+fs->req_vqs[i] = virtio_add_queue(vdev, fs->conf.queue_size, 
vuf_handle_output);
 }
 
 /* 1 high prio queue, plus the number configured */
@@ -230,10 +231,11 @@ static void vuf_device_realize(DeviceState *dev, Error 
**errp)
 
 err_virtio:
 vhost_user_cleanup(&fs->vhost_user);
-virtio_del_queue(vdev, 0);
+virtio_delete_queue(fs->hiprio_vq);
 for (i = 0; i < fs->conf.num_request_queues; i++) {
-virtio_del_queue(vdev, i + 1);
+virtio_delete_queue(fs->req_vqs[i]);
 }
+g_free(fs->req_vqs);
 virtio_cleanup(vdev);
 g_free(fs->vhost_dev.vqs);
 return;
@@ -252,10 +254,11 @@ static void vuf_device_unrealize(DeviceState *dev, Error 
**errp)
 
 vhost_user_cleanup(&fs->vhost_user);
 
-virtio_del_queue(vdev, 0);
+virtio_delete_queue(fs->hiprio_vq);
 for (i = 0; i < fs->conf.num_request_queues; i++) {
-virtio_del_queue(vdev, i + 1);
+virtio_delete_queue(fs->req_vqs[i]);
 }
+g_free(fs->req_vqs);
 virtio_cleanup(vdev);
 g_free(fs->vhost_dev.vqs);
 fs->vhost_dev.vqs = NULL;
diff --git a/include/hw/virtio/vhost-user-fs.h 
b/include/hw/virtio/vhost-user-fs.h
index 9ff1bdb7cf..6f3030d288 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -37,6 +37,8 @@ typedef struct {
 struct vhost_virtqueue *vhost_vqs;
 struct vhost_dev vhost_dev;
 VhostUserState vhost_user;
+VirtQueue **req_vqs;
+VirtQueue *hiprio_vq;
 
 /*< public >*/
 } VHostUserFS;
-- 
2.18.2




[PATCH 3/4] virtio-pmem: do delete rq_vq in virtio_pmem_unrealize

2020-02-24 Thread Pan Nengyuan
Similar to other virtio devices, rq_vq was not deleted in
virtio_pmem_unrealize; this patch fixes it.
This device already maintains a vq pointer, so we use the new
virtio_delete_queue function directly to do the cleanup.

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
---
 hw/virtio/virtio-pmem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
index 97287e923b..43399522f5 100644
--- a/hw/virtio/virtio-pmem.c
+++ b/hw/virtio/virtio-pmem.c
@@ -130,6 +130,7 @@ static void virtio_pmem_unrealize(DeviceState *dev, Error 
**errp)
 VirtIOPMEM *pmem = VIRTIO_PMEM(dev);
 
 host_memory_backend_set_mapped(pmem->memdev, false);
+virtio_delete_queue(pmem->rq_vq);
 virtio_cleanup(vdev);
 }
 
-- 
2.18.2




[PATCH 4/4] virtio-crypto: do delete ctrl_vq in virtio_crypto_device_unrealize

2020-02-24 Thread Pan Nengyuan
Similar to other virtio devices, ctrl_vq was not deleted in
virtio_crypto_device_unrealize; this patch fixes it.
This device already maintains vq pointers. Thus, we use the new
virtio_delete_queue function directly to do the cleanup.

The leak stack:
Direct leak of 10752 byte(s) in 3 object(s) allocated from:
#0 0x7f4c024b1970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
#1 0x7f4c018be49d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
#2 0x55a2f8017279 in virtio_add_queue 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio.c:2333
#3 0x55a2f8057035 in virtio_crypto_device_realize 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio-crypto.c:814
#4 0x55a2f8005d80 in virtio_device_realize 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio.c:3531
#5 0x55a2f8497d1b in device_set_realized 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/core/qdev.c:891
#6 0x55a2f8b48595 in property_set_bool 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:2238
#7 0x55a2f8b54fad in object_property_set_qobject 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/qom-qobject.c:26
#8 0x55a2f8b4de2c in object_property_set_bool 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:1390
#9 0x55a2f80609c9 in virtio_crypto_pci_realize 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio-crypto-pci.c:58

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
Cc: "Gonglei (Arei)" 
---
 hw/virtio/virtio-crypto.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
index 7351ab0a19..4c65114de5 100644
--- a/hw/virtio/virtio-crypto.c
+++ b/hw/virtio/virtio-crypto.c
@@ -831,12 +831,13 @@ static void virtio_crypto_device_unrealize(DeviceState 
*dev, Error **errp)
 
 max_queues = vcrypto->multiqueue ? vcrypto->max_queues : 1;
 for (i = 0; i < max_queues; i++) {
-virtio_del_queue(vdev, i);
+virtio_delete_queue(vcrypto->vqs[i].dataq);
 q = &vcrypto->vqs[i];
 qemu_bh_delete(q->dataq_bh);
 }
 
 g_free(vcrypto->vqs);
+virtio_delete_queue(vcrypto->ctrl_vq);
 
 virtio_cleanup(vdev);
 cryptodev_backend_set_used(vcrypto->cryptodev, false);
-- 
2.18.2




[PATCH 1/4] vhost-user-fs: do delete virtio_queues in unrealize

2020-02-24 Thread Pan Nengyuan
Similar to another virtio device (https://patchwork.kernel.org/patch/11399237/),
the virtio queues are not deleted in unrealize, and also not on the error
path in realize; this patch fixes these memleaks. The leak stack is as follows:
Direct leak of 57344 byte(s) in 1 object(s) allocated from:
#0 0x7f15784fb970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
#1 0x7f157790849d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
#2 0x55587a1bf859 in virtio_add_queue 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio.c:2333
#3 0x55587a2071d5 in vuf_device_realize 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/vhost-user-fs.c:212
#4 0x55587a1ae360 in virtio_device_realize 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/virtio/virtio.c:3531
#5 0x55587a63fb7b in device_set_realized 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/core/qdev.c:891
#6 0x55587acf03f5 in property_set_bool 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:2238
#7 0x55587acfce0d in object_property_set_qobject 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/qom-qobject.c:26
#8 0x55587acf5c8c in object_property_set_bool 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:1390
#9 0x55587a8e22a2 in pci_qdev_realize 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/pci/pci.c:2095
#10 0x55587a63fb7b in device_set_realized 
/mnt/sdb/qemu-new/qemu_test/qemu/hw/core/qdev.c:891
#11 0x55587acf03f5 in property_set_bool 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:2238
#12 0x55587acfce0d in object_property_set_qobject 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/qom-qobject.c:26
#13 0x55587acf5c8c in object_property_set_bool 
/mnt/sdb/qemu-new/qemu_test/qemu/qom/object.c:1390
#14 0x55587a496d65 in qdev_device_add 
/mnt/sdb/qemu-new/qemu_test/qemu/qdev-monitor.c:679

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
Cc: "Dr. David Alan Gilbert" 
Cc: Stefan Hajnoczi 
---
 hw/virtio/vhost-user-fs.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 33b17848c2..4554d123b7 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -230,6 +230,10 @@ static void vuf_device_realize(DeviceState *dev, Error 
**errp)
 
 err_virtio:
 vhost_user_cleanup(&fs->vhost_user);
+virtio_del_queue(vdev, 0);
+for (i = 0; i < fs->conf.num_request_queues; i++) {
+virtio_del_queue(vdev, i + 1);
+}
 virtio_cleanup(vdev);
 g_free(fs->vhost_dev.vqs);
 return;
@@ -239,6 +243,7 @@ static void vuf_device_unrealize(DeviceState *dev, Error 
**errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserFS *fs = VHOST_USER_FS(dev);
+int i;
 
 /* This will stop vhost backend if appropriate. */
 vuf_set_status(vdev, 0);
@@ -247,6 +252,10 @@ static void vuf_device_unrealize(DeviceState *dev, Error 
**errp)
 
 vhost_user_cleanup(&fs->vhost_user);
 
+virtio_del_queue(vdev, 0);
+for (i = 0; i < fs->conf.num_request_queues; i++) {
+virtio_del_queue(vdev, i + 1);
+}
 virtio_cleanup(vdev);
 g_free(fs->vhost_dev.vqs);
 fs->vhost_dev.vqs = NULL;
-- 
2.18.2




[PATCH 0/4] virtio: fix some virtio-queue leaks.

2020-02-24 Thread Pan Nengyuan
Similar to another virtio device (https://patchwork.kernel.org/patch/11399237/),
we also found some virtio-queue leaks in unrealize().
This series does the cleanup in unrealize to fix them.

Pan Nengyuan (4):
  vhost-user-fs: do delete virtio_queues in unrealize
  vhost-user-fs: convert to the new virtio_delete_queue function
  virtio-pmem: do delete rq_vq in virtio_pmem_unrealize
  virtio-crypto: do delete ctrl_vq in virtio_crypto_device_unrealize

 hw/virtio/vhost-user-fs.c | 16 ++--
 hw/virtio/virtio-crypto.c |  3 ++-
 hw/virtio/virtio-pmem.c   |  1 +
 include/hw/virtio/vhost-user-fs.h |  2 ++
 4 files changed, 19 insertions(+), 3 deletions(-)

-- 
2.18.2




Re: [PATCH 2/2] qxl: drop shadow_rom

2020-02-24 Thread Paolo Bonzini
On 25/02/20 06:59, Gerd Hoffmann wrote:
> Now that the rom bar is mapped read-only and the guest can't change
> things under our feet we don't need the shadow rom any more.

Can't it do so when migrating from an older version?

Paolo

> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/display/qxl.h |  2 +-
>  hw/display/qxl.c | 25 +
>  2 files changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/display/qxl.h b/hw/display/qxl.h
> index 707631a1f573..3aedc7db5da0 100644
> --- a/hw/display/qxl.h
> +++ b/hw/display/qxl.h
> @@ -95,11 +95,11 @@ typedef struct PCIQXLDevice {
>  uint32_t   vgamem_size;
>  
>  /* rom pci bar */
> -QXLRom shadow_rom;
>  QXLRom *rom;
>  QXLModes   *modes;
>  uint32_t   rom_size;
>  MemoryRegion   rom_bar;
> +uint32_t   rom_mode;
>  #if SPICE_SERVER_VERSION >= 0x000c06 /* release 0.12.6 */
>  uint16_t   max_outputs;
>  #endif
> diff --git a/hw/display/qxl.c b/hw/display/qxl.c
> index 227da69a50d9..0502802688f9 100644
> --- a/hw/display/qxl.c
> +++ b/hw/display/qxl.c
> @@ -391,7 +391,6 @@ static void init_qxl_rom(PCIQXLDevice *d)
>  sizeof(rom->client_monitors_config));
>  }
>  
> -d->shadow_rom = *rom;
>  d->rom= rom;
>  d->modes  = modes;
>  }
> @@ -403,7 +402,7 @@ static void init_qxl_ram(PCIQXLDevice *d)
>  QXLReleaseRing *ring;
>  
>  buf = d->vga.vram_ptr;
> -d->ram = (QXLRam *)(buf + le32_to_cpu(d->shadow_rom.ram_header_offset));
> +d->ram = (QXLRam *)(buf + le32_to_cpu(d->rom->ram_header_offset));
>  d->ram->magic   = cpu_to_le32(QXL_RAM_MAGIC);
>  d->ram->int_pending = cpu_to_le32(0);
>  d->ram->int_mask= cpu_to_le32(0);
> @@ -446,7 +445,7 @@ static void qxl_ram_set_dirty(PCIQXLDevice *qxl, void *ptr)
>  /* can be called from spice server thread context */
>  static void qxl_ring_set_dirty(PCIQXLDevice *qxl)
>  {
> -ram_addr_t addr = qxl->shadow_rom.ram_header_offset;
> +ram_addr_t addr = qxl->rom->ram_header_offset;
>  ram_addr_t end  = qxl->vga.vram_size;
>  qxl_set_dirty(&qxl->vga.vram, addr, end);
>  }
> @@ -529,7 +528,6 @@ static void interface_set_compression_level(QXLInstance *sin, int level)
>  PCIQXLDevice *qxl = container_of(sin, PCIQXLDevice, ssd.qxl);
>  
>  trace_qxl_interface_set_compression_level(qxl->id, level);
> -qxl->shadow_rom.compression_level = cpu_to_le32(level);
>  qxl->rom->compression_level = cpu_to_le32(level);
>  qxl_rom_set_dirty(qxl);
>  }
> @@ -561,7 +559,7 @@ static void interface_get_init_info(QXLInstance *sin, QXLDevInitInfo *info)
>  info->num_memslots_groups = NUM_MEMSLOTS_GROUPS;
>  info->internal_groupslot_id = 0;
>  info->qxl_ram_size =
> -le32_to_cpu(qxl->shadow_rom.num_pages) << QXL_PAGE_BITS;
> +le32_to_cpu(qxl->rom->num_pages) << QXL_PAGE_BITS;
>  info->n_surfaces = qxl->ssd.num_surfaces;
>  }
>  
> @@ -1035,9 +1033,6 @@ static void interface_set_client_capabilities(QXLInstance *sin,
>  return;
>  }
>  
> -qxl->shadow_rom.client_present = client_present;
> -memcpy(qxl->shadow_rom.client_capabilities, caps,
> -   sizeof(qxl->shadow_rom.client_capabilities));
>  qxl->rom->client_present = client_present;
>  memcpy(qxl->rom->client_capabilities, caps,
> sizeof(qxl->rom->client_capabilities));
> @@ -1232,11 +1227,8 @@ static void qxl_check_state(PCIQXLDevice *d)
>  
>  static void qxl_reset_state(PCIQXLDevice *d)
>  {
> -QXLRom *rom = d->rom;
> -
>  qxl_check_state(d);
> -d->shadow_rom.update_id = cpu_to_le32(0);
> -*rom = d->shadow_rom;
> +d->rom->update_id = cpu_to_le32(0);
>  qxl_rom_set_dirty(d);
>  init_qxl_ram(d);
>  d->num_free_res = 0;
> @@ -1600,7 +1592,7 @@ static void qxl_set_mode(PCIQXLDevice *d, unsigned int modenr, int loadvm)
>  .format = SPICE_SURFACE_FMT_32_xRGB,
>  .flags  = loadvm ? QXL_SURF_FLAG_KEEP_DATA : 0,
>  .mouse_mode = true,
> -.mem= devmem + d->shadow_rom.draw_area_offset,
> +.mem= devmem + d->rom->draw_area_offset,
>  };
>  
>  trace_qxl_set_mode(d->id, modenr, mode->x_res, mode->y_res, mode->bits,
> @@ -1620,7 +1612,6 @@ static void qxl_set_mode(PCIQXLDevice *d, unsigned int modenr, int loadvm)
>  if (mode->bits == 16) {
>  d->cmdflags |= QXL_COMMAND_FLAG_COMPAT_16BPP;
>  }
> -d->shadow_rom.mode = cpu_to_le32(modenr);
>  d->rom->mode = cpu_to_le32(modenr);
>  qxl_rom_set_dirty(d);
>  }
> @@ -2277,6 +2268,7 @@ static int qxl_pre_save(void *opaque)
>  d->last_release_offset = (uint8_t *)d->last_release - ram_start;
>  }
>  assert(d->last_release_offset < d->vga.vram_size);
> +d->rom_mode = d->rom->mode;
>  
>  return 0;
>  }
> @@ -2316,6 +2308,7 @@ static int qxl_post_load(void *opaque, int version)
>  } else {
>  

Re: [PATCH 2/2] util: add util function buffer_zero_avx512()

2020-02-24 Thread Robert Hoo
On Mon, 2020-02-24 at 08:13 -0800, Richard Henderson wrote:
> On 2/23/20 11:07 PM, Robert Hoo wrote:
> > Inspired by your suggestion, I'm thinking go further: use immediate
> > rather than a global variable, so that saves 1 memory(/cache)
> > access. 
> > 
> > #ifdef CONFIG_AVX512F_OPT   
> > #define OPTIMIZE_LEN256
> > #else
> > #define OPTIMIZE_LEN64
> > #endif
> 
> With that, the testing in tests/test-bufferiszero.c, looping through
> the
> implementations, is invalidated.  Because once you start compiling
> for avx512,
> you're no longer testing sse2 et al with the same inputs.
> 
Right. Thanks for pointing that out. I hadn't noticed.
More precisely, it would mean sse2 et al. are no longer tested with lengths
< 256.

> IF we want to change the length to suit avx512, we would want to
> change it
> unconditionally.  And then you could also tidy up avx2 to avoid the
> extra
> comparisons there.
Considering the length's dependency on the sse2/sse4/avx2/avx512 algorithms,
as well as possible future changes and additions, I'd rather go back to your
original suggestion: use a companion variable with each accel_fn(). How does
that sound?

> 
> 
> r~




Re: [PATCH v2 10/13] migration/ram: Handle RAM block resizes during postcopy

2020-02-24 Thread David Hildenbrand
On 24.02.20 23:26, Peter Xu wrote:
> On Fri, Feb 21, 2020 at 05:42:01PM +0100, David Hildenbrand wrote:
> 
> [...]
> 
>> @@ -3160,7 +3160,13 @@ static int ram_load_postcopy(QEMUFile *f)
>>  break;
>>  }
>>  
>> -if (!offset_in_ramblock(block, addr)) {
>> +/*
>> +/*
>> + * Relying on used_length is racy and can result in false positives.
>> + * We might place pages beyond used_length in case RAM was shrunk
>> + * while in postcopy, which is fine - trying to place via
>> + * UFFDIO_COPY/UFFDIO_ZEROPAGE will never segfault.
>> + */
>> +if (!block->host || addr >= block->postcopy_length) {
> 
> I'm thinking whether we can even avoid the -ENOENT failure of
> UFFDIO_COPY.  With the postcopy_length you introduced, I think it's
> the case when addr >= used_length && addr < postcopy_length, right?
> Can we skip those?

1. Recall that any check against used_length is completely racy. So no,
it's not that easy. There is no trusting used_length at all. It
should never be accessed from asynchronous postcopy code.

2. There is one theoretical case with resizable allocations: Assume you
first shrink and then grow again. You would have some addr < used_length
where you cannot (and don't want to) place.


Note: Before discovering the nice -ENOENT handling, I had a second
variable postcopy_place_length stored in RAM blocks that would be

- Initialized to postcopy_length
- Synchronized by a mutex
- Changed inside the resize callback on any resizes to
-- postcopy_place_length = min(postcopy_place_length, newsize)

But TBH, I find using -ENOENT much more elegant. It was designed to
handle mmap changes like this.

-- 
Thanks,

David / dhildenb




Re: Race condition in overlayed qcow2?

2020-02-24 Thread Vladimir Sementsov-Ogievskiy

25.02.2020 8:58, dovgaluk wrote:

Vladimir Sementsov-Ogievskiy писал 2020-02-21 16:23:

21.02.2020 15:35, dovgaluk wrote:

Vladimir Sementsov-Ogievskiy писал 2020-02-21 13:09:

21.02.2020 12:49, dovgaluk wrote:

Vladimir Sementsov-Ogievskiy писал 2020-02-20 12:36:

1 or 2 are ok, and 4 or 8 lead to the failures.


That is strange. I could have thought that it was caused by bugs in
deterministic CPU execution, but the first difference in the logs
occurs in a READ operation (I dump read/write buffers in blk_aio_complete).



Aha, yes, looks strange.

Then next steps:

1. Does problem hit into the same offset every time?
2. Do we write to this region before this strange read?

2.1. If yes, we need to check that we read what we write. You say you dump buffers
in blk_aio_complete... I think it would be more reliable to dump at start of
bdrv_co_pwritev and at end of bdrv_co_preadv. Also, guest may modify its buffers
during operation which would be strange but possible.

2.2 If not, hmm...




Another idea to check: use blkverify


I added logging of the file descriptor and discovered that different results
are obtained when reading from the backing file.
And even more - replay runs of the same recording produce different results.
Logs show that there is a preadv race, but I can't figure out the source of
the failure.

Log1:
preadv c 30467e00
preadv c 3096
--- sum = a2e1e
bdrv_co_preadv_part complete offset: 30467e00 qiov_offset: 0 len: 8200
--- sum = 10cdee
bdrv_co_preadv_part complete offset: 3096 qiov_offset: 8200 len: ee00

Log2:
preadv c 30467e00
--- sum = a2e1e
bdrv_co_preadv_part complete offset: 30467e00 qiov_offset: 0 len: 8200
preadv c 3096
--- sum = f094f
bdrv_co_preadv_part complete offset: 3096 qiov_offset: 8200 len: ee00


Checksum calculation was added to preadv in file-posix.c



So, preadv in file-posix.c returns different results for the same
offset, for file which is always opened in RO mode? Sounds impossible
:)


True.
Maybe my logging is wrong?

static ssize_t
qemu_preadv(int fd, const struct iovec *iov, int nr_iov, off_t offset)
{
 ssize_t res = preadv(fd, iov, nr_iov, offset);
 qemu_log("preadv %x %"PRIx64"\n", fd, (uint64_t)offset);
 int i;
 uint32_t sum = 0;
 int cnt = 0;
 for (i = 0 ; i < nr_iov ; ++i) {
 int j;
 for (j = 0 ; j < (int)iov[i].iov_len ; ++j)
 {
 sum += ((uint8_t*)iov[i].iov_base)[j];
 ++cnt;
 }
 }
 qemu_log("size: %x sum: %x\n", cnt, sum);
 assert(cnt == res);
 return res;
}



Hmm, I don't see any issues here..

Are you absolutely sure, that all these reads are from backing file,
which is read-only and never changed (may be by other processes)?


Yes, I made a copy and compared the files with binwalk.


2. guest modifies buffers during operation (you can catch it if you
allocate a personal buffer for preadv, then calculate the checksum, then
memcpy to the guest buffer)


I added the following to the qemu_preadv:

     // do it again
     unsigned char *buf = g_malloc(cnt);
     struct iovec v = {buf, cnt};
     res = preadv(fd, &v, 1, offset);
     assert(cnt == res);
     uint32_t sum2 = 0;
     for (i = 0 ; i < cnt ; ++i)
     sum2 += buf[i];
     g_free(buf);
     qemu_log("--- sum2 = %x\n", sum2);
     assert(sum2 == sum);

These two reads give different results.
But who can modify the buffer while qcow2 workers filling it with data from the 
disk?



As far as I know, it's the guest's buffer, and the guest may modify it during
the operation. So, it may be winxp :)



--
Best regards,
Vladimir



Re: [PATCH v2 3/3] Lift max memory slots limit imposed by vhost-user

2020-02-24 Thread Raphael Norwitz
Ping

On Sun, Feb 09, 2020 at 12:43:35PM -0500, Raphael Norwitz wrote:
> 
> On Thu, Feb 06, 2020 at 03:32:38AM -0500, Michael S. Tsirkin wrote:
> > 
> > On Wed, Jan 15, 2020 at 09:57:06PM -0500, Raphael Norwitz wrote:
> > > The current vhost-user implementation in Qemu imposes a limit on the
> > > maximum number of memory slots exposed to a VM using a vhost-user
> > > device. This change provides a new protocol feature
> > > VHOST_USER_F_CONFIGURE_SLOTS which, when enabled, lifts this limit and
> > > allows a VM with a vhost-user device to expose a configurable number of
> > > memory slots, up to the ACPI defined maximum. Existing backends which
> > > do not support this protocol feature are unaffected.
> > 
> > Hmm ACPI maximum seems to be up to 512 - is this too much to fit in a
> > single message?  So can't we just increase the number (after negotiating
> > with remote) and be done with it, instead of add/remove?  Or is there
> > another reason to prefer add/remove?
> >
> 
> As mentioned in my cover letter, we experimented with simply increasing the
> message size and it didn’t work on our setup. We debugged down to the socket
> layer and found that on the receiving end the messages were truncated at
> around 512 bytes, or around 16 memory regions. To support 512 memory regions
> we would need a message size of around 32 * 512 + 8 ~= 16k. That would be 64
> times larger than the next largest message size. We thought it would be cleaner
> and more in line with the rest of the protocol to keep the message sizes
> smaller. In particular, we thought memory regions should be treated like the
> rings, which are sent over one message at a time instead of in one large
> message.
> Whether or not such a large message size can be made to work in our case,
> separate messages will always work on Linux, and most likely all other UNIX
> platforms QEMU is used on.
> 

> > > 
> > > This feature works by using three new messages,
> > > VHOST_USER_GET_MAX_MEM_SLOTS, VHOST_USER_ADD_MEM_REG and
> > > VHOST_USER_REM_MEM_REG. VHOST_USER_GET_MAX_MEM_SLOTS gets the
> > > number of memory slots the backend is willing to accept when the
> > > backend is initialized. Then, when the memory tables are set or updated,
> > > a series of VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG messages
> > > are sent to transmit the regions to map and/or unmap instead of trying
> > > to send all the regions in one fixed size VHOST_USER_SET_MEM_TABLE
> > > message.
> > > 
> > > The vhost_user struct maintains a shadow state of the VM’s memory
> > > regions. When the memory tables are modified, the
> > > vhost_user_set_mem_table() function compares the new device memory state
> > > to the shadow state and only sends regions which need to be unmapped or
> > > mapped in. The regions which must be unmapped are sent first, followed
> > > by the new regions to be mapped in. After all the messages have been
> > > sent, the shadow state is set to the current virtual device state.
> > > 
> > > The current feature implementation does not work with postcopy migration
> > > and cannot be enabled if the VHOST_USER_PROTOCOL_F_REPLY_ACK feature has
> > > also been negotiated.
> > 
> > Hmm what would it take to lift the restrictions?
> > conflicting features like this makes is very hard for users to make
> > an informed choice what to support.
> >
> 
> We would need a setup with a backend which supports these features (REPLY_ACK
> and postcopy migration). At first glance it looks like DPDK could work but
> I'm not sure how easy it will be to test postcopy migration with the resources
> we have.
>  

> > > Signed-off-by: Raphael Norwitz 
> > > Signed-off-by: Peter Turschmid 
> > > Suggested-by: Mike Cui 
> > > ---
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > index af83fdd..fed6d02 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -35,11 +35,29 @@
> > >  #include 
> > >  #endif
> > >  
> > > -#define VHOST_MEMORY_MAX_NREGIONS8
> > > +#define VHOST_MEMORY_LEGACY_NREGIONS8
> > 
> > Hardly legacy when this is intended to always be used e.g. with
> > postcopy, right?
> >
> 
> How about 'BASELINE'?

> > > +msg->hdr.size = sizeof(msg->payload.mem_reg.padding);
> > > +msg->hdr.size += sizeof(VhostUserMemoryRegion);
> > > +
> > > +/*
> > > + * Send VHOST_USER_REM_MEM_REG for memory regions in our shadow state
> > > + * which are not found not in the device's memory state.
> > 
> > double negation - could not parse this.
> >
> 
> Apologies - typo here. It should say “Send VHOST_USER_REM_MEM_REG for memory
> regions in our shadow state which are not found in the device's memory 
> state.” 
> i.e. send messages to remove regions in the shadow state but not in the 
> updated
> device state. 
>  
> > > + */
> > > +for (i = 0; i < u->num_shadow_regions; ++i) {
> > > +reg = dev->mem->regions;
> > > +
> > > +for (j = 0; j < 

Re: [PATCH v6 02/18] ppc: Remove stub support for 32-bit hypervisor mode

2020-02-24 Thread Greg Kurz
On Tue, 25 Feb 2020 10:37:08 +1100
David Gibson  wrote:

> a4f30719a8cd, way back in 2007 noted that "PowerPC hypervisor mode is not
> fundamentally available only for PowerPC 64" and added a 32-bit version
> of the MSR[HV] bit.
> 
> But nothing was ever really done with that; there is no meaningful support
> for 32-bit hypervisor mode 13 years later.  Let's stop pretending and just
> remove the stubs.
> 
> Signed-off-by: David Gibson 
> Reviewed-by: Fabiano Rosas 
> ---

Reviewed-by: Greg Kurz 

>  target/ppc/cpu.h| 21 +++--
>  target/ppc/translate_init.inc.c |  6 +++---
>  2 files changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index b283042515..8077fdb068 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -24,8 +24,6 @@
>  #include "exec/cpu-defs.h"
>  #include "cpu-qom.h"
>  
> -/* #define PPC_EMULATE_32BITS_HYPV */
> -
>  #define TCG_GUEST_DEFAULT_MO 0
>  
>  #define TARGET_PAGE_BITS_64K 16
> @@ -300,13 +298,12 @@ typedef struct ppc_v3_pate_t {
>  #define MSR_SF   63 /* Sixty-four-bit mode hflags */
>  #define MSR_TAG  62 /* Tag-active mode (POWERx ?) */
>  #define MSR_ISF  61 /* Sixty-four-bit interrupt mode on 630 */
> -#define MSR_SHV  60 /* hypervisor state hflags */
> +#define MSR_HV   60 /* hypervisor state hflags */
>  #define MSR_TS0  34 /* Transactional state, 2 bits (Book3s) */
>  #define MSR_TS1  33
>  #define MSR_TM   32 /* Transactional Memory Available (Book3s) */
>  #define MSR_CM   31 /* Computation mode for BookE hflags */
>  #define MSR_ICM  30 /* Interrupt computation mode for BookE */
> -#define MSR_THV  29 /* hypervisor state for 32 bits PowerPC hflags */
>  #define MSR_GS   28 /* guest state for BookE */
>  #define MSR_UCLE 26 /* User-mode cache lock enable for BookE */
>  #define MSR_VR   25 /* altivec available x hflags */
> @@ -401,10 +398,13 @@ typedef struct ppc_v3_pate_t {
>  
>  #define msr_sf   ((env->msr >> MSR_SF)   & 1)
>  #define msr_isf  ((env->msr >> MSR_ISF)  & 1)
> -#define msr_shv  ((env->msr >> MSR_SHV)  & 1)
> +#if defined(TARGET_PPC64)
> +#define msr_hv   ((env->msr >> MSR_HV)   & 1)
> +#else
> +#define msr_hv   (0)
> +#endif
>  #define msr_cm   ((env->msr >> MSR_CM)   & 1)
>  #define msr_icm  ((env->msr >> MSR_ICM)  & 1)
> -#define msr_thv  ((env->msr >> MSR_THV)  & 1)
>  #define msr_gs   ((env->msr >> MSR_GS)   & 1)
>  #define msr_ucle ((env->msr >> MSR_UCLE) & 1)
>  #define msr_vr   ((env->msr >> MSR_VR)   & 1)
> @@ -449,16 +449,9 @@ typedef struct ppc_v3_pate_t {
>  
>  /* Hypervisor bit is more specific */
>  #if defined(TARGET_PPC64)
> -#define MSR_HVB (1ULL << MSR_SHV)
> -#define msr_hv  msr_shv
> -#else
> -#if defined(PPC_EMULATE_32BITS_HYPV)
> -#define MSR_HVB (1ULL << MSR_THV)
> -#define msr_hv  msr_thv
> +#define MSR_HVB (1ULL << MSR_HV)
>  #else
>  #define MSR_HVB (0ULL)
> -#define msr_hv  (0)
> -#endif
>  #endif
>  
>  /* DSISR */
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 53995f62ea..a0d0eaabf2 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -8804,7 +8804,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
>  PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
>  PPC2_TM | PPC2_PM_ISA206;
>  pcc->msr_mask = (1ull << MSR_SF) |
> -(1ull << MSR_SHV) |
> +(1ull << MSR_HV) |
>  (1ull << MSR_TM) |
>  (1ull << MSR_VR) |
>  (1ull << MSR_VSX) |
> @@ -9017,7 +9017,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
>  PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
>  PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL;
>  pcc->msr_mask = (1ull << MSR_SF) |
> -(1ull << MSR_SHV) |
> +(1ull << MSR_HV) |
>  (1ull << MSR_TM) |
>  (1ull << MSR_VR) |
>  (1ull << MSR_VSX) |
> @@ -9228,7 +9228,7 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
>  PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
>  PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL;
>  pcc->msr_mask = (1ull << MSR_SF) |
> -(1ull << MSR_SHV) |
> +(1ull << MSR_HV) |
>  (1ull << MSR_TM) |
>  (1ull << MSR_VR) |
>  (1ull << MSR_VSX) |




RE: [PATCH 1/1] hw/net/can: Introduce Xlnx ZynqMP CAN controller for QEMU

2020-02-24 Thread Vikram Garhwal
Hi Jason,
Apologies for the delayed response. I tried plugging NetClientState into the
CAN code, which is required if we use qemu_send_packet, but this would change
the underlying architecture of can-core and can-socketcan a lot. It means
changing the way the CAN bus is created and works, and how socket CAN works.
A CAN socket (CAN raw socket) is quite different from Ethernet, so plugging
in/using NetClientState does not work here.

I apologize for still being a little confused about the filters, but when
looking into the code, I can only find them being used with Ethernet frames.
Since no other CAN controller uses NetClientState, it makes me wonder whether
this model was perhaps thought of as an Ethernet NIC. Or has the code in
net/can/ which I referenced been obsoleted?

Sharing this link for SocketCAN (in case you want to have a look):
https://www.kernel.org/doc/Documentation/networking/can.txt. Section 4 talks
about how a CAN socket is intended to work. The equivalent file is located at
net/can-socketcan.c.
 
Regards,
Vikram

> -Original Message-
> From: Jason Wang 
> Sent: Monday, February 10, 2020 7:09 PM
> To: Vikram Garhwal ; qemu-devel@nongnu.org
> Subject: Re: [PATCH 1/1] hw/net/can: Introduce Xlnx ZynqMP CAN controller
> for QEMU
> 
> 
> On 2020/2/11 5:45 AM, Vikram Garhwal wrote:
> >>> +}
> >>> +} else {
> >>> +/* Normal mode Tx. */
> >>> +generate_frame(, data);
> >>> +
> >>> +can_bus_client_send(>bus_client, , 1);
> >> I had a quick glance at can_bus_client_send():
> >>
> >> It did:
> >>
> >>       QTAILQ_FOREACH(peer, >clients, next) {
> >>       if (peer->info->can_receive(peer)) {
> >>       if (peer == client) {
> >>       /* No loopback support for now */
> >>       continue;
> >>       }
> >>       if (peer->info->receive(peer, frames, frames_cnt) > 0) {
> >>       ret = 1;
> >>       }
> >>       }
> >>       }
> >>
> >> which looks not correct. We need to use qemu_send_packet() instead of
> >> calling peer->info->receive() directly which bypasses filters completely.
> > [Vikram Garhwal] Can you please elaborate a bit more on why we need
> > to filter outgoing messages? So I can add a filter before sending the
> > packets. I am unable to understand the use case for it. For any message which
> > is incoming, we are filtering it for sure before storing in update_rx_fifo().
> 
> 
> I might be not clear, I meant the netfilters supported by qemu which allows
> you to attach a filter to a specific NetClientState, see
> qemu_send_packet_async_with_flags. It doesn't mean the filter implemented
> in your own NIC model.
> 
> Thanks
> 
> 
> > Also, I can see existing CAN models like CAN sja1000 and CAN Kvaser are
> > using the same can_bus_client_send() function. However, this doesn't mean
> > that it is the correct way to send & receive packets.



[PATCH 1/2] qxl: map rom r/o

2020-02-24 Thread Gerd Hoffmann
Map the qxl rom read-only into the guest, so the guest can't tamper with the
content.  qxl has a shadow copy of the rom to deal with that, but the
shadow doesn't cover the mode list.  A privileged user in the guest can
manipulate the mode list and use that to trick qemu into oob reads, leading
to a DoS via segfault if that read access happens to hit unmapped memory.

Signed-off-by: Gerd Hoffmann 
---
 hw/display/qxl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 21a43a1d5ec2..227da69a50d9 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2136,7 +2136,7 @@ static void qxl_realize_common(PCIQXLDevice *qxl, Error **errp)
 pci_set_byte(&config[PCI_INTERRUPT_PIN], 1);
 
 qxl->rom_size = qxl_rom_size();
-memory_region_init_ram(&qxl->rom_bar, OBJECT(qxl), "qxl.vrom",
+memory_region_init_rom(&qxl->rom_bar, OBJECT(qxl), "qxl.vrom",
qxl->rom_size, _fatal);
 init_qxl_rom(qxl);
 init_qxl_ram(qxl);
-- 
2.18.2




[PATCH 2/2] qxl: drop shadow_rom

2020-02-24 Thread Gerd Hoffmann
Now that the rom bar is mapped read-only and the guest can't change
things under our feet we don't need the shadow rom any more.

Signed-off-by: Gerd Hoffmann 
---
 hw/display/qxl.h |  2 +-
 hw/display/qxl.c | 25 +
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/hw/display/qxl.h b/hw/display/qxl.h
index 707631a1f573..3aedc7db5da0 100644
--- a/hw/display/qxl.h
+++ b/hw/display/qxl.h
@@ -95,11 +95,11 @@ typedef struct PCIQXLDevice {
 uint32_t   vgamem_size;
 
 /* rom pci bar */
-QXLRom shadow_rom;
 QXLRom *rom;
 QXLModes   *modes;
 uint32_t   rom_size;
 MemoryRegion   rom_bar;
+uint32_t   rom_mode;
 #if SPICE_SERVER_VERSION >= 0x000c06 /* release 0.12.6 */
 uint16_t   max_outputs;
 #endif
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 227da69a50d9..0502802688f9 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -391,7 +391,6 @@ static void init_qxl_rom(PCIQXLDevice *d)
 sizeof(rom->client_monitors_config));
 }
 
-d->shadow_rom = *rom;
 d->rom= rom;
 d->modes  = modes;
 }
@@ -403,7 +402,7 @@ static void init_qxl_ram(PCIQXLDevice *d)
 QXLReleaseRing *ring;
 
 buf = d->vga.vram_ptr;
-d->ram = (QXLRam *)(buf + le32_to_cpu(d->shadow_rom.ram_header_offset));
+d->ram = (QXLRam *)(buf + le32_to_cpu(d->rom->ram_header_offset));
 d->ram->magic   = cpu_to_le32(QXL_RAM_MAGIC);
 d->ram->int_pending = cpu_to_le32(0);
 d->ram->int_mask= cpu_to_le32(0);
@@ -446,7 +445,7 @@ static void qxl_ram_set_dirty(PCIQXLDevice *qxl, void *ptr)
 /* can be called from spice server thread context */
 static void qxl_ring_set_dirty(PCIQXLDevice *qxl)
 {
-ram_addr_t addr = qxl->shadow_rom.ram_header_offset;
+ram_addr_t addr = qxl->rom->ram_header_offset;
 ram_addr_t end  = qxl->vga.vram_size;
 qxl_set_dirty(&qxl->vga.vram, addr, end);
 }
@@ -529,7 +528,6 @@ static void interface_set_compression_level(QXLInstance *sin, int level)
 PCIQXLDevice *qxl = container_of(sin, PCIQXLDevice, ssd.qxl);
 
 trace_qxl_interface_set_compression_level(qxl->id, level);
-qxl->shadow_rom.compression_level = cpu_to_le32(level);
 qxl->rom->compression_level = cpu_to_le32(level);
 qxl_rom_set_dirty(qxl);
 }
@@ -561,7 +559,7 @@ static void interface_get_init_info(QXLInstance *sin, QXLDevInitInfo *info)
 info->num_memslots_groups = NUM_MEMSLOTS_GROUPS;
 info->internal_groupslot_id = 0;
 info->qxl_ram_size =
-le32_to_cpu(qxl->shadow_rom.num_pages) << QXL_PAGE_BITS;
+le32_to_cpu(qxl->rom->num_pages) << QXL_PAGE_BITS;
 info->n_surfaces = qxl->ssd.num_surfaces;
 }
 
@@ -1035,9 +1033,6 @@ static void interface_set_client_capabilities(QXLInstance *sin,
 return;
 }
 
-qxl->shadow_rom.client_present = client_present;
-memcpy(qxl->shadow_rom.client_capabilities, caps,
-   sizeof(qxl->shadow_rom.client_capabilities));
 qxl->rom->client_present = client_present;
 memcpy(qxl->rom->client_capabilities, caps,
sizeof(qxl->rom->client_capabilities));
@@ -1232,11 +1227,8 @@ static void qxl_check_state(PCIQXLDevice *d)
 
 static void qxl_reset_state(PCIQXLDevice *d)
 {
-QXLRom *rom = d->rom;
-
 qxl_check_state(d);
-d->shadow_rom.update_id = cpu_to_le32(0);
-*rom = d->shadow_rom;
+d->rom->update_id = cpu_to_le32(0);
 qxl_rom_set_dirty(d);
 init_qxl_ram(d);
 d->num_free_res = 0;
@@ -1600,7 +1592,7 @@ static void qxl_set_mode(PCIQXLDevice *d, unsigned int modenr, int loadvm)
 .format = SPICE_SURFACE_FMT_32_xRGB,
 .flags  = loadvm ? QXL_SURF_FLAG_KEEP_DATA : 0,
 .mouse_mode = true,
-.mem= devmem + d->shadow_rom.draw_area_offset,
+.mem= devmem + d->rom->draw_area_offset,
 };
 
 trace_qxl_set_mode(d->id, modenr, mode->x_res, mode->y_res, mode->bits,
@@ -1620,7 +1612,6 @@ static void qxl_set_mode(PCIQXLDevice *d, unsigned int modenr, int loadvm)
 if (mode->bits == 16) {
 d->cmdflags |= QXL_COMMAND_FLAG_COMPAT_16BPP;
 }
-d->shadow_rom.mode = cpu_to_le32(modenr);
 d->rom->mode = cpu_to_le32(modenr);
 qxl_rom_set_dirty(d);
 }
@@ -2277,6 +2268,7 @@ static int qxl_pre_save(void *opaque)
 d->last_release_offset = (uint8_t *)d->last_release - ram_start;
 }
 assert(d->last_release_offset < d->vga.vram_size);
+d->rom_mode = d->rom->mode;
 
 return 0;
 }
@@ -2316,6 +2308,7 @@ static int qxl_post_load(void *opaque, int version)
 } else {
 d->last_release = (QXLReleaseInfo *)(ram_start + d->last_release_offset);
 }
+d->rom->mode = d->rom_mode;
 
 d->modes = (QXLModes*)((uint8_t*)d->rom + d->rom->modes_offset);
 
@@ -2361,7 +2354,7 @@ static int qxl_post_load(void *opaque, int version)
 case QXL_MODE_COMPAT:
 /* note: no need to call qxl_create_memslots, qxl_set_mode
 

[PATCH 0/2] qxl: map rom r/o, remove shadow.

2020-02-24 Thread Gerd Hoffmann



Gerd Hoffmann (2):
  qxl: map rom r/o
  qxl: drop shadow_rom

 hw/display/qxl.h |  2 +-
 hw/display/qxl.c | 27 ++-
 2 files changed, 11 insertions(+), 18 deletions(-)

-- 
2.18.2




Re: Race condition in overlayed qcow2?

2020-02-24 Thread dovgaluk

Vladimir Sementsov-Ogievskiy писал 2020-02-21 16:23:

21.02.2020 15:35, dovgaluk wrote:

Vladimir Sementsov-Ogievskiy писал 2020-02-21 13:09:

21.02.2020 12:49, dovgaluk wrote:

Vladimir Sementsov-Ogievskiy писал 2020-02-20 12:36:

1 or 2 are ok, and 4 or 8 lead to the failures.


That is strange. I could have thought that it was caused by bugs in
deterministic CPU execution, but the first difference in the logs
occurs in a READ operation (I dump read/write buffers in blk_aio_complete).




Aha, yes, looks strange.

Then next steps:

1. Does problem hit into the same offset every time?
2. Do we write to this region before this strange read?

2.1. If yes, we need to check that we read what we write. You say you dump buffers
in blk_aio_complete... I think it would be more reliable to dump at start of
bdrv_co_pwritev and at end of bdrv_co_preadv. Also, guest may modify its buffers
during operation which would be strange but possible.

2.2 If not, hmm...




Another idea to check: use blkverify


I added logging of the file descriptor and discovered that different results
are obtained when reading from the backing file.
And even more - replay runs of the same recording produce different results.
Logs show that there is a preadv race, but I can't figure out the source of
the failure.


Log1:
preadv c 30467e00
preadv c 3096
--- sum = a2e1e
bdrv_co_preadv_part complete offset: 30467e00 qiov_offset: 0 len: 8200
--- sum = 10cdee
bdrv_co_preadv_part complete offset: 3096 qiov_offset: 8200 len: ee00

Log2:
preadv c 30467e00
--- sum = a2e1e
bdrv_co_preadv_part complete offset: 30467e00 qiov_offset: 0 len: 8200
preadv c 3096
--- sum = f094f
bdrv_co_preadv_part complete offset: 3096 qiov_offset: 8200 len: ee00



Checksum calculation was added to preadv in file-posix.c



So, preadv in file-posix.c returns different results for the same
offset, for file which is always opened in RO mode? Sounds impossible
:)


True.
Maybe my logging is wrong?

static ssize_t
qemu_preadv(int fd, const struct iovec *iov, int nr_iov, off_t offset)
{
     ssize_t res = preadv(fd, iov, nr_iov, offset);
     qemu_log("preadv %x %"PRIx64"\n", fd, (uint64_t)offset);
     int i;
     uint32_t sum = 0;
     int cnt = 0;
     for (i = 0 ; i < nr_iov ; ++i) {
     int j;
     for (j = 0 ; j < (int)iov[i].iov_len ; ++j)
     {
     sum += ((uint8_t*)iov[i].iov_base)[j];
     ++cnt;
     }
     }
     qemu_log("size: %x sum: %x\n", cnt, sum);
     assert(cnt == res);
     return res;
}



Hmm, I don't see any issues here..

Are you absolutely sure, that all these reads are from backing file,
which is read-only and never changed (may be by other processes)?


Yes, I made a copy and compared the files with binwalk.


2. guest modifies buffers during operation (you can catch it if you
allocate a personal buffer for preadv, then calculate the checksum, then
memcpy to the guest buffer)


I added the following to the qemu_preadv:

// do it again
unsigned char *buf = g_malloc(cnt);
struct iovec v = {buf, cnt};
     res = preadv(fd, &v, 1, offset);
assert(cnt == res);
uint32_t sum2 = 0;
for (i = 0 ; i < cnt ; ++i)
sum2 += buf[i];
g_free(buf);
qemu_log("--- sum2 = %x\n", sum2);
assert(sum2 == sum);

These two reads give different results.
But who can modify the buffer while qcow2 workers filling it with data 
from the disk?




Pavel Dovgalyuk



Re: [PATCH] hw/net/imx_fec: write TGSR and TCSR3 in imx_enet_write()

2020-02-24 Thread Jason Wang



On 2020/2/25 10:59 AM, Chen Qun wrote:

The current code causes the clang static code analyzer to generate warnings:
hw/net/imx_fec.c:858:9: warning: Value stored to 'value' is never read
        value = value & 0x000f;
        ^       ~~~~~~~~~~~~~~
hw/net/imx_fec.c:864:9: warning: Value stored to 'value' is never read
        value = value & 0x00fd;
        ^       ~~~~~~~~~~~~~~

According to the definition of the function, the two "value" assignments
should be written to the registers.

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
I'm not sure if this modification is correct; judging just from the
function definition, it is.
---
  hw/net/imx_fec.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
index 6a124a154a..92f6215712 100644
--- a/hw/net/imx_fec.c
+++ b/hw/net/imx_fec.c
@@ -855,13 +855,13 @@ static void imx_enet_write(IMXFECState *s, uint32_t index, uint32_t value)
  break;
  case ENET_TGSR:
  /* implement clear timer flag */
-value = value & 0x000f;
+s->regs[index] = value & 0x000f;
  break;
  case ENET_TCSR0:
  case ENET_TCSR1:
  case ENET_TCSR2:
  case ENET_TCSR3:
-value = value & 0x00fd;
+s->regs[index] = value & 0x00fd;
  break;
  case ENET_TCCR0:
  case ENET_TCCR1:



Applied.

Thanks





Re: [PATCH v2] riscv: sifive_u: Add a "serial" property for board serial number

2020-02-24 Thread Bin Meng
Hi Alistair,

On Tue, Feb 25, 2020 at 5:14 AM Alistair Francis  wrote:
>
> On Sun, Feb 16, 2020 at 5:56 AM Bin Meng  wrote:
> >
> > At present the board serial number is hard-coded to 1, and passed
> > to OTP model during initialization. Firmware (FSBL, U-Boot) uses
> > the serial number to generate a unique MAC address for the on-chip
> > ethernet controller. When multiple QEMU 'sifive_u' instances are
> > created and connected to the same subnet, they all have the same
> > MAC address, hence an unusable network.
> >
> > A new "serial" property is introduced to specify the board serial
> > number. When not given, the default serial number 1 is used.
> >
> > Signed-off-by: Bin Meng 
> >
> > ---
> >
> > Changes in v2:
> > - Move setting OTP serial number property from riscv_sifive_u_soc_init()
> >   to riscv_sifive_u_soc_realize(), to fix the 'check-qtest-riscv' error.
> >   I am not really sure why doing so could fix the 'make check' error.
> >   The v1 patch worked fine and nothing seems wrong.
> >
> >  hw/riscv/sifive_u.c | 21 -
> >  include/hw/riscv/sifive_u.h |  1 +
> >  2 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
> > index 0e12b3c..ca561d3 100644
> > --- a/hw/riscv/sifive_u.c
> > +++ b/hw/riscv/sifive_u.c
> > @@ -34,6 +34,7 @@
> >  #include "qemu/log.h"
> >  #include "qemu/error-report.h"
> >  #include "qapi/error.h"
> > +#include "qapi/visitor.h"
> >  #include "hw/boards.h"
> >  #include "hw/loader.h"
> >  #include "hw/sysbus.h"
> > @@ -434,7 +435,6 @@ static void riscv_sifive_u_soc_init(Object *obj)
> >TYPE_SIFIVE_U_PRCI);
> >  sysbus_init_child_obj(obj, "otp", &s->otp, sizeof(s->otp),
> >TYPE_SIFIVE_U_OTP);
> > -qdev_prop_set_uint32(DEVICE(>otp), "serial", OTP_SERIAL);
> >  sysbus_init_child_obj(obj, "gem", &s->gem, sizeof(s->gem),
> >TYPE_CADENCE_GEM);
> >  }
> > @@ -453,6 +453,18 @@ static void sifive_u_set_start_in_flash(Object *obj, bool value, Error **errp)
> >  s->start_in_flash = value;
> >  }
> >
> > +static void sifive_u_get_serial(Object *obj, Visitor *v, const char *name,
> > +void *opaque, Error **errp)
> > +{
> > +visit_type_uint32(v, name, (uint32_t *)opaque, errp);
> > +}
> > +
> > +static void sifive_u_set_serial(Object *obj, Visitor *v, const char *name,
> > +void *opaque, Error **errp)
> > +{
> > +visit_type_uint32(v, name, (uint32_t *)opaque, errp);
>
> This is a little confusing. Maybe it's worth adding a comment that
> opaque is s->serial?

Yes, I can add a comment.

>
> Either that or change opaque to be SiFiveUState *s and then access
> serial via the struct.

Do you mean something like this?

Calling object_property_add() with opaque as NULL, not &s->serial:

object_property_add(obj, "serial", "uint32", sifive_u_get_serial,
sifive_u_set_serial, NULL, NULL, NULL);

Then in the sifive_u_get_serial() or sifive_u_set_serial(), replace
opaque with RISCV_U_MACHINE(obj)->serial.

Wow, it looks like we have designed quite a flexible API :)

>
> > +}
> > +
> >  static void riscv_sifive_u_machine_instance_init(Object *obj)
> >  {
> >  SiFiveUState *s = RISCV_U_MACHINE(obj);
> > @@ -464,11 +476,17 @@ static void riscv_sifive_u_machine_instance_init(Object *obj)
> >  "Set on to tell QEMU's ROM to jump to " \
> >  "flash. Otherwise QEMU will jump to DRAM",
> >  NULL);
> > +
> > +s->serial = OTP_SERIAL;
> > +object_property_add(obj, "serial", "uint32", sifive_u_get_serial,
> > +sifive_u_set_serial, NULL, &s->serial, NULL);
> > +object_property_set_description(obj, "serial", "Board serial number", NULL);
> >  }
> >
> >  static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp)
> >  {
> >  MachineState *ms = MACHINE(qdev_get_machine());
> > +SiFiveUState *us = RISCV_U_MACHINE(ms);
>
> I don't think the Soc should access the machine like this. What if we
> use this Soc on a different machine?
>

Yes, agree. The v1 patch does this in the riscv_sifive_u_init(), but
it could not pass the "make check". See the changelog I mentioned. Do
you know how to fix the "make check" properly? The issue is quite
strange. The v1 patch worked perfectly OK and I did not observe any
crash during my normal use, but with "make check" QEMU RISC-V crashes
with the v1 patch.

> There should be a SoC "serial" property that is set before realise as well.
>

v1 patch: http://patchwork.ozlabs.org/patch/1196127/

Regards,
Bin



RE: [PATCH V2 4/8] COLO: Optimize memory back-up process

2020-02-24 Thread Zhanghailiang
Hi,


> -Original Message-
> From: Daniel Cho [mailto:daniel...@qnap.com]
> Sent: Tuesday, February 25, 2020 10:53 AM
> To: Zhanghailiang 
> Cc: qemu-devel@nongnu.org; quint...@redhat.com; Dr. David Alan Gilbert
> 
> Subject: Re: [PATCH V2 4/8] COLO: Optimize memory back-up process
> 
> Hi Hailiang,
> 
> With version 2, the code in migration/ram.c
> 
> +if (migration_incoming_colo_enabled()) {
> +if (migration_incoming_in_colo_state()) {
> +/* In COLO stage, put all pages into cache
> temporarily */
> +host = colo_cache_from_block_offset(block, addr);
> +} else {
> +   /*
> +* In migration stage but before COLO stage,
> +* Put all pages into both cache and SVM's memory.
> +*/
> +host_bak = colo_cache_from_block_offset(block,
> addr);
> +}
>  }
>  if (!host) {
>  error_report("Illegal RAM offset " RAM_ADDR_FMT,
> addr);
>  ret = -EINVAL;
>  break;
>  }
> 
> host = colo_cache_from_block_offset(block, addr);
> host_bak = colo_cache_from_block_offset(block, addr);
> Will the "if (!host)" check break out if the code takes the
> "host_bak = colo_cache_from_block_offset(block, addr);" branch?
> 

That will not happen; you may have missed this part:

@@ -3379,20 +3393,35 @@ static int ram_load_precopy(QEMUFile *f)
  RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
 RAMBlock *block = ram_block_from_stream(f, flags);
 
+host = host_from_ram_block_offset(block, addr);
 /*

We have given host a value unconditionally.


> Best regards,
> Daniel Cho
> 
zhanghailiang wrote on Monday, 24 February 2020 at 2:55 PM:
> >
> > This patch reduces the downtime of the VM for the initial process.
> > Previously, we copied all this memory in the COLO preparing stage
> > while the VM had to be stopped, which is a time-consuming process.
> > Here we optimize it with a trick: back up every page during the
> > migration process while COLO is enabled. Though this affects the
> > speed of the migration, it clearly reduces the downtime of backing
> > up all of the SVM's memory in the COLO preparing stage.
> >
> > Signed-off-by: zhanghailiang 
> > ---
> >  migration/colo.c |  3 +++
> >  migration/ram.c  | 68 +++-
> >  migration/ram.h  |  1 +
> >  3 files changed, 54 insertions(+), 18 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c
> > index 93c5a452fb..44942c4e23 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -26,6 +26,7 @@
> >  #include "qemu/main-loop.h"
> >  #include "qemu/rcu.h"
> >  #include "migration/failover.h"
> > +#include "migration/ram.h"
> >  #ifdef CONFIG_REPLICATION
> >  #include "replication.h"
> >  #endif
> > @@ -845,6 +846,8 @@ void *colo_process_incoming_thread(void *opaque)
> >   */
> >  qemu_file_set_blocking(mis->from_src_file, true);
> >
> > +colo_incoming_start_dirty_log();
> > +
> >  bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
> >  fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
> >  object_unref(OBJECT(bioc));
> > diff --git a/migration/ram.c b/migration/ram.c
> > index ed23ed1c7c..ebf9e6ba51 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2277,6 +2277,7 @@ static void ram_list_init_bitmaps(void)
> >   * dirty_memory[DIRTY_MEMORY_MIGRATION] don't
> include the whole
> >   * guest memory.
> >   */
> > +
> >  block->bmap = bitmap_new(pages);
> >  bitmap_set(block->bmap, 0, pages);
> >  block->clear_bmap_shift = shift;
> > @@ -2986,7 +2987,6 @@ int colo_init_ram_cache(void)
> >  }
> >  return -errno;
> >  }
> > -memcpy(block->colo_cache, block->host,
> block->used_length);
> >  }
> >  }
> >
> > @@ -3000,19 +3000,36 @@ int colo_init_ram_cache(void)
> >
> >  RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> >  unsigned long pages = block->max_length >>
> TARGET_PAGE_BITS;
> > -
> >  block->bmap = bitmap_new(pages);
> > -bitmap_set(block->bmap, 0, pages);
> >  }
> >  }
> > -ram_state = g_new0(RAMState, 1);
> > -ram_state->migration_dirty_pages = 0;
> > -qemu_mutex_init(&ram_state->bitmap_mutex);
> > -memory_global_dirty_log_start();
> >
> > +ram_state_init(&ram_state);
> >  return 0;
> >  }
> >
> > +/* TODO: duplicated with ram_init_bitmaps */
> > +void colo_incoming_start_dirty_log(void)
> > +{
> > +RAMBlock *block = NULL;
> > +/* For memory_global_dirty_log_start below. */
> > +qemu_mutex_lock_iothread();
> > +qemu_mutex_lock_ramlist();
> > +
> > +memory_global_dirty_log_sync();
> > +WITH_RCU_READ_LOCK_GUARD() {
> > +

Re: [PATCH] hw/ide: Remove status register read side effect

2020-02-24 Thread jasper.lowell
> > ide_exec_cmd 0.461 pid=147030 bus=0x55b77f922d10
> > state=0x55b77f922d98 cmd=0xef

> The command is run here if I'm not mistaken. Does this set the irq
> right away on QEMU, where on real hardware this may take some time?
> Not sure if that's a problem, but trying to understand what's
> happening.

Yes. QEMU raises an IRQ on the completion of the command.

complete = ide_cmd_table[val].handler(s, val);
if (complete) {
s->status &= ~BUSY_STAT;
assert(!!s->error == !!(s->status & ERR_STAT));

if ((ide_cmd_table[val].flags & SET_DSC) && !s->error) {
s->status |= SEEK_STAT;
}

ide_cmd_done(s);
ide_set_irq(s->bus);
}

This code from /qemu/hw/ide/core.c is executed when the SET_FEATURE
command request is made. I have tested that if this interrupt is not
made, Solaris 10 will complain accordingly with a unique error message.

I don't believe the quick interrupt here is the problem. Solaris 10
will spin for a short time while waiting for the interrupt bit to be
set before continuing with its routine. If it doesn't see the interrupt
bit is set before some timeout, it will print an error about the
missing interrupt and give up loading the driver.

> > pci_cfg_read 53.231 pid=147030 dev=b'cmd646-ide' devid=0x3 fnid=0x0
> > offs=0x50 val=0x4
> > ide_ioport_read 35.577 pid=147030 addr=0x7 reg=b'Status' val=0x50
> > bus=0x55b77f922d10 s=0x55b77f922d98
> > ide_ioport_read 29.095 pid=147030 addr=0x7 reg=b'Status' val=0x50
> > bus=0x55b77f922d10 s=0x55b77f922d98
> 
> So these ide_ioport_read calls clear the irq bit...

That's right. The line that I proposed removing in the patch clears
CFR_INTR_CH0 on ide_ioport_read.

> > ide_ioport_write 19.146 pid=147030 addr=0x6 reg=b'Device/Head'
> > val=0xe0 bus=0x55b77f922d10 s=0x55b77f922d98
> > pci_cfg_read 9.468 pid=147030 dev=b'cmd646-ide' devid=0x3 fnid=0x0
> > offs=0x50 val=0x0
> > pci_cfg_read 127.712 pid=147030 dev=b'cmd646-ide' devid=0x3
> > fnid=0x0 offs=0x50 val=0x0
> > pci_cfg_read 101.942 pid=147030 dev=b'cmd646-ide' devid=0x3
> > fnid=0x0 offs=0x50 val=0x0
> 
> ...that would be checked here?

That's right.

Solaris is performing pci_cfg_read on offs=0x50 until it either sees
the interrupt bit set or times out. If it times out, you get a fatal
error for the driver. The behaviour is not expected and aggressively
checked against by the Solaris kernel. From what I can tell, Linux and
OpenBSD don't check if the bit is set before clearing it.

> What I don't get is why ide_ioport_read is called at all, and from
> where, if that's meant to emulate legacy IDE ISA ioport reads and we
> have a PCI device accessed via PCI regs?

Taken from the ATA specification:
All commands that do not include read- or write-data transfers generate
a single interrupt when the command completes. Resets do not generate
an interrupt.

There will be an interrupt whether the command is successful or not. If
the host wants to know whether an error occurred, it needs to inspect the
status register. Solaris might be doing this. As the trace shows, there
is no error and nothing is out of the ordinary.

There are two devices. The PCI/IDE controller (CMD646) and the ATA
compliant drive. The command, feature, and status registers belong to
the drive. If you want to configure the drive in some way or interact
with it you will use the ioport_read/write interface. CFR_INTR_CH0 and
ARTTIM23_INTR_CH1 are PCI registers in the PCI configuration space that
belongs to the PCI/IDE controller (CMD646). It makes sense to me that
both are used.

> There's a possibility that software may want to clear bits without
> reading 
> the current value so having a way to do that can be explained.

I agree that this might be a possibility. I also think it's very normal
for kernel drivers to drop the return value from operations when they
are only interested in the side effect.

> I'm afraid I don't understand the problem enough either to be able to
> help. Maybe you could try to find out where ide_ioport_read is called
> in the above and whether it's correct to call it there. Also the
> CMD646U docs mention irq in a lot of regs (all say write to clear) but
> I don't understand their relation to each other and to the irq raised
> by the drive.

I agree and I think that's part of the problem. The documentation does
not explicitly mention their relation to each other. I can't see
anything that suggests that reading the status register on the drive
will unset bits in the pci configuration space of the controller. They
are separate devices.

> So maybe in DMA mode the BM* regs should be used, and in legacy mode
> these interrupts would go to ISA IRQ 14 and 15 and be cleared on read
> as per the IDE spec, while in native mode PCI INTA is raised and not
> cleared; but the chip docs don't say anything about this so it's only
> guessing.

This might be true but I'm suspicious. In native mode the host should
be checking the PCI registers to identify what device was responsible

[PATCH 7/8] target/arm: Check addresses for disabled regimes

2020-02-24 Thread Richard Henderson
We fail to validate the upper bits of a virtual address on a
translation disabled regime, as per AArch64.TranslateAddressS1Off.

Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7cf6642210..2867adea29 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11615,7 +11615,38 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 /* Definitely a real MMU, not an MPU */
 
 if (regime_translation_disabled(env, mmu_idx)) {
-/* MMU disabled. */
+/*
+ * MMU disabled.  S1 addresses are still checked for bounds.
+ * C.f. AArch64.TranslateAddressS1Off.
+ */
+if (is_a64(env) && mmu_idx != ARMMMUIdx_Stage2) {
+int pamax = arm_pamax(env_archcpu(env));
+uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
+int addrtop, tbi;
+
+tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
+if (access_type == MMU_INST_FETCH) {
+tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
+}
+tbi = (tbi >> extract64(address, 55, 1)) & 1;
+addrtop = (tbi ? 55 : 63);
+
+if (extract64(address, pamax, addrtop - pamax + 1) != 0) {
+fi->type = ARMFault_AddressSize;
+fi->level = 0;
+fi->stage2 = false;
+return 1;
+}
+
+/*
+ * The ARM pseudocode copies bits [51:0] to addrdesc.paddress.
+ * Except for TBI, we've just validated everything above PAMax
+ * is zero.  So we only need to drop TBI.
+ */
+if (tbi) {
+address = extract64(address, 0, 56);
+}
+}
 *phys_ptr = address;
 *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 *page_size = TARGET_PAGE_SIZE;
-- 
2.20.1




[PATCH 4/8] target/arm: Move helper_dc_zva to helper-a64.c

2020-02-24 Thread Richard Henderson
This is an aarch64-only function.  Move it out of the shared file.
This patch is code movement only.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.h |  1 +
 target/arm/helper.h |  1 -
 target/arm/helper-a64.c | 91 
 target/arm/op_helper.c  | 93 -
 4 files changed, 92 insertions(+), 94 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index a915c1247f..b1a5935f61 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -90,6 +90,7 @@ DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr)
 DEF_HELPER_2(sqrt_f16, f16, f16, ptr)
 
 DEF_HELPER_2(exception_return, void, env, i64)
+DEF_HELPER_2(dc_zva, void, env, i64)
 
 DEF_HELPER_FLAGS_3(pacia, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacib, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index fcbf504121..72eb9e6a1a 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -559,7 +559,6 @@ DEF_HELPER_FLAGS_3(crypto_sm4ekey, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
 
 DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
 DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
-DEF_HELPER_2(dc_zva, void, env, i64)
 
 DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 95e9e879ca..c0a40c5fa9 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "cpu.h"
 #include "exec/gdbstub.h"
 #include "exec/helper-proto.h"
@@ -1109,4 +1110,94 @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp)
 return float16_sqrt(a, s);
 }
 
+void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
+{
+/*
+ * Implement DC ZVA, which zeroes a fixed-length block of memory.
+ * Note that we do not implement the (architecturally mandated)
+ * alignment fault for attempts to use this on Device memory
+ * (which matches the usual QEMU behaviour of not implementing either
+ * alignment faults or any memory attribute handling).
+ */
 
+ARMCPU *cpu = env_archcpu(env);
+uint64_t blocklen = 4 << cpu->dcz_blocksize;
+uint64_t vaddr = vaddr_in & ~(blocklen - 1);
+
+#ifndef CONFIG_USER_ONLY
+{
+/*
+ * Slightly awkwardly, QEMU's TARGET_PAGE_SIZE may be less than
+ * the block size so we might have to do more than one TLB lookup.
+ * We know that in fact for any v8 CPU the page size is at least 4K
+ * and the block size must be 2K or less, but TARGET_PAGE_SIZE is only
+ * 1K as an artefact of legacy v5 subpage support being present in the
+ * same QEMU executable. So in practice the hostaddr[] array has
+ * two entries, given the current setting of TARGET_PAGE_BITS_MIN.
+ */
+int maxidx = DIV_ROUND_UP(blocklen, TARGET_PAGE_SIZE);
+void *hostaddr[DIV_ROUND_UP(2 * KiB, 1 << TARGET_PAGE_BITS_MIN)];
+int try, i;
+unsigned mmu_idx = cpu_mmu_index(env, false);
+TCGMemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
+
+assert(maxidx <= ARRAY_SIZE(hostaddr));
+
+for (try = 0; try < 2; try++) {
+
+for (i = 0; i < maxidx; i++) {
+hostaddr[i] = tlb_vaddr_to_host(env,
+vaddr + TARGET_PAGE_SIZE * i,
+1, mmu_idx);
+if (!hostaddr[i]) {
+break;
+}
+}
+if (i == maxidx) {
+/*
+ * If it's all in the TLB it's fair game for just writing to;
+ * we know we don't need to update dirty status, etc.
+ */
+for (i = 0; i < maxidx - 1; i++) {
+memset(hostaddr[i], 0, TARGET_PAGE_SIZE);
+}
+memset(hostaddr[i], 0, blocklen - (i * TARGET_PAGE_SIZE));
+return;
+}
+/*
+ * OK, try a store and see if we can populate the tlb. This
+ * might cause an exception if the memory isn't writable,
+ * in which case we will longjmp out of here. We must for
+ * this purpose use the actual register value passed to us
+ * so that we get the fault address right.
+ */
+helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETPC());
+/* Now we can populate the other TLB entries, if any */
+for (i = 0; i < maxidx; i++) {
+uint64_t va = vaddr + TARGET_PAGE_SIZE * i;
+if (va != (vaddr_in & TARGET_PAGE_MASK)) {
+helper_ret_stb_mmu(env, va, 0, oi, GETPC());
+}
+}
+}
+
+/*
+ * Slow path (probably attempt to 

[PATCH 8/8] target/arm: Disable clean_data_tbi for system mode

2020-02-24 Thread Richard Henderson
We must include the tag in the FAR_ELx register when raising
an addressing exception.  Which means that we should not clear
out the tag during translation.

We cannot at present comply with this for user mode, so we
retain the clean_data_tbi function for the moment, though it
no longer does what it says on the tin for system mode.  This
function is to be replaced with MTE, so don't worry about the
slight misnaming.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24c1fbd262..3c9c43926c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -228,7 +228,18 @@ static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
 static TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr)
 {
 TCGv_i64 clean = new_tmp_a64(s);
+/*
+ * In order to get the correct value in the FAR_ELx register,
+ * we must present the memory subsystem with the "dirty" address
+ * including the TBI.  In system mode we can make this work via
+ * the TLB, dropping the TBI during translation.  But for user-only
+ * mode we don't have that option, and must remove the top byte now.
+ */
+#ifdef CONFIG_USER_ONLY
 gen_top_byte_ignore(s, clean, addr, s->tbid);
+#else
+tcg_gen_mov_i64(clean, addr);
+#endif
 return clean;
 }
 
-- 
2.20.1




[PATCH 5/8] target/arm: Use DEF_HELPER_FLAGS for helper_dc_zva

2020-02-24 Thread Richard Henderson
The function does not write registers, and only reads them by
implication via the exception path.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index b1a5935f61..3df7c185aa 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -90,7 +90,7 @@ DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr)
 DEF_HELPER_2(sqrt_f16, f16, f16, ptr)
 
 DEF_HELPER_2(exception_return, void, env, i64)
-DEF_HELPER_2(dc_zva, void, env, i64)
+DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
 
 DEF_HELPER_FLAGS_3(pacia, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacib, TCG_CALL_NO_WG, i64, env, i64, i64)
-- 
2.20.1




[PATCH 6/8] target/arm: Clean address for DC ZVA

2020-02-24 Thread Richard Henderson
This data access was forgotten when we added support for cleaning
addresses of TBI information.

Fixes: 3a471103ac1823ba
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 596bf4cf73..24c1fbd262 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1784,7 +1784,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
 return;
 case ARM_CP_DC_ZVA:
 /* Writes clear the aligned block of memory which rt points into. */
-tcg_rt = cpu_reg(s, rt);
+tcg_rt = clean_data_tbi(s, cpu_reg(s, rt));
 gen_helper_dc_zva(cpu_env, tcg_rt);
 return;
 default:
-- 
2.20.1




[PATCH 2/8] target/arm: Optimize cpu_mmu_index

2020-02-24 Thread Richard Henderson
We now cache the core mmu_idx in env->hflags.  Rather than recompute
from scratch, extract the field.  All of the uses of cpu_mmu_index
within target/arm are within helpers where env->hflags is stable.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h| 23 +--
 target/arm/helper.c |  5 -
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 65171cb30e..0e53cc255e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2939,16 +2939,6 @@ typedef enum ARMMMUIdxBit {
 
 #define MMU_USER_IDX 0
 
-/**
- * cpu_mmu_index:
- * @env: The cpu environment
- * @ifetch: True for code access, false for data access.
- *
- * Return the core mmu index for the current translation regime.
- * This function is used by generic TCG code paths.
- */
-int cpu_mmu_index(CPUARMState *env, bool ifetch);
-
 /* Indexes used when registering address spaces with cpu_address_space_init */
 typedef enum ARMASIdx {
 ARMASIdx_NS = 0,
@@ -3228,6 +3218,19 @@ FIELD(TBFLAG_A64, BTYPE, 10, 2) /* Not cached. */
 FIELD(TBFLAG_A64, TBID, 12, 2)
 FIELD(TBFLAG_A64, UNPRIV, 14, 1)
 
+/**
+ * cpu_mmu_index:
+ * @env: The cpu environment
+ * @ifetch: True for code access, false for data access.
+ *
+ * Return the core mmu index for the current translation regime.
+ * This function is used by generic TCG code paths.
+ */
+static inline int cpu_mmu_index(CPUARMState *env, bool ifetch)
+{
+return FIELD_EX32(env->hflags, TBFLAG_ANY, MMUIDX);
+}
+
 static inline bool bswap_code(bool sctlr_b)
 {
 #ifdef CONFIG_USER_ONLY
diff --git a/target/arm/helper.c b/target/arm/helper.c
index c1dae83700..7cf6642210 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12109,11 +12109,6 @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
 return arm_mmu_idx_el(env, arm_current_el(env));
 }
 
-int cpu_mmu_index(CPUARMState *env, bool ifetch)
-{
-return arm_to_core_mmu_idx(arm_mmu_idx(env));
-}
-
 #ifndef CONFIG_USER_ONLY
 ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
 {
-- 
2.20.1




[PATCH 3/8] target/arm: Apply TBI to ESR_ELx in helper_exception_return

2020-02-24 Thread Richard Henderson
We missed this case within AArch64.ExceptionReturn.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 509ae93069..95e9e879ca 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -1031,6 +1031,8 @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
   "AArch32 EL%d PC 0x%" PRIx32 "\n",
   cur_el, new_el, env->regs[15]);
 } else {
+int tbii;
+
 env->aarch64 = 1;
 spsr &= aarch64_pstate_valid_mask(&env_archcpu(env)->isar);
 pstate_write(env, spsr);
@@ -1038,8 +1040,27 @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
 env->pstate &= ~PSTATE_SS;
 }
 aarch64_restore_sp(env, new_el);
-env->pc = new_pc;
 helper_rebuild_hflags_a64(env, new_el);
+
+/*
+ * Apply TBI to the exception return address.  We had to delay this
+ * until after we selected the new EL, so that we could select the
+ * correct TBI+TBID bits.  This is made easier by waiting until after
+ * the hflags rebuild, since we can pull the composite TBII field
+ * from there.
+ */
+tbii = FIELD_EX32(env->hflags, TBFLAG_A64, TBII);
+if ((tbii >> extract64(new_pc, 55, 1)) & 1) {
+/* TBI is enabled. */
+int core_mmu_idx = cpu_mmu_index(env, false);
+if (regime_has_2_ranges(core_mmu_idx | ARM_MMU_IDX_A)) {
+new_pc = sextract64(new_pc, 0, 56);
+} else {
+new_pc = extract64(new_pc, 0, 56);
+}
+}
+env->pc = new_pc;
+
 qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
   "AArch64 EL%d PC 0x%" PRIx64 "\n",
   cur_el, new_el, env->pc);
-- 
2.20.1




[PATCH 1/8] target/arm: Replicate TBI/TBID bits for single range regimes

2020-02-24 Thread Richard Henderson
Replicate the single TBI bit from TCR_EL2 and TCR_EL3 so that
we can unconditionally use pointer bit 55 to index into our
composite TBI1:TBI0 field.

Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 79db169e04..c1dae83700 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10297,7 +10297,8 @@ static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 } else if (mmu_idx == ARMMMUIdx_Stage2) {
 return 0; /* VTCR_EL2 */
 } else {
-return extract32(tcr, 20, 1);
+/* Replicate the single TBI bit so we always have 2 bits.  */
+return extract32(tcr, 20, 1) * 3;
 }
 }
 
@@ -10308,7 +10309,8 @@ static int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
 } else if (mmu_idx == ARMMMUIdx_Stage2) {
 return 0; /* VTCR_EL2 */
 } else {
-return extract32(tcr, 29, 1);
+/* Replicate the single TBID bit so we always have 2 bits.  */
+return extract32(tcr, 29, 1) * 3;
 }
 }
 
-- 
2.20.1




[PATCH 0/8] target/arm: Misc cleanups surrounding TBI

2020-02-24 Thread Richard Henderson
We have a bug at present wherein we do not supply the memory tag to
the memory system, so that on fault FAR_ELx does not contain the
correct value.

For system mode, we already handle ignoring TBI in get_phys_addr_lpae,
as long as we don't actually drop the tag during translation.
For user mode, we don't have that option, so for now we must simply
accept that we'll get the wrong value in the siginfo_t.

In the process of looking at all that I found:

  * Exception return was not applying TBI in copying ELR_ELx to PC,
- Extracting the current mmu_idx can be improved,
- Replicating the TBI bits can allow bit 55 to be used
  unconditionally, eliminating a test.

  * DC_ZVA was not handling TBI (now only for user-mode)
- The helper need not have been in op_helper.c,
- The helper could have better tcg markup.

  * TBI still applies when translation is disabled, and we weren't
raising AddressSize faults for bad physical addresses.

  * SVE hasn't been updated to handle TBI.  I have done nothing about
this for now.  For the moment, system mode will work properly, while
user-only will only work without tags.  I'll have to touch the same
places to add MTE support, so it'll get done shortly.


r~


Richard Henderson (8):
  target/arm: Replicate TBI/TBID bits for single range regimes
  target/arm: Optimize cpu_mmu_index
  target/arm: Apply TBI to ESR_ELx in helper_exception_return
  target/arm: Move helper_dc_zva to helper-a64.c
  target/arm: Use DEF_HELPER_FLAGS for helper_dc_zva
  target/arm: Clean address for DC ZVA
  target/arm: Check addresses for disabled regimes
  target/arm: Disable clean_data_tbi for system mode

 target/arm/cpu.h   |  23 
 target/arm/helper-a64.h|   1 +
 target/arm/helper.h|   1 -
 target/arm/helper-a64.c| 114 -
 target/arm/helper.c|  44 +++---
 target/arm/op_helper.c |  93 --
 target/arm/translate-a64.c |  13 -
 7 files changed, 175 insertions(+), 114 deletions(-)

-- 
2.20.1




Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

2020-02-24 Thread Programmingkid


> On Feb 19, 2020, at 10:35 AM, BALATON Zoltan  wrote:
> 
> Hello,
> 
> On Tue, 18 Feb 2020, Programmingkid wrote:
>>> On Feb 18, 2020, at 12:10 PM, BALATON Zoltan  wrote:
>>> While other targets take advantage of using host FPU to do floating
>>> point computations, this was disabled for PPC target because always
>>> clearing exception flags before every FP op made it slightly slower
>>> than emulating everything with softfloat. To emulate some FPSCR bits,
>>> clearing of fp_status may be necessary (unless these could be handled
>>> e.g. using FP exceptions on host but there's no API for that in QEMU
>>> yet) but preserving at least the inexact flag makes hardfloat usable
>>> and faster than softfloat. Since most clients don't actually care
>>> about this flag, we can gain some speed trading some emulation
>>> accuracy.
>>> 
>>> This patch implements a simple way to keep the inexact flag set for
>>> hardfloat while still allowing to revert to softfloat for workloads
>>> that need more accurate albeit slower emulation. (Set hardfloat
>>> property of CPU, i.e. -cpu name,hardfloat=false for that.) There may
>>> still be room for further improvement but this seems to increase
>>> floating point performance. Unfortunately the softfloat case is slower
>>> than before this patch so this patch only makes sense if the default
>>> is also set to enable hardfloat.
>>> 
>>> Because of the above this patch at the moment is mainly for testing
>>> different workloads to evaluate how viable would this be in practice.
>>> Thus, RFC and not ready for merge yet.
>>> 
>>> Signed-off-by: BALATON Zoltan 
>>> ---
>>> v2: use different approach to avoid needing if () in
>>> helper_reset_fpstatus() but this does not seem to change overhead
>>> much, also make it a single patch as adding the hardfloat option is
>>> only a few lines; with this we can use same value at other places where
>>> float_status is reset and maybe enable hardfloat for a few more places
>>> for a little more performance but not too much. With this I got:
>> 
>> 
>> 
>> Thank you for working on this. It is about time we have a better FPU.
> 
> Thank you for testing it. I think it would be great if we could come up with 
> some viable approach to improve this before the next freeze.
> 
>> I applied your patch over David Gibson's ppc-for-5.0 branch. It applied 
>> cleanly and compiled easily.
> 
> I've heard some preliminary results from others that there's also a 
> difference between v1 and v2 of the patch in performance where v1 may be 
> faster for some cases, so if you (or someone else) want and have time you 
> could experiment with different versions and combinations as well to find the 
> one that's best on all CPUs. Basically we have these parts:
> 
> 1. Change target/ppc/fpu_helper.c::helper_reset_fpstatus() to force 
> float_flag_inexact on in case hardfloat is enabled. I've tried two approaches 
> for this:
> 
> a. In v1 added an if () in the function
> b. In v2, used a variable from env set earlier (I hoped this might be faster 
> but maybe it's not; testing and explanation are welcome)
> 
> 2. Also change places where env->fp_status is copied to a local tstat and 
> then reset (I think this is done to accumulate flags from multiple FP ops 
> that would individually reset env->fp_status or some other reason, maybe this 
> could be avoided if we reset fp_status less often but that would need more 
> understanding of the FP emulation that I don't have so I did not try to clean 
> that up yet).
> 
> If v2 is really slower than v1, then I'm not sure whether it is because of 
> also changing the places with tstat or because of the different approach in 
> helper_reset_fpstatus() so you could try combinations of these as well.
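The two helper_reset_fpstatus() approaches described above can be sketched roughly as follows. This is a toy simplification with made-up names and values, not the actual QEMU softfloat code:

```c
#include <assert.h>

/* Hypothetical stand-ins: FLOAT_FLAG_INEXACT and the struct fields are
 * invented for illustration and do not match QEMU's definitions. */
#define FLOAT_FLAG_INEXACT 0x1

typedef struct {
    int fp_status_flags;   /* stand-in for env->fp_status exception flags */
    int hardfloat;         /* the proposed CPU property */
    int fp_default_flags;  /* v2: value precomputed when the property is set */
} ToyCPUEnv;

/* v1: branch inside the helper on every reset */
static void reset_fpstatus_v1(ToyCPUEnv *env)
{
    env->fp_status_flags = env->hardfloat ? FLOAT_FLAG_INEXACT : 0;
}

/* v2: branch once at property-set time, then a plain store in the helper */
static void toy_set_hardfloat(ToyCPUEnv *env, int on)
{
    env->hardfloat = on;
    env->fp_default_flags = on ? FLOAT_FLAG_INEXACT : 0;
}

static void reset_fpstatus_v2(ToyCPUEnv *env)
{
    env->fp_status_flags = env->fp_default_flags;
}
```

Both variants keep the inexact flag set while hardfloat is enabled; whether the per-call branch (v1) or the extra env field (v2) is cheaper on a given host CPU is exactly what the benchmarks are meant to decide.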
> 
>> Tests were done on a Mac OS 10.4.3 VM. The CPU was set to G3.
> 
> What was the host CPU and OS this was tested on? Please always share CPU info 
> and host OS when sharing benchmark results so they are somewhat comparable. It 
> also depends on CPU features for vector instructions at least, so without CPU 
> info the results cannot be interpreted.

Intel Core i5-2500S CPU @ 2.70GHz.

> I think G3 does not have AltiVec/VMX so maybe testing with G4 would be better 
> to also test those ops unless there's a reason to only test G3. I've tested 
> with G4 both FPU only and FPU+VMX code on Linux host with i7-9700K CPU @ 
> 3.60GHz as was noted in the original cover letter, but maybe I've also 
> forgotten some details, so I list it here again.

Ok, I did test on the G4, here are my results:

Git commit: c1e667d2598b9b3ce62b8e89ed22dd38dfe9f57f
Mac OS 10.4.3 VM
-cpu G4
-USB audio device

Hardfloat=false
Audio sounds bad when playing midi file.
Extraction rate: 1.5x
Converting rate: 0.7x
Total time: 7:24

Hardfloat=true
Midi audio sounded perfect for about 30 seconds, then it went silent!
Extraction rate: 1.4x (slower with hard float)
Converting rate: 0.7x (same as without hardfloat)
Total time: 7:16 (faster time with hardfloat)

> 
>> I did several 

[PATCH] hw/net/imx_fec: write TGSR and TCSR3 in imx_enet_write()

2020-02-24 Thread Chen Qun
The current code causes the clang static code analyzer to generate warnings:
hw/net/imx_fec.c:858:9: warning: Value stored to 'value' is never read
value = value & 0x000f;
^   ~~
hw/net/imx_fec.c:864:9: warning: Value stored to 'value' is never read
value = value & 0x00fd;
^   ~~

According to the function's definition, the two “value” assignments
should be written to the registers.

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
I'm not sure whether this modification is correct; judging only from the
function's definition, it appears to be.
---
 hw/net/imx_fec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
index 6a124a154a..92f6215712 100644
--- a/hw/net/imx_fec.c
+++ b/hw/net/imx_fec.c
@@ -855,13 +855,13 @@ static void imx_enet_write(IMXFECState *s, uint32_t 
index, uint32_t value)
 break;
 case ENET_TGSR:
 /* implement clear timer flag */
-value = value & 0x000f;
+s->regs[index] = value & 0x000f;
 break;
 case ENET_TCSR0:
 case ENET_TCSR1:
 case ENET_TCSR2:
 case ENET_TCSR3:
-value = value & 0x00fd;
+s->regs[index] = value & 0x00fd;
 break;
 case ENET_TCCR0:
 case ENET_TCCR1:
-- 
2.23.0
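The imx_enet_write() fix above can be reduced to a standalone sketch. Register indices and masks here are illustrative only, not the real i.MX ENET layout: the point is that the masked value must be stored into the device's register file, because masking only the local 'value' parameter silently drops the write.

```c
#include <assert.h>
#include <stdint.h>

/* Toy device model (hypothetical names, not QEMU code). */
enum { TOY_TGSR = 0, TOY_OTHER = 1, TOY_NREGS = 4 };

typedef struct { uint32_t regs[TOY_NREGS]; } ToyENetState;

static void toy_enet_write(ToyENetState *s, uint32_t index, uint32_t value)
{
    switch (index) {
    case TOY_TGSR:
        /* The buggy form was 'value = value & 0x000f;' with no store,
         * leaving regs[index] untouched. */
        s->regs[index] = value & 0x000f;
        break;
    default:
        s->regs[index] = value;
        break;
    }
}
```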





Re: [PATCH V2 4/8] COLO: Optimize memory back-up process

2020-02-24 Thread Daniel Cho
Hi Hailiang,

With version 2, the code in migration/ram.c

+if (migration_incoming_colo_enabled()) {
+if (migration_incoming_in_colo_state()) {
+/* In COLO stage, put all pages into cache temporarily */
+host = colo_cache_from_block_offset(block, addr);
+} else {
+   /*
+* In migration stage but before COLO stage,
+* Put all pages into both cache and SVM's memory.
+*/
+host_bak = colo_cache_from_block_offset(block, addr);
+}
 }
 if (!host) {
 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
 ret = -EINVAL;
 break;
 }

host = colo_cache_from_block_offset(block, addr);
host_bak = colo_cache_from_block_offset(block, addr);
Will the "if (!host)" check hit the break when the code takes the
"host_bak = colo_cache_from_block_offset(block, addr);" branch?

Best regards,
Daniel Cho
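The control flow being asked about can be reduced to a toy sketch. This is not the real ram.c code; whether the full patch assigns host elsewhere before the check is exactly the open question above, so the sketch only demonstrates what happens if nothing else sets host:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of the quoted hunk (names invented). */
typedef struct { const char *name; } ToyBlock;

static uint8_t cache_page;   /* stand-in for the colo cache page */

static uint8_t *toy_cache_from_block_offset(ToyBlock *b, long addr)
{
    (void)b; (void)addr;
    return &cache_page;
}

/* Returns 0 on success, -1 when the !host check would break the loop. */
static int toy_load_page(int in_colo_state, ToyBlock *b, long addr)
{
    uint8_t *host = NULL, *host_bak = NULL;

    if (in_colo_state) {
        host = toy_cache_from_block_offset(b, addr);
    } else {
        /* only host_bak is assigned in this branch */
        host_bak = toy_cache_from_block_offset(b, addr);
    }
    (void)host_bak;
    if (!host) {
        return -1;   /* the "Illegal RAM offset" error path */
    }
    return 0;
}
```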

zhanghailiang  wrote on Monday, February 24, 2020 at 2:55 PM:
>
> This patch reduces the downtime of the VM for the initial process.
> Previously, we copied all this memory in the preparing stage of COLO,
> during which we need to stop the VM, which is a time-consuming process.
> Here we optimize it with a trick: back up every page during the
> migration process while COLO is enabled. Though this affects the speed
> of the migration, it clearly reduces the downtime of backing up all of
> the SVM's memory in the COLO preparing stage.
>
> Signed-off-by: zhanghailiang 
> ---
>  migration/colo.c |  3 +++
>  migration/ram.c  | 68 +++-
>  migration/ram.h  |  1 +
>  3 files changed, 54 insertions(+), 18 deletions(-)
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 93c5a452fb..44942c4e23 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -26,6 +26,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/rcu.h"
>  #include "migration/failover.h"
> +#include "migration/ram.h"
>  #ifdef CONFIG_REPLICATION
>  #include "replication.h"
>  #endif
> @@ -845,6 +846,8 @@ void *colo_process_incoming_thread(void *opaque)
>   */
>  qemu_file_set_blocking(mis->from_src_file, true);
>
> +colo_incoming_start_dirty_log();
> +
>  bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE);
>  fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
>  object_unref(OBJECT(bioc));
> diff --git a/migration/ram.c b/migration/ram.c
> index ed23ed1c7c..ebf9e6ba51 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2277,6 +2277,7 @@ static void ram_list_init_bitmaps(void)
>   * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
>   * guest memory.
>   */
> +
>  block->bmap = bitmap_new(pages);
>  bitmap_set(block->bmap, 0, pages);
>  block->clear_bmap_shift = shift;
> @@ -2986,7 +2987,6 @@ int colo_init_ram_cache(void)
>  }
>  return -errno;
>  }
> -memcpy(block->colo_cache, block->host, block->used_length);
>  }
>  }
>
> @@ -3000,19 +3000,36 @@ int colo_init_ram_cache(void)
>
>  RAMBLOCK_FOREACH_NOT_IGNORED(block) {
>  unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
> -
>  block->bmap = bitmap_new(pages);
> -bitmap_set(block->bmap, 0, pages);
>  }
>  }
> -ram_state = g_new0(RAMState, 1);
> -ram_state->migration_dirty_pages = 0;
> -qemu_mutex_init(_state->bitmap_mutex);
> -memory_global_dirty_log_start();
>
> +ram_state_init(_state);
>  return 0;
>  }
>
> +/* TODO: duplicated with ram_init_bitmaps */
> +void colo_incoming_start_dirty_log(void)
> +{
> +RAMBlock *block = NULL;
> +/* For memory_global_dirty_log_start below. */
> +qemu_mutex_lock_iothread();
> +qemu_mutex_lock_ramlist();
> +
> +memory_global_dirty_log_sync();
> +WITH_RCU_READ_LOCK_GUARD() {
> +RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> +ramblock_sync_dirty_bitmap(ram_state, block);
> +/* Discard this dirty bitmap record */
> +bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS);
> +}
> +memory_global_dirty_log_start();
> +}
> +ram_state->migration_dirty_pages = 0;
> +qemu_mutex_unlock_ramlist();
> +qemu_mutex_unlock_iothread();
> +}
> +
>  /* It is need to hold the global lock to call this helper */
>  void colo_release_ram_cache(void)
>  {
> @@ -3032,9 +3049,7 @@ void colo_release_ram_cache(void)
>  }
>  }
>  }
> -qemu_mutex_destroy(_state->bitmap_mutex);
> -g_free(ram_state);
> -ram_state = NULL;
> +ram_state_cleanup(_state);
>  }
>
>  /**
> @@ -3302,7 +3317,6 @@ static void colo_flush_ram_cache(void)
>  ramblock_sync_dirty_bitmap(ram_state, block);
>  }
>  }
> -
>  

Re: [PATCH] spapr: Handle pending hot plug/unplug requests at CAS

2020-02-24 Thread David Gibson
On Mon, Feb 24, 2020 at 08:23:43PM +0100, Greg Kurz wrote:
> If a hot plug or unplug request is pending at CAS, we currently trigger
> a CAS reboot, which severely increases the guest boot time. This is
> because SLOF doesn't handle hot plug events and we had no way to fix
> the FDT that gets presented to the guest.
> 
> We can do better thanks to recent changes in QEMU and SLOF:
> 
> - we now return a full FDT to SLOF during CAS
> 
> - SLOF was fixed to correctly detect any device that was either added or
>   removed since boot time and to update its internal DT accordingly.
> 
> The right solution is to process all pending hot plug/unplug requests
> during CAS: convert hot plugged devices to cold plugged devices and
> remove the hot unplugged ones, which is exactly what spapr_drc_reset()
> does. Also clear all hot plug events that are currently queued since
> they're no longer relevant.
> 
> Note that SLOF cannot currently populate hot plugged PCI bridges or PHBs
> at CAS. Until this limitation is lifted, SLOF will reset the machine when
> this scenario occurs: this will allow the FDT to be fully processed when
> SLOF is started again (ie. the same effect as the CAS reboot that would
> occur anyway without this patch).
> 
> Signed-off-by: Greg Kurz 

LGTM, applied to ppc-for-5.0.

> ---
>  hw/ppc/spapr_events.c  |   13 +
>  hw/ppc/spapr_hcall.c   |   11 +--
>  include/hw/ppc/spapr.h |1 +
>  3 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 8b32b7eea526..2afd1844e4d4 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -983,6 +983,19 @@ void spapr_clear_pending_events(SpaprMachineState *spapr)
>  }
>  }
>  
> +void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr)
> +{
> +SpaprEventLogEntry *entry = NULL, *next_entry;
> +
> +QTAILQ_FOREACH_SAFE(entry, >pending_events, next, next_entry) {
> +if (spapr_event_log_entry_type(entry) == RTAS_LOG_TYPE_HOTPLUG) {
> +QTAILQ_REMOVE(>pending_events, entry, next);
> +g_free(entry->extended_log);
> +g_free(entry);
> +}
> +}
> +}
> +
>  void spapr_events_init(SpaprMachineState *spapr)
>  {
>  int epow_irq = SPAPR_IRQ_EPOW;
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 6db3dbde9c92..5992849c1664 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1640,7 +1640,7 @@ static uint32_t cas_check_pvr(SpaprMachineState *spapr, 
> PowerPCCPU *cpu,
>  return best_compat;
>  }
>  
> -static bool spapr_transient_dev_before_cas(void)
> +static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
>  {
>  Object *drc_container;
>  ObjectProperty *prop;
> @@ -1658,10 +1658,11 @@ static bool spapr_transient_dev_before_cas(void)
>prop->name, NULL));
>  
>  if (spapr_drc_transient(drc)) {
> -return true;
> +spapr_drc_reset(drc);
>  }
>  }
> -return false;
> +
> +spapr_clear_pending_hotplug_events(spapr);
>  }
>  
>  static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
> @@ -1834,9 +1835,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  
>  spapr_irq_update_active_intc(spapr);
>  
> -if (spapr_transient_dev_before_cas()) {
> -spapr->cas_reboot = true;
> -}
> +spapr_handle_transient_dev_before_cas(spapr);
>  
>  if (!spapr->cas_reboot) {
>  void *fdt;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 09110961a589..a4216935a148 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -824,6 +824,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>  void spapr_reallocate_hpt(SpaprMachineState *spapr, int shift,
>Error **errp);
>  void spapr_clear_pending_events(SpaprMachineState *spapr);
> +void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>uint64_t pte0, uint64_t pte1);
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v4 0/3] pci_expander_bridge:acpi:Support pxb-pcie for ARM

2020-02-24 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200225015026.940-1-miaoy...@huawei.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Using expected file 'tests/data/acpi/virt/DSDT.memhp'
qemu-system-aarch64: -device pxb-pcie,bus_nr=128: 'pxb-pcie' is not a valid 
device model name
Broken pipe
ERROR - too few tests run (expected 4, got 3)
/tmp/qemu-test/src/tests/qtest/libqtest.c:166: kill_qemu() tried to terminate 
QEMU process but encountered exit status 1 (expected 0)
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs
  TEST    check-qtest-x86_64: tests/qtest/vmgenid-test
Could not access KVM kernel module: No such file or directory
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=8b83d9f25d6b41c892c069c8b4dc05ad', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-yz4axvvx/src/docker-src.2020-02-24-21.09.30.8626:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=8b83d9f25d6b41c892c069c8b4dc05ad
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-yz4axvvx/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    14m1.499s
user    0m8.878s


The full log is available at
http://patchew.org/logs/20200225015026.940-1-miaoy...@huawei.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH 11/13] timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
hw/timer/exynos4210_mct.c:1370:9: warning: Value stored to 'index' is never read
index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
^   ~
hw/timer/exynos4210_mct.c:1399:9: warning: Value stored to 'index' is never read
index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
^   ~
hw/timer/exynos4210_mct.c:1441:9: warning: Value stored to 'index' is never read
index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
^   ~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Igor Mitsyanko 
Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
---
 hw/timer/exynos4210_mct.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c
index 944120aea5..570cf7075b 100644
--- a/hw/timer/exynos4210_mct.c
+++ b/hw/timer/exynos4210_mct.c
@@ -1367,7 +1367,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 
 case L0_TCNTB: case L1_TCNTB:
 lt_i = GET_L_TIMER_IDX(offset);
-index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
 
 /*
  * TCNTB is updated to internal register only after CNT expired.
@@ -1396,7 +1395,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 
 case L0_ICNTB: case L1_ICNTB:
 lt_i = GET_L_TIMER_IDX(offset);
-index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
 
 s->l_timer[lt_i].reg.wstat |= L_WSTAT_ICNTB_WRITE;
 s->l_timer[lt_i].reg.cnt[L_REG_CNT_ICNTB] = value &
@@ -1438,8 +1436,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 
 case L0_FRCNTB: case L1_FRCNTB:
 lt_i = GET_L_TIMER_IDX(offset);
-index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
-
 DPRINTF("local timer[%d] FRCNTB write %llx\n", lt_i, value);
 
 s->l_timer[lt_i].reg.wstat |= L_WSTAT_FRCCNTB_WRITE;
-- 
2.23.0





[PATCH 02/13] block/iscsi:Remove redundant statement in iscsi_open()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
  block/iscsi.c:1920:9: warning: Value stored to 'flags' is never read
flags &= ~BDRV_O_RDWR;
^

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Ronnie Sahlberg 
Cc: Paolo Bonzini 
Cc: Peter Lieven 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-bl...@nongnu.org
---
 block/iscsi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 682abd8e09..ed88479ede 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1917,7 +1917,6 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 if (ret < 0) {
 goto out;
 }
-flags &= ~BDRV_O_RDWR;
 }
 
 iscsi_readcapacity_sync(iscsilun, _err);
-- 
2.23.0





[PATCH 05/13] scsi/scsi-disk: Remove redundant statement in scsi_disk_emulate_command()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
scsi/scsi-disk.c:1918:5: warning: Value stored to 'buflen' is never read
buflen = req->cmd.xfer;
^~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Paolo Bonzini 
Cc: Fam Zheng 
---
 hw/scsi/scsi-disk.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 10d0794d60..1c0cb63a6f 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1915,7 +1915,6 @@ static int32_t scsi_disk_emulate_command(SCSIRequest 
*req, uint8_t *buf)
 r->iov.iov_base = blk_blockalign(s->qdev.conf.blk, r->buflen);
 }
 
-buflen = req->cmd.xfer;
 outbuf = r->iov.iov_base;
 memset(outbuf, 0, r->buflen);
 switch (req->cmd.buf[0]) {
-- 
2.23.0





[PATCH 01/13] block/stream: Remove redundant statement in stream_run()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
  block/stream.c:186:9: warning: Value stored to 'ret' is never read
ret = 0;
^ ~
Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: John Snow 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-bl...@nongnu.org
---
 block/stream.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/stream.c b/block/stream.c
index 5562ccbf57..d78074ac80 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -183,7 +183,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 break;
 }
 }
-ret = 0;
 
 /* Publish progress */
 job_progress_update(>common.job, n);
-- 
2.23.0





[PATCH 09/13] dma/xlnx-zdma: Remove redundant statement in zdma_write_dst()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
hw/dma/xlnx-zdma.c:399:13: warning: Value stored to 'dst_type' is never read
dst_type = FIELD_EX32(s->dsc_dst.words[3], ZDMA_CH_DST_DSCR_WORD3,
^  ~~~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Alistair Francis 
Cc: "Edgar E. Iglesias" 
Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
---
 hw/dma/xlnx-zdma.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/hw/dma/xlnx-zdma.c b/hw/dma/xlnx-zdma.c
index 8fb83f5b07..45355c5d59 100644
--- a/hw/dma/xlnx-zdma.c
+++ b/hw/dma/xlnx-zdma.c
@@ -396,8 +396,6 @@ static void zdma_write_dst(XlnxZDMA *s, uint8_t *buf, 
uint32_t len)
 zdma_load_descriptor(s, next, >dsc_dst);
 dst_size = FIELD_EX32(s->dsc_dst.words[2], ZDMA_CH_DST_DSCR_WORD2,
   SIZE);
-dst_type = FIELD_EX32(s->dsc_dst.words[3], ZDMA_CH_DST_DSCR_WORD3,
-  TYPE);
 }
 
 /* Match what hardware does by ignoring the dst_size and only using
-- 
2.23.0





[PATCH 08/13] display/blizzard: Remove redundant statement in blizzard_draw_line16_32()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
  hw/display/blizzard.c:940:9: warning: Value stored to 'data' is never read
data >>= 5;
^~
Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Andrzej Zaborowski 
Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
---
 hw/display/blizzard.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/display/blizzard.c b/hw/display/blizzard.c
index 359e399c2a..62517bdf75 100644
--- a/hw/display/blizzard.c
+++ b/hw/display/blizzard.c
@@ -937,7 +937,6 @@ static void blizzard_draw_line16_32(uint32_t *dest,
 g = (data & 0x3f) << 2;
 data >>= 6;
 r = (data & 0x1f) << 3;
-data >>= 5;
 *dest++ = rgb_to_pixel32(r, g, b);
 }
 }
-- 
2.23.0





[PATCH 13/13] monitor/hmp-cmds: Remove redundant statement in hmp_rocker_of_dpa_groups()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
monitor/hmp-cmds.c:2867:17: warning: Value stored to 'set' is never read
set = true;
^ 

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: "Dr. David Alan Gilbert" 
---
 monitor/hmp-cmds.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 53bc3f76c4..84f94647cd 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -2864,7 +2864,6 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict 
*qdict)
 
 if (group->has_set_eth_dst) {
 if (!set) {
-set = true;
 monitor_printf(mon, " set");
 }
 monitor_printf(mon, " dst %s", group->set_eth_dst);
-- 
2.23.0





[PATCH 06/13] display/pxa2xx_lcd: Remove redundant statement in pxa2xx_palette_parse()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
hw/display/pxa2xx_lcd.c:596:9: warning: Value stored to 'format' is never read
format = 0;
^~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Andrzej Zaborowski 
Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
---
 hw/display/pxa2xx_lcd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/display/pxa2xx_lcd.c b/hw/display/pxa2xx_lcd.c
index 05f5f84671..464e93161a 100644
--- a/hw/display/pxa2xx_lcd.c
+++ b/hw/display/pxa2xx_lcd.c
@@ -593,7 +593,6 @@ static void pxa2xx_palette_parse(PXA2xxLCDState *s, int ch, 
int bpp)
 n = 256;
 break;
 default:
-format = 0;
 return;
 }
 
-- 
2.23.0





[PATCH 12/13] usb/hcd-ehci: Remove redundant statements

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

The "again" assignment is meaningless before g_assert_not_reached().
In addition, the break statements are no longer needed after
g_assert_not_reached().

Clang static code analyzer shows a warning:
hw/usb/hcd-ehci.c:2108:13: warning: Value stored to 'again' is never read
again = -1;
^   ~~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Gerd Hoffmann 
---
 hw/usb/hcd-ehci.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 56ab2f457f..29d49c2d7e 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1301,7 +1301,6 @@ static void ehci_execute_complete(EHCIQueue *q)
 /* should not be triggerable */
 fprintf(stderr, "USB invalid response %d\n", p->packet.status);
 g_assert_not_reached();
-break;
 }
 
 /* TODO check 4.12 for splits */
@@ -2105,9 +2104,7 @@ static void ehci_advance_state(EHCIState *ehci, int async)
 
 default:
 fprintf(stderr, "Bad state!\n");
-again = -1;
 g_assert_not_reached();
-break;
 }
 
 if (again < 0 || itd_count > 16) {
-- 
2.23.0
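The reasoning in the commit message above — that a store immediately before g_assert_not_reached() is dead, and a break after it unreachable — can be illustrated with a toy example. This is not QEMU code; abort() stands in for g_assert_not_reached(), since both are noreturn when assertions are enabled:

```c
#include <stdlib.h>

/* Toy illustration with invented names. */
static int toy_advance_state(int state)
{
    int again = 0;

    switch (state) {
    case 0:
        again = 1;
        break;
    default:
        /* 'again = -1;' here would be a dead store: abort() never
         * returns, so the value could never be read... */
        abort();
        /* ...and a 'break;' here would be unreachable. */
    }
    return again;
}
```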





[PATCH 07/13] display/exynos4210_fimd: Remove redundant statement in exynos4210_fimd_update()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
hw/display/exynos4210_fimd.c:1313:17: warning: Value stored to 'is_dirty' is 
never read
is_dirty = false;

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Igor Mitsyanko 
Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
---
 hw/display/exynos4210_fimd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index c1071ecd46..05d3265b76 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -1310,7 +1310,6 @@ static void exynos4210_fimd_update(void *opaque)
 }
 host_fb_addr += inc_size;
 fb_line_addr += inc_size;
-is_dirty = false;
 }
 g_free(snap);
 blend = true;
-- 
2.23.0





[PATCH 10/13] migration/vmstate: Remove redundant statement in vmstate_save_state_v()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

The "ret" variable is assigned in all branches, so it does not need to
be assigned separately here.

Clang static code analyzer shows a warning:
  migration/vmstate.c:365:17: warning: Value stored to 'ret' is never read
ret = 0;
^ ~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Juan Quintela 
Cc: "Dr. David Alan Gilbert" 
---
 migration/vmstate.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index 7dd8ef66c6..bafa890384 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -362,7 +362,6 @@ int vmstate_save_state_v(QEMUFile *f, const 
VMStateDescription *vmsd,
 }
 for (i = 0; i < n_elems; i++) {
 void *curr_elem = first_elem + size * i;
-ret = 0;
 
 vmsd_desc_field_start(vmsd, vmdesc_loop, field, i, n_elems);
 old_offset = qemu_ftell_fast(f);
-- 
2.23.0





[PATCH 03/13] block/file-posix: Remove redundant statement in raw_handle_perm_lock()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
  block/file-posix.c:891:9: warning: Value stored to 'op' is never read
op = RAW_PL_ABORT;
^

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: qemu-bl...@nongnu.org
---
 block/file-posix.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 6345477112..0f77447a25 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -888,7 +888,6 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
   "Is another process using the image [%s]?\n",
   bs->filename);
 }
-op = RAW_PL_ABORT;
 /* fall through to unlock bytes. */
 case RAW_PL_ABORT:
 raw_apply_lock_bytes(s, s->fd, s->perm, ~s->shared_perm,
-- 
2.23.0





[PATCH 04/13] scsi/esp-pci: Remove redundant statement in esp_pci_io_write()

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Clang static code analyzer shows a warning:
  hw/scsi/esp-pci.c:198:9: warning: Value stored to 'size' is never read
size = 4;
^  ~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Paolo Bonzini 
Cc: Fam Zheng 
---
 hw/scsi/esp-pci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/scsi/esp-pci.c b/hw/scsi/esp-pci.c
index d5a1f9e017..2e6cc07d4e 100644
--- a/hw/scsi/esp-pci.c
+++ b/hw/scsi/esp-pci.c
@@ -195,7 +195,6 @@ static void esp_pci_io_write(void *opaque, hwaddr addr,
 val <<= shift;
 val |= current & ~(mask << shift);
 addr &= ~3;
-size = 4;
 }
 
 if (addr < 0x40) {
-- 
2.23.0





[PATCH 00/13] redundant code: Fix warnings reported by Clang static code analyzer

2020-02-24 Thread kuhn.chenqun
From: Chen Qun 

Hi all, our EulerRobot integrates clang static code analyzer tools and
found a lot of warnings. They are mainly redundant variable assignments.

This series fixes the warnings.

Chen Qun (13):
  block/stream: Remove redundant statement in stream_run()
  block/iscsi:Remove redundant statement in iscsi_open()
  block/file-posix: Remove redundant statement in raw_handle_perm_lock()
  scsi/esp-pci: Remove redundant statement in esp_pci_io_write()
  scsi/scsi-disk: Remove redundant statement in
scsi_disk_emulate_command()
  display/pxa2xx_lcd: Remove redundant statement in
pxa2xx_palette_parse()
  display/exynos4210_fimd: Remove redundant statement in
exynos4210_fimd_update()
  display/blizzard: Remove redundant statement in
blizzard_draw_line16_32()
  dma/xlnx-zdma: Remove redundant statement in zdma_write_dst()
  migration/vmstate: Remove redundant statement in
vmstate_save_state_v()
  timer/exynos4210_mct: Remove redundant statement in
exynos4210_mct_write()
  usb/hcd-ehci: Remove redundant statements
  monitor/hmp-cmds: Remove redundant statement in
hmp_rocker_of_dpa_groups()

 block/file-posix.c   | 1 -
 block/iscsi.c| 1 -
 block/stream.c   | 1 -
 hw/display/blizzard.c| 1 -
 hw/display/exynos4210_fimd.c | 1 -
 hw/display/pxa2xx_lcd.c  | 1 -
 hw/dma/xlnx-zdma.c   | 2 --
 hw/scsi/esp-pci.c| 1 -
 hw/scsi/scsi-disk.c  | 1 -
 hw/timer/exynos4210_mct.c| 4 
 hw/usb/hcd-ehci.c| 3 ---
 migration/vmstate.c  | 1 -
 monitor/hmp-cmds.c   | 1 -
 13 files changed, 19 deletions(-)

-- 
2.23.0





RE: [RFC 2/2] pci-expander-bus: Add pcie-root-port to pxb-pcie under arm.

2020-02-24 Thread miaoyubo

> -Original Message-
> From: Daniel P. Berrangé [mailto:berra...@redhat.com]
> Sent: Monday, February 24, 2020 8:36 PM
> To: miaoyubo 
> Cc: peter.mayd...@linaro.org; m...@redhat.com; qemu-devel@nongnu.org;
> Xiexiangyou ; shannon.zha...@gmail.com;
> imamm...@redhat.com
> Subject: Re: [RFC 2/2] pci-expender-bus:Add pcie-root-port to pxb-pcie
> under arm.
> 
> On Sat, Feb 15, 2020 at 08:59:28AM +, miaoyubo wrote:
> >
> > > -Original Message-
> > > From: Daniel P. Berrangé [mailto:berra...@redhat.com]
> > > Sent: Friday, February 14, 2020 6:25 PM
> > > To: miaoyubo 
> > > Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> > > imamm...@redhat.com; qemu-devel@nongnu.org; Xiexiangyou
> > > ; m...@redhat.com
> > > Subject: Re: [RFC 2/2] pci-expender-bus:Add pcie-root-port to
> > > pxb-pcie under arm.
> > >
> > > On Fri, Feb 14, 2020 at 07:25:43AM +, miaoyubo wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Daniel P. Berrangé [mailto:berra...@redhat.com]
> > > > > Sent: Thursday, February 13, 2020 9:52 PM
> > > > > To: miaoyubo 
> > > > > Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> > > > > imamm...@redhat.com; qemu-devel@nongnu.org; Xiexiangyou
> > > > > ; m...@redhat.com
> > > > > Subject: Re: [RFC 2/2] pci-expender-bus:Add pcie-root-port to
> > > > > pxb-pcie under arm.
> > > > >
> > > > > On Thu, Feb 13, 2020 at 03:49:52PM +0800, Yubo Miao wrote:
> > > > > > From: miaoyubo 
> > > > > >
> > > > > > Since devices cannot be directly plugged into pxb-pcie, under
> > > > > > arm, one pcie-root-port is plugged into the pxb-pcie. Because
> > > > > > the bus count for each pxb-pcie is defined as 2 in the ACPI DSDT
> > > > > > tables (one for pxb-pcie, one for pcie-root-port), only one
> > > > > > device can be plugged into one pxb-pcie.
> > > > >
> > > > > What is the cause of this arm specific requirement for pxb-pcie
> > > > > and more importantly, can we fix it so that we don't need this patch?
> > > > > I think it is highly undesirable to have such a per-arch
> > > > > difference in configuration of the pxb-pcie device. It means any
> > > > > mgmt app which already supports pxb-pcie will be broken and need
> to special case arm.
> > > > >
> > > >
> > > > Thanks for your reply. Without this patch, the pxb-pcie is still
> > > > usable; however, one extra pcie-root-port or pci-bridge or
> > > > something else needs to be defined by the mgmt. app. This patch
> > > > could be abandoned.
> > >
> > > That's not really answering my question. IIUC, this pxb-pcie device
> > > works fine on x86_64, and I want to know why it doesn't work on arm ?
> > > Requiring different setups by the mgmt apps is not at all nice
> > > because it will inevitably lead to broken arm setups. x86_64 gets
> > > far more testing & usage, developers won't realize arm is different.
> > >
> > >
> >
> > Thanks for replying. Currently, on x86_64, pxb-pcie devices are
> > presented in the ACPI tables, but on arm they are not; only one main
> > host bridge is presented for arm in the ACPI DSDT table. That's why
> > pxb-pcie works on
> > x86_64 but doesn't work on arm. Patch 1/2 does the work to present
> > and allocate resources for pxb-pcie on arm.
> 
> Yes, this first patch makes sense
> 

Thanks for the comments. The patch has been updated to v4; please check it.

> > For x86_64, if a device is going to be plugged into pxb-pcie, one
> > extra pcie-root-port or pci-bridge has to be defined and plugged on
> > pxb-pcie; then the device is plugged on the pcie-root-port or pci-bridge.
> 
> > This patch 2/2 just auto-defines one pcie-root-port for arm. If this
> > patch is abandoned, the usage of pxb-pcie would be the same as x86_64;
> > therefore, to keep the same steps for x86 and arm, this patch 2/2 could
> > be abandoned.
> 
> Yes, I think abandoning this patch 2 is best. Applications that know how to
> use pxb-pcie on x86_64, will already do the right thing on arm too, once your
> first patch is merged.
> 

This patch has been abandoned since v3.

> Regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|


Regards,
Miao


[PATCH v4 3/3] ACPI/unit-test: Add a new test for pxb-pcie for arm

2020-02-24 Thread Yubo Miao
From: miaoyubo 

Currently, pxb-pcie could be defined by the cmdline like
--device pxb-pcie,id=pci.9,bus_nr=128
However pxb-pcie is not described in acpi tables for arm.

The former two patches support pxb-pcie for arm, especially the
specification of pxb-pcie in the DSDT table.

Add a testcase to make sure the ACPI table is correct for guest.

The following table need to be added for this test:
tests/data/acpi/virt/DSDT.pxb
Since the full ASL diff has 1000+ lines, only a simplified diff is
presented in the commit log. The changes are:
Device (PC80) is presented in the DSDT.
The resources allocated for Device (PCI0) are changed.

  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of /home/DSDT, Mon Feb 24 19:35:28 2020
+ * Disassembly of /home/DSDT.pxb, Mon Feb 24 19:33:38 2020
  *
  * Original Table Header:
  * Signature"DSDT"
- * Length   0x14BB (5307)
+ * Length   0x1F70 (8048)
  * Revision 0x02
- * Checksum 0xD1
+ * Checksum 0xCF
  * OEM ID   "BOCHS "
  * OEM Table ID "BXPCDSDT"
  * OEM Revision 0x0001 (1)
 })
 }

+Device (PC80)
+{
+Name (_HID, "PNP0A08" /* PCI Express Bus */)  // _HID: Hardware ID
+Name (_CID, "PNP0A03" /* PCI Bus */)  // _CID: Compatible ID
+Name (_ADR, Zero)  // _ADR: Address
+Name (_CCA, One)  // _CCA: Cache Coherency Attribute
+Name (_SEG, Zero)  // _SEG: PCI Segment
+Name (_BBN, 0x80)  // _BBN: BIOS Bus Number
+Name (_UID, 0x80)  // _UID: Unique ID
+Name (_STR, Unicode ("pxb Device"))  // _STR: Description String
+Name (_PRT, Package (0x80)  // _PRT: PCI Routing Table
+{
+Package (0x04)
+{
+0x,
+Zero,
+GSI0,
+Zero
+},

Packages are omitted.

+
+Package (0x04)
+{
+0x001F,
+0x03,
+GSI2,
+Zero
+}
+})
+Device (GSI0)
+{
+Name (_HID, "PNP0C0F" /* PCI Interrupt Link Device */)  // _HID: Hardware ID
+Name (_UID, Zero)  // _UID: Unique ID
+Name (_PRS, ResourceTemplate ()  // _PRS: Possible Resource Settings
+{
+Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
+{
+0x0023,
+}
+})
+Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
+{
+Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
+{
+0x0023,
+}
+})
+Method (_SRS, 1, NotSerialized)  // _SRS: Set Resource Settings
+{
+}
+}
+

GSI1,2,3 are omitted.

+Method (_CBA, 0, NotSerialized)  // _CBA: Configuration Base Address
+{
+Return (0x00401000)
+}
+
+Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
+{
+Name (RBUF, ResourceTemplate ()
+{
+WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
+0x, // Granularity
+0x0080, // Range Minimum
+0x0081, // Range Maximum
+0x, // Translation Offset
+0x0002, // Length
+,, )
+DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
+0x, // Granularity
+0x3E9F, // Range Minimum
+0x3EFE, // Range Maximum
+0x, // Translation Offset
+0x0060, // Length
+,, , AddressRangeMemory, TypeStatic)
+DWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
+0x, // Granularity
+0xC000, // Range Minimum
+0x, // Range Maximum
+0x3EFF, // Translation Offset
+0x4000, // Length
+,, , TypeStatic, DenseTranslation)
+QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
+0x, // 

[PATCH v4 1/3] acpi:Extract two APIs from acpi_dsdt_add_pci

2020-02-24 Thread Yubo Miao
From: miaoyubo 

Extract two APIs, acpi_dsdt_add_pci_route_table and
acpi_dsdt_add_pci_osc, from acpi_dsdt_add_pci. The first
API is used to specify the PCI routing table and the second
API is used to declare the operating system capabilities.
These two APIs will be used to specify the pxb-pcie in the DSDT.

Signed-off-by: miaoyubo 
---
 hw/arm/virt-acpi-build.c | 129 ++-
 1 file changed, 72 insertions(+), 57 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index fb4b166f82..37c34748a6 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -146,29 +146,11 @@ static void acpi_dsdt_add_virtio(Aml *scope,
 }
 }
 
-static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
-  uint32_t irq, bool use_highmem, bool highmem_ecam)
+static void acpi_dsdt_add_pci_route_table(Aml *dev, Aml *scope,
+  uint32_t irq)
 {
-int ecam_id = VIRT_ECAM_ID(highmem_ecam);
-Aml *method, *crs, *ifctx, *UUID, *ifctx1, *elsectx, *buf;
 int i, slot_no;
-hwaddr base_mmio = memmap[VIRT_PCIE_MMIO].base;
-hwaddr size_mmio = memmap[VIRT_PCIE_MMIO].size;
-hwaddr base_pio = memmap[VIRT_PCIE_PIO].base;
-hwaddr size_pio = memmap[VIRT_PCIE_PIO].size;
-hwaddr base_ecam = memmap[ecam_id].base;
-hwaddr size_ecam = memmap[ecam_id].size;
-int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
-
-Aml *dev = aml_device("%s", "PCI0");
-aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
-aml_append(dev, aml_name_decl("_SEG", aml_int(0)));
-aml_append(dev, aml_name_decl("_BBN", aml_int(0)));
-aml_append(dev, aml_name_decl("_UID", aml_string("PCI0")));
-aml_append(dev, aml_name_decl("_STR", aml_unicode("PCIe 0 Device")));
-aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
-
+Aml *method, *crs;
 /* Declare the PCI Routing Table. */
 Aml *rt_pkg = aml_varpackage(PCI_SLOT_MAX * PCI_NUM_PINS);
 for (slot_no = 0; slot_no < PCI_SLOT_MAX; slot_no++) {
@@ -204,41 +186,11 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
 aml_append(dev_gsi, method);
 aml_append(dev, dev_gsi);
 }
+}
 
-method = aml_method("_CBA", 0, AML_NOTSERIALIZED);
-aml_append(method, aml_return(aml_int(base_ecam)));
-aml_append(dev, method);
-
-method = aml_method("_CRS", 0, AML_NOTSERIALIZED);
-Aml *rbuf = aml_resource_template();
-aml_append(rbuf,
-aml_word_bus_number(AML_MIN_FIXED, AML_MAX_FIXED, AML_POS_DECODE,
-0x, 0x, nr_pcie_buses - 1, 0x,
-nr_pcie_buses));
-aml_append(rbuf,
-aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
- AML_NON_CACHEABLE, AML_READ_WRITE, 0x, base_mmio,
- base_mmio + size_mmio - 1, 0x, size_mmio));
-aml_append(rbuf,
-aml_dword_io(AML_MIN_FIXED, AML_MAX_FIXED, AML_POS_DECODE,
- AML_ENTIRE_RANGE, 0x, 0x, size_pio - 1, base_pio,
- size_pio));
-
-if (use_highmem) {
-hwaddr base_mmio_high = memmap[VIRT_HIGH_PCIE_MMIO].base;
-hwaddr size_mmio_high = memmap[VIRT_HIGH_PCIE_MMIO].size;
-
-aml_append(rbuf,
-aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
- AML_NON_CACHEABLE, AML_READ_WRITE, 0x,
- base_mmio_high,
- base_mmio_high + size_mmio_high - 1, 0x,
- size_mmio_high));
-}
-
-aml_append(method, aml_return(rbuf));
-aml_append(dev, method);
-
+static void acpi_dsdt_add_pci_osc(Aml *dev, Aml *scope)
+{
+Aml *method, *UUID, *ifctx, *ifctx1, *elsectx, *buf;
 /* Declare an _OSC (OS Control Handoff) method */
 aml_append(dev, aml_name_decl("SUPP", aml_int(0)));
 aml_append(dev, aml_name_decl("CTRL", aml_int(0)));
@@ -246,7 +198,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
 aml_append(method,
 aml_create_dword_field(aml_arg(3), aml_int(0), "CDW1"));
 
-/* PCI Firmware Specification 3.0
+/*
+ * PCI Firmware Specification 3.0
  * 4.5.1. _OSC Interface for PCI Host Bridge Devices
  * The _OSC interface for a PCI/PCI-X/PCI Express hierarchy is
  * identified by the Universal Unique IDentifier (UUID)
@@ -291,7 +244,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
 
 method = aml_method("_DSM", 4, AML_NOTSERIALIZED);
 
-/* PCI Firmware Specification 3.0
+/*
+ * PCI Firmware Specification 3.0
  * 4.6.1. _DSM for PCI Express Slot Information
  * The UUID in _DSM in this context is
  * {E5C937D0-3553-4D7A-9117-EA4D19C3434D}
@@ -309,6 +263,67 @@ static void 

[PATCH v4 2/3] acpi:pci-expender-bus: Add pxb support for arm

2020-02-24 Thread Yubo Miao
From: miaoyubo 

Currently pxb-pcie is not supported by the virt machine,
and only one main host bridge is described in the ACPI tables.
In this patch, pxb-pcie is supported by arm and certain
resources are allocated for each pxb-pcie in the ACPI tables.
The resources for the main host bridge are also reallocated.

Signed-off-by: miaoyubo 
---
 hw/arm/virt-acpi-build.c | 115 ---
 hw/arm/virt.c|   3 +
 include/hw/arm/virt.h|   7 +++
 3 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 37c34748a6..be1986c60d 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -49,6 +49,8 @@
 #include "kvm_arm.h"
 #include "migration/vmstate.h"
 
+#include "hw/arm/virt.h"
+#include "hw/pci/pci_bus.h"
 #define ARM_SPI_BASE 32
 
 static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
@@ -266,19 +268,116 @@ static void acpi_dsdt_add_pci_osc(Aml *dev, Aml *scope)
 }
 
 static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
-  uint32_t irq, bool use_highmem, bool highmem_ecam)
+  uint32_t irq, bool use_highmem, bool highmem_ecam,
+  VirtMachineState *vms)
 {
 int ecam_id = VIRT_ECAM_ID(highmem_ecam);
-Aml *method, *crs;
+Aml *method, *crs, *dev;
+int count = 0;
 hwaddr base_mmio = memmap[VIRT_PCIE_MMIO].base;
 hwaddr size_mmio = memmap[VIRT_PCIE_MMIO].size;
 hwaddr base_pio = memmap[VIRT_PCIE_PIO].base;
 hwaddr size_pio = memmap[VIRT_PCIE_PIO].size;
 hwaddr base_ecam = memmap[ecam_id].base;
 hwaddr size_ecam = memmap[ecam_id].size;
+/*
+ * 0x60 would be enough for pxb device
+ * if it is too small, there is not enough space
+ * for a pcie device plugged in a pcie-root port
+ */
+hwaddr size_addr = 0x60;
+hwaddr size_io = 0x4000;
 int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN;
+PCIBus *bus = VIRT_MACHINE(vms)->bus;
+
+if (bus) {
+QLIST_FOREACH(bus, >child, sibling) {
+uint8_t bus_num = pci_bus_num(bus);
+uint8_t numa_node = pci_bus_numa_node(bus);
+
+if (!pci_bus_is_root(bus)) {
+continue;
+}
+/*
+ * Coded up the MIN of the busNr defined for pxb-pcie,
+ * the MIN - 1 would be the MAX bus number for the main
+ * host bridge.
+ */
+if (bus_num < nr_pcie_buses) {
+nr_pcie_buses = bus_num;
+}
+count++;
+dev = aml_device("PC%.02X", bus_num);
+aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
+aml_append(dev, aml_name_decl("_CCA", aml_int(1)));
+aml_append(dev, aml_name_decl("_SEG", aml_int(0)));
+aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
+aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
+aml_append(dev, aml_name_decl("_STR", aml_unicode("pxb Device")));
+if (numa_node != NUMA_NODE_UNASSIGNED) {
+method = aml_method("_PXM", 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_int(numa_node)));
+aml_append(dev, method);
+}
+
+acpi_dsdt_add_pci_route_table(dev, scope, irq);
+
+method = aml_method("_CBA", 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_int(base_ecam)));
+aml_append(dev, method);
+
+method = aml_method("_CRS", 0, AML_NOTSERIALIZED);
+Aml *rbuf = aml_resource_template();
+aml_append(rbuf,
+   aml_word_bus_number(AML_MIN_FIXED, AML_MAX_FIXED,
+   AML_POS_DECODE, 0x,
+   bus_num, bus_num + 1, 0x,
+   2));
+aml_append(rbuf,
+   aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED,
+AML_MAX_FIXED, AML_NON_CACHEABLE,
+AML_READ_WRITE, 0x,
+base_mmio + size_mmio -
+size_addr * count,
+base_mmio + size_mmio - 1 -
+size_addr * (count - 1),
+0x, size_addr));
+aml_append(rbuf,
+   aml_dword_io(AML_MIN_FIXED, AML_MAX_FIXED,
+   AML_POS_DECODE, AML_ENTIRE_RANGE,
+   0x, size_pio - size_io * count,
+   size_pio - 1 - size_io * (count - 1),
+   base_pio, 

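The per-pxb resource carving in the truncated hunk above can be sketched numerically. The MMIO slice size below is an assumption (the archived patch shows the constant truncated to `0x60`); the 0x4000 PIO slice and the `base + size - slice * count` arithmetic follow the hunk:

```python
# Sketch of the per-pxb window carving in the hunk above. SIZE_ADDR is an
# assumed value (the archived patch truncated the constant); the arithmetic
# mirrors "base_mmio + size_mmio - size_addr * count" from the diff.

SIZE_ADDR = 0x60_0000  # assumed per-pxb MMIO slice
SIZE_IO = 0x4000       # per-pxb PIO slice, as in the patch

def pxb_mmio_window(base_mmio, size_mmio, count):
    """Inclusive [lo, hi] MMIO range for the count-th pxb (count >= 1)."""
    lo = base_mmio + size_mmio - SIZE_ADDR * count
    hi = base_mmio + size_mmio - 1 - SIZE_ADDR * (count - 1)
    return lo, hi

# Successive pxbs get adjacent slices carved down from the top of the window.
for c in (1, 2):
    lo, hi = pxb_mmio_window(0x1000_0000, 0x1000_0000, c)
    print(hex(lo), hex(hi))
```

Each pxb thus gets a fixed-size slice taken from the top of the main host bridge's window, which is why the main bridge's ranges shrink as pxbs are added.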
[PATCH v4 0/3] pci_expander_brdige:acpi:Support pxb-pcie for ARM

2020-02-24 Thread Yubo Miao
From: miaoyubo 

Currently pxb-pcie is not supported by arm. The reason is that
pxb-pcie is not described in the DSDT table and only one main host
bridge is described in the ACPI tables, which means it is not possible
to present different I/O NUMA nodes for different devices, especially
host-passthrough devices.

This series of patches makes arm support pxb-pcie.

Users can configure a pxb-pcie with a certain NUMA node. An example
command is:

   -device pxb-pcie,id=pci.7,bus_nr=128,numa_node=0,bus=pcie.0,addr=0x9

Since the bus of a pxb-pcie is a root bus, devices cannot be plugged
into a pxb-pcie directly; a pcie-root-port or pci-bridge should be
defined and plugged on the pxb-pcie, then the device can be plugged
onto the pcie-root-port or pci-bridge.

With these patches, an I/O NUMA node can be presented to the guest by
defining a pxb-pcie with that NUMA node and plugging the device on it.
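For illustration, a hedged command-line sketch of that topology; only the pxb-pcie line is from this cover letter, while the root-port id `rp0` and the virtio-net endpoint are illustrative choices:

```shell
# Sketch only: plugging an endpoint behind a pxb-pcie via a pcie-root-port.
# rp0 and virtio-net-pci are illustrative, not part of the series.
qemu-system-aarch64 \
    -machine virt -m 4G \
    -device pxb-pcie,id=pci.7,bus_nr=128,numa_node=0,bus=pcie.0,addr=0x9 \
    -device pcie-root-port,id=rp0,bus=pci.7 \
    -device virtio-net-pci,bus=rp0
```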

miaoyubo (3):
  acpi:Extract two APIs from acpi_dsdt_add_pci
  acpi:pci-expender-bus: Add pxb support for arm
  ACPI/unit-test: Add a new test for pxb-pcie for arm

 hw/arm/virt-acpi-build.c| 232 +++-
 hw/arm/virt.c   |   3 +
 include/hw/arm/virt.h   |   7 +
 tests/data/acpi/virt/DSDT.pxb   | Bin 0 -> 8048 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   1 +
 tests/qtest/bios-tables-test.c  |  54 -
 6 files changed, 233 insertions(+), 64 deletions(-)
 create mode 100644 tests/data/acpi/virt/DSDT.pxb

-- 
2.19.1





Re: [PATCH v7 3/9] qdev: add clock input support to devices.

2020-02-24 Thread Alistair Francis
On Mon, Feb 24, 2020 at 9:12 AM Damien Hedde  wrote:
>
> Add functions to easily handle clocks with devices.
> Clock inputs and outputs should be used to handle clock propagation
> between devices.
> The API is very similar the GPIO API.
>
> This is based on the original work of Frederic Konrad.
>
> Signed-off-by: Damien Hedde 
> ---
>
> I did not change the constness of the @name pointer field in
> the NamedClockList structure.
> There is no obstacle to doing it except the fact that we need to free
> the allocated data it points to. It is possible to make it const and
> hack/remove the const to call g_free but I don't know if that is
> allowed in qemu.
>
> v7:
> + update ClockIn/Out types
> + qdev_connect_clock_out function removed / qdev_connect_clock_in added
>   instead
> + qdev_pass_clock renamed to qdev_alias_clock
> + various small fixes (typos, comment, asserts) (Peter)
> + move device's instance_finalize code related to clock in qdev-clock.c
> ---
>  include/hw/qdev-clock.h | 105 +
>  include/hw/qdev-core.h  |  12 +++
>  hw/core/qdev-clock.c| 169 
>  hw/core/qdev.c  |  12 +++
>  hw/core/Makefile.objs   |   2 +-
>  tests/Makefile.include  |   1 +
>  6 files changed, 300 insertions(+), 1 deletion(-)
>  create mode 100644 include/hw/qdev-clock.h
>  create mode 100644 hw/core/qdev-clock.c
>
> diff --git a/include/hw/qdev-clock.h b/include/hw/qdev-clock.h
> new file mode 100644
> index 00..899a95ca6a
> --- /dev/null
> +++ b/include/hw/qdev-clock.h
> @@ -0,0 +1,105 @@
> +/*
> + * Device's clock input and output
> + *
> + * Copyright GreenSocs 2016-2020
> + *
> + * Authors:
> + *  Frederic Konrad
> + *  Damien Hedde
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QDEV_CLOCK_H
> +#define QDEV_CLOCK_H
> +
> +#include "hw/clock.h"
> +
> +/**
> + * qdev_init_clock_in:
> + * @dev: the device to add an input clock to
> + * @name: the name of the clock (can't be NULL).
> + * @callback: optional callback to be called on update or NULL.
> + * @opaque: argument for the callback
> + * @returns: a pointer to the newly added clock
> + *
> + * Add an input clock to device @dev as a clock named @name.
> + * This adds a child<> property.
> + * The callback will be called with @opaque as opaque parameter.
> + */
> +Clock *qdev_init_clock_in(DeviceState *dev, const char *name,
> +  ClockCallback *callback, void *opaque);
> +
> +/**
> + * qdev_init_clock_out:
> + * @dev: the device to add an output clock to
> + * @name: the name of the clock (can't be NULL).
> + * @callback: optional callback to be called on update or NULL.

qdev_init_clock_out() doesn't have a callback.

> + * @returns: a pointer to the newly added clock

> + *
> + * Add an output clock to device @dev as a clock named @name.
> + * This adds a child<> property.
> + */
> +Clock *qdev_init_clock_out(DeviceState *dev, const char *name);
> +
> +/**
> + * qdev_get_clock_in:
> + * @dev: the device which has the clock
> + * @name: the name of the clock (can't be NULL).
> + * @returns: a pointer to the clock
> + *
> + * Get the input clock @name from @dev or NULL if does not exist.
> + */
> +Clock *qdev_get_clock_in(DeviceState *dev, const char *name);
> +
> +/**
> + * qdev_get_clock_out:
> + * @dev: the device which has the clock
> + * @name: the name of the clock (can't be NULL).
> + * @returns: a pointer to the clock
> + *
> + * Get the output clock @name from @dev or NULL if does not exist.
> + */
> +Clock *qdev_get_clock_out(DeviceState *dev, const char *name);
> +
> +/**
> + * qdev_connect_clock_in:
> + * @dev: a device
> + * @name: the name of an input clock in @dev
> + * @source: the source clock (an output clock of another device for example)
> + *
> + * Set the source clock of input clock @name of device @dev to @source.
> + * @source period update will be propagated to @name clock.
> + */
> +static inline void qdev_connect_clock_in(DeviceState *dev, const char *name,
> + Clock *source)
> +{
> +clock_set_source(qdev_get_clock_in(dev, name), source);
> +}
> +
> +/**
> + * qdev_alias_clock:
> + * @dev: the device which has the clock
> + * @name: the name of the clock in @dev (can't be NULL)
> + * @alias_dev: the device to add the clock
> + * @alias_name: the name of the clock in @alias_dev
> + * @returns: a pointer to the clock
> + *
> + * Add a clock @alias_name in @alias_dev which is an alias of the clock @name
> + * in @dev. The direction _in_ or _out_ will be the same as the original.
> + * An alias clock must not be modified or used by @alias_dev and should
> + * typically be used only for device composition purposes.
> + */
> +Clock *qdev_alias_clock(DeviceState *dev, const char *name,
> +DeviceState *alias_dev, const char *alias_name);
> +
> +/**
> + * qdev_finalize_clocklist:
> + * 

Re: [PATCH] target: i386: Check float overflow about register stack

2020-02-24 Thread Chen Gang
On 2020/2/24 下午8:43, Paolo Bonzini wrote:
> On 22/02/20 13:25, Chen Gang wrote:
>> On 2020/2/22 下午3:37, Paolo Bonzini wrote:
>>> On 22/02/20 03:10, Chen Gang wrote:
 Set C1 to 1 if stack overflow occurred; set to 0 otherwise".

 In helper_fxam_ST0, I guess, we need "env->fpus |= 0x200" (but I don't
 know whether it will conflict with SIGND(temp)). And we still
 need foverflow, because all env->fptags being 0 doesn't mean overflow.
>>>
>>> No, you need to add "env->fpus |= 0x200" and "env->fpus &= ~0x200"
>>> directly to fpush, fpop, etc.
>>>
>>
>> OK. The content below is my next TODO, welcome your opinions.
>>
>> When overflow occurs, for me, we need to keep everything untouched
>> except setting the C1 flag.
> 
> No, push will overwrite the top entry if there is overflow.
> 
>> In fxam, we don't clear C1, but leave the clearing of C1 in other
>> places untouched.
> 
> FXAM is neither push nor pop, it just detects an empty slot via fptags.
>  FXAM should be okay with my patch.
> 

OK. I am not quite clear about it, but it fixes the current issues at
least. Please apply your patch.

Thanks.
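Paolo's suggestion above can be sketched outside QEMU: set or clear C1 (bit 9 of the x87 FPU status word, mask 0x200) directly in the push/pop helpers. `fpus` stands in for QEMU's `env->fpus`, and the overflow check is illustrative, not target/i386 code:

```python
# Minimal sketch of "add env->fpus |= 0x200 / &= ~0x200 directly to fpush,
# fpop". C1 is bit 9 of the x87 status word; stack_full is an assumed
# stand-in for the real overflow detection.

FPUS_C1 = 0x200

def fpush(fpus, stack_full):
    """Return the updated status word: C1=1 on stack overflow, C1=0 otherwise."""
    return fpus | FPUS_C1 if stack_full else fpus & ~FPUS_C1

print(hex(fpush(0x0, True)))     # overflow push sets C1
print(hex(fpush(0x200, False)))  # normal push clears C1
```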





Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-24 Thread Stefan Ring
On Mon, Feb 24, 2020 at 2:27 PM Kevin Wolf  wrote:
> > > There are quite a few machines running on this host, and we have not
> > > experienced other problems so far. So right now, only ZFS is able to
> > > trigger this for some reason. The guest has 8 virtual cores. I also
> > > tried writing directly to the affected device from user space in
> > > patterns mimicking what I see in blktrace, but so far have been unable
> > > to trigger the same issue that way. Of the many ZFS knobs, I know at
> > > least one that makes a huge difference: When I set
> > > zfs_vdev_async_write_max_active to 1 (as opposed to 2 or 10), the
> > > error count goes through the roof (11,000).
>
> Wait, what does this setting actually do? Does it mean that QEMU
> never sees more than a single active write request at the same time?
> So if this makes the error a lot more prominent, does this mean that
> async I/O actually makes the problem _less_ likely to occur?
>
> This sounds weird, so probably I'm misunderstanding the setting?

Yes, this is strange, and I will not follow this as I cannot reproduce
it on my home setup. Let’s just hope that it’s some kind of anomaly
that will go away once the real issue has been eliminated ;).

> > I can actually reproduce this on my Fedora 31 home machine with 3 VMs.
>
> This means QEMU 4.1.1, right?

Yes, qemu-system-x86-4.1.1-1.fc31.x86_64.

> > All 3 running CentOS 7.7. Two for glusterd, one for ZFS. Briefly, I
> > also got rid of the 2 glusterd VMs altogether, i.e. running glusterd
> > (the Fedora version) directly on the host, and it would still occur.
> > So my impression is that the server side of GlusterFS does not matter
> > much – I’ve seen it happen on 4.x, 6.x, 7.2 and 7.3. Also, as it
> > happens in the same way on a Fedora 31 qemu as well as a CentOS 7 one,
> > the qemu version is equally irrelevant.
> >
> > The main conclusion so far is that it has to do with growing the qcow2
> > image. With a fully pre-populated image, I cannot trigger it.
>
> Ok, that's a good data point.
>
> Is the corruption that you're seeing only in the guest data or is qcow2
> metadata also affected (does 'qemu-img check' report errors)?

"No errors were found on the image."

I don’t entirely rule out the possibility of qcow metadata corruption,
but at least it seems to be very unlikely compared to guest data
corruption.

> > What I plan to do next is look at the block ranges being written in
> > the hope of finding overlaps there.
>
> Either that, or other interesting patterns.
>
> Did you try to remove the guest from the equation? If you say that the
> problem is with multiple parallel requests, maybe 'qemu-img bench' can
> cause the same kind of corruption? (Though if it's only corruption in
> the content rather than qcow2 metadata, it may be hard to detect.
> Giving qemu-io an explicit list of requests could still be an option
> once we have a suspicion what pattern creates the problem.)

Did not know about qemu-img bench, but narrowing it down as much as
possible – and that entails removing the guest VM – is my number one
priority here.
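Kevin's qemu-io idea can be sketched: once a suspicious pattern has been extracted from blktrace, replay it as an explicit command list. The offsets, lengths, and the 0x5a fill byte below are illustrative; `write -P <byte> <offset> <length>` is qemu-io's pattern-write syntax:

```python
# Sketch: turn a suspected (offset, length) write pattern into an explicit
# qemu-io command list. The values and image name are illustrative.

writes = [(0, 4096), (4096, 12288), (65536, 131072)]  # (offset, length) bytes

cmds = " ".join('-c "write -P 0x5a %d %d"' % w for w in writes)
print("qemu-io " + cmds + " test.qcow2")
```

Reading the ranges back with `read -P 0x5a` would then reveal any zeroed prefix without involving a guest at all.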



Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-24 Thread Stefan Ring
On Thu, Feb 20, 2020 at 10:19 AM Stefan Ring  wrote:
>
> Hi,
>
> I have a very curious problem on an oVirt-like virtualization host
> whose storage lives on gluster (as qcow2).
>
> The problem is that, of the writes done by ZFS (whose sizes according
> to blktrace are a mixture of 8, 16, 24, ... 256 512-byte blocks),
> sometimes the first 4KB or more, but at least the first 4KB,
> end up zeroed out when read back later from storage. To clarify: ZFS
> is only used in the guest. In my current test scenario, I write
> approx. 3GB to the guest machine, which takes roughly a minute.
> Actually it’s around 35 GB which gets compressed down to 3GB by lz4.
> Within that, I end up with close to 100 data errors when I read it
> back from storage afterwards (zpool scrub).
>
> There are quite a few machines running on this host, and we have not
> experienced other problems so far. So right now, only ZFS is able to
> trigger this for some reason. The guest has 8 virtual cores. I also
> tried writing directly to the affected device from user space in
> patterns mimicking what I see in blktrace, but so far have been unable
> to trigger the same issue that way. Of the many ZFS knobs, I know at
> least one that makes a huge difference: When I set
> zfs_vdev_async_write_max_active to 1 (as opposed to 2 or 10), the
> error count goes through the roof (11,000). Curiously, when I switch
> off ZFS compression, the amount of data written increases almost
> 10-fold, while the absolute error count drops to close to, but not
> entirely, zero. Which I guess supports my suspicion that this must be
> somehow related to timing.
>
> Switching the guest storage driver between scsi and virtio does not
> make a difference.
>
> Switching the storage backend to file on glusterfs-fuse does make a
> difference, i.e. the problem disappears.
>
> Any hints? I'm still trying to investigate a few things, but what bugs
> me most that only ZFS seems to trigger this behavior, although I am
> almost sure that ZFS is not really at fault here.
>
> Software versions used:
>
> Host
> kernel 3.10.0-957.12.1.el7.x86_64
> qemu-kvm-ev-2.12.0-18.el7_6.3.1.x86_64
> glusterfs-api-5.6-1.el7.x86_64
>
> Guest
> kernel 3.10.0-1062.12.1.el7.x86_64
> kmod-zfs-0.8.3-1.el7.x86_64 (from the official ZoL binaries)

I can actually reproduce this on my Fedora 31 home machine with 3 VMs.
All 3 running CentOS 7.7. Two for glusterd, one for ZFS. Briefly, I
also got rid of the 2 glusterd VMs altogether, i.e. running glusterd
(the Fedora version) directly on the host, and it would still occur.
So my impression is that the server side of GlusterFS does not matter
much – I’ve seen it happen on 4.x, 6.x, 7.2 and 7.3. Also, as it
happens in the same way on a Fedora 31 qemu as well as a CentOS 7 one,
the qemu version is equally irrelevant.

The main conclusion so far is that it has to do with growing the qcow2
image. With a fully pre-populated image, I cannot trigger it.

I poked around a little in the glfs api integration, but trying to
make sense of two unknown asynchronous io systems (QEMU's and
GlusterFS's) interacting with each other is demanding a bit much for a
single weekend ;). The one thing I did verify so far is that there is
only one thread ever calling qemu_gluster_co_rw. As already stated in
the original post, the problem only occurs with multiple parallel
write requests happening.

What I plan to do next is look at the block ranges being written in
the hope of finding overlaps there.
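The overlap check described above can be sketched directly: given (offset, length) write records (e.g. parsed from blktrace output, which reports in 512-byte sectors; byte units are assumed here), report any record that overlaps an earlier one:

```python
# Sketch of the planned analysis: find write records that overlap an
# earlier write. Input format (offset, length) is an assumption about
# how the blktrace output would be parsed.

def find_overlaps(writes):
    """writes: iterable of (offset, length); returns records overlapping
    an earlier range after sorting by offset."""
    ordered = sorted(writes)
    overlapping = []
    max_end = 0
    for off, length in ordered:
        if off < max_end:          # starts before an earlier write ended
            overlapping.append((off, length))
        max_end = max(max_end, off + length)
    return overlapping

print(find_overlaps([(0, 4096), (4096, 512), (4352, 512)]))  # → [(4352, 512)]
```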



Re: Strange data corruption issue with gluster (libgfapi) and ZFS

2020-02-24 Thread Stefan Ring
On Mon, Feb 24, 2020 at 1:35 PM Stefan Ring  wrote:
>
> [...]. As already stated in
> the original post, the problem only occurs with multiple parallel
> write requests happening.

Actually I did not state that. Anyway, the corruption does not happen
when I restrict the ZFS io scheduler to only 1 request at a time
(zfs_vdev_max_active), therefore I assume that this is somehow related
to the ordering of asynchronously scheduled writes.



RE: [PATCH V2 7/8] COLO: Migrate dirty pages during the gap of checkpointing

2020-02-24 Thread Zhanghailiang



> -Original Message-
> From: Eric Blake [mailto:ebl...@redhat.com]
> Sent: Monday, February 24, 2020 11:19 PM
> To: Zhanghailiang ;
> qemu-devel@nongnu.org
> Cc: daniel...@qnap.com; dgilb...@redhat.com; quint...@redhat.com
> Subject: Re: [PATCH V2 7/8] COLO: Migrate dirty pages during the gap of
> checkpointing
> 
> On 2/24/20 12:54 AM, zhanghailiang wrote:
> > We can migrate some dirty pages during the gap of checkpointing;
> > in this way, we can reduce the amount of ram migrated during
> > checkpointing.
> >
> > Signed-off-by: zhanghailiang 
> > ---
> 
> > +++ b/qapi/migration.json
> > @@ -977,12 +977,14 @@
> >   #
> >   # @vmstate-loaded: VM's state has been loaded by SVM.
> >   #
> > +# @migrate-ram-background: Send some dirty pages during the gap of COLO checkpoint
> 
> Missing a '(since 5.0)' tag.
> 

OK, will add this in the next version. I forgot to modify it in this
version, even though you reminded me in the previous version. :(

> > +#
> >   # Since: 2.8
> >   ##
> >   { 'enum': 'COLOMessage',
> > 'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply',
> >   'vmstate-send', 'vmstate-size', 'vmstate-received',
> > -'vmstate-loaded' ] }
> > +'vmstate-loaded', 'migrate-ram-background' ] }
> >
> >   ##
> >   # @COLOMode:
> >
> 
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org




[PATCH 5/6] qmp.py: change event_wait to use a dict

2020-02-24 Thread John Snow
It's easier to work with than a list of tuples, because we can check the
keys for membership.
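The membership point can be shown in a few lines; the event names below are illustrative, not a claim about QEMU's event set:

```python
# Why {name: match} beats [(name, match), ...] for filtering events:
# membership becomes a single `in` test instead of a linear scan.

events_list = [('BLOCK_JOB_COMPLETED', None), ('BLOCK_JOB_ERROR', None)]
events_dict = dict(events_list)

event = {'event': 'BLOCK_JOB_ERROR'}

# List of tuples: scan every pair to find the matching criteria.
found_by_scan = any(name == event['event'] for name, _ in events_list)

# Dict: direct membership test, then key lookup for the criteria.
found_by_key = event['event'] in events_dict
criteria = events_dict.get(event['event'])

print(found_by_scan, found_by_key, criteria)
```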

Signed-off-by: John Snow 
---
 python/qemu/machine.py| 10 +-
 tests/qemu-iotests/040| 12 ++--
 tests/qemu-iotests/260|  5 +++--
 tests/qemu-iotests/iotests.py | 16 
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 183d8f3d38..748de5f322 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -476,21 +476,21 @@ def event_wait(self, name, timeout=60.0, match=None):
 timeout: QEMUMonitorProtocol.pull_event timeout parameter.
 match: Optional match criteria. See event_match for details.
 """
-return self.events_wait([(name, match)], timeout)
+return self.events_wait({name: match}, timeout)
 
 def events_wait(self, events, timeout=60.0):
 """
 events_wait waits for and returns a named event from QMP with a timeout.
 
-events: a sequence of (name, match_criteria) tuples.
+events: a mapping containing {name: match_criteria}.
 The match criteria are optional and may be None.
 See event_match for details.
 timeout: QEMUMonitorProtocol.pull_event timeout parameter.
 """
 def _match(event):
-for name, match in events:
-if event['event'] == name and self.event_match(event, match):
-return True
+name = event['event']
+if name in events:
+return self.event_match(event, events[name])
 return False
 
 # Search cached events
diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 32c82b4ec6..90b59081ff 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -485,12 +485,12 @@ class TestErrorHandling(iotests.QMPTestCase):
 
 def run_job(self, expected_events, error_pauses_job=False):
 match_device = {'data': {'device': 'job0'}}
-events = [
-('BLOCK_JOB_COMPLETED', match_device),
-('BLOCK_JOB_CANCELLED', match_device),
-('BLOCK_JOB_ERROR', match_device),
-('BLOCK_JOB_READY', match_device),
-]
+events = {
+'BLOCK_JOB_COMPLETED': match_device,
+'BLOCK_JOB_CANCELLED': match_device,
+'BLOCK_JOB_ERROR': match_device,
+'BLOCK_JOB_READY': match_device,
+}
 
 completed = False
 log = []
diff --git a/tests/qemu-iotests/260 b/tests/qemu-iotests/260
index 30c0de380d..b2fb045ddd 100755
--- a/tests/qemu-iotests/260
+++ b/tests/qemu-iotests/260
@@ -65,8 +65,9 @@ def test(persistent, restart):
 
 vm.qmp_log('block-commit', device='drive0', top=top,
filters=[iotests.filter_qmp_testfiles])
-ev = vm.events_wait((('BLOCK_JOB_READY', None),
- ('BLOCK_JOB_COMPLETED', None)))
+ev = vm.events_wait({
+'BLOCK_JOB_READY': None,
+'BLOCK_JOB_COMPLETED': None })
 log(filter_qmp_event(ev))
 if (ev['event'] == 'BLOCK_JOB_COMPLETED'):
 vm.shutdown()
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 5d2990a0e4..3390fab021 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -604,14 +604,14 @@ def run_job(self, job, auto_finalize=True, auto_dismiss=False,
 """
 match_device = {'data': {'device': job}}
 match_id = {'data': {'id': job}}
-events = [
-('BLOCK_JOB_COMPLETED', match_device),
-('BLOCK_JOB_CANCELLED', match_device),
-('BLOCK_JOB_ERROR', match_device),
-('BLOCK_JOB_READY', match_device),
-('BLOCK_JOB_PENDING', match_id),
-('JOB_STATUS_CHANGE', match_id)
-]
+events = {
+'BLOCK_JOB_COMPLETED': match_device,
+'BLOCK_JOB_CANCELLED': match_device,
+'BLOCK_JOB_ERROR': match_device,
+'BLOCK_JOB_READY': match_device,
+'BLOCK_JOB_PENDING': match_id,
+'JOB_STATUS_CHANGE': match_id,
+}
 error = None
 while True:
 ev = filter_qmp_event(self.events_wait(events, timeout=wait))
-- 
2.21.1




[PATCH 1/6] block: add bitmap-populate job

2020-02-24 Thread John Snow
This job copies the allocation map into a bitmap. It's a job because
there's no guarantee that allocation interrogation will be quick (or
won't hang), so it cannot be retrofitted into block-dirty-bitmap-merge.

It was designed with different possible population patterns in mind,
but only top-layer allocation is implemented for now.
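As a rough illustration of the 'allocation-top' idea (a hedged sketch, not the job's actual code: the extent list, the granularity value, and the helper name are invented for the example), mapping allocated ranges to bitmap bits is plain granularity arithmetic:

```python
GRANULARITY = 64 * 1024  # assumed 64K bitmap granularity for the example

def allocation_to_bits(extents, granularity=GRANULARITY):
    """extents: iterable of (offset, length) ranges allocated in the top layer."""
    bits = set()
    for offset, length in extents:
        first = offset // granularity
        last = (offset + length - 1) // granularity
        bits |= set(range(first, last + 1))  # mark every cluster the range touches
    return bits

# One single-cluster write and one write straddling a cluster boundary:
bits = allocation_to_bits([(0, 64 * 1024), (96 * 1024, 64 * 1024)])
```

The second extent starts mid-cluster, so it dirties two bits; the result here is `{0, 1, 2}`.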

Signed-off-by: John Snow 
---
 qapi/block-core.json  |  48 +
 qapi/job.json |   2 +-
 include/block/block_int.h |  21 
 block/bitmap-alloc.c  | 207 ++
 blockjob.c|   3 +-
 block/Makefile.objs   |   1 +
 6 files changed, 280 insertions(+), 2 deletions(-)
 create mode 100644 block/bitmap-alloc.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 85e27bb61f..df1797681a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2245,6 +2245,54 @@
   { 'command': 'block-dirty-bitmap-merge',
 'data': 'BlockDirtyBitmapMerge' }
 
+##
+# @BitmapPattern:
+#
+# An enumeration of possible patterns that can be written into a bitmap.
+#
+# @allocation-top: The allocation status of the top layer
+#  of the attached storage node.
+#
+# Since: 5.0
+##
+{ 'enum': 'BitmapPattern',
+  'data': ['allocation-top'] }
+
+##
+# @BlockDirtyBitmapPopulate:
+#
+# @job-id: identifier for the newly-created block job.
+#
+# @pattern: What pattern should be written into the bitmap?
+#
+# @on-error: the action to take if an error is encountered on a bitmap's
+#attached node, default 'report'.
+#'stop' and 'enospc' can only be used if the block device supports
+#io-status (see BlockInfo).
+#
+# @auto-finalize: When false, this job will wait in a PENDING state after it has
+# finished its work, waiting for @block-job-finalize before
+# making any block graph changes.
+# When true, this job will automatically
+# perform its abort or commit actions.
+# Defaults to true.
+#
+# @auto-dismiss: When false, this job will wait in a CONCLUDED state after it
+#has completely ceased all work, and awaits @block-job-dismiss.
+#When true, this job will automatically disappear from the query
+#list without user intervention.
+#Defaults to true.
+#
+# Since: 5.0
+##
+{ 'struct': 'BlockDirtyBitmapPopulate',
+  'base': 'BlockDirtyBitmap',
+  'data': { 'job-id': 'str',
+'pattern': 'BitmapPattern',
+'*on-error': 'BlockdevOnError',
+'*auto-finalize': 'bool',
+'*auto-dismiss': 'bool' } }
+
 ##
 # @BlockDirtyBitmapSha256:
 #
diff --git a/qapi/job.json b/qapi/job.json
index 5e658281f5..5f496d4630 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -22,7 +22,7 @@
 # Since: 1.7
 ##
 { 'enum': 'JobType',
-  'data': ['commit', 'stream', 'mirror', 'backup', 'create'] }
+  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'bitmap-populate'] }
 
 ##
 # @JobStatus:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6f9fd5e20e..a5884b597e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1215,6 +1215,27 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
 BlockCompletionFunc *cb, void *opaque,
 JobTxn *txn, Error **errp);
 
+/*
+ * bitpop_job_create: Create a new bitmap population job.
+ *
+ * @job_id: The id of the newly-created job.
+ * @bs: Block device associated with the @target_bitmap.
+ * @target_bitmap: The bitmap to populate.
+ * @on_error: What to do if an error on @bs is encountered.
+ * @creation_flags: Flags that control the behavior of the Job lifetime.
+ *  See @BlockJobCreateFlags
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @txn: Transaction that this job is part of (may be NULL).
+ */
+BlockJob *bitpop_job_create(const char *job_id, BlockDriverState *bs,
+BdrvDirtyBitmap *target_bitmap,
+BitmapPattern pattern,
+BlockdevOnError on_error,
+int creation_flags,
+BlockCompletionFunc *cb, void *opaque,
+JobTxn *txn, Error **errp);
+
 void hmp_drive_add_node(Monitor *mon, const char *optstr);
 
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
diff --git a/block/bitmap-alloc.c b/block/bitmap-alloc.c
new file mode 100644
index 00..47d542dc12
--- /dev/null
+++ b/block/bitmap-alloc.c
@@ -0,0 +1,207 @@
+/*
+ * Async Dirty Bitmap Populator
+ *
+ * Copyright (C) 2020 Red Hat, Inc.
+ *
+ * Authors:
+ *  John Snow 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "trace.h"

[PATCH 0/6] block: add block-dirty-bitmap-populate job

2020-02-24 Thread John Snow
Hi,

This is a new (very small) block job that writes a pattern into a
bitmap. The only pattern implemented is the top allocation information.

This can be used to "recover" an incremental bitmap chain if an external
snapshot was taken without creating a new bitmap first: any writes made
to the image will be reflected by the allocation status and can be
written back into a bitmap.

This is useful for e.g. libvirt managing backup chains if a user creates
an external snapshot outside of libvirt.

Patches 1-2: The new job.
Patch 3: iotest prerequisite
Patch 4-5: completely optional cleanup.
Patch 6: Test.

John Snow (6):
  block: add bitmap-populate job
  qmp: expose block-dirty-bitmap-populate
  iotests: move bitmap helpers into their own file
  iotests: add hmp helper with logging
  qmp.py: change event_wait to use a dict
  iotests: add 287 for block-dirty-bitmap-populate

 qapi/block-core.json  |   66 +
 qapi/job.json |2 +-
 qapi/transaction.json |2 +
 include/block/block_int.h |   21 +
 block/bitmap-alloc.c  |  207 ++
 blockdev.c|   78 +
 blockjob.c|3 +-
 block/Makefile.objs   |1 +
 python/qemu/machine.py|   10 +-
 tests/qemu-iotests/040|   12 +-
 tests/qemu-iotests/257|  110 +-
 tests/qemu-iotests/260|5 +-
 tests/qemu-iotests/287|  233 ++
 tests/qemu-iotests/287.out| 4544 +
 tests/qemu-iotests/bitmaps.py |  131 +
 tests/qemu-iotests/group  |1 +
 tests/qemu-iotests/iotests.py |   34 +-
 17 files changed, 5321 insertions(+), 139 deletions(-)
 create mode 100644 block/bitmap-alloc.c
 create mode 100755 tests/qemu-iotests/287
 create mode 100644 tests/qemu-iotests/287.out
 create mode 100644 tests/qemu-iotests/bitmaps.py

-- 
2.21.1




[PATCH 3/6] iotests: move bitmap helpers into their own file

2020-02-24 Thread John Snow
Signed-off-by: John Snow 
---
 tests/qemu-iotests/257| 110 +---
 tests/qemu-iotests/bitmaps.py | 131 ++
 2 files changed, 132 insertions(+), 109 deletions(-)
 create mode 100644 tests/qemu-iotests/bitmaps.py

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index 004a433b8b..2a81f9e30c 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -24,120 +24,12 @@ import os
 
 import iotests
 from iotests import log, qemu_img
+from bitmaps import EmulatedBitmap, GROUPS
 
 SIZE = 64 * 1024 * 1024
 GRANULARITY = 64 * 1024
 
 
-class Pattern:
-def __init__(self, byte, offset, size=GRANULARITY):
-self.byte = byte
-self.offset = offset
-self.size = size
-
-def bits(self, granularity):
-lower = self.offset // granularity
-upper = (self.offset + self.size - 1) // granularity
-return set(range(lower, upper + 1))
-
-
-class PatternGroup:
-"""Grouping of Pattern objects. Initialize with an iterable of Patterns."""
-def __init__(self, patterns):
-self.patterns = patterns
-
-def bits(self, granularity):
-"""Calculate the unique bits dirtied by this pattern grouping"""
-res = set()
-for pattern in self.patterns:
-res |= pattern.bits(granularity)
-return res
-
-
-GROUPS = [
-PatternGroup([
-# Batch 0: 4 clusters
-Pattern('0x49', 0x0000000),
-Pattern('0x6c', 0x0100000),   # 1M
-Pattern('0x6f', 0x2000000),   # 32M
-Pattern('0x76', 0x3ff0000)]), # 64M - 64K
-PatternGroup([
-# Batch 1: 6 clusters (3 new)
-Pattern('0x65', 0x000),   # Full overwrite
-Pattern('0x77', 0x00f8000),   # Partial-left (1M-32K)
-Pattern('0x72', 0x2008000),   # Partial-right (32M+32K)
-Pattern('0x69', 0x3fe0000)]), # Adjacent-left (64M - 128K)
-PatternGroup([
-# Batch 2: 7 clusters (3 new)
-Pattern('0x74', 0x0010000),   # Adjacent-right
-Pattern('0x69', 0x00e8000),   # Partial-left  (1M-96K)
-Pattern('0x6e', 0x2018000),   # Partial-right (32M+96K)
-Pattern('0x67', 0x3fe0000,
-2*GRANULARITY)]), # Overwrite [(64M-128K)-64M)
-PatternGroup([
-# Batch 3: 8 clusters (5 new)
-# Carefully chosen such that nothing re-dirties the one cluster
-# that copies out successfully before failure in Group #1.
-Pattern('0xaa', 0x0010000,
-3*GRANULARITY),   # Overwrite and 2x Adjacent-right
-Pattern('0xbb', 0x00d8000),   # Partial-left (1M-160K)
-Pattern('0xcc', 0x2028000),   # Partial-right (32M+160K)
-Pattern('0xdd', 0x3fc0000)]), # New; leaving a gap to the right
-]
-
-
-class EmulatedBitmap:
-def __init__(self, granularity=GRANULARITY):
-self._bits = set()
-self.granularity = granularity
-
-def dirty_bits(self, bits):
-self._bits |= set(bits)
-
-def dirty_group(self, n):
-self.dirty_bits(GROUPS[n].bits(self.granularity))
-
-def clear(self):
-self._bits = set()
-
-def clear_bits(self, bits):
-self._bits -= set(bits)
-
-def clear_bit(self, bit):
-self.clear_bits({bit})
-
-def clear_group(self, n):
-self.clear_bits(GROUPS[n].bits(self.granularity))
-
-@property
-def first_bit(self):
-return sorted(self.bits)[0]
-
-@property
-def bits(self):
-return self._bits
-
-@property
-def count(self):
-return len(self.bits)
-
-def compare(self, qmp_bitmap):
-"""
-Print a nice human-readable message checking that a bitmap as reported
-by the QMP interface has as many bits set as we expect it to.
-"""
-
-name = qmp_bitmap.get('name', '(anonymous)')
-log("= Checking Bitmap {:s} =".format(name))
-
-want = self.count
-have = qmp_bitmap['count'] // qmp_bitmap['granularity']
-
-log("expecting {:d} dirty sectors; have {:d}. {:s}".format(
-want, have, "OK!" if want == have else "ERROR!"))
-log('')
-
-
 class Drive:
 """Represents, vaguely, a drive attached to a VM.
 Includes format, graph, and device information."""
diff --git a/tests/qemu-iotests/bitmaps.py b/tests/qemu-iotests/bitmaps.py
new file mode 100644
index 00..522fc25171
--- /dev/null
+++ b/tests/qemu-iotests/bitmaps.py
@@ -0,0 +1,131 @@
+# Bitmap-related helper utilities
+#
+# Copyright (c) 2020 John Snow for Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR 

[PATCH 2/6] qmp: expose block-dirty-bitmap-populate

2020-02-24 Thread John Snow
This is a new job-creating command.

Signed-off-by: John Snow 
---
 qapi/block-core.json  | 18 ++
 qapi/transaction.json |  2 ++
 blockdev.c| 78 +++
 3 files changed, 98 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index df1797681a..a8be1fb36b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2293,6 +2293,24 @@
 '*auto-finalize': 'bool',
 '*auto-dismiss': 'bool' } }
 
+##
+# @block-dirty-bitmap-populate:
+#
+# Creates a new job that writes a pattern into a dirty bitmap.
+#
+# Since: 5.0
+#
+# Example:
+#
+# -> { "execute": "block-dirty-bitmap-populate",
+#  "arguments": { "node": "drive0", "name": "bitmap0",
+# "job-id": "job0", "pattern": "allocation-top" } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'block-dirty-bitmap-populate', 'boxed': true,
+  'data': 'BlockDirtyBitmapPopulate' }
+
 ##
 # @BlockDirtyBitmapSha256:
 #
diff --git a/qapi/transaction.json b/qapi/transaction.json
index 04301f1be7..28521d5c7f 100644
--- a/qapi/transaction.json
+++ b/qapi/transaction.json
@@ -50,6 +50,7 @@
 # - @block-dirty-bitmap-enable: since 4.0
 # - @block-dirty-bitmap-disable: since 4.0
 # - @block-dirty-bitmap-merge: since 4.0
+# - @block-dirty-bitmap-populate: since 5.0
 # - @blockdev-backup: since 2.3
 # - @blockdev-snapshot: since 2.5
 # - @blockdev-snapshot-internal-sync: since 1.7
@@ -67,6 +68,7 @@
'block-dirty-bitmap-enable': 'BlockDirtyBitmap',
'block-dirty-bitmap-disable': 'BlockDirtyBitmap',
'block-dirty-bitmap-merge': 'BlockDirtyBitmapMerge',
+   'block-dirty-bitmap-populate': 'BlockDirtyBitmapPopulate',
'blockdev-backup': 'BlockdevBackup',
'blockdev-snapshot': 'BlockdevSnapshot',
'blockdev-snapshot-internal-sync': 'BlockdevSnapshotInternal',
diff --git a/blockdev.c b/blockdev.c
index 011dcfec27..33c0e35399 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2314,6 +2314,67 @@ static void block_dirty_bitmap_remove_commit(BlkActionState *common)
 bdrv_release_dirty_bitmap(state->bitmap);
 }
 
+static void block_dirty_bitmap_populate_prepare(BlkActionState *common, Error **errp)
+{
+BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
+BlockDirtyBitmapPopulate *bitpop;
+BlockDriverState *bs;
+AioContext *aio_context;
+BdrvDirtyBitmap *bmap = NULL;
+int job_flags = JOB_DEFAULT;
+
+assert(common->action->type == TRANSACTION_ACTION_KIND_BLOCK_DIRTY_BITMAP_POPULATE);
+bitpop = common->action->u.block_dirty_bitmap_populate.data;
+
+bs = bdrv_lookup_bs(bitpop->node, bitpop->node, errp);
+if (!bs) {
+return;
+}
+
+aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(aio_context);
+state->bs = bs;
+
+bmap = bdrv_find_dirty_bitmap(bs, bitpop->name);
+if (!bmap) {
+error_setg(errp, "Bitmap '%s' could not be found", bitpop->name);
+return;
+}
+
+/* Paired with .clean() */
+bdrv_drained_begin(state->bs);
+
+if (!bitpop->has_on_error) {
+bitpop->on_error = BLOCKDEV_ON_ERROR_REPORT;
+}
+if (!bitpop->has_auto_finalize) {
+bitpop->auto_finalize = true;
+}
+if (!bitpop->has_auto_dismiss) {
+bitpop->auto_dismiss = true;
+}
+
+if (!bitpop->auto_finalize) {
+job_flags |= JOB_MANUAL_FINALIZE;
+}
+if (!bitpop->auto_dismiss) {
+job_flags |= JOB_MANUAL_DISMISS;
+}
+
+state->job = bitpop_job_create(
+bitpop->job_id,
+bs,
+bmap,
+bitpop->pattern,
+bitpop->on_error,
+job_flags,
+NULL, NULL,
+common->block_job_txn,
+errp);
+
+aio_context_release(aio_context);
+}
+
 static void abort_prepare(BlkActionState *common, Error **errp)
 {
 error_setg(errp, "Transaction aborted using Abort action");
@@ -2397,6 +2458,13 @@ static const BlkActionOps actions[] = {
 .commit = block_dirty_bitmap_remove_commit,
 .abort = block_dirty_bitmap_remove_abort,
 },
+[TRANSACTION_ACTION_KIND_BLOCK_DIRTY_BITMAP_POPULATE] = {
+.instance_size = sizeof(BlockdevBackupState),
+.prepare = block_dirty_bitmap_populate_prepare,
+.commit = blockdev_backup_commit,
+.abort = blockdev_backup_abort,
+.clean = blockdev_backup_clean,
+},
 /* Where are transactions for MIRROR, COMMIT and STREAM?
  * Although these blockjobs use transaction callbacks like the backup job,
  * these jobs do not necessarily adhere to transaction semantics.
@@ -3225,6 +3293,16 @@ void qmp_block_dirty_bitmap_merge(const char *node, const char *target,
 do_block_dirty_bitmap_merge(node, target, bitmaps, NULL, errp);
 }
 
+void qmp_block_dirty_bitmap_populate(BlockDirtyBitmapPopulate *bitpop,
+ Error **errp)
+{
+TransactionAction action = {
+.type = 

[PATCH 4/6] iotests: add hmp helper with logging

2020-02-24 Thread John Snow
Just a mild cleanup while I was here.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 8815052eb5..5d2990a0e4 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -525,23 +525,27 @@ def add_incoming(self, addr):
 self._args.append(addr)
 return self
 
+def hmp(self, command_line, log=False):
+cmd = 'human-monitor-command'
+kwargs = { 'command-line': command_line }
+if log:
+return self.qmp_log(cmd, **kwargs)
+return self.qmp(cmd, **kwargs)
+
 def pause_drive(self, drive, event=None):
 '''Pause drive r/w operations'''
 if not event:
 self.pause_drive(drive, "read_aio")
 self.pause_drive(drive, "write_aio")
 return
-self.qmp('human-monitor-command',
-command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
+self.hmp('qemu-io %s "break %s bp_%s"' % (drive, event, drive))
 
 def resume_drive(self, drive):
-self.qmp('human-monitor-command',
-command_line='qemu-io %s "remove_break bp_%s"' % (drive, drive))
+self.hmp('qemu-io %s "remove_break bp_%s"' % (drive, drive))
 
-def hmp_qemu_io(self, drive, cmd):
+def hmp_qemu_io(self, drive, cmd, log=False):
 '''Write to a given drive using an HMP command'''
-return self.qmp('human-monitor-command',
-command_line='qemu-io %s "%s"' % (drive, cmd))
+return self.hmp('qemu-io %s "%s"' % (drive, cmd), log=log)
 
 def flatten_qmp_object(self, obj, output=None, basestr=''):
 if output is None:
-- 
2.21.1




Re: [PATCH v6 12/18] target/ppc: Don't store VRMA SLBE persistently

2020-02-24 Thread Fabiano Rosas
David Gibson  writes:

> Currently, we construct the SLBE used for VRMA translations when the LPCR
> is written (which controls some bits in the SLBE), then use it later for
> translations.
>
> This is a bit complex and confusing - simplify it by simply constructing
> the SLBE directly from the LPCR when we need it.
>
> Signed-off-by: David Gibson 

Reviewed-by: Fabiano Rosas 

> ---
>  target/ppc/cpu.h|  3 ---
>  target/ppc/mmu-hash64.c | 28 ++--
>  2 files changed, 6 insertions(+), 25 deletions(-)
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index f9871b1233..5a55fb02bd 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1044,9 +1044,6 @@ struct CPUPPCState {
>  uint32_t flags;
>  uint64_t insns_flags;
>  uint64_t insns_flags2;
> -#if defined(TARGET_PPC64)
> -ppc_slb_t vrma_slb;
> -#endif
>  
>  int error_code;
>  uint32_t pending_interrupts;
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index ac21c14f68..f8bf92aa2e 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -825,6 +825,7 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
>  {
>  CPUState *cs = CPU(cpu);
>  CPUPPCState *env = &cpu->env;
> +ppc_slb_t vrma_slbe;
>  ppc_slb_t *slb;
>  unsigned apshift;
>  hwaddr ptex;
> @@ -863,8 +864,8 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
>  }
>  } else if (ppc_hash64_use_vrma(env)) {
>  /* Emulated VRMA mode */
> -slb = &env->vrma_slb;
> -if (!slb->sps) {
> +slb = &vrma_slbe;
> +if (build_vrma_slbe(cpu, slb) != 0) {
>  /* Invalid VRMA setup, machine check */
>  cs->exception_index = POWERPC_EXCP_MCHECK;
>  env->error_code = 0;
> @@ -1012,6 +1013,7 @@ skip_slb_search:
>  hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr)
>  {
>  CPUPPCState *env = &cpu->env;
> +ppc_slb_t vrma_slbe;
>  ppc_slb_t *slb;
>  hwaddr ptex, raddr;
>  ppc_hash_pte64_t pte;
> @@ -1033,8 +1035,8 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr)
>  return raddr | env->spr[SPR_HRMOR];
>  } else if (ppc_hash64_use_vrma(env)) {
>  /* Emulated VRMA mode */
> -slb = &env->vrma_slb;
> -if (!slb->sps) {
> +slb = &vrma_slbe;
> +if (build_vrma_slbe(cpu, slb) != 0) {
>  return -1;
>  }
>  } else {
> @@ -1072,30 +1074,12 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu, target_ulong ptex,
>  cpu->env.tlb_need_flush = TLB_NEED_GLOBAL_FLUSH | TLB_NEED_LOCAL_FLUSH;
>  }
>  
> -static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
> -{
> -CPUPPCState *env = &cpu->env;
> -ppc_slb_t *slb = &env->vrma_slb;
> -
> -/* Is VRMA enabled ? */
> -if (ppc_hash64_use_vrma(env)) {
> -if (build_vrma_slbe(cpu, slb) == 0) {
> -return;
> -}
> -}
> -
> -/* Otherwise, clear it to indicate error */
> -slb->esid = slb->vsid = 0;
> -slb->sps = NULL;
> -}
> -
>  void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
>  {
>  PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>  CPUPPCState *env = &cpu->env;
>  
>  env->spr[SPR_LPCR] = val & pcc->lpcr_mask;
> -ppc_hash64_update_vrma(cpu);
>  }
>  
>  void helper_store_lpcr(CPUPPCState *env, target_ulong val)



Re: [PATCH RESEND v2 17/32] hw/ppc/ppc405: Use memory_region_init_rom() with read-only regions

2020-02-24 Thread David Gibson
On Mon, Feb 24, 2020 at 09:55:18PM +0100, Philippe Mathieu-Daudé wrote:
> The scripts/coccinelle/memory-region-housekeeping.cocci reported:
> * TODO [[view:./hw/ppc/ppc405_boards.c::face=ovl-face1::linb=195::colb=8::cole=30][potential use of memory_region_init_rom*() in ./hw/ppc/ppc405_boards.c::195]]
> * TODO [[view:./hw/ppc/ppc405_boards.c::face=ovl-face1::linb=464::colb=8::cole=30][potential use of memory_region_init_rom*() in ./hw/ppc/ppc405_boards.c::464]]
> 
> We can indeed replace the memory_region_init_ram() and
> memory_region_set_readonly() calls by memory_region_init_rom().
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Acked-by: David Gibson 

> ---
>  hw/ppc/ppc405_boards.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
> index 1f721feed6..5afe023253 100644
> --- a/hw/ppc/ppc405_boards.c
> +++ b/hw/ppc/ppc405_boards.c
> @@ -192,7 +192,7 @@ static void ref405ep_init(MachineState *machine)
>  #endif
>  {
>  bios = g_new(MemoryRegion, 1);
> -memory_region_init_ram(bios, NULL, "ef405ep.bios", BIOS_SIZE,
> +memory_region_init_rom(bios, NULL, "ef405ep.bios", BIOS_SIZE,
> &error_fatal);
>  
>  if (bios_name == NULL)
> @@ -216,7 +216,6 @@ static void ref405ep_init(MachineState *machine)
>  /* Avoid an uninitialized variable warning */
>  bios_size = -1;
>  }
> -memory_region_set_readonly(bios, true);
>  }
>  /* Register FPGA */
>  ref405ep_fpga_init(sysmem, 0xF030);
> @@ -461,7 +460,7 @@ static void taihu_405ep_init(MachineState *machine)
>  if (bios_name == NULL)
>  bios_name = BIOS_FILENAME;
>  bios = g_new(MemoryRegion, 1);
> -memory_region_init_ram(bios, NULL, "taihu_405ep.bios", BIOS_SIZE,
> +memory_region_init_rom(bios, NULL, "taihu_405ep.bios", BIOS_SIZE,
> &error_fatal);
>  filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
>  if (filename) {
> @@ -479,7 +478,6 @@ static void taihu_405ep_init(MachineState *machine)
>  error_report("Could not load PowerPC BIOS '%s'", bios_name);
>  exit(1);
>  }
> -memory_region_set_readonly(bios, true);
>  }
>  /* Register Linux flash */
>  dinfo = drive_get(IF_PFLASH, 0, fl_idx);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH RESEND v2 11/32] hw/ppc: Use memory_region_init_rom() with read-only regions

2020-02-24 Thread David Gibson
On Mon, Feb 24, 2020 at 09:55:12PM +0100, Philippe Mathieu-Daudé wrote:
> This commit was produced with the Coccinelle script
> scripts/coccinelle/memory-region-housekeeping.cocci.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Acked-by: David Gibson 

> ---
>  hw/ppc/mac_newworld.c | 3 +--
>  hw/ppc/mac_oldworld.c | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> index 464d012103..566413e479 100644
> --- a/hw/ppc/mac_newworld.c
> +++ b/hw/ppc/mac_newworld.c
> @@ -156,13 +156,12 @@ static void ppc_core99_init(MachineState *machine)
>  memory_region_add_subregion(get_system_memory(), 0, ram);
>  
>  /* allocate and load BIOS */
> -memory_region_init_ram(bios, NULL, "ppc_core99.bios", BIOS_SIZE,
> +memory_region_init_rom(bios, NULL, "ppc_core99.bios", BIOS_SIZE,
> &error_fatal);
>  
>  if (bios_name == NULL)
>  bios_name = PROM_FILENAME;
>  filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> -memory_region_set_readonly(bios, true);
>  memory_region_add_subregion(get_system_memory(), PROM_ADDR, bios);
>  
>  /* Load OpenBIOS (ELF) */
> diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
> index 7318d7e9b4..8b22ff60b8 100644
> --- a/hw/ppc/mac_oldworld.c
> +++ b/hw/ppc/mac_oldworld.c
> @@ -132,13 +132,12 @@ static void ppc_heathrow_init(MachineState *machine)
>  memory_region_add_subregion(sysmem, 0, ram);
>  
>  /* allocate and load BIOS */
> -memory_region_init_ram(bios, NULL, "ppc_heathrow.bios", BIOS_SIZE,
> +memory_region_init_rom(bios, NULL, "ppc_heathrow.bios", BIOS_SIZE,
> &error_fatal);
>  
>  if (bios_name == NULL)
>  bios_name = PROM_FILENAME;
>  filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> -memory_region_set_readonly(bios, true);
>  memory_region_add_subregion(sysmem, PROM_ADDR, bios);
>  
>  /* Load OpenBIOS (ELF) */





Re: [PATCH RESEND v2 10/32] hw/pci-host: Use memory_region_init_rom() with read-only regions

2020-02-24 Thread David Gibson
On Mon, Feb 24, 2020 at 09:55:11PM +0100, Philippe Mathieu-Daudé wrote:
> This commit was produced with the Coccinelle script
> scripts/coccinelle/memory-region-housekeeping.cocci.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Acked-by: David Gibson 

> ---
>  hw/pci-host/prep.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
> index 1aff72bec6..1a02e9a670 100644
> --- a/hw/pci-host/prep.c
> +++ b/hw/pci-host/prep.c
> @@ -325,9 +325,8 @@ static void raven_realize(PCIDevice *d, Error **errp)
>  d->config[0x0D] = 0x10; // latency_timer
>  d->config[0x34] = 0x00; // capabilities_pointer
>  
> -memory_region_init_ram_nomigrate(&s->bios, OBJECT(s), "bios", BIOS_SIZE,
> -   &error_fatal);
> -memory_region_set_readonly(&s->bios, true);
> +memory_region_init_rom_nomigrate(&s->bios, OBJECT(s), "bios", BIOS_SIZE,
> + &error_fatal);
>  memory_region_add_subregion(get_system_memory(), (uint32_t)(-BIOS_SIZE),
>  &s->bios);
>  if (s->bios_name) {





Re: [PATCH] spapr: Rework hash<->radix transitions at CAS

2020-02-24 Thread David Gibson
On Mon, Feb 24, 2020 at 12:18:27PM +0100, Greg Kurz wrote:
> On Wed, 19 Feb 2020 10:21:05 +1100
> David Gibson  wrote:
> 
> > On Fri, Feb 14, 2020 at 07:19:00PM +0100, Greg Kurz wrote:
> > > On Fri, 14 Feb 2020 09:28:35 +1100
> > > David Gibson  wrote:
> > > 
> > > > On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> > > > > Until the CAS negotiation is over, an HPT can be allocated on three
> > > > > different paths:
> > > > > 
> > > > > 1) during machine reset if the host doesn't support radix,
> > > > > 
> > > > > 2) during CAS if the guest wants hash and doesn't support HPT 
> > > > > resizing,
> > > > >in which case we pre-emptively resize the HPT to accommodate maxram,
> > > > > 
> > > > > 3) during CAS if no CAS reboot was requested, the guest wants hash but
> > > > >we're currently configured for radix.
> > > > > 
> > > > > Depending on the various combinations of host or guest MMU support,
> > > > > HPT resizing guest support and the possibility of a CAS reboot, it
> > > > > is quite hard to know which of these allocates the HPT that will
> > > > > be ultimately used by the guest that wants to do hash. Also, some of
> > > > > them have bugs:
> > > > > 
> > > > > - 2) calls spapr_reallocate_hpt() instead of 
> > > > > spapr_setup_hpt_and_vrma()
> > > > >   and thus doesn't update the VRMA size, even though we've just 
> > > > > extended
> > > > >   the HPT. Not sure what issues this can cause,
> > > > > 
> > > > > - 3) doesn't check for HPT resizing support and will always allocate a
> > > > >   small HPT based on the initial RAM size. This caps the total amount 
> > > > > of
> > > > >   RAM the guest can see, especially if maxram is much higher than the
> > > > >   initial ram.
> > > > > 
> > > > > We only support guests that do CAS and we already assume that the HPT
> > > > > isn't being used when we do the pre-emptive resizing at CAS. It thus
> > > > > seems reasonable to only allocate the HPT at the end of CAS, when no
> > > > > CAS reboot was requested.
> > > > > 
> > > > > Consolidate the logic so that we only create the HPT during 3), ie.
> > > > > when we're done with the CAS reboot cycles, and ensure HPT resizing
> > > > > is taken into account. This fixes the radix->hash transition for
> > > > > all cases.
> > > > 
> > > > Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
> > > > need the HPT at all times there, or there's nowhere to put VRMA
> > > > entries, so we can't run even in real mode.
> > > > 
> > > 
> > > Well it happens to be working anyway because KVM automatically
> > > creates an HPT (default size 16MB) in kvmppc_hv_setup_htab_rma()
> > > if QEMU didn't do so already... Would a comment to emphasize this
> > > be enough or do you prefer I don't drop the HPT allocation currently
> > > performed at machine reset ?
> > 
> > Relying on the automatic allocation is not a good idea.  With host
> > kernels before HPT resizing, once that automatic allocation happens,
> > we can't change the HPT size *at all*, even with a reset or CAS.
> > 
> 
> Ah ok I see. With these older host kernels, we need QEMU to allocate the
> HPT to fit ms->maxram_size, which KVM doesn't know about, or we'll have
> troubles with VMs that would need a bigger HPT.

Exactly.

> And I guess we want to
> support bigger VMs with pre-4.11 host kernels.

Well, with the usual sizing rules, a 16MiB HPT only supports a 2GiB
guest, so it doesn't have to be a particularly large VM to trip this.
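The arithmetic behind that figure can be sketched as follows (a hedged approximation: the 1/128 ratio is the usual rule of thumb mentioned above, and the 256 KiB floor is an assumption for the example, not taken from QEMU's code):

```python
def hpt_size_for_ram(ram_bytes, ratio=128):
    """Round RAM/ratio up to a power of two, with an assumed 256 KiB floor."""
    size = 256 * 1024
    while size < ram_bytes // ratio:
        size <<= 1  # HPT sizes are powers of two
    return size

# A 2 GiB guest lands exactly on the default 16 MiB HPT:
print(hpt_size_for_ram(2 * 1024 ** 3) // (1024 ** 2))  # 16
```

So a guest only slightly larger than 2 GiB already needs more than the automatically allocated 16 MiB table.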

> 
> > So, yes, the current code is annoyingly complex, but it's that way for
> > a reason.
> > 
> 
> My motivation here is to get rid of CAS reboot... it definitely needs more
> thinking on my side.

Yeah, I think so.

> 
> > > > > The guest can theoretically call CAS several times, without a CAS
> > > > > reboot in between. Linux guests don't do that, but better safe than
> > > > > sorry, let's ensure we can also handle the symmetrical hash->radix
> > > > > transition correctly: free the HPT and set the GR bit in PATE.
> > > > > A helper is introduced for the latter since this is already what
> > > > > we do during machine reset when going for radix.
> > > > > 
> > > > > As a bonus, this removes one user of spapr->cas_reboot, which we
> > > > > want to get rid of in the future.
> > > > > 
> > > > > Signed-off-by: Greg Kurz 
> > > > > ---
> > > > >  hw/ppc/spapr.c |   25 +++-
> > > > >  hw/ppc/spapr_hcall.c   |   59 
> > > > > 
> > > > >  include/hw/ppc/spapr.h |1 +
> > > > >  3 files changed, 44 insertions(+), 41 deletions(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index 828e2cc1359a..88bc0e4e3ca1 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -1573,9 +1573,19 @@ void 
> > > > > spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
> > > > >  {
> > > > >  int hpt_shift;
> > > > >  
> > > > > +/*
> > > > > + * HPT resizing is a bit of a 

Re: [PATCH v7 2/9] hw/core/clock-vmstate: define a vmstate entry for clock state

2020-02-24 Thread Alistair Francis
On Mon, Feb 24, 2020 at 9:06 AM Damien Hedde  wrote:
>
> Signed-off-by: Damien Hedde 
> Reviewed-by: Peter Maydell 
> Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> --
>
> v7: remove leading underscores in macro args
> ---
>  include/hw/clock.h  |  9 +
>  hw/core/clock-vmstate.c | 25 +
>  hw/core/Makefile.objs   |  1 +
>  3 files changed, 35 insertions(+)
>  create mode 100644 hw/core/clock-vmstate.c
>
> diff --git a/include/hw/clock.h b/include/hw/clock.h
> index 30ac9a9946..8c191751a1 100644
> --- a/include/hw/clock.h
> +++ b/include/hw/clock.h
> @@ -74,6 +74,15 @@ struct Clock {
>  QLIST_ENTRY(Clock) sibling;
>  };
>
> +/*
> + * vmstate description entry to be added in device vmsd.
> + */
> +extern const VMStateDescription vmstate_clock;
> +#define VMSTATE_CLOCK(field, state) \
> +VMSTATE_CLOCK_V(field, state, 0)
> +#define VMSTATE_CLOCK_V(field, state, version) \
> +VMSTATE_STRUCT_POINTER_V(field, state, version, vmstate_clock, Clock)
> +
>  /**
>   * clock_setup_canonical_path:
>   * @clk: clock
> diff --git a/hw/core/clock-vmstate.c b/hw/core/clock-vmstate.c
> new file mode 100644
> index 0000000000..260b13fc2c
> --- /dev/null
> +++ b/hw/core/clock-vmstate.c
> @@ -0,0 +1,25 @@
> +/*
> + * Clock migration structure
> + *
> + * Copyright GreenSocs 2019-2020
> + *
> + * Authors:
> + *  Damien Hedde
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "migration/vmstate.h"
> +#include "hw/clock.h"
> +
> +const VMStateDescription vmstate_clock = {
> +.name = "clock",
> +.version_id = 0,
> +.minimum_version_id = 0,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT64(period, Clock),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
> index d7080edf89..e3d796fdd4 100644
> --- a/hw/core/Makefile.objs
> +++ b/hw/core/Makefile.objs
> @@ -22,6 +22,7 @@ common-obj-$(CONFIG_SOFTMMU) += null-machine.o
>  common-obj-$(CONFIG_SOFTMMU) += loader.o
>  common-obj-$(CONFIG_SOFTMMU) += machine-hmp-cmds.o
>  common-obj-$(CONFIG_SOFTMMU) += numa.o
> +common-obj-$(CONFIG_SOFTMMU) += clock-vmstate.o
>  obj-$(CONFIG_SOFTMMU) += machine-qmp-cmds.o
>
>  common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o
> --
> 2.24.1
>
>



Re: [PATCH v6 05/18] target/ppc: Introduce ppc_hash64_use_vrma() helper

2020-02-24 Thread Fabiano Rosas
David Gibson  writes:

> When running guests under a hypervisor, the hypervisor obviously needs to
> be protected from guest accesses even if those are in what the guest
> considers real mode (translation off).  The POWER hardware provides two
> ways of doing that: The old way has guest real mode accesses simply offset
> and bounds checked into host addresses.  It works, but requires that a
> significant chunk of the guest's memory - the RMA - be physically
> contiguous in the host, which is pretty inconvenient.  The new way, known
> as VRMA, has guest real mode accesses translated in roughly the normal way
> but with some special parameters.
>
> In POWER7 and POWER8 the LPCR[VPM0] bit selected between the two modes, but
> in POWER9 only VRMA mode is supported and LPCR[VPM0] no longer exists.  We
> handle that difference in behaviour in ppc_hash64_set_isi().. but not in
> other places that we blindly check LPCR[VPM0].
>
> Correct those instances with a new helper to tell if we should be in VRMA
> mode.
>
> Signed-off-by: David Gibson 
> Reviewed-by: Cédric Le Goater 

Reviewed-by: Fabiano Rosas 

> ---
>  target/ppc/mmu-hash64.c | 43 -
>  1 file changed, 21 insertions(+), 22 deletions(-)
>
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index 392f90e0ae..e372c42add 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -668,6 +668,21 @@ unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU 
> *cpu,
>  return 0;
>  }
>  
> +static bool ppc_hash64_use_vrma(CPUPPCState *env)
> +{
> +switch (env->mmu_model) {
> +case POWERPC_MMU_3_00:
> +/*
> + * ISAv3.0 (POWER9) always uses VRMA, the VPM0 field and RMOR
> + * register no longer exist
> + */
> +return true;
> +
> +default:
> +return !!(env->spr[SPR_LPCR] & LPCR_VPM0);
> +}
> +}
> +
>  static void ppc_hash64_set_isi(CPUState *cs, uint64_t error_code)
>  {
>  CPUPPCState *env = &POWERPC_CPU(cs)->env;
> @@ -676,15 +691,7 @@ static void ppc_hash64_set_isi(CPUState *cs, uint64_t 
> error_code)
>  if (msr_ir) {
>  vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM1);
>  } else {
> -switch (env->mmu_model) {
> -case POWERPC_MMU_3_00:
> -/* Field deprecated in ISAv3.00 - interrupts always go to hyperv */
> -vpm = true;
> -break;
> -default:
> -vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM0);
> -break;
> -}
> +vpm = ppc_hash64_use_vrma(env);
>  }
>  if (vpm && !msr_hv) {
>  cs->exception_index = POWERPC_EXCP_HISI;
> @@ -702,15 +709,7 @@ static void ppc_hash64_set_dsi(CPUState *cs, uint64_t 
> dar, uint64_t dsisr)
>  if (msr_dr) {
>  vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM1);
>  } else {
> -switch (env->mmu_model) {
> -case POWERPC_MMU_3_00:
> -/* Field deprecated in ISAv3.00 - interrupts always go to hyperv */
> -vpm = true;
> -break;
> -default:
> -vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM0);
> -break;
> -}
> +vpm = ppc_hash64_use_vrma(env);
>  }
>  if (vpm && !msr_hv) {
>  cs->exception_index = POWERPC_EXCP_HDSI;
> @@ -799,7 +798,7 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr 
> eaddr,
>  if (!(eaddr >> 63)) {
>  raddr |= env->spr[SPR_HRMOR];
>  }
> -} else if (env->spr[SPR_LPCR] & LPCR_VPM0) {
> +} else if (ppc_hash64_use_vrma(env)) {
>  /* Emulated VRMA mode */
> slb = &cpu->vrma_slb;
>  if (!slb->sps) {
> @@ -967,7 +966,7 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, 
> target_ulong addr)
>  } else if ((msr_hv || !env->has_hv_mode) && !(addr >> 63)) {
>  /* In HV mode, add HRMOR if top EA bit is clear */
>  return raddr | env->spr[SPR_HRMOR];
> -} else if (env->spr[SPR_LPCR] & LPCR_VPM0) {
> +} else if (ppc_hash64_use_vrma(env)) {
>  /* Emulated VRMA mode */
> slb = &cpu->vrma_slb;
>  if (!slb->sps) {
> @@ -1056,8 +1055,7 @@ static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
>  slb->sps = NULL;
>  
>  /* Is VRMA enabled ? */
> -lpcr = env->spr[SPR_LPCR];
> -if (!(lpcr & LPCR_VPM0)) {
> +if (!ppc_hash64_use_vrma(env)) {
>  return;
>  }
>  
> @@ -1065,6 +1063,7 @@ static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
>   * Make one up. Mostly ignore the ESID which will not be needed
>   * for translation
>   */
> +lpcr = env->spr[SPR_LPCR];
>  vsid = SLB_VSID_VRMA;
>  vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
>  vsid |= (vrmasd << 4) & (SLB_VSID_L | SLB_VSID_LP);



Re: [PATCH v7 1/9] hw/core/clock: introduce clock object

2020-02-24 Thread Alistair Francis
On Mon, Feb 24, 2020 at 9:05 AM Damien Hedde  wrote:
>
> This object may be used to represent a clock inside a clock tree.
>
> A clock may be connected to another clock so that it receives update,
> through a callback, whenever the source/parent clock is updated.
>
> Although only the root clock of a clock tree controls the values
> (represented as periods) of all clocks in the tree, each clock holds
> a local state containing the current value so that it can be fetched
> independently. It will allow us to fulfill migration requirements
> by migrating each clock independently of others.
>
> This is based on the original work of Frederic Konrad.
>
> Signed-off-by: Damien Hedde 
> --
>
> v7:
> + merge ClockIn & ClockOut into a single type Clock
> + switch clock state to a period with 2^-32ns unit
> + add some Hz and ns helpers
> + propagate clock period when setting the source so that
>   clocks with fixed period are easy to handle.
> ---
>  include/hw/clock.h| 216 ++
>  hw/core/clock.c   | 131 +
>  hw/core/Makefile.objs |   2 +
>  hw/core/trace-events  |   7 ++
>  4 files changed, 356 insertions(+)
>  create mode 100644 include/hw/clock.h
>  create mode 100644 hw/core/clock.c
>
> diff --git a/include/hw/clock.h b/include/hw/clock.h
> new file mode 100644
> index 0000000000..30ac9a9946
> --- /dev/null
> +++ b/include/hw/clock.h
> @@ -0,0 +1,216 @@
> +/*
> + * Hardware Clocks
> + *
> + * Copyright GreenSocs 2016-2020
> + *
> + * Authors:
> + *  Frederic Konrad
> + *  Damien Hedde
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_HW_CLOCK_H
> +#define QEMU_HW_CLOCK_H
> +
> +#include "qom/object.h"
> +#include "qemu/queue.h"
> +
> +#define TYPE_CLOCK "clock"
> +#define CLOCK(obj) OBJECT_CHECK(Clock, (obj), TYPE_CLOCK)
> +
> +typedef void ClockCallback(void *opaque);
> +
> +/*
> + * clock store a value representing the clock's period in 2^-32ns unit.
> + * It can represent:
> + *  + periods from 2^-32ns up to 4seconds
> + *  + frequency from ~0.25Hz to 2e10Ghz
> + * Resolution of frequency representation decreases with frequency:
> + * + at 100MHz, resolution is ~2mHz
> + * + at 1Ghz,   resolution is ~0.2Hz
> + * + at 10Ghz,  resolution is ~20Hz
> + */
> +#define CLOCK_SECOND (1000000000llu << 32)
> +
> +/*
> + * macro helpers to convert to hertz / nanosecond
> + */
> +#define CLOCK_PERIOD_FROM_NS(ns) ((ns) * (CLOCK_SECOND / 1000000000llu))
> +#define CLOCK_PERIOD_TO_NS(per) ((per) / (CLOCK_SECOND / 1000000000llu))
> +#define CLOCK_PERIOD_FROM_HZ(hz) (((hz) != 0) ? CLOCK_SECOND / (hz) : 0u)
> +#define CLOCK_PERIOD_TO_HZ(per) (((per) != 0) ? CLOCK_SECOND / (per) : 0u)
> +
> +/**
> + * Clock:
> + * @parent_obj: parent class
> + * @period: unsigned integer representing the period of the clock
> + * @canonical_path: clock path string cache (used for trace purpose)
> + * @callback: called when clock changes
> + * @callback_opaque: argument for @callback
> + * @source: source (or parent in clock tree) of the clock
> + * @children: list of clocks connected to this one (it is their source)
> + * @sibling: structure used to form a clock list
> + */
> +
> +typedef struct Clock Clock;
> +
> +struct Clock {
> +/*< private >*/
> +Object parent_obj;
> +
> +/* all fields are private and should not be modified directly */
> +
> +/* fields */
> +uint64_t period;
> +char *canonical_path;
> +ClockCallback *callback;
> +void *callback_opaque;
> +
> +/* Clocks are organized in a clock tree */
> +Clock *source;
> +QLIST_HEAD(, Clock) children;
> +QLIST_ENTRY(Clock) sibling;
> +};
> +
> +/**
> + * clock_setup_canonical_path:
> + * @clk: clock
> + *
> + * compute the canonical path of the clock (used by log messages)
> + */
> +void clock_setup_canonical_path(Clock *clk);
> +
> +/**
> + * clock_add_callback:

s/clock_add_callback/clock_set_callback/g

> + * @clk: the clock to register the callback into
> + * @cb: the callback function
> + * @opaque: the argument to the callback
> + *
> + * Register a callback called on every clock update.
> + */
> +void clock_set_callback(Clock *clk, ClockCallback *cb, void *opaque);
> +
> +/**
> + * clock_clear_callback:
> + * @clk: the clock to delete the callback from
> + *
> + * Unregister the callback registered with clock_set_callback.
> + */
> +void clock_clear_callback(Clock *clk);
> +
> +/**
> + * clock_set_source:
> + * @clk: the clock.
> + * @src: the source clock
> + *
> + * Setup @src as the clock source of @clk. The current @src period
> + * value is also copied to @clk and its subtree but not callback is

s/not/no/g

> + * called.
> + * Further @src update will be propagated to @clk and its subtree.
> + */
> +void clock_set_source(Clock *clk, Clock *src);
> +
> +/**
> + * clock_set:
> + * @clk: the clock to initialize.
> + * @value: the clock's value, 0 means 
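
As an aside, the fixed-point encoding described in the comment above can be
sanity-checked with a small stand-alone sketch. The macro names mirror the
ones in the diff; the CLOCK_SECOND value is reconstructed here from the
stated 2^-32 ns unit (one second = 10^9 ns << 32), so treat it as an
assumption rather than a quote of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* One second expressed in the clock's 2^-32 ns period unit. */
#define CLOCK_SECOND (1000000000llu << 32)

/* Conversion helpers, mirroring the macros in the patch. */
#define CLOCK_PERIOD_FROM_NS(ns) ((ns) * (CLOCK_SECOND / 1000000000llu))
#define CLOCK_PERIOD_TO_NS(per)  ((per) / (CLOCK_SECOND / 1000000000llu))
#define CLOCK_PERIOD_FROM_HZ(hz) (((hz) != 0) ? CLOCK_SECOND / (hz) : 0u)
#define CLOCK_PERIOD_TO_HZ(per)  (((per) != 0) ? CLOCK_SECOND / (per) : 0u)
```

Round-tripping 100MHz through FROM_HZ/TO_HZ is exact because 10^9 is
divisible by 10^8; odd frequencies lose precision in line with the
resolution figures given in the comment.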

[PATCH v6 16/18] spapr: Don't clamp RMA to 16GiB on new machine types

2020-02-24 Thread David Gibson
In spapr_machine_init() we clamp the size of the RMA to 16GiB and the
comment saying why doesn't make a whole lot of sense.  In fact, this was
done because the real mode handling code elsewhere limited the RMA in TCG
mode to the maximum value configurable in LPCR[RMLS], 16GiB.

But,
 * Actually LPCR[RMLS] has been able to encode a 256GiB size for a very
   long time, we just didn't implement it properly in the softmmu
 * LPCR[RMLS] shouldn't really be relevant anyway, it only was because we
   used to abuse the RMOR based translation mode in order to handle the
   fact that we're not modelling the hypervisor parts of the cpu

We've now removed those limitations in the modelling so the 16GiB clamp no
longer serves a function.  However, we can't just remove the limit
universally: that would break migration to earlier qemu versions, where
the 16GiB RMLS limit still applies, no matter how bad the reasons for it
are.

So, we replace the 16GiB clamp, with a clamp to a limit defined in the
machine type class.  We set it to 16 GiB for machine types 4.2 and earlier,
but set it to 0 meaning unlimited for the new 5.0 machine type.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 13 -
 include/hw/ppc/spapr.h |  1 +
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4dab489931..6e9f15f64d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2702,12 +2702,14 @@ static void spapr_machine_init(MachineState *machine)
 
 spapr->rma_size = node0_size;
 
-/* Actually we don't support unbounded RMA anymore since we added
- * proper emulation of HV mode. The max we can get is 16G which
- * also happens to be what we configure for PAPR mode so make sure
- * we don't do anything bigger than that
+/*
+ * Clamp the RMA size based on machine type.  This is for
+ * migration compatibility with older qemu versions, which limited
+ * the RMA size for complicated and mostly bad reasons.
  */
-spapr->rma_size = MIN(spapr->rma_size, 0x400000000ull);
+if (smc->rma_limit) {
+spapr->rma_size = MIN(spapr->rma_size, smc->rma_limit);
+}
 
 if (spapr->rma_size > node0_size) {
 error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
@@ -4600,6 +4602,7 @@ static void spapr_machine_4_2_class_options(MachineClass 
*mc)
 compat_props_add(mc->compat_props, hw_compat_4_2, hw_compat_4_2_len);
 smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
 smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
+smc->rma_limit = 16 * GiB;
 mc->nvdimm_supported = false;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index fc49c1a710..8a44a1f488 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -126,6 +126,7 @@ struct SpaprMachineClass {
 bool pre_4_1_migration; /* don't migrate hpt-max-page-size */
 bool linux_pci_probe;
 bool smp_threads_vsmt; /* set VSMT to smp_threads by default */
+hwaddr rma_limit;  /* clamp the RMA to this size */
 
 void (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
   uint64_t *buid, hwaddr *pio, 
-- 
2.24.1




Re: [PATCH RESEND 1/3] vfio/pci: fix a null pointer reference in vfio_rom_read

2020-02-24 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2020/2/25 0:04, Alex Williamson wrote:
> On Mon, 24 Feb 2020 14:42:17 +0800
> "Longpeng(Mike)"  wrote:
> 
>> From: Longpeng 
>>
>> vfio_pci_load_rom() may fail, leaving vdev->rom NULL in some
>> situations (though I've not encountered this yet), so maybe we should
>> avoid the VM abort.
>>
>> Signed-off-by: Longpeng 
>> ---
>>  hw/vfio/pci.c | 13 -
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 5e75a95..ed798ae 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -768,7 +768,7 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
>>  }
>>  }
>>  
>> -static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>> +static bool vfio_pci_load_rom(VFIOPCIDevice *vdev)
>>  {
>>  struct vfio_region_info *reg_info;
>>  uint64_t size;
>> @@ -778,7 +778,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>>  if (vfio_get_region_info(&vdev->vbasedev,
>>   VFIO_PCI_ROM_REGION_INDEX, _info)) {
>>  error_report("vfio: Error getting ROM info: %m");
>> -return;
>> +return false;
>>  }
>>  
>>  trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info->size,
>> @@ -797,7 +797,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>>  error_printf("Device option ROM contents are probably invalid "
>>  "(check dmesg).\nSkip option ROM probe with rombar=0, "
>>  "or load from file with romfile=\n");
>> -return;
>> +return false;
>>  }
>>  
>>  vdev->rom = g_malloc(size);
>> @@ -849,6 +849,8 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>>  data[6] = -csum;
>>  }
>>  }
>> +
>> +return true;
>>  }
>>  
>>  static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
>> @@ -863,8 +865,9 @@ static uint64_t vfio_rom_read(void *opaque, hwaddr addr, 
>> unsigned size)
>>  uint64_t data = 0;
>>  
>>  /* Load the ROM lazily when the guest tries to read it */
>> -if (unlikely(!vdev->rom && !vdev->rom_read_failed)) {
>> -vfio_pci_load_rom(vdev);
>> +if (unlikely(!vdev->rom && !vdev->rom_read_failed) &&
>> +!vfio_pci_load_rom(vdev)) {
>> +return 0;
>>  }
>>  
>>  memcpy(&data, vdev->rom + addr,
> 
> Looks like an obvious bug, until you look at the rest of this memcpy():
> 
> memcpy(&data, vdev->rom + addr,
>(addr < vdev->rom_size) ? MIN(size, vdev->rom_size - addr) : 0);
> 
> IOW, we'll do a zero sized memcpy() if rom_size is zero, so there's no
> risk of the concern identified in the commit log.  This patch is
> unnecessary.  Thanks,
> 
Oh, I missed that, sorry for making the noise, thanks

> Alex
> 
> .
> 
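
For clarity, the guard Alex is pointing at can be reduced to a stand-alone
sketch (a hypothetical reconstruction, not the vfio code itself): when addr
falls outside rom_size — including the rom == NULL, rom_size == 0 case —
the copy length collapses to zero and the ROM buffer is never dereferenced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Reads up to 'size' bytes at 'addr', clamped to the ROM bounds;
 * an out-of-range read degenerates to a zero-length memcpy(). */
static uint64_t rom_read(const uint8_t *rom, uint64_t rom_size,
                         uint64_t addr, unsigned size)
{
    uint64_t data = 0;

    memcpy(&data, rom + addr,
           (addr < rom_size) ? MIN((uint64_t)size, rom_size - addr) : 0);
    return data;
}
```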



[PATCH v6 15/18] spapr: Don't attempt to clamp RMA to VRMA constraint

2020-02-24 Thread David Gibson
The Real Mode Area (RMA) is the part of memory which a guest can access
when in real (MMU off) mode.  Of course, for a guest under KVM, the MMU
isn't really turned off, it's just in a special translation mode - Virtual
Real Mode Area (VRMA) - which looks like real mode in guest mode.

The mechanics of how this works when using the hash MMU (HPT) put a
constraint on the size of the RMA, which depends on the size of the
HPT.  So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA
we advertise to the guest based on this VRMA limit.

There are several things wrong with this:
 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
of Node 0 memory size and the VRMA limit.  That will *often* work the
same as clamping, but there can be other constraints on RMA size which
supersede Node 0 memory size.  We have real bugs caused by this
(currently worked around in the guest kernel)
 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
we're past the point that we can actually advertise an RMA limit to the
guest
 3) But most fundamentally, the VRMA limit depends on host configuration
(page size) which shouldn't be visible to the guest, but this partially
exposes it.  This can cause problems with migration in certain edge
cases, although we will mostly get away with it.

In practice, this clamping is almost never applied anyway.  With 64kiB
pages and the normal rules for sizing of the HPT, the theoretical VRMA
limit will be 4x(guest memory size) and so never hit.  It will hit with
4kiB pages, where it will be (guest memory size)/4.  However all mainstream
distro kernels for POWER have used a 64kiB page size for at least 10 years.

So, simply replace this logic with a check that the RMA we've calculated
based only on guest visible configuration will fit within the host implied
VRMA limit.  This can break if running HPT guests on a host kernel with
4kiB page size.  As noted that's very rare.  There also exist several
possible workarounds:
  * Change the host kernel to use 64kiB pages
  * Use radix MMU (RPT) guests instead of HPT
  * Use 64kiB hugepages on the host to back guest memory
  * Increase the guest memory size so that the RMA hits one of the fixed
limits before the RMA limit.  This is relatively easy on POWER8 which
has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
  * Use a guest NUMA configuration which artificially constrains the RMA
within the VRMA limit (the RMA must always fit within Node 0).
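
The 4x / one-quarter figures above follow directly from the limit formula
introduced in patch 14 (limit = 1 << (page_shift + hash_shift - 7))
combined with the usual HPT sizing rule HPT = RAM/128, i.e.
hash_shift = log2(ram) - 7. A sketch under those assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* VRMA-derived RMA limit, following kvmppc_vrma_limit() from patch 14. */
static uint64_t vrma_limit(unsigned page_shift, unsigned hash_shift)
{
    return 1ULL << (page_shift + hash_shift - 7);
}
```

For a 16GiB guest (hash_shift = 34 - 7 = 27), 64kiB host pages
(page_shift 16) give a 64GiB limit — 4x guest RAM, so never hit — while
4kiB pages (page_shift 12) give 4GiB, i.e. guest RAM / 4.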

Previously, on KVM, we also temporarily reduced the rma_size to 256M so
that we'd load the kernel and initrd safely, regardless of the VRMA
limit.  This was a) confusing, b) could significantly limit the size of
images we could load and c) introduced a behavioural difference between
KVM and TCG.  So we remove that as well.

Signed-off-by: David Gibson 
Reviewed-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr.c | 28 ++--
 hw/ppc/spapr_hcall.c   |  4 ++--
 include/hw/ppc/spapr.h |  3 +--
 3 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b68d80ba69..4dab489931 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1569,7 +1569,7 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int 
shift,
 spapr_set_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
 }
 
-void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
+void spapr_setup_hpt(SpaprMachineState *spapr)
 {
 int hpt_shift;
 
@@ -1585,10 +1585,16 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
 }
 spapr_reallocate_hpt(spapr, hpt_shift, _fatal);
 
-if (spapr->vrma_adjust) {
+if (kvm_enabled()) {
 hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
 
-spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
+/* Check our RMA fits in the possible VRMA */
+if (vrma_limit < spapr->rma_size) {
+error_report("Unable to create %" HWADDR_PRIu
+ "MiB RMA (VRMA only allows %" HWADDR_PRIu "MiB",
+ spapr->rma_size / MiB, vrma_limit / MiB);
+exit(EXIT_FAILURE);
+}
 }
 }
 
@@ -1628,7 +1634,7 @@ static void spapr_machine_reset(MachineState *machine)
 spapr->patb_entry = PATE1_GR;
 spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
 } else {
-spapr_setup_hpt_and_vrma(spapr);
+spapr_setup_hpt(spapr);
 }
 
 qemu_devices_reset();
@@ -2696,20 +2702,6 @@ static void spapr_machine_init(MachineState *machine)
 
 spapr->rma_size = node0_size;
 
-/* With KVM, we don't actually know whether KVM supports an
- * unbounded RMA (PR KVM) or is limited by the hash table size
- * (HV KVM using VRMA), so we always assume the latter
- *
- * In that case, we also limit the initial allocations for RTAS
- * etc... to 256M since we have no way to know what the VRMA size
- * is 

[PATCH v6 14/18] spapr,ppc: Simplify signature of kvmppc_rma_size()

2020-02-24 Thread David Gibson
This function calculates the maximum size of the RMA as implied by the
host's page size and the structure of the VRMA (there are a number of other
constraints on the RMA size which will supersede this one in many
circumstances).

The current interface takes the current RMA size estimate, and clamps it
to the VRMA derived size.  The only current caller passes in an arguably
wrong value (it will match the current RMA estimate in some but not all
cases).

We want to fix that, but for now just keep concerns separated by having the
KVM helper function just return the VRMA derived limit, and let the caller
combine it with other constraints.  We call the new function
kvmppc_vrma_limit() to more clearly indicate its limited responsibility.

The helper should only ever be called in the KVM enabled case, so replace
its !CONFIG_KVM stub with an assert() rather than a dummy value.

Signed-off-by: David Gibson 
Reviewed-by: Cedric Le Goater 
Reviewed-by: Greg Kurz 
Reviewed-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr.c   | 5 +++--
 target/ppc/kvm.c | 5 ++---
 target/ppc/kvm_ppc.h | 7 +++
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 272a270b7a..b68d80ba69 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1586,8 +1586,9 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
 spapr_reallocate_hpt(spapr, hpt_shift, _fatal);
 
 if (spapr->vrma_adjust) {
-spapr->rma_size = kvmppc_rma_size(spapr_node0_size(MACHINE(spapr)),
-  spapr->htab_shift);
+hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
+
+spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
 }
 }
 
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 7f44b1aa1a..597f72be1b 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2113,7 +2113,7 @@ void kvmppc_error_append_smt_possible_hint(Error *const 
*errp)
 
 
 #ifdef TARGET_PPC64
-uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
+uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
 {
 struct kvm_ppc_smmu_info info;
 long rampagesize, best_page_shift;
@@ -2140,8 +2140,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned 
int hash_shift)
 }
 }
 
-return MIN(current_size,
-   1ULL << (best_page_shift + hash_shift - 7));
+return 1ULL << (best_page_shift + hash_shift - 7);
 }
 #endif
 
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 9e4f2357cc..332fa0aa1c 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -47,7 +47,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t 
page_shift,
   int *pfd, bool need_vfio);
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
-uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
+uint64_t kvmppc_vrma_limit(unsigned int hash_shift);
 bool kvmppc_has_cap_spapr_vfio(void);
 #endif /* !CONFIG_USER_ONLY */
 bool kvmppc_has_cap_epr(void);
@@ -255,10 +255,9 @@ static inline int kvmppc_reset_htab(int shift_hint)
 return 0;
 }
 
-static inline uint64_t kvmppc_rma_size(uint64_t current_size,
-   unsigned int hash_shift)
+static inline uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
 {
-return ram_size;
+g_assert_not_reached();
 }
 
 static inline bool kvmppc_hpt_needs_host_contiguous_pages(void)
-- 
2.24.1




[PATCH v6 18/18] spapr: Fold spapr_node0_size() into its only caller

2020-02-24 Thread David Gibson
The Real Mode Area (RMA) needs to fit within the NUMA node owning memory
at address 0.  That's usually node 0, but can be a later one if there are
some nodes which have no memory (only CPUs).

This is currently handled by the spapr_node0_size() helper.  It has only
one caller, so there's not a lot of point splitting it out.  It's also
extremely easy to misread the code as clamping to the size of the smallest
node rather than the first node with any memory.

So, fold it into the caller, and add some commentary to make it a bit
clearer exactly what it's doing.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f0354b699d..9ba645c9cb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -296,20 +296,6 @@ static void spapr_populate_pa_features(SpaprMachineState 
*spapr,
 _FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, pa_size)));
 }
 
-static hwaddr spapr_node0_size(MachineState *machine)
-{
-if (machine->numa_state->num_nodes) {
-int i;
-for (i = 0; i < machine->numa_state->num_nodes; ++i) {
-if (machine->numa_state->nodes[i].node_mem) {
-return MIN(pow2floor(machine->numa_state->nodes[i].node_mem),
-   machine->ram_size);
-}
-}
-}
-return machine->ram_size;
-}
-
 static void add_str(GString *s, const gchar *s1)
 {
 g_string_append_len(s, s1, strlen(s1) + 1);
@@ -2652,10 +2638,24 @@ static hwaddr spapr_rma_size(SpaprMachineState *spapr, 
Error **errp)
 {
 MachineState *machine = MACHINE(spapr);
 hwaddr rma_size = machine->ram_size;
-hwaddr node0_size = spapr_node0_size(machine);
 
 /* RMA has to fit in the first NUMA node */
-rma_size = MIN(rma_size, node0_size);
+if (machine->numa_state->num_nodes) {
+/*
+ * It's possible for there to be some zero-memory nodes first
+ * in the list.  We need the RMA to fit inside the memory of
+ * the first node which actually has some memory.
+ */
+int i;
+
+for (i = 0; i < machine->numa_state->num_nodes; ++i) {
+if (machine->numa_state->nodes[i].node_mem != 0) {
+rma_size = MIN(rma_size,
+   machine->numa_state->nodes[i].node_mem);
+break;
+}
+}
+}
 
 /*
  * VRMA access is via a special 1TiB SLB mapping, so the RMA can
@@ -2672,6 +2672,11 @@ static hwaddr spapr_rma_size(SpaprMachineState *spapr, 
Error **errp)
 spapr->rma_size = MIN(spapr->rma_size, smc->rma_limit);
 }
 
+/*
+ * RMA size must be a power of 2
+ */
+rma_size = pow2floor(rma_size);
+
 if (rma_size < (MIN_RMA_SLOF * MiB)) {
 error_setg(errp,
 "pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
-- 
2.24.1




[PATCH v6 17/18] spapr: Clean up RMA size calculation

2020-02-24 Thread David Gibson
Move the calculation of the Real Mode Area (RMA) size into a helper
function.  While we're there clean it up and correct it in a few ways:
  * Add comments making it clearer where the various constraints come from
  * Remove a pointless check that the RMA fits within Node 0 (we've just
clamped it so that it does)

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 59 ++
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6e9f15f64d..f0354b699d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2648,6 +2648,40 @@ static PCIHostState *spapr_create_default_phb(void)
 return PCI_HOST_BRIDGE(dev);
 }
 
+static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
+{
+MachineState *machine = MACHINE(spapr);
+hwaddr rma_size = machine->ram_size;
+hwaddr node0_size = spapr_node0_size(machine);
+
+/* RMA has to fit in the first NUMA node */
+rma_size = MIN(rma_size, node0_size);
+
+/*
+ * VRMA access is via a special 1TiB SLB mapping, so the RMA can
+ * never exceed that
+ */
+rma_size = MIN(rma_size, TiB);
+
+/*
+ * Clamp the RMA size based on machine type.  This is for
+ * migration compatibility with older qemu versions, which limited
+ * the RMA size for complicated and mostly bad reasons.
+ */
+if (smc->rma_limit) {
+spapr->rma_size = MIN(spapr->rma_size, smc->rma_limit);
+}
+
+if (rma_size < (MIN_RMA_SLOF * MiB)) {
+error_setg(errp,
+"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
+   MIN_RMA_SLOF);
+return -1;
+}
+
+return rma_size;
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void spapr_machine_init(MachineState *machine)
 {
@@ -2660,7 +2694,6 @@ static void spapr_machine_init(MachineState *machine)
 int i;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *ram = g_new(MemoryRegion, 1);
-hwaddr node0_size = spapr_node0_size(machine);
 long load_limit, fw_size;
 char *filename;
 Error *resize_hpt_err = NULL;
@@ -2700,22 +2733,7 @@ static void spapr_machine_init(MachineState *machine)
 exit(1);
 }
 
-spapr->rma_size = node0_size;
-
-/*
- * Clamp the RMA size based on machine type.  This is for
- * migration compatibility with older qemu versions, which limited
- * the RMA size for complicated and mostly bad reasons.
- */
-if (smc->rma_limit) {
-spapr->rma_size = MIN(spapr->rma_size, smc->rma_limit);
-}
-
-if (spapr->rma_size > node0_size) {
-error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
- spapr->rma_size);
-exit(1);
-}
+spapr->rma_size = spapr_rma_size(spapr, _fatal);
 
 /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
 load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
@@ -2954,13 +2972,6 @@ static void spapr_machine_init(MachineState *machine)
 }
 }
 
-if (spapr->rma_size < MIN_RMA_SLOF) {
-error_report(
-"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area memory)",
-MIN_RMA_SLOF / MiB);
-exit(1);
-}
-
 if (kernel_filename) {
 uint64_t lowaddr = 0;
 
-- 
2.24.1
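The new `spapr_rma_size()` helper is a chain of clamps. As a minimal standalone sketch of that chain (the function name, argument names, and the "0 means no machine-type limit" convention are invented for the example; only `MiB`/`GiB`/`TiB` match qemu's `qemu/units.h`):

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024)
#define GiB (1024 * MiB)
#define TiB (1024 * GiB)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Toy stand-in for the clamping chain in spapr_rma_size(): start from
 * the RAM size, clamp to NUMA node 0, then to the 1 TiB VRMA window,
 * then to any machine-type limit (0 meaning "none").
 */
static uint64_t rma_clamp(uint64_t ram_size, uint64_t node0_size,
                          uint64_t machine_limit)
{
    uint64_t rma = MIN(ram_size, node0_size);

    rma = MIN(rma, TiB);            /* VRMA is a 1 TiB SLB mapping */
    if (machine_limit) {
        rma = MIN(rma, machine_limit);
    }
    return rma;
}
```

Whichever clamp is tightest wins, so ordering of the `MIN()` calls does not matter; the explicit order just mirrors the comments in the patch.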




[PATCH v6 09/18] target/ppc: Streamline calculation of RMA limit from LPCR[RMLS]

2020-02-24 Thread David Gibson
Currently we use a big switch statement in ppc_hash64_update_rmls() to work
out what the right RMA limit is based on the LPCR[RMLS] field.  There's no
formula for this - it's just an arbitrary mapping defined by the existing
CPU implementations - but we can make it a bit more readable by using a
lookup table rather than a switch.  In addition we can use the MiB/GiB
symbols to make it a bit clearer.

While there we add a bit of clarity and rationale to the comment about
what happens if the LPCR[RMLS] doesn't contain a valid value.

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
---
 target/ppc/mmu-hash64.c | 71 -
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 0ef330a614..4f082d775d 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -18,6 +18,7 @@
  * License along with this library; if not, see .
  */
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
@@ -757,6 +758,39 @@ static void ppc_hash64_set_c(PowerPCCPU *cpu, hwaddr ptex, uint64_t pte1)
 stb_phys(CPU(cpu)->as, base + offset, (pte1 & 0xff) | 0x80);
 }
 
+static target_ulong rmls_limit(PowerPCCPU *cpu)
+{
+CPUPPCState *env = &cpu->env;
+/*
+ * This is the full 4 bits encoding of POWER8. Previous
+ * CPUs only support a subset of these but the filtering
+ * is done when writing LPCR
+ */
+const target_ulong rma_sizes[] = {
+[0] = 0,
+[1] = 16 * GiB,
+[2] = 1 * GiB,
+[3] = 64 * MiB,
+[4] = 256 * MiB,
+[5] = 0,
+[6] = 0,
+[7] = 128 * MiB,
+[8] = 32 * MiB,
+};
+target_ulong rmls = (env->spr[SPR_LPCR] & LPCR_RMLS) >> LPCR_RMLS_SHIFT;
+
+if (rmls < ARRAY_SIZE(rma_sizes)) {
+return rma_sizes[rmls];
+} else {
+/*
+ * Bad value, so the OS has shot itself in the foot.  Return a
+ * 0-sized RMA which we expect to trigger an immediate DSI or
+ * ISI
+ */
+return 0;
+}
+}
+
 int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
 int rwx, int mmu_idx)
 {
@@ -1006,41 +1040,6 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu, target_ulong ptex,
 cpu->env.tlb_need_flush = TLB_NEED_GLOBAL_FLUSH | TLB_NEED_LOCAL_FLUSH;
 }
 
-static void ppc_hash64_update_rmls(PowerPCCPU *cpu)
-{
-CPUPPCState *env = &cpu->env;
-uint64_t lpcr = env->spr[SPR_LPCR];
-
-/*
- * This is the full 4 bits encoding of POWER8. Previous
- * CPUs only support a subset of these but the filtering
- * is done when writing LPCR
- */
-switch ((lpcr & LPCR_RMLS) >> LPCR_RMLS_SHIFT) {
-case 0x8: /* 32MB */
-env->rmls = 0x2000000ull;
-break;
-case 0x3: /* 64MB */
-env->rmls = 0x4000000ull;
-break;
-case 0x7: /* 128MB */
-env->rmls = 0x8000000ull;
-break;
-case 0x4: /* 256MB */
-env->rmls = 0x10000000ull;
-break;
-case 0x2: /* 1GB */
-env->rmls = 0x40000000ull;
-break;
-case 0x1: /* 16GB */
-env->rmls = 0x400000000ull;
-break;
-default:
-/* What to do here ??? */
-env->rmls = 0;
-}
-}
-
 static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
 {
 CPUPPCState *env = &cpu->env;
@@ -1099,7 +1098,7 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 CPUPPCState *env = &cpu->env;
 
 env->spr[SPR_LPCR] = val & pcc->lpcr_mask;
-ppc_hash64_update_rmls(cpu);
+env->rmls = rmls_limit(cpu);
 ppc_hash64_update_vrma(cpu);
 }
 
-- 
2.24.1




[PATCH v6 11/18] target/ppc: Only calculate RMLS derived RMA limit on demand

2020-02-24 Thread David Gibson
When the LPCR is written, we update the env->rmls field with the RMA limit
it implies.  Simplify things by just calculating the value directly from
the LPCR value when we need it.

It's possible this is a little slower, but it's unlikely to be significant,
since this is only for real mode accesses in a translation configuration
that's not used very often, and the whole thing is behind the qemu TLB
anyway.  Therefore, keeping the number of state variables down and not
having to worry about making sure it's always in sync seems the better
option.

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
---
 target/ppc/cpu.h|  1 -
 target/ppc/mmu-hash64.c | 84 -
 2 files changed, 40 insertions(+), 45 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 8077fdb068..f9871b1233 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1046,7 +1046,6 @@ struct CPUPPCState {
 uint64_t insns_flags2;
 #if defined(TARGET_PPC64)
 ppc_slb_t vrma_slb;
-target_ulong rmls;
 #endif
 
 int error_code;
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index dd0df6fd01..ac21c14f68 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -791,6 +791,35 @@ static target_ulong rmls_limit(PowerPCCPU *cpu)
 }
 }
 
+static int build_vrma_slbe(PowerPCCPU *cpu, ppc_slb_t *slb)
+{
+CPUPPCState *env = &cpu->env;
+target_ulong lpcr = env->spr[SPR_LPCR];
+uint32_t vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
+target_ulong vsid = SLB_VSID_VRMA | ((vrmasd << 4) & SLB_VSID_LLP_MASK);
+int i;
+
+for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
+const PPCHash64SegmentPageSizes *sps = &cpu->hash64_opts->sps[i];
+
+if (!sps->page_shift) {
+break;
+}
+
+if ((vsid & SLB_VSID_LLP_MASK) == sps->slb_enc) {
+slb->esid = SLB_ESID_V;
+slb->vsid = vsid;
+slb->sps = sps;
+return 0;
+}
+}
+
+error_report("Bad page size encoding in LPCR[VRMASD]; LPCR=0x"
+ TARGET_FMT_lx"\n", lpcr);
+
+return -1;
+}
+
 int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
 int rwx, int mmu_idx)
 {
@@ -844,8 +873,10 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
 
 goto skip_slb_search;
 } else {
+target_ulong limit = rmls_limit(cpu);
+
 /* Emulated old-style RMO mode, bounds check against RMLS */
-if (raddr >= env->rmls) {
+if (raddr >= limit) {
 if (rwx == 2) {
 ppc_hash64_set_isi(cs, SRR1_PROTFAULT);
 } else {
@@ -1007,8 +1038,9 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr)
 return -1;
 }
 } else {
+target_ulong limit = rmls_limit(cpu);
 /* Emulated old-style RMO mode, bounds check against RMLS */
-if (raddr >= env->rmls) {
+if (raddr >= limit) {
 return -1;
 }
 return raddr | env->spr[SPR_RMOR];
@@ -1043,53 +1075,18 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu, target_ulong ptex,
 static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
 {
 CPUPPCState *env = &cpu->env;
-const PPCHash64SegmentPageSizes *sps = NULL;
-target_ulong esid, vsid, lpcr;
 ppc_slb_t *slb = &env->vrma_slb;
-uint32_t vrmasd;
-int i;
-
-/* First clear it */
-slb->esid = slb->vsid = 0;
-slb->sps = NULL;
 
 /* Is VRMA enabled ? */
-if (!ppc_hash64_use_vrma(env)) {
-return;
-}
-
-/*
- * Make one up. Mostly ignore the ESID which will not be needed
- * for translation
- */
-lpcr = env->spr[SPR_LPCR];
-vsid = SLB_VSID_VRMA;
-vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
-vsid |= (vrmasd << 4) & (SLB_VSID_L | SLB_VSID_LP);
-esid = SLB_ESID_V;
-
-for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
-const PPCHash64SegmentPageSizes *sps1 = &cpu->hash64_opts->sps[i];
-
-if (!sps1->page_shift) {
-break;
+if (ppc_hash64_use_vrma(env)) {
+if (build_vrma_slbe(cpu, slb) == 0) {
+return;
 }
-
-if ((vsid & SLB_VSID_LLP_MASK) == sps1->slb_enc) {
-sps = sps1;
-break;
-}
-}
-
-if (!sps) {
-error_report("Bad page size encoding esid 0x"TARGET_FMT_lx
- " vsid 0x"TARGET_FMT_lx, esid, vsid);
-return;
 }
 
-slb->vsid = vsid;
-slb->esid = esid;
-slb->sps = sps;
+/* Otherwise, clear it to indicate error */
+slb->esid = slb->vsid = 0;
+slb->sps = NULL;
 }
 
 void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
@@ -1098,7 +1095,6 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 CPUPPCState *env = &cpu->env;
 
 env->spr[SPR_LPCR] = val & pcc->lpcr_mask;
-env->rmls = rmls_limit(cpu);
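The derive-on-read pattern this patch adopts can be shown with a toy CPU state; the formula and field names here are invented for the example, not the real LPCR layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the pattern: instead of caching a value derived from a
 * register (env->rmls) on every register write, derive it from the
 * register each time it is needed.
 */
struct toy_env {
    uint64_t lpcr;
    uint64_t cached_limit;   /* only used by the "cache on write" style */
};

static uint64_t derive_limit(uint64_t lpcr)
{
    return (lpcr & 0xf) << 24;             /* stand-in formula */
}

static void store_lpcr_cached(struct toy_env *e, uint64_t val)
{
    e->lpcr = val;
    e->cached_limit = derive_limit(val);   /* must never be forgotten */
}

static uint64_t limit_on_demand(const struct toy_env *e)
{
    return derive_limit(e->lpcr);          /* nothing to keep in sync */
}

/* Helper so the equivalence of the two styles is easy to assert */
static bool styles_agree(uint64_t val)
{
    struct toy_env e = {0};

    store_lpcr_cached(&e, val);
    return e.cached_limit == limit_on_demand(&e);
}
```

The two styles always agree, but only the cached one can silently go stale if some write path forgets to refresh the cache, which is exactly the maintenance burden the patch removes.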

[PATCH v6 08/18] target/ppc: Use class fields to simplify LPCR masking

2020-02-24 Thread David Gibson
When we store the Logical Partitioning Control Register (LPCR) we have a
big switch statement to work out which are valid bits for the cpu model
we're emulating.

As well as being ugly, this isn't really conceptually correct, since it is
based on the mmu_model variable, whereas the LPCR isn't (only) about the
MMU, so mmu_model is basically just acting as a proxy for the cpu model.

Handle this in a simpler way, by adding a suitable lpcr_mask to the QOM
class.

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
---
 target/ppc/cpu-qom.h|  1 +
 target/ppc/mmu-hash64.c | 36 ++---
 target/ppc/translate_init.inc.c | 27 +
 3 files changed, 26 insertions(+), 38 deletions(-)

diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index e499575dc8..15d6b54a7d 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -177,6 +177,7 @@ typedef struct PowerPCCPUClass {
 uint64_t insns_flags;
 uint64_t insns_flags2;
 uint64_t msr_mask;
+uint64_t lpcr_mask; /* Available bits in the LPCR */
 uint64_t lpcr_pm;   /* Power-saving mode Exit Cause Enable bits */
 powerpc_mmu_t   mmu_model;
 powerpc_excp_t  excp_model;
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index caf47ad6fc..0ef330a614 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -1095,42 +1095,10 @@ static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
 
 void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 {
+PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 CPUPPCState *env = >env;
-uint64_t lpcr = 0;
 
-/* Filter out bits */
-switch (env->mmu_model) {
-case POWERPC_MMU_2_03: /* P5p */
-lpcr = val & (LPCR_RMLS | LPCR_ILE |
-  LPCR_LPES0 | LPCR_LPES1 |
-  LPCR_RMI | LPCR_HDICE);
-break;
-case POWERPC_MMU_2_06: /* P7 */
-lpcr = val & (LPCR_VPM0 | LPCR_VPM1 | LPCR_ISL | LPCR_DPFD |
-  LPCR_VRMASD | LPCR_RMLS | LPCR_ILE |
-  LPCR_P7_PECE0 | LPCR_P7_PECE1 | LPCR_P7_PECE2 |
-  LPCR_MER | LPCR_TC |
-  LPCR_LPES0 | LPCR_LPES1 | LPCR_HDICE);
-break;
-case POWERPC_MMU_2_07: /* P8 */
-lpcr = val & (LPCR_VPM0 | LPCR_VPM1 | LPCR_ISL | LPCR_KBV |
-  LPCR_DPFD | LPCR_VRMASD | LPCR_RMLS | LPCR_ILE |
-  LPCR_AIL | LPCR_ONL | LPCR_P8_PECE0 | LPCR_P8_PECE1 |
-  LPCR_P8_PECE2 | LPCR_P8_PECE3 | LPCR_P8_PECE4 |
-  LPCR_MER | LPCR_TC | LPCR_LPES0 | LPCR_HDICE);
-break;
-case POWERPC_MMU_3_00: /* P9 */
-lpcr = val & (LPCR_VPM1 | LPCR_ISL | LPCR_KBV | LPCR_DPFD |
-  (LPCR_PECE_U_MASK & LPCR_HVEE) | LPCR_ILE | LPCR_AIL |
-  LPCR_UPRT | LPCR_EVIRT | LPCR_ONL | LPCR_HR | LPCR_LD |
-  (LPCR_PECE_L_MASK & (LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
-  LPCR_DEE | LPCR_OEE)) | LPCR_MER | LPCR_GTSE | LPCR_TC |
-  LPCR_HEIC | LPCR_LPES0 | LPCR_HVICE | LPCR_HDICE);
-break;
-default:
-g_assert_not_reached();
-}
-env->spr[SPR_LPCR] = lpcr;
+env->spr[SPR_LPCR] = val & pcc->lpcr_mask;
 ppc_hash64_update_rmls(cpu);
 ppc_hash64_update_vrma(cpu);
 }
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 925bc31ca5..5b7a5226e1 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -8476,6 +8476,8 @@ POWERPC_FAMILY(POWER5P)(ObjectClass *oc, void *data)
 (1ull << MSR_DR) |
 (1ull << MSR_PMM) |
 (1ull << MSR_RI);
+pcc->lpcr_mask = LPCR_RMLS | LPCR_ILE | LPCR_LPES0 | LPCR_LPES1 |
+LPCR_RMI | LPCR_HDICE;
 pcc->mmu_model = POWERPC_MMU_2_03;
 #if defined(CONFIG_SOFTMMU)
 pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
@@ -8653,6 +8655,12 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 (1ull << MSR_PMM) |
 (1ull << MSR_RI) |
 (1ull << MSR_LE);
+pcc->lpcr_mask = LPCR_VPM0 | LPCR_VPM1 | LPCR_ISL | LPCR_DPFD |
+LPCR_VRMASD | LPCR_RMLS | LPCR_ILE |
+LPCR_P7_PECE0 | LPCR_P7_PECE1 | LPCR_P7_PECE2 |
+LPCR_MER | LPCR_TC |
+LPCR_LPES0 | LPCR_LPES1 | LPCR_HDICE;
+pcc->lpcr_pm = LPCR_P7_PECE0 | LPCR_P7_PECE1 | LPCR_P7_PECE2;
 pcc->mmu_model = POWERPC_MMU_2_06;
 #if defined(CONFIG_SOFTMMU)
 pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
@@ -8669,7 +8677,6 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
 pcc->l1_dcache_size = 0x8000;
 pcc->l1_icache_size = 0x8000;
 pcc->interrupts_big_endian = ppc_cpu_interrupts_big_endian_lpcr;
-pcc->lpcr_pm = LPCR_P7_PECE0 | LPCR_P7_PECE1 | LPCR_P7_PECE2;
 }
 
 static void init_proc_POWER8(CPUPPCState *env)
@@ -8825,6 
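The switch-to-data move above is a common refactor. As a minimal sketch (the struct name and mask values are shortened stand-ins, not the full LPCR bit lists from the patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Minimal sketch: the per-model switch in ppc_store_lpcr() becomes a
 * lookup through a per-class data field.
 */
struct toy_cpu_class {
    uint64_t lpcr_mask;   /* available bits in the LPCR */
};

/* Stand-in classes with invented masks */
static const struct toy_cpu_class power5p = { .lpcr_mask = 0x00ff };
static const struct toy_cpu_class power8  = { .lpcr_mask = 0xffff };

static uint64_t store_lpcr(const struct toy_cpu_class *pcc, uint64_t val)
{
    return val & pcc->lpcr_mask;   /* unsupported bits silently dropped */
}
```

Adding a CPU model then means adding one data value rather than another switch arm, and the filter no longer has to pretend it is an MMU property.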

[PATCH v6 13/18] spapr: Don't use weird units for MIN_RMA_SLOF

2020-02-24 Thread David Gibson
MIN_RMA_SLOF records the minimum amount of RMA that the SLOF firmware
requires.  It lets us give a meaningful error if the RMA ends up too small,
rather than just letting SLOF crash.

It's currently stored as a number of megabytes, which is strange for global
constants.  Move that megabyte scaling into the definition of the constant
like most other things use.

Change from M to MiB in the associated message while we're at it.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 828e2cc135..272a270b7a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -103,7 +103,7 @@
 #define FW_OVERHEAD 0x2800000
 #define KERNEL_LOAD_ADDR        FW_MAX_SIZE
 
-#define MIN_RMA_SLOF128UL
+#define MIN_RMA_SLOF(128 * MiB)
 
 #define PHANDLE_INTC0x
 
@@ -2959,10 +2959,10 @@ static void spapr_machine_init(MachineState *machine)
 }
 }
 
-if (spapr->rma_size < (MIN_RMA_SLOF * MiB)) {
+if (spapr->rma_size < MIN_RMA_SLOF) {
 error_report(
-"pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode Area memory)",
-MIN_RMA_SLOF);
+"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area memory)",
+MIN_RMA_SLOF / MiB);
 exit(1);
 }
 
-- 
2.24.1
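The change in a nutshell: keep the constant in bytes like other global constants, and scale only when printing. A tiny self-checking sketch (`MiB` matches the definition in qemu's `qemu/units.h`):

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024)

/* Constant stored in bytes; divide by MiB only at the point of display */
#define MIN_RMA_SLOF (128 * MiB)
```

With the constant in bytes, comparisons like `rma_size < MIN_RMA_SLOF` need no hidden scaling, which is the bug class the old megabyte-valued define invited.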




[PATCH v6 12/18] target/ppc: Don't store VRMA SLBE persistently

2020-02-24 Thread David Gibson
Currently, we construct the SLBE used for VRMA translations when the LPCR
is written (which controls some bits in the SLBE), then use it later for
translations.

This is a bit complex and confusing - simplify it by simply constructing
the SLBE directly from the LPCR when we need it.

Signed-off-by: David Gibson 
---
 target/ppc/cpu.h|  3 ---
 target/ppc/mmu-hash64.c | 28 ++--
 2 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index f9871b1233..5a55fb02bd 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1044,9 +1044,6 @@ struct CPUPPCState {
 uint32_t flags;
 uint64_t insns_flags;
 uint64_t insns_flags2;
-#if defined(TARGET_PPC64)
-ppc_slb_t vrma_slb;
-#endif
 
 int error_code;
 uint32_t pending_interrupts;
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index ac21c14f68..f8bf92aa2e 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -825,6 +825,7 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
 {
 CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
+ppc_slb_t vrma_slbe;
 ppc_slb_t *slb;
 unsigned apshift;
 hwaddr ptex;
@@ -863,8 +864,8 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
 }
 } else if (ppc_hash64_use_vrma(env)) {
 /* Emulated VRMA mode */
-slb = &env->vrma_slb;
-if (!slb->sps) {
+slb = &vrma_slbe;
+if (build_vrma_slbe(cpu, slb) != 0) {
 /* Invalid VRMA setup, machine check */
 cs->exception_index = POWERPC_EXCP_MCHECK;
 env->error_code = 0;
@@ -1012,6 +1013,7 @@ skip_slb_search:
 hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr)
 {
 CPUPPCState *env = &cpu->env;
+ppc_slb_t vrma_slbe;
 ppc_slb_t *slb;
 hwaddr ptex, raddr;
 ppc_hash_pte64_t pte;
@@ -1033,8 +1035,8 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr)
 return raddr | env->spr[SPR_HRMOR];
 } else if (ppc_hash64_use_vrma(env)) {
 /* Emulated VRMA mode */
-slb = &env->vrma_slb;
-if (!slb->sps) {
+slb = &vrma_slbe;
+if (build_vrma_slbe(cpu, slb) != 0) {
 return -1;
 }
 } else {
@@ -1072,30 +1074,12 @@ void ppc_hash64_tlb_flush_hpte(PowerPCCPU *cpu, target_ulong ptex,
 cpu->env.tlb_need_flush = TLB_NEED_GLOBAL_FLUSH | TLB_NEED_LOCAL_FLUSH;
 }
 
-static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
-{
-CPUPPCState *env = &cpu->env;
-ppc_slb_t *slb = &env->vrma_slb;
-
-/* Is VRMA enabled ? */
-if (ppc_hash64_use_vrma(env)) {
-if (build_vrma_slbe(cpu, slb) == 0) {
-return;
-}
-}
-
-/* Otherwise, clear it to indicate error */
-slb->esid = slb->vsid = 0;
-slb->sps = NULL;
-}
-
 void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 {
 PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 CPUPPCState *env = &cpu->env;
 
 env->spr[SPR_LPCR] = val & pcc->lpcr_mask;
-ppc_hash64_update_vrma(cpu);
 }
 
 void helper_store_lpcr(CPUPPCState *env, target_ulong val)
-- 
2.24.1




[PATCH v6 04/18] target/ppc: Correct handling of real mode accesses with vhyp on hash MMU

2020-02-24 Thread David Gibson
On ppc we have the concept of virtual hypervisor ("vhyp") mode, where we
only model the non-hypervisor-privileged parts of the cpu.  Essentially we
model the hypervisor's behaviour from the point of view of a guest OS, but
we don't model the hypervisor's execution.

In particular, in this mode, qemu's notion of target physical address is
a guest physical address from the vcpu's point of view.  So accesses in
guest real mode don't require translation.  If we were modelling the
hypervisor mode, we'd need to translate the guest physical address into
a host physical address.

Currently, we handle this sloppily: we rely on setting up the virtual LPCR
and RMOR registers so that GPAs are simply HPAs plus an offset, which we
set to zero.  This is already conceptually dubious, since the LPCR and RMOR
registers don't exist in the non-hypervisor portion of the CPU.  It gets
worse with POWER9, where RMOR and LPCR[VPM0] no longer exist at all.

Clean this up by explicitly handling the vhyp case.  While we're there,
remove some unnecessary nesting of if statements that made the logic to
select the correct real mode behaviour a bit less clear than it could be.

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
---
 target/ppc/mmu-hash64.c | 60 -
 1 file changed, 35 insertions(+), 25 deletions(-)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 3e0be4d55f..392f90e0ae 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -789,27 +789,30 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
  */
 raddr = eaddr & 0x0FFFFFFFFFFFFFFFULL;
 
-/* In HV mode, add HRMOR if top EA bit is clear */
-if (msr_hv || !env->has_hv_mode) {
+if (cpu->vhyp) {
+/*
+ * In virtual hypervisor mode, there's nothing to do:
+ *   EA == GPA == qemu guest address
+ */
+} else if (msr_hv || !env->has_hv_mode) {
+/* In HV mode, add HRMOR if top EA bit is clear */
 if (!(eaddr >> 63)) {
 raddr |= env->spr[SPR_HRMOR];
 }
-} else {
-/* Otherwise, check VPM for RMA vs VRMA */
-if (env->spr[SPR_LPCR] & LPCR_VPM0) {
-slb = &env->vrma_slb;
-if (slb->sps) {
-goto skip_slb_search;
-}
-/* Not much else to do here */
+} else if (env->spr[SPR_LPCR] & LPCR_VPM0) {
+/* Emulated VRMA mode */
+slb = &env->vrma_slb;
+if (!slb->sps) {
+/* Invalid VRMA setup, machine check */
 cs->exception_index = POWERPC_EXCP_MCHECK;
 env->error_code = 0;
 return 1;
-} else if (raddr < env->rmls) {
-/* RMA. Check bounds in RMLS */
-raddr |= env->spr[SPR_RMOR];
-} else {
-/* The access failed, generate the approriate interrupt */
+}
+
+goto skip_slb_search;
+} else {
+/* Emulated old-style RMO mode, bounds check against RMLS */
+if (raddr >= env->rmls) {
 if (rwx == 2) {
 ppc_hash64_set_isi(cs, SRR1_PROTFAULT);
 } else {
@@ -821,6 +824,8 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
 }
 return 1;
 }
+
+raddr |= env->spr[SPR_RMOR];
 }
 tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
  PAGE_READ | PAGE_WRITE | PAGE_EXEC, mmu_idx,
@@ -953,22 +958,27 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr)
 /* In real mode the top 4 effective address bits are ignored */
 raddr = addr & 0x0FFFFFFFFFFFFFFFULL;
 
-/* In HV mode, add HRMOR if top EA bit is clear */
-if ((msr_hv || !env->has_hv_mode) && !(addr >> 63)) {
+if (cpu->vhyp) {
+/*
+ * In virtual hypervisor mode, there's nothing to do:
+ *   EA == GPA == qemu guest address
+ */
+return raddr;
+} else if ((msr_hv || !env->has_hv_mode) && !(addr >> 63)) {
+/* In HV mode, add HRMOR if top EA bit is clear */
 return raddr | env->spr[SPR_HRMOR];
-}
-
-/* Otherwise, check VPM for RMA vs VRMA */
-if (env->spr[SPR_LPCR] & LPCR_VPM0) {
+} else if (env->spr[SPR_LPCR] & LPCR_VPM0) {
+/* Emulated VRMA mode */
 slb = &env->vrma_slb;
 if (!slb->sps) {
 return -1;
 }
-} else if (raddr < env->rmls) {
-/* RMA. Check bounds in RMLS */
-return raddr | env->spr[SPR_RMOR];
 } else {
-return -1;
+/* Emulated old-style RMO mode, bounds check against RMLS */
+if (raddr >= env->rmls) {
+ 
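The case analysis this patch flattens (vhyp, then HV, then VRMA, then old-style RMO) can be sketched as one chain. The function signature and return convention below are invented for the example, and the VRMA branch is omitted because it needs a full SLB lookup:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Flattened sketch of real-mode address selection.  Returns the
 * translated address, or -1 for a bounds fault.
 */
static int64_t real_mode_xlate(bool vhyp, bool hv, uint64_t eaddr,
                               uint64_t hrmor, uint64_t rmor, uint64_t rmls)
{
    uint64_t raddr = eaddr & 0x0FFFFFFFFFFFFFFFULL; /* top 4 EA bits ignored */

    if (vhyp) {
        return raddr;                /* EA == GPA == qemu guest address */
    }
    if (hv) {
        /* In HV mode, add HRMOR if the top EA bit is clear */
        return (eaddr >> 63) ? (int64_t)raddr : (int64_t)(raddr | hrmor);
    }
    /* Emulated old-style RMO mode: bounds check against RMLS */
    if (raddr >= rmls) {
        return -1;
    }
    return raddr | rmor;
}
```

Putting the vhyp case first makes the "nothing to do" behaviour explicit instead of relying on LPCR/RMOR values being rigged so the RMO path happens to be an identity map.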

[PATCH v6 10/18] target/ppc: Correct RMLS table

2020-02-24 Thread David Gibson
The table of RMA limits based on the LPCR[RMLS] field is slightly wrong.
We're missing the RMLS == 0 => 256 GiB RMA option, which is available on
POWER8, so add that.

The comment that goes with the table is much more wrong.  We *don't* filter
invalid RMLS values when writing the LPCR, and there's not really a
sensible way to do so.  Furthermore, while in theory the set of RMLS values
is implementation dependent, it seems in practice the same set has been
available since around POWER4+ up until POWER8, the last model which
supports RMLS at all.  So, correct that as well.

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
---
 target/ppc/mmu-hash64.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 4f082d775d..dd0df6fd01 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -762,12 +762,12 @@ static target_ulong rmls_limit(PowerPCCPU *cpu)
 {
 CPUPPCState *env = >env;
 /*
- * This is the full 4 bits encoding of POWER8. Previous
- * CPUs only support a subset of these but the filtering
- * is done when writing LPCR
+ * In theory the meanings of RMLS values are implementation
+ * dependent.  In practice, this seems to have been the set from
+ * POWER4+..POWER8, and RMLS is no longer supported in POWER9.
  */
 const target_ulong rma_sizes[] = {
-[0] = 0,
+[0] = 256 * GiB,
 [1] = 16 * GiB,
 [2] = 1 * GiB,
 [3] = 64 * MiB,
-- 
2.24.1
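The corrected encoding can be exercised as a standalone function; this is an illustrative copy of the patch's table, not qemu code:

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024)
#define GiB (1024 * MiB)

/*
 * POWER4+..POWER8 RMLS encoding: invalid values (including the gaps at
 * 5 and 6) yield a 0-sized RMA, and RMLS == 0 now means 256 GiB.
 */
static uint64_t rmls_to_limit(unsigned rmls)
{
    static const uint64_t rma_sizes[] = {
        [0] = 256 * GiB,
        [1] = 16 * GiB,
        [2] = 1 * GiB,
        [3] = 64 * MiB,
        [4] = 256 * MiB,
        [7] = 128 * MiB,
        [8] = 32 * MiB,
    };

    if (rmls < sizeof(rma_sizes) / sizeof(rma_sizes[0])) {
        return rma_sizes[rmls];
    }
    return 0;   /* bad value: 0-sized RMA, expect an immediate DSI/ISI */
}
```

Designated initializers leave the unlisted slots (5 and 6) zero, so the "invalid encoding" cases inside and outside the table behave identically.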




[PATCH v6 05/18] target/ppc: Introduce ppc_hash64_use_vrma() helper

2020-02-24 Thread David Gibson
When running guests under a hypervisor, the hypervisor obviously needs to
be protected from guest accesses even if those are in what the guest
considers real mode (translation off).  The POWER hardware provides two
ways of doing that: The old way has guest real mode accesses simply offset
and bounds checked into host addresses.  It works, but requires that a
significant chunk of the guest's memory - the RMA - be physically
contiguous in the host, which is pretty inconvenient.  The new way, known
as VRMA, has guest real mode accesses translated in roughly the normal way
but with some special parameters.

In POWER7 and POWER8 the LPCR[VPM0] bit selected between the two modes, but
in POWER9 only VRMA mode is supported and LPCR[VPM0] no longer exists.  We
handle that difference in behaviour in ppc_hash64_set_isi(), but not in
other places that we blindly check LPCR[VPM0].

Correct those instances with a new helper to tell if we should be in VRMA
mode.

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
---
 target/ppc/mmu-hash64.c | 43 -
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 392f90e0ae..e372c42add 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -668,6 +668,21 @@ unsigned ppc_hash64_hpte_page_shift_noslb(PowerPCCPU *cpu,
 return 0;
 }
 
+static bool ppc_hash64_use_vrma(CPUPPCState *env)
+{
+switch (env->mmu_model) {
+case POWERPC_MMU_3_00:
+/*
+ * ISAv3.0 (POWER9) always uses VRMA, the VPM0 field and RMOR
+ * register no longer exist
+ */
+return true;
+
+default:
+return !!(env->spr[SPR_LPCR] & LPCR_VPM0);
+}
+}
+
 static void ppc_hash64_set_isi(CPUState *cs, uint64_t error_code)
 {
 CPUPPCState *env = &POWERPC_CPU(cs)->env;
@@ -676,15 +691,7 @@ static void ppc_hash64_set_isi(CPUState *cs, uint64_t error_code)
 if (msr_ir) {
 vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM1);
 } else {
-switch (env->mmu_model) {
-case POWERPC_MMU_3_00:
-/* Field deprecated in ISAv3.00 - interrupts always go to hyperv */
-vpm = true;
-break;
-default:
-vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM0);
-break;
-}
+vpm = ppc_hash64_use_vrma(env);
 }
 if (vpm && !msr_hv) {
 cs->exception_index = POWERPC_EXCP_HISI;
@@ -702,15 +709,7 @@ static void ppc_hash64_set_dsi(CPUState *cs, uint64_t dar, uint64_t dsisr)
 if (msr_dr) {
 vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM1);
 } else {
-switch (env->mmu_model) {
-case POWERPC_MMU_3_00:
-/* Field deprecated in ISAv3.00 - interrupts always go to hyperv */
-vpm = true;
-break;
-default:
-vpm = !!(env->spr[SPR_LPCR] & LPCR_VPM0);
-break;
-}
+vpm = ppc_hash64_use_vrma(env);
 }
 if (vpm && !msr_hv) {
 cs->exception_index = POWERPC_EXCP_HDSI;
@@ -799,7 +798,7 @@ int ppc_hash64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr,
 if (!(eaddr >> 63)) {
 raddr |= env->spr[SPR_HRMOR];
 }
-} else if (env->spr[SPR_LPCR] & LPCR_VPM0) {
+} else if (ppc_hash64_use_vrma(env)) {
 /* Emulated VRMA mode */
 slb = &env->vrma_slb;
 if (!slb->sps) {
@@ -967,7 +966,7 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr)
 } else if ((msr_hv || !env->has_hv_mode) && !(addr >> 63)) {
 /* In HV mode, add HRMOR if top EA bit is clear */
 return raddr | env->spr[SPR_HRMOR];
-} else if (env->spr[SPR_LPCR] & LPCR_VPM0) {
+} else if (ppc_hash64_use_vrma(env)) {
 /* Emulated VRMA mode */
 slb = &env->vrma_slb;
 if (!slb->sps) {
@@ -1056,8 +1055,7 @@ static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
 slb->sps = NULL;
 
 /* Is VRMA enabled ? */
-lpcr = env->spr[SPR_LPCR];
-if (!(lpcr & LPCR_VPM0)) {
+if (!ppc_hash64_use_vrma(env)) {
 return;
 }
 
@@ -1065,6 +1063,7 @@ static void ppc_hash64_update_vrma(PowerPCCPU *cpu)
  * Make one up. Mostly ignore the ESID which will not be needed
  * for translation
  */
+lpcr = env->spr[SPR_LPCR];
 vsid = SLB_VSID_VRMA;
 vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
 vsid |= (vrmasd << 4) & (SLB_VSID_L | SLB_VSID_LP);
-- 
2.24.1
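The helper's decision reduces to two inputs. In this sketch the enum and the VPM0 bit position are stand-ins for the example; in qemu the MMU model comes from `env->mmu_model` and VPM0 from the LPCR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_mmu_model { MMU_2_07, MMU_3_00 };

#define TOY_LPCR_VPM0 (1ULL << 1)   /* invented bit position */

static bool use_vrma(enum toy_mmu_model model, uint64_t lpcr)
{
    /* ISAv3.00 (POWER9) always uses VRMA; older models check VPM0 */
    return model == MMU_3_00 || (lpcr & TOY_LPCR_VPM0);
}
```

Centralizing the check means every real-mode path asks the same question, instead of some paths testing LPCR[VPM0] directly and silently getting POWER9 wrong.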




[PATCH v6 06/18] spapr, ppc: Remove VPM0/RMLS hacks for POWER9

2020-02-24 Thread David Gibson
For the "pseries" machine, we use "virtual hypervisor" mode where we
only model the CPU in non-hypervisor privileged mode.  This means that
we need guest physical addresses within the modelled cpu to be treated
as absolute physical addresses.

We used to do that by clearing LPCR[VPM0] and setting LPCR[RMLS] to a high
limit so that the old offset based translation for guest mode applied,
which does what we need.  However, POWER9 has removed support for that
translation mode, which meant we had some ugly hacks to keep it working.

We now explicitly handle this sort of translation for virtual hypervisor
mode, so the hacks aren't necessary.  We don't need to set VPM0 and RMLS
from the machine type code - they're now ignored in vhyp mode.  On the cpu
side we don't need to allow LPCR[RMLS] to be set on POWER9 in vhyp mode -
that was only there to allow the hack on the machine side.

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
---
 hw/ppc/spapr_cpu_core.c | 6 +-
 target/ppc/mmu-hash64.c | 8 
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index d09125d9af..ea5e11f1d9 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -58,14 +58,10 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
  * we don't get spurious wakups before an RTAS start-cpu call.
  * For the same reason, set PSSCR_EC.
  */
-lpcr &= ~(LPCR_VPM0 | LPCR_VPM1 | LPCR_ISL | LPCR_KBV | pcc->lpcr_pm);
+lpcr &= ~(LPCR_VPM1 | LPCR_ISL | LPCR_KBV | pcc->lpcr_pm);
 lpcr |= LPCR_LPES0 | LPCR_LPES1;
 env->spr[SPR_PSSCR] |= PSSCR_EC;
 
-/* Set RMLS to the max (ie, 16G) */
-lpcr &= ~LPCR_RMLS;
-lpcr |= 1ull << LPCR_RMLS_SHIFT;
-
 ppc_store_lpcr(cpu, lpcr);
 
 /* Set a full AMOR so guest can use the AMR as it sees fit */
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index e372c42add..caf47ad6fc 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -1126,14 +1126,6 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
   (LPCR_PECE_L_MASK & (LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
   LPCR_DEE | LPCR_OEE)) | LPCR_MER | LPCR_GTSE | LPCR_TC |
   LPCR_HEIC | LPCR_LPES0 | LPCR_HVICE | LPCR_HDICE);
-/*
- * If we have a virtual hypervisor, we need to bring back RMLS. It
- * doesn't exist on an actual P9 but that's all we know how to
- * configure with softmmu at the moment
- */
-if (cpu->vhyp) {
-lpcr |= (val & LPCR_RMLS);
-}
 break;
 default:
 g_assert_not_reached();
-- 
2.24.1




[PATCH v6 07/18] target/ppc: Remove RMOR register from POWER9 & POWER10

2020-02-24 Thread David Gibson
Currently we create the Real Mode Offset Register (RMOR) on all Book3S cpus
from POWER7 onwards.  However the translation mode which the RMOR controls
is no longer supported in POWER9, and so the register has been removed from
the architecture.

Remove it from our model on POWER9 and POWER10.

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
---
 target/ppc/translate_init.inc.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index ab79975fec..925bc31ca5 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -8015,12 +8015,16 @@ static void gen_spr_book3s_ids(CPUPPCState *env)
  SPR_NOACCESS, SPR_NOACCESS,
 &spr_read_generic, &spr_write_generic,
 0x00000000);
-spr_register_hv(env, SPR_RMOR, "RMOR",
+spr_register_hv(env, SPR_HRMOR, "HRMOR",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, SPR_NOACCESS,
 &spr_read_generic, &spr_write_generic,
 0x00000000);
-spr_register_hv(env, SPR_HRMOR, "HRMOR",
+}
+
+static void gen_spr_rmor(CPUPPCState *env)
+{
+spr_register_hv(env, SPR_RMOR, "RMOR",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, SPR_NOACCESS,
 &spr_read_generic, &spr_write_generic,
 0x00000000);
 
 /* POWER7 Specific Registers */
 gen_spr_book3s_ids(env);
+gen_spr_rmor(env);
 gen_spr_amr(env);
 gen_spr_book3s_purr(env);
 gen_spr_power5p_common(env);
@@ -8676,6 +8681,7 @@ static void init_proc_POWER8(CPUPPCState *env)
 
 /* POWER8 Specific Registers */
 gen_spr_book3s_ids(env);
+gen_spr_rmor(env);
 gen_spr_amr(env);
 gen_spr_iamr(env);
 gen_spr_book3s_purr(env);
-- 
2.24.1




[PATCH v6 00/18] target/ppc: Correct some errors with real mode handling

2020-02-24 Thread David Gibson
POWER "book S" (server class) cpus have a concept of "real mode" where
MMU translation is disabled... sort of.  In fact this can mean a bunch
of slightly different things when hypervisor mode and other
considerations are present.

We had some errors in edge cases here, so clean some things up and
correct them.

Some of those limitations caused problems with calculating the size of
the Real Mode Area of pseries guests, so continue on to clean up and
correct those calculations as well.

Changes since v5:
 * Fixed an error in the sense of a test (pointed out by Fabiano Rosas)
Changes since v4:
 * Some tiny cosmetic fixes to the original patches
 * Added a bunch of extra patches correcting RMA calculation
Changes since v3:
 * Fix style errors reported by checkpatch
Changes since v2:
 * Removed 32-bit hypervisor stubs more completely
 * Minor polish based on review comments
Changes since RFCv1:
 * Add a number of extra patches taking advantage of the initial
   cleanups

Alexey Kardashevskiy (1):
  pseries: Update SLOF firmware image

David Gibson (17):
  ppc: Remove stub support for 32-bit hypervisor mode
  ppc: Remove stub of PPC970 HID4 implementation
  target/ppc: Correct handling of real mode accesses with vhyp on hash
MMU
  target/ppc: Introduce ppc_hash64_use_vrma() helper
  spapr, ppc: Remove VPM0/RMLS hacks for POWER9
  target/ppc: Remove RMOR register from POWER9 & POWER10
  target/ppc: Use class fields to simplify LPCR masking
  target/ppc: Streamline calculation of RMA limit from LPCR[RMLS]
  target/ppc: Correct RMLS table
  target/ppc: Only calculate RMLS derived RMA limit on demand
  target/ppc: Don't store VRMA SLBE persistently
  spapr: Don't use weird units for MIN_RMA_SLOF
  spapr,ppc: Simplify signature of kvmppc_rma_size()
  spapr: Don't attempt to clamp RMA to VRMA constraint
  spapr: Don't clamp RMA to 16GiB on new machine types
  spapr: Clean up RMA size calculation
  spapr: Fold spapr_node0_size() into its only caller

 hw/ppc/spapr.c  | 124 ++--
 hw/ppc/spapr_cpu_core.c |   6 +-
 hw/ppc/spapr_hcall.c|   4 +-
 include/hw/ppc/spapr.h  |   4 +-
 pc-bios/README  |   2 +-
 pc-bios/slof.bin| Bin 931032 -> 968616 bytes
 roms/SLOF   |   2 +-
 target/ppc/cpu-qom.h|   1 +
 target/ppc/cpu.h|  25 +--
 target/ppc/kvm.c|   5 +-
 target/ppc/kvm_ppc.h|   7 +-
 target/ppc/mmu-hash64.c | 327 
 target/ppc/translate_init.inc.c |  63 --
 13 files changed, 254 insertions(+), 316 deletions(-)

-- 
2.24.1




[PATCH v6 03/18] ppc: Remove stub of PPC970 HID4 implementation

2020-02-24 Thread David Gibson
The PowerPC 970 CPU was a cut-down POWER4, which had hypervisor capability.
However, it can be (and often was) strapped into "Apple mode", where the
hypervisor capabilities were disabled (essentially putting it always in
hypervisor mode).

That's actually the only mode of the 970 we support in qemu, and we're
unlikely to change that any time soon.  However, we do have a partial
implementation of the 970's HID4 register which affects things only
relevant for hypervisor mode.

That stub is also really ugly, since it attempts to duplicate the effects
of HID4 by re-encoding it into the LPCR register used in newer CPUs, but
in a really confusing way.

Just get rid of it.
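The re-encoding being removed can be restated in miniature. This sketch mirrors the shape of the deleted hunk (picking individual HID4 bits and rewriting them as LPCR-style fields); the LPCR bit positions here are placeholders, not QEMU's actual definitions, and only the two unambiguous HID4 masks from the hunk are used.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder LPCR field positions, for illustration only */
#define LPCR_LPES0      (1ULL << 3)
#define LPCR_RMLS_SHIFT 26

/* Sketch of the removed stub: translate 970 HID4 bits into an
 * LPCR-shaped value.  This indirection is what made the code confusing. */
static uint64_t hid4_to_lpcr(uint64_t hid4)
{
    uint64_t lpcr = 0;

    if (hid4 & 0x40) {                      /* LPES0 equivalent */
        lpcr |= LPCR_LPES0;
    }
    if (hid4 & 0x20) {                      /* one of the RMLS size bits */
        lpcr |= 0x4ULL << LPCR_RMLS_SHIFT;
    }
    return lpcr;
}
```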

Signed-off-by: David Gibson 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Greg Kurz 
---
 target/ppc/mmu-hash64.c | 29 +
 target/ppc/translate_init.inc.c | 20 
 2 files changed, 9 insertions(+), 40 deletions(-)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index da8966ccf5..3e0be4d55f 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -1091,33 +1091,6 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 
 /* Filter out bits */
 switch (env->mmu_model) {
-case POWERPC_MMU_64B: /* 970 */
-if (val & 0x40) {
-lpcr |= LPCR_LPES0;
-}
-if (val & 0x8000000000000000ull) {
-lpcr |= LPCR_LPES1;
-}
-if (val & 0x20) {
-lpcr |= (0x4ull << LPCR_RMLS_SHIFT);
-}
-if (val & 0x4000000000000000ull) {
-lpcr |= (0x2ull << LPCR_RMLS_SHIFT);
-}
-if (val & 0x2000000000000000ull) {
-lpcr |= (0x1ull << LPCR_RMLS_SHIFT);
-}
-env->spr[SPR_RMOR] = ((lpcr >> 41) & 0xffffull) << 26;
-
-/*
- * XXX We could also write LPID from HID4 here
- * but since we don't tag any translation on it
- * it doesn't actually matter
- *
- * XXX For proper emulation of 970 we also need
- * to dig HRMOR out of HID5
- */
-break;
 case POWERPC_MMU_2_03: /* P5p */
 lpcr = val & (LPCR_RMLS | LPCR_ILE |
   LPCR_LPES0 | LPCR_LPES1 |
@@ -1154,7 +1127,7 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 }
 break;
 default:
-;
+g_assert_not_reached();
 }
 env->spr[SPR_LPCR] = lpcr;
 ppc_hash64_update_rmls(cpu);
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index a0d0eaabf2..ab79975fec 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -7895,25 +7895,21 @@ static void spr_write_lpcr(DisasContext *ctx, int sprn, 
int gprn)
 {
 gen_helper_store_lpcr(cpu_env, cpu_gpr[gprn]);
 }
-
-static void spr_write_970_hid4(DisasContext *ctx, int sprn, int gprn)
-{
-#if defined(TARGET_PPC64)
-spr_write_generic(ctx, sprn, gprn);
-gen_helper_store_lpcr(cpu_env, cpu_gpr[gprn]);
-#endif
-}
-
 #endif /* !defined(CONFIG_USER_ONLY) */
 
 static void gen_spr_970_lpar(CPUPPCState *env)
 {
 #if !defined(CONFIG_USER_ONLY)
-/* Logical partitionning */
-/* PPC970: HID4 is effectively the LPCR */
+/*
+ * PPC970: HID4 covers things later controlled by the LPCR and
+ * RMOR in later CPUs, but with a different encoding.  We only
+ * support the 970 in "Apple mode" which has all hypervisor
+ * facilities disabled by strapping, so we can basically just
+ * ignore it
+ */
 spr_register(env, SPR_970_HID4, "HID4",
  SPR_NOACCESS, SPR_NOACCESS,
- &spr_read_generic, &spr_write_970_hid4,
+ &spr_read_generic, &spr_write_generic,
  0x00000000);
 #endif
 }
-- 
2.24.1




[PATCH v6 02/18] ppc: Remove stub support for 32-bit hypervisor mode

2020-02-24 Thread David Gibson
a4f30719a8cd, way back in 2007, noted that "PowerPC hypervisor mode is not
fundamentally available only for PowerPC 64" and added a 32-bit version
of the MSR[HV] bit.

But nothing was ever really done with that; there is no meaningful support
for 32-bit hypervisor mode 13 years later.  Let's stop pretending and just
remove the stubs.
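After the patch, the behaviour the macros express is simple enough to restate directly. This is an illustrative paraphrase of the resulting `msr_hv` definitions, not the QEMU source: one bit position (60) that is meaningful only on 64-bit CPUs, and a hardwired zero everywhere else.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_HV_BIT 60   /* MSR[HV] on 64-bit book S CPUs */

/* TARGET_PPC64 case: read the HV bit out of the MSR */
static int msr_hv_64(uint64_t msr)
{
    return (int)((msr >> MSR_HV_BIT) & 1);
}

/* 32-bit case after the patch: hypervisor state simply doesn't exist */
static int msr_hv_32(uint64_t msr)
{
    (void)msr;
    return 0;
}
```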

Signed-off-by: David Gibson 
Reviewed-by: Fabiano Rosas 
---
 target/ppc/cpu.h| 21 +++--
 target/ppc/translate_init.inc.c |  6 +++---
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index b283042515..8077fdb068 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -24,8 +24,6 @@
 #include "exec/cpu-defs.h"
 #include "cpu-qom.h"
 
-/* #define PPC_EMULATE_32BITS_HYPV */
-
 #define TCG_GUEST_DEFAULT_MO 0
 
 #define TARGET_PAGE_BITS_64K 16
@@ -300,13 +298,12 @@ typedef struct ppc_v3_pate_t {
#define MSR_SF   63 /* Sixty-four-bit mode   hflags */
 #define MSR_TAG  62 /* Tag-active mode (POWERx ?)*/
 #define MSR_ISF  61 /* Sixty-four-bit interrupt mode on 630  */
-#define MSR_SHV  60 /* hypervisor state   hflags */
+#define MSR_HV   60 /* hypervisor state   hflags */
 #define MSR_TS0  34 /* Transactional state, 2 bits (Book3s)  */
 #define MSR_TS1  33
 #define MSR_TM   32 /* Transactional Memory Available (Book3s)   */
 #define MSR_CM   31 /* Computation mode for BookE hflags */
 #define MSR_ICM  30 /* Interrupt computation mode for BookE  */
-#define MSR_THV  29 /* hypervisor state for 32 bits PowerPC   hflags */
 #define MSR_GS   28 /* guest state for BookE */
 #define MSR_UCLE 26 /* User-mode cache lock enable for BookE */
#define MSR_VR   25 /* altivec available   x hflags */
@@ -401,10 +398,13 @@ typedef struct ppc_v3_pate_t {
 
 #define msr_sf   ((env->msr >> MSR_SF)   & 1)
 #define msr_isf  ((env->msr >> MSR_ISF)  & 1)
-#define msr_shv  ((env->msr >> MSR_SHV)  & 1)
+#if defined(TARGET_PPC64)
+#define msr_hv   ((env->msr >> MSR_HV)   & 1)
+#else
+#define msr_hv   (0)
+#endif
 #define msr_cm   ((env->msr >> MSR_CM)   & 1)
 #define msr_icm  ((env->msr >> MSR_ICM)  & 1)
-#define msr_thv  ((env->msr >> MSR_THV)  & 1)
 #define msr_gs   ((env->msr >> MSR_GS)   & 1)
 #define msr_ucle ((env->msr >> MSR_UCLE) & 1)
 #define msr_vr   ((env->msr >> MSR_VR)   & 1)
@@ -449,16 +449,9 @@ typedef struct ppc_v3_pate_t {
 
 /* Hypervisor bit is more specific */
 #if defined(TARGET_PPC64)
-#define MSR_HVB (1ULL << MSR_SHV)
-#define msr_hv  msr_shv
-#else
-#if defined(PPC_EMULATE_32BITS_HYPV)
-#define MSR_HVB (1ULL << MSR_THV)
-#define msr_hv  msr_thv
+#define MSR_HVB (1ULL << MSR_HV)
 #else
 #define MSR_HVB (0ULL)
-#define msr_hv  (0)
-#endif
 #endif
 
 /* DSISR */
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 53995f62ea..a0d0eaabf2 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -8804,7 +8804,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
 PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
 PPC2_TM | PPC2_PM_ISA206;
 pcc->msr_mask = (1ull << MSR_SF) |
-(1ull << MSR_SHV) |
+(1ull << MSR_HV) |
 (1ull << MSR_TM) |
 (1ull << MSR_VR) |
 (1ull << MSR_VSX) |
@@ -9017,7 +9017,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
 PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
 PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL;
 pcc->msr_mask = (1ull << MSR_SF) |
-(1ull << MSR_SHV) |
+(1ull << MSR_HV) |
 (1ull << MSR_TM) |
 (1ull << MSR_VR) |
 (1ull << MSR_VSX) |
@@ -9228,7 +9228,7 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
 PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
 PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL;
 pcc->msr_mask = (1ull << MSR_SF) |
-(1ull << MSR_SHV) |
+(1ull << MSR_HV) |
 (1ull << MSR_TM) |
 (1ull << MSR_VR) |
 (1ull << MSR_VSX) |
-- 
2.24.1




[PATCH v2 1/2] linux-user: Protect more syscalls

2020-02-24 Thread Alistair Francis
New y2038-safe 32-bit architectures (like RISC-V) don't support old
syscalls with a 32-bit time_t. The kernel defines new *_time64 versions
of these syscalls. Add some more #ifdefs to syscall.c in linux-user to
allow us to compile without these old syscalls.
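The y2038 issue motivating this is just the range of a 32-bit `time_t`. A minimal illustration (not QEMU code; the helper name is made up): the last second representable in 32 bits is 2038-01-19T03:14:07Z, which is why new 32-bit ports expose only the `*_time64` syscall variants.

```c
#include <assert.h>
#include <stdint.h>

/* Does a 64-bit seconds count fit in a legacy 32-bit time_t? */
static int fits_in_time32(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}
```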

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---
 linux-user/strace.c  |  2 ++
 linux-user/syscall.c | 20 
 2 files changed, 22 insertions(+)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 4f7130b2ff..6420ccd97b 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -775,6 +775,7 @@ print_syscall_ret_newselect(const struct syscallname *name, 
abi_long ret)
 #define TARGET_TIME_OOP  3   /* leap second in progress */
 #define TARGET_TIME_WAIT 4   /* leap second has occurred */
#define TARGET_TIME_ERROR    5   /* clock not synchronized */
+#ifdef TARGET_NR_adjtimex
 static void
 print_syscall_ret_adjtimex(const struct syscallname *name, abi_long ret)
 {
@@ -813,6 +814,7 @@ print_syscall_ret_adjtimex(const struct syscallname *name, 
abi_long ret)
 
 qemu_log("\n");
 }
+#endif
 
 UNUSED static struct flags access_flags[] = {
 FLAG_GENERIC(F_OK),
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8d27d10807..5a2156f95a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -742,8 +742,10 @@ safe_syscall3(ssize_t, read, int, fd, void *, buff, 
size_t, count)
 safe_syscall3(ssize_t, write, int, fd, const void *, buff, size_t, count)
 safe_syscall4(int, openat, int, dirfd, const char *, pathname, \
   int, flags, mode_t, mode)
+#if defined(TARGET_NR_wait4)
 safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, options, \
   struct rusage *, rusage)
+#endif
 safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
   int, options, struct rusage *, rusage)
 safe_syscall3(int, execve, const char *, filename, char **, argv, char **, 
envp)
@@ -780,8 +782,10 @@ safe_syscall4(int, rt_sigtimedwait, const sigset_t *, 
these, siginfo_t *, uinfo,
   const struct timespec *, uts, size_t, sigsetsize)
 safe_syscall4(int, accept4, int, fd, struct sockaddr *, addr, socklen_t *, len,
   int, flags)
+#if defined(TARGET_NR_nanosleep)
 safe_syscall2(int, nanosleep, const struct timespec *, req,
   struct timespec *, rem)
+#endif
 #ifdef TARGET_NR_clock_nanosleep
 safe_syscall4(int, clock_nanosleep, const clockid_t, clock, int, flags,
   const struct timespec *, req, struct timespec *, rem)
@@ -1067,6 +1071,7 @@ static inline abi_long host_to_target_rusage(abi_ulong 
target_addr,
 return 0;
 }
 
+#ifdef TARGET_NR_setrlimit
 static inline rlim_t target_to_host_rlim(abi_ulong target_rlim)
 {
 abi_ulong target_rlim_swap;
@@ -1082,7 +1087,9 @@ static inline rlim_t target_to_host_rlim(abi_ulong 
target_rlim)
 
 return result;
 }
+#endif
 
+#if defined(TARGET_NR_getrlimit) || defined(TARGET_NR_ugetrlimit)
 static inline abi_ulong host_to_target_rlim(rlim_t rlim)
 {
 abi_ulong target_rlim_swap;
@@ -1096,6 +1103,7 @@ static inline abi_ulong host_to_target_rlim(rlim_t rlim)
 
 return result;
 }
+#endif
 
 static inline int target_to_host_resource(int code)
 {
@@ -1228,6 +1236,7 @@ static inline abi_long 
host_to_target_timespec64(abi_ulong target_addr,
 return 0;
 }
 
+#if defined(TARGET_NR_settimeofday)
 static inline abi_long copy_from_user_timezone(struct timezone *tz,
abi_ulong target_tz_addr)
 {
@@ -1244,6 +1253,7 @@ static inline abi_long copy_from_user_timezone(struct 
timezone *tz,
 
 return 0;
 }
+#endif
 
 #if defined(TARGET_NR_mq_open) && defined(__NR_mq_open)
#include <mqueue.h>
@@ -8629,6 +8639,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 }
 }
 return ret;
+#if defined(TARGET_NR_gettimeofday)
 case TARGET_NR_gettimeofday:
 {
 struct timeval tv;
@@ -8639,6 +8650,8 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 }
 }
 return ret;
+#endif
+#if defined(TARGET_NR_settimeofday)
 case TARGET_NR_settimeofday:
 {
 struct timeval tv, *ptv = NULL;
@@ -8660,6 +8673,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 
 return get_errno(settimeofday(ptv, ptz));
 }
+#endif
 #if defined(TARGET_NR_select)
 case TARGET_NR_select:
 #if defined(TARGET_WANT_NI_OLD_SELECT)
@@ -9305,6 +9319,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
return do_syscall(cpu_env, arg1 & 0xffff, arg2, arg3, arg4, arg5,
   arg6, arg7, arg8, 0);
 #endif
+#if defined(TARGET_NR_wait4)
 case TARGET_NR_wait4:
 {
 int status;
@@ -9332,6 +9347,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 }
 }
 return 

[PATCH v2 2/2] linux-user/riscv: Update the syscall_nr's to the 5.5 kernel

2020-02-24 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 linux-user/riscv/syscall32_nr.h | 314 
 linux-user/riscv/syscall64_nr.h | 303 ++
 linux-user/riscv/syscall_nr.h   | 294 +-
 3 files changed, 619 insertions(+), 292 deletions(-)
 create mode 100644 linux-user/riscv/syscall32_nr.h
 create mode 100644 linux-user/riscv/syscall64_nr.h

diff --git a/linux-user/riscv/syscall32_nr.h b/linux-user/riscv/syscall32_nr.h
new file mode 100644
index 00..c3bf5930d0
--- /dev/null
+++ b/linux-user/riscv/syscall32_nr.h
@@ -0,0 +1,314 @@
+/*
+ * Syscall numbers from asm-generic for RV32.
+ */
+
+#ifndef LINUX_USER_RISCV_SYSCALL32_NR_H
+#define LINUX_USER_RISCV_SYSCALL32_NR_H
+
+#define TARGET_NR_io_setup 0
+#define TARGET_NR_io_destroy 1
+#define TARGET_NR_io_submit 2
+#define TARGET_NR_io_cancel 3
+#define TARGET_NR_setxattr 5
+#define TARGET_NR_lsetxattr 6
+#define TARGET_NR_fsetxattr 7
+#define TARGET_NR_getxattr 8
+#define TARGET_NR_lgetxattr 9
+#define TARGET_NR_fgetxattr 10
+#define TARGET_NR_listxattr 11
+#define TARGET_NR_llistxattr 12
+#define TARGET_NR_flistxattr 13
+#define TARGET_NR_removexattr 14
+#define TARGET_NR_lremovexattr 15
+#define TARGET_NR_fremovexattr 16
+#define TARGET_NR_getcwd 17
+#define TARGET_NR_lookup_dcookie 18
+#define TARGET_NR_eventfd2 19
+#define TARGET_NR_epoll_create1 20
+#define TARGET_NR_epoll_ctl 21
+#define TARGET_NR_epoll_pwait 22
+#define TARGET_NR_dup 23
+#define TARGET_NR_dup3 24
+#define TARGET_NR_fcntl64 25
+#define TARGET_NR_inotify_init1 26
+#define TARGET_NR_inotify_add_watch 27
+#define TARGET_NR_inotify_rm_watch 28
+#define TARGET_NR_ioctl 29
+#define TARGET_NR_ioprio_set 30
+#define TARGET_NR_ioprio_get 31
+#define TARGET_NR_flock 32
+#define TARGET_NR_mknodat 33
+#define TARGET_NR_mkdirat 34
+#define TARGET_NR_unlinkat 35
+#define TARGET_NR_symlinkat 36
+#define TARGET_NR_linkat 37
+#define TARGET_NR_umount2 39
+#define TARGET_NR_mount 40
+#define TARGET_NR_pivot_root 41
+#define TARGET_NR_nfsservctl 42
+#define TARGET_NR_statfs 43
+#define TARGET_NR_fstatfs 44
+#define TARGET_NR_truncate 45
+#define TARGET_NR_ftruncate 46
+#define TARGET_NR_fallocate 47
+#define TARGET_NR_faccessat 48
+#define TARGET_NR_chdir 49
+#define TARGET_NR_fchdir 50
+#define TARGET_NR_chroot 51
+#define TARGET_NR_fchmod 52
+#define TARGET_NR_fchmodat 53
+#define TARGET_NR_fchownat 54
+#define TARGET_NR_fchown 55
+#define TARGET_NR_openat 56
+#define TARGET_NR_close 57
+#define TARGET_NR_vhangup 58
+#define TARGET_NR_pipe2 59
+#define TARGET_NR_quotactl 60
+#define TARGET_NR_getdents64 61
+#define TARGET_NR__llseek 62
+#define TARGET_NR_read 63
+#define TARGET_NR_write 64
+#define TARGET_NR_readv 65
+#define TARGET_NR_writev 66
+#define TARGET_NR_pread64 67
+#define TARGET_NR_pwrite64 68
+#define TARGET_NR_preadv 69
+#define TARGET_NR_pwritev 70
+#define TARGET_NR_sendfile 71
+#define TARGET_NR_signalfd4 74
+#define TARGET_NR_vmsplice 75
+#define TARGET_NR_splice 76
+#define TARGET_NR_tee 77
+#define TARGET_NR_readlinkat 78
+#define TARGET_NR_newfstatat 79
+#define TARGET_NR_fstat 80
+#define TARGET_NR_sync 81
+#define TARGET_NR_fsync 82
+#define TARGET_NR_fdatasync 83
+#define TARGET_NR_sync_file_range 84
+#define TARGET_NR_timerfd_create 85
+#define TARGET_NR_acct 89
+#define TARGET_NR_capget 90
+#define TARGET_NR_capset 91
+#define TARGET_NR_personality 92
+#define TARGET_NR_exit 93
+#define TARGET_NR_exit_group 94
+#define TARGET_NR_waitid 95
+#define TARGET_NR_set_tid_address 96
+#define TARGET_NR_unshare 97
+#define TARGET_NR_set_robust_list 99
+#define TARGET_NR_get_robust_list 100
+#define TARGET_NR_getitimer 102
+#define TARGET_NR_setitimer 103
+#define TARGET_NR_kexec_load 104
+#define TARGET_NR_init_module 105
+#define TARGET_NR_delete_module 106
+#define TARGET_NR_timer_create 107
+#define TARGET_NR_timer_getoverrun 109
+#define TARGET_NR_timer_delete 111
+#define TARGET_NR_syslog 116
+#define TARGET_NR_ptrace 117
+#define TARGET_NR_sched_setparam 118
+#define TARGET_NR_sched_setscheduler 119
+#define TARGET_NR_sched_getscheduler 120
+#define TARGET_NR_sched_getparam 121
+#define TARGET_NR_sched_setaffinity 122
+#define TARGET_NR_sched_getaffinity 123
+#define TARGET_NR_sched_yield 124
+#define TARGET_NR_sched_get_priority_max 125
+#define TARGET_NR_sched_get_priority_min 126
+#define TARGET_NR_restart_syscall 128
+#define TARGET_NR_kill 129
+#define TARGET_NR_tkill 130
+#define TARGET_NR_tgkill 131
+#define TARGET_NR_sigaltstack 132
+#define TARGET_NR_rt_sigsuspend 133
+#define TARGET_NR_rt_sigaction 134
+#define TARGET_NR_rt_sigprocmask 135
+#define TARGET_NR_rt_sigpending 136
+#define TARGET_NR_rt_sigqueueinfo 138
+#define TARGET_NR_rt_sigreturn 139
+#define TARGET_NR_setpriority 140
+#define TARGET_NR_getpriority 141
+#define TARGET_NR_reboot 142
+#define TARGET_NR_setregid 143
+#define TARGET_NR_setgid 144
+#define TARGET_NR_setreuid 145
+#define TARGET_NR_setuid 146
+#define TARGET_NR_setresuid 
