Re: [PATCH v2] qga: fence guest-set-time if hwclock not available

2019-11-28 Thread Cornelia Huck
On Thu, 28 Nov 2019 19:38:00 +0100
Laszlo Ersek  wrote:

> Hi Cornelia,
> 
> On 11/28/19 19:11, Cornelia Huck wrote:
> > The Posix implementation of guest-set-time invokes hwclock to
> > set/retrieve the time to/from the hardware clock. If hwclock
> > is not available, the user is currently informed that "hwclock
> > failed to set hardware clock to system time", which is quite
> > misleading. This may happen e.g. on s390x, which has a different
> > timekeeping concept anyway.
> > 
> > Let's check for the availability of the hwclock command and
> > return QERR_UNSUPPORTED for guest-set-time if it is not available.
> > 
> > Signed-off-by: Cornelia Huck 
> > ---
> > 
> > v1 (RFC) -> v2:
> > - use hwclock_path[]
> > - use access() instead of stat()
> > 
> > ---
> >  qga/commands-posix.c | 13 -
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> > index 1c1a165daed8..ffb6420fa9cd 100644
> > --- a/qga/commands-posix.c
> > +++ b/qga/commands-posix.c
> > @@ -156,6 +156,17 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, Error **errp)
> >  pid_t pid;
> >  Error *local_err = NULL;
> >  struct timeval tv;
> > +const char hwclock_path[] = "/sbin/hwclock";  
> 
> Did you drop the "static" storage-class specifier on purpose?

No, I just need to do patches when I'm less tired :/

> 
> > +static int hwclock_available = -1;
> > +
> > +if (hwclock_available < 0) {
> > +hwclock_available = (access(hwclock_path, X_OK) == 0);
> > +}
> > +
> > +if (!hwclock_available) {
> > +error_setg(errp, QERR_UNSUPPORTED);
> > +return;
> > +}
> >  
> >  /* If user has passed a time, validate and set it. */
> >  if (has_time) {
> > @@ -195,7 +206,7 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, Error **errp)
> >  
> >  /* Use '/sbin/hwclock -w' to set RTC from the system time,
> >   * or '/sbin/hwclock -s' to set the system time from RTC. */
> > -execle("/sbin/hwclock", "hwclock", has_time ? "-w" : "-s",
> > +execle(hwclock_path, "hwclock", has_time ? "-w" : "-s",  
> 
> I think it's somewhat obscure now that arg="hwclock" is supposed to
> match the last pathname component in hwclock_path="/sbin/hwclock".
> 
> There are multiple ways to compute "arg" like that, of course. But I
> think they all look uglier than the above. So I'm fine if we just keep this.

Yes, I was not able to come up with something elegant, either.

[Side note: does everyone really put hwclock under /sbin (i.e., is nobody
doing something creative?)]
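
[For reference, one way to derive the argv[0] string from hwclock_path at run
time rather than hard-coding "hwclock" would be something along these lines,
purely an illustrative sketch and not part of the patch:

    /* illustrative only: argv[0] derived from the last path component */
    const char *hwclock_arg = strrchr(hwclock_path, '/');

    hwclock_arg = hwclock_arg ? hwclock_arg + 1 : hwclock_path;
    execle(hwclock_path, hwclock_arg, has_time ? "-w" : "-s",
           NULL, environ);

Whether that is any prettier than keeping the literal "hwclock" is debatable,
which is presumably why the patch leaves it alone.]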

> 
> 
> If you purposely dropped the "static", then:
> 
> Reviewed-by: Laszlo Ersek 
> 
> If you just missed the "static" and now intend to add it back, then for v3:
> 
> Reviewed-by: Laszlo Ersek 
> 
> Thanks
> Laszlo

Thanks for looking! There'll be a v3.

> 
> 
> 
> > NULL, environ);
> >  _exit(EXIT_FAILURE);
> >  } else if (pid < 0) {
> >   
> 




[PATCH v20 3/8] numa: Extend CLI to provide memory side cache information

2019-11-28 Thread Tao Xu
From: Liu Jingqi 

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
Before using hmat-cache option, enable HMAT with -machine hmat=on.
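
For illustration only (the option and parameter names are the ones added by
this series; the numbers are invented), a single memory-side cache level on
node 0 could be described along these lines, after the node's latency and
bandwidth have been supplied with -numa hmat-lb:

  -machine hmat=on -smp 2 \
  -object memory-backend-ram,id=m0,size=1G \
  -numa node,nodeid=0,cpus=0-1,memdev=m0 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
  -numa hmat-cache,node-id=0,size=8K,level=1,associativity=direct,policy=write-back,line=8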

Acked-by: Markus Armbruster 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

Changes in v20:
- Disable cache level 0 in hmat-cache option (Igor)
- Update the QAPI description (Markus)

Changes in v19:
- Add description about the machine property 'hmat' in commit
  message (Markus)
- Update the QAPI comments
- Add a check for no memory side cache

Changes in v18:
- Update the error message (Igor)

Changes in v17:
- Use NumaHmatCacheOptions to replace HMAT_Cache_Info (Igor)
- Add check for unordered cache level input (Igor)

Changes in v16:
- Add cross check with hmat_lb data (Igor)
- Drop total_levels in struct HMAT_Cache_Info (Igor)
- Correct the error table number (Igor)
---
 hw/core/numa.c| 80 ++
 include/sysemu/numa.h |  5 +++
 qapi/machine.json | 81 +--
 qemu-options.hx   | 17 +++--
 4 files changed, 179 insertions(+), 4 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 34eb413f5d..33fda31a4c 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -379,6 +379,73 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
 g_array_append_val(hmat_lb->list, lb_data);
 }
 
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+   Error **errp)
+{
+int nb_numa_nodes = ms->numa_state->num_nodes;
+NodeInfo *numa_info = ms->numa_state->nodes;
+NumaHmatCacheOptions *hmat_cache = NULL;
+
+if (node->node_id >= nb_numa_nodes) {
+error_setg(errp, "Invalid node-id=%" PRIu32 ", it should be less "
+   "than %d", node->node_id, nb_numa_nodes);
+return;
+}
+
+if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) {
+error_setg(errp, "The latency and bandwidth information of "
+   "node-id=%" PRIu32 " should be provided before memory side "
+   "cache attributes", node->node_id);
+return;
+}
+
+if (node->level < 1 || node->level >= HMAT_LB_LEVELS) {
+error_setg(errp, "Invalid level=%" PRIu8 ", it should be larger than 0 "
+   "and less than or equal to %d", node->level,
+   HMAT_LB_LEVELS - 1);
+return;
+}
+
+assert(node->associativity < HMAT_CACHE_ASSOCIATIVITY__MAX);
+assert(node->policy < HMAT_CACHE_WRITE_POLICY__MAX);
+if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+error_setg(errp, "Duplicate configuration of the side cache for "
+   "node-id=%" PRIu32 " and level=%" PRIu8,
+   node->node_id, node->level);
+return;
+}
+
+if ((node->level > 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+(node->size >=
+ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
+error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8
+   " should be less than the size(%" PRIu64 ") of "
+   "level=%" PRIu8, node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+ [node->level - 1]->size,
+   node->level - 1);
+return;
+}
+
+if ((node->level < HMAT_LB_LEVELS - 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level + 1] &&
+(node->size <=
+ms->numa_state->hmat_cache[node->node_id][node->level + 1]->size)) {
+error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8
+   " should be larger than the size(%" PRIu64 ") of "
+   "level=%" PRIu8, node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+ [node->level + 1]->size,
+   node->level + 1);
+return;
+}
+
+hmat_cache = g_malloc0(sizeof(*hmat_cache));
+memcpy(hmat_cache, node, sizeof(*hmat_cache));
+ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
 Error *err = NULL;
@@ -430,6 +497,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 goto end;
 }
 break;
+case NUMA_OPTIONS_TYPE_HMAT_CACHE:
+if (!ms->numa_state->hmat_enabled) {
+error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+   "(HMAT) is disabled, enable it with -machine hmat=on "
+   "before using any of 

Re: [PATCH 1/3] target/arm: Honor HCR_EL2.TID2 trapping requirements

2019-11-28 Thread Edgar E. Iglesias
On Thu, Nov 28, 2019 at 04:17:16PM +, Marc Zyngier wrote:
> HCR_EL2.TID2 mandates that access from EL1 to CTR_EL0, CCSIDR_EL1,
> CCSIDR2_EL1, CLIDR_EL1, CSSELR_EL1 are trapped to EL2, and QEMU
> completely ignores it, making impossible for hypervisors to

Nit: "making it impossible"


> virtualize the cache hierarchy.
> 
> Do the right thing by trapping to EL2 if HCR_EL2.TID2 is set.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  target/arm/helper.c | 28 +---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 0bf8f53d4b..0b6887b100 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -1910,6 +1910,17 @@ static void scr_write(CPUARMState *env, const 
> ARMCPRegInfo *ri, uint64_t value)
>  raw_write(env, ri, value);
>  }
>  
> +static CPAccessResult access_aa64_tid2(CPUARMState *env,
> +   const ARMCPRegInfo *ri,
> +   bool isread)
> +{
> +if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TID2)) {
> +return CP_ACCESS_TRAP_EL2;
> +}
> +
> +return CP_ACCESS_OK;
> +}
> +
>  static uint64_t ccsidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
>  {
>  ARMCPU *cpu = env_archcpu(env);
> @@ -2110,10 +2121,14 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
>.writefn = pmintenclr_write },
>  { .name = "CCSIDR", .state = ARM_CP_STATE_BOTH,
>.opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 0,
> -  .access = PL1_R, .readfn = ccsidr_read, .type = ARM_CP_NO_RAW },
> +  .access = PL1_R,
> +  .accessfn = access_aa64_tid2,
> +  .readfn = ccsidr_read, .type = ARM_CP_NO_RAW },
>  { .name = "CSSELR", .state = ARM_CP_STATE_BOTH,
>.opc0 = 3, .crn = 0, .crm = 0, .opc1 = 2, .opc2 = 0,
> -  .access = PL1_RW, .writefn = csselr_write, .resetvalue = 0,
> +  .access = PL1_RW,
> +  .accessfn = access_aa64_tid2,
> +  .writefn = csselr_write, .resetvalue = 0,
>.bank_fieldoffsets = { offsetof(CPUARMState, cp15.csselr_s),
>   offsetof(CPUARMState, cp15.csselr_ns) } },
>  /* Auxiliary ID register: this actually has an IMPDEF value but for now
> @@ -5204,6 +5219,11 @@ static CPAccessResult ctr_el0_access(CPUARMState *env, 
> const ARMCPRegInfo *ri,
>  if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UCT)) {
>  return CP_ACCESS_TRAP;
>  }
> +
> +if (arm_hcr_el2_eff(env) & HCR_TID2) {
> +return CP_ACCESS_TRAP_EL2;
> +}


Shouldn't we also be checking that we're in EL < 2 when trapping?

Also, I think we need to somehow hook in access_aa64_tid2() for the AArch32
view of CTR, don't we?

Cheers,
Edgar


> +
>  return CP_ACCESS_OK;
>  }
>  
> @@ -6184,7 +6204,9 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>  ARMCPRegInfo clidr = {
>  .name = "CLIDR", .state = ARM_CP_STATE_BOTH,
>  .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 1,
> -.access = PL1_R, .type = ARM_CP_CONST, .resetvalue = cpu->clidr
> +.access = PL1_R, .type = ARM_CP_CONST,
> +.accessfn = access_aa64_tid2,
> +.resetvalue = cpu->clidr
>  };
>  define_one_arm_cp_reg(cpu, &clidr);
>  define_arm_cp_regs(cpu, v7_cp_reginfo);
> -- 
> 2.20.1
> 
> 



Re: [PATCH] hw: add compat machines for 5.0

2019-11-28 Thread Cornelia Huck
On Tue, 12 Nov 2019 11:48:11 +0100
Cornelia Huck  wrote:

> Add 5.0 machine types for arm/i440fx/q35/s390x/spapr.
> 
> For i440fx and q35, unversioned cpu models are still translated
> to -v1; I'll leave changing this (if desired) to the respective
> maintainers.
> 
> Signed-off-by: Cornelia Huck 
> ---
> 
> also pushed out to https://github.com/cohuck/qemu machine-5.0
> 
> x86 folks: if you want to change the cpu model versioning, I
> can do it in this patch, or just do it on top yourselves
> 
> ---
>  hw/arm/virt.c  |  7 ++-
>  hw/core/machine.c  |  3 +++
>  hw/i386/pc.c   |  3 +++
>  hw/i386/pc_piix.c  | 14 +-
>  hw/i386/pc_q35.c   | 13 -
>  hw/ppc/spapr.c | 15 +--
>  hw/s390x/s390-virtio-ccw.c | 14 +-
>  include/hw/boards.h|  3 +++
>  include/hw/i386/pc.h   |  3 +++
>  9 files changed, 69 insertions(+), 6 deletions(-)

Queued to s390-next.




Re: [PATCH v1 1/1] pc-bios/s390-ccw: fix sclp_get_loadparm_ascii

2019-11-28 Thread Christian Borntraeger



On 28.11.19 16:05, Peter Maydell wrote:
> On Thu, 28 Nov 2019 at 12:48, Christian Borntraeger
>  wrote:
>>
>>
>>
>> On 28.11.19 13:45, Cornelia Huck wrote:
>>> On Thu, 28 Nov 2019 13:35:29 +0100
>>> Christian Borntraeger  wrote:
>>>
 Ack.

 Conny, I think this would be really nice to have for 4.2 (together with a 
 bios rebuild)
 as this fixes a regression. Opinions?
>>>
>>> I fear that this is a bit late for 4.2... but this should get a
>>> cc:stable.
>>
>> So we would rather ship a qemu regression instead of pushing a 1-line fixup?
>> Peter, what is the current state of 4.2? Does it look like we will have an
>> rc4, or is everything else silent?
> 
> There isn't currently anything else that would need an rc4.

I would vote for still getting this patch in. And I think we probably do not
need an rc4 for that; the fix seems pretty straightforward.




Re: [PATCH 0/2] RFC: add -mem-shared option

2019-11-28 Thread Marc-André Lureau
Hi

On Fri, Nov 29, 2019 at 11:03 AM Gerd Hoffmann  wrote:
>
> On Thu, Nov 28, 2019 at 06:15:16PM +0400, Marc-André Lureau wrote:
> > Hi,
> >
> > Setting up shared memory for vhost-user is a bit complicated from the
> > command line, as it requires a NUMA setup such as: -m 4G -object
> > memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on -numa
> > node,memdev=mem.
> >
> > Instead, I suggest adding a -mem-shared option for non-NUMA setups,
> > that will make the -mem-path or anonymous memory shareable.
>
> Is it an option to just use memfd by default (if available)?

Yes, with a fallback path currently using a temporary file under /tmp
(we may want to use shm_open() instead, or a different location such
as XDG_RUNTIME_DIR? - and use O_TMPFILE)
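
[For comparison, the intent is to turn the incantation quoted above into
something like the following (the option name is the one proposed by this RFC,
not an existing QEMU flag):

  today:    -m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on -numa node,memdev=mem
  proposed: -m 4G -mem-shared

with the actual backing (memfd if available, otherwise a temporary file as
described above) picked automatically.]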




Re: [PATCH] hw: add compat machines for 5.0

2019-11-28 Thread Cornelia Huck
On Thu, 28 Nov 2019 17:38:11 -0300
Eduardo Habkost  wrote:

> On Thu, Nov 28, 2019 at 06:37:06PM +0100, Cornelia Huck wrote:
> > On Tue, 12 Nov 2019 11:48:11 +0100
> > Cornelia Huck  wrote:
> >   
> > > Add 5.0 machine types for arm/i440fx/q35/s390x/spapr.
> > > 
> > > For i440fx and q35, unversioned cpu models are still translated
> > > to -v1; I'll leave changing this (if desired) to the respective
> > > maintainers.
> > > 
> > > Signed-off-by: Cornelia Huck 
> > > ---
> > > 
> > > also pushed out to https://github.com/cohuck/qemu machine-5.0
> > > 
> > > x86 folks: if you want to change the cpu model versioning, I
> > > can do it in this patch, or just do it on top yourselves  
> > 
> > So, do we have a final verdict yet (keep it at v1)?
> > 
> > If yes, I'll queue this via the s390 tree, unless someone else beats me
> > to it.  
> 
> We won't change default_cpu_version in 5.0, so:
> 
> Reviewed-by: Eduardo Habkost 
> 

Thanks!

Will queue through the s390 tree.




[PATCH V3 1/2] block/nbd: extract the common cleanup code

2019-11-28 Thread pannengyuan
From: PanNengyuan 

The BDRVNBDState cleanup code is duplicated in two places; add an
nbd_free_bdrvstate_prop() function to do these cleanups (suggested by
Stefano Garzarella).

Signed-off-by: PanNengyuan 
---
 block/nbd.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 1239761..5805979 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -94,6 +94,8 @@ typedef struct BDRVNBDState {
 
 static int nbd_client_connect(BlockDriverState *bs, Error **errp);
 
+static void nbd_free_bdrvstate_prop(BDRVNBDState *s);
+
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
 if (ret == -EIO) {
@@ -1486,6 +1488,15 @@ static int nbd_client_connect(BlockDriverState *bs, Error **errp)
 }
 }
 
+static void nbd_free_bdrvstate_prop(BDRVNBDState *s)
+{
+object_unref(OBJECT(s->tlscreds));
+qapi_free_SocketAddress(s->saddr);
+g_free(s->export);
+g_free(s->tlscredsid);
+g_free(s->x_dirty_bitmap);
+}
+
 /*
  * Parse nbd_open options
  */
@@ -1855,10 +1866,7 @@ static int nbd_process_options(BlockDriverState *bs, QDict *options,
 
  error:
 if (ret < 0) {
-object_unref(OBJECT(s->tlscreds));
-qapi_free_SocketAddress(s->saddr);
-g_free(s->export);
-g_free(s->tlscredsid);
+nbd_free_bdrvstate_prop(s);
 }
 qemu_opts_del(opts);
 return ret;
@@ -1937,12 +1945,7 @@ static void nbd_close(BlockDriverState *bs)
 BDRVNBDState *s = bs->opaque;
 
 nbd_client_close(bs);
-
-object_unref(OBJECT(s->tlscreds));
-qapi_free_SocketAddress(s->saddr);
-g_free(s->export);
-g_free(s->tlscredsid);
-g_free(s->x_dirty_bitmap);
+nbd_free_bdrvstate_prop(s);
 }
 
 static int64_t nbd_getlength(BlockDriverState *bs)
-- 
2.7.2.windows.1





[PATCH V3 2/2] block/nbd: fix memory leak in nbd_open()

2019-11-28 Thread pannengyuan
From: PanNengyuan 

In the current implementation there is a memory leak when
nbd_client_connect() returns an error status. Here is an easy way to
reproduce it:

1. run qemu-iotests as follows and check the result with asan:
./check -raw 143

Following is the asan output backtrace:
Direct leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7f629688a560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
#1 0x7f6295e7e015 in g_malloc0  (/usr/lib64/libglib-2.0.so.0+0x50015)
#2 0x56281dab4642 in qobject_input_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295
#3 0x56281dab1a04 in visit_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49
#4 0x56281dad1827 in visit_type_SocketAddress  qapi/qapi-visit-sockets.c:386
#5 0x56281da8062f in nbd_config   /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716
#6 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829
#7 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873

Direct leak of 15 byte(s) in 1 object(s) allocated from:
#0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0)
#1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd)
#2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace)
#3 0x56281da804ac in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1834
#4 0x56281da804ac in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873

Indirect leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0)
#1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd)
#2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace)
#3 0x56281dab41a3 in qobject_input_type_str_keyval /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:536
#4 0x56281dab2ee9 in visit_type_str /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:297
#5 0x56281dad0fa1 in visit_type_UnixSocketAddress_members qapi/qapi-visit-sockets.c:141
#6 0x56281dad17b6 in visit_type_SocketAddress_members qapi/qapi-visit-sockets.c:366
#7 0x56281dad186a in visit_type_SocketAddress qapi/qapi-visit-sockets.c:393
#8 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716
#9 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829
#10 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873

Reported-by: Euler Robot 
Signed-off-by: PanNengyuan 
---
Changes v2 to v1:
- add a new function to do the common cleanups (suggested by Stefano
  Garzarella).
---
Changes v3 to v2:
- split in two patches(suggested by Stefano Garzarella)
---
 block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nbd.c b/block/nbd.c
index 5805979..09d6925 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1889,6 +1889,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 
 ret = nbd_client_connect(bs, errp);
 if (ret < 0) {
+nbd_free_bdrvstate_prop(s);
 return ret;
 }
 /* successfully connected */
-- 
2.7.2.windows.1





Re: [PATCH 0/2] RFC: add -mem-shared option

2019-11-28 Thread Gerd Hoffmann
On Thu, Nov 28, 2019 at 06:15:16PM +0400, Marc-André Lureau wrote:
> Hi,
> 
> Setting up shared memory for vhost-user is a bit complicated from the
> command line, as it requires a NUMA setup such as: -m 4G -object
> memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on -numa
> node,memdev=mem.
> 
> Instead, I suggest adding a -mem-shared option for non-NUMA setups,
> that will make the -mem-path or anonymous memory shareable.

Is it an option to just use memfd by default (if available)?

cheers,
  Gerd




[RFC] smbios: does it make sense to present some host smbios information to guest?

2019-11-28 Thread Guoheyi

Hi folks,

Right now some smbios fields are hard coded (like the CPU nominal
frequency), and some can be opted in, but there is no feasible way to
present real backend hardware information to the guest. In some
scenarios, the users of virtual machines may not be happy to see an
unknown CPU model and a 2.0GHz CPU speed, while they have been told the
backend CPUs are the newest model with a much higher frequency.


The backend information may change after migration from one host to
another, but in a large cluster it will most likely stay the same.
Even if it does change, we can synchronize the information after a
system reset.


So does it make sense to present some host smbios information
(especially CPU information) to the guest, by default or by some easy
option? Or do you have any other advice?
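
[For context: individual SMBIOS fields can already be overridden by management
software on the command line via the -smbios option; a hypothetical invocation
along the lines of

  -smbios type=4,max-speed=2600,current-speed=2600

(the field names here are illustrative only, not a claim about the current
option set) shows the kind of knob being asked about. The open question is
whether QEMU should fill such values in from the host automatically instead.]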


Thanks,
HG




Re: [PATCH v6] hw/vfio/ap: drop local_err from vfio_ap_realize

2019-11-28 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> No reason for local_err here, use errp directly instead.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Markus Armbruster 




Re: [PATCH v6] backends/cryptodev: drop local_err from cryptodev_backend_complete()

2019-11-28 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> No reason for local_err here, use errp directly instead.

Related: "[PATCH v6] hw/vfio/ap: drop local_err from vfio_ap_realize".
I'm surprised it's just two.  Did you search for the anti-pattern
systematically?
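
(For anyone who wants to hunt for the remaining cases, a crude and purely
illustrative search such as

  git grep -n -B1 -A4 'Error \*local_err = NULL;'

turns up candidate sites where a local error is declared and then merely
propagated, though each hit still needs eyeballing.)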

> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Marc-André Lureau 
> ---
>
> v6: add r-b by Philippe and Marc-André
>
>  backends/cryptodev.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/backends/cryptodev.c b/backends/cryptodev.c
> index 3c071eab95..5a9735684e 100644
> --- a/backends/cryptodev.c
> +++ b/backends/cryptodev.c
> @@ -176,19 +176,10 @@ cryptodev_backend_complete(UserCreatable *uc, Error 
> **errp)
>  {
>  CryptoDevBackend *backend = CRYPTODEV_BACKEND(uc);
>  CryptoDevBackendClass *bc = CRYPTODEV_BACKEND_GET_CLASS(uc);
> -Error *local_err = NULL;
>  
>  if (bc->init) {
> -bc->init(backend, _err);
> -if (local_err) {
> -goto out;
> -}
> +bc->init(backend, errp);
>  }
> -
> -return;
> -
> -out:
> -error_propagate(errp, local_err);
>  }
>  
>  void cryptodev_backend_set_used(CryptoDevBackend *backend, bool used)

Reviewed-by: Markus Armbruster 




Re: [PATCH v6] hw/core/qdev: cleanup Error ** variables

2019-11-28 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> Rename Error ** parameter in check_only_migratable to common errp.
>
> In device_set_realized:
>
>  - Move "if (local_err != NULL)" closer to error setters.
>
>  - Drop 'Error **local_errp': it doesn't save any LoCs, but it's very
>unusual.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Eric Blake 
> Reviewed-by: Marc-André Lureau 
> ---
>
> v6: enhance grammar in comment [Eric]
> add r-b by Eric and Marc-André
>
>  hw/core/qdev.c | 28 +---
>  1 file changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index cf1ba28fe3..82d3ee590a 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -820,12 +820,12 @@ static bool device_get_realized(Object *obj, Error **errp)
>  return dev->realized;
>  }
>  
> -static bool check_only_migratable(Object *obj, Error **err)
> +static bool check_only_migratable(Object *obj, Error **errp)
>  {
>  DeviceClass *dc = DEVICE_GET_CLASS(obj);
>  
>  if (!vmstate_check_only_migratable(dc->vmsd)) {
> -error_setg(err, "Device %s is not migratable, but "
> +error_setg(errp, "Device %s is not migratable, but "
> "--only-migratable was specified",
> object_get_typename(obj));
>  return false;
> @@ -874,10 +874,9 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
>  
>  if (dc->realize) {
>  dc->realize(dev, _err);
> -}
> -
> -if (local_err != NULL) {
> -goto fail;
> +if (local_err != NULL) {
> +goto fail;
> +}
>  }

Yes, this is the more conventional usage.

>  
>  DEVICE_LISTENER_CALL(realize, Forward, dev);
> @@ -918,27 +917,26 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> }
>  
>  } else if (!value && dev->realized) {
> -Error **local_errp = NULL;
> +/* We want local_err to track only the first error */
>  QLIST_FOREACH(bus, >child_bus, sibling) {
> > -local_errp = local_err ? NULL : &local_err;
>  object_property_set_bool(OBJECT(bus), false, "realized",
> - local_errp);
> > + local_err ? NULL : &local_err);
>  }

This is a rather unusual way to keep the first error of several.
qapi/error.h advises:

 * Receive and accumulate multiple errors (first one wins):
 * Error *err = NULL, *local_err = NULL;
 * foo(arg, &err);
 * bar(arg, &local_err);
 * error_propagate(&err, local_err);
 * if (err) {
 * handle the error...
 * }

If replacing this by the usual way is too troublesome now, we can do it
after the ERRP_AUTO_PROPAGATE() conversion.  Your choice.
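
(Applied to the loop above, the usual form would look roughly like this; just
a sketch of the idiom, not a tested replacement:

    Error *err = NULL, *local_err = NULL;

    QLIST_FOREACH(bus, &dev->child_bus, sibling) {
        object_property_set_bool(OBJECT(bus), false, "realized", &local_err);
        /* keep only the first error; later ones are freed by propagate */
        error_propagate(&err, local_err);
        local_err = NULL;
    }

with err checked once after the loop.)
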

>  if (qdev_get_vmsd(dev)) {
>  vmstate_unregister(dev, qdev_get_vmsd(dev), dev);
>  }
>  if (dc->unrealize) {
> > -local_errp = local_err ? NULL : &local_err;
> > -dc->unrealize(dev, local_errp);
> > +dc->unrealize(dev, local_err ? NULL : &local_err);
>  }
>  dev->pending_deleted_event = true;
>  DEVICE_LISTENER_CALL(unrealize, Reverse, dev);
> -}
>  
> -if (local_err != NULL) {
> -goto fail;
> +if (local_err != NULL) {
> +goto fail;
> +}
>  }
>  
> +assert(local_err == NULL);

Not sure this one's worth it's keep with the usage of local_err cleaned
up.

>  dev->realized = value;
>  return;
>  
> @@ -976,7 +974,7 @@ static bool device_get_hotpluggable(Object *obj, Error **errp)
>  qbus_is_hotpluggable(dev->parent_bus));
>  }
>  
> -static bool device_get_hotplugged(Object *obj, Error **err)
> +static bool device_get_hotplugged(Object *obj, Error **errp)
>  {
>  DeviceState *dev = DEVICE(obj);

In case you want to clean up afterwards rather than now:
Reviewed-by: Markus Armbruster 




Re: [PATCH v6] vnc: drop Error pointer indirection in vnc_client_io_error

2019-11-28 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> We don't need Error **, as all callers pass local Error object, which
> isn't used after the call, or NULL. Use Error * instead.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  ui/vnc.h |  2 +-
>  ui/vnc.c | 20 +++-
>  2 files changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/ui/vnc.h b/ui/vnc.h
> index fea79c2fc9..4e2637ce6c 100644
> --- a/ui/vnc.h
> +++ b/ui/vnc.h
> @@ -547,7 +547,7 @@ uint32_t read_u32(uint8_t *data, size_t offset);
>  
>  /* Protocol stage functions */
>  void vnc_client_error(VncState *vs);
> -size_t vnc_client_io_error(VncState *vs, ssize_t ret, Error **errp);
> +size_t vnc_client_io_error(VncState *vs, ssize_t ret, Error *err);
>  
>  void start_client_init(VncState *vs);
>  void start_auth_vnc(VncState *vs);
> diff --git a/ui/vnc.c b/ui/vnc.c
> index 87b8045afe..4100d6e404 100644
> --- a/ui/vnc.c
> +++ b/ui/vnc.c
> @@ -1312,7 +1312,7 @@ void vnc_disconnect_finish(VncState *vs)
>  g_free(vs);
>  }
>  
> -size_t vnc_client_io_error(VncState *vs, ssize_t ret, Error **errp)
> +size_t vnc_client_io_error(VncState *vs, ssize_t ret, Error *err)
>  {
>  if (ret <= 0) {
>  if (ret == 0) {
> @@ -1320,15 +1320,11 @@ size_t vnc_client_io_error(VncState *vs, ssize_t ret, Error **errp)
>  vnc_disconnect_start(vs);
>  } else if (ret != QIO_CHANNEL_ERR_BLOCK) {
>  trace_vnc_client_io_error(vs, vs->ioc,
> -  errp ? error_get_pretty(*errp) :
> -  "Unknown");
> +  err ? error_get_pretty(err) : "Unknown");
>  vnc_disconnect_start(vs);
>  }
>  
> -if (errp) {
> -error_free(*errp);
> -*errp = NULL;
> -}
> +error_free(err);
>  return 0;
>  }
>  return ret;

Not visible in this patch: the vnc_client_io_error(vs, -1, NULL) in
ui/vnc-auth-sasl.c.  They trace "Unknown", vnc_disconnect_start() as
before.  They additionally execute error_free(NULL), which does nothing.
Good.

> @@ -1361,10 +1357,9 @@ size_t vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
>  {
>  Error *err = NULL;
>  ssize_t ret;
> -ret = qio_channel_write(
> -vs->ioc, (const char *)data, datalen, &err);
> +ret = qio_channel_write(vs->ioc, (const char *)data, datalen, &err);
>  VNC_DEBUG("Wrote wire %p %zd -> %ld\n", data, datalen, ret);
> -return vnc_client_io_error(vs, ret, &err);
> +return vnc_client_io_error(vs, ret, err);
>  }
>  
>  
> @@ -1488,10 +1483,9 @@ size_t vnc_client_read_buf(VncState *vs, uint8_t *data, size_t datalen)
>  {
>  ssize_t ret;
>  Error *err = NULL;
> -ret = qio_channel_read(
> -vs->ioc, (char *)data, datalen, &err);
> +ret = qio_channel_read(vs->ioc, (char *)data, datalen, &err);
>  VNC_DEBUG("Read wire %p %zd -> %ld\n", data, datalen, ret);
> -return vnc_client_io_error(vs, ret, &err);
> +return vnc_client_io_error(vs, ret, err);
>  }

Nothing changes for these guys.

Reviewed-by: Markus Armbruster 




Re: [PATCH v6] hmp: drop Error pointer indirection in hmp_handle_error

2019-11-28 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> We don't need Error **, as all callers pass local Error object, which
> isn't used after the call. Use Error * instead.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Acked-by: Dr. David Alan Gilbert 

Reviewed-by: Markus Armbruster 




Re: [PATCH 0/2] RFC: add -mem-shared option

2019-11-28 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20191128141518.628245-1-marcandre.lur...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH 0/2] RFC: add -mem-shared option
Type: series
Message-id: 20191128141518.628245-1-marcandre.lur...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
e2537da Add -mem-shared option
623d044 memfd: add qemu_memfd_open()

=== OUTPUT BEGIN ===
1/2 Checking commit 623d044023d1 (memfd: add qemu_memfd_open())
2/2 Checking commit e2537da34663 (Add -mem-shared option)
ERROR: do not initialise globals to 0 or NULL
#123: FILE: vl.c:146:
+int mem_shared = 0;

total: 1 errors, 0 warnings, 90 lines checked

Patch 2/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20191128141518.628245-1-marcandre.lur...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[for-5.0 1/4] spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover

2019-11-28 Thread David Gibson
PAPR allows the interrupt controller used on a POWER9 machine (XICS or
XIVE) to be selected by the guest operating system, by using the
ibm,client-architecture-support (CAS) feature negotiation call.

Currently, if the guest selects an interrupt controller different from the
one selected at initial boot, this causes the system to be reset with the
new model and the boot starts again.  This means we run through the SLOF
boot process twice, as well as any other bootloader (e.g. grub) in use
before the OS calls CAS.  This can be confusing and/or inconvenient for
users.

Thanks to two fairly recent changes, we no longer need this reboot.  1) we
now completely regenerate the device tree when CAS is called (meaning we
don't need special case updates for all the device tree changes caused by
the interrupt controller mode change),  2) we now have explicit code paths
to activate and deactivate the different interrupt controllers, rather than
just implicitly calling those at machine reset time.

We can therefore eliminate the reboot for changing irq mode, simply by
putting a call to spapr_irq_update_active_intc() before we call
spapr_h_cas_compose_response() (which gives the updated device tree to the
guest firmware and OS).

Signed-off-by: David Gibson 
---
 hw/ppc/spapr_hcall.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 140f05c1c6..05a7ca275b 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1767,21 +1767,10 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 }
 spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
 spapr_ovec_cleanup(ov1_guest);
-if (!spapr->cas_reboot) {
-/* If spapr_machine_reset() did not set up a HPT but one is necessary
- * (because the guest isn't going to use radix) then set it up here. */
-if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
-/* legacy hash or new hash: */
-spapr_setup_hpt_and_vrma(spapr);
-}
-spapr->cas_reboot =
-(spapr_h_cas_compose_response(spapr, args[1], args[2],
-  ov5_updates) != 0);
-}
 
 /*
- * Ensure the guest asks for an interrupt mode we support; otherwise
- * terminate the boot.
+ * Ensure the guest asks for an interrupt mode we support;
+ * otherwise terminate the boot.
  */
 if (guest_xive) {
 if (!spapr->irq->xive) {
@@ -1797,14 +1786,18 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 }
 }
 
-/*
- * Generate a machine reset when we have an update of the
- * interrupt mode. Only required when the machine supports both
- * modes.
- */
+spapr_irq_update_active_intc(spapr);
+
 if (!spapr->cas_reboot) {
-spapr->cas_reboot = spapr_ovec_test(ov5_updates, OV5_XIVE_EXPLOIT)
-&& spapr->irq->xics && spapr->irq->xive;
+/* If spapr_machine_reset() did not set up a HPT but one is necessary
+ * (because the guest isn't going to use radix) then set it up here. */
+if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
+/* legacy hash or new hash: */
+spapr_setup_hpt_and_vrma(spapr);
+}
+spapr->cas_reboot =
+(spapr_h_cas_compose_response(spapr, args[1], args[2],
+  ov5_updates) != 0);
 }
 
 spapr_ovec_cleanup(ov5_updates);
-- 
2.23.0




[for-5.0 3/4] spapr: Fold h_cas_compose_response() into h_client_architecture_support()

2019-11-28 Thread David Gibson
spapr_h_cas_compose_response() handles the last piece of the PAPR feature
negotiation process invoked via the ibm,client-architecture-support OF
call.  Its only caller is h_client_architecture_support() which handles
most of the rest of that process.

I believe it was placed in a separate file originally to handle some fiddly
dependencies between functions, but mostly it's just confusing to have
the CAS process split into two pieces like this.  Now that compose response
is simplified (by just generating the whole device tree anew), it's cleaner
to just fold it into h_client_architecture_support().

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 61 +-
 hw/ppc/spapr_hcall.c   | 55 ++---
 include/hw/ppc/spapr.h |  4 +--
 3 files changed, 54 insertions(+), 66 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d34e317f48..5187f5b0a5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -76,7 +76,6 @@
 #include "hw/nmi.h"
 #include "hw/intc/intc.h"
 
-#include "qemu/cutils.h"
 #include "hw/ppc/spapr_cpu_core.h"
 #include "hw/mem/memory-device.h"
 #include "hw/ppc/spapr_tpm_proxy.h"
@@ -897,63 +896,6 @@ out:
 return ret;
 }
 
-static bool spapr_hotplugged_dev_before_cas(void)
-{
-Object *drc_container, *obj;
-ObjectProperty *prop;
-ObjectPropertyIterator iter;
-
-drc_container = container_get(object_get_root(), "/dr-connector");
-object_property_iter_init(&iter, drc_container);
-while ((prop = object_property_iter_next(&iter))) {
-if (!strstart(prop->type, "link<", NULL)) {
-continue;
-}
-obj = object_property_get_link(drc_container, prop->name, NULL);
-if (spapr_drc_needed(obj)) {
-return true;
-}
-}
-return false;
-}
-
-static void *spapr_build_fdt(SpaprMachineState *spapr, bool reset,
- size_t space);
-
-int spapr_h_cas_compose_response(SpaprMachineState *spapr,
- target_ulong addr, target_ulong size,
- SpaprOptionVector *ov5_updates)
-{
-void *fdt;
-SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
-
-if (spapr_hotplugged_dev_before_cas()) {
-return 1;
-}
-
-if (size < sizeof(hdr)) {
-error_report("SLOF provided insufficient CAS buffer "
- TARGET_FMT_lu " (min: %zu)", size, sizeof(hdr));
-exit(EXIT_FAILURE);
-}
-
-size -= sizeof(hdr);
-
-fdt = spapr_build_fdt(spapr, false, size);
-_FDT((fdt_pack(fdt)));
-
-cpu_physical_memory_write(addr, &hdr, sizeof(hdr));
-cpu_physical_memory_write(addr + sizeof(hdr), fdt, fdt_totalsize(fdt));
-trace_spapr_cas_continue(fdt_totalsize(fdt) + sizeof(hdr));
-
-g_free(spapr->fdt_blob);
-spapr->fdt_size = fdt_totalsize(fdt);
-spapr->fdt_initial_size = spapr->fdt_size;
-spapr->fdt_blob = fdt;
-
-return 0;
-}
-
 static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
 {
 MachineState *ms = MACHINE(spapr);
@@ -1191,8 +1133,7 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
 }
 }
 
-static void *spapr_build_fdt(SpaprMachineState *spapr, bool reset,
- size_t space)
+void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space)
 {
 MachineState *machine = MACHINE(spapr);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 05a7ca275b..0f19be794c 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1,4 +1,5 @@
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "qapi/error.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/runstate.h"
@@ -15,6 +16,7 @@
 #include "cpu-models.h"
 #include "trace.h"
 #include "kvm_ppc.h"
+#include "hw/ppc/fdt.h"
 #include "hw/ppc/spapr_ovec.h"
 #include "mmu-book3s-v3.h"
 #include "hw/mem/memory-device.h"
@@ -1638,6 +1640,26 @@ static uint32_t cas_check_pvr(SpaprMachineState *spapr, PowerPCCPU *cpu,
 return best_compat;
 }
 
+static bool spapr_hotplugged_dev_before_cas(void)
+{
+Object *drc_container, *obj;
+ObjectProperty *prop;
+ObjectPropertyIterator iter;
+
+drc_container = container_get(object_get_root(), "/dr-connector");
+object_property_iter_init(&iter, drc_container);
+while ((prop = object_property_iter_next(&iter))) {
+if (!strstart(prop->type, "link<", NULL)) {
+continue;
+}
+obj = object_property_get_link(drc_container, prop->name, NULL);
+if (spapr_drc_needed(obj)) {
+return true;
+}
+}
+return false;
+}
+
 static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
   SpaprMachineState *spapr,
   target_ulong opcode,
@@ -1645,6 +1667,8 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,

[for-5.0 4/4] spapr: Simplify ovec diff

2019-11-28 Thread David Gibson
spapr_ovec_diff(ov, old, new) has somewhat complex semantics.  ov is set
to those bits which are in new but not old, and it returns as a boolean
whether or not there are any bits in old but not new.

It turns out that both callers only care about the second, not the first.
This is basically equivalent to a bitmap subset operation, which is easier
to understand and implement.  So replace spapr_ovec_diff() with
spapr_ovec_subset().

Cc: Mike Roth 

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  | 14 +++---
 hw/ppc/spapr_hcall.c|  8 ++--
 hw/ppc/spapr_ovec.c | 30 ++
 include/hw/ppc/spapr_ovec.h |  4 +---
 4 files changed, 16 insertions(+), 40 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5187f5b0a5..32e1cc1d3f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1840,8 +1840,6 @@ static bool spapr_ov5_cas_needed(void *opaque)
 {
 SpaprMachineState *spapr = opaque;
 SpaprOptionVector *ov5_mask = spapr_ovec_new();
-SpaprOptionVector *ov5_legacy = spapr_ovec_new();
-SpaprOptionVector *ov5_removed = spapr_ovec_new();
 bool cas_needed;
 
 /* Prior to the introduction of SpaprOptionVector, we had two option
@@ -1873,17 +1871,11 @@ static bool spapr_ov5_cas_needed(void *opaque)
 spapr_ovec_set(ov5_mask, OV5_DRCONF_MEMORY);
 spapr_ovec_set(ov5_mask, OV5_DRMEM_V2);
 
-/* spapr_ovec_diff returns true if bits were removed. we avoid using
- * the mask itself since in the future it's possible "legacy" bits may be
- * removed via machine options, which could generate a false positive
- * that breaks migration.
- */
-spapr_ovec_intersect(ov5_legacy, spapr->ov5, ov5_mask);
-cas_needed = spapr_ovec_diff(ov5_removed, spapr->ov5, ov5_legacy);
+/* We need extra information if we have any bits outside the mask
+ * defined above */
+cas_needed = !spapr_ovec_subset(spapr->ov5, ov5_mask);
 
 spapr_ovec_cleanup(ov5_mask);
-spapr_ovec_cleanup(ov5_legacy);
-spapr_ovec_cleanup(ov5_removed);
 
 return cas_needed;
 }
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 0f19be794c..f1799b1b70 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1671,7 +1671,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 target_ulong fdt_bufsize = args[2];
 target_ulong ov_table;
 uint32_t cas_pvr;
-SpaprOptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
+SpaprOptionVector *ov1_guest, *ov5_guest, *ov5_cas_old;
 bool guest_radix;
 Error *local_err = NULL;
 bool raw_mode_supported = false;
@@ -1770,9 +1770,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 /* capabilities that have been added since CAS-generated guest reset.
  * if capabilities have since been removed, generate another reset
  */
-ov5_updates = spapr_ovec_new();
-spapr->cas_reboot = spapr_ovec_diff(ov5_updates,
-ov5_cas_old, spapr->ov5_cas);
+spapr->cas_reboot = !spapr_ovec_subset(ov5_cas_old, spapr->ov5_cas);
 spapr_ovec_cleanup(ov5_cas_old);
 /* Now that processing is finished, set the radix/hash bit for the
  * guest if it requested a valid mode; otherwise terminate the boot. */
@@ -1849,8 +1847,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 spapr->fdt_blob = fdt;
 }
 
-spapr_ovec_cleanup(ov5_updates);
-
 if (spapr->cas_reboot) {
 qemu_system_reset_request(SHUTDOWN_CAUSE_SUBSYSTEM_RESET);
 }
diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
index 811fadf143..0ff6d1aeae 100644
--- a/hw/ppc/spapr_ovec.c
+++ b/hw/ppc/spapr_ovec.c
@@ -76,31 +76,21 @@ void spapr_ovec_intersect(SpaprOptionVector *ov,
 bitmap_and(ov->bitmap, ov1->bitmap, ov2->bitmap, OV_MAXBITS);
 }
 
-/* returns true if options bits were removed, false otherwise */
-bool spapr_ovec_diff(SpaprOptionVector *ov,
- SpaprOptionVector *ov_old,
- SpaprOptionVector *ov_new)
+/* returns true if ov1 has a subset of bits in ov2 */
+bool spapr_ovec_subset(SpaprOptionVector *ov1, SpaprOptionVector *ov2)
 {
-unsigned long *change_mask = bitmap_new(OV_MAXBITS);
-unsigned long *removed_bits = bitmap_new(OV_MAXBITS);
-bool bits_were_removed = false;
+unsigned long *tmp = bitmap_new(OV_MAXBITS);
+bool result;
 
-g_assert(ov);
-g_assert(ov_old);
-g_assert(ov_new);
-
-bitmap_xor(change_mask, ov_old->bitmap, ov_new->bitmap, OV_MAXBITS);
-bitmap_and(ov->bitmap, ov_new->bitmap, change_mask, OV_MAXBITS);
-bitmap_and(removed_bits, ov_old->bitmap, change_mask, OV_MAXBITS);
+g_assert(ov1);
+g_assert(ov2);
 
-if (!bitmap_empty(removed_bits, OV_MAXBITS)) {
-bits_were_removed = true;
-}
+bitmap_andnot(tmp, ov1->bitmap, ov2->bitmap, OV_MAXBITS);
+result = bitmap_empty(tmp, OV_MAXBITS);
 
-g_free(change_mask);

[for-5.0 0/4] spapr: Improvements to CAS feature negotiation

2019-11-28 Thread David Gibson
This series contains several cleanups to the handling of the
ibm,client-architecture-support firmware call used for boot time
feature negotiation between the guest OS and the firmware &
hypervisor.

Mostly it's just internal polish, but one significant user visible
change is that we no longer generate an extra CAS reboot to switch
between XICS and XIVE interrupt modes (by far the most common cause of
CAS reboots in practice).

David Gibson (4):
  spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover
  spapr: Improve handling of fdt buffer size
  spapr: Fold h_cas_compose_response() into
h_client_architecture_support()
  spapr: Simplify ovec diff

 hw/ppc/spapr.c  | 92 +++--
 hw/ppc/spapr_hcall.c| 90 +---
 hw/ppc/spapr_ovec.c | 30 
 include/hw/ppc/spapr.h  |  4 +-
 include/hw/ppc/spapr_ovec.h |  4 +-
 5 files changed, 83 insertions(+), 137 deletions(-)

-- 
2.23.0




[for-5.0 2/4] spapr: Improve handling of fdt buffer size

2019-11-28 Thread David Gibson
Previously, spapr_build_fdt() constructed the device tree in a fixed
buffer of size FDT_MAX_SIZE.  This is a bit inflexible, but more
importantly it's awkward for the case where we use it during CAS.  In
that case the guest firmware supplies a buffer and we have to
awkwardly check that what we generated fits into it afterwards, after
doing a lot of size checks during spapr_build_fdt().

Simplify this by having spapr_build_fdt() take a 'space' parameter.
For the CAS case, we pass in the buffer size provided by SLOF, for the
machine init case, we continue to pass FDT_MAX_SIZE.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d9c9a2bcee..d34e317f48 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -917,7 +917,8 @@ static bool spapr_hotplugged_dev_before_cas(void)
 return false;
 }
 
-static void *spapr_build_fdt(SpaprMachineState *spapr, bool reset);
+static void *spapr_build_fdt(SpaprMachineState *spapr, bool reset,
+ size_t space);
 
 int spapr_h_cas_compose_response(SpaprMachineState *spapr,
  target_ulong addr, target_ulong size,
@@ -930,24 +931,17 @@ int spapr_h_cas_compose_response(SpaprMachineState *spapr,
 return 1;
 }
 
-if (size < sizeof(hdr) || size > FW_MAX_SIZE) {
-error_report("SLOF provided an unexpected CAS buffer size "
- TARGET_FMT_lu " (min: %zu, max: %u)",
- size, sizeof(hdr), FW_MAX_SIZE);
+if (size < sizeof(hdr)) {
+error_report("SLOF provided insufficient CAS buffer "
+ TARGET_FMT_lu " (min: %zu)", size, sizeof(hdr));
 exit(EXIT_FAILURE);
 }
 
 size -= sizeof(hdr);
 
-fdt = spapr_build_fdt(spapr, false);
+fdt = spapr_build_fdt(spapr, false, size);
 _FDT((fdt_pack(fdt)));
 
-if (fdt_totalsize(fdt) + sizeof(hdr) > size) {
-g_free(fdt);
-trace_spapr_cas_failed(size);
-return -1;
-}
-
 cpu_physical_memory_write(addr, &hdr, sizeof(hdr));
 cpu_physical_memory_write(addr + sizeof(hdr), fdt, fdt_totalsize(fdt));
 trace_spapr_cas_continue(fdt_totalsize(fdt) + sizeof(hdr));
@@ -1197,7 +1191,8 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
 }
 }
 
-static void *spapr_build_fdt(SpaprMachineState *spapr, bool reset)
+static void *spapr_build_fdt(SpaprMachineState *spapr, bool reset,
+ size_t space)
 {
 MachineState *machine = MACHINE(spapr);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
@@ -1207,8 +1202,8 @@ static void *spapr_build_fdt(SpaprMachineState *spapr, bool reset)
 SpaprPhbState *phb;
 char *buf;
 
-fdt = g_malloc0(FDT_MAX_SIZE);
-_FDT((fdt_create_empty_tree(fdt, FDT_MAX_SIZE)));
+fdt = g_malloc0(space);
+_FDT((fdt_create_empty_tree(fdt, space)));
 
 /* Root node */
 _FDT(fdt_setprop_string(fdt, 0, "device_type", "chrp"));
@@ -1723,19 +1718,13 @@ static void spapr_machine_reset(MachineState *machine)
  */
 fdt_addr = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FDT_MAX_SIZE;
 
-fdt = spapr_build_fdt(spapr, true);
+fdt = spapr_build_fdt(spapr, true, FDT_MAX_SIZE);
 
 rc = fdt_pack(fdt);
 
 /* Should only fail if we've built a corrupted tree */
 assert(rc == 0);
 
-if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
-error_report("FDT too big ! 0x%x bytes (max is 0x%x)",
- fdt_totalsize(fdt), FDT_MAX_SIZE);
-exit(1);
-}
-
 /* Load the fdt */
 qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
 cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));
-- 
2.23.0




Re: [PATCH 0/2] RFC: add -mem-shared option

2019-11-28 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20191128141518.628245-1-marcandre.lur...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  x86_64-softmmu/accel/tcg/tcg-runtime-gvec.o
  CC  x86_64-softmmu/accel/tcg/cpu-exec.o
/tmp/qemu-test/src/exec.c: In function 'qemu_ram_alloc_from_file':
/tmp/qemu-test/src/exec.c:2366:12: error: 'created' may be used uninitialized 
in this function [-Werror=maybe-uninitialized]
 if (created) {
^
cc1: all warnings being treated as errors
make[1]: *** [exec.o] Error 1
make[1]: *** Waiting for unfinished jobs
  CC  aarch64-softmmu/disas.o
  GEN aarch64-softmmu/gdbstub-xml.c
/tmp/qemu-test/src/exec.c: In function 'qemu_ram_alloc_from_file':
/tmp/qemu-test/src/exec.c:2366:12: error: 'created' may be used uninitialized 
in this function [-Werror=maybe-uninitialized]
 if (created) {
^
cc1: all warnings being treated as errors
make[1]: *** [exec.o] Error 1
make[1]: *** Waiting for unfinished jobs
make: *** [x86_64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs
make: *** [aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=ce348942cc144218ab665262b5e2bf34', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-fts2mukp/src/docker-src.2019-11-28-23.35.08.15514:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=ce348942cc144218ab665262b5e2bf34
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-fts2mukp/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m27.615s
user0m8.146s


The full log is available at
http://patchew.org/logs/20191128141518.628245-1-marcandre.lur...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v19 7/8] tests/numa: Add case for QMP build HMAT

2019-11-28 Thread Tao Xu

On 11/28/2019 7:53 PM, Thomas Huth wrote:

On 28/11/2019 12.49, Markus Armbruster wrote:

Tao Xu  writes:


Check configuring HMAT usecase

Reviewed-by: Igor Mammedov 
Suggested-by: Igor Mammedov 
Signed-off-by: Tao Xu 
---

Changes in v19:
  - Add some fail cases for hmat-cache when level=0

Changes in v18:
  - Rewrite the lines over 80 characters

Chenges in v17:
  - Add some fail test cases (Igor)
---
   tests/numa-test.c | 213 ++
   1 file changed, 213 insertions(+)

diff --git a/tests/numa-test.c b/tests/numa-test.c
index 8de8581231..aed7b2f31b 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -327,6 +327,216 @@ static void pc_dynamic_cpu_cfg(const void *data)
   qtest_quit(qs);
   }
   
+static void pc_hmat_build_cfg(const void *data)

+{
+QTestState *qs = qtest_initf("%s -nodefaults --preconfig -machine hmat=on "
+ "-smp 2,sockets=2 "
+ "-m 128M,slots=2,maxmem=1G "
+ "-object memory-backend-ram,size=64M,id=m0 "
+ "-object memory-backend-ram,size=64M,id=m1 "
+ "-numa node,nodeid=0,memdev=m0 "
+ "-numa node,nodeid=1,memdev=m1,initiator=0 "
+ "-numa cpu,node-id=0,socket-id=0 "
+ "-numa cpu,node-id=0,socket-id=1",
+ data ? (char *)data : "");
+
+/* Fail: Initiator should be less than the number of nodes */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 2, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\" } }")));


Code smell: side effect within assert().

Harmless here, because compiling tests with NDEBUG is pointless.  Still,
it sets a bad example.  Not your idea, the pattern seems to go back to
commit c35665e1ee3 and fb1e58f72ba.


... maybe best to use g_assert_true() which can't be disabled and thus
should be used in tests. See:

   https://developer.gnome.org/glib/unstable/glib-Testing.html#g-assert-true

   Thomas

Thank you for your suggestion. I will use g_assert_true and 
g_assert_false to replace g_assert
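
(Concretely, the change is just along these lines, shown here only as an
illustration:

-    g_assert(qmp_rsp_is_err(qtest_qmp(qs, "...")));
+    g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "...")));

and g_assert_false() for the responses that are expected to succeed.)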




Re: Network connection with COLO VM

2019-11-28 Thread Daniel Cho
Hi David,  Zhang,

Thanks for replying to my question.
We now know why this issue occurs.
As you said, the COLO VM's network needs
the colo-proxy to control packets, so the guest's
interface should set the filter to solve the problem.

But we found another issue: when we enable the
fault-tolerance feature for the guest (primary VM is running,
secondary VM is paused), the guest's network does not
respond to any request for a while (in our environment
about 20~30 secs) after the secondary VM starts running.

Is this a normal situation, or a known issue?

Our test creates the primary VM, runs it for a while, then creates the
secondary VM to enable the COLO feature.

Best Regard,
Daniel Cho

Zhang, Chen  於 2019年11月28日 週四 上午9:26寫道:

>
>
> > -Original Message-
> > From: Dr. David Alan Gilbert 
> > Sent: Wednesday, November 27, 2019 6:51 PM
> > To: Daniel Cho ; Zhang, Chen
> > ; lukasstra...@web.de
> > Cc: qemu-devel@nongnu.org
> > Subject: Re: Network connection with COLO VM
> >
> > * Daniel Cho (daniel...@qnap.com) wrote:
> > > Hello everyone,
> > >
> > > Could we ssh to the COLO VM (meaning both PVM & SVM are started)?
> > >
> >
> > Lets cc in Zhang Chen and Lukas Straub.
>
> Thanks Dave.
>
> >
> > > SSH will connect to colo VM for a while, but it will disconnect with
> > > error
> > > *client_loop: send disconnect: Broken pipe*
> > >
> > > It seems the COLO VM could not keep the network session.
> > >
> > > Does it be a known issue?
> >
> > That sounds like the COLO proxy is getting upset; it's supposed to
> compare
> > packets sent by the primary and secondary and only send one to the
> outside
> > - you shouldn't be talking directly to the guest, but always via the
> proxy.  See
> > docs/colo-proxy.txt
> >
>
> Hi Daniel,
>
> I have tried ssh to the COLO guest for 8 hours and this issue did not occur.
> Please check your network/qemu configuration.
> But I found another problem that may be related to this issue: if there is no
> network communication for a period of time (maybe 10 min), the first message
> sent to the guest may be delayed (maybe 1-5 sec). I will try to fix it when I
> have time.
>
> Thanks
> Zhang Chen
>
> > Dave
> >
> > > Best Regard,
> > > Daniel Cho
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
>


Re: [PATCH v19 2/8] numa: Extend CLI to provide memory latency and bandwidth information

2019-11-28 Thread Tao Xu

On 11/28/2019 7:50 PM, Markus Armbruster wrote:

Tao Xu  writes:


From: Liu Jingqi 

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT). Before using
hmat-lb option, enable HMAT with -machine hmat=on.
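
For illustration only (the numbers are invented; the syntax is the one added
to qemu-options.hx by this patch), the latency and bandwidth from initiator
node 0 to target node 0 could be given as:

  -machine hmat=on \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M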

Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

[...]

diff --git a/qapi/machine.json b/qapi/machine.json
index 27d0e37534..cf9851fcd1 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -426,10 +426,12 @@
  #
  # @cpu: property based CPU(s) to node mapping (Since: 2.10)
  #
+# @hmat-lb: memory latency and bandwidth information (Since: 5.0)
+#
  # Since: 2.1
  ##
  { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
  
  ##

  # @NumaOptions:
@@ -444,7 +446,8 @@
'data': {
  'node': 'NumaNodeOptions',
  'dist': 'NumaDistOptions',
-'cpu': 'NumaCpuOptions' }}
+'cpu': 'NumaCpuOptions',
+'hmat-lb': 'NumaHmatLBOptions' }}
  
  ##

  # @NumaNodeOptions:
@@ -557,6 +560,92 @@
 'base': 'CpuInstanceProperties',
 'data' : {} }
  
+##

+# @HmatLBMemoryHierarchy:
+#
+# The memory hierarchy in the System Locality Latency and Bandwidth
+# Information Structure of HMAT (Heterogeneous Memory Attribute Table)
+#
+# For more information about @HmatLBMemoryHierarchy see chapter


@HmatLBMemoryHierarchy, see


+# 5.2.27.4: Table 5-146: Field "Flags" of ACPI 6.3 spec.
+#
+# @memory: the structure represents the memory performance
+#
+# @first-level: first level of memory side cache
+#
+# @second-level: second level of memory side cache
+#
+# @third-level: third level of memory side cache
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatLBMemoryHierarchy',
+  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
+
+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency and Bandwidth
+# Information Structure of HMAT (Heterogeneous Memory Attribute Table)
+#
+# For more information about @HmatLBDataType see chapter


@HmatLBDataType, see


+# 5.2.27.4: Table 5-146:  Field "Data Type" of ACPI 6.3 spec.
+#
+# @access-latency: access latency (nanoseconds)
+#
+# @read-latency: read latency (nanoseconds)
+#
+# @write-latency: write latency (nanoseconds)
+#
+# @access-bandwidth: access bandwidth (Bytes per second)
+#
+# @read-bandwidth: read bandwidth (Bytes per second)
+#
+# @write-bandwidth: write bandwidth (Bytes per second)
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatLBDataType',
+  'data': [ 'access-latency', 'read-latency', 'write-latency',
+'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between Initiator and Target proximity Domains.
+#
+# For more information about @NumaHmatLBOptions see chapter


@NumaHmatLBOptions, see


+# 5.2.27.4: Table 5-146 of ACPI 6.3 spec.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the Memory Hierarchy. Indicates the performance
+# of memory or side cache.
+#
+# @data-type: presents the type of data, access/read/write
+# latency or hit latency.
+#
+# @latency: the value of latency from @initiator to @target
+#   proximity domain, the latency unit is "ns(nanosecond)".
+#
+# @bandwidth: the value of bandwidth between @initiator and @target
+# proximity domain, the bandwidth unit is
+# "Bytes per second".
+#
+# Since: 5.0
+##
+{ 'struct': 'NumaHmatLBOptions',
+'data': {
+'initiator': 'uint16',
+'target': 'uint16',
+'hierarchy': 'HmatLBMemoryHierarchy',
+'data-type': 'HmatLBDataType',
+'*latency': 'uint64',
+'*bandwidth': 'size' }}
+
  ##
  # @HostMemPolicy:
  #
diff --git a/qemu-options.hx b/qemu-options.hx
index 63f6b33322..23303fc7d7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -168,16 +168,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
  "-numa 
node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
  "-numa 
node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
  "-numa dist,src=source,dst=destination,val=distance\n"
-"-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
+"-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
+"-numa 
hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
  QEMU_ARCH_ALL)
  STEXI
  @item -numa 
node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
  @itemx -numa 
node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
  @itemx -numa 

Re: [PATCH v19 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT)

2019-11-28 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20191128082109.30081-1-tao3...@intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    iotest-qcow2: 170
Broken pipe
/tmp/qemu-test/src/tests/libqtest.c:149: kill_qemu() detected QEMU death from 
signal 8 (Floating point exception) (core dumped)
ERROR - too few tests run (expected 9, got 8)
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs
  TEST    iotest-qcow2: 172
  TEST    iotest-qcow2: 174
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=ea9c4b2fa2374bedbd6e1d91e0e51884', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-yy9ek690/src/docker-src.2019-11-28-20.32.24.27277:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=ea9c4b2fa2374bedbd6e1d91e0e51884
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-yy9ek690/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    11m12.915s
user    0m8.101s


The full log is available at
http://patchew.org/logs/20191128082109.30081-1-tao3...@intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v19 3/8] numa: Extend CLI to provide memory side cache information

2019-11-28 Thread Tao Xu

On 11/28/2019 7:50 PM, Markus Armbruster wrote:

Tao Xu  writes:


From: Liu Jingqi 

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
Before using hmat-cache option, enable HMAT with -machine hmat=on.

Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

Changes in v19:
 - Add description about the machine property 'hmat' in commit
   message (Markus)
 - Update the QAPI comments
 - Add a check for no memory side cache

Changes in v18:
 - Update the error message (Igor)

Changes in v17:
 - Use NumaHmatCacheOptions to replace HMAT_Cache_Info (Igor)
 - Add check for unordered cache level input (Igor)

Changes in v16:
 - Add cross check with hmat_lb data (Igor)
 - Drop total_levels in struct HMAT_Cache_Info (Igor)
 - Correct the error table number (Igor)

Changes in v15:
 - Change the QAPI version tag to 5.0 (Eric)
---
  hw/core/numa.c| 86 +++
  include/sysemu/numa.h |  5 +++
  qapi/machine.json | 81 +++-
  qemu-options.hx   | 16 +++-
  4 files changed, 184 insertions(+), 4 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 2183c8df1f..664b44ad68 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -366,6 +366,79 @@ void parse_numa_hmat_lb(NumaState *numa_state, 
NumaHmatLBOptions *node,
  g_array_append_val(hmat_lb->list, lb_data);
  }
  
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,

+   Error **errp)
+{
+int nb_numa_nodes = ms->numa_state->num_nodes;
+NodeInfo *numa_info = ms->numa_state->nodes;
+NumaHmatCacheOptions *hmat_cache = NULL;
+
+if (node->node_id >= nb_numa_nodes) {
+error_setg(errp, "Invalid node-id=%" PRIu32 ", it should be less "
+   "than %d", node->node_id, nb_numa_nodes);
+return;
+}
+
+if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) {
+error_setg(errp, "The latency and bandwidth information of "
+   "node-id=%" PRIu32 " should be provided before memory side "
+   "cache attributes", node->node_id);
+return;
+}
+
+if (node->level >= HMAT_LB_LEVELS) {
+error_setg(errp, "Invalid level=%" PRIu8 ", it should be less than or "
+   "equal to %d", node->level, HMAT_LB_LEVELS - 1);
+return;
+}
+
+if (!node->level && (node->assoc || node->policy || node->line)) {
+error_setg(errp, "Assoc and policy options should be 'none', line "
+   "should be 0. If cache level is 0, which means no memory "
+   "side cache in node-id=%" PRIu32, node->node_id);


Error messages should be a phrase, not a paragraph; see error_setg()'s
function comment.  I think you want something like "be 0 when cache
level is 0".

I'm not sure the error message should explain what level 0 means, but
I'm happy to defer to the NUMA maintainers there.
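As a concrete (purely illustrative) rendering of that suggestion, the whole check
could report something as short as:

    error_setg(errp, "assoc/policy must be 'none' and line must be 0 "
               "when level is 0");

with any explanation of what level 0 means left to documentation rather than the
error message itself.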


+return;
+}
+
+assert(node->assoc < HMAT_CACHE_ASSOCIATIVITY__MAX);
+assert(node->policy < HMAT_CACHE_WRITE_POLICY__MAX);
+if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+error_setg(errp, "Duplicate configuration of the side cache for "
+   "node-id=%" PRIu32 " and level=%" PRIu8,
+   node->node_id, node->level);
+return;
+}
+
+if ((node->level > 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+(node->size >=
+ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) 
{
+error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8
+   " should be less than the size(%" PRIu64 ") of "
+   "level=%" PRIu8, node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+ [node->level - 1]->size,
+   node->level - 1);
+return;
+}
+
+if ((node->level < HMAT_LB_LEVELS - 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level + 1] &&
+(node->size <=
+ms->numa_state->hmat_cache[node->node_id][node->level + 1]->size)) 
{
+error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8
+   " should be larger than the size(%" PRIu64 ") of "
+   "level=%" PRIu8, node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+ [node->level + 1]->size,
+   node->level + 1);
+return;
+}
+
+hmat_cache = g_malloc0(sizeof(*hmat_cache));
+memcpy(hmat_cache, node, sizeof(*hmat_cache));
+ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;

Re: [PATCH 2/7] target/ppc: Work [S]PURR implementation and add HV support

2019-11-28 Thread David Gibson
On Thu, Nov 28, 2019 at 02:46:55PM +0100, Cédric Le Goater wrote:
> From: Suraj Jitindar Singh 
> 
> The Processor Utilisation of Resources Register (PURR) and Scaled
> Processor Utilisation of Resources Register (SPURR) provide an estimate
> of the resources used by the thread, present on POWER7 and later
> processors.
> 
> Currently the [S]PURR registers simply count at the rate of the
> timebase.
> 
> Preserve this behaviour but rework the implementation to store an offset
> like the timebase rather than doing the calculation manually. Also allow
> hypervisor write access to the register along with the currently
> available read access.
> 
> Signed-off-by: Suraj Jitindar Singh 
> Reviewed-by: David Gibson 
> [ clg: rebased on current ppc tree ]
> Signed-off-by: Cédric Le Goater 
> ---
> 
>  David,
> 
>  In the initial discussion, PURR and VTB still needed to be added to
>  the migration stream. The patch is changing the representation indeed
>  but that seems OK. AFAICT, all the SPRs are migrated. I didn't quite
>  understand that part. See http://patchwork.ozlabs.org/patch/1094662
>  for more info.

Ah, right, forgot that discussion in my comment on 1/1.

So, "all SPRs are migrated" is kind of conditional.  All SPRs stored
statically in env->sprs[] are migrated.  But here we have what's
essentially a virtual register whose value is calculated at read time.
We don't actually update sprs[SPR_TB] and similar registers at 500MHz,
that would be impossibly expensive.  So instead we need to migrate the
offset data - in some encoding or other - so that the apparent
register value on reads after the migrate will make sense.

I think we can do this without actually adding extra data to the
stream, by using the sprs[SPR_VTB,SPR_PURR] fields.  But to do so will
need a little extra logic.

I think what Suraj was suggesting in that patchwork link was to read
the PURR value (via the helpers here) at pre_save() and write it to
sprs[SPR_PURR], then at post_load() take the value in sprs[SPR_PURR]
and write it back to the virtual PURR via the helpers here.  That will
effectively freeze the PURR during the migration downtime, but
otherwise preserve it, and for PURR and SPURR I think that makes
sense.

VTB is a little different.  Like the TB itself, it's essentially
tracking wall clock time, and so it should continue to count
(conceptually speaking) during the migration downtime.

I think the key here is that we want to maintain the offset between TB
and VTB across migration.  It will require different logic, but
there's a good chance we can still save the data we need in
sprs[SPR_VTB] rather than having to add extra things to the stream.

That said, the writable versions of PURR, SPURR and VTB are only
relevant for pnv, and I don't think we currently support migration of
pnv machines anyway.  So we could punt on this until later.  But if we
do that, I would like to see some TODO comments in strategic places.
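A rough sketch of that pre_save/post_load pairing, just to make the idea concrete
(untested; the hook names and where they attach to the vmstate are illustrative,
and it only reuses the helpers from this patch plus QEMU's target/ppc types):

    /* Freeze the computed PURR value into the migrated SPR slot. */
    static int purr_pre_save(void *opaque)
    {
        PowerPCCPU *cpu = opaque;

        cpu->env.spr[SPR_PURR] = cpu_ppc_load_purr(&cpu->env);
        return 0;
    }

    /* Rebuild purr_offset so reads resume from the migrated value. */
    static int purr_post_load(void *opaque, int version_id)
    {
        PowerPCCPU *cpu = opaque;

        cpu_ppc_store_purr(&cpu->env, cpu->env.spr[SPR_PURR]);
        return 0;
    }

SPURR could be handled the same way; VTB would instead need the TB/VTB offset
preserved across the downtime, as described above.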

> 
>  Nevertheless, you added your Reviewed-by.
> 
>  include/hw/ppc/ppc.h|  3 +--
>  target/ppc/cpu.h|  1 +
>  target/ppc/helper.h |  1 +
>  hw/ppc/ppc.c| 17 +++--
>  target/ppc/timebase_helper.c|  5 +
>  target/ppc/translate_init.inc.c | 23 +++
>  6 files changed, 30 insertions(+), 20 deletions(-)
> 
> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index 02481cd27c36..27bef85ca869 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -33,8 +33,7 @@ struct ppc_tb_t {
>  /* Hypervisor decrementer management */
>  uint64_t hdecr_next;/* Tick for next hdecr interrupt  */
>  QEMUTimer *hdecr_timer;
> -uint64_t purr_load;
> -uint64_t purr_start;
> +int64_t purr_offset;
>  void *opaque;
>  uint32_t flags;
>  };
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 19d6e724bb5a..9128dbefbdb0 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1313,6 +1313,7 @@ void cpu_ppc_store_decr(CPUPPCState *env, target_ulong 
> value);
>  target_ulong cpu_ppc_load_hdecr(CPUPPCState *env);
>  void cpu_ppc_store_hdecr(CPUPPCState *env, target_ulong value);
>  uint64_t cpu_ppc_load_purr(CPUPPCState *env);
> +void cpu_ppc_store_purr(CPUPPCState *env, uint64_t value);
>  uint32_t cpu_ppc601_load_rtcl(CPUPPCState *env);
>  uint32_t cpu_ppc601_load_rtcu(CPUPPCState *env);
>  #if !defined(CONFIG_USER_ONLY)
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index a5f53bb421a7..356a14d8a639 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -655,6 +655,7 @@ DEF_HELPER_FLAGS_1(load_601_rtcu, TCG_CALL_NO_RWG, tl, 
> env)
>  #if !defined(CONFIG_USER_ONLY)
>  #if defined(TARGET_PPC64)
>  DEF_HELPER_FLAGS_1(load_purr, TCG_CALL_NO_RWG, tl, env)
> +DEF_HELPER_FLAGS_2(store_purr, TCG_CALL_NO_RWG, void, env, tl)
>  DEF_HELPER_2(store_ptcr, void, env, tl)
>  #endif
>  DEF_HELPER_2(store_sdr1, void, env, tl)
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> 

Re: [PATCH 1/7] target/ppc: Implement the VTB for HV access

2019-11-28 Thread David Gibson
On Thu, Nov 28, 2019 at 02:46:54PM +0100, Cédric Le Goater wrote:
> From: Suraj Jitindar Singh 
> 
> The virtual timebase register (VTB) is a 64-bit register which
> increments at the same rate as the timebase register, present on POWER8
> and later processors.
> 
> The register is able to be read/written by the hypervisor and read by
> the supervisor. All other accesses are illegal.
> 
> Currently the VTB is just an alias for the timebase (TB) register.
> 
> Implement the VTB so that is can be read/written independent of the TB.
> Make use of the existing method for accessing timebase facilities where
> by the compensation is stored and used to compute the value on reads/is
> updated on writes.
> 
> Signed-off-by: Suraj Jitindar Singh 
> [ clg: rebased on current ppc tree ]
> Signed-off-by: Cédric Le Goater 

Don't we need something to make the VTB migrate correctly?  Or do we
not care because it's only used on pnv which isn't migratable yet?

> ---
>  include/hw/ppc/ppc.h|  1 +
>  target/ppc/cpu.h|  2 ++
>  target/ppc/helper.h |  2 ++
>  hw/ppc/ppc.c| 16 
>  linux-user/ppc/cpu_loop.c   |  5 +
>  target/ppc/timebase_helper.c| 10 ++
>  target/ppc/translate_init.inc.c | 19 +++
>  7 files changed, 51 insertions(+), 4 deletions(-)
> 
> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index 585be6ab98c5..02481cd27c36 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -24,6 +24,7 @@ struct ppc_tb_t {
>  /* Time base management */
>  int64_t  tb_offset;/* Compensation*/
>  int64_t  atb_offset;   /* Compensation*/
> +int64_t  vtb_offset;
>  uint32_t tb_freq;  /* TB frequency*/
>  /* Decrementer management */
>  uint64_t decr_next;/* Tick for next decr interrupt*/
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index e3e82327b723..19d6e724bb5a 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1305,6 +1305,8 @@ uint64_t cpu_ppc_load_atbl(CPUPPCState *env);
>  uint32_t cpu_ppc_load_atbu(CPUPPCState *env);
>  void cpu_ppc_store_atbl(CPUPPCState *env, uint32_t value);
>  void cpu_ppc_store_atbu(CPUPPCState *env, uint32_t value);
> +uint64_t cpu_ppc_load_vtb(CPUPPCState *env);
> +void cpu_ppc_store_vtb(CPUPPCState *env, uint64_t value);
>  bool ppc_decr_clear_on_delivery(CPUPPCState *env);
>  target_ulong cpu_ppc_load_decr(CPUPPCState *env);
>  void cpu_ppc_store_decr(CPUPPCState *env, target_ulong value);
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index f843814b8aa8..a5f53bb421a7 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -649,6 +649,7 @@ DEF_HELPER_FLAGS_1(load_tbl, TCG_CALL_NO_RWG, tl, env)
>  DEF_HELPER_FLAGS_1(load_tbu, TCG_CALL_NO_RWG, tl, env)
>  DEF_HELPER_FLAGS_1(load_atbl, TCG_CALL_NO_RWG, tl, env)
>  DEF_HELPER_FLAGS_1(load_atbu, TCG_CALL_NO_RWG, tl, env)
> +DEF_HELPER_FLAGS_1(load_vtb, TCG_CALL_NO_RWG, tl, env)
>  DEF_HELPER_FLAGS_1(load_601_rtcl, TCG_CALL_NO_RWG, tl, env)
>  DEF_HELPER_FLAGS_1(load_601_rtcu, TCG_CALL_NO_RWG, tl, env)
>  #if !defined(CONFIG_USER_ONLY)
> @@ -669,6 +670,7 @@ DEF_HELPER_FLAGS_1(load_decr, TCG_CALL_NO_RWG, tl, env)
>  DEF_HELPER_FLAGS_2(store_decr, TCG_CALL_NO_RWG, void, env, tl)
>  DEF_HELPER_FLAGS_1(load_hdecr, TCG_CALL_NO_RWG, tl, env)
>  DEF_HELPER_FLAGS_2(store_hdecr, TCG_CALL_NO_RWG, void, env, tl)
> +DEF_HELPER_FLAGS_2(store_vtb, TCG_CALL_NO_RWG, void, env, tl)
>  DEF_HELPER_2(store_hid0_601, void, env, tl)
>  DEF_HELPER_3(store_403_pbr, void, env, i32, tl)
>  DEF_HELPER_FLAGS_1(load_40x_pit, TCG_CALL_NO_RWG, tl, env)
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 8dd982fc1e40..263922052536 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -694,6 +694,22 @@ void cpu_ppc_store_atbu (CPUPPCState *env, uint32_t 
> value)
>    &tb_env->atb_offset, ((uint64_t)value << 32) | tb);
>  }
>  
> +uint64_t cpu_ppc_load_vtb(CPUPPCState *env)
> +{
> +ppc_tb_t *tb_env = env->tb_env;
> +
> +return cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
> +  tb_env->vtb_offset);
> +}
> +
> +void cpu_ppc_store_vtb(CPUPPCState *env, uint64_t value)
> +{
> +ppc_tb_t *tb_env = env->tb_env;
> +
> +cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
> + &tb_env->vtb_offset, value);
> +}
> +
>  static void cpu_ppc_tb_stop (CPUPPCState *env)
>  {
>  ppc_tb_t *tb_env = env->tb_env;
> diff --git a/linux-user/ppc/cpu_loop.c b/linux-user/ppc/cpu_loop.c
> index d5704def2902..5b27f8603e33 100644
> --- a/linux-user/ppc/cpu_loop.c
> +++ b/linux-user/ppc/cpu_loop.c
> @@ -47,6 +47,11 @@ uint32_t cpu_ppc_load_atbu(CPUPPCState *env)
>  return cpu_ppc_get_tb(env) >> 32;
>  }
>  
> +uint64_t cpu_ppc_load_vtb(CPUPPCState *env)
> +{
> +return cpu_ppc_get_tb(env);
> +}
> +
>  uint32_t 

[for-5.0 3/4] spapr: Clean up RMA size calculation

2019-11-28 Thread David Gibson
Move the calculation of the Real Mode Area (RMA) size into a helper
function.  While we're there clean it up and correct it in a few ways:
  * Add comments making it clearer where the various constraints come from
  * Remove a pointless check that the RMA fits within Node 0 (we've just
clamped it so that it does)
  * The 16GiB limit we apply is only correct for POWER8, but there is also
a 1TiB limit that applies on POWER9.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 57 +++---
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 52c39daa99..7efd4f2b85 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2664,6 +2664,40 @@ static PCIHostState *spapr_create_default_phb(void)
 return PCI_HOST_BRIDGE(dev);
 }
 
+static hwaddr spapr_rma_size(SpaprMachineState *spapr, Error **errp)
+{
+MachineState *machine = MACHINE(spapr);
+hwaddr rma_size = machine->ram_size;
+hwaddr node0_size = spapr_node0_size(machine);
+
+/* RMA has to fit in the first NUMA node */
+rma_size = MIN(rma_size, node0_size);
+
+/*
+ * VRMA access is via a special 1TiB SLB mapping, so the RMA can
+ * never exceed that
+ */
+rma_size = MIN(rma_size, TiB);
+
+/*
+ * RMA size is controlled in hardware by LPCR[RMLS].  On POWER8
+ * the largest RMA that can be specified there is 16GiB
+ */
+if (!ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00,
+   0, spapr->max_compat_pvr)) {
+rma_size = MIN(rma_size, 16 * GiB);
+}
+
+if (rma_size < (MIN_RMA_SLOF * MiB)) {
+error_setg(errp,
+"pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
+   MIN_RMA_SLOF);
+return -1;
+}
+
+return rma_size;
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void spapr_machine_init(MachineState *machine)
 {
@@ -2675,7 +2709,6 @@ static void spapr_machine_init(MachineState *machine)
 int i;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *ram = g_new(MemoryRegion, 1);
-hwaddr node0_size = spapr_node0_size(machine);
 long load_limit, fw_size;
 char *filename;
 Error *resize_hpt_err = NULL;
@@ -2715,20 +2748,7 @@ static void spapr_machine_init(MachineState *machine)
 exit(1);
 }
 
-spapr->rma_size = node0_size;
-
-/* Actually we don't support unbounded RMA anymore since we added
- * proper emulation of HV mode. The max we can get is 16G which
- * also happens to be what we configure for PAPR mode so make sure
- * we don't do anything bigger than that
- */
-spapr->rma_size = MIN(spapr->rma_size, 0x400000000ull);
-
-if (spapr->rma_size > node0_size) {
-error_report("Numa node 0 has to span the RMA (%#08"HWADDR_PRIx")",
- spapr->rma_size);
-exit(1);
-}
+spapr->rma_size = spapr_rma_size(spapr, &error_fatal);
 
 /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
 load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
@@ -2956,13 +2976,6 @@ static void spapr_machine_init(MachineState *machine)
 }
 }
 
-if (spapr->rma_size < (MIN_RMA_SLOF * MiB)) {
-error_report(
-"pSeries SLOF firmware requires >= %ldM guest RMA (Real Mode Area 
memory)",
-MIN_RMA_SLOF);
-exit(1);
-}
-
 if (kernel_filename) {
 uint64_t lowaddr = 0;
 
-- 
2.23.0




Re: [PATCH v19 3/8] numa: Extend CLI to provide memory side cache information

2019-11-28 Thread Tao Xu

On 11/28/2019 9:57 PM, Igor Mammedov wrote:

On Thu, 28 Nov 2019 12:50:36 +0100
Markus Armbruster  wrote:


Tao Xu  writes:


From: Liu Jingqi 

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
Before using hmat-cache option, enable HMAT with -machine hmat=on.

Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

Changes in v19:
 - Add description about the machine property 'hmat' in commit
   message (Markus)
 - Update the QAPI comments
 - Add a check for no memory side cache

Changes in v18:
 - Update the error message (Igor)

Changes in v17:
 - Use NumaHmatCacheOptions to replace HMAT_Cache_Info (Igor)
 - Add check for unordered cache level input (Igor)

Changes in v16:
 - Add cross check with hmat_lb data (Igor)
 - Drop total_levels in struct HMAT_Cache_Info (Igor)
 - Correct the error table number (Igor)

Changes in v15:
 - Change the QAPI version tag to 5.0 (Eric)
---
  hw/core/numa.c| 86 +++
  include/sysemu/numa.h |  5 +++
  qapi/machine.json | 81 +++-
  qemu-options.hx   | 16 +++-
  4 files changed, 184 insertions(+), 4 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 2183c8df1f..664b44ad68 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -366,6 +366,79 @@ void parse_numa_hmat_lb(NumaState *numa_state, 
NumaHmatLBOptions *node,
  g_array_append_val(hmat_lb->list, lb_data);
  }
  
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,

+   Error **errp)
+{
+int nb_numa_nodes = ms->numa_state->num_nodes;
+NodeInfo *numa_info = ms->numa_state->nodes;
+NumaHmatCacheOptions *hmat_cache = NULL;
+
+if (node->node_id >= nb_numa_nodes) {
+error_setg(errp, "Invalid node-id=%" PRIu32 ", it should be less "
+   "than %d", node->node_id, nb_numa_nodes);
+return;
+}
+
+if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) {
+error_setg(errp, "The latency and bandwidth information of "
+   "node-id=%" PRIu32 " should be provided before memory side "
+   "cache attributes", node->node_id);
+return;
+}
+
+if (node->level >= HMAT_LB_LEVELS) {
+error_setg(errp, "Invalid level=%" PRIu8 ", it should be less than or "
+   "equal to %d", node->level, HMAT_LB_LEVELS - 1);
+return;
+}
+
+if (!node->level && (node->assoc || node->policy || node->line)) {
+error_setg(errp, "Assoc and policy options should be 'none', line "
+   "should be 0. If cache level is 0, which means no memory "
+   "side cache in node-id=%" PRIu32, node->node_id);



Do we have to describe node->level == 0 in side-cache table
(spec isn't clear on this usecase)?

Can we just tell user that "RAM (level 0) should not be used with
'hmat-cache' option?



Yes we can. I will do that.

   


Error messages should be a phrase, not a paragraph; see error_setg()'s
function comment.  I think you want something like "be 0 when cache
level is 0".

I'm not sure the error message should explain what level 0 means, but
I'm happy to defer to the NUMA maintainers there.


+return;
+}
+
+assert(node->assoc < HMAT_CACHE_ASSOCIATIVITY__MAX);
+assert(node->policy < HMAT_CACHE_WRITE_POLICY__MAX);
+if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+error_setg(errp, "Duplicate configuration of the side cache for "
+   "node-id=%" PRIu32 " and level=%" PRIu8,
+   node->node_id, node->level);
+return;
+}
+
+if ((node->level > 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+(node->size >=
+ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) 
{
+error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8
+   " should be less than the size(%" PRIu64 ") of "
+   "level=%" PRIu8, node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+ [node->level - 1]->size,
+   node->level - 1);
+return;
+}
+
+if ((node->level < HMAT_LB_LEVELS - 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level + 1] &&
+(node->size <=
+ms->numa_state->hmat_cache[node->node_id][node->level + 1]->size)) 
{
+error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8
+   " should be larger than the size(%" PRIu64 ") of "
+   "level=%" PRIu8, node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+  

[for-5.0 1/4] spapr,ppc: Simplify signature of kvmppc_rma_size()

2019-11-28 Thread David Gibson
This function calculates the maximum size of the RMA as implied by the
host's page size and the structure of the VRMA (there are a number of other
constraints on the RMA size which will supersede this one in many
circumstances).

The current interface takes the current RMA size estimate, and clamps it
to the VRMA derived size.  The only current caller passes in an arguably
wrong value (it will match the current RMA estimate in some but not all
cases).

We want to fix that, but for now just keep concerns separated by having the
KVM helper function just return the VRMA derived limit, and let the caller
combine it with other constraints.  We call the new function
kvmppc_vrma_limit() to more clearly indicate its limited responsibility.

The helper should only ever be called in the KVM enabled case, so replace
its !CONFIG_KVM stub with an assert() rather than a dummy value.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c   | 5 +++--
 target/ppc/kvm.c | 5 ++---
 target/ppc/kvm_ppc.h | 7 +++
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d9c9a2bcee..069bd04a8d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1635,8 +1635,9 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
 spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
 
 if (spapr->vrma_adjust) {
-spapr->rma_size = kvmppc_rma_size(spapr_node0_size(MACHINE(spapr)),
-  spapr->htab_shift);
+hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
+
+spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
 }
 }
 
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index c77f9848ec..09b3bd6443 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2101,7 +2101,7 @@ void kvmppc_hint_smt_possible(Error **errp)
 
 
 #ifdef TARGET_PPC64
-uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift)
+uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
 {
 struct kvm_ppc_smmu_info info;
 long rampagesize, best_page_shift;
@@ -2128,8 +2128,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned 
int hash_shift)
 }
 }
 
-return MIN(current_size,
-   1ULL << (best_page_shift + hash_shift - 7));
+return 1ULL << (best_page_shift + hash_shift - 7);
 }
 #endif
 
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index 98bd7d5da6..4f0eec4c1b 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -45,7 +45,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t 
page_shift,
   int *pfd, bool need_vfio);
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
-uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
+uint64_t kvmppc_vrma_limit(unsigned int hash_shift);
 bool kvmppc_has_cap_spapr_vfio(void);
 #endif /* !CONFIG_USER_ONLY */
 bool kvmppc_has_cap_epr(void);
@@ -241,10 +241,9 @@ static inline int kvmppc_reset_htab(int shift_hint)
 return 0;
 }
 
-static inline uint64_t kvmppc_rma_size(uint64_t current_size,
-   unsigned int hash_shift)
+static inline uint64_t kvmppc_vrma_limit(unsigned int hash_shift)
 {
-return ram_size;
+g_assert_not_reached();
 }
 
 static inline bool kvmppc_hpt_needs_host_contiguous_pages(void)
-- 
2.23.0




[for-5.0 4/4] spapr: Correct clamping of RMA to Node 0 size

2019-11-28 Thread David Gibson
The Real Mode Area (RMA) needs to fit within Node 0 in NUMA configurations.
We use a helper function spapr_node0_size() to calculate this.

But that function doesn't actually get the size of Node 0, it gets the
minimum size of all nodes, ever since b082d65a300 "spapr: Add a helper for
node0_size calculation".  That was added, apparently, because Node 0 in
qemu's terms might not have corresponded to Node 0 in PAPR terms (i.e. the
node with memory at address 0).

That might not have been the case at the time, but it *is* the case now
that qemu node 0 must have the lowest address, which is the node we need.
So, we can simplify this logic, folding it into spapr_rma_size(), the only
remaining caller.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7efd4f2b85..6611f75bdf 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -295,20 +295,6 @@ static void spapr_populate_pa_features(SpaprMachineState 
*spapr,
 _FDT((fdt_setprop(fdt, offset, "ibm,pa-features", pa_features, pa_size)));
 }
 
-static hwaddr spapr_node0_size(MachineState *machine)
-{
-if (machine->numa_state->num_nodes) {
-int i;
-for (i = 0; i < machine->numa_state->num_nodes; ++i) {
-if (machine->numa_state->nodes[i].node_mem) {
-return MIN(pow2floor(machine->numa_state->nodes[i].node_mem),
-   machine->ram_size);
-}
-}
-}
-return machine->ram_size;
-}
-
 static void add_str(GString *s, const gchar *s1)
 {
 g_string_append_len(s, s1, strlen(s1) + 1);
@@ -2668,10 +2654,13 @@ static hwaddr spapr_rma_size(SpaprMachineState *spapr, 
Error **errp)
 {
 MachineState *machine = MACHINE(spapr);
 hwaddr rma_size = machine->ram_size;
-hwaddr node0_size = spapr_node0_size(machine);
 
 /* RMA has to fit in the first NUMA node */
-rma_size = MIN(rma_size, node0_size);
+if (machine->numa_state->num_nodes) {
+hwaddr node0_size = machine->numa_state->nodes[0].node_mem;
+
+rma_size = MIN(rma_size, node0_size);
+}
 
 /*
  * VRMA access is via a special 1TiB SLB mapping, so the RMA can
@@ -2688,6 +2677,11 @@ static hwaddr spapr_rma_size(SpaprMachineState *spapr, 
Error **errp)
 rma_size = MIN(rma_size, 16 * GiB);
 }
 
+/*
+ * RMA size must be a power of 2
+ */
+rma_size = pow2floor(rma_size);
+
 if (rma_size < (MIN_RMA_SLOF * MiB)) {
 error_setg(errp,
 "pSeries SLOF firmware requires >= %ldMiB guest RMA (Real Mode Area)",
-- 
2.23.0




[for-5.0 0/4] Fixes for RMA size calculation

2019-11-28 Thread David Gibson
PAPR guests have a certain "Real Mode Area" - a subsection of memory
which can be accessed when in guest real mode (that is, with the MMU
"off" from the guest point of view).  This is advertised to the guest
in the device tree.

We want to make the RMA as large as we can, to allow for flexibility
in loading boot images, which need to fit within it.  But, there's a
somewhat complex set of constraints on the size.  At the moment, we
don't always get those correct.  This has caused crashes in some
cases, although for now those are worked around inside the guest
kernel.

These patches clarify and correct the logic here.  They will break
some cases using a host kernel with 4kiB pagesize (which doesn't
include any mainstream distro kernel nowadays).  Since that case is
very rare, and there do exist a number of workarounds for it, I think
that's worth it for the simplified logic and more consistent
behaviour.

David Gibson (4):
  spapr,ppc: Simplify signature of kvmppc_rma_size()
  spapr: Don't attempt to clamp RMA to VRMA constraint
  spapr: Clean up RMA size calculation
  spapr: Correct clamping of RMA to Node 0 size

 hw/ppc/spapr.c | 110 -
 hw/ppc/spapr_hcall.c   |   4 +-
 include/hw/ppc/spapr.h |   3 +-
 target/ppc/kvm.c   |   5 +-
 target/ppc/kvm_ppc.h   |   7 ++-
 5 files changed, 63 insertions(+), 66 deletions(-)

-- 
2.23.0




[for-5.0 2/4] spapr: Don't attempt to clamp RMA to VRMA constraint

2019-11-28 Thread David Gibson
The Real Mode Area (RMA) is the part of memory which a guest can access
when in real (MMU off) mode.  Of course, for a guest under KVM, the MMU
isn't really turned off, it's just in a special translation mode - Virtual
Real Mode Area (VRMA) - which looks like real mode in guest mode.

The mechanics of how this works when in Hashed Page Table (HPT) mode, put
a constraint on the size of the RMA, which depends on the size of the HPT.
So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we
advertise to the guest based on this VRMA limit.

There are several things wrong with this:
 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum
of Node 0 memory size and the VRMA limit.  That will *often* work the
same as clamping, but there can be other constraints on RMA size which
supersede Node 0 memory size.  We have real bugs caused by this
(currently worked around in the guest kernel)
 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where
we're past the point that we can actually advertise an RMA limit to the
guest
 3) But most fundamentally, the VRMA limit depends on host configuration
(page size) which shouldn't be visible to the guest, but this partially
exposes it.  This can cause problems with migration in certain edge
cases, although we will mostly get away with it.

In practice, this clamping is almost never applied anyway.  With 64kiB
pages and the normal rules for sizing of the HPT, the theoretical VRMA
limit will be 4x(guest memory size) and so never hit.  It will hit with
4kiB pages, where it will be (guest memory size)/4.  However all mainstream
distro kernels for POWER have used a 64kiB page size for at least 10 years.

So, simply replace this logic with a check that the RMA we've calculated
based only on guest visible configuration will fit within the host implied
VRMA limit.  This can break if running HPT guests on a host kernel with
4kiB page size.  As noted that's very rare.  There also exist several
possible workarounds:
  * Change the host kernel to use 64kiB pages
  * Use radix MMU (RPT) guests instead of HPT
  * Use 64kiB hugepages on the host to back guest memory
  * Increase the guest memory size so that the RMA hits one of the fixed
limits before the RMA limit.  This is relatively easy on POWER8 which
has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
  * Decrease guest memory size so that it's below the lower bound on VRMA
limit (minimum HPT size is 256kiB, giving a minimum VRAM of 8MiB).
Difficult in practice since modern guests tend to want 1-2GiB.
  * Use a guest NUMA configuration which artificially constrains the RMA
within the VRMA limit (the RMA must always fit within Node 0).

Previously, on KVM, we also temporarily reduced the rma_size to 256M so
that the we'd load the kernel and initrd safely, regardless of the VRMA
limit.  This was a) confusing, b) could significantly limit the size of
images we could load and c) introduced a behavioural difference between
KVM and TCG.  So we remove that as well.

Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 28 ++--
 hw/ppc/spapr_hcall.c   |  4 ++--
 include/hw/ppc/spapr.h |  3 +--
 3 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 069bd04a8d..52c39daa99 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1618,7 +1618,7 @@ void spapr_reallocate_hpt(SpaprMachineState *spapr, int 
shift,
 spapr_set_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
 }
 
-void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
+void spapr_setup_hpt(SpaprMachineState *spapr)
 {
 int hpt_shift;
 
@@ -1634,10 +1634,16 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
 }
 spapr_reallocate_hpt(spapr, hpt_shift, &error_fatal);
 
-if (spapr->vrma_adjust) {
+if (kvm_enabled()) {
 hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);
 
-spapr->rma_size = MIN(spapr_node0_size(MACHINE(spapr)), vrma_limit);
+/* Check our RMA fits in the possible VRMA */
+if (vrma_limit < spapr->rma_size) {
+error_report("Unable to create %" HWADDR_PRIu
+ "MiB RMA (VRMA only allows %" HWADDR_PRIu "MiB",
+ spapr->rma_size / MiB, vrma_limit / MiB);
+exit(EXIT_FAILURE);
+}
 }
 }
 
@@ -1676,7 +1682,7 @@ static void spapr_machine_reset(MachineState *machine)
 spapr->patb_entry = PATE1_GR;
 spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
 } else {
-spapr_setup_hpt_and_vrma(spapr);
+spapr_setup_hpt(spapr);
 }
 
 qemu_devices_reset();
@@ -2711,20 +2717,6 @@ static void spapr_machine_init(MachineState *machine)
 
 spapr->rma_size = node0_size;
 
-/* With KVM, we don't actually know whether KVM supports an
- * unbounded RMA (PR KVM) or is limited by the hash table size
- * (HV KVM using VRMA), so we 

Re: [PATCH V2] block/nbd: fix memory leak in nbd_open()

2019-11-28 Thread pannengyuan
On 2019/11/28 21:36, Stefano Garzarella wrote:
> On Thu, Nov 28, 2019 at 08:09:31PM +0800, pannengy...@huawei.com wrote:
>> From: PanNengyuan 
>>
>> In currently implementation there will be a memory leak when
>> nbd_client_connect() returns error status. Here is an easy way to
>> reproduce:
>>
>> 1. run qemu-iotests as follow and check the result with asan:
>> ./check -raw 143
>>
>> Following is the asan output backtrack:
>> Direct leak of 40 byte(s) in 1 object(s) allocated from:
>> #0 0x7f629688a560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
>> #1 0x7f6295e7e015 in g_malloc0  (/usr/lib64/libglib-2.0.so.0+0x50015)
>> #2 0x56281dab4642 in qobject_input_start_struct  
>> /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295
>> #3 0x56281dab1a04 in visit_start_struct  
>> /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49
>> #4 0x56281dad1827 in visit_type_SocketAddress  
>> qapi/qapi-visit-sockets.c:386
>> #5 0x56281da8062f in nbd_config   
>> /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716
>> #6 0x56281da8062f in nbd_process_options  
>> /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829
>> #7 0x56281da8062f in nbd_open  /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873
>>
>> Direct leak of 15 byte(s) in 1 object(s) allocated from:
>> #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0)
>> #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd)
>> #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace)
>> #3 0x56281da804ac in nbd_process_options 
>> /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1834
>> #4 0x56281da804ac in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873
>>
>> Indirect leak of 24 byte(s) in 1 object(s) allocated from:
>> #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0)
>> #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd)
>> #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace)
>> #3 0x56281dab41a3 in qobject_input_type_str_keyval  
>> /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:536
>> #4 0x56281dab2ee9 in visit_type_str  
>> /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:297
>> #5 0x56281dad0fa1 in visit_type_UnixSocketAddress_members 
>> qapi/qapi-visit-sockets.c:141
>> #6 0x56281dad17b6 in visit_type_SocketAddress_members  
>> qapi/qapi-visit-sockets.c:366
>> #7 0x56281dad186a in visit_type_SocketAddress 
>> qapi/qapi-visit-sockets.c:393
>> #8 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716
>> #9 0x56281da8062f in nbd_process_options 
>> /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829
>> #10 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873
>>
>> Reported-by: Euler Robot 
>> Signed-off-by: PanNengyuan 
>> ---
>> Changes v2 to v1:
>> - add a new function to do the common cleanups (suggested by Stefano 
>> Garzarella).
>> ---
>>  block/nbd.c | 26 --
>>  1 file changed, 16 insertions(+), 10 deletions(-)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 1239761..f8aa2a8 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -94,6 +94,8 @@ typedef struct BDRVNBDState {
>>  
>>  static int nbd_client_connect(BlockDriverState *bs, Error **errp);
>>  
>> +static void nbd_free_bdrvstate_prop(BDRVNBDState *s);
>> +
>>  static void nbd_channel_error(BDRVNBDState *s, int ret)
>>  {
>>  if (ret == -EIO) {
>> @@ -1486,6 +1488,17 @@ static int nbd_client_connect(BlockDriverState *bs, 
>> Error **errp)
>>  }
>>  }
>>  
>> +static void nbd_free_bdrvstate_prop(BDRVNBDState *s)
>> +{
>> +object_unref(OBJECT(s->tlscreds));
>> +qapi_free_SocketAddress(s->saddr);
>> +g_free(s->export);
>> +g_free(s->tlscredsid);
>> +if (s->x_dirty_bitmap) {
>^ it is not needed, g_free() handles NULL pointers.
>> +g_free(s->x_dirty_bitmap);
>> +}
>> +}
>> +
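For what it's worth, with that comment applied the helper collapses to the following
(a sketch only, based purely on the hunk quoted above):

    static void nbd_free_bdrvstate_prop(BDRVNBDState *s)
    {
        object_unref(OBJECT(s->tlscreds));
        qapi_free_SocketAddress(s->saddr);
        g_free(s->export);
        g_free(s->tlscredsid);
        g_free(s->x_dirty_bitmap);   /* g_free(NULL) is a no-op */
    }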
> 
> Please, split this patch in two patches:
> - the first patch where you add this function and use it in
>   nbd_process_options() and nbd_close()
> - the second patch where you fix the leak in nbd_open()
> 
> Thanks,
> Stefano

Thanks, I will change and split it in next version.

> 
>>  /*
>>   * Parse nbd_open options
>>   */
>> @@ -1855,10 +1868,7 @@ static int nbd_process_options(BlockDriverState *bs, 
>> QDict *options,
>>  
>>   error:
>>  if (ret < 0) {
>> -object_unref(OBJECT(s->tlscreds));
>> -qapi_free_SocketAddress(s->saddr);
>> -g_free(s->export);
>> -g_free(s->tlscredsid);
>> +nbd_free_bdrvstate_prop(s);
>>  }
>>  qemu_opts_del(opts);
>>  return ret;
>> @@ -1881,6 +1891,7 @@ static int nbd_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  
>>  ret = nbd_client_connect(bs, errp);
>>  if (ret < 0) {
>> +nbd_free_bdrvstate_prop(s);
>>  return ret;
>>  }
>>  /* successfully connected */
>> @@ -1937,12 +1948,7 @@ static void nbd_close(BlockDriverState *bs)
>>  BDRVNBDState *s = bs->opaque;
>>  
>>

Re: [PATCH] io/channel-websock: treat 'binary' and no sub-protocol as the same

2019-11-28 Thread Yu-Chen Lin
Ping?

Yu-Chen Lin wrote on Sat, Nov 23, 2019 at 11:43:

> noVNC doesn't use the 'binary' protocol by default after
> commit c912230309806aacbae4295faf7ad6406da97617.
>
> This causes qemu to return 400 when handshaking.
>
> To overcome this problem and retain compatibility with
> older noVNC clients, we treat 'binary' and no sub-protocol
> as the same, so that we can support different versions of
> the noVNC client.
>
> Tested on noVNC before c912230 and after c912230.
>
> Buglink: https://bugs.launchpad.net/qemu/+bug/1849644
>
> Signed-off-by: Yu-Chen Lin 
> ---
>  io/channel-websock.c | 35 +++
>  1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/io/channel-websock.c b/io/channel-websock.c
> index fc36d44eba..918e09ea3f 100644
> --- a/io/channel-websock.c
> +++ b/io/channel-websock.c
> @@ -49,13 +49,20 @@
>  "Server: QEMU VNC\r\n"   \
>  "Date: %s\r\n"
>
> +#define QIO_CHANNEL_WEBSOCK_HANDSHAKE_WITH_PROTO_RES_OK \
> +"HTTP/1.1 101 Switching Protocols\r\n"  \
> +QIO_CHANNEL_WEBSOCK_HANDSHAKE_RES_COMMON\
> +"Upgrade: websocket\r\n"\
> +"Connection: Upgrade\r\n"   \
> +"Sec-WebSocket-Accept: %s\r\n"  \
> +"Sec-WebSocket-Protocol: binary\r\n"\
> +"\r\n"
>  #define QIO_CHANNEL_WEBSOCK_HANDSHAKE_RES_OK\
>  "HTTP/1.1 101 Switching Protocols\r\n"  \
>  QIO_CHANNEL_WEBSOCK_HANDSHAKE_RES_COMMON\
>  "Upgrade: websocket\r\n"\
>  "Connection: Upgrade\r\n"   \
>  "Sec-WebSocket-Accept: %s\r\n"  \
> -"Sec-WebSocket-Protocol: binary\r\n"\
>  "\r\n"
>  #define QIO_CHANNEL_WEBSOCK_HANDSHAKE_RES_NOT_FOUND \
>  "HTTP/1.1 404 Not Found\r\n"\
> @@ -336,6 +343,7 @@
> qio_channel_websock_find_header(QIOChannelWebsockHTTPHeader *hdrs,
>
>  static void qio_channel_websock_handshake_send_res_ok(QIOChannelWebsock
> *ioc,
>const char *key,
> +  const bool
> use_protocols,
>Error **errp)
>  {
>  char combined_key[QIO_CHANNEL_WEBSOCK_CLIENT_KEY_LEN +
> @@ -361,8 +369,13 @@ static void
> qio_channel_websock_handshake_send_res_ok(QIOChannelWebsock *ioc,
>  }
>
>  date = qio_channel_websock_date_str();
> -qio_channel_websock_handshake_send_res(
> -ioc, QIO_CHANNEL_WEBSOCK_HANDSHAKE_RES_OK, date, accept);
> +if (use_protocols) {
> +qio_channel_websock_handshake_send_res(
> +ioc, QIO_CHANNEL_WEBSOCK_HANDSHAKE_WITH_PROTO_RES_OK,
> date, accept);
> +} else {
> +qio_channel_websock_handshake_send_res(
> +ioc, QIO_CHANNEL_WEBSOCK_HANDSHAKE_RES_OK, date, accept);
> +}
>
>  g_free(date);
>  g_free(accept);
> @@ -387,10 +400,6 @@ static void
> qio_channel_websock_handshake_process(QIOChannelWebsock *ioc,
>
>  protocols = qio_channel_websock_find_header(
>  hdrs, nhdrs, QIO_CHANNEL_WEBSOCK_HEADER_PROTOCOL);
> -if (!protocols) {
> -error_setg(errp, "Missing websocket protocol header data");
> -goto bad_request;
> -}
>
>  version = qio_channel_websock_find_header(
>  hdrs, nhdrs, QIO_CHANNEL_WEBSOCK_HEADER_VERSION);
> @@ -430,10 +439,12 @@ static void
> qio_channel_websock_handshake_process(QIOChannelWebsock *ioc,
>  trace_qio_channel_websock_http_request(ioc, protocols, version,
> host, connection, upgrade,
> key);
>
> -if (!g_strrstr(protocols, QIO_CHANNEL_WEBSOCK_PROTOCOL_BINARY)) {
> -error_setg(errp, "No '%s' protocol is supported by client '%s'",
> -   QIO_CHANNEL_WEBSOCK_PROTOCOL_BINARY, protocols);
> -goto bad_request;
> +if (protocols) {
> +if (!g_strrstr(protocols,
> QIO_CHANNEL_WEBSOCK_PROTOCOL_BINARY)) {
> +error_setg(errp, "No '%s' protocol is supported by client
> '%s'",
> +   QIO_CHANNEL_WEBSOCK_PROTOCOL_BINARY,
> protocols);
> +goto bad_request;
> +}
>  }
>
>  if (!g_str_equal(version, QIO_CHANNEL_WEBSOCK_SUPPORTED_VERSION)) {
> @@ -467,7 +478,7 @@ static void
> qio_channel_websock_handshake_process(QIOChannelWebsock *ioc,
>  goto bad_request;
>  }
>
> -qio_channel_websock_handshake_send_res_ok(ioc, key, errp);
> +qio_channel_websock_handshake_send_res_ok(ioc, key, !!protocols,
> errp);
>  return;
>
>   bad_request:
> --
> 2.17.1
>
>


Re: [PATCH v1 0/1] s390x: protvirt: SCLP interpretation

2019-11-28 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/1574935984-16910-1-git-send-email-pmo...@linux.ibm.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v1 0/1] s390x: protvirt: SCLP interpretation
Type: series
Message-id: 1574935984-16910-1-git-send-email-pmo...@linux.ibm.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
cc42cdc s390x: protvirt: SCLP interpretation

=== OUTPUT BEGIN ===
ERROR: switch and case should be at the same indent
#135: FILE: target/s390x/kvm.c:1715:
 switch (icpt_code) {
+ case ICPT_PV_INSTR_NOT:
[...]
+case ICPT_PV_INSTR:

total: 1 errors, 0 warnings, 105 lines checked

Commit cc42cdc36171 (s390x: protvirt: SCLP interpretation) has style problems, 
please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/1574935984-16910-1-git-send-email-pmo...@linux.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 2/4] target/arm: Abstract the generic timer frequency

2019-11-28 Thread Andrew Jeffery



On Thu, 28 Nov 2019, at 19:16, Cédric Le Goater wrote:
> On 28/11/2019 06:45, Andrew Jeffery wrote:
> > Prepare for SoCs such as the ASPEED AST2600 whose firmware configures
> > CNTFRQ to values significantly larger than the static 62.5MHz value
> > currently derived from GTIMER_SCALE. As the OS potentially derives its
> > timer periods from the CNTFRQ value the lack of support for running
> > QEMUTimers at the appropriate rate leads to sticky behaviour in the
> > guest.
> > 
> > Substitute the GTIMER_SCALE constant with use of a helper to derive the
> > period from gt_cntfrq stored in struct ARMCPU. Initially set gt_cntfrq
> > to the frequency associated with GTIMER_SCALE so current behaviour is
> > maintained.
> > 
> > Signed-off-by: Andrew Jeffery 
> > ---
> >  target/arm/cpu.c|  2 ++
> >  target/arm/cpu.h| 10 ++
> >  target/arm/helper.c | 10 +++---
> >  3 files changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index 7a4ac9339bf9..5698a74061bb 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -974,6 +974,8 @@ static void arm_cpu_initfn(Object *obj)
> >  if (tcg_enabled()) {
> >  cpu->psci_version = 2; /* TCG implements PSCI 0.2 */
> >  }
> > +
> > +cpu->gt_cntfrq = NANOSECONDS_PER_SECOND / GTIMER_SCALE;
> >  }
> >  
> >  static Property arm_cpu_reset_cbar_property =
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 83a809d4bac4..666c03871fdf 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -932,8 +932,18 @@ struct ARMCPU {
> >   */
> >  DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ);
> >  DECLARE_BITMAP(sve_vq_init, ARM_MAX_VQ);
> > +
> > +/* Generic timer counter frequency, in Hz */
> > +uint64_t gt_cntfrq;
> >  };
> >  
> > +static inline unsigned int gt_cntfrq_period_ns(ARMCPU *cpu)
> > +{
> > +/* XXX: Could include qemu/timer.h to get NANOSECONDS_PER_SECOND? */
> > +const unsigned int ns_per_s = 1000 * 1000 * 1000;
> > +return ns_per_s > cpu->gt_cntfrq ? ns_per_s / cpu->gt_cntfrq : 1;
> > +}
> 
> Are you inlining this helper for performance reasons ? 

Originally I was going to do it as a macro in order to avoid redundantly 
scattering
the calculation around. My thought was to use a macro as it's a simple 
calculation,
but then figured it was a bit nicer as a function for type safety. I already 
had it as a
macro in the header, so it was the least effort to switch it to a static inline 
and leave
it where it was :) So that's the justification, mostly just evolution of 
thought process.
Performance was also a consideration but I've done no measurements.

Andrew



Re: [PATCH] hw: add compat machines for 5.0

2019-11-28 Thread Eduardo Habkost
On Thu, Nov 28, 2019 at 06:37:06PM +0100, Cornelia Huck wrote:
> On Tue, 12 Nov 2019 11:48:11 +0100
> Cornelia Huck  wrote:
> 
> > Add 5.0 machine types for arm/i440fx/q35/s390x/spapr.
> > 
> > For i440fx and q35, unversioned cpu models are still translated
> > to -v1; I'll leave changing this (if desired) to the respective
> > maintainers.
> > 
> > Signed-off-by: Cornelia Huck 
> > ---
> > 
> > also pushed out to https://github.com/cohuck/qemu machine-5.0
> > 
> > x86 folks: if you want to change the cpu model versioning, I
> > can do it in this patch, or just do it on top yourselves
> 
> So, do we have a final verdict yet (keep it at v1)?
> 
> If yes, I'll queue this via the s390 tree, unless someone else beats me
> to it.

We won't change default_cpu_version in 5.0, so:

Reviewed-by: Eduardo Habkost 

-- 
Eduardo




Re: [PATCH v2 1/4] qom/object: enable setter for uint types

2019-11-28 Thread Marc-André Lureau
Hi

On Thu, Nov 28, 2019 at 8:48 PM Felipe Franciosi  wrote:
>
> Traditionally, the uint-specific property helpers only offer getters.
> When adding object (or class) uint types, one must therefore use the
> generic property helper if a setter is needed (and probably duplicate
> some code writing their own getters/setters).
>
> This enhances the uint-specific property helper APIs by adding a
> bitwise-or'd 'flags' field and modifying all clients of that API to set
> this parameter to OBJ_PROP_FLAG_RD. This maintains the current behaviour
> whilst allowing others to also set OBJ_PROP_FLAG_WR in the future (which
> will automatically install a setter). Other flags may be added later.

For readability, I would have the full spelling:
OBJECT_PROPERTY_FLAG_READ/OBJECT_PROPERTY_FLAG_WRITE (and an alias
OBJECT_PROPERTY_FLAG_READWRITE = READ|WRITE)
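Sketched out, that suggestion would read something like the following (names are
illustrative only, not an existing QEMU enum):

    typedef enum ObjectPropertyFlags {
        OBJECT_PROPERTY_FLAG_READ      = 1 << 0,
        OBJECT_PROPERTY_FLAG_WRITE     = 1 << 1,
        OBJECT_PROPERTY_FLAG_READWRITE = OBJECT_PROPERTY_FLAG_READ |
                                         OBJECT_PROPERTY_FLAG_WRITE,
    } ObjectPropertyFlags;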

>
> Signed-off-by: Felipe Franciosi 
> ---
>  hw/acpi/ich9.c   |   4 +-
>  hw/acpi/pcihp.c  |   7 +-
>  hw/acpi/piix4.c  |  12 +--
>  hw/isa/lpc_ich9.c|   4 +-
>  hw/ppc/spapr_drc.c   |   2 +-
>  include/qom/object.h |  42 +++--
>  qom/object.c | 216 ++-
>  ui/console.c |   4 +-
>  8 files changed, 243 insertions(+), 48 deletions(-)
>
> diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
> index 2034dd749e..236300d2a9 100644
> --- a/hw/acpi/ich9.c
> +++ b/hw/acpi/ich9.c
> @@ -454,12 +454,12 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs 
> *pm, Error **errp)
>  pm->s4_val = 2;
>
>  object_property_add_uint32_ptr(obj, ACPI_PM_PROP_PM_IO_BASE,
> -   &pm->pm_io_base, errp);
> +   &pm->pm_io_base, OBJ_PROP_FLAG_RD, errp);
>  object_property_add(obj, ACPI_PM_PROP_GPE0_BLK, "uint32",
>  ich9_pm_get_gpe0_blk,
>  NULL, NULL, pm, NULL);
>  object_property_add_uint32_ptr(obj, ACPI_PM_PROP_GPE0_BLK_LEN,
> -   &gpe0_len, errp);
> +   &gpe0_len, OBJ_PROP_FLAG_RD, errp);
>  object_property_add_bool(obj, "memory-hotplug-support",
>   ich9_pm_get_memory_hotplug_support,
>   ich9_pm_set_memory_hotplug_support,
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 8413348a33..c8a7194b19 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -80,7 +80,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
>
>  *bus_bsel = (*bsel_alloc)++;
>  object_property_add_uint32_ptr(OBJECT(bus), ACPI_PCIHP_PROP_BSEL,
> -   bus_bsel, &error_abort);
> +   bus_bsel, OBJ_PROP_FLAG_RD,
> +   &error_abort);
>  }
>
>  return bsel_alloc;
> @@ -373,9 +374,9 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, 
> PCIBus *root_bus,
>  memory_region_add_subregion(address_space_io, s->io_base, >io);
>
>  object_property_add_uint16_ptr(owner, ACPI_PCIHP_IO_BASE_PROP, 
> &s->io_base,
> -   &error_abort);
> +   OBJ_PROP_FLAG_RD, &error_abort);
>  object_property_add_uint16_ptr(owner, ACPI_PCIHP_IO_LEN_PROP, &s->io_len,
> -   &error_abort);
> +   OBJ_PROP_FLAG_RD, &error_abort);
>  }
>
>  const VMStateDescription vmstate_acpi_pcihp_pci_status = {
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 93aec2dd2c..06d964a840 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -443,17 +443,17 @@ static void piix4_pm_add_propeties(PIIX4PMState *s)
>  static const uint16_t sci_int = 9;
>
>  object_property_add_uint8_ptr(OBJECT(s), ACPI_PM_PROP_ACPI_ENABLE_CMD,
> -  &acpi_enable_cmd, NULL);
> +  &acpi_enable_cmd, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint8_ptr(OBJECT(s), ACPI_PM_PROP_ACPI_DISABLE_CMD,
> -  &acpi_disable_cmd, NULL);
> +  &acpi_disable_cmd, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_GPE0_BLK,
> -  &gpe0_blk, NULL);
> +  &gpe0_blk, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_GPE0_BLK_LEN,
> -  &gpe0_blk_len, NULL);
> +  &gpe0_blk_len, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint16_ptr(OBJECT(s), ACPI_PM_PROP_SCI_INT,
> -  &sci_int, NULL);
> +  &sci_int, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_PM_IO_BASE,
> -  &s->io_base, NULL);
> +  &s->io_base, OBJ_PROP_FLAG_RD, NULL);
>  }
>
>  static void piix4_pm_realize(PCIDevice *dev, Error **errp)
> diff --git 

Re: [PATCH v37 17/17] target/avr: Update MAINTAINERS file

2019-11-28 Thread Philippe Mathieu-Daudé

On 11/27/19 6:52 PM, Michael Rolnik wrote:

Include AVR maintainers in MAINTAINERS file

Signed-off-by: Michael Rolnik 
---
  MAINTAINERS | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5e5e3e52d6..d7bfb62791 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -163,6 +163,17 @@ S: Maintained
  F: hw/arm/smmu*
  F: include/hw/arm/smmu*
  
+AVR TCG CPUs

+M: Michael Rolnik 
+R: Sarah Harris 
+S: Maintained
+F: target/avr/


^ This is the architectural part section

v This part should go under a new 'AVR Machines' section.
  (See 'Alpha Machines' for example).


+F: hw/misc/avr_mask.c
+F: hw/char/avr_usart.c
+F: hw/timer/avr_timer16.c
+F: hw/avr/
+F: tests/acceptance/machine_avr6.py
+
  CRIS TCG CPUs
  M: Edgar E. Iglesias 
  S: Maintained






Re: [PULL 0/5] i386 patches for QEMU 4.2-rc

2019-11-28 Thread Jens Freimann

On Wed, Nov 27, 2019 at 09:14:01AM +, Dr. David Alan Gilbert wrote:

* Philippe Mathieu-Daudé (phi...@redhat.com) wrote:

On 11/26/19 10:19 AM, no-re...@patchew.org wrote:
> Patchew URL: 
https://patchew.org/QEMU/20191126085936.1689-1-pbonz...@redhat.com/
>
> This series failed the docker-quick@centos7 build test. Please find the 
testing commands and
> their output below. If you have Docker installed, you can probably reproduce 
it
> locally.
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
>
>TESTcheck-unit: tests/test-thread-pool
> wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1
> **
> ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: 
assertion failed: (result)
> ERROR - Bail out! 
ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: 
assertion failed: (result)
> make: *** [check-qtest-aarch64] Error 1

Should we worry about this error?


Interesting; that should be fixed by Jens'
284f42a520cd9f5905abac2fa50397423890de8f - unless fix dev_unplug_pending
is still lying;  it's showing we're still landing in 'wait-unplug' on
aarch, because it's got a virtio-net by default; even though we've not
got a failover device setup.  CCing Jens.


I've run this test  on aarch64 in a loop today for a few hours but could not
reproduce this error.

One bug I found is that in primary_unplug_device() I look at the
virtio guest feature bits instead of the negotiated bits. But I don't
think this could lead to the above problem because even if the check
for the feature bit fails, primary_unplug_pending would still return
false because no primary device was set and n->primary_dev is NULL. 

I'll keep the test running until I can reproduce it. 


regards
Jens




[PATCH v1 2/5] linux-user: convert target_mmap debug to tracepoint

2019-11-28 Thread Alex Bennée
It is a pain to re-compile when you need to debug and tracepoints are
a fairly low impact way to instrument QEMU.

Signed-off-by: Alex Bennée 
---
 linux-user/mmap.c   | 51 +++--
 linux-user/trace-events |  1 +
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 66868762519..c81fd85fbd2 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -60,6 +60,15 @@ void mmap_fork_end(int child)
 pthread_mutex_unlock(&mmap_mutex);
 }
 
+/* mmap prot pretty printer */
+static void pp_prot(char (*str)[4], int prot)
+{
+(*str)[0] = prot & PROT_READ ? 'r' : '-';
+(*str)[1] = prot & PROT_WRITE ? 'w' : '-';
+(*str)[2] = prot & PROT_EXEC ? 'x' : '-';
+(*str)[3] = 0;
+}
+
 /* NOTE: all the constants are the HOST ones, but addresses are target. */
 int target_mprotect(abi_ulong start, abi_ulong len, int prot)
 {
@@ -68,10 +77,7 @@ int target_mprotect(abi_ulong start, abi_ulong len, int prot)
 
 if (TRACE_TARGET_MPROTECT_ENABLED) {
 char prot_str[4];
-prot_str[0] = prot & PROT_READ ? 'r' : '-';
-prot_str[1] = prot & PROT_WRITE ? 'w' : '-';
-prot_str[2] = prot & PROT_EXEC ? 'x' : '-';
-prot_str[3] = 0;
+pp_prot(&prot_str, prot);
 trace_target_mprotect(start, len, prot_str);
 }
 
@@ -370,32 +376,33 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int 
prot,
 abi_ulong ret, end, real_start, real_end, retaddr, host_offset, host_len;
 
 mmap_lock();
-#ifdef DEBUG_MMAP
-{
-printf("mmap: start=0x" TARGET_ABI_FMT_lx
-   " len=0x" TARGET_ABI_FMT_lx " prot=%c%c%c flags=",
-   start, len,
-   prot & PROT_READ ? 'r' : '-',
-   prot & PROT_WRITE ? 'w' : '-',
-   prot & PROT_EXEC ? 'x' : '-');
-if (flags & MAP_FIXED)
-printf("MAP_FIXED ");
-if (flags & MAP_ANONYMOUS)
-printf("MAP_ANON ");
-switch(flags & MAP_TYPE) {
+if (TRACE_TARGET_MMAP_ENABLED) {
+char prot_str[4];
+g_autoptr(GString) flag_str = g_string_new(NULL);
+
+pp_prot(&prot_str, prot);
+
+if (flags & MAP_FIXED) {
+g_string_append(flag_str, "MAP_FIXED ");
+}
+if (flags & MAP_ANONYMOUS) {
+g_string_append(flag_str, "MAP_ANON ");
+}
+
+switch (flags & MAP_TYPE) {
 case MAP_PRIVATE:
-printf("MAP_PRIVATE ");
+g_string_append(flag_str, "MAP_PRIVATE ");
 break;
 case MAP_SHARED:
-printf("MAP_SHARED ");
+g_string_append(flag_str, "MAP_SHARED ");
 break;
 default:
-printf("[MAP_TYPE=0x%x] ", flags & MAP_TYPE);
+g_string_append_printf(flag_str, "[MAP_TYPE=0x%x] ",
+   flags & MAP_TYPE);
 break;
 }
-printf("fd=%d offset=" TARGET_ABI_FMT_lx "\n", fd, offset);
+trace_target_mmap(start, len, prot_str, flag_str->str, fd, offset);
 }
-#endif
 
 if (!len) {
 errno = EINVAL;
diff --git a/linux-user/trace-events b/linux-user/trace-events
index 41d72e61abb..9411ab357c9 100644
--- a/linux-user/trace-events
+++ b/linux-user/trace-events
@@ -14,3 +14,4 @@ user_s390x_restore_sigregs(void *env, uint64_t sc_psw_addr, 
uint64_t env_psw_add
 
 # mmap.c
 target_mprotect(uint64_t start, uint64_t len, char *flags) "start=0x%"PRIx64 " 
len=0x%"PRIx64 " prot=%s"
+target_mmap(uint64_t start, uint64_t len, char *pflags, char *mflags, int fd, 
uint64_t offset) "start=0x%"PRIx64 " len=0x%"PRIx64 " prot=%s flags=%s fd=%d 
offset=0x%"PRIx64
-- 
2.20.1




[PATCH v1 5/5] linux-user: convert target_munmap debug to a tracepoint

2019-11-28 Thread Alex Bennée
Convert the final bit of DEBUG_MMAP to a tracepoint and remove the
last remnants of the #ifdef hackery.

Signed-off-by: Alex Bennée 
---
 linux-user/mmap.c   | 9 ++---
 linux-user/trace-events | 1 +
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index c2755fcba1f..137aa3eb95f 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -21,8 +21,6 @@
 #include "exec/log.h"
 #include "qemu.h"
 
-//#define DEBUG_MMAP
-
 static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
 static __thread int mmap_lock_count;
 
@@ -639,11 +637,8 @@ int target_munmap(abi_ulong start, abi_ulong len)
 abi_ulong end, real_start, real_end, addr;
 int prot, ret;
 
-#ifdef DEBUG_MMAP
-printf("munmap: start=0x" TARGET_ABI_FMT_lx " len=0x"
-   TARGET_ABI_FMT_lx "\n",
-   start, len);
-#endif
+trace_target_munmap(start, len);
+
 if (start & ~TARGET_PAGE_MASK)
 return -TARGET_EINVAL;
 len = TARGET_PAGE_ALIGN(len);
diff --git a/linux-user/trace-events b/linux-user/trace-events
index 774280cefbd..bd897add252 100644
--- a/linux-user/trace-events
+++ b/linux-user/trace-events
@@ -16,3 +16,4 @@ user_s390x_restore_sigregs(void *env, uint64_t sc_psw_addr, 
uint64_t env_psw_add
 target_mprotect(uint64_t start, uint64_t len, char *flags) "start=0x%"PRIx64 " 
len=0x%"PRIx64 " prot=%s"
 target_mmap(uint64_t start, uint64_t len, char *pflags, char *mflags, int fd, 
uint64_t offset) "start=0x%"PRIx64 " len=0x%"PRIx64 " prot=%s flags=%s fd=%d 
offset=0x%"PRIx64
 target_mmap_complete(uint64_t retaddr) "retaddr=0x%"PRIx64
+target_munmap(uint64_t start, uint64_t len) "start=0x%"PRIx64" len=0x%"PRIx64
-- 
2.20.1




Re: [PATCH v6] error: rename errp to errp_in where it is IN-argument

2019-11-28 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> 28.11.2019 17:23, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>> 
>>> Error **errp is almost always OUT-argument: it's assumed to be NULL, or
>>> pointer to NULL-initialized pointer, or pointer to error_abort or
>>> error_fatal, for callee to report error.
>>>
>>> But very few functions instead get Error **errp as IN-argument:
>>> it's assumed to be set (or, maybe, NULL), and callee should clean it,
>>> or add some information.
>>>
>>> In such cases, rename errp to errp_in.
>> 
>> Missing: why is the rename useful?
>
> The main reason is to prepare for the Coccinelle part.

It's not a prerequisite for applying the patches Coccinelle produces,
only a prerequisite for running Coccinelle.

>> It's useful if it helps readers recognize unusual Error ** parameters,
>> and recognizing unusual Error ** parameters is actually a problem.  I'm
>> not sure it is, but my familiarity with the Error interface may blind
>> me.
>> 
>> How many functions have unusual Error **parameters?  How are they used?
>> Any calls that could easily be mistaken as the usual case?  See [*]
>> below.
>> 
>> You effectively propose a naming convention.  error.h should spell it
>> out.  Let me try:
>> 
>>  Any Error ** parameter meant for passing an error to the caller must
>>  be named @errp.  No other Error ** parameter may be named @errp.
>
> Good
>
>> 
>> Observe:
>> 
>> * I refrain from stipulating how other Error ** parameters are to be
>>named.  You use @errp_in, because the ones you rename are actually
>>"IN-arguments".  However, different uses are conceivable, where
>>@errp_in would be misleading.
>> 
>> * If I understand your ERRP_AUTO_PROPAGATE() idea correctly, many
>>functions that take an Error ** to pass an error to the caller will
>>also use ERRP_AUTO_PROPAGATE, but not all.  Thus, presence of
>>ERRP_AUTO_PROPAGATE() won't be a reliable indicator of "the Error **
>>parameter is for passing an error to the caller".
>> 
>> * I can't see machinery to help us catch violations of the convention.
>> 
>>> This patch updates only error API functions. There still a few
>>> functions with errp-in semantics, they will be updated in further
>>> commits.
>> 
>> Splitting the series into individual patches was a bad idea :)
>> 
>> First, it really needs review as a whole.  I'll do that, but now I have
>> to hunt down the parts.  Found so far:
>> 
>>  [PATCH v6] error: rename errp to errp_in where it is IN-argument
>>  [PATCH v6] hmp: drop Error pointer indirection in hmp_handle_error
>>  [PATCH v6] vnc: drop Error pointer indirection in vnc_client_io_error
>>  [PATCH v6] qdev-monitor: well form error hint helpers
>>  [PATCH v6] nbd: well form nbd_iter_channel_error errp handler
>>  [PATCH v6] ppc: well form kvmppc_hint_smt_possible error hint helper
>>  [PATCH v6] 9pfs: well form error hint helpers
>>  [PATCH v6] hw/core/qdev: cleanup Error ** variables
>>  [PATCH v6] block/snapshot: rename Error ** parameter to more common errp
>>  [PATCH v6] hw/i386/amd_iommu: rename Error ** parameter to more common 
>> errp
>>  [PATCH v6] qga: rename Error ** parameter to more common errp
>>  [PATCH v6] monitor/qmp-cmds: rename Error ** parameter to more common 
>> errp
>>  [PATCH v6] hw/s390x: rename Error ** parameter to more common errp
>>  [PATCH v6] hw/sd: drop extra whitespace in sdhci_sysbus_realize() header
>>  [PATCH v6] hw/tpm: rename Error ** parameter to more common errp
>>  [PATCH v6] hw/usb: rename Error ** parameter to more common errp
>>  [PATCH v6] include/qom/object.h: rename Error ** parameter to more 
>> common errp
>>  [PATCH v6] backends/cryptodev: drop local_err from 
>> cryptodev_backend_complete()
>>  [PATCH v6] hw/vfio/ap: drop local_err from vfio_ap_realize
>
> .. 19 patches.. should be 21.
>
> It's really simple for me to resend them all in one v7 series. Should I?

Might add to the confusion.  Got a branch I can pull?

>> 
>> [*] The information I asked for above is buried in these patches.  I'll
>> try to dig it up as I go reviewing them.
>> 
>> Second, it risks some of these "further patches" overtake this one, and
>> then its commit message will be misleading.  Moreover, the other commits
>> will lack context.
>> 
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> Reviewed-by: Eric Blake 
>>> ---
>>>
>>> v6: fix s/errp/errp_in/ in comments corresponding to changed functions
>>>  [Eric]
>>>  add Eric's r-b
>>>
>>>   include/qapi/error.h | 16 
>>>   util/error.c | 30 +++---
>>>   2 files changed, 23 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>>> index 3f95141a01..df518644fc 100644
>>> --- a/include/qapi/error.h
>>> +++ b/include/qapi/error.h
>>> @@ -230,16 +230,16 @@ void error_propagate_prepend(Error **dst_errp, Error 
>>> *local_err,
>>>  

Re: [PATCH v2 2/2] travis.yml: Run tcg tests with tci

2019-11-28 Thread Stefan Weil
Am 28.11.19 um 22:06 schrieb Stefan Weil:

> Am 28.11.19 um 16:35 schrieb Thomas Huth:
>
>> So far we only have compile coverage for tci. But since commit
>> 2f160e0f9797c7522bfd0d09218d0c9340a5137c ("tci: Add implementation
>> for INDEX_op_ld16u_i64") has been included now, we can also run the
>> "tcg" and "qtest" tests with tci, so let's enable them in Travis now.
>> Since we don't gain much additional test coverage by compiling all
>> targets, and TCI is broken e.g. with the Sparc targets, we also limit
>
> As far as I know it is broken with Sparc hosts (not Sparc targets).
>
> I tested without limiting the target list on an x86_64 host, and the
> tests passed.


Sorry, I have to correct myself: check-qtest-sparc64 fails. I'll examine
that.

Stefan




[PATCH v1 0/5] linux-user mmap debug cleanup

2019-11-28 Thread Alex Bennée
Hi,

While debugging some weird ELF loading bugs I realised our mmap debug
code could do with a little clean-up so I removed the DEBUG_MMAP in
favour of some tracepoints and extending the information that -d page
gives you.

Alex Bennée (5):
  linux-user: convert target_mprotect debug to tracepoint
  linux-user: convert target_mmap debug to tracepoint
  linux-user: add target_mmap_complete tracepoint
  linux-user: log page table changes under -d page
  linux-user: convert target_munmap debug to a tracepoint

 linux-user/mmap.c   | 82 ++---
 linux-user/trace-events |  6 +++
 2 files changed, 50 insertions(+), 38 deletions(-)

-- 
2.20.1
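
(Side note, not part of the series: once these patches are applied, the same
information is available without a rebuild. Something along the lines of

  qemu-x86_64 -d page -trace 'target_m*' ./a.out

should show the page map dumps via -d page plus the new target_mprotect /
target_mmap / target_munmap trace events - assuming a build whose trace
backend can be enabled at runtime (e.g. the default "log" backend); ./a.out
is just a stand-in for whatever guest binary is being debugged.)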




Re: [PATCH v2 0/2] Run tcg tests with tci on Travis

2019-11-28 Thread Philippe Mathieu-Daudé

On 11/28/19 4:35 PM, Thomas Huth wrote:

It's now possible to run some TCG-based tests with our Tiny Code
Generator Interpreter (TCI), too. These two patches enable the
testing on Travis.

Alex Bennée (1):
   configure: allow disable of cross compilation containers

Thomas Huth (1):
   travis.yml: Run tcg tests with tci

  .travis.yml| 7 ---
  configure  | 8 +++-
  tests/tcg/configure.sh | 6 --
  3 files changed, 15 insertions(+), 6 deletions(-)


Good idea to add/use '--disable-containers'.

Reviewed-by: Philippe Mathieu-Daudé 




[PATCH v1 1/5] linux-user: convert target_mprotect debug to tracepoint

2019-11-28 Thread Alex Bennée
It is a pain to re-compile when you need to debug and tracepoints are
a fairly low impact way to instrument QEMU.

Signed-off-by: Alex Bennée 
---
 linux-user/mmap.c   | 17 +
 linux-user/trace-events |  3 +++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 46a6e3a761a..66868762519 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -17,7 +17,7 @@
  *  along with this program; if not, see .
  */
 #include "qemu/osdep.h"
-
+#include "trace.h"
 #include "qemu.h"
 
 //#define DEBUG_MMAP
@@ -66,13 +66,14 @@ int target_mprotect(abi_ulong start, abi_ulong len, int 
prot)
 abi_ulong end, host_start, host_end, addr;
 int prot1, ret;
 
-#ifdef DEBUG_MMAP
-printf("mprotect: start=0x" TARGET_ABI_FMT_lx
-   "len=0x" TARGET_ABI_FMT_lx " prot=%c%c%c\n", start, len,
-   prot & PROT_READ ? 'r' : '-',
-   prot & PROT_WRITE ? 'w' : '-',
-   prot & PROT_EXEC ? 'x' : '-');
-#endif
+if (TRACE_TARGET_MPROTECT_ENABLED) {
+char prot_str[4];
+prot_str[0] = prot & PROT_READ ? 'r' : '-';
+prot_str[1] = prot & PROT_WRITE ? 'w' : '-';
+prot_str[2] = prot & PROT_EXEC ? 'x' : '-';
+prot_str[3] = 0;
+trace_target_mprotect(start, len, prot_str);
+}
 
 if ((start & ~TARGET_PAGE_MASK) != 0)
 return -TARGET_EINVAL;
diff --git a/linux-user/trace-events b/linux-user/trace-events
index 6df234bbb67..41d72e61abb 100644
--- a/linux-user/trace-events
+++ b/linux-user/trace-events
@@ -11,3 +11,6 @@ user_handle_signal(void *env, int target_sig) "env=%p signal 
%d"
 user_host_signal(void *env, int host_sig, int target_sig) "env=%p signal %d 
(target %d("
 user_queue_signal(void *env, int target_sig) "env=%p signal %d"
 user_s390x_restore_sigregs(void *env, uint64_t sc_psw_addr, uint64_t 
env_psw_addr) "env=%p frame psw.addr 0x%"PRIx64 " current psw.addr 0x%"PRIx64
+
+# mmap.c
+target_mprotect(uint64_t start, uint64_t len, char *flags) "start=0x%"PRIx64 " 
len=0x%"PRIx64 " prot=%s"
-- 
2.20.1




Re: [PATCH v2 2/2] travis.yml: Run tcg tests with tci

2019-11-28 Thread Stefan Weil
Am 28.11.19 um 16:35 schrieb Thomas Huth:

> So far we only have compile coverage for tci. But since commit
> 2f160e0f9797c7522bfd0d09218d0c9340a5137c ("tci: Add implementation
> for INDEX_op_ld16u_i64") has been included now, we can also run the
> "tcg" and "qtest" tests with tci, so let's enable them in Travis now.
> Since we don't gain much additional test coverage by compiling all
> targets, and TCI is broken e.g. with the Sparc targets, we also limit


As far as I know it is broken with Sparc hosts (not Sparc targets).

I tested without limiting the target list on an x86_64 host, and the
tests passed.


> the target list to a reasonable subset now (which should still get
> us test coverage by tests/boot-serial-test for example).
>
> Signed-off-by: Thomas Huth 
> ---
>  .travis.yml | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/.travis.yml b/.travis.yml
> index c09b6a0014..de7559e777 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -215,10 +215,11 @@ matrix:
>  - TEST_CMD=""
>  
>  
> -# We manually include builds which we disable "make check" for
> +# Check the TCG interpreter (TCI)
>  - env:
> -- CONFIG="--enable-debug --enable-tcg-interpreter"
> -- TEST_CMD=""
> +- CONFIG="--enable-debug --enable-tcg-interpreter 
> --disable-containers


You could also --disable-kvm. It should not be needed, and disabling it
might need less build resources.


> +
> --target-list=alpha-softmmu,arm-softmmu,hppa-softmmu,m68k-softmmu,microblaze-softmmu,moxie-softmmu,ppc-softmmu,s390x-softmmu,x86_64-softmmu"
> +- TEST_CMD="make check-qtest check-tcg V=1"
>  
>  
>  # We don't need to exercise every backend with every front-end


Thank you for adding these tests.

Tested-by: Stefan Weil 






libcap vs libcap-ng mess

2019-11-28 Thread Dr. David Alan Gilbert
Hi,
  We seem to have a bit of a mess with libcap and libcap-ng; and I'm not
sure if we should try and untangle it.

a) Our configure script has tests for both libcap and libcap-ng
  for libcap it says $cap, for libcap-ng it says $cap_ng (ok)
  If $cap is set - nothing happens?
  If $cap_ng is set - we define CONFIG_LIBCAP  (!)

b) We use both
  1) pr-helper and bridge-helper use CONFIG_LIBCAP and use cap-ng
  2) 9p's virtfs-proxy-helper uses libcap - it's got a check in
 configure to make sure you have libcap if you've asked for 9p

c) Our gitlab-ci.yml installs libcap-dev to get the 9p stuff tested
  but never installs libcap-ng-dev

I hit this because we're using libcap in virtiofsd at the moment.

So hmm how to fix?
I'm tempted to:
  x) Replace CONFIG_LIBCAP by CONFIG_LIBCAPNG to make it clear
  y) Should we flip over to only using one or the other - what
 are the advantages?
  z) We should probably add the other one to the ci.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [RFC 0/1] ATI R300 emulated grpahics card V2

2019-11-28 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20191128064350.20727-1-aaron.zakh...@gmail.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

/tmp/qemu-test/src/hw/display/xlnx_dp.c:504: undefined reference to 
`aux_request'
../hw/display/dpcd.o: In function `dpcd_init':
/tmp/qemu-test/src/hw/display/dpcd.c:141: undefined reference to `aux_init_mmio'
collect2: error: ld returned 1 exit status
make[1]: *** [qemu-system-aarch64] Error 1
make: *** [aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=6382f7f5da9e408e8649412bf4aad739', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-c6g2zogs/src/docker-src.2019-11-28-12.30.21.7650:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=6382f7f5da9e408e8649412bf4aad739
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-c6g2zogs/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m38.467s
user0m8.298s


The full log is available at
http://patchew.org/logs/20191128064350.20727-1-aaron.zakh...@gmail.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2 1/2] configure: allow disable of cross compilation containers

2019-11-28 Thread Stefan Weil
Am 28.11.19 um 16:35 schrieb Thomas Huth:

> From: Alex Bennée 
>
> Our docker infrastructure isn't quite as multiarch as we would wish so
> lets allow the user to disable it if they want. This will allow us to


s/lets/let's/ ?

Otherwise fine, thank you.

Reviewed-by: Stefan Weil 




Re: [PATCH 2/2] Add -mem-shared option

2019-11-28 Thread Marc-André Lureau
Hi

On Thu, Nov 28, 2019 at 9:25 PM Igor Mammedov  wrote:
>
> On Thu, 28 Nov 2019 18:15:18 +0400
> Marc-André Lureau  wrote:
>
> > Add an option to simplify shared memory / vhost-user setup.
> >
> > Currently, using vhost-user requires NUMA setup such as:
> > -m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on 
> > -numa node,memdev=mem
> >
> > As there is no other way to allocate shareable RAM, afaik.
> >
> > -mem-shared aims to have a simple way instead: -m 4G -mem-shared
> User always can write a wrapper script if verbose CLI is too much,
> and we won't have to deal with myriad permutations to maintain.

Sure, but that's not exactly making it easier for the user,
documentation etc (or machines that do not support NUMA, as David
mentioned).

>
> Also current -mem-path/prealloc in combination with memdevs is
> the source of problems (as ram allocation uses 2 different paths).
> It's possible to fix with a kludge but I'd rather fix it properly.

I agree, however I think it's a separate problem. We don't have to
fix both simultaneously. The semantic of a new CLI -mem-shared (or
shared=on etc) can be defined and implemented in a simple way, before
internal refactoring.

> So during 5.0, I'm planning to consolidate -mem-path/prealloc
> handling around memory backend internally (and possibly deprecate them),
> so the only way to allocate RAM for guest would be via memdevs.
> (reducing number of options an globals that they use)
>

That would be great indeed. I tried to look at that in the past, but
was a bit overwhelmed by the amount of details and/or code complexity.

> So user who wants something non trivial could override default
> non-numa behavior with
>   -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
>   -machine memdev=mem
> or use any other backend that suits theirs needs.

That's nice, but not as friendly as a simple -mem-shared.

thanks

>
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  exec.c  | 11 ++-
> >  hw/core/numa.c  | 16 +++-
> >  include/sysemu/sysemu.h |  1 +
> >  qemu-options.hx | 10 ++
> >  vl.c|  4 
> >  5 files changed, 40 insertions(+), 2 deletions(-)
> >
> > diff --git a/exec.c b/exec.c
> > index ffdb518535..4e53937eaf 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -72,6 +72,10 @@
> >  #include "qemu/mmap-alloc.h"
> >  #endif
> >
> > +#ifdef CONFIG_POSIX
> > +#include "qemu/memfd.h"
> > +#endif
> > +
> >  #include "monitor/monitor.h"
> >
> >  //#define DEBUG_SUBPAGE
> > @@ -2347,7 +2351,12 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
> > MemoryRegion *mr,
> >  bool created;
> >  RAMBlock *block;
> >
> > -fd = file_ram_open(mem_path, memory_region_name(mr), &created, errp);
> > +if (mem_path) {
> > +fd = file_ram_open(mem_path, memory_region_name(mr), &created, 
> > errp);
> > +} else {
> > +fd = qemu_memfd_open(mr->name, size,
> > + F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL, 
> > errp);
> > +}
>
> that's what I'm mostly against, as it spills out memdev impl. details
> into generic code.
>
> >  if (fd < 0) {
> >  return NULL;
> >  }
> > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > index e3332a984f..6f72cddb1c 100644
> > --- a/hw/core/numa.c
> > +++ b/hw/core/numa.c
> > @@ -493,7 +493,8 @@ static void allocate_system_memory_nonnuma(MemoryRegion 
> > *mr, Object *owner,
> >  if (mem_path) {
> >  #ifdef __linux__
> >  Error *err = NULL;
> > -memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, 0,
> > +memory_region_init_ram_from_file(mr, owner, name, ram_size, 0,
> > + mem_shared ? RAM_SHARED : 0,
> >   mem_path, &err);
> this will be gone and replaced by memory region that memdev initializes.
>
> >  if (err) {
> >  error_report_err(err);
> > @@ -513,6 +514,19 @@ static void 
> > allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
> >  #else
> >  fprintf(stderr, "-mem-path not supported on this host\n");
> >  exit(1);
> > +#endif
> > +} else if (mem_shared) {
> > +#ifdef CONFIG_POSIX
> > +Error *err = NULL;
> > +memory_region_init_ram_from_file(mr, owner, NULL, ram_size, 0,
> > + RAM_SHARED, NULL, &err);
> > +if (err) {
> > +error_report_err(err);
> > +exit(1);
> > +}
> > +#else
> > +fprintf(stderr, "-mem-shared not supported on this host\n");
> > +exit(1);
> >  #endif
> >  } else {
> >  memory_region_init_ram_nomigrate(mr, owner, name, ram_size, 
> > _fatal);
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 80c57fdc4e..80db8465a9 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -55,6 +55,7 @@ extern bool enable_cpu_pm;
> >  extern 

Re: [PATCH 08/15] s390x: protvirt: KVM intercept changes

2019-11-28 Thread Janosch Frank
On 11/28/19 5:45 PM, Cornelia Huck wrote:
> On Thu, 28 Nov 2019 17:38:19 +0100
> Janosch Frank  wrote:
> 
>> On 11/21/19 4:11 PM, Thomas Huth wrote:
>>> On 20/11/2019 12.43, Janosch Frank wrote:  
 Secure guests no longer intercept with code 4 for an instruction
 interception. Instead they have codes 104 and 108 for secure
 instruction interception and secure instruction notification
 respectively.

 The 104 mirrors the 4, but the 108 is a notification, that something
 happened and the hypervisor might need to adjust its tracking data to
 that fact. An example for that is the set prefix notification
 interception, where KVM only reads the new prefix, but does not update
 the prefix in the state description.

 Signed-off-by: Janosch Frank 
 ---
  target/s390x/kvm.c | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
 index 418154ccfe..58251c0229 100644
 --- a/target/s390x/kvm.c
 +++ b/target/s390x/kvm.c
 @@ -115,6 +115,8 @@
  #define ICPT_CPU_STOP   0x28
  #define ICPT_OPEREXC0x2c
  #define ICPT_IO 0x40
 +#define ICPT_PV_INSTR   0x68
 +#define ICPT_PV_INSTR_NOT   0x6c
  
  #define NR_LOCAL_IRQS 32
  /*
 @@ -151,6 +153,7 @@ static int cap_s390_irq;
  static int cap_ri;
  static int cap_gs;
  static int cap_hpage_1m;
 +static int cap_protvirt;
  
  static int active_cmma;
  
 @@ -336,6 +339,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
  cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
  cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
 +cap_protvirt = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
  
  if (!kvm_check_extension(s, KVM_CAP_S390_GMAP)
  || !kvm_check_extension(s, KVM_CAP_S390_COW)) {
 @@ -1664,6 +1668,8 @@ static int handle_intercept(S390CPU *cpu)
  (long)cs->kvm_run->psw_addr);
  switch (icpt_code) {
  case ICPT_INSTRUCTION:
 +case ICPT_PV_INSTR:
 +case ICPT_PV_INSTR_NOT:
  r = handle_instruction(cpu, run);  
>>>
>>> Even if this works by default, my gut feeling tells me that it would be
>>> safer and cleaner to have a separate handler for this...
>>> Otherwise we might get surprising results if future machine generations
>>> intercept/notify for more or different instructions, I guess?
>>>
>>> However, it's just a gut feeling ... I really don't have much experience
>>> with this PV stuff yet ... what do the others here think?
>>>
>>>  Thomas  
>>
>>
>> Adding a handle_instruction_pv doesn't hurt me too much.
>> The default case can then do an error_report() and exit(1);
>>
>> PV was designed in a way that we can re-use as much code as possible, so
>> I tried using the normal instruction handlers and only change as little
>> as possible in the instructions themselves.
> 
> I think we could argue that handling 4 and 104 in the same function
> makes sense; but the 108 notification should really be separate, I

In my latest answer to Thomas I stated that we could move to a separate
pv instruction handler. I just had another look and rediscovered, that
it would mean a lot more changes. I would need to duplicate the ipa/b
parsing and for diagnose even the base+disp parsing.

So yes, I'd like to treat the 104 like the 4 intercept...

> think. From what I've seen, the expectation of what the hypervisor
> needs to do is just something else in this case ("hey, I did something;
> just to let you know").

We can remove the notification from QEMU; as far as I know, we moved the
instruction that used this path to a 104 code.

> 
> Is the set of instructions you get a 104 for always supposed to be a
> subset of the instructions you get a 4 for? I'd expect it to be so.
> 

Yes
I'll ask if we'll get a new code for instructions that are only valid in
PV mode; currently there are none.
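
(Illustration only, not taken from the patch: if the notification intercept
were split out as discussed above, the dispatch in handle_intercept() could
look roughly like the following, where handle_pv_notification() is a
hypothetical helper that only refreshes QEMU's cached state:

    case ICPT_PV_INSTR:
        r = handle_instruction(cpu, run);
        break;
    case ICPT_PV_INSTR_NOT:
        /* the instruction already ran under PV; only update tracking data */
        r = handle_pv_notification(cpu, run);
        break;
)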



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/2] RFC: add -mem-shared option

2019-11-28 Thread Dr. David Alan Gilbert
* Marc-André Lureau (marcandre.lur...@redhat.com) wrote:
> Hi,
> 
> Setting up shared memory for vhost-user is a bit complicated from
> command line, as it requires NUMA setup such as: -m 4G -object
> memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on -numa
> node,memdev=mem.
> 
> Instead, I suggest to add a -mem-shared option for non-numa setups,
> that will make the -mem-path or anonymous memory shareable.
> 
> Comments welcome,

It's worth checking with Igor (cc'd) - he said he was going to work on
something similar.

One other thing this fixes is that it lets you potentially do vhost-user
on s390, since it currently has no NUMA.

Dave

> Marc-André Lureau (2):
>   memfd: add qemu_memfd_open()
>   Add -mem-shared option
> 
>  exec.c  | 11 ++-
>  hw/core/numa.c  | 16 +++-
>  include/qemu/memfd.h|  3 +++
>  include/sysemu/sysemu.h |  1 +
>  qemu-options.hx | 10 ++
>  util/memfd.c| 39 +--
>  vl.c|  4 
>  7 files changed, 68 insertions(+), 16 deletions(-)
> 
> -- 
> 2.24.0
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH 14/15] s390x: protvirt: Disable address checks for PV guest IO emulation

2019-11-28 Thread Thomas Huth
On 28/11/2019 17.10, Janosch Frank wrote:
> On 11/28/19 4:28 PM, Thomas Huth wrote:
>> On 20/11/2019 12.43, Janosch Frank wrote:
>>> IO instruction data is routed through SIDAD for protected guests, so
>>> addresses do not need to be checked, as this is kernel memory.
>>>
>>> Signed-off-by: Janosch Frank 
>>> ---
>>>  target/s390x/ioinst.c | 46 +++
>>>  1 file changed, 29 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
>>> index c437a1d8c6..d3bd422ddd 100644
>>> --- a/target/s390x/ioinst.c
>>> +++ b/target/s390x/ioinst.c
>>> @@ -110,11 +110,13 @@ void ioinst_handle_msch(S390CPU *cpu, uint64_t reg1, 
>>> uint32_t ipb, uintptr_t ra)
>>>  int cssid, ssid, schid, m;
>>>  SubchDev *sch;
>>>  SCHIB schib;
>>> -uint64_t addr;
>>> +uint64_t addr = 0;
>>>  CPUS390XState *env = &cpu->env;
>>> -uint8_t ar;
>>> +uint8_t ar = 0;
>>>  
>>> -addr = decode_basedisp_s(env, ipb, &ar);
>>> +if (!env->pv) {
>>> +addr = decode_basedisp_s(env, ipb, &ar);
>>> +}
>>>  if (addr & 3) {
>>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>>  return;
>>> @@ -167,11 +169,13 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, 
>>> uint32_t ipb, uintptr_t ra)
>>>  int cssid, ssid, schid, m;
>>>  SubchDev *sch;
>>>  ORB orig_orb, orb;
>>> -uint64_t addr;
>>> +uint64_t addr = 0;
>>>  CPUS390XState *env = &cpu->env;
>>> -uint8_t ar;
>>> +uint8_t ar = 0;
>>>  
>>> -addr = decode_basedisp_s(env, ipb, &ar);
>>> +if (!env->pv) {
>>> +addr = decode_basedisp_s(env, ipb, &ar);
>>> +}
>>>  if (addr & 3) {
>>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>>  return;
>>> @@ -198,12 +202,14 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, 
>>> uint32_t ipb, uintptr_t ra)
>>>  void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb, uintptr_t ra)
>>>  {
>>>  CRW crw;
>>> -uint64_t addr;
>>> +uint64_t addr = 0;
>>>  int cc;
>>>  CPUS390XState *env = &cpu->env;
>>> -uint8_t ar;
>>> +uint8_t ar = 0;
>>>  
>>> -addr = decode_basedisp_s(env, ipb, &ar);
>>> +if (!env->pv) {
>>> +addr = decode_basedisp_s(env, ipb, &ar);
>>> +}
>>>  if (addr & 3) {
>>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>>  return;
>>> @@ -228,13 +234,15 @@ void ioinst_handle_stsch(S390CPU *cpu, uint64_t reg1, 
>>> uint32_t ipb,
>>>  {
>>>  int cssid, ssid, schid, m;
>>>  SubchDev *sch;
>>> -uint64_t addr;
>>> +uint64_t addr = 0;
>>>  int cc;
>>>  SCHIB schib;
>>>  CPUS390XState *env = &cpu->env;
>>> -uint8_t ar;
>>> +uint8_t ar = 0;
>>>  
>>> -addr = decode_basedisp_s(env, ipb, &ar);
>>> +if (!env->pv) {
>>> +addr = decode_basedisp_s(env, ipb, &ar);
>>> +}
>>>  if (addr & 3) {
>>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>>  return;
>>> @@ -294,16 +302,18 @@ int ioinst_handle_tsch(S390CPU *cpu, uint64_t reg1, 
>>> uint32_t ipb, uintptr_t ra)
>>>  int cssid, ssid, schid, m;
>>>  SubchDev *sch;
>>>  IRB irb;
>>> -uint64_t addr;
>>> +uint64_t addr = 0;
>>>  int cc, irb_len;
>>> -uint8_t ar;
>>> +uint8_t ar = 0;
>>>  
>>>  if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
>>>  s390_program_interrupt(env, PGM_OPERAND, ra);
>>>  return -EIO;
>>>  }
>>>  trace_ioinst_sch_id("tsch", cssid, ssid, schid);
>>> -addr = decode_basedisp_s(env, ipb, &ar);
>>> +if (!env->pv) {
>>> +addr = decode_basedisp_s(env, ipb, &ar);
>>> +}
>>>  if (addr & 3) {
>>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>>  return -EIO;
>>
>> Would it make sense to hide all these changes in decode_basedisp_s()
>> instead? ... so that decode_basedisp_s() returns 0 if env->pv == true ?
>> ... or are there still cases where we need real values from
>> decode_basedisp_s() in case of env->pv==true?
> 
> I'd like to keep decode_basedisp_s() as is, but how about a static
> function in ioinst.c called something like get_address_from_regs()?
> 
> It'll call decode_basedisp_s() or return 0.

Sounds fine for me, too!

 Thomas
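
(Purely illustrative, not from the series: such a helper in ioinst.c might be
as simple as the following sketch, with the name taken from the discussion
above rather than from an actual patch:

    static uint64_t get_address_from_regs(CPUS390XState *env, uint32_t ipb,
                                          uint8_t *ar)
    {
        /* addresses are not used for protected guests, data goes via SIDAD */
        if (env->pv) {
            return 0;
        }
        return decode_basedisp_s(env, ipb, ar);
    }
)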



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v37 00/17] QEMU AVR 8 bit cores

2019-11-28 Thread Aleksandar Markovic
On Thursday, November 28, 2019, Philippe Mathieu-Daudé 
wrote:

> On 11/28/19 2:46 PM, Michael Rolnik wrote:
>
>> I will rename them.
>>
>
> Please wait comments from Richard before a version respin.
>
>
Everything went well last 10 or so days, Michael and Sarah were responsive,
the code and series got slowly improved more and more, but there was this
disruption by your idea to "take over" the series with implementation of
"real boards", rather than leave Michael doing improvements by himself,
based on our feedback, like in a regular process of review... There are
some pending quite reasonable and simple review items from me, Michael
should continue working on them... But now he is told to wait... Shouldn't
it be some better way?


On Thu, Nov 28, 2019 at 3:41 PM Aleksandar Markovic <
>> aleksandar.m.m...@gmail.com > wrote:
>>
> [...]
>
>>
>>
>> If I understand Aleksandar correctly, the naming is incorrect
>> because it is too generic to the AVR family, while Sarah only modeled the
>> Atmel implementation.
>>
>> Renaming devices such hw/char/avr_usart.c ->
>> hw/char/atmel_usart.c (similarly with the macros) would be
>> enough Aleksandar?
>>
>>
>>
>> Some renaming could help, perhaps not quite like the one above, but
>> my point (which I find hard to believe I can't explain to you) is
>> that peripherals inside the chip evolved over time, as starkly
>> opposed to external peripherals that are set in stone...
>>
>
>


RE: [PATCH] Updating the GEM MAC IP to properly filter out the multicast addresses

2019-11-28 Thread Wasim, Bilal
This was one of my first attempts, and so I was sure to miss something.. I've 
incorporated all the updates in this patch.. Let me know what you think about 
this.. 

net/cadence_gem: Updating the GEM MAC IP to properly filter out the multicast 
addresses.

The current code makes a bad assumption that the most-significant byte
of the MAC address is used to determine if the address is multicast or
unicast, but in reality only a single bit is used to determine this.
This caused IPv6 to not work.. Fix is now in place and has been tested
with ZCU102-A53 / IPv6 on a TAP interface. Works well..

Signed-off-by: Bilal Wasim 
---
 hw/net/cadence_gem.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
index b8be73dc55..98efb93f8a 100644
--- a/hw/net/cadence_gem.c
+++ b/hw/net/cadence_gem.c
@@ -34,6 +34,7 @@
 #include "qemu/module.h"
 #include "sysemu/dma.h"
 #include "net/checksum.h"
+#include "net/eth.h"
 
 #ifdef CADENCE_GEM_ERR_DEBUG
 #define DB_PRINT(...) do { \
@@ -315,6 +316,12 @@
 
 #define GEM_MODID_VALUE 0x00020118
 
+/* IEEE has specified that the most significant bit of the most significant 
byte be used for
+ * distinguishing between Unicast and Multicast addresses.
+ * If its a 1, that means multicast, 0 means unicast.   */
+#define IS_MULTICAST(address)   is_multicast_ether_addr(address)
+#define IS_UNICAST(address) is_unicast_ether_addr(address)
+
 static inline uint64_t tx_desc_get_buffer(CadenceGEMState *s, uint32_t *desc)
 {
 uint64_t ret = desc[0];
@@ -601,7 +608,7 @@ static void gem_receive_updatestats(CadenceGEMState *s, 
const uint8_t *packet,
 }
 
 /* Error-free Multicast Frames counter */
-if (packet[0] == 0x01) {
+if (IS_MULTICAST(packet)) {
 s->regs[GEM_RXMULTICNT]++;
 }
 
@@ -690,21 +697,21 @@ static int gem_mac_address_filter(CadenceGEMState *s, 
const uint8_t *packet)
 }
 
 /* Accept packets -w- hash match? */
-if ((packet[0] == 0x01 && (s->regs[GEM_NWCFG] & GEM_NWCFG_MCAST_HASH)) ||
-(packet[0] != 0x01 && (s->regs[GEM_NWCFG] & GEM_NWCFG_UCAST_HASH))) {
+if ((IS_MULTICAST(packet) && (s->regs[GEM_NWCFG] & GEM_NWCFG_MCAST_HASH)) 
||
+(IS_UNICAST(packet)   && (s->regs[GEM_NWCFG] & GEM_NWCFG_UCAST_HASH))) 
{
 unsigned hash_index;
 
 hash_index = calc_mac_hash(packet);
 if (hash_index < 32) {
 if (s->regs[GEM_HASHLO] & (1 << hash_index)) {
[...]
> [PATCH] Updating the GEM MAC IP to properly filter out the multicast 
> addresses. The current code makes a bad assumption that the 
> most-significant byte of the MAC address is used to determine if the 
> address is multicast or unicast, but in reality only a single bit is 
> used to determine this. This caused IPv6 to not work.. Fix is now in 
> place and has been tested with
> ZCU102-A53 / IPv6 on a TAP interface. Works well..

Hi Bilal,

The fix looks right to me but I have a few comments.

* Your patch seems a little wrongly formated.
[PATCH] goes into the Subject line for example and you're missing path prefixes.

Do a git log -- hw/net/cadence_gem.c to see examples on how it should look.

* The patch will probably not pass checkpatch since you seem to have long lines.

* We also need to update gem_receive_updatestats() to use the corrected macros.

More inline:

> 
> Signed-off-by: Bilal Wasim 
> ---
> hw/net/cadence_gem.c | 18 --
> 1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c index 
> b8be73d..f8bcbb3 100644
> --- a/hw/net/cadence_gem.c
> +++ b/hw/net/cadence_gem.c
> @@ -315,6 +315,12 @@
>  #define GEM_MODID_VALUE 0x00020118
> +/* IEEE has specified that the most significant bit of the most 
> +significant byte be used for
> + * distinguishing between Unicast and Multicast addresses.
> + * If its a 1, that means multicast, 0 means unicast.   */
> +#define IS_MULTICAST(address)   (((address[0] & 0x01) == 0x01) ? 1 : 
> 0)



This can be simplified:
#define IS_MULTICAST(address) (address[0] & 1)

Actually, looking closer, we already have functions to do these checks in:
include/net/eth.h

static inline int is_multicast_ether_addr(const uint8_t *addr)
static inline int is_broadcast_ether_addr(const uint8_t *addr)
static inline int is_unicast_ether_addr(const uint8_t *addr)
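
(For readers without the tree at hand: these helpers boil down to testing the
I/G bit, i.e. the least significant bit of the first octet - roughly

    static inline int is_multicast_ether_addr(const uint8_t *addr)
    {
        return 0x01 & addr[0];
    }

which is exactly the single bit the commit message talks about, rather than
the whole most-significant byte.)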



> +#define IS_UNICAST(address) (!IS_MULTICAST(address))
> +
> static inline uint64_t tx_desc_get_buffer(CadenceGEMState *s, uint32_t 
> *desc) {
>  uint64_t ret = desc[0];
> @@ -690,21 +696,21 @@ static int 

[PATCH v1 3/5] linux-user: add target_mmap_complete tracepoint

2019-11-28 Thread Alex Bennée
For full details we also want to see where the mmaps end up.

Signed-off-by: Alex Bennée 
---
 linux-user/mmap.c   | 2 +-
 linux-user/trace-events | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index c81fd85fbd2..a2c7037f1b6 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -577,8 +577,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int 
prot,
  the_end1:
 page_set_flags(start, start + len, prot | PAGE_VALID);
  the_end:
+trace_target_mmap_complete(start);
 #ifdef DEBUG_MMAP
-printf("ret=0x" TARGET_ABI_FMT_lx "\n", start);
 page_dump(stdout);
 printf("\n");
 #endif
diff --git a/linux-user/trace-events b/linux-user/trace-events
index 9411ab357c9..774280cefbd 100644
--- a/linux-user/trace-events
+++ b/linux-user/trace-events
@@ -15,3 +15,4 @@ user_s390x_restore_sigregs(void *env, uint64_t sc_psw_addr, 
uint64_t env_psw_add
 # mmap.c
 target_mprotect(uint64_t start, uint64_t len, char *flags) "start=0x%"PRIx64 " 
len=0x%"PRIx64 " prot=%s"
 target_mmap(uint64_t start, uint64_t len, char *pflags, char *mflags, int fd, 
uint64_t offset) "start=0x%"PRIx64 " len=0x%"PRIx64 " prot=%s flags=%s fd=%d 
offset=0x%"PRIx64
+target_mmap_complete(uint64_t retaddr) "retaddr=0x%"PRIx64
-- 
2.20.1




[PATCH v1 4/5] linux-user: log page table changes under -d page

2019-11-28 Thread Alex Bennée
The CPU_LOG_PAGE flag is woefully underused and could stand to do
extra duty tracking page changes. If the user doesn't want to see the
details as things change they still have the tracepoints available.

Signed-off-by: Alex Bennée 
---
 linux-user/mmap.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index a2c7037f1b6..c2755fcba1f 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -18,6 +18,7 @@
  */
 #include "qemu/osdep.h"
 #include "trace.h"
+#include "exec/log.h"
 #include "qemu.h"
 
 //#define DEBUG_MMAP
@@ -578,10 +579,12 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int 
prot,
 page_set_flags(start, start + len, prot | PAGE_VALID);
  the_end:
 trace_target_mmap_complete(start);
-#ifdef DEBUG_MMAP
-page_dump(stdout);
-printf("\n");
-#endif
+if (qemu_loglevel_mask(CPU_LOG_PAGE)) {
+qemu_log_lock();
+qemu_log("new page @ 0x"TARGET_ABI_FMT_lx" updates page map:\n", 
start);
+log_page_dump();
+qemu_log_unlock();
+}
 tb_invalidate_phys_range(start, start + len);
 mmap_unlock();
 return start;
-- 
2.20.1




Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-11-28 Thread Peter Maydell
On Thu, 28 Nov 2019 at 17:27, Igor Mammedov  wrote:
>
> On Thu, 28 Nov 2019 16:00:06 +
> Peter Maydell  wrote:
> > Once a device is hot-unplugged (and thus unrealized) is it valid
> > for it to be re-hot-plugged, or is the assumption that it's then
> > destroyed and a fresh device is created if the user wants to plug
> > something in again later ? Put another way, is it valid for a qdev
> > device to see state transitions realize -> unrealize -> realize ?
>
> I don't think we do it currently (or maybe we do with failover but
> I missed that train), but I don't see why it can't be done.

Well, as Eduardo says, if we don't currently do it then
we probably have a lot of subtly buggy code. Requiring it to work
imposes a requirement on the 'unrealize' function that it
doesn't just do required cleanup/resource releasing actions,
but also returns the device back to exactly the state it was in
after instance_init, so that 'realize' will work correctly.
That's quite a lot of code auditing/effort if we don't actually
have a current or future use for making this work, rather than
just requiring that an unrealized device object is immediately
finalized without possibility of resurrection.

If we do have a plausible usecase then I think we should document
that unrealize needs to handle this, and also have a basic
smoke test of the realize->unrealize->realize.

thanks
-- PMM
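
(A rough idea of what such a smoke test could look like - a sketch only, using
a hypothetical "foo" device and the usual qom/qapi includes; real devices may
additionally need a parent bus/owner before realize, and the calls below
assume the current object_property_set_bool() argument order:

    static void test_realize_unrealize_realize(void)
    {
        Object *obj = object_new("foo");

        object_property_set_bool(obj, true, "realized", &error_fatal);
        object_property_set_bool(obj, false, "realized", &error_fatal);
        /* only passes if unrealize returned the device to its
         * post-instance_init state */
        object_property_set_bool(obj, true, "realized", &error_fatal);

        object_unparent(obj);
    }
)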



Re: [PATCH v37 00/17] QEMU AVR 8 bit cores

2019-11-28 Thread Aleksandar Markovic
On Thursday, November 28, 2019, Alex Bennée  wrote:

>
> Aleksandar Markovic  writes:
>
> > On Thursday, November 28, 2019, Michael Rolnik 
> wrote:
> >
> >> I don't see why you say that the peripherals are inside the chip, there
> is
> >> CPU within target/avr directory and then there are some peripherals in
> hw
> >> directory, CPU does not depend on them. what am I missing?
> >>
> >>>
> >>>
> > I meant these peripherals are physically inside the chip together with
> the
> > core.
> >
> > And USART in a microcontroller from 2010 is different than USART from one
> > from 2018.
>
> Won't these be different chip parts? Or at least revs of the part?
>
> I think broadly the difference between SoC devices is handled by
> handling versioning in the board models - the board being in this case a
> CPU core + a bunch of SoC components + the actual board itself.
>
> All the target/cpu stuff needs to deal with is actual architectural
> revs (c.f. target/arm/cpu[64].c).
>
>
This sounds like a very good way of dealing with this.

I don't want to force Michael to implement some of such cases before
integration, but just to think about such cases - for future improvements
and developments.

Alex, I appreciate your advice, very nice of you!

Aleksandar




> >
> >
> >> On Thu, Nov 28, 2019 at 3:22 PM Aleksandar Markovic <
> >> aleksandar.m.m...@gmail.com> wrote:
> >>
> >>>
> >>>
> >>> On Thursday, November 28, 2019, Michael Rolnik 
> wrote:
> >>>
> 
> 
>  On Wed, Nov 27, 2019 at 11:06 PM Aleksandar Markovic <
>  aleksandar.m.m...@gmail.com> wrote:
> 
> > On Wed, Nov 27, 2019 at 6:53 PM Michael Rolnik 
> > wrote:
> > >
> > > This series of patches adds 8bit AVR cores to QEMU.
> > > All instruction, except BREAK/DES/SPM/SPMX, are implemented. Not
> > fully tested yet.
> > > However I was able to execute simple code with functions. e.g
> > fibonacci calculation.
> > > This series of patches include a non real, sample board.
> > > No fuses support yet. PC is set to 0 at reset.
> > >
> >
> > I have a couple of general remarks, so I am responding to the cover
> > letter, not individual patches.
> >
> > 1) The licenses for Sarah devices differ than the rest - shouldn't
> all
> > licenses be harmonized?
> 
>  Sarah,
>  do you mind if use the same license I use for my code?
> 
> 
> >
> 
> 
> > 2) There is an architectural problem with peripherals. It is possible
> > that they evolve over time, so, for example, USART could not be the
> > same for older and newer CPUs (in principle, newer peripheral is
> > expected to be o sort of "superset" of the older). How do you solve
> > that problem? Right now, it may not looks serious to you, but if you
> > don;t think about that right now, from the outset, soon the code will
> > become so entangled, ti woudl be almost very difficult to fix it.
> > Please think about that, how would you solve it, is there a way to
> > pass the information on the currently emulated CPU to the code
> > covering a peripheral, and provide a different behaviour?
> >
>  Hi Aleksandar,
> 
>  Please explain.
> 
> 
> >>> My concern is about peripherals inside the chip, together with the
> core.
> >>>
> >>> If one models, let's say an external (in the sense, it is a separate
> >>> chip) ADC (analog-to-digital converter), one looks at specs, implement
> what
> >>> is resonable possible in QEMU, plug it in in one of machines thst
> contains
> >>> it, and that's it. That ADC remains the same, of course, whatever the
> >>> surrounding system is.
> >>>
> >>> In AVR case, I think we have a phenomenon likes of which we didn't see
> >>> before (at least I don't know about). Number of AVR microcontrollers is
> >>> very large, and both cores and peripherals evolved.
> >>>
> >>> For cores, you handle differences with all these AVR_FEATURE macros,
> and
> >>> this seems to be working, no significant objection from my side, and
> btw
> >>> that was not an easy task to execute, all admiration from me.
> >>>
> >>> But what about peripherals inside the chip? A peripheral with the same
> >>> name and the same general area of functionality may be differently
> >>> specified for microcontrollers from 2010 and 2018. By the difference I
> >>> don't mean starting address, but the difference in behavior. I don't
> have
> >>> time right now to spell many examples, but I read three different
> specs,
> >>> and there are differences in USART specifications.
> >>>
> >>> I am not clear what is your envisioned solution for these cases. Would
> >>> you such close, but not the same, flabors of a peripheral treat as if
> they
> >>> are two completely separate cases of a peripheral? Or would you have a
> >>> single peripheral that would somehow configure itself depending on the
> core
> >>> it is attached to?
> >>>
> >>> I hope I was clearer this time.
> >>>
> >>> 

[PATCH v2 2/4] ich9: fix getter type for sci_int property

2019-11-28 Thread Felipe Franciosi
When QOM APIs were added to ich9 in 6f1426ab, the getter for sci_int was
written using uint32_t. However, the object property is uint8_t. This
fixes the getter for correctness.

Signed-off-by: Felipe Franciosi 
---
 hw/isa/lpc_ich9.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index f5526f9c3b..3a9c4f0503 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -631,9 +631,7 @@ static void ich9_lpc_get_sci_int(Object *obj, Visitor *v, 
const char *name,
  void *opaque, Error **errp)
 {
 ICH9LPCState *lpc = ICH9_LPC_DEVICE(obj);
-uint32_t value = lpc->sci_gsi;
-
-visit_type_uint32(v, name, &value, errp);
+visit_type_uint8(v, name, &lpc->sci_gsi, errp);
 }
 
 static void ich9_lpc_add_properties(ICH9LPCState *lpc)
@@ -641,7 +639,7 @@ static void ich9_lpc_add_properties(ICH9LPCState *lpc)
 static const uint8_t acpi_enable_cmd = ICH9_APM_ACPI_ENABLE;
 static const uint8_t acpi_disable_cmd = ICH9_APM_ACPI_DISABLE;
 
-object_property_add(OBJECT(lpc), ACPI_PM_PROP_SCI_INT, "uint32",
+object_property_add(OBJECT(lpc), ACPI_PM_PROP_SCI_INT, "uint8",
 ich9_lpc_get_sci_int,
 NULL, NULL, NULL, NULL);
 object_property_add_uint8_ptr(OBJECT(lpc), ACPI_PM_PROP_ACPI_ENABLE_CMD,
-- 
2.20.1



RE: [PATCH] Updating the GEM MAC IP to properly filter out the multicast addresses

2019-11-28 Thread Wasim, Bilal
Thanks for the pointers.. I will incorporate all these changes and post an 
updated thread asap.. 

-Original Message-
From: Edgar E. Iglesias [mailto:edgar.igles...@gmail.com] 
Sent: Thursday, November 28, 2019 10:32 PM
To: Wasim, Bilal 
Cc: qemu-devel@nongnu.org; alist...@alistair23.me; peter.mayd...@linaro.org; 
qemu-...@nongnu.org
Subject: Re: [PATCH] Updating the GEM MAC IP to properly filter out the 
multicast addresses

On Thu, Nov 28, 2019 at 05:02:00PM +, Wasim, Bilal wrote:
> This was one of my first attempts, and so I was sure to miss something.. I've 
> incorporated all the updates in this patch.. Let me know what you think about 
> this.. 
> 
> net/cadence_gem: Updating the GEM MAC IP to properly filter out the multicast 
> addresses.
> 
> The current code makes a bad assumption that the most-significant byte 
> of the MAC address is used to determine if the address is multicast or 
> unicast, but in reality only a single bit is used to determine this.
> This caused IPv6 to not work.. Fix is now in place and has been tested 
> with ZCU102-A53 / IPv6 on a TAP interface. Works well..

Thanks Bilal,

This looks better but not quite there yet.

* You don't seem to be using git-send-email to post patches, try it, it will 
make life easier wrt to formatting. The patch should be in a separate email. 
The subject line should be the subject of the email.
git-format-patch and git-send-email will take care of that for you.

* You don't need to define IS_MULTICAST, you can directly use 
is_multicast_ether_addr() and friends.

* The patch still has long lines (longer than 80 chars) You can run 
scripts/checkpatch.pl on your commit before posting the patch.

Cheers,
Edgar



> 
> Signed-off-by: Bilal Wasim 
> ---
>  hw/net/cadence_gem.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c index 
> b8be73dc55..98efb93f8a 100644
> --- a/hw/net/cadence_gem.c
> +++ b/hw/net/cadence_gem.c
> @@ -34,6 +34,7 @@
>  #include "qemu/module.h"
>  #include "sysemu/dma.h"
>  #include "net/checksum.h"
> +#include "net/eth.h"
>  
>  #ifdef CADENCE_GEM_ERR_DEBUG
>  #define DB_PRINT(...) do { \
> @@ -315,6 +316,12 @@
>  
>  #define GEM_MODID_VALUE 0x00020118
>  
> +/* IEEE has specified that the most significant bit of the most 
> +significant byte be used for
> + * distinguishing between Unicast and Multicast addresses.
> + * If its a 1, that means multicast, 0 means unicast.   */
> +#define IS_MULTICAST(address)   is_multicast_ether_addr(address)
> +#define IS_UNICAST(address) is_unicast_ether_addr(address)
> +
>  static inline uint64_t tx_desc_get_buffer(CadenceGEMState *s, 
> uint32_t *desc)  {
>  uint64_t ret = desc[0];
> @@ -601,7 +608,7 @@ static void gem_receive_updatestats(CadenceGEMState *s, 
> const uint8_t *packet,
>  }
>  
>  /* Error-free Multicast Frames counter */
> -if (packet[0] == 0x01) {
> +if (IS_MULTICAST(packet)) {
>  s->regs[GEM_RXMULTICNT]++;
>  }
>  
> @@ -690,21 +697,21 @@ static int gem_mac_address_filter(CadenceGEMState *s, 
> const uint8_t *packet)
>  }
>  
>  /* Accept packets -w- hash match? */
> -if ((packet[0] == 0x01 && (s->regs[GEM_NWCFG] & GEM_NWCFG_MCAST_HASH)) ||
> -(packet[0] != 0x01 && (s->regs[GEM_NWCFG] & GEM_NWCFG_UCAST_HASH))) {
> +if ((IS_MULTICAST(packet) && (s->regs[GEM_NWCFG] & 
> GEM_NWCFG_MCAST_HASH)) ||
> +(IS_UNICAST(packet)   && (s->regs[GEM_NWCFG] & 
> GEM_NWCFG_UCAST_HASH))) {
>  unsigned hash_index;
>  
>  hash_index = calc_mac_hash(packet);
>  if (hash_index < 32) {
>  if (s->regs[GEM_HASHLO] & (1 << hash_index)) {
> -return packet[0] == 0x01 ? GEM_RX_MULTICAST_HASH_ACCEPT :
> -   GEM_RX_UNICAST_HASH_ACCEPT;
> +return IS_MULTICAST(packet) ? GEM_RX_MULTICAST_HASH_ACCEPT :
> +  
> + GEM_RX_UNICAST_HASH_ACCEPT;
>  }
>  } else {
>  hash_index -= 32;
>  if (s->regs[GEM_HASHHI] & (1 << hash_index)) {
> -return packet[0] == 0x01 ? GEM_RX_MULTICAST_HASH_ACCEPT :
> -   GEM_RX_UNICAST_HASH_ACCEPT;
> +return IS_MULTICAST(packet) ? GEM_RX_MULTICAST_HASH_ACCEPT :
> +  
> + GEM_RX_UNICAST_HASH_ACCEPT;
>  }
>  }
>  }
> --
> 2.19.1.windows.1
> 
> --
> 
> -Original Message-
> From: Edgar E. Iglesias [mailto:edgar.igles...@gmail.com]
> Sent: Thursday, November 28, 2019 9:00 PM
> To: Wasim, Bilal 
> Cc: qemu-devel@nongnu.org; alist...@alistair23.me; 
> peter.mayd...@linaro.org; qemu-...@nongnu.org
> Subject: Re: [PATCH] Updating the GEM MAC IP to properly filter out 
> the multicast 

Re: [PATCH v2] virtio-pci: disable vring processing when bus-mastering is disabled

2019-11-28 Thread Halil Pasic
On Tue, 19 Nov 2019 18:50:03 -0600
Michael Roth  wrote:

[..]
> I.e. the calling code is only scheduling a one-shot BH for
> virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
> an additional virtqueue entry before we get there. This is likely due
> to the following check in virtio_queue_host_notifier_aio_poll:
> 
>   static bool virtio_queue_host_notifier_aio_poll(void *opaque)
>   {
>   EventNotifier *n = opaque;
>   VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
>   bool progress;
> 
>   if (!vq->vring.desc || virtio_queue_empty(vq)) {
>   return false;
>   }
> 
>   progress = virtio_queue_notify_aio_vq(vq);
> 
> namely the call to virtio_queue_empty(). In this case, since no new
> requests have actually been issued, shadow_avail_idx == last_avail_idx,
> so we actually try to access the vring via vring_avail_idx() to get
> the latest non-shadowed idx:
> 
>   int virtio_queue_empty(VirtQueue *vq)
>   {
>   bool empty;
>   ...
> 
>   if (vq->shadow_avail_idx != vq->last_avail_idx) {
>   return 0;
>   }
> 
>   rcu_read_lock();
>   empty = vring_avail_idx(vq) == vq->last_avail_idx;
>   rcu_read_unlock();
>   return empty;
> 
> but since the IOMMU region has been disabled we get a bogus value (0
> usually), which causes virtio_queue_empty() to falsely report that
> there are entries to be processed, which causes errors such as:
> 
>   "virtio: zero sized buffers are not allowed"
> 
> or
> 
>   "virtio-blk missing headers"
> 
> and puts the device in an error state.
> 

I've seen something similar on s390x with virtio-ccw-blk under
protected virtualization, which made me wonder about how virtio-blk in
particular, but also virtio in general, handles shutdown and reset.

This makes me wonder whether bus-mastering being disabled is the only
scenario where something like vdev->disabled should be used.

In particular I have the following mechanism in mind:

qemu_system_reset() --> ... --> qemu_devices_reset() --> ... --> 
--> virtio_[transport]_reset() --> ... --> virtio_bus_stop_ioeventfd()
--> virtio_blk_data_plane_stop()

which in turn triggers the following cascade:
virtio_blk_data_plane_stop_bh --> virtio_queue_aio_set_host_notifier_handler()
--> virtio_queue_host_notifier_aio_read(), which however calls
virtio_queue_notify_aio_vq() if the notifier tests as positive.

Since we still have vq->handle_aio_output set, that means we may
call virtqueue_pop() during the reset procedure.

This was a problem for us, because (due to a bug) the shared pages that
constitute the virtio ring weren't shared any more. And thus we got
the infamous  
virtio_error(vdev, "virtio: zero sized buffers are not allowed").

Now the bug is no more, and we can tolerate that somewhat late access
to the virtio ring.

But it keeps nagging me: is it really OK for the device to access the
virtio ring during reset? My intuition tells me that the device should
not look for new requests after it has been told to reset.

Opinions? (Michael, Connie)

Regards,
Halil

> This patch works around the issue by introducing virtio_set_disabled(),
> which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
> when bus-mastering is disabled. Since we'd check this flag at all the
> same sites as vdev->broken, we replace those checks with an inline
> function which checks for either vdev->broken or vdev->disabled.
> 
> The 'disabled' flag is only migrated when set, which should be fairly
> rare, but to maintain migration compatibility we disable its use for
> older machine types. Users requiring the use of the flag in conjunction
> with older machine types can set it explicitly as a virtio-device
> option.
> 
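
(For illustration, a minimal sketch of the kind of check described above --
the helper name and call site are assumptions, not necessarily what Michael's
patch actually uses:)

    /* sketch: treat a disabled device like a broken one at the check sites */
    static inline bool virtio_device_disabled(VirtIODevice *vdev)
    {
        return unlikely(vdev->disabled || vdev->broken);
    }

    /* e.g. at the top of virtio_queue_empty() and friends: */
    if (virtio_device_disabled(vq->vdev)) {
        return 1;   /* never report pending work while the device is disabled */
    }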




Re: [PATCH v4 6/6] s390x: kvm: Make kvm_sclp_service_call void

2019-11-28 Thread Cornelia Huck
On Wed, 27 Nov 2019 19:38:06 +0100
Janosch Frank  wrote:

> On 11/27/19 7:25 PM, Janosch Frank wrote:
> > 
> > There's 0 (initiated), busy and operational and as far as I know we
> > implement neither.  
> 
> That came out wrong...
> s/operational/not operational/
> 
> We only implement "command initiated" / cc = 0
> We can never have busy, because we handle sclp calls synchronously.
> The spec does not give any indication when we could return "not
> operational". I guess that's just a free pass for hypervisors.

Regardless, setcc(cpu, r) also feels a bit cleaner to me...

> 
> > sclp_service_call() returns either 0 or -PGM_CODE, so we don't need to
> > check when we're after the pgm injection code.  
> 
> 





Re: [PATCH v2] qga: fence guest-set-time if hwclock not available

2019-11-28 Thread Laszlo Ersek
Hi Cornelia,

On 11/28/19 19:11, Cornelia Huck wrote:
> The Posix implementation of guest-set-time invokes hwclock to
> set/retrieve the time to/from the hardware clock. If hwclock
> is not available, the user is currently informed that "hwclock
> failed to set hardware clock to system time", which is quite
> misleading. This may happen e.g. on s390x, which has a different
> timekeeping concept anyway.
> 
> Let's check for the availability of the hwclock command and
> return QERR_UNSUPPORTED for guest-set-time if it is not available.
> 
> Signed-off-by: Cornelia Huck 
> ---
> 
> v1 (RFC) -> v2:
> - use hwclock_path[]
> - use access() instead of stat()
> 
> ---
>  qga/commands-posix.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index 1c1a165daed8..ffb6420fa9cd 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -156,6 +156,17 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
> Error **errp)
>  pid_t pid;
>  Error *local_err = NULL;
>  struct timeval tv;
> +const char hwclock_path[] = "/sbin/hwclock";

Did you drop the "static" storage-class specifier on purpose?

> +static int hwclock_available = -1;
> +
> +if (hwclock_available < 0) {
> +hwclock_available = (access(hwclock_path, X_OK) == 0);
> +}
> +
> +if (!hwclock_available) {
> +error_setg(errp, QERR_UNSUPPORTED);
> +return;
> +}
>  
>  /* If user has passed a time, validate and set it. */
>  if (has_time) {
> @@ -195,7 +206,7 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
> Error **errp)
>  
>  /* Use '/sbin/hwclock -w' to set RTC from the system time,
>   * or '/sbin/hwclock -s' to set the system time from RTC. */
> -execle("/sbin/hwclock", "hwclock", has_time ? "-w" : "-s",
> +execle(hwclock_path, "hwclock", has_time ? "-w" : "-s",

I think it's somewhat obscure now that arg="hwclock" is supposed to
match the last pathname component in hwclock_path="/sbin/hwclock".

There are multiple ways to compute "arg" like that, of course. But I
think they all look uglier than the above. So I'm fine if we just keep this.


If you purposely dropped the "static", then:

Reviewed-by: Laszlo Ersek 

If you just missed the "static" and now intend to add it back, then for v3:

Reviewed-by: Laszlo Ersek 

Thanks
Laszlo



> NULL, environ);
>  _exit(EXIT_FAILURE);
>  } else if (pid < 0) {
> 




Re: [PATCH 08/15] s390x: protvirt: KVM intercept changes

2019-11-28 Thread Cornelia Huck
On Thu, 28 Nov 2019 17:38:19 +0100
Janosch Frank  wrote:

> On 11/21/19 4:11 PM, Thomas Huth wrote:
> > On 20/11/2019 12.43, Janosch Frank wrote:  
> >> Secure guests no longer intercept with code 4 for an instruction
> >> interception. Instead they have codes 104 and 108 for secure
> >> instruction interception and secure instruction notification
> >> respectively.
> >>
> >> The 104 mirrors the 4, but the 108 is a notification, that something
> >> happened and the hypervisor might need to adjust its tracking data to
> >> that fact. An example for that is the set prefix notification
> >> interception, where KVM only reads the new prefix, but does not update
> >> the prefix in the state description.
> >>
> >> Signed-off-by: Janosch Frank 
> >> ---
> >>  target/s390x/kvm.c | 6 ++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> >> index 418154ccfe..58251c0229 100644
> >> --- a/target/s390x/kvm.c
> >> +++ b/target/s390x/kvm.c
> >> @@ -115,6 +115,8 @@
> >>  #define ICPT_CPU_STOP   0x28
> >>  #define ICPT_OPEREXC0x2c
> >>  #define ICPT_IO 0x40
> >> +#define ICPT_PV_INSTR   0x68
> >> +#define ICPT_PV_INSTR_NOT   0x6c
> >>  
> >>  #define NR_LOCAL_IRQS 32
> >>  /*
> >> @@ -151,6 +153,7 @@ static int cap_s390_irq;
> >>  static int cap_ri;
> >>  static int cap_gs;
> >>  static int cap_hpage_1m;
> >> +static int cap_protvirt;
> >>  
> >>  static int active_cmma;
> >>  
> >> @@ -336,6 +339,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >>  cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
> >>  cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
> >>  cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
> >> +cap_protvirt = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
> >>  
> >>  if (!kvm_check_extension(s, KVM_CAP_S390_GMAP)
> >>  || !kvm_check_extension(s, KVM_CAP_S390_COW)) {
> >> @@ -1664,6 +1668,8 @@ static int handle_intercept(S390CPU *cpu)
> >>  (long)cs->kvm_run->psw_addr);
> >>  switch (icpt_code) {
> >>  case ICPT_INSTRUCTION:
> >> +case ICPT_PV_INSTR:
> >> +case ICPT_PV_INSTR_NOT:
> >>  r = handle_instruction(cpu, run);  
> > 
> > Even if this works by default, my gut feeling tells me that it would be
> > safer and cleaner to have a separate handler for this...
> > Otherwise we might get surprising results if future machine generations
> > intercept/notify for more or different instructions, I guess?
> > 
> > However, it's just a gut feeling ... I really don't have much experience
> > with this PV stuff yet ... what do the others here think?
> > 
> >  Thomas  
> 
> 
> Adding a handle_instruction_pv doesn't hurt me too much.
> The default case can then do an error_report() and exit(1);
> 
> PV was designed in a way that we can re-use as much code as possible, so
> I tried using the normal instruction handlers and only change as little
> as possible in the instructions themselves.

I think we could argue that handling 4 and 104 in the same function
makes sense; but the 108 notification should really be separate, I
think. From what I've seen, the expectation of what the hypervisor
needs to do is just something else in this case ("hey, I did something;
just to let you know").

Is the set of instructions you get a 104 for always supposed to be a
subset of the instructions you get a 4 for? I'd expect it to be so.




Re: [PATCH] vfio-ccw: Fix error message

2019-11-28 Thread Cornelia Huck
On Thu, 28 Nov 2019 15:30:14 +0100
Boris Fiuczynski  wrote:

> Signed-off-by: Boris Fiuczynski 
> Reviewed-by: Eric Farman 
> ---
>  hw/vfio/ccw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 6863f6c69f..3b5520ae75 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -102,7 +102,7 @@ again:
>  if (errno == EAGAIN) {
>  goto again;
>  }
> -error_report("vfio-ccw: wirte I/O region failed with errno=%d", 
> errno);
> +error_report("vfio-ccw: write I/O region failed with errno=%d", 
> errno);
>  ret = -errno;
>  } else {
>  ret = region->ret_code;

Heh, that's a long-standing one :)

Thanks, applied.




Re: [PATCH 3/3] target/arm: Handle trapping to EL2 of AArch32 VMRS instructions

2019-11-28 Thread Peter Maydell
On Thu, 28 Nov 2019 at 16:17, Marc Zyngier  wrote:
>
> HCR_EL2.TID3 requires that AArch32 reads of MVFR[012] are trapped to
> EL2, and that HCR_EL2.TID0 does the same for reads of FPSID.
> In order to handle this, introduce a new TCG helper function that
> checks for these control bits before executing the VMRC instruction.
>
> Tested with a hacked-up version of KVM/arm64 that sets the control
> bits for 32bit guests.
>
> Signed-off-by: Marc Zyngier 
> ---
>  target/arm/helper-a64.h|  2 ++
>  target/arm/internals.h |  8 
>  target/arm/translate-vfp.inc.c | 12 +---
>  target/arm/vfp_helper.c| 27 +++
>  4 files changed, 46 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
> index a915c1247f..311ced44e6 100644
> --- a/target/arm/helper-a64.h
> +++ b/target/arm/helper-a64.h
> @@ -102,3 +102,5 @@ DEF_HELPER_FLAGS_3(autda, TCG_CALL_NO_WG, i64, env, i64, 
> i64)
>  DEF_HELPER_FLAGS_3(autdb, TCG_CALL_NO_WG, i64, env, i64, i64)
>  DEF_HELPER_FLAGS_2(xpaci, TCG_CALL_NO_RWG_SE, i64, env, i64)
>  DEF_HELPER_FLAGS_2(xpacd, TCG_CALL_NO_RWG_SE, i64, env, i64)
> +
> +DEF_HELPER_3(check_hcr_el2_trap, void, env, int, int)
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index f5313dd3d4..5a55e960de 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -430,6 +430,14 @@ static inline uint32_t syn_simd_access_trap(int cv, int 
> cond, bool is_16bit)
>  | (cv << 24) | (cond << 20) | (1 << 5);
>  }
>
> +static inline uint32_t syn_vmrs_trap(int rt, int reg)
> +{
> +return (EC_FPIDTRAP << ARM_EL_EC_SHIFT)
> +| ARM_EL_IL
> +| (1 << 24) | (0xe << 20) | (7 << 14)
> +| (reg << 10) | (rt << 5) | 1;
> +}
> +
>  static inline uint32_t syn_sve_access_trap(void)
>  {
>  return EC_SVEACCESSTRAP << ARM_EL_EC_SHIFT;
> diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
> index 85c5ef897b..4c435b6c35 100644
> --- a/target/arm/translate-vfp.inc.c
> +++ b/target/arm/translate-vfp.inc.c
> @@ -759,15 +759,21 @@ static bool trans_VMSR_VMRS(DisasContext *s, 
> arg_VMSR_VMRS *a)
>  }
>
>  if (a->l) {
> +TCGv_i32 tcg_rt, tcg_reg;
> +
>  /* VMRS, move VFP special register to gp register */
>  switch (a->reg) {
> +case ARM_VFP_MVFR0:
> +case ARM_VFP_MVFR1:
> +case ARM_VFP_MVFR2:
>  case ARM_VFP_FPSID:
> +tcg_rt = tcg_const_i32(a->rt);
> +tcg_reg = tcg_const_i32(a->reg);

Since the syndrome value depends only on these two things,
you might as well generate the full syndrome value at
translate time rather than doing it at runtime; then
you only need to pass one thing through to the helper rather
than two.
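
(A rough sketch of that, with a single-argument helper signature assumed
rather than taken from the patch:)

    /* translate time: the syndrome is a compile-time function of rt/reg */
    TCGv_i32 tcg_syn = tcg_const_i32(syn_vmrs_trap(a->rt, a->reg));
    gen_helper_check_hcr_el2_trap(cpu_env, tcg_syn);
    tcg_temp_free_i32(tcg_syn);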

> +gen_helper_check_hcr_el2_trap(cpu_env, tcg_rt, tcg_reg);

This helper call is potentially going to throw an exception
at runtime. QEMU's JIT doesn't write back all the state
of the CPU to the CPU state structure fields for helper
calls, so to avoid losing non-written-back state there are
two possible approaches:

(1) manually write back the state before the call; for
aarch32 this looks like
gen_set_condexec(s);
gen_set_pc_im(s, s->pc_curr);
(you can see this done before we call the access_check_cp_reg()
helper, for instance)

(2) in the helper function, instead of raise_exception(),
call raise_exception_ra(..., GETPC())
This says "when we take the exception, also re-sync the
CPU state by looking at the host PC value in the JITted
code (ie the address of the callsite of the helper) and
looking through a table for this translation block that
cross-references the host PC against the guest PC and
condexec values for that point in execution".

Option 1 is better if the expectation is that the trap will
be taken always, often or usually; option 2 is what we
use if the trap is unlikely (it's how we handle
exceptions on guest load/store insns, which are the main
reason we have the mechanism at all).

Since it's unlikely that guest code will be doing ID
register accesses in hot codepaths, I'd go with option 1,
mostly just for consistency with how we do coprocessor
register access-check function calls.
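
(Putting the two approaches side by side as a sketch -- the call sites and
arguments here are illustrative assumptions, not lifted from this patch:)

    /* Option 1, at translate time: write back state before the trapping helper */
    gen_set_condexec(s);
    gen_set_pc_im(s, s->pc_curr);
    gen_helper_check_hcr_el2_trap(cpu_env, tcg_rt, tcg_reg);

    /* Option 2, at run time: let the helper re-sync state from the host PC */
    if (arm_hcr_el2_eff(env) & HCR_TID3) {
        raise_exception_ra(env, EXCP_HYP_TRAP, syn_vmrs_trap(rt, reg),
                           2, GETPC());
    }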

> +/* fall through */
>  case ARM_VFP_FPEXC:
>  case ARM_VFP_FPINST:
>  case ARM_VFP_FPINST2:
> -case ARM_VFP_MVFR0:
> -case ARM_VFP_MVFR1:
> -case ARM_VFP_MVFR2:
>  tmp = load_cpu_field(vfp.xregs[a->reg]);
>  break;
>  case ARM_VFP_FPSCR:
> diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
> index 9710ef1c3e..44e538e51c 100644
> --- a/target/arm/vfp_helper.c
> +++ b/target/arm/vfp_helper.c
> @@ -1322,4 +1322,31 @@ float64 HELPER(frint64_d)(float64 f, void *fpst)
>  return frint_d(f, fpst, 64);
>  }
>
> +void HELPER(check_hcr_el2_trap)(CPUARMState *env, int rt, int reg)
> +{
> +if (arm_current_el(env) != 1) {
> +   

[PATCH v2] qga: fence guest-set-time if hwclock not available

2019-11-28 Thread Cornelia Huck
The Posix implementation of guest-set-time invokes hwclock to
set/retrieve the time to/from the hardware clock. If hwclock
is not available, the user is currently informed that "hwclock
failed to set hardware clock to system time", which is quite
misleading. This may happen e.g. on s390x, which has a different
timekeeping concept anyway.

Let's check for the availability of the hwclock command and
return QERR_UNSUPPORTED for guest-set-time if it is not available.

Signed-off-by: Cornelia Huck 
---

v1 (RFC) -> v2:
- use hwclock_path[]
- use access() instead of stat()

---
 qga/commands-posix.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 1c1a165daed8..ffb6420fa9cd 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -156,6 +156,17 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
Error **errp)
 pid_t pid;
 Error *local_err = NULL;
 struct timeval tv;
+const char hwclock_path[] = "/sbin/hwclock";
+static int hwclock_available = -1;
+
+if (hwclock_available < 0) {
+hwclock_available = (access(hwclock_path, X_OK) == 0);
+}
+
+if (!hwclock_available) {
+error_setg(errp, QERR_UNSUPPORTED);
+return;
+}
 
 /* If user has passed a time, validate and set it. */
 if (has_time) {
@@ -195,7 +206,7 @@ void qmp_guest_set_time(bool has_time, int64_t time_ns, 
Error **errp)
 
 /* Use '/sbin/hwclock -w' to set RTC from the system time,
  * or '/sbin/hwclock -s' to set the system time from RTC. */
-execle("/sbin/hwclock", "hwclock", has_time ? "-w" : "-s",
+execle(hwclock_path, "hwclock", has_time ? "-w" : "-s",
NULL, environ);
 _exit(EXIT_FAILURE);
 } else if (pid < 0) {
-- 
2.21.0




Re: [PATCH v2] virtio-pci: disable vring processing when bus-mastering is disabled

2019-11-28 Thread Michael S. Tsirkin
On Thu, Nov 28, 2019 at 05:48:00PM +0100, Halil Pasic wrote:
> On Tue, 19 Nov 2019 18:50:03 -0600
> Michael Roth  wrote:
> 
> [..]
> > I.e. the calling code is only scheduling a one-shot BH for
> > virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
> > an additional virtqueue entry before we get there. This is likely due
> > to the following check in virtio_queue_host_notifier_aio_poll:
> > 
> >   static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> >   {
> >   EventNotifier *n = opaque;
> >   VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
> >   bool progress;
> > 
> >   if (!vq->vring.desc || virtio_queue_empty(vq)) {
> >   return false;
> >   }
> > 
> >   progress = virtio_queue_notify_aio_vq(vq);
> > 
> > namely the call to virtio_queue_empty(). In this case, since no new
> > requests have actually been issued, shadow_avail_idx == last_avail_idx,
> > so we actually try to access the vring via vring_avail_idx() to get
> > the latest non-shadowed idx:
> > 
> >   int virtio_queue_empty(VirtQueue *vq)
> >   {
> >   bool empty;
> >   ...
> > 
> >   if (vq->shadow_avail_idx != vq->last_avail_idx) {
> >   return 0;
> >   }
> > 
> >   rcu_read_lock();
> >   empty = vring_avail_idx(vq) == vq->last_avail_idx;
> >   rcu_read_unlock();
> >   return empty;
> > 
> > but since the IOMMU region has been disabled we get a bogus value (0
> > usually), which causes virtio_queue_empty() to falsely report that
> > there are entries to be processed, which causes errors such as:
> > 
> >   "virtio: zero sized buffers are not allowed"
> > 
> > or
> > 
> >   "virtio-blk missing headers"
> > 
> > and puts the device in an error state.
> > 
> 
> I've seen something similar on s390x with virtio-ccw-blk under
> protected virtualization, that made me wonder about how virtio-blk in
> particular but also virtio in general handles shutdown and reset.
> 
> This makes me wonder whether bus-mastering being disabled is the only
> scenario where something like vdev->disabled should be used.
> 
> In particular I have the following mechanism in mind 
> 
> qemu_system_reset() --> ... --> qemu_devices_reset() --> ... --> 
> --> virtio_[transport]_reset() --> ... --> virtio_bus_stop_ioeventfd()
> --> virtio_blk_data_plane_stop()
> 
> which in turn triggers the following cascade:
> virtio_blk_data_plane_stop_bh --> 
> virtio_queue_aio_set_host_notifier_handler() -->
> --> virtio_queue_host_notifier_aio_read() 
> which however calls 
> virtio_queue_notify_aio_vq() if the notifier tests as
> positive. 
> 
> Since we still have vq->handle_aio_output that means we may
> call virtqueue_pop() during the reset procedure.
> 
> This was a problem for us, because (due to a bug) the shared pages that
> constitute the virtio ring weren't shared any more. And thus we got
> the infamous  
> virtio_error(vdev, "virtio: zero sized buffers are not allowed").
> 
> Now the bug is no more, and we can tolerate that somewhat late access
> to the virtio ring.
> 
> But it keeps nagging me, is it really OK for the device to access the
> virtio ring during reset? My intuition tells me that the device should
> not look for new requests after it has been told to reset.


Well it's after it was told to reset but it's not after
it completed reset. So I think it's fine ...

> Opinions? (Michael, Connie)
> 
> Regards,
> Halil
> 
> > This patch works around the issue by introducing virtio_set_disabled(),
> > which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
> > when bus-mastering is disabled. Since we'd check this flag at all the
> > same sites as vdev->broken, we replace those checks with an inline
> > function which checks for either vdev->broken or vdev->disabled.
> > 
> > The 'disabled' flag is only migrated when set, which should be fairly
> > rare, but to maintain migration compatibility we disable its use for
> > older machine types. Users requiring the use of the flag in conjunction
> > with older machine types can set it explicitly as a virtio-device
> > option.
> > 




Re: [PATCH 3/3] target/arm: Handle trapping to EL2 of AArch32 VMRS instructions

2019-11-28 Thread Peter Maydell
On Thu, 28 Nov 2019 at 17:49, Marc Zyngier  wrote:
>
> Hi Peter,
>
> Thanks for having a look at this.
>
> On 2019-11-28 16:43, Peter Maydell wrote:
> > On Thu, 28 Nov 2019 at 16:17, Marc Zyngier  wrote:
> >>
> >> HCR_EL2.TID3 requires that AArch32 reads of MVFR[012] are trapped to
> >> EL2, and that HCR_EL2.TID0 does the same for reads of FPSID.
> >> In order to handle this, introduce a new TCG helper function that
> >> checks for these control bits before executing the VMRC instruction.
> >>
> >> Tested with a hacked-up version of KVM/arm64 that sets the control
> >> bits for 32bit guests.
> >>
> >> Signed-off-by: Marc Zyngier 

> > Since the syndrome value depends only on these two things,
> > you might as well generate the full syndrome value at
> > translate time rather than doing it at runtime; then
> > you only need to pass one thing through to the helper rather
> > than two.
>
> OK. This means that the register check in check_hcr_el2_trap
> will need to extract the register value from the syndrome.
> Not a big deal, but maybe slightly less readable.

Oops, I hadn't noticed that we were switching on reg.
Yeah, you might as well leave it as is. (We could have
a separate helper for each of TID0 and TID3 but that
seems like overkill.)

> On a vaguely tangential subject, how are conditional instructions
> JIT-ed? I could perfectly imagine a conditional VMRS instruction,
> but none of the code I looked at seem to care about it. Or is
> that done before the access itself is actually emitted?

Arm conditional instructions are handled at a pretty
high level in the decode, because they all work the same way.
In disas_arm_insn() we have:

if (cond != 0xe) {
/* if not always execute, we generate a conditional jump to
   next instruction */
arm_skip_unless(s, cond);
}

and there's something similar in thumb_tr_translate_insn()
which puts in a branch based on the thumb condexec bits.
The target of the branch is a label whose position is
set either in arm_post_translate_insn() after the code for the
insn is emitted, or in arm_tr_tb_stop() if the insn is
the last in the TB (always true for branch or trap insns).
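
(Roughly, the shape is as below -- a sketch of the structure rather than the
literal translator code:)

    if (cond != 0xe) {
        arm_skip_unless(s, cond);   /* branch to s->condlabel if cond fails */
    }
    /* ... emit the code for the instruction itself ... */
    gen_set_label(s->condlabel);    /* done in arm_post_translate_insn()/tb_stop */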

thanks
-- PMM



Re: [PATCH 3/3] target/arm: Handle trapping to EL2 of AArch32 VMRS instructions

2019-11-28 Thread Marc Zyngier

Hi Peter,

Thanks for having a look at this.

On 2019-11-28 16:43, Peter Maydell wrote:

On Thu, 28 Nov 2019 at 16:17, Marc Zyngier  wrote:


HCR_EL2.TID3 requires that AArch32 reads of MVFR[012] are trapped to
EL2, and that HCR_EL2.TID0 does the same for reads of FPSID.
In order to handle this, introduce a new TCG helper function that
checks for these control bits before executing the VMRC instruction.

Tested with a hacked-up version of KVM/arm64 that sets the control
bits for 32bit guests.

Signed-off-by: Marc Zyngier 
---
 target/arm/helper-a64.h|  2 ++
 target/arm/internals.h |  8 
 target/arm/translate-vfp.inc.c | 12 +---
 target/arm/vfp_helper.c| 27 +++
 4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index a915c1247f..311ced44e6 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -102,3 +102,5 @@ DEF_HELPER_FLAGS_3(autda, TCG_CALL_NO_WG, i64, 
env, i64, i64)

 DEF_HELPER_FLAGS_3(autdb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_2(xpaci, TCG_CALL_NO_RWG_SE, i64, env, i64)
 DEF_HELPER_FLAGS_2(xpacd, TCG_CALL_NO_RWG_SE, i64, env, i64)
+
+DEF_HELPER_3(check_hcr_el2_trap, void, env, int, int)
diff --git a/target/arm/internals.h b/target/arm/internals.h
index f5313dd3d4..5a55e960de 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -430,6 +430,14 @@ static inline uint32_t syn_simd_access_trap(int 
cv, int cond, bool is_16bit)

 | (cv << 24) | (cond << 20) | (1 << 5);
 }

+static inline uint32_t syn_vmrs_trap(int rt, int reg)
+{
+return (EC_FPIDTRAP << ARM_EL_EC_SHIFT)
+| ARM_EL_IL
+| (1 << 24) | (0xe << 20) | (7 << 14)
+| (reg << 10) | (rt << 5) | 1;
+}
+
 static inline uint32_t syn_sve_access_trap(void)
 {
 return EC_SVEACCESSTRAP << ARM_EL_EC_SHIFT;
diff --git a/target/arm/translate-vfp.inc.c 
b/target/arm/translate-vfp.inc.c

index 85c5ef897b..4c435b6c35 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -759,15 +759,21 @@ static bool trans_VMSR_VMRS(DisasContext *s, 
arg_VMSR_VMRS *a)

 }

 if (a->l) {
+TCGv_i32 tcg_rt, tcg_reg;
+
 /* VMRS, move VFP special register to gp register */
 switch (a->reg) {
+case ARM_VFP_MVFR0:
+case ARM_VFP_MVFR1:
+case ARM_VFP_MVFR2:
 case ARM_VFP_FPSID:
+tcg_rt = tcg_const_i32(a->rt);
+tcg_reg = tcg_const_i32(a->reg);


Since the syndrome value depends only on these two things,
you might as well generate the full syndrome value at
translate time rather than doing it at runtime; then
you only need to pass one thing through to the helper rather
than two.


OK. This means that the register check in check_hcr_el2_trap
will need to extract the register value from the syndrome.
Not a big deal, but maybe slightly less readable.



+gen_helper_check_hcr_el2_trap(cpu_env, tcg_rt, 
tcg_reg);


This helper call is potentially going to throw an exception
at runtime. QEMU's JIT doesn't write back all the state
of the CPU to the CPU state structure fields for helper
calls, so to avoid losing non-written-back state there are
two possible approaches:

(1) manually write back the state before the call; for
aarch32 this looks like
gen_set_condexec(s);
gen_set_pc_im(s, s->pc_curr);
(you can see this done before we call the access_check_cp_reg()
helper, for instance)

(2) in the helper function, instead of raise_exception(),
call raise_exception_ra(..., GETPC())
This says "when we take the exception, also re-sync the
CPU state by looking at the host PC value in the JITted
code (ie the address of the callsite of the helper) and
looking through a table for this translation block that
cross-references the host PC against the guest PC and
condexec values for that point in execution".

Option 1 is better if the expectation is that the trap will
be taken always, often or usually; option 2 is what we
use if the trap is unlikely (it's how we handle
exceptions on guest load/store insns, which are the main
reason we have the mechanism at all).

Since it's unlikely that guest code will be doing ID
register accesses in hot codepaths, I'd go with option 1,
mostly just for consistency with how we do coprocessor
register access-check function calls.


Ah, very interesting stuff. There is a lot of "magic" happening
in QEMU, and I wondered about the emulated state at some point,
until I forgot about it!

On a vaguely tangential subject, how are conditional instructions
JIT-ed? I could perfectly imagine a conditional VMRS instruction,
but none of the code I looked at seem to care about it. Or is
that done before the access itself is actually emitted?




+/* fall through */
 case ARM_VFP_FPEXC:
 case ARM_VFP_FPINST:
 case ARM_VFP_FPINST2:
-case ARM_VFP_MVFR0:
-case ARM_VFP_MVFR1:
-   

Re: [Bug 1853826] Re: ELF loader fails to load shared object on ThunderX2 running RHEL7

2019-11-28 Thread Alex Bennée
Do binaries have to be page size aware? I thought it was a runtime thing.
However, if the aarch64-linux-user binary is hardwired to 4k, it might explain
its confusion on a 64k machine.

On Thu, 28 Nov 2019, 16:33 Peter Maydell,  wrote:

> If you objdump the binary and the offending library what do they seem to
> have been built for ?
>
> Certainly this:
>
> 0040-00401000 1000 ---
>
> looks like a 4K page when we're trying to load things, so either we got
> the loading wrong or the binary is 4K.
>
> --
> You received this bug notification because you are a member of qemu-
> devel-ml, which is subscribed to QEMU.
> https://bugs.launchpad.net/bugs/1853826
>
> Title:
>   ELF loader fails to load shared object on ThunderX2 running RHEL7
>
> Status in QEMU:
>   Incomplete
>
> Bug description:
>   Simple test:
>   hello.c
>
  #include <stdio.h>
>
>   int main(int argc, char* argv[])
>   {
> {
>   printf("Hello World... \n");
> }
> return 0;
>   }
>
>   when compiled with :
>   *Compiler
>
> https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-allinea-studio/download
>   Arm-Compiler-for-HPC_19.3_RHEL_7_aarch64.tar
>
>   *Running:
>   1) with -armpl
>armclang -armpl hello.c
>./qemu/build/aarch64-linux-user/qemu-aarch64 a.out
>   2) without flag
>   armclang hello.c
>./qemu/build/aarch64-linux-user/qemu-aarch64 a.out
>
>   •With Docker image:
>  CentOS Linux release 7.7.1908 (AltArch)
>
>   *Two different machines:
>  AArch64, Taishan. tsv110, Kunpeng 920, ARMv8.2-A
>  AArch64, Taishan 2280, Cortex-A72, ARMv8-A
>
>   *QEMU 4.0
>qemu-aarch64 version 4.1.91 (v4.2.0-rc1)
>
>
>   Results:
>
>
>Taishan 2280 Cortex-A72
> Running
>   1)with -armpl flag with and without the docker
> WORKS-> Hello World...
>  -> ldd a.out
>   ldd a.out
>   linux-vdso.so.1 =>  (0xbc6a2000)
>   libamath_generic.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/clang/9.0.1/armpl_links/lib/libamath_generic.so
> (0xbc544000)
>   libm.so.6 => /lib64/libm.so.6 (0xbc493000)
>   libastring_generic.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/clang/9.0.1/armpl_links/lib/libastring_generic.so
> (0xbc472000) libarmflang.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/libarmflang.so
> (0xbbfd3000)
>   libomp.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/libomp.so
> (0xbbef5000)
>   librt.so.1 => /lib64/librt.so.1 (0xbbed4000)
>   libpthread.so.0 => /lib64/libpthread.so.0 (0xbbe9f000)
>   libarmpl_lp64_generic.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/clang/9.0.1/armpl_links/lib/libarmpl_lp64_generic.so
> (0xb3306000)
>   libc.so.6 => /lib64/libc.so.6 (0xb318)
>   libstdc++.so.6 =>
> /scratch/gcc-9.2.0_Generic-AArch64_RHEL-8_aarch64-linux/lib64/libstdc++.so.6
> (0xb2f3)
>   libgcc_s.so.1 =>
> /scratch/gcc-9.2.0_Generic-AArch64_RHEL-8_aarch64-linux/lib64/libgcc_s.so.1
> (0xb2eff000)
>   libdl.so.2 => /lib64/libdl.so.2 (0xb2ede000)
>   /lib/ld-linux-aarch64.so.1 (0xbc674000)
>
>
>   Running
>   2) without -armpl flag with and without the docker
>  WORKS -> Hello World...
>-> ldd a.out
>   ldd a.out
>linux-vdso.so.1 =>  (0xa6895000)
>   libastring_generic.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/clang/9.0.1/armpl_links/lib/libastring_generic.so
> (0xa6846000)
>   libc.so.6 => /lib64/libc.so.6 (0xa66c)
>   /lib/ld-linux-aarch64.so.1 (0xa6867000)
>
>
>   Taishan - tsv110  Kunpeng 920
>  For Running
>
>   1)with -armpl flag with and without the docker
>  DOES NOT WORK -> with and without Docker
>-> It shows : qemu:handle_cpu_signal received
> signal outside vCPU
>context @ pc=0xaaa8844a
>-> ldd a.out
>   ldd a.out
>   linux-vdso.so.1 =>  (0xad4b)
>   libamath_generic.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/clang/9.0.1/armpl_links/lib/libamath_generic.so
> (0xad37)
>   libm.so.6 => /lib64/libm.so.6 (0xad2a)
>   libastring_generic.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/clang/9.0.1/armpl_links/lib/libastring_generic.so
> (0xad27) libarmflang.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/libarmflang.so
> (0xacdd)
>   libomp.so =>
> /scratch/arm-linux-compiler-19.3_Generic-AArch64_RHEL-8_aarch64-linux/lib/libomp.so
> (0xaccf)
>   librt.so.1 => /lib64/librt.so.1 (0xaccc)
>   libpthread.so.0 => 

Re: [PATCH v37 00/17] QEMU AVR 8 bit cores

2019-11-28 Thread Alex Bennée


Aleksandar Markovic  writes:

> On Thursday, November 28, 2019, Michael Rolnik  wrote:
>
>> I don't see why you say that the peripherals are inside the chip; there is a
>> CPU within the target/avr directory and then there are some peripherals in the
>> hw directory, and the CPU does not depend on them. What am I missing?
>>
>>>
>>>
> I meant these peripherals are physically inside the chip together with the
> core.
>
> And a USART in a microcontroller from 2010 is different from a USART in one
> from 2018.

Won't these be different chip parts? Or at least revs of the part?

I think broadly the difference between SoC devices is handled by
handling versioning in the board models - the board being in this case a
CPU core + a bunch of SoC components + the actual board itself.

All the target/cpu stuff needs to deal with is actual architectural
revs (c.f. target/arm/cpu[64].c).

>
>
>> On Thu, Nov 28, 2019 at 3:22 PM Aleksandar Markovic <
>> aleksandar.m.m...@gmail.com> wrote:
>>
>>>
>>>
>>> On Thursday, November 28, 2019, Michael Rolnik  wrote:
>>>


 On Wed, Nov 27, 2019 at 11:06 PM Aleksandar Markovic <
 aleksandar.m.m...@gmail.com> wrote:

> On Wed, Nov 27, 2019 at 6:53 PM Michael Rolnik 
> wrote:
> >
> > This series of patches adds 8bit AVR cores to QEMU.
> > All instruction, except BREAK/DES/SPM/SPMX, are implemented. Not
> fully tested yet.
> > However I was able to execute simple code with functions. e.g
> fibonacci calculation.
> > This series of patches include a non real, sample board.
> > No fuses support yet. PC is set to 0 at reset.
> >
>
> I have a couple of general remarks, so I am responding to the cover
> letter, not individual patches.
>
> 1) The licenses for Sarah devices differ than the rest - shouldn't all
> licenses be harmonized?

 Sarah,
 do you mind if use the same license I use for my code?


>


> 2) There is an architectural problem with peripherals. It is possible
> that they evolve over time, so, for example, the USART may not be the
> same for older and newer CPUs (in principle, a newer peripheral is
> expected to be a sort of "superset" of the older one). How do you solve
> that problem? Right now, it may not look serious to you, but if you
> don't think about it right now, from the outset, soon the code will
> become so entangled that it would be very difficult to fix. Please
> think about that: how would you solve it? Is there a way to pass the
> information on the currently emulated CPU to the code covering a
> peripheral, and provide different behaviour?
>
 Hi Aleksandar,

 Please explain.


>>> My concern is about peripherals inside the chip, together with the core.
>>>
>>> If one models, let's say an external (in the sense, it is a separate
>>> chip) ADC (analog-to-digital converter), one looks at the specs, implements what
>>> is reasonably possible in QEMU, plugs it into one of the machines that contains
>>> it, and that's it. That ADC remains the same, of course, whatever the
>>> surrounding system is.
>>>
>>> In the AVR case, I think we have a phenomenon the likes of which we didn't see
>>> before (at least I don't know about it). The number of AVR microcontrollers is
>>> very large, and both cores and peripherals evolved.
>>>
>>> For cores, you handle differences with all these AVR_FEATURE macros, and
>>> this seems to be working, no significant objection from my side, and btw
>>> that was not an easy task to execute, all admiration from me.
>>>
>>> But what about peripherals inside the chip? A peripheral with the same
>>> name and the same general area of functionality may be differently
>>> specified for microcontrollers from 2010 and 2018. By the difference I
>>> don't mean starting address, but the difference in behavior. I don't have
>>> time right now to spell many examples, but I read three different specs,
>>> and there are differences in USART specifications.
>>>
>>> I am not clear what is your envisioned solution for these cases. Would
>>> you treat such close, but not identical, flavors of a peripheral as if they
>>> are two completely separate peripherals? Or would you have a
>>> single peripheral that would somehow configure itself depending on the core
>>> it is attached to?
>>>
>>> I hope I was clearer this time.
>>>
>>> Aleksandar
>>>
>>>
>>>



 I don't see any problem from the CPU's perspective.
 As for the sample board, it is just a sample; I hope other people will
 create models of real hw.
 There was no way I could provide a CPU alone; that's why there is a sample.



>
> > Following are examples of possible usages, assuming program.elf is
> compiled for AVR cpu
> > 1.  Continious non interrupted execution
> > run `qemu-system-avr -kernel program.elf`
> > 2.  Continious non interrupted execution with serial output into
> telnet 

[PATCH v2 3/4] ich9: Simplify ich9_lpc_initfn

2019-11-28 Thread Felipe Franciosi
Currently, ich9_lpc_initfn simply serves as a caller to
ich9_lpc_add_properties. This simplifies the code a bit by eliminating
ich9_lpc_add_properties altogether and executing its logic in the parent
object initialiser function.

Signed-off-by: Felipe Franciosi 
---
 hw/isa/lpc_ich9.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 3a9c4f0503..9a5457c83b 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -634,12 +634,14 @@ static void ich9_lpc_get_sci_int(Object *obj, Visitor *v, const char *name,
  visit_type_uint8(v, name, &lpc->sci_gsi, errp);
 }
 
-static void ich9_lpc_add_properties(ICH9LPCState *lpc)
+static void ich9_lpc_initfn(Object *obj)
 {
+ICH9LPCState *lpc = ICH9_LPC_DEVICE(obj);
+
 static const uint8_t acpi_enable_cmd = ICH9_APM_ACPI_ENABLE;
 static const uint8_t acpi_disable_cmd = ICH9_APM_ACPI_DISABLE;
 
-object_property_add(OBJECT(lpc), ACPI_PM_PROP_SCI_INT, "uint8",
+object_property_add(obj, ACPI_PM_PROP_SCI_INT, "uint8",
 ich9_lpc_get_sci_int,
 NULL, NULL, NULL, NULL);
 object_property_add_uint8_ptr(OBJECT(lpc), ACPI_PM_PROP_ACPI_ENABLE_CMD,
@@ -647,14 +649,7 @@ static void ich9_lpc_add_properties(ICH9LPCState *lpc)
 object_property_add_uint8_ptr(OBJECT(lpc), ACPI_PM_PROP_ACPI_DISABLE_CMD,
    &acpi_disable_cmd, OBJ_PROP_FLAG_RD, NULL);
 
-ich9_pm_add_properties(OBJECT(lpc), &lpc->pm, NULL);
-}
-
-static void ich9_lpc_initfn(Object *obj)
-{
-ICH9LPCState *lpc = ICH9_LPC_DEVICE(obj);
-
-ich9_lpc_add_properties(lpc);
+ich9_pm_add_properties(obj, &lpc->pm, NULL);
 }
 
 static void ich9_lpc_realize(PCIDevice *d, Error **errp)
-- 
2.20.1



[Bug 1852196] Re: update edk2 submodule & binaries to edk2-stable201911

2019-11-28 Thread Laszlo Ersek (Red Hat)
Yes, I do have a reason for delaying this LP until after 4.2.0 is out.

When I filed this ticket (on 2019-Nov-12), QEMU had already entered the
4.2.0 soft feature freeze (on 2019-Oct-29). Despite possible
appearances, this LP is actually a feature addition -- that's why I also
set "Tags: feature-request" when I filed this LP.

The reason this is not a fix but a feature addition is the following:
- CVE-2019-14553 is irrelevant (doesn't exist) until we enable HTTPS Boot,
- we have not enabled HTTPS Boot earlier exactly because of CVE-2019-14553,
- the plan is to enable HTTPS Boot now, with CVE-2019-14553 fixed,
- so what remains are CVE-2019-1543, CVE-2019-1552 and CVE-2019-1563, which are 
native OpenSSL problems.

The upstream edk2 project advanced to OpenSSL 1.1.1d because of the last
point (i.e. because of those three OpenSSL CVEs). That submodule update
was tracked in:

https://bugzilla.tianocore.org/show_bug.cgi?id=2226

As you can see:

(1) there was zero analysis or explanation how those OpenSSL CVEs would
*actually* affect edk2 platforms,

(2) edk2 advanced to OpenSSL 1.1.1d (on 2019-Nov-05) approximately two
months after upstream OpenSSL 1.1.1d was released (on 2019-Sep-10).

Furthermore,

(3) all the listed CVEs are marked "low severity":

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-1543
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-1552
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-1563

(The first two items are declared low severity on cve.mitre.org, while
the last item is declared low severity in
.)

These points (1) through (3) tell me that the edk2 advance was more or
less "better safe than sorry" or "cargo cult".

While that approach is not necessarily wrong, if you have infinite
amounts of time, my capacity falls near the other end of the spectrum.
If someone runs QEMU in production, they should build their firmware
from source anyway -- the bundling of edk2 binaries with QEMU is a
convenience.

If you'd like to submit a QEMU patch set (just for the sake of the CVE
fixes, not the HTTPS Boot feature), and are willing to make the case for
getting that into 4.2-rc4, I won't block it, but I don't think it's
worth the churn, to be honest.

Thanks!
Laszlo

** Bug watch added: bugzilla.tianocore.org/ #2226
   https://bugzilla.tianocore.org/show_bug.cgi?id=2226

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2019-14553

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2019-1543

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2019-1552

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2019-1563

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1852196

Title:
  update edk2 submodule & binaries to edk2-stable201911

Status in QEMU:
  New

Bug description:
  edk2-stable201911 will be tagged soon:

https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-
  Release-Planning

https://github.com/tianocore/edk2/releases/tag/edk2-stable201911
[upcoming link]

  It should be picked up by QEMU, after the v4.2.0 release.

  Relevant fixes / features in edk2, since edk2-stable201905 (which is
  what QEMU bundles at the moment, from LP#1831477):

  - enable UEFI HTTPS Boot in ArmVirtQemu* platforms
https://bugzilla.tianocore.org/show_bug.cgi?id=1009
(this is from edk2-stable201908)

  - fix CVE-2019-14553 (Invalid server certificate accepted in HTTPS Boot)
https://bugzilla.tianocore.org/show_bug.cgi?id=960

  - consume OpenSSL-1.1.1d, for fixing CVE-2019-1543, CVE-2019-1552 and
CVE-2019-1563
https://bugzilla.tianocore.org/show_bug.cgi?id=2226

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1852196/+subscriptions



Re: [RFC 0/1] ATI R300 emulated grpahics card V2

2019-11-28 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20191128064350.20727-1-aaron.zakh...@gmail.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  hw/i2c/aspeed_i2c.o
  CC  hw/i2c/microbit_i2c.o
/tmp/qemu-test/src/hw/display/r300.c: In function 'r300_mm_read':
/tmp/qemu-test/src/hw/display/r300.c:161:36: error: format '%ld' expects 
argument of type 'long int', but argument 2 has type 'uint64_t' {aka 'long long 
unsigned int'} [-Werror=format=]
 qemu_log("RADEON_MEMSIZE %ld \n",val);
  ~~^ ~~~
  %lld
/tmp/qemu-test/src/hw/display/r300.c:414:53: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 2 has type 'uint64_t' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("RADEON_SCLK 0x%08lx \n",val);
 ^ ~~~
 %08llx
/tmp/qemu-test/src/hw/display/r300.c:449:51: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 2 has type 'hwaddr' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("GART REGISTER 0x%08lx CONTAINS 0x%08lx 
\n",addr,val);
   ^  
   %08llx
/tmp/qemu-test/src/hw/display/r300.c:449:68: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("GART REGISTER 0x%08lx CONTAINS 0x%08lx 
\n",addr,val);
^  
~~~
%08llx
/tmp/qemu-test/src/hw/display/r300.c:510:38: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 2 has type 'hwaddr' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("READING FROM 0x%08lx \n",addr);
  ^ 
  %08llx
/tmp/qemu-test/src/hw/display/r300.c:512:34: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 2 has type 'hwaddr' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("REGISTER 0x%08lx CONTAINS 0x%08lx \n",addr,val);
  ^  
  %08llx
/tmp/qemu-test/src/hw/display/r300.c:512:51: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("REGISTER 0x%08lx CONTAINS 0x%08lx \n",addr,val);
   ^  ~~~
   %08llx
/tmp/qemu-test/src/hw/display/r300.c: In function 'r300_mm_write':
/tmp/qemu-test/src/hw/display/r300.c:849:28: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 2 has type 'hwaddr' {aka 
'long long unsigned int'} [-Werror=format=]
   qemu_log("REGISTER 0x%08lx CONTAINS 0x%08lx \n",addr,data);
^  
%08llx
/tmp/qemu-test/src/hw/display/r300.c:849:45: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 
'long long unsigned int'} [-Werror=format=]
   qemu_log("REGISTER 0x%08lx CONTAINS 0x%08lx \n",addr,data);
 ^  
 %08llx
/tmp/qemu-test/src/hw/display/r300.c:852:34: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 2 has type 'hwaddr' {aka 
'long long unsigned int'} [-Werror=format=]
   qemu_log("R100 GART ADDR 0x%08lx GART PTR 0x%08lx \n",addr,data);
  ^  
  %08llx
/tmp/qemu-test/src/hw/display/r300.c:852:51: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 3 has type 'uint64_t' {aka 
'long long unsigned int'} [-Werror=format=]
   qemu_log("R100 GART ADDR 0x%08lx GART PTR 0x%08lx \n",addr,data);
   ^  
   %08llx
/tmp/qemu-test/src/hw/display/r300.c:858:38: error: format '%lx' expects 
argument of type 'long unsigned int', but argument 2 has type 'hwaddr' {aka 
'long long unsigned int'} [-Werror=format=]
   qemu_log("WRITE MC_AGP  ADDR 0x%08lx DATA 0x%08lx 
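
(The warnings above all stem from printing uint64_t/hwaddr values with
"%ld"/"%lx"; a minimal sketch of the portable formatting they point at, using
the standard inttypes.h macros -- the helper below is illustrative and not part
of the submitted patch:)

    #include <inttypes.h>
    #include "qemu/log.h"

    static void r300_log_reg(uint64_t addr, uint64_t val)
    {
        /* PRIx64 expands to the right length modifier on 32- and 64-bit hosts */
        qemu_log("REGISTER 0x%08" PRIx64 " CONTAINS 0x%08" PRIx64 "\n", addr, val);
    }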

[PATCH v2 0/4] Improve default object property_add uint helpers

2019-11-28 Thread Felipe Franciosi
This improves the family of object_property_add_uintXX_ptr helpers by enabling
a default getter/setter only when desired. To prevent an API behavioural change
(from clients that already used these helpers and did not want a setter), we
add an OBJ_PROP_FLAG_RD flag that allows clients to only have a getter. Patch 1
enhances the API and modifies current users.

While modifying the clients of the API, a couple of improvement opportunities
were observed in ich9. These were added in separate patches (2 and 3).

Patch 3 cleans up a lot of existing code by moving various objects to the
enhanced API. Previously, those objects had their own getters/setters that only
updated the values without further checks. Some of them actually lacked a check
for setting overflows, which could have resulted in undesired values being set.
The new default setters include a check for that, not updating the values in
case of errors (and propagating them). If they did not provide an error
pointer, then that behaviour was maintained.
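
(For illustration, a minimal usage sketch of the read-only variant; the names
are borrowed from the ich9 patches in this series purely as an example:)

    static const uint8_t acpi_enable_cmd = ICH9_APM_ACPI_ENABLE;

    /* pointer-backed read-only property: default getter, no setter installed */
    object_property_add_uint8_ptr(obj, ACPI_PM_PROP_ACPI_ENABLE_CMD,
                                  &acpi_enable_cmd, OBJ_PROP_FLAG_RD, NULL);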


Felipe Franciosi (4):
  qom/object: enable setter for uint types
  ich9: fix getter type for sci_int property
  ich9: Simplify ich9_lpc_initfn
  qom/object: Use common get/set uint helpers

 hw/acpi/ich9.c   | 103 +++--
 hw/acpi/pcihp.c  |   7 +-
 hw/acpi/piix4.c  |  12 +--
 hw/isa/lpc_ich9.c|  28 ++
 hw/misc/edu.c|  14 +--
 hw/pci-host/q35.c|  14 +--
 hw/ppc/spapr.c   |  19 +---
 hw/ppc/spapr_drc.c   |   2 +-
 hw/vfio/pci-quirks.c |  20 ++--
 include/qom/object.h |  42 +++--
 memory.c |  15 +--
 qom/object.c | 216 ++-
 target/arm/cpu.c |  23 +
 target/i386/sev.c| 106 ++---
 ui/console.c |   4 +-
 15 files changed, 293 insertions(+), 332 deletions(-)

-- 
2.20.1

Changelog:
- Update sci_int directly instead of using stack variable
- Defining an enhanced ObjectPropertyFlags instead of just 'readonly'
- Erroring out directly (instead of using gotos) on default setters
- Retaining lack of errp passing when it wasn't there


Re: [PATCH 14/15] s390x: protvirt: Disable address checks for PV guest IO emulation

2019-11-28 Thread Cornelia Huck
On Thu, 28 Nov 2019 17:10:38 +0100
Janosch Frank  wrote:

> On 11/28/19 4:28 PM, Thomas Huth wrote:

> > Would it make sense to hide all these changes in decode_basedisp_s()
> > instead? ... so that decode_basedisp_s() returns 0 if env->pv == true ?
> > ... or are there still cases where we need real values from
> > decode_basedisp_s() in case of env->pv==true?  
> 
> I'd like to keep decode_basedisp_s() as is, but how about a static
> function in ioinst.c called something like get_address_from_regs()?
> 
> It'll call decode_basedisp_s() or return 0.

We could do something like that; but do we ever get there for other
instruction formats as well? It feels a bit odd to single out this one.
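
(A minimal sketch of the helper being discussed; the name follows Janosch's
suggestion and the env->pv flag comes from this series, so neither is final:)

    static uint64_t get_address_from_regs(CPUS390XState *env, uint32_t ipb,
                                          uint8_t *ar)
    {
        /* for protected guests the address is not usable, so report 0 */
        return env->pv ? 0 : decode_basedisp_s(env, ipb, ar);
    }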




Re: [PATCH v4 2/6] s390x: Move reset normal to shared reset handler

2019-11-28 Thread Cornelia Huck
On Thu, 28 Nov 2019 07:32:53 +0100
Thomas Huth  wrote:

> On 27/11/2019 18.50, Janosch Frank wrote:
> > Let's start moving the cpu reset functions into a single function with
> > a switch/case, so we can use fallthroughs and share more code between
> > resets.  
> 
> Nit: I'd add a "later" in above sentence, since you don't use 
> fallthroughs yet.

If nobody objects, I can apply this with the sentence changed to "so we
can later use...".

> 
> > This patch introduces the reset function by renaming cpu_reset() and
> > cleaning up leftovers.  
> 
> Hmm, which leftovers? I can mainly see the renaming here...

That's probably a leftover from before splitting the original patch in
three... I can drop the leftover part when applying :)

> 
> > Signed-off-by: Janosch Frank 
> > Reviewed-by: David Hildenbrand 
> > ---
> >   target/s390x/cpu-qom.h |  6 +-
> >   target/s390x/cpu.c | 19 +--
> >   target/s390x/cpu.h |  2 +-
> >   target/s390x/sigp.c|  2 +-
> >   4 files changed, 20 insertions(+), 9 deletions(-)  
> 
> Anyway,
> Reviewed-by: Thomas Huth 




[PATCH 2/3] target/arm: Honor HCR_EL2.TID1 trapping requirements

2019-11-28 Thread Marc Zyngier
HCR_EL2.TID1 mandates that accesses from EL1 to REVIDR_EL1, AIDR_EL1
(and their 32bit equivalents) as well as TCMTR, TLBTR are trapped
to EL2. QEMU ignores it, making it harder for a hypervisor to
virtualize the HW (though to be fair, no known hypervisor actually
cares).

Do the right thing by trapping to EL2 if HCR_EL2.TID1 is set.

Signed-off-by: Marc Zyngier 
---
 target/arm/helper.c | 36 
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0b6887b100..9bff769692 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1973,6 +1973,26 @@ static uint64_t isr_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 return ret;
 }
 
+static CPAccessResult access_aa64_tid1(CPUARMState *env, const ARMCPRegInfo 
*ri,
+   bool isread)
+{
+if (arm_hcr_el2_eff(env) & HCR_TID1) {
+return CP_ACCESS_TRAP_EL2;
+}
+
+return CP_ACCESS_OK;
+}
+
+static CPAccessResult access_aa32_tid1(CPUARMState *env, const ARMCPRegInfo 
*ri,
+   bool isread)
+{
+if (arm_feature(env, ARM_FEATURE_V8)) {
+return access_aa64_tid1(env, ri, isread);
+}
+
+return CP_ACCESS_OK;
+}
+
 static const ARMCPRegInfo v7_cp_reginfo[] = {
 /* the old v6 WFI, UNPREDICTABLE in v7 but we choose to NOP */
 { .name = "NOP", .cp = 15, .crn = 7, .crm = 0, .opc1 = 0, .opc2 = 4,
@@ -2136,7 +2156,9 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
  */
 { .name = "AIDR", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 7,
-  .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+  .access = PL1_R, .type = ARM_CP_CONST,
+  .accessfn = access_aa64_tid1,
+  .resetvalue = 0 },
 /* Auxiliary fault status registers: these also are IMPDEF, and we
  * choose to RAZ/WI for all cores.
  */
@@ -6732,7 +6754,9 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .access = PL1_R, .resetvalue = cpu->midr },
 { .name = "REVIDR_EL1", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 0, .opc2 = 6,
-  .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = cpu->revidr 
},
+  .access = PL1_R,
+  .accessfn = access_aa64_tid1,
+  .type = ARM_CP_CONST, .resetvalue = cpu->revidr },
 REGINFO_SENTINEL
 };
 ARMCPRegInfo id_cp_reginfo[] = {
@@ -6747,14 +6771,18 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 /* TCMTR and TLBTR exist in v8 but have no 64-bit versions */
 { .name = "TCMTR",
   .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 2,
-  .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+  .access = PL1_R,
+  .accessfn = access_aa32_tid1,
+  .type = ARM_CP_CONST, .resetvalue = 0 },
 REGINFO_SENTINEL
 };
 /* TLBTR is specific to VMSA */
 ARMCPRegInfo id_tlbtr_reginfo = {
   .name = "TLBTR",
   .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 3,
-  .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0,
+  .access = PL1_R,
+  .accessfn = access_aa32_tid1,
+  .type = ARM_CP_CONST, .resetvalue = 0,
 };
 /* MPUIR is specific to PMSA V6+ */
 ARMCPRegInfo id_mpuir_reginfo = {
-- 
2.20.1




Re: [PATCH 08/15] s390x: protvirt: KVM intercept changes

2019-11-28 Thread Janosch Frank
On 11/21/19 4:11 PM, Thomas Huth wrote:
> On 20/11/2019 12.43, Janosch Frank wrote:
>> Secure guests no longer intercept with code 4 for an instruction
>> interception. Instead they have codes 104 and 108 for secure
>> instruction interception and secure instruction notification
>> respectively.
>>
>> The 104 mirrors the 4, but the 108 is a notification, that something
>> happened and the hypervisor might need to adjust its tracking data to
>> that fact. An example for that is the set prefix notification
>> interception, where KVM only reads the new prefix, but does not update
>> the prefix in the state description.
>>
>> Signed-off-by: Janosch Frank 
>> ---
>>  target/s390x/kvm.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
>> index 418154ccfe..58251c0229 100644
>> --- a/target/s390x/kvm.c
>> +++ b/target/s390x/kvm.c
>> @@ -115,6 +115,8 @@
>>  #define ICPT_CPU_STOP   0x28
>>  #define ICPT_OPEREXC0x2c
>>  #define ICPT_IO 0x40
>> +#define ICPT_PV_INSTR   0x68
>> +#define ICPT_PV_INSTR_NOT   0x6c
>>  
>>  #define NR_LOCAL_IRQS 32
>>  /*
>> @@ -151,6 +153,7 @@ static int cap_s390_irq;
>>  static int cap_ri;
>>  static int cap_gs;
>>  static int cap_hpage_1m;
>> +static int cap_protvirt;
>>  
>>  static int active_cmma;
>>  
>> @@ -336,6 +339,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>  cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF);
>>  cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP);
>>  cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ);
>> +cap_protvirt = kvm_check_extension(s, KVM_CAP_S390_PROTECTED);
>>  
>>  if (!kvm_check_extension(s, KVM_CAP_S390_GMAP)
>>  || !kvm_check_extension(s, KVM_CAP_S390_COW)) {
>> @@ -1664,6 +1668,8 @@ static int handle_intercept(S390CPU *cpu)
>>  (long)cs->kvm_run->psw_addr);
>>  switch (icpt_code) {
>>  case ICPT_INSTRUCTION:
>> +case ICPT_PV_INSTR:
>> +case ICPT_PV_INSTR_NOT:
>>  r = handle_instruction(cpu, run);
> 
> Even if this works by default, my gut feeling tells me that it would be
> safer and cleaner to have a separate handler for this...
> Otherwise we might get surprising results if future machine generations
> intercept/notify for more or different instructions, I guess?
> 
> However, it's just a gut feeling ... I really don't have much experience
> with this PV stuff yet ... what do the others here think?
> 
>  Thomas


Adding a handle_instruction_pv doesn't hurt me too much.
The default case can then do an error_report() and exit(1);

PV was designed in a way that we can re-use as much code as possible, so
I tried using the normal instruction handlers and changing as little
as possible in the instructions themselves.
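
As a rough illustration (not the actual patch; the helper name, the opcode
list and the reuse of handle_instruction() are assumptions), such a handler
in target/s390x/kvm.c could look like:

/*
 * Dispatch secure (PV) instruction intercepts; anything we do not
 * expect under PV is a bug, so fail loudly instead of injecting an
 * operation exception into the guest.
 */
static void handle_pv_instruction(S390CPU *cpu, struct kvm_run *run)
{
    unsigned int ipa0 = run->s390_sieic.ipa & 0xff00;

    switch (ipa0) {
    case IPA0_B2:
    case IPA0_B9:
    case IPA0_EB:
    case IPA0_DIAG:
        /* Re-use the existing non-PV decode and dispatch path. */
        handle_instruction(cpu, run);
        break;
    default:
        error_report("unexpected PV instruction intercept 0x%x", ipa0);
        exit(1);
    }
}

The ICPT_PV_INSTR/ICPT_PV_INSTR_NOT cases in handle_intercept() would then
call this instead of handle_instruction() directly.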




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] hw: add compat machines for 5.0

2019-11-28 Thread Cornelia Huck
On Tue, 12 Nov 2019 11:48:11 +0100
Cornelia Huck  wrote:

> Add 5.0 machine types for arm/i440fx/q35/s390x/spapr.
> 
> For i440fx and q35, unversioned cpu models are still translated
> to -v1; I'll leave changing this (if desired) to the respective
> maintainers.
> 
> Signed-off-by: Cornelia Huck 
> ---
> 
> also pushed out to https://github.com/cohuck/qemu machine-5.0
> 
> x86 folks: if you want to change the cpu model versioning, I
> can do it in this patch, or just do it on top yourselves

So, do we have a final verdict yet (keep it at v1)?

If yes, I'll queue this via the s390 tree, unless someone else beats me
to it.

> 
> ---
>  hw/arm/virt.c  |  7 ++-
>  hw/core/machine.c  |  3 +++
>  hw/i386/pc.c   |  3 +++
>  hw/i386/pc_piix.c  | 14 +-
>  hw/i386/pc_q35.c   | 13 -
>  hw/ppc/spapr.c | 15 +--
>  hw/s390x/s390-virtio-ccw.c | 14 +-
>  include/hw/boards.h|  3 +++
>  include/hw/i386/pc.h   |  3 +++
>  9 files changed, 69 insertions(+), 6 deletions(-)




[PATCH 1/3] target/arm: Honor HCR_EL2.TID2 trapping requirements

2019-11-28 Thread Marc Zyngier
HCR_EL2.TID2 mandates that accesses from EL1 to CTR_EL0, CCSIDR_EL1,
CCSIDR2_EL1, CLIDR_EL1 and CSSELR_EL1 are trapped to EL2, and QEMU
completely ignores it, making it impossible for hypervisors to
virtualize the cache hierarchy.

Do the right thing by trapping to EL2 if HCR_EL2.TID2 is set.

Signed-off-by: Marc Zyngier 
---
 target/arm/helper.c | 28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0bf8f53d4b..0b6887b100 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1910,6 +1910,17 @@ static void scr_write(CPUARMState *env, const 
ARMCPRegInfo *ri, uint64_t value)
 raw_write(env, ri, value);
 }
 
+static CPAccessResult access_aa64_tid2(CPUARMState *env,
+   const ARMCPRegInfo *ri,
+   bool isread)
+{
+if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TID2)) {
+return CP_ACCESS_TRAP_EL2;
+}
+
+return CP_ACCESS_OK;
+}
+
 static uint64_t ccsidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 ARMCPU *cpu = env_archcpu(env);
@@ -2110,10 +2121,14 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
   .writefn = pmintenclr_write },
 { .name = "CCSIDR", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 0,
-  .access = PL1_R, .readfn = ccsidr_read, .type = ARM_CP_NO_RAW },
+  .access = PL1_R,
+  .accessfn = access_aa64_tid2,
+  .readfn = ccsidr_read, .type = ARM_CP_NO_RAW },
 { .name = "CSSELR", .state = ARM_CP_STATE_BOTH,
   .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 2, .opc2 = 0,
-  .access = PL1_RW, .writefn = csselr_write, .resetvalue = 0,
+  .access = PL1_RW,
+  .accessfn = access_aa64_tid2,
+  .writefn = csselr_write, .resetvalue = 0,
   .bank_fieldoffsets = { offsetof(CPUARMState, cp15.csselr_s),
  offsetof(CPUARMState, cp15.csselr_ns) } },
 /* Auxiliary ID register: this actually has an IMPDEF value but for now
@@ -5204,6 +5219,11 @@ static CPAccessResult ctr_el0_access(CPUARMState *env, 
const ARMCPRegInfo *ri,
 if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UCT)) {
 return CP_ACCESS_TRAP;
 }
+
+if (arm_hcr_el2_eff(env) & HCR_TID2) {
+return CP_ACCESS_TRAP_EL2;
+}
+
 return CP_ACCESS_OK;
 }
 
@@ -6184,7 +6204,9 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 ARMCPRegInfo clidr = {
 .name = "CLIDR", .state = ARM_CP_STATE_BOTH,
 .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 1,
-.access = PL1_R, .type = ARM_CP_CONST, .resetvalue = cpu->clidr
+.access = PL1_R, .type = ARM_CP_CONST,
+.accessfn = access_aa64_tid2,
+.resetvalue = cpu->clidr
 };
define_one_arm_cp_reg(cpu, &clidr);
 define_arm_cp_regs(cpu, v7_cp_reginfo);
-- 
2.20.1




Re: [PATCH 0/3] target/arm: More HCR_EL2.TIDx fixes

2019-11-28 Thread Marc Zyngier

On 2019-11-28 16:30, Peter Maydell wrote:

On Thu, 28 Nov 2019 at 16:17, Marc Zyngier  wrote:


I started looking the rest of the missing TIDx handling,
and this resulted in the following patches.

There is still one thing I'm a bit puzzled by though:

HCR_EL2.TID0 mandates trapping of the AArch32 JIDR
register, but I couldn't find a trace of it in the QEMU
code, and trying to read it seems to generate an exception.

It isn't like anyone is going to miss it, but I wonder if
it should be implemented... It could also be that I'm missing
the obvious and that my testing is broken! ;-)


Hmm, I was under the impression that we correctly implemented
'trivial Jazelle', but we obviously missed some of it
(we do have the handling of BXJ insns).
We should, yes, ideally, have RAZ/WI implementations
of JIDR, JMCR and JOSCR.


OK, I'll have a look at this, and plumb the handling of TID0
in JIDR.
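
For reference, a hand-wavy sketch of what a trivial-Jazelle implementation
with TID0 trapping could look like in target/arm/helper.c (the accessfn name
and the cp14 encodings below are guesses, not the eventual patch):

static CPAccessResult access_jazelle(CPUARMState *env, const ARMCPRegInfo *ri,
                                     bool isread)
{
    if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TID0)) {
        return CP_ACCESS_TRAP_EL2;
    }
    return CP_ACCESS_OK;
}

static const ARMCPRegInfo jazelle_regs[] = {
    /* JIDR is read-only; JOSCR and JMCR are RAZ/WI for trivial Jazelle */
    { .name = "JIDR",
      .cp = 14, .crn = 0, .crm = 0, .opc1 = 7, .opc2 = 0,
      .access = PL1_R, .accessfn = access_jazelle,
      .type = ARM_CP_CONST, .resetvalue = 0 },
    { .name = "JOSCR",
      .cp = 14, .crn = 1, .crm = 0, .opc1 = 7, .opc2 = 0,
      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
    { .name = "JMCR",
      .cp = 14, .crn = 2, .crm = 0, .opc1 = 7, .opc2 = 0,
      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
    REGINFO_SENTINEL
};

These would be registered from register_cp_regs_for_features() behind a
suitable Jazelle feature check.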


We also I think don't get right the fiddly detail about
attempting an exception return with SPSR.J set, but that's
not worth messing about with IMHO.


Indeed. The less we hear about Jazelle, the better... ;-)

Thanks,

M.
--
Jazz is not dead. It just smells funny...



Re: [PATCH] Updating the GEM MAC IP to properly filter out the multicast addresses

2019-11-28 Thread Edgar E. Iglesias
On Thu, Nov 28, 2019 at 05:02:00PM +, Wasim, Bilal wrote:
> This was one of my first attempts, and so I was sure to miss something.. I've 
> incorporated all the updates in this patch.. Let me know what you think about 
> this.. 
> 
> net/cadence_gem: Updating the GEM MAC IP to properly filter out the multicast 
> addresses.
> 
> The current code makes a bad assumption that the most-significant byte
> of the MAC address is used to determine if the address is multicast or
> unicast, but in reality only a single bit is used to determine this.
> This caused IPv6 to not work.. Fix is now in place and has been tested
> with ZCU102-A53 / IPv6 on a TAP interface. Works well..

Thanks Bilal,

This looks better but not quite there yet.

* You don't seem to be using git-send-email to post patches; try it,
it will make life easier wrt formatting. The patch should be in a
separate email. The subject line should be the subject of the email.
git-format-patch and git-send-email will take care of that for you.

* You don't need to define IS_MULTICAST, you can directly
use is_multicast_ether_addr() and friends.

* The patch still has long lines (longer than 80 chars)
You can run scripts/checkpatch.pl on your commit before
posting the patch.

Cheers,
Edgar



> 
> Signed-off-by: Bilal Wasim 
> ---
>  hw/net/cadence_gem.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
> index b8be73dc55..98efb93f8a 100644
> --- a/hw/net/cadence_gem.c
> +++ b/hw/net/cadence_gem.c
> @@ -34,6 +34,7 @@
>  #include "qemu/module.h"
>  #include "sysemu/dma.h"
>  #include "net/checksum.h"
> +#include "net/eth.h"
>  
>  #ifdef CADENCE_GEM_ERR_DEBUG
>  #define DB_PRINT(...) do { \
> @@ -315,6 +316,12 @@
>  
>  #define GEM_MODID_VALUE 0x00020118
>  
> +/* IEEE has specified that the most significant bit of the most significant 
> byte be used for
> + * distinguishing between Unicast and Multicast addresses.
> + * If its a 1, that means multicast, 0 means unicast.   */
> +#define IS_MULTICAST(address)   is_multicast_ether_addr(address)
> +#define IS_UNICAST(address) is_unicast_ether_addr(address)
> +
>  static inline uint64_t tx_desc_get_buffer(CadenceGEMState *s, uint32_t *desc)
>  {
>  uint64_t ret = desc[0];
> @@ -601,7 +608,7 @@ static void gem_receive_updatestats(CadenceGEMState *s, 
> const uint8_t *packet,
>  }
>  
>  /* Error-free Multicast Frames counter */
> -if (packet[0] == 0x01) {
> +if (IS_MULTICAST(packet)) {
>  s->regs[GEM_RXMULTICNT]++;
>  }
>  
> @@ -690,21 +697,21 @@ static int gem_mac_address_filter(CadenceGEMState *s, 
> const uint8_t *packet)
>  }
>  
>  /* Accept packets -w- hash match? */
> -if ((packet[0] == 0x01 && (s->regs[GEM_NWCFG] & GEM_NWCFG_MCAST_HASH)) ||
> -(packet[0] != 0x01 && (s->regs[GEM_NWCFG] & GEM_NWCFG_UCAST_HASH))) {
> +if ((IS_MULTICAST(packet) && (s->regs[GEM_NWCFG] & 
> GEM_NWCFG_MCAST_HASH)) ||
> +(IS_UNICAST(packet)   && (s->regs[GEM_NWCFG] & 
> GEM_NWCFG_UCAST_HASH))) {
>  unsigned hash_index;
>  
>  hash_index = calc_mac_hash(packet);
>  if (hash_index < 32) {
>  if (s->regs[GEM_HASHLO] & (1 << hash_index)) {
> -return packet[0] == 0x01 ? GEM_RX_MULTICAST_HASH_ACCEPT :
> -   GEM_RX_UNICAST_HASH_ACCEPT;
> +return IS_MULTICAST(packet) ? GEM_RX_MULTICAST_HASH_ACCEPT :
> +  GEM_RX_UNICAST_HASH_ACCEPT;
>  }
>  } else {
>  hash_index -= 32;
>  if (s->regs[GEM_HASHHI] & (1 << hash_index)) {
> -return packet[0] == 0x01 ? GEM_RX_MULTICAST_HASH_ACCEPT :
> -   GEM_RX_UNICAST_HASH_ACCEPT;
> +return IS_MULTICAST(packet) ? GEM_RX_MULTICAST_HASH_ACCEPT :
> +  GEM_RX_UNICAST_HASH_ACCEPT;
>  }
>  }
>  }
> -- 
> 2.19.1.windows.1
> 
> --
> -Original Message-
> From: Edgar E. Iglesias [mailto:edgar.igles...@gmail.com] 
> Sent: Thursday, November 28, 2019 9:00 PM
> To: Wasim, Bilal 
> Cc: qemu-devel@nongnu.org; alist...@alistair23.me; peter.mayd...@linaro.org; 
> qemu-...@nongnu.org
> Subject: Re: [PATCH] Updating the GEM MAC IP to properly filter out the 
> multicast addresses
> 
> On Thu, Nov 28, 2019 at 03:10:16PM +, Wasim, Bilal wrote:
> > [PATCH] Updating the GEM MAC IP to properly filter out the multicast 
> > addresses. The current code makes a bad assumption that the 
> > most-significant byte of the MAC address is used to determine if the 
> > address is multicast or unicast, but in reality only a single bit is 
> > used to determine this. This caused IPv6 to not work.. Fix is now in 
> > place and has been 

Re: [PATCH 14/15] s390x: protvirt: Disable address checks for PV guest IO emulation

2019-11-28 Thread Janosch Frank
On 11/28/19 4:28 PM, Thomas Huth wrote:
> On 20/11/2019 12.43, Janosch Frank wrote:
>> IO instruction data is routed through SIDAD for protected guests, so
>> addresses do not need to be checked, as this is kernel memory.
>>
>> Signed-off-by: Janosch Frank 
>> ---
>>  target/s390x/ioinst.c | 46 +++
>>  1 file changed, 29 insertions(+), 17 deletions(-)
>>
>> diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
>> index c437a1d8c6..d3bd422ddd 100644
>> --- a/target/s390x/ioinst.c
>> +++ b/target/s390x/ioinst.c
>> @@ -110,11 +110,13 @@ void ioinst_handle_msch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb, uintptr_t ra)
>>  int cssid, ssid, schid, m;
>>  SubchDev *sch;
>>  SCHIB schib;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  CPUS390XState *env = &cpu->env;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return;
>> @@ -167,11 +169,13 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb, uintptr_t ra)
>>  int cssid, ssid, schid, m;
>>  SubchDev *sch;
>>  ORB orig_orb, orb;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  CPUS390XState *env = &cpu->env;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return;
>> @@ -198,12 +202,14 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb, uintptr_t ra)
>>  void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb, uintptr_t ra)
>>  {
>>  CRW crw;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  int cc;
>>  CPUS390XState *env = &cpu->env;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return;
>> @@ -228,13 +234,15 @@ void ioinst_handle_stsch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb,
>>  {
>>  int cssid, ssid, schid, m;
>>  SubchDev *sch;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  int cc;
>>  SCHIB schib;
>>  CPUS390XState *env = &cpu->env;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return;
>> @@ -294,16 +302,18 @@ int ioinst_handle_tsch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb, uintptr_t ra)
>>  int cssid, ssid, schid, m;
>>  SubchDev *sch;
>>  IRB irb;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  int cc, irb_len;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>>  if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
>>  s390_program_interrupt(env, PGM_OPERAND, ra);
>>  return -EIO;
>>  }
>>  trace_ioinst_sch_id("tsch", cssid, ssid, schid);
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return -EIO;
> 
> Would it make sense to hide all these changes in decode_basedisp_s()
> instead? ... so that decode_basedisp_s() returns 0 if env->pv == true ?
> ... or are there still cases where we need real values from
> decode_basedisp_s() in case of env->pv==true?

I'd like to keep decode_basedisp_s() as is, but how about a static
function in ioinst.c called something like get_address_from_regs()?

It'll call decode_basedisp_s() or return 0.
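
A minimal sketch of that helper (assuming the env->pv flag from this series;
not the final patch):

static uint64_t get_address_from_regs(CPUS390XState *env, uint32_t ipb,
                                      uint8_t *ar)
{
    /*
     * Protected guests exchange the data via the SIDA, so there is no
     * guest address operand to decode.
     */
    if (env->pv) {
        return 0;
    }
    return decode_basedisp_s(env, ipb, ar);
}

The callers above would then do "addr = get_address_from_regs(env, ipb, &ar);"
unconditionally and keep the alignment check as is.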


> 
> Anyway,
> Reviewed-by: Thomas Huth 
> 

Thanks!




signature.asc
Description: OpenPGP digital signature


Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-11-28 Thread Eduardo Habkost
On Thu, Nov 28, 2019 at 04:00:06PM +, Peter Maydell wrote:
> Hi; this is a question which came up in Damien's reset series
> which I don't know the answer to:
> 
> What is the interaction of the QOM device lifecycle (instance_init/realize/
> unrealize/instance_finalize) with hotplug and hot-unplug ? I couldn't
> find any documentation of this but maybe I was looking in the wrong
> place...
> 
> Looking at device_set_realized() it seems like we treat "realize"
> as meaning "and also do the hot-plug if this is a device we're
> trying to hotplug". On the other hand hot-unplug is I think the
> other way around: when we get a hot-unplug event we assume that
> it should also imply an "unrealize" (but just unrealizing doesn't
> auto-hot-unplug) ?

Your description seems accurate, and I agree it is confusing.

It would be more consistent if realized=true didn't plug the
device automatically, and qdev_device_add() asked the hotplug
handler to plug the device instead.

> 
> Once a device is hot-unplugged (and thus unrealized) is it valid
> for it to be re-hot-plugged, or is the assumption that it's then
> destroyed and a fresh device is created if the user wants to plug
> something in again later ? Put another way, is it valid for a qdev
> device to see state transitions realize -> unrealize -> realize ?

My interpretation is that this is valid in theory, but likely to
crash a large portion of our devices if we tried it.

-- 
Eduardo




Re: qom device lifecycle interaction with hotplug/hotunplug ?

2019-11-28 Thread Igor Mammedov
On Thu, 28 Nov 2019 16:00:06 +
Peter Maydell  wrote:

> Hi; this is a question which came up in Damien's reset series
> which I don't know the answer to:
> 
> What is the interaction of the QOM device lifecycle (instance_init/realize/
> unrealize/instance_finalize) with hotplug and hot-unplug ? I couldn't
> find any documentation of this but maybe I was looking in the wrong
> place...
> 
> Looking at device_set_realized() it seems like we treat "realize"
> as meaning "and also do the hot-plug if this is a device we're
> trying to hotplug". On the other hand hot-unplug is I think the
> other way around: when we get a hot-unplug event we assume that
> it should also imply an "unrealize" (but just unrealizing doesn't
> auto-hot-unplug) ?

Let me try to describe it.

The device 'hotplug' interface is poorly named nowadays, as it's
just a 'plug' interface which checks/wires a device into the existing machine.
The 'hotplug' attribute just informs the plug controller that it might
wish to take additional actions to complete device initialization
and notify the guest.

plug workflow is as follow:

  DeviceState::realize()
 hotplug_ctrl = qdev_get_hotplug_handler(dev);
 hotplug_handler_pre_plug(hotplug_ctrl, dev) // check / prepare / reserve 
resources, can fail
 // call concrete device realize_fn()
 hotplug_handler_plug(hotplug_ctrl, dev) // wire device up to 
board/bus/notify guest, shouldn't fail

 * now the old bus-based qdev hotplug is tied to the _plug callback that
   the controller (hotplug_ctrl) owning the bus sets during bus creation.
   (Ideally we should split that into _pre_plug and _plug parts)
 * also any other QOM object could be a controller, to allow bus-less
   hotplug (ex: arm-virt machine or pc/q35 machines)
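
Condensed, this is roughly how hw/core/qdev.c wires it up today (error
handling and the unrealize path omitted; a sketch, not a verbatim excerpt):

static void device_set_realized(Object *obj, bool value, Error **errp)
{
    DeviceState *dev = DEVICE(obj);
    DeviceClass *dc = DEVICE_GET_CLASS(dev);
    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);

    if (!value) {
        return; /* unrealize path omitted in this sketch */
    }

    if (hotplug_ctrl) {
        /* check / prepare / reserve resources; may fail */
        hotplug_handler_pre_plug(hotplug_ctrl, dev, errp);
    }

    if (dc->realize) {
        dc->realize(dev, errp);      /* concrete device realize_fn() */
    }

    if (hotplug_ctrl) {
        /* wire device up to board/bus and notify the guest */
        hotplug_handler_plug(hotplug_ctrl, dev, errp);
    }
}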

Unplug is a different beast; it can originate from the mgmt side
(device_del()) and/or from the guest side.

device_del() can go 2 ways: qdev_unplug()
 * devices that support surprise removal (i.e. do not require guest 
cooperation)
   go directly to
   hotplug_handler_unplug() // tear down device from machine
   object_unparent(); -> unrealize() + finalize() // device gone
   essentially it's old qdev code behavior as is.
  
 * async removal a bit different, instead of removal it asks
   controller to process unplug request hotplug_handler_unplug_request()
   and does nothing else.
   After guest is prepared to device removal it notifies QEMU in some way
   to eject device. That calls the same sequence
 hotplug_handler_unplug()
 object_unparent()

> Once a device is hot-unplugged (and thus unrealized) is it valid
> for it to be re-hot-plugged, or is the assumption that it's then
> destroyed and a fresh device is created if the user wants to plug
> something in again later ? Put another way, is it valid for a qdev
> device to see state transitions realize -> unrealize -> realize ?

I don't think we do it currently (or maybe we do with failover but
I missed that train), but I don't see why it can't be done.

In theory it's up to the place where the actual eject request originates from;
it could do unrealize -> realize instead of unparent, as long as it calls
hotplug_handler_unplug().

> 
> thanks
> -- PMM
> 




Re: [PATCH 2/2] Add -mem-shared option

2019-11-28 Thread Igor Mammedov
On Thu, 28 Nov 2019 18:15:18 +0400
Marc-André Lureau  wrote:

> Add an option to simplify shared memory / vhost-user setup.
> 
> Currently, using vhost-user requires NUMA setup such as:
> -m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on 
> -numa node,memdev=mem
> 
> As there is no other way to allocate shareable RAM, afaik.
> 
> -mem-shared aims to have a simple way instead: -m 4G -mem-shared
A user can always write a wrapper script if the verbose CLI is too much,
and we won't have to deal with maintaining a myriad of permutations.

Also current -mem-path/prealloc in combination with memdevs is
the source of problems (as ram allocation uses 2 different paths).
It's possible to fix with a kludge but I'd rather fix it properly.
So during 5.0, I'm planning to consolidate -mem-path/prealloc
handling around memory backend internally (and possibly deprecate them),
so the only way to allocate RAM for the guest would be via memdevs
(reducing the number of options and globals that they use).

So a user who wants something non-trivial could override the default
non-NUMA behavior with
  -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \
  -machine memdev=mem
or use any other backend that suits theirs needs.


> Signed-off-by: Marc-André Lureau 
> ---
>  exec.c  | 11 ++-
>  hw/core/numa.c  | 16 +++-
>  include/sysemu/sysemu.h |  1 +
>  qemu-options.hx | 10 ++
>  vl.c|  4 
>  5 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index ffdb518535..4e53937eaf 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -72,6 +72,10 @@
>  #include "qemu/mmap-alloc.h"
>  #endif
>  
> +#ifdef CONFIG_POSIX
> +#include "qemu/memfd.h"
> +#endif
> +
>  #include "monitor/monitor.h"
>  
>  //#define DEBUG_SUBPAGE
> @@ -2347,7 +2351,12 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
> MemoryRegion *mr,
>  bool created;
>  RAMBlock *block;
>  
> -fd = file_ram_open(mem_path, memory_region_name(mr), &created, errp);
> +if (mem_path) {
> +fd = file_ram_open(mem_path, memory_region_name(mr), &created, errp);
> +} else {
> +fd = qemu_memfd_open(mr->name, size,
> + F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL, 
> errp);
> +}

that's what I'm mostly against, as it spills out memdev impl. details
into generic code.

>  if (fd < 0) {
>  return NULL;
>  }
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index e3332a984f..6f72cddb1c 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -493,7 +493,8 @@ static void allocate_system_memory_nonnuma(MemoryRegion 
> *mr, Object *owner,
>  if (mem_path) {
>  #ifdef __linux__
>  Error *err = NULL;
> -memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, 0,
> +memory_region_init_ram_from_file(mr, owner, name, ram_size, 0,
> + mem_shared ? RAM_SHARED : 0,
>   mem_path, &err);
this will be gone and replaced by memory region that memdev initializes.

>  if (err) {
>  error_report_err(err);
> @@ -513,6 +514,19 @@ static void allocate_system_memory_nonnuma(MemoryRegion 
> *mr, Object *owner,
>  #else
>  fprintf(stderr, "-mem-path not supported on this host\n");
>  exit(1);
> +#endif
> +} else if (mem_shared) {
> +#ifdef CONFIG_POSIX
> +Error *err = NULL;
> +memory_region_init_ram_from_file(mr, owner, NULL, ram_size, 0,
> + RAM_SHARED, NULL, &err);
> +if (err) {
> +error_report_err(err);
> +exit(1);
> +}
> +#else
> +fprintf(stderr, "-mem-shared not supported on this host\n");
> +exit(1);
>  #endif
>  } else {
>  memory_region_init_ram_nomigrate(mr, owner, name, ram_size, 
> &error_fatal);
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 80c57fdc4e..80db8465a9 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -55,6 +55,7 @@ extern bool enable_cpu_pm;
>  extern QEMUClockType rtc_clock;
>  extern const char *mem_path;
>  extern int mem_prealloc;
> +extern int mem_shared;
>  
>  #define MAX_OPTION_ROMS 16
>  typedef struct QEMUOptionRom {
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 65c9473b73..4c69b03ad3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -394,6 +394,16 @@ STEXI
>  Preallocate memory when using -mem-path.
>  ETEXI
>  
> +DEF("mem-shared", 0, QEMU_OPTION_mem_shared,
> +"-mem-shared allocate shared memory\n", QEMU_ARCH_ALL)
> +STEXI
> +@item -mem-shared
> +@findex -mem-shared
> +Allocate guest RAM with shared mapping.  Whether the allocation is
> +anonymous or not (with -mem-path), QEMU will allocate a shared memory that
> +can be shared by unrelated processes, such as vhost-user backends.
> +ETEXI
> +
>  DEF("k", HAS_ARG, QEMU_OPTION_k,
>  "-k language 

Re: [PATCH 14/15] s390x: protvirt: Disable address checks for PV guest IO emulation

2019-11-28 Thread Janosch Frank
On 11/28/19 4:28 PM, Thomas Huth wrote:
> On 20/11/2019 12.43, Janosch Frank wrote:
>> IO instruction data is routed through SIDAD for protected guests, so
>> addresses do not need to be checked, as this is kernel memory.
>>
>> Signed-off-by: Janosch Frank 
>> ---
>>  target/s390x/ioinst.c | 46 +++
>>  1 file changed, 29 insertions(+), 17 deletions(-)
>>
>> diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
>> index c437a1d8c6..d3bd422ddd 100644
>> --- a/target/s390x/ioinst.c
>> +++ b/target/s390x/ioinst.c
>> @@ -110,11 +110,13 @@ void ioinst_handle_msch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb, uintptr_t ra)
>>  int cssid, ssid, schid, m;
>>  SubchDev *sch;
>>  SCHIB schib;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  CPUS390XState *env = &cpu->env;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return;
>> @@ -167,11 +169,13 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb, uintptr_t ra)
>>  int cssid, ssid, schid, m;
>>  SubchDev *sch;
>>  ORB orig_orb, orb;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  CPUS390XState *env = &cpu->env;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return;
>> @@ -198,12 +202,14 @@ void ioinst_handle_ssch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb, uintptr_t ra)
>>  void ioinst_handle_stcrw(S390CPU *cpu, uint32_t ipb, uintptr_t ra)
>>  {
>>  CRW crw;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  int cc;
>>  CPUS390XState *env = >env;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>> -addr = decode_basedisp_s(env, ipb, );
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, );
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return;
>> @@ -228,13 +234,15 @@ void ioinst_handle_stsch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb,
>>  {
>>  int cssid, ssid, schid, m;
>>  SubchDev *sch;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  int cc;
>>  SCHIB schib;
>>  CPUS390XState *env = &cpu->env;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return;
>> @@ -294,16 +302,18 @@ int ioinst_handle_tsch(S390CPU *cpu, uint64_t reg1, 
>> uint32_t ipb, uintptr_t ra)
>>  int cssid, ssid, schid, m;
>>  SubchDev *sch;
>>  IRB irb;
>> -uint64_t addr;
>> +uint64_t addr = 0;
>>  int cc, irb_len;
>> -uint8_t ar;
>> +uint8_t ar = 0;
>>  
>>  if (ioinst_disassemble_sch_ident(reg1, &m, &cssid, &ssid, &schid)) {
>>  s390_program_interrupt(env, PGM_OPERAND, ra);
>>  return -EIO;
>>  }
>>  trace_ioinst_sch_id("tsch", cssid, ssid, schid);
>> -addr = decode_basedisp_s(env, ipb, &ar);
>> +if (!env->pv) {
>> +addr = decode_basedisp_s(env, ipb, &ar);
>> +}
>>  if (addr & 3) {
>>  s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>  return -EIO;
> 
> Would it make sense to hide all these changes in decode_basedisp_s()
> instead? ... so that decode_basedisp_s() returns 0 if env->pv == true ?
> ... or are there still cases where we need real values from
> decode_basedisp_s() in case of env->pv==true?

Pierre already suggested that, I'll look into it.
Hopefully we have no addr != 0 or addr > 2 * PAGE_SIZE checks.
Because of that it might make more sense to just rip out the checks.

> 
> Anyway,
> Reviewed-by: Thomas Huth 
> 




signature.asc
Description: OpenPGP digital signature


[PATCH v2 1/4] qom/object: enable setter for uint types

2019-11-28 Thread Felipe Franciosi
Traditionally, the uint-specific property helpers only offer getters.
When adding object (or class) uint types, one must therefore use the
generic property helper if a setter is needed (and probably duplicate
some code writing their own getters/setters).

This enhances the uint-specific property helper APIs by adding a
bitwise-or'd 'flags' field and modifying all clients of that API to set
this paramater to OBJ_PROP_FLAG_RD. This maintains the current behaviour
whilst allowing others to also set OBJ_PROP_FLAG_WR in the future (which
will automatically install a setter). Other flags may be added later.

Signed-off-by: Felipe Franciosi 
---
 hw/acpi/ich9.c   |   4 +-
 hw/acpi/pcihp.c  |   7 +-
 hw/acpi/piix4.c  |  12 +--
 hw/isa/lpc_ich9.c|   4 +-
 hw/ppc/spapr_drc.c   |   2 +-
 include/qom/object.h |  42 +++--
 qom/object.c | 216 ++-
 ui/console.c |   4 +-
 8 files changed, 243 insertions(+), 48 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 2034dd749e..236300d2a9 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -454,12 +454,12 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs 
*pm, Error **errp)
 pm->s4_val = 2;
 
 object_property_add_uint32_ptr(obj, ACPI_PM_PROP_PM_IO_BASE,
> -   &pm->pm_io_base, errp);
> +   &pm->pm_io_base, OBJ_PROP_FLAG_RD, errp);
 object_property_add(obj, ACPI_PM_PROP_GPE0_BLK, "uint32",
 ich9_pm_get_gpe0_blk,
 NULL, NULL, pm, NULL);
 object_property_add_uint32_ptr(obj, ACPI_PM_PROP_GPE0_BLK_LEN,
> -   &gpe0_len, errp);
> +   &gpe0_len, OBJ_PROP_FLAG_RD, errp);
 object_property_add_bool(obj, "memory-hotplug-support",
  ich9_pm_get_memory_hotplug_support,
  ich9_pm_set_memory_hotplug_support,
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 8413348a33..c8a7194b19 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -80,7 +80,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
 
 *bus_bsel = (*bsel_alloc)++;
 object_property_add_uint32_ptr(OBJECT(bus), ACPI_PCIHP_PROP_BSEL,
> -   bus_bsel, &error_abort);
> +   bus_bsel, OBJ_PROP_FLAG_RD,
> +   &error_abort);
 }
 
 return bsel_alloc;
@@ -373,9 +374,9 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, 
PCIBus *root_bus,
 memory_region_add_subregion(address_space_io, s->io_base, >io);
 
>  object_property_add_uint16_ptr(owner, ACPI_PCIHP_IO_BASE_PROP, &s->io_base,
> -   &error_abort);
> +   OBJ_PROP_FLAG_RD, &error_abort);
>  object_property_add_uint16_ptr(owner, ACPI_PCIHP_IO_LEN_PROP, &s->io_len,
> -   &error_abort);
> +   OBJ_PROP_FLAG_RD, &error_abort);
 }
 
 const VMStateDescription vmstate_acpi_pcihp_pci_status = {
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 93aec2dd2c..06d964a840 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -443,17 +443,17 @@ static void piix4_pm_add_propeties(PIIX4PMState *s)
 static const uint16_t sci_int = 9;
 
 object_property_add_uint8_ptr(OBJECT(s), ACPI_PM_PROP_ACPI_ENABLE_CMD,
> -  &acpi_enable_cmd, NULL);
> +  &acpi_enable_cmd, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint8_ptr(OBJECT(s), ACPI_PM_PROP_ACPI_DISABLE_CMD,
> -  &acpi_disable_cmd, NULL);
> +  &acpi_disable_cmd, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_GPE0_BLK,
> -  &gpe0_blk, NULL);
> +  &gpe0_blk, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_GPE0_BLK_LEN,
> -  &gpe0_blk_len, NULL);
> +  &gpe0_blk_len, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint16_ptr(OBJECT(s), ACPI_PM_PROP_SCI_INT,
> -  &sci_int, NULL);
> +  &sci_int, OBJ_PROP_FLAG_RD, NULL);
>  object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_PM_IO_BASE,
> -  &s->io_base, NULL);
> +  &s->io_base, OBJ_PROP_FLAG_RD, NULL);
 }
 
 static void piix4_pm_realize(PCIDevice *dev, Error **errp)
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 17c292e306..f5526f9c3b 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -645,9 +645,9 @@ static void ich9_lpc_add_properties(ICH9LPCState *lpc)
 ich9_lpc_get_sci_int,
 NULL, NULL, NULL, NULL);
 object_property_add_uint8_ptr(OBJECT(lpc), ACPI_PM_PROP_ACPI_ENABLE_CMD,
-  

Re: [PATCH 2/2] Add -mem-shared option

2019-11-28 Thread Eduardo Habkost
+Igor

On Thu, Nov 28, 2019 at 06:15:18PM +0400, Marc-André Lureau wrote:
> Add an option to simplify shared memory / vhost-user setup.
> 
> Currently, using vhost-user requires NUMA setup such as:
> -m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on 
> -numa node,memdev=mem
> 
> As there is no other way to allocate shareable RAM, afaik.
> 
> -mem-shared aims to have a simple way instead: -m 4G -mem-shared
> 
> Signed-off-by: Marc-André Lureau 
> ---
[...]
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index e3332a984f..6f72cddb1c 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -493,7 +493,8 @@ static void allocate_system_memory_nonnuma(MemoryRegion 
> *mr, Object *owner,
>  if (mem_path) {
>  #ifdef __linux__
>  Error *err = NULL;
> -memory_region_init_ram_from_file(mr, owner, name, ram_size, 0, 0,
> +memory_region_init_ram_from_file(mr, owner, name, ram_size, 0,
> + mem_shared ? RAM_SHARED : 0,
>   mem_path, &err);
>  if (err) {
>  error_report_err(err);
> @@ -513,6 +514,19 @@ static void allocate_system_memory_nonnuma(MemoryRegion 
> *mr, Object *owner,
>  #else
>  fprintf(stderr, "-mem-path not supported on this host\n");
>  exit(1);
> +#endif
> +} else if (mem_shared) {
> +#ifdef CONFIG_POSIX
> +Error *err = NULL;
> +memory_region_init_ram_from_file(mr, owner, NULL, ram_size, 0,
> + RAM_SHARED, NULL, &err);
> +if (err) {
> +error_report_err(err);
> +exit(1);
> +}
> +#else
> +fprintf(stderr, "-mem-shared not supported on this host\n");
> +exit(1);
>  #endif
>  } else {
>  memory_region_init_ram_nomigrate(mr, owner, name, ram_size, 
> &error_fatal);

I'd really like to make allocate_system_memory_nonnuma() just create
a memory backend object.  This way non-NUMA and NUMA
configuration would be able to use exactly the same set of
options.
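
Something along these lines, perhaps (purely a sketch; the backend type
names and property setters below would need double-checking):

static MemoryRegion *create_default_memdev(ram_addr_t size)
{
    Object *backend;

    backend = object_new(mem_path ? TYPE_MEMORY_BACKEND_FILE
                                  : TYPE_MEMORY_BACKEND_RAM);
    if (mem_path) {
        object_property_set_str(backend, mem_path, "mem-path", &error_fatal);
    }
    object_property_set_uint(backend, size, "size", &error_fatal);
    object_property_set_bool(backend, mem_shared, "share", &error_fatal);
    user_creatable_complete(USER_CREATABLE(backend), &error_fatal);

    return host_memory_backend_get_memory(MEMORY_BACKEND(backend));
}

allocate_system_memory_nonnuma() (and the -mem-path/-mem-prealloc/-mem-shared
options) would then just be sugar on top of that.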

I have the impression we tried to do this in the past.  Igor, do
you remember if we did?

-- 
Eduardo




[PATCH v2 2/2] travis.yml: Run tcg tests with tci

2019-11-28 Thread Thomas Huth
So far we only have compile coverage for tci. But since commit
2f160e0f9797c7522bfd0d09218d0c9340a5137c ("tci: Add implementation
for INDEX_op_ld16u_i64") has been included now, we can also run the
"tcg" and "qtest" tests with tci, so let's enable them in Travis now.
Since we don't gain much additional test coverage by compiling all
targets, and TCI is broken e.g. with the Sparc targets, we also limit
the target list to a reasonable subset now (which should still get
us test coverage by tests/boot-serial-test for example).

Signed-off-by: Thomas Huth 
---
 .travis.yml | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index c09b6a0014..de7559e777 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -215,10 +215,11 @@ matrix:
 - TEST_CMD=""
 
 
-# We manually include builds which we disable "make check" for
+# Check the TCG interpreter (TCI)
 - env:
-- CONFIG="--enable-debug --enable-tcg-interpreter"
-- TEST_CMD=""
+- CONFIG="--enable-debug --enable-tcg-interpreter --disable-containers
+
--target-list=alpha-softmmu,arm-softmmu,hppa-softmmu,m68k-softmmu,microblaze-softmmu,moxie-softmmu,ppc-softmmu,s390x-softmmu,x86_64-softmmu"
+- TEST_CMD="make check-qtest check-tcg V=1"
 
 
 # We don't need to exercise every backend with every front-end
-- 
2.18.1




Re: [PATCH for-5.0 02/31] block: Add BdrvChildRole

2019-11-28 Thread Max Reitz
On 28.11.19 15:12, Kevin Wolf wrote:
> Am 27.11.2019 um 14:15 hat Max Reitz geschrieben:
>> This enum will supplement BdrvChildClass when it comes to what role (or
>> combination of roles) a child takes for its parent.
>>
>> Because empty enums are not allowed, let us just start with it filled.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  include/block/block.h | 38 ++
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 38963ef203..36817d5689 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -279,6 +279,44 @@ enum {
>>  DEFAULT_PERM_UNCHANGED  = BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH,
>>  };
>>  
>> +typedef enum BdrvChildRole {
>> +/*
>> + * If present, bdrv_replace_node() will not change the node this
>> + * BdrvChild points to.
>> + */
>> +BDRV_CHILD_STAY_AT_NODE = (1 << 0),
>> +
>> +/* Child stores data */
>> +BDRV_CHILD_DATA = (1 << 1),
>> +
>> +/* Child stores metadata */
>> +BDRV_CHILD_METADATA = (1 << 2),
>> +
>> +/* Filtered child */
>> +BDRV_CHILD_FILTERED = (1 << 3),
>> +
>> +/* Child to COW from (backing child) */
>> +BDRV_CHILD_COW  = (1 << 4),
>> +
>> +/* Child is expected to be a protocol node */
>> +BDRV_CHILD_PROTOCOL = (1 << 5),
>> +
>> +/* Child is expected to be a format node */
>> +BDRV_CHILD_FORMAT   = (1 << 6),
> 
> In theory, a node shouldn't care what other nodes it has as its
> children. For a parent, protocols and formats look exactly the same.
> 
> Of course, we do have BDRV_O_PROTOCOL, but if I'm not mistaken this is
> basically only about probing or not probing an image format when a
> legacy filename is given rather than BlockdevOptions.
> 
> Therefore, unless you have a real reason for this to be here, I'd prefer
> if we could keep such legacy flags outside of the core infrastructure if
> at all possible.

Hm.  The reason I have it here is because currently this is handled by
BdrvChildClass.inherit_options.  For filtered and backing children, that
will leave PROTOCOL as it is; for the file child of format nodes it will
set PROTOCOL; and for some children (blkverify and quorum) it will clear
PROTOCOL.

So without these flags here, we can’t unify child_file, child_format,
and child_backing in a single class, just because they bequeath the
PROTOCOL flag differently.  At least not directly.

(I’d like to note that this doesn’t make anything worse.  Right now,
drivers need to make a conscious choice on this flag, too, namely by
choosing the right BdrvChildClass.)

Hmm.  Can we do better?  Instead of the driver hinting at what they
expect from the child, can we somehow infer that automatically in
block.c (i.e., in inherit_options without it being given PROTOCOL or
FORMAT)?  FILTERED || COW always means keeping it as-is.  METADATA
generally means setting PROTOCOL, I suppose.
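
To make that concrete, a back-of-the-envelope version of the inference could
be (names invented here, not part of the series):

static void bdrv_inherit_protocol_flag(BdrvChildRole role, int parent_flags,
                                       int *child_flags)
{
    if (role & (BDRV_CHILD_FILTERED | BDRV_CHILD_COW)) {
        /* filtered/backing children keep whatever the parent had */
        *child_flags = (*child_flags & ~BDRV_O_PROTOCOL) |
                       (parent_flags & BDRV_O_PROTOCOL);
    } else if (role & BDRV_CHILD_METADATA) {
        /* metadata children are expected to sit on protocol nodes */
        *child_flags |= BDRV_O_PROTOCOL;
    }
}

with the quorum/blkverify special cases discussed below layered on top.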

The two problems that come to my mind are blkverify and quorum.
blkverify is special: It must enforce format-probing on the test image,
and it must disable format-probing on the verification (the raw) image.

Quorum wants format probing on everything, but all its children are
simply DATA children.  Other DATA children are e.g. external data files
of qcow2, and we certainly want to force-disable format probing there.

I suppose for the quorum problem we could introduce a
BlockDriver.is_format flag that would force O_PROTOCOL for all non-COW
children, but we would unset O_PROTOCOL for DATA children of
non-is_format drivers.

I suppose the same could work for blkverify’s test image.  For its raw
image, we’d probably just have to enforce the raw driver (or rely on the
fact that blkverify is technically a protocol driver in a way (it
implements .bdrv_file_open...), so it will always have O_PROTOCOL set on
itself; thus, its filtered child (the raw child) will automatically have
it, too, as long as we don’t touch it).

Do you have a better idea?

>> +/*
>> + * The primary child.  For most drivers, this is the child whose
>> + * filename applies best to the parent node.
>> + */
>> +BDRV_CHILD_PRIMARY  = (1 << 7),
> 
> If primary is a flag of each BdrvChild, then you could end up having
> multiple children that claim that they're the primary child. On the
> other hand, if we have a bs->primary_child instead to make sure that we
> only have one primary child, we'd have to keep this consistent when
> children change.
> 
> So maybe just document that this flag must be given to only one
> BdrvChild link for each parent.

Sure.

Max



signature.asc
Description: OpenPGP digital signature

