Re: [PATCH 07/11] mips/malta: Fix create_cps() error handling

2020-04-28 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> On 4/24/20 9:20 PM, Markus Armbruster wrote:
>> The Error ** argument must be NULL, &error_abort, &error_fatal, or a
>> pointer to a variable containing NULL.  Passing an argument of the
>> latter kind twice without clearing it in between is wrong: if the
>> first call sets an error, it no longer points to NULL for the second
>> call.
>> 
>> create_cps() is wrong that way.  The last call treats an error as
>> fatal.  Do that for the prior ones, too.
>> 
>> Fixes: bff384a4fbd5d0e86939092e74e766ef0f5f592c
>> Cc: Aleksandar Markovic 
>> Cc: "Philippe Mathieu-Daudé" 
>> Cc: Aurelien Jarno 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  hw/mips/mips_malta.c | 15 ++-
>>  1 file changed, 6 insertions(+), 9 deletions(-)
>> 
>> diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
>> index e4c4de1b4e..17bf41616b 100644
>> --- a/hw/mips/mips_malta.c
>> +++ b/hw/mips/mips_malta.c
>> @@ -1185,17 +1185,14 @@ static void create_cpu_without_cps(MachineState *ms,
>>  static void create_cps(MachineState *ms, MaltaState *s,
>> qemu_irq *cbus_irq, qemu_irq *i8259_irq)
>>  {
>> -Error *err = NULL;
>> -
>>  sysbus_init_child_obj(OBJECT(s), "cps", OBJECT(&s->cps), sizeof(s->cps),
>>TYPE_MIPS_CPS);
>> -object_property_set_str(OBJECT(&s->cps), ms->cpu_type, "cpu-type", &err);
>> -object_property_set_int(OBJECT(&s->cps), ms->smp.cpus, "num-vp", &err);
>> -object_property_set_bool(OBJECT(&s->cps), true, "realized", &err);
>> -if (err != NULL) {
>> -error_report("%s", error_get_pretty(err));
>
> In https://www.mail-archive.com/qemu-devel@nongnu.org/msg695645.html I
> also remove "qemu/error-report.h" which is not needed once you remove
> this call.

Missed it, sorry.  I only reviewed the Coccinelle scripts [PATCH 1+3/7].

I'd replace my patch by yours to give you proper credit, but your commit
message mentions "the coccinelle script", presumably the one from PATCH
1/7, and that patch isn't quite ready in my opinion.

>> -exit(1);
>> -}
>> +object_property_set_str(OBJECT(&s->cps), ms->cpu_type, "cpu-type",
>> +&error_fatal);
>> +object_property_set_int(OBJECT(&s->cps), ms->smp.cpus, "num-vp",
>> +&error_fatal);
>> +object_property_set_bool(OBJECT(&s->cps), true, "realized",
>> + &error_fatal);
>>  
>>  sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->cps), 0, 0, 1);
>>  
>> 
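
For readers unfamiliar with the rule quoted in the commit message, here is a
minimal sketch of why reusing the same Error ** across calls is a problem.
Everything named toy_* is a hypothetical stand-in written for this note, not
QEMU's real Error API; the real API aborts on a violation, while this sketch
merely counts violations so both variants can be compared.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for QEMU's Error; hypothetical, for illustration only. */
typedef struct ToyError {
    char msg[64];
} ToyError;

/* Counts contract violations instead of aborting, so the demo can go on. */
static int toy_violations;

/*
 * Store an error in *errp.  The contract mirrors the one quoted above:
 * if errp is non-NULL, *errp must be NULL on entry.
 */
static void toy_error_set(ToyError **errp, const char *msg)
{
    if (!errp) {
        return;                 /* caller is not interested in errors */
    }
    if (*errp) {
        toy_violations++;       /* a second error would clobber the first */
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    assert(*errp != NULL);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* A property setter that fails when the value is negative. */
static void toy_set_prop(int value, ToyError **errp)
{
    if (value < 0) {
        toy_error_set(errp, "negative value");
    }
}

/* Wrong: passes &err to two calls without checking in between. */
static int toy_configure_wrong(void)
{
    ToyError *err = NULL;

    toy_violations = 0;
    toy_set_prop(-1, &err);
    toy_set_prop(-2, &err);     /* err is already set here */
    free(err);
    return toy_violations;
}

/* Right: check after every call and stop at the first failure. */
static int toy_configure_right(void)
{
    ToyError *err = NULL;

    toy_violations = 0;
    toy_set_prop(-1, &err);
    if (err) {
        free(err);
        return toy_violations;
    }
    toy_set_prop(-2, &err);
    free(err);
    return toy_violations;
}
```

With these toy definitions, toy_configure_wrong() reports one violation and
toy_configure_right() reports none.  Treating every failure as fatal, as the
patch does for create_cps(), sidesteps the question because no call can
return with an error set.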




Re: [PATCH 03/11] s390x/cpumodel: Fix harmless misuse of visit_check_struct()

2020-04-28 Thread Markus Armbruster
David Hildenbrand  writes:

> On 24.04.20 21:20, Markus Armbruster wrote:
>> Commit e47970f51d "s390x/cpumodel: Fix query-cpu-model-FOO error API
>> violations" neglected to change visit_end_struct()'s Error ** argument
>> along with the others.  If visit_end_struct() failed, we'd take the
>
> s/visit_end_struct/visit_check_struct/ ?

Will fix.

>> success path.  Fortunately, it can't fail here:
>> qobject_input_check_struct() checks we consumed the whole dictionary,
>> and to get here, we did.  Fix it anyway.
>
> AFAIKs, if visit_check_struct() failed, we'd still do the memcopy, but
> also report the error. Not nice, not bad.
>
> Reviewed-by: David Hildenbrand 

Thanks!




Re: [PATCH 02/11] xen: Fix and improve handling of device_add usb-host errors

2020-04-28 Thread Markus Armbruster
Paul Durrant  writes:

>> -Original Message-
>> From: Markus Armbruster 
>> Sent: 24 April 2020 20:20
>> To: qemu-devel@nongnu.org
>> Cc: Stefano Stabellini ; Anthony Perard 
>> ; Paul
>> Durrant ; Gerd Hoffmann ; 
>> xen-de...@lists.xenproject.org
>> Subject: [PATCH 02/11] xen: Fix and improve handling of device_add usb-host 
>> errors
>> 
>> usbback_portid_add() leaks the error when qdev_device_add() fails.
>> Fix that.  While there, use the error to improve the error message.
>> 
>> The qemu_opts_from_qdict() similarly leaks on failure.  But any
>> failure there is a programming error.  Pass &error_abort.
>> 
>> Fixes: 816ac92ef769f9ffc534e49a1bb6177bddce7aa2
>> Cc: Stefano Stabellini 
>> Cc: Anthony Perard 
>> Cc: Paul Durrant 
>> Cc: Gerd Hoffmann 
>> Cc: xen-de...@lists.xenproject.org
>> Signed-off-by: Markus Armbruster 
>> ---
>>  hw/usb/xen-usb.c | 18 --
>>  1 file changed, 8 insertions(+), 10 deletions(-)
>> 
>> diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
>> index 961190d0f7..42643c3390 100644
>> --- a/hw/usb/xen-usb.c
>> +++ b/hw/usb/xen-usb.c
>> @@ -30,6 +30,7 @@
>>  #include "hw/usb.h"
>>  #include "hw/xen/xen-legacy-backend.h"
>>  #include "monitor/qdev.h"
>> +#include "qapi/error.h"
>>  #include "qapi/qmp/qdict.h"
>>  #include "qapi/qmp/qstring.h"
>> 
>> @@ -755,13 +756,15 @@ static void usbback_portid_add(struct usbback_info 
>> *usbif, unsigned port,
>>  qdict_put_int(qdict, "port", port);
>>  qdict_put_int(qdict, "hostbus", atoi(busid));
>>  qdict_put_str(qdict, "hostport", portname);
>> -opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, &local_err);
>> -if (local_err) {
>> -goto err;
>> -}
>> +opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict,
>> +&error_abort);
>>  usbif->ports[port - 1].dev = USB_DEVICE(qdev_device_add(opts, &local_err));
>>  if (!usbif->ports[port - 1].dev) {
>> -goto err;
>> +qobject_unref(qdict);
>> +xen_pv_printf(&usbif->xendev, 0,
>> +  "device %s could not be opened: %s\n",
>> +  busid, error_get_pretty(local_err));
>> +error_free(local_err);
>
> Previously the goto caused the function to bail out. Should there not be a 
> 'return' here?

Owww, of course.  Thanks!

>
>>  }
>>  qobject_unref(qdict);
>>  speed = usbif->ports[port - 1].dev->speed;
>> @@ -793,11 +796,6 @@ static void usbback_portid_add(struct usbback_info 
>> *usbif, unsigned port,
>>  usbback_hotplug_enq(usbif, port);
>> 
>>  TR_BUS(&usbif->xendev, "port %d attached\n", port);
>> -return;
>> -
>> -err:
>> -qobject_unref(qdict);
>> -xen_pv_printf(&usbif->xendev, 0, "device %s could not be opened\n", busid);
>>  }
>> 
>>  static void usbback_process_port(struct usbback_info *usbif, unsigned port)
>> --
>> 2.21.1
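
The missing 'return' pointed out above can be sketched in isolation.  This is
an illustration only: the toy_* types and functions are hypothetical and are
not the xen-usb.c code, and the int return value exists just to make the
outcome observable (the real function returns void).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real xen-usb.c types. */
typedef struct ToyDev {
    int speed;
} ToyDev;

typedef struct ToyPort {
    ToyDev *dev;
    int speed;
    int attached;
} ToyPort;

/* Pretend device creation: fails (returns NULL) when busid is negative. */
static ToyDev *toy_device_add(int busid)
{
    static ToyDev dev = { .speed = 3 };

    return busid < 0 ? NULL : &dev;
}

/* Returns 0 on success, -1 on failure. */
static int toy_portid_add(ToyPort *port, int busid)
{
    assert(port != NULL);

    port->dev = toy_device_add(busid);
    if (!port->dev) {
        /* Report the error and release resources here, then bail out. */
        return -1;      /* without this, the next line dereferences NULL */
    }
    port->speed = port->dev->speed;
    port->attached = 1;
    return 0;
}
```

Once the 'goto err' is replaced by inline cleanup, the early return is the
only thing that keeps the failure path from falling through into the code
that uses the device.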




Re: [PATCH 2/4] smbus: Fix spd_data_generate() error API violation

2020-04-28 Thread Markus Armbruster
BALATON Zoltan  writes:

> On Fri, 24 Apr 2020, Markus Armbruster wrote:
>> BALATON Zoltan  writes:
>>> On Tue, 21 Apr 2020, Markus Armbruster wrote:
>>>> BALATON Zoltan  writes:
>>>>> On Mon, 20 Apr 2020, Markus Armbruster wrote:
>>>>>> The Error ** argument must be NULL, &error_abort, &error_fatal, or a
>>>>>> pointer to a variable containing NULL.  Passing an argument of the
>>>>>> latter kind twice without clearing it in between is wrong: if the
>>>>>> first call sets an error, it no longer points to NULL for the second
>>>>>> call.
>>>>>>
>>>>>> spd_data_generate() can pass @errp to error_setg() more than once when
>>>>>> it adjusts both memory size and type.  Harmless, because no caller
>>>>>> passes anything that needs adjusting.  Until the previous commit,
>>>>>> sam460ex passed types that needed adjusting, but not sizes.
>>>>>>
>>>>>> spd_data_generate()'s contract is rather awkward:
>>>>>>
>>>>>>    If everything's fine, return non-null and don't set an error.
>>>>>>
>>>>>>    Else, if memory size or type need adjusting, return non-null and
>>>>>>    set an error describing the adjustment.
>>>>>>
>>>>>>    Else, return null and set an error reporting why no data can be
>>>>>>    generated.
>>>>>>
>>>>>> Its callers treat the error as a warning even when null is returned.
>>>>>> They don't create the "smbus-eeprom" device then.  Suspicious.
>>>>>
>>>>> The idea here again is to make it work if there's a way it could work
>>>>> rather than throw back an error to user and bail. This is for user
>>>>> convenience in the likely case the user knows nothing about the board
>>>>> or SPD data restrictions.
>>>>
>>>> We're in perfect agreement that the user of QEMU should not need to know
>>>> anything about memory type and number of banks.  QEMU should pick a
>>>> sensible configuration for any memory size it supports.
>>>
>>> I thought it could be useful to warn the users when QEMU had to fix up
>>> some values to notify them that what they get may not be what they
>>> expect and can then know why.
>>
>> *If* QEMU "fixed up" the user's configuration, then QEMU should indeed
>> warn the user.
>>
>> But it doesn't fix up anything here.  This broken code is unused.
>>
>>>   If the message really annoys you you can
>>> remove it but I think it can be useful. I just think your real problem
>>> with it is that Error can't support it so it's easier to remove the
>>> warning than fixing Error or use warn_report instead.
>>
>> It's indeed easier to remove broken unused code than to try fixing it.
>> YAGNI.
>>
>>>>> You seem to disagree with this
>>>>
>>>> Here's what I actually disagree with:
>>>>
>>>> 1. QEMU warning the user about its choice of memory type, but only
>>>> sometimes.  Why warn?  There is nothing wrong, and there is nothing the
>>>> user can do to avoid the condition that triggers the warning.
>>>
>>> The memory size that triggers the warning is specified by the user so
>>> the user can do something about it.
>>
>> There is no way to trigger the warning.  If we dropped PATCH 1 instead
>> of fixing it as I did in v2, then the only way to trigger the warning is
>> -M sam460ex -m 64 or -m 32, and the only way to avoid it is not to do
>> that.
>>
>> Why would a user care whether he gets DDR or DDR2 memory?
>>
>>>> 2. QEMU changing the user's memory size.  Yes, I understand that's an
>>>> attempt to be helpful, but I prefer my tools not to second-guess my
>>>> intent.
>>>
>>> I agree with that and also hate Windows's habit of trying to be more
>>> intelligent than the user and prefer the Unix way however the average
>>> users of QEMU (from my perspective, who wants to run Amiga like OSes
>>> typically on Windows and for the most part not knowing what they are
>>> doing) are already intimidated by the messy QEMU command line
>>> interface and will specify random values and complain or go away if it
>>> does not work so making their life a little easier is not
>>> useless. These users (or any CLI users) are apparently not relevant
>>> from your point of view but they do exist and I think should be
>>> supported better.
>>
>> This is theoretical.  Remember, we're talking about unused code.  Proof:
>>
>>    $ ppc-softmmu/qemu-system-ppc -M sam460ex -m 4096
>>    qemu-system-ppc: Max 1 banks of 2048 ,1024 ,512 ,256 ,128 ,64 ,32 MB DIMM/bank supported
>>    qemu-system-ppc: Possible valid RAM size: 2048
>>
>> I figure commit a0258e4afa "ppc/{ppc440_bamboo, sam460ex}: drop RAM size
>> fixup" removed the only uses.  If you disagree with it, take it up with
>> Igor, please.
>
> I did raise similar complaints at that patch series and proposed
> several alternatives to preserve the previous functionality (sam460ex
> wasn't the only board that fixed up memory size for users) but since
> current APIs don't support that and adding this extra feature for just
> this machine wasn't a priority, my comments were accepted and ignored
> and I did not feel it would be fair to hold 

Re: [PATCH v4 00/18] nvme: factor out cmb/pmr setup

2020-04-28 Thread Klaus Jensen
On Apr 22 13:01, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Changes since v3
> 
> * Remove the addition of a new PROPERTIES macro in "nvme: move device
>   parameters to separate struct" (Philippe)
> 
> * Add NVME_PMR_BIR constant and use it in PMR setup.
> 
> * Split "nvme: factor out cmb/pmr setup" into
>   - "nvme: factor out cmb setup",
>   - "nvme: factor out pmr setup" and
>   - "nvme: do cmb/pmr init as part of pci init"
>   (Philippe)
> 
> 
> Klaus Jensen (18):
>   nvme: fix pci doorbell size calculation
>   nvme: rename trace events to pci_nvme
>   nvme: remove superfluous breaks
>   nvme: move device parameters to separate struct
>   nvme: use constants in identify
>   nvme: refactor nvme_addr_read
>   nvme: add max_ioqpairs device parameter
>   nvme: remove redundant cmbloc/cmbsz members
>   nvme: factor out property/constraint checks
>   nvme: factor out device state setup
>   nvme: factor out block backend setup
>   nvme: add namespace helpers
>   nvme: factor out namespace setup
>   nvme: factor out pci setup
>   nvme: factor out cmb setup
>   nvme: factor out pmr setup
>   nvme: do cmb/pmr init as part of pci init
>   nvme: factor out controller identify setup
> 
>  hw/block/nvme.c   | 543 --
>  hw/block/nvme.h   |  31 ++-
>  hw/block/trace-events | 180 +++---
>  include/block/nvme.h  |   8 +
>  4 files changed, 429 insertions(+), 333 deletions(-)
> 
> -- 
> 2.26.2
> 
> 

Gentle bump on this.

I apparently managed to screw up the git send-email this time, losing a
bunch of CCs in the process. Sorry about that.



RE: [PATCH v3 3/6] net/colo-compare.c: Fix deadlock in compare_chr_send

2020-04-28 Thread Zhang, Chen


> -Original Message-
> From: Lukas Straub 
> Sent: Monday, April 27, 2020 3:22 PM
> To: Zhang, Chen 
> Cc: qemu-devel ; Li Zhijian
> ; Jason Wang ; Marc-
> André Lureau ; Paolo Bonzini
> 
> Subject: Re: [PATCH v3 3/6] net/colo-compare.c: Fix deadlock in
> compare_chr_send
> 
> On Mon, 27 Apr 2020 03:36:57 +
> "Zhang, Chen"  wrote:
> 
> > > -Original Message-
> > > From: Lukas Straub 
> > > Sent: Monday, April 27, 2020 5:19 AM
> > > To: qemu-devel 
> > > Cc: Zhang, Chen ; Li Zhijian
> > > ; Jason Wang ; Marc-
> > > André Lureau ; Paolo Bonzini
> > > 
> > > Subject: [PATCH v3 3/6] net/colo-compare.c: Fix deadlock in
> > > compare_chr_send
> > >
> > > The chr_out chardev is connected to a filter-redirector running in
> > > the main loop. qemu_chr_fe_write_all might block here in
> > > compare_chr_send if the (socket-)buffer is full.
> > > If another filter-redirector in the main loop wants to send data to
> > > chr_pri_in it might also block if the buffer is full. This leads to
> > > a deadlock because both event loops get blocked.
> > >
> > > Fix this by converting compare_chr_send to a coroutine and putting
> > > the packets in a send queue. Also create a new function
> > > notify_chr_send, since that should be independent.
> > >
> > > Signed-off-by: Lukas Straub 
> > > ---
> > >  net/colo-compare.c | 173 ++---
> 
> > > 
> > >  1 file changed, 130 insertions(+), 43 deletions(-)
> > >
> > > diff --git a/net/colo-compare.c b/net/colo-compare.c
> > > index 1de4220fe2..ff6a740284 100644
> > > --- a/net/colo-compare.c
> > > +++ b/net/colo-compare.c
> > > @@ -32,6 +32,9 @@
> > >  #include "migration/migration.h"
> > >  #include "util.h"
> > >
> > > +#include "block/aio-wait.h"
> > > +#include "qemu/coroutine.h"
> > > +
> > >  #define TYPE_COLO_COMPARE "colo-compare"
> > >  #define COLO_COMPARE(obj) \
> > >  OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
> > > @@ -77,6 +80,20 @@ static int event_unhandled_count;
> > >   *|packet  |  |packet  +|packet  | |packet  +
> > >   *++  ++++ ++
> > >   */
> > > +
> > > +typedef struct SendCo {
> > > +Coroutine *co;
> > > +GQueue send_list;
> > > +bool done;
> > > +int ret;
> > > +} SendCo;
> > > +
> > > +typedef struct SendEntry {
> > > +uint32_t size;
> > > +uint32_t vnet_hdr_len;
> > > +uint8_t buf[];
> > > +} SendEntry;
> > > +
> > >  typedef struct CompareState {
> > >  Object parent;
> > >
> > > @@ -91,6 +108,7 @@ typedef struct CompareState {
> > >  SocketReadState pri_rs;
> > >  SocketReadState sec_rs;
> > >  SocketReadState notify_rs;
> > > +SendCo sendco;
> > >  bool vnet_hdr;
> > >  uint32_t compare_timeout;
> > >  uint32_t expired_scan_cycle;
> > > @@ -126,8 +144,11 @@ enum {
> > >  static int compare_chr_send(CompareState *s,
> > >  const uint8_t *buf,
> > >  uint32_t size,
> > > -uint32_t vnet_hdr_len,
> > > -bool notify_remote_frame);
> > > +uint32_t vnet_hdr_len);
> > > +
> > > +static int notify_chr_send(CompareState *s,
> > > +   const uint8_t *buf,
> > > +   uint32_t size);
> > >
> > >  static bool packet_matches_str(const char *str,
> > > const uint8_t *buf,
> > > @@ -145,7 +166,7 @@ static void notify_remote_frame(CompareState *s)
> > >  char msg[] = "DO_CHECKPOINT";
> > >  int ret = 0;
> > >
> > > -ret = compare_chr_send(s, (uint8_t *)msg, strlen(msg), 0, true);
> > > +ret = notify_chr_send(s, (uint8_t *)msg, strlen(msg));
> > >  if (ret < 0) {
> > >  error_report("Notify Xen COLO-frame failed");
> > >  }
> > > @@ -271,8 +292,7 @@ static void colo_release_primary_pkt(CompareState *s, Packet *pkt)
> > >  ret = compare_chr_send(s,
> > > pkt->data,
> > > pkt->size,
> > > -   pkt->vnet_hdr_len,
> > > -   false);
> > > +   pkt->vnet_hdr_len);
> > >  if (ret < 0) {
> > >  error_report("colo send primary packet failed");
> > >  }
> > > @@ -699,63 +719,123 @@ static void colo_compare_connection(void *opaque, void *user_data)
> > >  }
> > >  }
> > >
> > > -static int compare_chr_send(CompareState *s,
> > > -const uint8_t *buf,
> > > -uint32_t size,
> > > -uint32_t vnet_hdr_len,
> > > -bool notify_remote_frame)
> > > +static void coroutine_fn _compare_chr_send(void *opaque)
> > >  {
> > > +CompareState *s = opaque;
> > > +SendCo *sendco = &s->sendco;
> > >  int ret = 0;
> > > -uint32_t len = htonl(size);
> > >
> > 

Re: [PATCH V2] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-04-28 Thread Michael S. Tsirkin
On Wed, Apr 29, 2020 at 06:28:46AM +0530, Ani Sinha wrote:
> Well there were several discussions in the other thread around how PCIE
> behaves and how we can't change the slot features without a HW reset.
> Those were useful inputs.

OK so I'd expect these to be addressed in some way. If we commit to
support a feature which has no chance to work on real hardware, we paint
ourselves into a tight corner. This kind of thing tends to create
maintenance problems down the road. Disabling both hotplug and unplug
sounds like a reasonable thing to do, so if there's a need to disable
just one of these, commit log needs to do a better job documenting the
usecase.

Alternatively, we need to be more creative with achieving what you are
trying to do in ways that can work on real hardware.

For example, how about hot-plugging a bridge which doesn't
support hotplug itself? Would that happen to make windows
do what you want, for both PCI and PCIE? We don't support
hotplugging bridges with devices behind them ATM, but
that sounds like a useful option.


Also, pcie root ports recently gained ability to disable hotplug, see
commit 530a0963184e57e71a5b538e9161f115df533e96
Author: Julia Suvorova 
Date:   Wed Feb 26 18:46:07 2020 +0100

pcie_root_port: Add hotplug disabling option

adding this to pci and pcie bridges sounds very reasonable.

-- 
MST




Re: [PATCH v2 00/17] 64bit block-layer

2020-04-28 Thread Vladimir Sementsov-Ogievskiy

29.04.2020 0:33, Eric Blake wrote:

On 4/27/20 3:23 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

v1 was "[RFC 0/3] 64bit block-layer part I", please refer to initial
cover-letter
  https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg08723.html
for motivation.

v2:
patch 02 is unchanged, add Stefan's r-b. Everything other is changed a
lot. What's new:



You'll also want to check my (now-abandoned?) posting from a while back:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg02769.html

to see what (if anything) from that attempt can be salvaged.



Hmm, looks close :) will keep it in mind, thanks. Or maybe you want to
resend? The first 4 patches are not needed now, as bdrv_read/bdrv_write are
already dropped.

--
Best regards,
Vladimir



Re: [PATCH 1/4] sam460ex: Revert change to SPD memory type for <= 128 MiB

2020-04-28 Thread Markus Armbruster
BALATON Zoltan  writes:

> On Tue, 21 Apr 2020, Markus Armbruster wrote:
>> BALATON Zoltan  writes:
>>> On Mon, 20 Apr 2020, Markus Armbruster wrote:
>>>> Requesting 32 or 64 MiB of RAM with the sam460ex machine type produces
>>>> a useless warning:
>>>>
>>>>    qemu-system-ppc: warning: Memory size is too small for SDRAM type, adjusting type
>>>
>>> Why is it useless? It lets user know there was a change so it could
>>> help debugging for example.
>>
>> The memory type is chosen by QEMU, not the user.  Why should QEMU warn
>> the user when it chooses DDR, but not when it chooses DDR2?
>>
>>>> This is because sam460ex_init() asks spd_data_generate() for DDR2,
>>>> which is impossible, so spd_data_generate() corrects it to DDR.
>>>
>>> This is correct and intended. The idea is that the board code should
>>> not need to know about SPD data, all knowledge about that should be in
>>> spd_data_generate().
>>
>> I challenge this idea.
>>
>> The kind of RAM module a board accepts is a property of the board.
>> Modelling that in board code is sensible and easy.  Attempting to model
>> it in a one size fits all helper function is unlikely to work for all
>> boards.
>>
>> Apparently some boards (including malta) need two banks, so your helper
>> increases the number of banks from one to two, but only when that's
>> possible without changing the type.
>>
>> What if another board needs one bank?  Four?  Two even if that requires
>> changing the type?  You'll end up with a bunch of flags to drive the
>> helper's magic.  Not yet because the helper has a grand total of *two*
>> users, and much of its magic is used by neither, as demonstrated by
>> PATCH 2.
>>
>> If you want magic, have a non-magic function that does exactly what it's
>> told, and a magic one to tell it what to do.  The non-magic one will be
>> truly reusable.  You can have any number of magic ones.  Boards with
>> sufficiently similar requirements can share a magic one.
>
> So far we have only sufficiently similar boards that can share the
> only magic function. Not many boards use SPD data (these are mostly
> needed for real board firmware so anything purely virtual don't model
> it usually). The refactoring you propose could be needed if we had
> more dissimilar boards but I think we could do that at that
> time. Until then I've tried to make it simple for board code and put

Keeping things simple and just serving the needs you actually have is
good.  We're in a much better position to figure out how to best serve
more complicated needs once we actually have them :)

> all magic in one place instead of having separate implementation of
> this in several boards. Maybe someone should try to convert the
> remaining boards (MIPS Malta and ARM integratorcp) to see if any
> refactoring is needed before doing those refactoring without checking
> first what's needed. I did not try to convert those boards because I
> cannot test them.

That's fair.

[...]




[Bug 1875139] Re: Domain fails to start when 'readonly' device not writable

2020-04-28 Thread Peter Krempa
This was indeed caused by libvirt starting to use -blockdev. The issue
is that qemu's 'auto-read-only' property which is used by libvirt for
the backing files doesn't properly work if the 'host_device' backend
encounters a read-only LV.

The above situation also happens if you have a read-only LV in the
backing chain.

With qemu's current upstream APIs there isn't anything that libvirt can
do in this case, because any solution would either not fix the problem
completely or would require sacrificing other features. The
auto-read-only property needs to be fixed in qemu.

I've also filed https://bugzilla.redhat.com/show_bug.cgi?id=1828252 to
track this issue.

** Bug watch added: Red Hat Bugzilla #1828252
   https://bugzilla.redhat.com/show_bug.cgi?id=1828252

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1875139

Title:
  Domain fails to start when 'readonly' device not writable

Status in QEMU:
  New

Bug description:
  This issue is introduced in QEMU 4.2.0 (4.1.0 is working fine)

  My root disk is a LVM2 volume thin snapshot that is marked as read-only
  But when I try to start the domain (using virt-manager) I get the following 
error:

  Error starting domain: internal error: process exited while connecting
  to monitor: 2020-04-26T06:55:06.342700Z qemu-system-x86_64: -blockdev
  {"driver":"host_device","filename":"/dev/vg/vmroot-20200425","aio":"native
  ","node-name":"libvirt-3-storage","cache":{"direct":true,"no-
  flush":false},"auto-read-only":true,"discard":"unmap"} The device is
  not writable: Permission denied

  Changing the lvm snapshot to writeable allows me to start the domain.
  (Making it changes possible during domain is running)

  I don't think QEMU should fail when it can't open a (block) device when the 
read-only option is set.
  (why is write access needed?)

  Reproduce steps:
  * Create LVM read-only volume (I don't think any data is needed)
  * Create domain with read-only volume as block device
  * Try to start the domain

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1875139/+subscriptions



Re: [PATCH v2 01/17] block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes

2020-04-28 Thread Vladimir Sementsov-Ogievskiy

29.04.2020 1:09, Eric Blake wrote:

On 4/27/20 3:23 AM, Vladimir Sementsov-Ogievskiy wrote:

The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.


My audit for this patch:

Caller has 32-bit, this patch now causes widening which is safe:
block/block-backend.c: blk_do_preadv() passes 'unsigned int'
block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
block/throttle.c: throttle_co_pdiscard() passes 'int'

Caller has 64-bit, this patch fixes potential bug where pre-patch could narrow, 
except it's easy enough to trace that callers are still capped at 2G actions:
block/throttle.c: throttle_co_preadv() passes 'uint64_t'
block/throttle.c: throttle_co_pwritev() passes 'uint64_t'

Implementation in question: block/throttle-groups.c 
throttle_group_co_io_limits_intercept() takes 'unsigned int bytes' and uses it:
argument to util/throttle.c throttle_account(uint64_t)

All safe: it patches a latent bug, and does not introduce any 64-bit gotchas 
once throttle_co_p{read,write}v are relaxed, and assuming throttle_account() is 
not buggy.
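
The difference between the two caller groups in the audit can be shown with a
small stand-alone sketch.  The toy_* functions are hypothetical and only
mimic the shape of the call chain (an intermediate function forwarding a byte
count to a 64-bit consumer); uint64_t is used for the widened parameter as an
assumption, and unsigned int is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* This sketch assumes a 32-bit unsigned int, as on mainstream platforms. */
static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Stand-in for the 64-bit consumer at the end of the chain. */
static uint64_t toy_account(uint64_t bytes)
{
    return bytes;
}

/* Pre-patch shape: the intermediate parameter is only 32 bits wide. */
static uint64_t toy_intercept_32(unsigned int bytes)
{
    return toy_account(bytes);
}

/* Post-patch shape: 64 bits all the way through. */
static uint64_t toy_intercept_64(uint64_t bytes)
{
    return toy_account(bytes);
}

/* A 64-bit caller going through the narrow intermediate loses high bits. */
static uint64_t toy_caller_64_via_32(uint64_t bytes)
{
    return toy_intercept_32((unsigned int)bytes);
}

/* A 32-bit caller going through the wide intermediate is widened safely. */
static uint64_t toy_caller_32_via_64(unsigned int bytes)
{
    return toy_intercept_64(bytes);
}
```

A value just above 4 GiB passed through toy_caller_64_via_32() comes out
reduced modulo 2^32, which is the latent narrowing the audit refers to, while
toy_caller_32_via_64() preserves every 32-bit input.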


Should I add this all to commit message?





Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/throttle-groups.h | 2 +-
  block/throttle-groups.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Eric Blake 



Thanks for careful review!

--
Best regards,
Vladimir



Re: [PATCH] qemu-option: pass NULL rather than 0 to the id of qemu_opts_set()

2020-04-28 Thread Markus Armbruster
Masahiro Yamada  writes:

> The second argument 'id' is a pointer. Pass NULL rather than 0.
>
> Signed-off-by: Masahiro Yamada 

Reviewed-by: Markus Armbruster 

and queued, thanks!




Re: [RFC][PATCH v2 1/3] hw/misc: Add implementation of ivshmem revision 2 device

2020-04-28 Thread Liang Yan
A quick check by checkpatch.pl, pretty straightforward to fix.

ERROR: return is not a function, parentheses are not required
#211: FILE: hw/misc/ivshmem2.c:138:
+return (ivs->features & (1 << feature));

ERROR: memory barrier without comment
#255: FILE: hw/misc/ivshmem2.c:182:
+smp_mb();

ERROR: braces {} are necessary for all arms of this statement
#626: FILE: hw/misc/ivshmem2.c:553:
+if (msg->vector == 0)
[...]
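
For reference, a sketch of what the three complaints ask for.  This is not
the ivshmem2.c code: the toy_* names and struct fields are hypothetical, and
a C11 fence stands in for QEMU's smp_mb() so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; field names are illustrative only. */
typedef struct ToyState {
    uint32_t features;
    int vector0_hits;
    atomic_int ready;
} ToyState;

static bool toy_has_feature(const ToyState *s, unsigned int feature)
{
    assert(feature < 32);
    /* No parentheses around the returned expression. */
    return s->features & (1u << feature);
}

static void toy_publish(ToyState *s)
{
    s->vector0_hits = 0;
    /*
     * Full barrier: order the reset above before the relaxed store that
     * sets 'ready'.  The comment is what checkpatch wants to see next to
     * a barrier.
     */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&s->ready, 1, memory_order_relaxed);
}

static void toy_handle(ToyState *s, unsigned int vector)
{
    /* Braces on every arm, even for a single statement. */
    if (vector == 0) {
        s->vector0_hits++;
    }
}
```

None of these changes alter behaviour; they only bring the code in line with
the style the script enforces.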

Best,
Liang


On 1/7/20 9:36 AM, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> This adds a reimplementation of ivshmem in its new revision 2 as
> separate device. The goal of this is not to enable sharing with v1,
> rather to allow exploring the properties and potential limitations of the
> new version prior to discussing its integration with the existing code.
> 
> v2 always requires a server to interconnect two or more QEMU
> instances because it provides signaling between peers unconditionally.
> Therefore, only the interconnecting chardev, master mode, and the usage
> of ioeventfd can be configured at device level. All other parameters are
> defined by the server instance.
> 
> A new server protocol is introduced along this. Its primary difference
> is the introduction of a single welcome message that contains all peer
> parameters, rather than a series of single-word messages pushing them
> piece by piece.
> 
> A complicating difference in interrupt handling, compared to v1, is the
> auto-disable mode of v2: When this is active, interrupt delivery is
> disabled by the device after each interrupt event. This prevents the
> usage of irqfd on the receiving side, but it lowers the handling cost
> for guests that implemented interrupt throttling this way (specifically
> when exposing the device via UIO).
> 
> No changes have been made to the ivshmem device regarding migration:
> Only the master can live-migrate, slave peers have to hot-unplug the
> device first.
> 
> The details of the device model will be specified in a succeeding
> commit. Drivers for this device can currently be found under
> 
> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2
> 
> To instantiate an ivshmem v2 device, just add
> 
>  ... -chardev socket,path=/tmp/ivshmem_socket,id=ivshmem \
>  -device ivshmem,chardev=ivshmem
> 
> provided the server opened its socket under the default path.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  hw/misc/Makefile.objs  |2 +-
>  hw/misc/ivshmem2.c | 1085 
> 
>  include/hw/misc/ivshmem2.h |   48 ++
>  include/hw/pci/pci_ids.h   |2 +
>  4 files changed, 1136 insertions(+), 1 deletion(-)
>  create mode 100644 hw/misc/ivshmem2.c
>  create mode 100644 include/hw/misc/ivshmem2.h
> 
> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> index ba898a5781..90a4a6608c 100644
> --- a/hw/misc/Makefile.objs
> +++ b/hw/misc/Makefile.objs
> @@ -26,7 +26,7 @@ common-obj-$(CONFIG_PUV3) += puv3_pm.o
>  
>  common-obj-$(CONFIG_MACIO) += macio/
>  
> -common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o
> +common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o ivshmem2.o
>  
>  common-obj-$(CONFIG_REALVIEW) += arm_sysctl.o
>  common-obj-$(CONFIG_NSERIES) += cbus.o
> diff --git a/hw/misc/ivshmem2.c b/hw/misc/ivshmem2.c
> new file mode 100644
> index 00..d5f88ed0e9
> --- /dev/null
> +++ b/hw/misc/ivshmem2.c
> @@ -0,0 +1,1085 @@
> +/*
> + * Inter-VM Shared Memory PCI device, version 2.
> + *
> + * Copyright (c) Siemens AG, 2019
> + *
> + * Authors:
> + *  Jan Kiszka 
> + *
> + * Based on ivshmem.c by Cam Macdonell 
> + *
> + * This code is licensed under the GNU GPL v2.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "qemu/cutils.h"
> +#include "hw/hw.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/msi.h"
> +#include "hw/pci/msix.h"
> +#include "hw/qdev-properties.h"
> +#include "sysemu/kvm.h"
> +#include "migration/blocker.h"
> +#include "migration/vmstate.h"
> +#include "qemu/error-report.h"
> +#include "qemu/event_notifier.h"
> +#include "qemu/module.h"
> +#include "qom/object_interfaces.h"
> +#include "chardev/char-fe.h"
> +#include "sysemu/qtest.h"
> +#include "qapi/visitor.h"
> +
> +#include "hw/misc/ivshmem2.h"
> +
> +#define PCI_VENDOR_ID_IVSHMEM   PCI_VENDOR_ID_SIEMENS
> +#define PCI_DEVICE_ID_IVSHMEM   0x4106
> +
> +#define IVSHMEM_MAX_PEERS   UINT16_MAX
> +#define IVSHMEM_IOEVENTFD   0
> +#define IVSHMEM_MSI 1
> +
> +#define IVSHMEM_REG_BAR_SIZE    0x1000
> +
> +#define IVSHMEM_REG_ID          0x00
> +#define IVSHMEM_REG_MAX_PEERS   0x04
> +#define IVSHMEM_REG_INT_CTRL    0x08
> +#define IVSHMEM_REG_DOORBELL    0x0c
> +#define IVSHMEM_REG_STATE       0x10
> +
> +#define IVSHMEM_INT_ENABLE      0x1
> +
> +#define IVSHMEM_ONESHOT_MODE    0x1
> +
> +#define IVSHMEM_DEBUG 0
> +#define IVSHMEM_DPRINTF(fmt, ...)   \
> +do {\
> +if (IVSHMEM_DEBUG) {

[Bug 1862986] Re: qemu-s390x crashes when run on aarch64

2020-04-28 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1862986

Title:
  qemu-s390x crashes when run on aarch64

Status in QEMU:
  Expired

Bug description:
  All tested versions (2.11 and 4.2) qemu-s390x crashes with a segfault
  when run on an aarch64 odroid Ubuntu.


  Steps to reproduce:

  root@odroid:~/workspace/bitcoin-core# /usr/local/bin/qemu-s390x 
"/root/workspace/bitcoin-core/build/bitcoin-s390x-linux-gnu/src/test/test_bitcoin_orig"
  Segmentation fault (core dumped)
  root@odroid:~/workspace/bitcoin-core# /usr/local/bin/qemu-s390x --version
  qemu-s390x version 4.2.0
  Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
  root@odroid:~/workspace/bitcoin-core# /usr/bin/qemu-s390x 
"/root/workspace/bitcoin-core/build/bitcoin-s390x-linux-gnu/src/test/test_bitcoin_orig"
  Segmentation fault (core dumped)
  root@odroid:~/workspace/bitcoin-core# /usr/bin/qemu-s390x --version
  qemu-s390x version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.22)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  qemu-arm does work on the same machine:

  root@odroid:~/workspace/bitcoin-core# /usr/bin/qemu-arm 
bitcoin-0.19.0.1-armhf/bin/test_bitcoin -t amount_tests
  Running 4 test cases...

  *** No errors detected
  root@odroid:~/workspace/bitcoin-core# /usr/local/bin/qemu-arm 
bitcoin-0.19.0.1-armhf/bin/test_bitcoin -t amount_tests
  Running 4 test cases...

  *** No errors detected


  
  What kind of debug information would be helpful for this issue report?
  GDB for the self-compiled latest release is not particularly helpful:

  (gdb) run
  Starting program: /usr/local/bin/qemu-s390x 
/root/workspace/bitcoin-core/build/bitcoin-s390x-linux-gnu/src/test/test_bitcoin_orig
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
  [New Thread 0x7fb7a2a140 (LWP 28264)]

  Thread 1 "qemu-s390x" received signal SIGSEGV, Segmentation fault.
  0x0096b218 in __bss_start__ ()
  (gdb) bt
  #0  0x0096b218 in __bss_start__ ()
  #1  0x006120a8 in ?? ()
  #2  0x0055579904b0 in ?? ()
  Backtrace stopped: previous frame inner to this frame (corrupt stack?)


  
  A bit more information is available in the version shipped by Ubuntu:

  (gdb) run
  Starting program: /usr/bin/qemu-s390x 
/root/workspace/bitcoin-core/build/bitcoin-s390x-linux-gnu/src/test/test_bitcoin_orig
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
  [New Thread 0x7fb7a01180 (LWP 28271)]

  Thread 1 "qemu-s390x" received signal SIGSEGV, Segmentation fault.
  0x00738f98 in code_gen_buffer ()
  (gdb) bt
  #0  0x00738f98 in code_gen_buffer ()
  #1  0x005e96c8 in cpu_exec ()
  #2  0x005ee430 in cpu_loop ()
  #3  0x005c3328 in main ()

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1862986/+subscriptions



Re: [RFC][PATCH v2 3/3] contrib: Add server for ivshmem revision 2

2020-04-28 Thread Liang Yan
A quick check by checkpatch.pl, pretty straightforward to fix.

ERROR: memory barrier without comment
#205: FILE: contrib/ivshmem2-server/ivshmem2-server.c:106:
+smp_mb();

ERROR: spaces required around that '*' (ctx:VxV)
#753: FILE: contrib/ivshmem2-server/main.c:22:
+#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE   (4*1024*1024)
 ^

ERROR: spaces required around that '*' (ctx:VxV)
#753: FILE: contrib/ivshmem2-server/main.c:22:
+#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE   (4*1024*1024)
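
For reference, a minimal sketch of what code that passes both checks could
look like. Only the macro name comes from the report above; the demo_slot
struct and demo_publish() are hypothetical, and a C11 fence stands in for
smp_mb(), since what the barrier at ivshmem2-server.c:106 actually orders is
not visible in this mail.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Spaces around '*' satisfy the checkpatch style rule. */
#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE   (4 * 1024 * 1024)

/* Hypothetical shared slot, used only for illustration. */
struct demo_slot {
    int payload;
    atomic_int ready;
};

/* Hypothetical publisher: store the payload, then raise the ready flag. */
static void demo_publish(struct demo_slot *slot, int value)
{
    slot->payload = value;
    /*
     * Full barrier: the payload store must be visible before the ready
     * flag is set. checkpatch expects a comment like this next to every
     * memory barrier.
     */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&slot->ready, 1, memory_order_relaxed);
}
```

The point is only the shape: operators spaced out, and every barrier
accompanied by a comment saying which accesses it orders.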


Best,
Liang



On 1/7/20 9:36 AM, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> This implements the server process for ivshmem v2 device models of QEMU.
> Again, no effort has been spent yet on sharing code with the v1 server.
> Parts have been copied, others were rewritten.
> 
> In addition to parameters of v1, this server now also specifies
> 
>  - the maximum number of peers to be connected (required to know in
>advance because of v2's state table)
>  - the size of the output sections (can be 0)
>  - the protocol ID to be published to all peers
> 
> When a virtio protocol ID is chosen, only 2 peers can be connected.
> Furthermore, the server will signal the backend variant of the ID to the
> master instance and the frontend ID to the slave peer.
> 
> To start, e.g., a server that allows virtio console over ivshmem, call
> 
> ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003
> 
> TODO: specify the new server protocol.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  Makefile  |   3 +
>  Makefile.objs |   1 +
>  configure |   1 +
>  contrib/ivshmem2-server/Makefile.objs |   1 +
>  contrib/ivshmem2-server/ivshmem2-server.c | 462 
> ++
>  contrib/ivshmem2-server/ivshmem2-server.h | 158 ++
>  contrib/ivshmem2-server/main.c| 313 
>  7 files changed, 939 insertions(+)
>  create mode 100644 contrib/ivshmem2-server/Makefile.objs
>  create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c
>  create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h
>  create mode 100644 contrib/ivshmem2-server/main.c
> 
> diff --git a/Makefile b/Makefile
> index 6b5ad1121b..33bb0eefdb 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -427,6 +427,7 @@ dummy := $(call unnest-vars,, \
>  elf2dmp-obj-y \
>  ivshmem-client-obj-y \
>  ivshmem-server-obj-y \
> +ivshmem2-server-obj-y \
>  rdmacm-mux-obj-y \
>  libvhost-user-obj-y \
>  vhost-user-scsi-obj-y \
> @@ -655,6 +656,8 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) 
> $(COMMON_LDADDS)
>   $(call LINK, $^)
>  ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) $(COMMON_LDADDS)
>   $(call LINK, $^)
> +ivshmem2-server$(EXESUF): $(ivshmem2-server-obj-y) $(COMMON_LDADDS)
> + $(call LINK, $^)
>  endif
>  vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y) libvhost-user.a
>   $(call LINK, $^)
> diff --git a/Makefile.objs b/Makefile.objs
> index 02bf5ce11d..ce243975ef 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -115,6 +115,7 @@ qga-vss-dll-obj-y = qga/
>  elf2dmp-obj-y = contrib/elf2dmp/
>  ivshmem-client-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-client/
>  ivshmem-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-server/
> +ivshmem2-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem2-server/
>  libvhost-user-obj-y = contrib/libvhost-user/
>  vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS)
>  vhost-user-scsi.o-libs := $(LIBISCSI_LIBS)
> diff --git a/configure b/configure
> index 747d3b4120..1cb1427f1b 100755
> --- a/configure
> +++ b/configure
> @@ -6165,6 +6165,7 @@ if test "$want_tools" = "yes" ; then
>fi
>if [ "$ivshmem" = "yes" ]; then
>  tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools"
> +tools="ivshmem2-server\$(EXESUF) $tools"
>fi
>if [ "$curl" = "yes" ]; then
>tools="elf2dmp\$(EXESUF) $tools"
> diff --git a/contrib/ivshmem2-server/Makefile.objs 
> b/contrib/ivshmem2-server/Makefile.objs
> new file mode 100644
> index 00..d233e18ec8
> --- /dev/null
> +++ b/contrib/ivshmem2-server/Makefile.objs
> @@ -0,0 +1 @@
> +ivshmem2-server-obj-y = ivshmem2-server.o main.o
> diff --git a/contrib/ivshmem2-server/ivshmem2-server.c 
> b/contrib/ivshmem2-server/ivshmem2-server.c
> new file mode 100644
> index 00..b341f1fcd0
> --- /dev/null
> +++ b/contrib/ivshmem2-server/ivshmem2-server.c
> @@ -0,0 +1,462 @@
> +/*
> + * Copyright 6WIND S.A., 2014
> + * Copyright (c) Siemens AG, 2019
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/host-utils.h"
> +#include "qemu/sockets.h"
> +#include "qemu/atomic.h"
> +
> +#include 
> +#include 
> +
> 

Re: [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU

2020-04-28 Thread Liang Yan
Hi, All,

Did a test for these patches, all looked fine.

Test environment:
Host: opensuse tumbleweed + latest upstream qemu  + these three patches
Guest: opensuse tumbleweed root fs + custom kernel(5.5) + related
uio-ivshmem driver + ivshmem-console/ivshmem-block tools


1. lspci show

00:04.0 Unassigned class [ff80]: Siemens AG Device 4106 (prog-if 02)
Subsystem: Red Hat, Inc. Device 1100
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
Vector table: BAR=1 offset=
PBA: BAR=1 offset=0800
Kernel driver in use: virtio-ivshmem


2. virtio-ivshmem-console test
2.1 ivshmem2-server(host)

airey:~/ivshmem/qemu/:[0]# ./ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003
*** Example code, do not use in production ***

2.2 guest vm backend(test-01)
localhost:~ # echo "110a 4106 1af4 1100 ffc003 ff" >
/sys/bus/pci/drivers/uio_ivshmem/new_id
[  185.831277] uio_ivshmem :00:04.0: state_table at
0xfd80, size 0x1000
[  185.835129] uio_ivshmem :00:04.0: rw_section at
0xfd801000, size 0x7000

localhost:~ # virtio/virtio-ivshmem-console /dev/uio0
Waiting for peer to be ready...

2.3 guest vm frontend(test-02)
need to boot or reboot after backend is done

2.4 backend will show serial output of frontend

localhost:~ # virtio/virtio-ivshmem-console /dev/uio0
Waiting for peer to be ready...

localhost:~/virtio # ./virtio-ivshmem-console /dev/uio0
Waiting for peer to be ready...
Starting virtio device
device_status: 0x0
device_status: 0x1
device_status: 0x3
device_features_sel: 1
device_features_sel: 0
driver_features_sel: 1
driver_features[1]: 0x13
driver_features_sel: 0
driver_features[0]: 0x1
device_status: 0xb
queue_sel: 0
queue size: 8
queue driver vector: 1
queue desc: 0x200
queue driver: 0x280
queue device: 0x2c0
queue enable: 1
queue_sel: 1
queue size: 8
queue driver vector: 2
queue desc: 0x400
queue driver: 0x480
queue device: 0x4c0
queue enable: 1
device_status: 0xf

Welcome to openSUSE Tumbleweed 20200326 - Kernel 5.5.0-rc5-1-default+
(hvc0).

enp0s3:


localhost login:

2.5 closing the backend makes the frontend show
localhost:~ # [  185.685041] virtio-ivshmem :00:04.0: backend failed!

3. virtio-ivshmem-block test

3.1 ivshmem2-server(host)
airey:~/ivshmem/qemu/:[0]# ./ivshmem2-server -F -l 1M -n 2 -V 2 -P 0x8002
*** Example code, do not use in production ***

3.2 guest vm backend(test-01)

localhost:~ # echo "110a 4106 1af4 1100 ffc002 ff" >
/sys/bus/pci/drivers/uio_ivshmem/new_id
[   77.701462] uio_ivshmem :00:04.0: state_table at
0xfd80, size 0x1000
[   77.705231] uio_ivshmem :00:04.0: rw_section at
0xfd801000, size 0x000ff000

localhost:~ # virtio/virtio-ivshmem-block /dev/uio0 /root/disk.img
Waiting for peer to be ready...

3.3 guest vm frontend(test-02)
need to boot or reboot after backend is done

3.4 guest vm backend(test-01)
localhost:~ # virtio/virtio-ivshmem-block /dev/uio0 /root/disk.img
Waiting for peer to be ready...
Starting virtio device
device_status: 0x0
device_status: 0x1
device_status: 0x3
device_features_sel: 1
device_features_sel: 0
driver_features_sel: 1
driver_features[1]: 0x13
driver_features_sel: 0
driver_features[0]: 0x206
device_status: 0xb
queue_sel: 0
queue size: 8
queue driver vector: 1
queue desc: 0x200
queue driver: 0x280
queue device: 0x2c0
queue enable: 1
device_status: 0xf

3.5 guest vm frontend(test-02), a new disk is attached:

fdisk /dev/vdb

Disk /dev/vdb: 192 KiB, 196608 bytes, 384 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

3.6 closing the backend makes the frontend show
localhost:~ # [ 1312.284301] virtio-ivshmem :00:04.0: backend failed!



Tested-by: Liang Yan 

On 1/7/20 9:36 AM, Jan Kiszka wrote:
> Overdue update of the ivshmem 2.0 device model as presented at [1].
> 
> Changes in v2:
>  - changed PCI device ID to Siemens-granted one,
>adjusted PCI device revision to 0
>  - removed unused feature register from device
>  - addressed feedback on specification document
>  - rebased over master
> 
> This version is now fully in sync with the implementation for Jailhouse
> that is currently under review [2][3], UIO and virtio-ivshmem drivers
> are shared. Jailhouse will very likely pick up this revision of the
> device in order to move forward with stressing it.
> 
> More details on the usage with QEMU were in the original cover letter
> (with adjustments to the new device ID):
> 
> If you want to play with this, the basic setup of the shared memory
> device is described in patch 1 and 3. UIO driver and also the
> virtio-ivshmem prototype can be found at
> 
> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2
> 
> Accessing the device via UIO is trivial enough. If 

Re: [PATCH for-5.1 4/7] target/mips: Add Loongson-3 CPU definition

2020-04-28 Thread Huacai Chen
Hi, Aleksandar,

I've tried translate.google.com, and documents are available here:
Loongson-3A R1 (Loongson-3A1000)
User Manual Part 1:
http://ftp.godson.ac.cn/lemote/3A1000_p1.pdf
http://ftp.godson.ac.cn/lemote/Loongson3A1000_processor_user_manual_P1.pdf
(Chinese Version)
User Manual Part 2:
http://ftp.godson.ac.cn/lemote/3A1000_p2.pdf
http://ftp.godson.ac.cn/lemote/Loongson3A1000_processor_user_manual_P2.pdf
(Chinese Version)

Loongson-3A R2 (Loongson-3A2000)
User Manual Part 1:
http://ftp.godson.ac.cn/lemote/3A2000_p1.pdf
http://ftp.godson.ac.cn/lemote/Loongson3A2000_user1.pdf (Chinese Version)
User Manual Part 2:
http://ftp.godson.ac.cn/lemote/3A2000_p2.pdf
http://ftp.godson.ac.cn/lemote/Loongson3A2000_user2.pdf (Chinese Version)

Loongson-3A R3 (Loongson-3A3000)
User Manual Part 1:
http://ftp.godson.ac.cn/lemote/3A3000_p1.pdf
http://ftp.godson.ac.cn/lemote/Loongson3A3000_3B3000usermanual1.pdf
(Chinese Version)
User Manual Part 2:
http://ftp.godson.ac.cn/lemote/3A3000_p2.pdf
http://ftp.godson.ac.cn/lemote/Loongson3A3000_3B3000usermanual2.pdf
(Chinese Version)

Loongson-3A R4 (Loongson-3A4000)
User Manual Part 1:
http://ftp.godson.ac.cn/lemote/3A4000_p1.pdf
http://ftp.godson.ac.cn/lemote/3A4000user.pdf (Chinese Version)
User Manual Part 2:
I'm sorry that it is unavailable now.

On Wed, Apr 29, 2020 at 2:37 AM Aleksandar Markovic
 wrote:
>
> Huacai,
>
> Can you please do machine translation of the document?
>
> It can be done via translate.google.com (it accepts pdf files, but
> does not have a download feature; the workaround is to "print to pdf").
>
> Thanks in advance!
> Aleksandar
>
> On Tue, Apr 28, 2020 at 10:26 chen huacai wrote:
> >
> > Hi, Philippe,
> >
> > On Tue, Apr 28, 2020 at 2:34 PM Philippe Mathieu-Daudé  
> > wrote:
> > >
> > > Hi Huacai,
> > >
> > > On 4/27/20 11:33 AM, Huacai Chen wrote:
> > > > The Loongson-3 CPU family includes Loongson-3A R1/R2/R3/R4 and Loongson-3B
> > > > R1/R2. Loongson-3A R4 is the newest and its ISA is almost a superset
> > > > of all others. To reduce complexity, we just define a "Loongson-3A" CPU
> > > > which corresponds to Loongson-3A R4. Loongson-3A has CONFIG6 and
> > > > CONFIG7, so add their bit-fields as well.
> > >
> > > Is there a public datasheet for R4? (If possible in English).
> > I'm sorry, we only have a Chinese datasheet at www.loongson.cn.
> >
> > >
> > > >
> > > > Signed-off-by: Huacai Chen 
> > > > Co-developed-by: Jiaxun Yang 
> > > > ---
> > > >  target/mips/cpu.h| 28 ++
> > > >  target/mips/internal.h   |  2 ++
> > > >  target/mips/mips-defs.h  |  7 --
> > > >  target/mips/translate.c  |  2 ++
> > > >  target/mips/translate_init.inc.c | 51 
> > > > 
> > > >  5 files changed, 88 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/target/mips/cpu.h b/target/mips/cpu.h
> > > > index 94d01ea..0b3c987 100644
> > > > --- a/target/mips/cpu.h
> > > > +++ b/target/mips/cpu.h
> > > > @@ -940,7 +940,35 @@ struct CPUMIPSState {
> > > >  #define CP0C5_UFR  2
> > > >  #define CP0C5_NFExists 0
> > > >  int32_t CP0_Config6;
> > > > +int32_t CP0_Config6_rw_bitmask;
> > > > > +#define CP0C6_BPPASS      31
> > > > > +#define CP0C6_KPOS        24
> > > > > +#define CP0C6_KE          23
> > > > > +#define CP0C6_VTLBONLY    22
> > > > > +#define CP0C6_LASX        21
> > > > > +#define CP0C6_SSEN        20
> > > > > +#define CP0C6_DISDRTIME   19
> > > > > +#define CP0C6_PIXNUEN     18
> > > > > +#define CP0C6_SCRAND      17
> > > > > +#define CP0C6_LLEXCEN     16
> > > > > +#define CP0C6_DISVC       15
> > > > > +#define CP0C6_VCLRU       14
> > > > > +#define CP0C6_DCLRU       13
> > > > > +#define CP0C6_PIXUEN      12
> > > > > +#define CP0C6_DISBLKLYEN  11
> > > > > +#define CP0C6_UMEMUALEN   10
> > > > > +#define CP0C6_SFBEN       8
> > > > > +#define CP0C6_FLTINT      7
> > > > > +#define CP0C6_VLTINT      6
> > > > > +#define CP0C6_DISBTB      5
> > > > > +#define CP0C6_STPREFCTL   2
> > > > > +#define CP0C6_INSTPREF    1
> > > > > +#define CP0C6_DATAPREF    0
> > > >  int32_t CP0_Config7;
> > > > +int64_t CP0_Config7_rw_bitmask;
> > > > +#define CP0C7_NAPCGEN   2
> > > > +#define CP0C7_UNIMUEN   1
> > > > +#define CP0C7_VFPUCGEN  0
> > > >  uint64_t CP0_LLAddr;
> > > >  uint64_t CP0_MAAR[MIPS_MAAR_MAX];
> > > >  int32_t CP0_MAARI;
> > > > diff --git a/target/mips/internal.h b/target/mips/internal.h
> > > > index 1bf274b..7853cb1 100644
> > > > --- a/target/mips/internal.h
> > > > +++ b/target/mips/internal.h
> > > > @@ -36,7 +36,9 @@ struct mips_def_t {
> > > >  int32_t CP0_Config5;
> > > >  int32_t CP0_Config5_rw_bitmask;
> > > >  int32_t CP0_Config6;
> > > > +int32_t CP0_Config6_rw_bitmask;
> > > >  int32_t CP0_Config7;
> > > > +int32_t CP0_Config7_rw_bitmask;
> > > >  target_ulong CP0_LLAddr_rw_bitmask;
> > 

[PATCH v2 2/2] Improve legacy vbios handling

2020-04-28 Thread Grzegorz Uriasz
The current method of getting the vbios is broken - it does not work on any
device I've tested - for the reason explained in the previous patch. The vbios
is polymorphic, and getting a proper unmodified copy is often not possible
without reverse engineering the firmware. For most purposes we don't need an
unmodified copy: it is only needed for initializing the bios framebuffer, and
providing the bios with a corrupted copy of the rom won't do any damage, as the
bios will just ignore the rom.

After the i915 driver takes over, the vbios is only needed for reading some
metadata and configuration data. I've tested that not having any kind of vbios
in the guest actually works fine, but on older generations of IGD there are
some slight hiccups. To maximize compatibility, the best approach is to copy
the results of the vbios execution directly to the guest. It turns out the
vbios is always present at a hardcoded address range within the memory reserved
from real mode - all we need to do is memcpy it into the guest.

The following patch does 2 things:
1) When pci_assign_dev_load_option_rom fails to read the vbios from sysfs
(reading works only when the IGD is not the boot gpu, which is unlikely), it
falls back to using /dev/mem to copy the vbios directly to the guest. Using
/dev/mem should be fine as there is more xen specific pci code which also
relies on /dev/mem.
2) When dealing with IGD in the more generic code we skip the allocation of the
rom resource. This prevents a malicious guest from modifying the vbios in the
host, which someone might use to pwn the i915 driver in the host (attach IGD to
guest, guest modifies vbios, the guest is terminated and the IGD is reattached
to the host, i915 driver in the host uses data from the modified vbios).
This is also needed to not overwrite the proper shadow copy made before.

I've tested this patch and it works fine - the guest isn't complaining about
the missing vbios tables and the pci config space in the guest looks fine.
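
The fallback described in 1) boils down to "map a fixed region of a file
read-only and memcpy it out". A rough sketch of that pattern follows, under
stated assumptions: copy_file_region() is a hypothetical helper, not a
function from this patch or from QEMU, and it takes an arbitrary path so it
can be tried on a regular file instead of /dev/mem. The mapping is
deliberately file-backed (no MAP_ANONYMOUS), because an anonymous mapping is
not backed by the file descriptor passed to mmap().

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes, starting at a page-aligned offset
 * of path, into dst through a read-only file mapping.
 * Returns 0 on success and -1 on failure.
 */
static int copy_file_region(const char *path, off_t offset, size_t len,
                            void *dst)
{
    void *src;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    /* File-backed mapping: without MAP_ANONYMOUS, fd and offset are used. */
    src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, offset);
    close(fd);    /* the mapping stays valid after close() */
    if (src == MAP_FAILED) {
        return -1;
    }
    memcpy(dst, src, len);
    return munmap(src, len);
}
```

With /dev/mem as the path, the offset would be the physical address of the
shadow copy and len the 32 pages mentioned above; both values are left to
the caller here.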

Signed-off-by: Grzegorz Uriasz 
---
 hw/xen/xen_pt.c  |  8 +--
 hw/xen/xen_pt_graphics.c | 48 +---
 hw/xen/xen_pt_load_rom.c |  2 +-
 3 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index b91082cb8b..ffc3559dd4 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -483,8 +483,12 @@ static int xen_pt_register_regions(XenPCIPassthroughState 
*s, uint16_t *cmd)
i, r->size, r->base_addr, type);
 }
 
-/* Register expansion ROM address */
-if (d->rom.base_addr && d->rom.size) {
+/*
+ * Register expansion ROM address. If we are dealing with a ROM
+ * shadow copy for legacy vga devices then don't bother to map it
+ * as previous code creates a proper shadow copy
+ */
+if (d->rom.base_addr && d->rom.size && !(is_igd_vga_passthrough(d))) {
 uint32_t bar_data = 0;
 
 /* Re-set BAR reported by OS, otherwise ROM can't be read. */
diff --git a/hw/xen/xen_pt_graphics.c b/hw/xen/xen_pt_graphics.c
index a3bc7e3921..fe0ef2685c 100644
--- a/hw/xen/xen_pt_graphics.c
+++ b/hw/xen/xen_pt_graphics.c
@@ -129,7 +129,7 @@ int xen_pt_unregister_vga_regions(XenHostPCIDevice *dev)
 return 0;
 }
 
-static void *get_vgabios(XenPCIPassthroughState *s, int *size,
+static void *get_sysfs_vgabios(XenPCIPassthroughState *s, int *size,
XenHostPCIDevice *dev)
 {
     return pci_assign_dev_load_option_rom(&s->dev, size,
@@ -137,6 +137,45 @@ static void *get_vgabios(XenPCIPassthroughState *s, int 
*size,
   dev->dev, dev->func);
 }
 
+static void xen_pt_direct_vbios_copy(XenPCIPassthroughState *s, Error **errp)
+{
+int fd = -1;
+void *guest_bios = NULL;
+void *host_vbios = NULL;
+/* This is always 32 pages in the real mode reserved region */
+int bios_size = 32 << XC_PAGE_SHIFT;
+int vbios_addr = 0xc;
+
+fd = open("/dev/mem", O_RDONLY);
+if (fd == -1) {
+error_setg(errp, "Can't open /dev/mem: %s", strerror(errno));
+return;
+}
+host_vbios = mmap(NULL, bios_size,
+PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, fd, vbios_addr);
+close(fd);
+
+if (host_vbios == MAP_FAILED) {
+error_setg(errp, "Failed to mmap host vbios: %s", strerror(errno));
+return;
+}
+
+memory_region_init_ram(&s->dev.rom, OBJECT(&s->dev),
+"legacy_vbios.rom", bios_size, &error_abort);
+guest_bios = memory_region_get_ram_ptr(&s->dev.rom);
+memcpy(guest_bios, host_vbios, bios_size);
+
+if (munmap(host_vbios, bios_size) == -1) {
+XEN_PT_LOG(&s->dev, "Failed to unmap host vbios: %s\n", 
strerror(errno));
+}
+
+cpu_physical_memory_write(vbios_addr, guest_bios, bios_size);
+memory_region_set_address(&s->dev.rom, vbios_addr);
+pci_register_bar(&s->dev, PCI_ROM_SLOT, 

[PATCH v2 1/2] Fix undefined behaviour

2020-04-28 Thread Grzegorz Uriasz
This patch fixes qemu crashes when passing through an IGD device to HVM guests
under XEN. The problem is that on almost every laptop reading the IGD ROM from
sysfs will fail. The reason is that the IGD rom is polymorphic and modifies
itself during bootup, which results in an invalid rom image. The kernel checks
whether the image is valid, and when that's not the case it won't allow
userspace to read it. It seems that when the code (xen_pt_load_rom.c) was first
written, the kernels back then didn't check whether the rom is valid and just
passed the contents to userspace - because of this qemu also tries to repair
the rom later in the code. When stat()ing the rom the kernel isn't validating
the rom contents, so it returns the proper size of the rom (32 4kb pages).

This results in undefined behaviour: pci_assign_dev_load_option_rom creates a
buffer and then writes the size of the buffer to a pointer. Under old kernels,
if the fstat succeeded then the fread would also succeed, which means that if
the buffer is allocated, its size will be set. On newer kernels this is not the
case - on all laptops I've tested (spanning a few generations of IGD) the fstat
is successful and the buffer is allocated, but the fread fails. As the buffer
is not deallocated, the function returns a valid pointer without setting the
size of the buffer for the caller. The caller of
pci_assign_dev_load_option_rom holds the size of the buffer in an uninitialized
variable and only checks whether pci_assign_dev_load_option_rom returned a
valid pointer. This later results in
cpu_physical_memory_write(0xc, result_of_pci_assign_dev_load_option_rom,
uninitialized_variable), which depending on the compiler parameters, configure
flags, etc. might crash qemu or might work - the xen 4.12 stable release with
default configure parameters works, but changing the compiler options slightly
(for instance by enabling qemu debug) results in qemu crashing ¯\_(;-;)_/¯

The only situation in which the original pci_assign_dev_load_option_rom works
is when the IGD is not the boot gpu - this won't happen on any laptop and will
be rare on desktops.

This patch deallocates the buffer and returns NULL if reading the VBIOS fails -
the caller of the function then properly shuts down qemu. The next patch in the
series introduces a better method for getting the vbios, so qemu does not exit
when pci_assign_dev_load_option_rom fails - this is the reason I've changed
error_report to warn_report, as whether a failure in
pci_assign_dev_load_option_rom is fatal depends on the caller.
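
The contract described above (a returned pointer and its size must never
disagree) can be illustrated outside QEMU. The following is only a sketch:
load_rom_file() is a hypothetical stand-in built on plain stdio and malloc,
not the real pci_assign_dev_load_option_rom, which allocates through a
memory region instead.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read a whole file into a heap buffer.
 * On any failure the buffer is freed, *size stays 0 and NULL is returned,
 * so the caller never sees a valid pointer with an unset size.
 * An empty file is treated as a failure as well.
 */
static void *load_rom_file(const char *path, size_t *size)
{
    void *buf = NULL;
    long len;
    FILE *fp;

    *size = 0;    /* defined value on every exit path */
    fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }
    if (fseek(fp, 0, SEEK_END) != 0 || (len = ftell(fp)) <= 0 ||
        fseek(fp, 0, SEEK_SET) != 0) {
        goto out;
    }
    buf = malloc((size_t)len);
    if (buf == NULL) {
        goto out;
    }
    if (fread(buf, 1, (size_t)len, fp) != (size_t)len) {
        /* The case this patch is about: allocated, but the read failed. */
        free(buf);
        buf = NULL;
        goto out;
    }
    *size = (size_t)len;
out:
    fclose(fp);
    return buf;
}
```

A caller then only needs to check the returned pointer; the size is either
the number of bytes actually read or 0.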

Signed-off-by: Grzegorz Uriasz 
---
 hw/xen/xen_pt_load_rom.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/xen/xen_pt_load_rom.c b/hw/xen/xen_pt_load_rom.c
index a50a80837e..9f100dc159 100644
--- a/hw/xen/xen_pt_load_rom.c
+++ b/hw/xen/xen_pt_load_rom.c
@@ -38,12 +38,12 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
 fp = fopen(rom_file, "r+");
 if (fp == NULL) {
 if (errno != ENOENT) {
-error_report("pci-assign: Cannot open %s: %s", rom_file, 
strerror(errno));
+warn_report("pci-assign: Cannot open %s: %s", rom_file, 
strerror(errno));
 }
 return NULL;
 }
     if (fstat(fileno(fp), &st) == -1) {
-error_report("pci-assign: Cannot stat %s: %s", rom_file, 
strerror(errno));
+warn_report("pci-assign: Cannot stat %s: %s", rom_file, 
strerror(errno));
 goto close_rom;
 }
 
@@ -59,10 +59,9 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
 memset(ptr, 0xff, st.st_size);
 
 if (!fread(ptr, 1, st.st_size, fp)) {
-error_report("pci-assign: Cannot read from host %s", rom_file);
-error_printf("Device option ROM contents are probably invalid "
- "(check dmesg).\nSkip option ROM probe with rombar=0, "
- "or load from file with romfile=\n");
+warn_report("pci-assign: Cannot read from host %s", rom_file);
+memory_region_unref(&dev->rom);
+ptr = NULL;
 goto close_rom;
 }
 
-- 
2.26.1




[PATCH v2 0/2] Fix QEMU crashes when passing IGD to a guest VM under XEN

2020-04-28 Thread Grzegorz Uriasz
This is the v1 cover letter - the patches now include a detailed description of
the changes.

Hi,

This patch series is a small subset of a bigger patch set spanning a few
projects, aiming to isolate the GPU in QUBES OS to a dedicated security domain.
I'm doing this together with 3 colleagues as part of our Bachelor's thesis.

When passing an Intel Graphic Device to a HVM guest under XEN, QEMU sometimes
crashes when starting the VM. It turns out that the code responsible for
setting up the legacy VBIOS for the IGD contains a bug which results in a
memcpy of an undefined size between the QEMU heap and the physical memory of
the guest.

If the size of the memcpy is small enough, qemu does not crash - this means
that this bug is actually a small security issue - a hostile guest kernel might
determine the memory layout of QEMU simply by looking at physical memory beyond
0xd - this defeats ASLR and might make exploitation easier if other issues were
to be found.

The problem is the current mechanism for obtaining a copy of the ROM of the
IGD. We first allocate a buffer which holds the vbios - the size of which is
obtained from sysfs. We then try to read the rom from sysfs; if we fail, we
just return without setting the size of the buffer. This would be ok if the
size of the ROM reported by sysfs were 0, but the size is always 32 pages as
this corresponds to legacy memory ranges. It turns out that reading the ROM
fails on every single device I've tested (spanning a few generations of IGD),
which means qemu never sets the size of the buffer and returns a valid pointer
to code which basically does a memcpy of an undefined size.

I'm including two patches.
The first one fixes the security issue by making failing to read the ROM from
sysfs fatal.
The second patch introduces a better method for obtaining the VBIOS. I haven't
yet seen a single device on which the old code was working. The new code
basically creates a shadow copy directly by reading from /dev/mem - this should
be fine as a quick grep of the codebase shows that this approach is already
being used to handle MSI.
I've tested the new code on a few different laptops and it works fine, and the
guest VMs finally stopped complaining that the VBIOS tables are missing.

Grzegorz Uriasz (2):
  Fix undefined behaviour
  Improve legacy vbios handling

 hw/xen/xen_pt.c  |  8 +--
 hw/xen/xen_pt_graphics.c | 48 +---
 hw/xen/xen_pt_load_rom.c | 13 +--
 3 files changed, 57 insertions(+), 12 deletions(-)

-- 
2.26.1




Re: [PATCH 0/9] More truncate improvements

2020-04-28 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200428202905.770727-1-ebl...@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/io.o
  CC  block/create.o
/tmp/qemu-test/src/block/parallels.c: In function 'parallels_co_writev':
/tmp/qemu-test/src/block/parallels.c:218:12: error: 'ret' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
 if (ret < 0) {
^
/tmp/qemu-test/src/block/parallels.c:169:9: note: 'ret' was declared here
 int ret;
 ^~~
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: block/parallels.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=b6fe059fceb24ee6af44e5ec70624428', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-3enfstzu/src/docker-src.2020-04-28-22.21.44.9392:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=b6fe059fceb24ee6af44e5ec70624428
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-3enfstzu/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m45.121s
user    0m8.436s


The full log is available at
http://patchew.org/logs/20200428202905.770727-1-ebl...@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
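
For readers unfamiliar with this class of warning, here is a minimal sketch
of how -Wmaybe-uninitialized typically arises and is silenced. It is not the
parallels.c code: write_chunks() is a hypothetical stand-in, and whether the
same pattern applies to parallels_co_writev cannot be told from this log
alone.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a write loop. When count is 0 the loop body
 * never runs, so without the initializer below the final test would read
 * ret before any assignment; that is the situation the warning describes.
 */
static int write_chunks(const int *results, size_t count)
{
    int ret = 0;    /* defined value for the zero-iteration path */
    size_t i;

    for (i = 0; i < count; i++) {
        ret = results[i];
        if (ret < 0) {
            break;
        }
    }
    if (ret < 0) {
        return ret;
    }
    return 0;
}
```

Initializing at the declaration is the simplest fix; restructuring the code
so that every path assigns ret before the test works just as well.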

Re: [PATCH 0/9] More truncate improvements

2020-04-28 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200428202905.770727-1-ebl...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/file-posix.o
  CC  block/linux-aio.o
/tmp/qemu-test/src/block/parallels.c: In function 'parallels_co_writev':
/tmp/qemu-test/src/block/parallels.c:218:12: error: 'ret' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
 if (ret < 0) {
^
/tmp/qemu-test/src/block/parallels.c:169:9: note: 'ret' was declared here
 int ret;
 ^
cc1: all warnings being treated as errors
make: *** [block/parallels.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=17b76be9fd0a432985a07807c0fce033', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-_1eoo1y7/src/docker-src.2020-04-28-22.22.17.10859:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=17b76be9fd0a432985a07807c0fce033
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-_1eoo1y7/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    2m2.980s
user    0m8.950s


The full log is available at
http://patchew.org/logs/20200428202905.770727-1-ebl...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH for-5.1 3/7] hw/mips: Add CPU IRQ3 delivery for KVM

2020-04-28 Thread Huacai Chen
Hi, Philippe and Aleksandar,

I'm not refusing to change my patch, but I have two questions:
1. Why should we identify Loongson-3 in order to deliver IP3? Delivering
all IPs (IP2~IP7) unconditionally seems harmless as well.
2. How can Loongson-3 be identified by Config6/Config7? Loongson-3 is not
the only processor that has Config6/Config7.
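
To make the first question concrete, here is a minimal sketch of what the
quoted hunk does: it sets or clears one IP bit in Cause and forwards only
IRQ2 and IRQ3 to KVM. The helper names cause_update() and forward_to_kvm()
are hypothetical, and the value 8 for CP0Ca_IP (IP0 at bit 8 of Cause) is
stated here as an assumption rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the IP field of the Cause register starts at bit 8. */
#define CP0Ca_IP 8

/* Hypothetical helper: return Cause with the IP bit for irq set or cleared. */
static uint32_t cause_update(uint32_t cause, int irq, int level)
{
    uint32_t bit = UINT32_C(1) << (irq + CP0Ca_IP);

    return level ? (cause | bit) : (cause & ~bit);
}

/* Hypothetical helper: the condition the patch adds for the KVM path. */
static bool forward_to_kvm(int irq)
{
    return irq == 2 || irq == 3;
}
```

Delivering every IP unconditionally would amount to dropping the
forward_to_kvm() check; whether that is harmless is exactly what is being
asked, and the sketch does not answer it.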

Huacai

On Wed, Apr 29, 2020 at 2:58 AM Aleksandar Markovic
 wrote:
>
> On Tue, Apr 28, 2020 at 10:21, chen huacai wrote:
> >
> > Hi, Philippe,
> >
> > On Mon, Apr 27, 2020 at 5:57 PM Philippe Mathieu-Daudé  
> > wrote:
> > >
> > > On 4/27/20 11:33 AM, Huacai Chen wrote:
> > > > Currently, KVM/MIPS only delivers I/O interrupts via IP2. This patch
> > > > adds IP3 delivery as well, because Loongson-3 based machines use both
> > > > IRQ2 (CPU's IP2) and IRQ3 (CPU's IP3).
> > > >
> > > > Signed-off-by: Huacai Chen 
> > > > Co-developed-by: Jiaxun Yang 
> > > > ---
> > > >  hw/mips/mips_int.c | 6 ++
> > > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
> > > > index 796730b..5526219 100644
> > > > --- a/hw/mips/mips_int.c
> > > > +++ b/hw/mips/mips_int.c
> > > > @@ -48,16 +48,14 @@ static void cpu_mips_irq_request(void *opaque, int 
> > > > irq, int level)
> > > >  if (level) {
> > > >  env->CP0_Cause |= 1 << (irq + CP0Ca_IP);
> > > >
> > > > -if (kvm_enabled() && irq == 2) {
> > > > +if (kvm_enabled() && (irq == 2 || irq == 3))
> > >
> > > Shouldn't we check env->CP0_Config6 (or Config7) has the required
> > > feature first?
> > I'm sorry, but I don't understand what IRQ delivery has to do with
> > Config6/Config7. Is it to identify Loongson-3?
> >
>
> Obviously, yes.
>
> Thanks,
> Aleksandar
>
>
> > >
> > > >  kvm_mips_set_interrupt(cpu, irq, level);
> > > > -}
> > > >
> > > >  } else {
> > > >  env->CP0_Cause &= ~(1 << (irq + CP0Ca_IP));
> > > >
> > > > -if (kvm_enabled() && irq == 2) {
> > > > +if (kvm_enabled() && (irq == 2 || irq == 3))
> > > >  kvm_mips_set_interrupt(cpu, irq, level);
> > > > -}
> > > >  }
> > > >
> > > >  if (env->CP0_Cause & CP0Ca_IP_mask) {
> > > >
> >
> >
> >
> > --
> > Huacai Chen



Re: [PATCH 16/17] spapr_pci: Drop some dead error handling

2020-04-28 Thread David Gibson
On Tue, Apr 28, 2020 at 06:34:18PM +0200, Markus Armbruster wrote:
> chassis_from_bus() uses object_property_get_uint() to get property
> "chassis_nr" of the bridge device.  Failure would be a programming
> error.  Pass &error_abort, and simplify its callers.
> 
> Cc: David Gibson 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Markus Armbruster 

Acked-by: David Gibson 
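
For readers skimming the diff, the simplification can be sketched with a
stand-in error type. Everything below (Err, err_abort, err_set(),
prop_get_uint(), chassis_checked(), chassis_simple()) is hypothetical and
only mimics the convention described in the commit message; it is not
QEMU's Error API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in error object; NOT QEMU's Error type. */
typedef struct Err {
    const char *msg;
} Err;

/* Sentinel: passing &err_abort means "a failure here is a programming error". */
static Err *err_abort;

static void err_set(Err **errp, const char *msg)
{
    static Err storage;

    if (errp == &err_abort) {
        abort();                 /* caller declared that failure is a bug */
    }
    if (errp) {
        assert(*errp == NULL);   /* must point to a variable containing NULL */
        storage.msg = msg;
        *errp = &storage;
    }
}

/* Toy property getter: only "chassis_nr" exists in this sketch. */
static uint64_t prop_get_uint(const char *name, Err **errp)
{
    if (strcmp(name, "chassis_nr") != 0) {
        err_set(errp, "no such property");
        return 0;
    }
    return 3;
}

/* Before: the error is threaded through, so every caller has to check it. */
static uint8_t chassis_checked(Err **errp)
{
    return (uint8_t)prop_get_uint("chassis_nr", errp);
}

/* After: failure cannot happen unless the code is broken, so abort instead. */
static uint8_t chassis_simple(void)
{
    return (uint8_t)prop_get_uint("chassis_nr", &err_abort);
}
```

With the second form the callers lose their local error variable and the
propagate-and-return block, which is where most of the deleted lines in the
diffstat come from.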

> ---
>  hw/ppc/spapr_pci.c | 86 ++
>  1 file changed, 18 insertions(+), 68 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 1d73d05a0a..b6036be51c 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1203,46 +1203,36 @@ static SpaprDrc *drc_from_devfn(SpaprPhbState *phb,
> drc_id_from_devfn(phb, chassis, devfn));
>  }
>  
> -static uint8_t chassis_from_bus(PCIBus *bus, Error **errp)
> +static uint8_t chassis_from_bus(PCIBus *bus)
>  {
>  if (pci_bus_is_root(bus)) {
>  return 0;
>  } else {
>  PCIDevice *bridge = pci_bridge_get_device(bus);
>  
> -return object_property_get_uint(OBJECT(bridge), "chassis_nr", errp);
> +return object_property_get_uint(OBJECT(bridge), "chassis_nr",
> +&error_abort);
>  }
>  }
>  
>  static SpaprDrc *drc_from_dev(SpaprPhbState *phb, PCIDevice *dev)
>  {
> -Error *local_err = NULL;
> -uint8_t chassis = chassis_from_bus(pci_get_bus(dev), &local_err);
> -
> -if (local_err) {
> -error_report_err(local_err);
> -return NULL;
> -}
> +uint8_t chassis = chassis_from_bus(pci_get_bus(dev));
>  
>  return drc_from_devfn(phb, chassis, dev->devfn);
>  }
>  
> -static void add_drcs(SpaprPhbState *phb, PCIBus *bus, Error **errp)
> +static void add_drcs(SpaprPhbState *phb, PCIBus *bus)
>  {
>  Object *owner;
>  int i;
>  uint8_t chassis;
> -Error *local_err = NULL;
>  
>  if (!phb->dr_enabled) {
>  return;
>  }
>  
> -chassis = chassis_from_bus(bus, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> +chassis = chassis_from_bus(bus);
>  
>  if (pci_bus_is_root(bus)) {
>  owner = OBJECT(phb);
> @@ -1256,21 +1246,16 @@ static void add_drcs(SpaprPhbState *phb, PCIBus *bus, 
> Error **errp)
>  }
>  }
>  
> -static void remove_drcs(SpaprPhbState *phb, PCIBus *bus, Error **errp)
> +static void remove_drcs(SpaprPhbState *phb, PCIBus *bus)
>  {
>  int i;
>  uint8_t chassis;
> -Error *local_err = NULL;
>  
>  if (!phb->dr_enabled) {
>  return;
>  }
>  
> -chassis = chassis_from_bus(bus, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> +chassis = chassis_from_bus(bus);
>  
>  for (i = PCI_SLOT_MAX * PCI_FUNC_MAX - 1; i >= 0; i--) {
>  SpaprDrc *drc = drc_from_devfn(phb, chassis, i);
> @@ -1488,17 +1473,11 @@ int spapr_pci_dt_populate(SpaprDrc *drc, 
> SpaprMachineState *spapr,
>  }
>  
>  static void spapr_pci_bridge_plug(SpaprPhbState *phb,
> -  PCIBridge *bridge,
> -  Error **errp)
> +  PCIBridge *bridge)
>  {
> -Error *local_err = NULL;
>  PCIBus *bus = pci_bridge_get_sec_bus(bridge);
>  
> -add_drcs(phb, bus, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> +add_drcs(phb, bus);
>  }
>  
>  static void spapr_pci_plug(HotplugHandler *plug_handler,
> @@ -1529,11 +1508,7 @@ static void spapr_pci_plug(HotplugHandler 
> *plug_handler,
>  g_assert(drc);
>  
>  if (pc->is_bridge) {
> -spapr_pci_bridge_plug(phb, PCI_BRIDGE(plugged_dev), &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> +spapr_pci_bridge_plug(phb, PCI_BRIDGE(plugged_dev));
>  }
>  
>  /* Following the QEMU convention used for PCIe multifunction
> @@ -1560,12 +1535,7 @@ static void spapr_pci_plug(HotplugHandler 
> *plug_handler,
>  spapr_drc_reset(drc);
>  } else if (PCI_FUNC(pdev->devfn) == 0) {
>  int i;
> -uint8_t chassis = chassis_from_bus(pci_get_bus(pdev), &local_err);
> -
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> +uint8_t chassis = chassis_from_bus(pci_get_bus(pdev));
>  
>  for (i = 0; i < 8; i++) {
>  SpaprDrc *func_drc;
> @@ -1587,17 +1557,11 @@ out:
>  }
>  
>  static void spapr_pci_bridge_unplug(SpaprPhbState *phb,
> -PCIBridge *bridge,
> -Error **errp)
> +PCIBridge *bridge)
>  {
> -Error *local_err = NULL;
>  PCIBus *bus = pci_bridge_get_sec_bus(bridge);
>  
> -remove_drcs(phb, bus, &local_err);
> -if (local_err) {
> -

Re: [PATCH for-5.1 6/7] hw/mips: Add Loongson-3 machine support (with KVM)

2020-04-28 Thread Huacai Chen
Hi, Aleksandar,

On Wed, Apr 29, 2020 at 3:23 AM Aleksandar Markovic
 wrote:
>
> Hi, Huacai.
>
> Please expand commit message with the description of the machine
> internal organization (several paragraphs).
>
> Also, please include command line for starting the machine. More than
> one example is better than only one.
>
> Specifically, can you explicitly say what is your KVM setup, so that
> anyone could repro it?
>
> Good health to people from China!
>
Thank you very much, I will improve that in V2.

> Yours,
> Aleksandar
>
> On Mon, Apr 27, 2020 at 11:36, Huacai Chen wrote:
> >
> > Add Loongson-3 based machine support. It uses i8259 as the interrupt
> > controller and GPEX as the PCI controller. Currently it can only
> > work with KVM, but we will add TCG support in the future.
> >
> > Signed-off-by: Huacai Chen 
> > Co-developed-by: Jiaxun Yang 
> > ---
> >  default-configs/mips64el-softmmu.mak |   1 +
> >  hw/mips/Kconfig  |  10 +
> >  hw/mips/Makefile.objs|   1 +
> >  hw/mips/mips_loongson3.c | 869 
> > +++
> >  4 files changed, 881 insertions(+)
> >  create mode 100644 hw/mips/mips_loongson3.c
> >
> > diff --git a/default-configs/mips64el-softmmu.mak 
> > b/default-configs/mips64el-softmmu.mak
> > index 8b0c9b1..fc798e4 100644
> > --- a/default-configs/mips64el-softmmu.mak
> > +++ b/default-configs/mips64el-softmmu.mak
> > @@ -3,6 +3,7 @@
> >  include mips-softmmu-common.mak
> >  CONFIG_IDE_VIA=y
> >  CONFIG_FULONG=y
> > +CONFIG_LOONGSON3=y
> >  CONFIG_ATI_VGA=y
> >  CONFIG_RTL8139_PCI=y
> >  CONFIG_JAZZ=y
> > diff --git a/hw/mips/Kconfig b/hw/mips/Kconfig
> > index 2c2adbc..6f16b16 100644
> > --- a/hw/mips/Kconfig
> > +++ b/hw/mips/Kconfig
> > @@ -44,6 +44,16 @@ config JAZZ
> >  config FULONG
> >  bool
> >
> > +config LOONGSON3
> > +bool
> > +select PCKBD
> > +select SERIAL
> > +select ISA_BUS
> > +select PCI_EXPRESS_GENERIC_BRIDGE
> > +select VIRTIO_VGA
> > +select QXL if SPICE
> > +select MSI_NONBROKEN
> > +
> >  config MIPS_CPS
> >  bool
> >  select PTIMER
> > diff --git a/hw/mips/Makefile.objs b/hw/mips/Makefile.objs
> > index 2f7795b..f9bc8f5 100644
> > --- a/hw/mips/Makefile.objs
> > +++ b/hw/mips/Makefile.objs
> > @@ -4,5 +4,6 @@ obj-$(CONFIG_MALTA) += gt64xxx_pci.o mips_malta.o
> >  obj-$(CONFIG_MIPSSIM) += mips_mipssim.o
> >  obj-$(CONFIG_JAZZ) += mips_jazz.o
> >  obj-$(CONFIG_FULONG) += mips_fulong2e.o
> > +obj-$(CONFIG_LOONGSON3) += mips_loongson3.o
> >  obj-$(CONFIG_MIPS_CPS) += cps.o
> >  obj-$(CONFIG_MIPS_BOSTON) += boston.o
> > diff --git a/hw/mips/mips_loongson3.c b/hw/mips/mips_loongson3.c
> > new file mode 100644
> > index 000..a45c9ec
> > --- /dev/null
> > +++ b/hw/mips/mips_loongson3.c
> > @@ -0,0 +1,869 @@
> > +/*
> > + * Generic Loongson-3 Platform support
> > + *
> > + * Copyright (c) 2015-2020 Huacai Chen (che...@lemote.com)
> > + * This code is licensed under the GNU GPL v2.
> > + *
> > + * Contributions are licensed under the terms of the GNU GPL,
> > + * version 2 or (at your option) any later version.
> > + */
> > +
> > +/*
> > + * Generic PC Platform based on Loongson-3 CPU (MIPS64R2 with extensions,
> > + * 800~2000MHz)
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu-common.h"
> > +#include "qemu/units.h"
> > +#include "qapi/error.h"
> > +#include "cpu.h"
> > +#include "elf.h"
> > +#include "hw/boards.h"
> > +#include "hw/block/flash.h"
> > +#include "hw/char/serial.h"
> > +#include "hw/mips/mips.h"
> > +#include "hw/mips/cpudevs.h"
> > +#include "hw/intc/i8259.h"
> > +#include "hw/loader.h"
> > +#include "hw/ide.h"
> > +#include "hw/isa/superio.h"
> > +#include "hw/pci/msi.h"
> > +#include "hw/pci/pci.h"
> > +#include "hw/pci/pci_host.h"
> > +#include "hw/pci-host/gpex.h"
> > +#include "hw/rtc/mc146818rtc.h"
> > +#include "net/net.h"
> > +#include "exec/address-spaces.h"
> > +#include "sysemu/qtest.h"
> > +#include "sysemu/reset.h"
> > +#include "sysemu/runstate.h"
> > +#include "qemu/log.h"
> > +#include "qemu/error-report.h"
> > +
> > +#define INITRD_OFFSET  0x0400
> > +#define BOOTPARAM_ADDR 0x8ff0
> > +#define BOOTPARAM_PHYADDR  0x0ff0
> > +#define CFG_ADDR   0x0f10
> > +#define FW_CONF_ADDR   0x0fff
> > +#define PM_MMIO_ADDR   0x1008
> > +#define PM_MMIO_SIZE   0x100
> > +#define PM_CNTL_MODE   0x10
> > +
> > +#define PHYS_TO_VIRT(x) ((x) | ~(target_ulong)0x7fff)
> > +
> > +/* Loongson-3 has a 2MB flash rom */
> > +#define BIOS_SIZE   (2 * MiB)
> > +#define LOONGSON_MAX_VCPUS  16
> > +
> > +#define LOONGSON3_BIOSNAME "bios_loongson3.bin"
> > +
> > +#define PCIE_IRQ_BASE 3
> > +
> > +#define VIRT_PCI_IO_BASE0x1800ul
> > +#define VIRT_PCI_IO_SIZE0x000cul
> > +#define VIRT_PCI_MEM_BASE   0x4000ul
> > +#define VIRT_PCI_MEM_SIZE   0x4000ul
> > +#define VIRT_PCI_ECAM_BASE  0x1a00ul
> > 

Re: [PATCH for-5.1 5/7] target/mips: Add more CP0 register for save/restore

2020-04-28 Thread Huacai Chen
Hi, Aleksandar,

On Wed, Apr 29, 2020 at 3:10 AM Aleksandar Markovic
 wrote:
>
> On Mon, Apr 27, 2020 at 11:36, Huacai Chen wrote:
> >
> > Add more CP0 registers for save/restore, including: EBase, XContext,
> > PageGrain, PWBase, PWSize, PWField, PWCtl, Config*, KScratch1~KScratch6.
> >
> > Signed-off-by: Huacai Chen 
> > Co-developed-by: Jiaxun Yang 
> > ---
> >  target/mips/kvm.c | 212 
> > ++
> >  target/mips/machine.c |   2 +
> >  2 files changed, 214 insertions(+)
> >
> > diff --git a/target/mips/kvm.c b/target/mips/kvm.c
> > index de3e26e..96cfa10 100644
> > --- a/target/mips/kvm.c
> > +++ b/target/mips/kvm.c
> > @@ -245,10 +245,16 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, 
> > int level)
> >  (KVM_REG_MIPS_CP0 | KVM_REG_SIZE_U64 | (8 * (_R) + (_S)))
> >
> >  #define KVM_REG_MIPS_CP0_INDEX  MIPS_CP0_32(0, 0)
> > +#define KVM_REG_MIPS_CP0_RANDOM MIPS_CP0_32(1, 0)
> >  #define KVM_REG_MIPS_CP0_CONTEXT MIPS_CP0_64(4, 0)
> >  #define KVM_REG_MIPS_CP0_USERLOCAL  MIPS_CP0_64(4, 2)
> >  #define KVM_REG_MIPS_CP0_PAGEMASK   MIPS_CP0_32(5, 0)
> > +#define KVM_REG_MIPS_CP0_PAGEGRAIN  MIPS_CP0_32(5, 1)
> > +#define KVM_REG_MIPS_CP0_PWBASE MIPS_CP0_64(5, 5)
> > +#define KVM_REG_MIPS_CP0_PWFIELD MIPS_CP0_64(5, 6)
> > +#define KVM_REG_MIPS_CP0_PWSIZE MIPS_CP0_64(5, 7)
> >  #define KVM_REG_MIPS_CP0_WIRED  MIPS_CP0_32(6, 0)
> > +#define KVM_REG_MIPS_CP0_PWCTL  MIPS_CP0_32(6, 6)
> >  #define KVM_REG_MIPS_CP0_HWRENA MIPS_CP0_32(7, 0)
> >  #define KVM_REG_MIPS_CP0_BADVADDR   MIPS_CP0_64(8, 0)
> >  #define KVM_REG_MIPS_CP0_COUNT  MIPS_CP0_32(9, 0)
> > @@ -258,13 +264,22 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, 
> > int level)
> >  #define KVM_REG_MIPS_CP0_CAUSE  MIPS_CP0_32(13, 0)
> >  #define KVM_REG_MIPS_CP0_EPC MIPS_CP0_64(14, 0)
> >  #define KVM_REG_MIPS_CP0_PRID   MIPS_CP0_32(15, 0)
> > +#define KVM_REG_MIPS_CP0_EBASE  MIPS_CP0_64(15, 1)
> >  #define KVM_REG_MIPS_CP0_CONFIG MIPS_CP0_32(16, 0)
> >  #define KVM_REG_MIPS_CP0_CONFIG1 MIPS_CP0_32(16, 1)
> >  #define KVM_REG_MIPS_CP0_CONFIG2 MIPS_CP0_32(16, 2)
> >  #define KVM_REG_MIPS_CP0_CONFIG3 MIPS_CP0_32(16, 3)
> >  #define KVM_REG_MIPS_CP0_CONFIG4 MIPS_CP0_32(16, 4)
> >  #define KVM_REG_MIPS_CP0_CONFIG5 MIPS_CP0_32(16, 5)
> > +#define KVM_REG_MIPS_CP0_CONFIG6 MIPS_CP0_32(16, 6)
> > +#define KVM_REG_MIPS_CP0_XCONTEXT   MIPS_CP0_64(20, 0)
> >  #define KVM_REG_MIPS_CP0_ERROREPC   MIPS_CP0_64(30, 0)
> > +#define KVM_REG_MIPS_CP0_KSCRATCH1  MIPS_CP0_64(31, 2)
> > +#define KVM_REG_MIPS_CP0_KSCRATCH2  MIPS_CP0_64(31, 3)
> > +#define KVM_REG_MIPS_CP0_KSCRATCH3  MIPS_CP0_64(31, 4)
> > +#define KVM_REG_MIPS_CP0_KSCRATCH4  MIPS_CP0_64(31, 5)
> > +#define KVM_REG_MIPS_CP0_KSCRATCH5  MIPS_CP0_64(31, 6)
> > +#define KVM_REG_MIPS_CP0_KSCRATCH6  MIPS_CP0_64(31, 7)
> >
> >  static inline int kvm_mips_put_one_reg(CPUState *cs, uint64_t reg_id,
> > int32_t *addr)
> > @@ -394,6 +409,29 @@ static inline int kvm_mips_get_one_ureg64(CPUState 
> > *cs, uint64_t reg_id,
> >   (1U << CP0C5_UFE) | \
> >   (1U << CP0C5_FRE) | \
> >   (1U << CP0C5_UFR))
> > +#define KVM_REG_MIPS_CP0_CONFIG6_MASK   ((1U << CP0C6_BPPASS) | \
> > + (0x3fU << CP0C6_KPOS) | \
> > + (1U << CP0C6_KE) | \
> > + (1U << CP0C6_VTLBONLY) | \
> > + (1U << CP0C6_LASX) | \
> > + (1U << CP0C6_SSEN) | \
> > + (1U << CP0C6_DISDRTIME) | \
> > + (1U << CP0C6_PIXNUEN) | \
> > + (1U << CP0C6_SCRAND) | \
> > + (1U << CP0C6_LLEXCEN) | \
> > + (1U << CP0C6_DISVC) | \
> > + (1U << CP0C6_VCLRU) | \
> > + (1U << CP0C6_DCLRU) | \
> > + (1U << CP0C6_PIXUEN) | \
> > + (1U << CP0C6_DISBLKLYEN) | \
> > + (1U << CP0C6_UMEMUALEN) | \
> > + (1U << CP0C6_SFBEN) | \
> > + (1U << CP0C6_FLTINT) | \
> > + (1U << CP0C6_VLTINT) | \
> > + (1U << CP0C6_DISBTB) | \
> > + (3U << CP0C6_STPREFCTL) | \
> > +  

Re: [PATCH V2] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-04-28 Thread Ani Sinha
On Wed, Apr 29, 2020 at 2:15 AM Michael S. Tsirkin  wrote:

> On Tue, Apr 28, 2020 at 10:10:18PM +0530, Ani Sinha wrote:
> >
> >
> > On Tue, Apr 28, 2020 at 9:51 PM Michael S. Tsirkin wrote:
> >
> > On Tue, Apr 28, 2020 at 09:39:16PM +0530, Ani Sinha wrote:
> > >
> > > Ani
> > > On Apr 28, 2020, 21:35 +0530, Michael S. Tsirkin wrote:
> > >
> > > On Tue, Apr 28, 2020 at 10:16:52AM +, Ani Sinha wrote:
> > >
> > > A new option "use_acpi_unplug" is introduced for PIIX which will
> > > selectively only disable hot unplugging of both hot plugged and
> > > cold plugged PCI devices on non-root PCI buses. This will prevent
> > > hot unplugging of devices from Windows based guests from system
> > > tray but will not prevent devices from being hot plugged into the
> > > guest.
> > >
> > > It has been tested on Windows guests.
> > >
> > > Signed-off-by: Ani Sinha 
> > >
> > >
> > > It's still a non starter until we find something similar for PCIE and
> > > SHPC. Do guests check command status? Can some unplug commands fail?
> > >
> > >
> > > Ok I  give up! I thought we debated this on the other thread.
> >
> > Sorry to hear that.
> > I'd rather you didn't, and worked on a solution that works for everyone.
> >
> >
> > That is extremely hard for one person to do, without inputs and ideas
> > from the community.
>
> What kind of input are you looking for?


Well there were several discussions in the other thread around how PCIE
behaves and how we can't change the slot features without a HW reset. Those
were useful inputs.

The approach you are taking as a maintainer is very discouraging. All I
have gotten from you is negativity and push back. There has been no other
engagement from you. If you expect one person to fix every use case, that
is an unrealistic expectation IMHO. Even if I could come up with a solution
for every case, testing every use case is a huge investment in time and
effort.  No one outside the big distros will be motivated to do that. So
involvement from outside the distro community will be limited to minor
changes, bug fixes and easy code reworks.

My 2 cents.


>
> > Satisfying the entire world requires lot of time and energy
> > investment, not to mention a broad expertise in multiple technologies.
> >
> >
> >
> > Pushing back on merging code is unfortunately the only mechanism
> > maintainers have to make sure features are complete and
> > orthogonal to each other, so I'm not sure I can help otherwise.
> >
> > >
> > >
> > >
> > >
> > > ---
> > > hw/acpi/piix4.c | 3 +++
> > > hw/i386/acpi-build.c | 40
> > ++--
> > > 2 files changed, 29 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> > > index 964d6f5..59fa707 100644
> > > --- a/hw/acpi/piix4.c
> > > +++ b/hw/acpi/piix4.c
> > > @@ -78,6 +78,7 @@ typedef struct PIIX4PMState {
> > >
> > > AcpiPciHpState acpi_pci_hotplug;
> > > bool use_acpi_pci_hotplug;
> > > + bool use_acpi_unplug;
> > >
> > > uint8_t disable_s3;
> > > uint8_t disable_s4;
> > > @@ -633,6 +634,8 @@ static Property piix4_pm_properties[]
> = {
> > > DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState,
> s4_val, 2),
> > > DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support",
> > PIIX4PMState,
> > > use_acpi_pci_hotplug, true),
> > > + DEFINE_PROP_BOOL("acpi-pci-hotunplug-enable-bridge",
> > PIIX4PMState,
> > > + use_acpi_unplug, true),
> > > DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
> > > acpi_memory_hotplug.is_enabled, true),
> > > DEFINE_PROP_END_OF_LIST(),
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index 23c77ee..71b3ac3 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> > > bool s3_disabled;
> > > bool s4_disabled;
> > > bool pcihp_bridge_en;
> > > + bool pcihup_bridge_en;
> > > uint8_t s4_val;
> > > AcpiFadtData fadt;
> > > uint16_t cpu_hp_io_base;
> > > @@ -240,6 +241,9 @@ static void
> acpi_get_pm_info(MachineState
> > *machine,
> > > AcpiPmInfo *pm)
> > > pm->pcihp_bridge_en =
> > > object_property_get_bool(obj,
> > "acpi-pci-hotplug-with-bridge-support",
> > > NULL);
> > > + pm->pcihup_bridge_en =
> > > + 

Re: [PATCH v2] migration/xbzrle: add encoding rate

2020-04-28 Thread Wei Wang

On 04/28/2020 10:51 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

Users may need to check the xbzrle encoding rate to know if the guest
memory is xbzrle encoding-friendly, and dynamically turn off the
encoding if the encoding rate is low.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 
---
  migration/migration.c |  1 +
  migration/ram.c   | 38 --
  monitor/hmp-cmds.c|  2 ++
  qapi/migration.json   |  5 -
  4 files changed, 43 insertions(+), 3 deletions(-)

ChangeLog:
- include the 3 bytes (ENCODING_FLAG_XBZRLE flag and encoded_len) when
   calculating the encoding rate. Similar to the compress rate
   calculation, the 8 byte RAM_SAVE_FLAG_CONTINUE flag isn't included in
   the calculation.

diff --git a/migration/migration.c b/migration/migration.c
index 187ac04..e404213 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -930,6 +930,7 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
  info->xbzrle_cache->pages = xbzrle_counters.pages;
  info->xbzrle_cache->cache_miss = xbzrle_counters.cache_miss;
  info->xbzrle_cache->cache_miss_rate = xbzrle_counters.cache_miss_rate;
+info->xbzrle_cache->encoding_rate = xbzrle_counters.encoding_rate;
  info->xbzrle_cache->overflow = xbzrle_counters.overflow;
  }
  
diff --git a/migration/ram.c b/migration/ram.c

index 04f13fe..f46ab96 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -327,6 +327,10 @@ struct RAMState {
  uint64_t num_dirty_pages_period;
  /* xbzrle misses since the beginning of the period */
  uint64_t xbzrle_cache_miss_prev;
+/* Amount of xbzrle pages since the beginning of the period */
+uint64_t xbzrle_pages_prev;
+/* Amount of xbzrle encoded bytes since the beginning of the period */
+uint64_t xbzrle_bytes_prev;
  
  /* compression statistics since the beginning of the period */

  /* amount of count that no free thread to compress data */
@@ -696,6 +700,18 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
  return -1;
  }
  
+/*

+ * Reaching here means the page has hit the xbzrle cache, no matter what
+ * encoding result it is (normal encoding, overflow or skipping the page),
+ * count the page as encoded. This is used to calculate the encoding rate.
+ *
+ * Example: 2 pages (8KB) being encoded, first page encoding generates 2KB,
+ * 2nd page turns out to be skipped (i.e. no new bytes written to the
+ * page), the overall encoding rate will be 8KB / 2KB = 4, which has the
+ * skipped page included. In this way, the encoding rate can tell if the
+ * guest page is good for xbzrle encoding.
+ */
+xbzrle_counters.pages++;

Can you explain how that works with overflow?  Doesn't overflow return
-1 here, not increment the bytes, so it looks like you've xbzrle'd a
page, but the encoding rate hasn't incremented.


OK. How about adding below before returning -1 (for the overflow case):

...
xbzrle_counters.bytes += TARGET_PAGE_SIZE;
return -1;

Example: if we have 2 pages encoded as below:
4KB --> after normal encoding: 2KB
4KB --> after overflow: 4KB (will be sent as non-encoded page)
then the encoding rate is 8KB / 6KB = ~1.3
(if we skip the counting of the overflow case,
the encoding rate will be 4KB / 2KB = 2. Users may think that's
good to go with xbzrle, unless they do another calculation with
checking the overflow numbers themselves)
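
The arithmetic above can be sketched as follows. The struct and helper
names (xbzrle_stats, account_page(), encoding_rate()) are hypothetical and
only mirror the two counters under discussion; a 4KB page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KB target pages, as in the example above. */
#define PAGE_SIZE_BYTES 4096u

/* Hypothetical counters; they only mirror the pages/bytes pair discussed. */
struct xbzrle_stats {
    uint64_t pages;   /* pages that hit the xbzrle cache */
    uint64_t bytes;   /* bytes sent for those pages */
};

/* Count one page and the bytes it cost (a full page on overflow). */
static void account_page(struct xbzrle_stats *s, uint64_t sent_bytes)
{
    s->pages++;
    s->bytes += sent_bytes;
}

/* Encoding rate = raw size of the counted pages / bytes actually sent. */
static double encoding_rate(const struct xbzrle_stats *s)
{
    if (s->bytes == 0) {
        return 0.0;   /* nothing sent yet; avoid dividing by zero */
    }
    return (double)(s->pages * PAGE_SIZE_BYTES) / (double)s->bytes;
}
```

Accounting 2048 bytes for the first page and 4096 for the overflowed one
gives 8192 / 6144, i.e. the ~1.3 above; counting only the first page gives
4096 / 2048 = 2.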

Best,
Wei



Re: [PATCH 00/17] qom: Spring cleaning

2020-04-28 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200428163419.4483-1-arm...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH 00/17] qom: Spring cleaning
Message-id: 20200428163419.4483-1-arm...@redhat.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
1e7e79e qom: Drop @errp parameter of object_property_del()
787ab69 spapr_pci: Drop some dead error handling
4ae2109 qdev: Unrealize must not fail
31f0921 Drop more @errp parameters after previous commit
7ab7aa4 qom: Drop parameter @errp of object_property_add() & friends
3413d9b qdev: Clean up qdev_connect_gpio_out_named()
bbdfd24 hw/arm/bcm2835: Drop futile attempts at QOM-adopting memory
4c03b90 e1000: Don't run e1000_instance_init() twice
78ab4cf hw/isa/superio: Make the components QOM children
df98931 s390x/cpumodel: Fix UI to CPU features pcc-cmac-{aes, eaes}-256
8f252d2 tests/check-qom-proplist: Improve iterator coverage
b4f77da qom: Drop object_property_set_description() parameter @errp
993374e qom: Make all the object_property_add_FOO() return the property
4c6606e qom: Change object_property_get_uint16List() to match its doc
78efad6 qom: Drop object_property_del_child()'s unused parameter @errp
10c2151 qom: Clean up inconsistent use of gchar * vs. char *
a5d5e35 qom: Clearer reference counting in object_initialize_childv()

=== OUTPUT BEGIN ===
1/17 Checking commit a5d5e35f75b2 (qom: Clearer reference counting in 
object_initialize_childv())
2/17 Checking commit 10c2151f5621 (qom: Clean up inconsistent use of gchar * 
vs. char *)
3/17 Checking commit 78efad64bd9a (qom: Drop object_property_del_child()'s 
unused parameter @errp)
4/17 Checking commit 4c6606e56d70 (qom: Change object_property_get_uint16List() 
to match its doc)
5/17 Checking commit 993374e1cb51 (qom: Make all the object_property_add_FOO() 
return the property)
6/17 Checking commit b4f77daab5f8 (qom: Drop object_property_set_description() 
parameter @errp)
7/17 Checking commit 8f252d2131c6 (tests/check-qom-proplist: Improve iterator 
coverage)
8/17 Checking commit df9893154598 (s390x/cpumodel: Fix UI to CPU features 
pcc-cmac-{aes, eaes}-256)
ERROR: line over 90 characters
#54: FILE: target/s390x/cpu_features_def.inc.h:313:
+DEF_FEAT(PCC_CMAC_AES_256, "pcc-cmac-aes-256", PCC, 20, "PCC 
Compute-Last-Block-CMAC-Using-AES-256")

total: 1 errors, 0 warnings, 8 lines checked

Patch 8/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

9/17 Checking commit 78ab4cf91a3b (hw/isa/superio: Make the components QOM 
children)
10/17 Checking commit 4c03b90970d8 (e1000: Don't run e1000_instance_init() 
twice)
11/17 Checking commit bbdfd2486f47 (hw/arm/bcm2835: Drop futile attempts at 
QOM-adopting memory)
12/17 Checking commit 3413d9b7f4e7 (qdev: Clean up 
qdev_connect_gpio_out_named())
13/17 Checking commit 7ab7aa47a97d (qom: Drop parameter @errp of 
object_property_add() & friends)
WARNING: line over 80 characters
#207: FILE: backends/hostmem-file.c:187:
+file_memory_backend_get_discard_data, 
file_memory_backend_set_discard_data);

WARNING: line over 80 characters
#1078: FILE: hw/arm/raspi.c:287:
+object_property_add_const_link(OBJECT(>soc), "ram", 
OBJECT(machine->ram));

WARNING: line over 80 characters
#3084: FILE: hw/ppc/spapr.c:3346:
+   >kernel_addr, 
OBJ_PROP_FLAG_READWRITE);

total: 0 errors, 3 warnings, 4457 lines checked

Patch 13/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
14/17 Checking commit 31f09214e529 (Drop more @errp parameters after previous 
commit)
15/17 Checking commit 4ae21090aaa6 (qdev: Unrealize must not fail)
16/17 Checking commit 787ab6991f71 (spapr_pci: Drop some dead error handling)
17/17 Checking commit 1e7e79e14e27 (qom: Drop @errp parameter of 
object_property_del())
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200428163419.4483-1-arm...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Bug 1875762] [NEW] Poor disk performance on sparse VMDKs

2020-04-28 Thread Alan Murtagh
Public bug reported:

Found in QEMU 4.1, and reproduced on master.

QEMU appears to suffer from remarkably poor disk performance when
writing to sparse-extent VMDKs. Of course it's to be expected that
allocation takes time and sparse VMDKs perform worse than allocated
VMDKs, but surely not on the orders of magnitude I'm observing. On my
system, the fully allocated write speeds are approximately 1.5GB/s,
while the fully sparse write speeds can be as low as 10MB/s. I've
noticed that adding "cache unsafe" reduces the issue dramatically,
bringing speeds up to around 750MB/s. I don't know if this is still slow
or if this perhaps reveals a problem with the default caching method.

To reproduce the issue I've attached two 4GiB VMDKs. Both are completely
empty and both are technically sparse-extent VMDKs, but one is 100%
pre-allocated and the other is 100% unallocated. If you attach these VMDKs
as second and third disks to an Ubuntu VM running on QEMU (with KVM) and
measure their write performance (using dd to write to /dev/sdb and
/dev/sdc for example) the difference in write speeds is clear.

For what it's worth, the flags I'm using that relate to the VMDK are as
follows:

`-drive if=none,file=sparse.vmdk,id=hd0,format=vmdk -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd0`

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: "Two different empty VMDKs with vastly different 
performance."
   https://bugs.launchpad.net/bugs/1875762/+attachment/5363023/+files/vmdks.zip

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1875762

Title:
  Poor disk performance on sparse VMDKs

Status in QEMU:
  New

Bug description:
  Found in QEMU 4.1, and reproduced on master.

  QEMU appears to suffer from remarkably poor disk performance when
  writing to sparse-extent VMDKs. Of course it's to be expected that
  allocation takes time and sparse VMDKs perform worse than allocated
  VMDKs, but surely not on the orders of magnitude I'm observing. On my
  system, the fully allocated write speeds are approximately 1.5GB/s,
  while the fully sparse write speeds can be as low as 10MB/s. I've
  noticed that adding "cache unsafe" reduces the issue dramatically,
  bringing speeds up to around 750MB/s. I don't know if this is still
  slow or if this perhaps reveals a problem with the default caching
  method.

  To reproduce the issue I've attached two 4GiB VMDKs. Both are
  completely empty and both are technically sparse-extent VMDKs, but one
  is 100% pre-allocated and the other is 100% unallocated. If you attach
  these VMDKs as second and third disks to an Ubuntu VM running on QEMU
  (with KVM) and measure their write performance (using dd to write to
  /dev/sdb and /dev/sdc for example) the difference in write speeds is
  clear.

  For what it's worth, the flags I'm using that relate to the VMDK are
  as follows:

  `-drive if=none,file=sparse.vmdk,id=hd0,format=vmdk -device virtio-scsi-pci,id=scsi -device scsi-hd,drive=hd0`

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1875762/+subscriptions



Re: [PATCH] block: Comment cleanups

2020-04-28 Thread Alberto Garcia
On Tue 28 Apr 2020 11:38:07 PM CEST, Eric Blake wrote:
> It's been a while since we got rid of the sector-based bdrv_read and
> bdrv_write (commit 2e11d756); let's finish the job on a few remaining
> comments.
>
> Signed-off-by: Eric Blake 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v2 01/17] block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes

2020-04-28 Thread Eric Blake

On 4/27/20 3:23 AM, Vladimir Sementsov-Ogievskiy wrote:

The function is called from 64bit io handlers, and bytes is just passed
to throttle_account() which is 64bit too (unsigned though). So, let's
convert intermediate argument to 64bit too.


My audit for this patch:

Caller has 32-bit, this patch now causes widening which is safe:
block/block-backend.c: blk_do_preadv() passes 'unsigned int'
block/block-backend.c: blk_do_pwritev_part() passes 'unsigned int'
block/throttle.c: throttle_co_pwrite_zeroes() passes 'int'
block/throttle.c: throttle_co_pdiscard() passes 'int'

Caller has 64-bit, this patch fixes potential bug where pre-patch could 
narrow, except it's easy enough to trace that callers are still capped 
at 2G actions:

block/throttle.c: throttle_co_preadv() passes 'uint64_t'
block/throttle.c: throttle_co_pwritev() passes 'uint64_t'

Implementation in question: block/throttle-groups.c 
throttle_group_co_io_limits_intercept() takes 'unsigned int bytes' and 
uses it:

argument to util/throttle.c throttle_account(uint64_t)

All safe: it patches a latent bug, and does not introduce any 64-bit 
gotchas once throttle_co_p{read,write}v are relaxed, and assuming 
throttle_account() is not buggy.
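
To illustrate the audit above, here is a minimal, self-contained C sketch of why widening the parameter is safe for the 32-bit callers, while the old narrow parameter could silently drop high bits from a 64-bit caller. The function names are hypothetical stand-ins, not QEMU functions; using int64_t as the widened type is an assumption, and the sketch assumes 'unsigned int' is 32 bits wide.

```c
#include <stdint.h>
#include <limits.h>

/* Hypothetical stand-in for the pre-patch signature ('unsigned int bytes'). */
static uint64_t account_narrow(unsigned int bytes)
{
    return bytes;
}

/* Hypothetical stand-in for the post-patch signature (64-bit 'bytes'). */
static uint64_t account_wide(int64_t bytes)
{
    return (uint64_t)bytes;
}

/*
 * A 64-bit caller feeding the narrow function: the explicit cast spells out
 * the conversion that would otherwise happen implicitly at the call site.
 * For unsigned types the value is reduced modulo 2^32 (assuming a 32-bit
 * 'unsigned int'), so anything above UINT_MAX loses its high bits.
 */
static uint64_t narrow_from_u64(uint64_t bytes)
{
    return account_narrow((unsigned int)bytes);
}
```

Passing an 'int' or 'unsigned int' to account_wide() preserves the value, since every 32-bit value fits in int64_t. narrow_from_u64() shows the latent bug, which stays harmless only as long as callers are capped below 4G.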




Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/throttle-groups.h | 2 +-
  block/throttle-groups.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[ANNOUNCE] QEMU 5.0.0 is now available

2020-04-28 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of
the QEMU 5.0.0 release. This release contains 2800+ commits from 232
authors.

You can grab the tarball from our download page here:

  https://www.qemu.org/download/#source

The full list of changes is available at:

  https://wiki.qemu.org/ChangeLog/5.0

Highlights include:

 * Support for passing host filesystem directory to guest via virtiofsd
 * Live migration support for external processes running on QEMU D-Bus
 * Support for using memory backends for main/"built-in" guest RAM
 * block: support for compressed backup images via block jobs
 * block: qemu-img: 'measure' command now supports LUKS images, 'convert'
   command now supports skipping zero'ing of target image
 * block: experimental support for qemu-storage-daemon, which provides access
   to QEMU block-layer/QMP features like block jobs or built-in NBD server
   without starting a full VM

 * ARM: support for the following architecture features:
     ARMv8.1 VHE/VMID16/PAN/PMU
     ARMv8.2 UAO/DCPoP/ATS1E1/TTCNP
     ARMv8.3 RCPC/CCIDX
     ARMv8.4 PMU/RCPC
 * ARM: support for Cortex-M7 CPU
 * ARM: new board support for tacoma-bmc, Netduino Plus 2, and Orangepi PC
 * ARM: 'virt' machine now supports vTPM and virtio-iommu devices
 * HPPA: graphical console support via HP Artist graphics device
 * MIPS: support for GINVT (global TLB invalidation) instruction
 * PowerPC: 'pseries' machine no longer requires reboot to negotiate between
   XIVE/XICS interrupt controllers when ic-mode=dual
 * PowerPC: 'powernv' machine can now emulate KVM hardware acceleration to run
   KVM guests while in TCG mode
 * PowerPC: support for file-backed NVDIMMs for persistent memory emulation
 * RISC-V: 'virt' and 'sifive_u' boards now support generic syscon drivers in
   Linux to control power/reboot
 * RISC-V: 'virt' board now supports Goldfish RTC
 * RISC-V: experimental support for v0.5 of draft hypervisor extension
 * s390: support for Adapter Interrupt Suppression while running in KVM mode

 * and lots more...

Thank you to everyone involved!




[PATCH] block: Comment cleanups

2020-04-28 Thread Eric Blake
It's been a while since we got rid of the sector-based bdrv_read and
bdrv_write (commit 2e11d756); let's finish the job on a few remaining
comments.

Signed-off-by: Eric Blake 
---

Hmm - I started this in Nov 2018, and just barely noticed that it has
been sitting in a stale tree on my disk for a while now...

 block/io.c |  3 ++-
 block/qcow2-refcount.c |  2 +-
 block/vvfat.c  | 10 +-
 tests/qemu-iotests/001 |  2 +-
 tests/qemu-iotests/052 |  2 +-
 tests/qemu-iotests/134 |  2 +-
 tests/qemu-iotests/188 |  2 +-
 7 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/io.c b/block/io.c
index a4f971423093..7d30e61edc6c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -960,7 +960,7 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
  * flags are passed through to bdrv_pwrite_zeroes (e.g. BDRV_REQ_MAY_UNMAP,
  * BDRV_REQ_FUA).
  *
- * Returns < 0 on error, 0 on success. For error codes see bdrv_write().
+ * Returns < 0 on error, 0 on success. For error codes see bdrv_pwrite().
  */
 int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 {
@@ -994,6 +994,7 @@ int bdrv_make_zero(BdrvChild *child, BdrvRequestFlags flags)
 }
 }

+/* return < 0 if error. See bdrv_pwrite() for the return codes */
 int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
 int ret;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d9650b9b6c50..0457a6060d11 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2660,7 +2660,7 @@ fail:
  * - 0 if writing to this offset will not affect the mentioned metadata
  * - a positive QCow2MetadataOverlap value indicating one overlapping section
  * - a negative value (-errno) indicating an error while performing a check,
- *   e.g. when bdrv_read failed on QCOW2_OL_INACTIVE_L2
+ *   e.g. when bdrv_pread failed on QCOW2_OL_INACTIVE_L2
  */
 int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
  int64_t size)
diff --git a/block/vvfat.c b/block/vvfat.c
index ab800c4887a2..6d5c090dec4d 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2148,7 +2148,7 @@ DLOG(checkpoint());
  * - get modified FAT
  * - compare the two FATs (TODO)
  * - get buffer for marking used clusters
- * - recurse direntries from root (using bs->bdrv_read to make
+ * - recurse direntries from root (using bs->bdrv_pread to make
  *sure to get the new data)
  *   - check that the FAT agrees with the size
  *   - count the number of clusters occupied by this directory and
@@ -2913,9 +2913,9 @@ static int handle_deletes(BDRVVVFATState* s)
 /*
  * synchronize mapping with new state:
  *
- * - copy FAT (with bdrv_read)
+ * - copy FAT (with bdrv_pread)
  * - mark all filenames corresponding to mappings as deleted
- * - recurse direntries from root (using bs->bdrv_read)
+ * - recurse direntries from root (using bs->bdrv_pread)
  * - delete files corresponding to mappings marked as deleted
  */
 static int do_commit(BDRVVVFATState* s)
@@ -2935,10 +2935,10 @@ static int do_commit(BDRVVVFATState* s)
 return ret;
 }

-/* copy FAT (with bdrv_read) */
+/* copy FAT (with bdrv_pread) */
 memcpy(s->fat.pointer, s->fat2, 0x200 * s->sectors_per_fat);

-/* recurse direntries from root (using bs->bdrv_read) */
+/* recurse direntries from root (using bs->bdrv_pread) */
 ret = commit_direntries(s, 0, -1);
 if (ret) {
 fprintf(stderr, "Fatal: error while committing (%d)\n", ret);
diff --git a/tests/qemu-iotests/001 b/tests/qemu-iotests/001
index d87a535c3391..696726e45f56 100755
--- a/tests/qemu-iotests/001
+++ b/tests/qemu-iotests/001
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test simple read/write using plain bdrv_read/bdrv_write
+# Test simple read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2009 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/052 b/tests/qemu-iotests/052
index 45a140910da1..8d5c10601fe9 100755
--- a/tests/qemu-iotests/052
+++ b/tests/qemu-iotests/052
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test bdrv_read/bdrv_write using BDRV_O_SNAPSHOT
+# Test bdrv_pread/bdrv_pwrite using BDRV_O_SNAPSHOT
 #
 # Copyright (C) 2013 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index 5f0fb86211e3..5162d2166248 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test encrypted read/write using plain bdrv_read/bdrv_write
+# Test encrypted read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2015 Red Hat, Inc.
 #
diff --git a/tests/qemu-iotests/188 b/tests/qemu-iotests/188
index afca44df5427..09b9b6083ab3 100755
--- a/tests/qemu-iotests/188
+++ b/tests/qemu-iotests/188
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Test encrypted read/write using plain bdrv_read/bdrv_write
+# Test encrypted read/write using plain bdrv_pread/bdrv_pwrite
 #
 # Copyright (C) 2017 Red Hat, Inc.
 #
-- 
2.26.2



Re: [PATCH v2 00/17] 64bit block-layer

2020-04-28 Thread Eric Blake

On 4/27/20 3:23 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

v1 was "[RFC 0/3] 64bit block-layer part I", please refer to initial
cover-letter
  https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg08723.html
for motivation.

v2:
patch 02 is unchanged, add Stefan's r-b. Everything other is changed a
lot. What's new:



You'll also want to check my (now-abandoned?) posting from a while back:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg02769.html

to see what (if anything) from that attempt can be salvaged.





Re: [PATCH v22 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Eric Blake

On 4/28/20 3:00 PM, Denis Plotnikov wrote:

The test checks fulfilling qcow2 requirements for the compression
type feature and zstd compression type operability.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 
---


Reviewed-by: Eric Blake 





Re: [PATCH v22 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Eric Blake

On 4/28/20 3:00 PM, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
Test compresses and decompresses qemu qcow2 image with just
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with a brd disk to reduce the influence
of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
   src.img [zlib|zstd]_compressed.img
decompress cmd
   time ./qemu-img convert -O qcow2
   [zlib|zstd]_compressed.img uncompressed.img

            compression            decompression
          zlib    zstd           zlib    zstd

real      65.5    16.3 (-75 %)    1.9     1.6 (-16 %)
user      65.0    15.8            5.3     2.5
sys        3.3     0.2            2.0     2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---



+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+ssize_t ret;
+size_t zstd_ret;
+ZSTD_outBuffer output = {
+.dst = dest,
+.size = dest_size,
+.pos = 0
+};



+
+/* make sure we can safely return compressed buffer size with ssize_t */
+assert(output.pos <= SSIZE_MAX);


Seems rather vague, since we know that .pos won't exceed .size which was 
initialized by dest_size which is <= 2M.  A tighter assertion:


assert(output.pos <= dest_size)

seems like it is more realistic to your real constraint (namely, that 
zstd did not overflow dest).
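
As a rough illustration of the tighter check, the sketch below uses a hypothetical stand-in for the zstd output buffer and a fake "compressor" that just copies bytes; neither is the real zstd API, and the -1 error return is a placeholder. The point is only that the post-condition worth asserting is "pos never exceeded the destination size", which in turn implies the value fits in ssize_t for any realistic dest_size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Hypothetical stand-in for an output buffer descriptor (dst/size/pos). */
typedef struct {
    void *dst;
    size_t size;
    size_t pos;
} OutBuffer;

/*
 * Fake "compressor" for illustration: copies src if it fits.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int fake_compress(OutBuffer *out, const void *src, size_t src_size)
{
    if (src_size > out->size - out->pos) {
        return -1;
    }
    memcpy((char *)out->dst + out->pos, src, src_size);
    out->pos += src_size;
    return 0;
}

static ssize_t compress_checked(void *dest, size_t dest_size,
                                const void *src, size_t src_size)
{
    OutBuffer output = { .dst = dest, .size = dest_size, .pos = 0 };

    if (fake_compress(&output, src, src_size) < 0) {
        return -1;  /* placeholder error value, not the real driver's code */
    }

    /* The real constraint: the producer did not run past dest. */
    assert(output.pos <= dest_size);
    return (ssize_t)output.pos;
}
```

With an 8-byte destination, compressing 3 bytes returns 3, while a 10-byte input is rejected before the assertion is ever reached.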



+ret = output.pos;
+out:
+ZSTD_freeCCtx(cctx);
+return ret;
+}
+



+++ b/slirp
@@ -1 +1 @@
-Subproject commit 2faae0f778f818fadc873308f983289df697eb93
+Subproject commit 55ab21c9a36852915b81f1b41ebaf3b6509dd8ba


Umm, you definitely don't want that.

A maintainer could touch up both of those, but I'm not sure which block 
maintainer will be accepting this series.  Maybe wait for that question 
to be answered before trying to post a v23.






Re: [PATCH V2] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2020 at 05:33:08PM +0100, Daniel P. Berrangé wrote:
> On Tue, Apr 28, 2020 at 12:30:53PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Apr 28, 2020 at 05:28:36PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Apr 28, 2020 at 12:05:47PM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 28, 2020 at 10:16:52AM +, Ani Sinha wrote:
> > > > > A new option "use_acpi_unplug" is introduced for PIIX which will
> > > > > selectively only disable hot unplugging of both hot plugged and
> > > > > cold plugged PCI devices on non-root PCI buses. This will prevent
> > > > > hot unplugging of devices from Windows based guests from system
> > > > > tray but will not prevent devices from being hot plugged into the
> > > > > guest.
> > > > > 
> > > > > It has been tested on Windows guests.
> > > > > 
> > > > > Signed-off-by: Ani Sinha 
> > > > 
> > > > It's still a non starter until we find something similar for PCIE and
> > > > SHPC. Do guests check command status? Can some unplug commands fail?
> > > 
> > > Why does PCIE need anything ? For that we already have ability to
> > > control hotplugging per-slot in pcie-root-port.
> > 
> > At the moment that does not support unplug of hotplugged devices.
> 
> I don't see why this patch has to deal with that limitation though,
> it is an independent problem from this patch, which is PCI focused,
> not PCIe.

And that's par for the course, each contributor wants to care only about
his own corner. The only tool I as a maintainer have for keeping things
consistent is by deferring merge until they are.

> > 
> > 
> > > If SHPC doesn't
> > > support this that's fine too, it isn't a reason to block its merge
> > > and use with x86 i440fx machine.
> > > 
> > > > 
> > > > 
> > > > > ---
> > > > >  hw/acpi/piix4.c  |  3 +++
> > > > >  hw/i386/acpi-build.c | 40 ++--
> > > > >  2 files changed, 29 insertions(+), 14 deletions(-)
> > > > > 
> > > > > diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> > > > > index 964d6f5..59fa707 100644
> > > > > --- a/hw/acpi/piix4.c
> > > > > +++ b/hw/acpi/piix4.c
> > > > > @@ -78,6 +78,7 @@ typedef struct PIIX4PMState {
> > > > >  
> > > > >  AcpiPciHpState acpi_pci_hotplug;
> > > > >  bool use_acpi_pci_hotplug;
> > > > > +bool use_acpi_unplug;
> > > > >  
> > > > >  uint8_t disable_s3;
> > > > >  uint8_t disable_s4;
> > > > > @@ -633,6 +634,8 @@ static Property piix4_pm_properties[] = {
> > > > >  DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
> > > > >  DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support", 
> > > > > PIIX4PMState,
> > > > >   use_acpi_pci_hotplug, true),
> > > > > +DEFINE_PROP_BOOL("acpi-pci-hotunplug-enable-bridge", 
> > > > > PIIX4PMState,
> > > > > + use_acpi_unplug, true),
> > > > >  DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
> > > > >   acpi_memory_hotplug.is_enabled, true),
> > > > >  DEFINE_PROP_END_OF_LIST(),
> > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > index 23c77ee..71b3ac3 100644
> > > > > --- a/hw/i386/acpi-build.c
> > > > > +++ b/hw/i386/acpi-build.c
> > > > > @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> > > > >  bool s3_disabled;
> > > > >  bool s4_disabled;
> > > > >  bool pcihp_bridge_en;
> > > > > +bool pcihup_bridge_en;
> > > > >  uint8_t s4_val;
> > > > >  AcpiFadtData fadt;
> > > > >  uint16_t cpu_hp_io_base;
> > > > > @@ -240,6 +241,9 @@ static void acpi_get_pm_info(MachineState 
> > > > > *machine, AcpiPmInfo *pm)
> > > > >  pm->pcihp_bridge_en =
> > > > >  object_property_get_bool(obj, 
> > > > > "acpi-pci-hotplug-with-bridge-support",
> > > > >   NULL);
> > > > > +pm->pcihup_bridge_en =
> > > > > +object_property_get_bool(obj, 
> > > > > "acpi-pci-hotunplug-enable-bridge",
> > > > > + NULL);
> > > > >  }
> > > > >  
> > > > >  static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > @@ -451,7 +455,8 @@ static void build_append_pcihp_notify_entry(Aml 
> > > > > *method, int slot)
> > > > >  }
> > > > >  
> > > > >  static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus 
> > > > > *bus,
> > > > > - bool pcihp_bridge_en)
> > > > > + bool pcihp_bridge_en,
> > > > > + bool pcihup_bridge_en)
> > > > >  {
> > > > >  Aml *dev, *notify_method = NULL, *method;
> > > > >  QObject *bsel;
> > > > > @@ -479,11 +484,14 @@ static void build_append_pci_bus_devices(Aml 
> > > > > *parent_scope, PCIBus *bus,
> > > > >  dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
> > > > >  aml_append(dev, aml_name_decl("_SUN", 
> > > > > aml_int(slot)));
> > > > >  aml_append(dev, aml_name_decl("_ADR", aml_int(slot 
> > 

Re: [PATCH V2] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2020 at 10:10:18PM +0530, Ani Sinha wrote:
> 
> 
> On Tue, Apr 28, 2020 at 9:51 PM Michael S. Tsirkin  wrote:
> 
> On Tue, Apr 28, 2020 at 09:39:16PM +0530, Ani Sinha wrote:
> >
> > Ani
> > On Apr 28, 2020, 21:35 +0530, Michael S. Tsirkin , 
> wrote:
> >
> >     On Tue, Apr 28, 2020 at 10:16:52AM +, Ani Sinha wrote:
> >
> >         A new option "use_acpi_unplug" is introduced for PIIX which will
> >         selectively only disable hot unplugging of both hot plugged and
> >         cold plugged PCI devices on non-root PCI buses. This will 
> prevent
> >         hot unplugging of devices from Windows based guests from system
> >         tray but will not prevent devices from being hot plugged into 
> the
> >         guest.
> >
> >         It has been tested on Windows guests.
> >
> >         Signed-off-by: Ani Sinha 
> >
> >
> >     It's still a non starter until we find something similar for PCIE 
> and
> >     SHPC. Do guests check command status? Can some unplug commands fail?
> >
> >
> > Ok I  give up! I thought we debated this on the other thread.
> 
> Sorry to hear that.
> I'd rather you didn't, and worked on a solution that works for everyone.
> 
> 
> That is extremely hard for one person to do, without inputs and ideas from the
> community.

What kind of input are you looking for?

> Satisfying the entire world requires lot of time and energy
> investment, not to mention a broad expertise in multiple technologies. 
> 
> 
> 
> Pushing back on merging code is unfortunately the only mechanism
> maintainers have to make sure features are complete and
> orthogonal to each other, so I'm not sure I can help otherwise.
> 
> >
> >
> >
> >
> >         ---
> >         hw/acpi/piix4.c | 3 +++
> >         hw/i386/acpi-build.c | 40
> ++--
> >         2 files changed, 29 insertions(+), 14 deletions(-)
> >
> >         diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> >         index 964d6f5..59fa707 100644
> >         --- a/hw/acpi/piix4.c
> >         +++ b/hw/acpi/piix4.c
> >         @@ -78,6 +78,7 @@ typedef struct PIIX4PMState {
> >
> >         AcpiPciHpState acpi_pci_hotplug;
> >         bool use_acpi_pci_hotplug;
> >         + bool use_acpi_unplug;
> >
> >         uint8_t disable_s3;
> >         uint8_t disable_s4;
> >         @@ -633,6 +634,8 @@ static Property piix4_pm_properties[] = {
> >         DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PIIX4PMState, s4_val, 2),
> >         DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support",
> PIIX4PMState,
> >         use_acpi_pci_hotplug, true),
> >         + DEFINE_PROP_BOOL("acpi-pci-hotunplug-enable-bridge",
> PIIX4PMState,
> >         + use_acpi_unplug, true),
> >         DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
> >         acpi_memory_hotplug.is_enabled, true),
> >         DEFINE_PROP_END_OF_LIST(),
> >         diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >         index 23c77ee..71b3ac3 100644
> >         --- a/hw/i386/acpi-build.c
> >         +++ b/hw/i386/acpi-build.c
> >         @@ -96,6 +96,7 @@ typedef struct AcpiPmInfo {
> >         bool s3_disabled;
> >         bool s4_disabled;
> >         bool pcihp_bridge_en;
> >         + bool pcihup_bridge_en;
> >         uint8_t s4_val;
> >         AcpiFadtData fadt;
> >         uint16_t cpu_hp_io_base;
> >         @@ -240,6 +241,9 @@ static void acpi_get_pm_info(MachineState
> *machine,
> >         AcpiPmInfo *pm)
> >         pm->pcihp_bridge_en =
> >         object_property_get_bool(obj,
> "acpi-pci-hotplug-with-bridge-support",
> >         NULL);
> >         + pm->pcihup_bridge_en =
> >         + object_property_get_bool(obj,
> "acpi-pci-hotunplug-enable-bridge",
> >         + NULL);
> >         }
> >
> >         static void acpi_get_misc_info(AcpiMiscInfo *info)
> >         @@ -451,7 +455,8 @@ static void build_append_pcihp_notify_entry
> (Aml
> >         *method, int slot)
> >         }
> >
> >         static void build_append_pci_bus_devices(Aml *parent_scope,
> PCIBus
> >         *bus,
> >         - bool pcihp_bridge_en)
> >         + bool pcihp_bridge_en,
> >         + bool pcihup_bridge_en)
> >         {
> >         Aml *dev, *notify_method = NULL, *method;
> >         QObject *bsel;
> >         @@ -479,11 +484,14 @@ static void build_append_pci_bus_devices
> (Aml
> >         *parent_scope, PCIBus *bus,
> >         dev = aml_device("S%.02X", PCI_DEVFN(slot, 0));
> >         aml_append(dev, aml_name_decl("_SUN", aml_int(slot)));
> >         aml_append(dev, aml_name_decl("_ADR", aml_int(slot << 16)));

[PATCH 9/9] block: Drop unused .bdrv_has_zero_init_truncate

2020-04-28 Thread Eric Blake
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.

What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.

Signed-off-by: Eric Blake 
---
 include/block/block.h |  1 -
 include/block/block_int.h |  7 ---
 block.c   | 21 -
 block/file-posix.c|  1 -
 block/file-win32.c|  1 -
 block/nfs.c   |  1 -
 block/qcow2.c |  1 -
 block/qed.c   |  1 -
 block/raw-format.c|  6 --
 block/rbd.c   |  1 -
 block/sheepdog.c  |  3 ---
 block/ssh.c   |  1 -
 12 files changed, 45 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 8b62429aa4a9..4de8d8f8a6b2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -430,7 +430,6 @@ int bdrv_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
 int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
-int bdrv_has_zero_init_truncate(BlockDriverState *bs);
 bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
 int bdrv_block_status(BlockDriverState *bs, int64_t offset,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 92335f33c750..df6d0273d679 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -449,16 +449,9 @@ struct BlockDriver {
 /*
  * Returns 1 if newly created images are guaranteed to contain only
  * zeros, 0 otherwise.
- * Must return 0 if .bdrv_has_zero_init_truncate() returns 0.
  */
 int (*bdrv_has_zero_init)(BlockDriverState *bs);

-/*
- * Returns 1 if new areas added by growing the image with
- * PREALLOC_MODE_OFF contain only zeros, 0 otherwise.
- */
-int (*bdrv_has_zero_init_truncate)(BlockDriverState *bs);
-
 /* Remove fd handlers, timers, and other event loop callbacks so the event
  * loop is no longer in use.  Called with no in-flight requests and in
  * depth-first traversal order with parents before child nodes.
diff --git a/block.c b/block.c
index 03cc5813a292..fea646d33dc3 100644
--- a/block.c
+++ b/block.c
@@ -5291,27 +5291,6 @@ int bdrv_has_zero_init(BlockDriverState *bs)
 return 0;
 }

-int bdrv_has_zero_init_truncate(BlockDriverState *bs)
-{
-if (!bs->drv) {
-return 0;
-}
-
-if (bs->backing) {
-/* Depends on the backing image length, but better safe than sorry */
-return 0;
-}
-if (bs->drv->bdrv_has_zero_init_truncate) {
-return bs->drv->bdrv_has_zero_init_truncate(bs);
-}
-if (bs->file && bs->drv->is_filter) {
-return bdrv_has_zero_init_truncate(bs->file->bs);
-}
-
-/* safe default */
-return 0;
-}
-
 bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
 {
 BlockDriverInfo bdi;
diff --git a/block/file-posix.c b/block/file-posix.c
index 1dca220a81ba..84012be18f4d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3099,7 +3099,6 @@ BlockDriver bdrv_file = {
 .bdrv_co_create = raw_co_create,
 .bdrv_co_create_opts = raw_co_create_opts,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,
 .bdrv_co_block_status = raw_co_block_status,
 .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
 .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
diff --git a/block/file-win32.c b/block/file-win32.c
index fa569685d8bc..221aaf713e24 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -641,7 +641,6 @@ BlockDriver bdrv_file = {
 .bdrv_close = raw_close,
 .bdrv_co_create_opts = raw_co_create_opts,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,

 .bdrv_aio_preadv= raw_aio_preadv,
 .bdrv_aio_pwritev   = raw_aio_pwritev,
diff --git a/block/nfs.c b/block/nfs.c
index b93989265630..2d3474c1e051 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -876,7 +876,6 @@ static BlockDriver bdrv_nfs = {
 .create_opts= _create_opts,

 .bdrv_has_zero_init = nfs_has_zero_init,
-.bdrv_has_zero_init_truncate= nfs_has_zero_init,
 .bdrv_get_allocated_file_size   = nfs_get_allocated_file_size,
 .bdrv_co_truncate   = nfs_file_co_truncate,

diff --git a/block/qcow2.c b/block/qcow2.c
index 2ba0b17c391c..9acdbaeb3ab8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5596,7 +5596,6 @@ 

Re: [RFC PATCH] plugins: new lockstep plugin for debugging TCG changes

2020-04-28 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200428171633.17487-1-alex.ben...@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [RFC PATCH] plugins: new lockstep plugin for debugging TCG changes
Message-id: 20200428171633.17487-1-alex.ben...@linaro.org
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
608fdd3 plugins: new lockstep plugin for debugging TCG changes

=== OUTPUT BEGIN ===
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#36: 
new file mode 100644

ERROR: do not use C99 // comments
#162: FILE: tests/plugin/lockstep.c:122:
+// compare and bail

ERROR: do not use C99 // comments
#168: FILE: tests/plugin/lockstep.c:128:
+// mark the execution as complete

ERROR: spaces required around that '-' (ctx:VxV)
#203: FILE: tests/plugin/lockstep.c:163:
+g_strlcpy(sockaddr.sun_path, path, sizeof(sockaddr.sun_path)-1);
 ^

ERROR: spaces required around that '-' (ctx:VxV)
#242: FILE: tests/plugin/lockstep.c:202:
+g_strlcpy(sockaddr.sun_path, path, sizeof(sockaddr.sun_path)-1);
 ^

total: 4 errors, 1 warnings, 251 lines checked

Commit 608fdd3a7d7a (plugins: new lockstep plugin for debugging TCG changes) 
has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200428171633.17487-1-alex.ben...@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH 7/9] parallels: Rework truncation logic

2020-04-28 Thread Eric Blake
The parallels driver tries to use truncation for image growth, but can
only do so when reads are guaranteed as zero.  Now that we have a way
to request zero contents from truncation, we can defer the decision to
actual allocation attempts rather than up front, reducing the number
of places that still use bdrv_has_zero_init_truncate.

Signed-off-by: Eric Blake 
---
 block/parallels.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 2be92cf41708..9dadaa3217b9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -196,14 +196,24 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
 }
 if (s->data_end + space > (len >> BDRV_SECTOR_BITS)) {
 space += s->prealloc_size;
+/*
+ * We require the expanded size to read back as zero. If the
+ * user permitted truncation, we try that; but if it fails, we
+ * force the safer-but-slower fallocate.
+ */
+if (s->prealloc_mode == PRL_PREALLOC_MODE_TRUNCATE) {
+ret = bdrv_truncate(bs->file,
+(s->data_end + space) << BDRV_SECTOR_BITS,
+false, PREALLOC_MODE_OFF, BDRV_REQ_ZERO_WRITE,
+NULL);
+if (ret == -ENOTSUP) {
+s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
+}
+}
 if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) {
 ret = bdrv_pwrite_zeroes(bs->file,
  s->data_end << BDRV_SECTOR_BITS,
  space << BDRV_SECTOR_BITS, 0);
-} else {
-ret = bdrv_truncate(bs->file,
-(s->data_end + space) << BDRV_SECTOR_BITS,
-false, PREALLOC_MODE_OFF, 0, NULL);
 }
 if (ret < 0) {
 return ret;
@@ -828,6 +838,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
 qemu_opt_get_size_del(opts, PARALLELS_OPT_PREALLOC_SIZE, 0);
 s->prealloc_size = MAX(s->tracks, s->prealloc_size >> BDRV_SECTOR_BITS);
 buf = qemu_opt_get_del(opts, PARALLELS_OPT_PREALLOC_MODE);
+/* prealloc_mode can be downgraded later during allocate_clusters */
 s->prealloc_mode = qapi_enum_parse(_mode_lookup, buf,
PRL_PREALLOC_MODE_FALLOCATE,
_err);
@@ -836,10 +847,6 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
 goto fail_options;
 }

-if (!bdrv_has_zero_init_truncate(bs->file->bs)) {
-s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
-}
-
 if ((flags & BDRV_O_RDWR) && !(flags & BDRV_O_INACTIVE)) {
 s->header->inuse = cpu_to_le32(HEADER_INUSE_MAGIC);
 ret = parallels_update_header(bs);
-- 
2.26.2
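
The control flow of the parallels rework above (try a zeroing truncate first, and downgrade to explicit zero writes once the lower layer reports -ENOTSUP) can be sketched in isolation as follows. All types and helpers here are hypothetical mocks made up for illustration; they are not the QEMU block-layer API, and the call counters exist only so the behavior can be observed.

```c
#include <errno.h>
#include <stdbool.h>

typedef enum { MODE_FALLOCATE, MODE_TRUNCATE } PreallocMode;

/* Hypothetical mock of driver state plus the layer below it. */
typedef struct {
    bool truncate_zeroes_supported;  /* can the lower layer zero on truncate? */
    PreallocMode mode;               /* current preallocation strategy */
    int truncate_calls;
    int write_zeroes_calls;
} Backend;

/* Mock of "truncate and guarantee the new area reads as zero". */
static int fake_truncate_zero(Backend *b)
{
    b->truncate_calls++;
    return b->truncate_zeroes_supported ? 0 : -ENOTSUP;
}

/* Mock of the safer-but-slower explicit zero write. */
static int fake_write_zeroes(Backend *b)
{
    b->write_zeroes_calls++;
    return 0;
}

/* Grow the image so that the new area reads back as zero. */
static int grow_zeroed(Backend *b)
{
    int ret = 0;

    if (b->mode == MODE_TRUNCATE) {
        ret = fake_truncate_zero(b);
        if (ret == -ENOTSUP) {
            /* Remember the downgrade so later calls skip the attempt. */
            b->mode = MODE_FALLOCATE;
        }
    }
    if (b->mode == MODE_FALLOCATE) {
        ret = fake_write_zeroes(b);
    }
    return ret;
}
```

The design choice mirrored here is that the decision is made lazily at allocation time rather than once at open time, so a backend that can zero on truncate never pays for the slower path, and one that cannot is probed only once.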




[PATCH 6/9] ssh: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake
Our .bdrv_has_zero_init_truncate can detect when the remote side
always zero fills; we can reuse that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for
free.

Signed-off-by: Eric Blake 
---
 block/ssh.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index 9eb33df8598c..f9e08a490069 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -883,6 +883,10 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
 /* Go non-blocking. */
 ssh_set_blocking(s->session, 0);

+if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) {
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+}
+
 qapi_free_BlockdevOptionsSsh(opts);

 return 0;
-- 
2.26.2




[PATCH 8/9] vhdx: Rework truncation logic

2020-04-28 Thread Eric Blake
The vhdx driver uses truncation for image growth, with a special case
for blocks that already read as zero but which are only being
partially written.  But with a bit of rearranging, it's just as easy
to defer the decision on whether truncation resulted in zeroes to the
actual allocation attempt, reducing the number of places that still
use bdrv_has_zero_init_truncate.

Signed-off-by: Eric Blake 
---
 block/vhdx.c | 89 ++--
 1 file changed, 51 insertions(+), 38 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 21497f731878..fe544abaf52a 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1241,12 +1241,16 @@ exit:
 /*
  * Allocate a new payload block at the end of the file.
  *
- * Allocation will happen at 1MB alignment inside the file
+ * Allocation will happen at 1MB alignment inside the file.
+ *
+ * If @need_zero is set on entry but not cleared on return, then truncation
+ * could not guarantee that the new portion reads as zero, and the caller
+ * will take care of it instead.
  *
  * Returns the file offset start of the new payload block
  */
 static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
-uint64_t *new_offset)
+   uint64_t *new_offset, bool *need_zero)
 {
 int64_t current_len;

@@ -1263,6 +1267,17 @@ static int vhdx_allocate_block(BlockDriverState *bs, 
BDRVVHDXState *s,
 return -EINVAL;
 }

+if (*need_zero) {
+int ret;
+
+ret = bdrv_truncate(bs->file, *new_offset + s->block_size, false,
+PREALLOC_MODE_OFF, BDRV_REQ_ZERO_WRITE, NULL);
+if (ret != -ENOTSUP) {
+*need_zero = false;
+return ret;
+}
+}
+
 return bdrv_truncate(bs->file, *new_offset + s->block_size, false,
  PREALLOC_MODE_OFF, 0, NULL);
 }
@@ -1356,18 +1371,38 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState 
*bs, int64_t sector_num,
 /* in this case, we need to preserve zero writes for
  * data that is not part of this write, so we must pad
  * the rest of the buffer to zeroes */
-
-/* if we are on a posix system with ftruncate() that extends
- * a file, then it is zero-filled for us.  On Win32, the raw
- * layer uses SetFilePointer and SetFileEnd, which does not
- * zero fill AFAIK */
-
-/* Queue another write of zero buffers if the underlying file
- * does not zero-fill on file extension */
-
-if (bdrv_has_zero_init_truncate(bs->file->bs) == 0) {
-use_zero_buffers = true;
-
+use_zero_buffers = true;
+/* fall through */
+case PAYLOAD_BLOCK_NOT_PRESENT: /* fall through */
+case PAYLOAD_BLOCK_UNMAPPED:
+case PAYLOAD_BLOCK_UNMAPPED_v095:
+case PAYLOAD_BLOCK_UNDEFINED:
+bat_prior_offset = sinfo.file_offset;
+ret = vhdx_allocate_block(bs, s, &sinfo.file_offset,
+  &use_zero_buffers);
+if (ret < 0) {
+goto exit;
+}
+/*
+ * once we support differencing files, this may also be
+ * partially present
+ */
+/* update block state to the newly specified state */
+vhdx_update_bat_table_entry(bs, s, &sinfo, &bat_entry,
+&bat_entry_offset,
+PAYLOAD_BLOCK_FULLY_PRESENT);
+bat_update = true;
+/*
+ * Since we just allocated a block, file_offset is the
+ * beginning of the payload block. It needs to be the
+ * write address, which includes the offset into the
+ * block, unless the entire block needs to read as
+ * zeroes but truncation was not able to provide them,
+ * in which case we need to fill in the rest.
+ */
+if (!use_zero_buffers) {
+sinfo.file_offset += sinfo.block_offset;
+} else {
 /* zero fill the front, if any */
 if (sinfo.block_offset) {
 iov1.iov_len = sinfo.block_offset;
@@ -1379,7 +1414,7 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState 
*bs, int64_t sector_num,
 }

 /* our actual data */
-qemu_iovec_concat(&hd_qiov, qiov,  bytes_done,
+qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
   sinfo.bytes_avail);

 /* zero fill the back, if any */
@@ -1394,29 +1429,7 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState 
*bs, int64_t 
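
(The vhdx patch text is cut off at this point in the archive.) The in/out @need_zero contract documented for vhdx_allocate_block() can be sketched in isolation as follows. This is a simplified model, not the driver code: `toy_file`, `toy_truncate()` and `TOY_REQ_ZERO_WRITE` are hypothetical names standing in for the block layer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_REQ_ZERO_WRITE 0x1u  /* hypothetical flag value */

struct toy_file {
    bool zero_fills_on_extend;  /* does the backend advertise zeroing? */
    int64_t length;
};

/* Hypothetical truncate: refuses the zero-write flag unless supported. */
static int toy_truncate(struct toy_file *f, int64_t len, uint32_t flags)
{
    if ((flags & TOY_REQ_ZERO_WRITE) && !f->zero_fills_on_extend) {
        return -ENOTSUP;
    }
    f->length = len;
    return 0;
}

/*
 * *need_zero is cleared whenever the zeroing truncate gave a definite
 * answer (success or a real error).  It stays set only when the backend
 * answered -ENOTSUP, which leaves the zero padding to the caller.
 */
static int toy_allocate(struct toy_file *f, int64_t new_len, bool *need_zero)
{
    assert(f != NULL && need_zero != NULL);
    if (*need_zero) {
        int ret = toy_truncate(f, new_len, TOY_REQ_ZERO_WRITE);
        if (ret != -ENOTSUP) {
            *need_zero = false;
            return ret;
        }
    }
    return toy_truncate(f, new_len, 0);
}
```

A caller sets the flag before the call and afterwards pads with explicit zero buffers only if the flag is still set.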

[PATCH 3/9] nfs: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake
Our .bdrv_has_zero_init_truncate returns 1 if we detect that the OS
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it when the OS gives it to us for
free.

Signed-off-by: Eric Blake 
---
 block/nfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/nfs.c b/block/nfs.c
index 2393fbfe6bcc..b93989265630 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -623,6 +623,9 @@ static int nfs_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 }

 bs->total_sectors = ret;
+if (client->has_zero_init) {
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+}
 ret = 0;
 return ret;
 }
-- 
2.26.2




[PATCH 4/9] rbd: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake
Our .bdrv_has_zero_init_truncate always returns 1 because rbd always
0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.

Signed-off-by: Eric Blake 
---
 block/rbd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index f2d52091c702..331c45adb2b2 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -817,6 +817,9 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }

+/* When extending regular files, we get zeros from the OS */
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+
 r = 0;
 goto out;

-- 
2.26.2




[PATCH 1/9] gluster: Drop useless has_zero_init callback

2020-04-28 Thread Eric Blake
block.c already defaults to 0 if we don't provide a callback; there's
no need to write a callback that always fails.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/gluster.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index d06df900f692..31233cac696a 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1359,12 +1359,6 @@ static int64_t 
qemu_gluster_allocated_file_size(BlockDriverState *bs)
 }
 }

-static int qemu_gluster_has_zero_init(BlockDriverState *bs)
-{
-/* GlusterFS volume could be backed by a block device */
-return 0;
-}
-
 /*
  * Find allocation range in @bs around offset @start.
  * May change underlying file descriptor's file offset.
@@ -1569,8 +1563,6 @@ static BlockDriver bdrv_gluster = {
 .bdrv_co_readv= qemu_gluster_co_readv,
 .bdrv_co_writev   = qemu_gluster_co_writev,
 .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
-.bdrv_has_zero_init   = qemu_gluster_has_zero_init,
-.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
 #endif
@@ -1601,8 +1593,6 @@ static BlockDriver bdrv_gluster_tcp = {
 .bdrv_co_readv= qemu_gluster_co_readv,
 .bdrv_co_writev   = qemu_gluster_co_writev,
 .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
-.bdrv_has_zero_init   = qemu_gluster_has_zero_init,
-.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
 #endif
@@ -1633,8 +1623,6 @@ static BlockDriver bdrv_gluster_unix = {
 .bdrv_co_readv= qemu_gluster_co_readv,
 .bdrv_co_writev   = qemu_gluster_co_writev,
 .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
-.bdrv_has_zero_init   = qemu_gluster_has_zero_init,
-.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
 #endif
@@ -1671,8 +1659,6 @@ static BlockDriver bdrv_gluster_rdma = {
 .bdrv_co_readv= qemu_gluster_co_readv,
 .bdrv_co_writev   = qemu_gluster_co_writev,
 .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
-.bdrv_has_zero_init   = qemu_gluster_has_zero_init,
-.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
 #ifdef CONFIG_GLUSTERFS_DISCARD
 .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
 #endif
-- 
2.26.2
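
The argument of this patch, that a callback which always returns 0 is indistinguishable from having no callback, can be illustrated with a small sketch. It assumes the dispatcher falls back to 0 when the function pointer is NULL, as the commit message says block.c does; `toy_driver` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_driver {
    const char *name;
    int (*has_zero_init)(void);  /* optional; may be NULL */
};

/* Dispatcher: call the driver hook if present, otherwise default to 0. */
static int toy_has_zero_init(const struct toy_driver *drv)
{
    assert(drv != NULL);
    if (drv->has_zero_init) {
        return drv->has_zero_init();
    }
    return 0;  /* safe default when the driver provides no callback */
}

/* What the removed gluster callback amounted to. */
static int toy_always_zero(void)
{
    return 0;
}

/* A driver that does guarantee zero-initialized images. */
static int toy_always_one(void)
{
    return 1;
}
```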




[PATCH 5/9] sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake
Our .bdrv_has_zero_init_truncate always returns 1 because sheepdog
always 0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.

Signed-off-by: Eric Blake 
---
 block/sheepdog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index ef0a6e743e27..26fd22c7f07d 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1654,6 +1654,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, 
int flags,
 memcpy(&s->inode, buf, sizeof(s->inode));

 bs->total_sectors = s->inode.vdi_size / BDRV_SECTOR_SIZE;
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 pstrcpy(s->name, sizeof(s->name), vdi);
 qemu_co_mutex_init(&s->lock);
 qemu_co_mutex_init(&s->queue_lock);
-- 
2.26.2




[PATCH 2/9] file-win32: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake
When using bdrv_file, .bdrv_has_zero_init_truncate always returns 1;
therefore, we can behave just like file-posix, and always implement
BDRV_REQ_ZERO_WRITE by ignoring it since the OS gives it to us for
free (note that file-posix.c had to use an 'if' because it shared code
between regular files and block devices, but in file-win32.c,
bdrv_host_device uses a separate .bdrv_file_open).

Signed-off-by: Eric Blake 
---
 block/file-win32.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/file-win32.c b/block/file-win32.c
index a6b0dda5c302..fa569685d8bc 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c
@@ -408,6 +408,9 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 win32_aio_attach_aio_context(s->aio, bdrv_get_aio_context(bs));
 }

+/* When extending regular files, we get zeros from the OS */
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
+
 ret = 0;
 fail:
 qemu_opts_del(opts);
-- 
2.26.2




[PATCH 0/9] More truncate improvements

2020-04-28 Thread Eric Blake
Based-on: <20200424125448.63318-1-kw...@redhat.com>
[PATCH v7 00/10] block: Fix resize (extending) of short overlays

After reviewing Kevin's work, I questioned if we had a redundancy with
bdrv_has_zero_init_truncate.  It turns out we do, and this is the result.

Patch 1 has been previously posted [1] and reviewed, the rest is new.
I did not address Niels' comment that modern gluster also always
0-initializes [2], as I am not set up to verify it (my changes to the
other drivers are semantic no-ops, so I don't feel as bad about
posting them with less rigorous testing).

[1] https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg08070.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg04266.html

Eric Blake (9):
  gluster: Drop useless has_zero_init callback
  file-win32: Support BDRV_REQ_ZERO_WRITE for truncate
  nfs: Support BDRV_REQ_ZERO_WRITE for truncate
  rbd: Support BDRV_REQ_ZERO_WRITE for truncate
  sheepdog: Support BDRV_REQ_ZERO_WRITE for truncate
  ssh: Support BDRV_REQ_ZERO_WRITE for truncate
  parallels: Rework truncation logic
  vhdx: Rework truncation logic
  block: Drop unused .bdrv_has_zero_init_truncate

 include/block/block.h |  1 -
 include/block/block_int.h |  7 ---
 block.c   | 21 -
 block/file-posix.c|  1 -
 block/file-win32.c|  4 +-
 block/gluster.c   | 14 --
 block/nfs.c   |  4 +-
 block/parallels.c | 23 ++
 block/qcow2.c |  1 -
 block/qed.c   |  1 -
 block/raw-format.c|  6 ---
 block/rbd.c   |  4 +-
 block/sheepdog.c  |  4 +-
 block/ssh.c   |  5 ++-
 block/vhdx.c  | 89 ++-
 15 files changed, 80 insertions(+), 105 deletions(-)

-- 
2.26.2




Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-28 Thread Denis Plotnikov




On 28.04.2020 20:54, Dr. David Alan Gilbert wrote:

* Denis Plotnikov (dplotni...@virtuozzo.com) wrote:


On 27.04.2020 15:14, Dr. David Alan Gilbert wrote:

* Denis Plotnikov (dplotni...@virtuozzo.com) wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asynchronous mode is managed by the new
"enabled_buffered" callback.

It's a bit invasive isn't it - changes a lot of functions in a lot of
places!

If you mean changing the qemu-file code - yes, it is.

Yeh that's what I worry about; qemu-file is pretty complex as it is.
Especially when it then passes it to the channel code etc


If you mean changing the qemu-file usage in the code - no.
The only place to change is the snapshot code when the buffered mode is
enabled with a callback.
The change is in 03 patch of the series.

That's fine - that's easy.


The multifd code separated the control headers from the data on separate
fd's - but that doesn't help your case.

yes, that doesn't help

Is there any chance you could do this by using the existing 'save_page'
hook (that RDMA uses)?

I don't think so. My goal is to improve the writing performance of
the internal snapshot to a qcow2 image. The snapshot is saved in qcow2 as
a continuous stream placed at the end of the address space.
To achieve the best writing speed I need a size- and base-aligned buffer
containing the vm state (with ram), which looks like this (for ram):

... | ram page header | ram page | ram page header | ram page | ... and so
on

to store the buffer in qcow2 with a single operation.
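
A rough sketch of the buffering idea described above, not the patch's implementation: headers and pages are appended to one large aligned buffer and handed to storage only when it fills up. The 1 MiB size matches the IO_BUF_SIZE the patch introduces; the 4096-byte alignment, the `toy_*` names and the write counter that stands in for real I/O are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUF_SIZE (1024 * 1024)  /* same value as the patch's IO_BUF_SIZE */
#define TOY_BUF_ALIGN 4096          /* assumed alignment requirement */

struct toy_stream {
    uint8_t *buf;          /* aligned staging buffer */
    size_t used;           /* bytes currently staged */
    unsigned writes;       /* number of simulated I/O operations */
    size_t bytes_written;  /* total bytes handed to "storage" */
};

static int toy_stream_init(struct toy_stream *s)
{
    s->buf = aligned_alloc(TOY_BUF_ALIGN, TOY_BUF_SIZE);
    s->used = 0;
    s->writes = 0;
    s->bytes_written = 0;
    return s->buf ? 0 : -1;
}

/* One simulated write for everything staged so far. */
static void toy_stream_flush(struct toy_stream *s)
{
    if (s->used) {
        s->writes++;
        s->bytes_written += s->used;
        s->used = 0;
    }
}

/* Append data, flushing only when the buffer becomes full. */
static void toy_stream_put(struct toy_stream *s, const void *data, size_t len)
{
    const uint8_t *p = data;

    assert(s->buf != NULL);
    while (len) {
        size_t n = TOY_BUF_SIZE - s->used;

        if (n > len) {
            n = len;
        }
        memcpy(s->buf + s->used, p, n);
        s->used += n;
        p += n;
        len -= n;
        if (s->used == TOY_BUF_SIZE) {
            toy_stream_flush(s);
        }
    }
}
```

Interleaving 256 eight-byte headers with 256 4 KiB pages through this stream produces a single full-buffer write plus a small tail, instead of one I/O operation per header and per page.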

'save_page' would allow me not to store the 'ram page' in the qemu-file
internal structures, and to write my own ram page storing logic. I think
that wouldn't help me a lot because:
1. I need a page with the ram page header
2. I want to reduce the number of io operations
3. I want to save other parts of vm state as fast as possible

Maybe I just can't see a better way of using the 'save_page' callback.
Could you suggest anything?

I guess it depends if we care about keeping the format of the snapshot
the same here;  if we were open to changing it, then we could use
the save_page hook to delay the writes, so we'd have a pile of headers
followed by a pile of pages.


I think we have to care about keeping the format. Many users already
have internal snapshots saved in qcow2 images. If we change the format,
we can't load snapshots from those images, and new snapshots become
unreadable for older qemu-s; otherwise we need to support two versions
of the format, which I think is too complicated.




Denis

In the cover letter you mention direct qemu_fflush calls - have we got a
few too many in some places that you think we can clean out?

I'm not sure that any of them are excessive. To the best of my knowledge,
qemu-file is used for the source-destination communication on migration
and removing some qemu_fflush-es may break communication logic.

I can't see any obvious places where it's called during the ram
migration; can you try and give me a hint to where you're seeing it ?


I think those qemu_fflush-es aren't in the ram migration but rather in
other vm state parts.
Although those parts are quite small in comparison to ram, I saw quite
a lot of qemu_fflush-es while debugging.
Still, we could benefit from saving them with fewer io operations if
we are going to use buffered mode.


Denis




Snapshot is just a special case (if not the only one) where we know that
we can do buffered (cached) writes. Do you know any other cases where the
buffered (cached) mode could be useful?

The RDMA code does it because it's really not good at small transfers,
but maybe generally it would be a good idea to do larger writes if
possible - something that multifd manages.

Dave


Dave


Signed-off-by: Denis Plotnikov 
---
   include/qemu/typedefs.h |   1 +
   migration/qemu-file.c   | 351 
+---
   migration/qemu-file.h   |   9 ++
   3 files changed, 339 insertions(+), 22 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 88dce54..9b388c8 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -98,6 +98,7 @@ typedef struct QEMUBH QEMUBH;
   typedef struct QemuConsole QemuConsole;
   typedef struct QEMUFile QEMUFile;
   typedef struct QEMUFileBuffer QEMUFileBuffer;
+typedef struct QEMUFileAioTask QEMUFileAioTask;
   typedef struct QemuLockable QemuLockable;
   typedef struct QemuMutex QemuMutex;
   typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 285c6ef..f42f949 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -29,19 +29,25 @@
   #include "qemu-file.h"
   #include "trace.h"
   #include "qapi/error.h"
+#include "block/aio_task.h"
-#define IO_BUF_SIZE 32768
+#define IO_BUF_SIZE (1024 * 1024)
   #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define IO_BUF_NUM 2
+#define 

Re: [PATCH 1/2] tpm: tpm-tis-device: set PPI to false by default

2020-04-28 Thread Stefan Berger

On 4/28/20 6:34 AM, Cornelia Huck wrote:

On Mon, 27 Apr 2020 16:31:44 +0200
Eric Auger  wrote:


The tpm-tis-device device does not support PPI. Let's
change the default value for the corresponding property
instead of overriding it in the mach-virt machine.

Signed-off-by: Eric Auger 
---
  hw/tpm/tpm_tis_sysbus.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/tpm/tpm_tis_sysbus.c b/hw/tpm/tpm_tis_sysbus.c
index 18c02aed67..eced1fc843 100644
--- a/hw/tpm/tpm_tis_sysbus.c
+++ b/hw/tpm/tpm_tis_sysbus.c
@@ -91,7 +91,7 @@ static void tpm_tis_sysbus_reset(DeviceState *dev)
  static Property tpm_tis_sysbus_properties[] = {
  DEFINE_PROP_UINT32("irq", TPMStateSysBus, state.irq_num, TPM_TIS_IRQ),
  DEFINE_PROP_TPMBE("tpmdev", TPMStateSysBus, state.be_driver),
-DEFINE_PROP_BOOL("ppi", TPMStateSysBus, state.ppi_enabled, true),
+DEFINE_PROP_BOOL("ppi", TPMStateSysBus, state.ppi_enabled, false),
  DEFINE_PROP_END_OF_LIST(),
  };
  

This looks like a better place to do this than in the virt compat
machines, and should get us the same result, leaving compatibility
intact.

Reviewed-by: Cornelia Huck 


Reviewed-by: Stefan Berger 





Re: [PATCH 2/2] hw/arm/virt: Remove the compat forcing tpm-tis-device PPI to off

2020-04-28 Thread Stefan Berger

On 4/28/20 6:36 AM, Cornelia Huck wrote:

On Mon, 27 Apr 2020 16:31:45 +0200
Eric Auger  wrote:


Now that the tpm-tis-device device PPI property is off by default,
we can remove the compat used for the same goal.

Signed-off-by: Eric Auger 
---
  hw/arm/virt.c | 5 -
  1 file changed, 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7dc96abf72..2a68306f28 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2311,11 +2311,6 @@ type_init(machvirt_machine_init);
  
  static void virt_machine_5_0_options(MachineClass *mc)

  {
-static GlobalProperty compat[] = {
-{ TYPE_TPM_TIS_SYSBUS, "ppi", "false" },
-};
-
-compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
  }
  DEFINE_VIRT_MACHINE_AS_LATEST(5, 0)
  

Reviewed-by: Cornelia Huck 


Reviewed-by: Stefan Berger 





Re: [PATCH RFC v2 9/9] KVM: Dirty ring support

2020-04-28 Thread Peter Xu
On Tue, Apr 28, 2020 at 04:05:09PM -0400, Peter Xu wrote:
> +/*
> + * Flush all the existing dirty pages to the KVM slot buffers.  When
> + * this call returns, we guarantee that all the touched dirty pages
> + * before calling this function have been put into the per-kvmslot
> + * dirty bitmap.
> + *
> + * To achieve this, we need to:
> + *
> + * (1) Kick all vcpus out, this will make sure that we flush all the
> + * dirty buffers that potentially in the hardware (PML) into the
> + * dirty rings, after that,
> + *
> + * (2) Kick the reaper thread and make sure it reaps all the dirty
> + * page that is in the dirty rings.

Please note that some of the comments might be outdated, like this one...

(I think I'll remove these two paragraphs in the next post)

> + *
> + * This function must be called with BQL held.
> + */
> +static void kvm_dirty_ring_flush(struct KVMDirtyRingReaper *r)
> +{
> +trace_kvm_dirty_ring_flush(0);
> +
> +/*
> + * The function needs to be serialized.  Since this function
> + * should always be with BQL held, serialization is guaranteed.
> + * However, let's be sure of it.
> + */
> +assert(qemu_mutex_iothread_locked());
> +
> +/*
> + * First make sure to flush the hardware buffers by kicking all
> + * vcpus out in a synchronous way.
> + */
> +kvm_cpu_synchronize_kick_all();
> +
> +/*
> + * Recycle the dirty bits outside the reaper thread.  We're safe because
> + * kvm_dirty_ring_reap() is internally protected by a mutex.

Same here; the comment is obsolete.  There used to be a mutex after v1 and
before v2, but I removed it because now we simply always take the BQL, so
that mutex is not needed any more.

I'm not sure whether there are still obsolete comments here and there (since the
code has changed quite a bit).  Anyway, please stick to the code if there are
conflicts, and I'll try to fix the comments up.

> + */
> +kvm_dirty_ring_reap(kvm_state);
> +
> +trace_kvm_dirty_ring_flush(1);
> +}

-- 
Peter Xu




Re: [PATCH 0/2] virt: Set tpm-tis-device ppi property to off by default

2020-04-28 Thread Stefan Berger

On 4/28/20 6:38 AM, Cornelia Huck wrote:

On Mon, 27 Apr 2020 16:31:43 +0200
Eric Auger  wrote:


Instead of using a compat in the mach-virt machine to force
PPI off for all virt machines (PPI not supported by the
tpm-tis-device device), let's simply change the default value
in the sysbus device.

Best Regards

Eric

Eric Auger (2):
   tpm: tpm-tis-device: set PPI to false by default
   hw/arm/virt: Remove the compat forcing tpm-tis-device PPI to off

  hw/arm/virt.c   | 5 -
  hw/tpm/tpm_tis_sysbus.c | 2 +-
  2 files changed, 1 insertion(+), 6 deletions(-)


I think we can apply the compat machines patch on top of these two
patches.

Q: Who will queue this and the machine types patch? It feels a bit
weird taking arm patches through the s390 tree :)

I can queue them and would send the PR soon. I am also fine with someone 
else doing it.





[PATCH RFC v2 7/9] KVM: Cache kvm slot dirty bitmap size

2020-04-28 Thread Peter Xu
Cache it too because we'll reference it more frequently in the future.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c  | 1 +
 include/sysemu/kvm_int.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index dd21b86efa..2d581013cc 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -558,6 +558,7 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
 hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
 /*HOST_LONG_BITS*/ 64) / 8;
 mem->dirty_bmap = g_malloc0(bitmap_size);
+mem->dirty_bmap_size = bitmap_size;
 }
 
 /* Sync dirty bitmap from kernel to KVMSlot.dirty_bmap */
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 1a19bfef80..71c9946ecf 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -23,6 +23,7 @@ typedef struct KVMSlot
 int old_flags;
 /* Dirty bitmap cache for the slot */
 unsigned long *dirty_bmap;
+unsigned long dirty_bmap_size;
 /* Cache of the address space ID */
 int as_id;
 /* Cache of the offset in ram address space */
-- 
2.24.1
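
As a worked example of the size being cached here: the bitmap needs one bit per target page, rounded up to a multiple of 64 bits, expressed in bytes. The sketch below reproduces that arithmetic outside QEMU; the function name and the explicit `page_bits` parameter (standing in for TARGET_PAGE_BITS) are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of y; y must be a power of two. */
#define TOY_ALIGN_UP(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* Bytes needed for a dirty bitmap covering memory_size bytes of guest RAM. */
static uint64_t toy_dirty_bitmap_bytes(uint64_t memory_size, unsigned page_bits)
{
    uint64_t pages;

    assert(page_bits < 64);
    pages = memory_size >> page_bits;               /* one bit per page */
    return TOY_ALIGN_UP(pages, (uint64_t)64) / 8;   /* whole 64-bit words */
}
```

For a 1 GiB slot with 4 KiB pages this gives 262144 bits, that is 32768 bytes; a single-page slot still occupies one full 64-bit word (8 bytes).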




[PATCH RFC v2 6/9] KVM: Provide helper to sync dirty bitmap from slot to ramblock

2020-04-28 Thread Peter Xu
kvm_physical_sync_dirty_bitmap() calculates the ramblock offset in an
awkward way from the MemoryRegionSection that is passed in from the
caller.  In fact, for each KVMSlot the ramblock offset never
changes during its lifecycle.  Cache the ramblock offset for each KVMSlot
in the structure when the KVMSlot is created.

With that, we can further simplify kvm_physical_sync_dirty_bitmap()
with a helper to sync KVMSlot dirty bitmap to the ramblock dirty
bitmap of a specific KVMSlot.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c  | 37 +
 include/sysemu/kvm_int.h |  2 ++
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 436b8fd899..dd21b86efa 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -525,15 +525,12 @@ static void kvm_log_stop(MemoryListener *listener,
 }
 
 /* get kvm's dirty pages bitmap and update qemu's */
-static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
- unsigned long *bitmap)
+static void kvm_slot_sync_dirty_pages(KVMSlot *slot)
 {
-ram_addr_t start = section->offset_within_region +
-   memory_region_get_ram_addr(section->mr);
-ram_addr_t pages = int128_get64(section->size) / qemu_real_host_page_size;
+ram_addr_t start = slot->ram_start_offset;
+ram_addr_t pages = slot->memory_size / qemu_real_host_page_size;
 
-cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages);
-return 0;
+cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages);
 }
 
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
@@ -595,12 +592,10 @@ static void 
kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
 KVMState *s = kvm_state;
 KVMSlot *mem;
 hwaddr start_addr, size;
-hwaddr slot_size, slot_offset = 0;
+hwaddr slot_size;
 
 size = kvm_align_section(section, &start_addr);
 while (size) {
-MemoryRegionSection subsection = *section;
-
 slot_size = MIN(kvm_max_slot_size, size);
 mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
 if (!mem) {
@@ -609,12 +604,7 @@ static void 
kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
 }
 
 kvm_slot_get_dirty_log(s, mem);
-
-subsection.offset_within_region += slot_offset;
-subsection.size = int128_make64(slot_size);
-kvm_get_dirty_pages_log_range(&subsection, mem->dirty_bmap);
-
-slot_offset += slot_size;
+kvm_slot_sync_dirty_pages(mem);
 start_addr += slot_size;
 size -= slot_size;
 }
@@ -1036,7 +1026,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 int err;
 MemoryRegion *mr = section->mr;
 bool writeable = !mr->readonly && !mr->rom_device;
-hwaddr start_addr, size, slot_size;
+hwaddr start_addr, size, slot_size, mr_offset;
+ram_addr_t ram_start_offset;
 void *ram;
 
 if (!memory_region_is_ram(mr)) {
@@ -1054,9 +1045,13 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 return;
 }
 
-/* use aligned delta to align the ram address */
-ram = memory_region_get_ram_ptr(mr) + section->offset_within_region +
-  (start_addr - section->offset_within_address_space);
+/* The offset of the kvmslot within the memory region */
+mr_offset = section->offset_within_region + start_addr -
+section->offset_within_address_space;
+
+/* use aligned delta to align the ram address and offset */
+ram = memory_region_get_ram_ptr(mr) + mr_offset;
+ram_start_offset = memory_region_get_ram_addr(mr) + mr_offset;
 
 kvm_slots_lock(kml);
 
@@ -1092,6 +1087,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 mem->as_id = kml->as_id;
 mem->memory_size = slot_size;
 mem->start_addr = start_addr;
+mem->ram_start_offset = ram_start_offset;
 mem->ram = ram;
 mem->flags = kvm_mem_flags(mr);
 kvm_slot_init_dirty_bitmap(mem);
@@ -1102,6 +1098,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 abort();
 }
 start_addr += slot_size;
+ram_start_offset += slot_size;
 ram += slot_size;
 size -= slot_size;
 } while (size);
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 4434e15ec7..1a19bfef80 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -25,6 +25,8 @@ typedef struct KVMSlot
 unsigned long *dirty_bmap;
 /* Cache of the address space ID */
 int as_id;
+/* Cache of the offset in ram address space */
+ram_addr_t ram_start_offset;
 } KVMSlot;
 
 typedef struct KVMMemoryListener {
-- 
2.24.1
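
The bookkeeping this patch adds to kvm_set_phys_mem() can be sketched on its own: a section larger than the maximum slot size is split into several slots, and the cached ram offset advances in lockstep with the guest address, so a later sync needs nothing but the slot itself. The `toy_*` types and function names are hypothetical simplifications of KVMSlot and the loop in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_slot {
    uint64_t start_addr;        /* guest physical start of the slot */
    uint64_t memory_size;       /* slot length in bytes */
    uint64_t ram_start_offset;  /* cached offset in ram address space */
};

/*
 * Split [start_addr, start_addr + size) into slots of at most max_slot
 * bytes.  Returns the number of slots written to out (at most cap).
 */
static size_t toy_split_slots(uint64_t start_addr, uint64_t size,
                              uint64_t ram_start_offset, uint64_t max_slot,
                              struct toy_slot *out, size_t cap)
{
    size_t n = 0;

    assert(max_slot > 0);
    while (size && n < cap) {
        uint64_t slot_size = size < max_slot ? size : max_slot;

        out[n].start_addr = start_addr;
        out[n].memory_size = slot_size;
        out[n].ram_start_offset = ram_start_offset;
        n++;

        start_addr += slot_size;
        ram_start_offset += slot_size;
        size -= slot_size;
    }
    return n;
}

/* Page count for one slot, computed from cached fields only. */
static uint64_t toy_slot_pages(const struct toy_slot *slot, uint64_t page_size)
{
    assert(page_size > 0);
    return slot->memory_size / page_size;
}
```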




[PATCH RFC v2 5/9] KVM: Provide helper to get kvm dirty log

2020-04-28 Thread Peter Xu
Provide a helper kvm_slot_get_dirty_log() to make the function
kvm_physical_sync_dirty_bitmap() clearer.  We can even cache the as_id
into KVMSlot when it is created, so that we don't even need to pass it
down every time.

While at it, remove the return value of kvm_physical_sync_dirty_bitmap()
because it should never fail.

Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c  | 42 +---
 include/sysemu/kvm_int.h |  2 ++
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index dc6371b8b2..436b8fd899 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -563,6 +563,21 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
 mem->dirty_bmap = g_malloc0(bitmap_size);
 }
 
+/* Sync dirty bitmap from kernel to KVMSlot.dirty_bmap */
+static void kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot)
+{
+struct kvm_dirty_log d = {};
+int ret;
+
+d.dirty_bitmap = slot->dirty_bmap;
+d.slot = slot->slot | (slot->as_id << 16);
+ret = kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d);
+if (ret) {
+error_report_once("%s: KVM_GET_DIRTY_LOG failed with %d",
+  __func__, ret);
+}
+}
+
 /**
  * kvm_physical_sync_dirty_bitmap - Sync dirty bitmap from kernel space
  *
@@ -574,15 +589,13 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
  * @kml: the KVM memory listener object
  * @section: the memory section to sync the dirty bitmap with
  */
-static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
-  MemoryRegionSection *section)
+static void kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
+   MemoryRegionSection *section)
 {
 KVMState *s = kvm_state;
-struct kvm_dirty_log d = {};
 KVMSlot *mem;
 hwaddr start_addr, size;
 hwaddr slot_size, slot_offset = 0;
-int ret = 0;
 
 size = kvm_align_section(section, &start_addr);
 while (size) {
@@ -592,27 +605,19 @@ static int 
kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
 mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
 if (!mem) {
 /* We don't have a slot if we want to trap every access. */
-goto out;
+return;
 }
 
-d.dirty_bitmap = mem->dirty_bmap;
-d.slot = mem->slot | (kml->as_id << 16);
-if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
-DPRINTF("ioctl failed %d\n", errno);
-ret = -1;
-goto out;
-}
+kvm_slot_get_dirty_log(s, mem);
 
 subsection.offset_within_region += slot_offset;
 subsection.size = int128_make64(slot_size);
-kvm_get_dirty_pages_log_range(&subsection, d.dirty_bitmap);
+kvm_get_dirty_pages_log_range(&subsection, mem->dirty_bmap);
 
 slot_offset += slot_size;
 start_addr += slot_size;
 size -= slot_size;
 }
-out:
-return ret;
 }
 
 /* Alignment requirement for KVM_CLEAR_DIRTY_LOG - 64 pages */
@@ -1084,6 +1089,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 do {
 slot_size = MIN(kvm_max_slot_size, size);
 mem = kvm_alloc_slot(kml);
+mem->as_id = kml->as_id;
 mem->memory_size = slot_size;
 mem->start_addr = start_addr;
 mem->ram = ram;
@@ -1126,14 +1132,10 @@ static void kvm_log_sync(MemoryListener *listener,
  MemoryRegionSection *section)
 {
 KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, 
listener);
-int r;
 
 kvm_slots_lock(kml);
-r = kvm_physical_sync_dirty_bitmap(kml, section);
+kvm_physical_sync_dirty_bitmap(kml, section);
 kvm_slots_unlock(kml);
-if (r < 0) {
-abort();
-}
 }
 
 static void kvm_log_clear(MemoryListener *listener,
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index ac2d1f8b56..4434e15ec7 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -23,6 +23,8 @@ typedef struct KVMSlot
 int old_flags;
 /* Dirty bitmap cache for the slot */
 unsigned long *dirty_bmap;
+/* Cache of the address space ID */
+int as_id;
 } KVMSlot;
 
 typedef struct KVMMemoryListener {
-- 
2.24.1
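
The `slot | (as_id << 16)` expression in kvm_slot_get_dirty_log() packs two 16-bit values into the single slot field of struct kvm_dirty_log: the slot number in the low half and the address space ID in the high half. A small sketch of that packing follows; the helper names are hypothetical and the range checks are an addition for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a slot number and an address space ID into one 32-bit field. */
static uint32_t toy_encode_slot(int slot, int as_id)
{
    assert(slot >= 0 && slot <= 0xffff);
    assert(as_id >= 0 && as_id <= 0xffff);
    return (uint32_t)slot | ((uint32_t)as_id << 16);
}

/* Recover the slot number (bits 0-15). */
static int toy_slot_of(uint32_t encoded)
{
    return (int)(encoded & 0xffff);
}

/* Recover the address space ID (bits 16-31). */
static int toy_as_id_of(uint32_t encoded)
{
    return (int)(encoded >> 16);
}
```

Caching as_id in the slot means this value can be computed from the KVMSlot alone, which is why the helper no longer needs the memory listener.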




[PATCH RFC v2 9/9] KVM: Dirty ring support

2020-04-28 Thread Peter Xu
KVM dirty ring is a new interface to pass dirty bits from the kernel to
userspace.  Instead of using a bitmap for each memory region, the dirty ring
contains an array of dirtied GPAs to fetch (in the form of offsets in slots).
For each vcpu there will be one dirty ring bound to it.

kvm_dirty_ring_reap() is the major function to collect dirty rings.  It can be
called either by a standalone reaper thread that runs in the background,
collecting dirty pages for the whole VM, or directly by any
thread that has the BQL taken.
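
To make the contrast with the bitmap interface concrete, here is a loose sketch of what reaping amounts to: each ring entry names a slot and a page offset within it, and the reaper turns that into a set bit in the slot's dirty bitmap. The entry layout and every `toy_*` name are assumptions for illustration; the real entry format is defined by the kernel interface and may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical ring entry: which slot, and which page inside it. */
struct toy_dirty_gfn {
    uint32_t slot;
    uint64_t offset;
};

struct toy_ring_slot {
    unsigned long *dirty_bmap;  /* one bit per page */
    uint64_t npages;
};

/*
 * Walk the ring entries and mark each referenced page dirty in the
 * per-slot bitmap.  Entries that name an unknown slot or an offset
 * outside the slot are skipped.  Returns the number of entries applied.
 */
static size_t toy_ring_reap(const struct toy_dirty_gfn *ring, size_t count,
                            struct toy_ring_slot *slots, size_t nslots)
{
    size_t applied = 0;

    assert(ring != NULL && slots != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct toy_dirty_gfn *e = &ring[i];

        if (e->slot >= nslots || e->offset >= slots[e->slot].npages) {
            continue;
        }
        slots[e->slot].dirty_bmap[e->offset / TOY_BITS_PER_LONG] |=
            1UL << (e->offset % TOY_BITS_PER_LONG);
        applied++;
    }
    return applied;
}
```

The cost of a reap scales with the number of dirtied pages rather than with the size of guest memory, which is the main attraction over scanning a full bitmap.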

Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c| 341 -
 accel/kvm/trace-events |   7 +
 include/hw/core/cpu.h  |   8 +
 3 files changed, 353 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fbb0a3b1e9..236dbcd536 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -15,6 +15,7 @@
 
 #include "qemu/osdep.h"
 #include 
+#include 
 
 #include 
 
@@ -75,6 +76,25 @@ struct KVMParkedVcpu {
 QLIST_ENTRY(KVMParkedVcpu) node;
 };
 
+enum KVMDirtyRingReaperState {
+KVM_DIRTY_RING_REAPER_NONE = 0,
+/* The reaper is sleeping */
+KVM_DIRTY_RING_REAPER_WAIT,
+/* The reaper is reaping for dirty pages */
+KVM_DIRTY_RING_REAPER_REAPING,
+};
+
+/*
+ * KVM reaper instance, responsible for collecting the KVM dirty bits
+ * via the dirty ring.
+ */
+struct KVMDirtyRingReaper {
+/* The reaper thread */
+QemuThread reaper_thr;
+volatile uint64_t reaper_iteration; /* iteration number of reaper thr */
+volatile enum KVMDirtyRingReaperState reaper_state; /* reap thr state */
+};
+
 struct KVMState
 {
 AccelState parent_obj;
@@ -121,7 +141,6 @@ struct KVMState
 void *memcrypt_handle;
 int (*memcrypt_encrypt_data)(void *handle, uint8_t *ptr, uint64_t len);
 
-/* For "info mtree -f" to tell if an MR is registered in KVM */
 int nr_as;
 struct KVMAs {
 KVMMemoryListener *ml;
@@ -130,6 +149,7 @@ struct KVMState
 bool kvm_dirty_ring_enabled;/* Whether KVM dirty ring is enabled */
 uint64_t kvm_dirty_ring_size;   /* Size of the per-vcpu dirty ring */
 uint32_t kvm_dirty_gfn_count;   /* Number of dirty GFNs per ring */
+struct KVMDirtyRingReaper reaper;
 };
 
 KVMState *kvm_state;
@@ -359,6 +379,13 @@ int kvm_destroy_vcpu(CPUState *cpu)
 goto err;
 }
 
+if (cpu->kvm_dirty_gfns) {
+ret = munmap(cpu->kvm_dirty_gfns, s->kvm_dirty_ring_size);
+if (ret < 0) {
+goto err;
+}
+}
+
 vcpu = g_malloc0(sizeof(*vcpu));
 vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
 vcpu->kvm_fd = cpu->kvm_fd;
@@ -423,6 +450,19 @@ int kvm_init_vcpu(CPUState *cpu)
 (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
 }
 
+if (s->kvm_dirty_ring_enabled) {
+/* Use MAP_SHARED to share pages with the kernel */
+cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_size,
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   cpu->kvm_fd,
+   PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
+if (cpu->kvm_dirty_gfns == MAP_FAILED) {
+ret = -errno;
+DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
+goto err;
+}
+}
+
 ret = kvm_arch_init_vcpu(cpu);
 err:
 return ret;
@@ -536,6 +576,11 @@ static void kvm_slot_sync_dirty_pages(KVMSlot *slot)
 cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages);
 }
 
+static void kvm_slot_reset_dirty_pages(KVMSlot *slot)
+{
+memset(slot->dirty_bmap, 0, slot->dirty_bmap_size);
+}
+
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 
 /* Allocate the dirty bitmap for a slot  */
@@ -579,6 +624,198 @@ static void kvm_slot_get_dirty_log(KVMState *s, KVMSlot 
*slot)
 }
 }
 
+/* Should be with all slots_lock held for the address spaces. */
+static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id,
+ uint32_t slot_id, uint64_t offset)
+{
+KVMMemoryListener *kml;
+KVMSlot *mem;
+
+if (as_id >= s->nr_as) {
+return;
+}
+
+kml = s->as[as_id].ml;
+mem = &kml->slots[slot_id];
+
+if (!mem->memory_size || offset >= (mem->memory_size / TARGET_PAGE_SIZE)) {
+return;
+}
+
+set_bit(offset, mem->dirty_bmap);
+}
+
+static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)
+{
+return gfn->flags == KVM_DIRTY_GFN_F_DIRTY;
+}
+
+static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)
+{
+gfn->flags = KVM_DIRTY_GFN_F_RESET;
+}
+
+/*
+ * Should be with all slots_lock held for the address spaces.  It returns the
+ * dirty page we've collected on this dirty ring.
+ */
+static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
+{
+struct kvm_dirty_gfn *dirty_gfns = cpu->kvm_dirty_gfns, *cur;
+uint32_t gfn_count = s->kvm_dirty_gfn_count;
+uint32_t count = 0, 

[PATCH RFC v2 8/9] KVM: Add dirty-gfn-count property

2020-04-28 Thread Peter Xu
Add a parameter for the dirty gfn count of dirty rings.  If zero, dirty ring
is disabled.  Otherwise dirty ring will be enabled with the per-vcpu gfn count
as specified.  If dirty ring cannot be enabled due to an unsupported kernel or
an illegal parameter, it'll fall back to dirty logging.

By default, dirty ring is not enabled (dirty-gfn-count defaults to 0).
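
The decision can be sketched roughly as follows.  This is a standalone model,
not the kvm_init() code: the 16-byte entry size, the demo_* names and the
max_ring_bytes parameter (standing in for what the capability check reports)
are assumptions.  It follows the checks in the diff, where a count that
exceeds the reported maximum is treated as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry size for this sketch only. */
#define DEMO_GFN_ENTRY_SIZE 16u

enum demo_dirty_mode {
    DEMO_MODE_DIRTY_LOG,   /* bitmap-based dirty logging */
    DEMO_MODE_DIRTY_RING,  /* per-vcpu dirty ring */
    DEMO_MODE_ERROR,       /* the requested count was rejected */
};

/*
 * gfn_count:      value of the dirty-gfn-count property (0 = disabled)
 * max_ring_bytes: <= 0 if the kernel lacks the feature, otherwise the
 *                 largest ring size it accepts, in bytes
 * ring_bytes:     set to the ring size when the ring is selected, else 0
 */
static enum demo_dirty_mode demo_pick_dirty_mode(uint32_t gfn_count,
                                                 int64_t max_ring_bytes,
                                                 uint64_t *ring_bytes)
{
    uint64_t size = (uint64_t)gfn_count * DEMO_GFN_ENTRY_SIZE;

    assert(ring_bytes != NULL);
    *ring_bytes = 0;

    if (gfn_count == 0 || max_ring_bytes <= 0) {
        /* Disabled by the user, or unsupported: use dirty logging. */
        return DEMO_MODE_DIRTY_LOG;
    }
    if (size > (uint64_t)max_ring_bytes) {
        return DEMO_MODE_ERROR;
    }
    *ring_bytes = size;
    return DEMO_MODE_DIRTY_RING;
}
```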

Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 72 +
 qemu-options.hx |  5 
 2 files changed, 77 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 2d581013cc..fbb0a3b1e9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -127,6 +127,9 @@ struct KVMState
 KVMMemoryListener *ml;
 AddressSpace *as;
 } *as;
+bool kvm_dirty_ring_enabled;/* Whether KVM dirty ring is enabled */
+uint64_t kvm_dirty_ring_size;   /* Size of the per-vcpu dirty ring */
+uint32_t kvm_dirty_gfn_count;   /* Number of dirty GFNs per ring */
 };
 
 KVMState *kvm_state;
@@ -2087,6 +2090,40 @@ static int kvm_init(MachineState *ms)
 s->memory_listener.listener.coalesced_io_add = kvm_coalesce_mmio_region;
 s->memory_listener.listener.coalesced_io_del = kvm_uncoalesce_mmio_region;
 
+/*
+ * Enable KVM dirty ring if supported, otherwise fall back to
+ * dirty logging mode
+ */
+if (s->kvm_dirty_gfn_count > 0) {
+uint64_t ring_size;
+
+ring_size = s->kvm_dirty_gfn_count * sizeof(struct kvm_dirty_gfn);
+
+/* Read the max supported pages */
+ret = kvm_vm_check_extension(kvm_state, KVM_CAP_DIRTY_LOG_RING);
+if (ret > 0) {
+if (ring_size > ret) {
+error_report("KVM dirty GFN count %" PRIu32 " too big "
+ "(maximum is %ld).  Please use a smaller value.",
+ s->kvm_dirty_gfn_count,
+ ret / sizeof(struct kvm_dirty_gfn));
+ret = -EINVAL;
+goto err;
+}
+
+ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_size);
+if (ret) {
+error_report("Enabling of KVM dirty ring failed: %d. "
+ "Suggested minimum value is 1024. "
+ "Please also make sure it's a power of two.", 
ret);
+goto err;
+}
+
+s->kvm_dirty_ring_size = ring_size;
+s->kvm_dirty_ring_enabled = true;
+}
+}
+
 kvm_memory_listener_register(s, >memory_listener,
  _space_memory, 0);
 memory_listener_register(_io_listener,
@@ -3047,6 +3084,33 @@ bool kvm_kernel_irqchip_split(void)
 return kvm_state->kernel_irqchip_split == ON_OFF_AUTO_ON;
 }
 
+static void kvm_get_dirty_gfn_count(Object *obj, Visitor *v,
+const char *name, void *opaque,
+Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+uint32_t value = s->kvm_dirty_gfn_count;
+
+visit_type_uint32(v, name, &value, errp);
+}
+
+static void kvm_set_dirty_gfn_count(Object *obj, Visitor *v,
+const char *name, void *opaque,
+Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+Error *error = NULL;
+uint32_t value;
+
+visit_type_uint32(v, name, &value, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
+s->kvm_dirty_gfn_count = value;
+}
+
 static void kvm_accel_instance_init(Object *obj)
 {
 KVMState *s = KVM_STATE(obj);
@@ -3054,6 +3118,8 @@ static void kvm_accel_instance_init(Object *obj)
 s->kvm_shadow_mem = -1;
 s->kernel_irqchip_allowed = true;
 s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
+/* KVM dirty ring is by default off */
+s->kvm_dirty_gfn_count = 0;
 }
 
 static void kvm_accel_class_init(ObjectClass *oc, void *data)
@@ -3075,6 +3141,12 @@ static void kvm_accel_class_init(ObjectClass *oc, void 
*data)
 NULL, NULL, &error_abort);
 object_class_property_set_description(oc, "kvm-shadow-mem",
 "KVM shadow MMU size", &error_abort);
+
+object_class_property_add(oc, "dirty-gfn-count", "uint32",
+kvm_get_dirty_gfn_count, kvm_set_dirty_gfn_count,
+NULL, NULL, &error_abort);
+object_class_property_set_description(oc, "dirty-gfn-count",
+"KVM dirty GFN count (=0 to disable dirty ring)", &error_abort);
 }
 
 static const TypeInfo kvm_accel_type = {
diff --git a/qemu-options.hx b/qemu-options.hx
index 292d4e7c0c..62e88f012c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -124,6 +124,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
 "kernel-irqchip=on|off|split controls accelerated irqchip 
support (default=on)\n"
 "kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
 "tb-size=n (TCG translation block cache size)\n"
+"dirty-gfn-count=n (KVM 

[PATCH RFC v2 3/9] memory: Introduce log_sync_global() to memory listener

2020-04-28 Thread Peter Xu
Some memory listeners may want to do log synchronization without
being able to specify a range of memory to sync; they can only sync
globally.  Such a memory listener should provide this new method
instead of the log_sync() method.

We could achieve a similar result by putting the global sync logic
into a log_sync() handler.  However, that is not efficient enough,
because memory_global_dirty_log_sync() would then do the global sync
N times, where N is the number of flat ranges in the address space.

Make this new method mutually exclusive with log_sync().
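
The difference in call counts can be shown with a small standalone model.
The struct and the demo_* names below are hypothetical and much simpler than
MemoryListener; the only point is that a per-range hook runs once per flat
range while the global hook runs once per listener.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down listener for this sketch only. */
struct demo_listener {
    void (*log_sync)(struct demo_listener *l, int range_index);
    void (*log_sync_global)(struct demo_listener *l);
    int calls;  /* how often a sync hook was invoked */
};

/* Sync one listener over nr_ranges flat ranges. */
static void demo_sync_all(struct demo_listener *l, int nr_ranges)
{
    /* Only one of the two hooks may be defined. */
    assert(!(l->log_sync && l->log_sync_global));

    if (l->log_sync) {
        for (int i = 0; i < nr_ranges; i++) {
            l->log_sync(l, i);
        }
    } else if (l->log_sync_global) {
        l->log_sync_global(l);
    }
}

/* Example hooks that just count their invocations. */
static void demo_count_range(struct demo_listener *l, int range_index)
{
    (void)range_index;
    l->calls++;
}

static void demo_count_global(struct demo_listener *l)
{
    l->calls++;
}
```

With five flat ranges, a listener that uses demo_count_range is called five
times, while one that uses demo_count_global is called once.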

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 include/exec/memory.h | 12 
 memory.c  | 33 +++--
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index e000bd2f97..c0c6155ca0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -533,6 +533,18 @@ struct MemoryListener {
  */
 void (*log_sync)(MemoryListener *listener, MemoryRegionSection *section);
 
+/**
+ * @log_sync_global:
+ *
+ * This is the global version of @log_sync when the listener does
+ * not have a way to synchronize the log with finer granularity.
+ * When the listener registers with @log_sync_global defined, then
+ * its @log_sync must be NULL.  Vice versa.
+ *
+ * @listener: The #MemoryListener.
+ */
+void (*log_sync_global)(MemoryListener *listener);
+
 /**
  * @log_clear:
  *
diff --git a/memory.c b/memory.c
index 357f7276ee..2a704996c2 100644
--- a/memory.c
+++ b/memory.c
@@ -2047,6 +2047,10 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr 
addr,
 memory_region_get_dirty_log_mask(mr));
 }
 
+/*
+ * If memory region `mr' is NULL, do global sync.  Otherwise, sync
+ * dirty bitmap for the specified memory region.
+ */
 static void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
 {
 MemoryListener *listener;
@@ -2060,18 +2064,24 @@ static void 
memory_region_sync_dirty_bitmap(MemoryRegion *mr)
  * address space once.
  */
 QTAILQ_FOREACH(listener, _listeners, link) {
-if (!listener->log_sync) {
-continue;
-}
-as = listener->address_space;
-view = address_space_get_flatview(as);
-FOR_EACH_FLAT_RANGE(fr, view) {
-if (fr->dirty_log_mask && (!mr || fr->mr == mr)) {
-MemoryRegionSection mrs = section_from_flat_range(fr, view);
-listener->log_sync(listener, &mrs);
+if (listener->log_sync) {
+as = listener->address_space;
+view = address_space_get_flatview(as);
+FOR_EACH_FLAT_RANGE(fr, view) {
+if (fr->dirty_log_mask && (!mr || fr->mr == mr)) {
+MemoryRegionSection mrs = section_from_flat_range(fr, 
view);
+listener->log_sync(listener, &mrs);
+}
 }
+flatview_unref(view);
+} else if (listener->log_sync_global) {
+/*
+ * No matter whether MR is specified, what we can do here
+ * is to do a global sync, because we are not capable to
+ * sync in a finer granularity.
+ */
+listener->log_sync_global(listener);
 }
-flatview_unref(view);
 }
 }
 
@@ -2758,6 +2768,9 @@ void memory_listener_register(MemoryListener *listener, 
AddressSpace *as)
 {
 MemoryListener *other = NULL;
 
+/* Only one of them can be defined for a listener */
+assert(!(listener->log_sync && listener->log_sync_global));
+
 listener->address_space = as;
 if (QTAILQ_EMPTY(_listeners)
 || listener->priority >= QTAILQ_LAST(_listeners)->priority) {
-- 
2.24.1




[PATCH RFC v2 4/9] KVM: Create the KVMSlot dirty bitmap on flag changes

2020-04-28 Thread Peter Xu
Previously we had two places that create the per-KVMSlot dirty
bitmap:

  1. When a newly created KVMSlot has dirty logging enabled,
  2. When the first log_sync() happens for a memory slot.

The 2nd case is lazy-init, while the 1st case is not (which is a fix
of what the 2nd case missed).

To do explicit initialization of dirty bitmaps, what we're missing is
to create the dirty bitmap when the slot changes from not-dirty-track
to dirty-track.  Do that in kvm_slot_update_flags().

With that, we can safely remove the 2nd lazy-init.

This change will be needed for kvm dirty ring because kvm dirty ring
does not use the log_sync() interface at all.

Also move all the pre-checks into kvm_slot_init_dirty_bitmap().
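
A minimal sketch of the resulting pattern, with the pre-checks inside the
init helper so that callers can invoke it unconditionally.  The demo_* names,
the flag value and the page size are assumptions for the example, and the
bitmap sizing is simplified compared to the real helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MEM_LOG_DIRTY_PAGES 1u   /* hypothetical flag value */
#define DEMO_PAGE_SIZE           4096u

/* Hypothetical, stripped-down slot for this sketch only. */
struct demo_slot {
    uint32_t flags;
    uint64_t memory_size;
    unsigned long *dirty_bmap;
};

/* Allocate the bitmap only if dirty tracking is on and none exists yet. */
static void demo_slot_init_dirty_bitmap(struct demo_slot *slot)
{
    if (!(slot->flags & DEMO_MEM_LOG_DIRTY_PAGES) || slot->dirty_bmap) {
        return;
    }

    size_t pages = slot->memory_size / DEMO_PAGE_SIZE;
    size_t bits_per_long = sizeof(unsigned long) * 8;
    size_t longs = (pages + bits_per_long - 1) / bits_per_long;

    slot->dirty_bmap = calloc(longs ? longs : 1, sizeof(unsigned long));
    assert(slot->dirty_bmap != NULL);
}

/* Flag changes are the point where the bitmap gets created. */
static void demo_slot_update_flags(struct demo_slot *slot, uint32_t new_flags)
{
    slot->flags = new_flags;
    demo_slot_init_dirty_bitmap(slot);
}
```

Calling the helper twice is harmless: the second call sees the existing
bitmap and returns early.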

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 1f1fd5316e..dc6371b8b2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -162,6 +162,8 @@ static NotifierList kvm_irqchip_change_notifiers =
 #define kvm_slots_lock(kml)  qemu_mutex_lock(&(kml)->slots_lock)
 #define kvm_slots_unlock(kml)qemu_mutex_unlock(&(kml)->slots_lock)
 
+static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
+
 int kvm_get_max_memslots(void)
 {
 KVMState *s = KVM_STATE(current_accel());
@@ -452,6 +454,7 @@ static int kvm_slot_update_flags(KVMMemoryListener *kml, 
KVMSlot *mem,
 return 0;
 }
 
+kvm_slot_init_dirty_bitmap(mem);
 return kvm_set_user_memory_region(kml, mem, false);
 }
 
@@ -536,8 +539,12 @@ static int 
kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 
 /* Allocate the dirty bitmap for a slot  */
-static void kvm_memslot_init_dirty_bitmap(KVMSlot *mem)
+static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
 {
+if (!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) || mem->dirty_bmap) {
+return;
+}
+
 /*
  * XXX bad kernel interface alert
  * For dirty bitmap, kernel allocates array of size aligned to
@@ -588,11 +595,6 @@ static int 
kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
 goto out;
 }
 
-if (!mem->dirty_bmap) {
-/* Allocate on the first log_sync, once and for all */
-kvm_memslot_init_dirty_bitmap(mem);
-}
-
 d.dirty_bitmap = mem->dirty_bmap;
 d.slot = mem->slot | (kml->as_id << 16);
 if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
@@ -1086,14 +1088,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 mem->start_addr = start_addr;
 mem->ram = ram;
 mem->flags = kvm_mem_flags(mr);
-
-if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-/*
- * Reallocate the bmap; it means it doesn't disappear in
- * middle of a migrate.
- */
-kvm_memslot_init_dirty_bitmap(mem);
-}
+kvm_slot_init_dirty_bitmap(mem);
 err = kvm_set_user_memory_region(kml, mem, true);
 if (err) {
 fprintf(stderr, "%s: error registering slot: %s\n", __func__,
-- 
2.24.1




[PATCH v22 4/4] iotests: 287: add qcow2 compression type test

2020-04-28 Thread Denis Plotnikov
The test checks that the qcow2 requirements for the compression type
feature are met and that the zstd compression type works.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/287 | 152 +
 tests/qemu-iotests/287.out |  67 
 tests/qemu-iotests/group   |   1 +
 3 files changed, 220 insertions(+)
 create mode 100755 tests/qemu-iotests/287
 create mode 100644 tests/qemu-iotests/287.out

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..21fe1f19f5
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,152 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_unsupported_imgopts 'compat=0.10' data_file
+
+COMPR_IMG="$TEST_IMG.compressed"
+RAND_FILE="$TEST_DIR/rand_data"
+
+_cleanup()
+{
+   _rm_test_img
+   rm -f "$COMPR_IMG"
+   rm -f "$RAND_FILE"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# for all the cases
+CLUSTER_SIZE=65536
+
+# Check if we can run this test.
+if IMGOPTS='compression_type=zstd' _make_test_img 64M |
+grep "Invalid parameter 'zstd'"; then
+_notrun "ZSTD is disabled"
+fi
+
+echo
+echo "=== Testing compression type incompatible bit setting for zlib ==="
+echo
+_make_test_img -o compression_type=zlib 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+echo
+echo "=== Testing compression type incompatible bit setting for zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+echo
+echo "=== Testing zlib with incompatible bit set ==="
+echo
+_make_test_img -o compression_type=zlib 64M
+$PYTHON qcow2.py "$TEST_IMG" set-feature-bit incompatible 3
+# to make sure the bit was actually set
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+if $QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1 ; then
+echo "Error: The image opened successfully. The image must not be opened."
+fi
+
+echo
+echo "=== Testing zstd with incompatible bit unset ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$PYTHON qcow2.py "$TEST_IMG" set-header incompatible_features 0
+# to make sure the bit was actually unset
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+if $QEMU_IMG info "$TEST_IMG" >/dev/null 2>&1 ; then
+echo "Error: The image opened successfully. The image must not be opened."
+fi
+
+echo
+echo "=== Testing compression type values ==="
+echo
+# zlib=0
+_make_test_img -o compression_type=zlib 64M
+peek_file_be "$TEST_IMG" 104 1
+echo
+
+# zstd=1
+_make_test_img -o compression_type=zstd 64M
+peek_file_be "$TEST_IMG" 104 1
+echo
+
+echo
+echo "=== Testing simple reading and writing with zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+# read on the cluster boundaries
+$QEMU_IO -c "read -v 131070 8 " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -v 65534 8" "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo "=== Testing adjacent clusters reading and writing with zstd ==="
+echo
+_make_test_img -o compression_type=zstd 64M
+$QEMU_IO -c "write -c -P 0xAB 0 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xAD 128K 64K " "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IO -c "read -P 0xAB 0 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64k " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAD 128K 64k " "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo "=== Testing incompressible cluster processing with zstd ==="
+echo
+# create a 2M image and fill it with 1M likely incompressible data
+# and 1M compressible data
+dd if=/dev/urandom of="$RAND_FILE" bs=1M count=1 seek=1

[PATCH v22 0/4] implement zstd cluster compression method

2020-04-28 Thread Denis Plotnikov
v22:
   03: remove assignment in if condition

v21:
   03:
   * remove the loop on compression [Max]
   * use designated initializers [Max]
   04:
   * don't erase user's options [Max]
   * use _rm_test_img [Max]
   * add unsupported qcow2 options [Max]

v20:
   04: fix a number of flaws [Vladimir]
   * don't use $RAND_FILE passing to qemu-io,
 so check $TEST_DIR is redundant
   * re-arrage $RAND_FILE writing
   * fix a typo

v19:
   04: fix a number of flaws [Eric]
   * remove redundant test case descriptions
   * fix stdout redirect
   * don't use (())
   * use peek_file_be instead of od
   * check $TEST_DIR for spaces and other before using
   * use $RAND_FILE safer

v18:
   * 04: add quotes to all file name variables [Vladimir] 
   * 04: add Vladimir's comment according to "qemu-io write -s"
 option issue.

v17:
   * 03: remove incorrect comment in zstd decompress [Vladimir]
   * 03: remove "paraniod" and rewrite the comment on decompress [Vladimir]
   * 03: fix dead assignment [Vladimir]
   * 04: add and remove quotes [Vladimir]
   * 04: replace long offset form with the short one [Vladimir]

v16:
   * 03: ssize_t for ret, size_t for zstd_ret [Vladimir]
   * 04: small fixes according to the comments [Vladimir] 

v15:
   * 01: aiming qemu 5.1 [Eric]
   * 03: change zstd_res definition place [Vladimir]
   * 04: add two new test cases [Eric]
 1. test adjacent cluster compression with zstd
 2. test incompressible cluster processing
   * 03, 04: many rewordings and grammar fixes [Eric]

v14:
   * fix bug on compression - looping until compress == 0 [Me]
   * apply reworked Vladimir's suggestions:
  1. not mixing ssize_t with size_t
  2. safe check for ENOMEM in compression part - avoid overflow
  3. tolerate sanity check allow zstd to make progress only
 on one of the buffers
v13:
   * 03: add progress sanity check to decompression loop [Vladimir]
 03: add successful decompression check [Me]

v12:
   * 03: again, rework compression and decompression loops
 to make them more correct [Vladimir]
 03: move assert in compression to more appropriate place
 [Vladimir]
v11:
   * 03: the loops don't need the "do{}while" form anymore and
 they were buggy (missed "do" in the beginning);
 replace them with the usual "while(){}" loops [Vladimir]
v10:
   * 03: fix zstd (de)compressed loops for multi-frame
 cases [Vladimir]
v9:
   * 01: fix error checking and reporting in qcow2_amend compression type part 
[Vladimir]
   * 03: replace asserts with -EIO in qcow2_zstd_decompression [Vladimir, 
Alberto]
   * 03: reword/amend/add comments, fix typos [Vladimir]

v8:
   * 03: switch zstd API from simple to stream [Eric]
 No need to state a special cluster layout for zstd
 compressed clusters.
v7:
   * use qapi_enum_parse instead of the open-coding [Eric]
   * fix wording, typos and spelling [Eric]

v6:
   * "block/qcow2-threads: fix qcow2_decompress" is removed from the series
  since it has been accepted by Max already
   * add compile time checking for Qcow2Header to be a multiple of 8 [Max, 
Alberto]
   * report error on qcow2 amending when the compression type is actually 
changed [Max]
   * remove the extra space and the extra new line [Max]
   * re-arrange acks and signed-off-s [Vladimir]

v5:
   * replace -ENOTSUP with abort in qcow2_co_decompress [Vladimir]
   * set cluster size for all test cases in the beginning of the 287 test

v4:
   * the series is rebased on top of 01 "block/qcow2-threads: fix 
qcow2_decompress"
   * 01 is just a no-change resend to avoid extra dependencies. Still, it may 
be merged separately

v3:
   * remove redundant max compression type value check [Vladimir, Eric]
 (the switch below checks everything)
   * prevent compression type changing on "qemu-img amend" [Vladimir]
   * remove zstd config setting, since it has been added already by
 "migration" patches [Vladimir]
   * change the compression type error message [Vladimir] 
   * fix alignment and 80-chars exceeding [Vladimir]

v2:
   * rework compression type setting [Vladimir]
   * squash iotest changes to the compression type introduction patch 
[Vladimir, Eric]
   * fix zstd availability checking in zstd iotest [Vladimir]
   * remove unnecessary casting [Eric]
   * remove redundant checks [Eric]
   * fix compressed cluster layout in qcow2 spec [Vladimir]
   * fix wording [Eric, Vladimir]
   * fix compression type filtering in iotests [Eric]

v1:
   the initial series

Denis Plotnikov (4):
  qcow2: introduce compression type feature
  qcow2: rework the cluster compression routine
  qcow2: add zstd cluster compression
  iotests: 287: add qcow2 compression type test

 docs/interop/qcow2.txt   |   1 +
 configure|   2 +-
 qapi/block-core.json |  23 ++-
 block/qcow2.h|  20 ++-
 

[PATCH v22 1/4] qcow2: introduce compression type feature

2020-04-28 Thread Denis Plotnikov
The patch adds some preparation parts for the incompatible compression type
feature to qcow2, allowing the use of different compression methods for
(de)compressing image clusters.

It is implied that the compression type is set on image creation and
can later be changed only by image conversion. Thus the compression type
defines the only compression algorithm used for the image and, therefore,
for all image clusters.

The goal of the feature is to add support of other compression methods
to qcow2. For example, ZSTD which is more effective on compression than ZLIB.

The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.

Adding the compression type breaks a number of tests because the
compression type is now reported on image creation and the qcow2 header
changes in size and offsets.

The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
  affected tests: 031, 036, 061, 080
  header_size +=8: 1 byte compression type
   7 bytes padding
  feature_table += 48: incompatible feature compression type
  backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
  affected tests: 049, 060, 061, 065, 144, 182, 242, 255
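
The size arithmetic above can be double-checked with a small standalone
sketch.  The structs are stand-ins, not QCowHeader: the 96 opaque leading
bytes are an assumption chosen so that the new field lands at offset 104
(the offset test 287 reads the compression type from), and the demo_* names
are made up.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the header before the patch: opaque bytes + last two fields. */
struct demo_header_old {
    uint8_t  earlier_fields[96];
    uint32_t refcount_order;
    uint32_t header_length;
};

/* Stand-in for the header after the patch. */
struct demo_header_new {
    uint8_t  earlier_fields[96];
    uint32_t refcount_order;
    uint32_t header_length;
    uint8_t  compression_type;
    uint8_t  padding[7];      /* keep the header a multiple of 8 bytes */
};

static_assert(sizeof(struct demo_header_new) % 8 == 0,
              "header must be a multiple of 8");

/* Size of the one new feature table entry, per the commit message. */
#define DEMO_FEATURE_ENTRY_SIZE 48u

/* header_size += 8 */
static size_t demo_header_growth(void)
{
    return sizeof(struct demo_header_new) - sizeof(struct demo_header_old);
}

/* backing_file_offset += 8 + 48 */
static size_t demo_backing_file_shift(void)
{
    return demo_header_growth() + DEMO_FEATURE_ENTRY_SIZE;
}
```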

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
QAPI part:
Acked-by: Markus Armbruster 
---
 qapi/block-core.json |  22 +-
 block/qcow2.h|  20 +-
 include/block/block_int.h|   1 +
 block/qcow2.c| 113 +++
 tests/qemu-iotests/031.out   |  14 ++--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 ++--
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  34 ++
 tests/qemu-iotests/065   |  28 +---
 tests/qemu-iotests/080   |   2 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/common.filter |   3 +-
 16 files changed, 267 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..1522e2983f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -78,6 +78,8 @@
 #
 # @bitmaps: A list of qcow2 bitmap details (since 4.0)
 #
+# @compression-type: the image cluster compression method (since 5.1)
+#
 # Since: 1.7
 ##
 { 'struct': 'ImageInfoSpecificQCow2',
@@ -89,7 +91,8 @@
   '*corrupt': 'bool',
   'refcount-bits': 'int',
   '*encrypt': 'ImageInfoSpecificQCow2Encryption',
-  '*bitmaps': ['Qcow2BitmapInfo']
+  '*bitmaps': ['Qcow2BitmapInfo'],
+  'compression-type': 'Qcow2CompressionType'
   } }
 
 ##
@@ -4284,6 +4287,18 @@
   'data': [ 'v2', 'v3' ] }
 
 
+##
+# @Qcow2CompressionType:
+#
+# Compression type used in qcow2 image file
+#
+# @zlib: zlib compression, see 
+#
+# Since: 5.1
+##
+{ 'enum': 'Qcow2CompressionType',
+  'data': [ 'zlib' ] }
+
 ##
 # @BlockdevCreateOptionsQcow2:
 #
@@ -4307,6 +4322,8 @@
 # allowed values: off, falloc, full, metadata)
 # @lazy-refcounts: True if refcounts may be updated lazily (default: off)
 # @refcount-bits: Width of reference counts in bits (default: 16)
+# @compression-type: The image cluster compression method
+#(default: zlib, since 5.1)
 #
 # Since: 2.12
 ##
@@ -4322,7 +4339,8 @@
 '*cluster-size':'size',
 '*preallocation':   'PreallocMode',
 '*lazy-refcounts':  'bool',
-'*refcount-bits':   'int' } }
+'*refcount-bits':   'int',
+'*compression-type':'Qcow2CompressionType' } }
 
 ##
 # @BlockdevCreateOptionsQed:
diff --git a/block/qcow2.h b/block/qcow2.h
index f4de0a27d5..6a8b82e6cc 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -146,8 +146,16 @@ typedef struct QCowHeader {
 
 uint32_t refcount_order;
 uint32_t header_length;
+
+/* Additional fields */
+uint8_t compression_type;
+
+/* header must be a multiple of 8 */
+uint8_t padding[7];
 } QEMU_PACKED QCowHeader;
 
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(sizeof(QCowHeader), 8));
+
 typedef struct QEMU_PACKED QCowSnapshotHeader {
 /* header is 8 byte aligned */
 uint64_t l1_table_offset;
@@ -216,13 +224,16 @@ enum {
 QCOW2_INCOMPAT_DIRTY_BITNR  = 0,
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_FILE= 1 << 

[PATCH v22 2/4] qcow2: rework the cluster compression routine

2020-04-28 Thread Denis Plotnikov
The patch makes the cluster (de)compression code take the compression
type defined for the image into account and choose the appropriate
method for (de)compressing image clusters.
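
The dispatch idea, reduced to a standalone sketch.  Everything named demo_*
is hypothetical, and the "codec" is a plain copy standing in for zlib so that
the example needs no library.  Unlike the patch, which calls abort() for an
unknown type, the picker here returns NULL so the behaviour can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

enum demo_compression_type {
    DEMO_COMPRESSION_ZLIB = 0,
};

typedef ssize_t (*demo_compress_fn)(void *dest, size_t dest_size,
                                    const void *src, size_t src_size);

/* Stand-in codec: copies the data, -ENOMEM if dest is too small. */
static ssize_t demo_store(void *dest, size_t dest_size,
                          const void *src, size_t src_size)
{
    if (src_size > dest_size) {
        return -ENOMEM;
    }
    memcpy(dest, src, src_size);
    return (ssize_t)src_size;
}

/* Map the image's compression type to the routine that implements it. */
static demo_compress_fn demo_pick_compress(enum demo_compression_type type)
{
    switch (type) {
    case DEMO_COMPRESSION_ZLIB:
        return demo_store;
    default:
        return NULL;
    }
}

/* Entry point: pick the routine once, then run it. */
static ssize_t demo_compress(enum demo_compression_type type,
                             void *dest, size_t dest_size,
                             const void *src, size_t src_size)
{
    demo_compress_fn fn = demo_pick_compress(type);

    assert(fn != NULL);
    return fn(dest, dest_size, src, src_size);
}
```

Adding another method then only means one more enum value and one more
case in the switch.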

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/qcow2-threads.c | 71 ---
 1 file changed, 60 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index a68126f291..7dbaf53489 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -74,7 +74,9 @@ typedef struct Qcow2CompressData {
 } Qcow2CompressData;
 
 /*
- * qcow2_compress()
+ * qcow2_zlib_compress()
+ *
+ * Compress @src_size bytes of data using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -83,8 +85,8 @@ typedef struct Qcow2CompressData {
  *  -ENOMEM destination buffer is not enough to store compressed data
  *  -EIOon any other error
  */
-static ssize_t qcow2_compress(void *dest, size_t dest_size,
-  const void *src, size_t src_size)
+static ssize_t qcow2_zlib_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
 {
 ssize_t ret;
 z_stream strm;
@@ -119,10 +121,10 @@ static ssize_t qcow2_compress(void *dest, size_t 
dest_size,
 }
 
 /*
- * qcow2_decompress()
+ * qcow2_zlib_decompress()
  *
  * Decompress some data (not more than @src_size bytes) to produce exactly
- * @dest_size bytes.
+ * @dest_size bytes using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -130,8 +132,8 @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
  * Returns: 0 on success
  *  -EIO on fail
  */
-static ssize_t qcow2_decompress(void *dest, size_t dest_size,
-const void *src, size_t src_size)
+static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
 {
 int ret;
 z_stream strm;
@@ -191,20 +193,67 @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, 
size_t dest_size,
 return arg.ret;
 }
 
+/*
+ * qcow2_co_compress()
+ *
+ * Compress @src_size bytes of data using the compression
+ * method defined by the image compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
   const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_compress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_compress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
+/*
+ * qcow2_co_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using the compression method defined by the image
+ * compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
 const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_decompress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_decompress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
 
-- 
2.17.0




[PATCH v22 3/4] qcow2: add zstd cluster compression

2020-04-28 Thread Denis Plotnikov
zstd significantly reduces cluster compression time.
Compared with zlib, which is currently the only compression
method available, it compresses faster while keeping the
same compression ratio.

The performance test results:
The test compresses and decompresses a qemu qcow2 image with a
freshly installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted on a brd disk to reduce the influence
of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
  time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
  src.img [zlib|zstd]_compressed.img
decompress cmd:
  time ./qemu-img convert -O qcow2
  [zlib|zstd]_compressed.img uncompressed.img

          compression            decompression
        zlib    zstd             zlib    zstd

real    65.5    16.3 (-75 %)     1.9     1.6 (-16 %)
user    65.0    15.8             5.3     2.5
sys      3.3     0.2             2.0     2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G
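
The percentages and the ratio quoted above can be re-derived from the raw
numbers.  A minimal sketch in C, assuming nothing beyond the figures in this
message (the helper names are illustrative and not part of QEMU):

```c
/* Relative change from @before to @after, in percent (negative = faster). */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Ratio of the original size to the compressed size. */
static double compression_ratio(double original, double compressed)
{
    return original / compressed;
}
```

percent_change(65.5, 16.3) comes out at roughly -75 and
percent_change(1.9, 1.6) at roughly -16, while compression_ratio(2.2, 1.4)
is roughly 1.57, in line with the rounded values reported here.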

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---
 docs/interop/qcow2.txt |   1 +
 configure  |   2 +-
 qapi/block-core.json   |   3 +-
 block/qcow2-threads.c  | 169 +
 block/qcow2.c  |   7 ++
 slirp  |   2 +-
 6 files changed, 181 insertions(+), 3 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 640e0eca40..18a77f737e 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -209,6 +209,7 @@ version 2.
 
 Available compression type values:
 0: zlib 
+1: zstd 
 
 
 === Header padding ===
diff --git a/configure b/configure
index 23b5e93752..4e3a1690ea 100755
--- a/configure
+++ b/configure
@@ -1861,7 +1861,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   lzfse   support of lzfse compression library
   (for reading lzfse-compressed dmg images)
   zstdsupport for zstd compression library
-  (for migration compression)
+  (for migration compression and qcow2 cluster compression)
   seccomp seccomp support
   coroutine-pool  coroutine freelist (better performance)
   glusterfs   GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1522e2983f..6fbacddab2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4293,11 +4293,12 @@
 # Compression type used in qcow2 image file
 #
 # @zlib: zlib compression, see 
+# @zstd: zstd compression, see 
 #
 # Since: 5.1
 ##
 { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
 
 ##
 # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..a0b12e1b15 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
 #define ZLIB_CONST
 #include <zlib.h>
 
+#ifdef CONFIG_ZSTD
+#include <zstd.h>
+#include <zstd_errors.h>
+#endif
+
 #include "qcow2.h"
 #include "block/thread-pool.h"
 #include "crypto.h"
@@ -166,6 +171,160 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t 
dest_size,
 return ret;
 }
 
+#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store compressed data
+ *  -EIOon any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+ssize_t ret;
+size_t zstd_ret;
+ZSTD_outBuffer output = {
+.dst = dest,
+.size = dest_size,
+.pos = 0
+};
+ZSTD_inBuffer input = {
+.src = src,
+.size = src_size,
+.pos = 0
+};
+ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+if (!cctx) {
+return -EIO;
+}
+/*
+ * Use the zstd streamed interface for symmetry with decompression,
+ * where streaming is essential since we don't record the exact
+ * compressed size.
+ *
+ * ZSTD_compressStream2() tries to compress everything it could
+ * with a single call. Although the ZSTD docs say that:
+ * "You must continue calling ZSTD_compressStream2() with ZSTD_e_end
+ * until it returns 0, at which point you are free to start a new frame",
+ * in our tests we saw the only case when it returned with >0 -
+ * when the output 

[PATCH RFC v2 1/9] KVM: Fixup kvm_log_clear_one_slot() ioctl return check

2020-04-28 Thread Peter Xu
kvm_vm_ioctl() already converts an ioctl() failure (return value -1) into
-errno, so comparing its result with -1 and then reading errno is wrong.
Use the return value directly.
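
To illustrate the convention: a wrapper that already folds errno into its
return value must not be checked against -1 again.  A minimal sketch, using
close() as a stand-in for the ioctl; demo_wrapped_close() and demo_caller()
are hypothetical names, not QEMU functions:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper following the convention described above:
 * translate the libc "-1 and errno" result into a negative errno value.
 */
static int demo_wrapped_close(int fd)
{
    int ret = close(fd);

    if (ret == -1) {
        ret = -errno;
    }
    return ret;
}

/* Caller: use the return value directly, never compare it with -1. */
static int demo_caller(int fd)
{
    int ret = demo_wrapped_close(fd);

    if (ret < 0) {
        /* ret already holds -errno, e.g. -EBADF for an invalid fd */
        return ret;
    }
    return 0;
}
```

With an invalid descriptor such as -1, both functions return -EBADF; a
check of the form "== -1" on the wrapper's result would only match when the
error happens to be the one whose errno value is 1.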

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e1c87fa4e1..1f1fd5316e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -698,14 +698,13 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int 
as_id, uint64_t start,
 d.num_pages = bmap_npages;
 d.slot = mem->slot | (as_id << 16);
 
-if (kvm_vm_ioctl(s, KVM_CLEAR_DIRTY_LOG, &d) == -1) {
-ret = -errno;
+ret = kvm_vm_ioctl(s, KVM_CLEAR_DIRTY_LOG, &d);
+if (ret) {
 error_report("%s: KVM_CLEAR_DIRTY_LOG failed, slot=%d, "
  "start=0x%"PRIx64", size=0x%"PRIx32", errno=%d",
  __func__, d.slot, (uint64_t)d.first_page,
  (uint32_t)d.num_pages, ret);
 } else {
-ret = 0;
 trace_kvm_clear_dirty_log(d.slot, d.first_page, d.num_pages);
 }
 
-- 
2.24.1




[PATCH RFC v2 2/9] linux-headers: Update

2020-04-28 Thread Peter Xu
Signed-off-by: Peter Xu 
---
 include/standard-headers/linux/ethtool.h  |  10 +-
 .../linux/input-event-codes.h |   5 +-
 include/standard-headers/linux/pci_regs.h |   2 +
 .../standard-headers/linux/virtio_balloon.h   |   1 +
 include/standard-headers/linux/virtio_ids.h   |   1 +
 linux-headers/COPYING |   2 +
 linux-headers/asm-x86/kvm.h   |   2 +
 linux-headers/asm-x86/unistd_32.h |   1 +
 linux-headers/asm-x86/unistd_64.h |   1 +
 linux-headers/asm-x86/unistd_x32.h|   1 +
 linux-headers/linux/kvm.h | 100 +-
 linux-headers/linux/mman.h|   5 +-
 linux-headers/linux/userfaultfd.h |  40 +--
 linux-headers/linux/vfio.h|  37 +++
 14 files changed, 195 insertions(+), 13 deletions(-)

diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 8adf3b018b..1200890c86 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -596,6 +596,9 @@ struct ethtool_pauseparam {
  * @ETH_SS_LINK_MODES: link mode names
  * @ETH_SS_MSG_CLASSES: debug message class names
  * @ETH_SS_WOL_MODES: wake-on-lan modes
+ * @ETH_SS_SOF_TIMESTAMPING: SOF_TIMESTAMPING_* flags
+ * @ETH_SS_TS_TX_TYPES: timestamping Tx types
+ * @ETH_SS_TS_RX_FILTERS: timestamping Rx filters
  */
 enum ethtool_stringset {
ETH_SS_TEST = 0,
@@ -610,6 +613,9 @@ enum ethtool_stringset {
ETH_SS_LINK_MODES,
ETH_SS_MSG_CLASSES,
ETH_SS_WOL_MODES,
+   ETH_SS_SOF_TIMESTAMPING,
+   ETH_SS_TS_TX_TYPES,
+   ETH_SS_TS_RX_FILTERS,
 
/* add new constants above here */
ETH_SS_COUNT
@@ -1330,6 +1336,7 @@ enum ethtool_fec_config_bits {
ETHTOOL_FEC_OFF_BIT,
ETHTOOL_FEC_RS_BIT,
ETHTOOL_FEC_BASER_BIT,
+   ETHTOOL_FEC_LLRS_BIT,
 };
 
 #define ETHTOOL_FEC_NONE   (1 << ETHTOOL_FEC_NONE_BIT)
@@ -1337,6 +1344,7 @@ enum ethtool_fec_config_bits {
 #define ETHTOOL_FEC_OFF(1 << ETHTOOL_FEC_OFF_BIT)
 #define ETHTOOL_FEC_RS (1 << ETHTOOL_FEC_RS_BIT)
 #define ETHTOOL_FEC_BASER  (1 << ETHTOOL_FEC_BASER_BIT)
+#define ETHTOOL_FEC_LLRS   (1 << ETHTOOL_FEC_LLRS_BIT)
 
 /* CMDs currently supported */
 #define ETHTOOL_GSET   0x0001 /* DEPRECATED, Get settings.
@@ -1521,7 +1529,7 @@ enum ethtool_link_mode_bit_indices {
ETHTOOL_LINK_MODE_40baseLR8_ER8_FR8_Full_BIT = 71,
ETHTOOL_LINK_MODE_40baseDR8_Full_BIT = 72,
ETHTOOL_LINK_MODE_40baseCR8_Full_BIT = 73,
-
+   ETHTOOL_LINK_MODE_FEC_LLRS_BIT   = 74,
/* must be last entry */
__ETHTOOL_LINK_MODE_MASK_NBITS
 };
diff --git a/include/standard-headers/linux/input-event-codes.h 
b/include/standard-headers/linux/input-event-codes.h
index b484c25289..ebf72c1031 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
 /*
  * Input event codes
  *
@@ -652,6 +652,9 @@
 /* Electronic privacy screen control */
 #define KEY_PRIVACY_SCREEN_TOGGLE  0x279
 
+/* Select an area of screen to be copied */
+#define KEY_SELECTIVE_SCREENSHOT   0x27a
+
 /*
  * Some keyboards have keys which do not have a defined meaning, these keys
  * are intended to be programmed / bound to macros by the user. For most
diff --git a/include/standard-headers/linux/pci_regs.h 
b/include/standard-headers/linux/pci_regs.h
index 5437690483..f9701410d3 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -605,6 +605,7 @@
 #define  PCI_EXP_SLTCTL_PWR_OFF0x0400 /* Power Off */
 #define  PCI_EXP_SLTCTL_EIC0x0800  /* Electromechanical Interlock Control 
*/
 #define  PCI_EXP_SLTCTL_DLLSCE 0x1000  /* Data Link Layer State Changed Enable 
*/
+#define  PCI_EXP_SLTCTL_IBPD_DISABLE   0x4000 /* In-band PD disable */
 #define PCI_EXP_SLTSTA 26  /* Slot Status */
 #define  PCI_EXP_SLTSTA_ABP0x0001  /* Attention Button Pressed */
 #define  PCI_EXP_SLTSTA_PFD0x0002  /* Power Fault Detected */
@@ -680,6 +681,7 @@
 #define PCI_EXP_LNKSTA250  /* Link Status 2 */
 #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52  /* v2 endpoints with link end 
here */
 #define PCI_EXP_SLTCAP252  /* Slot Capabilities 2 */
+#define  PCI_EXP_SLTCAP2_IBPD  0x0001 /* In-band PD Disable Supported */
 #define PCI_EXP_SLTCTL256  /* Slot Control 2 */
 #define PCI_EXP_SLTSTA258  /* Slot Status 2 */
 
diff --git a/include/standard-headers/linux/virtio_balloon.h 

[PATCH RFC v2 0/9] KVM: Dirty ring support (QEMU part)

2020-04-28 Thread Peter Xu
Still RFC.  Firstly, the kernel series has mostly stalled recently...  Secondly,
we still haven't settled on how we should handle the dirty sync on KVM memslot
removal.  This version is based on the other QEMU series:

  "vl: Sync dirty bitmap when system resets"

Another major change of this series is that I tried to simplify the last patch
by allowing the main/vcpu threads to directly call kvm_dirty_ring_reap(), etc.
Many of the eventfds/semaphores are removed (so fewer LOC in the last patch),
which hopefully makes the last patch easier to review.

v2:
- add r-bs for Dave
- change dirty-ring-size parameter from int64 to uint64_t [Dave]
- remove an assertion for KVM_GET_DIRTY_LOG [Dave]
- document update: "per vcpu" dirty ring [Dave]
- rename KVMReaperState to KVMDirtyRingReaperState [Dave]
- dump errno when kvm_init_vcpu fails with dirty ring [Dave]
- switch to use dirty-ring-gfns as parameter [Dave]
- comment MAP_SHARED [Dave]
- dump more info when enable dirty ring failed [Dave]
- add kvm_dirty_ring_enabled flag to show whether dirty ring enabled
- rewrote much of the last patch to reduce LOC; now we do dirty ring reap only
  with BQL to simplify things, allowing the main or vcpu thread to directly
  call kvm_dirty_ring_reap() to collect dirty pages, so that we can drop a lot
  of synchronization variables like sems or eventfds.

For anyone who wants to try (we need to upgrade kernel too):

KVM branch:
  https://github.com/xzpeter/linux/tree/kvm-dirty-ring

QEMU branch for testing:
  https://github.com/xzpeter/qemu/tree/kvm-dirty-ring

Overview


KVM dirty ring is a new interface for passing dirty bits from the kernel
to userspace.  Instead of using a bitmap for each memory region,
the dirty ring contains an array of dirtied GPAs to fetch, one ring
per vcpu.

There are a few major changes compared to how the old dirty logging
interface works:
- Granularity of dirty bits

  The KVM dirty ring interface does not offer memory region level
  granularity to collect dirty bits (i.e., per KVM memory
  slot). Instead the dirty bits are collected globally for all the vcpus
  at once.  The major effect is on the VGA part, because VGA dirty tracking
  is enabled as soon as the device is created, and it used memory
  region granularity.  Now that operation will be amplified to a full VM
  sync.  Maybe there's a smarter way to do the same thing in VGA with
  the new interface, but so far I don't see it affecting much, at least
  on regular VMs.

- Collection of dirty bits

  The old dirty logging interface collects KVM dirty bits when
  synchronizing dirty bits.  The KVM dirty ring interface instead uses a
  standalone thread to do that.  So when another thread (e.g., the
  migration thread) wants to synchronize the dirty bits, it simply
  kicks that thread and waits until it has flushed all the dirty bits to
  the ramblock dirty bitmap.
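
The idea of reaping a per-vcpu ring into a dirty bitmap can be sketched as
follows.  This is a toy model only: the structure layout, the names, and the
use of guest frame numbers as entries are assumptions made for illustration,
not the KVM ABI or the code in this series.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_ENTRIES 8u   /* power of two, far smaller than a real ring */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Toy per-vcpu ring of dirtied guest frame numbers (not the KVM ABI). */
typedef struct ToyDirtyRing {
    uint64_t gfn[TOY_RING_ENTRIES];
    uint32_t head;   /* next entry the reaper consumes */
    uint32_t tail;   /* next entry the producer fills */
} ToyDirtyRing;

/* Producer side: record one dirtied page; returns -1 when the ring is full. */
static int toy_ring_push(ToyDirtyRing *ring, uint64_t gfn)
{
    if (ring->tail - ring->head == TOY_RING_ENTRIES) {
        return -1;
    }
    ring->gfn[ring->tail % TOY_RING_ENTRIES] = gfn;
    ring->tail++;
    return 0;
}

/* Reaper side: drain the ring into a dirty bitmap, return entries consumed. */
static size_t toy_ring_reap(ToyDirtyRing *ring, unsigned long *bitmap,
                            size_t nbits)
{
    size_t reaped = 0;

    while (ring->head != ring->tail) {
        uint64_t gfn = ring->gfn[ring->head % TOY_RING_ENTRIES];

        if (gfn < nbits) {
            bitmap[gfn / TOY_BITS_PER_LONG] |= 1UL << (gfn % TOY_BITS_PER_LONG);
        }
        ring->head++;
        reaped++;
    }
    return reaped;
}
```

Duplicate entries collapse into a single bit, and a full ring makes the
producer fail until someone reaps it, which is why the reaping side has to
keep up under a heavy dirty workload.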

A new parameter "dirty-ring-size" is added to "-accel kvm".  By
default, dirty ring is still disabled (size==0).  To enable it, use:

  -accel kvm,dirty-ring-size=65536

This establishes a 64K dirty ring buffer per vcpu.  A subsequent
migration will then use the dirty ring.

I gave it a shot with a 24G guest, 8 vcpus, using a 10g NIC as the migration
channel.  When idle or with a small dirty workload, I don't observe a major
difference in total migration time.  With a higher random dirty
workload (800MB/s dirty rate on 20G memory), KVM dirty ring does
worse.  Total migration time (ping-pong migration, 6 times, in
seconds):

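|-------------------------+---------------|
| dirty ring (4k entries) | dirty logging |
|-------------------------+---------------|
|                      70 |            58 |
|                      78 |            70 |
|                      72 |            48 |
|                      74 |            52 |
|                      83 |            49 |
|                      65 |            54 |
|-------------------------+---------------|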

Summary:

dirty ring average:    73s
dirty logging average: 55s
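
The two averages can be checked against the table.  A minimal sketch,
assuming the reported values are truncated rather than rounded (the sums are
442 and 331 over six runs, i.e. about 73.7 and 55.2); the names are
illustrative only:

```c
#include <stddef.h>

/* Integer mean (truncating) of @n samples, 0 for an empty set. */
static unsigned mean_seconds(const unsigned *samples, size_t n)
{
    unsigned sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return n ? sum / (unsigned)n : 0;
}

/* Total migration times from the table above, in seconds. */
static const unsigned dirty_ring_secs[] = { 70, 78, 72, 74, 83, 65 };
static const unsigned dirty_log_secs[]  = { 58, 70, 48, 52, 49, 54 };
```

Calling mean_seconds() on the two arrays gives 73 and 55 respectively.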

The KVM dirty ring is slower in the above case.  The numbers suggest
that dirty logging is still preferable as the default, because
small/medium VMs are still the major cases, and high dirty workloads
happen frequently too.  And that's what this series does.

Please refer to the code and the comments for more information.

Thanks,

Peter Xu (9):
  KVM: Fixup kvm_log_clear_one_slot() ioctl return check
  linux-headers: Update
  memory: Introduce log_sync_global() to memory listener
  KVM: Create the KVMSlot dirty bitmap on flag changes
  KVM: Provide helper to get kvm dirty log
  KVM: Provide helper to sync dirty bitmap from slot to ramblock
  KVM: Cache kvm slot dirty bitmap size
  KVM: Add dirty-gfn-count property
  KVM: Dirty ring support

 accel/kvm/kvm-all.c   | 517 --
 accel/kvm/trace-events|   7 +
 include/exec/memory.h |  12 +
 include/hw/core/cpu.h |   

[PATCH RFC 3/4] vl: Sync dirty bits for system resets during precopy

2020-04-28 Thread Peter Xu
System resets will also reset system memory layout.  Although the memory layout
after the reset should probably be the same as before the reset, we still need
to do frequent memory section removals and additions during the reset process.
Those operations could accidentally lose per-mem-section information like KVM
memslot dirty bitmaps.

Previously we kept those dirty bitmaps by syncing them during memory removal.
However, that is hard to get right [1].  Instead, sync dirty pages before a
system reset if we know we are in a precopy migration.  This should solve the
same problem explicitly.

[1] https://lore.kernel.org/qemu-devel/20200327150425.GJ422390@xz-x1/

Signed-off-by: Peter Xu 
---
 softmmu/vl.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 32c0047889..8f864fee43 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1387,6 +1387,22 @@ void qemu_system_reset(ShutdownCause reason)
 
 cpu_synchronize_all_states();
 
+/*
+ * System reboot could reset memory layout.  Although the final status of
+ * the memory layout should be the same as before the reset, the memory
+ * sections can still be removed and added back frequently due to the reset
+ * process.  This could potentially drop the dirty bits tracked for those
+ * memory sections before the reset.
+ *
+ * Do a global dirty sync before the reset happens if we are in a precopy
+ * migration, so we don't lose the dirty bits during the memory shuffles.
+ */
+if (migration_is_precopy()) {
+WITH_RCU_READ_LOCK_GUARD() {
+migration_bitmap_sync_precopy();
+}
+}
+
 if (mc && mc->reset) {
 mc->reset(current_machine);
 } else {
-- 
2.24.1




Re: backing chain & block status & filters

2020-04-28 Thread Vladimir Sementsov-Ogievskiy

28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote:

28.04.2020 19:18, Eric Blake wrote:

On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:


Hm.  I could imagine that there are formats that have non-zero holes
(e.g. 0xff or just garbage).  It would be a bit wrong for them to return
ZERO or DATA then.

But OTOH we don’t care about such cases, do we?  We need to know whether
ranges are zero, data, or unallocated.  If they aren’t zero, we only
care about whether reading from it will return data from this layer or not.

So I suppose that anything that doesn’t support backing files (or
filtered children) should always return ZERO and/or DATA.


I'm not sure I agree with the notion that everything should be
BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
at least. If we want to change this, we will have to check all callers
of bdrv_is_allocated() and friends who might use this to find holes in
the file.


Yes. Because they are doing an incorrect (or at least undocumented and
unreliable) thing.


Here are some previous mails discussing the same question about what
block_status should actually mean.  At the time, I was so scared of the
prospect of something breaking if I changed things that I ended up keeping
the status quo, so here we are revisiting the topic several years later,
still asking the same questions.

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html





Basically, the way bdrv_is_allocated() works today is that we assume an
implicit zeroed backing layer even for block drivers that don't support
backing files.


But read doesn't work that way: it reads data from the bottom layer, not from
this implicit zeroed backing layer. That is inconsistent. On read, data
comes exactly from this layer, not from its implicit backing. So it should
return BDRV_BLOCK_ALLOCATED, according to its definition.

Or, we should at least document current behavior:

   BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
   layer rather than any backing, set by block. Attention: it may not be set
   for drivers without backing support, still data is of course read from
   this layer. Note that for such drivers BDRV_BLOCK_ALLOCATED may mean
   allocation on fs level, which occupies real space on disk. So, for such
   drivers:

   ZERO | ALLOCATED means that data reads as zero and may be allocated on fs,
   or (most probably) not; don't look at the ALLOCATED flag, as it is added
   by the generic layer for another logic, not related to fs-allocation.

   0 means that, most probably, data doesn't occupy space on fs; zero-status
   is unknown (most probably non-zero)



That may be right in describing the current situation, but again, needs a GOOD 
audit of what we are actually using it for, and whether it is what we really 
WANT to be using it for.  If we're going to audit/refactor the code, we might 
as well get semantics that are actually useful, rather than painfully contorted 
to documentation that happens to match our current contorted code.



Honest enough:) I'll try to make a table.

I don't think that reporting fs-allocation status is a bad thing. But I'm sure
that it should be separated from the backing-chain-allocated concept.



As a first step, I've done a brief analysis of .bdrv_co_block_status of drivers
(attached)

--
Best regards,
Vladimir
= Filter functions =
mirror,commit,backup filters: bdrv_co_block_status_from_backing
blklogwrites,compress,COR,throttle: bdrv_co_block_status_from_file

return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
- semantics of BDRV_BLOCK_RAW are unclear, behavior is broken

blkdebug: blkdebug_co_block_status
- actually, uses bdrv_co_block_status_from_file,
after additional blkdebug-related things not influencing the result

= raw =
raw: raw_co_block_status
return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID
- semantics of BDRV_BLOCK_RAW are unclear, behavior is broken

Summary: we need to fix BDRV_BLOCK_RAW-recursion semantics to not interfere 
with block_status_above/is_allocated_above loops.

= Format drivers with supports_backing = true =

qed: bdrv_qed_co_block_status
bdi->unallocated_blocks_are_zero = true;

0 - go to backing
ZERO - metadata-zero
DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data
 
parallels: parallels_co_block_status
unallocated_blocks_are_zero is unset, but they are actually read as zero if 
no backing

0 - go to backing
DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data

qcow2: qcow2_co_block_status
bdi->unallocated_blocks_are_zero = true;

ZERO | OFFSET_VALID with @map set and @file = something : 
metadata-allocated-zero
DATA | OFFSET_VALID with @map set and @file = something : 
metadata-allocated-data
RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : 
metadata-allocated-data (hint, that 

[PATCH RFC 2/4] migration: Introduce migration_is_precopy()

2020-04-28 Thread Peter Xu
Export a global helper to check whether a precopy migration is in progress.

Signed-off-by: Peter Xu 
---
 include/migration/misc.h | 1 +
 migration/migration.c| 7 +++
 2 files changed, 8 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index e338be8c30..b4f6bf7842 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -61,6 +61,7 @@ void migration_shutdown(void);
 void qemu_start_incoming_migration(const char *uri, Error **errp);
 bool migration_is_idle(void);
 bool migration_is_active(MigrationState *);
+bool migration_is_precopy(void);
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
 bool migration_in_setup(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index 187ac0410c..0082880279 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1795,6 +1795,13 @@ bool migration_is_active(MigrationState *s)
 s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
+bool migration_is_precopy(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s && s->state == MIGRATION_STATUS_ACTIVE;
+}
+
 void migrate_init(MigrationState *s)
 {
 /*
-- 
2.24.1




[PATCH RFC 4/4] kvm: No need to sync dirty bitmap before memslot removal any more

2020-04-28 Thread Peter Xu
With the system reset dirty sync in qemu_system_reset(), we should be able to
drop this operation now.  After all it doesn't really fix the problem cleanly
because logically we could still have a race [1].

[1] https://lore.kernel.org/qemu-devel/20200327150425.GJ422390@xz-x1/

Signed-off-by: Peter Xu 
---
 accel/kvm/kvm-all.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 439a4efe52..e1c87fa4e1 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1061,9 +1061,6 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 if (!mem) {
 goto out;
 }
-if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-kvm_physical_sync_dirty_bitmap(kml, section);
-}
 
 /* unregister the slot */
 g_free(mem->dirty_bmap);
-- 
2.24.1




[PATCH RFC 1/4] migration: Export migration_bitmap_sync_precopy()

2020-04-28 Thread Peter Xu
Make it usable outside migration.  To make it easier to use, remove the
RAMState parameter, since ram.c can reference ram_state directly.

Signed-off-by: Peter Xu 
---
 include/migration/misc.h |  1 +
 migration/ram.c  | 10 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index d2762257aa..e338be8c30 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -66,6 +66,7 @@ void remove_migration_state_change_notifier(Notifier *notify);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
+void migration_bitmap_sync_precopy(void);
 /* ...and after the device transmission */
 bool migration_in_postcopy_after_devices(MigrationState *);
 void migration_global_dump(Monitor *mon);
diff --git a/migration/ram.c b/migration/ram.c
index 04f13feb2e..d737175d4e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -970,7 +970,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync_precopy(RAMState *rs)
+void migration_bitmap_sync_precopy(void)
 {
 Error *local_err = NULL;
 
@@ -983,7 +983,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
 local_err = NULL;
 }
 
-migration_bitmap_sync(rs);
+migration_bitmap_sync(ram_state);
 
 if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) {
 error_report_err(local_err);
@@ -2303,7 +2303,7 @@ static void ram_init_bitmaps(RAMState *rs)
 WITH_RCU_READ_LOCK_GUARD() {
 ram_list_init_bitmaps();
 memory_global_dirty_log_start();
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy();
 }
 qemu_mutex_unlock_ramlist();
 qemu_mutex_unlock_iothread();
@@ -2592,7 +2592,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
 WITH_RCU_READ_LOCK_GUARD() {
 if (!migration_in_postcopy()) {
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy();
 }
 
 ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -2642,7 +2642,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 remaining_size < max_size) {
 qemu_mutex_lock_iothread();
 WITH_RCU_READ_LOCK_GUARD() {
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy();
 }
 qemu_mutex_unlock_iothread();
 remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
-- 
2.24.1




[PATCH RFC 0/4] vl: Sync dirty bitmap when system resets

2020-04-28 Thread Peter Xu
This RFC series starts from the fact that we sync the dirty bitmap when
removing a memslot for KVM.  IIUC that was mainly to maintain the dirty bitmap
even across a system reboot.

This series wants to move that sync from kvm memslot removal to system reset.

(I still don't know why the system after a reset would need the RAM status
 from before the reset.  I thought things like kdump might use this to retrieve
 info from a previous kernel panic, however IIUC that's not what kdump is doing
 now.  Anyway, I'd be more than glad if anyone knows the real scenario behind
 this...)

The current solution (sync at kvm memslot removal) works in most cases, but:

  - it will be nearly impossible to make it work for dirty ring, and,

  - it has an existing race condition. [1]

So if system reset is the only thing we care about here, I'm wondering whether
we can move this sync explicitly to system reset, so that we do one global sync
there instead of syncing every time the memory layout changes and causes memory
removals.  I think it is more explicit to sync during system reset, and with
that context it will also be far easier for kvm dirty ring to provide the same
logic.

This is totally RFC because I'm still trying to find whether there will be
other cases besides system reset that we want to keep the dirty bits for a
to-be-removed memslot (real memory removals like unplugging memory shouldn't
matter, because we won't care about the dirty bits if it's never going to be
there anymore, not to mention we won't allow such things during a migration).
So far I don't see any.

I've run some tests using either the old dirty log or the dirty ring, with
either some memory load or reboots on the source, and I see no issues so far.

Comments greatly welcomed.  Thanks.

[1] https://lore.kernel.org/qemu-devel/20200327150425.GJ422390@xz-x1/

Peter Xu (4):
  migration: Export migration_bitmap_sync_precopy()
  migration: Introduce migration_is_precopy()
  vl: Sync dirty bits for system resets during precopy
  kvm: No need to sync dirty bitmap before memslot removal any more

 accel/kvm/kvm-all.c  |  3 ---
 include/migration/misc.h |  2 ++
 migration/migration.c|  7 +++
 migration/ram.c  | 10 +-
 softmmu/vl.c | 16 
 5 files changed, 30 insertions(+), 8 deletions(-)

-- 
2.24.1




[PATCH v4 3/3] qcow2: Tweak comment about bitmaps vs. resize

2020-04-28 Thread Eric Blake
Our comment did not actually match the code.  Rewrite the comment to
be less sensitive to any future changes to qcow2-bitmap.c that might
implement scenarios that we currently reject.

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3e8b3d022b80..ad934109a813 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3999,7 +3999,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 goto fail;
 }

-/* cannot proceed if image has bitmaps */
+/* See qcow2-bitmap.c for which bitmap scenarios prevent a resize. */
 if (qcow2_truncate_bitmaps_check(bs, errp)) {
 ret = -ENOTSUP;
 goto fail;
-- 
2.26.2




[PATCH v4 1/3] block: Add blk_new_with_bs() helper

2020-04-28 Thread Eric Blake
There are several callers that need to create a new block backend from
an existing BDS; make the task slightly easier with a common helper
routine.
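
The shape of such a helper (create, attach, undo the creation if attaching
fails) can be sketched independently of the block layer.  The types and
function names below are toy stand-ins invented for this illustration; they
are not the QEMU BlockBackend or BlockDriverState API:

```c
#include <stdlib.h>

/* Toy stand-ins; these are not the QEMU block layer types. */
typedef struct ToyNode {
    int id;
} ToyNode;

typedef struct ToyBackend {
    unsigned perm;
    ToyNode *node;
} ToyBackend;

static ToyBackend *toy_backend_new(unsigned perm)
{
    ToyBackend *be = calloc(1, sizeof(*be));

    if (be) {
        be->perm = perm;
    }
    return be;
}

/* Attach @node; fails (returns -1) when there is nothing to attach. */
static int toy_backend_insert(ToyBackend *be, ToyNode *node)
{
    if (!node) {
        return -1;
    }
    be->node = node;
    return 0;
}

/* Combined helper: create, attach, and clean up on failure in one place. */
static ToyBackend *toy_backend_new_with_node(ToyNode *node, unsigned perm)
{
    ToyBackend *be = toy_backend_new(perm);

    if (!be) {
        return NULL;
    }
    if (toy_backend_insert(be, node) < 0) {
        free(be);
        return NULL;
    }
    return be;
}
```

Callers then only have to test the returned pointer, instead of each
repeating the create/insert/cleanup sequence with its own error path.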

Suggested-by: Max Reitz 
Signed-off-by: Eric Blake 
Message-Id: <20200424190903.522087-2-ebl...@redhat.com>
[mreitz: Set @ret only in error paths, see
 https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html]
Signed-off-by: Max Reitz 
---
 include/sysemu/block-backend.h |  2 ++
 block/block-backend.c  | 23 +++
 block/crypto.c |  9 -
 block/parallels.c  |  8 
 block/qcow.c   |  8 
 block/qcow2.c  | 18 --
 block/qed.c|  8 
 block/sheepdog.c   | 10 +-
 block/vdi.c|  8 
 block/vhdx.c   |  8 
 block/vmdk.c   |  9 -
 block/vpc.c|  8 
 blockdev.c |  8 +++-
 blockjob.c |  7 ++-
 14 files changed, 75 insertions(+), 59 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 34de7faa81de..0917663d89f4 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -77,6 +77,8 @@ typedef struct BlockBackendPublic {
 } BlockBackendPublic;

 BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm);
+BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
+  uint64_t shared_perm, Error **errp);
 BlockBackend *blk_new_open(const char *filename, const char *reference,
QDict *options, int flags, Error **errp);
 int blk_get_refcnt(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 17ed6d8c5b27..f4944861fa4e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -355,6 +355,29 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
uint64_t shared_perm)
 return blk;
 }

+/*
+ * Create a new BlockBackend connected to an existing BlockDriverState.
+ *
+ * @perm is a bitmask of BLK_PERM_* constants which describes the
+ * permissions to request for @bs that is attached to this
+ * BlockBackend.  @shared_perm is a bitmask which describes which
+ * permissions may be granted to other users of the attached node.
+ * Both sets of permissions can be changed later using blk_set_perm().
+ *
+ * Return the new BlockBackend on success, null on failure.
+ */
+BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
+  uint64_t shared_perm, Error **errp)
+{
+BlockBackend *blk = blk_new(bdrv_get_aio_context(bs), perm, shared_perm);
+
+if (blk_insert_bs(blk, bs, errp) < 0) {
+blk_unref(blk);
+return NULL;
+}
+return blk;
+}
+
 /*
  * Creates a new BlockBackend, opens a new BlockDriverState, and connects both.
  * The new BlockBackend is in the main AioContext.
diff --git a/block/crypto.c b/block/crypto.c
index e02f34359019..ca44dae4f7e8 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -261,11 +261,10 @@ static int 
block_crypto_co_create_generic(BlockDriverState *bs,
 QCryptoBlock *crypto = NULL;
 struct BlockCryptoCreateData data;

-blk = blk_new(bdrv_get_aio_context(bs),
-  BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-
-ret = blk_insert_bs(blk, bs, errp);
-if (ret < 0) {
+blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL,
+  errp);
+if (!blk) {
+ret = -EPERM;
 goto cleanup;
 }

diff --git a/block/parallels.c b/block/parallels.c
index 2be92cf41708..8db64a55e3ae 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -559,10 +559,10 @@ static int coroutine_fn parallels_co_create(BlockdevCreateOptions* opts,
 return -EIO;
 }

-blk = blk_new(bdrv_get_aio_context(bs),
-  BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-ret = blk_insert_bs(blk, bs, errp);
-if (ret < 0) {
+blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL,
+  errp);
+if (!blk) {
+ret = -EPERM;
 goto out;
 }
 blk_set_allow_write_beyond_eof(blk, true);
diff --git a/block/qcow.c b/block/qcow.c
index 6b5f2269f0ba..b0475b73a551 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -849,10 +849,10 @@ static int coroutine_fn qcow_co_create(BlockdevCreateOptions *opts,
 return -EIO;
 }

-qcow_blk = blk_new(bdrv_get_aio_context(bs),
-   BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-ret = blk_insert_bs(qcow_blk, bs, errp);
-if (ret < 0) {
+qcow_blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE,
+   BLK_PERM_ALL, errp);
+if (!qcow_blk) {
+ret = -EPERM;
 goto exit;
 }
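
Editorial sketch: the helper added in this patch folds "allocate, attach,
release on failure" into one call, so callers only have to test for NULL.
The toy program below illustrates that shape only. Node, Backend and all
backend_* names are hypothetical stand-ins invented for this sketch; they
are not QEMU's BlockDriverState/BlockBackend API, and the live_backends
counter exists purely to make leaks observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for BlockDriverState / BlockBackend (not QEMU's types). */
typedef struct Node {
    int attachable;          /* 0 simulates a permission conflict */
} Node;

typedef struct Backend {
    int refcnt;
    Node *node;
} Backend;

static int live_backends;    /* bookkeeping for leak checks only */

static Backend *backend_new(void)
{
    Backend *b = calloc(1, sizeof(*b));
    if (!b) {
        return NULL;
    }
    b->refcnt = 1;
    live_backends++;
    return b;
}

static void backend_unref(Backend *b)
{
    if (!b) {
        return;
    }
    assert(b->refcnt > 0);
    if (--b->refcnt == 0) {
        live_backends--;
        free(b);
    }
}

/* Returns 0 on success, -1 if the node refuses the attachment. */
static int backend_insert_node(Backend *b, Node *n)
{
    if (!n->attachable) {
        return -1;
    }
    b->node = n;
    return 0;
}

/* Create a backend already attached to @n; NULL on failure, nothing leaked. */
static Backend *backend_new_with_node(Node *n)
{
    Backend *b = backend_new();
    if (!b) {
        return NULL;
    }
    if (backend_insert_node(b, n) < 0) {
        backend_unref(b);    /* drop the half-built object before failing */
        return NULL;
    }
    return b;
}
```

With this shape each caller needs a single NULL check, which is what the
converted call sites in the diff above do before picking their own errno.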
 

[PATCH v4 2/3] qcow2: Allow resize of images with internal snapshots

2020-04-28 Thread Eric Blake
We originally refused to allow resize of images with internal
snapshots because the v2 image format did not require the tracking of
snapshot size, making it impossible to safely revert to a snapshot
with a different size than the current view of the image.  But the
snapshot size tracking was rectified in v3, and our recent fixes to
qemu-img amend (see 0a85af35) guarantee that we always have a valid
snapshot size.  Thus, we no longer need to artificially limit image
resizes, but it does become one more thing that would prevent a
downgrade back to v2.  And now that we support different-sized
snapshots, it's also easy to fix reverting to a snapshot to apply the
new size.

Upgrade iotest 61 to cover this (we previously had NO coverage of
refusal to resize while snapshots exist).  Note that the amend process
can fail but still have effects: in particular, since we break things
into upgrade, resize, downgrade, a failure during resize does not roll
back changes made during upgrade, nor does failure in downgrade roll
back a resize.  But this situation is pre-existing even without this
patch; and without journaling, the best we could do is minimize the
chance of partial failure by collecting all changes prior to doing any
writes - which adds a lot of complexity but could still fail with EIO.
On the other hand, we are careful that even if we have partial
modification but then fail, the image is left viable (that is, we are
careful to sequence things so that after each successful cluster
write, there may be transient leaked clusters but no corrupt
metadata).  And complicating the code to make it more transaction-like
is not worth the effort: a user can always request multiple 'qemu-img
amend' changing one thing each, if they need finer-grained control
over detecting the first failure than what they get by letting qemu
decide how to sequence multiple changes.

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-snapshot.c | 20 
 block/qcow2.c  | 25 ++---
 tests/qemu-iotests/061 | 35 +++
 tests/qemu-iotests/061.out | 28 
 4 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 82c32d4c9b08..2756b37d2427 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -23,6 +23,7 @@
  */

 #include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qcow2.h"
 #include "qemu/bswap.h"
@@ -775,10 +776,21 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
 }

 if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) {
-error_report("qcow2: Loading snapshots with different disk "
-"size is not implemented");
-ret = -ENOTSUP;
-goto fail;
+BlockBackend *blk = blk_new_with_bs(bs, BLK_PERM_RESIZE, BLK_PERM_ALL,
+&local_err);
+if (!blk) {
+error_report_err(local_err);
+ret = -ENOTSUP;
+goto fail;
+}
+
+ret = blk_truncate(blk, sn->disk_size, true, PREALLOC_MODE_OFF, 0,
+   &local_err);
+blk_unref(blk);
+if (ret < 0) {
+error_report_err(local_err);
+goto fail;
+}
 }

 /*
diff --git a/block/qcow2.c b/block/qcow2.c
index 0edc7f4643f8..3e8b3d022b80 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3989,9 +3989,12 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,

 qemu_co_mutex_lock(&s->lock);

-/* cannot proceed if image has snapshots */
-if (s->nb_snapshots) {
-error_setg(errp, "Can't resize an image which has snapshots");
+/*
+ * Even though we store snapshot size for all images, it was not
+ * required until v3, so it is not safe to proceed for v2.
+ */
+if (s->nb_snapshots && s->qcow_version < 3) {
+error_setg(errp, "Can't resize a v2 image which has snapshots");
 ret = -ENOTSUP;
 goto fail;
 }
@@ -5005,6 +5008,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
 BDRVQcow2State *s = bs->opaque;
 int current_version = s->qcow_version;
 int ret;
+int i;

 /* This is qcow2_downgrade(), not qcow2_upgrade() */
 assert(target_version < current_version);
@@ -5022,6 +5026,21 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
 return -ENOTSUP;
 }

+/*
+ * If any internal snapshot has a different size than the current
+ * image size, or VM state size that exceeds 32 bits, downgrading
+ * is unsafe.  Even though we would still use v3-compliant output
+ * to preserve that data, other v2 programs might not realize
+ * those optional fields are important.
+ */
+for (i = 0; i < s->nb_snapshots; i++) {
+if (s->snapshots[i].vm_state_size > UINT32_MAX 
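
Editorial sketch: the hunk is cut off here in the archive, so the rest of
the loop condition is not shown. Going only by the comment in the patch
(a snapshot whose size differs from the image size, or whose VM state size
exceeds 32 bits, makes a downgrade unsafe), the check could look roughly
like the following. SnapshotInfo and downgrade_is_safe() are hypothetical
names for this illustration, not the fields or functions of the real
driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal view of one internal snapshot. */
typedef struct SnapshotInfo {
    uint64_t disk_size;      /* virtual disk size recorded in the snapshot */
    uint64_t vm_state_size;  /* size of the saved VM state */
} SnapshotInfo;

/*
 * Return true if every snapshot could be represented by a v2 reader:
 * same size as the current image and a VM state size that fits 32 bits.
 */
static bool downgrade_is_safe(const SnapshotInfo *sn, size_t nb_snapshots,
                              uint64_t image_size)
{
    for (size_t i = 0; i < nb_snapshots; i++) {
        if (sn[i].vm_state_size > UINT32_MAX ||
            sn[i].disk_size != image_size) {
            return false;
        }
    }
    return true;
}
```

An image without snapshots trivially passes, which matches the idea that
the restriction only concerns data a v2 program might misread.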

[PATCH v4 0/3] qcow2: Allow resize of images with internal snapshots

2020-04-28 Thread Eric Blake
Re-posting this to make Max' life easier when rebasing on top of Kevin's work.

Based-on: <20200424125448.63318-1-kw...@redhat.com>
[PATCH v7 00/10] block: Fix resize (extending) of short overlays

In v4:
- patch 1: fold in Max's touchups to my v3
- patch 1: resolve merge conflict on top of Kevin's block-next branch (truncate signature change)
- patch 2: resolve semantic conflict (truncate signature change)

001/3:[0004] [FC] 'block: Add blk_new_with_bs() helper'
002/3:[0002] [FC] 'qcow2: Allow resize of images with internal snapshots'
003/3:[----] [--] 'qcow2: Tweak comment about bitmaps vs. resize'

Eric Blake (3):
  block: Add blk_new_with_bs() helper
  qcow2: Allow resize of images with internal snapshots
  qcow2: Tweak comment about bitmaps vs. resize

 include/sysemu/block-backend.h |  2 ++
 block/block-backend.c  | 23 +
 block/crypto.c |  9 +++
 block/parallels.c  |  8 +++---
 block/qcow.c   |  8 +++---
 block/qcow2-snapshot.c | 20 ---
 block/qcow2.c  | 45 +++---
 block/qed.c|  8 +++---
 block/sheepdog.c   | 10 
 block/vdi.c|  8 +++---
 block/vhdx.c   |  8 +++---
 block/vmdk.c   |  9 +++
 block/vpc.c|  8 +++---
 blockdev.c |  8 +++---
 blockjob.c |  7 ++
 tests/qemu-iotests/061 | 35 ++
 tests/qemu-iotests/061.out | 28 +
 17 files changed, 177 insertions(+), 67 deletions(-)

-- 
2.26.2




Re: [PATCH for-5.1 6/7] hw/mips: Add Loongson-3 machine support (with KVM)

2020-04-28 Thread Aleksandar Markovic
Hi, Huacai.

Please expand the commit message with a description of the machine's
internal organization (several paragraphs).

Also, please include the command line for starting the machine. More
than one example is better than only one.

Specifically, can you state explicitly what your KVM setup is, so that
anyone can reproduce it?

Good health to people from China!

Yours,
Aleksandar

On Mon, Apr 27, 2020 at 11:36, Huacai Chen wrote:
>
> Add Loongson-3 based machine support. It uses i8259 as the interrupt
> controller and GPEX as the PCI controller. Currently it can only
> work with KVM, but we will add TCG support in the future.
>
> Signed-off-by: Huacai Chen 
> Co-developed-by: Jiaxun Yang 
> ---
>  default-configs/mips64el-softmmu.mak |   1 +
>  hw/mips/Kconfig  |  10 +
>  hw/mips/Makefile.objs|   1 +
>  hw/mips/mips_loongson3.c | 869 
> +++
>  4 files changed, 881 insertions(+)
>  create mode 100644 hw/mips/mips_loongson3.c
>
> diff --git a/default-configs/mips64el-softmmu.mak 
> b/default-configs/mips64el-softmmu.mak
> index 8b0c9b1..fc798e4 100644
> --- a/default-configs/mips64el-softmmu.mak
> +++ b/default-configs/mips64el-softmmu.mak
> @@ -3,6 +3,7 @@
>  include mips-softmmu-common.mak
>  CONFIG_IDE_VIA=y
>  CONFIG_FULONG=y
> +CONFIG_LOONGSON3=y
>  CONFIG_ATI_VGA=y
>  CONFIG_RTL8139_PCI=y
>  CONFIG_JAZZ=y
> diff --git a/hw/mips/Kconfig b/hw/mips/Kconfig
> index 2c2adbc..6f16b16 100644
> --- a/hw/mips/Kconfig
> +++ b/hw/mips/Kconfig
> @@ -44,6 +44,16 @@ config JAZZ
>  config FULONG
>  bool
>
> +config LOONGSON3
> +bool
> +select PCKBD
> +select SERIAL
> +select ISA_BUS
> +select PCI_EXPRESS_GENERIC_BRIDGE
> +select VIRTIO_VGA
> +select QXL if SPICE
> +select MSI_NONBROKEN
> +
>  config MIPS_CPS
>  bool
>  select PTIMER
> diff --git a/hw/mips/Makefile.objs b/hw/mips/Makefile.objs
> index 2f7795b..f9bc8f5 100644
> --- a/hw/mips/Makefile.objs
> +++ b/hw/mips/Makefile.objs
> @@ -4,5 +4,6 @@ obj-$(CONFIG_MALTA) += gt64xxx_pci.o mips_malta.o
>  obj-$(CONFIG_MIPSSIM) += mips_mipssim.o
>  obj-$(CONFIG_JAZZ) += mips_jazz.o
>  obj-$(CONFIG_FULONG) += mips_fulong2e.o
> +obj-$(CONFIG_LOONGSON3) += mips_loongson3.o
>  obj-$(CONFIG_MIPS_CPS) += cps.o
>  obj-$(CONFIG_MIPS_BOSTON) += boston.o
> diff --git a/hw/mips/mips_loongson3.c b/hw/mips/mips_loongson3.c
> new file mode 100644
> index 000..a45c9ec
> --- /dev/null
> +++ b/hw/mips/mips_loongson3.c
> @@ -0,0 +1,869 @@
> +/*
> + * Generic Loongson-3 Platform support
> + *
> + * Copyright (c) 2015-2020 Huacai Chen (che...@lemote.com)
> + * This code is licensed under the GNU GPL v2.
> + *
> + * Contributions are licensed under the terms of the GNU GPL,
> + * version 2 or (at your option) any later version.
> + */
> +
> +/*
> + * Generic PC Platform based on Loongson-3 CPU (MIPS64R2 with extensions,
> + * 800~2000MHz)
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "cpu.h"
> +#include "elf.h"
> +#include "hw/boards.h"
> +#include "hw/block/flash.h"
> +#include "hw/char/serial.h"
> +#include "hw/mips/mips.h"
> +#include "hw/mips/cpudevs.h"
> +#include "hw/intc/i8259.h"
> +#include "hw/loader.h"
> +#include "hw/ide.h"
> +#include "hw/isa/superio.h"
> +#include "hw/pci/msi.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_host.h"
> +#include "hw/pci-host/gpex.h"
> +#include "hw/rtc/mc146818rtc.h"
> +#include "net/net.h"
> +#include "exec/address-spaces.h"
> +#include "sysemu/qtest.h"
> +#include "sysemu/reset.h"
> +#include "sysemu/runstate.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +
> +#define INITRD_OFFSET  0x0400
> +#define BOOTPARAM_ADDR 0x8ff0
> +#define BOOTPARAM_PHYADDR  0x0ff0
> +#define CFG_ADDR   0x0f10
> +#define FW_CONF_ADDR   0x0fff
> +#define PM_MMIO_ADDR   0x1008
> +#define PM_MMIO_SIZE   0x100
> +#define PM_CNTL_MODE   0x10
> +
> +#define PHYS_TO_VIRT(x) ((x) | ~(target_ulong)0x7fff)
> +
> +/* Loongson-3 has a 2MB flash rom */
> +#define BIOS_SIZE   (2 * MiB)
> +#define LOONGSON_MAX_VCPUS  16
> +
> +#define LOONGSON3_BIOSNAME "bios_loongson3.bin"
> +
> +#define PCIE_IRQ_BASE 3
> +
> +#define VIRT_PCI_IO_BASE    0x1800ul
> +#define VIRT_PCI_IO_SIZE    0x000cul
> +#define VIRT_PCI_MEM_BASE   0x4000ul
> +#define VIRT_PCI_MEM_SIZE   0x4000ul
> +#define VIRT_PCI_ECAM_BASE  0x1a00ul
> +#define VIRT_PCI_ECAM_SIZE  0x0200ul
> +
> +#define align(x) (((x) + 63) & ~63)
> +
> +struct efi_memory_map_loongson {
> +uint16_t vers;   /* version of efi_memory_map */
> +uint32_t nr_map; /* number of memory_maps */
> +uint32_t mem_freq;   /* memory frequency */
> +struct mem_map{
> +uint32_t node_id;/* node_id which memory attached to */
> +

Re: [PATCH v3 1/4] hw/net/can: Introduce Xilinx ZynqMP CAN controller

2020-04-28 Thread Francisco Iglesias
Hi Vikram,

A couple more comments here as well.

On [2020 Apr 22] Wed 17:56:06, Vikram Garhwal wrote:
> XlnxCAN is developed based on SocketCAN, QEMU CAN bus implementation.
> Bus connection and socketCAN connection for each CAN module can be set
> through command lines.
> 
> Signed-off-by: Vikram Garhwal 
> ---
>  hw/net/can/Makefile.objs |1 +
>  hw/net/can/xlnx-zynqmp-can.c | 1113 
> ++
>  include/hw/net/xlnx-zynqmp-can.h |   76 +++
>  3 files changed, 1190 insertions(+)
>  create mode 100644 hw/net/can/xlnx-zynqmp-can.c
>  create mode 100644 include/hw/net/xlnx-zynqmp-can.h
> 
> diff --git a/hw/net/can/Makefile.objs b/hw/net/can/Makefile.objs
> index 9f0c4ee..0fe87dd 100644
> --- a/hw/net/can/Makefile.objs
> +++ b/hw/net/can/Makefile.objs
> @@ -2,3 +2,4 @@ common-obj-$(CONFIG_CAN_SJA1000) += can_sja1000.o
>  common-obj-$(CONFIG_CAN_PCI) += can_kvaser_pci.o
>  common-obj-$(CONFIG_CAN_PCI) += can_pcm3680_pci.o
>  common-obj-$(CONFIG_CAN_PCI) += can_mioe3680_pci.o
> +common-obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp-can.o
> diff --git a/hw/net/can/xlnx-zynqmp-can.c b/hw/net/can/xlnx-zynqmp-can.c
> new file mode 100644
> index 000..31799c0
> --- /dev/null
> +++ b/hw/net/can/xlnx-zynqmp-can.c
> @@ -0,0 +1,1113 @@
> +/*
> + * QEMU model of the Xilinx CAN device.
> + *
> + * Copyright (c) 2020 Xilinx Inc.
> + *
> + * Written-by: Vikram Garhwal
> + *
> + * Based on QEMU CAN Device emulation implemented by Jin Yang, Deniz Eren and
> + * Pavel Pisa
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/sysbus.h"
> +#include "hw/register.h"
> +#include "hw/irq.h"
> +#include "qapi/error.h"
> +#include "qemu/bitops.h"
> +#include "qemu/log.h"
> +#include "qemu/cutils.h"
> +#include "sysemu/sysemu.h"
> +#include "migration/vmstate.h"
> +#include "hw/qdev-properties.h"
> +#include "net/can_emu.h"
> +#include "net/can_host.h"
> +#include "qemu/event_notifier.h"
> +#include "qom/object_interfaces.h"
> +#include "hw/net/xlnx-zynqmp-can.h"
> +
> +#ifndef XLNX_ZYNQMP_CAN_ERR_DEBUG
> +#define XLNX_ZYNQMP_CAN_ERR_DEBUG 0
> +#endif
> +
> +#define DB_PRINT(...) do { \
> +if (XLNX_ZYNQMP_CAN_ERR_DEBUG) { \
> +qemu_log(__VA_ARGS__); \
> +} \
> +} while (0)
> +
> +#define MAX_DLC 8
> +#undef ERROR
> +
> +REG32(SOFTWARE_RESET_REGISTER, 0x0)
> +FIELD(SOFTWARE_RESET_REGISTER, CEN, 1, 1)
> +FIELD(SOFTWARE_RESET_REGISTER, SRST, 0, 1)
> +REG32(MODE_SELECT_REGISTER, 0x4)
> +FIELD(MODE_SELECT_REGISTER, SNOOP, 2, 1)
> +FIELD(MODE_SELECT_REGISTER, LBACK, 1, 1)
> +FIELD(MODE_SELECT_REGISTER, SLEEP, 0, 1)
> +REG32(ARBITRATION_PHASE_BAUD_RATE_PRESCALER_REGISTER, 0x8)
> +FIELD(ARBITRATION_PHASE_BAUD_RATE_PRESCALER_REGISTER, BRP, 0, 8)
> +REG32(ARBITRATION_PHASE_BIT_TIMING_REGISTER, 0xc)
> +FIELD(ARBITRATION_PHASE_BIT_TIMING_REGISTER, SJW, 7, 2)
> +FIELD(ARBITRATION_PHASE_BIT_TIMING_REGISTER, TS2, 4, 3)
> +FIELD(ARBITRATION_PHASE_BIT_TIMING_REGISTER, TS1, 0, 4)
> +REG32(ERROR_COUNTER_REGISTER, 0x10)
> +FIELD(ERROR_COUNTER_REGISTER, REC, 8, 8)
> +FIELD(ERROR_COUNTER_REGISTER, TEC, 0, 8)
> +REG32(ERROR_STATUS_REGISTER, 0x14)
> +FIELD(ERROR_STATUS_REGISTER, ACKER, 4, 1)
> +FIELD(ERROR_STATUS_REGISTER, BERR, 3, 1)
> +FIELD(ERROR_STATUS_REGISTER, STER, 2, 1)
> +FIELD(ERROR_STATUS_REGISTER, FMER, 1, 1)
> +FIELD(ERROR_STATUS_REGISTER, CRCER, 0, 1)
> +REG32(STATUS_REGISTER, 0x18)
> +FIELD(STATUS_REGISTER, SNOOP, 12, 1)
> +FIELD(STATUS_REGISTER, ACFBSY, 11, 1)
> +FIELD(STATUS_REGISTER, TXFLL, 10, 1)
> +FIELD(STATUS_REGISTER, TXBFLL, 9, 1)
> +FIELD(STATUS_REGISTER, ESTAT, 7, 2)
> +FIELD(STATUS_REGISTER, ERRWRN, 6, 1)
> +FIELD(STATUS_REGISTER, BBSY, 5, 1)
> +FIELD(STATUS_REGISTER, BIDLE, 4, 1)
> +FIELD(STATUS_REGISTER, NORMAL, 3, 1)
> +

Re: [PATCH for-5.1 1/7] configure: Add KVM target support for MIPS64

2020-04-28 Thread Aleksandar Markovic
On Mon, Apr 27, 2020 at 11:33, Huacai Chen wrote:
>
> Preparing for Loongson-3 virtualization, add KVM target support for
> MIPS64 in configure script.
>
> Signed-off-by: Huacai Chen 
> Co-developed-by: Jiaxun Yang 
> ---

Huacai, hi.

I am really glad this series arrived, and salute your work.

But it looks like no cover letter arrived, and there are some
omissions here and there.

An English machine translation of all relevant docs would be good too.

Please send v2, a little bit more complete.

Sincerely,
Aleksandar

>  configure | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index 23b5e93..7581e65 100755
> --- a/configure
> +++ b/configure
> @@ -198,7 +198,7 @@ supported_kvm_target() {
>  arm:arm | aarch64:aarch64 | \
>  i386:i386 | i386:x86_64 | i386:x32 | \
>  x86_64:i386 | x86_64:x86_64 | x86_64:x32 | \
> -mips:mips | mipsel:mips | \
> +mips:mips | mipsel:mips | mips64:mips | mips64el:mips | \
>  ppc:ppc | ppc64:ppc | ppc:ppc64 | ppc64:ppc64 | ppc64:ppc64le | \
>  s390x:s390x)
>  return 0
> --
> 2.7.0
>



Re: [Virtio-fs] [PATCH] virtiofsd: Show submounts

2020-04-28 Thread Dr. David Alan Gilbert
* Miklos Szeredi (mszer...@redhat.com) wrote:
> On Tue, Apr 28, 2020 at 4:52 PM Stefan Hajnoczi  wrote:
> >
> > On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
> > > * Max Reitz (mre...@redhat.com) wrote:
> > > > Currently, setup_mounts() bind-mounts the shared directory without
> > > > MS_REC.  This makes all submounts disappear.
> > > >
> > > > Pass MS_REC so that the guest can see submounts again.
> > >
> > > Thanks!
> > >
> > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> > >
> > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> > >
> > > > Signed-off-by: Max Reitz 
> > > > ---
> > > >  tools/virtiofsd/passthrough_ll.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/virtiofsd/passthrough_ll.c 
> > > > b/tools/virtiofsd/passthrough_ll.c
> > > > index 4c35c95b25..9d7f863e66 100644
> > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> > > >  int oldroot;
> > > >  int newroot;
> > > >
> > > > -if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > > > +if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> > > >  fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, 
> > > > source);
> > > >  exit(1);
> > > >  }
> > >
> > > Do we want MS_SLAVE to pick up future mounts that might happen from the
> > > host?
> >
> > There are two separate concepts:
> >
> > 1. Mount namespaces.  The virtiofsd process is sandboxed and lives in
> >its own mount namespace.  Therefore it does not share the mounts that
> >the rest of the host system sees.
> >
> > 2. Propagation type.  This is related to bind mounts so that mount
> >operations that happen in one bind-mounted location can also appear
> >in other bind-mounted locations.
> >
> > Since virtiofsd is in a separate mount namespace, does the propagation
> > type even have any effect?
> 
> It's a complicated thing.  The current setup results in propagation
> happening to the cloned namespace, but not to the bind-mounted root.
> 
> Why?  Because setting mounts to "slave" after unshare results in the
> propagation being stopped at that point.  To make it propagate
> further, change it back to "shared".  Note: changing to "slave" and
> then to "shared" breaks the backward propagation to the original
> namespace, but allows propagation further down the chain.

Do you mean on the "/" ?

So our current sequence is:

   (new namespace)
 1)if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
 2)   if (mount("proc", "/proc", "proc",
   
 3)   if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
 4)  (chdir newroot, pivot, chdir oldroot)
 5)   if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
 6)   if (umount2(".", MNT_DETACH) < 0) {

So are you saying we need a:
   if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {

  and can this go straight after (1) ?

Dave

> Thanks,
> Miklos
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH for-5.1 5/7] target/mips: Add more CP0 register for save/restore

2020-04-28 Thread Aleksandar Markovic
On Mon, Apr 27, 2020 at 11:36, Huacai Chen wrote:
>
> Add more CP0 registers for save/restore, including: EBase, XContext,
> PageGrain, PWBase, PWSize, PWField, PWCtl, Config*, KScratch1~KScratch6.
>
> Signed-off-by: Huacai Chen 
> Co-developed-by: Jiaxun Yang 
> ---
>  target/mips/kvm.c | 212 
> ++
>  target/mips/machine.c |   2 +
>  2 files changed, 214 insertions(+)
>
> diff --git a/target/mips/kvm.c b/target/mips/kvm.c
> index de3e26e..96cfa10 100644
> --- a/target/mips/kvm.c
> +++ b/target/mips/kvm.c
> @@ -245,10 +245,16 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int level)
>  (KVM_REG_MIPS_CP0 | KVM_REG_SIZE_U64 | (8 * (_R) + (_S)))
>
>  #define KVM_REG_MIPS_CP0_INDEX  MIPS_CP0_32(0, 0)
> +#define KVM_REG_MIPS_CP0_RANDOM MIPS_CP0_32(1, 0)
>  #define KVM_REG_MIPS_CP0_CONTEXT    MIPS_CP0_64(4, 0)
>  #define KVM_REG_MIPS_CP0_USERLOCAL  MIPS_CP0_64(4, 2)
>  #define KVM_REG_MIPS_CP0_PAGEMASK   MIPS_CP0_32(5, 0)
> +#define KVM_REG_MIPS_CP0_PAGEGRAIN  MIPS_CP0_32(5, 1)
> +#define KVM_REG_MIPS_CP0_PWBASE MIPS_CP0_64(5, 5)
> +#define KVM_REG_MIPS_CP0_PWFIELD    MIPS_CP0_64(5, 6)
> +#define KVM_REG_MIPS_CP0_PWSIZE MIPS_CP0_64(5, 7)
>  #define KVM_REG_MIPS_CP0_WIRED  MIPS_CP0_32(6, 0)
> +#define KVM_REG_MIPS_CP0_PWCTL  MIPS_CP0_32(6, 6)
>  #define KVM_REG_MIPS_CP0_HWRENA MIPS_CP0_32(7, 0)
>  #define KVM_REG_MIPS_CP0_BADVADDR   MIPS_CP0_64(8, 0)
>  #define KVM_REG_MIPS_CP0_COUNT  MIPS_CP0_32(9, 0)
> @@ -258,13 +264,22 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int level)
>  #define KVM_REG_MIPS_CP0_CAUSE  MIPS_CP0_32(13, 0)
>  #define KVM_REG_MIPS_CP0_EPC    MIPS_CP0_64(14, 0)
>  #define KVM_REG_MIPS_CP0_PRID   MIPS_CP0_32(15, 0)
> +#define KVM_REG_MIPS_CP0_EBASE  MIPS_CP0_64(15, 1)
>  #define KVM_REG_MIPS_CP0_CONFIG MIPS_CP0_32(16, 0)
>  #define KVM_REG_MIPS_CP0_CONFIG1    MIPS_CP0_32(16, 1)
>  #define KVM_REG_MIPS_CP0_CONFIG2    MIPS_CP0_32(16, 2)
>  #define KVM_REG_MIPS_CP0_CONFIG3    MIPS_CP0_32(16, 3)
>  #define KVM_REG_MIPS_CP0_CONFIG4    MIPS_CP0_32(16, 4)
>  #define KVM_REG_MIPS_CP0_CONFIG5    MIPS_CP0_32(16, 5)
> +#define KVM_REG_MIPS_CP0_CONFIG6    MIPS_CP0_32(16, 6)
> +#define KVM_REG_MIPS_CP0_XCONTEXT   MIPS_CP0_64(20, 0)
>  #define KVM_REG_MIPS_CP0_ERROREPC   MIPS_CP0_64(30, 0)
> +#define KVM_REG_MIPS_CP0_KSCRATCH1  MIPS_CP0_64(31, 2)
> +#define KVM_REG_MIPS_CP0_KSCRATCH2  MIPS_CP0_64(31, 3)
> +#define KVM_REG_MIPS_CP0_KSCRATCH3  MIPS_CP0_64(31, 4)
> +#define KVM_REG_MIPS_CP0_KSCRATCH4  MIPS_CP0_64(31, 5)
> +#define KVM_REG_MIPS_CP0_KSCRATCH5  MIPS_CP0_64(31, 6)
> +#define KVM_REG_MIPS_CP0_KSCRATCH6  MIPS_CP0_64(31, 7)
>
>  static inline int kvm_mips_put_one_reg(CPUState *cs, uint64_t reg_id,
> int32_t *addr)
> @@ -394,6 +409,29 @@ static inline int kvm_mips_get_one_ureg64(CPUState *cs, 
> uint64_t reg_id,
>   (1U << CP0C5_UFE) | \
>   (1U << CP0C5_FRE) | \
>   (1U << CP0C5_UFR))
> +#define KVM_REG_MIPS_CP0_CONFIG6_MASK   ((1U << CP0C6_BPPASS) | \
> + (0x3fU << CP0C6_KPOS) | \
> + (1U << CP0C6_KE) | \
> + (1U << CP0C6_VTLBONLY) | \
> + (1U << CP0C6_LASX) | \
> + (1U << CP0C6_SSEN) | \
> + (1U << CP0C6_DISDRTIME) | \
> + (1U << CP0C6_PIXNUEN) | \
> + (1U << CP0C6_SCRAND) | \
> + (1U << CP0C6_LLEXCEN) | \
> + (1U << CP0C6_DISVC) | \
> + (1U << CP0C6_VCLRU) | \
> + (1U << CP0C6_DCLRU) | \
> + (1U << CP0C6_PIXUEN) | \
> + (1U << CP0C6_DISBLKLYEN) | \
> + (1U << CP0C6_UMEMUALEN) | \
> + (1U << CP0C6_SFBEN) | \
> + (1U << CP0C6_FLTINT) | \
> + (1U << CP0C6_VLTINT) | \
> + (1U << CP0C6_DISBTB) | \
> + (3U << CP0C6_STPREFCTL) | \
> + (1U << CP0C6_INSTPREF) | \
> + (1U << CP0C6_DATAPREF))
>
>  static inline int kvm_mips_change_one_reg(CPUState *cs, uint64_t reg_id,
>
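
Editorial sketch: the register IDs quoted above all end in the term
8 * (_R) + (_S), i.e. the CP0 register number and its select field are
packed into one small index, eight selects per register. The helpers below
show only that arithmetic. The cp0_* names are made up for this note, and
the KVM_REG_MIPS_CP0 and KVM_REG_SIZE_* constants that are OR-ed in by the
real macros are deliberately left out, since their values are not given in
this thread.

```c
#include <assert.h>

/* Pack a CP0 register number (0..31) and select (0..7) into one index. */
static inline unsigned cp0_index(unsigned reg, unsigned sel)
{
    assert(reg < 32 && sel < 8);
    return 8 * reg + sel;
}

/* Recover the register number from a packed index. */
static inline unsigned cp0_reg(unsigned index)
{
    return index / 8;
}

/* Recover the select field from a packed index. */
static inline unsigned cp0_sel(unsigned index)
{
    return index % 8;
}
```

For instance, the Config6 entry added by the patch uses (16, 6) and the
KScratch6 entry uses (31, 7), the largest pair this encoding can hold.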

Re: [Virtio-fs] [PATCH] virtiofsd: Show submounts

2020-04-28 Thread Miklos Szeredi
On Tue, Apr 28, 2020 at 4:52 PM Stefan Hajnoczi  wrote:
>
> On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mre...@redhat.com) wrote:
> > > Currently, setup_mounts() bind-mounts the shared directory without
> > > MS_REC.  This makes all submounts disappear.
> > >
> > > Pass MS_REC so that the guest can see submounts again.
> >
> > Thanks!
> >
> > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> >
> > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> >
> > > Signed-off-by: Max Reitz 
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/virtiofsd/passthrough_ll.c 
> > > b/tools/virtiofsd/passthrough_ll.c
> > > index 4c35c95b25..9d7f863e66 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> > >  int oldroot;
> > >  int newroot;
> > >
> > > -if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > > +if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> > >  fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, 
> > > source);
> > >  exit(1);
> > >  }
> >
> > Do we want MS_SLAVE to pick up future mounts that might happen from the
> > host?
>
> There are two separate concepts:
>
> 1. Mount namespaces.  The virtiofsd process is sandboxed and lives in
>its own mount namespace.  Therefore it does not share the mounts that
>the rest of the host system sees.
>
> 2. Propagation type.  This is related to bind mounts so that mount
>operations that happen in one bind-mounted location can also appear
>in other bind-mounted locations.
>
> Since virtiofsd is in a separate mount namespace, does the propagation
> type even have any effect?

It's a complicated thing.  The current setup results in propagation
happening to the cloned namespace, but not to the bind-mounted root.

Why?  Because setting mounts to "slave" after unshare results in the
propagation being stopped at that point.  To make it propagate
further, change it back to "shared".  Note: changing to "slave" and
then to "shared" breaks the backward propagation to the original
namespace, but allows propagation further down the chain.

Thanks,
Miklos
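
Editorial sketch: one possible reading of the description above is a
three-step sequence: mark everything "slave" so nothing propagates back to
the original namespace, mark it "shared" again so events keep flowing
further down, then do the recursive bind mount. Whether the "shared" step
belongs directly after the "slave" step is not settled in this thread, so
the ordering below is an assumption. setup_propagation(), mount_fn and
record_mount() are hypothetical names; only mount(2) and the MS_* flags
are real. The mount call is passed in as a function pointer so the
sequence can be checked in a dry run without privileges.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/* Same parameter list as mount(2), so mount itself can be passed in. */
typedef int (*mount_fn)(const char *source, const char *target,
                        const char *fstype, unsigned long flags,
                        const void *data);

#define MAX_RECORDED 8
static unsigned long recorded_flags[MAX_RECORDED];
static size_t recorded_count;

/* Dry-run replacement for mount(2): remembers the flags, changes nothing. */
static int record_mount(const char *source, const char *target,
                        const char *fstype, unsigned long flags,
                        const void *data)
{
    (void)source;
    (void)target;
    (void)fstype;
    (void)data;
    assert(recorded_count < MAX_RECORDED);
    recorded_flags[recorded_count++] = flags;
    return 0;
}

/* Assumed ordering; run this only inside a fresh mount namespace. */
static int setup_propagation(const char *source, mount_fn do_mount)
{
    /* 1. Stop our mount events from propagating back to the host. */
    if (do_mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
        return -1;
    }
    /* 2. Assumption: switch back to shared so events propagate onward. */
    if (do_mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {
        return -1;
    }
    /* 3. Recursive bind mount, so existing submounts stay visible. */
    if (do_mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
        return -1;
    }
    return 0;
}
```

A privileged caller would pass mount itself as the second argument after
unshare(CLONE_NEWNS); the recorder only exists to inspect the sequence.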




Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Eric Blake

On 4/28/20 1:45 PM, Kevin Wolf wrote:

On 28.04.2020 at 18:28, Eric Blake wrote:

On 4/24/20 7:54 AM, Kevin Wolf wrote:

If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf 
---
   block/qcow2-cluster.c |  2 +-
   block/qcow2.c | 34 ++
   2 files changed, 35 insertions(+), 1 deletion(-)




+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
   bs->supported_zero_flags = header.version >= 3 ?
  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;


Is this really what we want for encrypted files, or would it be better as:

 if (bs->encrypted) {
 bs->supported_truncate_flags = 0;
 } else {
 bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 }

At the qcow2 level, we can guarantee a read of 0 even for an encrypted
image, but is that really what we want?  Is setting the qcow2 zero flag on
the cluster done at the decrypted level (at which point we may be leaking
information about guest contents via anyone that can read the qcow2
metadata) or at the encrypted level (at which point it's useless
information, because knowing the underlying file reads as zero still
decrypts into garbage)?


The zero flag means that the guest reads zeros, even with encrypted
files. I'm not sure if it's worse than exposing the information which
clusters are allocated and which are unallocated, which we have always
been doing and which is hard to avoid without encrypting all the
metadata, too. But it does reveal some information.

If we think that exposing zero flags is worse than exposing the
allocation status, I would still not use your solution above. In that
case, the full fix would be returning -ENOTSUP from
.bdrv_co_pwrite_zeroes() to cover all other callers, too.


Indeed, it also makes me wonder if we should support 
truncate(BDRV_REQ_ZERO_WRITE|BDRV_REQ_NO_FALLBACK), to differentiate 
whether a truncation request is aiming more to be fast (NO_FALLBACK set, 
fail immediately with -ENOTSUP on encryption) or complete (NO_FALLBACK 
clear, go ahead and write guest-visible zeroes, which populates the 
format layer).  In other words, maybe we want a knob that the user can 
set on encrypted volumes on whether to allow zero flags in the qcow2 image.




If we think that allocation status and zero flags are of comparable
importance, then we need to fix either both or nothing. Hiding all of
this information probably means encrypting at least the L2 tables and
potentially all of the metadata apart from the header. This would
obviously require an incompatible feature flag (and some effort to
implement it).


Indeed, my question is broad enough that it does not hold up _this_ 
series, so much as providing food for thought on what else we may need 
to add for encrypted qcow2 images as a future series, to make it easier 
to adjust the slider between the extremes of performance vs. minimal 
data leaks when using encryption.
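
Editorial sketch: the "fast vs. complete" idea floated above can be
written down as a small decision function. Everything here is
hypothetical: the flag bits are illustrative values rather than QEMU's
BDRV_REQ_* definitions, and the allow_zero_flag parameter stands for the
user-settable knob that the mail only speculates about.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits; not the values QEMU assigns to BDRV_REQ_*. */
enum {
    REQ_ZERO_WRITE  = 1 << 0,
    REQ_NO_FALLBACK = 1 << 1,
};

typedef enum {
    ZERO_VIA_FLAG,   /* cheap: set the zero flag in the metadata */
    ZERO_VIA_WRITE,  /* slow: write guest-visible zeroes through the format */
} ZeroMethod;

/*
 * Decide how a zeroing truncate should be carried out.
 * Returns 0 and sets *method, or -ENOTSUP if only the slow path would be
 * acceptable but the caller asked for no fallback.
 */
static int plan_zero_truncate(bool encrypted, bool allow_zero_flag,
                              int flags, ZeroMethod *method)
{
    assert(flags & REQ_ZERO_WRITE);

    if (!encrypted || allow_zero_flag) {
        *method = ZERO_VIA_FLAG;
        return 0;
    }
    if (flags & REQ_NO_FALLBACK) {
        return -ENOTSUP;     /* caller wanted fast or nothing */
    }
    *method = ZERO_VIA_WRITE;
    return 0;
}
```

The point of the split is that a caller preferring speed learns right
away that the cheap path is unavailable, while a caller preferring
completeness still gets zeroes without exposing zero flags in metadata.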


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH RESEND v6 08/36] multi-process: Add stub functions to facilitate build of multi-process

2020-04-28 Thread Jag Raman



> On Apr 28, 2020, at 12:29 PM, Stefan Hajnoczi  wrote:
> 
> On Fri, Apr 24, 2020 at 09:47:56AM -0400, Jag Raman wrote:
>>> On Apr 24, 2020, at 9:12 AM, Stefan Hajnoczi  wrote:
>>> On Wed, Apr 22, 2020 at 09:13:43PM -0700, elena.ufimts...@oracle.com wrote:
 diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
 index f884bb6180..f74c7e927b 100644
 --- a/stubs/Makefile.objs
 +++ b/stubs/Makefile.objs
 @@ -20,6 +20,7 @@ stub-obj-y += migr-blocker.o
 stub-obj-y += change-state-handler.o
 stub-obj-y += monitor.o
 stub-obj-y += monitor-core.o
 +stub-obj-y += get-fd.o
 stub-obj-y += notify-event.o
 stub-obj-y += qtest.o
 stub-obj-y += replay.o
>>> 
>>> audio.c, vl-stub.c, and xen-mapcache.c are added by this patch but not
>>> added to Makefile.objs?  Can they be removed?
>> 
>> Hey Stefan,
>> 
>> Sorry it’s not clear, but these files are referenced in Makefile.target.
> 
> Why is the Makefile.target change not in this patch?
> 
> Please structure patch series as logical changes that can be reviewed
> sequentially.  Not only is it hard for reviewers to understand what is
> going on but it probably also breaks bisectability if patches contain
> incomplete changes.

Hi Stefan,

We grouped all the stubs into a separate patch for ease of review. If you’re
finding it hard to review this way, we’ll rework the series so that the
Makefile changes go along with the stubs.

--
Jag

> 
>>> 
>>> This entire patch requires justification.  Stubs exist so that common
>>> code can be linked without optional features.
>>> 
>>> For example, common code may call into kvm but that callback isn't
>>> relevant when building without kvm accelerator support (e.g. say qemu-nbd).
>>> That's where the stub function comes in.  It fulfills the dependency
>>> without dragging in the actual kvm accelerator code.
>>> 
>>> Adding lots of stubs suggests you are building QEMU in a new way that
>>> wasn't done before (this is true and expected for this patch series).  I
>>> would like to understand the reason for these stubs though.  For
>>> example, why do you need to stub audio?
>> 
>> These stub functions are only used by the remote process, and not by
>> QEMU itself.
>> 
>> Our goal is to ensure that the remote process is building the smallest
>> set of files necessary and these stub functions are necessary to meet
>> that goal.
>> 
>> For example, the remote process needs to build some of the functions
>> defined in “hw/core/qdev-properties-system.c”. However, this file
>> depends on audio.c (references audio_state_by_name()), which is not
>> needed for the remote process. The alternative to stub functions would
>> be to compile audio.c into the remote process, but that was not necessary
>> in our judgement. When the project started out, we spent a lot of time
>> figuring out which functions/files are necessary for the remote process, and
>> we stubbed out the ones which are needed to resolve dependency during
>> compilation, but not needed for functionality.
>> 
>> audio.c is just an example of tens of other places where we needed to
>> make similar judgements.
>> 
>> Would you prefer if we moved these stub functions into a separate
>> library (instead of stub-obj-y) which is only linked by the remote process?
> 
> It's too bad that none of these judgements were documented.  As a
> reviewer I have no idea what the justification for each individual stub
> was.
> 
> Some stubs are unavoidable but they also indicate that the code is
> tightly coupled where maybe it can be split up.  The
> qdev-properties-system.c example you mentioned sounds like something
> that should be broken up into multiple files.  Then stubs wouldn't be
> necessary.
> 
> That said, adding stubs doesn't place a great burden on anyone and I
> think they can be merged.
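
For readers following the stub discussion, here is a minimal sketch of what such a stub amounts to. It is not the code from the series: the type and function names below are hypothetical stand-ins, and the only point is that the stub resolves the symbol at link time without pulling in the real implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for a type owned by an optional subsystem. */
typedef struct AudioStateSketch AudioStateSketch;

/*
 * Stub: satisfies the linker for common code that references the
 * lookup, without pulling in the real audio implementation.  It
 * reports "no such audio state" for every name.
 */
AudioStateSketch *audio_state_by_name_stub(const char *name)
{
    (void)name;
    return NULL;
}

/* Common code that only needs the symbol to resolve at link time. */
int property_refers_to_audio(const char *name)
{
    return audio_state_by_name_stub(name) != NULL;
}
```

Splitting the caller into its own file, as suggested above, would remove the need for the stub altogether; the sketch only shows the cheaper alternative.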




Re: [PATCH for-5.1 3/7] hw/mips: Add CPU IRQ3 delivery for KVM

2020-04-28 Thread Aleksandar Markovic
On Tue, Apr 28, 2020 at 10:21, chen huacai wrote:
>
> Hi, Philippe,
>
> On Mon, Apr 27, 2020 at 5:57 PM Philippe Mathieu-Daudé wrote:
> >
> > On 4/27/20 11:33 AM, Huacai Chen wrote:
> > > > Currently, KVM/MIPS only delivers I/O interrupts via IP2; this patch adds
> > > > IP3 delivery as well, because Loongson-3 based machines use both IRQ2
> > > > (CPU's IP2) and IRQ3 (CPU's IP3).
> > >
> > > Signed-off-by: Huacai Chen 
> > > Co-developed-by: Jiaxun Yang 
> > > ---
> > >  hw/mips/mips_int.c | 6 ++
> > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
> > > index 796730b..5526219 100644
> > > --- a/hw/mips/mips_int.c
> > > +++ b/hw/mips/mips_int.c
> > > @@ -48,16 +48,14 @@ static void cpu_mips_irq_request(void *opaque, int 
> > > irq, int level)
> > >  if (level) {
> > >  env->CP0_Cause |= 1 << (irq + CP0Ca_IP);
> > >
> > > -if (kvm_enabled() && irq == 2) {
> > > +if (kvm_enabled() && (irq == 2 || irq == 3))
> >
> > Shouldn't we check env->CP0_Config6 (or Config7) has the required
> > feature first?
> I'm sorry, but I don't understand what IRQ delivery has to do
> with Config6/Config7. Is it to identify Loongson-3?
>

Obviously, yes.

Thanks,
Aleksandar


> >
> > >  kvm_mips_set_interrupt(cpu, irq, level);
> > > -}
> > >
> > >  } else {
> > >  env->CP0_Cause &= ~(1 << (irq + CP0Ca_IP));
> > >
> > > -if (kvm_enabled() && irq == 2) {
> > > +if (kvm_enabled() && (irq == 2 || irq == 3))
> > >  kvm_mips_set_interrupt(cpu, irq, level);
> > > -}
> > >  }
> > >
> > >  if (env->CP0_Cause & CP0Ca_IP_mask) {
> > >
>
>
>
> --
> Huacai Chen
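
To make the condition under discussion concrete, here is a small standalone sketch of the Cause.IP bit arithmetic and the patched forwarding test. It assumes the pending bits start at bit 8 of the Cause register (the role CP0Ca_IP plays in the quoted hunk); the helper names are made up for illustration and nothing here touches KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: Cause.IP0 sits at bit 8, as CP0Ca_IP is used in the patch. */
#define SKETCH_CP0CA_IP 8

/* Set or clear the pending bit for one CPU interrupt line. */
static uint32_t sketch_update_cause(uint32_t cause, int irq, int level)
{
    uint32_t bit = UINT32_C(1) << (irq + SKETCH_CP0CA_IP);

    return level ? (cause | bit) : (cause & ~bit);
}

/* The patched condition: forward IRQ2 and IRQ3 to KVM, nothing else. */
static bool sketch_forward_to_kvm(bool kvm_enabled, int irq)
{
    return kvm_enabled && (irq == 2 || irq == 3);
}
```

A feature check on Config6/Config7, as asked for in the review, would be an extra term in the second helper; the sketch leaves it out because the thread does not say which bit would be tested.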



Re: [PATCH v3 2/4] xlnx-zynqmp: Connect Xilinx ZynqMP CAN controller

2020-04-28 Thread Francisco Iglesias
Hi Vikram,

A couple more comments:

On the git summary:
s/controller/controllers/

On [2020 Apr 22] Wed 17:56:07, Vikram Garhwal wrote:
> Connect CAN0 and CAN1 to ZCU102 board.

Perhaps also:
s/to ZCU102 board/on the ZynqMP/

(even though zcu102 is the only board using it at the moment).

Best regards,
Francisco

> 
> Signed-off-by: Vikram Garhwal 
> ---
>  hw/arm/xlnx-zynqmp.c | 26 ++
>  include/hw/arm/xlnx-zynqmp.h |  3 +++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index b84d153..e5f0d9f 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -81,6 +81,14 @@ static const int uart_intr[XLNX_ZYNQMP_NUM_UARTS] = {
>  21, 22,
>  };
>  
> +static const uint64_t can_addr[XLNX_ZYNQMP_NUM_CAN] = {
> +0xFF06, 0xFF07,
> +};
> +
> +static const int can_intr[XLNX_ZYNQMP_NUM_CAN] = {
> +23, 24,
> +};
> +
>  static const uint64_t sdhci_addr[XLNX_ZYNQMP_NUM_SDHCI] = {
>  0xFF16, 0xFF17,
>  };
> @@ -254,6 +262,11 @@ static void xlnx_zynqmp_init(Object *obj)
>TYPE_CADENCE_UART);
>  }
>  
> +for (i = 0; i < XLNX_ZYNQMP_NUM_CAN; i++) {
> +sysbus_init_child_obj(obj, "can[*]", &s->can[i], sizeof(s->can[i]),
> +  TYPE_XLNX_ZYNQMP_CAN);
> +}
> +
>  sysbus_init_child_obj(obj, "sata", &s->sata, sizeof(s->sata),
>TYPE_SYSBUS_AHCI);
>  
> @@ -508,6 +521,19 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
> gic_spi[uart_intr[i]]);
>  }
>  
> +for (i = 0; i < XLNX_ZYNQMP_NUM_CAN; i++) {
> +object_property_set_int(OBJECT(&s->can[i]), i, "ctrl-idx",
> +&error_abort);
> +object_property_set_bool(OBJECT(&s->can[i]), true, "realized", &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +sysbus_mmio_map(SYS_BUS_DEVICE(&s->can[i]), 0, can_addr[i]);
> +sysbus_connect_irq(SYS_BUS_DEVICE(&s->can[i]), 0,
> +   gic_spi[can_intr[i]]);
> +}
> +
>  object_property_set_int(OBJECT(&s->sata), SATA_NUM_PORTS, "num-ports",
>  &error_abort);
>  object_property_set_bool(OBJECT(&s->sata), true, "realized", &err);
> diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
> index 53076fa..2be0ff9 100644
> --- a/include/hw/arm/xlnx-zynqmp.h
> +++ b/include/hw/arm/xlnx-zynqmp.h
> @@ -22,6 +22,7 @@
>  #include "hw/intc/arm_gic.h"
>  #include "hw/net/cadence_gem.h"
>  #include "hw/char/cadence_uart.h"
> +#include "hw/net/xlnx-zynqmp-can.h"
>  #include "hw/ide/ahci.h"
>  #include "hw/sd/sdhci.h"
>  #include "hw/ssi/xilinx_spips.h"
> @@ -41,6 +42,7 @@
>  #define XLNX_ZYNQMP_NUM_RPU_CPUS 2
>  #define XLNX_ZYNQMP_NUM_GEMS 4
>  #define XLNX_ZYNQMP_NUM_UARTS 2
> +#define XLNX_ZYNQMP_NUM_CAN 2
>  #define XLNX_ZYNQMP_NUM_SDHCI 2
>  #define XLNX_ZYNQMP_NUM_SPIS 2
>  #define XLNX_ZYNQMP_NUM_GDMA_CH 8
> @@ -92,6 +94,7 @@ typedef struct XlnxZynqMPState {
>  
>  CadenceGEMState gem[XLNX_ZYNQMP_NUM_GEMS];
>  CadenceUARTState uart[XLNX_ZYNQMP_NUM_UARTS];
> +XlnxZynqMPCANState can[XLNX_ZYNQMP_NUM_CAN];
>  SysbusAHCIState sata;
>  SDHCIState sdhci[XLNX_ZYNQMP_NUM_SDHCI];
>  XilinxSPIPS spi[XLNX_ZYNQMP_NUM_SPIS];
> -- 
> 2.7.4
> 
> 



Re: [PATCH 17/17] qom: Drop @errp parameter of object_property_del()

2020-04-28 Thread Eric Blake

On 4/28/20 11:34 AM, Markus Armbruster wrote:

Same story as for object_property_add(): the only way
object_property_del() can fail is when the property with this name
does not exist.  Since our property names are all hardcoded, failure
is a programming error, and the appropriate way to handle it is
passing &error_abort.  Most callers do that, the commit before
previous fixed one that didn't (and got the error handling wrong), and
the two remaining exceptions ignore errors.

Drop the @errp parameter and assert the precondition instead.

Signed-off-by: Markus Armbruster 
---


I skipped review of 15/17 (it's less mechanical, and although the commit 
message was good, verifying that the patch matches the commit message is 
a bigger task).  But assuming it is right, then this one indeed makes sense.




+++ b/qom/object.c
@@ -1280,15 +1280,10 @@ ObjectProperty *object_class_property_find(ObjectClass *klass, const char *name,
  return prop;
  }
  
-void object_property_del(Object *obj, const char *name, Error **errp)
+void object_property_del(Object *obj, const char *name)
  {
  ObjectProperty *prop = g_hash_table_lookup(obj->properties, name);
  
-if (!prop) {
-error_setg(errp, "Property '.%s' not found", name);
-return;
-}
-
  if (prop->release) {
  prop->release(obj, name, prop->opaque);
  }


However, the commit message says you assert the precondition, whereas 
the code SEGVs rather than asserts if the precondition is not met.  In 
practice, both will flag the programmer error, so I don't care which you 
do, but it's worth making the commit match the intent: Did you mean to 
add an assert()?
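
To illustrate the difference between the two outcomes, here is a minimal standalone sketch of a delete helper that states its precondition with assert() before dereferencing the lookup result. The property table and all names are hypothetical; this is not the qom/object.c code, only the shape of the check being asked about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature property table, not QEMU's ObjectProperty. */
typedef struct {
    const char *name;
    int present;
} SketchProp;

static SketchProp *sketch_find(SketchProp *props, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (props[i].present && strcmp(props[i].name, name) == 0) {
            return &props[i];
        }
    }
    return NULL;
}

/* Deleting a missing property is a programming error: fail loudly. */
static void sketch_prop_del(SketchProp *props, size_t n, const char *name)
{
    SketchProp *prop = sketch_find(props, n, name);

    assert(prop);           /* explicit precondition, not a NULL deref */
    prop->present = 0;
}
```

With the assert in place a bad call stops with a message naming the failed condition, whereas without it the failure only shows up as a crash on the first dereference.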



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-28 Thread Kevin Wolf
Am 28.04.2020 um 18:28 hat Eric Blake geschrieben:
> On 4/24/20 7:54 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >   block/qcow2-cluster.c |  2 +-
> >   block/qcow2.c | 34 ++
> >   2 files changed, 35 insertions(+), 1 deletion(-)
> > 
> 
> > +++ b/block/qcow2.c
> > @@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
> >   bs->supported_zero_flags = header.version >= 3 ?
> >  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
> > +bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> 
> Is this really what we want for encrypted files, or would it be better as:
> 
> if (bs->encrypted) {
> bs->supported_truncate_flags = 0;
> } else {
> bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
> }
> 
> At the qcow2 level, we can guarantee a read of 0 even for an encrypted
> image, but is that really what we want?  Is setting the qcow2 zero flag on
> the cluster done at the decrypted level (at which point we may be leaking
> information about guest contents via anyone that can read the qcow2
> metadata) or at the encrypted level (at which point it's useless
> information, because knowing the underlying file reads as zero still
> decrypts into garbage)?

The zero flag means that the guest reads zeros, even with encrypted
files. I'm not sure if it's worse than exposing the information which
clusters are allocated and which are unallocated, which we have always
been doing and which is hard to avoid without encrypting all the
metadata, too. But it does reveal some information.

If we think that exposing zero flags is worse than exposing the
allocation status, I would still not use your solution above. In that
case, the full fix would be returning -ENOTSUP from
.bdrv_co_pwrite_zeroes() to cover all other callers, too.
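
As a rough sketch of that alternative (with hypothetical names, and an imagined user knob that does not exist in qcow2 today), the decision could sit in the zero-write path itself, so that every caller sees the same answer:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-image state; not the real BDRVQcow2State. */
typedef struct {
    bool encrypted;
    bool allow_zero_flags;   /* imagined user knob, not an existing option */
} SketchImage;

/*
 * Decide how a zero-write request is handled.  Returning -ENOTSUP here
 * covers every caller: the request either falls back to writing
 * explicit zeroes or fails if the caller forbade the fallback.
 */
static int sketch_pwrite_zeroes(const SketchImage *img)
{
    if (img->encrypted && !img->allow_zero_flags) {
        return -ENOTSUP;
    }
    return 0;   /* would set the zero flag in the L2 entries */
}
```
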

If we think that allocation status and zero flags are of comparable
importance, then we need to fix either both or nothing. Hiding all of
this information probably means encrypting at least the L2 tables and
potentially all of the metadata apart from the header. This would
obviously require an incompatible feature flag (and some effort to
implement it).

Kevin




Re: [PATCH 1/2] softfloat: m68k: infinity is a valid encoding

2020-04-28 Thread Alex Bennée


KONRAD Frederic  writes:

> The MC68881 manual says about infinities (3.2.4):
>
> "*For the extended precision format, the most significant bit of the
> mantissa (the integer bit) is a don't care."
>
> https://www.nxp.com/docs/en/reference-manual/MC68881UM.pdf
>
> The m68k extended format is implemented with the floatx80 and
> floatx80_invalid_encoding currently treats 0x7fff as
> an invalid encoding.  This patch fixes floatx80_invalid_encoding so it
> accepts that the most significant bit of the mantissa can be 0.
>
> This bug can be revealed with the following code which pushes extended
> infinity on the stack as a double and then reloads it as a double.  It
> should normally be converted and read back as infinity and is currently
> read back as nan:

Do you have any real HW on which you could record some .ref files for
the various multiarch float tests we have (float_convs/float_madds)?
Does this difference in invalid encoding show up when you add them?

>
> .global _start
> .text
> _start:
> lea val, %a0
> lea fp, %fp
> fmovex (%a0), %fp0
> fmoved %fp0, %fp@(-8)
> fmoved %fp@(-8), %fp0
> end:
> bra end
>
> .align 0x4
> val:
> .fill 1, 4, 0x7fff
> .fill 1, 4, 0x
> .fill 1, 4, 0x
> .align 0x4
> .fill 0x100, 1, 0
> fp:
>
> -
>
> (gdb) tar rem :1234
> Remote debugging using :1234
> _start () at main.S:5
> 5  lea val, %a0
> (gdb) display $fp0
> 1: $fp0 = nan(0x)
> (gdb) si
> 6 lea fp, %fp
> 1: $fp0 = nan(0x)
> (gdb) si
> _start () at main.S:7
> 7  fmovex (%a0), %fp0
> 1: $fp0 = nan(0x)
> (gdb) si
> 8 fmoved %fp0, %fp@(-8)
> 1: $fp0 = inf
> (gdb) si
> 9 fmoved %fp@(-8), %fp0
> 1: $fp0 = inf
> (gdb) si
> end () at main.S:12
> 12  bra end
> 1: $fp0 = nan(0xf800)
> (gdb) x/1xg $fp-8
> 0x4120 :   0x7fff
>
> Signed-off-by: KONRAD Frederic 
> ---
>  include/fpu/softfloat.h | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index ecb8ba0..dc80298 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -688,7 +688,12 @@ static inline int floatx80_is_any_nan(floatx80 a)
>  
> **/
>  static inline bool floatx80_invalid_encoding(floatx80 a)
>  {
> +#if defined(TARGET_M68K)
> +return (a.low & (1ULL << 63)) == 0 && (((a.high & 0x7FFF) != 0)
> +   && (a.high != 0x7FFF));
> +#else
>  return (a.low & (1ULL << 63)) == 0 && (a.high & 0x7FFF) != 0;
> +#endif
>  }
>  
>  #define floatx80_zero make_floatx80(0x, 0xLL)
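
For reference, the two predicates from the hunk above can be tried in isolation. The sketch below mirrors the quoted expressions on a minimal stand-in struct (the type and function names are made up here, and the m68k variant is selected by calling a different function rather than by TARGET_M68K):

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the softfloat floatx80 layout. */
typedef struct {
    uint64_t low;   /* 64-bit mantissa, bit 63 is the explicit integer bit */
    uint16_t high;  /* sign bit plus 15-bit exponent */
} sketch_floatx80;

/* Generic rule: integer bit clear with a non-zero exponent is invalid. */
static bool sketch_invalid_generic(sketch_floatx80 a)
{
    return (a.low & (1ULL << 63)) == 0 && (a.high & 0x7FFF) != 0;
}

/* m68k rule from the patch: additionally accept a.high == 0x7FFF. */
static bool sketch_invalid_m68k(sketch_floatx80 a)
{
    return (a.low & (1ULL << 63)) == 0 && (a.high & 0x7FFF) != 0
           && a.high != 0x7FFF;
}
```

Note that, as written in the patch, the extra comparison is against a.high itself, so it only exempts the maximum exponent with the sign bit clear.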


-- 
Alex Bennée



Re: [PATCH 14/17] Drop more @errp parameters after previous commit

2020-04-28 Thread Eric Blake

On 4/28/20 11:34 AM, Markus Armbruster wrote:

Several functions can't fail anymore: ich9_pm_add_properties(),
device_add_bootindex_property(), ppc_compat_add_property(),
spapr_caps_add_properties(), PropertyInfo.create().  Drop their @errp
parameter.

Signed-off-by: Markus Armbruster 
---


Reviewed-by: Eric Blake 

Nice that the compiler helps you find impacted spots, once you tweak the .h.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 13/17] qom: Drop parameter @errp of object_property_add() & friends

2020-04-28 Thread Eric Blake

On 4/28/20 11:34 AM, Markus Armbruster wrote:

The only way object_property_add() can fail is when a property with
the same name already exists.  Since our property names are all
hardcoded, failure is a programming error, and the appropriate way to
handle it is passing &error_abort.

Same for its variants, except for object_property_add_child(), which
additionally fails when the child already has a parent.  Parentage is
also under program control, so this is a programming error, too.

We have a bit over 500 callers.  Almost half of them pass
&error_abort, slightly fewer ignore errors, one test case handles
errors, and the remaining few callers pass them to their own callers.

The previous few commits demonstrated once again that ignoring
programming errors is a bad idea.

Of the few ones that pass on errors, several violate the Error API.
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.  ich9_pm_add_properties(), sparc32_ledma_realize(),
sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize()
are wrong that way.
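
The rule is easier to see on a toy model. The following sketch uses a hypothetical miniature of an Error ** style API (not QEMU's Error type or helpers) and shows the correct shape: the local error variable is checked after every call, so it is never passed on while already set.

```c
#include <stddef.h>

/* Hypothetical miniature of an Error ** style API; not QEMU's Error. */
typedef struct { const char *msg; } SketchError;

static SketchError sketch_failure = { "set failed" };

/* Contract: *errp must be NULL on entry; it is set on failure. */
static void sketch_set(int ok, SketchError **errp)
{
    if (!ok && errp) {
        *errp = &sketch_failure;
    }
}

/*
 * Correct use: test the error after every call before reusing the
 * variable.  In this sketch errp must be non-NULL.
 */
static int sketch_configure(int ok1, int ok2, SketchError **errp)
{
    SketchError *err = NULL;

    sketch_set(ok1, &err);
    if (err) {
        *errp = err;        /* propagate and stop; err is never reused */
        return -1;
    }
    sketch_set(ok2, &err);
    if (err) {
        *errp = err;
        return -1;
    }
    return 0;
}
```

The broken pattern described above is the same function with the first check removed, which hands an already-set variable to the second call.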

When the one appropriate choice of argument is &error_abort, letting
users pick the argument is a bad idea.

Drop parameter @errp and assert the preconditions instead.

There's one exception to "duplicate property name is a programming
error": the way object_property_add() implements the magic (and
undocumented) "automatic arrayification".  Don't drop @errp there.
Instead, rename object_property_add() to object_property_try_add(),
and add the obvious wrapper object_property_add().


Huge. Could this last paragraph be done as a separate patch (i.e., 
introduce object_property_try_add and adjust its clients), prior to the 
bulk mechanical patch that deletes the errp argument for all remaining 
instances?




Signed-off-by: Markus Armbruster 
---
  include/qom/object.h|  81 +++-



  qom/container.c |   2 +-
  qom/object.c| 302 +---
  qom/object_interfaces.c |   5 +-


The core of the patch lies in these files, but even then it is still 
large because of adding a new API at the same time as fixing an existing 
one.



  190 files changed, 643 insertions(+), 987 deletions(-)




+++ b/qom/object.c



@@ -1129,12 +1123,12 @@ void object_unref(Object *obj)
  }
  }
  
-ObjectProperty *
-object_property_add(Object *obj, const char *name, const char *type,
-ObjectPropertyAccessor *get,
-ObjectPropertyAccessor *set,
-ObjectPropertyRelease *release,
-void *opaque, Error **errp)
+static ObjectProperty *
+object_property_try_add(Object *obj, const char *name, const char *type,
+ObjectPropertyAccessor *get,
+ObjectPropertyAccessor *set,
+ObjectPropertyRelease *release,
+void *opaque, Error **errp)
  {
  ObjectProperty *prop;
  size_t name_len = strlen(name);
@@ -1148,8 +1142,8 @@ object_property_add(Object *obj, const char *name, const char *type,
  for (i = 0; ; ++i) {
  char *full_name = g_strdup_printf("%s[%d]", name_no_array, i);
  
-ret = object_property_add(obj, full_name, type, get, set,
-  release, opaque, NULL);
+ret = object_property_try_add(obj, full_name, type, get, set,
+  release, opaque, NULL);
  g_free(full_name);


Here's the magic in the last paragraph.


  if (ret) {
  break;
@@ -1179,6 +1173,17 @@ object_property_add(Object *obj, const char *name, const char *type,
  return prop;
  }
  
+ObjectProperty *
+object_property_add(Object *obj, const char *name, const char *type,
+ObjectPropertyAccessor *get,
+ObjectPropertyAccessor *set,
+ObjectPropertyRelease *release,
+void *opaque)
+{
+return object_property_try_add(obj, name, type, get, set, release,
+   opaque, &error_abort);
+}
+


and if you were to split things into two patches, the first patch would 
be adding:


ObjectProperty *
object_property_add(Object *obj, const char *name, const char *type,
ObjectPropertyAccessor *get,
ObjectPropertyAccessor *set,
ObjectPropertyRelease *release,
void *opaque, Error **errp)
{
return object_property_try_add(obj, name, type, get, set, release,
   opaque, errp);
}

with the second changing the signature to drop errp and forward 
&error_abort.
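
The end state of that split can be sketched outside QEMU as a "try" function that reports failure plus a thin wrapper that treats failure as a programming error. Everything below is a hypothetical miniature (fixed-size name table, made-up names), meant only to show the pairing:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_PROPS 8

/* Hypothetical object with a fixed-size table of property names. */
typedef struct {
    const char *names[SKETCH_MAX_PROPS];
    size_t count;
} SketchObject;

/* "try" variant: reports a duplicate name (or full table) to the caller. */
static bool sketch_property_try_add(SketchObject *obj, const char *name)
{
    for (size_t i = 0; i < obj->count; i++) {
        if (strcmp(obj->names[i], name) == 0) {
            return false;
        }
    }
    if (obj->count == SKETCH_MAX_PROPS) {
        return false;
    }
    obj->names[obj->count++] = name;
    return true;
}

/* Plain variant: failure is a programming error, so abort. */
static void sketch_property_add(SketchObject *obj, const char *name)
{
    if (!sketch_property_try_add(obj, name)) {
        abort();
    }
}
```

Callers that can legitimately hit a duplicate, like the array-style probing loop in the patch, keep using the "try" variant; everyone else gets the simpler signature.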




  ObjectProperty *
  object_class_property_add(ObjectClass *klass,
   

Re: backing chain & block status & filters

2020-04-28 Thread Kevin Wolf
Am 28.04.2020 um 18:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 28.04.2020 19:18, Eric Blake wrote:
> > On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:
> > 
> > > > > Hm.  I could imagine that there are formats that have non-zero holes
> > > > > (e.g. 0xff or just garbage).  It would be a bit wrong for them to 
> > > > > return
> > > > > ZERO or DATA then.
> > > > > 
> > > > > But OTOH we don’t care about such cases, do we?  We need to know 
> > > > > whether
> > > > > ranges are zero, data, or unallocated.  If they aren’t zero, we only
> > > > > care about whether reading from it will return data from this layer 
> > > > > or not.
> > > > > 
> > > > > So I suppose that anything that doesn’t support backing files (or
> > > > > filtered children) should always return ZERO and/or DATA.
> > > > 
> > > > I'm not sure I agree with the notion that everything should be
> > > > BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
> > > > at least. If we want to change this, we will have to check all callers
> > > > of bdrv_is_allocated() and friends who might use this to find holes in
> > > > the file.
> > > 
> > > Yes. Because they are doing incorrect (or at least undocumented and 
> > > unreliable) thing.
> > 
> > Here's some previous mails discussing the same question about what 
> > block_status should actually mean.  At the time, I was so scared of the 
> > prospect of something breaking if I changed things that I ended up keeping 
> > status quo, so here we are revisiting the topic several years later, still 
> > asking the same questions.
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
> > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html
> > 
> > > 
> > > > 
> > > > Basically, the way bdrv_is_allocated() works today is that we assume an
> > > > implicit zeroed backing layer even for block drivers that don't support
> > > > backing files.
> > > 
> > > But read doesn't work so: it will read data from the bottom layer, not 
> > > from
> > > this implicit zeroed backing layer. And it is inconsistent. On read data
> > > comes exactly from this layer, not from its implicit backing. So it should
> > > return BDRV_BLOCK_ALLOCATED, accordingly to its definition..
> > > 
> > > Or, we should at least document current behavior:
> > > 
> > >    BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
> > >    layer rather than any backing, set by block. Attention: it may not be 
> > > set
> > >    for drivers without backing support, still data is of course read from
> > >    this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean
> > >    allocation on fs level, which occupies real space on disk.. So, for 
> > > such drivers
> > > 
> > >    ZERO | ALLOCATED means that, read as zero, data may be allocated on 
> > > fs, or
> > >    (most probably) not,
> > >    don't look at ALLOCATED flag, as it is added by generic layer for 
> > > another logic,
> > >    not related to fs-allocation.
> > > 
> > >    0 means that, most probably, data doesn't occupy space on fs, 
> > > zero-status is
> > >    unknown (most probably non-zero)
> > > 
> > 
> > That may be right in describing the current situation, but again,
> > needs a GOOD audit of what we are actually using it for, and whether
> > it is what we really WANT to be using it for.  If we're going to
> > audit/refactor the code, we might as well get semantics that are
> > actually useful, rather than painfully contorted to documentation
> > that happens to match our current contorted code.
> > 
> 
> Honest enough:) I'll try to make a table.
> 
> I don't think that reporting fs-allocation status is a bad thing. But
> I'm sure that it should be separated from backing-chain-allocated
> concept.

I think we could easily agree on what would be a good concept.

My concern is just that existing code probably uses existing semantics
and not what we consider more logical now. So if we change it, we must
make sure that we change all places that expect the old semantics.

Kevin




Re: [PATCH for-5.1 4/7] target/mips: Add Loongson-3 CPU definition

2020-04-28 Thread Aleksandar Markovic
Huacai,

Can you please do machine translation of the document?

It can be done via translate.google.com (it accepts pdf files, but
does not have a download feature; the workaround is to "print to pdf").

Thanks in advance!
Aleksandar

On Tue, Apr 28, 2020 at 10:26, chen huacai wrote:
>
> Hi, Philippe,
>
> On Tue, Apr 28, 2020 at 2:34 PM Philippe Mathieu-Daudé wrote:
> >
> > Hi Huacai,
> >
> > On 4/27/20 11:33 AM, Huacai Chen wrote:
> > > > The Loongson-3 CPU family includes Loongson-3A R1/R2/R3/R4 and Loongson-3B
> > > > R1/R2. Loongson-3A R4 is the newest and its ISA is almost a superset
> > > > of all the others. To reduce complexity, we just define a "Loongson-3A" CPU
> > > > which corresponds to Loongson-3A R4. Loongson-3A has CONFIG6 and
> > > > CONFIG7, so add their bit-fields as well.
> >
> > Is there a public datasheet for R4? (If possible in English).
> I'm sorry, we only have a Chinese datasheet, at www.loongson.cn.
>
> >
> > >
> > > Signed-off-by: Huacai Chen 
> > > Co-developed-by: Jiaxun Yang 
> > > ---
> > >  target/mips/cpu.h| 28 ++
> > >  target/mips/internal.h   |  2 ++
> > >  target/mips/mips-defs.h  |  7 --
> > >  target/mips/translate.c  |  2 ++
> > > >  target/mips/translate_init.inc.c | 51
> > >  5 files changed, 88 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/target/mips/cpu.h b/target/mips/cpu.h
> > > index 94d01ea..0b3c987 100644
> > > --- a/target/mips/cpu.h
> > > +++ b/target/mips/cpu.h
> > > @@ -940,7 +940,35 @@ struct CPUMIPSState {
> > >  #define CP0C5_UFR  2
> > >  #define CP0C5_NFExists 0
> > >  int32_t CP0_Config6;
> > > +int32_t CP0_Config6_rw_bitmask;
> > > +#define CP0C6_BPPASS  31
> > > > +#define CP0C6_KPOS 24
> > > > +#define CP0C6_KE  23
> > > > +#define CP0C6_VTLBONLY 22
> > > +#define CP0C6_LASX21
> > > +#define CP0C6_SSEN20
> > > +#define CP0C6_DISDRTIME   19
> > > +#define CP0C6_PIXNUEN 18
> > > +#define CP0C6_SCRAND  17
> > > +#define CP0C6_LLEXCEN 16
> > > +#define CP0C6_DISVC   15
> > > +#define CP0C6_VCLRU   14
> > > +#define CP0C6_DCLRU   13
> > > +#define CP0C6_PIXUEN  12
> > > +#define CP0C6_DISBLKLYEN  11
> > > +#define CP0C6_UMEMUALEN   10
> > > +#define CP0C6_SFBEN   8
> > > +#define CP0C6_FLTINT  7
> > > +#define CP0C6_VLTINT  6
> > > +#define CP0C6_DISBTB  5
> > > +#define CP0C6_STPREFCTL   2
> > > > +#define CP0C6_INSTPREF 1
> > > > +#define CP0C6_DATAPREF 0
> > >  int32_t CP0_Config7;
> > > +int64_t CP0_Config7_rw_bitmask;
> > > +#define CP0C7_NAPCGEN   2
> > > +#define CP0C7_UNIMUEN   1
> > > +#define CP0C7_VFPUCGEN  0
> > >  uint64_t CP0_LLAddr;
> > >  uint64_t CP0_MAAR[MIPS_MAAR_MAX];
> > >  int32_t CP0_MAARI;
> > > diff --git a/target/mips/internal.h b/target/mips/internal.h
> > > index 1bf274b..7853cb1 100644
> > > --- a/target/mips/internal.h
> > > +++ b/target/mips/internal.h
> > > @@ -36,7 +36,9 @@ struct mips_def_t {
> > >  int32_t CP0_Config5;
> > >  int32_t CP0_Config5_rw_bitmask;
> > >  int32_t CP0_Config6;
> > > +int32_t CP0_Config6_rw_bitmask;
> > >  int32_t CP0_Config7;
> > > +int32_t CP0_Config7_rw_bitmask;
> > >  target_ulong CP0_LLAddr_rw_bitmask;
> > >  int CP0_LLAddr_shift;
> > >  int32_t SYNCI_Step;
> > > diff --git a/target/mips/mips-defs.h b/target/mips/mips-defs.h
> > > index a831bb4..c2c96db 100644
> > > --- a/target/mips/mips-defs.h
> > > +++ b/target/mips/mips-defs.h
> > > @@ -51,8 +51,9 @@
> > >   */
> > >  #define INSN_LOONGSON2E   0x0001ULL
> > >  #define INSN_LOONGSON2F   0x0002ULL
> > > -#define INSN_VR54XX   0x0004ULL
> > > > -#define INSN_R5900 0x0008ULL
> > > +#define INSN_LOONGSON3A   0x0004ULL
> > > +#define INSN_VR54XX   0x0008ULL
> > > > +#define INSN_R5900 0x0010ULL
> > >  /*
> > >   *   bits 56-63: vendor-specific ASEs
> > >   */
> > > @@ -94,6 +95,8 @@
> > >  /* Wave Computing: "nanoMIPS" */
> > >  #define CPU_NANOMIPS32  (CPU_MIPS32R6 | ISA_NANOMIPS32)
> > >
> > > +#define CPU_LOONGSON3A  (CPU_MIPS64R2 | INSN_LOONGSON3A)
> > > +
> > >  /*
> > >   * Strictly follow the architecture standard:
> > >   * - Disallow "special" instruction handling for PMON/SPIM.
> > > diff --git a/target/mips/translate.c b/target/mips/translate.c
> > > index 25b595a..2caf4cb 100644
> > > --- a/target/mips/translate.c
> > > +++ b/target/mips/translate.c
> > > @@ -31206,7 +31206,9 @@ void cpu_state_reset(CPUMIPSState *env)
> > >  env->CP0_Config5 = env->cpu_model->CP0_Config5;
> > >  env->CP0_Config5_rw_bitmask = env->cpu_model->CP0_Config5_rw_bitmask;
> > >  env->CP0_Config6 = env->cpu_model->CP0_Config6;
> > > +env->CP0_Config6_rw_bitmask = 

[PATCH] s390x/kvm: help valgrind in several places

2020-04-28 Thread Christian Borntraeger
We need some little help in the code to reduce the valgrind noise.
- some designated initializers for the cpu model features and subfunctions
- mark memory as defined for sida memory reads

Signed-off-by: Christian Borntraeger 
---
 target/s390x/kvm.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index 69881a0da0..bcd0ee0d14 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -52,6 +52,10 @@
 #include "hw/s390x/s390-virtio-hcall.h"
 #include "hw/s390x/pv.h"
 
+#ifdef CONFIG_VALGRIND_H
+#include <valgrind/memcheck.h>
+#endif
+
 #ifndef DEBUG_KVM
 #define DEBUG_KVM  0
 #endif
@@ -875,6 +879,13 @@ int kvm_s390_mem_op_pv(S390CPU *cpu, uint64_t offset, void *hostbuf,
 error_report("KVM_S390_MEM_OP failed: %s", strerror(-ret));
 abort();
 }
+
+#ifdef CONFIG_VALGRIND_H
+if (!is_write) {
+VALGRIND_MAKE_MEM_DEFINED(hostbuf, len);
+}
+#endif
+
 return ret;
 }
 
@@ -2165,7 +2176,7 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 
 static int query_cpu_subfunc(S390FeatBitmap features)
 {
-struct kvm_s390_vm_cpu_subfunc prop;
+struct kvm_s390_vm_cpu_subfunc prop = {};
 struct kvm_device_attr attr = {
 .group = KVM_S390_VM_CPU_MODEL,
 .attr = KVM_S390_VM_CPU_MACHINE_SUBFUNC,
@@ -2292,7 +2303,7 @@ static int kvm_to_feat[][2] = {
 
 static int query_cpu_feat(S390FeatBitmap features)
 {
-struct kvm_s390_vm_cpu_feat prop;
+struct kvm_s390_vm_cpu_feat prop = {};
 struct kvm_device_attr attr = {
 .group = KVM_S390_VM_CPU_MODEL,
 .attr = KVM_S390_VM_CPU_MACHINE_FEAT,
-- 
2.25.1
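
As a standalone illustration of the initializer half of the patch, the sketch below zero-initializes a query buffer before a callee fills only part of it, so that every byte read afterwards is defined. It assumes the valgrind noise comes from reads of bytes that the tool never saw being written; the struct and function names are hypothetical, and no ioctl or valgrind macro is involved.

```c
#include <stddef.h>

/* Hypothetical query buffer; the real structures live in the KVM headers. */
struct sketch_cpu_feat {
    unsigned char feat[16];
};

/* Stand-in for a kernel call that fills only part of the buffer. */
static void sketch_query(struct sketch_cpu_feat *prop)
{
    prop->feat[0] = 0x80;
}

/* Count set bytes; every byte read here is initialized thanks to "= {0}". */
static size_t sketch_count_nonzero(void)
{
    struct sketch_cpu_feat prop = {0};   /* the patch writes "= {}" */
    size_t n = 0;

    sketch_query(&prop);
    for (size_t i = 0; i < sizeof(prop.feat); i++) {
        n += prop.feat[i] != 0;
    }
    return n;
}
```

The sida read path in the patch is a different case: there the buffer really is filled, and the VALGRIND_MAKE_MEM_DEFINED call only tells the tool so.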




Re: [PATCH 07/17] tests/check-qom-proplist: Improve iterator coverage

2020-04-28 Thread Eric Blake

On 4/28/20 11:34 AM, Markus Armbruster wrote:

The tests' "qemu-dummy" device has only class properties.  Turn one of
them into an instance property.  test_dummy_class_iterator() expects
one fewer property than test_dummy_iterator().  Rewrite
test_dummy_prop_iterator() to take expected properties as argument.

Signed-off-by: Markus Armbruster 
---
  tests/check-qom-proplist.c | 51 +++---
  1 file changed, 26 insertions(+), 25 deletions(-)



Nice way to enhance coverage.  (I wish we could get rid of instance 
properties, but as long as we still have them, testing them is good).


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 0/3] qcow2: Allow resize of images with internal snapshots

2020-04-28 Thread Eric Blake

On 4/28/20 7:59 AM, Max Reitz wrote:

On 24.04.20 21:09, Eric Blake wrote:

In v3:
- patch 1: fix error returns [patchew, Max], R-b dropped
- patch 2,3: unchanged, so add R-b

Eric Blake (3):
   block: Add blk_new_with_bs() helper
   qcow2: Allow resize of images with internal snapshots
   qcow2: Tweak comment about bitmaps vs. resize


Thanks, I’ve squashed the diff into patch 1 and applied the series to my
block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next


This series has not only a merge conflict, but a semantic conflict, with 
the current state of Kevin's block-next branch.  I think I'll go ahead 
and post a v4 based on Kevin's branch to spare you the efforts of having 
to repeat my merge resolution.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



