Re: [PATCH v1] virtio-pci: store virtqueue size directly to a device

2019-12-23 Thread Denis Plotnikov


On 23.12.2019 17:31, Michael S. Tsirkin wrote:
> On Mon, Dec 23, 2019 at 02:37:58PM +0300, Denis Plotnikov wrote:
>> Currently, the virtqueue size is saved to the proxy on PCI write and
>> is read from the device on PCI read.
>> The virtqueue size is propagated later from the proxy to the device
>> at the virtqueue enabling stage.
>>
>> This could be a problem if a guest, while configuring the virtqueue, sets
>> the size and then re-reads it immediately, before enabling the queue,
>> in order to check whether the desired size has been set.
>>
>> This happens in seabios: (seabios snippet)
>>
>> vp_find_vq()
>> {
>>     ...
>>     /* check if the queue is available */
>>     if (vp->use_modern) {
>>         num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
>>         if (num > MAX_QUEUE_NUM) {
>>             vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
>>                      MAX_QUEUE_NUM);
>>             num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
>>         }
>>     } else {
>>         num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
>>     }
>>     if (!num) {
>>         dprintf(1, "ERROR: queue size is 0\n");
>>         goto fail;
>>     }
>>     if (num > MAX_QUEUE_NUM) {
>>         dprintf(1, "ERROR: queue size %d > %d\n", num, MAX_QUEUE_NUM);
>>         goto fail;
>>     }
>>     ...
>> }
>>
>> If the device queue num is greater than the max queue size supported by
>> seabios, seabios tries to reduce the queue size, then re-reads it, I
>> suppose to check whether the setting actually took effect, and then checks
>> the virtqueue size again to decide whether it is satisfied with the value.
>> In this case, if the device's virtqueue size is 512 and the seabios max
>> supported queue size is 256, seabios tries to set 256 but then reads 512
>> again and can't proceed with that value, preventing the guest from booting
>> successfully.
>> The root cause was investigated by Roman Kagan
>>
>> The patch fixes the problem by propagating the queue size to the device
>> right away, so the written value can be read back on the next step, if the
>> value was ok for the device.
>>
>> Suggested-by: Roman Kagan 
>> Suggested-by: Michael S. Tsirkin 
>> Signed-off-by: Denis Plotnikov 
> Thanks, I already have this queued as:
>
> commit 8aabbbd9d04f95d5581d2275362996ecb5516dd9
> Author: Michael S. Tsirkin 
> Date:   Fri Dec 13 09:22:48 2019 -0500
>
>  virtio: update queue size on guest write
>  
>  Some guests read back queue size after writing it.
>  Update the size immediately upon write otherwise
>  they get confused.
>  
>  Signed-off-by: Michael S. Tsirkin 
>
> I would appreciate checking other transports, they likely
> need the same fix.
ok, I'll send the patch shortly
>
>
>> ---
>>   hw/virtio/virtio-pci.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> index c6b47a9c73..e5c759e19e 100644
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -1256,6 +1256,8 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
>>          break;
>>      case VIRTIO_PCI_COMMON_Q_SIZE:
>>          proxy->vqs[vdev->queue_sel].num = val;
>> +        virtio_queue_set_num(vdev, vdev->queue_sel,
>> +                             proxy->vqs[vdev->queue_sel].num);
>>          break;
>>      case VIRTIO_PCI_COMMON_Q_MSIX:
>>          msix_vector_unuse(&proxy->pci_dev,
>> -- 
>> 2.17.0
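The read-back failure described in the commit message can be reproduced with a small model. This is only a sketch under stated assumptions: `struct toy_virtio` and the `toy_*` helpers are hypothetical names, not QEMU or SeaBIOS code, and the single assignment in `toy_write_queue_size()` stands in for the `virtio_queue_set_num()` call added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUE_NUM 256 /* driver-side limit, as in the example above */

/* Hypothetical model: proxy and device each hold a copy of the size. */
struct toy_virtio {
    uint16_t proxy_num;     /* value latched by the transport on write */
    uint16_t device_num;    /* value returned to the guest on read */
    int propagate_on_write; /* 0 = old behaviour, 1 = patched behaviour */
};

static void toy_write_queue_size(struct toy_virtio *v, uint16_t val)
{
    assert(v != NULL);
    v->proxy_num = val;
    if (v->propagate_on_write) {
        /* Stands in for virtio_queue_set_num() in the patch. */
        v->device_num = v->proxy_num;
    }
}

static uint16_t toy_read_queue_size(const struct toy_virtio *v)
{
    assert(v != NULL);
    return v->device_num; /* reads always come from the device copy */
}

/* Guest-side check: shrink if needed, read back, then validate.
 * Returns 0 when the queue size is usable, -1 otherwise. */
static int toy_guest_negotiate(struct toy_virtio *v)
{
    uint16_t num = toy_read_queue_size(v);

    if (num > MAX_QUEUE_NUM) {
        toy_write_queue_size(v, MAX_QUEUE_NUM);
        num = toy_read_queue_size(v);
    }
    if (num == 0 || num > MAX_QUEUE_NUM) {
        return -1;
    }
    return 0;
}
```

With `propagate_on_write` cleared and a device size of 512, `toy_guest_negotiate()` fails because the read-back still returns 512; with it set, the read-back returns 256 and the check passes.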



Re: [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration

2019-12-23 Thread Auger Eric
Hi Peter,

On 12/10/19 9:01 PM, Peter Xu wrote:
> On Fri, Nov 22, 2019 at 07:29:41PM +0100, Eric Auger wrote:
>> +static const VMStateDescription vmstate_virtio_iommu_device = {
>> +.name = "virtio-iommu-device",
>> +.minimum_version_id = 1,
>> +.version_id = 1,
>> +.post_load = iommu_post_load,
>> +.fields = (VMStateField[]) {
>> +VMSTATE_GTREE_DIRECT_KEY_V(domains, VirtIOIOMMU, 1,
>> +   &vmstate_domain, viommu_domain),
>> +VMSTATE_GTREE_DIRECT_KEY_V(endpoints, VirtIOIOMMU, 1,
>> +   &vmstate_endpoint, viommu_endpoint),
> 
> IIUC vmstate_domain already contains all the endpoint information (in
> endpoint_list of vmstate_domain), but here we migrate it twice. 

I migrated both because at that time I considered we could have
endpoints not attached to any domain, but I think I can now simplify
based on the fact that every EP is attached to a domain.


> I suppose that's why now we need reconstruct_ep_domain_link() to fixup
> the duplicated migration?

Even if I only migrate the domain gtree, I need to reconstruct the
ep->domain which was not migrated, on purpose, as it pointed to the old
domain in the origin.
> 
> Then I'll instead ask whether we can skip migrating here?  Then in
> post_load we simply:
> 
>   foreach(domain)
> foreach(endpoint in domain)
>   g_tree_insert(s->endpoints);
> 
> It might help to avoid the reconstruct_ep_domain_link ugliness?
I agree that it is simpler. I also need to update the ep->domain as
mentioned above. Thank you for the suggestion.


> 
> And besides, I also agree with Jean that the endpoint data structure
> could be reused with IOMMUDevice somehow.

As I replied to Jean, I think it makes sense to keep both structures as
endpoints are not indexed by the same key and the bus number is resolved
later.

Thanks

Eric
> 
> Thanks,
> 
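The post_load approach suggested above (migrate only the domains, then rebuild the endpoint index and the ep->domain back-pointers) can be sketched as follows. This is a simplified illustration, not the virtio-iommu code: plain arrays stand in for the GTree containers, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_EPS 8

struct toy_domain;

struct toy_endpoint {
    unsigned id;
    struct toy_domain *domain; /* back-pointer, deliberately not migrated */
};

struct toy_domain {
    unsigned id;
    struct toy_endpoint *eps[TOY_MAX_EPS]; /* migrated endpoint list */
    size_t n_eps;
};

struct toy_iommu {
    struct toy_domain *domains;                  /* migrated */
    size_t n_domains;
    struct toy_endpoint *endpoints[TOY_MAX_EPS]; /* rebuilt after load */
    size_t n_endpoints;
};

/* Rebuild the endpoint index and fix up ep->domain from the domains.
 * Returns 0 on success, -1 if the index would overflow. */
static int toy_post_load(struct toy_iommu *s)
{
    assert(s != NULL);
    s->n_endpoints = 0;
    for (size_t d = 0; d < s->n_domains; d++) {
        struct toy_domain *dom = &s->domains[d];

        for (size_t e = 0; e < dom->n_eps; e++) {
            if (s->n_endpoints == TOY_MAX_EPS) {
                return -1;
            }
            dom->eps[e]->domain = dom;                   /* restore link */
            s->endpoints[s->n_endpoints++] = dom->eps[e]; /* re-index */
        }
    }
    return 0;
}
```

This relies on the assumption stated in the reply, namely that every endpoint is attached to a domain; an unattached endpoint would not be recovered by this walk.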




Re: [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci

2019-12-23 Thread Auger Eric
Hi Jean,

On 12/10/19 5:50 PM, Jean-Philippe Brucker wrote:
> On Fri, Nov 22, 2019 at 07:29:42PM +0100, Eric Auger wrote:
>> The virtio-iommu-pci is instantiated through the -device QEMU
>> option. However if instantiated it also requires an IORT ACPI table
>> to describe the ID mappings between the root complex and the iommu.
>>
>> This patch adds the generation of the IORT table if the
>> virtio-iommu-pci device is instantiated.
>>
>> We also declare the [0xfee0 - 0xfeef] MSI reserved region
>> so that it gets bypassed by the IOMMU.
>>
>> Signed-off-by: Eric Auger 
> 
> It would be nice to factor the IORT code with arm, but this looks OK.
I factored out the IORT table generation code. Not sure this will be used
eventually, but well.

Thanks

Eric
> 
> Reviewed-by: Jean-Philippe Brucker 
> 




Re: NetBSD/arc on MIPS Magnum, was Re: [PATCH 00/10] Fixes for DP8393X SONIC device emulation

2019-12-23 Thread Hervé Poussineau

On 24/12/2019 at 05:33, Finn Thain wrote:

On Tue, 24 Dec 2019, Finn Thain wrote:



I know precious little about NetBSD installation and MIPS Magnum. What I
wrote above was guesswork. Hence this could be a NetBSD bug or user
error.



It was bugs and user error.

The user error was not using the serial console. The NetBSD/arc
installation guide says that only serial console is supported for MIPS
Magnum.

The bugs include regressions in NetBSD. (See below.)

The other issue is that the ARC firmware didn't work properly until I
defined one or more 'boot selections', even though none of these will ever
be selected.


Does there exist a known-good combination of NetBSD/arc and
qemu-system-mips64el releases?



The commit log says that Hervé Poussineau used NetBSD 5.1 with dp8393x in
the past, so I tried that.

Here are the steps I used:

./mips64el-softmmu/qemu-system-mips64el -M magnum -L . \
-drive if=scsi,unit=2,media=cdrom,format=raw,file=arccd-5.1.iso \
-global ds1225y.filename=nvram -global ds1225y.size=8200 \
-serial stdio -serial null \
-nic bridge,model=dp83932,mac=00:00:00:02:03:04

-> Run setup -> Initialize system -> Set default configurations
800x688
3.5 1.44 M
No
7

-> Set default environment
CD-ROM
2

-> Set environment variables
CONSOLEIN
multi()serial(0)term()
CONSOLEOUT
multi()serial(0)term()

-> Exit

Now restart QEMU. The ARC menu should appear on the tty.

-> Run a program

scsi(0)cdrom(2)fdisk(0)boot scsi(0)cdrom(2)fdisk(0)netbsd

That doesn't work. Add a boot selection.

-> Run setup -> Manage startup -> Add a boot selection -> Scsi CD-ROM 0
\os\nt\osloader.exe
Yes
\winnt
Windows NT
No

Somehow, that seems to help. Now restart QEMU.

-> Run a program

 scsi(0)cdrom(2)fdisk(0)boot scsi(0)cdrom(2)fdisk(0)netbsd

NetBSD/arc Bootstrap, Revision 1.1
(bui...@b7.netbsd.org, Sat Nov  6 14:06:36 UTC 2010)
devopen: scsi(0)cdrom(2)fdisk(0) type disk file netbsd
5502064+289092=0x5860e0
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
 2006, 2007, 2008, 2009, 2010
 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
 The Regents of the University of California.  All rights reserved.

NetBSD 5.1 (RAMDISK) #0: Sat Nov  6 14:17:36 UTC 2010
 
bui...@b7.netbsd.org:/home/builds/ab/netbsd-5-1-RELEASE/arc/201011061943Z-obj/home/builds/ab/netbsd-5-1-RELEASE/src/sys/arch/arc/compile/RAMDISK
MIPS Magnum
total memory = 128 MB
avail memory = 117 MB
mainbus0 (root)
cpu0 at mainbus0: MIPS R4000 CPU (0x400) Rev. 0.0 with MIPS R4010 FPC Rev. 0.0
cpu0: 8KB/16B direct-mapped L1 Instruction cache, 48 TLB entries
cpu0: 8KB/16B direct-mapped write-back L1 Data cache
jazzio0 at mainbus0
timer0 at jazzio0 addr 0xe228
mcclock0 at jazzio0 addr 0xe0004000: mc146818 compatible time-of-day clock
LPT1 at jazzio0 addr 0xe0008000 intr 0 not configured
fdc0 at jazzio0 addr 0xe0003000 intr 1
fd0 at fdc0 drive 1: 1.44MB, 80 cyl, 2 head, 18 sec
MAGNUM at jazzio0 addr 0xe000c000 intr 2 not configured
VXL at jazzio0 addr 0xe080 intr 3 not configured
sn0 at jazzio0 addr 0xe0001000 intr 4: SONIC Ethernet
sonic: write 0x0015 to reg CR
sonic: write 0x0080 to reg CR
sonic: write 0x to reg IMR
sonic: write 0x7fff to reg ISR
sonic: write 0x to reg CR
sn0: Ethernet address 00:00:00:00:00:00
asc0 at jazzio0 addr 0xe0002000 intr 5: NCR53C94, 25MHz, SCSI ID 7
scsibus0 at asc0: 8 targets, 8 luns per target
pckbc0 at jazzio0 addr 0xe0005000 intr 6
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0 (mux ignored)
pms at jazzio0 addr 0xe0005000 intr 7 not configured
com0 at jazzio0 addr 0xe0006000 intr 8: ns16550a, working fifo
com0: txfifo disabled
com0: console
com1 at jazzio0 addr 0xe0007000 intr 9: ns16550a, working fifo
com1: txfifo disabled
jazzisabr0 at mainbus0
isa0 at jazzisabr0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
scsibus0: waiting 2 seconds for devices to settle...
cd0 at scsibus0 target 2 lun 0:  cdrom removable
cd1 at scsibus0 target 4 lun 0:  cdrom removable
boot device: 
root on md0a dumps on md0b
root file system type: ffs
WARNING: preposterous TOD clock time
WARNING: using filesystem time
WARNING: CHECK AND RESET THE DATE!
erase ^H, werase ^W, kill ^U, intr ^C, status ^T
Terminal type? [vt100]
Erase is backspace.
(I)nstall, (S)hell or (H)alt ? s
# ifconfig sn0 10.2.3.4/24
# ping
usage:
ping [-adDfLnoPqQrRv] [-c count] [-g gateway] [-h host] [-i interval] [-I addr]
  [-l preload] [-p pattern] [-s size] [-t tos] [-T ttl] [-w maxwait] host


My initial testing shows that NetBSD 5.1 doesn't like my v2 patch series.
I'll debug that before I send v3.

BTW, there seem to be regressions in NetBSD 8.1 compared to NetBSD 5.1.

The 'boot' program on the 8.1 ISO just hangs.

If I use the 'boot' program from the 5.1 ISO to load the 'netbsd'
binary from the 8.1 ISO, I get a crash:

-> Run a program


Re: [PATCH] target/ppc: fix memory dump endianness in QEMU monitor

2019-12-23 Thread David Gibson
On Mon, Dec 23, 2019 at 08:27:49PM -0300, Fabiano Rosas wrote:
> David Gibson  writes:
> 
> > b) AFAICT this is the *only* thing that looks for the LE bit in
> > hflags. Given that, and the fact that it would be wrong in most cases,
> > we should remove it from hflags entirely along with this change.
> >
> 
> I see there is:
> 
> static void ppc_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
> {
> ...
> ctx->le_mode = !!(env->hflags & (1 << MSR_LE));
> ...
> }

Ah... good point, I missed that one, sorry.  That makes all the
difference.

My guess is that this bit exists to be a universal flag for endianness
mode, generalizing across the MSR bit on modern cpus, and the old 601
which had it in the HID register.  I'm a bit dubious as to whether our
601 emulation is good enough to warrant bothering with this, but it's
probably best not to mess with it.


> And we call hreg_recompute_hflags in some places:

ITYM hreg_compute_hflags().

> - powerpc_excp (target/ppc/excp_helper.c)
>   Called from TCG do_interrupt
> 
> - ppc_cpu_reset (target/ppc/translate_init.inc.c)
>   Called from spapr_machine_reset
> 
> - hreg_store_msr (target/ppc/helper_regs.h)
>   This is used for migration and for do_rfi, store_msr

Huh... given this, I'm not sure how hflags was getting out of sync
with the MSR in the first place, which brings the initial patch into
question.

> - h_cede (hw/ppc/spapr_hcall.c)
>   QEMU-side H_CEDE hypercall implementation 
> 
> 
> It looks like the hflags MSR_LE is being updated correctly with TCG. But
> with KVM we only touch it on system_reset

Ah.. right.  I think to fix that we'd want an hreg_compute_hflags() at
the end of sucking the state out of KVM.

> (and possibly h_cede? I don't
> know if it is QEMU who handles it).

It's KVM.  If we used the qemu one it would add an awful lot of
latency to cedes.
> 
> So I would let hflags be.
> 
> 
> ... Actually, I don't really know the purpose of hflags. It comes from:
> 
>   commit 3f3373166227b13e762e20d2fb51eadfa6a2d653
>   Author: Fabrice Bellard 
>   Date:   Wed Aug 20 23:02:09 2003 +
>   
>   pop ss, mov ss, x and sti disable irqs for the next instruction -
>   began dispatch optimization by adding new x86 cpu 'hidden' flags
>   
>   
>   git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@372 
> c046a42c-6fe2-441c-8c8c-71466251a162
> 
> Could any one clarify that?

Not really.  It's really, really old, in the cruft bits of TCG I don't
much understand.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
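
The point about KVM can be illustrated with a toy model of a cached flag word. This is a sketch only: `struct toy_cpu` and the `toy_*` functions are hypothetical, and `toy_compute_hflags()` merely imitates the role that hreg_compute_hflags() plays for the LE bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MSR_LE 0 /* bit position of the little-endian bit in this model */

struct toy_cpu {
    uint64_t msr;    /* architectural register */
    uint64_t hflags; /* cached copy of selected MSR bits */
};

/* Refresh the cached LE bit from the MSR. */
static void toy_compute_hflags(struct toy_cpu *c)
{
    const uint64_t mask = UINT64_C(1) << TOY_MSR_LE;

    assert(c != NULL);
    c->hflags = (c->hflags & ~mask) | (c->msr & mask);
}

/* What a consumer such as the disassembler would see. */
static int toy_le_mode(const struct toy_cpu *c)
{
    assert(c != NULL);
    return !!(c->hflags & (UINT64_C(1) << TOY_MSR_LE));
}

/* Pull register state from the accelerator; 'recompute' models adding a
 * recompute call at the end of the sync. */
static void toy_sync_from_kvm(struct toy_cpu *c, uint64_t kvm_msr,
                              int recompute)
{
    assert(c != NULL);
    c->msr = kvm_msr;
    if (recompute) {
        toy_compute_hflags(c);
    }
}
```

Without the recompute step the cached bit keeps its reset value even though the MSR now says little-endian, which is the kind of mismatch discussed above.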




Re: [PATCH] target/ppc: fix memory dump endianness in QEMU monitor

2019-12-23 Thread David Gibson
On Mon, Dec 23, 2019 at 06:35:30PM -0300, Maxiwell S. Garcia wrote:
> On Mon, Dec 23, 2019 at 05:30:43PM +1100, David Gibson wrote:
> > On Thu, Dec 19, 2019 at 01:38:54PM -0300, Maxiwell S. Garcia wrote:
> > > The env->hflags is computed in ppc_cpu_reset(), using the MSR register
> > > as input. But at the point ppc_disas_set_info() is called the MSR_LE bit
> > > in env->hflags doesn't contain the same information that env->msr.
> > > 
> > > Signed-off-by: Maxiwell S. Garcia 
> > > Signed-off-by: Fabiano Rosas 
> > 
> > I think the change is ok as far as it goes but,
> > 
> > a) the commit message should expand on what the practical effect of
> > this is.  Looking, I think the only thing this affects is DEBUG_DISAS
> > output (i.e. very rarely) which is worth noting.
> 
> Ok, I will do that. I got this bug using the 'x/i' command on QEMU
> monitor with a LE guest.

Ok.

> > b) AFAICT this is the *only* thing that looks for the LE bit in
> > hflags. Given that, and the fact that it would be wrong in most cases,
> > we should remove it from hflags entirely along with this change.
> > 
> 
> I was changing the code to remove this LE bit from hflags and I found the
> function 'helper_store_hid0_601()' in misc_helper.c, which manipulates the
> 'hflags'. The commit 056401eae6 says:
> 
> "Implement PowerPC 601 HID0 register, needed for little-endian mode support.
> As a consequence, we need to merge hflags coming from MSR with other ones.
> Use little-endian mode from hflags instead of MSR during code translation."
> 
> So, is the 'hflags' necessary here? Can we use MSR instead of hflags to
> change the endianness in this function?

That function alters the LE bit in hflags, but doesn't read it.
*Nothing* reads it, so none of the places that alter it matter.

I strongly suspect we won't properly honour the LE bit in the 601 HID
register, but that's already the case.  I also suspect it's far from
the only way in which 601 emulation is broken.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




[PATCH v2 2/6] linux-user: mips: Update syscall numbers to kernel 5.5 rc3 level

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

Update mips syscall numbers based on Linux kernel tag v5.5-rc3
(commit 46cf053e).

Signed-off-by: Aleksandar Markovic 
---
 linux-user/mips/cpu_loop.c | 69 ++
 linux-user/mips/syscall_nr.h   | 45 +++
 linux-user/mips64/syscall_nr.h | 13 
 3 files changed, 127 insertions(+)

diff --git a/linux-user/mips/cpu_loop.c b/linux-user/mips/cpu_loop.c
index 39915b3..a9db725 100644
--- a/linux-user/mips/cpu_loop.c
+++ b/linux-user/mips/cpu_loop.c
@@ -390,6 +390,75 @@ static const uint8_t mips_syscall_args[] = {
 MIPS_SYS(sys_copy_file_range, 6) /* 360 */
 MIPS_SYS(sys_preadv2, 6)
 MIPS_SYS(sys_pwritev2, 6)
+MIPS_SYS(sys_pkey_mprotect, 4)
+MIPS_SYS(sys_pkey_alloc, 2)
+MIPS_SYS(sys_pkey_free, 1) /* 365 */
+MIPS_SYS(sys_statx, 5)
+MIPS_SYS(sys_rseq, 4)
+MIPS_SYS(sys_io_pgetevents, 6)
+0,
+0, /* 370 */
+0,
+0,
+0,
+0,
+0, /* 375 */
+0,
+0,
+0,
+0,
+0, /* 380 */
+0,
+0,
+0,
+0,
+0, /* 385 */
+0,
+0,
+0,
+0,
+0, /* 390 */
+0,
+0,
+MIPS_SYS(sys_semget, 3)
+MIPS_SYS(sys_semctl, 4)
+MIPS_SYS(sys_shmget, 3)/* 395 */
+MIPS_SYS(sys_shmctl, 3)
+MIPS_SYS(sys_shmat, 3)
+MIPS_SYS(sys_shmdt, 1)
+MIPS_SYS(sys_msgget, 2)
+MIPS_SYS(sys_msgsnd, 4)/* 400 */
+MIPS_SYS(sys_msgrcv, 5)
+MIPS_SYS(sys_msgctl, 3)
+MIPS_SYS(sys_timer_gettime64, 2)
+MIPS_SYS(sys_timer_settime64, 4)
+MIPS_SYS(sys_timerfd_gettime64, 2) /* 410 */
+MIPS_SYS(sys_timerfd_settime64, 4)
+MIPS_SYS(sys_utimensat_time64, 4)
+MIPS_SYS(sys_pselect6_time64, 6)
+MIPS_SYS(sys_ppoll_time64, 5)
+0, /* 415 */
+MIPS_SYS(sys_io_pgetevents_time64, 6)
+MIPS_SYS(sys_recvmmsg_time64, 5)
+MIPS_SYS(sys_mq_timedsend_time64, 5)
+MIPS_SYS(sys_mq_timedreceive_time64, 5)
+MIPS_SYS(sys_semtimedop_time64, 4) /* 420 */
+MIPS_SYS(sys_rt_sigtimedwait_time64, 4)
+MIPS_SYS(sys_futex_time64, 6)
+MIPS_SYS(sys_sched_rr_get_interval_time64, 2)
+MIPS_SYS(sys_pidfd_send_signal, 4)
+MIPS_SYS(sys_io_uring_setup, 2)/* 425 */
+MIPS_SYS(sys_io_uring_enter, 6)
+MIPS_SYS(sys_io_uring_register, 4)
+MIPS_SYS(sys_open_tree, 3)
+MIPS_SYS(sys_move_mount, 5)
+MIPS_SYS(sys_fsopen, 2)/* 430 */
+MIPS_SYS(sys_fsconfig, 5)
+MIPS_SYS(sys_fsmount, 3)
+MIPS_SYS(sys_fspick, 3)
+MIPS_SYS(sys_pidfd_open, 2)
+MIPS_SYS(sys_clone3, 2)/* 435 */
+
 };
 #  undef MIPS_SYS
 # endif /* O32 */
diff --git a/linux-user/mips/syscall_nr.h b/linux-user/mips/syscall_nr.h
index 7fa7fa5..0be3af1 100644
--- a/linux-user/mips/syscall_nr.h
+++ b/linux-user/mips/syscall_nr.h
@@ -376,5 +376,50 @@
 #define TARGET_NR_statx (TARGET_NR_Linux + 366)
 #define TARGET_NR_rseq  (TARGET_NR_Linux + 367)
 #define TARGET_NR_io_pgetevents (TARGET_NR_Linux + 368)
+/* room for arch specific calls */
+#define TARGET_NR_semget                    (TARGET_NR_Linux + 393)
+#define TARGET_NR_semctl                    (TARGET_NR_Linux + 394)
+#define TARGET_NR_shmget                    (TARGET_NR_Linux + 395)
+#define TARGET_NR_shmctl                    (TARGET_NR_Linux + 396)
+#define TARGET_NR_shmat                     (TARGET_NR_Linux + 397)
+#define TARGET_NR_shmdt                     (TARGET_NR_Linux + 398)
+#define TARGET_NR_msgget                    (TARGET_NR_Linux + 399)
+#define TARGET_NR_msgsnd                    (TARGET_NR_Linux + 400)
+#define TARGET_NR_msgrcv                    (TARGET_NR_Linux + 401)
+#define TARGET_NR_msgctl                    (TARGET_NR_Linux + 402)
+/* 403-423 common for 32-bit archs */
+#define TARGET_NR_clock_gettime64           (TARGET_NR_Linux + 403)
+#define TARGET_NR_clock_settime64           (TARGET_NR_Linux + 404)
+#define TARGET_NR_clock_adjtime64           (TARGET_NR_Linux + 405)
+#define TARGET_NR_clock_getres_time64       (TARGET_NR_Linux + 406)
+#define TARGET_NR_clock_nanosleep_time64    (TARGET_NR_Linux + 407)
+#define TARGET_NR_timer_gettime64           (TARGET_NR_Linux + 408)
+#define TARGET_NR_timer_settime64           (TARGET_NR_Linux + 409)
+#define TARGET_NR_timerfd_gettime64         (TARGET_NR_Linux + 410)
+#define TARGET_NR_timerfd_settime64         (TARGET_NR_Linux + 411)
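
For readers unfamiliar with the MIPS_SYS table touched by this patch: it is a positional table of argument counts, so an entry's index is its syscall number minus the base. The following is a minimal sketch of that idea, assuming an o32-style base of 4000; the `toy_*` names and the five entries are made up for illustration and are not the real table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_BASE 4000u /* assumed base, in the style of o32 numbering */

/* Each TOY_SYS() line contributes exactly one array element, so the
 * position of a line in the list is its syscall number minus the base. */
#define TOY_SYS(name, args) args,
static const uint8_t toy_syscall_args[] = {
    TOY_SYS(toy_first, 8)  /* base + 0 */
    TOY_SYS(toy_second, 1) /* base + 1 */
    TOY_SYS(toy_third, 0)  /* base + 2 */
    0,                     /* base + 3: unassigned, padded to keep order */
    TOY_SYS(toy_fifth, 3)  /* base + 4 */
};
#undef TOY_SYS

/* Number of arguments for a syscall, or -1 if it is outside the table. */
static int toy_nargs(unsigned num)
{
    const size_t count =
        sizeof(toy_syscall_args) / sizeof(toy_syscall_args[0]);

    assert(count > 0);
    if (num < TOY_NR_BASE || num - TOY_NR_BASE >= count) {
        return -1;
    }
    return toy_syscall_args[num - TOY_NR_BASE];
}
```

Because the lookup is purely positional, every unassigned number needs an explicit `0,` placeholder; dropping one shifts all later entries by one slot.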

[PATCH v2 5/6] linux-user: Add support for FS_IOC32_VERSION ioctls

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

These FS_IOC32_VERSION ioctls are identical to the
FS_IOC_VERSION ioctls, but without the anomaly of their
number being defined as if their third argument were of type long, while
the kernel internally treats it as type int.

Signed-off-by: Aleksandar Markovic 
---
 linux-user/ioctls.h   | 2 ++
 linux-user/syscall_defs.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 4fd6939..3affd88 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -142,6 +142,8 @@
  IOCTL(FS_IOC_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC32_GETFLAGS, IOC_R, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC32_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC32_GETVERSION, IOC_R, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC32_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
 
 #ifdef CONFIG_USBFS
   /* USB ioctls */
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 964b2b4..a73cc3d 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -922,6 +922,8 @@ struct target_pollfd {
 #define TARGET_FS_IOC_FIEMAP TARGET_IOWR('f',11,struct fiemap)
 #define TARGET_FS_IOC32_GETFLAGS TARGET_IOR('f', 1, int)
 #define TARGET_FS_IOC32_SETFLAGS TARGET_IOW('f', 2, int)
+#define TARGET_FS_IOC32_GETVERSION TARGET_IOR('v', 1, int)
+#define TARGET_FS_IOC32_SETVERSION TARGET_IOW('v', 2, int)
 
 /* usb ioctls */
 #define TARGET_USBDEVFS_CONTROL TARGET_IOWRU('U', 0)
-- 
2.7.4
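
The "long" versus "int" anomaly is visible in the ioctl number itself, because the argument size is part of the encoding. A sketch, assuming the generic Linux layout (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); some targets, MIPS among them, split the upper bits differently. The `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IOC_WRITE 1u
#define TOY_IOC_READ  2u

/* Encode an ioctl number using the generic layout assumed above. */
static uint32_t toy_ioc(unsigned dir, unsigned type, unsigned nr,
                        unsigned size)
{
    assert(dir < 4 && type < 256 && nr < 256 && size < (1u << 14));
    return ((uint32_t)dir << 30) | ((uint32_t)size << 16) |
           ((uint32_t)type << 8) | (uint32_t)nr;
}

/* "Get version" request number for a given argument size in bytes. */
static uint32_t toy_getversion_nr(unsigned arg_size)
{
    return toy_ioc(TOY_IOC_READ, 'v', 1, arg_size);
}
```

With an 8-byte long this gives 0x80087601 and with a 4-byte int 0x80047601, so on a 64-bit ABI the two variants are distinct request numbers even though the data exchanged is an int in both cases.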




[PATCH v2 1/6] linux-user: Fix some constants in termbits.h

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

Some constants were defined in terms of host, instead of target,
as they should be.

Some additional trivial changes in this patch were forced by
checkpatch.pl.

Reviewed-by: Max Filippov 
Signed-off-by: Aleksandar Markovic 
---
 linux-user/aarch64/termbits.h|   4 +-
 linux-user/alpha/termbits.h  |  10 +--
 linux-user/arm/termbits.h|   4 +-
 linux-user/cris/termbits.h   |   4 +-
 linux-user/hppa/termbits.h   |   4 +-
 linux-user/i386/termbits.h   |   4 +-
 linux-user/m68k/termbits.h   |   4 +-
 linux-user/microblaze/termbits.h |   4 +-
 linux-user/mips/termbits.h   |   4 +-
 linux-user/nios2/termbits.h  |   4 +-
 linux-user/openrisc/termbits.h   |   6 +-
 linux-user/ppc/termbits.h|   4 +-
 linux-user/riscv/termbits.h  |   4 +-
 linux-user/s390x/termbits.h  |  26 ---
 linux-user/sh4/termbits.h|   4 +-
 linux-user/sparc/termbits.h  |   4 +-
 linux-user/sparc64/termbits.h|   4 +-
 linux-user/x86_64/termbits.h |   6 +-
 linux-user/xtensa/termbits.h | 156 ++-
 19 files changed, 141 insertions(+), 119 deletions(-)

diff --git a/linux-user/aarch64/termbits.h b/linux-user/aarch64/termbits.h
index 0ab448d..998fc1d 100644
--- a/linux-user/aarch64/termbits.h
+++ b/linux-user/aarch64/termbits.h
@@ -83,8 +83,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA    TARGET_B19200
+#define TARGET_EXTB    TARGET_B38400
 #define TARGET_CSIZE   060
 #define   TARGET_CS5   000
 #define   TARGET_CS6   020
diff --git a/linux-user/alpha/termbits.h b/linux-user/alpha/termbits.h
index a714251..ace19be 100644
--- a/linux-user/alpha/termbits.h
+++ b/linux-user/alpha/termbits.h
@@ -108,8 +108,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA TARGET_B19200
+#define TARGET_EXTB TARGET_B38400
 #define TARGET_CBAUDEX 000
 #define  TARGET_B57600   00020
 #define  TARGET_B115200  00021
@@ -165,7 +165,7 @@ struct target_termios {
 #define TARGET_FIOASYNC TARGET_IOW('f', 125, int)
 #define TARGET_FIONBIO TARGET_IOW('f', 126, int)
 #define TARGET_FIONREAD TARGET_IOR('f', 127, int)
-#define TARGET_TIOCINQ FIONREAD
+#define TARGET_TIOCINQ  TARGET_FIONREAD
 #define TARGET_FIOQSIZE TARGET_IOR('f', 128, loff_t)
 
 #define TARGET_TIOCGETP TARGET_IOR('t', 8, struct target_sgttyb)
@@ -217,8 +217,8 @@ struct target_termios {
 # define TARGET_TIOCM_CAR  0x040
 # define TARGET_TIOCM_RNG  0x080
 # define TARGET_TIOCM_DSR  0x100
-# define TARGET_TIOCM_CD   TIOCM_CAR
-# define TARGET_TIOCM_RI   TIOCM_RNG
+# define TARGET_TIOCM_CD    TARGET_TIOCM_CAR
+# define TARGET_TIOCM_RI    TARGET_TIOCM_RNG
 # define TARGET_TIOCM_OUT1 0x2000
 # define TARGET_TIOCM_OUT2 0x4000
 # define TARGET_TIOCM_LOOP 0x8000
diff --git a/linux-user/arm/termbits.h b/linux-user/arm/termbits.h
index e555cff..7170b8a 100644
--- a/linux-user/arm/termbits.h
+++ b/linux-user/arm/termbits.h
@@ -83,8 +83,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA    TARGET_B19200
+#define TARGET_EXTB    TARGET_B38400
 #define TARGET_CSIZE   060
 #define   TARGET_CS5   000
 #define   TARGET_CS6   020
diff --git a/linux-user/cris/termbits.h b/linux-user/cris/termbits.h
index 475ee70..76d5ed0 100644
--- a/linux-user/cris/termbits.h
+++ b/linux-user/cris/termbits.h
@@ -81,8 +81,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA    TARGET_B19200
+#define TARGET_EXTB    TARGET_B38400
 #define TARGET_CSIZE   060
 #define   TARGET_CS5   000
 #define   TARGET_CS6   020
diff --git a/linux-user/hppa/termbits.h b/linux-user/hppa/termbits.h
index 8fba839..3094710 100644
--- a/linux-user/hppa/termbits.h
+++ b/linux-user/hppa/termbits.h
@@ -82,8 +82,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA    TARGET_B19200
+#define TARGET_EXTB    TARGET_B38400
 #define TARGET_CSIZE   060
 #define   TARGET_CS5   000
 #define   TARGET_CS6   020
diff --git a/linux-user/i386/termbits.h b/linux-user/i386/termbits.h
index 88264bb..3b16977 100644
--- a/linux-user/i386/termbits.h
+++ b/linux-user/i386/termbits.h
@@ -82,8 +82,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  
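
Why a host constant is wrong in a target header can be shown with the TIOCINQ case from the alpha hunk. This is a sketch under assumptions: the alpha-style bit layout (8 bits nr, 8 bits type, 13 bits size, 3 bits direction with read = 2) and the x86 host value 0x541B are hard-coded here rather than taken from any header, and the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ALPHA_IOC_READ    2u
#define TOY_HOST_X86_FIONREAD 0x541Bu /* assumed x86 host value */

/* Encode a read ioctl using the alpha-style layout assumed above. */
static uint32_t toy_alpha_ior(unsigned type, unsigned nr, unsigned size)
{
    assert(type < 256 && nr < 256 && size < (1u << 13));
    return (TOY_ALPHA_IOC_READ << 29) | ((uint32_t)size << 16) |
           ((uint32_t)type << 8) | (uint32_t)nr;
}

/* Does a guest request match the emulator's definition of TIOCINQ? */
static int toy_is_tiocinq(uint32_t guest_cmd, uint32_t tiocinq_def)
{
    return guest_cmd == tiocinq_def;
}
```

A guest built for the target issues the target's number (0x4004667f in this model), so a definition that expands to the host's FIONREAD never matches it, while one that expands to the target's FIONREAD does.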

[PATCH v2 0/6] linux-user: Misc patches for 5.0

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

This series is a collection of patches I recently accumulated.

v1->v2:

  - fixed a constant in xtensa's termbits.h that was missed in v1
  - redid syscall numbers for mips o32
  - minor formatting and wording changes

Aleksandar Markovic (6):
  linux-user: Fix some constants in termbits.h
  linux-user: mips: Update syscall numbers to kernel 5.5 rc3 level
  linux-user: Add support for FS_IOC_VERSION ioctls
  linux-user: Add support for FS_IOC32_FLAGS ioctls
  linux-user: Add support for FS_IOC32_VERSION ioctls
  linux-user: Add support for FS_IOC_FSXATTR ioctls

 linux-user/aarch64/termbits.h|   4 +-
 linux-user/alpha/termbits.h  |  10 +--
 linux-user/arm/termbits.h|   4 +-
 linux-user/cris/termbits.h   |   4 +-
 linux-user/hppa/termbits.h   |   4 +-
 linux-user/i386/termbits.h   |   4 +-
 linux-user/ioctls.h  |  13 
 linux-user/m68k/termbits.h   |   4 +-
 linux-user/microblaze/termbits.h |   4 +-
 linux-user/mips/cpu_loop.c   |  69 +
 linux-user/mips/syscall_nr.h |  45 +++
 linux-user/mips/termbits.h   |   4 +-
 linux-user/mips64/syscall_nr.h   |  13 
 linux-user/nios2/termbits.h  |   4 +-
 linux-user/openrisc/termbits.h   |   6 +-
 linux-user/ppc/termbits.h|   4 +-
 linux-user/riscv/termbits.h  |   4 +-
 linux-user/s390x/termbits.h  |  26 ---
 linux-user/sh4/termbits.h|   4 +-
 linux-user/sparc/termbits.h  |   4 +-
 linux-user/sparc64/termbits.h|   4 +-
 linux-user/syscall_defs.h|  14 +++-
 linux-user/x86_64/termbits.h |   6 +-
 linux-user/xtensa/termbits.h | 156 ++-
 24 files changed, 292 insertions(+), 122 deletions(-)

-- 
2.7.4




[PATCH v2 6/6] linux-user: Add support for FS_IOC_FSXATTR ioctls

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

These ioctls were relatively recently introduced, so the "#ifdef"
guards are used in this implementation.

Signed-off-by: Aleksandar Markovic 
---
 linux-user/ioctls.h   | 7 +++
 linux-user/syscall_defs.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 3affd88..e1b89a7 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -144,6 +144,13 @@
  IOCTL(FS_IOC32_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC32_GETVERSION, IOC_R, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC32_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
+#ifdef FS_IOC_FSGETXATTR
+ IOCTL(FS_IOC_FSGETXATTR, IOC_R, MK_PTR(MK_STRUCT(STRUCT_fsxattr)))
+#endif
+#ifdef FS_IOC_FSSETXATTR
+ IOCTL(FS_IOC_FSSETXATTR, IOC_W, MK_PTR(MK_STRUCT(STRUCT_fsxattr)))
+#endif
+
 
 #ifdef CONFIG_USBFS
   /* USB ioctls */
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index a73cc3d..12cd3de 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -924,6 +924,8 @@ struct target_pollfd {
 #define TARGET_FS_IOC32_SETFLAGS TARGET_IOW('f', 2, int)
 #define TARGET_FS_IOC32_GETVERSION TARGET_IOR('v', 1, int)
 #define TARGET_FS_IOC32_SETVERSION TARGET_IOW('v', 2, int)
+#define TARGET_FS_IOC_FSGETXATTR TARGET_IOR('X', 31, struct fsxattr)
+#define TARGET_FS_IOC_FSSETXATTR TARGET_IOW('X', 32, struct fsxattr)
 
 /* usb ioctls */
 #define TARGET_USBDEVFS_CONTROL TARGET_IOWRU('U', 0)
-- 
2.7.4
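
Since the size of the argument structure is encoded in the request number, the struct named in the TARGET_IOR/TARGET_IOW definition matters. A sketch, assuming the generic Linux ioctl layout and two structs that mirror the field layout of the kernel's struct fsxattr and struct file_clone_range; the `toy_*` names are hypothetical stand-ins, not the kernel or QEMU definitions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IOC_WRITE 1u
#define TOY_IOC_READ  2u

/* Mirrors the field layout of the kernel's struct fsxattr. */
struct toy_fsxattr {
    uint32_t fsx_xflags;
    uint32_t fsx_extsize;
    uint32_t fsx_nextents;
    uint32_t fsx_projid;
    uint32_t fsx_cowextsize;
    unsigned char fsx_pad[8];
};

/* Mirrors the field layout of the kernel's struct file_clone_range. */
struct toy_file_clone_range {
    int64_t src_fd;
    uint64_t src_offset;
    uint64_t src_length;
    uint64_t dest_offset;
};

/* Generic layout: 8 bits nr, 8 bits type, 14 bits size, 2 bits dir. */
static uint32_t toy_ioc(unsigned dir, unsigned type, unsigned nr,
                        unsigned size)
{
    assert(dir < 4 && type < 256 && nr < 256 && size < (1u << 14));
    return ((uint32_t)dir << 30) | ((uint32_t)size << 16) |
           ((uint32_t)type << 8) | (uint32_t)nr;
}
```

In this model the get request built from the 28-byte structure is 0x801c581f and the set request is 0x401c5820; substituting the 32-byte clone-range structure, or the read direction for the set request, yields a different number that a guest would never issue.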




[PATCH v2 4/6] linux-user: Add support for FS_IOC32_FLAGS ioctls

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

These FS_IOC32_FLAGS ioctls are identical to the
FS_IOC_FLAGS ioctls, but without the anomaly of their
number being defined as if their third argument were of type long, while
the kernel internally treats it as type int.

Signed-off-by: Aleksandar Markovic 
---
 linux-user/ioctls.h   | 2 ++
 linux-user/syscall_defs.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index c44f42e..4fd6939 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -140,6 +140,8 @@
  IOCTL(FS_IOC_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC_GETVERSION, IOC_R, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC32_GETFLAGS, IOC_R, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC32_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
 
 #ifdef CONFIG_USBFS
   /* USB ioctls */
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index f68a8b6..964b2b4 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -920,6 +920,8 @@ struct target_pollfd {
 #define TARGET_FS_IOC_GETVERSION TARGET_IOR('v', 1, abi_long)
 #define TARGET_FS_IOC_SETVERSION TARGET_IOW('v', 2, abi_long)
 #define TARGET_FS_IOC_FIEMAP TARGET_IOWR('f',11,struct fiemap)
+#define TARGET_FS_IOC32_GETFLAGS TARGET_IOR('f', 1, int)
+#define TARGET_FS_IOC32_SETFLAGS TARGET_IOW('f', 2, int)
 
 /* usb ioctls */
 #define TARGET_USBDEVFS_CONTROL TARGET_IOWRU('U', 0)
-- 
2.7.4




[PATCH v2 3/6] linux-user: Add support for FS_IOC_VERSION ioctls

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

A very specific thing for these two ioctls is that their code
implies that their third argument is of type 'long', but the
kernel uses that argument as if it is of type 'int'. This anomaly
is recognized also in commit 6080723 (linux-user: Implement
FS_IOC_GETFLAGS and FS_IOC_SETFLAGS ioctls).

Signed-off-by: Aleksandar Markovic 
---
 linux-user/ioctls.h   | 2 ++
 linux-user/syscall_defs.h | 8 +---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index c6b9d6a..c44f42e 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -138,6 +138,8 @@
 
  IOCTL(FS_IOC_GETFLAGS, IOC_R, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC_GETVERSION, IOC_R, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
 
 #ifdef CONFIG_USBFS
   /* USB ioctls */
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 98c2119..f68a8b6 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -911,12 +911,14 @@ struct target_pollfd {
 #define TARGET_FICLONE TARGET_IOW(0x94, 9, int)
 #define TARGET_FICLONERANGE TARGET_IOW(0x94, 13, struct file_clone_range)
 
-/* Note that the ioctl numbers claim type "long" but the actual type
- * used by the kernel is "int".
+/*
+ * Note that the ioctl numbers for FS_IOC_
+ * claim type "long" but the actual type used by the kernel is "int".
  */
 #define TARGET_FS_IOC_GETFLAGS TARGET_IOR('f', 1, abi_long)
 #define TARGET_FS_IOC_SETFLAGS TARGET_IOW('f', 2, abi_long)
-
+#define TARGET_FS_IOC_GETVERSION TARGET_IOR('v', 1, abi_long)
+#define TARGET_FS_IOC_SETVERSION TARGET_IOW('v', 2, abi_long)
 #define TARGET_FS_IOC_FIEMAP TARGET_IOWR('f',11,struct fiemap)
 
 /* usb ioctls */
-- 
2.7.4
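
To make the anomaly described in the commit message concrete, here is a minimal sketch of how the argument size ends up inside an ioctl number. It assumes the asm-generic field layout (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); some architectures use a different layout. The helper names ioc_encode() and ioc_claimed_size() are made up for this illustration and are not QEMU or kernel functions.

```c
#include <stdint.h>

/* Assumed layout (asm-generic): nr in bits 0-7, type in bits 8-15,
 * size in bits 16-29, direction in bits 30-31. */
enum { IOC_NRSHIFT = 0, IOC_TYPESHIFT = 8, IOC_SIZESHIFT = 16, IOC_DIRSHIFT = 30 };
enum { IOC_DIR_WRITE = 1, IOC_DIR_READ = 2 };

/* Hypothetical helper: build an ioctl request number from its fields. */
static uint32_t ioc_encode(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
    return (dir << IOC_DIRSHIFT) | (size << IOC_SIZESHIFT) |
           (type << IOC_TYPESHIFT) | (nr << IOC_NRSHIFT);
}

/* Hypothetical helper: recover the argument size that the number claims. */
static uint32_t ioc_claimed_size(uint32_t request)
{
    return (request >> IOC_SIZESHIFT) & 0x3fff;
}
```

With these helpers, ioc_encode(IOC_DIR_READ, 'v', 1, 8) gives 0x80087601 and the same call with size 4 gives 0x80047601. The number can therefore claim an 8-byte 'long' argument while the kernel, as the commit message says, only treats the argument as an 'int'.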




Re: NetBSD/arc on MIPS Magnum, was Re: [PATCH 00/10] Fixes for DP8393X SONIC device emulation

2019-12-23 Thread Finn Thain
On Tue, 24 Dec 2019, Finn Thain wrote:

> 
> I know precious little about NetBSD installation and MIPS Magnum. What I 
> wrote above was guesswork. Hence this could be a NetBSD bug or user 
> error.
> 

It was bugs and user error.

The user error was not using the serial console. The NetBSD/arc 
installation guide says that only serial console is supported for MIPS 
Magnum.

The bugs include regressions in NetBSD. (See below.)

The other issue is that the ARC firmware didn't work properly until I 
defined one or more 'boot selections', even though none of these will ever 
be selected.

> Does there exist a known-good combination of NetBSD/arc and 
> qemu-system-mips64el releases?
> 

The commit log says that Hervé Poussineau used NetBSD 5.1 with dp8393x in 
the past, so I tried that.

Here are the steps I used:

./mips64el-softmmu/qemu-system-mips64el -M magnum -L .  
-drive if=scsi,unit=2,media=cdrom,format=raw,file=arccd-5.1.iso
-global ds1225y.filename=nvram -global ds1225y.size=8200
-serial stdio -serial null
-nic bridge,model=dp83932,mac=00:00:00:02:03:04

-> Run setup -> Initialize system -> Set default configurations
800x688
3.5 1.44 M
No
7

-> Set default environment
CD-ROM
2

-> Set environment variables
CONSOLEIN
multi()serial(0)term()
CONSOLEOUT
multi()serial(0)term()

-> Exit

Now restart QEMU. The ARC menu should appear on the tty.

-> Run a program

scsi(0)cdrom(2)fdisk(0)boot scsi(0)cdrom(2)fdisk(0)netbsd

That doesn't work. Add a boot selection.

-> Run setup -> Manage startup -> Add a boot selection -> Scsi CD-ROM 0
\os\nt\osloader.exe
Yes
\winnt
Windows NT
No

Somehow, that seems to help. Now restart QEMU.

-> Run a program

scsi(0)cdrom(2)fdisk(0)boot scsi(0)cdrom(2)fdisk(0)netbsd

NetBSD/arc Bootstrap, Revision 1.1
(bui...@b7.netbsd.org, Sat Nov  6 14:06:36 UTC 2010)
devopen: scsi(0)cdrom(2)fdisk(0) type disk file netbsd
5502064+289092=0x5860e0
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 5.1 (RAMDISK) #0: Sat Nov  6 14:17:36 UTC 2010

bui...@b7.netbsd.org:/home/builds/ab/netbsd-5-1-RELEASE/arc/201011061943Z-obj/home/builds/ab/netbsd-5-1-RELEASE/src/sys/arch/arc/compile/RAMDISK
MIPS Magnum
total memory = 128 MB
avail memory = 117 MB
mainbus0 (root)
cpu0 at mainbus0: MIPS R4000 CPU (0x400) Rev. 0.0 with MIPS R4010 FPC Rev. 0.0
cpu0: 8KB/16B direct-mapped L1 Instruction cache, 48 TLB entries
cpu0: 8KB/16B direct-mapped write-back L1 Data cache
jazzio0 at mainbus0
timer0 at jazzio0 addr 0xe228
mcclock0 at jazzio0 addr 0xe0004000: mc146818 compatible time-of-day clock
LPT1 at jazzio0 addr 0xe0008000 intr 0 not configured
fdc0 at jazzio0 addr 0xe0003000 intr 1
fd0 at fdc0 drive 1: 1.44MB, 80 cyl, 2 head, 18 sec
MAGNUM at jazzio0 addr 0xe000c000 intr 2 not configured
VXL at jazzio0 addr 0xe080 intr 3 not configured
sn0 at jazzio0 addr 0xe0001000 intr 4: SONIC Ethernet
sonic: write 0x0015 to reg CR
sonic: write 0x0080 to reg CR
sonic: write 0x to reg IMR
sonic: write 0x7fff to reg ISR
sonic: write 0x to reg CR
sn0: Ethernet address 00:00:00:00:00:00
asc0 at jazzio0 addr 0xe0002000 intr 5: NCR53C94, 25MHz, SCSI ID 7
scsibus0 at asc0: 8 targets, 8 luns per target
pckbc0 at jazzio0 addr 0xe0005000 intr 6
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0 (mux ignored)
pms at jazzio0 addr 0xe0005000 intr 7 not configured
com0 at jazzio0 addr 0xe0006000 intr 8: ns16550a, working fifo
com0: txfifo disabled
com0: console
com1 at jazzio0 addr 0xe0007000 intr 9: ns16550a, working fifo
com1: txfifo disabled
jazzisabr0 at mainbus0
isa0 at jazzisabr0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
scsibus0: waiting 2 seconds for devices to settle...
cd0 at scsibus0 target 2 lun 0:  cdrom removable
cd1 at scsibus0 target 4 lun 0:  cdrom removable
boot device: 
root on md0a dumps on md0b
root file system type: ffs
WARNING: preposterous TOD clock time
WARNING: using filesystem time
WARNING: CHECK AND RESET THE DATE!
erase ^H, werase ^W, kill ^U, intr ^C, status ^T
Terminal type? [vt100] 
Erase is backspace. 
(I)nstall, (S)hell or (H)alt ? s
# ifconfig sn0 10.2.3.4/24
# ping
usage: 
ping [-adDfLnoPqQrRv] [-c count] [-g gateway] [-h host] [-i interval] [-I addr]
 [-l preload] [-p pattern] [-s size] [-t tos] [-T ttl] [-w maxwait] host


My initial testing shows that NetBSD 5.1 doesn't like my v2 patch series. 
I'll debug that before I send v3.

BTW, there seem to be regressions in NetBSD 8.1 compared to NetBSD 5.1. 

The 'boot' program on the 8.1 ISO just hangs.

If I use the 'boot' program from the 5.1 ISO to load the 'netbsd' 
binary from the 8.1 ISO, I get 

Re: [PATCH v2 2/2] ide: Fix incorrect handling of some PRDTs in ide_dma_cb()

2019-12-23 Thread Alexander Popov
On 24.12.2019 03:20, John Snow wrote:
> On 12/19/19 10:01 AM, Kevin Wolf wrote:
>>
>> John, what do you think?
>>
> 
> I've been out to lunch for a little while. There are some issues that I
> recall with IDE, but couldn't find the time to fix prior to 4.2.

Hello John.

> I'll review all the outstanding IDE problems I am aware of and review
> these series before the end of the year.

Thanks. It would be nice. I've spent some time on it and want it to be finished.

Please see the v3 that I sent yesterday.

Best regards,
Alexander



Re: [EXTERNAL]Re: [PATCH 1/5] linux-user: Fix some constants in termbits.h

2019-12-23 Thread Aleksandar Markovic
> > -#define TARGET_TIOCMIWAIT  _IO('T', 92) /* wait for a change on serial 
> > input line(s) */
> > -#define TARGET_TIOCGICOUNT 0x545D  /* read serial port inline interrupt 
> > counts */
> > +/* wait for a change on serial input line(s) */
> > +#define TARGET_TIOCMIWAIT  _IO('T', 92)
> 
> This one should also be TARGET_IO, right?
> 

Right - it slipped through the cracks. I'll fix it in v2.

Thanks for the review, and spotting this one!

Aleksandar

> --
> Thanks.
> -- Max



Re: [PATCH 1/5] linux-user: Fix some constants in termbits.h

2019-12-23 Thread Max Filippov
On Mon, Dec 23, 2019 at 6:45 PM Aleksandar Markovic
 wrote:
>
> From: Aleksandar Markovic 
>
> Some constants were defined in terms of host, instead of target,
> as they should be.
>
> Some additional trivial changes in this patch were forced by
> checkpatch.pl.
>
> Signed-off-by: Aleksandar Markovic 
[...]
> index d1e09e6..b1853f0 100644
> --- a/linux-user/xtensa/termbits.h
> +++ b/linux-user/xtensa/termbits.h
[...]
> @@ -286,43 +286,61 @@ struct target_ktermios {
[...]
> -#define TARGET_TIOCMIWAIT  _IO('T', 92) /* wait for a change on serial input 
> line(s) */
> -#define TARGET_TIOCGICOUNT 0x545D  /* read serial port inline interrupt 
> counts */
> +/* wait for a change on serial input line(s) */
> +#define TARGET_TIOCMIWAIT  _IO('T', 92)

This one should also be TARGET_IO, right?

> +/* read serial port inline interrupt counts */
> +#define TARGET_TIOCGICOUNT 0x545D
>  #endif /* XTENSA_TERMBITS_H */

Reviewed-by: Max Filippov 

-- 
Thanks.
-- Max
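
A small sketch of why the host _IO() and a target-specific TARGET_IO() can disagree. It compares two direction-field layouts: the asm-generic one (2 direction bits, "no data" encoded as 0) and a layout with 3 direction bits where "no data" is encoded as 1, which to my understanding is what MIPS-style headers use. Whether the xtensa layout differs from any given host is not shown here; the function names are hypothetical and only illustrate that the same _IO('T', 92) expression can yield different numbers.

```c
#include <stdint.h>

/* Two-bit direction field at bit 30, "no data" encoded as 0
 * (asm-generic style). */
static uint32_t io_generic(uint32_t type, uint32_t nr)
{
    const uint32_t dir_none = 0, dir_shift = 30;
    return (dir_none << dir_shift) | (type << 8) | nr;
}

/* Three-bit direction field at bit 29, "no data" encoded as 1
 * (assumed alternative layout, for illustration only). */
static uint32_t io_three_dir_bits(uint32_t type, uint32_t nr)
{
    const uint32_t dir_none = 1, dir_shift = 29;
    return (dir_none << dir_shift) | (type << 8) | nr;
}
```

io_generic('T', 92) evaluates to 0x545C while io_three_dir_bits('T', 92) evaluates to 0x2000545C, so a constant built with the host macro is only right by coincidence when host and target share a layout.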



[PATCH 1/5] linux-user: Fix some constants in termbits.h

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

Some constants were defined in terms of host, instead of target,
as they should be.

Some additional trivial changes in this patch were forced by
checkpatch.pl.

Signed-off-by: Aleksandar Markovic 
---
 linux-user/aarch64/termbits.h|   4 +-
 linux-user/alpha/termbits.h  |  10 +--
 linux-user/arm/termbits.h|   4 +-
 linux-user/cris/termbits.h   |   4 +-
 linux-user/hppa/termbits.h   |   4 +-
 linux-user/i386/termbits.h   |   4 +-
 linux-user/m68k/termbits.h   |   4 +-
 linux-user/microblaze/termbits.h |   4 +-
 linux-user/mips/termbits.h   |   4 +-
 linux-user/nios2/termbits.h  |   4 +-
 linux-user/openrisc/termbits.h   |   6 +-
 linux-user/ppc/termbits.h|   4 +-
 linux-user/riscv/termbits.h  |   4 +-
 linux-user/s390x/termbits.h  |  26 ---
 linux-user/sh4/termbits.h|   4 +-
 linux-user/sparc/termbits.h  |   4 +-
 linux-user/sparc64/termbits.h|   4 +-
 linux-user/x86_64/termbits.h |   6 +-
 linux-user/xtensa/termbits.h | 156 ++-
 19 files changed, 141 insertions(+), 119 deletions(-)

diff --git a/linux-user/aarch64/termbits.h b/linux-user/aarch64/termbits.h
index 0ab448d..998fc1d 100644
--- a/linux-user/aarch64/termbits.h
+++ b/linux-user/aarch64/termbits.h
@@ -83,8 +83,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA TARGET_B19200
+#define TARGET_EXTB TARGET_B38400
 #define TARGET_CSIZE   060
 #define   TARGET_CS5   000
 #define   TARGET_CS6   020
diff --git a/linux-user/alpha/termbits.h b/linux-user/alpha/termbits.h
index a714251..ace19be 100644
--- a/linux-user/alpha/termbits.h
+++ b/linux-user/alpha/termbits.h
@@ -108,8 +108,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA TARGET_B19200
+#define TARGET_EXTB TARGET_B38400
 #define TARGET_CBAUDEX 000
 #define  TARGET_B57600   00020
 #define  TARGET_B115200  00021
@@ -165,7 +165,7 @@ struct target_termios {
 #define TARGET_FIOASYNC TARGET_IOW('f', 125, int)
 #define TARGET_FIONBIO TARGET_IOW('f', 126, int)
 #define TARGET_FIONREAD TARGET_IOR('f', 127, int)
-#define TARGET_TIOCINQ FIONREAD
+#define TARGET_TIOCINQ  TARGET_FIONREAD
 #define TARGET_FIOQSIZE TARGET_IOR('f', 128, loff_t)
 
 #define TARGET_TIOCGETP TARGET_IOR('t', 8, struct target_sgttyb)
@@ -217,8 +217,8 @@ struct target_termios {
 # define TARGET_TIOCM_CAR  0x040
 # define TARGET_TIOCM_RNG  0x080
 # define TARGET_TIOCM_DSR  0x100
-# define TARGET_TIOCM_CD   TIOCM_CAR
-# define TARGET_TIOCM_RI   TIOCM_RNG
+# define TARGET_TIOCM_CD TARGET_TIOCM_CAR
+# define TARGET_TIOCM_RI TARGET_TIOCM_RNG
 # define TARGET_TIOCM_OUT1 0x2000
 # define TARGET_TIOCM_OUT2 0x4000
 # define TARGET_TIOCM_LOOP 0x8000
diff --git a/linux-user/arm/termbits.h b/linux-user/arm/termbits.h
index e555cff..7170b8a 100644
--- a/linux-user/arm/termbits.h
+++ b/linux-user/arm/termbits.h
@@ -83,8 +83,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA TARGET_B19200
+#define TARGET_EXTB TARGET_B38400
 #define TARGET_CSIZE   060
 #define   TARGET_CS5   000
 #define   TARGET_CS6   020
diff --git a/linux-user/cris/termbits.h b/linux-user/cris/termbits.h
index 475ee70..76d5ed0 100644
--- a/linux-user/cris/termbits.h
+++ b/linux-user/cris/termbits.h
@@ -81,8 +81,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA TARGET_B19200
+#define TARGET_EXTB TARGET_B38400
 #define TARGET_CSIZE   060
 #define   TARGET_CS5   000
 #define   TARGET_CS6   020
diff --git a/linux-user/hppa/termbits.h b/linux-user/hppa/termbits.h
index 8fba839..3094710 100644
--- a/linux-user/hppa/termbits.h
+++ b/linux-user/hppa/termbits.h
@@ -82,8 +82,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  TARGET_B38400 017
-#define TARGET_EXTA B19200
-#define TARGET_EXTB B38400
+#define TARGET_EXTA TARGET_B19200
+#define TARGET_EXTB TARGET_B38400
 #define TARGET_CSIZE   060
 #define   TARGET_CS5   000
 #define   TARGET_CS6   020
diff --git a/linux-user/i386/termbits.h b/linux-user/i386/termbits.h
index 88264bb..3b16977 100644
--- a/linux-user/i386/termbits.h
+++ b/linux-user/i386/termbits.h
@@ -82,8 +82,8 @@ struct target_termios {
 #define  TARGET_B9600  015
 #define  TARGET_B19200 016
 #define  

[PATCH 3/5] linux-user: Add support for FS_IOC_VERSION ioctls

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

A very specific thing for these two ioctls is that their code
implies that their third argument is of type long, but the kernel
uses that argument as if it is of type int. This anomaly is
recognized also in commit 6080723 (linux-user: Implement
FS_IOC_GETFLAGS and FS_IOC_SETFLAGS ioctls).

Signed-off-by: Aleksandar Markovic 
---
 linux-user/ioctls.h   | 2 ++
 linux-user/syscall_defs.h | 8 +---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index c6b9d6a..c44f42e 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -138,6 +138,8 @@
 
  IOCTL(FS_IOC_GETFLAGS, IOC_R, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC_GETVERSION, IOC_R, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
 
 #ifdef CONFIG_USBFS
   /* USB ioctls */
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 98c2119..f68a8b6 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -911,12 +911,14 @@ struct target_pollfd {
 #define TARGET_FICLONE TARGET_IOW(0x94, 9, int)
 #define TARGET_FICLONERANGE TARGET_IOW(0x94, 13, struct file_clone_range)
 
-/* Note that the ioctl numbers claim type "long" but the actual type
- * used by the kernel is "int".
+/*
+ * Note that the ioctl numbers for FS_IOC_
+ * claim type "long" but the actual type used by the kernel is "int".
  */
 #define TARGET_FS_IOC_GETFLAGS TARGET_IOR('f', 1, abi_long)
 #define TARGET_FS_IOC_SETFLAGS TARGET_IOW('f', 2, abi_long)
-
+#define TARGET_FS_IOC_GETVERSION TARGET_IOR('v', 1, abi_long)
+#define TARGET_FS_IOC_SETVERSION TARGET_IOW('v', 2, abi_long)
 #define TARGET_FS_IOC_FIEMAP TARGET_IOWR('f',11,struct fiemap)
 
 /* usb ioctls */
-- 
2.7.4




[PATCH 5/5] linux-user: Add support for FS_IOC32_VERSION ioctls

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

These FS_IOC32_VERSION ioctls are identical to
FS_IOC_VERSION ioctls, but without the anomaly of their
number defined as if their third argument is of type long, while
it is treated internally in the kernel as if it is of type int.

Signed-off-by: Aleksandar Markovic 
---
 linux-user/ioctls.h   | 2 ++
 linux-user/syscall_defs.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 4fd6939..3affd88 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -142,6 +142,8 @@
  IOCTL(FS_IOC_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC32_GETFLAGS, IOC_R, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC32_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC32_GETVERSION, IOC_R, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC32_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
 
 #ifdef CONFIG_USBFS
   /* USB ioctls */
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 964b2b4..a73cc3d 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -922,6 +922,8 @@ struct target_pollfd {
 #define TARGET_FS_IOC_FIEMAP TARGET_IOWR('f',11,struct fiemap)
 #define TARGET_FS_IOC32_GETFLAGS TARGET_IOR('f', 1, int)
 #define TARGET_FS_IOC32_SETFLAGS TARGET_IOW('f', 2, int)
+#define TARGET_FS_IOC32_GETVERSION TARGET_IOR('v', 1, int)
+#define TARGET_FS_IOC32_SETVERSION TARGET_IOW('v', 2, int)
 
 /* usb ioctls */
 #define TARGET_USBDEVFS_CONTROL TARGET_IOWRU('U', 0)
-- 
2.7.4
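
As a rough illustration of the relationship between the FS_IOC_ and FS_IOC32_ numbers: with the asm-generic encoding (assumed here), the two differ only in the size field, so they coincide when the target's 'long' is 4 bytes and differ when it is 8 bytes. The helpers ior() and getversion_numbers_coincide() are hypothetical names for this sketch, and 'int' is assumed to be 4 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read-direction ioctl number under the assumed asm-generic layout:
 * direction in bits 30-31, size in bits 16-29, type in bits 8-15. */
static uint32_t ior(uint32_t type, uint32_t nr, uint32_t size)
{
    return (2u << 30) | (size << 16) | (type << 8) | nr;
}

/* Hypothetical check: do the "long"-sized and "int"-sized GETVERSION
 * numbers coincide for a target whose long is long_size bytes? */
static bool getversion_numbers_coincide(uint32_t long_size)
{
    uint32_t native = ior('v', 1, long_size); /* number claims "long" */
    uint32_t compat = ior('v', 1, 4);         /* number claims "int" */
    return native == compat;
}
```

getversion_numbers_coincide(4) is true and getversion_numbers_coincide(8) is false, which is one way to see that the FS_IOC32_ variants are distinct numbers only on targets with a 64-bit 'long'.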




[PATCH 2/5] linux-user: mips: Update syscall numbers to kernel 5.5 rc3 level

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

Update mips syscall numbers based on Linux kernel tag v5.5-rc3
(commit 46cf053e).

Signed-off-by: Aleksandar Markovic 
---
 linux-user/mips/cpu_loop.c | 41 ++
 linux-user/mips/syscall_nr.h   | 45 ++
 linux-user/mips64/syscall_nr.h | 13 
 3 files changed, 99 insertions(+)

diff --git a/linux-user/mips/cpu_loop.c b/linux-user/mips/cpu_loop.c
index 39915b3..f8f944f 100644
--- a/linux-user/mips/cpu_loop.c
+++ b/linux-user/mips/cpu_loop.c
@@ -390,6 +390,47 @@ static const uint8_t mips_syscall_args[] = {
 MIPS_SYS(sys_copy_file_range, 6) /* 360 */
 MIPS_SYS(sys_preadv2, 6)
 MIPS_SYS(sys_pwritev2, 6)
+MIPS_SYS(sys_pkey_mprotect, 4)
+MIPS_SYS(sys_pkey_alloc, 2)
+MIPS_SYS(sys_pkey_free, 1)  /* 365 */
+MIPS_SYS(sys_statx, 5)
+MIPS_SYS(sys_rseq, 4)
+MIPS_SYS(sys_io_pgetevents, 6)
+
+MIPS_SYS(clock_gettime64, 2)
+MIPS_SYS(clock_settime64, 2)
+MIPS_SYS(clock_adjtime64, 2)
+MIPS_SYS(clock_getres_time64, 2)
+MIPS_SYS(clock_nanosleep_time64, 4)
+MIPS_SYS(timer_gettime64, 2)
+MIPS_SYS(timer_settime64, 4)
+MIPS_SYS(timerfd_gettime64, 2)
+MIPS_SYS(timerfd_settime64, 4)
+MIPS_SYS(utimensat_time64, 4)
+MIPS_SYS(pselect6_time64, 6)
+MIPS_SYS(ppoll_time64, 5)
+MIPS_SYS(io_pgetevents_time64, 6)
+MIPS_SYS(recvmmsg_time64, 5)
+MIPS_SYS(mq_timedsend_time64, 5)
+MIPS_SYS(mq_timedreceive_time64, 5)
+MIPS_SYS(semtimedop_time64, 4)
+MIPS_SYS(rt_sigtimedwait_time64, 4)
+MIPS_SYS(futex_time64, 6)
+MIPS_SYS(sched_rr_get_interval_time64, 2)
+
+MIPS_SYS(pidfd_send_signal, 4)
+MIPS_SYS(io_uring_setup, 2)
+MIPS_SYS(io_uring_enter, 6)
+MIPS_SYS(io_uring_register, 4)
+MIPS_SYS(open_tree, 3)
+MIPS_SYS(move_mount, 5)
+MIPS_SYS(fsopen, 2)
+MIPS_SYS(fsconfig, 5)
+MIPS_SYS(fsmount, 3)
+MIPS_SYS(fspick, 3)
+MIPS_SYS(pidfd_open, 2)
+MIPS_SYS(clone3, 2)
+
 };
 #  undef MIPS_SYS
 # endif /* O32 */
diff --git a/linux-user/mips/syscall_nr.h b/linux-user/mips/syscall_nr.h
index 7fa7fa5..94104d0 100644
--- a/linux-user/mips/syscall_nr.h
+++ b/linux-user/mips/syscall_nr.h
@@ -376,5 +376,50 @@
 #define TARGET_NR_statx (TARGET_NR_Linux + 366)
 #define TARGET_NR_rseq  (TARGET_NR_Linux + 367)
 #define TARGET_NR_io_pgetevents (TARGET_NR_Linux + 368)
+/* room for arch specific calls */
+#define TARGET_NR_semget (TARGET_NR_Linux + 393)
+#define TARGET_NR_semctl (TARGET_NR_Linux + 394)
+#define TARGET_NR_shmget (TARGET_NR_Linux + 395)
+#define TARGET_NR_shmctl (TARGET_NR_Linux + 396)
+#define TARGET_NR_shmat (TARGET_NR_Linux + 397)
+#define TARGET_NR_shmdt (TARGET_NR_Linux + 398)
+#define TARGET_NR_msgget (TARGET_NR_Linux + 399)
+#define TARGET_NR_msgsnd (TARGET_NR_Linux + 400)
+#define TARGET_NR_msgrcv (TARGET_NR_Linux + 401)
+#define TARGET_NR_msgctl (TARGET_NR_Linux + 402)
+/* 403-423 common for 32-bit archs */
+#define TARGET_NR_clock_gettime64   (TARGET_NR_Linux + 403)
+#define TARGET_NR_clock_settime64   (TARGET_NR_Linux + 404)
+#define TARGET_NR_clock_adjtime64   (TARGET_NR_Linux + 405)
+#define TARGET_NR_clock_getres_time64   (TARGET_NR_Linux + 406)
+#define TARGET_NR_clock_nanosleep_time64 (TARGET_NR_Linux + 407)
+#define TARGET_NR_timer_gettime64   (TARGET_NR_Linux + 408)
+#define TARGET_NR_timer_settime64   (TARGET_NR_Linux + 409)
+#define TARGET_NR_timerfd_gettime64 (TARGET_NR_Linux + 410)
+#define TARGET_NR_timerfd_settime64 (TARGET_NR_Linux + 411)
+#define TARGET_NR_utimensat_time64  (TARGET_NR_Linux + 412)
+#define TARGET_NR_pselect6_time64   (TARGET_NR_Linux + 413)
+#define TARGET_NR_ppoll_time64  (TARGET_NR_Linux + 414)
+#define TARGET_NR_io_pgetevents_time64  (TARGET_NR_Linux + 416)
+#define TARGET_NR_recvmmsg_time64   (TARGET_NR_Linux + 417)
+#define TARGET_NR_mq_timedsend_time64   (TARGET_NR_Linux + 418)
+#define TARGET_NR_mq_timedreceive_time64 (TARGET_NR_Linux + 419)
+#define TARGET_NR_semtimedop_time64 (TARGET_NR_Linux + 420)
+#define TARGET_NR_rt_sigtimedwait_time64 (TARGET_NR_Linux + 421)
+#define TARGET_NR_futex_time64  (TARGET_NR_Linux + 422)
+#define TARGET_NR_sched_rr_get_interval_time64 (TARGET_NR_Linux + 423)
+/* 424 onwards common for all archs */
+#define TARGET_NR_pidfd_send_signal (TARGET_NR_Linux + 424)
+#define TARGET_NR_io_uring_setup (TARGET_NR_Linux + 425)
+#define TARGET_NR_io_uring_enter (TARGET_NR_Linux + 426)
+#define TARGET_NR_io_uring_register (TARGET_NR_Linux + 427)
+#define TARGET_NR_open_tree 

[PATCH 0/5] linux-user: Misc patches for 5.0

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

This series is a collection of patches I recently accumulated.

Aleksandar Markovic (5):
  linux-user: Fix some constants in termbits.h
  linux-user: mips: Update syscall numbers to kernel 5.5 rc3 level
  linux-user: Add support for FS_IOC_VERSION ioctls
  linux-user: Add support for FS_IOC32_FLAGS ioctls
  linux-user: Add support for FS_IOC32_VERSION ioctls

 linux-user/aarch64/termbits.h|   4 +-
 linux-user/alpha/termbits.h  |  10 +--
 linux-user/arm/termbits.h|   4 +-
 linux-user/cris/termbits.h   |   4 +-
 linux-user/hppa/termbits.h   |   4 +-
 linux-user/i386/termbits.h   |   4 +-
 linux-user/ioctls.h  |   6 ++
 linux-user/m68k/termbits.h   |   4 +-
 linux-user/microblaze/termbits.h |   4 +-
 linux-user/mips/cpu_loop.c   |  41 ++
 linux-user/mips/syscall_nr.h |  45 +++
 linux-user/mips/termbits.h   |   4 +-
 linux-user/mips64/syscall_nr.h   |  13 
 linux-user/nios2/termbits.h  |   4 +-
 linux-user/openrisc/termbits.h   |   6 +-
 linux-user/ppc/termbits.h|   4 +-
 linux-user/riscv/termbits.h  |   4 +-
 linux-user/s390x/termbits.h  |  26 ---
 linux-user/sh4/termbits.h|   4 +-
 linux-user/sparc/termbits.h  |   4 +-
 linux-user/sparc64/termbits.h|   4 +-
 linux-user/syscall_defs.h|  12 ++-
 linux-user/x86_64/termbits.h |   6 +-
 linux-user/xtensa/termbits.h | 156 ++-
 24 files changed, 255 insertions(+), 122 deletions(-)

-- 
2.7.4




[PATCH 4/5] linux-user: Add support for FS_IOC32_FLAGS ioctls

2019-12-23 Thread Aleksandar Markovic
From: Aleksandar Markovic 

These FS_IOC32_FLAGS ioctls are identical to
FS_IOC_FLAGS ioctls, but without the anomaly of their
number defined as if their third argument is of type long, while
it is treated internally in the kernel as if it is of type int.

Signed-off-by: Aleksandar Markovic 
---
 linux-user/ioctls.h   | 2 ++
 linux-user/syscall_defs.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index c44f42e..4fd6939 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -140,6 +140,8 @@
  IOCTL(FS_IOC_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC_GETVERSION, IOC_R, MK_PTR(TYPE_INT))
  IOCTL(FS_IOC_SETVERSION, IOC_W, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC32_GETFLAGS, IOC_R, MK_PTR(TYPE_INT))
+ IOCTL(FS_IOC32_SETFLAGS, IOC_W, MK_PTR(TYPE_INT))
 
 #ifdef CONFIG_USBFS
   /* USB ioctls */
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index f68a8b6..964b2b4 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -920,6 +920,8 @@ struct target_pollfd {
 #define TARGET_FS_IOC_GETVERSION TARGET_IOR('v', 1, abi_long)
 #define TARGET_FS_IOC_SETVERSION TARGET_IOW('v', 2, abi_long)
 #define TARGET_FS_IOC_FIEMAP TARGET_IOWR('f',11,struct fiemap)
+#define TARGET_FS_IOC32_GETFLAGS TARGET_IOR('f', 1, int)
+#define TARGET_FS_IOC32_SETFLAGS TARGET_IOW('f', 2, int)
 
 /* usb ioctls */
 #define TARGET_USBDEVFS_CONTROL TARGET_IOWRU('U', 0)
-- 
2.7.4




Re: [PATCH v2 2/2] ide: Fix incorrect handling of some PRDTs in ide_dma_cb()

2019-12-23 Thread John Snow



On 12/19/19 10:01 AM, Kevin Wolf wrote:
> Am 16.12.2019 um 19:14 hat Alexander Popov geschrieben:
>> The commit a718978ed58a from July 2015 introduced the assertion which
>> implies that the size of successful DMA transfers handled in ide_dma_cb()
>> should be multiple of 512 (the size of a sector). But guest systems can
>> initiate DMA transfers that don't fit this requirement.
>>
>> For fixing that let's check the number of bytes prepared for the transfer
>> by the prepare_buf() handler. The code in ide_dma_cb() must behave
>> according to the Programming Interface for Bus Master IDE Controller
>> (Revision 1.0 5/16/94):
>> 1. If PRDs specified a smaller size than the IDE transfer
>>size, then the Interrupt and Active bits in the Controller
>>status register are not set (Error Condition).
>> 2. If the size of the physical memory regions was equal to
>>the IDE device transfer size, the Interrupt bit in the
>>Controller status register is set to 1, Active bit is set to 0.
>> 3. If PRDs specified a larger size than the IDE transfer size,
>>the Interrupt and Active bits in the Controller status register
>>are both set to 1.
>>
>> Signed-off-by: Alexander Popov 
>> ---
>>  hw/ide/core.c | 30 ++
>>  1 file changed, 22 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 754ff4dc34..171831c7bd 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -849,6 +849,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>  int64_t sector_num;
>>  uint64_t offset;
>>  bool stay_active = false;
>> +int32_t prep_size = 0;
>>  
>>  if (ret == -EINVAL) {
>>  ide_dma_error(s);
>> @@ -863,13 +864,15 @@ static void ide_dma_cb(void *opaque, int ret)
>>  }
>>  }
>>  
>> -n = s->io_buffer_size >> 9;
>> -if (n > s->nsector) {
>> -/* The PRDs were longer than needed for this request. Shorten them 
>> so
>> - * we don't get a negative remainder. The Active bit must remain set
>> - * after the request completes. */
>> +if (s->io_buffer_size > s->nsector * 512) {
>> +/*
>> + * The PRDs were longer than needed for this request.
>> + * The Active bit must remain set after the request completes.
>> + */
>>  n = s->nsector;
>>  stay_active = true;
>> +} else {
>> +n = s->io_buffer_size >> 9;
>>  }
>>  
>>  sector_num = ide_get_sector(s);
>> @@ -892,9 +895,20 @@ static void ide_dma_cb(void *opaque, int ret)
>>  n = s->nsector;
>>  s->io_buffer_index = 0;
>>  s->io_buffer_size = n * 512;
>> -if (s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size) < 
>> 512) {
>> -/* The PRDs were too short. Reset the Active bit, but don't raise an
>> - * interrupt. */
>> +prep_size = s->bus->dma->ops->prepare_buf(s->bus->dma, 
>> s->io_buffer_size);
>> +/* prepare_buf() must succeed and respect the limit */
>> +assert(prep_size > 0 && prep_size <= n * 512);
> 
> Hm, I'm not sure about prep_size > 0. Maybe it's true for
> bmdma_prepare_buf() for PCI (I'm not even sure there: What happens if we
> pass a PRDT with 0 entries? Should we have another test case for this?),
> but other controllers like AHCI don't seem to interpret an entry with
> size 0 as maximum size.
> 
> John, what do you think?
> 

I've been out to lunch for a little while. There are some issues that I
recall with IDE, but couldn't find the time to fix prior to 4.2.

I'll review all the outstanding IDE problems I am aware of and review
these series before the end of the year.

>> +/*
>> + * Now prep_size stores the number of bytes in the sglist, and
>> + * s->io_buffer_size stores the number of bytes described by the PRDs.
>> + */
>> +
>> +if (prep_size < n * 512) {
>> +/*
>> + * The PRDs are too short for this request. Error condition!
>> + * Reset the Active bit and don't raise the interrupt.
>> + */
>>  s->status = READY_STAT | SEEK_STAT;
>>  dma_buf_commit(s, 0);
>>  goto eot;
> 
> Here you decided that we don't need to do partial I/O for short PRDTs. I
> think my conclusion was that the spec doesn't really say what we need to
> do, so this is fine with me.
> 
> Apart from the assertion above, the patch looks good to me.
> 
> Kevin
> 
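
The three cases quoted from the Bus Master IDE specification in the commit message can be sketched as a small decision function. This is only an illustration of the quoted rules, not the code in hw/ide/core.c; struct bm_status and bm_status_after_transfer() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interrupt and Active bits of the controller status register. */
struct bm_status {
    bool interrupt;
    bool active;
};

/* Hypothetical helper: derive the status bits from the number of bytes
 * described by the PRDs and the size of the IDE transfer. */
static struct bm_status bm_status_after_transfer(uint64_t prd_bytes,
                                                 uint64_t transfer_bytes)
{
    struct bm_status st = { false, false };

    if (prd_bytes < transfer_bytes) {
        /* Case 1: PRDs too short, error condition, neither bit is set. */
    } else if (prd_bytes == transfer_bytes) {
        /* Case 2: exact fit, Interrupt set, Active cleared. */
        st.interrupt = true;
    } else {
        /* Case 3: PRDs longer than needed, both bits set. */
        st.interrupt = true;
        st.active = true;
    }
    return st;
}
```

For example, PRDs describing 512 bytes for a 1024-byte transfer fall into case 1, and PRDs describing 2048 bytes for the same transfer fall into case 3.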




Re: [PATCH v5 5/6] hppa: Add emulation of Artist graphics

2019-12-23 Thread Philippe Mathieu-Daudé

On 12/23/19 6:50 PM, Sven Schnelle wrote:

Hi Philippe,

On Sun, Dec 22, 2019 at 01:37:48PM +0100, Philippe Mathieu-Daudé wrote:
  
+if (vga_interface_type != VGA_NONE) {

+dev = qdev_create(NULL, "artist");
+qdev_init_nofail(dev);
+s = SYS_BUS_DEVICE(dev);
+sysbus_mmio_map(s, 0, LASI_GFX_HPA);
+sysbus_mmio_map(s, 1, ARTIST_FB_ADDR);


How is this chipset connected on the board?
If it is a card you can plug on a bus, you can use a condition.
If it is soldered or part of another chipset, then it has to be mapped
unconditionally.


Depends on the Model. Hp 9000 712 and 715 had it onboard, for the B160L
we're emulating and others it was a GSC add-on card.


The B160L case is unclear, do you mean this is not the chipset on the 
machine, but the software is happy if another chipset is available?


Looking at hw/hppa/ I only see one machine:

  static void machine_hppa_machine_init(MachineClass *mc)
  {
  mc->desc = "HPPA generic machine";
  ...
  }
  DEFINE_MACHINE("hppa", machine_hppa_machine_init)

Are you saying this generic machine is able to run different physical 
hw? Why not add them? This shouldn't take long and it would be clearer, 
what do you think?


Adding different machines here in QEMU mostly means adding a class which 
declares the different properties used by each machine. Igor Mammedov 
recently suggested following the example of aspeed_machine_types[] in 
hw/arm/aspeed.c.





NetBSD/arc on MIPS Magnum, was Re: [PATCH 00/10] Fixes for DP8393X SONIC device emulation

2019-12-23 Thread Finn Thain
On Mon, 23 Dec 2019, Philippe Mathieu-Daudé wrote:

> Hi Finn,
> 
> On 12/20/19 5:24 AM, Finn Thain wrote:
> > On Sun, 15 Dec 2019, Aleksandar Markovic wrote:
> > 
> > > 
> > > Herve,
> > > 
> > > Is there any way for us to come up with an equivalent or at least
> > > approximate scenario for Jazz machines?
> > > 
> > > Regards,
> > > Aleksandar
> > > 
> > 
> > That would be useful in general, but in this case I think it might be
> > better to test NetBSD, since I have already tested Linux. (I had to fix
> > some bugs in the Linux sonic driver.)
> > 
> > I tried to boot NetBSD/arc but failed. I got a blue screen when I typed
> > "cd:boot" at the "Run A Program" prompt in the ARC menu.
> > 
> > $ ln -s NTPROM.RAW mipsel_bios.bin
> > $ mips64el-softmmu/qemu-system-mips64el -M magnum -L .
> > -drive if=scsi,unit=2,media=cdrom,format=raw,file=NetBSD-8.1-arc.iso
> > -global ds1225y.filename=nvram -global ds1225y.size=8200
> > qemu-system-mips64el: g364: invalid read at [00102000]
> > $
> > 
> > Any help would be appreciated.
> 
> Please open a new bug entry with this information at
> https://bugs.launchpad.net/qemu/+filebug
> 

I know precious little about NetBSD installation and MIPS Magnum. What I 
wrote above was guesswork. Hence this could be a NetBSD bug or user error.

Does there exist a known-good combination of NetBSD/arc and 
qemu-system-mips64el releases?

If so, I could use that to check for user error or possible NetBSD issue.

> Thanks,
> 
> Phil.
> 
> 



Re: [PATCH v9 14/15] hw/i386: Introduce the microvm machine type

2019-12-23 Thread Philippe Mathieu-Daudé

Hi Paolo,

On 10/16/19 11:29 AM, Paolo Bonzini wrote:

On 16/10/19 11:26, Philippe Mathieu-Daudé wrote:

On 10/16/19 9:46 AM, Paolo Bonzini wrote:



diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index c5c9d4900e..d399dcba52 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -92,6 +92,10 @@ config Q35
   select SMBIOS
   select FW_CFG_DMA
   +config MICROVM
+    bool


Missing:

  select ISA_BUS
  select APIC
  select IOAPIC
  select I8259
  select MC146818RTC


maybe 'select SERIAL_ISA' too?


Right, but only 'imply'.


Per the documentation, you are correct:

  Boards specify their constituent devices using ``imply`` and ``select``
  directives.  A device should be listed under ``select`` if the board
  cannot be started at all without it.  It should be listed under
  ``imply`` if (depending on the QEMU command line) the board may or
  may not be started without it.  Boards also default to false; they are
  enabled by the ``default-configs/*.mak`` for the target they apply to.

But then the build fails configured with --without-default-devices:

  LINK    x86_64-softmmu/qemu-system-x86_64
/usr/bin/ld: hw/i386/microvm.o: in function `microvm_devices_init':
hw/i386/microvm.c:157: undefined reference to `serial_hds_isa_init'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:206: qemu-system-x86_64] Error 1
make: *** [Makefile:483: x86_64-softmmu/all] Error 2

We have:

static void microvm_devices_init(MicrovmMachineState *mms)
{
    ...
    if (mms->isa_serial) {
        serial_hds_isa_init(isa_bus, 0, 1);
    }
    ...
}
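
To illustrate why the link fails, here is a toy Python model of the
``select``/``imply`` semantics quoted above. It is only a sketch, not QEMU's
minikconf: the function name `enabled_devices` and the dictionary layout are
made up for this example, and it assumes that --without-default-devices
simply turns off everything that is only implied.

```python
# Toy model of Kconfig 'select' vs 'imply' (hypothetical helper, not minikconf).
def enabled_devices(board, default_devices=True):
    """Return the set of devices enabled for a board description."""
    enabled = set(board["select"])        # 'select' is unconditional
    if default_devices:
        enabled |= set(board["imply"])    # 'imply' only acts as a default
    return enabled


# MICROVM with SERIAL_ISA implied versus selected (other devices abbreviated).
microvm_imply = {"select": ["ISA_BUS", "APIC", "IOAPIC"],
                 "imply": ["SERIAL_ISA"]}
microvm_select = {"select": ["ISA_BUS", "APIC", "IOAPIC", "SERIAL_ISA"],
                  "imply": []}

# Assumption: --without-default-devices corresponds to default_devices=False.
imply_has_serial = "SERIAL_ISA" in enabled_devices(microvm_imply, False)
select_has_serial = "SERIAL_ISA" in enabled_devices(microvm_select, False)
print(imply_has_serial, select_has_serial)
```

In this model the implied device drops out once defaults are disabled, while
microvm_devices_init() still references serial_hds_isa_init() unconditionally.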

With this diff the link succeeds:

-- >8 --
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -95,7 +95,7 @@ config Q35

 config MICROVM
 bool
-imply SERIAL_ISA
+select SERIAL_ISA
 select ISA_BUS
 select APIC
 select IOAPIC
---

I was going to send this as a patch but suddenly remembered this 
thread. I'm not sure what you want, so I prefer to ask first :)


We have CONFIG_SERIAL_ISA declared in "config-devices.h" so we could 
also use:


-- >8 --
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -153,9 +153,11 @@ static void microvm_devices_init(MicrovmMachineState *mms)

 microvm_set_rtc(mms, rtc_state);
 }

+#ifdef CONFIG_SERIAL_ISA
 if (mms->isa_serial) {
 serial_hds_isa_init(isa_bus, 0, 1);
 }
+#endif

 if (bios_name == NULL) {
 bios_name = MICROVM_BIOS_FILENAME;
---

The binary links too. The difference from the other hunk is that the 
device is no longer listed in 'qemu-system-x86_64 -M microvm -device help'.

However, I understand we prefer to avoid that kind of ifdef'ry.

Which way do you prefer?

Thanks,

Phil.




Re: [PATCH] iotests/279: Fix for non-qcow2 formats

2019-12-23 Thread John Snow



On 12/19/19 9:42 AM, Max Reitz wrote:
> First, driver=qcow2 will not work so well with non-qcow2 formats (and
> this test claims to support qcow, qed, and vmdk).
> 
> Second, vmdk will always report the backing file format to be vmdk.
> Filter that out so the output looks like for all other formats.
> 
> Third, the flat vmdk subformats do not support backing files, so they
> will not work with this test.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/279 | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/279 b/tests/qemu-iotests/279
> index 6682376808..30d29b1cb2 100755
> --- a/tests/qemu-iotests/279
> +++ b/tests/qemu-iotests/279
> @@ -38,6 +38,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  _supported_fmt qcow qcow2 vmdk qed
>  _supported_proto file
>  _supported_os Linux
> +_unsupported_imgopts "subformat=monolithicFlat" \
> + "subformat=twoGbMaxExtentFlat" \
>  
>  TEST_IMG="$TEST_IMG.base" _make_test_img 64M
>  TEST_IMG="$TEST_IMG.mid" _make_test_img -b "$TEST_IMG.base"
> @@ -45,11 +47,12 @@ _make_test_img -b "$TEST_IMG.mid"
>  
>  echo
>  echo '== qemu-img info --backing-chain =='
> -_img_info --backing-chain | _filter_img_info
> +_img_info --backing-chain | _filter_img_info | grep -v 'backing file format'
>  
>  echo
>  echo '== qemu-img info --backing-chain --image-opts =='
> -TEST_IMG="driver=qcow2,file.driver=file,file.filename=$TEST_IMG" _img_info --backing-chain --image-opts | _filter_img_info
> +TEST_IMG="driver=$IMGFMT,file.driver=file,file.filename=$TEST_IMG" _img_info --backing-chain --image-opts \
> +| _filter_img_info | grep -v 'backing file format'

Haha.

Reviewed-by: John Snow 

>  
>  # success, all done
>  echo "*** done"
> 
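
For readers unfamiliar with the filter added in the patch above, the effect of
`grep -v 'backing file format'` can be sketched in Python. The sample lines are
hypothetical stand-ins for filtered `qemu-img info` output, not a real test
log, and `drop_backing_format` is a name invented for this illustration.

```python
def drop_backing_format(lines):
    """Keep every line that does not mention the backing file format."""
    return [line for line in lines if 'backing file format' not in line]


# Hypothetical sample of filtered 'qemu-img info' output for a vmdk image.
sample = [
    'image: TEST_DIR/t.IMGFMT',
    'backing file: TEST_DIR/t.IMGFMT.mid',
    'backing file format: IMGFMT',
]

filtered = drop_backing_format(sample)
for line in filtered:
    print(line)
```

With the format line dropped, vmdk output looks like the output of formats
that do not report a backing file format at all.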




Re: [PATCH] target/ppc: fix memory dump endianness in QEMU monitor

2019-12-23 Thread Fabiano Rosas
David Gibson  writes:

> b) AFAICT this is the *only* thing that looks for the LE bit in
> hflags. Given that, and the fact that it would be wrong in most cases,
> we should remove it from hflags entirely along with this change.
>

I see there is:

static void ppc_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
{
...
ctx->le_mode = !!(env->hflags & (1 << MSR_LE));
...
}

And we call hreg_recompute_hflags in some places:

- powerpc_excp (target/ppc/excp_helper.c)
  Called from TCG do_interrupt

- ppc_cpu_reset (target/ppc/translate_init.inc.c)
  Called from spapr_machine_reset

- hreg_store_msr (target/ppc/helper_regs.h)
  This is used for migration and for do_rfi, store_msr

- h_cede (hw/ppc/spapr_hcall.c)
  QEMU-side H_CEDE hypercall implementation 


It looks like the hflags MSR_LE is being updated correctly with TCG. But
with KVM we only touch it on system_reset (and possibly h_cede? I don't
know if it is QEMU who handles it).

So I would let hflags be.
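
To make the staleness concern concrete, here is a small Python sketch. It
assumes MSR_LE is bit 0 and reduces hflags to just that one bit, which is a
simplification of what hreg_recompute_hflags does; the helper names are
invented for this example.

```python
MSR_LE = 0  # assumed bit position of the little-endian flag


def recompute_hflags(msr):
    """Simplified stand-in: copy only the LE bit from msr into hflags."""
    return msr & (1 << MSR_LE)


def le_by_shift(hflags):
    """The test used in ppc_disas_set_info before the patch."""
    return (hflags >> MSR_LE) & 1


def le_by_mask(hflags):
    """The test used in ppc_tr_init_disas_context."""
    return int(bool(hflags & (1 << MSR_LE)))


msr = 0
hflags = recompute_hflags(msr)   # e.g. at reset, guest is big-endian
msr |= 1 << MSR_LE               # guest switches to LE, hflags not recomputed

stale = le_by_shift(hflags)      # what a reader of hflags sees
fresh = (msr >> MSR_LE) & 1      # what a reader of msr sees
print(stale, fresh)
```

Both ways of testing hflags agree with each other; the mismatch only appears
when msr changes without a matching recompute.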


... Actually, I don't really know the purpose of hflags. It comes from:

  commit 3f3373166227b13e762e20d2fb51eadfa6a2d653
  Author: Fabrice Bellard 
  Date:   Wed Aug 20 23:02:09 2003 +
  
  pop ss, mov ss, x and sti disable irqs for the next instruction -
  began dispatch optimization by adding new x86 cpu 'hidden' flags
  
  
  git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@372 
c046a42c-6fe2-441c-8c8c-71466251a162

Could anyone clarify that?

Thanks

>> ---
>>  target/ppc/translate_init.inc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
>> index d33d65dff7..a0b384da9e 100644
>> --- a/target/ppc/translate_init.inc.c
>> +++ b/target/ppc/translate_init.inc.c
>> @@ -10830,7 +10830,7 @@ static void ppc_disas_set_info(CPUState *cs, disassemble_info *info)
>>  PowerPCCPU *cpu = POWERPC_CPU(cs);
>>  CPUPPCState *env = &cpu->env;
>>  
>> -if ((env->hflags >> MSR_LE) & 1) {
>> +if (msr_le) {
>>  info->endian = BFD_ENDIAN_LITTLE;
>>  }
>>  info->mach = env->bfd_mach;



Re: [PATCH v2] iotests.py: Let wait_migration wait even more

2019-12-23 Thread John Snow



On 12/19/19 1:36 PM, Max Reitz wrote:
> The "migration completed" event may be sent (on the source, to be
> specific) before the migration is actually completed, so the VM runstate
> will still be "finish-migrate" instead of "postmigrate".  So ask the
> users of VM.wait_migration() to specify the final runstate they desire
> and then poll the VM until it has reached that state.  (This should be
> over very quickly, so busy polling is fine.)
> 
> Without this patch, I see intermittent failures in the new iotest 280
> under high system load.  I have not yet seen such failures with other
> iotests that use VM.wait_migration() and query-status afterwards, but
> maybe they just occur even more rarely, or it is because they also wait
> on the destination VM to be running.
> 
> Signed-off-by: Max Reitz 

Reviewed-by: John Snow 

With Kevin's suggestion on the comment touchup or without.
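
For anyone skimming the thread, the polling step the patch adds can be tried
in isolation with a stand-in VM. This is only a sketch: `FakeVM` and
`wait_for_runstate` are hypothetical names, and the fake object just replays a
scripted list of runstates instead of talking to QEMU.

```python
class FakeVM:
    """Stand-in for iotests.VM that replays a scripted list of runstates."""

    def __init__(self, runstates):
        self._runstates = iter(runstates)
        self.polls = 0

    def qmp(self, command):
        # Only query-status is modelled in this sketch.
        assert command == 'query-status'
        self.polls += 1
        return {'return': {'status': next(self._runstates)}}


def wait_for_runstate(vm, expect_runstate):
    """Busy-poll query-status until the VM reports expect_runstate."""
    while vm.qmp('query-status')['return']['status'] != expect_runstate:
        pass


# The 'completed' event arrived while the source was still in finish-migrate.
vm = FakeVM(['finish-migrate', 'finish-migrate', 'postmigrate'])
wait_for_runstate(vm, 'postmigrate')
print(vm.polls)
```

The loop returns as soon as the expected runstate is reported, which is why
busy polling is acceptable when the transition is quick.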

> ---
> v2:
> - Stop breaking 234 and 262 [Kevin]
> ---
>  tests/qemu-iotests/234| 8 
>  tests/qemu-iotests/262| 4 ++--
>  tests/qemu-iotests/280| 2 +-
>  tests/qemu-iotests/iotests.py | 6 +-
>  4 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/tests/qemu-iotests/234 b/tests/qemu-iotests/234
> index 34c818c485..59a7f949ec 100755
> --- a/tests/qemu-iotests/234
> +++ b/tests/qemu-iotests/234
> @@ -69,9 +69,9 @@ with iotests.FilePath('img') as img_path, \
>  iotests.log(vm_a.qmp('migrate', uri='exec:cat >%s' % (fifo_a)))
>  with iotests.Timeout(3, 'Migration does not complete'):
>  # Wait for the source first (which includes setup=setup)
> -vm_a.wait_migration()
> +vm_a.wait_migration('postmigrate')
>  # Wait for the destination second (which does not)
> -vm_b.wait_migration()
> +vm_b.wait_migration('running')
>  
>  iotests.log(vm_a.qmp('query-migrate')['return']['status'])
>  iotests.log(vm_b.qmp('query-migrate')['return']['status'])
> @@ -98,9 +98,9 @@ with iotests.FilePath('img') as img_path, \
>  iotests.log(vm_b.qmp('migrate', uri='exec:cat >%s' % (fifo_b)))
>  with iotests.Timeout(3, 'Migration does not complete'):
>  # Wait for the source first (which includes setup=setup)
> -vm_b.wait_migration()
> +vm_b.wait_migration('postmigrate')
>  # Wait for the destination second (which does not)
> -vm_a.wait_migration()
> +vm_a.wait_migration('running')
>  
>  iotests.log(vm_a.qmp('query-migrate')['return']['status'])
>  iotests.log(vm_b.qmp('query-migrate')['return']['status'])
> diff --git a/tests/qemu-iotests/262 b/tests/qemu-iotests/262
> index 0963daa806..bbcb5260a6 100755
> --- a/tests/qemu-iotests/262
> +++ b/tests/qemu-iotests/262
> @@ -71,9 +71,9 @@ with iotests.FilePath('img') as img_path, \
>  iotests.log(vm_a.qmp('migrate', uri='exec:cat >%s' % (fifo)))
>  with iotests.Timeout(3, 'Migration does not complete'):
>  # Wait for the source first (which includes setup=setup)
> -vm_a.wait_migration()
> +vm_a.wait_migration('postmigrate')
>  # Wait for the destination second (which does not)
> -vm_b.wait_migration()
> +vm_b.wait_migration('running')
>  
>  iotests.log(vm_a.qmp('query-migrate')['return']['status'])
>  iotests.log(vm_b.qmp('query-migrate')['return']['status'])
> diff --git a/tests/qemu-iotests/280 b/tests/qemu-iotests/280
> index 0b1fa8e1d8..85e9114c5e 100755
> --- a/tests/qemu-iotests/280
> +++ b/tests/qemu-iotests/280
> @@ -45,7 +45,7 @@ with iotests.FilePath('base') as base_path , \
>  vm.qmp_log('migrate', uri='exec:cat > /dev/null')
>  
>  with iotests.Timeout(3, 'Migration does not complete'):
> -vm.wait_migration()
> +vm.wait_migration('postmigrate')
>  
>  iotests.log('\nVM is now stopped:')
>  iotests.log(vm.qmp('query-migrate')['return']['status'])
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 13fd8b5cd2..0b62c42851 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -668,12 +668,16 @@ class VM(qtest.QEMUQtestMachine):
>  }
>  ]))
>  
> -def wait_migration(self):
> +def wait_migration(self, expect_runstate):
>  while True:
>  event = self.event_wait('MIGRATION')
>  log(event, filters=[filter_qmp_event])
>  if event['data']['status'] == 'completed':
>  break
> +# The event may occur in finish-migrate, so wait for the expected
> +# post-migration runstate
> +while self.qmp('query-status')['return']['status'] != expect_runstate:
> +pass
>  
>  def node_info(self, node_name):
>  nodes = self.qmp('query-named-block-nodes')
> 

-- 
—js




[Bug 1745312] Re: Regression report: Disk subsystem I/O failures/issues surfacing in DOS/early Windows [two separate issues: one bisected, one root-caused]

2019-12-23 Thread John Snow
I will try to debug as time permits, but the priority of MS-DOS bugs is
not ... measurable with casual tools. However, there are a lot of other
IDE bugs on my plate that are very important, so I am hoping to grab a
bunch of IDE bugs at once, but no promises here.

Notably, our geometry detection is not very good; it's more than
possible we are misreporting values and confusing DOS. Our IDE disks are
also not very consistent about which version of the spec they are trying
to emulate, so there are likely other problems there, too.

If you'd like to debug on your own, I'd recommend enabling tracing and
enabling some of the IDE trace points. Some of them can be quite verbose,
so don't enable the data dumping ones. The control flow ones can sometimes
help you guess when the guest OS got confused, so you can walk your way
back to a register read that would have picked up some error bits, or
detect busy-waits on registers that are not changing and guess what the
guest was waiting for.

https://github.com/qemu/qemu/blob/master/docs/devel/tracing.txt
https://github.com/qemu/qemu/blob/master/hw/ide/trace-events

Ignore the AHCI and ATAPI traces, and don't use the ide_data_* traces
unless you are booting a custom firmware that only performs a strict few
IO accesses -- otherwise you'll get flooded off the map.

** Changed in: qemu
 Assignee: (unassigned) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1745312

Title:
  Regression report: Disk subsystem I/O failures/issues surfacing in
  DOS/early Windows [two separate issues: one bisected, one root-caused]

Status in QEMU:
  New

Bug description:
  [Headsup: This report is long-ish due to the amount of detail I've
  stumbled on along the way that I think is relevant to include. I can't
  speak as to the complexity of the actual bugs, but the size of this
  report should not suggest that the reproduction process is
  particularly headache-inducing.]

  Hi!

  I recently needed to fire up some ancient software for research
  purposes and got very distracted discovering and playing with old
  versions of Windows :). In the process I've discovered some glitches
  with disk I/O.

  I believe I've stumbled on two completely separate issues that
  coincidentally surfaced at the same time. It's possible that
  components of this report will be re-filed as more specific new bugs,
  but I'm not an authority on QEMU internals or how to narrow
  down/categorize what I've found.

  - The first bug only surfaces when the "isapc" machine type is used.
  It intermittently produces "General failure {read,writ}ing drive _"
  under MS-DOS 6.22, and also somehow interferes with early bootstrap of
  Windows NT 4 (in NTLDR). Enabling or disabling KVM (I'm on Linux)
  appears to make no difference whatsoever, which may help with
  debugging.

  - The second issue involves
- a WinNT4 disk image
- created by running through a bog-standard NT4 install inside QEMU 2.9.0
- which will now fail to boot in any version of QEMU - even version 1.0
  - but which VirtualBox will boot fine
- but only if I point VirtualBox at QEMU's raw disk image via a
  hacked-together VMDK file
- if the raw image is converted to VHD(X), VirtualBox will also fail
  to boot the image with exactly the same error as QEMU
- this state of affairs is not affected by image sparseness (which makes
  sense)

  I'm confident I've bisected the first issue.

  I wasn't able to bisect the second issue (as all tested versions of
  QEMU behaved identically), but I've figured out a working repro
  testcase and I believe I've managed to pin down a solid root cause.


  == #1: Intermittent I/O issues when `-M isapc` is used =

  These symptoms sometimes take a small amount of time and fiddling to
  trigger, but I AM able to consistently surface them on my machine
  after a short while. (I am very very interested to hear if others
  cannot reproduce them.)

  So, first of all:

  https://github.com/qemu/qemu/commit/306ec6c3cece7004429c79c1ac93d49919f1f1cc
(Jul 30 2013): the last version that works

  https://github.com/qemu/qemu/commit/e689f7c668cbd9d08f330e17c3dd3a059c9553d3
(Oct 30 2013): the first version that intermittently fails

  Maybe lift out and build these branches while reading. *shrug*
  (How to do this can be found at the end of this report - along with a 
time-saving ./configure line, FWIW)

  Here are the changelists between these two revisions:

  https://github.com/qemu/qemu/compare/306ec6c...e689f7c
  (Compare direction: OLD to NEW) (Commits: 166  Files changed: 192)

  https://github.com/qemu/qemu/compare/e689f7c...306ec6c
  (Compare direction: NEW to OLD) (Commits: 30   Files changed: 22)

  (Someone else more familiar with Git might know why GitHub returns
  results for both compare directions, and/or if the 2nd link is 

Re: [PATCH] target/ppc: fix memory dump endianness in QEMU monitor

2019-12-23 Thread Maxiwell S. Garcia
On Mon, Dec 23, 2019 at 05:30:43PM +1100, David Gibson wrote:
> On Thu, Dec 19, 2019 at 01:38:54PM -0300, Maxiwell S. Garcia wrote:
> > The env->hflags is computed in ppc_cpu_reset(), using the MSR register
> > as input. But at the point ppc_disas_set_info() is called the MSR_LE bit
> > in env->hflags doesn't contain the same information that env->msr.
> > 
> > Signed-off-by: Maxiwell S. Garcia 
> > Signed-off-by: Fabiano Rosas 
> 
> I think the change is ok as far as it goes but,
> 
> a) the commit message should expand on what the practical effect of
> this is.  Looking, I think the only thing this affects is DEBUG_DISAS
> output (i.e. very rarely) which is worth noting.

Ok, I will do that. I hit this bug using the 'x/i' command in the QEMU
monitor with an LE guest.

> 
> b) AFAICT this is the *only* thing that looks for the LE bit in
> hflags. Given that, and the fact that it would be wrong in most cases,
> we should remove it from hflags entirely along with this change.
> 

I was changing the code to remove this LE bit from hflags and I found the
function 'helper_store_hid0_601()' in misc_helper.c, which manipulates the
'hflags'. The commit 056401eae6 says:

"Implement PowerPC 601 HID0 register, needed for little-endian mode support.
As a consequence, we need to merge hflags coming from MSR with other ones.
Use little-endian mode from hflags instead of MSR during code translation."

So, is the 'hflags' necessary here? Can we use MSR instead of hflags to
change the endianness in this function?

Thank you

> > ---
> >  target/ppc/translate_init.inc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> > index d33d65dff7..a0b384da9e 100644
> > --- a/target/ppc/translate_init.inc.c
> > +++ b/target/ppc/translate_init.inc.c
> > @@ -10830,7 +10830,7 @@ static void ppc_disas_set_info(CPUState *cs, disassemble_info *info)
> >  PowerPCCPU *cpu = POWERPC_CPU(cs);
> >  CPUPPCState *env = &cpu->env;
> >  
> > -if ((env->hflags >> MSR_LE) & 1) {
> > +if (msr_le) {
> >  info->endian = BFD_ENDIAN_LITTLE;
> >  }
> >  info->mach = env->bfd_mach;
> 
> -- 
> David Gibson  | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au| minimalist, thank you.  NOT _the_ 
> _other_
>   | _way_ _around_!
> http://www.ozlabs.org/~dgibson





[Bug 1835477] Re: Converting qcow2 to vmdk on MacOSX results in a non-bootable image

2019-12-23 Thread John Snow
*** This bug is a duplicate of bug 1776920 ***
https://bugs.launchpad.net/bugs/1776920

Does the problem happen only when the image is on APFS? When the
destination is on APFS? Neither? Try to see if it's the filesystem: use
OSX to convert images on a non-APFS formatted external drive to see if
that improves your luck.

I'm assuming this is a duplicate of 1776920 which is still open because
we have no OSX developers willing or able to debug this problem.

** This bug has been marked a duplicate of bug 1776920
   qemu-img convert on Mac OSX creates corrupt images

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1835477

Title:
  Converting qcow2 to vmdk on MacOSX results in a non-bootable image

Status in QEMU:
  New

Bug description:
  When using qemu-img convert -O vmdk  with version 3.1.0 or 4.0.0 on
  OSX (10.14.3) with a qcow2 image  (https://cloud-
  images.ubuntu.com/bionic/20190703/bionic-server-cloudimg-amd64.img),
  the resulting image is not bootable.

  Running the same command on Ubuntu 18.04 results in a bootable image
  as expected

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1835477/+subscriptions



[Bug 1816805] Re: Cannot create cdrom device with open tray and cache option

2019-12-23 Thread John Snow
Hi, versions 2.5 and 2.11 are quite old (though version 2.10 fixed the
bug mentioned in the Red Hat BZ, so I think this might be a different
bug.)


It's not clear at what step this is failing or where you are seeing the error 
message, so:

1. What is your full command line for QEMU?
2. Do you see this error message when migrating? If so, what does the 
destination QEMU command line look like?
3. Does the problem reproduce on a currently-supported upstream QEMU? (4.1.1, 
4.2.0)

** Changed in: qemu
   Status: New => Incomplete

** Changed in: qemu
 Assignee: (unassigned) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1816805

Title:
  Cannot create cdrom device with open tray and cache option

Status in QEMU:
  Incomplete

Bug description:
  When trying to create cdrom device with open tray and either of
  "cache" or "discard" options specified, I get the following error:

  qemu-system-x86_64: -drive if=none,id=drive-
  ide0-0-0,readonly=on,cache=writeback,discard=unmap,throttling.iops-
  total=900: Must specify either driver or file

  This bug essentially forbids live migration of VMs with open cdrom
  trays.

  I was able to find the same bug at RedHat:
  https://bugzilla.redhat.com/show_bug.cgi?id=1338638

  The bug was encountered in versions 2.5 and 2.11.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1816805/+subscriptions



[Bug 1719282] Re: Unable to boot after drive-mirror

2019-12-23 Thread John Snow
OK, so we're only talking about migrating a disk and not a whole VM, I
misunderstood. However... are you using qemu *2.7*? That's quite old!
Before digging into this further I need to insist that you try on a
supported release, either 4.0.1, 4.1.1, or 4.2.0.

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1719282

Title:
  Unable to boot after drive-mirror

Status in QEMU:
  Incomplete

Bug description:
  Hi,
  I am using "drive-mirror" qmp block-job command to transfer VM disk image to 
other path (different physical disk on host).
  Unfortunately after shutting down and starting from new image, VM is unable 
to boot and grub enters rescue mode, displaying the following error:
  ```
  error: file '/grub/i386-pc/normal.mod' not found.
  Entering rescue mode...
  grub rescue>
  ```

  To investigate the problem, I compared both RAW images using linux
  "cmp -l" command and found out that they differ in 569028 bytes
  starting from address 185598977 to 252708864 which are located on
  /boot partition.
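
  A comparison like the "cmp -l" run described above can also be sketched in
  Python. This is a minimal illustration rather than the reporter's actual
  procedure: `diff_summary` is a hypothetical helper, offsets are 1-based as in
  cmp, and trailing bytes of a longer file are ignored.

```python
import os
import tempfile


def diff_summary(path_a, path_b, chunk_size=1 << 20):
    """Return (count, first, last) of differing 1-based byte offsets."""
    count, first, last = 0, None, None
    offset = 0
    with open(path_a, 'rb') as fa, open(path_b, 'rb') as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a or not b:
                break  # stop at the end of the shorter file
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        pos = offset + i + 1
                        count += 1
                        if first is None:
                            first = pos
                        last = pos
            offset += min(len(a), len(b))
    return count, first, last


# Tiny self-contained demo with two temporary files standing in for images.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, 'original.raw')
    mirrored = os.path.join(tmp, 'mirrored.raw')
    with open(original, 'wb') as f:
        f.write(b'abcdefgh')
    with open(mirrored, 'wb') as f:
        f.write(b'abXdeYgh')
    result = diff_summary(original, mirrored)

print(result)
```

  On real disk images the first and last offsets can then be mapped onto the
  partition table to see which partition the differences fall in.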

  So I mounted /boot partition of mirrored RAW image on host OS and it
  seems that file-system is broken and grub folder is not recognized.
  But /boot on original RAW image has no problem.

  Mirrored Image:
  ls -l /mnt/vm-boot/
  ls: cannot access /mnt/vm-boot/grub: Structure needs cleaning
  total 38168
  -rw-r--r-- 1 root root   157721 Oct 19  2016 config-3.16.0-4-amd64
  -rw-r--r-- 1 root root   129281 Sep 20  2015 config-3.2.0-4-amd64
  d? ? ??   ?? grub
  -rw-r--r-- 1 root root 15739360 Nov  2  2016 initrd.img-3.16.0-4-amd64
  -rw-r--r-- 1 root root 12115412 Oct 10  2015 initrd.img-3.2.0-4-amd64
  drwxr-xr-x 2 root root12288 Oct  7  2013 lost+found
  -rw-r--r-- 1 root root  2679264 Oct 19  2016 System.map-3.16.0-4-amd64
  -rw-r--r-- 1 root root  2114662 Sep 20  2015 System.map-3.2.0-4-amd64
  -rw-r--r-- 1 root root  3126448 Oct 19  2016 vmlinuz-3.16.0-4-amd64
  -rw-r--r-- 1 root root  2842592 Sep 20  2015 vmlinuz-3.2.0-4-amd64

  Original Image:
  ls /mnt/vm-boot/ -l
  total 38173
  -rw-r--r-- 1 root root   157721 Oct 19  2016 config-3.16.0-4-amd64
  -rw-r--r-- 1 root root   129281 Sep 20  2015 config-3.2.0-4-amd64
  drwxr-xr-x 5 root root 5120 Nov  2  2016 grub
  -rw-r--r-- 1 root root 15739360 Nov  2  2016 initrd.img-3.16.0-4-amd64
  -rw-r--r-- 1 root root 12115412 Oct 10  2015 initrd.img-3.2.0-4-amd64
  drwxr-xr-x 2 root root12288 Oct  7  2013 lost+found
  -rw-r--r-- 1 root root  2679264 Oct 19  2016 System.map-3.16.0-4-amd64
  -rw-r--r-- 1 root root  2114662 Sep 20  2015 System.map-3.2.0-4-amd64
  -rw-r--r-- 1 root root  3126448 Oct 19  2016 vmlinuz-3.16.0-4-amd64
  -rw-r--r-- 1 root root  2842592 Sep 20  2015 vmlinuz-3.2.0-4-amd64

  ls /mnt/vm-boot/grub/ -l
  total 2376
  -rw-r--r-- 1 root root  48 Oct  7  2013 device.map
  drwxr-xr-x 2 root root1024 Oct 10  2015 fonts
  -r--r--r-- 1 root root9432 Nov  2  2016 grub.cfg
  -rw-r--r-- 1 root root1024 Oct  7  2013 grubenv
  drwxr-xr-x 2 root root6144 Aug  6  2016 i386-pc
  drwxr-xr-x 2 root root1024 Aug  6  2016 locale
  -rw-r--r-- 1 root root 2400500 Aug  6  2016 unicode.pf2

  qemu Version: 2.7.0-10

  Host OS: Debian 8x64
  Guest OS: Debian 8x64

  QMP Commands log:
  socat UNIX-CONNECT:/var/run/qemu-server/48016.qmp STDIO
  {"QMP": {"version": {"qemu": {"micro": 0, "minor": 7, "major": 2}, "package": 
"pve-qemu-kvm_2.7.0-10"}, "capabilities": []}}
  { "execute": "qmp_capabilities" }
  {"return": {}}
  { "execute": "drive-mirror",
"arguments": {
  "device": "drive-ide0",
  "target": "/diskc/48016/vm-48016-disk-2.raw",
  "sync": "full",
  "mode": "absolute-paths",
  "speed": 0
}
  }
  {"return": {}}
  {"timestamp": {"seconds": 1506331591, "microseconds": 623095}, "event": 
"BLOCK_JOB_READY", "data": {"device": "drive-ide0", "len": 269445758976, 
"offset": 269445758976, "speed": 0, "type": "mirror"}}
  {"timestamp": {"seconds": 1506332641, "microseconds": 245272}, "event": 
"SHUTDOWN"}
  {"timestamp": {"seconds": 1506332641, "microseconds": 377751}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "drive-ide0", "len": 271707340800, 
"offset": 271707340800, "speed": 0, "type": "mirror"}}

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1719282/+subscriptions



[Bug 1814420] Re: drive-backup with iscsi, it will failed "Could not create image: Invalid argument"

2019-12-23 Thread John Snow
Hi, qemu version 3.1 is a bit old in terms of upstream support. Can you
confirm that this is still an issue on 4.2 ?

** Changed in: qemu
   Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1814420

Title:
  drive-backup with iscsi, it will failed "Could not create image:
  Invalid argument"

Status in QEMU:
  Incomplete

Bug description:
  I use iscsi protocol to drive-backup:

  ---iscsi target---
  yum -y install targetcli python-rtslib
  systemctl start target
  systemctl enable target
  targetcli /iscsi create iqn.2019-01.com.iaas
  targetcli /iscsi/iqn.2019-01.com.iaas/tpg1 set attribute authentication=0 
demo_mode_write_protect=0 generate_node_acls=1
  targetcli /iscsi/iqn.2019-01.com.iaas/tpg1/portals create 192.168.1.1 3260
  targetcli /backstores/fileio create file1 /opt/file1 2G
  targetcli /iscsi/iqn.2019-01.com.iaas/tpg1/luns create 
/backstores/fileio/file1
  ---

  Now, '{ "execute" : "drive-backup" , "arguments" : { "device" :
  "drive-virtio-disk0" , "sync" : "top" , "target" :
  "iscsi://192.168.1.1:3260/iqn.2019-01.com.iaas/0" } }'

  It may fail:
  {"id":"libvirt-1785","error":{"class":"GenericError","desc":"Could not
  create image: Invalid argument"}}

  However, this failure only appears on the first attempt: when I
  retry, it works fine.

  If I then restart the VM, the first try fails again with 'Could not
  create image: Invalid argument', and the second try works fine.

  ---
  Host: centos 7.5
  qemu version: 2.12 and 3.1.0
  qemu command line: qemu-system-x86_64 -name guest=test,debug-threads=on -S 
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-190-test./master-key.aes
 -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off,mem-merge=off -m 
1024 -mem-prealloc -mem-path /dev/hugepages1G/libvirt/qemu/190-test -realtime 
mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 
1c8611c2-a18a-4b1c-b40b-9d82040eafa4 -smbios type=1,manufacturer=IaaS 
-no-user-config -nodefaults -chardev socket,id=charmonitor,fd=31,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive 
file=/opt/vol/sas/fb0c7c37-13e7-41fe-b3f8-f0fbaaeec7ce,format=qcow2,if=none,id=drive-virtio-disk0,cache=writeback
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on
 -drive 
file=/opt/vol/sas/bde66671-536d-49cd-8b46-a4f1ea7be513,format=qcow2,if=none,id=drive-virtio-disk1,cache=writeback
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1,write-cache=on
 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:85:45:3e:d4:3a,bus=pci.0,addr=0x6 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev socket,id=charchannel0,fd=35,server,nowait -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0,password -device 
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1814420/+subscriptions



[Bug 1808928] Re: Bitmap Extra data is not supported

2019-12-23 Thread John Snow
For now, QEMU cannot accept images with extra bitmap data, because QEMU
isn't aware of the semantics of those fields, so we cannot even allow
the image to load, just in case.

However, we SHOULD allow qemu-img check --repair to detect such bitmaps
as corruption so that images can be scrubbed and recovered. I will add
that to my TODO list.

Meanwhile, I believe the root cause of your problem here is a data
corruption event, but it's hard to tell exactly what it might be because
of the extra_data flag ... I can try to have some patches ready for you
in January that try to "ignore" the data and analyze the rest of the
image as best as possible, which might help us see what else went wrong.

** Changed in: qemu
 Assignee: (unassigned) => John Snow (jnsnow)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1808928

Title:
  Bitmap Extra data is not supported

Status in QEMU:
  New

Bug description:
  I am using dirty bitmaps and drive-backup. It works as expected.

  Lately, I have run into a disastrous error. There is no information
  about this situation. I cannot reach/open/attach/info or do anything
  else with a qcow2 file.

  virsh version
  Compiled against library: libvirt 4.10.0
  Using library: libvirt 4.10.0
  Using API: QEMU 4.10.0
  Running hypervisor: QEMU 2.12.0

  "qemu-img: Could not open '/var/lib/libvirt/images/test.qcow2': Bitmap
  extra data is not supported"

  What does that mean? What should I do?
  I cannot remove the bitmap. I cannot open the image or query it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1808928/+subscriptions



[Bug 1810400] Re: Failed to make dirty bitmaps writable: Can't update bitmap directory: Operation not permitted

2019-12-23 Thread John Snow
Hi, this should be fixed in 4.2. It looks like you're still on 2.12.0
based on the report. Closing under the assumption this is fixed. If you
discover otherwise, please feel free to re-open (or file a new bug.)

** Changed in: qemu
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1810400

Title:
   Failed to make dirty bitmaps writable: Can't update bitmap directory:
  Operation not permitted

Status in QEMU:
  Fix Released

Bug description:
  blockcommit does not work if there is dirty block.

  virsh version
  Compiled against library: libvirt 4.10.0
  Using library: libvirt 4.10.0
  Using API: QEMU 4.10.0
  Running hypervisor: QEMU 2.12.0

  Scenario:
  1. Create an instance
  2. Add dirty bitmap to vm disk.
  3. create a snapshot(external or internal)
  4. revert snapshot or blockcommit disk

  virsh blockcommit rota-test vda  --active
  Active Block Commit started

  virsh blockjob rota-test vda --info
  No current block job for vda

  
  rota-test.log:
   starting up libvirt version: 4.10.0, package: 1.el7 (CBS , 
2018-12-05-12:27:12, c1bk.rdu2.centos.org), qemu version: 
2.12.0qemu-kvm-ev-2.12.0-18.el7_6.1.1, kernel: 4.1.12-103.9.7.el7uek.x86_64, 
hostname: vm-kvm07
  LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin 
QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name 
guest=rota-test,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-101-rota-test/master-key.aes
 -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off -cpu 
SandyBridge,hypervisor=on,xsaveopt=on -m 8192 -realtime mlock=off -smp 
3,sockets=3,cores=1,threads=1 -uuid 50dec55c-a80a-4adc-a788-7ba23230064e 
-no-user-config -nodefaults -chardev socket,id=charmonitor,fd=59,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global 
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive 
file=/var/lib/libvirt/images/rota-0003,format=qcow2,if=none,id=drive-virtio-disk0,cache=none
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on
 -netdev tap,fd=61,id=hostnet0,vhost=on,vhostfd=62 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e8:09:94,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 -spice port=5902,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
 -chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg 
timestamp=on
  2019-01-03T07:50:43.810142Z qemu-kvm: -chardev pty,id=charserial0: char 
device redirected to /dev/pts/3 (label charserial0)
  main_channel_link: add main channel client
  red_qxl_set_cursor_peer:
  inputs_connect: inputs channel client create
  inputs_channel_detach_tablet:
  #block339: Failed to make dirty bitmaps writable: Can't update bitmap 
directory: Operation not permitted

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1810400/+subscriptions



[PATCH v3 2/2] tests/ide-test: Create a single unit-test covering more PRDT cases

2019-12-23 Thread Alexander Popov
Fuzzing the Linux kernel with syzkaller revealed a way to crash QEMU
using a special SCSI_IOCTL_SEND_COMMAND. It hits the assertion in
ide_dma_cb() introduced in commit a718978ed58a in July 2015.
Currently this bug is not reproduced by the unit tests.

Let's improve the ide-test to cover more PRDT cases including one
that causes this particular qemu crash.

The test is developed according to the Programming Interface for
Bus Master IDE Controller (Revision 1.0 5/16/94).

Signed-off-by: Alexander Popov 
---
 tests/ide-test.c | 174 ---
 1 file changed, 74 insertions(+), 100 deletions(-)

diff --git a/tests/ide-test.c b/tests/ide-test.c
index 0277e7d5a9..5cfd97f915 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -445,104 +445,81 @@ static void test_bmdma_trim(void)
 test_bmdma_teardown(qts);
 }
 
-static void test_bmdma_short_prdt(void)
-{
-QTestState *qts;
-QPCIDevice *dev;
-QPCIBar bmdma_bar, ide_bar;
-uint8_t status;
-
-PrdtEntry prdt[] = {
-{
-.addr = 0,
-.size = cpu_to_le32(0x10 | PRDT_EOT),
-},
-};
-
-qts = test_bmdma_setup();
-
-dev = get_pci_device(qts, &bmdma_bar, &ide_bar);
-
-/* Normal request */
-status = send_dma_request(qts, CMD_READ_DMA, 0, 1,
-  prdt, ARRAY_SIZE(prdt), NULL);
-g_assert_cmphex(status, ==, 0);
-assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
-
-/* Abort the request before it completes */
-status = send_dma_request(qts, CMD_READ_DMA | CMDF_ABORT, 0, 1,
-  prdt, ARRAY_SIZE(prdt), NULL);
-g_assert_cmphex(status, ==, 0);
-assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
-free_pci_device(dev);
-test_bmdma_teardown(qts);
-}
-
-static void test_bmdma_one_sector_short_prdt(void)
-{
-QTestState *qts;
-QPCIDevice *dev;
-QPCIBar bmdma_bar, ide_bar;
-uint8_t status;
-
-/* Read 2 sectors but only give 1 sector in PRDT */
-PrdtEntry prdt[] = {
-{
-.addr = 0,
-.size = cpu_to_le32(0x200 | PRDT_EOT),
-},
-};
-
-qts = test_bmdma_setup();
-
-dev = get_pci_device(qts, &bmdma_bar, &ide_bar);
-
-/* Normal request */
-status = send_dma_request(qts, CMD_READ_DMA, 0, 2,
-  prdt, ARRAY_SIZE(prdt), NULL);
-g_assert_cmphex(status, ==, 0);
-assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
-
-/* Abort the request before it completes */
-status = send_dma_request(qts, CMD_READ_DMA | CMDF_ABORT, 0, 2,
-  prdt, ARRAY_SIZE(prdt), NULL);
-g_assert_cmphex(status, ==, 0);
-assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
-free_pci_device(dev);
-test_bmdma_teardown(qts);
-}
-
-static void test_bmdma_long_prdt(void)
+/*
+ * This test is developed according to the Programming Interface for
+ * Bus Master IDE Controller (Revision 1.0 5/16/94)
+ */
+static void test_bmdma_various_prdts(void)
 {
-QTestState *qts;
-QPCIDevice *dev;
-QPCIBar bmdma_bar, ide_bar;
-uint8_t status;
-
-PrdtEntry prdt[] = {
-{
-.addr = 0,
-.size = cpu_to_le32(0x1000 | PRDT_EOT),
-},
-};
-
-qts = test_bmdma_setup();
-
-dev = get_pci_device(qts, &bmdma_bar, &ide_bar);
-
-/* Normal request */
-status = send_dma_request(qts, CMD_READ_DMA, 0, 1,
-  prdt, ARRAY_SIZE(prdt), NULL);
-g_assert_cmphex(status, ==, BM_STS_ACTIVE | BM_STS_INTR);
-assert_bit_clear(qpci_io_readb(dev, ide_bar, reg_status), DF | ERR);
+int sectors = 0;
+uint32_t size = 0;
+
+for (sectors = 1; sectors <= 256; sectors *= 2) {
+QTestState *qts = NULL;
+QPCIDevice *dev = NULL;
+QPCIBar bmdma_bar, ide_bar;
+
+qts = test_bmdma_setup();
+dev = get_pci_device(qts, &bmdma_bar, &ide_bar);
+
+for (size = 0; size < 65536; size += 256) {
+uint32_t req_size = sectors * 512;
+uint32_t prd_size = size & 0xfffe; /* bit 0 is always set to 0 */
+uint8_t ret = 0;
+uint8_t req_status = 0;
+uint8_t abort_req_status = 0;
+PrdtEntry prdt[] = {
+{
+.addr = 0,
+.size = cpu_to_le32(size | PRDT_EOT),
+},
+};
+
+/* A value of zero in PRD size indicates 64K */
+if (prd_size == 0) {
+prd_size = 65536;
+}
+
+/*
+ * 1. If PRDs specified a smaller size than the IDE transfer
+ * size, then the Interrupt and Active bits in the Controller
+ * status register are not set (Error Condition).
+ *
+ * 2. If the size of the physical memory regions was equal to
+ * the IDE device transfer size, the Interrupt bit in the
+  

[PATCH v3 0/2] ide: Fix incorrect handling of some PRDTs and add the corresponding unit-test

2019-12-23 Thread Alexander Popov
Fuzzing the Linux kernel with syzkaller revealed a way to crash QEMU
using a special SCSI_IOCTL_SEND_COMMAND. It hits the assertion in
ide_dma_cb() introduced in commit a718978ed58a in July 2015.

This patch series fixes incorrect handling of some PRDTs in ide_dma_cb()
and improves the ide-test to cover more PRDT cases (including one
that causes that particular qemu crash).

Changes from v2 (thanks to Kevin Wolf for the feedback):
 - the assertion about prepare_buf() return value is improved;
 - the patch order is reversed to keep the tree bisectable;
 - the unit-test performance is improved -- now it runs in 8 seconds
   instead of 3 minutes on my laptop.

Alexander Popov (2):
  ide: Fix incorrect handling of some PRDTs in ide_dma_cb()
  tests/ide-test: Create a single unit-test covering more PRDT cases

 hw/ide/core.c|  30 +---
 tests/ide-test.c | 174 ---
 2 files changed, 96 insertions(+), 108 deletions(-)

-- 
2.23.0




[PATCH v3 1/2] ide: Fix incorrect handling of some PRDTs in ide_dma_cb()

2019-12-23 Thread Alexander Popov
Commit a718978ed58a from July 2015 introduced an assertion which
implies that the size of successful DMA transfers handled in ide_dma_cb()
should be a multiple of 512 (the size of a sector). But guest systems can
initiate DMA transfers that don't fit this requirement.

To fix that, let's check the number of bytes prepared for the transfer
by the prepare_buf() handler. The code in ide_dma_cb() must behave
according to the Programming Interface for Bus Master IDE Controller
(Revision 1.0 5/16/94):
1. If PRDs specified a smaller size than the IDE transfer
   size, then the Interrupt and Active bits in the Controller
   status register are not set (Error Condition).
2. If the size of the physical memory regions was equal to
   the IDE device transfer size, the Interrupt bit in the
   Controller status register is set to 1, Active bit is set to 0.
3. If PRDs specified a larger size than the IDE transfer size,
   the Interrupt and Active bits in the Controller status register
   are both set to 1.
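
The three cases above can be sketched as a small standalone helper. This
is only an illustration, not code from the patch: the function
sketch_expected_bm_status() and the SKETCH_* names are made up here, and
the bit values assume the usual Bus Master IDE status register layout
(bit 0 is Active, bit 2 is Interrupt).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values assume bit 0 = Active, bit 2 = Interrupt. */
enum {
    SKETCH_BM_STS_ACTIVE = 0x01,
    SKETCH_BM_STS_INTR   = 0x04,
};

/*
 * Return the status bits expected after a transfer of xfer_bytes when
 * the PRD table describes prd_bytes of memory in total.
 */
uint8_t sketch_expected_bm_status(uint32_t prd_bytes, uint32_t xfer_bytes)
{
    assert(xfer_bytes > 0);

    if (prd_bytes < xfer_bytes) {
        /* Case 1: PRDs too short, error condition, no bits set. */
        return 0;
    }
    if (prd_bytes == xfer_bytes) {
        /* Case 2: exact fit, Interrupt set, Active cleared. */
        return SKETCH_BM_STS_INTR;
    }
    /* Case 3: PRDs longer than the transfer, both bits set. */
    return SKETCH_BM_STS_INTR | SKETCH_BM_STS_ACTIVE;
}
```

For a one-sector (512-byte) read, a 0x10-byte PRD falls into case 1, a
0x200-byte PRD into case 2, and a 0x1000-byte PRD into case 3.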

Signed-off-by: Alexander Popov 
---
 hw/ide/core.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 754ff4dc34..8eb766 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -849,6 +849,7 @@ static void ide_dma_cb(void *opaque, int ret)
 int64_t sector_num;
 uint64_t offset;
 bool stay_active = false;
+int32_t prep_size = 0;
 
 if (ret == -EINVAL) {
 ide_dma_error(s);
@@ -863,13 +864,15 @@ static void ide_dma_cb(void *opaque, int ret)
 }
 }
 
-n = s->io_buffer_size >> 9;
-if (n > s->nsector) {
-/* The PRDs were longer than needed for this request. Shorten them so
- * we don't get a negative remainder. The Active bit must remain set
- * after the request completes. */
+if (s->io_buffer_size > s->nsector * 512) {
+/*
+ * The PRDs were longer than needed for this request.
+ * The Active bit must remain set after the request completes.
+ */
 n = s->nsector;
 stay_active = true;
+} else {
+n = s->io_buffer_size >> 9;
 }
 
 sector_num = ide_get_sector(s);
@@ -892,9 +895,20 @@ static void ide_dma_cb(void *opaque, int ret)
 n = s->nsector;
 s->io_buffer_index = 0;
 s->io_buffer_size = n * 512;
-if (s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size) < 512) {
-/* The PRDs were too short. Reset the Active bit, but don't raise an
- * interrupt. */
+prep_size = s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size);
+/* prepare_buf() must succeed and respect the limit */
+assert(prep_size >= 0 && prep_size <= n * 512);
+
+/*
+ * Now prep_size stores the number of bytes in the sglist, and
+ * s->io_buffer_size stores the number of bytes described by the PRDs.
+ */
+
+if (prep_size < n * 512) {
+/*
+ * The PRDs are too short for this request. Error condition!
+ * Reset the Active bit and don't raise the interrupt.
+ */
 s->status = READY_STAT | SEEK_STAT;
 dma_buf_commit(s, 0);
 goto eot;
-- 
2.23.0




Re: [PATCH v5 5/6] hppa: Add emulation of Artist graphics

2019-12-23 Thread Sven Schnelle
Hi Philippe,

On Sun, Dec 22, 2019 at 01:37:48PM +0100, Philippe Mathieu-Daudé wrote:
> >  
> > +if (vga_interface_type != VGA_NONE) {
> > +dev = qdev_create(NULL, "artist");
> > +qdev_init_nofail(dev);
> > +s = SYS_BUS_DEVICE(dev);
> > +sysbus_mmio_map(s, 0, LASI_GFX_HPA);
> > +sysbus_mmio_map(s, 1, ARTIST_FB_ADDR);
> 
> How is this chipset connected on the board?
> If it is a card you can plug on a bus, you can use a condition.
> If it is soldered or part of another chipset, then it has to be mapped
> unconditionally.

Depends on the model. The HP 9000 712 and 715 had it onboard; for the
B160L we're emulating, and others, it was a GSC add-on card.

Regards
Sven



Re: [PATCH] virtio: add the queue number check

2019-12-23 Thread Paolo Bonzini
On 23/12/19 15:25, Michael S. Tsirkin wrote:
> On Mon, Dec 23, 2019 at 12:02:18PM +0100, Paolo Bonzini wrote:
>> On 23/12/19 10:18, Yang Zhong wrote:
>>>   At this time, the queue number in the front-end block driver is 2, but
>>>   the queue number on the QEMU side is still 4. So the guest virtio_blk
>>>   driver will fail to create the vq with the backend.
>>
>> Where?
>>
>>>   There is no "set back"
>>>   mechanism for the block driver to inform the backend of this new
>>>   queue number. So, I added this check on the QEMU side.
>>
>> Perhaps the guest kernel should still create the virtqueues, and just
>> not use them.  In any case, now that you have explained it, it is
>> certainly a guest bug.
> 
> Paolo do you understand where the bug is?

No, I asked above where the virtio_blk driver fails to create the
virtqueues.  But it shouldn't; it is legal for the guest not to
configure all virtqueues, and QEMU knows to ignore the extra ones.  For
example, firmware can ignore virtio-scsi request queues above the first,
or ignore the virtio-scsi control and event queues (see
src/hw/virtio-scsi.c in SeaBIOS, it only calls vp_find_vq with index 2).

In particular this is the reason why request queues for virtio-scsi are
numbered 2 and above, rather than starting from zero: this way, the
guest can just pretend that unnecessary queues do not exist, and still
keep the virtqueue numbers consecutive.
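
As a rough sketch of that numbering (illustrative only; the SKETCH_*
constants and both helper names are invented for this example, and the
layout assumed is control queue 0, event queue 1, request queues from 2):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed virtio-scsi virtqueue layout; names are hypothetical. */
enum {
    SKETCH_VSCSI_CTRL_VQ      = 0,
    SKETCH_VSCSI_EVENT_VQ     = 1,
    SKETCH_VSCSI_FIRST_REQ_VQ = 2,
};

/* Virtqueue number of the req_idx-th request queue (0-based). */
uint16_t sketch_vscsi_request_vq(uint16_t req_idx)
{
    return (uint16_t)(SKETCH_VSCSI_FIRST_REQ_VQ + req_idx);
}

/*
 * Highest virtqueue number a guest touches when it uses only the first
 * nr_req request queues. Everything above it is simply left unconfigured.
 */
uint16_t sketch_vscsi_last_used_vq(uint16_t nr_req)
{
    assert(nr_req > 0);
    return sketch_vscsi_request_vq((uint16_t)(nr_req - 1));
}
```

A firmware that only needs one request queue ends up using index 2,
which matches the vp_find_vq call mentioned above.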

> E.g. I see this in vhost user block:
> 
> /* Kick right away to begin processing requests already in vring */
> for (i = 0; i < s->dev.nvqs; i++) {
> VirtQueue *kick_vq = virtio_get_queue(vdev, i);
> 
> if (!virtio_queue_get_desc_addr(vdev, i)) {
> continue;
> }
> event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
> }
> 
> which is an (admittedly hacky) way to skip VQs which
> were not configured by the guest

Right, this is an example of QEMU ignoring extra virtqueues.

Paolo

> 
> 
>>>   Since the current virtio-blk and vhost-user-blk devices always
>>>   use 1 queue by default, it's hard to find this issue.
>>>
>>>   I checked the guest kernel drivers; virtio-scsi and virtio-blk both
>>>   have the same check in their driver probe:
>>>
>>>   num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
>>>  
>>>   It's possible for the guest driver to have a different queue number
>>>   from the QEMU side.
>>>
>>>   I also want to fix this issue on the guest driver side, but currently
>>>   there is no better solution for it.
>>>
>>>   By the way, I did not try scsi with this corner case; I only checked
>>>   the driver and QEMU code and found the same issue. Thanks!
>>>
>>>   Yang
>>>
 Paolo

> Signed-off-by: Yang Zhong 
> ---
>  hw/block/vhost-user-blk.c | 11 +++
>  hw/block/virtio-blk.c | 11 ++-
>  hw/scsi/virtio-scsi.c | 12 
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 63da9bb619..250e72abe4 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -23,6 +23,8 @@
>  #include "qom/object.h"
>  #include "hw/qdev-core.h"
>  #include "hw/qdev-properties.h"
> +#include "qemu/option.h"
> +#include "qemu/config-file.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user-blk.h"
>  #include "hw/virtio/virtio.h"
> @@ -391,6 +393,7 @@ static void vhost_user_blk_device_realize(DeviceState 
> *dev, Error **errp)
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>  Error *err = NULL;
> +unsigned cpus;
>  int i, ret;
>  
>  if (!s->chardev.chr) {
> @@ -403,6 +406,14 @@ static void 
> vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  return;
>  }
>  
> +cpus = 
> qemu_opt_get_number(qemu_opts_find(qemu_find_opts("smp-opts"), NULL),
> +   "cpus", 0);
> +if (s->num_queues > cpus ) {
> +error_setg(errp, "vhost-user-blk: the queue number should be 
> equal "
> +"or less than vcpu number");
> +return;
> +}
> +
>  if (!s->queue_size) {
>  error_setg(errp, "vhost-user-blk: queue size must be non-zero");
>  return;
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index d62e6377c2..b2f4d01148 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -18,6 +18,8 @@
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
>  #include "trace.h"
> +#include "qemu/option.h"
> +#include "qemu/config-file.h"
>  #include "hw/block/block.h"
>  #include "hw/qdev-properties.h"
>  #include "sysemu/blockdev.h"
> @@ -1119,7 +1121,7 @@ static void virtio_blk_device_realize(DeviceState 
> *dev, Error **errp)
> 

Re: [PATCH 00/10] Fixes for DP8393X SONIC device emulation

2019-12-23 Thread Philippe Mathieu-Daudé

Hi Finn,

On 12/20/19 5:24 AM, Finn Thain wrote:

On Sun, 15 Dec 2019, Aleksandar Markovic wrote:



Herve,

Is there any way for us to come up with an equivalent or at least
approximate scenario for Jazz machines?

Regards,
Aleksandar



That would be useful in general, but in this case I think it might be
better to test NetBSD, since I have already tested Linux. (I had to fix
some bugs in the Linux sonic driver.)

I tried to boot NetBSD/arc but failed. I got a blue screen when I typed
"cd:boot" at the "Run A Program" prompt in the ARC menu.

$ ln -s NTPROM.RAW mipsel_bios.bin
$ mips64el-softmmu/qemu-system-mips64el -M magnum -L .
-drive if=scsi,unit=2,media=cdrom,format=raw,file=NetBSD-8.1-arc.iso
-global ds1225y.filename=nvram -global ds1225y.size=8200
qemu-system-mips64el: g364: invalid read at [00102000]
$

Any help would be appreciated.


Please open a new bug entry with this information at 
https://bugs.launchpad.net/qemu/+filebug


Thanks,

Phil.




[PULL v2 26/27] virtio: make seg_max virtqueue size dependent

2019-12-23 Thread Michael S. Tsirkin
From: Denis Plotnikov 

Before the patch, the seg_max parameter was immutable and hardcoded
to 126 (128 - 2) regardless of queue size. This has two negative effects:

1. when queue size is < 128, we have a Virtio 1.1 specification violation:
   (2.6.5.3.1 Driver Requirements) seg_max must be <= queue_size.
   This violation affects old Linux guests (ver < 4.14). These guests
   crash on these queue_size setups.

2. when queue_size > 128, as was pointed out by Denis Lunev,
   seg_max restricts the guest's block request length, which affects guests'
   performance by making them issue more block requests than needed.
   https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html

To mitigate these two effects, the patch adds a property that adjusts seg_max
to the queue size automatically. Since seg_max is a guest visible parameter,
the property is manageable per machine type and allows choosing between the
old (seg_max = 126 always) and new (seg_max = queue_size - 2) behaviors.

To avoid changing the behavior of older VMs, prevent setting the default
seg_max_adjust value for older machine types.
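
The choice between the two behaviors can be sketched in isolation as
follows. This is a simplified illustration rather than the code from the
patch: sketch_seg_max() is a made-up helper, and the assert mirrors the
"queue size must be > 2" check that the patch adds at realize time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the seg_max value exposed to the guest.
 * With seg_max_adjust off, the legacy constant 128 - 2 is kept.
 */
uint32_t sketch_seg_max(uint16_t queue_size, bool seg_max_adjust)
{
    /* queue_size - 2 must stay positive. */
    assert(queue_size > 2);

    return seg_max_adjust ? (uint32_t)queue_size - 2 : 128 - 2;
}
```

With the default queue size of 128 both settings give 126, so the
difference only shows up for non-default queue sizes.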

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis Plotnikov 
Message-Id: <20191220140905.1718-2-dplotni...@virtuozzo.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/virtio-blk.h  |  1 +
 include/hw/virtio/virtio-scsi.h |  1 +
 hw/block/virtio-blk.c   |  9 -
 hw/core/machine.c   |  3 +++
 hw/scsi/vhost-scsi.c|  2 ++
 hw/scsi/virtio-scsi.c   | 10 +-
 6 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 9c19f5b634..1e62f869b2 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -38,6 +38,7 @@ struct VirtIOBlkConf
 uint32_t request_merging;
 uint16_t num_queues;
 uint16_t queue_size;
+bool seg_max_adjust;
 uint32_t max_discard_sectors;
 uint32_t max_write_zeroes_sectors;
 bool x_enable_wce_if_config_wce;
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 122f7c4b6f..24e768909d 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -48,6 +48,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 struct VirtIOSCSIConf {
 uint32_t num_queues;
 uint32_t virtqueue_size;
+bool seg_max_adjust;
 uint32_t max_sectors;
 uint32_t cmd_per_lun;
 #ifdef CONFIG_VHOST_SCSI
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b12157b5eb..9bee514c4e 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -913,7 +913,8 @@ static void virtio_blk_update_config(VirtIODevice *vdev, 
uint8_t *config)
 blk_get_geometry(s->blk, &capacity);
 memset(&blkcfg, 0, sizeof(blkcfg));
 virtio_stq_p(vdev, &blkcfg.capacity, capacity);
-virtio_stl_p(vdev, &blkcfg.seg_max, 128 - 2);
+virtio_stl_p(vdev, &blkcfg.seg_max,
+ s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 128 - 2);
 virtio_stw_p(vdev, &blkcfg.geometry.cylinders, conf->cyls);
 virtio_stl_p(vdev, &blkcfg.blk_size, blk_size);
 virtio_stw_p(vdev, &blkcfg.min_io_size, conf->min_io_size / blk_size);
@@ -1138,6 +1139,11 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 error_setg(errp, "num-queues property must be larger than 0");
 return;
 }
+if (conf->queue_size <= 2) {
+error_setg(errp, "invalid queue-size property (%" PRIu16 "), "
+   "must be > 2", conf->queue_size);
+return;
+}
 if (!is_power_of_2(conf->queue_size) ||
 conf->queue_size > VIRTQUEUE_MAX_SIZE) {
 error_setg(errp, "invalid queue-size property (%" PRIu16 "), "
@@ -1267,6 +1273,7 @@ static Property virtio_blk_properties[] = {
 true),
 DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
 DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 128),
+DEFINE_PROP_BOOL("seg-max-adjust", VirtIOBlock, conf.seg_max_adjust, true),
 DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
  IOThread *),
 DEFINE_PROP_BIT64("discard", VirtIOBlock, host_features,
diff --git a/hw/core/machine.c b/hw/core/machine.c
index f5e2b32b3b..ec2e3fcb61 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -29,6 +29,9 @@
 
 GlobalProperty hw_compat_4_2[] = {
 { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
+{ "virtio-blk-device", "seg-max-adjust", "off"},
+{ "virtio-scsi-device", "seg_max_adjust", "off"},
+{ "vhost-blk-device", "seg_max_adjust", "off"},
 };
 const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index c693fc748a..26f710d3ec 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -275,6 +275,8 @@ static Property vhost_scsi_properties[] = {
 DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
 

[PULL v2 25/27] hw: fix using 4.2 compat in 5.0 machine types for i440fx/q35

2019-12-23 Thread Michael S. Tsirkin
From: Denis Plotnikov 

The 5.0 machine type uses the 4.2 compat properties. This seems to be
incorrect, since the latest machine type is now 5.0 and it should either
use its own compat properties or use none and rely on the defaults.
It seems this appeared because of some problem during a merge/rebase.

Signed-off-by: Denis Plotnikov 
Message-Id: <20191223072856.5369-1-dplotni...@virtuozzo.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc_piix.c | 1 -
 hw/i386/pc_q35.c  | 1 -
 2 files changed, 2 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 721c7aa64e..fa12203079 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -425,7 +425,6 @@ static void pc_i440fx_5_0_machine_options(MachineClass *m)
 m->alias = "pc";
 m->is_default = 1;
 pcmc->default_cpu_version = 1;
-compat_props_add(m->compat_props, hw_compat_4_2, hw_compat_4_2_len);
 }
 
 DEFINE_I440FX_MACHINE(v5_0, "pc-i440fx-5.0", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 52f45735e4..84cf925cf4 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -354,7 +354,6 @@ static void pc_q35_5_0_machine_options(MachineClass *m)
 pc_q35_machine_options(m);
 m->alias = "q35";
 pcmc->default_cpu_version = 1;
-compat_props_add(m->compat_props, hw_compat_4_2, hw_compat_4_2_len);
 }
 
 DEFINE_Q35_MACHINE(v5_0, "pc-q35-5.0", NULL,
-- 
MST




[PULL v2 23/27] vhost-user: add VHOST_USER_RESET_DEVICE to reset devices

2019-12-23 Thread Michael S. Tsirkin
From: Raphael Norwitz 

Add a VHOST_USER_RESET_DEVICE message which will reset the vhost user
backend, disabling all rings and resetting all internal state, ready
for the backend to be reinitialized.

A backend has to report that it supports this feature with the
VHOST_USER_PROTOCOL_F_RESET_DEVICE protocol feature bit. If it does
so, the new message is used instead of sending a RESET_OWNER which has
had inconsistent implementations.
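
The selection logic can be sketched on its own like this. It is a
minimal illustration and not the patch itself: the SKETCH_* names and
sketch_pick_reset_request() are invented here, the feature bit (13) and
the RESET_DEVICE id (34) are taken from this patch, and the RESET_OWNER
id of 4 is an assumption based on the existing vhost-user request
numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the vhost-user constants. */
enum {
    SKETCH_PROTOCOL_F_RESET_DEVICE = 13, /* feature bit, from this patch */
    SKETCH_REQ_RESET_OWNER         = 4,  /* assumed legacy request id */
    SKETCH_REQ_RESET_DEVICE        = 34, /* new request id, from this patch */
};

/* Pick the reset request based on the negotiated protocol features. */
int sketch_pick_reset_request(uint64_t protocol_features)
{
    uint64_t mask = UINT64_C(1) << SKETCH_PROTOCOL_F_RESET_DEVICE;

    return (protocol_features & mask) ? SKETCH_REQ_RESET_DEVICE
                                      : SKETCH_REQ_RESET_OWNER;
}
```

A backend that never advertises the bit keeps receiving the old
message, so existing backends see no change in behavior.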

Signed-off-by: David Vrabel 
Signed-off-by: Raphael Norwitz 
Message-Id: <1572385083-5254-2-git-send-email-raphael.norw...@nutanix.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/vhost-user.c  |  8 +++-
 docs/interop/vhost-user.rst | 15 +++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 02a9b25199..d27a10fcc6 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -58,6 +58,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD = 10,
 VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
 VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
+VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -98,6 +99,7 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_INFLIGHT_FD = 31,
 VHOST_USER_SET_INFLIGHT_FD = 32,
 VHOST_USER_GPU_SET_SOCKET = 33,
+VHOST_USER_RESET_DEVICE = 34,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -890,10 +892,14 @@ static int vhost_user_set_owner(struct vhost_dev *dev)
 static int vhost_user_reset_device(struct vhost_dev *dev)
 {
 VhostUserMsg msg = {
-.hdr.request = VHOST_USER_RESET_OWNER,
 .hdr.flags = VHOST_USER_VERSION,
 };
 
+msg.hdr.request = virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_RESET_DEVICE)
+? VHOST_USER_RESET_DEVICE
+: VHOST_USER_RESET_OWNER;
+
 if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
 return -1;
 }
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 015ac08177..5f8b3a456b 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -785,6 +785,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE   13
 
 Master message types
 
@@ -1190,6 +1191,20 @@ Master message types
   ancillary data. The GPU protocol is used to inform the master of
   rendering state and updates. See vhost-user-gpu.rst for details.
 
+``VHOST_USER_RESET_DEVICE``
+  :id: 34
+  :equivalent ioctl: N/A
+  :master payload: N/A
+  :slave payload: N/A
+
+  Ask the vhost user backend to disable all rings and reset all
+  internal device state to the initial state, ready to be
+  reinitialized. The backend retains ownership of the device
+  throughout the reset operation.
+
+  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
+  feature is set by the backend.
+
 Slave message types
 ---
 
-- 
MST




[PULL v2 00/27] virtio, pci, pc: fixes, features

2019-12-23 Thread Michael S. Tsirkin
The following changes since commit dd5b0f95490883cd8bc7d070db8de70d5c979cbc:

  Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20191219' into 
staging (2019-12-20 16:37:07 +)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 585c70da17395114e5ec4f6f3f9f5068ae1ff0f3:

  tests: add virtio-scsi and virtio-blk seg_max_adjust test (2019-12-23 
09:12:30 -0500)


virtio, pci, pc: fixes, features

Bugfixes all over the place.
HMAT support.
New flags for vhost-user-blk utility.
Auto-tuning of seg max for virtio storage.

Signed-off-by: Michael S. Tsirkin 



Changes since v1:
- (hopefully) fix build on Mac OSX
- fix up 5.0 compat for pc
- seg_max auto-config

Denis Plotnikov (3):
  hw: fix using 4.2 compat in 5.0 machine types for i440fx/q35
  virtio: make seg_max virtqueue size dependent
  tests: add virtio-scsi and virtio-blk seg_max_adjust test

Jean-Philippe Brucker (1):
  virtio-mmio: Clear v2 transport state on soft reset

Liu Jingqi (5):
  numa: Extend CLI to provide memory latency and bandwidth information
  numa: Extend CLI to provide memory side cache information
  hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  hmat acpi: Build System Locality Latency and Bandwidth Information 
Structure(s)
  hmat acpi: Build Memory Side Cache Information Structure(s)

Michael Roth (1):
  virtio-pci: disable vring processing when bus-mastering is disabled

Michael S. Tsirkin (5):
  virtio: add ability to delete vq through a pointer
  virtio: make virtio_delete_queue idempotent
  virtio-input: convert to new virtio_delete_queue
  virtio: update queue size on guest write
  ACPI: add expected files for HMAT tests (acpihmat)

Micky Yun Chan (1):
  Implement backend program convention command for vhost-user-blk

Pan Nengyuan (2):
  virtio-balloon: fix memory leak while attach virtio-balloon device
  virtio-serial-bus: fix memory leak while attach virtio-serial-bus

Philippe Mathieu-Daudé (2):
  hw/pci/pci_host: Remove redundant PCI_DPRINTF()
  hw/pci/pci_host: Let pci_data_[read/write] use unsigned 'size' argument

Raphael Norwitz (2):
  vhost-user: add VHOST_USER_RESET_DEVICE to reset devices
  vhost-user-scsi: reset the device if supported

Stefan Hajnoczi (1):
  virtio: don't enable notifications during polling

Tao Xu (3):
  numa: Extend CLI to provide initiator information for numa nodes
  tests/numa: Add case for QMP build HMAT
  tests/bios-tables-test: add test cases for ACPI HMAT

Yi Sun (1):
  intel_iommu: fix bug to read DMAR_RTADDR_REG

 docs/interop/vhost-user.json  |  31 
 qapi/machine.json | 180 +-
 hw/acpi/hmat.h|  42 +
 include/hw/pci/pci_host.h |   4 +-
 include/hw/virtio/virtio-blk.h|   1 +
 include/hw/virtio/virtio-scsi.h   |   1 +
 include/hw/virtio/virtio.h|  18 ++
 include/sysemu/numa.h |  63 +++
 contrib/vhost-user-blk/vhost-user-blk.c   | 108 ++-
 hw/acpi/hmat.c| 268 +++
 hw/block/virtio-blk.c |  18 +-
 hw/char/virtio-serial-bus.c   |   8 +
 hw/core/machine.c |  68 +++
 hw/core/numa.c| 297 ++
 hw/i386/acpi-build.c  |   5 +
 hw/i386/intel_iommu.c |   7 +-
 hw/i386/pc_piix.c |   1 -
 hw/i386/pc_q35.c  |   1 -
 hw/input/virtio-input.c   |   5 +-
 hw/pci/pci_host.c |  25 +--
 hw/scsi/vhost-scsi.c  |   2 +
 hw/scsi/vhost-user-scsi.c |  24 +++
 hw/scsi/virtio-scsi.c |  19 +-
 hw/virtio/vhost-user.c|   8 +-
 hw/virtio/virtio-balloon.c|   7 +
 hw/virtio/virtio-mmio.c   |  14 ++
 hw/virtio/virtio-pci.c|  14 +-
 hw/virtio/virtio.c|  63 +--
 tests/bios-tables-test.c  |  44 +
 tests/numa-test.c | 213 +
 docs/interop/vhost-user.rst   |  32 
 hw/acpi/Kconfig   |   7 +-
 hw/acpi/Makefile.objs |   1 +
 qemu-options.hx   |  95 +-
 tests/acceptance/virtio_seg_max_adjust.py | 134 ++
 tests/data/acpi/pc/APIC.acpihmat  | Bin 0 -> 128 bytes
 tests/data/acpi/pc/DSDT.acpihmat  | Bin 0 -> 6455 bytes
 tests/data/acpi/pc/HMAT.acpihmat  | Bin 0 -> 280 bytes
 

[PULL v2 24/27] vhost-user-scsi: reset the device if supported

2019-12-23 Thread Michael S. Tsirkin
From: Raphael Norwitz 

If the vhost-user-scsi backend supports the
VHOST_USER_PROTOCOL_F_RESET_DEVICE protocol feature, then the device
can be reset when requested.

If this feature is not supported, do not try a reset as this will send
a VHOST_USER_RESET_OWNER that the backend is not expecting,
potentially putting it into an inoperable state.

Signed-off-by: David Vrabel 
Signed-off-by: Raphael Norwitz 
Message-Id: <1572385083-5254-3-git-send-email-raphael.norw...@nutanix.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/scsi/vhost-user-scsi.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 6a6c15dd32..23f972df59 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -39,6 +39,10 @@ static const int user_feature_bits[] = {
 VHOST_INVALID_FEATURE_BIT
 };
 
+enum VhostUserProtocolFeature {
+VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
+};
+
 static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VHostUserSCSI *s = (VHostUserSCSI *)vdev;
@@ -62,6 +66,25 @@ static void vhost_user_scsi_set_status(VirtIODevice *vdev, 
uint8_t status)
 }
 }
 
+static void vhost_user_scsi_reset(VirtIODevice *vdev)
+{
+VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
+struct vhost_dev *dev = &vsc->dev;
+
+/*
+ * Historically, reset was not implemented so only reset devices
+ * that are expecting it.
+ */
+if (!virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_RESET_DEVICE)) {
+return;
+}
+
+if (dev->vhost_ops->vhost_reset_device) {
+dev->vhost_ops->vhost_reset_device(dev);
+}
+}
+
 static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 }
@@ -182,6 +205,7 @@ static void vhost_user_scsi_class_init(ObjectClass *klass, 
void *data)
 vdc->get_features = vhost_scsi_common_get_features;
 vdc->set_config = vhost_scsi_common_set_config;
 vdc->set_status = vhost_user_scsi_set_status;
+vdc->reset = vhost_user_scsi_reset;
 fwc->get_dev_path = vhost_scsi_common_get_fw_dev_path;
 }
 
-- 
MST
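
The feature gate in the patch above can be modelled in a few lines. This is a minimal Python sketch, not QEMU code: `FakeBackend`, `has_feature` and `reset_if_supported` are hypothetical names, and the sketch assumes that `virtio_has_feature()` simply tests bit number N of the feature mask. Only the bit number 13 comes from the patch.

```python
# Bit number taken from the patch above.
VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13


def has_feature(features: int, bit: int) -> bool:
    """Return True if bit number `bit` is set in the `features` mask."""
    return bool(features & (1 << bit))


class FakeBackend:
    """Hypothetical stand-in for a vhost-user backend; not a QEMU type."""

    def __init__(self, protocol_features: int):
        self.protocol_features = protocol_features
        self.reset_calls = 0

    def reset_device(self) -> None:
        # Count calls so the effect of the gate is observable.
        self.reset_calls += 1


def reset_if_supported(backend: FakeBackend) -> bool:
    """Reset only when the backend negotiated RESET_DEVICE."""
    if not has_feature(backend.protocol_features,
                       VHOST_USER_PROTOCOL_F_RESET_DEVICE):
        return False  # backend never advertised reset support: do nothing
    backend.reset_device()
    return True
```

A backend that did not negotiate the bit is left untouched, which mirrors the early `return` in `vhost_user_scsi_reset()`.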




[PULL v2 22/27] hw/pci/pci_host: Let pci_data_[read/write] use unsigned 'size' argument

2019-12-23 Thread Michael S. Tsirkin
From: Philippe Mathieu-Daudé 

Both functions are called by MemoryRegionOps.[read/write] handlers
with unsigned 'size' argument. Both functions call
pci_host_config_[read/write]_common() which expect a uint32_t 'len'
parameter (also unsigned).
Since it is pointless (and confusing) to use a signed value, use an
unsigned type.

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20191216002134.18279-3-phi...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/pci/pci_host.h | 4 ++--
 hw/pci/pci_host.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/hw/pci/pci_host.h b/include/hw/pci/pci_host.h
index ba31595fc7..9ce088bd13 100644
--- a/include/hw/pci/pci_host.h
+++ b/include/hw/pci/pci_host.h
@@ -62,8 +62,8 @@ void pci_host_config_write_common(PCIDevice *pci_dev, 
uint32_t addr,
 uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr,
  uint32_t limit, uint32_t len);
 
-void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len);
-uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len);
+void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, unsigned len);
+uint32_t pci_data_read(PCIBus *s, uint32_t addr, unsigned len);
 
 extern const MemoryRegionOps pci_host_conf_le_ops;
 extern const MemoryRegionOps pci_host_conf_be_ops;
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index 0958d157de..ce7bcdb1d5 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -106,7 +106,7 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, 
uint32_t addr,
 return ret;
 }
 
-void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
+void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, unsigned len)
 {
 PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
 uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
@@ -119,7 +119,7 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, 
int len)
  val, len);
 }
 
-uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
+uint32_t pci_data_read(PCIBus *s, uint32_t addr, unsigned len)
 {
 PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
 uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
-- 
MST




[PULL v2 19/27] ACPI: add expected files for HMAT tests (acpihmat)

2019-12-23 Thread Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin 
---
 tests/bios-tables-test-allowed-diff.h |   8 
 tests/data/acpi/pc/APIC.acpihmat  | Bin 0 -> 128 bytes
 tests/data/acpi/pc/DSDT.acpihmat  | Bin 0 -> 6455 bytes
 tests/data/acpi/pc/HMAT.acpihmat  | Bin 0 -> 280 bytes
 tests/data/acpi/pc/SRAT.acpihmat  | Bin 0 -> 280 bytes
 tests/data/acpi/q35/APIC.acpihmat | Bin 0 -> 128 bytes
 tests/data/acpi/q35/DSDT.acpihmat | Bin 0 -> 9203 bytes
 tests/data/acpi/q35/HMAT.acpihmat | Bin 0 -> 280 bytes
 tests/data/acpi/q35/SRAT.acpihmat | Bin 0 -> 280 bytes
 9 files changed, 8 deletions(-)

diff --git a/tests/bios-tables-test-allowed-diff.h 
b/tests/bios-tables-test-allowed-diff.h
index 3c9e0c979b..dfb8523c8b 100644
--- a/tests/bios-tables-test-allowed-diff.h
+++ b/tests/bios-tables-test-allowed-diff.h
@@ -1,9 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/pc/APIC.acpihmat",
-"tests/data/acpi/pc/SRAT.acpihmat",
-"tests/data/acpi/pc/HMAT.acpihmat",
-"tests/data/acpi/pc/DSDT.acpihmat",
-"tests/data/acpi/q35/APIC.acpihmat",
-"tests/data/acpi/q35/SRAT.acpihmat",
-"tests/data/acpi/q35/HMAT.acpihmat",
-"tests/data/acpi/q35/DSDT.acpihmat",
diff --git a/tests/data/acpi/pc/APIC.acpihmat b/tests/data/acpi/pc/APIC.acpihmat
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..a21f164699bfccd8992ea1bdb5717f2dc3025496
 100644
GIT binary patch
literal 128
zcmZ<^@N{lqU|?Xp<>c?|5v<@85#a0y6k`O6f!H9Lf#JbFFwFr}2jX%tGJA8l)n3DUlYJT5~Da#Tw;OmQgB5
z;e`?xQH-Fp0w_-I0@g(f^nx~cZ9hWu2zi9`6;d?uRn+h7awwY80?9=Qh?+C!o9~>N
zIp@p_4clm3JxpZZtF6
zQqyh}m`6RXM_sK?U7@Go8VvOxF_0>z{4Y}*=owjb~^1iRBDC2O&%H{P469?*Yd<3S)Dt4h6;IOcSyOPx-
z!WD4$wLe@U78=P|`7%3EwMsS4-eFO_K#izg#6ML(cR4Bz6PvU5R=uHvG+44E7K{9y
z6ECfDk9kauEHJ*xci0Y#Ozbje@9J977{a4bE#a@qaH9S|m${5%)E3*q|Ah$V>+HR5
zu5SznPS1`HR78A%sRS%2D~3MY#1jLL=EdA9|33PCl*Ly0kI^5oPz%fKV$A2xtyHao
z-0T>LZR<>Hx$h*$A9Bj&|{_)z>HriG$3SBz5nl+Y*)M?Vn=~t1uE{d-UPP0_OVD*q(BW%2Nc(JOlG3`}LFI|f`=Sey^@YH<>9`(iJt-z0wM57Jv?U^J)4RXZ+GHZiZuivg
zZGaL;n`&*%U|YQl-P^pE?zTj1*ln||$CE>;08qMnTSSIE#X(PW*rT&8@3Y-ap)w>c
zd$`4zcfSRD54Sk;wjR1IcCXcUod*{#N6A~t70Nbl)vsq2eC6nCk-qYZHe0!lRqZA2
zi%uI!pXiIEwp6*U*AoELv*{_3{BnXN{9xN>e`_EP71Ng*)#Txr%OxH~VrDldeQQa<8Ge8=JMn+3DE47N^G3vzH=Wh8`3cdvXO%`;klFjC_}?-2FHc0qdkOl%cqf+NSnr(A-=_dWw&8kU!5Ae#H60Rg?h_=Nlq`P6jF9mmnKqB)ShY#u0?%&`j
zQ7BN5I`NVGHp^B}!6vPml`UkK;4co4N%PL1?(Xbn+DFnnxbOw}EpV#LQGS;#I_{Ye
zbGVO{g3JVSP@2b>jf!bzw(k6Sn~H{F8nwEJI1et3EHwE)Jt(s(vC}!
zP9%M
zs8J=79qOhAjU~ZX)99Y|i26vsLo)X}(|#xac-TX(R{{<3yLb5kU3)V)~po`^Blz
zDbrMGnlwd!dig~mK;Oii(44~9LGvUWYI
z9!00zR9E{I0;>@_0@|ja^FE-c3n;Y(#Az1y;TN|PVS?}tAV}K@)5_wVL)n@A1f^zp
z@^|E+YDW(Q8)tEl{?8|AYKvD9JMAQw8pprsv_!2c#{mw<^Hecw6i8M
zV{lN@GzSMYnmDKrt65%FGb%2WlInVl3Z1_cgBt)!nD#CzUyjj9F?_uSPmPfdORxE*
z`~`%nYeDs*OMFo3mWBA|!$}swQ=bBgzVKFko_e0*3i^4FXFo#yJT;Uj{qXzGXiy3S
zBVEAh6Jfq4DE5pg2M7Q9DbRfL!;#$mhG2-}-_m1q$81!w)DkM~dLM1B#FK{3k23?Y@DG5!5LM7{@gieM*uS(V_37ras
zN>;q117D}ZpzlalRYKKJsANq`XgUn~u4J8&(3w!EWSy1J*)V89vLf<>X=$NQ$@-Fn
zz7z&6O4gGSdNLF$Sx-smsW7N6S?45lE)*(RPfO_OFsLC}(N*P^e^`m(ckzXi2i3
zmC&=HP|5nTguWaGU6HJ>Na!n}P|12uLeCus)ynh6jannpOWkkdA+k@kZrc}B2(
zkRd^8mLZ@b1)2^Cq?x?mPU01_Z=MiD(R;0w^3bjitO7+I4R>CfqbaPX|iu4b)+6IEy#y@m1HD=)MtG8P`^wv!ddD&
zzI>?nXNek
Qg_O1sXWa%LGz<~_AAR>eTL1t6

literal 0
HcmV?d1

diff --git a/tests/data/acpi/pc/HMAT.acpihmat b/tests/data/acpi/pc/HMAT.acpihmat
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..c00f7ba6cd0acecbc4b158f430d29b2f32988522
 100644
GIT binary patch
literal 280
zcmeb9bqtYUWME*L;pFe^5v<@85#a0r6axw|fY=}!1~h;SWIjwBokmuNOFc;30ICth
zW`eR`Fhdzga*PcB{=?M+<*vJ6H|M*!-WC@U?fIv`?15Cr@;rh|!0TAyG
A0RR91

literal 0
HcmV?d1

diff --git a/tests/data/acpi/pc/SRAT.acpihmat b/tests/data/acpi/pc/SRAT.acpihmat
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..1dcae90aec688e88f9d212e632ff2e0dc7bc
 100644
GIT binary patch
literal 280
zcmWFzatx7RWME)C;N1+c3F
q8VCj-m|+T0)xmizPc?|5v<@85#a0y6k`O6f!H9Lf#JbFFwFr}2jX%tGJl1TZsreZQ3+HD9Lq;HVsVfQY0;|Op$crfCjjB<5o
zWT7}k(jb7O0QuoVfrNF?-snJw-g@gb{u6R*fS!8oH7Md!)c4Ko$TLd{h!53)oNxBM
z@0+(fGjFv^zvFkmxxkneRIYmUPO);m<@xBd7-Q6?Z>N#D!FqdsrPjA{sf^Xz
zDz^KqU%6JZ{<0l@9)@>53ay(FyY+>0@7B%egO9^oj6iSSia4i+v%@s`
z+5LLM%zw+D{c47ew*3-YYpFWo0I*k9WQhD4d(f;tPD3N2HS;s?(~9xt$n+^
zboujF?vx+=>Yu;4`v%Vdu!?UR-)j+lgztrXIUG8l4R);ei7t+<4Cg-^h{Lkap(9a9
zJ@@mni|zTpJ69Bb9Cx2jz=RtqD<*l<4Tt!}{bjD7W8j%9lL4#o2?S2z7)tL^uT
z?xlXGTV@gUgb{V!{A0+SaG3ve5VAp-J32aK!XDpf7<-pFmnWa6;m~Qr>B}}c<-Ryo7{D?H(`vN0Qat4O-d+<|Fva(Hs<(fJ
z+RVEel+(<@R|Q|qR@YAnR5is92z3gmD)Y+KP0Op`quIiTbNTcOX;qP`^$wnEcRdu9
z*DQx?L?d0~r)pNjBIcydGplCpvR#(SzRP+CKijDI$MAI8of7BcSfU_?EMyi~ud
zeLe-Hy@RKNtJjW+v-3%!%_q(?hk!3Z%P}y++(miDY5d_Zi?e*l?q`k*I()ijy_0??
zZQJMfM4@U1=VV1Gny}(o7pI{Ua#EUy>4#}%pLax>bxO0ENW)WVICM@=%#(VULLM7=
z>{i2DXKucaZ!6lS$obQ&7F`0z`;S;It#-FHxAE5ATrPvz!?Ue?l6%tcb!MvAPktZXV3Yw8jHF$)&>3gkUP@gk-A0Bh+N>GQXlHuL&^cx5M5ycJhE8tiN=+lozsTSX;UXcUFVFUbH>m)W9mex>zp-o
z#rO`QmJosOZ?F?2enPK3J7f}yiu=q#8z5$ZbU44rd^)*jLS5&)p>y8QIdAGj
zsOxkMovxwNHFYA?b)GVGo-%ZvGIb)7Yv;XrcQ*q@(}vE|rcQ*q#

[PULL v2 27/27] tests: add virtio-scsi and virtio-blk seg_max_adjust test

2019-12-23 Thread Michael S. Tsirkin
From: Denis Plotnikov 

It tests proper seg_max_adjust settings for all machine types except
'none', 'isapc', 'microvm'

Signed-off-by: Denis Plotnikov 
Message-Id: <20191220140905.1718-3-dplotni...@virtuozzo.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/acceptance/virtio_seg_max_adjust.py | 134 ++
 1 file changed, 134 insertions(+)
 create mode 100755 tests/acceptance/virtio_seg_max_adjust.py

diff --git a/tests/acceptance/virtio_seg_max_adjust.py 
b/tests/acceptance/virtio_seg_max_adjust.py
new file mode 100755
index 00..5458573138
--- /dev/null
+++ b/tests/acceptance/virtio_seg_max_adjust.py
@@ -0,0 +1,134 @@
+#!/usr/bin/env python
+#
+# Test virtio-scsi and virtio-blk queue settings for all machine types
+#
+# Copyright (c) 2019 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import sys
+import os
+import re
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
+from qemu.machine import QEMUMachine
+from avocado_qemu import Test
+
+#list of machine types and virtqueue properties to test
+VIRTIO_SCSI_PROPS = {'seg_max_adjust': 'seg_max_adjust'}
+VIRTIO_BLK_PROPS = {'seg_max_adjust': 'seg-max-adjust'}
+
+DEV_TYPES = {'virtio-scsi-pci': VIRTIO_SCSI_PROPS,
+ 'virtio-blk-pci': VIRTIO_BLK_PROPS}
+
+VM_DEV_PARAMS = {'virtio-scsi-pci': ['-device', 'virtio-scsi-pci,id=scsi0'],
+ 'virtio-blk-pci': ['-device',
+'virtio-blk-pci,id=scsi0,drive=drive0',
+'-drive',
+'driver=null-co,id=drive0,if=none']}
+
+
+class VirtioMaxSegSettingsCheck(Test):
+@staticmethod
+def make_pattern(props):
+pattern_items = ['{0} = \w+'.format(prop) for prop in props]
+return '|'.join(pattern_items)
+
+def query_virtqueue(self, vm, dev_type_name):
+query_ok = False
+error = None
+props = None
+
+output = vm.command('human-monitor-command',
+command_line = 'info qtree')
+props_list = DEV_TYPES[dev_type_name].values();
+pattern = self.make_pattern(props_list)
+res = re.findall(pattern, output)
+
+if len(res) != len(props_list):
+props_list = set(props_list)
+res = set(res)
+not_found = props_list.difference(res)
+not_found = ', '.join(not_found)
+error = '({0}): The following properties not found: {1}'\
+ .format(dev_type_name, not_found)
+else:
+query_ok = True
+props = dict()
+for prop in res:
+p = prop.split(' = ')
+props[p[0]] = p[1]
+return query_ok, props, error
+
+def check_mt(self, mt, dev_type_name):
+with QEMUMachine(self.qemu_bin) as vm:
+vm.set_machine(mt["name"])
+for s in VM_DEV_PARAMS[dev_type_name]:
+vm.add_args(s)
+vm.launch()
+query_ok, props, error = self.query_virtqueue(vm, dev_type_name)
+
+if not query_ok:
+self.fail('machine type {0}: {1}'.format(mt['name'], error))
+
+for prop_name, prop_val in props.items():
+expected_val = mt[prop_name]
+self.assertEqual(expected_val, prop_val)
+
+@staticmethod
+def seg_max_adjust_enabled(mt):
+# machine types >= 5.0 should have seg_max_adjust = true
+# others seg_max_adjust = false
+mt = mt.split("-")
+
+# machine types with one line name and name like pc-x.x
+if len(mt) <= 2:
+return False
+
+# machine types like pc-<chip_name>-x.x[.x]
+ver = mt[2]
+ver = ver.split(".");
+
+# versions >= 5.0 go with seg_max_adjust enabled
+major = int(ver[0])
+
+if major >= 5:
+return True
+return False
+
+def test_machine_types(self):
+# collect all machine types except 'none', 'isapc', 'microvm'
+with QEMUMachine(self.qemu_bin) as vm:
+vm.launch()
+machines = [m['name'] for m in vm.command('query-machines')]
+vm.shutdown()
+machines.remove('none')
+machines.remove('isapc')
+machines.remove('microvm')
+
+for dev_type in DEV_TYPES:
+   
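
The version check in `seg_max_adjust_enabled()` above can be exercised on its own, without launching QEMU. The following standalone sketch restates the same parsing rule. The machine-type strings in the usage note are only illustrative inputs, not a list taken from the patch.

```python
def seg_max_adjust_enabled(machine_type: str) -> bool:
    """Mirror of the rule in the test: only names with at least three
    dash-separated parts carry a version, and major >= 5 enables the flag."""
    parts = machine_type.split("-")

    # Names such as "q35" or "pc-0.12" have no third component.
    if len(parts) <= 2:
        return False

    # Third component is the version, e.g. "5.0" or "2.12.0".
    major = int(parts[2].split(".")[0])
    return major >= 5
```

With this rule, a name like `pc-i440fx-5.0` maps to `True`, while `pc-q35-4.2` and `pc-0.12` map to `False`.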

[PULL v2 17/27] tests/numa: Add case for QMP build HMAT

2019-12-23 Thread Michael S. Tsirkin
From: Tao Xu 

Check the HMAT configuration use case.

Acked-by: Markus Armbruster 
Suggested-by: Igor Mammedov 
Signed-off-by: Tao Xu 
Message-Id: <20191213011929.2520-8-tao3...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Igor Mammedov 
---
 tests/numa-test.c | 213 ++
 1 file changed, 213 insertions(+)

diff --git a/tests/numa-test.c b/tests/numa-test.c
index 8de8581231..17dd807d2a 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -327,6 +327,216 @@ static void pc_dynamic_cpu_cfg(const void *data)
 qtest_quit(qs);
 }
 
+static void pc_hmat_build_cfg(const void *data)
+{
+QTestState *qs = qtest_initf("%s -nodefaults --preconfig -machine hmat=on "
+ "-smp 2,sockets=2 "
+ "-m 128M,slots=2,maxmem=1G "
+ "-object memory-backend-ram,size=64M,id=m0 "
+ "-object memory-backend-ram,size=64M,id=m1 "
+ "-numa node,nodeid=0,memdev=m0 "
+ "-numa node,nodeid=1,memdev=m1,initiator=0 "
+ "-numa cpu,node-id=0,socket-id=0 "
+ "-numa cpu,node-id=0,socket-id=1",
+ data ? (char *)data : "");
+
+/* Fail: Initiator should be less than the number of nodes */
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 2, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\" } }")));
+
+/* Fail: Target should be less than the number of nodes */
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 2,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\" } }")));
+
+/* Fail: Initiator should contain cpu */
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 1, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\" } }")));
+
+/* Fail: Data-type mismatch */
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"write-latency\","
+" 'bandwidth': 524288000 } }")));
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"read-bandwidth\","
+" 'latency': 5 } }")));
+
+/* Fail: Bandwidth should be 1MB (1048576) aligned */
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
+" 'bandwidth': 1048575 } }")));
+
+/* Configuring HMAT bandwidth and latency details */
+g_assert_false(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\","
+" 'latency': 1 } }")));/* 1 ns */
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\","
+" 'latency': 5 } }")));/* Fail: Duplicate configuration */
+g_assert_false(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
+" 'bandwidth': 68717379584 } }")));/* 65534 MB/s */
+g_assert_false(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 1,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\","
+" 'latency': 65534 } }")));/* 65534 ns */
+g_assert_false(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 1,"
+" 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
+" 'bandwidth': 34358689792 } }")));/* 32767 MB/s */
+
+/* Fail: node_id should be less than the number of nodes */
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-cache', 'node-id': 2, 'size': 10240,"
+" 'level': 1, 'associativity': \"direct\", 'policy': \"write-back\","
+" 'line': 8 } }")));
+
+/* Fail: level should be less than HMAT_LB_LEVELS (4) */
+g_assert_true(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 
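
The bandwidth values used in the test cases above follow from the 1 MiB alignment rule stated in the test comments. The following Python sketch is a hedged illustration of that arithmetic only. The helper names are hypothetical and the real validation lives in QEMU's C code.

```python
MIB = 1 << 20  # 1048576 bytes, the alignment named in the test comment


def bandwidth_is_aligned(bytes_per_sec: int) -> bool:
    """True if the bandwidth is a whole number of MiB per second."""
    return bytes_per_sec % MIB == 0


def to_mib_per_sec(bytes_per_sec: int) -> int:
    """Convert an aligned bandwidth to MiB/s, rejecting unaligned input."""
    if not bandwidth_is_aligned(bytes_per_sec):
        raise ValueError("bandwidth must be 1 MiB aligned")
    return bytes_per_sec // MIB
```

This is why 1048575 is expected to fail, while 68717379584 and 34358689792 correspond to the 65534 and 32767 figures in the test comments.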

[PULL v2 16/27] hmat acpi: Build Memory Side Cache Information Structure(s)

2019-12-23 Thread Michael S. Tsirkin
From: Liu Jingqi 

This structure describes memory side cache information for memory
proximity domains if the memory side cache is present and the
physical device forms the memory side cache.
The software could use this information to effectively place
the data in memory to maximize the performance of the system
memory that uses the memory side cache.

Acked-by: Markus Armbruster 
Reviewed-by: Igor Mammedov 
Reviewed-by: Daniel Black 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
Message-Id: <20191213011929.2520-7-tao3...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/acpi/hmat.c | 69 +-
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 4635d45dee..7c24bb5371 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -143,14 +143,62 @@ static void build_hmat_lb(GArray *table_data, 
HMAT_LB_Info *hmat_lb,
 g_free(entry_list);
 }
 
+/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-147 */
+static void build_hmat_cache(GArray *table_data, uint8_t total_levels,
+ NumaHmatCacheOptions *hmat_cache)
+{
+/*
+ * Cache Attributes: Bits [3:0] – Total Cache Levels
+ * for this Memory Proximity Domain
+ */
+uint32_t cache_attr = total_levels;
+
+/* Bits [7:4] : Cache Level described in this structure */
+cache_attr |= (uint32_t) hmat_cache->level << 4;
+
+/* Bits [11:8] - Cache Associativity */
+cache_attr |= (uint32_t) hmat_cache->associativity << 8;
+
+/* Bits [15:12] - Write Policy */
+cache_attr |= (uint32_t) hmat_cache->policy << 12;
+
+/* Bits [31:16] - Cache Line size in bytes */
+cache_attr |= (uint32_t) hmat_cache->line << 16;
+
+/* Type */
+build_append_int_noprefix(table_data, 2, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, 32, 4);
+/* Proximity Domain for the Memory */
+build_append_int_noprefix(table_data, hmat_cache->node_id, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/* Memory Side Cache Size */
+build_append_int_noprefix(table_data, hmat_cache->size, 8);
+/* Cache Attributes */
+build_append_int_noprefix(table_data, cache_attr, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/*
+ * Number of SMBIOS handles (n)
+ * Linux kernel uses Memory Side Cache Information Structure
+ * without SMBIOS entries for now, so set Number of SMBIOS handles
+ * as 0.
+ */
+build_append_int_noprefix(table_data, 0, 2);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 {
 uint16_t flags;
 uint32_t num_initiator = 0;
 uint32_t initiator_list[MAX_NODES];
-int i, hierarchy, type;
+int i, hierarchy, type, cache_level, total_levels;
 HMAT_LB_Info *hmat_lb;
+NumaHmatCacheOptions *hmat_cache;
 
 for (i = 0; i < numa_state->num_nodes; i++) {
 flags = 0;
@@ -184,6 +232,25 @@ static void hmat_build_table_structs(GArray *table_data, 
NumaState *numa_state)
 }
 }
 }
+
+/*
+ * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
+ * Table 5-147
+ */
+for (i = 0; i < numa_state->num_nodes; i++) {
+total_levels = 0;
+for (cache_level = 1; cache_level < HMAT_LB_LEVELS; cache_level++) {
+if (numa_state->hmat_cache[i][cache_level]) {
+total_levels++;
+}
+}
+for (cache_level = 0; cache_level <= total_levels; cache_level++) {
+hmat_cache = numa_state->hmat_cache[i][cache_level];
+if (hmat_cache) {
+build_hmat_cache(table_data, total_levels, hmat_cache);
+}
+}
+}
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
-- 
MST
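
The Cache Attributes field built in `build_hmat_cache()` above is a plain bit-field packing. A minimal Python sketch of the same layout follows. The function names are hypothetical, and the masks are an addition of this sketch (the C code shifts the fields without masking them).

```python
def pack_cache_attr(total_levels: int, level: int, associativity: int,
                    policy: int, line: int) -> int:
    """Pack the five fields using the bit positions from the patch comments."""
    attr = total_levels & 0xF             # bits [3:0]   total cache levels
    attr |= (level & 0xF) << 4            # bits [7:4]   level described here
    attr |= (associativity & 0xF) << 8    # bits [11:8]  associativity
    attr |= (policy & 0xF) << 12          # bits [15:12] write policy
    attr |= (line & 0xFFFF) << 16         # bits [31:16] cache line size
    return attr


def unpack_cache_attr(attr: int) -> tuple:
    """Inverse of pack_cache_attr(); returns the fields in the same order."""
    return (attr & 0xF,
            (attr >> 4) & 0xF,
            (attr >> 8) & 0xF,
            (attr >> 12) & 0xF,
            (attr >> 16) & 0xFFFF)
```

For example, one total level, level 1, associativity 1, policy 1 and an 8-byte line pack to `0x81111`. The numeric values for associativity and policy here are illustrative inputs, not enum values taken from the patch.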




[PULL v2 21/27] hw/pci/pci_host: Remove redundant PCI_DPRINTF()

2019-12-23 Thread Michael S. Tsirkin
From: Philippe Mathieu-Daudé 

In commit 3bf4dfdd111 we introduced the pci_cfg_[read/write]
trace events in pci_host_config_[read/write]_common().
We have the following call trace:

  pci_host_data_[read/write]()
    - PCI_DPRINTF()
    - pci_data_[read/write]()
        - PCI_DPRINTF()
        - pci_host_config_[read/write]_common()
            trace_pci_cfg_[read/write]()

Since the PCI_DPRINTF() calls are redundant with the trace
events, remove them.

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20191216002134.18279-2-phi...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/pci_host.c | 21 +
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index c5f9244934..0958d157de 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -115,8 +115,6 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, 
int len)
 return;
 }
 
-PCI_DPRINTF("%s: %s: addr=%02" PRIx32 " val=%08" PRIx32 " len=%d\n",
-__func__, pci_dev->name, config_addr, val, len);
 pci_host_config_write_common(pci_dev, config_addr, PCI_CONFIG_SPACE_SIZE,
  val, len);
 }
@@ -125,18 +123,13 @@ uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
 {
 PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
 uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
-uint32_t val;
 
 if (!pci_dev) {
 return ~0x0;
 }
 
-val = pci_host_config_read_common(pci_dev, config_addr,
-  PCI_CONFIG_SPACE_SIZE, len);
-PCI_DPRINTF("%s: %s: addr=%02"PRIx32" val=%08"PRIx32" len=%d\n",
-__func__, pci_dev->name, config_addr, val, len);
-
-return val;
+return pci_host_config_read_common(pci_dev, config_addr,
+   PCI_CONFIG_SPACE_SIZE, len);
 }
 
 static void pci_host_config_write(void *opaque, hwaddr addr,
@@ -167,8 +160,7 @@ static void pci_host_data_write(void *opaque, hwaddr addr,
 uint64_t val, unsigned len)
 {
 PCIHostState *s = opaque;
-PCI_DPRINTF("write addr " TARGET_FMT_plx " len %d val %x\n",
-addr, len, (unsigned)val);
+
 if (s->config_reg & (1u << 31))
 pci_data_write(s->bus, s->config_reg | (addr & 3), val, len);
 }
@@ -177,14 +169,11 @@ static uint64_t pci_host_data_read(void *opaque,
hwaddr addr, unsigned len)
 {
 PCIHostState *s = opaque;
-uint32_t val;
+
 if (!(s->config_reg & (1U << 31))) {
 return 0x;
 }
-val = pci_data_read(s->bus, s->config_reg | (addr & 3), len);
-PCI_DPRINTF("read addr " TARGET_FMT_plx " len %d val %x\n",
-addr, len, val);
-return val;
+return pci_data_read(s->bus, s->config_reg | (addr & 3), len);
 }
 
 const MemoryRegionOps pci_host_conf_le_ops = {
-- 
MST
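
The reasoning in this commit message is that every read funnels through one common function, so a single trace point there covers all callers. The following Python sketch models that read path under stated assumptions: the dictionary-based device and state, the `trace_log` list, the `0x100` config space size and the all-ones return value are choices of this sketch, and the access length is ignored.

```python
PCI_CONFIG_SPACE_SIZE = 0x100   # assumed size of the config space
CONFIG_ENABLE = 1 << 31         # enable bit tested in pci_host_data_read()

trace_log = []                  # stands in for the trace event backend


def trace_pci_cfg_read(name, addr, val):
    trace_log.append(("read", name, addr, val))


def pci_host_config_read_common(dev, addr):
    """Lowest layer: the only place that emits a trace event."""
    val = dev["config"].get(addr, 0)
    trace_pci_cfg_read(dev["name"], addr, val)
    return val


def pci_data_read(dev, addr):
    """Middle layer: mask the address down to a config space offset."""
    return pci_host_config_read_common(dev, addr & (PCI_CONFIG_SPACE_SIZE - 1))


def pci_host_data_read(state, addr):
    """Top layer: refuse the access unless the enable bit is set."""
    if not (state["config_reg"] & CONFIG_ENABLE):
        return 0xFFFFFFFF
    return pci_data_read(state["dev"], state["config_reg"] | (addr & 3))
```

One successful read produces exactly one entry in `trace_log`, which is why the extra debug prints in the upper layers carried no additional information.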




[PULL v2 20/27] virtio-mmio: Clear v2 transport state on soft reset

2019-12-23 Thread Michael S. Tsirkin
From: Jean-Philippe Brucker 

At the moment when the guest writes a status of 0, we only reset the
virtio core state but not the virtio-mmio state. The virtio-mmio
specification says (v1.1 cs01, 4.2.2.1 Device Requirements:
MMIO Device Register Layout):

Upon reset, the device MUST clear all bits in InterruptStatus and
ready bits in the QueueReady register for all queues in the device.

The core already takes care of InterruptStatus by clearing isr, but we
still need to clear QueueReady.

It would be tempting to clean all registers, but since the specification
doesn't say anything more, guests could rely on the registers keeping
their state across reset. Linux for example, relies on this for
GuestPageSize in the legacy MMIO transport.

Fixes: 44e687a4d9ab ("virtio-mmio: implement modern (v2) personality 
(virtio-1)")
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20191213095410.1516119-1-jean-phili...@linaro.org>
Reviewed-by: Sergio Lopez 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-mmio.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 94d934c44b..ef40b7a9b2 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -65,6 +65,19 @@ static void virtio_mmio_stop_ioeventfd(VirtIOMMIOProxy 
*proxy)
 virtio_bus_stop_ioeventfd(&proxy->bus);
 }
 
+static void virtio_mmio_soft_reset(VirtIOMMIOProxy *proxy)
+{
+int i;
+
+if (proxy->legacy) {
+return;
+}
+
+for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
+proxy->vqs[i].enabled = 0;
+}
+}
+
 static uint64_t virtio_mmio_read(void *opaque, hwaddr offset, unsigned size)
 {
 VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
@@ -378,6 +391,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 
 if (vdev->status == 0) {
 virtio_reset(vdev);
+virtio_mmio_soft_reset(proxy);
 }
 break;
 case VIRTIO_MMIO_QUEUE_DESC_LOW:
-- 
MST
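
The behaviour added by `virtio_mmio_soft_reset()` can be summarised as: on a status write of 0, a modern (v2) transport clears every queue's ready flag, while a legacy transport keeps its state. A small Python model of that rule follows. The `Proxy` class and its four-queue default are hypothetical simplifications, not the QEMU data structures.

```python
class Proxy:
    """Hypothetical model of the transport state touched by the patch."""

    def __init__(self, legacy: bool, num_queues: int = 4):
        self.legacy = legacy
        # One "QueueReady" flag per queue.
        self.vq_enabled = [False] * num_queues


def soft_reset(proxy: Proxy) -> None:
    """Clear QueueReady for all queues, but only on the modern transport."""
    if proxy.legacy:
        return
    for i in range(len(proxy.vq_enabled)):
        proxy.vq_enabled[i] = False
```

Leaving the other registers alone is deliberate in the patch: the commit message notes that guests may rely on them keeping their values across reset.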




[PULL v2 15/27] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)

2019-12-23 Thread Michael S. Tsirkin
From: Liu Jingqi 

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
The software could use this information as a hint for optimization.

Acked-by: Markus Armbruster 
Reviewed-by: Igor Mammedov 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
Message-Id: <20191213011929.2520-6-tao3...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/acpi/hmat.c | 104 -
 1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 9ff79308a4..4635d45dee 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -25,6 +25,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "sysemu/numa.h"
 #include "hw/acpi/hmat.h"
 
@@ -67,11 +68,89 @@ static void build_hmat_mpda(GArray *table_data, uint16_t 
flags,
 build_append_int_noprefix(table_data, 0, 8);
 }
 
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-146
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
+  uint32_t num_initiator, uint32_t num_target,
+  uint32_t *initiator_list)
+{
+int i, index;
+HMAT_LB_Data *lb_data;
+uint16_t *entry_list;
+uint32_t base;
+/* Length in bytes for entire structure */
+uint32_t lb_length
+= 32 /* Table length up to and including Entry Base Unit */
++ 4 * num_initiator /* Initiator Proximity Domain List */
++ 4 * num_target /* Target Proximity Domain List */
++ 2 * num_initiator * num_target; /* Latency or Bandwidth Entries */
+
+/* Type */
+build_append_int_noprefix(table_data, 1, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, lb_length, 4);
+/* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
+assert(!(hmat_lb->hierarchy >> 4));
+build_append_int_noprefix(table_data, hmat_lb->hierarchy, 1);
+/* Data Type */
+build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Number of Initiator Proximity Domains (s) */
+build_append_int_noprefix(table_data, num_initiator, 4);
+/* Number of Target Proximity Domains (t) */
+build_append_int_noprefix(table_data, num_target, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Entry Base Unit */
+if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
+/* Convert latency base from nanoseconds to picoseconds */
+base = hmat_lb->base * 1000;
+} else {
+/* Convert bandwidth base from Byte to Megabyte */
+base = hmat_lb->base / MiB;
+}
+build_append_int_noprefix(table_data, base, 8);
+
+/* Initiator Proximity Domain List */
+for (i = 0; i < num_initiator; i++) {
+build_append_int_noprefix(table_data, initiator_list[i], 4);
+}
+
+/* Target Proximity Domain List */
+for (i = 0; i < num_target; i++) {
+build_append_int_noprefix(table_data, i, 4);
+}
+
+/* Latency or Bandwidth Entries */
+entry_list = g_malloc0(num_initiator * num_target * sizeof(uint16_t));
+for (i = 0; i < hmat_lb->list->len; i++) {
+lb_data = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
+index = lb_data->initiator * num_target + lb_data->target;
+
+entry_list[index] = (uint16_t)(lb_data->data / hmat_lb->base);
+}
+
+for (i = 0; i < num_initiator * num_target; i++) {
+build_append_int_noprefix(table_data, entry_list[i], 2);
+}
+
+g_free(entry_list);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 {
 uint16_t flags;
-int i;
+uint32_t num_initiator = 0;
+uint32_t initiator_list[MAX_NODES];
+int i, hierarchy, type;
+HMAT_LB_Info *hmat_lb;
 
 for (i = 0; i < numa_state->num_nodes; i++) {
 flags = 0;
@@ -82,6 +161,29 @@ static void hmat_build_table_structs(GArray *table_data, 
NumaState *numa_state)
 
 build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);
 }
+
+for (i = 0; i < numa_state->num_nodes; i++) {
+if (numa_state->nodes[i].has_cpu) {
+initiator_list[num_initiator++] = i;
+}
+}
+
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-146
+ */
+for (hierarchy = HMAT_LB_MEM_MEMORY;
+ hierarchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hierarchy++) {
+for (type = HMAT_LB_DATA_ACCESS_LATENCY;
+ type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
+hmat_lb = 
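
The length and indexing arithmetic in `build_hmat_lb()` above can be checked with a short Python sketch. The function names and the tuple-based input are hypothetical. The formulas are the ones in the patch: a 32-byte fixed part, 4 bytes per initiator and per target, 2 bytes per matrix entry, and row-major indexing by initiator.

```python
def lb_length(num_initiator: int, num_target: int) -> int:
    """Total structure length in bytes, as computed in build_hmat_lb()."""
    return (32                                # header up to Entry Base Unit
            + 4 * num_initiator               # Initiator Proximity Domain List
            + 4 * num_target                  # Target Proximity Domain List
            + 2 * num_initiator * num_target)  # 16-bit matrix entries


def build_entries(num_initiator: int, num_target: int, base: int, data):
    """Fill the initiator x target matrix.

    `data` is an iterable of (initiator, target, value) tuples. Each entry is
    stored as value / base truncated to 16 bits, like the uint16_t cast in C.
    """
    entries = [0] * (num_initiator * num_target)
    for initiator, target, value in data:
        index = initiator * num_target + target
        entries[index] = (value // base) & 0xFFFF
    return entries
```

For one initiator and two targets the structure is 48 bytes long, and pairs that were never configured keep a zero entry.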

[PULL v2 12/27] numa: Extend CLI to provide memory latency and bandwidth information

2019-12-23 Thread Michael S. Tsirkin
From: Liu Jingqi 

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in the ACPI Heterogeneous Memory Attribute Table (HMAT). Before using
the hmat-lb option, enable HMAT with -machine hmat=on.

Acked-by: Markus Armbruster 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
Message-Id: <20191213011929.2520-3-tao3...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Igor Mammedov 
---
 qapi/machine.json |  93 +++-
 include/sysemu/numa.h |  53 
 hw/core/numa.c| 194 ++
 qemu-options.hx   |  47 +-
 4 files changed, 384 insertions(+), 3 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 27d0e37534..cf8faf5a2a 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -426,10 +426,12 @@
 #
 # @cpu: property based CPU(s) to node mapping (Since: 2.10)
 #
+# @hmat-lb: memory latency and bandwidth information (Since: 5.0)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
 
 ##
 # @NumaOptions:
@@ -444,7 +446,8 @@
   'data': {
 'node': 'NumaNodeOptions',
 'dist': 'NumaDistOptions',
-'cpu': 'NumaCpuOptions' }}
+'cpu': 'NumaCpuOptions',
+'hmat-lb': 'NumaHmatLBOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -557,6 +560,92 @@
'base': 'CpuInstanceProperties',
'data' : {} }
 
+##
+# @HmatLBMemoryHierarchy:
+#
+# The memory hierarchy in the System Locality Latency and Bandwidth
+# Information Structure of HMAT (Heterogeneous Memory Attribute Table)
+#
+# For more information about @HmatLBMemoryHierarchy, see chapter
+# 5.2.27.4: Table 5-146: Field "Flags" of ACPI 6.3 spec.
+#
+# @memory: the structure represents the memory performance
+#
+# @first-level: first level of memory side cache
+#
+# @second-level: second level of memory side cache
+#
+# @third-level: third level of memory side cache
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatLBMemoryHierarchy',
+  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
+
+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency and Bandwidth
+# Information Structure of HMAT (Heterogeneous Memory Attribute Table)
+#
+# For more information about @HmatLBDataType, see chapter
+# 5.2.27.4: Table 5-146:  Field "Data Type" of ACPI 6.3 spec.
+#
+# @access-latency: access latency (nanoseconds)
+#
+# @read-latency: read latency (nanoseconds)
+#
+# @write-latency: write latency (nanoseconds)
+#
+# @access-bandwidth: access bandwidth (Bytes per second)
+#
+# @read-bandwidth: read bandwidth (Bytes per second)
+#
+# @write-bandwidth: write bandwidth (Bytes per second)
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatLBDataType',
+  'data': [ 'access-latency', 'read-latency', 'write-latency',
+'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between Initiator and Target proximity Domains.
+#
+# For more information about @NumaHmatLBOptions, see chapter
+# 5.2.27.4: Table 5-146 of ACPI 6.3 spec.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the Memory Hierarchy. Indicates the performance
+# of memory or side cache.
+#
+# @data-type: presents the type of data, access/read/write
+# latency or hit latency.
+#
+# @latency: the value of latency from @initiator to @target
+#   proximity domain, the latency unit is "ns(nanosecond)".
+#
+# @bandwidth: the value of bandwidth between @initiator and @target
+# proximity domain, the bandwidth unit is
+# "Bytes per second".
+#
+# Since: 5.0
+##
+{ 'struct': 'NumaHmatLBOptions',
+'data': {
+'initiator': 'uint16',
+'target': 'uint16',
+'hierarchy': 'HmatLBMemoryHierarchy',
+'data-type': 'HmatLBDataType',
+'*latency': 'uint64',
+'*bandwidth': 'size' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 788cbec7a2..70f93c83d7 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -14,11 +14,34 @@ struct CPUArchId;
 #define NUMA_DISTANCE_MAX 254
 #define NUMA_DISTANCE_UNREACHABLE 255
 
+/* the value of AcpiHmatLBInfo flags */
+enum {
+HMAT_LB_MEM_MEMORY   = 0,
+HMAT_LB_MEM_CACHE_1ST_LEVEL  = 1,
+HMAT_LB_MEM_CACHE_2ND_LEVEL  = 2,
+HMAT_LB_MEM_CACHE_3RD_LEVEL  = 3,
+HMAT_LB_LEVELS   /* must be the last entry */
+};
+
+/* the value of AcpiHmatLBInfo data type */
+enum {
+HMAT_LB_DATA_ACCESS_LATENCY   = 0,
+HMAT_LB_DATA_READ_LATENCY = 1,
+HMAT_LB_DATA_WRITE_LATENCY= 2,
+HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
+HMAT_LB_DATA_READ_BANDWIDTH   = 4,
+HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
+

[PULL v2 07/27] virtio: update queue size on guest write

2019-12-23 Thread Michael S. Tsirkin
Some guests read back queue size after writing it.
Update the size immediately upon write; otherwise
they get confused.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c6b47a9c73..e5c759e19e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1256,6 +1256,8 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
 break;
 case VIRTIO_PCI_COMMON_Q_SIZE:
 proxy->vqs[vdev->queue_sel].num = val;
+virtio_queue_set_num(vdev, vdev->queue_sel,
+ proxy->vqs[vdev->queue_sel].num);
 break;
 case VIRTIO_PCI_COMMON_Q_MSIX:
 msix_vector_unuse(&proxy->pci_dev,
-- 
MST
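
Note: the read-back problem described in this patch can be illustrated with a
small stand-alone model. This is only a sketch, not QEMU code: the toy_* names,
the struct layout and the "propagate" flag are assumptions made up for the
illustration. It assumes, as the commit message says, that writes land in a
proxy-side copy while reads are served from the device-side copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not QEMU code): one queue with two size copies. */
struct toy_queue {
    uint16_t proxy_num;   /* value cached by the PCI proxy on a guest write */
    uint16_t device_num;  /* value the device reports on a guest read */
};

/*
 * Guest write to queue_size. With propagate == false the device copy is
 * left untouched until the queue is enabled (the behaviour being fixed).
 */
static void toy_write_queue_size(struct toy_queue *q, uint16_t val,
                                 bool propagate)
{
    q->proxy_num = val;
    if (propagate) {
        q->device_num = q->proxy_num;
    }
}

/* Guest read of queue_size: served from the device copy. */
static uint16_t toy_read_queue_size(const struct toy_queue *q)
{
    return q->device_num;
}

/*
 * Mimics a guest that clamps the queue size and reads it back.
 * Returns the size the guest ends up seeing.
 */
static uint16_t toy_guest_negotiate(struct toy_queue *q, uint16_t guest_max,
                                    bool propagate)
{
    uint16_t num = toy_read_queue_size(q);

    if (num > guest_max) {
        toy_write_queue_size(q, guest_max, propagate);
        num = toy_read_queue_size(q);
    }
    return num;
}
```

With a device size of 512 and a guest maximum of 256, the model without
propagation still reads back 512, while the model with propagation reads
back 256.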




[PULL v2 18/27] tests/bios-tables-test: add test cases for ACPI HMAT

2019-12-23 Thread Michael S. Tsirkin
From: Tao Xu 

The ACPI HMAT table has been introduced; QEMU now builds HMAT tables for
Heterogeneous Memory with boot option '-numa node'.

Add test cases on PC and Q35 machines with 2 NUMA nodes.
Because HMAT is generated when the system enables NUMA, the
following tables need to be added for this test:
tests/data/acpi/pc/APIC.acpihmat
tests/data/acpi/pc/SRAT.acpihmat
tests/data/acpi/pc/HMAT.acpihmat
tests/data/acpi/pc/DSDT.acpihmat
tests/data/acpi/q35/APIC.acpihmat
tests/data/acpi/q35/SRAT.acpihmat
tests/data/acpi/q35/HMAT.acpihmat
tests/data/acpi/q35/DSDT.acpihmat

Acked-by: Markus Armbruster 
Reviewed-by: Igor Mammedov 
Reviewed-by: Daniel Black 
Reviewed-by: Jingqi Liu 
Suggested-by: Igor Mammedov 
Signed-off-by: Tao Xu 
Message-Id: <20191213011929.2520-9-tao3...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/bios-tables-test-allowed-diff.h |  8 +
 tests/bios-tables-test.c  | 44 +++
 tests/data/acpi/pc/APIC.acpihmat  |  0
 tests/data/acpi/pc/DSDT.acpihmat  |  0
 tests/data/acpi/pc/HMAT.acpihmat  |  0
 tests/data/acpi/pc/SRAT.acpihmat  |  0
 tests/data/acpi/q35/APIC.acpihmat |  0
 tests/data/acpi/q35/DSDT.acpihmat |  0
 tests/data/acpi/q35/HMAT.acpihmat |  0
 tests/data/acpi/q35/SRAT.acpihmat |  0
 10 files changed, 52 insertions(+)
 create mode 100644 tests/data/acpi/pc/APIC.acpihmat
 create mode 100644 tests/data/acpi/pc/DSDT.acpihmat
 create mode 100644 tests/data/acpi/pc/HMAT.acpihmat
 create mode 100644 tests/data/acpi/pc/SRAT.acpihmat
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat

diff --git a/tests/bios-tables-test-allowed-diff.h b/tests/bios-tables-test-allowed-diff.h
index dfb8523c8b..3c9e0c979b 100644
--- a/tests/bios-tables-test-allowed-diff.h
+++ b/tests/bios-tables-test-allowed-diff.h
@@ -1 +1,9 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/APIC.acpihmat",
+"tests/data/acpi/pc/SRAT.acpihmat",
+"tests/data/acpi/pc/HMAT.acpihmat",
+"tests/data/acpi/pc/DSDT.acpihmat",
+"tests/data/acpi/q35/APIC.acpihmat",
+"tests/data/acpi/q35/SRAT.acpihmat",
+"tests/data/acpi/q35/HMAT.acpihmat",
+"tests/data/acpi/q35/DSDT.acpihmat",
diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index bc0ad594a1..f1ac2d7e96 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -947,6 +947,48 @@ static void test_acpi_virt_tcg_numamem(void)
 
 }
 
+static void test_acpi_tcg_acpi_hmat(const char *machine)
+{
+test_data data;
+
+memset(&data, 0, sizeof(data));
+data.machine = machine;
+data.variant = ".acpihmat";
+test_acpi_one(" -machine hmat=on"
+  " -smp 2,sockets=2"
+  " -m 128M,slots=2,maxmem=1G"
+  " -object memory-backend-ram,size=64M,id=m0"
+  " -object memory-backend-ram,size=64M,id=m1"
+  " -numa node,nodeid=0,memdev=m0"
+  " -numa node,nodeid=1,memdev=m1,initiator=0"
+  " -numa cpu,node-id=0,socket-id=0"
+  " -numa cpu,node-id=0,socket-id=1"
+  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+  "data-type=access-latency,latency=1"
+  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=65534M"
+  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+  "data-type=access-latency,latency=65534"
+  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+  "data-type=access-bandwidth,bandwidth=32767M"
+  " -numa hmat-cache,node-id=0,size=10K,level=1,"
+  "associativity=direct,policy=write-back,line=8"
+  " -numa hmat-cache,node-id=1,size=10K,level=1,"
+  "associativity=direct,policy=write-back,line=8",
+  &data);
+free_test_data(&data);
+}
+
+static void test_acpi_q35_tcg_acpi_hmat(void)
+{
+test_acpi_tcg_acpi_hmat(MACHINE_Q35);
+}
+
+static void test_acpi_piix4_tcg_acpi_hmat(void)
+{
+test_acpi_tcg_acpi_hmat(MACHINE_PC);
+}
+
 static void test_acpi_virt_tcg(void)
 {
 test_data data = {
@@ -991,6 +1033,8 @@ int main(int argc, char *argv[])
 qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
 qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
 qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
+qtest_add_func("acpi/piix4/acpihmat", test_acpi_piix4_tcg_acpi_hmat);
+qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat);
 } else if (strcmp(arch, "aarch64") == 0) {
 qtest_add_func("acpi/virt", test_acpi_virt_tcg);
 

[PULL v2 13/27] numa: Extend CLI to provide memory side cache information

2019-12-23 Thread Michael S. Tsirkin
From: Liu Jingqi 

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
Before using hmat-cache option, enable HMAT with -machine hmat=on.

Acked-by: Markus Armbruster 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
Message-Id: <20191213011929.2520-4-tao3...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Igor Mammedov 
---
 qapi/machine.json | 81 +--
 include/sysemu/numa.h |  5 +++
 hw/core/numa.c| 80 ++
 qemu-options.hx   | 17 +++--
 4 files changed, 179 insertions(+), 4 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index cf8faf5a2a..b3d30bc816 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -428,10 +428,12 @@
 #
 # @hmat-lb: memory latency and bandwidth information (Since: 5.0)
 #
+# @hmat-cache: memory side cache information (Since: 5.0)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
 
 ##
 # @NumaOptions:
@@ -447,7 +449,8 @@
 'node': 'NumaNodeOptions',
 'dist': 'NumaDistOptions',
 'cpu': 'NumaCpuOptions',
-'hmat-lb': 'NumaHmatLBOptions' }}
+'hmat-lb': 'NumaHmatLBOptions',
+'hmat-cache': 'NumaHmatCacheOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -646,6 +649,80 @@
 '*latency': 'uint64',
 '*bandwidth': 'size' }}
 
+##
+# @HmatCacheAssociativity:
+#
+# Cache associativity in the Memory Side Cache Information Structure
+# of HMAT
+#
+# For more information of @HmatCacheAssociativity, see chapter
+# 5.2.27.5: Table 5-147 of ACPI 6.3 spec.
+#
+# @none: None (no memory side cache in this proximity domain,
+#  or cache associativity unknown)
+#
+# @direct: Direct Mapped
+#
+# @complex: Complex Cache Indexing (implementation specific)
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatCacheAssociativity',
+  'data': [ 'none', 'direct', 'complex' ] }
+
+##
+# @HmatCacheWritePolicy:
+#
+# Cache write policy in the Memory Side Cache Information Structure
+# of HMAT
+#
+# For more information of @HmatCacheWritePolicy, see chapter
+# 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec.
+#
+# @none: None (no memory side cache in this proximity domain,
+#  or cache write policy unknown)
+#
+# @write-back: Write Back (WB)
+#
+# @write-through: Write Through (WT)
+#
+# Since: 5.0
+##
+{ 'enum': 'HmatCacheWritePolicy',
+  'data': [ 'none', 'write-back', 'write-through' ] }
+
+##
+# @NumaHmatCacheOptions:
+#
+# Set the memory side cache information for a given memory domain.
+#
+# For more information of @NumaHmatCacheOptions, see chapter
+# 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec.
+#
+# @node-id: the memory proximity domain to which the memory belongs.
+#
+# @size: the size of memory side cache in bytes.
+#
+# @level: the cache level described in this structure.
+#
+# @associativity: the cache associativity,
+# none/direct-mapped/complex(complex cache indexing).
+#
+# @policy: the write policy, none/write-back/write-through.
+#
+# @line: the cache Line size in bytes.
+#
+# Since: 5.0
+##
+{ 'struct': 'NumaHmatCacheOptions',
+  'data': {
+   'node-id': 'uint32',
+   'size': 'size',
+   'level': 'uint8',
+   'associativity': 'HmatCacheAssociativity',
+   'policy': 'HmatCacheWritePolicy',
+   'line': 'uint16' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 70f93c83d7..ba693cc80b 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -91,6 +91,9 @@ struct NumaState {
 
 /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
 HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
+
+/* Memory Side Cache Information Structure */
+NumaHmatCacheOptions *hmat_cache[MAX_NODES][HMAT_LB_LEVELS];
 };
 typedef struct NumaState NumaState;
 
@@ -98,6 +101,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
 void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
 Error **errp);
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+   Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 34eb413f5d..747c9680b0 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -379,6 +379,73 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
 g_array_append_val(hmat_lb->list, lb_data);
 }
 
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+   Error 

[PULL v2 09/27] Implement backend program convention command for vhost-user-blk

2019-12-23 Thread Michael S. Tsirkin
From: Micky Yun Chan 

This patch adds the standard commands defined in docs/interop/vhost-user.rst
for vhost-user-* programs.

Signed-off-by: Micky Yun Chan (michiboo) 
Message-Id: <20191209015331.5455-1-chanmicky...@gmail.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 docs/interop/vhost-user.json|  31 +++
 contrib/vhost-user-blk/vhost-user-blk.c | 108 ++--
 docs/interop/vhost-user.rst |  17 
 3 files changed, 112 insertions(+), 44 deletions(-)

diff --git a/docs/interop/vhost-user.json b/docs/interop/vhost-user.json
index da6aaf51c8..ce0ef74db5 100644
--- a/docs/interop/vhost-user.json
+++ b/docs/interop/vhost-user.json
@@ -54,6 +54,37 @@
   ]
 }
 
+##
+# @VHostUserBackendBlockFeature:
+#
+# List of vhost user "block" features.
+#
+# @read-only: The --read-only command line option is supported.
+# @blk-file: The --blk-file command line option is supported.
+#
+# Since: 5.0
+##
+{
+  'enum': 'VHostUserBackendBlockFeature',
+  'data': [ 'read-only', 'blk-file' ]
+}
+
+##
+# @VHostUserBackendCapabilitiesBlock:
+#
+# Capabilities reported by vhost user "block" backends
+#
+# @features: list of supported features.
+#
+# Since: 5.0
+##
+{
+  'struct': 'VHostUserBackendCapabilitiesBlock',
+  'data': {
+'features': [ 'VHostUserBackendBlockFeature' ]
+  }
+}
+
 ##
 # @VHostUserBackendInputFeature:
 #
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index ae61034656..6fd91c7e99 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -576,70 +576,90 @@ vub_new(char *blk_file)
 return vdev_blk;
 }
 
+static int opt_fdnum = -1;
+static char *opt_socket_path;
+static char *opt_blk_file;
+static gboolean opt_print_caps;
+static gboolean opt_read_only;
+
+static GOptionEntry entries[] = {
+{ "print-capabilities", 'c', 0, G_OPTION_ARG_NONE, &opt_print_caps,
+  "Print capabilities", NULL },
+{ "fd", 'f', 0, G_OPTION_ARG_INT, &opt_fdnum,
+  "Use inherited fd socket", "FDNUM" },
+{ "socket-path", 's', 0, G_OPTION_ARG_FILENAME, &opt_socket_path,
+  "Use UNIX socket path", "PATH" },
+{"blk-file", 'b', 0, G_OPTION_ARG_FILENAME, &opt_blk_file,
+ "block device or file path", "PATH"},
+{ "read-only", 'r', 0, G_OPTION_ARG_NONE, &opt_read_only,
+  "Enable read-only", NULL }
+};
+
 int main(int argc, char **argv)
 {
-int opt;
-char *unix_socket = NULL;
-char *blk_file = NULL;
-bool enable_ro = false;
 int lsock = -1, csock = -1;
 VubDev *vdev_blk = NULL;
+GError *error = NULL;
+GOptionContext *context;
 
-while ((opt = getopt(argc, argv, "b:rs:h")) != -1) {
-switch (opt) {
-case 'b':
-blk_file = g_strdup(optarg);
-break;
-case 's':
-unix_socket = g_strdup(optarg);
-break;
-case 'r':
-enable_ro = true;
-break;
-case 'h':
-default:
-printf("Usage: %s [ -b block device or file, -s UNIX domain socket"
-   " | -r Enable read-only ] | [ -h ]\n", argv[0]);
-return 0;
+context = g_option_context_new(NULL);
+g_option_context_add_main_entries(context, entries, NULL);
+if (!g_option_context_parse(context, &argc, &argv, &error)) {
+g_printerr("Option parsing failed: %s\n", error->message);
+exit(EXIT_FAILURE);
+}
+if (opt_print_caps) {
+g_print("{\n");
+g_print("  \"type\": \"block\",\n");
+g_print("  \"features\": [\n");
+g_print("\"read-only\",\n");
+g_print("\"blk-file\"\n");
+g_print("  ]\n");
+g_print("}\n");
+exit(EXIT_SUCCESS);
+}
+
+if (!opt_blk_file) {
+g_print("%s\n", g_option_context_get_help(context, true, NULL));
+exit(EXIT_FAILURE);
+}
+
+if (opt_socket_path) {
+lsock = unix_sock_new(opt_socket_path);
+if (lsock < 0) {
+exit(EXIT_FAILURE);
 }
+} else if (opt_fdnum < 0) {
+g_print("%s\n", g_option_context_get_help(context, true, NULL));
+exit(EXIT_FAILURE);
+} else {
+lsock = opt_fdnum;
 }
 
-if (!unix_socket || !blk_file) {
-printf("Usage: %s [ -b block device or file, -s UNIX domain socket"
-   " | -r Enable read-only ] | [ -h ]\n", argv[0]);
-return -1;
-}
-
-lsock = unix_sock_new(unix_socket);
-if (lsock < 0) {
-goto err;
-}
-
-csock = accept(lsock, (void *)0, (void *)0);
+csock = accept(lsock, NULL, NULL);
 if (csock < 0) {
-fprintf(stderr, "Accept error %s\n", strerror(errno));
-goto err;
+g_printerr("Accept error %s\n", strerror(errno));
+exit(EXIT_FAILURE);
 }
 
-vdev_blk = vub_new(blk_file);
+vdev_blk = vub_new(opt_blk_file);
 if (!vdev_blk) {
-goto err;
+exit(EXIT_FAILURE);
 }
-if (enable_ro) {
+

[PULL v2 14/27] hmat acpi: Build Memory Proximity Domain Attributes Structure(s)

2019-12-23 Thread Michael S. Tsirkin
From: Liu Jingqi 

HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
(HMAT). The specification is available at the link below:
http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

It describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
Memory Proximity Domain. Software is expected to use this
information as a hint for optimization.

This structure describes Memory Proximity Domain Attributes by memory
subsystem and its associativity with a processor proximity domain, as well
as a hint for memory usage.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.

Acked-by: Markus Armbruster 
Reviewed-by: Igor Mammedov 
Reviewed-by: Daniel Black 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
Message-Id: <20191213011929.2520-5-tao3...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/acpi/hmat.h| 42 ++
 hw/acpi/hmat.c| 99 +++
 hw/i386/acpi-build.c  |  5 +++
 hw/acpi/Kconfig   |  7 ++-
 hw/acpi/Makefile.objs |  1 +
 5 files changed, 152 insertions(+), 2 deletions(-)
 create mode 100644 hw/acpi/hmat.h
 create mode 100644 hw/acpi/hmat.c

diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
new file mode 100644
index 00..437dbc6872
--- /dev/null
+++ b/hw/acpi/hmat.h
@@ -0,0 +1,42 @@
+/*
+ * HMAT ACPI Implementation Header
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi 
+ *  Tao Xu 
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#ifndef HMAT_H
+#define HMAT_H
+
+#include "hw/acpi/aml-build.h"
+
+/*
+ * ACPI 6.3: 5.2.27.3 Memory Proximity Domain Attributes Structure,
+ * Table 5-145, Field "flag", Bit [0]: set to 1 to indicate that data in
+ * the Proximity Domain for the Attached Initiator field is valid.
+ * Other bits reserved.
+ */
+#define HMAT_PROXIMITY_INITIATOR_VALID  0x1
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state);
+
+#endif
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
new file mode 100644
index 00..9ff79308a4
--- /dev/null
+++ b/hw/acpi/hmat.c
@@ -0,0 +1,99 @@
+/*
+ * HMAT ACPI Implementation
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi 
+ *  Tao Xu 
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/hmat.h"
+
+/*
+ * ACPI 6.3:
+ * 5.2.27.3 Memory Proximity Domain Attributes Structure: Table 5-145
+ */
+static void build_hmat_mpda(GArray *table_data, uint16_t flags,
+uint32_t initiator, uint32_t mem_node)
+{
+
+/* Memory Proximity Domain Attributes Structure */
+/* Type */
+build_append_int_noprefix(table_data, 0, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, 40, 4);
+/* Flags */
+build_append_int_noprefix(table_data, flags, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Proximity Domain for the Attached Initiator */
+build_append_int_noprefix(table_data, initiator, 4);
+/* Proximity Domain for the Memory */
+build_append_int_noprefix(table_data, mem_node, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/*
+ * Reserved:
+ * Previously defined as the Start Address of the System Physical
+  

[PULL v2 06/27] intel_iommu: fix bug to read DMAR_RTADDR_REG

2019-12-23 Thread Michael S. Tsirkin
From: Yi Sun 

The read handler should return the raw DMAR_RTADDR_REG value instead of
's->root', because 's->root' is modified in 'vtd_root_table_setup()' so
that the first 12 bits are omitted. As a result, the guest
iommu debugfs cannot show pasid tables.

Signed-off-by: Yi Sun 
Message-Id: <20191205095439.29114-1-yi.y@linux.intel.com>
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Peter Xu 
Reviewed-by: Michael S. Tsirkin 
---
 hw/i386/intel_iommu.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 43c94b993b..ee06993675 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2610,16 +2610,15 @@ static uint64_t vtd_mem_read(void *opaque, hwaddr addr, unsigned size)
 switch (addr) {
 /* Root Table Address Register, 64-bit */
 case DMAR_RTADDR_REG:
+val = vtd_get_quad_raw(s, DMAR_RTADDR_REG);
 if (size == 4) {
-val = s->root & ((1ULL << 32) - 1);
-} else {
-val = s->root;
+val = val & ((1ULL << 32) - 1);
 }
 break;
 
 case DMAR_RTADDR_REG_HI:
 assert(size == 4);
-val = s->root >> 32;
+val = vtd_get_quad_raw(s, DMAR_RTADDR_REG) >> 32;
 break;
 
 /* Invalidation Queue Address Register, 64-bit */
-- 
MST
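
Note: the effect of this fix can be sketched with a small stand-alone model.
This is not QEMU code: the toy_* names and TOY_ADDR_MASK are hypothetical, and
the mask only models the "first 12 bits are omitted" statement from the commit
message (the real setup code may mask further bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask: drop the low 12 bits, keeping a page-aligned address. */
#define TOY_ADDR_MASK (~0xfffULL)

struct toy_iommu {
    uint64_t rtaddr_raw;  /* raw register contents as written by the guest */
    uint64_t root;        /* derived value used internally for table walks */
};

/* Models the setup step that derives the internal root pointer. */
static void toy_root_table_setup(struct toy_iommu *s)
{
    s->root = s->rtaddr_raw & TOY_ADDR_MASK;
}

/* Old read path: returns the derived value, so the low bits are lost. */
static uint64_t toy_read_rtaddr_old(const struct toy_iommu *s, unsigned size)
{
    return size == 4 ? (s->root & ((1ULL << 32) - 1)) : s->root;
}

/* New read path: returns the raw register, truncated for 32-bit reads. */
static uint64_t toy_read_rtaddr_new(const struct toy_iommu *s, unsigned size)
{
    uint64_t val = s->rtaddr_raw;

    if (size == 4) {
        val &= (1ULL << 32) - 1;
    }
    return val;
}
```

In this model, a guest that sets any of the low 12 bits of the register reads
them back only through the new path; the old path always returns them as zero.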




[PULL v2 03/27] virtio-balloon: fix memory leak while attach virtio-balloon device

2019-12-23 Thread Michael S. Tsirkin
From: Pan Nengyuan 

ivq/dvq/svq/free_page_vq are not cleaned up in
virtio_balloon_device_unrealize. The memory leak stack is as follows:

Direct leak of 14336 byte(s) in 2 object(s) allocated from:
#0 0x7f99fd9d8560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
#1 0x7f99fcb20015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015)
#2 0x557d90638437 in virtio_add_queue hw/virtio/virtio.c:2327
#3 0x557d9064401d in virtio_balloon_device_realize hw/virtio/virtio-balloon.c:793
#4 0x557d906356f7 in virtio_device_realize hw/virtio/virtio.c:3504
#5 0x557d9073f081 in device_set_realized hw/core/qdev.c:876
#6 0x557d908b1f4d in property_set_bool qom/object.c:2080
#7 0x557d908b655e in object_property_set_qobject qom/qom-qobject.c:26

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
Message-Id: <1575444716-17632-2-git-send-email-pannengy...@huawei.com>
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: David Hildenbrand 
Reviewed-by: Michael S. Tsirkin 
Reviewed-by: David Hildenbrand 
---
 hw/virtio/virtio-balloon.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 40b04f5180..57f3b9f22d 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -831,6 +831,13 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
 }
 balloon_stats_destroy_timer(s);
 qemu_remove_balloon_handler(s);
+
+virtio_delete_queue(s->ivq);
+virtio_delete_queue(s->dvq);
+virtio_delete_queue(s->svq);
+if (s->free_page_vq) {
+virtio_delete_queue(s->free_page_vq);
+}
 virtio_cleanup(vdev);
 }
 
-- 
MST
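
Note: the leak fixed here comes from realize and unrealize not being
symmetric. A minimal sketch of that symmetry follows, as a toy model rather
than QEMU code: the toy_* helpers and the live-queue counter are assumptions
introduced only to make the pairing visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of queues currently allocated; stands in for a leak checker. */
static int toy_live_queues;

struct toy_vq {
    int index;
};

/* Allocates one queue and counts it as live. */
static struct toy_vq *toy_add_queue(int index)
{
    struct toy_vq *vq = calloc(1, sizeof(*vq));

    if (vq) {
        vq->index = index;
        toy_live_queues++;
    }
    return vq;
}

/* Frees one queue; NULL is ignored. */
static void toy_delete_queue(struct toy_vq *vq)
{
    if (vq) {
        free(vq);
        toy_live_queues--;
    }
}

struct toy_balloon {
    struct toy_vq *ivq, *dvq, *svq;
    struct toy_vq *free_page_vq;  /* optional, may stay NULL */
};

static void toy_realize(struct toy_balloon *s, int with_free_page_vq)
{
    s->ivq = toy_add_queue(0);
    s->dvq = toy_add_queue(1);
    s->svq = toy_add_queue(2);
    s->free_page_vq = with_free_page_vq ? toy_add_queue(3) : NULL;
}

/* Every queue added in toy_realize() is deleted here, mirroring the patch. */
static void toy_unrealize(struct toy_balloon *s)
{
    toy_delete_queue(s->ivq);
    toy_delete_queue(s->dvq);
    toy_delete_queue(s->svq);
    if (s->free_page_vq) {
        toy_delete_queue(s->free_page_vq);
    }
}
```

After a realize/unrealize pair the live-queue counter returns to zero, whether
or not the optional free-page queue was created.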




[PULL v2 11/27] numa: Extend CLI to provide initiator information for numa nodes

2019-12-23 Thread Michael S. Tsirkin
From: Tao Xu 

In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
the initiator represents a processor that accesses memory. In 5.2.27.3
Memory Proximity Domain Attributes Structure, the attached initiator is
defined as the location of the memory controller responsible for a memory
proximity domain. With attached initiator information, the topology of
heterogeneous memory can be described. Add a new machine property 'hmat' to
enable all HMAT-specific options.

Extend the CLI of the "-numa node" option to indicate the initiator NUMA node-id.
In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables. Before using the initiator option, enable HMAT with
-machine hmat=on.

Acked-by: Markus Armbruster 
Reviewed-by: Igor Mammedov 
Reviewed-by: Jingqi Liu 
Suggested-by: Dan Williams 
Signed-off-by: Tao Xu 
Message-Id: <20191213011929.2520-2-tao3...@intel.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 qapi/machine.json | 10 ++-
 include/sysemu/numa.h |  5 
 hw/core/machine.c | 64 +++
 hw/core/numa.c| 23 
 qemu-options.hx   | 35 +++
 5 files changed, 131 insertions(+), 6 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index ca26779f1a..27d0e37534 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -463,6 +463,13 @@
 # @memdev: memory backend object.  If specified for one node,
 #  it must be specified for all nodes.
 #
+# @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145,
+# points to the nodeid which has the memory controller
+# responsible for this NUMA node. This field provides
+# additional information as to the initiator node that
+# is closest (as in directly attached) to this node, and
+# therefore has the best performance (since 5.0)
+#
 # Since: 2.1
 ##
 { 'struct': 'NumaNodeOptions',
@@ -470,7 +477,8 @@
'*nodeid': 'uint16',
'*cpus':   ['uint16'],
'*mem':'size',
-   '*memdev': 'str' }}
+   '*memdev': 'str',
+   '*initiator': 'uint16' }}
 
 ##
 # @NumaDistOptions:
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index ae9c41d02b..788cbec7a2 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -18,6 +18,8 @@ struct NodeInfo {
 uint64_t node_mem;
 struct HostMemoryBackend *node_memdev;
 bool present;
+bool has_cpu;
+uint16_t initiator;
 uint8_t distance[MAX_NODES];
 };
 
@@ -33,6 +35,9 @@ struct NumaState {
 /* Allow setting NUMA distance for different NUMA nodes */
 bool have_numa_distance;
 
+/* Detect if HMAT support is enabled. */
+bool hmat_enabled;
+
 /* NUMA nodes information */
 NodeInfo nodes[MAX_NODES];
 };
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 0854dcebdd..f5e2b32b3b 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -430,6 +430,20 @@ static void machine_set_nvdimm(Object *obj, bool value, Error **errp)
 ms->nvdimms_state->is_enabled = value;
 }
 
+static bool machine_get_hmat(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->numa_state->hmat_enabled;
+}
+
+static void machine_set_hmat(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->numa_state->hmat_enabled = value;
+}
+
 static char *machine_get_nvdimm_persistence(Object *obj, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
@@ -557,6 +571,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
const CpuInstanceProperties *props, Error 
**errp)
 {
 MachineClass *mc = MACHINE_GET_CLASS(machine);
+NodeInfo *numa_info = machine->numa_state->nodes;
 bool match = false;
 int i;
 
@@ -626,6 +641,17 @@ void machine_set_cpu_numa_node(MachineState *machine,
 match = true;
 slot->props.node_id = props->node_id;
 slot->props.has_node_id = props->has_node_id;
+
+if (machine->numa_state->hmat_enabled) {
+if ((numa_info[props->node_id].initiator < MAX_NODES) &&
+(props->node_id != numa_info[props->node_id].initiator)) {
+error_setg(errp, "The initiator of CPU NUMA node %" PRId64
+" should be itself", props->node_id);
+return;
+}
+numa_info[props->node_id].has_cpu = true;
+numa_info[props->node_id].initiator = props->node_id;
+}
 }
 
 if (!match) {
@@ -846,6 +872,13 @@ static void machine_initfn(Object *obj)
 
 if (mc->numa_mem_supported) {
 ms->numa_state = g_new0(NumaState, 1);
+object_property_add_bool(obj, "hmat",
+ machine_get_hmat, machine_set_hmat,
+ &error_abort);
+object_property_set_description(obj, "hmat",
+"Set on/off to enable/disable 

[PULL v2 08/27] virtio-pci: disable vring processing when bus-mastering is disabled

2019-12-23 Thread Michael S. Tsirkin
From: Michael Roth 

Currently the SLOF firmware for pseries guests will disable/re-enable
a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it tends to
work with a single device at a time at various stages like probing
and running block/network bootloaders without doing a full reset
in-between.

In QEMU, when PCI_COMMAND_MASTER is disabled we disable the
corresponding IOMMU memory region, so DMA accesses (including to vring
fields like idx/flags) will no longer undergo the necessary
translation. Normally we wouldn't expect this to happen since it would
be misbehavior on the driver side to continue driving DMA requests.

However, in the case of pseries, with iommu_platform=on, we trigger the
following sequence when tearing down the virtio-blk dataplane ioeventfd
in response to the guest unsetting PCI_COMMAND_MASTER:

  #2  0x55922651 in virtqueue_map_desc (vdev=vdev@entry=0x56dbcfb0, 
p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, 
iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, 
is_write=is_write@entry=false, pa=0, sz=0)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757
  #3  0x55922a89 in virtqueue_pop (vq=vq@entry=0x56dc8660, 
sz=sz@entry=184)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950
  #4  0x558d3eca in virtio_blk_get_request (vq=0x56dc8660, 
s=0x56dbcfb0)
  at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255
  #5  0x558d3eca in virtio_blk_handle_vq (s=0x56dbcfb0, 
vq=0x56dc8660)
  at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776
  #6  0x5591dd66 in virtio_queue_notify_aio_vq 
(vq=vq@entry=0x56dc8660)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550
  #7  0x5591ecef in virtio_queue_notify_aio_vq (vq=0x56dc8660)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546
  #8  0x5591ecef in virtio_queue_host_notifier_aio_poll 
(opaque=0x56dc86c8)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527
  #9  0x55d02164 in run_poll_handlers_once 
(ctx=ctx@entry=0x5688bfc0, timeout=timeout@entry=0x7fffe65844a8)
  at /home/mdroth/w/qemu.git/util/aio-posix.c:520
  #10 0x55d02d1b in try_poll_mode (timeout=0x7fffe65844a8, 
ctx=0x5688bfc0)
  at /home/mdroth/w/qemu.git/util/aio-posix.c:607
  #11 0x55d02d1b in aio_poll (ctx=ctx@entry=0x5688bfc0, 
blocking=blocking@entry=true)
  at /home/mdroth/w/qemu.git/util/aio-posix.c:639
  #12 0x55d0004d in aio_wait_bh_oneshot (ctx=0x5688bfc0, 
cb=cb@entry=0x558d5130 , 
opaque=opaque@entry=0x56de86f0)
  at /home/mdroth/w/qemu.git/util/aio-wait.c:71
  #13 0x558d59bf in virtio_blk_data_plane_stop (vdev=<optimized out>)
  at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288
  #14 0x55b906a1 in virtio_bus_stop_ioeventfd 
(bus=bus@entry=0x56dbcf38)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245
  #15 0x55b90dbb in virtio_bus_stop_ioeventfd 
(bus=bus@entry=0x56dbcf38)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237
  #16 0x55b92a8e in virtio_pci_stop_ioeventfd (proxy=0x56db4e40)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292
  #17 0x55b92a8e in virtio_write_config (pci_dev=0x56db4e40, 
address=<optimized out>, val=1048832, len=<optimized out>)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613

I.e. the calling code is only scheduling a one-shot BH for
virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
an additional virtqueue entry before we get there. This is likely due
to the following check in virtio_queue_host_notifier_aio_poll:

  static bool virtio_queue_host_notifier_aio_poll(void *opaque)
  {
  EventNotifier *n = opaque;
  VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
  bool progress;

  if (!vq->vring.desc || virtio_queue_empty(vq)) {
  return false;
  }

  progress = virtio_queue_notify_aio_vq(vq);

namely the call to virtio_queue_empty(). In this case, since no new
requests have actually been issued, shadow_avail_idx == last_avail_idx,
so we actually try to access the vring via vring_avail_idx() to get
the latest non-shadowed idx:

  int virtio_queue_empty(VirtQueue *vq)
  {
  bool empty;
  ...

  if (vq->shadow_avail_idx != vq->last_avail_idx) {
  return 0;
  }

  rcu_read_lock();
  empty = vring_avail_idx(vq) == vq->last_avail_idx;
  rcu_read_unlock();
  return empty;

but since the IOMMU region has been disabled we get a bogus value (0
usually), which causes virtio_queue_empty() to falsely report that
there are entries to be processed, which causes errors such as:

  "virtio: zero sized buffers are not allowed"

or

  "virtio-blk missing headers"

and puts the device in an error state.
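
To make the failure mode concrete, here is a minimal standalone model of the
emptiness check. It is only a sketch and not QEMU code: ToyQueue,
toy_read_avail_idx, toy_queue_empty and toy_idle_queue are hypothetical names,
and the only behaviour taken from the description above is that a read through
the disabled region yields a bogus 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a virtqueue; this is not the QEMU VirtQueue. */
typedef struct ToyQueue {
    uint16_t guest_avail_idx;   /* value the guest wrote into the ring */
    uint16_t shadow_avail_idx;  /* device-side cached copy of that value */
    uint16_t last_avail_idx;    /* next entry the device would consume */
    bool dma_enabled;           /* models the PCI_COMMAND_MASTER bit */
} ToyQueue;

/* Assumption: with DMA disabled the read returns 0 instead of the real idx. */
static uint16_t toy_read_avail_idx(const ToyQueue *q)
{
    return q->dma_enabled ? q->guest_avail_idx : 0;
}

/* Same shape as the check quoted above: shadow first, then the ring itself. */
static bool toy_queue_empty(const ToyQueue *q)
{
    if (q->shadow_avail_idx != q->last_avail_idx) {
        return false;
    }
    return toy_read_avail_idx(q) == q->last_avail_idx;
}

/* Builds a queue in which the device has consumed everything the guest posted. */
static ToyQueue toy_idle_queue(uint16_t idx, bool dma_enabled)
{
    ToyQueue q = { idx, idx, idx, dma_enabled };

    assert(q.shadow_avail_idx == q.last_avail_idx);
    return q;
}
```

With an idle queue at index 5, the model reports "empty" while DMA is enabled
and "not empty" once it is disabled, which is the false positive described
above.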

This patch works around the issue by introducing virtio_set_disabled(),
which sets a 'disabled' 

[PULL v2 05/27] virtio-input: convert to new virtio_delete_queue

2019-12-23 Thread Michael S. Tsirkin
Seems cleaner than using VQ index values.

Signed-off-by: Michael S. Tsirkin 
---
 hw/input/virtio-input.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/input/virtio-input.c b/hw/input/virtio-input.c
index ec54e46ad6..9c013afddb 100644
--- a/hw/input/virtio-input.c
+++ b/hw/input/virtio-input.c
@@ -280,6 +280,7 @@ static void virtio_input_device_unrealize(DeviceState *dev, 
Error **errp)
 {
 VirtIOInputClass *vic = VIRTIO_INPUT_GET_CLASS(dev);
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtIOInput *vinput = VIRTIO_INPUT(dev);
 Error *local_err = NULL;
 
 if (vic->unrealize) {
@@ -289,8 +290,8 @@ static void virtio_input_device_unrealize(DeviceState *dev, 
Error **errp)
 return;
 }
 }
-virtio_del_queue(vdev, 0);
-virtio_del_queue(vdev, 1);
+virtio_delete_queue(vinput->evt);
+virtio_delete_queue(vinput->sts);
 virtio_cleanup(vdev);
 }
 
-- 
MST




[PULL v2 04/27] virtio-serial-bus: fix memory leak while attach virtio-serial-bus

2019-12-23 Thread Michael S. Tsirkin
From: Pan Nengyuan 

ivqs/ovqs/c_ivq/c_ovq are not cleaned up in
virtio_serial_device_unrealize; the memory leak stack is as below:

Direct leak of 1290240 byte(s) in 180 object(s) allocated from:
#0 0x7fc9bfc27560 in calloc (/usr/lib64/libasan.so.3+0xc7560)
#1 0x7fc9bed6f015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015)
#2 0x5650e02b83e7 in virtio_add_queue hw/virtio/virtio.c:2327
#3 0x5650e02847b5 in virtio_serial_device_realize 
hw/char/virtio-serial-bus.c:1089
#4 0x5650e02b56a7 in virtio_device_realize hw/virtio/virtio.c:3504
#5 0x5650e03bf031 in device_set_realized hw/core/qdev.c:876
#6 0x5650e0531efd in property_set_bool qom/object.c:2080
#7 0x5650e053650e in object_property_set_qobject qom/qom-qobject.c:26
#8 0x5650e0533e14 in object_property_set_bool qom/object.c:1338
#9 0x5650e04c0e37 in virtio_pci_realize hw/virtio/virtio-pci.c:1801

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
Cc: Laurent Vivier 
Cc: Amit Shah 
Cc: "Marc-André Lureau" 
Cc: Paolo Bonzini 
Message-Id: <1575444716-17632-3-git-send-email-pannengy...@huawei.com>
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Michael S. Tsirkin 
---
 hw/char/virtio-serial-bus.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 33259042a9..e1cbce3ba3 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -1126,9 +1126,17 @@ static void virtio_serial_device_unrealize(DeviceState 
*dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOSerial *vser = VIRTIO_SERIAL(dev);
+int i;
 
 QLIST_REMOVE(vser, next);
 
+virtio_delete_queue(vser->c_ivq);
+virtio_delete_queue(vser->c_ovq);
+for (i = 0; i < vser->bus.max_nr_ports; i++) {
+virtio_delete_queue(vser->ivqs[i]);
+virtio_delete_queue(vser->ovqs[i]);
+}
+
 g_free(vser->ivqs);
 g_free(vser->ovqs);
 g_free(vser->ports_map);
-- 
MST




[PULL v2 10/27] virtio: don't enable notifications during polling

2019-12-23 Thread Michael S. Tsirkin
From: Stefan Hajnoczi 

Virtqueue notifications are not necessary during polling, so we disable
them.  This allows the guest driver to avoid MMIO vmexits.
Unfortunately the virtio-blk and virtio-scsi handler functions re-enable
notifications, defeating this optimization.

Fix virtio-blk and virtio-scsi emulation so they leave notifications
disabled.  The key thing to remember for correctness is that polling
always checks one last time after ending its loop, therefore it's safe
to lose the race when re-enabling notifications at the end of polling.

There is a measurable performance improvement of 5-10% with the null-co
block driver.  Real-life storage configurations will see a smaller
improvement because the MMIO vmexit overhead contributes less to
latency.
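
As a rough illustration of the idea (a toy model, not the QEMU
implementation), the handler can remember whether notifications were enabled
when it was entered and only toggle them in that case. ToyVq,
toy_set_notification and toy_handle_vq are hypothetical names, and the
"toggles" counter exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature virtqueue used only for this sketch. */
typedef struct ToyVq {
    bool notification;  /* true: the guest has to kick the device */
    int pending;        /* requests waiting in the queue */
    int toggles;        /* how many times the notification flag was written */
} ToyVq;

static void toy_set_notification(ToyVq *vq, bool enable)
{
    vq->notification = enable;
    vq->toggles++;
}

/* Drains the queue; leaves notifications alone if they were already off. */
static bool toy_handle_vq(ToyVq *vq)
{
    bool toggle = vq->notification;  /* false while polling is active */
    bool progress = false;

    assert(vq->pending >= 0);
    do {
        if (toggle) {
            toy_set_notification(vq, false);
        }
        while (vq->pending > 0) {
            vq->pending--;
            progress = true;
        }
        if (toggle) {
            toy_set_notification(vq, true);
        }
    } while (vq->pending > 0);

    return progress;
}
```

In this model a queue entered with notifications disabled (the polling case)
is drained without the flag ever being written, while a queue entered with
notifications enabled gets the usual disable/re-enable pair.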

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20191209210957.65087-1-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/virtio.h |  1 +
 hw/block/virtio-blk.c  |  9 +++--
 hw/scsi/virtio-scsi.c  |  9 +++--
 hw/virtio/virtio.c | 12 ++--
 4 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 72475c..b69d517496 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -228,6 +228,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
version_id);
 
 void virtio_notify_config(VirtIODevice *vdev);
 
+bool virtio_queue_get_notification(VirtQueue *vq);
 void virtio_queue_set_notification(VirtQueue *vq, int enable);
 
 int virtio_queue_ready(VirtQueue *vq);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index d62e6377c2..b12157b5eb 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -764,13 +764,16 @@ bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 {
 VirtIOBlockReq *req;
 MultiReqBuffer mrb = {};
+bool suppress_notifications = virtio_queue_get_notification(vq);
 bool progress = false;
 
 aio_context_acquire(blk_get_aio_context(s->blk));
 blk_io_plug(s->blk);
 
 do {
-virtio_queue_set_notification(vq, 0);
+if (suppress_notifications) {
+virtio_queue_set_notification(vq, 0);
+}
 
 while ((req = virtio_blk_get_request(s, vq))) {
 progress = true;
@@ -781,7 +784,9 @@ bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 }
 }
 
-virtio_queue_set_notification(vq, 1);
+if (suppress_notifications) {
+virtio_queue_set_notification(vq, 1);
+}
 } while (!virtio_queue_empty(vq));
 
 if (mrb.num_reqs) {
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index e8b2b64d09..f080545f48 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -597,12 +597,15 @@ bool virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue 
*vq)
 {
 VirtIOSCSIReq *req, *next;
 int ret = 0;
+bool suppress_notifications = virtio_queue_get_notification(vq);
 bool progress = false;
 
 QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
 
 do {
-virtio_queue_set_notification(vq, 0);
+if (suppress_notifications) {
+virtio_queue_set_notification(vq, 0);
+}
 
 while ((req = virtio_scsi_pop_req(s, vq))) {
 progress = true;
@@ -622,7 +625,9 @@ bool virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
 }
 }
 
-virtio_queue_set_notification(vq, 1);
+if (suppress_notifications) {
+virtio_queue_set_notification(vq, 1);
+}
 } while (ret != -EINVAL && !virtio_queue_empty(vq));
 
 QTAILQ_FOREACH_SAFE(req, &reqs, next, next) {
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7bc6a9455e..95d8ff8508 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -432,6 +432,11 @@ static void virtio_queue_packed_set_notification(VirtQueue 
*vq, int enable)
 }
 }
 
+bool virtio_queue_get_notification(VirtQueue *vq)
+{
+return vq->notification;
+}
+
 void virtio_queue_set_notification(VirtQueue *vq, int enable)
 {
 vq->notification = enable;
@@ -3410,17 +3415,12 @@ static bool virtio_queue_host_notifier_aio_poll(void 
*opaque)
 {
 EventNotifier *n = opaque;
 VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
-bool progress;
 
 if (!vq->vring.desc || virtio_queue_empty(vq)) {
 return false;
 }
 
-progress = virtio_queue_notify_aio_vq(vq);
-
-/* In case the handler function re-enabled notifications */
-virtio_queue_set_notification(vq, 0);
-return progress;
+return virtio_queue_notify_aio_vq(vq);
 }
 
 static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
-- 
MST




[PULL v2 02/27] virtio: make virtio_delete_queue idempotent

2019-12-23 Thread Michael S. Tsirkin
Let's make sure calling this twice is harmless -
no known instances, but seems safer.
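
A small standalone sketch of why clearing the pointer makes a second call
harmless. It uses plain free() instead of g_free(), and ToyQueue,
toy_queue_init and toy_delete_queue are hypothetical names, not the QEMU
functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature queue that owns one heap allocation. */
typedef struct ToyQueue {
    unsigned num;
    int *used_elems;
} ToyQueue;

static void toy_queue_init(ToyQueue *q, unsigned num)
{
    assert(num > 0);
    q->num = num;
    q->used_elems = calloc(num, sizeof(*q->used_elems));
}

/* Safe to call more than once on the same queue. */
static void toy_delete_queue(ToyQueue *q)
{
    q->num = 0;
    free(q->used_elems);   /* free(NULL) is a no-op */
    q->used_elems = NULL;  /* without this, a second call would double-free */
}
```

Calling toy_delete_queue() twice on the same queue leaves used_elems NULL
and num 0 both times.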

Suggested-by: Pan Nengyuan 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 31dd140990..6de3cfdc2c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2337,6 +2337,7 @@ void virtio_delete_queue(VirtQueue *vq)
 vq->handle_output = NULL;
 vq->handle_aio_output = NULL;
 g_free(vq->used_elems);
+vq->used_elems = NULL;
 }
 
 void virtio_del_queue(VirtIODevice *vdev, int n)
-- 
MST




[PULL v2 01/27] virtio: add ability to delete vq through a pointer

2019-12-23 Thread Michael S. Tsirkin
Devices tend to maintain vq pointers; allow deleting queues through a vq pointer.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: David Hildenbrand 
---
 include/hw/virtio/virtio.h |  2 ++
 hw/virtio/virtio.c | 15 ++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c32a815303..e18756d50d 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -183,6 +183,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 
 void virtio_del_queue(VirtIODevice *vdev, int n);
 
+void virtio_delete_queue(VirtQueue *vq);
+
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int len);
 void virtqueue_flush(VirtQueue *vq, unsigned int count);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 04716b5f6c..31dd140990 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2330,17 +2330,22 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 return &vdev->vq[i];
 }
 
+void virtio_delete_queue(VirtQueue *vq)
+{
+vq->vring.num = 0;
+vq->vring.num_default = 0;
+vq->handle_output = NULL;
+vq->handle_aio_output = NULL;
+g_free(vq->used_elems);
+}
+
 void virtio_del_queue(VirtIODevice *vdev, int n)
 {
 if (n < 0 || n >= VIRTIO_QUEUE_MAX) {
 abort();
 }
 
-vdev->vq[n].vring.num = 0;
-vdev->vq[n].vring.num_default = 0;
-vdev->vq[n].handle_output = NULL;
-vdev->vq[n].handle_aio_output = NULL;
-g_free(vdev->vq[n].used_elems);
+virtio_delete_queue(&vdev->vq[n]);
 }
 
 static void virtio_set_isr(VirtIODevice *vdev, int value)
-- 
MST




[Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs

2019-12-23 Thread Babu Moger
Damir,
  We normally test Linux guests here. Can you please give me the exact qemu command
line? Even just the SMP parameters (sockets, cores, threads, dies) will work. I
will try to recreate it locally first.
Please give me an example of what works and what does not.

I have recently sent a few more patches to fix another bug. Please check whether they
make any difference.
https://patchwork.kernel.org/cover/11272063/
https://lore.kernel.org/qemu-devel/157541968844.46157.17994918142533791313.st...@naples-babu.amd.com/

This should apply cleanly on git://github.com/ehabkost/qemu.git (branch
x86-next)

Note: I will be on vacation until first week of Jan. Responses will be
delayed.

https://bugs.launchpad.net/bugs/1856335

Title:
  Cache Layout wrong on many Zen Arch CPUs

Status in QEMU:
  New

Bug description:
  AMD CPUs have L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems
  to always map the cache as if it were a 4-core-per-CCX CPU, which is
  incorrect, and costs upwards of 30% performance (more realistically 10%)
  in L3 cache layout aware applications.

  Example on a 4-CCX CPU (1950X /w 8 Cores and no SMT):

    
  EPYC-IBPB
  AMD
  

  In windows, coreinfo reports correctly:

    Unified Cache 1, Level 3,8 MB, Assoc  16, LineSize  64
    Unified Cache 6, Level 3,8 MB, Assoc  16, LineSize  64

  On a 3-CCX CPU (3960X /w 6 cores and no SMT):

   
  EPYC-IBPB
  AMD
  

  in windows, coreinfo reports incorrectly:

  --  Unified Cache  1, Level 3,8 MB, Assoc  16, LineSize  64
  **  Unified Cache  6, Level 3,8 MB, Assoc  16, LineSize  64

  Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.

  With newer Qemu there is a fix (that does behave correctly) in using the dies 
parameter:
   

  The problem is that the dies are exposed differently than how AMD does
  it natively: they are exposed to Windows as sockets, which means that
  if you are not a business user, you can't ever have a machine with
  more than two CCX (6 cores), as consumer versions of Windows only
  support two sockets. (Should this be reported as a separate bug?)




Re: Making QEMU easier for management tools and applications

2019-12-23 Thread Michal Prívozník
On 12/21/19 10:02 AM, Markus Armbruster wrote:
> Stefan Hajnoczi  writes:
> 


>> 4. Go and Rust bindings would also be useful.  There is
>> https://github.com/intel/govmm but I think it makes sense to keep it
>> in qemu.git and provide an interface similar to our Python modules.
> 
> Mapping QAPI/QMP commands and events to function signatures isn't hard
> (the QAPI code generator does).  Two problems (at least):
> 
> 1. Leads to some pretty ridiculous functions.  Here's one:
> 
> void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
>  const char *device,
>  const char *target,
>  bool has_replaces, const char *replaces,
>  MirrorSyncMode sync,
>  bool has_speed, int64_t speed,
>  bool has_granularity, uint32_t granularity,
>  bool has_buf_size, int64_t buf_size,
>  bool has_on_source_error,
>  BlockdevOnError on_source_error,
>  bool has_on_target_error, BlockdevOnError 
> on_target_error,
>  bool has_filter_node_name, const char 
> *filter_node_name,
>  bool has_copy_mode, MirrorCopyMode copy_mode, 
>  bool has_auto_finalize, bool auto_finalize,
>  bool has_auto_dismiss, bool auto_dismiss,
>  Error **errp);
> 
>   We commonly use 'boxed': true for such beasts, which results in
>   functions like this one:
> 
> void qmp_blockdev_add(BlockdevOptions *arg, Error **errp);
> 
> 2. Many schema changes that are nicely backward compatible in QMP are
>anything but in such an "obvious" C API.  Adding optional arguments,
>for instance, or changing integer type width.  The former is less of
>an issue with 'boxed': true.
> 
> Perhaps less of an issue with dynamic languages.
> 
> I figure a static language would need much more expressive oomph than C
> to be a good target.  No idea how well Go or Rust bindings can work.

This is something that has bothered me for a while now, even though it's not
as bad as it used to be, because we are not adding as many wrappers for
monitor commands as we used to. I mean, in libvirt the wrapper for a
monitor command has to be written by hand. Worse, whenever I'm adding a
wrapper I look at its QMP schema and let my muscle memory write
the wrapper.

However, it's not only what Markus already mentioned. Even if we
generated wrappers by a script, we need to be able to generate wrappers
for every single supported version of qemu.

For instance, if qemu version X has a command that accepts some set of
arguments and this set changes in version X+1, then libvirt needs both
wrappers and must decide at runtime (depending on which version it is talking
to) which wrapper to use.

Unfortunately, I don't see any easy way out.

Michal




Re: [PULL 48/87] x86: move SMM property to X86MachineState

2019-12-23 Thread Michal Prívozník
On 12/23/19 2:38 PM, Paolo Bonzini wrote:
> On 23/12/19 12:40, Michal Prívozník wrote:
>>
>> diff --git i/target/i386/kvm.c w/target/i386/kvm.c
>> index 0b511906e3..7ee3202634 100644
>> --- i/target/i386/kvm.c
>> +++ w/target/i386/kvm.c
>> @@ -2173,6 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>  }
>>
>>  if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
>> +object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
>>  x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
>>  smram_machine_done.notify = register_smram_listener;
>>  qemu_add_machine_init_done_notifier(&smram_machine_done);
> 
> Yes, it's correct.  Is it okay if I just send a patch with your
> Signed-off-by?

ACK.

Michal




Re: [PATCH v1] virtio-pci: store virtqueue size directly to a device

2019-12-23 Thread Michael S. Tsirkin
On Mon, Dec 23, 2019 at 02:37:58PM +0300, Denis Plotnikov wrote:
> Currently, the virtqueue size is saved to the proxy on a PCI write and
> is read from the device on a PCI read.
> The virtqueue size is propagated later from the proxy to the device
> at the virtqueue enabling stage.
> 
> This could be a problem if a guest, during virtqueue configuration, sets
> the size and then re-reads it immediately, before enabling the queue,
> in order to check whether the desired size has been set.
> 
> This happens in seabios (seabios snippet):
> 
> vp_find_vq()
> {
> ...
> /* check if the queue is available */
> if (vp->use_modern) {
> num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
> if (num > MAX_QUEUE_NUM) {
> vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
>  MAX_QUEUE_NUM);
> num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
> }
> } else {
> num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
> }
> if (!num) {
> dprintf(1, "ERROR: queue size is 0\n");
> goto fail;
> }
> if (num > MAX_QUEUE_NUM) {
> dprintf(1, "ERROR: queue size %d > %d\n", num, MAX_QUEUE_NUM);
> goto fail;
> }
> ...
> }
> 
> If the device queue size is greater than the max queue size supported by
> seabios, seabios tries to reduce the queue size, then re-reads it, I suppose
> to check whether the setting actually took effect, and then checks the
> virtqueue size again to decide whether it is satisfied with the value.
> In this case, if the device's virtqueue size is 512 and the seabios max
> supported queue size is 256, seabios tries to set 256 but then reads 512
> again and can't proceed with that value, preventing the guest from booting
> successfully.
> The root cause was investigated by Roman Kagan 
> 
> The patch fixes the problem by propagating the queue size to the device right
> away, so that the written value can be read back in the next step, provided
> the value was acceptable to the device.
> 
> Suggested-by: Roman Kagan 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Denis Plotnikov 

Thanks, I already have this queued as:

commit 8aabbbd9d04f95d5581d2275362996ecb5516dd9
Author: Michael S. Tsirkin 
Date:   Fri Dec 13 09:22:48 2019 -0500

virtio: update queue size on guest write

Some guests read back queue size after writing it.
Update the size immediately upon write otherwise
they get confused.

Signed-off-by: Michael S. Tsirkin 

I would appreciate checking other transports, they likely
need the same fix.
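
For anyone checking another transport, the following toy model may help to
reason about the read-back problem. It is only a sketch under stated
assumptions: ToyTransport and the toy_* functions are hypothetical, the
"propagate_on_write" flag stands for the behaviour with the fix applied, and
the probe loosely follows the seabios snippet quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transport state: a latched value and the device-visible one. */
typedef struct ToyTransport {
    uint16_t proxy_num;       /* value latched by the transport on write */
    uint16_t device_num;      /* value the guest sees when it reads back */
    bool propagate_on_write;  /* true models the fixed behaviour */
} ToyTransport;

static void toy_write_queue_size(ToyTransport *t, uint16_t val)
{
    t->proxy_num = val;
    if (t->propagate_on_write) {
        t->device_num = t->proxy_num;
    }
}

static uint16_t toy_read_queue_size(const ToyTransport *t)
{
    return t->device_num;
}

/* Firmware-style probe: shrink an oversized queue, then re-read the size. */
static bool toy_guest_probe(ToyTransport *t, uint16_t guest_max)
{
    uint16_t num;

    assert(guest_max > 0);
    num = toy_read_queue_size(t);
    if (num > guest_max) {
        toy_write_queue_size(t, guest_max);
        num = toy_read_queue_size(t);
    }
    return num != 0 && num <= guest_max;
}
```

With a device size of 512 and a guest maximum of 256, the probe fails when
the write is only latched and succeeds when it is propagated immediately.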


> ---
>  hw/virtio/virtio-pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index c6b47a9c73..e5c759e19e 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1256,6 +1256,8 @@ static void virtio_pci_common_write(void *opaque, 
> hwaddr addr,
>  break;
>  case VIRTIO_PCI_COMMON_Q_SIZE:
>  proxy->vqs[vdev->queue_sel].num = val;
> +virtio_queue_set_num(vdev, vdev->queue_sel,
> + proxy->vqs[vdev->queue_sel].num);
>  break;
>  case VIRTIO_PCI_COMMON_Q_MSIX:
>  msix_vector_unuse(&proxy->pci_dev,
> -- 
> 2.17.0




Re: [PATCH] virtio: add the queue number check

2019-12-23 Thread Michael S. Tsirkin
On Mon, Dec 23, 2019 at 12:02:18PM +0100, Paolo Bonzini wrote:
> On 23/12/19 10:18, Yang Zhong wrote:
> >   At this time, the queue number in the front-end block driver is 2, but
> >   the queue number on the qemu side is still 4. So the guest virtio_blk
> >   driver will fail to create the vqs with the backend.
> 
> Where?
> 
> >   There is no "set back"
> >   mechanism for the block driver to inform the backend of this new queue number.
> >   So, I added this check on the qemu side.
> 
> Perhaps the guest kernel should still create the virtqueues, and just
> not use them.  In any case, now that you have explained it, it is
> certainly a guest bug.
> 
> Paolo


Paolo do you understand where the bug is?
E.g. I see this in vhost user block:

/* Kick right away to begin processing requests already in vring */
for (i = 0; i < s->dev.nvqs; i++) {
VirtQueue *kick_vq = virtio_get_queue(vdev, i);

if (!virtio_queue_get_desc_addr(vdev, i)) {
continue;
}
event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
}

which is an (admittedly hacky) way to skip VQs which
were not configured by the guest.


> >   Since the current virtio-blk and vhost-user-blk devices always
> >   use 1 queue by default, it's hard to find this issue.
> > 
> >   I checked the guest kernel drivers; virtio-scsi and virtio-blk both
> >   have the same check in their driver probe:
> > 
> >   num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
> >  
> >   It's possible for the guest driver to have a different queue number than the
> >   qemu side.
> > 
> >   I also wanted to fix this issue on the guest driver side, but currently there
> >   is no better solution for this issue.
> > 
> >   By the way, I did not try scsi with this corner case; I only checked the
> >   driver and qemu code and found the same issue. Thanks!
> > 
> >   Yang
> > 
> >> Paolo
> >>
> >>> Signed-off-by: Yang Zhong 
> >>> ---
> >>>  hw/block/vhost-user-blk.c | 11 +++
> >>>  hw/block/virtio-blk.c | 11 ++-
> >>>  hw/scsi/virtio-scsi.c | 12 
> >>>  3 files changed, 33 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> >>> index 63da9bb619..250e72abe4 100644
> >>> --- a/hw/block/vhost-user-blk.c
> >>> +++ b/hw/block/vhost-user-blk.c
> >>> @@ -23,6 +23,8 @@
> >>>  #include "qom/object.h"
> >>>  #include "hw/qdev-core.h"
> >>>  #include "hw/qdev-properties.h"
> >>> +#include "qemu/option.h"
> >>> +#include "qemu/config-file.h"
> >>>  #include "hw/virtio/vhost.h"
> >>>  #include "hw/virtio/vhost-user-blk.h"
> >>>  #include "hw/virtio/virtio.h"
> >>> @@ -391,6 +393,7 @@ static void vhost_user_blk_device_realize(DeviceState 
> >>> *dev, Error **errp)
> >>>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >>>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >>>  Error *err = NULL;
> >>> +unsigned cpus;
> >>>  int i, ret;
> >>>  
> >>>  if (!s->chardev.chr) {
> >>> @@ -403,6 +406,14 @@ static void 
> >>> vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >>>  return;
> >>>  }
> >>>  
> >>> +cpus = 
> >>> qemu_opt_get_number(qemu_opts_find(qemu_find_opts("smp-opts"), NULL),
> >>> +   "cpus", 0);
> >>> +if (s->num_queues > cpus ) {
> >>> +error_setg(errp, "vhost-user-blk: the queue number should be 
> >>> equal "
> >>> +"or less than vcpu number");
> >>> +return;
> >>> +}
> >>> +
> >>>  if (!s->queue_size) {
> >>>  error_setg(errp, "vhost-user-blk: queue size must be non-zero");
> >>>  return;
> >>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> >>> index d62e6377c2..b2f4d01148 100644
> >>> --- a/hw/block/virtio-blk.c
> >>> +++ b/hw/block/virtio-blk.c
> >>> @@ -18,6 +18,8 @@
> >>>  #include "qemu/error-report.h"
> >>>  #include "qemu/main-loop.h"
> >>>  #include "trace.h"
> >>> +#include "qemu/option.h"
> >>> +#include "qemu/config-file.h"
> >>>  #include "hw/block/block.h"
> >>>  #include "hw/qdev-properties.h"
> >>>  #include "sysemu/blockdev.h"
> >>> @@ -1119,7 +1121,7 @@ static void virtio_blk_device_realize(DeviceState 
> >>> *dev, Error **errp)
> >>>  VirtIOBlock *s = VIRTIO_BLK(dev);
> >>>  VirtIOBlkConf *conf = &s->conf;
> >>>  Error *err = NULL;
> >>> -unsigned i;
> >>> +unsigned i,cpus;
> >>>  
> >>>  if (!conf->conf.blk) {
> >>>  error_setg(errp, "drive property not set");
> >>> @@ -1133,6 +1135,13 @@ static void virtio_blk_device_realize(DeviceState 
> >>> *dev, Error **errp)
> >>>  error_setg(errp, "num-queues property must be larger than 0");
> >>>  return;
> >>>  }
> >>> +cpus = 
> >>> qemu_opt_get_number(qemu_opts_find(qemu_find_opts("smp-opts"), NULL),
> >>> +   "cpus", 0);
> >>> +if (conf->num_queues > cpus ) {
> >>> +error_setg(errp, "virtio-blk: the queue number should be equal "
> >>> +"or less than vcpu number");
> 

Re: [PATCH] block/backup: fix memory leak in bdrv_backup_top_append()

2019-12-23 Thread Eiichi Tsukata



On 2019/12/23 21:40, Vladimir Sementsov-Ogievskiy wrote:
> 23.12.2019 12:06, Eiichi Tsukata wrote:
>> bdrv_open_driver() allocates bs->opaque according to drv->instance_size.
>> There is no need to allocate it and overwrite opaque in
>> bdrv_backup_top_append().
>>
>> Reproducer:
>>
>>$ QTEST_QEMU_BINARY=./x86_64-softmmu/qemu-system-x86_64 valgrind -q 
>> --leak-check=full tests/test-replication -p /replication/secondary/start
>>==29792== 24 bytes in 1 blocks are definitely lost in loss record 52 of 
>> 226
>>==29792==at 0x483AB1A: calloc (vg_replace_malloc.c:762)
>>==29792==by 0x4B07CE0: g_malloc0 (in 
>> /usr/lib64/libglib-2.0.so.0.6000.7)
>>==29792==by 0x12BAB9: bdrv_open_driver (block.c:1289)
>>==29792==by 0x12BEA9: bdrv_new_open_driver (block.c:1359)
>>==29792==by 0x1D15CB: bdrv_backup_top_append (backup-top.c:190)
>>==29792==by 0x1CC11A: backup_job_create (backup.c:439)
>>==29792==by 0x1CD542: replication_start (replication.c:544)
>>==29792==by 0x1401B9: replication_start_all (replication.c:52)
>>==29792==by 0x128B50: test_secondary_start (test-replication.c:427)
>>...
>>
>> Fixes: 7df7868b9640 ("block: introduce backup-top filter driver")
>> Signed-off-by: Eiichi Tsukata 
>> ---
>>   block/backup-top.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/backup-top.c b/block/backup-top.c
>> index 7cdb1f8eba..617217374d 100644
>> --- a/block/backup-top.c
>> +++ b/block/backup-top.c
>> @@ -196,7 +196,7 @@ BlockDriverState 
>> *bdrv_backup_top_append(BlockDriverState *source,
>>   }
>>   
>>   top->total_sectors = source->total_sectors;
>> -top->opaque = state = g_new0(BDRVBackupTopState, 1);
>> +state = top->opaque;
>>   
>>   bdrv_ref(target);
>>   state->target = bdrv_attach_child(top, target, "target", _file, 
>> errp);
>>
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 
> Hmm, it was not my idea, I just copied it from mirror. There should be
> the same leak there, and
> maybe in other places:
> 
> # git grep 'opaque =.*g_new'
> block/backup-top.c:top->opaque = state = g_new0(BDRVBackupTopState, 1);
> block/file-posix.c:state->opaque = g_new0(BDRVRawReopenState, 1);
> block/gluster.c:state->opaque = g_new0(BDRVGlusterReopenState, 1);
> block/iscsi.c:bs->opaque = g_new0(struct IscsiLun, 1);
> block/mirror.c:bs_opaque = g_new0(MirrorBDSOpaque, 1);
> block/raw-format.c:reopen_state->opaque = g_new0(BDRVRawState, 1);
> block/sheepdog.c:re_s = state->opaque = g_new0(BDRVSheepdogReopenState, 
> 1);
> 
> 
> 

Thanks for reviewing.
As you say, block/mirror.c has similar code. But it does not cause the leak.
The difference is bdrv_mirror_top BlockDriver does not have .instance_size
whereas bdrv_backup_top_filter BlockDriver has .instance_size = 
sizeof(BDRVBackupTopState).
So when bdrv_open_driver() is called from mirror.c, g_malloc0(0) is
called allocating nothing.
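
To illustrate the difference, here is a toy model that counts live
allocations. It is a sketch, not block layer code: the toy_* names and the
fixed-size slot arena are hypothetical, and the only GLib behaviour assumed is
that g_malloc0(0) returns NULL. The arena avoids a real heap leak in the
example itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 8
#define TOY_SLOT_BYTES 64

/* Hypothetical arena so that "leaked" blocks can be counted afterwards. */
typedef struct ToySlot {
    bool in_use;
    unsigned char bytes[TOY_SLOT_BYTES];
} ToySlot;

static ToySlot toy_slots[TOY_SLOTS];

/* Like g_malloc0(): zeroed memory, and NULL for a zero-sized request. */
static void *toy_alloc0(size_t size)
{
    int i;

    if (size == 0) {
        return NULL;
    }
    assert(size <= TOY_SLOT_BYTES);
    for (i = 0; i < TOY_SLOTS; i++) {
        if (!toy_slots[i].in_use) {
            toy_slots[i].in_use = true;
            memset(toy_slots[i].bytes, 0, TOY_SLOT_BYTES);
            return toy_slots[i].bytes;
        }
    }
    assert(0 && "toy arena exhausted");
    return NULL;
}

static void toy_free(void *p)
{
    int i;

    for (i = 0; i < TOY_SLOTS; i++) {
        if (p == toy_slots[i].bytes) {
            toy_slots[i].in_use = false;
        }
    }
}

static int toy_live_allocs(void)
{
    int i, n = 0;

    for (i = 0; i < TOY_SLOTS; i++) {
        n += toy_slots[i].in_use ? 1 : 0;
    }
    return n;
}

typedef struct ToyNode {
    void *opaque;
} ToyNode;

/* Models the open path: opaque is sized by the driver's instance_size. */
static void toy_open(ToyNode *n, size_t instance_size)
{
    n->opaque = toy_alloc0(instance_size);
}

/* Models the append path that allocates again and overwrites opaque. */
static void toy_append_overwriting(ToyNode *n, size_t state_size)
{
    n->opaque = toy_alloc0(state_size);
}

static void toy_close(ToyNode *n)
{
    toy_free(n->opaque);
    n->opaque = NULL;
}
```

With instance_size 0 (the mirror case) nothing is left allocated after
close; with a non-zero instance_size (the backup-top case) the block from
the open path stays allocated because its pointer was overwritten.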

Eiichi



Re: [PULL 48/87] x86: move SMM property to X86MachineState

2019-12-23 Thread Paolo Bonzini
On 23/12/19 12:40, Michal Prívozník wrote:
> 
> diff --git i/target/i386/kvm.c w/target/i386/kvm.c
> index 0b511906e3..7ee3202634 100644
> --- i/target/i386/kvm.c
> +++ w/target/i386/kvm.c
> @@ -2173,6 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  }
> 
>  if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
> +object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
>  x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
>  smram_machine_done.notify = register_smram_listener;
>  qemu_add_machine_init_done_notifier(&smram_machine_done);

Yes, it's correct.  Is it okay if I just send a patch with your
Signed-off-by?

Paolo




[PATCH v1] virtio: strengthen virtqueue size invariants

2019-12-23 Thread Denis Plotnikov
1. virtqueue_size is a power of 2
2. virtqueue_size > 2, since seg_max is virtqueue_size - 2
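
The two invariants can be sketched as a standalone predicate. This is an
illustration only: the toy_* names are hypothetical, TOY_VIRTQUEUE_MAX_SIZE is
assumed to be 1024, and the lower bound follows the check in the diff below
(num < 2 is rejected), so a size of exactly 2 passes here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed upper bound for this sketch. */
#define TOY_VIRTQUEUE_MAX_SIZE 1024

static bool toy_is_power_of_2(unsigned n)
{
    /* A power of two has exactly one bit set. */
    return n != 0 && (n & (n - 1)) == 0;
}

/* True if num satisfies the size invariants as checked in the diff. */
static bool toy_queue_size_ok(int num)
{
    return num >= 2 &&
           num <= TOY_VIRTQUEUE_MAX_SIZE &&
           toy_is_power_of_2((unsigned)num);
}

/* seg_max as described above: virtqueue_size - 2. */
static int toy_seg_max(int num)
{
    assert(toy_queue_size_ok(num));
    return num - 2;
}
```

For example, 256 is accepted and yields a seg_max of 254, while 0, 1, 3,
negative values and 2048 are rejected.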

Signed-off-by: Denis Plotnikov 
---
 hw/virtio/virtio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 04716b5f6c..e3ab69061e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2166,7 +2166,8 @@ void virtio_queue_set_num(VirtIODevice *vdev, int n, int 
num)
  */
 if (!!num != !!vdev->vq[n].vring.num ||
 num > VIRTQUEUE_MAX_SIZE ||
-num < 0) {
+num < 2 ||
+!is_power_of_2(num)) {
 return;
 }
 vdev->vq[n].vring.num = num;
-- 
2.17.0




Re: [PATCH 0/2] Speed up QMP stream reading

2019-12-23 Thread Yury Kotov
Hi!

20.12.2019, 19:09, "Markus Armbruster" :
> Yury Kotov  writes:
>
>>  Hi,
>>
>>  This series is continuation of another one:
>>  [PATCH] monitor: Fix slow reading
>>  https://lists.gnu.org/archive/html/qemu-devel/2019-11/msg03722.html
>>
>>  Which also tried to read more than one byte from a stream at a time,
>>  but had some problems with OOB and HMP:
>>  https://lists.gnu.org/archive/html/qemu-devel/2019-11/msg05018.html
>>
>>  This series is an attempt to fix problems described.
>
> Two problems: (1) breaks HMP migrate -d, and (2) need to think through
> how this affects reading of QMP input, in particular OOB.
>
> This series refrains from changing HMP, thus avoids (1). Good.
>
> What about (2)? I'm feeling denser than usual today... Can you explain
> real slow how QMP input works? PATCH 2 appears to splice in a ring
> buffer. Why is that needed?

Yes, the second patch introduces an input ring buffer to store the remaining
bytes while the monitor is suspended.

QMP input scheme:
1. monitor_qmp_can_read returns the number of bytes it is ready to receive.
   Currently it returns 0 if the monitor is suspended and 1 otherwise.
   In my patch: monitor_qmp_can_read returns the free space in the introduced
   ring buffer.

2. monitor_qmp_read receives and handles input bytes.
   Currently it just feeds the received bytes into the JSON lexer.
   If the monitor is suspended, this function is not called, so no new
   command is processed until the monitor resumes.
   In my patch: monitor_qmp_read stores the input bytes in the buffer and then
   handles the buffered bytes one by one while the monitor is not suspended.
   This preserves the original logic: we still don't handle new commands
   while the monitor is suspended.

3. monitor_resume schedules monitor_accept_input, which calls
   monitor_qmp_handle_inbuf, which tries to handle the remaining bytes
   in the buffer. monitor_accept_input is a BH scheduled by monitor_resume
   on the monitor's AioContext. This is needed to ensure that we access
   the input buffer only in the monitor's context.

Example:
1. QMP reads 100 bytes
2. Some command in the first 60 bytes is handled
3. For some reason, the monitor becomes suspended after the first command
4. 40 bytes remain in the buffer
5. After a while, something calls monitor_resume, which handles
   the remaining bytes in the buffer (implicitly: resume -> sched bh -> buf)

Actually, QMP continues to receive data while the monitor is suspended,
until the buffer is full. It just doesn't process the received data.
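
The scheme above can be sketched as a tiny standalone model. This is only an illustration under stated assumptions, not the code from the series: all `sketch_` names and `SketchMonitor` are hypothetical, the buffer size of 8 is arbitrary, a newline stands in for "a command was parsed and the monitor got suspended", and the resume step calls the handler directly instead of scheduling a BH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BUF_SIZE 8 /* tiny on purpose; an arbitrary size for the sketch */

typedef struct {
    char data[SKETCH_BUF_SIZE];
    size_t head;     /* index of the next byte to handle */
    size_t used;     /* number of buffered, unhandled bytes */
    size_t handled;  /* bytes passed on to the (imaginary) lexer */
    bool suspended;
} SketchMonitor;

/* Step 1: report the free space instead of 0 or 1. */
static size_t sketch_can_read(const SketchMonitor *mon)
{
    return SKETCH_BUF_SIZE - mon->used;
}

/* Handle buffered bytes one by one until the monitor gets suspended. */
static void sketch_handle_inbuf(SketchMonitor *mon)
{
    while (mon->used > 0 && !mon->suspended) {
        char byte = mon->data[mon->head];
        mon->head = (mon->head + 1) % SKETCH_BUF_SIZE;
        mon->used--;
        mon->handled++;
        /* Assumption: a newline ends a command and suspends the monitor. */
        if (byte == '\n') {
            mon->suspended = true;
        }
    }
}

/* Step 2: store the input, then handle what the monitor state allows. */
static void sketch_read(SketchMonitor *mon, const char *buf, size_t len)
{
    assert(len <= sketch_can_read(mon));
    for (size_t i = 0; i < len; i++) {
        mon->data[(mon->head + mon->used) % SKETCH_BUF_SIZE] = buf[i];
        mon->used++;
    }
    sketch_handle_inbuf(mon);
}

/* Step 3: resume. The sketch calls the handler directly, with no BH. */
static void sketch_resume(SketchMonitor *mon)
{
    mon->suspended = false;
    sketch_handle_inbuf(mon);
}
```

Feeding "ab\ncd" into an empty monitor handles three bytes and leaves two buffered; further input is still accepted while suspended, and everything buffered is handled after `sketch_resume`.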

Regards,
Yury




Re: [PATCH] block/backup: fix memory leak in bdrv_backup_top_append()

2019-12-23 Thread Vladimir Sementsov-Ogievskiy
23.12.2019 12:06, Eiichi Tsukata wrote:
> bdrv_open_driver() allocates bs->opaque according to drv->instance_size.
> There is no need to allocate it and overwrite opaque in
> bdrv_backup_top_append().
> 
> Reproducer:
> 
>$ QTEST_QEMU_BINARY=./x86_64-softmmu/qemu-system-x86_64 valgrind -q 
> --leak-check=full tests/test-replication -p /replication/secondary/start
>==29792== 24 bytes in 1 blocks are definitely lost in loss record 52 of 226
>==29792==at 0x483AB1A: calloc (vg_replace_malloc.c:762)
>==29792==by 0x4B07CE0: g_malloc0 (in 
> /usr/lib64/libglib-2.0.so.0.6000.7)
>==29792==by 0x12BAB9: bdrv_open_driver (block.c:1289)
>==29792==by 0x12BEA9: bdrv_new_open_driver (block.c:1359)
>==29792==by 0x1D15CB: bdrv_backup_top_append (backup-top.c:190)
>==29792==by 0x1CC11A: backup_job_create (backup.c:439)
>==29792==by 0x1CD542: replication_start (replication.c:544)
>==29792==by 0x1401B9: replication_start_all (replication.c:52)
>==29792==by 0x128B50: test_secondary_start (test-replication.c:427)
>...
> 
> Fixes: 7df7868b9640 ("block: introduce backup-top filter driver")
> Signed-off-by: Eiichi Tsukata 
> ---
>   block/backup-top.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/backup-top.c b/block/backup-top.c
> index 7cdb1f8eba..617217374d 100644
> --- a/block/backup-top.c
> +++ b/block/backup-top.c
> @@ -196,7 +196,7 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState 
> *source,
>   }
>   
>   top->total_sectors = source->total_sectors;
> -top->opaque = state = g_new0(BDRVBackupTopState, 1);
> +state = top->opaque;
>   
>   bdrv_ref(target);
>   state->target = bdrv_attach_child(top, target, "target", _file, 
> errp);
> 

Reviewed-by: Vladimir Sementsov-Ogievskiy 

Hmm, it was not my idea; I just copied it from mirror. The same leak should be
there, and maybe in other places:

# git grep 'opaque =.*g_new'
block/backup-top.c:top->opaque = state = g_new0(BDRVBackupTopState, 1);
block/file-posix.c:state->opaque = g_new0(BDRVRawReopenState, 1);
block/gluster.c:state->opaque = g_new0(BDRVGlusterReopenState, 1);
block/iscsi.c:bs->opaque = g_new0(struct IscsiLun, 1);
block/mirror.c:bs_opaque = g_new0(MirrorBDSOpaque, 1);
block/raw-format.c:reopen_state->opaque = g_new0(BDRVRawState, 1);
block/sheepdog.c:re_s = state->opaque = g_new0(BDRVSheepdogReopenState, 1);
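
The leak pattern fixed by the patch above can be modelled in a few lines. This is a hedged standalone sketch, not QEMU code: `SketchBDS`, `SketchState` and the `sketch_` functions are hypothetical stand-ins, plain `calloc` replaces `g_new0`, and a counter tracks live allocations so the effect is visible without valgrind. Whether each grep hit above is a real leak is not something the sketch can tell.

```c
#include <assert.h>
#include <stdlib.h>

static int sketch_live_allocs; /* allocations not yet freed */

static void *sketch_alloc0(size_t size)
{
    void *ptr = calloc(1, size);
    assert(ptr != NULL);
    sketch_live_allocs++;
    return ptr;
}

static void sketch_free(void *ptr)
{
    if (ptr != NULL) {
        sketch_live_allocs--;
        free(ptr);
    }
}

typedef struct { void *opaque; } SketchBDS;
typedef struct { int dummy; } SketchState;

/* Stand-in for the driver-open step, which already allocates opaque. */
static void sketch_open_driver(SketchBDS *bs)
{
    bs->opaque = sketch_alloc0(sizeof(SketchState));
}

/* Old pattern: a second allocation overwrites the only pointer to the first. */
static SketchState *sketch_append_leaky(SketchBDS *bs)
{
    bs->opaque = sketch_alloc0(sizeof(SketchState));
    return bs->opaque;
}

/* Fixed pattern: reuse the state allocated by the open step. */
static SketchState *sketch_append_fixed(SketchBDS *bs)
{
    return bs->opaque;
}

static void sketch_close(SketchBDS *bs)
{
    sketch_free(bs->opaque);
    bs->opaque = NULL;
}
```

After open, append and close, the leaky variant leaves one allocation behind while the fixed variant leaves none.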



-- 
Best regards,
Vladimir


Re: [PATCH] iotests: fix usage -machine accel= together with -accel option

2019-12-23 Thread Vladimir Sementsov-Ogievskiy
23.12.2019 11:39, Paolo Bonzini wrote:
> On 23/12/19 08:43, Vladimir Sementsov-Ogievskiy wrote:
>> diff --git a/vl.c b/vl.c
>> index 86474a55c9..9fb859969c 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2779,7 +2779,7 @@ static void configure_accelerators(const char 
>> *progname)
>>   for (tmp = accel_list; !accel_initialised && tmp && *tmp; tmp++) {
>>   /*
>>* Filter invalid accelerators here, to prevent obscenities
>> - * such as "-machine accel=tcg,,thread=single".
>> + * such as "-machine accel=tcg,thread=single".
> 
> The double comma is intentional.  Without the "if" below, the comma
> would be escaped and parsed as "-accel tcg,thread=single".

Ah, OK, then drop this chunk.

> 
>>*/
>>   if (accel_find(*tmp)) {
>>   qemu_opts_parse_noisily(qemu_find_opts("accel"), *tmp, 
>> true);
>> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
>> index 90970b0549..2890785a10 100755
>> --- a/tests/qemu-iotests/check
>> +++ b/tests/qemu-iotests/check
>> @@ -587,13 +587,13 @@ export QEMU_PROG="$(type -p "$QEMU_PROG")"
>>   
>>   case "$QEMU_PROG" in
>>   *qemu-system-arm|*qemu-system-aarch64)
>> -export QEMU_OPTIONS="-nodefaults -display none -machine 
>> virt,accel=qtest"
>> +export QEMU_OPTIONS="-nodefaults -display none -machine virt -accel 
>> qtest"
>>   ;;
>>   *qemu-system-tricore)
>> -export QEMU_OPTIONS="-nodefaults -display none -machine 
>> tricore_testboard,accel=qtest"
>> +export QEMU_OPTIONS="-nodefaults -display none -machine 
>> tricore_testboard -accel qtest"
>>   ;;
>>   *)
>> -export QEMU_OPTIONS="-nodefaults -display none -machine accel=qtest"
>> +export QEMU_OPTIONS="-nodefaults -display none -accel qtest"
>>   ;;
>>   esac
>>   
>>
> 
> This part is good, but what is the reproducer?
> 


For example, iotest 030 fails for me with a lot of
+==
+ERROR: test_stream (__main__.TestSmallerBackingFile)
+--
+Traceback (most recent call last):
+  File "030", line 592, in setUp
+    self.vm.launch()
+  File "/work/src/qemu/master/tests/qemu-iotests/../../python/qemu/machine.py", line 302, in launch
+    self._launch()
+  File "/work/src/qemu/master/tests/qemu-iotests/../../python/qemu/machine.py", line 329, in _launch
+    self._post_launch()
+  File "/work/src/qemu/master/tests/qemu-iotests/../../python/qemu/qtest.py", line 110, in _post_launch
+    super(QEMUQtestMachine, self)._post_launch()
+  File "/work/src/qemu/master/tests/qemu-iotests/../../python/qemu/machine.py", line 274, in _post_launch
+    self._qmp.accept()
+  File "/work/src/qemu/master/tests/qemu-iotests/../../python/qemu/qmp.py", line 157, in accept
+    return self.__negotiate_capabilities()
+  File "/work/src/qemu/master/tests/qemu-iotests/../../python/qemu/qmp.py", line 73, in __negotiate_capabilities
+    raise QMPConnectError
+qemu.qmp.QMPConnectError


and if I add -d, I see

+DEBUG:qemu.machine:Error launching VM
+DEBUG:qemu.machine:Command: '/work/src/qemu/master/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.ANcscZnFct/qemu-23220-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.ANcscZnFct/qemu-23220-qtest.sock -accel qtest -nodefaults -display none -machine accel=qtest -drive if=virtio,id=drive0,file=blkdebug::/ramdisk/x3/test.img,format=qcow2,cache=writeback,backing.node-name=mid,backing.backing.node-name=base'
+DEBUG:qemu.machine:Output: 'qemu-system-x86_64: The -accel and "-machine accel=" options are incompatible\n'


I tried three random python tests, 132, 202 and 203: all fail with the same
symptoms. (Interestingly, -d doesn't help for 202 and 203, but they are still
fixed by this patch.)

I expect that all python tests are broken. I'm too lazy to check them all, so
let's just fix it.


-- 
Best regards,
Vladimir



Re: [PULL 48/87] x86: move SMM property to X86MachineState

2019-12-23 Thread Michal Prívozník
On 12/23/19 12:33 PM, Daniel P. Berrangé wrote:
> On Mon, Dec 23, 2019 at 12:28:43PM +0100, Michal Prívozník wrote:
>> On 12/18/19 1:02 PM, Paolo Bonzini wrote:
>>> Add it to microvm as well, it is a generic property of the x86
>>> architecture.
>>>
>>> Suggested-by: Sergio Lopez 
>>> Signed-off-by: Paolo Bonzini 
>>> ---
>>>  hw/i386/pc.c  | 49 
>>> -
>>>  hw/i386/pc_piix.c |  6 +++---
>>>  hw/i386/pc_q35.c  |  2 +-
>>>  hw/i386/x86.c | 50 
>>> +-
>>>  include/hw/i386/pc.h  |  3 ---
>>>  include/hw/i386/x86.h |  5 +
>>>  target/i386/kvm.c |  3 +--
>>>  7 files changed, 59 insertions(+), 59 deletions(-)
>>>
>>
>>
>>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>>> index ef63f3a..c7ff67a 100644
>>> --- a/target/i386/kvm.c
>>> +++ b/target/i386/kvm.c
>>> @@ -2173,8 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>  }
>>>  
>>>  if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
>>> -object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE) &&
>>> -pc_machine_is_smm_enabled(PC_MACHINE(ms))) {
>>> +x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
>>>  smram_machine_done.notify = register_smram_listener;
>>>  qemu_add_machine_init_done_notifier(&smram_machine_done);
>>>  }
>>>
>>
>> Sorry for not catching this earlier, but I don't think this is right.
>> The @ms is not instance of X
>>
>>
>> After I refreshed my qemu master I realized that libvirt is unable to
>> fetch capabilities. Libvirt runs the following command:
>>
>>   qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 -S -no-user-config
>> -nodefaults -nographic -machine none,accel=kvm:tcg
> 
> Hmm, it would be good if we can get QEMU CI to launch QEMU  in
> this way, as this isn't the first time some change has broken
> launching of QEMU for probing capabilities.

Agreed.

NB, this diff fixes the issue for me, but I have no idea if it's correct
(it looks correct judging by the way the code looked before):

diff --git i/target/i386/kvm.c w/target/i386/kvm.c
index 0b511906e3..7ee3202634 100644
--- i/target/i386/kvm.c
+++ w/target/i386/kvm.c
@@ -2173,6 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }

 if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
+object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
 x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
 smram_machine_done.notify = register_smram_listener;
 qemu_add_machine_init_done_notifier(&smram_machine_done);
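
The guard added in the diff above follows a general pattern: test the dynamic type first, and only then use a cast that aborts on mismatch. The following standalone sketch models that pattern with a toy type hierarchy. It is an assumption-laden illustration, not QOM: `SketchType`, `SketchObject` and all `sketch_` functions are hypothetical, and only the type names "x86-machine" and "none-machine" are taken from the messages in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct SketchType {
    const char *name;
    const struct SketchType *parent;
} SketchType;

typedef struct {
    const SketchType *type;
    int smm_enabled;
} SketchObject;

static const SketchType sketch_machine = { "machine", NULL };
static const SketchType sketch_x86_machine = { "x86-machine", &sketch_machine };
static const SketchType sketch_none_machine = { "none-machine", &sketch_machine };

/* Non-aborting cast: returns NULL when obj is not of the named type. */
static const SketchObject *sketch_dynamic_cast(const SketchObject *obj,
                                               const char *type_name)
{
    for (const SketchType *t = obj->type; t != NULL; t = t->parent) {
        if (strcmp(t->name, type_name) == 0) {
            return obj;
        }
    }
    return NULL;
}

/* Aborting cast: models a checked cast that asserts on mismatch. */
static const SketchObject *sketch_cast_assert(const SketchObject *obj,
                                              const char *type_name)
{
    const SketchObject *result = sketch_dynamic_cast(obj, type_name);
    assert(result != NULL);
    return result;
}

/* Guarded use: the aborting cast is reached only after the type test. */
static int sketch_smm_enabled(const SketchObject *obj)
{
    return sketch_dynamic_cast(obj, "x86-machine") != NULL &&
           sketch_cast_assert(obj, "x86-machine")->smm_enabled;
}
```

For an object of type "none-machine", `sketch_smm_enabled` returns 0 instead of aborting, because the short-circuit `&&` skips the asserting cast.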


Michal




[PATCH v1] virtio-pci: store virtqueue size directly to a device

2019-12-23 Thread Denis Plotnikov
Currently, the virtqueue size is saved to the proxy on PCI writes and
is read from the device on PCI reads.
The virtqueue size is propagated from the proxy to the device later,
at the virtqueue enabling stage.

This can be a problem if a guest, while configuring the virtqueue, sets
the size and then re-reads it immediately, before enabling the queue,
in order to check whether the desired size has been set.

This happens in seabios (seabios snippet):

vp_find_vq()
{
    ...
    /* check if the queue is available */
    if (vp->use_modern) {
        num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
        if (num > MAX_QUEUE_NUM) {
            vp_write(&vp->common, virtio_pci_common_cfg, queue_size,
                     MAX_QUEUE_NUM);
            num = vp_read(&vp->common, virtio_pci_common_cfg, queue_size);
        }
    } else {
        num = vp_read(&vp->legacy, virtio_pci_legacy, queue_num);
    }
    if (!num) {
        dprintf(1, "ERROR: queue size is 0\n");
        goto fail;
    }
    if (num > MAX_QUEUE_NUM) {
        dprintf(1, "ERROR: queue size %d > %d\n", num, MAX_QUEUE_NUM);
        goto fail;
    }
    ...
}

If the device queue size is greater than the max queue size supported by
seabios, seabios tries to reduce the queue size, then re-reads it, I suppose
to check whether the setting actually took effect, and then checks the
virtqueue size again to decide whether it is satisfied with the value.
In this case, if the device's virtqueue size is 512 and the max queue size
supported by seabios is 256, seabios tries to set 256 but then reads 512
again and can't proceed with that value, preventing the guest from booting
successfully.
The root cause was investigated by Roman Kagan.

The patch fixes the problem by propagating the queue size to the device right
away, so the written value can be read back on the next step, provided the
value is acceptable to the device.

Suggested-by: Roman Kagan 
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Denis Plotnikov 
---
 hw/virtio/virtio-pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c6b47a9c73..e5c759e19e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1256,6 +1256,8 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
 break;
 case VIRTIO_PCI_COMMON_Q_SIZE:
 proxy->vqs[vdev->queue_sel].num = val;
+virtio_queue_set_num(vdev, vdev->queue_sel,
+ proxy->vqs[vdev->queue_sel].num);
 break;
 case VIRTIO_PCI_COMMON_Q_MSIX:
 msix_vector_unuse(&proxy->pci_dev,
-- 
2.17.0
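
The read-after-write problem described in this patch can be reproduced with a small standalone model. This is a sketch under stated assumptions, not the QEMU or seabios code: `SketchQueue` and the `sketch_` functions are hypothetical, the limit of 256 and the initial size of 512 come from the example in the commit message, and the device-side validity check is reduced to "greater than zero".

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUE_NUM 256 /* guest limit from the commit message example */

typedef struct {
    int proxy_num;   /* value stored by the PCI write handler */
    int device_num;  /* value returned by PCI reads */
} SketchQueue;

typedef void (*SketchWriteFn)(SketchQueue *q, int val);

/* Before the patch: only the proxy copy is updated on write. */
static void sketch_write_old(SketchQueue *q, int val)
{
    q->proxy_num = val;
}

/* After the patch: the value is also propagated to the device at once. */
static void sketch_write_new(SketchQueue *q, int val)
{
    q->proxy_num = val;
    if (val > 0) { /* stand-in for the device-side validity check */
        q->device_num = val;
    }
}

/* Reads always come from the device copy. */
static int sketch_read(const SketchQueue *q)
{
    return q->device_num;
}

/* Guest logic shaped like the seabios snippet; returns 0 on failure. */
static int sketch_guest_find_vq(SketchQueue *q, SketchWriteFn write_size)
{
    assert(write_size != NULL);
    int num = sketch_read(q);
    if (num > SKETCH_MAX_QUEUE_NUM) {
        write_size(q, SKETCH_MAX_QUEUE_NUM);
        num = sketch_read(q);
    }
    if (num == 0 || num > SKETCH_MAX_QUEUE_NUM) {
        return 0;
    }
    return num;
}
```

Starting from a device size of 512, the guest routine fails with `sketch_write_old`, because the re-read still returns 512, and succeeds with 256 when `sketch_write_new` is used.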




Re: [PULL 48/87] x86: move SMM property to X86MachineState

2019-12-23 Thread Daniel P . Berrangé
On Mon, Dec 23, 2019 at 12:28:43PM +0100, Michal Prívozník wrote:
> On 12/18/19 1:02 PM, Paolo Bonzini wrote:
> > Add it to microvm as well, it is a generic property of the x86
> > architecture.
> > 
> > Suggested-by: Sergio Lopez 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  hw/i386/pc.c  | 49 
> > -
> >  hw/i386/pc_piix.c |  6 +++---
> >  hw/i386/pc_q35.c  |  2 +-
> >  hw/i386/x86.c | 50 
> > +-
> >  include/hw/i386/pc.h  |  3 ---
> >  include/hw/i386/x86.h |  5 +
> >  target/i386/kvm.c |  3 +--
> >  7 files changed, 59 insertions(+), 59 deletions(-)
> > 
> 
> 
> > diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> > index ef63f3a..c7ff67a 100644
> > --- a/target/i386/kvm.c
> > +++ b/target/i386/kvm.c
> > @@ -2173,8 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> >  }
> >  
> >  if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
> > -object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE) &&
> > -pc_machine_is_smm_enabled(PC_MACHINE(ms))) {
> > +x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
> >  smram_machine_done.notify = register_smram_listener;
> >  qemu_add_machine_init_done_notifier(&smram_machine_done);
> >  }
> > 
> 
> Sorry for not catching this earlier, but I don't think this is right.
> The @ms is not instance of X
> 
> 
> After I refreshed my qemu master I realized that libvirt is unable to
> fetch capabilities. Libvirt runs the following command:
> 
>   qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 -S -no-user-config
> -nodefaults -nographic -machine none,accel=kvm:tcg

Hmm, it would be good if we could get QEMU CI to launch QEMU in
this way, as this isn't the first time some change has broken
launching of QEMU for probing capabilities.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PULL 48/87] x86: move SMM property to X86MachineState

2019-12-23 Thread Michal Prívozník
On 12/18/19 1:02 PM, Paolo Bonzini wrote:
> Add it to microvm as well, it is a generic property of the x86
> architecture.
> 
> Suggested-by: Sergio Lopez 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/i386/pc.c  | 49 -
>  hw/i386/pc_piix.c |  6 +++---
>  hw/i386/pc_q35.c  |  2 +-
>  hw/i386/x86.c | 50 +-
>  include/hw/i386/pc.h  |  3 ---
>  include/hw/i386/x86.h |  5 +
>  target/i386/kvm.c |  3 +--
>  7 files changed, 59 insertions(+), 59 deletions(-)
> 


> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index ef63f3a..c7ff67a 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -2173,8 +2173,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  }
>  
>  if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
> -object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE) &&
> -pc_machine_is_smm_enabled(PC_MACHINE(ms))) {
> +x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
>  smram_machine_done.notify = register_smram_listener;
>  qemu_add_machine_init_done_notifier(&smram_machine_done);
>  }
> 

Sorry for not catching this earlier, but I don't think this is right.
The @ms is not instance of X


After I refreshed my qemu master I realized that libvirt is unable to
fetch capabilities. Libvirt runs the following command:

  qemu.git $ ./x86_64-softmmu/qemu-system-x86_64 -S -no-user-config
-nodefaults -nographic -machine none,accel=kvm:tcg

plus some other (for now) irrelevant args. But qemu fails to initialize:

  qemu.git/target/i386/kvm.c:2176:kvm_arch_init: Object 0x563493f306b0
is not an instance of type x86-machine

and indeed it is not:

#0  0x750acd21 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:50
#1  0x75096535 in __GI_abort () at abort.c:79
#2  0x55d23275 in object_dynamic_cast_assert
(obj=0x567846b0, typename=0x55fd42f7 "x86-machine",
file=0x55fd3878 "/home/zippy/work/qemu/qemu.git/target/i386/kvm.c",
line=2176, func=0x55fd4eb0 <__func__.31258> "kvm_arch_init") at
qom/object.c:815
#3  0x55a1c3fb in kvm_arch_init (ms=0x567846b0,
s=0x568a8430) at /home/zippy/work/qemu/qemu.git/target/i386/kvm.c:2176
#4  0x558b4ad7 in kvm_init (ms=0x567846b0) at
/home/zippy/work/qemu/qemu.git/accel/kvm/kvm-all.c:2068
#5  0x55a44f0a in accel_init_machine (accel=0x568a8430,
ms=0x567846b0) at accel/accel.c:55
#6  0x55a3e28d in do_configure_accelerator
(opaque=0x7fffd6c2, opts=0x568a8290, errp=0x566f34f0
) at vl.c:2737
#7  0x55e9b773 in qemu_opts_foreach (list=0x5654ffe0
, func=0x55a3e1a8 ,
opaque=0x7fffd6c2, errp=0x566f34f0 ) at
util/qemu-option.c:1170
#8  0x55a3e4cb in configure_accelerators
(progname=0x7fffdde1
"/home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64") at
vl.c:2798
#9  0x55a417a8 in main (argc=7, argv=0x7fffda08,
envp=0x7fffda48) at vl.c:4121


#2  0x55d23275 in object_dynamic_cast_assert
(obj=0x567846b0, typename=0x55fd42f7 "x86-machine",
file=0x55fd3878 "/home/zippy/work/qemu/qemu.git/target/i386/kvm.c",
line=2176, func=0x55fd4eb0 <__func__.31258> "kvm_arch_init") at
qom/object.c:815
815 abort();
object_dynamic_cast_assert 1 $ p obj->class->type->name
$4 = 0x567ad720 "none-machine"


Michal




Re: [PATCH] virtio: add the queue number check

2019-12-23 Thread Paolo Bonzini
On 23/12/19 10:18, Yang Zhong wrote:
>   In this time, the queue number in the front-end block driver is 2, but
>   the queue number in qemu side is still 4. So the guest virtio_blk
>   driver will failed to create vq with backend.

Where?

>   There is no "set back"
>   mechnism for block driver to inform backend this new queue number.
>   So, i added this check in qemu side.

Perhaps the guest kernel should still create the virtqueues, and just
not use them.  In any case, now that you have explained it, it is
certainly a guest bug.

Paolo

>   Since the current virtio-blk and vhost-user-blk device always
>   defaultly use 1 queue, it's hard to find this issue.
> 
>   I checked the guest kernel driver, virtio-scsi and virtio-blk all
>   have same check in their driver probe:
> 
>   num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
>  
>   It's possible the guest driver has different queue number with qemu
>   side.
> 
>   I also want to fix this issue from guest driver side, but currently there 
>   is no better solution to fix this issue.
> 
>   By the way, i did not try scsi with this corner case, and only check
>   driver and qemu code to find same issue. thanks! 
> 
>   Yang
> 
>> Paolo
>>
>>> Signed-off-by: Yang Zhong 
>>> ---
>>>  hw/block/vhost-user-blk.c | 11 +++
>>>  hw/block/virtio-blk.c | 11 ++-
>>>  hw/scsi/virtio-scsi.c | 12 
>>>  3 files changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>>> index 63da9bb619..250e72abe4 100644
>>> --- a/hw/block/vhost-user-blk.c
>>> +++ b/hw/block/vhost-user-blk.c
>>> @@ -23,6 +23,8 @@
>>>  #include "qom/object.h"
>>>  #include "hw/qdev-core.h"
>>>  #include "hw/qdev-properties.h"
>>> +#include "qemu/option.h"
>>> +#include "qemu/config-file.h"
>>>  #include "hw/virtio/vhost.h"
>>>  #include "hw/virtio/vhost-user-blk.h"
>>>  #include "hw/virtio/virtio.h"
>>> @@ -391,6 +393,7 @@ static void vhost_user_blk_device_realize(DeviceState 
>>> *dev, Error **errp)
>>>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>>  VHostUserBlk *s = VHOST_USER_BLK(vdev);
>>>  Error *err = NULL;
>>> +unsigned cpus;
>>>  int i, ret;
>>>  
>>>  if (!s->chardev.chr) {
>>> @@ -403,6 +406,14 @@ static void vhost_user_blk_device_realize(DeviceState 
>>> *dev, Error **errp)
>>>  return;
>>>  }
>>>  
>>> +cpus = qemu_opt_get_number(qemu_opts_find(qemu_find_opts("smp-opts"), 
>>> NULL),
>>> +   "cpus", 0);
>>> +if (s->num_queues > cpus ) {
>>> +error_setg(errp, "vhost-user-blk: the queue number should be equal 
>>> "
>>> +"or less than vcpu number");
>>> +return;
>>> +}
>>> +
>>>  if (!s->queue_size) {
>>>  error_setg(errp, "vhost-user-blk: queue size must be non-zero");
>>>  return;
>>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>>> index d62e6377c2..b2f4d01148 100644
>>> --- a/hw/block/virtio-blk.c
>>> +++ b/hw/block/virtio-blk.c
>>> @@ -18,6 +18,8 @@
>>>  #include "qemu/error-report.h"
>>>  #include "qemu/main-loop.h"
>>>  #include "trace.h"
>>> +#include "qemu/option.h"
>>> +#include "qemu/config-file.h"
>>>  #include "hw/block/block.h"
>>>  #include "hw/qdev-properties.h"
>>>  #include "sysemu/blockdev.h"
>>> @@ -1119,7 +1121,7 @@ static void virtio_blk_device_realize(DeviceState 
>>> *dev, Error **errp)
>>>  VirtIOBlock *s = VIRTIO_BLK(dev);
>>>  VirtIOBlkConf *conf = >conf;
>>>  Error *err = NULL;
>>> -unsigned i;
>>> +unsigned i,cpus;
>>>  
>>>  if (!conf->conf.blk) {
>>>  error_setg(errp, "drive property not set");
>>> @@ -1133,6 +1135,13 @@ static void virtio_blk_device_realize(DeviceState 
>>> *dev, Error **errp)
>>>  error_setg(errp, "num-queues property must be larger than 0");
>>>  return;
>>>  }
>>> +cpus = qemu_opt_get_number(qemu_opts_find(qemu_find_opts("smp-opts"), 
>>> NULL),
>>> +   "cpus", 0);
>>> +if (conf->num_queues > cpus ) {
>>> +error_setg(errp, "virtio-blk: the queue number should be equal "
>>> +"or less than vcpu number");
>>> +return;
>>> +}
>>>  if (!is_power_of_2(conf->queue_size) ||
>>>  conf->queue_size > VIRTQUEUE_MAX_SIZE) {
>>>  error_setg(errp, "invalid queue-size property (%" PRIu16 "), "
>>> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
>>> index e8b2b64d09..8e3e44f6b9 100644
>>> --- a/hw/scsi/virtio-scsi.c
>>> +++ b/hw/scsi/virtio-scsi.c
>>> @@ -21,6 +21,8 @@
>>>  #include "qemu/error-report.h"
>>>  #include "qemu/iov.h"
>>>  #include "qemu/module.h"
>>> +#include "qemu/option.h"
>>> +#include "qemu/config-file.h"
>>>  #include "sysemu/block-backend.h"
>>>  #include "hw/qdev-properties.h"
>>>  #include "hw/scsi/scsi.h"
>>> @@ -880,6 +882,7 @@ void virtio_scsi_common_realize(DeviceState *dev,
>>>  {
>>>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>>> 

Re: [PATCH] 9pfs: local: Fix possible memory leak in local_link()

2019-12-23 Thread Greg Kurz
On Fri, 20 Dec 2019 17:49:34 +0800
Jiajun Chen  wrote:

> There is a possible memory leak while local_link return -1 without free
> odirpath and oname.
> 
> Reported-by: Euler Robot 
> Signed-off-by: Jaijun Chen 
> Signed-off-by: Xiang Zheng 
> ---

Applied to 9p-next.

Thanks.

>  hw/9pfs/9p-local.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
> index 4708c0bd89..491b08aee8 100644
> --- a/hw/9pfs/9p-local.c
> +++ b/hw/9pfs/9p-local.c
> @@ -947,7 +947,7 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
>  if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
>  local_is_mapped_file_metadata(ctx, name)) {
>  errno = EINVAL;
> -return -1;
> +goto out;
>  }
>  
>  odirfd = local_opendir_nofollow(ctx, odirpath);




Re: [PULL 2/2] configure: Require Python >= 3.5

2019-12-23 Thread Juan Quintela
Eduardo Habkost  wrote:
> On Fri, Dec 20, 2019 at 07:59:28PM +0100, Juan Quintela wrote:
>> Eduardo Habkost  wrote:
>> > Python 3.5 is the oldest Python version available on our
>> > supported build platforms, and Python 2 end of life will be 3
>> > weeks after the planned release date of QEMU 4.2.0.  Drop Python
>> > 2 support from configure completely, and require Python 3.5 or
>> > newer.
>> >
>> > Signed-off-by: Eduardo Habkost 
>> > Message-Id: <20191016224237.26180-1-ehabk...@redhat.com>
>> > Reviewed-by: John Snow 
>> > Signed-off-by: Eduardo Habkost 
>> 
>> Reviewed-by: Juan Quintela 
>
> Thanks!
>
>> 
>> But once here, a comment telling why we want 3.5, not 3.4 or 3.6 will
>> have been helpful.
>
> Is "Python 3.5 is the oldest Python version available on our
> supported build platforms" a good explanation why we want 3.5?

You have a point here O:-)

Later, Juan.




[for-5.0 PATCH 10/11] gdbstub: add reverse continue support in replay mode

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch adds support of the reverse continue operation for gdbstub.
Reverse continue finds the last breakpoint that would happen in normal
execution from the beginning to the current moment.
Implementation of the reverse continue replays the execution twice:
to find the breakpoints that were hit and to seek to the last breakpoint.
Reverse continue loads the previous snapshot and tries to find the breakpoint
since that moment. If there are no such breakpoints, it proceeds to
the earlier snapshot, and so on. When no breakpoints or watchpoints were
hit at all, execution stops at the beginning of the replay log.

Signed-off-by: Pavel Dovgalyuk 
---
 cpus.c|5 +++
 exec.c|1 +
 gdbstub.c |   10 ++
 include/sysemu/replay.h   |8 +
 replay/replay-debugging.c |   71 +
 stubs/replay.c|5 +++
 6 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 4951a68796..5a5e7e5f4e 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1068,6 +1068,11 @@ static void cpu_handle_guest_debug(CPUState *cpu)
 cpu->stopped = true;
 } else {
 if (!cpu->singlestep_enabled) {
+/*
+ * Report about the breakpoint and
+ * make a single step to skip it
+ */
+replay_breakpoint();
 cpu_single_step(cpu, SSTEP_ENABLE);
 } else {
 cpu_single_step(cpu, 0);
diff --git a/exec.c b/exec.c
index 861fcc7ea3..0d5ea6b6fe 100644
--- a/exec.c
+++ b/exec.c
@@ -2748,6 +2748,7 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
  * Don't process the watchpoints when we are
  * in a reverse debugging operation.
  */
+replay_breakpoint();
 return;
 }
 if (flags == BP_MEM_READ) {
diff --git a/gdbstub.c b/gdbstub.c
index 6539c8017e..2298e4cb98 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -1880,6 +1880,13 @@ static void handle_backward(GdbCmdContext *gdb_ctx, void *user_ctx)
 put_packet(gdb_ctx->s, "E14");
 }
 return;
+case 'c':
+if (replay_reverse_continue()) {
+gdb_continue(gdb_ctx->s);
+} else {
+put_packet(gdb_ctx->s, "E14");
+}
+return;
 }
 }
 
@@ -2142,7 +2149,8 @@ static void handle_query_supported(GdbCmdContext *gdb_ctx, void *user_ctx)
 ";qXfer:features:read+");
 }
 if (replay_mode == REPLAY_MODE_PLAY) {
-pstrcat(gdb_ctx->str_buf, sizeof(gdb_ctx->str_buf), ";ReverseStep+");
+pstrcat(gdb_ctx->str_buf, sizeof(gdb_ctx->str_buf),
+";ReverseStep+;ReverseContinue+");
 }
 
 if (gdb_ctx->num_params &&
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 13a8123b09..b6cac175c4 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -81,11 +81,19 @@ const char *replay_get_filename(void);
  * Returns true on success.
  */
 bool replay_reverse_step(void);
+/*
+ * Start searching the last breakpoint/watchpoint.
+ * Used by gdbstub for backwards debugging.
+ * Returns true if the process successfully started.
+ */
+bool replay_reverse_continue(void);
 /*
  * Returns true if replay module is processing
  * reverse_continue or reverse_step request
  */
 bool replay_running_debug(void);
+/* Called in reverse debugging mode to collect breakpoint information */
+void replay_breakpoint(void);
 
 /* Processing the instructions */
 
diff --git a/replay/replay-debugging.c b/replay/replay-debugging.c
index cdc01af4a2..e4a083949e 100644
--- a/replay/replay-debugging.c
+++ b/replay/replay-debugging.c
@@ -23,6 +23,8 @@
 #include "migration/snapshot.h"
 
 static bool replay_is_debugging;
+static int64_t replay_last_breakpoint;
+static int64_t replay_last_snapshot;
 
 bool replay_running_debug(void)
 {
@@ -252,3 +254,72 @@ bool replay_reverse_step(void)
 
 return false;
 }
+
+static void replay_continue_end(void)
+{
+replay_is_debugging = false;
+vm_stop(RUN_STATE_DEBUG);
+replay_delete_break();
+}
+
+static void replay_continue_stop(void *opaque)
+{
+Error *err = NULL;
+if (replay_last_breakpoint != -1LL) {
+replay_seek(replay_last_breakpoint, replay_stop_vm_debug, &err);
+if (err) {
+error_free(err);
+replay_continue_end();
+}
+return;
+}
+/*
+ * No breakpoints since the last snapshot.
+ * Find previous snapshot and try again.
+ */
+if (replay_last_snapshot != 0) {
+replay_seek(replay_last_snapshot - 1, replay_continue_stop, &err);
+if (err) {
+error_free(err);
+replay_continue_end();
+}
+replay_last_snapshot = replay_get_current_icount();
+return;
+} else {
+/* Seek to the very first step */

[for-5.0 PATCH 08/11] replay: flush rr queue before loading the vmstate

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

Non-empty record/replay queue prevents saving and loading the VM state,
because it includes pending bottom halves and block coroutines.
But when the new VM state is loaded, we don't have to preserve the consistency
of the current state anymore. Therefore this patch just flushes the queue
allowing the coroutines to finish and removes checking for empty rr queue
for load_snapshot function.

Signed-off-by: Pavel Dovgalyuk 
---
 include/sysemu/replay.h  |2 ++
 migration/savevm.c   |   12 ++--
 replay/replay-internal.h |2 --
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index e00ed2f4a5..239c01e7df 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -149,6 +149,8 @@ void replay_disable_events(void);
 void replay_enable_events(void);
 /*! Returns true when saving events is enabled */
 bool replay_events_enabled(void);
+/* Flushes events queue */
+void replay_flush_events(void);
 /*! Adds bottom half event to the queue */
 void replay_bh_schedule_event(QEMUBH *bh);
 /* Adds oneshot bottom half event to the queue */
diff --git a/migration/savevm.c b/migration/savevm.c
index ae84bf6ab0..0c5cac372a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2834,12 +2834,6 @@ int load_snapshot(const char *name, Error **errp)
 AioContext *aio_context;
 MigrationIncomingState *mis = migration_incoming_get_current();
 
-if (!replay_can_snapshot()) {
-error_setg(errp, "Record/replay does not allow loading snapshot "
-   "right now. Try once more later.");
-return -EINVAL;
-}
-
 if (!bdrv_all_can_snapshot()) {
 error_setg(errp,
"Device '%s' is writable but does not support snapshots",
@@ -2873,6 +2867,12 @@ int load_snapshot(const char *name, Error **errp)
 return -EINVAL;
 }
 
+/*
+ * Flush the record/replay queue. Now the VM state is going
+ * to change. Therefore we don't need to preserve its consistency
+ */
+replay_flush_events();
+
 /* Flush all IO requests so they don't interfere with the new state.  */
 bdrv_drain_all_begin();
 
diff --git a/replay/replay-internal.h b/replay/replay-internal.h
index 2f6145ec7c..97649ed8d7 100644
--- a/replay/replay-internal.h
+++ b/replay/replay-internal.h
@@ -149,8 +149,6 @@ void replay_read_next_clock(unsigned int kind);
 void replay_init_events(void);
 /*! Clears internal data structures for events handling */
 void replay_finish_events(void);
-/*! Flushes events queue */
-void replay_flush_events(void);
 /*! Returns true if there are any unsaved events in the queue */
 bool replay_has_events(void);
 /*! Saves events from queue into the file */




[for-5.0 PATCH 09/11] gdbstub: add reverse step support in replay mode

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

GDB remote protocol supports two reverse debugging commands:
reverse step and reverse continue.
This patch adds support of the first one to the gdbstub.
Reverse step is intended to step one instruction in the backwards
direction. This is not possible in regular execution.
But replayed execution is deterministic, therefore we can load one of
the prior snapshots and proceed to the desired step. It is equivalent
to stepping one instruction back.
There should be at least one snapshot preceding the debugged part of
the replay log.

Signed-off-by: Pavel Dovgalyuk 
---
 accel/tcg/translator.c|1 +
 cpus.c|   14 +--
 exec.c|7 ++
 gdbstub.c |   56 +++--
 include/sysemu/replay.h   |   11 +
 replay/replay-debugging.c |   33 +++
 stubs/replay.c|5 
 7 files changed, 121 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 603d17ff83..fb1e19c585 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -17,6 +17,7 @@
 #include "exec/log.h"
 #include "exec/translator.h"
 #include "exec/plugin-gen.h"
+#include "sysemu/replay.h"
 
 /* Pairs with tcg_clear_temp_count.
To be called by #TranslatorOps.{translate_insn,tb_stop} if
diff --git a/cpus.c b/cpus.c
index be2d655f37..4951a68796 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1062,9 +1062,17 @@ static bool cpu_can_run(CPUState *cpu)
 
 static void cpu_handle_guest_debug(CPUState *cpu)
 {
-gdb_set_stop_cpu(cpu);
-qemu_system_debug_request();
-cpu->stopped = true;
+if (!replay_running_debug()) {
+gdb_set_stop_cpu(cpu);
+qemu_system_debug_request();
+cpu->stopped = true;
+} else {
+if (!cpu->singlestep_enabled) {
+cpu_single_step(cpu, SSTEP_ENABLE);
+} else {
+cpu_single_step(cpu, 0);
+}
+}
 }
 
 #ifdef CONFIG_LINUX
diff --git a/exec.c b/exec.c
index d4b769d0d4..861fcc7ea3 100644
--- a/exec.c
+++ b/exec.c
@@ -2743,6 +2743,13 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, 
vaddr len,
 QTAILQ_FOREACH(wp, &cpu->watchpoints, entry) {
 if (watchpoint_address_matches(wp, addr, len)
 && (wp->flags & flags)) {
+if (replay_running_debug()) {
+/*
+ * Don't process the watchpoints when we are
+ * in a reverse debugging operation.
+ */
+return;
+}
 if (flags == BP_MEM_READ) {
 wp->flags |= BP_WATCHPOINT_HIT_READ;
 } else {
diff --git a/gdbstub.c b/gdbstub.c
index 4cf8af365e..6539c8017e 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -51,6 +51,7 @@
 #include "sysemu/runstate.h"
 #include "hw/semihosting/semihost.h"
 #include "exec/exec-all.h"
+#include "sysemu/replay.h"
 
 #ifdef CONFIG_USER_ONLY
 #define GDB_ATTACHED "0"
@@ -372,6 +373,20 @@ typedef struct GDBState {
  */
 static int sstep_flags = SSTEP_ENABLE|SSTEP_NOIRQ|SSTEP_NOTIMER;
 
+/* Retrieves flags for single step mode. */
+static int get_sstep_flags(void)
+{
+/*
+ * In replay mode all events written into the log should be replayed.
+ * That is why NOIRQ flag is removed in this mode.
+ */
+if (replay_mode != REPLAY_MODE_NONE) {
+return SSTEP_ENABLE;
+} else {
+return sstep_flags;
+}
+}
+
 static GDBState *gdbserver_state;
 
 bool gdb_has_xml;
@@ -462,7 +477,7 @@ static int gdb_continue_partial(GDBState *s, char 
*newstates)
 CPU_FOREACH(cpu) {
 if (newstates[cpu->cpu_index] == 's') {
 trace_gdbstub_op_stepping(cpu->cpu_index);
-cpu_single_step(cpu, sstep_flags);
+cpu_single_step(cpu, get_sstep_flags());
 }
 }
 s->running_state = 1;
@@ -481,7 +496,7 @@ static int gdb_continue_partial(GDBState *s, char 
*newstates)
 break; /* nothing to do here */
 case 's':
 trace_gdbstub_op_stepping(cpu->cpu_index);
-cpu_single_step(cpu, sstep_flags);
+cpu_single_step(cpu, get_sstep_flags());
 cpu_resume(cpu);
 flag = 1;
 break;
@@ -1847,10 +1862,31 @@ static void handle_step(GdbCmdContext *gdb_ctx, void 
*user_ctx)
 gdb_set_cpu_pc(gdb_ctx->s, (target_ulong)gdb_ctx->params[0].val_ull);
 }
 
-cpu_single_step(gdb_ctx->s->c_cpu, sstep_flags);
+cpu_single_step(gdb_ctx->s->c_cpu, get_sstep_flags());
 gdb_continue(gdb_ctx->s);
 }
 
+static void handle_backward(GdbCmdContext *gdb_ctx, void *user_ctx)
+{
+if (replay_mode != REPLAY_MODE_PLAY) {
+put_packet(gdb_ctx->s, "E22");
+}
+if (gdb_ctx->num_params == 1) {
+switch (gdb_ctx->params[0].opcode) {
+case 's':
+if (replay_reverse_step()) {
+gdb_continue(gdb_ctx->s);
+} else {
+   

[for-5.0 PATCH 11/11] replay: describe reverse debugging in docs/replay.txt

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch updates the documentation and describes usage of the reverse
debugging in QEMU+GDB.

Signed-off-by: Pavel Dovgalyuk 
---
 docs/replay.txt |   33 +
 1 file changed, 33 insertions(+)

diff --git a/docs/replay.txt b/docs/replay.txt
index f4619a62a3..07104492f2 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -294,6 +294,39 @@ for recording and replaying must contain identical number 
of ports in record
 and replay modes, but their backends may differ.
 E.g., '-serial stdio' in record mode, and '-serial null' in replay mode.
 
+Reverse debugging
+-----------------
+
+Reverse debugging allows "executing" the program in reverse direction.
+GDB remote protocol supports "reverse step" and "reverse continue"
+commands. The first one steps a single instruction backwards in time,
+and the second one finds the last breakpoint in the past.
+
+Recorded executions may be used to enable reverse debugging. QEMU can't
+execute the code in backwards direction, but can load a snapshot and
+replay forward to find the desired position or breakpoint.
+
+The following GDB commands are supported:
+ - reverse-stepi (or rsi) - step one instruction backwards
+ - reverse-continue (or rc) - find last breakpoint in the past
+
+Reverse step loads the nearest snapshot and replays the execution until
+the required instruction is met.
+
+Reverse continue may include several passes of examining the execution
+between the snapshots. Each of the passes includes the following steps:
+ 1. loading the snapshot
+ 2. replaying to examine the breakpoints
+ 3. if breakpoint or watchpoint was met
+- loading the snapshot again
+- replaying to the required breakpoint
+ 4. else
+- proceeding to the p.1 with the earlier snapshot
+
+Therefore usage of the reverse debugging requires at least one snapshot
+created in advance. See the "Snapshotting" section to learn about running
+record/replay and creating the snapshot in these modes.
+
 Replay log format
 -----------------
 




[for-5.0 PATCH 07/11] replay: implement replay-seek command

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch adds HMP/QMP commands replay_seek/replay-seek that advance
the execution to the specified instruction count.
The command automatically loads the nearest snapshot and replays the execution
to find the desired instruction count.
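
The "nearest snapshot" choice described above can be sketched without any QEMU types. This is only an illustration under stated assumptions: `find_nearest_snapshot` and `NO_ICOUNT` are hypothetical names, and snapshots are reduced to a plain array of instruction counts where -1 marks a snapshot that carries no icount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_ICOUNT (-1) /* hypothetical marker: snapshot has no icount */

/*
 * Return the index of the snapshot with the largest icount that does not
 * exceed target, or -1 when no snapshot qualifies.
 */
int find_nearest_snapshot(const int64_t *icounts, size_t n, int64_t target)
{
    assert(icounts != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        /* Skip snapshots without an icount and those past the target. */
        if (icounts[i] == NO_ICOUNT || icounts[i] > target) {
            continue;
        }
        if (best < 0 || icounts[best] < icounts[i]) {
            best = (int)i;
        }
    }
    return best;
}
```

A result of -1 corresponds to the documented failure case, where no snapshot precedes the requested instruction count.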

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Markus Armbruster 

--

v2:
 - renamed replay_seek qmp command into replay-seek
   (suggested by Eric Blake)
v7:
 - small fixes related to Markus Armbruster's review
v9:
 - changed 'step' parameter name to 'icount'
 - moved json stuff to replay.json and updated the description
   (suggested by Markus Armbruster)
v10:
 - updated the descriptions
---
 hmp-commands.hx   |   19 +
 include/monitor/hmp.h |1 
 qapi/replay.json  |   20 ++
 replay/replay-debugging.c |   92 +
 4 files changed, 132 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3704294da8..420565c7f8 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1964,6 +1964,25 @@ STEXI
 @findex replay_delete_break
 Remove replay breakpoint which was previously set with replay_break.
 The command is ignored when there are no replay breakpoints.
+ETEXI
+
+{
+.name   = "replay_seek",
+.args_type  = "icount:i",
+.params = "icount",
+.help   = "replay execution to the specified instruction count",
+.cmd= hmp_replay_seek,
+},
+
+STEXI
+@item replay_seek @var{icount}
+@findex replay_seek
+Automatically proceed to the instruction count @var{icount}, when
+replaying the execution. The command automatically loads nearest
+snapshot and replays the execution to find the desired instruction.
+When there is no preceding snapshot or the execution is not replayed,
+then the command fails.
+icount for the reference may be observed with 'info replay' command.
 ETEXI
 
 {
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 0fd477aa13..bd97d4a8c6 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -156,5 +156,6 @@ void hmp_info_sev(Monitor *mon, const QDict *qdict);
 void hmp_info_replay(Monitor *mon, const QDict *qdict);
 void hmp_replay_break(Monitor *mon, const QDict *qdict);
 void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
+void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi/replay.json b/qapi/replay.json
index e3266ef3a9..5d24cdc680 100644
--- a/qapi/replay.json
+++ b/qapi/replay.json
@@ -99,3 +99,23 @@
 #
 ##
 { 'command': 'replay-delete-break' }
+
+##
+# @replay-seek:
+#
+# Automatically proceed to the instruction count @icount, when
+# replaying the execution. The command automatically loads nearest
+# snapshot and replays the execution to find the desired instruction.
+# When there is no preceding snapshot or the execution is not replayed,
+# then the command fails.
+# icount for the reference may be obtained with @query-replay command.
+#
+# @icount: target instruction count
+#
+# Since: 5.0
+#
+# Example:
+#
+# -> { "execute": "replay-seek", "data": { "icount": 220414 } }
+##
+{ 'command': 'replay-seek', 'data': { 'icount': 'int' } }
diff --git a/replay/replay-debugging.c b/replay/replay-debugging.c
index 166ba10d2c..f5a02a5aa1 100644
--- a/replay/replay-debugging.c
+++ b/replay/replay-debugging.c
@@ -19,6 +19,8 @@
 #include "qapi/qapi-commands-replay.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/timer.h"
+#include "block/snapshot.h"
+#include "migration/snapshot.h"
 
 void hmp_info_replay(Monitor *mon, const QDict *qdict)
 {
@@ -127,3 +129,93 @@ void hmp_replay_delete_break(Monitor *mon, const QDict 
*qdict)
 return;
 }
 }
+
+static char *replay_find_nearest_snapshot(int64_t icount,
+  int64_t *snapshot_icount)
+{
+BlockDriverState *bs;
+QEMUSnapshotInfo *sn_tab;
+QEMUSnapshotInfo *nearest = NULL;
+char *ret = NULL;
+int nb_sns, i;
+AioContext *aio_context;
+
+*snapshot_icount = -1;
+
+bs = bdrv_all_find_vmstate_bs();
+if (!bs) {
+goto fail;
+}
+aio_context = bdrv_get_aio_context(bs);
+
+aio_context_acquire(aio_context);
+nb_sns = bdrv_snapshot_list(bs, &sn_tab);
+aio_context_release(aio_context);
+
+for (i = 0; i < nb_sns; i++) {
+if (bdrv_all_find_snapshot(sn_tab[i].name, &bs) == 0) {
+if (sn_tab[i].icount != -1ULL
+&& sn_tab[i].icount <= icount
+&& (!nearest || nearest->icount < sn_tab[i].icount)) {
+nearest = &sn_tab[i];
+}
+}
+}
+if (nearest) {
+ret = g_strdup(nearest->name);
+*snapshot_icount = nearest->icount;
+}
+g_free(sn_tab);
+
+fail:
+return ret;
+}
+
+static void replay_seek(int64_t icount, QEMUTimerCB callback, Error **errp)
+{
+char *snapshot = NULL;
+int64_t snapshot_icount;
+
+if (replay_mode != REPLAY_MODE_PLAY) {
+error_setg(errp, "replay must be 

[for-5.0 PATCH 04/11] qapi: introduce replay.json for record/replay-related stuff

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch adds replay.json file. It will be
used for adding record/replay-related data structures and commands.

Signed-off-by: Pavel Dovgalyuk 
Reviewed-by: Markus Armbruster 

--

v10:
 - minor changes
v13:
 - rebased to the new QAPI files
---
 MAINTAINERS |1 +
 include/sysemu/replay.h |1 +
 qapi/Makefile.objs  |2 +-
 qapi/misc.json  |   18 --
 qapi/qapi-schema.json   |1 +
 qapi/replay.json|   26 ++
 6 files changed, 30 insertions(+), 19 deletions(-)
 create mode 100644 qapi/replay.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 387879aebc..7ad3001b0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2311,6 +2311,7 @@ F: net/filter-replay.c
 F: include/sysemu/replay.h
 F: docs/replay.txt
 F: stubs/replay.c
+F: qapi/replay.json
 
 IOVA Tree
 M: Peter Xu 
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index c9c896ae8d..e00ed2f4a5 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -14,6 +14,7 @@
 
 #include "qapi/qapi-types-misc.h"
 #include "qapi/qapi-types-run-state.h"
+#include "qapi/qapi-types-replay.h"
 #include "qapi/qapi-types-ui.h"
 #include "block/aio.h"
 
diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs
index dd3f5e6f94..4e84247d0c 100644
--- a/qapi/Makefile.objs
+++ b/qapi/Makefile.objs
@@ -7,7 +7,7 @@ util-obj-y += qapi-util.o
 
 QAPI_COMMON_MODULES = audio authz block-core block char common crypto
 QAPI_COMMON_MODULES += dump error introspect job machine migration misc net
-QAPI_COMMON_MODULES += qdev qom rdma rocker run-state sockets tpm
+QAPI_COMMON_MODULES += qdev qom rdma replay rocker run-state sockets tpm
 QAPI_COMMON_MODULES += trace transaction ui
 QAPI_TARGET_MODULES = machine-target misc-target
 QAPI_MODULES = $(QAPI_COMMON_MODULES) $(QAPI_TARGET_MODULES)
diff --git a/qapi/misc.json b/qapi/misc.json
index 33b94e3589..76a5f86e7f 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1694,24 +1694,6 @@
 { 'event': 'ACPI_DEVICE_OST',
  'data': { 'info': 'ACPIOSTInfo' } }
 
-##
-# @ReplayMode:
-#
-# Mode of the replay subsystem.
-#
-# @none: normal execution mode. Replay or record are not enabled.
-#
-# @record: record mode. All non-deterministic data is written into the
-#  replay log.
-#
-# @play: replay mode. Non-deterministic data required for system execution
-#is read from the log.
-#
-# Since: 2.5
-##
-{ 'enum': 'ReplayMode',
-  'data': [ 'none', 'record', 'play' ] }
-
 ##
 # @xen-load-devices-state:
 #
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 9751b11f8f..62f425410c 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -103,6 +103,7 @@
 { 'include': 'qdev.json' }
 { 'include': 'machine.json' }
 { 'include': 'machine-target.json' }
+{ 'include': 'replay.json' }
 { 'include': 'misc.json' }
 { 'include': 'misc-target.json' }
 { 'include': 'audio.json' }
diff --git a/qapi/replay.json b/qapi/replay.json
new file mode 100644
index 0000000000..9e13551d20
--- /dev/null
+++ b/qapi/replay.json
@@ -0,0 +1,26 @@
+# -*- Mode: Python -*-
+#
+
+##
+# = Record/replay
+##
+
+{ 'include': 'common.json' }
+
+##
+# @ReplayMode:
+#
+# Mode of the replay subsystem.
+#
+# @none: normal execution mode. Replay or record are not enabled.
+#
+# @record: record mode. All non-deterministic data is written into the
+#  replay log.
+#
+# @play: replay mode. Non-deterministic data required for system execution
+#is read from the log.
+#
+# Since: 2.5
+##
+{ 'enum': 'ReplayMode',
+  'data': [ 'none', 'record', 'play' ] }




[for-5.0 PATCH 03/11] migration: introduce icount field for snapshots

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

Saving icount as a parameter of the snapshot allows navigation between
snapshots in the execution replay scenario.
This information can be used to find a specific snapshot from which to advance
the recorded execution to a specific moment in time.
E.g., the 'reverse step' action (introduced in one of the following patches)
needs to load the nearest snapshot that precedes the current moment
in time.

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Markus Armbruster 

--

v2:
 - made icount in SnapshotInfo optional (suggested by Eric Blake)
v7:
 - added more comments for icount member (suggested by Markus Armbruster)
v9:
 - updated icount comment
v10:
 - updated icount comment again
---
 block/qapi.c |   18 ++
 block/qcow2-snapshot.c   |2 ++
 blockdev.c   |   10 ++
 include/block/snapshot.h |1 +
 migration/savevm.c   |5 +
 qapi/block-core.json |7 ++-
 qapi/block.json  |3 ++-
 7 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 9a5d0c9b27..110ac253ab 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -219,6 +219,8 @@ int bdrv_query_snapshot_info_list(BlockDriverState *bs,
 info->date_nsec = sn_tab[i].date_nsec;
 info->vm_clock_sec  = sn_tab[i].vm_clock_nsec / 1000000000;
 info->vm_clock_nsec = sn_tab[i].vm_clock_nsec % 1000000000;
+info->icount= sn_tab[i].icount;
+info->has_icount= sn_tab[i].icount != -1ULL;
 
 info_list = g_new0(SnapshotInfoList, 1);
 info_list->value = info;
@@ -651,14 +653,15 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 void bdrv_snapshot_dump(QEMUSnapshotInfo *sn)
 {
 char date_buf[128], clock_buf[128];
+char icount_buf[128] = {0};
 struct tm tm;
 time_t ti;
 int64_t secs;
 char *sizing = NULL;
 
 if (!sn) {
-qemu_printf("%-10s%-20s%7s%20s%15s",
-"ID", "TAG", "VM SIZE", "DATE", "VM CLOCK");
+qemu_printf("%-10s%-18s%7s%20s%13s%11s",
+"ID", "TAG", "VM SIZE", "DATE", "VM CLOCK", "ICOUNT");
 } else {
 ti = sn->date_sec;
 localtime_r(&ti, &tm);
@@ -672,11 +675,16 @@ void bdrv_snapshot_dump(QEMUSnapshotInfo *sn)
  (int)(secs % 60),
  (int)((sn->vm_clock_nsec / 1000000) % 1000));
 sizing = size_to_str(sn->vm_state_size);
-qemu_printf("%-10s%-20s%7s%20s%15s",
+if (sn->icount != -1ULL) {
+snprintf(icount_buf, sizeof(icount_buf),
+"%"PRId64, sn->icount);
+}
+qemu_printf("%-10s%-18s%7s%20s%13s%11s",
 sn->id_str, sn->name,
 sizing,
 date_buf,
-clock_buf);
+clock_buf,
+icount_buf);
 }
 g_free(sizing);
 }
@@ -838,6 +846,8 @@ void bdrv_image_info_dump(ImageInfo *info)
 .date_nsec = elem->value->date_nsec,
 .vm_clock_nsec = elem->value->vm_clock_sec * 1000000000ULL +
  elem->value->vm_clock_nsec,
+.icount = elem->value->has_icount ?
+  elem->value->icount : -1ULL,
 };
 
 pstrcpy(sn.id_str, sizeof(sn.id_str), elem->value->id);
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index b04b3e1634..2c003514ef 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -662,6 +662,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 sn->date_sec = sn_info->date_sec;
 sn->date_nsec = sn_info->date_nsec;
 sn->vm_clock_nsec = sn_info->vm_clock_nsec;
+sn->icount = sn_info->icount;
 sn->extra_data_size = sizeof(QCowSnapshotExtraData);
 
 /* Allocate the L1 table of the snapshot and copy the current one there. */
@@ -995,6 +996,7 @@ int qcow2_snapshot_list(BlockDriverState *bs, 
QEMUSnapshotInfo **psn_tab)
 sn_info->date_sec = sn->date_sec;
 sn_info->date_nsec = sn->date_nsec;
 sn_info->vm_clock_nsec = sn->vm_clock_nsec;
+sn_info->icount = sn->icount;
 }
 *psn_tab = sn_tab;
 return s->nb_snapshots;
diff --git a/blockdev.c b/blockdev.c
index 8e029e9c01..6383a64ddd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -59,6 +59,7 @@
 #include "sysemu/arch_init.h"
 #include "sysemu/qtest.h"
 #include "sysemu/runstate.h"
+#include "sysemu/replay.h"
 #include "qemu/cutils.h"
 #include "qemu/help_option.h"
 #include "qemu/main-loop.h"
@@ -1242,6 +1243,10 @@ SnapshotInfo 
*qmp_blockdev_snapshot_delete_internal_sync(const char *device,
 info->vm_state_size = sn.vm_state_size;
 info->vm_clock_nsec = sn.vm_clock_nsec % 1000000000;
 info->vm_clock_sec = sn.vm_clock_nsec / 1000000000;
+if (sn.icount != -1ULL) {
+info->icount = sn.icount;
+info->has_icount = true;
+}
 
 return info;
 
@@ -1449,6 +1454,11 

[for-5.0 PATCH 05/11] replay: introduce info hmp/qmp command

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch introduces 'info replay' monitor command and
corresponding qmp request.
These commands request the current record/replay mode, replay log file
name, and the instruction count (number of recorded/replayed
instructions).  The instruction count can be used with the
replay_seek/replay_break commands added in the next two patches.

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Dr. David Alan Gilbert 
Acked-by: Markus Armbruster 

--

v2:
 - renamed info_replay qmp into query-replay (suggested by Eric Blake)
v7:
 - added empty line (suggested by Markus Armbruster)
v9:
 - changed 'step' parameter name to 'icount'
 - moved json stuff to replay.json and updated the descriptions
   (suggested by Markus Armbruster)
v10:
 - updated descriptions and messages for rr stuff
---
 hmp-commands-info.hx  |   14 ++
 include/monitor/hmp.h |1 +
 qapi/block-core.json  |3 ++-
 qapi/replay.json  |   39 +++
 replay/Makefile.objs  |1 +
 replay/replay-debugging.c |   43 +++
 6 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 replay/replay-debugging.c

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 257ee7d7a3..0288860db0 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -930,6 +930,20 @@ STEXI
 @item info sev
 @findex info sev
 Show SEV information.
+ETEXI
+
+{
+.name   = "replay",
+.args_type  = "",
+.params = "",
+.help   = "show record/replay information",
+.cmd= hmp_info_replay,
+},
+
+STEXI
+@item info replay
+@findex info replay
+Display the record/replay information: mode and the current icount.
 ETEXI
 
 STEXI
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 3d329853b2..783784cf10 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -153,5 +153,6 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict 
*qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
 void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict);
 void hmp_info_sev(Monitor *mon, const QDict *qdict);
+void hmp_info_replay(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index db3e435c74..1b665b1ad4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -28,7 +28,8 @@
 #
 # @icount: Current instruction count. Appears when execution record/replay
 #  is enabled. Used for "time-traveling" to match the moment
-#  in the recorded execution with the snapshots. (since 5.0)
+#  in the recorded execution with the snapshots. This counter may
+#  be obtained through @query-replay command (since 5.0)
 #
 # Since: 1.3
 #
diff --git a/qapi/replay.json b/qapi/replay.json
index 9e13551d20..67f2d1f859 100644
--- a/qapi/replay.json
+++ b/qapi/replay.json
@@ -24,3 +24,42 @@
 ##
 { 'enum': 'ReplayMode',
   'data': [ 'none', 'record', 'play' ] }
+
+##
+# @ReplayInfo:
+#
+# Record/replay information.
+#
+# @mode: current mode.
+#
+# @filename: name of the record/replay log file.
+#It is present only in record or replay modes, when the log
+#is recorded or replayed.
+#
+# @icount: current number of executed instructions.
+#
+# Since: 5.0
+#
+##
+{ 'struct': 'ReplayInfo',
+  'data': { 'mode': 'ReplayMode', '*filename': 'str', 'icount': 'int' } }
+
+##
+# @query-replay:
+#
+# Retrieve the record/replay information.
+# It includes current instruction count which may be used for
+# @replay-break and @replay-seek commands.
+#
+# Returns: record/replay information.
+#
+# Since: 5.0
+#
+# Example:
+#
+# -> { "execute": "query-replay" }
+# <- { "return": { "mode": "play", "filename": "log.rr", "icount": 220414 } }
+#
+##
+{ 'command': 'query-replay',
+  'returns': 'ReplayInfo' }
diff --git a/replay/Makefile.objs b/replay/Makefile.objs
index 939be964a9..f847c5c023 100644
--- a/replay/Makefile.objs
+++ b/replay/Makefile.objs
@@ -8,3 +8,4 @@ common-obj-y += replay-snapshot.o
 common-obj-y += replay-net.o
 common-obj-y += replay-audio.o
 common-obj-y += replay-random.o
+common-obj-y += replay-debugging.o
diff --git a/replay/replay-debugging.c b/replay/replay-debugging.c
new file mode 100644
index 0000000000..8cf15ebc11
--- /dev/null
+++ b/replay/replay-debugging.c
@@ -0,0 +1,43 @@
+/*
+ * replay-debugging.c
+ *
+ * Copyright (c) 2010-2018 Institute for System Programming
+ * of the Russian Academy of Sciences.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/replay.h"
+#include "replay-internal.h"
+#include "monitor/hmp.h"
+#include "monitor/monitor.h"
+#include "qapi/qapi-commands-replay.h"
+
+void hmp_info_replay(Monitor *mon, const QDict *qdict)
+{
+if 

[for-5.0 PATCH 06/11] replay: introduce breakpoint at the specified step

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch introduces the replay_break and replay_delete_break
QMP and HMP commands.
These commands allow stopping at a specified instruction.
This may be useful for debugging when there are known
events that should be investigated.
The replay_break command takes one argument: the number of instructions
executed since the start of the replay.
replay_delete_break removes a previously set breakpoint.

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Markus Armbruster 

--

v2:
 - renamed replay_break qmp command into replay-break
   (suggested by Eric Blake)
v7:
 - introduces replay_delete_break command
v9:
 - changed 'step' parameter name to 'icount'
 - moved json stuff to replay.json and updated the description
   (suggested by Markus Armbruster)
v10:
 - updated descriptions (suggested by Markus Armbruster)
v11:
 - fixed replay_break rearm bug
---
 hmp-commands.hx   |   34 ++
 include/monitor/hmp.h |2 +
 qapi/replay.json  |   36 +++
 replay/replay-debugging.c |   86 +
 replay/replay-internal.h  |4 ++
 replay/replay.c   |   17 +
 6 files changed, 179 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index cfcc044ce4..3704294da8 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1930,6 +1930,40 @@ ETEXI
 STEXI
 @item qom-set @var{path} @var{property} @var{value}
 Set QOM property @var{property} of object at location @var{path} to value 
@var{value}
+ETEXI
+
+{
+.name   = "replay_break",
+.args_type  = "icount:i",
+.params = "icount",
+.help   = "set breakpoint at the specified instruction count",
+.cmd= hmp_replay_break,
+},
+
+STEXI
+@item replay_break @var{icount}
+@findex replay_break
+Set replay breakpoint at instruction count @var{icount}.
+Execution stops when the specified instruction is reached.
+There can be at most one breakpoint. When breakpoint is set, any prior
+one is removed.  The breakpoint may be set only in replay mode and only
+"in the future", i.e. at instruction counts greater than the current one.
+The current instruction count can be observed with 'info replay'.
+ETEXI
+
+{
+.name   = "replay_delete_break",
+.args_type  = "",
+.params = "",
+.help   = "removes replay breakpoint",
+.cmd= hmp_replay_delete_break,
+},
+
+STEXI
+@item replay_delete_break
+@findex replay_delete_break
+Remove replay breakpoint which was previously set with replay_break.
+The command is ignored when there are no replay breakpoints.
 ETEXI
 
 {
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 783784cf10..0fd477aa13 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -154,5 +154,7 @@ void hmp_info_vm_generation_id(Monitor *mon, const QDict 
*qdict);
 void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict);
 void hmp_info_sev(Monitor *mon, const QDict *qdict);
 void hmp_info_replay(Monitor *mon, const QDict *qdict);
+void hmp_replay_break(Monitor *mon, const QDict *qdict);
+void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi/replay.json b/qapi/replay.json
index 67f2d1f859..e3266ef3a9 100644
--- a/qapi/replay.json
+++ b/qapi/replay.json
@@ -63,3 +63,39 @@
 ##
 { 'command': 'query-replay',
   'returns': 'ReplayInfo' }
+
+##
+# @replay-break:
+#
+# Set replay breakpoint at instruction count @icount.
+# Execution stops when the specified instruction is reached.
+# There can be at most one breakpoint. When breakpoint is set, any prior
+# one is removed.  The breakpoint may be set only in replay mode and only
+# "in the future", i.e. at instruction counts greater than the current one.
+# The current instruction count can be observed with @query-replay.
+#
+# @icount: instruction count to stop at
+#
+# Since: 5.0
+#
+# Example:
+#
+# -> { "execute": "replay-break", "data": { "icount": 220414 } }
+#
+##
+{ 'command': 'replay-break', 'data': { 'icount': 'int' } }
+
+##
+# @replay-delete-break:
+#
+# Remove replay breakpoint which was set with @replay-break.
+# The command is ignored when there are no replay breakpoints.
+#
+# Since: 5.0
+#
+# Example:
+#
+# -> { "execute": "replay-delete-break" }
+#
+##
+{ 'command': 'replay-delete-break' }
diff --git a/replay/replay-debugging.c b/replay/replay-debugging.c
index 8cf15ebc11..166ba10d2c 100644
--- a/replay/replay-debugging.c
+++ b/replay/replay-debugging.c
@@ -12,10 +12,13 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "sysemu/replay.h"
+#include "sysemu/runstate.h"
 #include "replay-internal.h"
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "qapi/qapi-commands-replay.h"
+#include "qapi/qmp/qdict.h"
+#include "qemu/timer.h"
 
 void hmp_info_replay(Monitor *mon, const QDict *qdict)
 {
@@ -41,3 +44,86 @@ ReplayInfo *qmp_query_replay(Error **errp)
 retval->icount = 

[for-5.0 PATCH 02/11] qcow2: introduce icount field for snapshots

2019-12-23 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch introduces the icount field for saving within the snapshot.
It is required for navigation between the snapshots in record/replay mode.
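
To illustrate how a reader of the format could stay compatible with images written before this change, here is a hedged sketch that decodes icount from a raw extra-data buffer. It assumes the layout shown in this patch (two 64-bit big-endian fields followed by icount, so icount starts at offset 16 of the extra data) and reports "absent" with the same all-ones value the patch uses (-1ULL). The names `snapshot_extra_icount`, `ICOUNT_OFFSET` and `ICOUNT_ABSENT` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICOUNT_OFFSET 16u         /* after vm_state_size_large and disk_size */
#define ICOUNT_ABSENT UINT64_MAX  /* same bit pattern as -1ULL */

/* Decode a 64-bit big-endian value from a byte buffer. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Return the icount stored in the snapshot extra data, or ICOUNT_ABSENT
 * when the extra data is too short to contain the field (older images).
 */
uint64_t snapshot_extra_icount(const uint8_t *extra, uint32_t extra_data_size)
{
    assert(extra != NULL);
    if (extra_data_size < ICOUNT_OFFSET + 8u) {
        return ICOUNT_ABSENT;
    }
    return load_be64(extra + ICOUNT_OFFSET);
}
```

Checking the declared extra-data size before reading is what lets snapshots created without the field load cleanly instead of picking up unrelated bytes.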

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Kevin Wolf 

--

v2:
 - documented format changes in docs/interop/qcow2.txt
   (suggested by Eric Blake)
v10:
 - updated the documentation
---
 block/qcow2-snapshot.c |7 +++
 block/qcow2.h  |3 +++
 docs/interop/qcow2.txt |4 
 3 files changed, 14 insertions(+)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 5ab64da1ec..b04b3e1634 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -163,6 +163,12 @@ static int qcow2_do_read_snapshots(BlockDriverState *bs, 
bool repair,
 sn->disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
 }
 
+if (sn->extra_data_size >= endof(QCowSnapshotExtraData, icount)) {
+sn->icount = be64_to_cpu(extra.icount);
+} else {
+sn->icount = -1ULL;
+}
+
 if (sn->extra_data_size > sizeof(extra)) {
 uint64_t extra_data_end;
 size_t unknown_extra_data_size;
@@ -332,6 +338,7 @@ int qcow2_write_snapshots(BlockDriverState *bs)
 memset(&extra, 0, sizeof(extra));
 extra.vm_state_size_large = cpu_to_be64(sn->vm_state_size);
 extra.disk_size = cpu_to_be64(sn->disk_size);
+extra.icount = cpu_to_be64(sn->icount);
 
 id_str_size = strlen(sn->id_str);
 name_size = strlen(sn->name);
diff --git a/block/qcow2.h b/block/qcow2.h
index 0942126232..f3e0d9aa56 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -171,6 +171,7 @@ typedef struct QEMU_PACKED QCowSnapshotHeader {
 typedef struct QEMU_PACKED QCowSnapshotExtraData {
 uint64_t vm_state_size_large;
 uint64_t disk_size;
+uint64_t icount;
 } QCowSnapshotExtraData;
 
 
@@ -184,6 +185,8 @@ typedef struct QCowSnapshot {
 uint32_t date_sec;
 uint32_t date_nsec;
 uint64_t vm_clock_nsec;
+/* icount value for the moment when snapshot was taken */
+uint64_t icount;
 /* Size of all extra data, including QCowSnapshotExtraData if available */
 uint32_t extra_data_size;
 /* Data beyond QCowSnapshotExtraData, if any */
diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index af5711e533..aa9d447cda 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -584,6 +584,10 @@ Snapshot table entry:
 
 Byte 48 - 55:   Virtual disk size of the snapshot in bytes
 
+Byte 56 - 63:   icount value which corresponds to
+the record/replay instruction count
+when the snapshot was taken
+
 Version 3 images must include extra data at least up to
 byte 55.
 



