Re: [PATCH] docs: Fix virtiofsd.1 location

2020-02-12 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Wed, 12 Feb 2020 at 13:16, Miroslav Rezanina  wrote:
> >
> > Patch 6a7e2bbee5 ("docs: add virtiofsd(1) man page") introduced the new
> > man page virtiofsd.1. Unfortunately, the wrong file location is used as
> > the source for the install command, which causes installation of the
> > docs to fail.
> >
> > Fix the location so that installation succeeds.
> >
> > Signed-off-by: Miroslav Rezanina 
> 
> Reviewed-by: Peter Maydell 
> 
> I noticed this in review of v1 of the patch
> https://patchew.org/QEMU/20200127162514.56784-1-stefa...@redhat.com/
> but missed that it hadn't been fixed in v2/v3.

Oops thanks!

Does someone want to take this via build or trivial - I've not got
any more virtiofsd stuff currently queued.

Dave

> thanks
> -- PMM
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 01/16] virtio-mem: Prototype

2020-02-12 Thread David Hildenbrand
On 12.02.20 15:15, Eric Blake wrote:
> On 2/12/20 7:35 AM, David Hildenbrand wrote:
>> Signed-off-by: David Hildenbrand 
>> ---
> 
> It's at least worth mentioning VirtioMEMDeviceInfo in the commit 
> message, to make it easier to find which commit introduces a given QAPI 
> struct when searching the git log.

Patches in this series were sent by mistake (don't match the cover
letter), so they are not in a reviewable state. Thanks for the feedback
anyway :)

-- 
Thanks,

David / dhildenb




Re: [PATCH] docs: Fix virtiofsd.1 location

2020-02-12 Thread Peter Maydell
On Wed, 12 Feb 2020 at 13:16, Miroslav Rezanina  wrote:
>
> Patch 6a7e2bbee5 ("docs: add virtiofsd(1) man page") introduced the new
> man page virtiofsd.1. Unfortunately, the wrong file location is used as
> the source for the install command, which causes installation of the
> docs to fail.
>
> Fix the location so that installation succeeds.
>
> Signed-off-by: Miroslav Rezanina 

Reviewed-by: Peter Maydell 

I noticed this in review of v1 of the patch
https://patchew.org/QEMU/20200127162514.56784-1-stefa...@redhat.com/
but missed that it hadn't been fixed in v2/v3.

thanks
-- PMM



Re: [PATCH v2 13/16] qmp/hmp: Expose "managed-size" for memory backends

2020-02-12 Thread Eric Blake

On 2/12/20 7:35 AM, David Hildenbrand wrote:

Expose it, and document what it means and when it was added.

Signed-off-by: David Hildenbrand 
---
  hw/core/machine-hmp-cmds.c | 2 ++
  hw/core/machine-qmp-cmds.c | 3 +++
  qapi/machine.json  | 6 ++
  3 files changed, 11 insertions(+)



+++ b/qapi/machine.json
@@ -758,6 +758,9 @@
  #
  # @prealloc: enables or disables memory preallocation
  #
+# @managed-size: the owner manages the actual size, 'size' is an upper limit
+#(since 5.1)
+#


There's still time to get this in 5.0, if the series is accepted before 
soft freeze.



  # @host-nodes: host nodes for its memory policy
  #
  # @policy: memory policy of memory backend
@@ -771,6 +774,7 @@
  'merge':  'bool',
  'dump':   'bool',
  'prealloc':   'bool',
+'managed-size': 'bool',
  'host-nodes': ['uint16'],
  'policy': 'HostMemPolicy' }}
  
@@ -793,6 +797,7 @@

  #  "merge": false,
  #  "dump": true,
  #  "prealloc": false,
+#  "manmaged-size": false,


typo, managed-size


  #  "host-nodes": [0, 1],
  #  "policy": "bind"
  #},
@@ -801,6 +806,7 @@
  #  "merge": false,
  #  "dump": true,
  #  "prealloc": true,
+#  "manmaged-size": false,


and again


  #  "host-nodes": [2, 3],
  #  "policy": "preferred"
  #}



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 01/16] virtio-mem: Prototype

2020-02-12 Thread Eric Blake

On 2/12/20 7:35 AM, David Hildenbrand wrote:

Signed-off-by: David Hildenbrand 
---


It's at least worth mentioning VirtioMEMDeviceInfo in the commit 
message, to make it easier to find which commit introduces a given QAPI 
struct when searching the git log.



+++ b/qapi/misc.json
@@ -1557,19 +1557,56 @@
}
  }
  
+##

+# @VirtioMEMDeviceInfo:
+#
+# VirtioMEMDevice state information
+#
+# @id: device's ID
+#
+# @memaddr: physical address in memory, where device is mapped
+#
+# @requested-size: the user requested size of the device
+#
+# @size: the (current) size of memory that the device provides
+#
+# @max-size: the maximum size of memory that the device can provide
+#
+# @block-size: the block size of memory that the device provides
+#
+# @node: NUMA node number where device is assigned to
+#
+# @memdev: memory backend linked with the region
+#
+# Since: 4.1


5.0


+##
+{ 'struct': 'VirtioMEMDeviceInfo',
+  'data': { '*id': 'str',


Does it make sense for id to be optional, or should it be mandatory?


+'memaddr': 'size',
+'requested-size': 'size',
+'size': 'size',
+'max-size': 'size',
+'block-size': 'size',
+'node': 'int',
+'memdev': 'str'
+  }
+}
+
  ##
  # @MemoryDeviceInfo:
  #
  # Union containing information about a memory device
  #
  # nvdimm is included since 2.12. virtio-pmem is included since 4.1.
+# virtio-mem is included since 4.2.


5.0


  #
  # Since: 2.1
  ##
  { 'union': 'MemoryDeviceInfo',
'data': { 'dimm': 'PCDIMMDeviceInfo',
  'nvdimm': 'PCDIMMDeviceInfo',
-'virtio-pmem': 'VirtioPMEMDeviceInfo'
+'virtio-pmem': 'VirtioPMEMDeviceInfo',
+'virtio-mem': 'VirtioMEMDeviceInfo'
}
  }
  



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 4/4] linux-user: fix use of SIGRTMIN

2020-02-12 Thread Peter Maydell
On Wed, 12 Feb 2020 at 12:57, Laurent Vivier  wrote:
>
> Some RT signals can be in use by glibc,
> it's why SIGRTMIN (34) is generally greater than __SIGRTMIN (32).
>
> So SIGRTMIN cannot be mapped to TARGET_SIGRTMIN.
>
> Instead of swapping only SIGRTMIN and SIGRTMAX, map all the
> range [TARGET_SIGRTMIN ... TARGET_SIGRTMAX - X] to
>   [__SIGRTMIN + X ... SIGRTMAX ]
> (SIGRTMIN is __SIGRTMIN + X).
>
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Taylor Simson 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM
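
To make the range mapping in the commit message concrete, here is a minimal sketch. It is not the code from the patch: the numeric values (32, 34, 64) and the function names are assumptions chosen for illustration, with RESERVED playing the role of X.

```c
#include <assert.h>

/* Illustrative values only (assumptions, not taken from the patch). */
enum {
    HOST_RAW_SIGRTMIN = 32,  /* __SIGRTMIN: first RT signal in the kernel ABI */
    HOST_SIGRTMIN     = 34,  /* SIGRTMIN: first RT signal left to applications */
    HOST_SIGRTMAX     = 64,
    TARGET_SIGRTMIN   = 32,
    TARGET_SIGRTMAX   = 64,
    /* X: number of RT signals the host libc keeps for itself */
    RESERVED          = HOST_SIGRTMIN - HOST_RAW_SIGRTMIN,
};

/*
 * Map [TARGET_SIGRTMIN ... TARGET_SIGRTMAX - X] onto
 * [__SIGRTMIN + X ... SIGRTMAX]. Returns -1 for the top X target
 * signals, which have no host counterpart.
 */
static int target_to_host_rtsig(int tsig)
{
    assert(tsig >= TARGET_SIGRTMIN && tsig <= TARGET_SIGRTMAX);
    if (tsig > TARGET_SIGRTMAX - RESERVED) {
        return -1;
    }
    return tsig - TARGET_SIGRTMIN + HOST_SIGRTMIN;
}

/* Inverse mapping; host signals below SIGRTMIN are not exposed. */
static int host_to_target_rtsig(int hsig)
{
    if (hsig < HOST_SIGRTMIN || hsig > HOST_SIGRTMAX) {
        return -1;
    }
    return hsig - HOST_SIGRTMIN + TARGET_SIGRTMIN;
}
```

With these assumed values, target signal 32 lands on host signal 34, and the two highest target signals cannot be delivered because glibc occupies two host slots.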



Re: Summary of Re: Making QEMU easier for management tools and applications

2020-02-12 Thread Daniel P . Berrangé
On Wed, Feb 12, 2020 at 01:54:42PM +, Stefan Hajnoczi wrote:
> On Mon, Feb 10, 2020 at 05:43:13PM +0100, Markus Armbruster wrote:
> > Stefan Hajnoczi  writes:
> > 
> > > On Tue, Feb 4, 2020 at 3:54 PM Markus Armbruster  
> > > wrote:
> > >> = Ways to provide machine-friendly initial configuration =
> > >>
> > >> Two ways to provide machine-friendly initial configuration on par with
> > >> QMP have been proposed:
> > >>
> > >> 1. Extend QMP
> > >>
> > >>Machines use the CLI only to configure a QMP socket.  The remainder
> > >>of the CLI becomes human-only, with much relaxed compatibility rules.
> > >>
> > >> 2. QAPIfy the CLI
> > >>
> > >>Provide a machine-friendly CLI based on QAPI and JSON.  The current
> > >>CLI becomes human-only, with much relaxed compatibility rules.
> > >
> > > Do we keep the existing CLI around in both cases?  I'm concerned that
> > > we're still following the HMP/QMP approach, which has left QEMU with
> > > the legacy HMP monitor that we still haven't removed.
> > 
> > The "HMP is legacy" idea is relatively recent.
> > 
> > I think having separate interfaces for humans and machines makes sense,
> > we just need to give both the attention and care they need and deserve.
> > 
> > > I think a human-friendly monitor has its use, but it should ideally
> > be done differently than we do HMP now.
> > 
> > Likewise, human-friendly initial configuration should exist, but it
> > should ideally be done differently than we do HMP now.
> > 
> > > I'm in favor of simplifying QEMU at the expense of an incompatible CLI
> > > change in QEMU 6.0.
> > 
> > I'm convinced the current CLI needs cleanup badly, and that means
> > incompatible change.  The question is how and when to change it.
> > 
> > Here's how I'd like us to do it:
> > 
> > 1. Create machine-friendly initial configuration interface separate from
> >the existing CLI
> > 
> >Doesn't mean it cannot be a CLI.
> > 
> > 2. Develop it step by step to feature parity with existing CLI
> > 
> >If we identify misfeatures we don't want anymore, we should
> >immediately deprecate them in the existing CLI instead.
> > 
> > 2. Transition machine users to this new interface
> > 
> > 3. Declare the existing CLI to be like HMP: for humans, may change
> >incompatibly
> > 
> > 4. Clean up existing CLI step by step to wrap around the
> >machine-friendly interface
> > 
> >Whatever we deprecated in step 2 goes to the bit bucket instead.
> > 
> >I'm open to replacing the existing CLI by a separate wrapper process
> >instead.
> > 
> >Capability to translate to the machine-friendly interface is
> >desirable, so human users can easily transition to the
> >machine-friendly interface when they run into a need to automate.
> > 
> > The risk is of course that we fail at step 4 and remain stuck with the
> > CLI mess we've made.
> 
> Yes, QEMU does not have a good track record of successfully converting
> to new APIs and then removing old code.
> 
> My worry is that this effort will result in the addition of even more
> code but we'll still be stuck with the old cruft (both in the user
> visible interface and in the implementation).

This is why I think any new CLI ought to be done in a new binary,
not qemu-system-. I think it is an easier proposition to
sell to people that this is a clean break if we make it a new
binary. The mere fact the binary exists will make people curious
about it. If we add new stuff to existing binaries, it is
essentially invisible unless you look for it.  Separate binaries
would also make life better for documentation IMHO, as we can
clearly distinguish legacy and modern in the docs. Indeed the
new binary doc should be completely separate, so when people
learn about it, they're not distracted by legacy.

This way, even if we don't delete qemu-system- for a long time,
the new binary would not be polluted by the legacy cruft, even if
it still exists in some internal places.

Ideally the goal would be that QemuOpts be entirely missing from
any code linked into the new binary. This will be challenging given
some of the places QemuOpts embeds itself. Perhaps we can split some
of the source files to isolate the QemuOpts usage. The block layer
is the biggest challenge here.

> But we won't get anywhere if we don't try :).  This sounds like a
> significant project and I wonder if others would be willing to help if
> you can break down the tasks for them.



Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] tests: Fix a bug with count variables

2020-02-12 Thread Stefan Hajnoczi
On Fri, Feb 07, 2020 at 07:54:33PM +0800, Tianjia Zhang wrote:
> The counting code here should use the local variable n_nodes_local.
> Otherwise, the variable n_nodes is counting incorrectly, causing the
> counting logic of the code to be wrong.
> 
> Signed-off-by: Tianjia Zhang 
> ---
>  tests/test-rcu-list.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v30 00/22] Add RX archtecture support

2020-02-12 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200212130311.127515-1-ys...@users.sourceforge.jp/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v30 00/22] Add RX archtecture support
Message-id: 20200212130311.127515-1-ys...@users.sourceforge.jp
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20200212130311.127515-1-ys...@users.sourceforge.jp -> patchew/20200212130311.127515-1-ys...@users.sourceforge.jp
Switched to a new branch 'test'
b99dc82 qemu-doc.texi: Add RX section.
0d81589 BootLinuxConsoleTest: Test the RX-Virt machine
e1fb98f Add rx-softmmu
9fb7102 hw/rx: Restrict the RX62N microcontroller to the RX62N CPU core
3491db9 hw/rx: Honor -accel qtest
7905f70 hw/rx: RX Target hardware definition
6871a28 hw/char: RX62N serial communication interface (SCI)
430e7ae hw/timer: RX62N internal timer modules
029f7c2 hw/intc: RX62N interrupt controller (ICUa)
08436a0 target/rx: Dump bytes for each insn during disassembly
0922998 target/rx: Collect all bytes during disassembly
f7c9eeb target/rx: Emit all disassembly in one prt()
b26c962 target/rx: Use prt_ldmi for XCHG_mr disassembly
547cba7 target/rx: Replace operand with prt_ldmi in disassembler
72fdb0c target/rx: Disassemble rx_index_addr into a string
9ae47c8 target/rx: RX disassembler
6ce9166 target/rx: CPU definition
f44b93f target/rx: TCG helper
d12c2b0 target/rx: TCG translation
4765cee hw/registerfields.h: Add 8bit and 16bit register macros
d8bef98 qemu/bitops.h: Add extract8 and extract16
3b29a0a MAINTAINERS: Add RX

=== OUTPUT BEGIN ===
1/22 Checking commit 3b29a0a2c13b (MAINTAINERS: Add RX)
2/22 Checking commit d8bef9867414 (qemu/bitops.h: Add extract8 and extract16)
3/22 Checking commit 4765cee62a1b (hw/registerfields.h: Add 8bit and 16bit register macros)
Use of uninitialized value in concatenation (.) or string at ./scripts/checkpatch.pl line 2490.
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#27: FILE: include/hw/registerfields.h:25:
+#define REG8(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) };

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#31: FILE: include/hw/registerfields.h:29:
+#define REG16(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) / 2 };

total: 2 errors, 0 warnings, 56 lines checked

Patch 3/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
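
For context on the two errors reported against patch 3/22: the flagged macros expand to declarations, not statements. A `do { ... } while (0)` wrapper is a statement, so it cannot be used where these macros are typically invoked, at file scope. The sketch below copies the two macros from the output above; the register names CTRL and STAT and their addresses are hypothetical, and whether the report counts as a false positive is for the maintainers to judge.

```c
#include <assert.h>

/* Macros as quoted in the checkpatch output above. */
#define REG8(reg, addr)                                                  \
    enum { A_ ## reg = (addr) };                                          \
    enum { R_ ## reg = (addr) };

#define REG16(reg, addr)                                                 \
    enum { A_ ## reg = (addr) };                                          \
    enum { R_ ## reg = (addr) / 2 };

/*
 * Hypothetical registers declared at file scope. No trailing semicolon
 * is needed because each macro already ends with one.
 */
REG8(CTRL, 0x08)
REG16(STAT, 0x0a)

/* A_ is the byte address; R_ is the index in units of the register width. */
static_assert(R_CTRL == A_CTRL, "8-bit registers are indexed by byte address");
static_assert(R_STAT == A_STAT / 2, "16-bit registers are indexed by halfword");
```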

4/22 Checking commit d12c2b05dc64 (target/rx: TCG translation)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#20: 
new file mode 100644

total: 0 errors, 1 warnings, 3065 lines checked

Patch 4/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/22 Checking commit f44b93fefa39 (target/rx: TCG helper)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21: 
new file mode 100644

total: 0 errors, 1 warnings, 650 lines checked

Patch 5/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/22 Checking commit 6ce916678c4b (target/rx: CPU definition)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#22: 
new file mode 100644

total: 0 errors, 1 warnings, 659 lines checked

Patch 6/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/22 Checking commit 9ae47c83cd99 (target/rx: RX disassembler)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#38: 
new file mode 100644

total: 0 errors, 1 warnings, 1497 lines checked

Patch 7/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
8/22 Checking commit 72fdb0cf8e02 (target/rx: Disassemble rx_index_addr into a string)
9/22 Checking commit 547cba7fec67 (target/rx: Replace operand with prt_ldmi in disassembler)
10/22 Checking commit b26c962f1e11 (target/rx: Use prt_ldmi for XCHG_mr disassembly)
11/22 Checking commit f7c9eebd32fb (target/rx: Emit all disassembly in one prt())
12/22 Checking commit 0922998c8ca2 (target/rx: Collect all bytes during disassembly)
13/22 Checking commit 08436a08ece7 (target/rx: Dump bytes for each insn 

Re: Summary of Re: Making QEMU easier for management tools and applications

2020-02-12 Thread Stefan Hajnoczi
On Mon, Feb 10, 2020 at 05:43:13PM +0100, Markus Armbruster wrote:
> Stefan Hajnoczi  writes:
> 
> > On Tue, Feb 4, 2020 at 3:54 PM Markus Armbruster  wrote:
> >> = Ways to provide machine-friendly initial configuration =
> >>
> >> Two ways to provide machine-friendly initial configuration on par with
> >> QMP have been proposed:
> >>
> >> 1. Extend QMP
> >>
> >>Machines use the CLI only to configure a QMP socket.  The remainder
> >>of the CLI becomes human-only, with much relaxed compatibility rules.
> >>
> >> 2. QAPIfy the CLI
> >>
> >>Provide a machine-friendly CLI based on QAPI and JSON.  The current
> >>CLI becomes human-only, with much relaxed compatibility rules.
> >
> > Do we keep the existing CLI around in both cases?  I'm concerned that
> > we're still following the HMP/QMP approach, which has left QEMU with
> > the legacy HMP monitor that we still haven't removed.
> 
> The "HMP is legacy" idea is relatively recent.
> 
> I think having separate interfaces for humans and machines makes sense,
> we just need to give both the attention and care they need and deserve.
> 
> I think a human-friendly monitor has its use, but it should ideally
> be done differently than we do HMP now.
> 
> Likewise, human-friendly initial configuration should exist, but it
> should ideally be done differently than we do HMP now.
> 
> > I'm in favor of simplifying QEMU at the expense of an incompatible CLI
> > change in QEMU 6.0.
> 
> I'm convinced the current CLI needs cleanup badly, and that means
> incompatible change.  The question is how and when to change it.
> 
> Here's how I'd like us to do it:
> 
> 1. Create machine-friendly initial configuration interface separate from
>the existing CLI
> 
>Doesn't mean it cannot be a CLI.
> 
> 2. Develop it step by step to feature parity with existing CLI
> 
>If we identify misfeatures we don't want anymore, we should
>immediately deprecate them in the existing CLI instead.
> 
> 2. Transition machine users to this new interface
> 
> 3. Declare the existing CLI to be like HMP: for humans, may change
>incompatibly
> 
> 4. Clean up existing CLI step by step to wrap around the
>machine-friendly interface
> 
>Whatever we deprecated in step 2 goes to the bit bucket instead.
> 
>I'm open to replacing the existing CLI by a separate wrapper process
>instead.
> 
>Capability to translate to the machine-friendly interface is
>desirable, so human users can easily transition to the
>machine-friendly interface when they run into a need to automate.
> 
> The risk is of course that we fail at step 4 and remain stuck with the
> CLI mess we've made.

Yes, QEMU does not have a good track record of successfully converting
to new APIs and then removing old code.

My worry is that this effort will result in the addition of even more
code but we'll still be stuck with the old cruft (both in the user
visible interface and in the implementation).

But we won't get anywhere if we don't try :).  This sounds like a
significant project and I wonder if others would be willing to help if
you can break down the tasks for them.

Stefan




Re: [RFC 0/9] Add an interVM memory sharing device

2020-02-12 Thread Stefan Hajnoczi
On Mon, Feb 10, 2020 at 02:01:48PM +0100, Igor Kotrasiński wrote:
> On 2/7/20 5:33 PM, Stefan Hajnoczi wrote:
> > On Fri, Feb 07, 2020 at 11:04:03AM +0100, Igor Mammedov wrote:
> >> On Fri, 7 Feb 2020 10:00:50 +0100
> >> Igor Kotrasiński  wrote:
> >>
> >>> On 2/5/20 3:49 PM, Jan Kiszka wrote:
>  On 05.02.20 15:39, Stefan Hajnoczi wrote:
> > On Tue, Feb 04, 2020 at 12:30:42PM +0100,
> > i.kotrasi...@partner.samsung.com wrote:
> >> From: Igor Kotrasinski 
> >>
> >> This patchset adds a "memory exposing" device that allows two QEMU
> >> instances to share arbitrary memory regions. Unlike ivshmem, it does not
> >> create a new region of memory that's shared between VMs, but instead
> >> allows one VM to access any memory region of the other VM we choose to
> >> share.
> >>
> >> The motivation for this device is a sort of ARM Trustzone "emulation",
> >> where a rich system running on one machine (e.g. x86_64 linux) is able
> >> to perform SMCs to a trusted system running on another (e.g. OpTEE on
> >> ARM). With a device that allows sharing arbitrary memory between VMs,
> >> this can be achieved with minimal changes to the trusted system and its
> >> linux driver while allowing the rich system to run on a speedier x86
> >> emulator. I prepared additional patches for linux, OpTEE OS and OpTEE
> >> build system as a PoC that such emulation works and passes OpTEE tests;
> >> I'm not sure what would be the best way to share them.
> >>
> >> This patchset is my first foray into QEMU source code and I'm certain
> >> it's not yet ready to be merged in. I'm not sure whether memory sharing
> >> code has any race conditions or breaks rules of working with memory
> >> regions, or if having VMs communicate synchronously via chardevs is the
> >> right way to do it. I do believe the basic idea for sharing memory
> >> regions is sound and that it could be useful for inter-VM communication.
> >
> > Hi,
> > Without having looked into the patches yet, I'm already wondering if you
> > can use the existing -object
> > memory-backend-file,size=512M,mem-path=/my/shared/mem feature for your
> > use case?
> >
> > That's the existing mechanism for fully sharing guest RAM and if you
> > want to share all of memory then maybe a device is not necessary - just
> > share the memory.
> >>>
> >>> That option adds memory in addition to the memory allocated with the
> >>> '-m' flag, doesn't it? I looked into that option, and it seemed to me
> >>> you can't back all memory this way.
> >> with current QEMU you play with memory sharing using numa workaround
> >>
> >> -m 512 \
> >> -object memory-backend-file,id=mem,size=512M,mem-path=/my/shared/mem,share=on \
> >> -numa node,memdev=mem
> >>
> >> also on the list there is series that allows to share main ram
> >> without numa workaround, see
> >>"[PATCH v4 00/80] refactor main RAM allocation to use hostmem backend"
> >>
> >> with it applied you can share main RAM with following CLI:
> >>
> >> -object memory-backend-file,id=mem,size=512M,mem-path=/my/shared/mem,share=on \
> >> -m 512 \
> >> -M virt,memory-backend=mem
> > 
> > Nice!  That takes care of memory.
> 
> After a bit of hacking to map the shared RAM instead of communicating 
> via socket I can confirm - I can run OpTEE this way, and it passes 
> tests. My solution is *technically* more accurate since it is aware of 
> memory subregions and completely independent from memory backend setup, 
> but with my use case satisfied already, I don't think it's of use to anyone.

Great!

Stefan
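
As background to the share=on discussion above, the following is a minimal sketch of the underlying idea: two MAP_SHARED mappings of the same file see each other's writes. It is an illustration only, not QEMU code. The helper names and the temporary file path are assumptions, and the two mappings live in one process here as a stand-in for two QEMU instances mapping the same mem-path.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of fd with MAP_SHARED; returns NULL on failure. */
static void *map_shared_file(int fd, size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 1 if a write through one mapping is visible through the other. */
static int shared_mappings_see_each_other(void)
{
    char path[] = "/tmp/shared-mem-XXXXXX";   /* hypothetical backing file */
    const size_t len = 4096;
    int ok = 0;

    int fd = mkstemp(path);
    if (fd < 0) {
        return 0;
    }
    unlink(path);                 /* the open descriptor keeps the file alive */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return 0;
    }

    char *a = map_shared_file(fd, len);
    char *b = map_shared_file(fd, len);
    if (a && b) {
        strcpy(a, "hello");
        ok = strcmp(b, "hello") == 0;
    }
    if (a) {
        munmap(a, len);
    }
    if (b) {
        munmap(b, len);
    }
    close(fd);
    return ok;
}
```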




[PATCH v2 fixed 13/16] numa: Teach ram block notifiers about resizable ram blocks

2020-02-12 Thread David Hildenbrand
We want to actually resize ram blocks (make everything between
used_length and max_length inaccessible) - however, not all ram block
notifiers will support that. Let's teach the notifier that ram blocks
are indeed resizable, but keep using max_size in the existing notifiers.

Supply the max_size when adding and removing ram blocks. Also, notify on
resizes. Introduce a way to detect if any registered notifier does not
support resizes - ram_block_notifiers_support_resize() - which we can later
use to fall back to legacy handling if a registered notifier (esp., SEV and
HAX) does not support actual resizes.

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: "Dr. David Alan Gilbert" 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: "Michael S. Tsirkin" 
Cc: xen-de...@lists.xenproject.org
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 exec.c | 13 +++--
 hw/core/numa.c | 34 +-
 hw/i386/xen/xen-mapcache.c |  7 ---
 include/exec/ramlist.h | 14 ++
 target/i386/hax-mem.c  |  5 +++--
 target/i386/sev.c  | 18 ++
 util/vfio-helpers.c| 17 +
 7 files changed, 76 insertions(+), 32 deletions(-)

diff --git a/exec.c b/exec.c
index fc65c4f7ca..f2d30479b8 100644
--- a/exec.c
+++ b/exec.c
@@ -2139,6 +2139,8 @@ static void qemu_ram_apply_settings(void *host, size_t length)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+const ram_addr_t oldsize = block->used_length;
+
 assert(block);
 
 newsize = HOST_PAGE_ALIGN(newsize);
@@ -2167,6 +2169,11 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 block->used_length = newsize;
 cpu_physical_memory_set_dirty_range(block->offset, block->used_length,
 DIRTY_CLIENTS_ALL);
+
+if (block->host) {
+ram_block_notify_resized(block->host, oldsize, newsize);
+}
+
 memory_region_set_size(block->mr, newsize);
 if (block->resized) {
 block->resized(block->idstr, newsize, block->host);
@@ -2319,7 +2326,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
 
 if (new_block->host) {
 qemu_ram_apply_settings(new_block->host, new_block->max_length);
-ram_block_notify_add(new_block->host, new_block->max_length);
+ram_block_notify_add(new_block->host, new_block->used_length,
+ new_block->max_length);
 }
 }
 
@@ -2502,7 +2510,8 @@ void qemu_ram_free(RAMBlock *block)
 }
 
 if (block->host) {
-ram_block_notify_remove(block->host, block->max_length);
+ram_block_notify_remove(block->host, block->used_length,
+block->max_length);
 }
 
 qemu_mutex_lock_ramlist();
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 6599c69e05..5b20dc726d 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -902,11 +902,12 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
 {
 const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+const ram_addr_t size = qemu_ram_get_used_length(rb);
 void *host = qemu_ram_get_host_addr(rb);
 RAMBlockNotifier *notifier = opaque;
 
 if (host) {
-notifier->ram_block_added(notifier, host, max_size);
+notifier->ram_block_added(notifier, host, size, max_size);
 }
 return 0;
 }
@@ -923,20 +924,43 @@ void ram_block_notifier_remove(RAMBlockNotifier *n)
 QLIST_REMOVE(n, next);
 }
 
-void ram_block_notify_add(void *host, size_t size)
+void ram_block_notify_add(void *host, size_t size, size_t max_size)
 {
 RAMBlockNotifier *notifier;
 
 QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-notifier->ram_block_added(notifier, host, size);
+notifier->ram_block_added(notifier, host, size, max_size);
 }
 }
 
-void ram_block_notify_remove(void *host, size_t size)
+void ram_block_notify_remove(void *host, size_t size, size_t max_size)
 {
 RAMBlockNotifier *notifier;
 
 QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
-notifier->ram_block_removed(notifier, host, size);
+notifier->ram_block_removed(notifier, host, size, max_size);
 }
 }
+
+void ram_block_notify_resized(void *host, size_t old_size, size_t new_size)
+{
+RAMBlockNotifier *notifier;
+
+QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+if (notifier->ram_block_resized) {
+notifier->ram_block_resized(notifier, host, old_size, new_size);
+}
+}
+}
+
+bool ram_block_notifiers_support_resize(void)
+{
+RAMBlockNotifier *notifier;
+
+QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+if (!notifier->ram_block_resized) {
+return false;
+}
+}
+return true;
+}
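
The optional-callback pattern added to hw/core/numa.c above can be sketched in isolation as follows. This is a simplified stand-in, not the QEMU implementation: it uses a plain singly linked list instead of QLIST, and every name in it (Notifier, notifier_add, record_resize, last_size) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Notifier Notifier;
struct Notifier {
    /* Optional: NULL means this notifier cannot handle resizes. */
    void (*resized)(Notifier *n, void *host, size_t old_size, size_t new_size);
    Notifier *next;
    size_t last_size;            /* demo state recorded by record_resize() */
};

static Notifier *notifiers;      /* head of the registered list */

static void notifier_add(Notifier *n)
{
    assert(n != NULL);
    n->next = notifiers;
    notifiers = n;
}

/* Call only the notifiers that implement the resize callback. */
static void notify_resized(void *host, size_t old_size, size_t new_size)
{
    for (Notifier *n = notifiers; n; n = n->next) {
        if (n->resized) {
            n->resized(n, host, old_size, new_size);
        }
    }
}

/* True only if every registered notifier implements the callback. */
static bool notifiers_support_resize(void)
{
    for (Notifier *n = notifiers; n; n = n->next) {
        if (!n->resized) {
            return false;
        }
    }
    return true;
}

/* Example callback: remember the most recent size. */
static void record_resize(Notifier *n, void *host, size_t old_size,
                          size_t new_size)
{
    (void)host;
    (void)old_size;
    n->last_size = new_size;
}
```

A caller would check notifiers_support_resize() before choosing the resizable path, and take the legacy path as soon as a single registered notifier lacks the callback.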

[PATCH v2 fixed 16/16] exec: Ram blocks with resizable anonymous allocations under POSIX

2020-02-12 Thread David Hildenbrand
We can now make use of resizable anonymous allocations to implement
actually resizable ram blocks. Resizable anonymous allocations are
not implemented under WIN32 yet and are not available when using
alternative allocators. Fall back to the existing handling.

We also have to fall back to the existing handling in case any ram block
notifier does not support resizing (esp., AMD SEV, HAX) yet. Remember
in RAM_RESIZEABLE_ALLOC if we are using resizable anonymous allocations.

As the mmap()-hackery will invalidate some madvise settings, we have to
re-apply them after resizing. After resizing, notify the ram block
notifiers.

Try to grow early, as that can easily fail if out of memory. Shrink late
and ignore errors (nothing will actually break). Warn only.

The benefit of actually resizable ram blocks is that, e.g., under Linux,
only the actual size will be reserved (even if
"/proc/sys/vm/overcommit_memory" is set to "never"). Additional memory will
be reserved when trying to resize, which allows ram blocks to start small
but theoretically grow very large.

Note1: We are not able to create resizable ram blocks with pre-allocated
   memory yet, so prealloc is not affected.
Note2: mlock should work as it used to as os_mlock() does a
   mlockall(MCL_CURRENT | MCL_FUTURE), which includes future
   mappings.
Note3: Nobody should access memory beyond used_length. Memory notifiers
   already properly take care of this, only ram block notifiers
   violate this constraint and, therefore, have to be special-cased.
   Especially, any ram block notifier that might dynamically
   register at runtime (e.g., vfio), has to support resizes. Add an
   assert for that. Both HAX and SEV register early, so they are
   fine.

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: "Dr. David Alan Gilbert" 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Stefan Weil 
Cc: Igor Mammedov 
Cc: Shameerali Kolothum Thodi 
Signed-off-by: David Hildenbrand 
---
 exec.c| 60 ---
 hw/core/numa.c|  7 +
 include/exec/cpu-common.h |  2 ++
 include/exec/memory.h |  8 ++
 4 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index f2d30479b8..71e32dcc11 100644
--- a/exec.c
+++ b/exec.c
@@ -2053,6 +2053,16 @@ void qemu_ram_unset_migratable(RAMBlock *rb)
 rb->flags &= ~RAM_MIGRATABLE;
 }
 
+bool qemu_ram_is_resizable(RAMBlock *rb)
+{
+return rb->flags & RAM_RESIZEABLE;
+}
+
+bool qemu_ram_is_resizable_alloc(RAMBlock *rb)
+{
+return rb->flags & RAM_RESIZEABLE_ALLOC;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
 {
@@ -2139,6 +2149,7 @@ static void qemu_ram_apply_settings(void *host, size_t length)
  */
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 {
+const bool shared = block->flags & RAM_SHARED;
 const ram_addr_t oldsize = block->used_length;
 
 assert(block);
@@ -2149,7 +2160,7 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 return 0;
 }
 
-if (!(block->flags & RAM_RESIZEABLE)) {
+if (!qemu_ram_is_resizable(block)) {
 error_setg_errno(errp, EINVAL,
  "Length mismatch: %s: 0x" RAM_ADDR_FMT
  " in != 0x" RAM_ADDR_FMT, block->idstr,
@@ -2165,6 +2176,12 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 return -EINVAL;
 }
 
+if (oldsize < newsize && qemu_ram_is_resizable_alloc(block) &&
+!qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
+error_setg_errno(errp, ENOMEM, "Cannot allocate enough memory.");
+return -ENOMEM;
+}
+
 cpu_physical_memory_clear_dirty_range(block->offset, block->used_length);
 block->used_length = newsize;
 cpu_physical_memory_set_dirty_range(block->offset, block->used_length,
@@ -2178,6 +2195,21 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 if (block->resized) {
 block->resized(block->idstr, newsize, block->host);
 }
+
+/*
+ * Shrinking will only fail in rare scenarios (e.g., maximum number of
+ * mappings reached), and can be ignored. Warn only.
+ */
+if (newsize < oldsize && qemu_ram_is_resizable_alloc(block) &&
+!qemu_anon_ram_resize(block->host, oldsize, newsize, shared)) {
+warn_report("Shrinking memory allocation failed.");
+}
+
+if (block->host && qemu_ram_is_resizable_alloc(block)) {
+/* re-apply settings that might have been overridden by the resize */
+qemu_ram_apply_settings(block->host, block->max_length);
+}
+
 return 0;
 }
 
@@ -2256,6 +2288,28 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
 }
 }
 
+static void ram_block_alloc_ram(RAMBlock *rb)
+{
+const bool shared = qemu_ram_is_shared(rb);
+
+/*
+ * If we ca

[PATCH v2 fixed 12/16] util/mmap-alloc: Implement resizable mmaps

2020-02-12 Thread David Hildenbrand
Implement resizable mmaps. For now, the actual resizing is not wired up.
Introduce qemu_ram_mmap_resizable() and qemu_ram_mmap_resize(). Make
qemu_ram_mmap() a wrapper of qemu_ram_mmap_resizable().

Cc: Richard Henderson 
Cc: Igor Kotrasinski 
Cc: Murilo Opsfelder Araujo 
Cc: "Michael S. Tsirkin" 
Cc: Greg Kurz 
Cc: Eduardo Habkost 
Cc: "Dr. David Alan Gilbert" 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 include/qemu/mmap-alloc.h | 21 
 util/mmap-alloc.c | 42 ---
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index e786266b92..3a219721e3 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,11 +7,13 @@ size_t qemu_fd_getpagesize(int fd);
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
 /**
- * qemu_ram_mmap: mmap the specified file or device.
+ * qemu_ram_mmap_resizable: reserve a memory region of @max_size to mmap the
+ *  specified file or device and mmap @size of it.
  *
  * Parameters:
  *  @fd: the file or the device to mmap
  *  @size: the number of bytes to be mmaped
+ *  @max_size: the number of bytes to be reserved
  *  @align: if not zero, specify the alignment of the starting mapping address;
  *  otherwise, the alignment in use will be determined by QEMU.
  *  @shared: map has RAM_SHARED flag.
@@ -21,12 +23,15 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
  *  On success, return a pointer to the mapped area.
  *  On failure, return MAP_FAILED.
  */
-void *qemu_ram_mmap(int fd,
-size_t size,
-size_t align,
-bool shared,
-bool is_pmem);
-
-void qemu_ram_munmap(int fd, void *ptr, size_t size);
+void *qemu_ram_mmap_resizable(int fd, size_t size, size_t max_size,
+  size_t align, bool shared, bool is_pmem);
+bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
+  bool shared, bool is_pmem);
+static inline void *qemu_ram_mmap(int fd, size_t size, size_t align,
+  bool shared, bool is_pmem)
+{
+return qemu_ram_mmap_resizable(fd, size, size, align, shared, is_pmem);
+}
+void qemu_ram_munmap(int fd, void *ptr, size_t max_size);
 
 #endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fb7ef588fe..164b88a088 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -173,23 +173,22 @@ static inline size_t mmap_pagesize(int fd)
 #endif
 }
 
-void *qemu_ram_mmap(int fd,
-size_t size,
-size_t align,
-bool shared,
-bool is_pmem)
+void *qemu_ram_mmap_resizable(int fd, size_t size, size_t max_size,
+  size_t align, bool shared, bool is_pmem)
 {
 const size_t pagesize = mmap_pagesize(fd);
 size_t offset, total;
 void *ptr, *guardptr;
 
 g_assert(QEMU_IS_ALIGNED(size, pagesize));
+g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
 
 /*
  * Note: this always allocates at least one extra page of virtual address
- * space, even if size is already aligned.
+ * space, even if the size is already aligned. We will reserve an area of
+ * at least max_size, but only populate the requested part of it.
  */
-total = size + align;
+total = max_size + align;
 
 guardptr = mmap_reserve(0, total, fd);
 if (guardptr == MAP_FAILED) {
@@ -217,21 +216,40 @@ void *qemu_ram_mmap(int fd,
  * a guard page guarding against potential buffer overflows.
  */
 total -= offset;
-if (total > size + pagesize) {
-munmap(ptr + size + pagesize, total - size - pagesize);
+if (total > max_size + pagesize) {
+munmap(ptr + max_size + pagesize, total - max_size - pagesize);
 }
 
 return ptr;
 }
 
-void qemu_ram_munmap(int fd, void *ptr, size_t size)
+bool qemu_ram_mmap_resize(void *ptr, int fd, size_t old_size, size_t new_size,
+  bool shared, bool is_pmem)
 {
 const size_t pagesize = mmap_pagesize(fd);
 
-g_assert(QEMU_IS_ALIGNED(size, pagesize));
+g_assert(QEMU_IS_ALIGNED(old_size, pagesize));
+g_assert(QEMU_IS_ALIGNED(new_size, pagesize));
+
+if (old_size < new_size) {
+/* populate the missing piece into the reserved area */
+ptr = mmap_populate(ptr + old_size, new_size - old_size, fd, old_size,
+shared, is_pmem);
+} else if (old_size > new_size) {
+/* discard this piece, marking it reserved */
+ptr = mmap_reserve(ptr + new_size, old_size - new_size, fd);
+}
+return ptr != MAP_FAILED;
+}
+
+void qemu_ram_munmap(int fd, void *ptr, size_t max_size)
+{
+const size_t pagesize = mmap_pagesize(fd);
+
+g_assert(QEMU_IS_ALIGNED(max_size, pagesize));
 
 if (ptr) {
 /* Unmap both the RAM block an

[PATCH v2 fixed 07/16] exec: Drop "shared" parameter from ram_block_add()

2020-02-12 Thread David Hildenbrand
Properly store it in the flags of the ram block instead (and the flag
even already exists and is used).

E.g., qemu_ram_is_shared() now properly succeeds on all ram blocks that are
actually shared.

Reviewed-by: Igor Kotrasinski 
Reviewed-by: Richard Henderson 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 exec.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index f7525867ec..fc65c4f7ca 100644
--- a/exec.c
+++ b/exec.c
@@ -2249,7 +2249,7 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
 }
 }
 
-static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
+static void ram_block_add(RAMBlock *new_block, Error **errp)
 {
 RAMBlock *block;
 RAMBlock *last_block = NULL;
@@ -2272,7 +2272,8 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
 }
 } else {
 new_block->host = phys_mem_alloc(new_block->max_length,
- &new_block->mr->align, shared);
+ &new_block->mr->align,
+ qemu_ram_is_shared(new_block));
 if (!new_block->host) {
 error_setg_errno(errp, errno,
  "cannot set up guest memory '%s'",
@@ -2376,7 +2377,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 return NULL;
 }
 
-ram_block_add(new_block, &local_err, ram_flags & RAM_SHARED);
+ram_block_add(new_block, &local_err);
 if (local_err) {
 g_free(new_block);
 error_propagate(errp, local_err);
@@ -2438,10 +2439,13 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
 if (host) {
 new_block->flags |= RAM_PREALLOC;
 }
+if (share) {
+new_block->flags |= RAM_SHARED;
+}
 if (resizeable) {
 new_block->flags |= RAM_RESIZEABLE;
 }
-ram_block_add(new_block, &local_err, share);
+ram_block_add(new_block, &local_err);
 if (local_err) {
 g_free(new_block);
 error_propagate(errp, local_err);
-- 
2.24.1




[PATCH v2 fixed 14/16] util: vfio-helpers: Implement ram_block_resized()

2020-02-12 Thread David Hildenbrand
Let's implement ram_block_resized(), allowing resizable mappings.

For resizable mappings, we reserve $max_size IOVA address space, but only
map $size of it. When resizing, unmap the old part and remap the new
part. We'll need a new ioctl to do this atomically (e.g., to resize
while the guest is running - not allowed for now).

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Alex Williamson 
Cc: Stefan Hajnoczi 
Cc: "Dr. David Alan Gilbert" 
Signed-off-by: David Hildenbrand 
---
 util/trace-events   |  5 ++--
 util/vfio-helpers.c | 70 -
 2 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/util/trace-events b/util/trace-events
index 83b6639018..88b7dbf4a5 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -74,8 +74,9 @@ qemu_mutex_unlock(void *mutex, const char *file, const int line) "released mutex
 
 # vfio-helpers.c
 qemu_vfio_dma_reset_temporary(void *s) "s %p"
-qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
-qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 0x%zx"
+qemu_vfio_ram_block_added(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
+qemu_vfio_ram_block_removed(void *s, void *p, size_t size, size_t max_size) "s %p host %p size 0x%zx max_size 0x%zx"
+qemu_vfio_ram_block_resized(void *s, void *p, size_t old_size, size_t new_size) "s %p host %p old_size 0x%zx new_size 0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t iova) "s %p host %p size %zu index %d iova 0x%"PRIx64
 qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p host %p size %zu iova 0x%"PRIx64
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 3db6aa49f4..70877b9ebd 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -372,14 +372,20 @@ fail_container:
 return ret;
 }
 
+static int qemu_vfio_dma_map_resizable(QEMUVFIOState *s, void *host,
+   size_t size, size_t max_size,
+   bool temporary, uint64_t *iova);
+static void qemu_vfio_dma_map_resize(QEMUVFIOState *s, void *host,
+ size_t old_size, size_t new_size);
+
 static void qemu_vfio_ram_block_added(RAMBlockNotifier *n, void *host,
   size_t size, size_t max_size)
 {
 QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
 int ret;
 
-trace_qemu_vfio_ram_block_added(s, host, max_size);
-ret = qemu_vfio_dma_map(s, host, max_size, false, NULL);
+trace_qemu_vfio_ram_block_added(s, host, size, max_size);
+ret = qemu_vfio_dma_map_resizable(s, host, size, max_size, false, NULL);
 if (ret) {
 error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host,
  max_size, ret);
@@ -391,16 +397,28 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n, void *host,
 {
 QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
 if (host) {
-trace_qemu_vfio_ram_block_removed(s, host, max_size);
+trace_qemu_vfio_ram_block_removed(s, host, size, max_size);
 qemu_vfio_dma_unmap(s, host);
 }
 }
 
+static void qemu_vfio_ram_block_resized(RAMBlockNotifier *n, void *host,
+size_t old_size, size_t new_size)
+{
+QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+
+if (host) {
+trace_qemu_vfio_ram_block_resized(s, host, old_size, new_size);
+qemu_vfio_dma_map_resize(s, host, old_size, new_size);
+}
+}
+
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
 qemu_mutex_init(&s->lock);
 s->ram_notifier.ram_block_added = qemu_vfio_ram_block_added;
 s->ram_notifier.ram_block_removed = qemu_vfio_ram_block_removed;
+s->ram_notifier.ram_block_resized = qemu_vfio_ram_block_resized;
 s->low_water_mark = QEMU_VFIO_IOVA_MIN;
 s->high_water_mark = QEMU_VFIO_IOVA_MAX;
 ram_block_notifier_add(&s->ram_notifier);
@@ -597,9 +615,14 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
  * the result in @iova if not NULL. The caller need to make sure the area is
  * aligned to page size, and mustn't overlap with existing mapping areas (split
  * mapping status within this area is not allowed).
+ *
+ * If size < max_size, a region of max_size in IOVA address is reserved, such
+ * that the mapping can later be resized. Resizable mappings are only allowed
+ * for !temporary mappings.
  */
-int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
-  bool temporary, uint64_t *iova)
+static int qemu_vfio_dma_map_resizable(QEMUVFIOState *s, void *host,
+   size_t size, size_t max_size,
+   bool temporary, uint64_t *iova)

[PATCH v2 fixed 11/16] util/mmap-alloc: Prepare for resizable mmaps

2020-02-12 Thread David Hildenbrand
When shrinking a mmap we want to re-reserve the already populated area.
When growing a memory region, we want to populate starting with a given
fd_offset. Prepare by allowing to pass these parameters.

Also, let's make sure we always process full pages, to avoid
unmapping/remapping pages that are already in use when
growing/shrinking. Add some asserts.
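
To illustrate why growing needs an fd_offset, here is a minimal sketch in
plain POSIX C. It is not the QEMU code: demo_make_backing_file(),
demo_populate() and demo_grow() are hypothetical names made up for this
example, and it assumes a Linux/POSIX host where tmpfile() works. The point
is only that the newly populated tail has to map the file range starting at
the old size, otherwise the tail would alias the start of the file.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): create an unlinked temporary
 * file of `pages` pages and store the page index in the first byte of each
 * page. Returns a file descriptor, or -1 on failure.
 */
static int demo_make_backing_file(size_t pages, size_t pagesize)
{
    FILE *f = tmpfile();
    size_t i;
    int fd;

    if (!f) {
        return -1;
    }
    /* Keep our own descriptor so the FILE wrapper can be closed. */
    fd = dup(fileno(f));
    fclose(f);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)(pages * pagesize)) != 0) {
        close(fd);
        return -1;
    }
    for (i = 0; i < pages; i++) {
        unsigned char tag = (unsigned char)i;

        if (pwrite(fd, &tag, 1, (off_t)(i * pagesize)) != 1) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Map the file range [fd_offset, fd_offset + size) at the fixed address ptr. */
static void *demo_populate(void *ptr, size_t size, int fd, size_t fd_offset)
{
    const size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);

    /* Only whole pages, mirroring the asserts added by this patch. */
    assert(size % pagesize == 0);
    assert(fd_offset % pagesize == 0);

    return mmap(ptr, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                fd, (off_t)fd_offset);
}

/* Grow a mapping at base from old_size to new_size; returns 1 on success. */
static int demo_grow(void *base, int fd, size_t old_size, size_t new_size)
{
    /* The new tail starts at old_size in memory and in the file. */
    void *p = demo_populate((char *)base + old_size, new_size - old_size,
                            fd, old_size);

    return p != MAP_FAILED;
}
```

With a four-page file mapped as two pages first and then grown to four, the
first byte of page N in memory should read back N for every page, which
would not hold if the second call used a file offset of 0.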

Reviewed-by: Richard Henderson 
Cc: Igor Kotrasinski 
Cc: Murilo Opsfelder Araujo 
Cc: "Michael S. Tsirkin" 
Cc: Greg Kurz 
Cc: Murilo Opsfelder Araujo 
Cc: Eduardo Habkost 
Cc: "Dr. David Alan Gilbert" 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 util/mmap-alloc.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 2f366dae72..fb7ef588fe 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -83,12 +83,12 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 }
 
 /*
- * Reserve a new memory region of the requested size to be used for mapping
- * from the given fd (if any).
+ * Reserve a new memory region of the requested size or re-reserve parts
+ * of an existing region to be used for mapping from the given fd (if any).
  */
-static void *mmap_reserve(size_t size, int fd)
+static void *mmap_reserve(void *ptr, size_t size, int fd)
 {
-int flags = MAP_PRIVATE;
+int flags = MAP_PRIVATE | (ptr ? MAP_FIXED : 0);
 
 #if defined(__powerpc64__) && defined(__linux__)
 /*
@@ -111,19 +111,23 @@ static void *mmap_reserve(size_t size, int fd)
 flags |= MAP_ANONYMOUS;
 #endif
 
-return mmap(0, size, PROT_NONE, flags, fd, 0);
+return mmap(ptr, size, PROT_NONE, flags, fd, 0);
 }
 
 /*
  * Populate memory in a reserved region from the given fd (if any).
  */
-static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
-   bool is_pmem)
+static void *mmap_populate(void *ptr, size_t size, int fd, size_t fd_offset,
+   bool shared, bool is_pmem)
 {
 int map_sync_flags = 0;
 int flags = MAP_FIXED;
 void *populated_ptr;
 
+if (fd == -1) {
+fd_offset = 0;
+}
+
 flags |= fd == -1 ? MAP_ANONYMOUS : 0;
 flags |= shared ? MAP_SHARED : MAP_PRIVATE;
 if (shared && is_pmem) {
@@ -131,7 +135,7 @@ static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
 }
 
 populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
- flags | map_sync_flags, fd, 0);
+ flags | map_sync_flags, fd, fd_offset);
 if (populated_ptr == MAP_FAILED && map_sync_flags) {
 if (errno == ENOTSUP) {
 char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
@@ -153,7 +157,8 @@ static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
  * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
  * again without these flags to handle backwards compatibility.
  */
-populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd,
+ fd_offset);
 }
 return populated_ptr;
 }
@@ -178,13 +183,15 @@ void *qemu_ram_mmap(int fd,
 size_t offset, total;
 void *ptr, *guardptr;
 
+g_assert(QEMU_IS_ALIGNED(size, pagesize));
+
 /*
  * Note: this always allocates at least one extra page of virtual address
  * space, even if size is already aligned.
  */
 total = size + align;
 
-guardptr = mmap_reserve(total, fd);
+guardptr = mmap_reserve(0, total, fd);
 if (guardptr == MAP_FAILED) {
 return MAP_FAILED;
 }
@@ -195,7 +202,7 @@ void *qemu_ram_mmap(int fd,
 
 offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-ptr = mmap_populate(guardptr + offset, size, fd, shared, is_pmem);
+ptr = mmap_populate(guardptr + offset, size, fd, 0, shared, is_pmem);
 if (ptr == MAP_FAILED) {
 munmap(guardptr, total);
 return MAP_FAILED;
@@ -221,6 +228,8 @@ void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
 const size_t pagesize = mmap_pagesize(fd);
 
+g_assert(QEMU_IS_ALIGNED(size, pagesize));
+
 if (ptr) {
 /* Unmap both the RAM block and the guard page */
 munmap(ptr, size + pagesize);
-- 
2.24.1




[PATCH v2 fixed 09/16] util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve()

2020-02-12 Thread David Hildenbrand
We want to reserve a memory region without actually populating memory.
Let's factor that out.

Reviewed-by: Igor Kotrasinski 
Acked-by: Murilo Opsfelder Araujo 
Reviewed-by: Richard Henderson 
Cc: "Michael S. Tsirkin" 
Cc: Greg Kurz 
Cc: Murilo Opsfelder Araujo 
Cc: Eduardo Habkost 
Cc: "Dr. David Alan Gilbert" 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 util/mmap-alloc.c | 58 +++
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 82f02a2cec..43a26f38a8 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,6 +82,38 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return qemu_real_host_page_size;
 }
 
+/*
+ * Reserve a new memory region of the requested size to be used for mapping
+ * from the given fd (if any).
+ */
+static void *mmap_reserve(size_t size, int fd)
+{
+int flags = MAP_PRIVATE;
+
+#if defined(__powerpc64__) && defined(__linux__)
+/*
+ * On ppc64 mappings in the same segment (aka slice) must share the same
+ * page size. Since we will be re-allocating part of this segment
+ * from the supplied fd, we should make sure to use the same page size, to
+ * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
+ * avoid allocating backing store memory.
+ * We do this unless we are using the system page size, in which case
+ * anonymous memory is OK.
+ */
+if (fd == -1 || qemu_fd_getpagesize(fd) == qemu_real_host_page_size) {
+fd = -1;
+flags |= MAP_ANONYMOUS;
+} else {
+flags |= MAP_NORESERVE;
+}
+#else
+fd = -1;
+flags |= MAP_ANONYMOUS;
+#endif
+
+return mmap(0, size, PROT_NONE, flags, fd, 0);
+}
+
 static inline size_t mmap_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -101,7 +133,6 @@ void *qemu_ram_mmap(int fd,
 const size_t pagesize = mmap_pagesize(fd);
 int flags;
 int map_sync_flags = 0;
-int guardfd;
 size_t offset;
 size_t total;
 void *guardptr;
@@ -113,30 +144,7 @@ void *qemu_ram_mmap(int fd,
  */
 total = size + align;
 
-#if defined(__powerpc64__) && defined(__linux__)
-/* On ppc64 mappings in the same segment (aka slice) must share the same
- * page size. Since we will be re-allocating part of this segment
- * from the supplied fd, we should make sure to use the same page size, to
- * this end we mmap the supplied fd.  In this case, set MAP_NORESERVE to
- * avoid allocating backing store memory.
- * We do this unless we are using the system page size, in which case
- * anonymous memory is OK.
- */
-flags = MAP_PRIVATE;
-if (fd == -1 || pagesize == qemu_real_host_page_size) {
-guardfd = -1;
-flags |= MAP_ANONYMOUS;
-} else {
-guardfd = fd;
-flags |= MAP_NORESERVE;
-}
-#else
-guardfd = -1;
-flags = MAP_PRIVATE | MAP_ANONYMOUS;
-#endif
-
-guardptr = mmap(0, total, PROT_NONE, flags, guardfd, 0);
-
+guardptr = mmap_reserve(total, fd);
 if (guardptr == MAP_FAILED) {
 return MAP_FAILED;
 }
-- 
2.24.1




[PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX

2020-02-12 Thread David Hildenbrand
We already allow resizable ram blocks for anonymous memory, however, they
are not actually resized. All memory is mmaped() R/W, including the memory
exceeding the used_length, up to the max_length.

When resizing, effectively only the boundary is moved. Implement actually
resizable anonymous allocations and make use of them in resizable ram
blocks when possible. Memory exceeding the used_length will be
inaccessible. Especially ram block notifiers require care.

Having actually resizable anonymous allocations (via mmap-hackery) allows
to reserve a big region in virtual address space and grow the
accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
is set to "never" under Linux, huge reservations will succeed. If there is
not enough memory when resizing (to populate parts of the reserved region),
trying to resize will fail. Only the actually used size is reserved in the
OS.
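
As a rough illustration of the reserve/populate idea (a sketch under stated
assumptions, not the code from this series): demo_reserve() and
demo_resize() below are hypothetical helpers written for this example only.
They assume a POSIX host with MAP_ANONYMOUS and sizes that are multiples of
the host page size.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve max_size bytes of virtual address space.
 * PROT_NONE makes the whole range inaccessible until it is populated.
 */
static void *demo_reserve(size_t max_size)
{
    return mmap(NULL, max_size, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/*
 * Hypothetical helper: move the usable boundary inside a reserved region
 * from old_size to new_size. Growing maps fresh read/write anonymous
 * memory over the reserved tail; shrinking maps PROT_NONE over the part
 * that is no longer used, which discards its contents.
 */
static bool demo_resize(void *base, size_t old_size, size_t new_size)
{
    const size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    char *start = base;
    void *ret = base;

    assert(old_size % pagesize == 0);
    assert(new_size % pagesize == 0);

    if (new_size > old_size) {
        ret = mmap(start + old_size, new_size - old_size,
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    } else if (new_size < old_size) {
        ret = mmap(start + new_size, old_size - new_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    }
    return ret != MAP_FAILED;
}
```

A caller would reserve once with the maximum size, resize as often as
needed within that range, and finally munmap() the whole reservation.
Memory that was shrunk away and later grown back reads as zero again,
because the pages are freshly allocated.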

E.g., virtio-mem [1] wants to reserve big resizable memory regions and
grow the usable part on demand. I think this change is worth sending out
individually. Accompanied by a bunch of minor fixes and cleanups.

Especially, memory notifiers already handle resizing by first removing
the old region, and then re-adding the resized region. prealloc is
currently not possible with resizable ram blocks. mlock() should continue
to work as is. Resizing is currently rare and must only happen on the
start of an incoming migration, or during resets. No code path (except
HAX and SEV ram block notifiers) should access memory outside of the usable
range - and if we ever find one, that one has to be fixed (I did not
identify any).

v1 -> v2:
- Add "util: vfio-helpers: Fix qemu_vfio_close()"
- Add "util: vfio-helpers: Remove Error parameter from
   qemu_vfio_undo_mapping()"
- Add "util: vfio-helpers: Factor out removal from
   qemu_vfio_undo_mapping()"
- "util/mmap-alloc: ..."
 -- Minor changes due to review feedback (e.g., assert alignment, return
bool when resizing)
- "util: vfio-helpers: Implement ram_block_resized()"
 -- Reserve max_size in the IOVA address space.
 -- On resize, undo old mapping and do new mapping. We can later implement
a new ioctl to resize the mapping directly.
- "numa: Teach ram block notifiers about resizable ram blocks"
 -- Pass size/max_size to ram block notifiers, which makes things easier and
cleaner
- "exec: Ram blocks with resizable anonymous allocations under POSIX"
 -- Adapt to new ram block notifiers
 -- Shrink after notifying. Always trigger ram block notifiers on resizes
 -- Add a safety net that all ram block notifiers registered at runtime
support resizes.

[1] https://lore.kernel.org/kvm/20191212171137.13872-1-da...@redhat.com/

David Hildenbrand (16):
  util: vfio-helpers: Factor out and fix processing of existing ram
blocks
  util: vfio-helpers: Fix qemu_vfio_close()
  util: vfio-helpers: Remove Error parameter from
qemu_vfio_undo_mapping()
  util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  exec: Factor out setting ram settings (madvise ...) into
qemu_ram_apply_settings()
  exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  exec: Drop "shared" parameter from ram_block_add()
  util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  util/mmap-alloc: Factor out reserving of a memory region to
mmap_reserve()
  util/mmap-alloc: Factor out populating of memory to mmap_populate()
  util/mmap-alloc: Prepare for resizable mmaps
  util/mmap-alloc: Implement resizable mmaps
  numa: Teach ram block notifiers about resizable ram blocks
  util: vfio-helpers: Implement ram_block_resized()
  util: oslib: Resizable anonymous allocations under POSIX
  exec: Ram blocks with resizable anonymous allocations under POSIX

 exec.c | 104 +++
 hw/core/numa.c |  53 +++-
 hw/i386/xen/xen-mapcache.c |   7 +-
 include/exec/cpu-common.h  |   3 +
 include/exec/memory.h  |   8 ++
 include/exec/ramlist.h |  14 +++-
 include/qemu/mmap-alloc.h  |  21 +++--
 include/qemu/osdep.h   |   6 +-
 stubs/ram-block.c  |  20 -
 target/i386/hax-mem.c  |   5 +-
 target/i386/sev.c  |  18 ++--
 util/mmap-alloc.c  | 165 +++--
 util/oslib-posix.c |  37 -
 util/oslib-win32.c |  14 
 util/trace-events  |   9 +-
 util/vfio-helpers.c| 145 +---
 16 files changed, 450 insertions(+), 179 deletions(-)

-- 
2.24.1




[PATCH v2 fixed 10/16] util/mmap-alloc: Factor out populating of memory to mmap_populate()

2020-02-12 Thread David Hildenbrand
We want to populate memory within a reserved memory region. Let's factor
that out.

Reviewed-by: Richard Henderson 
Acked-by: Murilo Opsfelder Araujo 
Cc: Igor Kotrasinski 
Cc: "Michael S. Tsirkin" 
Cc: Greg Kurz 
Cc: Murilo Opsfelder Araujo 
Cc: Eduardo Habkost 
Cc: "Dr. David Alan Gilbert" 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 util/mmap-alloc.c | 89 +--
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 43a26f38a8..2f366dae72 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -114,6 +114,50 @@ static void *mmap_reserve(size_t size, int fd)
 return mmap(0, size, PROT_NONE, flags, fd, 0);
 }
 
+/*
+ * Populate memory in a reserved region from the given fd (if any).
+ */
+static void *mmap_populate(void *ptr, size_t size, int fd, bool shared,
+   bool is_pmem)
+{
+int map_sync_flags = 0;
+int flags = MAP_FIXED;
+void *populated_ptr;
+
+flags |= fd == -1 ? MAP_ANONYMOUS : 0;
+flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+if (shared && is_pmem) {
+map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
+}
+
+populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE,
+ flags | map_sync_flags, fd, 0);
+if (populated_ptr == MAP_FAILED && map_sync_flags) {
+if (errno == ENOTSUP) {
+char *proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
+char *file_name = g_malloc0(PATH_MAX);
+int len = readlink(proc_link, file_name, PATH_MAX - 1);
+
+if (len < 0) {
+len = 0;
+}
+file_name[len] = '\0';
+fprintf(stderr, "Warning: requesting persistence across crashes "
+"for backend file %s failed. Proceeding without "
+"persistence, data might become corrupted in case of host "
+"crash.\n", file_name);
+g_free(proc_link);
+g_free(file_name);
+}
+/*
+ * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try
+ * again without these flags to handle backwards compatibility.
+ */
+populated_ptr = mmap(ptr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+}
+return populated_ptr;
+}
+
 static inline size_t mmap_pagesize(int fd)
 {
 #if defined(__powerpc64__) && defined(__linux__)
@@ -131,12 +175,8 @@ void *qemu_ram_mmap(int fd,
 bool is_pmem)
 {
 const size_t pagesize = mmap_pagesize(fd);
-int flags;
-int map_sync_flags = 0;
-size_t offset;
-size_t total;
-void *guardptr;
-void *ptr;
+size_t offset, total;
+void *ptr, *guardptr;
 
 /*
  * Note: this always allocates at least one extra page of virtual address
@@ -153,44 +193,9 @@ void *qemu_ram_mmap(int fd,
 /* Always align to host page size */
 assert(align >= pagesize);
 
-flags = MAP_FIXED;
-flags |= fd == -1 ? MAP_ANONYMOUS : 0;
-flags |= shared ? MAP_SHARED : MAP_PRIVATE;
-if (shared && is_pmem) {
-map_sync_flags = MAP_SYNC | MAP_SHARED_VALIDATE;
-}
-
 offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
 
-ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-   flags | map_sync_flags, fd, 0);
-
-if (ptr == MAP_FAILED && map_sync_flags) {
-if (errno == ENOTSUP) {
-char *proc_link, *file_name;
-int len;
-proc_link = g_strdup_printf("/proc/self/fd/%d", fd);
-file_name = g_malloc0(PATH_MAX);
-len = readlink(proc_link, file_name, PATH_MAX - 1);
-if (len < 0) {
-len = 0;
-}
-file_name[len] = '\0';
-fprintf(stderr, "Warning: requesting persistence across crashes "
-"for backend file %s failed. Proceeding without "
-"persistence, data might become corrupted in case of host "
-"crash.\n", file_name);
-g_free(proc_link);
-g_free(file_name);
-}
-/*
- * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
- * we will remove these flags to handle compatibility.
- */
-ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
-   flags, fd, 0);
-}
-
+ptr = mmap_populate(guardptr + offset, size, fd, shared, is_pmem);
 if (ptr == MAP_FAILED) {
 munmap(guardptr, total);
 return MAP_FAILED;
-- 
2.24.1




[PATCH v2 fixed 08/16] util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()

2020-02-12 Thread David Hildenbrand
Factor it out and add a comment.

Reviewed-by: Igor Kotrasinski 
Acked-by: Murilo Opsfelder Araujo 
Reviewed-by: Richard Henderson 
Cc: "Michael S. Tsirkin" 
Cc: Murilo Opsfelder Araujo 
Cc: Greg Kurz 
Cc: Eduardo Habkost 
Cc: "Dr. David Alan Gilbert" 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 util/mmap-alloc.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 27dcccd8ec..82f02a2cec 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -82,17 +82,27 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
 return qemu_real_host_page_size;
 }
 
+static inline size_t mmap_pagesize(int fd)
+{
+#if defined(__powerpc64__) && defined(__linux__)
+/* Mappings in the same segment must share the same page size */
+return qemu_fd_getpagesize(fd);
+#else
+return qemu_real_host_page_size;
+#endif
+}
+
 void *qemu_ram_mmap(int fd,
 size_t size,
 size_t align,
 bool shared,
 bool is_pmem)
 {
+const size_t pagesize = mmap_pagesize(fd);
 int flags;
 int map_sync_flags = 0;
 int guardfd;
 size_t offset;
-size_t pagesize;
 size_t total;
 void *guardptr;
 void *ptr;
@@ -113,7 +123,6 @@ void *qemu_ram_mmap(int fd,
  * anonymous memory is OK.
  */
 flags = MAP_PRIVATE;
-pagesize = qemu_fd_getpagesize(fd);
 if (fd == -1 || pagesize == qemu_real_host_page_size) {
 guardfd = -1;
 flags |= MAP_ANONYMOUS;
@@ -123,7 +132,6 @@ void *qemu_ram_mmap(int fd,
 }
 #else
 guardfd = -1;
-pagesize = qemu_real_host_page_size;
 flags = MAP_PRIVATE | MAP_ANONYMOUS;
 #endif
 
@@ -198,15 +206,10 @@ void *qemu_ram_mmap(int fd,
 
 void qemu_ram_munmap(int fd, void *ptr, size_t size)
 {
-size_t pagesize;
+const size_t pagesize = mmap_pagesize(fd);
 
 if (ptr) {
 /* Unmap both the RAM block and the guard page */
-#if defined(__powerpc64__) && defined(__linux__)
-pagesize = qemu_fd_getpagesize(fd);
-#else
-pagesize = qemu_real_host_page_size;
-#endif
 munmap(ptr, size + pagesize);
 }
 }
-- 
2.24.1




Re: [PATCH] nbd-client: Support leading / in NBD URI

2020-02-12 Thread Richard W.M. Jones
On Tue, Feb 11, 2020 at 08:31:01PM -0600, Eric Blake wrote:
> The NBD URI specification [1] states that only one leading slash at
> the beginning of the URI path component is stripped, not all such
> slashes.  This becomes important to a patch I just proposed to nbdkit
> [2], which would allow the exportname to select a file embedded within
> an ext2 image: ext2fs demands an absolute pathname beginning with '/',
> and because qemu was inadvertently stripping it, my nbdkit patch had
> to work around the behavior.
> 
> [1] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md
> [2] https://www.redhat.com/archives/libguestfs/2020-February/msg00109.html
> 
> Note that the qemu bug only affects handling of URIs such as
> nbd://host:port//abs/path (where '/abs/path' should be the export
> name); it is still possible to use --image-opts and pass the desired
> export name with a leading slash directly through JSON even without
> this patch.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/nbd.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index d085554f21ea..82f9b7ef50a5 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -1516,8 +1516,10 @@ static int nbd_parse_uri(const char *filename, QDict *options)
>  goto out;
>  }
> 
> -p = uri->path ? uri->path : "/";
> -p += strspn(p, "/");
> +p = uri->path ? uri->path : "";
> +if (p[0] == '/') {
> +p++;
> +}
>  if (p[0]) {
>  qdict_put_str(options, "export", p);
>  }
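
For readers skimming the hunk, the behavioural difference can be sketched
in isolation. This is only an illustration: demo_export_strip_all() and
demo_export_strip_one() are hypothetical stand-ins for the two versions of
the path handling, and demo_self_check() is a made-up usage example; none
of them exist in block/nbd.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old behaviour: skip every leading '/' of the URI path. */
static const char *demo_export_strip_all(const char *path)
{
    const char *p = path ? path : "/";

    p += strspn(p, "/");
    return p;
}

/* New behaviour: skip at most one leading '/' of the URI path. */
static const char *demo_export_strip_one(const char *path)
{
    const char *p = path ? path : "";

    if (p[0] == '/') {
        p++;
    }
    return p;
}

/* Usage example for the path part of nbd://host:port//abs/path */
static void demo_self_check(void)
{
    /* Old code loses the leading '/' of the export name. */
    assert(strcmp(demo_export_strip_all("//abs/path"), "abs/path") == 0);
    /* New code keeps it, so the export name is an absolute path. */
    assert(strcmp(demo_export_strip_one("//abs/path"), "/abs/path") == 0);
    /* A single leading slash behaves the same in both versions. */
    assert(strcmp(demo_export_strip_one("/export"), "export") == 0);
    /* A missing path yields an empty export name either way. */
    assert(demo_export_strip_one(NULL)[0] == '\0');
    assert(demo_export_strip_all(NULL)[0] == '\0');
}
```
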

Looks reasonable, ACK.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




[PATCH v2 fixed 06/16] exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()

2020-02-12 Thread David Hildenbrand
I don't see why we shouldn't apply all settings to make it look like the
surrounding RAM (and enable proper VMA merging).

Note: memory backend settings might have overridden these settings. We
would need a callback to let the memory backend fix that up.

Reviewed-by: Richard Henderson 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 exec.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 31a462a7d3..f7525867ec 100644
--- a/exec.c
+++ b/exec.c
@@ -2552,8 +2552,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
  length, addr);
 exit(1);
 }
-memory_try_enable_merging(vaddr, length);
-qemu_ram_setup_dump(vaddr, length);
+qemu_ram_apply_settings(vaddr, length);
 }
 }
 }
-- 
2.24.1




Re: [PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX

2020-02-12 Thread David Hildenbrand
On 12.02.20 14:35, David Hildenbrand wrote:
> We already allow resizable ram blocks for anonymous memory, however, they
> are not actually resized. All memory is mmaped() R/W, including the memory
> exceeding the used_length, up to the max_length.
> 
> When resizing, effectively only the boundary is moved. Implement actually
> resizable anonymous allocations and make use of them in resizable ram
> blocks when possible. Memory exceeding the used_length will be
> inaccessible. Especially ram block notifiers require care.
> 
> Having actually resizable anonymous allocations (via mmap-hackery) allows
> to reserve a big region in virtual address space and grow the
> accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
> is set to "never" under Linux, huge reservations will succeed. If there is
> not enough memory when resizing (to populate parts of the reserved region),
> trying to resize will fail. Only the actually used size is reserved in the
> OS.
> 
> E.g., virtio-mem [1] wants to reserve big resizable memory regions and
> grow the usable part on demand. I think this change is worth sending out
> individually. Accompanied by a bunch of minor fixes and cleanups.
> 
> Especially, memory notifiers already handle resizing by first removing
> the old region, and then re-adding the resized region. prealloc is
> currently not possible with resizable ram blocks. mlock() should continue
> to work as is. Resizing is currently rare and must only happen on the
> start of an incoming migration, or during resets. No code path (except
> HAX and SEV ram block notifiers) should access memory outside of the usable
> range - and if we ever find one, that one has to be fixed (I did not
> identify any).
> 
> v1 -> v2:
> - Add "util: vfio-helpers: Fix qemu_vfio_close()"
> - Add "util: vfio-helpers: Remove Error parameter from
>qemu_vfio_undo_mapping()"
> - Add "util: vfio-helpers: Factor out removal from
>qemu_vfio_undo_mapping()"
> - "util/mmap-alloc: ..."
>  -- Minor changes due to review feedback (e.g., assert alignment, return
> bool when resizing)
> - "util: vfio-helpers: Implement ram_block_resized()"
>  -- Reserve max_size in the IOVA address space.
>  -- On resize, undo old mapping and do new mapping. We can later implement
> a new ioctl to resize the mapping directly.
> - "numa: Teach ram block notifiers about resizable ram blocks"
>  -- Pass size/max_size to ram block notifiers, which makes things easier and
> cleaner
> - "exec: Ram blocks with resizable anonymous allocations under POSIX"
>  -- Adapt to new ram block notifiers
>  -- Shrink after notifying. Always trigger ram block notifiers on resizes
>  -- Add a safety net that all ram block notifiers registered at runtime
> support resizes.
> 
> [1] https://lore.kernel.org/kvm/20191212171137.13872-1-da...@redhat.com/
> 
> David Hildenbrand (16):
>   util: vfio-helpers: Factor out and fix processing of existing ram
> blocks
>   util: vfio-helpers: Fix qemu_vfio_close()
>   util: vfio-helpers: Remove Error parameter from
> qemu_vfio_undo_mapping()
>   util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
>   exec: Factor out setting ram settings (madvise ...) into
> qemu_ram_apply_settings()
>   exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
>   exec: Drop "shared" parameter from ram_block_add()
>   util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
>   util/mmap-alloc: Factor out reserving of a memory region to
> mmap_reserve()
>   util/mmap-alloc: Factor out populating of memory to mmap_populate()
>   util/mmap-alloc: Prepare for resizable mmaps
>   util/mmap-alloc: Implement resizable mmaps
>   numa: Teach ram block notifiers about resizable ram blocks
>   util: vfio-helpers: Implement ram_block_resized()
>   util: oslib: Resizable anonymous allocations under POSIX
>   exec: Ram blocks with resizable anonymous allocations under POSIX

I should double check what I send out while doing last minute changes.
Please ignore this series, will send the proper one right away.


-- 
Thanks,

David / dhildenb




[PATCH v2 fixed 04/16] util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()

2020-02-12 Thread David Hildenbrand
Factor it out and properly use it where applicable. Make
qemu_vfio_undo_mapping() look like qemu_vfio_do_mapping(), passing the
size and iova, not the mapping.

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Alex Williamson 
Cc: Stefan Hajnoczi 
Signed-off-by: David Hildenbrand 
---
 util/vfio-helpers.c | 43 +++
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 13dd962d95..b3adc328db 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -517,6 +517,20 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
 return insert;
 }
 
+/**
+ * Remove the mapping from @s and free it.
+ */
+static void qemu_vfio_remove_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
+{
+const int index = mapping - s->mappings;
+
+assert(index >= 0 && index < s->nr_mappings);
+memmove(mapping, &s->mappings[index + 1],
+sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
+s->nr_mappings--;
+s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
+}
+
 /* Do the DMA mapping with VFIO. */
 static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 uint64_t iova)
@@ -538,29 +552,22 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 }
 
 /**
- * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
+ * Undo the DMA mapping from @s with VFIO.
  */
-static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
+static void qemu_vfio_undo_mapping(QEMUVFIOState *s, size_t size, uint64_t iova)
 {
-int index;
 struct vfio_iommu_type1_dma_unmap unmap = {
 .argsz = sizeof(unmap),
 .flags = 0,
-.iova = mapping->iova,
-.size = mapping->size,
+.iova = iova,
+.size = size,
 };
 
-index = mapping - s->mappings;
-assert(mapping->size > 0);
-assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
-assert(index >= 0 && index < s->nr_mappings);
+assert(size > 0);
+assert(QEMU_IS_ALIGNED(size, qemu_real_host_page_size));
 if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
 error_report("VFIO_UNMAP_DMA failed: %d", -errno);
 }
-memmove(mapping, &s->mappings[index + 1],
-sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
-s->nr_mappings--;
-s->mappings = g_renew(IOVAMapping, s->mappings, s->nr_mappings);
 }
 
 /* Check if the mapping list is (ascending) ordered. */
@@ -620,7 +627,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
 assert(qemu_vfio_verify_mappings(s));
 ret = qemu_vfio_do_mapping(s, host, size, iova0);
 if (ret) {
-qemu_vfio_undo_mapping(s, mapping);
+qemu_vfio_remove_mapping(s, mapping);
 goto out;
 }
 s->low_water_mark += size;
@@ -680,7 +687,8 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
 if (!m) {
 goto out;
 }
-qemu_vfio_undo_mapping(s, m);
+qemu_vfio_undo_mapping(s, m->size, m->iova);
+qemu_vfio_remove_mapping(s, m);
 out:
 qemu_mutex_unlock(&s->lock);
 }
@@ -697,7 +705,10 @@ void qemu_vfio_close(QEMUVFIOState *s)
 return;
 }
 while (s->nr_mappings) {
-qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
+IOVAMapping *m = &s->mappings[s->nr_mappings - 1];
+
+qemu_vfio_undo_mapping(s, m->size, m->iova);
+qemu_vfio_remove_mapping(s, m);
 }
 ram_block_notifier_remove(&s->ram_notifier);
 qemu_vfio_reset(s);
-- 
2.24.1




Re: [PATCH v30 00/22] Add RX archtecture support

2020-02-12 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200212130311.127515-1-ys...@users.sourceforge.jp/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v30 00/22] Add RX archtecture support
Message-id: 20200212130311.127515-1-ys...@users.sourceforge.jp
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20200212023101.1162686-1-ebl...@redhat.com -> patchew/20200212023101.1162686-1-ebl...@redhat.com
 - [tag update]  patchew/20200212130311.127515-1-ys...@users.sourceforge.jp -> patchew/20200212130311.127515-1-ys...@users.sourceforge.jp
Switched to a new branch 'test'
63d8f29 qemu-doc.texi: Add RX section.
0828fee BootLinuxConsoleTest: Test the RX-Virt machine
bea9ae2 Add rx-softmmu
46f27e3 hw/rx: Restrict the RX62N microcontroller to the RX62N CPU core
2afce4c hw/rx: Honor -accel qtest
ed93bfd hw/rx: RX Target hardware definition
72f4963 hw/char: RX62N serial communication interface (SCI)
ad00efa hw/timer: RX62N internal timer modules
bf6c82f hw/intc: RX62N interrupt controller (ICUa)
253d941 target/rx: Dump bytes for each insn during disassembly
0d816ca target/rx: Collect all bytes during disassembly
bdfbcf3 target/rx: Emit all disassembly in one prt()
f587124 target/rx: Use prt_ldmi for XCHG_mr disassembly
eeca618 target/rx: Replace operand with prt_ldmi in disassembler
956befe target/rx: Disassemble rx_index_addr into a string
0835249 target/rx: RX disassembler
6821236 target/rx: CPU definition
0e76a8f target/rx: TCG helper
fe37ab1 target/rx: TCG translation
7cc01ef hw/registerfields.h: Add 8bit and 16bit register macros
d17802b qemu/bitops.h: Add extract8 and extract16
81462db MAINTAINERS: Add RX

=== OUTPUT BEGIN ===
1/22 Checking commit 81462db667b2 (MAINTAINERS: Add RX)
2/22 Checking commit d17802b2bd09 (qemu/bitops.h: Add extract8 and extract16)
3/22 Checking commit 7cc01ef2805e (hw/registerfields.h: Add 8bit and 16bit register macros)
Use of uninitialized value in concatenation (.) or string at ./scripts/checkpatch.pl line 2490.
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#27: FILE: include/hw/registerfields.h:25:
+#define REG8(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) };

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#31: FILE: include/hw/registerfields.h:29:
+#define REG16(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) / 2 };

total: 2 errors, 0 warnings, 56 lines checked

Patch 3/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/22 Checking commit fe37ab11655e (target/rx: TCG translation)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#20: 
new file mode 100644

total: 0 errors, 1 warnings, 3065 lines checked

Patch 4/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/22 Checking commit 0e76a8f988a8 (target/rx: TCG helper)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21: 
new file mode 100644

total: 0 errors, 1 warnings, 650 lines checked

Patch 5/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/22 Checking commit 6821236f1ec6 (target/rx: CPU definition)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#22: 
new file mode 100644

total: 0 errors, 1 warnings, 659 lines checked

Patch 6/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/22 Checking commit 083524988c91 (target/rx: RX disassembler)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#38: 
new file mode 100644

total: 0 errors, 1 warnings, 1497 lines checked

Patch 7/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
8/22 Checking commit 956befec0bfb (target/rx: Disassemble rx_index_addr into a string)
9/22 Checking commit eeca61803f65 (target/rx: Replace operand with prt_ldmi in disassembler)
10/22 Checking commit f58712457dc7 (target/rx: Use prt_ldmi for XCHG_mr disassembly)
11/22 Checking commit bdfbcf35637b (target/rx: Emit all disassem

[PATCH v2 fixed 05/16] exec: Factor out setting ram settings (madvise ...) into qemu_ram_apply_settings()

2020-02-12 Thread David Hildenbrand
Factor all settings out into qemu_ram_apply_settings().

For memory_try_enable_merging(), the important bit is that it won't be
called with XEN - which is now still the case as new_block->host will
remain NULL.

Reviewed-by: Richard Henderson 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 exec.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index 05cfe868ab..31a462a7d3 100644
--- a/exec.c
+++ b/exec.c
@@ -2121,6 +2121,15 @@ static int memory_try_enable_merging(void *addr, size_t len)
 return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
 }
 
+static void qemu_ram_apply_settings(void *host, size_t length)
+{
+memory_try_enable_merging(host, length);
+qemu_ram_setup_dump(host, length);
+qemu_madvise(host, length, QEMU_MADV_HUGEPAGE);
+/* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
+qemu_madvise(host, length, QEMU_MADV_DONTFORK);
+}
+
 /* Only legal before guest might have detected the memory size: e.g. on
  * incoming migration, or right after reset.
  *
@@ -2271,7 +2280,6 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
 qemu_mutex_unlock_ramlist();
 return;
 }
-memory_try_enable_merging(new_block->host, new_block->max_length);
 }
 }
 
@@ -2309,10 +2317,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
 DIRTY_CLIENTS_ALL);
 
 if (new_block->host) {
-qemu_ram_setup_dump(new_block->host, new_block->max_length);
-qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
-/* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
-qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_DONTFORK);
+qemu_ram_apply_settings(new_block->host, new_block->max_length);
 ram_block_notify_add(new_block->host, new_block->max_length);
 }
 }
-- 
2.24.1




[PATCH v2 fixed 02/16] util: vfio-helpers: Fix qemu_vfio_close()

2020-02-12 Thread David Hildenbrand
qemu_vfio_undo_mapping() will decrement the number of mappings and
reshuffle the array elements to fit into the reduced size.

Iterating forward over all elements while removing them therefore skips
mappings. Instead, remove the last mapping until none are left.
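
The effect is easy to reproduce with a plain array. The sketch below is
only an illustration with made-up names (struct list, remove_at(),
drain_forward(), drain_from_tail()); it mimics the memmove-and-decrement
removal, not the VFIO code itself.

```c
#include <assert.h>
#include <string.h>

struct list {
    int items[8];
    int n;
};

/* Remove one element and close the gap, like the mapping removal does. */
static void remove_at(struct list *l, int index)
{
    assert(index >= 0 && index < l->n);
    memmove(&l->items[index], &l->items[index + 1],
            sizeof(l->items[0]) * (l->n - index - 1));
    l->n--;
}

/*
 * Buggy pattern: the index grows while the array shrinks and shifts,
 * so every other element is skipped. Returns the number left over.
 */
static int drain_forward(struct list *l)
{
    for (int i = 0; i < l->n; ++i) {
        remove_at(l, i);
    }
    return l->n;
}

/* Fixed pattern: always remove the last element until none remain. */
static int drain_from_tail(struct list *l)
{
    while (l->n) {
        remove_at(l, l->n - 1);
    }
    return l->n;
}
```

Starting from four elements, the forward loop stops with two still in the
list, while removing from the tail empties it and never has to move any
memory.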

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Alex Williamson 
Cc: Stefan Hajnoczi 
Signed-off-by: David Hildenbrand 
---
 util/vfio-helpers.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 71e02e7f35..d6332522c1 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -694,13 +694,11 @@ static void qemu_vfio_reset(QEMUVFIOState *s)
 /* Close and free the VFIO resources. */
 void qemu_vfio_close(QEMUVFIOState *s)
 {
-int i;
-
 if (!s) {
 return;
 }
-for (i = 0; i < s->nr_mappings; ++i) {
-qemu_vfio_undo_mapping(s, &s->mappings[i], NULL);
+while (s->nr_mappings) {
+qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
 }
 ram_block_notifier_remove(&s->ram_notifier);
 qemu_vfio_reset(s);
-- 
2.24.1




[PATCH v2 fixed 03/16] util: vfio-helpers: Remove Error parameter from qemu_vfio_undo_mapping()

2020-02-12 Thread David Hildenbrand
Everybody discards the error. Let's error_report() instead so this error
doesn't get lost.

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Alex Williamson 
Cc: Stefan Hajnoczi 
Signed-off-by: David Hildenbrand 
---
 util/vfio-helpers.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index d6332522c1..13dd962d95 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -540,8 +540,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
 /**
  * Undo the DMA mapping from @s with VFIO, and remove from mapping list.
  */
-static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
-   Error **errp)
+static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping)
 {
 int index;
 struct vfio_iommu_type1_dma_unmap unmap = {
@@ -556,7 +555,7 @@ static void qemu_vfio_undo_mapping(QEMUVFIOState *s, IOVAMapping *mapping,
 assert(QEMU_IS_ALIGNED(mapping->size, qemu_real_host_page_size));
 assert(index >= 0 && index < s->nr_mappings);
 if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-error_setg(errp, "VFIO_UNMAP_DMA failed: %d", -errno);
+error_report("VFIO_UNMAP_DMA failed: %d", -errno);
 }
 memmove(mapping, &s->mappings[index + 1],
 sizeof(s->mappings[0]) * (s->nr_mappings - index - 1));
@@ -621,7 +620,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
 assert(qemu_vfio_verify_mappings(s));
 ret = qemu_vfio_do_mapping(s, host, size, iova0);
 if (ret) {
-qemu_vfio_undo_mapping(s, mapping, NULL);
+qemu_vfio_undo_mapping(s, mapping);
 goto out;
 }
 s->low_water_mark += size;
@@ -681,7 +680,7 @@ void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host)
 if (!m) {
 goto out;
 }
-qemu_vfio_undo_mapping(s, m, NULL);
+qemu_vfio_undo_mapping(s, m);
 out:
 qemu_mutex_unlock(&s->lock);
 }
@@ -698,7 +697,7 @@ void qemu_vfio_close(QEMUVFIOState *s)
 return;
 }
 while (s->nr_mappings) {
-qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1], NULL);
+qemu_vfio_undo_mapping(s, &s->mappings[s->nr_mappings - 1]);
 }
 ram_block_notifier_remove(&s->ram_notifier);
 qemu_vfio_reset(s);
-- 
2.24.1




Re: [PATCH] nbd-client: Support leading / in NBD URI

2020-02-12 Thread Maxim Levitsky
On Wed, 2020-02-12 at 14:33 +0100, Ján Tomko wrote:
> On Tue, Feb 11, 2020 at 08:31:01PM -0600, Eric Blake wrote:
> > The NBD URI specification [1] states that only one leading slash at
> > the beginning of the URI path component is stripped, not all such
> > slashes.  This becomes important to a patch I just proposed to nbdkit
> > [2], which would allow the exportname to select a file embedded within
> > an ext2 image: ext2fs demands an absolute pathname beginning with '/',
> > and because qemu was inadvertently stripping it, my nbdkit patch had
> > to work around the behavior.
> > 
> > [1] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md
> > [2] https://www.redhat.com/archives/libguestfs/2020-February/msg00109.html
> > 
> > Note that the qemu bug only affects handling of URIs such as
> > nbd://host:port//abs/path (where '/abs/path' should be the export
> > name); it is still possible to use --image-opts and pass the desired
> > export name with a leading slash directly through JSON even without
> > this patch.
> > 
> > Signed-off-by: Eric Blake 
> > ---
> > block/nbd.c | 6 --
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> 
> Reviewed-by: Ján Tomko 
> 
> Jano
Note that I had a bug open for this.
https://bugzilla.redhat.com/show_bug.cgi?id=1728545

I expected this to be a feature to be honest,
and was afraid to break existing users that might rely on this.

Best regards,
Maxim Levitsky




[PATCH v2 14/16] virtio-mem: Support for resizable memory regions

2020-02-12 Thread David Hildenbrand
Signed-off-by: David Hildenbrand 
---
 hw/virtio/virtio-mem.c | 168 ++---
 1 file changed, 109 insertions(+), 59 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 093b6eb0bb..d28b501778 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -237,30 +237,78 @@ static void virtio_mem_unplug_request(VirtIOMEM *vm, VirtQueueElement *elem,
 virtio_mem_send_response_simple(vm, elem, type);
 }
 
+/*
+ * Try to resize the usable region to hold at least the requested size.
+ */
+static void virtio_mem_resize_usable_region(VirtIOMEM *vm,
+uint64_t requested_size,
+Error **errp)
+{
+/*
+ * If possible, we size the usable region a little bit bigger than the
+ * requested size, so the guest has more flexibility.
+ */
+uint64_t newsize = MIN(memory_region_max_size(&vm->memdev->mr),
+   requested_size + VIRTIO_MEM_USABLE_EXTENT);
+Error *err = NULL;
+
+/*
+ * Size it as small as possible (0 is not valid).
+ */
+if (!requested_size) {
+newsize = vm->block_size;
+}
+
+if (newsize == vm->usable_region_size) {
+return;
+}
+
+/* resize the memory region, if supported */
+if (memory_region_is_resizable(&vm->memdev->mr)) {
+memory_region_ram_resize(&vm->memdev->mr, newsize, &err);
+}
+if (!err) {
+vm->usable_region_size = newsize;
+fprintf(stderr, "New usable_region_size: %" PRIx64 "\n",
+vm->usable_region_size);
+}
+error_propagate(errp, err);
+}
+
 /*
  * Unplug all memory and shrink the usable region.
  */
-static void virtio_mem_unplug_all(VirtIOMEM *vm)
+static int virtio_mem_unplug_all(VirtIOMEM *vm)
 {
+Error *err = NULL;
+
+if (virtio_mem_busy()) {
+return -EBUSY;
+}
+
+virtio_mem_resize_usable_region(vm, vm->requested_size, &err);
+if (err) {
+/* It's unlikely that shrinking fails. */
+warn_report_err(err);
+return -ENOMEM;
+}
 if (vm->size) {
-virtio_mem_set_block_state(vm, vm->addr,
-   memory_region_size(&vm->memdev->mr), false);
+ram_block_discard_range(vm->memdev->mr.ram_block, 0,
+memory_region_size(&vm->memdev->mr));
+bitmap_clear(vm->bitmap, 0, vm->bitmap_size);
 vm->size = 0;
 }
-vm->usable_region_size = MIN(memory_region_size(&vm->memdev->mr),
- vm->requested_size + VIRTIO_MEM_USABLE_EXTENT);
+return 0;
 }
 
 static void virtio_mem_unplug_all_request(VirtIOMEM *vm, VirtQueueElement *elem)
 {
 
-if (virtio_mem_busy()) {
+if (virtio_mem_unplug_all(vm)) {
 virtio_mem_send_response_simple(vm, elem,  VIRTIO_MEM_RESP_BUSY);
-return;
+} else {
+virtio_mem_send_response_simple(vm, elem,  VIRTIO_MEM_RESP_ACK);
 }
-
-virtio_mem_unplug_all(vm);
-virtio_mem_send_response_simple(vm, elem,  VIRTIO_MEM_RESP_ACK);
 }
 
 static void virtio_mem_state_request(VirtIOMEM *vm, VirtQueueElement *elem,
@@ -344,7 +392,7 @@ static void virtio_mem_get_config(VirtIODevice *vdev, uint8_t *config_data)
 config->requested_size = cpu_to_le64(vm->requested_size);
 config->plugged_size = cpu_to_le64(vm->size);
 config->addr = cpu_to_le64(vm->addr);
-config->region_size = cpu_to_le64(memory_region_size(&vm->memdev->mr));
+config->region_size = cpu_to_le64(memory_region_max_size(&vm->memdev->mr));
 config->usable_region_size = cpu_to_le64(vm->usable_region_size);
 }
 
@@ -370,10 +418,6 @@ static void virtio_mem_system_reset(void *opaque)
  * region size. This is, however, not possible in all scenarios. Then,
  * the guest has to deal with this manually (VIRTIO_MEM_REQ_UNPLUG_ALL).
  */
-if (virtio_mem_busy()) {
-return;
-}
-
 virtio_mem_unplug_all(vm);
 }
 
@@ -410,32 +454,32 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
 int nb_numa_nodes = ms->numa_state ? ms->numa_state->num_nodes : 0;
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOMEM *vm = VIRTIO_MEM(dev);
-Error *local_err = NULL;
+Error *err = NULL;
 uint64_t page_size;
 
 /* verify the memdev */
 host_memory_backend_validate(vm->memdev, VIRTIO_MEM_MEMDEV_PROP,
- false, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+ true, &err);
+if (err) {
+error_propagate(errp, err);
 return;
 }
 
 /* verify the node */
 if ((nb_numa_nodes && vm->node >= nb_numa_nodes) ||
 (!nb_numa_nodes && vm->node)) {
-error_setg(&local_err, "Property '%s' has value '%" PRIu32
+error_setg(errp, "Property '%s' has value '%" PRIu32
"', which exceeds the number of 

[PATCH v2 fixed 15/16] util: oslib: Resizable anonymous allocations under POSIX

2020-02-12 Thread David Hildenbrand
Introduce qemu_anon_ram_alloc_resizable() and qemu_anon_ram_resize().
Implement them under POSIX; under WIN32 they are stubs that fail
(returning NULL and false, respectively).

Under POSIX, we make use of resizable mmaps. An implementation under
WIN32 is theoretically possible AFAIK and can be added later.

In qemu_anon_ram_free(), rename the size parameter to max_size, to make
it clearer that we have to use the maximum size when freeing resizable
anonymous allocations.

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: "Dr. David Alan Gilbert" 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Stefan Weil 
Cc: Igor Mammedov 
Signed-off-by: David Hildenbrand 
---
 include/qemu/osdep.h |  6 +-
 util/oslib-posix.c   | 37 ++---
 util/oslib-win32.c   | 14 ++
 util/trace-events|  4 +++-
 4 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 9bd3dcfd13..84c54c1647 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -311,8 +311,12 @@ int qemu_daemon(int nochdir, int noclose);
 void *qemu_try_memalign(size_t alignment, size_t size);
 void *qemu_memalign(size_t alignment, size_t size);
 void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared);
+void *qemu_anon_ram_alloc_resizable(size_t size, size_t max_size,
+uint64_t *align, bool shared);
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+  bool shared);
 void qemu_vfree(void *ptr);
-void qemu_anon_ram_free(void *ptr, size_t size);
+void qemu_anon_ram_free(void *ptr, size_t max_size);
 
 #define QEMU_MADV_INVALID -1
 
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 5a291cc982..147246d543 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -219,16 +219,47 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 return ptr;
 }
 
+void *qemu_anon_ram_alloc_resizable(size_t size, size_t max_size,
+uint64_t *alignment, bool shared)
+{
+size_t align = QEMU_VMALLOC_ALIGN;
+void *ptr = qemu_ram_mmap_resizable(-1, size, max_size, align, shared,
+false);
+
+if (ptr == MAP_FAILED) {
+return NULL;
+}
+
+if (alignment) {
+*alignment = align;
+}
+
+trace_qemu_anon_ram_alloc_resizable(size, max_size, ptr);
+return ptr;
+}
+
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+  bool shared)
+{
+bool resized = qemu_ram_mmap_resize(ptr, -1, old_size, new_size, shared,
+false);
+
+if (resized) {
+trace_qemu_anon_ram_resize(old_size, new_size, ptr);
+}
+return resized;
+}
+
 void qemu_vfree(void *ptr)
 {
 trace_qemu_vfree(ptr);
 free(ptr);
 }
 
-void qemu_anon_ram_free(void *ptr, size_t size)
+void qemu_anon_ram_free(void *ptr, size_t max_size)
 {
-trace_qemu_anon_ram_free(ptr, size);
-qemu_ram_munmap(-1, ptr, size);
+trace_qemu_anon_ram_free(ptr, max_size);
+qemu_ram_munmap(-1, ptr, max_size);
 }
 
 void qemu_set_block(int fd)
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index e9b14ab178..5ba872bd3b 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -90,6 +90,20 @@ void *qemu_anon_ram_alloc(size_t size, uint64_t *align, bool shared)
 return ptr;
 }
 
+void *qemu_anon_ram_alloc_resizable(size_t size, size_t max_size,
+uint64_t *align, bool shared)
+{
+/* resizable ram not implemented yet */
+return NULL;
+}
+
+bool qemu_anon_ram_resize(void *ptr, size_t old_size, size_t new_size,
+  bool shared)
+{
+/* resizable ram not implemented yet */
+return false;
+}
+
 void qemu_vfree(void *ptr)
 {
 trace_qemu_vfree(ptr);
diff --git a/util/trace-events b/util/trace-events
index 88b7dbf4a5..8f44dcc1a0 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -46,8 +46,10 @@ qemu_co_mutex_unlock_return(void *mutex, void *self) "mutex %p self %p"
 # oslib-posix.c
 qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
 qemu_anon_ram_alloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_anon_ram_alloc_resizable(size_t size, size_t max_size, void *ptr) "size %zu max_size %zu ptr %p"
+qemu_anon_ram_resize(size_t old_size, size_t new_size, void *ptr) "old_size %zu new_size %zu ptr %p"
 qemu_vfree(void *ptr) "ptr %p"
-qemu_anon_ram_free(void *ptr, size_t size) "ptr %p size %zu"
+qemu_anon_ram_free(void *ptr, size_t max_size) "ptr %p max_size %zu"
 
 # hbitmap.c
 hbitmap_iter_skip_words(const void *hb, void *hbi, uint64_t pos, unsigned long cur) "hb %p hbi %p pos %"PRId64" cur 0x%lx"
-- 
2.24.1




[PATCH v2 fixed 01/16] util: vfio-helpers: Factor out and fix processing of existing ram blocks

2020-02-12 Thread David Hildenbrand
Factor it out into common code when a new notifier is registered, just
as done with the memory region notifier. This allows us to have the
logic about how to process existing ram blocks at a central place (which
will be extended soon).

Just like when adding a new ram block, we have to register the max_length
for now. We don't have a way to get notified about resizes yet, and some
memory would not be mapped when growing the ram block.

Note: Currently, ram blocks are only "fake resized". All memory
(max_length) is accessible.

We can get rid of a bunch of functions in stubs/ram-block.c. Print the
warning from inside qemu_vfio_ram_block_added().
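
The "replay existing blocks on registration" pattern can be sketched in
isolation as follows. All names here (struct block, struct notifier,
block_add(), notifier_add(), count_bytes()) are hypothetical and exist
only for this illustration; the real code walks the RAM block list and
also links the notifier into a list for future events, which is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 8

struct block {
    void *host;         /* NULL if the block has no host mapping */
    size_t max_size;
};

struct notifier {
    void (*block_added)(struct notifier *n, void *host, size_t max_size);
    size_t seen_bytes;  /* bookkeeping for the example callback */
};

static struct block blocks[MAX_BLOCKS];
static size_t nr_blocks;

/* Record a block that exists before any notifier is registered. */
static void block_add(void *host, size_t max_size)
{
    assert(nr_blocks < MAX_BLOCKS);
    blocks[nr_blocks].host = host;
    blocks[nr_blocks].max_size = max_size;
    nr_blocks++;
}

/*
 * Register a notifier: tell it about every block that already exists,
 * using the maximum size and skipping blocks without a host address.
 */
static void notifier_add(struct notifier *n)
{
    for (size_t i = 0; i < nr_blocks; i++) {
        if (blocks[i].host) {
            n->block_added(n, blocks[i].host, blocks[i].max_size);
        }
    }
}

/* Example callback: just sum up the sizes it was told about. */
static void count_bytes(struct notifier *n, void *host, size_t max_size)
{
    (void)host;
    n->seen_bytes += max_size;
}
```

Doing the replay in the registration function means individual notifier
users no longer need their own loop over existing blocks.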

Cc: Richard Henderson 
Cc: Paolo Bonzini 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Alex Williamson 
Cc: Stefan Hajnoczi 
Signed-off-by: David Hildenbrand 
---
 exec.c|  5 +
 hw/core/numa.c| 14 ++
 include/exec/cpu-common.h |  1 +
 stubs/ram-block.c | 20 
 util/vfio-helpers.c   | 28 +++-
 5 files changed, 27 insertions(+), 41 deletions(-)

diff --git a/exec.c b/exec.c
index 67e520d18e..05cfe868ab 100644
--- a/exec.c
+++ b/exec.c
@@ -2017,6 +2017,11 @@ ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
 return rb->used_length;
 }
 
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb)
+{
+return rb->max_length;
+}
+
 bool qemu_ram_is_shared(RAMBlock *rb)
 {
 return rb->flags & RAM_SHARED;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 0d1b4be76a..6599c69e05 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -899,9 +899,23 @@ void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms)
 }
 }
 
+static int ram_block_notify_add_single(RAMBlock *rb, void *opaque)
+{
+const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+void *host = qemu_ram_get_host_addr(rb);
+RAMBlockNotifier *notifier = opaque;
+
+if (host) {
+notifier->ram_block_added(notifier, host, max_size);
+}
+return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
+/* Notify about all existing ram blocks. */
+qemu_ram_foreach_block(ram_block_notify_add_single, n);
 }
 
 void ram_block_notifier_remove(RAMBlockNotifier *n)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 81753bbb34..9760ac9068 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -59,6 +59,7 @@ const char *qemu_ram_get_idstr(RAMBlock *rb);
 void *qemu_ram_get_host_addr(RAMBlock *rb);
 ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
 ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
+ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
 bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
 void qemu_ram_set_uf_zeroable(RAMBlock *rb);
diff --git a/stubs/ram-block.c b/stubs/ram-block.c
index 73c0a3ee08..10855b52dd 100644
--- a/stubs/ram-block.c
+++ b/stubs/ram-block.c
@@ -2,21 +2,6 @@
 #include "exec/ramlist.h"
 #include "exec/cpu-common.h"
 
-void *qemu_ram_get_host_addr(RAMBlock *rb)
-{
-return 0;
-}
-
-ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
-{
-return 0;
-}
-
-ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
-{
-return 0;
-}
-
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 }
@@ -24,8 +9,3 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
 }
-
-int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
-{
-return 0;
-}
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 813f7ec564..71e02e7f35 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -376,8 +376,13 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
   void *host, size_t size)
 {
 QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
+int ret;
+
 trace_qemu_vfio_ram_block_added(s, host, size);
-qemu_vfio_dma_map(s, host, size, false, NULL);
+ret = qemu_vfio_dma_map(s, host, size, false, NULL);
+if (ret) {
+error_report("qemu_vfio_dma_map(%p, %zu) failed: %d", host, size, ret);
+}
 }
 
 static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
@@ -390,33 +395,14 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
 }
 }
 
-static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
-{
-void *host_addr = qemu_ram_get_host_addr(rb);
-ram_addr_t length = qemu_ram_get_used_length(rb);
-int ret;
-QEMUVFIOState *s = opaque;
-
-if (!host_addr) {
-return 0;
-}
-ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
-if (ret) {
-fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
-host_addr, (uint64_t)length);
-}
-return 0;
-}
-
 static void qemu_vfio_open_common(QEMUVFIOState *s)
 {
 qemu_mutex_init(&s->lock);
 s->ram_notifier.ram_block_added = qemu_vfio_ra

[PATCH v2 11/16] hostmem: Factor out common checks into host_memory_backend_validate()

2020-02-12 Thread David Hildenbrand
All users want to perform similar checks. Let's factor it out to prepare
for more checks.
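
For readers skimming the archive, here is a minimal standalone sketch of the
"one shared validator" idea. DemoBackend and demo_backend_validate() are
hypothetical stand-ins and not the QEMU types or API; the error strings mirror
the ones in this patch. Note that the sketch tests for a missing backend
before touching any of its fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for HostMemoryBackend; not the QEMU type. */
typedef struct {
    const char *id;
    bool is_mapped;
} DemoBackend;

/*
 * Shared check: the property must be set and the backend must not already
 * be in use. Returns 0 on success, -1 on failure with a message in err.
 */
static int demo_backend_validate(const DemoBackend *backend,
                                 const char *property,
                                 char *err, size_t errlen)
{
    if (!backend) {
        snprintf(err, errlen, "'%s' property is not set", property);
        return -1;
    }
    if (backend->is_mapped) {
        snprintf(err, errlen, "'%s' property specifies a busy memdev: %s",
                 property, backend->id);
        return -1;
    }
    return 0;
}
```

Each device realize function would then call the single helper and bail out
on a non-zero return instead of repeating both checks.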

Signed-off-by: David Hildenbrand 
---
 backends/hostmem.c   | 14 ++
 hw/mem/pc-dimm.c | 12 +---
 hw/misc/ivshmem.c| 11 ---
 hw/virtio/virtio-mem.c   | 15 +--
 hw/virtio/virtio-pmem.c  | 13 -
 include/sysemu/hostmem.h |  2 ++
 6 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 2c8e4567e1..de37f1bf5d 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -291,6 +291,20 @@ bool host_memory_backend_is_mapped(HostMemoryBackend *backend)
 return backend->is_mapped;
 }
 
+void host_memory_backend_validate(HostMemoryBackend *backend,
+  const char *property, Error **errp)
+{
+char *path = object_get_canonical_path_component(OBJECT(backend));
+
+if (!backend) {
+error_setg(errp, "'%s' property is not set", property);
+} else if (host_memory_backend_is_mapped(backend)) {
+error_setg(errp, "'%s' property specifies a busy memdev: %s",
+   property, path);
+}
+g_free(path);
+}
+
 #ifdef __linux__
 size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
 {
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 8f50b8afea..9ee634ee89 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -174,16 +174,14 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
 PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
 MachineState *ms = MACHINE(qdev_get_machine());
 int nb_numa_nodes = ms->numa_state->num_nodes;
+Error *err = NULL;
 
-if (!dimm->hostmem) {
-error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
-return;
-} else if (host_memory_backend_is_mapped(dimm->hostmem)) {
-char *path = object_get_canonical_path_component(OBJECT(dimm->hostmem));
-error_setg(errp, "can't use already busy memdev: %s", path);
-g_free(path);
+host_memory_backend_validate(dimm->hostmem, PC_DIMM_MEMDEV_PROP, &err);
+if (err) {
+error_propagate(errp, err);
 return;
 }
+
 if (((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) ||
 (!nb_numa_nodes && dimm->node)) {
 error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 1a0fad74e1..39bffceadf 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -1035,14 +1035,11 @@ static Property ivshmem_plain_properties[] = {
 static void ivshmem_plain_realize(PCIDevice *dev, Error **errp)
 {
 IVShmemState *s = IVSHMEM_COMMON(dev);
+Error *err = NULL;
 
-if (!s->hostmem) {
-error_setg(errp, "You must specify a 'memdev'");
-return;
-} else if (host_memory_backend_is_mapped(s->hostmem)) {
-char *path = object_get_canonical_path_component(OBJECT(s->hostmem));
-error_setg(errp, "can't use already busy memdev: %s", path);
-g_free(path);
+host_memory_backend_validate(s->hostmem, "memdev", &err);
+if (err) {
+error_propagate(errp, err);
 return;
 }
 
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 2f759578fe..4b7b4cf950 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -414,16 +414,11 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
 uint64_t page_size;
 
 /* verify the memdev */
-if (!vm->memdev) {
-error_setg(&local_err, "'%s' property must be set",
-   VIRTIO_MEM_MEMDEV_PROP);
-goto out;
-} else if (host_memory_backend_is_mapped(vm->memdev)) {
-char *path = object_get_canonical_path_component(OBJECT(vm->memdev));
-
-error_setg(&local_err, "can't use already busy memdev: %s", path);
-g_free(path);
-goto out;
+host_memory_backend_validate(vm->memdev, VIRTIO_MEM_MEMDEV_PROP,
+ &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
 }
 
 /* verify the node */
diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
index 97287e923b..85cb337ed5 100644
--- a/hw/virtio/virtio-pmem.c
+++ b/hw/virtio/virtio-pmem.c
@@ -105,16 +105,11 @@ static void virtio_pmem_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOPMEM *pmem = VIRTIO_PMEM(dev);
+Error *err = NULL;
 
-if (!pmem->memdev) {
-error_setg(errp, "virtio-pmem memdev not set");
-return;
-}
-
-if (host_memory_backend_is_mapped(pmem->memdev)) {
-char *path = object_get_canonical_path_component(OBJECT(pmem->memdev));
-error_setg(errp, "can't use already busy memdev: %s", path);
-g_free(path);
+host_memory_backend_validate(pmem->memdev, "memdev", &err);
+if (err) {
+error_propagate(errp, err);
 return;
 }
 
dif

[PATCH v2 13/16] qmp/hmp: Expose "managed-size" for memory backends

2020-02-12 Thread David Hildenbrand
Expose it, and document what it means and when it was added.

Signed-off-by: David Hildenbrand 
---
 hw/core/machine-hmp-cmds.c | 2 ++
 hw/core/machine-qmp-cmds.c | 3 +++
 qapi/machine.json  | 6 ++
 3 files changed, 11 insertions(+)

diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index b76f7223af..681216198d 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -122,6 +122,8 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
m->value->dump ? "true" : "false");
 monitor_printf(mon, "  prealloc: %s\n",
m->value->prealloc ? "true" : "false");
+monitor_printf(mon, "  managed-size: %s\n",
+   m->value->managed_size ? "true" : "false");
 monitor_printf(mon, "  policy: %s\n",
HostMemPolicy_str(m->value->policy));
 visit_complete(v, &str);
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index eed5aeb2f7..800b55af5d 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -321,6 +321,9 @@ static int query_memdev(Object *obj, void *opaque)
 m->value->prealloc = object_property_get_bool(obj,
   "prealloc",
   &error_abort);
+m->value->managed_size = object_property_get_bool(obj,
+  "managed-size",
+  &error_abort);
 m->value->policy = object_property_get_enum(obj,
 "policy",
 "HostMemPolicy",
diff --git a/qapi/machine.json b/qapi/machine.json
index b3d30bc816..0c31818853 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -758,6 +758,9 @@
 #
 # @prealloc: enables or disables memory preallocation
 #
+# @managed-size: the owner manages the actual size, 'size' is an upper limit
+#(since 5.1)
+#
 # @host-nodes: host nodes for its memory policy
 #
 # @policy: memory policy of memory backend
@@ -771,6 +774,7 @@
 'merge':  'bool',
 'dump':   'bool',
 'prealloc':   'bool',
+'managed-size': 'bool',
 'host-nodes': ['uint16'],
 'policy': 'HostMemPolicy' }}
 
@@ -793,6 +797,7 @@
 #  "merge": false,
 #  "dump": true,
 #  "prealloc": false,
+#  "managed-size": false,
 #  "host-nodes": [0, 1],
 #  "policy": "bind"
 #},
@@ -801,6 +806,7 @@
 #  "merge": false,
 #  "dump": true,
 #  "prealloc": true,
+#  "managed-size": false,
 #  "host-nodes": [2, 3],
 #  "policy": "preferred"
 #}
-- 
2.24.1




[PATCH v2 08/16] memory: Disallow resizing to 0

2020-02-12 Thread David Hildenbrand
Memory regions / QEMU RAM blocks must always have a size > 0.
Otherwise, ramblock_ptr() will bail out with an assert.
Enforce this.
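
As a rough illustration of the rule, the sketch below rejects a resize to 0
before touching the block. DemoBlock and demo_block_resize() are hypothetical
names and only model the ordering of the checks (same size first, then zero,
then the maximum); they are not the exec.c implementation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a resizable RAM block; not the QEMU RAMBlock. */
typedef struct {
    size_t used_length;
    size_t max_length;
} DemoBlock;

/* Returns 0 on success or -EINVAL if the new size is not acceptable. */
static int demo_block_resize(DemoBlock *block, size_t newsize)
{
    if (block->used_length == newsize) {
        return 0;           /* nothing to do */
    }
    if (newsize == 0) {
        return -EINVAL;     /* a block must never become empty */
    }
    if (newsize > block->max_length) {
        return -EINVAL;     /* cannot grow beyond the reserved maximum */
    }
    block->used_length = newsize;
    return 0;
}
```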

Signed-off-by: David Hildenbrand 
---
 exec.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/exec.c b/exec.c
index 5bc9b231c4..161e40e16e 100644
--- a/exec.c
+++ b/exec.c
@@ -2160,6 +2160,11 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 return 0;
 }
 
+if (!newsize) {
+error_setg_errno(errp, EINVAL, "Length cannot be 0: %s", block->idstr);
+return -EINVAL;
+}
+
 if (!qemu_ram_is_resizable(block)) {
 error_setg_errno(errp, EINVAL,
  "Length mismatch: %s: 0x" RAM_ADDR_FMT
-- 
2.24.1




[PATCH v2 16/16] kvm: Implement region_resize() for atomic memory section resizes

2020-02-12 Thread David Hildenbrand
virtio-mem wants to resize (esp. grow) memory regions while the guest is
already aware of them and makes use of them. Resizing a KVM slot can
currently only be done by removing it and re-adding it. While the KVM slot
is temporarily removed, VCPUs that try to read from it will fault.

Let's inhibit KVM_RUN while performing the resize. Keep it lightweight by
using one bool per VCPU to remember whether the VCPU is executing in the
kernel.

Note1: Instead of implementing region_resize(), we could also inhibit in
begin() and let the VCPUs continue to run in commit(). This would also
handle atomic splitting of memory regions. (I remember a BUG report but
cannot dig up the mail). However, using the region_resize() callback we
can later wire up an ioctl that can perform the resize atomically, and
make the inhibit conditional. Also, this way we inhibit KVM only when
resizing - not on any address space changes. This will not affect existing
RT workloads (resizes currently only happen during reboot or at the
start of an incoming migration).

Note2: We cannot use pause_all_vcpus()/resume_all_vcpus(), as it will
temporarily drop the BQL, which is something most callers cannot deal
with when trying to resize a memory region.
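
The handshake can be modelled outside of QEMU roughly as below. This is a
simplified sketch using plain pthreads and a counter under one mutex; the
patch itself uses a per-VCPU atomic flag and additionally kicks VCPUs out of
the kernel, which is not modelled here. All demo_* names are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_run_cond = PTHREAD_COND_INITIALIZER;  /* inhibit lifted */
static pthread_cond_t demo_idle_cond = PTHREAD_COND_INITIALIZER; /* a runner left */
static int demo_inhibited;  /* nesting count of active inhibitors */
static int demo_runners;    /* threads currently inside the "run" section */

/* Called by a runner before entering the protected section. */
static void demo_run_enter(void)
{
    pthread_mutex_lock(&demo_mutex);
    while (demo_inhibited > 0) {
        pthread_cond_wait(&demo_run_cond, &demo_mutex);
    }
    demo_runners++;
    pthread_mutex_unlock(&demo_mutex);
}

/* Called by a runner after leaving the protected section. */
static void demo_run_exit(void)
{
    pthread_mutex_lock(&demo_mutex);
    demo_runners--;
    pthread_cond_broadcast(&demo_idle_cond);
    pthread_mutex_unlock(&demo_mutex);
}

/* Block new runners and wait until all current runners have left. */
static void demo_inhibit_begin(void)
{
    pthread_mutex_lock(&demo_mutex);
    demo_inhibited++;
    while (demo_runners > 0) {
        pthread_cond_wait(&demo_idle_cond, &demo_mutex);
    }
    pthread_mutex_unlock(&demo_mutex);
}

/* Allow runners to enter again. */
static void demo_inhibit_end(void)
{
    pthread_mutex_lock(&demo_mutex);
    demo_inhibited--;
    pthread_cond_broadcast(&demo_run_cond);
    pthread_mutex_unlock(&demo_mutex);
}
```

The resize itself (remove the old slot, add the new one) would sit between
demo_inhibit_begin() and demo_inhibit_end().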

Signed-off-by: David Hildenbrand 
---
 accel/kvm/kvm-all.c   | 87 +++
 include/hw/core/cpu.h |  3 ++
 2 files changed, 90 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c111312dfd..e24805771c 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -148,6 +148,10 @@ bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
 static bool kvm_immediate_exit;
 static hwaddr kvm_max_slot_size = ~0;
+static QemuMutex kvm_run_mutex;
+static QemuCond kvm_run_cond;
+static QemuCond kvm_run_inhibit_cond;
+static int kvm_run_inhibited;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
 KVM_CAP_INFO(USER_MEMORY),
@@ -1121,6 +1125,57 @@ static void kvm_region_del(MemoryListener *listener,
 memory_region_unref(section->mr);
 }
 
+/*
+ * Certain updates (e.g., resizing memory regions) require temporarily removing
+ * kvm memory slots. Prevent any VCPU from faulting by making sure all VCPUs
+ * have left KVM_RUN and won't enter it again until unblocked.
+ */
+static void kvm_run_inhibit_begin(void)
+{
+CPUState *cpu;
+
+atomic_inc(&kvm_run_inhibited);
+while (true) {
+bool any_in_kernel = false;
+
+CPU_FOREACH(cpu) {
+if (atomic_read(&cpu->in_kernel)) {
+any_in_kernel = true;
+qemu_cpu_kick(cpu);
+}
+}
+if (!any_in_kernel) {
+break;
+}
+qemu_mutex_lock(&kvm_run_mutex);
+qemu_cond_wait(&kvm_run_inhibit_cond, &kvm_run_mutex);
+qemu_mutex_unlock(&kvm_run_mutex);
+}
+}
+
+static void kvm_run_inhibit_end(void)
+{
+atomic_dec(&kvm_run_inhibited);
+qemu_mutex_lock(&kvm_run_mutex);
+qemu_cond_broadcast(&kvm_run_cond);
+qemu_mutex_unlock(&kvm_run_mutex);
+}
+
+static void kvm_region_resize(MemoryListener *listener,
+  MemoryRegionSection *section, Int128 new)
+{
+KVMMemoryListener *kml = container_of(listener, KVMMemoryListener, listener);
+MemoryRegionSection new_section = *section;
+
+new_section.size = new;
+
+/* Inhibit KVM while we temporarily remove slots. */
+kvm_run_inhibit_begin();
+kvm_set_phys_mem(kml, section, false);
+kvm_set_phys_mem(kml, &new_section, true);
+kvm_run_inhibit_end();
+}
+
 static void kvm_log_sync(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -1239,6 +1294,7 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
 
 kml->listener.region_add = kvm_region_add;
 kml->listener.region_del = kvm_region_del;
+kml->listener.region_resize = kvm_region_resize;
 kml->listener.log_start = kvm_log_start;
 kml->listener.log_stop = kvm_log_stop;
 kml->listener.log_sync = kvm_log_sync;
@@ -1884,6 +1940,9 @@ static int kvm_init(MachineState *ms)
 assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
 
 s->sigmask_len = 8;
+qemu_mutex_init(&kvm_run_mutex);
+qemu_cond_init(&kvm_run_cond);
+qemu_cond_init(&kvm_run_inhibit_cond);
 
 #ifdef KVM_CAP_SET_GUEST_DEBUG
 QTAILQ_INIT(&s->kvm_sw_breakpoints);
@@ -2294,6 +2353,29 @@ static void kvm_eat_signals(CPUState *cpu)
 } while (sigismember(&chkset, SIG_IPI));
 }
 
+static void kvm_set_cpu_in_kernel(CPUState *cpu, bool in_kernel)
+{
+atomic_set(&cpu->in_kernel, in_kernel);
+if (in_kernel) {
+/* wait until KVM_RUN is no longer inhibited */
+while (unlikely(atomic_read(&kvm_run_inhibited))) {
+atomic_set(&cpu->in_kernel, false);
+qemu_mutex_lock(&kvm_run_mutex);
+qemu_cond_broadcast(&kvm_run_inhibit_cond);
+qemu_cond_wait(&kvm_run_cond, &kvm_run_mutex);
+qemu_mut

[PATCH v2 10/16] hostmem: Factor out applying settings

2020-02-12 Thread David Hildenbrand
We want to reuse the functionality when resizing resizable memory
regions.

Signed-off-by: David Hildenbrand 
---
 backends/hostmem.c | 137 +
 1 file changed, 76 insertions(+), 61 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index e773bdfa6e..2c8e4567e1 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -308,15 +308,85 @@ size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
 }
 #endif
 
+static void host_memory_backend_apply_settings(HostMemoryBackend *backend,
+   Error **errp)
+{
+const uint64_t sz = memory_region_size(&backend->mr);
+void *ptr = memory_region_get_ram_ptr(&backend->mr);
+MachineState *ms = MACHINE(qdev_get_machine());
+Error *local_err = NULL;
+
+if (backend->merge) {
+qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
+}
+if (!backend->dump) {
+qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
+}
+#ifdef CONFIG_NUMA
+   unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
+   /* lastbit == MAX_NODES means maxnode = 0 */
+   unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
+   /*
+* Ensure policy won't be ignored in case memory is preallocated before
+* mbind(). note: MPOL_MF_STRICT is ignored on hugepages so this doesn't
+* catch hugepage case.
+*/
+   unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+
+   /*
+* Check for invalid host-nodes and policies and give more verbose error
+* messages than mbind().
+*/
+   if (maxnode && backend->policy == MPOL_DEFAULT) {
+   error_setg(errp, "host-nodes must be empty for policy default,"
+  " or you should explicitly specify a policy other"
+  " than default");
+   return;
+   } else if (maxnode == 0 && backend->policy != MPOL_DEFAULT) {
+   error_setg(errp, "host-nodes must be set for policy %s",
+  HostMemPolicy_str(backend->policy));
+   return;
+   }
+
+   /*
+* We can have up to MAX_NODES nodes, but we need to pass maxnode+1 as
+* argument to mbind() due to an old Linux bug (feature?) which cuts off the
+* last specified node. This means backend->host_nodes must have MAX_NODES+1
+* bits available.
+*/
+   assert(sizeof(backend->host_nodes) >=
+  BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
+   assert(maxnode <= MAX_NODES);
+   if (mbind(ptr, sz, backend->policy,
+ maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
+   if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
+   error_setg_errno(errp, errno,
+"cannot bind memory to host NUMA nodes");
+   return;
+   }
+   }
+#endif
+/*
+ * Preallocate memory after the NUMA policy has been instantiated. This is
+ * necessary to guarantee memory is allocated with specified NUMA policy
+ * in place.
+ */
+if (backend->prealloc) {
+os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
+ms->smp.cpus, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+
 static void
 host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 {
 HostMemoryBackend *backend = MEMORY_BACKEND(uc);
 HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
-MachineState *ms = MACHINE(qdev_get_machine());
 Error *local_err = NULL;
-void *ptr;
-uint64_t sz;
 
 if (bc->alloc) {
 bc->alloc(backend, &local_err);
@@ -324,64 +394,9 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 goto out;
 }
 
-ptr = memory_region_get_ram_ptr(&backend->mr);
-sz = memory_region_size(&backend->mr);
-
-if (backend->merge) {
-qemu_madvise(ptr, sz, QEMU_MADV_MERGEABLE);
-}
-if (!backend->dump) {
-qemu_madvise(ptr, sz, QEMU_MADV_DONTDUMP);
-}
-#ifdef CONFIG_NUMA
-unsigned long lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
-/* lastbit == MAX_NODES means maxnode = 0 */
-unsigned long maxnode = (lastbit + 1) % (MAX_NODES + 1);
-/* ensure policy won't be ignored in case memory is preallocated
- * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
- * this doesn't catch hugepage case. */
-unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
-
-/* check for invalid host-nodes and policies and give more verbose
- * error messages than mbind(). */
-if (maxnode && backend->policy == MPOL_DEFAULT) {
-error_setg(errp, "host-nodes must be empty for policy default,"
-   " or you should explicitly specify a policy other"
-   " than default");
-return;
-} else if (maxnode == 0 && backend->policy != MPOL_DEFAUL

[PATCH v2 05/16] pc: Support for virtio-mem-pci

2020-02-12 Thread David Hildenbrand
Signed-off-by: David Hildenbrand 
---
 hw/i386/Kconfig |  1 +
 hw/i386/pc.c| 42 --
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index cdc851598c..e8ce582edd 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -35,6 +35,7 @@ config PC
 select ACPI_PCI
 select ACPI_VMGENID
 select VIRTIO_PMEM_SUPPORTED
+select VIRTIO_MEM_SUPPORTED
 
 config PC_PCI
 bool
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2ddce4230a..ed8850f31d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -85,6 +85,7 @@
 #include "hw/net/ne2000-isa.h"
 #include "standard-headers/asm-x86/bootparam.h"
 #include "hw/virtio/virtio-pmem-pci.h"
+#include "hw/virtio/virtio-mem-pci.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "qapi/qmp/qerror.h"
@@ -1648,8 +1649,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 numa_cpu_pre_plug(cpu_slot, dev, errp);
 }
 
-static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev,
-DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_pre_plug(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
 {
 HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
 Error *local_err = NULL;
@@ -1660,7 +1661,7 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev,
  * order. This should never be the case on x86, however better add
  * a safety net.
  */
-error_setg(errp, "virtio-pmem-pci not supported on this bus.");
+error_setg(errp, "virtio based memory devices not supported on this bus.");
 return;
 }
 /*
@@ -1675,8 +1676,8 @@ static void pc_virtio_pmem_pci_pre_plug(HotplugHandler *hotplug_dev,
 error_propagate(errp, local_err);
 }
 
-static void pc_virtio_pmem_pci_plug(HotplugHandler *hotplug_dev,
-DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_plug(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
 {
 HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
 Error *local_err = NULL;
@@ -1694,15 +1695,15 @@ static void pc_virtio_pmem_pci_plug(HotplugHandler *hotplug_dev,
 error_propagate(errp, local_err);
 }
 
-static void pc_virtio_pmem_pci_unplug_request(HotplugHandler *hotplug_dev,
-  DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_unplug_request(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
 {
 /* We don't support virtio pmem hot unplug */
 error_setg(errp, "virtio pmem device unplug not supported.");
 }
 
-static void pc_virtio_pmem_pci_unplug(HotplugHandler *hotplug_dev,
-  DeviceState *dev, Error **errp)
+static void pc_virtio_md_pci_unplug(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
 {
 /* We don't support virtio pmem hot unplug */
 }
@@ -1714,8 +1715,9 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
 pc_memory_pre_plug(hotplug_dev, dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 pc_cpu_pre_plug(hotplug_dev, dev, errp);
-} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-pc_virtio_pmem_pci_pre_plug(hotplug_dev, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+   object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
 }
 }
 
@@ -1726,8 +1728,9 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
 pc_memory_plug(hotplug_dev, dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 pc_cpu_plug(hotplug_dev, dev, errp);
-} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-pc_virtio_pmem_pci_plug(hotplug_dev, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+   object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+pc_virtio_md_pci_plug(hotplug_dev, dev, errp);
 }
 }
 
@@ -1738,8 +1741,9 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 pc_memory_unplug_request(hotplug_dev, dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 pc_cpu_unplug_request_cb(hotplug_dev, dev, errp);
-} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
-pc_virtio_pmem_pci_unplug_request(hotplug_dev, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+   object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
+pc_virtio_md_pci_unplug_request(hotp

[PATCH v2 12/16] hostmem: Introduce "managed-size" for memory-backend-ram

2020-02-12 Thread David Hildenbrand
virtio-mem wants to make use of resizable memory regions. Allow the user
to create them by specifying "managed-size".

Disallow setting "managed-size" with "prealloc" and "shared". The latter
might theoretically be possible, but has to be wired up internally
first.

Support memory-backend-ram only for now. Support for other backends
(especially hugepages) can be added later (e.g., once virtio-mem
also supports hugepages).

When the memory region gets resized, apply the same settings as when
allocating the memory.

Fence off all such memory backends in all existing users. We'll
convert virtio-mem soon.
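
To make the property interaction concrete, a small sketch of mutually
exclusive setters follows. DemoBackendProps and the demo_set_*() helpers are
hypothetical; checking in both directions (also rejecting "managed-size" when
"prealloc" or "share" is already set) is an assumption of this sketch, not
something this mail confirms for every setter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical property bag; not the QEMU HostMemoryBackend. */
typedef struct {
    bool managed_size;
    bool prealloc;
    bool share;
} DemoBackendProps;

/* 'prealloc' cannot be enabled on a backend with a managed size. */
static int demo_set_prealloc(DemoBackendProps *p, bool value)
{
    if (value && p->managed_size) {
        return -EINVAL;
    }
    p->prealloc = value;
    return 0;
}

/* 'share' cannot be enabled on a backend with a managed size. */
static int demo_set_share(DemoBackendProps *p, bool value)
{
    if (value && p->managed_size) {
        return -EINVAL;
    }
    p->share = value;
    return 0;
}

/* Assumed symmetric check: refuse 'managed-size' if either is already set. */
static int demo_set_managed_size(DemoBackendProps *p, bool value)
{
    if (value && (p->prealloc || p->share)) {
        return -EINVAL;
    }
    p->managed_size = value;
    return 0;
}
```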

Signed-off-by: David Hildenbrand 
---
 backends/hostmem-ram.c   | 18 --
 backends/hostmem.c   | 72 ++--
 hw/mem/pc-dimm.c |  3 +-
 hw/misc/ivshmem.c|  2 +-
 hw/virtio/virtio-mem.c   |  2 +-
 hw/virtio/virtio-pmem.c  |  2 +-
 include/sysemu/hostmem.h |  8 +++--
 7 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index 6aab8d3a73..881276cf6b 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -29,8 +29,21 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 }
 
 name = host_memory_backend_get_name(backend);
-memory_region_init_ram_shared_nomigrate(&backend->mr, OBJECT(backend), name,
-   backend->size, backend->share, errp);
+if (backend->managed_size) {
+/*
+ * The size of a memory region must always be > 0 - start with 1. The
+ * managing object/device will resize accordingly.
+ */
+g_assert(!backend->share);
+memory_region_init_resizeable_ram(&backend->mr, OBJECT(backend), name,
+  1, backend->size,
+  host_memory_backend_resized,
+  errp);
+} else {
+memory_region_init_ram_shared_nomigrate(&backend->mr, OBJECT(backend),
+name, backend->size,
+backend->share, errp);
+}
 g_free(name);
 }
 
@@ -40,6 +53,7 @@ ram_backend_class_init(ObjectClass *oc, void *data)
 HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
 
 bc->alloc = ram_backend_memory_alloc;
+bc->managed_size_supported = true;
 }
 
 static const TypeInfo ram_backend_info = {
diff --git a/backends/hostmem.c b/backends/hostmem.c
index de37f1bf5d..c3c453753a 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -238,7 +238,10 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
 return;
 }
 
-if (value && !backend->prealloc) {
+if (value && backend->managed_size) {
+error_setg(errp, "'prealloc' is not compatible with 'managed-size'");
+return;
+} else if (value && !backend->prealloc) {
 int fd = memory_region_get_fd(&backend->mr);
 void *ptr = memory_region_get_ram_ptr(&backend->mr);
 uint64_t sz = memory_region_size(&backend->mr);
@@ -292,7 +295,8 @@ bool host_memory_backend_is_mapped(HostMemoryBackend *backend)
 }
 
 void host_memory_backend_validate(HostMemoryBackend *backend,
-  const char *property, Error **errp)
+  const char *property,
+  bool managed_size_support, Error **errp)
 {
 char *path = object_get_canonical_path_component(OBJECT(backend));
 
@@ -301,6 +305,10 @@ void host_memory_backend_validate(HostMemoryBackend *backend,
 } else if (host_memory_backend_is_mapped(backend)) {
 error_setg(errp, "'%s' property specifies a busy memdev: %s",
property, path);
+} else if (backend->managed_size && !managed_size_support) {
+error_setg(errp,
+   "'%s' property does not support a memdev with a managed size: %s",
+   property, path);
 }
 g_free(path);
 }
@@ -395,6 +403,24 @@ static void host_memory_backend_apply_settings(HostMemoryBackend *backend,
 }
 }
 
+void host_memory_backend_resized(Object *owner, const char *idstr,
+ uint64_t size, void *host)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(owner);
+Error *local_err = NULL;
+
+/*
+ * Just apply the settings for all (resized) memory again. Note that
+ * "shared" and "prealloc" are currently not compatible with resizable
+ * memory regions ("managed-size"). Warn only.
+ */
+host_memory_backend_apply_settings(backend, &local_err);
+if (local_err) {
+ warn_report_err(local_err);
+ local_err = NULL;
+}
+}
+
 static void
 host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
 {
@@ -441,6 +467,9 @@ static void host_memory_backend_set_share(Object *o, bool value, Error **errp)
 if (host_memory_bac

[PATCH v2 09/16] memory-device: properly deal with resizable memory regions

2020-02-12 Thread David Hildenbrand
In case we are dealing with resizable memory regions, we always have to
assign space in the physical address space which can fit the maximum
region size.
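
A toy address allocator shows why the maximum size matters: if only the
current size were reserved, the next device would be placed inside the range
the first one may later grow into. DemoRegion, demo_align_up() and
demo_reserve() are hypothetical helpers, and the alignment is assumed to be a
power of two.

```c
#include <stdint.h>

/* Hypothetical model of a resizable region; not the QEMU MemoryRegion. */
typedef struct {
    uint64_t size;      /* currently used size */
    uint64_t max_size;  /* size the region may grow to */
} DemoRegion;

/* Round addr up to the next multiple of align (align: power of two). */
static uint64_t demo_align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Reserve address space for a region starting at *next_free.
 * Returns the assigned start address and advances *next_free past the
 * maximum size, so later growth cannot collide with the next region.
 */
static uint64_t demo_reserve(uint64_t *next_free, const DemoRegion *mr,
                             uint64_t align)
{
    uint64_t start = demo_align_up(*next_free, align);

    *next_free = start + mr->max_size;  /* not mr->size */
    return start;
}
```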

Signed-off-by: David Hildenbrand 
---
 hw/mem/memory-device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 4bc9cf0917..32d0c5d334 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -269,7 +269,7 @@ void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
 align = legacy_align ? *legacy_align : memory_region_get_alignment(mr);
 addr = mdc->get_addr(md);
 addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
-   memory_region_size(mr), &local_err);
+   memory_region_max_size(mr), &local_err);
 if (local_err) {
 goto out;
 }
@@ -329,7 +329,7 @@ uint64_t memory_device_get_region_size(const MemoryDeviceState *md,
 return 0;
 }
 
-return memory_region_size(mr);
+return memory_region_max_size(mr);
 }
 
 static const TypeInfo memory_device_info = {
-- 
2.24.1




[PATCH v2 15/16] memory: Add region_resize() callback to memory notifier

2020-02-12 Thread David Hildenbrand
Let's provide a way for memory notifiers to get notified about a resize.
If the region_resize() callback is not implemented by a notifier, we
mimic the old behavior by removing the old section and adding the
new, resized section.

The existing code would remove all sections first and then add the new
ones. When resizing, we will now remove+re-add in a single shot. As we
grow in the adding phase and shrink in the removal phase, this should
not make a difference.

This callback is helpful when backends (like KVM) want to implement
atomic resizes of memory sections (e.g., resize while VCPUs are running and
using the section).

Note 1: Resizing while changing logging is unlikely, but nothing speaks
against allowing it.
Note 2: Resizing MMIO regions is unlikely (coalesced io handling), but
nothing speaks against it.
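
The fallback described above can be sketched in isolation as follows. The
DemoListener and DemoSection types and the counting callbacks are
hypothetical and only exist to show the dispatch: a listener that implements
the resize callback gets exactly one call, any other listener sees a removal
of the old section followed by an addition of the resized one.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t start;
    uint64_t size;
} DemoSection;

typedef struct DemoListener DemoListener;
struct DemoListener {
    void (*region_add)(DemoListener *l, const DemoSection *s);
    void (*region_del)(DemoListener *l, const DemoSection *s);
    void (*region_resize)(DemoListener *l, const DemoSection *old,
                          uint64_t new_size);
    int adds, dels, resizes;    /* bookkeeping for the demo only */
};

static void demo_count_add(DemoListener *l, const DemoSection *s)
{
    (void)s;
    l->adds++;
}

static void demo_count_del(DemoListener *l, const DemoSection *s)
{
    (void)s;
    l->dels++;
}

static void demo_count_resize(DemoListener *l, const DemoSection *old,
                              uint64_t new_size)
{
    (void)old;
    (void)new_size;
    l->resizes++;
}

/* Notify one listener about a resize, falling back to del + add. */
static void demo_notify_resize(DemoListener *l, const DemoSection *old,
                               uint64_t new_size)
{
    DemoSection resized = *old;

    resized.size = new_size;
    if (l->region_resize) {
        l->region_resize(l, old, new_size);
        return;
    }
    if (l->region_del) {
        l->region_del(l, old);
    }
    if (l->region_add) {
        l->region_add(l, &resized);
    }
}
```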

Signed-off-by: David Hildenbrand 
---
 include/exec/memory.h | 19 ++
 memory.c  | 85 ---
 2 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index dfedd88f13..1ec5432340 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -493,6 +493,25 @@ struct MemoryListener {
  */
 void (*region_nop)(MemoryListener *listener, MemoryRegionSection *section);
 
+/**
+ * @region_resize:
+ *
+ * Called during an address space update transaction,
+ * for a section of the address space that is in the same place in the
+ * address space as in the last transaction, however, the size changed.
+ * Dirty memory logging can change as well.
+ *
+ * Note: If this callback is not implemented, the resize is
+ *   communicated via a region_del(), followed by a region_add()
+ *   instead.
+ *
+ * @listener: The #MemoryListener.
+ * @section: The old #MemoryRegionSection.
+ * @new: The new size.
+ */
+void (*region_resize)(MemoryListener *listener,
+  MemoryRegionSection *section, Int128 new);
+
 /**
  * @log_start:
  *
diff --git a/memory.c b/memory.c
index 5c62702618..0d9fe189ad 100644
--- a/memory.c
+++ b/memory.c
@@ -246,6 +246,17 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b)
 && a->nonvolatile == b->nonvolatile;
 }
 
+static bool flatrange_resized(FlatRange *a, FlatRange *b)
+{
+return a->mr == b->mr
+&& int128_eq(a->addr.start, b->addr.start)
+&& int128_ne(a->addr.size, b->addr.size)
+&& a->offset_in_region == b->offset_in_region
+&& a->romd_mode == b->romd_mode
+&& a->readonly == b->readonly
+&& a->nonvolatile == b->nonvolatile;
+}
+
 static FlatView *flatview_new(MemoryRegion *mr_root)
 {
 FlatView *view;
@@ -875,6 +886,51 @@ static void flat_range_coalesced_io_add(FlatRange *fr, AddressSpace *as)
 }
 }
 
+static void memory_listener_resize_region(FlatRange *fr, AddressSpace *as,
+  enum ListenerDirection dir,
+  Int128 new)
+{
+FlatView *as_view = address_space_to_flatview(as);
+MemoryRegionSection old_mrs = section_from_flat_range(fr, as_view);
+MemoryRegionSection new_mrs = old_mrs;
+MemoryListener *listener;
+
+new_mrs.size = new;
+
+switch (dir) {
+case Forward:
+QTAILQ_FOREACH(listener, &as->listeners, link_as) {
+if (listener->region_resize) {
+listener->region_resize(listener, &old_mrs, new);
+continue;
+}
+if (listener->region_del) {
+listener->region_del(listener, &old_mrs);
+}
+if (listener->region_add) {
+listener->region_add(listener, &new_mrs);
+}
+}
+break;
+case Reverse:
+QTAILQ_FOREACH_REVERSE(listener, &as->listeners, link_as) {
+if (listener->region_resize) {
+listener->region_resize(listener, &old_mrs, new);
+continue;
+}
+if (listener->region_del) {
+listener->region_del(listener, &old_mrs);
+}
+if (listener->region_add) {
+listener->region_add(listener, &new_mrs);
+}
+}
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 static void address_space_update_topology_pass(AddressSpace *as,
const FlatView *old_view,
const FlatView *new_view,
@@ -899,11 +955,30 @@ static void address_space_update_topology_pass(AddressSpace *as,
 frnew = NULL;
 }
 
-if (frold
-&& (!frnew
-|| int128_lt(frold->addr.start, frnew->addr.start)
-|| (int128_eq(frold->addr.start, frnew->addr.start)
-&& !flatrange_equal(frold, frnew)))) {
+if (frold && 

[PATCH v2 06/16] exec: Provide owner when resizing memory region

2020-02-12 Thread David Hildenbrand
Let's pass the owner in the callback. While touching it, introduce a
typedef for the callback.

Signed-off-by: David Hildenbrand 
---
 exec.c  | 13 +
 hw/core/loader.c|  3 ++-
 include/exec/memory.h   |  7 ---
 include/exec/ram_addr.h |  4 +---
 include/exec/ramblock.h |  3 ++-
 memory.c|  4 +---
 6 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/exec.c b/exec.c
index 71e32dcc11..5bc9b231c4 100644
--- a/exec.c
+++ b/exec.c
@@ -2193,7 +2193,8 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
 
 memory_region_set_size(block->mr, newsize);
 if (block->resized) {
-block->resized(block->idstr, newsize, block->host);
+block->resized(memory_region_owner(block->mr), block->idstr, newsize,
+   block->host);
 }
 
 /*
@@ -2476,9 +2477,7 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
 
 static
 RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
-  void (*resized)(const char*,
-  uint64_t length,
-  void *host),
+  memory_region_resized_fn resized,
   void *host, bool resizeable, bool share,
   MemoryRegion *mr, Error **errp)
 {
@@ -2529,10 +2528,8 @@ RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share,
 }
 
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
- void (*resized)(const char*,
- uint64_t length,
- void *host),
- MemoryRegion *mr, Error **errp)
+memory_region_resized_fn resized,
+MemoryRegion *mr, Error **errp)
 {
 return qemu_ram_alloc_internal(size, maxsz, resized, NULL, true,
false, mr, errp);
diff --git a/hw/core/loader.c b/hw/core/loader.c
index d1b78f60cd..59fb1620f1 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -912,7 +912,8 @@ static void rom_insert(Rom *rom)
 QTAILQ_INSERT_TAIL(&roms, rom, next);
 }
 
-static void fw_cfg_resized(const char *id, uint64_t length, void *host)
+static void fw_cfg_resized(Object *owner, const char *id, uint64_t length,
+   void *host)
 {
 if (fw_cfg) {
 fw_cfg_modify_file(fw_cfg, id + strlen("/rom@"), host, length);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 19417943a2..9f02bb7830 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -846,6 +846,9 @@ void memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
  bool share,
  Error **errp);
 
+typedef void (*memory_region_resized_fn)(Object *owner, const char*id,
+ uint64_t length, void *host);
+
 /**
  * memory_region_init_resizeable_ram:  Initialize memory region with resizeable
  * RAM.  Accesses into the region will
@@ -870,9 +873,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
const char *name,
uint64_t size,
uint64_t max_size,
-   void (*resized)(const char*,
-   uint64_t length,
-   void *host),
+   memory_region_resized_fn resized,
Error **errp);
 #ifdef CONFIG_POSIX
 
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 5e59a3d8d7..0ee3126361 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -128,9 +128,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 RAMBlock *qemu_ram_alloc(ram_addr_t size, bool share, MemoryRegion *mr,
  Error **errp);
 RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t max_size,
-void (*resized)(const char*,
-uint64_t length,
-void *host),
+memory_region_resized_fn resized,
 MemoryRegion *mr, Error **errp);
 void qemu_ram_free(RAMBlock *block);
 
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 07d50864d8..437b8f82ea 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -21,6 +21,7 @@
 
 #ifndef CONFIG_USER_ONLY
 #include "cpu-common.h"
+#include "exe

[PATCH v2 04/16] numa: Handle virtio-mem in NUMA stats

2020-02-12 Thread David Hildenbrand
Account the memory to the configured node.

Signed-off-by: David Hildenbrand 
---
 hw/core/numa.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 601cf9f603..4deb27ebee 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -855,10 +855,11 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 {
 MemoryDeviceInfoList *info_list = qmp_memory_device_list();
 MemoryDeviceInfoList *info;
-PCDIMMDeviceInfo *pcdimm_info;
 VirtioPMEMDeviceInfo *vpi;
+VirtioMEMDeviceInfo *vmi;
 
 for (info = info_list; info; info = info->next) {
+PCDIMMDeviceInfo *pcdimm_info = NULL;
 MemoryDeviceInfo *value = info->value;
 
 if (value) {
@@ -877,6 +878,11 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 node_mem[0].node_mem += vpi->size;
 node_mem[0].node_plugged_mem += vpi->size;
 break;
+case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM:
+vmi = value->u.virtio_mem.data;
+node_mem[vmi->node].node_mem += vmi->size;
+node_mem[vmi->node].node_plugged_mem += vmi->size;
+break;
 default:
 g_assert_not_reached();
 }
-- 
2.24.1




[PATCH v2 01/16] virtio-mem: Prototype

2020-02-12 Thread David Hildenbrand
Signed-off-by: David Hildenbrand 
---
 hw/virtio/Kconfig  |  11 +
 hw/virtio/Makefile.objs|   1 +
 hw/virtio/virtio-mem.c | 805 +
 include/hw/virtio/virtio-mem.h |  83 
 qapi/misc.json |  39 +-
 5 files changed, 938 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/virtio-mem.c
 create mode 100644 include/hw/virtio/virtio-mem.h

diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
index f87def27a6..638fe120b1 100644
--- a/hw/virtio/Kconfig
+++ b/hw/virtio/Kconfig
@@ -42,3 +42,14 @@ config VIRTIO_PMEM
 depends on VIRTIO
 depends on VIRTIO_PMEM_SUPPORTED
 select MEM_DEVICE
+
+config VIRTIO_MEM_SUPPORTED
+bool
+
+config VIRTIO_MEM
+bool
+default y
+depends on VIRTIO
+depends on LINUX
+depends on VIRTIO_MEM_SUPPORTED
+select MEM_DEVICE
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index de0f5fc39b..3ed94c84d7 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -17,6 +17,7 @@ obj-$(CONFIG_VIRTIO_PMEM) += virtio-pmem.o
 common-obj-$(call land,$(CONFIG_VIRTIO_PMEM),$(CONFIG_VIRTIO_PCI)) += virtio-pmem-pci.o
 obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) += vhost-user-fs-pci.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
+obj-$(CONFIG_VIRTIO_MEM) += virtio-mem.o
 
 ifeq ($(CONFIG_VIRTIO_PCI),y)
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock-pci.o
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
new file mode 100644
index 00..2f759578fe
--- /dev/null
+++ b/hw/virtio/virtio-mem.c
@@ -0,0 +1,805 @@
+/*
+ * Virtio MEM device
+ *
+ * Copyright (C) 2018-2019 Red Hat, Inc.
+ *
+ * Authors:
+ *  David Hildenbrand 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qemu/units.h"
+#include "sysemu/kvm.h"
+#include "sysemu/numa.h"
+#include "sysemu/balloon.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/reset.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-mem.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "exec/ram_addr.h"
+#include "migration/postcopy-ram.h"
+#include "migration/misc.h"
+#include "hw/boards.h"
+#include "hw/qdev-properties.h"
+#include "config-devices.h"
+
+/*
+ * Use QEMU_VMALLOC_ALIGN, so no THP will have to be split when unplugging
+ * memory.
+ */
+#define VIRTIO_MEM_DEFAULT_BLOCK_SIZE QEMU_VMALLOC_ALIGN
+#define VIRTIO_MEM_MIN_BLOCK_SIZE QEMU_VMALLOC_ALIGN
+/*
+ * Size the usable region slightly bigger than the requested size if
+ * possible. This allows guests to make use of most requested memory even
+ * if the memory region in guest physical memory has strange alignment.
+ * E.g. x86-64 has alignment requirements for sections of 128 MiB.
+ */
+#define VIRTIO_MEM_USABLE_EXTENT (256 * MiB)
+
+static bool virtio_mem_busy(void)
+{
+/*
+ * Better don't mess with dumps and migration - especially when
+ * resizing memory regions. Also, RDMA migration pins all memory.
+ */
+if (!migration_is_idle()) {
+return true;
+}
+if (dump_in_progress()) {
+return true;
+}
+/*
+ * We can't use madvise(DONTNEED) e.g. with certain VFIO devices,
+ * also resizing memory regions might be problematic. Bad thing is,
+ * this might change suddenly, e.g. when hotplugging a VFIO device.
+ */
+if (qemu_balloon_is_inhibited()) {
+return true;
+}
+return false;
+}
+
+static bool virtio_mem_test_bitmap(VirtIOMEM *vm, uint64_t start_gpa,
+   uint64_t size, bool plug)
+{
+uint64_t bit = (start_gpa - vm->addr) / vm->block_size;
+
+g_assert(QEMU_IS_ALIGNED(start_gpa, vm->block_size));
+g_assert(QEMU_IS_ALIGNED(size, vm->block_size));
+g_assert(vm->bitmap);
+
+while (size) {
+g_assert((bit / BITS_PER_BYTE) <= vm->bitmap_size);
+
+if (plug && !test_bit(bit, vm->bitmap)) {
+return false;
+} else if (!plug && test_bit(bit, vm->bitmap)) {
+return false;
+}
+size -= vm->block_size;
+bit++;
+}
+return true;
+}
+
+static void virtio_mem_set_bitmap(VirtIOMEM *vm, uint64_t start_gpa,
+  uint64_t size, bool plug)
+{
+const uint64_t bit = (start_gpa - vm->addr) / vm->block_size;
+const uint64_t nbits = size / vm->block_size;
+
+g_assert(QEMU_IS_ALIGNED(start_gpa, vm->block_size));
+g_assert(QEMU_IS_ALIGNED(size, vm->block_size));
+g_assert(vm->bitmap);
+
+if (plug) {
+bitmap_set(vm->bitmap, bit, nbits);
+} else {
+bitmap_clear(vm->bitmap, bit, nbits);
+}
+}
+
+static void virtio_mem_set_block_state(VirtIOMEM *vm, uint64_t start_gpa,
+   

[PATCH v2 03/16] hmp: Handle virtio-mem when printing memory device infos

2020-02-12 Thread David Hildenbrand
Print the memory device info just like other memory devices.

Signed-off-by: David Hildenbrand 
---
 monitor/hmp-cmds.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 558fe06b8f..798aead52e 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -2542,6 +2542,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
 MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err);
 MemoryDeviceInfoList *info;
 VirtioPMEMDeviceInfo *vpi;
+VirtioMEMDeviceInfo *vmi;
 MemoryDeviceInfo *value;
 PCDIMMDeviceInfo *di;
 
@@ -2576,6 +2577,21 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "  size: %" PRIu64 "\n", vpi->size);
 monitor_printf(mon, "  memdev: %s\n", vpi->memdev);
 break;
+case MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM:
+vmi = value->u.virtio_mem.data;
+monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
+   MemoryDeviceInfoKind_str(value->type),
+   vmi->id ? vmi->id : "");
+monitor_printf(mon, "  memaddr: 0x%" PRIx64 "\n", vmi->memaddr);
+monitor_printf(mon, "  node: %" PRId64 "\n", vmi->node);
+monitor_printf(mon, "  requested-size: %" PRIu64 "\n",
+   vmi->requested_size);
+monitor_printf(mon, "  size: %" PRIu64 "\n", vmi->size);
+monitor_printf(mon, "  max-size: %" PRIu64 "\n", vmi->max_size);
+monitor_printf(mon, "  block-size: %" PRIu64 "\n",
+   vmi->block_size);
+monitor_printf(mon, "  memdev: %s\n", vmi->memdev);
+break;
 default:
 g_assert_not_reached();
 }
-- 
2.24.1




[PATCH v2 00/16] Ram blocks with resizable anonymous allocations under POSIX

2020-02-12 Thread David Hildenbrand
We already allow resizable ram blocks for anonymous memory, however, they
are not actually resized. All memory is mmaped() R/W, including the memory
exceeding the used_length, up to the max_length.

When resizing, effectively only the boundary is moved. Implement actually
resizable anonymous allocations and make use of them in resizable ram
blocks when possible. Memory exceeding the used_length will be
inaccessible. Especially ram block notifiers require care.

Having actually resizable anonymous allocations (via mmap-hackery) allows
to reserve a big region in virtual address space and grow the
accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory"
is set to "never" under Linux, huge reservations will succeed. If there is
not enough memory when resizing (to populate parts of the reserved region),
trying to resize will fail. Only the actually used size is reserved in the
OS.
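
The reserve-then-grow idea described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from this series: the names region_reserve() and region_resize() are hypothetical, sizes are assumed to be multiples of the page size, and MAP_NORESERVE is assumed to be available (Linux).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve max_size bytes of address space; nothing is accessible yet. */
void *region_reserve(size_t max_size)
{
    void *p = mmap(NULL, max_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Move the accessible boundary of a reserved region from old_size to
 * new_size. Returns 0 on success and -1 if the kernel refuses the request.
 */
int region_resize(void *base, size_t old_size, size_t new_size)
{
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *b = base;
    void *p;

    assert(old_size % page == 0 && new_size % page == 0);

    if (new_size > old_size) {
        /* Grow: overlay part of the reservation with usable R/W memory. */
        p = mmap(b + old_size, new_size - old_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        return p == MAP_FAILED ? -1 : 0;
    }
    if (new_size < old_size) {
        /* Shrink: turn the tail back into an inaccessible reservation. */
        p = mmap(b + new_size, old_size - new_size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                 -1, 0);
        return p == MAP_FAILED ? -1 : 0;
    }
    return 0;
}
```

A caller would reserve the maximum size once and then call region_resize() whenever the used length changes; touching memory beyond the current boundary faults instead of silently working.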

E.g., virtio-mem [1] wants to reserve big resizable memory regions and
grow the usable part on demand. I think this change is worth sending out
individually. Accompanied by a bunch of minor fixes and cleanups.

Especially, memory notifiers already handle resizing by first removing
the old region, and then re-adding the resized region. prealloc is
currently not possible with resizable ram blocks. mlock() should continue
to work as is. Resizing is currently rare and must only happen on the
start of an incoming migration, or during resets. No code path (except
HAX and SEV ram block notifiers) should access memory outside of the usable
range - and if we ever find one, that one has to be fixed (I did not
identify any).
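
The "remove the old region, then re-add the resized region" behaviour can be illustrated with a toy dispatcher. This is a simplified sketch, not QEMU's MemoryListener API: Section, Listener, notify_resize() and the count_* callbacks are hypothetical stand-ins, and the optional resize hook only mirrors the idea of the region_resize callback in the memory_listener_resize_region() hunk quoted earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a memory section and a listener. */
typedef struct Section {
    uint64_t start;
    uint64_t size;
} Section;

typedef struct Listener Listener;
struct Listener {
    void (*region_add)(Listener *l, const Section *s);
    void (*region_del)(Listener *l, const Section *s);
    /* Optional: handle a size change in place. */
    void (*region_resize)(Listener *l, const Section *old, uint64_t new_size);
    uint64_t mapped;    /* bytes currently mapped, for illustration only */
    unsigned calls;     /* number of callbacks invoked so far */
};

/* Notify one listener that a section changed its size. */
void notify_resize(Listener *l, const Section *old, uint64_t new_size)
{
    Section resized;

    assert(l != NULL && old != NULL);
    resized = *old;
    resized.size = new_size;

    if (l->region_resize) {
        l->region_resize(l, old, new_size);
        return;
    }
    /* Fallback: remove the old section, then add the resized one. */
    if (l->region_del) {
        l->region_del(l, old);
    }
    if (l->region_add) {
        l->region_add(l, &resized);
    }
}

/* Example callbacks that only do bookkeeping. */
void count_add(Listener *l, const Section *s)
{
    l->mapped += s->size;
    l->calls++;
}

void count_del(Listener *l, const Section *s)
{
    l->mapped -= s->size;
    l->calls++;
}

void count_resize(Listener *l, const Section *old, uint64_t new_size)
{
    l->mapped = l->mapped - old->size + new_size;
    l->calls++;
}
```

Both paths end in the same state; the fallback simply costs two callbacks instead of one and briefly leaves the section unmapped.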

v1 -> v2:
- Add "util: vfio-helpers: Fix qemu_vfio_close()"
- Add "util: vfio-helpers: Remove Error parameter from
   qemu_vfio_undo_mapping()"
- Add "util: vfio-helpers: Factor out removal from
   qemu_vfio_undo_mapping()"
- "util/mmap-alloc: ..."
 -- Minor changes due to review feedback (e.g., assert alignment, return
bool when resizing)
- "util: vfio-helpers: Implement ram_block_resized()"
 -- Reserve max_size in the IOVA address space.
 -- On resize, undo old mapping and do new mapping. We can later implement
a new ioctl to resize the mapping directly.
- "numa: Teach ram block notifiers about resizable ram blocks"
 -- Pass size/max_size to ram block notifiers, which makes things easier and
cleaner
- "exec: Ram blocks with resizable anonymous allocations under POSIX"
 -- Adapt to new ram block notifiers
 -- Shrink after notifying. Always trigger ram block notifiers on resizes
 -- Add a safety net that all ram block notifiers registered at runtime
support resizes.

[1] https://lore.kernel.org/kvm/20191212171137.13872-1-da...@redhat.com/

David Hildenbrand (16):
  util: vfio-helpers: Factor out and fix processing of existing ram
blocks
  util: vfio-helpers: Fix qemu_vfio_close()
  util: vfio-helpers: Remove Error parameter from
qemu_vfio_undo_mapping()
  util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping()
  exec: Factor out setting ram settings (madvise ...) into
qemu_ram_apply_settings()
  exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap()
  exec: Drop "shared" parameter from ram_block_add()
  util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize()
  util/mmap-alloc: Factor out reserving of a memory region to
mmap_reserve()
  util/mmap-alloc: Factor out populating of memory to mmap_populate()
  util/mmap-alloc: Prepare for resizable mmaps
  util/mmap-alloc: Implement resizable mmaps
  numa: Teach ram block notifiers about resizable ram blocks
  util: vfio-helpers: Implement ram_block_resized()
  util: oslib: Resizable anonymous allocations under POSIX
  exec: Ram blocks with resizable anonymous allocations under POSIX

 exec.c | 104 +++
 hw/core/numa.c |  53 +++-
 hw/i386/xen/xen-mapcache.c |   7 +-
 include/exec/cpu-common.h  |   3 +
 include/exec/memory.h  |   8 ++
 include/exec/ramlist.h |  14 +++-
 include/qemu/mmap-alloc.h  |  21 +++--
 include/qemu/osdep.h   |   6 +-
 stubs/ram-block.c  |  20 -
 target/i386/hax-mem.c  |   5 +-
 target/i386/sev.c  |  18 ++--
 util/mmap-alloc.c  | 165 +++--
 util/oslib-posix.c |  37 -
 util/oslib-win32.c |  14 
 util/trace-events  |   9 +-
 util/vfio-helpers.c| 145 +---
 16 files changed, 450 insertions(+), 179 deletions(-)

-- 
2.24.1




[PATCH v2 07/16] memory: Add memory_region_max_size() and memory_region_is_resizable()

2020-02-12 Thread David Hildenbrand
We want to pass resizable memory regions to devices that can deal
with them (and automatically resize them). Allow them to easily
identify if a region can be resized and what the maximum size is.

Add both functions, adding qemu_ram_is_resizable() as a helper.
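
As a rough illustration of how a device could query these properties, here is a self-contained sketch. It does not use QEMU's Int128 or MemoryRegion types: Size128, Region and the region_* helpers are hypothetical, and the resizable flag is stored directly here, whereas the patch derives it from the ram block. The one detail taken from the patch is that a 64-bit size of UINT64_MAX stands for 2^64 internally and is converted back on query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit size: value = hi * 2^64 + lo. */
typedef struct Size128 {
    uint64_t lo;
    uint64_t hi;
} Size128;

typedef struct Region {
    Size128 size;
    Size128 max_size;
    bool resizable;     /* assumption: kept as a flag in this sketch */
} Region;

/* UINT64_MAX is treated as "the whole 64-bit space", i.e. 2^64. */
Size128 size128_from_u64(uint64_t v)
{
    if (v == UINT64_MAX) {
        return (Size128){ .lo = 0, .hi = 1 };
    }
    return (Size128){ .lo = v, .hi = 0 };
}

uint64_t size128_to_u64(Size128 s)
{
    if (s.hi == 1 && s.lo == 0) {
        return UINT64_MAX;
    }
    assert(s.hi == 0);
    return s.lo;
}

/* A fixed-size region: its maximum size equals its size. */
void region_init(Region *r, uint64_t size)
{
    r->size = size128_from_u64(size);
    r->max_size = r->size;
    r->resizable = false;
}

/* A resizable region carries a separate maximum size. */
void region_init_resizable(Region *r, uint64_t size, uint64_t max_size)
{
    region_init(r, size);
    r->max_size = size128_from_u64(max_size);
    r->resizable = true;
}

uint64_t region_max_size(const Region *r)
{
    return size128_to_u64(r->max_size);
}

bool region_is_resizable(const Region *r)
{
    return r->resizable;
}
```

A device that accepts resizable regions would check region_is_resizable() first and then size its own bookkeeping for region_max_size() rather than for the current size.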

Signed-off-by: David Hildenbrand 
---
 include/exec/memory.h | 17 +
 memory.c  | 18 ++
 2 files changed, 35 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 9f02bb7830..dfedd88f13 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -395,6 +395,7 @@ struct MemoryRegion {
 void *opaque;
 MemoryRegion *container;
 Int128 size;
+Int128 max_size;
 hwaddr addr;
 void (*destructor)(MemoryRegion *mr);
 uint64_t align;
@@ -1180,6 +1181,13 @@ struct Object *memory_region_owner(MemoryRegion *mr);
  */
 uint64_t memory_region_size(MemoryRegion *mr);
 
+/**
+ * memory_region_max_size: get a memory region's maximum size.
+ *
+ * @mr: the memory region being queried.
+ */
+uint64_t memory_region_max_size(MemoryRegion *mr);
+
 /**
  * memory_region_is_ram: check whether a memory region is random access
  *
@@ -1471,6 +1479,15 @@ MemoryRegion *memory_region_from_host(void *ptr, ram_addr_t *offset);
  */
 void *memory_region_get_ram_ptr(MemoryRegion *mr);
 
+/**
+ * memory_region_is_resizable: check whether a memory region is resizable
+ *
+ * Returns %true if a memory region is resizable.
+ *
+ * @mr: the memory region being queried
+ */
+bool memory_region_is_resizable(MemoryRegion *mr);
+
 /* memory_region_ram_resize: Resize a RAM region.
  *
  * Only legal before guest might have detected the memory size: e.g. on
diff --git a/memory.c b/memory.c
index cb09a8ee59..5c62702618 100644
--- a/memory.c
+++ b/memory.c
@@ -1130,6 +1130,7 @@ static void memory_region_do_init(MemoryRegion *mr,
 if (size == UINT64_MAX) {
 mr->size = int128_2_64();
 }
+mr->max_size = mr->size;
 mr->name = g_strdup(name);
 mr->owner = owner;
 mr->ram_block = NULL;
@@ -1540,6 +1541,10 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
 {
 Error *err = NULL;
 memory_region_init(mr, owner, name, size);
+mr->max_size = int128_make64(max_size);
+if (max_size == UINT64_MAX) {
+mr->max_size = int128_2_64();
+}
 mr->ram = true;
 mr->terminates = true;
 mr->destructor = memory_region_destructor_ram;
@@ -1779,6 +1784,14 @@ uint64_t memory_region_size(MemoryRegion *mr)
 return int128_get64(mr->size);
 }
 
+uint64_t memory_region_max_size(MemoryRegion *mr)
+{
+if (int128_eq(mr->max_size, int128_2_64())) {
+return UINT64_MAX;
+}
+return int128_get64(mr->max_size);
+}
+
 const char *memory_region_name(const MemoryRegion *mr)
 {
 if (!mr->name) {
@@ -2198,6 +2211,11 @@ ram_addr_t memory_region_get_ram_addr(MemoryRegion *mr)
 return mr->ram_block ? mr->ram_block->offset : RAM_ADDR_INVALID;
 }
 
+bool memory_region_is_resizable(MemoryRegion *mr)
+{
+return mr->ram_block && qemu_ram_is_resizable(mr->ram_block);
+}
+
 void memory_region_ram_resize(MemoryRegion *mr, ram_addr_t newsize, Error **errp)
 {
 assert(mr->ram_block);
-- 
2.24.1




[PATCH v2 02/16] virtio-pci: Proxy for virtio-mem

2020-02-12 Thread David Hildenbrand
Signed-off-by: David Hildenbrand 
---
 hw/virtio/Makefile.objs|   1 +
 hw/virtio/virtio-mem-pci.c | 136 +
 hw/virtio/virtio-mem-pci.h |  33 +
 include/hw/pci/pci.h   |   1 +
 4 files changed, 171 insertions(+)
 create mode 100644 hw/virtio/virtio-mem-pci.c
 create mode 100644 hw/virtio/virtio-mem-pci.h

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 3ed94c84d7..3f8a281d36 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -18,6 +18,7 @@ common-obj-$(call land,$(CONFIG_VIRTIO_PMEM),$(CONFIG_VIRTIO_PCI)) += virtio-pme
 obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) += vhost-user-fs-pci.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
 obj-$(CONFIG_VIRTIO_MEM) += virtio-mem.o
+common-obj-$(call land,$(CONFIG_VIRTIO_MEM),$(CONFIG_VIRTIO_PCI)) += virtio-mem-pci.o
 
 ifeq ($(CONFIG_VIRTIO_PCI),y)
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock-pci.o
diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
new file mode 100644
index 00..d3a2c99492
--- /dev/null
+++ b/hw/virtio/virtio-mem-pci.c
@@ -0,0 +1,136 @@
+/*
+ * Virtio MEM PCI device
+ *
+ * Copyright (C) 2018-2019 Red Hat, Inc.
+ *
+ * Authors:
+ *  David Hildenbrand 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "virtio-mem-pci.h"
+#include "hw/mem/memory-device.h"
+#include "qapi/error.h"
+
+static void virtio_mem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+VirtIOMEMPCI *mem_pci = VIRTIO_MEM_PCI(vpci_dev);
+DeviceState *vdev = DEVICE(&mem_pci->vdev);
+
+qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
+object_property_set_bool(OBJECT(vdev), true, "realized", errp);
+}
+
+static void virtio_mem_pci_set_addr(MemoryDeviceState *md, uint64_t addr,
+ Error **errp)
+{
+object_property_set_uint(OBJECT(md), addr, VIRTIO_MEM_ADDR_PROP, errp);
+}
+
+static uint64_t virtio_mem_pci_get_addr(const MemoryDeviceState *md)
+{
+return object_property_get_uint(OBJECT(md), VIRTIO_MEM_ADDR_PROP,
+&error_abort);
+}
+
+static MemoryRegion *virtio_mem_pci_get_memory_region(MemoryDeviceState *md,
+  Error **errp)
+{
+VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+VirtIOMEM *vmem = VIRTIO_MEM(&pci_mem->vdev);
+VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem);
+
+return vmc->get_memory_region(vmem, errp);
+}
+
+static uint64_t virtio_mem_pci_get_plugged_size(const MemoryDeviceState *md,
+ Error **errp)
+{
+VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+VirtIOMEM *mem = VIRTIO_MEM(&pci_mem->vdev);
+VirtIOMEMClass *vpc = VIRTIO_MEM_GET_CLASS(mem);
+MemoryRegion *mr = vpc->get_memory_region(mem, errp);
+
+/* the plugged size corresponds to the region size */
+return mr ? memory_region_size(mr) : 0;
+}
+
+static void virtio_mem_pci_fill_device_info(const MemoryDeviceState *md,
+ MemoryDeviceInfo *info)
+{
+VirtioMEMDeviceInfo *vi = g_new0(VirtioMEMDeviceInfo, 1);
+VirtIOMEMPCI *pci_mem = VIRTIO_MEM_PCI(md);
+VirtIOMEM *mem = VIRTIO_MEM(&pci_mem->vdev);
+VirtIOMEMClass *vpc = VIRTIO_MEM_GET_CLASS(mem);
+DeviceState *dev = DEVICE(md);
+
+if (dev->id) {
+vi->has_id = true;
+vi->id = g_strdup(dev->id);
+}
+
+/* let the real device handle everything else */
+vpc->fill_device_info(mem, vi);
+
+info->u.virtio_mem.data = vi;
+info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
+}
+
+static void virtio_mem_pci_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
+
+k->realize = virtio_mem_pci_realize;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_MEM;
+pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+pcidev_k->class_id = PCI_CLASS_OTHERS;
+
+mdc->get_addr = virtio_mem_pci_get_addr;
+mdc->set_addr = virtio_mem_pci_set_addr;
+mdc->get_plugged_size = virtio_mem_pci_get_plugged_size;
+mdc->get_memory_region = virtio_mem_pci_get_memory_region;
+mdc->fill_device_info = virtio_mem_pci_fill_device_info;
+}
+
+static void virtio_mem_pci_instance_init(Object *obj)
+{
+VirtIOMEMPCI *dev = VIRTIO_MEM_PCI(obj);
+
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_MEM);
+object_property_add_alias(obj, VIRTIO_MEM_BLOCK_SIZE_PROP,
+  OBJECT(&dev->vdev),
+  VIRTIO_MEM_

Re: [PATCH] nbd-client: Support leading / in NBD URI

2020-02-12 Thread Ján Tomko

On Tue, Feb 11, 2020 at 08:31:01PM -0600, Eric Blake wrote:

The NBD URI specification [1] states that only one leading slash at
the beginning of the URI path component is stripped, not all such
slashes.  This becomes important to a patch I just proposed to nbdkit
[2], which would allow the exportname to select a file embedded within
an ext2 image: ext2fs demands an absolute pathname beginning with '/',
and because qemu was inadvertently stripping it, my nbdkit patch had
to work around the behavior.

[1] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md
[2] https://www.redhat.com/archives/libguestfs/2020-February/msg00109.html

Note that the qemu bug only affects handling of URIs such as
nbd://host:port//abs/path (where '/abs/path' should be the export
name); it is still possible to use --image-opts and pass the desired
export name with a leading slash directly through JSON even without
this patch.
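
To make the rule concrete, here is a minimal sketch of "strip exactly one leading slash". It is not the code from block/nbd.c; nbd_export_from_path() is a hypothetical helper that takes the already-parsed URI path component.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Map a URI path component to an NBD export name: drop at most one
 * leading '/', and keep any further slashes as part of the name.
 */
const char *nbd_export_from_path(const char *path)
{
    assert(path != NULL);
    return path[0] == '/' ? path + 1 : path;
}
```

With this rule the path of nbd://host:port//abs/path, which is "//abs/path", yields the export name "/abs/path", while a path of "/export" yields "export".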

Signed-off-by: Eric Blake 
---
block/nbd.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature


Re: [PATCH v30 00/22] Add RX archtecture support

2020-02-12 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200212130311.127515-1-ys...@users.sourceforge.jp/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v30 00/22] Add RX archtecture support
Message-id: 20200212130311.127515-1-ys...@users.sourceforge.jp
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag] patchew/20200212130311.127515-1-ys...@users.sourceforge.jp -> patchew/20200212130311.127515-1-ys...@users.sourceforge.jp
Switched to a new branch 'test'
f4d6eaf qemu-doc.texi: Add RX section.
0b8d847 BootLinuxConsoleTest: Test the RX-Virt machine
c557e26 Add rx-softmmu
43b988e hw/rx: Restrict the RX62N microcontroller to the RX62N CPU core
ff21308 hw/rx: Honor -accel qtest
8c88664 hw/rx: RX Target hardware definition
1b2fe5f hw/char: RX62N serial communication interface (SCI)
b2f8fa7 hw/timer: RX62N internal timer modules
ae62029 hw/intc: RX62N interrupt controller (ICUa)
765ce42 target/rx: Dump bytes for each insn during disassembly
a2a0a4b target/rx: Collect all bytes during disassembly
b746289 target/rx: Emit all disassembly in one prt()
c90e743 target/rx: Use prt_ldmi for XCHG_mr disassembly
89b0c41 target/rx: Replace operand with prt_ldmi in disassembler
c5849aa target/rx: Disassemble rx_index_addr into a string
88a4745 target/rx: RX disassembler
def33df target/rx: CPU definition
ec76660 target/rx: TCG helper
65fc0c5 target/rx: TCG translation
c225a52 hw/registerfields.h: Add 8bit and 16bit register macros
1b8ec2b qemu/bitops.h: Add extract8 and extract16
4292f08 MAINTAINERS: Add RX

=== OUTPUT BEGIN ===
1/22 Checking commit 4292f083c3fa (MAINTAINERS: Add RX)
2/22 Checking commit 1b8ec2b1e39d (qemu/bitops.h: Add extract8 and extract16)
3/22 Checking commit c225a5286644 (hw/registerfields.h: Add 8bit and 16bit register macros)
Use of uninitialized value in concatenation (.) or string at ./scripts/checkpatch.pl line 2490.
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#27: FILE: include/hw/registerfields.h:25:
+#define REG8(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) };

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#31: FILE: include/hw/registerfields.h:29:
+#define REG16(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) / 2 };

total: 2 errors, 0 warnings, 56 lines checked

Patch 3/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/22 Checking commit 65fc0c5eab98 (target/rx: TCG translation)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#20: 
new file mode 100644

total: 0 errors, 1 warnings, 3065 lines checked

Patch 4/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/22 Checking commit ec766608772b (target/rx: TCG helper)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21: 
new file mode 100644

total: 0 errors, 1 warnings, 650 lines checked

Patch 5/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/22 Checking commit def33dfddaf5 (target/rx: CPU definition)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#22: 
new file mode 100644

total: 0 errors, 1 warnings, 659 lines checked

Patch 6/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/22 Checking commit 88a474516e8d (target/rx: RX disassembler)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#38: 
new file mode 100644

total: 0 errors, 1 warnings, 1497 lines checked

Patch 7/22 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
8/22 Checking commit c5849aab2ce9 (target/rx: Disassemble rx_index_addr into a string)
9/22 Checking commit 89b0c4140179 (target/rx: Replace operand with prt_ldmi in disassembler)
10/22 Checking commit c90e743d5f36 (target/rx: Use prt_ldmi for XCHG_mr disassembly)
11/22 Checking commit b74628914f4f (target/rx: Emit all disassembly in one prt())
12/22 Checking commit a2a0a4b84d0e (target/rx: Collect all bytes during disassembly)
13/22 Checking commit 765ce427f3f7 (target/rx: Dump bytes for each insn 

Re: [PATCH v5 01/26] nvme: rename trace events to nvme_dev

2020-02-12 Thread Maxim Levitsky
On Wed, 2020-02-12 at 14:08 +0100, Klaus Birkelund Jensen wrote:
> On Feb 12 11:08, Maxim Levitsky wrote:
> > On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > > Change the prefix of all nvme device related trace events to 'nvme_dev'
> > > to not clash with trace events from the nvme block driver.
> > > 
> 
> Hi Maxim,
> 
> Thank you very much for your thorough reviews! Utterly appreciated!

Thanks to you for the patch series!

> 
> I'll start going through your suggested changes. There is a bit of work
> to do on splitting patches into refactoring and bugfixes, but I can
> definitely see the reason for this, so I'll get to work.
> 
> You mention the alignment with split lines a lot. I actually thought I
> was following CODING_STYLE.rst (which allows a single 4 space indent for
> functions, but not statements such as if/else and while/for). But since
> hw/block/nvme.c is originally written in the style of aligning with the
> opening parenthesis I'm in the wrong here, so I will of course amend
> it. Should have done that from the beginning, it's just my personal
> taste shining through ;)

To be honest this is my personal taste as well, but after *many* review
complaints about this I consider aligning on the opening parenthesis
to be kind of an official style.

If others are OK with this though, I am personally 100% fine with leaving the
split lines as is.


Best regards,
Maxim Levitsky




Re: [PATCH v5 01/26] nvme: rename trace events to nvme_dev

2020-02-12 Thread Klaus Birkelund Jensen
On Feb 12 11:08, Maxim Levitsky wrote:
> On Tue, 2020-02-04 at 10:51 +0100, Klaus Jensen wrote:
> > Change the prefix of all nvme device related trace events to 'nvme_dev'
> > to not clash with trace events from the nvme block driver.
> > 

Hi Maxim,

Thank you very much for your thorough reviews! Utterly appreciated!

I'll start going through your suggested changes. There is a bit of work
to do on splitting patches into refactoring and bugfixes, but I can
definitely see the reason for this, so I'll get to work.

You mention the alignment with split lines a lot. I actually thought I
was following CODING_STYLE.rst (which allows a single 4 space indent for
functions, but not statements such as if/else and while/for). But since
hw/block/nvme.c is originally written in the style of aligning with the
opening parenthesis I'm in the wrong here, so I will of course amend
it. Should have done that from the beginning, it's just my personal
taste shining through ;)


Thanks again,
Klaus



[PATCH] docs: Fix virtiofsd.1 location

2020-02-12 Thread Miroslav Rezanina
Patch 6a7e2bbee5 ("docs: add virtiofsd(1) man page") introduced the new man
page virtiofsd.1. Unfortunately, the wrong file location is used as the
source for the install command, which causes installation of the docs to fail.

Fix the location so that installation succeeds.

Signed-off-by: Miroslav Rezanina 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index f0e1a2f..62a367d 100644
--- a/Makefile
+++ b/Makefile
@@ -865,7 +865,7 @@ ifdef CONFIG_VIRTFS
$(INSTALL_DATA) $(MANUAL_BUILDDIR)/interop/virtfs-proxy-helper.1 "$(DESTDIR)$(mandir)/man1"
 endif
 ifeq ($(CONFIG_LINUX)$(CONFIG_SECCOMP)$(CONFIG_LIBCAP_NG),yyy)
-   $(INSTALL_DATA) docs/interop/virtiofsd.1 "$(DESTDIR)$(mandir)/man1"
+   $(INSTALL_DATA) $(MANUAL_BUILDDIR)/interop/virtiofsd.1 "$(DESTDIR)$(mandir)/man1"
 endif

 install-datadir:
-- 
1.8.3.1





[PATCH v30 21/22] BootLinuxConsoleTest: Test the RX-Virt machine

2020-02-12 Thread Yoshinori Sato
From: Philippe Mathieu-Daudé 

Add two tests for the rx-virt machine, based on the recommended test
setup from Yoshinori Sato:
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03586.html

- U-Boot prompt
- Linux kernel with Sash shell

These are very quick tests:

  $ avocado run -t arch:rx tests/acceptance/boot_linux_console.py
  JOB ID : 84a6ef01c0b87975ecbfcb31a920afd735753ace
  JOB LOG: /home/phil/avocado/job-results/job-2019-05-24T05.02-84a6ef0/job.log
   (1/2) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_rx_uboot: PASS (0.11 s)
   (2/2) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_rx_linux: PASS (0.45 s)
  RESULTS: PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0

Tests can also be run with:

  $ avocado --show=console run -t arch:rx tests/acceptance/boot_linux_console.py
  console: U-Boot 2016.05-rc3-23705-ga1ef3c71cb-dirty (Feb 05 2019 - 21:56:06 +0900)
  console: Linux version 4.19.0+ (yo-satoh@yo-satoh-debian) (gcc version 9.0.0 20181105 (experimental) (GCC)) #137 Wed Feb 20 23:20:02 JST 2019
  console: Built 1 zonelists, mobility grouping on.  Total pages: 8128
  ...
  console: SuperH (H)SCI(F) driver initialized
  console: 88240.serial: ttySC0 at MMIO 0x88240 (irq = 215, base_baud = 0) is a sci
  console: console [ttySC0] enabled
  console: 88248.serial: ttySC1 at MMIO 0x88248 (irq = 219, base_baud = 0) is a sci

Signed-off-by: Philippe Mathieu-Daudé 
Based-on: 20190517045136.3509-1-richard.hender...@linaro.org
"RX architecture support"
Signed-off-by: Yoshinori Sato 
---
 tests/acceptance/boot_linux_console.py | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index 34d37eba3b..367cf480a5 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -686,3 +686,49 @@ class BootLinuxConsole(Test):
 tar_hash = '49e88d9933742f0164b60839886c9739cb7a0d34'
 self.vm.add_args('-cpu', 'dc233c')
 self.do_test_advcal_2018('02', tar_hash, 'santas-sleigh-ride.elf')
+
+def test_rx_uboot(self):
+"""
+:avocado: tags=arch:rx
+:avocado: tags=machine:rx-virt
+:avocado: tags=endian:little
+"""
+uboot_url = ('https://acc.dl.osdn.jp/users/23/23888/u-boot.bin.gz')
+uboot_hash = '9b78dbd43b40b2526848c0b1ce9de02c24f4dcdb'
+uboot_path = self.fetch_asset(uboot_url, asset_hash=uboot_hash)
+uboot_path = archive.uncompress(uboot_path, self.workdir)
+
+self.vm.set_machine('rx-virt')
+self.vm.set_console()
+self.vm.add_args('-bios', uboot_path,
+ '-no-reboot')
+self.vm.launch()
+uboot_version = 'U-Boot 2016.05-rc3-23705-ga1ef3c71cb-dirty'
+self.wait_for_console_pattern(uboot_version)
+gcc_version = 'rx-unknown-linux-gcc (GCC) 9.0.0 20181105 (experimental)'
+# FIXME limit baudrate on chardev, else we type too fast
+#self.exec_command_and_wait_for_pattern('version', gcc_version)
+
+def test_rx_linux(self):
+"""
+:avocado: tags=arch:rx
+:avocado: tags=machine:rx-virt
+:avocado: tags=endian:little
+"""
+dtb_url = ('https://acc.dl.osdn.jp/users/23/23887/rx-qemu.dtb')
+dtb_hash = '7b4e4e2c71905da44e86ce47adee2210b026ac18'
+dtb_path = self.fetch_asset(dtb_url, asset_hash=dtb_hash)
+kernel_url = ('http://acc.dl.osdn.jp/users/23/23845/zImage')
+kernel_hash = '39a81067f8d72faad90866ddfefa19165d68fc99'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+self.vm.set_machine('rx-virt')
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'earlycon'
+self.vm.add_args('-kernel', kernel_path,
+ '-dtb', dtb_path,
+ '-no-reboot')
+self.vm.launch()
+self.wait_for_console_pattern('Sash command shell (version 1.1.1)')
+self.exec_command_and_wait_for_pattern('printenv',
+   'TERM=linux')
-- 
2.20.1




[PATCH v30 02/22] qemu/bitops.h: Add extract8 and extract16

2020-02-12 Thread Yoshinori Sato
Signed-off-by: Yoshinori Sato 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-10-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/qemu/bitops.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 02c1ce6a5d..f55ce8b320 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -301,6 +301,44 @@ static inline uint32_t extract32(uint32_t value, int start, int length)
 return (value >> start) & (~0U >> (32 - length));
 }
 
+/**
+ * extract8:
+ * @value: the value to extract the bit field from
+ * @start: the lowest bit in the bit field (numbered from 0)
+ * @length: the length of the bit field
+ *
+ * Extract from the 8 bit input @value the bit field specified by the
+ * @start and @length parameters, and return it. The bit field must
+ * lie entirely within the 8 bit word. It is valid to request that
+ * all 8 bits are returned (ie @length 8 and @start 0).
+ *
+ * Returns: the value of the bit field extracted from the input value.
+ */
+static inline uint8_t extract8(uint8_t value, int start, int length)
+{
+assert(start >= 0 && length > 0 && length <= 8 - start);
+return extract32(value, start, length);
+}
+
+/**
+ * extract16:
+ * @value: the value to extract the bit field from
+ * @start: the lowest bit in the bit field (numbered from 0)
+ * @length: the length of the bit field
+ *
+ * Extract from the 16 bit input @value the bit field specified by the
+ * @start and @length parameters, and return it. The bit field must
+ * lie entirely within the 16 bit word. It is valid to request that
+ * all 16 bits are returned (ie @length 16 and @start 0).
+ *
+ * Returns: the value of the bit field extracted from the input value.
+ */
+static inline uint16_t extract16(uint16_t value, int start, int length)
+{
+assert(start >= 0 && length > 0 && length <= 16 - start);
+return extract32(value, start, length);
+}
+
 /**
  * extract64:
  * @value: the value to extract the bit field from
-- 
2.20.1
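
As an illustration of how the new helpers behave, the following standalone sketch reproduces them outside of QEMU. The `extract32()` body follows the diff context above; its assertion is an assumption modelled on the ones in `extract8()` and `extract16()`, and `static inline` is dropped so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QEMU's extract32(); the assert is assumed, not shown in the diff. */
uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Extract `length` bits starting at bit `start` from an 8 bit value. */
uint8_t extract8(uint8_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 8 - start);
    return extract32(value, start, length);
}

/* Extract `length` bits starting at bit `start` from a 16 bit value. */
uint16_t extract16(uint16_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 16 - start);
    return extract32(value, start, length);
}
```

For example, `extract8(0xA5, 4, 4)` yields the high nibble `0xA`, and `extract16(0xBEEF, 8, 8)` yields the high byte `0xBE`. The narrower assertions catch a field that would spill past bit 7 or bit 15, which `extract32()` alone would accept.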




[PATCH v30 09/22] target/rx: Replace operand with prt_ldmi in disassembler

2020-02-12 Thread Yoshinori Sato
From: Richard Henderson 

This is consistent with prt_ri(): it loads all data before
beginning output, and it uses exactly one call to prt() to emit
the full instruction.
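
A minimal sketch of the "decode first, then emit once" pattern, outside of QEMU. The function `format_ldmi()` is hypothetical: it takes the already-decoded displacement text as a parameter and writes into a caller-supplied buffer with a single `snprintf()` call, where the real code calls `rx_index_addr()` and `prt()`. The format strings are the ones used in the diff below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for prt_ldmi(): every input is known before
 * any output is produced, and the whole instruction is emitted by
 * exactly one formatting call. `dsp` is the decoded displacement
 * text ("" when there is none).
 */
int format_ldmi(char *out, size_t out_size, const char *insn,
                int ld, int mi, const char *dsp, int rs, int rd)
{
    static const char sizes[][4] = {".b", ".w", ".l", ".uw", ".ub"};

    if (ld < 3) {
        /* Memory source operand: dsp[rs].size, rd */
        return snprintf(out, out_size, "%s\t%s[r%d]%s, r%d",
                        insn, dsp, rs, sizes[mi], rd);
    }
    /* Register source operand: rs, rd */
    return snprintf(out, out_size, "%s\tr%d, r%d", insn, rs, rd);
}
```

Passing the mnemonic as an argument also removes the duplicated `prt("xxx\t")` calls at each call site, which is where a copy-and-paste mistake in the mnemonic can otherwise creep in.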

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Yoshinori Sato 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-20-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/disas.c | 77 +--
 1 file changed, 27 insertions(+), 50 deletions(-)

diff --git a/target/rx/disas.c b/target/rx/disas.c
index 64342537ee..515b365528 100644
--- a/target/rx/disas.c
+++ b/target/rx/disas.c
@@ -135,18 +135,18 @@ static void rx_index_addr(DisasContext *ctx, char out[8], int ld, int mi)
 sprintf(out, "%u", dsp << (mi < 3 ? mi : 4 - mi));
 }
 
-static void operand(DisasContext *ctx, int ld, int mi, int rs, int rd)
+static void prt_ldmi(DisasContext *ctx, const char *insn,
+ int ld, int mi, int rs, int rd)
 {
 static const char sizes[][4] = {".b", ".w", ".l", ".uw", ".ub"};
 char dsp[8];
 
 if (ld < 3) {
 rx_index_addr(ctx, dsp, ld, mi);
-prt("%s[r%d]%s", dsp, rs, sizes[mi]);
+prt("%s\t%s[r%d]%s, r%d", insn, dsp, rs, sizes[mi], rd);
 } else {
-prt("r%d", rs);
+prt("%s\tr%d, r%d", insn, rs, rd);
 }
-prt(", r%d", rd);
 }
 
 static void prt_ir(DisasContext *ctx, const char *insn, int imm, int rd)
@@ -416,8 +416,7 @@ static bool trans_AND_ir(DisasContext *ctx, arg_AND_ir *a)
 /* and rs,rd */
 static bool trans_AND_mr(DisasContext *ctx, arg_AND_mr *a)
 {
-prt("and\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "and", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -440,8 +439,7 @@ static bool trans_OR_ir(DisasContext *ctx, arg_OR_ir *a)
 /* or rs,rd */
 static bool trans_OR_mr(DisasContext *ctx, arg_OR_mr *a)
 {
-prt("or\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "or", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -463,8 +461,7 @@ static bool trans_XOR_ir(DisasContext *ctx, arg_XOR_ir *a)
 /* xor rs,rd */
 static bool trans_XOR_mr(DisasContext *ctx, arg_XOR_mr *a)
 {
-prt("xor\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "xor", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -479,8 +476,7 @@ static bool trans_TST_ir(DisasContext *ctx, arg_TST_ir *a)
 /* tst rs, rd */
 static bool trans_TST_mr(DisasContext *ctx, arg_TST_mr *a)
 {
-prt("tst\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "tst", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -548,8 +544,7 @@ static bool trans_ADD_irr(DisasContext *ctx, arg_ADD_irr *a)
 /* add dsp[rs], rd */
 static bool trans_ADD_mr(DisasContext *ctx, arg_ADD_mr *a)
 {
-prt("add\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "add", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -573,8 +568,7 @@ static bool trans_CMP_ir(DisasContext *ctx, arg_CMP_ir *a)
 /* cmp dsp[rs], rs2 */
 static bool trans_CMP_mr(DisasContext *ctx, arg_CMP_mr *a)
 {
-prt("cmp\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "cmp", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -589,8 +583,7 @@ static bool trans_SUB_ir(DisasContext *ctx, arg_SUB_ir *a)
 /* sub dsp[rs], rd */
 static bool trans_SUB_mr(DisasContext *ctx, arg_SUB_mr *a)
 {
-prt("sub\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "sub", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -611,8 +604,7 @@ static bool trans_SBB_rr(DisasContext *ctx, arg_SBB_rr *a)
 /* sbb dsp[rs], rd */
 static bool trans_SBB_mr(DisasContext *ctx, arg_SBB_mr *a)
 {
-prt("sbb\t");
-operand(ctx, a->ld, RX_IM_LONG, a->rs, a->rd);
+prt_ldmi(ctx, "sbb", a->ld, RX_IM_LONG, a->rs, a->rd);
 return true;
 }
 
@@ -640,8 +632,7 @@ static bool trans_MAX_ir(DisasContext *ctx, arg_MAX_ir *a)
 /* max dsp[rs], rd */
 static bool trans_MAX_mr(DisasContext *ctx, arg_MAX_mr *a)
 {
-prt("max\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "max", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -656,8 +647,7 @@ static bool trans_MIN_ir(DisasContext *ctx, arg_MIN_ir *a)
 /* min dsp[rs], rd */
 static bool trans_MIN_mr(DisasContext *ctx, arg_MIN_mr *a)
 {
-prt("max\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "min", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -673,8 +663,7 @@ static bool trans_MUL_ir(DisasContext *ctx, arg_MUL_ir *a)
 /* mul dsp[rs], rd */
 static bool trans_MUL_mr(DisasContext *ctx, arg_MUL_mr *a)
 {
-prt("mul\t");
-operand(ctx, a->ld, a->mi, a->rs, a->rd);
+prt_ldmi(ctx, "mul", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
@@ -696,8 +685,7 @@ static bool trans_EMUL_ir(DisasContext *ctx, arg_EMUL_ir *a)
 /* emul dsp[rs], rd */
 static bool trans_EMUL_mr(DisasContext *ctx

[PATCH v30 17/22] hw/rx: RX Target hardware definition

2020-02-12 Thread Yoshinori Sato
rx62n - RX62N cpu.
rx-virt - RX QEMU virtual target.

Signed-off-by: Yoshinori Sato 

Message-Id: <20190616142836.10614-17-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-9-ys...@users.sourceforge.jp>
Signed-off-by: Richard Henderson 
[PMD: Use TYPE_RX62N_CPU, use #define for RX62N_NR_TMR/CMT/SCI,
 renamed CPU -> MCU, device -> microcontroller]
Signed-off-by: Philippe Mathieu-Daudé 
---
v23 changes.
Add missing includes.

v21 changes.
rx_load_image move to rx-virt.c

v19: Fixed typo (Peter Maydell)
Signed-off-by: Yoshinori Sato 
---
 include/hw/rx/rx.h|   7 ++
 include/hw/rx/rx62n.h |  91 
 hw/rx/rx-virt.c   | 140 +
 hw/rx/rx62n.c | 239 ++
 hw/rx/Kconfig |  14 +++
 hw/rx/Makefile.objs   |   2 +
 6 files changed, 493 insertions(+)
 create mode 100644 include/hw/rx/rx.h
 create mode 100644 include/hw/rx/rx62n.h
 create mode 100644 hw/rx/rx-virt.c
 create mode 100644 hw/rx/rx62n.c
 create mode 100644 hw/rx/Kconfig
 create mode 100644 hw/rx/Makefile.objs

diff --git a/include/hw/rx/rx.h b/include/hw/rx/rx.h
new file mode 100644
index 00..ff5924b81f
--- /dev/null
+++ b/include/hw/rx/rx.h
@@ -0,0 +1,7 @@
+#ifndef QEMU_RX_H
+#define QEMU_RX_H
+/* Definitions for RX board emulation.  */
+
+#include "target/rx/cpu-qom.h"
+
+#endif
diff --git a/include/hw/rx/rx62n.h b/include/hw/rx/rx62n.h
new file mode 100644
index 00..97ea8ddb8e
--- /dev/null
+++ b/include/hw/rx/rx62n.h
@@ -0,0 +1,91 @@
+/*
+ * RX62N MCU Object
+ *
+ * Datasheet: RX62N Group, RX621 Group User's Manual: Hardware
+ * (Rev.1.40 R01UH0033EJ0140)
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#ifndef HW_RX_RX62N_H
+#define HW_RX_RX62N_H
+
+#include "hw/sysbus.h"
+#include "hw/intc/rx_icu.h"
+#include "hw/timer/renesas_tmr.h"
+#include "hw/timer/renesas_cmt.h"
+#include "hw/char/renesas_sci.h"
+#include "target/rx/cpu.h"
+#include "qemu/units.h"
+
+#define TYPE_RX62N "rx62n"
+#define RX62N(obj) OBJECT_CHECK(RX62NState, (obj), TYPE_RX62N)
+
+#define RX62N_NR_TMR 2
+#define RX62N_NR_CMT 2
+#define RX62N_NR_SCI 6
+
+typedef struct RX62NState {
+SysBusDevice parent_obj;
+
+RXCPU cpu;
+RXICUState icu;
+RTMRState tmr[RX62N_NR_TMR];
+RCMTState cmt[RX62N_NR_CMT];
+RSCIState sci[RX62N_NR_SCI];
+
+MemoryRegion *sysmem;
+bool kernel;
+
+MemoryRegion iram;
+MemoryRegion iomem1;
+MemoryRegion d_flash;
+MemoryRegion iomem2;
+MemoryRegion iomem3;
+MemoryRegion c_flash;
+qemu_irq irq[NR_IRQS];
+} RX62NState;
+
+/*
+ * RX62N Peripheral Address
+ * See users manual section 5
+ */
+#define RX62N_ICUBASE 0x00087000
+#define RX62N_TMRBASE 0x00088200
+#define RX62N_CMTBASE 0x00088000
+#define RX62N_SCIBASE 0x00088240
+
+/*
+ * RX62N Peripheral IRQ
+ * See users manual section 11
+ */
+#define RX62N_TMR_IRQBASE 174
+#define RX62N_CMT_IRQBASE 28
+#define RX62N_SCI_IRQBASE 214
+
+/*
+ * RX62N Internal Memory
+ * It is the value of R5F562N8.
+ * Please change the size for R5F562N7.
+ */
+#define RX62N_IRAM_BASE 0x
+#define RX62N_IRAM_SIZE (96 * KiB)
+#define RX62N_DFLASH_BASE 0x0010
+#define RX62N_DFLASH_SIZE (32 * KiB)
+#define RX62N_CFLASH_BASE 0xfff8
+#define RX62N_CFLASH_SIZE (512 * KiB)
+
+#define RX62N_PCLK (48 * 1000 * 1000)
+#endif
diff --git a/hw/rx/rx-virt.c b/hw/rx/rx-virt.c
new file mode 100644
index 00..017941b996
--- /dev/null
+++ b/hw/rx/rx-virt.c
@@ -0,0 +1,140 @@
+/*
+ * RX QEMU virtual platform
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "hw/hw.h"

[PATCH v30 12/22] target/rx: Collect all bytes during disassembly

2020-02-12 Thread Yoshinori Sato
From: Richard Henderson 

Collected, to be used in the next patch.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Yoshinori Sato 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-23-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/disas.c | 62 ---
 1 file changed, 42 insertions(+), 20 deletions(-)

diff --git a/target/rx/disas.c b/target/rx/disas.c
index ebc1a44249..5a32a87534 100644
--- a/target/rx/disas.c
+++ b/target/rx/disas.c
@@ -25,43 +25,59 @@ typedef struct DisasContext {
 disassemble_info *dis;
 uint32_t addr;
 uint32_t pc;
+uint8_t len;
+uint8_t bytes[8];
 } DisasContext;
 
 
 static uint32_t decode_load_bytes(DisasContext *ctx, uint32_t insn,
-   int i, int n)
+  int i, int n)
 {
-bfd_byte buf;
+uint32_t addr = ctx->addr;
+
+g_assert(ctx->len == i);
+g_assert(n <= ARRAY_SIZE(ctx->bytes));
+
 while (++i <= n) {
-ctx->dis->read_memory_func(ctx->addr++, &buf, 1, ctx->dis);
-insn |= buf << (32 - i * 8);
+ctx->dis->read_memory_func(addr++, &ctx->bytes[i - 1], 1, ctx->dis);
+insn |= ctx->bytes[i - 1] << (32 - i * 8);
 }
+ctx->addr = addr;
+ctx->len = n;
+
 return insn;
 }
 
 static int32_t li(DisasContext *ctx, int sz)
 {
-int32_t addr;
-bfd_byte buf[4];
-addr = ctx->addr;
+uint32_t addr = ctx->addr;
+uintptr_t len = ctx->len;
 
 switch (sz) {
 case 1:
+g_assert(len + 1 <= ARRAY_SIZE(ctx->bytes));
 ctx->addr += 1;
-ctx->dis->read_memory_func(addr, buf, 1, ctx->dis);
-return (int8_t)buf[0];
+ctx->len += 1;
+ctx->dis->read_memory_func(addr, ctx->bytes + len, 1, ctx->dis);
+return (int8_t)ctx->bytes[len];
 case 2:
+g_assert(len + 2 <= ARRAY_SIZE(ctx->bytes));
 ctx->addr += 2;
-ctx->dis->read_memory_func(addr, buf, 2, ctx->dis);
-return ldsw_le_p(buf);
+ctx->len += 2;
+ctx->dis->read_memory_func(addr, ctx->bytes + len, 2, ctx->dis);
+return ldsw_le_p(ctx->bytes + len);
 case 3:
+g_assert(len + 3 <= ARRAY_SIZE(ctx->bytes));
 ctx->addr += 3;
-ctx->dis->read_memory_func(addr, buf, 3, ctx->dis);
-return (int8_t)buf[2] << 16 | lduw_le_p(buf);
+ctx->len += 3;
+ctx->dis->read_memory_func(addr, ctx->bytes + len, 3, ctx->dis);
+return (int8_t)ctx->bytes[len + 2] << 16 | lduw_le_p(ctx->bytes + len);
 case 0:
+g_assert(len + 4 <= ARRAY_SIZE(ctx->bytes));
 ctx->addr += 4;
-ctx->dis->read_memory_func(addr, buf, 4, ctx->dis);
-return ldl_le_p(buf);
+ctx->len += 4;
+ctx->dis->read_memory_func(addr, ctx->bytes + len, 4, ctx->dis);
+return ldl_le_p(ctx->bytes + len);
 default:
 g_assert_not_reached();
 }
@@ -110,7 +126,7 @@ static const char psw[] = {
 static void rx_index_addr(DisasContext *ctx, char out[8], int ld, int mi)
 {
 uint32_t addr = ctx->addr;
-uint8_t buf[2];
+uintptr_t len = ctx->len;
 uint16_t dsp;
 
 switch (ld) {
@@ -119,14 +135,18 @@ static void rx_index_addr(DisasContext *ctx, char out[8], int ld, int mi)
 out[0] = '\0';
 return;
 case 1:
+g_assert(len + 1 <= ARRAY_SIZE(ctx->bytes));
 ctx->addr += 1;
-ctx->dis->read_memory_func(addr, buf, 1, ctx->dis);
-dsp = buf[0];
+ctx->len += 1;
+ctx->dis->read_memory_func(addr, ctx->bytes + len, 1, ctx->dis);
+dsp = ctx->bytes[len];
 break;
 case 2:
+g_assert(len + 2 <= ARRAY_SIZE(ctx->bytes));
 ctx->addr += 2;
-ctx->dis->read_memory_func(addr, buf, 2, ctx->dis);
-dsp = lduw_le_p(buf);
+ctx->len += 2;
+ctx->dis->read_memory_func(addr, ctx->bytes + len, 2, ctx->dis);
+dsp = lduw_le_p(ctx->bytes + len);
 break;
 default:
 g_assert_not_reached();
@@ -1392,8 +1412,10 @@ int print_insn_rx(bfd_vma addr, disassemble_info *dis)
 DisasContext ctx;
 uint32_t insn;
 int i;
+
 ctx.dis = dis;
 ctx.pc = ctx.addr = addr;
+ctx.len = 0;
 
 insn = decode_load(&ctx);
 if (!decode(&ctx, insn)) {
-- 
2.20.1




[PATCH v30 15/22] hw/timer: RX62N internal timer modules

2020-02-12 Thread Yoshinori Sato
renesas_tmr: 8bit timer modules.
renesas_cmt: 16bit compare match timer modules.
These modules are used by many Renesas CPUs.
Hardware manual:
https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01uh0033ej0140_rx62n.pdf
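
The register layout comment in renesas_cmt.c in this patch (CMSTR at +0, then CMCR, CMCNT and CMCOR for channel 0 at +2, +4, +6 and for channel 1 at +8, +10, +12) suggests treating channel 0 as if it were shifted by 2, so that both channels share the same per-register offsets. The sketch below is one possible way to decode an offset under that reading. All names in it are hypothetical, and it is not the code from the patch.

```c
#include <assert.h>

enum cmt_reg {
    CMT_REG_INVALID,
    CMT_REG_CMSTR,
    CMT_REG_CMCR,
    CMT_REG_CMCNT,
    CMT_REG_CMCOR
};

typedef struct CmtRegRef {
    int channel;        /* 0 or 1; -1 for CMSTR or an invalid offset */
    enum cmt_reg reg;
} CmtRegRef;

/* Map a byte offset within the CMT unit to (channel, register). */
CmtRegRef cmt_decode_offset(unsigned offset)
{
    CmtRegRef ref = { -1, CMT_REG_INVALID };

    if (offset == 0) {
        ref.reg = CMT_REG_CMSTR;    /* common control, no channel */
        return ref;
    }
    if (offset > 12 || (offset & 1)) {
        return ref;                 /* outside the 16 bit register map */
    }

    ref.channel = offset / 8;       /* +2..+6 -> ch0, +8..+12 -> ch1 */
    offset &= 7;
    if (ref.channel == 0) {
        offset -= 2;                /* undo the +2 shift of channel 0 */
    }

    switch (offset) {
    case 0:
        ref.reg = CMT_REG_CMCR;
        break;
    case 2:
        ref.reg = CMT_REG_CMCNT;
        break;
    case 4:
        ref.reg = CMT_REG_CMCOR;
        break;
    default:
        ref.channel = -1;           /* not reachable for valid offsets */
        break;
    }
    return ref;
}
```

With this mapping, offsets +2 and +8 both resolve to CMCR, for channel 0 and channel 1 respectively.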

Signed-off-by: Yoshinori Sato 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-7-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/hw/timer/renesas_cmt.h |  38 +++
 include/hw/timer/renesas_tmr.h |  53 
 hw/timer/renesas_cmt.c | 278 
 hw/timer/renesas_tmr.c | 458 +
 hw/timer/Kconfig   |   6 +
 hw/timer/Makefile.objs |   3 +
 6 files changed, 836 insertions(+)
 create mode 100644 include/hw/timer/renesas_cmt.h
 create mode 100644 include/hw/timer/renesas_tmr.h
 create mode 100644 hw/timer/renesas_cmt.c
 create mode 100644 hw/timer/renesas_tmr.c

diff --git a/include/hw/timer/renesas_cmt.h b/include/hw/timer/renesas_cmt.h
new file mode 100644
index 00..acd25c6e0b
--- /dev/null
+++ b/include/hw/timer/renesas_cmt.h
@@ -0,0 +1,38 @@
+/*
+ * Renesas Compare-match timer Object
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This code is licensed under the GPL version 2 or later.
+ *
+ */
+
+#ifndef HW_RENESAS_CMT_H
+#define HW_RENESAS_CMT_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_RENESAS_CMT "renesas-cmt"
+#define RCMT(obj) OBJECT_CHECK(RCMTState, (obj), TYPE_RENESAS_CMT)
+
+enum {
+CMT_CH = 2,
+CMT_NR_IRQ = 1 * CMT_CH,
+};
+
+typedef struct RCMTState {
+SysBusDevice parent_obj;
+
+uint64_t input_freq;
+MemoryRegion memory;
+
+uint16_t cmstr;
+uint16_t cmcr[CMT_CH];
+uint16_t cmcnt[CMT_CH];
+uint16_t cmcor[CMT_CH];
+int64_t tick[CMT_CH];
+qemu_irq cmi[CMT_CH];
+QEMUTimer *timer[CMT_CH];
+} RCMTState;
+
+#endif
diff --git a/include/hw/timer/renesas_tmr.h b/include/hw/timer/renesas_tmr.h
new file mode 100644
index 00..5787004c74
--- /dev/null
+++ b/include/hw/timer/renesas_tmr.h
@@ -0,0 +1,53 @@
+/*
+ * Renesas 8bit timer Object
+ *
+ * Copyright (c) 2018 Yoshinori Sato
+ *
+ * This code is licensed under the GPL version 2 or later.
+ *
+ */
+
+#ifndef HW_RENESAS_TMR_H
+#define HW_RENESAS_TMR_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_RENESAS_TMR "renesas-tmr"
+#define RTMR(obj) OBJECT_CHECK(RTMRState, (obj), TYPE_RENESAS_TMR)
+
+enum timer_event {
+cmia = 0,
+cmib = 1,
+ovi = 2,
+none = 3,
+TMR_NR_EVENTS = 4
+};
+
+enum {
+TMR_CH = 2,
+TMR_NR_IRQ = 3 * TMR_CH,
+};
+
+typedef struct RTMRState {
+SysBusDevice parent_obj;
+
+uint64_t input_freq;
+MemoryRegion memory;
+
+uint8_t tcnt[TMR_CH];
+uint8_t tcora[TMR_CH];
+uint8_t tcorb[TMR_CH];
+uint8_t tcr[TMR_CH];
+uint8_t tccr[TMR_CH];
+uint8_t tcor[TMR_CH];
+uint8_t tcsr[TMR_CH];
+int64_t tick;
+int64_t div_round[TMR_CH];
+enum timer_event next[TMR_CH];
+qemu_irq cmia[TMR_CH];
+qemu_irq cmib[TMR_CH];
+qemu_irq ovi[TMR_CH];
+QEMUTimer *timer[TMR_CH];
+} RTMRState;
+
+#endif
diff --git a/hw/timer/renesas_cmt.c b/hw/timer/renesas_cmt.c
new file mode 100644
index 00..574772b89b
--- /dev/null
+++ b/hw/timer/renesas_cmt.c
@@ -0,0 +1,278 @@
+/*
+ * Renesas 16bit Compare-match timer
+ *
+ * Datasheet: RX62N Group, RX621 Group User's Manual: Hardware
+ * (Rev.1.40 R01UH0033EJ0140)
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "qemu/timer.h"
+#include "cpu.h"
+#include "hw/hw.h"
+#include "hw/irq.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/qdev-properties.h"
+#include "hw/timer/renesas_cmt.h"
+#include "migration/vmstate.h"
+#include "qemu/error-report.h"
+
+/*
+ *  +0 CMSTR - common control
+ *  +2 CMCR  - ch0
+ *  +4 CMCNT - ch0
+ *  +6 CMCOR - ch0
+ *  +8 CMCR  - ch1
+ * +10 CMCNT - ch1
+ * +12 CMCOR - ch1
+ * If we think that the address of CH 0 has an offset of +2,
+ * we can treat it with the same address as CH 1, so define it like that.
+ */
+REG16(CMSTR, 0)
+  FIELD(CMSTR, STR0, 0, 1)
+  FIELD(CMSTR, STR1, 1, 1)
+  FIELD(CMSTR, STR,  0, 2)
+/* This address is the channel offset */
+REG16(CMCR, 0)
+  FIELD(CMCR, CKS, 0, 2)
+  FIELD(CMCR, CMIE, 6, 

[PATCH v30 07/22] target/rx: RX disassembler

2020-02-12 Thread Yoshinori Sato
Signed-off-by: Yoshinori Sato 
Reviewed-by: Richard Henderson 
Tested-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-5-ys...@users.sourceforge.jp>
Signed-off-by: Richard Henderson 
---
 include/disas/dis-asm.h |5 +
 target/rx/disas.c   | 1480 +++
 2 files changed, 1485 insertions(+)
 create mode 100644 target/rx/disas.c

diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index f87f468809..c5f9fa08ab 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -226,6 +226,10 @@ enum bfd_architecture
 #define bfd_mach_nios2r2 2
   bfd_arch_lm32,   /* Lattice Mico32 */
 #define bfd_mach_lm32 1
+  bfd_arch_rx,   /* Renesas RX */
+#define bfd_mach_rx    0x75
+#define bfd_mach_rx_v2 0x76
+#define bfd_mach_rx_v3 0x77
   bfd_arch_last
   };
 #define bfd_mach_s390_31 31
@@ -436,6 +440,7 @@ int print_insn_little_nios2 (bfd_vma, disassemble_info*);
 int print_insn_xtensa   (bfd_vma, disassemble_info*);
 int print_insn_riscv32  (bfd_vma, disassemble_info*);
 int print_insn_riscv64  (bfd_vma, disassemble_info*);
+int print_insn_rx(bfd_vma, disassemble_info *);
 
 #if 0
 /* Fetch the disassembler for a given BFD, if that support is available.  */
diff --git a/target/rx/disas.c b/target/rx/disas.c
new file mode 100644
index 00..8cada4825d
--- /dev/null
+++ b/target/rx/disas.c
@@ -0,0 +1,1480 @@
+/*
+ * Renesas RX Disassembler
+ *
+ * Copyright (c) 2019 Yoshinori Sato 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "disas/dis-asm.h"
+#include "qemu/bitops.h"
+#include "cpu.h"
+
+typedef struct DisasContext {
+disassemble_info *dis;
+uint32_t addr;
+uint32_t pc;
+} DisasContext;
+
+
+static uint32_t decode_load_bytes(DisasContext *ctx, uint32_t insn,
+   int i, int n)
+{
+bfd_byte buf;
+while (++i <= n) {
+ctx->dis->read_memory_func(ctx->addr++, &buf, 1, ctx->dis);
+insn |= buf << (32 - i * 8);
+}
+return insn;
+}
+
+static int32_t li(DisasContext *ctx, int sz)
+{
+int32_t addr;
+bfd_byte buf[4];
+addr = ctx->addr;
+
+switch (sz) {
+case 1:
+ctx->addr += 1;
+ctx->dis->read_memory_func(addr, buf, 1, ctx->dis);
+return (int8_t)buf[0];
+case 2:
+ctx->addr += 2;
+ctx->dis->read_memory_func(addr, buf, 2, ctx->dis);
+return ldsw_le_p(buf);
+case 3:
+ctx->addr += 3;
+ctx->dis->read_memory_func(addr, buf, 3, ctx->dis);
+return (int8_t)buf[2] << 16 | lduw_le_p(buf);
+case 0:
+ctx->addr += 4;
+ctx->dis->read_memory_func(addr, buf, 4, ctx->dis);
+return ldl_le_p(buf);
+default:
+g_assert_not_reached();
+}
+}
+
+static int bdsp_s(DisasContext *ctx, int d)
+{
+/*
+ * 0 -> 8
+ * 1 -> 9
+ * 2 -> 10
+ * 3 -> 3
+ * :
+ * 7 -> 7
+ */
+if (d < 3) {
+d += 8;
+}
+return d;
+}
+
+/* Include the auto-generated decoder.  */
+#include "decode.inc.c"
+
+#define prt(...) (ctx->dis->fprintf_func)((ctx->dis->stream), __VA_ARGS__)
+
+#define RX_MEMORY_BYTE 0
+#define RX_MEMORY_WORD 1
+#define RX_MEMORY_LONG 2
+
+#define RX_IM_BYTE 0
+#define RX_IM_WORD 1
+#define RX_IM_LONG 2
+#define RX_IM_UWORD 3
+
+static const char size[] = {'b', 'w', 'l'};
+static const char cond[][4] = {
+"eq", "ne", "c", "nc", "gtu", "leu", "pz", "n",
+"ge", "lt", "gt", "le", "o", "no", "ra", "f"
+};
+static const char psw[] = {
+'c', 'z', 's', 'o', 0, 0, 0, 0,
+'i', 'u', 0, 0, 0, 0, 0, 0,
+};
+
+static uint32_t rx_index_addr(int ld, int size, DisasContext *ctx)
+{
+bfd_byte buf[2];
+switch (ld) {
+case 0:
+return 0;
+case 1:
+ctx->dis->read_memory_func(ctx->addr, buf, 1, ctx->dis);
+ctx->addr += 1;
+return ((uint8_t)buf[0]) << size;
+case 2:
+ctx->dis->read_memory_func(ctx->addr, buf, 2, ctx->dis);
+ctx->addr += 2;
+return lduw_le_p(buf) << size;
+}
+g_assert_not_reached();
+}
+
+static void operand(DisasContext *ctx, int ld, int mi, int rs, int rd)
+{
+int dsp;
+static const char sizes[][4] = {".b", ".w", ".l", ".uw", ".ub"};
+if (ld < 3) {
+switch (mi) {
+case 4:
+/* dsp[rs].ub */
+dsp = rx_index_addr(ld, RX_MEMORY

[PATCH v30 04/22] target/rx: TCG translation

2020-02-12 Thread Yoshinori Sato
This part supports only RXv1 instructions.
Instruction manual:
https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01us0032ej0120_rxsm.pdf

Signed-off-by: Yoshinori Sato 
Reviewed-by: Richard Henderson 
Tested-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-2-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/insns.decode  |  621 ++
 target/rx/translate.c   | 2432 +++
 target/rx/Makefile.objs |   12 +
 3 files changed, 3065 insertions(+)
 create mode 100644 target/rx/insns.decode
 create mode 100644 target/rx/translate.c
 create mode 100644 target/rx/Makefile.objs

diff --git a/target/rx/insns.decode b/target/rx/insns.decode
new file mode 100644
index 00..232a61fc8e
--- /dev/null
+++ b/target/rx/insns.decode
@@ -0,0 +1,621 @@
+#
+# Renesas RX instruction decode definitions.
+#
+# Copyright (c) 2019 Richard Henderson 
+# Copyright (c) 2019 Yoshinori Sato 
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see .
+#
+
+&bcnd  cd dsp sz
+&jdsp  dsp sz
+&jreg  rs
+&rr    rd rs
+&ri    rd imm
+&rrr   rd rs rs2
+&rri   rd imm rs2
+&rm    rd rs ld mi
+&mi    rs ld mi imm
+&mr    rs ld mi rs2
+&mcnd  ld sz rd cd
+
+%b1_bdsp   24:3 !function=bdsp_s
+
+@b1_bcnd_s  cd:1 ...   &bcnd dsp=%b1_bdsp sz=1
+@b1_bra_s      &jdsp dsp=%b1_bdsp sz=1
+
+%b2_r_0    16:4
+%b2_li_2   18:2 !function=li
+%b2_li_8   24:2 !function=li
+%b2_dsp5_3 23:4 19:1
+
+@b2_rds   rd:4 &rr rs=%b2_r_0
+@b2_rds_li    rd:4 &rri rs2=%b2_r_0 imm=%b2_li_8
+@b2_rds_uimm4    imm:4 rd:4&rri rs2=%b2_r_0
+@b2_rs2_uimm4    imm:4 rs2:4   &rri rd=0
+@b2_rds_imm5    ... imm:5 rd:4 &rri rs2=%b2_r_0
+@b2_rd_rs_li     rs2:4 rd:4&rri imm=%b2_li_8
+@b2_rd_ld_ub    .. ld:2 rs:4 rd:4  &rm mi=4
+@b2_ld_imm3 .. ld:2 rs:4 . imm:3   &mi mi=4
+@b2_bcnd_b  cd:4 dsp:s8&bcnd sz=2
+@b2_bra_b    dsp:s8&jdsp sz=2
+
+
+
+%b3_r_0    8:4
+%b3_li_10  18:2 !function=li
+%b3_dsp5_8 23:1 16:4
+%b3_bdsp   8:s8 16:8
+
+@b3_rd_rs      rs:4 rd:4   &rr
+@b3_rs_rd      rd:4 rs:4   &rr
+@b3_rd_li       rd:4 \
+   &rri rs2=%b3_r_0 imm=%b3_li_10
+@b3_rd_ld    mi:2  ld:2 rs:4 rd:4  &rm
+@b3_rd_ld_ub      .. ld:2 rs:4 rd:4&rm mi=4
+@b3_rd_ld_ul      .. ld:2 rs:4 rd:4&rm mi=2
+@b3_rd_rs_rs2     rd:4 rs:4 rs2:4  &rrr
+@b3_rds_imm5     ... imm:5 rd:4&rri rs2=%b3_r_0
+@b3_rd_rs_imm5   ... imm:5 rs2:4 rd:4  &rri
+@b3_bcnd_w  ... cd:1       &bcnd dsp=%b3_bdsp sz=3
+@b3_bra_w          &jdsp dsp=%b3_bdsp sz=3
+@b3_ld_rd_rs      .. ld:2 rs:4 rd:4&rm mi=0
+@b3_sz_ld_rd_cd   sz:2 ld:2 rd:4 cd:4  &mcnd
+
+
+
+%b4_li_18  18:2 !function=li
+%b4_dsp_16 0:s8 8:8
+%b4_bdsp   0:s8 8:8 16:8
+
+@b4_rd_ldmi  mi:2  ld:2   rs:4 rd:4&rm
+@b4_bra_a          \
+   &jdsp dsp=%b4_bdsp sz=4
+
+# ABS rd
+ABS_rr 0111 1110 0010  @b2_rds
+# ABS rs, rd
+ABS_rr  1100       @b3_rd_rs
+
+# ADC #imm, rd
+ADC_ir  1101 0111 ..00 0010    @b3_rd_li
+# ADC rs, rd
+ADC_rr  1100  1011     @b3_rd_rs
+# ADC dsp[rs].l, rd
+# Note only mi==2 allowed.
+ADC_mr  0110 ..10 00..  0010   @b4_rd_ldmi
+
+# ADD #uimm4, rd
+ADD_irr0110 0010   @b2_rds_uimm4
+# ADD #imm, rs, rd
+ADD_irr0111 00..   @b2_rd_rs_li
+# ADD dsp[rs].ub, rd
+# ADD rs, rd
+ADD_mr 0100 10..   @b2_rd_ld_ub
+# ADD dsp[rs], rd
+ADD_mr  

[PATCH v30 00/22] Add RX architecture support

2020-02-12 Thread Yoshinori Sato
Hello.
This patch series adds Renesas RX target emulation.

Changes for v29.
Add target description XML. It requires gdb-9.1.
Follow git master changes.

Changes for v28.
Allow -m option.
With this option, 16 Mbytes or more can be specified.
Add example for qemu-doc.
Fix build error on latest master.

Changes for v27.
Added RX section to qemu-doc.
Rebase for master

Changes for v26.
Rebase for 5.0
Update machine.json for 5.0

Changes for v25.
Update commit message.
Squashed qapi/machine.json changes.

Changes for v24.
Add note for qapi/machine.json.
Added Acked-by for 6/22.
git rebase master.

Changes for v23.
Follow master changes.

Changes for v22.
Added some includes.

Changes for v21.
Rebase onto latest master.
Remove unneeded hmp_info_tlb.

Changes for v20.
Reordered patches.
Squashed v19 changes.

Changes for v19.
Follow tcg changes.
Cleanup cpu.c.
Simplify rx_cpu_class_by_name and move rx_load_image to rx-virt.

My git repository is below.
git://git.pf.osdn.net/gitroot/y/ys/ysato/qemu.git tags/rx-20200212

Test binaries are below.
u-boot
Download - https://osdn.net/users/ysato/pf/qemu/dl/u-boot.bin.gz

starting
$ gzip -d u-boot.bin.gz
$ qemu-system-rx -bios u-boot.bin

linux and pico-root (only sash)
Download - https://osdn.net/users/ysato/pf/qemu/dl/zImage (kernel)
   https://osdn.net/users/ysato/pf/qemu/dl/rx-virt.dtb (DeviceTree)

starting
$ qemu-system-rx -kernel zImage -dtb rx-virt.dtb -append "earlycon"

Philippe Mathieu-Daudé (3):
  hw/registerfields.h: Add 8bit and 16bit register macros
  hw/rx: Restrict the RX62N microcontroller to the RX62N CPU core
  BootLinuxConsoleTest: Test the RX-Virt machine

Richard Henderson (7):
  target/rx: Disassemble rx_index_addr into a string
  target/rx: Replace operand with prt_ldmi in disassembler
  target/rx: Use prt_ldmi for XCHG_mr disassembly
  target/rx: Emit all disassembly in one prt()
  target/rx: Collect all bytes during disassembly
  target/rx: Dump bytes for each insn during disassembly
  hw/rx: Honor -accel qtest

Yoshinori Sato (12):
  MAINTAINERS: Add RX
  qemu/bitops.h: Add extract8 and extract16
  target/rx: TCG translation
  target/rx: TCG helper
  target/rx: CPU definition
  target/rx: RX disassembler
  hw/intc: RX62N interrupt controller (ICUa)
  hw/timer: RX62N internal timer modules
  hw/char: RX62N serial communication interface (SCI)
  hw/rx: RX Target hardware definition
  Add rx-softmmu
  qemu-doc.texi: Add RX section.

 qemu-doc.texi  |   44 +
 configure  |   11 +-
 default-configs/rx-softmmu.mak |3 +
 qapi/machine.json  |3 +-
 include/disas/dis-asm.h|5 +
 include/exec/poison.h  |1 +
 include/hw/char/renesas_sci.h  |   45 +
 include/hw/intc/rx_icu.h   |   56 +
 include/hw/registerfields.h|   32 +-
 include/hw/rx/rx.h |7 +
 include/hw/rx/rx62n.h  |   91 +
 include/hw/timer/renesas_cmt.h |   38 +
 include/hw/timer/renesas_tmr.h |   53 +
 include/qemu/bitops.h  |   38 +
 include/sysemu/arch_init.h |1 +
 target/rx/cpu-param.h  |   31 +
 target/rx/cpu-qom.h|   42 +
 target/rx/cpu.h|  181 ++
 target/rx/helper.h |   31 +
 target/rx/insns.decode |  621 ++
 arch_init.c|2 +
 hw/char/renesas_sci.c  |  342 
 hw/intc/rx_icu.c   |  379 
 hw/rx/rx-virt.c|  142 ++
 hw/rx/rx62n.c  |  247 +++
 hw/timer/renesas_cmt.c |  278 +++
 hw/timer/renesas_tmr.c |  458 +
 target/rx/cpu.c|  218 +++
 target/rx/disas.c  | 1446 ++
 target/rx/gdbstub.c|  112 ++
 target/rx/helper.c |  149 ++
 target/rx/op_helper.c  |  470 +
 target/rx/translate.c  | 2432 
 tests/qtest/machine-none-test.c|1 +
 MAINTAINERS|   19 +
 gdb-xml/rx-core.xml|   70 +
 hw/Kconfig |1 +
 hw/char/Kconfig|3 +
 hw/char/Makefile.objs  |1 +
 hw/intc/Kconfig|3 +
 hw/intc/Makefile.objs  |1 +
 hw/rx/Kconfig  |   14 +
 hw/rx/Makefile.objs|2 +
 hw/timer/Kconfig   |6 +
 hw/timer/Makefile.objs |3 +
 target/rx/Makefile.objs|   11 +
 tests/acceptance/boot_linux_console.py |   46 +
 47 files changed, 8187 insertions(+), 3 deletions(-)
 create mode 100644 default-configs/rx-softmmu.mak
 create mode 100644 include/hw/char/renesas_sci.h
 create mode 1006

[PATCH v30 20/22] Add rx-softmmu

2020-02-12 Thread Yoshinori Sato
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-17-ys...@users.sourceforge.jp>
Signed-off-by: Richard Henderson 
pick ed65c02993 target/rx: Add RX to SysEmuTarget
pick 01372568ae tests: Add rx to machine-none-test.c
[PMD: Squashed patches from Richard Henderson modifying
  qapi/common.json and tests/machine-none-test.c]
Signed-off-by: Philippe Mathieu-Daudé 
---
 configure   | 11 ++-
 default-configs/rx-softmmu.mak  |  3 +++
 qapi/machine.json   |  3 ++-
 include/exec/poison.h   |  1 +
 include/sysemu/arch_init.h  |  1 +
 arch_init.c |  2 ++
 tests/qtest/machine-none-test.c |  1 +
 hw/Kconfig  |  1 +
 8 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 default-configs/rx-softmmu.mak

diff --git a/configure b/configure
index 115dc38085..ffbd54726f 100755
--- a/configure
+++ b/configure
@@ -4104,7 +4104,7 @@ fi
 fdt_required=no
 for target in $target_list; do
   case $target in
-    aarch64*-softmmu|arm*-softmmu|ppc*-softmmu|microblaze*-softmmu|mips64el-softmmu|riscv*-softmmu)
+    aarch64*-softmmu|arm*-softmmu|ppc*-softmmu|microblaze*-softmmu|mips64el-softmmu|riscv*-softmmu|rx-softmmu)
   fdt_required=yes
 ;;
   esac
@@ -7744,6 +7744,12 @@ case "$target_name" in
 mttcg=yes
 gdb_xml_files="riscv-64bit-cpu.xml riscv-64bit-fpu.xml riscv-64bit-csr.xml riscv-64bit-virtual.xml"
   ;;
+  rx)
+TARGET_ARCH=rx
+bflt="yes"
+target_compiler=$cross_cc_rx
+gdb_xml_files="rx-core.xml"
+  ;;
   sh4|sh4eb)
 TARGET_ARCH=sh4
 bflt="yes"
@@ -7925,6 +7931,9 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
   riscv*)
 disas_config "RISCV"
   ;;
+  rx)
+disas_config "RX"
+  ;;
   s390*)
 disas_config "S390"
   ;;
diff --git a/default-configs/rx-softmmu.mak b/default-configs/rx-softmmu.mak
new file mode 100644
index 00..a3eecefb11
--- /dev/null
+++ b/default-configs/rx-softmmu.mak
@@ -0,0 +1,3 @@
+# Default configuration for rx-softmmu
+
+CONFIG_RX_VIRT=y
diff --git a/qapi/machine.json b/qapi/machine.json
index b3d30bc816..57703c9696 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -21,6 +21,7 @@
 #is true even for "qemu-system-x86_64".
 #
 # ppcemb: dropped in 3.1
+# rx: added in 5.0
 #
 # Since: 3.0
 ##
@@ -28,7 +29,7 @@
   'data' : [ 'aarch64', 'alpha', 'arm', 'cris', 'hppa', 'i386', 'lm32',
  'm68k', 'microblaze', 'microblazeel', 'mips', 'mips64',
  'mips64el', 'mipsel', 'moxie', 'nios2', 'or1k', 'ppc',
- 'ppc64', 'riscv32', 'riscv64', 's390x', 'sh4',
+ 'ppc64', 'riscv32', 'riscv64', 'rx', 's390x', 'sh4',
  'sh4eb', 'sparc', 'sparc64', 'tricore', 'unicore32',
  'x86_64', 'xtensa', 'xtensaeb' ] }
 
diff --git a/include/exec/poison.h b/include/exec/poison.h
index 955eb863ab..7b9ac361dc 100644
--- a/include/exec/poison.h
+++ b/include/exec/poison.h
@@ -26,6 +26,7 @@
 #pragma GCC poison TARGET_PPC
 #pragma GCC poison TARGET_PPC64
 #pragma GCC poison TARGET_ABI32
+#pragma GCC poison TARGET_RX
 #pragma GCC poison TARGET_S390X
 #pragma GCC poison TARGET_SH4
 #pragma GCC poison TARGET_SPARC
diff --git a/include/sysemu/arch_init.h b/include/sysemu/arch_init.h
index 62c6fe4cf1..6c011acc52 100644
--- a/include/sysemu/arch_init.h
+++ b/include/sysemu/arch_init.h
@@ -24,6 +24,7 @@ enum {
 QEMU_ARCH_NIOS2 = (1 << 17),
 QEMU_ARCH_HPPA = (1 << 18),
 QEMU_ARCH_RISCV = (1 << 19),
+QEMU_ARCH_RX = (1 << 20),
 };
 
 extern const uint32_t arch_type;
diff --git a/arch_init.c b/arch_init.c
index 705d0b94ad..d9eb0ec1dd 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -77,6 +77,8 @@ int graphic_depth = 32;
 #define QEMU_ARCH QEMU_ARCH_PPC
 #elif defined(TARGET_RISCV)
 #define QEMU_ARCH QEMU_ARCH_RISCV
+#elif defined(TARGET_RX)
+#define QEMU_ARCH QEMU_ARCH_RX
 #elif defined(TARGET_S390X)
 #define QEMU_ARCH QEMU_ARCH_S390X
 #elif defined(TARGET_SH4)
diff --git a/tests/qtest/machine-none-test.c b/tests/qtest/machine-none-test.c
index 5953d31755..8bb54a6360 100644
--- a/tests/qtest/machine-none-test.c
+++ b/tests/qtest/machine-none-test.c
@@ -56,6 +56,7 @@ static struct arch2cpu cpus_map[] = {
 { "hppa", "hppa" },
 { "riscv64", "rv64gcsu-v1.10.0" },
 { "riscv32", "rv32gcsu-v1.9.1" },
+{ "rx", "rx62n" },
 };
 
 static const char *get_cpu_model_by_arch(const char *arch)
diff --git a/hw/Kconfig b/hw/Kconfig
index ecf491bf04..62f9ebdc22 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -55,6 +55,7 @@ source nios2/Kconfig
 source openrisc/Kconfig
 source ppc/Kconfig
 source riscv/Kconfig
+source rx/Kconfig
 source s390x/Kconfig
 source sh4/Kconfig
 source sparc/Kconfig
-- 
2.20.1




[PATCH v30 14/22] hw/intc: RX62N interrupt controller (ICUa)

2020-02-12 Thread Yoshinori Sato
This implementation supports only ICUa.
Hardware manual:
https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01uh0033ej0140_rx62n.pdf

Signed-off-by: Yoshinori Sato 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-6-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/hw/intc/rx_icu.h |  56 ++
 hw/intc/rx_icu.c | 379 +++
 hw/intc/Kconfig  |   3 +
 hw/intc/Makefile.objs|   1 +
 4 files changed, 439 insertions(+)
 create mode 100644 include/hw/intc/rx_icu.h
 create mode 100644 hw/intc/rx_icu.c

diff --git a/include/hw/intc/rx_icu.h b/include/hw/intc/rx_icu.h
new file mode 100644
index 00..acfcf06aef
--- /dev/null
+++ b/include/hw/intc/rx_icu.h
@@ -0,0 +1,56 @@
+#ifndef RX_ICU_H
+#define RX_ICU_H
+
+#include "qemu-common.h"
+#include "hw/irq.h"
+
+enum TRG_MODE {
+TRG_LEVEL = 0,
+TRG_NEDGE = 1,  /* Falling */
+TRG_PEDGE = 2,  /* Rising */
+TRG_BEDGE = 3,  /* Both */
+};
+
+struct IRQSource {
+enum TRG_MODE sense;
+int level;
+};
+
+enum {
+/* Software interrupt request */
+SWI = 27,
+NR_IRQS = 256,
+};
+
+struct RXICUState {
+SysBusDevice parent_obj;
+
+MemoryRegion memory;
+struct IRQSource src[NR_IRQS];
+char *icutype;
+uint32_t nr_irqs;
+uint32_t *map;
+uint32_t nr_sense;
+uint32_t *init_sense;
+
+uint8_t ir[NR_IRQS];
+uint8_t dtcer[NR_IRQS];
+uint8_t ier[NR_IRQS / 8];
+uint8_t ipr[142];
+uint8_t dmasr[4];
+uint16_t fir;
+uint8_t nmisr;
+uint8_t nmier;
+uint8_t nmiclr;
+uint8_t nmicr;
+int req_irq;
+qemu_irq _irq;
+qemu_irq _fir;
+qemu_irq _swi;
+};
+typedef struct RXICUState RXICUState;
+
+#define TYPE_RXICU "rx-icu"
+#define RXICU(obj) OBJECT_CHECK(RXICUState, (obj), TYPE_RXICU)
+
+#endif /* RX_ICU_H */
diff --git a/hw/intc/rx_icu.c b/hw/intc/rx_icu.c
new file mode 100644
index 00..ab9a300467
--- /dev/null
+++ b/hw/intc/rx_icu.c
@@ -0,0 +1,379 @@
+/*
+ * RX Interrupt Control Unit
+ *
+ * Warning: Only ICUa is supported.
+ *
+ * Datasheet: RX62N Group, RX621 Group User's Manual: Hardware
+ * (Rev.1.40 R01UH0033EJ0140)
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "cpu.h"
+#include "hw/hw.h"
+#include "hw/irq.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/qdev-properties.h"
+#include "hw/intc/rx_icu.h"
+#include "migration/vmstate.h"
+#include "qemu/error-report.h"
+
+REG8(IR, 0)
+  FIELD(IR, IR,  0, 1)
+REG8(DTCER, 0x100)
+  FIELD(DTCER, DTCE,  0, 1)
+REG8(IER, 0x200)
+REG8(SWINTR, 0x2e0)
+  FIELD(SWINTR, SWINT, 0, 1)
+REG16(FIR, 0x2f0)
+  FIELD(FIR, FVCT, 0, 8)
+  FIELD(FIR, FIEN, 15, 1)
+REG8(IPR, 0x300)
+  FIELD(IPR, IPR, 0, 4)
+REG8(DMRSR, 0x400)
+REG8(IRQCR, 0x500)
+  FIELD(IRQCR, IRQMD, 2, 2)
+REG8(NMISR, 0x580)
+  FIELD(NMISR, NMIST, 0, 1)
+  FIELD(NMISR, LVDST, 1, 1)
+  FIELD(NMISR, OSTST, 2, 1)
+REG8(NMIER, 0x581)
+  FIELD(NMIER, NMIEN, 0, 1)
+  FIELD(NMIER, LVDEN, 1, 1)
+  FIELD(NMIER, OSTEN, 2, 1)
+REG8(NMICLR, 0x582)
+  FIELD(NMICLR, NMICLR, 0, 1)
+  FIELD(NMICLR, OSTCLR, 2, 1)
+REG8(NMICR, 0x583)
+  FIELD(NMICR, NMIMD, 3, 1)
+
+#define request(icu, n) (icu->ipr[icu->map[n]] << 8 | n)
+
+static void set_irq(RXICUState *icu, int n_IRQ, int req)
+{
+if ((icu->fir & R_FIR_FIEN_MASK) &&
+(icu->fir & R_FIR_FVCT_MASK) == n_IRQ) {
+qemu_set_irq(icu->_fir, req);
+} else {
+qemu_set_irq(icu->_irq, req);
+}
+}
+
+static void rxicu_request(RXICUState *icu, int n_IRQ)
+{
+int enable;
+
+enable = icu->ier[n_IRQ / 8] & (1 << (n_IRQ & 7));
+if (n_IRQ > 0 && enable != 0 && atomic_read(&icu->req_irq) < 0) {
+atomic_set(&icu->req_irq, n_IRQ);
+set_irq(icu, n_IRQ, request(icu, n_IRQ));
+}
+}
+
+static void rxicu_set_irq(void *opaque, int n_IRQ, int level)
+{
+RXICUState *icu = opaque;
+struct IRQSource *src;
+int issue;
+
+if (n_IRQ >= NR_IRQS) {
+error_report("%s: IRQ %d out of range", __func__, n_IRQ);
+return;
+}
+
+src = &icu->src[n_IRQ];
+
+level = (level != 0);
+switch (src->sense) {
+case TRG_LEVEL:
+/* level-sen

[PATCH v30 01/22] MAINTAINERS: Add RX

2020-02-12 Thread Yoshinori Sato
Signed-off-by: Yoshinori Sato 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-18-ys...@users.sourceforge.jp>
Signed-off-by: Richard Henderson 
---
 MAINTAINERS | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c7717df720..41498dbab5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -274,6 +274,13 @@ F: include/hw/riscv/
 F: linux-user/host/riscv32/
 F: linux-user/host/riscv64/
 
+RENESAS RX CPUs
+M: Yoshinori Sato 
+S: Maintained
+F: target/rx/
+F: hw/rx/
+F: include/hw/rx/
+
 S390 TCG CPUs
 M: Richard Henderson 
 M: David Hildenbrand 
@@ -1159,6 +1166,18 @@ F: pc-bios/canyonlands.dt[sb]
 F: pc-bios/u-boot-sam460ex-20100605.bin
 F: roms/u-boot-sam460ex
 
+RX Machines
+---
+rx-virt
+M: Yoshinori Sato 
+S: Maintained
+F: hw/rx/rxqemu.c
+F: hw/intc/rx_icu.c
+F: hw/timer/renesas_*.c
+F: hw/char/renesas_sci.c
+F: include/hw/timer/renesas_*.h
+F: include/hw/char/renesas_sci.h
+
 SH4 Machines
 
 R2D
-- 
2.20.1




[PATCH v30 19/22] hw/rx: Restrict the RX62N microcontroller to the RX62N CPU core

2020-02-12 Thread Yoshinori Sato
From: Philippe Mathieu-Daudé 

While the VIRT machine can use different microcontrollers,
the RX62N microcontroller is tied to the RX62N CPU core.

Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Yoshinori Sato 
---
 hw/rx/rx-virt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/rx/rx-virt.c b/hw/rx/rx-virt.c
index 017941b996..f58fa3e5a8 100644
--- a/hw/rx/rx-virt.c
+++ b/hw/rx/rx-virt.c
@@ -17,6 +17,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu-common.h"
 #include "cpu.h"
@@ -56,6 +57,7 @@ static void rx_load_image(RXCPU *cpu, const char *filename,
 
 static void rxvirt_init(MachineState *machine)
 {
+MachineClass *mc = MACHINE_GET_CLASS(machine);
 RX62NState *s = g_new(RX62NState, 1);
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *sdram = g_new(MemoryRegion, 1);
-- 
2.20.1




[PATCH v30 08/22] target/rx: Disassemble rx_index_addr into a string

2020-02-12 Thread Yoshinori Sato
From: Richard Henderson 

We were eliding all zero indexes.  It is only ld==0 that does
not have an index in the instruction.  This also allows us to
avoid breaking the final print into multiple pieces.
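
The formatting rule described above can be sketched in standalone C. This is an illustration only, not the QEMU code: the name `format_index` and the `raw` parameter (the displacement as already fetched from the instruction stream) are assumptions, while the shift expression mirrors the `sprintf` call in the diff below.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for rx_index_addr(): format the displacement of a
 * dsp[rs] operand into out[8].
 *   ld  - 0: no displacement field, 1: 8-bit field, 2: 16-bit field
 *   raw - displacement as fetched from the instruction stream (assumed input)
 *   mi  - memory size: 0=.b 1=.w 2=.l 3=.uw 4=.ub
 */
static void format_index(char out[8], int ld, unsigned raw, int mi)
{
    if (ld == 0) {
        /* Only ld == 0 has no index; a zero displacement is still printed. */
        out[0] = '\0';
        return;
    }
    /* Scale by operand size: .b/.ub -> 0, .w/.uw -> 1, .l -> 2. */
    unsigned shift = (mi < 3) ? mi : 4 - mi;
    /* Largest value is 0xffff << 2 = 262140: six digits plus NUL fit in 8. */
    snprintf(out, 8, "%u", (raw & 0xffffu) << shift);
}
```

With this shape, a caller can emit the operand with one format string such as "%s[r%d]%s", because the displacement is either a number or an empty string.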

Reviewed-by: Yoshinori Sato 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-19-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/disas.c | 154 +-
 1 file changed, 55 insertions(+), 99 deletions(-)

diff --git a/target/rx/disas.c b/target/rx/disas.c
index 8cada4825d..64342537ee 100644
--- a/target/rx/disas.c
+++ b/target/rx/disas.c
@@ -107,49 +107,42 @@ static const char psw[] = {
 'i', 'u', 0, 0, 0, 0, 0, 0,
 };
 
-static uint32_t rx_index_addr(int ld, int size, DisasContext *ctx)
+static void rx_index_addr(DisasContext *ctx, char out[8], int ld, int mi)
 {
-bfd_byte buf[2];
+uint32_t addr = ctx->addr;
+uint8_t buf[2];
+uint16_t dsp;
+
 switch (ld) {
 case 0:
-return 0;
+/* No index; return empty string.  */
+out[0] = '\0';
+return;
 case 1:
-ctx->dis->read_memory_func(ctx->addr, buf, 1, ctx->dis);
 ctx->addr += 1;
-return ((uint8_t)buf[0]) << size;
+ctx->dis->read_memory_func(addr, buf, 1, ctx->dis);
+dsp = buf[0];
+break;
 case 2:
-ctx->dis->read_memory_func(ctx->addr, buf, 2, ctx->dis);
 ctx->addr += 2;
-return lduw_le_p(buf) << size;
+ctx->dis->read_memory_func(addr, buf, 2, ctx->dis);
+dsp = lduw_le_p(buf);
+break;
+default:
+g_assert_not_reached();
 }
-g_assert_not_reached();
+
+sprintf(out, "%u", dsp << (mi < 3 ? mi : 4 - mi));
 }
 
 static void operand(DisasContext *ctx, int ld, int mi, int rs, int rd)
 {
-int dsp;
 static const char sizes[][4] = {".b", ".w", ".l", ".uw", ".ub"};
+char dsp[8];
+
 if (ld < 3) {
-switch (mi) {
-case 4:
-/* dsp[rs].ub */
-dsp = rx_index_addr(ld, RX_MEMORY_BYTE, ctx);
-break;
-case 3:
-/* dsp[rs].uw */
-dsp = rx_index_addr(ld, RX_MEMORY_WORD, ctx);
-break;
-default:
-/* dsp[rs].b */
-/* dsp[rs].w */
-/* dsp[rs].l */
-dsp = rx_index_addr(ld, mi, ctx);
-break;
-}
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]%s", rs, sizes[mi]);
+rx_index_addr(ctx, dsp, ld, mi);
+prt("%s[r%d]%s", dsp, rs, sizes[mi]);
 } else {
 prt("r%d", rs);
 }
@@ -235,7 +228,7 @@ static bool trans_MOV_ra(DisasContext *ctx, arg_MOV_ra *a)
 /* mov.[bwl] rs,rd */
 static bool trans_MOV_mm(DisasContext *ctx, arg_MOV_mm *a)
 {
-int dsp;
+char dspd[8], dsps[8];
 
 prt("mov.%c\t", size[a->sz]);
 if (a->lds == 3 && a->ldd == 3) {
@@ -244,29 +237,15 @@ static bool trans_MOV_mm(DisasContext *ctx, arg_MOV_mm *a)
 return true;
 }
 if (a->lds == 3) {
-prt("r%d, ", a->rd);
-dsp = rx_index_addr(a->ldd, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]", a->rs);
+rx_index_addr(ctx, dspd, a->ldd, a->sz);
+prt("r%d, %s[r%d]", a->rs, dspd, a->rd);
 } else if (a->ldd == 3) {
-dsp = rx_index_addr(a->lds, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d], r%d", a->rs, a->rd);
+rx_index_addr(ctx, dsps, a->lds, a->sz);
+prt("%s[r%d], r%d", dsps, a->rs, a->rd);
 } else {
-dsp = rx_index_addr(a->lds, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d], ", a->rs);
-dsp = rx_index_addr(a->ldd, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]", a->rd);
+rx_index_addr(ctx, dsps, a->lds, a->sz);
+rx_index_addr(ctx, dspd, a->ldd, a->sz);
+prt("%s[r%d], %s[r%d]", dsps, a->rs, dspd, a->rd);
 }
 return true;
 }
@@ -357,12 +336,10 @@ static bool trans_PUSH_r(DisasContext *ctx, arg_PUSH_r *a)
 /* push dsp[rs] */
 static bool trans_PUSH_m(DisasContext *ctx, arg_PUSH_m *a)
 {
-prt("push\t");
-int dsp = rx_index_addr(a->ld, a->sz, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]", a->rs);
+char dsp[8];
+
+rx_index_addr(ctx, dsp, a->ld, a->sz);
+prt("push\t%s[r%d]", dsp, a->rs);
 return true;
 }
 
@@ -389,17 +366,13 @@ static bool trans_XCHG_rr(DisasContext *ctx, arg_XCHG_rr *a)
 /* xchg dsp[rs].,rd */
 static bool trans_XCHG_mr(DisasContext *ctx, arg_XCHG_mr *a)
 {
-int dsp;
 static const char msize[][4] = {
 "b", "w", "l", "ub", "uw",
 };
+char dsp[8];
 
-prt("xchg\t");
-dsp = rx_index_addr(a->ld, a->mi, ctx);
-if (dsp > 0) {
-prt("%d", dsp);
-}
-prt("[r%d]

[PATCH v30 11/22] target/rx: Emit all disassembly in one prt()

2020-02-12 Thread Yoshinori Sato
From: Richard Henderson 

Many of the multi-part prints have been eliminated by previous
patches.  Eliminate the rest of them.
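
As a rough sketch of the pattern, assuming a buffer-based helper in place of the `prt()` macro (the name `format_not` is hypothetical and does not appear in the patch): the operand form is chosen first, and the whole line is then produced by one formatted write.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the whole "not" line with a single formatted
 * write, choosing between the one- and two-operand spellings up front.
 * Returns the number of characters snprintf() would have written.
 */
static int format_not(char *buf, size_t len, int rs, int rd)
{
    if (rs != rd) {
        return snprintf(buf, len, "not\tr%d, r%d", rs, rd);
    }
    return snprintf(buf, len, "not\tr%d", rs);
}
```

Keeping each instruction to one output call means anything emitted around that call (a prefix, for example) appears exactly once per instruction.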

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Yoshinori Sato 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-22-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/disas.c | 75 ---
 1 file changed, 39 insertions(+), 36 deletions(-)

diff --git a/target/rx/disas.c b/target/rx/disas.c
index db10385fd0..ebc1a44249 100644
--- a/target/rx/disas.c
+++ b/target/rx/disas.c
@@ -228,24 +228,21 @@ static bool trans_MOV_ra(DisasContext *ctx, arg_MOV_ra *a)
 /* mov.[bwl] rs,rd */
 static bool trans_MOV_mm(DisasContext *ctx, arg_MOV_mm *a)
 {
-char dspd[8], dsps[8];
+char dspd[8], dsps[8], szc = size[a->sz];
 
-prt("mov.%c\t", size[a->sz]);
 if (a->lds == 3 && a->ldd == 3) {
 /* mov.[bwl] rs,rd */
-prt("r%d, r%d", a->rs, a->rd);
-return true;
-}
-if (a->lds == 3) {
+prt("mov.%c\tr%d, r%d", szc, a->rs, a->rd);
+} else if (a->lds == 3) {
 rx_index_addr(ctx, dspd, a->ldd, a->sz);
-prt("r%d, %s[r%d]", a->rs, dspd, a->rd);
+prt("mov.%c\tr%d, %s[r%d]", szc, a->rs, dspd, a->rd);
 } else if (a->ldd == 3) {
 rx_index_addr(ctx, dsps, a->lds, a->sz);
-prt("%s[r%d], r%d", dsps, a->rs, a->rd);
+prt("mov.%c\t%s[r%d], r%d", szc, dsps, a->rs, a->rd);
 } else {
 rx_index_addr(ctx, dsps, a->lds, a->sz);
 rx_index_addr(ctx, dspd, a->ldd, a->sz);
-prt("%s[r%d], %s[r%d]", dsps, a->rs, dspd, a->rd);
+prt("mov.%c\t%s[r%d], %s[r%d]", szc, dsps, a->rs, dspd, a->rd);
 }
 return true;
 }
@@ -254,8 +251,11 @@ static bool trans_MOV_mm(DisasContext *ctx, arg_MOV_mm *a)
 /* mov.[bwl] rs,[-rd] */
 static bool trans_MOV_rp(DisasContext *ctx, arg_MOV_rp *a)
 {
-prt("mov.%c\tr%d, ", size[a->sz], a->rs);
-prt((a->ad == 0) ? "[r%d+]" : "[-r%d]", a->rd);
+if (a->ad) {
+prt("mov.%c\tr%d, [-r%d]", size[a->sz], a->rs, a->rd);
+} else {
+prt("mov.%c\tr%d, [r%d+]", size[a->sz], a->rs, a->rd);
+}
 return true;
 }
 
@@ -263,9 +263,11 @@ static bool trans_MOV_rp(DisasContext *ctx, arg_MOV_rp *a)
 /* mov.[bwl] [-rd],rs */
 static bool trans_MOV_pr(DisasContext *ctx, arg_MOV_pr *a)
 {
-prt("mov.%c\t", size[a->sz]);
-prt((a->ad == 0) ? "[r%d+]" : "[-r%d]", a->rd);
-prt(", r%d", a->rs);
+if (a->ad) {
+prt("mov.%c\t[-r%d], r%d", size[a->sz], a->rd, a->rs);
+} else {
+prt("mov.%c\t[r%d+], r%d", size[a->sz], a->rd, a->rs);
+}
 return true;
 }
 
@@ -299,9 +301,11 @@ static bool trans_MOVU_ar(DisasContext *ctx, arg_MOVU_ar *a)
 /* movu.[bw] [-rs],rd */
 static bool trans_MOVU_pr(DisasContext *ctx, arg_MOVU_pr *a)
 {
-prt("movu.%c\t", size[a->sz]);
-prt((a->ad == 0) ? "[r%d+]" : "[-r%d]", a->rd);
-prt(", r%d", a->rs);
+if (a->ad) {
+prt("movu.%c\t[-r%d], r%d", size[a->sz], a->rd, a->rs);
+} else {
+prt("movu.%c\t[r%d+], r%d", size[a->sz], a->rd, a->rs);
+}
 return true;
 }
 
@@ -478,11 +482,11 @@ static bool trans_TST_mr(DisasContext *ctx, arg_TST_mr *a)
 /* not rs, rd */
 static bool trans_NOT_rr(DisasContext *ctx, arg_NOT_rr *a)
 {
-prt("not\t");
 if (a->rs != a->rd) {
-prt("r%d, ", a->rs);
+prt("not\tr%d, r%d", a->rs, a->rd);
+} else {
+prt("not\tr%d", a->rs);
 }
-prt("r%d", a->rd);
 return true;
 }
 
@@ -490,11 +494,11 @@ static bool trans_NOT_rr(DisasContext *ctx, arg_NOT_rr *a)
 /* neg rs, rd */
 static bool trans_NEG_rr(DisasContext *ctx, arg_NEG_rr *a)
 {
-prt("neg\t");
 if (a->rs != a->rd) {
-prt("r%d, ", a->rs);
+prt("neg\tr%d, r%d", a->rs, a->rd);
+} else {
+prt("neg\tr%d", a->rs);
 }
-prt("r%d", a->rd);
 return true;
 }
 
@@ -606,11 +610,10 @@ static bool trans_SBB_mr(DisasContext *ctx, arg_SBB_mr *a)
 /* abs rs, rd */
 static bool trans_ABS_rr(DisasContext *ctx, arg_ABS_rr *a)
 {
-prt("abs\t");
-if (a->rs == a->rd) {
-prt("r%d", a->rd);
+if (a->rs != a->rd) {
+prt("abs\tr%d, r%d", a->rs, a->rd);
 } else {
-prt("r%d, r%d", a->rs, a->rd);
+prt("abs\tr%d", a->rs);
 }
 return true;
 }
@@ -733,11 +736,11 @@ static bool trans_DIVU_mr(DisasContext *ctx, arg_DIVU_mr *a)
 /* shll #imm:5, rs, rd */
 static bool trans_SHLL_irr(DisasContext *ctx, arg_SHLL_irr *a)
 {
-prt("shll\t#%d, ", a->imm);
 if (a->rs2 != a->rd) {
-prt("r%d, ", a->rs2);
+prt("shll\t#%d, r%d, r%d", a->imm, a->rs2, a->rd);
+} else {
+prt("shll\t#%d, r%d", a->imm, a->rd);
 }
-prt("r%d", a->rd);
 return true;
 }
 
@@ -752,11 +755,11 @@ static bool trans_SHLL_rr(DisasContext *ctx, arg_SHLL_rr *a)
 /* shar #imm:5, rs, rd */
 static bool trans_SHAR_irr(DisasContext *ctx, arg_SHAR_irr

[PATCH v30 10/22] target/rx: Use prt_ldmi for XCHG_mr disassembly

2020-02-12 Thread Yoshinori Sato
From: Richard Henderson 

Note that the ld == 3 case handled by prt_ldmi is decoded as
XCHG_rr and cannot appear here.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Yoshinori Sato 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-21-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/disas.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/rx/disas.c b/target/rx/disas.c
index 515b365528..db10385fd0 100644
--- a/target/rx/disas.c
+++ b/target/rx/disas.c
@@ -366,13 +366,7 @@ static bool trans_XCHG_rr(DisasContext *ctx, arg_XCHG_rr *a)
 /* xchg dsp[rs].,rd */
 static bool trans_XCHG_mr(DisasContext *ctx, arg_XCHG_mr *a)
 {
-static const char msize[][4] = {
-"b", "w", "l", "ub", "uw",
-};
-char dsp[8];
-
-rx_index_addr(ctx, dsp, a->ld, a->mi);
-prt("xchg\t%s[r%d].%s, r%d", dsp, a->rs, msize[a->mi], a->rd);
+prt_ldmi(ctx, "xchg", a->ld, a->mi, a->rs, a->rd);
 return true;
 }
 
-- 
2.20.1




[PATCH v30 13/22] target/rx: Dump bytes for each insn during disassembly

2020-02-12 Thread Yoshinori Sato
From: Richard Henderson 

There are so many different forms of each RX instruction
that it will be very useful to be able to look at the bytes
to see on which path a bug may lie.
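
A minimal sketch of the byte-dump prefix, assuming a buffer-based variant rather than the `fprintf_func` callback used in the patch. The name `format_bytes` is hypothetical; the sketch assumes at most 8 instruction bytes and a destination buffer of at least 26 bytes.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical buffer-based variant of dump_bytes(): write each opcode byte
 * as two hex digits plus a space, then pad so the mnemonic column lines up.
 * Assumes len <= 8, matching the (8 - i) * 3 padding in the patch, and that
 * buf is large enough to hold the result (26 bytes suffices).
 */
static int format_bytes(char *buf, size_t size, const uint8_t *bytes, int len)
{
    int i, n = 0;

    for (i = 0; i < len; ++i) {
        n += snprintf(buf + n, size - n, "%02x ", bytes[i]);
    }
    /* "%*c" right-justifies the tab in a field (8 - len) * 3 wide. */
    n += snprintf(buf + n, size - n, "%*c", (8 - len) * 3, '\t');
    return n;
}
```

Each byte takes three columns, so the prefix ends in a tab at a fixed position regardless of instruction length, and the mnemonics that follow line up.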

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Yoshinori Sato 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-24-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/rx/disas.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/target/rx/disas.c b/target/rx/disas.c
index 5a32a87534..d73b53db44 100644
--- a/target/rx/disas.c
+++ b/target/rx/disas.c
@@ -102,7 +102,21 @@ static int bdsp_s(DisasContext *ctx, int d)
 /* Include the auto-generated decoder.  */
 #include "decode.inc.c"
 
-#define prt(...) (ctx->dis->fprintf_func)((ctx->dis->stream), __VA_ARGS__)
+static void dump_bytes(DisasContext *ctx)
+{
+int i, len = ctx->len;
+
+for (i = 0; i < len; ++i) {
+ctx->dis->fprintf_func(ctx->dis->stream, "%02x ", ctx->bytes[i]);
+}
+ctx->dis->fprintf_func(ctx->dis->stream, "%*c", (8 - i) * 3, '\t');
+}
+
+#define prt(...) \
+do {\
+dump_bytes(ctx);\
+ctx->dis->fprintf_func(ctx->dis->stream, __VA_ARGS__);  \
+} while (0)
 
 #define RX_MEMORY_BYTE 0
 #define RX_MEMORY_WORD 1
-- 
2.20.1




[PATCH v30 22/22] qemu-doc.texi: Add RX section.

2020-02-12 Thread Yoshinori Sato
Describe the emulated target specification and add two usage examples.

Signed-off-by: Yoshinori Sato 
---
 qemu-doc.texi | 44 
 1 file changed, 44 insertions(+)

diff --git a/qemu-doc.texi b/qemu-doc.texi
index a1ef6b6484..0b2173daa7 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1719,6 +1719,7 @@ differences are mentioned in the following sections.
 * Microblaze System emulator::
 * SH4 System emulator::
 * Xtensa System emulator::
+* RX System emulator::
 @end menu
 
 @node PowerPC System emulator
@@ -2487,6 +2488,49 @@ so should only be used with trusted guest OS.
 
 @c man end
 
+@node RX System emulator
+@section RX System emulator
+@cindex system emulation (RX)
+
+Use the executable @file{qemu-system-rx} to simulate a Virtual RX target.
+This target emulates the following devices.
+
+@itemize @minus
+@item
+R5F562N8 MCU
+@item
+On-chip memory (ROM 512KB, RAM 96KB)
+@item
+Interrupt Control Unit (ICUa)
+@item
+8Bit Timer x 1CH (TMR0,1)
+@item
+Compare Match Timer x 2CH (CMT0,1)
+@item
+Serial Communication Interface x 1CH (SCI0)
+@item
+External memory 16MByte
+@end itemize
+
+Example of @file{qemu-system-rx} usage for rx is shown below:
+
+Download @code{u-boot_image} from @url{https://osdn.net/users/ysato/pf/qemu/dl/u-boot.bin.gz}
+
+Start emulation of rx-virt:
+@example
+qemu-system-rx -bios @code{u-boot_image}
+@end example
+
+Download @code{kernel_image} from @url{https://osdn.net/users/ysato/pf/qemu/dl/zImage}
+
+Download @code{device_tree_blob} from @url{https://osdn.net/users/ysato/pf/qemu/dl/rx-virt.dtb}
+
+Start emulation of rx-virt:
+@example
+qemu-system-rx -kernel @code{kernel_image} -dtb @code{device_tree_blob} \
+  -append "earlycon"
+@end example
+
 @node QEMU User space emulator
 @chapter QEMU User space emulator
 
-- 
2.20.1




[PATCH v30 05/22] target/rx: TCG helper

2020-02-12 Thread Yoshinori Sato
Signed-off-by: Yoshinori Sato 

Message-Id: <20190616142836.10614-3-ys...@users.sourceforge.jp>
Reviewed-by: Richard Henderson 
Message-Id: <20190607091116.49044-3-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
[PMD: Removed tlb_fill, extracted from patch of Yoshinori Sato
 'Convert to CPUClass::tlb_fill']
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Yoshinori Sato 
---
 target/rx/helper.h|  31 +++
 target/rx/helper.c| 149 +
 target/rx/op_helper.c | 470 ++
 3 files changed, 650 insertions(+)
 create mode 100644 target/rx/helper.h
 create mode 100644 target/rx/helper.c
 create mode 100644 target/rx/op_helper.c

diff --git a/target/rx/helper.h b/target/rx/helper.h
new file mode 100644
index 00..f0b7ebbbf7
--- /dev/null
+++ b/target/rx/helper.h
@@ -0,0 +1,31 @@
+DEF_HELPER_1(raise_illegal_instruction, noreturn, env)
+DEF_HELPER_1(raise_access_fault, noreturn, env)
+DEF_HELPER_1(raise_privilege_violation, noreturn, env)
+DEF_HELPER_1(wait, noreturn, env)
+DEF_HELPER_1(debug, noreturn, env)
+DEF_HELPER_2(rxint, noreturn, env, i32)
+DEF_HELPER_1(rxbrk, noreturn, env)
+DEF_HELPER_FLAGS_3(fadd, TCG_CALL_NO_WG, f32, env, f32, f32)
+DEF_HELPER_FLAGS_3(fsub, TCG_CALL_NO_WG, f32, env, f32, f32)
+DEF_HELPER_FLAGS_3(fmul, TCG_CALL_NO_WG, f32, env, f32, f32)
+DEF_HELPER_FLAGS_3(fdiv, TCG_CALL_NO_WG, f32, env, f32, f32)
+DEF_HELPER_FLAGS_3(fcmp, TCG_CALL_NO_WG, void, env, f32, f32)
+DEF_HELPER_FLAGS_2(ftoi, TCG_CALL_NO_WG, i32, env, f32)
+DEF_HELPER_FLAGS_2(round, TCG_CALL_NO_WG, i32, env, f32)
+DEF_HELPER_FLAGS_2(itof, TCG_CALL_NO_WG, f32, env, i32)
+DEF_HELPER_2(set_fpsw, void, env, i32)
+DEF_HELPER_FLAGS_2(racw, TCG_CALL_NO_WG, void, env, i32)
+DEF_HELPER_FLAGS_2(set_psw_rte, TCG_CALL_NO_WG, void, env, i32)
+DEF_HELPER_FLAGS_2(set_psw, TCG_CALL_NO_WG, void, env, i32)
+DEF_HELPER_1(pack_psw, i32, env)
+DEF_HELPER_FLAGS_3(div, TCG_CALL_NO_WG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(divu, TCG_CALL_NO_WG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_1(scmpu, TCG_CALL_NO_WG, void, env)
+DEF_HELPER_1(smovu, void, env)
+DEF_HELPER_1(smovf, void, env)
+DEF_HELPER_1(smovb, void, env)
+DEF_HELPER_2(sstr, void, env, i32)
+DEF_HELPER_FLAGS_2(swhile, TCG_CALL_NO_WG, void, env, i32)
+DEF_HELPER_FLAGS_2(suntil, TCG_CALL_NO_WG, void, env, i32)
+DEF_HELPER_FLAGS_2(rmpa, TCG_CALL_NO_WG, void, env, i32)
+DEF_HELPER_1(satr, void, env)
diff --git a/target/rx/helper.c b/target/rx/helper.c
new file mode 100644
index 00..a6a337a311
--- /dev/null
+++ b/target/rx/helper.c
@@ -0,0 +1,149 @@
+/*
+ *  RX emulation
+ *
+ *  Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitops.h"
+#include "cpu.h"
+#include "exec/log.h"
+#include "exec/cpu_ldst.h"
+#include "sysemu/sysemu.h"
+#include "hw/irq.h"
+
+void rx_cpu_unpack_psw(CPURXState *env, uint32_t psw, int rte)
+{
+if (env->psw_pm == 0) {
+env->psw_ipl = FIELD_EX32(psw, PSW, IPL);
+if (rte) {
+/* PSW.PM can write RTE and RTFI */
+env->psw_pm = FIELD_EX32(psw, PSW, PM);
+}
+env->psw_u = FIELD_EX32(psw, PSW, U);
+env->psw_i = FIELD_EX32(psw, PSW, I);
+}
+env->psw_o = FIELD_EX32(psw, PSW, O) << 31;
+env->psw_s = FIELD_EX32(psw, PSW, S) << 31;
+env->psw_z = 1 - FIELD_EX32(psw, PSW, Z);
+env->psw_c = FIELD_EX32(psw, PSW, C);
+}
+
+#define INT_FLAGS (CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIR)
+void rx_cpu_do_interrupt(CPUState *cs)
+{
+RXCPU *cpu = RXCPU(cs);
+CPURXState *env = &cpu->env;
+int do_irq = cs->interrupt_request & INT_FLAGS;
+uint32_t save_psw;
+
+env->in_sleep = 0;
+
+if (env->psw_u) {
+env->usp = env->regs[0];
+} else {
+env->isp = env->regs[0];
+}
+save_psw = rx_cpu_pack_psw(env);
+env->psw_pm = env->psw_i = env->psw_u = 0;
+
+if (do_irq) {
+if (do_irq & CPU_INTERRUPT_FIR) {
+env->bpc = env->pc;
+env->bpsw = save_psw;
+env->pc = env->fintv;
+env->psw_ipl = 15;
+cs->interrupt_request &= ~CPU_INTERRUPT_FIR;
+qemu_set_irq(env->ack, env->ack_irq);
+qemu_log_mask(CPU_LOG_INT, "fast interrupt raised\n");
+} else if (do_irq & CPU_INTERRUPT_HARD) {
+env->isp -= 4;
+cpu_

[PATCH v30 16/22] hw/char: RX62N serial communication interface (SCI)

2020-02-12 Thread Yoshinori Sato
This module supports only the non-FIFO type.
Hardware manual:
https://www.renesas.com/us/en/doc/products/mpumcu/doc/rx_family/r01uh0033ej0140_rx62n.pdf

Signed-off-by: Yoshinori Sato 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-8-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/hw/char/renesas_sci.h |  45 +
 hw/char/renesas_sci.c | 342 ++
 hw/char/Kconfig   |   3 +
 hw/char/Makefile.objs |   1 +
 4 files changed, 391 insertions(+)
 create mode 100644 include/hw/char/renesas_sci.h
 create mode 100644 hw/char/renesas_sci.c

diff --git a/include/hw/char/renesas_sci.h b/include/hw/char/renesas_sci.h
new file mode 100644
index 00..50d1336944
--- /dev/null
+++ b/include/hw/char/renesas_sci.h
@@ -0,0 +1,45 @@
+/*
+ * Renesas Serial Communication Interface
+ *
+ * Copyright (c) 2018 Yoshinori Sato
+ *
+ * This code is licensed under the GPL version 2 or later.
+ *
+ */
+
+#include "chardev/char-fe.h"
+#include "qemu/timer.h"
+#include "hw/sysbus.h"
+
+#define TYPE_RENESAS_SCI "renesas-sci"
+#define RSCI(obj) OBJECT_CHECK(RSCIState, (obj), TYPE_RENESAS_SCI)
+
+enum {
+ERI = 0,
+RXI = 1,
+TXI = 2,
+TEI = 3,
+SCI_NR_IRQ = 4,
+};
+
+typedef struct {
+SysBusDevice parent_obj;
+MemoryRegion memory;
+
+uint8_t smr;
+uint8_t brr;
+uint8_t scr;
+uint8_t tdr;
+uint8_t ssr;
+uint8_t rdr;
+uint8_t scmr;
+uint8_t semr;
+
+uint8_t read_ssr;
+int64_t trtime;
+int64_t rx_next;
+QEMUTimer *timer;
+CharBackend chr;
+uint64_t input_freq;
+qemu_irq irq[SCI_NR_IRQ];
+} RSCIState;
diff --git a/hw/char/renesas_sci.c b/hw/char/renesas_sci.c
new file mode 100644
index 00..0760a51f43
--- /dev/null
+++ b/hw/char/renesas_sci.c
@@ -0,0 +1,342 @@
+/*
+ * Renesas Serial Communication Interface
+ *
+ * Datasheet: RX62N Group, RX621 Group User's Manual: Hardware
+ * (Rev.1.40 R01UH0033EJ0140)
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "hw/hw.h"
+#include "hw/irq.h"
+#include "hw/sysbus.h"
+#include "hw/registerfields.h"
+#include "hw/qdev-properties.h"
+#include "hw/char/renesas_sci.h"
+#include "migration/vmstate.h"
+#include "qemu/error-report.h"
+
+/* SCI register map */
+REG8(SMR, 0)
+  FIELD(SMR, CKS,  0, 2)
+  FIELD(SMR, MP,   2, 1)
+  FIELD(SMR, STOP, 3, 1)
+  FIELD(SMR, PM,   4, 1)
+  FIELD(SMR, PE,   5, 1)
+  FIELD(SMR, CHR,  6, 1)
+  FIELD(SMR, CM,   7, 1)
+REG8(BRR, 1)
+REG8(SCR, 2)
+  FIELD(SCR, CKE, 0, 2)
+  FIELD(SCR, TEIE, 2, 1)
+  FIELD(SCR, MPIE, 3, 1)
+  FIELD(SCR, RE,   4, 1)
+  FIELD(SCR, TE,   5, 1)
+  FIELD(SCR, RIE,  6, 1)
+  FIELD(SCR, TIE,  7, 1)
+REG8(TDR, 3)
+REG8(SSR, 4)
+  FIELD(SSR, MPBT, 0, 1)
+  FIELD(SSR, MPB,  1, 1)
+  FIELD(SSR, TEND, 2, 1)
+  FIELD(SSR, ERR, 3, 3)
+FIELD(SSR, PER,  3, 1)
+FIELD(SSR, FER,  4, 1)
+FIELD(SSR, ORER, 5, 1)
+  FIELD(SSR, RDRF, 6, 1)
+  FIELD(SSR, TDRE, 7, 1)
+REG8(RDR, 5)
+REG8(SCMR, 6)
+  FIELD(SCMR, SMIF, 0, 1)
+  FIELD(SCMR, SINV, 2, 1)
+  FIELD(SCMR, SDIR, 3, 1)
+  FIELD(SCMR, BCP2, 7, 1)
+REG8(SEMR, 7)
+  FIELD(SEMR, ACS0, 0, 1)
+  FIELD(SEMR, ABCS, 4, 1)
+
+static int can_receive(void *opaque)
+{
+RSCIState *sci = RSCI(opaque);
+if (sci->rx_next > qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)) {
+return 0;
+} else {
+return FIELD_EX8(sci->scr, SCR, RE);
+}
+}
+
+static void receive(void *opaque, const uint8_t *buf, int size)
+{
+RSCIState *sci = RSCI(opaque);
+sci->rx_next = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + sci->trtime;
+if (FIELD_EX8(sci->ssr, SSR, RDRF) || size > 1) {
+sci->ssr = FIELD_DP8(sci->ssr, SSR, ORER, 1);
+if (FIELD_EX8(sci->scr, SCR, RIE)) {
+qemu_set_irq(sci->irq[ERI], 1);
+}
+} else {
+sci->rdr = buf[0];
+sci->ssr = FIELD_DP8(sci->ssr, SSR, RDRF, 1);
+if (FIELD_EX8(sci->scr, SCR, RIE)) {
+qemu_irq_pulse(sci->irq[RXI]);
+}
+}
+}
+
+static void send_byte(RSCIState *sci)
+{
+if (qemu_chr_fe_backend_connected(&sci->chr)) {
+qemu_chr_fe_write_all(&sci->chr, &sci->tdr, 1);
+}
+timer_mod(sci->timer,
+  qemu_

[PATCH v30 18/22] hw/rx: Honor -accel qtest

2020-02-12 Thread Yoshinori Sato
From: Richard Henderson 

Issue an error if no kernel, no bios, and not qtest'ing.
Fixes make check-qtest-rx: test/qom-test.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Yoshinori Sato 
Message-Id: <20190607091116.49044-16-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
We could squash this with the previous patch
---
 hw/rx/rx62n.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/rx/rx62n.c b/hw/rx/rx62n.c
index bd4cd4b6ea..c488934f09 100644
--- a/hw/rx/rx62n.c
+++ b/hw/rx/rx62n.c
@@ -21,12 +21,14 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "hw/hw.h"
 #include "hw/rx/rx62n.h"
 #include "hw/loader.h"
 #include "hw/sysbus.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/qtest.h"
 #include "cpu.h"
 
 /*
@@ -191,8 +193,14 @@ static void rx62n_realize(DeviceState *dev, Error **errp)
 memory_region_init_rom(&s->c_flash, NULL, "codeflash",
RX62N_CFLASH_SIZE, errp);
 memory_region_add_subregion(s->sysmem, RX62N_CFLASH_BASE, &s->c_flash);
+
 if (!s->kernel) {
-rom_add_file_fixed(bios_name, RX62N_CFLASH_BASE, 0);
+if (bios_name) {
+rom_add_file_fixed(bios_name, RX62N_CFLASH_BASE, 0);
+}  else if (!qtest_enabled()) {
+error_report("No bios or kernel specified");
+exit(1);
+}
 }
 
 /* Initialize CPU */
-- 
2.20.1
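
The decision this patch adds to rx62n_realize() can be read as a small truth table. Below is a minimal standalone sketch of that logic, not the QEMU code itself: the enum, the function name choose_firmware() and its parameters are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome names, not part of QEMU. */
enum fw_action {
    FW_USE_KERNEL,   /* a kernel was requested; nothing to load into flash here */
    FW_LOAD_BIOS,    /* load the BIOS image into code flash */
    FW_EMPTY_FLASH,  /* qtest run: leave the flash empty and continue */
    FW_FATAL         /* nothing to run: report an error and exit */
};

/*
 * Mirrors the branch structure of the hunk above:
 * kernel first, then bios, then the qtest exemption.
 */
static enum fw_action choose_firmware(bool have_kernel, const char *bios_name,
                                      bool qtest_enabled)
{
    if (have_kernel) {
        return FW_USE_KERNEL;
    }
    if (bios_name != NULL) {
        return FW_LOAD_BIOS;
    }
    if (qtest_enabled) {
        return FW_EMPTY_FLASH;
    }
    return FW_FATAL;
}
```

Only the last case (no kernel, no bios, not under qtest) is treated as fatal, which is what lets test/qom-test instantiate the machine without firmware.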




[PATCH v30 03/22] hw/registerfields.h: Add 8bit and 16bit register macros

2020-02-12 Thread Yoshinori Sato
From: Philippe Mathieu-Daudé 

Some RX peripherals use 8-bit and 16-bit registers.
Add 8-bit and 16-bit APIs.

Signed-off-by: Yoshinori Sato 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20190607091116.49044-11-ys...@users.sourceforge.jp>
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 include/hw/registerfields.h | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/hw/registerfields.h b/include/hw/registerfields.h
index 2659a58737..a0bb0654d6 100644
--- a/include/hw/registerfields.h
+++ b/include/hw/registerfields.h
@@ -22,6 +22,14 @@
 enum { A_ ## reg = (addr) };  \
 enum { R_ ## reg = (addr) / 4 };
 
+#define REG8(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) };
+
+#define REG16(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) / 2 };
+
 /* Define SHIFT, LENGTH and MASK constants for a field within a register */
 
 /* This macro will define R_FOO_BAR_MASK, R_FOO_BAR_SHIFT and R_FOO_BAR_LENGTH
@@ -34,6 +42,12 @@
 MAKE_64BIT_MASK(shift, length)};
 
 /* Extract a field from a register */
+#define FIELD_EX8(storage, reg, field)\
+extract8((storage), R_ ## reg ## _ ## field ## _SHIFT,\
+  R_ ## reg ## _ ## field ## _LENGTH)
+#define FIELD_EX16(storage, reg, field)   \
+extract16((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
+  R_ ## reg ## _ ## field ## _LENGTH)
 #define FIELD_EX32(storage, reg, field)   \
 extract32((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
   R_ ## reg ## _ ## field ## _LENGTH)
@@ -49,6 +63,22 @@
  * Assigning values larger then the target field will result in
  * compilation warnings.
  */
+#define FIELD_DP8(storage, reg, field, val) ({\
+struct {  \
+unsigned int v:R_ ## reg ## _ ## field ## _LENGTH;\
+} v = { .v = val };   \
+uint8_t d;\
+d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
+  R_ ## reg ## _ ## field ## _LENGTH, v.v);   \
+d; })
+#define FIELD_DP16(storage, reg, field, val) ({   \
+struct {  \
+unsigned int v:R_ ## reg ## _ ## field ## _LENGTH;\
+} v = { .v = val };   \
+uint16_t d;   \
+d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
+  R_ ## reg ## _ ## field ## _LENGTH, v.v);   \
+d; })
 #define FIELD_DP32(storage, reg, field, val) ({   \
 struct {  \
 unsigned int v:R_ ## reg ## _ ## field ## _LENGTH;\
@@ -57,7 +87,7 @@
 d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
   R_ ## reg ## _ ## field ## _LENGTH, v.v);   \
 d; })
-#define FIELD_DP64(storage, reg, field, val) ({   \
+#define FIELD_DP64(storage, reg, field, val) ({ \
 struct {  \
 unsigned int v:R_ ## reg ## _ ## field ## _LENGTH;\
 } v = { .v = val };   \
-- 
2.20.1
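
As a rough illustration of what the new 8-bit accessors compute, here is a standalone sketch of field extract and deposit on a uint8_t. It does not use the QEMU macros: field_ex8() and field_dp8() are hypothetical plain functions that take the shift and length explicitly, whereas FIELD_EX8/FIELD_DP8 derive them from the R_reg_field_SHIFT and R_reg_field_LENGTH constants. The bit positions in the usage note are taken from the SSR layout in the SCI patch of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `length` bits starting at bit `shift` from an 8-bit register. */
static uint8_t field_ex8(uint8_t storage, unsigned shift, unsigned length)
{
    uint8_t mask = (uint8_t)((1u << length) - 1);
    return (uint8_t)((storage >> shift) & mask);
}

/* Return `storage` with the field at `shift`/`length` replaced by `val`. */
static uint8_t field_dp8(uint8_t storage, unsigned shift, unsigned length,
                         uint8_t val)
{
    uint8_t mask = (uint8_t)(((1u << length) - 1) << shift);
    return (uint8_t)((storage & ~mask) | ((val << shift) & mask));
}
```

For example, with SSR.RDRF at bit 6 (length 1), field_dp8(ssr, 6, 1, 1) sets the flag and field_ex8(ssr, 6, 1) reads it back, which is the pattern receive() in the SCI patch follows.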




[PATCH v30 06/22] target/rx: CPU definition

2020-02-12 Thread Yoshinori Sato
Signed-off-by: Yoshinori Sato 

Message-Id: <20190616142836.10614-4-ys...@users.sourceforge.jp>
Reviewed-by: Richard Henderson 
Message-Id: <20190607091116.49044-4-ys...@users.sourceforge.jp>
Signed-off-by: Richard Henderson 
[PMD: Use newer QOM style, split cpu-qom.h, restrict access to
 extable array, use rx_cpu_tlb_fill() extracted from patch of
 Yoshinori Sato 'Convert to CPUClass::tlb_fill']
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Igor Mammedov 
Signed-off-by: Yoshinori Sato 
---
 target/rx/cpu-param.h   |  31 ++
 target/rx/cpu-qom.h |  42 
 target/rx/cpu.h | 181 +
 target/rx/cpu.c | 218 
 target/rx/gdbstub.c | 112 +
 gdb-xml/rx-core.xml |  70 +
 target/rx/Makefile.objs |   1 -
 7 files changed, 654 insertions(+), 1 deletion(-)
 create mode 100644 target/rx/cpu-param.h
 create mode 100644 target/rx/cpu-qom.h
 create mode 100644 target/rx/cpu.h
 create mode 100644 target/rx/cpu.c
 create mode 100644 target/rx/gdbstub.c
 create mode 100644 gdb-xml/rx-core.xml

diff --git a/target/rx/cpu-param.h b/target/rx/cpu-param.h
new file mode 100644
index 00..5da87fbebe
--- /dev/null
+++ b/target/rx/cpu-param.h
@@ -0,0 +1,31 @@
+/*
+ *  RX cpu parameters
+ *
+ *  Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#ifndef RX_CPU_PARAM_H
+#define RX_CPU_PARAM_H
+
+#define TARGET_LONG_BITS 32
+#define TARGET_PAGE_BITS 12
+
+#define TARGET_PHYS_ADDR_SPACE_BITS 32
+#define TARGET_VIRT_ADDR_SPACE_BITS 32
+
+#define NB_MMU_MODES 1
+#define MMU_MODE0_SUFFIX _all
+
+#endif
diff --git a/target/rx/cpu-qom.h b/target/rx/cpu-qom.h
new file mode 100644
index 00..8328900f3f
--- /dev/null
+++ b/target/rx/cpu-qom.h
@@ -0,0 +1,42 @@
+#ifndef QEMU_RX_CPU_QOM_H
+#define QEMU_RX_CPU_QOM_H
+
+#include "hw/core/cpu.h"
+/*
+ * RX CPU
+ *
+ * Copyright (c) 2019 Yoshinori Sato
+ * SPDX-License-Identifier: LGPL-2.0+
+ */
+
+#define TYPE_RX_CPU "rx-cpu"
+
+#define TYPE_RX62N_CPU RX_CPU_TYPE_NAME("rx62n")
+
+#define RXCPU_CLASS(klass) \
+OBJECT_CLASS_CHECK(RXCPUClass, (klass), TYPE_RX_CPU)
+#define RXCPU(obj) \
+OBJECT_CHECK(RXCPU, (obj), TYPE_RX_CPU)
+#define RXCPU_GET_CLASS(obj) \
+OBJECT_GET_CLASS(RXCPUClass, (obj), TYPE_RX_CPU)
+
+/*
+ * RXCPUClass:
+ * @parent_realize: The parent class' realize handler.
+ * @parent_reset: The parent class' reset handler.
+ *
+ * A RX CPU model.
+ */
+typedef struct RXCPUClass {
+/*< private >*/
+CPUClass parent_class;
+/*< public >*/
+
+DeviceRealize parent_realize;
+void (*parent_reset)(CPUState *cpu);
+
+} RXCPUClass;
+
+#define CPUArchState struct CPURXState
+
+#endif
diff --git a/target/rx/cpu.h b/target/rx/cpu.h
new file mode 100644
index 00..2d1eb7665c
--- /dev/null
+++ b/target/rx/cpu.h
@@ -0,0 +1,181 @@
+/*
+ *  RX emulation definition
+ *
+ *  Copyright (c) 2019 Yoshinori Sato
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#ifndef RX_CPU_H
+#define RX_CPU_H
+
+#include "qemu/bitops.h"
+#include "qemu-common.h"
+#include "hw/registerfields.h"
+#include "cpu-qom.h"
+
+#include "exec/cpu-defs.h"
+
+/* PSW define */
+REG32(PSW, 0)
+FIELD(PSW, C, 0, 1)
+FIELD(PSW, Z, 1, 1)
+FIELD(PSW, S, 2, 1)
+FIELD(PSW, O, 3, 1)
+FIELD(PSW, I, 16, 1)
+FIELD(PSW, U, 17, 1)
+FIELD(PSW, PM, 20, 1)
+FIELD(PSW, IPL, 24, 4)
+
+/* FPSW define */
+REG32(FPSW, 0)
+FIELD(FPSW, RM, 0, 2)
+FIELD(FPSW, CV, 2, 1)
+FIELD(FPSW, CO, 3, 1)
+FIELD(FPSW, CZ, 4, 1)
+FIELD(FPSW, CU, 5, 1)
+FIELD(FPSW, CX, 6, 1)
+FIELD(FPSW, CE, 7, 1)
+FIELD(FPSW, CAUSE, 2, 6)
+FIELD(FPSW, DN, 8, 1)
+FIELD(FPSW, EV, 10, 1)
+FIELD(FPSW, EO, 11, 1)
+FIELD(FPSW, EZ, 12, 1)
+FIELD(FPSW, EU, 13, 1)
+FIELD(FPSW, EX, 14, 1)
+FIELD(FPSW, ENABLE, 10, 5)
+FIELD(FPSW, FV, 26, 1)
+FIELD(FPSW, FO, 2

Re: [PATCH] qapi: Expand documentation for LostTickPolicy

2020-02-12 Thread Ján Tomko

On Tue, Feb 11, 2020 at 07:37:44PM +0100, Andrea Bolognani wrote:

The current documentation is fairly terse and not easy to decode
for someone who's not intimately familiar with the inner workings
of timer devices. Expand on it by providing a somewhat verbose


Perchance exorbitantly circumlocutory, but definitely an improvement.


description of what behavior each policy will result in, as seen
from both the guest OS and host point of view.

Signed-off-by: Andrea Bolognani 
---
This information is reported pretty much word by word in

 https://libvirt.org/formatdomain.html#elementsTime

so I'm hoping I can get the QEMU documentation updated and then just
merge back the changes.

qapi/misc.json | 34 +++---
1 file changed, 23 insertions(+), 11 deletions(-)


Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature


[PATCH v3 3/4] linux-user: fix TARGET_NSIG and _NSIG uses

2020-02-12 Thread Laurent Vivier
Valid signal numbers are between 1 (SIGHUP) and SIGRTMAX.

System includes define _NSIG to SIGRTMAX + 1, but
QEMU (like kernel) defines TARGET_NSIG to TARGET_SIGRTMAX.

Fix all the checks involving the signal range.

Signed-off-by: Laurent Vivier 
Reviewed-by: Peter Maydell 
---

Notes:
v2: replace i, j by target_sig, host_sig

 linux-user/signal.c | 52 -
 1 file changed, 37 insertions(+), 15 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 246315571c09..c1e664f97a7c 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -30,6 +30,15 @@ static struct target_sigaction sigact_table[TARGET_NSIG];
 static void host_signal_handler(int host_signum, siginfo_t *info,
 void *puc);
 
+
+/*
+ * System includes define _NSIG as SIGRTMAX + 1,
+ * but qemu (like the kernel) defines TARGET_NSIG as TARGET_SIGRTMAX
+ * and the first signal is SIGHUP defined as 1
+ * Signal number 0 is reserved for use as kill(pid, 0), to test whether
+ * a process exists without sending it a signal.
+ */
+QEMU_BUILD_BUG_ON(__SIGRTMAX + 1 != _NSIG);
 static uint8_t host_to_target_signal_table[_NSIG] = {
 [SIGHUP] = TARGET_SIGHUP,
 [SIGINT] = TARGET_SIGINT,
@@ -67,19 +76,24 @@ static uint8_t host_to_target_signal_table[_NSIG] = {
 [SIGSYS] = TARGET_SIGSYS,
 /* next signals stay the same */
 };
-static uint8_t target_to_host_signal_table[_NSIG];
 
+static uint8_t target_to_host_signal_table[TARGET_NSIG + 1];
+
+/* valid sig is between 1 and _NSIG - 1 */
 int host_to_target_signal(int sig)
 {
-if (sig < 0 || sig >= _NSIG)
+if (sig < 1 || sig >= _NSIG) {
 return sig;
+}
 return host_to_target_signal_table[sig];
 }
 
+/* valid sig is between 1 and TARGET_NSIG */
 int target_to_host_signal(int sig)
 {
-if (sig < 0 || sig >= _NSIG)
+if (sig < 1 || sig > TARGET_NSIG) {
 return sig;
+}
 return target_to_host_signal_table[sig];
 }
 
@@ -100,11 +114,15 @@ static inline int target_sigismember(const target_sigset_t *set, int signum)
 void host_to_target_sigset_internal(target_sigset_t *d,
 const sigset_t *s)
 {
-int i;
+int host_sig, target_sig;
 target_sigemptyset(d);
-for (i = 1; i <= TARGET_NSIG; i++) {
-if (sigismember(s, i)) {
-target_sigaddset(d, host_to_target_signal(i));
+for (host_sig = 1; host_sig < _NSIG; host_sig++) {
+target_sig = host_to_target_signal(host_sig);
+if (target_sig < 1 || target_sig > TARGET_NSIG) {
+continue;
+}
+if (sigismember(s, host_sig)) {
+target_sigaddset(d, target_sig);
 }
 }
 }
@@ -122,11 +140,15 @@ void host_to_target_sigset(target_sigset_t *d, const sigset_t *s)
 void target_to_host_sigset_internal(sigset_t *d,
 const target_sigset_t *s)
 {
-int i;
+int host_sig, target_sig;
 sigemptyset(d);
-for (i = 1; i <= TARGET_NSIG; i++) {
-if (target_sigismember(s, i)) {
-sigaddset(d, target_to_host_signal(i));
+for (target_sig = 1; target_sig <= TARGET_NSIG; target_sig++) {
+host_sig = target_to_host_signal(target_sig);
+if (host_sig < 1 || host_sig >= _NSIG) {
+continue;
+}
+if (target_sigismember(s, target_sig)) {
+sigaddset(d, host_sig);
 }
 }
 }
@@ -492,10 +514,10 @@ static void signal_table_init(void)
 if (host_to_target_signal_table[host_sig] == 0) {
 host_to_target_signal_table[host_sig] = host_sig;
 }
-}
-for (host_sig = 1; host_sig < _NSIG; host_sig++) {
 target_sig = host_to_target_signal_table[host_sig];
-target_to_host_signal_table[target_sig] = host_sig;
+if (target_sig <= TARGET_NSIG) {
+target_to_host_signal_table[target_sig] = host_sig;
+}
 }
 }
 
@@ -518,7 +540,7 @@ void signal_init(void)
 act.sa_sigaction = host_signal_handler;
 for(i = 1; i <= TARGET_NSIG; i++) {
 #ifdef TARGET_GPROF
-if (i == SIGPROF) {
+if (i == TARGET_SIGPROF) {
 continue;
 }
 #endif
-- 
2.24.1
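
The off-by-one this patch addresses comes from two different conventions: the host's _NSIG is one past the last valid signal, while TARGET_NSIG is itself the last valid signal. A minimal sketch of the two range checks follows, assuming illustrative constants (HOST_NSIG and the value 64 for TARGET_NSIG are hypothetical stand-ins, not taken from any particular host or target header).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for illustration only. */
#define HOST_NSIG   65  /* like _NSIG: SIGRTMAX + 1, an exclusive bound */
#define TARGET_NSIG 64  /* like TARGET_SIGRTMAX: an inclusive bound */

/* Valid host signals are 1 .. HOST_NSIG - 1. */
static bool host_sig_valid(int sig)
{
    return sig >= 1 && sig < HOST_NSIG;
}

/* Valid target signals are 1 .. TARGET_NSIG. */
static bool target_sig_valid(int sig)
{
    return sig >= 1 && sig <= TARGET_NSIG;
}
```

The same distinction explains why the target-indexed table in the patch is sized TARGET_NSIG + 1: index TARGET_NSIG has to be addressable, and index 0 is never a valid signal.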




[PATCH v3 1/4] linux-user: add missing TARGET_SIGRTMIN for hppa

2020-02-12 Thread Laurent Vivier
This signal is defined for all other targets and we will need it later.

Signed-off-by: Laurent Vivier 
[pm: that this was actually an ABI change in the hppa kernel (at kernel
version 3.17, kernel commit 1f25df2eff5b25f52c139d). Before that
SIGRTMIN was 37...
All our other HPPA TARGET_SIG* values are for the updated
ABI following that commit, so using 32 for SIGRTMIN is
the right thing for us.]
Reviewed-by: Peter Maydell 
---

Notes:
v3: Add Rb and comment from Peter

 linux-user/hppa/target_signal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/hppa/target_signal.h b/linux-user/hppa/target_signal.h
index ba159ff8d006..c2a0102ed73d 100644
--- a/linux-user/hppa/target_signal.h
+++ b/linux-user/hppa/target_signal.h
@@ -34,6 +34,7 @@
 #define TARGET_SIGURG  29
 #define TARGET_SIGXFSZ 30
 #define TARGET_SIGSYS  31
+#define TARGET_SIGRTMIN32
 
 #define TARGET_SIG_BLOCK   0
 #define TARGET_SIG_UNBLOCK 1
-- 
2.24.1




[PATCH v3 4/4] linux-user: fix use of SIGRTMIN

2020-02-12 Thread Laurent Vivier
Some RT signals can be in use by glibc,
which is why SIGRTMIN (34) is generally greater than __SIGRTMIN (32).

So SIGRTMIN cannot be mapped to TARGET_SIGRTMIN.

Instead of swapping only SIGRTMIN and SIGRTMAX, map the whole
range [TARGET_SIGRTMIN ... TARGET_SIGRTMAX - X] to
      [__SIGRTMIN + X ... SIGRTMAX ]
(SIGRTMIN is __SIGRTMIN + X).

Signed-off-by: Laurent Vivier 
Reviewed-by: Taylor Simson 
---

Notes:
v3: use trace_event_get_state_backends()
update comments

v2: ignore error when target sig <= TARGET_NSIG but host sig > SIGRTMAX
replace i, j by target_sig, host_sig
update signal_table_init() trace message

 linux-user/signal.c | 50 -
 linux-user/trace-events |  3 +++
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index c1e664f97a7c..046159dd0c5b 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -498,18 +498,30 @@ static int core_dump_signal(int sig)
 
 static void signal_table_init(void)
 {
-int host_sig, target_sig;
+int host_sig, target_sig, count;
 
 /*
- * Nasty hack: Reverse SIGRTMIN and SIGRTMAX to avoid overlap with
- * host libpthread signals.  This assumes no one actually uses SIGRTMAX :-/
+ * Signals are supported starting from TARGET_SIGRTMIN and going up
+ * until we run out of host realtime signals.
+ * glibc at least uses only the lower 2 rt signals and probably
+ * nobody's using the upper ones.
+ * it's why SIGRTMIN (34) is generally greater than __SIGRTMIN (32)
  * To fix this properly we need to do manual signal delivery multiplexed
  * over a single host signal.
+ * Attempts for configure "missing" signals via sigaction will be
+ * silently ignored.
  */
-host_to_target_signal_table[__SIGRTMIN] = __SIGRTMAX;
-host_to_target_signal_table[__SIGRTMAX] = __SIGRTMIN;
+for (host_sig = SIGRTMIN; host_sig <= SIGRTMAX; host_sig++) {
+target_sig = host_sig - SIGRTMIN + TARGET_SIGRTMIN;
+if (target_sig <= TARGET_NSIG) {
+host_to_target_signal_table[host_sig] = target_sig;
+}
+}
 
 /* generate signal conversion tables */
+for (target_sig = 1; target_sig <= TARGET_NSIG; target_sig++) {
+target_to_host_signal_table[target_sig] = _NSIG; /* poison */
+}
 for (host_sig = 1; host_sig < _NSIG; host_sig++) {
 if (host_to_target_signal_table[host_sig] == 0) {
 host_to_target_signal_table[host_sig] = host_sig;
@@ -519,6 +531,15 @@ static void signal_table_init(void)
 target_to_host_signal_table[target_sig] = host_sig;
 }
 }
+
+if (trace_event_get_state_backends(TRACE_SIGNAL_TABLE_INIT)) {
+for (target_sig = 1, count = 0; target_sig <= TARGET_NSIG; target_sig++) {
+if (target_to_host_signal_table[target_sig] == _NSIG) {
+count++;
+}
+}
+trace_signal_table_init(count);
+}
 }
 
 void signal_init(void)
@@ -817,6 +838,8 @@ int do_sigaction(int sig, const struct target_sigaction *act,
 int host_sig;
 int ret = 0;
 
+trace_signal_do_sigaction_guest(sig, TARGET_NSIG);
+
 if (sig < 1 || sig > TARGET_NSIG || sig == TARGET_SIGKILL || sig == TARGET_SIGSTOP) {
 return -TARGET_EINVAL;
 }
@@ -847,6 +870,23 @@ int do_sigaction(int sig, const struct target_sigaction *act,
 
 /* we update the host linux signal state */
 host_sig = target_to_host_signal(sig);
+trace_signal_do_sigaction_host(host_sig, TARGET_NSIG);
+if (host_sig > SIGRTMAX) {
+/* we don't have enough host signals to map all target signals */
+qemu_log_mask(LOG_UNIMP, "Unsupported target signal #%d, ignored\n",
+  sig);
+/*
+ * we don't return an error here because some programs try to
+ * register an handler for all possible rt signals even if they
+ * don't need it.
+ * An error here can abort them whereas there can be no problem
+ * to not have the signal available later.
+ * This is the case for golang,
+ *   See https://github.com/golang/go/issues/33746
+ * So we silently ignore the error.
+ */
+return 0;
+}
 if (host_sig != SIGSEGV && host_sig != SIGBUS) {
 sigfillset(&act1.sa_mask);
 act1.sa_flags = SA_SIGINFO;
diff --git a/linux-user/trace-events b/linux-user/trace-events
index f6de1b8befc0..0296133daeb6 100644
--- a/linux-user/trace-events
+++ b/linux-user/trace-events
@@ -1,6 +1,9 @@
 # See docs/devel/tracing.txt for syntax documentation.
 
 # signal.c
+signal_table_init(int i) "number of unavailable signals: %d"
+signal_do_sigaction_guest(int sig, int max) "target signal %d (MAX %d)"
+signal_do_sigaction_host(int sig, int max) "host signal %d (MAX %d)"
 # */signal.c
 user_se

[PATCH v3 0/4] linux-user: fix use of SIGRTMIN

2020-02-12 Thread Laurent Vivier
This series fixes the problem that the first real-time signals are already
in use by the host glibc and so are not available to the target glibc.

Instead of reversing the first and last real-time signals, we rely on
the value provided by glibc (SIGRTMIN) to know the first available
signal, and we map all the signals from this value up to SIGRTMAX on top
of TARGET_SIGRTMIN. As a consequence, fewer signals are available
in the target (generally 2 fewer), but this seems fine as at least 30 signals
are still available.

This has been tested with Go (golang 1.10.1 linux/arm64, bionic) on x86_64
Fedora 31. We can avoid the failure in this case by allowing the unsupported
signals when we don't provide the "act" parameter to sigaction, only the
"oldact" one. I have also run the LTP suite with several targets and
Debian-based distros.

v3: use trace_event_get_state_backends()
update comments
Add R-b

v2: tested with golang 1.12.10 linux/arm64, eoan
Ignore unsupported signals rather than returning an error
replace i, j by target_sig, host_sig

Laurent Vivier (4):
  linux-user: add missing TARGET_SIGRTMIN for hppa
  linux-user: cleanup signal.c
  linux-user: fix TARGET_NSIG and _NSIG uses
  linux-user: fix use of SIGRTMIN

 linux-user/hppa/target_signal.h |   1 +
 linux-user/signal.c | 134 
 linux-user/trace-events |   3 +
 3 files changed, 106 insertions(+), 32 deletions(-)

-- 
2.24.1
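
The mapping described in this cover letter can be sketched as a small standalone function. This is an illustration under stated assumptions, not the code from signal.c: the constants below are hypothetical fixed values (HOST_SIGRTMIN of 34 matches the figure quoted in patch 4/4; HOST_SIGRTMAX and TARGET_NSIG of 64 are assumed), and both function names are invented for the example.

```c
#include <assert.h>

/* Hypothetical constants; real values come from the host libc and target ABI. */
#define HOST_SIGRTMIN   34  /* first RT signal glibc leaves to applications */
#define HOST_SIGRTMAX   64  /* assumed last host RT signal */
#define TARGET_SIGRTMIN 32  /* first RT signal as seen by the guest */
#define TARGET_NSIG     64  /* last valid target signal (inclusive) */

/*
 * Map a host RT signal onto the target range, starting at TARGET_SIGRTMIN.
 * Returns 0 when the host signal is not an application RT signal or when
 * the result would fall outside the target range.
 */
static int host_rt_to_target(int host_sig)
{
    int target_sig;

    if (host_sig < HOST_SIGRTMIN || host_sig > HOST_SIGRTMAX) {
        return 0;
    }
    target_sig = host_sig - HOST_SIGRTMIN + TARGET_SIGRTMIN;
    return target_sig <= TARGET_NSIG ? target_sig : 0;
}

/* Number of target RT signals left without a host counterpart. */
static int unavailable_target_rt_signals(void)
{
    int highest = HOST_SIGRTMAX - HOST_SIGRTMIN + TARGET_SIGRTMIN;

    return highest < TARGET_NSIG ? TARGET_NSIG - highest : 0;
}
```

With these assumed values the top two target signals have no host counterpart, which is consistent with the "generally 2" figure above.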




[PATCH v3 2/4] linux-user: cleanup signal.c

2020-02-12 Thread Laurent Vivier
No functional changes. Prepare the ground for future fixes.

Remove the memset(.., 0, ...), which is useless on a static array.

Signed-off-by: Laurent Vivier 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Maydell 
---

Notes:
v2: replace i, j by target_sig, host_sig

 linux-user/signal.c | 48 ++---
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 5ca6d62b15d3..246315571c09 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -66,12 +66,6 @@ static uint8_t host_to_target_signal_table[_NSIG] = {
 [SIGPWR] = TARGET_SIGPWR,
 [SIGSYS] = TARGET_SIGSYS,
 /* next signals stay the same */
-/* Nasty hack: Reverse SIGRTMIN and SIGRTMAX to avoid overlap with
-   host libpthread signals.  This assumes no one actually uses SIGRTMAX :-/
-   To fix this properly we need to do manual signal delivery multiplexed
-   over a single host signal.  */
-[__SIGRTMIN] = __SIGRTMAX,
-[__SIGRTMAX] = __SIGRTMIN,
 };
 static uint8_t target_to_host_signal_table[_NSIG];
 
@@ -480,31 +474,45 @@ static int core_dump_signal(int sig)
 }
 }
 
+static void signal_table_init(void)
+{
+int host_sig, target_sig;
+
+/*
+ * Nasty hack: Reverse SIGRTMIN and SIGRTMAX to avoid overlap with
+ * host libpthread signals.  This assumes no one actually uses SIGRTMAX :-/
+ * To fix this properly we need to do manual signal delivery multiplexed
+ * over a single host signal.
+ */
+host_to_target_signal_table[__SIGRTMIN] = __SIGRTMAX;
+host_to_target_signal_table[__SIGRTMAX] = __SIGRTMIN;
+
+/* generate signal conversion tables */
+for (host_sig = 1; host_sig < _NSIG; host_sig++) {
+if (host_to_target_signal_table[host_sig] == 0) {
+host_to_target_signal_table[host_sig] = host_sig;
+}
+}
+for (host_sig = 1; host_sig < _NSIG; host_sig++) {
+target_sig = host_to_target_signal_table[host_sig];
+target_to_host_signal_table[target_sig] = host_sig;
+}
+}
+
 void signal_init(void)
 {
 TaskState *ts = (TaskState *)thread_cpu->opaque;
 struct sigaction act;
 struct sigaction oact;
-int i, j;
+int i;
 int host_sig;
 
-/* generate signal conversion tables */
-for(i = 1; i < _NSIG; i++) {
-if (host_to_target_signal_table[i] == 0)
-host_to_target_signal_table[i] = i;
-}
-for(i = 1; i < _NSIG; i++) {
-j = host_to_target_signal_table[i];
-target_to_host_signal_table[j] = i;
-}
+/* initialize signal conversion tables */
+signal_table_init();
 
 /* Set the signal mask from the host mask. */
 sigprocmask(0, 0, &ts->signal_mask);
 
-/* set all host signal handlers. ALL signals are blocked during
-   the handlers to serialize them. */
-memset(sigact_table, 0, sizeof(sigact_table));
-
 sigfillset(&act.sa_mask);
 act.sa_flags = SA_SIGINFO;
 act.sa_sigaction = host_signal_handler;
-- 
2.24.1




[Bug 1857811] Re: qemu user static binary seems to lack support for network namespace.

2020-02-12 Thread crocket
def _configure_loopback_interface():
    """
    Configure the loopback interface.
    """

    # We add some additional addresses to work around odd behavior in glibc's
    # getaddrinfo() implementation when the AI_ADDRCONFIG flag is set.
    #
    # For example:
    #
    #   struct addrinfo *res, hints = { .ai_family = AF_INET, .ai_flags = AI_ADDRCONFIG };
    #   getaddrinfo("localhost", NULL, &hints, &res);
    #
    # This returns no results if there are no non-loopback addresses
    # configured for a given address family.
    #
    # Bug: https://bugs.gentoo.org/690758
    # Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=12377#c13

    # Avoid importing this module on systems that may not support netlink sockets.
    from portage.util.netlink import RtNetlink

    try:
        with RtNetlink() as rtnl:
            ifindex = rtnl.get_link_ifindex(b'lo')
            rtnl.set_link_up(ifindex)
            rtnl.add_address(ifindex, socket.AF_INET, '10.0.0.1', 8)
            if _has_ipv6():
                rtnl.add_address(ifindex, socket.AF_INET6, 'fd::1', 8)
    except EnvironmentError as e:
        writemsg("Unable to configure loopback interface: %s\n" % e.strerror, noiselevel=-1)

If I execute _configure_loopback_interface in a qemu-aarch64 chroot, I
see the following error.

Unable to configure loopback interface: Operation not supported

https://bugs.gentoo.org/703276 explains the issue.

** Bug watch added: Sourceware.org Bugzilla #12377
   https://sourceware.org/bugzilla/show_bug.cgi?id=12377

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1857811

Title:
  qemu user static binary seems to lack support for network namespace.

Status in QEMU:
  New

Bug description:
  Whenever I execute emerge in gentoo linux in qemu-aarch64 chroot, I
  see the following error message.

  Unable to configure loopback interface: Operation not supported

  If I disable emerge's network-sandbox which utilizes network
  namespace, the error disappears.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1857811/+subscriptions



[Bug 1857811] Re: qemu user static binary seems to lack support for network namespace.

2020-02-12 Thread crocket
You can obtain portage source code from
https://gentoo.osuosl.org/distfiles/portage-2.3.84.tar.bz2




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-12 Thread Eric Blake

On 2/12/20 6:36 AM, Richard W.M. Jones wrote:

>> Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE
>> (entire image is sparse, nothing is allocated) and NBD_INIT_ZERO
>> (entire image reads as zero), and save any future bits for later
>> additions.  Do we think that 16 bits is sufficient for the amount of
>> initial information likely to be exposed?
>
> So as I understand the proposal, the 16 bit limit comes about because
> we want a round 4 byte reply, 16 bits are used by NBD_INFO_INIT_STATE
> and that leaves 16 bits feature bits.  Therefore the only way to go
> from there is to have 32 feature bits but an awkward unaligned 6 byte
> structure, or 48 feature bits (8 byte structure).

In general, the NBD protocol has NOT focused on alignment issues (for
good or for bad).  For example, NBD_INFO_BLOCK_SIZE is 18 bytes; all
NBD_CMD_* 32-bit requests are 28 bytes except for NBD_CMD_WRITE which
can send unaligned payload with no further padding, and so forth.

> I guess given those constraints we can stick with 16 feature bits, and
> if we ever needed more then we'd have to introduce NBD_INFO_INIT_STATE2.
>
> The only thing I can think of which might be useful is a "fully
> preallocated" bit which might be used as an indication that writes are
> fast and are unlikely to fail with ENOSPC.

and which would be mutually-exclusive with NBD_INIT_SPARSE (except for
an image of size 0).  That bit would ALSO be an indication that the user
may not want to punch holes into the image, but preserve the
fully-allocated state (and thus avoid NBD_CMD_TRIM as well as passing
NBD_CMD_FLAG_NO_HOLE to any WRITE_ZEROES request).

>> Are we in agreement that
>> my addition of an NBD_INFO_ response to NBD_OPT_GO is the best way
>> to expose initial state bits?
>
> Seems reasonable to me.
>
> Rich.



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v5 26/26] nvme: make lba data size configurable

2020-02-12 Thread Maxim Levitsky
On Thu, 2020-02-06 at 08:24 +0100, Klaus Birkelund Jensen wrote:
> On Feb  5 01:43, Keith Busch wrote:
> > On Tue, Feb 04, 2020 at 10:52:08AM +0100, Klaus Jensen wrote:
> > > Signed-off-by: Klaus Jensen 
> > > ---
> > >  hw/block/nvme-ns.c | 2 +-
> > >  hw/block/nvme-ns.h | 4 +++-
> > >  hw/block/nvme.c| 1 +
> > >  3 files changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> > > index 0e5be44486f4..981d7101b8f2 100644
> > > --- a/hw/block/nvme-ns.c
> > > +++ b/hw/block/nvme-ns.c
> > > @@ -18,7 +18,7 @@ static int nvme_ns_init(NvmeNamespace *ns)
> > >  {
> > >  NvmeIdNs *id_ns = &ns->id_ns;
> > >  
> > > -id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> > > +id_ns->lbaf[0].ds = ns->params.lbads;
> > >  id_ns->nuse = id_ns->ncap = id_ns->nsze =
> > >  cpu_to_le64(nvme_ns_nlbas(ns));
> > >  
> > > diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> > > index b564bac25f6d..f1fe4db78b41 100644
> > > --- a/hw/block/nvme-ns.h
> > > +++ b/hw/block/nvme-ns.h
> > > @@ -7,10 +7,12 @@
> > >  
> > >  #define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
> > >  DEFINE_PROP_DRIVE("drive", _state, blk), \
> > > -DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
> > > +DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0), \
> > > +DEFINE_PROP_UINT8("lbads", _state, _props.lbads, BDRV_SECTOR_BITS)
> > 
> > I think we need to validate the parameter is between 9 and 12 before
> > trusting it can be used safely.
> > 
> > Alternatively, add supported formats to the lbaf array and let the host
> > decide on a live system with the 'format' command.
> 
> The device does not yet support Format NVM, but we have a patch ready
> for that to be submitted with a new series when this is merged.
> 
> > For now, while it does not support Format, I will change this patch such
> > that it defaults to 9 (BDRV_SECTOR_BITS) and only accepts 12 as an
> > alternative (while always keeping the number of formats available to 1).
Looks like a good idea.
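
As a rough illustration of the check being agreed on here, a minimal sketch that accepts only 9 or 12 for lbads and derives the LBA count from it. The helper names are hypothetical and not part of the series; SKETCH_SECTOR_BITS stands in for BDRV_SECTOR_BITS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_BITS 9  /* 512-byte sectors, the value of BDRV_SECTOR_BITS */

/* Accept the default (9) or 4 KiB blocks (12), nothing else. */
static bool sketch_lbads_valid(uint8_t lbads)
{
    return lbads == SKETCH_SECTOR_BITS || lbads == 12;
}

/* Number of logical blocks for a backing image of size_bytes. */
static uint64_t sketch_nlbas(uint64_t size_bytes, uint8_t lbads)
{
    assert(sketch_lbads_valid(lbads));
    return size_bytes >> lbads;
}
```

Rejecting other values up front keeps every later shift by lbads within a range the rest of the device model can rely on.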

Best regards,
Maxim Levitsky




Re: [PATCH v5 25/26] nvme: remove redundant NvmeCmd pointer parameter

2020-02-12 Thread Maxim Levitsky
On Tue, 2020-02-04 at 10:52 +0100, Klaus Jensen wrote:
> The command struct is available in the NvmeRequest that we generally
> pass around anyway.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 198 
>  1 file changed, 98 insertions(+), 100 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index bdef53a590b0..5fe2e2fe1fa9 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -566,16 +566,18 @@ unmap:
>  }
>  
>  static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
> -NvmeCmd *cmd, DMADirection dir, NvmeRequest *req)
> +DMADirection dir, NvmeRequest *req)
>  {
>  uint16_t status = NVME_SUCCESS;
>  size_t bytes;
> +uint64_t prp1, prp2;
>  
> -switch (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
> +switch (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
>  case PSDT_PRP:
> -status = nvme_map_prp(n, &req->qsg, &req->iov,
> -le64_to_cpu(cmd->dptr.prp.prp1), le64_to_cpu(cmd->dptr.prp.prp2),
> -len, req);
> +prp1 = le64_to_cpu(req->cmd.dptr.prp.prp1);
> +prp2 = le64_to_cpu(req->cmd.dptr.prp.prp2);
> +
> +status = nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, len, req);
>  if (status) {
>  return status;
>  }
> @@ -589,7 +591,7 @@ static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>  return NVME_INVALID_FIELD;
>  }
>  
> -status = nvme_map_sgl(n, &req->qsg, &req->iov, cmd->dptr.sgl, len,
> +status = nvme_map_sgl(n, &req->qsg, &req->iov, req->cmd.dptr.sgl, len,
>  req);
>  if (status) {
>  return status;
> @@ -632,20 +634,21 @@ static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
>  return status;
>  }
>  
> -static uint16_t nvme_map(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +static uint16_t nvme_map(NvmeCtrl *n, NvmeRequest *req)
>  {
>  uint32_t len = req->nlb << nvme_ns_lbads(req->ns);
>  uint64_t prp1, prp2;
>  
> -switch (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
> +switch (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
>  case PSDT_PRP:
> -prp1 = le64_to_cpu(cmd->dptr.prp.prp1);
> -prp2 = le64_to_cpu(cmd->dptr.prp.prp2);
> +prp1 = le64_to_cpu(req->cmd.dptr.prp.prp1);
> +prp2 = le64_to_cpu(req->cmd.dptr.prp.prp2);
>  
>  return nvme_map_prp(n, &req->qsg, &req->iov, prp1, prp2, len, req);
>  case PSDT_SGL_MPTR_CONTIGUOUS:
>  case PSDT_SGL_MPTR_SGL:
> -return nvme_map_sgl(n, &req->qsg, &req->iov, cmd->dptr.sgl, len, req);
> +return nvme_map_sgl(n, &req->qsg, &req->iov, req->cmd.dptr.sgl, len,
> +req);
>  default:
>  return NVME_INVALID_FIELD;
>  }
> @@ -1024,7 +1027,7 @@ static void nvme_aio_cb(void *opaque, int ret)
>  nvme_aio_destroy(aio);
>  }
>  
> -static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
>  {
>  NvmeNamespace *ns = req->ns;
>  NvmeAIO *aio = g_new0(NvmeAIO, 1);
> @@ -1040,12 +1043,12 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  return NVME_NO_COMPLETE;
>  }
>  
> -static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeRequest *req)
>  {
>  NvmeAIO *aio;
>  
>  NvmeNamespace *ns = req->ns;
> -NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
> +NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
>  
>  int64_t offset;
>  size_t count;
> @@ -1081,9 +1084,9 @@ static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  return NVME_NO_COMPLETE;
>  }
>  
> -static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
>  {
> -NvmeRwCmd *rw = (NvmeRwCmd *) cmd;
> +NvmeRwCmd *rw = (NvmeRwCmd *) &req->cmd;
>  NvmeNamespace *ns = req->ns;
>  int status;
>  
> @@ -1103,7 +1106,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  return status;
>  }
>  
> -status = nvme_map(n, cmd, req);
> +status = nvme_map(n, req);
>  if (status) {
>  block_acct_invalid(blk_get_stats(ns->blk), acct);
>  return status;
> @@ -1115,12 +1118,12 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
>  return NVME_NO_COMPLETE;
>  }
>  
> -static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> +static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
>  {
> -uint32_t nsid = le32_to_cpu(cmd->nsid);
> +uint32_t nsid = le32_to_cpu(req->cmd.nsid);
>  
>  trace_nvme_dev_io_cmd(nvme_cid(req), nsid, le16_to_cpu(req->sq->sqid),
> -cmd->opcode);
> +req->cmd.opcode);
>  
>  req->ns = nvme_ns(n, nsid);
>  
> @@ -1128,16 +1131,16 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd
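
To make the shape of this refactoring concrete, here is a minimal sketch of a handler that takes only the request and reads the command through it. The Sketch* types and sk_* names are simplified stand-ins invented for illustration, not the QEMU definitions. The PSDT field is assumed to sit in the top two bits of the flags byte (bits 15:14 of command dword 0 in the NVMe specification).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PSDT encodings: 00b = PRP, 01b and 10b = SGL variants, 11b = reserved. */
enum { SK_PSDT_PRP = 0, SK_PSDT_SGL_MPTR_CONTIGUOUS = 1, SK_PSDT_SGL_MPTR_SGL = 2 };
enum { SK_MAP_PRP, SK_MAP_SGL, SK_MAP_INVALID };

typedef struct SketchCmd {
    uint8_t  opcode;
    uint8_t  flags;
    uint16_t cid;
    uint32_t nsid;
} SketchCmd;

/* The command lives inside the request, so passing both is redundant. */
typedef struct SketchRequest {
    SketchCmd cmd;
    uint32_t  nlb;
} SketchRequest;

static unsigned sk_psdt(uint8_t flags)
{
    return (flags >> 6) & 0x3;
}

/* Only the request is passed; the command is reached via req->cmd. */
static int sk_map_kind(const SketchRequest *req)
{
    assert(req != NULL);

    switch (sk_psdt(req->cmd.flags)) {
    case SK_PSDT_PRP:
        return SK_MAP_PRP;
    case SK_PSDT_SGL_MPTR_CONTIGUOUS:
    case SK_PSDT_SGL_MPTR_SGL:
        return SK_MAP_SGL;
    default:
        return SK_MAP_INVALID;
    }
}
```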

Re: [PATCH v5 24/26] nvme: change controller pci id

2020-02-12 Thread Maxim Levitsky
On Tue, 2020-02-04 at 10:52 +0100, Klaus Jensen wrote:
> There are two reasons for changing this:
> 
>   1. The nvme device currently uses an internal Intel device id.
> 
>   2. Since commits "nvme: fix write zeroes offset and count" and "nvme:
>  support multiple namespaces" the controller device no longer has
>  the quirks that the Linux kernel think it has.
> 
>  As the quirks are applied based on pci vendor and device id, change
>  them to get rid of the quirks.
> 
> To keep backward compatibility, add a new 'x-use-intel-id' parameter to
> the nvme device to force use of the Intel vendor and device id. This is
> off by default but add a compat property to set this for machines 4.2
> and older.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c   | 13 +
>  hw/block/nvme.h   |  4 +++-
>  hw/core/machine.c |  1 +
>  3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 3a377bc56734..bdef53a590b0 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -2467,8 +2467,15 @@ static void nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev)
>  
>  pci_conf[PCI_INTERRUPT_PIN] = 1;
>  pci_config_set_prog_interface(pci_conf, 0x2);
> -pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
> -pci_config_set_device_id(pci_conf, 0x5845);
> +
> +if (n->params.use_intel_id) {
> +pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
> +pci_config_set_device_id(pci_conf, 0x5845);
> +} else {
> +pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT);
> +pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REDHAT_NVME);
> +}
> +
>  pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
>  pcie_endpoint_cap_init(pci_dev, 0x80);
>  
> @@ -2638,8 +2645,6 @@ static void nvme_class_init(ObjectClass *oc, void *data)
>  pc->realize = nvme_realize;
>  pc->exit = nvme_exit;
>  pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
> -pc->vendor_id = PCI_VENDOR_ID_INTEL;
> -pc->device_id = 0x5845;
>  pc->revision = 2;
>  
>  set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index c3cef0f024da..6b584f53ed64 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -12,7 +12,8 @@
>  DEFINE_PROP_UINT32("num_queues", _state, _props.num_queues, 64), \
>  DEFINE_PROP_UINT8("aerl", _state, _props.aerl, 3), \
>  DEFINE_PROP_UINT32("aer_max_queued", _state, _props.aer_max_queued, 64), \
> -DEFINE_PROP_UINT8("mdts", _state, _props.mdts, 7)
> +DEFINE_PROP_UINT8("mdts", _state, _props.mdts, 7), \
> +DEFINE_PROP_BOOL("x-use-intel-id", _state, _props.use_intel_id, false)
>  
>  typedef struct NvmeParams {
>  char *serial;
> @@ -21,6 +22,7 @@ typedef struct NvmeParams {
>  uint8_t  aerl;
>  uint32_t aer_max_queued;
>  uint8_t  mdts;
> +bool use_intel_id;
>  } NvmeParams;
>  
>  typedef struct NvmeAsyncEvent {
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 3e288bfceb7f..984412d98c9d 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -34,6 +34,7 @@ GlobalProperty hw_compat_4_2[] = {
>  { "vhost-blk-device", "seg_max_adjust", "off"},
>  { "usb-host", "suppress-remote-wake", "off" },
>  { "usb-redir", "suppress-remote-wake", "off" },
> +{ "nvme", "x-use-intel-id", "on"},
>  };
>  const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2);
>  

Reviewed-by: Maxim Levitsky 
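
For reference, a small sketch of the selection logic the commit message describes: the compat flag picks the legacy Intel pair, otherwise the new Red Hat pair is used. The SK_* names and the SketchPciId type are hypothetical. The values are 0x8086 for Intel, 0x1b36 for Red Hat, 0x0010 for the NVMe id allocated earlier in the series, and 0x5845 as the Intel device id used by the lines this patch removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PCI_VENDOR_ID_INTEL          0x8086
#define SK_PCI_VENDOR_ID_REDHAT         0x1b36
#define SK_PCI_DEVICE_ID_REDHAT_NVME    0x0010
#define SK_NVME_LEGACY_INTEL_DEVICE_ID  0x5845  /* id used before this change */

typedef struct SketchPciId {
    uint16_t vendor;
    uint16_t device;
} SketchPciId;

/* use_intel_id mirrors the 'x-use-intel-id' property (on for machines <= 4.2). */
static SketchPciId sk_nvme_pci_id(bool use_intel_id)
{
    SketchPciId id;

    if (use_intel_id) {
        id.vendor = SK_PCI_VENDOR_ID_INTEL;
        id.device = SK_NVME_LEGACY_INTEL_DEVICE_ID;
    } else {
        id.vendor = SK_PCI_VENDOR_ID_REDHAT;
        id.device = SK_PCI_DEVICE_ID_REDHAT_NVME;
    }
    return id;
}
```

For the compat property to be useful, the legacy branch has to reproduce exactly the ids that older machine types exposed, since guests key their quirks on that pair.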

Best regards,
Maxim Levitsky




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-12 Thread Richard W.M. Jones


On Wed, Feb 12, 2020 at 06:09:11AM -0600, Eric Blake wrote:
> On 2/12/20 1:27 AM, Wouter Verhelst wrote:
> >Hi,
> >
> >On Mon, Feb 10, 2020 at 10:52:55PM +0000, Richard W.M. Jones wrote:
> >>But anyway ... could a flag indicating that the whole image is sparse
> >>be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
> >>could use it to avoid an initial disk trim, which is something that
> >>mke2fs does:
> >
> >Yeah, I think that could definitely be useful. I honestly can't see a
> >use for NBD_INIT_SPARSE as defined in this proposal; and I don't think
> >it's generally useful to have a feature if we can't think of a use case
> >for it (that creates added complexity for no benefit).
> >
> >If we can find a reasonable use case for NBD_INIT_SPARSE as defined in
> >this proposal, then just add a third bit (NBD_INIT_ALL_SPARSE or
> >something) that says "the whole image is sparse". Otherwise, I think we
> >should redefine NBD_INIT_SPARSE to say that.
> 
> Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE
> (entire image is sparse, nothing is allocated) and NBD_INIT_ZERO
> (entire image reads as zero), and save any future bits for later
> additions.  Do we think that 16 bits is sufficient for the amount of
> initial information likely to be exposed?

So as I understand the proposal, the 16 bit limit comes about because
we want a round 4 byte reply, 16 bits are used by NBD_INFO_INIT_STATE
and that leaves 16 bits feature bits.  Therefore the only way to go
from there is to have 32 feature bits but an awkward unaligned 6 byte
structure, or 48 feature bits (8 byte structure).

I guess given those constraints we can stick with 16 feature bits, and
if we ever needed more then we'd have to introduce NBD_INFO_INIT_STATE2.
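
The 4-byte layout discussed above can be sketched as follows: a 16-bit info type followed by 16 flag bits, both in network byte order as elsewhere in NBD. The numeric value chosen for NBD_INFO_INIT_STATE and the bit positions of the two flags are assumptions made purely for illustration; the proposal under discussion had not fixed them in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NBD_INFO_INIT_STATE 4          /* hypothetical info type number */
#define SK_NBD_INIT_SPARSE     (1u << 0)  /* hypothetical bit position */
#define SK_NBD_INIT_ZERO       (1u << 1)  /* hypothetical bit position */

/* Pack the reply payload: 16-bit type, then 16 flag bits, big-endian. */
static size_t sk_pack_init_state(uint8_t out[4], uint16_t flags)
{
    out[0] = (uint8_t)(SK_NBD_INFO_INIT_STATE >> 8);
    out[1] = (uint8_t)(SK_NBD_INFO_INIT_STATE & 0xff);
    out[2] = (uint8_t)(flags >> 8);
    out[3] = (uint8_t)(flags & 0xff);
    return 4;
}

/* Recover the flag bits from a packed payload. */
static uint16_t sk_unpack_init_flags(const uint8_t in[4])
{
    return (uint16_t)((in[2] << 8) | in[3]);
}
```

A client that does not know a given bit can simply ignore it, which is what makes a fixed 16-bit field workable until it runs out.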

The only thing I can think of which might be useful is a "fully
preallocated" bit which might be used as an indication that writes are
fast and are unlikely to fail with ENOSPC.

> Are we in agreement that
> my addition of an NBD_INFO_ response to NBD_OPT_GO is the best way
> to expose initial state bits?

Seems reasonable to me.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH v5 23/26] pci: allocate pci id for nvme

2020-02-12 Thread Maxim Levitsky
On Tue, 2020-02-04 at 10:52 +0100, Klaus Jensen wrote:
> The emulated nvme device (hw/block/nvme.c) is currently using an
> internal Intel device id.
> 
> Prepare to change that by allocating a device id under the 1b36 (Red
> Hat, Inc.) vendor id.

> 
> Signed-off-by: Klaus Jensen 
> ---
>  MAINTAINERS|  1 +
>  docs/specs/nvme.txt| 10 ++
>  docs/specs/pci-ids.txt |  1 +
>  include/hw/pci/pci.h   |  1 +
>  4 files changed, 13 insertions(+)
>  create mode 100644 docs/specs/nvme.txt
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1f0bc72f2189..14a018e9c0ae 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1645,6 +1645,7 @@ L: qemu-bl...@nongnu.org
>  S: Supported
>  F: hw/block/nvme*
>  F: tests/qtest/nvme-test.c
> +F: docs/specs/nvme.txt
>  
>  megasas
>  M: Hannes Reinecke 
> diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
> new file mode 100644
> index 000000000000..6ec7ddbc7ee0
> --- /dev/null
> +++ b/docs/specs/nvme.txt
> @@ -0,0 +1,10 @@
> +NVM Express Controller
> +======================
> +
> +The nvme device (-device nvme) emulates an NVM Express Controller.
> +
> +
> +Reference Specifications
> +
> +
> +  https://nvmexpress.org/resources/specifications/

Nitpick: maybe mention the nvme version here, plus some TODOs that are left?

> diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> index 4d53e5c7d9d5..abbdbca6be38 100644
> --- a/docs/specs/pci-ids.txt
> +++ b/docs/specs/pci-ids.txt
> @@ -63,6 +63,7 @@ PCI devices (other than virtio):
>  1b36:000b  PCIe Expander Bridge (-device pxb-pcie)
>  1b36:000d  PCI xhci usb host adapter
>  1b36:000f  mdpy (mdev sample device), linux/samples/vfio-mdev/mdpy.c
> +1b36:0010  PCIe NVMe device (-device nvme)
>  
>  All these devices are documented in docs/specs.
>  
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index b5013b834b20..9a20c309d0f2 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -103,6 +103,7 @@ extern bool pci_available;
>  #define PCI_DEVICE_ID_REDHAT_XHCI0x000d
>  #define PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE 0x000e
>  #define PCI_DEVICE_ID_REDHAT_MDPY0x000f
> +#define PCI_DEVICE_ID_REDHAT_NVME0x0010
>  #define PCI_DEVICE_ID_REDHAT_QXL 0x0100
>  
>  #define FMT_PCIBUS  PRIx64

Other than the actual ID assignment which is not something
I can approve/allocate:

Reviewed-by: Maxim Levitsky 

Best regards,
Maxim Levitsky






Re: [PATCH v5 22/26] nvme: support multiple namespaces

2020-02-12 Thread Maxim Levitsky
On Tue, 2020-02-04 at 10:52 +0100, Klaus Jensen wrote:
> This adds support for multiple namespaces by introducing a new 'nvme-ns'
> device model. The nvme device creates a bus named from the device name
> ('id'). The nvme-ns devices then connect to this and registers
> themselves with the nvme device.
> 
> This changes how an nvme device is created. Example with two namespaces:
> 
>   -drive file=nvme0n1.img,if=none,id=disk1
>   -drive file=nvme0n2.img,if=none,id=disk2
>   -device nvme,serial=deadbeef,id=nvme0
>   -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
>   -device nvme-ns,drive=disk2,bus=nvme0,nsid=2
> 
> The drive property is kept on the nvme device to keep the change
> backward compatible, but the property is now optional. Specifying a
> drive for the nvme device will always create the namespace with nsid 1.
Very reasonable way to do it. 
> 
> Signed-off-by: Klaus Jensen 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/Makefile.objs |   2 +-
>  hw/block/nvme-ns.c | 158 +++
>  hw/block/nvme-ns.h |  60 +++
>  hw/block/nvme.c| 235 +
>  hw/block/nvme.h|  47 -
>  hw/block/trace-events  |   6 +-
>  6 files changed, 389 insertions(+), 119 deletions(-)
>  create mode 100644 hw/block/nvme-ns.c
>  create mode 100644 hw/block/nvme-ns.h
> 
> diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
> index 28c2495a00dc..45f463462f1e 100644
> --- a/hw/block/Makefile.objs
> +++ b/hw/block/Makefile.objs
> @@ -7,7 +7,7 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
>  common-obj-$(CONFIG_XEN) += xen-block.o
>  common-obj-$(CONFIG_ECC) += ecc.o
>  common-obj-$(CONFIG_ONENAND) += onenand.o
> -common-obj-$(CONFIG_NVME_PCI) += nvme.o
> +common-obj-$(CONFIG_NVME_PCI) += nvme.o nvme-ns.o
>  common-obj-$(CONFIG_SWIM) += swim.o
>  
>  obj-$(CONFIG_SH4) += tc58128.o
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> new file mode 100644
> index 000000000000..0e5be44486f4
> --- /dev/null
> +++ b/hw/block/nvme-ns.c
> @@ -0,0 +1,158 @@
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/cutils.h"
> +#include "qemu/log.h"
> +#include "hw/block/block.h"
> +#include "hw/pci/msix.h"
Do you need this include?
> +#include "sysemu/sysemu.h"
> +#include "sysemu/block-backend.h"
> +#include "qapi/error.h"
> +
> +#include "hw/qdev-properties.h"
> +#include "hw/qdev-core.h"
> +
> +#include "nvme.h"
> +#include "nvme-ns.h"
> +
> +static int nvme_ns_init(NvmeNamespace *ns)
> +{
> +NvmeIdNs *id_ns = &ns->id_ns;
> +
> +id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
> +id_ns->nuse = id_ns->ncap = id_ns->nsze =
> +cpu_to_le64(nvme_ns_nlbas(ns));
Nitpick: To be honest, I don't really like that chained assignment,
especially since it forces the line to wrap, but that is just my
personal taste.
> +
> +return 0;
> +}
> +
> +static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, NvmeIdCtrl *id,
> +Error **errp)
> +{
> +uint64_t perm, shared_perm;
> +
> +Error *local_err = NULL;
> +int ret;
> +
> +perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
> +shared_perm = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
> +BLK_PERM_GRAPH_MOD;
> +
> +ret = blk_set_perm(ns->blk, perm, shared_perm, &local_err);
> +if (ret) {
> +error_propagate_prepend(errp, local_err, "blk_set_perm: ");
> +return ret;
> +}

You should consider using blkconf_apply_backend_options.
Take a look at for example virtio_blk_device_realize.
That will give you support for read only block devices as well.

I personally only once grazed the area of block permissions,
so I prefer someone from the block layer to review this as well.

> +
> +ns->size = blk_getlength(ns->blk);
> +if (ns->size < 0) {
> +error_setg_errno(errp, -ns->size, "blk_getlength");
> +return 1;
> +}
> +
> +switch (n->conf.wce) {
> +case ON_OFF_AUTO_ON:
> +n->features.volatile_wc = 1;
> +break;
> +case ON_OFF_AUTO_OFF:
> +n->features.volatile_wc = 0;
> +case ON_OFF_AUTO_AUTO:
> +n->features.volatile_wc = blk_enable_write_cache(ns->blk);
> +break;
> +default:
> +abort();
> +}
> +
> +blk_set_enable_write_cache(ns->blk, n->features.volatile_wc);
> +
> +return 0;

Nitpick: I also just noticed that you call the controller 'n'; I didn't pay
attention to this before. I think something like 'ctrl' or 'ctl' would be more
readable.
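
One more observation on the hunk quoted above: the ON_OFF_AUTO_OFF case has no break, so it falls through into the ON_OFF_AUTO_AUTO case and the 0 just assigned is overwritten by the backend's setting. A minimal sketch of the intended three-way decision, with a break in every case, might look like this. The enum and function names are hypothetical stand-ins, not the QEMU ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { SK_WCE_AUTO, SK_WCE_ON, SK_WCE_OFF } SketchOnOffAuto;

/* Decide the volatile write cache setting from the tri-state option. */
static bool sk_volatile_wc(SketchOnOffAuto wce, bool backend_wc_enabled)
{
    bool wc = false;

    switch (wce) {
    case SK_WCE_ON:
        wc = true;
        break;
    case SK_WCE_OFF:
        wc = false;
        break;  /* without this break, OFF would behave like AUTO */
    case SK_WCE_AUTO:
        wc = backend_wc_enabled;
        break;
    default:
        abort();
    }
    return wc;
}
```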

> +}
> +
> +static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
> +{
> +if (!ns->blk) {
> +error_setg(errp, "block backend not configured");
> +return 1;
> +}
> +
> +return 0;
> +}
> +
> +int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
> +{
> +Error *local_err = NULL;
> +
> +if (nvme_ns_check_constraints(ns, &local_err)) {
> +error_propagate_prepend(errp, local_err,
> +"nvme_ns_check_cons

Re: [PATCH] console: make QMP screendump use coroutine

2020-02-12 Thread Gerd Hoffmann
  Hi,

> Thanks to the QMP coroutine support, the screendump handler can
> trigger a graphic_hw_update(), yield and let the main loop run until
> update is done. Then the handler is resumed, and the ppm_save() will
> write the screen image to disk in the coroutine context (thus
> non-blocking).
> 
> For now, HMP doesn't have coroutine support, so it remains potentially
> outdated or glitched.
> 
> Fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=1230527
> 
> Based-on: <20200109183545.27452-2-kw...@redhat.com>

What is the status here?  Tried to apply (worked) and build (failed),
seems Kevin's patches are not merged yet?

thanks,
  Gerd




Re: [PATCH] ui/cocoa: Drop workarounds for pre-10.12 OSX

2020-02-12 Thread Gerd Hoffmann
On Sat, Feb 01, 2020 at 05:05:34PM +0000, Peter Maydell wrote:
> Our official OSX support policy covers the last two released versions.
> Currently that is 10.14 and 10.15.  We also may work on older versions, but
> don't guarantee it.
> 
> In commit 50290c002c045280f8d in mid-2019 we introduced some uses of
> CLOCK_MONOTONIC which incidentally broke compilation for pre-10.12 OSX
> versions (see LP:1861551). We don't intend to fix that, so we might
> as well drop the code in ui/cocoa.m which caters for pre-10.12
> versions as well. (For reference, 10.11 fell out of Apple extended
> security support in September 2018.)

Added to UI patch queue.

thanks,
  Gerd




Re: [PATCH v2 0/2] ui/gtk: Fix gd_refresh_rate_millihz() when widget window is not realized

2020-02-12 Thread Gerd Hoffmann
On Sat, Feb 08, 2020 at 05:10:46PM +0100, Philippe Mathieu-Daudé wrote:
> Fix bug report from Jan Kiszka:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg678130.html
> https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02260.html

Added to UI queue.

thanks,
  Gerd




[Bug 1857811] Re: qemu user static binary seems to lack support for network namespace.

2020-02-12 Thread Laurent Vivier
The interesting part in emerge.log is:

  23473 socket(16,,IPPROTO_IP) = 5
  23473 bind(5,274886353720,12,0,1,274889671712) = 0
  23473 sendto(5,275542232672,38,0,274886353960,12) = -1 errno=95 (Operation not supported)
  23473 close(5) = 0
  Unable to configure loopback interface: Operation not supported

So you're right, 16 is AF_NETLINK.

At the QEMU level, only one function returns EOPNOTSUPP: the one managing
RTM_* operations (RTM_GETLINK, RTM_GETADDR, ...), and it doesn't handle a
bunch of them.

Could you provide a step by step example to reproduce the problem?
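
To give an idea of the kind of message involved, here is a minimal sketch that builds an rtnetlink RTM_GETLINK request using the Linux UAPI headers. It is not what portage sends (the trace shows a 38-byte message, so portage's request carries more than this bare header pair); the struct and function names are made up for the example, and it only builds the buffer without opening a socket.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Netlink header followed by the link-level payload header. */
struct sk_getlink_req {
    struct nlmsghdr  nh;
    struct ifinfomsg ifi;
};

/* Fill in a request for the link with the given interface index. */
static unsigned sk_build_getlink(struct sk_getlink_req *req, int ifindex)
{
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifi));
    req->nh.nlmsg_type = RTM_GETLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST;
    req->ifi.ifi_family = AF_UNSPEC;
    req->ifi.ifi_index = ifindex;
    return req->nh.nlmsg_len;
}
```

Each RTM_* type in nlmsg_type has its own payload layout, so an emulator that forwards these messages between guest and host has to know every type it passes through; unknown ones are the natural place for an EOPNOTSUPP.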




Re: [PATCH] nbd: Fix regression with multiple meta contexts

2020-02-12 Thread Laurent Vivier
Le 12/02/2020 à 13:10, Eric Blake a écrit :
> On 2/12/20 3:24 AM, Laurent Vivier wrote:
>> Le 06/02/2020 à 18:38, Eric Blake a écrit :
>>> Detected by a hang in the libnbd testsuite.  If a client requests
>>> multiple meta contexts (both base:allocation and qemu:dirty-bitmap:x)
>>> at the same time, our attempt to silence a false-positive warning
>>> about a potential uninitialized variable introduced botched logic: we
>>> were short-circuiting the second context, and never sending the
>>> NBD_REPLY_FLAG_DONE.  Combining two 'if' into one 'if/else' in
>>> bdf200a55 was wrong (I'm a bit embarrassed that such a change was my
>>> initial suggestion after the v1 patch, then I did not review the v2
>>> patch that actually got committed). Revert that, and instead silence
>>> the false positive warning by replacing 'return ret' with 'return 0'
>>> (the value it always has at that point in the code, even though it
>>> eluded the deduction abilities of the robot that reported the false
>>> positive).
>>>
>>> Fixes: bdf200a5535
>>> Signed-off-by: Eric Blake 
>>> ---
>>>
>>> It's never fun when a regression is caused by a patch taken through
>>> qemu-trivial, proving that the patch was not trivial after all.
>>
>> Do you want this one to be merged using the trivial branch?
> 
> Up to you; I'm also fine taking it through my NBD tree as I have a few
> other NBD patches landing soon.
> 

For the moment, I have only one patch in my queue so I think you can
take it.

Thanks,
Laurent
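
For anyone reading along, the regression described in the commit message can be sketched in a few lines. With two independent 'if' blocks, each requested context gets its own reply chunk and only the last one carries NBD_REPLY_FLAG_DONE; folding them into 'if/else' skips the second context, so DONE is never sent and the client hangs. The function below is a simplified illustration with hypothetical names, not the code in nbd/server.c.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NBD_REPLY_FLAG_DONE (1u << 0)

/*
 * Record the flags of each status chunk that would be sent.
 * Returns the number of chunks written to flags_out.
 */
static int sk_send_block_status(bool want_alloc, bool want_bitmap,
                                unsigned flags_out[2])
{
    int sent = 0;

    if (want_alloc) {
        /* Last chunk only if no dirty-bitmap context follows. */
        flags_out[sent++] = want_bitmap ? 0 : SK_NBD_REPLY_FLAG_DONE;
    }
    if (want_bitmap) {  /* a separate 'if', deliberately not 'else' */
        flags_out[sent++] = SK_NBD_REPLY_FLAG_DONE;
    }
    return sent;
}
```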



Re: [PATCH] nbd: Fix regression with multiple meta contexts

2020-02-12 Thread Eric Blake

On 2/12/20 3:24 AM, Laurent Vivier wrote:
> Le 06/02/2020 à 18:38, Eric Blake a écrit :
>> Detected by a hang in the libnbd testsuite.  If a client requests
>> multiple meta contexts (both base:allocation and qemu:dirty-bitmap:x)
>> at the same time, our attempt to silence a false-positive warning
>> about a potential uninitialized variable introduced botched logic: we
>> were short-circuiting the second context, and never sending the
>> NBD_REPLY_FLAG_DONE.  Combining two 'if' into one 'if/else' in
>> bdf200a55 was wrong (I'm a bit embarrassed that such a change was my
>> initial suggestion after the v1 patch, then I did not review the v2
>> patch that actually got committed). Revert that, and instead silence
>> the false positive warning by replacing 'return ret' with 'return 0'
>> (the value it always has at that point in the code, even though it
>> eluded the deduction abilities of the robot that reported the false
>> positive).
>>
>> Fixes: bdf200a5535
>> Signed-off-by: Eric Blake 
>> ---
>>
>> It's never fun when a regression is caused by a patch taken through
>> qemu-trivial, proving that the patch was not trivial after all.
>
> Do you want this one to be merged using the trivial branch?

Up to you; I'm also fine taking it through my NBD tree as I have a few
other NBD patches landing soon.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Cross-project NBD extension proposal: NBD_INFO_INIT_STATE

2020-02-12 Thread Eric Blake

On 2/12/20 1:27 AM, Wouter Verhelst wrote:
> Hi,
>
> On Mon, Feb 10, 2020 at 10:52:55PM +0000, Richard W.M. Jones wrote:
>> But anyway ... could a flag indicating that the whole image is sparse
>> be useful, either as well as NBD_INIT_SPARSE or instead of it?  You
>> could use it to avoid an initial disk trim, which is something that
>> mke2fs does:
>
> Yeah, I think that could definitely be useful. I honestly can't see a
> use for NBD_INIT_SPARSE as defined in this proposal; and I don't think
> it's generally useful to have a feature if we can't think of a use case
> for it (that creates added complexity for no benefit).
>
> If we can find a reasonable use case for NBD_INIT_SPARSE as defined in
> this proposal, then just add a third bit (NBD_INIT_ALL_SPARSE or
> something) that says "the whole image is sparse". Otherwise, I think we
> should redefine NBD_INIT_SPARSE to say that.

Okay, in v2, I will start with just two bits, NBD_INIT_SPARSE (entire
image is sparse, nothing is allocated) and NBD_INIT_ZERO (entire image
reads as zero), and save any future bits for later additions.  Do we
think that 16 bits is sufficient for the amount of initial information
likely to be exposed?  Are we in agreement that my addition of an
NBD_INFO_ response to NBD_OPT_GO is the best way to expose initial state
bits?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v5 21/26] nvme: add support for scatter gather lists

2020-02-12 Thread Maxim Levitsky
On Tue, 2020-02-04 at 10:52 +0100, Klaus Jensen wrote:
> For now, support the Data Block, Segment and Last Segment descriptor
> types.
> 
> See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").
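
As background for the descriptor types named in the commit message, here is a minimal sketch of a 16-byte SGL descriptor and a check for the three supported types. The type codes follow the NVMe specification (Data Block 0h, Bit Bucket 1h, Segment 2h, Last Segment 3h, held in the upper nibble of the last byte); the Sketch* and sk_* names are illustrative and not the definitions added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    SK_SGL_DATA_BLOCK   = 0x0,
    SK_SGL_BIT_BUCKET   = 0x1,  /* not among the types supported for now */
    SK_SGL_SEGMENT      = 0x2,
    SK_SGL_LAST_SEGMENT = 0x3,
};

/* 16 bytes: address, length, reserved bytes, then type/subtype. */
typedef struct SketchSglDescriptor {
    uint64_t addr;
    uint32_t len;
    uint8_t  rsvd[3];
    uint8_t  type;  /* upper nibble: descriptor type, lower: subtype */
} SketchSglDescriptor;

static unsigned sk_sgl_type(const SketchSglDescriptor *d)
{
    return (d->type >> 4) & 0xf;
}

static bool sk_sgl_type_supported(const SketchSglDescriptor *d)
{
    switch (sk_sgl_type(d)) {
    case SK_SGL_DATA_BLOCK:
    case SK_SGL_SEGMENT:
    case SK_SGL_LAST_SEGMENT:
        return true;
    default:
        return false;
    }
}
```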
> 
> Signed-off-by: Klaus Jensen 
> Acked-by: Fam Zheng 
> ---
>  block/nvme.c  |  18 +-
>  hw/block/nvme.c   | 375 +++---
>  hw/block/trace-events |   4 +
>  include/block/nvme.h  |  62 ++-
>  4 files changed, 389 insertions(+), 70 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index d41c4bda6e39..521f521054d5 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -446,7 +446,7 @@ static void nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  error_setg(errp, "Cannot map buffer for DMA");
>  goto out;
>  }
> -cmd.prp1 = cpu_to_le64(iova);
> +cmd.dptr.prp.prp1 = cpu_to_le64(iova);
>  
>  if (nvme_cmd_sync(bs, s->queues[0], &cmd)) {
>  error_setg(errp, "Failed to identify controller");
> @@ -545,7 +545,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  }
>  cmd = (NvmeCmd) {
>  .opcode = NVME_ADM_CMD_CREATE_CQ,
> -.prp1 = cpu_to_le64(q->cq.iova),
> +.dptr.prp.prp1 = cpu_to_le64(q->cq.iova),
>  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
>  .cdw11 = cpu_to_le32(0x3),
>  };
> @@ -556,7 +556,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  }
>  cmd = (NvmeCmd) {
>  .opcode = NVME_ADM_CMD_CREATE_SQ,
> -.prp1 = cpu_to_le64(q->sq.iova),
> +.dptr.prp.prp1 = cpu_to_le64(q->sq.iova),
>  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
>  .cdw11 = cpu_to_le32(0x1 | (n << 16)),
>  };
> @@ -906,16 +906,16 @@ try_map:
>  case 0:
>  abort();
>  case 1:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = 0;
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = 0;
>  break;
>  case 2:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = pagelist[1];
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = pagelist[1];
>  break;
>  default:
> -cmd->prp1 = pagelist[0];
> -cmd->prp2 = cpu_to_le64(req->prp_list_iova + sizeof(uint64_t));
> +cmd->dptr.prp.prp1 = pagelist[0];
> +cmd->dptr.prp.prp2 = cpu_to_le64(req->prp_list_iova + 
> sizeof(uint64_t));
>  break;
>  }
>  trace_nvme_cmd_map_qiov(s, cmd, req, qiov, entries);
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 204ae1d33234..a91c60fdc111 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -75,8 +75,10 @@ static inline bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr 
> addr)
>  
>  static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>  {
> -if (n->cmbsz && nvme_addr_is_cmb(n, addr)) {
> -memcpy(buf, (void *) &n->cmbuf[addr - n->ctrl_mem.addr], size);
> +hwaddr hi = addr + size;
Are you sure you don't want to check for overflow here?
It's a theoretical issue, since addr would have to be almost a full 64-bit
value, but I still check these things very defensively.
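
A minimal sketch of the kind of defensive check meant here, assuming
plain uint64_t in place of hwaddr and an illustrative helper name that
is not taken from hw/block/nvme.c. It avoids computing addr + size at
all, so there is nothing that can wrap:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if [addr, addr + size) lies entirely inside the window
 * [base, base + len). Hypothetical helper, for illustration only.
 */
static bool range_within(uint64_t addr, uint64_t size,
                         uint64_t base, uint64_t len)
{
    /* Reject a start address that is outside the window. */
    if (addr < base || addr - base > len) {
        return false;
    }

    /* Compare size against the room left after addr; no addition needed. */
    return size <= len - (addr - base);
}
```

For addr = UINT64_MAX - 4 and size = 16, the sum addr + size wraps
around to 11, which a naive "end is in range" test could accept for a
window near address zero; the subtraction form rejects it.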

> +
> +if (n->cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
Here you fix the bug I mentioned in patch 6. I suggest you move the fix
there.
> +memcpy(buf, nvme_addr_to_cmb(n, addr), size);
>  return 0;
>  }
>  
> @@ -159,6 +161,48 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue 
> *cq)
>  }
>  }
>  
> +static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr 
> addr,
> +size_t len)
> +{
> +if (!nvme_addr_is_cmb(n, addr) || !nvme_addr_is_cmb(n, addr + len)) {
> +return NVME_DATA_TRANSFER_ERROR;
> +}
> +
> +qemu_iovec_add(iov, nvme_addr_to_cmb(n, addr), len);
> +
> +return NVME_SUCCESS;
> +}
> +
> +static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector 
> *iov,
> +hwaddr addr, size_t len)
> +{
> +bool addr_is_cmb = nvme_addr_is_cmb(n, addr);
> +
> +if (addr_is_cmb) {
> +if (qsg->sg) {
> +return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +}
> +
> +if (!iov->iov) {
> +qemu_iovec_init(iov, 1);
> +}
> +
> +return nvme_map_addr_cmb(n, iov, addr, len);
> +}
> +
> +if (iov->iov) {
> +return NVME_INVALID_USE_OF_CMB | NVME_DNR;
> +}
> +
> +if (!qsg->sg) {
> +pci_dma_sglist_init(qsg, &n->parent_obj, 1);
> +}
> +
> +qemu_sglist_add(qsg, addr, len);
> +
> +return NVME_SUCCESS;
> +}

Very good refactoring. I would also suggest moving this to a separate
patch. I always put refactoring first, followed by the patches that add features.

> +
>  static uint16_t nvme_map_prp(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
>  uint64_t prp1, uint64_t prp2, uint32_t len, NvmeRequest *req)
>  {
> @@ -307,15 +351,240 
