Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 8:44 AM Willy Tarreau wrote: > On Sat, Oct 17, 2020 at 07:52:48AM +0200, Jann Horn wrote: > > On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau wrote: > > > On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote: > > > > Microsoft's documentation > > > > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM > > > > Generation ID that we get after a fork "is a 128-bit, > > > > cryptographically random integer value". If multiple people use the > > > > same image, it guarantees that each use of the image gets its own, > > > > fresh ID: > > > > > > No. It cannot be more unique than the source that feeds that cryptographic > > > transformation. All it guarantees is that the entropy source is protected > > > from being guessed based on the output. Applying cryptography on a simple > > > counter provides apparently random numbers that will be unique for a long > > > period for the same source, but as soon as you duplicate that code between > > > users and they start from the same counter they'll get the same IDs. > > > > > > This is why I think that using a counter is better if you really need > > > something > > > unique. Randoms only reduce predictability which helps avoiding > > > collisions. > > > > Microsoft's spec tells us that they're giving us cryptographically > > random numbers. Where they're getting those from is not our problem. > > (And if even the hypervisor is not able to collect enough entropy to > > securely generate random numbers, worrying about RNG reseeding in the > > guest would be kinda pointless, we'd be fairly screwed anyway.) > > Sorry if I sound annoying, but it's a matter of terminology and needs. > > Cryptograhically random means safe for use with cryptography in that it > is unguessable enough so that you can use it for encryption keys that > nobody will be able to guess. It in no ways guarantees uniqueness, just > like you don't really care if the symmetric crypto key of you VPN has > already been used once somewhere else as long as there's no way to know. > However with the good enough distribution that a CSPRNG provides, > collisions within a *same* generator are bound to a very low, predictable > rate which is by generally considered as acceptable for all use cases. Yes. > Something random (cryptographically or not) *cannot* be unique by > definition, otherwise it's not random anymore, since each draw has an > influence on the remaining list of possible draws, which is contrary to > randomness. And conversely something unique cannot be completely random > because if you know it's unique, you can already rule out all other known > values from the candidates, thus it's more predictable than random. Yes. > With this in mind, picking randoms from a same RNG is often highly > sufficient to consider they're highly likely unique within a long > period. But it's not a guarantee. And it's even less one between two > RNGs (e.g. if uniqueness is required between multiple hypervisors in > case VMs are migrated or centrally managed, which I don't know). > > If what is sought here is a strong guarantee of uniqueness, using a > counter as you first suggested is better. My suggestion is to use a counter *in the UAPI*, not in the hypervisor protocol. (And as long as that counter can only miss increments in a cryptographically negligible fraction of cases, everything's fine.) > If what is sought is pure > randomness (in the sense that it's unpredictable, which I don't think > is needed here), then randoms are better. 
And this is what *the hypervisor protocol* gives us (which could be very useful for reseeding the kernel RNG). > If both are required, just > concatenate a counter and a random. And if you need them to be spatially > unique, just include a node identifier. > > Now the initial needs in the forwarded message are not entirely clear > to me but I wanted to rule out the apparent mismatch between the expressed > needs for uniqueness and the proposed solutions solely based on randomness. Sure, from a theoretical standpoint, it would be a little bit nicer if the hypervisor protocol included a generation number along with the 128-bit random value. But AFAIU it doesn't, so if we want this to just work under Microsoft's existing hypervisor, we'll have to make do with checking whether the random value changed. :P
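For illustration, the "counter *in the UAPI*" idea above boils down to a very cheap check in userspace. Below is a minimal sketch, assuming a hypothetical /dev/vmgenid node that maps a page whose first 8 bytes are a kernel-maintained 64-bit generation counter — this is the interface being proposed in the thread, not what the posted driver exposes (the posted driver maps the raw UUID):

/*
 * Minimal sketch of the "counter in the UAPI" idea discussed above.
 * Assumptions (not part of the posted driver): the kernel exposes a
 * read-only page whose first 8 bytes are a monotonically increasing
 * VM-generation counter, reachable via a hypothetical /dev/vmgenid.
 */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static const volatile uint64_t *vm_generation; /* mapped kernel page */
static uint64_t last_seen_generation;

static int vmgen_init(void)
{
    int fd = open("/dev/vmgenid", O_RDONLY);   /* hypothetical node */
    if (fd < 0)
        return -1;
    vm_generation = mmap(NULL, sizeof(uint64_t), PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (vm_generation == MAP_FAILED)
        return -1;
    last_seen_generation = *vm_generation;
    return 0;
}

/* Hot path: one cached 64-bit load and compare per call. */
static void rng_generate(unsigned char *out, size_t len)
{
    uint64_t g = *vm_generation;

    if (g != last_seen_generation) {
        /* VM was cloned/forked: throw away old state and reseed. */
        last_seen_generation = g;
        /* reseed_from_kernel();  e.g. via getrandom(2) */
    }
    /* ...produce len bytes from the (now safe) local RNG state... */
    (void)out; (void)len;
}

The hot path is a single aligned load and compare, which is what makes the counter variant attractive over comparing a full 128-bit UUID.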
Re: [PATCH v2 2/3] grackle: use qdev gpios for PCI IRQs
On Fri, Oct 16, 2020 at 07:53:10AM +0100, Mark Cave-Ayland wrote: > On 16/10/2020 07:45, Howard Spoelstra wrote: > > > Hi, > > > > I see compilation of the current ppc-for-5.2 branch fail with: > > > > ../hw/pci-host/grackle.c: In function ‘grackle_realize’: > > ../hw/pci-host/grackle.c:68:11: error: ‘GrackleState’ has no member named > > ‘pic’ > > 68 | if (!s->pic) { > > | ^~ > > make: *** [Makefile.ninja:1741: libcommon.fa.p/hw_pci-host_grackle.c.o] > > Error 1 > > > > Best, > > Howard > > I see - as per the cover letter, my series is a replacement for Phil's > original patch at > https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg02988.html (the PIC > link is now completely removed), so the solution here is to drop patch > 7daac97 "hw/pci-host/grackle: Verify PIC link is properly set". Ok, I've removed that from my ppc-for-5.2 tree. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH v2 1/3] macio: don't reference serial_hd() directly within the device
On Fri, Oct 16, 2020 at 08:00:06AM +0100, Mark Cave-Ayland wrote: > On 16/10/2020 01:16, David Gibson wrote: > > > On Tue, Oct 13, 2020 at 12:49:20PM +0100, Mark Cave-Ayland wrote: > > > Instead use qdev_prop_set_chr() to configure the ESCC serial chardevs at > > > the > > > Mac Old World and New World machine level. > > > > > > Also remove the now obsolete comment referring to the use of serial_hd() > > > and > > > the setting of user_creatable to false accordingly. > > > > > > Signed-off-by: Mark Cave-Ayland > > > > Applied to ppc-for-5.2, thanks. > > Ah okay, I was planning to send a separate qemu-macppc pull request myself > with these patches plus some cherry-picks from Zoltan's patchset after some > more testing. Does this mean you would prefer to take the patches directly > yourself? Not really. I sent a PR recently, so I probably won't do another for a little while. So go ahead and send yours, and mine should rebase easily enough. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 07:52:48AM +0200, Jann Horn wrote:
> On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau wrote:
> > On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote:
> > > Microsoft's documentation
> > > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM
> > > Generation ID that we get after a fork "is a 128-bit,
> > > cryptographically random integer value". If multiple people use the
> > > same image, it guarantees that each use of the image gets its own,
> > > fresh ID:
> >
> > No. It cannot be more unique than the source that feeds that cryptographic
> > transformation. All it guarantees is that the entropy source is protected
> > from being guessed based on the output. Applying cryptography on a simple
> > counter provides apparently random numbers that will be unique for a long
> > period for the same source, but as soon as you duplicate that code between
> > users and they start from the same counter they'll get the same IDs.
> >
> > This is why I think that using a counter is better if you really need
> > something unique. Randoms only reduce predictability which helps avoiding
> > collisions.
>
> Microsoft's spec tells us that they're giving us cryptographically
> random numbers. Where they're getting those from is not our problem.
> (And if even the hypervisor is not able to collect enough entropy to
> securely generate random numbers, worrying about RNG reseeding in the
> guest would be kinda pointless, we'd be fairly screwed anyway.)

Sorry if I sound annoying, but it's a matter of terminology and needs.

Cryptographically random means safe for use with cryptography in that it
is unguessable enough so that you can use it for encryption keys that
nobody will be able to guess. It in no way guarantees uniqueness, just
like you don't really care if the symmetric crypto key of your VPN has
already been used once somewhere else as long as there's no way to know.
However, with the good enough distribution that a CSPRNG provides,
collisions within a *same* generator are bound to a very low, predictable
rate which is generally considered acceptable for all use cases.

Something random (cryptographically or not) *cannot* be unique by
definition, otherwise it's not random anymore, since each draw has an
influence on the remaining list of possible draws, which is contrary to
randomness. And conversely something unique cannot be completely random,
because if you know it's unique, you can already rule out all other known
values from the candidates, thus it's more predictable than random.

With this in mind, picking randoms from a same RNG is often highly
sufficient to consider they're highly likely unique within a long
period. But it's not a guarantee. And it's even less of one between two
RNGs (e.g. if uniqueness is required between multiple hypervisors in
case VMs are migrated or centrally managed, which I don't know).

If what is sought here is a strong guarantee of uniqueness, using a
counter as you first suggested is better. If what is sought is pure
randomness (in the sense that it's unpredictable, which I don't think
is needed here), then randoms are better. If both are required, just
concatenate a counter and a random. And if you need them to be spatially
unique, just include a node identifier.

Now the initial needs in the forwarded message are not entirely clear
to me, but I wanted to rule out the apparent mismatch between the
expressed needs for uniqueness and the proposed solutions solely based
on randomness.

Cheers,
Willy
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote: > Microsoft's documentation > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM > Generation ID that we get after a fork "is a 128-bit, > cryptographically random integer value". If multiple people use the > same image, it guarantees that each use of the image gets its own, > fresh ID: No. It cannot be more unique than the source that feeds that cryptographic transformation. All it guarantees is that the entropy source is protected from being guessed based on the output. Applying cryptography on a simple counter provides apparently random numbers that will be unique for a long period for the same source, but as soon as you duplicate that code between users and they start from the same counter they'll get the same IDs. This is why I think that using a counter is better if you really need something unique. Randoms only reduce predictability which helps avoiding collisions. And I'm saying this as someone who had on his external gateway the same SSH host key as 89 other hosts on the net, each of them using randoms to provide a universally unique one... Willy
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On 16 Oct 2020, at 21:02, Jann Horn wrote:
> On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau wrote:
> But in userspace, we just need a simple counter. There's no need for
> us to worry about anything else, like timestamps or whatever. If we
> repeatedly fork a paused VM, the forked VMs will see the same counter
> value, but that's totally fine, because the only thing that matters to
> userspace is that the counter changes when the VM is forked.

For user-space, even a single bit would do. We added MADV_WIPEONFORK
so that userspace libraries can detect fork()/clone() robustly, for the
same reasons. It just wipes a page as the indicator, which is effectively
a single-bit signal, and it works well. On the user-space side of this,
I'm keen to find a solution like that that we can use fairly easily inside
of portable libraries and applications. The "have I forked" checks do end
up in hot paths, so it's nice if they can be CPU cache friendly. Comparing
a whole 128-bit value wouldn't be my favorite.

> And actually, since the value is a cryptographically random 128-bit
> value, I think that we should definitely use it to help reseed the
> kernel's RNG, and keep it secret from userspace. That way, even if the
> VM image is public, we can ensure that going forward, the kernel RNG
> will return securely random data.

If the image is public, you need some extra new raw entropy from
somewhere. The gen-id could be mixed in, that can't do any harm as long
as rigorous cryptographic mixing with the prior state is used, but if
that's all you do then the final state is still deterministic and
non-secret. The kernel would need to use the change as a trigger to
measure some entropy (e.g. interrupts and RDRAND, or whatever). Or just
define the machine contract as "this has to be unique random data and
if it's not unique, or if it's public, you're toast".

- Colm
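For reference, the MADV_WIPEONFORK mechanism mentioned above already works today (Linux 4.14+) and shows the general shape of a cheap "have I forked" check; it only detects process fork()/clone(), not VM forks, which is the gap this thread is about. A minimal sketch:

/*
 * Sketch of fork()/clone() detection with MADV_WIPEONFORK (Linux 4.14+).
 * The kernel zero-fills this private anonymous mapping in the child,
 * so a zero sentinel means "we are running in a fresh child process".
 * This does NOT detect VM forks, which is what the vmgenid thread adds.
 */
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *fork_sentinel;

static int fork_detect_init(void)
{
    long page = sysconf(_SC_PAGESIZE);

    fork_sentinel = mmap(NULL, page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (fork_sentinel == MAP_FAILED)
        return -1;
    if (madvise(fork_sentinel, page, MADV_WIPEONFORK) != 0)
        return -1;
    *fork_sentinel = 1;               /* nonzero marks "initialized" */
    return 0;
}

/* Hot path: a single byte load. */
static int have_forked(void)
{
    if (*fork_sentinel == 0) {        /* wiped => we are a new child  */
        *fork_sentinel = 1;
        return 1;                     /* caller must reseed its RNG   */
    }
    return 0;
}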
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 03:40:08AM +0200, Jann Horn wrote: > [adding some more people who are interested in RNG stuff: Andy, Jason, > Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this > concerns some pretty fundamental API stuff related to RNG usage] > > On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin > wrote: > > This patch is a driver which exposes the Virtual Machine Generation ID > > via a char-dev FS interface that provides ID update sync and async > > notification, retrieval and confirmation mechanisms: > > > > When the device is 'open()'ed a copy of the current vm UUID is > > associated with the file handle. 'read()' operations block until the > > associated UUID is no longer up to date - until HW vm gen id changes - > > at which point the new UUID is provided/returned. Nonblocking 'read()' > > uses EWOULDBLOCK to signal that there is no _new_ UUID available. > > > > 'poll()' is implemented to allow polling for UUID updates. Such > > updates result in 'EPOLLIN' events. > > > > Subsequent read()s following a UUID update no longer block, but return > > the updated UUID. The application needs to acknowledge the UUID update > > by confirming it through a 'write()'. > > Only on writing back to the driver the right/latest UUID, will the > > driver mark this "watcher" as up to date and remove EPOLLIN status. > > > > 'mmap()' support allows mapping a single read-only shared page which > > will always contain the latest UUID value at offset 0. > > It would be nicer if that page just contained an incrementing counter, > instead of a UUID. It's not like the application cares *what* the UUID > changed to, just that it *did* change and all RNGs state now needs to > be reseeded from the kernel, right? And an application can't reliably > read the entire UUID from the memory mapping anyway, because the VM > might be forked in the middle. > > So I think your kernel driver should detect UUID changes and then turn > those into a monotonically incrementing counter. (Probably 64 bits > wide?) (That's probably also a little bit faster than comparing an > entire UUID.) I agree with this. Further, I'm observing there is a very common confusion between "universally unique" and "random". Randoms are needed when seeking unpredictability. A random number generator *must* be able to return the same value multiple times in a row (though this is rare), otherwise it's not random. To illustrate this, a die has less than 3 bits of randomness and is sufficient to play games with friends where a counter would allow everyone to cheat. Conversely if you want to assign IDs to members of your family you'd rather use a counter than a die for this or you risk collisions and/or long series of retries to pick unique IDs. RFC4122 explains in great length how to produce guaranteed unique IDs, and this only involves space, time and counters. There's indeed a lazy variant that probably everyone uses nowadays, consisting in picking random numbers, but this is not guaranteed unique anymore. If the UUIDs used there are real UUIDs, it could be as simple as updating them according to their format, i.e. updating the timestamp, and if the timestamp is already the same, just increase the seq counter. Doing this doesn't require entropy, doesn't need to block and doesn't needlessly leak randoms that sometimes make people feel nervous. Just my two cents, Willy
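For completeness, a consumer of the char-dev interface described in the quoted patch would look roughly like the sketch below; the /dev/vmgenid path and the 16-byte read/write size are assumptions taken from the description above, not a merged UAPI:

/*
 * Sketch of a consumer of the char-dev interface described in the patch:
 * poll() for EPOLLIN, read() the new UUID, write() it back to acknowledge.
 * Device path and UUID size are assumptions based on the cover text.
 */
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int main(void)
{
    unsigned char uuid[16];
    int fd = open("/dev/vmgenid", O_RDWR);     /* hypothetical node */

    if (fd < 0)
        return 1;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        if (poll(&pfd, 1, -1) < 0)              /* wait for a VM fork */
            break;
        if (read(fd, uuid, sizeof(uuid)) != sizeof(uuid))
            break;

        /* ...reseed userspace RNG state, drop cached secrets here... */

        /* Acknowledge by writing the UUID back, clearing EPOLLIN. */
        if (write(fd, uuid, sizeof(uuid)) != sizeof(uuid))
            break;
    }
    close(fd);
    return 0;
}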
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On 16 Oct 2020, at 22:01, Jann Horn wrote: On Sat, Oct 17, 2020 at 6:34 AM Colm MacCarthaigh wrote: For user-space, even a single bit would do. We added MADVISE_WIPEONFORK so that userspace libraries can detect fork()/clone() robustly, for the same reasons. It just wipes a page as the indicator, which is effectively a single-bit signal, and it works well. On the user-space side of this, I’m keen to find a solution like that that we can use fairly easily inside of portable libraries and applications. The “have I forked” checks do end up in hot paths, so it’s nice if they can be CPU cache friendly. Comparing a whole 128-bit value wouldn’t be my favorite. I'm pretty sure a single bit is not enough if you want to have a single page, shared across the entire system, that stores the VM forking state; you need a counter for that. You’re right. WIPEONFORK is more like a single-bit per use. If it’s something system wide then a counter is better. So the RNG state after mixing in the new VM Generation ID would contain 128 bits of secret entropy not known to anyone else, including people with access to the VM image. Now, 128 bits of cryptographically random data aren't _optimal_; I think something on the order of 256 bits would be nicer from a theoretical standpoint. But in practice I think we'll be good with the 128 bits we're getting (since the number of users who fork a VM image is probably not going to be so large that worst-case collision probabilities matter). This reminds me on key/IV usage limits for AES encryption, where the same birthday bounds apply, and even though 256-bits would be better, we routinely make 128-bit birthday bounds work for massively scalable systems. The kernel would need to use the change as a trigger to measure some entropy (e.g. interrupts and RDRAND, or whatever). Our just define the machine contract as “this has to be unique random data and if it’s not unique, or if it’s pubic, you’re toast”. As far as I can tell from Microsoft's spec, that is a guarantee we're already getting. Neat. - Colm
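To make the "128-bit birthday bound" mentioned above concrete: for n independently drawn, uniformly random 128-bit generation IDs, the probability of any collision is approximately

\[
P_{\mathrm{collision}} \approx \binom{n}{2} \cdot 2^{-128} \approx \frac{n^{2}}{2^{129}},
\]

so even n = 2^32 forks of the same image keep the collision probability around 2^-65, which is why 128 bits are workable in practice even though 256 bits would give more margin.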
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau wrote: > On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote: > > Microsoft's documentation > > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM > > Generation ID that we get after a fork "is a 128-bit, > > cryptographically random integer value". If multiple people use the > > same image, it guarantees that each use of the image gets its own, > > fresh ID: > > No. It cannot be more unique than the source that feeds that cryptographic > transformation. All it guarantees is that the entropy source is protected > from being guessed based on the output. Applying cryptography on a simple > counter provides apparently random numbers that will be unique for a long > period for the same source, but as soon as you duplicate that code between > users and they start from the same counter they'll get the same IDs. > > This is why I think that using a counter is better if you really need > something > unique. Randoms only reduce predictability which helps avoiding collisions. Microsoft's spec tells us that they're giving us cryptographically random numbers. Where they're getting those from is not our problem. (And if even the hypervisor is not able to collect enough entropy to securely generate random numbers, worrying about RNG reseeding in the guest would be kinda pointless, we'd be fairly screwed anyway.) Also note that we don't actually need to *always* reinitialize RNG state on forks for functional correctness; it is fine if that fails with a probability of 2^-128, because functionally everything will be fine, and an attacker who is that lucky could also just guess an AES key (which has the same probability of being successful). (And also 2^-128 is such a tiny number that it doesn't matter anyway.) > And I'm saying this as someone who had on his external gateway the same SSH > host key as 89 other hosts on the net, each of them using randoms to provide > a universally unique one... If your SSH host key was shared with 89 other hosts, it evidently wasn't generated from cryptographically random numbers. :P Either because the key generator was not properly hooked up to the system's entropy pool (if you're talking about the Debian fiasco), or because the system simply did not have enough entropy available. (Or because the key generator is broken, but I don't think that ever happened with OpenSSH?)
[Bug 1891748] Re: qemu-arm-static 5.1 can't run gcc
This has been fixed in mainline, probably commit 8ef618859c379fdce81c91bc93e0574e36ea76ff. ** Changed in: qemu Status: New => Fix Committed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1891748 Title: qemu-arm-static 5.1 can't run gcc Status in QEMU: Fix Committed Bug description: Issue discovered while trying to build pikvm (1) Long story short: when using qemu-arm-static 5.1, gcc exits whith message: Allocating guest commpage: Operation not permitted when using qemu-arm-static v5.0, gcc "works" Steps to reproduce will follow (1) https://github.com/pikvm/pikvm/blob/master/pages/building_os.md To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1891748/+subscriptions
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 6:34 AM Colm MacCarthaigh wrote: > On 16 Oct 2020, at 21:02, Jann Horn wrote: > > On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau wrote: > > But in userspace, we just need a simple counter. There's no need for > > us to worry about anything else, like timestamps or whatever. If we > > repeatedly fork a paused VM, the forked VMs will see the same counter > > value, but that's totally fine, because the only thing that matters to > > userspace is that the counter changes when the VM is forked. > > For user-space, even a single bit would do. We added MADVISE_WIPEONFORK > so that userspace libraries can detect fork()/clone() robustly, for the > same reasons. It just wipes a page as the indicator, which is > effectively a single-bit signal, and it works well. On the user-space > side of this, I’m keen to find a solution like that that we can use > fairly easily inside of portable libraries and applications. The “have > I forked” checks do end up in hot paths, so it’s nice if they can be > CPU cache friendly. Comparing a whole 128-bit value wouldn’t be my > favorite. I'm pretty sure a single bit is not enough if you want to have a single page, shared across the entire system, that stores the VM forking state; you need a counter for that. > > And actually, since the value is a cryptographically random 128-bit > > value, I think that we should definitely use it to help reseed the > > kernel's RNG, and keep it secret from userspace. That way, even if the > > VM image is public, we can ensure that going forward, the kernel RNG > > will return securely random data. > > If the image is public, you need some extra new raw entropy from > somewhere. The gen-id could be mixed in, that can’t do any harm as > long as rigorous cryptographic mixing with the prior state is used, but > if that’s all you do then the final state is still deterministic and > non-secret. Microsoft's documentation (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM Generation ID that we get after a fork "is a 128-bit, cryptographically random integer value". If multiple people use the same image, it guarantees that each use of the image gets its own, fresh ID: The table in section "How to implement virtual machine generation ID support in a virtualization platform" says that (among other things) "Virtual machine is imported, copied, or cloned" generates a new generation ID. So the RNG state after mixing in the new VM Generation ID would contain 128 bits of secret entropy not known to anyone else, including people with access to the VM image. Now, 128 bits of cryptographically random data aren't _optimal_; I think something on the order of 256 bits would be nicer from a theoretical standpoint. But in practice I think we'll be good with the 128 bits we're getting (since the number of users who fork a VM image is probably not going to be so large that worst-case collision probabilities matter). > The kernel would need to use the change as a trigger to > measure some entropy (e.g. interrupts and RDRAND, or whatever). Our just > define the machine contract as “this has to be unique random data and > if it’s not unique, or if it’s pubic, you’re toast”. As far as I can tell from Microsoft's spec, that is a guarantee we're already getting.
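A rough sketch of the combination argued for above (mix the secret 128-bit value into the kernel RNG, expose only a counter to userspace) might look like the following. This is an illustration of the idea in the thread, not the posted driver; add_device_randomness() is the existing kernel helper that mixes device-provided bytes into the input pool without crediting entropy:

/*
 * Illustration only: turn hypervisor UUID changes into (a) fresh kernel
 * RNG input and (b) a monotonic counter for userspace, as suggested in
 * this thread. Not the posted patch; mapped_counter is assumed to be the
 * first 8 bytes of the page userspace mmap()s.
 */
#include <linux/compiler.h>
#include <linux/random.h>
#include <linux/string.h>
#include <linux/types.h>

static u8 last_uuid[16];
static u64 *mapped_counter;

static void vmgenid_handle_change(const u8 new_uuid[16])
{
    if (memcmp(new_uuid, last_uuid, sizeof(last_uuid)) == 0)
        return;                              /* no VM fork happened   */

    memcpy(last_uuid, new_uuid, sizeof(last_uuid));

    /* Mix the secret 128-bit value into the RNG; never expose it.    */
    add_device_randomness(new_uuid, sizeof(last_uuid));

    /* Userspace only sees "the world changed, reseed yourself".      */
    WRITE_ONCE(*mapped_counter, READ_ONCE(*mapped_counter) + 1);
}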
[PATCH v3 16/18] migration/rdma: add rdma_channel into Migrationstate field
Multifd RDMA needs to know which channel to poll when we send data, so
record the most recently used channel.

Signed-off-by: Chuan Zheng
---
 migration/migration.c |  1 +
 migration/migration.h |  1 +
 migration/rdma.c      | 14 ++
 3 files changed, 16 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 7061410..1ec1dc9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1892,6 +1892,7 @@ void migrate_init(MigrationState *s)
     s->migration_thread_running = false;
     s->enabled_rdma_migration = false;
     s->host_port = NULL;
+    s->rdma_channel = 0;
     error_free(s->error);
     s->error = NULL;
     s->hostname = NULL;
diff --git a/migration/migration.h b/migration/migration.h
index fea63de..5676b23 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -272,6 +272,7 @@ struct MigrationState
 
     /* Need by Multi-RDMA */
     char *host_port;
+    int rdma_channel;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/rdma.c b/migration/rdma.c
index d5d6364..327f80f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -183,6 +183,20 @@ typedef struct {
 } RDMAWorkRequestData;
 
 /*
+ * Get the multifd RDMA channel used to send data.
+ */
+static int get_multifd_RDMA_channel(void)
+{
+    int thread_count = migrate_multifd_channels();
+    MigrationState *s = migrate_get_current();
+
+    s->rdma_channel++;
+    s->rdma_channel %= thread_count;
+
+    return s->rdma_channel;
+}
+
+/*
  * Negotiate RDMA capabilities during connection-setup time.
  */
 typedef struct {
-- 
1.8.3.1
[PATCH v3 18/18] migration/rdma: RDMA cleanup for multifd migration
Signed-off-by: Chuan Zheng --- migration/multifd.c | 6 ++ migration/multifd.h | 1 + migration/rdma.c| 16 +++- 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index c4d90ef..f548122 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -574,6 +574,9 @@ void multifd_save_cleanup(void) p->packet_len = 0; g_free(p->packet); p->packet = NULL; +#ifdef CONFIG_RDMA +multifd_rdma_cleanup(p->rdma); +#endif multifd_send_state->ops->send_cleanup(p, &local_err); if (local_err) { migrate_set_error(migrate_get_current(), local_err); @@ -1017,6 +1020,9 @@ int multifd_load_cleanup(Error **errp) p->packet_len = 0; g_free(p->packet); p->packet = NULL; +#ifdef CONFIG_RDMA +multifd_rdma_cleanup(p->rdma); +#endif multifd_recv_state->ops->recv_cleanup(p); } qemu_sem_destroy(&multifd_recv_state->sem_sync); diff --git a/migration/multifd.h b/migration/multifd.h index ec9e897..6fddd4e 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -183,6 +183,7 @@ typedef struct { #ifdef CONFIG_RDMA MultiFDSetup *multifd_rdma_setup(void); +void multifd_rdma_cleanup(void *opaque); #endif void multifd_send_terminate_threads(Error *err); int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp); diff --git a/migration/rdma.c b/migration/rdma.c index 519fa7a..89bf54b 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2368,7 +2368,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) { int idx; -if (rdma->cm_id && rdma->connected) { +if (rdma->channel && rdma->cm_id && rdma->connected) { if ((rdma->error_state || migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) && !rdma->received_error) { @@ -4609,6 +4609,20 @@ static MultiFDSetup multifd_rdma_ops = { .recv_channel_setup = multifd_rdma_recv_channel_setup }; +void multifd_rdma_cleanup(void *opaque) +{ +RDMAContext *rdma = (RDMAContext *)opaque; + +if (!migrate_use_rdma()) { +return; +} + +rdma->listen_id = NULL; +rdma->channel = NULL; +qemu_rdma_cleanup(rdma); +g_free(rdma); +} + MultiFDSetup *multifd_rdma_setup(void) { MultiFDSetup *rdma_ops; -- 1.8.3.1
[PATCH v3 03/18] migration/rdma: create multifd_setup_ops for Tx/Rx thread
Create multifd_setup_ops for TxRx thread, no logic change. Signed-off-by: Chuan Zheng --- migration/multifd.c | 44 +++- migration/multifd.h | 7 +++ 2 files changed, 46 insertions(+), 5 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 68b171f..1f82307 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -383,6 +383,8 @@ struct { int exiting; /* multifd ops */ MultiFDMethods *ops; +/* multifd setup ops */ +MultiFDSetup *setup_ops; } *multifd_send_state; /* @@ -790,8 +792,9 @@ static bool multifd_channel_connect(MultiFDSendParams *p, } else { /* update for tls qio channel */ p->c = ioc; -qemu_thread_create(&p->thread, p->name, multifd_send_thread, p, - QEMU_THREAD_JOINABLE); +qemu_thread_create(&p->thread, p->name, + multifd_send_state->setup_ops->send_thread_setup, + p, QEMU_THREAD_JOINABLE); } return false; } @@ -839,6 +842,11 @@ cleanup: multifd_new_send_channel_cleanup(p, sioc, local_err); } +static void multifd_send_channel_setup(MultiFDSendParams *p) +{ +socket_send_channel_create(multifd_new_send_channel_async, p); +} + int multifd_save_setup(Error **errp) { int thread_count; @@ -856,6 +864,7 @@ int multifd_save_setup(Error **errp) multifd_send_state->pages = multifd_pages_init(page_count); qemu_sem_init(&multifd_send_state->channels_ready, 0); qatomic_set(&multifd_send_state->exiting, 0); +multifd_send_state->setup_ops = multifd_setup_ops_init(); multifd_send_state->ops = multifd_ops[migrate_multifd_compression()]; for (i = 0; i < thread_count; i++) { @@ -875,7 +884,7 @@ int multifd_save_setup(Error **errp) p->packet->version = cpu_to_be32(MULTIFD_VERSION); p->name = g_strdup_printf("multifdsend_%d", i); p->tls_hostname = g_strdup(s->hostname); -socket_send_channel_create(multifd_new_send_channel_async, p); +multifd_send_state->setup_ops->send_channel_setup(p); } for (i = 0; i < thread_count; i++) { @@ -902,6 +911,8 @@ struct { uint64_t packet_num; /* multifd ops */ MultiFDMethods *ops; +/* multifd setup ops */ +MultiFDSetup *setup_ops; } *multifd_recv_state; static void multifd_recv_terminate_threads(Error *err) @@ -1095,6 +1106,7 @@ int multifd_load_setup(Error **errp) multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count); qatomic_set(&multifd_recv_state->count, 0); qemu_sem_init(&multifd_recv_state->sem_sync, 0); +multifd_recv_state->setup_ops = multifd_setup_ops_init(); multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()]; for (i = 0; i < thread_count; i++) { @@ -1173,9 +1185,31 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp) p->num_packets = 1; p->running = true; -qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, - QEMU_THREAD_JOINABLE); +multifd_recv_state->setup_ops->recv_channel_setup(ioc, p); +qemu_thread_create(&p->thread, p->name, + multifd_recv_state->setup_ops->recv_thread_setup, + p, QEMU_THREAD_JOINABLE); qatomic_inc(&multifd_recv_state->count); return qatomic_read(&multifd_recv_state->count) == migrate_multifd_channels(); } + +static void multifd_recv_channel_setup(QIOChannel *ioc, MultiFDRecvParams *p) +{ +return; +} + +static MultiFDSetup multifd_socket_ops = { +.send_thread_setup = multifd_send_thread, +.recv_thread_setup = multifd_recv_thread, +.send_channel_setup = multifd_send_channel_setup, +.recv_channel_setup = multifd_recv_channel_setup +}; + +MultiFDSetup *multifd_setup_ops_init(void) +{ +MultiFDSetup *ops; + +ops = &multifd_socket_ops; +return ops; +} diff --git a/migration/multifd.h b/migration/multifd.h index 8d6751f..446315b 100644 --- a/migration/multifd.h +++ 
b/migration/multifd.h @@ -166,6 +166,13 @@ typedef struct { int (*recv_pages)(MultiFDRecvParams *p, uint32_t used, Error **errp); } MultiFDMethods; +typedef struct { +void *(*send_thread_setup)(void *opaque); +void *(*recv_thread_setup)(void *opaque); +void (*send_channel_setup)(MultiFDSendParams *p); +void (*recv_channel_setup)(QIOChannel *ioc, MultiFDRecvParams *p); +} MultiFDSetup; + void multifd_register_ops(int method, MultiFDMethods *ops); #endif -- 1.8.3.1
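Patch 04 in this series (not shown above) fills in this hook table for RDMA. Roughly, the shape it is building toward is sketched below; the multifd_rdma_* names are placeholders standing in for the functions added later in the series:

/*
 * Rough sketch of how a second transport plugs into MultiFDSetup.
 * The real wiring is done by later patches in this series; the
 * multifd_rdma_* names here are placeholders for illustration.
 */
static MultiFDSetup multifd_rdma_ops = {
    .send_thread_setup  = multifd_rdma_send_thread,
    .recv_thread_setup  = multifd_rdma_recv_thread,
    .send_channel_setup = multifd_rdma_send_channel_setup,
    .recv_channel_setup = multifd_rdma_recv_channel_setup,
};

MultiFDSetup *multifd_setup_ops_init(void)
{
    if (migrate_use_rdma()) {
        return &multifd_rdma_ops;      /* RDMA Tx/Rx threads          */
    }
    return &multifd_socket_ops;        /* default socket Tx/Rx threads */
}

The point of the indirection is that multifd_save_setup()/multifd_load_setup() stay transport-agnostic and only call through the hook table chosen at init time.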
[PATCH v3 05/18] migration/rdma: do not need sync main for rdma
Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/multifd.c | 8 1 file changed, 8 insertions(+) diff --git a/migration/multifd.c b/migration/multifd.c index 0d494df..8ccfd46 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -580,6 +580,10 @@ void multifd_send_sync_main(QEMUFile *f) if (!migrate_use_multifd()) { return; } + /* Do not need sync for rdma */ +if (migrate_use_rdma()) { +return; +} if (multifd_send_state->pages->used) { if (multifd_send_pages(f) < 0) { error_report("%s: multifd_send_pages fail", __func__); @@ -1002,6 +1006,10 @@ void multifd_recv_sync_main(void) if (!migrate_use_multifd()) { return; } +/* Do not need sync for rdma */ +if (migrate_use_rdma()) { +return; +} for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDRecvParams *p = &multifd_recv_state->params[i]; -- 1.8.3.1
[PATCH v3 11/18] migration/rdma: record host_port for multifd RDMA
Signed-off-by: Chuan Zheng --- migration/migration.c | 1 + migration/migration.h | 3 +++ migration/rdma.c | 3 +++ 3 files changed, 7 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index be6166a..7061410 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1891,6 +1891,7 @@ void migrate_init(MigrationState *s) s->postcopy_after_devices = false; s->migration_thread_running = false; s->enabled_rdma_migration = false; +s->host_port = NULL; error_free(s->error); s->error = NULL; s->hostname = NULL; diff --git a/migration/migration.h b/migration/migration.h index e92eb29..fea63de 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -269,6 +269,9 @@ struct MigrationState * Enable RDMA migration */ bool enabled_rdma_migration; + +/* Need by Multi-RDMA */ +char *host_port; }; void migrate_set_state(int *state, int old_state, int new_state); diff --git a/migration/rdma.c b/migration/rdma.c index 63559f1..dd9f705 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4206,6 +4206,8 @@ void rdma_start_outgoing_migration(void *opaque, goto err; } +s->host_port = g_strdup(host_port); + ret = qemu_rdma_source_init(rdma, s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL], errp); @@ -4250,6 +4252,7 @@ void rdma_start_outgoing_migration(void *opaque, s->to_dst_file = qemu_fopen_rdma(rdma, "wb"); migrate_fd_connect(s, NULL); +g_free(s->host_port); return; return_path_err: qemu_rdma_cleanup(rdma); -- 1.8.3.1
[PATCH v3 12/18] migration/rdma: Create the multifd send channels for RDMA
Signed-off-by: Chuan Zheng --- migration/multifd.c | 4 ++-- migration/multifd.h | 2 ++ migration/rdma.c| 56 + 3 files changed, 60 insertions(+), 2 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 03f3a1e..9439b3c 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -173,7 +173,7 @@ void multifd_register_ops(int method, MultiFDMethods *ops) multifd_ops[method] = ops; } -static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp) +int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp) { MultiFDInit_t msg = {}; int ret; @@ -500,7 +500,7 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset) return 1; } -static void multifd_send_terminate_threads(Error *err) +void multifd_send_terminate_threads(Error *err) { int i; diff --git a/migration/multifd.h b/migration/multifd.h index ff80bd5..ec9e897 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -184,6 +184,8 @@ typedef struct { #ifdef CONFIG_RDMA MultiFDSetup *multifd_rdma_setup(void); #endif +void multifd_send_terminate_threads(Error *err); +int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp); int get_multifd_send_param(int id, MultiFDSendParams **param); int get_multifd_recv_param(int id, MultiFDRecvParams **param); MultiFDSetup *multifd_setup_ops_init(void); diff --git a/migration/rdma.c b/migration/rdma.c index dd9f705..1af81f5 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4261,9 +4261,54 @@ err: g_free(rdma_return_path); } +static int multifd_channel_rdma_connect(void *opaque) +{ +MultiFDSendParams *p = opaque; +Error *local_err = NULL; +int ret = 0; +MigrationState *s = migrate_get_current(); + +p->rdma = qemu_rdma_data_init(s->host_port, &local_err); +if (p->rdma == NULL) { +goto out; +} + +ret = qemu_rdma_source_init(p->rdma, +migrate_use_rdma_pin_all(), +&local_err); +if (ret) { +goto out; +} + +ret = qemu_rdma_connect(p->rdma, &local_err); +if (ret) { +goto out; +} + +p->file = qemu_fopen_rdma(p->rdma, "wb"); +if (p->file == NULL) { +goto out; +} + +p->c = QIO_CHANNEL(getQIOChannel(p->file)); + +out: +if (local_err) { +trace_multifd_send_error(p->id); +} + +return ret; +} + static void *multifd_rdma_send_thread(void *opaque) { MultiFDSendParams *p = opaque; +Error *local_err = NULL; + +trace_multifd_send_thread_start(p->id); +if (multifd_send_initial_packet(p, &local_err) < 0) { +goto out; +} while (true) { qemu_mutex_lock(&p->mutex); @@ -4275,6 +4320,11 @@ static void *multifd_rdma_send_thread(void *opaque) qemu_sem_wait(&p->sem); } +out: +if (local_err) { +trace_multifd_send_error(p->id); +multifd_send_terminate_threads(local_err); +} qemu_mutex_lock(&p->mutex); p->running = false; qemu_mutex_unlock(&p->mutex); @@ -4286,6 +4336,12 @@ static void multifd_rdma_send_channel_setup(MultiFDSendParams *p) { Error *local_err = NULL; +if (multifd_channel_rdma_connect(p)) { +error_setg(&local_err, "multifd: rdma channel %d not established", + p->id); +return ; +} + if (p->quit) { error_setg(&local_err, "multifd: send id %d already quit", p->id); return ; -- 1.8.3.1
[PATCH v3 13/18] migration/rdma: Add the function for dynamic page registration
Add the 'qemu_rdma_registration' function, multifd send threads call it to register memory. Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/rdma.c | 51 +++ 1 file changed, 51 insertions(+) diff --git a/migration/rdma.c b/migration/rdma.c index 1af81f5..a366849 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3738,6 +3738,57 @@ out: return ret; } +/* + * Dynamic page registrations for multifd RDMA threads. + */ +static int qemu_rdma_registration(void *opaque) +{ +RDMAContext *rdma = opaque; +RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT }; +RDMALocalBlocks *local = &rdma->local_ram_blocks; +int reg_result_idx, i, nb_dest_blocks; +RDMAControlHeader head = { .len = 0, .repeat = 1 }; +int ret = 0; + +head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST; + +ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp, +®_result_idx, rdma->pin_all ? +qemu_rdma_reg_whole_ram_blocks : NULL); +if (ret < 0) { +goto out; +} + +nb_dest_blocks = resp.len / sizeof(RDMADestBlock); + +if (local->nb_blocks != nb_dest_blocks) { +rdma->error_state = -EINVAL; +ret = -1; +goto out; +} + +qemu_rdma_move_header(rdma, reg_result_idx, &resp); +memcpy(rdma->dest_blocks, + rdma->wr_data[reg_result_idx].control_curr, resp.len); + +for (i = 0; i < nb_dest_blocks; i++) { +network_to_dest_block(&rdma->dest_blocks[i]); + +/* We require that the blocks are in the same order */ +if (rdma->dest_blocks[i].length != local->block[i].length) { +rdma->error_state = -EINVAL; +ret = -1; +goto out; +} +local->block[i].remote_host_addr = +rdma->dest_blocks[i].remote_host_addr; +local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey; +} + +out: +return ret; +} + /* Destination: * Called via a ram_control_load_hook during the initial RAM load section which * lists the RAMBlocks by name. This lets us know the order of the RAMBlocks -- 1.8.3.1
[PATCH v3 02/18] migration/rdma: judge whether or not the RDMA is used for migration
Add enabled_rdma_migration into MigrationState to judge whether or not the RDMA is used for migration. Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/migration.c | 13 + migration/migration.h | 6 ++ 2 files changed, 19 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 64ae417..be6166a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -389,7 +389,9 @@ void migrate_add_address(SocketAddress *address) void qemu_start_incoming_migration(const char *uri, Error **errp) { const char *p = NULL; +MigrationState *s = migrate_get_current(); +s->enabled_rdma_migration = false; qapi_event_send_migration(MIGRATION_STATUS_SETUP); if (!strcmp(uri, "defer")) { deferred_incoming_migration(errp); @@ -399,6 +401,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp) socket_start_incoming_migration(p ? p : uri, errp); #ifdef CONFIG_RDMA } else if (strstart(uri, "rdma:", &p)) { +s->enabled_rdma_migration = true; rdma_start_incoming_migration(p, errp); #endif } else if (strstart(uri, "exec:", &p)) { @@ -1887,6 +1890,7 @@ void migrate_init(MigrationState *s) s->start_postcopy = false; s->postcopy_after_devices = false; s->migration_thread_running = false; +s->enabled_rdma_migration = false; error_free(s->error); s->error = NULL; s->hostname = NULL; @@ -2115,6 +2119,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, socket_start_outgoing_migration(s, p ? p : uri, &local_err); #ifdef CONFIG_RDMA } else if (strstart(uri, "rdma:", &p)) { +s->enabled_rdma_migration = true; rdma_start_outgoing_migration(s, p, &local_err); #endif } else if (strstart(uri, "exec:", &p)) { @@ -2329,6 +2334,14 @@ bool migrate_use_events(void) return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS]; } +bool migrate_use_rdma(void) +{ +MigrationState *s; +s = migrate_get_current(); + +return s->enabled_rdma_migration; +} + bool migrate_use_rdma_pin_all(void) { MigrationState *s; diff --git a/migration/migration.h b/migration/migration.h index 74fd790..e92eb29 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -264,6 +264,11 @@ struct MigrationState * This save hostname when out-going migration starts */ char *hostname; + +/* + * Enable RDMA migration + */ +bool enabled_rdma_migration; }; void migrate_set_state(int *state, int old_state, int new_state); @@ -300,6 +305,7 @@ bool migrate_ignore_shared(void); bool migrate_validate_uuid(void); bool migrate_auto_converge(void); +bool migrate_use_rdma(void); bool migrate_use_rdma_pin_all(void); bool migrate_use_multifd(void); bool migrate_pause_before_switchover(void); -- 1.8.3.1
[PATCH v3 10/18] migration/rdma: Create the multifd recv channels for RDMA
We still don't transmit anything through them, and we only build the RDMA connections. Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/rdma.c | 70 ++-- 1 file changed, 68 insertions(+), 2 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 2baa933..63559f1 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3266,6 +3266,40 @@ static void rdma_cm_poll_handler(void *opaque) } } +static bool qemu_rdma_accept_setup(RDMAContext *rdma) +{ +RDMAContext *multifd_rdma = NULL; +int thread_count; +int i; +MultiFDRecvParams *multifd_recv_param; +thread_count = migrate_multifd_channels(); +/* create the multifd channels for RDMA */ +for (i = 0; i < thread_count; i++) { +if (get_multifd_recv_param(i, &multifd_recv_param) < 0) { +error_report("rdma: error getting multifd_recv_param(%d)", i); +return false; +} + +multifd_rdma = (RDMAContext *) multifd_recv_param->rdma; +if (multifd_rdma->cm_id == NULL) { +break; +} else { +multifd_rdma = NULL; +} +} + +if (multifd_rdma) { +qemu_set_fd_handler(rdma->channel->fd, +rdma_accept_incoming_migration, +NULL, (void *)(intptr_t)multifd_rdma); +} else { +qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler, +NULL, rdma); +} + +return true; +} + static int qemu_rdma_accept(RDMAContext *rdma) { RDMACapabilities cap; @@ -3365,6 +3399,10 @@ static int qemu_rdma_accept(RDMAContext *rdma) qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration, NULL, (void *)(intptr_t)rdma->return_path); +} else if (migrate_use_multifd()) { +if (!qemu_rdma_accept_setup(rdma)) { +goto err_rdma_dest_wait; +} } else { qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler, NULL, rdma); @@ -3975,6 +4013,35 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode) return rioc->file; } +static void migration_rdma_process_incoming(QEMUFile *f, +RDMAContext *rdma, Error **errp) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +QIOChannel *ioc = NULL; +bool start_migration = false; + +/* FIXME: Need refactor */ +if (!migrate_use_multifd()) { +rdma->migration_started_on_destination = 1; +migration_fd_process_incoming(f, errp); +return; +} + +if (!mis->from_src_file) { +mis->from_src_file = f; +qemu_file_set_blocking(f, false); +} else { +ioc = QIO_CHANNEL(getQIOChannel(f)); +/* Multiple connections */ +assert(migrate_use_multifd()); +start_migration = multifd_recv_new_channel(ioc, errp); +} + +if (start_migration) { +migration_incoming_process(); +} +} + static void rdma_accept_incoming_migration(void *opaque) { RDMAContext *rdma = opaque; @@ -4003,8 +4070,7 @@ static void rdma_accept_incoming_migration(void *opaque) return; } -rdma->migration_started_on_destination = 1; -migration_fd_process_incoming(f, &local_err); +migration_rdma_process_incoming(f, rdma, &local_err); if (local_err) { error_reportf_err(local_err, "RDMA ERROR:"); } -- 1.8.3.1
[PATCH v3 09/18] migration/rdma: add multifd_rdma_load_setup() to setup multifd rdma
Signed-off-by: Chuan Zheng --- migration/rdma.c | 52 1 file changed, 52 insertions(+) diff --git a/migration/rdma.c b/migration/rdma.c index ad4e4ba..2baa933 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4010,6 +4010,48 @@ static void rdma_accept_incoming_migration(void *opaque) } } +static bool multifd_rdma_load_setup(const char *host_port, +RDMAContext *rdma, Error **errp) +{ +int thread_count; +int i; +int idx; +MultiFDRecvParams *multifd_recv_param; +RDMAContext *multifd_rdma; + +if (!migrate_use_multifd()) { +return true; +} + +if (multifd_load_setup(errp) != 0) { +/* + * We haven't been able to create multifd threads + * nothing better to do + */ +return false; +} + +thread_count = migrate_multifd_channels(); +for (i = 0; i < thread_count; i++) { +if (get_multifd_recv_param(i, &multifd_recv_param) < 0) { +ERROR(errp, "rdma: error getting multifd_recv_param(%d)", i); +return false; +} + +multifd_rdma = qemu_rdma_data_init(host_port, errp); +for (idx = 0; idx < RDMA_WRID_MAX; idx++) { +multifd_rdma->wr_data[idx].control_len = 0; +multifd_rdma->wr_data[idx].control_curr = NULL; +} +/* the CM channel and CM id is shared */ +multifd_rdma->channel = rdma->channel; +multifd_rdma->listen_id = rdma->listen_id; +multifd_recv_param->rdma = (void *)multifd_rdma; +} + +return true; +} + void rdma_start_incoming_migration(const char *host_port, Error **errp) { int ret; @@ -4057,6 +4099,16 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp) qemu_rdma_return_path_dest_init(rdma_return_path, rdma); } +/* multifd rdma setup */ +if (!multifd_rdma_load_setup(host_port, rdma, &local_err)) { +/* + * We haven't been able to create multifd threads + * nothing better to do + */ +error_report_err(local_err); +goto err; +} + qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration, NULL, (void *)(intptr_t)rdma); return; -- 1.8.3.1
[PATCH v3 17/18] migration/rdma: send data for both rdma-pin-all and NOT rdma-pin-all mode
Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/rdma.c | 67 +++- 1 file changed, 62 insertions(+), 5 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 327f80f..519fa7a 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2001,6 +2001,20 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma, .repeat = 1, }; +/* use multifd to send data */ +if (migrate_use_multifd()) { +int channel = get_multifd_RDMA_channel(); +int ret = 0; +MultiFDSendParams *multifd_send_param = NULL; +ret = get_multifd_send_param(channel, &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd_send_param(%d)", channel); +return -EINVAL; +} +rdma = (RDMAContext *)multifd_send_param->rdma; +block = &(rdma->local_ram_blocks.block[current_index]); +} + retry: sge.addr = (uintptr_t)(block->local_host_addr + (current_addr - block->offset)); @@ -2196,6 +2210,27 @@ retry: return 0; } +static int multifd_rdma_write_flush(void) +{ +/* The multifd RDMA threads send data */ +MultiFDSendParams *multifd_send_param = NULL; +RDMAContext *rdma = NULL; +MigrationState *s = migrate_get_current(); +int ret = 0; + +ret = get_multifd_send_param(s->rdma_channel, + &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd_send_param(%d)", + s->rdma_channel); +return ret; +} +rdma = (RDMAContext *)(multifd_send_param->rdma); +rdma->nb_sent++; + +return ret; +} + /* * Push out any unwritten RDMA operations. * @@ -2218,8 +2253,15 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma) } if (ret == 0) { -rdma->nb_sent++; -trace_qemu_rdma_write_flush(rdma->nb_sent); +if (migrate_use_multifd()) { +ret = multifd_rdma_write_flush(); +if (ret) { +return ret; +} +} else { +rdma->nb_sent++; +trace_qemu_rdma_write_flush(rdma->nb_sent); +} } rdma->current_length = 0; @@ -4061,6 +4103,7 @@ wait_reg_complete: } qemu_sem_post(&multifd_send_param->sem_sync); +qemu_sem_wait(&multifd_send_param->sem); } } @@ -4443,6 +4486,7 @@ static void *multifd_rdma_send_thread(void *opaque) Error *local_err = NULL; int ret = 0; RDMAControlHeader head = { .len = 0, .repeat = 1 }; +RDMAContext *rdma = p->rdma; trace_multifd_send_thread_start(p->id); if (multifd_send_initial_packet(p, &local_err) < 0) { @@ -4451,7 +4495,7 @@ static void *multifd_rdma_send_thread(void *opaque) /* wait for semaphore notification to register memory */ qemu_sem_wait(&p->sem_sync); -if (qemu_rdma_registration(p->rdma) < 0) { +if (qemu_rdma_registration(rdma) < 0) { goto out; } /* @@ -4467,13 +4511,26 @@ static void *multifd_rdma_send_thread(void *opaque) break; } qemu_mutex_unlock(&p->mutex); - +/* To complete polling(CQE) */ +while (rdma->nb_sent) { +ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL); +if (ret < 0) { +error_report("multifd RDMA migration: " + "complete polling error!"); +return NULL; +} +} /* Send FINISHED to the destination */ head.type = RDMA_CONTROL_REGISTER_FINISHED; -ret = qemu_rdma_exchange_send(p->rdma, &head, NULL, NULL, NULL, NULL); +ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL); if (ret < 0) { +error_report("multifd RDMA migration: " + "sending remote error!"); return NULL; } + +/* sync main thread */ +qemu_sem_post(&p->sem); } out: -- 1.8.3.1
[PATCH v3 00/18] Support Multifd for RDMA migration
Now I continue to support multifd for RDMA migration based on my colleague
Zhimin's work. :)

The previous RFC patches are listed below:
v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg669455.html
v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg679188.html

As described in the previous RFC, the RDMA bandwidth is not fully utilized
on NICs above 25 Gigabit because a single channel is used for RDMA
migration. This patch series adds multifd support for RDMA migration on
top of the multifd framework.

The comparison between original and multifd RDMA migration was re-tested
for v3. The VM specifications for migration are as follows:
- the VM uses 4k pages;
- the number of VCPUs is 4;
- the total memory is 16 GB;
- the 'mempress' tool is used to pressurize the VM (mempress 8000 500);
- a 25 Gigabit network card is used to migrate.

For original RDMA and MultiRDMA migration, the total migration times of
the VM are as follows:

+-------------+------------------+--------------+
|             | NOT rdma-pin-all | rdma-pin-all |
+-------------+------------------+--------------+
| origin RDMA |       26 s       |     29 s     |
+-------------+------------------+--------------+
| MultiRDMA   |       16 s       |     17 s     |
+-------------+------------------+--------------+

Test the multifd RDMA migration like this:
virsh migrate --live --multiFd --migrateuri rdma://192.168.1.100 [VM] \
    --listen-address 0.0.0.0 qemu+tcp://192.168.1.100/system --verbose

v2 -> v3:
    create multifd ops for both tcp and rdma
    do not export rdma to avoid multifd code in mess
    fix build issue for non-rdma
    fix some codestyle and buggy code

Chuan Zheng (18):
  migration/rdma: add the 'migrate_use_rdma_pin_all' function
  migration/rdma: judge whether or not the RDMA is used for migration
  migration/rdma: create multifd_setup_ops for Tx/Rx thread
  migration/rdma: add multifd_setup_ops for rdma
  migration/rdma: do not need sync main for rdma
  migration/rdma: export MultiFDSendParams/MultiFDRecvParams
  migration/rdma: add rdma field into multifd send/recv param
  migration/rdma: export getQIOChannel to get QIOchannel in rdma
  migration/rdma: add multifd_rdma_load_setup() to setup multifd rdma
  migration/rdma: Create the multifd recv channels for RDMA
  migration/rdma: record host_port for multifd RDMA
  migration/rdma: Create the multifd send channels for RDMA
  migration/rdma: Add the function for dynamic page registration
  migration/rdma: register memory for multifd RDMA channels
  migration/rdma: only register the memory for multifd channels
  migration/rdma: add rdma_channel into Migrationstate field
  migration/rdma: send data for both rdma-pin-all and NOT rdma-pin-all mode
  migration/rdma: RDMA cleanup for multifd migration

 migration/migration.c |  24 +++
 migration/migration.h |  11 ++
 migration/multifd.c   |  97 +-
 migration/multifd.h   |  24 +++
 migration/qemu-file.c |   5 +
 migration/qemu-file.h |   1 +
 migration/rdma.c      | 503 +-
 7 files changed, 653 insertions(+), 12 deletions(-)

-- 
1.8.3.1
[PATCH v3 15/18] migration/rdma: only register the memory for multifd channels
All data is sent by the multifd channels, so only register memory for the
multifd channels; the main channel does not need to register it.

Signed-off-by: Zhimin Feng
Signed-off-by: Chuan Zheng
---
 migration/rdma.c | 8
 1 file changed, 8 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 3210e6e..d5d6364 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3938,6 +3938,12 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 
             qemu_sem_post(&multifd_send_param->sem_sync);
         }
+
+        /*
+         * Use multifd to migrate, we only register memory for
+         * multifd RDMA channel and main channel don't register it.
+         */
+        goto wait_reg_complete;
     }
 
     /*
@@ -3998,6 +4004,8 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                 rdma->dest_blocks[i].remote_host_addr;
             local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey;
         }
+
+wait_reg_complete:
         /* Wait for all multifd channels to complete registration */
         if (migrate_use_multifd()) {
             int i;
-- 
1.8.3.1
[PATCH v3 08/18] migration/rdma: export getQIOChannel to get QIOchannel in rdma
Signed-off-by: Zhimin Feng
Signed-off-by: Chuan Zheng
---
 migration/qemu-file.c | 5 +
 migration/qemu-file.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index be21518..37f6201 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -260,6 +260,11 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
     }
 }
 
+void *getQIOChannel(QEMUFile *f)
+{
+    return f->opaque;
+}
+
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
 {
     int ret = 0;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index a9b6d6c..4cef043 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -165,6 +165,7 @@ void qemu_file_set_blocking(QEMUFile *f, bool block);
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
+void *getQIOChannel(QEMUFile *f);
 
 /* Whenever this is found in the data stream, the flags
  * will be passed to ram_control_load_hook in the incoming-migration
-- 
1.8.3.1
[PATCH v3 14/18] migration/rdma: register memory for multifd RDMA channels
Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/multifd.c | 3 ++ migration/rdma.c| 94 +++-- 2 files changed, 95 insertions(+), 2 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 9439b3c..c4d90ef 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -534,6 +534,9 @@ void multifd_send_terminate_threads(Error *err) qemu_mutex_lock(&p->mutex); p->quit = true; qemu_sem_post(&p->sem); +if (migrate_use_rdma()) { +qemu_sem_post(&p->sem_sync); +} qemu_mutex_unlock(&p->mutex); } } diff --git a/migration/rdma.c b/migration/rdma.c index a366849..3210e6e 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3837,6 +3837,19 @@ static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data) return rdma_block_notification_handle(opaque, data); case RAM_CONTROL_HOOK: +if (migrate_use_multifd()) { +int i; +MultiFDRecvParams *multifd_recv_param = NULL; +int thread_count = migrate_multifd_channels(); +/* Inform dest recv_thread to poll */ +for (i = 0; i < thread_count; i++) { +if (get_multifd_recv_param(i, &multifd_recv_param)) { +return -1; +} +qemu_sem_post(&multifd_recv_param->sem_sync); +} +} + return qemu_rdma_registration_handle(f, opaque); default: @@ -3909,6 +3922,24 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST; trace_qemu_rdma_registration_stop_ram(); +if (migrate_use_multifd()) { +/* + * Inform the multifd channels to register memory + */ +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_multifd_send_param(i, &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd(%d)", i); +return ret; +} + +qemu_sem_post(&multifd_send_param->sem_sync); +} +} + /* * Make sure that we parallelize the pinning on both sides. * For very large guests, doing this serially takes a really @@ -3967,6 +3998,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, rdma->dest_blocks[i].remote_host_addr; local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey; } +/* Wait for all multifd channels to complete registration */ +if (migrate_use_multifd()) { +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_multifd_send_param(i, &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd(%d)", i); +return ret; +} + +qemu_sem_wait(&multifd_send_param->sem); +} +} } trace_qemu_rdma_registration_stop(flags); @@ -3978,6 +4024,24 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, goto err; } +if (migrate_use_multifd()) { +/* + * Inform src send_thread to send FINISHED signal. + * Wait for multifd RDMA send threads to poll the CQE. 
+ */ +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_multifd_send_param(i, &multifd_send_param); +if (ret < 0) { +goto err; +} + +qemu_sem_post(&multifd_send_param->sem_sync); +} +} + return 0; err: rdma->error_state = ret; @@ -4355,20 +4419,39 @@ static void *multifd_rdma_send_thread(void *opaque) { MultiFDSendParams *p = opaque; Error *local_err = NULL; +int ret = 0; +RDMAControlHeader head = { .len = 0, .repeat = 1 }; trace_multifd_send_thread_start(p->id); if (multifd_send_initial_packet(p, &local_err) < 0) { goto out; } +/* wait for semaphore notification to register memory */ +qemu_sem_wait(&p->sem_sync); +if (qemu_rdma_registration(p->rdma) < 0) { +goto out; +} +/* + * Inform the main RDMA thread to run when multifd + * RDMA thread have completed registration. + */ +qemu_sem_post(&p->sem); while (true) { +qemu_sem_wait(&p->sem_sync); qemu_mutex_lock(&p->mutex); if (p->quit) { qemu_mutex_unloc
[PATCH v3 01/18] migration/rdma: add the 'migrate_use_rdma_pin_all' function
Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/migration.c | 9 + migration/migration.h | 1 + 2 files changed, 10 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 0575ecb..64ae417 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2329,6 +2329,15 @@ bool migrate_use_events(void) return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS]; } +bool migrate_use_rdma_pin_all(void) +{ +MigrationState *s; + +s = migrate_get_current(); + +return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL]; +} + bool migrate_use_multifd(void) { MigrationState *s; diff --git a/migration/migration.h b/migration/migration.h index deb411a..74fd790 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -300,6 +300,7 @@ bool migrate_ignore_shared(void); bool migrate_validate_uuid(void); bool migrate_auto_converge(void); +bool migrate_use_rdma_pin_all(void); bool migrate_use_multifd(void); bool migrate_pause_before_switchover(void); int migrate_multifd_channels(void); -- 1.8.3.1
[PATCH v3 06/18] migration/rdma: export MultiFDSendParams/MultiFDRecvParams
MultiFDSendParams and MultiFDRecvParams are needed for rdma, so export them. Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/multifd.c | 26 ++ migration/multifd.h | 2 ++ 2 files changed, 28 insertions(+) diff --git a/migration/multifd.c b/migration/multifd.c index 8ccfd46..03f3a1e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -387,6 +387,19 @@ struct { MultiFDSetup *setup_ops; } *multifd_send_state; +int get_multifd_send_param(int id, MultiFDSendParams **param) +{ +int ret = 0; + +if (id < 0 || id >= migrate_multifd_channels()) { +ret = -1; +} else { +*param = &(multifd_send_state->params[id]); +} + +return ret; +} + /* * How we use multifd_send_state->pages and channel->pages? * @@ -919,6 +932,19 @@ struct { MultiFDSetup *setup_ops; } *multifd_recv_state; +int get_multifd_recv_param(int id, MultiFDRecvParams **param) +{ +int ret = 0; + +if (id < 0 || id >= migrate_multifd_channels()) { +ret = -1; +} else { +*param = &(multifd_recv_state->params[id]); +} + +return ret; +} + static void multifd_recv_terminate_threads(Error *err) { int i; diff --git a/migration/multifd.h b/migration/multifd.h index 62a0b2a..2f4e585 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -176,6 +176,8 @@ typedef struct { #ifdef CONFIG_RDMA MultiFDSetup *multifd_rdma_setup(void); #endif +int get_multifd_send_param(int id, MultiFDSendParams **param); +int get_multifd_recv_param(int id, MultiFDRecvParams **param); MultiFDSetup *multifd_setup_ops_init(void); void multifd_register_ops(int method, MultiFDMethods *ops); -- 1.8.3.1
[PATCH v3 04/18] migration/rdma: add multifd_setup_ops for rdma
Signed-off-by: Chuan Zheng --- migration/multifd.c | 6 migration/multifd.h | 4 +++ migration/rdma.c| 82 + 3 files changed, 92 insertions(+) diff --git a/migration/multifd.c b/migration/multifd.c index 1f82307..0d494df 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -1210,6 +1210,12 @@ MultiFDSetup *multifd_setup_ops_init(void) { MultiFDSetup *ops; +#ifdef CONFIG_RDMA +if (migrate_use_rdma()) { +ops = multifd_rdma_setup(); +return ops; +} +#endif ops = &multifd_socket_ops; return ops; } diff --git a/migration/multifd.h b/migration/multifd.h index 446315b..62a0b2a 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -173,6 +173,10 @@ typedef struct { void (*recv_channel_setup)(QIOChannel *ioc, MultiFDRecvParams *p); } MultiFDSetup; +#ifdef CONFIG_RDMA +MultiFDSetup *multifd_rdma_setup(void); +#endif +MultiFDSetup *multifd_setup_ops_init(void); void multifd_register_ops(int method, MultiFDMethods *ops); #endif diff --git a/migration/rdma.c b/migration/rdma.c index 0340841..ad4e4ba 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -19,6 +19,7 @@ #include "qemu/cutils.h" #include "rdma.h" #include "migration.h" +#include "multifd.h" #include "qemu-file.h" #include "ram.h" #include "qemu-file-channel.h" @@ -4138,3 +4139,84 @@ err: g_free(rdma); g_free(rdma_return_path); } + +static void *multifd_rdma_send_thread(void *opaque) +{ +MultiFDSendParams *p = opaque; + +while (true) { +qemu_mutex_lock(&p->mutex); +if (p->quit) { +qemu_mutex_unlock(&p->mutex); +break; +} +qemu_mutex_unlock(&p->mutex); +qemu_sem_wait(&p->sem); +} + +qemu_mutex_lock(&p->mutex); +p->running = false; +qemu_mutex_unlock(&p->mutex); + +return NULL; +} + +static void multifd_rdma_send_channel_setup(MultiFDSendParams *p) +{ +Error *local_err = NULL; + +if (p->quit) { +error_setg(&local_err, "multifd: send id %d already quit", p->id); +return ; +} +p->running = true; + +qemu_thread_create(&p->thread, p->name, multifd_rdma_send_thread, p, + QEMU_THREAD_JOINABLE); +} + +static void *multifd_rdma_recv_thread(void *opaque) +{ +MultiFDRecvParams *p = opaque; + +while (true) { +qemu_mutex_lock(&p->mutex); +if (p->quit) { +qemu_mutex_unlock(&p->mutex); +break; +} +qemu_mutex_unlock(&p->mutex); +qemu_sem_wait(&p->sem_sync); +} + +qemu_mutex_lock(&p->mutex); +p->running = false; +qemu_mutex_unlock(&p->mutex); + +return NULL; +} + +static void multifd_rdma_recv_channel_setup(QIOChannel *ioc, +MultiFDRecvParams *p) +{ +QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc); + +p->file = rioc->file; +return; +} + +static MultiFDSetup multifd_rdma_ops = { +.send_thread_setup = multifd_rdma_send_thread, +.recv_thread_setup = multifd_rdma_recv_thread, +.send_channel_setup = multifd_rdma_send_channel_setup, +.recv_channel_setup = multifd_rdma_recv_channel_setup +}; + +MultiFDSetup *multifd_rdma_setup(void) +{ +MultiFDSetup *rdma_ops; + +rdma_ops = &multifd_rdma_ops; + +return rdma_ops; +} -- 1.8.3.1
[PATCH v3 07/18] migration/rdma: add rdma field into multifd send/recv param
Note: we do not want to export any rdma struct, so take a void * instead. Signed-off-by: Chuan Zheng --- migration/multifd.h | 8 1 file changed, 8 insertions(+) diff --git a/migration/multifd.h b/migration/multifd.h index 2f4e585..ff80bd5 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -108,6 +108,10 @@ typedef struct { QemuSemaphore sem_sync; /* used for compression methods */ void *data; +/* used for multifd rdma */ +void *rdma; +/* communication channel */ +QEMUFile *file; } MultiFDSendParams; typedef struct { @@ -147,6 +151,10 @@ typedef struct { QemuSemaphore sem_sync; /* used for de-compression methods */ void *data; +/* used for multifd rdma */ +void *rdma; +/* communication channel */ +QEMUFile *file; } MultiFDRecvParams; typedef struct { -- 1.8.3.1
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau wrote: > On Sat, Oct 17, 2020 at 03:40:08AM +0200, Jann Horn wrote: > > [adding some more people who are interested in RNG stuff: Andy, Jason, > > Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this > > concerns some pretty fundamental API stuff related to RNG usage] > > > > On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin > > wrote: > > > This patch is a driver which exposes the Virtual Machine Generation ID > > > via a char-dev FS interface that provides ID update sync and async > > > notification, retrieval and confirmation mechanisms: > > > > > > When the device is 'open()'ed a copy of the current vm UUID is > > > associated with the file handle. 'read()' operations block until the > > > associated UUID is no longer up to date - until HW vm gen id changes - > > > at which point the new UUID is provided/returned. Nonblocking 'read()' > > > uses EWOULDBLOCK to signal that there is no _new_ UUID available. > > > > > > 'poll()' is implemented to allow polling for UUID updates. Such > > > updates result in 'EPOLLIN' events. > > > > > > Subsequent read()s following a UUID update no longer block, but return > > > the updated UUID. The application needs to acknowledge the UUID update > > > by confirming it through a 'write()'. > > > Only on writing back to the driver the right/latest UUID, will the > > > driver mark this "watcher" as up to date and remove EPOLLIN status. > > > > > > 'mmap()' support allows mapping a single read-only shared page which > > > will always contain the latest UUID value at offset 0. > > > > It would be nicer if that page just contained an incrementing counter, > > instead of a UUID. It's not like the application cares *what* the UUID > > changed to, just that it *did* change and all RNGs state now needs to > > be reseeded from the kernel, right? And an application can't reliably > > read the entire UUID from the memory mapping anyway, because the VM > > might be forked in the middle. > > > > So I think your kernel driver should detect UUID changes and then turn > > those into a monotonically incrementing counter. (Probably 64 bits > > wide?) (That's probably also a little bit faster than comparing an > > entire UUID.) > > I agree with this. Further, I'm observing there is a very common > confusion between "universally unique" and "random". Randoms are > needed when seeking unpredictability. A random number generator > *must* be able to return the same value multiple times in a row > (though this is rare), otherwise it's not random. [...] > If the UUIDs used there are real UUIDs, it could be as simple as > updating them according to their format, i.e. updating the timestamp, > and if the timestamp is already the same, just increase the seq counter. > Doing this doesn't require entropy, doesn't need to block and doesn't > needlessly leak randoms that sometimes make people feel nervous. Those UUIDs are supplied by existing hypervisor code; in that regard, this is almost like a driver for a hardware device. It is written against a fixed API provided by the underlying machine. Making sure that the sequence of UUIDs, as seen from inside the machine, never changes back to a previous one is the responsibility of the hypervisor and out of scope for this driver. 
Microsoft's spec document (which is a .docx file for reasons I don't understand) actually promises us that it is a cryptographically random 128-bit integer value, which means that if you fork a VM 2^64 times, the probability that any two of those VMs have the same counter is 2^-64. That should be good enough. But in userspace, we just need a simple counter. There's no need for us to worry about anything else, like timestamps or whatever. If we repeatedly fork a paused VM, the forked VMs will see the same counter value, but that's totally fine, because the only thing that matters to userspace is that the counter changes when the VM is forked. And actually, since the value is a cryptographically random 128-bit value, I think that we should definitely use it to help reseed the kernel's RNG, and keep it secret from userspace. That way, even if the VM image is public, we can ensure that going forward, the kernel RNG will return securely random data.
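For illustration only, here is a minimal userspace sketch of what consuming such a mapped counter could look like. The device path /dev/vmgenid and the "64-bit counter at offset 0" layout are assumptions made up for this example; the posted driver currently maps a UUID, not a counter.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical char-dev node exposing a generation counter. */
    int fd = open("/dev/vmgenid", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Map the single read-only page; offset 0 is assumed to hold a
     * monotonically increasing 64-bit generation value. */
    volatile uint64_t *gen = mmap(NULL, sizeof(*gen), PROT_READ,
                                  MAP_SHARED, fd, 0);
    if (gen == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    uint64_t last = *gen;
    for (;;) {
        if (*gen != last) {
            /* The VM was cloned or restored from a snapshot:
             * reseed any userspace RNG/UUID/counter state now. */
            last = *gen;
            fprintf(stderr, "generation changed, reseeding\n");
        }
        usleep(100 * 1000);
    }
}

A counter makes this loop race-free: unlike a 128-bit UUID, a single aligned 64-bit load cannot observe a half-updated value if the VM is forked mid-read.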
[PATCH 3/4] target/s390x: Improve cc computation for SUBTRACT LOGICAL
The resulting cc is only dependent on the result and the borrow-out. So save those things rather than the inputs. Borrow-out for 64-bit inputs is had via tcg_gen_sub2_i64 directly into cc_src. Borrow-out for 32-bit inputs is had via extraction from a normal 64-bit sub (with zero-extended inputs). Signed-off-by: Richard Henderson --- target/s390x/internal.h| 3 +-- target/s390x/cc_helper.c | 40 ++- target/s390x/helper.c | 3 +-- target/s390x/translate.c | 55 +++--- target/s390x/insn-data.def | 24 - 5 files changed, 43 insertions(+), 82 deletions(-) diff --git a/target/s390x/internal.h b/target/s390x/internal.h index f5f3ae063e..4077047494 100644 --- a/target/s390x/internal.h +++ b/target/s390x/internal.h @@ -161,6 +161,7 @@ enum cc_op { CC_OP_NZ, /* env->cc_dst != 0 */ CC_OP_ADDU, /* dst != 0, src = carry out (0,1) */ +CC_OP_SUBU, /* dst != 0, src = borrow out (0,-1) */ CC_OP_LTGT_32, /* signed less/greater than (32bit) */ CC_OP_LTGT_64, /* signed less/greater than (64bit) */ @@ -171,7 +172,6 @@ enum cc_op { CC_OP_ADD_64, /* overflow on add (64bit) */ CC_OP_SUB_64, /* overflow on subtraction (64bit) */ -CC_OP_SUBU_64, /* overflow on unsigned subtraction (64bit) */ CC_OP_SUBB_64, /* overflow on unsigned sub-borrow (64bit) */ CC_OP_ABS_64, /* sign eval on abs (64bit) */ CC_OP_NABS_64, /* sign eval on nabs (64bit) */ @@ -179,7 +179,6 @@ enum cc_op { CC_OP_ADD_32, /* overflow on add (32bit) */ CC_OP_SUB_32, /* overflow on subtraction (32bit) */ -CC_OP_SUBU_32, /* overflow on unsigned subtraction (32bit) */ CC_OP_SUBB_32, /* overflow on unsigned sub-borrow (32bit) */ CC_OP_ABS_32, /* sign eval on abs (64bit) */ CC_OP_NABS_32, /* sign eval on nabs (64bit) */ diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c index cd2c5c4b39..c7728d1225 100644 --- a/target/s390x/cc_helper.c +++ b/target/s390x/cc_helper.c @@ -129,6 +129,11 @@ static uint32_t cc_calc_addu(uint64_t carry_out, uint64_t result) return (result != 0) + 2 * carry_out; } +static uint32_t cc_calc_subu(uint64_t borrow_out, uint64_t result) +{ +return cc_calc_addu(borrow_out + 1, result); +} + static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar) { if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) { @@ -159,19 +164,6 @@ static uint32_t cc_calc_sub_64(int64_t a1, int64_t a2, int64_t ar) } } -static uint32_t cc_calc_subu_64(uint64_t a1, uint64_t a2, uint64_t ar) -{ -if (ar == 0) { -return 2; -} else { -if (a2 > a1) { -return 1; -} else { -return 3; -} -} -} - static uint32_t cc_calc_subb_64(uint64_t a1, uint64_t a2, uint64_t ar) { int borrow_out; @@ -245,19 +237,6 @@ static uint32_t cc_calc_sub_32(int32_t a1, int32_t a2, int32_t ar) } } -static uint32_t cc_calc_subu_32(uint32_t a1, uint32_t a2, uint32_t ar) -{ -if (ar == 0) { -return 2; -} else { -if (a2 > a1) { -return 1; -} else { -return 3; -} -} -} - static uint32_t cc_calc_subb_32(uint32_t a1, uint32_t a2, uint32_t ar) { int borrow_out; @@ -462,15 +441,15 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_ADDU: r = cc_calc_addu(src, dst); break; +case CC_OP_SUBU: +r = cc_calc_subu(src, dst); +break; case CC_OP_ADD_64: r = cc_calc_add_64(src, dst, vr); break; case CC_OP_SUB_64: r = cc_calc_sub_64(src, dst, vr); break; -case CC_OP_SUBU_64: -r = cc_calc_subu_64(src, dst, vr); -break; case CC_OP_SUBB_64: r = cc_calc_subb_64(src, dst, vr); break; @@ -493,9 +472,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_SUB_32: r = cc_calc_sub_32(src, dst, vr); break; -case CC_OP_SUBU_32: -r = 
cc_calc_subu_32(src, dst, vr); -break; case CC_OP_SUBB_32: r = cc_calc_subb_32(src, dst, vr); break; diff --git a/target/s390x/helper.c b/target/s390x/helper.c index 4f4561bc64..fa3aa500e5 100644 --- a/target/s390x/helper.c +++ b/target/s390x/helper.c @@ -396,6 +396,7 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_STATIC]= "CC_OP_STATIC", [CC_OP_NZ]= "CC_OP_NZ", [CC_OP_ADDU] = "CC_OP_ADDU", +[CC_OP_SUBU] = "CC_OP_SUBU", [CC_OP_LTGT_32] = "CC_OP_LTGT_32", [CC_OP_LTGT_64] = "CC_OP_LTGT_64", [CC_OP_LTUGTU_32] = "CC_OP_LTUGTU_32", @@ -404,13 +405,
[PATCH 2/4] target/s390x: Improve ADD LOGICAL WITH CARRY
Now that ADD LOGICAL outputs carry, we can use that as input directly. It also means we can re-use CC_OP_ZC and produce an output carry directly from ADD LOGICAL WITH CARRY. Signed-off-by: Richard Henderson --- target/s390x/internal.h| 2 -- target/s390x/cc_helper.c | 26 --- target/s390x/helper.c | 2 -- target/s390x/translate.c | 66 ++ target/s390x/insn-data.def | 8 ++--- 5 files changed, 35 insertions(+), 69 deletions(-) diff --git a/target/s390x/internal.h b/target/s390x/internal.h index 55c5442102..f5f3ae063e 100644 --- a/target/s390x/internal.h +++ b/target/s390x/internal.h @@ -170,7 +170,6 @@ enum cc_op { CC_OP_LTGT0_64, /* signed less/greater than 0 (64bit) */ CC_OP_ADD_64, /* overflow on add (64bit) */ -CC_OP_ADDC_64, /* overflow on unsigned add-carry (64bit) */ CC_OP_SUB_64, /* overflow on subtraction (64bit) */ CC_OP_SUBU_64, /* overflow on unsigned subtraction (64bit) */ CC_OP_SUBB_64, /* overflow on unsigned sub-borrow (64bit) */ @@ -179,7 +178,6 @@ enum cc_op { CC_OP_MULS_64, /* overflow on signed multiply (64bit) */ CC_OP_ADD_32, /* overflow on add (32bit) */ -CC_OP_ADDC_32, /* overflow on unsigned add-carry (32bit) */ CC_OP_SUB_32, /* overflow on subtraction (32bit) */ CC_OP_SUBU_32, /* overflow on unsigned subtraction (32bit) */ CC_OP_SUBB_32, /* overflow on unsigned sub-borrow (32bit) */ diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c index 59da4d1cc2..cd2c5c4b39 100644 --- a/target/s390x/cc_helper.c +++ b/target/s390x/cc_helper.c @@ -144,16 +144,6 @@ static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar) } } -static uint32_t cc_calc_addc_64(uint64_t a1, uint64_t a2, uint64_t ar) -{ -/* Recover a2 + carry_in. */ -uint64_t a2c = ar - a1; -/* Check for a2+carry_in overflow, then a1+a2c overflow. */ -int carry_out = (a2c < a2) || (ar < a1); - -return (ar != 0) + 2 * carry_out; -} - static uint32_t cc_calc_sub_64(int64_t a1, int64_t a2, int64_t ar) { if ((a1 > 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0)) { @@ -240,16 +230,6 @@ static uint32_t cc_calc_add_32(int32_t a1, int32_t a2, int32_t ar) } } -static uint32_t cc_calc_addc_32(uint32_t a1, uint32_t a2, uint32_t ar) -{ -/* Recover a2 + carry_in. */ -uint32_t a2c = ar - a1; -/* Check for a2+carry_in overflow, then a1+a2c overflow. 
*/ -int carry_out = (a2c < a2) || (ar < a1); - -return (ar != 0) + 2 * carry_out; -} - static uint32_t cc_calc_sub_32(int32_t a1, int32_t a2, int32_t ar) { if ((a1 > 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0)) { @@ -485,9 +465,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_ADD_64: r = cc_calc_add_64(src, dst, vr); break; -case CC_OP_ADDC_64: -r = cc_calc_addc_64(src, dst, vr); -break; case CC_OP_SUB_64: r = cc_calc_sub_64(src, dst, vr); break; @@ -513,9 +490,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_ADD_32: r = cc_calc_add_32(src, dst, vr); break; -case CC_OP_ADDC_32: -r = cc_calc_addc_32(src, dst, vr); -break; case CC_OP_SUB_32: r = cc_calc_sub_32(src, dst, vr); break; diff --git a/target/s390x/helper.c b/target/s390x/helper.c index db87a62a57..4f4561bc64 100644 --- a/target/s390x/helper.c +++ b/target/s390x/helper.c @@ -403,14 +403,12 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_LTGT0_32] = "CC_OP_LTGT0_32", [CC_OP_LTGT0_64] = "CC_OP_LTGT0_64", [CC_OP_ADD_64]= "CC_OP_ADD_64", -[CC_OP_ADDC_64] = "CC_OP_ADDC_64", [CC_OP_SUB_64]= "CC_OP_SUB_64", [CC_OP_SUBU_64] = "CC_OP_SUBU_64", [CC_OP_SUBB_64] = "CC_OP_SUBB_64", [CC_OP_ABS_64]= "CC_OP_ABS_64", [CC_OP_NABS_64] = "CC_OP_NABS_64", [CC_OP_ADD_32]= "CC_OP_ADD_32", -[CC_OP_ADDC_32] = "CC_OP_ADDC_32", [CC_OP_SUB_32]= "CC_OP_SUB_32", [CC_OP_SUBU_32] = "CC_OP_SUBU_32", [CC_OP_SUBB_32] = "CC_OP_SUBB_32", diff --git a/target/s390x/translate.c b/target/s390x/translate.c index 9bf4c14f66..570b3c88c8 100644 --- a/target/s390x/translate.c +++ b/target/s390x/translate.c @@ -600,12 +600,10 @@ static void gen_op_calc_cc(DisasContext *s) dummy = tcg_const_i64(0); /* FALLTHRU */ case CC_OP_ADD_64: -case CC_OP_ADDC_64: case CC_OP_SUB_64: case CC_OP_SUBU_64: case CC_OP_SUBB_64: case CC_OP_ADD_32: -case CC_OP_ADDC_32: case CC_OP_SUB_32: case CC_OP_SUBU_32: case CC_OP_SUBB_32: @@ -665,12 +663,10 @@ static void gen_op_calc_cc(DisasContext *s)
[PATCH 4/4] target/s390x: Improve SUB LOGICAL WITH BORROW
Now that SUB LOGICAL outputs carry, we can use that as input directly. It also means we can re-use CC_OP_ZC and produce an output carry directly from SUB LOGICAL WITH BORROW. Signed-off-by: Richard Henderson --- target/s390x/internal.h| 2 -- target/s390x/cc_helper.c | 32 - target/s390x/helper.c | 2 -- target/s390x/translate.c | 74 -- target/s390x/insn-data.def | 8 ++--- 5 files changed, 44 insertions(+), 74 deletions(-) diff --git a/target/s390x/internal.h b/target/s390x/internal.h index 4077047494..11515bb617 100644 --- a/target/s390x/internal.h +++ b/target/s390x/internal.h @@ -172,14 +172,12 @@ enum cc_op { CC_OP_ADD_64, /* overflow on add (64bit) */ CC_OP_SUB_64, /* overflow on subtraction (64bit) */ -CC_OP_SUBB_64, /* overflow on unsigned sub-borrow (64bit) */ CC_OP_ABS_64, /* sign eval on abs (64bit) */ CC_OP_NABS_64, /* sign eval on nabs (64bit) */ CC_OP_MULS_64, /* overflow on signed multiply (64bit) */ CC_OP_ADD_32, /* overflow on add (32bit) */ CC_OP_SUB_32, /* overflow on subtraction (32bit) */ -CC_OP_SUBB_32, /* overflow on unsigned sub-borrow (32bit) */ CC_OP_ABS_32, /* sign eval on abs (64bit) */ CC_OP_NABS_32, /* sign eval on nabs (64bit) */ CC_OP_MULS_32, /* overflow on signed multiply (32bit) */ diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c index c7728d1225..e7039d0d18 100644 --- a/target/s390x/cc_helper.c +++ b/target/s390x/cc_helper.c @@ -164,19 +164,6 @@ static uint32_t cc_calc_sub_64(int64_t a1, int64_t a2, int64_t ar) } } -static uint32_t cc_calc_subb_64(uint64_t a1, uint64_t a2, uint64_t ar) -{ -int borrow_out; - -if (ar != a1 - a2) { /* difference means borrow-in */ -borrow_out = (a2 >= a1); -} else { -borrow_out = (a2 > a1); -} - -return (ar != 0) + 2 * !borrow_out; -} - static uint32_t cc_calc_abs_64(int64_t dst) { if ((uint64_t)dst == 0x8000ULL) { @@ -237,19 +224,6 @@ static uint32_t cc_calc_sub_32(int32_t a1, int32_t a2, int32_t ar) } } -static uint32_t cc_calc_subb_32(uint32_t a1, uint32_t a2, uint32_t ar) -{ -int borrow_out; - -if (ar != a1 - a2) { /* difference means borrow-in */ -borrow_out = (a2 >= a1); -} else { -borrow_out = (a2 > a1); -} - -return (ar != 0) + 2 * !borrow_out; -} - static uint32_t cc_calc_abs_32(int32_t dst) { if ((uint32_t)dst == 0x8000UL) { @@ -450,9 +424,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_SUB_64: r = cc_calc_sub_64(src, dst, vr); break; -case CC_OP_SUBB_64: -r = cc_calc_subb_64(src, dst, vr); -break; case CC_OP_ABS_64: r = cc_calc_abs_64(dst); break; @@ -472,9 +443,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_SUB_32: r = cc_calc_sub_32(src, dst, vr); break; -case CC_OP_SUBB_32: -r = cc_calc_subb_32(src, dst, vr); -break; case CC_OP_ABS_32: r = cc_calc_abs_32(dst); break; diff --git a/target/s390x/helper.c b/target/s390x/helper.c index fa3aa500e5..7678994feb 100644 --- a/target/s390x/helper.c +++ b/target/s390x/helper.c @@ -405,12 +405,10 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_LTGT0_64] = "CC_OP_LTGT0_64", [CC_OP_ADD_64]= "CC_OP_ADD_64", [CC_OP_SUB_64]= "CC_OP_SUB_64", -[CC_OP_SUBB_64] = "CC_OP_SUBB_64", [CC_OP_ABS_64]= "CC_OP_ABS_64", [CC_OP_NABS_64] = "CC_OP_NABS_64", [CC_OP_ADD_32]= "CC_OP_ADD_32", [CC_OP_SUB_32]= "CC_OP_SUB_32", -[CC_OP_SUBB_32] = "CC_OP_SUBB_32", [CC_OP_ABS_32]= "CC_OP_ABS_32", [CC_OP_NABS_32] = "CC_OP_NABS_32", [CC_OP_COMP_32] = "CC_OP_COMP_32", diff --git a/target/s390x/translate.c b/target/s390x/translate.c index 48494a86cc..0d8235a5fb 100644 --- a/target/s390x/translate.c +++ b/target/s390x/translate.c @@ 
-601,10 +601,8 @@ static void gen_op_calc_cc(DisasContext *s) /* FALLTHRU */ case CC_OP_ADD_64: case CC_OP_SUB_64: -case CC_OP_SUBB_64: case CC_OP_ADD_32: case CC_OP_SUB_32: -case CC_OP_SUBB_32: local_cc_op = tcg_const_i32(s->cc_op); break; case CC_OP_CONST0: @@ -663,10 +661,8 @@ static void gen_op_calc_cc(DisasContext *s) break; case CC_OP_ADD_64: case CC_OP_SUB_64: -case CC_OP_SUBB_64: case CC_OP_ADD_32: case CC_OP_SUB_32: -case CC_OP_SUBB_32: /* 3 arguments */ gen_helper_calc_cc(cc_op, cpu_env, local_cc_op, cc_src, cc_dst, cc_vr); break; @@ -4744,29 +4740,49 @@ static DisasJumpType op_subu64(DisasC
[PATCH 0/4] target/s390x: Improve carry computation
While testing the float128_muladd changes for s390x host, emulating under x86_64 of course, I noticed that the code we generate for strings of ALCGR and SLBGR is pretty awful. I realized that we were missing a trick: the output cc is based only on the output (result and carry) and so we don't need to save the inputs. And once we do that, we can use the output carry as a direct input to the next insn. For subtract, computing carry as per the PoO (a + ~b + c), in tcg is less efficient than computing borrow. We can convert between the two simply by adding or subtracting 1. As an example from float128_muladd, 0x43f014: b90b 0019 slgr %r1, %r9 0x43f018: b989 002a slbgr%r2, %r10 0x43f01c: b989 0030 slbgr%r3, %r0 Before: -- guest addr 0x0043f014 0x7fcbf811a4bc: 4d 8b f5 movq %r13, %r14 0x7fcbf811a4bf: 4c 8b 7d 48 movq 0x48(%rbp), %r15 0x7fcbf811a4c3: 4d 2b ef subq %r15, %r13 0x7fcbf811a4c6: 4c 89 6d 08 movq %r13, 8(%rbp) -- guest addr 0x0043f018 0x7fcbf811a4ca: 4c 8b d3 movq %rbx, %r10 0x7fcbf811a4cd: 4c 8b 5d 50 movq 0x50(%rbp), %r11 0x7fcbf811a4d1: 49 2b db subq %r11, %rbx 0x7fcbf811a4d4: 4d 3b f7 cmpq %r15, %r14 0x7fcbf811a4d7: 41 0f 92 c6 setb %r14b 0x7fcbf811a4db: 45 0f b6 f6 movzbl %r14b, %r14d 0x7fcbf811a4df: 49 2b de subq %r14, %rbx 0x7fcbf811a4e2: 48 89 5d 10 movq %rbx, 0x10(%rbp) 0x7fcbf811a4e6: 4c 8b c3 movq %rbx, %r8 -- guest addr 0x0043f01c 0x7fcbf811a4e9: 4c 8b 75 18 movq 0x18(%rbp), %r14 0x7fcbf811a4ed: 4d 8b fe movq %r14, %r15 0x7fcbf811a4f0: 4c 8b 4d 00 movq (%rbp), %r9 0x7fcbf811a4f4: 4d 2b f1 subq %r9, %r14 0x7fcbf811a4f7: 48 8b fd movq %rbp, %rdi 0x7fcbf811a4fa: be 12 00 00 00 movl $0x12, %esi 0x7fcbf811a4ff: 49 8b d2 movq %r10, %rdx 0x7fcbf811a502: 49 8b cb movq %r11, %rcx 0x7fcbf811a505: ff 15 4d 01 00 00callq*0x14d(%rip) 0x7fcbf811a50b: 83 f8 02 cmpl $2, %eax 0x7fcbf811a50e: 41 0f 92 c2 setb %r10b 0x7fcbf811a512: 45 0f b6 d2 movzbl %r10b, %r10d 0x7fcbf811a516: 45 8b d2 movl %r10d, %r10d 0x7fcbf811a519: 4d 2b f2 subq %r10, %r14 0x7fcbf811a51c: 4c 89 75 18 movq %r14, 0x18(%rbp) 0x7fcbf811a520: 48 8b 4d 00 movq (%rbp), %rcx 0x7fcbf811a524: 4d 8b c6 movq %r14, %r8 After: -- guest addr 0x0043f014 0x7fd1d011a23c: 45 33 f6 xorl %r14d, %r14d 0x7fd1d011a23f: 4c 8b 7d 48 movq 0x48(%rbp), %r15 0x7fd1d011a243: 4d 2b ef subq %r15, %r13 0x7fd1d011a246: 4d 1b f6 sbbq %r14, %r14 0x7fd1d011a249: 4c 89 6d 08 movq %r13, 8(%rbp) -- guest addr 0x0043f018 0x7fd1d011a24d: 49 03 de addq %r14, %rbx 0x7fd1d011a250: 49 83 d6 00 adcq $0, %r14 0x7fd1d011a254: 4c 8b 7d 50 movq 0x50(%rbp), %r15 0x7fd1d011a258: 49 2b df subq %r15, %rbx 0x7fd1d011a25b: 49 83 de 00 sbbq $0, %r14 0x7fd1d011a25f: 48 89 5d 10 movq %rbx, 0x10(%rbp) -- guest addr 0x0043f01c 0x7fd1d011a263: 4c 8b 7d 18 movq 0x18(%rbp), %r15 0x7fd1d011a267: 4d 03 fe addq %r14, %r15 0x7fd1d011a26a: 49 83 d6 00 adcq $0, %r14 0x7fd1d011a26e: 4c 8b 55 00 movq (%rbp), %r10 0x7fd1d011a272: 4d 2b fa subq %r10, %r15 0x7fd1d011a275: 49 83 de 00 sbbq $0, %r14 0x7fd1d011a279: 4c 89 7d 18 movq %r15, 0x18(%rbp) r~ Richard Henderson (4): target/s390x: Improve cc computation for ADD LOGICAL target/s390x: Improve ADD LOGICAL WITH CARRY target/s390x: Improve cc computation for SUBTRACT LOGICAL target/s390x: Improve SUB LOGICAL WITH BORROW target/s390x/internal.h| 11 +- target/s390x/cc_helper.c | 123 +++- target/s390x/helper.c | 10 +- target/s390x/translate.c | 286 - target/s390x/insn-data.def | 76 +- 5 files changed, 213 insertions(+), 293 deletions(-) -- 2.25.1
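As a standalone sanity check of that carry/borrow conversion (not part of the patches): for an unsigned subtraction r = a - b, the borrow out is (a < b), while the PoO-style carry out of a + ~b + 1 is its complement, so the two always sum to 1. This is what lets CC_OP_SUBU in patch 3 store a borrow (as 0/-1) and recover the carry by adding 1. The snippet below just demonstrates the identity; it uses the GCC/Clang __int128 extension to read the carry bit.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t a = 5, b = 9;

    int borrow = a < b;                                /* borrow out of a - b  */
    unsigned __int128 sum = (unsigned __int128)a + ~b + 1;
    int carry = (int)(sum >> 64);                      /* carry out of a+~b+1  */

    assert(carry == 1 - borrow);                       /* complement relation  */
    printf("borrow=%d carry=%d result=%#llx\n",
           borrow, carry, (unsigned long long)(a - b));
    return 0;
}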
[PATCH 1/4] target/s390x: Improve cc computation for ADD LOGICAL
The resulting cc is only dependent on the result and the carry-out. So save those things rather than the inputs. Carry-out for 64-bit inputs is had via tcg_gen_add2_i64 directly into cc_src. Carry-out for 32-bit inputs is had via extraction from a normal 64-bit add (with zero-extended inputs). Signed-off-by: Richard Henderson --- target/s390x/internal.h| 4 +- target/s390x/cc_helper.c | 25 - target/s390x/helper.c | 3 +- target/s390x/translate.c | 103 - target/s390x/insn-data.def | 36 ++--- 5 files changed, 97 insertions(+), 74 deletions(-) diff --git a/target/s390x/internal.h b/target/s390x/internal.h index 64602660ae..55c5442102 100644 --- a/target/s390x/internal.h +++ b/target/s390x/internal.h @@ -160,6 +160,8 @@ enum cc_op { CC_OP_STATIC, /* CC value is env->cc_op */ CC_OP_NZ, /* env->cc_dst != 0 */ +CC_OP_ADDU, /* dst != 0, src = carry out (0,1) */ + CC_OP_LTGT_32, /* signed less/greater than (32bit) */ CC_OP_LTGT_64, /* signed less/greater than (64bit) */ CC_OP_LTUGTU_32,/* unsigned less/greater than (32bit) */ @@ -168,7 +170,6 @@ enum cc_op { CC_OP_LTGT0_64, /* signed less/greater than 0 (64bit) */ CC_OP_ADD_64, /* overflow on add (64bit) */ -CC_OP_ADDU_64, /* overflow on unsigned add (64bit) */ CC_OP_ADDC_64, /* overflow on unsigned add-carry (64bit) */ CC_OP_SUB_64, /* overflow on subtraction (64bit) */ CC_OP_SUBU_64, /* overflow on unsigned subtraction (64bit) */ @@ -178,7 +179,6 @@ enum cc_op { CC_OP_MULS_64, /* overflow on signed multiply (64bit) */ CC_OP_ADD_32, /* overflow on add (32bit) */ -CC_OP_ADDU_32, /* overflow on unsigned add (32bit) */ CC_OP_ADDC_32, /* overflow on unsigned add-carry (32bit) */ CC_OP_SUB_32, /* overflow on subtraction (32bit) */ CC_OP_SUBU_32, /* overflow on unsigned subtraction (32bit) */ diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c index 5432aeeed4..59da4d1cc2 100644 --- a/target/s390x/cc_helper.c +++ b/target/s390x/cc_helper.c @@ -123,6 +123,12 @@ static uint32_t cc_calc_nz(uint64_t dst) return !!dst; } +static uint32_t cc_calc_addu(uint64_t carry_out, uint64_t result) +{ +g_assert(carry_out <= 1); +return (result != 0) + 2 * carry_out; +} + static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar) { if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) { @@ -138,11 +144,6 @@ static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar) } } -static uint32_t cc_calc_addu_64(uint64_t a1, uint64_t a2, uint64_t ar) -{ -return (ar != 0) + 2 * (ar < a1); -} - static uint32_t cc_calc_addc_64(uint64_t a1, uint64_t a2, uint64_t ar) { /* Recover a2 + carry_in. */ @@ -239,11 +240,6 @@ static uint32_t cc_calc_add_32(int32_t a1, int32_t a2, int32_t ar) } } -static uint32_t cc_calc_addu_32(uint32_t a1, uint32_t a2, uint32_t ar) -{ -return (ar != 0) + 2 * (ar < a1); -} - static uint32_t cc_calc_addc_32(uint32_t a1, uint32_t a2, uint32_t ar) { /* Recover a2 + carry_in. 
*/ @@ -483,12 +479,12 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_NZ: r = cc_calc_nz(dst); break; +case CC_OP_ADDU: +r = cc_calc_addu(src, dst); +break; case CC_OP_ADD_64: r = cc_calc_add_64(src, dst, vr); break; -case CC_OP_ADDU_64: -r = cc_calc_addu_64(src, dst, vr); -break; case CC_OP_ADDC_64: r = cc_calc_addc_64(src, dst, vr); break; @@ -517,9 +513,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_ADD_32: r = cc_calc_add_32(src, dst, vr); break; -case CC_OP_ADDU_32: -r = cc_calc_addu_32(src, dst, vr); -break; case CC_OP_ADDC_32: r = cc_calc_addc_32(src, dst, vr); break; diff --git a/target/s390x/helper.c b/target/s390x/helper.c index b877690845..db87a62a57 100644 --- a/target/s390x/helper.c +++ b/target/s390x/helper.c @@ -395,6 +395,7 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_DYNAMIC] = "CC_OP_DYNAMIC", [CC_OP_STATIC]= "CC_OP_STATIC", [CC_OP_NZ]= "CC_OP_NZ", +[CC_OP_ADDU] = "CC_OP_ADDU", [CC_OP_LTGT_32] = "CC_OP_LTGT_32", [CC_OP_LTGT_64] = "CC_OP_LTGT_64", [CC_OP_LTUGTU_32] = "CC_OP_LTUGTU_32", @@ -402,7 +403,6 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_LTGT0_32] = "CC_OP_LTGT0_32", [CC_OP_LTGT0_64] = "CC_OP_LTGT0_64", [CC_OP_ADD_64]= "CC_OP_ADD_64", -[CC_OP_ADDU_64] = "CC_OP_ADDU_64", [CC_OP_ADDC_64] =
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
[adding some more people who are interested in RNG stuff: Andy, Jason, Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this concerns some pretty fundamental API stuff related to RNG usage] On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin wrote: > - Background > > The VM Generation ID is a feature defined by Microsoft (paper: > http://go.microsoft.com/fwlink/?LinkId=260709) and supported by > multiple hypervisor vendors. > > The feature is required in virtualized environments by apps that work > with local copies/caches of world-unique data such as random values, > uuids, monotonically increasing counters, etc. > Such apps can be negatively affected by VM snapshotting when the VM > is either cloned or returned to an earlier point in time. > > The VM Generation ID is a simple concept meant to alleviate the issue > by providing a unique ID that changes each time the VM is restored > from a snapshot. The hw provided UUID value can be used to > differentiate between VMs or different generations of the same VM. > > - Problem > > The VM Generation ID is exposed through an ACPI device by multiple > hypervisor vendors but neither the vendors or upstream Linux have no > default driver for it leaving users to fend for themselves. > > Furthermore, simply finding out about a VM generation change is only > the starting point of a process to renew internal states of possibly > multiple applications across the system. This process could benefit > from a driver that provides an interface through which orchestration > can be easily done. > > - Solution > > This patch is a driver which exposes the Virtual Machine Generation ID > via a char-dev FS interface that provides ID update sync and async > notification, retrieval and confirmation mechanisms: > > When the device is 'open()'ed a copy of the current vm UUID is > associated with the file handle. 'read()' operations block until the > associated UUID is no longer up to date - until HW vm gen id changes - > at which point the new UUID is provided/returned. Nonblocking 'read()' > uses EWOULDBLOCK to signal that there is no _new_ UUID available. > > 'poll()' is implemented to allow polling for UUID updates. Such > updates result in 'EPOLLIN' events. > > Subsequent read()s following a UUID update no longer block, but return > the updated UUID. The application needs to acknowledge the UUID update > by confirming it through a 'write()'. > Only on writing back to the driver the right/latest UUID, will the > driver mark this "watcher" as up to date and remove EPOLLIN status. > > 'mmap()' support allows mapping a single read-only shared page which > will always contain the latest UUID value at offset 0. It would be nicer if that page just contained an incrementing counter, instead of a UUID. It's not like the application cares *what* the UUID changed to, just that it *did* change and all RNGs state now needs to be reseeded from the kernel, right? And an application can't reliably read the entire UUID from the memory mapping anyway, because the VM might be forked in the middle. So I think your kernel driver should detect UUID changes and then turn those into a monotonically incrementing counter. (Probably 64 bits wide?) (That's probably also a little bit faster than comparing an entire UUID.) An option might be to put that counter into the vDSO, instead of a separate VMA; but I don't know how the other folks feel about that. Andy, do you have opinions on this? 
That way, normal userspace code that uses this infrastructure wouldn't have to mess around with a special device at all. And it'd be usable in seccomp sandboxes and so on without needing special plumbing. And libraries wouldn't have to call open() and mess with file descriptor numbers. > The driver also adds support for tracking count of open file handles > that haven't acknowledged an UUID update. This is exposed through > two IOCTLs: > * VMGENID_GET_OUTDATED_WATCHERS: immediately returns the number of >_outdated_ watchers - number of file handles that were open during >a VM generation change, and which have not yet confirmed the new >Vm-Gen-Id. > * VMGENID_WAIT_WATCHERS: blocks until there are no more _outdated_ >watchers, or if a 'timeout' argument is provided, until the timeout >expires. Does this mean that code that uses the memory mapping to detect changes is still supposed to confirm generation IDs? What about multithreaded processes, especially ones that use libraries - if a library opens a single file descriptor that is used from multiple threads, is the library required to synchronize all its threads before confirming the change? That seems like it could get messy. And it again means that this interface can't easily be used from inside seccomp sandboxes. [...] > diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst [...] > +``close()``: > + Removes the file handle as a Vm-Gen-Id watcher. (Linux doesn't have "file handles". Technically close(
[Bug 1891748] Re: qemu-arm-static 5.1 can't run gcc
Please take a look at this issue.

https://bugs.launchpad.net/bugs/1891748
Title: qemu-arm-static 5.1 can't run gcc
Status in QEMU: New
Bug description:
Issue discovered while trying to build pikvm (1). Long story short: when using qemu-arm-static 5.1, gcc exits with the message "Allocating guest commpage: Operation not permitted"; when using qemu-arm-static v5.0, gcc "works". Steps to reproduce will follow.
(1) https://github.com/pikvm/pikvm/blob/master/pages/building_os.md
Re: [PATCH v3 2/5] [RISCV_PM] Support CSRs required for RISC-V PM extension except for ones in hypervisor mode
On 10/16/20 3:11 PM, Alexey Baturo wrote: > Signed-off-by: Alexey Baturo > --- > target/riscv/cpu.c | 1 + > target/riscv/cpu.h | 11 ++ > target/riscv/cpu_bits.h | 66 ++ > target/riscv/csr.c | 264 > 4 files changed, 342 insertions(+) Acked-by: Richard Henderson I'd be delighted to see the J working group address the security concerns. And to address the fact that existing hardware will *not* read 0 for the *MTE CSRs, so it's silly to insist on that retroactively. Code should be explicitly checking for J in MISA. r~
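For context, "checking for J in MISA" just means testing one bit of the misa CSR: RISC-V encodes each standard extension as one bit per letter (bit 0 = A, ..., bit 9 = J), and this series gates its behaviour with riscv_has_ext(env, RVJ). The standalone snippet below only illustrates the bit arithmetic; the names and the example misa value are made up.

#include <stdbool.h>
#include <stdio.h>

#define EXT_BIT(c) (1u << ((c) - 'A'))   /* misa extension bit for letter c */

static bool has_ext(unsigned misa, unsigned ext)
{
    return (misa & ext) != 0;
}

int main(void)
{
    unsigned misa = EXT_BIT('I') | EXT_BIT('J');   /* example value only */
    printf("J extension present: %s\n",
           has_ext(misa, EXT_BIT('J')) ? "yes" : "no");
    return 0;
}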
Re: [PATCH v3 1/5] [RISCV_PM] Add J-extension into RISC-V
On 10/16/20 3:11 PM, Alexey Baturo wrote: > Signed-off-by: Alexey Baturo > --- > target/riscv/cpu.c | 4 > target/riscv/cpu.h | 2 ++ > 2 files changed, 6 insertions(+) Reviewed-by: Richard Henderson r~
[Bug 1891748] Re: qemu-arm-static 5.1 can't run gcc
Anyone? Seriously, the problem really exists and we even made a case that reproduces it. Someone please take a look at this.

https://bugs.launchpad.net/bugs/1891748
Title: qemu-arm-static 5.1 can't run gcc
Status in QEMU: New
Re: [PATCH v3 4/5] [RISCV_PM] Support pointer masking for RISC-V for i/c/f/d/a types of instructions
On 10/16/20 3:11 PM, Alexey Baturo wrote: > Signed-off-by: Alexey Baturo > --- > target/riscv/insn_trans/trans_rva.c.inc | 3 +++ > target/riscv/insn_trans/trans_rvd.c.inc | 2 ++ > target/riscv/insn_trans/trans_rvf.c.inc | 2 ++ > target/riscv/insn_trans/trans_rvi.c.inc | 2 ++ > target/riscv/translate.c| 14 ++ > 5 files changed, 23 insertions(+) This will need changes for RVV, but let's omit that for now, so as not to race with the in-flight update to rvv-1.0. Reviewed-by: Richard Henderson r~
Re: [PATCH v3 3/5] [RISCV_PM] Print new PM CSRs in QEMU logs
On 10/16/20 3:11 PM, Alexey Baturo wrote: > Signed-off-by: Alexey Baturo > --- > target/riscv/cpu.c | 19 +++ > 1 file changed, 19 insertions(+) Reviewed-by: Richard Henderson r~
Re: [PATCH v3 5/5] [RISCV_PM] Implement address masking functions required for RISC-V Pointer Masking extension
On 10/16/20 3:11 PM, Alexey Baturo wrote: > From: Anatoly Parshintsev > > Signed-off-by: Anatoly Parshintsev > --- > target/riscv/cpu.h | 19 +++ > target/riscv/translate.c | 34 -- > 2 files changed, 51 insertions(+), 2 deletions(-) Reviewed-by: Richard Henderson r~
Re: HTIF tohost symbol size check always fails
On Fri, Oct 16, 2020 at 2:35 PM Peer Adelt wrote: > > The solution was even easier: I forgot to load the proxy kernel. As soon as I > replaced the command-line parameter "-kernel " with "-kernel > -append ", everything was working as expected. Even better, you can skip the proxy kernel. You can run: `-bios default -kernel ` Alistair > > Without your hint about my possibly misconfigured toolchain I would have > probably continued to search for the error in the QEMU HTIF device. But in > fact it was due to the wrong binary. > > Thanks a lot! :-) > > > On 16. Oct 2020, at 20:03, Alistair Francis wrote: > > > > On Fri, Oct 16, 2020 at 7:59 AM Peer Adelt wrote: > >> > >> Hi, > >> > >> I have a problem with the RISC-V HTIF device. > >> > >> Every binary I have compiled for Spike on riscv32 fails with the following > >> error message: "HTIF tohost must be 8 bytes" > >> > >> This happens regardless of which program I have translated for Spike. This > >> is also the case with the official riscv-compliance tests, for example. > >> > >> The query "if (st_size != 8)" in the HTIF device always fails, because > >> st_size seems to be always 0. > >> > >> To be able to reproduce it: > >> - QEMU GIT Hash: d0ed6a69d399ae193959225cdeaa9382746c91cc (tag "v5.1.0") > > > > I just checked with this hash and with the current master and on both > > I can run a ELF executable on the Spike machine for RV32. > > > >> - System: Mac OS 10.14.6 (Darwin Kernel Version 18.7.0) > >> - Compiler: Latest SiFive Build for GCC under OSX > > > > Maybe try using an official toolchain instead of a vendor fork. > > > > Alistair > > > >> - Command: qemu-system-riscv32 -M spike -nographic -bios none -kernel > >> > >> > >> Best regards, > >> Peer Adelt >
Re: [PATCH v3 0/5] RISC-V Pointer Masking implementation
Patchew URL: https://patchew.org/QEMU/20201016221138.10371-1-space.monkey.deliv...@gmail.com/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20201016221138.10371-1-space.monkey.deliv...@gmail.com Subject: [PATCH v3 0/5] RISC-V Pointer Masking implementation === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu - [tag update] patchew/20201016210754.818257-1-richard.hender...@linaro.org -> patchew/20201016210754.818257-1-richard.hender...@linaro.org * [new tag] patchew/20201016221138.10371-1-space.monkey.deliv...@gmail.com -> patchew/20201016221138.10371-1-space.monkey.deliv...@gmail.com Switched to a new branch 'test' fb26b62 Implement address masking functions required for RISC-V Pointer Masking extension 1135d53 Support pointer masking for RISC-V for i/c/f/d/a types of instructions 760b894 Print new PM CSRs in QEMU logs c4c4e27 Support CSRs required for RISC-V PM extension except for ones in hypervisor mode 5981961 Add J-extension into RISC-V === OUTPUT BEGIN === 1/5 Checking commit 5981961fdb00 (Add J-extension into RISC-V) 2/5 Checking commit c4c4e27fe937 (Support CSRs required for RISC-V PM extension except for ones in hypervisor mode) WARNING: Block comments use a leading /* on a separate line #31: FILE: target/riscv/cpu.h:230: +/* CSRs for PM ERROR: trailing whitespace #146: FILE: target/riscv/csr.c:1258: +/* Functions to access Pointer Masking feature registers $ WARNING: Block comments use a leading /* on a separate line #146: FILE: target/riscv/csr.c:1258: +/* Functions to access Pointer Masking feature registers ERROR: line over 90 characters #153: FILE: target/riscv/csr.c:1265: +/* If priv lvls differ that means we're accessing csr from higher priv lvl, so allow */ WARNING: line over 80 characters #172: FILE: target/riscv/csr.c:1284: +/* We're in same priv lvl, so we allow to modify csr only if pm_current==1 */ ERROR: line over 90 characters #194: FILE: target/riscv/csr.c:1306: + "MMTE: WPRI violation written 0x%lx vs expected 0x%lx\n", val, wpri_val); ERROR: line over 90 characters #220: FILE: target/riscv/csr.c:1332: + "SMTE: WPRI violation written 0x%lx vs expected 0x%lx\n", val, wpri_val); ERROR: line over 90 characters #249: FILE: target/riscv/csr.c:1361: + "UMTE: WPRI violation written 0x%lx vs expected 0x%lx\n", val, wpri_val); total: 5 errors, 3 warnings, 382 lines checked Patch 2/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 
3/5 Checking commit 760b8944af78 (Print new PM CSRs in QEMU logs) WARNING: line over 80 characters #22: FILE: target/riscv/cpu.c:262: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "upmbase ", env->upmbase); WARNING: line over 80 characters #23: FILE: target/riscv/cpu.c:263: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "upmmask ", env->upmmask); WARNING: line over 80 characters #26: FILE: target/riscv/cpu.c:266: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "spmbase ", env->spmbase); WARNING: line over 80 characters #27: FILE: target/riscv/cpu.c:267: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "spmmask ", env->spmmask); WARNING: line over 80 characters #30: FILE: target/riscv/cpu.c:270: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mpmbase ", env->mpmbase); WARNING: line over 80 characters #31: FILE: target/riscv/cpu.c:271: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mpmmask ", env->mpmmask); total: 0 errors, 6 warnings, 25 lines checked Patch 3/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 4/5 Checking commit 1135d53a1565 (Support pointer masking for RISC-V for i/c/f/d/a types of instructions) 5/5 Checking commit fb26b62914a4 (Implement address masking functions required for RISC-V Pointer Masking extension) WARNING: line over 80 characters #110: FILE: target/riscv/translate.c:966: +pm_mask[PRV_U] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmmask), WARNING: line over 80 characters #112: FILE: target/riscv/translate.c:968: +pm_base[PRV_U] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmbase), WARNING: line over 80 characters #114: FILE: target/riscv/translate.c:970: +pm_mask[PRV_S] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, spmmask), WARNING: line over 80 characters #116: FILE: target/riscv/translate.c:972: +pm_base[PRV_S] = tcg_global_mem_new(cpu_env
[PATCH v3 4/5] [RISCV_PM] Support pointer masking for RISC-V for i/c/f/d/a types of instructions
Signed-off-by: Alexey Baturo --- target/riscv/insn_trans/trans_rva.c.inc | 3 +++ target/riscv/insn_trans/trans_rvd.c.inc | 2 ++ target/riscv/insn_trans/trans_rvf.c.inc | 2 ++ target/riscv/insn_trans/trans_rvi.c.inc | 2 ++ target/riscv/translate.c| 14 ++ 5 files changed, 23 insertions(+) diff --git a/target/riscv/insn_trans/trans_rva.c.inc b/target/riscv/insn_trans/trans_rva.c.inc index be8a9f06dd..5559e347ba 100644 --- a/target/riscv/insn_trans/trans_rva.c.inc +++ b/target/riscv/insn_trans/trans_rva.c.inc @@ -26,6 +26,7 @@ static inline bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop) if (a->rl) { tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); } +gen_pm_adjust_address(ctx, src1, src1); tcg_gen_qemu_ld_tl(load_val, src1, ctx->mem_idx, mop); if (a->aq) { tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); @@ -46,6 +47,7 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop) TCGLabel *l2 = gen_new_label(); gen_get_gpr(src1, a->rs1); +gen_pm_adjust_address(ctx, src1, src1); tcg_gen_brcond_tl(TCG_COND_NE, load_res, src1, l1); gen_get_gpr(src2, a->rs2); @@ -91,6 +93,7 @@ static bool gen_amo(DisasContext *ctx, arg_atomic *a, gen_get_gpr(src1, a->rs1); gen_get_gpr(src2, a->rs2); +gen_pm_adjust_address(ctx, src1, src1); (*func)(src2, src1, src2, ctx->mem_idx, mop); gen_set_gpr(a->rd, src2); diff --git a/target/riscv/insn_trans/trans_rvd.c.inc b/target/riscv/insn_trans/trans_rvd.c.inc index 4f832637fa..935342f66d 100644 --- a/target/riscv/insn_trans/trans_rvd.c.inc +++ b/target/riscv/insn_trans/trans_rvd.c.inc @@ -25,6 +25,7 @@ static bool trans_fld(DisasContext *ctx, arg_fld *a) TCGv t0 = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEQ); @@ -40,6 +41,7 @@ static bool trans_fsd(DisasContext *ctx, arg_fsd *a) TCGv t0 = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEQ); diff --git a/target/riscv/insn_trans/trans_rvf.c.inc b/target/riscv/insn_trans/trans_rvf.c.inc index 3dfec8211d..04b3c3eb3d 100644 --- a/target/riscv/insn_trans/trans_rvf.c.inc +++ b/target/riscv/insn_trans/trans_rvf.c.inc @@ -30,6 +30,7 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a) TCGv t0 = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL); gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]); @@ -47,6 +48,7 @@ static bool trans_fsw(DisasContext *ctx, arg_fsw *a) gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEUL); diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc index d04ca0394c..bee7f6be46 100644 --- a/target/riscv/insn_trans/trans_rvi.c.inc +++ b/target/riscv/insn_trans/trans_rvi.c.inc @@ -141,6 +141,7 @@ static bool gen_load(DisasContext *ctx, arg_lb *a, MemOp memop) TCGv t1 = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx, memop); gen_set_gpr(a->rd, t1); @@ -180,6 +181,7 @@ static bool gen_store(DisasContext *ctx, arg_sb *a, MemOp memop) TCGv dat = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); gen_get_gpr(dat, a->rs2); 
tcg_gen_qemu_st_tl(dat, t0, ctx->mem_idx, memop); diff --git a/target/riscv/translate.c b/target/riscv/translate.c index 79dca2291b..a7cbf909f3 100644 --- a/target/riscv/translate.c +++ b/target/riscv/translate.c @@ -101,6 +101,16 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in) tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32)); } +/* + * Temp stub: generates address adjustment for PointerMasking + */ +static void gen_pm_adjust_address(DisasContext *s, + TCGv_i64 dst, + TCGv_i64 src) +{ +tcg_gen_mov_i64(dst, src); +} + /* * A narrow n-bit operation, where n < FLEN, checks that input operands * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1. @@ -380,6 +390,7 @@ static void gen_load_c(DisasContext *ctx, uint32_t opc, int rd, int rs1, TCGv t1 = tcg_temp_new(); gen_get_gpr(t0, rs1); tcg_gen_addi_tl(t0, t0, imm); +gen_pm_adjust_address(ctx, t0, t0); int memop = tcg_memop_lookup[(opc >> 12) & 0x7]; if (memop < 0) { @@ -400,6 +411,7 @@ static void gen_store_c(DisasContext *ctx, uint32_t opc,
[PATCH v3 3/5] [RISCV_PM] Print new PM CSRs in QEMU logs
Signed-off-by: Alexey Baturo --- target/riscv/cpu.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index d63031eb08..6ba3e98508 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -255,6 +255,25 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags) qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "htval ", env->htval); qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mtval2 ", env->mtval2); } +if (riscv_has_ext(env, RVJ)) { +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mmte", env->mmte); +switch (env->priv) { +case PRV_U: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "upmbase ", env->upmbase); +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "upmmask ", env->upmmask); +break; +case PRV_S: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "spmbase ", env->spmbase); +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "spmmask ", env->spmmask); +break; +case PRV_M: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mpmbase ", env->mpmbase); +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mpmmask ", env->mpmmask); +break; +default: +assert(0 && "Unreachable"); +} +} #endif for (i = 0; i < 32; i++) { -- 2.20.1
[PATCH v3 5/5] [RISCV_PM] Implement address masking functions required for RISC-V Pointer Masking extension
From: Anatoly Parshintsev Signed-off-by: Anatoly Parshintsev --- target/riscv/cpu.h | 19 +++ target/riscv/translate.c | 34 -- 2 files changed, 51 insertions(+), 2 deletions(-) diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index 21e47b8283..a481c205c4 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -385,6 +385,7 @@ FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1) FIELD(TB_FLAGS, LMUL, 3, 2) FIELD(TB_FLAGS, SEW, 5, 3) FIELD(TB_FLAGS, VILL, 8, 1) +FIELD(TB_FLAGS, PM_ENABLED, 9, 1) /* * A simplification for VLMAX @@ -431,6 +432,24 @@ static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc, if (riscv_cpu_fp_enabled(env)) { flags |= env->mstatus & MSTATUS_FS; } +if (riscv_has_ext(env, RVJ)) { +int priv = cpu_mmu_index(env, false); +bool pm_enabled = false; +switch (priv) { +case PRV_U: +pm_enabled = env->mmte & U_PM_ENABLE; +break; +case PRV_S: +pm_enabled = env->mmte & S_PM_ENABLE; +break; +case PRV_M: +pm_enabled = env->mmte & M_PM_ENABLE; +break; +default: +g_assert_not_reached(); +} +flags = FIELD_DP32(flags, TB_FLAGS, PM_ENABLED, pm_enabled); +} #endif *pflags = flags; } diff --git a/target/riscv/translate.c b/target/riscv/translate.c index a7cbf909f3..2cfc8fd26d 100644 --- a/target/riscv/translate.c +++ b/target/riscv/translate.c @@ -36,6 +36,9 @@ static TCGv cpu_gpr[32], cpu_pc, cpu_vl; static TCGv_i64 cpu_fpr[32]; /* assume F and D extensions */ static TCGv load_res; static TCGv load_val; +/* globals for PM CSRs */ +static TCGv pm_mask[4]; +static TCGv pm_base[4]; #include "exec/gen-icount.h" @@ -63,6 +66,10 @@ typedef struct DisasContext { uint16_t vlen; uint16_t mlen; bool vl_eq_vlmax; +/* PointerMasking extension */ +bool pm_enabled; +TCGv pm_mask; +TCGv pm_base; } DisasContext; #ifdef TARGET_RISCV64 @@ -102,13 +109,19 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in) } /* - * Temp stub: generates address adjustment for PointerMasking + * Generates address adjustment for PointerMasking */ static void gen_pm_adjust_address(DisasContext *s, TCGv_i64 dst, TCGv_i64 src) { -tcg_gen_mov_i64(dst, src); +if (!s->pm_enabled) { +/* Load unmodified address */ +tcg_gen_mov_i64(dst, src); +} else { +tcg_gen_andc_i64(dst, src, s->pm_mask); +tcg_gen_or_i64(dst, dst, s->pm_base); +} } /* @@ -826,6 +839,10 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL); ctx->mlen = 1 << (ctx->sew + 3 - ctx->lmul); ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX); +ctx->pm_enabled = FIELD_EX32(tb_flags, TB_FLAGS, PM_ENABLED); +int priv = cpu_mmu_index(env, false); +ctx->pm_mask = pm_mask[priv]; +ctx->pm_base = pm_base[priv]; } static void riscv_tr_tb_start(DisasContextBase *db, CPUState *cpu) @@ -945,4 +962,17 @@ void riscv_translate_init(void) "load_res"); load_val = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, load_val), "load_val"); +/* Assign PM CSRs to tcg globals */ +pm_mask[PRV_U] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmmask), +"upmmask"); +pm_base[PRV_U] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmbase), +"upmbase"); +pm_mask[PRV_S] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, spmmask), +"spmmask"); +pm_base[PRV_S] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, spmbase), +"spmbase"); +pm_mask[PRV_M] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, mpmmask), +"mpmmask"); +pm_base[PRV_M] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, mpmbase), +"mpmbase"); } -- 2.20.1
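For readers new to the extension, the address adjustment gen_pm_adjust_address() now emits (andc with the per-privilege mask, then or with the base) is equivalent to the plain-C expression below. The mask, base, and address values are made up purely for illustration; the real values come from the upm/spm/mpm CSRs added earlier in the series.

#include <stdint.h>
#include <stdio.h>

/* Same arithmetic as the tcg_gen_andc_i64 / tcg_gen_or_i64 pair. */
static uint64_t pm_adjust(uint64_t addr, uint64_t pmmask, uint64_t pmbase)
{
    return (addr & ~pmmask) | pmbase;
}

int main(void)
{
    uint64_t pmmask = 0xffff000000000000ULL;   /* example: ignore top 16 bits */
    uint64_t pmbase = 0x0000000000000000ULL;
    uint64_t tagged = 0xdead00000badc0deULL;   /* example tagged pointer      */

    printf("%#llx -> %#llx\n",
           (unsigned long long)tagged,
           (unsigned long long)pm_adjust(tagged, pmmask, pmbase));
    /* prints: 0xdead00000badc0de -> 0xbadc0de (tag bits stripped) */
    return 0;
}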
[PATCH v3 2/5] [RISCV_PM] Support CSRs required for RISC-V PM extension except for ones in hypervisor mode
Signed-off-by: Alexey Baturo --- target/riscv/cpu.c | 1 + target/riscv/cpu.h | 11 ++ target/riscv/cpu_bits.h | 66 ++ target/riscv/csr.c | 264 4 files changed, 342 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index fe6bab4a52..d63031eb08 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -440,6 +440,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp) } if (cpu->cfg.ext_j) { target_misa |= RVJ; +env->mmte |= PM_EXT_INITIAL; } if (cpu->cfg.ext_v) { target_misa |= RVV; diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index eca611a367..21e47b8283 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -226,6 +226,17 @@ struct CPURISCVState { /* True if in debugger mode. */ bool debugger; + +/* CSRs for PM + * TODO: move these csr to appropriate groups + */ +target_ulong mmte; +target_ulong mpmmask; +target_ulong mpmbase; +target_ulong spmmask; +target_ulong spmbase; +target_ulong upmmask; +target_ulong upmbase; #endif float_status fp_status; diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h index bd36062877..84c93c77ae 100644 --- a/target/riscv/cpu_bits.h +++ b/target/riscv/cpu_bits.h @@ -354,6 +354,21 @@ #define CSR_MHPMCOUNTER30H 0xb9e #define CSR_MHPMCOUNTER31H 0xb9f +/* Custom user register */ +#define CSR_UMTE0x8c0 +#define CSR_UPMMASK 0x8c1 +#define CSR_UPMBASE 0x8c2 + +/* Custom machine register */ +#define CSR_MMTE0x7c0 +#define CSR_MPMMASK 0x7c1 +#define CSR_MPMBASE 0x7c2 + +/* Custom supervisor register */ +#define CSR_SMTE0x9c0 +#define CSR_SPMMASK 0x9c1 +#define CSR_SPMBASE 0x9c2 + /* Legacy Machine Protection and Translation (priv v1.9.1) */ #define CSR_MBASE 0x380 #define CSR_MBOUND 0x381 @@ -604,4 +619,55 @@ #define MIE_UTIE (1 << IRQ_U_TIMER) #define MIE_SSIE (1 << IRQ_S_SOFT) #define MIE_USIE (1 << IRQ_U_SOFT) + +/* general mte CSR bits*/ +#define PM_ENABLE 0x0001ULL +#define PM_CURRENT 0x0002ULL +#define PM_XS_MASK 0x0003ULL + +/* PM XS bits values */ +#define PM_EXT_DISABLE 0xULL +#define PM_EXT_INITIAL 0x0001ULL +#define PM_EXT_CLEAN0x0002ULL +#define PM_EXT_DIRTY0x0003ULL + +/* offsets for every pair of control bits per each priv level */ +#define XS_OFFSET0ULL +#define U_OFFSET 2ULL +#define S_OFFSET 4ULL +#define M_OFFSET 6ULL + +#define PM_XS_BITS (PM_XS_MASK << XS_OFFSET) +#define U_PM_ENABLE (PM_ENABLE << U_OFFSET) +#define U_PM_CURRENT (PM_CURRENT << U_OFFSET) +#define S_PM_ENABLE (PM_ENABLE << S_OFFSET) +#define S_PM_CURRENT (PM_CURRENT << S_OFFSET) +#define M_PM_ENABLE (PM_ENABLE << M_OFFSET) + +/* mmte CSR bits */ +#define MMTE_PM_XS_BITS PM_XS_BITS +#define MMTE_U_PM_ENABLEU_PM_ENABLE +#define MMTE_U_PM_CURRENT U_PM_CURRENT +#define MMTE_S_PM_ENABLES_PM_ENABLE +#define MMTE_S_PM_CURRENT S_PM_CURRENT +#define MMTE_M_PM_ENABLEM_PM_ENABLE +#define MMTE_MASK (MMTE_U_PM_ENABLE | MMTE_U_PM_CURRENT | \ + MMTE_S_PM_ENABLE | MMTE_S_PM_CURRENT | \ + MMTE_M_PM_ENABLE | MMTE_PM_XS_BITS) + +/* smte CSR bits */ +#define SMTE_PM_XS_BITS PM_XS_BITS +#define SMTE_U_PM_ENABLEU_PM_ENABLE +#define SMTE_U_PM_CURRENT U_PM_CURRENT +#define SMTE_S_PM_ENABLES_PM_ENABLE +#define SMTE_S_PM_CURRENT S_PM_CURRENT +#define SMTE_MASK (SMTE_U_PM_ENABLE | SMTE_U_PM_CURRENT | \ + SMTE_S_PM_ENABLE | SMTE_S_PM_CURRENT | \ + SMTE_PM_XS_BITS) + +/* umte CSR bits */ +#define UMTE_U_PM_ENABLEU_PM_ENABLE +#define UMTE_U_PM_CURRENT U_PM_CURRENT +#define UMTE_MASK (UMTE_U_PM_ENABLE | MMTE_U_PM_CURRENT) + #endif diff --git a/target/riscv/csr.c b/target/riscv/csr.c index aaef6c6f20..8679b5c80c 100644 --- a/target/riscv/csr.c +++ 
b/target/riscv/csr.c @@ -140,6 +140,11 @@ static int any(CPURISCVState *env, int csrno) return 0; } +static int umode(CPURISCVState *env, int csrno) +{ +return -!riscv_has_ext(env, RVU); +} + static int smode(CPURISCVState *env, int csrno) { return -!riscv_has_ext(env, RVS); @@ -1250,6 +1255,250 @@ static int write_pmpaddr(CPURISCVState *env, int csrno, target_ulong val) return 0; } +/* Functions to access Pointer Masking feature registers + * We have to check if current priv lvl could modify + * csr in given mode + */ +static int check_pm_current_disabled(CPURISCVState *env, int csrno) +{ +int csr_priv = get_field(csrno, 0xC00); +/* If priv lvls differ that means we're accessing csr from higher priv lvl, so allow */ +if (env->priv != c
[PATCH v3 0/5] RISC-V Pointer Masking implementation
Hi folks,

This is the third iteration of patches to support Pointer Masking for RISC-V. Most of the suggestions have been addressed; however, some of them were not:
- applying mask for return value while reading PM CSR has been kept to mask higher priv level bits
- check_pm_current_disabled is not placed into CSR predicate, since the spec expects to read zero from PM CSR if no PM extension is present

Thanks

Alexey Baturo (4):
[RISCV_PM] Add J-extension into RISC-V
[RISCV_PM] Support CSRs required for RISC-V PM extension except for ones in hypervisor mode
[RISCV_PM] Print new PM CSRs in QEMU logs
[RISCV_PM] Support pointer masking for RISC-V for i/c/f/d/a types of instructions

Anatoly Parshintsev (1):
[RISCV_PM] Implement address masking functions required for RISC-V Pointer Masking extension

target/riscv/cpu.c | 24 +++
target/riscv/cpu.h | 32 +++
target/riscv/cpu_bits.h | 66 ++
target/riscv/csr.c | 264
target/riscv/insn_trans/trans_rva.c.inc | 3 +
target/riscv/insn_trans/trans_rvd.c.inc | 2 +
target/riscv/insn_trans/trans_rvf.c.inc | 2 +
target/riscv/insn_trans/trans_rvi.c.inc | 2 +
target/riscv/translate.c | 44
9 files changed, 439 insertions(+)
-- 2.20.1
[PATCH v3 1/5] [RISCV_PM] Add J-extension into RISC-V
Signed-off-by: Alexey Baturo --- target/riscv/cpu.c | 4 target/riscv/cpu.h | 2 ++ 2 files changed, 6 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index 0bbfd7f457..fe6bab4a52 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -438,6 +438,9 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp) if (cpu->cfg.ext_h) { target_misa |= RVH; } +if (cpu->cfg.ext_j) { +target_misa |= RVJ; +} if (cpu->cfg.ext_v) { target_misa |= RVV; if (!is_power_of_2(cpu->cfg.vlen)) { @@ -516,6 +519,7 @@ static Property riscv_cpu_properties[] = { DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true), /* This is experimental so mark with 'x-' */ DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false), +DEFINE_PROP_BOOL("x-j", RISCVCPU, cfg.ext_j, false), DEFINE_PROP_BOOL("x-v", RISCVCPU, cfg.ext_v, false), DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true), DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true), diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index de275782e6..eca611a367 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -66,6 +66,7 @@ #define RVS RV('S') #define RVU RV('U') #define RVH RV('H') +#define RVJ RV('J') /* S extension denotes that Supervisor mode exists, however it is possible to have a core that support S mode but does not have an MMU and there @@ -277,6 +278,7 @@ struct RISCVCPU { bool ext_s; bool ext_u; bool ext_h; +bool ext_j; bool ext_v; bool ext_counters; bool ext_ifencei; -- 2.20.1
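With this patch applied, the extension stays disabled unless the new experimental property is turned on explicitly. Assuming the generic rv64 CPU model name, enabling it would look roughly like:

    qemu-system-riscv64 -machine virt -cpu rv64,x-j=true ...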
Re: [PATCH v2 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
On Fri, 16 Oct 2020 at 22:07, Richard Henderson wrote: > > On ARM, the Top Byte Ignore feature means that only 56 bits of > the address are significant in the virtual address. We are > required to give the entire 64-bit address to FAR_ELx on fault, > which means that we do not "clean" the top byte early in TCG. > > This new interface allows us to flush all 256 possible aliases > for a given page, currently missed by tlb_flush_page*. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: HTIF tohost symbol size check always fails
The solution was even easier: I forgot to load the proxy kernel. As soon as I replaced the command-line parameter "-kernel " with "-kernel -append ", everything was working as expected. Without your hint about my possibly misconfigured toolchain I would have probably continued to search for the error in the QEMU HTIF device. But in fact it was due to the wrong binary. Thanks a lot! :-) > On 16. Oct 2020, at 20:03, Alistair Francis wrote: > > On Fri, Oct 16, 2020 at 7:59 AM Peer Adelt wrote: >> >> Hi, >> >> I have a problem with the RISC-V HTIF device. >> >> Every binary I have compiled for Spike on riscv32 fails with the following >> error message: "HTIF tohost must be 8 bytes" >> >> This happens regardless of which program I have translated for Spike. This >> is also the case with the official riscv-compliance tests, for example. >> >> The query "if (st_size != 8)" in the HTIF device always fails, because >> st_size seems to be always 0. >> >> To be able to reproduce it: >> - QEMU GIT Hash: d0ed6a69d399ae193959225cdeaa9382746c91cc (tag "v5.1.0") > > I just checked with this hash and with the current master and on both > I can run a ELF executable on the Spike machine for RV32. > >> - System: Mac OS 10.14.6 (Darwin Kernel Version 18.7.0) >> - Compiler: Latest SiFive Build for GCC under OSX > > Maybe try using an official toolchain instead of a vendor fork. > > Alistair > >> - Command: qemu-system-riscv32 -M spike -nographic -bios none -kernel >> >> >> Best regards, >> Peer Adelt
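For anyone who hits the same message with a genuinely bare-metal binary (no proxy kernel), the HTIF check only passes when tohost/fromhost are real 8-byte objects in the ELF symbol table, so their st_size is 8 rather than 0. A minimal C sketch (section placement and alignment depend on the linker script, so treat this as illustrative):

    #include <stdint.h>

    /* Exported HTIF mailboxes; defining them as 8-byte objects gives
     * them st_size == 8, which is what the QEMU HTIF device checks. */
    volatile uint64_t tohost;
    volatile uint64_t fromhost;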
[PATCH v2 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
On ARM, the Top Byte Ignore feature means that only 56 bits of the address are significant in the virtual address. We are required to give the entire 64-bit address to FAR_ELx on fault, which means that we do not "clean" the top byte early in TCG. This new interface allows us to flush all 256 possible aliases for a given page, currently missed by tlb_flush_page*. Signed-off-by: Richard Henderson --- include/exec/exec-all.h | 36 ++ accel/tcg/cputlb.c | 275 ++-- 2 files changed, 302 insertions(+), 9 deletions(-) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 66f9b4cca6..4707ac140c 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -251,6 +251,25 @@ void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, uint16_t idxmap); * depend on when the guests translation ends the TB. */ void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap); + +/** + * tlb_flush_page_bits_by_mmuidx + * @cpu: CPU whose TLB should be flushed + * @addr: virtual address of page to be flushed + * @idxmap: bitmap of mmu indexes to flush + * @bits: number of significant bits in address + * + * Similar to tlb_flush_page_mask, but with a bitmap of indexes. + */ +void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr, + uint16_t idxmap, unsigned bits); + +/* Similarly, with broadcast and syncing. */ +void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr, +uint16_t idxmap, unsigned bits); +void tlb_flush_page_bits_by_mmuidx_all_cpus_synced +(CPUState *cpu, target_ulong addr, uint16_t idxmap, unsigned bits); + /** * tlb_set_page_with_attrs: * @cpu: CPU to add this TLB entry for @@ -337,6 +356,23 @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap) { } +static inline void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, + target_ulong addr, + uint16_t idxmap, + unsigned bits) +{ +} +static inline void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, + target_ulong addr, + uint16_t idxmap, + unsigned bits) +{ +} +static inline void +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, target_ulong addr, + uint16_t idxmap, unsigned bits) +{ +} #endif /** * probe_access: diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 23ab29..42ab79c1a5 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -409,12 +409,21 @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu) tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, ALL_MMUIDX_BITS); } +static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry, + target_ulong page, target_ulong mask) +{ +page &= mask; +mask &= TARGET_PAGE_MASK | TLB_INVALID_MASK; + +return (page == (tlb_entry->addr_read & mask) || +page == (tlb_addr_write(tlb_entry) & mask) || +page == (tlb_entry->addr_code & mask)); +} + static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry, target_ulong page) { -return tlb_hit_page(tlb_entry->addr_read, page) || - tlb_hit_page(tlb_addr_write(tlb_entry), page) || - tlb_hit_page(tlb_entry->addr_code, page); +return tlb_hit_page_mask_anyprot(tlb_entry, page, -1); } /** @@ -427,31 +436,45 @@ static inline bool tlb_entry_is_empty(const CPUTLBEntry *te) } /* Called with tlb_c.lock held */ -static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry, - target_ulong page) +static bool tlb_flush_entry_mask_locked(CPUTLBEntry *tlb_entry, +target_ulong page, +target_ulong mask) { -if (tlb_hit_page_anyprot(tlb_entry, page)) { +if (tlb_hit_page_mask_anyprot(tlb_entry, page, mask)) { memset(tlb_entry, -1, sizeof(*tlb_entry)); return true; } 
return false; } +static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry, + target_ulong page) +{ +return tlb_flush_entry_mask_locked(tlb_entry, page, -1); +} + /* Called with tlb_c.lock held */ -static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx, - target_ulong page) +static void tlb_flush_vtlb_page_mask_locked(CPUArchState *env, int mmu_idx, +target_ulong page, +
[PATCH v2 2/2] target/arm: Use tlb_flush_page_bits_by_mmuidx*
When TBI is enabled in a given regime, 56 bits of the address are significant and we need to clear out any other matching virtual addresses with differing tags. The other uses of tlb_flush_page (without mmuidx) in this file are only used by aarch32 mode. Fixes: 38d931687fa1 Reported-by: Jordan Frank Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper.c | 46 ++--- 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index cd0779ff5f..f49b045d36 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -50,6 +50,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address, #endif static void switch_mode(CPUARMState *env, int mode); +static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx); static int vfp_gdb_get_reg(CPUARMState *env, GByteArray *buf, int reg) { @@ -4457,6 +4458,33 @@ static int vae1_tlbmask(CPUARMState *env) } } +/* Return 56 if TBI is enabled, 64 otherwise. */ +static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, + uint64_t addr) +{ +uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr; +int tbi = aa64_va_parameter_tbi(tcr, mmu_idx); +int select = extract64(addr, 55, 1); + +return (tbi >> select) & 1 ? 56 : 64; +} + +static int vae1_tlbbits(CPUARMState *env, uint64_t addr) +{ +ARMMMUIdx mmu_idx; + +/* Only the regime of the mmu_idx below is significant. */ +if (arm_is_secure_below_el3(env)) { +mmu_idx = ARMMMUIdx_SE10_0; +} else if ((env->cp15.hcr_el2 & (HCR_E2H | HCR_TGE)) + == (HCR_E2H | HCR_TGE)) { +mmu_idx = ARMMMUIdx_E20_0; +} else { +mmu_idx = ARMMMUIdx_E10_0; +} +return tlbbits_for_regime(env, mmu_idx, addr); +} + static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -4593,8 +4621,9 @@ static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, CPUState *cs = env_cpu(env); int mask = vae1_tlbmask(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); +int bits = vae1_tlbbits(env, pageaddr); -tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask); +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); } static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, @@ -4608,11 +4637,12 @@ static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, CPUState *cs = env_cpu(env); int mask = vae1_tlbmask(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); +int bits = vae1_tlbbits(env, pageaddr); if (tlb_force_broadcast(env)) { -tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask); +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); } else { -tlb_flush_page_by_mmuidx(cs, pageaddr, mask); +tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits); } } @@ -4621,9 +4651,10 @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, { CPUState *cs = env_cpu(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); +int bits = tlbbits_for_regime(env, ARMMMUIdx_E2, pageaddr); -tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, - ARMMMUIdxBit_E2); +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, + ARMMMUIdxBit_E2, bits); } static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, @@ -4631,9 +4662,10 @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, { CPUState *cs = env_cpu(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); +int bits = tlbbits_for_regime(env, ARMMMUIdx_SE3, pageaddr); 
-tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, - ARMMMUIdxBit_SE3); +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, + ARMMMUIdxBit_SE3, bits); } static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri, -- 2.25.1
[PATCH v2 0/2] target/arm: Fix tlb flush page vs tbi
Since the FAR_ELx fix at 38d931687fa1, it is reported that page granularity flushing is broken. This makes sense, since TCG will record the entire virtual address in its TLB, not simply the 56 significant bits. With no other TCG support, the ARM backend should require 256 different page flushes to clear the virtual address of any possible tag.

So I added a new tcg interface that allows passing the size of the virtual address. I thought a simple bit-count was a cleaner interface than passing in a mask, since it means that we couldn't be passed nonsensical masks like 0xdeadbeef. It also makes it easy to re-direct special cases.

Changes for v2:
* Add encode_pbm_to_runon/decode_runon_to_pbm helpers (pmm).

r~

Richard Henderson (2):
accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
target/arm: Use tlb_flush_page_bits_by_mmuidx*

include/exec/exec-all.h | 36 ++
accel/tcg/cputlb.c | 275 ++--
target/arm/helper.c | 46 ++-
3 files changed, 341 insertions(+), 16 deletions(-)
-- 2.25.1
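As a sketch of what the bit-count turns into inside the flush path (illustrative, not the exact cputlb.c code), the count simply selects how much of the virtual address is compared, which is why an arbitrary mask can never sneak in:

    #include <stdint.h>

    /* 64 means "all address bits are significant"; 56 under TBI, so the
     * top-byte tag is ignored and one call covers all 256 tagged aliases. */
    static uint64_t addr_mask_from_bits(unsigned bits)
    {
        return bits >= 64 ? ~(uint64_t)0 : ((uint64_t)1 << bits) - 1;
    }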
Re: [PATCH] configure: actually disable 'git_update' mode with --disable-git-update
On Fri, Oct 2, 2020 at 9:11 AM Daniel P. Berrangé wrote: > > On Wed, Sep 30, 2020 at 09:28:54PM -0400, Dan Streetman wrote: > > On Tue, Sep 22, 2020 at 12:34 PM Daniel P. Berrangé > > wrote: > > > > > > On Wed, Jul 29, 2020 at 03:58:29PM -0400, Dan Streetman wrote: > > > > The --disable-git-update configure param sets git_update=no, but > > > > some later checks only look for the .git dir. This changes the > > > > --enable-git-update to set git_update=yes but also fail if it > > > > does not find a .git dir. Then all the later checks for the .git > > > > dir can just be changed to a check for $git_update = "yes". > > > > > > > > Also update the Makefile to skip the 'git_update' checks if it has > > > > been disabled. > > > > > > > > This is needed because downstream packagers, e.g. Debian, Ubuntu, etc, > > > > also keep the source code in git, but do not want to enable the > > > > 'git_update' mode; with the current code, that's not possible even > > > > if the downstream package specifies --disable-git-update. > > > > > > Lets recap the original intended behaviour > > > > > > 1. User building from master qemu.git or a fork > > > a) git_update=yes (default) > > > - Always sync submodules to required commit > > > > > > b) git_update=no (--disable-git-update) > > > - Never sync submodules, user is responsible for sync > > > - Validate submodules are at correct commit and fail if not. > > > > > > 2. User building from tarball > > > - Never do anything git related at all > > > > > > > > > Your change is removing the validation from 1.b). > > > > Would you accept adding a --disable-git-submodules-check so downstream > > packagers can keep the source in git, but avoid the submodule > > validation? > > It feels like the current option shouldn't be a boolean, rather > a tri-state--with-git-submodules=[update|validate|ignore] > Ok, I updated the patch to do that: https://lists.nongnu.org/archive/html/qemu-devel/2020-10/msg04799.html > > Or do you have another suggestion for handling this? > > Assuming you're just using git for conveniently applying local > downstream patches, you don't need the git repo to exist once > getting to the build stage. IOW just delete the .git dir after > applying patches before running a build. > > > Regards, > Daniel > -- > |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o-https://fstop138.berrange.com :| > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| >
[PATCH] configure: replace --enable/disable-git-update with --with-git-submodules
Replace the --enable-git-update and --disable-git-update configure params with the param --with-git-submodules=(update|validate|ignore) to allow 3 options for building from a git repo. This is needed because downstream packagers, e.g. Debian, Ubuntu, etc, also keep the source code in git, but do not want to enable the 'git_update' mode; with the current code, that's not possible even if the downstream package specifies --disable-git-update. Signed-off-by: Dan Streetman --- Makefile | 26 ++- configure| 55 +--- scripts/git-submodule.sh | 34 +++-- 3 files changed, 62 insertions(+), 53 deletions(-) diff --git a/Makefile b/Makefile index c37e513431..033455dc8f 100644 --- a/Makefile +++ b/Makefile @@ -34,33 +34,11 @@ ifneq ($(wildcard config-host.mak),) all: include config-host.mak -git-submodule-update: - .PHONY: git-submodule-update - -git_module_status := $(shell \ - cd '$(SRC_PATH)' && \ - GIT="$(GIT)" ./scripts/git-submodule.sh status $(GIT_SUBMODULES); \ - echo $$?; \ -) - -ifeq (1,$(git_module_status)) -ifeq (no,$(GIT_UPDATE)) git-submodule-update: $(call quiet-command, \ -echo && \ -echo "GIT submodule checkout is out of date. Please run" && \ -echo " scripts/git-submodule.sh update $(GIT_SUBMODULES)" && \ -echo "from the source directory checkout $(SRC_PATH)" && \ -echo && \ -exit 1) -else -git-submodule-update: - $(call quiet-command, \ - (cd $(SRC_PATH) && GIT="$(GIT)" ./scripts/git-submodule.sh update $(GIT_SUBMODULES)), \ - "GIT","$(GIT_SUBMODULES)") -endif -endif + (GIT="$(GIT)" "$(SRC_PATH)/scripts/git-submodule.sh" $(GIT_SUBMODULES_ACTION) $(GIT_SUBMODULES)), \ + "GIT","$(GIT_SUBMODULES)") export NINJA=./ninjatool diff --git a/configure b/configure index f839c2a557..c5df778790 100755 --- a/configure +++ b/configure @@ -249,12 +249,12 @@ gdb_bin=$(command -v "gdb-multiarch" || command -v "gdb") if test -e "$source_path/.git" then -git_update=yes +git_submodules_action="update" git_submodules="ui/keycodemapdb" git_submodules="$git_submodules tests/fp/berkeley-testfloat-3" git_submodules="$git_submodules tests/fp/berkeley-softfloat-3" else -git_update=no +git_submodules_action="ignore" git_submodules="" if ! test -f "$source_path/ui/keycodemapdb/README" @@ -1478,9 +1478,16 @@ for opt do ;; --with-git=*) git="$optarg" ;; - --enable-git-update) git_update=yes + --enable-git-update) + git_submodules_action="update" + echo "--enable-git-update deprecated, use --with-git-submodules=update" ;; - --disable-git-update) git_update=no + --disable-git-update) + git_submodules_action="validate" + echo "--disable-git-update deprecated, use --with-git-submodules=validate" + ;; + --with-git-submodules=*) + git_submodules_action="$optarg" ;; --enable-debug-mutex) debug_mutex=yes ;; @@ -1528,6 +1535,20 @@ for opt do esac done +case $git_submodules_action in +update|validate) +if test ! 
-e "$source_path/.git"; then +echo "ERROR: cannot $git_submodules_action git submodules without .git" +exit 1 +fi +;; +ignore) ;; +*) +echo "ERROR: invalid --with-git-submodules= value '$git_submodules_action'" +exit 1 +;; +esac + firmwarepath="${firmwarepath:-$prefix/share/qemu-firmware}" libdir="${libdir:-$prefix/lib}" libexecdir="${libexecdir:-$prefix/libexec}" @@ -1868,7 +1889,7 @@ python="$python -B" if test -z "$meson"; then if test "$explicit_python" = no && has meson && version_ge "$(meson --version)" 0.55.1; then meson=meson -elif test -e "${source_path}/.git" && test $git_update = 'yes' ; then +elif test $git_submodules_action != 'ignore' ; then meson=git elif test -e "${source_path}/meson/meson.py" ; then meson=internal @@ -1936,7 +1957,7 @@ fi # Consult white-list to determine whether to enable werror # by default. Only enable by default for git builds if test -z "$werror" ; then -if test -e "$source_path/.git" && \ +if test "$git_submodules_action" != "ignore" && \ { test "$linux" = "yes" || test "$mingw32" = "yes"; }; then werror="yes" else @@ -3824,9 +3845,7 @@ fi case "$fdt" in auto | enabled | internal) # Simpler to always update submodule, even if not needed. -if test -e "${source_path}/.git" && test $git_update = 'yes' ; then - git_submodules="${git_submodules} dtc" -fi +git_submodules="${git_submodules} dtc" ;; esac @@ -4696,9 +4715,7 @@ fi case "$capstone" in auto | enabled | internal) # Simpler to always update submodule, even if not needed. -if test -e "${source_path}/.git" && test $git_update = 'yes' ; then - git_submodules="${git_submodul
Re: [PATCH 4/5] ppc405_boards: use qdev properties instead of legacy m48t59_init() function
On Fri, 16 Oct 2020, Mark Cave-Ayland wrote: Signed-off-by: Mark Cave-Ayland --- hw/ppc/ppc405_boards.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c index 6198ec1035..4687715b15 100644 --- a/hw/ppc/ppc405_boards.c +++ b/hw/ppc/ppc405_boards.c @@ -28,6 +28,8 @@ #include "qemu-common.h" #include "cpu.h" #include "hw/ppc/ppc.h" +#include "hw/qdev-properties.h" +#include "hw/sysbus.h" #include "ppc405.h" #include "hw/rtc/m48t59.h" #include "hw/block/flash.h" @@ -145,6 +147,8 @@ static void ref405ep_init(MachineState *machine) char *filename; ppc4xx_bd_info_t bd; CPUPPCState *env; +DeviceState *dev; +SysBusDevice *s; qemu_irq *pic; MemoryRegion *bios; MemoryRegion *sram = g_new(MemoryRegion, 1); @@ -227,7 +231,11 @@ static void ref405ep_init(MachineState *machine) /* Register FPGA */ ref405ep_fpga_init(sysmem, 0xF030); /* Register NVRAM */ -m48t59_init(NULL, 0xF000, 0, 8192, 1968, 8); +dev = qdev_new("sysbus-m48t08"); +qdev_prop_set_int32(dev, "base-year", 1968); Is there anything that uses other than 1968 as base year? If not this could be the default in the device so you don't need these set prop calls here and in sun machines. The only other place this device is used seems to be ppc/prep machine that uses the isa version but does not set a base year. Is that a bug? The original prep machine removed in b2ce76a0730 used 2000 but that's unlikely as well as these machines predate that. Anyway, the sysbus and isa versions are different so their default base-year could be set independently and then boards won't need to set this propery and may be could use qdev_create_simple instead or whatever that was renamed to. Regards, BALATON Zoltan +s = SYS_BUS_DEVICE(dev); +sysbus_realize_and_unref(s, &error_fatal); +sysbus_mmio_map(s, 0, 0xF000); /* Load kernel */ linux_boot = (kernel_filename != NULL); if (linux_boot) {
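A hedged sketch of that suggestion (struct and field names assumed from hw/rtc/m48t59.c, not verified here): making 1968 the property default in the sysbus device would let boards that are happy with it drop the qdev_prop_set_int32() call entirely.

    /* Hypothetical: default base-year in the sysbus device's property list. */
    static Property m48t59_sysbus_properties[] = {
        DEFINE_PROP_INT32("base-year", M48txxSysBusState, state.base_year, 1968),
        DEFINE_PROP_END_OF_LIST(),
    };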
Re: [PATCH 0/5] m48t59: remove legacy init functions
Le 16/10/2020 à 20:27, Mark Cave-Ayland a écrit : This patchset is inspired by Philippe's "hw/rtc/m48t59: Simplify m48t59_init()" patchset at https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg04493.html but goes further: rather than tidy-up the legacy init functions, convert the callers to use qdev properties directly so they can simply be removed. Signed-off-by: Mark Cave-Ayland Mark Cave-Ayland (5): m48t59-isa: remove legacy m48t59_init_isa() function sun4m: use qdev properties instead of legacy m48t59_init() function sun4u: use qdev properties instead of legacy m48t59_init() function ppc405_boards: use qdev properties instead of legacy m48t59_init() function m48t59: remove legacy m48t59_init() function hw/ppc/ppc405_boards.c | 10 +- hw/rtc/m48t59-isa.c | 25 - hw/rtc/m48t59.c | 35 --- hw/sparc/sun4m.c| 8 +++- hw/sparc64/sun4u.c | 7 +-- include/hw/rtc/m48t59.h | 6 -- 6 files changed, 21 insertions(+), 70 deletions(-) Reviewed-by: Hervé Poussineau
Re: [PATCH v2 4/5] target/mips: Refactor helpers for fp comparison instructions
On 10/9/20 4:47 PM, Philippe Mathieu-Daudé wrote: Hi Aleksandar, On 10/7/20 10:37 PM, Aleksandar Markovic wrote: This change causes slighlty better performance of emulation of fp comparison instructions via better compiler optimization of refactored code. The functionality is otherwise unchanged. Signed-off-by: Aleksandar Markovic --- target/mips/fpu_helper.c | 56 +++- 1 file changed, 32 insertions(+), 24 deletions(-) [...] /* @@ -2080,12 +2088,12 @@ uint64_t helper_r6_cmp_d_ ## op(CPUMIPSState *env, uint64_t fdt0, \ { \ uint64_t c; \ c = cond; \ - update_fcr31(env, GETPC()); \ if (c) { \ return -1; \ } else { \ return 0; \ } \ + update_fcr31(env, GETPC()); \ Isn't it now never called (dead code)? Confirmed: target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_af’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2103:1: note: in expansion of macro ‘FOP_CONDN_D’ 2103 | FOP_CONDN_D(af, (float64_unordered_quiet(fdt1, fdt0, | ^~~ Compiling C object libqemu-mips-softmmu.fa.p/target_mips_dsp_helper.c.o target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_un’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2105:1: note: in expansion of macro ‘FOP_CONDN_D’ 2105 | FOP_CONDN_D(un, (float64_unordered_quiet(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_eq’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2107:1: note: in expansion of macro ‘FOP_CONDN_D’ 2107 | FOP_CONDN_D(eq, (float64_eq_quiet(fdt0, fdt1, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_ueq’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2109:1: note: in expansion of macro ‘FOP_CONDN_D’ 2109 | FOP_CONDN_D(ueq, (float64_unordered_quiet(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_lt’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2113:1: note: in expansion of macro ‘FOP_CONDN_D’ 2113 | FOP_CONDN_D(lt, (float64_lt_quiet(fdt0, fdt1, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_ult’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2115:1: note: in expansion of macro ‘FOP_CONDN_D’ 2115 | FOP_CONDN_D(ult, (float64_unordered_quiet(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_le’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2119:1: note: in expansion of macro ‘FOP_CONDN_D’ 2119 | FOP_CONDN_D(le, (float64_le_quiet(fdt0, fdt1, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_ule’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2121:1: note: in expansion of macro ‘FOP_CONDN_D’ 2121 | FOP_CONDN_D(ule, (float64_unordered_quiet(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_saf’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2129:1: note: in expansion of macro 
‘FOP_CONDN_D’ 2129 | FOP_CONDN_D(saf, (float64_unordered(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_sun’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2131:1: note: in expansion of macro ‘FOP_CONDN_D’ 2131 | FOP_CONDN_D(sun, (float64_unordered(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_seq’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097
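For clarity, the unreachable call is just an ordering problem; a sketch of the reachable ordering (simplified from the FOP_CONDN_D macro body, matching the pre-refactor code) is:

    /* Evaluate the condition, update the FCSR cause/flag bits, and only
     * then take one of the return paths. */
    uint64_t c;
    c = cond;
    update_fcr31(env, GETPC());
    if (c) {
        return -1;
    } else {
        return 0;
    }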
Re: [PATCH] goldfish_rtc: re-arm the alarm after migration
On Fri, Oct 16, 2020 at 11:16 AM Laurent Vivier wrote: > > After a migration the clock offset is updated, but we also > need to re-arm the alarm if needed. > > Signed-off-by: Laurent Vivier Reviewed-by: Alistair Francis Alistair > --- > hw/rtc/goldfish_rtc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/hw/rtc/goldfish_rtc.c b/hw/rtc/goldfish_rtc.c > index 0f4e8185a796..e07ff0164e0c 100644 > --- a/hw/rtc/goldfish_rtc.c > +++ b/hw/rtc/goldfish_rtc.c > @@ -211,6 +211,8 @@ static int goldfish_rtc_post_load(void *opaque, int > version_id) > qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); > s->tick_offset = s->tick_offset_vmstate - delta; > > +goldfish_rtc_set_alarm(s); > + > return 0; > } > > -- > 2.26.2 > >
Re: [PULL 00/10] Block layer patches
On Thu, 15 Oct 2020 at 15:50, Kevin Wolf wrote: > > The following changes since commit 57c98ea9acdcef5021f5671efa6475a5794a51c4: > > Merge remote-tracking branch 'remotes/kraxel/tags/ui-20201014-pull-request' > into staging (2020-10-14 13:56:06 +0100) > > are available in the Git repository at: > > git://repo.or.cz/qemu/kevin.git tags/for-upstream > > for you to fetch changes up to e1c4269763999e3b359fff19ad170e0110d3b457: > > block: deprecate the sheepdog block driver (2020-10-15 16:06:28 +0200) > > > Block layer patches: > > - qemu-storage-daemon: Remove QemuOpts from --object parser > - monitor: Fix order in monitor_cleanup() > - Deprecate the sheepdog block driver Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2 for any user-visible changes. -- PMM
Re: [PATCH v11 00/12] linux-user: User support for AArch64 BTI
Patchew URL: https://patchew.org/QEMU/20201016184207.786698-1-richard.hender...@linaro.org/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20201016184207.786698-1-richard.hender...@linaro.org Subject: [PATCH v11 00/12] linux-user: User support for AArch64 BTI === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/20201016184207.786698-1-richard.hender...@linaro.org -> patchew/20201016184207.786698-1-richard.hender...@linaro.org Switched to a new branch 'test' adddebd tests/tcg/aarch64: Add bti smoke tests 6de9e12 linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND 9b05077 linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes e063fde linux-user/elfload: Use Error for load_elf_interp 5fa6305 linux-user/elfload: Use Error for load_elf_image 0bdf3b9 linux-user/elfload: Move PT_INTERP detection to first loop 3192943 linux-user/elfload: Adjust iteration over phdr d35ac0e linux-user/elfload: Fix coding style in load_elf_image f1019c0 linux-user/elfload: Avoid leaking interp_name using GLib memory API 46b8e04 include/elf: Add defines related to GNU property notes for AArch64 15162b4 linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI 49ed74b linux-user/aarch64: Reset btype for signals === OUTPUT BEGIN === 1/12 Checking commit 49ed74bc1aee (linux-user/aarch64: Reset btype for signals) 2/12 Checking commit 15162b4dcbec (linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI) 3/12 Checking commit 46b8e04781a1 (include/elf: Add defines related to GNU property notes for AArch64) 4/12 Checking commit f1019c0416c3 (linux-user/elfload: Avoid leaking interp_name using GLib memory API) 5/12 Checking commit d35ac0e1930b (linux-user/elfload: Fix coding style in load_elf_image) 6/12 Checking commit 3192943b8d75 (linux-user/elfload: Adjust iteration over phdr) 7/12 Checking commit 0bdf3b9c07f6 (linux-user/elfload: Move PT_INTERP detection to first loop) 8/12 Checking commit 5fa63056e084 (linux-user/elfload: Use Error for load_elf_image) 9/12 Checking commit e063fde7d4b1 (linux-user/elfload: Use Error for load_elf_interp) 10/12 Checking commit 9b05077fa2ed (linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes) 11/12 Checking commit 6de9e12b2adf (linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND) 12/12 Checking commit adddebd8a702 (tests/tcg/aarch64: Add bti smoke tests) WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? #37: new file mode 100644 ERROR: externs should be avoided in .c files #165: FILE: tests/tcg/aarch64/bti-2.c:56: +extern char test_begin[], test_end[]; ERROR: use qemu_real_host_page_size instead of getpagesize() #199: FILE: tests/tcg/aarch64/bti-2.c:90: +void *p = mmap(0, getpagesize(), ERROR: externs should be avoided in .c files #236: FILE: tests/tcg/aarch64/bti-crt.inc.c:13: +int main(void); total: 3 errors, 1 warnings, 247 lines checked Patch 12/12 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 
=== OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20201016184207.786698-1-richard.hender...@linaro.org/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
[PATCH v11 12/12] tests/tcg/aarch64: Add bti smoke tests
The note test requires gcc 10 for -mbranch-protection=standard. The mmap test uses PROT_BTI and does not require special compiler support. Acked-by: Alex Bennée Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- v9: Expect and require gcc 10. v11: Squash mmap smoke test. --- tests/tcg/aarch64/bti-1.c | 62 + tests/tcg/aarch64/bti-2.c | 108 ++ tests/tcg/aarch64/bti-crt.inc.c | 51 ++ tests/tcg/aarch64/Makefile.target | 10 +++ tests/tcg/configure.sh| 4 ++ 5 files changed, 235 insertions(+) create mode 100644 tests/tcg/aarch64/bti-1.c create mode 100644 tests/tcg/aarch64/bti-2.c create mode 100644 tests/tcg/aarch64/bti-crt.inc.c diff --git a/tests/tcg/aarch64/bti-1.c b/tests/tcg/aarch64/bti-1.c new file mode 100644 index 00..61924f0d7a --- /dev/null +++ b/tests/tcg/aarch64/bti-1.c @@ -0,0 +1,62 @@ +/* + * Branch target identification, basic notskip cases. + */ + +#include "bti-crt.inc.c" + +static void skip2_sigill(int sig, siginfo_t *info, ucontext_t *uc) +{ +uc->uc_mcontext.pc += 8; +uc->uc_mcontext.pstate = 1; +} + +#define NOP "nop" +#define BTI_N "hint #32" +#define BTI_C "hint #34" +#define BTI_J "hint #36" +#define BTI_JC"hint #38" + +#define BTYPE_1(DEST) \ +asm("mov %0,#1; adr x16, 1f; br x16; 1: " DEST "; mov %0,#0" \ +: "=r"(skipped) : : "x16") + +#define BTYPE_2(DEST) \ +asm("mov %0,#1; adr x16, 1f; blr x16; 1: " DEST "; mov %0,#0" \ +: "=r"(skipped) : : "x16", "x30") + +#define BTYPE_3(DEST) \ +asm("mov %0,#1; adr x15, 1f; br x15; 1: " DEST "; mov %0,#0" \ +: "=r"(skipped) : : "x15") + +#define TEST(WHICH, DEST, EXPECT) \ +do { WHICH(DEST); fail += skipped ^ EXPECT; } while (0) + + +int main() +{ +int fail = 0; +int skipped; + +/* Signal-like with SA_SIGINFO. */ +signal_info(SIGILL, skip2_sigill); + +TEST(BTYPE_1, NOP, 1); +TEST(BTYPE_1, BTI_N, 1); +TEST(BTYPE_1, BTI_C, 0); +TEST(BTYPE_1, BTI_J, 0); +TEST(BTYPE_1, BTI_JC, 0); + +TEST(BTYPE_2, NOP, 1); +TEST(BTYPE_2, BTI_N, 1); +TEST(BTYPE_2, BTI_C, 0); +TEST(BTYPE_2, BTI_J, 1); +TEST(BTYPE_2, BTI_JC, 0); + +TEST(BTYPE_3, NOP, 1); +TEST(BTYPE_3, BTI_N, 1); +TEST(BTYPE_3, BTI_C, 1); +TEST(BTYPE_3, BTI_J, 0); +TEST(BTYPE_3, BTI_JC, 0); + +return fail; +} diff --git a/tests/tcg/aarch64/bti-2.c b/tests/tcg/aarch64/bti-2.c new file mode 100644 index 00..6dc8908b5a --- /dev/null +++ b/tests/tcg/aarch64/bti-2.c @@ -0,0 +1,108 @@ +/* + * Branch target identification, basic notskip cases. 
+ */ + +#include +#include +#include +#include +#include + +#ifndef PROT_BTI +#define PROT_BTI 0x10 +#endif + +static void skip2_sigill(int sig, siginfo_t *info, void *vuc) +{ +ucontext_t *uc = vuc; +uc->uc_mcontext.pc += 8; +uc->uc_mcontext.pstate = 1; +} + +#define NOP "nop" +#define BTI_N "hint #32" +#define BTI_C "hint #34" +#define BTI_J "hint #36" +#define BTI_JC"hint #38" + +#define BTYPE_1(DEST)\ +"mov x1, #1\n\t" \ +"adr x16, 1f\n\t"\ +"br x16\n" \ +"1: " DEST "\n\t"\ +"mov x1, #0" + +#define BTYPE_2(DEST)\ +"mov x1, #1\n\t" \ +"adr x16, 1f\n\t"\ +"blr x16\n" \ +"1: " DEST "\n\t"\ +"mov x1, #0" + +#define BTYPE_3(DEST)\ +"mov x1, #1\n\t" \ +"adr x15, 1f\n\t"\ +"br x15\n" \ +"1: " DEST "\n\t"\ +"mov x1, #0" + +#define TEST(WHICH, DEST, EXPECT) \ +WHICH(DEST) "\n" \ +".if " #EXPECT "\n\t" \ +"eor x1, x1," #EXPECT "\n"\ +".endif\n\t" \ +"add x0, x0, x1\n\t" + +extern char test_begin[], test_end[]; + +asm("\n" +"test_begin:\n\t" +BTI_C "\n\t" +"mov x2, x30\n\t" +"mov x0, #0\n\t" + +TEST(BTYPE_1, NOP, 1) +TEST(BTYPE_1, BTI_N, 1) +TEST(BTYPE_1, BTI_C, 0) +TEST(BTYPE_1, BTI_J, 0) +TEST(BTYPE_1, BTI_JC, 0) + +TEST(BTYPE_2, NOP, 1) +TEST(BTYPE_2, BTI_N, 1) +TEST(BTYPE_2, BTI_C, 0) +TEST(BTYPE_2, BTI_J, 1) +TEST(BTYPE_2, BTI_JC, 0) + +TEST(BTYPE_3, NOP, 1) +TEST(BTYPE_3, BTI_N, 1) +TEST(BTYPE_3, BTI_C, 1) +TEST(BTYPE_3, BTI_J, 0) +TEST(BTYPE_3, BTI_JC, 0) + +"ret x2\n" +"test_end:" +); + +int main() +{ +struct sigaction sa; + +void *p = mmap(0, getpagesize(), + PROT_EXEC | PROT_READ | PROT_WRITE | PROT_BTI, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); +if (p == MAP_FAILED) { +perror("mmap"); +return 1; +} + +memset(&sa, 0, sizeof(sa)); +sa.sa_sigaction = skip2_sigill; +sa.sa_flags = SA_SIGINFO; +if (sigaction(SIGILL, &sa, NULL) < 0) { +perror("sigaction"); +return 1; +} + +memcpy(p, test_begin, test_end - test_begin); +return ((int (*)(void))p)(); +} diff --git a/tests/tcg/aarch64/b
[PATCH v11 11/12] linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND
Use the new generic support for NT_GNU_PROPERTY_TYPE_0. Signed-off-by: Richard Henderson --- v11: Split out aarch64 bits from generic patch. --- linux-user/elfload.c | 48 ++-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 428dcaa152..bf8c1bd253 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -1522,6 +1522,28 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, #include "elf.h" +/* We must delay the following stanzas until after "elf.h". */ +#if defined(TARGET_AARCH64) + +static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz, +const uint32_t *data, +struct image_info *info, +Error **errp) +{ +if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND) { +if (pr_datasz != sizeof(uint32_t)) { +error_setg(errp, "Ill-formed GNU_PROPERTY_AARCH64_FEATURE_1_AND"); +return false; +} +/* We will extract GNU_PROPERTY_AARCH64_FEATURE_1_BTI later. */ +info->note_flags = *data; +} +return true; +} +#define ARCH_USE_GNU_PROPERTY 1 + +#else + static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz, const uint32_t *data, struct image_info *info, @@ -1531,6 +1553,8 @@ static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz, } #define ARCH_USE_GNU_PROPERTY 0 +#endif + struct exec { unsigned int a_info; /* Use macros N_MAGIC, etc for access */ @@ -2545,7 +2569,7 @@ static void load_elf_image(const char *image_name, int image_fd, struct elfhdr *ehdr = (struct elfhdr *)bprm_buf; struct elf_phdr *phdr; abi_ulong load_addr, load_bias, loaddr, hiaddr, error; -int i, retval; +int i, retval, prot_exec; Error *err = NULL; /* First of all, some simple consistency checks */ @@ -2712,6 +2736,26 @@ static void load_elf_image(const char *image_name, int image_fd, info->brk = 0; info->elf_flags = ehdr->e_flags; +prot_exec = PROT_EXEC; +#ifdef TARGET_AARCH64 +/* + * If the BTI feature is present, this indicates that the executable + * pages of the startup binary should be mapped with PROT_BTI, so that + * branch targets are enforced. + * + * The startup binary is either the interpreter or the static executable. + * The interpreter is responsible for all pages of a dynamic executable. + * + * Elf notes are backward compatible to older cpus. + * Do not enable BTI unless it is supported. + */ +if ((info->note_flags & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) +&& (pinterp_name == NULL || *pinterp_name == 0) +&& cpu_isar_feature(aa64_bti, ARM_CPU(thread_cpu))) { +prot_exec |= TARGET_PROT_BTI; +} +#endif + for (i = 0; i < ehdr->e_phnum; i++) { struct elf_phdr *eppnt = phdr + i; if (eppnt->p_type == PT_LOAD) { @@ -2725,7 +2769,7 @@ static void load_elf_image(const char *image_name, int image_fd, elf_prot |= PROT_WRITE; } if (eppnt->p_flags & PF_X) { -elf_prot |= PROT_EXEC; +elf_prot |= prot_exec; } vaddr = load_bias + eppnt->p_vaddr; -- 2.25.1
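For reference, the note being parsed here is the one the toolchain emits for -mbranch-protection. Laid out as a C struct (illustrative; on disk every field is a 32-bit value in target byte order, which is why the loader byte-swaps the whole blob):

    #include <stdint.h>

    /* NT_GNU_PROPERTY_TYPE_0 note carrying the AArch64 BTI feature bit. */
    struct gnu_property_note_example {
        uint32_t n_namesz;   /* 4: strlen("GNU") + 1 */
        uint32_t n_descsz;   /* 16: pr_type + pr_datasz + data + pad */
        uint32_t n_type;     /* 5: NT_GNU_PROPERTY_TYPE_0 */
        char     n_name[4];  /* "GNU" */
        uint32_t pr_type;    /* 0xc0000000: GNU_PROPERTY_AARCH64_FEATURE_1_AND */
        uint32_t pr_datasz;  /* 4 */
        uint32_t pr_data;    /* bit 0: GNU_PROPERTY_AARCH64_FEATURE_1_BTI */
        uint32_t pr_pad;     /* padding to the 8-byte ELFCLASS64 alignment */
    };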
[PATCH v11 09/12] linux-user/elfload: Use Error for load_elf_interp
This is slightly clearer than just using strerror, though the different forms produced by error_setg_file_open and error_setg_errno isn't entirely convenient. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 56fbda93d0..04c04bc260 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2675,26 +2675,27 @@ static void load_elf_interp(const char *filename, struct image_info *info, char bprm_buf[BPRM_BUF_SIZE]) { int fd, retval; +Error *err = NULL; fd = open(path(filename), O_RDONLY); if (fd < 0) { -goto exit_perror; +error_setg_file_open(&err, errno, filename); +error_report_err(err); +exit(-1); } retval = read(fd, bprm_buf, BPRM_BUF_SIZE); if (retval < 0) { -goto exit_perror; +error_setg_errno(&err, errno, "Error reading file header"); +error_reportf_err(err, "%s: ", filename); +exit(-1); } + if (retval < BPRM_BUF_SIZE) { memset(bprm_buf + retval, 0, BPRM_BUF_SIZE - retval); } load_elf_image(filename, fd, info, NULL, bprm_buf); -return; - - exit_perror: -fprintf(stderr, "%s: %s\n", filename, strerror(errno)); -exit(-1); } static int symfind(const void *s0, const void *s1) -- 2.25.1
[PATCH v11 10/12] linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes
This is generic support, with the code disabled for all targets. Signed-off-by: Richard Henderson --- v9: Only map the startup executable with BTI; anything else must be handled by the interpreter. v10: Split out preparatory patches (pmm). v11: Mirror(-ish) the kernel's code structure (pmm). --- linux-user/qemu.h| 4 ++ linux-user/elfload.c | 157 +++ 2 files changed, 161 insertions(+) diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 941ca99722..534753ca12 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -61,6 +61,10 @@ struct image_info { abi_ulong interpreter_loadmap_addr; abi_ulong interpreter_pt_dynamic_addr; struct image_info *other_info; + +/* For target-specific processing of NT_GNU_PROPERTY_TYPE_0. */ +uint32_tnote_flags; + #ifdef TARGET_MIPS int fp_abi; int interp_fp_abi; diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 04c04bc260..428dcaa152 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -1522,6 +1522,15 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, #include "elf.h" +static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz, +const uint32_t *data, +struct image_info *info, +Error **errp) +{ +g_assert_not_reached(); +} +#define ARCH_USE_GNU_PROPERTY 0 + struct exec { unsigned int a_info; /* Use macros N_MAGIC, etc for access */ @@ -2373,6 +2382,150 @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr, "@ 0x%" PRIx64 "\n", (uint64_t)guest_base); } +enum { +/* The string "GNU\0" as a magic number. */ +GNU0_MAGIC = const_le32('G' | 'N' << 8 | 'U' << 16), +NOTE_DATA_SZ = 1 * KiB, +NOTE_NAME_SZ = 4, +ELF_GNU_PROPERTY_ALIGN = ELF_CLASS == ELFCLASS32 ? 4 : 8, +}; + +/* + * Process a single gnu_property entry. + * Return false for error. + */ +static bool parse_elf_property(const uint32_t *data, int *off, int datasz, + struct image_info *info, bool have_prev_type, + uint32_t *prev_type, Error **errp) +{ +uint32_t pr_type, pr_datasz, step; + +if (*off > datasz || !QEMU_IS_ALIGNED(*off, ELF_GNU_PROPERTY_ALIGN)) { +goto error_data; +} +datasz -= *off; +data += *off / sizeof(uint32_t); + +if (datasz < 2 * sizeof(uint32_t)) { +goto error_data; +} +pr_type = data[0]; +pr_datasz = data[1]; +data += 2; +datasz -= 2 * sizeof(uint32_t); +step = ROUND_UP(pr_datasz, ELF_GNU_PROPERTY_ALIGN); +if (step > datasz) { +goto error_data; +} + +/* Properties are supposed to be unique and sorted on pr_type. */ +if (have_prev_type && pr_type <= *prev_type) { +if (pr_type == *prev_type) { +error_setg(errp, "Duplicate property in PT_GNU_PROPERTY"); +} else { +error_setg(errp, "Unsorted property in PT_GNU_PROPERTY"); +} +return false; +} +*prev_type = pr_type; + +if (!arch_parse_elf_property(pr_type, pr_datasz, data, info, errp)) { +return false; +} + +*off += 2 * sizeof(uint32_t) + step; +return true; + + error_data: +error_setg(errp, "Ill-formed property in PT_GNU_PROPERTY"); +return false; +} + +/* Process NT_GNU_PROPERTY_TYPE_0. */ +static bool parse_elf_properties(int image_fd, + struct image_info *info, + const struct elf_phdr *phdr, + char bprm_buf[BPRM_BUF_SIZE], + Error **errp) +{ +union { +struct elf_note nhdr; +uint32_t data[NOTE_DATA_SZ / sizeof(uint32_t)]; +} note; + +int n, off, datasz; +bool have_prev_type; +uint32_t prev_type; + +/* Unless the arch requires properties, ignore them. */ +if (!ARCH_USE_GNU_PROPERTY) { +return true; +} + +/* If the properties are crazy large, that's too bad. 
*/ +n = phdr->p_filesz; +if (n > sizeof(note)) { +error_setg(errp, "PT_GNU_PROPERTY too large"); +return false; +} +if (n < sizeof(note.nhdr)) { +error_setg(errp, "PT_GNU_PROPERTY too small"); +return false; +} + +if (phdr->p_offset + n <= BPRM_BUF_SIZE) { +memcpy(¬e, bprm_buf + phdr->p_offset, n); +} else { +ssize_t len = pread(image_fd, ¬e, n, phdr->p_offset); +if (len != n) { +error_setg_errno(errp, errno, "Error reading file header"); +return false; +} +} + +/* + * The contents of a valid PT_GNU_PROPERTY is a sequence + * of uint32_t -- swap them all now. + */ +#ifdef BSWAP_NEEDED +for (int i = 0; i < n / 4; i++) { +bswap32s(note.data + i)
[PATCH v11 02/12] linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI
Transform the prot bit to a qemu internal page bit, and save it in the page tables. Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- v10: Add PAGE_BTI define (pmm). --- include/exec/cpu-all.h | 2 ++ linux-user/syscall_defs.h | 4 target/arm/cpu.h | 5 + linux-user/mmap.c | 16 target/arm/translate-a64.c | 6 +++--- 5 files changed, 30 insertions(+), 3 deletions(-) diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index 61e13b5038..656a2a8788 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -274,6 +274,8 @@ extern intptr_t qemu_host_page_mask; /* FIXME: Code that sets/uses this is broken and needs to go away. */ #define PAGE_RESERVED 0x0020 #endif +/* Target-specific bits that will be used via page_get_flags(). */ +#define PAGE_TARGET_1 0x0080 #if defined(CONFIG_USER_ONLY) void page_dump(FILE *f); diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 731c3d5341..cabbfb762d 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -1277,6 +1277,10 @@ struct target_winsize { #define TARGET_PROT_SEM 0x08 #endif +#ifdef TARGET_AARCH64 +#define TARGET_PROT_BTI 0x10 +#endif + /* Common */ #define TARGET_MAP_SHARED 0x01/* Share changes */ #define TARGET_MAP_PRIVATE 0x02/* Changes are private */ diff --git a/target/arm/cpu.h b/target/arm/cpu.h index cfff1b5c8f..e8efe21a1b 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3443,6 +3443,11 @@ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x) #define arm_tlb_bti_gp(x) (typecheck_memtxattrs(x)->target_tlb_bit0) #define arm_tlb_mte_tagged(x) (typecheck_memtxattrs(x)->target_tlb_bit1) +/* + * AArch64 usage of the PAGE_TARGET_* bits for linux-user. + */ +#define PAGE_BTI PAGE_TARGET_1 + /* * Naming convention for isar_feature functions: * Functions which test 32-bit ID registers should have _aa32_ in diff --git a/linux-user/mmap.c b/linux-user/mmap.c index f261563420..00c05e6a0f 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -83,6 +83,22 @@ static int validate_prot_to_pageflags(int *host_prot, int prot) *host_prot = (prot & (PROT_READ | PROT_WRITE)) | (prot & PROT_EXEC ? PROT_READ : 0); +#ifdef TARGET_AARCH64 +/* + * The PROT_BTI bit is only accepted if the cpu supports the feature. + * Since this is the unusual case, don't bother checking unless + * the bit has been requested. If set and valid, record the bit + * within QEMU's page_flags. + */ +if (prot & TARGET_PROT_BTI) { +ARMCPU *cpu = ARM_CPU(thread_cpu); +if (cpu_isar_feature(aa64_bti, cpu)) { +valid |= TARGET_PROT_BTI; +page_flags |= PAGE_BTI; +} +} +#endif + return prot & ~valid ? 0 : page_flags; } diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 7188808341..072754fa24 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -14507,10 +14507,10 @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn) */ static bool is_guarded_page(CPUARMState *env, DisasContext *s) { -#ifdef CONFIG_USER_ONLY -return false; /* FIXME */ -#else uint64_t addr = s->base.pc_first; +#ifdef CONFIG_USER_ONLY +return page_get_flags(addr) & PAGE_BTI; +#else int mmu_idx = arm_to_core_mmu_idx(s->mmu_idx); unsigned int index = tlb_index(env, mmu_idx, addr); CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- 2.25.1
[PATCH v11 08/12] linux-user/elfload: Use Error for load_elf_image
This is a bit clearer than open-coding some of this with a bare c string. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 37 - 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 107a628a9e..56fbda93d0 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -12,6 +12,7 @@ #include "qemu/guest-random.h" #include "qemu/units.h" #include "qemu/selfmap.h" +#include "qapi/error.h" #ifdef _ARCH_PPC64 #undef ARCH_DLINFO @@ -2392,15 +2393,16 @@ static void load_elf_image(const char *image_name, int image_fd, struct elf_phdr *phdr; abi_ulong load_addr, load_bias, loaddr, hiaddr, error; int i, retval; -const char *errmsg; +Error *err = NULL; /* First of all, some simple consistency checks */ -errmsg = "Invalid ELF image for this architecture"; if (!elf_check_ident(ehdr)) { +error_setg(&err, "Invalid ELF image for this architecture"); goto exit_errmsg; } bswap_ehdr(ehdr); if (!elf_check_ehdr(ehdr)) { +error_setg(&err, "Invalid ELF image for this architecture"); goto exit_errmsg; } @@ -2444,13 +2446,11 @@ static void load_elf_image(const char *image_name, int image_fd, g_autofree char *interp_name = NULL; if (*pinterp_name) { -errmsg = "Multiple PT_INTERP entries"; +error_setg(&err, "Multiple PT_INTERP entries"); goto exit_errmsg; } + interp_name = g_malloc(eppnt->p_filesz); -if (!interp_name) { -goto exit_perror; -} if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) { memcpy(interp_name, bprm_buf + eppnt->p_offset, @@ -2459,11 +2459,11 @@ static void load_elf_image(const char *image_name, int image_fd, retval = pread(image_fd, interp_name, eppnt->p_filesz, eppnt->p_offset); if (retval != eppnt->p_filesz) { -goto exit_perror; +goto exit_read; } } if (interp_name[eppnt->p_filesz - 1] != 0) { -errmsg = "Invalid PT_INTERP entry"; +error_setg(&err, "Invalid PT_INTERP entry"); goto exit_errmsg; } *pinterp_name = g_steal_pointer(&interp_name); @@ -2520,7 +2520,7 @@ static void load_elf_image(const char *image_name, int image_fd, (ehdr->e_type == ET_EXEC ? 
MAP_FIXED : 0), -1, 0); if (load_addr == -1) { -goto exit_perror; +goto exit_mmap; } load_bias = load_addr - loaddr; @@ -2587,7 +2587,7 @@ static void load_elf_image(const char *image_name, int image_fd, image_fd, eppnt->p_offset - vaddr_po); if (error == -1) { -goto exit_perror; +goto exit_mmap; } } @@ -2623,7 +2623,7 @@ static void load_elf_image(const char *image_name, int image_fd, } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) { Mips_elf_abiflags_v0 abiflags; if (eppnt->p_filesz < sizeof(Mips_elf_abiflags_v0)) { -errmsg = "Invalid PT_MIPS_ABIFLAGS entry"; +error_setg(&err, "Invalid PT_MIPS_ABIFLAGS entry"); goto exit_errmsg; } if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) { @@ -2633,7 +2633,7 @@ static void load_elf_image(const char *image_name, int image_fd, retval = pread(image_fd, &abiflags, sizeof(Mips_elf_abiflags_v0), eppnt->p_offset); if (retval != sizeof(Mips_elf_abiflags_v0)) { -goto exit_perror; +goto exit_read; } } bswap_mips_abiflags(&abiflags); @@ -2658,13 +2658,16 @@ static void load_elf_image(const char *image_name, int image_fd, exit_read: if (retval >= 0) { -errmsg = "Incomplete read of file header"; -goto exit_errmsg; +error_setg(&err, "Incomplete read of file header"); +} else { +error_setg_errno(&err, errno, "Error reading file header"); } - exit_perror: -errmsg = strerror(errno); +goto exit_errmsg; + exit_mmap: +error_setg_errno(&err, errno, "Error mapping file"); +goto exit_errmsg; exit_errmsg: -fprintf(stderr, "%s: %s\n", image_name, errmsg); +error_reportf_err(err, "%s: ", image_name); exit(-1); } -- 2.25.1
[PATCH v11 03/12] include/elf: Add defines related to GNU property notes for AArch64
These are all of the defines required to parse GNU_PROPERTY_AARCH64_FEATURE_1_AND, copied from binutils. Other missing defines related to other GNU program headers and notes are elided for now. Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- include/elf.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/include/elf.h b/include/elf.h index c117a4d1ab..10126ff809 100644 --- a/include/elf.h +++ b/include/elf.h @@ -26,9 +26,13 @@ typedef int64_t Elf64_Sxword; #define PT_NOTE4 #define PT_SHLIB 5 #define PT_PHDR6 +#define PT_LOOS0x6000 +#define PT_HIOS0x6fff #define PT_LOPROC 0x7000 #define PT_HIPROC 0x7fff +#define PT_GNU_PROPERTY (PT_LOOS + 0x474e553) + #define PT_MIPS_REGINFO 0x7000 #define PT_MIPS_RTPROC0x7001 #define PT_MIPS_OPTIONS 0x7002 @@ -1657,6 +1661,24 @@ typedef struct elf64_shdr { #define NT_ARM_SYSTEM_CALL 0x404 /* ARM system call number */ #define NT_ARM_SVE 0x405 /* ARM Scalable Vector Extension regs */ +/* Defined note types for GNU systems. */ + +#define NT_GNU_PROPERTY_TYPE_0 5 /* Program property */ + +/* Values used in GNU .note.gnu.property notes (NT_GNU_PROPERTY_TYPE_0). */ + +#define GNU_PROPERTY_STACK_SIZE 1 +#define GNU_PROPERTY_NO_COPY_ON_PROTECTED 2 + +#define GNU_PROPERTY_LOPROC 0xc000 +#define GNU_PROPERTY_HIPROC 0xdfff +#define GNU_PROPERTY_LOUSER 0xe000 +#define GNU_PROPERTY_HIUSER 0x + +#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc000 +#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1u << 0) +#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1u << 1) + /* * Physical entry point into the kernel. * -- 2.25.1
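As a rough illustration of where these constants end up: once a loader has extracted the data word of a GNU_PROPERTY_AARCH64_FEATURE_1_AND entry from an NT_GNU_PROPERTY_TYPE_0 note, the BTI decision is a single bit test. The helper below is a hypothetical sketch, not code from this series:

#include <stdbool.h>
#include <stdint.h>

#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  (1u << 0)
#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC  (1u << 1)

/*
 * feature_1_and is the 32-bit data word of the AArch64 FEATURE_1_AND
 * property; a set BTI bit means the image was built with BTI landing
 * pads and its executable pages may be mapped as guarded.
 */
static bool image_wants_bti(uint32_t feature_1_and)
{
    return (feature_1_and & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) != 0;
}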
[PATCH v11 07/12] linux-user/elfload: Move PT_INTERP detection to first loop
For BTI, we need to know if the executable is static or dynamic, which means looking for PT_INTERP earlier. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 60 +++- 1 file changed, 31 insertions(+), 29 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 210592aa90..107a628a9e 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2421,8 +2421,10 @@ static void load_elf_image(const char *image_name, int image_fd, mmap_lock(); -/* Find the maximum size of the image and allocate an appropriate - amount of memory to handle that. */ +/* + * Find the maximum size of the image and allocate an appropriate + * amount of memory to handle that. Locate the interpreter, if any. + */ loaddr = -1, hiaddr = 0; info->alignment = 0; for (i = 0; i < ehdr->e_phnum; ++i) { @@ -2438,6 +2440,33 @@ static void load_elf_image(const char *image_name, int image_fd, } ++info->nsegs; info->alignment |= eppnt->p_align; +} else if (eppnt->p_type == PT_INTERP && pinterp_name) { +g_autofree char *interp_name = NULL; + +if (*pinterp_name) { +errmsg = "Multiple PT_INTERP entries"; +goto exit_errmsg; +} +interp_name = g_malloc(eppnt->p_filesz); +if (!interp_name) { +goto exit_perror; +} + +if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) { +memcpy(interp_name, bprm_buf + eppnt->p_offset, + eppnt->p_filesz); +} else { +retval = pread(image_fd, interp_name, eppnt->p_filesz, + eppnt->p_offset); +if (retval != eppnt->p_filesz) { +goto exit_perror; +} +} +if (interp_name[eppnt->p_filesz - 1] != 0) { +errmsg = "Invalid PT_INTERP entry"; +goto exit_errmsg; +} +*pinterp_name = g_steal_pointer(&interp_name); } } @@ -2590,33 +2619,6 @@ static void load_elf_image(const char *image_name, int image_fd, if (vaddr_em > info->brk) { info->brk = vaddr_em; } -} else if (eppnt->p_type == PT_INTERP && pinterp_name) { -g_autofree char *interp_name = NULL; - -if (*pinterp_name) { -errmsg = "Multiple PT_INTERP entries"; -goto exit_errmsg; -} -interp_name = g_malloc(eppnt->p_filesz); -if (!interp_name) { -goto exit_perror; -} - -if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) { -memcpy(interp_name, bprm_buf + eppnt->p_offset, - eppnt->p_filesz); -} else { -retval = pread(image_fd, interp_name, eppnt->p_filesz, - eppnt->p_offset); -if (retval != eppnt->p_filesz) { -goto exit_perror; -} -} -if (interp_name[eppnt->p_filesz - 1] != 0) { -errmsg = "Invalid PT_INTERP entry"; -goto exit_errmsg; -} -*pinterp_name = g_steal_pointer(&interp_name); #ifdef TARGET_MIPS } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) { Mips_elf_abiflags_v0 abiflags; -- 2.25.1
[PATCH v11 05/12] linux-user/elfload: Fix coding style in load_elf_image
Fixing this now will clarify following patches. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 1a3150df7c..290ef70222 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2531,9 +2531,15 @@ static void load_elf_image(const char *image_name, int image_fd, abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em, vaddr_len; int elf_prot = 0; -if (eppnt->p_flags & PF_R) elf_prot = PROT_READ; -if (eppnt->p_flags & PF_W) elf_prot |= PROT_WRITE; -if (eppnt->p_flags & PF_X) elf_prot |= PROT_EXEC; +if (eppnt->p_flags & PF_R) { +elf_prot |= PROT_READ; +} +if (eppnt->p_flags & PF_W) { +elf_prot |= PROT_WRITE; +} +if (eppnt->p_flags & PF_X) { +elf_prot |= PROT_EXEC; +} vaddr = load_bias + eppnt->p_vaddr; vaddr_po = TARGET_ELF_PAGEOFFSET(vaddr); -- 2.25.1
[PATCH v11 06/12] linux-user/elfload: Adjust iteration over phdr
The second loop uses a loop induction variable, and the first does not. Transform the first to match the second, to simplify a following patch moving code between them. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 290ef70222..210592aa90 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2426,17 +2426,18 @@ static void load_elf_image(const char *image_name, int image_fd, loaddr = -1, hiaddr = 0; info->alignment = 0; for (i = 0; i < ehdr->e_phnum; ++i) { -if (phdr[i].p_type == PT_LOAD) { -abi_ulong a = phdr[i].p_vaddr - phdr[i].p_offset; +struct elf_phdr *eppnt = phdr + i; +if (eppnt->p_type == PT_LOAD) { +abi_ulong a = eppnt->p_vaddr - eppnt->p_offset; if (a < loaddr) { loaddr = a; } -a = phdr[i].p_vaddr + phdr[i].p_memsz; +a = eppnt->p_vaddr + eppnt->p_memsz; if (a > hiaddr) { hiaddr = a; } ++info->nsegs; -info->alignment |= phdr[i].p_align; +info->alignment |= eppnt->p_align; } } -- 2.25.1
[PATCH v11 01/12] linux-user/aarch64: Reset btype for signals
The kernel sets btype for the signal handler as if for a call. Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- linux-user/aarch64/signal.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c index d50c1ae583..b591790c22 100644 --- a/linux-user/aarch64/signal.c +++ b/linux-user/aarch64/signal.c @@ -506,10 +506,16 @@ static void target_setup_frame(int usig, struct target_sigaction *ka, + offsetof(struct target_rt_frame_record, tramp); } env->xregs[0] = usig; -env->xregs[31] = frame_addr; env->xregs[29] = frame_addr + fr_ofs; -env->pc = ka->_sa_handler; env->xregs[30] = return_addr; +env->xregs[31] = frame_addr; +env->pc = ka->_sa_handler; + +/* Invoke the signal handler as if by indirect call. */ +if (cpu_isar_feature(aa64_bti, env_archcpu(env))) { +env->btype = 2; +} + if (info) { tswap_siginfo(&frame->info, info); env->xregs[1] = frame_addr + offsetof(struct target_rt_sigframe, info); -- 2.25.1
[PATCH v11 04/12] linux-user/elfload: Avoid leaking interp_name using GLib memory API
From: Philippe Mathieu-Daudé Fix an unlikely memory leak in load_elf_image(). Fixes: bf858897b7 ("linux-user: Re-use load_elf_image for the main binary.") Signed-off-by: Philippe Mathieu-Daudé Message-Id: <20201003174944.1972444-1-f4...@amsat.org> Signed-off-by: Richard Henderson --- linux-user/elfload.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index f6022fd704..1a3150df7c 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2584,13 +2584,13 @@ static void load_elf_image(const char *image_name, int image_fd, info->brk = vaddr_em; } } else if (eppnt->p_type == PT_INTERP && pinterp_name) { -char *interp_name; +g_autofree char *interp_name = NULL; if (*pinterp_name) { errmsg = "Multiple PT_INTERP entries"; goto exit_errmsg; } -interp_name = malloc(eppnt->p_filesz); +interp_name = g_malloc(eppnt->p_filesz); if (!interp_name) { goto exit_perror; } @@ -2609,7 +2609,7 @@ static void load_elf_image(const char *image_name, int image_fd, errmsg = "Invalid PT_INTERP entry"; goto exit_errmsg; } -*pinterp_name = interp_name; +*pinterp_name = g_steal_pointer(&interp_name); #ifdef TARGET_MIPS } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) { Mips_elf_abiflags_v0 abiflags; @@ -2961,7 +2961,7 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info) if (elf_interpreter) { info->load_bias = interp_info.load_bias; info->entry = interp_info.entry; -free(elf_interpreter); +g_free(elf_interpreter); } #ifdef USE_ELF_CORE_DUMP -- 2.25.1
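The GLib idiom used above — g_autofree for scope-bound cleanup plus g_steal_pointer() to hand ownership out on the success path — in a stand-alone sketch; the helper and its checks are illustrative only:

#include <glib.h>
#include <string.h>

/* Return a caller-owned copy of buf if it is NUL-terminated, else NULL. */
static char *dup_if_terminated(const char *buf, size_t len)
{
    g_autofree char *copy = NULL;

    if (len == 0) {
        return NULL;
    }
    copy = g_malloc(len);            /* freed automatically on early return */
    memcpy(copy, buf, len);
    if (copy[len - 1] != '\0') {
        return NULL;                 /* no leak: g_autofree reclaims copy here */
    }
    return g_steal_pointer(&copy);   /* ownership moves to the caller, no free */
}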
[PATCH v11 00/12] linux-user: User support for AArch64 BTI
The kernel abi for this was merged in v5.8, just as the qemu 5.1 merge window was closing, so this slipped to the next dev cycle. Changes from v10: * Include Phil's plug of interp_name memory leak. * Convert error reporting to Error api. * Mirror the kernel's code structure for parsing notes (though Error means that it's not exactly the same). * Split aarch64 stuff from basic note parsing patch. Changes from v9: * Split what is now patch 7 into 3 more (pmm). * All prerequisites are now upstream. r~ Philippe Mathieu-Daudé (1): linux-user/elfload: Avoid leaking interp_name using GLib memory API Richard Henderson (11): linux-user/aarch64: Reset btype for signals linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI include/elf: Add defines related to GNU property notes for AArch64 linux-user/elfload: Fix coding style in load_elf_image linux-user/elfload: Adjust iteration over phdr linux-user/elfload: Move PT_INTERP detection to first loop linux-user/elfload: Use Error for load_elf_image linux-user/elfload: Use Error for load_elf_interp linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND tests/tcg/aarch64: Add bti smoke tests include/elf.h | 22 ++ include/exec/cpu-all.h| 2 + linux-user/qemu.h | 4 + linux-user/syscall_defs.h | 4 + target/arm/cpu.h | 5 + linux-user/aarch64/signal.c | 10 +- linux-user/elfload.c | 326 +- linux-user/mmap.c | 16 ++ target/arm/translate-a64.c| 6 +- tests/tcg/aarch64/bti-1.c | 62 ++ tests/tcg/aarch64/bti-2.c | 108 ++ tests/tcg/aarch64/bti-crt.inc.c | 51 + tests/tcg/aarch64/Makefile.target | 10 + tests/tcg/configure.sh| 4 + 14 files changed, 569 insertions(+), 61 deletions(-) create mode 100644 tests/tcg/aarch64/bti-1.c create mode 100644 tests/tcg/aarch64/bti-2.c create mode 100644 tests/tcg/aarch64/bti-crt.inc.c -- 2.25.1
[PATCH 5/5] m48t59: remove legacy m48t59_init() function
Now that all of the callers of this function have been switched to use qdev properties, this legacy init function can now be removed. Signed-off-by: Mark Cave-Ayland --- hw/rtc/m48t59.c | 35 --- include/hw/rtc/m48t59.h | 4 2 files changed, 39 deletions(-) diff --git a/hw/rtc/m48t59.c b/hw/rtc/m48t59.c index 6525206976..d54929e861 100644 --- a/hw/rtc/m48t59.c +++ b/hw/rtc/m48t59.c @@ -564,41 +564,6 @@ const MemoryRegionOps m48t59_io_ops = { .endianness = DEVICE_LITTLE_ENDIAN, }; -/* Initialisation routine */ -Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base, - uint32_t io_base, uint16_t size, int base_year, - int model) -{ -DeviceState *dev; -SysBusDevice *s; -int i; - -for (i = 0; i < ARRAY_SIZE(m48txx_sysbus_info); i++) { -if (m48txx_sysbus_info[i].size != size || -m48txx_sysbus_info[i].model != model) { -continue; -} - -dev = qdev_new(m48txx_sysbus_info[i].bus_name); -qdev_prop_set_int32(dev, "base-year", base_year); -s = SYS_BUS_DEVICE(dev); -sysbus_realize_and_unref(s, &error_fatal); -sysbus_connect_irq(s, 0, IRQ); -if (io_base != 0) { -memory_region_add_subregion(get_system_io(), io_base, -sysbus_mmio_get_region(s, 1)); -} -if (mem_base != 0) { -sysbus_mmio_map(s, 0, mem_base); -} - -return NVRAM(s); -} - -assert(false); -return NULL; -} - void m48t59_realize_common(M48t59State *s, Error **errp) { s->buffer = g_malloc0(s->size); diff --git a/include/hw/rtc/m48t59.h b/include/hw/rtc/m48t59.h index 9defe578d1..d9b45eb161 100644 --- a/include/hw/rtc/m48t59.h +++ b/include/hw/rtc/m48t59.h @@ -47,8 +47,4 @@ struct NvramClass { void (*toggle_lock)(Nvram *obj, int lock); }; -Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base, - uint32_t io_base, uint16_t size, int base_year, - int type); - #endif /* HW_M48T59_H */ -- 2.20.1
[PATCH 4/5] ppc405_boards: use qdev properties instead of legacy m48t59_init() function
Signed-off-by: Mark Cave-Ayland --- hw/ppc/ppc405_boards.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c index 6198ec1035..4687715b15 100644 --- a/hw/ppc/ppc405_boards.c +++ b/hw/ppc/ppc405_boards.c @@ -28,6 +28,8 @@ #include "qemu-common.h" #include "cpu.h" #include "hw/ppc/ppc.h" +#include "hw/qdev-properties.h" +#include "hw/sysbus.h" #include "ppc405.h" #include "hw/rtc/m48t59.h" #include "hw/block/flash.h" @@ -145,6 +147,8 @@ static void ref405ep_init(MachineState *machine) char *filename; ppc4xx_bd_info_t bd; CPUPPCState *env; +DeviceState *dev; +SysBusDevice *s; qemu_irq *pic; MemoryRegion *bios; MemoryRegion *sram = g_new(MemoryRegion, 1); @@ -227,7 +231,11 @@ static void ref405ep_init(MachineState *machine) /* Register FPGA */ ref405ep_fpga_init(sysmem, 0xF030); /* Register NVRAM */ -m48t59_init(NULL, 0xF000, 0, 8192, 1968, 8); +dev = qdev_new("sysbus-m48t08"); +qdev_prop_set_int32(dev, "base-year", 1968); +s = SYS_BUS_DEVICE(dev); +sysbus_realize_and_unref(s, &error_fatal); +sysbus_mmio_map(s, 0, 0xF000); /* Load kernel */ linux_boot = (kernel_filename != NULL); if (linux_boot) { -- 2.20.1
[PATCH 1/5] m48t59-isa: remove legacy m48t59_init_isa() function
This function is no longer used within the codebase. Signed-off-by: Mark Cave-Ayland --- hw/rtc/m48t59-isa.c | 25 - include/hw/rtc/m48t59.h | 2 -- 2 files changed, 27 deletions(-) diff --git a/hw/rtc/m48t59-isa.c b/hw/rtc/m48t59-isa.c index cae315e488..dc21fb10a5 100644 --- a/hw/rtc/m48t59-isa.c +++ b/hw/rtc/m48t59-isa.c @@ -58,31 +58,6 @@ static M48txxInfo m48txx_isa_info[] = { } }; -Nvram *m48t59_init_isa(ISABus *bus, uint32_t io_base, uint16_t size, - int base_year, int model) -{ -ISADevice *isa_dev; -DeviceState *dev; -int i; - -for (i = 0; i < ARRAY_SIZE(m48txx_isa_info); i++) { -if (m48txx_isa_info[i].size != size || -m48txx_isa_info[i].model != model) { -continue; -} - -isa_dev = isa_new(m48txx_isa_info[i].bus_name); -dev = DEVICE(isa_dev); -qdev_prop_set_uint32(dev, "iobase", io_base); -qdev_prop_set_int32(dev, "base-year", base_year); -isa_realize_and_unref(isa_dev, bus, &error_fatal); -return NVRAM(dev); -} - -assert(false); -return NULL; -} - static uint32_t m48txx_isa_read(Nvram *obj, uint32_t addr) { M48txxISAState *d = M48TXX_ISA(obj); diff --git a/include/hw/rtc/m48t59.h b/include/hw/rtc/m48t59.h index 04abedf3b2..9defe578d1 100644 --- a/include/hw/rtc/m48t59.h +++ b/include/hw/rtc/m48t59.h @@ -47,8 +47,6 @@ struct NvramClass { void (*toggle_lock)(Nvram *obj, int lock); }; -Nvram *m48t59_init_isa(ISABus *bus, uint32_t io_base, uint16_t size, - int base_year, int type); Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base, uint32_t io_base, uint16_t size, int base_year, int type); -- 2.20.1
[PATCH 2/5] sun4m: use qdev properties instead of legacy m48t59_init() function
Signed-off-by: Mark Cave-Ayland --- hw/sparc/sun4m.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c index 54a2b2f9ef..a9bb60f2b2 100644 --- a/hw/sparc/sun4m.c +++ b/hw/sparc/sun4m.c @@ -966,7 +966,13 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, create_unimplemented_device("SUNW,sx", hwdef->sx_base, 0x2000); } -nvram = m48t59_init(slavio_irq[0], hwdef->nvram_base, 0, 0x2000, 1968, 8); +dev = qdev_new("sysbus-m48t08"); +qdev_prop_set_int32(dev, "base-year", 1968); +s = SYS_BUS_DEVICE(dev); +sysbus_realize_and_unref(s, &error_fatal); +sysbus_connect_irq(s, 0, slavio_irq[0]); +sysbus_mmio_map(s, 0, hwdef->nvram_base); +nvram = NVRAM(dev); slavio_timer_init_all(hwdef->counter_base, slavio_irq[19], slavio_cpu_irq, smp_cpus); -- 2.20.1
[PATCH 3/5] sun4u: use qdev properties instead of legacy m48t59_init() function
Signed-off-by: Mark Cave-Ayland --- hw/sparc64/sun4u.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c index ad5ca2472a..05e659c8a4 100644 --- a/hw/sparc64/sun4u.c +++ b/hw/sparc64/sun4u.c @@ -671,10 +671,13 @@ static void sun4uv_init(MemoryRegion *address_space_mem, pci_ide_create_devs(pci_dev); /* Map NVRAM into I/O (ebus) space */ -nvram = m48t59_init(NULL, 0, 0, NVRAM_SIZE, 1968, 59); -s = SYS_BUS_DEVICE(nvram); +dev = qdev_new("sysbus-m48t59"); +qdev_prop_set_int32(dev, "base-year", 1968); +s = SYS_BUS_DEVICE(dev); +sysbus_realize_and_unref(s, &error_fatal); memory_region_add_subregion(pci_address_space_io(ebus), 0x2000, sysbus_mmio_get_region(s, 0)); +nvram = NVRAM(dev); initrd_size = 0; initrd_addr = 0; -- 2.20.1
[PATCH 0/5] m48t59: remove legacy init functions
This patchset is inspired by Philippe's "hw/rtc/m48t59: Simplify m48t59_init()" patchset at https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg04493.html but goes further: rather than tidy-up the legacy init functions, convert the callers to use qdev properties directly so they can simply be removed. Signed-off-by: Mark Cave-Ayland Mark Cave-Ayland (5): m48t59-isa: remove legacy m48t59_init_isa() function sun4m: use qdev properties instead of legacy m48t59_init() function sun4u: use qdev properties instead of legacy m48t59_init() function ppc405_boards: use qdev properties instead of legacy m48t59_init() function m48t59: remove legacy m48t59_init() function hw/ppc/ppc405_boards.c | 10 +- hw/rtc/m48t59-isa.c | 25 - hw/rtc/m48t59.c | 35 --- hw/sparc/sun4m.c| 8 +++- hw/sparc64/sun4u.c | 7 +-- include/hw/rtc/m48t59.h | 6 -- 6 files changed, 21 insertions(+), 70 deletions(-) -- 2.20.1
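Patches 2-4 all apply the same shape of conversion; condensed into one sketch (the device name, base year, address and IRQ wiring are placeholders — each board uses its own values):

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "hw/qdev-properties.h"
#include "hw/sysbus.h"

static DeviceState *board_add_nvram(hwaddr base, qemu_irq irq)
{
    DeviceState *dev = qdev_new("sysbus-m48t08");    /* exact model per board */
    SysBusDevice *s = SYS_BUS_DEVICE(dev);

    qdev_prop_set_int32(dev, "base-year", 1968);     /* was an m48t59_init() arg */
    sysbus_realize_and_unref(s, &error_fatal);
    if (irq) {
        sysbus_connect_irq(s, 0, irq);               /* only boards that wire one */
    }
    sysbus_mmio_map(s, 0, base);
    return dev;
}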
[PATCH] goldfish_rtc: re-arm the alarm after migration
After a migration the clock offset is updated, but any pending alarm also needs to be re-armed. Signed-off-by: Laurent Vivier --- hw/rtc/goldfish_rtc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/rtc/goldfish_rtc.c b/hw/rtc/goldfish_rtc.c index 0f4e8185a796..e07ff0164e0c 100644 --- a/hw/rtc/goldfish_rtc.c +++ b/hw/rtc/goldfish_rtc.c @@ -211,6 +211,8 @@ static int goldfish_rtc_post_load(void *opaque, int version_id) qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); s->tick_offset = s->tick_offset_vmstate - delta; +goldfish_rtc_set_alarm(s); + return 0; } -- 2.26.2
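For context, the general shape of a vmstate post_load hook that re-derives host-relative state and re-arms a timer is roughly the following; this is a generic sketch with made-up names, not the goldfish code itself:

#include "qemu/osdep.h"
#include "qemu/timer.h"
#include "migration/vmstate.h"

typedef struct MyRTCState {
    QEMUTimer *timer;
    uint64_t tick_offset;
    uint64_t tick_offset_vmstate;
    uint64_t alarm_ns;               /* hypothetical pending-alarm deadline */
} MyRTCState;

static int my_rtc_post_load(void *opaque, int version_id)
{
    MyRTCState *s = opaque;
    int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);

    /* Recompute the host-relative offset from the migrated value... */
    s->tick_offset = s->tick_offset_vmstate - now;
    /* ...and re-arm anything derived from it, or it stays silent. */
    if (s->alarm_ns) {
        timer_mod(s->timer, s->alarm_ns);
    }
    return 0;
}

static const VMStateDescription vmstate_my_rtc = {
    .name = "my-rtc",
    .version_id = 1,
    .minimum_version_id = 1,
    .post_load = my_rtc_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_UINT64(tick_offset_vmstate, MyRTCState),
        VMSTATE_END_OF_LIST()
    }
};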
Re: HTIF tohost symbol size check always fails
On Fri, Oct 16, 2020 at 7:59 AM Peer Adelt wrote: > > Hi, > > I have a problem with the RISC-V HTIF device. > > Every binary I have compiled for Spike on riscv32 fails with the following > error message: "HTIF tohost must be 8 bytes" > > This happens regardless of which program I have translated for Spike. This is > also the case with the official riscv-compliance tests, for example. > > The query "if (st_size != 8)" in the HTIF device always fails, because > st_size seems to be always 0. > > To be able to reproduce it: > - QEMU GIT Hash: d0ed6a69d399ae193959225cdeaa9382746c91cc (tag "v5.1.0") I just checked with this hash and with the current master and on both I can run a ELF executable on the Spike machine for RV32. > - System: Mac OS 10.14.6 (Darwin Kernel Version 18.7.0) > - Compiler: Latest SiFive Build for GCC under OSX Maybe try using an official toolchain instead of a vendor fork. Alistair > - Command: qemu-system-riscv32 -M spike -nographic -bios none -kernel > > > Best regards, > Peer Adelt
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
On 10/16/20 12:04 PM, Ju Hyung Park wrote: > A small update: > > As per Stefano's suggestion, disabling io_uring support from QEMU from > the configuration step did fix the problem and I'm no longer having > hangs. > > Looks like it __is__ an io_uring issue :( Would be great if you could try 5.4.71 and see if that helps for your issue. -- Jens Axboe
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
A small update: As per Stefano's suggestion, disabling io_uring support from QEMU from the configuration step did fix the problem and I'm no longer having hangs. Looks like it __is__ an io_uring issue :( Btw, I used liburing fe50048 for linking QEMU. Thanks. On Fri, Oct 2, 2020 at 4:35 PM Stefano Garzarella wrote: > > Hi Ju, > > On Thu, Oct 01, 2020 at 11:30:14PM +0900, Ju Hyung Park wrote: > > Hi Stefano, > > > > On Thu, Oct 1, 2020 at 5:59 PM Stefano Garzarella > > wrote: > > > Please, can you share the qemu command line that you are using? > > > This can be useful for the analysis. > > > > Sure. > > Thanks for sharing. > > The issue seems related to io_uring and the new io_uring fd monitoring > implementation available from QEMU 5.0. > > I'll try to reproduce. > > For now, as a workaround, you can rebuild qemu by disabling io-uring support: > > ../configure --disable-linux-io-uring ... > > > Thanks, > Stefano > > > > > QEMU: > > /usr/bin/qemu-system-x86_64 -name guest=win10,debug-threads=on -S > > -object > > secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win10/master-key.aes > > -blockdev > > {"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} > > -blockdev > > {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} > > -blockdev > > {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win10_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} > > -blockdev > > {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} > > -machine > > pc-q35-5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,mem-merge=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format > > -cpu > > Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-ssbd=on,fma=off,avx=off,f16c=off,rdrand=off,bmi1=off,hle=off,avx2=off,bmi2=off,rtm=off,rdseed=off,adx=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-runtime,hv-synic,hv-stimer,hv-reset > > -m 8192 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu/1-win10 > > -overcommit mem-lock=off -smp 4,sockets=1,dies=1,cores=2,threads=2 > > -uuid 7ccc3031-1dab-4267-b72a-d60065b5ff7f -display none > > -no-user-config -nodefaults -chardev > > socket,id=charmonitor,fd=32,server,nowait -mon > > chardev=charmonitor,id=monitor,mode=control -rtc > > base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay > > -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global > > ICH9-LPC.disable_s4=1 -boot menu=off,strict=on -device > > pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 > > -device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 > > -device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 > > -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 > > -device pcie-pci-bridge,id=pci.5,bus=pci.2,addr=0x0 -device > > qemu-xhci,id=usb,bus=pci.1,addr=0x0 -blockdev > > {"driver":"host_device","filename":"/dev/disk/by-partuuid/05c3750b-060f-4703-95ea-6f5e546bf6e9","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"} > > -blockdev > > 
{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","detect-zeroes":"unmap","cache":{"direct":false,"no-flush":true},"driver":"raw","file":"libvirt-1-storage"} > > -device > > virtio-blk-pci,scsi=off,bus=pcie.0,addr=0xa,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on > > -netdev tap,fd=34,id=hostnet0 -device > > e1000,netdev=hostnet0,id=net0,mac=52:54:00:c6:bb:bc,bus=pcie.0,addr=0x3 > > -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x4 -device > > hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device > > vfio-pci,host=:00:02.0,id=hostdev0,bus=pcie.0,addr=0x2,rombar=0 > > -device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x8 -object > > rng-random,id=objrng0,filename=/dev/urandom -device > > virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0x9 -msg > > timestamp=on > > > > And I use libvirt 6.3.0 to manage the VM. Here's an xml of my VM. > > > > > > win10 > > 7ccc3031-1dab-4267-b72a-d60065b5ff7f > > > > > xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0";> > > http://microsoft.com/win/10"/> > > > > > > 8388608 > > 8388608 > > > > > > > > > > 4 > > > > > > > > > > > > > > > > hvm > > > type="pflash">/usr/share/OVMF/OVMF_CODE.fd > > /var/lib/libvirt/qemu/nvram/win10_VARS.fd > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Re: [PATCH v2] hw/riscv: microchip_pfsoc: IOSCBCTRL memmap entry
On Fri, Oct 16, 2020 at 10:10 AM Ivan Griffin wrote: > > Adding the PolarFire SoC IOSCBCTRL memory region to prevent QEMU > reporting a STORE/AMO Access Fault. > > This region is used by the PolarFire SoC port of U-Boot to > interact with the FPGA system controller. > > Signed-off-by: Ivan Griffin Reviewed-by: Alistair Francis Alistair > --- > hw/riscv/microchip_pfsoc.c | 10 ++ > include/hw/riscv/microchip_pfsoc.h | 1 + > 2 files changed, 11 insertions(+) > > diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c > index 4627179cd3..9aaa276ee2 100644 > --- a/hw/riscv/microchip_pfsoc.c > +++ b/hw/riscv/microchip_pfsoc.c > @@ -97,6 +97,7 @@ static const struct MemmapEntry { > [MICROCHIP_PFSOC_GPIO2] = { 0x20122000, 0x1000 }, > [MICROCHIP_PFSOC_ENVM_CFG] ={ 0x2020, 0x1000 }, > [MICROCHIP_PFSOC_ENVM_DATA] = { 0x2022,0x2 }, > +[MICROCHIP_PFSOC_IOSCB_CTRL] = { 0x3702, 0x1000 }, > [MICROCHIP_PFSOC_IOSCB_CFG] = { 0x3708, 0x1000 }, > [MICROCHIP_PFSOC_DRAM] ={ 0x8000,0x0 }, > }; > @@ -341,6 +342,15 @@ static void microchip_pfsoc_soc_realize(DeviceState > *dev, Error **errp) > create_unimplemented_device("microchip.pfsoc.ioscb.cfg", > memmap[MICROCHIP_PFSOC_IOSCB_CFG].base, > memmap[MICROCHIP_PFSOC_IOSCB_CFG].size); > + > +/* IOSCBCTRL > + * > + * These registers are not documented in the official documentation > + * but used by the polarfire-soc-bare-meta-library code > + */ > +create_unimplemented_device("microchip.pfsoc.ioscb.ctrl", > +memmap[MICROCHIP_PFSOC_IOSCB_CTRL].base, > +memmap[MICROCHIP_PFSOC_IOSCB_CTRL].size); > } > > static void microchip_pfsoc_soc_class_init(ObjectClass *oc, void *data) > diff --git a/include/hw/riscv/microchip_pfsoc.h > b/include/hw/riscv/microchip_pfsoc.h > index 8bfc7e1a85..3f1874b162 100644 > --- a/include/hw/riscv/microchip_pfsoc.h > +++ b/include/hw/riscv/microchip_pfsoc.h > @@ -95,6 +95,7 @@ enum { > MICROCHIP_PFSOC_ENVM_CFG, > MICROCHIP_PFSOC_ENVM_DATA, > MICROCHIP_PFSOC_IOSCB_CFG, > +MICROCHIP_PFSOC_IOSCB_CTRL, > MICROCHIP_PFSOC_DRAM, > }; > > -- > 2.17.1 >
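For readers who have not met the helper: create_unimplemented_device() maps a background MMIO region whose reads return zero and whose writes are discarded, logging accesses under LOG_UNIMP instead of letting the guest take the Store/AMO access fault. In minimal form (the wrapper name is made up):

#include "qemu/osdep.h"
#include "hw/misc/unimp.h"

/* Stub an MMIO window so guest accesses are logged rather than faulting. */
static void add_ioscb_ctrl_stub(hwaddr base, hwaddr size)
{
    create_unimplemented_device("microchip.pfsoc.ioscb.ctrl", base, size);
}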
Re: [RFC PATCH v3] target/mips: Increase number of TLB entries on the 34Kf core (16 -> 64)
On 10/16/20 6:33 AM, Philippe Mathieu-Daudé wrote: > Per "MIPS32 34K Processor Core Family Software User's Manual, > Revision 01.13" page 8 in "Joint TLB (JTLB)" section: > > "The JTLB is a fully associative TLB cache containing 16, 32, >or 64-dual-entries mapping up to 128 virtual pages to their >corresponding physical addresses." > > There is no particular reason to restrict the 34Kf core model to > 16 TLB entries, so raise its config to 64. > > This is helpful for other projects, in particular the Yocto Project: > > Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit > MIPS CI loop. It was observed that in this case CI test execution > time was almost twice longer than 64bit MIPS variant that runs > under MIPS64R2-generic model. It was investigated and concluded > that the difference in number of TLBs 16 in 34Kf case vs 64 in > MIPS64R2-generic is responsible for most of CI real time execution > difference. Because with 16 TLBs linux user-land trashes TLB more > and it needs to execute more instructions in TLB refill handler > calls, as result it runs much longer. > > (https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html) > > Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992 > Reported-by: Victor Kamensky > Signed-off-by: Philippe Mathieu-Daudé > --- > v3: KISS > Supersedes: <20201015224746.540027-1-f4...@amsat.org> > Signed-off-by: Philippe Mathieu-Daudé > --- > target/mips/translate_init.c.inc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Richard Henderson r~
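Given the one-line diffstat, the change is presumably just a wider value in the 34Kf's Config1 template; the encoding that turns 16 and 64 entries into 15 and 63 is the MMUSize-1 field. A small sketch of that relationship (the CP0C1_MMU shift matches target/mips, the helper itself is illustrative):

#include <stdint.h>

#define CP0C1_MMU 25   /* Config1.MMUSize-1 field, bits 30:25 */

/* Encode "n TLB entries" the way the CPU templates do: the field holds n - 1. */
static inline uint32_t config1_mmu_bits(unsigned n_tlb_entries)
{
    return (uint32_t)(n_tlb_entries - 1) << CP0C1_MMU;
}

/* 34Kf before: config1_mmu_bits(16) == (15 << CP0C1_MMU)
 * 34Kf after:  config1_mmu_bits(64) == (63 << CP0C1_MMU) */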
Re: [PATCH v3 00/15] raspi: add the bcm2835 cprman clock manager
On 10/10/20 3:57 PM, Luc Michel wrote: [...] Hi, This series adds the BCM2835 CPRMAN clock manager peripheral to the Raspberry Pi machine. Series: Tested-by: Philippe Mathieu-Daudé
[PATCH v3 11/13] block/qcow2: read_cache_sizes: return status value
It's better to return status together with setting errp. It allows to reduce error propagation. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Alberto Garcia --- block/qcow2.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index c4b86df7c0..2b6ec4b757 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -869,7 +869,7 @@ static void qcow2_attach_aio_context(BlockDriverState *bs, cache_clean_timer_init(bs, new_context); } -static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, +static bool read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, uint64_t *l2_cache_size, uint64_t *l2_cache_entry_size, uint64_t *refcount_cache_size, Error **errp) @@ -907,16 +907,16 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE " and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set " "at the same time"); -return; +return false; } else if (l2_cache_size_set && (l2_cache_max_setting > combined_cache_size)) { error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed " QCOW2_OPT_CACHE_SIZE); -return; +return false; } else if (*refcount_cache_size > combined_cache_size) { error_setg(errp, QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not exceed " QCOW2_OPT_CACHE_SIZE); -return; +return false; } if (l2_cache_size_set) { @@ -955,8 +955,10 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, error_setg(errp, "L2 cache entry size must be a power of two " "between %d and the cluster size (%d)", 1 << MIN_CLUSTER_BITS, s->cluster_size); -return; +return false; } + +return true; } typedef struct Qcow2ReopenState { @@ -983,7 +985,6 @@ static int qcow2_update_options_prepare(BlockDriverState *bs, int i; const char *encryptfmt; QDict *encryptopts = NULL; -Error *local_err = NULL; int ret; qdict_extract_subqdict(options, &encryptopts, "encrypt."); @@ -996,10 +997,8 @@ static int qcow2_update_options_prepare(BlockDriverState *bs, } /* get L2 table/refcount block cache size from command line options */ -read_cache_sizes(bs, opts, &l2_cache_size, &l2_cache_entry_size, - &refcount_cache_size, &local_err); -if (local_err) { -error_propagate(errp, local_err); +if (!read_cache_sizes(bs, opts, &l2_cache_size, &l2_cache_entry_size, + &refcount_cache_size, errp)) { ret = -EINVAL; goto fail; } -- 2.21.3
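The calling convention being adopted here (and across the rest of the series) is the one recommended in include/qapi/error.h: the bool result mirrors whether *errp was set, so callers can test the return value and drop the local_err/error_propagate() dance. A minimal sketch with made-up names:

#include "qemu/osdep.h"
#include "qapi/error.h"

/* Return true on success; on failure set *errp and return false. */
static bool check_cache_limit(uint64_t value, uint64_t limit, Error **errp)
{
    if (value > limit) {
        error_setg(errp, "cache size %" PRIu64 " may not exceed %" PRIu64,
                   value, limit);
        return false;
    }
    return true;
}

static int apply_options(uint64_t value, Error **errp)
{
    if (!check_cache_limit(value, 1024 * 1024, errp)) {
        return -EINVAL;   /* no local Error, no error_propagate() */
    }
    return 0;
}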
[PATCH v3 12/13] block/qcow2: simplify qcow2_co_invalidate_cache()
qcow2_do_open correctly sets errp on each failure path. So, we can simplify code in qcow2_co_invalidate_cache() and drop explicit error propagation. Add ERRP_GUARD() as mandated by the documentation in include/qapi/error.h so that error_prepend() is actually called even if errp is &error_fatal. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Alberto Garcia Reviewed-by: Greg Kurz --- block/qcow2.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 2b6ec4b757..cd5f48d3fb 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -2702,11 +2702,11 @@ static void qcow2_close(BlockDriverState *bs) static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs, Error **errp) { +ERRP_GUARD(); BDRVQcow2State *s = bs->opaque; int flags = s->flags; QCryptoBlock *crypto = NULL; QDict *options; -Error *local_err = NULL; int ret; /* @@ -2724,16 +2724,11 @@ static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs, flags &= ~BDRV_O_INACTIVE; qemu_co_mutex_lock(&s->lock); -ret = qcow2_do_open(bs, options, flags, &local_err); +ret = qcow2_do_open(bs, options, flags, errp); qemu_co_mutex_unlock(&s->lock); qobject_unref(options); -if (local_err) { -error_propagate_prepend(errp, local_err, -"Could not reopen qcow2 layer: "); -bs->drv = NULL; -return; -} else if (ret < 0) { -error_setg_errno(errp, -ret, "Could not reopen qcow2 layer"); +if (ret < 0) { +error_prepend(errp, "Could not reopen qcow2 layer: "); bs->drv = NULL; return; } -- 2.21.3
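A stand-alone sketch of why ERRP_GUARD() is needed: it lets the function dereference *errp and call error_prepend(errp, ...) even when the caller passed &error_fatal or NULL. Names are illustrative; do_open() merely stands in for qcow2_do_open():

#include "qemu/osdep.h"
#include "qapi/error.h"

static int do_open(Error **errp)
{
    error_setg(errp, "header unreadable");   /* this sketch always fails */
    return -EIO;
}

static int invalidate_cache(Error **errp)
{
    ERRP_GUARD();   /* makes *errp safe to inspect and prepend to */
    int ret;

    ret = do_open(errp);
    if (ret < 0) {
        error_prepend(errp, "Could not reopen qcow2 layer: ");
        return ret;
    }
    return 0;
}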
[PATCH v3 13/13] block/qed: bdrv_qed_do_open: deal with errp
Set errp always on failure. Generic bdrv_open_driver supports driver functions which can return negative value and forget to set errp. That's a strange thing.. Let's improve bdrv_qed_do_open to not behave this way. This allows to simplify code in bdrv_qed_co_invalidate_cache(). Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Alberto Garcia Reviewed-by: Greg Kurz --- block/qed.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/block/qed.c b/block/qed.c index b27e7546ca..f45c640513 100644 --- a/block/qed.c +++ b/block/qed.c @@ -393,6 +393,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header)); if (ret < 0) { +error_setg(errp, "Failed to read QED header"); return ret; } qed_header_le_to_cpu(&le_header, &s->header); @@ -408,25 +409,30 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, return -ENOTSUP; } if (!qed_is_cluster_size_valid(s->header.cluster_size)) { +error_setg(errp, "QED cluster size is invalid"); return -EINVAL; } /* Round down file size to the last cluster */ file_size = bdrv_getlength(bs->file->bs); if (file_size < 0) { +error_setg(errp, "Failed to get file length"); return file_size; } s->file_size = qed_start_of_cluster(s, file_size); if (!qed_is_table_size_valid(s->header.table_size)) { +error_setg(errp, "QED table size is invalid"); return -EINVAL; } if (!qed_is_image_size_valid(s->header.image_size, s->header.cluster_size, s->header.table_size)) { +error_setg(errp, "QED image size is invalid"); return -EINVAL; } if (!qed_check_table_offset(s, s->header.l1_table_offset)) { +error_setg(errp, "QED table offset is invalid"); return -EINVAL; } @@ -438,6 +444,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, /* Header size calculation must not overflow uint32_t */ if (s->header.header_size > UINT32_MAX / s->header.cluster_size) { +error_setg(errp, "QED header size is too large"); return -EINVAL; } @@ -445,6 +452,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, if ((uint64_t)s->header.backing_filename_offset + s->header.backing_filename_size > s->header.cluster_size * s->header.header_size) { +error_setg(errp, "QED backing filename offset is invalid"); return -EINVAL; } @@ -453,6 +461,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, bs->auto_backing_file, sizeof(bs->auto_backing_file)); if (ret < 0) { +error_setg(errp, "Failed to read backing filename"); return ret; } pstrcpy(bs->backing_file, sizeof(bs->backing_file), @@ -475,6 +484,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, ret = qed_write_header_sync(s); if (ret) { +error_setg(errp, "Failed to update header"); return ret; } @@ -487,6 +497,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, ret = qed_read_l1_table_sync(s); if (ret) { +error_setg(errp, "Failed to read L1 table"); goto out; } @@ -503,6 +514,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, ret = qed_check(s, &result, true); if (ret) { +error_setg(errp, "Image corrupted"); goto out; } } @@ -1537,22 +1549,16 @@ static void coroutine_fn bdrv_qed_co_invalidate_cache(BlockDriverState *bs, Error **errp) { BDRVQEDState *s = bs->opaque; -Error *local_err = NULL; int ret; bdrv_qed_close(bs); bdrv_qed_init_state(bs); qemu_co_mutex_lock(&s->table_lock); -ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, &local_err); 
+ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, errp); qemu_co_mutex_unlock(&s->table_lock); -if (local_err) { -error_propagate_prepend(errp, local_err, -"Could not reopen qed layer: "); -return; -} else if (ret < 0) { -error_setg_errno(errp, -ret, "Could not reopen qed layer"); -return; +if (ret < 0) { +error_prepend(errp, "Could not reopen qed layer: "); } } -- 2.21.3
[PATCH v3 09/13] block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface
It's recommended for bool functions with errp to return true on success and false on failure. Non-standard interfaces don't help to understand the code. The change is also needed to reduce error propagation. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Alberto Garcia Reviewed-by: Greg Kurz --- block/qcow2.h| 3 ++- block/qcow2-bitmap.c | 26 +++--- block/qcow2.c| 6 ++ 3 files changed, 19 insertions(+), 16 deletions(-) diff --git a/block/qcow2.h b/block/qcow2.h index 024901a6ca..4c994739ed 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -975,7 +975,8 @@ void qcow2_cache_discard(Qcow2Cache *c, void *table); int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res, void **refcount_table, int64_t *refcount_table_size); -bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp); +bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated, + Error **errp); bool qcow2_get_bitmap_info_list(BlockDriverState *bs, Qcow2BitmapInfoList **info_list, Error **errp); int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp); diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c index 55cd52ab96..ca01f08bac 100644 --- a/block/qcow2-bitmap.c +++ b/block/qcow2-bitmap.c @@ -962,25 +962,27 @@ static void set_readonly_helper(gpointer bitmap, gpointer value) bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value); } -/* qcow2_load_dirty_bitmaps() - * Return value is a hint for caller: true means that the Qcow2 header was - * updated. (false doesn't mean that the header should be updated by the - * caller, it just means that updating was not needed or the image cannot be - * written to). - * On failure the function returns false. +/* + * Return true on success, false on failure. + * If header_updated is not NULL then it is set appropriately regardless of + * the return value. */ -bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp) +bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated, + Error **errp) { BDRVQcow2State *s = bs->opaque; Qcow2BitmapList *bm_list; Qcow2Bitmap *bm; GSList *created_dirty_bitmaps = NULL; -bool header_updated = false; bool needs_update = false; +if (header_updated) { +*header_updated = false; +} + if (s->nb_bitmaps == 0) { /* No bitmaps - nothing to do */ -return false; +return true; } bm_list = bitmap_list_load(bs, s->bitmap_directory_offset, @@ -1036,7 +1038,9 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp) error_setg_errno(errp, -ret, "Can't update bitmap directory"); goto fail; } -header_updated = true; +if (header_updated) { +*header_updated = true; +} } if (!can_write(bs)) { @@ -1047,7 +1051,7 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp) g_slist_free(created_dirty_bitmaps); bitmap_list_free(bm_list); -return header_updated; +return true; fail: g_slist_foreach(created_dirty_bitmaps, release_dirty_bitmap_helper, bs); diff --git a/block/qcow2.c b/block/qcow2.c index 8c89c98978..c4b86df7c0 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1297,7 +1297,6 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, unsigned int len, i; int ret = 0; QCowHeader header; -Error *local_err = NULL; uint64_t ext_end; uint64_t l1_vm_state_index; bool update_header = false; @@ -1785,9 +1784,8 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) { /* It's case 1, 2 or 3.2. Or 3.1 which is BUG in management layer. 
*/ -bool header_updated = qcow2_load_dirty_bitmaps(bs, &local_err); -if (local_err != NULL) { -error_propagate(errp, local_err); +bool header_updated; +if (!qcow2_load_dirty_bitmaps(bs, &header_updated, errp)) { ret = -EINVAL; goto fail; } -- 2.21.3
Re: [RFC PATCH 0/3] target/mips: Make the number of TLB entries a CPU property
On 10/15/20 11:56 AM, Victor Kamensky (kamensky) via wrote: > Is it possible to come back to the 34Kf route, doing a very small, localized, well-defined change of bumping the number of TLBs for a model that we know works well for us? Yes, thanks for testing. I think we should also add a property to enable Config3.PM for any cpu, and see how that gets on. But simply adjusting the number of TLB entries is a good start, and is the only thing that will work for older kernels. r~
[PATCH v3 06/13] block/mirror: drop extra error propagation in commit_active_start()
Let's check return value of mirror_start_job to check for failure instead of local_err. Rename ret to job, as ret is usually integer variable. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Alberto Garcia --- block/mirror.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index b3778248b8..f7c624d6a9 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1851,8 +1851,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs, bool auto_complete, Error **errp) { bool base_read_only; -Error *local_err = NULL; -BlockJob *ret; +BlockJob *job; base_read_only = bdrv_is_read_only(base); @@ -1862,19 +1861,18 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs, } } -ret = mirror_start_job( +job = mirror_start_job( job_id, bs, creation_flags, base, NULL, speed, 0, 0, MIRROR_LEAVE_BACKING_CHAIN, false, on_error, on_error, true, cb, opaque, &commit_active_job_driver, false, base, auto_complete, filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND, - &local_err); -if (local_err) { -error_propagate(errp, local_err); + errp); +if (!job) { goto error_restore_flags; } -return ret; +return job; error_restore_flags: /* ignore error and errp for bdrv_reopen, because we want to propagate -- 2.21.3
[PATCH v3 10/13] block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps
It's better to return status together with setting errp. It makes possible to avoid error propagation. While being here, put ERRP_GUARD() to fix error_prepend(errp, ...) usage inside qcow2_store_persistent_dirty_bitmaps() (see the comment above ERRP_GUARD() definition in include/qapi/error.h) Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Alberto Garcia --- block/qcow2.h| 2 +- block/qcow2-bitmap.c | 13 ++--- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/block/qcow2.h b/block/qcow2.h index 4c994739ed..467cfd4779 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -981,7 +981,7 @@ bool qcow2_get_bitmap_info_list(BlockDriverState *bs, Qcow2BitmapInfoList **info_list, Error **errp); int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp); int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp); -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, +bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, bool release_stored, Error **errp); int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp); bool qcow2_co_can_store_new_dirty_bitmap(BlockDriverState *bs, diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c index ca01f08bac..9eccaab7f8 100644 --- a/block/qcow2-bitmap.c +++ b/block/qcow2-bitmap.c @@ -1536,9 +1536,10 @@ out: * readonly to begin with, and whether we opened directly or reopened to that * state shouldn't matter for the state we get afterward. */ -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, +bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, bool release_stored, Error **errp) { +ERRP_GUARD(); BdrvDirtyBitmap *bitmap; BDRVQcow2State *s = bs->opaque; uint32_t new_nb_bitmaps = s->nb_bitmaps; @@ -1558,7 +1559,7 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, bm_list = bitmap_list_load(bs, s->bitmap_directory_offset, s->bitmap_directory_size, errp); if (bm_list == NULL) { -return; +return false; } } @@ -1673,7 +1674,7 @@ success: } bitmap_list_free(bm_list); -return; +return true; fail: QSIMPLEQ_FOREACH(bm, bm_list, entry) { @@ -1691,16 +1692,14 @@ fail: } bitmap_list_free(bm_list); +return false; } int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp) { BdrvDirtyBitmap *bitmap; -Error *local_err = NULL; -qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err); -if (local_err != NULL) { -error_propagate(errp, local_err); +if (!qcow2_store_persistent_dirty_bitmaps(bs, false, errp)) { return -EINVAL; } -- 2.21.3
Re: [PATCH v2 2/5] [RISCV_PM] Support CSRs required for RISC-V PM extension except for ones in hypervisor mode
On 10/15/20 11:05 AM, Alexey Baturo wrote: > Meanwhile, do you think applying MTE masks while reading CSR values is a good solution for now? Yes. r~
[PATCH v3 05/13] block: drop extra error propagation for bdrv_set_backing_hd
bdrv_set_backing_hd now returns status, let's use it. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Alberto Garcia --- block.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/block.c b/block.c index 7b6818c681..a35dc80dd4 100644 --- a/block.c +++ b/block.c @@ -3016,11 +3016,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options, /* Hook up the backing file link; drop our reference, bs owns the * backing_hd reference now */ -bdrv_set_backing_hd(bs, backing_hd, &local_err); +ret = bdrv_set_backing_hd(bs, backing_hd, errp); bdrv_unref(backing_hd); -if (local_err) { -error_propagate(errp, local_err); -ret = -EINVAL; +if (ret < 0) { goto free_exit; } -- 2.21.3