Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 8:44 AM Willy Tarreau wrote: > On Sat, Oct 17, 2020 at 07:52:48AM +0200, Jann Horn wrote: > > On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau wrote: > > > On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote: > > > > Microsoft's documentation > > > > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM > > > > Generation ID that we get after a fork "is a 128-bit, > > > > cryptographically random integer value". If multiple people use the > > > > same image, it guarantees that each use of the image gets its own, > > > > fresh ID: > > > > > > No. It cannot be more unique than the source that feeds that cryptographic > > > transformation. All it guarantees is that the entropy source is protected > > > from being guessed based on the output. Applying cryptography on a simple > > > counter provides apparently random numbers that will be unique for a long > > > period for the same source, but as soon as you duplicate that code between > > > users and they start from the same counter they'll get the same IDs. > > > > > > This is why I think that using a counter is better if you really need > > > something > > > unique. Randoms only reduce predictability which helps avoiding > > > collisions. > > > > Microsoft's spec tells us that they're giving us cryptographically > > random numbers. Where they're getting those from is not our problem. > > (And if even the hypervisor is not able to collect enough entropy to > > securely generate random numbers, worrying about RNG reseeding in the > > guest would be kinda pointless, we'd be fairly screwed anyway.) > > Sorry if I sound annoying, but it's a matter of terminology and needs. > > Cryptograhically random means safe for use with cryptography in that it > is unguessable enough so that you can use it for encryption keys that > nobody will be able to guess. It in no ways guarantees uniqueness, just > like you don't really care if the symmetric crypto key of you VPN has > already been used once somewhere else as long as there's no way to know. > However with the good enough distribution that a CSPRNG provides, > collisions within a *same* generator are bound to a very low, predictable > rate which is by generally considered as acceptable for all use cases. Yes. > Something random (cryptographically or not) *cannot* be unique by > definition, otherwise it's not random anymore, since each draw has an > influence on the remaining list of possible draws, which is contrary to > randomness. And conversely something unique cannot be completely random > because if you know it's unique, you can already rule out all other known > values from the candidates, thus it's more predictable than random. Yes. > With this in mind, picking randoms from a same RNG is often highly > sufficient to consider they're highly likely unique within a long > period. But it's not a guarantee. And it's even less one between two > RNGs (e.g. if uniqueness is required between multiple hypervisors in > case VMs are migrated or centrally managed, which I don't know). > > If what is sought here is a strong guarantee of uniqueness, using a > counter as you first suggested is better. My suggestion is to use a counter *in the UAPI*, not in the hypervisor protocol. (And as long as that counter can only miss increments in a cryptographically negligible fraction of cases, everything's fine.) > If what is sought is pure > randomness (in the sense that it's unpredictable, which I don't think > is needed here), then randoms are better. 
And this is what *the hypervisor protocol* gives us (which could be very useful for reseeding the kernel RNG). > If both are required, just > concatenate a counter and a random. And if you need them to be spatially > unique, just include a node identifier. > > Now the initial needs in the forwarded message are not entirely clear > to me but I wanted to rule out the apparent mismatch between the expressed > needs for uniqueness and the proposed solutions solely based on randomness. Sure, from a theoretical standpoint, it would be a little bit nicer if the hypervisor protocol included a generation number along with the 128-bit random value. But AFAIU it doesn't, so if we want this to just work under Microsoft's existing hypervisor, we'll have to make do with checking whether the random value changed. :P
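For illustration, the "counter *in the UAPI*" idea above boils down to a very cheap check in userspace. Below is a minimal sketch, assuming a hypothetical /dev/vmgenid node that maps a page whose first 8 bytes are a kernel-maintained 64-bit generation counter — this is the interface being proposed in the thread, not what the posted driver exposes (the posted driver maps the raw UUID):

/*
 * Minimal sketch of the "counter in the UAPI" idea discussed above.
 * Assumptions (not part of the posted driver): the kernel exposes a
 * read-only page whose first 8 bytes are a monotonically increasing
 * VM-generation counter, reachable via a hypothetical /dev/vmgenid.
 */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static const volatile uint64_t *vm_generation; /* mapped kernel page */
static uint64_t last_seen_generation;

static int vmgen_init(void)
{
    int fd = open("/dev/vmgenid", O_RDONLY);   /* hypothetical node */
    if (fd < 0)
        return -1;
    vm_generation = mmap(NULL, sizeof(uint64_t), PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (vm_generation == MAP_FAILED)
        return -1;
    last_seen_generation = *vm_generation;
    return 0;
}

/* Hot path: one cached 64-bit load and compare per call. */
static void rng_generate(unsigned char *out, size_t len)
{
    uint64_t g = *vm_generation;

    if (g != last_seen_generation) {
        /* VM was cloned/forked: throw away old state and reseed. */
        last_seen_generation = g;
        /* reseed_from_kernel();  e.g. via getrandom(2) */
    }
    /* ...produce len bytes from the (now safe) local RNG state... */
    (void)out; (void)len;
}

The hot path is a single aligned load and compare, which is what makes the counter variant attractive over comparing a full 128-bit UUID.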
Re: [PATCH v2 2/3] grackle: use qdev gpios for PCI IRQs
On Fri, Oct 16, 2020 at 07:53:10AM +0100, Mark Cave-Ayland wrote: > On 16/10/2020 07:45, Howard Spoelstra wrote: > > > Hi, > > > > I see compilation of the current ppc-for-5.2 branch fail with: > > > > ../hw/pci-host/grackle.c: In function ‘grackle_realize’: > > ../hw/pci-host/grackle.c:68:11: error: ‘GrackleState’ has no member named > > ‘pic’ > > 68 | if (!s->pic) { > > | ^~ > > make: *** [Makefile.ninja:1741: libcommon.fa.p/hw_pci-host_grackle.c.o] > > Error 1 > > > > Best, > > Howard > > I see - as per the cover letter, my series is a replacement for Phil's > original patch at > https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg02988.html (the PIC > link is now completely removed), so the solution here is to drop patch > 7daac97 "hw/pci-host/grackle: Verify PIC link is properly set". Ok, I've removed that from my ppc-for-5.2 tree. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH v2 1/3] macio: don't reference serial_hd() directly within the device
On Fri, Oct 16, 2020 at 08:00:06AM +0100, Mark Cave-Ayland wrote: > On 16/10/2020 01:16, David Gibson wrote: > > > On Tue, Oct 13, 2020 at 12:49:20PM +0100, Mark Cave-Ayland wrote: > > > Instead use qdev_prop_set_chr() to configure the ESCC serial chardevs at > > > the > > > Mac Old World and New World machine level. > > > > > > Also remove the now obsolete comment referring to the use of serial_hd() > > > and > > > the setting of user_creatable to false accordingly. > > > > > > Signed-off-by: Mark Cave-Ayland > > > > Applied to ppc-for-5.2, thanks. > > Ah okay, I was planning to send a separate qemu-macppc pull request myself > with these patches plus some cherry-picks from Zoltan's patchset after some > more testing. Does this mean you would prefer to take the patches directly > yourself? Not really. I sent a PR recently, so I probably won't do another for a little while. So go ahead and send yours, and mine should rebase easily enough. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 07:52:48AM +0200, Jann Horn wrote:
> On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau wrote:
> > On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote:
> > > Microsoft's documentation
> > > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM
> > > Generation ID that we get after a fork "is a 128-bit,
> > > cryptographically random integer value". If multiple people use the
> > > same image, it guarantees that each use of the image gets its own,
> > > fresh ID:
> >
> > No. It cannot be more unique than the source that feeds that cryptographic
> > transformation. All it guarantees is that the entropy source is protected
> > from being guessed based on the output. Applying cryptography on a simple
> > counter provides apparently random numbers that will be unique for a long
> > period for the same source, but as soon as you duplicate that code between
> > users and they start from the same counter they'll get the same IDs.
> >
> > This is why I think that using a counter is better if you really need
> > something unique. Randoms only reduce predictability which helps avoiding
> > collisions.
>
> Microsoft's spec tells us that they're giving us cryptographically
> random numbers. Where they're getting those from is not our problem.
> (And if even the hypervisor is not able to collect enough entropy to
> securely generate random numbers, worrying about RNG reseeding in the
> guest would be kinda pointless, we'd be fairly screwed anyway.)

Sorry if I sound annoying, but it's a matter of terminology and needs.

Cryptographically random means safe for use with cryptography in that it
is unguessable enough so that you can use it for encryption keys that
nobody will be able to guess. It in no way guarantees uniqueness, just
like you don't really care if the symmetric crypto key of your VPN has
already been used once somewhere else as long as there's no way to know.
However, with the good enough distribution that a CSPRNG provides,
collisions within a *same* generator are bound to a very low, predictable
rate which is generally considered acceptable for all use cases.

Something random (cryptographically or not) *cannot* be unique by
definition, otherwise it's not random anymore, since each draw has an
influence on the remaining list of possible draws, which is contrary to
randomness. And conversely something unique cannot be completely random,
because if you know it's unique, you can already rule out all other known
values from the candidates, thus it's more predictable than random.

With this in mind, picking randoms from a same RNG is often highly
sufficient to consider they're highly likely unique within a long
period. But it's not a guarantee. And it's even less of one between two
RNGs (e.g. if uniqueness is required between multiple hypervisors in
case VMs are migrated or centrally managed, which I don't know).

If what is sought here is a strong guarantee of uniqueness, using a
counter as you first suggested is better. If what is sought is pure
randomness (in the sense that it's unpredictable, which I don't think
is needed here), then randoms are better. If both are required, just
concatenate a counter and a random. And if you need them to be spatially
unique, just include a node identifier.

Now the initial needs in the forwarded message are not entirely clear
to me, but I wanted to rule out the apparent mismatch between the
expressed needs for uniqueness and the proposed solutions solely based
on randomness.

Cheers,
Willy
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote: > Microsoft's documentation > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM > Generation ID that we get after a fork "is a 128-bit, > cryptographically random integer value". If multiple people use the > same image, it guarantees that each use of the image gets its own, > fresh ID: No. It cannot be more unique than the source that feeds that cryptographic transformation. All it guarantees is that the entropy source is protected from being guessed based on the output. Applying cryptography on a simple counter provides apparently random numbers that will be unique for a long period for the same source, but as soon as you duplicate that code between users and they start from the same counter they'll get the same IDs. This is why I think that using a counter is better if you really need something unique. Randoms only reduce predictability which helps avoiding collisions. And I'm saying this as someone who had on his external gateway the same SSH host key as 89 other hosts on the net, each of them using randoms to provide a universally unique one... Willy
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On 16 Oct 2020, at 21:02, Jann Horn wrote:
> On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau wrote:
> But in userspace, we just need a simple counter. There's no need for
> us to worry about anything else, like timestamps or whatever. If we
> repeatedly fork a paused VM, the forked VMs will see the same counter
> value, but that's totally fine, because the only thing that matters to
> userspace is that the counter changes when the VM is forked.

For user-space, even a single bit would do. We added MADV_WIPEONFORK
so that userspace libraries can detect fork()/clone() robustly, for the
same reasons. It just wipes a page as the indicator, which is effectively
a single-bit signal, and it works well. On the user-space side of this,
I'm keen to find a solution like that that we can use fairly easily inside
of portable libraries and applications. The "have I forked" checks do end
up in hot paths, so it's nice if they can be CPU cache friendly. Comparing
a whole 128-bit value wouldn't be my favorite.

> And actually, since the value is a cryptographically random 128-bit
> value, I think that we should definitely use it to help reseed the
> kernel's RNG, and keep it secret from userspace. That way, even if the
> VM image is public, we can ensure that going forward, the kernel RNG
> will return securely random data.

If the image is public, you need some extra new raw entropy from
somewhere. The gen-id could be mixed in, that can't do any harm as long
as rigorous cryptographic mixing with the prior state is used, but if
that's all you do then the final state is still deterministic and
non-secret. The kernel would need to use the change as a trigger to
measure some entropy (e.g. interrupts and RDRAND, or whatever). Or just
define the machine contract as "this has to be unique random data and
if it's not unique, or if it's public, you're toast".

- Colm
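For reference, the MADV_WIPEONFORK mechanism mentioned above already works today (Linux 4.14+) and shows the general shape of a cheap "have I forked" check; it only detects process fork()/clone(), not VM forks, which is the gap this thread is about. A minimal sketch:

/*
 * Sketch of fork()/clone() detection with MADV_WIPEONFORK (Linux 4.14+).
 * The kernel zero-fills this private anonymous mapping in the child,
 * so a zero sentinel means "we are running in a fresh child process".
 * This does NOT detect VM forks, which is what the vmgenid thread adds.
 */
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *fork_sentinel;

static int fork_detect_init(void)
{
    long page = sysconf(_SC_PAGESIZE);

    fork_sentinel = mmap(NULL, page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (fork_sentinel == MAP_FAILED)
        return -1;
    if (madvise(fork_sentinel, page, MADV_WIPEONFORK) != 0)
        return -1;
    *fork_sentinel = 1;               /* nonzero marks "initialized" */
    return 0;
}

/* Hot path: a single byte load. */
static int have_forked(void)
{
    if (*fork_sentinel == 0) {        /* wiped => we are a new child  */
        *fork_sentinel = 1;
        return 1;                     /* caller must reseed its RNG   */
    }
    return 0;
}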
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 03:40:08AM +0200, Jann Horn wrote: > [adding some more people who are interested in RNG stuff: Andy, Jason, > Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this > concerns some pretty fundamental API stuff related to RNG usage] > > On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin > wrote: > > This patch is a driver which exposes the Virtual Machine Generation ID > > via a char-dev FS interface that provides ID update sync and async > > notification, retrieval and confirmation mechanisms: > > > > When the device is 'open()'ed a copy of the current vm UUID is > > associated with the file handle. 'read()' operations block until the > > associated UUID is no longer up to date - until HW vm gen id changes - > > at which point the new UUID is provided/returned. Nonblocking 'read()' > > uses EWOULDBLOCK to signal that there is no _new_ UUID available. > > > > 'poll()' is implemented to allow polling for UUID updates. Such > > updates result in 'EPOLLIN' events. > > > > Subsequent read()s following a UUID update no longer block, but return > > the updated UUID. The application needs to acknowledge the UUID update > > by confirming it through a 'write()'. > > Only on writing back to the driver the right/latest UUID, will the > > driver mark this "watcher" as up to date and remove EPOLLIN status. > > > > 'mmap()' support allows mapping a single read-only shared page which > > will always contain the latest UUID value at offset 0. > > It would be nicer if that page just contained an incrementing counter, > instead of a UUID. It's not like the application cares *what* the UUID > changed to, just that it *did* change and all RNGs state now needs to > be reseeded from the kernel, right? And an application can't reliably > read the entire UUID from the memory mapping anyway, because the VM > might be forked in the middle. > > So I think your kernel driver should detect UUID changes and then turn > those into a monotonically incrementing counter. (Probably 64 bits > wide?) (That's probably also a little bit faster than comparing an > entire UUID.) I agree with this. Further, I'm observing there is a very common confusion between "universally unique" and "random". Randoms are needed when seeking unpredictability. A random number generator *must* be able to return the same value multiple times in a row (though this is rare), otherwise it's not random. To illustrate this, a die has less than 3 bits of randomness and is sufficient to play games with friends where a counter would allow everyone to cheat. Conversely if you want to assign IDs to members of your family you'd rather use a counter than a die for this or you risk collisions and/or long series of retries to pick unique IDs. RFC4122 explains in great length how to produce guaranteed unique IDs, and this only involves space, time and counters. There's indeed a lazy variant that probably everyone uses nowadays, consisting in picking random numbers, but this is not guaranteed unique anymore. If the UUIDs used there are real UUIDs, it could be as simple as updating them according to their format, i.e. updating the timestamp, and if the timestamp is already the same, just increase the seq counter. Doing this doesn't require entropy, doesn't need to block and doesn't needlessly leak randoms that sometimes make people feel nervous. Just my two cents, Willy
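For completeness, a consumer of the char-dev interface described in the quoted patch would look roughly like the sketch below; the /dev/vmgenid path and the 16-byte read/write size are assumptions taken from the description above, not a merged UAPI:

/*
 * Sketch of a consumer of the char-dev interface described in the patch:
 * poll() for EPOLLIN, read() the new UUID, write() it back to acknowledge.
 * Device path and UUID size are assumptions based on the cover text.
 */
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int main(void)
{
    unsigned char uuid[16];
    int fd = open("/dev/vmgenid", O_RDWR);     /* hypothetical node */

    if (fd < 0)
        return 1;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        if (poll(&pfd, 1, -1) < 0)              /* wait for a VM fork */
            break;
        if (read(fd, uuid, sizeof(uuid)) != sizeof(uuid))
            break;

        /* ...reseed userspace RNG state, drop cached secrets here... */

        /* Acknowledge by writing the UUID back, clearing EPOLLIN. */
        if (write(fd, uuid, sizeof(uuid)) != sizeof(uuid))
            break;
    }
    close(fd);
    return 0;
}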
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On 16 Oct 2020, at 22:01, Jann Horn wrote: On Sat, Oct 17, 2020 at 6:34 AM Colm MacCarthaigh wrote: For user-space, even a single bit would do. We added MADVISE_WIPEONFORK so that userspace libraries can detect fork()/clone() robustly, for the same reasons. It just wipes a page as the indicator, which is effectively a single-bit signal, and it works well. On the user-space side of this, I’m keen to find a solution like that that we can use fairly easily inside of portable libraries and applications. The “have I forked” checks do end up in hot paths, so it’s nice if they can be CPU cache friendly. Comparing a whole 128-bit value wouldn’t be my favorite. I'm pretty sure a single bit is not enough if you want to have a single page, shared across the entire system, that stores the VM forking state; you need a counter for that. You’re right. WIPEONFORK is more like a single-bit per use. If it’s something system wide then a counter is better. So the RNG state after mixing in the new VM Generation ID would contain 128 bits of secret entropy not known to anyone else, including people with access to the VM image. Now, 128 bits of cryptographically random data aren't _optimal_; I think something on the order of 256 bits would be nicer from a theoretical standpoint. But in practice I think we'll be good with the 128 bits we're getting (since the number of users who fork a VM image is probably not going to be so large that worst-case collision probabilities matter). This reminds me on key/IV usage limits for AES encryption, where the same birthday bounds apply, and even though 256-bits would be better, we routinely make 128-bit birthday bounds work for massively scalable systems. The kernel would need to use the change as a trigger to measure some entropy (e.g. interrupts and RDRAND, or whatever). Our just define the machine contract as “this has to be unique random data and if it’s not unique, or if it’s pubic, you’re toast”. As far as I can tell from Microsoft's spec, that is a guarantee we're already getting. Neat. - Colm
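To make the "128-bit birthday bound" mentioned above concrete: for n independently drawn, uniformly random 128-bit generation IDs, the probability of any collision is approximately

\[
P_{\mathrm{collision}} \approx \binom{n}{2} \cdot 2^{-128} \approx \frac{n^{2}}{2^{129}},
\]

so even n = 2^32 forks of the same image keep the collision probability around 2^-65, which is why 128 bits are workable in practice even though 256 bits would give more margin.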
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau wrote: > On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote: > > Microsoft's documentation > > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM > > Generation ID that we get after a fork "is a 128-bit, > > cryptographically random integer value". If multiple people use the > > same image, it guarantees that each use of the image gets its own, > > fresh ID: > > No. It cannot be more unique than the source that feeds that cryptographic > transformation. All it guarantees is that the entropy source is protected > from being guessed based on the output. Applying cryptography on a simple > counter provides apparently random numbers that will be unique for a long > period for the same source, but as soon as you duplicate that code between > users and they start from the same counter they'll get the same IDs. > > This is why I think that using a counter is better if you really need > something > unique. Randoms only reduce predictability which helps avoiding collisions. Microsoft's spec tells us that they're giving us cryptographically random numbers. Where they're getting those from is not our problem. (And if even the hypervisor is not able to collect enough entropy to securely generate random numbers, worrying about RNG reseeding in the guest would be kinda pointless, we'd be fairly screwed anyway.) Also note that we don't actually need to *always* reinitialize RNG state on forks for functional correctness; it is fine if that fails with a probability of 2^-128, because functionally everything will be fine, and an attacker who is that lucky could also just guess an AES key (which has the same probability of being successful). (And also 2^-128 is such a tiny number that it doesn't matter anyway.) > And I'm saying this as someone who had on his external gateway the same SSH > host key as 89 other hosts on the net, each of them using randoms to provide > a universally unique one... If your SSH host key was shared with 89 other hosts, it evidently wasn't generated from cryptographically random numbers. :P Either because the key generator was not properly hooked up to the system's entropy pool (if you're talking about the Debian fiasco), or because the system simply did not have enough entropy available. (Or because the key generator is broken, but I don't think that ever happened with OpenSSH?)
[Bug 1891748] Re: qemu-arm-static 5.1 can't run gcc
This has been fixed in mainline, probably commit 8ef618859c379fdce81c91bc93e0574e36ea76ff. ** Changed in: qemu Status: New => Fix Committed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1891748 Title: qemu-arm-static 5.1 can't run gcc Status in QEMU: Fix Committed Bug description: Issue discovered while trying to build pikvm (1) Long story short: when using qemu-arm-static 5.1, gcc exits whith message: Allocating guest commpage: Operation not permitted when using qemu-arm-static v5.0, gcc "works" Steps to reproduce will follow (1) https://github.com/pikvm/pikvm/blob/master/pages/building_os.md To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1891748/+subscriptions
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 6:34 AM Colm MacCarthaigh wrote: > On 16 Oct 2020, at 21:02, Jann Horn wrote: > > On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau wrote: > > But in userspace, we just need a simple counter. There's no need for > > us to worry about anything else, like timestamps or whatever. If we > > repeatedly fork a paused VM, the forked VMs will see the same counter > > value, but that's totally fine, because the only thing that matters to > > userspace is that the counter changes when the VM is forked. > > For user-space, even a single bit would do. We added MADVISE_WIPEONFORK > so that userspace libraries can detect fork()/clone() robustly, for the > same reasons. It just wipes a page as the indicator, which is > effectively a single-bit signal, and it works well. On the user-space > side of this, I’m keen to find a solution like that that we can use > fairly easily inside of portable libraries and applications. The “have > I forked” checks do end up in hot paths, so it’s nice if they can be > CPU cache friendly. Comparing a whole 128-bit value wouldn’t be my > favorite. I'm pretty sure a single bit is not enough if you want to have a single page, shared across the entire system, that stores the VM forking state; you need a counter for that. > > And actually, since the value is a cryptographically random 128-bit > > value, I think that we should definitely use it to help reseed the > > kernel's RNG, and keep it secret from userspace. That way, even if the > > VM image is public, we can ensure that going forward, the kernel RNG > > will return securely random data. > > If the image is public, you need some extra new raw entropy from > somewhere. The gen-id could be mixed in, that can’t do any harm as > long as rigorous cryptographic mixing with the prior state is used, but > if that’s all you do then the final state is still deterministic and > non-secret. Microsoft's documentation (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM Generation ID that we get after a fork "is a 128-bit, cryptographically random integer value". If multiple people use the same image, it guarantees that each use of the image gets its own, fresh ID: The table in section "How to implement virtual machine generation ID support in a virtualization platform" says that (among other things) "Virtual machine is imported, copied, or cloned" generates a new generation ID. So the RNG state after mixing in the new VM Generation ID would contain 128 bits of secret entropy not known to anyone else, including people with access to the VM image. Now, 128 bits of cryptographically random data aren't _optimal_; I think something on the order of 256 bits would be nicer from a theoretical standpoint. But in practice I think we'll be good with the 128 bits we're getting (since the number of users who fork a VM image is probably not going to be so large that worst-case collision probabilities matter). > The kernel would need to use the change as a trigger to > measure some entropy (e.g. interrupts and RDRAND, or whatever). Our just > define the machine contract as “this has to be unique random data and > if it’s not unique, or if it’s pubic, you’re toast”. As far as I can tell from Microsoft's spec, that is a guarantee we're already getting.
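A rough sketch of the combination argued for above (mix the secret 128-bit value into the kernel RNG, expose only a counter to userspace) might look like the following. This is an illustration of the idea in the thread, not the posted driver; add_device_randomness() is the existing kernel helper that mixes device-provided bytes into the input pool without crediting entropy:

/*
 * Illustration only: turn hypervisor UUID changes into (a) fresh kernel
 * RNG input and (b) a monotonic counter for userspace, as suggested in
 * this thread. Not the posted patch; mapped_counter is assumed to be the
 * first 8 bytes of the page userspace mmap()s.
 */
#include <linux/compiler.h>
#include <linux/random.h>
#include <linux/string.h>
#include <linux/types.h>

static u8 last_uuid[16];
static u64 *mapped_counter;

static void vmgenid_handle_change(const u8 new_uuid[16])
{
    if (memcmp(new_uuid, last_uuid, sizeof(last_uuid)) == 0)
        return;                              /* no VM fork happened   */

    memcpy(last_uuid, new_uuid, sizeof(last_uuid));

    /* Mix the secret 128-bit value into the RNG; never expose it.    */
    add_device_randomness(new_uuid, sizeof(last_uuid));

    /* Userspace only sees "the world changed, reseed yourself".      */
    WRITE_ONCE(*mapped_counter, READ_ONCE(*mapped_counter) + 1);
}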
[PATCH v3 16/18] migration/rdma: add rdma_channel into Migrationstate field
Multifd RDMA needs to know which channel to poll when we send data, so
record the most recently used channel.

Signed-off-by: Chuan Zheng
---
 migration/migration.c |  1 +
 migration/migration.h |  1 +
 migration/rdma.c      | 14 ++
 3 files changed, 16 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 7061410..1ec1dc9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1892,6 +1892,7 @@ void migrate_init(MigrationState *s)
     s->migration_thread_running = false;
     s->enabled_rdma_migration = false;
     s->host_port = NULL;
+    s->rdma_channel = 0;
     error_free(s->error);
     s->error = NULL;
     s->hostname = NULL;
diff --git a/migration/migration.h b/migration/migration.h
index fea63de..5676b23 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -272,6 +272,7 @@ struct MigrationState
 
     /* Need by Multi-RDMA */
     char *host_port;
+    int rdma_channel;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/rdma.c b/migration/rdma.c
index d5d6364..327f80f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -183,6 +183,20 @@ typedef struct {
 } RDMAWorkRequestData;
 
 /*
+ * Get the multifd RDMA channel used to send data.
+ */
+static int get_multifd_RDMA_channel(void)
+{
+    int thread_count = migrate_multifd_channels();
+    MigrationState *s = migrate_get_current();
+
+    s->rdma_channel++;
+    s->rdma_channel %= thread_count;
+
+    return s->rdma_channel;
+}
+
+/*
  * Negotiate RDMA capabilities during connection-setup time.
  */
 typedef struct {
-- 
1.8.3.1
[PATCH v3 18/18] migration/rdma: RDMA cleanup for multifd migration
Signed-off-by: Chuan Zheng --- migration/multifd.c | 6 ++ migration/multifd.h | 1 + migration/rdma.c| 16 +++- 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index c4d90ef..f548122 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -574,6 +574,9 @@ void multifd_save_cleanup(void) p->packet_len = 0; g_free(p->packet); p->packet = NULL; +#ifdef CONFIG_RDMA +multifd_rdma_cleanup(p->rdma); +#endif multifd_send_state->ops->send_cleanup(p, &local_err); if (local_err) { migrate_set_error(migrate_get_current(), local_err); @@ -1017,6 +1020,9 @@ int multifd_load_cleanup(Error **errp) p->packet_len = 0; g_free(p->packet); p->packet = NULL; +#ifdef CONFIG_RDMA +multifd_rdma_cleanup(p->rdma); +#endif multifd_recv_state->ops->recv_cleanup(p); } qemu_sem_destroy(&multifd_recv_state->sem_sync); diff --git a/migration/multifd.h b/migration/multifd.h index ec9e897..6fddd4e 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -183,6 +183,7 @@ typedef struct { #ifdef CONFIG_RDMA MultiFDSetup *multifd_rdma_setup(void); +void multifd_rdma_cleanup(void *opaque); #endif void multifd_send_terminate_threads(Error *err); int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp); diff --git a/migration/rdma.c b/migration/rdma.c index 519fa7a..89bf54b 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2368,7 +2368,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) { int idx; -if (rdma->cm_id && rdma->connected) { +if (rdma->channel && rdma->cm_id && rdma->connected) { if ((rdma->error_state || migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) && !rdma->received_error) { @@ -4609,6 +4609,20 @@ static MultiFDSetup multifd_rdma_ops = { .recv_channel_setup = multifd_rdma_recv_channel_setup }; +void multifd_rdma_cleanup(void *opaque) +{ +RDMAContext *rdma = (RDMAContext *)opaque; + +if (!migrate_use_rdma()) { +return; +} + +rdma->listen_id = NULL; +rdma->channel = NULL; +qemu_rdma_cleanup(rdma); +g_free(rdma); +} + MultiFDSetup *multifd_rdma_setup(void) { MultiFDSetup *rdma_ops; -- 1.8.3.1
[PATCH v3 03/18] migration/rdma: create multifd_setup_ops for Tx/Rx thread
Create multifd_setup_ops for TxRx thread, no logic change. Signed-off-by: Chuan Zheng --- migration/multifd.c | 44 +++- migration/multifd.h | 7 +++ 2 files changed, 46 insertions(+), 5 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 68b171f..1f82307 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -383,6 +383,8 @@ struct { int exiting; /* multifd ops */ MultiFDMethods *ops; +/* multifd setup ops */ +MultiFDSetup *setup_ops; } *multifd_send_state; /* @@ -790,8 +792,9 @@ static bool multifd_channel_connect(MultiFDSendParams *p, } else { /* update for tls qio channel */ p->c = ioc; -qemu_thread_create(&p->thread, p->name, multifd_send_thread, p, - QEMU_THREAD_JOINABLE); +qemu_thread_create(&p->thread, p->name, + multifd_send_state->setup_ops->send_thread_setup, + p, QEMU_THREAD_JOINABLE); } return false; } @@ -839,6 +842,11 @@ cleanup: multifd_new_send_channel_cleanup(p, sioc, local_err); } +static void multifd_send_channel_setup(MultiFDSendParams *p) +{ +socket_send_channel_create(multifd_new_send_channel_async, p); +} + int multifd_save_setup(Error **errp) { int thread_count; @@ -856,6 +864,7 @@ int multifd_save_setup(Error **errp) multifd_send_state->pages = multifd_pages_init(page_count); qemu_sem_init(&multifd_send_state->channels_ready, 0); qatomic_set(&multifd_send_state->exiting, 0); +multifd_send_state->setup_ops = multifd_setup_ops_init(); multifd_send_state->ops = multifd_ops[migrate_multifd_compression()]; for (i = 0; i < thread_count; i++) { @@ -875,7 +884,7 @@ int multifd_save_setup(Error **errp) p->packet->version = cpu_to_be32(MULTIFD_VERSION); p->name = g_strdup_printf("multifdsend_%d", i); p->tls_hostname = g_strdup(s->hostname); -socket_send_channel_create(multifd_new_send_channel_async, p); +multifd_send_state->setup_ops->send_channel_setup(p); } for (i = 0; i < thread_count; i++) { @@ -902,6 +911,8 @@ struct { uint64_t packet_num; /* multifd ops */ MultiFDMethods *ops; +/* multifd setup ops */ +MultiFDSetup *setup_ops; } *multifd_recv_state; static void multifd_recv_terminate_threads(Error *err) @@ -1095,6 +1106,7 @@ int multifd_load_setup(Error **errp) multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count); qatomic_set(&multifd_recv_state->count, 0); qemu_sem_init(&multifd_recv_state->sem_sync, 0); +multifd_recv_state->setup_ops = multifd_setup_ops_init(); multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()]; for (i = 0; i < thread_count; i++) { @@ -1173,9 +1185,31 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp) p->num_packets = 1; p->running = true; -qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, - QEMU_THREAD_JOINABLE); +multifd_recv_state->setup_ops->recv_channel_setup(ioc, p); +qemu_thread_create(&p->thread, p->name, + multifd_recv_state->setup_ops->recv_thread_setup, + p, QEMU_THREAD_JOINABLE); qatomic_inc(&multifd_recv_state->count); return qatomic_read(&multifd_recv_state->count) == migrate_multifd_channels(); } + +static void multifd_recv_channel_setup(QIOChannel *ioc, MultiFDRecvParams *p) +{ +return; +} + +static MultiFDSetup multifd_socket_ops = { +.send_thread_setup = multifd_send_thread, +.recv_thread_setup = multifd_recv_thread, +.send_channel_setup = multifd_send_channel_setup, +.recv_channel_setup = multifd_recv_channel_setup +}; + +MultiFDSetup *multifd_setup_ops_init(void) +{ +MultiFDSetup *ops; + +ops = &multifd_socket_ops; +return ops; +} diff --git a/migration/multifd.h b/migration/multifd.h index 8d6751f..446315b 100644 --- a/migration/multifd.h +++ 
b/migration/multifd.h @@ -166,6 +166,13 @@ typedef struct { int (*recv_pages)(MultiFDRecvParams *p, uint32_t used, Error **errp); } MultiFDMethods; +typedef struct { +void *(*send_thread_setup)(void *opaque); +void *(*recv_thread_setup)(void *opaque); +void (*send_channel_setup)(MultiFDSendParams *p); +void (*recv_channel_setup)(QIOChannel *ioc, MultiFDRecvParams *p); +} MultiFDSetup; + void multifd_register_ops(int method, MultiFDMethods *ops); #endif -- 1.8.3.1
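Patch 04 in this series (not shown above) fills in this hook table for RDMA. Roughly, the shape it is building toward is sketched below; the multifd_rdma_* names are placeholders standing in for the functions added later in the series:

/*
 * Rough sketch of how a second transport plugs into MultiFDSetup.
 * The real wiring is done by later patches in this series; the
 * multifd_rdma_* names here are placeholders for illustration.
 */
static MultiFDSetup multifd_rdma_ops = {
    .send_thread_setup  = multifd_rdma_send_thread,
    .recv_thread_setup  = multifd_rdma_recv_thread,
    .send_channel_setup = multifd_rdma_send_channel_setup,
    .recv_channel_setup = multifd_rdma_recv_channel_setup,
};

MultiFDSetup *multifd_setup_ops_init(void)
{
    if (migrate_use_rdma()) {
        return &multifd_rdma_ops;      /* RDMA Tx/Rx threads          */
    }
    return &multifd_socket_ops;        /* default socket Tx/Rx threads */
}

The point of the indirection is that multifd_save_setup()/multifd_load_setup() stay transport-agnostic and only call through the hook table chosen at init time.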
[PATCH v3 05/18] migration/rdma: do not need sync main for rdma
Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/multifd.c | 8 1 file changed, 8 insertions(+) diff --git a/migration/multifd.c b/migration/multifd.c index 0d494df..8ccfd46 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -580,6 +580,10 @@ void multifd_send_sync_main(QEMUFile *f) if (!migrate_use_multifd()) { return; } + /* Do not need sync for rdma */ +if (migrate_use_rdma()) { +return; +} if (multifd_send_state->pages->used) { if (multifd_send_pages(f) < 0) { error_report("%s: multifd_send_pages fail", __func__); @@ -1002,6 +1006,10 @@ void multifd_recv_sync_main(void) if (!migrate_use_multifd()) { return; } +/* Do not need sync for rdma */ +if (migrate_use_rdma()) { +return; +} for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDRecvParams *p = &multifd_recv_state->params[i]; -- 1.8.3.1
[PATCH v3 11/18] migration/rdma: record host_port for multifd RDMA
Signed-off-by: Chuan Zheng --- migration/migration.c | 1 + migration/migration.h | 3 +++ migration/rdma.c | 3 +++ 3 files changed, 7 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index be6166a..7061410 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1891,6 +1891,7 @@ void migrate_init(MigrationState *s) s->postcopy_after_devices = false; s->migration_thread_running = false; s->enabled_rdma_migration = false; +s->host_port = NULL; error_free(s->error); s->error = NULL; s->hostname = NULL; diff --git a/migration/migration.h b/migration/migration.h index e92eb29..fea63de 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -269,6 +269,9 @@ struct MigrationState * Enable RDMA migration */ bool enabled_rdma_migration; + +/* Need by Multi-RDMA */ +char *host_port; }; void migrate_set_state(int *state, int old_state, int new_state); diff --git a/migration/rdma.c b/migration/rdma.c index 63559f1..dd9f705 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4206,6 +4206,8 @@ void rdma_start_outgoing_migration(void *opaque, goto err; } +s->host_port = g_strdup(host_port); + ret = qemu_rdma_source_init(rdma, s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL], errp); @@ -4250,6 +4252,7 @@ void rdma_start_outgoing_migration(void *opaque, s->to_dst_file = qemu_fopen_rdma(rdma, "wb"); migrate_fd_connect(s, NULL); +g_free(s->host_port); return; return_path_err: qemu_rdma_cleanup(rdma); -- 1.8.3.1
[PATCH v3 12/18] migration/rdma: Create the multifd send channels for RDMA
Signed-off-by: Chuan Zheng --- migration/multifd.c | 4 ++-- migration/multifd.h | 2 ++ migration/rdma.c| 56 + 3 files changed, 60 insertions(+), 2 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 03f3a1e..9439b3c 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -173,7 +173,7 @@ void multifd_register_ops(int method, MultiFDMethods *ops) multifd_ops[method] = ops; } -static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp) +int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp) { MultiFDInit_t msg = {}; int ret; @@ -500,7 +500,7 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset) return 1; } -static void multifd_send_terminate_threads(Error *err) +void multifd_send_terminate_threads(Error *err) { int i; diff --git a/migration/multifd.h b/migration/multifd.h index ff80bd5..ec9e897 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -184,6 +184,8 @@ typedef struct { #ifdef CONFIG_RDMA MultiFDSetup *multifd_rdma_setup(void); #endif +void multifd_send_terminate_threads(Error *err); +int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp); int get_multifd_send_param(int id, MultiFDSendParams **param); int get_multifd_recv_param(int id, MultiFDRecvParams **param); MultiFDSetup *multifd_setup_ops_init(void); diff --git a/migration/rdma.c b/migration/rdma.c index dd9f705..1af81f5 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4261,9 +4261,54 @@ err: g_free(rdma_return_path); } +static int multifd_channel_rdma_connect(void *opaque) +{ +MultiFDSendParams *p = opaque; +Error *local_err = NULL; +int ret = 0; +MigrationState *s = migrate_get_current(); + +p->rdma = qemu_rdma_data_init(s->host_port, &local_err); +if (p->rdma == NULL) { +goto out; +} + +ret = qemu_rdma_source_init(p->rdma, +migrate_use_rdma_pin_all(), +&local_err); +if (ret) { +goto out; +} + +ret = qemu_rdma_connect(p->rdma, &local_err); +if (ret) { +goto out; +} + +p->file = qemu_fopen_rdma(p->rdma, "wb"); +if (p->file == NULL) { +goto out; +} + +p->c = QIO_CHANNEL(getQIOChannel(p->file)); + +out: +if (local_err) { +trace_multifd_send_error(p->id); +} + +return ret; +} + static void *multifd_rdma_send_thread(void *opaque) { MultiFDSendParams *p = opaque; +Error *local_err = NULL; + +trace_multifd_send_thread_start(p->id); +if (multifd_send_initial_packet(p, &local_err) < 0) { +goto out; +} while (true) { qemu_mutex_lock(&p->mutex); @@ -4275,6 +4320,11 @@ static void *multifd_rdma_send_thread(void *opaque) qemu_sem_wait(&p->sem); } +out: +if (local_err) { +trace_multifd_send_error(p->id); +multifd_send_terminate_threads(local_err); +} qemu_mutex_lock(&p->mutex); p->running = false; qemu_mutex_unlock(&p->mutex); @@ -4286,6 +4336,12 @@ static void multifd_rdma_send_channel_setup(MultiFDSendParams *p) { Error *local_err = NULL; +if (multifd_channel_rdma_connect(p)) { +error_setg(&local_err, "multifd: rdma channel %d not established", + p->id); +return ; +} + if (p->quit) { error_setg(&local_err, "multifd: send id %d already quit", p->id); return ; -- 1.8.3.1
[PATCH v3 13/18] migration/rdma: Add the function for dynamic page registration
Add the 'qemu_rdma_registration' function, multifd send threads call it to register memory. Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/rdma.c | 51 +++ 1 file changed, 51 insertions(+) diff --git a/migration/rdma.c b/migration/rdma.c index 1af81f5..a366849 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3738,6 +3738,57 @@ out: return ret; } +/* + * Dynamic page registrations for multifd RDMA threads. + */ +static int qemu_rdma_registration(void *opaque) +{ +RDMAContext *rdma = opaque; +RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT }; +RDMALocalBlocks *local = &rdma->local_ram_blocks; +int reg_result_idx, i, nb_dest_blocks; +RDMAControlHeader head = { .len = 0, .repeat = 1 }; +int ret = 0; + +head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST; + +ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp, +®_result_idx, rdma->pin_all ? +qemu_rdma_reg_whole_ram_blocks : NULL); +if (ret < 0) { +goto out; +} + +nb_dest_blocks = resp.len / sizeof(RDMADestBlock); + +if (local->nb_blocks != nb_dest_blocks) { +rdma->error_state = -EINVAL; +ret = -1; +goto out; +} + +qemu_rdma_move_header(rdma, reg_result_idx, &resp); +memcpy(rdma->dest_blocks, + rdma->wr_data[reg_result_idx].control_curr, resp.len); + +for (i = 0; i < nb_dest_blocks; i++) { +network_to_dest_block(&rdma->dest_blocks[i]); + +/* We require that the blocks are in the same order */ +if (rdma->dest_blocks[i].length != local->block[i].length) { +rdma->error_state = -EINVAL; +ret = -1; +goto out; +} +local->block[i].remote_host_addr = +rdma->dest_blocks[i].remote_host_addr; +local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey; +} + +out: +return ret; +} + /* Destination: * Called via a ram_control_load_hook during the initial RAM load section which * lists the RAMBlocks by name. This lets us know the order of the RAMBlocks -- 1.8.3.1
[PATCH v3 02/18] migration/rdma: judge whether or not the RDMA is used for migration
Add enabled_rdma_migration into MigrationState to judge whether or not the RDMA is used for migration. Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/migration.c | 13 + migration/migration.h | 6 ++ 2 files changed, 19 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 64ae417..be6166a 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -389,7 +389,9 @@ void migrate_add_address(SocketAddress *address) void qemu_start_incoming_migration(const char *uri, Error **errp) { const char *p = NULL; +MigrationState *s = migrate_get_current(); +s->enabled_rdma_migration = false; qapi_event_send_migration(MIGRATION_STATUS_SETUP); if (!strcmp(uri, "defer")) { deferred_incoming_migration(errp); @@ -399,6 +401,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp) socket_start_incoming_migration(p ? p : uri, errp); #ifdef CONFIG_RDMA } else if (strstart(uri, "rdma:", &p)) { +s->enabled_rdma_migration = true; rdma_start_incoming_migration(p, errp); #endif } else if (strstart(uri, "exec:", &p)) { @@ -1887,6 +1890,7 @@ void migrate_init(MigrationState *s) s->start_postcopy = false; s->postcopy_after_devices = false; s->migration_thread_running = false; +s->enabled_rdma_migration = false; error_free(s->error); s->error = NULL; s->hostname = NULL; @@ -2115,6 +2119,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, socket_start_outgoing_migration(s, p ? p : uri, &local_err); #ifdef CONFIG_RDMA } else if (strstart(uri, "rdma:", &p)) { +s->enabled_rdma_migration = true; rdma_start_outgoing_migration(s, p, &local_err); #endif } else if (strstart(uri, "exec:", &p)) { @@ -2329,6 +2334,14 @@ bool migrate_use_events(void) return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS]; } +bool migrate_use_rdma(void) +{ +MigrationState *s; +s = migrate_get_current(); + +return s->enabled_rdma_migration; +} + bool migrate_use_rdma_pin_all(void) { MigrationState *s; diff --git a/migration/migration.h b/migration/migration.h index 74fd790..e92eb29 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -264,6 +264,11 @@ struct MigrationState * This save hostname when out-going migration starts */ char *hostname; + +/* + * Enable RDMA migration + */ +bool enabled_rdma_migration; }; void migrate_set_state(int *state, int old_state, int new_state); @@ -300,6 +305,7 @@ bool migrate_ignore_shared(void); bool migrate_validate_uuid(void); bool migrate_auto_converge(void); +bool migrate_use_rdma(void); bool migrate_use_rdma_pin_all(void); bool migrate_use_multifd(void); bool migrate_pause_before_switchover(void); -- 1.8.3.1
[PATCH v3 10/18] migration/rdma: Create the multifd recv channels for RDMA
We still don't transmit anything through them, and we only build the RDMA connections. Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/rdma.c | 70 ++-- 1 file changed, 68 insertions(+), 2 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 2baa933..63559f1 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3266,6 +3266,40 @@ static void rdma_cm_poll_handler(void *opaque) } } +static bool qemu_rdma_accept_setup(RDMAContext *rdma) +{ +RDMAContext *multifd_rdma = NULL; +int thread_count; +int i; +MultiFDRecvParams *multifd_recv_param; +thread_count = migrate_multifd_channels(); +/* create the multifd channels for RDMA */ +for (i = 0; i < thread_count; i++) { +if (get_multifd_recv_param(i, &multifd_recv_param) < 0) { +error_report("rdma: error getting multifd_recv_param(%d)", i); +return false; +} + +multifd_rdma = (RDMAContext *) multifd_recv_param->rdma; +if (multifd_rdma->cm_id == NULL) { +break; +} else { +multifd_rdma = NULL; +} +} + +if (multifd_rdma) { +qemu_set_fd_handler(rdma->channel->fd, +rdma_accept_incoming_migration, +NULL, (void *)(intptr_t)multifd_rdma); +} else { +qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler, +NULL, rdma); +} + +return true; +} + static int qemu_rdma_accept(RDMAContext *rdma) { RDMACapabilities cap; @@ -3365,6 +3399,10 @@ static int qemu_rdma_accept(RDMAContext *rdma) qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration, NULL, (void *)(intptr_t)rdma->return_path); +} else if (migrate_use_multifd()) { +if (!qemu_rdma_accept_setup(rdma)) { +goto err_rdma_dest_wait; +} } else { qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler, NULL, rdma); @@ -3975,6 +4013,35 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode) return rioc->file; } +static void migration_rdma_process_incoming(QEMUFile *f, +RDMAContext *rdma, Error **errp) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +QIOChannel *ioc = NULL; +bool start_migration = false; + +/* FIXME: Need refactor */ +if (!migrate_use_multifd()) { +rdma->migration_started_on_destination = 1; +migration_fd_process_incoming(f, errp); +return; +} + +if (!mis->from_src_file) { +mis->from_src_file = f; +qemu_file_set_blocking(f, false); +} else { +ioc = QIO_CHANNEL(getQIOChannel(f)); +/* Multiple connections */ +assert(migrate_use_multifd()); +start_migration = multifd_recv_new_channel(ioc, errp); +} + +if (start_migration) { +migration_incoming_process(); +} +} + static void rdma_accept_incoming_migration(void *opaque) { RDMAContext *rdma = opaque; @@ -4003,8 +4070,7 @@ static void rdma_accept_incoming_migration(void *opaque) return; } -rdma->migration_started_on_destination = 1; -migration_fd_process_incoming(f, &local_err); +migration_rdma_process_incoming(f, rdma, &local_err); if (local_err) { error_reportf_err(local_err, "RDMA ERROR:"); } -- 1.8.3.1
[PATCH v3 09/18] migration/rdma: add multifd_rdma_load_setup() to setup multifd rdma
Signed-off-by: Chuan Zheng --- migration/rdma.c | 52 1 file changed, 52 insertions(+) diff --git a/migration/rdma.c b/migration/rdma.c index ad4e4ba..2baa933 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4010,6 +4010,48 @@ static void rdma_accept_incoming_migration(void *opaque) } } +static bool multifd_rdma_load_setup(const char *host_port, +RDMAContext *rdma, Error **errp) +{ +int thread_count; +int i; +int idx; +MultiFDRecvParams *multifd_recv_param; +RDMAContext *multifd_rdma; + +if (!migrate_use_multifd()) { +return true; +} + +if (multifd_load_setup(errp) != 0) { +/* + * We haven't been able to create multifd threads + * nothing better to do + */ +return false; +} + +thread_count = migrate_multifd_channels(); +for (i = 0; i < thread_count; i++) { +if (get_multifd_recv_param(i, &multifd_recv_param) < 0) { +ERROR(errp, "rdma: error getting multifd_recv_param(%d)", i); +return false; +} + +multifd_rdma = qemu_rdma_data_init(host_port, errp); +for (idx = 0; idx < RDMA_WRID_MAX; idx++) { +multifd_rdma->wr_data[idx].control_len = 0; +multifd_rdma->wr_data[idx].control_curr = NULL; +} +/* the CM channel and CM id is shared */ +multifd_rdma->channel = rdma->channel; +multifd_rdma->listen_id = rdma->listen_id; +multifd_recv_param->rdma = (void *)multifd_rdma; +} + +return true; +} + void rdma_start_incoming_migration(const char *host_port, Error **errp) { int ret; @@ -4057,6 +4099,16 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp) qemu_rdma_return_path_dest_init(rdma_return_path, rdma); } +/* multifd rdma setup */ +if (!multifd_rdma_load_setup(host_port, rdma, &local_err)) { +/* + * We haven't been able to create multifd threads + * nothing better to do + */ +error_report_err(local_err); +goto err; +} + qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration, NULL, (void *)(intptr_t)rdma); return; -- 1.8.3.1
[PATCH v3 17/18] migration/rdma: send data for both rdma-pin-all and NOT rdma-pin-all mode
Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/rdma.c | 67 +++- 1 file changed, 62 insertions(+), 5 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 327f80f..519fa7a 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2001,6 +2001,20 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma, .repeat = 1, }; +/* use multifd to send data */ +if (migrate_use_multifd()) { +int channel = get_multifd_RDMA_channel(); +int ret = 0; +MultiFDSendParams *multifd_send_param = NULL; +ret = get_multifd_send_param(channel, &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd_send_param(%d)", channel); +return -EINVAL; +} +rdma = (RDMAContext *)multifd_send_param->rdma; +block = &(rdma->local_ram_blocks.block[current_index]); +} + retry: sge.addr = (uintptr_t)(block->local_host_addr + (current_addr - block->offset)); @@ -2196,6 +2210,27 @@ retry: return 0; } +static int multifd_rdma_write_flush(void) +{ +/* The multifd RDMA threads send data */ +MultiFDSendParams *multifd_send_param = NULL; +RDMAContext *rdma = NULL; +MigrationState *s = migrate_get_current(); +int ret = 0; + +ret = get_multifd_send_param(s->rdma_channel, + &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd_send_param(%d)", + s->rdma_channel); +return ret; +} +rdma = (RDMAContext *)(multifd_send_param->rdma); +rdma->nb_sent++; + +return ret; +} + /* * Push out any unwritten RDMA operations. * @@ -2218,8 +2253,15 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma) } if (ret == 0) { -rdma->nb_sent++; -trace_qemu_rdma_write_flush(rdma->nb_sent); +if (migrate_use_multifd()) { +ret = multifd_rdma_write_flush(); +if (ret) { +return ret; +} +} else { +rdma->nb_sent++; +trace_qemu_rdma_write_flush(rdma->nb_sent); +} } rdma->current_length = 0; @@ -4061,6 +4103,7 @@ wait_reg_complete: } qemu_sem_post(&multifd_send_param->sem_sync); +qemu_sem_wait(&multifd_send_param->sem); } } @@ -4443,6 +4486,7 @@ static void *multifd_rdma_send_thread(void *opaque) Error *local_err = NULL; int ret = 0; RDMAControlHeader head = { .len = 0, .repeat = 1 }; +RDMAContext *rdma = p->rdma; trace_multifd_send_thread_start(p->id); if (multifd_send_initial_packet(p, &local_err) < 0) { @@ -4451,7 +4495,7 @@ static void *multifd_rdma_send_thread(void *opaque) /* wait for semaphore notification to register memory */ qemu_sem_wait(&p->sem_sync); -if (qemu_rdma_registration(p->rdma) < 0) { +if (qemu_rdma_registration(rdma) < 0) { goto out; } /* @@ -4467,13 +4511,26 @@ static void *multifd_rdma_send_thread(void *opaque) break; } qemu_mutex_unlock(&p->mutex); - +/* To complete polling(CQE) */ +while (rdma->nb_sent) { +ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL); +if (ret < 0) { +error_report("multifd RDMA migration: " + "complete polling error!"); +return NULL; +} +} /* Send FINISHED to the destination */ head.type = RDMA_CONTROL_REGISTER_FINISHED; -ret = qemu_rdma_exchange_send(p->rdma, &head, NULL, NULL, NULL, NULL); +ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL); if (ret < 0) { +error_report("multifd RDMA migration: " + "sending remote error!"); return NULL; } + +/* sync main thread */ +qemu_sem_post(&p->sem); } out: -- 1.8.3.1
[PATCH v3 00/18] Support Multifd for RDMA migration
Now I continue to support multifd for RDMA migration based on my colleague
Zhimin's work. :)

The previous RFC patches are listed below:
v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg669455.html
v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg679188.html

As described in the previous RFC, the RDMA bandwidth is not fully utilized
on NICs above 25 Gigabit because a single channel is used for RDMA
migration. This patch series adds multifd support for RDMA migration on
top of the multifd framework.

The comparison between original and multifd RDMA migration was re-tested
for v3. The VM specifications for migration are as follows:
- the VM uses 4k pages;
- the number of VCPUs is 4;
- the total memory is 16 GB;
- the 'mempress' tool is used to pressurize the VM (mempress 8000 500);
- a 25 Gigabit network card is used to migrate.

For original RDMA and MultiRDMA migration, the total migration times of
the VM are as follows:

+-------------+------------------+--------------+
|             | NOT rdma-pin-all | rdma-pin-all |
+-------------+------------------+--------------+
| origin RDMA |       26 s       |     29 s     |
+-------------+------------------+--------------+
| MultiRDMA   |       16 s       |     17 s     |
+-------------+------------------+--------------+

Test the multifd RDMA migration like this:
virsh migrate --live --multiFd --migrateuri rdma://192.168.1.100 [VM] \
    --listen-address 0.0.0.0 qemu+tcp://192.168.1.100/system --verbose

v2 -> v3:
    create multifd ops for both tcp and rdma
    do not export rdma to avoid multifd code in mess
    fix build issue for non-rdma
    fix some codestyle and buggy code

Chuan Zheng (18):
  migration/rdma: add the 'migrate_use_rdma_pin_all' function
  migration/rdma: judge whether or not the RDMA is used for migration
  migration/rdma: create multifd_setup_ops for Tx/Rx thread
  migration/rdma: add multifd_setup_ops for rdma
  migration/rdma: do not need sync main for rdma
  migration/rdma: export MultiFDSendParams/MultiFDRecvParams
  migration/rdma: add rdma field into multifd send/recv param
  migration/rdma: export getQIOChannel to get QIOchannel in rdma
  migration/rdma: add multifd_rdma_load_setup() to setup multifd rdma
  migration/rdma: Create the multifd recv channels for RDMA
  migration/rdma: record host_port for multifd RDMA
  migration/rdma: Create the multifd send channels for RDMA
  migration/rdma: Add the function for dynamic page registration
  migration/rdma: register memory for multifd RDMA channels
  migration/rdma: only register the memory for multifd channels
  migration/rdma: add rdma_channel into Migrationstate field
  migration/rdma: send data for both rdma-pin-all and NOT rdma-pin-all mode
  migration/rdma: RDMA cleanup for multifd migration

 migration/migration.c |  24 +++
 migration/migration.h |  11 ++
 migration/multifd.c   |  97 +-
 migration/multifd.h   |  24 +++
 migration/qemu-file.c |   5 +
 migration/qemu-file.h |   1 +
 migration/rdma.c      | 503 +-
 7 files changed, 653 insertions(+), 12 deletions(-)

-- 
1.8.3.1
[PATCH v3 15/18] migration/rdma: only register the memory for multifd channels
All data is sent by the multifd channels, so only register memory for the
multifd channels; the main channel does not need to register it.

Signed-off-by: Zhimin Feng
Signed-off-by: Chuan Zheng
---
 migration/rdma.c | 8
 1 file changed, 8 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 3210e6e..d5d6364 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3938,6 +3938,12 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 
             qemu_sem_post(&multifd_send_param->sem_sync);
         }
+
+        /*
+         * Use multifd to migrate, we only register memory for
+         * multifd RDMA channel and main channel don't register it.
+         */
+        goto wait_reg_complete;
     }
 
     /*
@@ -3998,6 +4004,8 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                 rdma->dest_blocks[i].remote_host_addr;
             local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey;
         }
+
+wait_reg_complete:
         /* Wait for all multifd channels to complete registration */
         if (migrate_use_multifd()) {
             int i;
-- 
1.8.3.1
[PATCH v3 08/18] migration/rdma: export getQIOChannel to get QIOchannel in rdma
Signed-off-by: Zhimin Feng
Signed-off-by: Chuan Zheng
---
 migration/qemu-file.c | 5 +
 migration/qemu-file.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index be21518..37f6201 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -260,6 +260,11 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
     }
 }
 
+void *getQIOChannel(QEMUFile *f)
+{
+    return f->opaque;
+}
+
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
 {
     int ret = 0;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index a9b6d6c..4cef043 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -165,6 +165,7 @@ void qemu_file_set_blocking(QEMUFile *f, bool block);
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
+void *getQIOChannel(QEMUFile *f);
 
 /* Whenever this is found in the data stream, the flags
  * will be passed to ram_control_load_hook in the incoming-migration
-- 
1.8.3.1
[PATCH v3 14/18] migration/rdma: register memory for multifd RDMA channels
Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/multifd.c | 3 ++ migration/rdma.c| 94 +++-- 2 files changed, 95 insertions(+), 2 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 9439b3c..c4d90ef 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -534,6 +534,9 @@ void multifd_send_terminate_threads(Error *err) qemu_mutex_lock(&p->mutex); p->quit = true; qemu_sem_post(&p->sem); +if (migrate_use_rdma()) { +qemu_sem_post(&p->sem_sync); +} qemu_mutex_unlock(&p->mutex); } } diff --git a/migration/rdma.c b/migration/rdma.c index a366849..3210e6e 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3837,6 +3837,19 @@ static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data) return rdma_block_notification_handle(opaque, data); case RAM_CONTROL_HOOK: +if (migrate_use_multifd()) { +int i; +MultiFDRecvParams *multifd_recv_param = NULL; +int thread_count = migrate_multifd_channels(); +/* Inform dest recv_thread to poll */ +for (i = 0; i < thread_count; i++) { +if (get_multifd_recv_param(i, &multifd_recv_param)) { +return -1; +} +qemu_sem_post(&multifd_recv_param->sem_sync); +} +} + return qemu_rdma_registration_handle(f, opaque); default: @@ -3909,6 +3922,24 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST; trace_qemu_rdma_registration_stop_ram(); +if (migrate_use_multifd()) { +/* + * Inform the multifd channels to register memory + */ +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_multifd_send_param(i, &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd(%d)", i); +return ret; +} + +qemu_sem_post(&multifd_send_param->sem_sync); +} +} + /* * Make sure that we parallelize the pinning on both sides. * For very large guests, doing this serially takes a really @@ -3967,6 +3998,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, rdma->dest_blocks[i].remote_host_addr; local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey; } +/* Wait for all multifd channels to complete registration */ +if (migrate_use_multifd()) { +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_multifd_send_param(i, &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd(%d)", i); +return ret; +} + +qemu_sem_wait(&multifd_send_param->sem); +} +} } trace_qemu_rdma_registration_stop(flags); @@ -3978,6 +4024,24 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, goto err; } +if (migrate_use_multifd()) { +/* + * Inform src send_thread to send FINISHED signal. + * Wait for multifd RDMA send threads to poll the CQE. 
+ */ +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_multifd_send_param(i, &multifd_send_param); +if (ret < 0) { +goto err; +} + +qemu_sem_post(&multifd_send_param->sem_sync); +} +} + return 0; err: rdma->error_state = ret; @@ -4355,20 +4419,39 @@ static void *multifd_rdma_send_thread(void *opaque) { MultiFDSendParams *p = opaque; Error *local_err = NULL; +int ret = 0; +RDMAControlHeader head = { .len = 0, .repeat = 1 }; trace_multifd_send_thread_start(p->id); if (multifd_send_initial_packet(p, &local_err) < 0) { goto out; } +/* wait for semaphore notification to register memory */ +qemu_sem_wait(&p->sem_sync); +if (qemu_rdma_registration(p->rdma) < 0) { +goto out; +} +/* + * Inform the main RDMA thread to run when multifd + * RDMA thread have completed registration. + */ +qemu_sem_post(&p->sem); while (true) { +qemu_sem_wait(&p->sem_sync); qemu_mutex_lock(&p->mutex); if (p->quit) { qemu_mutex_unloc
[PATCH v3 01/18] migration/rdma: add the 'migrate_use_rdma_pin_all' function
Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/migration.c | 9 + migration/migration.h | 1 + 2 files changed, 10 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 0575ecb..64ae417 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2329,6 +2329,15 @@ bool migrate_use_events(void) return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS]; } +bool migrate_use_rdma_pin_all(void) +{ +MigrationState *s; + +s = migrate_get_current(); + +return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL]; +} + bool migrate_use_multifd(void) { MigrationState *s; diff --git a/migration/migration.h b/migration/migration.h index deb411a..74fd790 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -300,6 +300,7 @@ bool migrate_ignore_shared(void); bool migrate_validate_uuid(void); bool migrate_auto_converge(void); +bool migrate_use_rdma_pin_all(void); bool migrate_use_multifd(void); bool migrate_pause_before_switchover(void); int migrate_multifd_channels(void); -- 1.8.3.1
[PATCH v3 06/18] migration/rdma: export MultiFDSendParams/MultiFDRecvParams
MultiFDSendParams and MultiFDRecvParams are needed for rdma, so export them. Signed-off-by: Zhimin Feng Signed-off-by: Chuan Zheng --- migration/multifd.c | 26 ++ migration/multifd.h | 2 ++ 2 files changed, 28 insertions(+) diff --git a/migration/multifd.c b/migration/multifd.c index 8ccfd46..03f3a1e 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -387,6 +387,19 @@ struct { MultiFDSetup *setup_ops; } *multifd_send_state; +int get_multifd_send_param(int id, MultiFDSendParams **param) +{ +int ret = 0; + +if (id < 0 || id >= migrate_multifd_channels()) { +ret = -1; +} else { +*param = &(multifd_send_state->params[id]); +} + +return ret; +} + /* * How we use multifd_send_state->pages and channel->pages? * @@ -919,6 +932,19 @@ struct { MultiFDSetup *setup_ops; } *multifd_recv_state; +int get_multifd_recv_param(int id, MultiFDRecvParams **param) +{ +int ret = 0; + +if (id < 0 || id >= migrate_multifd_channels()) { +ret = -1; +} else { +*param = &(multifd_recv_state->params[id]); +} + +return ret; +} + static void multifd_recv_terminate_threads(Error *err) { int i; diff --git a/migration/multifd.h b/migration/multifd.h index 62a0b2a..2f4e585 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -176,6 +176,8 @@ typedef struct { #ifdef CONFIG_RDMA MultiFDSetup *multifd_rdma_setup(void); #endif +int get_multifd_send_param(int id, MultiFDSendParams **param); +int get_multifd_recv_param(int id, MultiFDRecvParams **param); MultiFDSetup *multifd_setup_ops_init(void); void multifd_register_ops(int method, MultiFDMethods *ops); -- 1.8.3.1
[PATCH v3 04/18] migration/rdma: add multifd_setup_ops for rdma
Signed-off-by: Chuan Zheng --- migration/multifd.c | 6 migration/multifd.h | 4 +++ migration/rdma.c| 82 + 3 files changed, 92 insertions(+) diff --git a/migration/multifd.c b/migration/multifd.c index 1f82307..0d494df 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -1210,6 +1210,12 @@ MultiFDSetup *multifd_setup_ops_init(void) { MultiFDSetup *ops; +#ifdef CONFIG_RDMA +if (migrate_use_rdma()) { +ops = multifd_rdma_setup(); +return ops; +} +#endif ops = &multifd_socket_ops; return ops; } diff --git a/migration/multifd.h b/migration/multifd.h index 446315b..62a0b2a 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -173,6 +173,10 @@ typedef struct { void (*recv_channel_setup)(QIOChannel *ioc, MultiFDRecvParams *p); } MultiFDSetup; +#ifdef CONFIG_RDMA +MultiFDSetup *multifd_rdma_setup(void); +#endif +MultiFDSetup *multifd_setup_ops_init(void); void multifd_register_ops(int method, MultiFDMethods *ops); #endif diff --git a/migration/rdma.c b/migration/rdma.c index 0340841..ad4e4ba 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -19,6 +19,7 @@ #include "qemu/cutils.h" #include "rdma.h" #include "migration.h" +#include "multifd.h" #include "qemu-file.h" #include "ram.h" #include "qemu-file-channel.h" @@ -4138,3 +4139,84 @@ err: g_free(rdma); g_free(rdma_return_path); } + +static void *multifd_rdma_send_thread(void *opaque) +{ +MultiFDSendParams *p = opaque; + +while (true) { +qemu_mutex_lock(&p->mutex); +if (p->quit) { +qemu_mutex_unlock(&p->mutex); +break; +} +qemu_mutex_unlock(&p->mutex); +qemu_sem_wait(&p->sem); +} + +qemu_mutex_lock(&p->mutex); +p->running = false; +qemu_mutex_unlock(&p->mutex); + +return NULL; +} + +static void multifd_rdma_send_channel_setup(MultiFDSendParams *p) +{ +Error *local_err = NULL; + +if (p->quit) { +error_setg(&local_err, "multifd: send id %d already quit", p->id); +return ; +} +p->running = true; + +qemu_thread_create(&p->thread, p->name, multifd_rdma_send_thread, p, + QEMU_THREAD_JOINABLE); +} + +static void *multifd_rdma_recv_thread(void *opaque) +{ +MultiFDRecvParams *p = opaque; + +while (true) { +qemu_mutex_lock(&p->mutex); +if (p->quit) { +qemu_mutex_unlock(&p->mutex); +break; +} +qemu_mutex_unlock(&p->mutex); +qemu_sem_wait(&p->sem_sync); +} + +qemu_mutex_lock(&p->mutex); +p->running = false; +qemu_mutex_unlock(&p->mutex); + +return NULL; +} + +static void multifd_rdma_recv_channel_setup(QIOChannel *ioc, +MultiFDRecvParams *p) +{ +QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc); + +p->file = rioc->file; +return; +} + +static MultiFDSetup multifd_rdma_ops = { +.send_thread_setup = multifd_rdma_send_thread, +.recv_thread_setup = multifd_rdma_recv_thread, +.send_channel_setup = multifd_rdma_send_channel_setup, +.recv_channel_setup = multifd_rdma_recv_channel_setup +}; + +MultiFDSetup *multifd_rdma_setup(void) +{ +MultiFDSetup *rdma_ops; + +rdma_ops = &multifd_rdma_ops; + +return rdma_ops; +} -- 1.8.3.1
[PATCH v3 07/18] migration/rdma: add rdma field into multifd send/recv param
Note: we do not want to export any rdma struct, so take a void * instead. Signed-off-by: Chuan Zheng --- migration/multifd.h | 8 1 file changed, 8 insertions(+) diff --git a/migration/multifd.h b/migration/multifd.h index 2f4e585..ff80bd5 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -108,6 +108,10 @@ typedef struct { QemuSemaphore sem_sync; /* used for compression methods */ void *data; +/* used for multifd rdma */ +void *rdma; +/* communication channel */ +QEMUFile *file; } MultiFDSendParams; typedef struct { @@ -147,6 +151,10 @@ typedef struct { QemuSemaphore sem_sync; /* used for de-compression methods */ void *data; +/* used for multifd rdma */ +void *rdma; +/* communication channel */ +QEMUFile *file; } MultiFDRecvParams; typedef struct { -- 1.8.3.1
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau wrote: > On Sat, Oct 17, 2020 at 03:40:08AM +0200, Jann Horn wrote: > > [adding some more people who are interested in RNG stuff: Andy, Jason, > > Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this > > concerns some pretty fundamental API stuff related to RNG usage] > > > > On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin > > wrote: > > > This patch is a driver which exposes the Virtual Machine Generation ID > > > via a char-dev FS interface that provides ID update sync and async > > > notification, retrieval and confirmation mechanisms: > > > > > > When the device is 'open()'ed a copy of the current vm UUID is > > > associated with the file handle. 'read()' operations block until the > > > associated UUID is no longer up to date - until HW vm gen id changes - > > > at which point the new UUID is provided/returned. Nonblocking 'read()' > > > uses EWOULDBLOCK to signal that there is no _new_ UUID available. > > > > > > 'poll()' is implemented to allow polling for UUID updates. Such > > > updates result in 'EPOLLIN' events. > > > > > > Subsequent read()s following a UUID update no longer block, but return > > > the updated UUID. The application needs to acknowledge the UUID update > > > by confirming it through a 'write()'. > > > Only on writing back to the driver the right/latest UUID, will the > > > driver mark this "watcher" as up to date and remove EPOLLIN status. > > > > > > 'mmap()' support allows mapping a single read-only shared page which > > > will always contain the latest UUID value at offset 0. > > > > It would be nicer if that page just contained an incrementing counter, > > instead of a UUID. It's not like the application cares *what* the UUID > > changed to, just that it *did* change and all RNGs state now needs to > > be reseeded from the kernel, right? And an application can't reliably > > read the entire UUID from the memory mapping anyway, because the VM > > might be forked in the middle. > > > > So I think your kernel driver should detect UUID changes and then turn > > those into a monotonically incrementing counter. (Probably 64 bits > > wide?) (That's probably also a little bit faster than comparing an > > entire UUID.) > > I agree with this. Further, I'm observing there is a very common > confusion between "universally unique" and "random". Randoms are > needed when seeking unpredictability. A random number generator > *must* be able to return the same value multiple times in a row > (though this is rare), otherwise it's not random. [...] > If the UUIDs used there are real UUIDs, it could be as simple as > updating them according to their format, i.e. updating the timestamp, > and if the timestamp is already the same, just increase the seq counter. > Doing this doesn't require entropy, doesn't need to block and doesn't > needlessly leak randoms that sometimes make people feel nervous. Those UUIDs are supplied by existing hypervisor code; in that regard, this is almost like a driver for a hardware device. It is written against a fixed API provided by the underlying machine. Making sure that the sequence of UUIDs, as seen from inside the machine, never changes back to a previous one is the responsibility of the hypervisor and out of scope for this driver. 
Microsoft's spec document (which is a .docx file for reasons I don't understand) actually promises us that it is a cryptographically random 128-bit integer value, which means that if you fork a VM 2^64 times, the probability that any two of those VMs have the same counter is 2^-64. That should be good enough. But in userspace, we just need a simple counter. There's no need for us to worry about anything else, like timestamps or whatever. If we repeatedly fork a paused VM, the forked VMs will see the same counter value, but that's totally fine, because the only thing that matters to userspace is that the counter changes when the VM is forked. And actually, since the value is a cryptographically random 128-bit value, I think that we should definitely use it to help reseed the kernel's RNG, and keep it secret from userspace. That way, even if the VM image is public, we can ensure that going forward, the kernel RNG will return securely random data.
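For illustration only, here is a minimal userspace sketch of what consuming such a mapped counter could look like. The device path /dev/vmgenid and the "64-bit counter at offset 0" layout are assumptions made up for this example; the posted driver currently maps a UUID, not a counter.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical char-dev node exposing a generation counter. */
    int fd = open("/dev/vmgenid", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Map the single read-only page; offset 0 is assumed to hold a
     * monotonically increasing 64-bit generation value. */
    volatile uint64_t *gen = mmap(NULL, sizeof(*gen), PROT_READ,
                                  MAP_SHARED, fd, 0);
    if (gen == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    uint64_t last = *gen;
    for (;;) {
        if (*gen != last) {
            /* The VM was cloned or restored from a snapshot:
             * reseed any userspace RNG/UUID/counter state now. */
            last = *gen;
            fprintf(stderr, "generation changed, reseeding\n");
        }
        usleep(100 * 1000);
    }
}

A counter makes this loop race-free: unlike a 128-bit UUID, a single aligned 64-bit load cannot observe a half-updated value if the VM is forked mid-read.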
[PATCH 3/4] target/s390x: Improve cc computation for SUBTRACT LOGICAL
The resulting cc is only dependent on the result and the borrow-out. So save those things rather than the inputs. Borrow-out for 64-bit inputs is had via tcg_gen_sub2_i64 directly into cc_src. Borrow-out for 32-bit inputs is had via extraction from a normal 64-bit sub (with zero-extended inputs). Signed-off-by: Richard Henderson --- target/s390x/internal.h| 3 +-- target/s390x/cc_helper.c | 40 ++- target/s390x/helper.c | 3 +-- target/s390x/translate.c | 55 +++--- target/s390x/insn-data.def | 24 - 5 files changed, 43 insertions(+), 82 deletions(-) diff --git a/target/s390x/internal.h b/target/s390x/internal.h index f5f3ae063e..4077047494 100644 --- a/target/s390x/internal.h +++ b/target/s390x/internal.h @@ -161,6 +161,7 @@ enum cc_op { CC_OP_NZ, /* env->cc_dst != 0 */ CC_OP_ADDU, /* dst != 0, src = carry out (0,1) */ +CC_OP_SUBU, /* dst != 0, src = borrow out (0,-1) */ CC_OP_LTGT_32, /* signed less/greater than (32bit) */ CC_OP_LTGT_64, /* signed less/greater than (64bit) */ @@ -171,7 +172,6 @@ enum cc_op { CC_OP_ADD_64, /* overflow on add (64bit) */ CC_OP_SUB_64, /* overflow on subtraction (64bit) */ -CC_OP_SUBU_64, /* overflow on unsigned subtraction (64bit) */ CC_OP_SUBB_64, /* overflow on unsigned sub-borrow (64bit) */ CC_OP_ABS_64, /* sign eval on abs (64bit) */ CC_OP_NABS_64, /* sign eval on nabs (64bit) */ @@ -179,7 +179,6 @@ enum cc_op { CC_OP_ADD_32, /* overflow on add (32bit) */ CC_OP_SUB_32, /* overflow on subtraction (32bit) */ -CC_OP_SUBU_32, /* overflow on unsigned subtraction (32bit) */ CC_OP_SUBB_32, /* overflow on unsigned sub-borrow (32bit) */ CC_OP_ABS_32, /* sign eval on abs (64bit) */ CC_OP_NABS_32, /* sign eval on nabs (64bit) */ diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c index cd2c5c4b39..c7728d1225 100644 --- a/target/s390x/cc_helper.c +++ b/target/s390x/cc_helper.c @@ -129,6 +129,11 @@ static uint32_t cc_calc_addu(uint64_t carry_out, uint64_t result) return (result != 0) + 2 * carry_out; } +static uint32_t cc_calc_subu(uint64_t borrow_out, uint64_t result) +{ +return cc_calc_addu(borrow_out + 1, result); +} + static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar) { if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) { @@ -159,19 +164,6 @@ static uint32_t cc_calc_sub_64(int64_t a1, int64_t a2, int64_t ar) } } -static uint32_t cc_calc_subu_64(uint64_t a1, uint64_t a2, uint64_t ar) -{ -if (ar == 0) { -return 2; -} else { -if (a2 > a1) { -return 1; -} else { -return 3; -} -} -} - static uint32_t cc_calc_subb_64(uint64_t a1, uint64_t a2, uint64_t ar) { int borrow_out; @@ -245,19 +237,6 @@ static uint32_t cc_calc_sub_32(int32_t a1, int32_t a2, int32_t ar) } } -static uint32_t cc_calc_subu_32(uint32_t a1, uint32_t a2, uint32_t ar) -{ -if (ar == 0) { -return 2; -} else { -if (a2 > a1) { -return 1; -} else { -return 3; -} -} -} - static uint32_t cc_calc_subb_32(uint32_t a1, uint32_t a2, uint32_t ar) { int borrow_out; @@ -462,15 +441,15 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_ADDU: r = cc_calc_addu(src, dst); break; +case CC_OP_SUBU: +r = cc_calc_subu(src, dst); +break; case CC_OP_ADD_64: r = cc_calc_add_64(src, dst, vr); break; case CC_OP_SUB_64: r = cc_calc_sub_64(src, dst, vr); break; -case CC_OP_SUBU_64: -r = cc_calc_subu_64(src, dst, vr); -break; case CC_OP_SUBB_64: r = cc_calc_subb_64(src, dst, vr); break; @@ -493,9 +472,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_SUB_32: r = cc_calc_sub_32(src, dst, vr); break; -case CC_OP_SUBU_32: -r = 
cc_calc_subu_32(src, dst, vr); -break; case CC_OP_SUBB_32: r = cc_calc_subb_32(src, dst, vr); break; diff --git a/target/s390x/helper.c b/target/s390x/helper.c index 4f4561bc64..fa3aa500e5 100644 --- a/target/s390x/helper.c +++ b/target/s390x/helper.c @@ -396,6 +396,7 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_STATIC]= "CC_OP_STATIC", [CC_OP_NZ]= "CC_OP_NZ", [CC_OP_ADDU] = "CC_OP_ADDU", +[CC_OP_SUBU] = "CC_OP_SUBU", [CC_OP_LTGT_32] = "CC_OP_LTGT_32", [CC_OP_LTGT_64] = "CC_OP_LTGT_64", [CC_OP_LTUGTU_32] = "CC_OP_LTUGTU_32", @@ -404,13 +405,
[PATCH 2/4] target/s390x: Improve ADD LOGICAL WITH CARRY
Now that ADD LOGICAL outputs carry, we can use that as input directly. It also means we can re-use CC_OP_ZC and produce an output carry directly from ADD LOGICAL WITH CARRY. Signed-off-by: Richard Henderson --- target/s390x/internal.h| 2 -- target/s390x/cc_helper.c | 26 --- target/s390x/helper.c | 2 -- target/s390x/translate.c | 66 ++ target/s390x/insn-data.def | 8 ++--- 5 files changed, 35 insertions(+), 69 deletions(-) diff --git a/target/s390x/internal.h b/target/s390x/internal.h index 55c5442102..f5f3ae063e 100644 --- a/target/s390x/internal.h +++ b/target/s390x/internal.h @@ -170,7 +170,6 @@ enum cc_op { CC_OP_LTGT0_64, /* signed less/greater than 0 (64bit) */ CC_OP_ADD_64, /* overflow on add (64bit) */ -CC_OP_ADDC_64, /* overflow on unsigned add-carry (64bit) */ CC_OP_SUB_64, /* overflow on subtraction (64bit) */ CC_OP_SUBU_64, /* overflow on unsigned subtraction (64bit) */ CC_OP_SUBB_64, /* overflow on unsigned sub-borrow (64bit) */ @@ -179,7 +178,6 @@ enum cc_op { CC_OP_MULS_64, /* overflow on signed multiply (64bit) */ CC_OP_ADD_32, /* overflow on add (32bit) */ -CC_OP_ADDC_32, /* overflow on unsigned add-carry (32bit) */ CC_OP_SUB_32, /* overflow on subtraction (32bit) */ CC_OP_SUBU_32, /* overflow on unsigned subtraction (32bit) */ CC_OP_SUBB_32, /* overflow on unsigned sub-borrow (32bit) */ diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c index 59da4d1cc2..cd2c5c4b39 100644 --- a/target/s390x/cc_helper.c +++ b/target/s390x/cc_helper.c @@ -144,16 +144,6 @@ static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar) } } -static uint32_t cc_calc_addc_64(uint64_t a1, uint64_t a2, uint64_t ar) -{ -/* Recover a2 + carry_in. */ -uint64_t a2c = ar - a1; -/* Check for a2+carry_in overflow, then a1+a2c overflow. */ -int carry_out = (a2c < a2) || (ar < a1); - -return (ar != 0) + 2 * carry_out; -} - static uint32_t cc_calc_sub_64(int64_t a1, int64_t a2, int64_t ar) { if ((a1 > 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0)) { @@ -240,16 +230,6 @@ static uint32_t cc_calc_add_32(int32_t a1, int32_t a2, int32_t ar) } } -static uint32_t cc_calc_addc_32(uint32_t a1, uint32_t a2, uint32_t ar) -{ -/* Recover a2 + carry_in. */ -uint32_t a2c = ar - a1; -/* Check for a2+carry_in overflow, then a1+a2c overflow. 
*/ -int carry_out = (a2c < a2) || (ar < a1); - -return (ar != 0) + 2 * carry_out; -} - static uint32_t cc_calc_sub_32(int32_t a1, int32_t a2, int32_t ar) { if ((a1 > 0 && a2 < 0 && ar < 0) || (a1 < 0 && a2 > 0 && ar > 0)) { @@ -485,9 +465,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_ADD_64: r = cc_calc_add_64(src, dst, vr); break; -case CC_OP_ADDC_64: -r = cc_calc_addc_64(src, dst, vr); -break; case CC_OP_SUB_64: r = cc_calc_sub_64(src, dst, vr); break; @@ -513,9 +490,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_ADD_32: r = cc_calc_add_32(src, dst, vr); break; -case CC_OP_ADDC_32: -r = cc_calc_addc_32(src, dst, vr); -break; case CC_OP_SUB_32: r = cc_calc_sub_32(src, dst, vr); break; diff --git a/target/s390x/helper.c b/target/s390x/helper.c index db87a62a57..4f4561bc64 100644 --- a/target/s390x/helper.c +++ b/target/s390x/helper.c @@ -403,14 +403,12 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_LTGT0_32] = "CC_OP_LTGT0_32", [CC_OP_LTGT0_64] = "CC_OP_LTGT0_64", [CC_OP_ADD_64]= "CC_OP_ADD_64", -[CC_OP_ADDC_64] = "CC_OP_ADDC_64", [CC_OP_SUB_64]= "CC_OP_SUB_64", [CC_OP_SUBU_64] = "CC_OP_SUBU_64", [CC_OP_SUBB_64] = "CC_OP_SUBB_64", [CC_OP_ABS_64]= "CC_OP_ABS_64", [CC_OP_NABS_64] = "CC_OP_NABS_64", [CC_OP_ADD_32]= "CC_OP_ADD_32", -[CC_OP_ADDC_32] = "CC_OP_ADDC_32", [CC_OP_SUB_32]= "CC_OP_SUB_32", [CC_OP_SUBU_32] = "CC_OP_SUBU_32", [CC_OP_SUBB_32] = "CC_OP_SUBB_32", diff --git a/target/s390x/translate.c b/target/s390x/translate.c index 9bf4c14f66..570b3c88c8 100644 --- a/target/s390x/translate.c +++ b/target/s390x/translate.c @@ -600,12 +600,10 @@ static void gen_op_calc_cc(DisasContext *s) dummy = tcg_const_i64(0); /* FALLTHRU */ case CC_OP_ADD_64: -case CC_OP_ADDC_64: case CC_OP_SUB_64: case CC_OP_SUBU_64: case CC_OP_SUBB_64: case CC_OP_ADD_32: -case CC_OP_ADDC_32: case CC_OP_SUB_32: case CC_OP_SUBU_32: case CC_OP_SUBB_32: @@ -665,12 +663,10 @@ static void gen_op_calc_cc(DisasContext *s)
[PATCH 4/4] target/s390x: Improve SUB LOGICAL WITH BORROW
Now that SUB LOGICAL outputs carry, we can use that as input directly. It also means we can re-use CC_OP_ZC and produce an output carry directly from SUB LOGICAL WITH BORROW. Signed-off-by: Richard Henderson --- target/s390x/internal.h| 2 -- target/s390x/cc_helper.c | 32 - target/s390x/helper.c | 2 -- target/s390x/translate.c | 74 -- target/s390x/insn-data.def | 8 ++--- 5 files changed, 44 insertions(+), 74 deletions(-) diff --git a/target/s390x/internal.h b/target/s390x/internal.h index 4077047494..11515bb617 100644 --- a/target/s390x/internal.h +++ b/target/s390x/internal.h @@ -172,14 +172,12 @@ enum cc_op { CC_OP_ADD_64, /* overflow on add (64bit) */ CC_OP_SUB_64, /* overflow on subtraction (64bit) */ -CC_OP_SUBB_64, /* overflow on unsigned sub-borrow (64bit) */ CC_OP_ABS_64, /* sign eval on abs (64bit) */ CC_OP_NABS_64, /* sign eval on nabs (64bit) */ CC_OP_MULS_64, /* overflow on signed multiply (64bit) */ CC_OP_ADD_32, /* overflow on add (32bit) */ CC_OP_SUB_32, /* overflow on subtraction (32bit) */ -CC_OP_SUBB_32, /* overflow on unsigned sub-borrow (32bit) */ CC_OP_ABS_32, /* sign eval on abs (64bit) */ CC_OP_NABS_32, /* sign eval on nabs (64bit) */ CC_OP_MULS_32, /* overflow on signed multiply (32bit) */ diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c index c7728d1225..e7039d0d18 100644 --- a/target/s390x/cc_helper.c +++ b/target/s390x/cc_helper.c @@ -164,19 +164,6 @@ static uint32_t cc_calc_sub_64(int64_t a1, int64_t a2, int64_t ar) } } -static uint32_t cc_calc_subb_64(uint64_t a1, uint64_t a2, uint64_t ar) -{ -int borrow_out; - -if (ar != a1 - a2) { /* difference means borrow-in */ -borrow_out = (a2 >= a1); -} else { -borrow_out = (a2 > a1); -} - -return (ar != 0) + 2 * !borrow_out; -} - static uint32_t cc_calc_abs_64(int64_t dst) { if ((uint64_t)dst == 0x8000ULL) { @@ -237,19 +224,6 @@ static uint32_t cc_calc_sub_32(int32_t a1, int32_t a2, int32_t ar) } } -static uint32_t cc_calc_subb_32(uint32_t a1, uint32_t a2, uint32_t ar) -{ -int borrow_out; - -if (ar != a1 - a2) { /* difference means borrow-in */ -borrow_out = (a2 >= a1); -} else { -borrow_out = (a2 > a1); -} - -return (ar != 0) + 2 * !borrow_out; -} - static uint32_t cc_calc_abs_32(int32_t dst) { if ((uint32_t)dst == 0x8000UL) { @@ -450,9 +424,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_SUB_64: r = cc_calc_sub_64(src, dst, vr); break; -case CC_OP_SUBB_64: -r = cc_calc_subb_64(src, dst, vr); -break; case CC_OP_ABS_64: r = cc_calc_abs_64(dst); break; @@ -472,9 +443,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_SUB_32: r = cc_calc_sub_32(src, dst, vr); break; -case CC_OP_SUBB_32: -r = cc_calc_subb_32(src, dst, vr); -break; case CC_OP_ABS_32: r = cc_calc_abs_32(dst); break; diff --git a/target/s390x/helper.c b/target/s390x/helper.c index fa3aa500e5..7678994feb 100644 --- a/target/s390x/helper.c +++ b/target/s390x/helper.c @@ -405,12 +405,10 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_LTGT0_64] = "CC_OP_LTGT0_64", [CC_OP_ADD_64]= "CC_OP_ADD_64", [CC_OP_SUB_64]= "CC_OP_SUB_64", -[CC_OP_SUBB_64] = "CC_OP_SUBB_64", [CC_OP_ABS_64]= "CC_OP_ABS_64", [CC_OP_NABS_64] = "CC_OP_NABS_64", [CC_OP_ADD_32]= "CC_OP_ADD_32", [CC_OP_SUB_32]= "CC_OP_SUB_32", -[CC_OP_SUBB_32] = "CC_OP_SUBB_32", [CC_OP_ABS_32]= "CC_OP_ABS_32", [CC_OP_NABS_32] = "CC_OP_NABS_32", [CC_OP_COMP_32] = "CC_OP_COMP_32", diff --git a/target/s390x/translate.c b/target/s390x/translate.c index 48494a86cc..0d8235a5fb 100644 --- a/target/s390x/translate.c +++ b/target/s390x/translate.c @@ 
-601,10 +601,8 @@ static void gen_op_calc_cc(DisasContext *s) /* FALLTHRU */ case CC_OP_ADD_64: case CC_OP_SUB_64: -case CC_OP_SUBB_64: case CC_OP_ADD_32: case CC_OP_SUB_32: -case CC_OP_SUBB_32: local_cc_op = tcg_const_i32(s->cc_op); break; case CC_OP_CONST0: @@ -663,10 +661,8 @@ static void gen_op_calc_cc(DisasContext *s) break; case CC_OP_ADD_64: case CC_OP_SUB_64: -case CC_OP_SUBB_64: case CC_OP_ADD_32: case CC_OP_SUB_32: -case CC_OP_SUBB_32: /* 3 arguments */ gen_helper_calc_cc(cc_op, cpu_env, local_cc_op, cc_src, cc_dst, cc_vr); break; @@ -4744,29 +4740,49 @@ static DisasJumpType op_subu64(DisasC
[PATCH 0/4] target/s390x: Improve carry computation
While testing the float128_muladd changes for s390x host, emulating under x86_64 of course, I noticed that the code we generate for strings of ALCGR and SLBGR is pretty awful. I realized that we were missing a trick: the output cc is based only on the output (result and carry) and so we don't need to save the inputs. And once we do that, we can use the output carry as a direct input to the next insn. For subtract, computing carry as per the PoO (a + ~b + c), in tcg is less efficient than computing borrow. We can convert between the two simply by adding or subtracting 1. As an example from float128_muladd, 0x43f014: b90b 0019 slgr %r1, %r9 0x43f018: b989 002a slbgr%r2, %r10 0x43f01c: b989 0030 slbgr%r3, %r0 Before: -- guest addr 0x0043f014 0x7fcbf811a4bc: 4d 8b f5 movq %r13, %r14 0x7fcbf811a4bf: 4c 8b 7d 48 movq 0x48(%rbp), %r15 0x7fcbf811a4c3: 4d 2b ef subq %r15, %r13 0x7fcbf811a4c6: 4c 89 6d 08 movq %r13, 8(%rbp) -- guest addr 0x0043f018 0x7fcbf811a4ca: 4c 8b d3 movq %rbx, %r10 0x7fcbf811a4cd: 4c 8b 5d 50 movq 0x50(%rbp), %r11 0x7fcbf811a4d1: 49 2b db subq %r11, %rbx 0x7fcbf811a4d4: 4d 3b f7 cmpq %r15, %r14 0x7fcbf811a4d7: 41 0f 92 c6 setb %r14b 0x7fcbf811a4db: 45 0f b6 f6 movzbl %r14b, %r14d 0x7fcbf811a4df: 49 2b de subq %r14, %rbx 0x7fcbf811a4e2: 48 89 5d 10 movq %rbx, 0x10(%rbp) 0x7fcbf811a4e6: 4c 8b c3 movq %rbx, %r8 -- guest addr 0x0043f01c 0x7fcbf811a4e9: 4c 8b 75 18 movq 0x18(%rbp), %r14 0x7fcbf811a4ed: 4d 8b fe movq %r14, %r15 0x7fcbf811a4f0: 4c 8b 4d 00 movq (%rbp), %r9 0x7fcbf811a4f4: 4d 2b f1 subq %r9, %r14 0x7fcbf811a4f7: 48 8b fd movq %rbp, %rdi 0x7fcbf811a4fa: be 12 00 00 00 movl $0x12, %esi 0x7fcbf811a4ff: 49 8b d2 movq %r10, %rdx 0x7fcbf811a502: 49 8b cb movq %r11, %rcx 0x7fcbf811a505: ff 15 4d 01 00 00callq*0x14d(%rip) 0x7fcbf811a50b: 83 f8 02 cmpl $2, %eax 0x7fcbf811a50e: 41 0f 92 c2 setb %r10b 0x7fcbf811a512: 45 0f b6 d2 movzbl %r10b, %r10d 0x7fcbf811a516: 45 8b d2 movl %r10d, %r10d 0x7fcbf811a519: 4d 2b f2 subq %r10, %r14 0x7fcbf811a51c: 4c 89 75 18 movq %r14, 0x18(%rbp) 0x7fcbf811a520: 48 8b 4d 00 movq (%rbp), %rcx 0x7fcbf811a524: 4d 8b c6 movq %r14, %r8 After: -- guest addr 0x0043f014 0x7fd1d011a23c: 45 33 f6 xorl %r14d, %r14d 0x7fd1d011a23f: 4c 8b 7d 48 movq 0x48(%rbp), %r15 0x7fd1d011a243: 4d 2b ef subq %r15, %r13 0x7fd1d011a246: 4d 1b f6 sbbq %r14, %r14 0x7fd1d011a249: 4c 89 6d 08 movq %r13, 8(%rbp) -- guest addr 0x0043f018 0x7fd1d011a24d: 49 03 de addq %r14, %rbx 0x7fd1d011a250: 49 83 d6 00 adcq $0, %r14 0x7fd1d011a254: 4c 8b 7d 50 movq 0x50(%rbp), %r15 0x7fd1d011a258: 49 2b df subq %r15, %rbx 0x7fd1d011a25b: 49 83 de 00 sbbq $0, %r14 0x7fd1d011a25f: 48 89 5d 10 movq %rbx, 0x10(%rbp) -- guest addr 0x0043f01c 0x7fd1d011a263: 4c 8b 7d 18 movq 0x18(%rbp), %r15 0x7fd1d011a267: 4d 03 fe addq %r14, %r15 0x7fd1d011a26a: 49 83 d6 00 adcq $0, %r14 0x7fd1d011a26e: 4c 8b 55 00 movq (%rbp), %r10 0x7fd1d011a272: 4d 2b fa subq %r10, %r15 0x7fd1d011a275: 49 83 de 00 sbbq $0, %r14 0x7fd1d011a279: 4c 89 7d 18 movq %r15, 0x18(%rbp) r~ Richard Henderson (4): target/s390x: Improve cc computation for ADD LOGICAL target/s390x: Improve ADD LOGICAL WITH CARRY target/s390x: Improve cc computation for SUBTRACT LOGICAL target/s390x: Improve SUB LOGICAL WITH BORROW target/s390x/internal.h| 11 +- target/s390x/cc_helper.c | 123 +++- target/s390x/helper.c | 10 +- target/s390x/translate.c | 286 - target/s390x/insn-data.def | 76 +- 5 files changed, 213 insertions(+), 293 deletions(-) -- 2.25.1
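As a standalone sanity check of that carry/borrow conversion (not part of the patches): for an unsigned subtraction r = a - b, the borrow out is (a < b), while the PoO-style carry out of a + ~b + 1 is its complement, so the two always sum to 1. This is what lets CC_OP_SUBU in patch 3 store a borrow (as 0/-1) and recover the carry by adding 1. The snippet below just demonstrates the identity; it uses the GCC/Clang __int128 extension to read the carry bit.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t a = 5, b = 9;

    int borrow = a < b;                                /* borrow out of a - b  */
    unsigned __int128 sum = (unsigned __int128)a + ~b + 1;
    int carry = (int)(sum >> 64);                      /* carry out of a+~b+1  */

    assert(carry == 1 - borrow);                       /* complement relation  */
    printf("borrow=%d carry=%d result=%#llx\n",
           borrow, carry, (unsigned long long)(a - b));
    return 0;
}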
[PATCH 1/4] target/s390x: Improve cc computation for ADD LOGICAL
The resulting cc is only dependent on the result and the carry-out. So save those things rather than the inputs. Carry-out for 64-bit inputs is had via tcg_gen_add2_i64 directly into cc_src. Carry-out for 32-bit inputs is had via extraction from a normal 64-bit add (with zero-extended inputs). Signed-off-by: Richard Henderson --- target/s390x/internal.h| 4 +- target/s390x/cc_helper.c | 25 - target/s390x/helper.c | 3 +- target/s390x/translate.c | 103 - target/s390x/insn-data.def | 36 ++--- 5 files changed, 97 insertions(+), 74 deletions(-) diff --git a/target/s390x/internal.h b/target/s390x/internal.h index 64602660ae..55c5442102 100644 --- a/target/s390x/internal.h +++ b/target/s390x/internal.h @@ -160,6 +160,8 @@ enum cc_op { CC_OP_STATIC, /* CC value is env->cc_op */ CC_OP_NZ, /* env->cc_dst != 0 */ +CC_OP_ADDU, /* dst != 0, src = carry out (0,1) */ + CC_OP_LTGT_32, /* signed less/greater than (32bit) */ CC_OP_LTGT_64, /* signed less/greater than (64bit) */ CC_OP_LTUGTU_32,/* unsigned less/greater than (32bit) */ @@ -168,7 +170,6 @@ enum cc_op { CC_OP_LTGT0_64, /* signed less/greater than 0 (64bit) */ CC_OP_ADD_64, /* overflow on add (64bit) */ -CC_OP_ADDU_64, /* overflow on unsigned add (64bit) */ CC_OP_ADDC_64, /* overflow on unsigned add-carry (64bit) */ CC_OP_SUB_64, /* overflow on subtraction (64bit) */ CC_OP_SUBU_64, /* overflow on unsigned subtraction (64bit) */ @@ -178,7 +179,6 @@ enum cc_op { CC_OP_MULS_64, /* overflow on signed multiply (64bit) */ CC_OP_ADD_32, /* overflow on add (32bit) */ -CC_OP_ADDU_32, /* overflow on unsigned add (32bit) */ CC_OP_ADDC_32, /* overflow on unsigned add-carry (32bit) */ CC_OP_SUB_32, /* overflow on subtraction (32bit) */ CC_OP_SUBU_32, /* overflow on unsigned subtraction (32bit) */ diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c index 5432aeeed4..59da4d1cc2 100644 --- a/target/s390x/cc_helper.c +++ b/target/s390x/cc_helper.c @@ -123,6 +123,12 @@ static uint32_t cc_calc_nz(uint64_t dst) return !!dst; } +static uint32_t cc_calc_addu(uint64_t carry_out, uint64_t result) +{ +g_assert(carry_out <= 1); +return (result != 0) + 2 * carry_out; +} + static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar) { if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) { @@ -138,11 +144,6 @@ static uint32_t cc_calc_add_64(int64_t a1, int64_t a2, int64_t ar) } } -static uint32_t cc_calc_addu_64(uint64_t a1, uint64_t a2, uint64_t ar) -{ -return (ar != 0) + 2 * (ar < a1); -} - static uint32_t cc_calc_addc_64(uint64_t a1, uint64_t a2, uint64_t ar) { /* Recover a2 + carry_in. */ @@ -239,11 +240,6 @@ static uint32_t cc_calc_add_32(int32_t a1, int32_t a2, int32_t ar) } } -static uint32_t cc_calc_addu_32(uint32_t a1, uint32_t a2, uint32_t ar) -{ -return (ar != 0) + 2 * (ar < a1); -} - static uint32_t cc_calc_addc_32(uint32_t a1, uint32_t a2, uint32_t ar) { /* Recover a2 + carry_in. 
*/ @@ -483,12 +479,12 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_NZ: r = cc_calc_nz(dst); break; +case CC_OP_ADDU: +r = cc_calc_addu(src, dst); +break; case CC_OP_ADD_64: r = cc_calc_add_64(src, dst, vr); break; -case CC_OP_ADDU_64: -r = cc_calc_addu_64(src, dst, vr); -break; case CC_OP_ADDC_64: r = cc_calc_addc_64(src, dst, vr); break; @@ -517,9 +513,6 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op, case CC_OP_ADD_32: r = cc_calc_add_32(src, dst, vr); break; -case CC_OP_ADDU_32: -r = cc_calc_addu_32(src, dst, vr); -break; case CC_OP_ADDC_32: r = cc_calc_addc_32(src, dst, vr); break; diff --git a/target/s390x/helper.c b/target/s390x/helper.c index b877690845..db87a62a57 100644 --- a/target/s390x/helper.c +++ b/target/s390x/helper.c @@ -395,6 +395,7 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_DYNAMIC] = "CC_OP_DYNAMIC", [CC_OP_STATIC]= "CC_OP_STATIC", [CC_OP_NZ]= "CC_OP_NZ", +[CC_OP_ADDU] = "CC_OP_ADDU", [CC_OP_LTGT_32] = "CC_OP_LTGT_32", [CC_OP_LTGT_64] = "CC_OP_LTGT_64", [CC_OP_LTUGTU_32] = "CC_OP_LTUGTU_32", @@ -402,7 +403,6 @@ const char *cc_name(enum cc_op cc_op) [CC_OP_LTGT0_32] = "CC_OP_LTGT0_32", [CC_OP_LTGT0_64] = "CC_OP_LTGT0_64", [CC_OP_ADD_64]= "CC_OP_ADD_64", -[CC_OP_ADDU_64] = "CC_OP_ADDU_64", [CC_OP_ADDC_64] =
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
[adding some more people who are interested in RNG stuff: Andy, Jason, Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this concerns some pretty fundamental API stuff related to RNG usage] On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin wrote: > - Background > > The VM Generation ID is a feature defined by Microsoft (paper: > http://go.microsoft.com/fwlink/?LinkId=260709) and supported by > multiple hypervisor vendors. > > The feature is required in virtualized environments by apps that work > with local copies/caches of world-unique data such as random values, > uuids, monotonically increasing counters, etc. > Such apps can be negatively affected by VM snapshotting when the VM > is either cloned or returned to an earlier point in time. > > The VM Generation ID is a simple concept meant to alleviate the issue > by providing a unique ID that changes each time the VM is restored > from a snapshot. The hw provided UUID value can be used to > differentiate between VMs or different generations of the same VM. > > - Problem > > The VM Generation ID is exposed through an ACPI device by multiple > hypervisor vendors but neither the vendors or upstream Linux have no > default driver for it leaving users to fend for themselves. > > Furthermore, simply finding out about a VM generation change is only > the starting point of a process to renew internal states of possibly > multiple applications across the system. This process could benefit > from a driver that provides an interface through which orchestration > can be easily done. > > - Solution > > This patch is a driver which exposes the Virtual Machine Generation ID > via a char-dev FS interface that provides ID update sync and async > notification, retrieval and confirmation mechanisms: > > When the device is 'open()'ed a copy of the current vm UUID is > associated with the file handle. 'read()' operations block until the > associated UUID is no longer up to date - until HW vm gen id changes - > at which point the new UUID is provided/returned. Nonblocking 'read()' > uses EWOULDBLOCK to signal that there is no _new_ UUID available. > > 'poll()' is implemented to allow polling for UUID updates. Such > updates result in 'EPOLLIN' events. > > Subsequent read()s following a UUID update no longer block, but return > the updated UUID. The application needs to acknowledge the UUID update > by confirming it through a 'write()'. > Only on writing back to the driver the right/latest UUID, will the > driver mark this "watcher" as up to date and remove EPOLLIN status. > > 'mmap()' support allows mapping a single read-only shared page which > will always contain the latest UUID value at offset 0. It would be nicer if that page just contained an incrementing counter, instead of a UUID. It's not like the application cares *what* the UUID changed to, just that it *did* change and all RNGs state now needs to be reseeded from the kernel, right? And an application can't reliably read the entire UUID from the memory mapping anyway, because the VM might be forked in the middle. So I think your kernel driver should detect UUID changes and then turn those into a monotonically incrementing counter. (Probably 64 bits wide?) (That's probably also a little bit faster than comparing an entire UUID.) An option might be to put that counter into the vDSO, instead of a separate VMA; but I don't know how the other folks feel about that. Andy, do you have opinions on this? 
That way, normal userspace code that uses this infrastructure wouldn't have to mess around with a special device at all. And it'd be usable in seccomp sandboxes and so on without needing special plumbing. And libraries wouldn't have to call open() and mess with file descriptor numbers. > The driver also adds support for tracking count of open file handles > that haven't acknowledged an UUID update. This is exposed through > two IOCTLs: > * VMGENID_GET_OUTDATED_WATCHERS: immediately returns the number of >_outdated_ watchers - number of file handles that were open during >a VM generation change, and which have not yet confirmed the new >Vm-Gen-Id. > * VMGENID_WAIT_WATCHERS: blocks until there are no more _outdated_ >watchers, or if a 'timeout' argument is provided, until the timeout >expires. Does this mean that code that uses the memory mapping to detect changes is still supposed to confirm generation IDs? What about multithreaded processes, especially ones that use libraries - if a library opens a single file descriptor that is used from multiple threads, is the library required to synchronize all its threads before confirming the change? That seems like it could get messy. And it again means that this interface can't easily be used from inside seccomp sandboxes. [...] > diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst [...] > +``close()``: > + Removes the file handle as a Vm-Gen-Id watcher. (Linux doesn't have "file handles". Technically close(
[Bug 1891748] Re: qemu-arm-static 5.1 can't run gcc
Please take a look at this issue.

https://bugs.launchpad.net/bugs/1891748
Title: qemu-arm-static 5.1 can't run gcc
Status in QEMU: New
Bug description:
Issue discovered while trying to build pikvm (1). Long story short: when using qemu-arm-static 5.1, gcc exits with the message "Allocating guest commpage: Operation not permitted"; when using qemu-arm-static v5.0, gcc "works". Steps to reproduce will follow.
(1) https://github.com/pikvm/pikvm/blob/master/pages/building_os.md
Re: [PATCH v3 2/5] [RISCV_PM] Support CSRs required for RISC-V PM extension except for ones in hypervisor mode
On 10/16/20 3:11 PM, Alexey Baturo wrote: > Signed-off-by: Alexey Baturo > --- > target/riscv/cpu.c | 1 + > target/riscv/cpu.h | 11 ++ > target/riscv/cpu_bits.h | 66 ++ > target/riscv/csr.c | 264 > 4 files changed, 342 insertions(+) Acked-by: Richard Henderson I'd be delighted to see the J working group address the security concerns. And to address the fact that existing hardware will *not* read 0 for the *MTE CSRs, so it's silly to insist on that retroactively. Code should be explicitly checking for J in MISA. r~
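For context, "checking for J in MISA" just means testing one bit of the misa CSR: RISC-V encodes each standard extension as one bit per letter (bit 0 = A, ..., bit 9 = J), and this series gates its behaviour with riscv_has_ext(env, RVJ). The standalone snippet below only illustrates the bit arithmetic; the names and the example misa value are made up.

#include <stdbool.h>
#include <stdio.h>

#define EXT_BIT(c) (1u << ((c) - 'A'))   /* misa extension bit for letter c */

static bool has_ext(unsigned misa, unsigned ext)
{
    return (misa & ext) != 0;
}

int main(void)
{
    unsigned misa = EXT_BIT('I') | EXT_BIT('J');   /* example value only */
    printf("J extension present: %s\n",
           has_ext(misa, EXT_BIT('J')) ? "yes" : "no");
    return 0;
}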
Re: [PATCH v3 1/5] [RISCV_PM] Add J-extension into RISC-V
On 10/16/20 3:11 PM, Alexey Baturo wrote: > Signed-off-by: Alexey Baturo > --- > target/riscv/cpu.c | 4 > target/riscv/cpu.h | 2 ++ > 2 files changed, 6 insertions(+) Reviewed-by: Richard Henderson r~
[Bug 1891748] Re: qemu-arm-static 5.1 can't run gcc
Anyone? Seriously, the problem really exists and we even made a case that reproduces it. Someone please take a look at this.

https://bugs.launchpad.net/bugs/1891748
Title: qemu-arm-static 5.1 can't run gcc
Status in QEMU: New
Re: [PATCH v3 4/5] [RISCV_PM] Support pointer masking for RISC-V for i/c/f/d/a types of instructions
On 10/16/20 3:11 PM, Alexey Baturo wrote: > Signed-off-by: Alexey Baturo > --- > target/riscv/insn_trans/trans_rva.c.inc | 3 +++ > target/riscv/insn_trans/trans_rvd.c.inc | 2 ++ > target/riscv/insn_trans/trans_rvf.c.inc | 2 ++ > target/riscv/insn_trans/trans_rvi.c.inc | 2 ++ > target/riscv/translate.c| 14 ++ > 5 files changed, 23 insertions(+) This will need changes for RVV, but let's omit that for now, so as not to race with the in-flight update to rvv-1.0. Reviewed-by: Richard Henderson r~
Re: [PATCH v3 3/5] [RISCV_PM] Print new PM CSRs in QEMU logs
On 10/16/20 3:11 PM, Alexey Baturo wrote: > Signed-off-by: Alexey Baturo > --- > target/riscv/cpu.c | 19 +++ > 1 file changed, 19 insertions(+) Reviewed-by: Richard Henderson r~
Re: [PATCH v3 5/5] [RISCV_PM] Implement address masking functions required for RISC-V Pointer Masking extension
On 10/16/20 3:11 PM, Alexey Baturo wrote: > From: Anatoly Parshintsev > > Signed-off-by: Anatoly Parshintsev > --- > target/riscv/cpu.h | 19 +++ > target/riscv/translate.c | 34 -- > 2 files changed, 51 insertions(+), 2 deletions(-) Reviewed-by: Richard Henderson r~
Re: HTIF tohost symbol size check always fails
On Fri, Oct 16, 2020 at 2:35 PM Peer Adelt wrote: > > The solution was even easier: I forgot to load the proxy kernel. As soon as I > replaced the command-line parameter "-kernel " with "-kernel > -append ", everything was working as expected. Even better, you can skip the proxy kernel. You can run: `-bios default -kernel ` Alistair > > Without your hint about my possibly misconfigured toolchain I would have > probably continued to search for the error in the QEMU HTIF device. But in > fact it was due to the wrong binary. > > Thanks a lot! :-) > > > On 16. Oct 2020, at 20:03, Alistair Francis wrote: > > > > On Fri, Oct 16, 2020 at 7:59 AM Peer Adelt wrote: > >> > >> Hi, > >> > >> I have a problem with the RISC-V HTIF device. > >> > >> Every binary I have compiled for Spike on riscv32 fails with the following > >> error message: "HTIF tohost must be 8 bytes" > >> > >> This happens regardless of which program I have translated for Spike. This > >> is also the case with the official riscv-compliance tests, for example. > >> > >> The query "if (st_size != 8)" in the HTIF device always fails, because > >> st_size seems to be always 0. > >> > >> To be able to reproduce it: > >> - QEMU GIT Hash: d0ed6a69d399ae193959225cdeaa9382746c91cc (tag "v5.1.0") > > > > I just checked with this hash and with the current master and on both > > I can run a ELF executable on the Spike machine for RV32. > > > >> - System: Mac OS 10.14.6 (Darwin Kernel Version 18.7.0) > >> - Compiler: Latest SiFive Build for GCC under OSX > > > > Maybe try using an official toolchain instead of a vendor fork. > > > > Alistair > > > >> - Command: qemu-system-riscv32 -M spike -nographic -bios none -kernel > >> > >> > >> Best regards, > >> Peer Adelt >
Re: [PATCH v3 0/5] RISC-V Pointer Masking implementation
Patchew URL: https://patchew.org/QEMU/20201016221138.10371-1-space.monkey.deliv...@gmail.com/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20201016221138.10371-1-space.monkey.deliv...@gmail.com Subject: [PATCH v3 0/5] RISC-V Pointer Masking implementation === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu - [tag update] patchew/20201016210754.818257-1-richard.hender...@linaro.org -> patchew/20201016210754.818257-1-richard.hender...@linaro.org * [new tag] patchew/20201016221138.10371-1-space.monkey.deliv...@gmail.com -> patchew/20201016221138.10371-1-space.monkey.deliv...@gmail.com Switched to a new branch 'test' fb26b62 Implement address masking functions required for RISC-V Pointer Masking extension 1135d53 Support pointer masking for RISC-V for i/c/f/d/a types of instructions 760b894 Print new PM CSRs in QEMU logs c4c4e27 Support CSRs required for RISC-V PM extension except for ones in hypervisor mode 5981961 Add J-extension into RISC-V === OUTPUT BEGIN === 1/5 Checking commit 5981961fdb00 (Add J-extension into RISC-V) 2/5 Checking commit c4c4e27fe937 (Support CSRs required for RISC-V PM extension except for ones in hypervisor mode) WARNING: Block comments use a leading /* on a separate line #31: FILE: target/riscv/cpu.h:230: +/* CSRs for PM ERROR: trailing whitespace #146: FILE: target/riscv/csr.c:1258: +/* Functions to access Pointer Masking feature registers $ WARNING: Block comments use a leading /* on a separate line #146: FILE: target/riscv/csr.c:1258: +/* Functions to access Pointer Masking feature registers ERROR: line over 90 characters #153: FILE: target/riscv/csr.c:1265: +/* If priv lvls differ that means we're accessing csr from higher priv lvl, so allow */ WARNING: line over 80 characters #172: FILE: target/riscv/csr.c:1284: +/* We're in same priv lvl, so we allow to modify csr only if pm_current==1 */ ERROR: line over 90 characters #194: FILE: target/riscv/csr.c:1306: + "MMTE: WPRI violation written 0x%lx vs expected 0x%lx\n", val, wpri_val); ERROR: line over 90 characters #220: FILE: target/riscv/csr.c:1332: + "SMTE: WPRI violation written 0x%lx vs expected 0x%lx\n", val, wpri_val); ERROR: line over 90 characters #249: FILE: target/riscv/csr.c:1361: + "UMTE: WPRI violation written 0x%lx vs expected 0x%lx\n", val, wpri_val); total: 5 errors, 3 warnings, 382 lines checked Patch 2/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 
3/5 Checking commit 760b8944af78 (Print new PM CSRs in QEMU logs) WARNING: line over 80 characters #22: FILE: target/riscv/cpu.c:262: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "upmbase ", env->upmbase); WARNING: line over 80 characters #23: FILE: target/riscv/cpu.c:263: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "upmmask ", env->upmmask); WARNING: line over 80 characters #26: FILE: target/riscv/cpu.c:266: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "spmbase ", env->spmbase); WARNING: line over 80 characters #27: FILE: target/riscv/cpu.c:267: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "spmmask ", env->spmmask); WARNING: line over 80 characters #30: FILE: target/riscv/cpu.c:270: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mpmbase ", env->mpmbase); WARNING: line over 80 characters #31: FILE: target/riscv/cpu.c:271: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mpmmask ", env->mpmmask); total: 0 errors, 6 warnings, 25 lines checked Patch 3/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 4/5 Checking commit 1135d53a1565 (Support pointer masking for RISC-V for i/c/f/d/a types of instructions) 5/5 Checking commit fb26b62914a4 (Implement address masking functions required for RISC-V Pointer Masking extension) WARNING: line over 80 characters #110: FILE: target/riscv/translate.c:966: +pm_mask[PRV_U] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmmask), WARNING: line over 80 characters #112: FILE: target/riscv/translate.c:968: +pm_base[PRV_U] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmbase), WARNING: line over 80 characters #114: FILE: target/riscv/translate.c:970: +pm_mask[PRV_S] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, spmmask), WARNING: line over 80 characters #116: FILE: target/riscv/translate.c:972: +pm_base[PRV_S] = tcg_global_mem_new(cpu_env
[PATCH v3 4/5] [RISCV_PM] Support pointer masking for RISC-V for i/c/f/d/a types of instructions
Signed-off-by: Alexey Baturo --- target/riscv/insn_trans/trans_rva.c.inc | 3 +++ target/riscv/insn_trans/trans_rvd.c.inc | 2 ++ target/riscv/insn_trans/trans_rvf.c.inc | 2 ++ target/riscv/insn_trans/trans_rvi.c.inc | 2 ++ target/riscv/translate.c| 14 ++ 5 files changed, 23 insertions(+) diff --git a/target/riscv/insn_trans/trans_rva.c.inc b/target/riscv/insn_trans/trans_rva.c.inc index be8a9f06dd..5559e347ba 100644 --- a/target/riscv/insn_trans/trans_rva.c.inc +++ b/target/riscv/insn_trans/trans_rva.c.inc @@ -26,6 +26,7 @@ static inline bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop) if (a->rl) { tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); } +gen_pm_adjust_address(ctx, src1, src1); tcg_gen_qemu_ld_tl(load_val, src1, ctx->mem_idx, mop); if (a->aq) { tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); @@ -46,6 +47,7 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop) TCGLabel *l2 = gen_new_label(); gen_get_gpr(src1, a->rs1); +gen_pm_adjust_address(ctx, src1, src1); tcg_gen_brcond_tl(TCG_COND_NE, load_res, src1, l1); gen_get_gpr(src2, a->rs2); @@ -91,6 +93,7 @@ static bool gen_amo(DisasContext *ctx, arg_atomic *a, gen_get_gpr(src1, a->rs1); gen_get_gpr(src2, a->rs2); +gen_pm_adjust_address(ctx, src1, src1); (*func)(src2, src1, src2, ctx->mem_idx, mop); gen_set_gpr(a->rd, src2); diff --git a/target/riscv/insn_trans/trans_rvd.c.inc b/target/riscv/insn_trans/trans_rvd.c.inc index 4f832637fa..935342f66d 100644 --- a/target/riscv/insn_trans/trans_rvd.c.inc +++ b/target/riscv/insn_trans/trans_rvd.c.inc @@ -25,6 +25,7 @@ static bool trans_fld(DisasContext *ctx, arg_fld *a) TCGv t0 = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEQ); @@ -40,6 +41,7 @@ static bool trans_fsd(DisasContext *ctx, arg_fsd *a) TCGv t0 = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEQ); diff --git a/target/riscv/insn_trans/trans_rvf.c.inc b/target/riscv/insn_trans/trans_rvf.c.inc index 3dfec8211d..04b3c3eb3d 100644 --- a/target/riscv/insn_trans/trans_rvf.c.inc +++ b/target/riscv/insn_trans/trans_rvf.c.inc @@ -30,6 +30,7 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a) TCGv t0 = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL); gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]); @@ -47,6 +48,7 @@ static bool trans_fsw(DisasContext *ctx, arg_fsw *a) gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEUL); diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc index d04ca0394c..bee7f6be46 100644 --- a/target/riscv/insn_trans/trans_rvi.c.inc +++ b/target/riscv/insn_trans/trans_rvi.c.inc @@ -141,6 +141,7 @@ static bool gen_load(DisasContext *ctx, arg_lb *a, MemOp memop) TCGv t1 = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx, memop); gen_set_gpr(a->rd, t1); @@ -180,6 +181,7 @@ static bool gen_store(DisasContext *ctx, arg_sb *a, MemOp memop) TCGv dat = tcg_temp_new(); gen_get_gpr(t0, a->rs1); tcg_gen_addi_tl(t0, t0, a->imm); +gen_pm_adjust_address(ctx, t0, t0); gen_get_gpr(dat, a->rs2); 
tcg_gen_qemu_st_tl(dat, t0, ctx->mem_idx, memop); diff --git a/target/riscv/translate.c b/target/riscv/translate.c index 79dca2291b..a7cbf909f3 100644 --- a/target/riscv/translate.c +++ b/target/riscv/translate.c @@ -101,6 +101,16 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in) tcg_gen_ori_i64(out, in, MAKE_64BIT_MASK(32, 32)); } +/* + * Temp stub: generates address adjustment for PointerMasking + */ +static void gen_pm_adjust_address(DisasContext *s, + TCGv_i64 dst, + TCGv_i64 src) +{ +tcg_gen_mov_i64(dst, src); +} + /* * A narrow n-bit operation, where n < FLEN, checks that input operands * are correctly Nan-boxed, i.e., all upper FLEN - n bits are 1. @@ -380,6 +390,7 @@ static void gen_load_c(DisasContext *ctx, uint32_t opc, int rd, int rs1, TCGv t1 = tcg_temp_new(); gen_get_gpr(t0, rs1); tcg_gen_addi_tl(t0, t0, imm); +gen_pm_adjust_address(ctx, t0, t0); int memop = tcg_memop_lookup[(opc >> 12) & 0x7]; if (memop < 0) { @@ -400,6 +411,7 @@ static void gen_store_c(DisasContext *ctx, uint32_t opc,
[PATCH v3 3/5] [RISCV_PM] Print new PM CSRs in QEMU logs
Signed-off-by: Alexey Baturo --- target/riscv/cpu.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index d63031eb08..6ba3e98508 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -255,6 +255,25 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags) qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "htval ", env->htval); qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mtval2 ", env->mtval2); } +if (riscv_has_ext(env, RVJ)) { +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mmte", env->mmte); +switch (env->priv) { +case PRV_U: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "upmbase ", env->upmbase); +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "upmmask ", env->upmmask); +break; +case PRV_S: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "spmbase ", env->spmbase); +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "spmmask ", env->spmmask); +break; +case PRV_M: +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mpmbase ", env->mpmbase); +qemu_fprintf(f, " %s " TARGET_FMT_lx "\n", "mpmmask ", env->mpmmask); +break; +default: +assert(0 && "Unreachable"); +} +} #endif for (i = 0; i < 32; i++) { -- 2.20.1
[PATCH v3 5/5] [RISCV_PM] Implement address masking functions required for RISC-V Pointer Masking extension
From: Anatoly Parshintsev Signed-off-by: Anatoly Parshintsev --- target/riscv/cpu.h | 19 +++ target/riscv/translate.c | 34 -- 2 files changed, 51 insertions(+), 2 deletions(-) diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index 21e47b8283..a481c205c4 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -385,6 +385,7 @@ FIELD(TB_FLAGS, VL_EQ_VLMAX, 2, 1) FIELD(TB_FLAGS, LMUL, 3, 2) FIELD(TB_FLAGS, SEW, 5, 3) FIELD(TB_FLAGS, VILL, 8, 1) +FIELD(TB_FLAGS, PM_ENABLED, 9, 1) /* * A simplification for VLMAX @@ -431,6 +432,24 @@ static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc, if (riscv_cpu_fp_enabled(env)) { flags |= env->mstatus & MSTATUS_FS; } +if (riscv_has_ext(env, RVJ)) { +int priv = cpu_mmu_index(env, false); +bool pm_enabled = false; +switch (priv) { +case PRV_U: +pm_enabled = env->mmte & U_PM_ENABLE; +break; +case PRV_S: +pm_enabled = env->mmte & S_PM_ENABLE; +break; +case PRV_M: +pm_enabled = env->mmte & M_PM_ENABLE; +break; +default: +g_assert_not_reached(); +} +flags = FIELD_DP32(flags, TB_FLAGS, PM_ENABLED, pm_enabled); +} #endif *pflags = flags; } diff --git a/target/riscv/translate.c b/target/riscv/translate.c index a7cbf909f3..2cfc8fd26d 100644 --- a/target/riscv/translate.c +++ b/target/riscv/translate.c @@ -36,6 +36,9 @@ static TCGv cpu_gpr[32], cpu_pc, cpu_vl; static TCGv_i64 cpu_fpr[32]; /* assume F and D extensions */ static TCGv load_res; static TCGv load_val; +/* globals for PM CSRs */ +static TCGv pm_mask[4]; +static TCGv pm_base[4]; #include "exec/gen-icount.h" @@ -63,6 +66,10 @@ typedef struct DisasContext { uint16_t vlen; uint16_t mlen; bool vl_eq_vlmax; +/* PointerMasking extension */ +bool pm_enabled; +TCGv pm_mask; +TCGv pm_base; } DisasContext; #ifdef TARGET_RISCV64 @@ -102,13 +109,19 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in) } /* - * Temp stub: generates address adjustment for PointerMasking + * Generates address adjustment for PointerMasking */ static void gen_pm_adjust_address(DisasContext *s, TCGv_i64 dst, TCGv_i64 src) { -tcg_gen_mov_i64(dst, src); +if (!s->pm_enabled) { +/* Load unmodified address */ +tcg_gen_mov_i64(dst, src); +} else { +tcg_gen_andc_i64(dst, src, s->pm_mask); +tcg_gen_or_i64(dst, dst, s->pm_base); +} } /* @@ -826,6 +839,10 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) ctx->lmul = FIELD_EX32(tb_flags, TB_FLAGS, LMUL); ctx->mlen = 1 << (ctx->sew + 3 - ctx->lmul); ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX); +ctx->pm_enabled = FIELD_EX32(tb_flags, TB_FLAGS, PM_ENABLED); +int priv = cpu_mmu_index(env, false); +ctx->pm_mask = pm_mask[priv]; +ctx->pm_base = pm_base[priv]; } static void riscv_tr_tb_start(DisasContextBase *db, CPUState *cpu) @@ -945,4 +962,17 @@ void riscv_translate_init(void) "load_res"); load_val = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, load_val), "load_val"); +/* Assign PM CSRs to tcg globals */ +pm_mask[PRV_U] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmmask), +"upmmask"); +pm_base[PRV_U] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmbase), +"upmbase"); +pm_mask[PRV_S] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, spmmask), +"spmmask"); +pm_base[PRV_S] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, spmbase), +"spmbase"); +pm_mask[PRV_M] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, mpmmask), +"mpmmask"); +pm_base[PRV_M] = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, mpmbase), +"mpmbase"); } -- 2.20.1
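For readers new to the extension, the address adjustment gen_pm_adjust_address() now emits (andc with the per-privilege mask, then or with the base) is equivalent to the plain-C expression below. The mask, base, and address values are made up purely for illustration; the real values come from the upm/spm/mpm CSRs added earlier in the series.

#include <stdint.h>
#include <stdio.h>

/* Same arithmetic as the tcg_gen_andc_i64 / tcg_gen_or_i64 pair. */
static uint64_t pm_adjust(uint64_t addr, uint64_t pmmask, uint64_t pmbase)
{
    return (addr & ~pmmask) | pmbase;
}

int main(void)
{
    uint64_t pmmask = 0xffff000000000000ULL;   /* example: ignore top 16 bits */
    uint64_t pmbase = 0x0000000000000000ULL;
    uint64_t tagged = 0xdead00000badc0deULL;   /* example tagged pointer      */

    printf("%#llx -> %#llx\n",
           (unsigned long long)tagged,
           (unsigned long long)pm_adjust(tagged, pmmask, pmbase));
    /* prints: 0xdead00000badc0de -> 0xbadc0de (tag bits stripped) */
    return 0;
}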
[PATCH v3 2/5] [RISCV_PM] Support CSRs required for RISC-V PM extension except for ones in hypervisor mode
Signed-off-by: Alexey Baturo --- target/riscv/cpu.c | 1 + target/riscv/cpu.h | 11 ++ target/riscv/cpu_bits.h | 66 ++ target/riscv/csr.c | 264 4 files changed, 342 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index fe6bab4a52..d63031eb08 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -440,6 +440,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp) } if (cpu->cfg.ext_j) { target_misa |= RVJ; +env->mmte |= PM_EXT_INITIAL; } if (cpu->cfg.ext_v) { target_misa |= RVV; diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index eca611a367..21e47b8283 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -226,6 +226,17 @@ struct CPURISCVState { /* True if in debugger mode. */ bool debugger; + +/* CSRs for PM + * TODO: move these csr to appropriate groups + */ +target_ulong mmte; +target_ulong mpmmask; +target_ulong mpmbase; +target_ulong spmmask; +target_ulong spmbase; +target_ulong upmmask; +target_ulong upmbase; #endif float_status fp_status; diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h index bd36062877..84c93c77ae 100644 --- a/target/riscv/cpu_bits.h +++ b/target/riscv/cpu_bits.h @@ -354,6 +354,21 @@ #define CSR_MHPMCOUNTER30H 0xb9e #define CSR_MHPMCOUNTER31H 0xb9f +/* Custom user register */ +#define CSR_UMTE0x8c0 +#define CSR_UPMMASK 0x8c1 +#define CSR_UPMBASE 0x8c2 + +/* Custom machine register */ +#define CSR_MMTE0x7c0 +#define CSR_MPMMASK 0x7c1 +#define CSR_MPMBASE 0x7c2 + +/* Custom supervisor register */ +#define CSR_SMTE0x9c0 +#define CSR_SPMMASK 0x9c1 +#define CSR_SPMBASE 0x9c2 + /* Legacy Machine Protection and Translation (priv v1.9.1) */ #define CSR_MBASE 0x380 #define CSR_MBOUND 0x381 @@ -604,4 +619,55 @@ #define MIE_UTIE (1 << IRQ_U_TIMER) #define MIE_SSIE (1 << IRQ_S_SOFT) #define MIE_USIE (1 << IRQ_U_SOFT) + +/* general mte CSR bits*/ +#define PM_ENABLE 0x0001ULL +#define PM_CURRENT 0x0002ULL +#define PM_XS_MASK 0x0003ULL + +/* PM XS bits values */ +#define PM_EXT_DISABLE 0xULL +#define PM_EXT_INITIAL 0x0001ULL +#define PM_EXT_CLEAN0x0002ULL +#define PM_EXT_DIRTY0x0003ULL + +/* offsets for every pair of control bits per each priv level */ +#define XS_OFFSET0ULL +#define U_OFFSET 2ULL +#define S_OFFSET 4ULL +#define M_OFFSET 6ULL + +#define PM_XS_BITS (PM_XS_MASK << XS_OFFSET) +#define U_PM_ENABLE (PM_ENABLE << U_OFFSET) +#define U_PM_CURRENT (PM_CURRENT << U_OFFSET) +#define S_PM_ENABLE (PM_ENABLE << S_OFFSET) +#define S_PM_CURRENT (PM_CURRENT << S_OFFSET) +#define M_PM_ENABLE (PM_ENABLE << M_OFFSET) + +/* mmte CSR bits */ +#define MMTE_PM_XS_BITS PM_XS_BITS +#define MMTE_U_PM_ENABLEU_PM_ENABLE +#define MMTE_U_PM_CURRENT U_PM_CURRENT +#define MMTE_S_PM_ENABLES_PM_ENABLE +#define MMTE_S_PM_CURRENT S_PM_CURRENT +#define MMTE_M_PM_ENABLEM_PM_ENABLE +#define MMTE_MASK (MMTE_U_PM_ENABLE | MMTE_U_PM_CURRENT | \ + MMTE_S_PM_ENABLE | MMTE_S_PM_CURRENT | \ + MMTE_M_PM_ENABLE | MMTE_PM_XS_BITS) + +/* smte CSR bits */ +#define SMTE_PM_XS_BITS PM_XS_BITS +#define SMTE_U_PM_ENABLEU_PM_ENABLE +#define SMTE_U_PM_CURRENT U_PM_CURRENT +#define SMTE_S_PM_ENABLES_PM_ENABLE +#define SMTE_S_PM_CURRENT S_PM_CURRENT +#define SMTE_MASK (SMTE_U_PM_ENABLE | SMTE_U_PM_CURRENT | \ + SMTE_S_PM_ENABLE | SMTE_S_PM_CURRENT | \ + SMTE_PM_XS_BITS) + +/* umte CSR bits */ +#define UMTE_U_PM_ENABLEU_PM_ENABLE +#define UMTE_U_PM_CURRENT U_PM_CURRENT +#define UMTE_MASK (UMTE_U_PM_ENABLE | MMTE_U_PM_CURRENT) + #endif diff --git a/target/riscv/csr.c b/target/riscv/csr.c index aaef6c6f20..8679b5c80c 100644 --- a/target/riscv/csr.c +++ 
b/target/riscv/csr.c @@ -140,6 +140,11 @@ static int any(CPURISCVState *env, int csrno) return 0; } +static int umode(CPURISCVState *env, int csrno) +{ +return -!riscv_has_ext(env, RVU); +} + static int smode(CPURISCVState *env, int csrno) { return -!riscv_has_ext(env, RVS); @@ -1250,6 +1255,250 @@ static int write_pmpaddr(CPURISCVState *env, int csrno, target_ulong val) return 0; } +/* Functions to access Pointer Masking feature registers + * We have to check if current priv lvl could modify + * csr in given mode + */ +static int check_pm_current_disabled(CPURISCVState *env, int csrno) +{ +int csr_priv = get_field(csrno, 0xC00); +/* If priv lvls differ that means we're accessing csr from higher priv lvl, so allow */ +if (env->priv != c
[PATCH v3 0/5] RISC-V Pointer Masking implementation
Hi folks,

This is the third iteration of patches to support Pointer Masking for RISC-V. Most of the suggestions have been addressed; however, some of them were not:
- applying mask for return value while reading PM CSR has been kept to mask higher priv level bits
- check_pm_current_disabled is not placed into CSR predicate, since the spec expects to read zero from PM CSR if no PM extension is present

Thanks

Alexey Baturo (4):
[RISCV_PM] Add J-extension into RISC-V
[RISCV_PM] Support CSRs required for RISC-V PM extension except for ones in hypervisor mode
[RISCV_PM] Print new PM CSRs in QEMU logs
[RISCV_PM] Support pointer masking for RISC-V for i/c/f/d/a types of instructions

Anatoly Parshintsev (1):
[RISCV_PM] Implement address masking functions required for RISC-V Pointer Masking extension

target/riscv/cpu.c | 24 +++
target/riscv/cpu.h | 32 +++
target/riscv/cpu_bits.h | 66 ++
target/riscv/csr.c | 264
target/riscv/insn_trans/trans_rva.c.inc | 3 +
target/riscv/insn_trans/trans_rvd.c.inc | 2 +
target/riscv/insn_trans/trans_rvf.c.inc | 2 +
target/riscv/insn_trans/trans_rvi.c.inc | 2 +
target/riscv/translate.c | 44
9 files changed, 439 insertions(+)
-- 2.20.1
[PATCH v3 1/5] [RISCV_PM] Add J-extension into RISC-V
Signed-off-by: Alexey Baturo --- target/riscv/cpu.c | 4 target/riscv/cpu.h | 2 ++ 2 files changed, 6 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index 0bbfd7f457..fe6bab4a52 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -438,6 +438,9 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp) if (cpu->cfg.ext_h) { target_misa |= RVH; } +if (cpu->cfg.ext_j) { +target_misa |= RVJ; +} if (cpu->cfg.ext_v) { target_misa |= RVV; if (!is_power_of_2(cpu->cfg.vlen)) { @@ -516,6 +519,7 @@ static Property riscv_cpu_properties[] = { DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true), /* This is experimental so mark with 'x-' */ DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false), +DEFINE_PROP_BOOL("x-j", RISCVCPU, cfg.ext_j, false), DEFINE_PROP_BOOL("x-v", RISCVCPU, cfg.ext_v, false), DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true), DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true), diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h index de275782e6..eca611a367 100644 --- a/target/riscv/cpu.h +++ b/target/riscv/cpu.h @@ -66,6 +66,7 @@ #define RVS RV('S') #define RVU RV('U') #define RVH RV('H') +#define RVJ RV('J') /* S extension denotes that Supervisor mode exists, however it is possible to have a core that support S mode but does not have an MMU and there @@ -277,6 +278,7 @@ struct RISCVCPU { bool ext_s; bool ext_u; bool ext_h; +bool ext_j; bool ext_v; bool ext_counters; bool ext_ifencei; -- 2.20.1
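With this patch applied, the extension stays disabled unless the new experimental property is turned on explicitly. Assuming the generic rv64 CPU model name, enabling it would look roughly like:

    qemu-system-riscv64 -machine virt -cpu rv64,x-j=true ...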
Re: [PATCH v2 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
On Fri, 16 Oct 2020 at 22:07, Richard Henderson wrote: > > On ARM, the Top Byte Ignore feature means that only 56 bits of > the address are significant in the virtual address. We are > required to give the entire 64-bit address to FAR_ELx on fault, > which means that we do not "clean" the top byte early in TCG. > > This new interface allows us to flush all 256 possible aliases > for a given page, currently missed by tlb_flush_page*. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: HTIF tohost symbol size check always fails
The solution was even easier: I forgot to load the proxy kernel. As soon as I replaced the command-line parameter "-kernel " with "-kernel -append ", everything was working as expected. Without your hint about my possibly misconfigured toolchain I would have probably continued to search for the error in the QEMU HTIF device. But in fact it was due to the wrong binary. Thanks a lot! :-) > On 16. Oct 2020, at 20:03, Alistair Francis wrote: > > On Fri, Oct 16, 2020 at 7:59 AM Peer Adelt wrote: >> >> Hi, >> >> I have a problem with the RISC-V HTIF device. >> >> Every binary I have compiled for Spike on riscv32 fails with the following >> error message: "HTIF tohost must be 8 bytes" >> >> This happens regardless of which program I have translated for Spike. This >> is also the case with the official riscv-compliance tests, for example. >> >> The query "if (st_size != 8)" in the HTIF device always fails, because >> st_size seems to be always 0. >> >> To be able to reproduce it: >> - QEMU GIT Hash: d0ed6a69d399ae193959225cdeaa9382746c91cc (tag "v5.1.0") > > I just checked with this hash and with the current master and on both > I can run a ELF executable on the Spike machine for RV32. > >> - System: Mac OS 10.14.6 (Darwin Kernel Version 18.7.0) >> - Compiler: Latest SiFive Build for GCC under OSX > > Maybe try using an official toolchain instead of a vendor fork. > > Alistair > >> - Command: qemu-system-riscv32 -M spike -nographic -bios none -kernel >> >> >> Best regards, >> Peer Adelt
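For anyone who hits the same message with a genuinely bare-metal binary (no proxy kernel), the HTIF check only passes when tohost/fromhost are real 8-byte objects in the ELF symbol table, so their st_size is 8 rather than 0. A minimal C sketch (section placement and alignment depend on the linker script, so treat this as illustrative):

    #include <stdint.h>

    /* Exported HTIF mailboxes; defining them as 8-byte objects gives
     * them st_size == 8, which is what the QEMU HTIF device checks. */
    volatile uint64_t tohost;
    volatile uint64_t fromhost;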
[PATCH v2 1/2] accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
On ARM, the Top Byte Ignore feature means that only 56 bits of the address are significant in the virtual address. We are required to give the entire 64-bit address to FAR_ELx on fault, which means that we do not "clean" the top byte early in TCG. This new interface allows us to flush all 256 possible aliases for a given page, currently missed by tlb_flush_page*. Signed-off-by: Richard Henderson --- include/exec/exec-all.h | 36 ++ accel/tcg/cputlb.c | 275 ++-- 2 files changed, 302 insertions(+), 9 deletions(-) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 66f9b4cca6..4707ac140c 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -251,6 +251,25 @@ void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, uint16_t idxmap); * depend on when the guests translation ends the TB. */ void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap); + +/** + * tlb_flush_page_bits_by_mmuidx + * @cpu: CPU whose TLB should be flushed + * @addr: virtual address of page to be flushed + * @idxmap: bitmap of mmu indexes to flush + * @bits: number of significant bits in address + * + * Similar to tlb_flush_page_mask, but with a bitmap of indexes. + */ +void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr, + uint16_t idxmap, unsigned bits); + +/* Similarly, with broadcast and syncing. */ +void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr, +uint16_t idxmap, unsigned bits); +void tlb_flush_page_bits_by_mmuidx_all_cpus_synced +(CPUState *cpu, target_ulong addr, uint16_t idxmap, unsigned bits); + /** * tlb_set_page_with_attrs: * @cpu: CPU to add this TLB entry for @@ -337,6 +356,23 @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap) { } +static inline void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, + target_ulong addr, + uint16_t idxmap, + unsigned bits) +{ +} +static inline void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, + target_ulong addr, + uint16_t idxmap, + unsigned bits) +{ +} +static inline void +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, target_ulong addr, + uint16_t idxmap, unsigned bits) +{ +} #endif /** * probe_access: diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 23ab29..42ab79c1a5 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -409,12 +409,21 @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu) tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, ALL_MMUIDX_BITS); } +static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry, + target_ulong page, target_ulong mask) +{ +page &= mask; +mask &= TARGET_PAGE_MASK | TLB_INVALID_MASK; + +return (page == (tlb_entry->addr_read & mask) || +page == (tlb_addr_write(tlb_entry) & mask) || +page == (tlb_entry->addr_code & mask)); +} + static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry, target_ulong page) { -return tlb_hit_page(tlb_entry->addr_read, page) || - tlb_hit_page(tlb_addr_write(tlb_entry), page) || - tlb_hit_page(tlb_entry->addr_code, page); +return tlb_hit_page_mask_anyprot(tlb_entry, page, -1); } /** @@ -427,31 +436,45 @@ static inline bool tlb_entry_is_empty(const CPUTLBEntry *te) } /* Called with tlb_c.lock held */ -static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry, - target_ulong page) +static bool tlb_flush_entry_mask_locked(CPUTLBEntry *tlb_entry, +target_ulong page, +target_ulong mask) { -if (tlb_hit_page_anyprot(tlb_entry, page)) { +if (tlb_hit_page_mask_anyprot(tlb_entry, page, mask)) { memset(tlb_entry, -1, sizeof(*tlb_entry)); return true; } 
return false; } +static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry, + target_ulong page) +{ +return tlb_flush_entry_mask_locked(tlb_entry, page, -1); +} + /* Called with tlb_c.lock held */ -static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx, - target_ulong page) +static void tlb_flush_vtlb_page_mask_locked(CPUArchState *env, int mmu_idx, +target_ulong page, +
[PATCH v2 2/2] target/arm: Use tlb_flush_page_bits_by_mmuidx*
When TBI is enabled in a given regime, 56 bits of the address are significant and we need to clear out any other matching virtual addresses with differing tags. The other uses of tlb_flush_page (without mmuidx) in this file are only used by aarch32 mode. Fixes: 38d931687fa1 Reported-by: Jordan Frank Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper.c | 46 ++--- 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index cd0779ff5f..f49b045d36 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -50,6 +50,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address, #endif static void switch_mode(CPUARMState *env, int mode); +static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx); static int vfp_gdb_get_reg(CPUARMState *env, GByteArray *buf, int reg) { @@ -4457,6 +4458,33 @@ static int vae1_tlbmask(CPUARMState *env) } } +/* Return 56 if TBI is enabled, 64 otherwise. */ +static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, + uint64_t addr) +{ +uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr; +int tbi = aa64_va_parameter_tbi(tcr, mmu_idx); +int select = extract64(addr, 55, 1); + +return (tbi >> select) & 1 ? 56 : 64; +} + +static int vae1_tlbbits(CPUARMState *env, uint64_t addr) +{ +ARMMMUIdx mmu_idx; + +/* Only the regime of the mmu_idx below is significant. */ +if (arm_is_secure_below_el3(env)) { +mmu_idx = ARMMMUIdx_SE10_0; +} else if ((env->cp15.hcr_el2 & (HCR_E2H | HCR_TGE)) + == (HCR_E2H | HCR_TGE)) { +mmu_idx = ARMMMUIdx_E20_0; +} else { +mmu_idx = ARMMMUIdx_E10_0; +} +return tlbbits_for_regime(env, mmu_idx, addr); +} + static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -4593,8 +4621,9 @@ static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, CPUState *cs = env_cpu(env); int mask = vae1_tlbmask(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); +int bits = vae1_tlbbits(env, pageaddr); -tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask); +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); } static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, @@ -4608,11 +4637,12 @@ static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, CPUState *cs = env_cpu(env); int mask = vae1_tlbmask(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); +int bits = vae1_tlbbits(env, pageaddr); if (tlb_force_broadcast(env)) { -tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask); +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); } else { -tlb_flush_page_by_mmuidx(cs, pageaddr, mask); +tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits); } } @@ -4621,9 +4651,10 @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, { CPUState *cs = env_cpu(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); +int bits = tlbbits_for_regime(env, ARMMMUIdx_E2, pageaddr); -tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, - ARMMMUIdxBit_E2); +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, + ARMMMUIdxBit_E2, bits); } static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, @@ -4631,9 +4662,10 @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, { CPUState *cs = env_cpu(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); +int bits = tlbbits_for_regime(env, ARMMMUIdx_SE3, pageaddr); 
-tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, - ARMMMUIdxBit_SE3); +tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, + ARMMMUIdxBit_SE3, bits); } static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri, -- 2.25.1
[PATCH v2 0/2] target/arm: Fix tlb flush page vs tbi
Since the FAR_ELx fix at 38d931687fa1, it is reported that page granularity flushing is broken. This makes sense, since TCG will record the entire virtual address in its TLB, not simply the 56 significant bits. With no other TCG support, the ARM backend should require 256 different page flushes to clear the virtual address of any possible tag.

So I added a new tcg interface that allows passing the size of the virtual address. I thought a simple bit-count was a cleaner interface than passing in a mask, since it means that we couldn't be passed nonsensical masks like 0xdeadbeef. It also makes it easy to re-direct special cases.

Changes for v2:
* Add encode_pbm_to_runon/decode_runon_to_pbm helpers (pmm).

r~

Richard Henderson (2):
accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
target/arm: Use tlb_flush_page_bits_by_mmuidx*

include/exec/exec-all.h | 36 ++
accel/tcg/cputlb.c | 275 ++--
target/arm/helper.c | 46 ++-
3 files changed, 341 insertions(+), 16 deletions(-)
-- 2.25.1
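As a sketch of what the bit-count turns into inside the flush path (illustrative, not the exact cputlb.c code), the count simply selects how much of the virtual address is compared, which is why an arbitrary mask can never sneak in:

    #include <stdint.h>

    /* 64 means "all address bits are significant"; 56 under TBI, so the
     * top-byte tag is ignored and one call covers all 256 tagged aliases. */
    static uint64_t addr_mask_from_bits(unsigned bits)
    {
        return bits >= 64 ? ~(uint64_t)0 : ((uint64_t)1 << bits) - 1;
    }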
Re: [PATCH] configure: actually disable 'git_update' mode with --disable-git-update
On Fri, Oct 2, 2020 at 9:11 AM Daniel P. Berrangé wrote: > > On Wed, Sep 30, 2020 at 09:28:54PM -0400, Dan Streetman wrote: > > On Tue, Sep 22, 2020 at 12:34 PM Daniel P. Berrangé > > wrote: > > > > > > On Wed, Jul 29, 2020 at 03:58:29PM -0400, Dan Streetman wrote: > > > > The --disable-git-update configure param sets git_update=no, but > > > > some later checks only look for the .git dir. This changes the > > > > --enable-git-update to set git_update=yes but also fail if it > > > > does not find a .git dir. Then all the later checks for the .git > > > > dir can just be changed to a check for $git_update = "yes". > > > > > > > > Also update the Makefile to skip the 'git_update' checks if it has > > > > been disabled. > > > > > > > > This is needed because downstream packagers, e.g. Debian, Ubuntu, etc, > > > > also keep the source code in git, but do not want to enable the > > > > 'git_update' mode; with the current code, that's not possible even > > > > if the downstream package specifies --disable-git-update. > > > > > > Lets recap the original intended behaviour > > > > > > 1. User building from master qemu.git or a fork > > > a) git_update=yes (default) > > > - Always sync submodules to required commit > > > > > > b) git_update=no (--disable-git-update) > > > - Never sync submodules, user is responsible for sync > > > - Validate submodules are at correct commit and fail if not. > > > > > > 2. User building from tarball > > > - Never do anything git related at all > > > > > > > > > Your change is removing the validation from 1.b). > > > > Would you accept adding a --disable-git-submodules-check so downstream > > packagers can keep the source in git, but avoid the submodule > > validation? > > It feels like the current option shouldn't be a boolean, rather > a tri-state--with-git-submodules=[update|validate|ignore] > Ok, I updated the patch to do that: https://lists.nongnu.org/archive/html/qemu-devel/2020-10/msg04799.html > > Or do you have another suggestion for handling this? > > Assuming you're just using git for conveniently applying local > downstream patches, you don't need the git repo to exist once > getting to the build stage. IOW just delete the .git dir after > applying patches before running a build. > > > Regards, > Daniel > -- > |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o-https://fstop138.berrange.com :| > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| >
[PATCH] configure: replace --enable/disable-git-update with --with-git-submodules
Replace the --enable-git-update and --disable-git-update configure params with the param --with-git-submodules=(update|validate|ignore) to allow 3 options for building from a git repo. This is needed because downstream packagers, e.g. Debian, Ubuntu, etc, also keep the source code in git, but do not want to enable the 'git_update' mode; with the current code, that's not possible even if the downstream package specifies --disable-git-update. Signed-off-by: Dan Streetman --- Makefile | 26 ++- configure| 55 +--- scripts/git-submodule.sh | 34 +++-- 3 files changed, 62 insertions(+), 53 deletions(-) diff --git a/Makefile b/Makefile index c37e513431..033455dc8f 100644 --- a/Makefile +++ b/Makefile @@ -34,33 +34,11 @@ ifneq ($(wildcard config-host.mak),) all: include config-host.mak -git-submodule-update: - .PHONY: git-submodule-update - -git_module_status := $(shell \ - cd '$(SRC_PATH)' && \ - GIT="$(GIT)" ./scripts/git-submodule.sh status $(GIT_SUBMODULES); \ - echo $$?; \ -) - -ifeq (1,$(git_module_status)) -ifeq (no,$(GIT_UPDATE)) git-submodule-update: $(call quiet-command, \ -echo && \ -echo "GIT submodule checkout is out of date. Please run" && \ -echo " scripts/git-submodule.sh update $(GIT_SUBMODULES)" && \ -echo "from the source directory checkout $(SRC_PATH)" && \ -echo && \ -exit 1) -else -git-submodule-update: - $(call quiet-command, \ - (cd $(SRC_PATH) && GIT="$(GIT)" ./scripts/git-submodule.sh update $(GIT_SUBMODULES)), \ - "GIT","$(GIT_SUBMODULES)") -endif -endif + (GIT="$(GIT)" "$(SRC_PATH)/scripts/git-submodule.sh" $(GIT_SUBMODULES_ACTION) $(GIT_SUBMODULES)), \ + "GIT","$(GIT_SUBMODULES)") export NINJA=./ninjatool diff --git a/configure b/configure index f839c2a557..c5df778790 100755 --- a/configure +++ b/configure @@ -249,12 +249,12 @@ gdb_bin=$(command -v "gdb-multiarch" || command -v "gdb") if test -e "$source_path/.git" then -git_update=yes +git_submodules_action="update" git_submodules="ui/keycodemapdb" git_submodules="$git_submodules tests/fp/berkeley-testfloat-3" git_submodules="$git_submodules tests/fp/berkeley-softfloat-3" else -git_update=no +git_submodules_action="ignore" git_submodules="" if ! test -f "$source_path/ui/keycodemapdb/README" @@ -1478,9 +1478,16 @@ for opt do ;; --with-git=*) git="$optarg" ;; - --enable-git-update) git_update=yes + --enable-git-update) + git_submodules_action="update" + echo "--enable-git-update deprecated, use --with-git-submodules=update" ;; - --disable-git-update) git_update=no + --disable-git-update) + git_submodules_action="validate" + echo "--disable-git-update deprecated, use --with-git-submodules=validate" + ;; + --with-git-submodules=*) + git_submodules_action="$optarg" ;; --enable-debug-mutex) debug_mutex=yes ;; @@ -1528,6 +1535,20 @@ for opt do esac done +case $git_submodules_action in +update|validate) +if test ! 
-e "$source_path/.git"; then +echo "ERROR: cannot $git_submodules_action git submodules without .git" +exit 1 +fi +;; +ignore) ;; +*) +echo "ERROR: invalid --with-git-submodules= value '$git_submodules_action'" +exit 1 +;; +esac + firmwarepath="${firmwarepath:-$prefix/share/qemu-firmware}" libdir="${libdir:-$prefix/lib}" libexecdir="${libexecdir:-$prefix/libexec}" @@ -1868,7 +1889,7 @@ python="$python -B" if test -z "$meson"; then if test "$explicit_python" = no && has meson && version_ge "$(meson --version)" 0.55.1; then meson=meson -elif test -e "${source_path}/.git" && test $git_update = 'yes' ; then +elif test $git_submodules_action != 'ignore' ; then meson=git elif test -e "${source_path}/meson/meson.py" ; then meson=internal @@ -1936,7 +1957,7 @@ fi # Consult white-list to determine whether to enable werror # by default. Only enable by default for git builds if test -z "$werror" ; then -if test -e "$source_path/.git" && \ +if test "$git_submodules_action" != "ignore" && \ { test "$linux" = "yes" || test "$mingw32" = "yes"; }; then werror="yes" else @@ -3824,9 +3845,7 @@ fi case "$fdt" in auto | enabled | internal) # Simpler to always update submodule, even if not needed. -if test -e "${source_path}/.git" && test $git_update = 'yes' ; then - git_submodules="${git_submodules} dtc" -fi +git_submodules="${git_submodules} dtc" ;; esac @@ -4696,9 +4715,7 @@ fi case "$capstone" in auto | enabled | internal) # Simpler to always update submodule, even if not needed. -if test -e "${source_path}/.git" && test $git_update = 'yes' ; then - git_submodules="${git_submodul
Re: [PATCH 4/5] ppc405_boards: use qdev properties instead of legacy m48t59_init() function
On Fri, 16 Oct 2020, Mark Cave-Ayland wrote: Signed-off-by: Mark Cave-Ayland --- hw/ppc/ppc405_boards.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c index 6198ec1035..4687715b15 100644 --- a/hw/ppc/ppc405_boards.c +++ b/hw/ppc/ppc405_boards.c @@ -28,6 +28,8 @@ #include "qemu-common.h" #include "cpu.h" #include "hw/ppc/ppc.h" +#include "hw/qdev-properties.h" +#include "hw/sysbus.h" #include "ppc405.h" #include "hw/rtc/m48t59.h" #include "hw/block/flash.h" @@ -145,6 +147,8 @@ static void ref405ep_init(MachineState *machine) char *filename; ppc4xx_bd_info_t bd; CPUPPCState *env; +DeviceState *dev; +SysBusDevice *s; qemu_irq *pic; MemoryRegion *bios; MemoryRegion *sram = g_new(MemoryRegion, 1); @@ -227,7 +231,11 @@ static void ref405ep_init(MachineState *machine) /* Register FPGA */ ref405ep_fpga_init(sysmem, 0xF030); /* Register NVRAM */ -m48t59_init(NULL, 0xF000, 0, 8192, 1968, 8); +dev = qdev_new("sysbus-m48t08"); +qdev_prop_set_int32(dev, "base-year", 1968); Is there anything that uses other than 1968 as base year? If not this could be the default in the device so you don't need these set prop calls here and in sun machines. The only other place this device is used seems to be ppc/prep machine that uses the isa version but does not set a base year. Is that a bug? The original prep machine removed in b2ce76a0730 used 2000 but that's unlikely as well as these machines predate that. Anyway, the sysbus and isa versions are different so their default base-year could be set independently and then boards won't need to set this propery and may be could use qdev_create_simple instead or whatever that was renamed to. Regards, BALATON Zoltan +s = SYS_BUS_DEVICE(dev); +sysbus_realize_and_unref(s, &error_fatal); +sysbus_mmio_map(s, 0, 0xF000); /* Load kernel */ linux_boot = (kernel_filename != NULL); if (linux_boot) {
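A hedged sketch of that suggestion (struct and field names assumed from hw/rtc/m48t59.c, not verified here): making 1968 the property default in the sysbus device would let boards that are happy with it drop the qdev_prop_set_int32() call entirely.

    /* Hypothetical: default base-year in the sysbus device's property list. */
    static Property m48t59_sysbus_properties[] = {
        DEFINE_PROP_INT32("base-year", M48txxSysBusState, state.base_year, 1968),
        DEFINE_PROP_END_OF_LIST(),
    };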
Re: [PATCH 0/5] m48t59: remove legacy init functions
Le 16/10/2020 à 20:27, Mark Cave-Ayland a écrit : This patchset is inspired by Philippe's "hw/rtc/m48t59: Simplify m48t59_init()" patchset at https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg04493.html but goes further: rather than tidy-up the legacy init functions, convert the callers to use qdev properties directly so they can simply be removed. Signed-off-by: Mark Cave-Ayland Mark Cave-Ayland (5): m48t59-isa: remove legacy m48t59_init_isa() function sun4m: use qdev properties instead of legacy m48t59_init() function sun4u: use qdev properties instead of legacy m48t59_init() function ppc405_boards: use qdev properties instead of legacy m48t59_init() function m48t59: remove legacy m48t59_init() function hw/ppc/ppc405_boards.c | 10 +- hw/rtc/m48t59-isa.c | 25 - hw/rtc/m48t59.c | 35 --- hw/sparc/sun4m.c| 8 +++- hw/sparc64/sun4u.c | 7 +-- include/hw/rtc/m48t59.h | 6 -- 6 files changed, 21 insertions(+), 70 deletions(-) Reviewed-by: Hervé Poussineau
Re: [PATCH v2 4/5] target/mips: Refactor helpers for fp comparison instructions
On 10/9/20 4:47 PM, Philippe Mathieu-Daudé wrote: Hi Aleksandar, On 10/7/20 10:37 PM, Aleksandar Markovic wrote: This change causes slighlty better performance of emulation of fp comparison instructions via better compiler optimization of refactored code. The functionality is otherwise unchanged. Signed-off-by: Aleksandar Markovic --- target/mips/fpu_helper.c | 56 +++- 1 file changed, 32 insertions(+), 24 deletions(-) [...] /* @@ -2080,12 +2088,12 @@ uint64_t helper_r6_cmp_d_ ## op(CPUMIPSState *env, uint64_t fdt0, \ { \ uint64_t c; \ c = cond; \ - update_fcr31(env, GETPC()); \ if (c) { \ return -1; \ } else { \ return 0; \ } \ + update_fcr31(env, GETPC()); \ Isn't it now never called (dead code)? Confirmed: target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_af’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2103:1: note: in expansion of macro ‘FOP_CONDN_D’ 2103 | FOP_CONDN_D(af, (float64_unordered_quiet(fdt1, fdt0, | ^~~ Compiling C object libqemu-mips-softmmu.fa.p/target_mips_dsp_helper.c.o target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_un’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2105:1: note: in expansion of macro ‘FOP_CONDN_D’ 2105 | FOP_CONDN_D(un, (float64_unordered_quiet(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_eq’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2107:1: note: in expansion of macro ‘FOP_CONDN_D’ 2107 | FOP_CONDN_D(eq, (float64_eq_quiet(fdt0, fdt1, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_ueq’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2109:1: note: in expansion of macro ‘FOP_CONDN_D’ 2109 | FOP_CONDN_D(ueq, (float64_unordered_quiet(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_lt’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2113:1: note: in expansion of macro ‘FOP_CONDN_D’ 2113 | FOP_CONDN_D(lt, (float64_lt_quiet(fdt0, fdt1, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_ult’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2115:1: note: in expansion of macro ‘FOP_CONDN_D’ 2115 | FOP_CONDN_D(ult, (float64_unordered_quiet(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_le’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2119:1: note: in expansion of macro ‘FOP_CONDN_D’ 2119 | FOP_CONDN_D(le, (float64_le_quiet(fdt0, fdt1, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_ule’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2121:1: note: in expansion of macro ‘FOP_CONDN_D’ 2121 | FOP_CONDN_D(ule, (float64_unordered_quiet(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_saf’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2129:1: note: in expansion of macro 
‘FOP_CONDN_D’ 2129 | FOP_CONDN_D(saf, (float64_unordered(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_sun’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097 | } | ^ target/mips/fpu_helper.c:2131:1: note: in expansion of macro ‘FOP_CONDN_D’ 2131 | FOP_CONDN_D(sun, (float64_unordered(fdt1, fdt0, | ^~~ target/mips/fpu_helper.c: In function ‘helper_r6_cmp_d_seq’: target/mips/fpu_helper.c:2097:1: error: control reaches end of non-void function [-Werror=return-type] 2097
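For clarity, the unreachable call is just an ordering problem; a sketch of the reachable ordering (simplified from the FOP_CONDN_D macro body, matching the pre-refactor code) is:

    /* Evaluate the condition, update the FCSR cause/flag bits, and only
     * then take one of the return paths. */
    uint64_t c;
    c = cond;
    update_fcr31(env, GETPC());
    if (c) {
        return -1;
    } else {
        return 0;
    }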
Re: [PATCH] goldfish_rtc: re-arm the alarm after migration
On Fri, Oct 16, 2020 at 11:16 AM Laurent Vivier wrote: > > After a migration the clock offset is updated, but we also > need to re-arm the alarm if needed. > > Signed-off-by: Laurent Vivier Reviewed-by: Alistair Francis Alistair > --- > hw/rtc/goldfish_rtc.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/hw/rtc/goldfish_rtc.c b/hw/rtc/goldfish_rtc.c > index 0f4e8185a796..e07ff0164e0c 100644 > --- a/hw/rtc/goldfish_rtc.c > +++ b/hw/rtc/goldfish_rtc.c > @@ -211,6 +211,8 @@ static int goldfish_rtc_post_load(void *opaque, int > version_id) > qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); > s->tick_offset = s->tick_offset_vmstate - delta; > > +goldfish_rtc_set_alarm(s); > + > return 0; > } > > -- > 2.26.2 > >
Re: [PULL 00/10] Block layer patches
On Thu, 15 Oct 2020 at 15:50, Kevin Wolf wrote: > > The following changes since commit 57c98ea9acdcef5021f5671efa6475a5794a51c4: > > Merge remote-tracking branch 'remotes/kraxel/tags/ui-20201014-pull-request' > into staging (2020-10-14 13:56:06 +0100) > > are available in the Git repository at: > > git://repo.or.cz/qemu/kevin.git tags/for-upstream > > for you to fetch changes up to e1c4269763999e3b359fff19ad170e0110d3b457: > > block: deprecate the sheepdog block driver (2020-10-15 16:06:28 +0200) > > > Block layer patches: > > - qemu-storage-daemon: Remove QemuOpts from --object parser > - monitor: Fix order in monitor_cleanup() > - Deprecate the sheepdog block driver Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2 for any user-visible changes. -- PMM
Re: [PATCH v11 00/12] linux-user: User support for AArch64 BTI
Patchew URL: https://patchew.org/QEMU/20201016184207.786698-1-richard.hender...@linaro.org/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20201016184207.786698-1-richard.hender...@linaro.org Subject: [PATCH v11 00/12] linux-user: User support for AArch64 BTI === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/20201016184207.786698-1-richard.hender...@linaro.org -> patchew/20201016184207.786698-1-richard.hender...@linaro.org Switched to a new branch 'test' adddebd tests/tcg/aarch64: Add bti smoke tests 6de9e12 linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND 9b05077 linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes e063fde linux-user/elfload: Use Error for load_elf_interp 5fa6305 linux-user/elfload: Use Error for load_elf_image 0bdf3b9 linux-user/elfload: Move PT_INTERP detection to first loop 3192943 linux-user/elfload: Adjust iteration over phdr d35ac0e linux-user/elfload: Fix coding style in load_elf_image f1019c0 linux-user/elfload: Avoid leaking interp_name using GLib memory API 46b8e04 include/elf: Add defines related to GNU property notes for AArch64 15162b4 linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI 49ed74b linux-user/aarch64: Reset btype for signals === OUTPUT BEGIN === 1/12 Checking commit 49ed74bc1aee (linux-user/aarch64: Reset btype for signals) 2/12 Checking commit 15162b4dcbec (linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI) 3/12 Checking commit 46b8e04781a1 (include/elf: Add defines related to GNU property notes for AArch64) 4/12 Checking commit f1019c0416c3 (linux-user/elfload: Avoid leaking interp_name using GLib memory API) 5/12 Checking commit d35ac0e1930b (linux-user/elfload: Fix coding style in load_elf_image) 6/12 Checking commit 3192943b8d75 (linux-user/elfload: Adjust iteration over phdr) 7/12 Checking commit 0bdf3b9c07f6 (linux-user/elfload: Move PT_INTERP detection to first loop) 8/12 Checking commit 5fa63056e084 (linux-user/elfload: Use Error for load_elf_image) 9/12 Checking commit e063fde7d4b1 (linux-user/elfload: Use Error for load_elf_interp) 10/12 Checking commit 9b05077fa2ed (linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes) 11/12 Checking commit 6de9e12b2adf (linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND) 12/12 Checking commit adddebd8a702 (tests/tcg/aarch64: Add bti smoke tests) WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? #37: new file mode 100644 ERROR: externs should be avoided in .c files #165: FILE: tests/tcg/aarch64/bti-2.c:56: +extern char test_begin[], test_end[]; ERROR: use qemu_real_host_page_size instead of getpagesize() #199: FILE: tests/tcg/aarch64/bti-2.c:90: +void *p = mmap(0, getpagesize(), ERROR: externs should be avoided in .c files #236: FILE: tests/tcg/aarch64/bti-crt.inc.c:13: +int main(void); total: 3 errors, 1 warnings, 247 lines checked Patch 12/12 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 
=== OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20201016184207.786698-1-richard.hender...@linaro.org/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
[PATCH v11 12/12] tests/tcg/aarch64: Add bti smoke tests
The note test requires gcc 10 for -mbranch-protection=standard. The mmap test uses PROT_BTI and does not require special compiler support. Acked-by: Alex Bennée Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- v9: Expect and require gcc 10. v11: Squash mmap smoke test. --- tests/tcg/aarch64/bti-1.c | 62 + tests/tcg/aarch64/bti-2.c | 108 ++ tests/tcg/aarch64/bti-crt.inc.c | 51 ++ tests/tcg/aarch64/Makefile.target | 10 +++ tests/tcg/configure.sh| 4 ++ 5 files changed, 235 insertions(+) create mode 100644 tests/tcg/aarch64/bti-1.c create mode 100644 tests/tcg/aarch64/bti-2.c create mode 100644 tests/tcg/aarch64/bti-crt.inc.c diff --git a/tests/tcg/aarch64/bti-1.c b/tests/tcg/aarch64/bti-1.c new file mode 100644 index 00..61924f0d7a --- /dev/null +++ b/tests/tcg/aarch64/bti-1.c @@ -0,0 +1,62 @@ +/* + * Branch target identification, basic notskip cases. + */ + +#include "bti-crt.inc.c" + +static void skip2_sigill(int sig, siginfo_t *info, ucontext_t *uc) +{ +uc->uc_mcontext.pc += 8; +uc->uc_mcontext.pstate = 1; +} + +#define NOP "nop" +#define BTI_N "hint #32" +#define BTI_C "hint #34" +#define BTI_J "hint #36" +#define BTI_JC"hint #38" + +#define BTYPE_1(DEST) \ +asm("mov %0,#1; adr x16, 1f; br x16; 1: " DEST "; mov %0,#0" \ +: "=r"(skipped) : : "x16") + +#define BTYPE_2(DEST) \ +asm("mov %0,#1; adr x16, 1f; blr x16; 1: " DEST "; mov %0,#0" \ +: "=r"(skipped) : : "x16", "x30") + +#define BTYPE_3(DEST) \ +asm("mov %0,#1; adr x15, 1f; br x15; 1: " DEST "; mov %0,#0" \ +: "=r"(skipped) : : "x15") + +#define TEST(WHICH, DEST, EXPECT) \ +do { WHICH(DEST); fail += skipped ^ EXPECT; } while (0) + + +int main() +{ +int fail = 0; +int skipped; + +/* Signal-like with SA_SIGINFO. */ +signal_info(SIGILL, skip2_sigill); + +TEST(BTYPE_1, NOP, 1); +TEST(BTYPE_1, BTI_N, 1); +TEST(BTYPE_1, BTI_C, 0); +TEST(BTYPE_1, BTI_J, 0); +TEST(BTYPE_1, BTI_JC, 0); + +TEST(BTYPE_2, NOP, 1); +TEST(BTYPE_2, BTI_N, 1); +TEST(BTYPE_2, BTI_C, 0); +TEST(BTYPE_2, BTI_J, 1); +TEST(BTYPE_2, BTI_JC, 0); + +TEST(BTYPE_3, NOP, 1); +TEST(BTYPE_3, BTI_N, 1); +TEST(BTYPE_3, BTI_C, 1); +TEST(BTYPE_3, BTI_J, 0); +TEST(BTYPE_3, BTI_JC, 0); + +return fail; +} diff --git a/tests/tcg/aarch64/bti-2.c b/tests/tcg/aarch64/bti-2.c new file mode 100644 index 00..6dc8908b5a --- /dev/null +++ b/tests/tcg/aarch64/bti-2.c @@ -0,0 +1,108 @@ +/* + * Branch target identification, basic notskip cases. 
+ */ + +#include +#include +#include +#include +#include + +#ifndef PROT_BTI +#define PROT_BTI 0x10 +#endif + +static void skip2_sigill(int sig, siginfo_t *info, void *vuc) +{ +ucontext_t *uc = vuc; +uc->uc_mcontext.pc += 8; +uc->uc_mcontext.pstate = 1; +} + +#define NOP "nop" +#define BTI_N "hint #32" +#define BTI_C "hint #34" +#define BTI_J "hint #36" +#define BTI_JC"hint #38" + +#define BTYPE_1(DEST)\ +"mov x1, #1\n\t" \ +"adr x16, 1f\n\t"\ +"br x16\n" \ +"1: " DEST "\n\t"\ +"mov x1, #0" + +#define BTYPE_2(DEST)\ +"mov x1, #1\n\t" \ +"adr x16, 1f\n\t"\ +"blr x16\n" \ +"1: " DEST "\n\t"\ +"mov x1, #0" + +#define BTYPE_3(DEST)\ +"mov x1, #1\n\t" \ +"adr x15, 1f\n\t"\ +"br x15\n" \ +"1: " DEST "\n\t"\ +"mov x1, #0" + +#define TEST(WHICH, DEST, EXPECT) \ +WHICH(DEST) "\n" \ +".if " #EXPECT "\n\t" \ +"eor x1, x1," #EXPECT "\n"\ +".endif\n\t" \ +"add x0, x0, x1\n\t" + +extern char test_begin[], test_end[]; + +asm("\n" +"test_begin:\n\t" +BTI_C "\n\t" +"mov x2, x30\n\t" +"mov x0, #0\n\t" + +TEST(BTYPE_1, NOP, 1) +TEST(BTYPE_1, BTI_N, 1) +TEST(BTYPE_1, BTI_C, 0) +TEST(BTYPE_1, BTI_J, 0) +TEST(BTYPE_1, BTI_JC, 0) + +TEST(BTYPE_2, NOP, 1) +TEST(BTYPE_2, BTI_N, 1) +TEST(BTYPE_2, BTI_C, 0) +TEST(BTYPE_2, BTI_J, 1) +TEST(BTYPE_2, BTI_JC, 0) + +TEST(BTYPE_3, NOP, 1) +TEST(BTYPE_3, BTI_N, 1) +TEST(BTYPE_3, BTI_C, 1) +TEST(BTYPE_3, BTI_J, 0) +TEST(BTYPE_3, BTI_JC, 0) + +"ret x2\n" +"test_end:" +); + +int main() +{ +struct sigaction sa; + +void *p = mmap(0, getpagesize(), + PROT_EXEC | PROT_READ | PROT_WRITE | PROT_BTI, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); +if (p == MAP_FAILED) { +perror("mmap"); +return 1; +} + +memset(&sa, 0, sizeof(sa)); +sa.sa_sigaction = skip2_sigill; +sa.sa_flags = SA_SIGINFO; +if (sigaction(SIGILL, &sa, NULL) < 0) { +perror("sigaction"); +return 1; +} + +memcpy(p, test_begin, test_end - test_begin); +return ((int (*)(void))p)(); +} diff --git a/tests/tcg/aarch64/b
[PATCH v11 11/12] linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND
Use the new generic support for NT_GNU_PROPERTY_TYPE_0. Signed-off-by: Richard Henderson --- v11: Split out aarch64 bits from generic patch. --- linux-user/elfload.c | 48 ++-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 428dcaa152..bf8c1bd253 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -1522,6 +1522,28 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, #include "elf.h" +/* We must delay the following stanzas until after "elf.h". */ +#if defined(TARGET_AARCH64) + +static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz, +const uint32_t *data, +struct image_info *info, +Error **errp) +{ +if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND) { +if (pr_datasz != sizeof(uint32_t)) { +error_setg(errp, "Ill-formed GNU_PROPERTY_AARCH64_FEATURE_1_AND"); +return false; +} +/* We will extract GNU_PROPERTY_AARCH64_FEATURE_1_BTI later. */ +info->note_flags = *data; +} +return true; +} +#define ARCH_USE_GNU_PROPERTY 1 + +#else + static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz, const uint32_t *data, struct image_info *info, @@ -1531,6 +1553,8 @@ static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz, } #define ARCH_USE_GNU_PROPERTY 0 +#endif + struct exec { unsigned int a_info; /* Use macros N_MAGIC, etc for access */ @@ -2545,7 +2569,7 @@ static void load_elf_image(const char *image_name, int image_fd, struct elfhdr *ehdr = (struct elfhdr *)bprm_buf; struct elf_phdr *phdr; abi_ulong load_addr, load_bias, loaddr, hiaddr, error; -int i, retval; +int i, retval, prot_exec; Error *err = NULL; /* First of all, some simple consistency checks */ @@ -2712,6 +2736,26 @@ static void load_elf_image(const char *image_name, int image_fd, info->brk = 0; info->elf_flags = ehdr->e_flags; +prot_exec = PROT_EXEC; +#ifdef TARGET_AARCH64 +/* + * If the BTI feature is present, this indicates that the executable + * pages of the startup binary should be mapped with PROT_BTI, so that + * branch targets are enforced. + * + * The startup binary is either the interpreter or the static executable. + * The interpreter is responsible for all pages of a dynamic executable. + * + * Elf notes are backward compatible to older cpus. + * Do not enable BTI unless it is supported. + */ +if ((info->note_flags & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) +&& (pinterp_name == NULL || *pinterp_name == 0) +&& cpu_isar_feature(aa64_bti, ARM_CPU(thread_cpu))) { +prot_exec |= TARGET_PROT_BTI; +} +#endif + for (i = 0; i < ehdr->e_phnum; i++) { struct elf_phdr *eppnt = phdr + i; if (eppnt->p_type == PT_LOAD) { @@ -2725,7 +2769,7 @@ static void load_elf_image(const char *image_name, int image_fd, elf_prot |= PROT_WRITE; } if (eppnt->p_flags & PF_X) { -elf_prot |= PROT_EXEC; +elf_prot |= prot_exec; } vaddr = load_bias + eppnt->p_vaddr; -- 2.25.1
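For reference, the note being parsed here is the one the toolchain emits for -mbranch-protection. Laid out as a C struct (illustrative; on disk every field is a 32-bit value in target byte order, which is why the loader byte-swaps the whole blob):

    #include <stdint.h>

    /* NT_GNU_PROPERTY_TYPE_0 note carrying the AArch64 BTI feature bit. */
    struct gnu_property_note_example {
        uint32_t n_namesz;   /* 4: strlen("GNU") + 1 */
        uint32_t n_descsz;   /* 16: pr_type + pr_datasz + data + pad */
        uint32_t n_type;     /* 5: NT_GNU_PROPERTY_TYPE_0 */
        char     n_name[4];  /* "GNU" */
        uint32_t pr_type;    /* 0xc0000000: GNU_PROPERTY_AARCH64_FEATURE_1_AND */
        uint32_t pr_datasz;  /* 4 */
        uint32_t pr_data;    /* bit 0: GNU_PROPERTY_AARCH64_FEATURE_1_BTI */
        uint32_t pr_pad;     /* padding to the 8-byte ELFCLASS64 alignment */
    };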
[PATCH v11 09/12] linux-user/elfload: Use Error for load_elf_interp
This is slightly clearer than just using strerror, though the different forms produced by error_setg_file_open and error_setg_errno isn't entirely convenient. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 56fbda93d0..04c04bc260 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2675,26 +2675,27 @@ static void load_elf_interp(const char *filename, struct image_info *info, char bprm_buf[BPRM_BUF_SIZE]) { int fd, retval; +Error *err = NULL; fd = open(path(filename), O_RDONLY); if (fd < 0) { -goto exit_perror; +error_setg_file_open(&err, errno, filename); +error_report_err(err); +exit(-1); } retval = read(fd, bprm_buf, BPRM_BUF_SIZE); if (retval < 0) { -goto exit_perror; +error_setg_errno(&err, errno, "Error reading file header"); +error_reportf_err(err, "%s: ", filename); +exit(-1); } + if (retval < BPRM_BUF_SIZE) { memset(bprm_buf + retval, 0, BPRM_BUF_SIZE - retval); } load_elf_image(filename, fd, info, NULL, bprm_buf); -return; - - exit_perror: -fprintf(stderr, "%s: %s\n", filename, strerror(errno)); -exit(-1); } static int symfind(const void *s0, const void *s1) -- 2.25.1
[PATCH v11 10/12] linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes
This is generic support, with the code disabled for all targets. Signed-off-by: Richard Henderson --- v9: Only map the startup executable with BTI; anything else must be handled by the interpreter. v10: Split out preparatory patches (pmm). v11: Mirror(-ish) the kernel's code structure (pmm). --- linux-user/qemu.h| 4 ++ linux-user/elfload.c | 157 +++ 2 files changed, 161 insertions(+) diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 941ca99722..534753ca12 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -61,6 +61,10 @@ struct image_info { abi_ulong interpreter_loadmap_addr; abi_ulong interpreter_pt_dynamic_addr; struct image_info *other_info; + +/* For target-specific processing of NT_GNU_PROPERTY_TYPE_0. */ +uint32_tnote_flags; + #ifdef TARGET_MIPS int fp_abi; int interp_fp_abi; diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 04c04bc260..428dcaa152 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -1522,6 +1522,15 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, #include "elf.h" +static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz, +const uint32_t *data, +struct image_info *info, +Error **errp) +{ +g_assert_not_reached(); +} +#define ARCH_USE_GNU_PROPERTY 0 + struct exec { unsigned int a_info; /* Use macros N_MAGIC, etc for access */ @@ -2373,6 +2382,150 @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr, "@ 0x%" PRIx64 "\n", (uint64_t)guest_base); } +enum { +/* The string "GNU\0" as a magic number. */ +GNU0_MAGIC = const_le32('G' | 'N' << 8 | 'U' << 16), +NOTE_DATA_SZ = 1 * KiB, +NOTE_NAME_SZ = 4, +ELF_GNU_PROPERTY_ALIGN = ELF_CLASS == ELFCLASS32 ? 4 : 8, +}; + +/* + * Process a single gnu_property entry. + * Return false for error. + */ +static bool parse_elf_property(const uint32_t *data, int *off, int datasz, + struct image_info *info, bool have_prev_type, + uint32_t *prev_type, Error **errp) +{ +uint32_t pr_type, pr_datasz, step; + +if (*off > datasz || !QEMU_IS_ALIGNED(*off, ELF_GNU_PROPERTY_ALIGN)) { +goto error_data; +} +datasz -= *off; +data += *off / sizeof(uint32_t); + +if (datasz < 2 * sizeof(uint32_t)) { +goto error_data; +} +pr_type = data[0]; +pr_datasz = data[1]; +data += 2; +datasz -= 2 * sizeof(uint32_t); +step = ROUND_UP(pr_datasz, ELF_GNU_PROPERTY_ALIGN); +if (step > datasz) { +goto error_data; +} + +/* Properties are supposed to be unique and sorted on pr_type. */ +if (have_prev_type && pr_type <= *prev_type) { +if (pr_type == *prev_type) { +error_setg(errp, "Duplicate property in PT_GNU_PROPERTY"); +} else { +error_setg(errp, "Unsorted property in PT_GNU_PROPERTY"); +} +return false; +} +*prev_type = pr_type; + +if (!arch_parse_elf_property(pr_type, pr_datasz, data, info, errp)) { +return false; +} + +*off += 2 * sizeof(uint32_t) + step; +return true; + + error_data: +error_setg(errp, "Ill-formed property in PT_GNU_PROPERTY"); +return false; +} + +/* Process NT_GNU_PROPERTY_TYPE_0. */ +static bool parse_elf_properties(int image_fd, + struct image_info *info, + const struct elf_phdr *phdr, + char bprm_buf[BPRM_BUF_SIZE], + Error **errp) +{ +union { +struct elf_note nhdr; +uint32_t data[NOTE_DATA_SZ / sizeof(uint32_t)]; +} note; + +int n, off, datasz; +bool have_prev_type; +uint32_t prev_type; + +/* Unless the arch requires properties, ignore them. */ +if (!ARCH_USE_GNU_PROPERTY) { +return true; +} + +/* If the properties are crazy large, that's too bad. 
*/ +n = phdr->p_filesz; +if (n > sizeof(note)) { +error_setg(errp, "PT_GNU_PROPERTY too large"); +return false; +} +if (n < sizeof(note.nhdr)) { +error_setg(errp, "PT_GNU_PROPERTY too small"); +return false; +} + +if (phdr->p_offset + n <= BPRM_BUF_SIZE) { +memcpy(¬e, bprm_buf + phdr->p_offset, n); +} else { +ssize_t len = pread(image_fd, ¬e, n, phdr->p_offset); +if (len != n) { +error_setg_errno(errp, errno, "Error reading file header"); +return false; +} +} + +/* + * The contents of a valid PT_GNU_PROPERTY is a sequence + * of uint32_t -- swap them all now. + */ +#ifdef BSWAP_NEEDED +for (int i = 0; i < n / 4; i++) { +bswap32s(note.data + i)
[PATCH v11 02/12] linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI
Transform the prot bit to a qemu internal page bit, and save it in the page tables. Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- v10: Add PAGE_BTI define (pmm). --- include/exec/cpu-all.h | 2 ++ linux-user/syscall_defs.h | 4 target/arm/cpu.h | 5 + linux-user/mmap.c | 16 target/arm/translate-a64.c | 6 +++--- 5 files changed, 30 insertions(+), 3 deletions(-) diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index 61e13b5038..656a2a8788 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -274,6 +274,8 @@ extern intptr_t qemu_host_page_mask; /* FIXME: Code that sets/uses this is broken and needs to go away. */ #define PAGE_RESERVED 0x0020 #endif +/* Target-specific bits that will be used via page_get_flags(). */ +#define PAGE_TARGET_1 0x0080 #if defined(CONFIG_USER_ONLY) void page_dump(FILE *f); diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 731c3d5341..cabbfb762d 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -1277,6 +1277,10 @@ struct target_winsize { #define TARGET_PROT_SEM 0x08 #endif +#ifdef TARGET_AARCH64 +#define TARGET_PROT_BTI 0x10 +#endif + /* Common */ #define TARGET_MAP_SHARED 0x01/* Share changes */ #define TARGET_MAP_PRIVATE 0x02/* Changes are private */ diff --git a/target/arm/cpu.h b/target/arm/cpu.h index cfff1b5c8f..e8efe21a1b 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3443,6 +3443,11 @@ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x) #define arm_tlb_bti_gp(x) (typecheck_memtxattrs(x)->target_tlb_bit0) #define arm_tlb_mte_tagged(x) (typecheck_memtxattrs(x)->target_tlb_bit1) +/* + * AArch64 usage of the PAGE_TARGET_* bits for linux-user. + */ +#define PAGE_BTI PAGE_TARGET_1 + /* * Naming convention for isar_feature functions: * Functions which test 32-bit ID registers should have _aa32_ in diff --git a/linux-user/mmap.c b/linux-user/mmap.c index f261563420..00c05e6a0f 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -83,6 +83,22 @@ static int validate_prot_to_pageflags(int *host_prot, int prot) *host_prot = (prot & (PROT_READ | PROT_WRITE)) | (prot & PROT_EXEC ? PROT_READ : 0); +#ifdef TARGET_AARCH64 +/* + * The PROT_BTI bit is only accepted if the cpu supports the feature. + * Since this is the unusual case, don't bother checking unless + * the bit has been requested. If set and valid, record the bit + * within QEMU's page_flags. + */ +if (prot & TARGET_PROT_BTI) { +ARMCPU *cpu = ARM_CPU(thread_cpu); +if (cpu_isar_feature(aa64_bti, cpu)) { +valid |= TARGET_PROT_BTI; +page_flags |= PAGE_BTI; +} +} +#endif + return prot & ~valid ? 0 : page_flags; } diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 7188808341..072754fa24 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -14507,10 +14507,10 @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn) */ static bool is_guarded_page(CPUARMState *env, DisasContext *s) { -#ifdef CONFIG_USER_ONLY -return false; /* FIXME */ -#else uint64_t addr = s->base.pc_first; +#ifdef CONFIG_USER_ONLY +return page_get_flags(addr) & PAGE_BTI; +#else int mmu_idx = arm_to_core_mmu_idx(s->mmu_idx); unsigned int index = tlb_index(env, mmu_idx, addr); CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); -- 2.25.1
[PATCH v11 08/12] linux-user/elfload: Use Error for load_elf_image
This is a bit clearer than open-coding some of this with a bare c string. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 37 - 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 107a628a9e..56fbda93d0 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -12,6 +12,7 @@ #include "qemu/guest-random.h" #include "qemu/units.h" #include "qemu/selfmap.h" +#include "qapi/error.h" #ifdef _ARCH_PPC64 #undef ARCH_DLINFO @@ -2392,15 +2393,16 @@ static void load_elf_image(const char *image_name, int image_fd, struct elf_phdr *phdr; abi_ulong load_addr, load_bias, loaddr, hiaddr, error; int i, retval; -const char *errmsg; +Error *err = NULL; /* First of all, some simple consistency checks */ -errmsg = "Invalid ELF image for this architecture"; if (!elf_check_ident(ehdr)) { +error_setg(&err, "Invalid ELF image for this architecture"); goto exit_errmsg; } bswap_ehdr(ehdr); if (!elf_check_ehdr(ehdr)) { +error_setg(&err, "Invalid ELF image for this architecture"); goto exit_errmsg; } @@ -2444,13 +2446,11 @@ static void load_elf_image(const char *image_name, int image_fd, g_autofree char *interp_name = NULL; if (*pinterp_name) { -errmsg = "Multiple PT_INTERP entries"; +error_setg(&err, "Multiple PT_INTERP entries"); goto exit_errmsg; } + interp_name = g_malloc(eppnt->p_filesz); -if (!interp_name) { -goto exit_perror; -} if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) { memcpy(interp_name, bprm_buf + eppnt->p_offset, @@ -2459,11 +2459,11 @@ static void load_elf_image(const char *image_name, int image_fd, retval = pread(image_fd, interp_name, eppnt->p_filesz, eppnt->p_offset); if (retval != eppnt->p_filesz) { -goto exit_perror; +goto exit_read; } } if (interp_name[eppnt->p_filesz - 1] != 0) { -errmsg = "Invalid PT_INTERP entry"; +error_setg(&err, "Invalid PT_INTERP entry"); goto exit_errmsg; } *pinterp_name = g_steal_pointer(&interp_name); @@ -2520,7 +2520,7 @@ static void load_elf_image(const char *image_name, int image_fd, (ehdr->e_type == ET_EXEC ? 
MAP_FIXED : 0), -1, 0); if (load_addr == -1) { -goto exit_perror; +goto exit_mmap; } load_bias = load_addr - loaddr; @@ -2587,7 +2587,7 @@ static void load_elf_image(const char *image_name, int image_fd, image_fd, eppnt->p_offset - vaddr_po); if (error == -1) { -goto exit_perror; +goto exit_mmap; } } @@ -2623,7 +2623,7 @@ static void load_elf_image(const char *image_name, int image_fd, } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) { Mips_elf_abiflags_v0 abiflags; if (eppnt->p_filesz < sizeof(Mips_elf_abiflags_v0)) { -errmsg = "Invalid PT_MIPS_ABIFLAGS entry"; +error_setg(&err, "Invalid PT_MIPS_ABIFLAGS entry"); goto exit_errmsg; } if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) { @@ -2633,7 +2633,7 @@ static void load_elf_image(const char *image_name, int image_fd, retval = pread(image_fd, &abiflags, sizeof(Mips_elf_abiflags_v0), eppnt->p_offset); if (retval != sizeof(Mips_elf_abiflags_v0)) { -goto exit_perror; +goto exit_read; } } bswap_mips_abiflags(&abiflags); @@ -2658,13 +2658,16 @@ static void load_elf_image(const char *image_name, int image_fd, exit_read: if (retval >= 0) { -errmsg = "Incomplete read of file header"; -goto exit_errmsg; +error_setg(&err, "Incomplete read of file header"); +} else { +error_setg_errno(&err, errno, "Error reading file header"); } - exit_perror: -errmsg = strerror(errno); +goto exit_errmsg; + exit_mmap: +error_setg_errno(&err, errno, "Error mapping file"); +goto exit_errmsg; exit_errmsg: -fprintf(stderr, "%s: %s\n", image_name, errmsg); +error_reportf_err(err, "%s: ", image_name); exit(-1); } -- 2.25.1
[PATCH v11 03/12] include/elf: Add defines related to GNU property notes for AArch64
These are all of the defines required to parse GNU_PROPERTY_AARCH64_FEATURE_1_AND, copied from binutils. Other missing defines related to other GNU program headers and notes are elided for now. Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- include/elf.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/include/elf.h b/include/elf.h index c117a4d1ab..10126ff809 100644 --- a/include/elf.h +++ b/include/elf.h @@ -26,9 +26,13 @@ typedef int64_t Elf64_Sxword; #define PT_NOTE4 #define PT_SHLIB 5 #define PT_PHDR6 +#define PT_LOOS0x6000 +#define PT_HIOS0x6fff #define PT_LOPROC 0x7000 #define PT_HIPROC 0x7fff +#define PT_GNU_PROPERTY (PT_LOOS + 0x474e553) + #define PT_MIPS_REGINFO 0x7000 #define PT_MIPS_RTPROC0x7001 #define PT_MIPS_OPTIONS 0x7002 @@ -1657,6 +1661,24 @@ typedef struct elf64_shdr { #define NT_ARM_SYSTEM_CALL 0x404 /* ARM system call number */ #define NT_ARM_SVE 0x405 /* ARM Scalable Vector Extension regs */ +/* Defined note types for GNU systems. */ + +#define NT_GNU_PROPERTY_TYPE_0 5 /* Program property */ + +/* Values used in GNU .note.gnu.property notes (NT_GNU_PROPERTY_TYPE_0). */ + +#define GNU_PROPERTY_STACK_SIZE 1 +#define GNU_PROPERTY_NO_COPY_ON_PROTECTED 2 + +#define GNU_PROPERTY_LOPROC 0xc000 +#define GNU_PROPERTY_HIPROC 0xdfff +#define GNU_PROPERTY_LOUSER 0xe000 +#define GNU_PROPERTY_HIUSER 0x + +#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc000 +#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1u << 0) +#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1u << 1) + /* * Physical entry point into the kernel. * -- 2.25.1
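As a rough illustration of where these constants end up: once a loader has extracted the data word of a GNU_PROPERTY_AARCH64_FEATURE_1_AND entry from an NT_GNU_PROPERTY_TYPE_0 note, the BTI decision is a single bit test. The helper below is a hypothetical sketch, not code from this series:

#include <stdbool.h>
#include <stdint.h>

#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  (1u << 0)
#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC  (1u << 1)

/*
 * feature_1_and is the 32-bit data word of the AArch64 FEATURE_1_AND
 * property; a set BTI bit means the image was built with BTI landing
 * pads and its executable pages may be mapped as guarded.
 */
static bool image_wants_bti(uint32_t feature_1_and)
{
    return (feature_1_and & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) != 0;
}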
[PATCH v11 07/12] linux-user/elfload: Move PT_INTERP detection to first loop
For BTI, we need to know if the executable is static or dynamic, which means looking for PT_INTERP earlier. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 60 +++- 1 file changed, 31 insertions(+), 29 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 210592aa90..107a628a9e 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2421,8 +2421,10 @@ static void load_elf_image(const char *image_name, int image_fd, mmap_lock(); -/* Find the maximum size of the image and allocate an appropriate - amount of memory to handle that. */ +/* + * Find the maximum size of the image and allocate an appropriate + * amount of memory to handle that. Locate the interpreter, if any. + */ loaddr = -1, hiaddr = 0; info->alignment = 0; for (i = 0; i < ehdr->e_phnum; ++i) { @@ -2438,6 +2440,33 @@ static void load_elf_image(const char *image_name, int image_fd, } ++info->nsegs; info->alignment |= eppnt->p_align; +} else if (eppnt->p_type == PT_INTERP && pinterp_name) { +g_autofree char *interp_name = NULL; + +if (*pinterp_name) { +errmsg = "Multiple PT_INTERP entries"; +goto exit_errmsg; +} +interp_name = g_malloc(eppnt->p_filesz); +if (!interp_name) { +goto exit_perror; +} + +if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) { +memcpy(interp_name, bprm_buf + eppnt->p_offset, + eppnt->p_filesz); +} else { +retval = pread(image_fd, interp_name, eppnt->p_filesz, + eppnt->p_offset); +if (retval != eppnt->p_filesz) { +goto exit_perror; +} +} +if (interp_name[eppnt->p_filesz - 1] != 0) { +errmsg = "Invalid PT_INTERP entry"; +goto exit_errmsg; +} +*pinterp_name = g_steal_pointer(&interp_name); } } @@ -2590,33 +2619,6 @@ static void load_elf_image(const char *image_name, int image_fd, if (vaddr_em > info->brk) { info->brk = vaddr_em; } -} else if (eppnt->p_type == PT_INTERP && pinterp_name) { -g_autofree char *interp_name = NULL; - -if (*pinterp_name) { -errmsg = "Multiple PT_INTERP entries"; -goto exit_errmsg; -} -interp_name = g_malloc(eppnt->p_filesz); -if (!interp_name) { -goto exit_perror; -} - -if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) { -memcpy(interp_name, bprm_buf + eppnt->p_offset, - eppnt->p_filesz); -} else { -retval = pread(image_fd, interp_name, eppnt->p_filesz, - eppnt->p_offset); -if (retval != eppnt->p_filesz) { -goto exit_perror; -} -} -if (interp_name[eppnt->p_filesz - 1] != 0) { -errmsg = "Invalid PT_INTERP entry"; -goto exit_errmsg; -} -*pinterp_name = g_steal_pointer(&interp_name); #ifdef TARGET_MIPS } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) { Mips_elf_abiflags_v0 abiflags; -- 2.25.1
[PATCH v11 05/12] linux-user/elfload: Fix coding style in load_elf_image
Fixing this now will clarify following patches. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 1a3150df7c..290ef70222 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2531,9 +2531,15 @@ static void load_elf_image(const char *image_name, int image_fd, abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em, vaddr_len; int elf_prot = 0; -if (eppnt->p_flags & PF_R) elf_prot = PROT_READ; -if (eppnt->p_flags & PF_W) elf_prot |= PROT_WRITE; -if (eppnt->p_flags & PF_X) elf_prot |= PROT_EXEC; +if (eppnt->p_flags & PF_R) { +elf_prot |= PROT_READ; +} +if (eppnt->p_flags & PF_W) { +elf_prot |= PROT_WRITE; +} +if (eppnt->p_flags & PF_X) { +elf_prot |= PROT_EXEC; +} vaddr = load_bias + eppnt->p_vaddr; vaddr_po = TARGET_ELF_PAGEOFFSET(vaddr); -- 2.25.1
[PATCH v11 06/12] linux-user/elfload: Adjust iteration over phdr
The second loop uses a loop induction variable, and the first does not. Transform the first to match the second, to simplify a following patch moving code between them. Signed-off-by: Richard Henderson --- linux-user/elfload.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 290ef70222..210592aa90 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2426,17 +2426,18 @@ static void load_elf_image(const char *image_name, int image_fd, loaddr = -1, hiaddr = 0; info->alignment = 0; for (i = 0; i < ehdr->e_phnum; ++i) { -if (phdr[i].p_type == PT_LOAD) { -abi_ulong a = phdr[i].p_vaddr - phdr[i].p_offset; +struct elf_phdr *eppnt = phdr + i; +if (eppnt->p_type == PT_LOAD) { +abi_ulong a = eppnt->p_vaddr - eppnt->p_offset; if (a < loaddr) { loaddr = a; } -a = phdr[i].p_vaddr + phdr[i].p_memsz; +a = eppnt->p_vaddr + eppnt->p_memsz; if (a > hiaddr) { hiaddr = a; } ++info->nsegs; -info->alignment |= phdr[i].p_align; +info->alignment |= eppnt->p_align; } } -- 2.25.1
[PATCH v11 01/12] linux-user/aarch64: Reset btype for signals
The kernel sets btype for the signal handler as if for a call. Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- linux-user/aarch64/signal.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c index d50c1ae583..b591790c22 100644 --- a/linux-user/aarch64/signal.c +++ b/linux-user/aarch64/signal.c @@ -506,10 +506,16 @@ static void target_setup_frame(int usig, struct target_sigaction *ka, + offsetof(struct target_rt_frame_record, tramp); } env->xregs[0] = usig; -env->xregs[31] = frame_addr; env->xregs[29] = frame_addr + fr_ofs; -env->pc = ka->_sa_handler; env->xregs[30] = return_addr; +env->xregs[31] = frame_addr; +env->pc = ka->_sa_handler; + +/* Invoke the signal handler as if by indirect call. */ +if (cpu_isar_feature(aa64_bti, env_archcpu(env))) { +env->btype = 2; +} + if (info) { tswap_siginfo(&frame->info, info); env->xregs[1] = frame_addr + offsetof(struct target_rt_sigframe, info); -- 2.25.1
[PATCH v11 04/12] linux-user/elfload: Avoid leaking interp_name using GLib memory API
From: Philippe Mathieu-Daudé Fix an unlikely memory leak in load_elf_image(). Fixes: bf858897b7 ("linux-user: Re-use load_elf_image for the main binary.") Signed-off-by: Philippe Mathieu-Daudé Message-Id: <20201003174944.1972444-1-f4...@amsat.org> Signed-off-by: Richard Henderson --- linux-user/elfload.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index f6022fd704..1a3150df7c 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -2584,13 +2584,13 @@ static void load_elf_image(const char *image_name, int image_fd, info->brk = vaddr_em; } } else if (eppnt->p_type == PT_INTERP && pinterp_name) { -char *interp_name; +g_autofree char *interp_name = NULL; if (*pinterp_name) { errmsg = "Multiple PT_INTERP entries"; goto exit_errmsg; } -interp_name = malloc(eppnt->p_filesz); +interp_name = g_malloc(eppnt->p_filesz); if (!interp_name) { goto exit_perror; } @@ -2609,7 +2609,7 @@ static void load_elf_image(const char *image_name, int image_fd, errmsg = "Invalid PT_INTERP entry"; goto exit_errmsg; } -*pinterp_name = interp_name; +*pinterp_name = g_steal_pointer(&interp_name); #ifdef TARGET_MIPS } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) { Mips_elf_abiflags_v0 abiflags; @@ -2961,7 +2961,7 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info) if (elf_interpreter) { info->load_bias = interp_info.load_bias; info->entry = interp_info.entry; -free(elf_interpreter); +g_free(elf_interpreter); } #ifdef USE_ELF_CORE_DUMP -- 2.25.1
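The GLib idiom used above — g_autofree for scope-bound cleanup plus g_steal_pointer() to hand ownership out on the success path — in a stand-alone sketch; the helper and its checks are illustrative only:

#include <glib.h>
#include <string.h>

/* Return a caller-owned copy of buf if it is NUL-terminated, else NULL. */
static char *dup_if_terminated(const char *buf, size_t len)
{
    g_autofree char *copy = NULL;

    if (len == 0) {
        return NULL;
    }
    copy = g_malloc(len);            /* freed automatically on early return */
    memcpy(copy, buf, len);
    if (copy[len - 1] != '\0') {
        return NULL;                 /* no leak: g_autofree reclaims copy here */
    }
    return g_steal_pointer(&copy);   /* ownership moves to the caller, no free */
}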
[PATCH v11 00/12] linux-user: User support for AArch64 BTI
The kernel abi for this was merged in v5.8, just as the qemu 5.1 merge window was closing, so this slipped to the next dev cycle. Changes from v10: * Include Phil's plug of interp_name memory leak. * Convert error reporting to Error api. * Mirror the kernel's code structure for parsing notes (though Error means that it's not exactly the same). * Split aarch64 stuff from basic note parsing patch. Changes from v9: * Split what is now patch 7 into 3 more (pmm). * All prerequisites are now upstream. r~ Philippe Mathieu-Daudé (1): linux-user/elfload: Avoid leaking interp_name using GLib memory API Richard Henderson (11): linux-user/aarch64: Reset btype for signals linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI include/elf: Add defines related to GNU property notes for AArch64 linux-user/elfload: Fix coding style in load_elf_image linux-user/elfload: Adjust iteration over phdr linux-user/elfload: Move PT_INTERP detection to first loop linux-user/elfload: Use Error for load_elf_image linux-user/elfload: Use Error for load_elf_interp linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND tests/tcg/aarch64: Add bti smoke tests include/elf.h | 22 ++ include/exec/cpu-all.h| 2 + linux-user/qemu.h | 4 + linux-user/syscall_defs.h | 4 + target/arm/cpu.h | 5 + linux-user/aarch64/signal.c | 10 +- linux-user/elfload.c | 326 +- linux-user/mmap.c | 16 ++ target/arm/translate-a64.c| 6 +- tests/tcg/aarch64/bti-1.c | 62 ++ tests/tcg/aarch64/bti-2.c | 108 ++ tests/tcg/aarch64/bti-crt.inc.c | 51 + tests/tcg/aarch64/Makefile.target | 10 + tests/tcg/configure.sh| 4 + 14 files changed, 569 insertions(+), 61 deletions(-) create mode 100644 tests/tcg/aarch64/bti-1.c create mode 100644 tests/tcg/aarch64/bti-2.c create mode 100644 tests/tcg/aarch64/bti-crt.inc.c -- 2.25.1
[PATCH 5/5] m48t59: remove legacy m48t59_init() function
Now that all of the callers of this function have been switched to use qdev properties, this legacy init function can now be removed. Signed-off-by: Mark Cave-Ayland --- hw/rtc/m48t59.c | 35 --- include/hw/rtc/m48t59.h | 4 2 files changed, 39 deletions(-) diff --git a/hw/rtc/m48t59.c b/hw/rtc/m48t59.c index 6525206976..d54929e861 100644 --- a/hw/rtc/m48t59.c +++ b/hw/rtc/m48t59.c @@ -564,41 +564,6 @@ const MemoryRegionOps m48t59_io_ops = { .endianness = DEVICE_LITTLE_ENDIAN, }; -/* Initialisation routine */ -Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base, - uint32_t io_base, uint16_t size, int base_year, - int model) -{ -DeviceState *dev; -SysBusDevice *s; -int i; - -for (i = 0; i < ARRAY_SIZE(m48txx_sysbus_info); i++) { -if (m48txx_sysbus_info[i].size != size || -m48txx_sysbus_info[i].model != model) { -continue; -} - -dev = qdev_new(m48txx_sysbus_info[i].bus_name); -qdev_prop_set_int32(dev, "base-year", base_year); -s = SYS_BUS_DEVICE(dev); -sysbus_realize_and_unref(s, &error_fatal); -sysbus_connect_irq(s, 0, IRQ); -if (io_base != 0) { -memory_region_add_subregion(get_system_io(), io_base, -sysbus_mmio_get_region(s, 1)); -} -if (mem_base != 0) { -sysbus_mmio_map(s, 0, mem_base); -} - -return NVRAM(s); -} - -assert(false); -return NULL; -} - void m48t59_realize_common(M48t59State *s, Error **errp) { s->buffer = g_malloc0(s->size); diff --git a/include/hw/rtc/m48t59.h b/include/hw/rtc/m48t59.h index 9defe578d1..d9b45eb161 100644 --- a/include/hw/rtc/m48t59.h +++ b/include/hw/rtc/m48t59.h @@ -47,8 +47,4 @@ struct NvramClass { void (*toggle_lock)(Nvram *obj, int lock); }; -Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base, - uint32_t io_base, uint16_t size, int base_year, - int type); - #endif /* HW_M48T59_H */ -- 2.20.1
[PATCH 4/5] ppc405_boards: use qdev properties instead of legacy m48t59_init() function
Signed-off-by: Mark Cave-Ayland --- hw/ppc/ppc405_boards.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c index 6198ec1035..4687715b15 100644 --- a/hw/ppc/ppc405_boards.c +++ b/hw/ppc/ppc405_boards.c @@ -28,6 +28,8 @@ #include "qemu-common.h" #include "cpu.h" #include "hw/ppc/ppc.h" +#include "hw/qdev-properties.h" +#include "hw/sysbus.h" #include "ppc405.h" #include "hw/rtc/m48t59.h" #include "hw/block/flash.h" @@ -145,6 +147,8 @@ static void ref405ep_init(MachineState *machine) char *filename; ppc4xx_bd_info_t bd; CPUPPCState *env; +DeviceState *dev; +SysBusDevice *s; qemu_irq *pic; MemoryRegion *bios; MemoryRegion *sram = g_new(MemoryRegion, 1); @@ -227,7 +231,11 @@ static void ref405ep_init(MachineState *machine) /* Register FPGA */ ref405ep_fpga_init(sysmem, 0xF030); /* Register NVRAM */ -m48t59_init(NULL, 0xF000, 0, 8192, 1968, 8); +dev = qdev_new("sysbus-m48t08"); +qdev_prop_set_int32(dev, "base-year", 1968); +s = SYS_BUS_DEVICE(dev); +sysbus_realize_and_unref(s, &error_fatal); +sysbus_mmio_map(s, 0, 0xF000); /* Load kernel */ linux_boot = (kernel_filename != NULL); if (linux_boot) { -- 2.20.1
[PATCH 1/5] m48t59-isa: remove legacy m48t59_init_isa() function
This function is no longer used within the codebase. Signed-off-by: Mark Cave-Ayland --- hw/rtc/m48t59-isa.c | 25 - include/hw/rtc/m48t59.h | 2 -- 2 files changed, 27 deletions(-) diff --git a/hw/rtc/m48t59-isa.c b/hw/rtc/m48t59-isa.c index cae315e488..dc21fb10a5 100644 --- a/hw/rtc/m48t59-isa.c +++ b/hw/rtc/m48t59-isa.c @@ -58,31 +58,6 @@ static M48txxInfo m48txx_isa_info[] = { } }; -Nvram *m48t59_init_isa(ISABus *bus, uint32_t io_base, uint16_t size, - int base_year, int model) -{ -ISADevice *isa_dev; -DeviceState *dev; -int i; - -for (i = 0; i < ARRAY_SIZE(m48txx_isa_info); i++) { -if (m48txx_isa_info[i].size != size || -m48txx_isa_info[i].model != model) { -continue; -} - -isa_dev = isa_new(m48txx_isa_info[i].bus_name); -dev = DEVICE(isa_dev); -qdev_prop_set_uint32(dev, "iobase", io_base); -qdev_prop_set_int32(dev, "base-year", base_year); -isa_realize_and_unref(isa_dev, bus, &error_fatal); -return NVRAM(dev); -} - -assert(false); -return NULL; -} - static uint32_t m48txx_isa_read(Nvram *obj, uint32_t addr) { M48txxISAState *d = M48TXX_ISA(obj); diff --git a/include/hw/rtc/m48t59.h b/include/hw/rtc/m48t59.h index 04abedf3b2..9defe578d1 100644 --- a/include/hw/rtc/m48t59.h +++ b/include/hw/rtc/m48t59.h @@ -47,8 +47,6 @@ struct NvramClass { void (*toggle_lock)(Nvram *obj, int lock); }; -Nvram *m48t59_init_isa(ISABus *bus, uint32_t io_base, uint16_t size, - int base_year, int type); Nvram *m48t59_init(qemu_irq IRQ, hwaddr mem_base, uint32_t io_base, uint16_t size, int base_year, int type); -- 2.20.1
[PATCH 2/5] sun4m: use qdev properties instead of legacy m48t59_init() function
Signed-off-by: Mark Cave-Ayland --- hw/sparc/sun4m.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c index 54a2b2f9ef..a9bb60f2b2 100644 --- a/hw/sparc/sun4m.c +++ b/hw/sparc/sun4m.c @@ -966,7 +966,13 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, create_unimplemented_device("SUNW,sx", hwdef->sx_base, 0x2000); } -nvram = m48t59_init(slavio_irq[0], hwdef->nvram_base, 0, 0x2000, 1968, 8); +dev = qdev_new("sysbus-m48t08"); +qdev_prop_set_int32(dev, "base-year", 1968); +s = SYS_BUS_DEVICE(dev); +sysbus_realize_and_unref(s, &error_fatal); +sysbus_connect_irq(s, 0, slavio_irq[0]); +sysbus_mmio_map(s, 0, hwdef->nvram_base); +nvram = NVRAM(dev); slavio_timer_init_all(hwdef->counter_base, slavio_irq[19], slavio_cpu_irq, smp_cpus); -- 2.20.1
[PATCH 3/5] sun4u: use qdev properties instead of legacy m48t59_init() function
Signed-off-by: Mark Cave-Ayland --- hw/sparc64/sun4u.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c index ad5ca2472a..05e659c8a4 100644 --- a/hw/sparc64/sun4u.c +++ b/hw/sparc64/sun4u.c @@ -671,10 +671,13 @@ static void sun4uv_init(MemoryRegion *address_space_mem, pci_ide_create_devs(pci_dev); /* Map NVRAM into I/O (ebus) space */ -nvram = m48t59_init(NULL, 0, 0, NVRAM_SIZE, 1968, 59); -s = SYS_BUS_DEVICE(nvram); +dev = qdev_new("sysbus-m48t59"); +qdev_prop_set_int32(dev, "base-year", 1968); +s = SYS_BUS_DEVICE(dev); +sysbus_realize_and_unref(s, &error_fatal); memory_region_add_subregion(pci_address_space_io(ebus), 0x2000, sysbus_mmio_get_region(s, 0)); +nvram = NVRAM(dev); initrd_size = 0; initrd_addr = 0; -- 2.20.1
[PATCH 0/5] m48t59: remove legacy init functions
This patchset is inspired by Philippe's "hw/rtc/m48t59: Simplify m48t59_init()" patchset at https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg04493.html but goes further: rather than tidy-up the legacy init functions, convert the callers to use qdev properties directly so they can simply be removed. Signed-off-by: Mark Cave-Ayland Mark Cave-Ayland (5): m48t59-isa: remove legacy m48t59_init_isa() function sun4m: use qdev properties instead of legacy m48t59_init() function sun4u: use qdev properties instead of legacy m48t59_init() function ppc405_boards: use qdev properties instead of legacy m48t59_init() function m48t59: remove legacy m48t59_init() function hw/ppc/ppc405_boards.c | 10 +- hw/rtc/m48t59-isa.c | 25 - hw/rtc/m48t59.c | 35 --- hw/sparc/sun4m.c| 8 +++- hw/sparc64/sun4u.c | 7 +-- include/hw/rtc/m48t59.h | 6 -- 6 files changed, 21 insertions(+), 70 deletions(-) -- 2.20.1
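Patches 2-4 all apply the same shape of conversion; condensed into one sketch (the device name, base year, address and IRQ wiring are placeholders — each board uses its own values):

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "hw/qdev-properties.h"
#include "hw/sysbus.h"

static DeviceState *board_add_nvram(hwaddr base, qemu_irq irq)
{
    DeviceState *dev = qdev_new("sysbus-m48t08");    /* exact model per board */
    SysBusDevice *s = SYS_BUS_DEVICE(dev);

    qdev_prop_set_int32(dev, "base-year", 1968);     /* was an m48t59_init() arg */
    sysbus_realize_and_unref(s, &error_fatal);
    if (irq) {
        sysbus_connect_irq(s, 0, irq);               /* only boards that wire one */
    }
    sysbus_mmio_map(s, 0, base);
    return dev;
}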
[PATCH] goldfish_rtc: re-arm the alarm after migration
After a migration the clock offset is updated, but any pending alarm also needs to be re-armed. Signed-off-by: Laurent Vivier --- hw/rtc/goldfish_rtc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/rtc/goldfish_rtc.c b/hw/rtc/goldfish_rtc.c index 0f4e8185a796..e07ff0164e0c 100644 --- a/hw/rtc/goldfish_rtc.c +++ b/hw/rtc/goldfish_rtc.c @@ -211,6 +211,8 @@ static int goldfish_rtc_post_load(void *opaque, int version_id) qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); s->tick_offset = s->tick_offset_vmstate - delta; +goldfish_rtc_set_alarm(s); + return 0; } -- 2.26.2
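For context, the general shape of a vmstate post_load hook that re-derives host-relative state and re-arms a timer is roughly the following; this is a generic sketch with made-up names, not the goldfish code itself:

#include "qemu/osdep.h"
#include "qemu/timer.h"
#include "migration/vmstate.h"

typedef struct MyRTCState {
    QEMUTimer *timer;
    uint64_t tick_offset;
    uint64_t tick_offset_vmstate;
    uint64_t alarm_ns;               /* hypothetical pending-alarm deadline */
} MyRTCState;

static int my_rtc_post_load(void *opaque, int version_id)
{
    MyRTCState *s = opaque;
    int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);

    /* Recompute the host-relative offset from the migrated value... */
    s->tick_offset = s->tick_offset_vmstate - now;
    /* ...and re-arm anything derived from it, or it stays silent. */
    if (s->alarm_ns) {
        timer_mod(s->timer, s->alarm_ns);
    }
    return 0;
}

static const VMStateDescription vmstate_my_rtc = {
    .name = "my-rtc",
    .version_id = 1,
    .minimum_version_id = 1,
    .post_load = my_rtc_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_UINT64(tick_offset_vmstate, MyRTCState),
        VMSTATE_END_OF_LIST()
    }
};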
Re: HTIF tohost symbol size check always fails
On Fri, Oct 16, 2020 at 7:59 AM Peer Adelt wrote: > > Hi, > > I have a problem with the RISC-V HTIF device. > > Every binary I have compiled for Spike on riscv32 fails with the following > error message: "HTIF tohost must be 8 bytes" > > This happens regardless of which program I have translated for Spike. This is > also the case with the official riscv-compliance tests, for example. > > The query "if (st_size != 8)" in the HTIF device always fails, because > st_size seems to be always 0. > > To be able to reproduce it: > - QEMU GIT Hash: d0ed6a69d399ae193959225cdeaa9382746c91cc (tag "v5.1.0") I just checked with this hash and with the current master and on both I can run a ELF executable on the Spike machine for RV32. > - System: Mac OS 10.14.6 (Darwin Kernel Version 18.7.0) > - Compiler: Latest SiFive Build for GCC under OSX Maybe try using an official toolchain instead of a vendor fork. Alistair > - Command: qemu-system-riscv32 -M spike -nographic -bios none -kernel > > > Best regards, > Peer Adelt
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
On 10/16/20 12:04 PM, Ju Hyung Park wrote: > A small update: > > As per Stefano's suggestion, disabling io_uring support from QEMU from > the configuration step did fix the problem and I'm no longer having > hangs. > > Looks like it __is__ an io_uring issue :( Would be great if you could try 5.4.71 and see if that helps for your issue. -- Jens Axboe
Re: io_uring possibly the culprit for qemu hang (linux-5.4.y)
A small update: As per Stefano's suggestion, disabling io_uring support from QEMU from the configuration step did fix the problem and I'm no longer having hangs. Looks like it __is__ an io_uring issue :( Btw, I used liburing fe50048 for linking QEMU. Thanks. On Fri, Oct 2, 2020 at 4:35 PM Stefano Garzarella wrote: > > Hi Ju, > > On Thu, Oct 01, 2020 at 11:30:14PM +0900, Ju Hyung Park wrote: > > Hi Stefano, > > > > On Thu, Oct 1, 2020 at 5:59 PM Stefano Garzarella > > wrote: > > > Please, can you share the qemu command line that you are using? > > > This can be useful for the analysis. > > > > Sure. > > Thanks for sharing. > > The issue seems related to io_uring and the new io_uring fd monitoring > implementation available from QEMU 5.0. > > I'll try to reproduce. > > For now, as a workaround, you can rebuild qemu by disabling io-uring support: > > ../configure --disable-linux-io-uring ... > > > Thanks, > Stefano > > > > > QEMU: > > /usr/bin/qemu-system-x86_64 -name guest=win10,debug-threads=on -S > > -object > > secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win10/master-key.aes > > -blockdev > > {"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} > > -blockdev > > {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} > > -blockdev > > {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win10_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} > > -blockdev > > {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} > > -machine > > pc-q35-5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,mem-merge=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format > > -cpu > > Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,amd-ssbd=on,fma=off,avx=off,f16c=off,rdrand=off,bmi1=off,hle=off,avx2=off,bmi2=off,rtm=off,rdseed=off,adx=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-runtime,hv-synic,hv-stimer,hv-reset > > -m 8192 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu/1-win10 > > -overcommit mem-lock=off -smp 4,sockets=1,dies=1,cores=2,threads=2 > > -uuid 7ccc3031-1dab-4267-b72a-d60065b5ff7f -display none > > -no-user-config -nodefaults -chardev > > socket,id=charmonitor,fd=32,server,nowait -mon > > chardev=charmonitor,id=monitor,mode=control -rtc > > base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay > > -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global > > ICH9-LPC.disable_s4=1 -boot menu=off,strict=on -device > > pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 > > -device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 > > -device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 > > -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 > > -device pcie-pci-bridge,id=pci.5,bus=pci.2,addr=0x0 -device > > qemu-xhci,id=usb,bus=pci.1,addr=0x0 -blockdev > > {"driver":"host_device","filename":"/dev/disk/by-partuuid/05c3750b-060f-4703-95ea-6f5e546bf6e9","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"} > > -blockdev > > 
{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","detect-zeroes":"unmap","cache":{"direct":false,"no-flush":true},"driver":"raw","file":"libvirt-1-storage"} > > -device > > virtio-blk-pci,scsi=off,bus=pcie.0,addr=0xa,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on > > -netdev tap,fd=34,id=hostnet0 -device > > e1000,netdev=hostnet0,id=net0,mac=52:54:00:c6:bb:bc,bus=pcie.0,addr=0x3 > > -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x4 -device > > hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device > > vfio-pci,host=:00:02.0,id=hostdev0,bus=pcie.0,addr=0x2,rombar=0 > > -device virtio-balloon-pci,id=balloon0,bus=pcie.0,addr=0x8 -object > > rng-random,id=objrng0,filename=/dev/urandom -device > > virtio-rng-pci,rng=objrng0,id=rng0,bus=pcie.0,addr=0x9 -msg > > timestamp=on > > > > And I use libvirt 6.3.0 to manage the VM. Here's an xml of my VM. > > > > > > win10 > > 7ccc3031-1dab-4267-b72a-d60065b5ff7f > > > > > xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0";> > > http://microsoft.com/win/10"/> > > > > > > 8388608 > > 8388608 > > > > > > > > > > 4 > > > > > > > > > > > > > > > > hvm > > > type="pflash">/usr/share/OVMF/OVMF_CODE.fd > > /var/lib/libvirt/qemu/nvram/win10_VARS.fd > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Re: [PATCH v2] hw/riscv: microchip_pfsoc: IOSCBCTRL memmap entry
On Fri, Oct 16, 2020 at 10:10 AM Ivan Griffin wrote: > > Adding the PolarFire SoC IOSCBCTRL memory region to prevent QEMU > reporting a STORE/AMO Access Fault. > > This region is used by the PolarFire SoC port of U-Boot to > interact with the FPGA system controller. > > Signed-off-by: Ivan Griffin Reviewed-by: Alistair Francis Alistair > --- > hw/riscv/microchip_pfsoc.c | 10 ++ > include/hw/riscv/microchip_pfsoc.h | 1 + > 2 files changed, 11 insertions(+) > > diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c > index 4627179cd3..9aaa276ee2 100644 > --- a/hw/riscv/microchip_pfsoc.c > +++ b/hw/riscv/microchip_pfsoc.c > @@ -97,6 +97,7 @@ static const struct MemmapEntry { > [MICROCHIP_PFSOC_GPIO2] = { 0x20122000, 0x1000 }, > [MICROCHIP_PFSOC_ENVM_CFG] ={ 0x2020, 0x1000 }, > [MICROCHIP_PFSOC_ENVM_DATA] = { 0x2022,0x2 }, > +[MICROCHIP_PFSOC_IOSCB_CTRL] = { 0x3702, 0x1000 }, > [MICROCHIP_PFSOC_IOSCB_CFG] = { 0x3708, 0x1000 }, > [MICROCHIP_PFSOC_DRAM] ={ 0x8000,0x0 }, > }; > @@ -341,6 +342,15 @@ static void microchip_pfsoc_soc_realize(DeviceState > *dev, Error **errp) > create_unimplemented_device("microchip.pfsoc.ioscb.cfg", > memmap[MICROCHIP_PFSOC_IOSCB_CFG].base, > memmap[MICROCHIP_PFSOC_IOSCB_CFG].size); > + > +/* IOSCBCTRL > + * > + * These registers are not documented in the official documentation > + * but used by the polarfire-soc-bare-meta-library code > + */ > +create_unimplemented_device("microchip.pfsoc.ioscb.ctrl", > +memmap[MICROCHIP_PFSOC_IOSCB_CTRL].base, > +memmap[MICROCHIP_PFSOC_IOSCB_CTRL].size); > } > > static void microchip_pfsoc_soc_class_init(ObjectClass *oc, void *data) > diff --git a/include/hw/riscv/microchip_pfsoc.h > b/include/hw/riscv/microchip_pfsoc.h > index 8bfc7e1a85..3f1874b162 100644 > --- a/include/hw/riscv/microchip_pfsoc.h > +++ b/include/hw/riscv/microchip_pfsoc.h > @@ -95,6 +95,7 @@ enum { > MICROCHIP_PFSOC_ENVM_CFG, > MICROCHIP_PFSOC_ENVM_DATA, > MICROCHIP_PFSOC_IOSCB_CFG, > +MICROCHIP_PFSOC_IOSCB_CTRL, > MICROCHIP_PFSOC_DRAM, > }; > > -- > 2.17.1 >
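For readers who have not met the helper: create_unimplemented_device() maps a background MMIO region whose reads return zero and whose writes are discarded, logging accesses under LOG_UNIMP instead of letting the guest take the Store/AMO access fault. In minimal form (the wrapper name is made up):

#include "qemu/osdep.h"
#include "hw/misc/unimp.h"

/* Stub an MMIO window so guest accesses are logged rather than faulting. */
static void add_ioscb_ctrl_stub(hwaddr base, hwaddr size)
{
    create_unimplemented_device("microchip.pfsoc.ioscb.ctrl", base, size);
}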
Re: [RFC PATCH v3] target/mips: Increase number of TLB entries on the 34Kf core (16 -> 64)
On 10/16/20 6:33 AM, Philippe Mathieu-Daudé wrote: > Per "MIPS32 34K Processor Core Family Software User's Manual, > Revision 01.13" page 8 in "Joint TLB (JTLB)" section: > > "The JTLB is a fully associative TLB cache containing 16, 32, >or 64-dual-entries mapping up to 128 virtual pages to their >corresponding physical addresses." > > There is no particular reason to restrict the 34Kf core model to > 16 TLB entries, so raise its config to 64. > > This is helpful for other projects, in particular the Yocto Project: > > Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit > MIPS CI loop. It was observed that in this case CI test execution > time was almost twice longer than 64bit MIPS variant that runs > under MIPS64R2-generic model. It was investigated and concluded > that the difference in number of TLBs 16 in 34Kf case vs 64 in > MIPS64R2-generic is responsible for most of CI real time execution > difference. Because with 16 TLBs linux user-land trashes TLB more > and it needs to execute more instructions in TLB refill handler > calls, as result it runs much longer. > > (https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html) > > Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992 > Reported-by: Victor Kamensky > Signed-off-by: Philippe Mathieu-Daudé > --- > v3: KISS > Supersedes: <20201015224746.540027-1-f4...@amsat.org> > Signed-off-by: Philippe Mathieu-Daudé > --- > target/mips/translate_init.c.inc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Richard Henderson r~
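Given the one-line diffstat, the change is presumably just a wider value in the 34Kf's Config1 template; the encoding that turns 16 and 64 entries into 15 and 63 is the MMUSize-1 field. A small sketch of that relationship (the CP0C1_MMU shift matches target/mips, the helper itself is illustrative):

#include <stdint.h>

#define CP0C1_MMU 25   /* Config1.MMUSize-1 field, bits 30:25 */

/* Encode "n TLB entries" the way the CPU templates do: the field holds n - 1. */
static inline uint32_t config1_mmu_bits(unsigned n_tlb_entries)
{
    return (uint32_t)(n_tlb_entries - 1) << CP0C1_MMU;
}

/* 34Kf before: config1_mmu_bits(16) == (15 << CP0C1_MMU)
 * 34Kf after:  config1_mmu_bits(64) == (63 << CP0C1_MMU) */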
Re: [PATCH v3 00/15] raspi: add the bcm2835 cprman clock manager
On 10/10/20 3:57 PM, Luc Michel wrote: [...] Hi, This series adds the BCM2835 CPRMAN clock manager peripheral to the Raspberry Pi machine. Series: Tested-by: Philippe Mathieu-Daudé
[PATCH v3 11/13] block/qcow2: read_cache_sizes: return status value
It's better to return status together with setting errp. It allows to reduce error propagation. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Alberto Garcia --- block/qcow2.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index c4b86df7c0..2b6ec4b757 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -869,7 +869,7 @@ static void qcow2_attach_aio_context(BlockDriverState *bs, cache_clean_timer_init(bs, new_context); } -static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, +static bool read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, uint64_t *l2_cache_size, uint64_t *l2_cache_entry_size, uint64_t *refcount_cache_size, Error **errp) @@ -907,16 +907,16 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE " and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set " "at the same time"); -return; +return false; } else if (l2_cache_size_set && (l2_cache_max_setting > combined_cache_size)) { error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed " QCOW2_OPT_CACHE_SIZE); -return; +return false; } else if (*refcount_cache_size > combined_cache_size) { error_setg(errp, QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not exceed " QCOW2_OPT_CACHE_SIZE); -return; +return false; } if (l2_cache_size_set) { @@ -955,8 +955,10 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, error_setg(errp, "L2 cache entry size must be a power of two " "between %d and the cluster size (%d)", 1 << MIN_CLUSTER_BITS, s->cluster_size); -return; +return false; } + +return true; } typedef struct Qcow2ReopenState { @@ -983,7 +985,6 @@ static int qcow2_update_options_prepare(BlockDriverState *bs, int i; const char *encryptfmt; QDict *encryptopts = NULL; -Error *local_err = NULL; int ret; qdict_extract_subqdict(options, &encryptopts, "encrypt."); @@ -996,10 +997,8 @@ static int qcow2_update_options_prepare(BlockDriverState *bs, } /* get L2 table/refcount block cache size from command line options */ -read_cache_sizes(bs, opts, &l2_cache_size, &l2_cache_entry_size, - &refcount_cache_size, &local_err); -if (local_err) { -error_propagate(errp, local_err); +if (!read_cache_sizes(bs, opts, &l2_cache_size, &l2_cache_entry_size, + &refcount_cache_size, errp)) { ret = -EINVAL; goto fail; } -- 2.21.3
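The calling convention being adopted here (and across the rest of the series) is the one recommended in include/qapi/error.h: the bool result mirrors whether *errp was set, so callers can test the return value and drop the local_err/error_propagate() dance. A minimal sketch with made-up names:

#include "qemu/osdep.h"
#include "qapi/error.h"

/* Return true on success; on failure set *errp and return false. */
static bool check_cache_limit(uint64_t value, uint64_t limit, Error **errp)
{
    if (value > limit) {
        error_setg(errp, "cache size %" PRIu64 " may not exceed %" PRIu64,
                   value, limit);
        return false;
    }
    return true;
}

static int apply_options(uint64_t value, Error **errp)
{
    if (!check_cache_limit(value, 1024 * 1024, errp)) {
        return -EINVAL;   /* no local Error, no error_propagate() */
    }
    return 0;
}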
[PATCH v3 12/13] block/qcow2: simplify qcow2_co_invalidate_cache()
qcow2_do_open correctly sets errp on each failure path. So, we can simplify code in qcow2_co_invalidate_cache() and drop explicit error propagation. Add ERRP_GUARD() as mandated by the documentation in include/qapi/error.h so that error_prepend() is actually called even if errp is &error_fatal. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Alberto Garcia Reviewed-by: Greg Kurz --- block/qcow2.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 2b6ec4b757..cd5f48d3fb 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -2702,11 +2702,11 @@ static void qcow2_close(BlockDriverState *bs) static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs, Error **errp) { +ERRP_GUARD(); BDRVQcow2State *s = bs->opaque; int flags = s->flags; QCryptoBlock *crypto = NULL; QDict *options; -Error *local_err = NULL; int ret; /* @@ -2724,16 +2724,11 @@ static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs, flags &= ~BDRV_O_INACTIVE; qemu_co_mutex_lock(&s->lock); -ret = qcow2_do_open(bs, options, flags, &local_err); +ret = qcow2_do_open(bs, options, flags, errp); qemu_co_mutex_unlock(&s->lock); qobject_unref(options); -if (local_err) { -error_propagate_prepend(errp, local_err, -"Could not reopen qcow2 layer: "); -bs->drv = NULL; -return; -} else if (ret < 0) { -error_setg_errno(errp, -ret, "Could not reopen qcow2 layer"); +if (ret < 0) { +error_prepend(errp, "Could not reopen qcow2 layer: "); bs->drv = NULL; return; } -- 2.21.3
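A stand-alone sketch of why ERRP_GUARD() is needed: it lets the function dereference *errp and call error_prepend(errp, ...) even when the caller passed &error_fatal or NULL. Names are illustrative; do_open() merely stands in for qcow2_do_open():

#include "qemu/osdep.h"
#include "qapi/error.h"

static int do_open(Error **errp)
{
    error_setg(errp, "header unreadable");   /* this sketch always fails */
    return -EIO;
}

static int invalidate_cache(Error **errp)
{
    ERRP_GUARD();   /* makes *errp safe to inspect and prepend to */
    int ret;

    ret = do_open(errp);
    if (ret < 0) {
        error_prepend(errp, "Could not reopen qcow2 layer: ");
        return ret;
    }
    return 0;
}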
[PATCH v3 13/13] block/qed: bdrv_qed_do_open: deal with errp
Set errp always on failure. Generic bdrv_open_driver supports driver functions which can return negative value and forget to set errp. That's a strange thing.. Let's improve bdrv_qed_do_open to not behave this way. This allows to simplify code in bdrv_qed_co_invalidate_cache(). Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Alberto Garcia Reviewed-by: Greg Kurz --- block/qed.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/block/qed.c b/block/qed.c index b27e7546ca..f45c640513 100644 --- a/block/qed.c +++ b/block/qed.c @@ -393,6 +393,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header)); if (ret < 0) { +error_setg(errp, "Failed to read QED header"); return ret; } qed_header_le_to_cpu(&le_header, &s->header); @@ -408,25 +409,30 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, return -ENOTSUP; } if (!qed_is_cluster_size_valid(s->header.cluster_size)) { +error_setg(errp, "QED cluster size is invalid"); return -EINVAL; } /* Round down file size to the last cluster */ file_size = bdrv_getlength(bs->file->bs); if (file_size < 0) { +error_setg(errp, "Failed to get file length"); return file_size; } s->file_size = qed_start_of_cluster(s, file_size); if (!qed_is_table_size_valid(s->header.table_size)) { +error_setg(errp, "QED table size is invalid"); return -EINVAL; } if (!qed_is_image_size_valid(s->header.image_size, s->header.cluster_size, s->header.table_size)) { +error_setg(errp, "QED image size is invalid"); return -EINVAL; } if (!qed_check_table_offset(s, s->header.l1_table_offset)) { +error_setg(errp, "QED table offset is invalid"); return -EINVAL; } @@ -438,6 +444,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, /* Header size calculation must not overflow uint32_t */ if (s->header.header_size > UINT32_MAX / s->header.cluster_size) { +error_setg(errp, "QED header size is too large"); return -EINVAL; } @@ -445,6 +452,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, if ((uint64_t)s->header.backing_filename_offset + s->header.backing_filename_size > s->header.cluster_size * s->header.header_size) { +error_setg(errp, "QED backing filename offset is invalid"); return -EINVAL; } @@ -453,6 +461,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, bs->auto_backing_file, sizeof(bs->auto_backing_file)); if (ret < 0) { +error_setg(errp, "Failed to read backing filename"); return ret; } pstrcpy(bs->backing_file, sizeof(bs->backing_file), @@ -475,6 +484,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, ret = qed_write_header_sync(s); if (ret) { +error_setg(errp, "Failed to update header"); return ret; } @@ -487,6 +497,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, ret = qed_read_l1_table_sync(s); if (ret) { +error_setg(errp, "Failed to read L1 table"); goto out; } @@ -503,6 +514,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState *bs, QDict *options, ret = qed_check(s, &result, true); if (ret) { +error_setg(errp, "Image corrupted"); goto out; } } @@ -1537,22 +1549,16 @@ static void coroutine_fn bdrv_qed_co_invalidate_cache(BlockDriverState *bs, Error **errp) { BDRVQEDState *s = bs->opaque; -Error *local_err = NULL; int ret; bdrv_qed_close(bs); bdrv_qed_init_state(bs); qemu_co_mutex_lock(&s->table_lock); -ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, &local_err); 
+ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, errp); qemu_co_mutex_unlock(&s->table_lock); -if (local_err) { -error_propagate_prepend(errp, local_err, -"Could not reopen qed layer: "); -return; -} else if (ret < 0) { -error_setg_errno(errp, -ret, "Could not reopen qed layer"); -return; +if (ret < 0) { +error_prepend(errp, "Could not reopen qed layer: "); } } -- 2.21.3
[PATCH v3 09/13] block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface
It's recommended for bool functions with errp to return true on success and false on failure. Non-standard interfaces don't help to understand the code. The change is also needed to reduce error propagation. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Alberto Garcia Reviewed-by: Greg Kurz --- block/qcow2.h| 3 ++- block/qcow2-bitmap.c | 26 +++--- block/qcow2.c| 6 ++ 3 files changed, 19 insertions(+), 16 deletions(-) diff --git a/block/qcow2.h b/block/qcow2.h index 024901a6ca..4c994739ed 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -975,7 +975,8 @@ void qcow2_cache_discard(Qcow2Cache *c, void *table); int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res, void **refcount_table, int64_t *refcount_table_size); -bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp); +bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated, + Error **errp); bool qcow2_get_bitmap_info_list(BlockDriverState *bs, Qcow2BitmapInfoList **info_list, Error **errp); int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp); diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c index 55cd52ab96..ca01f08bac 100644 --- a/block/qcow2-bitmap.c +++ b/block/qcow2-bitmap.c @@ -962,25 +962,27 @@ static void set_readonly_helper(gpointer bitmap, gpointer value) bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value); } -/* qcow2_load_dirty_bitmaps() - * Return value is a hint for caller: true means that the Qcow2 header was - * updated. (false doesn't mean that the header should be updated by the - * caller, it just means that updating was not needed or the image cannot be - * written to). - * On failure the function returns false. +/* + * Return true on success, false on failure. + * If header_updated is not NULL then it is set appropriately regardless of + * the return value. */ -bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp) +bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated, + Error **errp) { BDRVQcow2State *s = bs->opaque; Qcow2BitmapList *bm_list; Qcow2Bitmap *bm; GSList *created_dirty_bitmaps = NULL; -bool header_updated = false; bool needs_update = false; +if (header_updated) { +*header_updated = false; +} + if (s->nb_bitmaps == 0) { /* No bitmaps - nothing to do */ -return false; +return true; } bm_list = bitmap_list_load(bs, s->bitmap_directory_offset, @@ -1036,7 +1038,9 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp) error_setg_errno(errp, -ret, "Can't update bitmap directory"); goto fail; } -header_updated = true; +if (header_updated) { +*header_updated = true; +} } if (!can_write(bs)) { @@ -1047,7 +1051,7 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp) g_slist_free(created_dirty_bitmaps); bitmap_list_free(bm_list); -return header_updated; +return true; fail: g_slist_foreach(created_dirty_bitmaps, release_dirty_bitmap_helper, bs); diff --git a/block/qcow2.c b/block/qcow2.c index 8c89c98978..c4b86df7c0 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1297,7 +1297,6 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, unsigned int len, i; int ret = 0; QCowHeader header; -Error *local_err = NULL; uint64_t ext_end; uint64_t l1_vm_state_index; bool update_header = false; @@ -1785,9 +1784,8 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) { /* It's case 1, 2 or 3.2. Or 3.1 which is BUG in management layer. 
*/ -bool header_updated = qcow2_load_dirty_bitmaps(bs, &local_err); -if (local_err != NULL) { -error_propagate(errp, local_err); +bool header_updated; +if (!qcow2_load_dirty_bitmaps(bs, &header_updated, errp)) { ret = -EINVAL; goto fail; } -- 2.21.3
Re: [RFC PATCH 0/3] target/mips: Make the number of TLB entries a CPU property
On 10/15/20 11:56 AM, Victor Kamensky (kamensky) via wrote: > Is it possible to come back to the 34Kf route, doing a very small, localized, well-defined change of bumping the number of TLBs for a model that we know works well for us? Yes, thanks for testing. I think we should also add a property to enable Config3.PM for any cpu, and see how that gets on. But simply adjusting the number of TLB entries is a good start, and is the only thing that will work for older kernels. r~
[PATCH v3 06/13] block/mirror: drop extra error propagation in commit_active_start()
Let's check return value of mirror_start_job to check for failure instead of local_err. Rename ret to job, as ret is usually integer variable. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Alberto Garcia --- block/mirror.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index b3778248b8..f7c624d6a9 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -1851,8 +1851,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs, bool auto_complete, Error **errp) { bool base_read_only; -Error *local_err = NULL; -BlockJob *ret; +BlockJob *job; base_read_only = bdrv_is_read_only(base); @@ -1862,19 +1861,18 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs, } } -ret = mirror_start_job( +job = mirror_start_job( job_id, bs, creation_flags, base, NULL, speed, 0, 0, MIRROR_LEAVE_BACKING_CHAIN, false, on_error, on_error, true, cb, opaque, &commit_active_job_driver, false, base, auto_complete, filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND, - &local_err); -if (local_err) { -error_propagate(errp, local_err); + errp); +if (!job) { goto error_restore_flags; } -return ret; +return job; error_restore_flags: /* ignore error and errp for bdrv_reopen, because we want to propagate -- 2.21.3
[PATCH v3 10/13] block/qcow2-bitmap: return status from qcow2_store_persistent_dirty_bitmaps
It's better to return status together with setting errp. It makes possible to avoid error propagation. While being here, put ERRP_GUARD() to fix error_prepend(errp, ...) usage inside qcow2_store_persistent_dirty_bitmaps() (see the comment above ERRP_GUARD() definition in include/qapi/error.h) Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Alberto Garcia --- block/qcow2.h| 2 +- block/qcow2-bitmap.c | 13 ++--- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/block/qcow2.h b/block/qcow2.h index 4c994739ed..467cfd4779 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -981,7 +981,7 @@ bool qcow2_get_bitmap_info_list(BlockDriverState *bs, Qcow2BitmapInfoList **info_list, Error **errp); int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp); int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp); -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, +bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, bool release_stored, Error **errp); int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp); bool qcow2_co_can_store_new_dirty_bitmap(BlockDriverState *bs, diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c index ca01f08bac..9eccaab7f8 100644 --- a/block/qcow2-bitmap.c +++ b/block/qcow2-bitmap.c @@ -1536,9 +1536,10 @@ out: * readonly to begin with, and whether we opened directly or reopened to that * state shouldn't matter for the state we get afterward. */ -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, +bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, bool release_stored, Error **errp) { +ERRP_GUARD(); BdrvDirtyBitmap *bitmap; BDRVQcow2State *s = bs->opaque; uint32_t new_nb_bitmaps = s->nb_bitmaps; @@ -1558,7 +1559,7 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, bm_list = bitmap_list_load(bs, s->bitmap_directory_offset, s->bitmap_directory_size, errp); if (bm_list == NULL) { -return; +return false; } } @@ -1673,7 +1674,7 @@ success: } bitmap_list_free(bm_list); -return; +return true; fail: QSIMPLEQ_FOREACH(bm, bm_list, entry) { @@ -1691,16 +1692,14 @@ fail: } bitmap_list_free(bm_list); +return false; } int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp) { BdrvDirtyBitmap *bitmap; -Error *local_err = NULL; -qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err); -if (local_err != NULL) { -error_propagate(errp, local_err); +if (!qcow2_store_persistent_dirty_bitmaps(bs, false, errp)) { return -EINVAL; } -- 2.21.3
Re: [PATCH v2 2/5] [RISCV_PM] Support CSRs required for RISC-V PM extension except for ones in hypervisor mode
On 10/15/20 11:05 AM, Alexey Baturo wrote: > Meanwhile, do you think applying MTE masks while reading CSR values is a good solution for now? Yes. r~
[PATCH v3 05/13] block: drop extra error propagation for bdrv_set_backing_hd
bdrv_set_backing_hd now returns status, let's use it. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Greg Kurz Reviewed-by: Alberto Garcia --- block.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/block.c b/block.c index 7b6818c681..a35dc80dd4 100644 --- a/block.c +++ b/block.c @@ -3016,11 +3016,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options, /* Hook up the backing file link; drop our reference, bs owns the * backing_hd reference now */ -bdrv_set_backing_hd(bs, backing_hd, &local_err); +ret = bdrv_set_backing_hd(bs, backing_hd, errp); bdrv_unref(backing_hd); -if (local_err) { -error_propagate(errp, local_err); -ret = -EINVAL; +if (ret < 0) { goto free_exit; } -- 2.21.3