date:20220131

Re: [PATCH 3/4] acpi: fix OEM ID/OEM Table ID padding

2022-01-31 Thread Igor Mammedov

On Mon, 31 Jan 2022 19:51:24 +0530 (IST)
Ani Sinha  wrote:

> On Mon, 31 Jan 2022, Igor Mammedov wrote:
> 
> > On Mon, 31 Jan 2022 18:58:57 +0530 (IST)
> > Ani Sinha  wrote:
> >  
> > > On Mon, 31 Jan 2022, Igor Mammedov wrote:
> > >  
> > > > On Mon, 31 Jan 2022 11:47:00 +0530
> > > > Ani Sinha  wrote:
> > > >  
> > > > > On Wed, Jan 12, 2022 at 6:33 PM Igor Mammedov  
> > > > > wrote:  
> > > > > >
> > > > > > Commit [2] broke original '\0' padding of OEM ID and OEM Table ID
> > > > > > fields in headers of ACPI tables. While it doesn't have impact on
> > > > > > default values since QEMU uses 6 and 8 characters long values
> > > > > > respectively, it broke usecase where IDs are provided on QEMU CLI.
> > > > > > It shouldn't affect guest (but may cause licensing verification
> > > > > > issues in guest OS).
> > > > > > One of the broken usecases is user supplied SLIC table with IDs
> > > > > > shorter than max possible length, where [2] mangles IDs with extra
> > > > > > spaces in RSDT and FADT tables whereas guest OS expects those to
> > > > > > mirror the respective values of the used SLIC table.
> > > > > >
> > > > > > Fix it by replacing whitespace padding with '\0' padding in
> > > > > > accordance with [1] and expectations of guest OS
> > > > > >
> > > > > > 1) ACPI spec, v2.0b
> > > > > >17.2 AML Grammar Definition
> > > > > >...
> > > > > >//OEM ID of up to 6 characters. If the OEM ID is
> > > > > >//shorter than 6 characters, it can be terminated
> > > > > >//with a NULL character.  
> > > > >
> > > > > On the other hand, from
> > > > > https://uefi.org/specs/ACPI/6.4/21_ACPI_Data_Tables_and_Table_Def_Language/ACPI_Data_Tables.html
> > > > > ,
> > > > >
> > > > > "For example, the OEM ID and OEM Table ID in the common ACPI table
> > > > > header (shown above) are fixed at six and eight characters,
> > > > > respectively. They are not necessarily null terminated"
> > > > >
> > > > > I also checked version 5 and the verbiage is the same. I think not
> > > > > terminating with a null is not incorrect.  
> > > >
> > > > I have a trouble with too much 'not' within the sentence.  
> > >
> > > :-)
> > >  
> > > > So what's the point of this comment and how it's related to
> > > > this patch?  
> > >
> > > My understanding of the spec is that null termination of both those IDs is
> > > not mandatory. Guests may get confused or expect the strings to be null
> > > terminated but they should really be open to expecting non-null terminated
> > > strings as well. What is important is that the number of chars of those
> > > two strings are fixed and well defined in the spec and qemu
> > > implementation.
> > >
> > > In any case, I think we can leave the patch as is for now and see if the
> > > change causes trouble with other guests.  
> >
> >
> > these fields have a fixed length so one doesn't need terminating NULL
> > in case the full length of the field is utilized, otherwise in case of
> > where the value is shorter than max length it has to be null terminated
> > to express a shorter value. That way QEMU worked for years until
> > 602b458201 introduced regression.
> >  
> 
> My comment was based on what I interpreted from reading the latest
> version of the specs. I guess the spec does not explicitly say what the
> padding bytes would be in case the length of the IDs is less than the max
> length. I interpreted the wording to mean that whether or not the
> length of the string is shorter, one should not expect it to terminate with
> null.

that's what the AML grammar quoted in the commit message clarifies
for specific field(s), as opposed to your generic string
type description

> It would be nice if a future version of the spec made it explicit and
> clearer.


PS:
you were asking the other day if there are any bugs left in ACPI
(the answer is that I'm not aware of any).
But there are issues with SMBIOS tables that need to be fixed
(corner cases with large VM configurations), are you
interested in trying to fix them?
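
To make the padding rule discussed above concrete, here is a minimal sketch of filling a fixed-width ID field with '\0' bytes. It is not QEMU's code: acpi_pad_id() and the two length macros are hypothetical names, and truncating over-long values is an assumption made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ACPI_OEM_ID_LEN 6        /* fixed width of the OEM ID field */
#define ACPI_OEM_TABLE_ID_LEN 8  /* fixed width of the OEM Table ID field */

/*
 * Hypothetical helper, not taken from QEMU: copy 'id' into a fixed-width
 * field and fill the unused tail with '\0' bytes. A value that uses the
 * full width is stored without any terminator.
 */
static void acpi_pad_id(char *field, size_t field_len, const char *id)
{
    size_t n;

    assert(field != NULL && id != NULL);
    n = strlen(id);
    if (n > field_len) {
        n = field_len;           /* assumption: over-long values are cut */
    }
    memset(field, 0, field_len); /* '\0' padding, not spaces */
    memcpy(field, id, n);
}
```

With this, a 3-character ID ends up as the 3 characters followed by three '\0' bytes, while a 6-character ID fills the field completely and carries no terminator.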

Re: "make check-acceptance" takes way too long

2022-01-31 Thread Gerd Hoffmann

  Hi,

> I'm not sure you can recycle something from it, but my (ugly) approach
> to make this fast (for a different purpose -- I'm using qemu to run
> tests in guests, not testing qemu) is to build an initramfs by copying
> the host binaries I need (a shell, ip, jq) and recursively sourcing
> libraries using ldd (I guess I mentioned it's ugly).

By design limited to the host architecture, but might be good enough
depending on what you want to test ...

> No downloads, systemd, dracut, etc., guest boots in half a second
> (x86_64 on x86_64, KVM -- no idea with TCG). Host kernel with a few
> modules packed and loaded by a custom init script.

I've simply used dracut for that in the past.  Recursively sourcing
libraries is one of the things it does which I didn't have to code up
myself that way. Used to work pretty well.

But these days dracut doesn't want to give you a shell prompt without
asking for a password beforehand, which is annoying if all you want
to do is run some simple tests, and there was no easy way to turn that
off last time I checked ...

take care,
  Gerd

Re: [PULL 53/61] target/riscv: Split out the vill from vtype

2022-01-31 Thread LIU Zhiwei




On 2022/2/1 10:12, Alistair Francis wrote:

On Sat, Jan 29, 2022 at 2:10 AM Peter Maydell  wrote:

On Fri, 21 Jan 2022 at 09:42, Alistair Francis
 wrote:

From: LIU Zhiwei 

We need not specially process vtype when XLEN changes.

Signed-off-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
Message-id: 20220120122050.41546-16-zhiwei_...@c-sky.com
Signed-off-by: Alistair Francis 

Odd thing I noticed looking at this code: as far as I can see we
may set env->vill to 1 in the vsetvl helper, but there is nowhere
that we set it to 0, so once it transitions to 1 it's stuck there
until the system is reset. Is this really right?

This is really confusing. It implies that you can't set vill from
software, but that just seems to be confusing wording.

Reading 
https://lists.riscv.org/g/tech-vector-ext/topic/reliably_set_vtype_vill/86745728
it seems that this is a QEMU bug and the guest should be able to set
the bit as part of vsetvl

@LIU Zhiwei are you able to fix this up?


Thanks for pointing it out. I have sent a patch to fix this up.

Thanks,
Zhiwei




Alistair

Re: "make check-acceptance" takes way too long

2022-01-31 Thread Stefano Brivio

Hi,

On Tue, 25 Jan 2022 10:20:11 +0100
Gerd Hoffmann  wrote:

>   Hi,
> 
> > IMHO the ideal scenario would be for us to have a kernel, initrd
> > containing just busybox tools for the key arch targets we care
> > about. Those could be used with direct kernel boot or stuffed
> > into a disk image. Either way, they would boot in ~1 second,
> > even with TCG, and would be able to execute simple shell scripts
> > to test a decent amount of QEMU functionality.  
> 
> I have some test images based on buildroot which are essentially that.
> https://gitlab.com/kraxel/br-kraxel/
> 
> Still a significant download, but much smaller than a full fedora or
> ubuntu cloud image and it boots much faster too.  Not down to only one
> second though.

I'm not sure you can recycle something from it, but my (ugly) approach
to make this fast (for a different purpose -- I'm using qemu to run
tests in guests, not testing qemu) is to build an initramfs by copying
the host binaries I need (a shell, ip, jq) and recursively sourcing
libraries using ldd (I guess I mentioned it's ugly).

No downloads, systemd, dracut, etc., guest boots in half a second
(x86_64 on x86_64, KVM -- no idea with TCG). Host kernel with a few
modules packed and loaded by a custom init script.

If you're interested, you can see it in operation at 3:11:17 (ah, the
sarcasm) of: https://passt.top/passt/about/#continuous-integration
(click on the "udp/pasta" anchor below, it's a few seconds in), or in
slow motion at 0:51 of https://passt.top/passt/about/#passt_2.

It's basically:

  git clone https://mbuto.lameexcu.se/mbuto/ && cd mbuto
  ./mbuto -c lz4 -p passt -f img # Profiles define sets of binaries
  ${qemu} -kernel /boot/vmlinuz-$(uname -r) -initrd img

-- 
Stefano

Re: [PATCH 1/2] hw/char/renesas_sci: Add fifo buffer to backend interface.

2022-01-31 Thread Thomas Huth


On 31/01/2022 10.42, Yoshinori Sato wrote:

SCI does not have a fifo, it is necessary to send and receive
  at a bit rate speed.
But, qemu's chardev backend does not have a buffer,
  so it sends received data continuously.
By buffering the received data with the FIFO, continuous
  received data can be received.


 Hi!

If you describe it like this, it sounds like you're now emulating a buffer 
that is not there with real hardware? Is that really what you want here, 
i.e. wouldn't this hide problems with the real hardware that are mitigated 
in QEMU with this buffer?


Anyway, please use scripts/get_maintainer.pl to get a list of people who 
should be put on CC:, otherwise your patches might get lost in the high 
traffic of the mailing list.


 Thomas
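
For reference, the kind of buffering the patch description talks about can be pictured as a small ring buffer sitting between the chardev backend and the device model. This is only a sketch under assumptions: RxFifo, its depth and the function names are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_FIFO_SIZE 16  /* illustrative depth, not taken from the patch */

typedef struct {
    uint8_t data[RX_FIFO_SIZE];
    size_t head;   /* index of the oldest stored byte */
    size_t count;  /* number of bytes currently stored */
} RxFifo;

static void rx_fifo_init(RxFifo *f)
{
    assert(f != NULL);
    f->head = 0;
    f->count = 0;
}

/* Backend side: store one received byte; returns false when full. */
static bool rx_fifo_push(RxFifo *f, uint8_t byte)
{
    assert(f != NULL);
    if (f->count == RX_FIFO_SIZE) {
        return false;
    }
    f->data[(f->head + f->count) % RX_FIFO_SIZE] = byte;
    f->count++;
    return true;
}

/* Device side: take the oldest byte; returns false when empty. */
static bool rx_fifo_pop(RxFifo *f, uint8_t *byte)
{
    assert(f != NULL && byte != NULL);
    if (f->count == 0) {
        return false;
    }
    *byte = f->data[f->head];
    f->head = (f->head + 1) % RX_FIFO_SIZE;
    f->count--;
    return true;
}
```

In such a model the device would pop one byte per bit-rate tick, which is exactly the point where the emulation starts to differ from hardware that has no FIFO.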

[PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-01-31 Thread Leonardo Bras

Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
writev + flags & flush interface.

Change multifd_send_sync_main() so flush_zero_copy() can be called
after each iteration in order to make sure all dirty pages are sent before
a new iteration is started. It will also flush at the beginning and at the
end of migration.

Also make it return -1 if flush_zero_copy() fails, in order to cancel
the migration process, and avoid resuming the guest in the target host
without receiving all current RAM.

This will work fine on RAM migration because the RAM pages are not usually
freed, and there is no problem with changing the page contents between
writev_zero_copy() and the actual sending of the buffer, because this change
will dirty the page and cause it to be re-sent on a later iteration anyway.

A lot of locked memory may be needed in order to use multifd migration
with zero-copy enabled, so disabling the feature should be necessary for
low-privileged users trying to perform multifd migrations.
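
As a rough illustration of the locked-memory concern, a process can compare its RLIMIT_MEMLOCK soft limit against the amount it expects to lock. This is a sketch, not part of the series: memlock_limit_allows() is a hypothetical helper, and it ignores that a privileged process (CAP_IPC_LOCK) is not bound by the limit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: returns true if the soft RLIMIT_MEMLOCK limit of
 * the calling process allows locking at least 'needed' bytes, and false
 * otherwise (including when the limit cannot be queried).
 */
static bool memlock_limit_allows(uint64_t needed)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return false;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return true;             /* no limit configured */
    }
    return (uint64_t)rl.rlim_cur >= needed;
}
```

How many bytes are actually in flight depends on buffer sizes and send rate, so any value passed as 'needed' would only be an estimate.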

Signed-off-by: Leonardo Bras 
---
 migration/multifd.h   |  4 +++-
 migration/migration.c | 11 ++-
 migration/multifd.c   | 41 +++--
 migration/ram.c   | 29 ++---
 migration/socket.c|  5 +++--
 5 files changed, 73 insertions(+), 17 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 4dda900a0b..7ec688fb4f 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -22,7 +22,7 @@ int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
 bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-void multifd_send_sync_main(QEMUFile *f);
+int multifd_send_sync_main(QEMUFile *f);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
@@ -96,6 +96,8 @@ typedef struct {
 uint32_t packet_len;
 /* pointer to the packet */
 MultiFDPacket_t *packet;
+/* multifd flags for sending ram */
+int write_flags;
 /* multifd flags for each packet */
 uint32_t flags;
 /* size of the next packet that contains pages */
diff --git a/migration/migration.c b/migration/migration.c
index 3e0a25bb5b..1450fd0370 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp)
 error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: ");
 return false;
 }
-
+#ifdef CONFIG_LINUX
+if (params->zero_copy_send &&
+(!migrate_use_multifd() ||
+ params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
+ (params->tls_creds && *params->tls_creds))) {
+error_setg(errp,
+   "Zero copy only available for non-compressed non-TLS multifd migration");
+return false;
+}
+#endif
 return true;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 43998ad117..2d68b9cf4f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
 multifd_send_state = NULL;
 }
 
-void multifd_send_sync_main(QEMUFile *f)
+int multifd_send_sync_main(QEMUFile *f)
 {
 int i;
+bool flush_zero_copy;
 
 if (!migrate_use_multifd()) {
-return;
+return 0;
 }
 if (multifd_send_state->pages->num) {
 if (multifd_send_pages(f) < 0) {
 error_report("%s: multifd_send_pages fail", __func__);
-return;
+return 0;
 }
 }
+
+/*
+ * When using zero-copy, it's necessary to flush after each iteration to
+ * make sure pages from earlier iterations don't end up replacing newer
+ * pages.
+ */
+flush_zero_copy = migrate_use_zero_copy_send();
+
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
 if (p->quit) {
 error_report("%s: channel %d has already quit", __func__, i);
 qemu_mutex_unlock(&p->mutex);
-return;
+return 0;
 }
 
 p->packet_num = multifd_send_state->packet_num++;
@@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
 ram_counters.transferred += p->packet_len;
 qemu_mutex_unlock(&p->mutex);
 qemu_sem_post(&p->sem);
+
+if (flush_zero_copy) {
+int ret;
+Error *err = NULL;
+
+ret = qio_channel_flush(p->c, &err);
+if (ret < 0) {
+error_report_err(err);
+return -1;
+}
+}
 }
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
 qemu_sem_wait(&p->sem_sync);
 }

[PATCH v8 3/5] migration: Add zero-copy-send parameter for QMP/HMP for Linux

2022-01-31 Thread Leonardo Bras

Add property that allows zero-copy migration of memory pages
on the sending side, and also includes a helper function
migrate_use_zero_copy_send() to check if it's enabled.

No code is introduced to actually do the migration, but it allows
future implementations to enable/disable this feature.

On non-Linux builds this parameter is compiled-out.

Signed-off-by: Leonardo Bras 
Reviewed-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
---
 qapi/migration.json   | 24 
 migration/migration.h |  5 +
 migration/migration.c | 32 
 migration/socket.c|  5 +
 monitor/hmp-cmds.c|  6 ++
 5 files changed, 72 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index 5975a0e104..5b4753b5de 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -741,6 +741,13 @@
 #  will consume more CPU.
 #  Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#  When true, enables a zero-copy mechanism for sending memory
+#  pages, if host supports it.
+#  Requires that QEMU be permitted to use locked memory for guest
+#  RAM pages.
+#  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #aliases for the purpose of dirty bitmap migration.  Such
 #aliases may for example be the corresponding names on the
@@ -780,6 +787,7 @@
'xbzrle-cache-size', 'max-postcopy-bandwidth',
'max-cpu-throttle', 'multifd-compression',
'multifd-zlib-level' ,'multifd-zstd-level',
+   { 'name': 'zero-copy-send', 'if' : 'CONFIG_LINUX'},
'block-bitmap-mapping' ] }
 
 ##
@@ -906,6 +914,13 @@
 #  will consume more CPU.
 #  Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#  When true, enables a zero-copy mechanism for sending memory
+#  pages, if host supports it.
+#  Requires that QEMU be permitted to use locked memory for guest
+#  RAM pages.
+#  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #aliases for the purpose of dirty bitmap migration.  Such
 #aliases may for example be the corresponding names on the
@@ -960,6 +975,7 @@
 '*multifd-compression': 'MultiFDCompression',
 '*multifd-zlib-level': 'uint8',
 '*multifd-zstd-level': 'uint8',
+'*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
 '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
@@ -1106,6 +1122,13 @@
 #  will consume more CPU.
 #  Defaults to 1. (Since 5.0)
 #
+# @zero-copy-send: Controls behavior on sending memory pages on migration.
+#  When true, enables a zero-copy mechanism for sending memory
+#  pages, if host supports it.
+#  Requires that QEMU be permitted to use locked memory for guest
+#  RAM pages.
+#  Defaults to false. (Since 7.1)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #aliases for the purpose of dirty bitmap migration.  Such
 #aliases may for example be the corresponding names on the
@@ -1158,6 +1181,7 @@
 '*multifd-compression': 'MultiFDCompression',
 '*multifd-zlib-level': 'uint8',
 '*multifd-zstd-level': 'uint8',
+'*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
 '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
 
 ##
diff --git a/migration/migration.h b/migration/migration.h
index 8130b703eb..4cbc901ea0 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -339,6 +339,11 @@ MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);
 
+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy_send(void);
+#else
+#define migrate_use_zero_copy_send() (false)
+#endif
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/migration.c b/migration/migration.c
index bcc385b94b..1b3230a97b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -893,6 +893,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
 params->multifd_zlib_level = s->parameters.multifd_zlib_level;
 params->has_multifd_zstd_level = true;
 params->multifd_zstd_level = s->parameters.multifd_zstd_level;
+#ifdef CONFIG_LINUX
+params->has_zero_copy_send = true;
+

[PATCH] target/riscv: Fix vill field write in vtype

2022-01-31 Thread LIU Zhiwei

The guest should be able to set the vill bit as part of vsetvl.

Currently we may set env->vill to 1 in the vsetvl helper, but there
is nowhere that we set it to 0, so once it transitions to 1 it's stuck
there until the system is reset.

Signed-off-by: LIU Zhiwei 
---
 target/riscv/vector_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 020d2e841f..3bd4aac9c9 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -71,6 +71,7 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
 env->vl = vl;
 env->vtype = s2;
 env->vstart = 0;
+env->vill = 0;
 return vl;
 }
 
-- 
2.25.1

[PATCH v8 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback

2022-01-31 Thread Leonardo Bras

Add flags to io_writev and introduce io_flush as optional callback to
QIOChannelClass, allowing the implementation of zero copy writes by
subclasses.

How to use them:
- Write data using qio_channel_writev*(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY),
- Wait write completion with qio_channel_flush().

Notes:
As some zero copy write implementations work asynchronously, it's
recommended to keep the write buffer untouched until the return of
qio_channel_flush(), to avoid the risk of sending an updated buffer
instead of the buffer state during write.

As io_flush callback is optional, if a subclass does not implement it, then:
- io_flush will return 0 without changing anything.

Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flag parameter. That allows shared code between zero copy and
non-zero copy writev, and also an easier implementation on new flags.
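
The buffer caveat from the notes above can be shown with a toy model that involves no sockets at all. Everything here is hypothetical: ToyChannel, toy_write_zero_copy() and toy_flush() only mimic the idea that a zero copy write records a pointer and the bytes are read later, at flush time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 4

/* Toy channel: a zero copy write keeps a pointer, not a copy of the data. */
typedef struct {
    const char *pending[MAX_PENDING];
    size_t pending_len[MAX_PENDING];
    size_t npending;
    char wire[64];      /* bytes that actually went "on the wire" */
    size_t wire_len;
} ToyChannel;

/* Queue a buffer for sending; returns 0 on success, -1 if the queue is full. */
static int toy_write_zero_copy(ToyChannel *c, const char *buf, size_t len)
{
    assert(c != NULL && buf != NULL);
    if (c->npending == MAX_PENDING) {
        return -1;
    }
    c->pending[c->npending] = buf;
    c->pending_len[c->npending] = len;
    c->npending++;
    return 0;
}

/* "Send" everything queued so far, reading the buffers as they are now. */
static int toy_flush(ToyChannel *c)
{
    size_t i;

    assert(c != NULL);
    for (i = 0; i < c->npending; i++) {
        if (c->wire_len + c->pending_len[i] > sizeof(c->wire)) {
            return -1;
        }
        memcpy(c->wire + c->wire_len, c->pending[i], c->pending_len[i]);
        c->wire_len += c->pending_len[i];
    }
    c->npending = 0;
    return 0;
}
```

If the caller overwrites the buffer between the write and the flush, the overwritten contents are what ends up in wire[]; flushing before reusing the buffer avoids that.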

Signed-off-by: Leonardo Bras 
---
 include/io/channel.h| 38 -
 chardev/char-io.c   |  2 +-
 hw/remote/mpqemu-link.c |  2 +-
 io/channel-buffer.c |  1 +
 io/channel-command.c|  1 +
 io/channel-file.c   |  1 +
 io/channel-socket.c |  2 ++
 io/channel-tls.c|  1 +
 io/channel-websock.c|  1 +
 io/channel.c| 53 +++--
 migration/rdma.c|  1 +
 scsi/pr-manager-helper.c|  2 +-
 tests/unit/test-io-channel-socket.c |  1 +
 13 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f8..c680ee7480 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 
 #define QIO_CHANNEL_ERR_BLOCK -2
 
+#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;
 
 enum QIOChannelFeature {
 QIO_CHANNEL_FEATURE_FD_PASS,
 QIO_CHANNEL_FEATURE_SHUTDOWN,
 QIO_CHANNEL_FEATURE_LISTEN,
+QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
 };
 
 
@@ -104,6 +107,7 @@ struct QIOChannelClass {
  size_t niov,
  int *fds,
  size_t nfds,
+ int flags,
  Error **errp);
 ssize_t (*io_readv)(QIOChannel *ioc,
 const struct iovec *iov,
@@ -136,6 +140,8 @@ struct QIOChannelClass {
   IOHandler *io_read,
   IOHandler *io_write,
   void *opaque);
+int (*io_flush)(QIOChannel *ioc,
+Error **errp);
 };
 
 /* General I/O handling functions */
@@ -228,6 +234,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Write data to the IO channel, reading it from the
@@ -260,6 +267,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
 size_t niov,
 int *fds,
 size_t nfds,
+int flags,
 Error **errp);
 
 /**
@@ -837,6 +845,7 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  *
@@ -846,6 +855,14 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * to be written, yielding from the current coroutine
  * if required.
  *
+ * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ * In this case, if the buffer gets changed between queueing and
+ * sending, the updated buffer will be sent. If this is not a
+ * desired behavior, it's suggested to call qio_channel_flush()
+ * before reusing the buffer.
+ *
  * Returns: 0 if all bytes were written, or -1 on error
  */
 
@@ -853,6 +870,25 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
 const struct iovec *iov,
 size_t niov,
 int *fds, size_t nfds,
-Error **errp);
+int flags, Error **errp);
+
+/**
+ * qio_channel_flush:
+ * @ioc: the channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Will block until every packet queued with
+ * qio_channel_writev_full() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
+ * is sent, or return in case of any error.
+ *
+ * If

[PATCH v8 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-01-31 Thread Leonardo Bras

For CONFIG_LINUX, implement the new zero copy flag and the optional callback
io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
feature is available in the host kernel, which is checked on
qio_channel_socket_connect_sync()

qio_channel_socket_flush() was implemented by counting how many times
sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue, in order to find how many of them finished sending.
Flush will loop until those counters are the same, or until some error occurs.

Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying,
some caution is necessary to avoid overwriting any buffer before it's sent.
If something like this happens, a newer version of the buffer may be sent
instead.
- If this is a problem, it's recommended to call qio_channel_flush() before
freeing or re-using the buffer.

2: Locked memory
- When using MSG_ZEROCOPY, the buffer memory will be locked after queued, and
unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may require
a larger amount of locked memory than usually available to non-root user.
- If the required amount of locked memory is not available, writev_zero_copy
will return an error, which can abort an operation like migration.
- Because of this, when user code wants to add zero copy as a feature, it
requires a mechanism to disable it, so it can still be accessible to less
privileged users.
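
The counter bookkeeping described for qio_channel_socket_flush() can be sketched without a real socket. The assumptions here: each completion notification carries an inclusive range [lo, hi] of finished sendmsg() calls (as far as I recall, Linux reports these in ee_info and ee_data of struct sock_extended_err), and ZcNotification, ZcCounters and zc_flush() are hypothetical names, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One completion notification: inclusive range of finished sendmsg() calls. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} ZcNotification;

typedef struct {
    uint64_t queued;  /* successful zero copy sendmsg() calls */
    uint64_t sent;    /* calls reported as completed so far */
} ZcCounters;

/*
 * Consume notifications until 'sent' catches up with 'queued'.
 * Returns 0 once everything is accounted for, or -1 if the supplied
 * notifications run out first (a real implementation would wait for
 * more instead of giving up).
 */
static int zc_flush(ZcCounters *c, const ZcNotification *n, size_t count)
{
    size_t i = 0;

    assert(c != NULL);
    while (c->sent < c->queued) {
        if (i == count) {
            return -1;
        }
        /* unsigned subtraction keeps working if the 32-bit ids wrap */
        c->sent += (uint64_t)(uint32_t)(n[i].hi - n[i].lo) + 1;
        i++;
    }
    return 0;
}
```

For example, five queued writes acknowledged as [0, 1] and [2, 4] bring both counters to 5, at which point the loop ends.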

Signed-off-by: Leonardo Bras 
Reviewed-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
---
 include/io/channel-socket.h |   2 +
 io/channel-socket.c | 108 ++--
 2 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..513c428fe4 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
 socklen_t localAddrLen;
 struct sockaddr_storage remoteAddr;
 socklen_t remoteAddrLen;
+ssize_t zero_copy_queued;
+ssize_t zero_copy_sent;
 };
 
 
diff --git a/io/channel-socket.c b/io/channel-socket.c
index b2d254ef8d..155a0a2ada 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -26,6 +26,10 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include 
+#include 
+#endif
 
 #define SOCKET_MAX_FDS 16
 
@@ -55,6 +59,8 @@ qio_channel_socket_new(void)
 
 sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
 sioc->fd = -1;
+sioc->zero_copy_queued = 0;
+sioc->zero_copy_sent = 0;
 
 ioc = QIO_CHANNEL(sioc);
 qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -154,6 +160,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
 return -1;
 }
 
+#ifdef CONFIG_LINUX
+int ret, v = 1;
+ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+if (ret == 0) {
+/* Zero copy available on host */
+qio_channel_set_feature(QIO_CHANNEL(ioc),
+QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
+}
+#endif
+
 return 0;
 }
 
@@ -534,6 +550,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
 size_t fdsize = sizeof(int) * nfds;
 struct cmsghdr *cmsg;
+int sflags = 0;
 
 memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
 
@@ -558,15 +575,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 memcpy(CMSG_DATA(cmsg), fds, fdsize);
 }
 
+if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+sflags = MSG_ZEROCOPY;
+}
+
  retry:
-ret = sendmsg(sioc->fd, &msg, 0);
+ret = sendmsg(sioc->fd, &msg, sflags);
 if (ret <= 0) {
-if (errno == EAGAIN) {
+switch (errno) {
+case EAGAIN:
 return QIO_CHANNEL_ERR_BLOCK;
-}
-if (errno == EINTR) {
+case EINTR:
 goto retry;
+case ENOBUFS:
+if (sflags & MSG_ZEROCOPY) {
+error_setg_errno(errp, errno,
+ "Process can't lock enough memory for using MSG_ZEROCOPY");
+return -1;
+}
+break;
 }
+
 error_setg_errno(errp, errno,
  "Unable to write to socket");
 return -1;
@@ -660,6 +689,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 }
 #endif /* WIN32 */
 
+
+#ifdef CONFIG_LINUX
+static int qio_channel_socket_flush(QIOChannel *ioc,
+Error **errp)
+{
+QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+struct msghdr msg = {};
+struct sock_extended_err *serr;
+struct cmsghdr *cm;
+char control[CMSG_SPACE(sizeof(*serr))];
+int received;
+int ret = 1;
+
+msg.msg_control =

[PATCH v8 4/5] migration: Add migrate_use_tls() helper

2022-01-31 Thread Leonardo Bras

A lot of places check parameters.tls_creds in order to evaluate if TLS is
in use, and sometimes call migrate_get_current() just for that test.

Add new helper function migrate_use_tls() in order to simplify testing
for TLS usage.

Signed-off-by: Leonardo Bras 
Reviewed-by: Juan Quintela 
Reviewed-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
---
 migration/migration.h | 1 +
 migration/channel.c   | 3 +--
 migration/migration.c | 9 +
 migration/multifd.c   | 5 +
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 4cbc901ea0..debacb2251 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -344,6 +344,7 @@ bool migrate_use_zero_copy_send(void);
 #else
 #define migrate_use_zero_copy_send() (false)
 #endif
+int migrate_use_tls(void);
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/channel.c b/migration/channel.c
index c4fc000a1a..086b5c0d8b 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,8 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 trace_migration_set_incoming_channel(
 ioc, object_get_typename(OBJECT(ioc)));
 
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
+if (migrate_use_tls() &&
 !object_dynamic_cast(OBJECT(ioc),
  TYPE_QIO_CHANNEL_TLS)) {
 migration_tls_channel_process_incoming(s, ioc, &local_err);
diff --git a/migration/migration.c b/migration/migration.c
index 1b3230a97b..3e0a25bb5b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2576,6 +2576,15 @@ bool migrate_use_zero_copy_send(void)
 }
 #endif
 
+int migrate_use_tls(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->parameters.tls_creds && *s->parameters.tls_creds;
+}
+
 int migrate_use_xbzrle(void)
 {
 MigrationState *s;
diff --git a/migration/multifd.c b/migration/multifd.c
index 76b57a7177..43998ad117 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -784,14 +784,11 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 QIOChannel *ioc,
 Error *error)
 {
-MigrationState *s = migrate_get_current();
-
 trace_multifd_set_outgoing_channel(
 ioc, object_get_typename(OBJECT(ioc)), p->tls_hostname, error);
 
 if (!error) {
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
+if (migrate_use_tls() &&
 !object_dynamic_cast(OBJECT(ioc),
  TYPE_QIO_CHANNEL_TLS)) {
 multifd_tls_channel_connect(p, ioc, &error);
-- 
2.34.1

[PATCH v8 0/5] MSG_ZEROCOPY + multifd

2022-01-31 Thread Leonardo Bras

This patch series intends to enable MSG_ZEROCOPY in QIOChannel, and make
use of it for multifd migration performance improvement, by reducing cpu
usage.

Patch #1 creates new callbacks for QIOChannel, allowing the implementation
of zero copy writing.

Patch #2 implements io_writev flags and io_flush() on QIOChannelSocket,
making use of MSG_ZEROCOPY on Linux.

Patch #3 adds a "zero_copy_send" migration property, only available with
CONFIG_LINUX, and compiled out on any other platform.
This migration property has to be enabled before multifd migration starts.

Patch #4 adds a helper function to check whether TLS is going to be used.
This helper is used later in patch #5.

Patch #5 makes use of the QIOChannelSocket zero_copy implementation on
nocomp multifd migration.

Results:
In preliminary tests, the resource usage of __sys_sendmsg() was reduced by a
factor of 15, and the overall migration took 13-22% less time, based on a
synthetic CPU workload.

In further tests, it was noted that, on multifd migration with 8 channels:
- On idle hosts, migration time was reduced by 10% to 21%.
- On hosts busy with heavy CPU stress (1 stress thread per CPU, but
  not CPU-pinned), migration time was reduced by ~25% by enabling zero-copy.
- On hosts with heavy CPU-pinned workloads (1 stress thread per CPU,
  CPU-pinned), migration time was reduced by ~66% by enabling zero-copy.

Above tests setup:
- Sending and Receiving hosts:
  - CPU : Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz (448 CPUS)
  - Network card: E810-C (100Gbps)
  - >1TB RAM
  - QEMU: Upstream master branch + This patchset
  - Linux: Upstream v5.15 
- VM configuration:
  - 28 VCPUs
  - 512GB RAM


---
Changes since v7:
- Migration property renamed from zero-copy to zero-copy-send
- A few early tests added so that misconfigurations fail earlier
- qio_channel_full*_flags() renamed back to qio_channel_full*()
- multifd_send_sync_main() reverted back to not receiving a flag,
  so it always syncs zero-copy when enabled.
- Improve code quality on a few points


Changes since v6:
- Remove io_writev_zero_copy(), and make use of io_writev() new flags
  to achieve the same results.
- Rename io_flush_zero_copy() to io_flush()
- Previous patch #2 became too small, so it was squashed in previous
  patch #3 (now patch #2)

Changes since v5:
- flush_zero_copy now returns -1 on fail, 0 on success, and 1 when none of
  the processed writes were able to use zerocopy in the kernel.
- qio_channel_socket_poll() removed, using qio_channel_wait() instead
- ENOBUFS is now processed inside qio_channel_socket_writev_flags()
- Most zerocopy parameter validation moved to migrate_params_check(),
  leaving only feature test to socket_outgoing_migration() callback
- Naming went from *zerocopy to *zero_copy or *zero-copy, due to QAPI/QMP
  preferences
- Improved docs

Changes since v4:
- 3 patches got split into 6
- Flush is used for syncing after each iteration, instead of only at the end
- If zerocopy is not available, fail in connect instead of failing on write
- 'multifd-zerocopy' property renamed to 'zerocopy'
- Fail migrations that don't support zerocopy, if it's enabled.
- Instead of checking for zerocopy at each write, save the flags in
  MultiFDSendParams->write_flags and use them on write
- Reorganized flag usage in QIOChannelSocket 
- A lot of typos fixed
- More doc on buffer restrictions

Changes since v3:
- QIOChannel interface names changed from io_async_{writev,flush} to
  io_{writev,flush}_zerocopy
- Instead of falling back in case zerocopy is not implemented, return
  error and abort operation.
- Flush now waits as long as needed, or return error in case anything
  goes wrong, aborting the operation.
- Zerocopy is now conditional in multifd, being set by parameter
  multifd-zerocopy
- Moves zerocopy_flush to multifd_send_sync_main() from multifd_save_cleanup
  so migration can abort if flush goes wrong.
- Several other small improvements

Changes since v2:
- Patch #1: One more fallback
- Patch #2: Fall back to sync if it fails to lock buffer memory in
  MSG_ZEROCOPY send.

Changes since v1:
- Reimplemented the patchset using async_write + async_flush approach.
- Implemented a flush to be able to tell whenever all data was written.


Leonardo Bras (5):
  QIOChannel: Add flags on io_writev and introduce io_flush callback
  QIOChannelSocket: Implement io_writev zero copy flag & io_flush for
CONFIG_LINUX
  migration: Add zero-copy-send parameter for QMP/HMP for Linux
  migration: Add migrate_use_tls() helper
  multifd: Implement zero copy write in multifd migration
(multifd-zero-copy)

 qapi/migration.json |  24 ++
 include/io/channel-socket.h |   2 +
 include/io/channel.h|  38 +-
 migration/migration.h   |   6 ++
 migration/multifd.h |   4 +-
 chardev/char-io.c   |   2 +-
 hw/remote/mpqemu-link.c |   2 +-
 io/channel-buffer.c |   1 +
 io/channel-command.c

Re: [PATCH v7 3/5] target/riscv: add support for svnapot extension

2022-01-31 Thread Alistair Francis

On Fri, Jan 28, 2022 at 6:57 PM Weiwei Li  wrote:
>
> - add PTE_N bit
> - add PTE_N bit check for inner PTE
> - update address translation to support 64KiB contiguous region
>   (napot_bits = 4)
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> Reviewed-by: Anup Patel 
> ---
>  target/riscv/cpu.c|  2 ++
>  target/riscv/cpu_bits.h   |  1 +
>  target/riscv/cpu_helper.c | 17 ++---
>  3 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 1cb0436187..8752fa1544 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -729,6 +729,8 @@ static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
>  DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
>
> +DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
> +
>  DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
>  DEFINE_PROP_BOOL("zbb", RISCVCPU, cfg.ext_zbb, true),
>  DEFINE_PROP_BOOL("zbc", RISCVCPU, cfg.ext_zbc, true),
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 6ea3944423..f6ff1c5012 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -489,6 +489,7 @@ typedef enum {
>  #define PTE_A   0x040 /* Accessed */
>  #define PTE_D   0x080 /* Dirty */
>  #define PTE_SOFT0x300 /* Reserved for Software */
> +#define PTE_N   0x8000 /* NAPOT translation */

This should be 0x8000ULL to avoid casting

>
>  /* Page table PPN shift amount */
>  #define PTE_PPN_SHIFT   10
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index b820166dc5..6262d157e2 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -641,7 +641,7 @@ restart:
>  return TRANSLATE_FAIL;
>  } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
>  /* Inner PTE, continue walking */
> -if (pte & (PTE_D | PTE_A | PTE_U)) {
> +if (pte & (target_ulong)(PTE_D | PTE_A | PTE_U | PTE_N)) {
>  return TRANSLATE_FAIL;
>  }
>  base = ppn << PGSHIFT;
> @@ -717,8 +717,19 @@ restart:
>  /* for superpage mappings, make a fake leaf PTE for the TLB's
> benefit. */
>  target_ulong vpn = addr >> PGSHIFT;
> -*physical = ((ppn | (vpn & ((1L << ptshift) - 1))) << PGSHIFT) |
> -(addr & ~TARGET_PAGE_MASK);
> +
> +int napot_bits = 0;
> +if (cpu->cfg.ext_svnapot && (pte & (target_ulong)PTE_N)) {
> +napot_bits = ctzl(ppn) + 1;
> +if ((i != (levels - 1)) || (napot_bits != 4)) {
> +return TRANSLATE_FAIL;
> +}
> +}
> +
> +*physical = (((ppn & ~(((target_ulong)1 << napot_bits) - 1)) |

It might be clearer to create the mask as a variable, there are a lot
of brackets here :)

Alistair

> +  (vpn & (((target_ulong)1 << napot_bits) - 1)) |
> +  (vpn & (((target_ulong)1 << ptshift) - 1))
> +) << PGSHIFT) | (addr & ~TARGET_PAGE_MASK);
>
>  /* set permissions on the TLB entry */
>  if ((pte & PTE_R) || ((pte & PTE_X) && mxr)) {
> --
> 2.17.1
>
>
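
For anyone following the comment about the bracket-heavy expression, here is a minimal sketch of the same address arithmetic with the masks pulled out into named variables. It is written in Python purely for illustration. The names napot_physical and ctz and the value PGSHIFT = 12 are assumptions of this sketch, not code from the patch, and the check that the PTE sits at the last level of the walk is left out.

```python
PGSHIFT = 12                          # assumed 4 KiB base pages
PAGE_OFFSET_MASK = (1 << PGSHIFT) - 1


def ctz(value: int) -> int:
    """Count trailing zero bits; the caller must pass a non-zero value."""
    return (value & -value).bit_length() - 1


def napot_physical(ppn: int, addr: int, ptshift: int = 0):
    """Return the physical address for a NAPOT leaf PTE, or None on failure."""
    if ppn == 0:
        return None
    napot_bits = ctz(ppn) + 1
    if napot_bits != 4:               # only 64 KiB regions are accepted
        return None

    # Named masks instead of repeating the shift-and-subtract inline.
    napot_mask = (1 << napot_bits) - 1
    superpage_mask = (1 << ptshift) - 1
    vpn = addr >> PGSHIFT

    page = (ppn & ~napot_mask) | (vpn & napot_mask) | (vpn & superpage_mask)
    return (page << PGSHIFT) | (addr & PAGE_OFFSET_MASK)
```

With the masks named, the final expression reads as "PPN with its low NAPOT bits replaced by the matching VPN bits", which is what the one-liner in the patch computes.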

Re: [RFC PATCH] spapr: Add SPAPR_CAP_AIL_MODES for supported AIL modes for H_SET_MODE hcall

2022-01-31 Thread Nicholas Piggin

Excerpts from Daniel Henrique Barboza's message of February 1, 2022 5:10 am:
> 
> 
> On 1/29/22 03:50, Nicholas Piggin wrote:
>> The behaviour of the Address Translation Mode on Interrupt resource is
>> not consistently supported by all CPU versions or all KVM versions.  In
>> particular KVM HV only supports mode 0 on POWER7 processors, and does
>> not support mode 2 on any processors. KVM PR only supports mode 0. TCG
>> can support all modes (0,2,3).
>> 
>> This leads to inconsistencies in guest behaviour and could cause
>> problems migrating guests.
>> 
>> This was not too noticeable for Linux guests for a long time because the
>> kernel only used mode 0 or 3, and it used to consider AIL to be somewhat
>> advisory (KVM would not always honor it either) and it kept both sets of
>> interrupt vectors around.
>> 
>> Recent Linux guests depend on the AIL mode working as defined by the ISA
>> to support the SCV facility interrupt. If AIL mode 3 can not be provided,
>> then Linux must be given an error so it can disable the SCV facility.
> 
> Is this the scenario where migration failures can occur? I don't understand
> what are the migration problems you cited that were possible to happen.

Maybe I'm overly concerned and nothing would practically use it (beyond
testing, which we could just hack around). I was thinking of the case
where we implement AIL=2 in KVM HV, or AIL=3 in PR.

> 
>> 
>> Add the ail-modes capability which is a bitmap of the supported values
>> for the H_SET_MODE Address Translation Mode on Interrupt resource. Add
>> a new KVM CAP that exports the same thing, and provide defaults for PR
>> and HV KVM that predate the cap.
> 
> Why add a new machine cap in this case? Isn't this something that the KVM
> capability should be able to handle by itself, where we always assume that
> we should have the best AIL value possible?
> 
> Besides, the way it is coded here, we're adding a user-visible capability
> that mimics the exact behavior we want from
> h_set_mode_resource_addr_trans_mode(), meaning that only bits 0, 1, 2 and 3
> of cap-ail-modes can be set, but:
> 
> - bit 0 must always be set
> - bit 1 must always be cleared
> - if kvm_enabled():
> * bit 2 must always be cleared
> * bit 3 can be cleared or not depending on kvmppc_has_cap_ail_3(), which
>   translates to not allowing it if running with KVM_PR and allowing it if
>   we're running with Power8 and newer
> 
> i.e. bit 0 is always set, bit 1 is always cleared, bit 2 can be set or not
> for TCG but always cleared for KVM, and bit 3 can be set depending on the
> circumstances.
> 
> Note that this would allow a user to set this guest in a Power9/10 machine:
> 
> -machine pseries,accel=kvm,cap-ail-modes=1
> 
> And the guest will end up having degraded performance because AIL=3 is being
> disabled.
> 
> If we want to avoid this and force AIL=3 to be used in this case, then this
> capability would be used just to set or clear AIL=2 when running with TCG.

I was thinking about how it could be more flexible for possible future
AIL modes and things we don't foresee. In theory AIL=0 could go away
(although that is unlikely in practice).

> I believe the chunks in which we check for kvm_pr and allow only AIL=0 are
> improvements of h_set_mode_resource_addr_trans_mode(), but other than that
> I'm afraid that exposing this cap to users is a bit overkill.

That said, maybe you are right and it's overkill until a real need comes 
up.

I will split and submit the KVM cap part of it, at least.

Thanks,
Nick

Re: [RFC PATCH] spapr: Add SPAPR_CAP_AIL_MODES for supported AIL modes for H_SET_MODE hcall

2022-01-31 Thread Nicholas Piggin

Excerpts from Fabiano Rosas's message of February 1, 2022 1:51 am:
> Nicholas Piggin  writes:
> 
>> The behaviour of the Address Translation Mode on Interrupt resource is
>> not consistently supported by all CPU versions or all KVM versions.  In
>> particular KVM HV only supports mode 0 on POWER7 processors, and does
>> not support mode 2 on any processors. KVM PR only supports mode 0. TCG
>> can support all modes (0,2,3).
>>
>> This leads to inconsistencies in guest behaviour and could cause
>> problems migrating guests.
>>
>> This was not too noticeable for Linux guests for a long time because the
>> kernel only used mode 0 or 3, and it used to consider AIL to be somewhat
>> advisory (KVM would not always honor it either) and it kept both sets of
>> interrupt vectors around.
>>
>> Recent Linux guests depend on the AIL mode working as defined by the ISA
>> to support the SCV facility interrupt. If AIL mode 3 can not be provided,
>> then Linux must be given an error so it can disable the SCV facility.
>>
>> Add the ail-modes capability which is a bitmap of the supported values
>> for the H_SET_MODE Address Translation Mode on Interrupt resource. Add
>> a new KVM CAP that exports the same thing, and provide defaults for PR
>> and HV KVM that predate the cap.
>> ---
>>
>> I just wanted to get some feedback on the approach before submitting a
>> patch for the KVM cap.
> 
> Could you expand a bit on what is the use case for setting this in the
> QEMU cmdline? It looks to me we already have all the information we need
> with just the KVM cap.

To be able to match TCG with KVM HV or PR behaviour here.
I guess I'm not sure how much that is actually needed though.

>> +if (kvm_enabled()) {
>> +if (val & (0x01 << 2)) {
>> +error_setg(errp, "KVM does not support cap-ail-modes mode AIL=2");
> 
> Isn't this something KVM should tell us via the capability?

Yeah, might as well do that. I changed some of the interfaces halfway
through and didn't clean this up.

>> +error_append_hint(errp,
>> +  "Ensure bit 2 (value 4) is clear in cap-ail-modes\n");
>> +if (kvmppc_has_cap_ail_3()) {
>> +error_append_hint(errp, "Try appending -machine cap-ail-modes=9\n");
>> +} else {
>> +error_append_hint(errp, "Try appending -machine cap-ail-modes=1\n");
>> +}
>> +return;
>> +}
>> +if ((val & (0x01 << 3)) && !kvmppc_has_cap_ail_3()) {
>> +error_setg(errp, "KVM implementation does not support cap-ail-modes AIL=3");
>> +error_append_hint(errp,
>> +  "Ensure bit 3 (value 8) is clear in cap-ail-modes\n");
>> +error_append_hint(errp, "Try appending -machine cap-ail-modes=1\n");
>> +return;
>> +}
>> +}
>> +}
> 
> I think the error reporting here is too complex. A user who just wants
> to make their guest start will not bother thinking about binary
> representation. There's also some room for confusion in having three
> numbers present in the error message (bit #, decimal value and AIL
> mode). Imagine dealing with this in a bug report, for instance.
> 
> I would just tell outright what the supported values are. Perhaps in a
> little table:
> 
> Supported AIL modes:
>  AIL = 0   | cap-ail-modes=1
>  AIL = 2   | cap-ail-modes=5
>  AIL = 3   | cap-ail-modes=9
>  AIL = 2&3 | cap-ail-modes=13
> 
> We could then make the code a bit more generic. Roughly:

Yeah I didn't like the interface either :P

The nicest option I guess is to be able to give it a list

cap-ail-modes=0,2,3

Maybe there's already some parsing to be able to do that. I'll
look a bit harder.

Thanks,
Nick
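
To make the list idea concrete, a rough sketch of how a value such as "0,2,3" could map onto the bitmap from the table quoted above, together with the constraints listed elsewhere in the thread. It is Python for illustration only; ail_modes_to_bitmap and check_ail_bitmap are hypothetical names, and none of this is existing QEMU option parsing.

```python
VALID_AIL_MODES = (0, 2, 3)


def ail_modes_to_bitmap(text: str) -> int:
    """Turn a list such as "0,2,3" into the cap-ail-modes bitmap."""
    bitmap = 0
    for token in text.split(","):
        mode = int(token)
        if mode not in VALID_AIL_MODES:
            raise ValueError(f"unsupported AIL mode: {mode}")
        bitmap |= 1 << mode
    return bitmap


def check_ail_bitmap(bitmap: int, kvm: bool, kvm_has_ail_3: bool) -> list:
    """Return a list of problems, following the constraints from the thread."""
    problems = []
    if not bitmap & (1 << 0):
        problems.append("bit 0 (AIL=0) must be set")
    if bitmap & (1 << 1):
        problems.append("bit 1 must be clear")
    if kvm and bitmap & (1 << 2):
        problems.append("KVM does not support AIL=2")
    if kvm and bitmap & (1 << 3) and not kvm_has_ail_3:
        problems.append("this KVM implementation does not support AIL=3")
    return problems
```

Under these assumptions "0,2,3" gives 13 and "0,3" gives 9, matching the table, and the user never has to think about bit positions.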

Re: "make check-acceptance" takes way too long

2022-01-31 Thread Cleber Rosa

On Fri, Jan 21, 2022 at 10:22 AM Daniel P. Berrangé  wrote:
>
> On Fri, Jan 21, 2022 at 12:23:23PM +, Alex Bennée wrote:
> >
> > Peter Maydell  writes:
> >
> > > On Fri, 21 Jan 2022 at 10:50, Markus Armbruster  wrote:
> > >> No objection, but it's no replacement for looking into why these tests
> > >> are so slow.
> > >>
> > >> The #1 reason for things being slow is not giving a damn :)
> > >
> > > See previous messages in the thread -- the test starts a
> > > full-fat guest OS including UEFI boot, and it takes forever to
> > > get to the login prompt because systemd is starting everything
> > > including the kitchen sink.
> >
> > There has to be a half-way house between booting a kernel until it fails
> > to find a rootfs and running a full Ubuntu distro. Maybe just asking
> > systemd to reach "rescue.target" would be enough to show the disks are
> > up and userspace works.
>
> Booting up full OS distros is useful, but at the same time I feel it
> is too much as something to expect developers to do on any kind of
> regular basis.
>

Agreed.  The solution IMO can be as simple as having different "test
job profiles".

> Ideally some decent amount of acceptance testing could be a standard
> part of the 'make check', but that's impossible as long as we're
> downloading large disk images or booting things that are very slow,
> especially so with TCG.
>
> IMHO the ideal scenario would be for us to have a kernel, initrd
> containing just busybox tools for the key arch targets we care
> about. Those could be used with direct kernel boot or stuffed
> into a disk image. Either way, they would boot in ~1 second,
> even with TCG, and would be able to execute simple shell scripts
> to test a decent amount of QEMU functionality.
>

I see different use cases here:

A) Testing that QEMU can boot a full distro

For testing purposes, the more different subsystems the "boot" process
depends on, the better.  Currently the "boot_linux.py" tests require the entire
guest boot to complete and have a networking configuration and interaction.

B) Using something as a base OS for scripts (tests) to run on it

Here's where there's the most benefit in having a more lightweight distro
(or kernel + initrd).  But, this requirement will also come in different
"optimal" sizes for different people.  Some of the existing tests require
not only a Fedora system, but a given version that has given capabilities.

For a sustainable, framework-like solution, tests should be able to determine
the guest they need with minimal setup from test writers[1].  If a Fedora-like
system is not needed, maybe a lightweight system like CirrOS[2] is enough.
CirrOS, unfortunately, cannot be used today as the distro in most of the
acceptance tests because the cloud-init mechanism used to configure the
networking is not currently supported, although there have been discussions
to consider implementing it[3].

> It wouldn't eliminate the need to test with full OS, but it
> would let us have some acceptance testing run as standard with
> 'make check' in a decently fast time.  It would then be less
> critical if the more thorough full OS tests were somewhat
> slower than we'd like. We could just leave those as a scheduled
> job to run overnight post-merge. If they do detect any problems
> post-merge, then write a dedicated test scenario to replicate it
> under the minimal kernel/initrd acceptance test so it'll be
> caught pre-merge in future.
>

Assuming this is about "Testing that QEMU can boot a full distro", I wouldn't
try to solve the problem by making the distro so slim that it becomes
an unrealistic system.

IMO the deal breaker with regards to test time can be solved more cheaply by
having and using KVM where these tests will run, and not running them by
default otherwise.  With the tagging mechanism we should be able to set a
condition such as: "If using TCG, exclude tests that boot a full blown distro.
If using KVM, do not criticize what gets booted".  Resulting in something
like:

$ avocado list -t accel:tcg,boots:-distro -t accel:kvm ~/src/qemu/tests/avocado/{boot_linux.py,boot_linux_console.py}
avocado-instrumented /home/cleber/src/qemu/tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_i440fx_kvm
avocado-instrumented /home/cleber/src/qemu/tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_kvm
avocado-instrumented /home/cleber/src/qemu/tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_kvm
avocado-instrumented /home/cleber/src/qemu/tests/avocado/boot_linux_console.py:BootLinuxConsole.test_aarch64_virt
avocado-instrumented /home/cleber/src/qemu/tests/avocado/boot_linux_console.py:BootLinuxConsole.test_aarch64_xlnx_versal_virt
avocado-instrumented /home/cleber/src/qemu/tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_virt
avocado-instrumented /home/cleber/src/qemu/tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_emcraft_sf2
avocado-instrumented

Re: [PATCH 0/2] RISC-V: Correctly generate store/amo faults

2022-01-31 Thread Alistair Francis

On Wed, Jan 26, 2022 at 10:09 AM Richard Henderson
 wrote:
>
> On 1/24/22 4:17 PM, LIU Zhiwei wrote:
> >
> > On 2022/1/24 上午8:59, Alistair Francis wrote:
> >> From: Alistair Francis 
> >>
> >> This series adds a MO_ op to specify that a load instruction should
> >> produce a store fault. This is used on RISC-V to produce a store/amo
> >> fault when an atomic access fails.
> >
> > Hi Alistair,
> >
> > As Richard said, we can address this issue in two ways, probe_read (I
> > think probe_write is a typo)
>
> It is not a typo: we want to verify that the memory is writable before we
> perform the load.  This will raise a write fault on a no-access page before
> a read fault would be generated by the load.  This may still generate the
> wrong fault for a write-only page.  (Is such a page permission encoding
> possible with RISCV?  Not all cpus support that, since

It's not. RISC-V doesn't have write only pages, at least not in the
current priv spec (maybe some extension allows it).

> at first blush it seems to be mostly useless.  But some do, and a generic
> tcg feature should be designed with those in mind.)
>
> > In my opinion using the MO_ op in io_readx may not be right because the
> > issue is not only with IO access. And the MO_ op in io_readx is too late
> > because the exception has already been created in tlb_fill.
>
> You are correct that changing only io_readx is insufficient.  Very much so.
>
> Alistair, you're only changing the reporting of MMIO faults for which read
> permission is missing.  Importantly, the actual permission check is done
> elsewhere, and you aren't changing that to perform a write access check.
> Also, you very much need to handle normal

I'm a little confused with this part.

Looking at tcg_gen_atomic_cmpxchg_i64() for example we either:
 1. call tcg_gen_qemu_ld_i64() then tcg_gen_qemu_st_i64()
 2. call table_cmpxchg[] which eventually calls atomic_mmu_lookup()
 3. call tcg_gen_atomic_cmpxchg_i32() which is pretty much the same as
the above two

That means in both cases we end up performing a load or tlb_fill(..,
MMU_DATA_LOAD, ..) operation as well as a store operation.

So we are already performing a write permission check, if that fails
on RISC-V we correctly generate the RISCV_EXCP_STORE_AMO_ACCESS_FAULT
fault. I guess on some architectures there might be a specific atomic
fault, which we will still not correctly trigger though.

The part we are interested in is the load, and ensuring that we
generate a store fault if that fails. At least for RISC-V.

> memory not just MMIO.  Which will require changes across all tcg/arch/, as
> well as in all of the memory access helpers in accel/tcg/.

Argh, yeah

>
> We may not want to add this check along the normal hot path of a normal
> load, but create

Can't we just do the check in the slow path? By the time we get to the
fast path shouldn't we already have permissions?

> separate helpers for "load with write-permission-check".  And we should
> answer the

As in add a new INDEX_op_qemu_ld_write_perm_i32/i64, make edits to
atomic_mmu_lookup() and all of the plumbing for those?

Alistair

> question of whether it should really be "load with
> read-write-permission-check", which will make the changes to tcg/arch/
> harder.
>
>
> r~

Re: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX

2022-01-31 Thread Leonardo Bras Soares Passos

Hello Peter,

Re-reading everything before submitting the next version.
I think I finally got that you are suggesting to just add a break at
the end of the case, after the if :)

Sorry I misunderstood that before.

Best regards,
Leo

On Thu, Jan 13, 2022 at 3:48 AM Peter Xu  wrote:
>
> On Thu, Jan 06, 2022 at 07:13:39PM -0300, Leonardo Bras wrote:
> > @@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> >  memcpy(CMSG_DATA(cmsg), fds, fdsize);
> >  }
> >
> > +if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > +sflags = MSG_ZEROCOPY;
> > +}
> > +
> >   retry:
> > -ret = sendmsg(sioc->fd, , 0);
> > +ret = sendmsg(sioc->fd, , sflags);
> >  if (ret <= 0) {
> > -if (errno == EAGAIN) {
> > +switch (errno) {
> > +case EAGAIN:
> >  return QIO_CHANNEL_ERR_BLOCK;
> > -}
> > -if (errno == EINTR) {
> > +case EINTR:
> >  goto retry;
> > +case ENOBUFS:
> > +if (sflags & MSG_ZEROCOPY) {
> > +error_setg_errno(errp, errno,
> > + "Process can't lock enough memory for using MSG_ZEROCOPY");
> > +return -1;
> > +}
>
> I have no idea whether it'll make a real difference, but - should we better
> add a "break" here?  If you agree and with that fixed, feel free to add:
>
> Reviewed-by: Peter Xu 
>
> I also wonder whether you hit ENOBUFS in any of the environments.  On Fedora
> here it's by default unlimited, but just curious when we should keep an eye.
>
> Thanks,
>
> --
> Peter Xu
>

[PATCH v4 3/4] python: upgrade mypy to 0.780

2022-01-31 Thread John Snow

We need a slightly newer version of mypy in order to use some features
of the asyncio server functions in the next commit.

(Note: pipenv is not really suited to upgrading individual packages; I
need to replace this tool with something better for the task. For now,
the miscellaneous updates not related to the mypy upgrade are simply
beyond my control. It's on my list to take care of soon.)

Signed-off-by: John Snow 
---
 python/Pipfile.lock | 66 ++---
 python/setup.cfg|  2 +-
 2 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/python/Pipfile.lock b/python/Pipfile.lock
index d2a7dbd88b..ce46404ce0 100644
--- a/python/Pipfile.lock
+++ b/python/Pipfile.lock
@@ -1,7 +1,7 @@
 {
 "_meta": {
 "hash": {
-"sha256": 
"784b327272db32403d5a488507853b5afba850ba26a5948e5b6a90c1baef2d9c"
+"sha256": 
"f1a25654d884a5b450e38d78b1f2e3ebb9073e421cc4358d4bbb83ac251a5670"
 },
 "pipfile-spec": 6,
 "requires": {
@@ -34,7 +34,7 @@
 
"sha256:09bdb456e02564731f8b5957cdd0c98a7f01d2db5e90eb1d794c353c28bfd705",
 
"sha256:6a8a51f64dae307f6e0c9db752b66a7951e282389d8362cc1d39a56f3feeb31d"
 ],
-"markers": "python_version ~= '3.6'",
+"index": "pypi",
 "version": "==2.6.0"
 },
 "avocado-framework": {
@@ -50,6 +50,7 @@
 
"sha256:106fef6dc37dd8c0e2c0a60d3fca3e77460a48907f335fa28420463a6f799736",
 
"sha256:23e223426b28491b1ced97dc3bbe183027419dfc7982b4fa2f05d5f3ff10711c"
 ],
+"index": "pypi",
 "version": "==0.3.2"
 },
 "filelock": {
@@ -57,6 +58,7 @@
 
"sha256:18d82244ee114f543149c66a6e0c14e9c4f8a1044b5cdaadd0f82159d6a6ff59",
 
"sha256:929b7d63ec5b7d6b71b0fa5ac14e030b3f70b75747cef1b10da9b879fef15836"
 ],
+"index": "pypi",
 "version": "==3.0.12"
 },
 "flake8": {
@@ -88,7 +90,7 @@
 
"sha256:54161657e8ffc76596c4ede7080ca68cb02962a2e074a2586b695a93a925d36e",
 
"sha256:e962bff7440364183203d179d7ae9ad90cb1f2b74dcb84300e88ecc42dca3351"
 ],
-"markers": "python_version < '3.7'",
+"index": "pypi",
 "version": "==5.1.4"
 },
 "isort": {
@@ -124,7 +126,7 @@
 
"sha256:ed361bb83436f117f9917d282a456f9e5009ea12fd6de8742d1a4752c3017e93",
 
"sha256:f5144c75445ae3ca2057faac03fda5a902eff196702b0a24daf1d6ce0650514b"
 ],
-"markers": "python_version >= '2.7' and python_version not in 
'3.0, 3.1, 3.2, 3.3, 3.4, 3.5'",
+"index": "pypi",
 "version": "==1.6.0"
 },
 "mccabe": {
@@ -136,23 +138,23 @@
 },
 "mypy": {
 "hashes": [
-
"sha256:15b948e1302682e3682f11f50208b726a246ab4e6c1b39f9264a8796bb416aa2",
-
"sha256:219a3116ecd015f8dca7b5d2c366c973509dfb9a8fc97ef044a36e3da66144a1",
-
"sha256:3b1fc683fb204c6b4403a1ef23f0b1fac8e4477091585e0c8c54cbdf7d7bb164",
-
"sha256:3beff56b453b6ef94ecb2996bea101a08f1f8a9771d3cbf4988a61e4d9973761",
-
"sha256:7687f6455ec3ed7649d1ae574136835a4272b65b3ddcf01ab8704ac65616c5ce",
-
"sha256:7ec45a70d40ede1ec7ad7f95b3c94c9cf4c186a32f6bacb1795b60abd2f9ef27",
-
"sha256:86c857510a9b7c3104cf4cde1568f4921762c8f9842e987bc03ed4f160925754",
-
"sha256:8a627507ef9b307b46a1fea9513d5c98680ba09591253082b4c48697ba05a4ae",
-
"sha256:8dfb69fbf9f3aeed18afffb15e319ca7f8da9642336348ddd6cab2713ddcf8f9",
-
"sha256:a34b577cdf6313bf24755f7a0e3f3c326d5c1f4fe7422d1d06498eb25ad0c600",
-
"sha256:a8ffcd53cb5dfc131850851cc09f1c44689c2812d0beb954d8138d4f5fc17f65",
-
"sha256:b90928f2d9eb2f33162405f32dde9f6dcead63a0971ca8a1b50eb4ca3e35ceb8",
-
"sha256:c56ffe22faa2e51054c5f7a3bc70a370939c2ed4de308c690e7949230c995913",
-
"sha256:f91c7ae919bbc3f96cd5e5b2e786b2b108343d1d7972ea130f7de27fdd547cf3"
+
"sha256:00cb1964a7476e871d6108341ac9c1a857d6bd20bf5877f4773ac5e9d92cd3cd",
+
"sha256:127de5a9b817a03a98c5ae8a0c46a20dc2af6dcfa2ae7f96cb519b312efa",
+
"sha256:1f3976a945ad7f0a0727aafdc5651c2d3278e3c88dee94e2bf75cd3386b7b2f4",
+
"sha256:2f8c098f12b402c19b735aec724cc9105cc1a9eea405d08814eb4b14a6fb1a41",
+
"sha256:4ef13b619a289aa025f2273e05e755f8049bb4eaba6d703a425de37d495d178d",
+
"sha256:5d142f219bf8c7894dfa79ebfb7d352c4c63a325e75f10dfb4c3db9417dcd135",
+
"sha256:62eb5dd4ea86bda8ce386f26684f7f26e4bfe6283c9f2b6ca6d17faf704dcfad",
+
"sha256:64c36eb0936d0bfb7d8da49f92c18e312ad2e3ed46e5548ae4ca997b0d33bd59",
+

[PATCH v4 1/4] python/aqmp: Fix negotiation with pre-"oob" QEMU

2022-01-31 Thread John Snow

QEMU versions prior to the "oob" capability *also* can't accept the
"enable" keyword argument at all. Fix the handshake process with older
QEMU versions.

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 python/qemu/aqmp/qmp_client.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/qemu/aqmp/qmp_client.py b/python/qemu/aqmp/qmp_client.py
index f1a845cc82..90a8737f03 100644
--- a/python/qemu/aqmp/qmp_client.py
+++ b/python/qemu/aqmp/qmp_client.py
@@ -292,9 +292,9 @@ async def _negotiate(self) -> None:
 """
 self.logger.debug("Negotiating capabilities ...")
 
-arguments: Dict[str, List[str]] = {'enable': []}
+arguments: Dict[str, List[str]] = {}
 if self._greeting and 'oob' in self._greeting.QMP.capabilities:
-arguments['enable'].append('oob')
+arguments.setdefault('enable', []).append('oob')
 msg = self.make_execute_msg('qmp_capabilities', arguments=arguments)
 
 # It's not safe to use execute() here, because the reader/writers
-- 
2.31.1
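
A standalone sketch of the argument building after this change, for anyone who wants to see the effect without the surrounding class. The function name build_capability_arguments is made up for the example, and the greeting is reduced to a plain list of capability strings.

```python
from typing import Dict, List


def build_capability_arguments(greeting_caps: List[str]) -> Dict[str, List[str]]:
    """Build the qmp_capabilities arguments from the greeting capabilities."""
    arguments: Dict[str, List[str]] = {}
    if 'oob' in greeting_caps:
        # The 'enable' key is only created when there is something to enable.
        arguments.setdefault('enable', []).append('oob')
    return arguments
```

A greeting without "oob" now produces an empty arguments dict, so the "enable" keyword is never sent to a QEMU that cannot accept it.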

[PATCH v4 0/4] Python: Improvements for iotest 040,041

2022-01-31 Thread John Snow

GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-fixes
CI: https://gitlab.com/jsnow/qemu/-/pipelines/455146881

Fixes and improvements all relating to "iotest 040,041, intermittent
failure in netbsd VM"
https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg01975.html

See each patch for details.

V4:
 - Just commit message changes, and applying Hanna's RBs.

V3:
 - Retitled series
 - Dropped patch that was already merged
 - Reworded some comments, docstrings, etc.

John Snow (4):
  python/aqmp: Fix negotiation with pre-"oob" QEMU
  python/machine: raise VMLaunchFailure exception from launch()
  python: upgrade mypy to 0.780
  python/aqmp: add socket bind step to legacy.py

 python/Pipfile.lock   | 66 +--
 python/qemu/aqmp/legacy.py|  3 ++
 python/qemu/aqmp/protocol.py  | 41 --
 python/qemu/aqmp/qmp_client.py|  4 +-
 python/qemu/machine/machine.py| 45 +---
 python/setup.cfg  |  2 +-
 tests/qemu-iotests/tests/mirror-top-perms |  3 +-
 7 files changed, 123 insertions(+), 41 deletions(-)

-- 
2.31.1

[PATCH v4 4/4] python/aqmp: add socket bind step to legacy.py

2022-01-31 Thread John Snow

The synchronous QMP library would bind to the server address during
__init__(). The new library delays this to the accept() call, because
binding occurs inside of the call to start_[unix_]server(), which is an
async method -- so it cannot happen during __init__ anymore.

Python 3.7+ adds the ability to create the server (and thus the bind()
call) and begin the active listening in separate steps, but we don't
have that functionality in 3.6, our current minimum.

Therefore ... Add a temporary workaround that allows the synchronous
version of the client to bind the socket in advance, guaranteeing that
there will be a UNIX socket in the filesystem ready for the QEMU client
to connect to without a race condition.

(Yes, it's a bit ugly. Fixing it more nicely will have to wait until our
minimum Python version is 3.7+.)

Signed-off-by: John Snow 
---
 python/qemu/aqmp/legacy.py   |  3 +++
 python/qemu/aqmp/protocol.py | 41 +---
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/python/qemu/aqmp/legacy.py b/python/qemu/aqmp/legacy.py
index 0890f95b16..6baa5f3409 100644
--- a/python/qemu/aqmp/legacy.py
+++ b/python/qemu/aqmp/legacy.py
@@ -56,6 +56,9 @@ def __init__(self, address: SocketAddrT,
 self._address = address
 self._timeout: Optional[float] = None
 
+if server:
+self._aqmp._bind_hack(address)  # pylint: disable=protected-access
+
 _T = TypeVar('_T')
 
 def _sync(
diff --git a/python/qemu/aqmp/protocol.py b/python/qemu/aqmp/protocol.py
index 50e973c2f2..33358f5cd7 100644
--- a/python/qemu/aqmp/protocol.py
+++ b/python/qemu/aqmp/protocol.py
@@ -15,6 +15,7 @@
 from enum import Enum
 from functools import wraps
 import logging
+import socket
 from ssl import SSLContext
 from typing import (
 Any,
@@ -238,6 +239,9 @@ def __init__(self, name: Optional[str] = None) -> None:
 self._runstate = Runstate.IDLE
 self._runstate_changed: Optional[asyncio.Event] = None
 
+# Workaround for bind()
+self._sock: Optional[socket.socket] = None
+
 def __repr__(self) -> str:
 cls_name = type(self).__name__
 tokens = []
@@ -427,6 +431,34 @@ async def _establish_connection(
 else:
 await self._do_connect(address, ssl)
 
+def _bind_hack(self, address: Union[str, Tuple[str, int]]) -> None:
+"""
+Used to create a socket in advance of accept().
+
+This is a workaround to ensure that we can guarantee timing of
+precisely when a socket exists to avoid a connection attempt
+bouncing off of nothing.
+
+Python 3.7+ adds a feature to separate the server creation and
+listening phases instead, and should be used instead of this
+hack.
+"""
+if isinstance(address, tuple):
+family = socket.AF_INET
+else:
+family = socket.AF_UNIX
+
+sock = socket.socket(family, socket.SOCK_STREAM)
+sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+
+try:
+sock.bind(address)
+except:
+sock.close()
+raise
+
+self._sock = sock
+
 @upper_half
 async def _do_accept(self, address: SocketAddrT,
  ssl: Optional[SSLContext] = None) -> None:
@@ -464,24 +496,27 @@ async def _client_connected_cb(reader: asyncio.StreamReader,
 if isinstance(address, tuple):
 coro = asyncio.start_server(
 _client_connected_cb,
-host=address[0],
-port=address[1],
+host=None if self._sock else address[0],
+port=None if self._sock else address[1],
 ssl=ssl,
 backlog=1,
 limit=self._limit,
+sock=self._sock,
 )
 else:
 coro = asyncio.start_unix_server(
 _client_connected_cb,
-path=address,
+path=None if self._sock else address,
 ssl=ssl,
 backlog=1,
 limit=self._limit,
+sock=self._sock,
 )
 
 server = await coro # Starts listening
 await connected.wait()  # Waits for the callback to fire (and finish)
 assert server is None
+self._sock = None
 
 self.logger.debug("Connection accepted.")
 
-- 
2.31.1
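
The pre-bind workaround described in this patch can be sketched in isolation as follows. This is only a minimal illustration, not the code from the patch: the helper names (bind_in_advance, accept_one, demo) and the temporary socket path are assumptions made for the example, and only the UNIX socket case is shown. The socket file exists in the filesystem as soon as bind() returns, while listening starts later, when the already-bound socket is handed to asyncio via the sock= argument.

```python
import asyncio
import os
import socket
import tempfile


def bind_in_advance(path: str) -> socket.socket:
    """Create and bind a UNIX socket now; listening starts later."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except BaseException:
        sock.close()
        raise
    return sock


async def accept_one(sock: socket.socket) -> bytes:
    """Start listening on the pre-bound socket; return the first line received."""
    connected = asyncio.Event()
    received = []

    async def on_connect(reader, writer):
        received.append(await reader.readline())
        writer.close()
        connected.set()

    # Passing sock= means asyncio does not call bind() itself.
    server = await asyncio.start_unix_server(on_connect, sock=sock, backlog=1)
    await connected.wait()
    server.close()
    return received[0]


async def demo() -> bytes:
    """Hypothetical driver: bind first, listen later, then connect as a client."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "qmp.sock")
        sock = bind_in_advance(path)
        assert os.path.exists(path)   # the socket file exists before listening
        task = asyncio.ensure_future(accept_one(sock))
        await asyncio.sleep(0.05)     # give the server a moment to start listening
        _reader, writer = await asyncio.open_unix_connection(path)
        writer.write(b"hello\n")
        await writer.drain()
        line = await task
        writer.close()
        return line
```

Running asyncio.run(demo()) exercises the whole sequence. Note that the sketch still waits for the listener before connecting; it only demonstrates that binding and listening can be separated.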

[PATCH v4 2/4] python/machine: raise VMLaunchFailure exception from launch()

2022-01-31 Thread John Snow

This allows us to pack in some extra information about the failure,
which guarantees that if the caller did not *intentionally* cause a
failure (by capturing this Exception), some pretty good clues will be
printed at the bottom of the traceback information.

This will help make failures in the event of a non-negative return code
more obvious when they go unhandled; the current behavior in
_post_shutdown() is to print a warning message only in the event of
signal-based terminations (for negative return codes).

(Note: In Python, catching BaseException instead of Exception catches a
broader array of Exception events, including SystemExit and
KeyboardInterrupt. We do not want to "wrap" such exceptions as a
VMLaunchFailure, because that will 'downgrade' the exception from a
BaseException to a regular Exception. We do, however, want to perform
cleanup in either case, so catch on the broadest scope and
wrap-and-re-raise only in the more targeted scope.)

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 python/qemu/machine/machine.py| 45 ---
 tests/qemu-iotests/tests/mirror-top-perms |  3 +-
 2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 67ab06ca2b..a5972fab4d 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -74,6 +74,35 @@ class QEMUMachineAddDeviceError(QEMUMachineError):
 """
 
 
+class VMLaunchFailure(QEMUMachineError):
+"""
+Exception raised when a VM launch was attempted, but failed.
+"""
+def __init__(self, exitcode: Optional[int],
+ command: str, output: Optional[str]):
+super().__init__(exitcode, command, output)
+self.exitcode = exitcode
+self.command = command
+self.output = output
+
+def __str__(self) -> str:
+ret = ''
+if self.__cause__ is not None:
+name = type(self.__cause__).__name__
+reason = str(self.__cause__)
+if reason:
+ret += f"{name}: {reason}"
+else:
+ret += f"{name}"
+ret += '\n'
+
+if self.exitcode is not None:
+ret += f"\tExit code: {self.exitcode}\n"
+ret += f"\tCommand: {self.command}\n"
+ret += f"\tOutput: {self.output}\n"
+return ret
+
+
 class AbnormalShutdown(QEMUMachineError):
 """
 Exception raised when a graceful shutdown was requested, but not performed.
@@ -397,7 +426,7 @@ def launch(self) -> None:
 
 try:
 self._launch()
-except:
+except BaseException as exc:
 # We may have launched the process but it may
 # have exited before we could connect via QMP.
 # Assume the VM didn't launch or is exiting.
@@ -408,11 +437,15 @@ def launch(self) -> None:
 else:
 self._post_shutdown()
 
-LOG.debug('Error launching VM')
-if self._qemu_full_args:
-LOG.debug('Command: %r', ' '.join(self._qemu_full_args))
-if self._iolog:
-LOG.debug('Output: %r', self._iolog)
+if isinstance(exc, Exception):
+raise VMLaunchFailure(
+exitcode=self.exitcode(),
+command=' '.join(self._qemu_full_args),
+output=self._iolog
+) from exc
+
+# Don't wrap 'BaseException'; doing so would downgrade
+# that exception. However, we still want to clean up.
 raise
 
 def _launch(self) -> None:
diff --git a/tests/qemu-iotests/tests/mirror-top-perms b/tests/qemu-iotests/tests/mirror-top-perms
index 0a51a613f3..b5849978c4 100755
--- a/tests/qemu-iotests/tests/mirror-top-perms
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -21,7 +21,6 @@
 
 import os
 
-from qemu.aqmp import ConnectError
 from qemu.machine import machine
 from qemu.qmp import QMPConnectError
 
@@ -107,7 +106,7 @@ class TestMirrorTopPerms(iotests.QMPTestCase):
 self.vm_b.launch()
 print('ERROR: VM B launched successfully, '
   'this should not have happened')
-except (QMPConnectError, ConnectError):
+except (QMPConnectError, machine.VMLaunchFailure):
 assert 'Is another process using the image' in self.vm_b.get_log()
 
 result = self.vm.qmp('block-job-cancel',
-- 
2.31.1
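
The "clean up on everything, wrap only regular exceptions" pattern from the commit message can be sketched on its own as follows. LaunchFailure is a hypothetical stand-in for VMLaunchFailure, and launch() here takes the start and cleanup steps as callables purely for illustration; none of these names are the real QEMUMachine API.

```python
class LaunchFailure(Exception):
    """Hypothetical stand-in for VMLaunchFailure."""

    def __init__(self, exitcode, command, output):
        super().__init__(exitcode, command, output)
        self.exitcode = exitcode
        self.command = command
        self.output = output


def launch(start, cleanup, command="example-command"):
    """Run start(); always clean up on failure, wrap only Exception subclasses."""
    try:
        start()
    except BaseException as exc:
        # Cleanup happens for every failure, including KeyboardInterrupt.
        cleanup()
        if isinstance(exc, Exception):
            # Regular errors get wrapped; the original is kept as __cause__.
            raise LaunchFailure(None, command, None) from exc
        # SystemExit, KeyboardInterrupt, etc. propagate unchanged.
        raise
```

A caller that expects the launch to fail can then catch LaunchFailure and inspect its __cause__, while an interrupt from the keyboard still surfaces as KeyboardInterrupt.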

Re: [PATCH v5 17/18] vfio-user: register handlers to facilitate migration

2022-01-31 Thread Jag Raman



> On Jan 28, 2022, at 3:29 AM, Stefan Hajnoczi  wrote:
> 
> On Thu, Jan 27, 2022 at 05:04:26PM +, Jag Raman wrote:
>> 
>> 
>>> On Jan 25, 2022, at 10:48 AM, Stefan Hajnoczi  wrote:
>>> 
>>> On Wed, Jan 19, 2022 at 04:42:06PM -0500, Jagannathan Raman wrote:
>>>> + * The client subsequetly asks the remote server for any data that
>>> 
>>> subsequently
>>> 
>>>> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
>>>> +{
>>>> +VfuObject *o = vfu_get_private(vfu_ctx);
>>>> +VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
>>>> +static int migrated_devs;
>>>> +Error *local_err = NULL;
>>>> +int ret;
>>>> +
>>>> +/**
>>>> + * TODO: move to VFU_MIGR_STATE_RESUME handler. Presently, the
>>>> + * VMSD data from source is not available at RESUME state.
>>>> + * Working on a fix for this.
>>>> + */
>>>> +if (!o->vfu_mig_file) {
>>>> +o->vfu_mig_file = qemu_fopen_ops(o, _mig_fops_load, false);
>>>> +}
>>>> +
>>>> +ret = qemu_remote_loadvm(o->vfu_mig_file);
>>>> +if (ret) {
>>>> +VFU_OBJECT_ERROR(o, "vfu: failed to restore device state");
>>>> +return;
>>>> +}
>>>> +
>>>> +qemu_file_shutdown(o->vfu_mig_file);
>>>> +o->vfu_mig_file = NULL;
>>>> +
>>>> +/* VFU_MIGR_STATE_RUNNING begins here */
>>>> +if (++migrated_devs == k->nr_devs) {
>>> 
>>> When is this counter reset so migration can be tried again if it
>>> fails/cancels?
>> 
>> Detecting cancellation is a pending item. We will address it in the
>> next rev. Will check with you if we get stuck during the process
>> of implementing it.
>> 
>>> 
>>>> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
>>>> + uint64_t size, uint64_t offset)
>>>> +{
>>>> +VfuObject *o = vfu_get_private(vfu_ctx);
>>>> +
>>>> +if (offset > o->vfu_mig_buf_size) {
>>>> +return -1;
>>>> +}
>>>> +
>>>> +if ((offset + size) > o->vfu_mig_buf_size) {
>>>> +warn_report("vfu: buffer overflow - check pending_bytes");
>>>> +size = o->vfu_mig_buf_size - offset;
>>>> +}
>>>> +
>>>> +memcpy(buf, (o->vfu_mig_buf + offset), size);
>>>> +
>>>> +o->vfu_mig_buf_pending -= size;
>>> 
>>> This assumes that the caller increments offset by size each time. If
>>> that assumption is okay, then we can just trust offset and don't need to
>>> do arithmetic on vfu_mig_buf_pending. If that assumption is not correct,
>>> then the code needs to be extended to safely update vfu_mig_buf_pending
>>> when offset jumps around arbitrarily between calls.
>> 
>> Going by the definition of vfu_migration_callbacks_t in the library, I
>> assumed that read_data advances the offset by size bytes.
>> 
>> Will add a comment to explain that.
>> 
>>> 
>>>> +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev)
>>>> +{
>>>> +DeviceClass *dc = DEVICE_GET_CLASS(DEVICE(pci_dev));
>>>> +const VMStateField *field = NULL;
>>>> +uint64_t size = 0;
>>>> +
>>>> +if (!dc->vmsd) {
>>>> +return 0;
>>>> +}
>>>> +
>>>> +field = dc->vmsd->fields;
>>>> +while (field && field->name) {
>>>> +size += vmstate_size(pci_dev, field);
>>>> +field++;
>>>> +}
>>>> +
>>>> +return size;
>>>> +}
>>> 
>>> This function looks incorrect because it ignores subsections as well as
>>> runtime behavior during save(). Although VMStateDescription is partially
>>> declarative, there is still a bunch of imperative code that can write to
>>> the QEMUFile at save() time so there's no way of knowing the size ahead
>>> of time.
>> 
>> I see your point, it would be a problem for any field which has the
>> (VMS_BUFFER | VMS_ALLOC) flags set.
>> 
>>> 
>>> I asked this in a previous revision of this series but I'm not sure if
>>> it was answered: is it really necessary to know the size of the vmstate?
>>> I thought the VFIO migration interface is designed to support
>>> streaming reads/writes. We could choose a fixed size like 64KB and
>>> stream the vmstate in 64KB chunks.
>> 
>> The library exposes the migration data to the client as a device BAR with
>> fixed size - the size of which is fixed at boot time, even when using
>> vfu_migration_callbacks_t callbacks.
>> 
>> I don’t believe the library supports streaming vmstate/migration-data - see
>> the following comment in migration_region_access() defined in the library:
>> 
>> * Does this mean that partial reads are not allowed?
>> 
>> Thanos or John,
>> 
>>Could you please clarify this?
>> 
>> Stefan,
>>We attempted to answer the migration cancellation and vmstate size
>>questions previously also, in the following email:
>> 
>> https://lore.kernel.org/all/f48606b1-15a4-4dd2-9d71-2fcafc0e6...@oracle.com/
> 
>> libvfio-user has the vfu_migration_callbacks_t interface that allows the
>> device to save/load more data regardless of the size of the migration
>> region. I don't see the issue here since the region doesn't need to be
>> sized to fit
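
The point raised in review about vfu_mig_read_data() (either trust the offset, or track the pending count safely) can be illustrated with a small sketch. This is an assumption-laden model, not libvfio-user code: read_data() below is a hypothetical pure function over a bytes buffer, and it derives the pending count from the offset instead of decrementing a running counter, so repeated or out-of-order reads cannot corrupt it.

```python
from typing import Tuple


def read_data(mig_buf: bytes, offset: int, size: int) -> Tuple[bytes, int]:
    """Return (chunk, bytes still pending after this read)."""
    if offset > len(mig_buf):
        raise ValueError("offset beyond end of migration buffer")
    # Clamp the request to what is actually available.
    size = min(size, len(mig_buf) - offset)
    chunk = mig_buf[offset:offset + size]
    # Pending is computed from the offset, not from a mutable counter.
    pending = len(mig_buf) - (offset + size)
    return chunk, pending
```

With this shape, reading the same offset twice reports the same pending value both times, which is the property the decrementing counter in the quoted code does not have.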

Re: [PATCH v7 2/5] target/riscv: add PTE_A/PTE_D/PTE_U bits check for inner PTE

2022-01-31 Thread Alistair Francis

On Fri, Jan 28, 2022 at 7:06 PM Weiwei Li  wrote:
>
> For non-leaf PTEs, the D, A, and U bits are reserved for future standard use.
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> Reviewed-by: Anup Patel 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu_helper.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 5a1c0e239e..b820166dc5 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -641,6 +641,9 @@ restart:
>  return TRANSLATE_FAIL;
>  } else if (!(pte & (PTE_R | PTE_W | PTE_X))) {
>  /* Inner PTE, continue walking */
> +if (pte & (PTE_D | PTE_A | PTE_U)) {
> +return TRANSLATE_FAIL;
> +}
>  base = ppn << PGSHIFT;
>  } else if ((pte & (PTE_R | PTE_W | PTE_X)) == PTE_W) {
>  /* Reserved leaf PTE flags: PTE_W */
> --
> 2.17.1
>
>
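
The check added by this patch can be modelled in a few lines. The sketch below is a simplification under stated assumptions: classify_pte() is a hypothetical helper, the bit positions are the standard RISC-V PTE flag bits (V, R, W, X, U, G, A, D in bits 0 to 7), and the other reserved leaf encodings handled by the real page-table walk are left out.

```python
# Standard RISC-V PTE flag bits, bit 0 (V) through bit 7 (D).
PTE_V, PTE_R, PTE_W, PTE_X, PTE_U, PTE_G, PTE_A, PTE_D = (1 << i for i in range(8))


def classify_pte(pte: int) -> str:
    """Return 'fail', 'inner' or 'leaf' for a page table entry."""
    if not pte & PTE_V:
        return "fail"                      # invalid PTE
    if not pte & (PTE_R | PTE_W | PTE_X):
        # Non-leaf PTE: D, A and U are reserved and must be zero.
        if pte & (PTE_D | PTE_A | PTE_U):
            return "fail"
        return "inner"
    return "leaf"
```

For example, classify_pte(PTE_V) is an inner PTE, while classify_pte(PTE_V | PTE_A) is rejected because A is set on a non-leaf entry.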

Re: [PATCH v7 1/5] target/riscv: Ignore reserved bits in PTE for RV64

2022-01-31 Thread Alistair Francis

On Fri, Jan 28, 2022 at 7:11 PM Weiwei Li  wrote:
>
> From: Guo Ren 
>
> The highest bits of the PTE are used by Svpbmt (ref: [1], [2]), so we
> need to ignore them. They cannot be part of the PPN.
>
> 1: The RISC-V Instruction Set Manual, Volume II: Privileged Architecture
>4.4 Sv39: Page-Based 39-bit Virtual-Memory System
>4.5 Sv48: Page-Based 48-bit Virtual-Memory System
>
> 2: https://github.com/riscv/virtual-memory/blob/main/specs/663-Svpbmt-diff.pdf
>
> Signed-off-by: Guo Ren 
> Reviewed-by: Liu Zhiwei 
> Cc: Bin Meng 
> Cc: Alistair Francis 
> ---
>  target/riscv/cpu.h| 15 +++
>  target/riscv/cpu_bits.h   |  3 +++
>  target/riscv/cpu_helper.c | 14 +-
>  3 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 55635d68d5..336fe8e3d5 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -341,6 +341,8 @@ struct RISCVCPU {
>  bool ext_counters;
>  bool ext_ifencei;
>  bool ext_icsr;
> +bool ext_svnapot;
> +bool ext_svpbmt;
>  bool ext_zfh;
>  bool ext_zfhmin;
>  bool ext_zve32f;

Hello, thanks for the patches.

This looks good, but you might need to rebase it as there are patches
on list that move this into a different struct.

> @@ -495,6 +497,19 @@ static inline int riscv_cpu_xlen(CPURISCVState *env)
>  return 16 << env->xl;
>  }
>
> +#ifdef TARGET_RISCV32
> +#define riscv_cpu_sxl(env)  ((void)(env), MXL_RV32)
> +#else
> +static inline RISCVMXL riscv_cpu_sxl(CPURISCVState *env)
> +{
> +#ifdef CONFIG_USER_ONLY
> +return env->misa_mxl;
> +#else
> +return get_field(env->mstatus, MSTATUS64_SXL);
> +#endif
> +}
> +#endif
> +
>  /*
>   * Encode LMUL to lmul as follows:
>   *     LMUL    vlmul    lmul
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 7c87433645..6ea3944423 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -493,6 +493,9 @@ typedef enum {
>  /* Page table PPN shift amount */
>  #define PTE_PPN_SHIFT   10
>
> +/* Page table PPN mask */
> +#define PTE_PPN_MASK         0x3FFFFFFFFFFC00ULL
> +
>  /* Leaf page shift amount */
>  #define PGSHIFT 12
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 327a2c4f1d..5a1c0e239e 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -622,7 +622,19 @@ restart:
>  return TRANSLATE_FAIL;
>  }
>
> -hwaddr ppn = pte >> PTE_PPN_SHIFT;
> +hwaddr ppn;
> +RISCVCPU *cpu = env_archcpu(env);

I know there is existing code in this function that does this, but
please don't declare variables mid-function. Can you move this to the
top of the function?

Otherwise:

Reviewed-by: Alistair Francis 

Alistair

> +
> +if (riscv_cpu_sxl(env) == MXL_RV32) {
> +ppn = pte >> PTE_PPN_SHIFT;
> +} else if (cpu->cfg.ext_svpbmt || cpu->cfg.ext_svnapot) {
> +ppn = (pte & (target_ulong)PTE_PPN_MASK) >> PTE_PPN_SHIFT;
> +} else {
> +ppn = pte >> PTE_PPN_SHIFT;
> +if ((pte & ~(target_ulong)PTE_PPN_MASK) >> PTE_PPN_SHIFT) {
> +return TRANSLATE_FAIL;
> +}
> +}
>
>  if (!(pte & PTE_V)) {
>  /* Invalid PTE */
> --
> 2.17.1
>
>
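
As a worked example of the masking in this patch: on RV64 the PPN occupies PTE bits 53:10, i.e. 44 bits starting at the PPN shift of 10, so the mask can be derived rather than written out by hand. The sketch below is illustrative only; extract_ppn() and its high_bits_defined parameter are hypothetical names standing in for the "Svpbmt or Svnapot enabled" condition, and the RV32 path is omitted.

```python
from typing import Optional

PTE_PPN_SHIFT = 10
PPN_BITS = 44                                   # PTE bits 53:10 on RV64
PTE_PPN_MASK = ((1 << PPN_BITS) - 1) << PTE_PPN_SHIFT


def extract_ppn(pte: int, high_bits_defined: bool) -> Optional[int]:
    """Return the PPN, or None where the walk would fail the translation."""
    pte &= (1 << 64) - 1                        # model a 64-bit PTE
    if high_bits_defined:
        # Bits above the PPN carry other meaning: mask them off.
        return (pte & PTE_PPN_MASK) >> PTE_PPN_SHIFT
    if (pte & ~PTE_PPN_MASK) >> PTE_PPN_SHIFT:
        # Reserved high bits are set: reject the entry.
        return None
    return pte >> PTE_PPN_SHIFT
```

The derived mask evaluates to 0x3FFFFFFFFFFC00. A PTE with bit 62 set yields its PPN when the high bits are defined by an extension, and is rejected otherwise.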

Re: [PATCH v5 6/7] target/riscv: Add XVentanaCondOps custom extension

2022-01-31 Thread Alistair Francis

On Mon, Jan 31, 2022 at 9:23 PM Philipp Tomsich
 wrote:
>
> This adds the decoder and translation for the XVentanaCondOps custom
> extension (vendor-defined by Ventana Micro Systems), which is
> documented at 
> https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.0/ventana-custom-extensions-v1.0.0.pdf
>
> This commit then also adds a guard-function (has_XVentanaCondOps_p)
> and the decoder function to the table of decoders, enabling the
> support for the XVentanaCondOps extension.
>
> Signed-off-by: Philipp Tomsich 
> Reviewed-by: Richard Henderson 
>
> ---
>
> (no changes since v3)
>
> Changes in v3:
> - rename to trans_xventanacondops.c.inc (i.e. with the '.c')
> - (in MATERIALISE_EXT_PREDICATE) don't annotate the predicate function
>   for testing the availability of individual extensions as 'inline'
>   and don't make CPURISCVState* visible to these predicate functions
>
> Changes in v2:
> - Split off decode table into XVentanaCondOps.decode
> - Wire up XVentanaCondOps in the decoder-table
>
>  target/riscv/XVentanaCondOps.decode   | 25 
>  target/riscv/cpu.c|  3 ++
>  target/riscv/cpu.h|  3 ++
>  .../insn_trans/trans_xventanacondops.c.inc| 39 +++
>  target/riscv/meson.build  |  1 +
>  target/riscv/translate.c  | 12 ++
>  6 files changed, 83 insertions(+)
>  create mode 100644 target/riscv/XVentanaCondOps.decode
>  create mode 100644 target/riscv/insn_trans/trans_xventanacondops.c.inc
>
> diff --git a/target/riscv/XVentanaCondOps.decode 
> b/target/riscv/XVentanaCondOps.decode
> new file mode 100644
> index 00..5aef7c3d72
> --- /dev/null
> +++ b/target/riscv/XVentanaCondOps.decode
> @@ -0,0 +1,25 @@
> +#
> +# RISC-V translation routines for the XVentanaCondOps extension
> +#
> +# Copyright (c) 2022 Dr. Philipp Tomsich, philipp.toms...@vrull.eu
> +#
> +# SPDX-License-Identifier: LGPL-2.1-or-later
> +#
> +# Reference: VTx-family custom instructions
> +#Custom ISA extensions for Ventana Micro Systems RISC-V cores
> +#
> (https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.0/ventana-custom-extensions-v1.0.0.pdf)
> +
> +# Fields
> +%rs2  20:5
> +%rs1  15:5
> +%rd    7:5
> +
> +# Argument sets
> +&r    rd rs1 rs2  !extern
> +
> +# Formats
> +@r         .......  ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
> +
> +# *** RV64 Custom-3 Extension ***
> +vt_maskc   0000000  ..... ..... 110 ..... 1111011 @r
> +vt_maskcn  0000000  ..... ..... 111 ..... 1111011 @r
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 1cb0436187..6df07b8289 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -734,6 +734,9 @@ static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("zbc", RISCVCPU, cfg.ext_zbc, true),
>  DEFINE_PROP_BOOL("zbs", RISCVCPU, cfg.ext_zbs, true),
>
> +/* Vendor-specific custom extensions */
> +DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
> false),
> +
>  /* These are experimental so mark with 'x-' */
>  DEFINE_PROP_BOOL("x-j", RISCVCPU, cfg.ext_j, false),
>  /* ePMP 0.9.3 */
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 1175915c0d..aacc997d56 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -329,6 +329,9 @@ struct RISCVCPUConfig {
>  bool ext_zve32f;
>  bool ext_zve64f;
>
> +/* Vendor-specific custom extensions */
> +bool ext_XVentanaCondOps;
> +
>  char *priv_spec;
>  char *user_spec;
>  char *bext_spec;
> diff --git a/target/riscv/insn_trans/trans_xventanacondops.c.inc 
> b/target/riscv/insn_trans/trans_xventanacondops.c.inc
> new file mode 100644
> index 00..b8a5d031b5
> --- /dev/null
> +++ b/target/riscv/insn_trans/trans_xventanacondops.c.inc
> @@ -0,0 +1,39 @@
> +/*
> + * RISC-V translation routines for the XVentanaCondOps extension.
> + *
> + * Copyright (c) 2021-2022 VRULL GmbH.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program.  If not, see .
> + */
> +
> +static bool gen_condmask(DisasContext *ctx, arg_r *a, TCGCond cond)

This should also have a vendor prefix

Otherwise:

Reviewed-by: Alistair Francis 

Alistair

> +{
> +TCGv dest = dest_gpr(ctx, a->rd);
> +TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
> +TCGv src2 =

Re: [PATCH v5 5/7] target/riscv: iterate over a table of decoders

2022-01-31 Thread Alistair Francis

On Mon, Jan 31, 2022 at 9:32 PM Philipp Tomsich
 wrote:
>
> To split up the decoder into multiple functions (both to support
> vendor-specific opcodes in separate files and to simplify maintenance
> of orthogonal extensions), this changes decode_op to iterate over a
> table of decoders predicated on guard functions.
>
> This commit only adds the new structure and the table, allowing for
> the easy addition of additional decoders in the future.
>
> Signed-off-by: Philipp Tomsich 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

>
> ---
>
> (no changes since v4)
>
> Changes in v4:
> - add braces to comply with coding standard (as suggested by Richard)
> - merge the two if-statements to reduce clutter (now that the
>   braces have been added)
>
> Changes in v3:
> - expose only the DisasContext* to predicate functions
> - mark the table of decoder functions as static
> - drop the inline from always_true_p, until the need arises (i.e.,
>   someone finds a use for it and calls it directly)
> - rewrite to drop the 'handled' temporary in iterating over the
>   decoder table, removing the assignment in the condition of the if
>
> Changes in v2:
> - (new patch) iterate over a table of guarded decoder functions
>
>  target/riscv/translate.c | 32 +++-
>  1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index f19d5cd0c0..30b1b68341 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -111,6 +111,11 @@ static inline bool has_ext(DisasContext *ctx, uint32_t 
> ext)
>  return ctx->misa_ext & ext;
>  }
>
> +static bool always_true_p(DisasContext *ctx  __attribute__((__unused__)))
> +{
> +return true;
> +}
> +
>  #ifdef TARGET_RISCV32
>  #define get_xl(ctx)    MXL_RV32
>  #elif defined(CONFIG_USER_ONLY)
> @@ -855,15 +860,26 @@ static uint32_t opcode_at(DisasContextBase *dcbase, 
> target_ulong pc)
>
>  static void decode_opc(CPURISCVState *env, DisasContext *ctx, uint16_t 
> opcode)
>  {
> -/* check for compressed insn */
> +/*
> + * A table with predicate (i.e., guard) functions and decoder functions
> + * that are tested in-order until a decoder matches onto the opcode.
> + */
> +static const struct {
> +bool (*guard_func)(DisasContext *);
> +bool (*decode_func)(DisasContext *, uint32_t);
> +} decoders[] = {
> +{ always_true_p,  decode_insn32 },
> +};
> +
> +/* Check for compressed insn */
>  if (extract16(opcode, 0, 2) != 3) {
>  if (!has_ext(ctx, RVC)) {
>  gen_exception_illegal(ctx);
>  } else {
>  ctx->opcode = opcode;
>  ctx->pc_succ_insn = ctx->base.pc_next + 2;
> -if (!decode_insn16(ctx, opcode)) {
> -gen_exception_illegal(ctx);
> +if (decode_insn16(ctx, opcode)) {
> +return;
>  }
>  }
>  } else {
> @@ -873,10 +889,16 @@ static void decode_opc(CPURISCVState *env, DisasContext 
> *ctx, uint16_t opcode)
>   ctx->base.pc_next + 2));
>  ctx->opcode = opcode32;
>  ctx->pc_succ_insn = ctx->base.pc_next + 4;
> -if (!decode_insn32(ctx, opcode32)) {
> -gen_exception_illegal(ctx);
> +
> +for (size_t i = 0; i < ARRAY_SIZE(decoders); ++i) {
> +if (decoders[i].guard_func(ctx) &&
> +decoders[i].decode_func(ctx, opcode32)) {
> +return;
> +}
>  }
>  }
> +
> +gen_exception_illegal(ctx);
>  }
>
>  static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState 
> *cs)
> --
> 2.33.1
>
>
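
The guarded decoder table introduced here can be sketched outside of QEMU as follows. Everything in the example is hypothetical: the context is a plain dict, the two decoders match on made-up opcode patterns, and the "vendor_ext" key stands in for a per-extension predicate. The sketch only shows the control flow: decoders are tried in order, a decoder runs only if its guard passes, and the illegal-instruction path is reached only when nothing matched.

```python
from typing import Callable, NamedTuple


class Decoder(NamedTuple):
    guard: Callable[[dict], bool]          # is this decoder enabled?
    decode: Callable[[dict, int], bool]    # returns True if it handled the opcode


def always_true(ctx: dict) -> bool:
    return True


def has_vendor_ext(ctx: dict) -> bool:
    return ctx.get("vendor_ext", False)


def decode_base(ctx: dict, opcode: int) -> bool:
    if opcode & 0x7F == 0x33:              # made-up match for the example
        ctx["decoded"] = "base"
        return True
    return False


def decode_vendor(ctx: dict, opcode: int) -> bool:
    if opcode & 0x7F == 0x7B:              # made-up match for the example
        ctx["decoded"] = "vendor"
        return True
    return False


DECODERS = [
    Decoder(always_true, decode_base),
    Decoder(has_vendor_ext, decode_vendor),
]


def decode_opc(ctx: dict, opcode: int) -> str:
    """Try each guarded decoder in order; fall through to 'illegal'."""
    for dec in DECODERS:
        if dec.guard(ctx) and dec.decode(ctx, opcode):
            return ctx["decoded"]
    return "illegal"
```

Adding support for another extension then amounts to appending one (guard, decode) pair to the table, which is the extension point the commit message describes.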

Re: [PATCH v5 7/7] target/riscv: add a MAINTAINERS entry for XVentanaCondOps

2022-01-31 Thread Alistair Francis

On Mon, Jan 31, 2022 at 9:24 PM Philipp Tomsich
 wrote:
>
> The XVentanaCondOps extension is supported by VRULL on behalf of
> Ventana Micro.  Add myself as a point-of-contact.
>
> Signed-off-by: Philipp Tomsich 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

>
> ---
>
> (no changes since v3)
>
> Changes in v3:
> - add a MAINTAINERS entry for XVentanaCondOps
>
>  MAINTAINERS | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b43344fa98..2e0b2ae947 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -286,6 +286,13 @@ F: include/hw/riscv/
>  F: linux-user/host/riscv32/
>  F: linux-user/host/riscv64/
>
> +RISC-V XVentanaCondOps extension
> +M: Philipp Tomsich 
> +L: qemu-ri...@nongnu.org
> +S: Supported
> +F: target/riscv/XVentanaCondOps.decode
> +F: target/riscv/insn_trans/trans_xventanacondops.c.inc
> +
>  RENESAS RX CPUs
>  R: Yoshinori Sato 
>  S: Orphan
> --
> 2.33.1
>
>

Re: [PATCH v5 4/7] target/riscv: access cfg structure through DisasContext

2022-01-31 Thread Alistair Francis

On Mon, Jan 31, 2022 at 9:05 PM Philipp Tomsich
 wrote:
>
> The Zb[abcs] support code still uses the RISCV_CPU macros to access
> the configuration information (i.e., check whether an extension is
> available/enabled).  Now that we provide this information directly
> from DisasContext, we can access this directly via the cfg_ptr field.
>
> Signed-off-by: Philipp Tomsich 
> Suggested-by: Richard Henderson 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

>
> ---
>
> (no changes since v3)
>
> Changes in v3:
> - (new patch) change Zb[abcs] implementation to use cfg_ptr (copied
>   into DisasContext) instead of going through RISCV_CPU
>
>  target/riscv/insn_trans/trans_rvb.c.inc | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvb.c.inc 
> b/target/riscv/insn_trans/trans_rvb.c.inc
> index 810431a1d6..f9bd3b7ec4 100644
> --- a/target/riscv/insn_trans/trans_rvb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvb.c.inc
> @@ -19,25 +19,25 @@
>   */
>
>  #define REQUIRE_ZBA(ctx) do {\
> -if (!RISCV_CPU(ctx->cs)->cfg.ext_zba) {  \
> +if (!ctx->cfg_ptr->ext_zba) {\
>  return false;\
>  }\
>  } while (0)
>
>  #define REQUIRE_ZBB(ctx) do {\
> -if (!RISCV_CPU(ctx->cs)->cfg.ext_zbb) {  \
> +if (!ctx->cfg_ptr->ext_zbb) {\
>  return false;\
>  }\
>  } while (0)
>
>  #define REQUIRE_ZBC(ctx) do {\
> -if (!RISCV_CPU(ctx->cs)->cfg.ext_zbc) {  \
> +if (!ctx->cfg_ptr->ext_zbc) {\
>  return false;\
>  }\
>  } while (0)
>
>  #define REQUIRE_ZBS(ctx) do {\
> -if (!RISCV_CPU(ctx->cs)->cfg.ext_zbs) {  \
> +if (!ctx->cfg_ptr->ext_zbs) {\
>  return false;\
>  }\
>  } while (0)
> --
> 2.33.1
>
>
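
The intent of the REQUIRE_ZB* guards is that a translation function bails out (returns false, so the instruction is treated as not decoded) when the extension is disabled, and continues when it is enabled. A minimal model of that behaviour, reading the flag through a shared configuration reference, might look like the sketch below. Config, Ctx and trans_example() are hypothetical stand-ins for RISCVCPUConfig, DisasContext and a trans_* function.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for RISCVCPUConfig."""
    ext_zba: bool = False


@dataclass
class Ctx:
    """Hypothetical stand-in for DisasContext; holds a reference, not a copy."""
    cfg_ptr: Config


def trans_example(ctx: Ctx) -> bool:
    # Equivalent of REQUIRE_ZBA: refuse to translate if Zba is disabled.
    if not ctx.cfg_ptr.ext_zba:
        return False
    # ... the actual translation would be emitted here ...
    return True
```

Because the context stores a reference to the configuration, no per-extension field has to be copied into it, at the cost of one extra indirection per check.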

Re: [PATCH v5 3/7] target/riscv: access configuration through cfg_ptr in DisasContext

2022-01-31 Thread Alistair Francis

On Mon, Jan 31, 2022 at 9:10 PM Philipp Tomsich
 wrote:
>
> The implementation in trans_{rvi,rvv,rvzfh}.c.inc accesses the shallow
> copies (in DisasContext) of some of the elements available in the
> RISCVCPUConfig structure.  This commit redirects accesses to use the
> cfg_ptr copied into DisasContext and removes the shallow copies.
>
> Signed-off-by: Philipp Tomsich 
> Suggested-by: Richard Henderson 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

>
> ---
>
> (no changes since v3)
>
> Changes in v3:
> - (new patch) test extension-availability through cfg_ptr in
>   DisasContext, removing the fields that have been copied into
>   DisasContext directly
>
>  target/riscv/insn_trans/trans_rvi.c.inc   |   2 +-
>  target/riscv/insn_trans/trans_rvv.c.inc   | 104 +++---
>  target/riscv/insn_trans/trans_rvzfh.c.inc |   4 +-
>  target/riscv/translate.c  |  14 ---
>  4 files changed, 55 insertions(+), 69 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> b/target/riscv/insn_trans/trans_rvi.c.inc
> index 3cd1b3f877..f1342f30f8 100644
> --- a/target/riscv/insn_trans/trans_rvi.c.inc
> +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> @@ -806,7 +806,7 @@ static bool trans_fence(DisasContext *ctx, arg_fence *a)
>
>  static bool trans_fence_i(DisasContext *ctx, arg_fence_i *a)
>  {
> -if (!ctx->ext_ifencei) {
> +if (!ctx->cfg_ptr->ext_ifencei) {
>  return false;
>  }
>
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
> b/target/riscv/insn_trans/trans_rvv.c.inc
> index f85a9e83b4..ff09e345ad 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -74,7 +74,7 @@ static bool require_zve32f(DisasContext *s)
>  }
>
>  /* Zve32f doesn't support FP64. (Section 18.2) */
> -return s->ext_zve32f ? s->sew <= MO_32 : true;
> +return s->cfg_ptr->ext_zve32f ? s->sew <= MO_32 : true;
>  }
>
>  static bool require_scale_zve32f(DisasContext *s)
> @@ -85,7 +85,7 @@ static bool require_scale_zve32f(DisasContext *s)
>  }
>
>  /* Zve32f doesn't support FP64. (Section 18.2) */
> -return s->ext_zve64f ? s->sew <= MO_16 : true;
> +return s->cfg_ptr->ext_zve64f ? s->sew <= MO_16 : true;
>  }
>
>  static bool require_zve64f(DisasContext *s)
> @@ -96,7 +96,7 @@ static bool require_zve64f(DisasContext *s)
>  }
>
>  /* Zve64f doesn't support FP64. (Section 18.2) */
> -return s->ext_zve64f ? s->sew <= MO_32 : true;
> +return s->cfg_ptr->ext_zve64f ? s->sew <= MO_32 : true;
>  }
>
>  static bool require_scale_zve64f(DisasContext *s)
> @@ -107,7 +107,7 @@ static bool require_scale_zve64f(DisasContext *s)
>  }
>
>  /* Zve64f doesn't support FP64. (Section 18.2) */
> -return s->ext_zve64f ? s->sew <= MO_16 : true;
> +return s->cfg_ptr->ext_zve64f ? s->sew <= MO_16 : true;
>  }
>
>  /* Destination vector register group cannot overlap source mask register. */
> @@ -174,7 +174,7 @@ static bool do_vsetvl(DisasContext *s, int rd, int rs1, 
> TCGv s2)
>  TCGv s1, dst;
>
>  if (!require_rvv(s) ||
> -!(has_ext(s, RVV) || s->ext_zve32f || s->ext_zve64f)) {
> +!(has_ext(s, RVV) || s->cfg_ptr->ext_zve32f || 
> s->cfg_ptr->ext_zve64f)) {
>  return false;
>  }
>
> @@ -210,7 +210,7 @@ static bool do_vsetivli(DisasContext *s, int rd, TCGv s1, 
> TCGv s2)
>  TCGv dst;
>
>  if (!require_rvv(s) ||
> -!(has_ext(s, RVV) || s->ext_zve32f || s->ext_zve64f)) {
> +!(has_ext(s, RVV) || s->cfg_ptr->ext_zve32f || 
> s->cfg_ptr->ext_zve64f)) {
>  return false;
>  }
>
> @@ -248,7 +248,7 @@ static bool trans_vsetivli(DisasContext *s, arg_vsetivli 
> *a)
>  /* vector register offset from env */
>  static uint32_t vreg_ofs(DisasContext *s, int reg)
>  {
> -return offsetof(CPURISCVState, vreg) + reg * s->vlen / 8;
> +return offsetof(CPURISCVState, vreg) + reg * s->cfg_ptr->vlen / 8;
>  }
>
>  /* check functions */
> @@ -318,7 +318,7 @@ static bool vext_check_st_index(DisasContext *s, int vd, 
> int vs2, int nf,
>   * when XLEN=32. (Section 18.2)
>   */
>  if (get_xl(s) == MXL_RV32) {
> -ret &= (!has_ext(s, RVV) && s->ext_zve64f ? eew != MO_64 : true);
> +ret &= (!has_ext(s, RVV) && s->cfg_ptr->ext_zve64f ? eew != MO_64 : 
> true);
>  }
>
>  return ret;
> @@ -454,7 +454,7 @@ static bool vext_wide_check_common(DisasContext *s, int 
> vd, int vm)
>  {
>  return (s->lmul <= 2) &&
> (s->sew < MO_64) &&
> -   ((s->sew + 1) <= (s->elen >> 4)) &&
> +   ((s->sew + 1) <= (s->cfg_ptr->elen >> 4)) &&
> require_align(vd, s->lmul + 1) &&
> require_vm(vm, vd);
>  }
> @@ -482,7 +482,7 @@ static bool vext_narrow_check_common(DisasContext *s, int 
> vd, int vs2,
>  {
>  return (s->lmul <= 2) &&
> (s->sew < MO_64) &&
> -   ((s->sew + 1) <= (s->elen >> 4)) &&
> +   ((s->sew + 1) <=

Re: [PATCH v5 2/7] target/riscv: riscv_tr_init_disas_context: copy pointer-to-cfg into cfg_ptr

2022-01-31 Thread Alistair Francis

On Mon, Jan 31, 2022 at 9:03 PM Philipp Tomsich
 wrote:
>
> As the number of extensions is growing, copying them individually
> into the DisasContext will scale less and less... instead we populate
> a pointer to the RISCVCPUConfig structure in the DisasContext.
>
> This adds an extra indirection when checking for the availability of
> an extension (compared to copying the fields into DisasContext).
> While not a performance problem today, we can always (shallow) copy
> the entire structure into the DisasContext (instead of putting a
> pointer to it) if this is ever deemed necessary.
>
> Signed-off-by: Philipp Tomsich 
> Suggested-by: Richard Henderson 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

>
> ---
>
> Changes in v5:
> - use the typedef in DisasContext instead of the naked struct
>   for RISCVCPUConfig
>
> Changes in v3:
> - (new patch) copy pointer to element cfg into DisasContext
>
>  target/riscv/translate.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index f0bbe80875..49e40735ce 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -76,6 +76,7 @@ typedef struct DisasContext {
>  int frm;
>  RISCVMXL ol;
>  bool virt_enabled;
> +const RISCVCPUConfig *cfg_ptr;
>  bool ext_ifencei;
>  bool ext_zfh;
>  bool ext_zfhmin;
> @@ -908,6 +909,7 @@ static void riscv_tr_init_disas_context(DisasContextBase 
> *dcbase, CPUState *cs)
>  #endif
>  ctx->misa_ext = env->misa_ext;
>  ctx->frm = -1;  /* unknown rounding mode */
> +ctx->cfg_ptr = &(cpu->cfg);
>  ctx->ext_ifencei = cpu->cfg.ext_ifencei;
>  ctx->ext_zfh = cpu->cfg.ext_zfh;
>  ctx->ext_zfhmin = cpu->cfg.ext_zfhmin;
> --
> 2.33.1
>
>

Re: [PATCH v5 1/7] target/riscv: refactor (anonymous struct) RISCVCPU.cfg into 'struct RISCVCPUConfig'

2022-01-31 Thread Alistair Francis

On Mon, Jan 31, 2022 at 9:03 PM Philipp Tomsich
 wrote:
>
> Signed-off-by: Philipp Tomsich 
> Suggested-by: Richard Henderson 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

>
> ---
>
> (no changes since v4)
>
> Changes in v4:
> - use a typedef into 'RISCVCPUConfig' (instead of the explicit
>   'struct RISCVCPUConfig') to comply with the coding standard
>   (as suggested in Richard's review of v3)
>
> Changes in v3:
> - (new patch) refactor 'struct RISCVCPUConfig'
>
>  target/riscv/cpu.h | 78 --
>  1 file changed, 41 insertions(+), 37 deletions(-)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 55635d68d5..1175915c0d 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -303,6 +303,46 @@ struct RISCVCPUClass {
>  DeviceReset parent_reset;
>  };
>
> +struct RISCVCPUConfig {
> +bool ext_i;
> +bool ext_e;
> +bool ext_g;
> +bool ext_m;
> +bool ext_a;
> +bool ext_f;
> +bool ext_d;
> +bool ext_c;
> +bool ext_s;
> +bool ext_u;
> +bool ext_h;
> +bool ext_j;
> +bool ext_v;
> +bool ext_zba;
> +bool ext_zbb;
> +bool ext_zbc;
> +bool ext_zbs;
> +bool ext_counters;
> +bool ext_ifencei;
> +bool ext_icsr;
> +bool ext_zfh;
> +bool ext_zfhmin;
> +bool ext_zve32f;
> +bool ext_zve64f;
> +
> +char *priv_spec;
> +char *user_spec;
> +char *bext_spec;
> +char *vext_spec;
> +uint16_t vlen;
> +uint16_t elen;
> +bool mmu;
> +bool pmp;
> +bool epmp;
> +uint64_t resetvec;
> +};
> +
> +typedef struct RISCVCPUConfig RISCVCPUConfig;
> +
>  /**
>   * RISCVCPU:
>   * @env: #CPURISCVState
> @@ -320,43 +360,7 @@ struct RISCVCPU {
>  char *dyn_vreg_xml;
>
>  /* Configuration Settings */
> -struct {
> -bool ext_i;
> -bool ext_e;
> -bool ext_g;
> -bool ext_m;
> -bool ext_a;
> -bool ext_f;
> -bool ext_d;
> -bool ext_c;
> -bool ext_s;
> -bool ext_u;
> -bool ext_h;
> -bool ext_j;
> -bool ext_v;
> -bool ext_zba;
> -bool ext_zbb;
> -bool ext_zbc;
> -bool ext_zbs;
> -bool ext_counters;
> -bool ext_ifencei;
> -bool ext_icsr;
> -bool ext_zfh;
> -bool ext_zfhmin;
> -bool ext_zve32f;
> -bool ext_zve64f;
> -
> -char *priv_spec;
> -char *user_spec;
> -char *bext_spec;
> -char *vext_spec;
> -uint16_t vlen;
> -uint16_t elen;
> -bool mmu;
> -bool pmp;
> -bool epmp;
> -uint64_t resetvec;
> -} cfg;
> +RISCVCPUConfig cfg;
>  };
>
>  static inline int riscv_has_ext(CPURISCVState *env, target_ulong ext)
> --
> 2.33.1
>
>

Re: [RFC PATCH] hw/intc: Make RISC-V ACLINT mtime MMIO register writable

2022-01-31 Thread Alistair Francis

On Wed, Jan 26, 2022 at 7:55 PM  wrote:
>
> From: Frank Chang 
>
> RISC-V privilege spec defines that mtime is exposed as a memory-mapped
> machine-mode read-write register. However, as QEMU uses host monotonic
> timer as timer source, this makes mtime to be read-only in RISC-V
> ACLINT.
>
> This patch makes mtime to be writable by recording the time delta value
> between the mtime value to be written and the timer value at the time
> mtime is written. Time delta value is then added back whenever the timer
> value is retrieved.
>
> Signed-off-by: Frank Chang 
> ---
>  hw/intc/riscv_aclint.c | 58 ++
>  include/hw/intc/riscv_aclint.h |  1 +
>  target/riscv/cpu.h |  8 ++---
>  target/riscv/cpu_helper.c  |  4 +--
>  4 files changed, 44 insertions(+), 27 deletions(-)
>
> diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
> index f1a5d3d284..ffbe211a3d 100644
> --- a/hw/intc/riscv_aclint.c
> +++ b/hw/intc/riscv_aclint.c
> @@ -38,12 +38,18 @@ typedef struct riscv_aclint_mtimer_callback {
>  int num;
>  } riscv_aclint_mtimer_callback;
>
> -static uint64_t cpu_riscv_read_rtc(uint32_t timebase_freq)
> +static uint64_t cpu_riscv_read_rtc_raw(uint32_t timebase_freq)
>  {
>  return muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
>  timebase_freq, NANOSECONDS_PER_SECOND);
>  }
>
> +static uint64_t cpu_riscv_read_rtc(void *opaque)
> +{
> +RISCVAclintMTimerState *mtimer = opaque;
> +return cpu_riscv_read_rtc_raw(mtimer->timebase_freq) + mtimer->time_delta;
> +}
> +
>  /*
>   * Called when timecmp is written to update the QEMU timer or immediately
>   * trigger timer interrupt if mtimecmp <= current timer value.
> @@ -51,13 +57,13 @@ static uint64_t cpu_riscv_read_rtc(uint32_t timebase_freq)
>  static void riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
>RISCVCPU *cpu,
>int hartid,
> -  uint64_t value,
> -  uint32_t timebase_freq)
> +  uint64_t value)
>  {
> +uint32_t timebase_freq = mtimer->timebase_freq;
>  uint64_t next;
>  uint64_t diff;
>
> -uint64_t rtc_r = cpu_riscv_read_rtc(timebase_freq);
> +uint64_t rtc_r = cpu_riscv_read_rtc(mtimer);
>
>  cpu->env.timecmp = value;
>  if (cpu->env.timecmp <= rtc_r) {
> @@ -140,10 +146,10 @@ static uint64_t riscv_aclint_mtimer_read(void *opaque, hwaddr addr,
>  }
>  } else if (addr == mtimer->time_base) {
>  /* time_lo */
> -return cpu_riscv_read_rtc(mtimer->timebase_freq) & 0xFFFFFFFF;
> +return cpu_riscv_read_rtc(mtimer) & 0xFFFFFFFF;
>  } else if (addr == mtimer->time_base + 4) {
>  /* time_hi */
> -return (cpu_riscv_read_rtc(mtimer->timebase_freq) >> 32) & 0xFFFFFFFF;
> +return (cpu_riscv_read_rtc(mtimer) >> 32) & 0xFFFFFFFF;
>  }
>
>  qemu_log_mask(LOG_UNIMP,
> @@ -156,6 +162,7 @@ static void riscv_aclint_mtimer_write(void *opaque, 
> hwaddr addr,
>  uint64_t value, unsigned size)
>  {
>  RISCVAclintMTimerState *mtimer = opaque;
> +int i;
>
>  if (addr >= mtimer->timecmp_base &&
>  addr < (mtimer->timecmp_base + (mtimer->num_harts << 3))) {
> @@ -170,31 +177,40 @@ static void riscv_aclint_mtimer_write(void *opaque, 
> hwaddr addr,
>  /* timecmp_lo */
>  uint64_t timecmp_hi = env->timecmp >> 32;
>  riscv_aclint_mtimer_write_timecmp(mtimer, RISCV_CPU(cpu), hartid,
> -timecmp_hi << 32 | (value & 0xFFFFFFFF),
> -mtimer->timebase_freq);
> +timecmp_hi << 32 | (value & 0xFFFFFFFF));
>  return;
>  } else if ((addr & 0x7) == 4) {
>  /* timecmp_hi */
>  uint64_t timecmp_lo = env->timecmp;
>  riscv_aclint_mtimer_write_timecmp(mtimer, RISCV_CPU(cpu), hartid,
> -value << 32 | (timecmp_lo & 0xFFFFFFFF),
> -mtimer->timebase_freq);
> +value << 32 | (timecmp_lo & 0xFFFFFFFF));
>  } else {
>  qemu_log_mask(LOG_UNIMP,
>"aclint-mtimer: invalid timecmp write: %08x",
>(uint32_t)addr);
>  }
>  return;
> -} else if (addr == mtimer->time_base) {
> -/* time_lo */
> -qemu_log_mask(LOG_UNIMP,
> -  "aclint-mtimer: time_lo write not implemented");
> -return;
> -} else if (addr == mtimer->time_base + 4) {
> -/* time_hi */
> -qemu_log_mask(LOG_UNIMP,
> -  "aclint-mtimer: time_hi write not implemented");
> -return;
> +} else if (addr == mtimer->time_base || addr == mtimer->time_base + 4) {
> +uint64_t rtc_r = cpu_riscv_read_rtc_raw(mtimer->timebase_freq);
> +
> +
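
The quoted diff is cut off above, so the write path is not visible here. The
idea from the commit message (record a delta at write time, add it back on
every read) can still be sketched in isolation. This is a minimal sketch under
assumptions, not the patch itself: DemoMTimer and the demo_* functions are
hypothetical, the raw clock value is passed in as a parameter instead of being
read from QEMU_CLOCK_VIRTUAL, and the handling of the two 32-bit halves is my
guess at one reasonable approach.

```c
#include <stdint.h>

/* Hypothetical timer state: only the offset against the raw clock. */
typedef struct DemoMTimer {
    uint64_t time_delta;
} DemoMTimer;

/* Guest-visible mtime: raw clock plus the recorded offset. */
static uint64_t demo_read_time(const DemoMTimer *t, uint64_t raw_now)
{
    return raw_now + t->time_delta;
}

/*
 * Full 64-bit write: remember how far the requested value is from the
 * raw clock. Unsigned wraparound makes this work when value < raw_now.
 */
static void demo_write_time(DemoMTimer *t, uint64_t raw_now, uint64_t value)
{
    t->time_delta = value - raw_now;
}

/* Write only the low 32 bits, keeping the current high half. */
static void demo_write_time_lo(DemoMTimer *t, uint64_t raw_now, uint32_t lo)
{
    uint64_t cur = demo_read_time(t, raw_now);
    demo_write_time(t, raw_now, (cur & 0xFFFFFFFF00000000ULL) | lo);
}

/* Write only the high 32 bits, keeping the current low half. */
static void demo_write_time_hi(DemoMTimer *t, uint64_t raw_now, uint32_t hi)
{
    uint64_t cur = demo_read_time(t, raw_now);
    demo_write_time(t, raw_now, ((uint64_t)hi << 32) | (cur & 0xFFFFFFFFULL));
}
```

Because the delta is applied on every read, any timecmp comparison that uses
the guest-visible time would need to be re-evaluated after a write; the sketch
leaves that part out.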

Re: [PULL 53/61] target/riscv: Split out the vill from vtype

2022-01-31 Thread Alistair Francis

On Sat, Jan 29, 2022 at 2:10 AM Peter Maydell  wrote:
>
> On Fri, 21 Jan 2022 at 09:42, Alistair Francis
>  wrote:
> >
> > From: LIU Zhiwei 
> >
> > We need not specially process vtype when XLEN changes.
> >
> > Signed-off-by: LIU Zhiwei 
> > Reviewed-by: Richard Henderson 
> > Reviewed-by: Alistair Francis 
> > Message-id: 20220120122050.41546-16-zhiwei_...@c-sky.com
> > Signed-off-by: Alistair Francis 
>
> Odd thing I noticed looking at this code: as far as I can see we
> may set env->vill to 1 in the vsetvl helper, but there is nowhere
> that we set it to 0, so once it transitions to 1 it's stuck there
> until the system is reset. Is this really right?

This is really confusing. It implies that you can't set vill from
software, but that just seems to be confusing wording.

Reading 
https://lists.riscv.org/g/tech-vector-ext/topic/reliably_set_vtype_vill/86745728
it seems that this is a QEMU bug and the guest should be able to set
the bit as part of vsetvl

@LIU Zhiwei are you able to fix this up?


Alistair
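
To make the behaviour under discussion concrete, here is a small sketch of a
vsetvl-style helper that recomputes vill on every call, so the flag is cleared
again once a valid vtype is requested. It is an illustration only, not the
QEMU helper: DemoVState, demo_vtype_is_valid() and demo_vsetvl() are
hypothetical, and the validity check (any bit above bit 7 counts as reserved)
is a placeholder for the real rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector state: vill kept separately from vtype. */
typedef struct DemoVState {
    uint64_t vtype;
    bool vill;
} DemoVState;

/* Placeholder check (assumption): bits above bit 7 are treated as reserved. */
static bool demo_vtype_is_valid(uint64_t vtype)
{
    return (vtype >> 8) == 0;
}

/* Recompute vill on every call instead of only ever setting it. */
static void demo_vsetvl(DemoVState *s, uint64_t new_vtype)
{
    if (!demo_vtype_is_valid(new_vtype)) {
        /* Unsupported request: flag it and clear the other vtype bits. */
        s->vill = true;
        s->vtype = 0;
        return;
    }
    /* Supported request: vill must go back to 0 here. */
    s->vill = false;
    s->vtype = new_vtype;
}
```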

Re: [PATCH] MAINTAINERS: Adding myself as a reviewer of some components

2022-01-31 Thread Philippe Mathieu-Daudé via


On 31/1/22 13:20, Ani Sinha wrote:

Added myself as a reviewer of vmgenid, unimplemented device and empty slot.

Signed-off-by: Ani Sinha 
---
  MAINTAINERS | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b43344fa98..fed31a5eb5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2182,6 +2182,7 @@ F: tests/qtest/prom-env-test.c
  
  VM Generation ID

  S: Orphan
+R: Ani Sinha 
  F: hw/acpi/vmgenid.c
  F: include/hw/acpi/vmgenid.h
  F: docs/specs/vmgenid.txt
@@ -2197,6 +2198,7 @@ F: hw/misc/led.c
  Unimplemented device
  M: Peter Maydell 
  R: Philippe Mathieu-Daudé 
+R: Ani Sinha 
  S: Maintained
  F: include/hw/misc/unimp.h
  F: hw/misc/unimp.c
@@ -2204,6 +2206,7 @@ F: hw/misc/unimp.c
  Empty slot
  M: Artyom Tarasenko 
  R: Philippe Mathieu-Daudé 
+R: Ani Sinha 
  S: Maintained
  F: include/hw/misc/empty_slot.h
  F: hw/misc/empty_slot.c


Don't expect much activity in unimp/empty_slot ;)

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] block.h: remove outdated comment

2022-01-31 Thread Philippe Mathieu-Daudé via


On 31/1/22 13:56, Emanuele Giuseppe Esposito wrote:

The comment "disk I/O throttling" doesn't make any sense at all
any more. It was added in commit 0563e191516 to describe
bdrv_io_limits_enable()/disable(), which were removed in commit
97148076, so the comment is just a forgotten leftover.

Suggested-by: Kevin Wolf 
Signed-off-by: Emanuele Giuseppe Esposito 
---
  include/block/block.h | 1 -
  1 file changed, 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [RFC PATCH 1/1] i386: Remove features from Epyc-Milan cpu

2022-01-31 Thread Babu Moger



On 1/31/22 14:18, Leonardo Bras Soares Passos wrote:
> On Mon, Jan 31, 2022 at 3:04 PM Daniel P. Berrangé  
> wrote:
>> On Mon, Jan 31, 2022 at 02:56:38PM -0300, Leonardo Bras Soares Passos wrote:
>>> Hello Daniel,
>>>
>>> On Mon, Jan 31, 2022 at 6:08 AM Daniel P. Berrangé  
>>> wrote:
 CC'ing  Babu Moger who added the Milan CPU model.

 On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:
> While trying to bring a VM with EPYC-Milan cpu on a host with
> EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
>
> qemu-system-x86_64: warning: host doesn't support requested feature: 
> CPUID.07H:EBX.erms [bit 9]
> qemu-system-x86_64: warning: host doesn't support requested feature: 
> CPUID.07H:EDX.fsrm [bit 4]
>
> Even with this warning, the host goes up.
>
> Then, grep'ing cpuid output on both guest and host, outputs:
>
> extended feature flags (7):
>   enhanced REP MOVSB/STOSB = false
>   fast short REP MOV   = false
>   (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
>brand = "AMD EPYC 7313 16-Core Processor   "
>
> This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> not have the above feature bits set, which is usually not a good idea for
> live migration:
> Migrating from a host with these features to a host without them can
> be troublesome for the guest.
>
> Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> avoid possible after-migration guest issues.
 Babu,  can you give some insight into availability of erms / fsrm
 features across the EPYC 3rd gen CPU line. Is this example missing
 erms/fsrm an exception, or common place ?

AMD supports fsrm and erms in EPYC 3rd gen CPUs. But there is some
inconsistency in enabling these features in the BIOS. Some BIOSes enable
them automatically and some don't. But there is a BIOS option
(in ADVANCED -> AMD CBS) to enable/disable them manually. We are working
internally to settle on a strategy for these features going forward. We
will update the code once that is decided.

We know it is causing a little bit of annoyance to users. But as far as
we know it should not cause migration issues, as already discussed.
thanks



> Signed-off-by: Leonardo Bras 
> ---
>
> Does this make sense? Or maybe I am missing something here.
>
> Having a kvm guest running with a feature bit, while the host
> does not support it, seems like it could break the guest.
 The guest won't see the feature bit - that warning message from QEMU
 is telling you that it didn't honour the request to expose
 erms / fsrm - it has dropped them from the CPUID exposed to the guest.
>>> Exactly.
>>> What I meant here is:
>>> 1 - Host with these feature bits start a VM with EPYC-Milan cpu (and
>>> thus have those bits enabled)
>>> 2 - Guest is migrated to a host such as the above, which does not
>>> support those features (bits disabled), but does support EPYC-Milan
>>> cpus (without those features).
>>> 3 - The migration should be allowed, given the same cpu types. Then
>>> either we have:
>>> 3a : The guest vcpu stays with the flag enabled (case I tried to
>>> explain above), possibly crashing if the new feature is used, or
>>> 3b: The guest vcpu disables the flag due to incompatibility, which
>>> may confuse the guest due to the cpu change, and it may even end up
>>> trying to use the new feature, even though it's disabled.
>> Neither should happen with a correctly written mgmt app in charge.
>>
>> When launching a QEMU process for an incoming migration, it is expected
>> that the mgmt app has first queried QEMU on the source to get the precise
>> CPU model + flags that were added/removed on the source. The QEMU on
>> the target is then launched with this exact set of flags, and the
>> 'check' flag is also set for -cpu. That will cause QEMU on the target
>> to refuse to start unless it can give the guest the 100% identical
>> CPUID to what has been requested on the CLI, and thus matching the
>> source.
>>
>> Libvirt will ensure all this is done correctly. If not using libvirt
>> then you've got a bunch of work to do to achieve this. It certainly
>> isn't sufficient to merely use the same plain '-cpu' arg that the
>> source was originally booted with, unless you have 100% identical
>> hardware, microcode, and software on both hosts, or the target host
>> offers a superset of features.
> Oh, that is very interesting! Thanks for sharing!
>
> Well, then at least one unexpected scenario should happen:
> - VM with EPYC-Milan cpu, created in source host
> - Source host with EPYC-Milan cpu. Support for 'extra features'
> enabled ( erms / fsrm in this ex.)
> - Target host with EPYC-Milan cpu. No support for 'extra features'.
> Since the VM will be created with support for 'extra features',

Re: [PATCH v2] Use long endian options for ppc64

2022-01-31 Thread Philippe Mathieu-Daudé via


On 31/1/22 10:17, Miroslav Rezanina wrote:

GCC options pairs -mlittle/-mlittle-endian and -mbig/-mbig-endian are
equivalent on ppc64 architecture. However, Clang supports only long
version of the options.

Use longer form in configure to properly support both GCC and Clang
compiler. In addition, fix this issue in tcg test configure.

Signed-off-by: Miroslav Rezanina 

---
This is v2 of configure: Use -mlittle-endian instead of -mlittle for ppc64.

v2:
  - handle both -mlittle and -mbig usage
  - fix tests/tcg/configure.sh
---
  configure  | 4 ++--
  tests/tcg/configure.sh | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 3/6] hppa: Add support for an emulated TOC/NMI button.

2022-01-31 Thread Philippe Mathieu-Daudé via


On 31/1/22 22:35, Helge Deller wrote:

Almost all PA-RISC machines have either a button that is labeled with 'TOC' or
a BMC/GSP function to trigger a TOC.  TOC is a non-maskable interrupt that is
sent to the processor.  This can be used for diagnostic purposes like obtaining
a stack trace/register dump or to enter KDB/KGDB in Linux.

This patch adds support for such an emulated TOC button.

It wires up the qemu monitor "nmi" command to trigger a TOC.  For that it


s/qemu/QEMU/ (few others).


provides the hppa_nmi function which is assigned to the nmi_monitor_handler
function pointer.  When called it raises the EXCP_TOC hardware interrupt in the
hppa_cpu_do_interrupt() function.  The interrupt function then calls the
architecturally defined TOC function in SeaBIOS-hppa firmware (at fixed address
0xf000).

According to the PA-RISC PDC specification, the SeaBIOS firmware then writes
the CPU registers into PIM (processor internal memmory) for later analysis.  In


Typo "memory".


order to write all registers it needs to know the contents of the CPU "shadow
registers" and the IASQ- and IAOQ-back values. The IAOQ/IASQ values are
provided by qemu in shadow registers when entering the SeaBIOS TOC function.
This patch adds a new aritificial opcode "getshadowregs" (0xfffdead2) which


Typo "artificial".


restores the original values of the shadow registers. With this opcode SeaBIOS
can store those registers as well into PIM before calling an OS-provided TOC
handler.

To trigger a TOC, switch to the qemu monitor with Ctrl-A C, and type in the
command "nmi".  After the TOC started the OS-debugger, exit the qemu monitor
with Ctrl-A C.

Signed-off-by: Helge Deller 
---
  hw/hppa/machine.c| 35 ++-
  target/hppa/cpu.c|  2 +-
  target/hppa/cpu.h|  5 +
  target/hppa/helper.h |  1 +
  target/hppa/insns.decode |  1 +
  target/hppa/int_helper.c | 19 ++-
  target/hppa/op_helper.c  |  7 ++-
  target/hppa/translate.c  | 10 ++
  8 files changed, 76 insertions(+), 4 deletions(-)
+static const TypeInfo machine_hppa_machine_init_typeinfo = {
+.name = ("hppa" "-machine"),


   .name = MACHINE_TYPE_NAME("hppa"),


+.parent = "machine",
+.class_init = machine_hppa_machine_init_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_NMI },
+{ }
+},
+};



diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 0a629ffa7c..fe8a9ce493 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -80,6 +80,7 @@ DEF_HELPER_FLAGS_0(read_interval_timer, TCG_CALL_NO_RWG, tr)
  #ifndef CONFIG_USER_ONLY
  DEF_HELPER_1(halt, noreturn, env)
  DEF_HELPER_1(reset, noreturn, env)
+DEF_HELPER_1(getshadowregs, void, env)
  DEF_HELPER_1(rfi, void, env)
  DEF_HELPER_1(rfi_r, void, env)
  DEF_HELPER_FLAGS_2(write_interval_timer, TCG_CALL_NO_RWG, void, env, tr)
diff --git a/target/hppa/insns.decode b/target/hppa/insns.decode
index d4eefc0d48..c7a7e997f9 100644
--- a/target/hppa/insns.decode
+++ b/target/hppa/insns.decode
@@ -111,6 +111,7 @@ rfi_r   00 - - --- 01100101 0
  # They are allocated from the unassigned instruction space.
  halt   1101 1110 1010 1101 
  reset      1101 1110 1010 1101 0001
+getshadowregs      1101 1110 1010 1101 0010




diff --git a/target/hppa/op_helper.c b/target/hppa/op_helper.c
index 1b86557d5d..b0dec4ebf4 100644
--- a/target/hppa/op_helper.c
+++ b/target/hppa/op_helper.c
@@ -694,7 +694,7 @@ void HELPER(rfi)(CPUHPPAState *env)
  cpu_hppa_put_psw(env, env->cr[CR_IPSW]);
  }

-void HELPER(rfi_r)(CPUHPPAState *env)
+void HELPER(getshadowregs)(CPUHPPAState *env)
  {
  env->gr[1] = env->shadow[0];
  env->gr[8] = env->shadow[1];
@@ -703,6 +703,11 @@ void HELPER(rfi_r)(CPUHPPAState *env)
  env->gr[17] = env->shadow[4];
  env->gr[24] = env->shadow[5];
  env->gr[25] = env->shadow[6];
+}
+
+void HELPER(rfi_r)(CPUHPPAState *env)
+{
+helper_getshadowregs(env);
  helper_rfi(env);
  }
  #endif
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index c6195590f8..5c0b1eb274 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2393,6 +2393,16 @@ static bool trans_reset(DisasContext *ctx, arg_reset *a)
  #endif
  }

+static bool trans_getshadowregs(DisasContext *ctx, arg_getshadowregs *a)
+{
+CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
+#ifndef CONFIG_USER_ONLY
+nullify_over(ctx);
+gen_helper_getshadowregs(cpu_env);
+return nullify_end(ctx);
+#endif
+}


Why not add getshadowregs opcode in a preliminary patch? That would be
easier to review.

Preferably split and using MACHINE_TYPE_NAME():
Reviewed-by: Philippe Mathieu-Daudé
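
The MACHINE_TYPE_NAME() suggestion above relies on adjacent string literals
being concatenated by the compiler, which is why the open-coded
("hppa" "-machine") and the macro form produce the same type name. A minimal
standalone sketch follows; the macros are redefined locally under DEMO_ names
so that it compiles on its own, and their resemblance to the definitions in
QEMU's include/hw/boards.h is an assumption to check against the tree.

```c
#include <string.h>

/* Local stand-ins (assumption) for TYPE_MACHINE_SUFFIX / MACHINE_TYPE_NAME. */
#define DEMO_TYPE_MACHINE_SUFFIX "-machine"
#define DEMO_MACHINE_TYPE_NAME(machinename) \
    (machinename DEMO_TYPE_MACHINE_SUFFIX)

/* Open-coded form, as written in the patch under review. */
static const char *demo_open_coded_name(void)
{
    return ("hppa" "-machine");
}

/* Macro form, as suggested in the review. */
static const char *demo_macro_name(void)
{
    return DEMO_MACHINE_TYPE_NAME("hppa");
}
```

Both functions return a string with the contents "hppa-machine"; the macro
form simply keeps the suffix in one place.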

Re: [PATCH v3 2/3] memory: Make memory_region_is_mapped() succeed when mapped via an alias

2022-01-31 Thread Philippe Mathieu-Daudé via

On 31/1/22 20:20, Niek Linnenbank wrote:

Hi Philippe,

On Mon, Jan 31, 2022 at 12:29 AM Philippe Mathieu-Daudé wrote:

Hi Niek!

(+Mark FYI)

On 30/1/22 23:50, Niek Linnenbank wrote:
 > Hi David,
 >
 > While I realize my response is quite late, I wanted to report
this error
 > I found when running the acceptance
 > tests for the orangepi-pc machine using avocado:

Unfortunately I only run the full SD/MMC tests when I send a SD/MMC pull
request, so missed that here.

I understand. These tests are behind the AVOCADO_ALLOW_LARGE_STORAGE 
flag in avocado, so I guess they

don't run on gitlab as well, but I'm not sure about that.

Indeed they don't run on GitLab due to that flag, but I run them locally
before sending a SD/MMC pull request (along with older images that are
not available anymore on the internet but are still in my local Avocado
cache).

 > Basically the two tests freeze during the part where the U-Boot
 > bootloader needs to detect the amount of memory. We model this in
the
 > hw/misc/allwinner-h3-dramc.c file.
 > And when running the machine manually it shows an assert on
 > 'alias->mapped_via_alias >= 0'. When running manually via gdb, I was
 > able to collect this backtrace:
 >
 > $ gdb ./build/qemu-system-arm
 > ...
 > gdb) run -M orangepi-pc -nographic
 > ./Armbian_20.08.1_Orangepipc_bionic_current_5.8.5.img
 > ...
 > U-Boot SPL 2020.04-armbian (Sep 02 2020 - 10:16:13 +0200)
 > DRAM:
 > qemu-system-arm: ../softmmu/memory.c:2588:
memory_region_del_subregion:
 > Assertion `alias->mapped_via_alias >= 0' failed.
...

 > So it seems that the hw/misc/allwinner-h3-dramc.c file is using
the call
 > memory_region_set_address, where internally we are calling
 > memory_region_del_subregion.
 > The allwinner-h3-dramc.c file does use
 > memory_region_add_subregion_overlap once in the realize function,
but
 > might use the memory_region_set_address multiple times.
 > It looks to me this is the path where the assert comes in. If I
revert
 > this patch on current master, the machine boots without the
assertion.
 >
 > Would you be able to help out how we can best resolve this? Of course, if
 > there is anything needed to be changed on the allwinner-h3-dramc.c file,
 > I would be happy to prepare a patch for that.

David's patch LGTM and I think your model might be somehow abusing the
memory API, but I'd like to read on the DRAMCOM Control Register to
understand the allwinner_h3_dramc_map_rows() logic. I couldn't find a
reference looking at Allwinner_H3_Datasheet_V1.2.pdf.
I wonder if we could ignore implementing it.

Yes David's fix using memory_region_add_subregion_common inside 
memory_region_readd_subregion resolves the issue indeed.

Great.

Well the allwinner-h3-dramc.c module works OK for now, but it can 
certainly use improvements indeed.
And you're right, unfortunately the DRAMCOM device isn't documented in 
the datasheet as far as I know.

OK :/

Your use case is typically what I tried to solve with this model:
https://lore.kernel.org/qemu-devel/20210419094329.1402767-2-f4...@amsat.org/

In your case, @span_size is your amount of DRAM, and @region_size is the
area u-boot is scanning (and @offset is zero).
Could that work, or is DRAMCOM doing much more?

The current model in allwinner-h3-dramc.c is roughly based on the code 
that is present in U-Boot in the file arch/arm/mach-sunxi/dram_sunxi_dw.c.
It implements the low-level initialization of the memory controller, and 
when running using QEMU the most important thing it needs to do is
detect the amount of memory. If it cannot accomplish this task, the 
U-Boot SPL won't boot properly or crash later. So what we have in
the allwinner-h3-dramc.c implementation comes from the information and 
code in the dram_sunxi_dw.c file in U-Boot, not the datasheet.

OK, this is a good start point. I'll look at the memory accesses
(certainly not today, but that problem is of my interest).

The proposal you sent with span_size/region_size looks interesting 
indeed. It would be great if this could help
simplify the code in allwinner-h3-dramc.c. But it would require some 
effort to figure out if it can indeed replace the current behavior.

Regards,

Phil.

Re: [RFC PATCH 2/2] hw/i386/sgx: Attach SGX-EPC to its memory backend

2022-01-31 Thread Philippe Mathieu-Daudé via


On 23/1/22 13:52, Yang Zhong wrote:

On Mon, Jan 17, 2022 at 12:48:10PM +0100, Paolo Bonzini wrote:

On 1/17/22 00:53, Philippe Mathieu-Daudé via wrote:

We have one SGX-EPC address/size/node per memory backend,
make it child of the backend in the QOM composition tree.

Cc: Yang Zhong 
Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/i386/sgx.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 5de5dd08936..6362e5e9d02 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -300,6 +300,9 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
  /* set the memdev link with memory backend */
  object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,
                        &error_fatal);
+object_property_add_child(OBJECT(list->value->memdev), "sgx-epc",
+  OBJECT(obj));
+
  /* set the numa node property for sgx epc object */
  object_property_set_uint(obj, SGX_EPC_NUMA_NODE_PROP, list->value->node,
                           &error_fatal);


I don't think this is a good idea; only list->value->memdev should
add something below itself in the tree.

However, I think obj can be added under the machine itself as
/machine/sgx-epc-device[*].



   Philippe, Sorry I can't receive all Qemu mails from my mutt tool.

   https://lists.nongnu.org/archive/html/qemu-devel/2022-01/msg03535.html
   I verified this patch, and the issue was reported as below:

   Unexpected error in object_property_try_add() at ../qom/object.c:1224:
   qemu-system-x86_64: attempt to add duplicate property 'sgx-epc' to object 
(type 'pc-q35-7.0-machine')
   Aborted (core dumped)

   Even after I changed it to another name, it still reported the same kind of issue.

   I tried the patch below, like my previous patch, and it works:
   diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
   index d60485fc422..66444745b47 100644
   --- a/hw/i386/sgx.c
   +++ b/hw/i386/sgx.c
   @@ -281,6 +281,7 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
 SGXEPCState *sgx_epc = &pcms->sgx_epc;
X86MachineState *x86ms = X86_MACHINE(pcms);
SgxEPCList *list = NULL;
   +int sgx_count = 0;
Object *obj;

memset(sgx_epc, 0, sizeof(SGXEPCState));
   @@ -297,7 +298,9 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
for (list = x86ms->sgx_epc_list; list; list = list->next) {
obj = object_new("sgx-epc");

   -object_property_add_child(OBJECT(pcms), "sgx-epc", OBJECT(obj));
   +gchar *name = g_strdup_printf("device[%d]", sgx_count++);


Oops yes you are right... Fixed in v3!


   +object_property_add_child(container_get(qdev_get_machine(), 
"/sgx-epc-device"),
   +  name, obj);
   
/* set the memdev link with memory backend */

object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,


   From the monitor,
   (qemu) qom-list /machine/sgx-epc-device
   type (string)
   device[0] (child)
   device[1] (child)
   
   This can normally show two sgx epc section devices.
   
   If you have new patch, I can help verify, thanks!


Here you go for v3:
https://lore.kernel.org/qemu-devel/20220131233507.334174-1-f4...@amsat.org/

Regards,

Phil.

[PATCH v3 2/2] hw/i386/sgx: Attach SGX-EPC objects to machine

2022-01-31 Thread Philippe Mathieu-Daudé via

Previously SGX-EPC objects were exposed in the QOM tree at a path

  /machine/unattached/device[nn]

where the 'nn' varies depending on what devices were already created.

With this change the SGX-EPC objects are now at

  /machine/sgx-epc[nn]

where the 'nn' of the first SGX-EPC object is always zero.

Reported-by: Yang Zhong 
Suggested-by: Paolo Bonzini 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/i386/sgx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index a2b318dd938..3ab2217ca43 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -304,6 +304,8 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
 for (list = x86ms->sgx_epc_list; list; list = list->next) {
 obj = object_new("sgx-epc");
 
+object_property_add_child(OBJECT(pcms), "sgx-epc[*]", OBJECT(obj));
+
 /* set the memdev link with memory backend */
 object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,
   &error_fatal);
-- 
2.34.1
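
The "[*]" suffix in the child name asks QOM to pick the first free index,
which is what avoids the duplicate-property abort reported for the earlier
version. The naming rule can be sketched in plain C as below. This is a toy
model under assumptions, not the QOM implementation: DemoParent,
demo_has_child() and demo_add_child() are hypothetical, the table is fixed
size, and names that would not fit in the buffer are silently truncated.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_CHILDREN 8
#define DEMO_NAME_LEN 32

/* Hypothetical parent object: just a table of child names. */
typedef struct DemoParent {
    char names[DEMO_MAX_CHILDREN][DEMO_NAME_LEN];
    int count;
} DemoParent;

static int demo_has_child(const DemoParent *p, const char *name)
{
    for (int i = 0; i < p->count; i++) {
        if (strcmp(p->names[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Add a child. A name ending in "[*]" is expanded to the first unused
 * "name[N]"; any other name must be unique. Returns the slot used, or
 * -1 on a duplicate name or a full table.
 */
static int demo_add_child(DemoParent *p, const char *name)
{
    char full[DEMO_NAME_LEN];
    size_t len = strlen(name);

    if (p->count >= DEMO_MAX_CHILDREN) {
        return -1;
    }

    if (len >= 3 && strcmp(name + len - 3, "[*]") == 0) {
        int found = 0;
        /* Fewer than DEMO_MAX_CHILDREN names exist, so one index is free. */
        for (int i = 0; i < DEMO_MAX_CHILDREN; i++) {
            snprintf(full, sizeof(full), "%.*s[%d]", (int)(len - 3), name, i);
            if (!demo_has_child(p, full)) {
                found = 1;
                break;
            }
        }
        if (!found) {
            return -1;
        }
    } else {
        snprintf(full, sizeof(full), "%s", name);
        if (demo_has_child(p, full)) {
            return -1;   /* the real code reports a duplicate property */
        }
    }

    strcpy(p->names[p->count], full);
    return p->count++;
}
```

With a fixed name such as "sgx-epc", the second add fails; with "sgx-epc[*]"
each add gets its own index starting from zero.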

[PATCH v3 1/2] hw/i386: Attach CPUs to machine

2022-01-31 Thread Philippe Mathieu-Daudé via

Previously CPUs were exposed in the QOM tree at a path

  /machine/unattached/device[nn]

where the 'nn' of the first CPU is usually zero, but can
vary depending on what devices were already created.

With this change the CPUs are now at

  /machine/cpu[nn]

where the 'nn' of the first CPU is always zero.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/i386/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index b84840a1bb9..50bf249c700 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -108,6 +108,7 @@ void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, 
Error **errp)
 {
 Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
 
+object_property_add_child(OBJECT(x86ms), "cpu[*]", OBJECT(cpu));
 if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
 goto out;
 }
-- 
2.34.1

[PATCH v3 0/2] hw/i386: QOM-attach CPUs/SGX-EPC objects to their parents

2022-01-31 Thread Philippe Mathieu-Daudé via

Trying to fix the issue reported here:
https://lists.nongnu.org/archive/html/qemu-devel/2021-11/msg05670.html

First attach the CPUs and SGX-EPC objects to the machine.

Since v2:
- added missing QOM property auto-enum: [*] (Yang Zhong)

Since v1:
- addressed Paolo & Daniel review feedbacks

Philippe Mathieu-Daudé (2):
  hw/i386: Attach CPUs to machine
  hw/i386/sgx: Attach SGX-EPC objects to machine

 hw/i386/sgx.c | 2 ++
 hw/i386/x86.c | 1 +
 2 files changed, 3 insertions(+)

-- 
2.34.1

Re: [PATCH v1 5/6] hw/misc: Add a model of the Xilinx ZynqMP APU Control

2022-01-31 Thread Philippe Mathieu-Daudé via


On 31/1/22 13:17, Edgar E. Iglesias wrote:

On Mon, Jan 31, 2022 at 12:35:54AM +0100, Philippe Mathieu-Daudé wrote:

On 31/1/22 00:12, Edgar E. Iglesias wrote:

From: "Edgar E. Iglesias" 

Add a model of the Xilinx ZynqMP APU Control.

Signed-off-by: Edgar E. Iglesias 
---
   include/hw/misc/xlnx-zynqmp-apu-ctrl.h |  91 +
   hw/misc/xlnx-zynqmp-apu-ctrl.c | 257 +
   hw/misc/meson.build|   1 +
   3 files changed, 349 insertions(+)
   create mode 100644 include/hw/misc/xlnx-zynqmp-apu-ctrl.h
   create mode 100644 hw/misc/xlnx-zynqmp-apu-ctrl.c



+
+#define NUM_CPUS 4


Hmm isn't it APU_MAX_CPU?




Thanks Philippe. Yes, this was a little confusing. The values happen to be the
same but they belong to different models. For example, the apu-ctrl model will
be reused in Versal.

Anyways, for v2 I've renamed the macros to CRF_MAX_CPU and APU_MAX_CPU
respectively.
I hope that makes things clearer.


Thanks!

Re: [PATCH] qapi/block: Cosmetic change in BlockExportType schema

2022-01-31 Thread Philippe Mathieu-Daudé via


On 31/1/22 22:37, Eric Blake wrote:

On Sun, Jan 30, 2022 at 07:50:41PM +0100, Philippe Mathieu-Daudé wrote:

On 28/1/22 21:54, Eric Blake wrote:

On Wed, Jan 19, 2022 at 01:14:39PM +0100, Philippe Mathieu-Daudé wrote:

From: Philippe Mathieu-Daude 


'git am' used this line to insert the authorship...



From: Philippe Mathieu-Daudé 


...then left this line in the commit body, which I manually deleted,
without spotting the difference between the two.



The doubled From: looks odd here.  I'll double-check that git doesn't
mess up the actual commit once I apply the patch.


I played with the git --from option to not appear in the list as
'"Philippe Mathieu-Daudé via" ':
https://lore.kernel.org/qemu-devel/efc5f304-f3d2-ff7b-99a6-673595ff0...@amsat.org/
by using a different sendemail.from (removing the acute in my
lastname) to force a correct author.from.
git-am should have picked the 2nd form, but I see the 1st in commit
3a8fa0edd1. Just curious, did you have to modify it manually?


Alas, since I managed to overlook the change in the acute (I suppose
I'm cursed with having a boring name, so unlike many list participants
who are overjoyed by the power of UTF-8 to make self-expression more
accurate, I have not had as much experience with thinking about it),
my manual edits explain why the merged commit ended up with a less
desirable spelling.  I apologize for the mishap.  Do we need/want a
.mailmap entry to aid git at listing your preferred spelling?


A missing acute is not a big deal, compared to other alphabets where
people try to approximate their name pronunciation to Latin symbols,
and is still better than UTF-8 mojibake :)

Re: [PATCH v3 4/4] python/aqmp: add socket bind step to legacy.py

2022-01-31 Thread John Snow

On Thu, Jan 27, 2022 at 10:50 AM Hanna Reitz  wrote:
>
> On 24.01.22 19:06, John Snow wrote:
> > The synchronous QMP library would bind to the server address during
> > __init__(). The new library delays this to the accept() call, because
> > binding occurs inside of the call to start_[unix_]server(), which is an
> > async method -- so it cannot happen during __init__ anymore.
> >
> > Python 3.7+ adds the ability to create the server (and thus the bind()
> > call) and begin the active listening in separate steps, but we don't
> > have that functionality in 3.6, our current minimum.
> >
> > Therefore ... Add a temporary workaround that allows the synchronous
> > version of the client to bind the socket in advance, guaranteeing that
> > there will be a UNIX socket in the filesystem ready for the QEMU client
> > to connect to without a race condition.
> >
> > (Yes, it's a bit ugly. Fixing it more nicely will have to wait until our
> > minimum Python version is 3.7+.)
>
> I mean.  Looks good to me?  Not quite enough for an R-b, I’d say, and
> you don’t really need an A-b from me on this, but looks good to me! O:)
>

Works for me, thanks!
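
The idea described in the commit message above (bind the UNIX socket synchronously up front, then hand it to the async server later) can be sketched roughly as follows. This is only an illustration, not the code from legacy.py: the helper names bind_in_advance() and accept_one() are hypothetical, and the sketch uses asyncio.get_running_loop(), which needs a newer Python than the 3.6 minimum mentioned in the thread.

```python
import asyncio
import os
import socket
import tempfile


def bind_in_advance(path: str) -> socket.socket:
    """Hypothetical helper: bind and listen synchronously, so the socket
    file exists in the filesystem before any client is started."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    return sock


async def accept_one(sock: socket.socket) -> bytes:
    """Hypothetical helper: later, pass the pre-bound socket to asyncio
    and return everything the first client sends before closing."""
    loop = asyncio.get_running_loop()
    received = loop.create_future()

    async def on_connect(reader, writer):
        # read() with no size reads until the peer closes the connection.
        received.set_result(await reader.read())
        writer.close()

    # start_unix_server() accepts an already-bound socket via sock=.
    server = await asyncio.start_unix_server(on_connect, sock=sock)
    try:
        return await received
    finally:
        server.close()
```

Because the socket is already listening when bind_in_advance() returns, a client that connects before the event loop starts is simply queued in the backlog and picked up once accept_one() runs.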

Re: [PATCH v3 2/4] python/machine: raise VMLaunchFailure exception from launch()

2022-01-31 Thread John Snow

On Thu, Jan 27, 2022 at 9:22 AM Hanna Reitz  wrote:
>
> On 24.01.22 19:06, John Snow wrote:
> > This allows us to pack in some extra information about the failure,
> > which guarantees that if the caller did not *intentionally* cause a
> > failure (by capturing this Exception), some pretty good clues will be
> > printed at the bottom of the traceback information.
>
> OK, I presume in contrast to unconditionally logging this on debug
> level, which is less than ideal because on that level it’s most likely
> hidden, but that was exactly the point, because we don’t know whether
> the caller will catch the exception, so we mustn’t log it on a more
> urgent level.
>

Exactly. More urgent logging interferes with tests where we
intentionally give a bad configuration. device-crash-test is another
example.

> But by creating a new exception class, we get a reasonable log output
> exactly when the caller won’t catch it.
>

That's the intent. By stuffing this info into the Exception, we'll
always see it printed if the error went unhandled. It seemed like the
best way to make sure the error messages were more apparent more often
without requiring the use of debug mode -- so that errors in e.g.
GitLab CI would print good diagnostic info by default.

> > This will help make failures in the event of a non-negative return code
> > more obvious when they go unhandled; the current behavior is to print a
> > warning message only in the event of signal-based terminations (for
> > negative return codes).
>
> I assume you mean the one in _post_shutdown()...?
>

Yes.

> Confused me a bit, because for a while I interpreted this to mean “We
> don’t output anything in case of a positive return code”, but it means
> “We don’t print any details in that case, because the exception we
> re-raise in launch() doesn’t contain valuable information”.  Makes sense.
>

Sorry, I'll improve the commit message.

> > (Note: In Python, catching BaseException instead of Exception is like
> > installing a signal handler that will run as long as Python itself
> > doesn't crash.
>
> This really confused me, because I can’t really understand this at all.
>
> But I guess what I took from googling was that every exception object
> must be derived from BaseException eventually, and so we continue to
> catch all exceptions here, we just give them a name. (And then we create
> a VMLaunchFailure only for Exception exceptions, because the others
> don’t have much to do with launching the VM.)
>

Apologies for not being more clear. ("It made sense to me at the time
...") What I mean to say here is: there are several ways to catch all
exceptions.

"except:" will catch everything.
"except BaseException" catches everything, too. This is equivalent to the above.
"except Exception" won't catch anything that inherits directly from
BaseException, only Exception and children.

What I wanted to convey here is that:

(1) If the exception is a BaseException, it's probably something like
KeyboardInterrupt (SIGINT) or SystemExit, we don't want to wrap the
exception and instead we want to re-raise it as-is. We are functioning
more or less like a signal handler here, performing some cleanup and
then yielding back control.
(2) If the exception is merely an Exception, it's OK to wrap it with
the custom exception and re-raise.

Wrapping a BaseException would be a problem because it would
'downgrade' the severity of the exception (so to speak) and may cause
issues.
I'll try to improve the commit message.
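
A minimal sketch of the two cases described above, under stated assumptions: the function launch_with_cleanup() and the constructor arguments of VMLaunchFailure are hypothetical stand-ins for illustration, not the actual code in python/qemu/machine/machine.py.

```python
class VMLaunchFailure(Exception):
    """Illustrative stand-in; the fields here are assumptions."""

    def __init__(self, exitcode, command, output):
        super().__init__(exitcode, command, output)
        self.exitcode = exitcode
        self.command = command
        self.output = output

    def __str__(self):
        return (f"exit code {self.exitcode}, "
                f"command: {self.command!r}, output: {self.output!r}")


def launch_with_cleanup(start, cleanup):
    """Hypothetical helper: run start(), always clean up on failure."""
    try:
        start()
    except BaseException as exc:
        # Runs for everything, including KeyboardInterrupt and SystemExit.
        cleanup()
        if isinstance(exc, Exception):
            # Case (2): an ordinary error, safe to wrap with extra details.
            raise VMLaunchFailure(1, "qemu-system-x86_64", "bad config") from exc
        # Case (1): a "strong" event; re-raise it unchanged so that its
        # severity is not downgraded.
        raise
```

If the caller does not catch VMLaunchFailure, the traceback ends with its string form, which is where the extra diagnostic details would show up.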

> > KeyboardInterrupt and several other "strong" events in
> > Python are a BaseException. These events should generally not be
> > suppressed, but occasionally we want to perform some cleanup in response
> > to one.)
> >
> > Signed-off-by: John Snow 
> > ---
> >   python/qemu/machine/machine.py| 45 ---
> >   tests/qemu-iotests/tests/mirror-top-perms |  3 +-
> >   2 files changed, 40 insertions(+), 8 deletions(-)
>
> Reviewed-by: Hanna Reitz 
>
> (Looked at `except` and `ConnectError` usage outside of
> mirror-top-perms, but couldn’t find anything else that looked like it
> caught VM launch exceptions.)
>

Thanks!

[PATCH v4 1/2] hw/sensor: Add SB-TSI Temperature Sensor Interface

2022-01-31 Thread Patrick Venture

From: Hao Wu 

SB Temperature Sensor Interface (SB-TSI) is an SMBus compatible
interface that reports AMD SoC's Ttcl (normalized temperature),
and resembles a typical 8-pin remote temperature sensor's I2C interface
to BMC.

This patch implements a basic AMD SB-TSI sensor that is
compatible with the open-source data sheet from AMD and Linux
kernel driver.

Reference:
Linux kernel driver:
https://lkml.org/lkml/2020/12/11/968
Register Map:
https://developer.amd.com/wp-content/resources/56255_3_03.PDF
(Chapter 6)

Signed-off-by: Hao Wu 
Signed-off-by: Patrick Venture 
Reviewed-by: Doug Evans 
Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Corey Minyard 
---
 meson.build   |   1 +
 hw/sensor/trace.h |   1 +
 include/hw/sensor/sbtsi.h |  45 +
 hw/sensor/tmp_sbtsi.c | 369 ++
 hw/sensor/Kconfig |   4 +
 hw/sensor/meson.build |   1 +
 hw/sensor/trace-events|   5 +
 7 files changed, 426 insertions(+)
 create mode 100644 hw/sensor/trace.h
 create mode 100644 include/hw/sensor/sbtsi.h
 create mode 100644 hw/sensor/tmp_sbtsi.c
 create mode 100644 hw/sensor/trace-events

diff --git a/meson.build b/meson.build
index c1b1db1e28..3634214546 100644
--- a/meson.build
+++ b/meson.build
@@ -2494,6 +2494,7 @@ if have_system
 'hw/rtc',
 'hw/s390x',
 'hw/scsi',
+'hw/sensor',
 'hw/sd',
 'hw/sh4',
 'hw/sparc',
diff --git a/hw/sensor/trace.h b/hw/sensor/trace.h
new file mode 100644
index 00..e4721560b0
--- /dev/null
+++ b/hw/sensor/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_sensor.h"
diff --git a/include/hw/sensor/sbtsi.h b/include/hw/sensor/sbtsi.h
new file mode 100644
index 00..9073f76ebb
--- /dev/null
+++ b/include/hw/sensor/sbtsi.h
@@ -0,0 +1,45 @@
+/*
+ * AMD SBI Temperature Sensor Interface (SB-TSI)
+ *
+ * Copyright 2021 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+#ifndef QEMU_TMP_SBTSI_H
+#define QEMU_TMP_SBTSI_H
+
+/*
+ * SB-TSI registers only support SMBus byte data access. "_INT" registers are
+ * the integer part of a temperature value or limit, and "_DEC" registers are
+ * corresponding decimal parts.
+ */
+#define SBTSI_REG_TEMP_INT  0x01 /* RO */
+#define SBTSI_REG_STATUS0x02 /* RO */
+#define SBTSI_REG_CONFIG0x03 /* RO */
+#define SBTSI_REG_TEMP_HIGH_INT 0x07 /* RW */
+#define SBTSI_REG_TEMP_LOW_INT  0x08 /* RW */
+#define SBTSI_REG_CONFIG_WR 0x09 /* RW */
+#define SBTSI_REG_TEMP_DEC  0x10 /* RO */
+#define SBTSI_REG_TEMP_HIGH_DEC 0x13 /* RW */
+#define SBTSI_REG_TEMP_LOW_DEC  0x14 /* RW */
+#define SBTSI_REG_ALERT_CONFIG  0xBF /* RW */
+#define SBTSI_REG_MAN   0xFE /* RO */
+#define SBTSI_REG_REV   0xFF /* RO */
+
+#define SBTSI_STATUS_HIGH_ALERT BIT(4)
+#define SBTSI_STATUS_LOW_ALERT  BIT(3)
+#define SBTSI_CONFIG_ALERT_MASK BIT(7)
+#define SBTSI_ALARM_EN  BIT(0)
+
+/* The temperatures we store are in units of 0.125 degrees. */
+#define SBTSI_TEMP_UNIT_IN_MILLIDEGREE 125
+
+#endif
diff --git a/hw/sensor/tmp_sbtsi.c b/hw/sensor/tmp_sbtsi.c
new file mode 100644
index 00..d5406844ef
--- /dev/null
+++ b/hw/sensor/tmp_sbtsi.c
@@ -0,0 +1,369 @@
+/*
+ * AMD SBI Temperature Sensor Interface (SB-TSI)
+ *
+ * Copyright 2021 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i2c/smbus_slave.h"
+#include "hw/sensor/sbtsi.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qom/object.h"
+#include "trace.h"
+
+#define TYPE_SBTSI "sbtsi"
+#define SBTSI(obj) OBJECT_CHECK(SBTSIState, (obj), TYPE_SBTSI)
+
+/**
+ * SBTSIState:
+ * temperatures are in units of 0.125 degrees
+ * @temperature: Temperature
+ * @limit_low: Lowest temperature
+ * @limit_high: Highest temperature
+ * @status: The status register
+ * @config: The config register
+ * @alert_config: The config for alarm_l output.
+ * @addr: The address to read/write for the next cmd.
+ *

[PATCH v4 0/2] hw/sensor: Add SB-TSI Temperature Sensor Interface

2022-01-31 Thread Patrick Venture

v4:
 * Added missing signature block from submitter.

v3:
 * typofix where I accidentally embedded 'wq' into a string
 * moved the type sbtsi definition back into the source file
 * renamed the qtest file to use hyphens only

v2:
 * Split the commit into a separate patch for the qtest
 * Moved the common registers into the new header
 * Introduced a new header

Hao Wu (2):
  hw/sensor: Add SB-TSI Temperature Sensor Interface
  tests: add qtest for hw/sensor/sbtsi

 meson.build  |   1 +
 hw/sensor/trace.h|   1 +
 include/hw/sensor/sbtsi.h|  45 +
 hw/sensor/tmp_sbtsi.c| 369 +++
 tests/qtest/tmp-sbtsi-test.c | 161 +++
 hw/sensor/Kconfig|   4 +
 hw/sensor/meson.build|   1 +
 hw/sensor/trace-events   |   5 +
 tests/qtest/meson.build  |   1 +
 9 files changed, 588 insertions(+)
 create mode 100644 hw/sensor/trace.h
 create mode 100644 include/hw/sensor/sbtsi.h
 create mode 100644 hw/sensor/tmp_sbtsi.c
 create mode 100644 tests/qtest/tmp-sbtsi-test.c
 create mode 100644 hw/sensor/trace-events

-- 
2.34.1.575.g55b058a8bb-goog

[PATCH v4 2/2] tests: add qtest for hw/sensor/sbtsi

2022-01-31 Thread Patrick Venture

From: Hao Wu 

Reviewed-by: Doug Evans 
Signed-off-by: Hao Wu 
Signed-off-by: Patrick Venture 
Acked-by: Thomas Huth 
---
 tests/qtest/tmp-sbtsi-test.c | 161 +++
 tests/qtest/meson.build  |   1 +
 2 files changed, 162 insertions(+)
 create mode 100644 tests/qtest/tmp-sbtsi-test.c

diff --git a/tests/qtest/tmp-sbtsi-test.c b/tests/qtest/tmp-sbtsi-test.c
new file mode 100644
index 00..ff1e193739
--- /dev/null
+++ b/tests/qtest/tmp-sbtsi-test.c
@@ -0,0 +1,161 @@
+/*
+ * QTests for the SBTSI temperature sensor
+ *
+ * Copyright 2020 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+
+#include "libqtest-single.h"
+#include "libqos/qgraph.h"
+#include "libqos/i2c.h"
+#include "qapi/qmp/qdict.h"
+#include "qemu/bitops.h"
+#include "hw/sensor/sbtsi.h"
+
+#define TEST_ID   "sbtsi-test"
+#define TEST_ADDR (0x4c)
+
+/* The temperatures stored are in units of 0.125 degrees. */
+#define LIMIT_LOW_IN_MILLIDEGREE (10500)
+#define LIMIT_HIGH_IN_MILLIDEGREE (55125)
+
+static uint32_t qmp_sbtsi_get_temperature(const char *id)
+{
+QDict *response;
+int ret;
+
+response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, "
+   "'property': 'temperature' } }", id);
+g_assert(qdict_haskey(response, "return"));
+ret = (uint32_t)qdict_get_int(response, "return");
+qobject_unref(response);
+return ret;
+}
+
+static void qmp_sbtsi_set_temperature(const char *id, uint32_t value)
+{
+QDict *response;
+
+response = qmp("{ 'execute': 'qom-set', 'arguments': { 'path': %s, "
+   "'property': 'temperature', 'value': %d } }", id, value);
+g_assert(qdict_haskey(response, "return"));
+qobject_unref(response);
+}
+
+/*
+ * Compute the temperature using the integer and decimal part and return
+ * millidegrees. The decimal part are only the top 3 bits so we shift it by
+ * 5 here.
+ */
+static uint32_t regs_to_temp(uint8_t integer, uint8_t decimal)
+{
+return ((integer << 3) + (decimal >> 5)) * SBTSI_TEMP_UNIT_IN_MILLIDEGREE;
+}
+
+/*
+ * Compute the integer and decimal parts of the temperature in millidegrees.
+ * H/W store the decimal in the top 3 bits so we shift it by 5.
+ */
+static void temp_to_regs(uint32_t temp, uint8_t *integer, uint8_t *decimal)
+{
+temp /= SBTSI_TEMP_UNIT_IN_MILLIDEGREE;
+*integer = temp >> 3;
+*decimal = (temp & 0x7) << 5;
+}
+
+static void tx_rx(void *obj, void *data, QGuestAllocator *alloc)
+{
+uint16_t value;
+uint8_t integer, decimal;
+QI2CDevice *i2cdev = (QI2CDevice *)obj;
+
+/* Test default values */
+value = qmp_sbtsi_get_temperature(TEST_ID);
+g_assert_cmpuint(value, ==, 0);
+
+integer = i2c_get8(i2cdev, SBTSI_REG_TEMP_INT);
+decimal = i2c_get8(i2cdev, SBTSI_REG_TEMP_DEC);
+g_assert_cmpuint(regs_to_temp(integer, decimal), ==, 0);
+
+/* Test setting temperature */
+qmp_sbtsi_set_temperature(TEST_ID, 2);
+value = qmp_sbtsi_get_temperature(TEST_ID);
+g_assert_cmpuint(value, ==, 2);
+
+integer = i2c_get8(i2cdev, SBTSI_REG_TEMP_INT);
+decimal = i2c_get8(i2cdev, SBTSI_REG_TEMP_DEC);
+g_assert_cmpuint(regs_to_temp(integer, decimal), ==, 2);
+
+/* Set alert mask in config */
+i2c_set8(i2cdev, SBTSI_REG_CONFIG_WR, SBTSI_CONFIG_ALERT_MASK);
+value = i2c_get8(i2cdev, SBTSI_REG_CONFIG);
+g_assert_cmphex(value, ==, SBTSI_CONFIG_ALERT_MASK);
+/* Enable alarm_en */
+i2c_set8(i2cdev, SBTSI_REG_ALERT_CONFIG, SBTSI_ALARM_EN);
+value = i2c_get8(i2cdev, SBTSI_REG_ALERT_CONFIG);
+g_assert_cmphex(value, ==, SBTSI_ALARM_EN);
+
+/* Test setting limits */
+/* Limit low = 10.500 */
+temp_to_regs(LIMIT_LOW_IN_MILLIDEGREE, &integer, &decimal);
+i2c_set8(i2cdev, SBTSI_REG_TEMP_LOW_INT, integer);
+i2c_set8(i2cdev, SBTSI_REG_TEMP_LOW_DEC, decimal);
+integer = i2c_get8(i2cdev, SBTSI_REG_TEMP_LOW_INT);
+decimal = i2c_get8(i2cdev, SBTSI_REG_TEMP_LOW_DEC);
+g_assert_cmpuint(
+regs_to_temp(integer, decimal), ==, LIMIT_LOW_IN_MILLIDEGREE);
+/* Limit high = 55.125 */
+temp_to_regs(LIMIT_HIGH_IN_MILLIDEGREE, &integer, &decimal);
+i2c_set8(i2cdev, SBTSI_REG_TEMP_HIGH_INT, integer);
+i2c_set8(i2cdev, SBTSI_REG_TEMP_HIGH_DEC, decimal);
+integer = i2c_get8(i2cdev, SBTSI_REG_TEMP_HIGH_INT);
+decimal = i2c_get8(i2cdev, SBTSI_REG_TEMP_HIGH_DEC);
+g_assert_cmpuint(
+regs_to_temp(integer, decimal), ==, LIMIT_HIGH_IN_MILLIDEGREE);
+/* No alert is generated. */
+
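
The register encoding used by regs_to_temp() and temp_to_regs() in the test above can be checked by hand. The following is a small Python transcription of those two helpers, offered only as a worked example of the arithmetic (0.125 degree units, fractional part in the top 3 bits of the "_DEC" register); it is not part of the patch.

```python
# Same constant as SBTSI_TEMP_UNIT_IN_MILLIDEGREE in the header.
SBTSI_TEMP_UNIT_IN_MILLIDEGREE = 125


def regs_to_temp(integer: int, decimal: int) -> int:
    """Combine the integer and decimal registers into millidegrees."""
    return ((integer << 3) + (decimal >> 5)) * SBTSI_TEMP_UNIT_IN_MILLIDEGREE


def temp_to_regs(temp: int) -> tuple:
    """Split millidegrees into (integer, decimal) register values."""
    units = temp // SBTSI_TEMP_UNIT_IN_MILLIDEGREE  # count of 0.125 steps
    return units >> 3, (units & 0x7) << 5


# 10500 millidegrees = 84 units: integer 10, fraction 4/8 stored as 0x80.
print(temp_to_regs(10500))
# 55125 millidegrees = 441 units: integer 55, fraction 1/8 stored as 0x20.
print(temp_to_regs(55125))
```

Both limits used in the test are exact multiples of 125 millidegrees, so the round trip through the registers loses no precision.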

Re: [PATCH v2 10/11] 9p: darwin: Implement compatibility for mknodat

2022-01-31 Thread Will Cohen

Upon further review, it looks like since 10.12 there's actually a
(not-heavily-documented) function that wraps this syscall and avoids the
need to call the private syscall directly:
https://opensource.apple.com/source/libpthread/libpthread-218.51.1/src/pthread_cwd.c.auto.html.
Chromium uses it too (
https://chromium.googlesource.com/chromium/src/+/lkgr/base/process/launch_mac.cc#110)
-- given that we're not looking for pre-10.12 compatibility, I'm a little
less worried about the workaround breaking in the future if this wrapper
gets used instead.

Would it work to change to pthread_fchdir_np, remove all the syscall
discussion in the comment, and add a meson check for pthread_fchdir_np as a
prereq for virtfs on darwin?

On Fri, Jan 28, 2022 at 1:28 PM Will Cohen  wrote:

> Understood. Since I cannot find the original number, I have submitted a
> new report at rdar://FB9862426  (
> https://openradar.appspot.com/FB9862426).
>
> I'll note that and work on a testcase/error message for v4.
>
> Many thanks,
> Will
>
> On Fri, Jan 28, 2022 at 10:15 AM Christian Schoenebeck <
> qemu_...@crudebyte.com> wrote:
>
>> On Donnerstag, 27. Januar 2022 22:47:54 CET Will Cohen wrote:
>> > Back when this was being proposed, the original proposer did file such a
>> > report to Apple, but we're still in this situation!
>>
>> Ok, but still, do you find it appropriate to just blindly use a private
>> syscall that may or may not exist or might change its behaviour at any
>> time
>> without a user being aware?
>>
>> I am not opposed to using workarounds at all, but what I worry about is
>> that
>> Apple might change this in whatever way at any time, and as this syscall
>> is
>> currently not guarded in this patch at all, we might one day receive bug
>> reports by macOS users with symptoms that might not immediately be
>> obvious to
>> relate to this being the root cause.
>>
>> Options that would come to my mind:
>> - a test case for this syscall
>> - a clear runtime error message for ordinary users
>>
>> Is there a rdar or FB number for the report on Apple's side?
>>
>> > Replacing clang with gcc in v3.
>> >
>> > On Wed, Nov 24, 2021 at 12:20 PM Christian Schoenebeck <
>> >
>> > qemu_...@crudebyte.com> wrote:
>> > > On Montag, 22. November 2021 01:49:12 CET Will Cohen wrote:
>> > > > From: Keno Fischer 
>> > > >
>> > > > Darwin does not support mknodat. However, to avoid race conditions
>> > > > with later setting the permissions, we must avoid using mknod on
>> > > > the full path instead. We could try to fchdir, but that would cause
>> > > > problems if multiple threads try to call mknodat at the same time.
>> > > > However, luckily there is a solution: Darwin has an (unexposed in the
>> > > > C library) system call that sets the cwd for the current thread
>> only.
>> > > > This should suffice to use mknod safely.
>> > > >
>> > > > Signed-off-by: Keno Fischer 
>> > > > Signed-off-by: Michael Roitzsch 
>> > > > [Will Cohen: - Adjust coding style]
>> > > > Signed-off-by: Will Cohen 
>> > > > ---
>> > > >
>> > > >  hw/9pfs/9p-local.c   |  5 +++--
>> > > >  hw/9pfs/9p-util-darwin.c | 33 +
>> > > >  hw/9pfs/9p-util-linux.c  |  5 +
>> > > >  hw/9pfs/9p-util.h|  2 ++
>> > > >  4 files changed, 43 insertions(+), 2 deletions(-)
>> > > >
>> > > > diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
>> > > > index 4268703d05..42b65e143b 100644
>> > > > --- a/hw/9pfs/9p-local.c
>> > > > +++ b/hw/9pfs/9p-local.c
>> > > > @@ -673,7 +673,7 @@ static int local_mknod(FsContext *fs_ctx,
>> V9fsPath
>> > > > *dir_path,
>> > > >
>> > > >  if (fs_ctx->export_flags & V9FS_SM_MAPPED ||
>> > > >
>> > > >  fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
>> > > >
>> > > > -err = mknodat(dirfd, name, fs_ctx->fmode | S_IFREG, 0);
>> > > > +err = qemu_mknodat(dirfd, name, fs_ctx->fmode | S_IFREG,
>> 0);
>> > > >
>> > > >  if (err == -1) {
>> > > >
>> > > >  goto out;
>> > > >
>> > > >  }
>> > > >
>> > > > @@ -688,7 +688,7 @@ static int local_mknod(FsContext *fs_ctx,
>> V9fsPath
>> > > > *dir_path, }
>> > > >
>> > > >  } else if (fs_ctx->export_flags & V9FS_SM_PASSTHROUGH ||
>> > > >
>> > > > fs_ctx->export_flags & V9FS_SM_NONE) {
>> > > >
>> > > > -err = mknodat(dirfd, name, credp->fc_mode, credp->fc_rdev);
>> > > > +err = qemu_mknodat(dirfd, name, credp->fc_mode,
>> > > > credp->fc_rdev);
>> > > >
>> > > >  if (err == -1) {
>> > > >
>> > > >  goto out;
>> > > >
>> > > >  }
>> > > >
>> > > > @@ -701,6 +701,7 @@ static int local_mknod(FsContext *fs_ctx,
>> V9fsPath
>> > > > *dir_path,
>> > > >
>> > > >  err_end:
>> > > >  unlinkat_preserve_errno(dirfd, name, 0);
>> > > >
>> > > > +
>> > > >
>> > > >  out:
>> > > >  close_preserve_errno(dirfd);
>> > > >  return err;
>> > > >
>> > > > diff --git a/hw/9pfs/9p-util-darwin.c

Re: [PATCH v2 3/6] hppa: Add support for an emulated TOC/NMI button.

2022-01-31 Thread Richard Henderson


On 2/1/22 08:35, Helge Deller wrote:

Almost all PA-RISC machines have either a button that is labeled with 'TOC' or
a BMC/GSP function to trigger a TOC.  TOC is a non-maskable interrupt that is
sent to the processor.  This can be used for diagnostic purposes like obtaining
a stack trace/register dump or to enter KDB/KGDB in Linux.

This patch adds support for such an emulated TOC button.

It wires up the qemu monitor "nmi" command to trigger a TOC.  For that it
provides the hppa_nmi function which is assigned to the nmi_monitor_handler
function pointer.  When called it raises the EXCP_TOC hardware interrupt in the
hppa_cpu_do_interrupt() function.  The interrupt function then calls the
architecturally defined TOC function in SeaBIOS-hppa firmware (at fixed address
0xf000).

According to the PA-RISC PDC specification, the SeaBIOS firmware then writes
the CPU registers into PIM (processor internal memory) for later analysis.  In
order to write all registers it needs to know the contents of the CPU "shadow
registers" and the IASQ- and IAOQ-back values. The IAOQ/IASQ values are
provided by qemu in shadow registers when entering the SeaBIOS TOC function.
This patch adds a new artificial opcode "getshadowregs" (0xfffdead2) which
restores the original values of the shadow registers. With this opcode SeaBIOS
can store those registers as well into PIM before calling an OS-provided TOC
handler.

To trigger a TOC, switch to the qemu monitor with Ctrl-A C, and type in the
command "nmi".  After the TOC started the OS-debugger, exit the qemu monitor
with Ctrl-A C.

Signed-off-by: Helge Deller
---
  hw/hppa/machine.c| 35 ++-
  target/hppa/cpu.c|  2 +-
  target/hppa/cpu.h|  5 +
  target/hppa/helper.h |  1 +
  target/hppa/insns.decode |  1 +
  target/hppa/int_helper.c | 19 ++-
  target/hppa/op_helper.c  |  7 ++-
  target/hppa/translate.c  | 10 ++
  8 files changed, 76 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH v2 2/6] hw/hppa: Allow up to 16 emulated CPUs

2022-01-31 Thread Richard Henderson


On 2/1/22 08:35, Helge Deller wrote:

This brings the hppa_hardware.h file in sync with the copy in the
SeaBIOS-hppa sources.

In order to support up to 16 CPUs, it's required to move the HPA for
MEMORY_HPA out of the address space of the 16th CPU.

Signed-off-by: Helge Deller
---
  hw/hppa/hppa_hardware.h | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/hppa/hppa_hardware.h b/hw/hppa/hppa_hardware.h
index bc258895c9..5edf577563 100644
--- a/hw/hppa/hppa_hardware.h
+++ b/hw/hppa/hppa_hardware.h
@@ -25,7 +25,7 @@
  #define LASI_GFX_HPA0xf800
  #define ARTIST_FB_ADDR  0xf900
  #define CPU_HPA 0xfffb
-#define MEMORY_HPA  0xfffbf000
+#define MEMORY_HPA  0xf000


You could mention that you're moving it *well* out of the way, perhaps.  At first I was
confused about why the gap between the two numbers was so large.




+#define PIM_STORAGE_SIZE 600   /* storage size of pdc_pim_toc_struct (64bit) */


Belongs to the next patch?  Completely unused?

Otherwise,
Reviewed-by: Richard Henderson 

r~

[PATCH v2 3/6] hppa: Add support for an emulated TOC/NMI button.

2022-01-31 Thread Helge Deller

Almost all PA-RISC machines have either a button that is labeled with 'TOC' or
a BMC/GSP function to trigger a TOC.  TOC is a non-maskable interrupt that is
sent to the processor.  This can be used for diagnostic purposes like obtaining
a stack trace/register dump or to enter KDB/KGDB in Linux.

This patch adds support for such an emulated TOC button.

It wires up the qemu monitor "nmi" command to trigger a TOC.  For that it
provides the hppa_nmi function which is assigned to the nmi_monitor_handler
function pointer.  When called it raises the EXCP_TOC hardware interrupt in the
hppa_cpu_do_interrupt() function.  The interrupt function then calls the
architecturally defined TOC function in SeaBIOS-hppa firmware (at fixed address
0xf000).

According to the PA-RISC PDC specification, the SeaBIOS firmware then writes
the CPU registers into PIM (processor internal memory) for later analysis.  In
order to write all registers it needs to know the contents of the CPU "shadow
registers" and the IASQ- and IAOQ-back values. The IAOQ/IASQ values are
provided by qemu in shadow registers when entering the SeaBIOS TOC function.
This patch adds a new artificial opcode "getshadowregs" (0xfffdead2) which
restores the original values of the shadow registers. With this opcode SeaBIOS
can store those registers as well into PIM before calling an OS-provided TOC
handler.

To trigger a TOC, switch to the qemu monitor with Ctrl-A C, and type in the
command "nmi".  After the TOC started the OS-debugger, exit the qemu monitor
with Ctrl-A C.

Signed-off-by: Helge Deller 
---
 hw/hppa/machine.c| 35 ++-
 target/hppa/cpu.c|  2 +-
 target/hppa/cpu.h|  5 +
 target/hppa/helper.h |  1 +
 target/hppa/insns.decode |  1 +
 target/hppa/int_helper.c | 19 ++-
 target/hppa/op_helper.c  |  7 ++-
 target/hppa/translate.c  | 10 ++
 8 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index 2a46af5bc9..98b30e0395 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -17,6 +17,7 @@
 #include "hw/timer/i8254.h"
 #include "hw/char/serial.h"
 #include "hw/net/lasi_82596.h"
+#include "hw/nmi.h"
 #include "hppa_sys.h"
 #include "qemu/units.h"
 #include "qapi/error.h"
@@ -355,6 +356,14 @@ static void hppa_machine_reset(MachineState *ms)
 cpu[0]->env.gr[19] = FW_CFG_IO_BASE;
 }

+static void hppa_nmi(NMIState *n, int cpu_index, Error **errp)
+{
+CPUState *cs;
+
+CPU_FOREACH(cs) {
+cpu_interrupt(cs, CPU_INTERRUPT_NMI);
+}
+}

 static void machine_hppa_machine_init(MachineClass *mc)
 {
@@ -371,4 +380,28 @@ static void machine_hppa_machine_init(MachineClass *mc)
 mc->default_ram_id = "ram";
 }

-DEFINE_MACHINE("hppa", machine_hppa_machine_init)
+static void machine_hppa_machine_init_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+machine_hppa_machine_init(mc);
+
+NMIClass *nc = NMI_CLASS(oc);
+nc->nmi_monitor_handler = hppa_nmi;
+}
+
+static const TypeInfo machine_hppa_machine_init_typeinfo = {
+.name = ("hppa" "-machine"),
+.parent = "machine",
+.class_init = machine_hppa_machine_init_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_NMI },
+{ }
+},
+};
+
+static void machine_hppa_machine_init_register_types(void)
+{
+type_register_static(&machine_hppa_machine_init_typeinfo);
+}
+
+type_init(machine_hppa_machine_init_register_types)
diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 23eb254228..37b763fca0 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -62,7 +62,7 @@ static void hppa_cpu_synchronize_from_tb(CPUState *cs,

 static bool hppa_cpu_has_work(CPUState *cs)
 {
-return cs->interrupt_request & CPU_INTERRUPT_HARD;
+return cs->interrupt_request & (CPU_INTERRUPT_HARD | CPU_INTERRUPT_NMI);
 }

 static void hppa_cpu_disas_set_info(CPUState *cs, disassemble_info *info)
diff --git a/target/hppa/cpu.h b/target/hppa/cpu.h
index 45fd338b02..93c119532a 100644
--- a/target/hppa/cpu.h
+++ b/target/hppa/cpu.h
@@ -69,6 +69,11 @@
 #define EXCP_SYSCALL 30
 #define EXCP_SYSCALL_LWS 31

+/* Emulated hardware TOC button */
+#define EXCP_TOC 32 /* TOC = Transfer of control (NMI) */
+
+#define CPU_INTERRUPT_NMI   CPU_INTERRUPT_TGT_EXT_3 /* TOC */
+
 /* Taken from Linux kernel: arch/parisc/include/asm/psw.h */
 #define PSW_I0x0001
 #define PSW_D0x0002
diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 0a629ffa7c..fe8a9ce493 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -80,6 +80,7 @@ DEF_HELPER_FLAGS_0(read_interval_timer, TCG_CALL_NO_RWG, tr)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(halt, noreturn, env)
 DEF_HELPER_1(reset, noreturn, env)
+DEF_HELPER_1(getshadowregs, void, env)
 DEF_HELPER_1(rfi, void, env)
 DEF_HELPER_1(rfi_r, void, env)
 DEF_HELPER_FLAGS_2(write_interval_timer,

[PATCH v2 4/6] hw/display/artist: rewrite vram access mode handling

2022-01-31 Thread Helge Deller

From: Sven Schnelle 

When writing this code it was assumed that register 0x118000 is the
buffer access mode for color map accesses. It turned out that this
is wrong. Instead register 0x118000 sets both src and dst buffer
access mode at the same time.

This required a larger rewrite of the code. The good thing is that
both the linear framebuffer and the register based vram access can
now be combined into one function.

This makes the linux 'stifb' framebuffer work, and both HP-UX 10.20
and HP-UX 11.11 are still working.

Signed-off-by: Sven Schnelle 
Signed-off-by: Helge Deller 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Helge Deller 
---
 hw/display/artist.c | 416 
 hw/display/trace-events |   8 +-
 2 files changed, 166 insertions(+), 258 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 21b7fd1b44..442bdbc130 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -91,7 +91,6 @@ struct ARTISTState {
 uint32_t reg_300208;
 uint32_t reg_300218;

-uint32_t cmap_bm_access;
 uint32_t dst_bm_access;
 uint32_t src_bm_access;
 uint32_t control_plane;
@@ -134,7 +133,7 @@ typedef enum {
 PATTERN_LINE_START = 0x100ecc,
 LINE_SIZE = 0x100e04,
 LINE_END = 0x100e44,
-CMAP_BM_ACCESS = 0x118000,
+DST_SRC_BM_ACCESS = 0x118000,
 DST_BM_ACCESS = 0x118004,
 SRC_BM_ACCESS = 0x118008,
 CONTROL_PLANE = 0x11800c,
@@ -176,7 +175,7 @@ static const char *artist_reg_name(uint64_t addr)
 REG_NAME(TRANSFER_DATA);
 REG_NAME(CONTROL_PLANE);
 REG_NAME(IMAGE_BITMAP_OP);
-REG_NAME(CMAP_BM_ACCESS);
+REG_NAME(DST_SRC_BM_ACCESS);
 REG_NAME(DST_BM_ACCESS);
 REG_NAME(SRC_BM_ACCESS);
 REG_NAME(CURSOR_POS);
@@ -222,40 +221,14 @@ static void artist_invalidate_lines(struct vram_buffer 
*buf,
 }
 }

-static int vram_write_pix_per_transfer(ARTISTState *s)
-{
-if (s->cmap_bm_access) {
-return 1 << ((s->cmap_bm_access >> 27) & 0x0f);
-} else {
-return 1 << ((s->dst_bm_access >> 27) & 0x0f);
-}
-}
-
-static int vram_pixel_length(ARTISTState *s)
-{
-if (s->cmap_bm_access) {
-return (s->cmap_bm_access >> 24) & 0x07;
-} else {
-return (s->dst_bm_access >> 24) & 0x07;
-}
-}
-
 static int vram_write_bufidx(ARTISTState *s)
 {
-if (s->cmap_bm_access) {
-return (s->cmap_bm_access >> 12) & 0x0f;
-} else {
-return (s->dst_bm_access >> 12) & 0x0f;
-}
+return (s->dst_bm_access >> 12) & 0x0f;
 }

 static int vram_read_bufidx(ARTISTState *s)
 {
-if (s->cmap_bm_access) {
-return (s->cmap_bm_access >> 12) & 0x0f;
-} else {
-return (s->src_bm_access >> 12) & 0x0f;
-}
+return (s->src_bm_access >> 12) & 0x0f;
 }

 static struct vram_buffer *vram_read_buffer(ARTISTState *s)
@@ -352,130 +325,6 @@ static void artist_invalidate_cursor(ARTISTState *s)
 y, s->cursor_height);
 }

-static void vram_bit_write(ARTISTState *s, int posy, bool incr_x,
-   int size, uint32_t data)
-{
-struct vram_buffer *buf;
-uint32_t vram_bitmask = s->vram_bitmask;
-int mask, i, pix_count, pix_length;
-unsigned int posx, offset, width;
-uint8_t *data8, *p;
-
-pix_count = vram_write_pix_per_transfer(s);
-pix_length = vram_pixel_length(s);
-
-buf = vram_write_buffer(s);
-width = buf->width;
-
-if (s->cmap_bm_access) {
-offset = s->vram_pos;
-} else {
-posx = ADDR_TO_X(s->vram_pos >> 2);
-posy += ADDR_TO_Y(s->vram_pos >> 2);
-offset = posy * width + posx;
-}
-
-if (!buf->size || offset >= buf->size) {
-return;
-}
-
-p = buf->data;
-
-if (pix_count > size * 8) {
-pix_count = size * 8;
-}
-
-switch (pix_length) {
-case 0:
-if (s->image_bitmap_op & 0x2000) {
-data &= vram_bitmask;
-}
-
-for (i = 0; i < pix_count; i++) {
-uint32_t off = offset + pix_count - 1 - i;
-if (off < buf->size) {
-artist_rop8(s, buf, off,
-(data & 1) ? (s->plane_mask >> 24) : 0);
-}
-data >>= 1;
-}
-memory_region_set_dirty(&buf->mr, offset, pix_count);
-break;
-
-case 3:
-if (s->cmap_bm_access) {
-if (offset + 3 < buf->size) {
-*(uint32_t *)(p + offset) = data;
-}
-break;
-}
-data8 = (uint8_t *)&data;
-
-for (i = 3; i >= 0; i--) {
-if (!(s->image_bitmap_op & 0x2000) ||
-s->vram_bitmask & (1 << (28 + i))) {
-uint32_t off = offset + 3 - i;
-if (off < buf->size) {
-artist_rop8(s, buf, off, data8[ROP8OFF(i)]);
-}
-}
-}
-memory_region_set_dirty(&buf->mr, offset, 3);
-break;
-
-case 6:
-switch (size) {
-

Re: [PATCH v2] qsd: Document fuse's allow-other option

2022-01-31 Thread Eric Blake

On Mon, Jan 31, 2022 at 11:31:24AM +0100, Hanna Reitz wrote:
> We did not add documentation to the storage daemon's man page for fuse's
> allow-other option when it was introduced, so do that now.
> 
> Fixes: 8fc54f9428b9763f800 ("export/fuse: Add allow-other option")
> Signed-off-by: Hanna Reitz 
> ---
> v2:
> - Replaced instances of "QSD" by more generic descriptions, as suggested
>   by Kevin.
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[PATCH v2 2/6] hw/hppa: Allow up to 16 emulated CPUs

2022-01-31 Thread Helge Deller

This brings the hppa_hardware.h file in sync with the copy in the
SeaBIOS-hppa sources.

In order to support up to 16 CPUs, it's required to move the HPA for
MEMORY_HPA out of the address space of the 16th CPU.

Signed-off-by: Helge Deller 
---
 hw/hppa/hppa_hardware.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/hppa/hppa_hardware.h b/hw/hppa/hppa_hardware.h
index bc258895c9..5edf577563 100644
--- a/hw/hppa/hppa_hardware.h
+++ b/hw/hppa/hppa_hardware.h
@@ -25,7 +25,7 @@
 #define LASI_GFX_HPA0xf800
 #define ARTIST_FB_ADDR  0xf900
 #define CPU_HPA 0xfffb
-#define MEMORY_HPA  0xfffbf000
+#define MEMORY_HPA  0xf000

 #define PCI_HPA DINO_HPA/* PCI bus */
 #define IDE_HPA 0xf900  /* Boot disc controller */
@@ -43,9 +43,10 @@
 #define PORT_SERIAL1(DINO_UART_HPA + 0x800)
 #define PORT_SERIAL2(LASI_UART_HPA + 0x800)

-#define HPPA_MAX_CPUS   8   /* max. number of SMP CPUs */
+#define HPPA_MAX_CPUS   16  /* max. number of SMP CPUs */
 #define CPU_CLOCK_MHZ   250 /* emulate a 250 MHz CPU */

 #define CPU_HPA_CR_REG  7   /* store CPU HPA in cr7 (SeaBIOS internal) */
+#define PIM_STORAGE_SIZE 600   /* storage size of pdc_pim_toc_struct (64bit) */

 #endif
--
2.34.1

Re: [PATCH] qapi/block: Cosmetic change in BlockExportType schema

2022-01-31 Thread Eric Blake

On Sun, Jan 30, 2022 at 07:50:41PM +0100, Philippe Mathieu-Daudé wrote:
> On 28/1/22 21:54, Eric Blake wrote:
> > On Wed, Jan 19, 2022 at 01:14:39PM +0100, Philippe Mathieu-Daudé wrote:
> > > From: Philippe Mathieu-Daude 

'git am' used this line to insert the authorship...

> > > 
> > > From: Philippe Mathieu-Daudé 

...then left this line in the commit body, which I manually deleted,
without spotting the difference between the two.

> > 
> > The doubled From: looks odd here.  I'll double-check that git doesn't
> > mess up the actual commit once I apply the patch.
> 
> I played with the git --from option to not appear in the list as
> '"Philippe Mathieu-Daudé via" ':
> https://lore.kernel.org/qemu-devel/efc5f304-f3d2-ff7b-99a6-673595ff0...@amsat.org/
> by using a different sendemail.from (removing the acute in my
> lastname) to force a correct author.from.
> git-am should have picked the 2nd form, but I see the 1st in commit
> 3a8fa0edd1. Just curious, did you have to modify it manually?

Alas, since I managed to overlook the change in the acute (I suppose
I'm cursed with having a boring name, so unlike many list participants
who are overjoyed by the power of UTF-8 to make self-expression more
accurate, I have not had as much experience with thinking about it),
my manual edits explain why the merged commit ended up with a less
desirable spelling.  I apologize for the mishap.  Do we need/want a
.mailmap entry to aid git at listing your preferred spelling?

> 
> Anyway, thanks for merging this :)

And thanks for bearing with developers that are still learning to
overcome accidental cultural bias!

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[PATCH v2 0/6] Fixes and updates for hppa target

2022-01-31 Thread Helge Deller

This patchset fixes some important bugs in the hppa artist graphics driver:
- Fix various artist graphics misrenderings on HP-UX and Linux
- Fix draw_line() function on artist graphic
- Fix mouse cursor positioning and focus under HP-UX

New qemu features for hppa:
- Allow up to 16 emulated CPUs (instead of 8)
- Add support for an emulated TOC/NMI button

A new Seabios-hppa firmware:
- Update SeaBIOS-hppa to VERSION 3
- New opt/hostid fw_cfg option to change hostid
- Add opt/console fw_cfg option to select default console
- Added 16x32 font to STI firmware

Signed-off-by: Helge Deller 

Helge Deller (4):
  seabios-hppa: Update SeaBIOS-hppa to VERSION 3
  hw/hppa: Allow up to 16 emulated CPUs
  hppa: Add support for an emulated TOC/NMI button.
  hw/display/artist: Mouse cursor fixes for HP-UX

Sven Schnelle (2):
  hw/display/artist: rewrite vram access mode handling
  hw/display/artist: Fix draw_line() artefacts

 hw/display/artist.c   | 463 --
 hw/display/trace-events   |   8 +-
 hw/hppa/hppa_hardware.h   |   5 +-
 hw/hppa/machine.c |  35 ++-
 pc-bios/hppa-firmware.img | Bin 757144 -> 701964 bytes
 roms/seabios-hppa |   2 +-
 target/hppa/cpu.c |   2 +-
 target/hppa/cpu.h |   5 +
 target/hppa/helper.h  |   1 +
 target/hppa/insns.decode  |   1 +
 target/hppa/int_helper.c  |  19 +-
 target/hppa/op_helper.c   |   7 +-
 target/hppa/translate.c   |  10 +
 13 files changed, 283 insertions(+), 275 deletions(-)

-- 
2.34.1

[PATCH v2 6/6] hw/display/artist: Fix draw_line() artefacts

2022-01-31 Thread Helge Deller

From: Sven Schnelle 

The draw_line() function left artefacts on the screen because it was using the
x/y variables which were incremented in the loop before. Fix it by using the
unmodified x1/x2 variables instead.

Signed-off-by: Sven Schnelle 
Signed-off-by: Helge Deller 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Helge Deller 
---
 hw/display/artist.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 8a9fa482d0..1d877998b9 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -553,10 +553,11 @@ static void draw_line(ARTISTState *s,
 }
 x++;
 } while (x <= x2 && (max_pix == -1 || --max_pix > 0));
+
 if (c1)
-artist_invalidate_lines(buf, x, dy+1);
+artist_invalidate_lines(buf, x1, x2 - x1);
 else
-artist_invalidate_lines(buf, y, dx+1);
+artist_invalidate_lines(buf, y1 > y2 ? y2 : y1, x2 - x1);
 }

 static void draw_line_pattern_start(ARTISTState *s)
--
2.34.1

[PATCH v2 5/6] hw/display/artist: Mouse cursor fixes for HP-UX

2022-01-31 Thread Helge Deller

This patch fixes the behaviour and positioning of the X11 mouse cursor in HP-UX.

The current code failed to subtract the offset held in the CURSOR_CTRL register
from the current mouse cursor position. The HP-UX graphics driver stores in this
register the offset of the mouse graphics relative to the current cursor
position.  Without this adjustment the mouse behaves strangely at the screen
borders.

Additionally, depending on the HP-UX version, the mouse cursor position
in the cursor_pos register reports different values. To accommodate this,
track the current min and max reported values and auto-adjust at runtime.

With this fix the mouse now behaves as expected on HP-UX 10 and 11.

Signed-off-by: Helge Deller 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Helge Deller 
---
 hw/display/artist.c | 42 ++
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 442bdbc130..8a9fa482d0 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -80,6 +80,7 @@ struct ARTISTState {
 uint32_t line_pattern_skip;

 uint32_t cursor_pos;
+uint32_t cursor_cntrl;

 uint32_t cursor_height;
 uint32_t cursor_width;
@@ -301,19 +302,42 @@ static void artist_get_cursor_pos(ARTISTState *s, int *x, int *y)
 {
 /*
  * Don't know whether these magic offset values are configurable via
- * some register. They are the same for all resolutions, so don't
- * bother about it.
+ * some register. They seem to be the same for all resolutions.
+ * The cursor values provided in the registers are:
+ * X-value: -295 (for HP-UX 11) and 338 (for HP-UX 10.20) up to 2265
+ * Y-value: 1146 down to 0
+ * The emulated Artist graphic is like a CRX graphic, and as such
+ * it's usually fixed at 1280x1024 pixels.
+ * Because of the maximum Y-value of 1146 you can not choose a higher
+ * vertical resolution on HP-UX (unless you disable the mouse).
  */

-*y = 0x47a - artist_get_y(s->cursor_pos);
-*x = ((artist_get_x(s->cursor_pos) - 338) / 2);
+static int offset = 338;
+int lx;
+
+/* ignore if uninitialized */
+if (s->cursor_pos == 0) {
+*x = *y = 0;
+return;
+}
+
+lx = artist_get_x(s->cursor_pos);
+if (lx < offset)
+offset = lx;
+*x = (lx - offset) / 2;
+
+*y = 1146 - artist_get_y(s->cursor_pos);
+
+/* subtract cursor offset from cursor control register */
+*x -= (s->cursor_cntrl & 0xf0) >> 4;
+*y -= (s->cursor_cntrl & 0x0f);

 if (*x > s->width) {
-*x = 0;
+*x = s->width;
 }

 if (*y > s->height) {
-*y = 0;
+*y = s->height;
 }
 }

@@ -1027,6 +1051,7 @@ static void artist_reg_write(void *opaque, hwaddr addr, uint64_t val,
 break;

 case CURSOR_CTRL:
+combine_write_reg(addr, val, size, &s->cursor_cntrl);
 break;

 case IMAGE_BITMAP_OP:
@@ -1331,8 +1356,8 @@ static int vmstate_artist_post_load(void *opaque, int version_id)

 static const VMStateDescription vmstate_artist = {
 .name = "artist",
-.version_id = 1,
-.minimum_version_id = 1,
+.version_id = 2,
+.minimum_version_id = 2,
 .post_load = vmstate_artist_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT16(height, ARTISTState),
@@ -1352,6 +1377,7 @@ static const VMStateDescription vmstate_artist = {
 VMSTATE_UINT32(line_end, ARTISTState),
 VMSTATE_UINT32(line_xy, ARTISTState),
 VMSTATE_UINT32(cursor_pos, ARTISTState),
+VMSTATE_UINT32(cursor_cntrl, ARTISTState),
 VMSTATE_UINT32(cursor_height, ARTISTState),
 VMSTATE_UINT32(cursor_width, ARTISTState),
 VMSTATE_UINT32(plane_mask, ARTISTState),
--
2.34.1
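
The cursor handling described in this patch can be sketched in isolation as
follows. This is only a minimal standalone model, not the code in artist.c:
the struct cursor_model and the function cursor_to_screen() are hypothetical
names, the raw X/Y values are passed in directly instead of being decoded
from cursor_pos, and the constants 338 and 1146 are taken from the comment
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few ARTISTState fields involved. */
struct cursor_model {
    int min_raw_x;          /* smallest raw X seen so far; the patch starts at 338 */
    uint32_t cursor_cntrl;  /* cursor offset: X in bits 7..4, Y in bits 3..0 */
    int width, height;      /* visible resolution, e.g. 1280x1024 */
};

/* Convert raw cursor register values into screen coordinates. */
static void cursor_to_screen(struct cursor_model *m, int raw_x, int raw_y,
                             int *x, int *y)
{
    assert(m && x && y);

    /* Auto-adjust: remember the lowest X the guest has ever reported. */
    if (raw_x < m->min_raw_x) {
        m->min_raw_x = raw_x;
    }
    *x = (raw_x - m->min_raw_x) / 2;
    *y = 1146 - raw_y;

    /* Subtract the cursor offset stored in the control register. */
    *x -= (int)((m->cursor_cntrl & 0xf0) >> 4);
    *y -= (int)(m->cursor_cntrl & 0x0f);

    /* Clamp to the screen edge instead of jumping back to 0. */
    if (*x > m->width) {
        *x = m->width;
    }
    if (*y > m->height) {
        *y = m->height;
    }
}
```

Keeping the running minimum in the model is what lets the same code serve
both HP-UX versions, since they start from different raw X values.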

Re: [RFC PATCH 1/1] i386: Remove features from Epyc-Milan cpu

2022-01-31 Thread Leonardo Bras Soares Passos

On Mon, Jan 31, 2022 at 3:04 PM Daniel P. Berrangé  wrote:
>
> On Mon, Jan 31, 2022 at 02:56:38PM -0300, Leonardo Bras Soares Passos wrote:
> > Hello Daniel,
> >
> > On Mon, Jan 31, 2022 at 6:08 AM Daniel P. Berrangé  wrote:
> > >
> > > CC'ing  Babu Moger who added the Milan CPU model.
> > >
> > > On Sat, Jan 29, 2022 at 07:23:37AM -0300, Leonardo Bras wrote:
> > > > While trying to bring a VM with EPYC-Milan cpu on a host with
> > > > EPYC-Milan cpu (EPYC 7313), the following warning can be seen:
> > > >
> > > > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
> > > > qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
> > > >
> > > > Even with this warning, the host goes up.
> > > >
> > > > Then, grep'ing cpuid output on both guest and host, outputs:
> > > >
> > > > extended feature flags (7):
> > > >   enhanced REP MOVSB/STOSB = false
> > > >   fast short REP MOV   = false
> > > >   (simple synth)  = AMD EPYC (3rd Gen) (Milan B1) [Zen 3], 7nm
> > > >brand = "AMD EPYC 7313 16-Core Processor   "
> > > >
> > > > This means that for the same -cpu model (EPYC-Milan), the vcpu may or may
> > > > not have the above feature bits set, which is usually not a good idea for
> > > > live migration:
> > > > Migrating from a host with these features to a host without them can
> > > > be troublesome for the guest.
> > > >
> > > > Remove the "optional" features (erms, fsrm) from Epyc-Milan, in order to
> > > > avoid possible after-migration guest issues.
> > >
> > > Babu,  can you give some insight into availability of erms / fsrm
> > > features across the EPYC 3rd gen CPU line. Is this example missing
> > > erms/fsrm an exception, or common place ?
> > >
> > > >
> > > > Signed-off-by: Leonardo Bras 
> > > > ---
> > > >
> > > > Does this make sense? Or maybe I am missing something here.
> > > >
> > > > Having a kvm guest running with a feature bit, while the host
> > > > does not support it, seems likely to break the guest.
> > >
> > > The guest won't see the feature bit - that warning message from QEMU
> > > is telling you that it didn't honour the request to expose
> > > erms / fsrm - it has dropped them from the CPUID exposed to the guest.
> >
> > Exactly.
> > What I meant here is:
> > 1 - Host with these feature bits start a VM with EPYC-Milan cpu (and
> > thus have those bits enabled)
> > 2 - Guest is migrated to a host such as the above, which does not
> > support those features (bits disabled), but does support EPYC-Milan
> > cpus (without those features).
> > 3 - The migration should be allowed, given the same cpu types. Then
> > either we have:
> > 3a : The guest vcpu stays with the flag enabled (case I tried to
> > explain above), possibly crashing if the new feature is used, or
> > 3b: The guest vcpu disables the flag due to incompatibility, which
> > may confuse the guest due to the cpu change, and it may even end up
> > trying to use the new feature although it's disabled.
>
> Neither should happen with a correctly written mgmt app in charge.
>
> When launching a QEMU process for an incoming migration, it is expected
> that the mgmt app has first queried QEMU on the source to get the precise
> CPU model + flags that were added/removed on the source. The QEMU on
> the target is then launched with this exact set of flags, and the
> 'check' flag is also set for -cpu. That will cause QEMU on the target
> to refuse to start unless it can give the guest the 100% identical
> CPUID to what has been requested on the CLI, and thus matching the
> source.
>
> Libvirt will ensure all this is done correctly. If not using libvirt
> then you've got a bunch of work to do to achieve this. It certainly
> isn't sufficient to merely use the same plain '-cpu' arg that the
> source was originally booted with, unless you have 100% identical
> hardware, microcode, and software on both hosts, or the target host
> offers a superset of features.

Oh, that is very interesting! Thanks for sharing!

Well, then at least one unexpected scenario should happen:
- VM with EPYC-Milan cpu, created in source host
- Source host with EPYC-Milan cpu. Support for 'extra features'
enabled ( erms / fsrm in this ex.)
- Target host with EPYC-Milan cpu. No support for 'extra features'.
Since the VM will be created with support for 'extra features', trying
to migrate from source host to target host should fail, right?

Which is, IMHO, odd. I imagine questions like:
- "How can a host with an EPYC-Milan cpu not support receiving
a live migration of some VMs with EPYC-Milan cpu?", or even
- "If I can create a VM with EPYC-Milan cpu on that host, why can't I
receive (via migration) some VMs with EPYC-Milan CPU ?"

But I am new to live migration, so maybe I am getting something wrong
regarding the cpu-model idea.

Best regards,
Leo
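
To make the "target host offers a superset of features" condition from the
quoted explanation concrete, here is a minimal sketch. It is not how QEMU or
libvirt implement the check: the flat feature bitmask and the names FEAT_*,
missing_features() and can_accept_migration() are hypothetical, and real
CPUID features are spread over several registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flat feature bits, for illustration only. */
#define FEAT_BASE (UINT64_C(1) << 0)  /* stands for the rest of the model */
#define FEAT_ERMS (UINT64_C(1) << 1)
#define FEAT_FSRM (UINT64_C(1) << 2)

/* Features the guest was started with that the target cannot provide. */
static uint64_t missing_features(uint64_t guest, uint64_t target_host)
{
    return guest & ~target_host;
}

/*
 * Models the effect of starting the target with the exact flag set and
 * "check": refuse unless every guest feature is available on the target.
 */
static bool can_accept_migration(uint64_t guest, uint64_t target_host)
{
    return missing_features(guest, target_host) == 0;
}
```

In this model a guest started with erms and fsrm cannot move to a host
lacking them, while a guest started without them can move in either
direction.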




Re: [PATCH] replay: use CF_NOIRQ for special exception-replaying TB

2022-01-31 Thread Richard Henderson


On 1/31/22 22:25, Pavel Dovgalyuk wrote:

Commit aff0e204cb1f1c036a496c94c15f5dfafcd9b4b4 introduced CF_NOIRQ usage,
but one case was forgotten. Record/replay uses one special TB which is not
really executed, but used to cause a correct exception in replay mode.
This patch adds CF_NOIRQ flag for such block.

Signed-off-by: Pavel Dovgalyuk
---
  accel/tcg/cpu-exec.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~

[PULL 13/40] bsd-user/arm/target_arch_cpu.h: Correct code pointer

2022-01-31 Thread Warner Losh

The code has moved in FreeBSD since the emulator was started, update the
comment to reflect that change.

Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/arm/target_arch_cpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bsd-user/arm/target_arch_cpu.h b/bsd-user/arm/target_arch_cpu.h
index 05b19ce6119..905f13aa1b9 100644
--- a/bsd-user/arm/target_arch_cpu.h
+++ b/bsd-user/arm/target_arch_cpu.h
@@ -73,7 +73,7 @@ static inline void target_cpu_loop(CPUARMState *env)
 int32_t syscall_nr = n;
 int32_t arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8;
 
-/* See arm/arm/trap.c cpu_fetch_syscall_args() */
+/* See arm/arm/syscall.c cpu_fetch_syscall_args() */
 if (syscall_nr == TARGET_FREEBSD_NR_syscall) {
 syscall_nr = env->regs[0];
 arg1 = env->regs[1];
-- 
2.33.1

[PULL 25/40] bsd-user/signal.c: Implement rewind_if_in_safe_syscall

2022-01-31 Thread Warner Losh

Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/qemu.h   |  2 ++
 bsd-user/signal.c | 13 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 49f01932a53..8ed1bfbca89 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -446,4 +446,6 @@ static inline void *lock_user_string(abi_ulong guest_addr)
 
 #include 
 
+#include "user/safe-syscall.h"
+
 #endif /* QEMU_H */
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index db8cf0a08f1..454aef2993e 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -48,6 +48,18 @@ int target_to_host_signal(int sig)
 return sig;
 }
 
+/* Adjust the signal context to rewind out of safe-syscall if we're in it */
+static inline void rewind_if_in_safe_syscall(void *puc)
+{
+ucontext_t *uc = (ucontext_t *)puc;
+uintptr_t pcreg = host_signal_pc(uc);
+
+if (pcreg > (uintptr_t)safe_syscall_start
+&& pcreg < (uintptr_t)safe_syscall_end) {
+host_signal_set_pc(uc, (uintptr_t)safe_syscall_start);
+}
+}
+
 static bool has_trapno(int tsig)
 {
 return tsig == TARGET_SIGILL ||
@@ -57,7 +69,6 @@ static bool has_trapno(int tsig)
 tsig == TARGET_SIGTRAP;
 }
 
-
 /* Siginfo conversion. */
 
 /*
-- 
2.33.1
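
The range check in rewind_if_in_safe_syscall() can be looked at on its own
with plain integers. A minimal sketch, assuming hypothetical addresses in
place of the safe_syscall_start and safe_syscall_end symbols; rewind_pc() is
not a function from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the program counter to resume at: if pc lies strictly inside
 * (start, end), rewind it to start so the whole sequence is re-run;
 * otherwise leave it alone. Both bounds are exclusive, as in the patch.
 */
static uintptr_t rewind_pc(uintptr_t pc, uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    if (pc > start && pc < end) {
        return start;
    }
    return pc;
}
```

A pc equal to start needs no rewind, and a pc equal to end is left untouched
because the lower and upper comparisons are both strict.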

[PULL 18/40] bsd-user/signal.c: Add si_type argument to queue_signal

2022-01-31 Thread Warner Losh

Mirror the linux-user practice and add a si_type argument to
queue_signal. This will be transported as the upper 8 bits in the si_code
element of siginfo so that we know what bits of the structure are valid
and so we can properly implement host_to_target_siginfo_noswap and
tswap_siginfo. Adapt the one caller of queue_signal to the new
interface.  Use all the same names as Linux (except _RT which we don't
treat differently, unlike Linux), though some are unused. Place this
into signal-common.h since that's a better place given bsd-user's
structure. Move prototype of queue_signal to signal-common.h to mirror
linux-user's location.

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal-common.h | 26 +-
 bsd-user/signal.c|  5 +++--
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/bsd-user/signal-common.h b/bsd-user/signal-common.h
index efed23d9efb..80e9503238a 100644
--- a/bsd-user/signal-common.h
+++ b/bsd-user/signal-common.h
@@ -15,8 +15,32 @@ long do_sigreturn(CPUArchState *env);
 void force_sig_fault(int sig, int code, abi_ulong addr);
 int host_to_target_signal(int sig);
 void process_pending_signals(CPUArchState *env);
-void queue_signal(CPUArchState *env, int sig, target_siginfo_t *info);
+void queue_signal(CPUArchState *env, int sig, int si_type,
+  target_siginfo_t *info);
 void signal_init(void);
 int target_to_host_signal(int sig);
 
+/*
+ * Within QEMU the top 8 bits of si_code indicate which of the parts of the
+ * union in target_siginfo is valid. This only applies between
+ * host_to_target_siginfo_noswap() and tswap_siginfo(); it does not appear
+ * either within host siginfo_t or in target_siginfo structures which we get
+ * from the guest userspace program. Linux kernels use this internally, but BSD
+ * kernels don't do this, but it's a useful abstraction.
+ *
+ * The linux-user version of this uses the top 16 bits, but FreeBSD's SI_USER
+ * and other signal independent SI_ codes have bit 16 set, so we only use the top
+ * byte instead.
+ *
+ * For FreeBSD, we have si_pid, si_uid, si_status, and si_addr always. Linux and
+ * {Open,Net}BSD have a different approach (where their reason field is larger,
+ * but whose siginfo has fewer fields always).
+ */
+#define QEMU_SI_NOINFO   0  /* nothing other than si_signo valid */
+#define QEMU_SI_FAULT1  /* _fault is valid in _reason */
+#define QEMU_SI_TIMER2  /* _timer is valid in _reason */
+#define QEMU_SI_MESGQ3  /* _mesgq is valid in _reason */
+#define QEMU_SI_POLL 4  /* _poll is valid in _reason */
+#define QEMU_SI_CAPSICUM 5  /* _capsicum is valid in _reason */
+
 #endif
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 3ef7cf5e23c..ad8437a8bfb 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -50,7 +50,8 @@ int target_to_host_signal(int sig)
  * Queue a signal so that it will be send to the virtual CPU as soon as
  * possible.
  */
-void queue_signal(CPUArchState *env, int sig, target_siginfo_t *info)
+void queue_signal(CPUArchState *env, int sig, int si_type,
+  target_siginfo_t *info)
 {
 qemu_log_mask(LOG_UNIMP, "No signal queueing, dropping signal %d\n", sig);
 }
@@ -91,7 +92,7 @@ void force_sig_fault(int sig, int code, abi_ulong addr)
 info.si_errno = 0;
 info.si_code = code;
 info.si_addr = addr;
-queue_signal(env, sig, &info);
+queue_signal(env, sig, QEMU_SI_FAULT, &info);
 }
 
 static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
-- 
2.33.1
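
The comment above describes carrying si_type in the top 8 bits of si_code.
One way this could look is sketched below. The helper names pack_si_code(),
unpack_si_type() and unpack_si_code() are hypothetical and not part of this
patch, which only adds the argument and the QEMU_SI_* constants; the value
0x10001 is used as an example of a code with bit 16 set.

```c
#include <assert.h>
#include <stdint.h>

#define QEMU_SI_NOINFO 0  /* values as defined in the patch */
#define QEMU_SI_FAULT  1

/* Put si_type into bits 31..24 and keep the low 24 bits of si_code. */
static int32_t pack_si_code(int32_t si_code, int si_type)
{
    assert(si_type >= 0 && si_type <= 0x7f);
    assert(si_code >= -0x800000 && si_code <= 0x7fffff);
    return (int32_t)(((uint32_t)si_type << 24) |
                     ((uint32_t)si_code & 0x00ffffffu));
}

/* Recover the type tag from the top byte. */
static int unpack_si_type(int32_t packed)
{
    return (int)((uint32_t)packed >> 24);
}

/* Recover the original si_code, sign-extending from 24 bits. */
static int32_t unpack_si_code(int32_t packed)
{
    int32_t low = (int32_t)((uint32_t)packed & 0x00ffffffu);
    return (low & 0x800000) ? low - 0x1000000 : low;
}
```

With only the top byte used, a code that has bit 16 set survives the round
trip, which a 16/16 split would not allow.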

[PULL 19/40] bsd-user/host/arm/host-signal.h: Implement host_signal_*

2022-01-31 Thread Warner Losh

Implement host_signal_pc, host_signal_set_pc and host_signal_write for
arm.

Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/host/arm/host-signal.h | 35 +
 1 file changed, 35 insertions(+)
 create mode 100644 bsd-user/host/arm/host-signal.h

diff --git a/bsd-user/host/arm/host-signal.h b/bsd-user/host/arm/host-signal.h
new file mode 100644
index 000..56679bd6993
--- /dev/null
+++ b/bsd-user/host/arm/host-signal.h
@@ -0,0 +1,35 @@
+/*
+ * host-signal.h: signal info dependent on the host architecture
+ *
+ * Copyright (c) 2021 Warner Losh
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ARM_HOST_SIGNAL_H
+#define ARM_HOST_SIGNAL_H
+
+#include 
+
+static inline uintptr_t host_signal_pc(ucontext_t *uc)
+{
+return uc->uc_mcontext.__gregs[_REG_PC];
+}
+
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+uc->uc_mcontext.__gregs[_REG_PC] = pc;
+}
+
+static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
+{
+/*
+ * In the FSR, bit 11 is WnR. FreeBSD returns this as part of the
+ * si_info.si_trapno.
+ */
+uint32_t fsr = info->si_trapno;
+
+return extract32(fsr, 11, 1);
+}
+
+#endif
-- 
2.33.1
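
For readers unfamiliar with the bit extraction used in host_signal_write(),
here is a standalone sketch of the same test. The helper field32() is a local
stand-in written for this example, modelled on the extract32(value, start,
length) call in the patch; fsr_is_write() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return 'length' bits of 'value' starting at bit position 'start'. */
static uint32_t field32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~UINT32_C(0) >> (32 - length));
}

/* Bit 11 of the fault status value is WnR: set when the access was a write. */
static bool fsr_is_write(uint32_t fsr)
{
    return field32(fsr, 11, 1) != 0;
}
```

Only bit 11 matters for the result; the other status bits are ignored.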

Re: [RFC 4/5] target/riscv: Add envcfg CSRs support

2022-01-31 Thread Atish Kumar Patra

On Fri, Jan 28, 2022 at 5:50 PM angell1518  wrote:

>
> 在 2022/1/29 上午9:28, Atish Patra 写道:
>
>
>
> On Wed, Jan 26, 2022 at 12:37 AM Weiwei Li  wrote:
>
>>
>> 在 2022/1/21 上午4:07, Atish Patra 写道:
>> > The RISC-V privileged specification v1.12 defines a few execution
>> > environment configuration CSRs that can be used to enable/disable
>> > extensions per privilege levels.
>> >
>> > Add the basic support for these CSRs.
>> >
>> > Signed-off-by: Atish Patra 
>> > ---
>> >   target/riscv/cpu.h  |  8 
>> >   target/riscv/cpu_bits.h | 31 +++
>> >   target/riscv/csr.c  | 84 +
>> >   target/riscv/machine.c  | 26 +
>> >   4 files changed, 149 insertions(+)
>> >
>> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> > index 7f87917204c5..b9462300a472 100644
>> > --- a/target/riscv/cpu.h
>> > +++ b/target/riscv/cpu.h
>> > @@ -264,6 +264,14 @@ struct CPURISCVState {
>> >   target_ulong spmbase;
>> >   target_ulong upmmask;
>> >   target_ulong upmbase;
>> > +
>> > +/* CSRs for execution enviornment configuration */
>> > +
>> > +target_ulong menvcfg;
>> > +target_ulong menvcfgh;
>>
>> I think we needn't maintain separate menvcfg and menvcfgh, just use
>> "uint64_t menvcfg" as the way of mstatus.
>>
>>
> unlike mstatush, menvcfgh/henvcfgh will be accessed directly to do runtime
> predicate for stimecmp/vstimecmp.
>
> With a unified variable we would have to do the 32 bit shifting during
> every check, which makes the code hard to read; separate variables avoid
> that at the cost of 2 ulongs.
>
> IMO, having separate variables is much simpler.
>
> Do you mean check STCE/VSTCE bit in menvcfg/henvcfg?
>
> If so, I think use a simple "uint64_t menvcfg/henvcfg" may be better,
> then we can only check the 63 bit of them.
>

Which is a bit confusing as the STCE bit in menvcfgh/henvcfgh is 31 not 63.
But that's my personal preference.

I will just leave a comment to clarify the confusion for now. I will send a
patch with unified menvcfg and wait for others' feedback.
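
The two views under discussion can be compared side by side. A minimal
sketch, assuming STCE sits at bit 63 of the 64-bit register as in the quoted
patch; the helper names are hypothetical and not taken from target/riscv.
Note that the constants are built from 64-bit and 32-bit unsigned values,
since shifting a plain int by 62 or 63 (as in the quoted "(1 << 63)") is
undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENVCFG_STCE   (UINT64_C(1) << 63)  /* bit 63 of the unified value */
#define ENVCFGH_STCE  (UINT32_C(1) << 31)  /* same bit seen via the high half */

/* Unified storage: one 64-bit variable, checked directly. */
static bool stce_unified(uint64_t envcfg)
{
    return (envcfg & ENVCFG_STCE) != 0;
}

/* Split storage: the high half lives in its own 32-bit variable. */
static bool stce_split(uint32_t envcfgh)
{
    return (envcfgh & ENVCFGH_STCE) != 0;
}

/* What a read of the *h CSR would return from unified storage. */
static uint32_t envcfg_high(uint64_t envcfg)
{
    return (uint32_t)(envcfg >> 32);
}

/* What a write of the *h CSR would do to unified storage. */
static uint64_t envcfg_write_high(uint64_t envcfg, uint32_t high)
{
    return (envcfg & UINT64_C(0xffffffff)) | ((uint64_t)high << 32);
}
```

With unified storage the shift appears only in the accessors for the high
half, while the runtime check stays a single mask.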

> Or we should decide where to get this bit from (menvcfg/henvcfg, or
> menvcfgh/henvcfgh) based on the MXLEN/HSXLEN.
>
> Regards,
>
> Weiwei Li
>
>
> Similar to henvcfg and henvcfgh.
>>
>> > +target_ulong senvcfg;
>> > +target_ulong henvcfg;
>> > +target_ulong henvcfgh;
>> >   #endif
>> >
>> >   float_status fp_status;
>> > diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
>> > index f6f90b5cbd52..afb237c2313b 100644
>> > --- a/target/riscv/cpu_bits.h
>> > +++ b/target/riscv/cpu_bits.h
>> > @@ -177,6 +177,9 @@
>> >   #define CSR_STVEC   0x105
>> >   #define CSR_SCOUNTEREN  0x106
>> >
>> > +/* Supervisor Configuration CSRs */
>> > +#define CSR_SENVCFG 0x10A
>> > +
>> >   /* Supervisor Trap Handling */
>> >   #define CSR_SSCRATCH0x140
>> >   #define CSR_SEPC0x141
>> > @@ -204,6 +207,10 @@
>> >   #define CSR_HTIMEDELTA  0x605
>> >   #define CSR_HTIMEDELTAH 0x615
>> >
>> > +/* Hypervisor Configuration CSRs */
>> > +#define CSR_HENVCFG 0x60A
>> > +#define CSR_HENVCFGH0x61A
>> > +
>> >   /* Virtual CSRs */
>> >   #define CSR_VSSTATUS0x200
>> >   #define CSR_VSIE0x204
>> > @@ -218,6 +225,10 @@
>> >   #define CSR_MTINST  0x34a
>> >   #define CSR_MTVAL2  0x34b
>> >
>> > +/* Machine Configuration CSRs */
>> > +#define CSR_MENVCFG 0x30A
>> > +#define CSR_MENVCFGH0x31A
>> > +
>> >   /* Enhanced Physical Memory Protection (ePMP) */
>> >   #define CSR_MSECCFG 0x747
>> >   #define CSR_MSECCFGH0x757
>> > @@ -578,6 +589,26 @@ typedef enum RISCVException {
>> >   #define PM_EXT_CLEAN0x0002ULL
>> >   #define PM_EXT_DIRTY0x0003ULL
>> >
>> > +/* Execution enviornment configuration bits */
>> > +#define MENVCFG_FIOM   (1 << 0)
>> > +#define MENVCFG_CBE0x3ULL
>> > +#define MENVCFG_CBCFE  (1 << 6)
>> > +#define MENVCFG_CBZE   (1 << 7)
>> > +#define MENVCFG_PBMTE  (1 << 62)
>> > +#define MENVCFG_STCE   (1 << 63)
>> > +
>> > +#define SENVCFG_FIOM   MENVCFG_FIOM
>> > +#define SENVCFG_CBEMENVCFG_CBE
>> > +#define SENVCFG_CBCFE  MENVCFG_CBCFE
>> > +#define SENVCFG_CBZE   MENVCFG_CBZE
>> > +
>> > +#define HENVCFG_FIOM   MENVCFG_FIOM
>> > +#define HENVCFG_CBEMENVCFG_CBE
>> > +#define HENVCFG_CBCFE  MENVCFG_CBCFE
>> > +#define HENVCFG_CBZE   MENVCFG_CBZE
>> > +#define HENVCFG_PBMTE  MENVCFG_PBMTE
>> > +#define HENVCFG_STCE   MENVCFG_STCE
>> > +
>> >   /* Offsets for every pair of control bits per each priv level */
>> >   #define XS_OFFSET0ULL
>> >   #define U_OFFSET 2ULL
>> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
>> > index

[PULL 17/40] bsd-user/signal.c: Implement signal_init()

2022-01-31 Thread Warner Losh

Initialize the signal state for the emulator. Set up a set of sane
default signal handlers, mirroring the host's signals. For fatal signals
(those that exit by default), establish our own set of signal
handlers. Stub out the actual signal handler we use for the moment.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson  XXX SIGPROF 
PENDING
---
 bsd-user/qemu.h   |  7 +
 bsd-user/signal.c | 67 +++
 2 files changed, 74 insertions(+)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 99c37fc9942..49f01932a53 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -94,6 +94,13 @@ typedef struct TaskState {
  * from multiple threads.)
  */
 int signal_pending;
+/*
+ * This thread's signal mask, as requested by the guest program.
+ * The actual signal mask of this thread may differ:
+ *  + we don't let SIGSEGV and SIGBUS be blocked while running guest code
+ *  + sometimes we block all signals to avoid races
+ */
+sigset_t signal_mask;
 
 uint8_t stack[];
 } __attribute__((aligned(16))) TaskState;
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 1313baec96a..3ef7cf5e23c 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -28,6 +28,9 @@
  * fork.
  */
 
+static struct target_sigaction sigact_table[TARGET_NSIG];
+static void host_signal_handler(int host_sig, siginfo_t *info, void *puc);
+
 /*
  * The BSD ABIs use the same singal numbers across all the CPU architectures, so
  * (unlike Linux) these functions are just the identity mapping. This might not
@@ -52,6 +55,28 @@ void queue_signal(CPUArchState *env, int sig, target_siginfo_t *info)
 qemu_log_mask(LOG_UNIMP, "No signal queueing, dropping signal %d\n", sig);
 }
 
+static int fatal_signal(int sig)
+{
+
+switch (sig) {
+case TARGET_SIGCHLD:
+case TARGET_SIGURG:
+case TARGET_SIGWINCH:
+case TARGET_SIGINFO:
+/* Ignored by default. */
+return 0;
+case TARGET_SIGCONT:
+case TARGET_SIGSTOP:
+case TARGET_SIGTSTP:
+case TARGET_SIGTTIN:
+case TARGET_SIGTTOU:
+/* Job control signals.  */
+return 0;
+default:
+return 1;
+}
+}
+
 /*
  * Force a synchronously taken QEMU_SI_FAULT signal. For QEMU the
  * 'force' part is handled in process_pending_signals().
@@ -69,8 +94,50 @@ void force_sig_fault(int sig, int code, abi_ulong addr)
 queue_signal(env, sig, &info);
 }
 
+static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
+{
+}
+
 void signal_init(void)
 {
+TaskState *ts = (TaskState *)thread_cpu->opaque;
+struct sigaction act;
+struct sigaction oact;
+int i;
+int host_sig;
+
+/* Set the signal mask from the host mask. */
+sigprocmask(0, 0, &ts->signal_mask);
+
+sigfillset(&act.sa_mask);
+act.sa_sigaction = host_signal_handler;
+act.sa_flags = SA_SIGINFO;
+
+for (i = 1; i <= TARGET_NSIG; i++) {
+#ifdef CONFIG_GPROF
+if (i == TARGET_SIGPROF) {
+continue;
+}
+#endif
+host_sig = target_to_host_signal(i);
+sigaction(host_sig, NULL, &oact);
+if (oact.sa_sigaction == (void *)SIG_IGN) {
+sigact_table[i - 1]._sa_handler = TARGET_SIG_IGN;
+} else if (oact.sa_sigaction == (void *)SIG_DFL) {
+sigact_table[i - 1]._sa_handler = TARGET_SIG_DFL;
+}
+/*
+ * If there's already a handler installed then something has
+ * gone horribly wrong, so don't even try to handle that case.
+ * Install some handlers for our own use.  We need at least
+ * SIGSEGV and SIGBUS, to detect exceptions.  We can not just
+ * trap all signals because it affects syscall interrupt
+ * behavior.  But do trap all default-fatal signals.
+ */
+if (fatal_signal(i)) {
+sigaction(host_sig, &act, NULL);
+}
+}
 }
 
 void process_pending_signals(CPUArchState *cpu_env)
-- 
2.33.1
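
The fatal_signal() helper in the patch above decides which signals get a host handler installed by signal_init(). As a rough standalone illustration only (this is not the QEMU code), the same classification can be sketched against host signal numbers. The name is_default_fatal is hypothetical, and SIGWINCH and SIGINFO are guarded because not every host defines them.

```c
#include <assert.h>
#include <signal.h>

/*
 * Hypothetical helper: returns 1 when the default action for a host signal
 * terminates the process, 0 when the default is to ignore it or when it is
 * a job-control signal.
 */
static int is_default_fatal(int sig)
{
    assert(sig > 0);   /* signal numbers start at 1 */

    switch (sig) {
    case SIGCHLD:
    case SIGURG:
#ifdef SIGWINCH
    case SIGWINCH:
#endif
#ifdef SIGINFO
    case SIGINFO:
#endif
        /* Ignored by default. */
        return 0;
    case SIGCONT:
    case SIGSTOP:
    case SIGTSTP:
    case SIGTTIN:
    case SIGTTOU:
        /* Job control signals. */
        return 0;
    default:
        return 1;
    }
}
```

With this split, a loop like the one in signal_init() would install its handler for SIGSEGV and SIGBUS but leave SIGCHLD and the job-control signals alone.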

[PULL 11/40] bsd-user/signal.c: implement cpu_loop_exit_sigbus

2022-01-31 Thread Warner Losh

First attempt at implementing cpu_loop_exit_sigbus, mostly copied from
linux-user version of this function.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 12de0e2dea4..844dfa19095 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -80,7 +80,13 @@ void cpu_loop_exit_sigsegv(CPUState *cpu, target_ulong addr,
 void cpu_loop_exit_sigbus(CPUState *cpu, target_ulong addr,
   MMUAccessType access_type, uintptr_t ra)
 {
-qemu_log_mask(LOG_UNIMP, "No signal support for SIGBUS\n");
-/* unreachable */
-abort();
+const struct TCGCPUOps *tcg_ops = CPU_GET_CLASS(cpu)->tcg_ops;
+
+if (tcg_ops->record_sigbus) {
+tcg_ops->record_sigbus(cpu, addr, access_type, ra);
+}
+
+force_sig_fault(TARGET_SIGBUS, TARGET_BUS_ADRALN, addr);
+cpu->exception_index = EXCP_INTERRUPT;
+cpu_loop_exit_restore(cpu, ra);
 }
-- 
2.33.1

[PULL 34/40] bsd-user/signal.c: process_pending_signals

2022-01-31 Thread Warner Losh

Process the currently queued signals.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal.c | 56 ++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 34e8c811ad6..4b398745f45 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -756,8 +756,62 @@ static void handle_pending_signal(CPUArchState *env, int sig,
 }
 }
 
-void process_pending_signals(CPUArchState *cpu_env)
+void process_pending_signals(CPUArchState *env)
 {
+CPUState *cpu = env_cpu(env);
+int sig;
+sigset_t *blocked_set, set;
+struct emulated_sigtable *k;
+TaskState *ts = cpu->opaque;
+
+while (qatomic_read(&ts->signal_pending)) {
+sigfillset(&set);
+sigprocmask(SIG_SETMASK, &set, 0);
+
+restart_scan:
+sig = ts->sync_signal.pending;
+if (sig) {
+/*
+ * Synchronous signals are forced by the emulated CPU in some way.
+ * If they are set to ignore, restore the default handler (see
+ * sys/kern_sig.c trapsignal() and execsigs() for this behavior)
+ * though maybe this is done only when forcing exit for non SIGCHLD.
+ */
+if (sigismember(&ts->signal_mask, target_to_host_signal(sig)) ||
+sigact_table[sig - 1]._sa_handler == TARGET_SIG_IGN) {
+sigdelset(&ts->signal_mask, target_to_host_signal(sig));
+sigact_table[sig - 1]._sa_handler = TARGET_SIG_DFL;
+}
+handle_pending_signal(env, sig, &ts->sync_signal);
+}
+
+k = ts->sigtab;
+for (sig = 1; sig <= TARGET_NSIG; sig++, k++) {
+blocked_set = ts->in_sigsuspend ?
+&ts->sigsuspend_mask : &ts->signal_mask;
+if (k->pending &&
+!sigismember(blocked_set, target_to_host_signal(sig))) {
+handle_pending_signal(env, sig, k);
+/*
+ * Restart scan from the beginning, as handle_pending_signal
+ * might have resulted in a new synchronous signal (eg SIGSEGV).
+ */
+goto restart_scan;
+}
+}
+
+/*
+ * Unblock signals and check one more time. Unblocking signals may cause
+ * us to take another host signal, which will set signal_pending again.
+ */
+qatomic_set(&ts->signal_pending, 0);
+ts->in_sigsuspend = false;
+set = ts->signal_mask;
+sigdelset(&set, SIGSEGV);
+sigdelset(&set, SIGBUS);
+sigprocmask(SIG_SETMASK, &set, 0);
+}
+ts->in_sigsuspend = false;
 }
 
 void cpu_loop_exit_sigsegv(CPUState *cpu, target_ulong addr,
-- 
2.33.1
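
The scan order in process_pending_signals() above is: take a forced synchronous signal first, then walk the table from the lowest signal number and restart the scan after every delivery. A minimal sketch of just that control flow follows. Everything here is an assumption made for illustration: sim_state, deliver and drain_pending are hypothetical names, a 32-bit mask stands in for sigset_t, and the host-side blocking with sigprocmask() is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NSIG 32   /* hypothetical table size, not TARGET_NSIG */

struct sim_state {
    int sync_pending;            /* forced synchronous signal, 0 if none */
    uint8_t pending[SIM_NSIG];   /* pending[i] != 0: signal i + 1 is queued */
    uint32_t blocked;            /* bit (sig - 1) set: signal is blocked */
    int order[SIM_NSIG + 1];     /* record of delivered signals, in order */
    int delivered;               /* number of entries used in order[] */
};

/* Stand-in for handle_pending_signal(): just record the delivery. */
static void deliver(struct sim_state *s, int sig)
{
    assert(s->delivered <= SIM_NSIG);
    s->order[s->delivered++] = sig;
}

static void drain_pending(struct sim_state *s)
{
    int sig;

restart_scan:
    /* A synchronous signal is taken first, whatever the blocked mask says. */
    sig = s->sync_pending;
    if (sig) {
        s->sync_pending = 0;
        deliver(s, sig);
    }
    for (sig = 1; sig <= SIM_NSIG; sig++) {
        if (s->pending[sig - 1] &&
            !(s->blocked & (UINT32_C(1) << (sig - 1)))) {
            s->pending[sig - 1] = 0;
            deliver(s, sig);
            /* Delivery may have raised a new synchronous signal. */
            goto restart_scan;
        }
    }
}
```

Queuing signals 5, 2 and a blocked 9 together with a synchronous 11 delivers 11, then 2, then 5, and leaves 9 pending.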

[PULL 09/40] bsd-user/signal-common.h: Move signal functions prototypes to here

2022-01-31 Thread Warner Losh

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/arm/target_arch_cpu.h| 1 +
 bsd-user/i386/target_arch_cpu.h   | 1 +
 bsd-user/qemu.h   | 8 
 bsd-user/signal-common.h  | 6 ++
 bsd-user/x86_64/target_arch_cpu.h | 1 +
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/bsd-user/arm/target_arch_cpu.h b/bsd-user/arm/target_arch_cpu.h
index c526fc73502..b7f728fd667 100644
--- a/bsd-user/arm/target_arch_cpu.h
+++ b/bsd-user/arm/target_arch_cpu.h
@@ -21,6 +21,7 @@
 #define _TARGET_ARCH_CPU_H_
 
 #include "target_arch.h"
+#include "signal-common.h"
 
 #define TARGET_DEFAULT_CPU_MODEL "any"
 
diff --git a/bsd-user/i386/target_arch_cpu.h b/bsd-user/i386/target_arch_cpu.h
index b28602adbbd..472a96689fc 100644
--- a/bsd-user/i386/target_arch_cpu.h
+++ b/bsd-user/i386/target_arch_cpu.h
@@ -20,6 +20,7 @@
 #define _TARGET_ARCH_CPU_H_
 
 #include "target_arch.h"
+#include "signal-common.h"
 
 #define TARGET_DEFAULT_CPU_MODEL "qemu32"
 
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 671b26f00cc..99c37fc9942 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -199,14 +199,6 @@ print_openbsd_syscall(int num,
 void print_openbsd_syscall_ret(int num, abi_long ret);
 extern int do_strace;
 
-/* signal.c */
-void process_pending_signals(CPUArchState *cpu_env);
-void signal_init(void);
-long do_sigreturn(CPUArchState *env);
-long do_rt_sigreturn(CPUArchState *env);
-void queue_signal(CPUArchState *env, int sig, target_siginfo_t *info);
-abi_long do_sigaltstack(abi_ulong uss_addr, abi_ulong uoss_addr, abi_ulong sp);
-
 /* mmap.c */
 int target_mprotect(abi_ulong start, abi_ulong len, int prot);
 abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
diff --git a/bsd-user/signal-common.h b/bsd-user/signal-common.h
index 6207417d39e..f9a9d1e01aa 100644
--- a/bsd-user/signal-common.h
+++ b/bsd-user/signal-common.h
@@ -9,6 +9,12 @@
 #ifndef SIGNAL_COMMON_H
 #define SIGNAL_COMMON_H
 
+long do_rt_sigreturn(CPUArchState *env);
+abi_long do_sigaltstack(abi_ulong uss_addr, abi_ulong uoss_addr, abi_ulong sp);
+long do_sigreturn(CPUArchState *env);
 void force_sig_fault(int sig, int code, abi_ulong addr);
+void process_pending_signals(CPUArchState *env);
+void queue_signal(CPUArchState *env, int sig, target_siginfo_t *info);
+void signal_init(void);
 
 #endif
diff --git a/bsd-user/x86_64/target_arch_cpu.h b/bsd-user/x86_64/target_arch_cpu.h
index 5172b230f09..14def48adb5 100644
--- a/bsd-user/x86_64/target_arch_cpu.h
+++ b/bsd-user/x86_64/target_arch_cpu.h
@@ -20,6 +20,7 @@
 #define _TARGET_ARCH_CPU_H_
 
 #include "target_arch.h"
+#include "signal-common.h"
 
 #define TARGET_DEFAULT_CPU_MODEL "qemu64"
 
-- 
2.33.1

Re: [RFC PATCH 1/2] target/arm: move regime_ttbr helper

2022-01-31 Thread Vasilev Oleg via

[PULL 40/40] bsd-user/freebsd/target_os_ucontext.h: Prefer env as arg name for CPUArchState args

2022-01-31 Thread Warner Losh

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/target_os_ucontext.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/bsd-user/freebsd/target_os_ucontext.h b/bsd-user/freebsd/target_os_ucontext.h
index 41b28b2c150..b196b1c629f 100644
--- a/bsd-user/freebsd/target_os_ucontext.h
+++ b/bsd-user/freebsd/target_os_ucontext.h
@@ -36,9 +36,9 @@ abi_long set_sigtramp_args(CPUArchState *env, int sig,
struct target_sigframe *frame,
abi_ulong frame_addr,
struct target_sigaction *ka);
-abi_long get_mcontext(CPUArchState *regs, target_mcontext_t *mcp, int flags);
-abi_long set_mcontext(CPUArchState *regs, target_mcontext_t *mcp, int srflag);
-abi_long get_ucontext_sigreturn(CPUArchState *regs, abi_ulong target_sf,
+abi_long get_mcontext(CPUArchState *env, target_mcontext_t *mcp, int flags);
+abi_long set_mcontext(CPUArchState *env, target_mcontext_t *mcp, int srflag);
+abi_long get_ucontext_sigreturn(CPUArchState *env, abi_ulong target_sf,
 abi_ulong *target_uc);
 
 #endif /* TARGET_OS_UCONTEXT_H */
-- 
2.33.1

[PULL 24/40] bsd-user/signal.c: host_to_target_siginfo_noswap

2022-01-31 Thread Warner Losh

Implement conversion of host to target siginfo.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal.c | 113 ++
 1 file changed, 113 insertions(+)

diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index cb0036acb61..db8cf0a08f1 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -48,6 +48,119 @@ int target_to_host_signal(int sig)
 return sig;
 }
 
+static bool has_trapno(int tsig)
+{
+return tsig == TARGET_SIGILL ||
+tsig == TARGET_SIGFPE ||
+tsig == TARGET_SIGSEGV ||
+tsig == TARGET_SIGBUS ||
+tsig == TARGET_SIGTRAP;
+}
+
+
+/* Siginfo conversion. */
+
+/*
+ * Populate tinfo w/o swapping based on guessing which fields are valid.
+ */
+static inline void host_to_target_siginfo_noswap(target_siginfo_t *tinfo,
+const siginfo_t *info)
+{
+int sig = host_to_target_signal(info->si_signo);
+int si_code = info->si_code;
+int si_type;
+
+/*
+ * Make sure we that the variable portion of the target siginfo is zeroed
+ * out so we don't leak anything into that.
+ */
+memset(&tinfo->_reason, 0, sizeof(tinfo->_reason));
+
+/*
+ * This is awkward, because we have to use a combination of the si_code and
+ * si_signo to figure out which of the union's members are valid.o We
+ * therefore make our best guess.
+ *
+ * Once we have made our guess, we record it in the top 16 bits of
+ * the si_code, so that tswap_siginfo() later can use it.
+ * tswap_siginfo() will strip these top bits out before writing
+ * si_code to the guest (sign-extending the lower bits).
+ */
+tinfo->si_signo = sig;
+tinfo->si_errno = info->si_errno;
+tinfo->si_code = info->si_code;
+tinfo->si_pid = info->si_pid;
+tinfo->si_uid = info->si_uid;
+tinfo->si_status = info->si_status;
+tinfo->si_addr = (abi_ulong)(unsigned long)info->si_addr;
+/*
+ * si_value is opaque to kernel. On all FreeBSD platforms,
+ * sizeof(sival_ptr) >= sizeof(sival_int) so the following
+ * always will copy the larger element.
+ */
+tinfo->si_value.sival_ptr =
+(abi_ulong)(unsigned long)info->si_value.sival_ptr;
+
+switch (si_code) {
+/*
+ * All the SI_xxx codes that are defined here are global to
+ * all the signals (they have values that none of the other,
+ * more specific signal info will set).
+ */
+case SI_USER:
+case SI_LWP:
+case SI_KERNEL:
+case SI_QUEUE:
+case SI_ASYNCIO:
+/*
+ * Only the fixed parts are valid (though FreeBSD doesn't always
+ * set all the fields to non-zero values.
+ */
+si_type = QEMU_SI_NOINFO;
+break;
+case SI_TIMER:
+tinfo->_reason._timer._timerid = info->_reason._timer._timerid;
+tinfo->_reason._timer._overrun = info->_reason._timer._overrun;
+si_type = QEMU_SI_TIMER;
+break;
+case SI_MESGQ:
+tinfo->_reason._mesgq._mqd = info->_reason._mesgq._mqd;
+si_type = QEMU_SI_MESGQ;
+break;
+default:
+/*
+ * We have to go based on the signal number now to figure out
+ * what's valid.
+ */
+if (has_trapno(sig)) {
+tinfo->_reason._fault._trapno = info->_reason._fault._trapno;
+si_type = QEMU_SI_FAULT;
+}
+#ifdef TARGET_SIGPOLL
+/*
+ * FreeBSD never had SIGPOLL, but emulates it for Linux so there's
+ * a chance it may popup in the future.
+ */
+if (sig == TARGET_SIGPOLL) {
+tinfo->_reason._poll._band = info->_reason._poll._band;
+si_type = QEMU_SI_POLL;
+}
+#endif
+/*
+ * Unsure that this can actually be generated, and our support for
+ * capsicum is somewhere between weak and non-existant, but if we get
+ * one, then we know what to save.
+ */
+if (sig == TARGET_SIGTRAP) {
+tinfo->_reason._capsicum._syscall =
+info->_reason._capsicum._syscall;
+si_type = QEMU_SI_CAPSICUM;
+}
+break;
+}
+tinfo->si_code = deposit32(si_code, 24, 8, si_type);
+}
+
 /*
  * Queue a signal so that it will be send to the virtual CPU as soon as
  * possible.
-- 
2.33.1
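
The patch above stores its guess about the valid union member (si_type) in the top 8 bits of si_code with deposit32(si_code, 24, 8, si_type), to be split apart again later with extract32() and sextract32(). The sketch below re-implements that bit packing under hypothetical names (deposit_bits, extract_bits, sextract_bits) so it can be compiled on its own; it is an illustration of the technique, not QEMU's bitops code.

```c
#include <assert.h>
#include <stdint.h>

/* Place the low `length` bits of `field` into `value` at bit `start`. */
static uint32_t deposit_bits(uint32_t value, int start, int length,
                             uint32_t field)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (UINT32_MAX >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Extract `length` bits starting at bit `start`, zero-extended. */
static uint32_t extract_bits(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (UINT32_MAX >> (32 - length));
}

/* Extract `length` bits starting at bit `start`, sign-extended. */
static int32_t sextract_bits(uint32_t value, int start, int length)
{
    uint32_t field = extract_bits(value, start, length);
    uint32_t sign = UINT32_C(1) << (length - 1);

    /* (field ^ sign) - sign propagates the field's top bit upward. */
    return (int32_t)((field ^ sign) - sign);
}
```

The sign-extending extract matters because si_code can be negative on some hosts: the low 24 bits must come back as the original signed value once the 8-bit type tag is stripped.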

[PULL 38/40] MAINTAINERS: Add tests/vm/*bsd to the list to get reviews on

2022-01-31 Thread Warner Losh

tests/vm/*bsd (especially tests/vm/freebsd) are adjacent to the bsd-user
stuff and we're keen on keeping them working as well.

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e4b3a4bcdf4..b7487f9b54b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3181,6 +3181,7 @@ R: Kyle Evans 
 S: Maintained
 F: bsd-user/
 F: configs/targets/*-bsd-user.mak
+F: tests/vm/*bsd
 T: git https://github.com/qemu-bsd-user/qemu-bsd-user bsd-user-rebase-3.1
 
 Linux user
-- 
2.33.1

[PULL 23/40] bsd-user: Add trace events for bsd-user

2022-01-31 Thread Warner Losh

Add the bsd-user specific events and infrastructure. Only include the
linux-user trace events for linux-user, not bsd-user.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal.c |  1 +
 bsd-user/trace-events | 11 +++
 bsd-user/trace.h  |  1 +
 meson.build   |  5 -
 4 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 bsd-user/trace-events
 create mode 100644 bsd-user/trace.h

diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index f3e020e004a..cb0036acb61 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -21,6 +21,7 @@
 #include "qemu/osdep.h"
 #include "qemu.h"
 #include "signal-common.h"
+#include "trace.h"
 #include "hw/core/tcg-cpu-ops.h"
 #include "host-signal.h"
 
diff --git a/bsd-user/trace-events b/bsd-user/trace-events
new file mode 100644
index 000..843896f6271
--- /dev/null
+++ b/bsd-user/trace-events
@@ -0,0 +1,11 @@
+# See docs/tracing.txt for syntax documentation.
+
+# bsd-user/signal.c
+user_setup_frame(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64
+user_setup_rt_frame(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64
+user_do_rt_sigreturn(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64
+user_do_sigreturn(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64
+user_dump_core_and_abort(void *env, int target_sig, int host_sig) "env=%p signal %d (host %d)"
+user_handle_signal(void *env, int target_sig) "env=%p signal %d"
+user_host_signal(void *env, int host_sig, int target_sig) "env=%p signal %d (target %d("
+user_queue_signal(void *env, int target_sig) "env=%p signal %d"
diff --git a/bsd-user/trace.h b/bsd-user/trace.h
new file mode 100644
index 000..593c0204add
--- /dev/null
+++ b/bsd-user/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-bsd_user.h"
diff --git a/meson.build b/meson.build
index 155403d44f4..5f433550718 100644
--- a/meson.build
+++ b/meson.build
@@ -2458,9 +2458,12 @@ trace_events_subdirs = [
   'monitor',
   'util',
 ]
-if have_user
+if have_linux_user
   trace_events_subdirs += [ 'linux-user' ]
 endif
+if have_bsd_user
+  trace_events_subdirs += [ 'bsd-user' ]
+endif
 if have_block
   trace_events_subdirs += [
 'authz',
-- 
2.33.1

[PULL 07/40] bsd-user/arm/target_arch_cpu.h: Move EXCP_ATOMIC to match linux-user

2022-01-31 Thread Warner Losh

Move the EXCP_ATOMIC case to match linux-user/arm/cpu_loop.c:cpu_loop
ordering.

Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/arm/target_arch_cpu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/bsd-user/arm/target_arch_cpu.h b/bsd-user/arm/target_arch_cpu.h
index c675419c30a..c526fc73502 100644
--- a/bsd-user/arm/target_arch_cpu.h
+++ b/bsd-user/arm/target_arch_cpu.h
@@ -180,12 +180,12 @@ static inline void target_cpu_loop(CPUARMState *env)
 queue_signal(env, info.si_signo, &info);
 }
 break;
-case EXCP_ATOMIC:
-cpu_exec_step_atomic(cs);
-break;
 case EXCP_YIELD:
 /* nothing to do here for user-mode, just resume guest code */
 break;
+case EXCP_ATOMIC:
+cpu_exec_step_atomic(cs);
+break;
 default:
 fprintf(stderr, "qemu: unhandled CPU exception 0x%x - aborting\n",
 trapnr);
-- 
2.33.1

[PULL 32/40] bsd-user/signal.c: handle_pending_signal

2022-01-31 Thread Warner Losh

Handle a queued signal.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/qemu.h   |  7 
 bsd-user/signal.c | 87 +++
 2 files changed, 94 insertions(+)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index de20650a00d..02921ac8b3b 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -99,6 +99,8 @@ typedef struct TaskState {
  * from multiple threads.)
  */
 int signal_pending;
+/* True if we're leaving a sigsuspend and sigsuspend_mask is valid. */
+bool in_sigsuspend;
 /*
  * This thread's signal mask, as requested by the guest program.
  * The actual signal mask of this thread may differ:
@@ -106,6 +108,11 @@ typedef struct TaskState {
  *  + sometimes we block all signals to avoid races
  */
 sigset_t signal_mask;
+/*
+ * The signal mask imposed by a guest sigsuspend syscall, if we are
+ * currently in the middle of such a syscall
+ */
+sigset_t sigsuspend_mask;
 
 /* This thread's sigaltstack, if it has one */
 struct target_sigaltstack sigaltstack_used;
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index dbc13736073..366e047 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -616,6 +616,93 @@ void signal_init(void)
 }
 }
 
+static void handle_pending_signal(CPUArchState *env, int sig,
+  struct emulated_sigtable *k)
+{
+CPUState *cpu = env_cpu(env);
+TaskState *ts = cpu->opaque;
+struct target_sigaction *sa;
+int code;
+sigset_t set;
+abi_ulong handler;
+target_siginfo_t tinfo;
+target_sigset_t target_old_set;
+
+trace_user_handle_signal(env, sig);
+
+k->pending = 0;
+
+sig = gdb_handlesig(cpu, sig);
+if (!sig) {
+sa = NULL;
+handler = TARGET_SIG_IGN;
+} else {
+sa = &sigact_table[sig - 1];
+handler = sa->_sa_handler;
+}
+
+if (do_strace) {
+print_taken_signal(sig, &k->info);
+}
+
+if (handler == TARGET_SIG_DFL) {
+/*
+ * default handler : ignore some signal. The other are job
+ * control or fatal.
+ */
+if (sig == TARGET_SIGTSTP || sig == TARGET_SIGTTIN ||
+sig == TARGET_SIGTTOU) {
+kill(getpid(), SIGSTOP);
+} else if (sig != TARGET_SIGCHLD && sig != TARGET_SIGURG &&
+   sig != TARGET_SIGINFO && sig != TARGET_SIGWINCH &&
+   sig != TARGET_SIGCONT) {
+dump_core_and_abort(sig);
+}
+} else if (handler == TARGET_SIG_IGN) {
+/* ignore sig */
+} else if (handler == TARGET_SIG_ERR) {
+dump_core_and_abort(sig);
+} else {
+/* compute the blocked signals during the handler execution */
+sigset_t *blocked_set;
+
+target_to_host_sigset(&set, &sa->sa_mask);
+/*
+ * SA_NODEFER indicates that the current signal should not be
+ * blocked during the handler.
+ */
+if (!(sa->sa_flags & TARGET_SA_NODEFER)) {
+sigaddset(&set, target_to_host_signal(sig));
+}
+
+/*
+ * Save the previous blocked signal state to restore it at the
+ * end of the signal execution (see do_sigreturn).
+ */
+host_to_target_sigset_internal(&target_old_set, &ts->signal_mask);
+
+blocked_set = ts->in_sigsuspend ?
+&ts->sigsuspend_mask : &ts->signal_mask;
+sigorset(&ts->signal_mask, blocked_set, &set);
+ts->in_sigsuspend = false;
+sigprocmask(SIG_SETMASK, &ts->signal_mask, NULL);
+
+/* XXX VM86 on x86 ??? */
+
+code = k->info.si_code; /* From host, so no si_type */
+/* prepare the stack frame of the virtual CPU */
+if (sa->sa_flags & TARGET_SA_SIGINFO) {
+tswap_siginfo(&tinfo, &k->info);
+setup_frame(sig, code, sa, &target_old_set, &tinfo, env);
+} else {
+setup_frame(sig, code, sa, &target_old_set, NULL, env);
+}
+if (sa->sa_flags & TARGET_SA_RESETHAND) {
+sa->_sa_handler = TARGET_SIG_DFL;
+}
+}
+}
+
 void process_pending_signals(CPUArchState *cpu_env)
 {
 }
-- 
2.33.1
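
When handle_pending_signal() above runs a guest handler, the mask in force during the handler is the current mask, plus the handler's sa_mask, plus the signal itself unless SA_NODEFER is set. The patch builds this with sigorset(). The sketch below does the same computation with only the basic POSIX sigset calls so that it stays portable; handler_mask and SKETCH_MAXSIG are hypothetical names, and the in_sigsuspend selection of the starting mask is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

#define SKETCH_MAXSIG 31   /* hypothetical scan bound for this sketch */

/*
 * Compute the mask to install while a handler for `sig` runs:
 * current | sa_mask, plus `sig` itself unless `nodefer` is set.
 */
static void handler_mask(sigset_t *out, const sigset_t *current,
                         const sigset_t *sa_mask, int sig, int nodefer)
{
    int i;

    assert(sig >= 1 && sig <= SKETCH_MAXSIG);
    sigemptyset(out);
    for (i = 1; i <= SKETCH_MAXSIG; i++) {
        if (sigismember(current, i) == 1 || sigismember(sa_mask, i) == 1) {
            sigaddset(out, i);
        }
    }
    if (!nodefer) {
        /* Without SA_NODEFER the signal is blocked inside its own handler. */
        sigaddset(out, sig);
    }
}
```

The previous mask still has to be saved before this one is installed, which is what the host_to_target_sigset_internal() call into target_old_set does in the patch, so that sigreturn can restore it.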

[PULL 33/40] bsd-user/signal.c: tswap_siginfo

2022-01-31 Thread Warner Losh

Convert siginfo from target to host.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal.c | 53 +++
 1 file changed, 53 insertions(+)

diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 366e047..34e8c811ad6 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -256,6 +256,59 @@ static inline void host_to_target_siginfo_noswap(target_siginfo_t *tinfo,
 tinfo->si_code = deposit32(si_code, 24, 8, si_type);
 }
 
+static void tswap_siginfo(target_siginfo_t *tinfo, const target_siginfo_t *info)
+{
+int si_type = extract32(info->si_code, 24, 8);
+int si_code = sextract32(info->si_code, 0, 24);
+
+__put_user(info->si_signo, &tinfo->si_signo);
+__put_user(info->si_errno, &tinfo->si_errno);
+__put_user(si_code, &tinfo->si_code); /* Zero out si_type, it's internal */
+__put_user(info->si_pid, &tinfo->si_pid);
+__put_user(info->si_uid, &tinfo->si_uid);
+__put_user(info->si_status, &tinfo->si_status);
+__put_user(info->si_addr, &tinfo->si_addr);
+/*
+ * Unswapped, because we passed it through mostly untouched.  si_value is
+ * opaque to the kernel, so we didn't bother with potentially wasting cycles
+ * to swap it into host byte order.
+ */
+tinfo->si_value.sival_ptr = info->si_value.sival_ptr;
+
+/*
+ * We can use our internal marker of which fields in the structure
+ * are valid, rather than duplicating the guesswork of
+ * host_to_target_siginfo_noswap() here.
+ */
+switch (si_type) {
+case QEMU_SI_NOINFO:/* No additional info */
+break;
+case QEMU_SI_FAULT:
+__put_user(info->_reason._fault._trapno,
+   &tinfo->_reason._fault._trapno);
+break;
+case QEMU_SI_TIMER:
+__put_user(info->_reason._timer._timerid,
+   &tinfo->_reason._timer._timerid);
+__put_user(info->_reason._timer._overrun,
+   &tinfo->_reason._timer._overrun);
+break;
+case QEMU_SI_MESGQ:
+__put_user(info->_reason._mesgq._mqd, &tinfo->_reason._mesgq._mqd);
+break;
+case QEMU_SI_POLL:
+/* Note: Not generated on FreeBSD */
+__put_user(info->_reason._poll._band, &tinfo->_reason._poll._band);
+break;
+case QEMU_SI_CAPSICUM:
+__put_user(info->_reason._capsicum._syscall,
+   &tinfo->_reason._capsicum._syscall);
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 /* Returns 1 if given signal should dump core if not handled. */
 static int core_dump_signal(int sig)
 {
-- 
2.33.1

[PULL 31/40] bsd-user/signal.c: setup_frame

2022-01-31 Thread Warner Losh

setup_frame sets up a signalled stack frame. Also add the associated routines
that extract the pointer to the stack frame and support alternate stacks.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/main.c   |  5 +++
 bsd-user/qemu.h   |  3 +-
 bsd-user/signal.c | 83 +++
 3 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index 29cf4e15693..f1d58e905e7 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -217,6 +217,11 @@ void qemu_cpu_kick(CPUState *cpu)
 /* Assumes contents are already zeroed.  */
 static void init_task_state(TaskState *ts)
 {
+ts->sigaltstack_used = (struct target_sigaltstack) {
+.ss_sp = 0,
+.ss_size = 0,
+.ss_flags = TARGET_SS_DISABLE,
+};
 }
 
 void gemu_log(const char *fmt, ...)
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 1648a509b9c..de20650a00d 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -107,7 +107,8 @@ typedef struct TaskState {
  */
 sigset_t signal_mask;
 
-uint8_t stack[];
+/* This thread's sigaltstack, if it has one */
+struct target_sigaltstack sigaltstack_used;
 } __attribute__((aligned(16))) TaskState;
 
 void stop_all_tasks(void);
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 84dafa4e9fe..dbc13736073 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -35,6 +35,16 @@ static void host_signal_handler(int host_sig, siginfo_t *info, void *puc);
 static void target_to_host_sigset_internal(sigset_t *d,
 const target_sigset_t *s);
 
+static inline int on_sig_stack(TaskState *ts, unsigned long sp)
+{
+return sp - ts->sigaltstack_used.ss_sp < ts->sigaltstack_used.ss_size;
+}
+
+static inline int sas_ss_flags(TaskState *ts, unsigned long sp)
+{
+return ts->sigaltstack_used.ss_size == 0 ? SS_DISABLE :
+on_sig_stack(ts, sp) ? SS_ONSTACK : 0;
+}
 
 /*
  * The BSD ABIs use the same singal numbers across all the CPU architectures, so
@@ -491,6 +501,79 @@ static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
 cpu_exit(thread_cpu);
 }
 
+static inline abi_ulong get_sigframe(struct target_sigaction *ka,
+CPUArchState *env, size_t frame_size)
+{
+TaskState *ts = (TaskState *)thread_cpu->opaque;
+abi_ulong sp;
+
+/* Use default user stack */
+sp = get_sp_from_cpustate(env);
+
+if ((ka->sa_flags & TARGET_SA_ONSTACK) && sas_ss_flags(ts, sp) == 0) {
+sp = ts->sigaltstack_used.ss_sp + ts->sigaltstack_used.ss_size;
+}
+
+/* TODO: make this a target_arch function / define */
+#if defined(TARGET_ARM)
+return (sp - frame_size) & ~7;
+#elif defined(TARGET_AARCH64)
+return (sp - frame_size) & ~15;
+#else
+return sp - frame_size;
+#endif
+}
+
+/* compare to $M/$M/exec_machdep.c sendsig and sys/kern/kern_sig.c sigexit */
+
+static void setup_frame(int sig, int code, struct target_sigaction *ka,
+target_sigset_t *set, target_siginfo_t *tinfo, CPUArchState *env)
+{
+struct target_sigframe *frame;
+abi_ulong frame_addr;
+int i;
+
+frame_addr = get_sigframe(ka, env, sizeof(*frame));
+trace_user_setup_frame(env, frame_addr);
+if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0)) {
+unlock_user_struct(frame, frame_addr, 1);
+dump_core_and_abort(TARGET_SIGILL);
+return;
+}
+
+memset(frame, 0, sizeof(*frame));
+setup_sigframe_arch(env, frame_addr, frame, 0);
+
+for (i = 0; i < TARGET_NSIG_WORDS; i++) {
+__put_user(set->__bits[i], &frame->sf_uc.uc_sigmask.__bits[i]);
+}
+
+if (tinfo) {
+frame->sf_si.si_signo = tinfo->si_signo;
+frame->sf_si.si_errno = tinfo->si_errno;
+frame->sf_si.si_code = tinfo->si_code;
+frame->sf_si.si_pid = tinfo->si_pid;
+frame->sf_si.si_uid = tinfo->si_uid;
+frame->sf_si.si_status = tinfo->si_status;
+frame->sf_si.si_addr = tinfo->si_addr;
+/* see host_to_target_siginfo_noswap() for more details */
+frame->sf_si.si_value.sival_ptr = tinfo->si_value.sival_ptr;
+/*
+ * At this point, whatever is in the _reason union is complete
+ * and in target order, so just copy the whole thing over, even
+ * if it's too large for this specific signal.
+ * host_to_target_siginfo_noswap() and tswap_siginfo() have ensured
+ * that's so.
+ */
+memcpy(&frame->sf_si._reason, &tinfo->_reason,
+   sizeof(tinfo->_reason));
+}
+
+set_sigtramp_args(env, sig, frame, frame_addr, ka);
+
+unlock_user_struct(frame, frame_addr, 1);
+}
+
 void signal_init(void)
 {
 TaskState *ts = (TaskState *)thread_cpu->opaque;
-- 
2.33.1
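
Two small pieces of arithmetic in the patch above are easy to misread: on_sig_stack() checks both bounds of the alternate stack with a single unsigned comparison, and get_sigframe() moves to the top of the alternate stack before subtracting the frame size and aligning down. The sketch below isolates both under stated assumptions: struct alt_stack, on_alt_stack and frame_address are hypothetical names, addresses are plain uint64_t, and the alignment is passed in rather than selected per target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sigaltstack fields used by the patch. */
struct alt_stack {
    uint64_t base;   /* lowest address of the alternate stack */
    uint64_t size;   /* size in bytes, 0 when disabled */
};

/*
 * One unsigned comparison covers both bounds: when sp < base the
 * subtraction wraps to a huge value and the test fails.
 */
static int on_alt_stack(const struct alt_stack *st, uint64_t sp)
{
    return sp - st->base < st->size;
}

/* Pick the frame address: switch stacks if asked and not already there. */
static uint64_t frame_address(const struct alt_stack *st, uint64_t sp,
                              int want_alt, uint64_t frame_size,
                              uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    if (want_alt && st->size != 0 && !on_alt_stack(st, sp)) {
        sp = st->base + st->size;   /* stacks grow down: start at the top */
    }
    return (sp - frame_size) & ~(align - 1);
}
```

The "not already there" test is what keeps a nested signal, taken while a handler is running on the alternate stack, from resetting the stack pointer to the top and overwriting the outer handler's frame.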

[PULL 22/40] bsd-user: Add host signals to the build

2022-01-31 Thread Warner Losh

Start to add the host signal functionality to the build.

Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal.c | 1 +
 meson.build   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index ad8437a8bfb..f3e020e004a 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -22,6 +22,7 @@
 #include "qemu.h"
 #include "signal-common.h"
 #include "hw/core/tcg-cpu-ops.h"
+#include "host-signal.h"
 
 /*
  * Stubbed out routines until we merge signal support from bsd-user
diff --git a/meson.build b/meson.build
index 5dbc9a7a365..155403d44f4 100644
--- a/meson.build
+++ b/meson.build
@@ -2947,6 +2947,7 @@ foreach target : target_dirs
 if 'CONFIG_BSD_USER' in config_target
   base_dir = 'bsd-user'
   target_inc += include_directories('bsd-user/' / targetos)
+  target_inc += include_directories('bsd-user/host/' / host_arch)
   dir = base_dir / abi
   arch_srcs += files(dir / 'signal.c', dir / 'target_arch_cpu.c')
 endif
-- 
2.33.1

[PULL 00/40] Bsd user arm 2022q1 patches

2022-01-31 Thread Warner Losh

The following changes since commit 7a1043cef91739ff4b59812d30f1ed2850d3d34e:

  Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging (2022-01-28 14:04:01 +)

are available in the Git repository at:

  g...@gitlab.com:bsdimp/qemu.git tags/bsd-user-arm-2022q1-pull-request

for you to fetch changes up to 1103d59caaa82c94b4223a5429c31895d2f05217:

  bsd-user/freebsd/target_os_ucontext.h: Prefer env as arg name for CPUArchState args (2022-01-30 17:13:50 -0700)


bsd-user: upstream signal implementation

Upstream the bsd-user fork signal implementation, for the most part.  This
series of commits represents nearly all of the infrastructure that surrounds
signals, except the actual system call glue (which was also reworked in the
fork and needs its own series). In addition, this adds the SIGSEGV and SIGBUS
code to arm. Even in the fork, we don't have a good x86 signal implementation,
so there's little to upstream for that at the moment.

bsd-user's signal implementation is similar to linux-user's. The full context
can be found in the bsd-user fork's 'blitz' branch at
https://github.com/qemu-bsd-user/qemu-bsd-user/tree/blitz which shows how these
are used to implement various system calls. Since this was built from
linux-user's stack code and evolved for BSD over a few years, it no doubt
missed some bug fixes from linux-user (though nothing obvious stood out in the
quick comparison I made). After the first round of reviews, many of these
improvements have been incorporated.

Patchew history: https://patchew.org/QEMU/20220125012947.14974-1-...@bsdimp.com/



Warner Losh (40):
  bsd-user: Complete FreeBSD siginfo
  bsd-user: Create setup_sigframe_arch to setup sigframe context
  bsd-user/arm/signal.c: Implement setup_sigframe_arch for arm
  bsd-user/arm/signal.c: get_mcontext should zero vfp data
  bsd-user: Remove vestiges of signal queueing code
  bsd-user: Bring in docs from linux-user for signal_pending
  bsd-user/arm/target_arch_cpu.h: Move EXCP_ATOMIC to match linux-user
  bsd-user/signal.c: implement force_sig_fault
  bsd-user/signal-common.h: Move signal functions prototypes to here
  bsd-user/signal.c: Implement cpu_loop_exit_sigsegv
  bsd-user/signal.c: implement cpu_loop_exit_sigbus
  bsd-user/arm/arget_arch_cpu.h: Move EXCP_DEBUG and EXCP_BKPT together
  bsd-user/arm/target_arch_cpu.h: Correct code pointer
  bsd-user/arm/target_arch_cpu.h: Use force_sig_fault for EXCP_UDEF
  bsd-user/arm/target_arch_cpu.h: Implement data faults
  bsd-user/signal.c: implement abstract target / host signal translation
  bsd-user/signal.c: Implement signal_init()
  bsd-user/signal.c: Add si_type argument to queue_signal
  bsd-user/host/arm/host-signal.h: Implement host_signal_*
  bsd-user/host/i386/host-signal.h: Implement host_signal_*
  bsd-user/host/x86_64/host-signal.h: Implement host_signal_*
  bsd-user: Add host signals to the build
  bsd-user: Add trace events for bsd-user
  bsd-user/signal.c: host_to_target_siginfo_noswap
  bsd-user/signal.c: Implement rewind_if_in_safe_syscall
  bsd-user/signal.c: Implement host_signal_handler
  bsd-user/strace.c: print_taken_signal
  bsd-user/signal.c: Implement dump_core_and_abort
  bsd-user/signal.c: Fill in queue_signal
  bsd-user/signal.c: sigset manipulation routines.
  bsd-user/signal.c: setup_frame
  bsd-user/signal.c: handle_pending_signal
  bsd-user/signal.c: tswap_siginfo
  bsd-user/signal.c: process_pending_signals
  bsd-user/signal.c: implement do_sigreturn
  bsd-user/signal.c: implement do_sigaction
  bsd-user/signal.c: do_sigaltstack
  MAINTAINERS: Add tests/vm/*bsd to the list to get reviews on
  bsd-user: Rename arg name for target_cpu_reset to env
  bsd-user/freebsd/target_os_ucontext.h: Prefer env as arg name for
CPUArchState args

 MAINTAINERS   |1 +
 bsd-user/arm/signal.c |   59 +-
 bsd-user/arm/target_arch_cpu.h|  101 +--
 bsd-user/freebsd/target_os_siginfo.h  |   15 +-
 bsd-user/freebsd/target_os_signal.h   |3 +
 bsd-user/freebsd/target_os_ucontext.h |6 +-
 bsd-user/host/arm/host-signal.h   |   35 +
 bsd-user/host/i386/host-signal.h  |   37 +
 bsd-user/host/x86_64/host-signal.h|   37 +
 bsd-user/i386/signal.c|   13 +
 bsd-user/i386/target_arch_cpu.h   |5 +-
 bsd-user/main.c   |   14 +-
 bsd-user/qemu.h   |   66 +-
 bsd-user/signal-common.h  |   70 ++
 bsd-user/signal.c | 1008 -
 bsd-user/strace.c |   97 +++
 bsd-user/syscall_defs.h   |1 +
 bsd-user/trace-events |   11 +
 bsd-user/trace.h  |1 +
 bsd-user/x86_64/signal.c  |   13 +
 bsd-user/x86_64/target_arch_cpu.h |5 +-
 meson.build

[PULL 30/40] bsd-user/signal.c: sigset manipulation routines.

2022-01-31 Thread Warner Losh

target_sigemptyset: resets a set to having no bits set
target_sigaddset:   adds a signal to a set
target_sigismember: returns true when signal is a member
host_to_target_sigset_internal: convert host sigset to target
host_to_target_sigset: convert host sigset to target
target_to_host_sigset_internal: convert target sigset to host
target_to_host_sigset: convert target sigset to host

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal-common.h |  2 ++
 bsd-user/signal.c| 74 
 2 files changed, 76 insertions(+)

diff --git a/bsd-user/signal-common.h b/bsd-user/signal-common.h
index 80e9503238a..ee819266f54 100644
--- a/bsd-user/signal-common.h
+++ b/bsd-user/signal-common.h
@@ -14,11 +14,13 @@ abi_long do_sigaltstack(abi_ulong uss_addr, abi_ulong uoss_addr, abi_ulong sp);
 long do_sigreturn(CPUArchState *env);
 void force_sig_fault(int sig, int code, abi_ulong addr);
 int host_to_target_signal(int sig);
+void host_to_target_sigset(target_sigset_t *d, const sigset_t *s);
 void process_pending_signals(CPUArchState *env);
 void queue_signal(CPUArchState *env, int sig, int si_type,
   target_siginfo_t *info);
 void signal_init(void);
 int target_to_host_signal(int sig);
+void target_to_host_sigset(sigset_t *d, const target_sigset_t *s);
 
 /*
  * Within QEMU the top 8 bits of si_code indicate which of the parts of the
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 34663f7a28a..84dafa4e9fe 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -32,6 +32,9 @@
 
 static struct target_sigaction sigact_table[TARGET_NSIG];
 static void host_signal_handler(int host_sig, siginfo_t *info, void *puc);
+static void target_to_host_sigset_internal(sigset_t *d,
+const target_sigset_t *s);
+
 
 /*
  * The BSD ABIs use the same signal numbers across all the CPU architectures, so
@@ -48,6 +51,25 @@ int target_to_host_signal(int sig)
 return sig;
 }
 
+static inline void target_sigemptyset(target_sigset_t *set)
+{
+memset(set, 0, sizeof(*set));
+}
+
+static inline void target_sigaddset(target_sigset_t *set, int signum)
+{
+signum--;
+uint32_t mask = (uint32_t)1 << (signum % TARGET_NSIG_BPW);
+set->__bits[signum / TARGET_NSIG_BPW] |= mask;
+}
+
+static inline int target_sigismember(const target_sigset_t *set, int signum)
+{
+signum--;
+abi_ulong mask = (abi_ulong)1 << (signum % TARGET_NSIG_BPW);
+return (set->__bits[signum / TARGET_NSIG_BPW] & mask) != 0;
+}
+
 /* Adjust the signal context to rewind out of safe-syscall if we're in it */
 static inline void rewind_if_in_safe_syscall(void *puc)
 {
@@ -60,6 +82,58 @@ static inline void rewind_if_in_safe_syscall(void *puc)
 }
 }
 
+/*
+ * Note: The following take advantage of the BSD signal property that all
+ * signals are available on all architectures.
+ */
+static void host_to_target_sigset_internal(target_sigset_t *d,
+const sigset_t *s)
+{
+int i;
+
+target_sigemptyset(d);
+for (i = 1; i <= NSIG; i++) {
+if (sigismember(s, i)) {
+target_sigaddset(d, host_to_target_signal(i));
+}
+}
+}
+
+void host_to_target_sigset(target_sigset_t *d, const sigset_t *s)
+{
+target_sigset_t d1;
+int i;
+
+host_to_target_sigset_internal(&d1, s);
+for (i = 0; i < _SIG_WORDS; i++) {
+d->__bits[i] = tswap32(d1.__bits[i]);
+}
+}
+
+static void target_to_host_sigset_internal(sigset_t *d,
+const target_sigset_t *s)
+{
+int i;
+
+sigemptyset(d);
+for (i = 1; i <= TARGET_NSIG; i++) {
+if (target_sigismember(s, i)) {
+sigaddset(d, target_to_host_signal(i));
+}
+}
+}
+
+void target_to_host_sigset(sigset_t *d, const target_sigset_t *s)
+{
+target_sigset_t s1;
+int i;
+
+for (i = 0; i < TARGET_NSIG_WORDS; i++) {
+s1.__bits[i] = tswap32(s->__bits[i]);
+}
+target_to_host_sigset_internal(d, &s1);
+}
+
 static bool has_trapno(int tsig)
 {
 return tsig == TARGET_SIGILL ||
-- 
2.33.1
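
The bit arithmetic behind target_sigaddset and target_sigismember can be tried outside QEMU with a small stand-alone sketch. The demo_* names, the 128-signal limit and the 32-bit word size below are assumptions made for illustration only; they stand in for target_sigset_t, TARGET_NSIG and TARGET_NSIG_BPW and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for TARGET_NSIG, TARGET_NSIG_BPW and target_sigset_t. */
#define DEMO_NSIG       128
#define DEMO_NSIG_BPW   32
#define DEMO_NSIG_WORDS (DEMO_NSIG / DEMO_NSIG_BPW)

typedef struct {
    uint32_t bits[DEMO_NSIG_WORDS];
} demo_sigset_t;

/* Reset the set so that no signal is a member. */
void demo_sigemptyset(demo_sigset_t *set)
{
    memset(set, 0, sizeof(*set));
}

/* Signal numbers start at 1, so signal N lives at bit index N - 1. */
void demo_sigaddset(demo_sigset_t *set, int signum)
{
    assert(signum >= 1 && signum <= DEMO_NSIG);
    signum--;
    set->bits[signum / DEMO_NSIG_BPW] |=
        (uint32_t)1 << (signum % DEMO_NSIG_BPW);
}

/* Return true when the signal's bit is set. */
bool demo_sigismember(const demo_sigset_t *set, int signum)
{
    assert(signum >= 1 && signum <= DEMO_NSIG);
    signum--;
    return (set->bits[signum / DEMO_NSIG_BPW] &
            ((uint32_t)1 << (signum % DEMO_NSIG_BPW))) != 0;
}
```

With these definitions, adding signal 1 sets bit 0 of word 0 and adding signal 33 sets bit 0 of word 1, which is the same layout the conversion loops in the patch walk one signal at a time.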

[PULL 16/40] bsd-user/signal.c: implement abstract target / host signal translation

2022-01-31 Thread Warner Losh

Implement host_to_target_signal and target_to_host_signal.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal-common.h |  2 ++
 bsd-user/signal.c| 16 
 2 files changed, 18 insertions(+)

diff --git a/bsd-user/signal-common.h b/bsd-user/signal-common.h
index f9a9d1e01aa..efed23d9efb 100644
--- a/bsd-user/signal-common.h
+++ b/bsd-user/signal-common.h
@@ -13,8 +13,10 @@ long do_rt_sigreturn(CPUArchState *env);
 abi_long do_sigaltstack(abi_ulong uss_addr, abi_ulong uoss_addr, abi_ulong sp);
 long do_sigreturn(CPUArchState *env);
 void force_sig_fault(int sig, int code, abi_ulong addr);
+int host_to_target_signal(int sig);
 void process_pending_signals(CPUArchState *env);
 void queue_signal(CPUArchState *env, int sig, target_siginfo_t *info);
 void signal_init(void);
+int target_to_host_signal(int sig);
 
 #endif
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 844dfa19095..1313baec96a 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -2,6 +2,7 @@
  *  Emulation of BSD signals
  *
  *  Copyright (c) 2003 - 2008 Fabrice Bellard
+ *  Copyright (c) 2013 Stacey Son
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -27,6 +28,21 @@
  * fork.
  */
 
+/*
+ * The BSD ABIs use the same signal numbers across all the CPU architectures, so
+ * (unlike Linux) these functions are just the identity mapping. This might not
+ * be true for XyzBSD running on AbcBSD, which doesn't currently work.
+ */
+int host_to_target_signal(int sig)
+{
+return sig;
+}
+
+int target_to_host_signal(int sig)
+{
+return sig;
+}
+
 /*
  * Queue a signal so that it will be send to the virtual CPU as soon as
  * possible.
-- 
2.33.1

[PULL 02/40] bsd-user: Create setup_sigframe_arch to setup sigframe context

2022-01-31 Thread Warner Losh

Define setup_sigframe_arch, whose job is to set up the mcontext for the
sigframe. Implement it for x86 by simply calling get_mcontext.

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/target_os_signal.h |  3 +++
 bsd-user/i386/signal.c  | 13 +
 bsd-user/x86_64/signal.c| 13 +
 3 files changed, 29 insertions(+)

diff --git a/bsd-user/freebsd/target_os_signal.h b/bsd-user/freebsd/target_os_signal.h
index 3ed454e086d..43700d08f71 100644
--- a/bsd-user/freebsd/target_os_signal.h
+++ b/bsd-user/freebsd/target_os_signal.h
@@ -4,6 +4,9 @@
 #include "target_os_siginfo.h"
 #include "target_arch_signal.h"
 
+abi_long setup_sigframe_arch(CPUArchState *env, abi_ulong frame_addr,
+ struct target_sigframe *frame, int flags);
+
 /* Compare to sys/signal.h */
 #define TARGET_SIGHUP  1   /* hangup */
 #define TARGET_SIGINT  2   /* interrupt */
diff --git a/bsd-user/i386/signal.c b/bsd-user/i386/signal.c
index 2939d32400c..5dd975ce56a 100644
--- a/bsd-user/i386/signal.c
+++ b/bsd-user/i386/signal.c
@@ -32,6 +32,19 @@ abi_long set_sigtramp_args(CPUX86State *env, int sig,
 return 0;
 }
 
+/*
+ * Compare to i386/i386/exec_machdep.c sendsig()
+ * Assumes that the memory is locked if frame points to user memory.
+ */
+abi_long setup_sigframe_arch(CPUX86State *env, abi_ulong frame_addr,
+ struct target_sigframe *frame, int flags)
+{
+target_mcontext_t *mcp = &frame->sf_uc.uc_mcontext;
+
+get_mcontext(env, mcp, flags);
+return 0;
+}
+
 /* Compare to i386/i386/machdep.c get_mcontext() */
 abi_long get_mcontext(CPUX86State *regs, target_mcontext_t *mcp, int flags)
 {
diff --git a/bsd-user/x86_64/signal.c b/bsd-user/x86_64/signal.c
index 8885152a7da..c3875bc4c6a 100644
--- a/bsd-user/x86_64/signal.c
+++ b/bsd-user/x86_64/signal.c
@@ -30,6 +30,19 @@ abi_long set_sigtramp_args(CPUX86State *regs,
 return 0;
 }
 
+/*
+ * Compare to amd64/amd64/exec_machdep.c sendsig()
+ * Assumes that the memory is locked if frame points to user memory.
+ */
+abi_long setup_sigframe_arch(CPUX86State *env, abi_ulong frame_addr,
+ struct target_sigframe *frame, int flags)
+{
+target_mcontext_t *mcp = &frame->sf_uc.uc_mcontext;
+
+get_mcontext(env, mcp, flags);
+return 0;
+}
+
 /* Compare to amd64/amd64/machdep.c get_mcontext() */
 abi_long get_mcontext(CPUX86State *regs,
 target_mcontext_t *mcp, int flags)
-- 
2.33.1

[PULL 08/40] bsd-user/signal.c: implement force_sig_fault

2022-01-31 Thread Warner Losh

Start to implement the force_sig_fault code. This currently just calls
queue_signal(). The bsd-user fork version of that will handle the
synchronous nature of this call. Add signal-common.h to hold signal
helper functions like force_sig_fault.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal-common.h | 14 ++
 bsd-user/signal.c| 18 ++
 2 files changed, 32 insertions(+)
 create mode 100644 bsd-user/signal-common.h

diff --git a/bsd-user/signal-common.h b/bsd-user/signal-common.h
new file mode 100644
index 000..6207417d39e
--- /dev/null
+++ b/bsd-user/signal-common.h
@@ -0,0 +1,14 @@
+/*
+ * Emulation of BSD signals
+ *
+ * Copyright (c) 2013 Stacey Son
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef SIGNAL_COMMON_H
+#define SIGNAL_COMMON_H
+
+void force_sig_fault(int sig, int code, abi_ulong addr);
+
+#endif
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 05b277c6422..1206d0d728c 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu.h"
+#include "signal-common.h"
 
 /*
  * Stubbed out routines until we merge signal support from bsd-user
@@ -34,6 +35,23 @@ void queue_signal(CPUArchState *env, int sig, target_siginfo_t *info)
 qemu_log_mask(LOG_UNIMP, "No signal queueing, dropping signal %d\n", sig);
 }
 
+/*
+ * Force a synchronously taken QEMU_SI_FAULT signal. For QEMU the
+ * 'force' part is handled in process_pending_signals().
+ */
+void force_sig_fault(int sig, int code, abi_ulong addr)
+{
+CPUState *cpu = thread_cpu;
+CPUArchState *env = cpu->env_ptr;
+target_siginfo_t info = {};
+
+info.si_signo = sig;
+info.si_errno = 0;
+info.si_code = code;
+info.si_addr = addr;
+queue_signal(env, sig, &info);
+}
+
 void signal_init(void)
 {
 }
-- 
2.33.1

[PULL 36/40] bsd-user/signal.c: implement do_sigaction

2022-01-31 Thread Warner Losh

Implement the meat of the sigaction(2) system call with do_sigaction and
the helper routine block_signals (which is also used to implement signal
masking, so it's global).

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal-common.h | 22 +++
 bsd-user/signal.c| 82 
 2 files changed, 104 insertions(+)

diff --git a/bsd-user/signal-common.h b/bsd-user/signal-common.h
index 786ec592d18..7ff8e8f2e40 100644
--- a/bsd-user/signal-common.h
+++ b/bsd-user/signal-common.h
@@ -9,7 +9,29 @@
 #ifndef SIGNAL_COMMON_H
 #define SIGNAL_COMMON_H
 
+/**
+ * block_signals: block all signals while handling this guest syscall
+ *
+ * Block all signals, and arrange that the signal mask is returned to
+ * its correct value for the guest before we resume execution of guest code.
+ * If this function returns non-zero, then the caller should immediately
+ * return -TARGET_ERESTARTSYS to the main loop, which will take the pending
+ * signal and restart execution of the syscall.
+ * If block_signals() returns zero, then the caller can continue with
+ * emulation of the system call knowing that no signals can be taken
+ * (and therefore that no race conditions will result).
+ * This should only be called once, because if it is called a second time
+ * it will always return non-zero. (Think of it like a mutex that can't
+ * be recursively locked.)
+ * Signals will be unblocked again by process_pending_signals().
+ *
+ * Return value: non-zero if there was a pending signal, zero if not.
+ */
+int block_signals(void); /* Returns non zero if signal pending */
+
 long do_rt_sigreturn(CPUArchState *env);
+int do_sigaction(int sig, const struct target_sigaction *act,
+struct target_sigaction *oact);
 abi_long do_sigaltstack(abi_ulong uss_addr, abi_ulong uoss_addr, abi_ulong sp);
 long do_sigreturn(CPUArchState *env, abi_ulong addr);
 void force_sig_fault(int sig, int code, abi_ulong addr);
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 150262a87e5..5c94bd02e38 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -309,6 +309,25 @@ static void tswap_siginfo(target_siginfo_t *tinfo, const target_siginfo_t *info)
 }
 }
 
+int block_signals(void)
+{
+TaskState *ts = (TaskState *)thread_cpu->opaque;
+sigset_t set;
+
+/*
+ * It's OK to block everything including SIGSEGV, because we won't run any
+ * further guest code before unblocking signals in
+ * process_pending_signals(). We depend on the FreeBSD behavior here where
+ * this will only affect this thread's signal mask. We don't use
+ * pthread_sigmask which might seem more correct because that routine also
+ * does odd things with SIGCANCEL to implement pthread_cancel().
+ */
+sigfillset(&set);
+sigprocmask(SIG_SETMASK, &set, 0);
+
+return qatomic_xchg(&ts->signal_pending, 1);
+}
+
 /* Returns 1 if given signal should dump core if not handled. */
 static int core_dump_signal(int sig)
 {
@@ -554,6 +573,69 @@ static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
 cpu_exit(thread_cpu);
 }
 
+/* do_sigaction() return host values and errnos */
+int do_sigaction(int sig, const struct target_sigaction *act,
+struct target_sigaction *oact)
+{
+struct target_sigaction *k;
+struct sigaction act1;
+int host_sig;
+int ret = 0;
+
+if (sig < 1 || sig > TARGET_NSIG) {
+return -TARGET_EINVAL;
+}
+
+if ((sig == TARGET_SIGKILL || sig == TARGET_SIGSTOP) &&
+act != NULL && act->_sa_handler != TARGET_SIG_DFL) {
+return -TARGET_EINVAL;
+}
+
+if (block_signals()) {
+return -TARGET_ERESTART;
+}
+
+k = &sigact_table[sig - 1];
+if (oact) {
+oact->_sa_handler = tswapal(k->_sa_handler);
+oact->sa_flags = tswap32(k->sa_flags);
+oact->sa_mask = k->sa_mask;
+}
+if (act) {
+k->_sa_handler = tswapal(act->_sa_handler);
+k->sa_flags = tswap32(act->sa_flags);
+k->sa_mask = act->sa_mask;
+
+/* Update the host signal state. */
+host_sig = target_to_host_signal(sig);
+if (host_sig != SIGSEGV && host_sig != SIGBUS) {
+memset(&act1, 0, sizeof(struct sigaction));
+sigfillset(&act1.sa_mask);
+act1.sa_flags = SA_SIGINFO;
+if (k->sa_flags & TARGET_SA_RESTART) {
+act1.sa_flags |= SA_RESTART;
+}
+/*
+ *  Note: It is important to update the host kernel signal mask to
+ *  avoid getting unexpected interrupted system calls.
+ */
+if (k->_sa_handler == TARGET_SIG_IGN) {
+act1.sa_sigaction = (void *)SIG_IGN;
+} else if (k->_sa_handler == TARGET_SIG_DFL) {
+if (fatal_signal(sig)) {
+act1.sa_sigaction = host_signal_handler;
+} else {
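
The contract documented for block_signals() (block everything, then report whether a signal was already pending) can be sketched with plain POSIX calls and C11 atomics. This is only an illustration under stated assumptions: demo_signal_pending is a hypothetical process-wide flag standing in for the per-thread TaskState field, atomic_exchange stands in for qatomic_xchg, and demo_unblock_signals is a made-up stand-in for the unblocking that process_pending_signals() performs.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag standing in for TaskState.signal_pending. */
static atomic_int demo_signal_pending;

/*
 * Block every signal in the calling thread and mark a signal as pending.
 * Returns the previous value of the flag: non-zero means a signal was
 * already pending (or this was called twice without unblocking).
 */
int demo_block_signals(void)
{
    sigset_t set;
    int rc;

    sigfillset(&set);
    rc = sigprocmask(SIG_SETMASK, &set, NULL);
    assert(rc == 0);
    (void)rc;

    return atomic_exchange(&demo_signal_pending, 1);
}

/* Clear the flag and restore a previously saved signal mask. */
void demo_unblock_signals(const sigset_t *restore)
{
    atomic_store(&demo_signal_pending, 0);
    sigprocmask(SIG_SETMASK, restore, NULL);
}
```

A caller would save its mask first, call demo_block_signals(), and treat a non-zero return as the cue to restart the emulated system call, mirroring the "mutex that can't be recursively locked" behavior described in the header comment.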
+

[PULL 26/40] bsd-user/signal.c: Implement host_signal_handler

2022-01-31 Thread Warner Losh

Implement host_signal_handler to handle signals generated by the host
and to do safe system calls.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal.c | 105 ++
 1 file changed, 105 insertions(+)

diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 454aef2993e..24cf4b1120b 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -223,6 +223,111 @@ void force_sig_fault(int sig, int code, abi_ulong addr)
 
 static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
 {
+CPUArchState *env = thread_cpu->env_ptr;
+CPUState *cpu = env_cpu(env);
+TaskState *ts = cpu->opaque;
+target_siginfo_t tinfo;
+ucontext_t *uc = puc;
+struct emulated_sigtable *k;
+int guest_sig;
+uintptr_t pc = 0;
+bool sync_sig = false;
+
+/*
+ * Non-spoofed SIGSEGV and SIGBUS are synchronous, and need special
+ * handling wrt signal blocking and unwinding.
+ */
+if ((host_sig == SIGSEGV || host_sig == SIGBUS) && info->si_code > 0) {
+MMUAccessType access_type;
+uintptr_t host_addr;
+abi_ptr guest_addr;
+bool is_write;
+
+host_addr = (uintptr_t)info->si_addr;
+
+/*
+ * Convert forcefully to guest address space: addresses outside
+ * reserved_va are still valid to report via SEGV_MAPERR.
+ */
+guest_addr = h2g_nocheck(host_addr);
+
+pc = host_signal_pc(uc);
+is_write = host_signal_write(info, uc);
+access_type = adjust_signal_pc(&pc, is_write);
+
+if (host_sig == SIGSEGV) {
+bool maperr = true;
+
+if (info->si_code == SEGV_ACCERR && h2g_valid(host_addr)) {
+/* If this was a write to a TB protected page, restart. */
+if (is_write &&
+handle_sigsegv_accerr_write(cpu, &uc->uc_sigmask,
+pc, guest_addr)) {
+return;
+}
+
+/*
+ * With reserved_va, the whole address space is PROT_NONE,
+ * which means that we may get ACCERR when we want MAPERR.
+ */
+if (page_get_flags(guest_addr) & PAGE_VALID) {
+maperr = false;
+} else {
+info->si_code = SEGV_MAPERR;
+}
+}
+
+sigprocmask(SIG_SETMASK, &uc->uc_sigmask, NULL);
+cpu_loop_exit_sigsegv(cpu, guest_addr, access_type, maperr, pc);
+} else {
+sigprocmask(SIG_SETMASK, &uc->uc_sigmask, NULL);
+if (info->si_code == BUS_ADRALN) {
+cpu_loop_exit_sigbus(cpu, guest_addr, access_type, pc);
+}
+}
+
+sync_sig = true;
+}
+
+/* Get the target signal number. */
+guest_sig = host_to_target_signal(host_sig);
+if (guest_sig < 1 || guest_sig > TARGET_NSIG) {
+return;
+}
+trace_user_host_signal(cpu, host_sig, guest_sig);
+
+host_to_target_siginfo_noswap(&tinfo, info);
+
+k = &ts->sigtab[guest_sig - 1];
+k->info = tinfo;
+k->pending = guest_sig;
+ts->signal_pending = 1;
+
+/*
+ * For synchronous signals, unwind the cpu state to the faulting
+ * insn and then exit back to the main loop so that the signal
+ * is delivered immediately.
+ */
+if (sync_sig) {
+cpu->exception_index = EXCP_INTERRUPT;
+cpu_loop_exit_restore(cpu, pc);
+}
+
+rewind_if_in_safe_syscall(puc);
+
+/*
+ * Block host signals until target signal handler entered. We
+ * can't block SIGSEGV or SIGBUS while we're executing guest
+ * code in case the guest code provokes one in the window between
+ * now and it getting out to the main loop. Signals will be
+ * unblocked again in process_pending_signals().
+ */
+sigfillset(&uc->uc_sigmask);
+sigdelset(&uc->uc_sigmask, SIGSEGV);
+sigdelset(&uc->uc_sigmask, SIGBUS);
+
+/* Interrupt the virtual CPU as soon as possible. */
+cpu_exit(thread_cpu);
 }
 
 void signal_init(void)
-- 
2.33.1
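
The last step of host_signal_handler edits uc_sigmask in the saved context, so the new mask takes effect when the handler returns rather than while it runs. A minimal sketch of that technique follows; it assumes a POSIX host where the kernel restores uc_sigmask on handler return, and the demo_* helpers and the choice of SIGUSR1 are illustrative only, not part of the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/*
 * Hypothetical handler: rewrite the mask that is restored on return so
 * that everything except SIGSEGV and SIGBUS stays blocked afterwards.
 */
static void demo_handler(int sig, siginfo_t *info, void *puc)
{
    ucontext_t *uc = puc;

    (void)sig;
    (void)info;
    sigfillset(&uc->uc_sigmask);
    sigdelset(&uc->uc_sigmask, SIGSEGV);
    sigdelset(&uc->uc_sigmask, SIGBUS);
}

/* Install demo_handler for SIGUSR1 and make sure SIGUSR1 is deliverable. */
int demo_install_handler(void)
{
    struct sigaction act;
    sigset_t unblock;

    memset(&act, 0, sizeof(act));
    sigfillset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    act.sa_sigaction = demo_handler;
    if (sigaction(SIGUSR1, &act, NULL) != 0) {
        return -1;
    }
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    return sigprocmask(SIG_UNBLOCK, &unblock, NULL);
}

/* Report whether sig is blocked in the calling thread's current mask. */
int demo_is_blocked(int sig)
{
    sigset_t cur;
    int rc = sigprocmask(SIG_SETMASK, NULL, &cur);

    assert(rc == 0);
    (void)rc;
    return sigismember(&cur, sig);
}
```

After demo_install_handler() and raise(SIGUSR1), the thread should find ordinary signals such as SIGUSR2 blocked while SIGSEGV and SIGBUS remain deliverable. In the patch that blocked state lasts until process_pending_signals() unblocks signals again.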

[PULL 20/40] bsd-user/host/i386/host-signal.h: Implement host_signal_*

2022-01-31 Thread Warner Losh

Implement host_signal_pc, host_signal_set_pc and host_signal_write for
i386.

Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/host/i386/host-signal.h | 37 
 1 file changed, 37 insertions(+)
 create mode 100644 bsd-user/host/i386/host-signal.h

diff --git a/bsd-user/host/i386/host-signal.h b/bsd-user/host/i386/host-signal.h
new file mode 100644
index 000..169e61b154c
--- /dev/null
+++ b/bsd-user/host/i386/host-signal.h
@@ -0,0 +1,37 @@
+/*
+ * host-signal.h: signal info dependent on the host architecture
+ *
+ * Copyright (c) 2021 Warner Losh
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef I386_HOST_SIGNAL_H
+#define I386_HOST_SIGNAL_H
+
+#include 
+#include 
+#include 
+#include 
+
+static inline uintptr_t host_signal_pc(ucontext_t *uc)
+{
+return uc->uc_mcontext.mc_eip;
+}
+
+static inline void host_signal_set_pc(ucontext_t *uc, uintptr_t pc)
+{
+uc->uc_mcontext.mc_eip = pc;
+}
+
+static inline bool host_signal_write(siginfo_t *info, ucontext_t *uc)
+{
+/*
+ * Look in sys/i386/i386/trap.c. NOTE: mc_err == tr_err due to type punning
+ * between a trapframe and mcontext on FreeBSD/i386.
+ */
+return uc->uc_mcontext.mc_trapno == T_PAGEFLT &&
+uc->uc_mcontext.mc_err & PGEX_W;
+}
+
+#endif
-- 
2.33.1

[PULL 14/40] bsd-user/arm/target_arch_cpu.h: Use force_sig_fault for EXCP_UDEF

2022-01-31 Thread Warner Losh

Use force_sig_fault to implement unknown opcode. This just uninlines
that function, so simplify things by using it. Fold in EXCP_NOCP and
EXCP_INVSTATE, as is done in linux-user. Make a note about slight
differences with FreeBSD in case any of them turn out to be important
later.

Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/arm/target_arch_cpu.h | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/bsd-user/arm/target_arch_cpu.h b/bsd-user/arm/target_arch_cpu.h
index 905f13aa1b9..9d790176420 100644
--- a/bsd-user/arm/target_arch_cpu.h
+++ b/bsd-user/arm/target_arch_cpu.h
@@ -51,18 +51,19 @@ static inline void target_cpu_loop(CPUARMState *env)
 process_queued_cpu_work(cs);
 switch (trapnr) {
 case EXCP_UDEF:
-{
-/* See arm/arm/undefined.c undefinedinstruction(); */
-info.si_addr = env->regs[15];
-
-/* illegal instruction */
-info.si_signo = TARGET_SIGILL;
-info.si_errno = 0;
-info.si_code = TARGET_ILL_ILLOPC;
-queue_signal(env, info.si_signo, &info);
-
-/* TODO: What about instruction emulation? */
-}
+case EXCP_NOCP:
+case EXCP_INVSTATE:
+/*
+ * See arm/arm/undefined.c undefinedinstruction();
+ *
+ * A number of details aren't emulated (they likely don't matter):
+ * o Misaligned PC generates ILL_ILLADR (these can't come from qemu)
+ * o Thumb-2 instructions generate ILLADR
+ * o Both modes implement coprocessor instructions, which we don't
+ *   do here. FreeBSD just implements them for the VFP coprocessor
+ *   and special kernel breakpoints, trace points, dtrace, etc.
+ */
+force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPC, env->regs[15]);
 break;
 case EXCP_SWI:
 {
-- 
2.33.1

[PULL 04/40] bsd-user/arm/signal.c: get_mcontext should zero vfp data

2022-01-31 Thread Warner Losh

FreeBSD's get_mcontext doesn't return any vfp data. Instead, it zeros
out the vfp fields (and all the spare fields). Implement this
behavior. We're still missing the sysarch(ARM_GET_VFPCONTEXT) syscall,
though.

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/arm/signal.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/bsd-user/arm/signal.c b/bsd-user/arm/signal.c
index 9026343b478..2b1dd745d13 100644
--- a/bsd-user/arm/signal.c
+++ b/bsd-user/arm/signal.c
@@ -109,6 +109,15 @@ abi_long get_mcontext(CPUARMState *env, target_mcontext_t *mcp, int flags)
 gr[TARGET_REG_LR] = tswap32(env->regs[14]);
 gr[TARGET_REG_PC] = tswap32(env->regs[15]);
 
+/*
+ * FreeBSD's get_mcontext doesn't save VFP info, but sets the pointer and
+ * size to zero.  Applications that need the VFP state use
+ * sysarch(ARM_GET_VFPSTATE) and are expected to adjust mcontext after that.
+ */
+mcp->mc_vfp_size = 0;
+mcp->mc_vfp_ptr = 0;
+memset(&mcp->mc_spare, 0, sizeof(mcp->mc_spare));
+
 return 0;
 }
 
-- 
2.33.1

[PULL 12/40] bsd-user/arm/target_arch_cpu.h: Move EXCP_DEBUG and EXCP_BKPT together

2022-01-31 Thread Warner Losh

Implement EXCP_DEBUG and EXCP_BKPT the same, as is done in
linux-user. The prior adjustment of register 15 isn't needed, so remove
that. Remove a redundant comment (that code in FreeBSD never handled
breakpoints). It's unclear why BKPT was an alias for system calls,
but FreeBSD doesn't do that today.

Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/arm/target_arch_cpu.h | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/bsd-user/arm/target_arch_cpu.h b/bsd-user/arm/target_arch_cpu.h
index b7f728fd667..05b19ce6119 100644
--- a/bsd-user/arm/target_arch_cpu.h
+++ b/bsd-user/arm/target_arch_cpu.h
@@ -65,19 +65,7 @@ static inline void target_cpu_loop(CPUARMState *env)
 }
 break;
 case EXCP_SWI:
-case EXCP_BKPT:
 {
-/*
- * system call
- * See arm/arm/trap.c cpu_fetch_syscall_args()
- */
-if (trapnr == EXCP_BKPT) {
-if (env->thumb) {
-env->regs[15] += 2;
-} else {
-env->regs[15] += 4;
-}
-}
 n = env->regs[7];
 if (bsd_type == target_freebsd) {
 int ret;
@@ -172,14 +160,8 @@ static inline void target_cpu_loop(CPUARMState *env)
  queue_signal(env, info.si_signo, &info);
 break;
 case EXCP_DEBUG:
-{
-
-info.si_signo = TARGET_SIGTRAP;
-info.si_errno = 0;
-info.si_code = TARGET_TRAP_BRKPT;
-info.si_addr = env->exception.vaddress;
-queue_signal(env, info.si_signo, &info);
-}
+case EXCP_BKPT:
+force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->regs[15]);
 break;
 case EXCP_YIELD:
 /* nothing to do here for user-mode, just resume guest code */
-- 
2.33.1

[PULL 28/40] bsd-user/signal.c: Implement dump_core_and_abort

2022-01-31 Thread Warner Losh

Force delivering a signal and generating a core file. It's a global
function for the moment...

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal.c   | 76 +
 bsd-user/syscall_defs.h |  1 +
 2 files changed, 77 insertions(+)

diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 24cf4b1120b..ccda7adbeef 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -172,6 +172,82 @@ static inline void host_to_target_siginfo_noswap(target_siginfo_t *tinfo,
 tinfo->si_code = deposit32(si_code, 24, 8, si_type);
 }
 
+/* Returns 1 if given signal should dump core if not handled. */
+static int core_dump_signal(int sig)
+{
+switch (sig) {
+case TARGET_SIGABRT:
+case TARGET_SIGFPE:
+case TARGET_SIGILL:
+case TARGET_SIGQUIT:
+case TARGET_SIGSEGV:
+case TARGET_SIGTRAP:
+case TARGET_SIGBUS:
+return 1;
+default:
+return 0;
+}
+}
+
+/* Abort execution with signal. */
+static void QEMU_NORETURN dump_core_and_abort(int target_sig)
+{
+CPUArchState *env = thread_cpu->env_ptr;
+CPUState *cpu = env_cpu(env);
+TaskState *ts = cpu->opaque;
+int core_dumped = 0;
+int host_sig;
+struct sigaction act;
+
+host_sig = target_to_host_signal(target_sig);
+gdb_signalled(env, target_sig);
+
+/* Dump core if supported by target binary format */
+if (core_dump_signal(target_sig) && (ts->bprm->core_dump != NULL)) {
+stop_all_tasks();
+core_dumped =
+((*ts->bprm->core_dump)(target_sig, env) == 0);
+}
+if (core_dumped) {
+struct rlimit nodump;
+
+/*
+ * We already dumped the core of target process, we don't want
+ * a coredump of qemu itself.
+ */
+ getrlimit(RLIMIT_CORE, &nodump);
+ nodump.rlim_cur = 0;
+ setrlimit(RLIMIT_CORE, &nodump);
+ (void) fprintf(stderr, "qemu: uncaught target signal %d (%s) "
+ "- %s\n", target_sig, strsignal(host_sig), "core dumped");
+}
+
+/*
+ * The proper exit code for dying from an uncaught signal is
+ * -<signal>.  The kernel doesn't allow exit() or _exit() to pass
+ * a negative value.  To get the proper exit code we need to
+ * actually die from an uncaught signal.  Here the default signal
+ * handler is installed, we send ourself a signal and we wait for
+ * it to arrive.
+ */
+memset(&act, 0, sizeof(act));
+sigfillset(&act.sa_mask);
+act.sa_handler = SIG_DFL;
+sigaction(host_sig, &act, NULL);
+
+kill(getpid(), host_sig);
+
+/*
+ * Make sure the signal isn't masked (just reuse the mask inside
+ * of act).
+ */
+sigdelset(&act.sa_mask, host_sig);
+sigsuspend(&act.sa_mask);
+
+/* unreachable */
+abort();
+}
+
 /*
  * Queue a signal so that it will be send to the virtual CPU as soon as
  * possible.
diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index 04a1a886d7b..62b472b990b 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -21,6 +21,7 @@
 #define _SYSCALL_DEFS_H_
 
 #include 
+#include 
 
 #include "errno_defs.h"
 
-- 
2.33.1
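
The "die from the uncaught signal" sequence at the end of dump_core_and_abort can be exercised on its own. The sketch below is an illustration, not the patch's code: demo_die_by_signal and demo_child_term_signal are hypothetical helpers, the core-size limit is lowered unconditionally here (the patch only does so after a guest core was written), and the fork() wrapper exists purely so the parent can inspect how the child ended.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: end the calling process via sig's default action. */
void demo_die_by_signal(int sig)
{
    struct sigaction act;
    struct rlimit nodump;

    /* Assumption for this sketch: never leave a core file behind. */
    if (getrlimit(RLIMIT_CORE, &nodump) == 0) {
        nodump.rlim_cur = 0;
        setrlimit(RLIMIT_CORE, &nodump);
    }

    /* Install the default handler so the signal really is fatal. */
    memset(&act, 0, sizeof(act));
    sigfillset(&act.sa_mask);
    act.sa_handler = SIG_DFL;
    sigaction(sig, &act, NULL);

    kill(getpid(), sig);

    /* If the signal was masked, wait with it unmasked (reusing act's mask). */
    sigdelset(&act.sa_mask, sig);
    sigsuspend(&act.sa_mask);

    /* Not reached for a fatal signal. */
    abort();
}

/*
 * Hypothetical helper: run demo_die_by_signal() in a child and return
 * the signal that terminated it, or -1 if it did not die from a signal.
 */
int demo_child_term_signal(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        demo_die_by_signal(sig);
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    assert(!WIFEXITED(status) || WEXITSTATUS(status) >= 0);
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Because the child is killed by the signal rather than calling exit(), the parent sees WIFSIGNALED() set with the original signal number, which is the "proper exit code" the comment in the patch refers to.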

[PULL 35/40] bsd-user/signal.c: implement do_sigreturn

2022-01-31 Thread Warner Losh

Implements the meat of a sigreturn(2) system call via do_sigreturn, and
helper reset_signal_mask. Fix the prototype of do_sigreturn in qemu.h
and remove do_rt_sigreturn since it's linux only.

Signed-off-by: Stacey Son 
Signed-off-by: Kyle Evans 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/signal-common.h |  2 +-
 bsd-user/signal.c| 56 
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/bsd-user/signal-common.h b/bsd-user/signal-common.h
index ee819266f54..786ec592d18 100644
--- a/bsd-user/signal-common.h
+++ b/bsd-user/signal-common.h
@@ -11,7 +11,7 @@
 
 long do_rt_sigreturn(CPUArchState *env);
 abi_long do_sigaltstack(abi_ulong uss_addr, abi_ulong uoss_addr, abi_ulong sp);
-long do_sigreturn(CPUArchState *env);
+long do_sigreturn(CPUArchState *env, abi_ulong addr);
 void force_sig_fault(int sig, int code, abi_ulong addr);
 int host_to_target_signal(int sig);
 void host_to_target_sigset(target_sigset_t *d, const sigset_t *s);
diff --git a/bsd-user/signal.c b/bsd-user/signal.c
index 4b398745f45..150262a87e5 100644
--- a/bsd-user/signal.c
+++ b/bsd-user/signal.c
@@ -627,6 +627,62 @@ static void setup_frame(int sig, int code, struct target_sigaction *ka,
 unlock_user_struct(frame, frame_addr, 1);
 }
 
+static int reset_signal_mask(target_ucontext_t *ucontext)
+{
+int i;
+sigset_t blocked;
+target_sigset_t target_set;
+TaskState *ts = (TaskState *)thread_cpu->opaque;
+
+for (i = 0; i < TARGET_NSIG_WORDS; i++) {
+if (__get_user(target_set.__bits[i],
+&ucontext->uc_sigmask.__bits[i])) {
+return -TARGET_EFAULT;
+}
+}
+target_to_host_sigset_internal(&blocked, &target_set);
+ts->signal_mask = blocked;
+
+return 0;
+}
+
+/* See sys/$M/$M/exec_machdep.c sigreturn() */
+long do_sigreturn(CPUArchState *env, abi_ulong addr)
+{
+long ret;
+abi_ulong target_ucontext;
+target_ucontext_t *ucontext = NULL;
+
+/* Get the target ucontext address from the stack frame */
+ret = get_ucontext_sigreturn(env, addr, &target_ucontext);
+if (is_error(ret)) {
+return ret;
+}
+trace_user_do_sigreturn(env, addr);
+if (!lock_user_struct(VERIFY_READ, ucontext, target_ucontext, 0)) {
+goto badframe;
+}
+
+/* Set the register state back to before the signal. */
+if (set_mcontext(env, &ucontext->uc_mcontext, 1)) {
+goto badframe;
+}
+
+/* And reset the signal mask. */
+if (reset_signal_mask(ucontext)) {
+goto badframe;
+}
+
+unlock_user_struct(ucontext, target_ucontext, 0);
+return -TARGET_EJUSTRETURN;
+
+badframe:
+if (ucontext != NULL) {
+unlock_user_struct(ucontext, target_ucontext, 0);
+}
+return -TARGET_EFAULT;
+}
+
 void signal_init(void)
 {
 TaskState *ts = (TaskState *)thread_cpu->opaque;
-- 
2.33.1

[PATCH v3 1/3] qmp: Support for querying stats

2022-01-31 Thread Mark Kanda

Introduce QMP support for querying stats. Provide a framework for adding new
stats and support for the following commands:

- query-stats
Returns a list of all stats per target type (only VM and VCPU for now), with
additional options for specifying stat names, VCPU qom paths, and stat provider.

- query-stats-schemas
Returns a list of stats included in each schema type, with an option for
specifying the stat provider.

The framework provides a method to register callbacks for these QMP commands.

The first use case will be for fd-based KVM stats (in an upcoming patch).

Examples (with fd-based KVM stats):

- Display all VM stats:

{ "execute": "query-stats", "arguments" : { "target": "vm" } }
{ "return": {
"list": [
  { "provider": "kvm",
"stats": [
  { "name": "max_mmu_page_hash_collisions", "value": 0 },
  { "name": "max_mmu_rmap_size", "value": 0 },
  { "name": "nx_lpage_splits", "value": 131 },
 ...
] }
  { "provider": "provider XYZ",
  ...
],
"target": "vm"
  }
}

- Display all VCPU stats:

{ "execute": "query-stats", "arguments" : { "target": "vcpu" } }
{ "return": {
"list": [
  { "list": [
  { "provider": "kvm",
"stats": [
  { "name": "guest_mode", "value": 0 },
  { "name": "directed_yield_successful", "value": 0  },
  { "name": "directed_yield_attempted", "value": 76 },
  ...
] }
  { "provider": "provider XYZ",
  ...
],
"path": "/machine/unattached/device[0]"
  },
  { "list": [
  { "provider": "kvm",
"stats": [
  { "name": "guest_mode", "value": 0 },
  { "name": "directed_yield_successful", "value": 0 },
  { "name": "directed_yield_attempted", "value": 51 },
  ...
  }
],
"target": "vcpu"
  }
}

- Display 'exits' and 'l1d_flush' KVM stats for VCPUs at
'/machine/unattached/device[2]' and '/machine/unattached/device[4]':

{ "execute": "query-stats",
  "arguments" : { "target": "vcpu",
  "fields": [ "exits", "l1d_flush" ],
  "paths": [ "/machine/unattached/device[2]",
  "/machine/unattached/device[4]" ],
  "provider": "kvm" } }

{ "return": {
"list": [
  { "list": [
  { "provider": "kvm",
"stats": [
  { "name": "l1d_flush", "value": 14690 },
  { "name": "exits", "value": 50898 }
] }
],
"path": "/machine/unattached/device[2]"
  },
  { "list": [
  { "provider": "kvm",
"stats": [
  { "name": "l1d_flush", "value": 24902 },
  { "name": "exits", "value": 74374 }
] }
 ],
"path": "/machine/unattached/device[4]"
  }
],
"target": "vcpu"
  }
}

- Query stats schemas:

{ "execute": "query-stats-schemas" }
{ "return": {
"vcpu": [
  { "provider": "kvm",
"stats": [
   { "name": "guest_mode",
 "unit": "none",
 "base": 10,
 "exponent": 0,
 "type": "instant" },
   { "name": "directed_yield_successful",
 "unit": "none",
 "base": 10,
 "exponent": 0,
 "type": "cumulative" },
 ...
"provider": "provider XYZ",
...
   "vm": [
  { "provider": "kvm",
"stats": [
   { "name": "max_mmu_page_hash_collisions",
 "unit": "none",
 "base": 10,
 "exponent": 0,
 "type": "peak" },
"provider": "provider XYZ",
...

Signed-off-by: Mark Kanda 
---
 include/monitor/stats.h |  36 ++
 monitor/qmp-cmds.c  | 183 +
 qapi/misc.json  | 253 
 3 files changed, 472 insertions(+)
 create mode 100644 include/monitor/stats.h

diff --git a/include/monitor/stats.h b/include/monitor/stats.h
new file mode 100644
index 00..d4b57322eb
--- /dev/null
+++ b/include/monitor/stats.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (c) 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef STATS_H
+#define STATS_H
+
+/*
+ * Add qmp stats callbacks to the stats_callbacks list.
+ *
+ * @provider: stats provider
+ *
+ * @stats_fn: routine to query stats:
+ *void (*stats_fn)(StatsResults *results, StatsFilter *filter, Error **errp)
+ *
+ * @schemas_fn: routine to query stat schemas:
+ *void (*schemas_fn)(StatsSchemaResult *results, Error **errp)
+ */
+void add_stats_callbacks(StatsProvider provider,
+ void (*stats_fn)(StatsResults *, StatsFilter *,
+  Error **),
+ void (*schemas_fn)(StatsSchemaResult *, Error **));
+
+/* Stats helper routines */

[PULL 39/40] bsd-user: Rename arg name for target_cpu_reset to env

2022-01-31 Thread Warner Losh

Rename the parameter name for target_cpu_reset's CPUArchState * arg from
cpu to env.

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/arm/target_arch_cpu.h| 2 +-
 bsd-user/i386/target_arch_cpu.h   | 4 ++--
 bsd-user/x86_64/target_arch_cpu.h | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/bsd-user/arm/target_arch_cpu.h b/bsd-user/arm/target_arch_cpu.h
index 2b395d5c97f..b087db48fa4 100644
--- a/bsd-user/arm/target_arch_cpu.h
+++ b/bsd-user/arm/target_arch_cpu.h
@@ -213,7 +213,7 @@ static inline void target_cpu_clone_regs(CPUARMState *env, target_ulong newsp)
 env->regs[0] = 0;
 }
 
-static inline void target_cpu_reset(CPUArchState *cpu)
+static inline void target_cpu_reset(CPUArchState *env)
 {
 }
 
diff --git a/bsd-user/i386/target_arch_cpu.h b/bsd-user/i386/target_arch_cpu.h
index 472a96689fc..3cbf69d8af2 100644
--- a/bsd-user/i386/target_arch_cpu.h
+++ b/bsd-user/i386/target_arch_cpu.h
@@ -200,9 +200,9 @@ static inline void target_cpu_clone_regs(CPUX86State *env, target_ulong newsp)
 env->regs[R_EAX] = 0;
 }
 
-static inline void target_cpu_reset(CPUArchState *cpu)
+static inline void target_cpu_reset(CPUArchState *env)
 {
-cpu_reset(env_cpu(cpu));
+cpu_reset(env_cpu(env));
 }
 
 #endif /* ! _TARGET_ARCH_CPU_H_ */
diff --git a/bsd-user/x86_64/target_arch_cpu.h b/bsd-user/x86_64/target_arch_cpu.h
index 14def48adb5..0a9c0f08946 100644
--- a/bsd-user/x86_64/target_arch_cpu.h
+++ b/bsd-user/x86_64/target_arch_cpu.h
@@ -238,9 +238,9 @@ static inline void target_cpu_clone_regs(CPUX86State *env, target_ulong newsp)
 env->regs[R_EAX] = 0;
 }
 
-static inline void target_cpu_reset(CPUArchState *cpu)
+static inline void target_cpu_reset(CPUArchState *env)
 {
-cpu_reset(env_cpu(cpu));
+cpu_reset(env_cpu(env));
 }
 
 #endif /* ! _TARGET_ARCH_CPU_H_ */
-- 
2.33.1

[PULL 15/40] bsd-user/arm/target_arch_cpu.h: Implement data faults

2022-01-31 Thread Warner Losh

Update for the richer set of data faults that are now possible. Copied
largely from linux-user/arm/cpu_loop.c, with minor typo fixes.
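
For readers skimming the diff below, the fault decoding can be sketched in
isolation roughly as follows. This is a standalone approximation, not the
patch: the DEMO_* constants and demo_decode_fsr() are hypothetical placeholders
for the TARGET_* values and the inline switch, and unknown status codes return
false here instead of hitting g_assert_not_reached().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder values; the patch uses TARGET_SIG* / TARGET_*. */
enum { DEMO_SIGBUS = 1, DEMO_SIGSEGV = 2 };
enum { DEMO_BUS_ADRALN = 1, DEMO_SEGV_MAPERR = 2, DEMO_SEGV_ACCERR = 3 };

typedef struct {
    int si_signo;
    int si_code;
} DemoFault;

/*
 * Map the low five bits of a short-descriptor fault status value to a
 * signal number and code. Returns false for status codes not listed.
 */
static bool demo_decode_fsr(unsigned int fsr, DemoFault *out)
{
    assert(out != NULL);

    switch (fsr & 0x1f) {
    case 0x1: /* Alignment */
        out->si_signo = DEMO_SIGBUS;
        out->si_code = DEMO_BUS_ADRALN;
        return true;
    case 0x3: /* Access flag fault, level 1 */
    case 0x6: /* Access flag fault, level 2 */
    case 0x9: /* Domain fault, level 1 */
    case 0xb: /* Domain fault, level 2 */
    case 0xd: /* Permission fault, level 1 */
    case 0xf: /* Permission fault, level 2 */
        out->si_signo = DEMO_SIGSEGV;
        out->si_code = DEMO_SEGV_ACCERR;
        return true;
    case 0x5: /* Translation fault, level 1 */
    case 0x7: /* Translation fault, level 2 */
        out->si_signo = DEMO_SIGSEGV;
        out->si_code = DEMO_SEGV_MAPERR;
        return true;
    default:
        return false;
    }
}
```

Only the low five bits take part in the lookup, so a value such as 0x2d decodes
the same way as 0xd (permission fault, level 1).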

Signed-off-by: Warner Losh 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 bsd-user/arm/target_arch_cpu.h | 45 ++
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/bsd-user/arm/target_arch_cpu.h b/bsd-user/arm/target_arch_cpu.h
index 9d790176420..2b395d5c97f 100644
--- a/bsd-user/arm/target_arch_cpu.h
+++ b/bsd-user/arm/target_arch_cpu.h
@@ -39,8 +39,7 @@ static inline void target_cpu_init(CPUARMState *env,
 
 static inline void target_cpu_loop(CPUARMState *env)
 {
-int trapnr;
-target_siginfo_t info;
+int trapnr, si_signo, si_code;
 unsigned int n;
 CPUState *cs = env_cpu(env);
 
@@ -150,15 +149,41 @@ static inline void target_cpu_loop(CPUARMState *env)
 /* just indicate that signals should be handled asap */
 break;
 case EXCP_PREFETCH_ABORT:
-/* See arm/arm/trap.c prefetch_abort_handler() */
 case EXCP_DATA_ABORT:
-/* See arm/arm/trap.c data_abort_handler() */
-info.si_signo = TARGET_SIGSEGV;
-info.si_errno = 0;
-/* XXX: check env->error_code */
-info.si_code = 0;
-info.si_addr = env->exception.vaddress;
-queue_signal(env, info.si_signo, &info);
+/*
+ * See arm/arm/trap-v6.c prefetch_abort_handler() and
+ * data_abort_handler()
+ *
+ * However, FreeBSD maps these to a generic value and then uses that
+ * to maybe fault in pages in vm/vm_fault.c:vm_fault_trap(). I
+ * believe that the indirection maps the same as Linux, but haven't
+ * chased down every single possible indirection.
+ */
+
+/* For user-only we don't set TTBCR_EAE, so look at the FSR. */
+switch (env->exception.fsr & 0x1f) {
+case 0x1: /* Alignment */
+si_signo = TARGET_SIGBUS;
+si_code = TARGET_BUS_ADRALN;
+break;
+case 0x3: /* Access flag fault, level 1 */
+case 0x6: /* Access flag fault, level 2 */
+case 0x9: /* Domain fault, level 1 */
+case 0xb: /* Domain fault, level 2 */
+case 0xd: /* Permission fault, level 1 */
+case 0xf: /* Permission fault, level 2 */
+si_signo = TARGET_SIGSEGV;
+si_code = TARGET_SEGV_ACCERR;
+break;
+case 0x5: /* Translation fault, level 1 */
+case 0x7: /* Translation fault, level 2 */
+si_signo = TARGET_SIGSEGV;
+si_code = TARGET_SEGV_MAPERR;
+break;
+default:
+g_assert_not_reached();
+}
+force_sig_fault(si_signo, si_code, env->exception.vaddress);
 break;
 case EXCP_DEBUG:
 case EXCP_BKPT:
-- 
2.33.1
