Re: [NOTFORMERGE PATCH 2/2] gitlab: Add Loongarch64 KVM-only build

2024-01-10 Thread gaosong

Hi,

在 2024/1/11 下午3:10, Thomas Huth 写道:

On 02/01/2024 18.22, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé 
---
Used to test 
https://lore.kernel.org/qemu-devel/20231228084051.3235354-1-zhaotian...@loongson.cn/


So why is it NOTFORMERGE? Don't we want to test KVM-only builds for 
loongarch in the long run?


 Thomas


I think we can drop this title.

I tested this job with the latest loongarch kvm patches, but I found a 
test-hmp check error.


See:
https://gitlab.com/gaosong/qemu/-/jobs/5906385234

If you want to log in to this machine, we can create an account for you.

Thanks.
Song Gao



---
  .../openeuler-22.03-loongarch64.yml   | 22 +++
  1 file changed, 22 insertions(+)

diff --git a/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml b/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml

index 86d18f820e..60674b8d0f 100644
--- a/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml
+++ b/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml
@@ -19,3 +19,25 @@ openeuler-22.03-loongarch64-all:
 || { cat config.log meson-logs/meson-log.txt; exit 1; }
   - make --output-sync -j`nproc --ignore=40`
   - make --output-sync -j`nproc --ignore=40` check
+
+openeuler-22.03-loongarch64-kvm:
+ extends: .custom_runner_template
+ needs: []
+ stage: build
+ tags:
+ - oe2203
+ - loongarch64
+ rules:
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'

+   when: manual
+   allow_failure: true
+ - if: "$LOONGARCH64_RUNNER_AVAILABLE"
+   when: manual
+   allow_failure: true
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --enable-kvm --disable-tcg
+   || { cat config.log meson-logs/meson-log.txt; exit 1; }
+ - make --output-sync -j`nproc --ignore=40`
+ - make --output-sync -j`nproc --ignore=40` check





Re: [PATCH 08/40] vdpa: add back vhost_vdpa_net_first_nc_vdpa

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:52 AM Si-Wei Liu  wrote:
>
> Previous commits had it removed. Now adding it back because
> this function will be needed by next patches.

Need some description to explain why, because it should not be needed:
now that we have a "parent" structure, anything that is common could be
stored there?

Thanks




Re: [PATCH 09/40] vdpa: no repeat setting shadow_data

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu  wrote:
>
> Since shadow_data is now shared in the parent data struct, it
> just needs to be set only once by the first vq. This change
> will make shadow_data independent of svq enabled state, which
> can be optionally turned off when SVQ descritors and device

Typo for descriptors.

> driver areas are all isolated to a separate address space.
>
> Signed-off-by: Si-Wei Liu 

Acked-by: Jason Wang 

Thanks

> ---
>  net/vhost-vdpa.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index c9bfc6f..2555897 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -387,13 +387,12 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>  if (s->always_svq ||
>  migration_is_setup_or_active(migrate_get_current()->state)) {
>  v->shadow_vqs_enabled = true;
> -v->shared->shadow_data = true;
>  } else {
>  v->shadow_vqs_enabled = false;
> -v->shared->shadow_data = false;
>  }
>
>  if (v->index == 0) {
> +v->shared->shadow_data = v->shadow_vqs_enabled;
>  vhost_vdpa_net_data_start_first(s);
>  return 0;
>  }
> --
> 1.8.3.1
>




Re: [PATCH 07/40] vdpa: move around vhost_vdpa_set_address_space_id

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu  wrote:
>
> Move it a few lines ahead to make function call easier for those
> before it.  No funtional change involved.

Typo for functional.

>
> Signed-off-by: Si-Wei Liu 

Acked-by: Jason Wang 

Thanks

> ---
>  net/vhost-vdpa.c | 36 ++--
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 1a738b2..dbfa192 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -335,6 +335,24 @@ static void vdpa_net_migration_state_notifier(Notifier 
> *notifier, void *data)
>  }
>  }
>
> +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> +   unsigned vq_group,
> +   unsigned asid_num)
> +{
> +struct vhost_vring_state asid = {
> +.index = vq_group,
> +.num = asid_num,
> +};
> +int r;
> +
> +r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> +if (unlikely(r < 0)) {
> +error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> + asid.index, asid.num, errno, g_strerror(errno));
> +}
> +return r;
> +}
> +
>  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>  {
>  struct vhost_vdpa *v = &s->vhost_vdpa;
> @@ -490,24 +508,6 @@ static int64_t vhost_vdpa_get_vring_desc_group(int 
> device_fd,
>  return state.num;
>  }
>
> -static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> -   unsigned vq_group,
> -   unsigned asid_num)
> -{
> -struct vhost_vring_state asid = {
> -.index = vq_group,
> -.num = asid_num,
> -};
> -int r;
> -
> -r = ioctl(v->shared->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> -if (unlikely(r < 0)) {
> -error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> - asid.index, asid.num, errno, g_strerror(errno));
> -}
> -return r;
> -}
> -
>  static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
>  {
>  VhostIOVATree *tree = v->shared->iova_tree;
> --
> 1.8.3.1
>




Re: [PATCH 06/40] vhost: make svq work with gpa without iova translation

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu  wrote:
>
> Make vhost_svq_vring_write_descs able to work with GPA directly
> without going through iova tree for translation. This will be
> needed in the next few patches where the SVQ has dedicated
> address space to host its virtqueues. Instead of having to
> translate qemu's VA to IOVA via the iova tree, with dedicated
> or isolated address space for SVQ descriptors, the IOVA is
> exactly same as the guest GPA space where translation would
> not be needed any more.
>
> Signed-off-by: Si-Wei Liu 
> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 35 +++
>  1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> b/hw/virtio/vhost-shadow-virtqueue.c
> index fc5f408..97ccd45 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -136,8 +136,8 @@ static bool vhost_svq_translate_addr(const 
> VhostShadowVirtqueue *svq,
>   * Return true if success, false otherwise and print error.
>   */
>  static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr 
> *sg,
> -const struct iovec *iovec, size_t 
> num,
> -bool more_descs, bool write)
> +const struct iovec *iovec, hwaddr 
> *addr,
> +size_t num, bool more_descs, bool 
> write)
>  {
>  uint16_t i = svq->free_head, last = svq->free_head;
>  unsigned n;
> @@ -149,8 +149,15 @@ static bool 
> vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>  return true;
>  }
>
> -ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> -if (unlikely(!ok)) {
> +if (svq->iova_tree) {
> +ok = vhost_svq_translate_addr(svq, sg, iovec, num);
> +if (unlikely(!ok)) {
> +return false;
> +}

So the idea is when shadow virtqueue can work directly for GPA, there
won't be an iova_tree here?

If yes, I think we need a comment around iova_tree or here to explain this.

> +} else if (!addr) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "No translation found for vaddr 0x%p\n",
> +  iovec[0].iov_base);
>  return false;
>  }
>
> @@ -161,7 +168,7 @@ static bool 
> vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>  } else {
>  descs[i].flags = flags;
>  }
> -descs[i].addr = cpu_to_le64(sg[n]);
> +descs[i].addr = cpu_to_le64(svq->iova_tree ? sg[n] : addr[n]);

Or maybe a helper and do the switch there with the comments.

Thanks

>  descs[i].len = cpu_to_le32(iovec[n].iov_len);
>
>  last = i;
> @@ -173,9 +180,10 @@ static bool 
> vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
>  }
>
>  static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> -const struct iovec *out_sg, size_t out_num,
> -const struct iovec *in_sg, size_t in_num,
> -unsigned *head)
> +const struct iovec *out_sg, hwaddr *out_addr,
> +size_t out_num,
> +const struct iovec *in_sg, hwaddr *in_addr,
> +size_t in_num, unsigned *head)
>  {
>  unsigned avail_idx;
>  vring_avail_t *avail = svq->vring.avail;
> @@ -191,13 +199,14 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue 
> *svq,
>  return false;
>  }
>
> -ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
> - false);
> +ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_addr, out_num,
> + in_num > 0, false);
>  if (unlikely(!ok)) {
>  return false;
>  }
>
> -ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
> +ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_addr, in_num,
> + false, true);
>  if (unlikely(!ok)) {
>  return false;
>  }
> @@ -258,7 +267,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct 
> iovec *out_sg,
>  return -ENOSPC;
>  }
>
> -ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, 
> &head);
> +ok = vhost_svq_add_split(svq, out_sg, elem ? elem->out_addr : NULL,
> + out_num, in_sg, elem ? elem->in_addr : NULL,
> + in_num, &head);
>  if (unlikely(!ok)) {
>  return -EINVAL;
>  }
> --
> 1.8.3.1
>




Re: [PATCH] hw/core: Handle cpu_model_from_type() returning NULL value

2024-01-10 Thread Gavin Shan

Hi Phil,

On 1/11/24 16:47, Philippe Mathieu-Daudé wrote:

Per cpu_model_from_type() docstring (added in commit 445946f4dd):

   * Returns: CPU model name or NULL if the CPU class doesn't exist

We must check the return value in order to avoid surprises, i.e.:

  $ qemu-system-arm -machine virt -cpu cortex-a9
   qemu-system-arm: Invalid CPU model: cortex-a9
   The valid models are: cortex-a7, cortex-a15, (null), (null), (null), (null), 
(null), (null), (null), (null), (null), (null), (null), max

Add assertions when the call can not fail (because the CPU type
must be registered).

Fixes: 5422d2a8fa ("machine: Print CPU model name instead of CPU type")
Reported-by: Peter Maydell 
Signed-off-by: Philippe Mathieu-Daudé 
---
  cpu-target.c  | 1 +
  hw/core/machine.c | 5 +
  target/ppc/cpu_init.c | 1 +
  3 files changed, 7 insertions(+)

diff --git a/cpu-target.c b/cpu-target.c
index 5eecd7ea2d..b0f6deb13b 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -291,6 +291,7 @@ static void cpu_list_entry(gpointer data, gpointer 
user_data)
  const char *typename = object_class_get_name(OBJECT_CLASS(data));
  g_autofree char *model = cpu_model_from_type(typename);
  
+assert(model);

  if (cc->deprecation_note) {
  qemu_printf("  %s (deprecated)\n", model);
  } else {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index fc239101f9..730ec10328 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1422,16 +1422,21 @@ static bool is_cpu_type_supported(const MachineState 
*machine, Error **errp)
  /* The user specified CPU type isn't valid */
  if (!mc->valid_cpu_types[i]) {
  g_autofree char *requested = 
cpu_model_from_type(machine->cpu_type);
+assert(requested);
  error_setg(errp, "Invalid CPU model: %s", requested);
  if (!mc->valid_cpu_types[1]) {
  g_autofree char *model = cpu_model_from_type(
   mc->valid_cpu_types[0]);
+assert(model);
  error_append_hint(errp, "The only valid type is: %s\n", 
model);
  } else {
  error_append_hint(errp, "The valid models are: ");
  for (i = 0; mc->valid_cpu_types[i]; i++) {
  g_autofree char *model = cpu_model_from_type(
   mc->valid_cpu_types[i]);
+if (!model) {
+continue;
+}


Shall we assert(model) for this case, to be consistent with other cases? :)


  error_append_hint(errp, "%s%s",
model,
mc->valid_cpu_types[i + 1] ? ", " : "");


Otherwise, the separator here needs to be adjusted, because it's uncertain whether
the models for mc->valid_cpu_types[i+1] ... mc->valid_cpu_types[END] are all valid.



diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 344196a8ce..58f0c1e30e 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7037,6 +7037,7 @@ static void ppc_cpu_list_entry(gpointer data, gpointer 
user_data)
  }
  
  name = cpu_model_from_type(typename);

+assert(name);
  qemu_printf("PowerPC %-16s PVR %08x\n", name, pcc->pvr);
  for (i = 0; ppc_cpu_aliases[i].alias != NULL; i++) {
  PowerPCCPUAlias *alias = &ppc_cpu_aliases[i];


Thanks,
Gavin




Re: [PATCH 1/2] gitlab: Introduce Loongarch64 runner

2024-01-10 Thread gaosong

Hi,

在 2024/1/11 下午3:08, Thomas Huth 写道:

On 02/01/2024 18.22, Philippe Mathieu-Daudé wrote:

Full build config to run CI tests on a Loongarch64 host.

Forks might enable this by setting LOONGARCH64_RUNNER_AVAILABLE
in their CI namespace settings, see:
https://www.qemu.org/docs/master/devel/ci.html#maintainer-controlled-job-variables 



Signed-off-by: Philippe Mathieu-Daudé 
---
  docs/devel/ci-jobs.rst.inc    |  6 ++
  .gitlab-ci.d/custom-runners.yml   |  1 +
  .../openeuler-22.03-loongarch64.yml   | 21 +++
  3 files changed, 28 insertions(+)
  create mode 100644 .gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml



...
diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml

index 8e5b9500f4..152ace4492 100644
--- a/.gitlab-ci.d/custom-runners.yml
+++ b/.gitlab-ci.d/custom-runners.yml
@@ -32,3 +32,4 @@ include:
    - local: '/.gitlab-ci.d/custom-runners/ubuntu-22.04-aarch64.yml'
    - local: '/.gitlab-ci.d/custom-runners/ubuntu-22.04-aarch32.yml'
    - local: '/.gitlab-ci.d/custom-runners/centos-stream-8-x86_64.yml'
+  - local: '/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml'
diff --git a/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml b/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml

new file mode 100644
index 00..86d18f820e
--- /dev/null
+++ b/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml
@@ -0,0 +1,21 @@
+openeuler-22.03-loongarch64-all:
+ extends: .custom_runner_template
+ needs: []
+ stage: build
+ tags:
+ - oe2203
+ - loongarch64
+ rules:
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'

+   when: manual
+   allow_failure: true
+ - if: "$LOONGARCH64_RUNNER_AVAILABLE"
+   when: manual
+   allow_failure: true
+ script:
+ - mkdir build
+ - cd build
+ - ../configure
+   || { cat config.log meson-logs/meson-log.txt; exit 1; }
+ - make --output-sync -j`nproc --ignore=40`
+ - make --output-sync -j`nproc --ignore=40` check


Does this system really have more than 40 CPU threads? Or is this a 
copy-n-paste from one of the other scripts? In the latter case, I'd 
suggest adjusting the --ignore=40 to a more reasonable value.


 Thomas

No, only 32. I think it should be --ignore=32 or 16.

I created the same runner on this machine and found some check errors,
but I am not sure how to fix them. :-)

See:

https://gitlab.com/gaosong/qemu/-/jobs/5906269934

Thanks.
Song Gao




Re: [NOTFORMERGE PATCH 2/2] gitlab: Add Loongarch64 KVM-only build

2024-01-10 Thread Thomas Huth

On 02/01/2024 18.22, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé 
---
Used to test 
https://lore.kernel.org/qemu-devel/20231228084051.3235354-1-zhaotian...@loongson.cn/


So why is it NOTFORMERGE? Don't we want to test KVM-only builds for 
loongarch in the long run?


 Thomas



---
  .../openeuler-22.03-loongarch64.yml   | 22 +++
  1 file changed, 22 insertions(+)

diff --git a/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml b/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml
index 86d18f820e..60674b8d0f 100644
--- a/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml
+++ b/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml
@@ -19,3 +19,25 @@ openeuler-22.03-loongarch64-all:
 || { cat config.log meson-logs/meson-log.txt; exit 1; }
   - make --output-sync -j`nproc --ignore=40`
   - make --output-sync -j`nproc --ignore=40` check
+
+openeuler-22.03-loongarch64-kvm:
+ extends: .custom_runner_template
+ needs: []
+ stage: build
+ tags:
+ - oe2203
+ - loongarch64
+ rules:
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
+   when: manual
+   allow_failure: true
+ - if: "$LOONGARCH64_RUNNER_AVAILABLE"
+   when: manual
+   allow_failure: true
+ script:
+ - mkdir build
+ - cd build
+ - ../configure --enable-kvm --disable-tcg
+   || { cat config.log meson-logs/meson-log.txt; exit 1; }
+ - make --output-sync -j`nproc --ignore=40`
+ - make --output-sync -j`nproc --ignore=40` check





Re: [PATCH 05/40] vdpa: populate desc_group from net_vhost_vdpa_init

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu  wrote:
>
> Add the desc_group field to struct vhost_vdpa, and get it
> populated when the corresponding vq is initialized at
> net_vhost_vdpa_init. If the vq does not have descriptor
> group capability, or it doesn't have a dedicated ASID
> group to host descriptors other than the data buffers,
> desc_group will be set to a negative value -1.
>
> Signed-off-by: Si-Wei Liu 
> ---
>  include/hw/virtio/vhost-vdpa.h |  1 +
>  net/vhost-vdpa.c   | 15 +--
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 6533ad2..63493ff 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -87,6 +87,7 @@ typedef struct vhost_vdpa {
>  Error *migration_blocker;
>  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>  IOMMUNotifier n;
> +int64_t desc_group;
>  } VhostVDPA;
>
>  int vhost_vdpa_get_iova_range(int fd, struct vhost_vdpa_iova_range 
> *iova_range);
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index cb5705d..1a738b2 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1855,11 +1855,22 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
>
>  ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>  if (ret) {
> -qemu_del_net_client(nc);
> -return NULL;
> +goto err;

This part of introducing the "err" label looks more like a cleanup.

Others look good.

Thanks

>  }
>
> +if (is_datapath) {
> +ret = vhost_vdpa_probe_desc_group(vdpa_device_fd, features,
> +  0, &desc_group, errp);
> +if (unlikely(ret < 0)) {
> +goto err;
> +}
> +}
> +s->vhost_vdpa.desc_group = desc_group;
>  return nc;
> +
> +err:
> +qemu_del_net_client(nc);
> +return NULL;
>  }
>
>  static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
> --
> 1.8.3.1
>




Re: [PATCH 1/2] gitlab: Introduce Loongarch64 runner

2024-01-10 Thread Thomas Huth

On 02/01/2024 18.22, Philippe Mathieu-Daudé wrote:

Full build config to run CI tests on a Loongarch64 host.

Forks might enable this by setting LOONGARCH64_RUNNER_AVAILABLE
in their CI namespace settings, see:
https://www.qemu.org/docs/master/devel/ci.html#maintainer-controlled-job-variables

Signed-off-by: Philippe Mathieu-Daudé 
---
  docs/devel/ci-jobs.rst.inc|  6 ++
  .gitlab-ci.d/custom-runners.yml   |  1 +
  .../openeuler-22.03-loongarch64.yml   | 21 +++
  3 files changed, 28 insertions(+)
  create mode 100644 .gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml


...

diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
index 8e5b9500f4..152ace4492 100644
--- a/.gitlab-ci.d/custom-runners.yml
+++ b/.gitlab-ci.d/custom-runners.yml
@@ -32,3 +32,4 @@ include:
- local: '/.gitlab-ci.d/custom-runners/ubuntu-22.04-aarch64.yml'
- local: '/.gitlab-ci.d/custom-runners/ubuntu-22.04-aarch32.yml'
- local: '/.gitlab-ci.d/custom-runners/centos-stream-8-x86_64.yml'
+  - local: '/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml'
diff --git a/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml b/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml
new file mode 100644
index 00..86d18f820e
--- /dev/null
+++ b/.gitlab-ci.d/custom-runners/openeuler-22.03-loongarch64.yml
@@ -0,0 +1,21 @@
+openeuler-22.03-loongarch64-all:
+ extends: .custom_runner_template
+ needs: []
+ stage: build
+ tags:
+ - oe2203
+ - loongarch64
+ rules:
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ /^staging/'
+   when: manual
+   allow_failure: true
+ - if: "$LOONGARCH64_RUNNER_AVAILABLE"
+   when: manual
+   allow_failure: true
+ script:
+ - mkdir build
+ - cd build
+ - ../configure
+   || { cat config.log meson-logs/meson-log.txt; exit 1; }
+ - make --output-sync -j`nproc --ignore=40`
+ - make --output-sync -j`nproc --ignore=40` check


Does this system really have more than 40 CPU threads? Or is this a 
copy-n-paste from one of the other scripts? In the latter case, I'd suggest 
adjusting the --ignore=40 to a more reasonable value.


 Thomas




Re: [PATCH 04/40] vdpa: piggyback desc_group index when probing isolated cvq

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu  wrote:
>
> Same as the previous commit, but do it for cvq instead of data vqs.
>
> Signed-off-by: Si-Wei Liu 
> ---
>  net/vhost-vdpa.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 0cf3147..cb5705d 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1601,16 +1601,19 @@ static const VhostShadowVirtqueueOps 
> vhost_vdpa_net_svq_ops = {
>  };
>
>  /**
> - * Probe if CVQ is isolated
> + * Probe if CVQ is isolated, and piggyback its descriptor group
> + * index if supported
>   *
>   * @device_fd The vdpa device fd
>   * @features  Features offered by the device.
>   * @cvq_index The control vq pair index
> + * @desc_grpidx   The CVQ's descriptor group index to return
>   *
> - * Returns <0 in case of failure, 0 if false and 1 if true.
> + * Returns <0 in case of failure, 0 if false and 1 if true (isolated).
>   */
>  static int vhost_vdpa_probe_cvq_isolation(int device_fd, uint64_t features,
> -  int cvq_index, Error **errp)
> +  int cvq_index, int64_t 
> *desc_grpidx,
> +  Error **errp)
>  {
>  uint64_t backend_features;
>  int64_t cvq_group;
> @@ -1667,6 +1670,13 @@ static int vhost_vdpa_probe_cvq_isolation(int 
> device_fd, uint64_t features,
>  goto out;
>  }
>
> +if (backend_features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) {
> +int64_t desc_group = vhost_vdpa_get_vring_desc_group(device_fd,
> + cvq_index, 
> errp);
> +if (likely(desc_group >= 0) && desc_group != cvq_group)
> +*desc_grpidx = desc_group;
> +}
> +
>  for (int i = 0; i < cvq_index; ++i) {
>  int64_t group = vhost_vdpa_get_vring_group(device_fd, i, errp);
>  if (unlikely(group < 0)) {
> @@ -1685,6 +1695,8 @@ static int vhost_vdpa_probe_cvq_isolation(int 
> device_fd, uint64_t features,
>  out:
>  status = 0;
>  ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
> +status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;

Is this a bug fix? Otherwise I don't see the connection with the descriptor group.

Thanks

> +ioctl(device_fd, VHOST_VDPA_SET_STATUS, &status);
>  return r;
>  }
>
> @@ -1791,6 +1803,7 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
> Error **errp)
>  {
>  NetClientState *nc = NULL;
> +int64_t desc_group = -1;
>  VhostVDPAState *s;
>  int ret = 0;
>  assert(name);
> @@ -1802,7 +1815,7 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
>  } else {
>  cvq_isolated = vhost_vdpa_probe_cvq_isolation(vdpa_device_fd, 
> features,
>queue_pair_index * 2,
> -  errp);
> > +  &desc_group, errp);
>  if (unlikely(cvq_isolated < 0)) {
>  return NULL;
>  }
> --
> 1.8.3.1
>




Re: [RFC PATCH v3 03/30] io: implement io_pwritev/preadv for QIOChannelFile

2024-01-10 Thread Peter Xu
On Mon, Nov 27, 2023 at 05:25:45PM -0300, Fabiano Rosas wrote:
> From: Nikolay Borisov 
> 
> The upcoming 'fixed-ram' feature will require qemu to write data to
> (and restore from) specific offsets of the migration file.
> 
> Add a minimal implementation of pwritev/preadv and expose them via the
> io_pwritev and io_preadv interfaces.
> 
> Signed-off-by: Nikolay Borisov 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v6 1/2] qom: new object to associate device to numa node

2024-01-10 Thread Michael S. Tsirkin
On Wed, Jan 10, 2024 at 03:19:05PM -0800, Dan Williams wrote:
> David Hildenbrand wrote:
> > On 09.01.24 17:52, Jonathan Cameron wrote:
> > > On Thu, 4 Jan 2024 10:39:41 -0700
> > > Alex Williamson  wrote:
> > > 
> > >> On Thu, 4 Jan 2024 16:40:39 +
> > >> Ankit Agrawal  wrote:
> > >>
> > >>> Had a discussion with RH folks, summary follows:
> > >>>
> > >>> 1. To align with the current spec description pointed by Jonathan, we 
> > >>> first do
> > >>>   a separate object instance per GI node as suggested by Jonathan. 
> > >>> i.e.
> > >>>   a acpi-generic-initiator would only link one node to the device. 
> > >>> To
> > >>>   associate a set of nodes, those number of object instances should 
> > >>> be
> > >>>   created.
> > >>> 2. In parallel, we work to get the spec updated. After the update, we 
> > >>> switch
> > >>>  to the current implementation to link a PCI device with a set of 
> > >>> NUMA
> > >>>  nodes.
> > >>>
> > >>> Alex/Jonathan, does this sound fine?
> > >>>
> > >>
> > >> Yes, as I understand Jonathan's comments, the acpi-generic-initiator
> > >> object should currently define a single device:node relationship to
> > >> match the ACPI definition.
> > > 
> > > Doesn't matter for this, but it's a many_device:single_node
> > > relationship as currently defined. We should be able to support that
> > > in any new interfaces for QEMU.
> > > 
> > >>   Separately a clarification of the spec
> > >> could be pursued that could allow us to reinstate a node list option
> > >> for the acpi-generic-initiator object.  In the interim, a user can
> > >> define multiple 1:1 objects to create the 1:N relationship that's
> > >> ultimately required here.  Thanks,
> > > 
> > > Yes, a spec clarification would work, probably needs some text
> > > to say a GI might not be an initiator as well - my worry is
> > > theoretical backwards compatibility with a (probably
> > > nonexistent) OS that assumes the N:1 mapping. So you may be in
> > > new SRAT entry territory.
> > > 
> > > Given that, an alternative proposal that I think would work
> > > for you would be to add a 'placeholder' memory node definition
> > > in SRAT (so allow 0 size explicitly - might need a new SRAT
> > > entry to avoid backwards compat issues).
> > 
> > Putting all the PCI/GI/... complexity aside, I'll just raise again that 
> > for virtio-mem something simple like that might be helpful as well, IIUC.
> > 
> > -numa node,nodeid=2 \
> > ...
> > -device virtio-mem-pci,node=2,... \
> > 
> > All we need is the OS to prepare for an empty node that will get 
> > populated with memory later.
> > 
> > So if that's what a "placeholder" node definition in srat could achieve 
> > as well, even without all of the other acpi-generic-initiator stuff, 
> > that would be great.
> 
> Please no "placeholder" definitions in SRAT. One of the main thrusts of
> CXL is to move away from static ACPI tables describing vendor-specific
> memory topology, towards an industry standard device enumeration.
> 
> Platform firmware enumerates the platform CXL "windows" (ACPI CEDT
> CFMWS) and the relative performance of the CPU access a CXL port (ACPI
> HMAT Generic Port), everything else is CXL standard enumeration.

I assume memory topology and so on apply, right?  E.g PMTT etc.
Just making sure.


> It is strictly OS policy about how many NUMA nodes it imagines it wants
> to define within that playground. The current OS policy is one node per
> "window". If a solution believes Linux should be creating more than that
> I submit that's a discussion with OS policy developers, not a trip to
> the BIOS team to please sprinkle in more placeholders. Linux can fully
> own the policy here. The painful bit is just that it never had to
> before.




Re: [RFC PATCH v3 02/30] io: Add generic pwritev/preadv interface

2024-01-10 Thread Peter Xu
On Mon, Nov 27, 2023 at 05:25:44PM -0300, Fabiano Rosas wrote:
> From: Nikolay Borisov 
> 
> Introduce basic pwritev/preadv support in the generic channel layer.
> Specific implementation will follow for the file channel as this is
> required in order to support migration streams with fixed location of
> each ram page.
> 
> Signed-off-by: Nikolay Borisov 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH] hw/core: Handle cpu_model_from_type() returning NULL value

2024-01-10 Thread Philippe Mathieu-Daudé

On 11/1/24 07:47, Philippe Mathieu-Daudé wrote:

Per cpu_model_from_type() docstring (added in commit 445946f4dd):

   * Returns: CPU model name or NULL if the CPU class doesn't exist

We must check the return value in order to avoid surprises, i.e.:

  $ qemu-system-arm -machine virt -cpu cortex-a9


Doh I missed one space before the '$' character when pasting.


   qemu-system-arm: Invalid CPU model: cortex-a9
   The valid models are: cortex-a7, cortex-a15, (null), (null), (null), (null), 
(null), (null), (null), (null), (null), (null), (null), max

Add assertions when the call can not fail (because the CPU type
must be registered).

Fixes: 5422d2a8fa ("machine: Print CPU model name instead of CPU type")
Reported-by: Peter Maydell 
Signed-off-by: Philippe Mathieu-Daudé 
---
  cpu-target.c  | 1 +
  hw/core/machine.c | 5 +
  target/ppc/cpu_init.c | 1 +
  3 files changed, 7 insertions(+)





[PATCH] hw/core: Handle cpu_model_from_type() returning NULL value

2024-01-10 Thread Philippe Mathieu-Daudé
Per cpu_model_from_type() docstring (added in commit 445946f4dd):

  * Returns: CPU model name or NULL if the CPU class doesn't exist

We must check the return value in order to avoid surprises, i.e.:

 $ qemu-system-arm -machine virt -cpu cortex-a9
  qemu-system-arm: Invalid CPU model: cortex-a9
  The valid models are: cortex-a7, cortex-a15, (null), (null), (null), (null), 
(null), (null), (null), (null), (null), (null), (null), max

Add assertions when the call can not fail (because the CPU type
must be registered).

Fixes: 5422d2a8fa ("machine: Print CPU model name instead of CPU type")
Reported-by: Peter Maydell 
Signed-off-by: Philippe Mathieu-Daudé 
---
 cpu-target.c  | 1 +
 hw/core/machine.c | 5 +
 target/ppc/cpu_init.c | 1 +
 3 files changed, 7 insertions(+)

diff --git a/cpu-target.c b/cpu-target.c
index 5eecd7ea2d..b0f6deb13b 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -291,6 +291,7 @@ static void cpu_list_entry(gpointer data, gpointer 
user_data)
 const char *typename = object_class_get_name(OBJECT_CLASS(data));
 g_autofree char *model = cpu_model_from_type(typename);
 
+assert(model);
 if (cc->deprecation_note) {
 qemu_printf("  %s (deprecated)\n", model);
 } else {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index fc239101f9..730ec10328 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1422,16 +1422,21 @@ static bool is_cpu_type_supported(const MachineState 
*machine, Error **errp)
 /* The user specified CPU type isn't valid */
 if (!mc->valid_cpu_types[i]) {
 g_autofree char *requested = 
cpu_model_from_type(machine->cpu_type);
+assert(requested);
 error_setg(errp, "Invalid CPU model: %s", requested);
 if (!mc->valid_cpu_types[1]) {
 g_autofree char *model = cpu_model_from_type(
  mc->valid_cpu_types[0]);
+assert(model);
 error_append_hint(errp, "The only valid type is: %s\n", model);
 } else {
 error_append_hint(errp, "The valid models are: ");
 for (i = 0; mc->valid_cpu_types[i]; i++) {
 g_autofree char *model = cpu_model_from_type(
  mc->valid_cpu_types[i]);
+if (!model) {
+continue;
+}
 error_append_hint(errp, "%s%s",
   model,
   mc->valid_cpu_types[i + 1] ? ", " : "");
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 344196a8ce..58f0c1e30e 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7037,6 +7037,7 @@ static void ppc_cpu_list_entry(gpointer data, gpointer 
user_data)
 }
 
 name = cpu_model_from_type(typename);
+assert(name);
 qemu_printf("PowerPC %-16s PVR %08x\n", name, pcc->pvr);
 for (i = 0; ppc_cpu_aliases[i].alias != NULL; i++) {
  PowerPCCPUAlias *alias = &ppc_cpu_aliases[i];
-- 
2.41.0




Re: [External] Re: [PATCH 3/5] migration: Introduce unimplemented 'qatzip' compression method

2024-01-10 Thread Hao Xiang
On Mon, Jan 8, 2024 at 12:28 PM Fabiano Rosas  wrote:
>
> "Liu, Yuan1"  writes:
>
> >> -Original Message-
> >> From: Hao Xiang 
> >> Sent: Saturday, January 6, 2024 7:53 AM
> >> To: Fabiano Rosas 
> >> Cc: Bryan Zhang ; qemu-devel@nongnu.org;
> >> marcandre.lur...@redhat.com; pet...@redhat.com; quint...@redhat.com;
> >> peter.mayd...@linaro.org; Liu, Yuan1 ;
> >> berra...@redhat.com
> >> Subject: Re: [External] Re: [PATCH 3/5] migration: Introduce unimplemented
> >> 'qatzip' compression method
> >>
> >> On Fri, Jan 5, 2024 at 12:07 PM Fabiano Rosas  wrote:
> >> >
> >> > Bryan Zhang  writes:
> >> >
> >> > +cc Yuan Liu, Daniel Berrangé
> >> >
> >> > > Adds support for 'qatzip' as an option for the multifd compression
> >> > > method parameter, but copy-pastes the no-op logic to leave the
> >> > > actual methods effectively unimplemented. This is in preparation of
> >> > > a subsequent commit that will implement actually using QAT for
> >> > > compression and decompression.
> >> > >
> >> > > Signed-off-by: Bryan Zhang 
> >> > > Signed-off-by: Hao Xiang 
> >> > > ---
> >> > >  hw/core/qdev-properties-system.c |  6 ++-
> >> > >  migration/meson.build|  1 +
> >> > >  migration/multifd-qatzip.c   | 81
> >> 
> >> > >  migration/multifd.h  |  1 +
> >> > >  qapi/migration.json  |  5 +-
> >> > >  5 files changed, 92 insertions(+), 2 deletions(-)  create mode
> >> > > 100644 migration/multifd-qatzip.c
> >> > >
> >> > > diff --git a/hw/core/qdev-properties-system.c
> >> > > b/hw/core/qdev-properties-system.c
> >> > > index 1a396521d5..d8e48dcb0e 100644
> >> > > --- a/hw/core/qdev-properties-system.c
> >> > > +++ b/hw/core/qdev-properties-system.c
> >> > > @@ -658,7 +658,11 @@ const PropertyInfo qdev_prop_fdc_drive_type = {
> >> > > const PropertyInfo qdev_prop_multifd_compression = {
> >> > >  .name = "MultiFDCompression",
> >> > >  .description = "multifd_compression values, "
> >> > > -   "none/zlib/zstd",
> >> > > +   "none/zlib/zstd"
> >> > > +#ifdef CONFIG_QATZIP
> >> > > +   "/qatzip"
> >> > > +#endif
> >> > > +   ,
> >> > >  .enum_table = &MultiFDCompression_lookup,
> >> > >  .get = qdev_propinfo_get_enum,
> >> > >  .set = qdev_propinfo_set_enum,
> >> > > diff --git a/migration/meson.build b/migration/meson.build index
> >> > > 92b1cc4297..e20f318379 100644
> >> > > --- a/migration/meson.build
> >> > > +++ b/migration/meson.build
> >> > > @@ -40,6 +40,7 @@ if get_option('live_block_migration').allowed()
> >> > >system_ss.add(files('block.c'))
> >> > >  endif
> >> > >  system_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
> >> > > +system_ss.add(when: qatzip, if_true: files('multifd-qatzip.c'))
> >> > >
> >> > >  specific_ss.add(when: 'CONFIG_SYSTEM_ONLY',
> >> > >  if_true: files('ram.c', diff --git
> >> > > a/migration/multifd-qatzip.c b/migration/multifd-qatzip.c new file
> >> > > mode 100644 index 00..1733bbddb7
> >> > > --- /dev/null
> >> > > +++ b/migration/multifd-qatzip.c
> >> > > @@ -0,0 +1,81 @@
> >> > > +/*
> >> > > + * Multifd QATzip compression implementation
> >> > > + *
> >> > > + * Copyright (c) Bytedance
> >> > > + *
> >> > > + * Authors:
> >> > > + *  Bryan Zhang 
> >> > > + *  Hao Xiang   
> >> > > + *
> >> > > + * This work is licensed under the terms of the GNU GPL, version 2 or
> >> later.
> >> > > + * See the COPYING file in the top-level directory.
> >> > > + */
> >> > > +
> >> > > +#include "qemu/osdep.h"
> >> > > +#include "exec/ramblock.h"
> >> > > +#include "exec/target_page.h"
> >> > > +#include "qapi/error.h"
> >> > > +#include "migration.h"
> >> > > +#include "options.h"
> >> > > +#include "multifd.h"
> >> > > +
> >> > > +static int qatzip_send_setup(MultiFDSendParams *p, Error **errp) {
> >> > > +return 0;
> >> > > +}
> >> > > +
> >> > > +static void qatzip_send_cleanup(MultiFDSendParams *p, Error **errp)
> >> > > +{};
> >> > > +
> >> > > +static int qatzip_send_prepare(MultiFDSendParams *p, Error **errp)
> >> > > +{
> >> > > +MultiFDPages_t *pages = p->pages;
> >> > > +
> >> > > +for (int i = 0; i < p->normal_num; i++) {
> >> > > +p->iov[p->iovs_num].iov_base = pages->block->host + p-
> >> >normal[i];
> >> > > +p->iov[p->iovs_num].iov_len = p->page_size;
> >> > > +p->iovs_num++;
> >> > > +}
> >> > > +
> >> > > +p->next_packet_size = p->normal_num * p->page_size;
> >> > > +p->flags |= MULTIFD_FLAG_NOCOMP;
> >> > > +return 0;
> >> > > +}
> >> > > +
> >> > > +static int qatzip_recv_setup(MultiFDRecvParams *p, Error **errp) {
> >> > > +return 0;
> >> > > +}
> >> > > +
> >> > > +static void qatzip_recv_cleanup(MultiFDRecvParams *p) {};
> >> > > +
> >> > > +static int qatzip_recv_pages(MultiFDRecvParams *p, Error **errp) {
> >> > > +uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
> >> > > +
> >> > > +if (flags != MULTIFD_FLAG_NOCOMP) {

Re: [PATCH v2 1/2] nubus-device: round Declaration ROM memory region address to qemu_target_page_size()

2024-01-10 Thread Philippe Mathieu-Daudé

On 9/1/24 22:53, Mark Cave-Ayland wrote:

On 08/01/2024 23:06, Philippe Mathieu-Daudé wrote:


On 8/1/24 20:20, Mark Cave-Ayland wrote:
Declaration ROM binary images can be of arbitrary size; however, if a
host ROM memory region is not aligned to qemu_target_page_size() then we
fail the "assert(!(iotlb & ~TARGET_PAGE_MASK))" check in
tlb_set_page_full().

Ensure that the host ROM memory region is aligned to qemu_target_page_size()
and adjust the offset at which the Declaration ROM image is loaded, since
Nubus ROM images are unusual in that they are aligned to the end of the
slot address space.

Signed-off-by: Mark Cave-Ayland 
---
  hw/nubus/nubus-device.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/nubus/nubus-device.c b/hw/nubus/nubus-device.c
index 49008e4938..e4f824d58b 100644
--- a/hw/nubus/nubus-device.c
+++ b/hw/nubus/nubus-device.c
@@ -10,6 +10,7 @@
  #include "qemu/osdep.h"
  #include "qemu/datadir.h"
+#include "exec/target_page.h"
  #include "hw/irq.h"
  #include "hw/loader.h"
  #include "hw/nubus/nubus.h"
@@ -30,7 +31,7 @@ static void nubus_device_realize(DeviceState *dev, 
Error **errp)

  NubusDevice *nd = NUBUS_DEVICE(dev);
  char *name, *path;
  hwaddr slot_offset;
-    int64_t size;
+    int64_t size, align_size;


Both are 'size_t'.


I had a look at include/hw/loader.h, and the function signature for 
get_image_size() returns int64_t. Does it not make sense to keep int64_t 
here and use uintptr_t for the pointer arithmetic as below so that 
everything matches?


Oh you are right:

$ git grep -E '(get_image_size|qemu_target_page_size|load_image_size)\(' 
include

include/exec/target_page.h:17:size_t qemu_target_page_size(void);
include/hw/loader.h:13:int64_t get_image_size(const char *filename);
include/hw/loader.h:30:ssize_t load_image_size(const char *filename, 
void *addr, size_t size);


So I guess int64_t is safer.


  int ret;
  /* Super */
@@ -76,16 +77,23 @@ static void nubus_device_realize(DeviceState 
*dev, Error **errp)

  }
  name = g_strdup_printf("nubus-slot-%x-declaration-rom", 
nd->slot);

-    memory_region_init_rom(&nd->decl_rom, OBJECT(dev), name, size,
+
+    /*
+ * Ensure ROM memory region is aligned to target page size regardless
+ * of the size of the Declaration ROM image
+ */
+    align_size = ROUND_UP(size, qemu_target_page_size());
+    memory_region_init_rom(&nd->decl_rom, OBJECT(dev), name, align_size,

 &error_abort);
-    ret = load_image_mr(path, &nd->decl_rom);
+    ret = load_image_size(path, memory_region_get_ram_ptr(&nd->decl_rom) +
+    (uintptr_t)align_size - size, size);


memory_region_get_ram_ptr() returns a 'void *' so this looks dubious.
Maybe use a local variable to ease offset calculation?

   char *rombase = memory_region_get_ram_ptr(&nd->decl_rom);
   ret = load_image_size(path, rombase + align_size - size, size);

Otherwise KISS but ugly:

   ret = load_image_size(path,
 (void *)((uintptr_t)memory_region_get_ram_ptr(&nd->decl_rom)
  + align_size - size), size);


I prefer the first approach, but with uint8_t instead of char since it 
clarifies that it is a pointer to an arbitrary set of bytes as opposed 
to a string. Does that seem reasonable?


Sure! Then with that:

Reviewed-by: Philippe Mathieu-Daudé 




  g_free(path);
  g_free(name);
  if (ret < 0) {
  error_setg(errp, "could not load romfile \"%s\"", 
nd->romfile);

  return;
  }
-    memory_region_add_subregion(&nd->slot_mem, NUBUS_SLOT_SIZE - size,
+    memory_region_add_subregion(&nd->slot_mem, NUBUS_SLOT_SIZE - align_size,
                                 &nd->decl_rom);
  }
  }



ATB,

Mark.






Re: [PATCH 00/10] docs/migration: Reorganize migration documentations

2024-01-10 Thread Peter Xu
On Tue, Jan 09, 2024 at 02:46:18PM +0800, pet...@redhat.com wrote:
> From: Peter Xu 
> 
> Migration docs grow larger and larger.  There are plenty of things we can
> do here in the future, but to start that we'd better reorganize the current
> bloated doc files first and properly organize them into separate files.
> This series kicks that off.
> 
> This series mostly does the movement only, so please don't be scared of the
> slightly large diff.  I did touch up things here and there, but I didn't
> yet start writing much.  One thing I did was convert virtio.txt to
> rST, but that's trivial; I didn't touch any real content.
> 
> I am copying both virtio and vfio people because I'm merging the two
> separate files into the new docs/devel/migration/ folder.

I fixed all the spelling of "practice"s in patch 5, and queued it for now
into staging.

-- 
Peter Xu




Re: [PATCH] target/riscv: Check for 'A' extension on all atomic instructions

2024-01-10 Thread Alistair Francis
On Thu, Jan 11, 2024 at 3:44 AM Rob Bradford  wrote:
>
> Add requirement that 'A' is enabled for all atomic instructions that
> lack the check. This makes the 64-bit versions consistent with the
> 32-bit versions in the same file.
>
> Signed-off-by: Rob Bradford 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/insn_trans/trans_rva.c.inc | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/target/riscv/insn_trans/trans_rva.c.inc 
> b/target/riscv/insn_trans/trans_rva.c.inc
> index 5f194a447b..f0368de3e4 100644
> --- a/target/riscv/insn_trans/trans_rva.c.inc
> +++ b/target/riscv/insn_trans/trans_rva.c.inc
> @@ -163,65 +163,76 @@ static bool trans_amomaxu_w(DisasContext *ctx, 
> arg_amomaxu_w *a)
>  static bool trans_lr_d(DisasContext *ctx, arg_lr_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_lr(ctx, a, MO_ALIGN | MO_TEUQ);
>  }
>
>  static bool trans_sc_d(DisasContext *ctx, arg_sc_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_sc(ctx, a, (MO_ALIGN | MO_TEUQ));
>  }
>
>  static bool trans_amoswap_d(DisasContext *ctx, arg_amoswap_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_xchg_tl, (MO_ALIGN | MO_TEUQ));
>  }
>
>  static bool trans_amoadd_d(DisasContext *ctx, arg_amoadd_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_add_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amoxor_d(DisasContext *ctx, arg_amoxor_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_xor_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amoand_d(DisasContext *ctx, arg_amoand_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_and_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amoor_d(DisasContext *ctx, arg_amoor_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_or_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amomin_d(DisasContext *ctx, arg_amomin_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_smin_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amomax_d(DisasContext *ctx, arg_amomax_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_smax_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amominu_d(DisasContext *ctx, arg_amominu_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_umin_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amomaxu_d(DisasContext *ctx, arg_amomaxu_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_umax_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
> --
> 2.43.0
>
>



Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]

2024-01-10 Thread Xiaoyao Li

On 1/8/2024 4:27 PM, Zhao Liu wrote:

From: Zhao Liu 

The Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
handle platforms with the Module level enumerated via CPUID.1F.

Expose the module level in CPUID[0x1F] if the machine has more than
one module.

(Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
configurations in "-smp".)

Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Tested-by: Yongwei Ma 
Acked-by: Michael S. Tsirkin 
---
Changes since v3:
  * New patch to expose module level in 0x1F.
  * Add Tested-by tag from Yongwei.
---
  target/i386/cpu.c | 12 +++-
  target/i386/cpu.h |  2 ++
  target/i386/kvm/kvm.c |  2 +-
  3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 294ca6b8947a..a2d39d2198b6 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -277,6 +277,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo 
*topo_info,
  return 1;
  case CPU_TOPO_LEVEL_CORE:
  return topo_info->threads_per_core;
+case CPU_TOPO_LEVEL_MODULE:
+return topo_info->threads_per_core * topo_info->cores_per_module;
  case CPU_TOPO_LEVEL_DIE:
  return topo_info->threads_per_core * topo_info->cores_per_module *
 topo_info->modules_per_die;
@@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo 
*topo_info,
  return 0;
  case CPU_TOPO_LEVEL_CORE:
  return apicid_core_offset(topo_info);
+case CPU_TOPO_LEVEL_MODULE:
+return apicid_module_offset(topo_info);
  case CPU_TOPO_LEVEL_DIE:
  return apicid_die_offset(topo_info);
  case CPU_TOPO_LEVEL_PACKAGE:
@@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel 
topo_level)
  return CPUID_1F_ECX_TOPO_LEVEL_SMT;
  case CPU_TOPO_LEVEL_CORE:
  return CPUID_1F_ECX_TOPO_LEVEL_CORE;
+case CPU_TOPO_LEVEL_MODULE:
+return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
  case CPU_TOPO_LEVEL_DIE:
  return CPUID_1F_ECX_TOPO_LEVEL_DIE;
  default:
@@ -347,6 +353,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t 
count,
  if (env->nr_dies > 1) {
  set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
  }
+
+if (env->nr_modules > 1) {
+set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
+}
  }
  
  *ecx = count & 0xff;

@@ -6394,7 +6404,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  break;
  case 0x1F:
  /* V2 Extended Topology Enumeration Leaf */
-if (topo_info.dies_per_pkg < 2) {
+if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {


maybe we can come up with the function below if we have 
env->valid_cpu_topo[] as I suggested in patch 5.


bool cpu_x86_has_valid_cpuid1f(CPUX86State *env) {
return env->valid_cpu_topo[2] ? true : false;
}

...


  *eax = *ebx = *ecx = *edx = 0;
  break;
  }
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index eecd30bde92b..97b290e10576 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
  CPU_TOPO_LEVEL_INVALID,
  CPU_TOPO_LEVEL_SMT,
  CPU_TOPO_LEVEL_CORE,
+CPU_TOPO_LEVEL_MODULE,
  CPU_TOPO_LEVEL_DIE,
  CPU_TOPO_LEVEL_PACKAGE,
  CPU_TOPO_LEVEL_MAX,
@@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
  #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
  #define CPUID_1F_ECX_TOPO_LEVEL_SMT  CPUID_B_ECX_TOPO_LEVEL_SMT
  #define CPUID_1F_ECX_TOPO_LEVEL_CORE CPUID_B_ECX_TOPO_LEVEL_CORE
+#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
  #define CPUID_1F_ECX_TOPO_LEVEL_DIE  5
  
  /* MSR Feature Bits */

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4ce80555b45c..e5ddb214cb36 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1913,7 +1913,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
  break;
  }
  case 0x1f:
-if (env->nr_dies < 2) {
+if (env->nr_modules < 2 && env->nr_dies < 2) {


then cpu_x86_has_valid_cpuid1f() can be used here.


  break;
  }
  /* fallthrough */





Re: [PULL 29/71] hw/arm/virt: Check CPU type in machine_run_board_init()

2024-01-10 Thread Gavin Shan

Hi Peter,

On 1/10/24 00:33, Peter Maydell wrote:

On Fri, 5 Jan 2024 at 15:46, Philippe Mathieu-Daudé  wrote:


From: Gavin Shan 

Set mc->valid_cpu_types so that the user specified CPU type can be
validated in machine_run_board_init(). We needn't do the check
by ourselves.


Hi; after this change if you try to use the 'virt' board from
qemu-system-arm with an invalid CPU type you get an odd
error message full of "(null)"s:

$ ./build/x86/qemu-system-arm -machine virt -cpu cortex-a9
qemu-system-arm: Invalid CPU model: cortex-a9
The valid models are: cortex-a7, cortex-a15, (null), (null), (null),
(null), (null), (null), (null), (null), (null), (null), (null), max

This seems to be because we print a "(null)" for every 64-bit
only CPU in the list, instead of either ignoring them or not
compiling them into the list in the first place.

https://gitlab.com/qemu-project/qemu/-/issues/2084



Yes, it's because none of the 64-bit CPUs are available to 'qemu-system-arm'.
I've sent a fix for it. Please take a look when getting a chance.

https://lists.nongnu.org/archive/html/qemu-arm/2024-01/msg00531.html

Thanks,
Gavin




Re: [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo

2024-01-10 Thread Xiaoyao Li

On 1/8/2024 4:27 PM, Zhao Liu wrote:

From: Zhuocheng Ding 

Support the module level in the i386 CPU topology structure "X86CPUTopoInfo".

Since x86 does not yet support the "clusters" parameter in "-smp",
X86CPUTopoInfo.modules_per_die is currently always 1. Therefore, the
module level width in APIC ID, which can be calculated by
"apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0
for now, so we can directly add APIC ID related helpers to support
module level parsing.

In addition, update topology structure in test-x86-topo.c.

Signed-off-by: Zhuocheng Ding 
Co-developed-by: Zhao Liu 
Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Tested-by: Yongwei Ma 
Acked-by: Michael S. Tsirkin 
---
Changes since v3:
  * Drop the description about not exposing module level in commit
message.
  * Update topology related calculation in newly added helpers:
num_cpus_by_topo_level() and apicid_offset_by_topo_level().

Changes since v1:
  * Include module level related helpers (apicid_module_width() and
apicid_module_offset()) in this patch. (Yanan)
---
  hw/i386/x86.c  |  3 ++-
  include/hw/i386/topology.h | 22 +++
  target/i386/cpu.c  | 17 +-
  tests/unit/test-x86-topo.c | 45 --
  4 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 1d19a8c609b1..85b847ac7914 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -72,7 +72,8 @@ static void init_topo_info(X86CPUTopoInfo *topo_info,
  MachineState *ms = MACHINE(x86ms);
  
  topo_info->dies_per_pkg = ms->smp.dies;

-topo_info->cores_per_die = ms->smp.cores;
+topo_info->modules_per_die = ms->smp.clusters;
+topo_info->cores_per_module = ms->smp.cores;
  topo_info->threads_per_core = ms->smp.threads;
  }
  
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h

index d4eeb7ab8290..517e51768c13 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -56,7 +56,8 @@ typedef struct X86CPUTopoIDs {
  
  typedef struct X86CPUTopoInfo {

  unsigned dies_per_pkg;
-unsigned cores_per_die;
+unsigned modules_per_die;
+unsigned cores_per_module;
  unsigned threads_per_core;
  } X86CPUTopoInfo;
  
@@ -77,7 +78,13 @@ static inline unsigned apicid_smt_width(X86CPUTopoInfo *topo_info)

  /* Bit width of the Core_ID field */
  static inline unsigned apicid_core_width(X86CPUTopoInfo *topo_info)
  {
-return apicid_bitwidth_for_count(topo_info->cores_per_die);
+return apicid_bitwidth_for_count(topo_info->cores_per_module);
+}
+
+/* Bit width of the Module_ID (cluster ID) field */
+static inline unsigned apicid_module_width(X86CPUTopoInfo *topo_info)
+{
+return apicid_bitwidth_for_count(topo_info->modules_per_die);
  }
  
  /* Bit width of the Die_ID field */

@@ -92,10 +99,16 @@ static inline unsigned apicid_core_offset(X86CPUTopoInfo 
*topo_info)
  return apicid_smt_width(topo_info);
  }
  
+/* Bit offset of the Module_ID (cluster ID) field */

+static inline unsigned apicid_module_offset(X86CPUTopoInfo *topo_info)
+{
+return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+}
+
  /* Bit offset of the Die_ID field */
  static inline unsigned apicid_die_offset(X86CPUTopoInfo *topo_info)
  {
-return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+return apicid_module_offset(topo_info) + apicid_module_width(topo_info);
  }
  
  /* Bit offset of the Pkg_ID (socket ID) field */

@@ -127,7 +140,8 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo 
*topo_info,
   X86CPUTopoIDs *topo_ids)
  {
  unsigned nr_dies = topo_info->dies_per_pkg;
-unsigned nr_cores = topo_info->cores_per_die;
+unsigned nr_cores = topo_info->cores_per_module *
+topo_info->modules_per_die;
  unsigned nr_threads = topo_info->threads_per_core;
  
  topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0a2ce9b92b1f..294ca6b8947a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -278,10 +278,11 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo 
*topo_info,
  case CPU_TOPO_LEVEL_CORE:
  return topo_info->threads_per_core;
  case CPU_TOPO_LEVEL_DIE:
-return topo_info->threads_per_core * topo_info->cores_per_die;
+return topo_info->threads_per_core * topo_info->cores_per_module *
+   topo_info->modules_per_die;
  case CPU_TOPO_LEVEL_PACKAGE:
-return topo_info->threads_per_core * topo_info->cores_per_die *
-   topo_info->dies_per_pkg;
+return topo_info->threads_per_core * topo_info->cores_per_module *
+   topo_info->modules_per_die * topo_info->dies_per_pkg;
  default:
  g_assert_not_reached();
  }
@@ -450,7 +451,9 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
  
  

Re: [External] Re: [PATCH v3 01/20] multifd: Add capability to enable/disable zero_page

2024-01-10 Thread Hao Xiang
On Mon, Jan 8, 2024 at 12:39 PM Fabiano Rosas  wrote:
>
> Hao Xiang  writes:
>
> > From: Juan Quintela 
> >
> > We have to enable it by default until we introduce the new code.
> >
> > Signed-off-by: Juan Quintela 
> > ---
> >  migration/options.c | 15 +++
> >  migration/options.h |  1 +
> >  qapi/migration.json |  8 +++-
> >  3 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/migration/options.c b/migration/options.c
> > index 8d8ec73ad9..0f6bd78b9f 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -204,6 +204,8 @@ Property migration_properties[] = {
> >  DEFINE_PROP_MIG_CAP("x-switchover-ack",
> >  MIGRATION_CAPABILITY_SWITCHOVER_ACK),
> >  DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
> > +DEFINE_PROP_MIG_CAP("main-zero-page",
> > +MIGRATION_CAPABILITY_MAIN_ZERO_PAGE),
> >  DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > @@ -284,6 +286,19 @@ bool migrate_multifd(void)
> >  return s->capabilities[MIGRATION_CAPABILITY_MULTIFD];
> >  }
> >
> > +bool migrate_use_main_zero_page(void)
> > +{
> > +/* MigrationState *s; */
> > +
> > +/* s = migrate_get_current(); */
> > +
> > +/*
> > + * We will enable this when we add the right code.
> > + * return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
> > + */
> > +return true;
> > +}
> > +
> >  bool migrate_pause_before_switchover(void)
> >  {
> >  MigrationState *s = migrate_get_current();
> > diff --git a/migration/options.h b/migration/options.h
> > index 246c160aee..c901eb57c6 100644
> > --- a/migration/options.h
> > +++ b/migration/options.h
> > @@ -88,6 +88,7 @@ int migrate_multifd_channels(void);
> >  MultiFDCompression migrate_multifd_compression(void);
> >  int migrate_multifd_zlib_level(void);
> >  int migrate_multifd_zstd_level(void);
> > +bool migrate_use_main_zero_page(void);
> >  uint8_t migrate_throttle_trigger_threshold(void);
> >  const char *migrate_tls_authz(void);
> >  const char *migrate_tls_creds(void);
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index eb2f883513..80c4b13516 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -531,6 +531,12 @@
> >  # and can result in more stable read performance.  Requires KVM
> >  # with accelerator property "dirty-ring-size" set.  (Since 8.1)
> >  #
> > +#
> > +# @main-zero-page: If enabled, the detection of zero pages will be
> > +#  done on the main thread.  Otherwise it is done on
> > +#  the multifd threads.
> > +#  (since 8.2)
> > +#
> >  # Features:
> >  #
> >  # @deprecated: Member @block is deprecated.  Use blockdev-mirror with
> > @@ -555,7 +561,7 @@
> > { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
> > 'validate-uuid', 'background-snapshot',
> > 'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> > -   'dirty-limit'] }
> > +   'dirty-limit', 'main-zero-page'] }
> >
> >  ##
> >  # @MigrationCapabilityStatus:
>
> I'll extract this zero page work into a separate series and submit for
> review soon. I want to get people's opinion on it independently of this
> series.

Sounds good. Thanks.



Re: [External] Re: [PATCH 3/5] migration: Introduce unimplemented 'qatzip' compression method

2024-01-10 Thread Hao Xiang
On Mon, Jan 8, 2024 at 6:26 PM Liu, Yuan1  wrote:
>
> > -Original Message-
> > From: Fabiano Rosas 
> > Sent: Tuesday, January 9, 2024 4:28 AM
> > To: Liu, Yuan1 ; Hao Xiang 
> > Cc: Bryan Zhang ; qemu-devel@nongnu.org;
> > marcandre.lur...@redhat.com; pet...@redhat.com; quint...@redhat.com;
> > peter.mayd...@linaro.org; berra...@redhat.com
> > Subject: RE: [External] Re: [PATCH 3/5] migration: Introduce unimplemented
> > 'qatzip' compression method
> >
> > "Liu, Yuan1"  writes:
> >
> > >> -Original Message-
> > >> From: Hao Xiang 
> > >> Sent: Saturday, January 6, 2024 7:53 AM
> > >> To: Fabiano Rosas 
> > >> Cc: Bryan Zhang ; qemu-devel@nongnu.org;
> > >> marcandre.lur...@redhat.com; pet...@redhat.com; quint...@redhat.com;
> > >> peter.mayd...@linaro.org; Liu, Yuan1 ;
> > >> berra...@redhat.com
> > >> Subject: Re: [External] Re: [PATCH 3/5] migration: Introduce
> > >> unimplemented 'qatzip' compression method
> > >>
> > >> On Fri, Jan 5, 2024 at 12:07 PM Fabiano Rosas  wrote:
> > >> >
> > >> > Bryan Zhang  writes:
> > >> >
> > >> > +cc Yuan Liu, Daniel Berrangé
> > >> >
> > >> > > Adds support for 'qatzip' as an option for the multifd
> > >> > > compression method parameter, but copy-pastes the no-op logic to
> > >> > > leave the actual methods effectively unimplemented. This is in
> > >> > > preparation of a subsequent commit that will implement actually
> > >> > > using QAT for compression and decompression.
> > >> > >
> > >> > > Signed-off-by: Bryan Zhang 
> > >> > > Signed-off-by: Hao Xiang 
> > >> > > ---
> > >> > >  hw/core/qdev-properties-system.c |  6 ++-
> > >> > >  migration/meson.build|  1 +
> > >> > >  migration/multifd-qatzip.c   | 81
> > >> 
> > >> > >  migration/multifd.h  |  1 +
> > >> > >  qapi/migration.json  |  5 +-
> > >> > >  5 files changed, 92 insertions(+), 2 deletions(-)  create mode
> > >> > > 100644 migration/multifd-qatzip.c
> > >> > >
> > >> > > diff --git a/hw/core/qdev-properties-system.c
> > >> > > b/hw/core/qdev-properties-system.c
> > >> > > index 1a396521d5..d8e48dcb0e 100644
> > >> > > --- a/hw/core/qdev-properties-system.c
> > >> > > +++ b/hw/core/qdev-properties-system.c
> > >> > > @@ -658,7 +658,11 @@ const PropertyInfo qdev_prop_fdc_drive_type
> > >> > > = { const PropertyInfo qdev_prop_multifd_compression = {
> > >> > >  .name = "MultiFDCompression",
> > >> > >  .description = "multifd_compression values, "
> > >> > > -   "none/zlib/zstd",
> > >> > > +   "none/zlib/zstd"
> > >> > > +#ifdef CONFIG_QATZIP
> > >> > > +   "/qatzip"
> > >> > > +#endif
> > >> > > +   ,
> > >> > >  .enum_table = &MultiFDCompression_lookup,
> > >> > >  .get = qdev_propinfo_get_enum,
> > >> > >  .set = qdev_propinfo_set_enum, diff --git
> > >> > > a/migration/meson.build b/migration/meson.build index
> > >> > > 92b1cc4297..e20f318379 100644
> > >> > > --- a/migration/meson.build
> > >> > > +++ b/migration/meson.build
> > >> > > @@ -40,6 +40,7 @@ if get_option('live_block_migration').allowed()
> > >> > >system_ss.add(files('block.c'))  endif
> > >> > >  system_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
> > >> > > +system_ss.add(when: qatzip, if_true: files('multifd-qatzip.c'))
> > >> > >
> > >> > >  specific_ss.add(when: 'CONFIG_SYSTEM_ONLY',
> > >> > >  if_true: files('ram.c', diff --git
> > >> > > a/migration/multifd-qatzip.c b/migration/multifd-qatzip.c new file
> > >> > > mode 100644 index 00..1733bbddb7
> > >> > > --- /dev/null
> > >> > > +++ b/migration/multifd-qatzip.c
> > >> > > @@ -0,0 +1,81 @@
> > >> > > +/*
> > >> > > + * Multifd QATzip compression implementation
> > >> > > + *
> > >> > > + * Copyright (c) Bytedance
> > >> > > + *
> > >> > > + * Authors:
> > >> > > + *  Bryan Zhang 
> > >> > > + *  Hao Xiang   
> > >> > > + *
> > >> > > + * This work is licensed under the terms of the GNU GPL, version 2
> > or
> > >> later.
> > >> > > + * See the COPYING file in the top-level directory.
> > >> > > + */
> > >> > > +
> > >> > > +#include "qemu/osdep.h"
> > >> > > +#include "exec/ramblock.h"
> > >> > > +#include "exec/target_page.h"
> > >> > > +#include "qapi/error.h"
> > >> > > +#include "migration.h"
> > >> > > +#include "options.h"
> > >> > > +#include "multifd.h"
> > >> > > +
> > >> > > +static int qatzip_send_setup(MultiFDSendParams *p, Error **errp) {
> > >> > > +return 0;
> > >> > > +}
> > >> > > +
> > >> > > +static void qatzip_send_cleanup(MultiFDSendParams *p, Error
> > **errp)
> > >> > > +{};
> > >> > > +
> > >> > > +static int qatzip_send_prepare(MultiFDSendParams *p, Error **errp)
> > >> > > +{
> > >> > > +MultiFDPages_t *pages = p->pages;
> > >> > > +
> > >> > > +for (int i = 0; i < p->normal_num; i++) {
> > >> > > +p->iov[p->iovs_num].iov_base = pages->block->host + p-
> > >> >normal[i];
> > >> > > +p->iov[p->iovs_num].iov_len = p->page_size;
> > >> > 

[PATCH] hw/arm/virt: Consolidate valid CPU types

2024-01-10 Thread Gavin Shan
It turns out that some of the CPU type names in the array of valid
CPU types are invalid because their corresponding classes aren't
registered, as reported by Peter Maydell.

[gshan@gshan build]$ ./qemu-system-arm -machine virt -cpu cortex-a9
qemu-system-arm: Invalid CPU model: cortex-a9
The valid models are: cortex-a7, cortex-a15, (null), (null), (null),
(null), (null), (null), (null), (null), (null), (null), (null), max

Fix it by consolidating the array of valid CPU types. With this fix
applied, we get the following output when TCG is enabled.

[gshan@gshan build]$ ./qemu-system-arm -machine virt -cpu cortex-a9
qemu-system-arm: Invalid CPU model: cortex-a9
The valid models are: cortex-a7, cortex-a15, max

[gshan@gshan build]$ ./qemu-system-aarch64 -machine virt -cpu cortex-a9
qemu-system-aarch64: Invalid CPU model: cortex-a9
The valid models are: cortex-a7, cortex-a15, cortex-a35, cortex-a55,
cortex-a72, cortex-a76, cortex-a710, a64fx, neoverse-n1, neoverse-v1,
neoverse-n2, cortex-a53, cortex-a57, max

Reported-by: Peter Maydell 
Fixes: fa8c617791 ("hw/arm/virt: Check CPU type in machine_run_board_init()")
Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 2793121cb4..5cbc69dff8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2905,6 +2905,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 #ifdef CONFIG_TCG
 ARM_CPU_TYPE_NAME("cortex-a7"),
 ARM_CPU_TYPE_NAME("cortex-a15"),
+#ifdef TARGET_AARCH64
 ARM_CPU_TYPE_NAME("cortex-a35"),
 ARM_CPU_TYPE_NAME("cortex-a55"),
 ARM_CPU_TYPE_NAME("cortex-a72"),
@@ -2914,12 +2915,15 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
 ARM_CPU_TYPE_NAME("neoverse-n1"),
 ARM_CPU_TYPE_NAME("neoverse-v1"),
 ARM_CPU_TYPE_NAME("neoverse-n2"),
-#endif
+#endif /* TARGET_AARCH64 */
+#endif /* CONFIG_TCG */
+#ifdef TARGET_AARCH64
 ARM_CPU_TYPE_NAME("cortex-a53"),
 ARM_CPU_TYPE_NAME("cortex-a57"),
 #if defined(CONFIG_KVM) || defined(CONFIG_HVF)
 ARM_CPU_TYPE_NAME("host"),
-#endif
+#endif /* CONFIG_KVM || CONFIG_HVF */
+#endif /* TARGET_AARCH64 */
 ARM_CPU_TYPE_NAME("max"),
 NULL
 };
-- 
2.43.0




Re: [PATCH 03/40] vdpa: probe descriptor group index for data vqs

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:53 AM Si-Wei Liu  wrote:
>
> Getting it ahead at initialization time instead of start time allows
> decision making independent of device status, while reducing failure
> possibility in starting device or during migration.
>
> Adding function vhost_vdpa_probe_desc_group() for that end. This
> function will be used to probe the descriptor group for data vqs.
>
> Signed-off-by: Si-Wei Liu 
> ---
>  net/vhost-vdpa.c | 89 
> 
>  1 file changed, 89 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 887c329..0cf3147 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1688,6 +1688,95 @@ out:
>  return r;
>  }
>
> +static int vhost_vdpa_probe_desc_group(int device_fd, uint64_t features,
> +   int vq_index, int64_t *desc_grpidx,
> +   Error **errp)
> +{
> +uint64_t backend_features;
> +int64_t vq_group, desc_group;
> +uint8_t saved_status = 0;
> +uint8_t status = 0;
> +int r;
> +
> +ERRP_GUARD();
> +
> > +r = ioctl(device_fd, VHOST_GET_BACKEND_FEATURES, &backend_features);
> +if (unlikely(r < 0)) {
> +error_setg_errno(errp, errno, "Cannot get vdpa backend_features");
> +return r;
> +}
> +
> +if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
> +return 0;
> +}
> +
> +if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID))) {
> +return 0;
> +}
> +
> > +r = ioctl(device_fd, VHOST_VDPA_GET_STATUS, &saved_status);
> +if (unlikely(r)) {
> +error_setg_errno(errp, -r, "Cannot get device status");
> +goto out;
> +}

I wonder what's the reason for the status being saved and restored?

We don't do this in vhost_vdpa_probe_cvq_isolation().

Thanks




Re: [PATCH] target/riscv: Check for 'A' extension on all atomic instructions

2024-01-10 Thread Alistair Francis
On Thu, Jan 11, 2024 at 3:44 AM Rob Bradford  wrote:
>
> Add requirement that 'A' is enabled for all atomic instructions that
> lack the check. This makes the 64-bit versions consistent with the
> 32-bit versions in the same file.
>
> Signed-off-by: Rob Bradford 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/insn_trans/trans_rva.c.inc | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/target/riscv/insn_trans/trans_rva.c.inc 
> b/target/riscv/insn_trans/trans_rva.c.inc
> index 5f194a447b..f0368de3e4 100644
> --- a/target/riscv/insn_trans/trans_rva.c.inc
> +++ b/target/riscv/insn_trans/trans_rva.c.inc
> @@ -163,65 +163,76 @@ static bool trans_amomaxu_w(DisasContext *ctx, 
> arg_amomaxu_w *a)
>  static bool trans_lr_d(DisasContext *ctx, arg_lr_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_lr(ctx, a, MO_ALIGN | MO_TEUQ);
>  }
>
>  static bool trans_sc_d(DisasContext *ctx, arg_sc_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_sc(ctx, a, (MO_ALIGN | MO_TEUQ));
>  }
>
>  static bool trans_amoswap_d(DisasContext *ctx, arg_amoswap_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_xchg_tl, (MO_ALIGN | MO_TEUQ));
>  }
>
>  static bool trans_amoadd_d(DisasContext *ctx, arg_amoadd_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_add_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amoxor_d(DisasContext *ctx, arg_amoxor_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_xor_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amoand_d(DisasContext *ctx, arg_amoand_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_and_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amoor_d(DisasContext *ctx, arg_amoor_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_or_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amomin_d(DisasContext *ctx, arg_amomin_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_smin_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amomax_d(DisasContext *ctx, arg_amomax_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_smax_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amominu_d(DisasContext *ctx, arg_amominu_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_umin_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
>
>  static bool trans_amomaxu_d(DisasContext *ctx, arg_amomaxu_d *a)
>  {
>  REQUIRE_64BIT(ctx);
> +REQUIRE_EXT(ctx, RVA);
>  return gen_amo(ctx, a, _gen_atomic_fetch_umax_tl, (MO_ALIGN | 
> MO_TEUQ));
>  }
> --
> 2.43.0
>
>



Re: [PATCH 02/40] vdpa: add vhost_vdpa_get_vring_desc_group

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu  wrote:
>
> Internal API to get the descriptor group index for a specific virtqueue
> through the VHOST_VDPA_GET_VRING_DESC_GROUP ioctl.
>
> Signed-off-by: Si-Wei Liu 

Acked-by: Jason Wang 

Thanks

> ---
>  net/vhost-vdpa.c | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 90f4128..887c329 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -471,6 +471,25 @@ static int64_t vhost_vdpa_get_vring_group(int device_fd, 
> unsigned vq_index,
>  return state.num;
>  }
>
> +static int64_t vhost_vdpa_get_vring_desc_group(int device_fd,
> +   unsigned vq_index,
> +   Error **errp)
> +{
> +struct vhost_vring_state state = {
> +.index = vq_index,
> +};
> > +int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_DESC_GROUP, &state);
> +
> +if (unlikely(r < 0)) {
> +r = -errno;
> +error_setg_errno(errp, errno, "Cannot get VQ %u descriptor group",
> + vq_index);
> +return r;
> +}
> +
> +return state.num;
> +}
> +
>  static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> unsigned vq_group,
> unsigned asid_num)
> --
> 1.8.3.1
>




Re: [PATCH 01/40] linux-headers: add vhost_types.h and vhost.h

2024-01-10 Thread Jason Wang
On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu  wrote:
>
> Signed-off-by: Si-Wei Liu 

It's better to document which version this commit syncs to.

Thanks

> ---
>  include/standard-headers/linux/vhost_types.h | 13 +
>  linux-headers/linux/vhost.h  |  9 +
>  2 files changed, 22 insertions(+)
>
> diff --git a/include/standard-headers/linux/vhost_types.h 
> b/include/standard-headers/linux/vhost_types.h
> index 5ad07e1..c39199b 100644
> --- a/include/standard-headers/linux/vhost_types.h
> +++ b/include/standard-headers/linux/vhost_types.h
> @@ -185,5 +185,18 @@ struct vhost_vdpa_iova_range {
>   * DRIVER_OK
>   */
>  #define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
> +/* Device can be resumed */
> +#define VHOST_BACKEND_F_RESUME  0x5
> +/* Device supports the driver enabling virtqueues both before and after
> + * DRIVER_OK
> + */
> +#define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
> +/* Device may expose the virtqueue's descriptor area, driver area and
> + * device area to a different group for ASID binding than where its
> + * buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
> + */
> +#define VHOST_BACKEND_F_DESC_ASID0x7
> +/* IOTLB don't flush memory mapping across device reset */
> +#define VHOST_BACKEND_F_IOTLB_PERSIST  0x8
>
>  #endif
> diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
> index f5c48b6..c61c687 100644
> --- a/linux-headers/linux/vhost.h
> +++ b/linux-headers/linux/vhost.h
> @@ -219,4 +219,13 @@
>   */
>  #define VHOST_VDPA_RESUME  _IO(VHOST_VIRTIO, 0x7E)
>
> +/* Get the dedicated group for the descriptor table of a virtqueue:
> + * read index, write group in num.
> + * The virtqueue index is stored in the index field of vhost_vring_state.
> + * The group id for the descriptor table of this specific virtqueue
> + * is returned via num field of vhost_vring_state.
> + */
> > +#define VHOST_VDPA_GET_VRING_DESC_GROUP _IOWR(VHOST_VIRTIO, 0x7F, \
> + struct vhost_vring_state)
> +
>  #endif
> --
> 1.8.3.1
>




Re: [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level

2024-01-10 Thread Xiaoyao Li

On 1/8/2024 4:27 PM, Zhao Liu wrote:

From: Zhao Liu 

At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.

In fact, the specific topology level exposed in 0x1F depends on the
platform's support for extension levels (module, tile and die).

To help expose "module" level in 0x1F, decouple CPUID[0x1F] subleaf
with specific topology level.

Signed-off-by: Zhao Liu 
Tested-by: Babu Moger 
Tested-by: Yongwei Ma 
Acked-by: Michael S. Tsirkin 
---
Changes since v3:
  * New patch to prepare to expose module level in 0x1F.
  * Move the CPUTopoLevel enumeration definition from "i386: Add cache
topology info in CPUCacheInfo" to this patch. Note, to align with
topology types in SDM, revert the name of CPU_TOPO_LEVEL_UNKNOW to
CPU_TOPO_LEVEL_INVALID.
---
  target/i386/cpu.c | 136 +-
  target/i386/cpu.h |  15 +
  2 files changed, 126 insertions(+), 25 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc440477d13d..5c295c9a9e2d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -269,6 +269,116 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
 (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
  }
  
+static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,

+   enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_SMT:
+return 1;
+case CPU_TOPO_LEVEL_CORE:
+return topo_info->threads_per_core;
+case CPU_TOPO_LEVEL_DIE:
+return topo_info->threads_per_core * topo_info->cores_per_die;
+case CPU_TOPO_LEVEL_PACKAGE:
+return topo_info->threads_per_core * topo_info->cores_per_die *
+   topo_info->dies_per_pkg;
+default:
+g_assert_not_reached();
+}
+return 0;
+}
+
+static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
+enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_SMT:
+return 0;
+case CPU_TOPO_LEVEL_CORE:
+return apicid_core_offset(topo_info);
+case CPU_TOPO_LEVEL_DIE:
+return apicid_die_offset(topo_info);
+case CPU_TOPO_LEVEL_PACKAGE:
+return apicid_pkg_offset(topo_info);
+default:
+g_assert_not_reached();
+}
+return 0;
+}
+
+static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
+{
+switch (topo_level) {
+case CPU_TOPO_LEVEL_INVALID:
+return CPUID_1F_ECX_TOPO_LEVEL_INVALID;
+case CPU_TOPO_LEVEL_SMT:
+return CPUID_1F_ECX_TOPO_LEVEL_SMT;
+case CPU_TOPO_LEVEL_CORE:
+return CPUID_1F_ECX_TOPO_LEVEL_CORE;
+case CPU_TOPO_LEVEL_DIE:
+return CPUID_1F_ECX_TOPO_LEVEL_DIE;
+default:
+/* Other types are not supported in QEMU. */
+g_assert_not_reached();
+}
+return 0;
+}
+
+static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
+X86CPUTopoInfo *topo_info,
+uint32_t *eax, uint32_t *ebx,
+uint32_t *ecx, uint32_t *edx)
+{
+static DECLARE_BITMAP(topo_bitmap, CPU_TOPO_LEVEL_MAX);
+X86CPU *cpu = env_archcpu(env);
+unsigned long level, next_level;
+uint32_t num_cpus_next_level, offset_next_level;


Again, I dislike using "cpus" in the name to represent a logical processor or
thread. We can call it num_lps_next_level or num_threads_next_level instead.



+
+/*
+ * Initialize the bitmap to decide which levels should be
+ * encoded in 0x1f.
+ */
+if (!count) {


Using a static bitmap and initializing it on (count == 0) looks bad to me.
It relies heavily on the order in which encode_topo_cpuid1f() is called,
and is fragile.


Instead, we can maintain an array in CPUX86State, e.g.,

--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1904,6 +1904,8 @@ typedef struct CPUArchState {

 /* Number of dies within this CPU package. */
 unsigned nr_dies;
+
+uint8_t valid_cpu_topo[CPU_TOPO_LEVEL_MAX];
 } CPUX86State;


and initialize it as below, when initializing the env

env->valid_cpu_topo[0] = CPU_TOPO_LEVEL_SMT;
env->valid_cpu_topo[1] = CPU_TOPO_LEVEL_CORE;
if (env->nr_dies > 1) {
env->valid_cpu_topo[2] = CPU_TOPO_LEVEL_DIE;
}

then in encode_topo_cpuid1f(), we can get level and next_level as

level = env->valid_cpu_topo[count];
next_level = env->valid_cpu_topo[count + 1];



+/* SMT and core levels are exposed in 0x1f leaf by default. */
+set_bit(CPU_TOPO_LEVEL_SMT, topo_bitmap);
+set_bit(CPU_TOPO_LEVEL_CORE, topo_bitmap);
+
+if (env->nr_dies > 1) {
+set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
+}
+}
+
+*ecx = count & 0xff;
+*edx = cpu->apic_id;
+
+level = find_first_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX);
+if (level == CPU_TOPO_LEVEL_MAX) {
+num_cpus_next_level = 0;
+offset_next_level = 0;
+
+   

Re: [PATCH 00/10] docs/migration: Reorganize migration documentations

2024-01-10 Thread Peter Xu
On Wed, Jan 10, 2024 at 04:21:12PM +0100, Cédric Le Goater wrote:
> We also have a [feature request] label under gitlab and some issues are
> tagged with it. I wonder how we can consolidate the 3 sources: wiki,
> gitlab, https://www.qemu.org/docs/master/

Thanks for mentioning the gitlab issues!  This reminded me that we used to
have Dave looking after that from time to time, but it's totally overlooked
at least by myself..  probably we need to have some time tracking it.  On
the documentation side for ToDos, it's indeed potentially doable to already
merge into gitlab issues, then we merge 3->2.  I'll think about it.

-- 
Peter Xu




Re: [PATCH v3 4/4] [NOT FOR MERGE] tests/qtest/migration: Adapt tests to use older QEMUs

2024-01-10 Thread Peter Xu
On Wed, Jan 10, 2024 at 11:42:18AM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Tue, Jan 09, 2024 at 11:46:32AM -0300, Fabiano Rosas wrote:
> >> Hm, it would be better to avoid the extra maintenance task at the start
> >> of every release, no? It also blocks us from doing n-2 even
> >> experimentally.
> >
> > See my other reply, on whether we can use "n-1" for migration-test.  If
> > that can work for us, then IIUC we can avoid either "since:" or any
> > relevant flag, neither do we need to unmask tests after each releases.  All
> > old tests should always "just work" with a new qemu binary.
> 
> Hmm.. There are some assumptions here:
> 
> 1) New code will always be compatible with old tests. E.g. some
>patchseries changed code and changed a test to match the new
>code. Then we'd need a flag like 'since' anyway to mark that the new
>QEMU cannot be used with the old test.
> 
>(if new QEMU is not compatible with old tests without any good
>reason, then that's just a regression I think)

Exactly what you are saying here.  We can't allow a new QEMU to stop working
with the old tests.

One way to simplify the understanding is to imagine the old tests as
"some user currently using the old QEMU who would like to migrate to
the master QEMU binary".  Such a user uses exactly the same cmdline we
used for testing migration-test with exactly that n-1 qemu release binary.

If we fail that old test, it means we can already fail such a user.
That's destined to be a regression to me, no?  Or do you have a solid example?

The only thing I can think of is, when we want to e.g. obsolete a QEMU
cmdline that is used in migration-test.  But then that cmdline needs to be
declared obsolete first for a few releases (let's say, 4), and before that
deadline we should already rewrite migration-test to not use it, and as
long as we do it in 3 releases I suppose nothing will be affected.

> 
> 2) There would not be issues when fixing bugs/refactoring
>tests. E.g. old tests had a bug that is now fixed, but since we're
>not using the new tests, the bug is always there until next
>release. This could block the entire test suite, specially with
>concurrency bugs which can start triggering due to changes in timing.

Yes, this might be a problem.  Note that the old tests we're using will be
exactly the same tests we released with the previous QEMU.  I am "assuming"
the test cases are as stable as the released QEMU, since we kept running
them for all pulls in CI runs.  If we see anything flaky, we should mark it
as such, especially right before the release; then the released tests will
be considerably stable.

The worst case is we still keep a knob in the CI file, and we can turn off
n-1 -> n tests for the CI for some release if there's some unfortunate
accident.  But I hope in reality that can be avoided.

> 
> 3) New code that can only be reached via new tests cannot cause
>regressions. E.g. new code is added but is kept under a machine
>property or migration capability. That code will only show the
>regression after the new test enables that cap/property. At that
>point it's too late because it was already released.

I can't say I fully get the point here.  New code, if gated behind a new cap,
should run exactly like the old code when the cap is not turned on.  I
suppose that's the case when we only run the n-1 version of migration-test.
IMHO it's the same issue as 1) above: we just should not break it, and
if we do, that's exactly what we want to capture and fix in master, not the
n-1 branch.

But as I said, perhaps I didn't really get the issue you wanted to describe..

> 
> In general I like the simplicity of your approach, but it would be
> annoying to change this series only to find out we still need some sort
> of flag later. Even worse, #3 would miss the point of this kind of
> testing entirely.
> 
> #1 could be mitigated by a "no changes to tests rule". We'd start
> requiring that new tests be written and an existing test is never
> altered. For #2 and #3 I don't have a solution.
> 

-- 
Peter Xu




Re: [PATCH v4 8/9b] target/loongarch: Implement set vcpu intr for kvm

2024-01-10 Thread gaosong

Hi,

在 2024/1/10 下午5:41, Philippe Mathieu-Daudé 写道:

From: Tianrui Zhao 

Implement the loongarch kvm set vcpu interrupt interface:
when an irq is set in a vcpu, we use the KVM_INTERRUPT
ioctl to inject the intr into kvm.

Signed-off-by: Tianrui Zhao 
Signed-off-by: xianglai li 
Reviewed-by: Song Gao 
Message-ID: <20240105075804.1228596-9-zhaotian...@loongson.cn>
[PMD: Split from bigger patch, part 2]
Signed-off-by: Philippe Mathieu-Daudé 
---
  target/loongarch/kvm/kvm_loongarch.h | 16 
  target/loongarch/cpu.c   |  9 -
  target/loongarch/kvm/kvm.c   | 15 +++
  target/loongarch/trace-events|  1 +
  4 files changed, 40 insertions(+), 1 deletion(-)
  create mode 100644 target/loongarch/kvm/kvm_loongarch.h

diff --git a/target/loongarch/kvm/kvm_loongarch.h 
b/target/loongarch/kvm/kvm_loongarch.h
new file mode 100644
index 00..d945b6bb82
--- /dev/null
+++ b/target/loongarch/kvm/kvm_loongarch.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch kvm interface
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
+#include "cpu.h"
+
+#ifndef QEMU_KVM_LOONGARCH_H
+#define QEMU_KVM_LOONGARCH_H
+
+int  kvm_loongarch_set_interrupt(LoongArchCPU *cpu, int irq, int level);
+void kvm_arch_reset_vcpu(CPULoongArchState *env);
+
+#endif
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index d9f8661cfd..d3a8a2f521 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -12,6 +12,7 @@
  #include "qemu/module.h"
  #include "sysemu/qtest.h"
  #include "sysemu/tcg.h"
+#include "sysemu/kvm.h"
  #include "exec/exec-all.h"
  #include "cpu.h"
  #include "internals.h"
@@ -21,6 +22,10 @@
  #include "sysemu/reset.h"
  #endif
  #include "vec.h"
+#ifdef CONFIG_KVM
+#include "kvm/kvm_loongarch.h"


This broke the tcg 'loongarch64-softmmu' build on an X86 host, :-[

../target/loongarch/cpu.c: In function ‘loongarch_cpu_set_irq’:
../target/loongarch/cpu.c:122:9: error: implicit declaration of function 
‘kvm_loongarch_set_interrupt’ [-Werror=implicit-function-declaration]

  122 | kvm_loongarch_set_interrupt(cpu, irq, level);
  | ^~~
../target/loongarch/cpu.c:122:9: error: nested extern declaration of 
‘kvm_loongarch_set_interrupt’ [-Werror=nested-externs]

../target/loongarch/cpu.c: In function ‘loongarch_cpu_reset_hold’:
../target/loongarch/cpu.c:557:9: error: implicit declaration of function 
‘kvm_arch_reset_vcpu’; did you mean ‘kvm_arch_init_vcpu’? 
[-Werror=implicit-function-declaration]

  557 | kvm_arch_reset_vcpu(env);
  | ^~~
  | kvm_arch_init_vcpu
../target/loongarch/cpu.c:557:9: error: nested extern declaration of 
‘kvm_arch_reset_vcpu’ [-Werror=nested-externs]

cc1: all warnings being treated as errors

I will move it out of  '#ifdef CONFIG_KVM'

Thanks.
Song Gao

+#include 
+#endif
  #ifdef CONFIG_TCG
  #include "exec/cpu_ldst.h"
  #include "tcg/tcg.h"
@@ -113,7 +118,9 @@ void loongarch_cpu_set_irq(void *opaque, int irq, int level)
  return;
  }
  
-if (tcg_enabled()) {

+if (kvm_enabled()) {
+kvm_loongarch_set_interrupt(cpu, irq, level);
+} else if (tcg_enabled()) {
  env->CSR_ESTAT = deposit64(env->CSR_ESTAT, irq, 1, level != 0);
  if (FIELD_EX64(env->CSR_ESTAT, CSR_ESTAT, IS)) {
  cpu_interrupt(cs, CPU_INTERRUPT_HARD);
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index d2dab3fef4..bd33ec2114 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -748,6 +748,21 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
  return ret;
  }
  
+int kvm_loongarch_set_interrupt(LoongArchCPU *cpu, int irq, int level)

+{
+struct kvm_interrupt intr;
+CPUState *cs = CPU(cpu);
+
+if (level) {
+intr.irq = irq;
+} else {
+intr.irq = -irq;
+}
+
+trace_kvm_set_intr(irq, level);
+return kvm_vcpu_ioctl(cs, KVM_INTERRUPT, &intr);
+}
+
  void kvm_arch_accel_class_init(ObjectClass *oc)
  {
  }
diff --git a/target/loongarch/trace-events b/target/loongarch/trace-events
index 021839880e..dea11edc0f 100644
--- a/target/loongarch/trace-events
+++ b/target/loongarch/trace-events
@@ -12,3 +12,4 @@ kvm_failed_put_counter(const char *msg) "Failed to put counter 
into KVM: %s"
  kvm_failed_get_cpucfg(const char *msg) "Failed to get cpucfg from KVM: %s"
  kvm_failed_put_cpucfg(const char *msg) "Failed to put cpucfg into KVM: %s"
  kvm_arch_handle_exit(int num) "kvm arch handle exit, the reason number: %d"
+kvm_set_intr(int irq, int level) "kvm set interrupt, irq num: %d, level: %d"





Re: [PATCH V1 2/3] migration: notifier error reporting

2024-01-10 Thread Peter Xu
On Wed, Jan 10, 2024 at 01:08:41PM -0500, Steven Sistare wrote:
> On 1/10/2024 2:18 AM, Peter Xu wrote:
> > On Wed, Dec 13, 2023 at 10:11:32AM -0800, Steve Sistare wrote:
> >> After calling notifiers, check if an error has been reported via
> >> migrate_set_error, and halt the migration.
> >>
> >> None of the notifiers call migrate_set_error at this time, so no
> >> functional change.
> >>
> >> Signed-off-by: Steve Sistare 
> >> ---
> >>  include/migration/misc.h |  2 +-
> >>  migration/migration.c| 26 ++
> >>  2 files changed, 23 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/migration/misc.h b/include/migration/misc.h
> >> index 901d117..231d7e4 100644
> >> --- a/include/migration/misc.h
> >> +++ b/include/migration/misc.h
> >> @@ -65,7 +65,7 @@ MigMode migrate_mode_of(MigrationState *);
> >>  void migration_add_notifier(Notifier *notify,
> >>  void (*func)(Notifier *notifier, void *data));
> >>  void migration_remove_notifier(Notifier *notify);
> >> -void migration_call_notifiers(MigrationState *s);
> >> +int migration_call_notifiers(MigrationState *s);
> >>  bool migration_in_setup(MigrationState *);
> >>  bool migration_has_finished(MigrationState *);
> >>  bool migration_has_failed(MigrationState *);
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index d5bfe70..29a9a92 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -1280,6 +1280,8 @@ void migrate_set_state(int *state, int old_state, 
> >> int new_state)
> >>  
> >>  static void migrate_fd_cleanup(MigrationState *s)
> >>  {
> >> +bool already_failed;
> >> +
> >>  qemu_bh_delete(s->cleanup_bh);
> >>  s->cleanup_bh = NULL;
> >>  
> >> @@ -1327,11 +1329,20 @@ static void migrate_fd_cleanup(MigrationState *s)
> >>MIGRATION_STATUS_CANCELLED);
> >>  }
> >>  
> >> +already_failed = migration_has_failed(s);
> >> +if (migration_call_notifiers(s)) {
> >> +if (!already_failed) {
> >> +migrate_set_state(&s->state, s->state, 
> >> MIGRATION_STATUS_FAILED);
> >> +/* Notify again to recover from this late failure. */
> >> +migration_call_notifiers(s);
> >> +}
> >> +}
> >> +
> >>  if (s->error) {
> >>  /* It is used on info migrate.  We can't free it */
> >>  error_report_err(error_copy(s->error));
> >>  }
> >> -migration_call_notifiers(s);
> >> +
> >>  block_cleanup_parameters();
> >>  yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> >>  }
> >> @@ -1450,9 +1461,10 @@ void migration_remove_notifier(Notifier *notify)
> >>  }
> >>  }
> >>  
> >> -void migration_call_notifiers(MigrationState *s)
> >> +int migration_call_notifiers(MigrationState *s)
> >>  {
> >>  notifier_list_notify(&migration_state_notifiers, s);
> >> +return (s->error != NULL);
> > 
> > Exporting more migration_*() functions is pretty ugly to me..
> 
> I assume you mean migrate_set_error(), which is currently only called from
> migration/*.c code.
> 
> Instead, we could define a new function migrate_set_notifier_error(), defined
> in the new file migration/notifier.h, so we clearly limit the migration 
> functions which can be called from notifiers.  (Its implementation just calls
> migrate_set_error)

Fundamentally this allows another .c to change one more field of
MigrationState (which is ->error), and I still want to avoid it.

I just replied in the other thread, but now with all these in mind I think
I still prefer not passing in MigrationState* at all.  It's already kind of
abused due to migrate_get_current(), and IMHO it's healthier to limit its
usage to a minimum to cover the core of migration states for migration/ use
only.

Shrinking or even no longer exporting migrate_get_current() is another more
challenging task, but now what we can do is stop enlarging the direct use
of MigrationState*.

> 
> > Would it be better to pass in "Error** errp" into each notifiers?  That may
> > need an open coded notifier_list_notify(), breaking the loop if "*errp".
> > 
> > And the notifier API currently only support one arg..  maybe we should
> > implement the notifiers ourselves, ideally passing in "(int state, Error
> > **errp)" instead of "(MigrationState *s)".
> > 
> > Ideally with that MigrationState* shouldn't be visible outside migration/.
> 
> I will regret saying this because of the amount of (mechanical) code change 
> involved,
> but the cleanest solution is:

:)

>
> * Pass errp to: 
>   notifier_with_return_list_notify(NotifierWithReturnList *list, void *data, 
> Error *errp)
> * Pass errp to the NotifierWithReturn notifier:
>   int (*notify)(NotifierWithReturn *notifier, void *data, Error **errp);
> * Delete the errp member from struct PostcopyNotifyData and pass errp to the 
> notifier function
>   Ditto for PrecopyNotifyData.
> * Convert all migration notifiers to NotifierWithReturn

Would you mind changing MigrationState* into an 

Re: [PATCH V1 1/3] migration: check mode in notifiers

2024-01-10 Thread Peter Xu
On Wed, Jan 10, 2024 at 01:08:01PM -0500, Steven Sistare wrote:
> On 1/10/2024 2:09 AM, Peter Xu wrote:
> > On Wed, Dec 13, 2023 at 10:11:31AM -0800, Steve Sistare wrote:
> >> The existing notifiers should only apply to normal mode.
> >>
> >> No functional change.
> > 
> > Instead of adding such check in every notifier, why not make CPR a separate
> > list of notifiers?  Just like the blocker lists.
> 
> Sure.   I proposed minimal changes in this current series, but extending the 
> api to take migration mode would be nicer.
> 
> > Aside of this patch, I just started to look at this "notifier" code, I
> > really don't think we should pass in MigrationState* into the notifiers.
> > IIUC we only need the "state" as an enum.  Then with two separate
> > registers, the device code knows the migration mode.
> > 
> > What do you think?
> 
> If we pass state, the notifier must either compare to enum values such as
> MIGRATION_STATUS_COMPLETED instead of calling migration_has_finished(s), or
> we must define new accessors such as migration_state_is_finished(state).
> 
> IMO passing MigrationState is the best approach.
> MigrationState is an incomplete type in most notifiers, and the client can
> pass it to a limited set of accessors to get more information -- exactly what 
> we want to hide migration internals.  However, we could further limit the
> allowed accessors, eg move these to a new file "include/migration/notifier.h".
> 
> 
> #include "qemu/notify.h"
> void migration_add_notifier(Notifier *notify,
> void (*func)(Notifier *notifier, void *data));
> void migration_remove_notifier(Notifier *notify);
> bool migration_is_active(MigrationState *);
> bool migration_in_setup(MigrationState *);
> bool migration_has_finished(MigrationState *);
> bool migration_has_failed(MigrationState *);
> ---

Yes this also sounds good.  Thanks,

-- 
Peter Xu




[PATCH v12 04/10] hw/net: Add NPCMXXX GMAC device

2024-01-10 Thread Nabih Estefan
From: Hao Wu 

This patch implements the basic registers of GMAC device and sets
registers for networking functionalities.

Tested:
The following message shows up with the change:
Broadcom BCM54612E stmmac-0:00: attached PHY driver [Broadcom BCM54612E] 
(mii_bus:phy_addr=stmmac-0:00, irq=POLL)
stmmaceth f0802000.eth eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Change-Id: If71c6d486b95edcccba109ba454870714d7e0940
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan Diaz 
Reviewed-by: Tyrone Ting 
---
 hw/net/meson.build |   2 +-
 hw/net/npcm_gmac.c | 424 +
 hw/net/trace-events|  11 +
 include/hw/net/npcm_gmac.h | 340 +
 4 files changed, 776 insertions(+), 1 deletion(-)
 create mode 100644 hw/net/npcm_gmac.c
 create mode 100644 include/hw/net/npcm_gmac.h

diff --git a/hw/net/meson.build b/hw/net/meson.build
index 9afceb0619..d4e1dc9838 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -38,7 +38,7 @@ system_ss.add(when: 'CONFIG_I82596_COMMON', if_true: 
files('i82596.c'))
 system_ss.add(when: 'CONFIG_SUNHME', if_true: files('sunhme.c'))
 system_ss.add(when: 'CONFIG_FTGMAC100', if_true: files('ftgmac100.c'))
 system_ss.add(when: 'CONFIG_SUNGEM', if_true: files('sungem.c'))
-system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c'))
+system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c', 
'npcm_gmac.c'))
 
 system_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c'))
 system_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c'))
diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
new file mode 100644
index 00..98b3c33c94
--- /dev/null
+++ b/hw/net/npcm_gmac.c
@@ -0,0 +1,424 @@
+/*
+ * Nuvoton NPCM7xx/8xx GMAC Module
+ *
+ * Copyright 2022 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * Unsupported/unimplemented features:
+ * - MII is not implemented, MII_ADDR.BUSY and MII_DATA always return zero
+ * - Precision timestamp (PTP) is not implemented.
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/registerfields.h"
+#include "hw/net/mii.h"
+#include "hw/net/npcm_gmac.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "sysemu/dma.h"
+#include "trace.h"
+
+REG32(NPCM_DMA_BUS_MODE, 0x1000)
+REG32(NPCM_DMA_XMT_POLL_DEMAND, 0x1004)
+REG32(NPCM_DMA_RCV_POLL_DEMAND, 0x1008)
+REG32(NPCM_DMA_RX_BASE_ADDR, 0x100c)
+REG32(NPCM_DMA_TX_BASE_ADDR, 0x1010)
+REG32(NPCM_DMA_STATUS, 0x1014)
+REG32(NPCM_DMA_CONTROL, 0x1018)
+REG32(NPCM_DMA_INTR_ENA, 0x101c)
+REG32(NPCM_DMA_MISSED_FRAME_CTR, 0x1020)
+REG32(NPCM_DMA_HOST_TX_DESC, 0x1048)
+REG32(NPCM_DMA_HOST_RX_DESC, 0x104c)
+REG32(NPCM_DMA_CUR_TX_BUF_ADDR, 0x1050)
+REG32(NPCM_DMA_CUR_RX_BUF_ADDR, 0x1054)
+REG32(NPCM_DMA_HW_FEATURE, 0x1058)
+
+REG32(NPCM_GMAC_MAC_CONFIG, 0x0)
+REG32(NPCM_GMAC_FRAME_FILTER, 0x4)
+REG32(NPCM_GMAC_HASH_HIGH, 0x8)
+REG32(NPCM_GMAC_HASH_LOW, 0xc)
+REG32(NPCM_GMAC_MII_ADDR, 0x10)
+REG32(NPCM_GMAC_MII_DATA, 0x14)
+REG32(NPCM_GMAC_FLOW_CTRL, 0x18)
+REG32(NPCM_GMAC_VLAN_FLAG, 0x1c)
+REG32(NPCM_GMAC_VERSION, 0x20)
+REG32(NPCM_GMAC_WAKEUP_FILTER, 0x28)
+REG32(NPCM_GMAC_PMT, 0x2c)
+REG32(NPCM_GMAC_LPI_CTRL, 0x30)
+REG32(NPCM_GMAC_TIMER_CTRL, 0x34)
+REG32(NPCM_GMAC_INT_STATUS, 0x38)
+REG32(NPCM_GMAC_INT_MASK, 0x3c)
+REG32(NPCM_GMAC_MAC0_ADDR_HI, 0x40)
+REG32(NPCM_GMAC_MAC0_ADDR_LO, 0x44)
+REG32(NPCM_GMAC_MAC1_ADDR_HI, 0x48)
+REG32(NPCM_GMAC_MAC1_ADDR_LO, 0x4c)
+REG32(NPCM_GMAC_MAC2_ADDR_HI, 0x50)
+REG32(NPCM_GMAC_MAC2_ADDR_LO, 0x54)
+REG32(NPCM_GMAC_MAC3_ADDR_HI, 0x58)
+REG32(NPCM_GMAC_MAC3_ADDR_LO, 0x5c)
+REG32(NPCM_GMAC_RGMII_STATUS, 0xd8)
+REG32(NPCM_GMAC_WATCHDOG, 0xdc)
+REG32(NPCM_GMAC_PTP_TCR, 0x700)
+REG32(NPCM_GMAC_PTP_SSIR, 0x704)
+REG32(NPCM_GMAC_PTP_STSR, 0x708)
+REG32(NPCM_GMAC_PTP_STNSR, 0x70c)
+REG32(NPCM_GMAC_PTP_STSUR, 0x710)
+REG32(NPCM_GMAC_PTP_STNSUR, 0x714)
+REG32(NPCM_GMAC_PTP_TAR, 0x718)
+REG32(NPCM_GMAC_PTP_TTSR, 0x71c)
+
+/* Register Fields */
+#define NPCM_GMAC_MII_ADDR_BUSY BIT(0)
+#define NPCM_GMAC_MII_ADDR_WRITE    BIT(1)
+#define NPCM_GMAC_MII_ADDR_GR(rv)   extract16((rv), 6, 5)
+#define NPCM_GMAC_MII_ADDR_PA(rv)   extract16((rv), 11, 5)
+
+#define NPCM_GMAC_INT_MASK_LPIIM    BIT(10)
+#define NPCM_GMAC_INT_MASK_PMTM BIT(3)
+#define NPCM_GMAC_INT_MASK_RGIM BIT(0)
+
+#define NPCM_DMA_BUS_MODE_SWR   BIT(0)
+
+static const uint32_t npcm_gmac_cold_reset_values[NPCM_GMAC_NR_REGS] = {
+/* 

[PATCH v12 08/10] hw/net: GMAC Rx Implementation

2024-01-10 Thread Nabih Estefan
From: Nabih Estefan Diaz 

- Implementation of the receive path for packets
- Implementation of reading and writing Rx descriptors in memory

When RX starts, we need to flush the queued packets so that they
can be received by the GMAC device. Without this it won't work
with a TAP NIC device.

When the RX descriptor list is full, the device returns a DMA_STATUS
value for software to handle. But there is no way to indicate that
the software has handled all RX descriptors, so the whole pipeline stalls.

We do something similar to NPCM7XX EMC to handle this case.

1. Return the packet size when the RX descriptor list is full,
effectively dropping those packets.
2. When software clears the RX-descriptor-full bit, continue receiving
further packets by flushing QEMU's packet queue.

Added relevant trace-events

Change-Id: I132aa254a94cda1a586aba2ea33bbfc74ecdb831
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/net/npcm_gmac.c  | 306 +++-
 hw/net/trace-events |   7 +-
 2 files changed, 310 insertions(+), 3 deletions(-)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index 44c4ffaff4..e81996b01a 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -24,6 +24,10 @@
 #include "hw/net/mii.h"
 #include "hw/net/npcm_gmac.h"
 #include "migration/vmstate.h"
+#include "net/checksum.h"
+#include "net/eth.h"
+#include "net/net.h"
+#include "qemu/cutils.h"
 #include "qemu/log.h"
 #include "qemu/units.h"
 #include "sysemu/dma.h"
@@ -146,6 +150,17 @@ static void gmac_phy_set_link(NPCMGMACState *gmac, bool active)
 
 static bool gmac_can_receive(NetClientState *nc)
 {
+NPCMGMACState *gmac = NPCM_GMAC(qemu_get_nic_opaque(nc));
+
+/* If GMAC receive is disabled. */
+if (!(gmac->regs[R_NPCM_GMAC_MAC_CONFIG] & NPCM_GMAC_MAC_CONFIG_RX_EN)) {
+return false;
+}
+
+/* If GMAC DMA RX is stopped. */
+if (!(gmac->regs[R_NPCM_DMA_CONTROL] & NPCM_DMA_CONTROL_START_STOP_RX)) {
+return false;
+}
 return true;
 }
 
@@ -189,12 +204,288 @@ static void gmac_update_irq(NPCMGMACState *gmac)
 qemu_set_irq(gmac->irq, level);
 }
 
-static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
+static int gmac_read_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
+{
+if (dma_memory_read(&address_space_memory, addr, desc,
+sizeof(*desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+desc->rdes0 = le32_to_cpu(desc->rdes0);
+desc->rdes1 = le32_to_cpu(desc->rdes1);
+desc->rdes2 = le32_to_cpu(desc->rdes2);
+desc->rdes3 = le32_to_cpu(desc->rdes3);
+return 0;
+}
+
+static int gmac_write_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
+{
+struct NPCMGMACRxDesc le_desc;
+le_desc.rdes0 = cpu_to_le32(desc->rdes0);
+le_desc.rdes1 = cpu_to_le32(desc->rdes1);
+le_desc.rdes2 = cpu_to_le32(desc->rdes2);
+le_desc.rdes3 = cpu_to_le32(desc->rdes3);
+if (dma_memory_write(&address_space_memory, addr, &le_desc,
+sizeof(le_desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+return 0;
+}
+
+static int gmac_read_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc *desc)
+{
+if (dma_memory_read(&address_space_memory, addr, desc,
+sizeof(*desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+desc->tdes0 = le32_to_cpu(desc->tdes0);
+desc->tdes1 = le32_to_cpu(desc->tdes1);
+desc->tdes2 = le32_to_cpu(desc->tdes2);
+desc->tdes3 = le32_to_cpu(desc->tdes3);
+return 0;
+}
+
+static int gmac_write_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc *desc)
+{
+struct NPCMGMACTxDesc le_desc;
+le_desc.tdes0 = cpu_to_le32(desc->tdes0);
+le_desc.tdes1 = cpu_to_le32(desc->tdes1);
+le_desc.tdes2 = cpu_to_le32(desc->tdes2);
+le_desc.tdes3 = cpu_to_le32(desc->tdes3);
+if (dma_memory_write(&address_space_memory, addr, &le_desc,
+sizeof(le_desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+return 0;
+}
+static int gmac_rx_transfer_frame_to_buffer(uint32_t rx_buf_len,
+uint32_t *left_frame,
+uint32_t rx_buf_addr,
+bool *eof_transferred,
+const uint8_t **frame_ptr,
+uint16_t *transferred)
 {
-/* 

[PATCH v12 06/10] tests/qtest: Creating qtest for GMAC Module

2024-01-10 Thread Nabih Estefan
From: Nabih Estefan Diaz 

 - Created a qtest that checks the reset values of registers in the GMAC module.
 - Added the test to the build file.

Change-Id: I8b2fe152d3987a7eec4cf6a1d25ba92e75a5391d
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 tests/qtest/meson.build  |   1 +
 tests/qtest/npcm_gmac-test.c | 209 +++
 2 files changed, 210 insertions(+)
 create mode 100644 tests/qtest/npcm_gmac-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 41bc75c8b1..2471298f3e 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -221,6 +221,7 @@ qtests_aarch64 = \
   (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) +  \
  (config_all_accel.has_key('CONFIG_TCG') and \
   config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \
+  (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
   ['arm-cpu-features',
'numa-test',
'boot-serial-test',
diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
new file mode 100644
index 00..130a1599a8
--- /dev/null
+++ b/tests/qtest/npcm_gmac-test.c
@@ -0,0 +1,209 @@
+/*
+ * QTests for Nuvoton NPCM7xx/8xx GMAC Modules.
+ *
+ * Copyright 2023 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "libqos/libqos.h"
+
+/* Name of the GMAC Device */
+#define TYPE_NPCM_GMAC "npcm-gmac"
+
+typedef struct GMACModule {
+int irq;
+uint64_t base_addr;
+} GMACModule;
+
+typedef struct TestData {
+const GMACModule *module;
+} TestData;
+
+/* Values extracted from hw/arm/npcm8xx.c */
+static const GMACModule gmac_module_list[] = {
+{
+.irq= 14,
+.base_addr  = 0xf0802000
+},
+{
+.irq= 15,
+.base_addr  = 0xf0804000
+},
+{
+.irq= 16,
+.base_addr  = 0xf0806000
+},
+{
+.irq= 17,
+.base_addr  = 0xf0808000
+}
+};
+
+/* Returns the index of the GMAC module. */
+static int gmac_module_index(const GMACModule *mod)
+{
+ptrdiff_t diff = mod - gmac_module_list;
+
+g_assert_true(diff >= 0 && diff < ARRAY_SIZE(gmac_module_list));
+
+return diff;
+}
+
+/* 32-bit register indices. Taken from npcm_gmac.c */
+typedef enum NPCMRegister {
+/* DMA Registers */
+NPCM_DMA_BUS_MODE = 0x1000,
+NPCM_DMA_XMT_POLL_DEMAND = 0x1004,
+NPCM_DMA_RCV_POLL_DEMAND = 0x1008,
+NPCM_DMA_RCV_BASE_ADDR = 0x100c,
+NPCM_DMA_TX_BASE_ADDR = 0x1010,
+NPCM_DMA_STATUS = 0x1014,
+NPCM_DMA_CONTROL = 0x1018,
+NPCM_DMA_INTR_ENA = 0x101c,
+NPCM_DMA_MISSED_FRAME_CTR = 0x1020,
+NPCM_DMA_HOST_TX_DESC = 0x1048,
+NPCM_DMA_HOST_RX_DESC = 0x104c,
+NPCM_DMA_CUR_TX_BUF_ADDR = 0x1050,
+NPCM_DMA_CUR_RX_BUF_ADDR = 0x1054,
+NPCM_DMA_HW_FEATURE = 0x1058,
+
+/* GMAC Registers */
+NPCM_GMAC_MAC_CONFIG = 0x0,
+NPCM_GMAC_FRAME_FILTER = 0x4,
+NPCM_GMAC_HASH_HIGH = 0x8,
+NPCM_GMAC_HASH_LOW = 0xc,
+NPCM_GMAC_MII_ADDR = 0x10,
+NPCM_GMAC_MII_DATA = 0x14,
+NPCM_GMAC_FLOW_CTRL = 0x18,
+NPCM_GMAC_VLAN_FLAG = 0x1c,
+NPCM_GMAC_VERSION = 0x20,
+NPCM_GMAC_WAKEUP_FILTER = 0x28,
+NPCM_GMAC_PMT = 0x2c,
+NPCM_GMAC_LPI_CTRL = 0x30,
+NPCM_GMAC_TIMER_CTRL = 0x34,
+NPCM_GMAC_INT_STATUS = 0x38,
+NPCM_GMAC_INT_MASK = 0x3c,
+NPCM_GMAC_MAC0_ADDR_HI = 0x40,
+NPCM_GMAC_MAC0_ADDR_LO = 0x44,
+NPCM_GMAC_MAC1_ADDR_HI = 0x48,
+NPCM_GMAC_MAC1_ADDR_LO = 0x4c,
+NPCM_GMAC_MAC2_ADDR_HI = 0x50,
+NPCM_GMAC_MAC2_ADDR_LO = 0x54,
+NPCM_GMAC_MAC3_ADDR_HI = 0x58,
+NPCM_GMAC_MAC3_ADDR_LO = 0x5c,
+NPCM_GMAC_RGMII_STATUS = 0xd8,
+NPCM_GMAC_WATCHDOG = 0xdc,
+NPCM_GMAC_PTP_TCR = 0x700,
+NPCM_GMAC_PTP_SSIR = 0x704,
+NPCM_GMAC_PTP_STSR = 0x708,
+NPCM_GMAC_PTP_STNSR = 0x70c,
+NPCM_GMAC_PTP_STSUR = 0x710,
+NPCM_GMAC_PTP_STNSUR = 0x714,
+NPCM_GMAC_PTP_TAR = 0x718,
+NPCM_GMAC_PTP_TTSR = 0x71c,
+} NPCMRegister;
+
+static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
+  NPCMRegister regno)
+{
+return qtest_readl(qts, mod->base_addr + regno);
+}
+
+/* Check that GMAC registers are reset to default value */
+static void test_init(gconstpointer test_data)
+{
+const TestData *td = test_data;
+const GMACModule *mod = td->module;
+QTestState *qts = qtest_init("-machine npcm845-evb");
+
+#define 

[PATCH v12 10/10] tests/qtest: Adding PCS Module test to GMAC Qtest

2024-01-10 Thread Nabih Estefan
From: Nabih Estefan Diaz 

 - Add PCS Register check to npcm_gmac-test

Change-Id: I34821beb5e0b1e89e2be576ab58eabe41545af12
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 tests/qtest/npcm_gmac-test.c | 132 +++
 1 file changed, 132 insertions(+)

diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
index 130a1599a8..b64515794b 100644
--- a/tests/qtest/npcm_gmac-test.c
+++ b/tests/qtest/npcm_gmac-test.c
@@ -20,6 +20,10 @@
 /* Name of the GMAC Device */
 #define TYPE_NPCM_GMAC "npcm-gmac"
 
+/* Address of the PCS Module */
+#define PCS_BASE_ADDRESS 0xf0780000
+#define NPCM_PCS_IND_AC_BA 0x1fe
+
 typedef struct GMACModule {
 int irq;
 uint64_t base_addr;
@@ -111,6 +115,62 @@ typedef enum NPCMRegister {
 NPCM_GMAC_PTP_STNSUR = 0x714,
 NPCM_GMAC_PTP_TAR = 0x718,
 NPCM_GMAC_PTP_TTSR = 0x71c,
+
+/* PCS Registers */
+NPCM_PCS_SR_CTL_ID1 = 0x3c0008,
+NPCM_PCS_SR_CTL_ID2 = 0x3c000a,
+NPCM_PCS_SR_CTL_STS = 0x3c0010,
+
+NPCM_PCS_SR_MII_CTRL = 0x3e0000,
+NPCM_PCS_SR_MII_STS = 0x3e0002,
+NPCM_PCS_SR_MII_DEV_ID1 = 0x3e0004,
+NPCM_PCS_SR_MII_DEV_ID2 = 0x3e0006,
+NPCM_PCS_SR_MII_AN_ADV = 0x3e0008,
+NPCM_PCS_SR_MII_LP_BABL = 0x3e000a,
+NPCM_PCS_SR_MII_AN_EXPN = 0x3e000c,
+NPCM_PCS_SR_MII_EXT_STS = 0x3e001e,
+
+NPCM_PCS_SR_TIM_SYNC_ABL = 0x3e0e10,
+NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR = 0x3e0e12,
+NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR = 0x3e0e14,
+NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR = 0x3e0e16,
+NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR = 0x3e0e18,
+NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR = 0x3e0e1a,
+NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR = 0x3e0e1c,
+NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR = 0x3e0e1e,
+NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR = 0x3e0e20,
+
+NPCM_PCS_VR_MII_MMD_DIG_CTRL1 = 0x3f0000,
+NPCM_PCS_VR_MII_AN_CTRL = 0x3f0002,
+NPCM_PCS_VR_MII_AN_INTR_STS = 0x3f0004,
+NPCM_PCS_VR_MII_TC = 0x3f0006,
+NPCM_PCS_VR_MII_DBG_CTRL = 0x3f000a,
+NPCM_PCS_VR_MII_EEE_MCTRL0 = 0x3f000c,
+NPCM_PCS_VR_MII_EEE_TXTIMER = 0x3f0010,
+NPCM_PCS_VR_MII_EEE_RXTIMER = 0x3f0012,
+NPCM_PCS_VR_MII_LINK_TIMER_CTRL = 0x3f0014,
+NPCM_PCS_VR_MII_EEE_MCTRL1 = 0x3f0016,
+NPCM_PCS_VR_MII_DIG_STS = 0x3f0020,
+NPCM_PCS_VR_MII_ICG_ERRCNT1 = 0x3f0022,
+NPCM_PCS_VR_MII_MISC_STS = 0x3f0030,
+NPCM_PCS_VR_MII_RX_LSTS = 0x3f0040,
+NPCM_PCS_VR_MII_MP_TX_BSTCTRL0 = 0x3f0070,
+NPCM_PCS_VR_MII_MP_TX_LVLCTRL0 = 0x3f0074,
+NPCM_PCS_VR_MII_MP_TX_GENCTRL0 = 0x3f007a,
+NPCM_PCS_VR_MII_MP_TX_GENCTRL1 = 0x3f007c,
+NPCM_PCS_VR_MII_MP_TX_STS = 0x3f0090,
+NPCM_PCS_VR_MII_MP_RX_GENCTRL0 = 0x3f00b0,
+NPCM_PCS_VR_MII_MP_RX_GENCTRL1 = 0x3f00b2,
+NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0 = 0x3f00ba,
+NPCM_PCS_VR_MII_MP_MPLL_CTRL0 = 0x3f00f0,
+NPCM_PCS_VR_MII_MP_MPLL_CTRL1 = 0x3f00f2,
+NPCM_PCS_VR_MII_MP_MPLL_STS = 0x3f0110,
+NPCM_PCS_VR_MII_MP_MISC_CTRL2 = 0x3f0126,
+NPCM_PCS_VR_MII_MP_LVL_CTRL = 0x3f0130,
+NPCM_PCS_VR_MII_MP_MISC_CTRL0 = 0x3f0132,
+NPCM_PCS_VR_MII_MP_MISC_CTRL1 = 0x3f0134,
+NPCM_PCS_VR_MII_DIG_CTRL2 = 0x3f01c2,
+NPCM_PCS_VR_MII_DIG_ERRCNT_SEL = 0x3f01c4,
 } NPCMRegister;
 
 static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
@@ -119,6 +179,15 @@ static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
 return qtest_readl(qts, mod->base_addr + regno);
 }
 
+static uint16_t pcs_read(QTestState *qts, const GMACModule *mod,
+  NPCMRegister regno)
+{
+uint32_t write_value = (regno & 0x3ffe00) >> 9;
+qtest_writel(qts, PCS_BASE_ADDRESS + NPCM_PCS_IND_AC_BA, write_value);
+uint32_t read_offset = regno & 0x1ff;
+return qtest_readl(qts, PCS_BASE_ADDRESS + read_offset);
+}
+
 /* Check that GMAC registers are reset to default value */
 static void test_init(gconstpointer test_data)
 {
@@ -131,6 +200,11 @@ static void test_init(gconstpointer test_data)
 g_assert_cmphex(gmac_read(qts, mod, (regno)), ==, (value)); \
 } while (0)
 
+#define CHECK_REG_PCS(regno, value) \
+do { \
+g_assert_cmphex(pcs_read(qts, mod, (regno)), ==, (value)); \
+} while (0)
+
 CHECK_REG32(NPCM_DMA_BUS_MODE, 0x00020100);
 CHECK_REG32(NPCM_DMA_XMT_POLL_DEMAND, 0);
 CHECK_REG32(NPCM_DMA_RCV_POLL_DEMAND, 0);
@@ -180,6 +254,64 @@ static void test_init(gconstpointer test_data)
 CHECK_REG32(NPCM_GMAC_PTP_TAR, 0);
 CHECK_REG32(NPCM_GMAC_PTP_TTSR, 0);
 
+/* TODO Add registers PCS */
+if (mod->base_addr == 0xf0802000) {
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID1, 0x699e);
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID2, 0);
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_STS, 0x8000);
+
+CHECK_REG_PCS(NPCM_PCS_SR_MII_CTRL, 0x1140);
+CHECK_REG_PCS(NPCM_PCS_SR_MII_STS, 0x0109);
+CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID1, 0x699e);
+CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID2, 0x0ced0);
+

[PATCH v12 05/10] hw/arm: Add GMAC devices to NPCM7XX SoC

2024-01-10 Thread Nabih Estefan
From: Hao Wu 

Change-Id: Id8a3461fb5042adc4c3fd6f4fbd1ca0d33e22565
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/arm/npcm7xx.c | 36 ++--
 include/hw/arm/npcm7xx.h |  2 ++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index c9e87162cb..12e11250e1 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -91,6 +91,7 @@ enum NPCM7xxInterrupt {
 NPCM7XX_GMAC1_IRQ   = 14,
 NPCM7XX_EMC1RX_IRQ  = 15,
 NPCM7XX_EMC1TX_IRQ,
+NPCM7XX_GMAC2_IRQ,
 NPCM7XX_MMC_IRQ = 26,
 NPCM7XX_PSPI2_IRQ   = 28,
 NPCM7XX_PSPI1_IRQ   = 31,
@@ -234,6 +235,12 @@ static const hwaddr npcm7xx_pspi_addr[] = {
 0xf0201000,
 };
 
+/* Register base address for each GMAC Module */
+static const hwaddr npcm7xx_gmac_addr[] = {
+0xf0802000,
+0xf0804000,
+};
+
 static const struct {
 hwaddr regs_addr;
 uint32_t unconnected_pins;
@@ -462,6 +469,10 @@ static void npcm7xx_init(Object *obj)
 object_initialize_child(obj, "pspi[*]", >pspi[i], TYPE_NPCM_PSPI);
 }
 
+for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+object_initialize_child(obj, "gmac[*]", >gmac[i], TYPE_NPCM_GMAC);
+}
+
 object_initialize_child(obj, "pci-mbox", >pci_mbox,
 TYPE_NPCM7XX_PCI_MBOX);
 object_initialize_child(obj, "mmc", >mmc, TYPE_NPCM7XX_SDHCI);
@@ -695,6 +706,29 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 sysbus_connect_irq(sbd, 1, npcm7xx_irq(s, rx_irq));
 }
 
+/*
+ * GMAC Modules. Cannot fail.
+ */
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(npcm7xx_gmac_addr) != ARRAY_SIZE(s->gmac));
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(s->gmac) != 2);
+for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+SysBusDevice *sbd = SYS_BUS_DEVICE(>gmac[i]);
+
+/*
+ * The device exists regardless of whether it's connected to a QEMU
+ * netdev backend. So always instantiate it even if there is no
+ * backend.
+ */
+sysbus_realize(sbd, _abort);
+sysbus_mmio_map(sbd, 0, npcm7xx_gmac_addr[i]);
+int irq = i == 0 ? NPCM7XX_GMAC1_IRQ : NPCM7XX_GMAC2_IRQ;
+/*
+ * N.B. The values for the second argument of sysbus_connect_irq are
+ * chosen to match the registration order in npcm7xx_emc_realize.
+ */
+sysbus_connect_irq(sbd, 0, npcm7xx_irq(s, irq));
+}
+
 /*
  * Flash Interface Unit (FIU). Can fail if incorrect number of chip selects
  * specified, but this is a programming error.
@@ -765,8 +799,6 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 create_unimplemented_device("npcm7xx.siox[2]",  0xf0102000,   4 * KiB);
 create_unimplemented_device("npcm7xx.ahbpci",   0xf040,   1 * MiB);
 create_unimplemented_device("npcm7xx.mcphy",0xf05f,  64 * KiB);
-create_unimplemented_device("npcm7xx.gmac1",0xf0802000,   8 * KiB);
-create_unimplemented_device("npcm7xx.gmac2",0xf0804000,   8 * KiB);
 create_unimplemented_device("npcm7xx.vcd",  0xf081,  64 * KiB);
 create_unimplemented_device("npcm7xx.ece",  0xf082,   8 * KiB);
 create_unimplemented_device("npcm7xx.vdma", 0xf0822000,   8 * KiB);
diff --git a/include/hw/arm/npcm7xx.h b/include/hw/arm/npcm7xx.h
index cec3792a2e..9e5cf639a2 100644
--- a/include/hw/arm/npcm7xx.h
+++ b/include/hw/arm/npcm7xx.h
@@ -30,6 +30,7 @@
 #include "hw/misc/npcm7xx_pwm.h"
 #include "hw/misc/npcm7xx_rng.h"
 #include "hw/net/npcm7xx_emc.h"
+#include "hw/net/npcm_gmac.h"
 #include "hw/nvram/npcm7xx_otp.h"
 #include "hw/timer/npcm7xx_timer.h"
 #include "hw/ssi/npcm7xx_fiu.h"
@@ -105,6 +106,7 @@ struct NPCM7xxState {
 OHCISysBusState ohci;
 NPCM7xxFIUState fiu[2];
 NPCM7xxEMCState emc[2];
+NPCMGMACState   gmac[2];
 NPCM7xxPCIMBoxState pci_mbox;
 NPCM7xxSDHCIState   mmc;
 NPCMPSPIState   pspi[2];
-- 
2.43.0.275.g3460e3d667-goog




[PATCH v12 02/10] hw/arm: Add PCI mailbox module to Nuvoton SoC

2024-01-10 Thread Nabih Estefan
From: Hao Wu 

This patch wires the PCI mailbox module into the Nuvoton SoC.

Change-Id: I14c42c628258804030f0583889882842bde0d972
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 docs/system/arm/nuvoton.rst | 2 ++
 hw/arm/npcm7xx.c| 2 ++
 include/hw/arm/npcm7xx.h| 1 +
 3 files changed, 5 insertions(+)

diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index 0424cae4b0..e611099545 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -50,6 +50,8 @@ Supported devices
  * Ethernet controller (EMC)
  * Tachometer
  * Peripheral SPI controller (PSPI)
+ * BIOS POST code FIFO
+ * PCI Mailbox
 
 Missing devices
 ---
diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index 1c3634ff45..c9e87162cb 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -462,6 +462,8 @@ static void npcm7xx_init(Object *obj)
 object_initialize_child(obj, "pspi[*]", >pspi[i], TYPE_NPCM_PSPI);
 }
 
+object_initialize_child(obj, "pci-mbox", >pci_mbox,
+TYPE_NPCM7XX_PCI_MBOX);
 object_initialize_child(obj, "mmc", >mmc, TYPE_NPCM7XX_SDHCI);
 }
 
diff --git a/include/hw/arm/npcm7xx.h b/include/hw/arm/npcm7xx.h
index 273090ac60..cec3792a2e 100644
--- a/include/hw/arm/npcm7xx.h
+++ b/include/hw/arm/npcm7xx.h
@@ -105,6 +105,7 @@ struct NPCM7xxState {
 OHCISysBusState ohci;
 NPCM7xxFIUState fiu[2];
 NPCM7xxEMCState emc[2];
+NPCM7xxPCIMBoxState pci_mbox;
 NPCM7xxSDHCIState   mmc;
 NPCMPSPIState   pspi[2];
 };
-- 
2.43.0.275.g3460e3d667-goog




[PATCH v12 09/10] hw/net: GMAC Tx Implementation

2024-01-10 Thread Nabih Estefan
From: Nabih Estefan Diaz 

- Implementation of the transmit path for packets
- Implementation of reading and writing Tx descriptors in memory

NOTE: This function implements the steps detailed in the datasheet for
transmitting messages from the GMAC.

Added relevant trace-events

Change-Id: Icf14f9fcc6cc7808a41acd872bca67c9832087e6
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/net/npcm_gmac.c  | 173 
 hw/net/trace-events |   4 +-
 2 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index e81996b01a..c107e835b1 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -265,6 +265,7 @@ static int gmac_write_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc *desc)
 }
 return 0;
 }
+
 static int gmac_rx_transfer_frame_to_buffer(uint32_t rx_buf_len,
 uint32_t *left_frame,
 uint32_t rx_buf_addr,
@@ -486,6 +487,155 @@ static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
 return len;
 }
 
+static int gmac_tx_get_csum(uint32_t tdes1)
+{
+uint32_t mask = TX_DESC_TDES1_CHKSM_INS_CTRL_MASK(tdes1);
+int csum = 0;
+
+if (likely(mask > 0)) {
+csum |= CSUM_IP;
+}
+if (likely(mask > 1)) {
+csum |= CSUM_TCP | CSUM_UDP;
+}
+
+return csum;
+}
+
+static void gmac_try_send_next_packet(NPCMGMACState *gmac)
+{
+/*
+ * Comments about steps refer to steps for
+ * transmitting in page 384 of datasheet
+ */
+uint16_t tx_buffer_size = 2048;
+g_autofree uint8_t *tx_send_buffer = g_malloc(tx_buffer_size);
+uint32_t desc_addr;
+struct NPCMGMACTxDesc tx_desc;
+uint32_t tx_buf_addr, tx_buf_len;
+uint16_t length = 0;
+uint8_t *buf = tx_send_buffer;
+uint32_t prev_buf_size = 0;
+int csum = 0;
+
+/* steps 1&2 */
+if (!gmac->regs[R_NPCM_DMA_HOST_TX_DESC]) {
+gmac->regs[R_NPCM_DMA_HOST_TX_DESC] =
+NPCM_DMA_HOST_TX_DESC_MASK(gmac->regs[R_NPCM_DMA_TX_BASE_ADDR]);
+}
+desc_addr = gmac->regs[R_NPCM_DMA_HOST_TX_DESC];
+
+while (true) {
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_RUNNING_FETCHING_STATE);
+if (gmac_read_tx_desc(desc_addr, &tx_desc)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "TX Descriptor @ 0x%x can't be read\n",
+  desc_addr);
+return;
+}
+/* step 3 */
+
+trace_npcm_gmac_packet_desc_read(DEVICE(gmac)->canonical_path,
+desc_addr);
+trace_npcm_gmac_debug_desc_data(DEVICE(gmac)->canonical_path, &tx_desc,
+tx_desc.tdes0, tx_desc.tdes1, tx_desc.tdes2, tx_desc.tdes3);
+
+/* 1 = DMA Owned, 0 = Software Owned */
+if (!(tx_desc.tdes0 & TX_DESC_TDES0_OWN)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "TX Descriptor @ 0x%x is owned by software\n",
+  desc_addr);
+gmac->regs[R_NPCM_DMA_STATUS] |= NPCM_DMA_STATUS_TU;
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_SUSPENDED_STATE);
+gmac_update_irq(gmac);
+return;
+}
+
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_RUNNING_READ_STATE);
+/* Give the descriptor back regardless of what happens. */
+tx_desc.tdes0 &= ~TX_DESC_TDES0_OWN;
+
+if (tx_desc.tdes1 & TX_DESC_TDES1_FIRST_SEG_MASK) {
+csum = gmac_tx_get_csum(tx_desc.tdes1);
+}
+
+/* step 4 */
+tx_buf_addr = tx_desc.tdes2;
+gmac->regs[R_NPCM_DMA_CUR_TX_BUF_ADDR] = tx_buf_addr;
+tx_buf_len = TX_DESC_TDES1_BFFR1_SZ_MASK(tx_desc.tdes1);
+buf = &tx_send_buffer[prev_buf_size];
+
+if ((prev_buf_size + tx_buf_len) > tx_buffer_size) {
+tx_buffer_size = prev_buf_size + tx_buf_len;
+tx_send_buffer = g_realloc(tx_send_buffer, tx_buffer_size);
+buf = &tx_send_buffer[prev_buf_size];
+}
+
+/* step 5 */
+if (dma_memory_read(&address_space_memory, tx_buf_addr, buf,
+tx_buf_len, MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read packet @ 0x%x\n",
+__func__, tx_buf_addr);
+return;
+}
+length += tx_buf_len;
+prev_buf_size += tx_buf_len;
+
+/* If not chained we'll have a second buffer. */
+if (!(tx_desc.tdes1 & TX_DESC_TDES1_SEC_ADDR_CHND_MASK)) {
+tx_buf_addr = tx_desc.tdes3;
+gmac->regs[R_NPCM_DMA_CUR_TX_BUF_ADDR] = tx_buf_addr;
+tx_buf_len = TX_DESC_TDES1_BFFR2_SZ_MASK(tx_desc.tdes1);
+buf = &tx_send_buffer[prev_buf_size];
+
+

[PATCH v12 03/10] hw/misc: Add qtest for NPCM7xx PCI Mailbox

2024-01-10 Thread Nabih Estefan
From: Hao Wu 

This patch adds a qtest for the NPCM7XX PCI Mailbox module.
It sends read and write requests to the module, and verifies that
the module contains the correct data after the requests.

Change-Id: I2e1dbaecf8be9ec7eab55cb54f7fdeb0715b8275
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 tests/qtest/meson.build |   1 +
 tests/qtest/npcm7xx_pci_mbox-test.c | 238 
 2 files changed, 239 insertions(+)
 create mode 100644 tests/qtest/npcm7xx_pci_mbox-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index f25bffcc20..41bc75c8b1 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -183,6 +183,7 @@ qtests_sparc64 = \
 qtests_npcm7xx = \
   ['npcm7xx_adc-test',
'npcm7xx_gpio-test',
+   'npcm7xx_pci_mbox-test',
'npcm7xx_pwm-test',
'npcm7xx_rng-test',
'npcm7xx_sdhci-test',
diff --git a/tests/qtest/npcm7xx_pci_mbox-test.c b/tests/qtest/npcm7xx_pci_mbox-test.c
new file mode 100644
index 00..24eec18e3c
--- /dev/null
+++ b/tests/qtest/npcm7xx_pci_mbox-test.c
@@ -0,0 +1,238 @@
+/*
+ * QTests for Nuvoton NPCM7xx PCI Mailbox Modules.
+ *
+ * Copyright 2021 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitops.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qnum.h"
+#include "libqtest-single.h"
+
+#define PCI_MBOX_BA 0xf0848000
+#define PCI_MBOX_IRQ8
+
+/* register offset */
+#define PCI_MBOX_STAT   0x00
+#define PCI_MBOX_CTL0x04
+#define PCI_MBOX_CMD0x08
+
+#define CODE_OK 0x00
+#define CODE_INVALID_OP 0xa0
+#define CODE_INVALID_SIZE   0xa1
+#define CODE_ERROR  0xff
+
+#define OP_READ 0x01
+#define OP_WRITE0x02
+#define OP_INVALID  0x41
+
+
+static int sock;
+static int fd;
+
+/*
+ * Create a local TCP socket with any port, then save off the port we got.
+ */
+static in_port_t open_socket(void)
+{
+struct sockaddr_in myaddr;
+socklen_t addrlen;
+
+myaddr.sin_family = AF_INET;
+myaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+myaddr.sin_port = 0;
+sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
+g_assert(sock != -1);
+g_assert(bind(sock, (struct sockaddr *) &myaddr, sizeof(myaddr)) != -1);
+addrlen = sizeof(myaddr);
+g_assert(getsockname(sock, (struct sockaddr *) &myaddr, &addrlen) != -1);
+g_assert(listen(sock, 1) != -1);
+return ntohs(myaddr.sin_port);
+}
+
+static void setup_fd(void)
+{
+fd_set readfds;
+
+FD_ZERO(&readfds);
+FD_SET(sock, &readfds);
+g_assert(select(sock + 1, &readfds, NULL, NULL, NULL) == 1);
+
+fd = accept(sock, NULL, 0);
+g_assert(fd >= 0);
+}
+
+static uint8_t read_response(uint8_t *buf, size_t len)
+{
+uint8_t code;
+ssize_t ret = read(fd, &code, 1);
+
+if (ret == -1) {
+return CODE_ERROR;
+}
+if (code != CODE_OK) {
+return code;
+}
+g_test_message("response code: %x", code);
+if (len > 0) {
+ret = read(fd, buf, len);
+if (ret < len) {
+return CODE_ERROR;
+}
+}
+return CODE_OK;
+}
+
+static void receive_data(uint64_t offset, uint8_t *buf, size_t len)
+{
+uint8_t op = OP_READ;
+uint8_t code;
+ssize_t rv;
+
+while (len > 0) {
+uint8_t size;
+
+if (len >= 8) {
+size = 8;
+} else if (len >= 4) {
+size = 4;
+} else if (len >= 2) {
+size = 2;
+} else {
+size = 1;
+}
+
+g_test_message("receiving %u bytes", size);
+/* Write op */
+rv = write(fd, &op, 1);
+g_assert_cmpint(rv, ==, 1);
+/* Write offset */
+rv = write(fd, (uint8_t *) &offset, sizeof(uint64_t));
+g_assert_cmpint(rv, ==, sizeof(uint64_t));
+/* Write size */
+g_assert_cmpint(write(fd, &size, 1), ==, 1);
+
+/* Read data and Expect response */
+code = read_response(buf, size);
+g_assert_cmphex(code, ==, CODE_OK);
+
+buf += size;
+offset += size;
+len -= size;
+}
+}
+
+static void send_data(uint64_t offset, const uint8_t *buf, size_t len)
+{
+uint8_t op = OP_WRITE;
+uint8_t code;
+ssize_t rv;
+
+while (len > 0) {
+uint8_t size;
+
+if (len >= 8) {
+size = 8;
+} else if (len >= 4) {
+size = 4;
+} else if (len >= 2) {
+size = 2;
+} else {
+size = 1;
+}
+
+   

[PATCH v12 07/10] include/hw/net: General GMAC Implementation

2024-01-10 Thread Nabih Estefan
From: Nabih Estefan Diaz 

Implemented GMAC IRQ Handling and added relevant trace-events

Change-Id: I7a2d3cd3f493278bcd0cf483233c1e05c37488b7
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/net/npcm_gmac.c  | 40 
 hw/net/trace-events |  1 +
 2 files changed, 41 insertions(+)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index 98b3c33c94..44c4ffaff4 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -149,6 +149,46 @@ static bool gmac_can_receive(NetClientState *nc)
 return true;
 }
 
+/*
+ * Function that updates the GMAC IRQ
+ * It finds the logical OR of the enabled bits for NIS (if enabled).
+ * It finds the logical OR of the enabled bits for AIS (if enabled).
+ */
+static void gmac_update_irq(NPCMGMACState *gmac)
+{
+    /*
+     * Check if the normal interrupts summary is enabled
+     * if so, add the bits for the summary that are enabled
+     */
+    if (gmac->regs[R_NPCM_DMA_INTR_ENA] & gmac->regs[R_NPCM_DMA_STATUS] &
+        (NPCM_DMA_INTR_ENAB_NIE_BITS)) {
+        gmac->regs[R_NPCM_DMA_STATUS] |= NPCM_DMA_STATUS_NIS;
+    }
+    /*
+     * Check if the abnormal interrupts summary is enabled
+     * if so, add the bits for the summary that are enabled
+     */
+    if (gmac->regs[R_NPCM_DMA_INTR_ENA] & gmac->regs[R_NPCM_DMA_STATUS] &
+        (NPCM_DMA_INTR_ENAB_AIE_BITS)) {
+        gmac->regs[R_NPCM_DMA_STATUS] |= NPCM_DMA_STATUS_AIS;
+    }
+
+    /* Get the logical OR of both normal and abnormal interrupts */
+    int level = !!((gmac->regs[R_NPCM_DMA_STATUS] &
+                    gmac->regs[R_NPCM_DMA_INTR_ENA] &
+                    NPCM_DMA_STATUS_NIS) |
+                   (gmac->regs[R_NPCM_DMA_STATUS] &
+                    gmac->regs[R_NPCM_DMA_INTR_ENA] &
+                    NPCM_DMA_STATUS_AIS));
+
+    /* Set the IRQ */
+    trace_npcm_gmac_update_irq(DEVICE(gmac)->canonical_path,
+                               gmac->regs[R_NPCM_DMA_STATUS],
+                               gmac->regs[R_NPCM_DMA_INTR_ENA],
+                               level);
+    qemu_set_irq(gmac->irq, level);
+}
+
 static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
 {
 /* Placeholder. Function will be filled in following patches */
diff --git a/hw/net/trace-events b/hw/net/trace-events
index 33514548b8..56057de47f 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
 npcm_gmac_reg_write(const char *name, uint64_t offset, uint32_t value) "%s: offset: 0x%04" PRIx64 " value: 0x%08" PRIx32
 npcm_gmac_mdio_access(const char *name, uint8_t is_write, uint8_t pa, uint8_t gr, uint16_t val) "%s: is_write: %" PRIu8 " pa: %" PRIu8 " gr: %" PRIu8 " val: 0x%04" PRIx16
 npcm_gmac_reset(const char *name, uint16_t value) "%s: phy_regs[0][1]: 0x%04" PRIx16
 npcm_gmac_set_link(bool active) "Set link: active=%u"
+npcm_gmac_update_irq(const char *name, uint32_t status, uint32_t intr_en, int level) "%s: Status Reg: 0x%04" PRIX32 " Interrupt Enable Reg: 0x%04" PRIX32 " IRQ Set: %d"
 
 # npcm_pcs.c
 npcm_pcs_reg_read(const char *name, uint16_t indirect_access_baes, uint64_t offset, uint16_t value) "%s: IND: 0x%02" PRIx16 " offset: 0x%04" PRIx64 " value: 0x%04" PRIx16
-- 
2.43.0.275.g3460e3d667-goog




[PATCH v12 00/10] Implementation of NPI Mailbox and GMAC Networking Module

2024-01-10 Thread Nabih Estefan
From: Nabih Estefan Diaz 

[Changes since v11]
Was running into an error syncing with master. It seemed to be related to a
hash problem introduced in patchset 10 (unrelated to the macOS build
issue). Carried the patches from v9 (before the syncing problem) and
added the fixes from patchsets 10 and 11 to remove the hash error.

[Changes since v10]
Fixed macOS build issue. Changed imports to not be linux-specific.

[Changes since v9]
More cleanup and fixes based on suggestions from Peter Maydell
(peter.mayd...@linaro.org).

[Changes since v8]
Suggestions and Fixes from Peter Maydell (peter.mayd...@linaro.org),
also cleaned up changes so nothing is deleted in a later patch that was
added in an earlier patch. Patch count decreased by 1 because this cleanup
led to one of the patches being irrelevant.

[Changes since v7]
Fixed patch 4 declaration of new NIC based on comments by Peter Maydell
(peter.mayd...@linaro.org)

[Changes since v6]
Remove the Change-Ids from the commit messages.

[Changes since v5]
Undid remove of some qtests that seem to have been caused by a merge
conflict.

[Changes since v4]
Added Signed-off-by tag and fixed patch 4 commit message as suggested by
Peter Maydell (peter.mayd...@linaro.org)

[Changes since v3]
Fixed comments from Hao Wu (wuhao...@google.com)

[Changes since v2]
Fixed bugs related to the RC functionality of the GMAC. Added and
squashed patches related to that.

[Changes since v1]
Fixed some errors in formatting.
Fixed a merge error that I didn't see in v1.
Removed Nuvoton 8xx references since that is a separate patch set.

[Original Cover]
Creates the PCI Mailbox Module with data verification for read and write
(internal and external), wiring to the Nuvoton SoC, and QTests.

Also creates the GMAC Networking Module. Implements read and write
functionalities with corresponding descriptors and registers. Also
includes QTests for the different functionalities.

Hao Wu (5):
  hw/misc: Add Nuvoton's PCI Mailbox Module
  hw/arm: Add PCI mailbox module to Nuvoton SoC
  hw/misc: Add qtest for NPCM7xx PCI Mailbox
  hw/net: Add NPCMXXX GMAC device
  hw/arm: Add GMAC devices to NPCM7XX SoC

Nabih Estefan Diaz (5):
  tests/qtest: Creating qtest for GMAC Module
  include/hw/net: General GMAC Implementation
  hw/net: GMAC Rx Implementation
  hw/net: GMAC Tx Implementation
  tests/qtest: Adding PCS Module test to GMAC Qtest

 docs/system/arm/nuvoton.rst |   2 +
 hw/arm/npcm7xx.c|  53 +-
 hw/misc/meson.build |   1 +
 hw/misc/npcm7xx_pci_mbox.c  | 324 ++
 hw/misc/trace-events|   5 +
 hw/net/meson.build  |   2 +-
 hw/net/npcm_gmac.c  | 939 
 hw/net/trace-events |  19 +
 include/hw/arm/npcm7xx.h|   4 +
 include/hw/misc/npcm7xx_pci_mbox.h  |  81 +++
 include/hw/net/npcm_gmac.h  | 340 ++
 tests/qtest/meson.build |   2 +
 tests/qtest/npcm7xx_pci_mbox-test.c | 238 +++
 tests/qtest/npcm_gmac-test.c| 341 ++
 14 files changed, 2347 insertions(+), 4 deletions(-)
 create mode 100644 hw/misc/npcm7xx_pci_mbox.c
 create mode 100644 hw/net/npcm_gmac.c
 create mode 100644 include/hw/misc/npcm7xx_pci_mbox.h
 create mode 100644 include/hw/net/npcm_gmac.h
 create mode 100644 tests/qtest/npcm7xx_pci_mbox-test.c
 create mode 100644 tests/qtest/npcm_gmac-test.c

-- 
2.43.0.275.g3460e3d667-goog




[PATCH v12 01/10] hw/misc: Add Nuvoton's PCI Mailbox Module

2024-01-10 Thread Nabih Estefan
From: Hao Wu 

The PCI Mailbox Module is a high-bandwidth communication module
between a Nuvoton BMC and the core CPU. It features 16KB of RAM that is
accessible by both the BMC and the core CPU, and supports interrupts
for both sides.

This patch implements the BMC side of the PCI mailbox module.
Communication with the core CPU is emulated via a chardev and
will be in a follow-up patch.

Change-Id: Iaca22f81c4526927d437aa367079ed038faf43f2
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/arm/npcm7xx.c   |  15 +-
 hw/misc/meson.build|   1 +
 hw/misc/npcm7xx_pci_mbox.c | 324 +
 hw/misc/trace-events   |   5 +
 include/hw/arm/npcm7xx.h   |   1 +
 include/hw/misc/npcm7xx_pci_mbox.h |  81 
 6 files changed, 426 insertions(+), 1 deletion(-)
 create mode 100644 hw/misc/npcm7xx_pci_mbox.c
 create mode 100644 include/hw/misc/npcm7xx_pci_mbox.h

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index 15ff21d047..1c3634ff45 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -53,6 +53,9 @@
 /* ADC Module */
 #define NPCM7XX_ADC_BA  (0xf000c000)
 
+/* PCI Mailbox Module */
+#define NPCM7XX_PCI_MBOX_BA (0xf0848000)
+
 /* Internal AHB SRAM */
 #define NPCM7XX_RAM3_BA (0xc0008000)
 #define NPCM7XX_RAM3_SZ (4 * KiB)
@@ -83,6 +86,9 @@ enum NPCM7xxInterrupt {
 NPCM7XX_UART1_IRQ,
 NPCM7XX_UART2_IRQ,
 NPCM7XX_UART3_IRQ,
+NPCM7XX_PCI_MBOX_IRQ= 8,
+NPCM7XX_KCS_HIB_IRQ = 9,
+NPCM7XX_GMAC1_IRQ   = 14,
 NPCM7XX_EMC1RX_IRQ  = 15,
 NPCM7XX_EMC1TX_IRQ,
 NPCM7XX_MMC_IRQ = 26,
@@ -706,6 +712,14 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 }
 }
 
+    /* PCI Mailbox. Cannot fail */
+    sysbus_realize(SYS_BUS_DEVICE(&s->pci_mbox), &error_abort);
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->pci_mbox), 0, NPCM7XX_PCI_MBOX_BA);
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->pci_mbox), 1,
+                    NPCM7XX_PCI_MBOX_BA + NPCM7XX_PCI_MBOX_RAM_SIZE);
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->pci_mbox), 0,
+                       npcm7xx_irq(s, NPCM7XX_PCI_MBOX_IRQ));
+
     /* RAM2 (SRAM) */
     memory_region_init_ram(&s->sram, OBJECT(dev), "ram2",
                            NPCM7XX_RAM2_SZ, &error_abort);
@@ -765,7 +779,6 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 create_unimplemented_device("npcm7xx.usbd[8]",  0xf0838000,   4 * KiB);
 create_unimplemented_device("npcm7xx.usbd[9]",  0xf0839000,   4 * KiB);
 create_unimplemented_device("npcm7xx.sd",   0xf0840000,   8 * KiB);
-create_unimplemented_device("npcm7xx.pcimbx",   0xf0848000, 512 * KiB);
 create_unimplemented_device("npcm7xx.aes",  0xf0858000,   4 * KiB);
 create_unimplemented_device("npcm7xx.des",  0xf0859000,   4 * KiB);
 create_unimplemented_device("npcm7xx.sha",  0xf085a000,   4 * KiB);
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 36c20d5637..0ead2e9ede 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -73,6 +73,7 @@ system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files(
   'npcm7xx_clk.c',
   'npcm7xx_gcr.c',
   'npcm7xx_mft.c',
+  'npcm7xx_pci_mbox.c',
   'npcm7xx_pwm.c',
   'npcm7xx_rng.c',
 ))
diff --git a/hw/misc/npcm7xx_pci_mbox.c b/hw/misc/npcm7xx_pci_mbox.c
new file mode 100644
index 00..c770ad6fcf
--- /dev/null
+++ b/hw/misc/npcm7xx_pci_mbox.c
@@ -0,0 +1,324 @@
+/*
+ * Nuvoton NPCM7xx PCI Mailbox Module
+ *
+ * Copyright 2021 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "chardev/char-fe.h"
+#include "hw/irq.h"
+#include "hw/qdev-clock.h"
+#include "hw/qdev-properties-system.h"
+#include "hw/misc/npcm7xx_pci_mbox.h"
+#include "hw/registerfields.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/timer.h"
+#include "qemu/units.h"
+#include "trace.h"
+
+REG32(NPCM7XX_PCI_MBOX_BMBXSTAT, 0x00);
+REG32(NPCM7XX_PCI_MBOX_BMBXCTL, 0x04);
+REG32(NPCM7XX_PCI_MBOX_BMBXCMD, 0x08);
+
+enum NPCM7xxPCIMBoxOperation {
+    NPCM7XX_PCI_MBOX_OP_READ = 1,
+    NPCM7XX_PCI_MBOX_OP_WRITE,
+};
+
+#define NPCM7XX_PCI_MBOX_OFFSET_BYTES 8
+
+/* Response code */
+#define NPCM7XX_PCI_MBOX_OK 0
+#define NPCM7XX_PCI_MBOX_INVALID_OP 0xa0
+#define NPCM7XX_PCI_MBOX_INVALID_SIZE 0xa1
+#define 

[PATCH v10 8/9] hw/fsi: Added FSI documentation

2024-01-10 Thread Ninad Palsule
Documentation for IBM FSI model.

Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
---
 docs/specs/fsi.rst   | 138 +++
 docs/specs/index.rst |   1 +
 2 files changed, 139 insertions(+)
 create mode 100644 docs/specs/fsi.rst

diff --git a/docs/specs/fsi.rst b/docs/specs/fsi.rst
new file mode 100644
index 00..05a6b6347a
--- /dev/null
+++ b/docs/specs/fsi.rst
@@ -0,0 +1,138 @@
+==
+IBM's Flexible Service Interface (FSI)
+==
+
+The QEMU FSI emulation implements hardware interfaces between ASPEED SOC, FSI
+master/slave and the end engine.
+
+FSI is a point-to-point two wire interface which is capable of supporting
+distances of up to 4 meters. FSI interfaces have been used successfully for
+many years in IBM servers to attach IBM Flexible Support Processors(FSP) to
+CPUs and IBM ASICs.
+
+FSI allows a service processor access to the internal buses of a host POWER
+processor to perform configuration or debugging. FSI has long existed in POWER
+processors and so comes with some baggage, including how it has been integrated
+into the ASPEED SoC.
+
+Working backwards from the POWER processor, the fundamental pieces of interest
+for the implementation are: (see the `FSI specification`_ for more details)
+
+1. The Common FRU Access Macro (CFAM), an address space containing various
+   "engines" that drive accesses on buses internal and external to the POWER
+   chip. Examples include the SBEFIFO and I2C masters. The engines hang off of
+   an internal Local Bus (LBUS) which is described by the CFAM configuration
+   block.
+
+2. The FSI slave: The slave is the terminal point of the FSI bus for FSI
+   symbols addressed to it. Slaves can be cascaded off of one another. The
+   slave's configuration registers appear in address space of the CFAM to
+   which it is attached.
+
+3. The FSI master: A controller in the platform service processor (e.g. BMC)
+   driving CFAM engine accesses into the POWER chip. At the hardware level
+   FSI is a bit-based protocol supporting synchronous and DMA-driven accesses
+   of engines in a CFAM.
+
+4. The On-Chip Peripheral Bus (OPB): A low-speed bus typically found in POWER
+   processors. This now makes an appearance in the ASPEED SoC due to tight
+   integration of the FSI master IP with the OPB, mainly the existence of an
+   MMIO-mapping of the CFAM address straight onto a sub-region of the OPB
+   address space.
+
+5. An APB-to-OPB bridge enabling access to the OPB from the ARM core in the
+   AST2600. Hardware limitations prevent the OPB from being directly mapped
+   into APB, so all accesses are indirect through the bridge.
+
+The LBUS is modelled to maintain the qdev bus hierarchy and to take advantage
+of the object model to automatically generate the CFAM configuration block.
+The configuration block presents engines in the order they are attached to the
+CFAM's LBUS. Engine implementations should subclass the LBusDevice and set the
+'config' member of LBusDeviceClass to match the engine's type.
+
+CFAM designs offer a lot of flexibility, for instance it is possible for a
+CFAM to be simultaneously driven from multiple FSI links. The modeling is not
+so complete; it's assumed that each CFAM is attached to a single FSI slave (as
+a consequence the CFAM subclasses the FSI slave).
+
+As for FSI, its symbols and wire-protocol are not modelled at all. This is not
+necessary to get FSI off the ground thanks to the mapping of the CFAM address
+space onto the OPB address space - the models follow this directly and map the
+CFAM memory region into the OPB's memory region.
+
+QEMU files related to FSI interface:
+ - ``hw/fsi/aspeed-apb2opb.c``
+ - ``include/hw/fsi/aspeed-apb2opb.h``
+ - ``hw/fsi/opb.c``
+ - ``include/hw/fsi/opb.h``
+ - ``hw/fsi/fsi.c``
+ - ``include/hw/fsi/fsi.h``
+ - ``hw/fsi/fsi-master.c``
+ - ``include/hw/fsi/fsi-master.h``
+ - ``hw/fsi/fsi-slave.c``
+ - ``include/hw/fsi/fsi-slave.h``
+ - ``hw/fsi/cfam.c``
+ - ``include/hw/fsi/cfam.h``
+ - ``hw/fsi/engine-scratchpad.c``
+ - ``include/hw/fsi/engine-scratchpad.h``
+ - ``include/hw/fsi/lbus.h``
+
+The following commands start the rainier machine with built-in FSI model.
+There are no model specific arguments.
+
+.. code-block:: console
+
+  qemu-system-arm -M rainier-bmc -nographic \
+  -kernel fitImage-linux.bin \
+  -dtb aspeed-bmc-ibm-rainier.dtb \
+  -initrd obmc-phosphor-initramfs.rootfs.cpio.xz \
+  -drive file=obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
+  -append "rootwait console=ttyS4,115200n8 root=PARTLABEL=rofs-a"
+
+The implementation appears as follows in the QEMU device tree:
+
+.. code-block:: console
+
+  (qemu) info qtree
+  bus: main-system-bus
+type System
+...
+dev: aspeed.apb2opb, id ""
+  gpio-out "sysbus-irq" 1
+  mmio 1e79b000/1000
+  bus: opb.1
+type opb
+dev: fsi.master, id ""
+  

Re: [PATCH v8 00/10] Introduce model for IBM's FSI

2024-01-10 Thread Ninad Palsule

Hello Cedric,



  include/hw/fsi/aspeed-apb2opb.h |  34 


aspeed-apb2opb is HW logic bridging the FSI world and Aspeed. It
doesn't belong to the FSI subsystem. Since we don't have a directory
for platform specific devices, I think the model should go under hw/misc/.


Moved it to hw/misc directory



  include/hw/fsi/cfam.h   |  45 +


scratchpad is the only lbus device and it is quite generic, we could
move it to lbus files. It would be nice to implement more than one
reg.

Moved scratchpad to lbus files.




  include/hw/fsi/fsi-master.h |  32 
  include/hw/fsi/fsi-slave.h  |  29 +++
  include/hw/fsi/fsi.h    |  24 +++


I would move the definitions and implementation of the fsi bus and
the fsi slave under the fsi.h and fsi.c files

Moved fsi-slave to fsi files.




  include/hw/fsi/lbus.h   |  40 
  include/hw/fsi/opb.h    |  25 +++


opb is quite minimal now and I think it could be hidden under
aspeed-apb2opb.

Moved opb to aspeed-apb2opb files.



  hw/fsi/Kconfig  |  21 +++


one CONFIG_FSI option and one CONFIG_FSI_APB2OPB should be enough.
CONFIG_FSI_APB2OPB should select FSI and depends on CONFIG_ASPEED_SOC.

Reduced number of configs as you suggested.

Thanks for the review.

Regards,

Ninad




[PATCH v10 9/9] hw/fsi: Update MAINTAINER list

2024-01-10 Thread Ninad Palsule
Added maintainer for IBM FSI model

Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 00ec1f7eca..79f97a3fb9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3569,6 +3569,14 @@ F: tests/qtest/adm1272-test.c
 F: tests/qtest/max34451-test.c
 F: tests/qtest/isl_pmbus_vr-test.c
 
+FSI
+M: Ninad Palsule 
+S: Maintained
+F: hw/fsi/*
+F: include/hw/fsi/*
+F: docs/specs/fsi.rst
+F: tests/qtest/fsi-test.c
+
 Firmware schema specifications
 M: Philippe Mathieu-Daudé 
 R: Daniel P. Berrange 
-- 
2.39.2




[PATCH v10 3/9] hw/fsi: Introduce IBM's cfam

2024-01-10 Thread Ninad Palsule
This is a part of patchset where IBM's Flexible Service Interface is
introduced.

The Common FRU Access Macro (CFAM), an address space containing
various "engines" that drive accesses on busses internal and external
to the POWER chip. Examples include the SBEFIFO and I2C masters. The
engines hang off of an internal Local Bus (LBUS) which is described
by the CFAM configuration block.

[ clg: - moved object FSIScratchPad under FSICFAMState
   - moved FSIScratchPad code under cfam.c
   - introduced fsi_cfam_instance_init()
   - reworked fsi_cfam_realize() ]

Signed-off-by: Andrew Jeffery 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
---
v9:
  - Added more registers to scratchpad
  - Removed unnecessary address space
  - Removed unnecessary header file
  - Defined macros for config values.
  - Cleaned up cfam config read.
---
 include/hw/fsi/cfam.h |  34 
 hw/fsi/cfam.c | 182 ++
 hw/fsi/meson.build|   2 +-
 hw/fsi/trace-events   |   5 ++
 4 files changed, 222 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/fsi/cfam.h
 create mode 100644 hw/fsi/cfam.c

diff --git a/include/hw/fsi/cfam.h b/include/hw/fsi/cfam.h
new file mode 100644
index 00..bba5e3323a
--- /dev/null
+++ b/include/hw/fsi/cfam.h
@@ -0,0 +1,34 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Common FRU Access Macro
+ */
+#ifndef FSI_CFAM_H
+#define FSI_CFAM_H
+
+#include "exec/memory.h"
+
+#include "hw/fsi/fsi.h"
+#include "hw/fsi/lbus.h"
+
+#define TYPE_FSI_CFAM "cfam"
+#define FSI_CFAM(obj) OBJECT_CHECK(FSICFAMState, (obj), TYPE_FSI_CFAM)
+
+/* P9-ism */
+#define CFAM_CONFIG_NR_REGS 0x28
+
+typedef struct FSICFAMState {
+    /* < private > */
+    FSISlaveState parent;
+
+    /* CFAM config address space */
+    MemoryRegion config_iomem;
+
+    MemoryRegion mr;
+
+    FSILBus lbus;
+    FSIScratchPad scratchpad;
+} FSICFAMState;
+
+#endif /* FSI_CFAM_H */
diff --git a/hw/fsi/cfam.c b/hw/fsi/cfam.c
new file mode 100644
index 00..d9ed1b532a
--- /dev/null
+++ b/hw/fsi/cfam.c
@@ -0,0 +1,182 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Common FRU Access Macro
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+
+#include "qapi/error.h"
+#include "trace.h"
+
+#include "hw/fsi/cfam.h"
+#include "hw/fsi/fsi.h"
+
+#include "hw/qdev-properties.h"
+
+#define ENGINE_CONFIG_NEXT            BE_BIT(0)
+#define ENGINE_CONFIG_TYPE_PEEK   (0x02 << 4)
+#define ENGINE_CONFIG_TYPE_FSI(0x03 << 4)
+#define ENGINE_CONFIG_TYPE_SCRATCHPAD (0x06 << 4)
+
+/* Valid, slots, version, type, crc */
+#define CFAM_CONFIG_REG_PEEK   (ENGINE_CONFIG_NEXT   | \
+0x0001   | \
+0x1000   | \
+ENGINE_CONFIG_TYPE_PEEK  | \
+0x000c)
+
+/* Valid, slots, version, type, crc */
+#define CFAM_CONFIG_REG_FSI_SLAVE  (ENGINE_CONFIG_NEXT   | \
+0x0001   | \
+0x5000   | \
+ENGINE_CONFIG_TYPE_FSI   | \
+0x000a)
+
+/* Valid, slots, version, type, crc */
+#define CFAM_CONFIG_REG_SCRATCHPAD (ENGINE_CONFIG_NEXT   | \
+0x0001   | \
+0x1000   | \
+ENGINE_CONFIG_TYPE_SCRATCHPAD | \
+0x0007)
+
+#define TO_REG(x)  ((x) >> 2)
+
+#define CFAM_CONFIG_CHIP_ID        TO_REG(0x00)
+#define CFAM_CONFIG_PEEK_STATUS    TO_REG(0x04)
+#define CFAM_CONFIG_CHIP_ID_P9 0xc0022d15
+#define CFAM_CONFIG_CHIP_ID_BREAK  0xc0de
+
+static uint64_t fsi_cfam_config_read(void *opaque, hwaddr addr, unsigned size)
+{
+    trace_fsi_cfam_config_read(addr, size);
+
+    switch (addr) {
+    case 0x00:
+        return CFAM_CONFIG_CHIP_ID_P9;
+    case 0x04:
+        return CFAM_CONFIG_REG_PEEK;
+    case 0x08:
+        return CFAM_CONFIG_REG_FSI_SLAVE;
+    case 0xc:
+        return CFAM_CONFIG_REG_SCRATCHPAD;
+    default:
+        /*
+         * The config table contains different engines from 0xc onwards.
+         * The scratch pad is already added at address 0xc. We need to add
+         * future engines from address 0x10 onwards. Returning 0 as engine
+         * is not implemented.
+         */
+        return 0;
+    }
+}
+
+static void fsi_cfam_config_write(void *opaque, hwaddr addr, uint64_t data,
+  unsigned size)
+{
+FSICFAMState 

[PATCH v10 1/9] hw/fsi: Introduce IBM's Local bus and scratchpad

2024-01-10 Thread Ninad Palsule
This is a part of patchset where IBM's Flexible Service Interface is
introduced.

The LBUS is modelled to maintain mapped memory for the devices. The
memory is mapped after CFAM config, peek table and FSI slave registers.

The scratchpad provides a set of non-functional registers. The firmware
is free to use them; the hardware does not attach any special function
to them. The scratchpad registers can be read or written from the LBUS
slave. The scratchpad is managed under the FSI CFAM state.

[ clg: - removed lbus_add_device() bc unused
   - removed lbus_create_device() bc used only once
   - removed "address" property
   - updated meson.build to build fsi dir
   - included an empty hw/fsi/trace-events ]

Signed-off-by: Andrew Jeffery 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
---
v9:
  - Changed LBUS memory region to 1MB.
---
 meson.build   |   1 +
 hw/fsi/trace.h|   1 +
 include/hw/fsi/lbus.h |  52 ++
 hw/fsi/lbus.c | 121 ++
 hw/Kconfig|   1 +
 hw/fsi/Kconfig|   2 +
 hw/fsi/meson.build|   1 +
 hw/fsi/trace-events   |   2 +
 hw/meson.build|   1 +
 9 files changed, 182 insertions(+)
 create mode 100644 hw/fsi/trace.h
 create mode 100644 include/hw/fsi/lbus.h
 create mode 100644 hw/fsi/lbus.c
 create mode 100644 hw/fsi/Kconfig
 create mode 100644 hw/fsi/meson.build
 create mode 100644 hw/fsi/trace-events

diff --git a/meson.build b/meson.build
index 371edafae6..498d08b866 100644
--- a/meson.build
+++ b/meson.build
@@ -3273,6 +3273,7 @@ if have_system
 'hw/char',
 'hw/display',
 'hw/dma',
+'hw/fsi',
 'hw/hyperv',
 'hw/i2c',
 'hw/i386',
diff --git a/hw/fsi/trace.h b/hw/fsi/trace.h
new file mode 100644
index 00..ee67c7fb04
--- /dev/null
+++ b/hw/fsi/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_fsi.h"
diff --git a/include/hw/fsi/lbus.h b/include/hw/fsi/lbus.h
new file mode 100644
index 00..8bacdded7f
--- /dev/null
+++ b/include/hw/fsi/lbus.h
@@ -0,0 +1,52 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2024 IBM Corp.
+ *
+ * IBM Local bus and connected device structures.
+ */
+#ifndef FSI_LBUS_H
+#define FSI_LBUS_H
+
+#include "hw/qdev-core.h"
+#include "qemu/units.h"
+#include "exec/memory.h"
+
+#define TYPE_FSI_LBUS_DEVICE "fsi.lbus.device"
+OBJECT_DECLARE_TYPE(FSILBusDevice, FSILBusDeviceClass, FSI_LBUS_DEVICE)
+
+#define FSI_LBUS_MEM_REGION_SIZE  (1 * MiB)
+#define FSI_LBUSDEV_IOMEM_START   0xc00 /* 3K used by CFAM config etc */
+
+typedef struct FSILBusDevice {
+    DeviceState parent;
+
+    MemoryRegion iomem;
+} FSILBusDevice;
+
+typedef struct FSILBusDeviceClass {
+    DeviceClass parent;
+
+    uint32_t config;
+} FSILBusDeviceClass;
+
+#define TYPE_FSI_LBUS "fsi.lbus"
+OBJECT_DECLARE_SIMPLE_TYPE(FSILBus, FSI_LBUS)
+
+typedef struct FSILBus {
+    BusState bus;
+
+    MemoryRegion mr;
+} FSILBus;
+
+#define TYPE_FSI_SCRATCHPAD "fsi.scratchpad"
+#define SCRATCHPAD(obj) OBJECT_CHECK(FSIScratchPad, (obj), TYPE_FSI_SCRATCHPAD)
+
+#define FSI_SCRATCHPAD_NR_REGS 4
+
+typedef struct FSIScratchPad {
+    FSILBusDevice parent;
+
+    uint32_t reg[FSI_SCRATCHPAD_NR_REGS];
+} FSIScratchPad;
+
+#endif /* FSI_LBUS_H */
diff --git a/hw/fsi/lbus.c b/hw/fsi/lbus.c
new file mode 100644
index 00..34c450cc68
--- /dev/null
+++ b/hw/fsi/lbus.c
@@ -0,0 +1,121 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2024 IBM Corp.
+ *
+ * IBM Local bus where FSI slaves are connected
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/fsi/lbus.h"
+
+#include "hw/qdev-properties.h"
+
+#include "trace.h"
+
+static void lbus_init(Object *o)
+{
+    FSILBus *lbus = FSI_LBUS(o);
+
+    memory_region_init(&lbus->mr, OBJECT(lbus), TYPE_FSI_LBUS,
+                       FSI_LBUS_MEM_REGION_SIZE - FSI_LBUSDEV_IOMEM_START);
+}
+
+static const TypeInfo lbus_info = {
+    .name = TYPE_FSI_LBUS,
+    .parent = TYPE_BUS,
+    .instance_init = lbus_init,
+    .instance_size = sizeof(FSILBus),
+};
+
+static void lbus_device_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->bus_type = TYPE_FSI_LBUS;
+}
+
+static const TypeInfo lbus_device_type_info = {
+    .name = TYPE_FSI_LBUS_DEVICE,
+    .parent = TYPE_DEVICE,
+    .instance_size = sizeof(FSILBusDevice),
+    .abstract = true,
+    .class_init = lbus_device_class_init,
+    .class_size = sizeof(FSILBusDeviceClass),
+};
+
+static uint64_t fsi_scratchpad_read(void *opaque, hwaddr addr, unsigned size)
+{
+    FSIScratchPad *s = SCRATCHPAD(opaque);
+
+    trace_fsi_scratchpad_read(addr, size);
+
+    if (addr & ~(FSI_SCRATCHPAD_NR_REGS - 1)) {
+        return 0;
+    }
+
+    return s->reg[addr];
+}
+
+static void fsi_scratchpad_write(void *opaque, hwaddr addr, uint64_t data,
+                                 unsigned size)
+{
+    FSIScratchPad *s = SCRATCHPAD(opaque);
+
+

[PATCH v10 5/9] hw/fsi: Aspeed APB2OPB interface, Onchip perif bus

2024-01-10 Thread Ninad Palsule
This is a part of patchset where IBM's Flexible Service Interface is
introduced.

An APB-to-OPB bridge enabling access to the OPB from the ARM core in
the AST2600. Hardware limitations prevent the OPB from being directly
mapped into APB, so all accesses are indirect through the bridge.

The On-Chip Peripheral Bus (OPB): A low-speed bus typically found in
POWER processors. This now makes an appearance in the ASPEED SoC due
to tight integration of the FSI master IP with the OPB, mainly the
existence of an MMIO-mapping of the CFAM address straight onto a
sub-region of the OPB address space.

[ clg: - moved FSIMasterState under AspeedAPB2OPBState
   - modified fsi_opb_fsi_master_address() and
 fsi_opb_opb2fsi_address()
   - introduced fsi_aspeed_apb2opb_init()
   - reworked fsi_aspeed_apb2opb_realize()
   - removed FSIMasterState object and fsi_opb_realize()
   - simplified OPBus ]

Signed-off-by: Andrew Jeffery 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
Reviewed-by: Joel Stanley 
---
v9:
  - Removed unused parameters from function.
  - Used qdev_realize() instead of qdev_realize_and_undef
  - Given a name to the opb memory region.

v10:
  - Combine Aspeed APB2OPB and on-chip pheripheral bus
---
 include/hw/misc/aspeed-apb2opb.h |  50 +
 hw/misc/aspeed-apb2opb.c | 338 +++
 hw/arm/Kconfig   |   1 +
 hw/misc/Kconfig  |   5 +
 hw/misc/meson.build  |   1 +
 hw/misc/trace-events |   4 +
 6 files changed, 399 insertions(+)
 create mode 100644 include/hw/misc/aspeed-apb2opb.h
 create mode 100644 hw/misc/aspeed-apb2opb.c

diff --git a/include/hw/misc/aspeed-apb2opb.h b/include/hw/misc/aspeed-apb2opb.h
new file mode 100644
index 00..fcd76631a9
--- /dev/null
+++ b/include/hw/misc/aspeed-apb2opb.h
@@ -0,0 +1,50 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2024 IBM Corp.
+ *
+ * ASPEED APB2OPB Bridge
+ * IBM On-Chip Peripheral Bus
+ */
+#ifndef FSI_ASPEED_APB2OPB_H
+#define FSI_ASPEED_APB2OPB_H
+
+#include "exec/memory.h"
+#include "hw/fsi/fsi-master.h"
+#include "hw/sysbus.h"
+
+#define TYPE_FSI_OPB "fsi.opb"
+
+#define TYPE_OP_BUS "opb"
+OBJECT_DECLARE_SIMPLE_TYPE(OPBus, OP_BUS)
+
+typedef struct OPBus {
+/*< private >*/
+BusState bus;
+
+/*< public >*/
+MemoryRegion mr;
+AddressSpace as;
+} OPBus;
+
+#define TYPE_ASPEED_APB2OPB "aspeed.apb2opb"
+OBJECT_DECLARE_SIMPLE_TYPE(AspeedAPB2OPBState, ASPEED_APB2OPB)
+
+#define ASPEED_APB2OPB_NR_REGS ((0xe8 >> 2) + 1)
+
+#define ASPEED_FSI_NUM 2
+
+typedef struct AspeedAPB2OPBState {
+/*< private >*/
+SysBusDevice parent_obj;
+
+/*< public >*/
+MemoryRegion iomem;
+
+uint32_t regs[ASPEED_APB2OPB_NR_REGS];
+qemu_irq irq;
+
+OPBus opb[ASPEED_FSI_NUM];
+FSIMasterState fsi[ASPEED_FSI_NUM];
+} AspeedAPB2OPBState;
+
+#endif /* FSI_ASPEED_APB2OPB_H */
diff --git a/hw/misc/aspeed-apb2opb.c b/hw/misc/aspeed-apb2opb.c
new file mode 100644
index 00..19545c780f
--- /dev/null
+++ b/hw/misc/aspeed-apb2opb.c
@@ -0,0 +1,338 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2024 IBM Corp.
+ *
+ * ASPEED APB-OPB FSI interface
+ * IBM On-chip Peripheral Bus
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qom/object.h"
+#include "qapi/error.h"
+#include "trace.h"
+
+#include "hw/misc/aspeed-apb2opb.h"
+#include "hw/qdev-core.h"
+
+#define TO_REG(x) (x >> 2)
+
+#define APB2OPB_VERSION        TO_REG(0x00)
+#define APB2OPB_TRIGGER        TO_REG(0x04)
+
+#define APB2OPB_CONTROL        TO_REG(0x08)
+#define   APB2OPB_CONTROL_OFF  BE_GENMASK(31, 13)
+
+#define APB2OPB_OPB2FSI        TO_REG(0x0c)
+#define   APB2OPB_OPB2FSI_OFF  BE_GENMASK(31, 22)
+
+#define APB2OPB_OPB0_SEL   TO_REG(0x10)
+#define APB2OPB_OPB1_SEL   TO_REG(0x28)
+#define   APB2OPB_OPB_SEL_EN   BIT(0)
+
+#define APB2OPB_OPB0_MODE  TO_REG(0x14)
+#define APB2OPB_OPB1_MODE  TO_REG(0x2c)
+#define   APB2OPB_OPB_MODE_RD  BIT(0)
+
+#define APB2OPB_OPB0_XFER  TO_REG(0x18)
+#define APB2OPB_OPB1_XFER  TO_REG(0x30)
+#define   APB2OPB_OPB_XFER_FULL    BIT(1)
+#define   APB2OPB_OPB_XFER_HALF    BIT(0)
+
+#define APB2OPB_OPB0_ADDR  TO_REG(0x1c)
+#define APB2OPB_OPB0_WRITE_DATA    TO_REG(0x20)
+
+#define APB2OPB_OPB1_ADDR  TO_REG(0x34)
+#define APB2OPB_OPB1_WRITE_DATA  TO_REG(0x38)
+
+#define APB2OPB_IRQ_STS        TO_REG(0x48)
+#define   APB2OPB_IRQ_STS_OPB1_TX_ACK  BIT(17)
+#define   APB2OPB_IRQ_STS_OPB0_TX_ACK  BIT(16)
+
+#define APB2OPB_OPB0_WRITE_WORD_ENDIAN TO_REG(0x4c)
+#define   APB2OPB_OPB0_WRITE_WORD_ENDIAN_BE 0x0011101b
+#define APB2OPB_OPB0_WRITE_BYTE_ENDIAN 

[PATCH v10 7/9] hw/fsi: Added qtest

2024-01-10 Thread Ninad Palsule
Added basic qtests for FSI model.

Acked-by: Thomas Huth 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
---
 tests/qtest/aspeed-fsi-test.c | 205 ++
 tests/qtest/meson.build   |   1 +
 2 files changed, 206 insertions(+)
 create mode 100644 tests/qtest/aspeed-fsi-test.c

diff --git a/tests/qtest/aspeed-fsi-test.c b/tests/qtest/aspeed-fsi-test.c
new file mode 100644
index 00..b3020dd821
--- /dev/null
+++ b/tests/qtest/aspeed-fsi-test.c
@@ -0,0 +1,205 @@
+/*
+ * QTest testcases for IBM's Flexible Service Interface (FSI)
+ *
+ * Copyright (c) 2023 IBM Corporation
+ *
+ * Authors:
+ *   Ninad Palsule 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include "qemu/module.h"
+#include "libqtest-single.h"
+
+/* Registers from ast2600 specifications */
+#define ASPEED_FSI_ENGINER_TRIGGER   0x04
+#define ASPEED_FSI_OPB0_BUS_SELECT   0x10
+#define ASPEED_FSI_OPB1_BUS_SELECT   0x28
+#define ASPEED_FSI_OPB0_RW_DIRECTION 0x14
+#define ASPEED_FSI_OPB1_RW_DIRECTION 0x2c
+#define ASPEED_FSI_OPB0_XFER_SIZE0x18
+#define ASPEED_FSI_OPB1_XFER_SIZE0x30
+#define ASPEED_FSI_OPB0_BUS_ADDR 0x1c
+#define ASPEED_FSI_OPB1_BUS_ADDR 0x34
+#define ASPEED_FSI_INTRRUPT_CLEAR0x40
+#define ASPEED_FSI_INTRRUPT_STATUS   0x48
+#define ASPEED_FSI_OPB0_BUS_STATUS   0x80
+#define ASPEED_FSI_OPB1_BUS_STATUS   0x8c
+#define ASPEED_FSI_OPB0_READ_DATA0x84
+#define ASPEED_FSI_OPB1_READ_DATA0x90
+
+/*
+ * FSI Base addresses from the ast2600 specifications.
+ */
+#define AST2600_OPB_FSI0_BASE_ADDR 0x1e79b000
+#define AST2600_OPB_FSI1_BASE_ADDR 0x1e79b100
+
+static uint32_t aspeed_fsi_base_addr;
+
+static uint32_t aspeed_fsi_readl(QTestState *s, uint32_t reg)
+{
+    return qtest_readl(s, aspeed_fsi_base_addr + reg);
+}
+
+static void aspeed_fsi_writel(QTestState *s, uint32_t reg, uint32_t val)
+{
+    qtest_writel(s, aspeed_fsi_base_addr + reg, val);
+}
+
+/* Setup base address and select register */
+static void test_fsi_setup(QTestState *s, uint32_t base_addr)
+{
+    uint32_t curval;
+
+    aspeed_fsi_base_addr = base_addr;
+
+    /* Set the base select register */
+    if (base_addr == AST2600_OPB_FSI0_BASE_ADDR) {
+        /* Unselect FSI1 */
+        aspeed_fsi_writel(s, ASPEED_FSI_OPB1_BUS_SELECT, 0x0);
+        curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB1_BUS_SELECT);
+        g_assert_cmpuint(curval, ==, 0x0);
+
+        /* Select FSI0 */
+        aspeed_fsi_writel(s, ASPEED_FSI_OPB0_BUS_SELECT, 0x1);
+        curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB0_BUS_SELECT);
+        g_assert_cmpuint(curval, ==, 0x1);
+    } else if (base_addr == AST2600_OPB_FSI1_BASE_ADDR) {
+        /* Unselect FSI0 */
+        aspeed_fsi_writel(s, ASPEED_FSI_OPB0_BUS_SELECT, 0x0);
+        curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB0_BUS_SELECT);
+        g_assert_cmpuint(curval, ==, 0x0);
+
+        /* Select FSI1 */
+        aspeed_fsi_writel(s, ASPEED_FSI_OPB1_BUS_SELECT, 0x1);
+        curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB1_BUS_SELECT);
+        g_assert_cmpuint(curval, ==, 0x1);
+    } else {
+        g_assert_not_reached();
+    }
+}
+
+static void test_fsi_reg_change(QTestState *s, uint32_t reg, uint32_t newval)
+{
+uint32_t base;
+uint32_t curval;
+
+base = aspeed_fsi_readl(s, reg);
+aspeed_fsi_writel(s, reg, newval);
+curval = aspeed_fsi_readl(s, reg);
+g_assert_cmpuint(curval, ==, newval);
+aspeed_fsi_writel(s, reg, base);
+curval = aspeed_fsi_readl(s, reg);
+g_assert_cmpuint(curval, ==, base);
+}
+
+static void test_fsi0_master_regs(const void *data)
+{
+QTestState *s = (QTestState *)data;
+
+test_fsi_setup(s, AST2600_OPB_FSI0_BASE_ADDR);
+
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_RW_DIRECTION, 0xF3F4F514);
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_XFER_SIZE, 0xF3F4F518);
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_BUS_ADDR, 0xF3F4F51c);
+test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_CLEAR, 0xF3F4F540);
+test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_STATUS, 0xF3F4F548);
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_BUS_STATUS, 0xF3F4F580);
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_READ_DATA, 0xF3F4F584);
+}
+
+static void test_fsi1_master_regs(const void *data)
+{
+QTestState *s = (QTestState *)data;
+
+test_fsi_setup(s, AST2600_OPB_FSI1_BASE_ADDR);
+
+test_fsi_reg_change(s, ASPEED_FSI_OPB1_RW_DIRECTION, 0xF3F4F514);
+test_fsi_reg_change(s, ASPEED_FSI_OPB1_XFER_SIZE, 0xF3F4F518);
+test_fsi_reg_change(s, ASPEED_FSI_OPB1_BUS_ADDR, 0xF3F4F51c);
+test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_CLEAR, 0xF3F4F540);
+test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_STATUS, 0xF3F4F548);
+test_fsi_reg_change(s, ASPEED_FSI_OPB1_BUS_STATUS, 0xF3F4F580);
+test_fsi_reg_change(s, ASPEED_FSI_OPB1_READ_DATA, 0xF3F4F584);
+}
+
+static void test_fsi0_getcfam_addr0(const void 

[PATCH v10 4/9] hw/fsi: Introduce IBM's FSI master

2024-01-10 Thread Ninad Palsule
This is part of the patch set introducing IBM's Flexible Service
Interface (FSI).

This commit models the FSI master, a bus controller. The CFAM hangs off
of the FSI master.

The FSI master: A controller in the platform service processor (e.g.
BMC) driving CFAM engine accesses into the POWER chip. At the
hardware level FSI is a bit-based protocol supporting synchronous and
DMA-driven accesses of engines in a CFAM.

[ clg: - move FSICFAMState object under FSIMasterState
   - introduced fsi_master_init()
   - reworked fsi_master_realize()
   - dropped FSIBus definition ]

Signed-off-by: Andrew Jeffery 
Reviewed-by: Joel Stanley 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
---
v9:
  - Initialized registers.
  - Fixed the address check.
---
 include/hw/fsi/fsi-master.h |  32 +++
 hw/fsi/fsi-master.c | 173 
 hw/fsi/meson.build  |   2 +-
 hw/fsi/trace-events |   2 +
 4 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/fsi/fsi-master.h
 create mode 100644 hw/fsi/fsi-master.c

diff --git a/include/hw/fsi/fsi-master.h b/include/hw/fsi/fsi-master.h
new file mode 100644
index 00..3830869877
--- /dev/null
+++ b/include/hw/fsi/fsi-master.h
@@ -0,0 +1,32 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2019 IBM Corp.
+ *
+ * IBM Flexible Service Interface Master
+ */
+#ifndef FSI_FSI_MASTER_H
+#define FSI_FSI_MASTER_H
+
+#include "exec/memory.h"
+#include "hw/qdev-core.h"
+#include "hw/fsi/fsi.h"
+#include "hw/fsi/cfam.h"
+
+#define TYPE_FSI_MASTER "fsi.master"
+OBJECT_DECLARE_SIMPLE_TYPE(FSIMasterState, FSI_MASTER)
+
+#define FSI_MASTER_NR_REGS ((0x2e0 >> 2) + 1)
+
+typedef struct FSIMasterState {
+DeviceState parent;
+MemoryRegion iomem;
+MemoryRegion opb2fsi;
+
+FSIBus bus;
+
+uint32_t regs[FSI_MASTER_NR_REGS];
+FSICFAMState cfam;
+} FSIMasterState;
+
+
+#endif /* FSI_FSI_MASTER_H */
diff --git a/hw/fsi/fsi-master.c b/hw/fsi/fsi-master.c
new file mode 100644
index 00..939de5927f
--- /dev/null
+++ b/hw/fsi/fsi-master.c
@@ -0,0 +1,173 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Flexible Service Interface master
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "trace.h"
+
+#include "hw/fsi/fsi-master.h"
+
+#define TYPE_OP_BUS "opb"
+
+#define TO_REG(x)   ((x) >> 2)
+
+#define FSI_MENP0   TO_REG(0x010)
+#define FSI_MENP32  TO_REG(0x014)
+#define FSI_MSENP0  TO_REG(0x018)
+#define FSI_MLEVP0  TO_REG(0x018)
+#define FSI_MSENP32 TO_REG(0x01c)
+#define FSI_MLEVP32 TO_REG(0x01c)
+#define FSI_MCENP0  TO_REG(0x020)
+#define FSI_MREFP0  TO_REG(0x020)
+#define FSI_MCENP32 TO_REG(0x024)
+#define FSI_MREFP32 TO_REG(0x024)
+
+#define FSI_MVERTO_REG(0x074)
+#define FSI_MRESP0  TO_REG(0x0d0)
+
+#define FSI_MRESB0  TO_REG(0x1d0)
+#define   FSI_MRESB0_RESET_GENERAL  BE_BIT(0)
+#define   FSI_MRESB0_RESET_ERRORBE_BIT(1)
+
+static uint64_t fsi_master_read(void *opaque, hwaddr addr, unsigned size)
+{
+FSIMasterState *s = FSI_MASTER(opaque);
+int reg = TO_REG(addr);
+
+trace_fsi_master_read(addr, size);
+
+if (reg >= FSI_MASTER_NR_REGS) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out of bounds read: 0x%"HWADDR_PRIx" for %u\n",
+  __func__, addr, size);
+return 0;
+}
+
+return s->regs[reg];
+}
+
+static void fsi_master_write(void *opaque, hwaddr addr, uint64_t data,
+ unsigned size)
+{
+FSIMasterState *s = FSI_MASTER(opaque);
+int reg = TO_REG(addr);
+
+trace_fsi_master_write(addr, size, data);
+
+if (reg >= FSI_MASTER_NR_REGS) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out of bounds write: 0x%"HWADDR_PRIx" for %u\n",
+  __func__, addr, size);
+return;
+}
+
+switch (reg) {
+case FSI_MENP0:
+s->regs[FSI_MENP0] = data;
+break;
+case FSI_MENP32:
+s->regs[FSI_MENP32] = data;
+break;
+case FSI_MSENP0:
+s->regs[FSI_MENP0] |= data;
+break;
+case FSI_MSENP32:
+s->regs[FSI_MENP32] |= data;
+break;
+case FSI_MCENP0:
+s->regs[FSI_MENP0] &= ~data;
+break;
+case FSI_MCENP32:
+s->regs[FSI_MENP32] &= ~data;
+break;
+case FSI_MRESP0:
+/* Perform the necessary resets; leave the register at 0 to indicate no errors */
+break;
+case FSI_MRESB0:
+if (data 

[PATCH v10 0/9] Introduce model for IBM's FSI

2024-01-10 Thread Ninad Palsule
Hello,

Please review the patch-set version 10.
I have incorporated review comments from Cedric.
v10:
  - Moved aspeed-apb2opb to hw/misc directory
  - Moved scratchpad to lbus files.
  - Moved fsi-slave to fsi files.
  - Merged opb changes in the aspeed-apb2opb files
  - Reduced number of config option to 2

Ninad Palsule (9):
  hw/fsi: Introduce IBM's Local bus and scratchpad
  hw/fsi: Introduce IBM's FSI Bus and FSI slave
  hw/fsi: Introduce IBM's cfam
  hw/fsi: Introduce IBM's FSI master
  hw/fsi: Aspeed APB2OPB interface, Onchip perif bus
  hw/arm: Hook up FSI module in AST2600
  hw/fsi: Added qtest
  hw/fsi: Added FSI documentation
  hw/fsi: Update MAINTAINER list

 MAINTAINERS  |   8 +
 docs/specs/fsi.rst   | 138 +
 docs/specs/index.rst |   1 +
 meson.build  |   1 +
 hw/fsi/trace.h   |   1 +
 include/hw/arm/aspeed_soc.h  |   4 +
 include/hw/fsi/cfam.h|  34 
 include/hw/fsi/fsi-master.h  |  32 +++
 include/hw/fsi/fsi.h |  38 
 include/hw/fsi/lbus.h|  52 +
 include/hw/misc/aspeed-apb2opb.h |  50 +
 hw/arm/aspeed_ast2600.c  |  19 ++
 hw/fsi/cfam.c| 182 +
 hw/fsi/fsi-master.c  | 173 
 hw/fsi/fsi.c | 111 ++
 hw/fsi/lbus.c| 121 +++
 hw/misc/aspeed-apb2opb.c | 338 +++
 tests/qtest/aspeed-fsi-test.c| 205 +++
 hw/Kconfig   |   1 +
 hw/arm/Kconfig   |   1 +
 hw/fsi/Kconfig   |   2 +
 hw/fsi/meson.build   |   1 +
 hw/fsi/trace-events  |  11 +
 hw/meson.build   |   1 +
 hw/misc/Kconfig  |   5 +
 hw/misc/meson.build  |   1 +
 hw/misc/trace-events |   4 +
 tests/qtest/meson.build  |   1 +
 28 files changed, 1536 insertions(+)
 create mode 100644 docs/specs/fsi.rst
 create mode 100644 hw/fsi/trace.h
 create mode 100644 include/hw/fsi/cfam.h
 create mode 100644 include/hw/fsi/fsi-master.h
 create mode 100644 include/hw/fsi/fsi.h
 create mode 100644 include/hw/fsi/lbus.h
 create mode 100644 include/hw/misc/aspeed-apb2opb.h
 create mode 100644 hw/fsi/cfam.c
 create mode 100644 hw/fsi/fsi-master.c
 create mode 100644 hw/fsi/fsi.c
 create mode 100644 hw/fsi/lbus.c
 create mode 100644 hw/misc/aspeed-apb2opb.c
 create mode 100644 tests/qtest/aspeed-fsi-test.c
 create mode 100644 hw/fsi/Kconfig
 create mode 100644 hw/fsi/meson.build
 create mode 100644 hw/fsi/trace-events

-- 
2.39.2




[PATCH v10 2/9] hw/fsi: Introduce IBM's FSI Bus and FSI slave

2024-01-10 Thread Ninad Palsule
This is part of the patch set introducing the FSI bus.

The FSI bus is a simple bus to which the FSI master is attached.

The FSI slave: The slave is the terminal point of the FSI bus for
FSI symbols addressed to it. Slaves can be cascaded off of one
another. The slave's configuration registers appear in address space
of the CFAM to which it is attached.

[ clg: - removed include/hw/fsi/engine-scratchpad.h and
 hw/fsi/engine-scratchpad.c
   - dropped FSI_SCRATCHPAD
   - included FSIBus definition
   - dropped hw/fsi/trace-events changes ]

Signed-off-by: Andrew Jeffery 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
---
 include/hw/fsi/fsi.h |  38 +++
 hw/fsi/fsi.c | 111 +++
 hw/fsi/meson.build   |   2 +-
 hw/fsi/trace-events  |   2 +
 4 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/fsi/fsi.h
 create mode 100644 hw/fsi/fsi.c

diff --git a/include/hw/fsi/fsi.h b/include/hw/fsi/fsi.h
new file mode 100644
index 00..6e11747dd5
--- /dev/null
+++ b/include/hw/fsi/fsi.h
@@ -0,0 +1,38 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2024 IBM Corp.
+ *
+ * IBM Flexible Service Interface
+ */
+#ifndef FSI_FSI_H
+#define FSI_FSI_H
+
+#include "exec/memory.h"
+#include "hw/qdev-core.h"
+#include "hw/fsi/lbus.h"
+#include "qemu/bitops.h"
+
+/* Bitwise operations at the word level. */
+#define BE_BIT(x)   BIT(31 - (x))
+#define BE_GENMASK(hb, lb)  MAKE_64BIT_MASK((lb), ((hb) - (lb) + 1))
+
+#define TYPE_FSI_BUS "fsi.bus"
+OBJECT_DECLARE_SIMPLE_TYPE(FSIBus, FSI_BUS)
+
+typedef struct FSIBus {
+BusState bus;
+} FSIBus;
+
+#define TYPE_FSI_SLAVE "fsi.slave"
+OBJECT_DECLARE_SIMPLE_TYPE(FSISlaveState, FSI_SLAVE)
+
+#define FSI_SLAVE_CONTROL_NR_REGS ((0x40 >> 2) + 1)
+
+typedef struct FSISlaveState {
+DeviceState parent;
+
+MemoryRegion iomem;
+uint32_t regs[FSI_SLAVE_CONTROL_NR_REGS];
+} FSISlaveState;
+
+#endif /* FSI_FSI_H */
diff --git a/hw/fsi/fsi.c b/hw/fsi/fsi.c
new file mode 100644
index 00..0c73ca14ad
--- /dev/null
+++ b/hw/fsi/fsi.c
@@ -0,0 +1,111 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2024 IBM Corp.
+ *
+ * IBM Flexible Service Interface
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "trace.h"
+
+#include "hw/fsi/fsi.h"
+
+static const TypeInfo fsi_bus_info = {
+.name = TYPE_FSI_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(FSIBus),
+};
+
+static void fsi_bus_register_types(void)
+{
+type_register_static(&fsi_bus_info);
+}
+
+type_init(fsi_bus_register_types);
+
+#define TO_REG(x)   ((x) >> 2)
+
+static uint64_t fsi_slave_read(void *opaque, hwaddr addr, unsigned size)
+{
+FSISlaveState *s = FSI_SLAVE(opaque);
+int reg = TO_REG(addr);
+
+trace_fsi_slave_read(addr, size);
+
+if (reg >= FSI_SLAVE_CONTROL_NR_REGS) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out of bounds read: 0x%"HWADDR_PRIx" for %u\n",
+  __func__, addr, size);
+return 0;
+}
+
+return s->regs[reg];
+}
+
+static void fsi_slave_write(void *opaque, hwaddr addr, uint64_t data,
+ unsigned size)
+{
+FSISlaveState *s = FSI_SLAVE(opaque);
+int reg = TO_REG(addr);
+
+trace_fsi_slave_write(addr, size, data);
+
+if (reg >= FSI_SLAVE_CONTROL_NR_REGS) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out of bounds write: 0x%"HWADDR_PRIx" for %u\n",
+  __func__, addr, size);
+return;
+}
+
+s->regs[reg] = data;
+}
+
+static const struct MemoryRegionOps fsi_slave_ops = {
+.read = fsi_slave_read,
+.write = fsi_slave_write,
+.endianness = DEVICE_BIG_ENDIAN,
+};
+
+static void fsi_slave_reset(DeviceState *dev)
+{
+FSISlaveState *s = FSI_SLAVE(dev);
+int i;
+
+/* Initialize registers */
+for (i = 0; i < FSI_SLAVE_CONTROL_NR_REGS; i++) {
+s->regs[i] = 0;
+}
+}
+
+static void fsi_slave_init(Object *o)
+{
+FSISlaveState *s = FSI_SLAVE(o);
+
+memory_region_init_io(&s->iomem, OBJECT(s), &fsi_slave_ops,
+  s, TYPE_FSI_SLAVE, 0x400);
+}
+
+static void fsi_slave_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->bus_type = TYPE_FSI_BUS;
+dc->desc = "FSI Slave";
+dc->reset = fsi_slave_reset;
+}
+
+static const TypeInfo fsi_slave_info = {
+.name = TYPE_FSI_SLAVE,
+.parent = TYPE_DEVICE,
+.instance_init = fsi_slave_init,
+.instance_size = sizeof(FSISlaveState),
+.class_init = fsi_slave_class_init,
+};
+
+static void fsi_slave_register_types(void)
+{
+type_register_static(&fsi_slave_info);
+}
+
+type_init(fsi_slave_register_types);
diff --git a/hw/fsi/meson.build b/hw/fsi/meson.build
index 93ba19dd04..574f5f9289 100644
--- a/hw/fsi/meson.build
+++ 

[PATCH v10 6/9] hw/arm: Hook up FSI module in AST2600

2024-01-10 Thread Ninad Palsule
This patch set introduces IBM's Flexible Service Interface (FSI).

Time for some fun with inter-processor buses. FSI allows a service
processor access to the internal buses of a host POWER processor to
perform configuration or debugging.

FSI has long existed in POWER processors and so comes with some baggage,
including how it has been integrated into the ASPEED SoC.

Working backwards from the POWER processor, the fundamental pieces of
interest for the implementation are:

1. The Common FRU Access Macro (CFAM), an address space containing
   various "engines" that drive accesses on buses internal and external
   to the POWER chip. Examples include the SBEFIFO and I2C masters. The
   engines hang off of an internal Local Bus (LBUS) which is described
   by the CFAM configuration block.

2. The FSI slave: The slave is the terminal point of the FSI bus for
   FSI symbols addressed to it. Slaves can be cascaded off of one
   another. The slave's configuration registers appear in address space
   of the CFAM to which it is attached.

3. The FSI master: A controller in the platform service processor (e.g.
   BMC) driving CFAM engine accesses into the POWER chip. At the
   hardware level FSI is a bit-based protocol supporting synchronous and
   DMA-driven accesses of engines in a CFAM.

4. The On-Chip Peripheral Bus (OPB): A low-speed bus typically found in
   POWER processors. This now makes an appearance in the ASPEED SoC due
   to tight integration of the FSI master IP with the OPB, mainly the
   existence of an MMIO-mapping of the CFAM address straight onto a
   sub-region of the OPB address space.

5. An APB-to-OPB bridge enabling access to the OPB from the ARM core in
   the AST2600. Hardware limitations prevent the OPB from being directly
   mapped into APB, so all accesses are indirect through the bridge.

The implementation appears as following in the qemu device tree:

(qemu) info qtree
bus: main-system-bus
  type System
  ...
  dev: aspeed.apb2opb, id ""
gpio-out "sysbus-irq" 1
mmio 1e79b000/1000
bus: opb.1
  type opb
  dev: fsi.master, id ""
bus: fsi.bus.1
  type fsi.bus
  dev: cfam.config, id ""
  dev: cfam, id ""
bus: fsi.lbus.1
  type lbus
  dev: scratchpad, id ""
address = 0 (0x0)
bus: opb.0
  type opb
  dev: fsi.master, id ""
bus: fsi.bus.0
  type fsi.bus
  dev: cfam.config, id ""
  dev: cfam, id ""
bus: fsi.lbus.0
  type lbus
  dev: scratchpad, id ""
address = 0 (0x0)

The LBUS is modelled to maintain the qdev bus hierarchy and to take
advantage of the object model to automatically generate the CFAM
configuration block. The configuration block presents engines in the
order they are attached to the CFAM's LBUS. Engine implementations
should subclass the LBusDevice and set the 'config' member of
LBusDeviceClass to match the engine's type.

CFAM designs offer a lot of flexibility, for instance it is possible for
a CFAM to be simultaneously driven from multiple FSI links. The modeling
is not so complete; it's assumed that each CFAM is attached to a single
FSI slave (as a consequence the CFAM subclasses the FSI slave).

As for FSI, its symbols and wire-protocol are not modelled at all. This
is not necessary to get FSI off the ground thanks to the mapping of the
CFAM address space onto the OPB address space - the models follow this
directly and map the CFAM memory region into the OPB's memory region.
Future work includes supporting more advanced accesses that drive the
FSI master directly rather than indirectly via the CFAM mapping, which
will require implementing the FSI state machine and methods for each of
the FSI symbols on the slave. Further down the track we can also look at
supporting the bitbanged SoftFSI drivers in Linux by extending the FSI
slave model to resolve sequences of GPIO IRQs into FSI symbols, and
calling the associated symbol method on the slave to map the access onto
the CFAM.

Testing:
Tested by reading cfam config address 0 on rainier machine type.

root@p10bmc:~# pdbg -a getcfam 0x0
p0: 0x0 = 0xc0022d15

Signed-off-by: Andrew Jeffery 
Signed-off-by: Cédric Le Goater 
Signed-off-by: Ninad Palsule 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Cédric Le Goater 
---
 include/hw/arm/aspeed_soc.h |  4 
 hw/arm/aspeed_ast2600.c | 19 +++
 2 files changed, 23 insertions(+)

diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index cb832bc1ee..9bc5c7a5ad 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -36,6 +36,7 @@
 #include "hw/misc/aspeed_lpc.h"
 #include "hw/misc/unimp.h"
 #include "hw/misc/aspeed_peci.h"
+#include "hw/misc/aspeed-apb2opb.h"
 #include 

Re: [RFC/PATCH v1 07/11] gunyah: Specify device-tree location

2024-01-10 Thread Alex Bennée
Srivatsa Vaddagiri  writes:

> * Philippe Mathieu-Daudé  [2024-01-09 14:31:03]:
>
>> Hi Srivatsa,
>> 
>> On 9/1/24 10:00, Srivatsa Vaddagiri wrote:
>> > Specify the location of device-tree and its size, as Gunyah requires the
>> > device-tree to be parsed before VM can begin its execution.
>> > 
>> > Signed-off-by: Srivatsa Vaddagiri 
>> > ---
>> >   MAINTAINERS   |  1 +
>> >   accel/stubs/gunyah-stub.c |  5 +
>> >   hw/arm/virt.c |  6 ++
>> >   include/sysemu/gunyah.h   |  2 ++
>> >   target/arm/gunyah.c   | 45 +++
>> >   target/arm/meson.build|  3 +++
>> >   6 files changed, 62 insertions(+)
>> >   create mode 100644 target/arm/gunyah.c
>> 
>> (Please enable scripts/git.orderfile)
>
> Sure will do so from the next version!
>
>> 
>> > diff --git a/include/sysemu/gunyah.h b/include/sysemu/gunyah.h
>> > index 4f26938521..a73d17bfb9 100644
>> > --- a/include/sysemu/gunyah.h
>> > +++ b/include/sysemu/gunyah.h
>> > @@ -27,4 +27,6 @@ typedef struct GUNYAHState GUNYAHState;
>> >   DECLARE_INSTANCE_CHECKER(GUNYAHState, GUNYAH_STATE,
>> >TYPE_GUNYAH_ACCEL)
>> > +int gunyah_arm_set_dtb(__u64 dtb_start, __u64 dtb_size);
>> I'm getting:
>> 
>> In file included from hw/intc/arm_gicv3_common.c:35:
>> include/sysemu/gunyah.h:30:24: error: unknown type name '__u64'
>> int gunyah_arm_set_dtb(__u64 dtb_start, __u64 dtb_size);
>>^
>> include/sysemu/gunyah.h:30:41: error: unknown type name '__u64'
>> int gunyah_arm_set_dtb(__u64 dtb_start, __u64 dtb_size);
>> ^
>> 2 errors generated.
>
> Hmm I don't get that error when compiling on Linux. I think uint64_t will work
> better for all platforms where Qemu can get compiled?

Yes, aside from imported headers we state:

  In the event that you require a specific width, use a standard type
  like int32_t, uint32_t, uint64_t, etc.  The specific types are
  mandatory for VMState fields.

  Don't use Linux kernel internal types like u32, __u32 or __le32.

in style.rst

>
> - vatsa

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH v3 23/38] tcg/i386: Move tcg_cond_to_jcc[] into tcg_out_cmp

2024-01-10 Thread Richard Henderson
Return the x86 condition codes to use after the compare.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 2d6100a8f4..02718a02d8 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1449,8 +1449,8 @@ static void tcg_out_jxx(TCGContext *s, int opc, TCGLabel *l, bool small)
 }
 }
 
-static void tcg_out_cmp(TCGContext *s, TCGArg arg1, TCGArg arg2,
-int const_arg2, int rexw)
+static int tcg_out_cmp(TCGContext *s, TCGCond cond, TCGArg arg1,
+   TCGArg arg2, int const_arg2, int rexw)
 {
 if (const_arg2) {
 if (arg2 == 0) {
@@ -1462,14 +1462,15 @@ static void tcg_out_cmp(TCGContext *s, TCGArg arg1, TCGArg arg2,
 } else {
 tgen_arithr(s, ARITH_CMP + rexw, arg1, arg2);
 }
+return tcg_cond_to_jcc[cond];
 }
 
 static void tcg_out_brcond(TCGContext *s, int rexw, TCGCond cond,
TCGArg arg1, TCGArg arg2, int const_arg2,
TCGLabel *label, bool small)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
-tcg_out_jxx(s, tcg_cond_to_jcc[cond], label, small);
+int jcc = tcg_out_cmp(s, cond, arg1, arg2, const_arg2, rexw);
+tcg_out_jxx(s, jcc, label, small);
 }
 
 #if TCG_TARGET_REG_BITS == 32
@@ -1561,6 +1562,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 {
 bool inv = false;
 bool cleared;
+int jcc;
 
 switch (cond) {
 case TCG_COND_NE:
@@ -1597,7 +1599,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
  * We can then use NEG or INC to produce the desired result.
  * This is always smaller than the SETCC expansion.
  */
-tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
+tcg_out_cmp(s, TCG_COND_LTU, arg1, arg2, const_arg2, rexw);
 
 /* X - X - C = -C = (C ? -1 : 0) */
 tgen_arithr(s, ARITH_SBB + (neg ? rexw : 0), dest, dest);
@@ -1644,8 +1646,8 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 cleared = true;
 }
 
-tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
-tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
+jcc = tcg_out_cmp(s, cond, arg1, arg2, const_arg2, rexw);
+tcg_out_modrm(s, OPC_SETCC | jcc, 0, dest);
 
 if (!cleared) {
 tcg_out_ext8u(s, dest, dest);
@@ -1716,8 +1718,8 @@ static void tcg_out_movcond(TCGContext *s, int rexw, TCGCond cond,
 TCGReg dest, TCGReg c1, TCGArg c2, int const_c2,
 TCGReg v1)
 {
-tcg_out_cmp(s, c1, c2, const_c2, rexw);
-tcg_out_cmov(s, tcg_cond_to_jcc[cond], rexw, dest, v1);
+int jcc = tcg_out_cmp(s, cond, c1, c2, const_c2, rexw);
+tcg_out_cmov(s, jcc, rexw, dest, v1);
 }
 
 static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
@@ -1759,8 +1761,8 @@ static void tcg_out_clz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
 tgen_arithi(s, ARITH_XOR + rexw, dest, rexw ? 63 : 31, 0);
 
 /* Since we have destroyed the flags from BSR, we have to re-test.  */
-tcg_out_cmp(s, arg1, 0, 1, rexw);
-tcg_out_cmov(s, JCC_JE, rexw, dest, arg2);
+int jcc = tcg_out_cmp(s, TCG_COND_EQ, arg1, 0, 1, rexw);
+tcg_out_cmov(s, jcc, rexw, dest, arg2);
 }
 }
 
-- 
2.34.1




[PATCH v3 20/38] tcg/arm: Factor tcg_out_cmp() out

2024-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
Message-Id: <20231028194522.245170-12-richard.hender...@linaro.org>
[PMD: Split from bigger patch, part 1/2]
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20231108145244.72421-1-phi...@linaro.org>
---
 tcg/arm/tcg-target.c.inc | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 0c29a3929b..66d71af8bf 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1191,6 +1191,13 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 }
 }
 
+static TCGCond tcg_out_cmp(TCGContext *s, TCGCond cond, TCGReg a,
+   TCGArg b, int b_const)
+{
+tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0, a, b, b_const);
+return cond;
+}
+
 static TCGCond tcg_out_cmp2(TCGContext *s, const TCGArg *args,
 const int *const_args)
 {
@@ -1806,9 +1813,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 /* Constraints mean that v2 is always in the same register as dest,
  * so we only need to do "if condition passed, move v1 to dest".
  */
-tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
-args[1], args[2], const_args[2]);
-tcg_out_dat_rIK(s, tcg_cond_to_arm_cond[args[5]], ARITH_MOV,
+c = tcg_out_cmp(s, args[5], args[1], args[2], const_args[2]);
+tcg_out_dat_rIK(s, tcg_cond_to_arm_cond[c], ARITH_MOV,
 ARITH_MVN, args[0], 0, args[3], const_args[3]);
 break;
 case INDEX_op_add_i32:
@@ -1958,25 +1964,21 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_brcond_i32:
-tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
-   args[0], args[1], const_args[1]);
-tcg_out_goto_label(s, tcg_cond_to_arm_cond[args[2]],
-   arg_label(args[3]));
+c = tcg_out_cmp(s, args[2], args[0], args[1], const_args[1]);
+tcg_out_goto_label(s, tcg_cond_to_arm_cond[c], arg_label(args[3]));
 break;
 case INDEX_op_setcond_i32:
-tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
-args[1], args[2], const_args[2]);
-tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[3]],
+c = tcg_out_cmp(s, args[3], args[1], args[2], const_args[2]);
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[c],
 ARITH_MOV, args[0], 0, 1);
-tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(c)],
 ARITH_MOV, args[0], 0, 0);
 break;
 case INDEX_op_negsetcond_i32:
-tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
-args[1], args[2], const_args[2]);
-tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[3]],
+c = tcg_out_cmp(s, args[3], args[1], args[2], const_args[2]);
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[c],
 ARITH_MVN, args[0], 0, 0);
-tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(c)],
 ARITH_MOV, args[0], 0, 0);
 break;
 
-- 
2.34.1




[PATCH v3 24/38] tcg/i386: Support TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Merge tcg_out_testi into tcg_out_cmp and adjust the two uses.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  2 +-
 tcg/i386/tcg-target.c.inc | 95 ---
 2 files changed, 60 insertions(+), 37 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 1dd917a680..a10d4e1fce 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -198,7 +198,7 @@ typedef enum {
 #define TCG_TARGET_HAS_qemu_ldst_i128 \
 (TCG_TARGET_REG_BITS == 64 && (cpuinfo & CPUINFO_ATOMIC_VMOVDQA))
 
-#define TCG_TARGET_HAS_tst  0
+#define TCG_TARGET_HAS_tst  1
 
 /* We do not support older SSE systems, only beginning with AVX1.  */
 #define TCG_TARGET_HAS_v64  have_avx1
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 02718a02d8..f2414177bd 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -506,6 +506,8 @@ static const uint8_t tcg_cond_to_jcc[] = {
 [TCG_COND_GEU] = JCC_JAE,
 [TCG_COND_LEU] = JCC_JBE,
 [TCG_COND_GTU] = JCC_JA,
+[TCG_COND_TSTEQ] = JCC_JE,
+[TCG_COND_TSTNE] = JCC_JNE,
 };
 
 #if TCG_TARGET_REG_BITS == 64
@@ -1452,17 +1454,49 @@ static void tcg_out_jxx(TCGContext *s, int opc, TCGLabel *l, bool small)
 static int tcg_out_cmp(TCGContext *s, TCGCond cond, TCGArg arg1,
TCGArg arg2, int const_arg2, int rexw)
 {
-if (const_arg2) {
-if (arg2 == 0) {
-/* test r, r */
+int jz;
+
+if (!is_tst_cond(cond)) {
+if (!const_arg2) {
+tgen_arithr(s, ARITH_CMP + rexw, arg1, arg2);
+} else if (arg2 == 0) {
 tcg_out_modrm(s, OPC_TESTL + rexw, arg1, arg1);
 } else {
+tcg_debug_assert(!rexw || arg2 == (int32_t)arg2);
 tgen_arithi(s, ARITH_CMP + rexw, arg1, arg2, 0);
 }
-} else {
-tgen_arithr(s, ARITH_CMP + rexw, arg1, arg2);
+return tcg_cond_to_jcc[cond];
 }
-return tcg_cond_to_jcc[cond];
+
+jz = tcg_cond_to_jcc[cond];
+
+if (!const_arg2) {
+tcg_out_modrm(s, OPC_TESTL + rexw, arg1, arg2);
+return jz;
+}
+
+if (arg2 <= 0xff && (TCG_TARGET_REG_BITS == 64 || arg1 < 4)) {
+tcg_out_modrm(s, OPC_GRP3_Eb | P_REXB_RM, EXT3_TESTi, arg1);
+tcg_out8(s, arg2);
+return jz;
+}
+
+if ((arg2 & ~0xff00) == 0 && arg1 < 4) {
+tcg_out_modrm(s, OPC_GRP3_Eb, EXT3_TESTi, arg1 + 4);
+tcg_out8(s, arg2 >> 8);
+return jz;
+}
+
+if (rexw) {
+if (arg2 == (uint32_t)arg2) {
+rexw = 0;
+} else {
+tcg_debug_assert(arg2 == (int32_t)arg2);
+}
+}
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_TESTi, arg1);
+tcg_out32(s, arg2);
+return jz;
 }
 
 static void tcg_out_brcond(TCGContext *s, int rexw, TCGCond cond,
@@ -1479,18 +1513,21 @@ static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
 {
 TCGLabel *label_next = gen_new_label();
 TCGLabel *label_this = arg_label(args[5]);
+TCGCond cond = args[4];
 
-switch(args[4]) {
+switch (cond) {
 case TCG_COND_EQ:
-tcg_out_brcond(s, 0, TCG_COND_NE, args[0], args[2], const_args[2],
-   label_next, 1);
-tcg_out_brcond(s, 0, TCG_COND_EQ, args[1], args[3], const_args[3],
+case TCG_COND_TSTEQ:
+tcg_out_brcond(s, 0, tcg_invert_cond(cond),
+   args[0], args[2], const_args[2], label_next, 1);
+tcg_out_brcond(s, 0, cond, args[1], args[3], const_args[3],
label_this, small);
 break;
 case TCG_COND_NE:
-tcg_out_brcond(s, 0, TCG_COND_NE, args[0], args[2], const_args[2],
+case TCG_COND_TSTNE:
+tcg_out_brcond(s, 0, cond, args[0], args[2], const_args[2],
label_this, small);
-tcg_out_brcond(s, 0, TCG_COND_NE, args[1], args[3], const_args[3],
+tcg_out_brcond(s, 0, cond, args[1], args[3], const_args[3],
label_this, small);
 break;
 case TCG_COND_LT:
@@ -1827,23 +1864,6 @@ static void tcg_out_nopn(TCGContext *s, int n)
 tcg_out8(s, 0x90);
 }
 
-/* Test register R vs immediate bits I, setting Z flag for EQ/NE. */
-static void __attribute__((unused))
-tcg_out_testi(TCGContext *s, TCGReg r, uint32_t i)
-{
-/*
- * This is used for testing alignment, so we can usually use testb.
- * For i686, we have to use testl for %esi/%edi.
- */
-if (i <= 0xff && (TCG_TARGET_REG_BITS == 64 || r < 4)) {
-tcg_out_modrm(s, OPC_GRP3_Eb | P_REXB_RM, EXT3_TESTi, r);
-tcg_out8(s, i);
-} else {
-tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_TESTi, r);
-tcg_out32(s, i);
-}
-}
-
 typedef struct {
 TCGReg base;
 int index;
@@ -2104,16 +2124,17 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
 tcg_out_ld(s, 

[PATCH v3 38/38] tcg/tci: Support TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h |  2 +-
 tcg/tci.c| 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 609b2f4e4a..a076f401d2 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -117,7 +117,7 @@
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
-#define TCG_TARGET_HAS_tst  0
+#define TCG_TARGET_HAS_tst  1
 
 /* Number of registers available. */
 #define TCG_TARGET_NB_REGS 16
diff --git a/tcg/tci.c b/tcg/tci.c
index 3cc851b7bd..39adcb7d82 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -228,6 +228,12 @@ static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 case TCG_COND_GTU:
 result = (u0 > u1);
 break;
+case TCG_COND_TSTEQ:
+result = (u0 & u1) == 0;
+break;
+case TCG_COND_TSTNE:
+result = (u0 & u1) != 0;
+break;
 default:
 g_assert_not_reached();
 }
@@ -270,6 +276,12 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
 case TCG_COND_GTU:
 result = (u0 > u1);
 break;
+case TCG_COND_TSTEQ:
+result = (u0 & u1) == 0;
+break;
+case TCG_COND_TSTNE:
+result = (u0 & u1) != 0;
+break;
 default:
 g_assert_not_reached();
 }
@@ -1041,6 +1053,8 @@ static const char *str_c(TCGCond c)
 [TCG_COND_GEU] = "geu",
 [TCG_COND_LEU] = "leu",
 [TCG_COND_GTU] = "gtu",
+[TCG_COND_TSTEQ] = "tsteq",
+[TCG_COND_TSTNE] = "tstne",
 };
 
 assert((unsigned)c < ARRAY_SIZE(cond));
-- 
2.34.1




[PATCH v3 25/38] tcg/i386: Improve TSTNE/TESTEQ vs powers of two

2024-01-10 Thread Richard Henderson
Use "test x,x" when the bit is one of the 4 sign bits.
Use "bt imm,x" otherwise.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target-con-set.h |  6 ++--
 tcg/i386/tcg-target-con-str.h |  1 +
 tcg/i386/tcg-target.c.inc | 54 +++
 3 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/tcg/i386/tcg-target-con-set.h b/tcg/i386/tcg-target-con-set.h
index 7d00a7dde8..e24241cfa2 100644
--- a/tcg/i386/tcg-target-con-set.h
+++ b/tcg/i386/tcg-target-con-set.h
@@ -20,7 +20,7 @@ C_O0_I2(L, L)
 C_O0_I2(qi, r)
 C_O0_I2(re, r)
 C_O0_I2(ri, r)
-C_O0_I2(r, re)
+C_O0_I2(r, reT)
 C_O0_I2(s, L)
 C_O0_I2(x, r)
 C_O0_I3(L, L, L)
@@ -34,7 +34,7 @@ C_O1_I1(r, r)
 C_O1_I1(x, r)
 C_O1_I1(x, x)
 C_O1_I2(q, 0, qi)
-C_O1_I2(q, r, re)
+C_O1_I2(q, r, reT)
 C_O1_I2(r, 0, ci)
 C_O1_I2(r, 0, r)
 C_O1_I2(r, 0, re)
@@ -50,7 +50,7 @@ C_N1_I2(r, r, r)
 C_N1_I2(r, r, rW)
 C_O1_I3(x, 0, x, x)
 C_O1_I3(x, x, x, x)
-C_O1_I4(r, r, re, r, 0)
+C_O1_I4(r, r, reT, r, 0)
 C_O1_I4(r, r, r, ri, ri)
 C_O2_I1(r, r, L)
 C_O2_I2(a, d, a, r)
diff --git a/tcg/i386/tcg-target-con-str.h b/tcg/i386/tcg-target-con-str.h
index 95a30e58cd..cc22db227b 100644
--- a/tcg/i386/tcg-target-con-str.h
+++ b/tcg/i386/tcg-target-con-str.h
@@ -28,5 +28,6 @@ REGS('s', ALL_BYTEL_REGS & ~SOFTMMU_RESERVE_REGS) /* qemu_st8_i32 data */
  */
 CONST('e', TCG_CT_CONST_S32)
 CONST('I', TCG_CT_CONST_I32)
+CONST('T', TCG_CT_CONST_TST)
 CONST('W', TCG_CT_CONST_WSZ)
 CONST('Z', TCG_CT_CONST_U32)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index f2414177bd..0b8c60d021 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -132,6 +132,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind 
kind, int slot)
 #define TCG_CT_CONST_U32 0x200
 #define TCG_CT_CONST_I32 0x400
 #define TCG_CT_CONST_WSZ 0x800
+#define TCG_CT_CONST_TST 0x1000
 
 /* Registers used with L constraint, which are the first argument
registers on x86_64, and two random call clobbered registers on
@@ -202,7 +203,8 @@ static bool tcg_target_const_match(int64_t val, int ct,
 return 1;
 }
 if (type == TCG_TYPE_I32) {
-if (ct & (TCG_CT_CONST_S32 | TCG_CT_CONST_U32 | TCG_CT_CONST_I32)) {
+if (ct & (TCG_CT_CONST_S32 | TCG_CT_CONST_U32 |
+  TCG_CT_CONST_I32 | TCG_CT_CONST_TST)) {
 return 1;
 }
 } else {
@@ -215,6 +217,17 @@ static bool tcg_target_const_match(int64_t val, int ct,
 if ((ct & TCG_CT_CONST_I32) && ~val == (int32_t)~val) {
 return 1;
 }
+/*
+ * This will be used in combination with TCG_CT_CONST_S32,
+ * so "normal" TESTQ is already matched.  Also accept:
+ *TESTQ -> TESTL   (uint32_t)
+ *TESTQ -> BT  (is_power_of_2)
+ */
+if ((ct & TCG_CT_CONST_TST)
+&& is_tst_cond(cond)
+&& (val == (uint32_t)val || is_power_of_2(val))) {
+return 1;
+}
 }
 if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
 return 1;
@@ -396,6 +409,7 @@ static bool tcg_target_const_match(int64_t val, int ct,
 #define OPC_SHLX(0xf7 | P_EXT38 | P_DATA16)
 #define OPC_SHRX(0xf7 | P_EXT38 | P_SIMDF2)
 #define OPC_SHRD_Ib (0xac | P_EXT)
+#define OPC_TESTB  (0x84)
 #define OPC_TESTL  (0x85)
 #define OPC_TZCNT   (0xbc | P_EXT | P_SIMDF3)
 #define OPC_UD2 (0x0b | P_EXT)
@@ -442,6 +456,12 @@ static bool tcg_target_const_match(int64_t val, int ct,
 #define OPC_GRP3_Ev (0xf7)
 #define OPC_GRP5(0xff)
 #define OPC_GRP14   (0x73 | P_EXT | P_DATA16)
+#define OPC_GRPBT   (0xba | P_EXT)
+
#define OPC_GRPBT_BT4
+#define OPC_GRPBT_BTS   5
+#define OPC_GRPBT_BTR   6
+#define OPC_GRPBT_BTC   7
 
 /* Group 1 opcode extensions for 0x80-0x83.
These are also used as modifiers for OPC_ARITH.  */
@@ -1454,7 +1474,7 @@ static void tcg_out_jxx(TCGContext *s, int opc, TCGLabel 
*l, bool small)
 static int tcg_out_cmp(TCGContext *s, TCGCond cond, TCGArg arg1,
TCGArg arg2, int const_arg2, int rexw)
 {
-int jz;
+int jz, js;
 
 if (!is_tst_cond(cond)) {
 if (!const_arg2) {
@@ -1469,6 +1489,7 @@ static int tcg_out_cmp(TCGContext *s, TCGCond cond, 
TCGArg arg1,
 }
 
 jz = tcg_cond_to_jcc[cond];
+js = (cond == TCG_COND_TSTNE ? JCC_JS : JCC_JNS);
 
 if (!const_arg2) {
 tcg_out_modrm(s, OPC_TESTL + rexw, arg1, arg2);
@@ -1476,17 +1497,40 @@ static int tcg_out_cmp(TCGContext *s, TCGCond cond, 
TCGArg arg1,
 }
 
 if (arg2 <= 0xff && (TCG_TARGET_REG_BITS == 64 || arg1 < 4)) {
+if (arg2 == 0x80) {
+tcg_out_modrm(s, OPC_TESTB | P_REXB_R, arg1, arg1);
+return js;
+}
 tcg_out_modrm(s, OPC_GRP3_Eb | P_REXB_RM, EXT3_TESTi, arg1);
 tcg_out8(s, arg2);
 return jz;
 }
 
 if ((arg2 & ~0xff00) == 0 && arg1 < 4) {
+if (arg2 == 0x8000) {

[PATCH v3 29/38] tcg/sparc64: Support TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/sparc64/tcg-target.h |  2 +-
 tcg/sparc64/tcg-target.c.inc | 16 ++--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index ae2910c4ee..a18906a14e 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -149,7 +149,7 @@ extern bool use_vis3_instructions;
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
-#define TCG_TARGET_HAS_tst  0
+#define TCG_TARGET_HAS_tst  1
 
 #define TCG_AREG0 TCG_REG_I0
 
diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index 10fb8a1a0d..176c98740b 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -607,9 +607,11 @@ static void tcg_out_div32(TCGContext *s, TCGReg rd, TCGReg 
rs1,
uns ? ARITH_UDIV : ARITH_SDIV);
 }
 
-static const uint8_t tcg_cond_to_bcond[] = {
+static const uint8_t tcg_cond_to_bcond[16] = {
 [TCG_COND_EQ] = COND_E,
 [TCG_COND_NE] = COND_NE,
+[TCG_COND_TSTEQ] = COND_E,
+[TCG_COND_TSTNE] = COND_NE,
 [TCG_COND_LT] = COND_L,
 [TCG_COND_GE] = COND_GE,
 [TCG_COND_LE] = COND_LE,
@@ -649,7 +651,8 @@ static void tcg_out_bpcc(TCGContext *s, int scond, int 
flags, TCGLabel *l)
 static void tcg_out_cmp(TCGContext *s, TCGCond cond,
 TCGReg c1, int32_t c2, int c2const)
 {
-tcg_out_arithc(s, TCG_REG_G0, c1, c2, c2const, ARITH_SUBCC);
+tcg_out_arithc(s, TCG_REG_G0, c1, c2, c2const,
+   is_tst_cond(cond) ? ARITH_ANDCC : ARITH_SUBCC);
 }
 
 static void tcg_out_brcond_i32(TCGContext *s, TCGCond cond, TCGReg arg1,
@@ -744,6 +747,15 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond 
cond, TCGReg ret,
 cond = (cond == TCG_COND_EQ ? TCG_COND_GEU : TCG_COND_LTU);
break;
 
+case TCG_COND_TSTEQ:
+case TCG_COND_TSTNE:
+/* Transform to inequality vs zero.  */
+tcg_out_arithc(s, TCG_REG_T1, c1, c2, c2const, ARITH_AND);
+c1 = TCG_REG_G0;
+c2 = TCG_REG_T1, c2const = 0;
+cond = (cond == TCG_COND_TSTEQ ? TCG_COND_GEU : TCG_COND_LTU);
+   break;
+
 case TCG_COND_GTU:
 case TCG_COND_LEU:
 /* If we don't need to load a constant into a register, we can
-- 
2.34.1




[PATCH v3 19/38] tcg/aarch64: Generate CBNZ for TSTNE of UINT32_MAX

2024-01-10 Thread Richard Henderson
... and the inverse, CBZ for TSTEQ.

Suggested-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc | 8 
 1 file changed, 8 insertions(+)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 55225313ad..0c98c48f68 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1453,6 +1453,7 @@ static void tcg_out_brcond(TCGContext *s, TCGType ext, 
TCGCond c, TCGArg a,
 break;
 case TCG_COND_LT:
 case TCG_COND_GE:
+/* cmp xN,0; b.mi L -> tbnz xN,63,L */
 if (b_const && b == 0) {
 c = (c == TCG_COND_LT ? TCG_COND_TSTNE : TCG_COND_TSTEQ);
 tbit = ext ? 63 : 31;
@@ -1461,6 +1462,13 @@ static void tcg_out_brcond(TCGContext *s, TCGType ext, 
TCGCond c, TCGArg a,
 break;
 case TCG_COND_TSTEQ:
 case TCG_COND_TSTNE:
+/* tst xN,0xffffffff; b.ne L -> cbnz wN,L */
+if (b_const && b == UINT32_MAX) {
+ext = TCG_TYPE_I32;
+need_cmp = false;
+break;
+}
+/* tst xN,1<<B; b.ne L -> tbnz xN,B,L */
 if (b_const && is_power_of_2(b)) {
 tbit = ctz64(b);
 need_cmp = false;
-- 
2.34.1




[PATCH v3 28/38] tcg/sparc64: Pass TCGCond to tcg_out_cmp

2024-01-10 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/sparc64/tcg-target.c.inc | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index e16b25e309..10fb8a1a0d 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -646,7 +646,8 @@ static void tcg_out_bpcc(TCGContext *s, int scond, int 
flags, TCGLabel *l)
 tcg_out_bpcc0(s, scond, flags, off19);
 }
 
-static void tcg_out_cmp(TCGContext *s, TCGReg c1, int32_t c2, int c2const)
+static void tcg_out_cmp(TCGContext *s, TCGCond cond,
+TCGReg c1, int32_t c2, int c2const)
 {
 tcg_out_arithc(s, TCG_REG_G0, c1, c2, c2const, ARITH_SUBCC);
 }
@@ -654,7 +655,7 @@ static void tcg_out_cmp(TCGContext *s, TCGReg c1, int32_t 
c2, int c2const)
 static void tcg_out_brcond_i32(TCGContext *s, TCGCond cond, TCGReg arg1,
int32_t arg2, int const_arg2, TCGLabel *l)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2);
+tcg_out_cmp(s, cond, arg1, arg2, const_arg2);
 tcg_out_bpcc(s, tcg_cond_to_bcond[cond], BPCC_ICC | BPCC_PT, l);
 tcg_out_nop(s);
 }
@@ -671,7 +672,7 @@ static void tcg_out_movcond_i32(TCGContext *s, TCGCond 
cond, TCGReg ret,
 TCGReg c1, int32_t c2, int c2const,
 int32_t v1, int v1const)
 {
-tcg_out_cmp(s, c1, c2, c2const);
+tcg_out_cmp(s, cond, c1, c2, c2const);
 tcg_out_movcc(s, cond, MOVCC_ICC, ret, v1, v1const);
 }
 
@@ -691,7 +692,7 @@ static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, 
TCGReg arg1,
 tcg_out32(s, INSN_OP(0) | INSN_OP2(3) | BPR_PT | INSN_RS1(arg1)
   | INSN_COND(rcond) | off16);
 } else {
-tcg_out_cmp(s, arg1, arg2, const_arg2);
+tcg_out_cmp(s, cond, arg1, arg2, const_arg2);
 tcg_out_bpcc(s, tcg_cond_to_bcond[cond], BPCC_XCC | BPCC_PT, l);
 }
 tcg_out_nop(s);
@@ -715,7 +716,7 @@ static void tcg_out_movcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
 if (c2 == 0 && rcond && (!v1const || check_fit_i32(v1, 10))) {
 tcg_out_movr(s, rcond, ret, c1, v1, v1const);
 } else {
-tcg_out_cmp(s, c1, c2, c2const);
+tcg_out_cmp(s, cond, c1, c2, c2const);
 tcg_out_movcc(s, cond, MOVCC_XCC, ret, v1, v1const);
 }
 }
@@ -759,13 +760,13 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond 
cond, TCGReg ret,
 /* FALLTHRU */
 
 default:
-tcg_out_cmp(s, c1, c2, c2const);
+tcg_out_cmp(s, cond, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
 tcg_out_movcc(s, cond, MOVCC_ICC, ret, neg ? -1 : 1, 1);
 return;
 }
 
-tcg_out_cmp(s, c1, c2, c2const);
+tcg_out_cmp(s, cond, c1, c2, c2const);
 if (cond == TCG_COND_LTU) {
 if (neg) {
 /* 0 - 0 - C = -C = (C ? -1 : 0) */
@@ -799,7 +800,7 @@ static void tcg_out_setcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
 c2 = c1, c2const = 0, c1 = TCG_REG_G0;
 /* FALLTHRU */
 case TCG_COND_LTU:
-tcg_out_cmp(s, c1, c2, c2const);
+tcg_out_cmp(s, cond, c1, c2, c2const);
 tcg_out_arith(s, ret, TCG_REG_G0, TCG_REG_G0, ARITH_ADDXC);
 return;
 default:
@@ -814,7 +815,7 @@ static void tcg_out_setcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
 tcg_out_movi_s13(s, ret, 0);
 tcg_out_movr(s, rcond, ret, c1, neg ? -1 : 1, 1);
 } else {
-tcg_out_cmp(s, c1, c2, c2const);
+tcg_out_cmp(s, cond, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
 tcg_out_movcc(s, cond, MOVCC_XCC, ret, neg ? -1 : 1, 1);
 }
@@ -1102,7 +1103,7 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, 
HostAddress *h,
 tcg_out_movi_s32(s, TCG_REG_T3, compare_mask);
 tcg_out_arith(s, TCG_REG_T3, addr_reg, TCG_REG_T3, ARITH_AND);
 }
-tcg_out_cmp(s, TCG_REG_T2, TCG_REG_T3, 0);
+tcg_out_cmp(s, TCG_COND_NE, TCG_REG_T2, TCG_REG_T3, 0);
 
 ldst = new_ldst_label(s);
 ldst->is_ld = is_ld;
-- 
2.34.1




[PATCH v3 27/38] tcg/sparc64: Hoist read of tcg_cond_to_rcond

2024-01-10 Thread Richard Henderson
Use a non-zero value here (an illegal encoding) as a better
condition than is_unsigned_cond for when MOVR/BPR is usable.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/sparc64/tcg-target.c.inc | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index ac86b92b75..e16b25e309 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -620,7 +620,7 @@ static const uint8_t tcg_cond_to_bcond[] = {
 [TCG_COND_GTU] = COND_GU,
 };
 
-static const uint8_t tcg_cond_to_rcond[] = {
+static const uint8_t tcg_cond_to_rcond[16] = {
 [TCG_COND_EQ] = RCOND_Z,
 [TCG_COND_NE] = RCOND_NZ,
 [TCG_COND_LT] = RCOND_LZ,
@@ -679,7 +679,8 @@ static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, 
TCGReg arg1,
int32_t arg2, int const_arg2, TCGLabel *l)
 {
 /* For 64-bit signed comparisons vs zero, we can avoid the compare.  */
-if (arg2 == 0 && !is_unsigned_cond(cond)) {
+int rcond = tcg_cond_to_rcond[cond];
+if (arg2 == 0 && rcond) {
 int off16 = 0;
 
 if (l->has_value) {
@@ -688,7 +689,7 @@ static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, 
TCGReg arg1,
 tcg_out_reloc(s, s->code_ptr, R_SPARC_WDISP16, l, 0);
 }
 tcg_out32(s, INSN_OP(0) | INSN_OP2(3) | BPR_PT | INSN_RS1(arg1)
-  | INSN_COND(tcg_cond_to_rcond[cond]) | off16);
+  | INSN_COND(rcond) | off16);
 } else {
 tcg_out_cmp(s, arg1, arg2, const_arg2);
 tcg_out_bpcc(s, tcg_cond_to_bcond[cond], BPCC_XCC | BPCC_PT, l);
@@ -696,11 +697,10 @@ static void tcg_out_brcond_i64(TCGContext *s, TCGCond 
cond, TCGReg arg1,
 tcg_out_nop(s);
 }
 
-static void tcg_out_movr(TCGContext *s, TCGCond cond, TCGReg ret, TCGReg c1,
+static void tcg_out_movr(TCGContext *s, int rcond, TCGReg ret, TCGReg c1,
  int32_t v1, int v1const)
 {
-tcg_out32(s, ARITH_MOVR | INSN_RD(ret) | INSN_RS1(c1)
-  | (tcg_cond_to_rcond[cond] << 10)
+tcg_out32(s, ARITH_MOVR | INSN_RD(ret) | INSN_RS1(c1) | (rcond << 10)
   | (v1const ? INSN_IMM10(v1) : INSN_RS2(v1)));
 }
 
@@ -711,9 +711,9 @@ static void tcg_out_movcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
 /* For 64-bit signed comparisons vs zero, we can avoid the compare.
Note that the immediate range is one bit smaller, so we must check
for that as well.  */
-if (c2 == 0 && !is_unsigned_cond(cond)
-&& (!v1const || check_fit_i32(v1, 10))) {
-tcg_out_movr(s, cond, ret, c1, v1, v1const);
+int rcond = tcg_cond_to_rcond[cond];
+if (c2 == 0 && rcond && (!v1const || check_fit_i32(v1, 10))) {
+tcg_out_movr(s, rcond, ret, c1, v1, v1const);
 } else {
 tcg_out_cmp(s, c1, c2, c2const);
 tcg_out_movcc(s, cond, MOVCC_XCC, ret, v1, v1const);
@@ -788,6 +788,8 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond 
cond, TCGReg ret,
 static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGReg ret,
 TCGReg c1, int32_t c2, int c2const, bool neg)
 {
+int rcond;
+
 if (use_vis3_instructions && !neg) {
 switch (cond) {
 case TCG_COND_NE:
@@ -807,9 +809,10 @@ static void tcg_out_setcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
 
 /* For 64-bit signed comparisons vs zero, we can avoid the compare
if the input does not overlap the output.  */
-if (c2 == 0 && !is_unsigned_cond(cond) && c1 != ret) {
+rcond = tcg_cond_to_rcond[cond];
+if (c2 == 0 && rcond && c1 != ret) {
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movr(s, cond, ret, c1, neg ? -1 : 1, 1);
+tcg_out_movr(s, rcond, ret, c1, neg ? -1 : 1, 1);
 } else {
 tcg_out_cmp(s, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
-- 
2.34.1




[PATCH v3 37/38] tcg/s390x: Support TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target.h |   2 +-
 tcg/s390x/tcg-target.c.inc | 139 +
 2 files changed, 97 insertions(+), 44 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 53bed8c8d2..ae448c3a3a 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -138,7 +138,7 @@ extern uint64_t s390_facilities[3];
 
 #define TCG_TARGET_HAS_qemu_ldst_i128 1
 
-#define TCG_TARGET_HAS_tst0
+#define TCG_TARGET_HAS_tst1
 
 #define TCG_TARGET_HAS_v64HAVE_FACILITY(VECTOR)
 #define TCG_TARGET_HAS_v128   HAVE_FACILITY(VECTOR)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 86ec737768..cb1693c9cf 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -112,6 +112,9 @@ typedef enum S390Opcode {
 RI_OILH = 0xa50a,
 RI_OILL = 0xa50b,
 RI_TMLL = 0xa701,
+RI_TMLH = 0xa700,
+RI_TMHL = 0xa703,
+RI_TMHH = 0xa702,
 
 RIEb_CGRJ= 0xec64,
 RIEb_CLGRJ   = 0xec65,
@@ -404,10 +407,15 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind 
kind, int slot)
 #define S390_CC_NEVER   0
 #define S390_CC_ALWAYS  15
 
+#define S390_TM_EQ  8  /* CC == 0 */
+#define S390_TM_NE  7  /* CC in {1,2,3} */
+
 /* Condition codes that result from a COMPARE and COMPARE LOGICAL.  */
-static const uint8_t tcg_cond_to_s390_cond[] = {
+static const uint8_t tcg_cond_to_s390_cond[16] = {
 [TCG_COND_EQ]  = S390_CC_EQ,
 [TCG_COND_NE]  = S390_CC_NE,
+[TCG_COND_TSTEQ] = S390_CC_EQ,
+[TCG_COND_TSTNE] = S390_CC_NE,
 [TCG_COND_LT]  = S390_CC_LT,
 [TCG_COND_LE]  = S390_CC_LE,
 [TCG_COND_GT]  = S390_CC_GT,
@@ -421,9 +429,11 @@ static const uint8_t tcg_cond_to_s390_cond[] = {
 /* Condition codes that result from a LOAD AND TEST.  Here, we have no
unsigned instruction variation, however since the test is vs zero we
can re-map the outcomes appropriately.  */
-static const uint8_t tcg_cond_to_ltr_cond[] = {
+static const uint8_t tcg_cond_to_ltr_cond[16] = {
 [TCG_COND_EQ]  = S390_CC_EQ,
 [TCG_COND_NE]  = S390_CC_NE,
+[TCG_COND_TSTEQ] = S390_CC_ALWAYS,
+[TCG_COND_TSTNE] = S390_CC_NEVER,
 [TCG_COND_LT]  = S390_CC_LT,
 [TCG_COND_LE]  = S390_CC_LE,
 [TCG_COND_GT]  = S390_CC_GT,
@@ -542,10 +552,13 @@ static bool risbg_mask(uint64_t c)
 static bool tcg_target_const_match(int64_t val, int ct,
TCGType type, TCGCond cond, int vece)
 {
+uint64_t uval = val;
+
 if (ct & TCG_CT_CONST) {
 return true;
 }
 if (type == TCG_TYPE_I32) {
+uval = (uint32_t)val;
 val = (int32_t)val;
 }
 
@@ -567,6 +580,15 @@ static bool tcg_target_const_match(int64_t val, int ct,
 case TCG_COND_GTU:
 ct |= TCG_CT_CONST_U32;  /* CLGFI */
 break;
+case TCG_COND_TSTNE:
+case TCG_COND_TSTEQ:
+if (is_const_p16(uval) >= 0) {
+return true;  /* TMxx */
+}
+if (risbg_mask(uval)) {
+return true;  /* RISBG */
+}
+break;
 default:
 g_assert_not_reached();
 }
@@ -588,10 +610,6 @@ static bool tcg_target_const_match(int64_t val, int ct,
 if (ct & TCG_CT_CONST_INV) {
 val = ~val;
 }
-/*
- * Note that is_const_p16 is a subset of is_const_p32,
- * so we don't need both constraints.
- */
 if ((ct & TCG_CT_CONST_P32) && is_const_p32(val) >= 0) {
 return true;
 }
@@ -868,6 +886,9 @@ static const S390Opcode oi_insns[4] = {
 static const S390Opcode lif_insns[2] = {
 RIL_LLILF, RIL_LLIHF,
 };
+static const S390Opcode tm_insns[4] = {
+RI_TMLL, RI_TMLH, RI_TMHL, RI_TMHH
+};
 
 /* load a register with an immediate value */
 static void tcg_out_movi(TCGContext *s, TCGType type,
@@ -1228,6 +1249,36 @@ static int tgen_cmp2(TCGContext *s, TCGType type, 
TCGCond c, TCGReg r1,
 TCGCond inv_c = tcg_invert_cond(c);
 S390Opcode op;
 
+if (is_tst_cond(c)) {
+tcg_debug_assert(!need_carry);
+
+if (!c2const) {
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RRFa, NRK, TCG_REG_R0, r1, c2);
+} else {
+tcg_out_insn(s, RRFa, NGRK, TCG_REG_R0, r1, c2);
+}
+goto exit;
+}
+
+if (type == TCG_TYPE_I32) {
+c2 = (uint32_t)c2;
+}
+
+int i = is_const_p16(c2);
+if (i >= 0) {
+tcg_out_insn_RI(s, tm_insns[i], r1, c2 >> (i * 16));
+*inv_cc = c == TCG_COND_TSTEQ ? S390_TM_NE : S390_TM_EQ;
+return *inv_cc ^ 15;
+}
+
+if (risbg_mask(c2)) {
+tgen_andi_risbg(s, TCG_REG_R0, r1, c2);
+goto exit;
+}
+g_assert_not_reached();
+}
+
 if (c2const) {
 if (c2 == 0) {
 if (!(is_unsigned && 

[PATCH v3 34/38] tcg/ppc: Support TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |   2 +-
 tcg/ppc/tcg-target.c.inc | 122 ---
 2 files changed, 115 insertions(+), 9 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 60ce49e672..04a7aba4d3 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -143,7 +143,7 @@ typedef enum {
 #define TCG_TARGET_HAS_qemu_ldst_i128   \
 (TCG_TARGET_REG_BITS == 64 && have_isa_2_07)
 
-#define TCG_TARGET_HAS_tst  0
+#define TCG_TARGET_HAS_tst  1
 
 /*
  * While technically Altivec could support V64, it has no 64-bit store
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 535ef2cbe7..7f3829beeb 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -283,11 +283,15 @@ static bool reloc_pc34(tcg_insn_unit *src_rw, const 
tcg_insn_unit *target)
 return false;
 }
 
+static bool mask_operand(uint32_t c, int *mb, int *me);
+static bool mask64_operand(uint64_t c, int *mb, int *me);
+
 /* test if a constant matches the constraint */
 static bool tcg_target_const_match(int64_t sval, int ct,
TCGType type, TCGCond cond, int vece)
 {
 uint64_t uval = sval;
+int mb, me;
 
 if (ct & TCG_CT_CONST) {
 return 1;
@@ -316,6 +320,17 @@ static bool tcg_target_const_match(int64_t sval, int ct,
 case TCG_COND_GTU:
 ct |= TCG_CT_CONST_U16;
 break;
+case TCG_COND_TSTEQ:
+case TCG_COND_TSTNE:
+if ((uval & ~0xffff) == 0 || (uval & ~0xffff0000ull) == 0) {
+return 1;
+}
+if (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I32
+? mask_operand(uval, &mb, &me)
+: mask64_operand(uval << clz64(uval), &mb, &me)) {
+return 1;
+}
+return 0;
 default:
 g_assert_not_reached();
 }
@@ -703,9 +718,11 @@ enum {
 CR_SO
 };
 
-static const uint32_t tcg_to_bc[] = {
+static const uint32_t tcg_to_bc[16] = {
 [TCG_COND_EQ]  = BC | BI(0, CR_EQ) | BO_COND_TRUE,
 [TCG_COND_NE]  = BC | BI(0, CR_EQ) | BO_COND_FALSE,
+[TCG_COND_TSTEQ]  = BC | BI(0, CR_EQ) | BO_COND_TRUE,
+[TCG_COND_TSTNE]  = BC | BI(0, CR_EQ) | BO_COND_FALSE,
 [TCG_COND_LT]  = BC | BI(0, CR_LT) | BO_COND_TRUE,
 [TCG_COND_GE]  = BC | BI(0, CR_LT) | BO_COND_FALSE,
 [TCG_COND_LE]  = BC | BI(0, CR_GT) | BO_COND_FALSE,
@@ -717,9 +734,11 @@ static const uint32_t tcg_to_bc[] = {
 };
 
 /* The low bit here is set if the RA and RB fields must be inverted.  */
-static const uint32_t tcg_to_isel[] = {
+static const uint32_t tcg_to_isel[16] = {
 [TCG_COND_EQ]  = ISEL | BC_(0, CR_EQ),
 [TCG_COND_NE]  = ISEL | BC_(0, CR_EQ) | 1,
+[TCG_COND_TSTEQ] = ISEL | BC_(0, CR_EQ),
+[TCG_COND_TSTNE] = ISEL | BC_(0, CR_EQ) | 1,
 [TCG_COND_LT]  = ISEL | BC_(0, CR_LT),
 [TCG_COND_GE]  = ISEL | BC_(0, CR_LT) | 1,
 [TCG_COND_LE]  = ISEL | BC_(0, CR_GT) | 1,
@@ -872,19 +891,31 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, 
TCGReg ret, TCGReg arg)
 return true;
 }
 
-static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
-   int sh, int mb)
+static void tcg_out_rld_rc(TCGContext *s, int op, TCGReg ra, TCGReg rs,
+   int sh, int mb, bool rc)
 {
 tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
 sh = SH(sh & 0x1f) | (((sh >> 5) & 1) << 1);
 mb = MB64((mb >> 5) | ((mb << 1) & 0x3f));
-tcg_out32(s, op | RA(ra) | RS(rs) | sh | mb);
+tcg_out32(s, op | RA(ra) | RS(rs) | sh | mb | rc);
 }
 
-static inline void tcg_out_rlw(TCGContext *s, int op, TCGReg ra, TCGReg rs,
-   int sh, int mb, int me)
+static void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
+int sh, int mb)
 {
-tcg_out32(s, op | RA(ra) | RS(rs) | SH(sh) | MB(mb) | ME(me));
+tcg_out_rld_rc(s, op, ra, rs, sh, mb, false);
+}
+
+static void tcg_out_rlw_rc(TCGContext *s, int op, TCGReg ra, TCGReg rs,
+   int sh, int mb, int me, bool rc)
+{
+tcg_out32(s, op | RA(ra) | RS(rs) | SH(sh) | MB(mb) | ME(me) | rc);
+}
+
+static void tcg_out_rlw(TCGContext *s, int op, TCGReg ra, TCGReg rs,
+int sh, int mb, int me)
+{
+tcg_out_rlw_rc(s, op, ra, rs, sh, mb, me, false);
 }
 
 static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
@@ -1702,6 +1733,50 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType 
type, TCGArg val,
 return false;
 }
 
+/*
+ * Set dest non-zero if and only if (arg1 & arg2) is non-zero.
+ * If RC, then also set RC0.
+ */
+static void tcg_out_test(TCGContext *s, TCGReg dest, TCGReg arg1, TCGArg arg2,
+ bool const_arg2, TCGType type, bool rc)
+{
+int mb, me;
+
+if (!const_arg2) {
+tcg_out32(s, AND | SAB(arg1, dest, arg2) | rc);
+ 

[PATCH v3 35/38] tcg/s390x: Split constraint A into J+U

2024-01-10 Thread Richard Henderson
Signed 33-bit == signed 32-bit + unsigned 32-bit.

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target-con-set.h |  8 
 tcg/s390x/tcg-target-con-str.h |  2 +-
 tcg/s390x/tcg-target.c.inc | 36 +-
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 9a42037499..665851d84a 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -15,7 +15,7 @@
 C_O0_I1(r)
 C_O0_I2(r, r)
 C_O0_I2(r, ri)
-C_O0_I2(r, rA)
+C_O0_I2(r, rJU)
 C_O0_I2(v, r)
 C_O0_I3(o, m, r)
 C_O1_I1(r, r)
@@ -27,7 +27,7 @@ C_O1_I2(r, 0, rI)
 C_O1_I2(r, 0, rJ)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
-C_O1_I2(r, r, rA)
+C_O1_I2(r, r, rJU)
 C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rJ)
 C_O1_I2(r, r, rK)
@@ -39,10 +39,10 @@ C_O1_I2(v, v, r)
 C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, rI, r)
-C_O1_I4(r, r, rA, rI, r)
+C_O1_I4(r, r, rJU, rI, r)
 C_O2_I1(o, m, r)
 C_O2_I2(o, m, 0, r)
 C_O2_I2(o, m, r, r)
 C_O2_I3(o, m, 0, 1, r)
 C_N1_O1_I4(r, r, 0, 1, ri, r)
-C_N1_O1_I4(r, r, 0, 1, rA, r)
+C_N1_O1_I4(r, r, 0, 1, rJU, r)
diff --git a/tcg/s390x/tcg-target-con-str.h b/tcg/s390x/tcg-target-con-str.h
index 25675b449e..9d2cb775dc 100644
--- a/tcg/s390x/tcg-target-con-str.h
+++ b/tcg/s390x/tcg-target-con-str.h
@@ -16,10 +16,10 @@ REGS('o', 0xaaaa) /* odd numbered general regs */
  * Define constraint letters for constants:
  * CONST(letter, TCG_CT_CONST_* bit set)
  */
-CONST('A', TCG_CT_CONST_S33)
 CONST('I', TCG_CT_CONST_S16)
 CONST('J', TCG_CT_CONST_S32)
 CONST('K', TCG_CT_CONST_P32)
 CONST('N', TCG_CT_CONST_INV)
 CONST('R', TCG_CT_CONST_INVRISBG)
+CONST('U', TCG_CT_CONST_U32)
 CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 08fe00a392..a317ccd3a5 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -30,7 +30,7 @@
 
 #define TCG_CT_CONST_S16(1 << 8)
 #define TCG_CT_CONST_S32(1 << 9)
-#define TCG_CT_CONST_S33(1 << 10)
+#define TCG_CT_CONST_U32(1 << 10)
 #define TCG_CT_CONST_ZERO   (1 << 11)
 #define TCG_CT_CONST_P32(1 << 12)
 #define TCG_CT_CONST_INV(1 << 13)
@@ -542,22 +542,23 @@ static bool tcg_target_const_match(int64_t val, int ct,
TCGType type, TCGCond cond, int vece)
 {
 if (ct & TCG_CT_CONST) {
-return 1;
+return true;
 }
-
 if (type == TCG_TYPE_I32) {
 val = (int32_t)val;
 }
 
-/* The following are mutually exclusive.  */
-if (ct & TCG_CT_CONST_S16) {
-return val == (int16_t)val;
-} else if (ct & TCG_CT_CONST_S32) {
-return val == (int32_t)val;
-} else if (ct & TCG_CT_CONST_S33) {
-return val >= -0xffffffffll && val <= 0xffffffffll;
-} else if (ct & TCG_CT_CONST_ZERO) {
-return val == 0;
+if ((ct & TCG_CT_CONST_S32) && val == (int32_t)val) {
+return true;
+}
+if ((ct & TCG_CT_CONST_U32) && val == (uint32_t)val) {
+return true;
+}
+if ((ct & TCG_CT_CONST_S16) && val == (int16_t)val) {
+return true;
+}
+if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
+return true;
 }
 
 if (ct & TCG_CT_CONST_INV) {
@@ -573,8 +574,7 @@ static bool tcg_target_const_match(int64_t val, int ct,
 if ((ct & TCG_CT_CONST_INVRISBG) && risbg_mask(~val)) {
 return true;
 }
-
-return 0;
+return false;
 }
 
 /* Emit instructions according to the given instruction format.  */
@@ -3137,7 +3137,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 return C_O1_I2(r, r, ri);
 case INDEX_op_setcond_i64:
 case INDEX_op_negsetcond_i64:
-return C_O1_I2(r, r, rA);
+return C_O1_I2(r, r, rJU);
 
 case INDEX_op_clz_i64:
 return C_O1_I2(r, r, rI);
@@ -3187,7 +3187,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_brcond_i32:
 return C_O0_I2(r, ri);
 case INDEX_op_brcond_i64:
-return C_O0_I2(r, rA);
+return C_O0_I2(r, rJU);
 
 case INDEX_op_bswap16_i32:
 case INDEX_op_bswap16_i64:
@@ -3240,7 +3240,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_movcond_i32:
 return C_O1_I4(r, r, ri, rI, r);
 case INDEX_op_movcond_i64:
-return C_O1_I4(r, r, rA, rI, r);
+return C_O1_I4(r, r, rJU, rI, r);
 
 case INDEX_op_div2_i32:
 case INDEX_op_div2_i64:
@@ -3259,7 +3259,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 
 case INDEX_op_add2_i64:
 case INDEX_op_sub2_i64:
-return C_N1_O1_I4(r, r, 0, 1, rA, r);
+return C_N1_O1_I4(r, r, 0, 1, rJU, r);
 
 case INDEX_op_st_vec:
 return C_O0_I2(v, r);
-- 
2.34.1




[PATCH v3 15/38] target/s390x: Improve general case of disas_jcc

2024-01-10 Thread Richard Henderson
Avoid code duplication by handling 7 of the 14 cases
by inverting the test for the other 7 cases.

Use TCG_COND_TSTNE for cc in {1,3}.
Use (cc - 1) <= 1 for cc in {1,2}.

Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/translate.c | 82 +---
 1 file changed, 30 insertions(+), 52 deletions(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index ae4e7b27ec..168974f2e6 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -885,67 +885,45 @@ static void disas_jcc(DisasContext *s, DisasCompare *c, 
uint32_t mask)
 case CC_OP_STATIC:
 c->is_64 = false;
 c->u.s32.a = cc_op;
-switch (mask) {
-case 0x8 | 0x4 | 0x2: /* cc != 3 */
-cond = TCG_COND_NE;
+
+/* Fold half of the cases using bit 3 to invert. */
+switch (mask & 8 ? mask ^ 0xf : mask) {
+case 0x1: /* cc == 3 */
+cond = TCG_COND_EQ;
 c->u.s32.b = tcg_constant_i32(3);
 break;
-case 0x8 | 0x4 | 0x1: /* cc != 2 */
-cond = TCG_COND_NE;
-c->u.s32.b = tcg_constant_i32(2);
-break;
-case 0x8 | 0x2 | 0x1: /* cc != 1 */
-cond = TCG_COND_NE;
-c->u.s32.b = tcg_constant_i32(1);
-break;
-case 0x8 | 0x2: /* cc == 0 || cc == 2 => (cc & 1) == 0 */
-cond = TCG_COND_EQ;
-c->u.s32.a = tcg_temp_new_i32();
-c->u.s32.b = tcg_constant_i32(0);
-tcg_gen_andi_i32(c->u.s32.a, cc_op, 1);
-break;
-case 0x8 | 0x4: /* cc < 2 */
-cond = TCG_COND_LTU;
-c->u.s32.b = tcg_constant_i32(2);
-break;
-case 0x8: /* cc == 0 */
-cond = TCG_COND_EQ;
-c->u.s32.b = tcg_constant_i32(0);
-break;
-case 0x4 | 0x2 | 0x1: /* cc != 0 */
-cond = TCG_COND_NE;
-c->u.s32.b = tcg_constant_i32(0);
-break;
-case 0x4 | 0x1: /* cc == 1 || cc == 3 => (cc & 1) != 0 */
-cond = TCG_COND_NE;
-c->u.s32.a = tcg_temp_new_i32();
-c->u.s32.b = tcg_constant_i32(0);
-tcg_gen_andi_i32(c->u.s32.a, cc_op, 1);
-break;
-case 0x4: /* cc == 1 */
-cond = TCG_COND_EQ;
-c->u.s32.b = tcg_constant_i32(1);
-break;
-case 0x2 | 0x1: /* cc > 1 */
-cond = TCG_COND_GTU;
-c->u.s32.b = tcg_constant_i32(1);
-break;
 case 0x2: /* cc == 2 */
 cond = TCG_COND_EQ;
 c->u.s32.b = tcg_constant_i32(2);
 break;
-case 0x1: /* cc == 3 */
+case 0x4: /* cc == 1 */
 cond = TCG_COND_EQ;
-c->u.s32.b = tcg_constant_i32(3);
+c->u.s32.b = tcg_constant_i32(1);
+break;
+case 0x2 | 0x1: /* cc == 2 || cc == 3 => cc > 1 */
+cond = TCG_COND_GTU;
+c->u.s32.b = tcg_constant_i32(1);
+break;
+case 0x4 | 0x1: /* cc == 1 || cc == 3 => (cc & 1) != 0 */
+cond = TCG_COND_TSTNE;
+c->u.s32.b = tcg_constant_i32(1);
+break;
+case 0x4 | 0x2: /* cc == 1 || cc == 2 => (cc - 1) <= 1 */
+cond = TCG_COND_LEU;
+c->u.s32.a = tcg_temp_new_i32();
+c->u.s32.b = tcg_constant_i32(1);
+tcg_gen_addi_i32(c->u.s32.a, cc_op, -1);
+break;
+case 0x4 | 0x2 | 0x1: /* cc != 0 */
+cond = TCG_COND_NE;
+c->u.s32.b = tcg_constant_i32(0);
 break;
 default:
-/* CC is masked by something else: (8 >> cc) & mask.  */
-cond = TCG_COND_NE;
-c->u.s32.a = tcg_temp_new_i32();
-c->u.s32.b = tcg_constant_i32(0);
-tcg_gen_shr_i32(c->u.s32.a, tcg_constant_i32(8), cc_op);
-tcg_gen_andi_i32(c->u.s32.a, c->u.s32.a, mask);
-break;
+/* case 0: never, handled above. */
+g_assert_not_reached();
+}
+if (mask & 8) {
+cond = tcg_invert_cond(cond);
 }
 break;
 
-- 
2.34.1




[PATCH v3 17/38] tcg/aarch64: Support TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target-con-set.h |  5 +--
 tcg/aarch64/tcg-target-con-str.h |  1 +
 tcg/aarch64/tcg-target.h |  2 +-
 tcg/aarch64/tcg-target.c.inc | 56 ++--
 4 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/tcg/aarch64/tcg-target-con-set.h b/tcg/aarch64/tcg-target-con-set.h
index 3fdee26a3d..44fcc1206e 100644
--- a/tcg/aarch64/tcg-target-con-set.h
+++ b/tcg/aarch64/tcg-target-con-set.h
@@ -10,7 +10,7 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
-C_O0_I2(r, rA)
+C_O0_I2(r, rC)
 C_O0_I2(rZ, r)
 C_O0_I2(w, r)
 C_O0_I3(rZ, rZ, r)
@@ -22,6 +22,7 @@ C_O1_I2(r, 0, rZ)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, rA)
 C_O1_I2(r, r, rAL)
+C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rL)
 C_O1_I2(r, rZ, rZ)
@@ -31,6 +32,6 @@ C_O1_I2(w, w, wN)
 C_O1_I2(w, w, wO)
 C_O1_I2(w, w, wZ)
 C_O1_I3(w, w, w, w)
-C_O1_I4(r, r, rA, rZ, rZ)
+C_O1_I4(r, r, rC, rZ, rZ)
 C_O2_I1(r, r, r)
 C_O2_I4(r, r, rZ, rZ, rA, rMZ)
diff --git a/tcg/aarch64/tcg-target-con-str.h b/tcg/aarch64/tcg-target-con-str.h
index fb1a845b4f..48e1722c68 100644
--- a/tcg/aarch64/tcg-target-con-str.h
+++ b/tcg/aarch64/tcg-target-con-str.h
@@ -16,6 +16,7 @@ REGS('w', ALL_VECTOR_REGS)
  * CONST(letter, TCG_CT_CONST_* bit set)
  */
 CONST('A', TCG_CT_CONST_AIMM)
+CONST('C', TCG_CT_CONST_CMP)
 CONST('L', TCG_CT_CONST_LIMM)
 CONST('M', TCG_CT_CONST_MONE)
 CONST('O', TCG_CT_CONST_ORRI)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index b4ac13be7b..ef5ebe91bd 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -138,7 +138,7 @@ typedef enum {
 #define TCG_TARGET_HAS_qemu_ldst_i128   1
 #endif
 
-#define TCG_TARGET_HAS_tst  0
+#define TCG_TARGET_HAS_tst  1
 
 #define TCG_TARGET_HAS_v64  1
 #define TCG_TARGET_HAS_v128 1
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 420e4a35ea..70df250c04 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -126,6 +126,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_MONE 0x800
 #define TCG_CT_CONST_ORRI 0x1000
 #define TCG_CT_CONST_ANDI 0x2000
+#define TCG_CT_CONST_CMP  0x4000
 
 #define ALL_GENERAL_REGS  0xffffffffu
 #define ALL_VECTOR_REGS   0xffffffff00000000ull
@@ -279,6 +280,15 @@ static bool tcg_target_const_match(int64_t val, int ct,
 if (type == TCG_TYPE_I32) {
 val = (int32_t)val;
 }
+
+if (ct & TCG_CT_CONST_CMP) {
+if (is_tst_cond(cond)) {
+ct |= TCG_CT_CONST_LIMM;
+} else {
+ct |= TCG_CT_CONST_AIMM;
+}
+}
+
 if ((ct & TCG_CT_CONST_AIMM) && (is_aimm(val) || is_aimm(-val))) {
 return 1;
 }
@@ -345,6 +355,9 @@ static const enum aarch64_cond_code tcg_cond_to_aarch64[] = 
{
 [TCG_COND_GTU] = COND_HI,
 [TCG_COND_GEU] = COND_HS,
 [TCG_COND_LEU] = COND_LS,
+/* bit test */
+[TCG_COND_TSTEQ] = COND_EQ,
+[TCG_COND_TSTNE] = COND_NE,
 };
 
 typedef enum {
@@ -1342,19 +1355,26 @@ static inline void tcg_out_dep(TCGContext *s, TCGType 
ext, TCGReg rd,
 tcg_out_bfm(s, ext, rd, rn, a, b);
 }
 
-static void tcg_out_cmp(TCGContext *s, TCGType ext, TCGReg a,
+static void tcg_out_cmp(TCGContext *s, TCGType ext, TCGCond cond, TCGReg a,
 tcg_target_long b, bool const_b)
 {
-if (const_b) {
-/* Using CMP or CMN aliases.  */
-if (b >= 0) {
-tcg_out_insn(s, 3401, SUBSI, ext, TCG_REG_XZR, a, b);
+if (is_tst_cond(cond)) {
+if (!const_b) {
+tcg_out_insn(s, 3510, ANDS, ext, TCG_REG_XZR, a, b);
 } else {
-tcg_out_insn(s, 3401, ADDSI, ext, TCG_REG_XZR, a, -b);
+tcg_debug_assert(is_limm(b));
+tcg_out_logicali(s, I3404_ANDSI, 0, TCG_REG_XZR, a, b);
 }
 } else {
-/* Using CMP alias SUBS wzr, Wn, Wm */
-tcg_out_insn(s, 3502, SUBS, ext, TCG_REG_XZR, a, b);
+if (!const_b) {
+tcg_out_insn(s, 3502, SUBS, ext, TCG_REG_XZR, a, b);
+} else if (b >= 0) {
+tcg_debug_assert(is_aimm(b));
+tcg_out_insn(s, 3401, SUBSI, ext, TCG_REG_XZR, a, b);
+} else {
+tcg_debug_assert(is_aimm(-b));
+tcg_out_insn(s, 3401, ADDSI, ext, TCG_REG_XZR, a, -b);
+}
 }
 }
 
@@ -1402,7 +1422,7 @@ static void tcg_out_brcond(TCGContext *s, TCGType ext, 
TCGCond c, TCGArg a,
 need_cmp = false;
 } else {
 need_cmp = true;
-tcg_out_cmp(s, ext, a, b, b_const);
+tcg_out_cmp(s, ext, c, a, b, b_const);
 }
 
 if (!l->has_value) {
@@ -1575,7 +1595,7 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, 
TCGReg d,
 } else {
 AArch64Insn sel = I3506_CSEL;
 
-tcg_out_cmp(s, ext, a0, 0, 1);
+tcg_out_cmp(s, ext, TCG_COND_NE, a0, 0, 1);
 tcg_out_insn(s, 3507, CLZ, 

[PATCH v3 26/38] tcg/i386: Use TEST r,r to test 8/16/32 bits

2024-01-10 Thread Richard Henderson
From: Paolo Bonzini 

Just like when testing against the sign bits, TEST r,r can be used when the
immediate is 0xff, 0xff00, 0xffff, 0xffffffff.
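As a rough illustration of the rewrite (not QEMU code; `test_width_for_mask` and `tstne` are invented helper names), each special-cased immediate corresponds to testing one sub-register for zero, which is what lets TEST r,r replace TEST r,imm:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: pick the narrowest TEST r,r encoding for a
 * TSTEQ/TSTNE immediate, mirroring the special cases in the patch.
 * Return values: 1 = TESTB on the low byte, -1 = TESTB on the high
 * byte (the %ah-style encoding), 2 = 16-bit TEST (OPC_TESTL with a
 * data-size prefix), 4 = TESTL, 0 = no narrow form, fall back to a
 * TEST with an explicit immediate. */
int test_width_for_mask(uint64_t mask)
{
    switch (mask) {
    case 0xff:        return 1;   /* TEST %al,%al   */
    case 0xff00:      return -1;  /* TEST %ah,%ah   */
    case 0xffff:      return 2;   /* TEST %ax,%ax   */
    case 0xffffffffu: return 4;   /* TEST %eax,%eax */
    default:          return 0;
    }
}

/* The rewrite is sound because ANDing with one of these constants is
 * the same as asking whether that sub-register is nonzero. */
int tstne(uint64_t x, uint64_t mask)
{
    return (x & mask) != 0;
}
```

The high-byte case additionally requires a register with an addressable high byte, which the surrounding backend code handles via the `arg1 + 4` register encoding.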

Signed-off-by: Paolo Bonzini 
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 0b8c60d021..c6ba498623 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1501,6 +1501,10 @@ static int tcg_out_cmp(TCGContext *s, TCGCond cond, 
TCGArg arg1,
 tcg_out_modrm(s, OPC_TESTB | P_REXB_R, arg1, arg1);
 return js;
 }
+if (arg2 == 0xff) {
+tcg_out_modrm(s, OPC_TESTB | P_REXB_R, arg1, arg1);
+return jz;
+}
 tcg_out_modrm(s, OPC_GRP3_Eb | P_REXB_RM, EXT3_TESTi, arg1);
 tcg_out8(s, arg2);
 return jz;
@@ -1511,11 +1515,24 @@ static int tcg_out_cmp(TCGContext *s, TCGCond cond, 
TCGArg arg1,
 tcg_out_modrm(s, OPC_TESTB, arg1 + 4, arg1 + 4);
 return js;
 }
+if (arg2 == 0xff00) {
+tcg_out_modrm(s, OPC_TESTB, arg1 + 4, arg1 + 4);
+return jz;
+}
 tcg_out_modrm(s, OPC_GRP3_Eb, EXT3_TESTi, arg1 + 4);
 tcg_out8(s, arg2 >> 8);
 return jz;
 }
 
+if (arg2 == 0xffff) {
+tcg_out_modrm(s, OPC_TESTL | P_DATA16, arg1, arg1);
+return jz;
+}
+if (arg2 == 0xffffffffu) {
+tcg_out_modrm(s, OPC_TESTL, arg1, arg1);
+return jz;
+}
+
 if (is_power_of_2(rexw ? arg2 : (uint32_t)arg2)) {
 int jc = (cond == TCG_COND_TSTNE ? JCC_JB : JCC_JAE);
 int sh = ctz64(arg2);
-- 
2.34.1




[PATCH v3 36/38] tcg/s390x: Add TCG_CT_CONST_CMP

2024-01-10 Thread Richard Henderson
Better constraint for tcg_out_cmp, based on the comparison.
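The intent of the new constraint can be sketched like this (illustrative names only, not the actual QEMU helpers): signed orderings can use CGFI, which takes a signed 32-bit immediate; unsigned orderings can use CLGFI, which takes an unsigned 32-bit immediate; and EQ/NE can use whichever instruction the constant fits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which comparison class is the constant being matched against? */
enum cmp_kind { CMP_EQNE, CMP_SIGNED, CMP_UNSIGNED };

/* Sketch: can a 64-bit comparison against constant c2 be done with a
 * single compare-immediate instruction on s390x?  CGFI sign-extends a
 * 32-bit immediate, CLGFI zero-extends one. */
bool const_ok_for_cmp(int64_t c2, enum cmp_kind k)
{
    bool fits_s32 = c2 == (int32_t)c2;              /* CGFI  */
    bool fits_u32 = (uint64_t)c2 == (uint32_t)c2;   /* CLGFI */

    switch (k) {
    case CMP_EQNE:     return fits_s32 || fits_u32;
    case CMP_SIGNED:   return fits_s32;
    case CMP_UNSIGNED: return fits_u32;
    }
    return false;
}
```

Constants that fail this test are the ones the register allocator now loads into a register up front, instead of `tgen_cmp2` doing it with a scratch.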

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target-con-set.h |  6 +--
 tcg/s390x/tcg-target-con-str.h |  1 +
 tcg/s390x/tcg-target.c.inc | 72 +-
 3 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 665851d84a..f75955eaa8 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -15,7 +15,7 @@
 C_O0_I1(r)
 C_O0_I2(r, r)
 C_O0_I2(r, ri)
-C_O0_I2(r, rJU)
+C_O0_I2(r, rC)
 C_O0_I2(v, r)
 C_O0_I3(o, m, r)
 C_O1_I1(r, r)
@@ -27,7 +27,7 @@ C_O1_I2(r, 0, rI)
 C_O1_I2(r, 0, rJ)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
-C_O1_I2(r, r, rJU)
+C_O1_I2(r, r, rC)
 C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rJ)
 C_O1_I2(r, r, rK)
@@ -39,7 +39,7 @@ C_O1_I2(v, v, r)
 C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, rI, r)
-C_O1_I4(r, r, rJU, rI, r)
+C_O1_I4(r, r, rC, rI, r)
 C_O2_I1(o, m, r)
 C_O2_I2(o, m, 0, r)
 C_O2_I2(o, m, r, r)
diff --git a/tcg/s390x/tcg-target-con-str.h b/tcg/s390x/tcg-target-con-str.h
index 9d2cb775dc..745f6c0df5 100644
--- a/tcg/s390x/tcg-target-con-str.h
+++ b/tcg/s390x/tcg-target-con-str.h
 REGS('o', 0xaaaa) /* odd numbered general regs */
  * Define constraint letters for constants:
  * CONST(letter, TCG_CT_CONST_* bit set)
  */
+CONST('C', TCG_CT_CONST_CMP)
 CONST('I', TCG_CT_CONST_S16)
 CONST('J', TCG_CT_CONST_S32)
 CONST('K', TCG_CT_CONST_P32)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index a317ccd3a5..86ec737768 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -35,6 +35,7 @@
 #define TCG_CT_CONST_P32(1 << 12)
 #define TCG_CT_CONST_INV(1 << 13)
 #define TCG_CT_CONST_INVRISBG   (1 << 14)
+#define TCG_CT_CONST_CMP(1 << 15)
 
 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 16)
 #define ALL_VECTOR_REGS  MAKE_64BIT_MASK(32, 32)
@@ -548,6 +549,29 @@ static bool tcg_target_const_match(int64_t val, int ct,
 val = (int32_t)val;
 }
 
+if (ct & TCG_CT_CONST_CMP) {
+switch (cond) {
+case TCG_COND_EQ:
+case TCG_COND_NE:
+ct |= TCG_CT_CONST_S32 | TCG_CT_CONST_U32;  /* CGFI or CLGFI */
+break;
+case TCG_COND_LT:
+case TCG_COND_GE:
+case TCG_COND_LE:
+case TCG_COND_GT:
+ct |= TCG_CT_CONST_S32;  /* CGFI */
+break;
+case TCG_COND_LTU:
+case TCG_COND_GEU:
+case TCG_COND_LEU:
+case TCG_COND_GTU:
+ct |= TCG_CT_CONST_U32;  /* CLGFI */
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 if ((ct & TCG_CT_CONST_S32) && val == (int32_t)val) {
 return true;
 }
@@ -1229,22 +1253,34 @@ static int tgen_cmp2(TCGContext *s, TCGType type, 
TCGCond c, TCGReg r1,
 goto exit;
 }
 
-/*
- * Constraints are for a signed 33-bit operand, which is a
- * convenient superset of this signed/unsigned test.
- */
-if (c2 == (is_unsigned ? (TCGArg)(uint32_t)c2 : (TCGArg)(int32_t)c2)) {
-op = (is_unsigned ? RIL_CLGFI : RIL_CGFI);
-tcg_out_insn_RIL(s, op, r1, c2);
-goto exit;
+/* Should match TCG_CT_CONST_CMP. */
+switch (c) {
+case TCG_COND_LT:
+case TCG_COND_GE:
+case TCG_COND_LE:
+case TCG_COND_GT:
+tcg_debug_assert(c2 == (int32_t)c2);
+op = RIL_CGFI;
+break;
+case TCG_COND_EQ:
+case TCG_COND_NE:
+if (c2 == (int32_t)c2) {
+op = RIL_CGFI;
+break;
+}
+/* fall through */
+case TCG_COND_LTU:
+case TCG_COND_GEU:
+case TCG_COND_LEU:
+case TCG_COND_GTU:
+tcg_debug_assert(c2 == (uint32_t)c2);
+op = RIL_CLGFI;
+break;
+default:
+g_assert_not_reached();
 }
-
-/* Load everything else into a register. */
-tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, c2);
-c2 = TCG_TMP0;
-}
-
-if (type == TCG_TYPE_I32) {
+tcg_out_insn_RIL(s, op, r1, c2);
+} else if (type == TCG_TYPE_I32) {
 op = (is_unsigned ? RR_CLR : RR_CR);
 tcg_out_insn_RR(s, op, r1, c2);
 } else {
@@ -3137,7 +3173,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 return C_O1_I2(r, r, ri);
 case INDEX_op_setcond_i64:
 case INDEX_op_negsetcond_i64:
-return C_O1_I2(r, r, rJU);
+return C_O1_I2(r, r, rC);
 
 case INDEX_op_clz_i64:
 return C_O1_I2(r, r, rI);
@@ -3187,7 +3223,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_brcond_i32:
 return C_O0_I2(r, ri);
 case INDEX_op_brcond_i64:
-return C_O0_I2(r, rJU);
+return C_O0_I2(r, rC);
 
 case 

[PATCH v3 08/38] target/alpha: Pass immediate value to gen_bcond_internal()

2024-01-10 Thread Richard Henderson
Simplify gen_bcond() by passing an immediate value.

Signed-off-by: Richard Henderson 
Message-Id: <20231028194522.245170-33-richard.hender...@linaro.org>
[PMD: Split from bigger patch, part 1/2]
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20231108205247.83234-1-phi...@linaro.org>
---
 target/alpha/translate.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 32333081d8..89e630a7cc 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -453,13 +453,13 @@ static DisasJumpType gen_bdirect(DisasContext *ctx, int 
ra, int32_t disp)
 }
 
 static DisasJumpType gen_bcond_internal(DisasContext *ctx, TCGCond cond,
-TCGv cmp, int32_t disp)
+TCGv cmp, uint64_t imm, int32_t disp)
 {
 uint64_t dest = ctx->base.pc_next + (disp << 2);
 TCGLabel *lab_true = gen_new_label();
 
 if (use_goto_tb(ctx, dest)) {
-tcg_gen_brcondi_i64(cond, cmp, 0, lab_true);
+tcg_gen_brcondi_i64(cond, cmp, imm, lab_true);
 
 tcg_gen_goto_tb(0);
 tcg_gen_movi_i64(cpu_pc, ctx->base.pc_next);
@@ -472,11 +472,11 @@ static DisasJumpType gen_bcond_internal(DisasContext 
*ctx, TCGCond cond,
 
 return DISAS_NORETURN;
 } else {
-TCGv_i64 z = load_zero(ctx);
+TCGv_i64 i = tcg_constant_i64(imm);
 TCGv_i64 d = tcg_constant_i64(dest);
 TCGv_i64 p = tcg_constant_i64(ctx->base.pc_next);
 
-tcg_gen_movcond_i64(cond, cpu_pc, cmp, z, d, p);
+tcg_gen_movcond_i64(cond, cpu_pc, cmp, i, d, p);
 return DISAS_PC_UPDATED;
 }
 }
@@ -484,15 +484,8 @@ static DisasJumpType gen_bcond_internal(DisasContext *ctx, 
TCGCond cond,
 static DisasJumpType gen_bcond(DisasContext *ctx, TCGCond cond, int ra,
int32_t disp, int mask)
 {
-if (mask) {
-TCGv tmp = tcg_temp_new();
-DisasJumpType ret;
-
-tcg_gen_andi_i64(tmp, load_gpr(ctx, ra), 1);
-ret = gen_bcond_internal(ctx, cond, tmp, disp);
-return ret;
-}
-return gen_bcond_internal(ctx, cond, load_gpr(ctx, ra), disp);
+return gen_bcond_internal(ctx, cond, load_gpr(ctx, ra),
+  mask, disp);
 }
 
 /* Fold -0.0 for comparison with COND.  */
@@ -533,7 +526,7 @@ static DisasJumpType gen_fbcond(DisasContext *ctx, TCGCond 
cond, int ra,
 DisasJumpType ret;
 
 gen_fold_mzero(cond, cmp_tmp, load_fpr(ctx, ra));
-ret = gen_bcond_internal(ctx, cond, cmp_tmp, disp);
+ret = gen_bcond_internal(ctx, cond, cmp_tmp, 0, disp);
 return ret;
 }
 
-- 
2.34.1




[PATCH v3 33/38] tcg/ppc: Add TCG_CT_CONST_CMP

2024-01-10 Thread Richard Henderson
Better constraint for tcg_out_cmp, based on the comparison.
We can't yet remove the fallback to load constants into a
scratch because of tcg_out_cmp2, but that path should not
be as frequent.
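The ppc version of the constraint can be sketched the same way (illustrative names, not the QEMU helpers): signed orderings need the constant to fit CMPI's signed 16-bit immediate, unsigned orderings need CMPLI's unsigned 16-bit immediate, and EQ/NE accept either — which is also why the later "all of the tests are 16-bit" comment lets tcg_out_cmp sign-extend a 32-bit operand up front:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ppc_cmp { PPC_EQNE, PPC_SIGNED, PPC_UNSIGNED };

/* Sketch: can comparison against constant v use a single ppc
 * compare-immediate instruction? */
bool ppc_cmp_imm_ok(int64_t v, enum ppc_cmp k)
{
    bool fits_s16 = v == (int16_t)v;                /* CMPI  */
    bool fits_u16 = (uint64_t)v == (uint16_t)v;     /* CMPLI */

    switch (k) {
    case PPC_EQNE:     return fits_s16 || fits_u16;
    case PPC_SIGNED:   return fits_s16;
    case PPC_UNSIGNED: return fits_u16;
    }
    return false;
}
```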

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target-con-set.h |  5 ++--
 tcg/ppc/tcg-target-con-str.h |  1 +
 tcg/ppc/tcg-target.c.inc | 48 ++--
 3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/tcg/ppc/tcg-target-con-set.h b/tcg/ppc/tcg-target-con-set.h
index cb47b29452..9f99bde505 100644
--- a/tcg/ppc/tcg-target-con-set.h
+++ b/tcg/ppc/tcg-target-con-set.h
@@ -11,7 +11,7 @@
  */
 C_O0_I1(r)
 C_O0_I2(r, r)
-C_O0_I2(r, ri)
+C_O0_I2(r, rC)
 C_O0_I2(v, r)
 C_O0_I3(r, r, r)
 C_O0_I3(o, m, r)
@@ -26,13 +26,14 @@ C_O1_I2(r, rI, ri)
 C_O1_I2(r, rI, rT)
 C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
+C_O1_I2(r, r, rC)
 C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rT)
 C_O1_I2(r, r, rU)
 C_O1_I2(r, r, rZW)
 C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
-C_O1_I4(r, r, ri, rZ, rZ)
+C_O1_I4(r, r, rC, rZ, rZ)
 C_O1_I4(r, r, r, ri, ri)
 C_O2_I1(r, r, r)
 C_N1O1_I1(o, m, r)
diff --git a/tcg/ppc/tcg-target-con-str.h b/tcg/ppc/tcg-target-con-str.h
index 20846901de..16b687216e 100644
--- a/tcg/ppc/tcg-target-con-str.h
+++ b/tcg/ppc/tcg-target-con-str.h
@@ -16,6 +16,7 @@ REGS('v', ALL_VECTOR_REGS)
  * Define constraint letters for constants:
  * CONST(letter, TCG_CT_CONST_* bit set)
  */
+CONST('C', TCG_CT_CONST_CMP)
 CONST('I', TCG_CT_CONST_S16)
 CONST('M', TCG_CT_CONST_MONE)
 CONST('T', TCG_CT_CONST_S32)
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 26e0bc31d7..535ef2cbe7 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -92,11 +92,13 @@
 #define SZR  (TCG_TARGET_REG_BITS / 8)
 
 #define TCG_CT_CONST_S16  0x100
+#define TCG_CT_CONST_U16  0x200
 #define TCG_CT_CONST_S32  0x400
 #define TCG_CT_CONST_U32  0x800
 #define TCG_CT_CONST_ZERO 0x1000
 #define TCG_CT_CONST_MONE 0x2000
 #define TCG_CT_CONST_WSZ  0x4000
+#define TCG_CT_CONST_CMP  0x8000
 
 #define ALL_GENERAL_REGS  0xffffffffu
 #define ALL_VECTOR_REGS   0xffffffff00000000ull
@@ -296,9 +298,35 @@ static bool tcg_target_const_match(int64_t sval, int ct,
 sval = (int32_t)sval;
 }
 
+if (ct & TCG_CT_CONST_CMP) {
+switch (cond) {
+case TCG_COND_EQ:
+case TCG_COND_NE:
+ct |= TCG_CT_CONST_S16 | TCG_CT_CONST_U16;
+break;
+case TCG_COND_LT:
+case TCG_COND_GE:
+case TCG_COND_LE:
+case TCG_COND_GT:
+ct |= TCG_CT_CONST_S16;
+break;
+case TCG_COND_LTU:
+case TCG_COND_GEU:
+case TCG_COND_LEU:
+case TCG_COND_GTU:
+ct |= TCG_CT_CONST_U16;
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 if ((ct & TCG_CT_CONST_S16) && sval == (int16_t)sval) {
 return 1;
 }
+if ((ct & TCG_CT_CONST_U16) && uval == (uint16_t)uval) {
+return 1;
+}
 if ((ct & TCG_CT_CONST_S32) && sval == (int32_t)sval) {
 return 1;
 }
@@ -1682,7 +1710,10 @@ static void tcg_out_cmp(TCGContext *s, int cond, TCGArg 
arg1, TCGArg arg2,
 
 tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
 
-/* Simplify the comparisons below wrt CMPI.  */
+/*
+ * Simplify the comparisons below wrt CMPI.
+ * All of the tests are 16-bit, so a 32-bit sign extend always works.
+ */
 if (type == TCG_TYPE_I32) {
 arg2 = (int32_t)arg2;
 }
@@ -3991,8 +4022,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_sar_i32:
 case INDEX_op_rotl_i32:
 case INDEX_op_rotr_i32:
-case INDEX_op_setcond_i32:
-case INDEX_op_negsetcond_i32:
 case INDEX_op_and_i64:
 case INDEX_op_andc_i64:
 case INDEX_op_shl_i64:
@@ -4000,8 +4029,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_sar_i64:
 case INDEX_op_rotl_i64:
 case INDEX_op_rotr_i64:
-case INDEX_op_setcond_i64:
-case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, r, ri);
 
 case INDEX_op_mul_i32:
@@ -4045,11 +4072,16 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
-return C_O0_I2(r, ri);
-
+return C_O0_I2(r, rC);
+case INDEX_op_setcond_i32:
+case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
+return C_O1_I2(r, r, rC);
 case INDEX_op_movcond_i32:
 case INDEX_op_movcond_i64:
-return C_O1_I4(r, r, ri, rZ, rZ);
+return C_O1_I4(r, r, rC, rZ, rZ);
+
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
 return C_O1_I2(r, 0, rZ);
-- 
2.34.1




[PATCH v3 18/38] tcg/aarch64: Generate TBZ, TBNZ

2024-01-10 Thread Richard Henderson
Test the sign bit for LT/GE vs 0, and TSTNE/EQ vs a power of 2.
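Both rewrites can be checked with a small model (illustrative C, not QEMU code): a signed comparison against zero depends only on the sign bit, and a test-under-mask against a power of two depends only on one bit, so each becomes a TBNZ/TBZ on a single bit index:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

bool is_power_of_2(uint64_t v)
{
    return v && !(v & (v - 1));
}

/* x < 0 is exactly "sign bit set", which is what TBNZ x, #63 (or #31
 * for a 32-bit value) observes. */
bool lt_zero_via_tbnz(int64_t x)
{
    int tbit = 63;                       /* sign bit of a 64-bit value */
    return ((uint64_t)x >> tbit) & 1;
}

/* TSTNE x, b with b a power of 2 tests exactly one bit; return its
 * index (what TBNZ would encode), or -1 if no single-bit form exists. */
int tstne_tbit(uint64_t b)
{
    int bit = 0;
    if (!is_power_of_2(b)) {
        return -1;
    }
    while (!(b & 1)) {
        b >>= 1;
        bit++;
    }
    return bit;
}
```

The narrower ±32 KiB range of TBZ/TBNZ is what the new R_AARCH64_TSTBR14 relocation accounts for.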

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc | 100 ---
 1 file changed, 81 insertions(+), 19 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 70df250c04..55225313ad 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -105,6 +105,18 @@ static bool reloc_pc19(tcg_insn_unit *src_rw, const 
tcg_insn_unit *target)
 return false;
 }
 
+static bool reloc_pc14(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
+{
+const tcg_insn_unit *src_rx = tcg_splitwx_to_rx(src_rw);
+ptrdiff_t offset = target - src_rx;
+
+if (offset == sextract64(offset, 0, 14)) {
+*src_rw = deposit32(*src_rw, 5, 14, offset);
+return true;
+}
+return false;
+}
+
 static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
@@ -115,6 +127,8 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 return reloc_pc26(code_ptr, (const tcg_insn_unit *)value);
 case R_AARCH64_CONDBR19:
 return reloc_pc19(code_ptr, (const tcg_insn_unit *)value);
+case R_AARCH64_TSTBR14:
+return reloc_pc14(code_ptr, (const tcg_insn_unit *)value);
 default:
 g_assert_not_reached();
 }
@@ -380,6 +394,10 @@ typedef enum {
 /* Conditional branch (immediate).  */
 I3202_B_C   = 0x5400,
 
+/* Test and branch (immediate).  */
+I3205_TBZ   = 0x3600,
+I3205_TBNZ  = 0x3700,
+
 /* Unconditional branch (immediate).  */
 I3206_B = 0x1400,
 I3206_BL= 0x9400,
@@ -660,6 +678,14 @@ static void tcg_out_insn_3202(TCGContext *s, AArch64Insn 
insn,
 tcg_out32(s, insn | tcg_cond_to_aarch64[c] | (imm19 & 0x7ffff) << 5);
 }
 
+static void tcg_out_insn_3205(TCGContext *s, AArch64Insn insn,
+  TCGReg rt, int imm6, int imm14)
+{
+insn |= (imm6 & 0x20) << (31 - 5);
+insn |= (imm6 & 0x1f) << 19;
+tcg_out32(s, insn | (imm14 & 0x3fff) << 5 | rt);
+}
+
 static void tcg_out_insn_3206(TCGContext *s, AArch64Insn insn, int imm26)
 {
 tcg_out32(s, insn | (imm26 & 0x03ffffff));
@@ -1415,30 +1441,66 @@ static inline void tcg_out_goto_label(TCGContext *s, 
TCGLabel *l)
 static void tcg_out_brcond(TCGContext *s, TCGType ext, TCGCond c, TCGArg a,
TCGArg b, bool b_const, TCGLabel *l)
 {
-intptr_t offset;
-bool need_cmp;
+int tbit = -1;
+bool need_cmp = true;
 
-if (b_const && b == 0 && (c == TCG_COND_EQ || c == TCG_COND_NE)) {
-need_cmp = false;
-} else {
-need_cmp = true;
-tcg_out_cmp(s, ext, c, a, b, b_const);
-}
-
-if (!l->has_value) {
-tcg_out_reloc(s, s->code_ptr, R_AARCH64_CONDBR19, l, 0);
-offset = tcg_in32(s) >> 5;
-} else {
-offset = tcg_pcrel_diff(s, l->u.value_ptr) >> 2;
-tcg_debug_assert(offset == sextract64(offset, 0, 19));
+switch (c) {
+case TCG_COND_EQ:
+case TCG_COND_NE:
+if (b_const && b == 0) {
+need_cmp = false;
+}
+break;
+case TCG_COND_LT:
+case TCG_COND_GE:
+if (b_const && b == 0) {
+c = (c == TCG_COND_LT ? TCG_COND_TSTNE : TCG_COND_TSTEQ);
+tbit = ext ? 63 : 31;
+need_cmp = false;
+}
+break;
+case TCG_COND_TSTEQ:
+case TCG_COND_TSTNE:
+if (b_const && is_power_of_2(b)) {
+tbit = ctz64(b);
+need_cmp = false;
+}
+break;
+default:
+break;
 }
 
 if (need_cmp) {
-tcg_out_insn(s, 3202, B_C, c, offset);
-} else if (c == TCG_COND_EQ) {
-tcg_out_insn(s, 3201, CBZ, ext, a, offset);
+tcg_out_cmp(s, ext, c, a, b, b_const);
+tcg_out_reloc(s, s->code_ptr, R_AARCH64_CONDBR19, l, 0);
+tcg_out_insn(s, 3202, B_C, c, 0);
+return;
+}
+
+if (tbit >= 0) {
+tcg_out_reloc(s, s->code_ptr, R_AARCH64_TSTBR14, l, 0);
+switch (c) {
+case TCG_COND_TSTEQ:
+tcg_out_insn(s, 3205, TBZ, a, tbit, 0);
+break;
+case TCG_COND_TSTNE:
+tcg_out_insn(s, 3205, TBNZ, a, tbit, 0);
+break;
+default:
+g_assert_not_reached();
+}
 } else {
-tcg_out_insn(s, 3201, CBNZ, ext, a, offset);
+tcg_out_reloc(s, s->code_ptr, R_AARCH64_CONDBR19, l, 0);
+switch (c) {
+case TCG_COND_EQ:
+tcg_out_insn(s, 3201, CBZ, ext, a, 0);
+break;
+case TCG_COND_NE:
+tcg_out_insn(s, 3201, CBNZ, ext, a, 0);
+break;
+default:
+g_assert_not_reached();
+}
 }
 }
 
-- 
2.34.1




[PATCH v3 30/38] tcg/ppc: Sink tcg_to_bc usage into tcg_out_bc

2024-01-10 Thread Richard Henderson
Rename the current tcg_out_bc function to tcg_out_bc_lab, and
create a new function that takes an integer displacement + link.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 850ace98b2..830d2fe73a 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1946,14 +1946,20 @@ static void tcg_out_setcond(TCGContext *s, TCGType 
type, TCGCond cond,
 }
 }
 
-static void tcg_out_bc(TCGContext *s, int bc, TCGLabel *l)
+static void tcg_out_bc(TCGContext *s, TCGCond cond, int bd)
 {
+tcg_out32(s, tcg_to_bc[cond] | bd);
+}
+
+static void tcg_out_bc_lab(TCGContext *s, TCGCond cond, TCGLabel *l)
+{
+int bd = 0;
 if (l->has_value) {
-bc |= reloc_pc14_val(tcg_splitwx_to_rx(s->code_ptr), l->u.value_ptr);
+bd = reloc_pc14_val(tcg_splitwx_to_rx(s->code_ptr), l->u.value_ptr);
 } else {
 tcg_out_reloc(s, s->code_ptr, R_PPC_REL14, l, 0);
 }
-tcg_out32(s, bc);
+tcg_out_bc(s, cond, bd);
 }
 
 static void tcg_out_brcond(TCGContext *s, TCGCond cond,
@@ -1961,7 +1967,7 @@ static void tcg_out_brcond(TCGContext *s, TCGCond cond,
TCGLabel *l, TCGType type)
 {
 tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 7, type);
-tcg_out_bc(s, tcg_to_bc[cond], l);
+tcg_out_bc_lab(s, cond, l);
 }
 
 static void tcg_out_movcond(TCGContext *s, TCGType type, TCGCond cond,
@@ -2003,7 +2009,7 @@ static void tcg_out_movcond(TCGContext *s, TCGType type, 
TCGCond cond,
 }
 }
 /* Branch forward over one insn */
-tcg_out32(s, tcg_to_bc[cond] | 8);
+tcg_out_bc(s, cond, 8);
 if (v2 == 0) {
 tcg_out_movi(s, type, dest, 0);
 } else {
@@ -2024,11 +2030,11 @@ static void tcg_out_cntxz(TCGContext *s, TCGType type, 
uint32_t opc,
 tcg_out32(s, opc | RA(TCG_REG_R0) | RS(a1));
 tcg_out32(s, tcg_to_isel[TCG_COND_EQ] | TAB(a0, a2, TCG_REG_R0));
 } else if (!const_a2 && a0 == a2) {
-tcg_out32(s, tcg_to_bc[TCG_COND_EQ] | 8);
+tcg_out_bc(s, TCG_COND_EQ, 8);
 tcg_out32(s, opc | RA(a0) | RS(a1));
 } else {
 tcg_out32(s, opc | RA(a0) | RS(a1));
-tcg_out32(s, tcg_to_bc[TCG_COND_NE] | 8);
+tcg_out_bc(s, TCG_COND_NE, 8);
 if (const_a2) {
 tcg_out_movi(s, type, a0, 0);
 } else {
@@ -2108,11 +2114,11 @@ static void tcg_out_setcond2(TCGContext *s, const 
TCGArg *args,
 tcg_out_rlw(s, RLWINM, args[0], TCG_REG_R0, 31, 31, 31);
 }
 
-static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
- const int *const_args)
+static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
+const int *const_args)
 {
 tcg_out_cmp2(s, args, const_args);
-tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
+tcg_out_bc_lab(s, TCG_COND_EQ, arg_label(args[5]));
 }
 
 static void tcg_out_mb(TCGContext *s, TCGArg a0)
@@ -2446,7 +2452,7 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, 
HostAddress *h,
 
 /* Load a pointer into the current opcode w/conditional branch-link. */
 ldst->label_ptr[0] = s->code_ptr;
-tcg_out32(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
+tcg_out_bc(s, TCG_COND_NE, LK);
 
 h->base = TCG_REG_TMP1;
 } else {
-- 
2.34.1




[PATCH v3 22/38] tcg/i386: Pass x86 condition codes to tcg_out_cmov

2024-01-10 Thread Richard Henderson
Hoist the tcg_cond_to_jcc index outside the function.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index accaaa2660..2d6100a8f4 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1699,14 +1699,14 @@ static void tcg_out_setcond2(TCGContext *s, const 
TCGArg *args,
 }
 #endif
 
-static void tcg_out_cmov(TCGContext *s, TCGCond cond, int rexw,
+static void tcg_out_cmov(TCGContext *s, int jcc, int rexw,
  TCGReg dest, TCGReg v1)
 {
 if (have_cmov) {
-tcg_out_modrm(s, OPC_CMOVCC | tcg_cond_to_jcc[cond] | rexw, dest, v1);
+tcg_out_modrm(s, OPC_CMOVCC | jcc | rexw, dest, v1);
 } else {
 TCGLabel *over = gen_new_label();
-tcg_out_jxx(s, tcg_cond_to_jcc[tcg_invert_cond(cond)], over, 1);
+tcg_out_jxx(s, jcc ^ 1, over, 1);
 tcg_out_mov(s, TCG_TYPE_I32, dest, v1);
 tcg_out_label(s, over);
 }
@@ -1717,7 +1717,7 @@ static void tcg_out_movcond(TCGContext *s, int rexw, 
TCGCond cond,
 TCGReg v1)
 {
 tcg_out_cmp(s, c1, c2, const_c2, rexw);
-tcg_out_cmov(s, cond, rexw, dest, v1);
+tcg_out_cmov(s, tcg_cond_to_jcc[cond], rexw, dest, v1);
 }
 
 static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
@@ -1729,12 +1729,12 @@ static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg 
dest, TCGReg arg1,
 tcg_debug_assert(arg2 == (rexw ? 64 : 32));
 } else {
 tcg_debug_assert(dest != arg2);
-tcg_out_cmov(s, TCG_COND_LTU, rexw, dest, arg2);
+tcg_out_cmov(s, JCC_JB, rexw, dest, arg2);
 }
 } else {
 tcg_debug_assert(dest != arg2);
 tcg_out_modrm(s, OPC_BSF + rexw, dest, arg1);
-tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
+tcg_out_cmov(s, JCC_JE, rexw, dest, arg2);
 }
 }
 
@@ -1747,7 +1747,7 @@ static void tcg_out_clz(TCGContext *s, int rexw, TCGReg 
dest, TCGReg arg1,
 tcg_debug_assert(arg2 == (rexw ? 64 : 32));
 } else {
 tcg_debug_assert(dest != arg2);
-tcg_out_cmov(s, TCG_COND_LTU, rexw, dest, arg2);
+tcg_out_cmov(s, JCC_JB, rexw, dest, arg2);
 }
 } else {
 tcg_debug_assert(!const_a2);
@@ -1760,7 +1760,7 @@ static void tcg_out_clz(TCGContext *s, int rexw, TCGReg 
dest, TCGReg arg1,
 
 /* Since we have destroyed the flags from BSR, we have to re-test.  */
 tcg_out_cmp(s, arg1, 0, 1, rexw);
-tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
+tcg_out_cmov(s, JCC_JE, rexw, dest, arg2);
 }
 }
 
-- 
2.34.1




[PATCH v3 14/38] target/s390x: Use TCG_COND_TSTNE for CC_OP_{TM,ICM}

2024-01-10 Thread Richard Henderson
These are all test-and-compare type instructions.
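A minimal model of why the mapping holds (illustrative, not QEMU code): TEST UNDER MASK sets cc0 exactly when all selected bits are zero and a nonzero cc otherwise, so a branch on cc-mask 8 is `(src & dst) == 0` and a branch on cc-mask 4|2|1 is `(src & dst) != 0` — precisely TCG_COND_TSTEQ and TCG_COND_TSTNE on the raw cc_src/cc_dst pair:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch: is a branch-on-condition after TM taken, for the two cc
 * masks the patch special-cases?  Other masks distinguish mixed vs
 * all-ones results and stay on the dynamic cc path. */
bool tm_branch_taken(uint64_t src, uint64_t mask, unsigned ccmask)
{
    bool all_zero = (src & mask) == 0;

    if (ccmask == 8) {
        return all_zero;             /* maps to TCG_COND_TSTEQ */
    }
    if (ccmask == (4 | 2 | 1)) {
        return !all_zero;            /* maps to TCG_COND_TSTNE */
    }
    return false;
}
```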

Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/translate.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 62ab2be8b1..ae4e7b27ec 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -754,10 +754,10 @@ static void disas_jcc(DisasContext *s, DisasCompare *c, 
uint32_t mask)
 case CC_OP_TM_64:
 switch (mask) {
 case 8:
-cond = TCG_COND_EQ;
+cond = TCG_COND_TSTEQ;
 break;
 case 4 | 2 | 1:
-cond = TCG_COND_NE;
+cond = TCG_COND_TSTNE;
 break;
 default:
 goto do_dynamic;
@@ -768,11 +768,11 @@ static void disas_jcc(DisasContext *s, DisasCompare *c, 
uint32_t mask)
 case CC_OP_ICM:
 switch (mask) {
 case 8:
-cond = TCG_COND_EQ;
+cond = TCG_COND_TSTEQ;
 break;
 case 4 | 2 | 1:
 case 4 | 2:
-cond = TCG_COND_NE;
+cond = TCG_COND_TSTNE;
 break;
 default:
 goto do_dynamic;
@@ -854,18 +854,14 @@ static void disas_jcc(DisasContext *s, DisasCompare *c, 
uint32_t mask)
 c->u.s64.a = cc_dst;
 c->u.s64.b = tcg_constant_i64(0);
 break;
+
 case CC_OP_LTGT_64:
 case CC_OP_LTUGTU_64:
-c->u.s64.a = cc_src;
-c->u.s64.b = cc_dst;
-break;
-
 case CC_OP_TM_32:
 case CC_OP_TM_64:
 case CC_OP_ICM:
-c->u.s64.a = tcg_temp_new_i64();
-c->u.s64.b = tcg_constant_i64(0);
-tcg_gen_and_i64(c->u.s64.a, cc_src, cc_dst);
+c->u.s64.a = cc_src;
+c->u.s64.b = cc_dst;
 break;
 
 case CC_OP_ADDU:
-- 
2.34.1




[PATCH v3 21/38] tcg/arm: Support TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
Message-Id: <20231028194522.245170-12-richard.hender...@linaro.org>
[PMD: Split from bigger patch, part 2/2]
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20231108145244.72421-2-phi...@linaro.org>
---
 tcg/arm/tcg-target.h |  2 +-
 tcg/arm/tcg-target.c.inc | 29 -
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 7bf42045a7..a43875cb09 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -125,7 +125,7 @@ extern bool use_neon_instructions;
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
-#define TCG_TARGET_HAS_tst  0
+#define TCG_TARGET_HAS_tst  1
 
 #define TCG_TARGET_HAS_v64  use_neon_instructions
 #define TCG_TARGET_HAS_v128 use_neon_instructions
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 66d71af8bf..0fc7273b16 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1194,7 +1194,27 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 static TCGCond tcg_out_cmp(TCGContext *s, TCGCond cond, TCGReg a,
TCGArg b, int b_const)
 {
-tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0, a, b, b_const);
+if (!is_tst_cond(cond)) {
+tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0, a, b, b_const);
+return cond;
+}
+
+cond = tcg_tst_eqne_cond(cond);
+if (b_const) {
+int imm12 = encode_imm(b);
+
+/*
+ * The compare constraints allow rIN, but TST does not support N.
+ * Be prepared to load the constant into a scratch register.
+ */
+if (imm12 >= 0) {
+tcg_out_dat_imm(s, COND_AL, ARITH_TST, 0, a, imm12);
+return cond;
+}
+tcg_out_movi32(s, COND_AL, TCG_REG_TMP, b);
+b = TCG_REG_TMP;
+}
+tcg_out_dat_reg(s, COND_AL, ARITH_TST, 0, a, b, SHIFT_IMM_LSL(0));
 return cond;
 }
 
@@ -1225,6 +1245,13 @@ static TCGCond tcg_out_cmp2(TCGContext *s, const TCGArg 
*args,
 tcg_out_dat_rI(s, COND_EQ, ARITH_CMP, 0, al, bl, const_bl);
 return cond;
 
+case TCG_COND_TSTEQ:
+case TCG_COND_TSTNE:
+/* Similar, but with TST instead of CMP. */
+tcg_out_dat_rI(s, COND_AL, ARITH_TST, 0, ah, bh, const_bh);
+tcg_out_dat_rI(s, COND_EQ, ARITH_TST, 0, al, bl, const_bl);
+return tcg_tst_eqne_cond(cond);
+
 case TCG_COND_LT:
 case TCG_COND_GE:
 /* We perform a double-word subtraction and examine the result.
-- 
2.34.1




[PATCH v3 32/38] tcg/ppc: Tidy up tcg_target_const_match

2024-01-10 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index b9323baa86..26e0bc31d7 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -282,31 +282,36 @@ static bool reloc_pc34(tcg_insn_unit *src_rw, const 
tcg_insn_unit *target)
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, int ct,
+static bool tcg_target_const_match(int64_t sval, int ct,
TCGType type, TCGCond cond, int vece)
 {
+uint64_t uval = sval;
+
 if (ct & TCG_CT_CONST) {
 return 1;
 }
 
-/* The only 32-bit constraint we use aside from
-   TCG_CT_CONST is TCG_CT_CONST_S16.  */
 if (type == TCG_TYPE_I32) {
-val = (int32_t)val;
+uval = (uint32_t)sval;
+sval = (int32_t)sval;
 }
 
-if ((ct & TCG_CT_CONST_S16) && val == (int16_t)val) {
+if ((ct & TCG_CT_CONST_S16) && sval == (int16_t)sval) {
 return 1;
-} else if ((ct & TCG_CT_CONST_S32) && val == (int32_t)val) {
+}
+if ((ct & TCG_CT_CONST_S32) && sval == (int32_t)sval) {
 return 1;
-} else if ((ct & TCG_CT_CONST_U32) && val == (uint32_t)val) {
+}
+if ((ct & TCG_CT_CONST_U32) && uval == (uint32_t)uval) {
 return 1;
-} else if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
+}
+if ((ct & TCG_CT_CONST_ZERO) && sval == 0) {
 return 1;
-} else if ((ct & TCG_CT_CONST_MONE) && val == -1) {
+}
+if ((ct & TCG_CT_CONST_MONE) && sval == -1) {
 return 1;
-} else if ((ct & TCG_CT_CONST_WSZ)
-   && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
+}
+if ((ct & TCG_CT_CONST_WSZ) && sval == (type == TCG_TYPE_I32 ? 32 : 64)) {
 return 1;
 }
 return 0;
-- 
2.34.1




[PATCH v3 11/38] target/alpha: Use TCG_COND_TSTNE for gen_fold_mzero

2024-01-10 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/alpha/translate.c | 49 +++-
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index c7daf46de7..c68c2bcd21 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -490,56 +490,53 @@ static DisasJumpType gen_bcond(DisasContext *ctx, TCGCond cond, int ra,
 
 /* Fold -0.0 for comparison with COND.  */
 
-static void gen_fold_mzero(TCGCond cond, TCGv dest, TCGv src)
+static TCGv_i64 gen_fold_mzero(TCGCond *pcond, uint64_t *pimm, TCGv_i64 src)
 {
-uint64_t mzero = 1ull << 63;
+TCGv_i64 tmp;
 
-switch (cond) {
+*pimm = 0;
+switch (*pcond) {
 case TCG_COND_LE:
 case TCG_COND_GT:
 /* For <= or >, the -0.0 value directly compares the way we want.  */
-tcg_gen_mov_i64(dest, src);
-break;
+return src;
 
 case TCG_COND_EQ:
 case TCG_COND_NE:
-/* For == or !=, we can simply mask off the sign bit and compare.  */
-tcg_gen_andi_i64(dest, src, mzero - 1);
-break;
+/* For == or !=, we can compare without the sign bit. */
+*pcond = *pcond == TCG_COND_EQ ? TCG_COND_TSTEQ : TCG_COND_TSTNE;
+*pimm = INT64_MAX;
+return src;
 
 case TCG_COND_GE:
 case TCG_COND_LT:
 /* For >= or <, map -0.0 to +0.0. */
-tcg_gen_movcond_i64(TCG_COND_NE, dest, src, tcg_constant_i64(mzero),
-src, tcg_constant_i64(0));
-break;
+tmp = tcg_temp_new_i64();
+tcg_gen_movcond_i64(TCG_COND_EQ, tmp,
+src, tcg_constant_i64(INT64_MIN),
+tcg_constant_i64(0), src);
+return tmp;
 
 default:
-abort();
+g_assert_not_reached();
 }
 }
 
 static DisasJumpType gen_fbcond(DisasContext *ctx, TCGCond cond, int ra,
 int32_t disp)
 {
-TCGv cmp_tmp = tcg_temp_new();
-DisasJumpType ret;
-
-gen_fold_mzero(cond, cmp_tmp, load_fpr(ctx, ra));
-ret = gen_bcond_internal(ctx, cond, cmp_tmp, 0, disp);
-return ret;
+uint64_t imm;
+TCGv_i64 tmp = gen_fold_mzero(&cond, &imm, load_fpr(ctx, ra));
+return gen_bcond_internal(ctx, cond, tmp, imm, disp);
 }
 
 static void gen_fcmov(DisasContext *ctx, TCGCond cond, int ra, int rb, int rc)
 {
-TCGv_i64 va, vb, z;
-
-z = load_zero(ctx);
-vb = load_fpr(ctx, rb);
-va = tcg_temp_new();
-gen_fold_mzero(cond, va, load_fpr(ctx, ra));
-
-tcg_gen_movcond_i64(cond, dest_fpr(ctx, rc), va, z, vb, load_fpr(ctx, rc));
+uint64_t imm;
+TCGv_i64 tmp = gen_fold_mzero(&cond, &imm, load_fpr(ctx, ra));
+tcg_gen_movcond_i64(cond, dest_fpr(ctx, rc),
+tmp, tcg_constant_i64(imm),
+load_fpr(ctx, rb), load_fpr(ctx, rc));
 }
 
 #define QUAL_RM_N   0x080   /* Round mode nearest even */
-- 
2.34.1




[PATCH v3 10/38] target/alpha: Use TCG_COND_TST{EQ, NE} for CMOVLB{C, S}

2024-01-10 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/alpha/translate.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 49e6a7b62d..c7daf46de7 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -1676,16 +1676,12 @@ static DisasJumpType translate_one(DisasContext *ctx, uint32_t insn)
 break;
 case 0x14:
 /* CMOVLBS */
-tmp = tcg_temp_new();
-tcg_gen_andi_i64(tmp, va, 1);
-tcg_gen_movcond_i64(TCG_COND_NE, vc, tmp, load_zero(ctx),
+tcg_gen_movcond_i64(TCG_COND_TSTNE, vc, va, tcg_constant_i64(1),
 vb, load_gpr(ctx, rc));
 break;
 case 0x16:
 /* CMOVLBC */
-tmp = tcg_temp_new();
-tcg_gen_andi_i64(tmp, va, 1);
-tcg_gen_movcond_i64(TCG_COND_EQ, vc, tmp, load_zero(ctx),
+tcg_gen_movcond_i64(TCG_COND_TSTEQ, vc, va, tcg_constant_i64(1),
 vb, load_gpr(ctx, rc));
 break;
 case 0x20:
-- 
2.34.1




[PATCH v3 31/38] tcg/ppc: Use cr0 in tcg_to_bc and tcg_to_isel

2024-01-10 Thread Richard Henderson
Using cr0 means we could choose to use rc=1 to compute the condition.
Adjust the tables, and the tcg_out_cmp calls that feed them.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 68 
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 830d2fe73a..b9323baa86 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -671,30 +671,30 @@ enum {
 };
 
 static const uint32_t tcg_to_bc[] = {
-[TCG_COND_EQ]  = BC | BI(7, CR_EQ) | BO_COND_TRUE,
-[TCG_COND_NE]  = BC | BI(7, CR_EQ) | BO_COND_FALSE,
-[TCG_COND_LT]  = BC | BI(7, CR_LT) | BO_COND_TRUE,
-[TCG_COND_GE]  = BC | BI(7, CR_LT) | BO_COND_FALSE,
-[TCG_COND_LE]  = BC | BI(7, CR_GT) | BO_COND_FALSE,
-[TCG_COND_GT]  = BC | BI(7, CR_GT) | BO_COND_TRUE,
-[TCG_COND_LTU] = BC | BI(7, CR_LT) | BO_COND_TRUE,
-[TCG_COND_GEU] = BC | BI(7, CR_LT) | BO_COND_FALSE,
-[TCG_COND_LEU] = BC | BI(7, CR_GT) | BO_COND_FALSE,
-[TCG_COND_GTU] = BC | BI(7, CR_GT) | BO_COND_TRUE,
+[TCG_COND_EQ]  = BC | BI(0, CR_EQ) | BO_COND_TRUE,
+[TCG_COND_NE]  = BC | BI(0, CR_EQ) | BO_COND_FALSE,
+[TCG_COND_LT]  = BC | BI(0, CR_LT) | BO_COND_TRUE,
+[TCG_COND_GE]  = BC | BI(0, CR_LT) | BO_COND_FALSE,
+[TCG_COND_LE]  = BC | BI(0, CR_GT) | BO_COND_FALSE,
+[TCG_COND_GT]  = BC | BI(0, CR_GT) | BO_COND_TRUE,
+[TCG_COND_LTU] = BC | BI(0, CR_LT) | BO_COND_TRUE,
+[TCG_COND_GEU] = BC | BI(0, CR_LT) | BO_COND_FALSE,
+[TCG_COND_LEU] = BC | BI(0, CR_GT) | BO_COND_FALSE,
+[TCG_COND_GTU] = BC | BI(0, CR_GT) | BO_COND_TRUE,
 };
 
 /* The low bit here is set if the RA and RB fields must be inverted.  */
 static const uint32_t tcg_to_isel[] = {
-[TCG_COND_EQ]  = ISEL | BC_(7, CR_EQ),
-[TCG_COND_NE]  = ISEL | BC_(7, CR_EQ) | 1,
-[TCG_COND_LT]  = ISEL | BC_(7, CR_LT),
-[TCG_COND_GE]  = ISEL | BC_(7, CR_LT) | 1,
-[TCG_COND_LE]  = ISEL | BC_(7, CR_GT) | 1,
-[TCG_COND_GT]  = ISEL | BC_(7, CR_GT),
-[TCG_COND_LTU] = ISEL | BC_(7, CR_LT),
-[TCG_COND_GEU] = ISEL | BC_(7, CR_LT) | 1,
-[TCG_COND_LEU] = ISEL | BC_(7, CR_GT) | 1,
-[TCG_COND_GTU] = ISEL | BC_(7, CR_GT),
+[TCG_COND_EQ]  = ISEL | BC_(0, CR_EQ),
+[TCG_COND_NE]  = ISEL | BC_(0, CR_EQ) | 1,
+[TCG_COND_LT]  = ISEL | BC_(0, CR_LT),
+[TCG_COND_GE]  = ISEL | BC_(0, CR_LT) | 1,
+[TCG_COND_LE]  = ISEL | BC_(0, CR_GT) | 1,
+[TCG_COND_GT]  = ISEL | BC_(0, CR_GT),
+[TCG_COND_LTU] = ISEL | BC_(0, CR_LT),
+[TCG_COND_GEU] = ISEL | BC_(0, CR_LT) | 1,
+[TCG_COND_LEU] = ISEL | BC_(0, CR_GT) | 1,
+[TCG_COND_GTU] = ISEL | BC_(0, CR_GT),
 };
 
 static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
@@ -1827,7 +1827,7 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 if (have_isa_3_10) {
 tcg_insn_unit bi, opc;
 
-tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 7, type);
+tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 0, type);
 
 /* Re-use tcg_to_bc for BI and BO_COND_{TRUE,FALSE}. */
 bi = tcg_to_bc[cond] & (0x1f << 16);
@@ -1880,7 +1880,7 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 if (have_isel) {
 int isel, tab;
 
-tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 7, type);
+tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 0, type);
 
 isel = tcg_to_isel[cond];
 
@@ -1966,7 +1966,7 @@ static void tcg_out_brcond(TCGContext *s, TCGCond cond,
TCGArg arg1, TCGArg arg2, int const_arg2,
TCGLabel *l, TCGType type)
 {
-tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 7, type);
+tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 0, type);
 tcg_out_bc_lab(s, cond, l);
 }
 
@@ -1980,7 +1980,7 @@ static void tcg_out_movcond(TCGContext *s, TCGType type, TCGCond cond,
 return;
 }
 
-tcg_out_cmp(s, cond, c1, c2, const_c2, 7, type);
+tcg_out_cmp(s, cond, c1, c2, const_c2, 0, type);
 
 if (have_isel) {
 int isel = tcg_to_isel[cond];
@@ -2024,7 +2024,7 @@ static void tcg_out_cntxz(TCGContext *s, TCGType type, uint32_t opc,
 if (const_a2 && a2 == (type == TCG_TYPE_I32 ? 32 : 64)) {
 tcg_out32(s, opc | RA(a0) | RS(a1));
 } else {
-tcg_out_cmp(s, TCG_COND_EQ, a1, 0, 1, 7, type);
+tcg_out_cmp(s, TCG_COND_EQ, a1, 0, 1, 0, type);
 /* Note that the only other valid constant for a2 is 0.  */
 if (have_isel) {
 tcg_out32(s, opc | RA(TCG_REG_R0) | RS(a1));
@@ -2079,7 +2079,7 @@ static void tcg_out_cmp2(TCGContext *s, const TCGArg *args,
 do_equality:
 tcg_out_cmp(s, cond, al, bl, blconst, 6, TCG_TYPE_I32);
 tcg_out_cmp(s, cond, ah, bh, bhconst, 7, TCG_TYPE_I32);
-tcg_out32(s, op | BT(7, CR_EQ) | BA(6, CR_EQ) | BB(7, CR_EQ));
+tcg_out32(s, op | BT(0, CR_EQ) | BA(6, CR_EQ) | BB(7, CR_EQ));
 break;
 
 

[PATCH v3 12/38] target/m68k: Use TCG_COND_TST{EQ, NE} in gen_fcc_cond

2024-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/m68k/translate.c | 74 ++---
 1 file changed, 33 insertions(+), 41 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 4a0b0b2703..f30b92f2d4 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -5129,46 +5129,44 @@ undef:
 static void gen_fcc_cond(DisasCompare *c, DisasContext *s, int cond)
 {
 TCGv fpsr;
+int imm = 0;
 
-c->v2 = tcg_constant_i32(0);
 /* TODO: Raise BSUN exception.  */
 fpsr = tcg_temp_new();
 gen_load_fcr(s, fpsr, M68K_FPSR);
+c->v1 = fpsr;
+
 switch (cond) {
 case 0:  /* False */
 case 16: /* Signaling False */
-c->v1 = c->v2;
 c->tcond = TCG_COND_NEVER;
 break;
 case 1:  /* EQual Z */
 case 17: /* Signaling EQual Z */
-c->v1 = tcg_temp_new();
-tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_Z);
-c->tcond = TCG_COND_NE;
+imm = FPSR_CC_Z;
+c->tcond = TCG_COND_TSTNE;
 break;
 case 2:  /* Ordered Greater Than !(A || Z || N) */
 case 18: /* Greater Than !(A || Z || N) */
-c->v1 = tcg_temp_new();
-tcg_gen_andi_i32(c->v1, fpsr,
- FPSR_CC_A | FPSR_CC_Z | FPSR_CC_N);
-c->tcond = TCG_COND_EQ;
+imm = FPSR_CC_A | FPSR_CC_Z | FPSR_CC_N;
+c->tcond = TCG_COND_TSTEQ;
 break;
 case 3:  /* Ordered Greater than or Equal Z || !(A || N) */
 case 19: /* Greater than or Equal Z || !(A || N) */
 c->v1 = tcg_temp_new();
 tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_A);
 tcg_gen_shli_i32(c->v1, c->v1, ctz32(FPSR_CC_N) - ctz32(FPSR_CC_A));
-tcg_gen_andi_i32(fpsr, fpsr, FPSR_CC_Z | FPSR_CC_N);
 tcg_gen_or_i32(c->v1, c->v1, fpsr);
 tcg_gen_xori_i32(c->v1, c->v1, FPSR_CC_N);
-c->tcond = TCG_COND_NE;
+imm = FPSR_CC_Z | FPSR_CC_N;
+c->tcond = TCG_COND_TSTNE;
 break;
 case 4:  /* Ordered Less Than !(!N || A || Z); */
 case 20: /* Less Than !(!N || A || Z); */
 c->v1 = tcg_temp_new();
 tcg_gen_xori_i32(c->v1, fpsr, FPSR_CC_N);
-tcg_gen_andi_i32(c->v1, c->v1, FPSR_CC_N | FPSR_CC_A | FPSR_CC_Z);
-c->tcond = TCG_COND_EQ;
+imm = FPSR_CC_N | FPSR_CC_A | FPSR_CC_Z;
+c->tcond = TCG_COND_TSTEQ;
 break;
 case 5:  /* Ordered Less than or Equal Z || (N && !A) */
 case 21: /* Less than or Equal Z || (N && !A) */
@@ -5176,49 +5174,45 @@ static void gen_fcc_cond(DisasCompare *c, DisasContext *s, int cond)
 tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_A);
 tcg_gen_shli_i32(c->v1, c->v1, ctz32(FPSR_CC_N) - ctz32(FPSR_CC_A));
 tcg_gen_andc_i32(c->v1, fpsr, c->v1);
-tcg_gen_andi_i32(c->v1, c->v1, FPSR_CC_Z | FPSR_CC_N);
-c->tcond = TCG_COND_NE;
+imm = FPSR_CC_Z | FPSR_CC_N;
+c->tcond = TCG_COND_TSTNE;
 break;
 case 6:  /* Ordered Greater or Less than !(A || Z) */
 case 22: /* Greater or Less than !(A || Z) */
-c->v1 = tcg_temp_new();
-tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_A | FPSR_CC_Z);
-c->tcond = TCG_COND_EQ;
+imm = FPSR_CC_A | FPSR_CC_Z;
+c->tcond = TCG_COND_TSTEQ;
 break;
 case 7:  /* Ordered !A */
 case 23: /* Greater, Less or Equal !A */
-c->v1 = tcg_temp_new();
-tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_A);
-c->tcond = TCG_COND_EQ;
+imm = FPSR_CC_A;
+c->tcond = TCG_COND_TSTEQ;
 break;
 case 8:  /* Unordered A */
 case 24: /* Not Greater, Less or Equal A */
-c->v1 = tcg_temp_new();
-tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_A);
-c->tcond = TCG_COND_NE;
+imm = FPSR_CC_A;
+c->tcond = TCG_COND_TSTNE;
 break;
 case 9:  /* Unordered or Equal A || Z */
 case 25: /* Not Greater or Less then A || Z */
-c->v1 = tcg_temp_new();
-tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_A | FPSR_CC_Z);
-c->tcond = TCG_COND_NE;
+imm = FPSR_CC_A | FPSR_CC_Z;
+c->tcond = TCG_COND_TSTNE;
 break;
 case 10: /* Unordered or Greater Than A || !(N || Z)) */
 case 26: /* Not Less or Equal A || !(N || Z)) */
 c->v1 = tcg_temp_new();
 tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_Z);
 tcg_gen_shli_i32(c->v1, c->v1, ctz32(FPSR_CC_N) - ctz32(FPSR_CC_Z));
-tcg_gen_andi_i32(fpsr, fpsr, FPSR_CC_A | FPSR_CC_N);
 tcg_gen_or_i32(c->v1, c->v1, fpsr);
 tcg_gen_xori_i32(c->v1, c->v1, FPSR_CC_N);
-c->tcond = TCG_COND_NE;
+imm = FPSR_CC_A | FPSR_CC_N;
+c->tcond = TCG_COND_TSTNE;
 break;
 case 11: /* Unordered or Greater or Equal A || Z || !N */
 case 27: /* Not Less Than A || Z || !N */
 c->v1 = tcg_temp_new();
-tcg_gen_andi_i32(c->v1, fpsr, FPSR_CC_A | FPSR_CC_Z | FPSR_CC_N);
-tcg_gen_xori_i32(c->v1, c->v1, FPSR_CC_N);
-c->tcond = 

[PATCH v3 02/38] tcg: Introduce TCG_TARGET_HAS_tst

2024-01-10 Thread Richard Henderson
Define as 0 for all tcg backends.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h | 2 ++
 tcg/arm/tcg-target.h | 2 ++
 tcg/i386/tcg-target.h| 2 ++
 tcg/loongarch64/tcg-target.h | 2 ++
 tcg/mips/tcg-target.h| 2 ++
 tcg/ppc/tcg-target.h | 2 ++
 tcg/riscv/tcg-target.h   | 2 ++
 tcg/s390x/tcg-target.h   | 2 ++
 tcg/sparc64/tcg-target.h | 2 ++
 tcg/tci/tcg-target.h | 2 ++
 10 files changed, 20 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 33f15a564a..b4ac13be7b 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -138,6 +138,8 @@ typedef enum {
 #define TCG_TARGET_HAS_qemu_ldst_i128   1
 #endif
 
+#define TCG_TARGET_HAS_tst  0
+
 #define TCG_TARGET_HAS_v64  1
 #define TCG_TARGET_HAS_v128 1
 #define TCG_TARGET_HAS_v256 0
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index a712cc80ad..7bf42045a7 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -125,6 +125,8 @@ extern bool use_neon_instructions;
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
+#define TCG_TARGET_HAS_tst  0
+
 #define TCG_TARGET_HAS_v64  use_neon_instructions
 #define TCG_TARGET_HAS_v128 use_neon_instructions
 #define TCG_TARGET_HAS_v256 0
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index fa34deec47..1dd917a680 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -198,6 +198,8 @@ typedef enum {
 #define TCG_TARGET_HAS_qemu_ldst_i128 \
 (TCG_TARGET_REG_BITS == 64 && (cpuinfo & CPUINFO_ATOMIC_VMOVDQA))
 
+#define TCG_TARGET_HAS_tst  0
+
 /* We do not support older SSE systems, only beginning with AVX1.  */
 #define TCG_TARGET_HAS_v64  have_avx1
 #define TCG_TARGET_HAS_v128 have_avx1
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 9c70ebfefc..fede627bf7 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -169,6 +169,8 @@ typedef enum {
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   (cpuinfo & CPUINFO_LSX)
 
+#define TCG_TARGET_HAS_tst  0
+
 #define TCG_TARGET_HAS_v64  0
 #define TCG_TARGET_HAS_v128 (cpuinfo & CPUINFO_LSX)
 #define TCG_TARGET_HAS_v256 0
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index b98ffae1d0..a996aa171d 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -194,6 +194,8 @@ extern bool use_mips32r2_instructions;
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
+#define TCG_TARGET_HAS_tst  0
+
 #define TCG_TARGET_DEFAULT_MO   0
 #define TCG_TARGET_NEED_LDST_LABELS
 #define TCG_TARGET_NEED_POOL_LABELS
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 5295e4f9ab..60ce49e672 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -143,6 +143,8 @@ typedef enum {
 #define TCG_TARGET_HAS_qemu_ldst_i128   \
 (TCG_TARGET_REG_BITS == 64 && have_isa_2_07)
 
+#define TCG_TARGET_HAS_tst  0
+
 /*
  * While technically Altivec could support V64, it has no 64-bit store
  * instruction and substituting two 32-bit stores makes the generated
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index a4edc3dc74..2c1b680b93 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -158,6 +158,8 @@ extern bool have_zbb;
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
+#define TCG_TARGET_HAS_tst  0
+
 #define TCG_TARGET_DEFAULT_MO (0)
 
 #define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e69b0d2ddd..53bed8c8d2 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -138,6 +138,8 @@ extern uint64_t s390_facilities[3];
 
 #define TCG_TARGET_HAS_qemu_ldst_i128 1
 
+#define TCG_TARGET_HAS_tst0
+
 #define TCG_TARGET_HAS_v64HAVE_FACILITY(VECTOR)
 #define TCG_TARGET_HAS_v128   HAVE_FACILITY(VECTOR)
 #define TCG_TARGET_HAS_v256   0
diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index f8cf145266..ae2910c4ee 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -149,6 +149,8 @@ extern bool use_vis3_instructions;
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
+#define TCG_TARGET_HAS_tst  0
+
 #define TCG_AREG0 TCG_REG_I0
 
 #define TCG_TARGET_DEFAULT_MO (0)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 2a13816c8e..609b2f4e4a 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -117,6 +117,8 @@
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
+#define TCG_TARGET_HAS_tst  0
+
 /* Number of registers available. */
 #define TCG_TARGET_NB_REGS 16
 
-- 
2.34.1




[PATCH v3 16/38] tcg: Add TCGConst argument to tcg_target_const_match

2024-01-10 Thread Richard Henderson
Fill the new argument from any condition within the opcode.
Not yet used within any backend.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c| 34 ++--
 tcg/aarch64/tcg-target.c.inc |  3 ++-
 tcg/arm/tcg-target.c.inc |  3 ++-
 tcg/i386/tcg-target.c.inc|  3 ++-
 tcg/loongarch64/tcg-target.c.inc |  3 ++-
 tcg/mips/tcg-target.c.inc|  3 ++-
 tcg/ppc/tcg-target.c.inc |  3 ++-
 tcg/riscv/tcg-target.c.inc   |  3 ++-
 tcg/s390x/tcg-target.c.inc   |  3 ++-
 tcg/sparc64/tcg-target.c.inc |  3 ++-
 tcg/tci/tcg-target.c.inc |  3 ++-
 11 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 2f4522488a..4169ce89a4 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -173,7 +173,8 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
 static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target,
  const TCGHelperInfo *info);
 static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot);
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece);
+static bool tcg_target_const_match(int64_t val, int ct,
+   TCGType type, TCGCond cond, int vece);
 #ifdef TCG_TARGET_NEED_LDST_LABELS
 static int tcg_out_ldst_finalize(TCGContext *s);
 #endif
@@ -4786,6 +4787,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 TCGTemp *ts;
 TCGArg new_args[TCG_MAX_OP_ARGS];
 int const_args[TCG_MAX_OP_ARGS];
+TCGCond op_cond;
 
 nb_oargs = def->nb_oargs;
 nb_iargs = def->nb_iargs;
@@ -4798,6 +4800,33 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 i_allocated_regs = s->reserved_regs;
 o_allocated_regs = s->reserved_regs;
 
+switch (op->opc) {
+case INDEX_op_brcond_i32:
+case INDEX_op_brcond_i64:
+op_cond = op->args[2];
+break;
+case INDEX_op_setcond_i32:
+case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
+case INDEX_op_cmp_vec:
+op_cond = op->args[3];
+break;
+case INDEX_op_brcond2_i32:
+op_cond = op->args[4];
+break;
+case INDEX_op_movcond_i32:
+case INDEX_op_movcond_i64:
+case INDEX_op_setcond2_i32:
+case INDEX_op_cmpsel_vec:
+op_cond = op->args[5];
+break;
+default:
+/* No condition within opcode. */
+op_cond = TCG_COND_ALWAYS;
+break;
+}
+
 /* satisfy input constraints */
 for (k = 0; k < nb_iargs; k++) {
 TCGRegSet i_preferred_regs, i_required_regs;
@@ -4811,7 +4840,8 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 ts = arg_temp(arg);
 
 if (ts->val_type == TEMP_VAL_CONST
-&& tcg_target_const_match(ts->val, ts->type, arg_ct->ct, TCGOP_VECE(op))) {
+&& tcg_target_const_match(ts->val, arg_ct->ct, ts->type,
+  op_cond, TCGOP_VECE(op))) {
 /* constant is OK for instruction */
 const_args[i] = 1;
 new_args[i] = ts->val;
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index a3efa1e67a..420e4a35ea 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -270,7 +270,8 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8)
 }
 }
 
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
+static bool tcg_target_const_match(int64_t val, int ct,
+   TCGType type, TCGCond cond, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index fc78566494..0c29a3929b 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -501,7 +501,8 @@ static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8)
  * mov operand2: values represented with x << (2 * y), x < 0x100
  * add, sub, eor...: ditto
  */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
+static bool tcg_target_const_match(int64_t val, int ct,
+   TCGType type, TCGCond cond, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index d268199fc1..accaaa2660 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -195,7 +195,8 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
+static bool tcg_target_const_match(int64_t val, int ct,
+   TCGType type, TCGCond cond, int vece)
 {
 if (ct & TCG_CT_CONST) {
 return 1;
diff --git a/tcg/loongarch64/tcg-target.c.inc 

[PATCH v3 05/38] tcg/optimize: Do swap_commutative2 in do_constant_folding_cond2

2024-01-10 Thread Richard Henderson
Mirror the new do_constant_folding_cond1 by doing all
argument and condition adjustment within one helper.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 107 ++---
 1 file changed, 57 insertions(+), 50 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 9c04dba099..08a9280432 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -713,43 +713,6 @@ static int do_constant_folding_cond(TCGType type, TCGArg x,
 return -1;
 }
 
-/*
- * Return -1 if the condition can't be simplified,
- * and the result of the condition (0 or 1) if it can.
- */
-static int do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
-{
-TCGArg al = p1[0], ah = p1[1];
-TCGArg bl = p2[0], bh = p2[1];
-
-if (arg_is_const(bl) && arg_is_const(bh)) {
-tcg_target_ulong blv = arg_info(bl)->val;
-tcg_target_ulong bhv = arg_info(bh)->val;
-uint64_t b = deposit64(blv, 32, 32, bhv);
-
-if (arg_is_const(al) && arg_is_const(ah)) {
-tcg_target_ulong alv = arg_info(al)->val;
-tcg_target_ulong ahv = arg_info(ah)->val;
-uint64_t a = deposit64(alv, 32, 32, ahv);
-return do_constant_folding_cond_64(a, b, c);
-}
-if (b == 0) {
-switch (c) {
-case TCG_COND_LTU:
-return 0;
-case TCG_COND_GEU:
-return 1;
-default:
-break;
-}
-}
-}
-if (args_are_copies(al, bl) && args_are_copies(ah, bh)) {
-return do_constant_folding_cond_eq(c);
-}
-return -1;
-}
-
 /**
  * swap_commutative:
  * @dest: TCGArg of the destination argument, or NO_DEST.
@@ -796,6 +759,10 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 return false;
 }
 
+/*
+ * Return -1 if the condition can't be simplified,
+ * and the result of the condition (0 or 1) if it can.
+ */
 static int do_constant_folding_cond1(OptContext *ctx, TCGArg dest,
  TCGArg *p1, TCGArg *p2, TCGArg *pcond)
 {
@@ -813,6 +780,51 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGArg dest,
 return r;
 }
 
+static int do_constant_folding_cond2(OptContext *ctx, TCGArg *args)
+{
+TCGArg al, ah, bl, bh;
+TCGCond c;
+bool swap;
+
+swap = swap_commutative2(args, args + 2);
+c = args[4];
+if (swap) {
+args[4] = c = tcg_swap_cond(c);
+}
+
+al = args[0];
+ah = args[1];
+bl = args[2];
+bh = args[3];
+
+if (arg_is_const(bl) && arg_is_const(bh)) {
+tcg_target_ulong blv = arg_info(bl)->val;
+tcg_target_ulong bhv = arg_info(bh)->val;
+uint64_t b = deposit64(blv, 32, 32, bhv);
+
+if (arg_is_const(al) && arg_is_const(ah)) {
+tcg_target_ulong alv = arg_info(al)->val;
+tcg_target_ulong ahv = arg_info(ah)->val;
+uint64_t a = deposit64(alv, 32, 32, ahv);
+return do_constant_folding_cond_64(a, b, c);
+}
+if (b == 0) {
+switch (c) {
+case TCG_COND_LTU:
+return 0;
+case TCG_COND_GEU:
+return 1;
+default:
+break;
+}
+}
+}
+if (args_are_copies(al, bl) && args_are_copies(ah, bh)) {
+return do_constant_folding_cond_eq(c);
+}
+return -1;
+}
+
 static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args)
 {
 for (int i = 0; i < nb_args; i++) {
@@ -1225,15 +1237,13 @@ static bool fold_brcond(OptContext *ctx, TCGOp *op)
 
 static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 {
-TCGCond cond = op->args[4];
-TCGArg label = op->args[5];
+TCGCond cond;
+TCGArg label;
 int i, inv = 0;
 
-if (swap_commutative2(&op->args[0], &op->args[2])) {
-op->args[4] = cond = tcg_swap_cond(cond);
-}
-
-i = do_constant_folding_cond2(&op->args[0], &op->args[2], cond);
+i = do_constant_folding_cond2(ctx, &op->args[0]);
+cond = op->args[4];
+label = op->args[5];
 if (i >= 0) {
 goto do_brcond_const;
 }
@@ -1986,14 +1996,11 @@ static bool fold_negsetcond(OptContext *ctx, TCGOp *op)
 
 static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 {
-TCGCond cond = op->args[5];
+TCGCond cond;
 int i, inv = 0;
 
-if (swap_commutative2(&op->args[1], &op->args[3])) {
-op->args[5] = cond = tcg_swap_cond(cond);
-}
-
-i = do_constant_folding_cond2(&op->args[1], &op->args[3], cond);
+i = do_constant_folding_cond2(ctx, &op->args[1]);
+cond = op->args[5];
 if (i >= 0) {
 goto do_setcond_const;
 }
-- 
2.34.1




[PATCH v3 09/38] target/alpha: Use TCG_COND_TST{EQ,NE} for BLB{C,S}

2024-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
Message-Id: <20231028194522.245170-33-richard.hender...@linaro.org>
[PMD: Split from bigger patch, part 2/2]
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20231108205247.83234-2-phi...@linaro.org>
---
 target/alpha/translate.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 89e630a7cc..49e6a7b62d 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -482,10 +482,10 @@ static DisasJumpType gen_bcond_internal(DisasContext *ctx, TCGCond cond,
 }
 
 static DisasJumpType gen_bcond(DisasContext *ctx, TCGCond cond, int ra,
-   int32_t disp, int mask)
+   int32_t disp)
 {
 return gen_bcond_internal(ctx, cond, load_gpr(ctx, ra),
-  mask, disp);
+  is_tst_cond(cond), disp);
 }
 
 /* Fold -0.0 for comparison with COND.  */
@@ -2820,35 +2820,35 @@ static DisasJumpType translate_one(DisasContext *ctx, uint32_t insn)
 break;
 case 0x38:
 /* BLBC */
-ret = gen_bcond(ctx, TCG_COND_EQ, ra, disp21, 1);
+ret = gen_bcond(ctx, TCG_COND_TSTEQ, ra, disp21);
 break;
 case 0x39:
 /* BEQ */
-ret = gen_bcond(ctx, TCG_COND_EQ, ra, disp21, 0);
+ret = gen_bcond(ctx, TCG_COND_EQ, ra, disp21);
 break;
 case 0x3A:
 /* BLT */
-ret = gen_bcond(ctx, TCG_COND_LT, ra, disp21, 0);
+ret = gen_bcond(ctx, TCG_COND_LT, ra, disp21);
 break;
 case 0x3B:
 /* BLE */
-ret = gen_bcond(ctx, TCG_COND_LE, ra, disp21, 0);
+ret = gen_bcond(ctx, TCG_COND_LE, ra, disp21);
 break;
 case 0x3C:
 /* BLBS */
-ret = gen_bcond(ctx, TCG_COND_NE, ra, disp21, 1);
+ret = gen_bcond(ctx, TCG_COND_TSTNE, ra, disp21);
 break;
 case 0x3D:
 /* BNE */
-ret = gen_bcond(ctx, TCG_COND_NE, ra, disp21, 0);
+ret = gen_bcond(ctx, TCG_COND_NE, ra, disp21);
 break;
 case 0x3E:
 /* BGE */
-ret = gen_bcond(ctx, TCG_COND_GE, ra, disp21, 0);
+ret = gen_bcond(ctx, TCG_COND_GE, ra, disp21);
 break;
 case 0x3F:
 /* BGT */
-ret = gen_bcond(ctx, TCG_COND_GT, ra, disp21, 0);
+ret = gen_bcond(ctx, TCG_COND_GT, ra, disp21);
 break;
 invalid_opc:
 ret = gen_invalid(ctx);
-- 
2.34.1




[PATCH v3 13/38] target/sparc: Use TCG_COND_TSTEQ in gen_op_mulscc

2024-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/sparc/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 9387299559..b96633dde1 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -506,6 +506,7 @@ static void gen_op_subccc(TCGv dst, TCGv src1, TCGv src2)
 static void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
 {
 TCGv zero = tcg_constant_tl(0);
+TCGv one = tcg_constant_tl(1);
 TCGv t_src1 = tcg_temp_new();
 TCGv t_src2 = tcg_temp_new();
 TCGv t0 = tcg_temp_new();
@@ -517,8 +518,7 @@ static void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
  * if (!(env->y & 1))
  *   src2 = 0;
  */
-tcg_gen_andi_tl(t0, cpu_y, 0x1);
-tcg_gen_movcond_tl(TCG_COND_EQ, t_src2, t0, zero, zero, t_src2);
+tcg_gen_movcond_tl(TCG_COND_TSTEQ, t_src2, cpu_y, one, zero, t_src2);
 
 /*
  * b2 = src1 & 1;
-- 
2.34.1




[PATCH v3 03/38] tcg/optimize: Split out arg_is_const_val

2024-01-10 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 38 +++---
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f2d01654c5..73019b9996 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -124,11 +124,22 @@ static inline bool ts_is_const(TCGTemp *ts)
 return ts_info(ts)->is_const;
 }
 
+static inline bool ts_is_const_val(TCGTemp *ts, uint64_t val)
+{
+TempOptInfo *ti = ts_info(ts);
+return ti->is_const && ti->val == val;
+}
+
 static inline bool arg_is_const(TCGArg arg)
 {
 return ts_is_const(arg_temp(arg));
 }
 
+static inline bool arg_is_const_val(TCGArg arg, uint64_t val)
+{
+return ts_is_const_val(arg_temp(arg), val);
+}
+
 static inline bool ts_is_copy(TCGTemp *ts)
 {
 return ts_info(ts)->next_copy != ts;
@@ -689,7 +700,7 @@ static int do_constant_folding_cond(TCGType type, TCGArg x,
 }
 } else if (args_are_copies(x, y)) {
 return do_constant_folding_cond_eq(c);
-} else if (arg_is_const(y) && arg_info(y)->val == 0) {
+} else if (arg_is_const_val(y, 0)) {
 switch (c) {
 case TCG_COND_LTU:
 return 0;
@@ -954,7 +965,7 @@ static bool fold_to_not(OptContext *ctx, TCGOp *op, int idx)
 /* If the binary operation has first argument @i, fold to @i. */
 static bool fold_ix_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 {
-if (arg_is_const(op->args[1]) && arg_info(op->args[1])->val == i) {
+if (arg_is_const_val(op->args[1], i)) {
 return tcg_opt_gen_movi(ctx, op, op->args[0], i);
 }
 return false;
@@ -963,7 +974,7 @@ static bool fold_ix_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 /* If the binary operation has first argument @i, fold to NOT. */
 static bool fold_ix_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
 {
-if (arg_is_const(op->args[1]) && arg_info(op->args[1])->val == i) {
+if (arg_is_const_val(op->args[1], i)) {
 return fold_to_not(ctx, op, 2);
 }
 return false;
@@ -972,7 +983,7 @@ static bool fold_ix_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
 /* If the binary operation has second argument @i, fold to @i. */
 static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 {
-if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+if (arg_is_const_val(op->args[2], i)) {
 return tcg_opt_gen_movi(ctx, op, op->args[0], i);
 }
 return false;
@@ -981,7 +992,7 @@ static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 /* If the binary operation has second argument @i, fold to identity. */
 static bool fold_xi_to_x(OptContext *ctx, TCGOp *op, uint64_t i)
 {
-if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+if (arg_is_const_val(op->args[2], i)) {
 return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
 }
 return false;
@@ -990,7 +1001,7 @@ static bool fold_xi_to_x(OptContext *ctx, TCGOp *op, uint64_t i)
 /* If the binary operation has second argument @i, fold to NOT. */
 static bool fold_xi_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
 {
-if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+if (arg_is_const_val(op->args[2], i)) {
 return fold_to_not(ctx, op, 1);
 }
 return false;
@@ -1223,8 +1234,8 @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
  * Simplify LT/GE comparisons vs zero to a single compare
  * vs the high word of the input.
  */
-if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == 0 &&
-arg_is_const(op->args[3]) && arg_info(op->args[3])->val == 0) {
+if (arg_is_const_val(op->args[2], 0) &&
+arg_is_const_val(op->args[3], 0)) {
 goto do_brcond_high;
 }
 break;
@@ -1448,9 +1459,7 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
 }
 
 /* Inserting a value into zero at offset 0. */
-if (arg_is_const(op->args[1])
-&& arg_info(op->args[1])->val == 0
-&& op->args[3] == 0) {
+if (arg_is_const_val(op->args[1], 0) && op->args[3] == 0) {
 uint64_t mask = MAKE_64BIT_MASK(0, op->args[4]);
 
 op->opc = and_opc;
@@ -1461,8 +1470,7 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
 }
 
 /* Inserting zero into a value. */
-if (arg_is_const(op->args[2])
-&& arg_info(op->args[2])->val == 0) {
+if (arg_is_const_val(op->args[2], 0)) {
 uint64_t mask = deposit64(-1, op->args[3], op->args[4], 0);
 
 op->opc = and_opc;
@@ -2000,8 +2008,8 @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
  * Simplify LT/GE comparisons vs zero to a single compare
  * vs the high word of the input.
  */
-if (arg_is_const(op->args[3]) && arg_info(op->args[3])->val == 0 &&
-arg_is_const(op->args[4]) && arg_info(op->args[4])->val == 0) {
+if (arg_is_const_val(op->args[3], 0) &&
+arg_is_const_val(op->args[4], 0)) {
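The folds that arg_is_const_val() feeds are plain algebraic identities: a constant operand either absorbs the result or is neutral. A standalone sketch for intuition only, with invented names rather than QEMU's internal API:

```c
#include <stdint.h>

/* fold_xi_to_i style: a constant second operand that absorbs the result,
 * e.g. x AND 0 == 0, so the whole op can become "movi dst, 0". */
static int absorbs(uint64_t x)
{
    return (x & 0) == 0;
}

/* fold_xi_to_x style: a constant second operand that is neutral,
 * e.g. x AND -1 == x and x OR 0 == x, so the op can become "mov dst, x". */
static int neutral(uint64_t x)
{
    return (x & UINT64_MAX) == x && (x | 0) == x;
}
```

The helper only changes how these tests are spelled at the call sites; the identities themselves are unchanged.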

[PATCH v3 06/38] tcg/optimize: Handle TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Fold constant comparisons.
Canonicalize "tst x,x" to equality vs zero.
Canonicalize "tst x,sign" to sign test vs zero.
Fold double-word comparisons with zero parts.
Fold setcond of "tst x,pow2" to a bit extract.
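The canonicalizations listed above rest on bit-level identities that are easy to sanity-check outside TCG. A hedged sketch in plain C (illustrative only, not part of the patch):

```c
#include <stdint.h>

static int tstne(uint64_t a, uint64_t b) { return (a & b) != 0; }

/* "tst x,x" is equality vs zero: (x & x) != 0  <=>  x != 0 */
static int canon_self(uint64_t x) { return tstne(x, x) == (x != 0); }

/* "tst x,-1" likewise: (x & -1) != 0  <=>  x != 0 */
static int canon_ones(uint64_t x) { return tstne(x, UINT64_MAX) == (x != 0); }

/* "tst x,sign" is a signed sign test: (x & INT64_MIN) != 0  <=>  x < 0 */
static int canon_sign(int64_t x)
{
    return tstne((uint64_t)x, 1ULL << 63) == (x < 0);
}

/* setcond of "tst x,pow2" is a bit extract of bit k */
static int canon_pow2(uint64_t x, unsigned k)
{
    return tstne(x, 1ULL << k) == (int)((x >> k) & 1);
}
```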

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 240 -
 1 file changed, 218 insertions(+), 22 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 08a9280432..2ed6322f97 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -625,9 +635,15 @@ static bool do_constant_folding_cond_32(uint32_t x, uint32_t y, TCGCond c)
 return x <= y;
 case TCG_COND_GTU:
 return x > y;
-default:
-g_assert_not_reached();
+case TCG_COND_TSTEQ:
+return (x & y) == 0;
+case TCG_COND_TSTNE:
+return (x & y) != 0;
+case TCG_COND_ALWAYS:
+case TCG_COND_NEVER:
+break;
 }
+g_assert_not_reached();
 }
 
 static bool do_constant_folding_cond_64(uint64_t x, uint64_t y, TCGCond c)
@@ -653,12 +659,18 @@ static bool do_constant_folding_cond_64(uint64_t x, uint64_t y, TCGCond c)
 return x <= y;
 case TCG_COND_GTU:
 return x > y;
-default:
-g_assert_not_reached();
+case TCG_COND_TSTEQ:
+return (x & y) == 0;
+case TCG_COND_TSTNE:
+return (x & y) != 0;
+case TCG_COND_ALWAYS:
+case TCG_COND_NEVER:
+break;
 }
+g_assert_not_reached();
 }
 
-static bool do_constant_folding_cond_eq(TCGCond c)
+static int do_constant_folding_cond_eq(TCGCond c)
 {
 switch (c) {
 case TCG_COND_GT:
@@ -673,9 +685,14 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 case TCG_COND_LEU:
 case TCG_COND_EQ:
 return 1;
-default:
-g_assert_not_reached();
+case TCG_COND_TSTEQ:
+case TCG_COND_TSTNE:
+return -1;
+case TCG_COND_ALWAYS:
+case TCG_COND_NEVER:
+break;
 }
+g_assert_not_reached();
 }
 
 /*
@@ -703,8 +720,10 @@ static int do_constant_folding_cond(TCGType type, TCGArg x,
 } else if (arg_is_const_val(y, 0)) {
 switch (c) {
 case TCG_COND_LTU:
+case TCG_COND_TSTNE:
 return 0;
 case TCG_COND_GEU:
+case TCG_COND_TSTEQ:
 return 1;
 default:
 return -1;
@@ -777,7 +796,30 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGArg dest,
 }
 
 r = do_constant_folding_cond(ctx->type, *p1, *p2, cond);
-return r;
+if (r >= 0) {
+return r;
+}
+if (!is_tst_cond(cond)) {
+return -1;
+}
+
+/*
+ * TSTNE x,x -> NE x,0
+ * TSTNE x,-1 -> NE x,0
+ */
+if (args_are_copies(*p1, *p2) || arg_is_const_val(*p2, -1)) {
+*p2 = arg_new_constant(ctx, 0);
+*pcond = tcg_tst_eqne_cond(cond);
+return -1;
+}
+
+/* TSTNE x,sign -> LT x,0 */
+if (arg_is_const_val(*p2, (ctx->type == TCG_TYPE_I32
+   ? INT32_MIN : INT64_MIN))) {
+*p2 = arg_new_constant(ctx, 0);
+*pcond = tcg_tst_ltge_cond(cond);
+}
+return -1;
 }
 
 static int do_constant_folding_cond2(OptContext *ctx, TCGArg *args)
@@ -785,6 +827,7 @@ static int do_constant_folding_cond2(OptContext *ctx, TCGArg *args)
 TCGArg al, ah, bl, bh;
 TCGCond c;
 bool swap;
+int r;
 
 swap = swap_commutative2(args, args + 2);
 c = args[4];
@@ -806,21 +849,54 @@ static int do_constant_folding_cond2(OptContext *ctx, TCGArg *args)
 tcg_target_ulong alv = arg_info(al)->val;
 tcg_target_ulong ahv = arg_info(ah)->val;
 uint64_t a = deposit64(alv, 32, 32, ahv);
-return do_constant_folding_cond_64(a, b, c);
+
+r = do_constant_folding_cond_64(a, b, c);
+if (r >= 0) {
+return r;
+}
 }
+
 if (b == 0) {
 switch (c) {
 case TCG_COND_LTU:
+case TCG_COND_TSTNE:
 return 0;
 case TCG_COND_GEU:
+case TCG_COND_TSTEQ:
 return 1;
 default:
 break;
 }
 }
+
+/* TSTNE x,-1 -> NE x,0 */
+if (b == -1 && is_tst_cond(c)) {
+args[3] = args[2] = arg_new_constant(ctx, 0);
+args[4] = tcg_tst_eqne_cond(c);
+return -1;
+}
+
+/* TSTNE x,sign -> LT x,0 */
+if (b == INT64_MIN && is_tst_cond(c)) {
+/* bl must be 0, so copy that to bh */
+args[3] = bl;
+args[4] = tcg_tst_ltge_cond(c);
+return -1;
+}
 }
+
 if (args_are_copies(al, bl) && args_are_copies(ah, bh)) {
-return do_constant_folding_cond_eq(c);
+r = do_constant_folding_cond_eq(c);
+if (r >= 0) {
+return r;
+}
+
+/* TSTNE x,x -> NE x,0 */
+if (is_tst_cond(c)) {
+args[3] = args[2] = arg_new_constant(ctx, 0);

[PATCH v3 01/38] tcg: Introduce TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Add the enumerators, adjust the helpers to match, and dump.
Not supported anywhere else just yet.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 docs/devel/tcg-ops.rst |  2 ++
 include/tcg/tcg-cond.h | 74 ++
 tcg/tcg.c  |  4 ++-
 3 files changed, 58 insertions(+), 22 deletions(-)

diff --git a/docs/devel/tcg-ops.rst b/docs/devel/tcg-ops.rst
index 8ae59ea02b..d46b625e0e 100644
--- a/docs/devel/tcg-ops.rst
+++ b/docs/devel/tcg-ops.rst
@@ -253,6 +253,8 @@ Jumps/Labels
|   ``TCG_COND_GEU /* unsigned */``
|   ``TCG_COND_LEU /* unsigned */``
|   ``TCG_COND_GTU /* unsigned */``
+   |   ``TCG_COND_TSTEQ /* t1 & t2 == 0 */``
+   |   ``TCG_COND_TSTNE /* t1 & t2 != 0 */``
 
 Arithmetic
 --
diff --git a/include/tcg/tcg-cond.h b/include/tcg/tcg-cond.h
index 2a38a386d4..5cadbd6ff2 100644
--- a/include/tcg/tcg-cond.h
+++ b/include/tcg/tcg-cond.h
@@ -29,26 +29,34 @@
  * Conditions.  Note that these are laid out for easy manipulation by
  * the functions below:
  *bit 0 is used for inverting;
- *bit 1 is signed,
- *bit 2 is unsigned,
- *bit 3 is used with bit 0 for swapping signed/unsigned.
+ *bit 1 is used for conditions that need swapping (signed/unsigned).
+ *bit 2 is used with bit 1 for swapping.
+ *bit 3 is used for unsigned conditions.
  */
 typedef enum {
 /* non-signed */
 TCG_COND_NEVER  = 0 | 0 | 0 | 0,
 TCG_COND_ALWAYS = 0 | 0 | 0 | 1,
+
+/* equality */
 TCG_COND_EQ = 8 | 0 | 0 | 0,
 TCG_COND_NE = 8 | 0 | 0 | 1,
+
+/* "test" i.e. and then compare vs 0 */
+TCG_COND_TSTEQ  = 8 | 4 | 0 | 0,
+TCG_COND_TSTNE  = 8 | 4 | 0 | 1,
+
 /* signed */
 TCG_COND_LT = 0 | 0 | 2 | 0,
 TCG_COND_GE = 0 | 0 | 2 | 1,
-TCG_COND_LE = 8 | 0 | 2 | 0,
-TCG_COND_GT = 8 | 0 | 2 | 1,
+TCG_COND_GT = 0 | 4 | 2 | 0,
+TCG_COND_LE = 0 | 4 | 2 | 1,
+
 /* unsigned */
-TCG_COND_LTU= 0 | 4 | 0 | 0,
-TCG_COND_GEU= 0 | 4 | 0 | 1,
-TCG_COND_LEU= 8 | 4 | 0 | 0,
-TCG_COND_GTU= 8 | 4 | 0 | 1,
+TCG_COND_LTU= 8 | 0 | 2 | 0,
+TCG_COND_GEU= 8 | 0 | 2 | 1,
+TCG_COND_GTU= 8 | 4 | 2 | 0,
+TCG_COND_LEU= 8 | 4 | 2 | 1,
 } TCGCond;
 
 /* Invert the sense of the comparison.  */
@@ -60,25 +68,49 @@ static inline TCGCond tcg_invert_cond(TCGCond c)
 /* Swap the operands in a comparison.  */
 static inline TCGCond tcg_swap_cond(TCGCond c)
 {
-return c & 6 ? (TCGCond)(c ^ 9) : c;
+return (TCGCond)(c ^ ((c & 2) << 1));
 }
 
-/* Create an "unsigned" version of a "signed" comparison.  */
-static inline TCGCond tcg_unsigned_cond(TCGCond c)
+/* Must a comparison be considered signed?  */
+static inline bool is_signed_cond(TCGCond c)
 {
-return c & 2 ? (TCGCond)(c ^ 6) : c;
-}
-
-/* Create a "signed" version of an "unsigned" comparison.  */
-static inline TCGCond tcg_signed_cond(TCGCond c)
-{
-return c & 4 ? (TCGCond)(c ^ 6) : c;
+return (c & (8 | 2)) == 2;
 }
 
 /* Must a comparison be considered unsigned?  */
 static inline bool is_unsigned_cond(TCGCond c)
 {
-return (c & 4) != 0;
+return (c & (8 | 2)) == (8 | 2);
+}
+
+/* Must a comparison be considered a test?  */
+static inline bool is_tst_cond(TCGCond c)
+{
+return (c | 1) == TCG_COND_TSTNE;
+}
+
+/* Create an "unsigned" version of a "signed" comparison.  */
+static inline TCGCond tcg_unsigned_cond(TCGCond c)
+{
+return is_signed_cond(c) ? (TCGCond)(c + 8) : c;
+}
+
+/* Create a "signed" version of an "unsigned" comparison.  */
+static inline TCGCond tcg_signed_cond(TCGCond c)
+{
+return is_unsigned_cond(c) ? (TCGCond)(c - 8) : c;
+}
+
+/* Create the eq/ne version of a tsteq/tstne comparison.  */
+static inline TCGCond tcg_tst_eqne_cond(TCGCond c)
+{
+return is_tst_cond(c) ? (TCGCond)(c - 4) : c;
+}
+
+/* Create the lt/ge version of a tstne/tsteq comparison of the sign.  */
+static inline TCGCond tcg_tst_ltge_cond(TCGCond c)
+{
+return is_tst_cond(c) ? (TCGCond)(c ^ 0xf) : c;
 }
 
 /*
@@ -92,7 +124,7 @@ static inline TCGCond tcg_high_cond(TCGCond c)
 case TCG_COND_LE:
 case TCG_COND_GEU:
 case TCG_COND_LEU:
-return (TCGCond)(c ^ 8);
+return (TCGCond)(c ^ (4 | 1));
 default:
 return c;
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index e2c38f6d11..9d146b13aa 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2482,7 +2482,9 @@ static const char * const cond_name[] =
 [TCG_COND_LTU] = "ltu",
 [TCG_COND_GEU] = "geu",
 [TCG_COND_LEU] = "leu",
-[TCG_COND_GTU] = "gtu"
+[TCG_COND_GTU] = "gtu",
+[TCG_COND_TSTEQ] = "tsteq",
+[TCG_COND_TSTNE] = "tstne",
 };
 
 static const char * const ldst_name[(MO_BSWAP | MO_SSIZE) + 1] =
-- 
2.34.1
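The new bit layout and helper arithmetic can be sanity-checked in isolation. The following standalone restatement of the enumerators and helpers is illustrative only, not a substitute for include/tcg/tcg-cond.h:

```c
#include <stdint.h>

/* Re-declaration of the encoding from the patch, for standalone checking. */
typedef enum {
    TCG_COND_NEVER  = 0,          TCG_COND_ALWAYS = 1,
    TCG_COND_EQ     = 8,          TCG_COND_NE     = 8 | 1,
    TCG_COND_TSTEQ  = 8 | 4,      TCG_COND_TSTNE  = 8 | 4 | 1,
    TCG_COND_LT     = 2,          TCG_COND_GE     = 2 | 1,
    TCG_COND_GT     = 4 | 2,      TCG_COND_LE     = 4 | 2 | 1,
    TCG_COND_LTU    = 8 | 2,      TCG_COND_GEU    = 8 | 2 | 1,
    TCG_COND_GTU    = 8 | 4 | 2,  TCG_COND_LEU    = 8 | 4 | 2 | 1,
} TCGCond;

/* bit 0 inverts */
static TCGCond tcg_invert_cond(TCGCond c) { return (TCGCond)(c ^ 1); }
/* bit 1 marks swappable conditions; swapping flips bit 2 */
static TCGCond tcg_swap_cond(TCGCond c)   { return (TCGCond)(c ^ ((c & 2) << 1)); }
static int is_tst_cond(TCGCond c)         { return (c | 1) == TCG_COND_TSTNE; }
static TCGCond tcg_tst_eqne_cond(TCGCond c)
    { return is_tst_cond(c) ? (TCGCond)(c - 4) : c; }
static TCGCond tcg_tst_ltge_cond(TCGCond c)
    { return is_tst_cond(c) ? (TCGCond)(c ^ 0xf) : c; }
```

With this layout, tsteq/tstne sit next to eq/ne (a subtraction of 4 apart) and map to ge/lt via a single XOR, which is what the optimizer's canonicalizations exploit.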




[PATCH v3 07/38] tcg/optimize: Lower TCG_COND_TST{EQ, NE} if unsupported

2024-01-10 Thread Richard Henderson
After having performed other simplifications, lower any
remaining test comparisons with AND.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-internal.h |  2 ++
 tcg/optimize.c | 60 +++---
 tcg/tcg.c  |  2 +-
 3 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 6c9d9e48db..9b0d982f65 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -83,6 +83,8 @@ static inline TCGv_i64 TCGV128_HIGH(TCGv_i128 t)
 
 bool tcg_target_has_memory_bswap(MemOp memop);
 
+TCGTemp *tcg_temp_new_internal(TCGType type, TCGTempKind kind);
+
 /*
  * Locate or create a read-only temporary that is a constant.
  * This kind of temporary need not be freed, but for convenience
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2ed6322f97..79e701652b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -364,6 +364,13 @@ static TCGArg arg_new_constant(OptContext *ctx, uint64_t val)
 return temp_arg(ts);
 }
 
+static TCGArg arg_new_temp(OptContext *ctx)
+{
+TCGTemp *ts = tcg_temp_new_internal(ctx->type, TEMP_EBB);
+init_ts_info(ctx, ts);
+return temp_arg(ts);
+}
+
 static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 {
 TCGTemp *dst_ts = arg_temp(dst);
@@ -782,7 +789,7 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
  * Return -1 if the condition can't be simplified,
  * and the result of the condition (0 or 1) if it can.
  */
-static int do_constant_folding_cond1(OptContext *ctx, TCGArg dest,
+static int do_constant_folding_cond1(OptContext *ctx, TCGOp *op, TCGArg dest,
  TCGArg *p1, TCGArg *p2, TCGArg *pcond)
 {
 TCGCond cond;
@@ -818,11 +825,28 @@ static int do_constant_folding_cond1(OptContext *ctx, TCGArg dest,
? INT32_MIN : INT64_MIN))) {
 *p2 = arg_new_constant(ctx, 0);
 *pcond = tcg_tst_ltge_cond(cond);
+return -1;
+}
+
+/* Expand to AND with a temporary if no backend support. */
+if (!TCG_TARGET_HAS_tst) {
+TCGOpcode and_opc = (ctx->type == TCG_TYPE_I32
+ ? INDEX_op_and_i32 : INDEX_op_and_i64);
+TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, and_opc, 3);
+TCGArg tmp = arg_new_temp(ctx);
+
+op2->args[0] = tmp;
+op2->args[1] = *p1;
+op2->args[2] = *p2;
+
+*p1 = tmp;
+*p2 = arg_new_constant(ctx, 0);
+*pcond = tcg_tst_eqne_cond(cond);
 }
 return -1;
 }
 
-static int do_constant_folding_cond2(OptContext *ctx, TCGArg *args)
+static int do_constant_folding_cond2(OptContext *ctx, TCGOp *op, TCGArg *args)
 {
 TCGArg al, ah, bl, bh;
 TCGCond c;
@@ -898,6 +922,26 @@ static int do_constant_folding_cond2(OptContext *ctx, TCGArg *args)
 return -1;
 }
 }
+
+/* Expand to AND with a temporary if no backend support. */
+if (!TCG_TARGET_HAS_tst && is_tst_cond(c)) {
+TCGOp *op1 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_and_i32, 3);
+TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_and_i32, 3);
+TCGArg t1 = arg_new_temp(ctx);
+TCGArg t2 = arg_new_temp(ctx);
+
+op1->args[0] = t1;
+op1->args[1] = al;
+op1->args[2] = bl;
+op2->args[0] = t2;
+op2->args[1] = ah;
+op2->args[2] = bh;
+
+args[0] = t1;
+args[1] = t2;
+args[3] = args[2] = arg_new_constant(ctx, 0);
+args[4] = tcg_tst_eqne_cond(c);
+}
 return -1;
 }
 
@@ -1298,7 +1342,7 @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
 {
-int i = do_constant_folding_cond1(ctx, NO_DEST, &op->args[0],
+int i = do_constant_folding_cond1(ctx, op, NO_DEST, &op->args[0],
  &op->args[1], &op->args[2]);
 if (i == 0) {
 tcg_op_remove(ctx->tcg, op);
@@ -1317,7 +1361,7 @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 TCGArg label;
 int i, inv = 0;
 
-i = do_constant_folding_cond2(ctx, &op->args[0]);
+i = do_constant_folding_cond2(ctx, op, &op->args[0]);
 cond = op->args[4];
 label = op->args[5];
 if (i >= 0) {
@@ -1815,7 +1859,7 @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
 op->args[5] = tcg_invert_cond(op->args[5]);
 }
 
-i = do_constant_folding_cond1(ctx, NO_DEST, &op->args[1],
+i = do_constant_folding_cond1(ctx, op, NO_DEST, &op->args[1],
  &op->args[2], &op->args[5]);
 if (i >= 0) {
 return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
@@ -2151,7 +2195,7 @@ static void fold_setcond_tst_pow2(OptContext *ctx, TCGOp *op, bool neg)
 
 static bool fold_setcond(OptContext *ctx, TCGOp *op)
 {
-int i = do_constant_folding_cond1(ctx, op->args[0], &op->args[1],
+int i = do_constant_folding_cond1(ctx, op, op->args[0], &op->args[1],
  &op->args[2], &op->args[3]);
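The double-word expansion above replaces a test condition with two AND ops feeding an eq/ne against a zero pair. The equivalence it relies on can be checked in plain C (a hedged sketch with invented names, not the optimizer code):

```c
#include <stdint.h>

/* The 64-bit test the guest asked for. */
static int tstne64(uint64_t a, uint64_t b) { return (a & b) != 0; }

/* The lowered form: t1 = al & bl, t2 = ah & bh, then "(t1,t2) != (0,0)",
 * which is the same as (t1 | t2) != 0. */
static int lowered_tstne(uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint32_t t1 = al & bl;
    uint32_t t2 = ah & bh;
    return (t1 | t2) != 0;
}

static int agrees(uint64_t a, uint64_t b)
{
    return lowered_tstne((uint32_t)a, (uint32_t)(a >> 32),
                         (uint32_t)b, (uint32_t)(b >> 32)) == tstne64(a, b);
}
```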

[PATCH v3 00/38] tcg: Introduce TCG_COND_TST{EQ,NE}

2024-01-10 Thread Richard Henderson
Expose a pair of comparison operators that map to the "test"
comparison that is available on many architectures.

Changes for v3:
  * Make support for TCG_COND_TST* optional (paolo)
  * Drop riscv, loongarch64 and (unposted) mips backend changes.
  * Incorporate Paolo's tcg/i386 TEST improvements
  * Convert some target/s390x cases for even more testing.
  * Probably some bug fixes in there too...


r~


Paolo Bonzini (1):
  tcg/i386: Use TEST r,r to test 8/16/32 bits

Richard Henderson (37):
  tcg: Introduce TCG_COND_TST{EQ,NE}
  tcg: Introduce TCG_TARGET_HAS_tst
  tcg/optimize: Split out arg_is_const_val
  tcg/optimize: Split out do_constant_folding_cond1
  tcg/optimize: Do swap_commutative2 in do_constant_folding_cond2
  tcg/optimize: Handle TCG_COND_TST{EQ,NE}
  tcg/optimize: Lower TCG_COND_TST{EQ,NE} if unsupported
  target/alpha: Pass immediate value to gen_bcond_internal()
  target/alpha: Use TCG_COND_TST{EQ,NE} for BLB{C,S}
  target/alpha: Use TCG_COND_TST{EQ,NE} for CMOVLB{C,S}
  target/alpha: Use TCG_COND_TSTNE for gen_fold_mzero
  target/m68k: Use TCG_COND_TST{EQ,NE} in gen_fcc_cond
  target/sparc: Use TCG_COND_TSTEQ in gen_op_mulscc
  target/s390x: Use TCG_COND_TSTNE for CC_OP_{TM,ICM}
  target/s390x: Improve general case of disas_jcc
  tcg: Add TCGConst argument to tcg_target_const_match
  tcg/aarch64: Support TCG_COND_TST{EQ,NE}
  tcg/aarch64: Generate TBZ, TBNZ
  tcg/aarch64: Generate CBNZ for TSTNE of UINT32_MAX
  tcg/arm: Factor tcg_out_cmp() out
  tcg/arm: Support TCG_COND_TST{EQ,NE}
  tcg/i386: Pass x86 condition codes to tcg_out_cmov
  tcg/i386: Move tcg_cond_to_jcc[] into tcg_out_cmp
  tcg/i386: Support TCG_COND_TST{EQ,NE}
  tcg/i386: Improve TSTNE/TESTEQ vs powers of two
  tcg/sparc64: Hoist read of tcg_cond_to_rcond
  tcg/sparc64: Pass TCGCond to tcg_out_cmp
  tcg/sparc64: Support TCG_COND_TST{EQ,NE}
  tcg/ppc: Sink tcg_to_bc usage into tcg_out_bc
  tcg/ppc: Use cr0 in tcg_to_bc and tcg_to_isel
  tcg/ppc: Tidy up tcg_target_const_match
  tcg/ppc: Add TCG_CT_CONST_CMP
  tcg/ppc: Support TCG_COND_TST{EQ,NE}
  tcg/s390x: Split constraint A into J+U
  tcg/s390x: Add TCG_CT_CONST_CMP
  tcg/s390x: Support TCG_COND_TST{EQ,NE}
  tcg/tci: Support TCG_COND_TST{EQ,NE}

 docs/devel/tcg-ops.rst   |   2 +
 include/tcg/tcg-cond.h   |  74 +++--
 tcg/aarch64/tcg-target-con-set.h |   5 +-
 tcg/aarch64/tcg-target-con-str.h |   1 +
 tcg/aarch64/tcg-target.h |   2 +
 tcg/arm/tcg-target.h |   2 +
 tcg/i386/tcg-target-con-set.h|   6 +-
 tcg/i386/tcg-target-con-str.h|   1 +
 tcg/i386/tcg-target.h|   2 +
 tcg/loongarch64/tcg-target.h |   2 +
 tcg/mips/tcg-target.h|   2 +
 tcg/ppc/tcg-target-con-set.h |   5 +-
 tcg/ppc/tcg-target-con-str.h |   1 +
 tcg/ppc/tcg-target.h |   2 +
 tcg/riscv/tcg-target.h   |   2 +
 tcg/s390x/tcg-target-con-set.h   |   8 +-
 tcg/s390x/tcg-target-con-str.h   |   3 +-
 tcg/s390x/tcg-target.h   |   2 +
 tcg/sparc64/tcg-target.h |   2 +
 tcg/tcg-internal.h   |   2 +
 tcg/tci/tcg-target.h |   2 +
 target/alpha/translate.c |  94 +++---
 target/m68k/translate.c  |  74 +++--
 target/s390x/tcg/translate.c | 100 +++
 target/sparc/translate.c |   4 +-
 tcg/optimize.c   | 474 +++
 tcg/tcg.c|  40 ++-
 tcg/tci.c|  14 +
 tcg/aarch64/tcg-target.c.inc | 165 ---
 tcg/arm/tcg-target.c.inc |  62 ++--
 tcg/i386/tcg-target.c.inc| 201 +
 tcg/loongarch64/tcg-target.c.inc |   3 +-
 tcg/mips/tcg-target.c.inc|   3 +-
 tcg/ppc/tcg-target.c.inc | 294 ++-
 tcg/riscv/tcg-target.c.inc   |   3 +-
 tcg/s390x/tcg-target.c.inc   | 246 +++-
 tcg/sparc64/tcg-target.c.inc |  65 +++--
 tcg/tci/tcg-target.c.inc |   3 +-
 38 files changed, 1378 insertions(+), 595 deletions(-)

-- 
2.34.1




[PATCH v3 04/38] tcg/optimize: Split out do_constant_folding_cond1

2024-01-10 Thread Richard Henderson
Handle modifications to the arguments and condition
in a single place.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 57 --
 1 file changed, 27 insertions(+), 30 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 73019b9996..9c04dba099 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -796,6 +796,23 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 return false;
 }
 
+static int do_constant_folding_cond1(OptContext *ctx, TCGArg dest,
+ TCGArg *p1, TCGArg *p2, TCGArg *pcond)
+{
+TCGCond cond;
+bool swap;
+int r;
+
+swap = swap_commutative(dest, p1, p2);
+cond = *pcond;
+if (swap) {
+*pcond = cond = tcg_swap_cond(cond);
+}
+
+r = do_constant_folding_cond(ctx->type, *p1, *p2, cond);
+return r;
+}
+
 static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args)
 {
 for (int i = 0; i < nb_args; i++) {
@@ -1193,14 +1210,8 @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
 {
-TCGCond cond = op->args[2];
-int i;
-
-if (swap_commutative(NO_DEST, &op->args[0], &op->args[1])) {
-op->args[2] = cond = tcg_swap_cond(cond);
-}
-
-i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
+int i = do_constant_folding_cond1(ctx, NO_DEST, &op->args[0],
+  &op->args[1], &op->args[2]);
 if (i == 0) {
 tcg_op_remove(ctx->tcg, op);
 return true;
@@ -1695,21 +1706,18 @@ static bool fold_mov(OptContext *ctx, TCGOp *op)
 
 static bool fold_movcond(OptContext *ctx, TCGOp *op)
 {
-TCGCond cond = op->args[5];
 int i;
 
-if (swap_commutative(NO_DEST, &op->args[1], &op->args[2])) {
-op->args[5] = cond = tcg_swap_cond(cond);
-}
 /*
  * Canonicalize the "false" input reg to match the destination reg so
  * that the tcg backend can implement a "move if true" operation.
  */
  if (swap_commutative(op->args[0], &op->args[4], &op->args[3])) {
-op->args[5] = cond = tcg_invert_cond(cond);
+op->args[5] = tcg_invert_cond(op->args[5]);
 }
 
-i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
+i = do_constant_folding_cond1(ctx, NO_DEST, &op->args[1],
+  &op->args[2], &op->args[5]);
 if (i >= 0) {
 return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
 }
@@ -1723,6 +1731,7 @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
 uint64_t tv = arg_info(op->args[3])->val;
 uint64_t fv = arg_info(op->args[4])->val;
 TCGOpcode opc, negopc = 0;
+TCGCond cond = op->args[5];
 
 switch (ctx->type) {
 case TCG_TYPE_I32:
@@ -1950,14 +1959,8 @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
 
 static bool fold_setcond(OptContext *ctx, TCGOp *op)
 {
-TCGCond cond = op->args[3];
-int i;
-
-if (swap_commutative(op->args[0], &op->args[1], &op->args[2])) {
-op->args[3] = cond = tcg_swap_cond(cond);
-}
-
-i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
+int i = do_constant_folding_cond1(ctx, op->args[0], &op->args[1],
+  &op->args[2], &op->args[3]);
 if (i >= 0) {
 return tcg_opt_gen_movi(ctx, op, op->args[0], i);
 }
@@ -1969,14 +1972,8 @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
 
 static bool fold_negsetcond(OptContext *ctx, TCGOp *op)
 {
-TCGCond cond = op->args[3];
-int i;
-
-if (swap_commutative(op->args[0], &op->args[1], &op->args[2])) {
-op->args[3] = cond = tcg_swap_cond(cond);
-}
-
-i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
+int i = do_constant_folding_cond1(ctx, op->args[0], &op->args[1],
+  &op->args[2], &op->args[3]);
 if (i >= 0) {
 return tcg_opt_gen_movi(ctx, op, op->args[0], -i);
 }
-- 
2.34.1




Re: [PATCH v3 07/33] linux-user/arm: Remove qemu_host_page_size from init_guest_commpage

2024-01-10 Thread Richard Henderson

On 1/8/24 20:38, Pierrick Bouvier wrote:

On 1/2/24 05:57, Richard Henderson wrote:

Use qemu_real_host_page_size.
If the commpage is not within reserved_va, use MAP_FIXED_NOREPLACE.

Signed-off-by: Richard Henderson 
---
  linux-user/elfload.c | 13 -
  1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 4fcc490ce6..2e2b1b0784 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -459,6 +459,7 @@ enum {
  static bool init_guest_commpage(void)
  {
  ARMCPU *cpu = ARM_CPU(thread_cpu);
+    int host_page_size = qemu_real_host_page_size();
  abi_ptr commpage;
  void *want;
  void *addr;
@@ -471,10 +472,12 @@ static bool init_guest_commpage(void)
  return true;
  }
-    commpage = HI_COMMPAGE & -qemu_host_page_size;
+    commpage = HI_COMMPAGE & -host_page_size;
  want = g2h_untagged(commpage);
-    addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
-    MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+    addr = mmap(want, host_page_size, PROT_READ | PROT_WRITE,
+    MAP_ANONYMOUS | MAP_PRIVATE |
+    (commpage < reserved_va ? MAP_FIXED : MAP_FIXED_NOREPLACE),
+    -1, 0);
  if (addr == MAP_FAILED) {
  perror("Allocating guest commpage");
@@ -487,12 +490,12 @@ static bool init_guest_commpage(void)
  /* Set kernel helper versions; rest of page is 0.  */
  __put_user(5, (uint32_t *)g2h_untagged(0x0ffcu));
-    if (mprotect(addr, qemu_host_page_size, PROT_READ)) {
+    if (mprotect(addr, host_page_size, PROT_READ)) {
  perror("Protecting guest commpage");
  exit(EXIT_FAILURE);
  }
-    page_set_flags(commpage, commpage | ~qemu_host_page_mask,
+    page_set_flags(commpage, commpage | (host_page_size - 1),
 PAGE_READ | PAGE_EXEC | PAGE_VALID);
  return true;
  }


To confirm if I understand correctly, when using a reserved va, the contiguous address 
space is reserved using mmap, thus MAP_FIXED_NOREPLACE would fail when hitting it?


Correct.

r~



[PULL 2/4] tcg/i386: use 8-bit OR or XOR for unsigned 8-bit immediates

2024-01-10 Thread Richard Henderson
From: Paolo Bonzini 

In the case where OR or XOR has an 8-bit immediate between 128 and 255,
we can operate on a low-byte register and shorten the output by two or
three bytes (two if a prefix byte is needed for REX.B).

Signed-off-by: Paolo Bonzini 
Message-Id: <20231228120524.70239-1-pbonz...@redhat.com>
[rth: Incorporate into switch.]
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 29e80af78b..d268199fc1 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -244,6 +244,7 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
 #define P_VEXL  0x8 /* Set VEX.L = 1 */
 #define P_EVEX  0x10/* Requires EVEX encoding */
 
+#define OPC_ARITH_EbIb (0x80)
 #define OPC_ARITH_EvIz (0x81)
 #define OPC_ARITH_EvIb (0x83)
 #define OPC_ARITH_GvEv (0x03)  /* ... plus (ARITH_FOO << 3) */
@@ -1370,6 +1371,16 @@ static void tgen_arithi(TCGContext *s, int c, int r0,
 return;
 }
 break;
+
+case ARITH_OR:
+case ARITH_XOR:
+if (val >= 0x80 && val <= 0xff
+&& (r0 < 4 || TCG_TARGET_REG_BITS == 64)) {
+tcg_out_modrm(s, OPC_ARITH_EbIb + P_REXB_RM, c, r0);
+tcg_out8(s, val);
+return;
+}
+break;
 }
 
 if (val == (int8_t)val) {
-- 
2.34.1
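For a rough sense of the saving, the two encodings involved can be sketched with a hedged mini-emitter (illustrative only, not the tcg/i386 code): `0x80 /1 ib` operating on a byte register takes three bytes, while `0x81 /1 id` with a 32-bit immediate takes six. For an unsigned 8-bit immediate, OR and XOR on the low byte leave the upper bits unchanged, so the results agree.

```c
#include <stdint.h>
#include <string.h>

enum { ARITH_OR = 1 };  /* reg field selecting OR in the 0x80/0x81 group */

/* 0x81 /1 id : OR r/m32, imm32 -> 6 bytes for a register-direct operand */
static int emit_or_ev_iz(uint8_t *p, int reg, uint32_t val)
{
    int n = 0;
    p[n++] = 0x81;                          /* OPC_ARITH_EvIz */
    p[n++] = 0xC0 | (ARITH_OR << 3) | reg;  /* ModRM, register-direct */
    memcpy(p + n, &val, 4);
    n += 4;
    return n;
}

/* 0x80 /1 ib : OR r/m8, imm8 -> 3 bytes when reg is byte-addressable */
static int emit_or_eb_ib(uint8_t *p, int reg, uint8_t val)
{
    int n = 0;
    p[n++] = 0x80;                          /* OPC_ARITH_EbIb */
    p[n++] = 0xC0 | (ARITH_OR << 3) | reg;
    p[n++] = val;
    return n;
}

/* Compare sizes and check the short encoding's bytes for "or %bl, $0x80". */
static int check_saving(void)
{
    uint8_t buf[16];
    int long_len  = emit_or_ev_iz(buf, 3, 0x80);  /* reg 3 = %ebx */
    int short_len = emit_or_eb_ib(buf, 3, 0x80);  /* reg 3 = %bl  */
    return long_len == 6 && short_len == 3
        && buf[0] == 0x80 && buf[1] == 0xCB && buf[2] == 0x80;
}
```

On x86-64, registers outside the legacy low-byte set additionally need a REX.B prefix, which is why the commit message quotes a saving of two or three bytes rather than always three.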



