Re: [PATCH v2] target/arm: Fix 32-bit SMOPA

2024-03-13 Thread Richard Henderson

On 3/13/24 09:12, Michael Tokarev wrote:

warning: TCG temporary leaks before 00400730
qemu-aarch64: ../tcg/tcg.c:1052: tcg_temp_free_internal: Assertion `ts->temp_allocated != 
0' failed.

timeout: the monitored command dumped core
Trace/breakpoint trap

Does it make sense to pick this for 7.2 with the above failures?


No, it doesn't.

I guess it was in the 8.x series that we eliminated the need for freeing tcg temporaries. 
The patch set would need adjustment for that, and I don't think it's worth the effort.



r~



Re: [PATCH v2] target/arm: Fix 32-bit SMOPA

2024-03-13 Thread Michael Tokarev

10.03.2024 21:13, Richard Henderson wrote:

On 3/9/24 08:40, Michael Tokarev wrote:

...

I tried to pick this one up for stable-7.2 (since the fix is for older commit),
and faced a fun issue in this change to tests/tcg/aarch64/Makefile.target,
since 7.2. doesn't have CROSS_AS_HAS_ARMV9_SME yet.  I went on and found out
that the following commits:

v7.2.0-374-gbc6bd20ee3  target/arm: align exposed ID registers with Linux
v8.0.0-2358-g3dc2afeab2 tests/tcg/aarch64/sysregs.c: Use S syntax for 
id_aa64zfr0_el1
 and id_aa64smfr0_el1
v8.0.0-2361-g1f51573f79 target/arm: Fix SME full tile indexing

applies to 7.2, and lets this "Fix 32-bit SMOPA" change to apply cleanly,
including this very change in Makefile.target.

Now, 1f51573f79 "Fix SME full tile indexing" is Cc'd qemu-stable already, but
it is not in 7.2.x for some reason which I don't remember anymore, so it is
a good one to pick up already.  3dc2afeab is tests-only.


Oh wow, I didn't expect the fix to get propagated back that far.
I was expecting only back into the 8.x series...


And bc6bd20ee3 (from Dec-2022) seems like a good candidate too, is it not?


Sure, couldn't hurt.

If it all applies without drama, all is well.


While it goes fine, it doesn't quite work actually:

https://gitlab.com/qemu-project/qemu/-/jobs/6386966720#L2482

timeout --foreground 90  
/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/qemu-aarch64  
sme-smopa-1 >  sme-smopa-1.out
warning: TCG temporary leaks before 00400714
qemu-aarch64: ../tcg/tcg.c:1052: tcg_temp_free_internal: Assertion 
`ts->temp_allocated != 0' failed.
timeout: the monitored command dumped core
Trace/breakpoint trap
make[1]: *** [Makefile:170: run-sme-smopa-1] Error 133
make[1]: Leaving directory 
'/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/tests/tcg/aarch64-linux-user'
make[1]: *** Waiting for unfinished jobs
make[1]: Entering directory 
'/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/tests/tcg/aarch64-linux-user'
timeout --foreground 90  
/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/qemu-aarch64  
sme-smopa-2 >  sme-smopa-2.out
warning: TCG temporary leaks before 00400730
qemu-aarch64: ../tcg/tcg.c:1052: tcg_temp_free_internal: Assertion 
`ts->temp_allocated != 0' failed.
timeout: the monitored command dumped core
Trace/breakpoint trap

Does it make sense to pick this for 7.2 with the above failures?

Unfortunately this test does not run on a private repository clone on gitlab,
that's why I haven't noticed this immediately.

Thanks,

/mjt



Re: [PATCH v2] target/arm: Fix 32-bit SMOPA

2024-03-10 Thread Michael Tokarev

10.03.2024 21:13, Richard Henderson :
...

If it all applies without drama, all is well.


This very fix applies and works just fine on top of 7.2, it is only
the tests part (the Makefile.target fragment) which isn't, so it's
kinda problematic to apply it without a test.  And while figuring out
what to do with the tests, I found other changes which also should go
there.

It isn't just about lack of drama.  It applies cleanly and works fine
so far (unfortunately I don't have lots'a arm* guests to test it) and
the CI works fine too.

That's why I mentioned it in the first place, - the situation is kinda
fun :)

(and to let everyone know the changes which are being picked up too,
ofcourse).

Thanks,

/mjt



Re: [PATCH v2] target/arm: Fix 32-bit SMOPA

2024-03-10 Thread Richard Henderson

On 3/9/24 08:40, Michael Tokarev wrote:

05.03.2024 19:39, Richard Henderson wrote:

While the 8-bit input elements are sequential in the input vector,
the 32-bit output elements are not sequential in the output matrix.
Do not attempt to compute 2 32-bit outputs at the same time.

Cc: qemu-sta...@nongnu.org
Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083
Signed-off-by: Richard Henderson 


...

diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index cded1d01fc..ea3e232e65 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -67,7 +67,7 @@ endif
  # SME Tests
  ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
-AARCH64_TESTS += sme-outprod1
+AARCH64_TESTS += sme-outprod1 sme-smopa-1 sme-smopa-2
  endif


I tried to pick this one up for stable-7.2 (since the fix is for older commit),
and faced a fun issue in this change to tests/tcg/aarch64/Makefile.target,
since 7.2. doesn't have CROSS_AS_HAS_ARMV9_SME yet.  I went on and found out
that the following commits:

v7.2.0-374-gbc6bd20ee3  target/arm: align exposed ID registers with Linux
v8.0.0-2358-g3dc2afeab2 tests/tcg/aarch64/sysregs.c: Use S syntax for 
id_aa64zfr0_el1
     and id_aa64smfr0_el1
v8.0.0-2361-g1f51573f79 target/arm: Fix SME full tile indexing

applies to 7.2, and lets this "Fix 32-bit SMOPA" change to apply cleanly,
including this very change in Makefile.target.

Now, 1f51573f79 "Fix SME full tile indexing" is Cc'd qemu-stable already, but
it is not in 7.2.x for some reason which I don't remember anymore, so it is
a good one to pick up already.  3dc2afeab is tests-only.


Oh wow, I didn't expect the fix to get propagated back that far.
I was expecting only back into the 8.x series...


And bc6bd20ee3 (from Dec-2022) seems like a good candidate too, is it not?


Sure, couldn't hurt.

If it all applies without drama, all is well.


r~



Re: [PATCH v2] target/arm: Fix 32-bit SMOPA

2024-03-09 Thread Michael Tokarev

05.03.2024 19:39, Richard Henderson wrote:

While the 8-bit input elements are sequential in the input vector,
the 32-bit output elements are not sequential in the output matrix.
Do not attempt to compute 2 32-bit outputs at the same time.

Cc: qemu-sta...@nongnu.org
Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083
Signed-off-by: Richard Henderson 


...

diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index cded1d01fc..ea3e232e65 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -67,7 +67,7 @@ endif
  
  # SME Tests

  ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
-AARCH64_TESTS += sme-outprod1
+AARCH64_TESTS += sme-outprod1 sme-smopa-1 sme-smopa-2
  endif


I tried to pick this one up for stable-7.2 (since the fix is for older commit),
and faced a fun issue in this change to tests/tcg/aarch64/Makefile.target,
since 7.2. doesn't have CROSS_AS_HAS_ARMV9_SME yet.  I went on and found out
that the following commits:

v7.2.0-374-gbc6bd20ee3  target/arm: align exposed ID registers with Linux
v8.0.0-2358-g3dc2afeab2 tests/tcg/aarch64/sysregs.c: Use S syntax for 
id_aa64zfr0_el1
and id_aa64smfr0_el1
v8.0.0-2361-g1f51573f79 target/arm: Fix SME full tile indexing

applies to 7.2, and lets this "Fix 32-bit SMOPA" change to apply cleanly,
including this very change in Makefile.target.

Now, 1f51573f79 "Fix SME full tile indexing" is Cc'd qemu-stable already, but
it is not in 7.2.x for some reason which I don't remember anymore, so it is
a good one to pick up already.  3dc2afeab is tests-only.

And bc6bd20ee3 (from Dec-2022) seems like a good candidate too, is it not?

I'm picking up all the bunch for now, let's see..

Thanks,

/mjt



Re: [PATCH v2] target/arm: Fix 32-bit SMOPA

2024-03-07 Thread Peter Maydell
On Tue, 5 Mar 2024 at 16:39, Richard Henderson
 wrote:
>
> While the 8-bit input elements are sequential in the input vector,
> the 32-bit output elements are not sequential in the output matrix.
> Do not attempt to compute 2 32-bit outputs at the same time.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083
> Signed-off-by: Richard Henderson 
> ---
>
> v2: Fixed endian issue; double-checked on s390x.
>


Applied to target-arm.next, thanks.

-- PMM



Re: [PATCH v2] target/arm: Fix 32-bit SMOPA

2024-03-05 Thread Philippe Mathieu-Daudé

On 5/3/24 17:39, Richard Henderson wrote:

While the 8-bit input elements are sequential in the input vector,
the 32-bit output elements are not sequential in the output matrix.
Do not attempt to compute 2 32-bit outputs at the same time.

Cc: qemu-sta...@nongnu.org
Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083
Signed-off-by: Richard Henderson 
---

v2: Fixed endian issue; double-checked on s390x.

---
  target/arm/tcg/sme_helper.c   | 77 ++-
  tests/tcg/aarch64/sme-smopa-1.c   | 47 +++
  tests/tcg/aarch64/sme-smopa-2.c   | 54 ++
  tests/tcg/aarch64/Makefile.target |  2 +-
  4 files changed, 147 insertions(+), 33 deletions(-)
  create mode 100644 tests/tcg/aarch64/sme-smopa-1.c
  create mode 100644 tests/tcg/aarch64/sme-smopa-2.c


Reviewed-by: Philippe Mathieu-Daudé