Re: [PATCH v2] target/arm: Fix 32-bit SMOPA
On 3/13/24 09:12, Michael Tokarev wrote: warning: TCG temporary leaks before 00400730 qemu-aarch64: ../tcg/tcg.c:1052: tcg_temp_free_internal: Assertion `ts->temp_allocated != 0' failed. timeout: the monitored command dumped core Trace/breakpoint trap Does it make sense to pick this for 7.2 with the above failures? No, it doesn't. I guess it was in the 8.x series that we eliminated the need for freeing tcg temporaries. The patch set would need adjustment for that, and I don't think it's worth the effort. r~
Re: [PATCH v2] target/arm: Fix 32-bit SMOPA
10.03.2024 21:13, Richard Henderson wrote: On 3/9/24 08:40, Michael Tokarev wrote: ... I tried to pick this one up for stable-7.2 (since the fix is for older commit), and faced a fun issue in this change to tests/tcg/aarch64/Makefile.target, since 7.2. doesn't have CROSS_AS_HAS_ARMV9_SME yet. I went on and found out that the following commits: v7.2.0-374-gbc6bd20ee3 target/arm: align exposed ID registers with Linux v8.0.0-2358-g3dc2afeab2 tests/tcg/aarch64/sysregs.c: Use S syntax for id_aa64zfr0_el1 and id_aa64smfr0_el1 v8.0.0-2361-g1f51573f79 target/arm: Fix SME full tile indexing applies to 7.2, and lets this "Fix 32-bit SMOPA" change to apply cleanly, including this very change in Makefile.target. Now, 1f51573f79 "Fix SME full tile indexing" is Cc'd qemu-stable already, but it is not in 7.2.x for some reason which I don't remember anymore, so it is a good one to pick up already. 3dc2afeab is tests-only. Oh wow, I didn't expect the fix to get propagated back that far. I was expecting only back into the 8.x series... And bc6bd20ee3 (from Dec-2022) seems like a good candidate too, is it not? Sure, couldn't hurt. If it all applies without drama, all is well. While it goes fine, it doesn't quite work actually: https://gitlab.com/qemu-project/qemu/-/jobs/6386966720#L2482 timeout --foreground 90 /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/qemu-aarch64 sme-smopa-1 > sme-smopa-1.out warning: TCG temporary leaks before 00400714 qemu-aarch64: ../tcg/tcg.c:1052: tcg_temp_free_internal: Assertion `ts->temp_allocated != 0' failed. timeout: the monitored command dumped core Trace/breakpoint trap make[1]: *** [Makefile:170: run-sme-smopa-1] Error 133 make[1]: Leaving directory '/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/tests/tcg/aarch64-linux-user' make[1]: *** Waiting for unfinished jobs make[1]: Entering directory '/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/tests/tcg/aarch64-linux-user' timeout --foreground 90 /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/qemu-aarch64 sme-smopa-2 > sme-smopa-2.out warning: TCG temporary leaks before 00400730 qemu-aarch64: ../tcg/tcg.c:1052: tcg_temp_free_internal: Assertion `ts->temp_allocated != 0' failed. timeout: the monitored command dumped core Trace/breakpoint trap Does it make sense to pick this for 7.2 with the above failures? Unfortunately this test does not run on a private repository clone on gitlab, that's why I haven't noticed this immediately. Thanks, /mjt
Re: [PATCH v2] target/arm: Fix 32-bit SMOPA
10.03.2024 21:13, Richard Henderson : ... If it all applies without drama, all is well. This very fix applies and works just fine on top of 7.2, it is only the tests part (the Makefile.target fragment) which isn't, so it's kinda problematic to apply it without a test. And while figuring out what to do with the tests, I found other changes which also should go there. It isn't just about lack of drama. It applies cleanly and works fine so far (unfortunately I don't have lots'a arm* guests to test it) and the CI works fine too. That's why I mentioned it in the first place, - the situation is kinda fun :) (and to let everyone know the changes which are being picked up too, ofcourse). Thanks, /mjt
Re: [PATCH v2] target/arm: Fix 32-bit SMOPA
On 3/9/24 08:40, Michael Tokarev wrote: 05.03.2024 19:39, Richard Henderson wrote: While the 8-bit input elements are sequential in the input vector, the 32-bit output elements are not sequential in the output matrix. Do not attempt to compute 2 32-bit outputs at the same time. Cc: qemu-sta...@nongnu.org Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083 Signed-off-by: Richard Henderson ... diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target index cded1d01fc..ea3e232e65 100644 --- a/tests/tcg/aarch64/Makefile.target +++ b/tests/tcg/aarch64/Makefile.target @@ -67,7 +67,7 @@ endif # SME Tests ifneq ($(CROSS_AS_HAS_ARMV9_SME),) -AARCH64_TESTS += sme-outprod1 +AARCH64_TESTS += sme-outprod1 sme-smopa-1 sme-smopa-2 endif I tried to pick this one up for stable-7.2 (since the fix is for older commit), and faced a fun issue in this change to tests/tcg/aarch64/Makefile.target, since 7.2. doesn't have CROSS_AS_HAS_ARMV9_SME yet. I went on and found out that the following commits: v7.2.0-374-gbc6bd20ee3 target/arm: align exposed ID registers with Linux v8.0.0-2358-g3dc2afeab2 tests/tcg/aarch64/sysregs.c: Use S syntax for id_aa64zfr0_el1 and id_aa64smfr0_el1 v8.0.0-2361-g1f51573f79 target/arm: Fix SME full tile indexing applies to 7.2, and lets this "Fix 32-bit SMOPA" change to apply cleanly, including this very change in Makefile.target. Now, 1f51573f79 "Fix SME full tile indexing" is Cc'd qemu-stable already, but it is not in 7.2.x for some reason which I don't remember anymore, so it is a good one to pick up already. 3dc2afeab is tests-only. Oh wow, I didn't expect the fix to get propagated back that far. I was expecting only back into the 8.x series... And bc6bd20ee3 (from Dec-2022) seems like a good candidate too, is it not? Sure, couldn't hurt. If it all applies without drama, all is well. r~
Re: [PATCH v2] target/arm: Fix 32-bit SMOPA
05.03.2024 19:39, Richard Henderson wrote: While the 8-bit input elements are sequential in the input vector, the 32-bit output elements are not sequential in the output matrix. Do not attempt to compute 2 32-bit outputs at the same time. Cc: qemu-sta...@nongnu.org Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083 Signed-off-by: Richard Henderson ... diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target index cded1d01fc..ea3e232e65 100644 --- a/tests/tcg/aarch64/Makefile.target +++ b/tests/tcg/aarch64/Makefile.target @@ -67,7 +67,7 @@ endif # SME Tests ifneq ($(CROSS_AS_HAS_ARMV9_SME),) -AARCH64_TESTS += sme-outprod1 +AARCH64_TESTS += sme-outprod1 sme-smopa-1 sme-smopa-2 endif I tried to pick this one up for stable-7.2 (since the fix is for older commit), and faced a fun issue in this change to tests/tcg/aarch64/Makefile.target, since 7.2. doesn't have CROSS_AS_HAS_ARMV9_SME yet. I went on and found out that the following commits: v7.2.0-374-gbc6bd20ee3 target/arm: align exposed ID registers with Linux v8.0.0-2358-g3dc2afeab2 tests/tcg/aarch64/sysregs.c: Use S syntax for id_aa64zfr0_el1 and id_aa64smfr0_el1 v8.0.0-2361-g1f51573f79 target/arm: Fix SME full tile indexing applies to 7.2, and lets this "Fix 32-bit SMOPA" change to apply cleanly, including this very change in Makefile.target. Now, 1f51573f79 "Fix SME full tile indexing" is Cc'd qemu-stable already, but it is not in 7.2.x for some reason which I don't remember anymore, so it is a good one to pick up already. 3dc2afeab is tests-only. And bc6bd20ee3 (from Dec-2022) seems like a good candidate too, is it not? I'm picking up all the bunch for now, let's see.. Thanks, /mjt
Re: [PATCH v2] target/arm: Fix 32-bit SMOPA
On Tue, 5 Mar 2024 at 16:39, Richard Henderson wrote: > > While the 8-bit input elements are sequential in the input vector, > the 32-bit output elements are not sequential in the output matrix. > Do not attempt to compute 2 32-bit outputs at the same time. > > Cc: qemu-sta...@nongnu.org > Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product") > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083 > Signed-off-by: Richard Henderson > --- > > v2: Fixed endian issue; double-checked on s390x. > Applied to target-arm.next, thanks. -- PMM
Re: [PATCH v2] target/arm: Fix 32-bit SMOPA
On 5/3/24 17:39, Richard Henderson wrote: While the 8-bit input elements are sequential in the input vector, the 32-bit output elements are not sequential in the output matrix. Do not attempt to compute 2 32-bit outputs at the same time. Cc: qemu-sta...@nongnu.org Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083 Signed-off-by: Richard Henderson --- v2: Fixed endian issue; double-checked on s390x. --- target/arm/tcg/sme_helper.c | 77 ++- tests/tcg/aarch64/sme-smopa-1.c | 47 +++ tests/tcg/aarch64/sme-smopa-2.c | 54 ++ tests/tcg/aarch64/Makefile.target | 2 +- 4 files changed, 147 insertions(+), 33 deletions(-) create mode 100644 tests/tcg/aarch64/sme-smopa-1.c create mode 100644 tests/tcg/aarch64/sme-smopa-2.c Reviewed-by: Philippe Mathieu-Daudé