Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
On 17/1/24 12:53, Michael Tokarev wrote:
> 04.01.2024 01:51, Richard Henderson:
>> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>>
>>> ---
>>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>>> index fbee43d3b0..b3456fe857 100644
>>> --- a/tcg/s390x/tcg-target.c.inc
>>> +++ b/tcg/s390x/tcg-target.c.inc
>>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>>>      case INDEX_op_ld_vec:
>>>      case INDEX_op_dupm_vec:
>>> +    case INDEX_op_rotli_vec:
>>>          return C_O1_I1(v, r);
>>>      case INDEX_op_dup_vec:
>>>          return C_O1_I1(v, vr);
>>>      case INDEX_op_abs_vec:
>>>      case INDEX_op_neg_vec:
>>>      case INDEX_op_not_vec:
>>> -    case INDEX_op_rotli_vec:
>>>      case INDEX_op_sari_vec:
>>>      case INDEX_op_shli_vec:
>>>      case INDEX_op_shri_vec:
>>>      case INDEX_op_s390_vuph_vec:
>>>      case INDEX_op_s390_vupl_vec:
>>>          return C_O1_I1(v, v);
>>
>> Definitely not correct, since VERLL requires a vector input to be
>> rotated.
>>
>>> But I'm outside of my comfort zone so not really sure what I'm doing...
>>> (I was inspired by the "the instruction verll only allows immediates up
>>> to 32 bits." comment from
>>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
>>
>> That message is confused. The immediate in VERLL is 12 bits (with
>> only 6 bits ever used for MO_64). Dunno where "32 bits" comes from.
>
> So, what do we have here in the end?
> Should we fix this on qemu side?

Yes.

> This thread stopped quite some time ago, with problematic
> instruction found but no solution..

Be assured we are spending (too?) many hours on this...
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
Michael Tokarev writes:

> 04.01.2024 01:51, Richard Henderson:
>> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>>
>>> ---
>>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>>> index fbee43d3b0..b3456fe857 100644
>>> --- a/tcg/s390x/tcg-target.c.inc
>>> +++ b/tcg/s390x/tcg-target.c.inc
>>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>>>      case INDEX_op_ld_vec:
>>>      case INDEX_op_dupm_vec:
>>> +    case INDEX_op_rotli_vec:
>>>          return C_O1_I1(v, r);
>>>      case INDEX_op_dup_vec:
>>>          return C_O1_I1(v, vr);
>>>      case INDEX_op_abs_vec:
>>>      case INDEX_op_neg_vec:
>>>      case INDEX_op_not_vec:
>>> -    case INDEX_op_rotli_vec:
>>>      case INDEX_op_sari_vec:
>>>      case INDEX_op_shli_vec:
>>>      case INDEX_op_shri_vec:
>>>      case INDEX_op_s390_vuph_vec:
>>>      case INDEX_op_s390_vupl_vec:
>>>          return C_O1_I1(v, v);
>>
>> Definitely not correct, since VERLL requires a vector input to be
>> rotated.
>>
>>> But I'm outside of my comfort zone so not really sure what I'm doing...
>>> (I was inspired by the "the instruction verll only allows immediates up
>>> to 32 bits." comment from
>>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
>>
>> That message is confused. The immediate in VERLL is 12 bits (with
>> only 6 bits ever used for MO_64). Dunno where "32 bits" comes from.
>
> So, what do we have here in the end?
> Should we fix this on qemu side?

I think the thinking is we should disable the optimisation for the 8.2
stable while figuring out the true fix for 9.0.

>
> This thread stopped quite some time ago, with problematic
> instruction found but no solution..
>
> Thanks,
>
> /mjt

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
04.01.2024 01:51, Richard Henderson:
> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>
>> ---
>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>> index fbee43d3b0..b3456fe857 100644
>> --- a/tcg/s390x/tcg-target.c.inc
>> +++ b/tcg/s390x/tcg-target.c.inc
>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>>      case INDEX_op_ld_vec:
>>      case INDEX_op_dupm_vec:
>> +    case INDEX_op_rotli_vec:
>>          return C_O1_I1(v, r);
>>      case INDEX_op_dup_vec:
>>          return C_O1_I1(v, vr);
>>      case INDEX_op_abs_vec:
>>      case INDEX_op_neg_vec:
>>      case INDEX_op_not_vec:
>> -    case INDEX_op_rotli_vec:
>>      case INDEX_op_sari_vec:
>>      case INDEX_op_shli_vec:
>>      case INDEX_op_shri_vec:
>>      case INDEX_op_s390_vuph_vec:
>>      case INDEX_op_s390_vupl_vec:
>>          return C_O1_I1(v, v);
>
> Definitely not correct, since VERLL requires a vector input to be
> rotated.
>
>> But I'm outside of my comfort zone so not really sure what I'm doing...
>> (I was inspired by the "the instruction verll only allows immediates up
>> to 32 bits." comment from
>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
>
> That message is confused. The immediate in VERLL is 12 bits (with
> only 6 bits ever used for MO_64). Dunno where "32 bits" comes from.

So, what do we have here in the end?
Should we fix this on qemu side?

This thread stopped quite some time ago, with problematic
instruction found but no solution..

Thanks,

/mjt
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
> Finally changing the constraints on op_rotli_vec seems to fix it:
>
> ---
> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
> index fbee43d3b0..b3456fe857 100644
> --- a/tcg/s390x/tcg-target.c.inc
> +++ b/tcg/s390x/tcg-target.c.inc
> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>      case INDEX_op_ld_vec:
>      case INDEX_op_dupm_vec:
> +    case INDEX_op_rotli_vec:
>          return C_O1_I1(v, r);
>      case INDEX_op_dup_vec:
>          return C_O1_I1(v, vr);
>      case INDEX_op_abs_vec:
>      case INDEX_op_neg_vec:
>      case INDEX_op_not_vec:
> -    case INDEX_op_rotli_vec:
>      case INDEX_op_sari_vec:
>      case INDEX_op_shli_vec:
>      case INDEX_op_shri_vec:
>      case INDEX_op_s390_vuph_vec:
>      case INDEX_op_s390_vupl_vec:
>          return C_O1_I1(v, v);

Definitely not correct, since VERLL requires a vector input to be
rotated.

> But I'm outside of my comfort zone so not really sure what I'm doing...
> (I was inspired by the "the instruction verll only allows immediates up
> to 32 bits." comment from
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)

That message is confused. The immediate in VERLL is 12 bits (with
only 6 bits ever used for MO_64). Dunno where "32 bits" comes from.

r~
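To illustrate the remark about the VERLL rotate count, here is a minimal C sketch of an element rotate in which only the low log2(element width) bits of the count take effect, which gives 6 significant bits for 64-bit elements. The name `rotl_elem` and its parameters are made up for this note; this is not QEMU or s390x code, and the masking behaviour is an assumption drawn from the message above rather than a quotation from the architecture manual.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from QEMU: rotate the low `bits` bits
 * of x left by `count`, where bits is 8, 16, 32 or 64.
 * Assumption: only the low log2(bits) bits of the count are significant.
 */
static uint64_t rotl_elem(uint64_t x, unsigned count, unsigned bits)
{
    uint64_t mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    unsigned n = count & (bits - 1);   /* 6 significant bits when bits == 64 */

    x &= mask;
    if (n == 0) {
        return x;                      /* avoid shifting by the full width */
    }
    return ((x << n) | (x >> (bits - n))) & mask;
}
```

Under that assumption, a 12-bit count such as 0xfff applied to a 64-bit element behaves like a rotate by 63, and a count of 64 leaves the element unchanged.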
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
On 3/1/24 15:01, Philippe Mathieu-Daudé wrote:
> On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:
>> Hi Richard,
>>
>> On 3/1/24 09:54, Michael Tokarev wrote:
>>> 03.01.2024 03:22, Richard Henderson wrote:
>>>> On 12/22/23 01:51, Michael Tokarev wrote:
>>>>> ...
>>>>> git bisect points to this commit:
>>>>>
>>>>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
>>>>> Author: Richard Henderson
>>>>> Date: Wed Aug 23 23:04:24 2023 -0700
>>>>>
>>>>>     tcg/optimize: Optimize env memory operations
>>>>>
>>>>> So far, this seems to work on amd64 host, but fails on s390x host -
>>>>> where this has been observed so far. Maybe it also fails in some
>>>>> other combinations too, I don't yet know. Just finished bisecting it
>>>>> on s390x.
>>>>
>>>> I haven't been able to build a reproducer for this.
>>>> Have you an image or kernel you can share?
>>>
>>> Sure. Here's my actual testing "image":
>>>
>>> http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz
>>>
>>> It contains vmlinuz and initrd - generated on a debian s390x system
>>> using standard debian tools. Actual command line I used when doing
>>> bisection:
>>>
>>> ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot
>
> Reducing a bit further, it works when disabling rotli_vec opcode
> (commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):
>
> ---
> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
> index fbee43d3b0..5f147661e8 100644
> --- a/tcg/s390x/tcg-target.c.inc
> +++ b/tcg/s390x/tcg-target.c.inc
> @@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
>      case INDEX_op_orc_vec:
> +        return 1;
>      case INDEX_op_rotli_vec:
> +        return TCG_TARGET_HAS_roti_vec;
>      case INDEX_op_rotls_vec:
> diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
> index e69b0d2ddd..5c18146a40 100644
> --- a/tcg/s390x/tcg-target.h
> +++ b/tcg/s390x/tcg-target.h
> @@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
>  #define TCG_TARGET_HAS_abs_vec 1
> -#define TCG_TARGET_HAS_roti_vec 1
> +#define TCG_TARGET_HAS_roti_vec 0
>  #define TCG_TARGET_HAS_rots_vec 1
> ---

Finally changing the constraints on op_rotli_vec seems to fix it:

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..b3456fe857 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_ld_vec:
     case INDEX_op_dupm_vec:
+    case INDEX_op_rotli_vec:
         return C_O1_I1(v, r);
     case INDEX_op_dup_vec:
         return C_O1_I1(v, vr);
     case INDEX_op_abs_vec:
     case INDEX_op_neg_vec:
     case INDEX_op_not_vec:
-    case INDEX_op_rotli_vec:
     case INDEX_op_sari_vec:
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_s390_vuph_vec:
     case INDEX_op_s390_vupl_vec:
         return C_O1_I1(v, v);
---

But I'm outside of my comfort zone so not really sure what I'm doing...
(I was inspired by the "the instruction verll only allows immediates up
to 32 bits." comment from
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:
> Hi Richard,
>
> On 3/1/24 09:54, Michael Tokarev wrote:
>> 03.01.2024 03:22, Richard Henderson wrote:
>>> On 12/22/23 01:51, Michael Tokarev wrote:
>>>> ...
>>>> git bisect points to this commit:
>>>>
>>>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
>>>> Author: Richard Henderson
>>>> Date: Wed Aug 23 23:04:24 2023 -0700
>>>>
>>>>     tcg/optimize: Optimize env memory operations
>>>>
>>>> So far, this seems to work on amd64 host, but fails on s390x host -
>>>> where this has been observed so far. Maybe it also fails in some
>>>> other combinations too, I don't yet know. Just finished bisecting it
>>>> on s390x.
>>>
>>> I haven't been able to build a reproducer for this.
>>> Have you an image or kernel you can share?
>>
>> Sure. Here's my actual testing "image":
>>
>> http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz
>>
>> It contains vmlinuz and initrd - generated on a debian s390x system
>> using standard debian tools. Actual command line I used when doing
>> bisection:
>>
>> ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot
>
> I had a quick look at the reproducer and reduced the code area to:
>
> void tcg_optimize(TCGContext *s)
> {
>     ...
>     switch (opc) {
>     case INDEX_op_ld_vec:
>         done = fold_tcg_ld_memcopy(&ctx, op);
>
> static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op)
> {
>     ...
>     if (src && src->base_type == type) {
>         return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src));
>     }
>
> static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op,
>                             TCGArg dst, TCGArg src)
> {
>     ...
>     switch (ctx->type) {
>     case TCG_TYPE_V128:
>         new_op = INDEX_op_mov_vec;
>
> By disabling this optimization, the test succeeds.
>
> Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store")
> and remembering the constraints change on PPC LQ in
> https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.hender...@linaro.org/
> I wondered if LPQ constraints are correct, but I disabled
> TCG_TARGET_HAS_qemu_ldst_i128 and the bug persists (so re-enabled).
>
> Then disabling TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 the bug
> disappears.

Reducing a bit further, it works when disabling rotli_vec opcode
(commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..5f147661e8 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_orc_vec:
+        return 1;
     case INDEX_op_rotli_vec:
+        return TCG_TARGET_HAS_roti_vec;
     case INDEX_op_rotls_vec:
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e69b0d2ddd..5c18146a40 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_abs_vec 1
-#define TCG_TARGET_HAS_roti_vec 1
+#define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 1
---
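For context on why a vector rotate would matter to this self-test: the ChaCha core is built from 32-bit additions, XORs and rotate-lefts by 16, 12, 8 and 7, so a wrong rotate result would plausibly show up as wrong ciphertext. The following is a plain scalar reference of one quarter round, offered only as a sketch for cross-checking. The names `rotl32` and `chacha_quarter_round` are chosen for this note, and the code is not the kernel's chacha20-s390 implementation nor anything QEMU generates.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n, valid for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/*
 * One ChaCha quarter round on four 32-bit words, written as scalar C.
 * The four rotate amounts (16, 12, 8, 7) are the ones the cipher defines.
 */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

With the quarter-round test vector from RFC 8439 section 2.1.1 (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567), this should yield a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.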
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
Hi Richard,

On 3/1/24 09:54, Michael Tokarev wrote:
> 03.01.2024 03:22, Richard Henderson wrote:
>> On 12/22/23 01:51, Michael Tokarev wrote:
>>> ...
>>> git bisect points to this commit:
>>>
>>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
>>> Author: Richard Henderson
>>> Date: Wed Aug 23 23:04:24 2023 -0700
>>>
>>>     tcg/optimize: Optimize env memory operations
>>>
>>> So far, this seems to work on amd64 host, but fails on s390x host -
>>> where this has been observed so far. Maybe it also fails in some
>>> other combinations too, I don't yet know. Just finished bisecting it
>>> on s390x.
>>
>> I haven't been able to build a reproducer for this.
>> Have you an image or kernel you can share?
>
> Sure. Here's my actual testing "image":
>
> http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz
>
> It contains vmlinuz and initrd - generated on a debian s390x system
> using standard debian tools. Actual command line I used when doing
> bisection:
>
> ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot

I had a quick look at the reproducer and reduced the code area to:

void tcg_optimize(TCGContext *s)
{
    ...
    switch (opc) {
    case INDEX_op_ld_vec:
        done = fold_tcg_ld_memcopy(&ctx, op);

static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op)
{
    ...
    if (src && src->base_type == type) {
        return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src));
    }

static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op,
                            TCGArg dst, TCGArg src)
{
    ...
    switch (ctx->type) {
    case TCG_TYPE_V128:
        new_op = INDEX_op_mov_vec;

By disabling this optimization, the test succeeds.

Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store")
and remembering the constraints change on PPC LQ in
https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.hender...@linaro.org/
I wondered if LPQ constraints are correct, but I disabled
TCG_TARGET_HAS_qemu_ldst_i128 and the bug persists (so re-enabled).

Then disabling TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 the bug
disappears.

The problematic chacha20 guest code could be:

Restarting code generation with smaller translation block (max 86 insns)
IN:
0x3ff80025a62:  eb67 f030 0024  stmg %r6, %r7, 0x30(%r15)
0x3ff80025a68:  a719 ff60       lghi %r1, -0xa0
0x3ff80025a6c:  b904 000f       lgr %r0, %r15
0x3ff80025a70:  41f1 f000       la %r15, 0(%r1, %r15)
0x3ff80025a74:  e300 f000 0024  stg %r0, 0(%r15)
0x3ff80025a7a:  c070 0000 12c3  larl %r7, -0x7ffd8000
0x3ff80025a80:  a708 000a       lhi %r0, 0xa
0x3ff80025a84:  e789 5000 0c36  .byte 0xe7, 0x89, 0x50, 0x00, 0x0c, 0x36
0x3ff80025a8a:  e7a0 6000 0806  .byte 0xe7, 0xa0, 0x60, 0x00, 0x08, 0x06
0x3ff80025a90:  e7bf 7000 4c36  .byte 0xe7, 0xbf, 0x70, 0x00, 0x4c, 0x36
0x3ff80025a96:  e70b 0000 0456  .byte 0xe7, 0x0b, 0x00, 0x00, 0x04, 0x56
0x3ff80025a9c:  e718 0000 0456  .byte 0xe7, 0x18, 0x00, 0x00, 0x04, 0x56
0x3ff80025aa2:  e74b 0000 0456  .byte 0xe7, 0x4b, 0x00, 0x00, 0x04, 0x56
0x3ff80025aa8:  e758 0000 0456  .byte 0xe7, 0x58, 0x00, 0x00, 0x04, 0x56
0x3ff80025aae:  e78b 0000 0456  .byte 0xe7, 0x8b, 0x00, 0x00, 0x04, 0x56
0x3ff80025ab4:  e798 0000 0456  .byte 0xe7, 0x98, 0x00, 0x00, 0x04, 0x56
0x3ff80025aba:  e7cb 0000 0456  .byte 0xe7, 0xcb, 0x00, 0x00, 0x04, 0x56
0x3ff80025ac0:  e7d8 0000 0456  .byte 0xe7, 0xd8, 0x00, 0x00, 0x04, 0x56
0x3ff80025ac6:  e70b 0000 0c56  .byte 0xe7, 0x0b, 0x00, 0x00, 0x0c, 0x56
0x3ff80025acc:  e718 0000 0c56  .byte 0xe7, 0x18, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ad2:  e74b 0000 0c56  .byte 0xe7, 0x4b, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ad8:  e758 0000 0c56  .byte 0xe7, 0x58, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ade:  e73a 0000 0456  .byte 0xe7, 0x3a, 0x00, 0x00, 0x04, 0x56
0x3ff80025ae4:  e77a c000 26f3  .byte 0xe7, 0x7a, 0xc0, 0x00, 0x26, 0xf3
0x3ff80025aea:  e7ba d000 26f3  .byte 0xe7, 0xba, 0xd0, 0x00, 0x26, 0xf3
0x3ff80025af0:  e7fa e000 26f3  .byte 0xe7, 0xfa, 0xe0, 0x00, 0x26, 0xf3
0x3ff80025af6:  e73b d000 2af3  .byte 0xe7, 0x3b, 0xd0, 0x00, 0x2a, 0xf3
0x3ff80025afc:  e77b e000 2af3  .byte 0xe7, 0x7b, 0xe0, 0x00, 0x2a, 0xf3
0x3ff80025b02:  e729 0000 0456  .byte 0xe7, 0x29, 0x00, 0x00, 0x04, 0x56
0x3ff80025b08:  e769 0000 0456  .byte 0xe7, 0x69, 0x00, 0x00, 0x04, 0x56
0x3ff80025b0e:  e7a9 0000 0456  .byte 0xe7, 0xa9, 0x00, 0x00, 0x04, 0x56
0x3ff80025b14:  e7e9 0000 0456  .byte 0xe7, 0xe9, 0x00, 0x00, 0x04, 0x56
0x3ff80025b1a:  e729 0000 0c56  .byte 0xe7, 0x29, 0x00, 0x00, 0x0c, 0x56
0x3ff80025b20:  e769 0000 0c56  .byte 0xe7, 0x69, 0x00, 0x00, 0x0c, 0x56
0x3ff80025b26:  e7c7 0000 0856  .byte 0xe7, 0xc7, 0x00, 0x00, 0x08, 0x56
0x3ff80025b2c:  e7db 0000 0856  .byte 0xe7, 0xdb, 0x00, 0x00, 0x08, 0x56
0x3ff80025b32:  e7ef 0000 0856  .byte 0xe7, 0xef, 0x00, 0x00, 0x08, 0x56
0x3ff80025b38:  e700 1000 20f3  .byte 0xe7, 0x00,
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
03.01.2024 03:22, Richard Henderson wrote:
> On 12/22/23 01:51, Michael Tokarev wrote:
>> ...
>> git bisect points to this commit:
>>
>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
>> Author: Richard Henderson
>> Date: Wed Aug 23 23:04:24 2023 -0700
>>
>>     tcg/optimize: Optimize env memory operations
>>
>> So far, this seems to work on amd64 host, but fails on s390x host -
>> where this has been observed so far. Maybe it also fails in some
>> other combinations too, I don't yet know. Just finished bisecting it
>> on s390x.
>
> I haven't been able to build a reproducer for this.
> Have you an image or kernel you can share?

Sure. Here's my actual testing "image":

http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz

It contains vmlinuz and initrd - generated on a debian s390x system
using standard debian tools. Actual command line I used when doing
bisection:

~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot

This command has unrelated stuff, one of which is using of vmlinuz
as the hdd image (in my initial test it was real filesystem image,
but it doesn't really matter), - I don't need this filesystem to be
mounted, the prob is visible before the mount when crypto modules
are loaded. All it needs is to load crypto stuff, - in particular
it runs some selftests at this point.

But please note once again: it works just fine on amd64 hw. Where it
breaks is the actual s390x *host*, - I did all my tests on a debian
s390x porterbox, an actual s390x machine.

Thanks,

/mjt
Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
On 12/22/23 01:51, Michael Tokarev wrote:
> When running current kernel on s390x in tcg mode *on s390x hw*, the
> following is generated when loading crypto selftest module (it gets
> loaded automatically):
>
> [ 10.546690] alg: skcipher: chacha20-s390 encryption test failed (wrong result) on test vector 1, cfg="in-place (one sglist)"
> [ 10.546914] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22)
> [ 10.546969] [ cut here ]
> [ 10.546998] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22)
> [ 10.547182] WARNING: CPU: 1 PID: 109 at crypto/testmgr.c:5936 alg_test+0x55a/0x5b8
> [ 10.547510] Modules linked in: net_failover chacha_s390(+) libchacha virtio_blk(+) failover
> [ 10.547854] CPU: 1 PID: 109 Comm: cryptomgr_test Not tainted 6.5.0-5-s390x #1 Debian 6.5.13-1
> [ 10.548002] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [ 10.548101] Krnl PSW : 0704c0018000 005df8fe (alg_test+0x55e/0x5b8)
> [ 10.548207] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [ 10.548291] Krnl GPRS: 01286408 005df8fa 01286408
> [ 10.548337] 0014bf14 001c6ba8 01838b3c 0005
> [ 10.548475] 025a4880 025a4800 ffea ffea
> [ 10.548521] 3e649200 005df8fa 03800016bcf8
> [ 10.549504] Krnl Code: 005df8ee: c020003b5828 larl %r2,00d4a93e
> [ 10.549504] 005df8f4: c0e5ffdb62d2 brasl %r14,0014be98
> [ 10.549504] #005df8fa: af00 mc 0,0
> [ 10.549504] >005df8fe: a7f4fee6 brc 15,005df6ca
> [ 10.549504] 005df902: b9040042 lgr %r4,%r2
> [ 10.549504] 005df906: b9040039 lgr %r3,%r9
> [ 10.549504] 005df90a: c020003b57df larl %r2,00d4a8c8
> [ 10.549504] 005df910: 18bd lr %r11,%r13
> [ 10.550004] Call Trace:
> [ 10.550375] [<005df8fe>] alg_test+0x55e/0x5b8
> [ 10.550467] ([<005df8fa>] alg_test+0x55a/0x5b8)
> [ 10.550489] [<005d9fbc>] cryptomgr_test+0x34/0x60
> [ 10.550514] [<0017d004>] kthread+0x124/0x130
> [ 10.550539] [<00103124>] __ret_from_fork+0x3c/0x50
> [ 10.550562] [<00b1dfca>] ret_from_fork+0xa/0x30
> [ 10.550611] Last Breaking-Event-Address:
> [ 10.550626] [<0014bf20>] __warn_printk+0x88/0x110
> [ 10.550723] ---[ end trace ]---
>
> git bisect points to this commit:
>
> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
> Author: Richard Henderson
> Date: Wed Aug 23 23:04:24 2023 -0700
>
>     tcg/optimize: Optimize env memory operations
>
> So far, this seems to work on amd64 host, but fails on s390x host -
> where this has been observed so far. Maybe it also fails in some
> other combinations too, I don't yet know. Just finished bisecting it
> on s390x.

I haven't been able to build a reproducer for this.
Have you an image or kernel you can share?

r~