Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-17 Thread Philippe Mathieu-Daudé

On 17/1/24 12:53, Michael Tokarev wrote:
> 04.01.2024 01:51, Richard Henderson :
>> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>>
>>> ---
>>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>>> index fbee43d3b0..b3456fe857 100644
>>> --- a/tcg/s390x/tcg-target.c.inc
>>> +++ b/tcg/s390x/tcg-target.c.inc
>>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>>>   case INDEX_op_ld_vec:
>>>   case INDEX_op_dupm_vec:
>>> +    case INDEX_op_rotli_vec:
>>>   return C_O1_I1(v, r);
>>>   case INDEX_op_dup_vec:
>>>   return C_O1_I1(v, vr);
>>>   case INDEX_op_abs_vec:
>>>   case INDEX_op_neg_vec:
>>>   case INDEX_op_not_vec:
>>> -    case INDEX_op_rotli_vec:
>>>   case INDEX_op_sari_vec:
>>>   case INDEX_op_shli_vec:
>>>   case INDEX_op_shri_vec:
>>>   case INDEX_op_s390_vuph_vec:
>>>   case INDEX_op_s390_vupl_vec:
>>>   return C_O1_I1(v, v);
>>
>> Definitely not correct, since VERLL requires a vector input to be
>> rotated.
>>
>>> But I'm outside of my comfort zone so not really sure what I'm doing...
>>> (I was inspired by the "the instruction verll only allows immediates up
>>> to 32 bits." comment from
>>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
>>
>> That message is confused.  The immediate in VERLL is 12 bits (with
>> only 6 bits ever used for MO_64).  Dunno where "32 bits" comes from.
>
> So, what do we have here in the end?
> Should we fix this on qemu side?

Yes.

> This thread stopped quite some time ago, with problematic
> instruction found but no solution..

Be assured we are spending (too?) many hours on this...



Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-17 Thread Alex Bennée
Michael Tokarev  writes:

> 04.01.2024 01:51, Richard Henderson :
>> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>>
>>> ---
>>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>>> index fbee43d3b0..b3456fe857 100644
>>> --- a/tcg/s390x/tcg-target.c.inc
>>> +++ b/tcg/s390x/tcg-target.c.inc
>>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>>>   case INDEX_op_ld_vec:
>>>   case INDEX_op_dupm_vec:
>>> +    case INDEX_op_rotli_vec:
>>>   return C_O1_I1(v, r);
>>>   case INDEX_op_dup_vec:
>>>   return C_O1_I1(v, vr);
>>>   case INDEX_op_abs_vec:
>>>   case INDEX_op_neg_vec:
>>>   case INDEX_op_not_vec:
>>> -    case INDEX_op_rotli_vec:
>>>   case INDEX_op_sari_vec:
>>>   case INDEX_op_shli_vec:
>>>   case INDEX_op_shri_vec:
>>>   case INDEX_op_s390_vuph_vec:
>>>   case INDEX_op_s390_vupl_vec:
>>>   return C_O1_I1(v, v);
>> Definitely not correct, since VERLL requires a vector input to be
>> rotated.
>> 
>>> But I'm outside of my comfort zone so not really sure what I'm doing...
>>> (I was inspired by the "the instruction verll only allows immediates up
>>> to 32 bits." comment from
>>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
>> That message is confused.  The immediate in VERLL is 12 bits (with
>> only 6 bits ever used for MO_64).  Dunno where "32 bits" comes from.
>
> So, what do we have here in the end?
> Should we fix this on qemu side?

I think the thinking is we should disable the optimisation for the 8.2
stable while figuring out the true fix for 9.0.

>
> This thread stopped quite some time ago, with problematic
> instruction found but no solution..
>
> Thanks,
>
> /mjt

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-17 Thread Michael Tokarev

04.01.2024 01:51, Richard Henderson :
> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>
>> ---
>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>> index fbee43d3b0..b3456fe857 100644
>> --- a/tcg/s390x/tcg-target.c.inc
>> +++ b/tcg/s390x/tcg-target.c.inc
>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>>   case INDEX_op_ld_vec:
>>   case INDEX_op_dupm_vec:
>> +    case INDEX_op_rotli_vec:
>>   return C_O1_I1(v, r);
>>   case INDEX_op_dup_vec:
>>   return C_O1_I1(v, vr);
>>   case INDEX_op_abs_vec:
>>   case INDEX_op_neg_vec:
>>   case INDEX_op_not_vec:
>> -    case INDEX_op_rotli_vec:
>>   case INDEX_op_sari_vec:
>>   case INDEX_op_shli_vec:
>>   case INDEX_op_shri_vec:
>>   case INDEX_op_s390_vuph_vec:
>>   case INDEX_op_s390_vupl_vec:
>>   return C_O1_I1(v, v);
>
> Definitely not correct, since VERLL requires a vector input to be rotated.
>
>> But I'm outside of my comfort zone so not really sure what I'm doing...
>> (I was inspired by the "the instruction verll only allows immediates up
>> to 32 bits." comment from
>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
>
> That message is confused.  The immediate in VERLL is 12 bits (with only
> 6 bits ever used for MO_64).  Dunno where "32 bits" comes from.


So, what do we have here in the end?
Should we fix this on qemu side?

This thread stopped quite some time ago, with problematic
instruction found but no solution..

Thanks,

/mjt




Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-03 Thread Richard Henderson

On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
> Finally changing the constraints on op_rotli_vec seems to fix it:
>
> ---
> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
> index fbee43d3b0..b3456fe857 100644
> --- a/tcg/s390x/tcg-target.c.inc
> +++ b/tcg/s390x/tcg-target.c.inc
> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>   case INDEX_op_ld_vec:
>   case INDEX_op_dupm_vec:
> +    case INDEX_op_rotli_vec:
>   return C_O1_I1(v, r);
>   case INDEX_op_dup_vec:
>   return C_O1_I1(v, vr);
>   case INDEX_op_abs_vec:
>   case INDEX_op_neg_vec:
>   case INDEX_op_not_vec:
> -    case INDEX_op_rotli_vec:
>   case INDEX_op_sari_vec:
>   case INDEX_op_shli_vec:
>   case INDEX_op_shri_vec:
>   case INDEX_op_s390_vuph_vec:
>   case INDEX_op_s390_vupl_vec:
>   return C_O1_I1(v, v);

Definitely not correct, since VERLL requires a vector input to be rotated.

> But I'm outside of my comfort zone so not really sure what I'm doing...
> (I was inspired by the "the instruction verll only allows immediates up
> to 32 bits." comment from
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)


That message is confused.  The immediate in VERLL is 12 bits (with only 6 bits ever used 
for MO_64).  Dunno where "32 bits" comes from.



r~
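
For readers following the immediate-width point above: the sketch below
models, in plain C, the count handling Richard describes, where only 6 bits
of the immediate matter for MO_64. It is not QEMU or s390x backend code.
`elem_bits` and `rotl_elem` are hypothetical helper names, and applying the
same log2(width) rule to narrower elements is an assumption made here for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Element width in bits for an element size code:
 * 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit (MO_64). */
static unsigned elem_bits(unsigned vece)
{
    assert(vece <= 3);
    return 8u << vece;
}

/* Rotate one element left by an immediate count.
 * Assumption: only the low log2(width) bits of the count are
 * significant, i.e. 3 + vece bits (6 bits for MO_64). */
static uint64_t rotl_elem(uint64_t x, unsigned vece, unsigned imm)
{
    unsigned bits = elem_bits(vece);
    uint64_t mask = bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
    unsigned n = imm & (bits - 1);

    x &= mask;
    if (n == 0) {
        return x;               /* avoid an undefined shift by 'bits' */
    }
    return ((x << n) | (x >> (bits - n))) & mask;
}
```

Under these assumptions a count of 64 on a 64-bit element behaves like a
count of 0, so a 12-bit immediate field is already wider than anything the
rotate can use.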



Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-03 Thread Philippe Mathieu-Daudé

On 3/1/24 15:01, Philippe Mathieu-Daudé wrote:

On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:

Hi Richard,

On 3/1/24 09:54, Michael Tokarev wrote:

03.01.2024 03:22, Richard Henderson wrote:

On 12/22/23 01:51, Michael Tokarev wrote:

...

git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson 
Date:   Wed Aug 23 23:04:24 2023 -0700

 tcg/optimize: Optimize env memory operations

So far, this seems to work on amd64 host, but fails on s390x host -
where this has been observed so far.  Maybe it also fails in some
other combinations too, I don't yet know.  Just finished bisecting
it on s390x.


I haven't been able to build a reproducer for this.
Have you an image or kernel you can share?


Sure.

Here's my actual testing "image": 
http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz


It contains vmlinuz and initrd - generated on a debian s390x system using
standard debian tools.

Actual command line I used when doing bisection:

  ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot




Reducing a bit further, it works when disabling rotli_vec opcode
(commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..5f147661e8 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)

  case INDEX_op_orc_vec:
+    return 1;
  case INDEX_op_rotli_vec:
+    return TCG_TARGET_HAS_roti_vec;
  case INDEX_op_rotls_vec:
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e69b0d2ddd..5c18146a40 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
  #define TCG_TARGET_HAS_abs_vec    1
-#define TCG_TARGET_HAS_roti_vec   1
+#define TCG_TARGET_HAS_roti_vec   0
  #define TCG_TARGET_HAS_rots_vec   1
---


Finally changing the constraints on op_rotli_vec seems to fix it:

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..b3456fe857 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)

 case INDEX_op_ld_vec:
 case INDEX_op_dupm_vec:
+case INDEX_op_rotli_vec:
 return C_O1_I1(v, r);
 case INDEX_op_dup_vec:
 return C_O1_I1(v, vr);
 case INDEX_op_abs_vec:
 case INDEX_op_neg_vec:
 case INDEX_op_not_vec:
-case INDEX_op_rotli_vec:
 case INDEX_op_sari_vec:
 case INDEX_op_shli_vec:
 case INDEX_op_shri_vec:
 case INDEX_op_s390_vuph_vec:
 case INDEX_op_s390_vupl_vec:
 return C_O1_I1(v, v);
---

But I'm outside of my comfort zone so not really sure what I'm doing...
(I was inspired by the "the instruction verll only allows immediates up
to 32 bits." comment from
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)



Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-03 Thread Philippe Mathieu-Daudé

On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:

Hi Richard,

On 3/1/24 09:54, Michael Tokarev wrote:

03.01.2024 03:22, Richard Henderson wrote:

On 12/22/23 01:51, Michael Tokarev wrote:

...

git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson 
Date:   Wed Aug 23 23:04:24 2023 -0700

 tcg/optimize: Optimize env memory operations

So far, this seems to work on amd64 host, but fails on s390x host -
where this has been observed so far.  Maybe it also fails in some
other combinations too, I don't yet know.  Just finished bisecting
it on s390x.


I haven't been able to build a reproducer for this.
Have you an image or kernel you can share?


Sure.

Here's my actual testing "image": 
http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz


It contains vmlinuz and initrd - generated on a debian s390x system using
standard debian tools.

Actual command line I used when doing bisection:

  ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot


I had a quick look at the reproducer and reduced the code
area to:

void tcg_optimize(TCGContext *s)
{
     ...
     switch (opc) {
     case INDEX_op_ld_vec:
     done = fold_tcg_ld_memcopy(&ctx, op);


static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op)
{
     ...
     if (src && src->base_type == type) {
     return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src));
     }


static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
{
     ...
     switch (ctx->type) {
     case TCG_TYPE_V128:
     new_op = INDEX_op_mov_vec;


By disabling this optimization, the test succeeds.
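
To make the folding above concrete, here is a minimal toy model in C. It is
a sketch and not taken from QEMU: a store to a CPU-state offset remembers
which temporary holds the value, and a later load of the same offset and
type is rewritten into a move from that temporary. All names (`toy_op`,
`toy_copy`, `toy_fold_loads`, ...) are hypothetical, and the model assumes
that tracked slots never overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, not QEMU's TCGOp. */
enum toy_opc { TOY_ST, TOY_LD, TOY_MOV };
enum toy_type { TOY_I64, TOY_V128 };

struct toy_op {
    enum toy_opc opc;
    enum toy_type type;
    int temp;      /* value stored (ST) or destination (LD, MOV) */
    int src;       /* source temp, only meaningful for MOV */
    size_t off;    /* offset into the CPU state, for ST and LD */
};

#define TOY_MAX_TRACKED 8

struct toy_copy { size_t off; enum toy_type type; int temp; bool valid; };

/* Drop every record that claims 'temp' mirrors a memory slot. */
static void toy_forget_temp(struct toy_copy *known, int temp)
{
    for (size_t i = 0; i < TOY_MAX_TRACKED; i++) {
        if (known[i].valid && known[i].temp == temp) {
            known[i].valid = false;
        }
    }
}

/* Remember that the slot at op->off now holds the value of op->temp. */
static void toy_record_store(struct toy_copy *known, const struct toy_op *op)
{
    size_t slot = TOY_MAX_TRACKED;

    for (size_t i = 0; i < TOY_MAX_TRACKED; i++) {
        if (known[i].valid && known[i].off == op->off) {
            slot = i;           /* same slot stored again: overwrite */
            break;
        }
        if (!known[i].valid && slot == TOY_MAX_TRACKED) {
            slot = i;           /* remember the first free entry */
        }
    }
    if (slot < TOY_MAX_TRACKED) {
        known[slot] = (struct toy_copy){ op->off, op->type, op->temp, true };
    }
}

/* Rewrite loads of known slots into moves; returns how many were folded. */
static int toy_fold_loads(struct toy_op *ops, size_t n)
{
    struct toy_copy known[TOY_MAX_TRACKED] = { 0 };
    int folded = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_op *op = &ops[i];

        if (op->opc == TOY_ST) {
            toy_record_store(known, op);
            continue;
        }
        if (op->opc == TOY_LD) {
            for (size_t k = 0; k < TOY_MAX_TRACKED; k++) {
                if (known[k].valid && known[k].off == op->off
                    && known[k].type == op->type) {
                    op->opc = TOY_MOV;      /* load becomes a move */
                    op->src = known[k].temp;
                    folded++;
                    break;
                }
            }
        }
        /* LD and MOV overwrite op->temp, so older copies of it are stale. */
        toy_forget_temp(known, op->temp);
    }
    return folded;
}
```

In this model the type check plays the role of the `src->base_type == type`
test quoted above: a 128-bit store followed by a 64-bit load of the same
offset is left as a load.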

Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store")
and remembering the constraints change on PPC LQ in
https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.hender...@linaro.org/
I wondered if LPQ constraints are correct, but I disabled
TCG_TARGET_HAS_qemu_ldst_i128 and the bug persists (so
re-enabled).

Then disabling TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 the bug
disappears.


Reducing a bit further, it works when disabling rotli_vec opcode
(commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..5f147661e8 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)

 case INDEX_op_orc_vec:
+return 1;
 case INDEX_op_rotli_vec:
+return TCG_TARGET_HAS_roti_vec;
 case INDEX_op_rotls_vec:
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e69b0d2ddd..5c18146a40 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_abs_vec    1
-#define TCG_TARGET_HAS_roti_vec   1
+#define TCG_TARGET_HAS_roti_vec   0
 #define TCG_TARGET_HAS_rots_vec   1
---



Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-03 Thread Philippe Mathieu-Daudé

Hi Richard,

On 3/1/24 09:54, Michael Tokarev wrote:

03.01.2024 03:22, Richard Henderson wrote:

On 12/22/23 01:51, Michael Tokarev wrote:

...

git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson 
Date:   Wed Aug 23 23:04:24 2023 -0700

 tcg/optimize: Optimize env memory operations

So far, this seems to work on amd64 host, but fails on s390x host -
where this has been observed so far.  Maybe it also fails in some
other combinations too, I don't yet know.  Just finished bisecting
it on s390x.


I haven't been able to build a reproducer for this.
Have you an image or kernel you can share?


Sure.

Here's my actual testing "image": 
http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz


It contains vmlinuz and initrd - generated on a debian s390x system using
standard debian tools.

Actual command line I used when doing bisection:

  ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot


I had a quick look at the reproducer and reduced the code
area to:

void tcg_optimize(TCGContext *s)
{
...
switch (opc) {
case INDEX_op_ld_vec:
done = fold_tcg_ld_memcopy(&ctx, op);


static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op)
{
...
if (src && src->base_type == type) {
return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src));
}


static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
{
...
switch (ctx->type) {
case TCG_TYPE_V128:
new_op = INDEX_op_mov_vec;


By disabling this optimization, the test succeeds.

Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store")
and remembering the constraints change on PPC LQ in
https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.hender...@linaro.org/
I wondered if LPQ constraints are correct, but I disabled
TCG_TARGET_HAS_qemu_ldst_i128 and the bug persists (so
re-enabled).

Then disabling TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 the bug
disappears.

The problematic chacha20 guest code could be:

Restarting code generation with smaller translation block (max 86 insns)

IN:
0x3ff80025a62:  eb67 f030 0024  stmg %r6, %r7, 0x30(%r15)
0x3ff80025a68:  a719 ff60   lghi %r1, -0xa0
0x3ff80025a6c:  b904 000f   lgr  %r0, %r15
0x3ff80025a70:  41f1 f000   la   %r15, 0(%r1, %r15)
0x3ff80025a74:  e300 f000 0024  stg  %r0, 0(%r15)
0x3ff80025a7a:  c070  12c3  larl %r7, -0x7ffd8000
0x3ff80025a80:  a708 000a   lhi  %r0, 0xa
0x3ff80025a84:  e789 5000 0c36  .byte 0xe7, 0x89, 0x50, 0x00, 0x0c, 0x36
0x3ff80025a8a:  e7a0 6000 0806  .byte 0xe7, 0xa0, 0x60, 0x00, 0x08, 0x06
0x3ff80025a90:  e7bf 7000 4c36  .byte 0xe7, 0xbf, 0x70, 0x00, 0x4c, 0x36
0x3ff80025a96:  e70b  0456  .byte 0xe7, 0x0b, 0x00, 0x00, 0x04, 0x56
0x3ff80025a9c:  e718  0456  .byte 0xe7, 0x18, 0x00, 0x00, 0x04, 0x56
0x3ff80025aa2:  e74b  0456  .byte 0xe7, 0x4b, 0x00, 0x00, 0x04, 0x56
0x3ff80025aa8:  e758  0456  .byte 0xe7, 0x58, 0x00, 0x00, 0x04, 0x56
0x3ff80025aae:  e78b  0456  .byte 0xe7, 0x8b, 0x00, 0x00, 0x04, 0x56
0x3ff80025ab4:  e798  0456  .byte 0xe7, 0x98, 0x00, 0x00, 0x04, 0x56
0x3ff80025aba:  e7cb  0456  .byte 0xe7, 0xcb, 0x00, 0x00, 0x04, 0x56
0x3ff80025ac0:  e7d8  0456  .byte 0xe7, 0xd8, 0x00, 0x00, 0x04, 0x56
0x3ff80025ac6:  e70b  0c56  .byte 0xe7, 0x0b, 0x00, 0x00, 0x0c, 0x56
0x3ff80025acc:  e718  0c56  .byte 0xe7, 0x18, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ad2:  e74b  0c56  .byte 0xe7, 0x4b, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ad8:  e758  0c56  .byte 0xe7, 0x58, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ade:  e73a  0456  .byte 0xe7, 0x3a, 0x00, 0x00, 0x04, 0x56
0x3ff80025ae4:  e77a c000 26f3  .byte 0xe7, 0x7a, 0xc0, 0x00, 0x26, 0xf3
0x3ff80025aea:  e7ba d000 26f3  .byte 0xe7, 0xba, 0xd0, 0x00, 0x26, 0xf3
0x3ff80025af0:  e7fa e000 26f3  .byte 0xe7, 0xfa, 0xe0, 0x00, 0x26, 0xf3
0x3ff80025af6:  e73b d000 2af3  .byte 0xe7, 0x3b, 0xd0, 0x00, 0x2a, 0xf3
0x3ff80025afc:  e77b e000 2af3  .byte 0xe7, 0x7b, 0xe0, 0x00, 0x2a, 0xf3
0x3ff80025b02:  e729  0456  .byte 0xe7, 0x29, 0x00, 0x00, 0x04, 0x56
0x3ff80025b08:  e769  0456  .byte 0xe7, 0x69, 0x00, 0x00, 0x04, 0x56
0x3ff80025b0e:  e7a9  0456  .byte 0xe7, 0xa9, 0x00, 0x00, 0x04, 0x56
0x3ff80025b14:  e7e9  0456  .byte 0xe7, 0xe9, 0x00, 0x00, 0x04, 0x56
0x3ff80025b1a:  e729  0c56  .byte 0xe7, 0x29, 0x00, 0x00, 0x0c, 0x56
0x3ff80025b20:  e769  0c56  .byte 0xe7, 0x69, 0x00, 0x00, 0x0c, 0x56
0x3ff80025b26:  e7c7  0856  .byte 0xe7, 0xc7, 0x00, 0x00, 0x08, 0x56
0x3ff80025b2c:  e7db  0856  .byte 0xe7, 0xdb, 0x00, 0x00, 0x08, 0x56
0x3ff80025b32:  e7ef  0856  .byte 0xe7, 0xef, 0x00, 0x00, 0x08, 0x56
0x3ff80025b38:  e700 1000 20f3  .byte 0xe7, 0x00, 

Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-03 Thread Michael Tokarev

03.01.2024 03:22, Richard Henderson wrote:

On 12/22/23 01:51, Michael Tokarev wrote:

...

git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson 
Date:   Wed Aug 23 23:04:24 2023 -0700

 tcg/optimize: Optimize env memory operations

So far, this seems to work on amd64 host, but fails on s390x host -
where this has been observed so far.  Maybe it also fails in some
other combinations too, I don't yet know.  Just finished bisecting
it on s390x.


I haven't been able to build a reproducer for this.
Have you an image or kernel you can share?


Sure.

Here's my actual testing "image": 
http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz

It contains vmlinuz and initrd - generated on a debian s390x system using
standard debian tools.

Actual command line I used when doing bisection:

 ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot


This command has unrelated stuff, one of which is the use of vmlinuz as the hdd
image (in my initial test it was a real filesystem image, but it doesn't really
matter) - I don't need this filesystem to be mounted, the problem is visible
before the mount, when the crypto modules are loaded.

All it needs is to load crypto stuff, - in particular it runs some selftests
at this point.

But please note once again: it works just fine on amd64 hw.  Where it breaks
is the actual s390x *host*, - I did all my tests on a debian s390x porterbox,
an actual s390x machine.

Thanks,

/mjt



Re: chacha20-s390 broken in 8.2.0 in TCG on s390x

2024-01-02 Thread Richard Henderson

On 12/22/23 01:51, Michael Tokarev wrote:

When running current kernel on s390x in tcg mode *on s390x hw*, the following
is generated when loading crypto selftest module (it gets loaded automatically):

[   10.546690] alg: skcipher: chacha20-s390 encryption test failed (wrong result) on test vector 1, cfg="in-place (one sglist)"
[   10.546914] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22)
[   10.546969] [ cut here ]
[   10.546998] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22)
[   10.547182] WARNING: CPU: 1 PID: 109 at crypto/testmgr.c:5936 alg_test+0x55a/0x5b8
[   10.547510] Modules linked in: net_failover chacha_s390(+) libchacha virtio_blk(+) failover
[   10.547854] CPU: 1 PID: 109 Comm: cryptomgr_test Not tainted 6.5.0-5-s390x #1  Debian 6.5.13-1
[   10.548002] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[   10.548101] Krnl PSW : 0704c0018000 005df8fe (alg_test+0x55e/0x5b8)
[   10.548207]    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   10.548291] Krnl GPRS:  01286408 005df8fa 01286408
[   10.548337]    0014bf14 001c6ba8 01838b3c 0005
[   10.548475]    025a4880 025a4800 ffea ffea
[   10.548521]    3e649200  005df8fa 03800016bcf8
[   10.549504] Krnl Code: 005df8ee: c020003b5828    larl    %r2,00d4a93e
[   10.549504]    005df8f4: c0e5ffdb62d2    brasl    %r14,0014be98
[   10.549504]   #005df8fa: af00    mc    0,0
[   10.549504]   >005df8fe: a7f4fee6    brc    15,005df6ca
[   10.549504]    005df902: b9040042    lgr    %r4,%r2
[   10.549504]    005df906: b9040039    lgr    %r3,%r9
[   10.549504]    005df90a: c020003b57df    larl    
%r2,00d4a8c8
[   10.549504]    005df910: 18bd    lr    %r11,%r13
[   10.550004] Call Trace:
[   10.550375]  [<005df8fe>] alg_test+0x55e/0x5b8
[   10.550467] ([<005df8fa>] alg_test+0x55a/0x5b8)
[   10.550489]  [<005d9fbc>] cryptomgr_test+0x34/0x60
[   10.550514]  [<0017d004>] kthread+0x124/0x130
[   10.550539]  [<00103124>] __ret_from_fork+0x3c/0x50
[   10.550562]  [<00b1dfca>] ret_from_fork+0xa/0x30
[   10.550611] Last Breaking-Event-Address:
[   10.550626]  [<0014bf20>] __warn_printk+0x88/0x110
[   10.550723] ---[ end trace  ]---

git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson 
Date:   Wed Aug 23 23:04:24 2023 -0700

     tcg/optimize: Optimize env memory operations

So far, this seems to work on amd64 host, but fails on s390x host -
where this has been observed so far.  Maybe it also fails in some
other combinations too, I don't yet know.  Just finished bisecting
it on s390x.


I haven't been able to build a reproducer for this.
Have you an image or kernel you can share?


r~