Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s

2021-04-06 Thread Borislav Petkov
On Tue, Apr 06, 2021 at 12:58:15PM +0200, Paul Menzel wrote:
> I booted Linux 5.12-rc6, containing these commits, on a Dell OptiPlex 5055
> with AMD Ryzen 5 PRO 1500 Quad-Core Processor, and the regression is still
> present for `avx2x4 xor()`:

So I don't think that's a regression - this looks more like "you should
not look at those numbers and compare them". Below are some results from
boot logs on one of my test boxes, first column is the kernel version.

IOW, you can use those numbers as a random number generator.

Now, I'm not saying that there isn't anything happening after
5.4-5.6-ish timeframe but this needs to be checked with a proper
benchmark and then look at what could be causing this. It could be the
MXCSR clearing but it's not like we don't need that so there won't be a
whole lot we can do.

But someone would have to sit down and do proper measurements first. And
bisect. Then we'll see...

HTH.

01-0+   :raid6: avx2x4   xor() 10311 MB/s
01-rc3+ :raid6: avx2x4   xor()  5497 MB/s
01-rc6+ :raid6: avx2x4   xor()  5369 MB/s
02-rc3+ :raid6: avx2x4   xor()  9812 MB/s
02-rc5+ :raid6: avx2x4   xor() 11479 MB/s
03-rc1+ :raid6: avx2x4   xor()  6434 MB/s
03-rc2+ :raid6: avx2x4   xor()  5487 MB/s
03-rc3+ :raid6: avx2x4   xor()  4840 MB/s
03-rc5+ :raid6: avx2x4   xor() 11104 MB/s
04-rc1+ :raid6: avx2x4   xor()  6443 MB/s
04-rc2+ :raid6: avx2x4   xor()  4959 MB/s
04-rc3+ :raid6: avx2x4   xor()  4918 MB/s
04-rc7+ :raid6: avx2x4   xor()  5219 MB/s
05-rc1+ :raid6: avx2x4   xor()  5362 MB/s
05-rc2+ :raid6: avx2x4   xor()  5356 MB/s
05-rc7+ :raid6: avx2x4   xor()  5821 MB/s
06-rc1+ :raid6: avx2x4   xor()  3358 MB/s
06-rc2+ :raid6: avx2x4   xor()  3591 MB/s
06-rc4+ :raid6: avx2x4   xor()  3947 MB/s
06-rc6+ :raid6: avx2x4   xor()  4100 MB/s
06-rc7+ :raid6: avx2x4   xor()  4038 MB/s
07-0+   :raid6: avx2x4   xor()  3410 MB/s
07-rc1+ :raid6: avx2x4   xor()  4836 MB/s
07-rc2+ :raid6: avx2x4   xor()  3194 MB/s
07-rc5  :raid6: avx2x4   xor()  4220 MB/s
07-rc6+ :raid6: avx2x4   xor()  3949 MB/s
07-rc7+ :raid6: avx2x4   xor()  3238 MB/s
09-0+   :raid6: avx2x4   xor()  3259 MB/s
09-rc1+ :raid6: avx2x4   xor()  2963 MB/s
09-rc4+ :raid6: avx2x4   xor()  2593 MB/s
09-rc5+ :raid6: avx2x4   xor()  2555 MB/s
09-rc7+ :raid6: avx2x4   xor()   MB/s
09-rc8+ :raid6: avx2x4   xor()  2979 MB/s
10-rc4+ :raid6: avx2x4   xor()  4482 MB/s
10-rc5+ :raid6: avx2x4   xor()  6170 MB/s
10-rc7+ :raid6: avx2x4   xor()  3557 MB/s
11-rc1+ :raid6: avx2x4   xor()  1461 MB/s
11-rc2+ :raid6: avx2x4   xor()  4095 MB/s
11-rc7+ :raid6: avx2x4   xor()  6088 MB/s
12-rc1+ :raid6: avx2x4   xor()  4147 MB/s
12-rc2+ :raid6: avx2x4   xor()  4361 MB/s
12-rc3+ :raid6: avx2x4   xor()  4070 MB/s
12-rc4+ :raid6: avx2x4   xor()  6078 MB/s
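
To put a number on how widely the figures above scatter, here is a minimal sketch that parses the listed lines and prints the overall and per-series spread. The names `LOG`, `parse` and `spread_by_series` are made up for this illustration, and the line with no MB/s value is simply skipped. It only summarizes the single-boot numbers quoted here and is not a substitute for a proper benchmark.

```python
import re
import statistics
from collections import defaultdict

# Lines copied verbatim from the boot-log excerpt above; the first column is
# the kernel tag (e.g. "01-rc3+"), the last number is the reported MB/s.
LOG = """\
01-0+   :raid6: avx2x4   xor() 10311 MB/s
01-rc3+ :raid6: avx2x4   xor()  5497 MB/s
01-rc6+ :raid6: avx2x4   xor()  5369 MB/s
02-rc3+ :raid6: avx2x4   xor()  9812 MB/s
02-rc5+ :raid6: avx2x4   xor() 11479 MB/s
03-rc1+ :raid6: avx2x4   xor()  6434 MB/s
03-rc2+ :raid6: avx2x4   xor()  5487 MB/s
03-rc3+ :raid6: avx2x4   xor()  4840 MB/s
03-rc5+ :raid6: avx2x4   xor() 11104 MB/s
04-rc1+ :raid6: avx2x4   xor()  6443 MB/s
04-rc2+ :raid6: avx2x4   xor()  4959 MB/s
04-rc3+ :raid6: avx2x4   xor()  4918 MB/s
04-rc7+ :raid6: avx2x4   xor()  5219 MB/s
05-rc1+ :raid6: avx2x4   xor()  5362 MB/s
05-rc2+ :raid6: avx2x4   xor()  5356 MB/s
05-rc7+ :raid6: avx2x4   xor()  5821 MB/s
06-rc1+ :raid6: avx2x4   xor()  3358 MB/s
06-rc2+ :raid6: avx2x4   xor()  3591 MB/s
06-rc4+ :raid6: avx2x4   xor()  3947 MB/s
06-rc6+ :raid6: avx2x4   xor()  4100 MB/s
06-rc7+ :raid6: avx2x4   xor()  4038 MB/s
07-0+   :raid6: avx2x4   xor()  3410 MB/s
07-rc1+ :raid6: avx2x4   xor()  4836 MB/s
07-rc2+ :raid6: avx2x4   xor()  3194 MB/s
07-rc5  :raid6: avx2x4   xor()  4220 MB/s
07-rc6+ :raid6: avx2x4   xor()  3949 MB/s
07-rc7+ :raid6: avx2x4   xor()  3238 MB/s
09-0+   :raid6: avx2x4   xor()  3259 MB/s
09-rc1+ :raid6: avx2x4   xor()  2963 MB/s
09-rc4+ :raid6: avx2x4   xor()  2593 MB/s
09-rc5+ :raid6: avx2x4   xor()  2555 MB/s
09-rc7+ :raid6: avx2x4   xor()   MB/s
09-rc8+ :raid6: avx2x4   xor()  2979 MB/s
10-rc4+ :raid6: avx2x4   xor()  4482 MB/s
10-rc5+ :raid6: avx2x4   xor()  6170 MB/s
10-rc7+ :raid6: avx2x4   xor()  3557 MB/s
11-rc1+ :raid6: avx2x4   xor()  1461 MB/s
11-rc2+ :raid6: avx2x4   xor()  4095 MB/s
11-rc7+ :raid6: avx2x4   xor()  6088 MB/s
12-rc1+ :raid6: avx2x4   xor()  4147 MB/s
12-rc2+ :raid6: avx2x4   xor()  4361 MB/s
12-rc3+ :raid6: avx2x4   xor()  4070 MB/s
12-rc4+ :raid6: avx2x4   xor()  6078 MB/s
"""

LINE_RE = re.compile(r"^(\d+)-(\S+)\s*:raid6: avx2x4\s+xor\(\)\s+(\d+) MB/s$")


def parse(log):
    """Return (series, tag, mbps) tuples; lines without a number are skipped."""
    rows = []
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m.group(1), m.group(2), int(m.group(3))))
    return rows


def spread_by_series(rows):
    """Group MB/s values by the leading series number of the kernel tag."""
    groups = defaultdict(list)
    for series, _tag, mbps in rows:
        groups[series].append(mbps)
    return dict(groups)


rows = parse(LOG)
values = [mbps for _series, _tag, mbps in rows]
print(f"n={len(values)} min={min(values)} max={max(values)} "
      f"mean={statistics.mean(values):.0f} stdev={statistics.stdev(values):.0f}")
for series, vals in sorted(spread_by_series(rows).items()):
    print(f"{series}: n={len(vals)} min={min(vals)} max={max(vals)}")
```

Grouping by the leading series number shows the swing inside a single release cycle, which is the comparison that matters when deciding whether a difference between two kernels means anything.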

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s

2021-04-06 Thread Paul Menzel

Dear Borislav,


On 02.04.21 at 16:05, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:
>> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
>> shown at the beginning of the boot.
>>
>>                       5.4.95        5.10.24
>> ----------------------------------------------
>> raid6: avx2x4 gen()   18429 MB/s     6155 MB/s
>> raid6: avx2x4 xor()    6644 MB/s     4274 MB/s
>> raid6: avx2x2 gen()   17894 MB/s    18744 MB/s
>> raid6: avx2x2 xor()   11642 MB/s    11950 MB/s
>> raid6: avx2x1 gen()   13992 MB/s    17112 MB/s
>> raid6: avx2x1 xor()   10855 MB/s    11143 MB/s
>
> Looks like those two might help:
>
> 49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
> e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state


I booted Linux 5.12-rc6, containing these commits, on a Dell OptiPlex 
5055 with AMD Ryzen 5 PRO 1500 Quad-Core Processor, and the regression 
is still present for `avx2x4 xor()`:


                      5.4.95        5.10.24
----------------------------------------------
raid6: avx2x4 gen()   23964 MB/s    24540 MB/s
raid6: avx2x4 xor()   13101 MB/s     8354 MB/s
raid6: avx2x2 gen()   22746 MB/s    26972 MB/s
raid6: avx2x2 xor()   14917 MB/s    16463 MB/s
raid6: avx2x1 gen()   17519 MB/s    24394 MB/s
raid6: avx2x1 xor()   14091 MB/s    15330 MB/s
raid6: sse2x4 gen()   16867 MB/s    16136 MB/s
raid6: sse2x4 xor()    9667 MB/s     8176 MB/s
raid6: sse2x2 gen()   14996 MB/s    18234 MB/s
raid6: sse2x2 xor()   10765 MB/s    10455 MB/s
raid6: sse2x1 gen()    7667 MB/s    13769 MB/s
raid6: sse2x1 xor()    7818 MB/s     7741 MB/s

What system are you using, and what results do you get with 5.4 and 
5.12-rc6?



Kind regards,

Paul


Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s

2021-04-03 Thread Thomas Backlund
On 2021-04-02 at 17:05, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:
>> Dear Linux folks,
>>
>>
>> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
>> shown at the beginning of the boot.
>>
>>                       5.4.95        5.10.24
>> ----------------------------------------------
>> raid6: avx2x4 gen()   18429 MB/s     6155 MB/s
>> raid6: avx2x4 xor()    6644 MB/s     4274 MB/s
>> raid6: avx2x2 gen()   17894 MB/s    18744 MB/s
>> raid6: avx2x2 xor()   11642 MB/s    11950 MB/s
>> raid6: avx2x1 gen()   13992 MB/s    17112 MB/s
>> raid6: avx2x1 xor()   10855 MB/s    11143 MB/s
>
> Looks like those two might help:
>

That would mean only this is missing:
> 49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()


as this one landed in 5.10.11:
> e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>

--
Thomas



Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s

2021-04-02 Thread Borislav Petkov
On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
> shown at the beginning of the boot.
> 
>                       5.4.95        5.10.24
> ----------------------------------------------
> raid6: avx2x4 gen()   18429 MB/s     6155 MB/s
> raid6: avx2x4 xor()    6644 MB/s     4274 MB/s
> raid6: avx2x2 gen()   17894 MB/s    18744 MB/s
> raid6: avx2x2 xor()   11642 MB/s    11950 MB/s
> raid6: avx2x1 gen()   13992 MB/s    17112 MB/s
> raid6: avx2x1 xor()   10855 MB/s    11143 MB/s

Looks like those two might help:

49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s

2021-04-02 Thread Paul Menzel

Dear Linux folks,


On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4
speed shown at the beginning of the boot.


                      5.4.95        5.10.24
----------------------------------------------
raid6: avx2x4 gen()   18429 MB/s     6155 MB/s
raid6: avx2x4 xor()    6644 MB/s     4274 MB/s
raid6: avx2x2 gen()   17894 MB/s    18744 MB/s
raid6: avx2x2 xor()   11642 MB/s    11950 MB/s
raid6: avx2x1 gen()   13992 MB/s    17112 MB/s
raid6: avx2x1 xor()   10855 MB/s    11143 MB/s

We are able to reproduce this with different models: Supermicro 
AS-2023US-TR4/H11DSU-iN and Dell PowerEdge R7425 (with different 
microcode versions).


Can you reproduce this on your systems?

Bisecting is going to be hard, since the systems are in production and also
take a while to boot. (Maybe kexec would help here.)



Kind regards,

Paul


PS: Some more information:

```
[0.00] Linux version 5.4.97.mx64.368 (r...@theinternet.molgen.mpg.de) (gcc version 7.5.0 (GCC)) #1 SMP Wed Feb 10 18:22:50 CET 2021
[…]
[0.00] DMI: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.1 02/07/2018
[…]
[0.630603] raid6: avx2x4   gen() 18429 MB/s
[0.651607] raid6: avx2x4   xor()  6644 MB/s
[0.672605] raid6: avx2x2   gen() 17894 MB/s
[0.693603] raid6: avx2x2   xor() 11642 MB/s
[0.714605] raid6: avx2x1   gen() 13992 MB/s
[0.735604] raid6: avx2x1   xor() 10855 MB/s
[0.756607] raid6: sse2x4   gen() 12246 MB/s
[0.777605] raid6: sse2x4   xor()  5724 MB/s
[0.798605] raid6: sse2x2   gen() 10945 MB/s
[0.819603] raid6: sse2x2   xor()  8097 MB/s
[0.840606] raid6: sse2x1   gen()  5941 MB/s
[0.861606] raid6: sse2x1   xor()  5894 MB/s
[0.866565] raid6: using algorithm avx2x4 gen() 18429 MB/s
[0.871567] raid6:  xor() 6644 MB/s, rmw enabled
[0.877566] raid6: using avx2x2 recovery algorithm
[…]
```


```
[0.00] Linux version 5.10.24.mx64.375 (r...@theinternet.molgen.mpg.de) (gcc (GCC) 7.5.0, GNU ld (GNU Binutils) 2.32) #1 SMP Fri Mar 19 12:29:21 CET 2021
[…]
[0.00] DMI: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.1 02/07/2018
[…]
[0.655382] raid6: avx2x4   gen()  6155 MB/s
[0.676382] raid6: avx2x4   xor()  4274 MB/s
[0.697380] raid6: avx2x2   gen() 18744 MB/s
[0.718380] raid6: avx2x2   xor() 11950 MB/s
[0.739380] raid6: avx2x1   gen() 17112 MB/s
[0.760380] raid6: avx2x1   xor() 11143 MB/s
[0.781381] raid6: sse2x4   gen() 11062 MB/s
[0.802380] raid6: sse2x4   xor()  5180 MB/s
[0.823380] raid6: sse2x2   gen() 12467 MB/s
[0.844380] raid6: sse2x2   xor()  7672 MB/s
[0.865381] raid6: sse2x1   gen()  9733 MB/s
[0.886380] raid6: sse2x1   xor()  5717 MB/s
[0.890674] raid6: using algorithm avx2x2 gen() 18744 MB/s
[0.895673] raid6:  xor() 11950 MB/s, rmw enabled
[0.901673] raid6: using avx2x2 recovery algorithm
```
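
For a side-by-side view of the two boot logs above, a small sketch like the following can pull out the per-algorithm figures and print the new/old ratio. The names `OLD_LOG`, `NEW_LOG`, `BENCH_RE` and `parse_raid6` are hypothetical and exist only for this illustration. Each log is a single boot, so the ratios show where to look rather than a measured regression.

```python
import re

# Excerpts copied from the two dmesg logs above (same Supermicro board).
OLD_LOG = """\
[0.630603] raid6: avx2x4   gen() 18429 MB/s
[0.651607] raid6: avx2x4   xor()  6644 MB/s
[0.672605] raid6: avx2x2   gen() 17894 MB/s
[0.693603] raid6: avx2x2   xor() 11642 MB/s
[0.714605] raid6: avx2x1   gen() 13992 MB/s
[0.735604] raid6: avx2x1   xor() 10855 MB/s
[0.756607] raid6: sse2x4   gen() 12246 MB/s
[0.777605] raid6: sse2x4   xor()  5724 MB/s
[0.798605] raid6: sse2x2   gen() 10945 MB/s
[0.819603] raid6: sse2x2   xor()  8097 MB/s
[0.840606] raid6: sse2x1   gen()  5941 MB/s
[0.861606] raid6: sse2x1   xor()  5894 MB/s
[0.866565] raid6: using algorithm avx2x4 gen() 18429 MB/s
"""

NEW_LOG = """\
[0.655382] raid6: avx2x4   gen()  6155 MB/s
[0.676382] raid6: avx2x4   xor()  4274 MB/s
[0.697380] raid6: avx2x2   gen() 18744 MB/s
[0.718380] raid6: avx2x2   xor() 11950 MB/s
[0.739380] raid6: avx2x1   gen() 17112 MB/s
[0.760380] raid6: avx2x1   xor() 11143 MB/s
[0.781381] raid6: sse2x4   gen() 11062 MB/s
[0.802380] raid6: sse2x4   xor()  5180 MB/s
[0.823380] raid6: sse2x2   gen() 12467 MB/s
[0.844380] raid6: sse2x2   xor()  7672 MB/s
[0.865381] raid6: sse2x1   gen()  9733 MB/s
[0.886380] raid6: sse2x1   xor()  5717 MB/s
[0.890674] raid6: using algorithm avx2x2 gen() 18744 MB/s
"""

# Matches only the per-algorithm benchmark lines, not the "using algorithm"
# summary (there the word after "raid6: " is not followed by gen()/xor()).
BENCH_RE = re.compile(r"raid6: (\w+)\s+(gen|xor)\(\)\s+(\d+) MB/s")


def parse_raid6(log):
    """Map (algorithm, operation) -> MB/s for every benchmark line in log."""
    results = {}
    for line in log.splitlines():
        m = BENCH_RE.search(line)
        if m:
            results[(m.group(1), m.group(2))] = int(m.group(3))
    return results


old = parse_raid6(OLD_LOG)
new = parse_raid6(NEW_LOG)
ratios = {key: new[key] / old[key] for key in old if key in new}

for (algo, op), ratio in ratios.items():
    print(f"{algo} {op}(): {old[(algo, op)]:>6} -> {new[(algo, op)]:>6} MB/s "
          f"({ratio:.2f}x)")
```

Feeding it the logs from several boots of the same kernel would be the natural next step before treating any single ratio as significant.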

```
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          128
On-line CPU(s) list:             0-127
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       2
NUMA node(s):                    8
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           1
Model name:                      AMD EPYC 7601 32-Core Processor
Stepping:                        2
Frequency boost:                 enabled
CPU MHz:                         3100.798
CPU max MHz:                     2200.
CPU min MHz:                     1200.
BogoMIPS:                        4399.53
Virtualization:                  AMD-V
L1d cache:                       2 MiB
L1i cache:                       4 MiB
L2 cache:                        32 MiB
L3 cache:                        128 MiB
NUMA node0 CPU(s):               0-7,64-71
NUMA node1 CPU(s):               8-15,72-79
NUMA node2 CPU(s):               16-23,80-87
NUMA node3 CPU(s):               24-31,88-95
NUMA node4 CPU(s):               32-39,96-103
NUMA node5 CPU(s):               40-47,104-111
NUMA node6 CPU(s):               48-55,112-119
NUMA node7 CPU(s):               56-63,120-127
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:   fpu vme de pse tsc msr pae mce cx8 apic 
sep mtrr pge mca cmov pat