Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
On Tue, Apr 06, 2021 at 12:58:15PM +0200, Paul Menzel wrote:
> I booted Linux 5.12-rc6, containing these commits, on a Dell OptiPlex 5055
> with AMD Ryzen 5 PRO 1500 Quad-Core Processor, and the regression is still
> present for `avx2x4 xor()`:

So I don't think that's a regression - this looks more like "you should
not look at those numbers and compare them".

Below are some results from boot logs on one of my test boxes, first
column is the kernel version.

01-0+   :raid6: avx2x4 xor() 10311 MB/s
01-rc3+ :raid6: avx2x4 xor()  5497 MB/s
01-rc6+ :raid6: avx2x4 xor()  5369 MB/s
02-rc3+ :raid6: avx2x4 xor()  9812 MB/s
02-rc5+ :raid6: avx2x4 xor() 11479 MB/s
03-rc1+ :raid6: avx2x4 xor()  6434 MB/s
03-rc2+ :raid6: avx2x4 xor()  5487 MB/s
03-rc3+ :raid6: avx2x4 xor()  4840 MB/s
03-rc5+ :raid6: avx2x4 xor() 11104 MB/s
04-rc1+ :raid6: avx2x4 xor()  6443 MB/s
04-rc2+ :raid6: avx2x4 xor()  4959 MB/s
04-rc3+ :raid6: avx2x4 xor()  4918 MB/s
04-rc7+ :raid6: avx2x4 xor()  5219 MB/s
05-rc1+ :raid6: avx2x4 xor()  5362 MB/s
05-rc2+ :raid6: avx2x4 xor()  5356 MB/s
05-rc7+ :raid6: avx2x4 xor()  5821 MB/s
06-rc1+ :raid6: avx2x4 xor()  3358 MB/s
06-rc2+ :raid6: avx2x4 xor()  3591 MB/s
06-rc4+ :raid6: avx2x4 xor()  3947 MB/s
06-rc6+ :raid6: avx2x4 xor()  4100 MB/s
06-rc7+ :raid6: avx2x4 xor()  4038 MB/s
07-0+   :raid6: avx2x4 xor()  3410 MB/s
07-rc1+ :raid6: avx2x4 xor()  4836 MB/s
07-rc2+ :raid6: avx2x4 xor()  3194 MB/s
07-rc5  :raid6: avx2x4 xor()  4220 MB/s
07-rc6+ :raid6: avx2x4 xor()  3949 MB/s
07-rc7+ :raid6: avx2x4 xor()  3238 MB/s
09-0+   :raid6: avx2x4 xor()  3259 MB/s
09-rc1+ :raid6: avx2x4 xor()  2963 MB/s
09-rc4+ :raid6: avx2x4 xor()  2593 MB/s
09-rc5+ :raid6: avx2x4 xor()  2555 MB/s
09-rc7+ :raid6: avx2x4 xor()       MB/s
09-rc8+ :raid6: avx2x4 xor()  2979 MB/s
10-rc4+ :raid6: avx2x4 xor()  4482 MB/s
10-rc5+ :raid6: avx2x4 xor()  6170 MB/s
10-rc7+ :raid6: avx2x4 xor()  3557 MB/s
11-rc1+ :raid6: avx2x4 xor()  1461 MB/s
11-rc2+ :raid6: avx2x4 xor()  4095 MB/s
11-rc7+ :raid6: avx2x4 xor()  6088 MB/s
12-rc1+ :raid6: avx2x4 xor()  4147 MB/s
12-rc2+ :raid6: avx2x4 xor()  4361 MB/s
12-rc3+ :raid6: avx2x4 xor()  4070 MB/s
12-rc4+ :raid6: avx2x4 xor()  6078 MB/s

IOW, you can use those numbers as a random number generator.

Now, I'm not saying that there isn't anything happening after 5.4-5.6-ish
timeframe but this needs to be checked with a proper benchmark and then
look at what could be causing this.

It could be the MXCSR clearing but it's not like we don't need that so
there won't be a whole lot we can do.

But someone would have to sit down and do proper measurements first. And
bisect. Then we'll see...

HTH.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
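The boot-time raid6 figures are a single short sample per routine, which is why the reply above treats them as noise. As a rough user-space illustration of measuring the spread instead of trusting one number, here is a minimal sketch. It is not the kernel's lib/raid6 code and, being pure Python, is far slower: it computes the RAID-6 P/Q syndrome with a scalar Horner loop over GF(2^8) (polynomial 0x11d) and times it in many short windows, reporting min/median/max. The window length, run count, block count and block size are arbitrary assumptions; `measure_mb_s()` and `repeated()` are hypothetical helper names.

```python
import os
import statistics
import time
from typing import Sequence, Tuple


def gf_mul2(x: int) -> int:
    """Multiply a GF(2^8) element by the generator {02} (polynomial 0x11d)."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D  # reduce: clear bit 8 and fold in 0x1d
    return x


def gen_syndrome(blocks: Sequence[bytes]) -> Tuple[bytes, bytes]:
    """Scalar RAID-6 P/Q generation over equally sized data blocks.

    P is the XOR of all blocks; Q = sum(g^i * D_i) in GF(2^8), evaluated
    with Horner's scheme starting from the highest-numbered block.
    """
    n = len(blocks[0])
    p = bytearray(blocks[-1])
    q = bytearray(blocks[-1])
    for d in reversed(blocks[:-1]):
        for i in range(n):
            p[i] ^= d[i]
            q[i] = gf_mul2(q[i]) ^ d[i]
    return bytes(p), bytes(q)


def measure_mb_s(blocks: Sequence[bytes], window_s: float) -> float:
    """Call gen_syndrome() back to back for about window_s seconds and
    return data throughput in MB/s (10^6 bytes, data blocks only)."""
    data_bytes = sum(len(b) for b in blocks)
    iterations = 0
    start = time.perf_counter()
    deadline = start + window_s
    while True:
        gen_syndrome(blocks)
        iterations += 1
        now = time.perf_counter()
        if now >= deadline:
            break
    return iterations * data_bytes / (now - start) / 1e6


def repeated(blocks: Sequence[bytes], runs: int = 15,
             window_s: float = 0.02) -> Tuple[float, float, float]:
    """Repeat the short measurement and return (min, median, max) MB/s."""
    samples = sorted(measure_mb_s(blocks, window_s) for _ in range(runs))
    return samples[0], statistics.median(samples), samples[-1]


if __name__ == "__main__":
    # Hypothetical layout: 4 data blocks of 1 KiB each.
    disks = [os.urandom(1024) for _ in range(4)]
    lo, med, hi = repeated(disks)
    print(f"min {lo:.3f}  median {med:.3f}  max {hi:.3f} MB/s")
```

If the min/max spread across runs is as wide as the difference between two kernels, a single boot-log sample cannot tell those kernels apart; only then does it make sense to bisect.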
Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
Dear Borislav,


On 02.04.21 at 16:05, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:

>> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
>> shown at the beginning of the boot.
>>
>>                         5.4.95        5.10.24
>> ---------------------------------------------
>> raid6: avx2x4 gen()   18429 MB/s    6155 MB/s
>> raid6: avx2x4 xor()    6644 MB/s    4274 MB/s
>> raid6: avx2x2 gen()   17894 MB/s   18744 MB/s
>> raid6: avx2x2 xor()   11642 MB/s   11950 MB/s
>> raid6: avx2x1 gen()   13992 MB/s   17112 MB/s
>> raid6: avx2x1 xor()   10855 MB/s   11143 MB/s
>
> Looks like those two might help:
>
> 49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
> e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state

I booted Linux 5.12-rc6, containing these commits, on a Dell OptiPlex 5055
with AMD Ryzen 5 PRO 1500 Quad-Core Processor, and the regression is still
present for `avx2x4 xor()`:

                        5.4.95        5.10.24
---------------------------------------------
raid6: avx2x4 gen()   23964 MB/s   24540 MB/s
raid6: avx2x4 xor()   13101 MB/s    8354 MB/s
raid6: avx2x2 gen()   22746 MB/s   26972 MB/s
raid6: avx2x2 xor()   14917 MB/s   16463 MB/s
raid6: avx2x1 gen()   17519 MB/s   24394 MB/s
raid6: avx2x1 xor()   14091 MB/s   15330 MB/s
raid6: sse2x4 gen()   16867 MB/s   16136 MB/s
raid6: sse2x4 xor()    9667 MB/s    8176 MB/s
raid6: sse2x2 gen()   14996 MB/s   18234 MB/s
raid6: sse2x2 xor()   10765 MB/s   10455 MB/s
raid6: sse2x1 gen()    7667 MB/s   13769 MB/s
raid6: sse2x1 xor()    7818 MB/s    7741 MB/s

What system are you using, and what results do you get with 5.4 and
5.12-rc6?


Kind regards,

Paul
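To put a number on "the regression is still present", here is a minimal sketch that computes the relative change of each row in the Ryzen 5 PRO 1500 table, going from the first column to the second. Only the MB/s values come from the message; the dictionary name `RYZEN_MB_S` and the helper `relative_change()` are hypothetical.

```python
# (first column, second column) in MB/s, copied from the Ryzen 5 PRO 1500 table
RYZEN_MB_S = {
    "avx2x4 gen()": (23964, 24540),
    "avx2x4 xor()": (13101, 8354),
    "avx2x2 gen()": (22746, 26972),
    "avx2x2 xor()": (14917, 16463),
    "avx2x1 gen()": (17519, 24394),
    "avx2x1 xor()": (14091, 15330),
    "sse2x4 gen()": (16867, 16136),
    "sse2x4 xor()": (9667, 8176),
    "sse2x2 gen()": (14996, 18234),
    "sse2x2 xor()": (10765, 10455),
    "sse2x1 gen()": (7667, 13769),
    "sse2x1 xor()": (7818, 7741),
}


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative means slower."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Sort so that the largest drop is printed first.
    rows = sorted(RYZEN_MB_S.items(), key=lambda kv: relative_change(*kv[1]))
    for name, (old, new) in rows:
        print(f"{name:14} {old:6d} -> {new:6d} MB/s  "
              f"{relative_change(old, new):+6.1f} %")
```

In this table the avx2x4 xor() row drops by about 36 % and sse2x4 xor() by about 15 %, while five of the six gen() rows go up. These are single boot-time samples, so the percentages inherit all of their noise.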
Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
On 2021-04-02 at 17:05, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:
>> Dear Linux folks,
>>
>>
>> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
>> shown at the beginning of the boot.
>>
>>                         5.4.95        5.10.24
>> ---------------------------------------------
>> raid6: avx2x4 gen()   18429 MB/s    6155 MB/s
>> raid6: avx2x4 xor()    6644 MB/s    4274 MB/s
>> raid6: avx2x2 gen()   17894 MB/s   18744 MB/s
>> raid6: avx2x2 xor()   11642 MB/s   11950 MB/s
>> raid6: avx2x1 gen()   13992 MB/s   17112 MB/s
>> raid6: avx2x1 xor()   10855 MB/s   11143 MB/s
>
> Looks like those two might help:
>

That would mean only this is missing:

> 49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()

as this one landed in 5.10.11:

> e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize
> state
>

--
Thomas
Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:
> Dear Linux folks,
>
>
> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
> shown at the beginning of the boot.
>
>                         5.4.95        5.10.24
> ---------------------------------------------
> raid6: avx2x4 gen()   18429 MB/s    6155 MB/s
> raid6: avx2x4 xor()    6644 MB/s    4274 MB/s
> raid6: avx2x2 gen()   17894 MB/s   18744 MB/s
> raid6: avx2x2 xor()   11642 MB/s   11950 MB/s
> raid6: avx2x1 gen()   13992 MB/s   17112 MB/s
> raid6: avx2x1 xor()   10855 MB/s   11143 MB/s

Looks like those two might help:

49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
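The EPYC boot logs in the original report each end with a "raid6: using algorithm" line: 5.4.97 chose avx2x4 (gen() 18429 MB/s) and 5.10.24 chose avx2x2 (gen() 18744 MB/s). The sketch below reproduces those two choices under a simplified model that assumes the routine with the highest gen() figure wins. The actual selection code in lib/raid6 is not reproduced here, and the names `GEN_5_4_97`, `GEN_5_10_24` and `pick_fastest()` are hypothetical.

```python
from typing import Dict, Tuple

# gen() throughput in MB/s, copied from the two EPYC 7601 boot logs in the report
GEN_5_4_97 = {"avx2x4": 18429, "avx2x2": 17894, "avx2x1": 13992,
              "sse2x4": 12246, "sse2x2": 10945, "sse2x1": 5941}
GEN_5_10_24 = {"avx2x4": 6155, "avx2x2": 18744, "avx2x1": 17112,
               "sse2x4": 11062, "sse2x2": 12467, "sse2x1": 9733}


def pick_fastest(gen_mb_s: Dict[str, int]) -> Tuple[str, int]:
    """Simplified model: choose the routine with the highest gen() figure."""
    return max(gen_mb_s.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    for label, table in (("5.4.97", GEN_5_4_97), ("5.10.24", GEN_5_10_24)):
        name, mb_s = pick_fastest(table)
        print(f"{label}: using {name} gen() {mb_s} MB/s")
```

On that box, the low avx2x4 sample changed which routine got picked, but the routine 5.10.24 selected was measured slightly faster than the one 5.4.97 selected.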
[regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
Dear Linux folks,


On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
shown at the beginning of the boot.

                        5.4.95        5.10.24
---------------------------------------------
raid6: avx2x4 gen()   18429 MB/s    6155 MB/s
raid6: avx2x4 xor()    6644 MB/s    4274 MB/s
raid6: avx2x2 gen()   17894 MB/s   18744 MB/s
raid6: avx2x2 xor()   11642 MB/s   11950 MB/s
raid6: avx2x1 gen()   13992 MB/s   17112 MB/s
raid6: avx2x1 xor()   10855 MB/s   11143 MB/s

We are able to reproduce this with different models: Supermicro
AS-2023US-TR4/H11DSU-iN and Dell PowerEdge R7425 (with different
microcode versions).

Can you reproduce this on your systems?

Bisecting is going to be hard, as the systems are in production and also
take a while to boot. (Maybe kexec would help here.)


Kind regards,

Paul


PS: Some more information:

```
[0.00] Linux version 5.4.97.mx64.368 (r...@theinternet.molgen.mpg.de) (gcc version 7.5.0 (GCC )) #1 SMP Wed Feb 10 18:22:50 CET 2021
[…]
[0.00] DMI: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.1 02/07/2018
[…]
[0.630603] raid6: avx2x4 gen() 18429 MB/s
[0.651607] raid6: avx2x4 xor() 6644 MB/s
[0.672605] raid6: avx2x2 gen() 17894 MB/s
[0.693603] raid6: avx2x2 xor() 11642 MB/s
[0.714605] raid6: avx2x1 gen() 13992 MB/s
[0.735604] raid6: avx2x1 xor() 10855 MB/s
[0.756607] raid6: sse2x4 gen() 12246 MB/s
[0.777605] raid6: sse2x4 xor() 5724 MB/s
[0.798605] raid6: sse2x2 gen() 10945 MB/s
[0.819603] raid6: sse2x2 xor() 8097 MB/s
[0.840606] raid6: sse2x1 gen() 5941 MB/s
[0.861606] raid6: sse2x1 xor() 5894 MB/s
[0.866565] raid6: using algorithm avx2x4 gen() 18429 MB/s
[0.871567] raid6: xor() 6644 MB/s, rmw enabled
[0.877566] raid6: using avx2x2 recovery algorithm
[…]
```

```
[0.00] Linux version 5.10.24.mx64.375 (r...@theinternet.molgen.mpg.de) (gcc (GCC) 7.5.0, GNU ld (GNU Binutils) 2.32) #1 SMP Fri Mar 19 12:29:21 CET 2021
[…]
[0.00] DMI: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.1 02/07/2018
[…]
[0.655382] raid6: avx2x4 gen() 6155 MB/s
[0.676382] raid6: avx2x4 xor() 4274 MB/s
[0.697380] raid6: avx2x2 gen() 18744 MB/s
[0.718380] raid6: avx2x2 xor() 11950 MB/s
[0.739380] raid6: avx2x1 gen() 17112 MB/s
[0.760380] raid6: avx2x1 xor() 11143 MB/s
[0.781381] raid6: sse2x4 gen() 11062 MB/s
[0.802380] raid6: sse2x4 xor() 5180 MB/s
[0.823380] raid6: sse2x2 gen() 12467 MB/s
[0.844380] raid6: sse2x2 xor() 7672 MB/s
[0.865381] raid6: sse2x1 gen() 9733 MB/s
[0.886380] raid6: sse2x1 xor() 5717 MB/s
[0.890674] raid6: using algorithm avx2x2 gen() 18744 MB/s
[0.895673] raid6: xor() 11950 MB/s, rmw enabled
[0.901673] raid6: using avx2x2 recovery algorithm
```

```
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7601 32-Core Processor
Stepping: 2
Frequency boost: enabled
CPU MHz: 3100.798
CPU max MHz: 2200.
CPU min MHz: 1200.
BogoMIPS: 4399.53
Virtualization: AMD-V
L1d cache: 2 MiB
L1i cache: 4 MiB
L2 cache: 32 MiB
L3 cache: 128 MiB
NUMA node0 CPU(s): 0-7,64-71
NUMA node1 CPU(s): 8-15,72-79
NUMA node2 CPU(s): 16-23,80-87
NUMA node3 CPU(s): 24-31,88-95
NUMA node4 CPU(s): 32-39,96-103
NUMA node5 CPU(s): 40-47,104-111
NUMA node6 CPU(s): 48-55,112-119
NUMA node7 CPU(s): 56-63,120-127
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat