Re: [PATCH v5 4/5] powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
On Thu, 2024-09-05 at 18:55 +0200, Christophe Leroy wrote:
> > Normal single thread
> >    vdso: 2500 times in 12.494133131 seconds
> >    libc: 2500 times in 69.594625188 seconds
> > syscall: 2500 times in 67.349243972 seconds
> > Time namespace single thread
> >    vdso: 2500 times in 71.673057436 seconds
> >    libc: 2500 times in 71.712774121 seconds
> > syscall: 2500 times in 66.902318080 seconds
> >
> > I'm seeing this on ppc, ppc64, and ppc64le.
>
> What is the command to use to test with time namespace ?

Assuming user namespaces and time namespaces are available:

$ unshare -r -T --boottime $((365*24*3600))

This starts a new shell in which you appear to be root (i.e. the root
of the separate user namespace). Then:

# uptime
 00:57:17 up 365 days, 57 min, 2 users, load average: 0.19, 0.30, 0.32

So in the separate time namespace the system appears to have been up
for one year. Now:

# /path/to/linux.git/tools/testing/selftests/vDSO/vdso_test_getrandom bench_single
   vdso: 2500 times in 0.419125373 seconds
   libc: 2500 times in 5.985498234 seconds
syscall: 2500 times in 5.993506773 seconds

This is on x86_64, indicating that vDSO getrandom works fine on x86_64
in a separate time namespace.

If user namespaces aren't available (disabled when building the kernel,
or disabled by the security policy of some distros), use

$ sudo unshare -T --boottime $((365*24*3600))

to create the time namespace instead. But note that with this approach
you'll be operating as the real root user, so be careful not to break
things.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
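The steps above can be sketched as two small shell helpers. This is only an illustration: the function names boottime_offset and timens_cmd are made up here, and the only things taken from the message are unshare(1) with its -r, -T and --boottime options and the sudo fallback.

```sh
#!/bin/sh
# Hypothetical helpers; nothing here is part of the kernel selftests.

# Print a boot-time offset in seconds for a whole number of days.
boottime_offset() {
    days=$1
    echo $(($days * 24 * 3600))
}

# Print (but do not run) the unshare command line for a given offset.
# use_userns=1: also create a user namespace and map the caller to root (-r).
# anything else: rely on sudo, i.e. run as the real root user.
timens_cmd() {
    offset=$1
    use_userns=$2
    if [ "$use_userns" = 1 ]; then
        echo "unshare -r -T --boottime $offset"
    else
        echo "sudo unshare -T --boottime $offset"
    fi
}
```

For example, `$(timens_cmd "$(boottime_offset 365)" 1) uptime` should run uptime inside a time namespace shifted by 365 days, and the same prefix can be put in front of the vdso_test_getrandom invocation instead of starting an interactive shell.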
Re: [PATCH 2/6] loongarch: defconfig: drop RT_GROUP_SCHED=y
On Thu, 2024-05-30 at 19:19 +0800, Celeste Liu wrote:
> For cgroup v1, if turned on, and there's any cgroup in the "cpu" hierarchy it
> needs an RT budget assigned, otherwise the processes in it will not be able to
> get RT at all. The problem with RT group scheduling is that it requires the
> budget assigned but there's no way we could assign a default budget, since the
> values to assign are both upper and lower time limits, are absolute, and need to
> be sum up to < 1 for each individal cgroup. That means we cannot really come up
> with values that would work by default in the general case.[1]
>
> For cgroup v2, it's almost unusable as well. If it turned on, the cpu controller
> can only be enabled when all RT processes are in the root cgroup. But it will
> lose the benefits of cgroup v2 if all RT process were placed in the same cgroup.
>
> Red Hat, Gentoo, Arch Linux and Debian all disable it. systemd also doesn't
> support it.[2]
>
> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1229700
> [2]: https://github.com/systemd/systemd/issues/13781#issuecomment-549164383
>
> Signed-off-by: Celeste Liu

As a distro maintainer who had once been bitten by this option:

Reviewed-by: Xi Ruoyao

> ---
>  arch/loongarch/configs/loongson3_defconfig | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/loongarch/configs/loongson3_defconfig b/arch/loongarch/configs/loongson3_defconfig
> index b4252c357c8e..4d93adb3f1a2 100644
> --- a/arch/loongarch/configs/loongson3_defconfig
> +++ b/arch/loongarch/configs/loongson3_defconfig
> @@ -23,7 +23,6 @@ CONFIG_NUMA_BALANCING=y
>  CONFIG_MEMCG=y
>  CONFIG_BLK_CGROUP=y
>  CONFIG_CFS_BANDWIDTH=y
> -CONFIG_RT_GROUP_SCHED=y
>  CONFIG_CGROUP_PIDS=y
>  CONFIG_CGROUP_RDMA=y
>  CONFIG_CGROUP_FREEZER=y

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
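For anyone wanting to check whether a given kernel configuration still has the option turned on, a minimal sketch follows. The function name rt_group_sched_enabled is hypothetical, and where the config file lives (a defconfig in the source tree, a file under /boot, and so on) depends on the setup and is left to the caller.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if the given kernel
# config file sets CONFIG_RT_GROUP_SCHED=y. A "# ... is not set" line or a
# missing entry both count as disabled.
rt_group_sched_enabled() {
    config_file=$1
    grep -q '^CONFIG_RT_GROUP_SCHED=y$' "$config_file"
}
```

Run against arch/loongarch/configs/loongson3_defconfig, it should succeed before the patch quoted above is applied and fail afterwards.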
Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO
On Wed, 2024-03-27 at 11:49 +0800, Ethan Zhao wrote:
> so, yup, basically, the signal integrity is not good enough.
> Though the function could work, its performance will be impacted.

FWIW I've replaced the motherboard and the errors are gone. So it was
likely a signal integrity issue on the motherboard.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO
On Mon, 2024-03-25 at 18:15 +0800, Xi Ruoyao wrote:
> On Mon, 2024-03-25 at 16:45 +0800, Ethan Zhao wrote:
> > On 3/25/2024 1:19 AM, Xi Ruoyao wrote:
> > > On Mon, 2023-09-18 at 14:39 -0500, Bjorn Helgaas wrote:
> > > > On Mon, Sep 18, 2023 at 07:42:30PM +0800, Xi Ruoyao wrote:
> > > > > ...
> > > > > My workstation suffers from too much correctable AER reporting as well
> > > > > (related to Intel's erratum "RPL013: Incorrectly Formed PCIe Packets May
> > > > > Generate Correctable Errors" and/or the motherboard design, I guess).
> > > > We should rate-limit correctable error reporting so it's not
> > > > overwhelming.
> > > >
> > > > At the same time, I'm *also* interested in the cause of these errors,
> > > > in case there's a Linux defect or a hardware erratum that we can work
> > > > around. Do you have a bug report with any more details, e.g., a dmesg
> > > > log and "sudo lspci -vv" output?
> > > Hi Bjorn,
> > >
> > > Sorry for the *very* late reply (somehow I didn't see the reply at all
> > > before it was removed by my cron job, and now I just salvaged it from
> > > lore.kernel.org...)
> > >
> > > The dmesg is like:
> > >
> > > [ 882.456994] pcieport :00:1c.1: AER: Multiple Correctable error message received from :00:1c.1
> > > [ 882.457002] pcieport :00:1c.1: AER: found no error details for :00:1c.1
> > > [ 882.457003] pcieport :00:1c.1: AER: Multiple Correctable error message received from :06:00.0
> > > [ 883.545763] pcieport :00:1c.1: AER: Multiple Correctable error message received from :00:1c.1
> > > [ 883.545789] pcieport :00:1c.1: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
> > > [ 883.545790] pcieport :00:1c.1: device [8086:7a39] error status/mask=0001/2000
> > > [ 883.545792] pcieport :00:1c.1: [ 0] RxErr (First)
> > > [ 883.545794] pcieport :00:1c.1: AER: Error of this Agent is reported first
> > > [ 883.545798] r8169 :06:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Transmitter ID)
> > > [ 883.545799] r8169 :06:00.0: device [10ec:8125] error status/mask=1101/e000
> > > [ 883.545800] r8169 :06:00.0: [ 0] RxErr (First)
> > > [ 883.545801] r8169 :06:00.0: [ 8] Rollover
> > > [ 883.545802] r8169 :06:00.0: [12] Timeout
> > > [ 883.545815] pcieport :00:1c.1: AER: Correctable error message received from :00:1c.1
> > > [ 883.545823] pcieport :00:1c.1: AER: found no error details for :00:1c.1
> > > [ 883.545824] pcieport :00:1c.1: AER: Multiple Correctable error message received from :06:00.0
> > >
> > > lspci output attached.
> > >
> > > Intel has issued an erratum "RPL013" saying:
> > >
> > > "Under complex microarchitectural conditions, the PCIe controller may
> > > transmit an incorrectly formed Transaction Layer Packet (TLP), which
> > > will fail CRC checks. When this erratum occurs, the PCIe end point may
> > > record correctable errors resulting in either a NAK or link recovery.
> > > Intel® has not observed any functional impact due to this erratum."
> > >
> > > But I'm really unsure if it describes my issue.
> > >
> > > Do you think I have some broken hardware and I should replace the CPU
> > > and/or the motherboard (where the r8169 is soldered)? I've noticed that
> > > my 13900K is almost impossible to overclock (even though it's a K), but
> > > I've not encountered any issue other than this AER reporting so far
> > > after I gave up overclocking.
> >
> > Seems there are two r8169 nics on your board, only :06:00.0 reports
> > aer errors, how about another one the :07:00.0 nic ?
>
> It never happens to :07:00.0, even if I plug the ethernet cable into
> it instead of :06:00.0.
>
> Maybe I should just use :07:00.0 and blacklist :06:00.0 as I
> don't need two NICs?

Plugging the ethernet cable into :07:00.0 and then running
"echo 1 > /sys/bus/pci/devices/:00:1c.1/remove" works for me...

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
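The workaround above relies on the per-device "remove" attribute in PCI sysfs. A minimal sketch of wrapping it follows. The function names pci_remove_path and pci_remove are hypothetical, and the device address is passed in by the caller exactly as it appears under /sys/bus/pci/devices rather than hard-coded here.

```sh
#!/bin/sh
# Hypothetical helpers around the sysfs "remove" attribute.

# Print the sysfs path of the "remove" attribute for a PCI device address.
pci_remove_path() {
    echo "/sys/bus/pci/devices/$1/remove"
}

# Detach the device from the running system. Needs root. Returns 1 if the
# attribute does not exist or is not writable by the current user.
pci_remove() {
    path=$(pci_remove_path "$1")
    if [ ! -w "$path" ]; then
        echo "cannot write $path" >&2
        return 1
    fi
    echo 1 > "$path"
}
```

Removing a root port this way also takes away the devices behind it, which is why removing the port is enough to silence the NIC in this case. The change does not persist across a reboot, and writing 1 to /sys/bus/pci/rescan should bring the devices back without one.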
Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO
On Mon, 2024-03-25 at 16:45 +0800, Ethan Zhao wrote:
> On 3/25/2024 1:19 AM, Xi Ruoyao wrote:
> > On Mon, 2023-09-18 at 14:39 -0500, Bjorn Helgaas wrote:
> > > On Mon, Sep 18, 2023 at 07:42:30PM +0800, Xi Ruoyao wrote:
> > > > ...
> > > > My workstation suffers from too much correctable AER reporting as well
> > > > (related to Intel's erratum "RPL013: Incorrectly Formed PCIe Packets May
> > > > Generate Correctable Errors" and/or the motherboard design, I guess).
> > > We should rate-limit correctable error reporting so it's not
> > > overwhelming.
> > >
> > > At the same time, I'm *also* interested in the cause of these errors,
> > > in case there's a Linux defect or a hardware erratum that we can work
> > > around. Do you have a bug report with any more details, e.g., a dmesg
> > > log and "sudo lspci -vv" output?
> > Hi Bjorn,
> >
> > Sorry for the *very* late reply (somehow I didn't see the reply at all
> > before it was removed by my cron job, and now I just salvaged it from
> > lore.kernel.org...)
> >
> > The dmesg is like:
> >
> > [ 882.456994] pcieport :00:1c.1: AER: Multiple Correctable error message received from :00:1c.1
> > [ 882.457002] pcieport :00:1c.1: AER: found no error details for :00:1c.1
> > [ 882.457003] pcieport :00:1c.1: AER: Multiple Correctable error message received from :06:00.0
> > [ 883.545763] pcieport :00:1c.1: AER: Multiple Correctable error message received from :00:1c.1
> > [ 883.545789] pcieport :00:1c.1: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
> > [ 883.545790] pcieport :00:1c.1: device [8086:7a39] error status/mask=0001/2000
> > [ 883.545792] pcieport :00:1c.1: [ 0] RxErr (First)
> > [ 883.545794] pcieport :00:1c.1: AER: Error of this Agent is reported first
> > [ 883.545798] r8169 :06:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Transmitter ID)
> > [ 883.545799] r8169 :06:00.0: device [10ec:8125] error status/mask=1101/e000
> > [ 883.545800] r8169 :06:00.0: [ 0] RxErr (First)
> > [ 883.545801] r8169 :06:00.0: [ 8] Rollover
> > [ 883.545802] r8169 :06:00.0: [12] Timeout
> > [ 883.545815] pcieport :00:1c.1: AER: Correctable error message received from :00:1c.1
> > [ 883.545823] pcieport :00:1c.1: AER: found no error details for :00:1c.1
> > [ 883.545824] pcieport :00:1c.1: AER: Multiple Correctable error message received from :06:00.0
> >
> > lspci output attached.
> >
> > Intel has issued an erratum "RPL013" saying:
> >
> > "Under complex microarchitectural conditions, the PCIe controller may
> > transmit an incorrectly formed Transaction Layer Packet (TLP), which
> > will fail CRC checks. When this erratum occurs, the PCIe end point may
> > record correctable errors resulting in either a NAK or link recovery.
> > Intel® has not observed any functional impact due to this erratum."
> >
> > But I'm really unsure if it describes my issue.
> >
> > Do you think I have some broken hardware and I should replace the CPU
> > and/or the motherboard (where the r8169 is soldered)? I've noticed that
> > my 13900K is almost impossible to overclock (even though it's a K), but
> > I've not encountered any issue other than this AER reporting so far
> > after I gave up overclocking.
>
> Seems there are two r8169 nics on your board, only :06:00.0 reports
> aer errors, how about another one the :07:00.0 nic ?

It never happens to :07:00.0, even if I plug the ethernet cable into
it instead of :06:00.0.

Maybe I should just use :07:00.0 and blacklist :06:00.0 as I
don't need two NICs?

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO
On Mon, 2023-09-18 at 14:39 -0500, Bjorn Helgaas wrote:
> On Mon, Sep 18, 2023 at 07:42:30PM +0800, Xi Ruoyao wrote:
> > ...
>
> > My workstation suffers from too much correctable AER reporting as well
> > (related to Intel's erratum "RPL013: Incorrectly Formed PCIe Packets May
> > Generate Correctable Errors" and/or the motherboard design, I guess).
>
> We should rate-limit correctable error reporting so it's not
> overwhelming.
>
> At the same time, I'm *also* interested in the cause of these errors,
> in case there's a Linux defect or a hardware erratum that we can work
> around. Do you have a bug report with any more details, e.g., a dmesg
> log and "sudo lspci -vv" output?

Hi Bjorn,

Sorry for the *very* late reply (somehow I didn't see the reply at all
before it was removed by my cron job, and now I just salvaged it from
lore.kernel.org...)

The dmesg is like:

[ 882.456994] pcieport :00:1c.1: AER: Multiple Correctable error message received from :00:1c.1
[ 882.457002] pcieport :00:1c.1: AER: found no error details for :00:1c.1
[ 882.457003] pcieport :00:1c.1: AER: Multiple Correctable error message received from :06:00.0
[ 883.545763] pcieport :00:1c.1: AER: Multiple Correctable error message received from :00:1c.1
[ 883.545789] pcieport :00:1c.1: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[ 883.545790] pcieport :00:1c.1: device [8086:7a39] error status/mask=0001/2000
[ 883.545792] pcieport :00:1c.1: [ 0] RxErr (First)
[ 883.545794] pcieport :00:1c.1: AER: Error of this Agent is reported first
[ 883.545798] r8169 :06:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Transmitter ID)
[ 883.545799] r8169 :06:00.0: device [10ec:8125] error status/mask=1101/e000
[ 883.545800] r8169 :06:00.0: [ 0] RxErr (First)
[ 883.545801] r8169 :06:00.0: [ 8] Rollover
[ 883.545802] r8169 :06:00.0: [12] Timeout
[ 883.545815] pcieport :00:1c.1: AER: Correctable error message received from :00:1c.1
[ 883.545823] pcieport :00:1c.1: AER: found no error details for :00:1c.1
[ 883.545824] pcieport :00:1c.1: AER: Multiple Correctable error message received from :06:00.0

lspci output attached.

Intel has issued an erratum "RPL013" saying:

"Under complex microarchitectural conditions, the PCIe controller may
transmit an incorrectly formed Transaction Layer Packet (TLP), which
will fail CRC checks. When this erratum occurs, the PCIe end point may
record correctable errors resulting in either a NAK or link recovery.
Intel® has not observed any functional impact due to this erratum."

But I'm really unsure if it describes my issue.

Do you think I have some broken hardware and I should replace the CPU
and/or the motherboard (where the r8169 is soldered)? I've noticed that
my 13900K is almost impossible to overclock (even though it's a K), but
I've not encountered any issue other than this AER reporting so far
after I gave up overclocking.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University

00:00.0 Host bridge: Intel Corporation Device a700 (rev 01)
    DeviceName: Onboard - Other
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR-
    Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, IntMsgNum 0
        DevCap: MaxPayload 128 bytes, PhantFunc 0
            ExtTag- RBE+ FLReset+
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
            AtomicOpsCtl: ReqEn-
            IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
            10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
    Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
        Address: fee00018 Data:
        Masking: Pending:
    Capabilities: [d0] Power Management version 2
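The "error status/mask=" values in the dmesg excerpt above are bitmasks, and the bracketed lines that follow them ([ 0], [ 8], [12]) are the set bits. As a rough sketch, the mapping can be reproduced in shell. The function name decode_cor_status is made up for illustration. The bit positions are those of the PCIe AER Correctable Error Status register, and the short names are meant to mirror what the kernel prints, so treat the exact spellings as an assumption.

```sh
#!/bin/sh
# Hypothetical helper: decode a Correctable Error Status (or Mask) value,
# given in hex without a 0x prefix, into space-separated short names.
decode_cor_status() {
    status=$((0x$1))
    out=""
    # bit:name pairs for the correctable error bits
    for entry in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover 12:Timeout \
                 13:NonFatalErr 14:CorrIntErr 15:HeaderOF; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( ($status >> $bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}
```

With the values from the log, `decode_cor_status 1101` prints "RxErr Rollover Timeout", matching the three lines reported for the r8169 device, and `decode_cor_status e000` prints "NonFatalErr CorrIntErr HeaderOF", i.e. the error types masked on that device.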
Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO
On Mon, 2023-08-14 at 08:40 -0700, Grant Grundler wrote:
> On Sat, Aug 12, 2023 at 5:45 PM David Heidelberg wrote:
> >
> > Tested-by: David Heidelberg
>
> Thanks David!
>
> > For PATCH v4 please fix the typo reported by the bot :)
>
> Sorry - I'll do that today.

Hi Grant,

Is there an update on this series? My workstation suffers from too much
correctable AER reporting as well (related to Intel's erratum "RPL013:
Incorrectly Formed PCIe Packets May Generate Correctable Errors" and/or
the motherboard design, I guess).

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [RFC PATCH] asm-generic: Unify uapi bitsperlong.h
On Fri, 2023-06-09 at 14:50 +0800, Tiezhu Yang wrote:

/* snip */

> > > > In musl, the documentation states that at least gcc-3.4 or
> > > > clang-3.2 are required, which probably predate the
> > > > __SIZEOF_LONG__ macro.

Indeed, I've dug into the history and __SIZEOF_LONG__ was added in
GCC 4.3 (in 2008). And I didn't realize that the bitsperlong.h in the
tools directory is a copy of the uapi one.

> > > > On the other hand, musl was only
> > > > released in 2011, and building musl itself explicitly
> > > > does not require kernel uapi headers, so this may not
> > > > be too critical.

> Only arm64, riscv and loongarch belong to the newer architectures
> which are related with this change, I am not sure it is necessary
> to "unify" uapi bitsperlong.h for them.

At least it will stop the engineers working on "the next architecture"
from adding an unneeded bitsperlong.h :).

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
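Whether a particular compiler predefines __SIZEOF_LONG__ can be checked from the shell by running the macro name through the preprocessor. The sketch below is only an illustration: the function name bits_per_long is made up, it assumes a gcc- or clang-compatible driver reachable as $CC (default cc), and it says nothing about how the patch itself derives the value.

```sh
#!/bin/sh
# Hypothetical helper: print the number of bits in a long as seen by the
# compiler, derived from the predefined __SIZEOF_LONG__ macro.
# Returns 1 if the compiler cannot be run or leaves the macro unexpanded.
bits_per_long() {
    # Preprocess a one-line "source file" that contains only the macro name.
    size=$(printf '__SIZEOF_LONG__\n' | ${CC:-cc} -E -P -x c - 2>/dev/null \
           | tr -d '[:space:]')
    case $size in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    echo $(($size * 8))
}
```

On a compiler that predates the macro, the preprocessor echoes the name back unchanged and the function fails instead of printing a number, which is the case the musl remark in the quoted text is concerned with.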