Re: [PATCH v5 4/5] powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32

2024-09-05 Thread Xi Ruoyao
On Thu, 2024-09-05 at 18:55 +0200, Christophe Leroy wrote:
> > Normal single thread
> >  vdso: 25000000 times in 12.494133131 seconds
> >  libc: 25000000 times in 69.594625188 seconds
> > syscall: 25000000 times in 67.349243972 seconds
> > Time namespace single thread
> >  vdso: 25000000 times in 71.673057436 seconds
> >  libc: 25000000 times in 71.712774121 seconds
> > syscall: 25000000 times in 66.902318080 seconds
> > 
> > I'm seeing this on ppc, ppc64, and ppc64le.
> 
> What is the command to use to test with a time namespace?

Assuming user namespaces and time namespaces are available:

$ unshare -r -T --boottime $((365*24*3600))

It'll start a new shell in which you appear to be root (i.e. root in
the separate user namespace).  Then:

# uptime
 00:57:17 up 365 days, 57 min,  2 users,  load average: 0.19, 0.30, 0.32

So in the separate time namespace, the system appears to have been
booted for one year.  Now:

# /path/to/linux.git/tools/testing/selftests/vDSO/vdso_test_getrandom bench_single
   vdso: 25000000 times in 0.419125373 seconds
   libc: 25000000 times in 5.985498234 seconds
syscall: 25000000 times in 5.993506773 seconds

This is on x86_64, indicating that vDSO getrandom() works fine on x86_64
in a separate time namespace.
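The "libc" and "syscall" rows presumably differ only in whether getrandom() is reached through glibc's wrapper or through a raw syscall().  A minimal C sketch of timing those two paths, assuming glibc >= 2.25 for <sys/random.h>; it is not the selftest's code, bench_getrandom() is a hypothetical name, and the vDSO variant is left out because it needs the vDSO symbol to be looked up first.  Newer glibc releases may themselves route getrandom() through the vDSO, so the "libc" path can change with the glibc version.

```c
#define _GNU_SOURCE      /* make sure syscall() and SYS_getrandom are declared */
#include <assert.h>
#include <sys/random.h>  /* getrandom(), glibc >= 2.25 */
#include <sys/syscall.h> /* SYS_getrandom */
#include <sys/types.h>   /* ssize_t */
#include <time.h>
#include <unistd.h>      /* syscall() */

/* Current CLOCK_MONOTONIC time in seconds, used to measure elapsed time. */
static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time `iters` 16-byte requests, either through glibc's getrandom()
 * (raw == 0) or through syscall(SYS_getrandom, ...) (raw != 0).
 * Returns elapsed seconds, or -1.0 if any call fails. */
double bench_getrandom(long iters, int raw)
{
    unsigned char buf[16];
    double start;

    assert(iters >= 0);
    start = now_seconds();
    for (long i = 0; i < iters; i++) {
        ssize_t r = raw ? (ssize_t)syscall(SYS_getrandom, buf, sizeof(buf), 0)
                        : getrandom(buf, sizeof(buf), 0);
        if (r != (ssize_t)sizeof(buf))
            return -1.0;
    }
    return now_seconds() - start;
}
```

Calling it once with each mode gives rough counterparts of the "libc" and "syscall" rows.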

If user namespaces aren't available (disabled when building the kernel,
or disabled by the security policy of some distros), use

$ sudo unshare -T --boottime $((365*24*3600))

to create the time namespace instead.  Note, however, that with this
approach you'll be operating as the real root user, so be careful not to
break things.
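To confirm from inside a program that the boottime offset took effect, one can read CLOCK_BOOTTIME, which time namespaces offset (the same offset the uptime output reflects).  A minimal sketch; boottime_at_least_days() is a hypothetical helper name:

```c
#define _GNU_SOURCE /* harmless; makes sure Linux-specific clock IDs are declared */
#include <assert.h>
#include <time.h>

/* Return 1 if CLOCK_BOOTTIME, which includes the calling process's
 * time-namespace boottime offset, reports at least `days` days since
 * boot; 0 if it reports less; -1 if the clock cannot be read. */
int boottime_at_least_days(long days)
{
    struct timespec ts;

    assert(days >= 0);
    if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0)
        return -1;
    return ts.tv_sec / 86400 >= days;
}
```

Run inside the namespace created with --boottime $((365*24*3600)), boottime_at_least_days(365) should return 1.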

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University



Re: [PATCH 2/6] loongarch: defconfig: drop RT_GROUP_SCHED=y

2024-05-30 Thread Xi Ruoyao
On Thu, 2024-05-30 at 19:19 +0800, Celeste Liu wrote:
> For cgroup v1, if turned on, and there's any cgroup in the "cpu" hierarchy it
> needs an RT budget assigned, otherwise the processes in it will not be able to
> get RT at all. The problem with RT group scheduling is that it requires the
> budget assigned but there's no way we could assign a default budget, since the
> values to assign are both upper and lower time limits, are absolute, and need
> to sum up to < 1 for each individual cgroup. That means we cannot really come
> up with values that would work by default in the general case.[1]
> 
> For cgroup v2, it's almost unusable as well. If it is turned on, the cpu
> controller can only be enabled when all RT processes are in the root cgroup.
> But it will lose the benefits of cgroup v2 if all RT processes were placed in
> the same cgroup.
> 
> Red Hat, Gentoo, Arch Linux and Debian all disable it. systemd also doesn't
> support it.[2]
> 
> [1]: https://bugzilla.redhat.com/show_bug.cgi?id=1229700
> [2]: https://github.com/systemd/systemd/issues/13781#issuecomment-549164383
> 
> Signed-off-by: Celeste Liu 

As a distro maintainer who had once been bitten by this option:

Reviewed-by: Xi Ruoyao 

> ---
>  arch/loongarch/configs/loongson3_defconfig | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/loongarch/configs/loongson3_defconfig b/arch/loongarch/configs/loongson3_defconfig
> index b4252c357c8e..4d93adb3f1a2 100644
> --- a/arch/loongarch/configs/loongson3_defconfig
> +++ b/arch/loongarch/configs/loongson3_defconfig
> @@ -23,7 +23,6 @@ CONFIG_NUMA_BALANCING=y
>  CONFIG_MEMCG=y
>  CONFIG_BLK_CGROUP=y
>  CONFIG_CFS_BANDWIDTH=y
> -CONFIG_RT_GROUP_SCHED=y
>  CONFIG_CGROUP_PIDS=y
>  CONFIG_CGROUP_RDMA=y
>  CONFIG_CGROUP_FREEZER=y
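The budget constraint described in the commit message can be illustrated with a simplified C model.  It is a sketch, not the kernel's admission check: the kernel compares child groups' runtime/period ratios against their parent's own budget rather than against a flat 1, and the struct, the function names and the 20-bit fixed-point shift here are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-point shift for runtime/period ratios.  20 mirrors the kernel's
 * BW_SHIFT, but the value is an assumption of this sketch. */
#define RATIO_SHIFT 20
#define RATIO_ONE   ((uint64_t)1 << RATIO_SHIFT)

/* Hypothetical per-cgroup budget, modeled on cgroup v1's
 * cpu.rt_runtime_us and cpu.rt_period_us files. */
struct rt_budget {
    uint64_t runtime_us;
    uint64_t period_us;
};

/* runtime/period as a fraction of one CPU, in RATIO_SHIFT fixed point. */
static uint64_t rt_ratio(const struct rt_budget *b)
{
    assert(b->period_us > 0);
    return (b->runtime_us << RATIO_SHIFT) / b->period_us;
}

/* Simplified check from the commit message: the budgets of the given
 * sibling groups must add up to less than one full CPU.
 * Returns 1 if they fit, 0 otherwise. */
int rt_budgets_fit(const struct rt_budget *groups, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += rt_ratio(&groups[i]);
    return sum < RATIO_ONE;
}
```

A newly created cgroup v1 group starts with rt_runtime_us = 0, so it trivially "fits", but its tasks cannot run as RT at all, which is the first problem the commit message describes; any non-zero default would have to be chosen so that every possible set of siblings still fits.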



Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2024-04-01 Thread Xi Ruoyao
On Wed, 2024-03-27 at 11:49 +0800, Ethan Zhao wrote:
> so, yup, basically, the signal integrity is not good enough.
> Though the function could work, its performance will be impacted.

FWIW, I've replaced the motherboard and the errors are gone, so it was
likely a signal integrity issue with the motherboard.



Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2024-03-25 Thread Xi Ruoyao
On Mon, 2024-03-25 at 18:15 +0800, Xi Ruoyao wrote:
> On Mon, 2024-03-25 at 16:45 +0800, Ethan Zhao wrote:
> > On 3/25/2024 1:19 AM, Xi Ruoyao wrote:
> > > ...
> > 
> > Seems there are two r8169 nics on your board, only 0000:06:00.0 reports
> > aer errors, how about another one the 0000:07:00.0 nic ?
> 
> It never happens to 0000:07:00.0, even if I plug the ethernet cable into
> it instead of 0000:06:00.0.
> 
> Maybe I should just use 0000:07:00.0 and blacklist 0000:06:00.0 as I
> don't need two NICs?

Plugging the ethernet cable into 0000:07:00.0 and then running
"echo 1 > /sys/bus/pci/devices/0000:00:1c.1/remove" (which also removes
0000:06:00.0, the device below that port) works for me...
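The same sysfs write can be done from C.  A minimal sketch, assuming root privileges; pci_remove_path() and pci_remove_device() are hypothetical helper names, not an existing API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/pci/devices/<bdf>/remove" for a PCI address such as
 * "0000:00:1c.1".  Returns 0 on success, -1 if `bdf` contains a '/'
 * or the result does not fit in `len` bytes. */
int pci_remove_path(char *buf, size_t len, const char *bdf)
{
    int n;

    assert(buf != NULL && bdf != NULL);
    if (strchr(bdf, '/') != NULL)
        return -1;
    n = snprintf(buf, len, "/sys/bus/pci/devices/%s/remove", bdf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Same effect as `echo 1 > /sys/bus/pci/devices/<bdf>/remove`: needs
 * root, and removing a port also removes every device below it.
 * Returns 0 on success, -1 on failure. */
int pci_remove_device(const char *bdf)
{
    char path[256];
    FILE *f;
    int ok;

    if (pci_remove_path(path, sizeof(path), bdf) != 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs("1\n", f) >= 0;
    if (fclose(f) != 0) /* the buffered write is flushed here */
        ok = 0;
    return ok ? 0 : -1;
}
```

The removal is not persistent: the devices come back after a reboot or after writing 1 to /sys/bus/pci/rescan.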



Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2024-03-25 Thread Xi Ruoyao
On Mon, 2024-03-25 at 16:45 +0800, Ethan Zhao wrote:
> On 3/25/2024 1:19 AM, Xi Ruoyao wrote:
> > ...
> 
> Seems there are two r8169 nics on your board, only 0000:06:00.0 reports
> aer errors, how about another one the 0000:07:00.0 nic ?

It never happens to 0000:07:00.0, even if I plug the ethernet cable into
it instead of 0000:06:00.0.

Maybe I should just use 0000:07:00.0 and blacklist 0000:06:00.0 as I
don't need two NICs?



Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2024-03-24 Thread Xi Ruoyao
On Mon, 2023-09-18 at 14:39 -0500, Bjorn Helgaas wrote:
> On Mon, Sep 18, 2023 at 07:42:30PM +0800, Xi Ruoyao wrote:
> > ...
> 
> > My workstation suffers from too much correctable AER reporting as well
> > (related to Intel's errata "RPL013: Incorrectly Formed PCIe Packets May
> > Generate Correctable Errors" and/or the motherboard design, I guess).
> 
> We should rate-limit correctable error reporting so it's not
> overwhelming.
> 
> At the same time, I'm *also* interested in the cause of these errors,
> in case there's a Linux defect or a hardware erratum that we can work
> around.  Do you have a bug report with any more details, e.g., a dmesg
> log and "sudo lspci -vv" output?

Hi Bjorn,

Sorry for the *very* late reply (somehow I didn't see your reply at all
before it was removed by my cron job, and I have only now salvaged it
from lore.kernel.org...)

The dmesg output looks like this:

[  882.456994] pcieport 0000:00:1c.1: AER: Multiple Correctable error message received from 0000:00:1c.1
[  882.457002] pcieport 0000:00:1c.1: AER: found no error details for 0000:00:1c.1
[  882.457003] pcieport 0000:00:1c.1: AER: Multiple Correctable error message received from 0000:06:00.0
[  883.545763] pcieport 0000:00:1c.1: AER: Multiple Correctable error message received from 0000:00:1c.1
[  883.545789] pcieport 0000:00:1c.1: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[  883.545790] pcieport 0000:00:1c.1:   device [8086:7a39] error status/mask=00000001/00002000
[  883.545792] pcieport 0000:00:1c.1:    [ 0] RxErr  (First)
[  883.545794] pcieport 0000:00:1c.1: AER:   Error of this Agent is reported first
[  883.545798] r8169 0000:06:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Transmitter ID)
[  883.545799] r8169 0000:06:00.0:   device [10ec:8125] error status/mask=00001101/0000e000
[  883.545800] r8169 0000:06:00.0:    [ 0] RxErr  (First)
[  883.545801] r8169 0000:06:00.0:    [ 8] Rollover
[  883.545802] r8169 0000:06:00.0:    [12] Timeout
[  883.545815] pcieport 0000:00:1c.1: AER: Correctable error message received from 0000:00:1c.1
[  883.545823] pcieport 0000:00:1c.1: AER: found no error details for 0000:00:1c.1
[  883.545824] pcieport 0000:00:1c.1: AER: Multiple Correctable error message received from 0000:06:00.0
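For reference, the bracketed bit numbers in the log appear to come from the status/mask pair: bits set in the Correctable Error Status register and not masked are printed by name.  A minimal C sketch of that decoding; aer_cor_decode() is a hypothetical name, and the names for bits that do not appear in this log are assumed to follow the kernel's short names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Short names for PCIe Correctable Error Status bits; unlisted bits
 * are reserved here and reported as "Unknown". */
static const char *const cor_err_names[32] = {
    [0]  = "RxErr",       /* Receiver Error */
    [6]  = "BadTLP",      /* Bad TLP */
    [7]  = "BadDLLP",     /* Bad DLLP */
    [8]  = "Rollover",    /* REPLAY_NUM Rollover */
    [12] = "Timeout",     /* Replay Timer Timeout */
    [13] = "NonFatalErr", /* Advisory Non-Fatal Error */
    [14] = "CorrIntErr",  /* Corrected Internal Error */
    [15] = "HeaderOF",    /* Header Log Overflow */
};

/* Store names of bits set in `status` but not in `mask` into `out`
 * (at most `max_out` entries).  Returns the total number of such bits,
 * which may exceed `max_out`. */
size_t aer_cor_decode(uint32_t status, uint32_t mask,
                      const char **out, size_t max_out)
{
    uint32_t active = status & ~mask;
    size_t n = 0;

    assert(out != NULL || max_out == 0);
    for (unsigned bit = 0; bit < 32; bit++) {
        if (!(active & (UINT32_C(1) << bit)))
            continue;
        if (n < max_out)
            out[n] = cor_err_names[bit] ? cor_err_names[bit] : "Unknown";
        n++;
    }
    return n;
}
```

For the NIC, status 0x1101 with mask 0xe000 decodes to RxErr, Rollover and Timeout, matching the log; for the root port, 0x0001 with mask 0x2000 decodes to RxErr alone.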

lspci output attached.

Intel has issued an erratum, "RPL013", saying:

"Under complex microarchitectural conditions, the PCIe controller may
transmit an incorrectly formed Transaction Layer Packet (TLP), which
will fail CRC checks. When this erratum occurs, the PCIe end point may
record correctable errors resulting in either a NAK or link recovery.
Intel® has not observed any functional impact due to this erratum."

But I'm really not sure whether it describes my issue.

Do you think I have some broken hardware and should replace the CPU
and/or the motherboard (where the r8169 is soldered)?  I've noticed that
my 13900K is almost impossible to overclock (despite being a K model),
but since giving up on overclocking I haven't encountered any issue
other than this AER reporting.

00:00.0 Host bridge: Intel Corporation Device a700 (rev 01)
	DeviceName: Onboard - Other
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR-
	Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, IntMsgNum 0
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+ FLReset+
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
			 AtomicOpsCtl: ReqEn-
			 IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
			 10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
		Address: fee00018  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [d0] Power Management version 2

Re: [PATCHv3 pci-next 1/2] PCI/AER: correctable error message as KERN_INFO

2023-09-18 Thread Xi Ruoyao
On Mon, 2023-08-14 at 08:40 -0700, Grant Grundler wrote:
> On Sat, Aug 12, 2023 at 5:45 PM David Heidelberg 
> wrote:
> > 
> > Tested-by: David Heidelberg 
> 
> Thanks David!
> 
> > For PATCH v4 please fix the typo reported by the bot :)
> 
> Sorry - I'll do that today.

Hi Grant,

Is there an update on this series?

My workstation suffers from excessive correctable AER reporting as well
(related to Intel's erratum "RPL013: Incorrectly Formed PCIe Packets May
Generate Correctable Errors" and/or the motherboard design, I guess).



Re: [RFC PATCH] asm-generic: Unify uapi bitsperlong.h

2023-06-10 Thread Xi Ruoyao
On Fri, 2023-06-09 at 14:50 +0800, Tiezhu Yang wrote:

/* snip */

> > > > In musl, the documentation states that at least gcc-3.4 or
> > > > clang-3.2 are required, which probably predate the
> > > > __SIZEOF_LONG__ macro.

Indeed, I've dug through some history and __SIZEOF_LONG__ was added in
GCC 4.3 (in 2008).  And I didn't realize that the bitsperlong.h in the
tools directory is a copy of the uapi one.

> > > > On the other hand, musl was only
> > > > released in 2011, and building musl itself explicitly
> > > > does not require kernel uapi headers, so this may not
> > > > be too critical.

> Only arm64, riscv and loongarch belong to the newer architectures
> which are related with this change, I am not sure it is necessary
> to "unify" uapi bitsperlong.h for them.

At least it will stop the engineers working on "the next architecture"
from adding an unneeded bitsperlong.h :).
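The header change under discussion can be sketched as follows: with __SIZEOF_LONG__ predefined by GCC 4.3 and later (and by current Clang), the width of long can be computed instead of hard-coded per architecture.  This is an illustration along those lines, not necessarily the code that was merged; bits_per_long() is a hypothetical helper added only to check the result.

```c
#include <assert.h> /* static_assert */
#include <limits.h> /* CHAR_BIT */

#ifndef __BITS_PER_LONG
#if defined(__CHAR_BIT__) && defined(__SIZEOF_LONG__)
/* GCC >= 4.3 and current Clang predefine these, so no per-architecture
 * header is needed to know the width of long. */
#define __BITS_PER_LONG (__CHAR_BIT__ * __SIZEOF_LONG__)
#else
/* Older compilers: fall back to 32; a 64-bit architecture would have
 * to override this in its own header. */
#define __BITS_PER_LONG 32
#endif
#endif

/* Fails the build if the fallback was taken on a 64-bit target. */
static_assert(__BITS_PER_LONG == (int)(sizeof(long) * CHAR_BIT),
              "__BITS_PER_LONG does not match sizeof(long)");

/* Hypothetical helper, only to expose the value at run time. */
int bits_per_long(void)
{
    return __BITS_PER_LONG;
}
```

The static_assert fires exactly in the case the quoted discussion worried about: a 64-bit target built with a compiler too old to provide __SIZEOF_LONG__.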

