Re: Re: general protection fault in refill_obj_stock

2024-04-13 Thread Shakeel Butt
On Tue, Apr 02, 2024 at 09:50:54AM +0800, Ubisectech Sirius wrote:
> > On Mon, Apr 01, 2024 at 03:04:46PM +0800, Ubisectech Sirius wrote:
> > Hello.
> > We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> > Recently, our team has discovered an issue in Linux kernel 6.7. A PoC file
> > for the issue is attached to this email.
> 
> > Thank you for the report!
> 
> > I tried to compile and run your test program for about half an hour
> > on a virtual machine running 6.7 with enabled KASAN, but wasn't able
> > to reproduce the problem.
> 
> > Can you, please, share a bit more information? How long does it take
> > to reproduce? Do you mind sharing your kernel config? Is there anything 
> > special
> > about your setup? What are exact steps to reproduce the problem?
> > Is this problem reproducible on 6.6?
> 
> Hi. 
> The .config for Linux kernel 6.7 has been sent to you as an attachment, and
> the problem is reproducible on 6.6.

Can you please share the reproducer of this issue?

> 
> > It's interesting that the problem looks like use-after-free for the objcg 
> > pointer
> > but happens in the context of udev-systemd, which I believe should be 
> > fairly stable
> > and its cgroup is not going anywhere.
> 
> > Thanks!
> 





Re: [PATCH v11 14/14] selftests/sgx: Add scripts for EPC cgroup testing

2024-04-13 Thread Jarkko Sakkinen
On Wed Apr 10, 2024 at 9:25 PM EEST, Haitao Huang wrote:
> To run selftests for EPC cgroup:
>
> sudo ./run_epc_cg_selftests.sh
>
> To watch misc cgroup 'current' changes during testing, run this in a
> separate terminal:
>
> ./watch_misc_for_tests.sh current
>
> With different cgroups, the script starts one or multiple concurrent SGX
> selftests (test_sgx), each to run the unclobbered_vdso_oversubscribed
> test case, which loads an enclave of EPC size equal to the EPC capacity
> available on the platform. The script checks results against the
> expectation set for each cgroup and reports success or failure.
>
> The script creates 3 different cgroups at the beginning with following
> expectations:
>
> 1) SMALL - intentionally small enough to fail the test loading an
> enclave of size equal to the capacity.
> 2) LARGE - large enough to run up to 4 concurrent tests but fail some if
> more than 4 concurrent tests are run. The script starts 4 expecting at
> least one test to pass, and then starts 5 expecting at least one test
> to fail.
> 3) LARGER - limit is the same as the capacity, large enough to run lots of
> concurrent tests. The script starts 8 of them and expects all pass.
> Then it reruns the same test with one process randomly killed and
> usage checked to be zero after all processes exit.
>
> The script also includes a test with low mem_cg limit and LARGE sgx_epc
> limit to verify that the RAM used for per-cgroup reclamation is charged
> to a proper mem_cg. For this test, it turns off swapping before start,
> and turns swapping back on afterwards.
>
> Signed-off-by: Haitao Huang 
> ---
> V11:
> - Remove cgroups-tools dependency and make scripts ash compatible. (Jarkko)
> - Drop support for cgroup v1 and simplify. (Michal, Jarkko)
> - Add documentation for functions. (Jarkko)
> - Turn off swapping before memcontrol tests and back on after
> - Format and style fixes, name for hard coded values
>
> V7:
> - Added memcontrol test.
>
> V5:
> - Added script with automatic results checking, remove the interactive
> script.
> - The script can run independent from the series below.
> ---
>  tools/testing/selftests/sgx/ash_cgexec.sh |  16 +
>  .../selftests/sgx/run_epc_cg_selftests.sh | 275 ++
>  .../selftests/sgx/watch_misc_for_tests.sh |  11 +
>  3 files changed, 302 insertions(+)
>  create mode 100755 tools/testing/selftests/sgx/ash_cgexec.sh
>  create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
>  create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh
>
> diff --git a/tools/testing/selftests/sgx/ash_cgexec.sh 
> b/tools/testing/selftests/sgx/ash_cgexec.sh
> new file mode 100755
> index ..cfa5d2b0e795
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/ash_cgexec.sh
> @@ -0,0 +1,16 @@
> +#!/usr/bin/env sh
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright(c) 2024 Intel Corporation.
> +
> +# Start a program in a given cgroup.
> +# Supports V2 cgroup paths, relative to /sys/fs/cgroup
> +if [ "$#" -lt 2 ]; then
> +echo "Usage: $0   [args...]"
> +exit 1
> +fi
> +# Move this shell to the cgroup.
> +echo 0 >/sys/fs/cgroup/$1/cgroup.procs
> +shift
> +# Execute the command within the cgroup
> +exec "$@"
> +
> diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh 
> b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
> new file mode 100755
> index ..dd56273056fc
> --- /dev/null
> +++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
> @@ -0,0 +1,275 @@
> +#!/usr/bin/env sh
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright(c) 2023, 2024 Intel Corporation.
> +
> +TEST_ROOT_CG=selftest
> +TEST_CG_SUB1=$TEST_ROOT_CG/test1
> +TEST_CG_SUB2=$TEST_ROOT_CG/test2
> +# We will only set limit in test1 and run tests in test3
> +TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3
> +TEST_CG_SUB4=$TEST_ROOT_CG/test4
> +
> +# Cgroup v2 only
> +CG_ROOT=/sys/fs/cgroup
> +mkdir -p $CG_ROOT/$TEST_CG_SUB1
> +mkdir -p $CG_ROOT/$TEST_CG_SUB2
> +mkdir -p $CG_ROOT/$TEST_CG_SUB3
> +mkdir -p $CG_ROOT/$TEST_CG_SUB4
> +
> +# Turn on misc and memory controller in non-leaf nodes
> +echo "+misc" >  $CG_ROOT/cgroup.subtree_control && \
> +echo "+memory" > $CG_ROOT/cgroup.subtree_control && \
> +echo "+misc" >  $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
> +echo "+memory" > $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
> +echo "+misc" >  $CG_ROOT/$TEST_CG_SUB1/cgroup.subtree_control
> +if [ $? -ne 0 ]; then
> +echo "# Failed setting up cgroups, make sure misc and memory cgroups are enabled."
> +exit 1
> +fi
> +
> +CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}')
> +# This is below number of VA pages needed for enclave of capacity size. So
> +# should fail oversubscribed cases
> +SMALL=$(( CAPACITY / 512 ))
> +
> +# At least load one enclave of capacity size successfully, maybe up to 4.
> +# But some may fail if we run more than 4 concurrent enclaves of capacity size.
> +LARGE=$(( SMALL * 4 ))
> +
> 
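The limit arithmetic in the script above can be restated in C. This is only a sketch for illustration: `struct epc_limits` and `epc_limits_from_capacity()` are hypothetical names, and the LARGER value is taken from the commit message ("limit is the same as the capacity") because the quoted script is cut off before it is assigned.

```c
#include <stdint.h>

/* Hypothetical container for the three per-cgroup limits. */
struct epc_limits {
    uint64_t small;   /* below what a capacity-sized enclave needs */
    uint64_t large;   /* room for roughly 4 concurrent tests */
    uint64_t larger;  /* equal to the platform capacity */
};

/* Mirrors SMALL=$(( CAPACITY / 512 )) and LARGE=$(( SMALL * 4 )). */
static struct epc_limits epc_limits_from_capacity(uint64_t capacity)
{
    struct epc_limits l;

    l.small = capacity / 512;   /* integer division, as in shell arithmetic */
    l.large = l.small * 4;
    l.larger = capacity;        /* assumption taken from the commit message */
    return l;
}
```

Note that LARGE works out to CAPACITY / 128, far below the capacity, so the test relies on per-cgroup reclamation to load a capacity-sized enclave under that limit.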

Re: [PATCH v10 08/14] x86/sgx: Add basic EPC reclamation flow for cgroup

2024-04-13 Thread Jarkko Sakkinen
On Fri Apr 5, 2024 at 6:07 AM EEST, Huang, Kai wrote:
> On Thu, 2024-04-04 at 12:05 -0500, Haitao Huang wrote:
> > > > -static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg)
> > > > +static inline int sgx_cgroup_try_charge(struct sgx_cgroup *sgx_cg,  
> > > > enum sgx_reclaim r)
> > > 
> > > Is the @r here intentional for shorter typing?
> > > 
> > 
> > yes :-)
> > Will spell out to make it consistent if that's the concern.
>
> I kinda prefer the full name to match the CONFIG_CGROUP_SGX_EPC on case.  You
> can put the 'enum sgx_reclaim reclaim' parameter into the new line if needed:

Why don't cgroups for SGX get always enabled when SGX and
cgroups support are enabled?

BR, Jarkko



[PATCH v3 4/4] arm64: dts: qcom: msm8976: Add WCNSS node

2024-04-13 Thread Adam Skladowski
Add node describing wireless connectivity subsystem.

Signed-off-by: Adam Skladowski 
---
 arch/arm64/boot/dts/qcom/msm8976.dtsi | 104 ++
 1 file changed, 104 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8976.dtsi 
b/arch/arm64/boot/dts/qcom/msm8976.dtsi
index acb6331999bd..1e492bcc56f0 100644
--- a/arch/arm64/boot/dts/qcom/msm8976.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8976.dtsi
@@ -771,6 +771,36 @@ blsp2_i2c4_sleep: blsp2-i2c4-sleep-state {
drive-strength = <2>;
bias-disable;
};
+
+   wcss_wlan_default: wcss-wlan-default-state  {
+   wcss-wlan2-pins {
+   pins = "gpio40";
+   function = "wcss_wlan2";
+   drive-strength = <6>;
+   bias-pull-up;
+   };
+
+   wcss-wlan1-pins {
+   pins = "gpio41";
+   function = "wcss_wlan1";
+   drive-strength = <6>;
+   bias-pull-up;
+   };
+
+   wcss-wlan0-pins {
+   pins = "gpio42";
+   function = "wcss_wlan0";
+   drive-strength = <6>;
+   bias-pull-up;
+   };
+
+   wcss-wlan-pins {
+   pins = "gpio43", "gpio44";
+   function = "wcss_wlan";
+   drive-strength = <6>;
+   bias-pull-up;
+   };
+   };
};
 
gcc: clock-controller@180 {
@@ -1458,6 +1488,80 @@ blsp2_i2c4: i2c@7af8000 {
status = "disabled";
};
 
+   wcnss: remoteproc@a204000 {
+   compatible = "qcom,pronto-v3-pil", "qcom,pronto";
+   reg = <0x0a204000 0x2000>,
+ <0x0a202000 0x1000>,
+ <0x0a21b000 0x3000>;
+   reg-names = "ccu",
+   "dxe",
+   "pmu";
+
+   memory-region = <_fw_mem>;
+
+   interrupts-extended = < GIC_SPI 149 
IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 0 
IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 1 
IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 2 
IRQ_TYPE_EDGE_RISING>,
+ <_smp2p_in 3 
IRQ_TYPE_EDGE_RISING>;
+   interrupt-names = "wdog",
+ "fatal",
+ "ready",
+ "handover",
+ "stop-ack";
+
+   power-domains = < MSM8976_VDDCX>,
+   < MSM8976_VDDMX>;
+   power-domain-names = "cx", "mx";
+
+   qcom,smem-states = <_smp2p_out 0>;
+   qcom,smem-state-names = "stop";
+
+   pinctrl-0 = <_wlan_default>;
+   pinctrl-names = "default";
+
+   status = "disabled";
+
+   wcnss_iris: iris {
+   /* Separate chip, compatible is board-specific 
*/
+   clocks = < RPM_SMD_RF_CLK2>;
+   clock-names = "xo";
+   };
+
+   smd-edge {
+   interrupts = ;
+
+   qcom,ipc = < 8 17>;
+   qcom,smd-edge = <6>;
+   qcom,remote-pid = <4>;
+
+   label = "pronto";
+
+   wcnss_ctrl: wcnss {
+   compatible = "qcom,wcnss";
+   qcom,smd-channels = "WCNSS_CTRL";
+
+   qcom,mmio = <>;
+
+   wcnss_bt: bluetooth {
+   compatible = "qcom,wcnss-bt";
+   };
+
+   wcnss_wifi: wifi {
+   compatible = "qcom,wcnss-wlan";
+
+   interrupts = ,
+  

[PATCH v3 3/4] arm64: dts: qcom: msm8976: Add Adreno GPU

2024-04-13 Thread Adam Skladowski
Add Adreno GPU node.

Signed-off-by: Adam Skladowski 
---
 arch/arm64/boot/dts/qcom/msm8976.dtsi | 71 +++
 1 file changed, 71 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8976.dtsi 
b/arch/arm64/boot/dts/qcom/msm8976.dtsi
index ce15c6ec9f4e..acb6331999bd 100644
--- a/arch/arm64/boot/dts/qcom/msm8976.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8976.dtsi
@@ -1080,6 +1080,77 @@ mdss_dsi1_phy: phy@1a96a00 {
};
};
 
+   adreno_gpu: gpu@1c0 {
+   compatible = "qcom,adreno-510.0", "qcom,adreno";
+
+   reg = <0x01c0 0x4>;
+   reg-names = "kgsl_3d0_reg_memory";
+
+   interrupts = ;
+   interrupt-names = "kgsl_3d0_irq";
+
+   clocks = < GCC_GFX3D_OXILI_CLK>,
+< GCC_GFX3D_OXILI_AHB_CLK>,
+< GCC_GFX3D_OXILI_GMEM_CLK>,
+< GCC_GFX3D_BIMC_CLK>,
+< GCC_GFX3D_OXILI_TIMER_CLK>,
+< GCC_GFX3D_OXILI_AON_CLK>;
+   clock-names = "core",
+ "iface",
+ "mem",
+ "mem_iface",
+ "rbbmtimer",
+ "alwayson";
+
+   power-domains = < OXILI_GX_GDSC>;
+
+   iommus = <_iommu 0>;
+
+   operating-points-v2 = <_opp_table>;
+
+   status = "disabled";
+
+   gpu_opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+   opp-2 {
+   opp-hz = /bits/ 64 <2>;
+   required-opps = <_opp_low_svs>;
+   opp-supported-hw = <0xff>;
+   };
+
+   opp-3 {
+   opp-hz = /bits/ 64 <3>;
+   required-opps = <_opp_svs>;
+   opp-supported-hw = <0xff>;
+   };
+
+   opp-4 {
+   opp-hz = /bits/ 64 <4>;
+   required-opps = <_opp_nom>;
+   opp-supported-hw = <0xff>;
+   };
+
+   opp-48000 {
+   opp-hz = /bits/ 64 <48000>;
+   required-opps = <_opp_nom_plus>;
+   opp-supported-hw = <0xff>;
+   };
+
+   opp-54000 {
+   opp-hz = /bits/ 64 <54000>;
+   required-opps = <_opp_turbo>;
+   opp-supported-hw = <0xff>;
+   };
+
+   opp-6 {
+   opp-hz = /bits/ 64 <6>;
+   required-opps = <_opp_turbo>;
+   opp-supported-hw = <0xff>;
+   };
+   };
+   };
+
apps_iommu: iommu@1ee {
compatible = "qcom,msm8976-iommu", "qcom,msm-iommu-v2";
reg = <0x01ee 0x3000>;
-- 
2.44.0




[PATCH v3 2/4] arm64: dts: qcom: msm8976: Add MDSS nodes

2024-04-13 Thread Adam Skladowski
Add MDSS nodes to support displays on MSM8976 SoC.

Signed-off-by: Adam Skladowski 
---
 arch/arm64/boot/dts/qcom/msm8976.dtsi | 280 +-
 1 file changed, 276 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/msm8976.dtsi 
b/arch/arm64/boot/dts/qcom/msm8976.dtsi
index 8bdcc1438177..ce15c6ec9f4e 100644
--- a/arch/arm64/boot/dts/qcom/msm8976.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8976.dtsi
@@ -785,10 +785,10 @@ gcc: clock-controller@180 {
 
clocks = < RPM_SMD_XO_CLK_SRC>,
 < RPM_SMD_XO_A_CLK_SRC>,
-<0>,
-<0>,
-<0>,
-<0>;
+<_dsi0_phy 1>,
+<_dsi0_phy 0>,
+<_dsi1_phy 1>,
+<_dsi1_phy 0>;
clock-names = "xo",
  "xo_a",
  "dsi0pll",
@@ -808,6 +808,278 @@ tcsr: syscon@1937000 {
reg = <0x01937000 0x3>;
};
 
+   mdss: display-subsystem@1a0 {
+   compatible = "qcom,mdss";
+
+   reg = <0x01a0 0x1000>,
+ <0x01ab 0x3000>;
+   reg-names = "mdss_phys", "vbif_phys";
+
+   power-domains = < MDSS_GDSC>;
+   interrupts = ;
+
+   interrupt-controller;
+   #interrupt-cells = <1>;
+
+   clocks = < GCC_MDSS_AHB_CLK>,
+< GCC_MDSS_AXI_CLK>,
+< GCC_MDSS_VSYNC_CLK>,
+< GCC_MDSS_MDP_CLK>;
+   clock-names = "iface",
+ "bus",
+ "vsync",
+ "core";
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   status = "disabled";
+
+   mdss_mdp: display-controller@1a01000 {
+   compatible = "qcom,msm8976-mdp5", "qcom,mdp5";
+   reg = <0x01a01000 0x89000>;
+   reg-names = "mdp_phys";
+
+   interrupt-parent = <>;
+   interrupts = <0>;
+
+   clocks = < GCC_MDSS_AHB_CLK>,
+< GCC_MDSS_AXI_CLK>,
+< GCC_MDSS_MDP_CLK>,
+< GCC_MDSS_VSYNC_CLK>,
+< GCC_MDP_TBU_CLK>,
+< GCC_MDP_RT_TBU_CLK>;
+   clock-names = "iface",
+ "bus",
+ "core",
+ "vsync",
+ "tbu",
+ "tbu_rt";
+
+   operating-points-v2 = <_opp_table>;
+   power-domains = < MDSS_GDSC>;
+
+   iommus = <_iommu 22>;
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   port@0 {
+   reg = <0>;
+
+   mdss_mdp5_intf1_out: endpoint {
+   remote-endpoint = 
<_dsi0_in>;
+   };
+   };
+
+   port@1 {
+   reg = <1>;
+
+   mdss_mdp5_intf2_out: endpoint {
+   remote-endpoint = 
<_dsi1_in>;
+   };
+   };
+   };
+
+   mdp_opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+   opp-17778 {
+   opp-hz = /bits/ 64 <17778>;
+   required-opps = 
<_opp_svs>;
+   };
+
+   opp-27000 {
+   opp-hz = /bits/ 64 <27000>;
+

[PATCH v3 1/4] arm64: dts: qcom: msm8976: Add IOMMU nodes

2024-04-13 Thread Adam Skladowski
Add the nodes describing the apps and gpu iommu and its context banks
that are found on msm8976 SoCs.

Signed-off-by: Adam Skladowski 
---
 arch/arm64/boot/dts/qcom/msm8976.dtsi | 81 +++
 1 file changed, 81 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8976.dtsi 
b/arch/arm64/boot/dts/qcom/msm8976.dtsi
index d2bb1ada361a..8bdcc1438177 100644
--- a/arch/arm64/boot/dts/qcom/msm8976.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8976.dtsi
@@ -808,6 +808,87 @@ tcsr: syscon@1937000 {
reg = <0x01937000 0x3>;
};
 
+   apps_iommu: iommu@1ee {
+   compatible = "qcom,msm8976-iommu", "qcom,msm-iommu-v2";
+   reg = <0x01ee 0x3000>;
+   ranges  = <0 0x01e2 0x2>;
+
+   clocks = < GCC_SMMU_CFG_CLK>,
+< GCC_APSS_TCU_CLK>;
+   clock-names = "iface", "bus";
+
+   qcom,iommu-secure-id = <17>;
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+
+   /* VFE */
+   iommu-ctx@15000 {
+   compatible = "qcom,msm-iommu-v2-ns";
+   reg = <0x15000 0x1000>;
+   qcom,ctx-asid = <20>;
+   interrupts = ;
+   };
+
+   /* VENUS NS */
+   iommu-ctx@16000 {
+   compatible = "qcom,msm-iommu-v2-ns";
+   reg = <0x16000 0x1000>;
+   qcom,ctx-asid = <21>;
+   interrupts = ;
+   };
+
+   /* MDP0 */
+   iommu-ctx@17000 {
+   compatible = "qcom,msm-iommu-v2-ns";
+   reg = <0x17000 0x1000>;
+   qcom,ctx-asid = <22>;
+   interrupts = ;
+   };
+   };
+
+   gpu_iommu: iommu@1f08000 {
+   compatible = "qcom,msm8976-iommu", "qcom,msm-iommu-v2";
+   ranges = <0 0x01f08000 0x8000>;
+
+   clocks = < GCC_SMMU_CFG_CLK>,
+< GCC_GFX3D_TCU_CLK>;
+   clock-names = "iface", "bus";
+
+   power-domains = < OXILI_CX_GDSC>;
+
+   qcom,iommu-secure-id = <18>;
+
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+
+   /* gfx3d user */
+   iommu-ctx@0 {
+   compatible = "qcom,msm-iommu-v2-ns";
+   reg = <0x0 0x1000>;
+   qcom,ctx-asid = <0>;
+   interrupts = ;
+   };
+
+   /* gfx3d secure */
+   iommu-ctx@1000 {
+   compatible = "qcom,msm-iommu-v2-sec";
+   reg = <0x1000 0x1000>;
+   qcom,ctx-asid = <2>;
+   interrupts = ;
+   };
+
+   /* gfx3d priv */
+   iommu-ctx@2000 {
+   compatible = "qcom,msm-iommu-v2-sec";
+   reg = <0x2000 0x1000>;
+   qcom,ctx-asid = <1>;
+   interrupts = ;
+   };
+   };
+
spmi_bus: spmi@200f000 {
compatible = "qcom,spmi-pmic-arb";
reg = <0x0200f000 0x1000>,
-- 
2.44.0




[PATCH v3 0/4] MSM8976 MDSS/GPU/WCNSS support

2024-04-13 Thread Adam Skladowski
This patch series provides support for the display subsystem and GPU,
and also adds wireless connectivity subsystem support.

Changes since v2

1. Disabled mdss_dsi nodes by default
2. Changed reg size of mdss_dsi0 to be equal on both
3. Added operating points to second mdss_dsi
4. Brought back required opp-supported-hw on adreno
5. Moved status under operating points on adreno

Changes since v1

1. Addressed feedback
2. Dropped already applied dt-bindings patches
3. Dropped sdc patch as it was submitted as part of other series
4. Dropped dt-bindings patch for Adreno, also separate now

Adam Skladowski (4):
  arm64: dts: qcom: msm8976: Add IOMMU nodes
  arm64: dts: qcom: msm8976: Add MDSS nodes
  arm64: dts: qcom: msm8976: Add Adreno GPU
  arm64: dts: qcom: msm8976: Add WCNSS node

 arch/arm64/boot/dts/qcom/msm8976.dtsi | 536 +-
 1 file changed, 532 insertions(+), 4 deletions(-)

-- 
2.44.0




Re: [PATCH net-next v5] net/ipv4: add tracepoint for icmp_send

2024-04-13 Thread Simon Horman
On Thu, Apr 11, 2024 at 06:01:54PM +0800, xu.xi...@zte.com.cn wrote:
> From: hepeilin 

nit: it's nicer if this From line matches one of the Signed-off-by lines

 From: Peilin He 


> Introduce a tracepoint for icmp_send, which can help users to get more
> detail information conveniently when icmp abnormal events happen.
> 
> 1. Giving an usecase example:
> =
> When an application experiences packet loss due to an unreachable UDP
> destination port, the kernel will send an exception message through the
> icmp_send function. By adding a trace point for icmp_send, developers or
> system administrators can obtain detailed information about the UDP
> packet loss, including the type, code, source address, destination address,
> source port, and destination port. This facilitates the trouble-shooting
> of UDP packet loss issues especially for those network-service
> applications.
> 
> 2. Operation Instructions:
> ==
> Switch to the tracing directory.
> cd /sys/kernel/tracing
> Filter for destination port unreachable.
> echo "type==3 && code==3" > events/icmp/icmp_send/filter
> Enable trace event.
> echo 1 > events/icmp/icmp_send/enable
> 
> 3. Result View:
> 
>  udp_client_erro-11370   [002] ...s.12   124.728002:
>  icmp_send: icmp_send: type=3, code=3.
>  From 127.0.0.1:41895 to 127.0.0.1: ulen=23
>  skbaddr=589b167a
> 
> v4->v5:
> Some fixes according to
> https://lore.kernel.org/all/CAL+tcoDeXXh+zcRk4PHnUk8ELnx=ce2pccqs7sfm0y9ak-e...@mail.gmail.com/
> 1.Adjust the position of trace_icmp_send() to before icmp_push_reply().
> 
> v3->v4:
> Some fixes according to
> https://lore.kernel.org/all/CANn89i+EFEr7VHXNdOi59Ba_R1nFKSBJzBzkJFVgCTdXBx=y...@mail.gmail.com/
> 1.Add legality check for UDP header in SKB.
> 2.Target this patch for net-next.
> 
> Changelog
> 
> v2->v3:
> Some fixes according to
> https://lore.kernel.org/all/20240319102549.7f7f6...@gandalf.local.home/
> 1. Change the tracking directory to/sys/kernel/tracking.
> 2. Adjust the layout of the TP-STRUCT_entry parameter structure.
> 
> v1->v2:
> Some fixes according to
> https://lore.kernel.org/all/CANn89iL-y9e_VFpdw=sztrnkru_tnuwqhufqtjvjsv-nz1x...@mail.gmail.com/
> 1. adjust the trace_icmp_send() to more protocols than UDP.
> 2. move the calling of trace_icmp_send after sanity checks
> in __icmp_send().
> 
> Signed-off-by: Peilin He

nit: There should be a space between 'He' and '<'

> Reviewed-by: xu xin 

This has been posted by xu xin, thus it is appropriate for
a Signed-off-by line from xu xin to be present.

> Reviewed-by: Yunkai Zhang 
> Cc: Yang Yang 
> Cc: Liu Chun 
> Cc: Xuexin Jiang 

Hi,

Unfortunately this patch does not apply to net-next.
Please rebase and repost after waiting a suitable time for
other review.

-- 
pw-bot: changes-requested

> ---
>  include/trace/events/icmp.h | 65 +
>  net/ipv4/icmp.c |  4 +++
>  2 files changed, 69 insertions(+)
>  create mode 100644 include/trace/events/icmp.h
> 
> diff --git a/include/trace/events/icmp.h b/include/trace/events/icmp.h
> new file mode 100644
> index 0..7d5190f48
> --- /dev/null
> +++ b/include/trace/events/icmp.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM icmp
> +
> +#if !defined(_TRACE_ICMP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_ICMP_H
> +
> +#include 
> +#include 
> +
> +TRACE_EVENT(icmp_send,
> +
> + TP_PROTO(const struct sk_buff *skb, int type, int code),
> +
> + TP_ARGS(skb, type, code),
> +
> + TP_STRUCT__entry(
> + __field(const void *, skbaddr)
> + __field(int, type)
> + __field(int, code)
> + __array(__u8, saddr, 4)
> + __array(__u8, daddr, 4)
> + __field(__u16, sport)
> + __field(__u16, dport)
> + __field(unsigned short, ulen)
> + ),
> +
> + TP_fast_assign(
> + struct iphdr *iph = ip_hdr(skb);
> + int proto_4 = iph->protocol;
> + __be32 *p32;
> +
> + __entry->skbaddr = skb;
> + __entry->type = type;
> + __entry->code = code;
> +
> + struct udphdr *uh = udp_hdr(skb);
> + if (proto_4 != IPPROTO_UDP || (u8 *)uh < skb->head ||
> + (u8 *)uh + sizeof(struct udphdr) > skb_tail_pointer(skb)) {
> + __entry->sport = 0;
> + __entry->dport = 0;
> + __entry->ulen = 0;
> + } else {
> + __entry->sport = ntohs(uh->source);
> + __entry->dport = ntohs(uh->dest);
> + 
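The UDP header check in the quoted TP_fast_assign() can be modeled in userspace. This is a minimal sketch, not the patch's code: `struct buf_model` is a hypothetical stand-in for the two sk_buff bounds the check uses (skb->head and skb_tail_pointer(skb)), and `udp_hdr_in_bounds()` is an illustrative name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the buffer bounds of an sk_buff. */
struct buf_model {
    const uint8_t *head;  /* first byte of the buffer */
    const uint8_t *tail;  /* one past the last valid byte */
};

#define UDP_HDR_LEN 8     /* a UDP header is 8 bytes long */

/*
 * Returns true only if a full UDP header starting at uh lies inside
 * [head, tail). The remaining length is compared instead of computing
 * uh + UDP_HDR_LEN, so no pointer beyond the buffer is formed.
 */
static bool udp_hdr_in_bounds(const struct buf_model *b, const uint8_t *uh)
{
    if (uh < b->head || uh > b->tail)
        return false;
    return (size_t)(b->tail - uh) >= UDP_HDR_LEN;
}
```

When the check fails, the tracepoint in the patch records zero for sport, dport and ulen rather than reading outside the buffer.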

Re: [RFC PATCH 0/4] perf: Correlating user process data to samples

2024-04-13 Thread Steven Rostedt
On Sat, 13 Apr 2024 12:53:38 +0200
Peter Zijlstra  wrote:

> On Fri, Apr 12, 2024 at 09:37:24AM -0700, Beau Belgrave wrote:
> 
> > > Anyway, since we typically run stuff from NMI context, accessing user
> > > data is 'interesting'. As such I would really like to make this work
> > > depend on the call-graph rework that pushes all the user access bits
> > > into return-to-user.  
> > 
> > Cool, I assume that's the SFRAME work? Are there pointers to work I
> > could look at and think about what a rebase looks like? Or do you have
> > someone in mind I should work with for this?  
> 
> I've been offline for a little while and still need to catch up with
> things myself.
> 
> Josh was working on that when I dropped off IIRC, I'm not entirely sure
> where things are at currently (and there is no way I can ever hope to
> process the backlog).
> 
> Anybody know where we are with that?

It's still very much on my RADAR, but with layoffs and such, my
priorities have unfortunately changed. I'm hoping to start helping out
in the near future though (in a month or two).

Josh was working on it, but I think he got pulled off onto other
priorities too :-p

-- Steve



Re: [PATCH v2] bootconfig: use memblock_free_late to free xbc memory to buddy

2024-04-13 Thread Google
Hi Qiang,

I found that xbc_free_mem() fails to check for a NULL addr. When I booted a
kernel without bootconfig data but with "bootconfig" on the cmdline, I got the
kernel crash below:


[2.394904] [ cut here ]
[2.396490] kernel BUG at arch/x86/mm/physaddr.c:28!
[2.398176] invalid opcode:  [#1] PREEMPT SMP PTI
[2.399388] CPU: 7 PID: 1 Comm: swapper/0 Tainted: G N 
6.9.0-rc3-4-g121fbb463836 #10
[2.401579] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.15.0-1 04/01/2014
[2.403247] RIP: 0010:__phys_addr+0x40/0x60
[2.404196] Code: 48 2b 05 fb a4 3d 01 48 05 00 00 00 80 48 39 c7 72 17 0f 
b6 0d ee 9e c0 01 48 89 c2 48 d3 ea 48 85 d2 75 05 c3 cc cc cc cc 90 <0f> 0b 48 
03 05 e7 e2 9d 01 48 81 ff ff ff ff 1f 76 e8 90 0f6
[2.407250] RSP: :c9013f18 EFLAGS: 00010287
[2.407991] RAX: 7780 RBX: 81c17940 RCX: 008a
[2.408891] RDX: 008b RSI: 88800775f320 RDI: 8000
[2.409727] RBP:  R08:  R09: 
[2.410555] R10: 888005028a60 R11: 008a R12: 
[2.411423] R13:  R14:  R15: 
[2.412155] FS:  () GS:88807d9c() 
knlGS:
[2.412970] CS:  0010 DS:  ES:  CR0: 80050033
[2.413550] CR2:  CR3: 02a48000 CR4: 06b0
[2.414264] Call Trace:
[2.414520]  
[2.414755]  ? die+0x37/0x90
[2.415062]  ? do_trap+0xe3/0x110
[2.415451]  ? __phys_addr+0x40/0x60
[2.415822]  ? do_error_trap+0x9c/0x120
[2.416215]  ? __phys_addr+0x40/0x60
[2.416573]  ? __phys_addr+0x40/0x60
[2.416968]  ? exc_invalid_op+0x53/0x70
[2.417358]  ? __phys_addr+0x40/0x60
[2.417709]  ? asm_exc_invalid_op+0x1a/0x20
[2.418122]  ? __pfx_kernel_init+0x10/0x10
[2.418569]  ? __phys_addr+0x40/0x60
[2.418960]  _xbc_exit+0x74/0xc0
[2.419374]  kernel_init+0x3a/0x1c0
[2.419764]  ret_from_fork+0x34/0x50
[2.420132]  ? __pfx_kernel_init+0x10/0x10
[2.420578]  ret_from_fork_asm+0x1a/0x30
[2.420973]  
[2.421200] Modules linked in:
[2.421598] ---[ end trace  ]---
[2.422053] RIP: 0010:__phys_addr+0x40/0x60
[2.422484] Code: 48 2b 05 fb a4 3d 01 48 05 00 00 00 80 48 39 c7 72 17 0f 
b6 0d ee 9e c0 01 48 89 c2 48 d3 ea 48 85 d2 75 05 c3 cc cc cc cc 90 <0f> 0b 48 
03 05 e7 e2 9d 01 48 81 ff ff ff ff 1f 76 e8 90 0f6
[2.424294] RSP: :c9013f18 EFLAGS: 00010287
[2.424769] RAX: 7780 RBX: 81c17940 RCX: 008a
[2.425378] RDX: 008b RSI: 88800775f320 RDI: 8000
[2.425993] RBP:  R08:  R09: 
[2.426589] R10: 888005028a60 R11: 008a R12: 
[2.427156] R13:  R14:  R15: 
[2.427746] FS:  () GS:88807d9c() 
knlGS:
[2.428368] CS:  0010 DS:  ES:  CR0: 80050033
[2.428820] CR2:  CR3: 02a48000 CR4: 06b0
[2.429373] Kernel panic - not syncing: Fatal exception
[2.429982] Kernel Offset: disabled
[2.430261] ---[ end Kernel panic - not syncing: Fatal exception ]---

Adding the patch below fixed it.

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index f9a45adc6307..8841554432d5 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -65,7 +65,7 @@ static inline void __init xbc_free_mem(void *addr, size_t size, bool early)
 {
if (early)
memblock_free(addr, size);
-   else
+   else if (addr)
memblock_free_late(__pa(addr), size);
 }
 
Can you update with this fix?

Thank you,
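The effect of the added check can be modeled in userspace. This is a sketch only, not kernel code: `fake_memblock_free()` and `fake_memblock_free_late()` are hypothetical stand-ins that just count calls, and `xbc_free_mem_model()` mirrors the control flow of the patched xbc_free_mem().

```c
#include <stdbool.h>
#include <stddef.h>

/* Call counters for the two hypothetical stand-ins below. */
static int early_frees;
static int late_frees;

/* Stand-in for memblock_free(); it only records that it was called. */
static void fake_memblock_free(void *addr, size_t size)
{
    (void)addr;
    (void)size;
    early_frees++;
}

/* Stand-in for memblock_free_late(); it only records that it was called. */
static void fake_memblock_free_late(void *addr, size_t size)
{
    (void)addr;
    (void)size;
    late_frees++;
}

/*
 * Mirrors the patched control flow: the late path is skipped when addr
 * is NULL, because the real code would otherwise hand NULL to __pa().
 */
static void xbc_free_mem_model(void *addr, size_t size, bool early)
{
    if (early)
        fake_memblock_free(addr, size);
    else if (addr)
        fake_memblock_free_late(addr, size);
}
```

With this shape, the late path with a NULL address does nothing, which is the case the reported crash went through.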


On Fri, 12 Apr 2024 22:18:20 +0900
Masami Hiramatsu (Google)  wrote:

> On Fri, 12 Apr 2024 18:49:41 +0800
> qiang4.zh...@linux.intel.com wrote:
> 
> > From: Qiang Zhang 
> > 
> > On the time to free xbc memory in xbc_exit(), memblock may has handed
> > over memory to buddy allocator. So it doesn't make sense to free memory
> > back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs
> > on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86.
> > Following KASAN logs shows this case.
> > 
> > This patch fixes the xbc memory free problem by calling memblock_free()
> > in early xbc init error rewind path and calling memblock_free_late() in
> > xbc exit path to free memory to buddy allocator.
> > 
> > [9.410890] 
> > ==
> > [9.418962] BUG: KASAN: use-after-free in 
> > memblock_isolate_range+0x12d/0x260
> > [9.426850] Read of size 8 at addr 88845dd3 by task swapper/0/1
> > 
> > [9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 
> > 

Re: [PATCH] dt-bindings: iio: imu: mpu6050: Improve i2c-gate disallow list

2024-04-13 Thread Jonathan Cameron
On Tue, 9 Apr 2024 08:36:08 +0200
Krzysztof Kozlowski  wrote:

> On 08/04/2024 18:34, Luca Weiss wrote:
> > Before all supported sensors except for MPU{9150,9250,9255} were not
> > allowed to use i2c-gate in the bindings which excluded quite a few
> > supported sensors where this functionality is supported.
> > 
> > Switch the list of sensors to ones where the Linux driver explicitly
> > disallows support for the auxiliary bus ("inv_mpu_i2c_aux_bus"). Since
> > the driver is also based on "default: return true" this should scale
> > better into the future.
> > 
> > Signed-off-by: Luca Weiss   
> 
> Acked-by: Krzysztof Kozlowski 
> 
> Best regards,
> Krzysztof
> 

Applied, thanks



Re: [RFC PATCH 0/4] perf: Correlating user process data to samples

2024-04-13 Thread Peter Zijlstra
On Fri, Apr 12, 2024 at 09:37:24AM -0700, Beau Belgrave wrote:

> > Anyway, since we typically run stuff from NMI context, accessing user
> > data is 'interesting'. As such I would really like to make this work
> > depend on the call-graph rework that pushes all the user access bits
> > into return-to-user.
> 
> Cool, I assume that's the SFRAME work? Are there pointers to work I
> could look at and think about what a rebase looks like? Or do you have
> someone in mind I should work with for this?

I've been offline for a little while and still need to catch up with
things myself.

Josh was working on that when I dropped off IIRC, I'm not entirely sure
where things are at currently (and there is no way I can ever hope to
process the backlog).

Anybody know where we are with that?



Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-13 Thread Marc Zyngier
On Fri, 12 Apr 2024 15:54:22 +0100,
Sean Christopherson  wrote:
> 
> On Fri, Apr 12, 2024, Marc Zyngier wrote:
> > On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon  wrote:
> > > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > > Also, if you're in the business of hacking the MMU notifier code, it
> > > would be really great to change the .clear_flush_young() callback so
> > > that the architecture could handle the TLB invalidation. At the moment,
> > > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > > being set by kvm_handle_hva_range(), whereas we could do a much
> > > > lighter-weight and targeted TLBI in the architecture page-table code
> > > when we actually update the ptes for small ranges.
> > 
> > Indeed, and I was looking at this earlier this week as it has a pretty
> > devastating effect with NV (it blows the shadow S2 for that VMID, with
> > costly consequences).
> > 
> > In general, it feels like the TLB invalidation should stay with the
> > code that deals with the page tables, as it has a pretty good idea of
> > what needs to be invalidated and how -- specially on architectures
> > that have a HW-broadcast facility like arm64.
> 
> Would this be roughly on par with an in-line flush on arm64?  The simpler, more
> straightforward solution would be to let architectures override flush_on_ret,
> but I would prefer something like the below as x86 can also utilize a range-based
> flush when running as a nested hypervisor.
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff0a20565f90..b65116294efe 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> struct kvm_gfn_range gfn_range;
> struct kvm_memory_slot *slot;
> struct kvm_memslots *slots;
> +   bool need_flush = false;
> int i, idx;
>  
> if (WARN_ON_ONCE(range->end <= range->start))
> @@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
> break;
> }
> r.ret |= range->handler(kvm, &gfn_range);
> +
> +   /*
> +* Use a precise gfn-based TLB flush when possible, as
> +* most mmu_notifier events affect a small-ish range.
> +* Fall back to a full TLB flush if the gfn-based flush
> +* fails, and don't bother trying the gfn-based flush
> +* if a full flush is already pending.
> +*/
> +   if (range->flush_on_ret && !need_flush && r.ret &&
> +   kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start,
> +gfn_range.end - gfn_range.start + 1))
> +   need_flush = true;
> }
> }
>  
> -   if (range->flush_on_ret && r.ret)
> +   if (need_flush)
> kvm_flush_remote_tlbs(kvm);
>  
> if (r.found_memslot)

I think this works for us on HW that has range invalidation, which
would already be a positive move.

For the lesser HW that isn't range capable, it also gives the
opportunity to perform the iteration ourselves or go for the nuclear
option if the range is larger than some arbitrary constant (though
this is additional work).
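
To make the three options concrete, a rough sketch of the selection logic could look like the following. The threshold `MAX_PER_PAGE_FLUSH`, the enum and `pick_flush()` are all made up for illustration; the actual cut-off would need to be measured rather than guessed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical cut-off: above this many pages, invalidating page by page
 * is assumed to cost more than invalidating everything.
 */
#define MAX_PER_PAGE_FLUSH 512

enum flush_kind { FLUSH_RANGE, FLUSH_PER_PAGE, FLUSH_ALL };

/* Pick an invalidation strategy for nr_pages guest pages. */
enum flush_kind pick_flush(bool hw_has_range_inval, uint64_t nr_pages)
{
	assert(nr_pages > 0);

	if (hw_has_range_inval)
		return FLUSH_RANGE;	/* HW can invalidate the range directly */
	if (nr_pages <= MAX_PER_PAGE_FLUSH)
		return FLUSH_PER_PAGE;	/* iterate over the range ourselves */
	return FLUSH_ALL;		/* the nuclear option */
}
```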

But this still considers the whole range as being affected by
range->handler(). It'd be interesting to try and see whether more
precise tracking is (or isn't) generally beneficial.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.



Re: [PATCH v11 00/14] Add Cgroup support for SGX EPC memory

2024-04-13 Thread Mikko Ylinen
On Wed, Apr 10, 2024 at 11:25:44AM -0700, Haitao Huang wrote:
> SGX Enclave Page Cache (EPC) memory allocations are separate from normal
> RAM allocations, and are managed solely by the SGX subsystem. The existing
> cgroup memory controller cannot be used to limit or account for SGX EPC
> memory, which is a desirable feature in some environments, e.g., support
> for pod level control in a Kubernetes cluster on a VM or bare-metal host
> [1,2].
>  
> This patchset implements the support for sgx_epc memory within the misc
> cgroup controller. A user can use the misc cgroup controller to set and
> enforce a max limit on total EPC usage per cgroup. The implementation
> reports current usage and events of reaching the limit per cgroup as well
> as the total system capacity.
>  
> Much like normal system memory, EPC memory can be overcommitted via virtual
> memory techniques and pages can be swapped out of the EPC to their backing
> store, which is normal system memory allocated via shmem and accounted by
> the memory controller. Similar to per-cgroup reclamation done by the memory
> controller, the EPC misc controller needs to implement a per-cgroup EPC
> reclaiming process: when the EPC usage of a cgroup reaches its hard limit
> ('sgx_epc' entry in the 'misc.max' file), the cgroup starts swapping out
> some EPC pages within the same cgroup to make room for new allocations.
>  
> For that, this implementation tracks reclaimable EPC pages in a separate
> LRU list in each cgroup, and below are more details and justification of
> this design. 
>  
> Track EPC pages in per-cgroup LRUs (from Dave)
> ----------------------------------------------
>  
> tl;dr: A cgroup hitting its limit should be as similar as possible to the
> system running out of EPC memory. The only two choices to implement that
> are nasty changes to the existing LRU scanning algorithm, or to add new LRUs.
> The result: Add a new LRU for each cgroup and scan those instead. Replace
> the existing global LRU with the root cgroup's LRU (only when this new
> support is compiled in, obviously).
>  
> The existing EPC memory management aims to be a miniature version of the
> core VM where EPC memory can be overcommitted and reclaimed. EPC
> allocations can wait for reclaim. The alternative to waiting would have
> been to send a signal and let the enclave die.
>  
> This series attempts to implement that same logic for cgroups, for the same
> reasons: it's preferable to wait for memory to become available and let
> reclaim happen than to do things that are fatal to enclaves.
>  
> There is currently a global reclaimable page SGX LRU list. That list (and
> the existing scanning algorithm) is essentially useless for doing reclaim
> when a cgroup hits its limit because the cgroup's pages are scattered
> around that LRU. It is unspeakably inefficient to scan a linked list with
> millions of entries for what could be dozens of pages from a cgroup that
> needs reclaim.
>  
> Even if unspeakably slow reclaim was accepted, the existing scanning
> algorithm only picks a few pages off the head of the global LRU. It would
> either need to hold the list locks for unreasonable amounts of time, or be
> taught to scan the list in pieces, which has its own challenges.
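
To put a number on the scanning cost described above, here is a small sketch that counts how many entries of a shared LRU have to be examined to find a few pages of one cgroup. The `struct epc_page` layout and `global_lru_scan_cost()` are hypothetical and only model the argument; they are not the SGX driver's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor: only the owning cgroup matters here. */
struct epc_page {
	int cgroup_id;
};

/*
 * Walk a shared LRU (oldest entry first) until `want` pages owned by
 * `cgroup_id` have been seen or the list ends. Returns the number of
 * entries examined, i.e. the cost of the scan.
 */
size_t global_lru_scan_cost(const struct epc_page *lru, size_t len,
			    int cgroup_id, size_t want)
{
	size_t found = 0, visited = 0;

	assert(lru != NULL || len == 0);
	while (visited < len && found < want) {
		if (lru[visited].cgroup_id == cgroup_id)
			found++;
		visited++;
	}
	return visited;
}
```

On a list whose owners are {1, 1, 2, 1, 1, 2}, finding two pages of cgroup 2 examines all six entries, whereas a list holding only cgroup 2's pages would examine exactly two. The gap grows with the number of unrelated pages on the shared list.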
>  
> Unreclaimable Enclave Pages
> ---------------------------
>  
> There are a variety of page types for enclaves, each serving different
> purposes [5]. Although the SGX architecture supports swapping for all
> types, some special pages, e.g., Version Array (VA) and Secure Enclave
> Control Structure (SECS) [5], hold metadata of reclaimed pages and
> enclaves. That makes reclamation of such pages more intricate to manage.
> The SGX driver global reclaimer currently does not swap out VA pages. It
> only swaps the SECS page of an enclave when all other associated pages have
> been swapped out. The cgroup reclaimer follows the same approach and does
> not track those in per-cgroup LRUs and considers them as unreclaimable
> pages. The allocation of these pages is counted towards the usage of a
> specific cgroup and is subject to the cgroup's set EPC limits.
>  
> Earlier versions of this series implemented forced enclave-killing to
> reclaim VA and SECS pages. That was designed to enforce the 'max' limit,
> particularly in scenarios where a user or administrator reduces this limit
> post-launch of enclaves. However, subsequent discussions [3, 4] indicated
> that such preemptive enforcement is not necessary for the misc-controllers.
> Therefore, reclaiming SECS/VA pages by force-killing enclaves was removed,
> and the limit is only enforced at the time of a new EPC allocation request.
> When a cgroup hits its limit but nothing is left in the LRUs of the subtree,
> i.e., nothing to reclaim in the cgroup, any new attempt to allocate EPC
> within that cgroup will result in an 'ENOMEM'.
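
The accounting rules in this section can be modelled with a few counters. The sketch below is a simplification under stated assumptions: `struct epc_cgroup`, `epc_cg_alloc()` and the page-type enum are hypothetical names, a single flat cgroup stands in for the subtree, and reclaim is reduced to dropping one page from the LRU count.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum epc_page_type { EPC_REG, EPC_VA, EPC_SECS };

/* Hypothetical per-cgroup state, all values in pages. */
struct epc_cgroup {
	size_t max;	/* limit, as set through misc.max */
	size_t usage;	/* every charged page, reclaimable or not */
	size_t lru_len;	/* pages tracked on the reclaimable LRU */
};

/* VA and SECS pages are charged but never put on the LRU. */
bool epc_page_reclaimable(enum epc_page_type type)
{
	return type == EPC_REG;
}

/* Charge one page to the cgroup. Returns 0 on success or -ENOMEM. */
int epc_cg_alloc(struct epc_cgroup *cg, enum epc_page_type type)
{
	assert(cg != NULL);

	if (cg->usage >= cg->max) {
		if (cg->lru_len == 0)
			return -ENOMEM;	/* limit hit, nothing to reclaim */
		cg->lru_len--;		/* swap out one reclaimable page */
		cg->usage--;
	}
	cg->usage++;
	if (epc_page_reclaimable(type))
		cg->lru_len++;
	return 0;
}
```

With a limit of two pages, charging a VA page, a regular page and then a SECS page succeeds, because the regular page is swapped out to make room. A fourth allocation then fails with -ENOMEM since only unreclaimable pages remain charged.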
>  
> Unreclaimable Guest VM EPC Pages
> --------------------------------
>  
> The EPC pages allocated for guest VMs by the