date:20250409

[PATCH v8 3/4] KVM: selftests: Add ucall test support for LoongArch

2025-04-09 Thread Bibo Mao

Add ucall test support for LoongArch, ucall method on LoongArch uses
undefined mmio area. It will cause causes vcpu exits to hypervisor so
that hypervisor can communicate with vcpu.

Signed-off-by: Bibo Mao 
---
 .../selftests/kvm/include/loongarch/ucall.h   | 20 ++
 .../selftests/kvm/lib/loongarch/ucall.c   | 38 +++
 2 files changed, 58 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/ucall.h
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/ucall.c

diff --git a/tools/testing/selftests/kvm/include/loongarch/ucall.h 
b/tools/testing/selftests/kvm/include/loongarch/ucall.h
new file mode 100644
index ..4ec801f37f00
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/loongarch/ucall.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_KVM_UCALL_H
+#define SELFTEST_KVM_UCALL_H
+
+#include "kvm_util.h"
+
+#define UCALL_EXIT_REASON   KVM_EXIT_MMIO
+
+/*
+ * ucall_exit_mmio_addr holds per-VM values (global data is duplicated by each
+ * VM), it must not be accessed from host code.
+ */
+extern vm_vaddr_t *ucall_exit_mmio_addr;
+
+static inline void ucall_arch_do_ucall(vm_vaddr_t uc)
+{
+   WRITE_ONCE(*ucall_exit_mmio_addr, uc);
+}
+
+#endif
diff --git a/tools/testing/selftests/kvm/lib/loongarch/ucall.c 
b/tools/testing/selftests/kvm/lib/loongarch/ucall.c
new file mode 100644
index ..fc6cbb50573f
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/loongarch/ucall.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ucall support. A ucall is a "hypercall to userspace".
+ *
+ */
+#include "kvm_util.h"
+
+/*
+ * ucall_exit_mmio_addr holds per-VM values (global data is duplicated by each
+ * VM), it must not be accessed from host code.
+ */
+vm_vaddr_t *ucall_exit_mmio_addr;
+
+void ucall_arch_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa)
+{
+   vm_vaddr_t mmio_gva = vm_vaddr_unused_gap(vm, vm->page_size, 
KVM_UTIL_MIN_VADDR);
+
+   virt_map(vm, mmio_gva, mmio_gpa, 1);
+
+   vm->ucall_mmio_addr = mmio_gpa;
+
+   write_guest_global(vm, ucall_exit_mmio_addr, (vm_vaddr_t *)mmio_gva);
+}
+
+void *ucall_arch_get_ucall(struct kvm_vcpu *vcpu)
+{
+   struct kvm_run *run = vcpu->run;
+
+   if (run->exit_reason == KVM_EXIT_MMIO &&
+   run->mmio.phys_addr == vcpu->vm->ucall_mmio_addr) {
+   TEST_ASSERT(run->mmio.is_write && run->mmio.len == 
sizeof(uint64_t),
+   "Unexpected ucall exit mmio address access");
+
+   return (void *)(*((uint64_t *)run->mmio.data));
+   }
+
+   return NULL;
+}
-- 
2.39.3

[PATCH v8 0/4] KVM: selftests: Add LoongArch support

2025-04-09 Thread Bibo Mao

This patchset adds KVM selftests for LoongArch system, currently only
some common test cases are supported and pass to run. These testcase
are listed as following:
coalesced_io_test
demand_paging_test
dirty_log_perf_test
dirty_log_test
guest_print_test
hardware_disable_test
kvm_binary_stats_test
kvm_create_max_vcpus
kvm_page_table_test
memslot_modification_stress_test
memslot_perf_test
set_memory_region_test

This patchset originally is posted from zhaotianrui, I continue to work
on his efforts.

---
Changes in v8:
1. Porting patch based on the latest version.
2. For macro PC_OFFSET_EXREGS, offsetof() method is used for C header file,
   still hardcoded definition for assemble language.

Changes in v7:
1. Refine code to add LoongArch support in test case
set_memory_region_test.

Changes in v6:
1. Refresh the patch based on latest kernel 6.8-rc1, add LoongArch
support about testcase set_memory_region_test.
2. Add hardware_disable_test test case.
3. Drop modification about macro DEFAULT_GUEST_TEST_MEM, it is problem
of LoongArch binutils, this issue is raised to LoongArch binutils owners.

Changes in v5:
1. In LoongArch kvm self tests, the DEFAULT_GUEST_TEST_MEM could be
0x13000, it is different from the default value in memstress.h.
So we Move the definition of DEFAULT_GUEST_TEST_MEM into LoongArch
ucall.h, and add 'ifndef' condition for DEFAULT_GUEST_TEST_MEM
in memstress.h.

Changes in v4:
1. Remove the based-on flag, as the LoongArch KVM patch series
have been accepted by Linux kernel, so this can be applied directly
in kernel.

Changes in v3:
1. Improve implementation of LoongArch VM page walk.
2. Add exception handler for LoongArch.
3. Add dirty_log_test, dirty_log_perf_test, guest_print_test
test cases for LoongArch.
4. Add __ASSEMBLER__ macro to distinguish asm file and c file.
5. Move ucall_arch_do_ucall to the header file and make it as
static inline to avoid function calls.
6. Change the DEFAULT_GUEST_TEST_MEM base addr for LoongArch.

Changes in v2:
1. We should use ".balign 4096" to align the assemble code with 4K in
exception.S instead of "align 12".
2. LoongArch only supports 3 or 4 levels page tables, so we remove the
hanlders for 2-levels page table.
3. Remove the DEFAULT_LOONGARCH_GUEST_STACK_VADDR_MIN and use the common
DEFAULT_GUEST_STACK_VADDR_MIN to allocate stack memory in guest.
4. Reorganize the test cases supported by LoongArch.
5. Fix some code comments.
6. Add kvm_binary_stats_test test case into LoongArch KVM selftests.

---
Bibo Mao (4):
  KVM: selftests: Add KVM selftests header files for LoongArch
  KVM: selftests: Add core KVM selftests support for LoongArch
  KVM: selftests: Add ucall test support for LoongArch
  KVM: selftests: Add test cases for LoongArch

 tools/testing/selftests/kvm/Makefile  |   2 +-
 tools/testing/selftests/kvm/Makefile.kvm  |  18 +
 .../testing/selftests/kvm/include/kvm_util.h  |   5 +
 .../kvm/include/loongarch/kvm_util_arch.h |   7 +
 .../kvm/include/loongarch/processor.h | 138 +++
 .../selftests/kvm/include/loongarch/ucall.h   |  20 +
 .../selftests/kvm/lib/loongarch/exception.S   |  59 +++
 .../selftests/kvm/lib/loongarch/processor.c   | 349 ++
 .../selftests/kvm/lib/loongarch/ucall.c   |  38 ++
 .../selftests/kvm/set_memory_region_test.c|   2 +-
 10 files changed, 636 insertions(+), 2 deletions(-)
 create mode 100644 
tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/processor.h
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/ucall.h
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/exception.S
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/ucall.c


base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
-- 
2.39.3

[PATCH v8 4/4] KVM: selftests: Add test cases for LoongArch

2025-04-09 Thread Bibo Mao

Some common KVM testcases are supported on LoongArch now as following:
  coalesced_io_test
  demand_paging_test
  dirty_log_perf_test
  dirty_log_test
  guest_print_test
  hardware_disable_test
  kvm_binary_stats_test
  kvm_create_max_vcpus
  kvm_page_table_test
  memslot_modification_stress_test
  memslot_perf_test
  set_memory_region_test
And other test cases are not supported by LoongArch such as rseq_test,
since it is not supported on LoongArch physical machine neither.

Signed-off-by: Bibo Mao 
---
 tools/testing/selftests/kvm/Makefile   |  2 +-
 tools/testing/selftests/kvm/Makefile.kvm   | 18 ++
 .../selftests/kvm/set_memory_region_test.c |  2 +-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index 20af35a91d6f..d9fffe06d3ea 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -3,7 +3,7 @@ top_srcdir = ../../../..
 include $(top_srcdir)/scripts/subarch.include
 ARCH?= $(SUBARCH)
 
-ifeq ($(ARCH),$(filter $(ARCH),arm64 s390 riscv x86 x86_64))
+ifeq ($(ARCH),$(filter $(ARCH),arm64 s390 riscv x86 x86_64 loongarch))
 # Top-level selftests allows ARCH=x86_64 :-(
 ifeq ($(ARCH),x86_64)
ARCH := x86
diff --git a/tools/testing/selftests/kvm/Makefile.kvm 
b/tools/testing/selftests/kvm/Makefile.kvm
index f773f8f99249..4704ab16f19f 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -47,6 +47,10 @@ LIBKVM_riscv += lib/riscv/handlers.S
 LIBKVM_riscv += lib/riscv/processor.c
 LIBKVM_riscv += lib/riscv/ucall.c
 
+LIBKVM_loongarch += lib/loongarch/processor.c
+LIBKVM_loongarch += lib/loongarch/ucall.c
+LIBKVM_loongarch += lib/loongarch/exception.S
+
 # Non-compiled test targets
 TEST_PROGS_x86 += x86/nx_huge_pages_test.sh
 
@@ -205,6 +209,20 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
 TEST_GEN_PROGS_riscv += set_memory_region_test
 TEST_GEN_PROGS_riscv += steal_time
 
+TEST_GEN_PROGS_loongarch += coalesced_io_test
+TEST_GEN_PROGS_loongarch += demand_paging_test
+TEST_GEN_PROGS_loongarch += dirty_log_perf_test
+TEST_GEN_PROGS_loongarch += dirty_log_test
+TEST_GEN_PROGS_loongarch += demand_paging_test
+TEST_GEN_PROGS_loongarch += guest_print_test
+TEST_GEN_PROGS_loongarch += hardware_disable_test
+TEST_GEN_PROGS_loongarch += kvm_binary_stats_test
+TEST_GEN_PROGS_loongarch += kvm_create_max_vcpus
+TEST_GEN_PROGS_loongarch += kvm_page_table_test
+TEST_GEN_PROGS_loongarch += memslot_modification_stress_test
+TEST_GEN_PROGS_loongarch += memslot_perf_test
+TEST_GEN_PROGS_loongarch += set_memory_region_test
+
 SPLIT_TESTS += arch_timer
 SPLIT_TESTS += get-reg-list
 
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c 
b/tools/testing/selftests/kvm/set_memory_region_test.c
index bc440d5aba57..ce3ac0fd6dfb 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -350,7 +350,7 @@ static void test_invalid_memory_region_flags(void)
struct kvm_vm *vm;
int r, i;
 
-#if defined __aarch64__ || defined __riscv || defined __x86_64__
+#if defined __aarch64__ || defined __riscv || defined __x86_64__ || defined 
__loongarch__
supported_flags |= KVM_MEM_READONLY;
 #endif
 
-- 
2.39.3

Re: [PATCH v2] scripts/spdxcheck: Limit the scope of git.Repo

2025-04-09 Thread Ricardo Ribalda

On Wed, 9 Apr 2025 at 22:40, Gon Solo  wrote:
>
> Am Wed, Apr 09, 2025 at 08:04:19PM + schrieb Ricardo Ribalda:
> > Changes in v2:
> > - Make repo a local variable
> > - Link to v1: 
> > https://lore.kernel.org/r/20250225-spx-v1-1-e935b27eb...@chromium.org
>
> This is not necessary as it was Python's fault and is fixed by 3.13.3
> which came out yesterday. I just checked.

It will take some time before this reaches all distributions. This
patch is relatively simple.

I might be biased, but I think the benefits outweigh the cons.

>


-- 
Ricardo Ribalda

Re: [PATCH bpf-next v1 0/4] bpf, sockmap: Fix data loss and panic issues

2025-04-09 Thread John Fastabend

On 2025-04-10 03:10:37, patchwork-bot+netdev...@kernel.org wrote:
> Hello:
> 
> This series was applied to bpf/bpf-next.git (master)
> by Alexei Starovoitov :
> 
> On Mon,  7 Apr 2025 22:21:19 +0800 you wrote:
> > I was writing a benchmark based on sockmap + TCP and discovered several
> > issues:
> > 
> > 1. When EAGAIN occurs, the direction of skb is incorrect, causing data
> >loss when retry.
> > 2. When sending partial data, the offset is not recorded, leading to
> >duplicate data being sent when retry.
> > 3. An unexpected BUG_ON() judgment in skb_linearize is triggered.
> > 4. The memory of psock->ingress_skb is not limited by the socket buffer
> >and memcg.
> > 
> > [...]

LGTM thanks for the fixes Jiayuan. Good to see someone working through
all the cases.

already merged but ACK for me.


> 
> Here is the summary with links:
>   - [bpf-next,v1,1/4] bpf, sockmap: Fix data lost during EAGAIN retries
> https://git.kernel.org/bpf/bpf-next/c/7683167196bd
>   - [bpf-next,v1,2/4] bpf, sockmap: fix duplicated data transmission
> https://git.kernel.org/bpf/bpf-next/c/3b4f14b79428
>   - [bpf-next,v1,3/4] bpf, sockmap: Fix panic when calling skb_linearize
> https://git.kernel.org/bpf/bpf-next/c/5ca2e29f6834
>   - [bpf-next,v1,4/4] selftest/bpf/benchs: Add benchmark for sockmap usage
> https://git.kernel.org/bpf/bpf-next/c/7b2fa44de5e7
> 
> You are awesome, thank you!
> -- 
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
> 
>

Re: [PATCH bpf-next v1 0/4] bpf, sockmap: Fix data loss and panic issues

2025-04-09 Thread patchwork-bot+netdevbpf

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov :

On Mon,  7 Apr 2025 22:21:19 +0800 you wrote:
> I was writing a benchmark based on sockmap + TCP and discovered several
> issues:
> 
> 1. When EAGAIN occurs, the direction of skb is incorrect, causing data
>loss when retry.
> 2. When sending partial data, the offset is not recorded, leading to
>duplicate data being sent when retry.
> 3. An unexpected BUG_ON() judgment in skb_linearize is triggered.
> 4. The memory of psock->ingress_skb is not limited by the socket buffer
>and memcg.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v1,1/4] bpf, sockmap: Fix data lost during EAGAIN retries
https://git.kernel.org/bpf/bpf-next/c/7683167196bd
  - [bpf-next,v1,2/4] bpf, sockmap: fix duplicated data transmission
https://git.kernel.org/bpf/bpf-next/c/3b4f14b79428
  - [bpf-next,v1,3/4] bpf, sockmap: Fix panic when calling skb_linearize
https://git.kernel.org/bpf/bpf-next/c/5ca2e29f6834
  - [bpf-next,v1,4/4] selftest/bpf/benchs: Add benchmark for sockmap usage
https://git.kernel.org/bpf/bpf-next/c/7b2fa44de5e7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Re: [PATCH] selftests/bpf: Fix bpf_nf selftest failure

2025-04-09 Thread patchwork-bot+netdevbpf

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko :

On Wed,  9 Apr 2025 15:26:33 +0530 you wrote:
> For systems with missing iptables-legacy tool, this selftest fails.
> 
> Add check to find if iptables-legacy tool is available and skip the
> test if the tool is missing.
> 
> Fixes: de9c8d848d90 ("selftests/bpf: S/iptables/iptables-legacy/ in the 
> bpf_nf and xdp_synproxy test")
> Signed-off-by: Saket Kumar Bhaskar 
> 
> [...]

Here is the summary with links:
  - selftests/bpf: Fix bpf_nf selftest failure
https://git.kernel.org/bpf/bpf-next/c/967e8def1100

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2025-04-09 Thread Yan Zhao

Hi,

AFAIK, the commit c9c1e20b4c7d ("KVM: x86: Introduce Intel specific quirk
KVM_X86_QUIRK_IGNORE_GUEST_PAT") which re-allows honoring guest PAT on Intel's
platforms has been in kvm/queue now.

However, as the quirk is enabled by default, userspace(like QEMU) needs to turn
it off by code like "kvm_vm_enable_cap(kvm_state, KVM_CAP_DISABLE_QUIRKS2, 0,
KVM_X86_QUIRK_IGNORE_GUEST_PAT)" to honor guest PAT, according to the doc:

KVM_X86_QUIRK_IGNORE_GUEST_PAT   ...
 Userspace can disable the quirk to honor
 guest PAT if it knows that there is no such
 guest software, for example if it does not
 expose a bochs graphics device (which is
 known to have had a buggy driver).

Thanks
Yan

On Thu, Apr 10, 2025 at 01:13:18AM +, Myrsky Lintu wrote:
> Hello,
> 
> I am completely new to and uninformed about kernel development. I was 
> pointed here from Mesa documentation for Venus (Vulkan encapsulation for 
> KVM/QEMU): https://docs.mesa3d.org/drivers/venus.html
> 
> Based on my limited understanding of what has happened here, this patch 
> series was partially reverted due to an issue with the Bochs DRM driver. 
> A fix for that issue has been merged months ago according to the link 
> provided in an earlier message. Since then work on this detail of KVM 
> seems to have stalled.
> 
> Is it reasonable to ask here for this patch series to be evaluated and 
> incorporated again?
> 
> My layperson's attempt at applying the series against 6.14.1 source code 
> failed. In addition to the parts that appear to have already been 
> incorporated there are some parts of the patch series that are rejected. 
> I lack the knowledge to correct that.
> 
> Distro kernels currently ship without it which limits the usability of 
> Venus on AMD and NVIDIA GPUs paired with Intel CPUs. Convincing 
> individual distro maintainers of the necessity of this patch series 
> without the specialized knowledge required for understanding what it 
> does and performing that evaluation is quite hard. If upstream (kernel) 
> would apply it now the distros would ship a kernel including the 
> required changes to users, including me, without that multiplicated effort.
> 
> Thank you for your time. If this request is out of place here please 
> forgive me for engaging this mailing list without a proper understanding 
> of the list's scope.
> 
> On 2024-10-07 14:04:24, Linux regression tracking (Thorsten Leemhuis) wrote:
> > On 07.10.24 15:38, Vitaly Kuznetsov wrote:
> >> "Linux regression tracking (Thorsten Leemhuis)"
> >>  writes:
> >>
> >>> On 30.08.24 11:35, Vitaly Kuznetsov wrote:
>  Sean Christopherson  writes:
> 
> > Unconditionally honor guest PAT on CPUs that support self-snoop, as
> > Intel has confirmed that CPUs that support self-snoop always snoop 
> > caches
> > and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
> > even in the presence of aliased memtypes, thus there is no need to trust
> > the guest behaves and only honor PAT as a last resort, as KVM does 
> > today.
> >
> > Honoring guest PAT is desirable for use cases where the guest has access
> > to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
> > (mediated, for all intents and purposes) GPU is exposed to the guest, 
> > along
> > with buffers that are consumed directly by the physical GPU, i.e. which
> > can't be proxied by the host to ensure writes from the guest are 
> > performed
> > with the correct memory type for the GPU.
> 
>  Necroposting!
> 
>  Turns out that this change broke "bochs-display" driver in QEMU even
>  when the guest is modern (don't ask me 'who the hell uses bochs for
>  modern guests', it was basically a configuration error :-). E.g:
>  [...]
> >>>
> >>> This regression made it to the list of tracked regressions. It seems
> >>> this thread stalled a while ago. Was this ever fixed? Does not look like
> >>> it, but I might have missed something. Or is this a regression I should
> >>> just ignore for one reason or another?
> >>>
> >>
> >> The regression was addressed in by reverting 377b2f359d1f in 6.11
> >>
> >> commit 9d70f3fec14421e793ffbc0ec2f739b24e534900
> >> Author: Paolo Bonzini 
> >> Date:   Sun Sep 15 02:49:33 2024 -0400
> >>
> >>  Revert "KVM: VMX: Always honor guest PAT on CPUs that support 
> >> self-snoop"
> > 
> > Thx. Sorry, missed that, thx for pointing me towards it. I had looked
> > for things like that, but seems I messed up my lore query. Apologies for
> > the noise!
> > 
> >> Also, there's a (pending) DRM patch fixing it from the guest's side:
> >> https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/9388ccf69925223223c87355a417ba39b13a5e8e
> > 
> > Great!
> > 
> > Ciao, Thorsten
> > 
> > P.S.:
> > 
> > #regzbot fix: 9d70f3fec14421

[linus:master] [fs/dax] bde708f1a6: WARNING:at_mm/truncate.c:#truncate_folio_batch_exceptionals

2025-04-09 Thread kernel test robot




Hello,

kernel test robot noticed 
"WARNING:at_mm/truncate.c:#truncate_folio_batch_exceptionals" on:

commit: bde708f1a65d025c45575bfe1e7bf7bdf7e71e87 ("fs/dax: always remove DAX 
page-cache entries when breaking layouts")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: xfstests
version: xfstests-x86_64-8467552f-1_20241215
with following parameters:

bp1_memmap: 4G!8G
bp2_memmap: 4G!10G
bp3_memmap: 4G!16G
bp4_memmap: 4G!22G
nr_pmem: 4
fs: ext2
test: generic-dax



config: x86_64-rhel-9.4-func
compiler: gcc-12
test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Skylake) with 
28G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-lkp/202504101036.390f29a5-...@intel.com


[   46.394237][ T4025] [ cut here ]
[ 46.399593][ T4025] WARNING: CPU: 7 PID: 4025 at mm/truncate.c:89 
truncate_folio_batch_exceptionals (mm/truncate.c:89 (discriminator 1)) 
[   46.409748][ T4025] Modules linked in: ext2 snd_hda_codec_hdmi snd_ctl_led 
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component 
intel_rapl_msr btrfs intel_rapl_common blake2b_generic xor ipmi_devintf 
zstd_compress ipmi_msghandler x86_pkg_temp_thermal snd_soc_avs intel_powerclamp 
raid6_pq snd_soc_hda_codec snd_hda_ext_core coretemp snd_soc_core snd_compress 
kvm_intel i915 sd_mod snd_hda_intel kvm snd_intel_dspcfg sg snd_intel_sdw_acpi 
intel_gtt snd_hda_codec cec dell_pc platform_profile drm_buddy 
ghash_clmulni_intel snd_hda_core sha512_ssse3 dell_wmi ttm sha256_ssse3 
snd_hwdep nd_pmem sha1_ssse3 dell_smbios drm_display_helper nd_btt snd_pcm 
dax_pmem mei_wdt rapl drm_kms_helper ahci mei_me snd_timer libahci rfkill 
nd_e820 intel_cstate sparse_keymap video wmi_bmof dcdbas dell_wmi_descriptor 
libnvdimm libata pcspkr intel_uncore i2c_i801 snd mei i2c_smbus soundcore 
intel_pch_thermal intel_pmc_core intel_vsec wmi pmt_telemetry acpi_pad 
pmt_class binfmt_misc fuse loop drm dm_mod ip_tables
[   46.498759][ T4025] CPU: 7 UID: 0 PID: 4025 Comm: umount Not tainted 
6.14.0-rc6-00297-gbde708f1a65d #1
[   46.508156][ T4025] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 
1.2.8 01/26/2016
[ 46.516324][ T4025] RIP: 0010:truncate_folio_batch_exceptionals 
(mm/truncate.c:89 (discriminator 1)) 
[ 46.523347][ T4025] Code: 84 70 ff ff ff 4d 63 fd 49 83 ff 1e 0f 87 d4 01 00 
00 4c 89 f0 48 c1 e8 03 42 80 3c 00 00 0f 85 9b 01 00 00 41 f6 06 01 74 c6 <0f> 
0b 48 89 c8 48 c1 e8 03 42 80 3c 00 00 0f 85 d6 01 00 00 48 8b
All code

   0:   84 70 fftest   %dh,-0x1(%rax)
   3:   ff  (bad)
   4:   ff 4d 63decl   0x63(%rbp)
   7:   fd  std
   8:   49 83 ff 1e cmp$0x1e,%r15
   c:   0f 87 d4 01 00 00   ja 0x1e6
  12:   4c 89 f0mov%r14,%rax
  15:   48 c1 e8 03 shr$0x3,%rax
  19:   42 80 3c 00 00  cmpb   $0x0,(%rax,%r8,1)
  1e:   0f 85 9b 01 00 00   jne0x1bf
  24:   41 f6 06 01 testb  $0x1,(%r14)
  28:   74 c6   je 0xfff0
  2a:*  0f 0b   ud2 <-- trapping instruction
  2c:   48 89 c8mov%rcx,%rax
  2f:   48 c1 e8 03 shr$0x3,%rax
  33:   42 80 3c 00 00  cmpb   $0x0,(%rax,%r8,1)
  38:   0f 85 d6 01 00 00   jne0x214
  3e:   48  rex.W
  3f:   8b  .byte 0x8b

Code starting with the faulting instruction
===
   0:   0f 0b   ud2
   2:   48 89 c8mov%rcx,%rax
   5:   48 c1 e8 03 shr$0x3,%rax
   9:   42 80 3c 00 00  cmpb   $0x0,(%rax,%r8,1)
   e:   0f 85 d6 01 00 00   jne0x1ea
  14:   48  rex.W
  15:   8b  .byte 0x8b
[   46.542938][ T4025] RSP: 0018:c9000d74f370 EFLAGS: 00010202
[   46.548900][ T4025] RAX: 192001ae9ec8 RBX: 8881e2d959d8 RCX: 
c9000d74f4f8
[   46.556787][ T4025] RDX: 0001 RSI: c9000d74f638 RDI: 
8881e2d95874
[   46.564676][ T4025] RBP: 192001ae9e74 R08: dc00 R09: 
f52001ae9eec
[   46.572557][ T4025] R10: 0003 R11: 1110d4bf8d9c R12: 
c9000d74f638
[   46.580439][ T4025] R13:  R14: c9000d74f640 R15: 

[   46.588319][ T4025] FS:  7fe696f3a840() GS:8886a5f8() 
knlGS:
[   46.597163][ T4025] CS:  0010 DS:  ES:  CR0: 80050033
[   46.603691][ T4025] CR2: 7fff266c5ec0 CR3: 0001ead7e002 CR4: 
003726f0
[   46.611586][ T4025] DR0:  DR1:  DR2:

[PATCH v2] scripts/spdxcheck: Limit the scope of git.Repo

2025-04-09 Thread Ricardo Ribalda

If the git.Repo object's scope extends to the Python interpreter's
shutdown phase, its destructor may fail due to the interpreter's state.

Exception ignored in: 
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/git/cmd.py", line 565, in __del__
  File "/usr/lib/python3/dist-packages/git/cmd.py", line 546, in _terminate
  File "/usr/lib/python3.13/subprocess.py", line 2227, in terminate
ImportError: sys.meta_path is None, Python is likely shutting down

Make repo a variable of the function read_spdxdata() and scan_git_tree()
to limit the scope of git.Repo and ensure proper resource management.

Signed-off-by: Ricardo Ribalda 
---
Changes in v2:
- Make repo a local variable
- Link to v1: 
https://lore.kernel.org/r/20250225-spx-v1-1-e935b27eb...@chromium.org
---
 scripts/spdxcheck.py | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
index 
8d608f61bf371647e7ca0129f583e94e535b6193..d5c4c37f1b068486af28110261f74c67301618a9
 100755
--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -45,7 +45,9 @@ class dirinfo(object):
 self.files.append(fname)
 
 # Read the spdx data from the LICENSES directory
-def read_spdxdata(repo):
+def read_spdxdata():
+repo = git.Repo(os.getcwd())
+assert not repo.bare
 
 # The subdirectories of LICENSES in the kernel source
 # Note: exceptions needs to be parsed as last directory.
@@ -295,7 +297,15 @@ def exclude_file(fpath):
 return True
 return False
 
-def scan_git_tree(tree, basedir, dirdepth):
+def scan_git_tree(basedir, dirdepth):
+repo = git.Repo(os.getcwd())
+tree = repo.head.commit.tree
+
+basedir = basedir.strip('/')
+if basedir != '.':
+for p in basedir.split('/'):
+tree = tree[p]
+
 parser.set_dirinfo(basedir, dirdepth)
 for el in tree.traverse():
 if not os.path.isfile(el.path):
@@ -306,11 +316,6 @@ def scan_git_tree(tree, basedir, dirdepth):
 with open(el.path, 'rb') as fd:
 parser.parse_lines(fd, args.maxlines, el.path)
 
-def scan_git_subtree(tree, path, dirdepth):
-for p in path.strip('/').split('/'):
-tree = tree[p]
-scan_git_tree(tree, path.strip('/'), dirdepth)
-
 def read_exclude_file(fname):
 rules = []
 if not fname:
@@ -348,12 +353,8 @@ if __name__ == '__main__':
 sys.exit(1)
 
 try:
-# Use git to get the valid license expressions
-repo = git.Repo(os.getcwd())
-assert not repo.bare
-
 # Initialize SPDX data
-spdx = read_spdxdata(repo)
+spdx = read_spdxdata()
 
 # Initialize the parser
 parser = id_parser(spdx)
@@ -389,14 +390,13 @@ if __name__ == '__main__':
 if os.path.isfile(p):
 parser.parse_lines(open(p, 'rb'), args.maxlines, p)
 elif os.path.isdir(p):
-scan_git_subtree(repo.head.reference.commit.tree, p,
- args.depth)
+scan_git_tree(p, args.depth)
 else:
 sys.stderr.write('path %s does not exist\n' %p)
 sys.exit(1)
 else:
 # Full git tree scan
-scan_git_tree(repo.head.commit.tree, '.', args.depth)
+scan_git_tree('.', args.depth)
 
 ndirs = len(parser.spdx_dirs)
 dirsok = 0

---
base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
change-id: 20250225-spx-382cf543370e

Best regards,
-- 
Ricardo Ribalda

Re: [PATCH 3/4] sysctl: call sysctl tests with a for loop

2025-04-09 Thread Kees Cook

On Fri, Mar 21, 2025 at 01:47:26PM +0100, Joel Granados wrote:
> As we add more test functions in lib/tests_sysctl the main test function
> (test_sysctl_init) grows. Condense the logic to make it easier to
> add/remove tests.
> 
> Signed-off-by: Joel Granados 

Nice cleanup!

Reviewed-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH RFC v7 3/8] security: Export security_inode_init_security_anon for KVM guest_memfd

2025-04-09 Thread Paul Moore

On Tue, Apr 8, 2025 at 7:25 AM Shivank Garg  wrote:
>
> KVM guest_memfd is implementing its own inodes to store metadata for
> backing memory using a custom filesystem. This requires the ability to
> initialize anonymous inode using security_inode_init_security_anon().
>
> As guest_memfd currently resides in the KVM module, we need to export this
> symbol for use outside the core kernel. In the future, guest_memfd might be
> moved to core-mm, at which point the symbols no longer would have to be
> exported. When/if that happens is still unclear.

Can you help me understand the timing just a bit more ... do you
expect the move to the core MM code to happen during the lifetime of
this patchset, or is it just some hand-wavy "future date"?  No worries
either way, just trying to understand things a bit better.

> Signed-off-by: Shivank Garg 
> ---
>  security/security.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/security/security.c b/security/security.c
> index fb57e8fddd91..097283bb06a5 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1877,6 +1877,7 @@ int security_inode_init_security_anon(struct inode 
> *inode,
> return call_int_hook(inode_init_security_anon, inode, name,
>  context_inode);
>  }
> +EXPORT_SYMBOL(security_inode_init_security_anon);
>
>  #ifdef CONFIG_SECURITY_PATH
>  /**
> --
> 2.34.1

-- 
paul-moore.com

RE: spdxcheck: python git module considered harmful (was RE: [PATCH] scripts/spdxcheck: Limit the scope of git.Repo)

2025-04-09 Thread Thomas Gleixner

Tim!

On Wed, Apr 09 2025 at 17:44, Tim Bird wrote:
>> From: Thomas Gleixner 
>> On Tue, Apr 08 2025 at 17:34, Tim Bird wrote:
>> And yes, it ignores not yet tracked files, but if you want to check
>> them, then it's easy enough to commit them temporarily or provide a
>> dedicated file target to the tools, which ignores git.
>
> OK.  Yes. That's an easy workaround.

Actually spdxcheck supports that already:

   scripts/spdxcheck.py path/to/file

>> Good luck for coming up with a clever and clean solution for that!
>
> I thought about various solutions for this, but each one I came up
> with had other drawbacks.  If it was just a matter of separating 
> *.[chS] files from ELF object files, that would be easy to deal with.
> But we put SPDX headers on all kinds of files, and there are lots
> of other types of files generated during a build that are not just
> ELF objects.  And build rules change over time.  So even if I made
> a comprehensive system today to catch build-generated outliers,
> the solution would probably need constant updating and tweaking, which
> IMHO makes it a no-go.

I'm glad that I'm not the only one who came to this conclusion :)

>> Just for the record: I rather wish that people would contribute to
>> eliminate the remaining 17% (15397 files) which do not have SPDX
>> identifiers than complaining about the trivial to solve short-comings of
>> the tool, which was written to help this effort and to make sure that it
>> does not degrade.
>
> I agree with this.  Analyzing where the headers are missing is interesting.
> But it's more important to just fix the missing ones.
> I'll spend more of my time working on missing headers,
> rather than on tools to analyze and report them.

Very appreciated.

Thanks,

tglx

[PATCH v4] remoteproc: imx_dsp_rproc: Add support for DSP-specific features

2025-04-09 Thread Iuliana Prodan (OSS)

From: Iuliana Prodan 

Some DSP firmware requires a FW_READY signal before proceeding, while
others do not.
Therefore, add support to handle i.MX DSP-specific features.

Implement handle_rsc callback to handle resource table parsing and to
process DSP-specific resource, to determine if waiting is needed.

Update imx_dsp_rproc_start() to handle this condition accordingly.

Signed-off-by: Iuliana Prodan 
---
Changes in v4:
- Reviews from Mathieu Poirier:
  - Adjusted len to include the size of struct fw_rsc_imx_dsp.
  - Updated len validation checks.
- Review from Frank Li:
  - In imx_dsp_rproc_handle_rsc(), removed the goto ignored statement.
- In probe(), set flags to WAIT_FW_READY to ensure the host waits
for fw_ready when no vendor-specific resource is defined.
- Link to v3: 
https://lore.kernel.org/all/20250403100124.637889-1-iuliana.pro...@oss.nxp.com/

Changes in v3:
- Reviews from Mathieu Poirier:
  - Added version and magic number to vendor-specific resource table entry.
  - Updated defines to maintain backward compatibility with a resource table 
that doesn't have a vendor-specific resource.
- By default, wait for `fw_ready`, unless specified otherwise.
- Link to v2: 
https://lore.kernel.org/all/20250318215007.2109726-1-iuliana.pro...@oss.nxp.com

Changes in v2:
- Reviews from Mathieu Poirier:
  - Use vendor-specific resource table entry.
  - Implement resource handler specific to the i.MX DSP.
- Revise commit message to include recent updates.
- Link to v1: 
https://lore.kernel.org/all/20250305123923.514386-1-iuliana.pro...@oss.nxp.com/

 drivers/remoteproc/imx_dsp_rproc.c | 98 +-
 1 file changed, 96 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/imx_dsp_rproc.c 
b/drivers/remoteproc/imx_dsp_rproc.c
index b9bb15970966..e4212e624a91 100644
--- a/drivers/remoteproc/imx_dsp_rproc.c
+++ b/drivers/remoteproc/imx_dsp_rproc.c
@@ -35,9 +35,18 @@ module_param_named(no_mailboxes, no_mailboxes, int, 0644);
 MODULE_PARM_DESC(no_mailboxes,
 "There is no mailbox between cores, so ignore remote proc 
reply after start, default is 0 (off).");
 
+/* Flag indicating that the remote is up and running */
 #define REMOTE_IS_READYBIT(0)
+/* Flag indicating that the host should wait for a firmware-ready response */
+#define WAIT_FW_READY  BIT(1)
 #define REMOTE_READY_WAIT_MAX_RETRIES  500
 
+/*
+ * This flag is set in the DSP resource table's features field to indicate
+ * that the firmware requires the host NOT to wait for a FW_READY response.
+ */
+#define FEATURE_DONT_WAIT_FW_READY BIT(0)
+
 /* att flags */
 /* DSP own area */
 #define ATT_OWNBIT(31)
@@ -72,6 +81,10 @@ MODULE_PARM_DESC(no_mailboxes,
 
 #define IMX8ULP_SIP_HIFI_XRDC  0xc20e
 
+#define FW_RSC_NXP_S_MAGIC ((uint32_t)'n' << 24 |  \
+(uint32_t)'x' << 16 |  \
+(uint32_t)'p' << 8 |   \
+(uint32_t)'s')
 /*
  * enum - Predefined Mailbox Messages
  *
@@ -136,6 +149,24 @@ struct imx_dsp_rproc_dcfg {
int (*reset)(struct imx_dsp_rproc *priv);
 };
 
+/**
+ * struct fw_rsc_imx_dsp - i.MX DSP specific info
+ *
+ * @len: length of the resource entry
+ * @magic_num: 32-bit magic number
+ * @version: version of data structure
+ * @features: feature flags supported by the i.MX DSP firmware
+ *
+ * This represents a DSP-specific resource in the firmware's
+ * resource table, providing information on supported features.
+ */
+struct fw_rsc_imx_dsp {
+   uint32_t len;
+   uint32_t magic_num;
+   uint32_t version;
+   uint32_t features;
+} __packed;
+
 static const struct imx_rproc_att imx_dsp_rproc_att_imx8qm[] = {
/* dev addr , sys addr  , size  , flags */
{ 0x596e8000, 0x556e8000, 0x8000, ATT_OWN },
@@ -300,6 +331,66 @@ static int imx_dsp_rproc_ready(struct rproc *rproc)
return -ETIMEDOUT;
 }
 
+/**
+ * imx_dsp_rproc_handle_rsc() - Handle DSP-specific resource table entries
+ * @rproc: remote processor instance
+ * @rsc_type: resource type identifier
+ * @rsc: pointer to the resource entry
+ * @offset: offset of the resource entry
+ * @avail: available space in the resource table
+ *
+ * Parse the DSP-specific resource entry and update flags accordingly.
+ * If the WAIT_FW_READY feature is set, the host must wait for the firmware
+ * to signal readiness before proceeding with execution.
+ *
+ * Return: RSC_HANDLED if processed successfully, RSC_IGNORED otherwise.
+ */
+static int imx_dsp_rproc_handle_rsc(struct rproc *rproc, u32 rsc_type,
+   void *rsc, int offset, int avail)
+{
+   struct imx_dsp_rproc *priv = rproc->priv;
+   struct fw_rsc_imx_dsp *imx_dsp_rsc = rsc;
+   struct device *dev = rproc->dev.parent;
+
+

Re: [PATCH v2] DAX: warn when kmem regions are truncated for memory block alignment.

2025-04-09 Thread Alison Schofield

On Tue, Apr 01, 2025 at 09:59:20PM -0400, Gregory Price wrote:
> Device capacity intended for use as system ram should be aligned to the
> archite-defined memory block size or that capacity will be silently
> truncated and capacity stranded.
> 
> As hotplug dax memory becomes more prevelant, the memory block size
> alignment becomes more important for platform and device vendors to
> pay attention to - so this truncation should not be silent.
> 
> This issue is particularly relevant for CXL Dynamic Capacity devices,
> whose capacity may arrive in spec-aligned but block-misaligned chunks.
> 
> Suggested-by: David Hildenbrand 
> Suggested-by: Dan Williams 
> Signed-off-by: Gregory Price 

Existing unit test 'daxctl-devices.sh' hits this case:
[   52.547521] kmem dax3.0: DAX region truncated by 62.0 MiB due to alignment

Tested-by: Alison Schofield 

snip

>

[PATCH] selftests/sgx: Fix an enclave built with extended instructions

2025-04-09 Thread Vladis Dronov

Creating an enclave with xfrm == 3 disables extended CPU states and instruction
sets, like AVX2 and AVX512 inside the enclave. Thus the enclave code has to be
built with a compiler which does not produce instructions from the extended
instruction sets. Nevertheless certain Linux distributions confgure a compiler
so it produces extended instructions by default ("--with-arch_64=x86-64-v3" for
gcc). Thus an enclave code from test_encl.c is built with extended instructions
and an enclave execution hits the #UD exception (note exception_vector == 6):

# ./test_sgx
...
#  RUN   enclave.unclobbered_vdso_oversubscribed_remove ...
# main.c:481:unclobbered_vdso_oversubscribed_remove:Expected 
self->run.exception_vector (6) == 0 (0)
# main.c:485:unclobbered_vdso_oversubscribed_remove:Expected 
self->run.function (3) == EEXIT (4)
# unclobbered_vdso_oversubscribed_remove: Test terminated by assertion
#  FAIL  enclave.unclobbered_vdso_oversubscribed_remove
not ok 3 enclave.unclobbered_vdso_oversubscribed_remove

Fix this by adding "-mno-avx" to ENCL_CFLAGS in Makefile. Add some comments
about this to code locations where enclave's xfrm field is set.

Suggested-by: Dave Hansen 
Signed-off-by: Vladis Dronov 
---
an out-of-commit-message note:

I would greatly appreciate if someone reviews and possibly fixes my wording
of the commit message and the code comments.

 tools/testing/selftests/sgx/Makefile| 2 +-
 tools/testing/selftests/sgx/load.c  | 8 +++-
 tools/testing/selftests/sgx/sigstruct.c | 6 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/sgx/Makefile 
b/tools/testing/selftests/sgx/Makefile
index 03b5e13b872b..ab2561b4456d 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -15,7 +15,7 @@ INCLUDES := -I$(top_srcdir)/tools/include
 HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC $(CFLAGS)
 HOST_LDFLAGS := -z noexecstack -lcrypto
 ENCL_CFLAGS += -Wall -Werror -static-pie -nostdlib -ffreestanding -fPIE \
-  -fno-stack-protector -mrdrnd $(INCLUDES)
+  -fno-stack-protector -mrdrnd -mno-avx $(INCLUDES)
 ENCL_LDFLAGS := -Wl,-T,test_encl.lds,--build-id=none
 
 ifeq ($(CAN_BUILD_X86_64), 1)
diff --git a/tools/testing/selftests/sgx/load.c 
b/tools/testing/selftests/sgx/load.c
index c9f658e44de6..79946ca8f1a5 100644
--- a/tools/testing/selftests/sgx/load.c
+++ b/tools/testing/selftests/sgx/load.c
@@ -88,10 +88,16 @@ static bool encl_ioc_create(struct encl *encl)
memset(secs, 0, sizeof(*secs));
secs->ssa_frame_size = 1;
secs->attributes = SGX_ATTR_MODE64BIT;
-   secs->xfrm = 3;
secs->base = encl->encl_base;
secs->size = encl->encl_size;
 
+   /*
+* Setting xfrm to 3 disables extended CPU states and instruction sets
+* like AVX2 inside the enclave. Thus the enclave code has to be built
+* without instructions from extended instruction sets (-mno-avx).
+*/
+   secs->xfrm = 3;
+
ioc.src = (unsigned long)secs;
rc = ioctl(encl->fd, SGX_IOC_ENCLAVE_CREATE, &ioc);
if (rc) {
diff --git a/tools/testing/selftests/sgx/sigstruct.c 
b/tools/testing/selftests/sgx/sigstruct.c
index d73b29becf5b..f548392a2fee 100644
--- a/tools/testing/selftests/sgx/sigstruct.c
+++ b/tools/testing/selftests/sgx/sigstruct.c
@@ -331,6 +331,12 @@ bool encl_measure(struct encl *encl)
sigstruct->header.header2[1] = header2[1];
sigstruct->exponent = 3;
sigstruct->body.attributes = SGX_ATTR_MODE64BIT;
+
+   /*
+* Setting xfrm to 3 disables extended CPU states and instruction sets
+* like AVX2 inside the enclave. Thus the enclave code has to be built
+* without instructions from extended instruction sets (-mno-avx).
+*/
sigstruct->body.xfrm = 3;
 
/* sanity check */
-- 
2.49.0

Re: [PATCH v2] scripts/spdxcheck: Limit the scope of git.Repo

2025-04-09 Thread Gon Solo

Am Wed, Apr 09, 2025 at 08:04:19PM + schrieb Ricardo Ribalda:
> Changes in v2:
> - Make repo a local variable
> - Link to v1: 
> https://lore.kernel.org/r/20250225-spx-v1-1-e935b27eb...@chromium.org

This is not necessary as it was Python's fault and is fixed by 3.13.3
which came out yesterday. I just checked.

Re: [PATCH 1/4] sysctl: move u8 register test to lib/test_sysctl.c

2025-04-09 Thread Kees Cook

On Fri, Mar 21, 2025 at 01:47:24PM +0100, Joel Granados wrote:
> If the test added in commit b5ffbd139688 ("sysctl: move the extra1/2
> boundary check of u8 to sysctl_check_table_array") is run as a module, a
> lingering reference to the module is left behind, and a 'sysctl -a'
> leads to a panic.
> 
> To reproduce
> CONFIG_KUNIT=y
> CONFIG_SYSCTL_KUNIT_TEST=m
> 
> Then run these commands:
> modprobe sysctl-test
> rmmod sysctl-test
> sysctl -a
> 
> The panic varies but generally looks something like this:
> 
> BUG: unable to handle page fault for address: a4571c0c7db4
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x) - not-present page
> PGD 10067 P4D 10067 PUD 100351067 PMD 114f5e067 PTE 0
> Oops: Oops:  [#1] SMP NOPTI
> ... ... ...
> RIP: 0010:proc_sys_readdir+0x166/0x2c0
> ... ... ...
> Call Trace:
>  
>  iterate_dir+0x6e/0x140
>  __se_sys_getdents+0x6e/0x100
>  do_syscall_64+0x70/0x150
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Move the test to lib/test_sysctl.c where the registration reference is
> handled on module exit
> 
> 'Fixes: b5ffbd139688 ("sysctl: move the extra1/2 boundary check of u8 to

Typoe: drop leading '

> sysctl_check_table_array")'

And avoid wrapping this line for the field.

> 
> Signed-off-by: Joel Granados 

Otherwise looks good to me.

Reviewed-by: Kees Cook 

(And I should  note that there is a push to move kunit tests into a
"/tests/" subdir, but that's separate from this series.)

-- 
Kees Cook

Re: [PATCH] compiler.h: Avoid the usage of __typeof_unqual() when GENKSYMS__ is defined

2025-04-09 Thread Uros Bizjak

On Wed, Apr 9, 2025 at 5:38 PM Borislav Petkov  wrote:
>
> On Wed, Apr 09, 2025 at 05:32:39PM +0200, Uros Bizjak wrote:
> > The workaround is posted to the list. It should be committed to the
> > mainline until genksyms is fixed.
>
> I'll take it through tip.

Thanks!

Best regards,
Uros.

Re: [PATCH v2] DAX: warn when kmem regions are truncated for memory block alignment.

2025-04-09 Thread Gregory Price

On Wed, Apr 09, 2025 at 03:21:53PM -0700, Alison Schofield wrote:
> Existing unit test 'daxctl-devices.sh' hits this case:
> [   52.547521] kmem dax3.0: DAX region truncated by 62.0 MiB due to alignment
> 
> Tested-by: Alison Schofield 
>

Thanks for testing, good to know there's an existing test for this

will respin w/ Jonathan's recommendations and tags.

~Gregory

[GIT PULL] Kselftest fixes update for Linux 6.15-rc2

2025-04-09 Thread Shuah Khan


Hi Linus,

Please pull the following kselftest fixes update for Linux 6.15-rc2

Fixes tpm2, futex, and mincore tests. Creates a dedicated .gitignore
for tpm2

Details:

selftests: tpm2: test_smoke: use POSIX-conformant expression operator
selftests/futex: futex_waitv wouldblock test should fail
selftests: tpm2: create a dedicated .gitignore
selftests/mincore: Allow read-ahead pages to reach the end of the file

diff is attached.

thanks,
-- Shuah


The following changes since commit 0af2f6be1b4281385b618cb86ad946eded089ac8:

  Linux 6.15-rc1 (2025-04-06 13:11:33 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
tags/linux_kselftest-fixes-6.15-rc2

for you to fetch changes up to 197c1eaa7ba633a482ed7588eea6fd4aa57e08d4:

  selftests/mincore: Allow read-ahead pages to reach the end of the file 
(2025-04-08 17:08:50 -0600)


linux_kselftest-fixes-6.15-rc2

Fixes tpm2, futex, and mincore tests. Creates a dedicated .gitignore
for tpm2

Details:

selftests: tpm2: test_smoke: use POSIX-conformant expression operator
selftests/futex: futex_waitv wouldblock test should fail
selftests: tpm2: create a dedicated .gitignore
selftests/mincore: Allow read-ahead pages to reach the end of the file


Ahmed Salem (1):
  selftests: tpm2: test_smoke: use POSIX-conformant expression operator

Edward Liaw (1):
  selftests/futex: futex_waitv wouldblock test should fail

Khaled Elnaggar (1):
  selftests: tpm2: create a dedicated .gitignore

Qiuxu Zhuo (1):
  selftests/mincore: Allow read-ahead pages to reach the end of the file

 tools/testing/selftests/.gitignore   | 1 -
 tools/testing/selftests/futex/functional/futex_wait_wouldblock.c | 2 +-
 tools/testing/selftests/mincore/mincore_selftest.c   | 3 ---
 tools/testing/selftests/tpm2/.gitignore  | 3 +++
 tools/testing/selftests/tpm2/test_smoke.sh   | 2 +-
 5 files changed, 5 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/tpm2/.gitignore
diff --git a/tools/testing/selftests/.gitignore b/tools/testing/selftests/.gitignore
index cb24124ac5b9..674aaa02e396 100644
--- a/tools/testing/selftests/.gitignore
+++ b/tools/testing/selftests/.gitignore
@@ -4,7 +4,6 @@ gpiogpio-hammer
 gpioinclude/
 gpiolsgpio
 kselftest_install/
-tpm2/SpaceTest.log
 
 # Python bytecode and cache
 __pycache__/
diff --git a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
index 7d7a6a06cdb7..2d8230da9064 100644
--- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
+++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c
@@ -98,7 +98,7 @@ int main(int argc, char *argv[])
 	info("Calling futex_waitv on f1: %u @ %p with val=%u\n", f1, &f1, f1+1);
 	res = futex_waitv(&waitv, 1, 0, &to, CLOCK_MONOTONIC);
 	if (!res || errno != EWOULDBLOCK) {
-		ksft_test_result_pass("futex_waitv returned: %d %s\n",
+		ksft_test_result_fail("futex_waitv returned: %d %s\n",
   res ? errno : res,
   res ? strerror(errno) : "");
 		ret = RET_FAIL;
diff --git a/tools/testing/selftests/mincore/mincore_selftest.c b/tools/testing/selftests/mincore/mincore_selftest.c
index e949a43a6145..efabfcbe0b49 100644
--- a/tools/testing/selftests/mincore/mincore_selftest.c
+++ b/tools/testing/selftests/mincore/mincore_selftest.c
@@ -261,9 +261,6 @@ TEST(check_file_mmap)
 		TH_LOG("No read-ahead pages found in memory");
 	}
 
-	EXPECT_LT(i, vec_size) {
-		TH_LOG("Read-ahead pages reached the end of the file");
-	}
 	/*
 	 * End of the readahead window. The rest of the pages shouldn't
 	 * be in memory.
diff --git a/tools/testing/selftests/tpm2/.gitignore b/tools/testing/selftests/tpm2/.gitignore
new file mode 100644
index ..6d6165c5e35d
--- /dev/null
+++ b/tools/testing/selftests/tpm2/.gitignore
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+AsyncTest.log
+SpaceTest.log
diff --git a/tools/testing/selftests/tpm2/test_smoke.sh b/tools/testing/selftests/tpm2/test_smoke.sh
index 168f4b166234..3a60e6c6f5c9 100755
--- a/tools/testing/selftests/tpm2/test_smoke.sh
+++ b/tools/testing/selftests/tpm2/test_smoke.sh
@@ -6,6 +6,6 @@ ksft_skip=4
 
 [ -e /dev/tpm0 ] || exit $ksft_skip
 read tpm_version < /sys/class/tpm/tpm0/tpm_version_major
-[ "$tpm_version" == 2 ] || exit $ksft_skip
+[ "$tpm_version" = 2 ] || exit $ksft_skip
 
 python3 -m unittest -v tpm2_tests.SmokeTest 2>&1

Re: [PATCH RESEND bpf-next v2] selftests/bpf: close the file descriptor to avoid resource leaks

2025-04-09 Thread Alexei Starovoitov

On Wed, Apr 9, 2025 at 4:28 PM Andrii Nakryiko
 wrote:
>
> On Tue, Apr 8, 2025 at 11:33 AM Malaya Kumar Rout
>  wrote:
> >
> > Static Analyis for bench_htab_mem.c with cppcheck:error
>
> typo: analysis (lower case and typo)
>
> you can also make into a bit more human-readable sentence:
>
> "Static analysis found an issue in bench_htab_mem.c:
>
>
> > tools/testing/selftests/bpf/benchs/bench_htab_mem.c:284:3:
> > error: Resource leak: fd [resourceLeak]
> > tools/testing/selftests/bpf/prog_tests/sk_assign.c:41:3:
> > error: Resource leak: tc [resourceLeak]
> >
> > fix the issue  by closing the file descriptor (fd & tc) when
> > read & fgets operation fails.
>
> "Fix the issue by closing" (capitalization, single space between words
>
> >
> > Signed-off-by: Malaya Kumar Rout 
> > ---
> >  tools/testing/selftests/bpf/benchs/bench_htab_mem.c | 3 +--
> >  tools/testing/selftests/bpf/prog_tests/sk_assign.c  | 4 +++-
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> >
>
> For some reason this patch didn't make it into our Patchworks system,
> so we can't easily apply it. Please fix the above commit issues and
> resubmit, hopefully this time it goes through just fine.

I thought it was a maintenance issue with patchwork, but it's probably
something else.
I'm guessing your cc list confuses pw bot.
Pls drop linux-kselftest@vger and lkml@vger and keep bpf@vger only.

Re: [PATCH RESEND bpf-next v2] selftests/bpf: close the file descriptor to avoid resource leaks

2025-04-09 Thread Andrii Nakryiko

On Tue, Apr 8, 2025 at 11:33 AM Malaya Kumar Rout
 wrote:
>
> Static Analyis for bench_htab_mem.c with cppcheck:error

typo: analysis (lower case and typo)

you can also make into a bit more human-readable sentence:

"Static analysis found an issue in bench_htab_mem.c:


> tools/testing/selftests/bpf/benchs/bench_htab_mem.c:284:3:
> error: Resource leak: fd [resourceLeak]
> tools/testing/selftests/bpf/prog_tests/sk_assign.c:41:3:
> error: Resource leak: tc [resourceLeak]
>
> fix the issue  by closing the file descriptor (fd & tc) when
> read & fgets operation fails.

"Fix the issue by closing" (capitalization, single space between words

>
> Signed-off-by: Malaya Kumar Rout 
> ---
>  tools/testing/selftests/bpf/benchs/bench_htab_mem.c | 3 +--
>  tools/testing/selftests/bpf/prog_tests/sk_assign.c  | 4 +++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
>

For some reason this patch didn't make it into our Patchworks system,
so we can't easily apply it. Please fix the above commit issues and
resubmit, hopefully this time it goes through just fine.

> diff --git a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c 
> b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
> index 926ee822143e..297e32390cd1 100644
> --- a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
> +++ b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
> @@ -279,6 +279,7 @@ static void htab_mem_read_mem_cgrp_file(const char *name, 
> unsigned long *value)
> }
>
> got = read(fd, buf, sizeof(buf) - 1);
> +   close(fd);
> if (got <= 0) {
> *value = 0;
> return;
> @@ -286,8 +287,6 @@ static void htab_mem_read_mem_cgrp_file(const char *name, 
> unsigned long *value)
> buf[got] = 0;
>
> *value = strtoull(buf, NULL, 0);
> -
> -   close(fd);
>  }
>
>  static void htab_mem_measure(struct bench_res *res)
> diff --git a/tools/testing/selftests/bpf/prog_tests/sk_assign.c 
> b/tools/testing/selftests/bpf/prog_tests/sk_assign.c
> index 0b9bd1d6f7cc..10a0ab954b8a 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sk_assign.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sk_assign.c
> @@ -37,8 +37,10 @@ configure_stack(void)
> tc = popen("tc -V", "r");
> if (CHECK_FAIL(!tc))
> return false;
> -   if (CHECK_FAIL(!fgets(tc_version, sizeof(tc_version), tc)))
> +   if (CHECK_FAIL(!fgets(tc_version, sizeof(tc_version), tc))) {
> +   pclose(tc);
> return false;
> +   }
> if (strstr(tc_version, ", libbpf "))
> prog = "test_sk_assign_libbpf.bpf.o";
> else
> --
> 2.43.0
>

Re: [GIT PULL] Kselftest fixes update for Linux 6.15-rc2

2025-04-09 Thread pr-tracker-bot

The pull request you sent on Wed, 9 Apr 2025 16:13:39 -0600:

> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
> tags/linux_kselftest-fixes-6.15-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3b07108ada81a8ebcebf1fe61367b4e436c895bd

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

Re: [PATCH v2] DAX: warn when kmem regions are truncated for memory block alignment.

2025-04-09 Thread Dan Williams

Gregory Price wrote:
> Device capacity intended for use as system ram should be aligned to the
> archite-defined memory block size or that capacity will be silently
> truncated and capacity stranded.
> 
> As hotplug dax memory becomes more prevelant, the memory block size
> alignment becomes more important for platform and device vendors to
> pay attention to - so this truncation should not be silent.
> 
> This issue is particularly relevant for CXL Dynamic Capacity devices,
> whose capacity may arrive in spec-aligned but block-misaligned chunks.
> 
> Suggested-by: David Hildenbrand 
> Suggested-by: Dan Williams 
> Signed-off-by: Gregory Price 

The typo in the changelog was already noted, and this addresses all my
feedback so feel free to add:

Reviewed-by: Dan Williams

Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

2025-04-09 Thread Myrsky Lintu

Hello,

I am completely new to and uninformed about kernel development. I was 
pointed here from Mesa documentation for Venus (Vulkan encapsulation for 
KVM/QEMU): https://docs.mesa3d.org/drivers/venus.html

Based on my limited understanding of what has happened here, this patch 
series was partially reverted due to an issue with the Bochs DRM driver. 
A fix for that issue has been merged months ago according to the link 
provided in an earlier message. Since then work on this detail of KVM 
seems to have stalled.

Is it reasonable to ask here for this patch series to be evaluated and 
incorporated again?

My layperson's attempt at applying the series against 6.14.1 source code 
failed. In addition to the parts that appear to have already been 
incorporated there are some parts of the patch series that are rejected. 
I lack the knowledge to correct that.

Distro kernels currently ship without it which limits the usability of 
Venus on AMD and NVIDIA GPUs paired with Intel CPUs. Convincing 
individual distro maintainers of the necessity of this patch series 
without the specialized knowledge required for understanding what it 
does and performing that evaluation is quite hard. If upstream (kernel) 
would apply it now the distros would ship a kernel including the 
required changes to users, including me, without that multiplicated effort.

Thank you for your time. If this request is out of place here please 
forgive me for engaging this mailing list without a proper understanding 
of the list's scope.

On 2024-10-07 14:04:24, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 07.10.24 15:38, Vitaly Kuznetsov wrote:
>> "Linux regression tracking (Thorsten Leemhuis)"
>>  writes:
>>
>>> On 30.08.24 11:35, Vitaly Kuznetsov wrote:
 Sean Christopherson  writes:

> Unconditionally honor guest PAT on CPUs that support self-snoop, as
> Intel has confirmed that CPUs that support self-snoop always snoop caches
> and store buffers.  I.e. CPUs with self-snoop maintain cache coherency
> even in the presence of aliased memtypes, thus there is no need to trust
> the guest behaves and only honor PAT as a last resort, as KVM does today.
>
> Honoring guest PAT is desirable for use cases where the guest has access
> to non-coherent DMA _without_ bouncing through VFIO, e.g. when a virtual
> (mediated, for all intents and purposes) GPU is exposed to the guest, 
> along
> with buffers that are consumed directly by the physical GPU, i.e. which
> can't be proxied by the host to ensure writes from the guest are performed
> with the correct memory type for the GPU.

 Necroposting!

 Turns out that this change broke "bochs-display" driver in QEMU even
 when the guest is modern (don't ask me 'who the hell uses bochs for
 modern guests', it was basically a configuration error :-). E.g:
 [...]
>>>
>>> This regression made it to the list of tracked regressions. It seems
>>> this thread stalled a while ago. Was this ever fixed? Does not look like
>>> it, but I might have missed something. Or is this a regression I should
>>> just ignore for one reason or another?
>>>
>>
>> The regression was addressed in by reverting 377b2f359d1f in 6.11
>>
>> commit 9d70f3fec14421e793ffbc0ec2f739b24e534900
>> Author: Paolo Bonzini 
>> Date:   Sun Sep 15 02:49:33 2024 -0400
>>
>>  Revert "KVM: VMX: Always honor guest PAT on CPUs that support 
>> self-snoop"
> 
> Thx. Sorry, missed that, thx for pointing me towards it. I had looked
> for things like that, but seems I messed up my lore query. Apologies for
> the noise!
> 
>> Also, there's a (pending) DRM patch fixing it from the guest's side:
>> https://gitlab.freedesktop.org/drm/misc/kernel/-/commit/9388ccf69925223223c87355a417ba39b13a5e8e
> 
> Great!
> 
> Ciao, Thorsten
> 
> P.S.:
> 
> #regzbot fix: 9d70f3fec14421e793ffbc0ec2f739b24e534900
> 
> 
>

Re: [PATCH] compiler.h: Avoid the usage of __typeof_unqual() when GENKSYMS__ is defined

2025-04-09 Thread Uros Bizjak

On Wed, Apr 9, 2025 at 5:28 PM Borislav Petkov  wrote:
>
> On Sun, Apr 06, 2025 at 05:36:13PM +0200, Uros Bizjak wrote:
> > On Fri, Apr 4, 2025 at 9:14 PM Masahiro Yamada  wrote:
> > >
> > > On Fri, Apr 4, 2025 at 11:37 PM Uros Bizjak  wrote:
> > > >
> > > > On Fri, Apr 4, 2025 at 4:06 PM Masahiro Yamada  
> > > > wrote:
> > > >
> > > > > > > > Current version of genksyms doesn't know anything about 
> > > > > > > > __typeof_unqual__()
> > > > > > > > operator.  Avoid the usage of __typeof_unqual__() with genksyms 
> > > > > > > > to prevent
> > > > > > > > errors when symbols are versioned.
> > > > > > > >
> > > > > > > > There were no problems with gendwarfksyms.
> > > > > > > >
> > > > > > > > Signed-off-by: Uros Bizjak 
> > > > > > > > Fixes: ac053946f5c40 ("compiler.h: introduce TYPEOF_UNQUAL() 
> > > > > > > > macro")
> > > > > > > > Reported-by: Paul Menzel 
> > > > > > > > Closes: 
> > > > > > > > https://lore.kernel.org/lkml/81a25a60-de78-43fb-b56a-131151e1c...@molgen.mpg.de/
> > > > > > > > Cc: Sami Tolvanen 
> > > > > > > > Cc: Andrew Morton 
> > > > > > > > ---
> > > > > > >
> > > > > > >
> > > > > > > Why don't you add it to the genksyms keyword table?
> > > > > >
> > > > > > It doesn't work, even if I patch it with an even more elaborate 
> > > > > > patch
> > > > > > (attached).
> > > > > >
> > > > > > I guess some more surgery will be needed, but for now a fallback 
> > > > > > works
> > > > > > as expected.
> > > > > >
> > > > > > Uros.
> > > > >
> > > > > The attached patch looks good to me.
> > > >
> > > > FAOD - do you refer to the submitted one for compiler.h or to the one
> > > > for scripts/genksyms/keywords.c? (The latter doesn't fix the warning,
> > > > though).
> > >
> > >
> > >
> > > You are still seeing the warnings because __typeof_unqual__
> > > is not only the issue.
> > >
> > > Hint:
> > >
> > > $ make -s KCFLAGS=-D__GENKSYMS__  arch/x86/kernel/setup_percpu.i
> > > $ grep  'this_cpu_off;'  arch/x86/kernel/setup_percpu.i
> >
> > I see.
> >
> > With my workaround, this_cpu_off is declared as:
> >
> > extern __attribute__((section(".data..percpu" "..hot.."
> > "this_cpu_off"))) __typeof__(unsigned long) this_cpu_off;
> >
> > while without workaround, the same variable is declared as:
> >
> > extern __seg_gs __attribute__((section(".data..percpu" "..hot.."
> > "this_cpu_off"))) __typeof__(unsigned long) this_cpu_off;
> >
> > It looks that genksyms should be extended to handle (or ignore)
> > __seg_gs/__seg_fs named address prefix. Somewhat surprising, because
> > genksyms can process:
> >
> > extern __attribute__((section(".data..percpu" "..hot.."
> > "const_current_task"))) __typeof__(struct task_struct * const
> > __seg_gs) const_current_task
> >
> > without problems.
> >
> > I'm sorry, but I'm not able to extend genksyms with a new keyword by 
> > myself...
>
> Well, we need a fix here because this fires a lot by now - triggers on my
> machines now too.
>
> So either take a fix or we'll need to revert until it is fixed properly.

The workaround is posted to the list. It should be committed to the
mainline until genksyms is fixed.

Uros.

Re: [PATCH] compiler.h: Avoid the usage of __typeof_unqual() when GENKSYMS__ is defined

2025-04-09 Thread Borislav Petkov

On Wed, Apr 09, 2025 at 05:32:39PM +0200, Uros Bizjak wrote:
> The workaround is posted to the list. It should be committed to the
> mainline until genksyms is fixed.

I'll take it through tip.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH net-next] configs/debug: run and debug PREEMPT

2025-04-09 Thread Jakub Kicinski

On Wed, 9 Apr 2025 13:58:13 +0200 Matthieu Baerts wrote:
> On 08/04/2025 21:03, Jakub Kicinski wrote:
> > On Tue, 8 Apr 2025 20:18:26 +0200 Matthieu Baerts wrote:  
> >> On 02/04/2025 19:23, Stanislav Fomichev wrote:  
> >>> Recent change [0] resulted in a "BUG: using __this_cpu_read() in
> >>> preemptible" splat [1]. PREEMPT kernels have additional requirements
> >>> on what can and can not run with/without preemption enabled.
> >>> Expose those constrains in the debug kernels.
> >>
> >> Good idea to suggest this to find more bugs!
> >>
> >> I did some quick tests on my side with our CI, and the MPTCP selftests
> >> seem to take a bit more time, but without impacting the results.
> >> Hopefully, there will be no impact in slower/busy environments :)  
> > 
> > What kind of slow down do you see? I think we get up to 50% more time
> > spent in the longer tests.  
> 
> That's difficult to measure in our CI because we have a majority of
> tests either creating test envs with random parameters (latency, losses,
> etc.), or waiting for a transfer at a limited speed to finish. Plus, we
> don't control the host running our tests. But if we omit that, our
> packetdrill tests take ~20% longer on the CI, and our 2 mini KUnit tests
> took ~10% longer (275ms -> 305ms). Globally, our test suite took maybe
> ~10-20% longer, and that's acceptable.
> 
> So not 50%. Is this difference acceptable for NIPA? Even when some tests
> are restarted automatically in case of instabilities?

We also see 10%+ on most cases, the 50% was the worst one I glanced.
The worst offenders in terms of runtime only increased by 10% so still
within the guidelines.

> One last thing, Stanislav's patch has been shared during Linus' merge
> window: perhaps something else could also impact the time?
> 
> > Not sure how bad is too bad..  
> 
> Did you observe more instabilities? Maybe the individual results should
> be omitted, and only debug specific issues (calltraces, kmemleak, etc.)
> should be looked at?

A couple but unclear at this stage whether that was just the merge
window or enabling preempt debug. Now patchwork is super unstable
so again, hard to judge the source of the problems :(

> > I'm leaning
> > towards applying this to net-next and we can see if people running
> > on linux-next complain?  
> 
> Good idea! But I do wonder how run **and monitor** the selftests in
> linux-next with a debug kernel :)

One way to find out :)

Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues

2025-04-09 Thread Michael S. Tsirkin

On Wed, Apr 09, 2025 at 02:24:32PM +0200, David Hildenbrand wrote:
> On 09.04.25 14:07, Michael S. Tsirkin wrote:
> > On Wed, Apr 09, 2025 at 01:12:19PM +0200, David Hildenbrand wrote:
> > > On 09.04.25 12:56, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 09, 2025 at 12:46:41PM +0200, David Hildenbrand wrote:
> > > > > On 07.04.25 23:20, Michael S. Tsirkin wrote:
> > > > > > On Mon, Apr 07, 2025 at 08:47:05PM +0200, David Hildenbrand wrote:
> > > > > > > > In my opinion, it makes the most sense to keep the spec as it 
> > > > > > > > is and
> > > > > > > > change QEMU and the kernel to match, but obviously that's not 
> > > > > > > > trivial
> > > > > > > > to do in a way that doesn't break existing devices and drivers.
> > > > > > > 
> > > > > > > If only it would be limited to QEMU and Linux ... :)
> > > > > > > 
> > > > > > > Out of curiosity, assuming we'd make the spec match the current 
> > > > > > > QEMU/Linux
> > > > > > > implementation at least for the 3 involved features only, would 
> > > > > > > there be a
> > > > > > > way to adjust crossvm without any disruption?
> > > > > > > 
> > > > > > > I still have the feeling that it will be rather hard to get that 
> > > > > > > all
> > > > > > > implementations match the spec ... For new features+queues it 
> > > > > > > will be easy
> > > > > > > to force the usage of fixed virtqueue numbers, but for 
> > > > > > > free-page-hinting and
> > > > > > > reporting, it's a mess :(
> > > > > > 
> > > > > > 
> > > > > > Still thinking about a way to fix drivers... We can discuss this
> > > > > > theoretically, maybe?
> > > > > 
> > > > > Yes, absolutely. I took the time to do some more digging; regarding 
> > > > > drivers
> > > > > only Linux seems to be problematic.
> > > > > 
> > > > > virtio-win, FreeBSD, NetBSD and OpenBSD and don't seem to support
> > > > > problematic features (free page hinting, free page reporting) in their
> > > > > virtio-balloon implementations.
> > > > > 
> > > > > So from the known drivers, only Linux is applicable.
> > > > > 
> > > > > reporting_vq is either at idx 4/3/2
> > > > > free_page_vq is either at idx 3/2
> > > > > statsq is at idx2 (only relevant if the feature is offered)
> > > > > 
> > > > > So if we could test for the existence of a virtqueue at an idx 
> > > > > easily, we
> > > > > could test from highest-to-smallest idx.
> > > > > 
> > > > > But I recall that testing for the existance of a virtqueue on s390x 
> > > > > resulted
> > > > > in the problem/deadlock in the first place ...
> > > > > 
> > > > > -- 
> > > > > Cheers,
> > > > > 
> > > > > David / dhildenb
> > > > 
> > > > So let's talk about a new feature bit?
> > > 
> > > Are you thinking about a new feature that switches between "fixed queue
> > > indices" and "compressed queue indices", whereby the latter would be the
> > > legacy default and we would expect all devices to switch to the new
> > > fixed-queue-indices layout?
> > > 
> > > We could make all new features require "fixed-queue-indices".
> > 
> > I see two ways:
> > 1. we make driver behave correctly with in spec and out of spec devices
> > and we make qemu behave correctly with in spec and out of spec devices
> > 2. a new feature bit
> > 
> > I prefer 1, and when we add a new feature we can also
> > document that it should be in spec if negotiated.
> > 
> > My question is if 1 is practical.
> 
> AFAIKT, 1) implies:
> 
> virtio-balloon:
> 
> a) Driver
> 
> As mentioned above, we'd need a reliable way to test for the existence of a
> virtqueue, so we can e.g., test for reporting_vq idx 4 -> 3 -> 2
> 
> With that we'd be able to support compressed+fixed at the same time.
> 
> Q: Is it possible/feasible?
> 
> b) Device: virtio-balloon: we can fake existence of STAT and
> FREE_PAGE_HINTING easily, such that the compressed layout corresponds to the
> fixed layout easily.
> 
> Q: alternatives? We could try creating multiple queues for the same feature,
> but it's going to be a mess I'm afraid ...
> 
> 
> virtio-fs:
> 
> a) Driver
> 
> Linux does not even implement VIRTIO_FS_F_NOTIFICATION or respect
> VIRTIO_FS_F_NOTIFICATION when calculating queue indices, ...
> 
> b) Device
> 
> Same applies to virtiofsd ...
> 
> Q: Did anybody actually implement VIRTIO_FS_F_NOTIFICATION ever? If not, can
> we just remove it from the spec completely and resolve the issue that way?

Donnu. Vivek?

Or we can check for queue number 1+num_request_queues maybe?
If that exists then it is spec compliant?


> -- 
> Cheers,
> 
> David / dhildenb

Re: [PATCH v1 3/3] selftests: pid_namespace: add missing sys/mount.h include in pid_max.c

2025-04-09 Thread T.J. Mercier

On Wed, Jan 15, 2025 at 2:53 AM Peter Seiderer  wrote:
>
> Fix compile on openSUSE Tumbleweed (gcc-14.2.1, glibc-2.40):
>   - add missing sys/mount.h include
>
> Fixes:
>
>   pid_max.c: In function ‘pid_max_cb’:
>   pid_max.c:42:15: error: implicit declaration of function ‘mount’ 
> [-Wimplicit-function-declaration]
>  42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
> |   ^
>
> Signed-off-by: Peter Seiderer 

Reviewed-by: T.J. Mercier

Re: [PATCH net-next] configs/debug: run and debug PREEMPT

2025-04-09 Thread Matthieu Baerts

Hi Jakub,

On 08/04/2025 21:03, Jakub Kicinski wrote:
> On Tue, 8 Apr 2025 20:18:26 +0200 Matthieu Baerts wrote:
>> On 02/04/2025 19:23, Stanislav Fomichev wrote:
>>> Recent change [0] resulted in a "BUG: using __this_cpu_read() in
>>> preemptible" splat [1]. PREEMPT kernels have additional requirements
>>> on what can and can not run with/without preemption enabled.
>>> Expose those constrains in the debug kernels.  
>>
>> Good idea to suggest this to find more bugs!
>>
>> I did some quick tests on my side with our CI, and the MPTCP selftests
>> seem to take a bit more time, but without impacting the results.
>> Hopefully, there will be no impact in slower/busy environments :)
> 
> What kind of slow down do you see? I think we get up to 50% more time
> spent in the longer tests.

That's difficult to measure in our CI because we have a majority of
tests either creating test envs with random parameters (latency, losses,
etc.), or waiting for a transfer at a limited speed to finish. Plus, we
don't control the host running our tests. But if we omit that, our
packetdrill tests take ~20% longer on the CI, and our 2 mini KUnit tests
took ~10% longer (275ms -> 305ms). Globally, our test suite took maybe
~10-20% longer, and that's acceptable.

So not 50%. Is this difference acceptable for NIPA? Even when some tests
are restarted automatically in case of instabilities?

One last thing, Stanislav's patch has been shared during Linus' merge
window: perhaps something else could also impact the time?

> Not sure how bad is too bad..

Did you observe more instabilities? Maybe the individual results should
be omitted, and only debug specific issues (calltraces, kmemleak, etc.)
should be looked at?

> I'm leaning
> towards applying this to net-next and we can see if people running
> on linux-next complain?

Good idea! But I do wonder how run **and monitor** the selftests in
linux-next with a debug kernel :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH RFC v3 3/8] slab: add sheaf support for batching kfree_rcu() operations

2025-04-09 Thread Vlastimil Babka

On 4/9/25 3:50 AM, Harry Yoo wrote:
> On Mon, Mar 17, 2025 at 03:33:04PM +0100, Vlastimil Babka wrote:
> 
> Hmm this hunk in v3 is fine, but on your slub-percpu-shaves-v4r0 branch
> it's calling local_unlock() twice. Probably a rebase error?

Yeah, thanks a lot for catching that! I've just pushed
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-percpu-sheaves-v4r1

with this fixed, and fixups for 2/8 to the points you made, plus a
proper strict_numa handling in 2/8, and an extra patch for better NUMA
locality

> Otherwise looks good to me.
> 
> When you address this, please feel free to add:
> 
> Reviewed-by: Harry Yoo 
> 
> Thanks!
>

[PATCH net 0/2] mptcp: only inc MPJoinAckHMacFailure for HMAC failures

2025-04-09 Thread Matthieu Baerts (NGI0)

Recently, during a debugging session using local MPTCP connections, I
noticed MPJoinAckHMacFailure was strangely not zero on the server side.

The first patch fixes this issue -- present since v5.9 -- and the second
one validates it in the selftests.

Signed-off-by: Matthieu Baerts (NGI0) 
---
Matthieu Baerts (NGI0) (2):
  mptcp: only inc MPJoinAckHMacFailure for HMAC failures
  selftests: mptcp: validate MPJoin HMacFailure counters

 net/mptcp/subflow.c |  8 ++--
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 18 ++
 2 files changed, 24 insertions(+), 2 deletions(-)
---
base-commit: 61f96e684edd28ca40555ec49ea1555df31ba619
change-id: 20250407-net-mptcp-hmac-failure-mib-66f599305ff3

Best regards,
-- 
Matthieu Baerts (NGI0)

Re: [PATCH bpf-next v4 2/3] bpf, sockmap: avoid using sk_socket after free when reading

2025-04-09 Thread Alexei Starovoitov

On Tue, Apr 8, 2025 at 12:31 AM Jiayuan Chen  wrote:
>
> There are potential concurrency issues, as shown below.
> '''
> CPU0   CPU1
> sk_psock_verdict_data_ready:
>   socket *sock = sk->sk_socket
>   if (!sock) return
>close(fd):
>  ...
>  ops->release()
>   if (!sock->ops) return
>  sock->ops = NULL
>  rcu_call(sock)
>  free(sock)
>   READ_ONCE(sock->ops)
>   ^
>   use 'sock' after free
> '''
>
> RCU is not applicable to Unix sockets read path, because the Unix socket
> implementation itself assumes it's always in process context and heavily
> uses mutex_lock, so, we can't call read_skb within rcu lock.
>
> Incrementing the psock reference count would not help either, since
> sock_map_close() does not wait for data_ready() to complete its execution.
>
> While we don't utilize sk_socket here, implementing read_skb at the sock
> layer instead of socket layer might be architecturally preferable ?
> However, deferring this optimization as current fix adequately addresses
> the immediate issue.
>
> Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
> Reported-by: syzbot+dd90a702f518e0eac...@syzkaller.appspotmail.com
> Closes: 
> https://lore.kernel.org/bpf/6734c033.050a0220.2a2fcc.0015@google.com/
> Signed-off-by: Jiayuan Chen 
> ---
>  net/core/skmsg.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 6101c1bb279a..5e913b62929e 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1231,17 +1231,24 @@ static int sk_psock_verdict_recv(struct sock *sk, 
> struct sk_buff *skb)
>
>  static void sk_psock_verdict_data_ready(struct sock *sk)
>  {
> -   struct socket *sock = sk->sk_socket;
> +   struct socket *sock;
> const struct proto_ops *ops;
> int copied;
>
> trace_sk_data_ready(sk);
>
> -   if (unlikely(!sock))
> +   rcu_read_lock();
> +   sock = sk->sk_socket;
> +   if (unlikely(!sock)) {
> +   rcu_read_unlock();
> return;
> +   }
> ops = READ_ONCE(sock->ops);
> -   if (!ops || !ops->read_skb)
> +   if (!ops || !ops->read_skb) {
> +   rcu_read_unlock();
> return;
> +   }
> +   rcu_read_unlock();

This makes no sense to me. RCU doesn't work this way.

pw-bot: cr

[PATCH v8 2/4] KVM: selftests: Add core KVM selftests support for LoongArch

2025-04-09 Thread Bibo Mao

Add core KVM selftests support for LoongArch, it includes exception
handler, mmu page table setup and vcpu startup entry supporting etc.

Signed-off-by: Bibo Mao 
---
 .../selftests/kvm/lib/loongarch/exception.S   |  59 +++
 .../selftests/kvm/lib/loongarch/processor.c   | 349 ++
 2 files changed, 408 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/exception.S
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/processor.c

diff --git a/tools/testing/selftests/kvm/lib/loongarch/exception.S 
b/tools/testing/selftests/kvm/lib/loongarch/exception.S
new file mode 100644
index ..88bfa505c6f5
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/loongarch/exception.S
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include "processor.h"
+
+/* address of refill exception should be 4K aligned */
+.balign4096
+.global handle_tlb_refill
+handle_tlb_refill:
+   csrwr   t0, LOONGARCH_CSR_TLBRSAVE
+   csrrd   t0, LOONGARCH_CSR_PGD
+   lddir   t0, t0, 3
+   lddir   t0, t0, 1
+   ldpte   t0, 0
+   ldpte   t0, 1
+   tlbfill
+   csrrd   t0, LOONGARCH_CSR_TLBRSAVE
+   ertn
+
+   /*
+* save and restore all gprs except base register,
+* and default value of base register is sp ($r3).
+*/
+.macro save_gprs base
+   .irp 
n,1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+   st.d$r\n, \base, 8 * \n
+   .endr
+.endm
+
+.macro restore_gprs base
+   .irp 
n,1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
+   ld.d$r\n, \base, 8 * \n
+   .endr
+.endm
+
+/* address of general exception should be 4K aligned */
+.balign4096
+.global handle_exception
+handle_exception:
+   csrwr  sp, LOONGARCH_CSR_KS0
+   csrrd  sp, LOONGARCH_CSR_KS1
+   addi.d sp, sp, -EXREGS_SIZE
+
+   save_gprs sp
+   /* save sp register to stack */
+   csrrd  t0, LOONGARCH_CSR_KS0
+   st.d   t0, sp, 3 * 8
+
+   csrrd  t0, LOONGARCH_CSR_ERA
+   st.d   t0, sp, PC_OFFSET_EXREGS
+   csrrd  t0, LOONGARCH_CSR_ESTAT
+   st.d   t0, sp, ESTAT_OFFSET_EXREGS
+   csrrd  t0, LOONGARCH_CSR_BADV
+   st.d   t0, sp, BADV_OFFSET_EXREGS
+
+   or a0, sp, zero
+   bl route_exception
+   restore_gprs sp
+   csrrd  sp, LOONGARCH_CSR_KS0
+   ertn
diff --git a/tools/testing/selftests/kvm/lib/loongarch/processor.c 
b/tools/testing/selftests/kvm/lib/loongarch/processor.c
new file mode 100644
index ..25b7f896a107
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/loongarch/processor.c
@@ -0,0 +1,349 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "ucall_common.h"
+
+#define LOONGARCH_GUEST_STACK_VADDR_MIN0x20
+#define LOONARCH_PAGE_TABLE_PHYS_MIN   0x20
+
+static vm_paddr_t invalid_pgtable[4];
+static uint64_t virt_pte_index(struct kvm_vm *vm, vm_vaddr_t gva, int level)
+{
+   unsigned int shift;
+   uint64_t mask;
+
+   shift = level * (vm->page_shift - 3) + vm->page_shift;
+   mask = (1UL << (vm->page_shift - 3)) - 1;
+   return (gva >> shift) & mask;
+}
+
+static uint64_t pte_addr(struct kvm_vm *vm, uint64_t entry)
+{
+   return entry &  ~((0x1UL << vm->page_shift) - 1);
+}
+
+static uint64_t ptrs_per_pte(struct kvm_vm *vm)
+{
+   return 1 << (vm->page_shift - 3);
+}
+
+static void virt_set_pgtable(struct kvm_vm *vm, vm_paddr_t table, vm_paddr_t 
child)
+{
+   uint64_t *ptep;
+   int i, ptrs_per_pte;
+
+   ptep = addr_gpa2hva(vm, table);
+   ptrs_per_pte = 1 << (vm->page_shift - 3);
+   for (i = 0; i < ptrs_per_pte; i++)
+   WRITE_ONCE(*(ptep + i), child);
+}
+
+void virt_arch_pgd_alloc(struct kvm_vm *vm)
+{
+   int i;
+   vm_paddr_t child, table;
+
+   if (vm->pgd_created)
+   return;
+   child = table = 0;
+   for (i = 0; i < vm->pgtable_levels; i++) {
+   invalid_pgtable[i] = child;
+   table = vm_phy_page_alloc(vm, LOONARCH_PAGE_TABLE_PHYS_MIN,
+   vm->memslots[MEM_REGION_PT]);
+   TEST_ASSERT(table, "Fail to allocate page tale at level %d\n", 
i);
+   virt_set_pgtable(vm, table, child);
+   child = table;
+   }
+   vm->pgd = table;
+   vm->pgd_created = true;
+}
+
+static int virt_pte_none(uint64_t *ptep, int level)
+{
+   return *ptep == invalid_pgtable[level];
+}
+
+static uint64_t *virt_populate_pte(struct kvm_vm *vm, vm_vaddr_t gva, int 
alloc)
+{
+   uint64_t *ptep;
+   vm_paddr_t child;
+   int level;
+
+   if (!vm->pgd_created)
+   goto unmapped_gva;
+
+   level = vm->pgtable_levels - 1;
+   child = vm->pgd;
+   while (level > 0) {
+   ptep = addr_gpa2hva(vm, child) + virt_pte_index(vm, gva, l

[PATCH v8 1/4] KVM: selftests: Add KVM selftests header files for LoongArch

2025-04-09 Thread Bibo Mao

Add KVM selftests header files for LoongArch, including processor.h
and kvm_util_base.h. It mainly contains LoongArch CSR register
definition and page table entry definition.

Signed-off-by: Bibo Mao 
---
 .../testing/selftests/kvm/include/kvm_util.h  |   5 +
 .../kvm/include/loongarch/kvm_util_arch.h |   7 +
 .../kvm/include/loongarch/processor.h | 138 ++
 3 files changed, 150 insertions(+)
 create mode 100644 
tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/processor.h

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h 
b/tools/testing/selftests/kvm/include/kvm_util.h
index 373912464fb4..d9cb62207c57 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -232,6 +232,11 @@ extern enum vm_guest_mode vm_mode_default;
 #define MIN_PAGE_SHIFT 12U
 #define ptes_per_page(page_size)   ((page_size) / 8)
 
+#elif defined(__loongarch__)
+#define VM_MODE_DEFAULTVM_MODE_P36V47_16K
+#define MIN_PAGE_SHIFT 12U
+#define ptes_per_page(page_size)   ((page_size) / 8)
+
 #endif
 
 #define VM_SHAPE_DEFAULT   VM_SHAPE(VM_MODE_DEFAULT)
diff --git a/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h 
b/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
new file mode 100644
index ..e43a57d99b56
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_KVM_UTIL_ARCH_H
+#define SELFTEST_KVM_UTIL_ARCH_H
+
+struct kvm_vm_arch {};
+
+#endif  // SELFTEST_KVM_UTIL_ARCH_H
diff --git a/tools/testing/selftests/kvm/include/loongarch/processor.h 
b/tools/testing/selftests/kvm/include/loongarch/processor.h
new file mode 100644
index ..e95dd2059605
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/loongarch/processor.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef SELFTEST_KVM_PROCESSOR_H
+#define SELFTEST_KVM_PROCESSOR_H
+
+#ifndef __ASSEMBLER__
+#include "ucall_common.h"
+
+#else
+/* general registers */
+#define zero   $r0
+#define ra $r1
+#define tp $r2
+#define sp $r3
+#define a0 $r4
+#define a1 $r5
+#define a2 $r6
+#define a3 $r7
+#define a4 $r8
+#define a5 $r9
+#define a6 $r10
+#define a7 $r11
+#define t0 $r12
+#define t1 $r13
+#define t2 $r14
+#define t3 $r15
+#define t4 $r16
+#define t5 $r17
+#define t6 $r18
+#define t7 $r19
+#define t8 $r20
+#define u0 $r21
+#define fp $r22
+#define s0 $r23
+#define s1 $r24
+#define s2 $r25
+#define s3 $r26
+#define s4 $r27
+#define s5 $r28
+#define s6 $r29
+#define s7 $r30
+#define s8 $r31
+#endif
+
+/* LoongArch page table entry definition */
+#define _PAGE_VALID_SHIFT  0
+#define _PAGE_DIRTY_SHIFT  1
+#define _PAGE_PLV_SHIFT2  /* 2~3, two bits */
+#define  PLV_KERN  0
+#define  PLV_USER  3
+#define  PLV_MASK  0x3
+#define _CACHE_SHIFT   4  /* 4~5, two bits */
+#define _PAGE_PRESENT_SHIFT7
+#define _PAGE_WRITE_SHIFT  8
+
+#define _PAGE_VALIDBIT_ULL(_PAGE_VALID_SHIFT)
+#define _PAGE_PRESENT  BIT_ULL(_PAGE_PRESENT_SHIFT)
+#define _PAGE_WRITEBIT_ULL(_PAGE_WRITE_SHIFT)
+#define _PAGE_DIRTYBIT_ULL(_PAGE_DIRTY_SHIFT)
+#define _PAGE_USER (PLV_USER << _PAGE_PLV_SHIFT)
+#define   __READABLE   (_PAGE_VALID)
+#define   __WRITEABLE  (_PAGE_DIRTY | _PAGE_WRITE)
+/* Coherent Cached */
+#define _CACHE_CC  BIT_ULL(_CACHE_SHIFT)
+#define PS_4K  0x000c
+#define PS_8K  0x000d
+#define PS_16K 0x000e
+#define PS_DEFAULT_SIZEPS_16K
+
+/* LoongArch Basic CSR registers */
+#define LOONGARCH_CSR_CRMD 0x0 /* Current mo

Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues

2025-04-09 Thread David Hildenbrand


On 09.04.25 14:07, Michael S. Tsirkin wrote:

On Wed, Apr 09, 2025 at 01:12:19PM +0200, David Hildenbrand wrote:

On 09.04.25 12:56, Michael S. Tsirkin wrote:

On Wed, Apr 09, 2025 at 12:46:41PM +0200, David Hildenbrand wrote:

On 07.04.25 23:20, Michael S. Tsirkin wrote:

On Mon, Apr 07, 2025 at 08:47:05PM +0200, David Hildenbrand wrote:

In my opinion, it makes the most sense to keep the spec as it is and
change QEMU and the kernel to match, but obviously that's not trivial
to do in a way that doesn't break existing devices and drivers.


If only it would be limited to QEMU and Linux ... :)

Out of curiosity, assuming we'd make the spec match the current QEMU/Linux
implementation at least for the 3 involved features only, would there be a
way to adjust crossvm without any disruption?

I still have the feeling that it will be rather hard to get that all
implementations match the spec ... For new features+queues it will be easy
to force the usage of fixed virtqueue numbers, but for free-page-hinting and
reporting, it's a mess :(



Still thinking about a way to fix drivers... We can discuss this
theoretically, maybe?


Yes, absolutely. I took the time to do some more digging; regarding drivers
only Linux seems to be problematic.

virtio-win, FreeBSD, NetBSD and OpenBSD and don't seem to support
problematic features (free page hinting, free page reporting) in their
virtio-balloon implementations.

So from the known drivers, only Linux is applicable.

reporting_vq is either at idx 4/3/2
free_page_vq is either at idx 3/2
statsq is at idx2 (only relevant if the feature is offered)

So if we could test for the existence of a virtqueue at an idx easily, we
could test from highest-to-smallest idx.

But I recall that testing for the existance of a virtqueue on s390x resulted
in the problem/deadlock in the first place ...

--
Cheers,

David / dhildenb


So let's talk about a new feature bit?


Are you thinking about a new feature that switches between "fixed queue
indices" and "compressed queue indices", whereby the latter would be the
legacy default and we would expect all devices to switch to the new
fixed-queue-indices layout?

We could make all new features require "fixed-queue-indices".


I see two ways:
1. we make driver behave correctly with in spec and out of spec devices
and we make qemu behave correctly with in spec and out of spec devices
2. a new feature bit

I prefer 1, and when we add a new feature we can also
document that it should be in spec if negotiated.

My question is if 1 is practical.


AFAIKT, 1) implies:

virtio-balloon:

a) Driver

As mentioned above, we'd need a reliable way to test for the existence 
of a virtqueue, so we can e.g., test for reporting_vq idx 4 -> 3 -> 2


With that we'd be able to support compressed+fixed at the same time.

Q: Is it possible/feasible?

b) Device: virtio-balloon: we can fake existence of STAT and 
FREE_PAGE_HINTING easily, such that the compressed layout corresponds to 
the fixed layout easily.


Q: alternatives? We could try creating multiple queues for the same 
feature, but it's going to be a mess I'm afraid ...



virtio-fs:

a) Driver

Linux does not even implement VIRTIO_FS_F_NOTIFICATION or respect 
VIRTIO_FS_F_NOTIFICATION when calculating queue indices, ...


b) Device

Same applies to virtiofsd ...

Q: Did anybody actually implement VIRTIO_FS_F_NOTIFICATION ever? If not, 
can we just remove it from the spec completely and resolve the issue 
that way?


--
Cheers,

David / dhildenb

Re: [PATCH] compiler.h: Avoid the usage of __typeof_unqual() when GENKSYMS__ is defined

2025-04-09 Thread Borislav Petkov

On Sun, Apr 06, 2025 at 05:36:13PM +0200, Uros Bizjak wrote:
> On Fri, Apr 4, 2025 at 9:14 PM Masahiro Yamada  wrote:
> >
> > On Fri, Apr 4, 2025 at 11:37 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Apr 4, 2025 at 4:06 PM Masahiro Yamada  
> > > wrote:
> > >
> > > > > > > Current version of genksyms doesn't know anything about 
> > > > > > > __typeof_unqual__()
> > > > > > > operator.  Avoid the usage of __typeof_unqual__() with genksyms 
> > > > > > > to prevent
> > > > > > > errors when symbols are versioned.
> > > > > > >
> > > > > > > There were no problems with gendwarfksyms.
> > > > > > >
> > > > > > > Signed-off-by: Uros Bizjak 
> > > > > > > Fixes: ac053946f5c40 ("compiler.h: introduce TYPEOF_UNQUAL() 
> > > > > > > macro")
> > > > > > > Reported-by: Paul Menzel 
> > > > > > > Closes: 
> > > > > > > https://lore.kernel.org/lkml/81a25a60-de78-43fb-b56a-131151e1c...@molgen.mpg.de/
> > > > > > > Cc: Sami Tolvanen 
> > > > > > > Cc: Andrew Morton 
> > > > > > > ---
> > > > > >
> > > > > >
> > > > > > Why don't you add it to the genksyms keyword table?
> > > > >
> > > > > It doesn't work, even if I patch it with an even more elaborate patch
> > > > > (attached).
> > > > >
> > > > > I guess some more surgery will be needed, but for now a fallback works
> > > > > as expected.
> > > > >
> > > > > Uros.
> > > >
> > > > The attached patch looks good to me.
> > >
> > > FAOD - do you refer to the submitted one for compiler.h or to the one
> > > for scripts/genksyms/keywords.c? (The latter doesn't fix the warning,
> > > though).
> >
> >
> >
> > You are still seeing the warnings because __typeof_unqual__
> > is not only the issue.
> >
> > Hint:
> >
> > $ make -s KCFLAGS=-D__GENKSYMS__  arch/x86/kernel/setup_percpu.i
> > $ grep  'this_cpu_off;'  arch/x86/kernel/setup_percpu.i
> 
> I see.
> 
> With my workaround, this_cpu_off is declared as:
> 
> extern __attribute__((section(".data..percpu" "..hot.."
> "this_cpu_off"))) __typeof__(unsigned long) this_cpu_off;
> 
> while without workaround, the same variable is declared as:
> 
> extern __seg_gs __attribute__((section(".data..percpu" "..hot.."
> "this_cpu_off"))) __typeof__(unsigned long) this_cpu_off;
> 
> It looks that genksyms should be extended to handle (or ignore)
> __seg_gs/__seg_fs named address prefix. Somewhat surprising, because
> genksyms can process:
> 
> extern __attribute__((section(".data..percpu" "..hot.."
> "const_current_task"))) __typeof__(struct task_struct * const
> __seg_gs) const_current_task
> 
> without problems.
> 
> I'm sorry, but I'm not able to extend genksyms with a new keyword by myself...

Well, we need a fix here because this fires a lot by now - triggers on my
machines now too.

So either take a fix or we'll need to revert until it is fixed properly.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

[PATCH v5 1/2] mm/vmscan: Skip memcg with !usage in shrink_node_memcgs()

2025-04-09 Thread Waiman Long

The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that two of its test child cgroups which
have a memmory.low of 0 or an effective memory.low of 0 still have low
events generated for them since mem_cgroup_below_low() use the ">="
operator when comparing to elow.

The two failed use cases are as follows:

1) memory.low is set to 0, but low events can still be triggered and
   so the cgroup may have a non-zero low event count. I doubt users are
   looking for that as they didn't set memory.low at all.

2) memory.low is set to a non-zero value but the cgroup has no task in
   it so that it has an effective low value of 0. Again it may have a
   non-zero low event count if memory reclaim happens. This is probably
   not a result expected by the users and it is really doubtful that
   users will check an empty cgroup with no task in it and expecting
   some non-zero event counts.

In the first case, even though memory.low isn't set, it may still have
some low protection if memory.low is set in the parent. So low event may
still be recorded. The test_memcontrol.c test has to be modified to
account for that.

For the second case, it really doesn't make sense to have non-zero low
event if the cgroup has 0 usage. So we need to skip this corner case
in shrink_node_memcgs() using mem_cgroup_usage(). The mem_cgroup_usage()
function declaration is moved from mm/memcontrol-v1.h to mm/internal.h
with the !CONFIG_MEMCG case defined as always true.

With this patch applied, the test_memcg_low sub-test finishes
successfully without failure in most cases. Though both test_memcg_low
and test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.

Suggested-by: Johannes Weiner 
Signed-off-by: Waiman Long 
---
 mm/internal.h| 9 +
 mm/memcontrol-v1.h   | 2 --
 mm/vmscan.c  | 4 
 tools/testing/selftests/cgroup/test_memcontrol.c | 7 ++-
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 50c2f590b2d0..c06fb0e8d75c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1535,6 +1535,15 @@ void __meminit __init_page_from_nid(unsigned long pfn, 
int nid);
 unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
  int priority);
 
+#ifdef CONFIG_MEMCG
+unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
+#else
+static inline unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool 
swap)
+{
+   return 1UL;
+}
+#endif
+
 #ifdef CONFIG_SHRINKER_DEBUG
 static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
struct shrinker *shrinker, const char *fmt, va_list ap)
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb416..e92b21af92b1 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -22,8 +22,6 @@
 iter != NULL;  \
 iter = mem_cgroup_iter(NULL, iter, NULL))
 
-unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
-
 void drain_all_stock(struct mem_cgroup *root_memcg);
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..a771a0145a12 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct 
scan_control *sc)
 
mem_cgroup_calculate_protection(target_memcg, memcg);
 
+   /* Skip memcg with no usage */
+   if (!mem_cgroup_usage(memcg, false))
+   continue;
+
if (mem_cgroup_below_min(target_memcg, memcg)) {
/*
 * Hard protection.
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c 
b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..bab826b6b7b0 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -525,8 +525,13 @@ static int test_memcg_protection(const char *root, bool 
min)
goto cleanup;
}
 
+   /*
+* Child 2 has memory.low=0, but some low protection is still being
+* distributed down from its parent with memory.low=50M. So the low
+* event count will be non-zero.
+*/
for (i = 0; i < ARRAY_SIZE(children); i++) {
-   int no_low_events_index = 1;
+   int no_low_events_index = 2;
long low, oom;
 
oom = cg_read_key_long(children[i], "memory.events", "oom ");
-- 
2.48.1

Re: [PATCH] ASoC: cs-amp-lib-test: Don't select SND_SOC_CS_AMP_LIB

2025-04-09 Thread Richard Fitzgerald


On 09/04/2025 3:24 pm, Mark Brown wrote:

On Wed, Apr 09, 2025 at 11:45:44AM +0100, Richard Fitzgerald wrote:

Depend on SND_SOC_CS_AMP_LIB instead of selecting it.

KUNIT_ALL_TESTS should only build tests for components that are
already being built, it should not cause other stuff to be added
to the build.



  config SND_SOC_CS_AMP_LIB_TEST
-   tristate "KUnit test for Cirrus Logic cs-amp-lib"
-   depends on KUNIT
+   tristate "KUnit test for Cirrus Logic cs-amp-lib" if !KUNIT_ALL_TESTS
+   depends on SND_SOC_CS_AMP_LIB && KUNIT
default KUNIT_ALL_TESTS
-   select SND_SOC_CS_AMP_LIB
help
  This builds KUnit tests for the Cirrus Logic common
  amplifier library.


This by itself results in the Cirrus tests being removed from a kunit
--alltests run which is a regression in coverage.  I'd expect to see
some corresponding updates in the KUnit all_tests.config to keep them
enabled.


That's the defined behaviour of KUNIT_ALL_TESTS. It shouldn't have been
running as part of an alltests if nothing had selected it. That seems to
make people angry. Probably the same people who would complain if there
was a bug in the code that they didn't want to test.

Re: [PATCH] selftests/sgx: Fix an enclave built with extended instructions

2025-04-09 Thread Dave Hansen

On 4/9/25 09:55, Vladis Dronov wrote:
...
> Fix this by adding "-mno-avx" to ENCL_CFLAGS in Makefile. Add some comments
> about this to code locations where enclave's xfrm field is set.
> 
> Suggested-by: Dave Hansen 
> Signed-off-by: Vladis Dronov 
First of all, this looks fine to me:

Acked-by: Dave Hansen 

The code comments are fine. I'm much less picky about selftests.

I'm also open to other solutions here. We could, for instance, set
xfrm=7 to allow AVX2 instructions (which are generated by
"--with-arch_64=x86-64-v3") or use some compiler flags other than
"-mno-avx".

But "-mno-avx" does seem good to me.

Re: [PATCH] selftests/sgx: Fix an enclave built with extended instructions

2025-04-09 Thread Vladis Dronov

On Wed, Apr 9, 2025 at 7:07 PM Dave Hansen  wrote:
>
> On 4/9/25 09:55, Vladis Dronov wrote:
> ...
> > Fix this by adding "-mno-avx" to ENCL_CFLAGS in Makefile. Add some comments
> > about this to code locations where enclave's xfrm field is set.
> >
> > Suggested-by: Dave Hansen 
> > Signed-off-by: Vladis Dronov 
>
> First of all, this looks fine to me:
>
> Acked-by: Dave Hansen 
>
> The code comments are fine. I'm much less picky about selftests.
>
> I'm also open to other solutions here. We could, for instance, set
> xfrm=7 to allow AVX2 instructions (which are generated by
> "--with-arch_64=x86-64-v3") or use some compiler flags other than
> "-mno-avx".
>
> But "-mno-avx" does seem good to me.
>

Thank you for the ACK, Dave.

I've tested an enclave with xfrm=7 and it errors out at the AVX512
instruction (namely, vmovdqa64) in the same way. So if there is a
compiler built with "--with-arch_64=x86-64-v4" in some future, we
would get into the exact same situation.

So I believe a solution when we disable extended instruction sets
in an enclave as one covering all future cases.

Re: [PATCH] selftests/mm: Fix compiler -Wmaybe-uninitialized warning

2025-04-09 Thread Anshuman Khandual




On 4/9/25 15:51, David Hildenbrand wrote:
> On 09.04.25 12:09, Anshuman Khandual wrote:
>>
>>
>> On 4/9/25 15:27, David Hildenbrand wrote:
>>> On 09.04.25 11:50, Anshuman Khandual wrote:
 Following build warning comes up for cow test as 'transferred' variable has
 not been initialized. Fix the warning via zero init for the variable.

     CC   cow
 cow.c: In function ‘do_test_vmsplice_in_parent’:
 cow.c:365:61: warning: ‘transferred’ may be used uninitialized 
 [-Wmaybe-uninitialized]
     365 | cur = read(fds[0], new + total, transferred - 
 total);
     | 
 ^~~
 cow.c:296:29: note: ‘transferred’ was declared here
     296 | ssize_t cur, total, transferred;
     | ^~~
     CC   compaction_test
     CC   gup_longterm

 Cc: Andrew Morton 
 Cc: Shuah Khan 
 Cc: linux...@kvack.org
 Cc: linux-kselft...@vger.kernel.org
 Cc: linux-kernel@vger.kernel.org
 Signed-off-by: Anshuman Khandual 
 ---
    tools/testing/selftests/mm/cow.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/tools/testing/selftests/mm/cow.c 
 b/tools/testing/selftests/mm/cow.c
 index f0cb14ea8608..b6cfe0a4b7df 100644
 --- a/tools/testing/selftests/mm/cow.c
 +++ b/tools/testing/selftests/mm/cow.c
 @@ -293,7 +293,7 @@ static void do_test_vmsplice_in_parent(char *mem, 
 size_t size,
    .iov_base = mem,
    .iov_len = size,
    };
 -    ssize_t cur, total, transferred;
 +    ssize_t cur, total, transferred = 0;
    struct comm_pipes comm_pipes;
    char *old, *new;
    int ret, fds[2];
>>>
>>>
>>> if (before_fork) {
>>>  transferred = vmsplice(fds[1], &iov, 1, 0);
>>> ...
>>>
>>> if (!before_fork) {
>>>  transferred = vmsplice(fds[1], &iov, 1, 0);
>>> ...
>>>
>>> for (total = 0; total < transferred; total += cur) {
>>> ...
>>>
>>>
>>> And I don't see any jump label that could jump to code that would ve using 
>>> transferred.
>>>
>>> What am I missing?
>>
>> Probably because both those conditional statements are not mutually
>> exclusive above with an if-else construct. Hence compiler flags it
>> rather as a false positive ? Initializing with 0 just works around
>> that false positive.
> 
> This is something the compiler should clearly be able to verify. before_fork 
> is never changed in that function.
> 
> We should not work around wrong compilers.
> 
> Which compiler are you using such that you run into this issue?

gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

[PATCH v4] selftests/x86/lam: fix resource leak in do_uring() and allocate_dsa_pasid()

2025-04-09 Thread Malaya Kumar Rout

This patch resolves resource leaks reported by cppcheck in lam.c.
Specifically, the 'file_fd' and 'fd' were not closed in do_uring()
and allocate_dsa_pasid() functions, respectively.

cppcheck output before this patch:
tools/testing/selftests/x86/lam.c:685:3: error: Resource leak: file_fd 
[resourceLeak]
tools/testing/selftests/x86/lam.c:693:3: error: Resource leak: file_fd 
[resourceLeak]
tools/testing/selftests/x86/lam.c:1195:2: error: Resource leak: fd 
[resourceLeak]

cppcheck output after this patch:
No resource leaks found

Signed-off-by: Malaya Kumar Rout 
---
 tools/testing/selftests/x86/lam.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c 
b/tools/testing/selftests/x86/lam.c
index 18d736640ece..0873b0e5f48b 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -682,7 +682,7 @@ int do_uring(unsigned long lam)
return 1;
 
if (fstat(file_fd, &st) < 0)
-   return 1;
+   goto cleanup;
 
off_t file_sz = st.st_size;
 
@@ -690,7 +690,7 @@ int do_uring(unsigned long lam)
 
fi = malloc(sizeof(*fi) + sizeof(struct iovec) * blocks);
if (!fi)
-   return 1;
+   goto cleanup;
 
fi->file_sz = file_sz;
fi->file_fd = file_fd;
@@ -698,7 +698,7 @@ int do_uring(unsigned long lam)
ring = malloc(sizeof(*ring));
if (!ring) {
free(fi);
-   return 1;
+   goto cleanup;
}
 
memset(ring, 0, sizeof(struct io_ring));
@@ -729,6 +729,8 @@ int do_uring(unsigned long lam)
}
 
free(fi);
+cleanup:
+   close(file_fd);
 
return ret;
 }
@@ -1189,6 +1191,7 @@ void *allocate_dsa_pasid(void)
 
wq = mmap(NULL, 0x1000, PROT_WRITE,
   MAP_SHARED | MAP_POPULATE, fd, 0);
+   close(fd);
if (wq == MAP_FAILED)
perror("mmap");
 
-- 
2.43.0

Re: [PATCH v2] selftests: pid_namespace: Add missing sys/mount.h

2025-04-09 Thread T.J. Mercier

On Wed, Apr 9, 2025 at 1:53 AM Peter Seiderer  wrote:
>
> Hello T.J.,
>
> On Tue,  8 Apr 2025 23:02:02 +, "T.J. Mercier"  
> wrote:
>
> > pid_max.c: In function ‘pid_max_cb’:
> > pid_max.c:42:15: error: implicit declaration of function ‘mount’
> >[-Wimplicit-function-declaration]
> >42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
> >   |   ^
> > pid_max.c:42:36: error: ‘MS_PRIVATE’ undeclared (first use in this
> >   function); did you mean ‘MAP_PRIVATE’?
> >42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
> >   |^~
> >   |MAP_PRIVATE
> > pid_max.c:42:49: error: ‘MS_REC’ undeclared (first use in this function)
> >42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
> >   | ^~
> > pid_max.c:48:9: error: implicit declaration of function ‘umount2’; did
> >you mean ‘SYS_umount2’? [-Wimplicit-function-declaration]
> >48 | umount2("/proc", MNT_DETACH);
> >   | ^~~
> >   | SYS_umount2
> > pid_max.c:48:26: error: ‘MNT_DETACH’ undeclared (first use in this
> >function)
> >48 | umount2("/proc", MNT_DETACH);
> >
> > Fixes: 615ab43b838b ("tests/pid_namespace: add pid_max tests")
> > Signed-off-by: T.J. Mercier 
> > ---
> >  tools/testing/selftests/pid_namespace/pid_max.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/tools/testing/selftests/pid_namespace/pid_max.c 
> > b/tools/testing/selftests/pid_namespace/pid_max.c
> > index 51c414faabb0..96f274f0582b 100644
> > --- a/tools/testing/selftests/pid_namespace/pid_max.c
> > +++ b/tools/testing/selftests/pid_namespace/pid_max.c
> > @@ -10,6 +10,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >
> >  #include "../kselftest_harness.h"
>
> Predated patch already available, see
>
> 
> https://lore.kernel.org/linux-kselftest/20250115105211.390370-3-ps.rep...@gmx.net/
>
> Regards,
> Peter

Thanks Peter, looks like that was never taken into anybody's tree?

You can have my:
Reviewed-by: T.J. Mercier

Re: [PATCH 2/4] sysctl: Add 0012 to test the u8 range check

2025-04-09 Thread Kees Cook

On Fri, Mar 21, 2025 at 01:47:25PM +0100, Joel Granados wrote:
> Add a sysctl test that uses the new u8 test ctl files in a created by
> the sysctl test module. Check that the u8 proc file that is valid is
> created and that there are two messages in dmesg for the files that were
> out of range.
> 
> Signed-off-by: Joel Granados 

Reviewed-by: Kees Cook 

-- 
Kees Cook

Re: [PATCH v2] selftests: pid_namespace: Add missing sys/mount.h

2025-04-09 Thread Peter Seiderer

Hello T.J.,

On Tue,  8 Apr 2025 23:02:02 +, "T.J. Mercier"  wrote:

> pid_max.c: In function ‘pid_max_cb’:
> pid_max.c:42:15: error: implicit declaration of function ‘mount’
>[-Wimplicit-function-declaration]
>42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
>   |   ^
> pid_max.c:42:36: error: ‘MS_PRIVATE’ undeclared (first use in this
>   function); did you mean ‘MAP_PRIVATE’?
>42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
>   |^~
>   |MAP_PRIVATE
> pid_max.c:42:49: error: ‘MS_REC’ undeclared (first use in this function)
>42 | ret = mount("", "/", NULL, MS_PRIVATE | MS_REC, 0);
>   | ^~
> pid_max.c:48:9: error: implicit declaration of function ‘umount2’; did
>you mean ‘SYS_umount2’? [-Wimplicit-function-declaration]
>48 | umount2("/proc", MNT_DETACH);
>   | ^~~
>   | SYS_umount2
> pid_max.c:48:26: error: ‘MNT_DETACH’ undeclared (first use in this
>function)
>48 | umount2("/proc", MNT_DETACH);
> 
> Fixes: 615ab43b838b ("tests/pid_namespace: add pid_max tests")
> Signed-off-by: T.J. Mercier 
> ---
>  tools/testing/selftests/pid_namespace/pid_max.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/pid_namespace/pid_max.c 
> b/tools/testing/selftests/pid_namespace/pid_max.c
> index 51c414faabb0..96f274f0582b 100644
> --- a/tools/testing/selftests/pid_namespace/pid_max.c
> +++ b/tools/testing/selftests/pid_namespace/pid_max.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "../kselftest_harness.h"

Predated patch already available, see


https://lore.kernel.org/linux-kselftest/20250115105211.390370-3-ps.rep...@gmx.net/

Regards,
Peter

RE: spdxcheck: python git module considered harmful (was RE: [PATCH] scripts/spdxcheck: Limit the scope of git.Repo)

2025-04-09 Thread Bird, Tim

> -Original Message-
> From: Thomas Gleixner 
> On Tue, Apr 08 2025 at 17:34, Tim Bird wrote:
> >> -Original Message-
> > For what it's worth, I've always been a bit skeptical of the use of the 
> > python git module
> > in spdxcheck.py.  Its use makes it impossible to use spdxcheck on a kernel 
> > source tree
> > from a tarball (ie, on source not inside a git repo).  Also, from what I 
> > can see in spdxcheck.py,
> > the way it's used is just to get the top directories for either the 
> > LICENSES dir,
> > the top dir of the kernel source tree, or the directory to scan passed on 
> > the
> > spdxcheck.py command line, and then to use the repo.traverse() function on 
> > said directory.
> >
> > This ends up excluding any files in the source directory tree that are not 
> > checked
> > into git yet, silently skipping them (which I've run into before when
> > using the tool).
> 
> The exactly same problem exists the other way round. Run an
> unconstrained version of spdxcheck on a dirty source tree with lots of
> leftovers, then it scans nonsense all the way instead of skipping some
> not yet git tracked files.

Yeah.  I thought about this overnight, and came to the same conclusion.
I forgot that most people build the kernel in a way
that the build results end up in the source tree.  (Crazy, right?)
I almost always use KBUILD_OUTPUT, and I always use it when I'm doing
embedded and spdx-related work, so I don't often run into build
contamination of the source tree.

> 
> The easiest way for me to achieve that was using git to exclude all of
> the irrelevant noise, which I still consider to be a reasonable design
> decision.

Agreed.  Given common build practices, this is a reasonable design decision.
I thought there might be a good reason for this design choice, and
was hoping you would respond, Thomas.  Thanks for the quick feedback.

> 
> And yes, it ignores not yet tracked files, but if you want to check
> them, then it's easy enough to commit them temporarily or provide a
> dedicated file target to the tools, which ignores git.

OK.  Yes. That's an easy workaround.

> 
> > I think the code could be relatively easily refactored to eliminate the use 
> > of the git
> > module, to overcome these issues.  I'm not sure if removing the module would
> > eliminate the yield operation (used inside repo.traverse()), which seems to 
> > be causing the
> > problem found here.  IMHO, in my experience when using python it is helpful
> > to use as few non-core modules as possible, because they tend to break like 
> > this
> > occasionally.
> >
> > Let me know if anyone objects to me working up a refactoring of spdxcheck.py
> > eliminating the use of the python 'git' module, and submitting it for 
> > review.
> 
> I have no objections at all as long as it gives the same result of not
> trying to scan random artifacts which might sit around in a source tree.
> 
> But not for the price that I have to create a tarball or a pristine
> checked out tree first to run it. That'd be a usability regression to
> begin with.

Agreed.
> 
> Good luck for coming up with a clever and clean solution for that!

I thought about various solutions for this, but each one I came up
with had other drawbacks.  If it was just a matter of separating 
*.[chS] files from ELF object files, that would be easy to deal with.
But we put SPDX headers on all kinds of files, and there are lots
of other types of files generated during a build that are not just
ELF objects.  And build rules change over time.  So even if I made
a comprehensive system today to catch build-generated outliers,
the solution would probably need constant updating and tweaking, which
IMHO makes it a no-go.

> 
> Just for the record: I rather wish that people would contribute to
> eliminate the remaining 17% (15397 files) which do not have SPDX
> identifiers than complaining about the trivial to solve short-comings of
> the tool, which was written to help this effort and to make sure that it
> does not degrade.

I agree with this.  Analyzing where the headers are missing is interesting.
But it's more important to just fix the missing ones.
I'll spend more of my time working on missing headers,
rather than on tools to analyze and report them.

Thanks and regards,
 -- Tim

[PATCH] selftests/bpf: Fix bpf_nf selftest failure

2025-04-09 Thread Saket Kumar Bhaskar

For systems with missing iptables-legacy tool, this selftest fails.

Add check to find if iptables-legacy tool is available and skip the
test if the tool is missing.

Fixes: de9c8d848d90 ("selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf 
and xdp_synproxy test")
Signed-off-by: Saket Kumar Bhaskar 
---
 tools/testing/selftests/bpf/prog_tests/bpf_nf.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
index dbd13f8e42a7..dd6512fa652b 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
@@ -63,6 +63,12 @@ static void test_bpf_nf_ct(int mode)
.repeat = 1,
);
 
+   if (SYS_NOFAIL("iptables-legacy --version")) {
+   fprintf(stdout, "Missing required iptables-legacy tool\n");
+   test__skip();
+   return;
+   }
+
skel = test_bpf_nf__open_and_load();
if (!ASSERT_OK_PTR(skel, "test_bpf_nf__open_and_load"))
return;
-- 
2.43.5

Re: [PATCH] selftests/mm: Fix compiler -Wmaybe-uninitialized warning

2025-04-09 Thread David Hildenbrand


On 09.04.25 11:50, Anshuman Khandual wrote:

Following build warning comes up for cow test as 'transferred' variable has
not been initialized. Fix the warning via zero init for the variable.

   CC   cow
cow.c: In function ‘do_test_vmsplice_in_parent’:
cow.c:365:61: warning: ‘transferred’ may be used uninitialized 
[-Wmaybe-uninitialized]
   365 | cur = read(fds[0], new + total, transferred - total);
   | ^~~
cow.c:296:29: note: ‘transferred’ was declared here
   296 | ssize_t cur, total, transferred;
   | ^~~
   CC   compaction_test
   CC   gup_longterm

Cc: Andrew Morton 
Cc: Shuah Khan 
Cc: linux...@kvack.org
Cc: linux-kselft...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
  tools/testing/selftests/mm/cow.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index f0cb14ea8608..b6cfe0a4b7df 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -293,7 +293,7 @@ static void do_test_vmsplice_in_parent(char *mem, size_t 
size,
.iov_base = mem,
.iov_len = size,
};
-   ssize_t cur, total, transferred;
+   ssize_t cur, total, transferred = 0;
struct comm_pipes comm_pipes;
char *old, *new;
int ret, fds[2];



if (before_fork) {
transferred = vmsplice(fds[1], &iov, 1, 0);
...

if (!before_fork) {
transferred = vmsplice(fds[1], &iov, 1, 0);
...

for (total = 0; total < transferred; total += cur) {
...


And I don't see any jump label that could jump to code that would ve 
using transferred.


What am I missing?

--
Cheers,

David / dhildenb

Re: [PATCH v8 8/8] vhost: Add a KConfig knob to enable IOCTL VHOST_FORK_FROM_OWNER

2025-04-09 Thread Cindy Lu

On Tue, Apr 8, 2025 at 7:56 PM Michael S. Tsirkin  wrote:
>
> On Fri, Mar 28, 2025 at 06:02:52PM +0800, Cindy Lu wrote:
> > Introduce a new config knob `CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL`,
> > to control the availability of the `VHOST_FORK_FROM_OWNER` ioctl.
> > When CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL is set to n, the ioctl
> > is disabled, and any attempt to use it will result in failure.
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >  drivers/vhost/Kconfig | 15 +++
> >  drivers/vhost/vhost.c |  3 +++
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> > index b455d9ab6f3d..e5b9dcbf31b6 100644
> > --- a/drivers/vhost/Kconfig
> > +++ b/drivers/vhost/Kconfig
> > @@ -95,3 +95,18 @@ config VHOST_CROSS_ENDIAN_LEGACY
> > If unsure, say "N".
> >
> >  endif
> > +
> > +config VHOST_ENABLE_FORK_OWNER_IOCTL
> > + bool "Enable IOCTL VHOST_FORK_FROM_OWNER"
> > + default n
> > + help
> > +   This option enables the IOCTL VHOST_FORK_FROM_OWNER, which allows
> > +   userspace applications to modify the thread mode for vhost devices.
>
> ok
>
> > +  By default, `CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL` is set to `n`,
> > +  meaning the ioctl is disabled and any operation using this ioctl
> > +  will fail.
> > +  When the configuration is enabled (y), the ioctl becomes
> > +  available, allowing users to set the mode if needed.
>
> no need to be so verbose - the disabled beavious belongs in commit log
> not here.
>
> Also either ioctl or IOCTL but not both.
>
sure, will change this
Thanks
cindy
> > +
> > +   If unsure, say "N".
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index fb0c7fb43f78..568e43cb54a9 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -2294,6 +2294,8 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned 
> > int ioctl, void __user *argp)
> >   r = vhost_dev_set_owner(d);
> >   goto done;
> >   }
> > +
> > +#ifdef CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL
> >   if (ioctl == VHOST_FORK_FROM_OWNER) {
> >   u8 inherit_owner;
> >   /*inherit_owner can only be modified before owner is set*/
> > @@ -2313,6 +2315,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned 
> > int ioctl, void __user *argp)
> >   r = 0;
> >   goto done;
> >   }
> > +#endif
> >   /* You must be the owner to do anything else */
> >   r = vhost_dev_check_owner(d);
> >   if (r)
> > --
> > 2.45.0
>

Re: [PATCH] selftests/mm: Fix compiler -Wmaybe-uninitialized warning

2025-04-09 Thread Anshuman Khandual




On 4/9/25 15:27, David Hildenbrand wrote:
> On 09.04.25 11:50, Anshuman Khandual wrote:
>> Following build warning comes up for cow test as 'transferred' variable has
>> not been initialized. Fix the warning via zero init for the variable.
>>
>>    CC   cow
>> cow.c: In function ‘do_test_vmsplice_in_parent’:
>> cow.c:365:61: warning: ‘transferred’ may be used uninitialized 
>> [-Wmaybe-uninitialized]
>>    365 | cur = read(fds[0], new + total, transferred - 
>> total);
>>    | ^~~
>> cow.c:296:29: note: ‘transferred’ was declared here
>>    296 | ssize_t cur, total, transferred;
>>    | ^~~
>>    CC   compaction_test
>>    CC   gup_longterm
>>
>> Cc: Andrew Morton 
>> Cc: Shuah Khan 
>> Cc: linux...@kvack.org
>> Cc: linux-kselft...@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual 
>> ---
>>   tools/testing/selftests/mm/cow.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/mm/cow.c 
>> b/tools/testing/selftests/mm/cow.c
>> index f0cb14ea8608..b6cfe0a4b7df 100644
>> --- a/tools/testing/selftests/mm/cow.c
>> +++ b/tools/testing/selftests/mm/cow.c
>> @@ -293,7 +293,7 @@ static void do_test_vmsplice_in_parent(char *mem, size_t 
>> size,
>>   .iov_base = mem,
>>   .iov_len = size,
>>   };
>> -    ssize_t cur, total, transferred;
>> +    ssize_t cur, total, transferred = 0;
>>   struct comm_pipes comm_pipes;
>>   char *old, *new;
>>   int ret, fds[2];
> 
> 
> if (before_fork) {
> transferred = vmsplice(fds[1], &iov, 1, 0);
> ...
> 
> if (!before_fork) {
> transferred = vmsplice(fds[1], &iov, 1, 0);
> ...
> 
> for (total = 0; total < transferred; total += cur) {
> ...
> 
> 
> And I don't see any jump label that could jump to code that would ve using 
> transferred.
> 
> What am I missing?

Probably because both those conditional statements are not mutually
exclusive above with an if-else construct. Hence compiler flags it
rather as a false positive ? Initializing with 0 just works around
that false positive.

[PATCH] selftests/mm: Fix compiler -Wmaybe-uninitialized warning

2025-04-09 Thread Anshuman Khandual

Following build warning comes up for cow test as 'transferred' variable has
not been initialized. Fix the warning via zero init for the variable.

  CC   cow
cow.c: In function ‘do_test_vmsplice_in_parent’:
cow.c:365:61: warning: ‘transferred’ may be used uninitialized 
[-Wmaybe-uninitialized]
  365 | cur = read(fds[0], new + total, transferred - total);
  | ^~~
cow.c:296:29: note: ‘transferred’ was declared here
  296 | ssize_t cur, total, transferred;
  | ^~~
  CC   compaction_test
  CC   gup_longterm

Cc: Andrew Morton 
Cc: Shuah Khan 
Cc: linux...@kvack.org
Cc: linux-kselft...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 tools/testing/selftests/mm/cow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index f0cb14ea8608..b6cfe0a4b7df 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -293,7 +293,7 @@ static void do_test_vmsplice_in_parent(char *mem, size_t 
size,
.iov_base = mem,
.iov_len = size,
};
-   ssize_t cur, total, transferred;
+   ssize_t cur, total, transferred = 0;
struct comm_pipes comm_pipes;
char *old, *new;
int ret, fds[2];
-- 
2.43.0

Re: [PATCH] selftests/mm: Fix compiler -Wmaybe-uninitialized warning

2025-04-09 Thread David Hildenbrand


On 09.04.25 12:09, Anshuman Khandual wrote:



On 4/9/25 15:27, David Hildenbrand wrote:

On 09.04.25 11:50, Anshuman Khandual wrote:

Following build warning comes up for cow test as 'transferred' variable has
not been initialized. Fix the warning via zero init for the variable.

    CC   cow
cow.c: In function ‘do_test_vmsplice_in_parent’:
cow.c:365:61: warning: ‘transferred’ may be used uninitialized 
[-Wmaybe-uninitialized]
    365 | cur = read(fds[0], new + total, transferred - total);
    | ^~~
cow.c:296:29: note: ‘transferred’ was declared here
    296 | ssize_t cur, total, transferred;
    | ^~~
    CC   compaction_test
    CC   gup_longterm

Cc: Andrew Morton 
Cc: Shuah Khan 
Cc: linux...@kvack.org
Cc: linux-kselft...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
   tools/testing/selftests/mm/cow.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index f0cb14ea8608..b6cfe0a4b7df 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -293,7 +293,7 @@ static void do_test_vmsplice_in_parent(char *mem, size_t 
size,
   .iov_base = mem,
   .iov_len = size,
   };
-    ssize_t cur, total, transferred;
+    ssize_t cur, total, transferred = 0;
   struct comm_pipes comm_pipes;
   char *old, *new;
   int ret, fds[2];



if (before_fork) {
 transferred = vmsplice(fds[1], &iov, 1, 0);
...

if (!before_fork) {
 transferred = vmsplice(fds[1], &iov, 1, 0);
...

for (total = 0; total < transferred; total += cur) {
...


And I don't see any jump label that could jump to code that would ve using 
transferred.

What am I missing?


Probably because both those conditional statements are not mutually
exclusive above with an if-else construct. Hence compiler flags it
rather as a false positive ? Initializing with 0 just works around
that false positive.


This is something the compiler should clearly be able to verify. 
before_fork is never changed in that function.


We should not work around wrong compilers.

Which compiler are you using such that you run into this issue?

--
Cheers,

David / dhildenb

Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues

2025-04-09 Thread David Hildenbrand


On 07.04.25 23:20, Michael S. Tsirkin wrote:

On Mon, Apr 07, 2025 at 08:47:05PM +0200, David Hildenbrand wrote:

In my opinion, it makes the most sense to keep the spec as it is and
change QEMU and the kernel to match, but obviously that's not trivial
to do in a way that doesn't break existing devices and drivers.


If only it would be limited to QEMU and Linux ... :)

Out of curiosity, assuming we'd make the spec match the current QEMU/Linux
implementation at least for the 3 involved features only, would there be a
way to adjust crossvm without any disruption?

I still have the feeling that it will be rather hard to get that all
implementations match the spec ... For new features+queues it will be easy
to force the usage of fixed virtqueue numbers, but for free-page-hinting and
reporting, it's a mess :(



Still thinking about a way to fix drivers... We can discuss this
theoretically, maybe?


Yes, absolutely. I took the time to do some more digging; regarding 
drivers only Linux seems to be problematic.


virtio-win, FreeBSD, NetBSD and OpenBSD and don't seem to support 
problematic features (free page hinting, free page reporting) in their 
virtio-balloon implementations.


So from the known drivers, only Linux is applicable.

reporting_vq is either at idx 4/3/2
free_page_vq is either at idx 3/2
statsq is at idx2 (only relevant if the feature is offered)

So if we could test for the existence of a virtqueue at an idx easily, 
we could test from highest-to-smallest idx.


But I recall that testing for the existance of a virtqueue on s390x 
resulted in the problem/deadlock in the first place ...


--
Cheers,

David / dhildenb

Re: [PATCH] selftests/mm: Fix compiler -Wmaybe-uninitialized warning

2025-04-09 Thread David Hildenbrand


On 09.04.25 12:25, Anshuman Khandual wrote:



On 4/9/25 15:51, David Hildenbrand wrote:

On 09.04.25 12:09, Anshuman Khandual wrote:



On 4/9/25 15:27, David Hildenbrand wrote:

On 09.04.25 11:50, Anshuman Khandual wrote:

Following build warning comes up for cow test as 'transferred' variable has
not been initialized. Fix the warning via zero init for the variable.

     CC   cow
cow.c: In function ‘do_test_vmsplice_in_parent’:
cow.c:365:61: warning: ‘transferred’ may be used uninitialized 
[-Wmaybe-uninitialized]
     365 | cur = read(fds[0], new + total, transferred - total);
     | ^~~
cow.c:296:29: note: ‘transferred’ was declared here
     296 | ssize_t cur, total, transferred;
     | ^~~
     CC   compaction_test
     CC   gup_longterm

Cc: Andrew Morton 
Cc: Shuah Khan 
Cc: linux...@kvack.org
Cc: linux-kselft...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
    tools/testing/selftests/mm/cow.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index f0cb14ea8608..b6cfe0a4b7df 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -293,7 +293,7 @@ static void do_test_vmsplice_in_parent(char *mem, size_t 
size,
    .iov_base = mem,
    .iov_len = size,
    };
-    ssize_t cur, total, transferred;
+    ssize_t cur, total, transferred = 0;
    struct comm_pipes comm_pipes;
    char *old, *new;
    int ret, fds[2];



if (before_fork) {
  transferred = vmsplice(fds[1], &iov, 1, 0);
...

if (!before_fork) {
  transferred = vmsplice(fds[1], &iov, 1, 0);
...

for (total = 0; total < transferred; total += cur) {
...


And I don't see any jump label that could jump to code that would ve using 
transferred.

What am I missing?


Probably because both those conditional statements are not mutually
exclusive above with an if-else construct. Hence compiler flags it
rather as a false positive ? Initializing with 0 just works around
that false positive.


This is something the compiler should clearly be able to verify. before_fork is 
never changed in that function.

We should not work around wrong compilers.

Which compiler are you using such that you run into this issue?


gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0



gcc (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7)

Seems to be fine, just like all other compilers people used with this 
over the years.


Maybe something about that compiler is shaky that was fixed in the meantime?

--
Cheers,

David / dhildenb

Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues

2025-04-09 Thread Michael S. Tsirkin

On Wed, Apr 09, 2025 at 12:46:41PM +0200, David Hildenbrand wrote:
> On 07.04.25 23:20, Michael S. Tsirkin wrote:
> > On Mon, Apr 07, 2025 at 08:47:05PM +0200, David Hildenbrand wrote:
> > > > In my opinion, it makes the most sense to keep the spec as it is and
> > > > change QEMU and the kernel to match, but obviously that's not trivial
> > > > to do in a way that doesn't break existing devices and drivers.
> > > 
> > > If only it would be limited to QEMU and Linux ... :)
> > > 
> > > Out of curiosity, assuming we'd make the spec match the current QEMU/Linux
> > > implementation at least for the 3 involved features only, would there be a
> > > way to adjust crossvm without any disruption?
> > > 
> > > I still have the feeling that it will be rather hard to get that all
> > > implementations match the spec ... For new features+queues it will be easy
> > > to force the usage of fixed virtqueue numbers, but for free-page-hinting 
> > > and
> > > reporting, it's a mess :(
> > 
> > 
> > Still thinking about a way to fix drivers... We can discuss this
> > theoretically, maybe?
> 
> Yes, absolutely. I took the time to do some more digging; regarding drivers
> only Linux seems to be problematic.
> 
> virtio-win, FreeBSD, NetBSD and OpenBSD and don't seem to support
> problematic features (free page hinting, free page reporting) in their
> virtio-balloon implementations.
> 
> So from the known drivers, only Linux is applicable.
> 
> reporting_vq is either at idx 4/3/2
> free_page_vq is either at idx 3/2
> statsq is at idx2 (only relevant if the feature is offered)
> 
> So if we could test for the existence of a virtqueue at an idx easily, we
> could test from highest-to-smallest idx.
> 
> But I recall that testing for the existance of a virtqueue on s390x resulted
> in the problem/deadlock in the first place ...
> 
> -- 
> Cheers,
> 
> David / dhildenb

So let's talk about a new feature bit?

Since vqs are probed after feature negotiation, it looks like
we could have a feature bit trigger sane behaviour, right?

I kind of dislike it that we have a feature bit for bugs though.
What would be a minimal new feature to add so it does not
feel wrong?

Maybe it's in the field of psychology though ...


-- 
MST

Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues

2025-04-09 Thread David Hildenbrand


On 07.04.25 23:09, Daniel Verkamp wrote:

On Mon, Apr 7, 2025 at 11:47 AM David Hildenbrand  wrote:



Heh, but that one said:

+\item[ VIRTIO_BALLOON_F_WS_REPORTING(6) ] The device has support for
Working Set

Which does not seem to reflect reality ...


Please feel free to disregard these features and reuse their bits and
queue indexes; as far as I know, they are not actually enabled
anywhere currently and the corresponding guest patches were only
applied to some (no-longer-used) ChromeOS kernel trees, so the
compatibility impact should be minimal. I will also try to clean up
the leftover bits on the crosvm side just to clear things up.


Thanks for your reply, and thanks for clarifying+cleaning it up.


[...]

IIRC, in that commit they switched to the "spec" behavior.

That's when they started hard-coding the queue indexes.

CCing Daniel. All Linux versions should be incompatible with cross-vmm 
regarding free page reporting.
How is that handled?


In practice, it only works because nobody calls crosvm with
--balloon-page-reporting (it's off by default), so the balloon device
does not advertise the VIRTIO_BALLOON_F_PAGE_REPORTING feature.

(I just went searching now, and it does seem like there is actually
one user in Android that does try to enable page reporting[1], which
I'll have to look into...)

In my opinion, it makes the most sense to keep the spec as it is and
change QEMU and the kernel to match, but obviously that's not trivial
to do in a way that doesn't break existing devices and drivers.


If only it would be limited to QEMU and Linux ... :)

Out of curiosity, assuming we'd make the spec match the current
QEMU/Linux implementation at least for the 3 involved features only,
would there be a way to adjust crossvm without any disruption?

I still have the feeling that it will be rather hard to get that all
implementations match the spec ... For new features+queues it will be
easy to force the usage of fixed virtqueue numbers, but for
free-page-hinting and reporting, it's a mess :(


If the spec is changed, we can certainly update crosvm to match it; I
think this only really affects a few devices (balloon and technically
filesystem, but see below), only affects features that are generally
not turned on, and in many cases, the guest kernel is updated
simultaneously with the crosvm binary. I'm not opposed to changing the
spec to match reality, although that feels like a bad move from a
spec-integrity perspective.


Right. We didn't pay attention that the spec would reflect reality, and 
the reality was a bad decision :)




Regardless of the chosen path, I think the spec should be clarified -
the meaning of "queue only exists if  is set" leaves the
reader with too many questions:


Right, that's what we've been discussing.


- What does "if  is set" mean? If it's advertised by the
device? If it's acked by the driver? (To me, "set" definitely hints at
the latter, but it should be explicit.)


Currently it's "feature is offered by the device".


- What does it mean for a virtqueue to "exist"? Does that queue index
disappear from the numbering if it does not exist, sliding all of the
later queues down?


Currently it's like that, yes.


If so, the spec should really not have the static
queue numbers listed for the later queues, since they are only correct
if all previous feature-dependent queues were also "set", whatever
that means.


Yes, that's also what we've been discussing. And that started restarted 
the whole "can we fix device/drivers instead" :)




The way crosvm interpreted this was:
- "if  is set" means "if the device advertised 
*and* driver acknowledged "
- "queue only exists" means "if  was not acked, the
corresponding virtqueue cannot be enabled by the driver" (attempting
to set queue_enable = 1 has no effect).
- Any later virtqueues are unaffected and still have the same queue indexes.


Yes, that matches my understanding.



The way QEMU interpeted this (I think, just skimming the code and
working from memory here):
- "if  is set" means "if the device advertised " (it
is checking host_features, not guest_features)


Right, "offered features".


- "queue only exists" means "if  was not offered by the
device, all later virtqueues are shifted down one index"


Exactly.



---

The spec for the filesystem device has a similar issue to the balloon device:
- Queue 0 (hiprio) is always available regardless of features.
- Queue 1 (notification queue) has a note that "The notification queue
only exists if VIRTIO_FS_F_NOTIFICATION is set."
- Queue 2..n are supposed to be the request queues per the numbering
in the spec.

This is how it has been specified since virtio 1.2 when the fs device
was originally added. However, the Linux driver (and probably all
existing device implementations - at least virtiofsd and crosvm's fs
device) don't support VIRTIO_FS_F_NOTIFICATION and use queue 1 as a
request queue, which matches the QEMU/Linux interpretation but means
the spec doesn't match reality aga

Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues

2025-04-09 Thread David Hildenbrand


On 09.04.25 12:56, Michael S. Tsirkin wrote:

On Wed, Apr 09, 2025 at 12:46:41PM +0200, David Hildenbrand wrote:

On 07.04.25 23:20, Michael S. Tsirkin wrote:

On Mon, Apr 07, 2025 at 08:47:05PM +0200, David Hildenbrand wrote:

In my opinion, it makes the most sense to keep the spec as it is and
change QEMU and the kernel to match, but obviously that's not trivial
to do in a way that doesn't break existing devices and drivers.


If only it would be limited to QEMU and Linux ... :)

Out of curiosity, assuming we'd make the spec match the current QEMU/Linux
implementation at least for the 3 involved features only, would there be a
way to adjust crossvm without any disruption?

I still have the feeling that it will be rather hard to get that all
implementations match the spec ... For new features+queues it will be easy
to force the usage of fixed virtqueue numbers, but for free-page-hinting and
reporting, it's a mess :(



Still thinking about a way to fix drivers... We can discuss this
theoretically, maybe?


Yes, absolutely. I took the time to do some more digging; regarding drivers
only Linux seems to be problematic.

virtio-win, FreeBSD, NetBSD and OpenBSD and don't seem to support
problematic features (free page hinting, free page reporting) in their
virtio-balloon implementations.

So from the known drivers, only Linux is applicable.

reporting_vq is either at idx 4/3/2
free_page_vq is either at idx 3/2
statsq is at idx2 (only relevant if the feature is offered)

So if we could test for the existence of a virtqueue at an idx easily, we
could test from highest-to-smallest idx.

But I recall that testing for the existance of a virtqueue on s390x resulted
in the problem/deadlock in the first place ...

--
Cheers,

David / dhildenb


So let's talk about a new feature bit?


Are you thinking about a new feature that switches between "fixed queue 
indices" and "compressed queue indices", whereby the latter would be the 
legacy default and we would expect all devices to switch to the new 
fixed-queue-indices layout?


We could make all new features require "fixed-queue-indices".



Since vqs are probed after feature negotiation, it looks like
we could have a feature bit trigger sane behaviour, right?


In the Linux driver, yes. In QEMU (devices), we add the queues when 
realizing, so we'd need some mechanism to adjust the queue indices based 
on feature negotiation I guess?


For virtio-balloon it might be doable to simply always create+indicate 
free-page hinting to resolve the issue easily.


For virtio-fs it might not be that easy.



I kind of dislike it that we have a feature bit for bugs though.
What would be a minimal new feature to add so it does not
feel wrong?


Probably as above: fixed vs. compressed virtqueue indices?

--
Cheers,

David / dhildenb

[RFC bpf-next 08/13] selftests: bpf: Avoid attaching to bpf_check()

2025-04-09 Thread Daniel Xu

bpf_check(), as it currently exists, will soon be going away to make way
for loadable BPF verifier support. Fixup selftests so they fentry attach
to a more reliable location.

Signed-off-by: Daniel Xu 
---
 tools/testing/selftests/bpf/progs/exceptions_assert.c | 2 +-
 tools/testing/selftests/bpf/progs/exceptions_fail.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/exceptions_assert.c 
b/tools/testing/selftests/bpf/progs/exceptions_assert.c
index 5e0a1ca96d4e..50bc52cbb2e7 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_assert.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_assert.c
@@ -124,7 +124,7 @@ int check_assert_generic(struct __sk_buff *ctx)
return data[128];
 }
 
-SEC("?fentry/bpf_check")
+SEC("?fentry/bpf_fentry_test1")
 __failure __msg("At program exit the register R1 has smin=64 smax=64")
 int check_assert_with_return(void *ctx)
 {
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c 
b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index 8a0fdff89927..b44cb0a6c9d9 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -299,7 +299,7 @@ __noinline int exception_cb_bad_ret(u64 c)
return c;
 }
 
-SEC("?fentry/bpf_check")
+SEC("?fentry/bpf_fentry_test1")
 __exception_cb(exception_cb_bad_ret)
 __failure __msg("At program exit the register R0 has unknown scalar value 
should")
 int reject_set_exception_cb_bad_ret1(void *ctx)
@@ -307,7 +307,7 @@ int reject_set_exception_cb_bad_ret1(void *ctx)
return 0;
 }
 
-SEC("?fentry/bpf_check")
+SEC("?fentry/bpf_fentry_test1")
 __failure __msg("At program exit the register R1 has smin=64 smax=64 should")
 int reject_set_exception_cb_bad_ret2(void *ctx)
 {
-- 
2.47.1

Re: [PATCH v1] s390/virtio_ccw: don't allocate/assign airqs for non-existing queues

2025-04-09 Thread Michael S. Tsirkin

On Wed, Apr 09, 2025 at 01:12:19PM +0200, David Hildenbrand wrote:
> On 09.04.25 12:56, Michael S. Tsirkin wrote:
> > On Wed, Apr 09, 2025 at 12:46:41PM +0200, David Hildenbrand wrote:
> > > On 07.04.25 23:20, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 07, 2025 at 08:47:05PM +0200, David Hildenbrand wrote:
> > > > > > In my opinion, it makes the most sense to keep the spec as it is and
> > > > > > change QEMU and the kernel to match, but obviously that's not 
> > > > > > trivial
> > > > > > to do in a way that doesn't break existing devices and drivers.
> > > > > 
> > > > > If only it would be limited to QEMU and Linux ... :)
> > > > > 
> > > > > Out of curiosity, assuming we'd make the spec match the current 
> > > > > QEMU/Linux
> > > > > implementation at least for the 3 involved features only, would there 
> > > > > be a
> > > > > way to adjust crossvm without any disruption?
> > > > > 
> > > > > I still have the feeling that it will be rather hard to get that all
> > > > > implementations match the spec ... For new features+queues it will be 
> > > > > easy
> > > > > to force the usage of fixed virtqueue numbers, but for 
> > > > > free-page-hinting and
> > > > > reporting, it's a mess :(
> > > > 
> > > > 
> > > > Still thinking about a way to fix drivers... We can discuss this
> > > > theoretically, maybe?
> > > 
> > > Yes, absolutely. I took the time to do some more digging; regarding 
> > > drivers
> > > only Linux seems to be problematic.
> > > 
> > > virtio-win, FreeBSD, NetBSD and OpenBSD and don't seem to support
> > > problematic features (free page hinting, free page reporting) in their
> > > virtio-balloon implementations.
> > > 
> > > So from the known drivers, only Linux is applicable.
> > > 
> > > reporting_vq is either at idx 4/3/2
> > > free_page_vq is either at idx 3/2
> > > statsq is at idx2 (only relevant if the feature is offered)
> > > 
> > > So if we could test for the existence of a virtqueue at an idx easily, we
> > > could test from highest-to-smallest idx.
> > > 
> > > But I recall that testing for the existance of a virtqueue on s390x 
> > > resulted
> > > in the problem/deadlock in the first place ...
> > > 
> > > -- 
> > > Cheers,
> > > 
> > > David / dhildenb
> > 
> > So let's talk about a new feature bit?
> 
> Are you thinking about a new feature that switches between "fixed queue
> indices" and "compressed queue indices", whereby the latter would be the
> legacy default and we would expect all devices to switch to the new
> fixed-queue-indices layout?
> 
> We could make all new features require "fixed-queue-indices".

I see two ways:
1. we make driver behave correctly with in spec and out of spec devices
   and we make qemu behave correctly with in spec and out of spec devices
2. a new feature bit

I prefer 1, and when we add a new feature we can also
document that it should be in spec if negotiated.

My question is if 1 is practical.





> > 
> > Since vqs are probed after feature negotiation, it looks like
> > we could have a feature bit trigger sane behaviour, right?
> 
> In the Linux driver, yes. In QEMU (devices), we add the queues when
> realizing, so we'd need some mechanism to adjust the queue indices based on
> feature negotiation I guess?

Well we can add queues later, nothing prevents that.


> For virtio-balloon it might be doable to simply always create+indicate
> free-page hinting to resolve the issue easily.


OK, so
- for devices, we suggest that basically VIRTIO_BALLOON_F_REPORTING
  only created with VIRTIO_BALLOON_F_FREE_PAGE_HINT and 
  VIRTIO_BALLOON_F_FREE_PAGE_HINT only created with VIRTIO_BALLOON_F_STATS_VQ

I got that.


Now, for drivers.

If the dependency is satisfied as above, no difference.

What should drivers do if not?



I think the thing to do would be to first probe spec compliant
vq numbers? If not there, try with the non compliant version?


However,  you wrote:
> > > But I recall that testing for the existance of a virtqueue on s390x 
> > > resulted
> > > in the problem/deadlock in the first place ...

I think the deadlock was if trying to *use* a non-existent virtqueue?

This is qemu code:

case CCW_CMD_READ_VQ_CONF:
if (check_len) {
if (ccw.count != sizeof(vq_config)) {
ret = -EINVAL;
break;
}
} else if (ccw.count < sizeof(vq_config)) {
/* Can't execute command. */
ret = -EINVAL;
break;
}
if (!ccw.cda) {
ret = -EFAULT;
} else {
ret = ccw_dstream_read(&sch->cds, vq_config.index);
if (ret) {
break;
}
vq_config.index = be16_to_cpu(vq_config.index);
if (vq_config.index >= VIRTIO_QUEUE_MAX) {
ret = -EINVAL;
break;
}
vq_config.num_max = virtio_queue_get_num(vdev,
 vq_config.index);
vq_config.num_

62 matches

Mail list logo