Re: [PATCH] selftests: Improve test output grammar, code style
Hello, On 15/05/2025 19:22:49+0300, Hanne-Lotta Mäenpää wrote: > Add small grammar fixes in perf events and Real Time Clock tests' > output messages. > > Include braces around a single if statement, when there are multiple > statements in the else branch, to align with the kernel coding style. > > Signed-off-by: Hanne-Lotta Mäenpää > --- > tools/testing/selftests/perf_events/watermark_signal.c | 7 --- > tools/testing/selftests/rtc/rtctest.c | 10 +- > 2 files changed, 9 insertions(+), 8 deletions(-) > > diff --git a/tools/testing/selftests/perf_events/watermark_signal.c > b/tools/testing/selftests/perf_events/watermark_signal.c > index 49dc1e831174..6176afd4950b 100644 > --- a/tools/testing/selftests/perf_events/watermark_signal.c > +++ b/tools/testing/selftests/perf_events/watermark_signal.c > @@ -65,8 +65,9 @@ TEST(watermark_signal) > > child = fork(); > EXPECT_GE(child, 0); > - if (child == 0) > + if (child == 0) { > do_child(); > + } This change seems unrelated. > else if (child < 0) { > perror("fork()"); > goto cleanup; > @@ -75,7 +76,7 @@ TEST(watermark_signal) > if (waitpid(child, &child_status, WSTOPPED) != child || > !(WIFSTOPPED(child_status) && WSTOPSIG(child_status) == SIGSTOP)) { > fprintf(stderr, > - "failed to sycnhronize with child errno=%d status=%x\n", > + "failed to synchronize with child errno=%d status=%x\n", > errno, > child_status); > goto cleanup; > @@ -84,7 +85,7 @@ TEST(watermark_signal) > fd = syscall(__NR_perf_event_open, &attr, child, -1, -1, >PERF_FLAG_FD_CLOEXEC); > if (fd < 0) { > - fprintf(stderr, "failed opening event %llx\n", attr.config); > + fprintf(stderr, "failed to setup performance monitoring > %llx\n", attr.config); > goto cleanup; > } > > diff --git a/tools/testing/selftests/rtc/rtctest.c > b/tools/testing/selftests/rtc/rtctest.c > index be175c0e6ae3..8fd4d5d3b527 100644 > --- a/tools/testing/selftests/rtc/rtctest.c > +++ b/tools/testing/selftests/rtc/rtctest.c > @@ -138,10 +138,10 @@ TEST_F_TIMEOUT(rtc, date_read_loop, > READ_LOOP_DURATION_SEC + 2) { > rtc_read = rtc_time_to_timestamp(&rtc_tm); > /* Time should not go backwards */ > ASSERT_LE(prev_rtc_read, rtc_read); > - /* Time should not increase more then 1s at a time */ > + /* Time should not increase more than 1s per read */ > ASSERT_GE(prev_rtc_read + 1, rtc_read); > > - /* Sleep 11ms to avoid killing / overheating the RTC */ > + /* Sleep 11ms to avoid overheating the RTC */ > nanosleep_with_retries(READ_LOOP_SLEEP_MS * 100); > > prev_rtc_read = rtc_read; > @@ -236,7 +236,7 @@ TEST_F(rtc, alarm_alm_set) { > if (alarm_state == RTC_ALARM_DISABLED) > SKIP(return, "Skipping test since alarms are not supported."); > if (alarm_state == RTC_ALARM_RES_MINUTE) > - SKIP(return, "Skipping test since alarms has only minute > granularity."); > + SKIP(return, "Skipping test since alarms have only minute > granularity."); I guess the proper fix is to remove the s in alarms as there is only one alarm. > > rc = ioctl(self->fd, RTC_RD_TIME, &tm); > ASSERT_NE(-1, rc); > @@ -306,7 +306,7 @@ TEST_F(rtc, alarm_wkalm_set) { > if (alarm_state == RTC_ALARM_DISABLED) > SKIP(return, "Skipping test since alarms are not supported."); > if (alarm_state == RTC_ALARM_RES_MINUTE) > - SKIP(return, "Skipping test since alarms has only minute > granularity."); > + SKIP(return, "Skipping test since alarms have only minute > granularity."); > > rc = ioctl(self->fd, RTC_RD_TIME, &alarm.time); > ASSERT_NE(-1, rc); > @@ -502,7 +502,7 @@ int main(int argc, char **argv) > if (access(rtc_file, R_OK) == 0) > ret = test_harness_run(argc, argv); > else > - ksft_exit_skip("[SKIP]: Cannot access rtc file %s - Exiting\n", > + ksft_exit_skip("Cannot access RTC file %s - exiting\n", > rtc_file); > > return ret; > -- > 2.39.5 > -- Alexandre Belloni, co-owner and COO, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
Re: [PATCH v2] x86/sgx: Prevent attempts to reclaim poisoned pages
* Dave Hansen wrote: > Thanks for sending this, Andrew! > > I think I'll probably add a slightly shorter summary: > > tl;dr: SGX page reclaim touches the page to copy its contents to > secondary storage. SGX instructions do not gracefully handle machine > checks. Despite this, the existing SGX code will try to reclaim pages > that it _knows_ are poisoned. Avoid even trying to reclaim poisoned pages. > > But otherwise it looks great: > > Acked-by: Dave Hansen Thanks, I've applied this fix to tip:x86/sgx, with the TL;DR paragraph added in. Thanks, Ingo
[tip: x86/sgx] x86/sgx: Prevent attempts to reclaim poisoned pages
The following commit has been merged into the x86/sgx branch of tip: Commit-ID: ed16618c380c32c68c06186d0ccbb0d5e0586e59 Gitweb: https://git.kernel.org/tip/ed16618c380c32c68c06186d0ccbb0d5e0586e59 Author:Andrew Zaborowski AuthorDate:Fri, 09 May 2025 01:04:29 +02:00 Committer: Ingo Molnar CommitterDate: Thu, 15 May 2025 19:01:45 +02:00 x86/sgx: Prevent attempts to reclaim poisoned pages TL;DR: SGX page reclaim touches the page to copy its contents to secondary storage. SGX instructions do not gracefully handle machine checks. Despite this, the existing SGX code will try to reclaim pages that it _knows_ are poisoned. Avoid even trying to reclaim poisoned pages. The longer story: Pages used by an enclave only get epc_page->poison set in arch_memory_failure() but they currently stay on sgx_active_page_list until sgx_encl_release(), with the SGX_EPC_PAGE_RECLAIMER_TRACKED flag untouched. epc_page->poison is not checked in the reclaimer logic meaning that, if other conditions are met, an attempt will be made to reclaim an EPC page that was poisoned. This is bad because 1. we don't want that page to end up added to another enclave and 2. it is likely to cause one core to shut down and the kernel to panic. Specifically, reclaiming uses microcode operations including "EWB" which accesses the EPC page contents to encrypt and write them out to non-SGX memory. Those operations cannot handle MCEs in their accesses other than by putting the executing core into a special shutdown state (affecting both threads with HT.) The kernel will subsequently panic on the remaining cores seeing the core didn't enter MCE handler(s) in time. Call sgx_unmark_page_reclaimable() to remove the affected EPC page from sgx_active_page_list on memory error to stop it being considered for reclaiming. Testing epc_page->poison in sgx_reclaim_pages() would also work but I assume it's better to add code in the less likely paths. The affected EPC page is not added to &node->sgx_poison_page_list until later in sgx_encl_release()->sgx_free_epc_page() when it is EREMOVEd. Membership on other lists doesn't change to avoid changing any of the lists' semantics except for sgx_active_page_list. There's a "TBD" comment in arch_memory_failure() about pre-emptive actions, the goal here is not to address everything that it may imply. This also doesn't completely close the time window when a memory error notification will be fatal (for a not previously poisoned EPC page) -- the MCE can happen after sgx_reclaim_pages() has selected its candidates or even *inside* a microcode operation (actually easy to trigger due to the amount of time spent in them.) The spinlock in sgx_unmark_page_reclaimable() is safe because memory_failure() runs in process context and no spinlocks are held, explicitly noted in a mm/memory-failure.c comment. Signed-off-by: Andrew Zaborowski Signed-off-by: Ingo Molnar Acked-by: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Tony Luck Cc: balr...@gmail.com Cc: linux-...@vger.kernel.org Link: https://lore.kernel.org/r/20250508230429.456271-1-andrew.zaborow...@intel.com --- arch/x86/kernel/cpu/sgx/main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 8ce352f..7c19977 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -719,6 +719,8 @@ int arch_memory_failure(unsigned long pfn, int flags) goto out; } + sgx_unmark_page_reclaimable(page); + /* * TBD: Add additional plumbing to enable pre-emptive * action for asynchronous poison notification. Until
Re: [PATCH 9/9] CodingStyle: flip the rule about curlies
On 5/13/2025 12:06 PM, Alexey Dobriyan wrote: > On Mon, May 12, 2025 at 06:56:56PM +0200, Greg KH wrote: >> On Mon, May 12, 2025 at 09:43:10AM -0700, Jeff Johnson wrote: >>> On 5/9/2025 11:18 PM, Greg KH wrote: On Fri, May 09, 2025 at 11:34:30PM +0300, Alexey Dobriyan wrote: > Require set of curlies {} in all if/else branches and all loops > not matter how simple. Sorry, but no, we are not going to change this long-term coding style rule for no real reason at this point in time. >>> >>> Is the infamous Apple SSL bug (CVE-2014-1266) a good reason? > > Indeed. > > Thanks, curlies were inspired by this CVE but I forgot to mention it. > >> One bug in 2014 will require us to touch 30+ million lines of code? > > Nobody is proposing to reformat 30 mil lines at one commit > (as much as I'd like it). > > Old code will stay old, new code will be formatted per new rules. > >> Please be reasonable. > > I'm very reasonable. Each patch details rationale why specific style is > better. > >> And everyone, remember _why_ we have a coding style. It's not so much >> the specifics of _what_ the coding style is, > > What? When was the last time you read it? It is very much about specifics: > 8 spaces, opening curly on the same line except at function scope, > 80 columns, recent rule about to format function attributes. > > It could have even more specific if there is pre-commit hook forcing > formatting like commercial companies do. > >> one at all. Don't argue the specifics of the coding style without a >> really really good reason why, with real details and proof. > > What is "really really good"? > > How do you know when it is good reason or not? > > I think I have good reason: I programmed a little in another languages > where some of the rules don't apply. In particular C++/Rust don't have > a rule about declaring variables upfront. Nor does any popular programming > language designed in the last 35 years (?). > > Such experience made me realize that linux-kernel CodingStyle in this > regard is pointless at best and counter-productive. It was so obvious. > >> It took us a long time to increase the default line length, and that too >> is still argued about for very good and valid reasons. > > It still 80 columns in CodingStyle. > >> That was discussed in detail, not just thrown at us like this patch series >> was. > > Oh come on. In Russia we say "not my first year of marriage". > > One of the unwritten rules of linux-kernel is to NEVER post [RFC] > as it will be ignored, but to post a [PATCH] and Cc specific people > to force a discussion. > > I don't want to look like a thief who sneaks in occasional declaration > in the middle of a function or set of curlies and get yelled by compilers > or maintainers (especially those armed with checkpatch.pl). > > I'll codify this first in CodingStyle, then delete relevant checks from > checkpatch.pl (citing CodingStyle of course). I replied only this patch in the series because, when I first started programming using the Linux Coding Style in 2004, the single statement brace rule was the only rule I disagreed with. And that was in part due to the fact that, at three of my previous employers, the C coding style had dictated mandatory use of braces. So the explicit prohibition of braces for single statement conditionals really surprised me. And note the rule, as written, is not what actually seems to be enforced (at least by checkpatch.pl). What seems to be enforced is to not use braces where the conditional is a single line and there is a single line statement. In other words, if either the conditional or the single statement span multiple lines, then braces are allowed (or encouraged?). As examples, checkpatch.pl does not complain about any of the following (either to recommend adding braces or removing braces): if (a_really_long_line_function_name(a_really_long_identifier) || another_really_long_function_name(another_long_identifier)) { braced(); } if (a_really_long_line_function_name(a_really_long_identifier) || another_really_long_function_name(another_long_identifier)) not_braced(); if (a) call_a_really_long_function(with_long_argument, another_log_argument); if (a) { call_a_really_long_function(with_long_argument, another_log_argument); } I'll also note that the popular style guide published by Michael Barr as well as the MISRA C standard also dictate braces (but I'll admit that both of these have other rules which contradict the Linux style and where I agree with the Linux style). All of that said, ultimately, the Coding Style is supposed to be enforce consistent style that is readable and maintainable. I get that. I personally believe that requiring braces makes the code more maintainable, and do
Re: [PATCH] Revert "remoteproc: core: Clear table_sz when rproc_shutdown"
On 5/13/25 10:52 AM, Bjorn Andersson wrote: Clearing the table_sz on cleanup seemed reasonable, but further discussions concluded that this merely working around the issue and that the fix is incomplete. As such, revert commit efdde3d73ab2 ("remoteproc: core: Clear table_sz when rproc_shutdown") to avoid carrying a partial fix. Setting table_sz to 0 still seems like a good idea from a defensive programming perspective. Both table_ptr and table_sz should be set and cleared together in all spots. In addition to this, another fix would be to also update both table_ptr and table_sz to 0 when loading firmware without a resource table. Both should be done, no need to revert this. Andrew Signed-off-by: Bjorn Andersson --- drivers/remoteproc/remoteproc_core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c index 48d146e1fa560397c11eeb8f824ae0fb844a022b..81b2ccf988e852ac79cee375c7e3f118c2a4b41a 100644 --- a/drivers/remoteproc/remoteproc_core.c +++ b/drivers/remoteproc/remoteproc_core.c @@ -2025,7 +2025,6 @@ int rproc_shutdown(struct rproc *rproc) kfree(rproc->cached_table); rproc->cached_table = NULL; rproc->table_ptr = NULL; - rproc->table_sz = 0; out: mutex_unlock(&rproc->lock); return ret; --- base-commit: aa94665adc28f3fdc3de2979ac1e98bae961d6ca change-id: 20250513-revert-rproc-table-sz-53ecf24726ae Best regards,
[PATCH] selftests/futex: Fix usage() message to clarify timeout value unit
futex_wait_timeout: Fix usage() message to clarify timeout value unit Signed-off-by: Jonathan Velez --- tools/testing/selftests/futex/functional/futex_wait_timeout.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/futex/functional/futex_wait_timeout.c b/tools/testing/selftests/futex/functional/futex_wait_timeout.c index d183f878360b..737475df9242 100644 --- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c +++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c @@ -31,7 +31,7 @@ void usage(char *prog) printf("Usage: %s\n", prog); printf(" -cUse color\n"); printf(" -hDisplay this help message\n"); - printf(" -t N Timeout in nanoseconds (default: 100,000)\n"); + printf(" -t N Set timeout duration in nanoseconds (default: 100,000 ns = 100 us)\n"); printf(" -v L Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n", VQUIET, VCRITICAL, VINFO); } -- 2.43.0
[PATCH net-next v6] selftests/vsock: add initial vmtest.sh for vsock
This commit introduces a new vmtest.sh runner for vsock. It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used. VMCI and hyperv support is automatically built, though not used. Only tested on x86. To run: $ make -C tools/testing/selftests TARGETS=vsock $ tools/testing/selftests/vsock/vmtest.sh or $ make -C tools/testing/selftests TARGETS=vsock run_tests Example runs: $ ./tools/testing/selftests/vsock/vmtest.sh TAP version 13 1..3 not ok 0 vm_server_host_client # exit=1 ok 1 vm_client_host_server ok 2 vm_loopback 1..3 $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback TAP version 13 1..1 ok 0 vm_loopback 1..1 Future work can include vsock_diag_test. The tap output style copies mm's run_vmtests.sh. Because vsock requires a VM to test anything other than loopback, this patch adds vmtest.sh as a kselftest itself. This is different than other systems that have a "vmtest.sh", where it is used as a utility script to spin up a VM to run the selftests as a guest (but isn't hooked into kselftest). Testing in NIPA is still WIP. Signed-off-by: Bobby Eshleman --- Changes in v6: - add make cmd in commit message in vmtest.sh example (Stefano) - check nonzero size of QEMU_PIDFILE using -s conditional (Stefano) - display log file path after tests so it is easier to find amongst other random names - cleanup qemu pidfile if qemu is unable to remove it - make oops/warning failures more obvious with 'FAIL' prefix in log (simply saying 'detected' wasn't clear enough to identify failing condition) - Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45...@gmail.com Changes in v5: - make log file a tmpfile (Paolo) - make sure both default and user defined QEMU gets handled by the dependency check (Paolo) - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) - rename vm_setup -> vm_start (Paolo) - derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage - Remove unused 'unset IFS' line (Paolo) - leave space after variable declarations (Paolo) - make QEMU_PIDFILE a tmp file (Paolo) - make everything readonly that is only read (Paolo) - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) - don't check for timeout util (Paolo) - add missing usage string for -q qemu arg - add tap prefix to SUMMARY line since it isn't part of TAP protocol - exit with the correct status code based on failure/pass counts - Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262...@gmail.com Changes in v4: - do not use special tab delimiter for help string parsing (Stefano + Paolo) - fix paths for when installing kselftest and running out-of-tree (Paolo) - change vng to using running kernel instead of compiled kernel (Paolo) - use multi-line string for QEMU_OPTS (Stefano) - change timeout to 300s (Paolo) - skip if tools are not found and use kselftests status codes (Paolo) - remove build from vmtest.sh (Paolo) - change -> SSH_HOST_PORT (Stefano) - add tap-format output - add vmtest.log to gitignore - check for vsock_test binary and remind user to build it if missing - create a proper build in makefile - style fixes - add ssh, timeout, and pkill to dependency check, just in case - fix numerical comparison in conditionals - check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh) - fix tracking of pass/fail bug - fix stderr redirection bug - Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163...@gmail.com Changes in v3: - use common conditional syntax for checking variables - use return value instead of global rc - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER - use SIGTERM instead of SIGKILL on cleanup - use peer-cid=1 for loopback - change sleep delay times into globals - fix test_vm_loopback logging - add test selection in arguments - make QEMU an argument - check that vng binary is on path - use QEMU variable - change to - fix hardcoded file paths - add comment in commit msg about script that vmtest.sh was based off of - Add tools/testing/selftest/vsock/Makefile for kselftest - Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a2733...@gmail.com Changes in v2: - add kernel oops and warnings checker - change testname variable to use FUNCNAME - fix spacing in test_vm_server_host_client - add -s skip build option to vmtest.sh - add test_vm_loopback - pass port to vm_wait_for_listener - fix indentation in vmtest.sh - add vmci and hyperv to config - changed whitespace from tabs to spaces in help string - Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab...@gmail.com --- MAINTAINERS | 1 + tools/testing/selftests/vsock/.gitignore | 2 + tools/testing/selftests/vsock/Makefile | 16 ++ tools/testing/selftests/vsock/config | 114 tools/testing/selftests/vsock
Re: [PATCH] Revert "remoteproc: core: Clear table_sz when rproc_shutdown"
On Thu, May 15, 2025 at 12:21:14PM -0500, Andrew Davis wrote: > On 5/13/25 10:52 AM, Bjorn Andersson wrote: > > Clearing the table_sz on cleanup seemed reasonable, but further > > discussions concluded that this merely working around the issue > > and that the fix is incomplete. > > > > As such, revert commit efdde3d73ab2 ("remoteproc: core: Clear table_sz > > when rproc_shutdown") to avoid carrying a partial fix. > > > > Setting table_sz to 0 still seems like a good idea from a defensive > programming perspective. Both table_ptr and table_sz should be set > and cleared together in all spots. > > In addition to this, another fix would be to also update > both table_ptr and table_sz to 0 when loading firmware without > a resource table. Both should be done, no need to revert this. > As mentioned by Bjorn, this is a partial fix. I'm all good with setting table_sz to 0, but as long as the real solution to the problem is part of the same work. Once the patch is reverted, which I'm about to apply, work can continue. > Andrew > > > Signed-off-by: Bjorn Andersson > > --- > > drivers/remoteproc/remoteproc_core.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/drivers/remoteproc/remoteproc_core.c > > b/drivers/remoteproc/remoteproc_core.c > > index > > 48d146e1fa560397c11eeb8f824ae0fb844a022b..81b2ccf988e852ac79cee375c7e3f118c2a4b41a > > 100644 > > --- a/drivers/remoteproc/remoteproc_core.c > > +++ b/drivers/remoteproc/remoteproc_core.c > > @@ -2025,7 +2025,6 @@ int rproc_shutdown(struct rproc *rproc) > > kfree(rproc->cached_table); > > rproc->cached_table = NULL; > > rproc->table_ptr = NULL; > > - rproc->table_sz = 0; > > out: > > mutex_unlock(&rproc->lock); > > return ret; > > > > --- > > base-commit: aa94665adc28f3fdc3de2979ac1e98bae961d6ca > > change-id: 20250513-revert-rproc-table-sz-53ecf24726ae > > > > Best regards,
Re: [PATCH] selftests: Add functional test for the abort file in fusectl
On Thu, May 15, 2025 at 3:35 PM Chen Linxuan wrote: > This patch add a simple functional test for the "about" file Sorry for the typo, it should be "abort".
[PATCH 1/2] selftests/mm: skip uffd tests in madv_guard if uffd is not present.
When userfaultfd is not compiled into kernel, userfaultfd() returns -1, causing uffd tests in madv_guard fail. Skip the tests instead. Signed-off-by: Zi Yan --- tools/testing/selftests/mm/guard-regions.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c index 0cd9d236649d..93af3d3760f9 100644 --- a/tools/testing/selftests/mm/guard-regions.c +++ b/tools/testing/selftests/mm/guard-regions.c @@ -1453,8 +1453,21 @@ TEST_F(guard_regions, uffd) /* Set up uffd. */ uffd = userfaultfd(0); - if (uffd == -1 && errno == EPERM) - ksft_exit_skip("No userfaultfd permissions, try running as root.\n"); + if (uffd == -1) { + switch (errno) { + case EPERM: + SKIP(return, "No userfaultfd permissions, try running as root."); + break; + case ENOSYS: + SKIP(return, "userfaultfd is not supported/not enabled."); + break; + default: + ksft_exit_fail_msg("userfaultfd failed with %s\n", + strerror(errno)); + break; + } + } + ASSERT_NE(uffd, -1); ASSERT_EQ(ioctl(uffd, UFFDIO_API, &api), 0); -- 2.47.2
[PATCH 2/2] selftests/mm: skip hugevm test if kernel config file is not present.
When running hugevm tests in a machine without kernel config present, e.g., a VM running a kernel without CONFIG_IKCONFIG_PROC nor /boot/config-*, skip hugevm tests, which reads kernel config to get page table level information. Signed-off-by: Zi Yan --- .../selftests/mm/va_high_addr_switch.sh | 26 +++ 1 file changed, 9 insertions(+), 17 deletions(-) diff --git a/tools/testing/selftests/mm/va_high_addr_switch.sh b/tools/testing/selftests/mm/va_high_addr_switch.sh index 1f92e8caceac..325de53966b6 100755 --- a/tools/testing/selftests/mm/va_high_addr_switch.sh +++ b/tools/testing/selftests/mm/va_high_addr_switch.sh @@ -7,23 +7,20 @@ # real test to check that the kernel is configured to support at least 5 # pagetable levels. -# 1 means the test failed -exitcode=1 - # Kselftest framework requirement - SKIP code is 4. ksft_skip=4 -fail() +skip() { echo "$1" - exit $exitcode + exit $ksft_skip } check_supported_x86_64() { local config="/proc/config.gz" [[ -f "${config}" ]] || config="/boot/config-$(uname -r)" - [[ -f "${config}" ]] || fail "Cannot find kernel config in /proc or /boot" + [[ -f "${config}" ]] || skip "Cannot find kernel config in /proc or /boot" # gzip -dcfq automatically handles both compressed and plaintext input. # See man 1 gzip under '-f'. @@ -33,11 +30,9 @@ check_supported_x86_64() else {print 1}; exit}' /proc/cpuinfo 2>/dev/null) if [[ "${pg_table_levels}" -lt 5 ]]; then - echo "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to run this test" - exit $ksft_skip + skip "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to run this test" elif [[ "${cpu_supports_pl5}" -ne 0 ]]; then - echo "$0: CPU does not have the necessary la57 flag to support page table level 5" - exit $ksft_skip + skip "$0: CPU does not have the necessary la57 flag to support page table level 5" fi } @@ -45,24 +40,21 @@ check_supported_ppc64() { local config="/proc/config.gz" [[ -f "${config}" ]] || config="/boot/config-$(uname -r)" - [[ -f "${config}" ]] || fail "Cannot find kernel config in /proc or /boot" + [[ -f "${config}" ]] || skip "Cannot find kernel config in /proc or /boot" local pg_table_levels=$(gzip -dcfq "${config}" | grep PGTABLE_LEVELS | cut -d'=' -f 2) if [[ "${pg_table_levels}" -lt 5 ]]; then - echo "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to run this test" - exit $ksft_skip + skip "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to run this test" fi local mmu_support=$(grep -m1 "mmu" /proc/cpuinfo | awk '{print $3}') if [[ "$mmu_support" != "radix" ]]; then - echo "$0: System does not use Radix MMU, required for 5-level paging" - exit $ksft_skip + skip "$0: System does not use Radix MMU, required for 5-level paging" fi local hugepages_total=$(awk '/HugePages_Total/ {print $2}' /proc/meminfo) if [[ "${hugepages_total}" -eq 0 ]]; then - echo "$0: HugePages are not enabled, required for some tests" - exit $ksft_skip + skip "$0: HugePages are not enabled, required for some tests" fi } -- 2.47.2
Re: [PATCH net-next v5] selftests/vsock: add initial vmtest.sh for vsock
On Thu, May 15, 2025 at 11:02:00AM +0200, Stefano Garzarella wrote: > On Tue, May 13, 2025 at 11:31:13PM -0700, Bobby Eshleman wrote: > > This commit introduces a new vmtest.sh runner for vsock. > > > > It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, > > H2G, and loopback. The testing tools from tools/testing/vsock/ are > > reused. Currently, only vsock_test is used. > > > > VMCI and hyperv support is automatically built, though not used. > > > > Only tested on x86. > > > > To run: > > > > $ tools/testing/selftests/vsock/vmtest.sh > > I just tried to call the script directly, but I have this: > > $ tools/testing/selftests/vsock/vmtest.sh > skip: > /home/stefano/repos/linux-vsock/tools/testing/selftests/vsock/vsock_test not > found! Please build the kselftest vsock target. > pkill: pidfile not valid > Try `pkill --help' for more information. > > Is this expected? > > > > > or > > > > $ make -C tools/testing/selftests TARGETS=vsock run_tests > > > > Example runs: > > > > $ ./tools/testing/selftests/vsock/vmtest.sh > > TAP version 13 > > 1..3 > > not ok 0 vm_server_host_client # exit=1 > > ok 1 vm_client_host_server > > ok 2 vm_loopback > > 1..3 > > > > $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback > > TAP version 13 > > 1..1 > > ok 0 vm_loopback > > 1..1 > > > > Future work can include vsock_diag_test. > > > > The tap output style copies mm's run_vmtests.sh. > > > > Because vsock requires a VM to test anything other than loopback, this > > patch adds vmtest.sh as a kselftest itself. This is different than other > > systems that have a "vmtest.sh", where it is used as a utility script to > > spin up a VM to run the selftests as a guest (but isn't hooked into > > kselftest). > > > > Testing in NIPA is still WIP. > > > > Signed-off-by: Bobby Eshleman > > --- > > Changes in v5: > > - make log file a tmpfile (Paolo) > > - make sure both default and user defined QEMU gets handled by the > > dependency check (Paolo) > > - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) > > - rename vm_setup -> vm_start (Paolo) > > - derive wait_for_listener from selftests/net/net_helper.sh to removes ss > > usage > > - Remove unused 'unset IFS' line (Paolo) > > - leave space after variable declarations (Paolo) > > - make QEMU_PIDFILE a tmp file (Paolo) > > - make everything readonly that is only read (Paolo) > > - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) > > - don't check for timeout util (Paolo) > > - add missing usage string for -q qemu arg > > - add tap prefix to SUMMARY line since it isn't part of TAP protocol > > - exit with the correct status code based on failure/pass counts > > - Link to v4: > > https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262...@gmail.com > > > > Changes in v4: > > - do not use special tab delimiter for help string parsing (Stefano + Paolo) > > - fix paths for when installing kselftest and running out-of-tree (Paolo) > > - change vng to using running kernel instead of compiled kernel (Paolo) > > - use multi-line string for QEMU_OPTS (Stefano) > > - change timeout to 300s (Paolo) > > - skip if tools are not found and use kselftests status codes (Paolo) > > - remove build from vmtest.sh (Paolo) > > - change -> SSH_HOST_PORT (Stefano) > > - add tap-format output > > - add vmtest.log to gitignore > > - check for vsock_test binary and remind user to build it if missing > > - create a proper build in makefile > > - style fixes > > - add ssh, timeout, and pkill to dependency check, just in case > > - fix numerical comparison in conditionals > > - check qemu pidfile exists before proceeding (avoid wasting time waiting > > for ssh) > > - fix tracking of pass/fail bug > > - fix stderr redirection bug > > - Link to v3: > > https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163...@gmail.com > > > > Changes in v3: > > - use common conditional syntax for checking variables > > - use return value instead of global rc > > - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER > > - use SIGTERM instead of SIGKILL on cleanup > > - use peer-cid=1 for loopback > > - change sleep delay times into globals > > - fix test_vm_loopback logging > > - add test selection in arguments > > - make QEMU an argument > > - check that vng binary is on path > > - use QEMU variable > > - change to > > - fix hardcoded file paths > > - add comment in commit msg about script that vmtest.sh was based off of > > - Add tools/testing/selftest/vsock/Makefile for kselftest > > - Link to v2: > > https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a2733...@gmail.com > > > > Changes in v2: > > - add kernel oops and warnings checker > > - change testname variable to use FUNCNAME > > - fix spacing in test_vm_server_host_client > > - add -s skip build option to vmtest.sh > > - add test_vm_loopback > > - pass port to vm_wait_for_listener > > - fix indentation in vmtest.sh > > - add vmci and hyperv to config > > - changed
Re: [PATCH bpf-next v4 0/3] Allow mmap of /sys/kernel/btf/vmlinux
> I'd like to cut down the memory usage of parsing vmlinux BTF in ebpf-go. > With some upcoming changes the library is sitting at 5MiB for a parse. > Most of that memory is simply copying the BTF blob into user space. > By allowing vmlinux BTF to be mmapped read-only into user space I can > cut memory usage by about 75%. > > Signed-off-by: Lorenz Bauer For the series, Tested-by: Alan Maguire Tested with 4k and 64k page size on aarch64; all worked perfectly. Thanks!
Re: [PATCH] vhost/net: remove zerocopy support
On Mon, May 12, 2025 at 5:21 PM Jon Kohler wrote: > > > > > On Apr 30, 2025, at 9:21 PM, Jon Kohler wrote: > > > > > > > >> On Apr 16, 2025, at 6:15 AM, Eugenio Perez Martin > >> wrote: > >> > >> !---| > >> CAUTION: External Email > >> > >> |---! > >> > >> On Tue, Apr 8, 2025 at 8:28 AM Jason Wang wrote: > >>> > >>> On Tue, Apr 8, 2025 at 9:18 AM Jon Kohler wrote: > > > > > On Apr 6, 2025, at 7:14 PM, Jason Wang wrote: > > > > !---| > > CAUTION: External Email > > > > |---! > > > > On Fri, Apr 4, 2025 at 10:24 PM Jon Kohler wrote: > >> > >> Commit 098eadce3c62 ("vhost_net: disable zerocopy by default") disabled > >> the module parameter for the handle_tx_zerocopy path back in 2019, > >> nothing that many downstream distributions (e.g., RHEL7 and later) had > >> already done the same. > >> > >> Both upstream and downstream disablement suggest this path is rarely > >> used. > >> > >> Testing the module parameter shows that while the path allows packet > >> forwarding, the zerocopy functionality itself is broken. On outbound > >> traffic (guest TX -> external), zerocopy SKBs are orphaned by either > >> skb_orphan_frags_rx() (used with the tun driver via tun_net_xmit()) > > > > This is by design to avoid DOS. > > I understand that, but it makes ZC non-functional in general, as ZC fails > and immediately increments the error counters. > >>> > >>> The main issue is HOL, but zerocopy may still work in some setups that > >>> don't need to care about HOL. One example the macvtap passthrough > >>> mode. > >>> > > > > >> or > >> skb_orphan_frags() elsewhere in the stack, > > > > Basically zerocopy is expected to work for guest -> remote case, so > > could we still hit skb_orphan_frags() in this case? > > Yes, you’d hit that in tun_net_xmit(). > >>> > >>> Only for local VM to local VM communication. > > > > Sure, but the tricky bit here is that if you have a mix of VM-VM and > > VM-external > > traffic patterns, any time the error path is hit, the zc error counter will > > go up. > > > > When that happens, ZC will get silently disabled anyhow, so it leads to > > sporadic > > success / non-deterministic performance. > > > >>> > If you punch a hole in that *and* in the > zc error counter (such that failed ZC doesn’t disable ZC in vhost), you > get ZC > from vhost; however, the network interrupt handler under net_tx_action > and > eventually incurs the memcpy under dev_queue_xmit_nit(). > >>> > >>> Well, yes, we need a copy if there's a packet socket. But if there's > >>> no network interface taps, we don't need to do the copy here. > >>> > > > > Agreed on the packet socket side. I recently fixed an issue in lldpd [1] > > that prevented > > this specific case; however, there are still other trip wires spread out > > across the > > stack that would need to be addressed. > > > > [1] > > https://github.com/lldpd/lldpd/commit/622a91144de4ae487ceebdb333863e9f660e0717 > > > >> > >> Hi! > >> > >> I need more time diving into the issues. As Jon mentioned, vhost ZC is > >> so little used I didn't have the chance to experiment with this until > >> now :). But yes, I expect to be able to overcome these for pasta, by > >> adapting buffer sizes or modifying code etc. > > > > Another tricky bit here is that it has been disabled both upstream and > > downstream > > for so long, the code naturally has a bit of wrench-in-the-engine. > > > > RE Buffer sizes: I tried this as well, because I think on sufficiently fast > > systems, > > zero copy gets especially interesting in GSO/TSO cases where you have mega > > payloads. > > > > I tried playing around with the good copy value such that ZC restricted > > itself to > > only lets say 32K payloads and above, and while it *does* work (with enough > > holes punched in), absolute t-put doesn’t actually go up, its just that CPU > > utilization > > goes down a pinch. Not a bad thing for certain, but still not great. > > > > In fact, I found that tput actually went down with this path, even with ZC > > occurring > > successfully, as there was still a mix of ZC and non-ZC because you can only > > have so many pending at any given time before the copy path kicks in again. > > > > > >> > > This is no more performant, and in fact is actually worse since the time > spent > waiting on that memcpy to resolve is longer. > > > > >> as vhost_net does not set > >> SKBFL_DONT_ORPHAN. > >>> > >>> Maybe we can try to set this as vhost-net can hornor ulimit now. > > > > Yea I tried that, and while it helps kick things
Re: [PATCH 01/12] module: Move modprobe_path and modules_disabled ctl_tables into the module subsys
On 5/9/25 14:54, Joel Granados wrote: > Move module sysctl (modprobe_path and modules_disabled) out of sysctl.c > and into the modules subsystem. Make the modprobe_path variable static > as it no longer needs to be exported. Remove module.h from the includes > in sysctl as it no longer uses any module exported variables. > > This is part of a greater effort to move ctl tables into their > respective subsystems which will reduce the merge conflicts in > kernel/sysctl.c. > > Signed-off-by: Joel Granados > [...] > --- a/kernel/module/kmod.c > +++ b/kernel/module/kmod.c > @@ -60,7 +60,7 @@ static DEFINE_SEMAPHORE(kmod_concurrent_max, > MAX_KMOD_CONCURRENT); > /* > modprobe_path is set via /proc/sys. > */ > -char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH; > +static char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH; > > static void free_modprobe_argv(struct subprocess_info *info) > { > @@ -177,3 +177,33 @@ int __request_module(bool wait, const char *fmt, ...) > return ret; > } > EXPORT_SYMBOL(__request_module); > + > +#ifdef CONFIG_MODULES > +static const struct ctl_table kmod_sysctl_table[] = { > + { > + .procname = "modprobe", > + .data = &modprobe_path, > + .maxlen = KMOD_PATH_LEN, > + .mode = 0644, > + .proc_handler = proc_dostring, > + }, > + { > + .procname = "modules_disabled", > + .data = &modules_disabled, > + .maxlen = sizeof(int), > + .mode = 0644, > + /* only handle a transition from default "0" to "1" */ > + .proc_handler = proc_dointvec_minmax, > + .extra1 = SYSCTL_ONE, > + .extra2 = SYSCTL_ONE, > + }, This is minor.. but the file kernel/module/kmod.c contains the logic to request direct modprobe invocation by the kernel. Registering the modprobe_path sysctl here is appropriate. However, the modules_disabled setting affects the entire module loader so I don't think it's best to register it here. I suggest keeping a single table for the module sysctl values but moving it to kernel/module/main.c. This means the variable modprobe_path must retain external linkage, on the other hand, modules_disabled can be made static. -- Thanks, Petr > +}; > + > +static int __init init_kmod_sysctl(void) > +{ > + register_sysctl_init("kernel", kmod_sysctl_table); > + return 0; > +} > + > +subsys_initcall(init_kmod_sysctl); > +#endif
Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
On 5/13/2025 3:24 PM, H. Peter Anvin wrote: On May 12, 2025 11:06:02 PM PDT, "Jürgen Groß" wrote: On 13.05.25 07:55, Xin Li wrote: On 5/12/2025 4:24 AM, Juergen Gross wrote: Now with the mentioned patch really attached. :-) Does it allow patching with an instruction more than 6 bytes long? The immediate form MSR instructions are 9 bytes long. Yes, shouldn't be a problem. Juergen However, it is more than that. The immediate instructions have a different interface, and it makes more sense to use the extra bytes to shuffle the bits around for the legacy forms: Write: mov %rax,%rdx shr $32,%rdx wrmsr(ns) Read: rdmsr shl $32,%rdx or %rdx,%rax For the write case, this also means that two separate trap points are needed. As far as Xen (the only user of pv msrs), note that it only paravirtualizes a very small number of MSRs, and some of those are fairly performance sensitive, so not going through the Xen framework for MSRs known to be either native or null on Xen would definitely be a win. Hi Juergen, I have some update on this thread while working on it. If we continue down the path of maintaining pvops MSR APIs as this patch series does, it seems we’ll need to duplicate the ALTERNATIVE code in three different places. 1) The MSR access primitives defined in , which is used when CONFIG_PARAVIRT=n. 2) The pvops native MSR functions pv_native_{rd,wr}msr{,_safe}() defined in arch/x86/kernel/paravirt.c, used when CONFIG_PARAVIRT=y on bare metal. 3) The pvops Xen MSR functions paravirt_{read,write}_msr{,_safe}() defined in , used when CONFIG_PARAVIRT_XXL=y. hpa had mentioned to me earlier that this would be a maintenance burden — something I only truly realized once I got hands-on with it. Maybe you have something in mind to address it? Also add PeterZ to the To list because he cares it. Thanks! Xin
[PATCH] selftests: Add functional test for the abort file in fusectl
This patch add a simple functional test for the "about" file in fusectlfs (/sys/fs/fuse/connections/ID/about). A simple fuse daemon is added for testing. Related discussion can be found in the link below. Link: https://lore.kernel.org/all/CAOQ4uxjKFXOKQxPpxtS6G_nR0tpw95w0GiO68UcWg_OBhmSY=q...@mail.gmail.com/ Cc: Amir Goldstein Signed-off-by: Chen Linxuan --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + .../selftests/filesystems/fusectl/.gitignore | 3 + .../selftests/filesystems/fusectl/Makefile| 21 +++ .../selftests/filesystems/fusectl/fuse_mnt.c | 146 ++ .../filesystems/fusectl/fusectl_test.c| 115 ++ 6 files changed, 287 insertions(+) create mode 100644 tools/testing/selftests/filesystems/fusectl/.gitignore create mode 100644 tools/testing/selftests/filesystems/fusectl/Makefile create mode 100644 tools/testing/selftests/filesystems/fusectl/fuse_mnt.c create mode 100644 tools/testing/selftests/filesystems/fusectl/fusectl_test.c diff --git a/MAINTAINERS b/MAINTAINERS index f21f1dabb5fe1..efc6c89113b95 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9740,6 +9740,7 @@ T:git git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git F: Documentation/filesystems/fuse.rst F: fs/fuse/ F: include/uapi/linux/fuse.h +F: tools/testing/selftests/filesystems/fusectl FUTEX SUBSYSTEM M: Thomas Gleixner diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 80fb84fa3cfcb..a9bfefa961889 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -36,6 +36,7 @@ TARGETS += filesystems/fat TARGETS += filesystems/overlayfs TARGETS += filesystems/statmount TARGETS += filesystems/mount-notify +TARGETS += filesystems/fusectl TARGETS += firmware TARGETS += fpu TARGETS += ftrace diff --git a/tools/testing/selftests/filesystems/fusectl/.gitignore b/tools/testing/selftests/filesystems/fusectl/.gitignore new file mode 100644 index 0..3e72e742d08e8 --- /dev/null +++ b/tools/testing/selftests/filesystems/fusectl/.gitignore @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only +fuse_mnt +fusectl_test diff --git a/tools/testing/selftests/filesystems/fusectl/Makefile b/tools/testing/selftests/filesystems/fusectl/Makefile new file mode 100644 index 0..612aad69a93aa --- /dev/null +++ b/tools/testing/selftests/filesystems/fusectl/Makefile @@ -0,0 +1,21 @@ +# SPDX-License-Identifier: GPL-2.0-or-later + +CFLAGS += -Wall -O2 -g $(KHDR_INCLUDES) + +TEST_GEN_PROGS := fusectl_test +TEST_GEN_FILES := fuse_mnt + +include ../../lib.mk + +VAR_CFLAGS := $(shell pkg-config fuse --cflags 2>/dev/null) +ifeq ($(VAR_CFLAGS),) +VAR_CFLAGS := -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse +endif + +VAR_LDLIBS := $(shell pkg-config fuse --libs 2>/dev/null) +ifeq ($(VAR_LDLIBS),) +VAR_LDLIBS := -lfuse -pthread +endif + +$(OUTPUT)/fuse_mnt: CFLAGS += $(VAR_CFLAGS) +$(OUTPUT)/fuse_mnt: LDLIBS += $(VAR_LDLIBS) diff --git a/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c b/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c new file mode 100644 index 0..d12b17f30fadc --- /dev/null +++ b/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * fusectl test file-system + * Creates a simple FUSE filesystem with a single read-write file (/test) + */ + +#define FUSE_USE_VERSION 26 + +#include +#include +#include +#include +#include +#include +#include + +#define MAX(a, b) ((a) > (b) ? (a) : (b)) + +static char *content; +static size_t content_size = 0; +static const char test_path[] = "/test"; + +static int test_getattr(const char *path, struct stat *st) +{ + memset(st, 0, sizeof(*st)); + + if (!strcmp(path, "/")) { + st->st_mode = S_IFDIR | 0755; + st->st_nlink = 2; + return 0; + } + + if (!strcmp(path, test_path)) { + st->st_mode = S_IFREG | 0664; + st->st_nlink = 1; + st->st_size = content_size; + return 0; + } + + return -ENOENT; +} + +static int test_readdir(const char *path, void *buf, fuse_fill_dir_t filler, + off_t offset, struct fuse_file_info *fi) +{ + if (strcmp(path, "/")) + return -ENOENT; + + filler(buf, ".", NULL, 0); + filler(buf, "..", NULL, 0); + filler(buf, test_path + 1, NULL, 0); + + return 0; +} + +static int test_open(const char *path, struct fuse_file_info *fi) +{ + if (strcmp(path, test_path)) + return -ENOENT; + + return 0; +} + +static int test_read(const char *path, char *buf, size_t size, off_t offset, +struct fuse_file_info *fi) +{ + if (strcmp(path, test_path) != 0) + return -ENOENT; + + if (!content || content_size == 0) +
[PATCH] selftests/memfd: clean Makefile
When writing a test for fusectl, I referred to this Makefile as a reference for creating a FUSE daemon in the selftests. While doing so, I noticed that there is a minor issue in the Makefile. The fuse_mnt.c file is not actually compiled into fuse_mnt.o, and the code setting CFLAGS for it never takes effect. The reason fuse_mnt compiles successfully is because CFLAGS is set at the very beginning of the file. Signed-off-by: Chen Linxuan --- tools/testing/selftests/memfd/Makefile | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/tools/testing/selftests/memfd/Makefile b/tools/testing/selftests/memfd/Makefile index 163b6f68631c4..e9b886c65153d 100644 --- a/tools/testing/selftests/memfd/Makefile +++ b/tools/testing/selftests/memfd/Makefile @@ -1,5 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 -CFLAGS += -D_FILE_OFFSET_BITS=64 CFLAGS += $(KHDR_INCLUDES) TEST_GEN_PROGS := memfd_test @@ -16,10 +15,9 @@ ifeq ($(VAR_LDLIBS),) VAR_LDLIBS := -lfuse -pthread endif -fuse_mnt.o: CFLAGS += $(VAR_CFLAGS) - include ../lib.mk +$(OUTPUT)/fuse_mnt: CFLAGS += $(VAR_CFLAGS) $(OUTPUT)/fuse_mnt: LDLIBS += $(VAR_LDLIBS) $(OUTPUT)/memfd_test: memfd_test.c common.c -- 2.43.0
Re: [PATCH v4 3/9] slab: sheaf prefilling for guaranteed allocations
On 5/7/25 11:15, Harry Yoo wrote: > On Fri, Apr 25, 2025 at 10:27:23AM +0200, Vlastimil Babka wrote: >> Add functions for efficient guaranteed allocations e.g. in a critical >> section that cannot sleep, when the exact number of allocations is not >> known beforehand, but an upper limit can be calculated. >> >> kmem_cache_prefill_sheaf() returns a sheaf containing at least given >> number of objects. >> >> kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf >> and is guaranteed not to fail until depleted. >> >> kmem_cache_return_sheaf() is for giving the sheaf back to the slab >> allocator after the critical section. This will also attempt to refill >> it to cache's sheaf capacity for better efficiency of sheaves handling, >> but it's not stricly necessary to succeed. >> >> kmem_cache_refill_sheaf() can be used to refill a previously obtained >> sheaf to requested size. If the current size is sufficient, it does >> nothing. If the requested size exceeds cache's sheaf_capacity and the >> sheaf's current capacity, the sheaf will be replaced with a new one, >> hence the indirect pointer parameter. >> >> kmem_cache_sheaf_size() can be used to query the current size. >> >> The implementation supports requesting sizes that exceed cache's >> sheaf_capacity, but it is not efficient - such "oversize" sheaves are >> allocated fresh in kmem_cache_prefill_sheaf() and flushed and freed >> immediately by kmem_cache_return_sheaf(). kmem_cache_refill_sheaf() >> might be especially ineffective when replacing a sheaf with a new one of >> a larger capacity. It is therefore better to size cache's >> sheaf_capacity accordingly to make oversize sheaves exceptional. >> >> CONFIG_SLUB_STATS counters are added for sheaf prefill and return >> operations. A prefill or return is considered _fast when it is able to >> grab or return a percpu spare sheaf (even if the sheaf needs a refill to >> satisfy the request, as those should amortize over time), and _slow >> otherwise (when the barn or even sheaf allocation/freeing has to be >> involved). sheaf_prefill_oversize is provided to determine how many >> prefills were oversize (counter for oversize returns is not necessary as >> all oversize refills result in oversize returns). >> >> When slub_debug is enabled for a cache with sheaves, no percpu sheaves >> exist for it, but the prefill functionality is still provided simply by >> all prefilled sheaves becoming oversize. If percpu sheaves are not >> created for a cache due to not passing the sheaf_capacity argument on >> cache creation, the prefills also work through oversize sheaves, but >> there's a WARN_ON_ONCE() to indicate the omission. >> >> Signed-off-by: Vlastimil Babka >> Reviewed-by: Suren Baghdasaryan >> --- > > Looks good to me, > Reviewed-by: Harry Yoo > > with a nit below. Thanks, incorporated the suggestion!
Re: [PATCH v4 2/9] slab: add sheaf support for batching kfree_rcu() operations
On 5/14/25 16:01, Vlastimil Babka wrote: > On 5/6/25 23:34, Suren Baghdasaryan wrote: >> On Fri, Apr 25, 2025 at 1:27 AM Vlastimil Babka wrote: >>> @@ -2631,6 +2637,24 @@ static void sheaf_flush_unused(struct kmem_cache *s, >>> struct slab_sheaf *sheaf) >>> sheaf->size = 0; >>> } >>> >>> +static void __rcu_free_sheaf_prepare(struct kmem_cache *s, >>> +struct slab_sheaf *sheaf); >> >> I think you could safely move __rcu_free_sheaf_prepare() here and >> avoid the above forward declaration. > > Right, done. > >>> @@ -5304,6 +5340,140 @@ bool free_to_pcs(struct kmem_cache *s, void *object) >>> return true; >>> } >>> >>> +static void __rcu_free_sheaf_prepare(struct kmem_cache *s, >>> +struct slab_sheaf *sheaf) >> >> This function seems to be an almost exact copy of free_to_pcs_bulk() >> from your previous patch. Maybe they can be consolidated? > > True, I've extracted it to __kmem_cache_free_bulk_prepare(). ... and that was a mistake as free_to_pcs_bulk() diverges in patch 9/9 in a way that this makes it too infeasible
Re: [linux-next:master] [rcutorture] c27d0d38f2: WARNING:at_kernel/rcu/rcutorture.c:#rcutorture_one_extend_check[rcutorture]
hi, Paul, On Wed, May 14, 2025 at 09:18:10PM -0700, Paul E. McKenney wrote: > On Thu, May 15, 2025 at 10:30:02AM +0800, Oliver Sang wrote: > > hi, Paul, > > > > On Wed, May 14, 2025 at 11:26:34AM -0700, Paul E. McKenney wrote: > > > On Wed, May 14, 2025 at 10:47:30AM +0800, kernel test robot wrote: > > > > > > > > hi, Paul, > > > > > > > > for this commit we tested before, now we found it causes issues in > > > > linux-next > > > > master branch. > > > > > > Good catch as always!!! Yes, using HRTIMER_MODE_HARD means that this > > > ircutorture_one_extend_check() function must check for hardirq as well > > > as softirq. Which, oddly enough, permits simplifying the code, though > > > a larger patch. > > > > > > I could reproduce this, and the patch at the end of this email fixes it > > > for me. Does it work for you as well? > > > > thanks a lot for information! > > > > what's the base of the patch? I tried to apply it upon c27d0d38f2 or lastest > > linux-next/master > > bdd609656ff55 (tag: next-20250514, linux-next/master) Add linux-next > > specific files for 20250514 > > > > both failed. thanks > > Apologies, I just now pushed it out on my -rcu tree: > > db950fccd45a ("rcutorture: Make rcutorture_one_extend_check() account for > hardirq") issue is cleaned on db950fccd45a. thanks! Tested-by: kernel test robot = compiler/kconfig/rootfs/runtime/tbox_group/test/testcase/torture_type: clang-20/i386-randconfig-141-20250508/debian-11.1-i386-20220923.cgz/300s/vm-snb/cpuhotplug/rcutorture/srcud commit: 8454f1334b167 ("lib: Make the ratelimit test more reliable") db950fccd45a8 ("rcutorture: Make rcutorture_one_extend_check() account for hardirq") 8454f1334b167506 db950fccd45a8273d93ca861d84 --- fail:runs %reproductionfail:runs | | | 1:50 -2%:50dmesg.EIP:__x86_return_thunk 1:50 -2%:50dmesg.EIP:console_trylock_spinning 48:50 -96%:50dmesg.EIP:pv_native_safe_halt 50:50-100%:50 dmesg.EIP:rcutorture_one_extend_check 50:50-100%:50 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcutorture_one_extend_check[rcutorture] > > I will need to rebase this to precede the updated version of the commit > that you tested, but one step at a time... > > Thanx, Paul > > > > > = > > > > tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type: > > > > > > > > vm-snb/rcutorture/debian-11.1-i386-20220923.cgz/i386-randconfig-141-20250508/clang-20/300s/cpuhotplug/srcud > > > > > > > > c795676b5c0a4ab7 c27d0d38f2cafb70a68ca42c410 > > > > --- > > > >fail:runs %reproductionfail:runs > > > >| | | > > > >:50 2% 1:50 > > > > dmesg.EIP:__kernel_text_address > > > >:50 4% 2:50 > > > > dmesg.EIP:__srcu_check_read_flavor > > > >:50 2% 1:50 > > > > dmesg.EIP:_raw_spin_unlock_irq > > > >:50 14% 7:50 > > > > dmesg.EIP:_raw_spin_unlock_irqrestore > > > >:50 30% 15:50 > > > > dmesg.EIP:console_flush_all > > > >:50 4% 2:50 > > > > dmesg.EIP:console_trylock_spinning > > > >:50 2% 1:50dmesg.EIP:delay_tsc > > > >:50 2% 1:50 > > > > dmesg.EIP:finish_lock_switch > > > >:50 2% 1:50 > > > > dmesg.EIP:kernel_text_address > > > >:50 2% 1:50dmesg.EIP:lock_acquire > > > >:50 36% 18:50 > > > > dmesg.EIP:pv_native_safe_halt > > > > 1:50 -2%:50 > > > > dmesg.EIP:rcu_torture_fwd_prog_cr > > > > 7:50 -14%:50 > > > > dmesg.EIP:rcu_torture_writer > > > >:50 100% 50:50 > > > > dmesg.EIP:rcutorture_one_extend_check > > > > 48:50 0% 48:50 > > > > dmesg.INFO:task_blocked_for_more_than#seconds > > > > 48:50 0% 48:50 > > > > dmesg.Kernel_panic-not_syncing:hung_task:blocked_tasks > > > > 50:50 0% 50:50 > > > > dmesg.UBSAN:negation-overflow_in_lib/sort.c > > > > 1:50 -2%:50 > > > > dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog_cr[rcutorture] > > > > 7:50 -14%:50 > > > > dmesg.WARNING:at_kernel/rcu/rcutorture.c
[PATCH] selftests/mm: Fix test result reporting in gup_longterm
The kselftest framework uses the string logged when a test result is reported as the unique identifier for a test, using it to track test results between runs. The gup_longterm test completely fails to follow this pattern, it runs a single test function repeatedly with various parameters but each result report is a string logging an error message which is fixed between runs. Since the code already logs each test uniquely before it starts refactor to also print this to a buffer, then use that name as the test result. This isn't especially pretty, really this test could use a more substantial cleanup. Signed-off-by: Mark Brown --- tools/testing/selftests/mm/gup_longterm.c | 163 -- 1 file changed, 107 insertions(+), 56 deletions(-) diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c index 21595b20bbc3..a849537f9372 100644 --- a/tools/testing/selftests/mm/gup_longterm.c +++ b/tools/testing/selftests/mm/gup_longterm.c @@ -35,6 +35,8 @@ static int nr_hugetlbsizes; static size_t hugetlbsizes[10]; static int gup_fd; +static char test_name[1024]; + static __fsword_t get_fs_type(int fd) { struct statfs fs; @@ -93,33 +95,48 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) __fsword_t fs_type = get_fs_type(fd); bool should_work; char *mem; + int result = KSFT_PASS; int ret; + if (fd < 0) { + result = KSFT_FAIL; + goto report; + } + if (ftruncate(fd, size)) { if (errno == ENOENT) { skip_test_dodgy_fs("ftruncate()"); } else { - ksft_test_result_fail("ftruncate() failed (%s)\n", strerror(errno)); + ksft_print_msg("ftruncate() failed (%s)\n", + strerror(errno)); + result = KSFT_FAIL; + goto report; } return; } if (fallocate(fd, 0, 0, size)) { - if (size == pagesize) - ksft_test_result_fail("fallocate() failed (%s)\n", strerror(errno)); - else - ksft_test_result_skip("need more free huge pages\n"); - return; + if (size == pagesize) { + ksft_print_msg("fallocate() failed (%s)\n", strerror(errno)); + result = KSFT_FAIL; + } else { + ksft_print_msg("need more free huge pages\n"); + result = KSFT_SKIP; + } + goto report; } mem = mmap(NULL, size, PROT_READ | PROT_WRITE, shared ? MAP_SHARED : MAP_PRIVATE, fd, 0); if (mem == MAP_FAILED) { - if (size == pagesize || shared) - ksft_test_result_fail("mmap() failed (%s)\n", strerror(errno)); - else - ksft_test_result_skip("need more free huge pages\n"); - return; + if (size == pagesize || shared) { + ksft_print_msg("mmap() failed (%s)\n", strerror(errno)); + result = KSFT_FAIL; + } else { + ksft_print_msg("need more free huge pages\n"); + result = KSFT_SKIP; + } + goto report; } /* Fault in the page such that GUP-fast can pin it directly. */ @@ -134,7 +151,8 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) */ ret = mprotect(mem, size, PROT_READ); if (ret) { - ksft_test_result_fail("mprotect() failed (%s)\n", strerror(errno)); + ksft_print_msg("mprotect() failed (%s)\n", strerror(errno)); + result = KSFT_FAIL; goto munmap; } /* FALLTHROUGH */ @@ -147,12 +165,14 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) type == TEST_TYPE_RW_FAST; if (gup_fd < 0) { - ksft_test_result_skip("gup_test not available\n"); + ksft_print_msg("gup_test not available\n"); + result = KSFT_SKIP; break; } if (rw && shared && fs_is_unknown(fs_type)) { - ksft_test_result_skip("Unknown filesystem\n"); + ksft_print_msg("Unknown filesystem\n"); + result = KSFT_SKIP; return; } /* @@ -169,14 +189,19 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) args.flags |= rw ? PIN_LONGTERM_TEST_FLAG_USE_WRITE :
Re: [PATCH v2 4/4] media: i2c: imx214: Remove hard-coded external clock frequency
Hi Sakari, On Tue, May 06, 2025 at 08:24:03AM +, Sakari Ailus wrote: > On Mon, May 05, 2025 at 11:05:56PM +0200, André Apitzsch via B4 Relay wrote: > > From: André Apitzsch > > > > Instead rely on the rate set on the clock (using assigned-clock-rates > > etc.) > > > > Signed-off-by: André Apitzsch > > --- > > drivers/media/i2c/imx214.c | 6 -- > > 1 file changed, 6 deletions(-) > > > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > > index > > 9e9be47394ec768a5b34d44b06b5bbb0988da5a1..c12996e294dccebb18c608254f1e0d14dc064423 > > 100644 > > --- a/drivers/media/i2c/imx214.c > > +++ b/drivers/media/i2c/imx214.c > > @@ -32,7 +32,6 @@ > > > > #define IMX214_REG_FAST_STANDBY_CTRL CCI_REG8(0x0106) > > > > -#define IMX214_DEFAULT_CLK_FREQ2400 > > #define IMX214_DEFAULT_LINK_FREQ 6 > > /* Keep wrong link frequency for backward compatibility */ > > #define IMX214_DEFAULT_LINK_FREQ_LEGACY48000 > > @@ -1405,11 +1404,6 @@ static int imx214_probe(struct i2c_client *client) > > return dev_err_probe(dev, PTR_ERR(imx214->xclk), > > "failed to get xclk\n"); > > > > - ret = clk_set_rate(imx214->xclk, IMX214_DEFAULT_CLK_FREQ); > > - if (ret) > > - return dev_err_probe(dev, ret, > > -"failed to set xclk frequency\n"); > > - > > Oops. I missed this is what the driver was doing already. Indeed, this is > one of the historic sensor drivers that do set the frequency in DT systems. > > The driver never used the clock-frequency property and instead used a fixed > frequency. Changing the behaviour now could be problematic. > > There are options here that I think we could do: > > 1) use your v1 patch (4) which uses "clock-frequency" if it exists and > otherwise uses the default, fixed frequency or > > 2) set the frequency only if the "clock-frequency" property exists. The DT > currently requires clock-frequency and the YAML conversion was done in 2020 > whereas the driver is from 2018. If we do this, the clock-frequency should > be deprecated (or even removed from bingings). > > I wonder what others think. Cc'd Laurent in any case. Maybe I'm missing something, but I don't really see the issue here. The clock-frequency DT property is currently ignored, and this patch doesn't change that situation, does it ? The change of behaviour here is related to the assigned-clock-rates property. If that property is specified today, it will set the clock rate, and the driver will override it to 24MHz right after. With this patch, the clock rate won't be overridden. I think the risk of regression is very low here, as I don't expect systems to set assigned-clock-rates in DT to a value different than 24MHz and expect the driver to override it. > > ret = imx214_get_regulators(dev, imx214); > > if (ret < 0) > > return dev_err_probe(dev, ret, "failed to get regulators\n"); -- Regards, Laurent Pinchart
Re: [PATCH net-next v5] selftests/vsock: add initial vmtest.sh for vsock
On Tue, May 13, 2025 at 11:31:13PM -0700, Bobby Eshleman wrote: This commit introduces a new vmtest.sh runner for vsock. It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used. VMCI and hyperv support is automatically built, though not used. Only tested on x86. To run: $ tools/testing/selftests/vsock/vmtest.sh I just tried to call the script directly, but I have this: $ tools/testing/selftests/vsock/vmtest.sh skip: /home/stefano/repos/linux-vsock/tools/testing/selftests/vsock/vsock_test not found! Please build the kselftest vsock target. pkill: pidfile not valid Try `pkill --help' for more information. Is this expected? or $ make -C tools/testing/selftests TARGETS=vsock run_tests Example runs: $ ./tools/testing/selftests/vsock/vmtest.sh TAP version 13 1..3 not ok 0 vm_server_host_client # exit=1 ok 1 vm_client_host_server ok 2 vm_loopback 1..3 $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback TAP version 13 1..1 ok 0 vm_loopback 1..1 Future work can include vsock_diag_test. The tap output style copies mm's run_vmtests.sh. Because vsock requires a VM to test anything other than loopback, this patch adds vmtest.sh as a kselftest itself. This is different than other systems that have a "vmtest.sh", where it is used as a utility script to spin up a VM to run the selftests as a guest (but isn't hooked into kselftest). Testing in NIPA is still WIP. Signed-off-by: Bobby Eshleman --- Changes in v5: - make log file a tmpfile (Paolo) - make sure both default and user defined QEMU gets handled by the dependency check (Paolo) - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) - rename vm_setup -> vm_start (Paolo) - derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage - Remove unused 'unset IFS' line (Paolo) - leave space after variable declarations (Paolo) - make QEMU_PIDFILE a tmp file (Paolo) - make everything readonly that is only read (Paolo) - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) - don't check for timeout util (Paolo) - add missing usage string for -q qemu arg - add tap prefix to SUMMARY line since it isn't part of TAP protocol - exit with the correct status code based on failure/pass counts - Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262...@gmail.com Changes in v4: - do not use special tab delimiter for help string parsing (Stefano + Paolo) - fix paths for when installing kselftest and running out-of-tree (Paolo) - change vng to using running kernel instead of compiled kernel (Paolo) - use multi-line string for QEMU_OPTS (Stefano) - change timeout to 300s (Paolo) - skip if tools are not found and use kselftests status codes (Paolo) - remove build from vmtest.sh (Paolo) - change -> SSH_HOST_PORT (Stefano) - add tap-format output - add vmtest.log to gitignore - check for vsock_test binary and remind user to build it if missing - create a proper build in makefile - style fixes - add ssh, timeout, and pkill to dependency check, just in case - fix numerical comparison in conditionals - check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh) - fix tracking of pass/fail bug - fix stderr redirection bug - Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163...@gmail.com Changes in v3: - use common conditional syntax for checking variables - use return value instead of global rc - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER - use SIGTERM instead of SIGKILL on cleanup - use peer-cid=1 for loopback - change sleep delay times into globals - fix test_vm_loopback logging - add test selection in arguments - make QEMU an argument - check that vng binary is on path - use QEMU variable - change to - fix hardcoded file paths - add comment in commit msg about script that vmtest.sh was based off of - Add tools/testing/selftest/vsock/Makefile for kselftest - Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a2733...@gmail.com Changes in v2: - add kernel oops and warnings checker - change testname variable to use FUNCNAME - fix spacing in test_vm_server_host_client - add -s skip build option to vmtest.sh - add test_vm_loopback - pass port to vm_wait_for_listener - fix indentation in vmtest.sh - add vmci and hyperv to config - changed whitespace from tabs to spaces in help string - Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab...@gmail.com --- MAINTAINERS | 1 + tools/testing/selftests/vsock/.gitignore | 2 + tools/testing/selftests/vsock/Makefile | 16 ++ tools/testing/selftests/vsock/config | 114 tools/testing/selftests/vsock/settings | 1 + tools/testing/selftests/vsock/vmtest.sh | 441 +++ 6 files changed, 575 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 657a67f9031ef
Re: [PATCH v4 9/9] mm, slub: skip percpu sheaves for remote object freeing
On 5/7/25 12:39, Harry Yoo wrote: > On Fri, Apr 25, 2025 at 10:27:29AM +0200, Vlastimil Babka wrote: >> Since we don't control the NUMA locality of objects in percpu sheaves, >> allocations with node restrictions bypass them. Allocations without >> restrictions may however still expect to get local objects with high >> probability, and the introduction of sheaves can decrease it due to >> freed object from a remote node ending up in percpu sheaves. >> >> The fraction of such remote frees seems low (5% on an 8-node machine) >> but it can be expected that some cache or workload specific corner cases >> exist. We can either conclude that this is not a problem due to the low >> fraction, or we can make remote frees bypass percpu sheaves and go >> directly to their slabs. This will make the remote frees more expensive, >> but if if's only a small fraction, most frees will still benefit from >> the lower overhead of percpu sheaves. >> >> This patch thus makes remote object freeing bypass percpu sheaves, >> including bulk freeing, and kfree_rcu() via the rcu_free sheaf. However >> it's not intended to be 100% guarantee that percpu sheaves will only >> contain local objects. The refill from slabs does not provide that >> guarantee in the first place, and there might be cpu migrations >> happening when we need to unlock the local_lock. Avoiding all that could >> be possible but complicated so we can leave it for later investigation >> whether it would be worth it. It can be expected that the more selective >> freeing will itself prevent accumulation of remote objects in percpu >> sheaves so any such violations would have only short-term effects. >> >> Another possible optimization to investigate is whether it would be >> beneficial for node-restricted or strict_numa allocations to attempt to >> obtain an object from percpu sheaves if the node or mempolicy (i.e. >> MPOL_LOCAL) happens to want the local node of the allocating cpu. Right >> now such allocations bypass sheaves, but they could probably look first >> whether the first available object in percpu sheaves is local, and with >> high probability succeed - and only bypass the sheaves in cases it's >> not local. >> >> Signed-off-by: Vlastimil Babka >> --- >> mm/slab_common.c | 7 +-- >> mm/slub.c| 43 +-- >> 2 files changed, 42 insertions(+), 8 deletions(-) >> >> diff --git a/mm/slub.c b/mm/slub.c >> index >> cc273cc45f632e16644355831132cdc391219cec..2bf83e2b85b23f4db2b311edaded4bef6b7d01de >> 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -5924,8 +5948,15 @@ void slab_free(struct kmem_cache *s, struct slab >> *slab, void *object, >> if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), >> false))) >> return; >> >> -if (!s->cpu_sheaves || !free_to_pcs(s, object)) >> -do_slab_free(s, slab, object, object, 1, addr); >> +if (s->cpu_sheaves) { >> +if (likely(!IS_ENABLED(CONFIG_NUMA) || >> + slab_nid(slab) == numa_node_id())) { >> +free_to_pcs(s, object); > > Shouldn't it call do_slab_free() when free_to_pcs() failed? Oops yes, thanks! > >> +return; >> +} >> +} >> + >> +do_slab_free(s, slab, object, object, 1, addr); >> } >> >> #ifdef CONFIG_MEMCG >> >> -- >> 2.49.0 >> >> >
[PATCH] selftests/mm: Deduplicate test logging in test_mlock_lock()
The mlock2-tests test_mlock_lock() test reports two test results with an identical string, one reporitng if it successfully locked a block of memory and another reporting if the lock is still present after doing an unlock (following a similar pattern to other tests in the same program). This confuses test automation since the test string is used to deduplicate tests, change the post unlock test to report "Unlocked" instead like the other tests to fix this. Signed-off-by: Mark Brown --- tools/testing/selftests/mm/mlock2-tests.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/mlock2-tests.c b/tools/testing/selftests/mm/mlock2-tests.c index 7f0d50fa361d..3e90ff37e336 100644 --- a/tools/testing/selftests/mm/mlock2-tests.c +++ b/tools/testing/selftests/mm/mlock2-tests.c @@ -196,7 +196,7 @@ static void test_mlock_lock(void) ksft_exit_fail_msg("munlock(): %s\n", strerror(errno)); } - ksft_test_result(!unlock_lock_check(map), "%s: Locked\n", __func__); + ksft_test_result(!unlock_lock_check(map), "%s: Unlocked\n", __func__); munmap(map, 2 * page_size); } --- base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3 change-id: 20250514-selftest-mm-mlock2-dup-277d586bb29d Best regards, -- Mark Brown
Re: [PATCH] selftests/mm: Deduplicate test logging in test_mlock_lock()
On 15/05/25 2:57 pm, Mark Brown wrote: The mlock2-tests test_mlock_lock() test reports two test results with an identical string, one reporitng if it successfully locked a block of memory and another reporting if the lock is still present after doing an unlock (following a similar pattern to other tests in the same program). This confuses test automation since the test string is used to deduplicate tests, change the post unlock test to report "Unlocked" instead like the other tests to fix this. Signed-off-by: Mark Brown --- tools/testing/selftests/mm/mlock2-tests.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/mlock2-tests.c b/tools/testing/selftests/mm/mlock2-tests.c index 7f0d50fa361d..3e90ff37e336 100644 --- a/tools/testing/selftests/mm/mlock2-tests.c +++ b/tools/testing/selftests/mm/mlock2-tests.c @@ -196,7 +196,7 @@ static void test_mlock_lock(void) ksft_exit_fail_msg("munlock(): %s\n", strerror(errno)); } - ksft_test_result(!unlock_lock_check(map), "%s: Locked\n", __func__); + ksft_test_result(!unlock_lock_check(map), "%s: Unlocked\n", __func__); munmap(map, 2 * page_size); } Acked-by: Dev Jain --- base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3 change-id: 20250514-selftest-mm-mlock2-dup-277d586bb29d Best regards,
Re: [PATCH] selftests/mm: Fix test result reporting in gup_longterm
On 15/05/25 2:27 pm, Mark Brown wrote: The kselftest framework uses the string logged when a test result is reported as the unique identifier for a test, using it to track test results between runs. The gup_longterm test completely fails to follow this pattern, it runs a single test function repeatedly with various parameters but each result report is a string logging an error message which is fixed between runs. Since the code already logs each test uniquely before it starts refactor to also print this to a buffer, then use that name as the test result. This isn't especially pretty, really this test could use a more substantial cleanup. Signed-off-by: Mark Brown --- tools/testing/selftests/mm/gup_longterm.c | 163 -- 1 file changed, 107 insertions(+), 56 deletions(-) diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c index 21595b20bbc3..a849537f9372 100644 --- a/tools/testing/selftests/mm/gup_longterm.c +++ b/tools/testing/selftests/mm/gup_longterm.c @@ -35,6 +35,8 @@ static int nr_hugetlbsizes; static size_t hugetlbsizes[10]; static int gup_fd; +static char test_name[1024]; + static __fsword_t get_fs_type(int fd) { struct statfs fs; @@ -93,33 +95,48 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) __fsword_t fs_type = get_fs_type(fd); bool should_work; char *mem; + int result = KSFT_PASS; int ret; + if (fd < 0) { + result = KSFT_FAIL; + goto report; + } + if (ftruncate(fd, size)) { if (errno == ENOENT) { skip_test_dodgy_fs("ftruncate()"); } else { - ksft_test_result_fail("ftruncate() failed (%s)\n", strerror(errno)); + ksft_print_msg("ftruncate() failed (%s)\n", + strerror(errno)); + result = KSFT_FAIL; + goto report; } return; } if (fallocate(fd, 0, 0, size)) { - if (size == pagesize) - ksft_test_result_fail("fallocate() failed (%s)\n", strerror(errno)); - else - ksft_test_result_skip("need more free huge pages\n"); - return; + if (size == pagesize) { + ksft_print_msg("fallocate() failed (%s)\n", strerror(errno)); + result = KSFT_FAIL; + } else { + ksft_print_msg("need more free huge pages\n"); + result = KSFT_SKIP; + } + goto report; } mem = mmap(NULL, size, PROT_READ | PROT_WRITE, shared ? MAP_SHARED : MAP_PRIVATE, fd, 0); if (mem == MAP_FAILED) { - if (size == pagesize || shared) - ksft_test_result_fail("mmap() failed (%s)\n", strerror(errno)); - else - ksft_test_result_skip("need more free huge pages\n"); - return; + if (size == pagesize || shared) { + ksft_print_msg("mmap() failed (%s)\n", strerror(errno)); + result = KSFT_FAIL; + } else { + ksft_print_msg("need more free huge pages\n"); + result = KSFT_SKIP; + } + goto report; } /* Fault in the page such that GUP-fast can pin it directly. */ @@ -134,7 +151,8 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) */ ret = mprotect(mem, size, PROT_READ); if (ret) { - ksft_test_result_fail("mprotect() failed (%s)\n", strerror(errno)); + ksft_print_msg("mprotect() failed (%s)\n", strerror(errno)); + result = KSFT_FAIL; goto munmap; } /* FALLTHROUGH */ @@ -147,12 +165,14 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) type == TEST_TYPE_RW_FAST; if (gup_fd < 0) { - ksft_test_result_skip("gup_test not available\n"); + ksft_print_msg("gup_test not available\n"); + result = KSFT_SKIP; break; } if (rw && shared && fs_is_unknown(fs_type)) { - ksft_test_result_skip("Unknown filesystem\n"); + ksft_print_msg("Unknown filesystem\n"); + result = KSFT_SKIP; return; } /* @@ -169,14 +189,19 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) args.flags |= rw ? PIN_LONGTERM_TES
[PATCH] selftests/mm: Deduplicate default page size test results in thuge-gen
The thuge-gen test program runs mmap() and shmget() tests for both every available page size and the default page size, resulting in two tests for the default size. These tests are distinct since the flags in the default case do not specify an explicit size, add the flags to the test name that is logged to deduplicate. Signed-off-by: Mark Brown --- tools/testing/selftests/mm/thuge-gen.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mm/thuge-gen.c b/tools/testing/selftests/mm/thuge-gen.c index cd5174d735be..a41bc1234b37 100644 --- a/tools/testing/selftests/mm/thuge-gen.c +++ b/tools/testing/selftests/mm/thuge-gen.c @@ -127,7 +127,7 @@ void test_mmap(unsigned long size, unsigned flags) show(size); ksft_test_result(size == getpagesize() || (before - after) == NUM_PAGES, -"%s mmap %lu\n", __func__, size); +"%s mmap %lu %x\n", __func__, size, flags); if (munmap(map, size * NUM_PAGES)) ksft_exit_fail_msg("%s: unmap %s\n", __func__, strerror(errno)); @@ -165,7 +165,7 @@ void test_shmget(unsigned long size, unsigned flags) show(size); ksft_test_result(size == getpagesize() || (before - after) == NUM_PAGES, -"%s: mmap %lu\n", __func__, size); +"%s: mmap %lu %x\n", __func__, size, flags); if (shmdt(map)) ksft_exit_fail_msg("%s: shmdt: %s\n", __func__, strerror(errno)); } --- base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3 change-id: 20250514-selfests-mm-thuge-gen-dup-7e1c40716091 Best regards, -- Mark Brown
Re: [PATCH] selftests/mm: Fix test result reporting in gup_longterm
On Thu, May 15, 2025 at 03:05:07PM +0530, Dev Jain wrote: > On 15/05/25 2:27 pm, Mark Brown wrote: > > @@ -189,7 +214,10 @@ static void do_test(int fd, size_t size, enum > > test_type type, bool shared) > > * some previously unsupported filesystems, we might want to > > * perform some additional tests for possible data corruptions. > > */ > > - ksft_test_result(should_work, "Should have worked\n"); > > + if (should_work) > > + result = KSFT_PASS; > Missed printing "Should have worked" here. I didn't think that output was particularly useful separate to the overall test result (which is logged on exit from the function), it's just the test result more than diagnostic information. Please delete unneeded context from mails when replying. Doing this makes it much easier to find your reply in the message, helping ensure it won't be missed by people scrolling through the irrelevant quoted material. signature.asc Description: PGP signature
Re: [PATCH] selftests/mm: Fix test result reporting in gup_longterm
On 15/05/25 3:11 pm, Mark Brown wrote: On Thu, May 15, 2025 at 03:05:07PM +0530, Dev Jain wrote: On 15/05/25 2:27 pm, Mark Brown wrote: @@ -189,7 +214,10 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared) * some previously unsupported filesystems, we might want to * perform some additional tests for possible data corruptions. */ - ksft_test_result(should_work, "Should have worked\n"); + if (should_work) + result = KSFT_PASS; Missed printing "Should have worked" here. I didn't think that output was particularly useful separate to the overall test result (which is logged on exit from the function), it's just the test result more than diagnostic information. No hard opinion. Please delete unneeded context from mails when replying. Doing this makes it much easier to find your reply in the message, helping ensure it won't be missed by people scrolling through the irrelevant quoted material. You have mentioned that before, sorry my bad! I also hate it :)
Re: [PATCH] selftests/mm: Deduplicate default page size test results in thuge-gen
On 15/05/25 3:06 pm, Mark Brown wrote: The thuge-gen test program runs mmap() and shmget() tests for both every available page size and the default page size, resulting in two tests for the default size. These tests are distinct since the flags in the default case do not specify an explicit size, add the flags to the test name that is logged to deduplicate. Signed-off-by: Mark Brown Acked-by: Dev Jain
Re: [PATCH] selftests: remove duplicated function definition
On Wed, May 14, 2025 at 10:02:02AM +0800, Chen Linxuan wrote: > I failed to build this test on Ubuntu 24.04. Compiler complains that > function sys_open_tree has already been defined in > "../filesystems/overlayfs/wrappers.h". > > Signed-off-by: Chen Linxuan > --- This is already fixed in the next vfs tree. Thanks though!
Re: [PATCH 01/12] module: Move modprobe_path and modules_disabled ctl_tables into the module subsys
On Thu, May 15, 2025 at 10:04:53AM +0200, Petr Pavlu wrote: > On 5/9/25 14:54, Joel Granados wrote: > > Move module sysctl (modprobe_path and modules_disabled) out of sysctl.c > > and into the modules subsystem. Make the modprobe_path variable static > > as it no longer needs to be exported. Remove module.h from the includes > > in sysctl as it no longer uses any module exported variables. > > > > This is part of a greater effort to move ctl tables into their > > respective subsystems which will reduce the merge conflicts in > > kernel/sysctl.c. > > > > Signed-off-by: Joel Granados > > [...] > > --- a/kernel/module/kmod.c > > +++ b/kernel/module/kmod.c > > @@ -60,7 +60,7 @@ static DEFINE_SEMAPHORE(kmod_concurrent_max, > > MAX_KMOD_CONCURRENT); > > /* > > modprobe_path is set via /proc/sys. > > */ > > -char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH; > > +static char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH; > > > > static void free_modprobe_argv(struct subprocess_info *info) > > { > > @@ -177,3 +177,33 @@ int __request_module(bool wait, const char *fmt, ...) > > return ret; > > } > > EXPORT_SYMBOL(__request_module); > > + > > +#ifdef CONFIG_MODULES > > +static const struct ctl_table kmod_sysctl_table[] = { > > + { > > + .procname = "modprobe", > > + .data = &modprobe_path, > > + .maxlen = KMOD_PATH_LEN, > > + .mode = 0644, > > + .proc_handler = proc_dostring, > > + }, > > + { > > + .procname = "modules_disabled", > > + .data = &modules_disabled, > > + .maxlen = sizeof(int), > > + .mode = 0644, > > + /* only handle a transition from default "0" to "1" */ > > + .proc_handler = proc_dointvec_minmax, > > + .extra1 = SYSCTL_ONE, > > + .extra2 = SYSCTL_ONE, > > + }, > > This is minor.. but the file kernel/module/kmod.c contains the logic to > request direct modprobe invocation by the kernel. Registering the > modprobe_path sysctl here is appropriate. However, the modules_disabled > setting affects the entire module loader so I don't think it's best to > register it here. > > I suggest keeping a single table for the module sysctl values but moving > it to kernel/module/main.c. This means the variable modprobe_path must > retain external linkage, on the other hand, modules_disabled can be made > static. Like this?: --- include/linux/module.h | 1 - kernel/module/main.c | 30 +- kernel/sysctl.c| 20 3 files changed, 29 insertions(+), 22 deletions(-) diff --git a/include/linux/module.h b/include/linux/module.h index d94b196d5a34..25476168e012 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -302,7 +302,6 @@ struct notifier_block; #ifdef CONFIG_MODULES -extern int modules_disabled; /* for sysctl */ /* Get/put a kernel symbol (calls must be symmetric) */ void *__symbol_get(const char *symbol); void *__symbol_get_gpl(const char *symbol); diff --git a/kernel/module/main.c b/kernel/module/main.c index a2859dc3eea6..13055ef65f15 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -126,9 +126,37 @@ static void mod_update_bounds(struct module *mod) } /* Block module loading/unloading? */ -int modules_disabled; +static int modules_disabled; core_param(nomodule, modules_disabled, bint, 0); +static const struct ctl_table kmod_sysctl_table[] = { + { + .procname = "modprobe", + .data = &modprobe_path, + .maxlen = KMOD_PATH_LEN, + .mode = 0644, + .proc_handler = proc_dostring, + }, + { + .procname = "modules_disabled", + .data = &modules_disabled, + .maxlen = sizeof(int), + .mode = 0644, + /* only handle a transition from default "0" to "1" */ + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ONE, + .extra2 = SYSCTL_ONE, + }, +}; + +static int __init init_kmod_sysctl(void) +{ + register_sysctl_init("kernel", kmod_sysctl_table); + return 0; +} + +subsys_initcall(init_kmod_sysctl); + /* Waiting for a module to finish initializing? */ static DECLARE_WAIT_QUEUE_HEAD(module_wq); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 9b4f0cff76ea..473133d9651e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -19,7 +19,6 @@ * Removed it and replaced it with older style, 03/23/00, Bill Wendling */ -#include #include #include #include @@ -1616,25 +1615,6 @@ static const struct ctl_table kern_table[] = { .proc_handler = proc_dointvec, }, #endif -#ifdef CONFIG_MODULES - { - .procname = "m
Re: [PATCH] selftests: Add functional test for the abort file in fusectl
On Thu, May 15, 2025 at 9:35 AM Chen Linxuan wrote: > > This patch add a simple functional test for the "about" file > in fusectlfs (/sys/fs/fuse/connections/ID/about). > > A simple fuse daemon is added for testing. > > Related discussion can be found in the link below. > > Link: > https://lore.kernel.org/all/CAOQ4uxjKFXOKQxPpxtS6G_nR0tpw95w0GiO68UcWg_OBhmSY=q...@mail.gmail.com/ > Cc: Amir Goldstein > Signed-off-by: Chen Linxuan > --- > MAINTAINERS | 1 + > tools/testing/selftests/Makefile | 1 + > .../selftests/filesystems/fusectl/.gitignore | 3 + > .../selftests/filesystems/fusectl/Makefile| 21 +++ > .../selftests/filesystems/fusectl/fuse_mnt.c | 146 ++ > .../filesystems/fusectl/fusectl_test.c| 115 ++ > 6 files changed, 287 insertions(+) > create mode 100644 tools/testing/selftests/filesystems/fusectl/.gitignore > create mode 100644 tools/testing/selftests/filesystems/fusectl/Makefile > create mode 100644 tools/testing/selftests/filesystems/fusectl/fuse_mnt.c > create mode 100644 tools/testing/selftests/filesystems/fusectl/fusectl_test.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index f21f1dabb5fe1..efc6c89113b95 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -9740,6 +9740,7 @@ T:git > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git > F: Documentation/filesystems/fuse.rst > F: fs/fuse/ > F: include/uapi/linux/fuse.h > +F: tools/testing/selftests/filesystems/fusectl > > FUTEX SUBSYSTEM > M: Thomas Gleixner > diff --git a/tools/testing/selftests/Makefile > b/tools/testing/selftests/Makefile > index 80fb84fa3cfcb..a9bfefa961889 100644 > --- a/tools/testing/selftests/Makefile > +++ b/tools/testing/selftests/Makefile > @@ -36,6 +36,7 @@ TARGETS += filesystems/fat > TARGETS += filesystems/overlayfs > TARGETS += filesystems/statmount > TARGETS += filesystems/mount-notify > +TARGETS += filesystems/fusectl > TARGETS += firmware > TARGETS += fpu > TARGETS += ftrace > diff --git a/tools/testing/selftests/filesystems/fusectl/.gitignore > b/tools/testing/selftests/filesystems/fusectl/.gitignore > new file mode 100644 > index 0..3e72e742d08e8 > --- /dev/null > +++ b/tools/testing/selftests/filesystems/fusectl/.gitignore > @@ -0,0 +1,3 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +fuse_mnt > +fusectl_test > diff --git a/tools/testing/selftests/filesystems/fusectl/Makefile > b/tools/testing/selftests/filesystems/fusectl/Makefile > new file mode 100644 > index 0..612aad69a93aa > --- /dev/null > +++ b/tools/testing/selftests/filesystems/fusectl/Makefile > @@ -0,0 +1,21 @@ > +# SPDX-License-Identifier: GPL-2.0-or-later > + > +CFLAGS += -Wall -O2 -g $(KHDR_INCLUDES) > + > +TEST_GEN_PROGS := fusectl_test > +TEST_GEN_FILES := fuse_mnt > + > +include ../../lib.mk > + > +VAR_CFLAGS := $(shell pkg-config fuse --cflags 2>/dev/null) > +ifeq ($(VAR_CFLAGS),) > +VAR_CFLAGS := -D_FILE_OFFSET_BITS=64 -I/usr/include/fuse > +endif > + > +VAR_LDLIBS := $(shell pkg-config fuse --libs 2>/dev/null) > +ifeq ($(VAR_LDLIBS),) > +VAR_LDLIBS := -lfuse -pthread > +endif > + > +$(OUTPUT)/fuse_mnt: CFLAGS += $(VAR_CFLAGS) > +$(OUTPUT)/fuse_mnt: LDLIBS += $(VAR_LDLIBS) > diff --git a/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c > b/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c > new file mode 100644 > index 0..d12b17f30fadc > --- /dev/null > +++ b/tools/testing/selftests/filesystems/fusectl/fuse_mnt.c > @@ -0,0 +1,146 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * fusectl test file-system > + * Creates a simple FUSE filesystem with a single read-write file (/test) > + */ > + > +#define FUSE_USE_VERSION 26 > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define MAX(a, b) ((a) > (b) ? (a) : (b)) > + > +static char *content; > +static size_t content_size = 0; > +static const char test_path[] = "/test"; > + > +static int test_getattr(const char *path, struct stat *st) > +{ > + memset(st, 0, sizeof(*st)); > + > + if (!strcmp(path, "/")) { > + st->st_mode = S_IFDIR | 0755; > + st->st_nlink = 2; > + return 0; > + } > + > + if (!strcmp(path, test_path)) { > + st->st_mode = S_IFREG | 0664; > + st->st_nlink = 1; > + st->st_size = content_size; > + return 0; > + } > + > + return -ENOENT; > +} > + > +static int test_readdir(const char *path, void *buf, fuse_fill_dir_t filler, > + off_t offset, struct fuse_file_info *fi) > +{ > + if (strcmp(path, "/")) > + return -ENOENT; > + > + filler(buf, ".", NULL, 0); > + filler(buf, "..", NULL, 0); > + filler(buf, test_path + 1, NULL, 0); > + > + return 0; > +} > + > +static int test_open(const char *path, struct fuse_file_info *fi) >
Re: [PATCH bpf-next v3 4/8] selftests/bpf: Introduce verdict programs for sockmap_redir
On 5/15/25 06:29, John Fastabend wrote: > On 2025-05-15 00:15:27, Michal Luczaj wrote: >> Instead of piggybacking on test_sockmap_listen, introduce >> test_sockmap_redir especially for sockmap redirection tests. >> >> Suggested-by: Jiayuan Chen >> Acked-by: John Fastabend >> Signed-off-by: Michal Luczaj >> --- > > LGTM, this is a net new patch in v3 though correct? In the future > only carry the Acks into patches that are from the previous version > and not substantially changed. Will do, sorry. Note that I wasn't being sneaky, I asked about it in the cover letter :) Thanks, Michal > All the same thanks for doing this. For this patch. > > Acked-by: John Fastabend
Re: [PATCH v4] vdpa/octeon_ep: Control PCI dev enabling manually
On Thu, May 15, 2025 at 09:14:22AM +0200, Philipp Stanner wrote: > On Thu, 2025-05-08 at 10:51 +0200, Philipp Stanner wrote: > > PCI region request functions such as pci_request_region() currently > > have > > the problem of becoming sometimes managed functions, if > > pcim_enable_device() instead of pci_enable_device() was called. The > > PCI > > subsystem wants to remove this deprecated behavior from its > > interfaces. > > > > octeopn_ep enables its device with pcim_enable_device() (for VF. PF > > uses > > manual management), but does so only to get automatic disablement. > > The > > driver wants to manage its PCI resources for VF manually, without > > devres. > > > > The easiest way not to use automatic resource management at all is by > > also handling device enable- and disablement manually. > > > > Replace pcim_enable_device() with pci_enable_device(). Add the > > necessary > > calls to pci_disable_device(). > > > > Signed-off-by: Philipp Stanner > > Acked-by: Vamsi Attunuru > > Hi again, > > this is the last of 12 drivers blocking me from removing a few hundred > lines of broken code from PCI. Would be great if it could be sent to > Linus next merge window. > > Can someone take this patch in? > > Thx > P. I intend to, working on packing things up for -next as we speak. > > --- > > Changes in v4: > > - s/AF/PF > > - Add Vamsi's AB > > > > Changes in v3: > > - Only call pci_disable_device() for the PF version. For AF it > > would > > cause a WARN_ON because pcim_enable_device()'s callback will also > > try to disable. > > --- > > drivers/vdpa/octeon_ep/octep_vdpa_main.c | 17 - > > 1 file changed, 12 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/vdpa/octeon_ep/octep_vdpa_main.c > > b/drivers/vdpa/octeon_ep/octep_vdpa_main.c > > index f3d4dda4e04c..9b49efd24391 100644 > > --- a/drivers/vdpa/octeon_ep/octep_vdpa_main.c > > +++ b/drivers/vdpa/octeon_ep/octep_vdpa_main.c > > @@ -454,6 +454,9 @@ static void octep_vdpa_remove_pf(struct pci_dev > > *pdev) > > octep_iounmap_region(pdev, octpf->base, > > OCTEP_HW_MBOX_BAR); > > > > octep_vdpa_pf_bar_expand(octpf); > > + > > + /* The pf version does not use managed PCI. */ > > + pci_disable_device(pdev); > > } > > > > static void octep_vdpa_vf_bar_shrink(struct pci_dev *pdev) > > @@ -825,7 +828,7 @@ static int octep_vdpa_probe_pf(struct pci_dev > > *pdev) > > struct octep_pf *octpf; > > int ret; > > > > - ret = pcim_enable_device(pdev); > > + ret = pci_enable_device(pdev); > > if (ret) { > > dev_err(dev, "Failed to enable device\n"); > > return ret; > > @@ -834,15 +837,17 @@ static int octep_vdpa_probe_pf(struct pci_dev > > *pdev) > > ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)); > > if (ret) { > > dev_err(dev, "No usable DMA configuration\n"); > > - return ret; > > + goto disable_pci; > > } > > octpf = devm_kzalloc(dev, sizeof(*octpf), GFP_KERNEL); > > - if (!octpf) > > - return -ENOMEM; > > + if (!octpf) { > > + ret = -ENOMEM; > > + goto disable_pci; > > + } > > > > ret = octep_iomap_region(pdev, octpf->base, > > OCTEP_HW_MBOX_BAR); > > if (ret) > > - return ret; > > + goto disable_pci; > > > > pci_set_master(pdev); > > pci_set_drvdata(pdev, octpf); > > @@ -856,6 +861,8 @@ static int octep_vdpa_probe_pf(struct pci_dev > > *pdev) > > > > unmap_region: > > octep_iounmap_region(pdev, octpf->base, OCTEP_HW_MBOX_BAR); > > +disable_pci: > > + pci_disable_device(pdev); > > return ret; > > } > >
Re: [PATCH v2 4/4] media: i2c: imx214: Remove hard-coded external clock frequency
Hi Laurent, On Thu, May 15, 2025 at 10:58:46AM +0200, Laurent Pinchart wrote: > Hi Sakari, > > On Tue, May 06, 2025 at 08:24:03AM +, Sakari Ailus wrote: > > On Mon, May 05, 2025 at 11:05:56PM +0200, André Apitzsch via B4 Relay wrote: > > > From: André Apitzsch > > > > > > Instead rely on the rate set on the clock (using assigned-clock-rates > > > etc.) > > > > > > Signed-off-by: André Apitzsch > > > --- > > > drivers/media/i2c/imx214.c | 6 -- > > > 1 file changed, 6 deletions(-) > > > > > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > > > index > > > 9e9be47394ec768a5b34d44b06b5bbb0988da5a1..c12996e294dccebb18c608254f1e0d14dc064423 > > > 100644 > > > --- a/drivers/media/i2c/imx214.c > > > +++ b/drivers/media/i2c/imx214.c > > > @@ -32,7 +32,6 @@ > > > > > > #define IMX214_REG_FAST_STANDBY_CTRL CCI_REG8(0x0106) > > > > > > -#define IMX214_DEFAULT_CLK_FREQ 2400 > > > #define IMX214_DEFAULT_LINK_FREQ 6 > > > /* Keep wrong link frequency for backward compatibility */ > > > #define IMX214_DEFAULT_LINK_FREQ_LEGACY 48000 > > > @@ -1405,11 +1404,6 @@ static int imx214_probe(struct i2c_client *client) > > > return dev_err_probe(dev, PTR_ERR(imx214->xclk), > > >"failed to get xclk\n"); > > > > > > - ret = clk_set_rate(imx214->xclk, IMX214_DEFAULT_CLK_FREQ); > > > - if (ret) > > > - return dev_err_probe(dev, ret, > > > - "failed to set xclk frequency\n"); > > > - > > > > Oops. I missed this is what the driver was doing already. Indeed, this is > > one of the historic sensor drivers that do set the frequency in DT systems. > > > > The driver never used the clock-frequency property and instead used a fixed > > frequency. Changing the behaviour now could be problematic. > > > > There are options here that I think we could do: > > > > 1) use your v1 patch (4) which uses "clock-frequency" if it exists and > > otherwise uses the default, fixed frequency or > > > > 2) set the frequency only if the "clock-frequency" property exists. The DT > > currently requires clock-frequency and the YAML conversion was done in 2020 > > whereas the driver is from 2018. If we do this, the clock-frequency should > > be deprecated (or even removed from bingings). > > > > I wonder what others think. Cc'd Laurent in any case. > > Maybe I'm missing something, but I don't really see the issue here. The > clock-frequency DT property is currently ignored, and this patch doesn't > change that situation, does it ? > > The change of behaviour here is related to the assigned-clock-rates > property. If that property is specified today, it will set the clock > rate, and the driver will override it to 24MHz right after. With this > patch, the clock rate won't be overridden. I think the risk of > regression is very low here, as I don't expect systems to set > assigned-clock-rates in DT to a value different than 24MHz and expect > the driver to override it. If the DTS had assigned-clock-rates set correctly, then yes. How much can we trust the older DTS did have that? -- Regards, Sakari Ailus
Re: [PATCH 01/12] module: Move modprobe_path and modules_disabled ctl_tables into the module subsys
On 5/15/25 12:04, Joel Granados wrote: > On Thu, May 15, 2025 at 10:04:53AM +0200, Petr Pavlu wrote: >> On 5/9/25 14:54, Joel Granados wrote: >>> Move module sysctl (modprobe_path and modules_disabled) out of sysctl.c >>> and into the modules subsystem. Make the modprobe_path variable static >>> as it no longer needs to be exported. Remove module.h from the includes >>> in sysctl as it no longer uses any module exported variables. >>> >>> This is part of a greater effort to move ctl tables into their >>> respective subsystems which will reduce the merge conflicts in >>> kernel/sysctl.c. >>> >>> Signed-off-by: Joel Granados >>> [...] >>> --- a/kernel/module/kmod.c >>> +++ b/kernel/module/kmod.c >>> @@ -60,7 +60,7 @@ static DEFINE_SEMAPHORE(kmod_concurrent_max, >>> MAX_KMOD_CONCURRENT); >>> /* >>> modprobe_path is set via /proc/sys. >>> */ >>> -char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH; >>> +static char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH; >>> >>> static void free_modprobe_argv(struct subprocess_info *info) >>> { >>> @@ -177,3 +177,33 @@ int __request_module(bool wait, const char *fmt, ...) >>> return ret; >>> } >>> EXPORT_SYMBOL(__request_module); >>> + >>> +#ifdef CONFIG_MODULES >>> +static const struct ctl_table kmod_sysctl_table[] = { >>> + { >>> + .procname = "modprobe", >>> + .data = &modprobe_path, >>> + .maxlen = KMOD_PATH_LEN, >>> + .mode = 0644, >>> + .proc_handler = proc_dostring, >>> + }, >>> + { >>> + .procname = "modules_disabled", >>> + .data = &modules_disabled, >>> + .maxlen = sizeof(int), >>> + .mode = 0644, >>> + /* only handle a transition from default "0" to "1" */ >>> + .proc_handler = proc_dointvec_minmax, >>> + .extra1 = SYSCTL_ONE, >>> + .extra2 = SYSCTL_ONE, >>> + }, >> >> This is minor.. but the file kernel/module/kmod.c contains the logic to >> request direct modprobe invocation by the kernel. Registering the >> modprobe_path sysctl here is appropriate. However, the modules_disabled >> setting affects the entire module loader so I don't think it's best to >> register it here. >> >> I suggest keeping a single table for the module sysctl values but moving >> it to kernel/module/main.c. This means the variable modprobe_path must >> retain external linkage, on the other hand, modules_disabled can be made >> static. > > Like this?: > [...] Let's also move the KMOD_PATH_LEN definition and the modprobe_path declaration from include/linux/kmod.h to kernel/module/internal.h, as they are now fully internal to the module loader, and use "module" instead of "kmod" in the sysctl registration to avoid confusion with the modprobe logic. The adjusted patch is below. Reviewed-by: Petr Pavlu -- Thanks, Petr diff --git a/include/linux/kmod.h b/include/linux/kmod.h index 68f69362d427..9a07c3215389 100644 --- a/include/linux/kmod.h +++ b/include/linux/kmod.h @@ -14,10 +14,7 @@ #include #include -#define KMOD_PATH_LEN 256 - #ifdef CONFIG_MODULES -extern char modprobe_path[]; /* for sysctl */ /* modprobe exit status on success, -ve on error. Return value * usually useless though. */ extern __printf(2, 3) diff --git a/include/linux/module.h b/include/linux/module.h index 8050f77c3b64..f4ab8d90c475 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -304,7 +304,6 @@ struct notifier_block; #ifdef CONFIG_MODULES -extern int modules_disabled; /* for sysctl */ /* Get/put a kernel symbol (calls must be symmetric) */ void *__symbol_get(const char *symbol); void *__symbol_get_gpl(const char *symbol); diff --git a/kernel/module/internal.h b/kernel/module/internal.h index 626cf8668a7e..0954c8de00c2 100644 --- a/kernel/module/internal.h +++ b/kernel/module/internal.h @@ -58,6 +58,9 @@ extern const struct kernel_symbol __stop___ksymtab_gpl[]; extern const u32 __start___kcrctab[]; extern const u32 __start___kcrctab_gpl[]; +#define KMOD_PATH_LEN 256 +extern char modprobe_path[]; + struct load_info { const char *name; /* pointer to module in temporary copy, freed at end of load_module() */ diff --git a/kernel/module/main.c b/kernel/module/main.c index a2859dc3eea6..a336b7b3fb23 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -126,9 +126,37 @@ static void mod_update_bounds(struct module *mod) } /* Block module loading/unloading? */ -int modules_disabled; +static int modules_disabled; core_param(nomodule, modules_disabled, bint, 0); +static const struct ctl_table module_sysctl_table[] = { + { + .procname = "modprobe", + .data = &modprobe_path, + .maxlen = KMOD_PATH_LEN, + .mode = 0644, + .proc_handler = proc_dostring, + }, + { +
Re: [PATCH v4 0/9] SLUB percpu sheaves
On 4/25/25 10:27, Vlastimil Babka wrote: > Hi, > > This is the v4 and first non-RFC series to add an opt-in percpu > array-based caching layer to SLUB, following the LSF/MM discussions. > Since v3 I've also made changes to achieve full compatibility with > slub_debug, and IRC discussions led to the last patch intended to > improve NUMA locality (the patch remains separate for evaluation > purposes). I've pushed the changes based on the feedback here: https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slub-percpu-sheaves You can use that for testing/benchmarking. Thanks!
Re: [PATCH v2 4/4] media: i2c: imx214: Remove hard-coded external clock frequency
On Thu, May 15, 2025 at 02:54:54PM +0300, Sakari Ailus wrote: > On Thu, May 15, 2025 at 10:58:46AM +0200, Laurent Pinchart wrote: > > On Tue, May 06, 2025 at 08:24:03AM +, Sakari Ailus wrote: > > > On Mon, May 05, 2025 at 11:05:56PM +0200, André Apitzsch via B4 Relay > > > wrote: > > > > From: André Apitzsch > > > > > > > > Instead rely on the rate set on the clock (using assigned-clock-rates > > > > etc.) > > > > > > > > Signed-off-by: André Apitzsch > > > > --- > > > > drivers/media/i2c/imx214.c | 6 -- > > > > 1 file changed, 6 deletions(-) > > > > > > > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > > > > index > > > > 9e9be47394ec768a5b34d44b06b5bbb0988da5a1..c12996e294dccebb18c608254f1e0d14dc064423 > > > > 100644 > > > > --- a/drivers/media/i2c/imx214.c > > > > +++ b/drivers/media/i2c/imx214.c > > > > @@ -32,7 +32,6 @@ > > > > > > > > #define IMX214_REG_FAST_STANDBY_CTRL CCI_REG8(0x0106) > > > > > > > > -#define IMX214_DEFAULT_CLK_FREQ2400 > > > > #define IMX214_DEFAULT_LINK_FREQ 6 > > > > /* Keep wrong link frequency for backward compatibility */ > > > > #define IMX214_DEFAULT_LINK_FREQ_LEGACY48000 > > > > @@ -1405,11 +1404,6 @@ static int imx214_probe(struct i2c_client > > > > *client) > > > > return dev_err_probe(dev, PTR_ERR(imx214->xclk), > > > > "failed to get xclk\n"); > > > > > > > > - ret = clk_set_rate(imx214->xclk, IMX214_DEFAULT_CLK_FREQ); > > > > - if (ret) > > > > - return dev_err_probe(dev, ret, > > > > -"failed to set xclk frequency\n"); > > > > - > > > > > > Oops. I missed this is what the driver was doing already. Indeed, this is > > > one of the historic sensor drivers that do set the frequency in DT > > > systems. > > > > > > The driver never used the clock-frequency property and instead used a > > > fixed > > > frequency. Changing the behaviour now could be problematic. > > > > > > There are options here that I think we could do: > > > > > > 1) use your v1 patch (4) which uses "clock-frequency" if it exists and > > > otherwise uses the default, fixed frequency or > > > > > > 2) set the frequency only if the "clock-frequency" property exists. The DT > > > currently requires clock-frequency and the YAML conversion was done in > > > 2020 > > > whereas the driver is from 2018. If we do this, the clock-frequency should > > > be deprecated (or even removed from bingings). > > > > > > I wonder what others think. Cc'd Laurent in any case. > > > > Maybe I'm missing something, but I don't really see the issue here. The > > clock-frequency DT property is currently ignored, and this patch doesn't > > change that situation, does it ? > > > > The change of behaviour here is related to the assigned-clock-rates > > property. If that property is specified today, it will set the clock > > rate, and the driver will override it to 24MHz right after. With this > > patch, the clock rate won't be overridden. I think the risk of > > regression is very low here, as I don't expect systems to set > > assigned-clock-rates in DT to a value different than 24MHz and expect > > the driver to override it. > > If the DTS had assigned-clock-rates set correctly, then yes. How much can > we trust the older DTS did have that? I am relatively confident that DT-based systems wouldn't have an assigned-clock-rates property with a frequency that doesn't match IMX214_DEFAULT_CLK_FREQ. The real question is whether or not I'm over-confident :-) -- Regards, Laurent Pinchart
Re: [linux-next:master] [rcutorture] c27d0d38f2: WARNING:at_kernel/rcu/rcutorture.c:#rcutorture_one_extend_check[rcutorture]
On Thu, May 15, 2025 at 04:45:35PM +0800, Oliver Sang wrote: > hi, Paul, > > On Wed, May 14, 2025 at 09:18:10PM -0700, Paul E. McKenney wrote: > > On Thu, May 15, 2025 at 10:30:02AM +0800, Oliver Sang wrote: > > > hi, Paul, > > > > > > On Wed, May 14, 2025 at 11:26:34AM -0700, Paul E. McKenney wrote: > > > > On Wed, May 14, 2025 at 10:47:30AM +0800, kernel test robot wrote: > > > > > > > > > > hi, Paul, > > > > > > > > > > for this commit we tested before, now we found it causes issues in > > > > > linux-next > > > > > master branch. > > > > > > > > Good catch as always!!! Yes, using HRTIMER_MODE_HARD means that this > > > > ircutorture_one_extend_check() function must check for hardirq as well > > > > as softirq. Which, oddly enough, permits simplifying the code, though > > > > a larger patch. > > > > > > > > I could reproduce this, and the patch at the end of this email fixes it > > > > for me. Does it work for you as well? > > > > > > thanks a lot for information! > > > > > > what's the base of the patch? I tried to apply it upon c27d0d38f2 or > > > lastest > > > linux-next/master > > > bdd609656ff55 (tag: next-20250514, linux-next/master) Add linux-next > > > specific files for 20250514 > > > > > > both failed. thanks > > > > Apologies, I just now pushed it out on my -rcu tree: > > > > db950fccd45a ("rcutorture: Make rcutorture_one_extend_check() account for > > hardirq") > > issue is cleaned on db950fccd45a. thanks! > > Tested-by: kernel test robot I will apply on my next rebase, and once again thank you for your testing! Thanx, Paul > = > compiler/kconfig/rootfs/runtime/tbox_group/test/testcase/torture_type: > > clang-20/i386-randconfig-141-20250508/debian-11.1-i386-20220923.cgz/300s/vm-snb/cpuhotplug/rcutorture/srcud > > commit: > 8454f1334b167 ("lib: Make the ratelimit test more reliable") > db950fccd45a8 ("rcutorture: Make rcutorture_one_extend_check() account for > hardirq") > > 8454f1334b167506 db950fccd45a8273d93ca861d84 > --- >fail:runs %reproductionfail:runs >| | | > 1:50 -2%:50dmesg.EIP:__x86_return_thunk > 1:50 -2%:50 > dmesg.EIP:console_trylock_spinning > 48:50 -96%:50dmesg.EIP:pv_native_safe_halt > 50:50-100%:50 > dmesg.EIP:rcutorture_one_extend_check > 50:50-100%:50 > dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcutorture_one_extend_check[rcutorture] > > > > > I will need to rebase this to precede the updated version of the commit > > that you tested, but one step at a time... > > > > Thanx, Paul > > > > > > > = > > > > > tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type: > > > > > > > > > > vm-snb/rcutorture/debian-11.1-i386-20220923.cgz/i386-randconfig-141-20250508/clang-20/300s/cpuhotplug/srcud > > > > > > > > > > c795676b5c0a4ab7 c27d0d38f2cafb70a68ca42c410 > > > > > --- > > > > >fail:runs %reproductionfail:runs > > > > >| | | > > > > >:50 2% 1:50 > > > > > dmesg.EIP:__kernel_text_address > > > > >:50 4% 2:50 > > > > > dmesg.EIP:__srcu_check_read_flavor > > > > >:50 2% 1:50 > > > > > dmesg.EIP:_raw_spin_unlock_irq > > > > >:50 14% 7:50 > > > > > dmesg.EIP:_raw_spin_unlock_irqrestore > > > > >:50 30% 15:50 > > > > > dmesg.EIP:console_flush_all > > > > >:50 4% 2:50 > > > > > dmesg.EIP:console_trylock_spinning > > > > >:50 2% 1:50dmesg.EIP:delay_tsc > > > > >:50 2% 1:50 > > > > > dmesg.EIP:finish_lock_switch > > > > >:50 2% 1:50 > > > > > dmesg.EIP:kernel_text_address > > > > >:50 2% 1:50dmesg.EIP:lock_acquire > > > > >:50 36% 18:50 > > > > > dmesg.EIP:pv_native_safe_halt > > > > > 1:50 -2%:50 > > > > > dmesg.EIP:rcu_torture_fwd_prog_cr > > > > > 7:50 -14%:50 > > > > > dmesg.EIP:rcu_torture_writer > > > > >:50 100% 50:50 > > > > > dmesg.EIP:rcutorture_one_extend_check > > > > > 48:50 0% 48:50 > > > > > dmesg.INFO:task_blocked_for_more_than#seconds > > > > > 48:50 0%
LFX Mentorship: Kselftest Patch Submission - Message Clarity Improvement in futex_requeue
Hi maintainers, As part of the Kselftest task for the LFX Mentorship Program, I have reviewed the futex selftest and made minor improvements to the message clarity in `futex_requeue.c`. Attached is the patch with the changes. I’ve also uploaded it to the mentorship platform as instructed. Please let me know if any changes are needed. Thanks, Shivam Sharma 10sharmashi...@gmail.com 0001-selftests-futex-improve-message-clarity-in-futex_req.patch Description: Binary data
Re: [PATCH v4 0/9] SLUB percpu sheaves
On Thu, May 15, 2025 at 5:46 AM Vlastimil Babka wrote: > > On 4/25/25 10:27, Vlastimil Babka wrote: > > Hi, > > > > This is the v4 and first non-RFC series to add an opt-in percpu > > array-based caching layer to SLUB, following the LSF/MM discussions. > > Since v3 I've also made changes to achieve full compatibility with > > slub_debug, and IRC discussions led to the last patch intended to > > improve NUMA locality (the patch remains separate for evaluation > > purposes). > > I've pushed the changes based on the feedback here: > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slub-percpu-sheaves > > You can use that for testing/benchmarking. Thanks! Thanks! I'll give it a spin this weekend.
Re: [PATCH v4 2/9] slab: add sheaf support for batching kfree_rcu() operations
On Thu, May 15, 2025 at 1:45 AM Vlastimil Babka wrote: > > On 5/14/25 16:01, Vlastimil Babka wrote: > > On 5/6/25 23:34, Suren Baghdasaryan wrote: > >> On Fri, Apr 25, 2025 at 1:27 AM Vlastimil Babka wrote: > >>> @@ -2631,6 +2637,24 @@ static void sheaf_flush_unused(struct kmem_cache > >>> *s, struct slab_sheaf *sheaf) > >>> sheaf->size = 0; > >>> } > >>> > >>> +static void __rcu_free_sheaf_prepare(struct kmem_cache *s, > >>> +struct slab_sheaf *sheaf); > >> > >> I think you could safely move __rcu_free_sheaf_prepare() here and > >> avoid the above forward declaration. > > > > Right, done. > > > >>> @@ -5304,6 +5340,140 @@ bool free_to_pcs(struct kmem_cache *s, void > >>> *object) > >>> return true; > >>> } > >>> > >>> +static void __rcu_free_sheaf_prepare(struct kmem_cache *s, > >>> +struct slab_sheaf *sheaf) > >> > >> This function seems to be an almost exact copy of free_to_pcs_bulk() > >> from your previous patch. Maybe they can be consolidated? > > > > True, I've extracted it to __kmem_cache_free_bulk_prepare(). > > ... and that was a mistake as free_to_pcs_bulk() diverges in patch 9/9 in a > way that this makes it too infeasible Ah, I see. Makes sense. Sorry for the misleading suggestion.
Re: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX enclaves
On 5/14/25 00:32, Reshetova, Elena wrote: >> This was the recent discussion I am aware we had on this matter: >> https://lkml.org/lkml/2024/2/5/1595 >> The measurements were done for older platform (skylake), but I am not >> aware of any architectural changes since that time to improve this. > And to make it more concrete, I made some simple tests based > on program for stress testing rdrand/rdseed that was discussed in that > threat earlier: https://lkml.org/lkml/2024/2/6/746 > Using this stress testing and enough threads, I can make EUPDATESVN fail > reliably and quite easily even with 10 time retry loop by kernel. So, > given this, what should be the correct behaviour be here? WARN_ON > is not justified imo. But should we return an error to userspace and > block open()? No, we can't block open. Just silently fail the EUPDATESVN for now. We can have a separate bikeshedding session about how to warn about the failure (or not). I can also hear Sean screaming from across the room that silent failure is going to be really nasty to debug. Once we settle on this set, we can have another discussion on an explicit update ABI so that folks who truly control their environment can stop all the guests and explicitly run EUPDATESVN at a nice convenient time. *THAT* interface can punt retries out to userspace and warn for sure. But, one thing at a time, please. This set will cover *normal* users: those who don't know exactly where each SGX user is and what their lifetimes are. Oh, and normal users also aren't under RDSEED attacks.
Re: [PATCH v4 22/38] KVM: x86/pmu: Optimize intel/amd_pmu_refresh() helpers
On 5/16/2025 3:22 AM, Sean Christopherson wrote: > On Thu, May 15, 2025, Dapeng Mi wrote: >> On 5/15/2025 8:37 AM, Sean Christopherson wrote: diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c index 153972e944eb..eba086ef5eca 100644 --- a/arch/x86/kvm/svm/pmu.c +++ b/arch/x86/kvm/svm/pmu.c @@ -198,12 +198,20 @@ static void __amd_pmu_refresh(struct kvm_vcpu *vcpu) pmu->nr_arch_gp_counters = min_t(unsigned int, pmu->nr_arch_gp_counters, kvm_pmu_cap.num_counters_gp); - if (pmu->version > 1) { - pmu->global_ctrl_rsvd = ~((1ull << pmu->nr_arch_gp_counters) - 1); + if (kvm_pmu_cap.version > 1) { > ARGH! I saw the pmu->version => kvm_pmu_cap.version change when going > through > this patch, but assumed it was simply a refactoring bug and ignored it. > > Nope, 100% intentional, as I discovered after spending the better part of an > hour > debugging. Stuffing pmu->global_ctrl when it doesn't exist is necessary when > the > mediated PMU is enabled, because pmu->global_ctrl will always be loaded in > hardware. > And so loading '0' means the PMU is effectively disabled, because KVM > disallows the > guest from writing the MSR. > > _That_ is the type of thing that absolutely needs a comment and a lengthy > explanation > in the changelog. Yes, this change is intended. As long as HW supports Global_CTRL MSR, KVM should write available value into it at vm-entry regardless of if guest PMU has GLOBAL_CTRL (pmu version < 2), otherwise guest counters won't be really enabled on HW. yeah, it's indeed easily confused and should put a comment here. My fault ... > > Intel also needs the same treatment for PMU v1. And since there's no hackery > that > I can see, that suggests PMU v1 guests haven't been tested with the mediated > PMU > on Intel. > > I guess congratulations are in order, because this patch just became my new > goto > example of why I'm so strict about on the "one thing per patch" rule. Embarrassing > >>> It's not just global_ctrl. PEBS and the fixed counters also depend on v2+ >>> (the >>> SDM contradicts itself; KVM's ABI is that they're v2+). >>> + /* + * At RESET, AMD CPUs set all enable bits for general purpose counters in + * IA32_PERF_GLOBAL_CTRL (so that software that was written for v1 PMUs + * don't unknowingly leave GP counters disabled in the global controls). + * Emulate that behavior when refreshing the PMU so that userspace doesn't + * need to manually set PERF_GLOBAL_CTRL. + */ + pmu->global_ctrl = BIT_ULL(pmu->nr_arch_gp_counters) - 1; + pmu->global_ctrl_rsvd = ~pmu->global_ctrl; pmu->global_status_rsvd = pmu->global_ctrl_rsvd; }
Re: [PATCH v4 30/38] KVM: x86/pmu: Handle emulated instruction for mediated vPMU
On Mon, Mar 24, 2025, Mingwei Zhang wrote: > static void kvm_pmu_incr_counter(struct kvm_pmc *pmc) > { > - pmc->emulated_counter++; > - kvm_pmu_request_counter_reprogram(pmc); > + struct kvm_vcpu *vcpu = pmc->vcpu; > + > + /* > + * For perf-based PMUs, accumulate software-emulated events separately > + * from pmc->counter, as pmc->counter is offset by the count of the > + * associated perf event. Request reprogramming, which will consult > + * both emulated and hardware-generated events to detect overflow. > + */ > + if (!kvm_mediated_pmu_enabled(vcpu)) { > + pmc->emulated_counter++; > + kvm_pmu_request_counter_reprogram(pmc); > + return; > + } > + > + /* > + * For mediated PMUs, pmc->counter is updated when the vCPU's PMU is > + * put, and will be loaded into hardware when the PMU is loaded. Simply > + * increment the counter and signal overflow if it wraps to zero. > + */ > + pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc); > + if (!pmc->counter) { Ugh, this is broken for the fastpath. If kvm_skip_emulated_instruction() is invoked by handle_fastpath_set_msr_irqoff() or handle_fastpath_hlt(), KVM may consume stale information (GLOBAL_CTRL, GLOBAL_STATUS and PMCs), and even if KVM gets lucky and those are all fresh, the PMC and GLOBAL_STATUS changes won't be propagated back to hardware. The best idea I have is to track whether or not the guest may be counting branches and/or instruction based on eventsels, and then bail from fastpaths that need to skip instructions. That flag would also be useful to further optimize kvm_pmu_trigger_event(). > + pmc_to_pmu(pmc)->global_status |= BIT_ULL(pmc->idx); > + if (pmc_pmi_enabled(pmc)) > + kvm_make_request(KVM_REQ_PMI, vcpu); > + } > } > > static inline bool cpl_is_matched(struct kvm_pmc *pmc) > -- > 2.49.0.395.g12beb8f557-goog >
Re: [PATCH v4 27/38] KVM: x86/pmu: Handle PMU MSRs interception and event filtering
On 3/25/2025 1:31 AM, Mingwei Zhang wrote: > From: Dapeng Mi > > Mediated vPMU needs to intercept EVENTSELx and FIXED_CNTR_CTRL MSRs to > filter out guest malicious perf events. Either writing these MSRs or > updating event filters would call reprogram_counter() eventually. Thus > check if the guest event should be filtered out in reprogram_counter(). > If so, clear corresponding EVENTSELx MSR or FIXED_CNTR_CTRL field to > ensure the guest event won't be really enabled at vm-entry. > > Besides, mediated vPMU intercepts the MSRs of these guest not owned > counters and it just needs simply to read/write from/to pmc->counter. > > Suggested-by: Sean Christopherson > Signed-off-by: Dapeng Mi > Co-developed-by: Mingwei Zhang > Signed-off-by: Mingwei Zhang > --- > arch/x86/kvm/pmu.c | 27 +++ > arch/x86/kvm/pmu.h | 3 +++ > 2 files changed, 30 insertions(+) > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c > index 63143eeb5c44..e9100dc49fdc 100644 > --- a/arch/x86/kvm/pmu.c > +++ b/arch/x86/kvm/pmu.c > @@ -305,6 +305,11 @@ static void pmc_update_sample_period(struct kvm_pmc *pmc) > > void pmc_write_counter(struct kvm_pmc *pmc, u64 val) > { > + if (kvm_mediated_pmu_enabled(pmc->vcpu)) { > + pmc->counter = val & pmc_bitmask(pmc); > + return; > + } > + > /* >* Drop any unconsumed accumulated counts, the WRMSR is a write, not a >* read-modify-write. Adjust the counter value so that its value is > @@ -455,6 +460,28 @@ static int reprogram_counter(struct kvm_pmc *pmc) > bool emulate_overflow; > u8 fixed_ctr_ctrl; > > + if (kvm_mediated_pmu_enabled(pmu_to_vcpu(pmu))) { > + bool allowed = check_pmu_event_filter(pmc); > + > + if (pmc_is_gp(pmc)) { > + if (allowed) > + pmc->eventsel_hw |= pmc->eventsel & > + > ARCH_PERFMON_EVENTSEL_ENABLE; > + else > + pmc->eventsel_hw &= > ~ARCH_PERFMON_EVENTSEL_ENABLE; > + } else { > + int idx = pmc->idx - KVM_FIXED_PMC_BASE_IDX; > + > + if (allowed) > + pmu->fixed_ctr_ctrl_hw = pmu->fixed_ctr_ctrl; Sean, just found there is a potential bug here. The "pmu->fixed_ctr_ctrl_hw" should not be assigned to "pmu->fixed_ctr_ctrl" here, otherwise the other filtered fixed counter (not this allowed fixed counter) could be enabled accidentally. diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index ba9d336f1d1d..f32e5f66f73b 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -473,7 +473,8 @@ static int reprogram_counter(struct kvm_pmc *pmc) int idx = pmc->idx - KVM_FIXED_PMC_BASE_IDX; if (allowed) - pmu->fixed_ctr_ctrl_hw = pmu->fixed_ctr_ctrl; + pmu->fixed_ctr_ctrl_hw |= pmu->fixed_ctr_ctrl & + intel_fixed_bits_by_idx(idx, 0xf); else pmu->fixed_ctr_ctrl_hw &= ~intel_fixed_bits_by_idx(idx, 0xf); > + else > + pmu->fixed_ctr_ctrl_hw &= > + ~intel_fixed_bits_by_idx(idx, 0xf); > + } > + > + return 0; > + } > + > emulate_overflow = pmc_pause_counter(pmc); > > if (!pmc_event_is_allowed(pmc)) > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h > index 509c995b7871..6289f523d893 100644 > --- a/arch/x86/kvm/pmu.h > +++ b/arch/x86/kvm/pmu.h > @@ -113,6 +113,9 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc) > { > u64 counter, enabled, running; > > + if (kvm_mediated_pmu_enabled(pmc->vcpu)) > + return pmc->counter & pmc_bitmask(pmc); > + > counter = pmc->counter + pmc->emulated_counter; > > if (pmc->perf_event && !pmc->is_paused)
Re: [PATCH v9 4/4] vhost: Add a KConfig knob to enable IOCTL VHOST_FORK_FROM_OWNER
On Thu, May 15, 2025 at 2:14 PM Michael S. Tsirkin wrote: > > On Wed, May 14, 2025 at 10:52:58AM +0800, Jason Wang wrote: > > On Tue, May 13, 2025 at 3:09 PM Michael S. Tsirkin wrote: > > > > > > On Tue, May 13, 2025 at 12:08:51PM +0800, Jason Wang wrote: > > > > On Wed, Apr 30, 2025 at 5:27 PM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Wed, Apr 30, 2025 at 11:34:49AM +0800, Jason Wang wrote: > > > > > > On Tue, Apr 29, 2025 at 6:56 PM Michael S. Tsirkin > > > > > > wrote: > > > > > > > > > > > > > > On Tue, Apr 29, 2025 at 11:39:37AM +0800, Jason Wang wrote: > > > > > > > > On Mon, Apr 21, 2025 at 11:46 AM Jason Wang > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Mon, Apr 21, 2025 at 11:45 AM Jason Wang > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Mon, Apr 21, 2025 at 10:45 AM Cindy Lu > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > Introduce a new config knob > > > > > > > > > > > `CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL`, > > > > > > > > > > > to control the availability of the > > > > > > > > > > > `VHOST_FORK_FROM_OWNER` ioctl. > > > > > > > > > > > When CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL is set to n, > > > > > > > > > > > the ioctl > > > > > > > > > > > is disabled, and any attempt to use it will result in > > > > > > > > > > > failure. > > > > > > > > > > > > > > > > > > > > I think we need to describe why the default value was > > > > > > > > > > chosen to be false. > > > > > > > > > > > > > > > > > > > > What's more, should we document the implications here? > > > > > > > > > > > > > > > > > > > > inherit_owner was set to false: this means "legacy" > > > > > > > > > > userspace may > > > > > > > > > > > > > > > > > > I meant "true" actually. > > > > > > > > > > > > > > > > MIchael, I'd expect inherit_owner to be false. Otherwise legacy > > > > > > > > applications need to be modified in order to get the behaviour > > > > > > > > recovered which is an impossible taks. > > > > > > > > > > > > > > > > Any idea on this? > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > So, let's say we had a modparam? Enough for this customer? > > > > > WDYT? > > > > > > > > Just to make sure I understand the proposal. > > > > > > > > Did you mean a module parameter like "inherit_owner_by_default"? I > > > > think it would be fine if we make it false by default. > > > > > > > > Thanks > > > > > > I think we should keep it true by default, changing the default > > > risks regressing what we already fixes. > > > > I think it's not a regression since it comes since the day vhost is > > introduced. To my understanding the real regression is the user space > > noticeable behaviour changes introduced by vhost thread. > > > > > The specific customer can > > > flip the modparam and be happy. > > > > If you stick to the false as default, I'm fine. > > > > Thanks > > That would be yet another behaviour change. Back to the original behaviour. > I think one was enough, don't you think? I think such kind of change is unavoidable if we want to fix the usersapce behaviour change. Thanks > > > > > > > > > > > > > > > -- > > > > > MST > > > > > > > > >
Re: [PATCH 13/19] virtio_ring: introduce virtqueue ops
On Wed, May 14, 2025 at 10:24 PM Michael S. Tsirkin wrote: > > On Wed, May 14, 2025 at 10:19:05AM -0400, Michael S. Tsirkin wrote: > > On Wed, Apr 09, 2025 at 12:06:03PM +0800, Jason Wang wrote: > > > On Tue, Apr 8, 2025 at 7:37 PM Michael S. Tsirkin wrote: > > > > > > > > On Tue, Apr 08, 2025 at 03:02:35PM +0800, Jason Wang wrote: > > > > > On Mon, Apr 7, 2025 at 4:20 PM Michael S. Tsirkin > > > > > wrote: > > > > > > > > > > > > On Mon, Mar 24, 2025 at 02:01:21PM +0800, Jason Wang wrote: > > > > > > > This patch introduces virtqueue ops which is a set of the > > > > > > > callbacks > > > > > > > that will be called for different queue layout or features. This > > > > > > > would > > > > > > > help to avoid branches for split/packed and will ease the future > > > > > > > implementation like in order. > > > > > > > > > > > > > > Signed-off-by: Jason Wang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > drivers/virtio/virtio_ring.c | 96 > > > > > > > +--- > > > > > > > 1 file changed, 67 insertions(+), 29 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c > > > > > > > b/drivers/virtio/virtio_ring.c > > > > > > > index a2884eae14d9..ce1dc90ee89d 100644 > > > > > > > --- a/drivers/virtio/virtio_ring.c > > > > > > > +++ b/drivers/virtio/virtio_ring.c > > > > > > > @@ -159,9 +159,30 @@ struct vring_virtqueue_packed { > > > > > > > size_t event_size_in_bytes; > > > > > > > }; > > > > > > > > > > > > > > +struct vring_virtqueue; > > > > > > > + > > > > > > > +struct virtqueue_ops { > > > > > > > + int (*add)(struct vring_virtqueue *_vq, struct scatterlist > > > > > > > *sgs[], > > > > > > > +unsigned int total_sg, unsigned int out_sgs, > > > > > > > +unsigned int in_sgs, void *data, > > > > > > > +void *ctx, bool premapped, gfp_t gfp); > > > > > > > + void *(*get)(struct vring_virtqueue *vq, unsigned int *len, > > > > > > > void **ctx); > > > > > > > + bool (*kick_prepare)(struct vring_virtqueue *vq); > > > > > > > + void (*disable_cb)(struct vring_virtqueue *vq); > > > > > > > + bool (*enable_cb_delayed)(struct vring_virtqueue *vq); > > > > > > > + unsigned int (*enable_cb_prepare)(struct vring_virtqueue > > > > > > > *vq); > > > > > > > + bool (*poll)(const struct vring_virtqueue *vq, u16 > > > > > > > last_used_idx); > > > > > > > + void *(*detach_unused_buf)(struct vring_virtqueue *vq); > > > > > > > + bool (*more_used)(const struct vring_virtqueue *vq); > > > > > > > + int (*resize)(struct vring_virtqueue *vq, u32 num); > > > > > > > + void (*reset)(struct vring_virtqueue *vq); > > > > > > > +}; > > > > > > > > > > > > I like it that it's organized but > > > > > > I worry about the overhead of indirect calls here. > > > > > > > > > > We can switch to use INDIRECT_CALL_X() here > > > > > > > > If you think it's cleaner.. but INDIRECT_CALL is all chained > > > > > > Yes, and it would be problematic as the number of ops increased. > > > > > > > while a switch can do a binary search. > > > > > > > > > > Do you mean a nested switch? > > > > Not sure what is nested. gcc does a decent job of optimizing > > switches. You have 4 types of ops: > > packed/packed in order/split/split in order > > > > So: > > > > enum { > > VQ_SPLIT, > > VQ_SPLIT_IN_ORDER, > > VQ_PACKED, > > VQ_PACKED_IN_ORDER, > > } > > > > > > I do not see how it is worse? > > > > > > > > Actually, here is an idea - create an array of ops: > > > > enum vqtype { > SPLIT, > SPLIT_IN_ORDER, > PACKED, > PACKED_IN_ORDER, > MAX > }; > > > struct ops { > int (*add)(int bar); > }; > > extern int packed(int); > extern int packedinorder(int); > extern int split(int); > extern int splitinorder(int); > > const struct ops allops[MAX] = { [SPLIT] = {split}, [SPLIT_IN_ORDER] = { > splitinorder}, [PACKED] = { packed }, [PACKED_IN_ORDER] = {packedinorder}}; > > int main(int argc, char **argv) > { > switch (argc) { > case 0: > return allops[PACKED].foo(argc); > case 1: > return allops[SPLIT].foo(argc); > default: > return allops[PACKED_IN_ORDER].foo(argc); This still looks like an indirection call as we don't call the symbol directly but need to load the function address into a register. > } > } > > > I tested this and compiler is able to elide the indirect calls. I've tried the following: struct virtqueue_ops split_ops = { .add = virtqueue_add_split, .get = virtqueue_get_buf_ctx_split, .kick_prepare = virtqueue_kick_prepare_split, .disable_cb = virtqueue_disable_cb_split, .enable_cb_delayed = virtqueue_enable_cb_delayed_split, .enable_cb_prepare = virtqueue_enable_cb_prepare_split, .poll = virtqueue
Fwd: [PATCH] selftests : timers : valid-adjtimex.c : Fixed style checks
fixed style checks according to Linux Kernel Coding Style standards. 1 : fixed alignment of parenthesis. LOG : CHECK: Alignment should match open parenthesis + printf("ERROR: out of range value %ld actually set!\n", + tx.freq); 2 : fixed alignment of parenthesis. LOG : CHECK: Alignment should match open parenthesis + printf("Error: No failure on invalid ADJ_FREQUENCY %ld\n", + invalid_freq[i]); 3 : fixed line length of 106 to 100 and less. LOG : CHECK: line length of 106 exceeds 100 columns + printf("Invalid (sec: %ld usec: %ld) did not fail! ", tmx.time.tv_sec, tmx.time.tv_usec); Signed-off-by: Rujra Bhatt --- tools/testing/selftests/timers/valid-adjtimex.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/timers/valid-adjtimex.c b/tools/testing/selftests/timers/valid-adjtimex.c index 6b7801055ad1..5110f9ee285c 100644 --- a/tools/testing/selftests/timers/valid-adjtimex.c +++ b/tools/testing/selftests/timers/valid-adjtimex.c @@ -157,7 +157,7 @@ int validate_freq(void) if (tx.freq == outofrange_freq[i]) { printf("[FAIL]\n"); printf("ERROR: out of range value %ld actually set!\n", - tx.freq); + tx.freq); pass = -1; goto out; } @@ -172,7 +172,7 @@ int validate_freq(void) if (ret >= 0) { printf("[FAIL]\n"); printf("Error: No failure on invalid ADJ_FREQUENCY %ld\n", - invalid_freq[i]); + invalid_freq[i]); pass = -1; goto out; } @@ -238,7 +238,8 @@ int set_bad_offset(long sec, long usec, int use_nano) tmx.time.tv_usec = usec; ret = clock_adjtime(CLOCK_REALTIME, &tmx); if (ret >= 0) { - printf("Invalid (sec: %ld usec: %ld) did not fail! ", tmx.time.tv_sec, tmx.time.tv_usec); + printf("Invalid (sec: %ld usec: %ld) did not fail! ", + tmx.time.tv_sec, tmx.time.tv_usec); printf("[FAIL]\n"); return -1; } -- 2.43.0
Re: [PATCH v4 23/38] KVM: x86/pmu: Configure the interception of PMU MSRs
On Thu, May 15, 2025, Dapeng Mi wrote: > On 5/15/2025 8:41 AM, Sean Christopherson wrote: > >> + if (kvm_mediated_pmu_enabled(vcpu) && kvm_pmu_has_perf_global_ctrl(pmu) > >> && > > Just require the guest to have PERF_GLOBAL_CTRL, I don't see any reason to > > support > > v1 PMUs. It adds complexity and weirdness, and I can't imagine there's a > > use case. I take that back, there absolutely are use cases, especially for AMD. Any VM shape that exists today should be compatible with the mediated PMU. And I was dead wrong about adding complexity; KVM already needs to intercept GLOBAL_CTRL if the guest has fewer PMCs than hardware, so incorporating this check is all of two lines of code.
Re: [PATCH] Revert "remoteproc: core: Clear table_sz when rproc_shutdown"
On Tue, May 13, 2025 at 04:52:46PM +0100, Bjorn Andersson wrote: > Clearing the table_sz on cleanup seemed reasonable, but further > discussions concluded that this merely working around the issue > and that the fix is incomplete. > > As such, revert commit efdde3d73ab2 ("remoteproc: core: Clear table_sz > when rproc_shutdown") to avoid carrying a partial fix. > > Signed-off-by: Bjorn Andersson > --- > drivers/remoteproc/remoteproc_core.c | 1 - > 1 file changed, 1 deletion(-) > I have applied this patch. Thanks, Mathieu > diff --git a/drivers/remoteproc/remoteproc_core.c > b/drivers/remoteproc/remoteproc_core.c > index > 48d146e1fa560397c11eeb8f824ae0fb844a022b..81b2ccf988e852ac79cee375c7e3f118c2a4b41a > 100644 > --- a/drivers/remoteproc/remoteproc_core.c > +++ b/drivers/remoteproc/remoteproc_core.c > @@ -2025,7 +2025,6 @@ int rproc_shutdown(struct rproc *rproc) > kfree(rproc->cached_table); > rproc->cached_table = NULL; > rproc->table_ptr = NULL; > - rproc->table_sz = 0; > out: > mutex_unlock(&rproc->lock); > return ret; > > --- > base-commit: aa94665adc28f3fdc3de2979ac1e98bae961d6ca > change-id: 20250513-revert-rproc-table-sz-53ecf24726ae > > Best regards, > -- > Bjorn Andersson >
Re: [PATCH 2/2] selftests/mm: skip hugevm test if kernel config file is not present.
On Thu, May 15, 2025 at 02:23:33PM -0400, Zi Yan wrote: > When running hugevm tests in a machine without kernel config present, e.g., > a VM running a kernel without CONFIG_IKCONFIG_PROC nor /boot/config-*, > skip hugevm tests, which reads kernel config to get page table level > information. > > Signed-off-by: Zi Yan Looks generally reasonable to me, but I'm not so familiar with this so, Acked-by: Lorenzo Stoakes > --- > .../selftests/mm/va_high_addr_switch.sh | 26 +++ > 1 file changed, 9 insertions(+), 17 deletions(-) > > diff --git a/tools/testing/selftests/mm/va_high_addr_switch.sh > b/tools/testing/selftests/mm/va_high_addr_switch.sh > index 1f92e8caceac..325de53966b6 100755 > --- a/tools/testing/selftests/mm/va_high_addr_switch.sh > +++ b/tools/testing/selftests/mm/va_high_addr_switch.sh > @@ -7,23 +7,20 @@ > # real test to check that the kernel is configured to support at least 5 > # pagetable levels. > > -# 1 means the test failed > -exitcode=1 > - > # Kselftest framework requirement - SKIP code is 4. > ksft_skip=4 > > -fail() > +skip() > { > echo "$1" > - exit $exitcode > + exit $ksft_skip > } > > check_supported_x86_64() > { > local config="/proc/config.gz" > [[ -f "${config}" ]] || config="/boot/config-$(uname -r)" > - [[ -f "${config}" ]] || fail "Cannot find kernel config in /proc or > /boot" > + [[ -f "${config}" ]] || skip "Cannot find kernel config in /proc or > /boot" > > # gzip -dcfq automatically handles both compressed and plaintext input. > # See man 1 gzip under '-f'. > @@ -33,11 +30,9 @@ check_supported_x86_64() > else {print 1}; exit}' /proc/cpuinfo 2>/dev/null) > > if [[ "${pg_table_levels}" -lt 5 ]]; then > - echo "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to > run this test" > - exit $ksft_skip > + skip "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to > run this test" > elif [[ "${cpu_supports_pl5}" -ne 0 ]]; then > - echo "$0: CPU does not have the necessary la57 flag to support > page table level 5" > - exit $ksft_skip > + skip "$0: CPU does not have the necessary la57 flag to support > page table level 5" > fi > } > > @@ -45,24 +40,21 @@ check_supported_ppc64() > { > local config="/proc/config.gz" > [[ -f "${config}" ]] || config="/boot/config-$(uname -r)" > - [[ -f "${config}" ]] || fail "Cannot find kernel config in /proc or > /boot" > + [[ -f "${config}" ]] || skip "Cannot find kernel config in /proc or > /boot" > > local pg_table_levels=$(gzip -dcfq "${config}" | grep PGTABLE_LEVELS | > cut -d'=' -f 2) > if [[ "${pg_table_levels}" -lt 5 ]]; then > - echo "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to > run this test" > - exit $ksft_skip > + skip "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to > run this test" > fi > > local mmu_support=$(grep -m1 "mmu" /proc/cpuinfo | awk '{print $3}') > if [[ "$mmu_support" != "radix" ]]; then > - echo "$0: System does not use Radix MMU, required for 5-level > paging" > - exit $ksft_skip > + skip "$0: System does not use Radix MMU, required for 5-level > paging" > fi > > local hugepages_total=$(awk '/HugePages_Total/ {print $2}' > /proc/meminfo) > if [[ "${hugepages_total}" -eq 0 ]]; then > - echo "$0: HugePages are not enabled, required for some tests" > - exit $ksft_skip > + skip "$0: HugePages are not enabled, required for some tests" > fi > } > > -- > 2.47.2 >
Re: [PATCH 1/2] selftests/mm: skip uffd tests in madv_guard if uffd is not present.
On 15 May 2025, at 14:41, Lorenzo Stoakes wrote: > Ah you got to this first :) thanks! > > Could you do this with a cover letter though? It's really weird to have 2/2 > reply to 1/2, I know sometimes people do that, but it's just odd, and it'd be > good to have an overview, thanks! > > On Thu, May 15, 2025 at 02:23:32PM -0400, Zi Yan wrote: >> When userfaultfd is not compiled into kernel, userfaultfd() returns -1, >> causing uffd tests in madv_guard fail. Skip the tests instead. > > 'madv_guard'? I'd just say the guard_regions.uffd test to fail. Sure. Will change it. > >> >> Signed-off-by: Zi Yan >> --- >> tools/testing/selftests/mm/guard-regions.c | 17 +++-- >> 1 file changed, 15 insertions(+), 2 deletions(-) >> >> diff --git a/tools/testing/selftests/mm/guard-regions.c >> b/tools/testing/selftests/mm/guard-regions.c >> index 0cd9d236649d..93af3d3760f9 100644 >> --- a/tools/testing/selftests/mm/guard-regions.c >> +++ b/tools/testing/selftests/mm/guard-regions.c >> @@ -1453,8 +1453,21 @@ TEST_F(guard_regions, uffd) >> >> /* Set up uffd. */ >> uffd = userfaultfd(0); >> -if (uffd == -1 && errno == EPERM) >> -ksft_exit_skip("No userfaultfd permissions, try running as >> root.\n"); > > Let's just make this all part of the same switch please! What do you mean? EPERM is handled in the switch-case below. > > And while I originally used ksft_exit_skip(), I think we can just use the > SKIP(return, ...) form here just fine to keep it consistent. Right. I am using SKIP below, since when I ran it, ksft_exit_skip() makes the whole test message inconsistent. > >> +if (uffd == -1) { >> +switch (errno) { >> +case EPERM: >> +SKIP(return, "No userfaultfd permissions, try running >> as root."); >> +break; >> +case ENOSYS: >> +SKIP(return, "userfaultfd is not supported/not >> enabled."); >> +break; >> +default: >> +ksft_exit_fail_msg("userfaultfd failed with %s\n", >> + strerror(errno)); >> +break; >> +} >> +} >> + >> ASSERT_NE(uffd, -1); >> >> ASSERT_EQ(ioctl(uffd, UFFDIO_API, &api), 0); >> -- >> 2.47.2 >> > > Thanks! -- Best Regards, Yan, Zi
Re: [PATCH 2/2] selftests/mm: skip hugevm test if kernel config file is not present.
On Thu, May 15, 2025 at 07:43:40PM +0100, Lorenzo Stoakes wrote: > On Thu, May 15, 2025 at 02:23:33PM -0400, Zi Yan wrote: > > When running hugevm tests in a machine without kernel config present, e.g., > > a VM running a kernel without CONFIG_IKCONFIG_PROC nor /boot/config-*, > > skip hugevm tests, which reads kernel config to get page table level > > information. > > > > Signed-off-by: Zi Yan > > Looks generally reasonable to me, but I'm not so familiar with this so, > > Acked-by: Lorenzo Stoakes Acked-by: Pedro Falcato Same here. Although, despite this being worth patching, I do think we should document somewhere what the expectations are for mm selftests (in terms of kernel config, environment, libc, possibly even utilities present). -- Pedro
Re: [PATCH v4 05/38] perf: Add generic exclude_guest support
On 2025-05-15 3:25 p.m., Sean Christopherson wrote: > On Thu, May 15, 2025, Kan Liang wrote: >> On 2025-05-14 7:19 p.m., Sean Christopherson wrote: This naming is confusing on purpose? Pick either guest/host and stick with it. >>> >>> +1. I also think the inner perf_host_{enter,exit}() helpers are superflous. >>> These flows >>> >>> After a bit of hacking, and with a few spoilers, this is what I ended up >>> with >>> (not anywhere near fully tested). I like following KVM's >>> kvm_xxx_{load,put}() >>> nomenclature to tie everything together, so I went with "guest" instead of >>> "host" >>> even though the majority of work being down is to shedule out/in host >>> context. >>> >>> /* When loading a guest's mediated PMU, schedule out all exclude_guest >>> events. */ >>> void perf_load_guest_context(unsigned long data) >>> { >>> struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); >>> >>> lockdep_assert_irqs_disabled(); >>> >>> perf_ctx_lock(cpuctx, cpuctx->task_ctx); >>> >>> if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded))) >>> goto unlock; >>> >>> perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST); >>> ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST); >>> perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST); >>> if (cpuctx->task_ctx) { >>> perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST); >>> task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST); >>> perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST); >>> } >>> >>> arch_perf_load_guest_context(data); >>> >>> __this_cpu_write(guest_ctx_loaded, true); >>> >>> unlock: >>> perf_ctx_unlock(cpuctx, cpuctx->task_ctx); >>> } >>> EXPORT_SYMBOL_GPL(perf_load_guest_context); >>> >>> void perf_put_guest_context(void) >>> { >>> struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); >>> >>> lockdep_assert_irqs_disabled(); >>> >>> perf_ctx_lock(cpuctx, cpuctx->task_ctx); >>> >>> if (WARN_ON_ONCE(!__this_cpu_read(guest_ctx_loaded))) >>> goto unlock; >>> >>> arch_perf_put_guest_context(); >> >> It will set the guest_ctx_loaded to false. >> The update_context_time() invoked in the perf_event_sched_in() will not >> get a chance to update the guest time. > > The guest_ctx_loaded in arch/x86/events/core.c is a different variable, it > just > happens to have the same name. > Ah, I thought it was the same variable. Then there should be no issue with the guest time. But the same variable name may bring confusions. Maybe add a x86_pmu/x86 prefix to the variable in x86 code. > It's completely gross, but exposing guest_ctx_loaded outside of > kernel/events/core.c > didn't seem like a great alternative. If we wanted to use a single variable, > then the writes in arch_perf_{load,put}_guest_context() can simply go away. > Either two variables or a single variable is fine with me, as long as they can be easily distinguished. But I think we should put arch_perf_{load,put}_guest_context() and guest_ctx_loaded between the perf_ctx_disable/enable() pair. Perf should only update the state when PMU is completely disabled. It matches the way the rest of the perf code does. It could also avoid some potential issues, e.g., the state is not updated completely, but the counter has already been fired. Thanks, Kan
Re: [PATCH v2 1/3] rpmsg: char: Reuse eptdev logic for anon device
Hi, On Fri, May 09, 2025 at 11:59:25PM +0800, Dawei Li wrote: > Current uAPI implementation for rpmsg ctrl & char device manipulation is > abstracted in procedures below: > > Current uAPI implementation for rpmsg ctrl & char device manipulation is > abstracted in procedures below: > - fd = open("/dev/rpmsg_ctrlX") > - ioctl(fd, RPMSG_CREATE_EPT_IOCTL, &info); /dev/rpmsgY devnode is > generated. > - fd_ep = open("/dev/rpmsgY", O_RDWR) > - operations on fd_ep(write, read, poll ioctl) > - ioctl(fd_ep, RPMSG_DESTROY_EPT_IOCTL) > - close(fd_ep) > - close(fd) > > This /dev/rpmsgY abstraction is less favorable for: > - Performance issue: It's time consuming for some operations are > invovled: > - Device node creation. > Depends on specific config, especially CONFIG_DEVTMPFS, the overall > overhead is based on coordination between DEVTMPFS and userspace > tools such as udev and mdev. > > - Extra kernel-space switch cost. > > - Other major costs brought by heavy-weight logic like device_add(). > > - /dev/rpmsgY node can be opened only once. It doesn't make much sense > that a dynamically created device node can be opened only once. > > - For some container application such as docker, a client can't access > host's dev unless specified explicitly. But in case of /dev/rpmsgY, which > is generated dynamically and whose existence is unknown for clients in > advance, this uAPI based on device node doesn't fit well. > > An anon inode based approach is introduced to address the issues above. > Rather than generating device node and opening it, rpmsg code just make > a anon inode representing eptdev and return the fd to userspace. Please change occurences of "anon" for "anonyous". "Anon" was used to avoid writing "anonymous" all the time in the code, but description should not use the shorthand. In the title, this patch and all other patches. > > The legacy abstraction based on struct dev and struct cdev is honored > for: > - Avoid legacy uAPI break(RPMSG_CREATE_EPT_IOCTL) > - Reuse existing logic: > -- dev_err() and friends. > -- Life cycle management of struct device. > > Signed-off-by: Dawei Li > --- > drivers/rpmsg/rpmsg_char.c | 80 ++ > 1 file changed, 56 insertions(+), 24 deletions(-) > > diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c > index eec7642d2686..5b2a883d6236 100644 > --- a/drivers/rpmsg/rpmsg_char.c > +++ b/drivers/rpmsg/rpmsg_char.c > @@ -91,7 +91,8 @@ int rpmsg_chrdev_eptdev_destroy(struct device *dev, void > *data) > /* wake up any blocked readers */ > wake_up_interruptible(&eptdev->readq); > > - cdev_device_del(&eptdev->cdev, &eptdev->dev); > + if (eptdev->dev.devt) > + cdev_device_del(&eptdev->cdev, &eptdev->dev); > put_device(&eptdev->dev); > > return 0; > @@ -132,21 +133,17 @@ static int rpmsg_ept_flow_cb(struct rpmsg_device > *rpdev, void *priv, bool enable > return 0; > } > > -static int rpmsg_eptdev_open(struct inode *inode, struct file *filp) > +static int __rpmsg_eptdev_open(struct rpmsg_eptdev *eptdev) > { > - struct rpmsg_eptdev *eptdev = cdev_to_eptdev(inode->i_cdev); > struct rpmsg_endpoint *ept; > struct rpmsg_device *rpdev = eptdev->rpdev; > struct device *dev = &eptdev->dev; > > - mutex_lock(&eptdev->ept_lock); > if (eptdev->ept) { > - mutex_unlock(&eptdev->ept_lock); > return -EBUSY; > } > > if (!eptdev->rpdev) { > - mutex_unlock(&eptdev->ept_lock); > return -ENETRESET; > } > > @@ -164,21 +161,32 @@ static int rpmsg_eptdev_open(struct inode *inode, > struct file *filp) > if (!ept) { > dev_err(dev, "failed to open %s\n", eptdev->chinfo.name); > put_device(dev); > - mutex_unlock(&eptdev->ept_lock); > return -EINVAL; > } > > ept->flow_cb = rpmsg_ept_flow_cb; > eptdev->ept = ept; > - filp->private_data = eptdev; > - mutex_unlock(&eptdev->ept_lock); > > return 0; > } > > -static int rpmsg_eptdev_release(struct inode *inode, struct file *filp) > +static int rpmsg_eptdev_open(struct inode *inode, struct file *filp) > { > struct rpmsg_eptdev *eptdev = cdev_to_eptdev(inode->i_cdev); > + int ret; > + > + mutex_lock(&eptdev->ept_lock); > + ret = __rpmsg_eptdev_open(eptdev); > + if (!ret) > + filp->private_data = eptdev; > + mutex_unlock(&eptdev->ept_lock); > + > + return ret; > +} > + > +static int rpmsg_eptdev_release(struct inode *inode, struct file *filp) > +{ > + struct rpmsg_eptdev *eptdev = filp->private_data; > struct device *dev = &eptdev->dev; > > /* Close the endpoint, if it's not already destroyed by the parent */ > @@ -400,12 +408,13 @@ static void rpmsg_eptdev_release_device(struct device > *dev) > struct rpmsg_eptdev *eptdev = dev_to_eptdev(dev);
Re: [PATCH 5/6] x86/paravirt: Switch MSR access pv_ops functions to instruction interfaces
On 5/15/25 00:32, Xin Li wrote: Hi Juergen, I have some update on this thread while working on it. If we continue down the path of maintaining pvops MSR APIs as this patch series does, it seems we’ll need to duplicate the ALTERNATIVE code in three different places. 1) The MSR access primitives defined in , which is used when CONFIG_PARAVIRT=n. 2) The pvops native MSR functions pv_native_{rd,wr}msr{,_safe}() defined in arch/x86/kernel/paravirt.c, used when CONFIG_PARAVIRT=y on bare metal. 3) The pvops Xen MSR functions paravirt_{read,write}_msr{,_safe}() defined in , used when CONFIG_PARAVIRT_XXL=y. hpa had mentioned to me earlier that this would be a maintenance burden — something I only truly realized once I got hands-on with it. Maybe you have something in mind to address it? Also add PeterZ to the To list because he cares it. Having the code being duplicated is definitely not a good thing; although I'm not one of the x86 maintainers anymore, I would consider it a strong reason to NAK such a patchset. At one point I was considering augmenting the alternatives framework to be able to call an ad hoc subroutine to generate the code. It would be useful in cases like this, where if PV is enabled it can make a callout to the currently-active PV code to query the desired code to be output. There are 16 unused bits in the alternatives table (not counting the 14 unused flag bits), which could be used for an enumeration of such subroutines, optionally split into 8 bits of function enumeration and 8 bits of private data. In this case, the "replacement" pointer becomes available as a private pointer; possibly to a metadata structure used by the subroutine. This could also be used to significantly enhance the static-immediate framework, by being able to have explicit code which handles the transformations instead of needing to rely on assembly hacks. That way we might even be able to do that kind of transformations for any ro_after_init value. I think the biggest concern is how this would affect objtool, since objtool would now not have any kind of direct visibility into the possibly generated code. How to best feed the information objtool needs to it would be my biggest question (in part because I don't know what objtool would actually need.) -hpa
Re: [PATCH v2 2/3] rpmsg: char: Implement eptdev based on anon inode
On Fri, May 09, 2025 at 11:59:26PM +0800, Dawei Li wrote: > Introduce new eptdev abstraction based on anon inode. The new API is > exactly same with legacy one except: > > - It's anonymous and devnode/path free. > - Its fops->open() is empty. > > Signed-off-by: Dawei Li > --- > drivers/rpmsg/rpmsg_char.c | 44 ++ > drivers/rpmsg/rpmsg_char.h | 19 > 2 files changed, 63 insertions(+) > > diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c > index 5b2a883d6236..b0ec05f88013 100644 > --- a/drivers/rpmsg/rpmsg_char.c > +++ b/drivers/rpmsg/rpmsg_char.c > @@ -13,6 +13,7 @@ > > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > +#include > #include > #include > #include > @@ -517,6 +518,49 @@ int rpmsg_chrdev_eptdev_create(struct rpmsg_device > *rpdev, struct device *parent > } > EXPORT_SYMBOL(rpmsg_chrdev_eptdev_create); > > +static const struct file_operations rpmsg_eptdev_anon_fops = { > + .owner = THIS_MODULE, > + .release = rpmsg_eptdev_release, > + .read_iter = rpmsg_eptdev_read_iter, > + .write_iter = rpmsg_eptdev_write_iter, > + .poll = rpmsg_eptdev_poll, > + .unlocked_ioctl = rpmsg_eptdev_ioctl, > + .compat_ioctl = compat_ptr_ioctl, > +}; > + > +int rpmsg_eptdev_create(struct rpmsg_device *rpdev, struct device *parent, > + struct rpmsg_channel_info chinfo, int *pfd) rpmsg_anonymous_eptdev_create() > +{ > + struct rpmsg_eptdev *eptdev; > + int ret, fd; > + > + eptdev = __rpmsg_chrdev_eptdev_alloc(rpdev, parent, false); > + if (IS_ERR(eptdev)) > + return PTR_ERR(eptdev); > + > + ret = __rpmsg_chrdev_eptdev_add(eptdev, chinfo, false); > + if (ret) { > + dev_err(&eptdev->dev, "failed to add %s\n", > eptdev->chinfo.name); > + return ret; > + } > + > + fd = anon_inode_getfd("rpmsg-eptdev", &rpmsg_eptdev_anon_fops, eptdev, > O_RDWR | O_CLOEXEC); > + if (fd < 0) { > + put_device(&eptdev->dev); > + return fd; > + } > + > + mutex_lock(&eptdev->ept_lock); > + ret = __rpmsg_eptdev_open(eptdev); > + mutex_unlock(&eptdev->ept_lock); > + > + if (!ret) > + *pfd = fd; > + > + return ret; > +} > +EXPORT_SYMBOL(rpmsg_eptdev_create); > + > static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev) > { > struct rpmsg_channel_info chinfo; > diff --git a/drivers/rpmsg/rpmsg_char.h b/drivers/rpmsg/rpmsg_char.h > index 117d9cbc52f0..8cc2c14537da 100644 > --- a/drivers/rpmsg/rpmsg_char.h > +++ b/drivers/rpmsg/rpmsg_char.h > @@ -19,6 +19,19 @@ > int rpmsg_chrdev_eptdev_create(struct rpmsg_device *rpdev, struct device > *parent, > struct rpmsg_channel_info chinfo); > > +/** > + * rpmsg_eptdev_create() - register ep device and its associated fd based on > an endpoint > + * @rpdev: prepared rpdev to be used for creating endpoints > + * @parent: parent device > + * @chinfo: associated endpoint channel information. > + * @pfd: fd in represent of endpoint device > + * > + * This function create a new rpmsg endpoint device and its associated fd to > instantiate a new > + * endpoint based on chinfo information. > + */ > +int rpmsg_eptdev_create(struct rpmsg_device *rpdev, struct device *parent, > + struct rpmsg_channel_info chinfo, int *pfd); > + > /** > * rpmsg_chrdev_eptdev_destroy() - destroy created char device endpoint. > * @data: private data associated to the endpoint device > @@ -36,6 +49,12 @@ static inline int rpmsg_chrdev_eptdev_create(struct > rpmsg_device *rpdev, struct > return -ENXIO; > } > > +static inline int rpmsg_eptdev_create(struct rpmsg_device *rpdev, struct > device *parent, > + struct rpmsg_channel_info chinfo, int > *pfd) > +{ > + return -ENXIO; > +} > + > static inline int rpmsg_chrdev_eptdev_destroy(struct device *dev, void *data) > { > return -ENXIO; > -- > 2.25.1 >
[PATCH v3 2/6] dt-bindings: soc: qcom: add qcom,qcs615-imem compatible
Document qcom,qcs615-imem compatible. It has a child node for debugging purposes. Acked-by: Krzysztof Kozlowski Signed-off-by: Lijuan Gao --- Documentation/devicetree/bindings/sram/qcom,imem.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/sram/qcom,imem.yaml b/Documentation/devicetree/bindings/sram/qcom,imem.yaml index 2711f90d9664b70fcd1e2f7e2dfd3386ed5c1952..dc3b5a69b9254e7001a329bb2011305152316689 100644 --- a/Documentation/devicetree/bindings/sram/qcom,imem.yaml +++ b/Documentation/devicetree/bindings/sram/qcom,imem.yaml @@ -22,6 +22,7 @@ properties: - qcom,msm8974-imem - qcom,msm8976-imem - qcom,qcs404-imem + - qcom,qcs615-imem - qcom,qcs8300-imem - qcom,qdu1000-imem - qcom,sa8775p-imem -- 2.34.1
[PATCH v3 3/6] arm64: dts: qcom: qcs615: Add mproc node for SEMP2P
From: Kyle Deng The Shared Memory Point to Point (SMP2P) protocol facilitates communication of a single 32-bit value between two processors. Add these two nodes for remoteproc enablement on QCS615 SoC. Signed-off-by: Kyle Deng Signed-off-by: Lijuan Gao --- arch/arm64/boot/dts/qcom/qcs615.dtsi | 44 1 file changed, 44 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/qcs615.dtsi b/arch/arm64/boot/dts/qcom/qcs615.dtsi index f08ba09772f333adaade2d30897cf50c555a4d12..f922349758d11ec7fda1c43736a4bf290916e67f 100644 --- a/arch/arm64/boot/dts/qcom/qcs615.dtsi +++ b/arch/arm64/boot/dts/qcom/qcs615.dtsi @@ -332,6 +332,50 @@ mc_virt: interconnect-2 { qcom,bcm-voters = <&apps_bcm_voter>; }; + smp2p-adsp { + compatible = "qcom,smp2p"; + qcom,smem = <443>, <429>; + interrupts = ; + /* On this platform, bit 26 (normally SLPI) is repurposed for ADSP */ + mboxes = <&apss_shared 26>; + + qcom,local-pid = <0>; + qcom,remote-pid = <2>; + + adsp_smp2p_out: master-kernel { + qcom,entry-name = "master-kernel"; + #qcom,smem-state-cells = <1>; + }; + + adsp_smp2p_in: slave-kernel { + qcom,entry-name = "slave-kernel"; + interrupt-controller; + #interrupt-cells = <2>; + }; + }; + + smp2p-cdsp { + compatible = "qcom,smp2p"; + qcom,smem = <94>, <432>; + interrupts = ; + mboxes = <&apss_shared 6>; + + qcom,local-pid = <0>; + qcom,remote-pid = <5>; + + cdsp_smp2p_out: master-kernel { + qcom,entry-name = "master-kernel"; + #qcom,smem-state-cells = <1>; + }; + + cdsp_smp2p_in: slave-kernel { + qcom,entry-name = "slave-kernel"; + interrupt-controller; + #interrupt-cells = <2>; + }; + + }; + qup_opp_table: opp-table-qup { compatible = "operating-points-v2"; opp-shared; -- 2.34.1
[PATCH v3 5/6] arm64: dts: qcom: qcs615: add ADSP and CDSP nodes
Add nodes for remoteprocs: ADSP and CDSP for QCS615 SoC to enable proper remoteproc functionality. Reviewed-by: Konrad Dybcio Signed-off-by: Lijuan Gao --- arch/arm64/boot/dts/qcom/qcs615.dtsi | 86 1 file changed, 86 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/qcs615.dtsi b/arch/arm64/boot/dts/qcom/qcs615.dtsi index dd54cfe7b7a6f03c1aa658ce3014d50478df5931..b7d649822a35fd30366ecab2eeb7b2fced1b369d 100644 --- a/arch/arm64/boot/dts/qcom/qcs615.dtsi +++ b/arch/arm64/boot/dts/qcom/qcs615.dtsi @@ -473,6 +473,16 @@ smem_region: smem@8600 { no-map; hwlocks = <&tcsr_mutex 3>; }; + + rproc_cdsp_mem: rproc-cdsp@93b0 { + reg = <0x0 0x93b0 0x0 0x1e0>; + no-map; + }; + + rproc_adsp_mem: rproc-adsp@9590 { + reg = <0x0 0x9590 0x0 0x1e0>; + no-map; + }; }; soc: soc@0 { @@ -3117,6 +3127,44 @@ cti@790 { clock-names = "apb_pclk"; }; + remoteproc_cdsp: remoteproc@830 { + compatible = "qcom,qcs615-cdsp-pas", "qcom,sm8150-cdsp-pas"; + reg = <0x0 0x0830 0x0 0x4040>; + + interrupts-extended = <&intc GIC_SPI 578 IRQ_TYPE_EDGE_RISING>, + <&cdsp_smp2p_in 0 IRQ_TYPE_EDGE_RISING>, + <&cdsp_smp2p_in 1 IRQ_TYPE_EDGE_RISING>, + <&cdsp_smp2p_in 2 IRQ_TYPE_EDGE_RISING>, + <&cdsp_smp2p_in 3 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "wdog", + "fatal", + "ready", + "handover", + "stop-ack"; + + clocks = <&rpmhcc RPMH_CXO_CLK>; + clock-names = "xo"; + + power-domains = <&rpmhpd RPMHPD_CX>; + power-domain-names = "cx"; + + memory-region = <&rproc_cdsp_mem>; + + qcom,qmp = <&aoss_qmp>; + + qcom,smem-states = <&cdsp_smp2p_out 0>; + qcom,smem-state-names = "stop"; + + status = "disabled"; + + glink-edge { + interrupts = ; + mboxes = <&apss_shared 4>; + label = "cdsp"; + qcom,remote-pid = <5>; + }; + }; + pmu@90b6300 { compatible = "qcom,qcs615-cpu-bwmon", "qcom,sdm845-bwmon"; reg = <0x0 0x090b6300 0x0 0x600>; @@ -3751,6 +3799,44 @@ usb_2_dwc3: usb@a80 { maximum-speed = "high-speed"; }; }; + + remoteproc_adsp: remoteproc@6240 { + compatible = "qcom,qcs615-adsp-pas", "qcom,sm8150-adsp-pas"; + reg = <0x0 0x6240 0x0 0x4040>; + + interrupts-extended = <&intc GIC_SPI 162 IRQ_TYPE_EDGE_RISING>, + <&adsp_smp2p_in 0 IRQ_TYPE_EDGE_RISING>, + <&adsp_smp2p_in 1 IRQ_TYPE_EDGE_RISING>, + <&adsp_smp2p_in 2 IRQ_TYPE_EDGE_RISING>, + <&adsp_smp2p_in 3 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "wdog", + "fatal", + "ready", + "handover", + "stop-ack"; + + clocks = <&rpmhcc RPMH_CXO_CLK>; + clock-names = "xo"; + + power-domains = <&rpmhpd RPMHPD_CX>; + power-domain-names = "cx"; + + memory-region = <&rproc_adsp_mem>; + + qcom,qmp = <&aoss_qmp>; + + qcom,smem-states = <&adsp_smp2p_out 0>; + qcom,smem-state-names = "stop"; + + status = "disabled"; + + glink_edge: glink-edge { + interrupts = ; + mboxes = <&apss_shared 24>; + label = "lpass"; + qcom,remote-pid = <2>; + }; + }; }; arch_timer: timer { -- 2.34.1
[PATCH v3 4/6] arm64: dts: qcom: qcs615: Add IMEM and PIL info region
Add a simple-mfd representing IMEM on QCS615 and define the PIL relocation info region as its child. The PIL region in IMEM is used to communicate load addresses of remoteproc to post mortem debug tools, so that these tools can collect ramdumps. Signed-off-by: Lijuan Gao --- arch/arm64/boot/dts/qcom/qcs615.dtsi | 14 ++ 1 file changed, 14 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/qcs615.dtsi b/arch/arm64/boot/dts/qcom/qcs615.dtsi index f922349758d11ec7fda1c43736a4bf290916e67f..dd54cfe7b7a6f03c1aa658ce3014d50478df5931 100644 --- a/arch/arm64/boot/dts/qcom/qcs615.dtsi +++ b/arch/arm64/boot/dts/qcom/qcs615.dtsi @@ -3290,6 +3290,20 @@ sram@c3f { reg = <0x0 0x0c3f 0x0 0x400>; }; + sram@146aa000 { + compatible = "qcom,qcs615-imem", "syscon", "simple-mfd"; + reg = <0x0 0x1468 0x0 0x2c000>; + ranges = <0 0 0x1468 0x2c000>; + + #address-cells = <1>; + #size-cells = <1>; + + pil-reloc@2a94c { + compatible = "qcom,pil-reloc-info"; + reg = <0x2a94c 0xc8>; + }; + }; + apps_smmu: iommu@1500 { compatible = "qcom,qcs615-smmu-500", "qcom,smmu-500", "arm,mmu-500"; reg = <0x0 0x1500 0x0 0x8>; -- 2.34.1
[PATCH v3 6/6] arm64: dts: qcom: qcs615-ride: enable remoteprocs
Enable all remoteproc nodes on the qcs615-ride board and point to the appropriate firmware files to allow proper functioning of the remote processors. Reviewed-by: Konrad Dybcio Signed-off-by: Lijuan Gao --- arch/arm64/boot/dts/qcom/qcs615-ride.dts | 12 1 file changed, 12 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/qcs615-ride.dts b/arch/arm64/boot/dts/qcom/qcs615-ride.dts index 2b5aa3c66867676bda59ff82b902b6e4974126f8..a6652e4817d1c218c7981b04daeb035e2852ac1a 100644 --- a/arch/arm64/boot/dts/qcom/qcs615-ride.dts +++ b/arch/arm64/boot/dts/qcom/qcs615-ride.dts @@ -240,6 +240,18 @@ &qupv3_id_0 { status = "okay"; }; +&remoteproc_adsp { + firmware-name = "qcom/qcs615/adsp.mbn"; + + status = "okay"; +}; + +&remoteproc_cdsp { + firmware-name = "qcom/qcs615/cdsp.mbn"; + + status = "okay"; +}; + &rpmhcc { clocks = <&xo_board_clk>; }; -- 2.34.1
[PATCH v3 1/6] dt-bindings: remoteproc: qcom,sm8150-pas: Document QCS615 remoteproc
Document the components used to boot the ADSP and CDSP on the Qualcomm QCS615 SoC. Use fallback to indicate the compatibility of the remoteproc on the QCS615 with that on the SM8150. Reviewed-by: Krzysztof Kozlowski Signed-off-by: Lijuan Gao --- .../bindings/remoteproc/qcom,sm8150-pas.yaml | 65 +- 1 file changed, 38 insertions(+), 27 deletions(-) diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,sm8150-pas.yaml b/Documentation/devicetree/bindings/remoteproc/qcom,sm8150-pas.yaml index 5dcc2a32c080049ac6c486614a5bd4d71fd3ed62..a8cddf7e2fe1a84064730d847d2f0601b67572ff 100644 --- a/Documentation/devicetree/bindings/remoteproc/qcom,sm8150-pas.yaml +++ b/Documentation/devicetree/bindings/remoteproc/qcom,sm8150-pas.yaml @@ -15,17 +15,26 @@ description: properties: compatible: -enum: - - qcom,sc8180x-adsp-pas - - qcom,sc8180x-cdsp-pas - - qcom,sc8180x-slpi-pas - - qcom,sm8150-adsp-pas - - qcom,sm8150-cdsp-pas - - qcom,sm8150-mpss-pas - - qcom,sm8150-slpi-pas - - qcom,sm8250-adsp-pas - - qcom,sm8250-cdsp-pas - - qcom,sm8250-slpi-pas +oneOf: + - items: + - enum: + - qcom,qcs615-adsp-pas + - const: qcom,sm8150-adsp-pas + - items: + - enum: + - qcom,qcs615-cdsp-pas + - const: qcom,sm8150-cdsp-pas + - enum: + - qcom,sc8180x-adsp-pas + - qcom,sc8180x-cdsp-pas + - qcom,sc8180x-slpi-pas + - qcom,sm8150-adsp-pas + - qcom,sm8150-cdsp-pas + - qcom,sm8150-mpss-pas + - qcom,sm8150-slpi-pas + - qcom,sm8250-adsp-pas + - qcom,sm8250-cdsp-pas + - qcom,sm8250-slpi-pas reg: maxItems: 1 @@ -62,16 +71,17 @@ allOf: - if: properties: compatible: - enum: -- qcom,sc8180x-adsp-pas -- qcom,sc8180x-cdsp-pas -- qcom,sc8180x-slpi-pas -- qcom,sm8150-adsp-pas -- qcom,sm8150-cdsp-pas -- qcom,sm8150-slpi-pas -- qcom,sm8250-adsp-pas -- qcom,sm8250-cdsp-pas -- qcom,sm8250-slpi-pas + contains: +enum: + - qcom,sc8180x-adsp-pas + - qcom,sc8180x-cdsp-pas + - qcom,sc8180x-slpi-pas + - qcom,sm8150-adsp-pas + - qcom,sm8150-cdsp-pas + - qcom,sm8150-slpi-pas + - qcom,sm8250-adsp-pas + - qcom,sm8250-cdsp-pas + - qcom,sm8250-slpi-pas then: properties: interrupts: @@ -88,12 +98,13 @@ allOf: - if: properties: compatible: - enum: -- qcom,sc8180x-adsp-pas -- qcom,sc8180x-cdsp-pas -- qcom,sm8150-adsp-pas -- qcom,sm8150-cdsp-pas -- qcom,sm8250-cdsp-pas + contains: +enum: + - qcom,sc8180x-adsp-pas + - qcom,sc8180x-cdsp-pas + - qcom,sm8150-adsp-pas + - qcom,sm8150-cdsp-pas + - qcom,sm8250-cdsp-pas then: properties: power-domains: -- 2.34.1
[PATCH v3 0/6] arm64: dts: qcom: qcs615: enable remoteprocs - ADSP and CDSP
Enable the remote processor PAS loader for QCS615 ADSP and CDSP processors. This allows different platforms/architectures to control (power on, load firmware, power off) those remote processors while abstracting the hardware differences. Additionally, and add a PIL region in IMEM so that post mortem debug tools can collect ramdumps. Signed-off-by: Lijuan Gao --- Changes in v3: - Update base-commit to tag: next-20250515. - Collected Reviewed-by: tag. - Add a comment for SLPI 26 in the smp2p-adsp node. - Update the IMEM address to the starting address of the IMEM layout, and also update the offset address of pil-reloc. - Link to v1: https://lore.kernel.org/r/20250507-add_qcs615_remoteproc_support-v2-0-52ac6cb43...@quicinc.com Changes in v2: - Remove the qcom prefix from smp2p node name. - Remove the unnecessary property qcom,ipc from smp2p node. - Remove the unused node apcs: syscon. - Remove the unused nodes from smp2p node, such as sleepstate_smp2p_out/in, smp2p_rdbg2_out/in, smp2p_rdbg5_out/in. - Update the commit message for IMEM PIL info region. - Update the remoteproc node names. - Correct the size of register for remoteproc nodes. - Add empty line before status properties. - Link to v1: https://lore.kernel.org/r/20250423-add_qcs615_remoteproc_support-v1-0-a94fe8799...@quicinc.com --- Kyle Deng (1): arm64: dts: qcom: qcs615: Add mproc node for SEMP2P Lijuan Gao (5): dt-bindings: remoteproc: qcom,sm8150-pas: Document QCS615 remoteproc dt-bindings: soc: qcom: add qcom,qcs615-imem compatible arm64: dts: qcom: qcs615: Add IMEM and PIL info region arm64: dts: qcom: qcs615: add ADSP and CDSP nodes arm64: dts: qcom: qcs615-ride: enable remoteprocs .../bindings/remoteproc/qcom,sm8150-pas.yaml | 65 ++ .../devicetree/bindings/sram/qcom,imem.yaml| 1 + arch/arm64/boot/dts/qcom/qcs615-ride.dts | 12 ++ arch/arm64/boot/dts/qcom/qcs615.dtsi | 144 + 4 files changed, 195 insertions(+), 27 deletions(-) --- base-commit: 484803582c77061b470ac64a634f25f89715be3f change-id: 20250516-add_qcs615_remoteproc_support-b85ca60e48a8 Best regards, -- Lijuan Gao
Re: [PATCH 0/9] tools/nolibc: split out more headers
On Thu, May 15, 2025 at 09:57:46PM +0200, Thomas Weißschuh wrote: > Split out all headers which are used by nolibc-test.c. > This makes it easier to port existing applications to nolibc. Nice work, it's pleasant to no longer see #ifdef NOLIBC in the .c. I'll eventually try to do the same in my init code once I'm in vacation, it's likely that it will be sufficient as well now. For the series: Acked-by: Willy Tarreau Thanks! Willy
[PATCH 1/2] nvdimm/btt: Fix memleaks in discover_arenas()
kmemleak reported a memleak after the ndctl_test unreferenced object 0x88800e6cf2c0 (size 32): comm "modprobe", pid 969, jiffies 4294698691 hex dump (first 32 bytes): 03 00 00 00 a0 0a 00 00 00 b0 b4 00 00 c9 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 backtrace (crc 807f3e24): __kmalloc_cache_noprof+0x331/0x410 nvdimm_namespace_attach_btt+0xa9b/0xcc0 [nd_btt] platform_probe+0x45/0xa0 ... load_module+0x21f9/0x22f0 faddr2line tells that (based on v6.15-rc4): $ ./scripts/faddr2line drivers/nvdimm/nd_btt.o nvdimm_namespace_attach_btt+0xa9 nvdimm_namespace_attach_btt+0xa9b/0xcc0: log_set_indices at linux/drivers/nvdimm/btt.c:719 (inlined by) discover_arenas at linux/drivers/nvdimm/btt.c:888 (inlined by) btt_init at linux/drivers/nvdimm/btt.c:1583 (inlined by) nvdimm_namespace_attach_btt at linux/drivers/nvdimm/btt.c:1680 It's believed that this was a false positive because the leaking size didn't match any instance in an arena. However during looking into this issue, it's noticed that it does not release an initializing arena instance and instances in btt->arena_list in some error paths. Signed-off-by: Li Zhijian --- drivers/nvdimm/btt.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index 423dcd190906..a11e4e7e9a52 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -801,17 +801,22 @@ static struct arena_info *alloc_arena(struct btt *btt, size_t size, return arena; } +static void free_arena(struct arena_info *arena) +{ + kfree(arena->rtt); + kfree(arena->map_locks); + kfree(arena->freelist); + debugfs_remove_recursive(arena->debugfs_dir); + kfree(arena); +} + static void free_arenas(struct btt *btt) { struct arena_info *arena, *next; list_for_each_entry_safe(arena, next, &btt->arena_list, list) { list_del(&arena->list); - kfree(arena->rtt); - kfree(arena->map_locks); - kfree(arena->freelist); - debugfs_remove_recursive(arena->debugfs_dir); - kfree(arena); + free_arena(arena); } } @@ -848,7 +853,7 @@ static void parse_arena_meta(struct arena_info *arena, struct btt_sb *super, static int discover_arenas(struct btt *btt) { int ret = 0; - struct arena_info *arena; + struct arena_info *arena = NULL; size_t remaining = btt->rawsize; u64 cur_nlba = 0; size_t cur_off = 0; @@ -861,8 +866,10 @@ static int discover_arenas(struct btt *btt) while (remaining) { /* Alloc memory for arena */ arena = alloc_arena(btt, 0, 0, 0); - if (!arena) - return -ENOMEM; + if (!arena) { + ret = -ENOMEM; + goto out; + } arena->infooff = cur_off; ret = btt_info_read(arena, super); @@ -921,7 +928,8 @@ static int discover_arenas(struct btt *btt) return ret; out: - kfree(arena); + if (arena) + free_arena(arena); free_arenas(btt); return ret; } -- 2.47.0
[PATCH 2/2] nvdimm/btt: Fix memleaks in btt_init()
Call free_arenas() to release the arena instances in btt->arena_list in the error paths. Signed-off-by: Li Zhijian --- drivers/nvdimm/btt.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index a11e4e7e9a52..a85448273a9a 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -1597,7 +1597,7 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize, if (btt->init_state != INIT_READY && nd_region->ro) { dev_warn(dev, "%s is read-only, unable to init btt metadata\n", dev_name(&nd_region->dev)); - return NULL; + goto out; } else if (btt->init_state != INIT_READY) { btt->num_arenas = (rawsize / ARENA_MAX_SIZE) + ((rawsize % ARENA_MAX_SIZE) ? 1 : 0); @@ -1607,25 +1607,29 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize, ret = create_arenas(btt); if (ret) { dev_info(dev, "init: create_arenas: %d\n", ret); - return NULL; + goto out; } ret = btt_meta_init(btt); if (ret) { dev_err(dev, "init: error in meta_init: %d\n", ret); - return NULL; + goto out; } } ret = btt_blk_init(btt); if (ret) { dev_err(dev, "init: error in blk_init: %d\n", ret); - return NULL; + goto out; } btt_debugfs_init(btt); return btt; + +out: + free_arenas(btt); + return NULL; } /** -- 2.47.0
Re: [PATCH v4] vdpa/octeon_ep: Control PCI dev enabling manually
On Thu, 2025-05-08 at 10:51 +0200, Philipp Stanner wrote: > PCI region request functions such as pci_request_region() currently > have > the problem of becoming sometimes managed functions, if > pcim_enable_device() instead of pci_enable_device() was called. The > PCI > subsystem wants to remove this deprecated behavior from its > interfaces. > > octeopn_ep enables its device with pcim_enable_device() (for VF. PF > uses > manual management), but does so only to get automatic disablement. > The > driver wants to manage its PCI resources for VF manually, without > devres. > > The easiest way not to use automatic resource management at all is by > also handling device enable- and disablement manually. > > Replace pcim_enable_device() with pci_enable_device(). Add the > necessary > calls to pci_disable_device(). > > Signed-off-by: Philipp Stanner > Acked-by: Vamsi Attunuru Hi again, this is the last of 12 drivers blocking me from removing a few hundred lines of broken code from PCI. Would be great if it could be sent to Linus next merge window. Can someone take this patch in? Thx P. > --- > Changes in v4: > - s/AF/PF > - Add Vamsi's AB > > Changes in v3: > - Only call pci_disable_device() for the PF version. For AF it > would > cause a WARN_ON because pcim_enable_device()'s callback will also > try to disable. > --- > drivers/vdpa/octeon_ep/octep_vdpa_main.c | 17 - > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/drivers/vdpa/octeon_ep/octep_vdpa_main.c > b/drivers/vdpa/octeon_ep/octep_vdpa_main.c > index f3d4dda4e04c..9b49efd24391 100644 > --- a/drivers/vdpa/octeon_ep/octep_vdpa_main.c > +++ b/drivers/vdpa/octeon_ep/octep_vdpa_main.c > @@ -454,6 +454,9 @@ static void octep_vdpa_remove_pf(struct pci_dev > *pdev) > octep_iounmap_region(pdev, octpf->base, > OCTEP_HW_MBOX_BAR); > > octep_vdpa_pf_bar_expand(octpf); > + > + /* The pf version does not use managed PCI. */ > + pci_disable_device(pdev); > } > > static void octep_vdpa_vf_bar_shrink(struct pci_dev *pdev) > @@ -825,7 +828,7 @@ static int octep_vdpa_probe_pf(struct pci_dev > *pdev) > struct octep_pf *octpf; > int ret; > > - ret = pcim_enable_device(pdev); > + ret = pci_enable_device(pdev); > if (ret) { > dev_err(dev, "Failed to enable device\n"); > return ret; > @@ -834,15 +837,17 @@ static int octep_vdpa_probe_pf(struct pci_dev > *pdev) > ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)); > if (ret) { > dev_err(dev, "No usable DMA configuration\n"); > - return ret; > + goto disable_pci; > } > octpf = devm_kzalloc(dev, sizeof(*octpf), GFP_KERNEL); > - if (!octpf) > - return -ENOMEM; > + if (!octpf) { > + ret = -ENOMEM; > + goto disable_pci; > + } > > ret = octep_iomap_region(pdev, octpf->base, > OCTEP_HW_MBOX_BAR); > if (ret) > - return ret; > + goto disable_pci; > > pci_set_master(pdev); > pci_set_drvdata(pdev, octpf); > @@ -856,6 +861,8 @@ static int octep_vdpa_probe_pf(struct pci_dev > *pdev) > > unmap_region: > octep_iounmap_region(pdev, octpf->base, OCTEP_HW_MBOX_BAR); > +disable_pci: > + pci_disable_device(pdev); > return ret; > } >
Re: [PATCH v2] x86/sgx: Prevent attempts to reclaim poisoned pages
Thanks for sending this, Andrew! I think I'll probably add a slightly shorter summary: tl;dr: SGX page reclaim touches the page to copy its contents to secondary storage. SGX instructions do not gracefully handle machine checks. Despite this, the existing SGX code will try to reclaim pages that it _knows_ are poisoned. Avoid even trying to reclaim poisoned pages. But otherwise it looks great: Acked-by: Dave Hansen
[PATCH] selftests: Improve test output grammar, code style
Add small grammar fixes in perf events and Real Time Clock tests' output messages. Include braces around a single if statement, when there are multiple statements in the else branch, to align with the kernel coding style. Signed-off-by: Hanne-Lotta Mäenpää --- tools/testing/selftests/perf_events/watermark_signal.c | 7 --- tools/testing/selftests/rtc/rtctest.c | 10 +- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/perf_events/watermark_signal.c b/tools/testing/selftests/perf_events/watermark_signal.c index 49dc1e831174..6176afd4950b 100644 --- a/tools/testing/selftests/perf_events/watermark_signal.c +++ b/tools/testing/selftests/perf_events/watermark_signal.c @@ -65,8 +65,9 @@ TEST(watermark_signal) child = fork(); EXPECT_GE(child, 0); - if (child == 0) + if (child == 0) { do_child(); + } else if (child < 0) { perror("fork()"); goto cleanup; @@ -75,7 +76,7 @@ TEST(watermark_signal) if (waitpid(child, &child_status, WSTOPPED) != child || !(WIFSTOPPED(child_status) && WSTOPSIG(child_status) == SIGSTOP)) { fprintf(stderr, - "failed to sycnhronize with child errno=%d status=%x\n", + "failed to synchronize with child errno=%d status=%x\n", errno, child_status); goto cleanup; @@ -84,7 +85,7 @@ TEST(watermark_signal) fd = syscall(__NR_perf_event_open, &attr, child, -1, -1, PERF_FLAG_FD_CLOEXEC); if (fd < 0) { - fprintf(stderr, "failed opening event %llx\n", attr.config); + fprintf(stderr, "failed to setup performance monitoring %llx\n", attr.config); goto cleanup; } diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c index be175c0e6ae3..8fd4d5d3b527 100644 --- a/tools/testing/selftests/rtc/rtctest.c +++ b/tools/testing/selftests/rtc/rtctest.c @@ -138,10 +138,10 @@ TEST_F_TIMEOUT(rtc, date_read_loop, READ_LOOP_DURATION_SEC + 2) { rtc_read = rtc_time_to_timestamp(&rtc_tm); /* Time should not go backwards */ ASSERT_LE(prev_rtc_read, rtc_read); - /* Time should not increase more then 1s at a time */ + /* Time should not increase more than 1s per read */ ASSERT_GE(prev_rtc_read + 1, rtc_read); - /* Sleep 11ms to avoid killing / overheating the RTC */ + /* Sleep 11ms to avoid overheating the RTC */ nanosleep_with_retries(READ_LOOP_SLEEP_MS * 100); prev_rtc_read = rtc_read; @@ -236,7 +236,7 @@ TEST_F(rtc, alarm_alm_set) { if (alarm_state == RTC_ALARM_DISABLED) SKIP(return, "Skipping test since alarms are not supported."); if (alarm_state == RTC_ALARM_RES_MINUTE) - SKIP(return, "Skipping test since alarms has only minute granularity."); + SKIP(return, "Skipping test since alarms have only minute granularity."); rc = ioctl(self->fd, RTC_RD_TIME, &tm); ASSERT_NE(-1, rc); @@ -306,7 +306,7 @@ TEST_F(rtc, alarm_wkalm_set) { if (alarm_state == RTC_ALARM_DISABLED) SKIP(return, "Skipping test since alarms are not supported."); if (alarm_state == RTC_ALARM_RES_MINUTE) - SKIP(return, "Skipping test since alarms has only minute granularity."); + SKIP(return, "Skipping test since alarms have only minute granularity."); rc = ioctl(self->fd, RTC_RD_TIME, &alarm.time); ASSERT_NE(-1, rc); @@ -502,7 +502,7 @@ int main(int argc, char **argv) if (access(rtc_file, R_OK) == 0) ret = test_harness_run(argc, argv); else - ksft_exit_skip("[SKIP]: Cannot access rtc file %s - Exiting\n", + ksft_exit_skip("Cannot access RTC file %s - exiting\n", rtc_file); return ret; -- 2.39.5
Re: [PATCH v4 29/38] KVM: x86/pmu: Switch host/guest PMU context at vm-exit/vm-entry
On Mon, Mar 24, 2025, Mingwei Zhang wrote: > diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h > b/arch/x86/include/asm/kvm-x86-pmu-ops.h > index 9159bf1a4730..35f27366c277 100644 > --- a/arch/x86/include/asm/kvm-x86-pmu-ops.h > +++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h > @@ -22,6 +22,8 @@ KVM_X86_PMU_OP(init) > KVM_X86_PMU_OP_OPTIONAL(reset) > KVM_X86_PMU_OP_OPTIONAL(deliver_pmi) > KVM_X86_PMU_OP_OPTIONAL(cleanup) > +KVM_X86_PMU_OP(put_guest_context) > +KVM_X86_PMU_OP(load_guest_context) For KVM, the "guest_context" part is largely superfluous, as KVM always operates on guest state, e.g. kvm_fpu_{load,put}(). I do think we should squeeze in "mediated" somewhere, otherwise the it's hard to see that these are specific to the mediated PMU. So probably mediated_{load,put}()? > #undef KVM_X86_PMU_OP > #undef KVM_X86_PMU_OP_OPTIONAL > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 7ee740aa..4117a382739a 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -568,6 +568,10 @@ struct kvm_pmu { > u64 raw_event_mask; > struct kvm_pmc gp_counters[KVM_MAX_NR_GP_COUNTERS]; > struct kvm_pmc fixed_counters[KVM_MAX_NR_FIXED_COUNTERS]; > + u32 gp_eventsel_base; > + u32 gp_counter_base; > + u32 fixed_base; > + u32 cntr_shift; Gah, my bad, "shift" was a terrible suggestion. It should be "stride". > @@ -306,6 +313,10 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu); > int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp); > void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel); > bool vcpu_pmu_can_enable(struct kvm_vcpu *vcpu); > +void kvm_pmu_put_guest_pmcs(struct kvm_vcpu *vcpu); > +void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu); > +void kvm_pmu_put_guest_context(struct kvm_vcpu *vcpu); > +void kvm_pmu_load_guest_context(struct kvm_vcpu *vcpu); > > bool is_vmware_backdoor_pmc(u32 pmc_idx); > bool kvm_rdpmc_in_guest(struct kvm_vcpu *vcpu); > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c > index 1a7e3a897fdf..7e0d84d50b74 100644 > --- a/arch/x86/kvm/svm/pmu.c > +++ b/arch/x86/kvm/svm/pmu.c > @@ -175,6 +175,22 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct > msr_data *msr_info) > return 1; > } > > +static inline void amd_update_msr_base(struct kvm_vcpu *vcpu) > +{ > + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); > + > + if (kvm_pmu_has_perf_global_ctrl(pmu) || > + guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE)) { > + pmu->gp_eventsel_base = MSR_F15H_PERF_CTL0; > + pmu->gp_counter_base = MSR_F15H_PERF_CTR0; > + pmu->cntr_shift = 2; > + } else { > + pmu->gp_eventsel_base = MSR_K7_EVNTSEL0; > + pmu->gp_counter_base = MSR_K7_PERFCTR0; > + pmu->cntr_shift = 1; > + } > +} Moving quoted text around to organize responses... > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c > index 796b7bc4affe..ed17ab198dfb 100644 > --- a/arch/x86/kvm/vmx/pmu_intel.c > +++ b/arch/x86/kvm/vmx/pmu_intel.c > @@ -460,6 +460,17 @@ static void intel_pmu_enable_fixed_counter_bits(struct > kvm_pmu *pmu, u64 bits) > pmu->fixed_ctr_ctrl_rsvd &= ~intel_fixed_bits_by_idx(i, bits); > } > > +static inline void intel_update_msr_base(struct kvm_vcpu *vcpu) > +{ > + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); > + > + pmu->gp_eventsel_base = MSR_P6_EVNTSEL0; > + pmu->gp_counter_base = fw_writes_is_enabled(vcpu) ? > +MSR_IA32_PMC0 : MSR_IA32_PERFCTR0; This is wrong. And I unintentionally proved that it's wrong, by goofing when I fixed up this code and using MSR_IA32_PERFCTR0 instead of MSR_IA32_PMC0. Whether or not the guest supports full-width writes is irrelevant, because support for FW writes doesn't change the width of the counters. Just because the *guest* can't directly write all e.g. 48 bits doesn't mean clobbering bits 47:32 is ok. Similarly, on the AMD side, using the legacy interface in KVM is unnecessary. The guest may be limited to those MSRs, but KVM has a hard dependency on PMU v2, so just unconditionally use MSR_F15H_PERF_CTR0 (and for the record, because I had to look it up, the newfangled MSRs on AMD are aliased to the legacy MSRs for 0..3). Very happily, that means the MSRs don't need to be per-PMU, and they don't even need to be configured at runtime for a given vendor. Simply require FW writes on Intel to enable the mediated PMU, and then hardcode the GP base to MSR_IA32_PMC0. > +static void amd_put_guest_context(struct kvm_vcpu *vcpu) > +{ > + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); > + > + rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, pmu->global_ctrl); > + wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0); > + rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, pmu->global_status); > + > + /* Clear global status bits if non-zero */ > + if (pmu->global_st
[PATCH 1/9] tools/nolibc: move ioctl() to sys/ioctl.h
This is the location regular userspace expects this definition. Signed-off-by: Thomas Weißschuh --- tools/include/nolibc/Makefile| 1 + tools/include/nolibc/nolibc.h| 1 + tools/include/nolibc/sys.h | 12 tools/include/nolibc/sys/ioctl.h | 29 + 4 files changed, 31 insertions(+), 12 deletions(-) diff --git a/tools/include/nolibc/Makefile b/tools/include/nolibc/Makefile index 4c4b7b5ff605ac5dde11628331378489d76d4466..7d66847df91a94ca784c04ced278eb6d7099bab4 100644 --- a/tools/include/nolibc/Makefile +++ b/tools/include/nolibc/Makefile @@ -49,6 +49,7 @@ all_files := \ string.h \ sys.h \ sys/auxv.h \ + sys/ioctl.h \ sys/mman.h \ sys/random.h \ sys/stat.h \ diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h index 51c423a36b5957e2528e94d717d88e2383230cba..d6048d1e9ea5d4b5d504e156aafc5651dcd6b8c1 100644 --- a/tools/include/nolibc/nolibc.h +++ b/tools/include/nolibc/nolibc.h @@ -97,6 +97,7 @@ #include "types.h" #include "sys.h" #include "sys/auxv.h" +#include "sys/ioctl.h" #include "sys/mman.h" #include "sys/random.h" #include "sys/stat.h" diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index 5733fe54911dca44c7423951ff85fb166d95c06f..313c210173c804f728b0be4ab8e67542bb3a7219 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -517,18 +517,6 @@ uid_t getuid(void) } -/* - * int ioctl(int fd, unsigned long cmd, ... arg); - */ - -static __attribute__((unused)) -long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg) -{ - return my_syscall3(__NR_ioctl, fd, cmd, arg); -} - -#define ioctl(fd, cmd, arg) __sysret(sys_ioctl(fd, cmd, (unsigned long)(arg))) - /* * int kill(pid_t pid, int signal); */ diff --git a/tools/include/nolibc/sys/ioctl.h b/tools/include/nolibc/sys/ioctl.h new file mode 100644 index ..cb34be92a5f96aa566afa53a0f19932c2d5fe1eb --- /dev/null +++ b/tools/include/nolibc/sys/ioctl.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * Ioctl definitions for NOLIBC + * Copyright (C) 2017-2021 Willy Tarreau + */ + +/* make sure to include all global symbols */ +#include "nolibc.h" + +#ifndef _NOLIBC_SYS_IOCTL_H +#define _NOLIBC_SYS_IOCTL_H + +#include "../sys.h" + +#include + +/* + * int ioctl(int fd, unsigned long cmd, ... arg); + */ + +static __attribute__((unused)) +long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg) +{ + return my_syscall3(__NR_ioctl, fd, cmd, arg); +} + +#define ioctl(fd, cmd, arg) __sysret(sys_ioctl(fd, cmd, (unsigned long)(arg))) + +#endif /* _NOLIBC_SYS_IOCTL_H */ -- 2.49.0
[PATCH 4/9] tools/nolibc: move reboot() to sys/reboot.h
This is the location regular userspace expects this definition. Signed-off-by: Thomas Weißschuh --- tools/include/nolibc/Makefile | 1 + tools/include/nolibc/nolibc.h | 1 + tools/include/nolibc/sys.h| 18 -- tools/include/nolibc/sys/reboot.h | 34 ++ tools/include/nolibc/types.h | 1 - 5 files changed, 36 insertions(+), 19 deletions(-) diff --git a/tools/include/nolibc/Makefile b/tools/include/nolibc/Makefile index 35cfbc84d499b0a57689bdfed95a45023904d256..4850501b8d53be6b603ecbb04d6f952cd5370cf0 100644 --- a/tools/include/nolibc/Makefile +++ b/tools/include/nolibc/Makefile @@ -54,6 +54,7 @@ all_files := \ sys/mount.h \ sys/prctl.h \ sys/random.h \ + sys/reboot.h \ sys/stat.h \ sys/syscall.h \ sys/time.h \ diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h index 1c159e32a248d46fa4d36a2c35e92eb9da91e9f6..36ea7a02c7434cc006a7b12d413add04a2f85565 100644 --- a/tools/include/nolibc/nolibc.h +++ b/tools/include/nolibc/nolibc.h @@ -102,6 +102,7 @@ #include "sys/mount.h" #include "sys/prctl.h" #include "sys/random.h" +#include "sys/reboot.h" #include "sys/stat.h" #include "sys/syscall.h" #include "sys/time.h" diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index a17fe98968a29081661eaf235111482a543f87ba..6c89dd0316dd0ebf03ebde6fa5c14273df6a0c62 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -730,24 +730,6 @@ ssize_t read(int fd, void *buf, size_t count) } -/* - * int reboot(int cmd); - * is among LINUX_REBOOT_CMD_* - */ - -static __attribute__((unused)) -ssize_t sys_reboot(int magic1, int magic2, int cmd, void *arg) -{ - return my_syscall4(__NR_reboot, magic1, magic2, cmd, arg); -} - -static __attribute__((unused)) -int reboot(int cmd) -{ - return __sysret(sys_reboot(LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, cmd, 0)); -} - - /* * int getrlimit(int resource, struct rlimit *rlim); * int setrlimit(int resource, const struct rlimit *rlim); diff --git a/tools/include/nolibc/sys/reboot.h b/tools/include/nolibc/sys/reboot.h new file mode 100644 index ..727363fbfd8a6f8994a0cf13829b7347ced198bb --- /dev/null +++ b/tools/include/nolibc/sys/reboot.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * Reboot definitions for NOLIBC + * Copyright (C) 2017-2021 Willy Tarreau + */ + +/* make sure to include all global symbols */ +#include "nolibc.h" + +#ifndef _NOLIBC_SYS_REBOOT_H +#define _NOLIBC_SYS_REBOOT_H + +#include "../sys.h" + +#include + +/* + * int reboot(int cmd); + * is among LINUX_REBOOT_CMD_* + */ + +static __attribute__((unused)) +ssize_t sys_reboot(int magic1, int magic2, int cmd, void *arg) +{ + return my_syscall4(__NR_reboot, magic1, magic2, cmd, arg); +} + +static __attribute__((unused)) +int reboot(int cmd) +{ + return __sysret(sys_reboot(LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, cmd, 0)); +} + +#endif /* _NOLIBC_SYS_REBOOT_H */ diff --git a/tools/include/nolibc/types.h b/tools/include/nolibc/types.h index 93da29fe7719c9f196fba38d1f3f31cad0fc02f1..74c7694b2d5e54f7a86854697ac32a2ea2b62e86 100644 --- a/tools/include/nolibc/types.h +++ b/tools/include/nolibc/types.h @@ -12,7 +12,6 @@ #include "std.h" #include -#include /* for LINUX_REBOOT_* */ #include #include #include -- 2.49.0
[PATCH 9/9] selftests/nolibc: drop include guards around standard headers
Nolibc now provides all the headers required by nolibc-test.c. Signed-off-by: Thomas Weißschuh --- tools/testing/selftests/nolibc/nolibc-test.c | 5 - 1 file changed, 5 deletions(-) diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c index 0391c7d01380ea2f20d1d07497ea1964bcb6a9f4..dbe13000fb1ac153e9a89f627492daeb584a05d4 100644 --- a/tools/testing/selftests/nolibc/nolibc-test.c +++ b/tools/testing/selftests/nolibc/nolibc-test.c @@ -9,12 +9,9 @@ * $(CC) -nostdlib -I/path/to/nolibc/sysroot => _NOLIBC_* guards are present * $(CC) with default libc=> NOLIBC* never defined */ -#ifndef NOLIBC #include #include #include -#ifndef _NOLIBC_STDIO_H -/* standard libcs need more includes */ #include #include #include @@ -43,8 +40,6 @@ #include #include #include -#endif -#endif #pragma GCC diagnostic ignored "-Wmissing-prototypes" -- 2.49.0
[PATCH 6/9] tools/nolibc: move makedev() and friends to sys/sysmacros.h
This is the location regular userspace expects these definitions. Signed-off-by: Thomas Weißschuh --- tools/include/nolibc/Makefile| 1 + tools/include/nolibc/nolibc.h| 1 + tools/include/nolibc/sys/sysmacros.h | 20 tools/include/nolibc/types.h | 5 - 4 files changed, 22 insertions(+), 5 deletions(-) diff --git a/tools/include/nolibc/Makefile b/tools/include/nolibc/Makefile index cc82399e940e53940061b60588566db51f2e191a..ae04de46cdfeb2080d95b60c3d16e1c9f3692c9c 100644 --- a/tools/include/nolibc/Makefile +++ b/tools/include/nolibc/Makefile @@ -58,6 +58,7 @@ all_files := \ sys/resource.h \ sys/stat.h \ sys/syscall.h \ + sys/sysmacros.h \ sys/time.h \ sys/timerfd.h \ sys/types.h \ diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h index 7d151776e47a30080dc3d961bcdeacd6ab558629..182dcfce126638db1cc9fd3fa79ab99835b42c26 100644 --- a/tools/include/nolibc/nolibc.h +++ b/tools/include/nolibc/nolibc.h @@ -106,6 +106,7 @@ #include "sys/resource.h" #include "sys/stat.h" #include "sys/syscall.h" +#include "sys/sysmacros.h" #include "sys/time.h" #include "sys/timerfd.h" #include "sys/wait.h" diff --git a/tools/include/nolibc/sys/sysmacros.h b/tools/include/nolibc/sys/sysmacros.h new file mode 100644 index ..e7e24dda1dbb51df92307def00f3670459b2409c --- /dev/null +++ b/tools/include/nolibc/sys/sysmacros.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * Sysmacro definitions for NOLIBC + * Copyright (C) 2017-2021 Willy Tarreau + */ + +/* make sure to include all global symbols */ +#include "nolibc.h" + +#ifndef _NOLIBC_SYS_SYSMACROS_H +#define _NOLIBC_SYS_SYSMACROS_H + +#include "../std.h" + +/* WARNING, it only deals with the 4096 first majors and 256 first minors */ +#define makedev(major, minor) ((dev_t)major) & 0xfff) << 8) | ((minor) & 0xff))) +#define major(dev) ((unsigned int)(((dev) >> 8) & 0xfff)) +#define minor(dev) ((unsigned int)((dev) & 0xff)) + +#endif /* _NOLIBC_SYS_SYSMACROS_H */ diff --git a/tools/include/nolibc/types.h b/tools/include/nolibc/types.h index 2225c9388a46480a18e9ce603b08d846e5276bab..0071bfbc23154cf9b47b0bd747101fea4955018d 100644 --- a/tools/include/nolibc/types.h +++ b/tools/include/nolibc/types.h @@ -188,11 +188,6 @@ struct stat { typedef __kernel_clockid_t clockid_t; typedef int timer_t; -/* WARNING, it only deals with the 4096 first majors and 256 first minors */ -#define makedev(major, minor) ((dev_t)major) & 0xfff) << 8) | ((minor) & 0xff))) -#define major(dev) ((unsigned int)(((dev) >> 8) & 0xfff)) -#define minor(dev) ((unsigned int)((dev) & 0xff)) - #ifndef offsetof #define offsetof(TYPE, FIELD) ((size_t) &((TYPE *)0)->FIELD) #endif -- 2.49.0
[PATCH 8/9] tools/nolibc: move NULL and offsetof() to sys/stddef.h
This is the location regular userspace expects these definitions. Signed-off-by: Thomas Weißschuh --- tools/include/nolibc/Makefile | 1 + tools/include/nolibc/std.h| 6 +- tools/include/nolibc/stddef.h | 24 tools/include/nolibc/types.h | 4 4 files changed, 26 insertions(+), 9 deletions(-) diff --git a/tools/include/nolibc/Makefile b/tools/include/nolibc/Makefile index 0ff06374577df457b0611d91dcfe637a05476364..c4615bed731ced75a00867fdeb455d72652a7e42 100644 --- a/tools/include/nolibc/Makefile +++ b/tools/include/nolibc/Makefile @@ -44,6 +44,7 @@ all_files := \ std.h \ stdarg.h \ stdbool.h \ + stddef.h \ stdint.h \ stdlib.h \ string.h \ diff --git a/tools/include/nolibc/std.h b/tools/include/nolibc/std.h index 933bc0be7e1c6be3b7909efc127f1d3b9c611135..adda7333d12e7d2c336938ede1aaf215b4b93165 100644 --- a/tools/include/nolibc/std.h +++ b/tools/include/nolibc/std.h @@ -13,12 +13,8 @@ * syscall-specific stuff, as this file is expected to be included very early. */ -/* note: may already be defined */ -#ifndef NULL -#define NULL ((void *)0) -#endif - #include "stdint.h" +#include "stddef.h" /* those are commonly provided by sys/types.h */ typedef unsigned int dev_t; diff --git a/tools/include/nolibc/stddef.h b/tools/include/nolibc/stddef.h new file mode 100644 index ..ecbd13eab1f5190fd0e90a07699c2f06dbde8916 --- /dev/null +++ b/tools/include/nolibc/stddef.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * Stddef definitions for NOLIBC + * Copyright (C) 2017-2021 Willy Tarreau + */ + +/* make sure to include all global symbols */ +#include "nolibc.h" + +#ifndef _NOLIBC_STDDEF_H +#define _NOLIBC_STDDEF_H + +#include "stdint.h" + +/* note: may already be defined */ +#ifndef NULL +#define NULL ((void *)0) +#endif + +#ifndef offsetof +#define offsetof(TYPE, FIELD) ((size_t) &((TYPE *)0)->FIELD) +#endif + +#endif /* _NOLIBC_STDDEF_H */ diff --git a/tools/include/nolibc/types.h b/tools/include/nolibc/types.h index 0071bfbc23154cf9b47b0bd747101fea4955018d..30904be544ed01b212042ebc0f4dab610f64b216 100644 --- a/tools/include/nolibc/types.h +++ b/tools/include/nolibc/types.h @@ -188,10 +188,6 @@ struct stat { typedef __kernel_clockid_t clockid_t; typedef int timer_t; -#ifndef offsetof -#define offsetof(TYPE, FIELD) ((size_t) &((TYPE *)0)->FIELD) -#endif - #ifndef container_of #define container_of(PTR, TYPE, FIELD) ({ \ __typeof__(((TYPE *)0)->FIELD) *__FIELD_PTR = (PTR);\ -- 2.49.0
[PATCH 2/9] tools/nolibc: move mount() to sys/mount.h
This is the location regular userspace expects this definition. Signed-off-by: Thomas Weißschuh --- tools/include/nolibc/Makefile| 1 + tools/include/nolibc/nolibc.h| 1 + tools/include/nolibc/sys.h | 20 tools/include/nolibc/sys/mount.h | 37 + 4 files changed, 39 insertions(+), 20 deletions(-) diff --git a/tools/include/nolibc/Makefile b/tools/include/nolibc/Makefile index 7d66847df91a94ca784c04ced278eb6d7099bab4..aa14dfddd77deb5442a6be65dee5684b6d218da7 100644 --- a/tools/include/nolibc/Makefile +++ b/tools/include/nolibc/Makefile @@ -51,6 +51,7 @@ all_files := \ sys/auxv.h \ sys/ioctl.h \ sys/mman.h \ + sys/mount.h \ sys/random.h \ sys/stat.h \ sys/syscall.h \ diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h index d6048d1e9ea5d4b5d504e156aafc5651dcd6b8c1..690368f8e46c33df37c824429cf89dd0e95bb806 100644 --- a/tools/include/nolibc/nolibc.h +++ b/tools/include/nolibc/nolibc.h @@ -99,6 +99,7 @@ #include "sys/auxv.h" #include "sys/ioctl.h" #include "sys/mman.h" +#include "sys/mount.h" #include "sys/random.h" #include "sys/stat.h" #include "sys/syscall.h" diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index 313c210173c804f728b0be4ab8e67542bb3a7219..e66dd6e76055753d98da2627b038dffe7f93 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -672,26 +672,6 @@ int mknod(const char *path, mode_t mode, dev_t dev) return __sysret(sys_mknod(path, mode, dev)); } -/* - * int mount(const char *source, const char *target, - * const char *fstype, unsigned long flags, - * const void *data); - */ -static __attribute__((unused)) -int sys_mount(const char *src, const char *tgt, const char *fst, - unsigned long flags, const void *data) -{ - return my_syscall5(__NR_mount, src, tgt, fst, flags, data); -} - -static __attribute__((unused)) -int mount(const char *src, const char *tgt, - const char *fst, unsigned long flags, - const void *data) -{ - return __sysret(sys_mount(src, tgt, fst, flags, data)); -} - /* * int pipe2(int pipefd[2], int flags); diff --git a/tools/include/nolibc/sys/mount.h b/tools/include/nolibc/sys/mount.h new file mode 100644 index ..5129420e716d0a7e75c32a5bbc8f4e9f0e21b03b --- /dev/null +++ b/tools/include/nolibc/sys/mount.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * Mount definitions for NOLIBC + * Copyright (C) 2017-2021 Willy Tarreau + */ + +/* make sure to include all global symbols */ +#include "nolibc.h" + +#ifndef _NOLIBC_SYS_MOUNT_H +#define _NOLIBC_SYS_MOUNT_H + +#include "../sys.h" + +#include + +/* + * int mount(const char *source, const char *target, + * const char *fstype, unsigned long flags, + * const void *data); + */ +static __attribute__((unused)) +int sys_mount(const char *src, const char *tgt, const char *fst, + unsigned long flags, const void *data) +{ + return my_syscall5(__NR_mount, src, tgt, fst, flags, data); +} + +static __attribute__((unused)) +int mount(const char *src, const char *tgt, + const char *fst, unsigned long flags, + const void *data) +{ + return __sysret(sys_mount(src, tgt, fst, flags, data)); +} + +#endif /* _NOLIBC_SYS_MOUNT_H */ -- 2.49.0
[PATCH 3/9] tools/nolibc: move prctl() to sys/prctl.h
This is the location regular userspace expects this definition. Signed-off-by: Thomas Weißschuh --- tools/include/nolibc/Makefile| 1 + tools/include/nolibc/nolibc.h| 1 + tools/include/nolibc/sys.h | 21 - tools/include/nolibc/sys/prctl.h | 36 4 files changed, 38 insertions(+), 21 deletions(-) diff --git a/tools/include/nolibc/Makefile b/tools/include/nolibc/Makefile index aa14dfddd77deb5442a6be65dee5684b6d218da7..35cfbc84d499b0a57689bdfed95a45023904d256 100644 --- a/tools/include/nolibc/Makefile +++ b/tools/include/nolibc/Makefile @@ -52,6 +52,7 @@ all_files := \ sys/ioctl.h \ sys/mman.h \ sys/mount.h \ + sys/prctl.h \ sys/random.h \ sys/stat.h \ sys/syscall.h \ diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h index 690368f8e46c33df37c824429cf89dd0e95bb806..1c159e32a248d46fa4d36a2c35e92eb9da91e9f6 100644 --- a/tools/include/nolibc/nolibc.h +++ b/tools/include/nolibc/nolibc.h @@ -100,6 +100,7 @@ #include "sys/ioctl.h" #include "sys/mman.h" #include "sys/mount.h" +#include "sys/prctl.h" #include "sys/random.h" #include "sys/stat.h" #include "sys/syscall.h" diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index e66dd6e76055753d98da2627b038dffe7f93..a17fe98968a29081661eaf235111482a543f87ba 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -23,7 +23,6 @@ #include #include /* for O_* and AT_* */ #include /* for statx() */ -#include #include #include @@ -697,26 +696,6 @@ int pipe(int pipefd[2]) } -/* - * int prctl(int option, unsigned long arg2, unsigned long arg3, - * unsigned long arg4, unsigned long arg5); - */ - -static __attribute__((unused)) -int sys_prctl(int option, unsigned long arg2, unsigned long arg3, - unsigned long arg4, unsigned long arg5) -{ - return my_syscall5(__NR_prctl, option, arg2, arg3, arg4, arg5); -} - -static __attribute__((unused)) -int prctl(int option, unsigned long arg2, unsigned long arg3, - unsigned long arg4, unsigned long arg5) -{ - return __sysret(sys_prctl(option, arg2, arg3, arg4, arg5)); -} - - /* * int pivot_root(const char *new, const char *old); */ diff --git a/tools/include/nolibc/sys/prctl.h b/tools/include/nolibc/sys/prctl.h new file mode 100644 index ..778a6a868823ce496f0f5482a58a31d69afd6d30 --- /dev/null +++ b/tools/include/nolibc/sys/prctl.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * Prctl definitions for NOLIBC + * Copyright (C) 2017-2021 Willy Tarreau + */ + +/* make sure to include all global symbols */ +#include "nolibc.h" + +#ifndef _NOLIBC_SYS_PRCTL_H +#define _NOLIBC_SYS_PRCTL_H + +#include "../sys.h" + +#include + +/* + * int prctl(int option, unsigned long arg2, unsigned long arg3, + * unsigned long arg4, unsigned long arg5); + */ + +static __attribute__((unused)) +int sys_prctl(int option, unsigned long arg2, unsigned long arg3, + unsigned long arg4, unsigned long arg5) +{ + return my_syscall5(__NR_prctl, option, arg2, arg3, arg4, arg5); +} + +static __attribute__((unused)) +int prctl(int option, unsigned long arg2, unsigned long arg3, + unsigned long arg4, unsigned long arg5) +{ + return __sysret(sys_prctl(option, arg2, arg3, arg4, arg5)); +} + +#endif /* _NOLIBC_SYS_PRCTL_H */ -- 2.49.0
[PATCH 0/9] tools/nolibc: split out more headers
Split out all headers which are used by nolibc-test.c. This makes it easier to port existing applications to nolibc. Signed-off-by: Thomas Weißschuh --- Thomas Weißschuh (9): tools/nolibc: move ioctl() to sys/ioctl.h tools/nolibc: move mount() to sys/mount.h tools/nolibc: move prctl() to sys/prctl.h tools/nolibc: move reboot() to sys/reboot.h tools/nolibc: move getrlimit() and friends to sys/resource.h tools/nolibc: move makedev() and friends to sys/sysmacros.h tools/nolibc: move uname() and friends to sys/utsname.h tools/nolibc: move NULL and offsetof() to sys/stddef.h selftests/nolibc: drop include guards around standard headers tools/include/nolibc/Makefile| 8 ++ tools/include/nolibc/nolibc.h| 7 ++ tools/include/nolibc/std.h | 6 +- tools/include/nolibc/stddef.h| 24 + tools/include/nolibc/sys.h | 136 --- tools/include/nolibc/sys/ioctl.h | 29 ++ tools/include/nolibc/sys/mount.h | 37 tools/include/nolibc/sys/prctl.h | 36 +++ tools/include/nolibc/sys/reboot.h| 34 +++ tools/include/nolibc/sys/resource.h | 53 +++ tools/include/nolibc/sys/sysmacros.h | 20 tools/include/nolibc/sys/utsname.h | 42 + tools/include/nolibc/types.h | 11 --- tools/testing/selftests/nolibc/nolibc-test.c | 5 - 14 files changed, 291 insertions(+), 157 deletions(-) --- base-commit: 6a25f787912a73613f12e7eefbebd72ee3d43f85 change-id: 20250515-nolibc-sys-31a4fd76d897 Best regards, -- Thomas Weißschuh
[PATCH 7/9] tools/nolibc: move uname() and friends to sys/utsname.h
This is the location regular userspace expects these definitions. Signed-off-by: Thomas Weißschuh --- tools/include/nolibc/Makefile | 1 + tools/include/nolibc/nolibc.h | 1 + tools/include/nolibc/sys.h | 27 tools/include/nolibc/sys/utsname.h | 42 ++ 4 files changed, 44 insertions(+), 27 deletions(-) diff --git a/tools/include/nolibc/Makefile b/tools/include/nolibc/Makefile index ae04de46cdfeb2080d95b60c3d16e1c9f3692c9c..0ff06374577df457b0611d91dcfe637a05476364 100644 --- a/tools/include/nolibc/Makefile +++ b/tools/include/nolibc/Makefile @@ -62,6 +62,7 @@ all_files := \ sys/time.h \ sys/timerfd.h \ sys/types.h \ + sys/utsname.h \ sys/wait.h \ time.h \ types.h \ diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h index 182dcfce126638db1cc9fd3fa79ab99835b42c26..c199ade200c240ef4081f767a162698256f39677 100644 --- a/tools/include/nolibc/nolibc.h +++ b/tools/include/nolibc/nolibc.h @@ -109,6 +109,7 @@ #include "sys/sysmacros.h" #include "sys/time.h" #include "sys/timerfd.h" +#include "sys/utsname.h" #include "sys/wait.h" #include "ctype.h" #include "elf.h" diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index 282909b1992d17d5292917fbabb0bd2a2dbfce88..9556c69a6ae1ff0f19d884f5022599fc31e04f1f 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -23,7 +23,6 @@ #include #include /* for O_* and AT_* */ #include /* for statx() */ -#include #include "errno.h" #include "stdarg.h" @@ -894,32 +893,6 @@ int umount2(const char *path, int flags) } -/* - * int uname(struct utsname *buf); - */ - -struct utsname { - char sysname[65]; - char nodename[65]; - char release[65]; - char version[65]; - char machine[65]; - char domainname[65]; -}; - -static __attribute__((unused)) -int sys_uname(struct utsname *buf) -{ - return my_syscall1(__NR_uname, buf); -} - -static __attribute__((unused)) -int uname(struct utsname *buf) -{ - return __sysret(sys_uname(buf)); -} - - /* * int unlink(const char *path); */ diff --git a/tools/include/nolibc/sys/utsname.h b/tools/include/nolibc/sys/utsname.h new file mode 100644 index ..3adda115085069403253c2a276de1e2ae9bb1ad3 --- /dev/null +++ b/tools/include/nolibc/sys/utsname.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * Utsname definitions for NOLIBC + * Copyright (C) 2017-2021 Willy Tarreau + */ + +/* make sure to include all global symbols */ +#include "nolibc.h" + +#ifndef _NOLIBC_SYS_UTSNAME_H +#define _NOLIBC_SYS_UTSNAME_H + +#include "../sys.h" + +#include + +/* + * int uname(struct utsname *buf); + */ + +struct utsname { + char sysname[65]; + char nodename[65]; + char release[65]; + char version[65]; + char machine[65]; + char domainname[65]; +}; + +static __attribute__((unused)) +int sys_uname(struct utsname *buf) +{ + return my_syscall1(__NR_uname, buf); +} + +static __attribute__((unused)) +int uname(struct utsname *buf) +{ + return __sysret(sys_uname(buf)); +} + +#endif /* _NOLIBC_SYS_UTSNAME_H */ -- 2.49.0
[PATCH 5/9] tools/nolibc: move getrlimit() and friends to sys/resource.h
This is the location regular userspace expects these definitions. Signed-off-by: Thomas Weißschuh --- tools/include/nolibc/Makefile | 1 + tools/include/nolibc/nolibc.h | 1 + tools/include/nolibc/sys.h | 38 -- tools/include/nolibc/sys/resource.h | 53 + tools/include/nolibc/types.h| 1 - 5 files changed, 55 insertions(+), 39 deletions(-) diff --git a/tools/include/nolibc/Makefile b/tools/include/nolibc/Makefile index 4850501b8d53be6b603ecbb04d6f952cd5370cf0..cc82399e940e53940061b60588566db51f2e191a 100644 --- a/tools/include/nolibc/Makefile +++ b/tools/include/nolibc/Makefile @@ -55,6 +55,7 @@ all_files := \ sys/prctl.h \ sys/random.h \ sys/reboot.h \ + sys/resource.h \ sys/stat.h \ sys/syscall.h \ sys/time.h \ diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h index 36ea7a02c7434cc006a7b12d413add04a2f85565..7d151776e47a30080dc3d961bcdeacd6ab558629 100644 --- a/tools/include/nolibc/nolibc.h +++ b/tools/include/nolibc/nolibc.h @@ -103,6 +103,7 @@ #include "sys/prctl.h" #include "sys/random.h" #include "sys/reboot.h" +#include "sys/resource.h" #include "sys/stat.h" #include "sys/syscall.h" #include "sys/time.h" diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h index 6c89dd0316dd0ebf03ebde6fa5c14273df6a0c62..282909b1992d17d5292917fbabb0bd2a2dbfce88 100644 --- a/tools/include/nolibc/sys.h +++ b/tools/include/nolibc/sys.h @@ -23,7 +23,6 @@ #include #include /* for O_* and AT_* */ #include /* for statx() */ -#include #include #include "errno.h" @@ -730,43 +729,6 @@ ssize_t read(int fd, void *buf, size_t count) } -/* - * int getrlimit(int resource, struct rlimit *rlim); - * int setrlimit(int resource, const struct rlimit *rlim); - */ - -static __attribute__((unused)) -int sys_prlimit64(pid_t pid, int resource, - const struct rlimit64 *new_limit, struct rlimit64 *old_limit) -{ - return my_syscall4(__NR_prlimit64, pid, resource, new_limit, old_limit); -} - -static __attribute__((unused)) -int getrlimit(int resource, struct rlimit *rlim) -{ - struct rlimit64 rlim64; - int ret; - - ret = __sysret(sys_prlimit64(0, resource, NULL, &rlim64)); - rlim->rlim_cur = rlim64.rlim_cur; - rlim->rlim_max = rlim64.rlim_max; - - return ret; -} - -static __attribute__((unused)) -int setrlimit(int resource, const struct rlimit *rlim) -{ - struct rlimit64 rlim64 = { - .rlim_cur = rlim->rlim_cur, - .rlim_max = rlim->rlim_max, - }; - - return __sysret(sys_prlimit64(0, resource, &rlim64, NULL)); -} - - /* * int sched_yield(void); */ diff --git a/tools/include/nolibc/sys/resource.h b/tools/include/nolibc/sys/resource.h new file mode 100644 index ..ac2705a67e8b3abacffa57f097095b98afa1469c --- /dev/null +++ b/tools/include/nolibc/sys/resource.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: LGPL-2.1 OR MIT */ +/* + * Resource definitions for NOLIBC + * Copyright (C) 2017-2021 Willy Tarreau + */ + +/* make sure to include all global symbols */ +#include "nolibc.h" + +#ifndef _NOLIBC_SYS_RESOURCE_H +#define _NOLIBC_SYS_RESOURCE_H + +#include "../sys.h" + +#include + +/* + * int getrlimit(int resource, struct rlimit *rlim); + * int setrlimit(int resource, const struct rlimit *rlim); + */ + +static __attribute__((unused)) +int sys_prlimit64(pid_t pid, int resource, + const struct rlimit64 *new_limit, struct rlimit64 *old_limit) +{ + return my_syscall4(__NR_prlimit64, pid, resource, new_limit, old_limit); +} + +static __attribute__((unused)) +int getrlimit(int resource, struct rlimit *rlim) +{ + struct rlimit64 rlim64; + int ret; + + ret = __sysret(sys_prlimit64(0, resource, NULL, &rlim64)); + rlim->rlim_cur = rlim64.rlim_cur; + rlim->rlim_max = rlim64.rlim_max; + + return ret; +} + +static __attribute__((unused)) +int setrlimit(int resource, const struct rlimit *rlim) +{ + struct rlimit64 rlim64 = { + .rlim_cur = rlim->rlim_cur, + .rlim_max = rlim->rlim_max, + }; + + return __sysret(sys_prlimit64(0, resource, &rlim64, NULL)); +} + +#endif /* _NOLIBC_SYS_RESOURCE_H */ diff --git a/tools/include/nolibc/types.h b/tools/include/nolibc/types.h index 74c7694b2d5e54f7a86854697ac32a2ea2b62e86..2225c9388a46480a18e9ce603b08d846e5276bab 100644 --- a/tools/include/nolibc/types.h +++ b/tools/include/nolibc/types.h @@ -15,7 +15,6 @@ #include #include #include -#include /* Only the generic macros and types may be defined here. The arch-specific -- 2.49.0
Re: [PATCH 1/2] selftests/mm: skip uffd tests in madv_guard if uffd is not present.
Ah you got to this first :) thanks! Could you do this with a cover letter though? It's really weird to have 2/2 reply to 1/2, I know sometimes people do that, but it's just odd, and it'd be good to have an overview, thanks! On Thu, May 15, 2025 at 02:23:32PM -0400, Zi Yan wrote: > When userfaultfd is not compiled into kernel, userfaultfd() returns -1, > causing uffd tests in madv_guard fail. Skip the tests instead. 'madv_guard'? I'd just say the guard_regions.uffd test to fail. > > Signed-off-by: Zi Yan > --- > tools/testing/selftests/mm/guard-regions.c | 17 +++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/tools/testing/selftests/mm/guard-regions.c > b/tools/testing/selftests/mm/guard-regions.c > index 0cd9d236649d..93af3d3760f9 100644 > --- a/tools/testing/selftests/mm/guard-regions.c > +++ b/tools/testing/selftests/mm/guard-regions.c > @@ -1453,8 +1453,21 @@ TEST_F(guard_regions, uffd) > > /* Set up uffd. */ > uffd = userfaultfd(0); > - if (uffd == -1 && errno == EPERM) > - ksft_exit_skip("No userfaultfd permissions, try running as > root.\n"); Let's just make this all part of the same switch please! And while I originally used ksft_exit_skip(), I think we can just use the SKIP(return, ...) form here just fine to keep it consistent. > + if (uffd == -1) { > + switch (errno) { > + case EPERM: > + SKIP(return, "No userfaultfd permissions, try running > as root."); > + break; > + case ENOSYS: > + SKIP(return, "userfaultfd is not supported/not > enabled."); > + break; > + default: > + ksft_exit_fail_msg("userfaultfd failed with %s\n", > +strerror(errno)); > + break; > + } > + } > + > ASSERT_NE(uffd, -1); > > ASSERT_EQ(ioctl(uffd, UFFDIO_API, &api), 0); > -- > 2.47.2 > Thanks!
Re: [PATCH 1/2] selftests/mm: skip uffd tests in madv_guard if uffd is not present.
On Thu, May 15, 2025 at 02:23:32PM -0400, Zi Yan wrote: > When userfaultfd is not compiled into kernel, userfaultfd() returns -1, > causing uffd tests in madv_guard fail. Skip the tests instead. > > Signed-off-by: Zi Yan Reviewed-by: Pedro Falcato Thanks! -- Pedro
Re: [PATCH v4 22/38] KVM: x86/pmu: Optimize intel/amd_pmu_refresh() helpers
On Thu, May 15, 2025, Dapeng Mi wrote: > On 5/15/2025 8:37 AM, Sean Christopherson wrote: > >> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c > >> index 153972e944eb..eba086ef5eca 100644 > >> --- a/arch/x86/kvm/svm/pmu.c > >> +++ b/arch/x86/kvm/svm/pmu.c > >> @@ -198,12 +198,20 @@ static void __amd_pmu_refresh(struct kvm_vcpu *vcpu) > >>pmu->nr_arch_gp_counters = min_t(unsigned int, pmu->nr_arch_gp_counters, > >> kvm_pmu_cap.num_counters_gp); > >> > >> - if (pmu->version > 1) { > >> - pmu->global_ctrl_rsvd = ~((1ull << pmu->nr_arch_gp_counters) - > >> 1); > >> + if (kvm_pmu_cap.version > 1) { ARGH! I saw the pmu->version => kvm_pmu_cap.version change when going through this patch, but assumed it was simply a refactoring bug and ignored it. Nope, 100% intentional, as I discovered after spending the better part of an hour debugging. Stuffing pmu->global_ctrl when it doesn't exist is necessary when the mediated PMU is enabled, because pmu->global_ctrl will always be loaded in hardware. And so loading '0' means the PMU is effectively disabled, because KVM disallows the guest from writing the MSR. _That_ is the type of thing that absolutely needs a comment and a lengthy explanation in the changelog. Intel also needs the same treatment for PMU v1. And since there's no hackery that I can see, that suggests PMU v1 guests haven't been tested with the mediated PMU on Intel. I guess congratulations are in order, because this patch just became my new goto example of why I'm so strict about on the "one thing per patch" rule. > > It's not just global_ctrl. PEBS and the fixed counters also depend on v2+ > > (the > > SDM contradicts itself; KVM's ABI is that they're v2+). > > > >> + /* > >> + * At RESET, AMD CPUs set all enable bits for general purpose > >> counters in > >> + * IA32_PERF_GLOBAL_CTRL (so that software that was written for > >> v1 PMUs > >> + * don't unknowingly leave GP counters disabled in the global > >> controls). > >> + * Emulate that behavior when refreshing the PMU so that > >> userspace doesn't > >> + * need to manually set PERF_GLOBAL_CTRL. > >> + */ > >> + pmu->global_ctrl = BIT_ULL(pmu->nr_arch_gp_counters) - 1; > >> + pmu->global_ctrl_rsvd = ~pmu->global_ctrl; > >>pmu->global_status_rsvd = pmu->global_ctrl_rsvd; > >>}
Re: [PATCH v4 05/38] perf: Add generic exclude_guest support
On Thu, May 15, 2025, Kan Liang wrote: > On 2025-05-14 7:19 p.m., Sean Christopherson wrote: > >> This naming is confusing on purpose? Pick either guest/host and stick > >> with it. > > > > +1. I also think the inner perf_host_{enter,exit}() helpers are superflous. > > These flows > > > > After a bit of hacking, and with a few spoilers, this is what I ended up > > with > > (not anywhere near fully tested). I like following KVM's > > kvm_xxx_{load,put}() > > nomenclature to tie everything together, so I went with "guest" instead of > > "host" > > even though the majority of work being down is to shedule out/in host > > context. > > > > /* When loading a guest's mediated PMU, schedule out all exclude_guest > > events. */ > > void perf_load_guest_context(unsigned long data) > > { > > struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); > > > > lockdep_assert_irqs_disabled(); > > > > perf_ctx_lock(cpuctx, cpuctx->task_ctx); > > > > if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded))) > > goto unlock; > > > > perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST); > > ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST); > > perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST); > > if (cpuctx->task_ctx) { > > perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST); > > task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST); > > perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST); > > } > > > > arch_perf_load_guest_context(data); > > > > __this_cpu_write(guest_ctx_loaded, true); > > > > unlock: > > perf_ctx_unlock(cpuctx, cpuctx->task_ctx); > > } > > EXPORT_SYMBOL_GPL(perf_load_guest_context); > > > > void perf_put_guest_context(void) > > { > > struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); > > > > lockdep_assert_irqs_disabled(); > > > > perf_ctx_lock(cpuctx, cpuctx->task_ctx); > > > > if (WARN_ON_ONCE(!__this_cpu_read(guest_ctx_loaded))) > > goto unlock; > > > > arch_perf_put_guest_context(); > > It will set the guest_ctx_loaded to false. > The update_context_time() invoked in the perf_event_sched_in() will not > get a chance to update the guest time. The guest_ctx_loaded in arch/x86/events/core.c is a different variable, it just happens to have the same name. It's completely gross, but exposing guest_ctx_loaded outside of kernel/events/core.c didn't seem like a great alternative. If we wanted to use a single variable, then the writes in arch_perf_{load,put}_guest_context() can simply go away.
Re: [PATCH v2 4/4] media: i2c: imx214: Remove hard-coded external clock frequency
Hi Laurent, On Thu, May 15, 2025 at 03:03:40PM +0200, Laurent Pinchart wrote: > On Thu, May 15, 2025 at 02:54:54PM +0300, Sakari Ailus wrote: > > On Thu, May 15, 2025 at 10:58:46AM +0200, Laurent Pinchart wrote: > > > On Tue, May 06, 2025 at 08:24:03AM +, Sakari Ailus wrote: > > > > On Mon, May 05, 2025 at 11:05:56PM +0200, André Apitzsch via B4 Relay > > > > wrote: > > > > > From: André Apitzsch > > > > > > > > > > Instead rely on the rate set on the clock (using assigned-clock-rates > > > > > etc.) > > > > > > > > > > Signed-off-by: André Apitzsch > > > > > --- > > > > > drivers/media/i2c/imx214.c | 6 -- > > > > > 1 file changed, 6 deletions(-) > > > > > > > > > > diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c > > > > > index > > > > > 9e9be47394ec768a5b34d44b06b5bbb0988da5a1..c12996e294dccebb18c608254f1e0d14dc064423 > > > > > 100644 > > > > > --- a/drivers/media/i2c/imx214.c > > > > > +++ b/drivers/media/i2c/imx214.c > > > > > @@ -32,7 +32,6 @@ > > > > > > > > > > #define IMX214_REG_FAST_STANDBY_CTRL CCI_REG8(0x0106) > > > > > > > > > > -#define IMX214_DEFAULT_CLK_FREQ 2400 > > > > > #define IMX214_DEFAULT_LINK_FREQ 6 > > > > > /* Keep wrong link frequency for backward compatibility */ > > > > > #define IMX214_DEFAULT_LINK_FREQ_LEGACY 48000 > > > > > @@ -1405,11 +1404,6 @@ static int imx214_probe(struct i2c_client > > > > > *client) > > > > > return dev_err_probe(dev, PTR_ERR(imx214->xclk), > > > > >"failed to get xclk\n"); > > > > > > > > > > - ret = clk_set_rate(imx214->xclk, IMX214_DEFAULT_CLK_FREQ); > > > > > - if (ret) > > > > > - return dev_err_probe(dev, ret, > > > > > - "failed to set xclk frequency\n"); > > > > > - > > > > > > > > Oops. I missed this is what the driver was doing already. Indeed, this > > > > is > > > > one of the historic sensor drivers that do set the frequency in DT > > > > systems. > > > > > > > > The driver never used the clock-frequency property and instead used a > > > > fixed > > > > frequency. Changing the behaviour now could be problematic. > > > > > > > > There are options here that I think we could do: > > > > > > > > 1) use your v1 patch (4) which uses "clock-frequency" if it exists and > > > > otherwise uses the default, fixed frequency or > > > > > > > > 2) set the frequency only if the "clock-frequency" property exists. The > > > > DT > > > > currently requires clock-frequency and the YAML conversion was done in > > > > 2020 > > > > whereas the driver is from 2018. If we do this, the clock-frequency > > > > should > > > > be deprecated (or even removed from bingings). > > > > > > > > I wonder what others think. Cc'd Laurent in any case. > > > > > > Maybe I'm missing something, but I don't really see the issue here. The > > > clock-frequency DT property is currently ignored, and this patch doesn't > > > change that situation, does it ? > > > > > > The change of behaviour here is related to the assigned-clock-rates > > > property. If that property is specified today, it will set the clock > > > rate, and the driver will override it to 24MHz right after. With this > > > patch, the clock rate won't be overridden. I think the risk of > > > regression is very low here, as I don't expect systems to set > > > assigned-clock-rates in DT to a value different than 24MHz and expect > > > the driver to override it. > > > > If the DTS had assigned-clock-rates set correctly, then yes. How much can > > we trust the older DTS did have that? > > I am relatively confident that DT-based systems wouldn't have an > assigned-clock-rates property with a frequency that doesn't match > IMX214_DEFAULT_CLK_FREQ. The real question is whether or not I'm > over-confident :-) The assigned-clock stuff wasn't always there. But nowadays I guess a lot of things in practice depends on it. So I guess doing this should be fine then. -- Sakari Ailus
Re: [PATCH 1/2] selftests/mm: skip uffd tests in madv_guard if uffd is not present.
On Thu, May 15, 2025 at 02:46:41PM -0400, Zi Yan wrote: > On 15 May 2025, at 14:41, Lorenzo Stoakes wrote: > > > Ah you got to this first :) thanks! > > > > Could you do this with a cover letter though? It's really weird to have 2/2 > > reply to 1/2, I know sometimes people do that, but it's just odd, and it'd > > be > > good to have an overview, thanks! > > > > On Thu, May 15, 2025 at 02:23:32PM -0400, Zi Yan wrote: > >> When userfaultfd is not compiled into kernel, userfaultfd() returns -1, > >> causing uffd tests in madv_guard fail. Skip the tests instead. > > > > 'madv_guard'? I'd just say the guard_regions.uffd test to fail. > > Sure. Will change it. > > > >> > >> Signed-off-by: Zi Yan Given I was being an idiot below, now the patch is fine as-is, just resend with the nitty commit message change and cover letter as a v2 and we should be good :) Reviewed-by: Lorenzo Stoakes > >> --- > >> tools/testing/selftests/mm/guard-regions.c | 17 +++-- > >> 1 file changed, 15 insertions(+), 2 deletions(-) > >> > >> diff --git a/tools/testing/selftests/mm/guard-regions.c > >> b/tools/testing/selftests/mm/guard-regions.c > >> index 0cd9d236649d..93af3d3760f9 100644 > >> --- a/tools/testing/selftests/mm/guard-regions.c > >> +++ b/tools/testing/selftests/mm/guard-regions.c > >> @@ -1453,8 +1453,21 @@ TEST_F(guard_regions, uffd) > >> > >>/* Set up uffd. */ > >>uffd = userfaultfd(0); > >> - if (uffd == -1 && errno == EPERM) > >> - ksft_exit_skip("No userfaultfd permissions, try running as > >> root.\n"); > > > > Let's just make this all part of the same switch please! > > What do you mean? EPERM is handled in the switch-case below. Oh man, I'm the biggest idiot on Earth haha! For some reason I read these '-'s as '+'s : Yes please ignore the above, I therefore - like your approach - and am good with it. > > > > > And while I originally used ksft_exit_skip(), I think we can just use the > > SKIP(return, ...) form here just fine to keep it consistent. > > Right. I am using SKIP below, since when I ran it, ksft_exit_skip() > makes the whole test message inconsistent. Yes, your confusion is warranted, I apparently can't read... :>) and indeed, agreed. Thanks for doing this! > > > > >> + if (uffd == -1) { > >> + switch (errno) { > >> + case EPERM: > >> + SKIP(return, "No userfaultfd permissions, try running > >> as root."); > >> + break; > >> + case ENOSYS: > >> + SKIP(return, "userfaultfd is not supported/not > >> enabled."); > >> + break; > >> + default: > >> + ksft_exit_fail_msg("userfaultfd failed with %s\n", > >> + strerror(errno)); > >> + break; > >> + } > >> + } > >> + > >>ASSERT_NE(uffd, -1); > >> > >>ASSERT_EQ(ioctl(uffd, UFFDIO_API, &api), 0); > >> -- > >> 2.47.2 > >> > > > > Thanks! > > > -- > Best Regards, > Yan, Zi
[PATCH net] selftests: net: validate team flags propagation
Cover three recent cases: 1. missing ops locking for the lowers during netdev_sync_lower_features 2. missing locking for dev_set_promiscuity (plus netdev_ops_assert_locked with a comment on why/when it's needed) 3. rcu lock during team_change_rx_flags Verified that each one triggers when the respective fix is reverted. Not sure about the placement, but since it all relies on teaming, added to the teaming directory. One ugly bit is that I add NETIF_F_LRO to netdevsim; there is no way to trigger netdev_sync_lower_features without it. Signed-off-by: Stanislav Fomichev --- drivers/net/netdevsim/netdev.c| 2 + net/core/dev.c| 10 ++- .../selftests/drivers/net/team/Makefile | 2 +- .../testing/selftests/drivers/net/team/config | 1 + .../selftests/drivers/net/team/propagation.sh | 79 +++ 5 files changed, 92 insertions(+), 2 deletions(-) create mode 100755 tools/testing/selftests/drivers/net/team/propagation.sh diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c index 0e0321a7ddd7..3bd1f8cffee8 100644 --- a/drivers/net/netdevsim/netdev.c +++ b/drivers/net/netdevsim/netdev.c @@ -879,11 +879,13 @@ static void nsim_setup(struct net_device *dev) NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | +NETIF_F_LRO | NETIF_F_TSO; dev->hw_features |= NETIF_F_HW_TC | NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | + NETIF_F_LRO | NETIF_F_TSO; dev->max_mtu = ETH_MAX_MTU; dev->xdp_features = NETDEV_XDP_ACT_HW_OFFLOAD; diff --git a/net/core/dev.c b/net/core/dev.c index 0d891634c692..4debd4b8e0f5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9188,8 +9188,16 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify) dev_change_rx_flags(dev, IFF_PROMISC); } - if (notify) + if (notify) { + /* The ops lock is only required to ensure consistent locking +* for `NETDEV_CHANGE` notifiers. This function is sometimes +* called without the lock, even for devices that are ops +* locked, such as in `dev_uc_sync_multiple` when using +* bonding or teaming. +*/ + netdev_ops_assert_locked(dev); __dev_notify_flags(dev, old_flags, IFF_PROMISC, 0, NULL); + } return 0; } diff --git a/tools/testing/selftests/drivers/net/team/Makefile b/tools/testing/selftests/drivers/net/team/Makefile index 2d5a76d99181..eaf6938f100e 100644 --- a/tools/testing/selftests/drivers/net/team/Makefile +++ b/tools/testing/selftests/drivers/net/team/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 # Makefile for net selftests -TEST_PROGS := dev_addr_lists.sh +TEST_PROGS := dev_addr_lists.sh propagation.sh TEST_INCLUDES := \ ../bonding/lag_lib.sh \ diff --git a/tools/testing/selftests/drivers/net/team/config b/tools/testing/selftests/drivers/net/team/config index b5e3a3aad4bf..636b3525b679 100644 --- a/tools/testing/selftests/drivers/net/team/config +++ b/tools/testing/selftests/drivers/net/team/config @@ -1,5 +1,6 @@ CONFIG_DUMMY=y CONFIG_IPV6=y CONFIG_MACVLAN=y +CONFIG_NETDEVSIM=m CONFIG_NET_TEAM=y CONFIG_NET_TEAM_MODE_LOADBALANCE=y diff --git a/tools/testing/selftests/drivers/net/team/propagation.sh b/tools/testing/selftests/drivers/net/team/propagation.sh new file mode 100755 index ..849a5f2cb3a7 --- /dev/null +++ b/tools/testing/selftests/drivers/net/team/propagation.sh @@ -0,0 +1,79 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +set -e + +NSIM_LRO_ID=$((256 + RANDOM % 256)) +NSIM_LRO_SYS=/sys/bus/netdevsim/devices/netdevsim$NSIM_LRO_ID + +NSIM_DEV_SYS_NEW=/sys/bus/netdevsim/new_device +NSIM_DEV_SYS_DEL=/sys/bus/netdevsim/del_device + +cleanup() +{ + ip link del dummyteam &>/dev/null + ip link del team0 &>/dev/null + echo $NSIM_LRO_ID > $NSIM_DEV_SYS_DEL +} + +# Trigger LRO propagation to the lower. +# https://lore.kernel.org/netdev/aBvOpkIoxcr9PfDg@mini-arch/ +team_lro() +{ + # using netdevsim because it supports NETIF_F_LRO + NSIM_LRO_NAME=$(find $NSIM_LRO_SYS/net -maxdepth 1 -type d ! \ + -path $NSIM_LRO_SYS/net -exec basename {} \;) + + ip link add name team0 type team + ip link set $NSIM_LRO_NAME down + ip link set dev $NSIM_LRO_NAME master team0 + ip link set team0 up + ethtool -K team0 large-receive-offload off + + ip link del team0 +} + +# Trigger promisc propagation to the lower during IFLA_MASTER. +# https://lore.kernel.org/netdev/20250506032328.3003050-1-...@fomichev.me/ +team_promisc() +{ + ip link add name dummyteam type dummy
Re: [PATCH v4 05/38] perf: Add generic exclude_guest support
On 2025-05-14 7:19 p.m., Sean Christopherson wrote: > On Fri, Apr 25, 2025, Peter Zijlstra wrote: >> On Mon, Mar 24, 2025 at 05:30:45PM +, Mingwei Zhang wrote: >> >>> @@ -6040,6 +6041,71 @@ void perf_put_mediated_pmu(void) >>> } >>> EXPORT_SYMBOL_GPL(perf_put_mediated_pmu); >>> >>> +static inline void perf_host_exit(struct perf_cpu_context *cpuctx) >>> +{ >>> + perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST); >>> + ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST); >>> + perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST); >>> + if (cpuctx->task_ctx) { >>> + perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST); >>> + task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST); >>> + perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST); >>> + } >>> +} >>> + >>> +/* When entering a guest, schedule out all exclude_guest events. */ >>> +void perf_guest_enter(void) >>> +{ >>> + struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); >>> + >>> + lockdep_assert_irqs_disabled(); >>> + >>> + perf_ctx_lock(cpuctx, cpuctx->task_ctx); >>> + >>> + if (WARN_ON_ONCE(__this_cpu_read(perf_in_guest))) >>> + goto unlock; >>> + >>> + perf_host_exit(cpuctx); >>> + >>> + __this_cpu_write(perf_in_guest, true); >>> + >>> +unlock: >>> + perf_ctx_unlock(cpuctx, cpuctx->task_ctx); >>> +} >>> +EXPORT_SYMBOL_GPL(perf_guest_enter); >>> + >>> +static inline void perf_host_enter(struct perf_cpu_context *cpuctx) >>> +{ >>> + perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST); >>> + if (cpuctx->task_ctx) >>> + perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST); >>> + >>> + perf_event_sched_in(cpuctx, cpuctx->task_ctx, NULL, EVENT_GUEST); >>> + >>> + if (cpuctx->task_ctx) >>> + perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST); >>> + perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST); >>> +} >>> + >>> +void perf_guest_exit(void) >>> +{ >>> + struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); >>> + >>> + lockdep_assert_irqs_disabled(); >>> + >>> + perf_ctx_lock(cpuctx, cpuctx->task_ctx); >>> + >>> + if (WARN_ON_ONCE(!__this_cpu_read(perf_in_guest))) >>> + goto unlock; >>> + >>> + perf_host_enter(cpuctx); >>> + >>> + __this_cpu_write(perf_in_guest, false); >>> +unlock: >>> + perf_ctx_unlock(cpuctx, cpuctx->task_ctx); >>> +} >>> +EXPORT_SYMBOL_GPL(perf_guest_exit); >> >> This naming is confusing on purpose? Pick either guest/host and stick >> with it. > > +1. I also think the inner perf_host_{enter,exit}() helpers are superflous. > These flows > > After a bit of hacking, and with a few spoilers, this is what I ended up with > (not anywhere near fully tested). I like following KVM's kvm_xxx_{load,put}() > nomenclature to tie everything together, so I went with "guest" instead of > "host" > even though the majority of work being down is to shedule out/in host context. > > /* When loading a guest's mediated PMU, schedule out all exclude_guest > events. */ > void perf_load_guest_context(unsigned long data) > { > struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); > > lockdep_assert_irqs_disabled(); > > perf_ctx_lock(cpuctx, cpuctx->task_ctx); > > if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded))) > goto unlock; > > perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST); > ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST); > perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST); > if (cpuctx->task_ctx) { > perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST); > task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST); > perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST); > } > > arch_perf_load_guest_context(data); > > __this_cpu_write(guest_ctx_loaded, true); > > unlock: > perf_ctx_unlock(cpuctx, cpuctx->task_ctx); > } > EXPORT_SYMBOL_GPL(perf_load_guest_context); > > void perf_put_guest_context(void) > { > struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); > > lockdep_assert_irqs_disabled(); > > perf_ctx_lock(cpuctx, cpuctx->task_ctx); > > if (WARN_ON_ONCE(!__this_cpu_read(guest_ctx_loaded))) > goto unlock; > > arch_perf_put_guest_context(); It will set the guest_ctx_loaded to false. The update_context_time() invoked in the perf_event_sched_in() will not get a chance to update the guest time. I think something as below should work. - Disable all in the PMU (disable global control) - schedule in the host counters (but not run yet since the global control of the PMU is disabled) - arch_perf_put_guest_context() - Enable all in the PMU (Enable global control. The host counters now start) void perf_put_guest_context(void) { struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context); lockdep_assert_irqs_disabled(); perf_ctx_lock(cpuctx, cpuctx->task_ctx); if (WARN_ON_ONCE(!__this_cpu_read(guest_ctx_loaded)))
Re: [PATCH 1/2] selftests/mm: skip uffd tests in madv_guard if uffd is not present.
On 15 May 2025, at 14:49, Lorenzo Stoakes wrote: > On Thu, May 15, 2025 at 02:46:41PM -0400, Zi Yan wrote: >> On 15 May 2025, at 14:41, Lorenzo Stoakes wrote: >> >>> Ah you got to this first :) thanks! >>> >>> Could you do this with a cover letter though? It's really weird to have 2/2 >>> reply to 1/2, I know sometimes people do that, but it's just odd, and it'd >>> be >>> good to have an overview, thanks! >>> >>> On Thu, May 15, 2025 at 02:23:32PM -0400, Zi Yan wrote: When userfaultfd is not compiled into kernel, userfaultfd() returns -1, causing uffd tests in madv_guard fail. Skip the tests instead. >>> >>> 'madv_guard'? I'd just say the guard_regions.uffd test to fail. >> >> Sure. Will change it. >>> Signed-off-by: Zi Yan > > Given I was being an idiot below, now the patch is fine as-is, just resend > with the nitty commit message change and cover letter as a v2 and we should > be good :) Sure. I am also waiting for Adam's feedback on patch2 and will resend later. > > Reviewed-by: Lorenzo Stoakes > --- tools/testing/selftests/mm/guard-regions.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mm/guard-regions.c b/tools/testing/selftests/mm/guard-regions.c index 0cd9d236649d..93af3d3760f9 100644 --- a/tools/testing/selftests/mm/guard-regions.c +++ b/tools/testing/selftests/mm/guard-regions.c @@ -1453,8 +1453,21 @@ TEST_F(guard_regions, uffd) /* Set up uffd. */ uffd = userfaultfd(0); - if (uffd == -1 && errno == EPERM) - ksft_exit_skip("No userfaultfd permissions, try running as root.\n"); >>> >>> Let's just make this all part of the same switch please! >> >> What do you mean? EPERM is handled in the switch-case below. > > Oh man, I'm the biggest idiot on Earth haha! > > For some reason I read these '-'s as '+'s : > > Yes please ignore the above, I therefore - like your approach - and am good > with it. > Yeah, I kinda figured when I read your message, but wanted to double check with you. >> >>> >>> And while I originally used ksft_exit_skip(), I think we can just use the >>> SKIP(return, ...) form here just fine to keep it consistent. >> >> Right. I am using SKIP below, since when I ran it, ksft_exit_skip() >> makes the whole test message inconsistent. > > Yes, your confusion is warranted, I apparently can't read... :>) and > indeed, agreed. > > Thanks for doing this! > Thank you for the review. :) >> >>> + if (uffd == -1) { + switch (errno) { + case EPERM: + SKIP(return, "No userfaultfd permissions, try running as root."); + break; + case ENOSYS: + SKIP(return, "userfaultfd is not supported/not enabled."); + break; + default: + ksft_exit_fail_msg("userfaultfd failed with %s\n", + strerror(errno)); + break; + } + } + ASSERT_NE(uffd, -1); ASSERT_EQ(ioctl(uffd, UFFDIO_API, &api), 0); -- 2.47.2 >>> >>> Thanks! >> >> >> -- >> Best Regards, >> Yan, Zi -- Best Regards, Yan, Zi
Re: [PATCH v2 4/4] media: i2c: imx214: Remove hard-coded external clock frequency
Hi Sakari, Hi Laurent, Am Freitag, dem 16.05.2025 um 00:02 +0300 schrieb Sakari Ailus: > Hi Laurent, > > On Thu, May 15, 2025 at 03:03:40PM +0200, Laurent Pinchart wrote: > > On Thu, May 15, 2025 at 02:54:54PM +0300, Sakari Ailus wrote: > > > On Thu, May 15, 2025 at 10:58:46AM +0200, Laurent Pinchart wrote: > > > > On Tue, May 06, 2025 at 08:24:03AM +, Sakari Ailus wrote: > > > > > On Mon, May 05, 2025 at 11:05:56PM +0200, André Apitzsch via > > > > > B4 Relay wrote: > > > > > > From: André Apitzsch > > > > > > > > > > > > Instead rely on the rate set on the clock (using assigned- > > > > > > clock-rates > > > > > > etc.) > > > > > > > > > > > > Signed-off-by: André Apitzsch > > > > > > --- > > > > > > drivers/media/i2c/imx214.c | 6 -- > > > > > > 1 file changed, 6 deletions(-) > > > > > > > > > > > > diff --git a/drivers/media/i2c/imx214.c > > > > > > b/drivers/media/i2c/imx214.c > > > > > > index > > > > > > 9e9be47394ec768a5b34d44b06b5bbb0988da5a1..c12996e294dccebb1 > > > > > > 8c608254f1e0d14dc064423 100644 > > > > > > --- a/drivers/media/i2c/imx214.c > > > > > > +++ b/drivers/media/i2c/imx214.c > > > > > > @@ -32,7 +32,6 @@ > > > > > > > > > > > > #define IMX214_REG_FAST_STANDBY_CTRL CCI_REG8(0x0106) > > > > > > > > > > > > -#define IMX214_DEFAULT_CLK_FREQ2400 > > > > > > #define IMX214_DEFAULT_LINK_FREQ 6 > > > > > > /* Keep wrong link frequency for backward compatibility */ > > > > > > #define IMX214_DEFAULT_LINK_FREQ_LEGACY48000 > > > > > > @@ -1405,11 +1404,6 @@ static int imx214_probe(struct > > > > > > i2c_client *client) > > > > > > return dev_err_probe(dev, PTR_ERR(imx214- > > > > > > >xclk), > > > > > > "failed to get > > > > > > xclk\n"); > > > > > > > > > > > > - ret = clk_set_rate(imx214->xclk, > > > > > > IMX214_DEFAULT_CLK_FREQ); > > > > > > - if (ret) > > > > > > - return dev_err_probe(dev, ret, > > > > > > - "failed to set xclk > > > > > > frequency\n"); > > > > > > - > > > > > > > > > > Oops. I missed this is what the driver was doing already. > > > > > Indeed, this is one of the historic sensor drivers that do > > > > > set the frequency in DT systems. > > > > > > > > > > The driver never used the clock-frequency property and > > > > > instead used a fixed frequency. Changing the behaviour now > > > > > could be problematic. > > > > > > > > > > There are options here that I think we could do: > > > > > > > > > > 1) use your v1 patch (4) which uses "clock-frequency" if it > > > > > exists and otherwise uses the default, fixed frequency or > > > > > > > > > > 2) set the frequency only if the "clock-frequency" property > > > > > exists. The DT currently requires clock-frequency and the > > > > > YAML conversion was done in 2020 whereas the driver is from > > > > > 2018. If we do this, the clock-frequency should > > > > > be deprecated (or even removed from bingings). > > > > > > > > > > I wonder what others think. Cc'd Laurent in any case. > > > > > > > > Maybe I'm missing something, but I don't really see the issue > > > > here. The clock-frequency DT property is currently ignored, and > > > > this patch doesn't change that situation, does it ? > > > > > > > > The change of behaviour here is related to the assigned-clock- > > > > rates property. If that property is specified today, it will > > > > set the clock rate, and the driver will override it to 24MHz > > > > right after. > > > > With this patch, the clock rate won't be overridden. I think > > > > the risk of regression is very low here, as I don't expect > > > > systems to set assigned-clock-rates in DT to a value different > > > > than 24MHz and expect the driver to override it. > > > > > > If the DTS had assigned-clock-rates set correctly, then yes. How > > > much can we trust the older DTS did have that? > > > > I am relatively confident that DT-based systems wouldn't have an > > assigned-clock-rates property with a frequency that doesn't match > > IMX214_DEFAULT_CLK_FREQ. The real question is whether or not I'm > > over-confident :-) > > The assigned-clock stuff wasn't always there. But nowadays I guess a > lot of things in practice depends on it. > > So I guess doing this should be fine then. Just to be clear, this patch is fine and no changes are needed? Should the bindings be updated in this series or can that be done later? Best regards, André
[PATCH net-next v7] selftests/vsock: add initial vmtest.sh for vsock
This commit introduces a new vmtest.sh runner for vsock. It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used. VMCI and hyperv support is automatically built, though not used. Only tested on x86. To run: $ make -C tools/testing/selftests TARGETS=vsock $ tools/testing/selftests/vsock/vmtest.sh or $ make -C tools/testing/selftests TARGETS=vsock run_tests Example runs (after make -C tools/testing/selftests TARGETS=vsock): $ ./tools/testing/selftests/vsock/vmtest.sh TAP version 13 1..3 ok 0 vm_server_host_client ok 1 vm_client_host_server ok 2 vm_loopback $ ./tools/testing/selftests/vsock/vmtest.sh vm_loopback TAP version 13 1..1 ok 0 vm_loopback $ mkdir -p ~/scratch $ make -C tools/testing/selftests install TARGETS=vsock INSTALL_PATH=~/scratch [... omitted ...] $ cd ~/scratch $ ./run_kselftest.sh TAP version 13 1..1 ok 1 selftests: vsock: vmtest.sh Future work can include vsock_diag_test. The tap output style copies mm's run_vmtests.sh. Because vsock requires a VM to test anything other than loopback, this patch adds vmtest.sh as a kselftest itself. This is different than other systems that have a "vmtest.sh", where it is used as a utility script to spin up a VM to run the selftests as a guest (but isn't hooked into kselftest). Testing in NIPA is still WIP. Signed-off-by: Bobby Eshleman --- Changes in v7: - fix exit code bug when ran is kselftest: use cnt_total instead of KSFT_NUM_TESTS - updated commit message with updated output - updated commit message with commands for installing/running as kselftest - Link to v6: https://lore.kernel.org/r/20250515-vsock-vmtest-v6-1-9af1cc023...@gmail.com Changes in v6: - add make cmd in commit message in vmtest.sh example (Stefano) - check nonzero size of QEMU_PIDFILE using -s conditional (Stefano) - display log file path after tests so it is easier to find amongst other random names - cleanup qemu pidfile if qemu is unable to remove it - make oops/warning failures more obvious with 'FAIL' prefix in log (simply saying 'detected' wasn't clear enough to identify failing condition) - Link to v5: https://lore.kernel.org/r/20250513-vsock-vmtest-v5-1-4e75c4a45...@gmail.com Changes in v5: - make log file a tmpfile (Paolo) - make sure both default and user defined QEMU gets handled by the dependency check (Paolo) - increased VM boot up timeout from 1m to 3m for slow hosts (Paolo) - rename vm_setup -> vm_start (Paolo) - derive wait_for_listener from selftests/net/net_helper.sh to removes ss usage - Remove unused 'unset IFS' line (Paolo) - leave space after variable declarations (Paolo) - make QEMU_PIDFILE a tmp file (Paolo) - make everything readonly that is only read (Paolo) - source ktap_helpers.sh for KSFT_PASS and friends (Paolo) - don't check for timeout util (Paolo) - add missing usage string for -q qemu arg - add tap prefix to SUMMARY line since it isn't part of TAP protocol - exit with the correct status code based on failure/pass counts - Link to v4: https://lore.kernel.org/r/20250507-vsock-vmtest-v4-1-6e2a97262...@gmail.com Changes in v4: - do not use special tab delimiter for help string parsing (Stefano + Paolo) - fix paths for when installing kselftest and running out-of-tree (Paolo) - change vng to using running kernel instead of compiled kernel (Paolo) - use multi-line string for QEMU_OPTS (Stefano) - change timeout to 300s (Paolo) - skip if tools are not found and use kselftests status codes (Paolo) - remove build from vmtest.sh (Paolo) - change -> SSH_HOST_PORT (Stefano) - add tap-format output - add vmtest.log to gitignore - check for vsock_test binary and remind user to build it if missing - create a proper build in makefile - style fixes - add ssh, timeout, and pkill to dependency check, just in case - fix numerical comparison in conditionals - check qemu pidfile exists before proceeding (avoid wasting time waiting for ssh) - fix tracking of pass/fail bug - fix stderr redirection bug - Link to v3: https://lore.kernel.org/r/20250428-vsock-vmtest-v3-1-181af6163...@gmail.com Changes in v3: - use common conditional syntax for checking variables - use return value instead of global rc - fix typo TEST_HOST_LISTENER_PORT -> TEST_HOST_PORT_LISTENER - use SIGTERM instead of SIGKILL on cleanup - use peer-cid=1 for loopback - change sleep delay times into globals - fix test_vm_loopback logging - add test selection in arguments - make QEMU an argument - check that vng binary is on path - use QEMU variable - change to - fix hardcoded file paths - add comment in commit msg about script that vmtest.sh was based off of - Add tools/testing/selftest/vsock/Makefile for kselftest - Link to v2: https://lore.kernel.org/r/20250417-vsock-vmtest-v2-1-3901a2733...@gmail.com Changes in v2: - add kernel oops and wa
Re: [PATCH v4 29/38] KVM: x86/pmu: Switch host/guest PMU context at vm-exit/vm-entry
On 5/16/2025 12:29 AM, Sean Christopherson wrote: > On Mon, Mar 24, 2025, Mingwei Zhang wrote: >> diff --git a/arch/x86/include/asm/kvm-x86-pmu-ops.h >> b/arch/x86/include/asm/kvm-x86-pmu-ops.h >> index 9159bf1a4730..35f27366c277 100644 >> --- a/arch/x86/include/asm/kvm-x86-pmu-ops.h >> +++ b/arch/x86/include/asm/kvm-x86-pmu-ops.h >> @@ -22,6 +22,8 @@ KVM_X86_PMU_OP(init) >> KVM_X86_PMU_OP_OPTIONAL(reset) >> KVM_X86_PMU_OP_OPTIONAL(deliver_pmi) >> KVM_X86_PMU_OP_OPTIONAL(cleanup) >> +KVM_X86_PMU_OP(put_guest_context) >> +KVM_X86_PMU_OP(load_guest_context) > For KVM, the "guest_context" part is largely superfluous, as KVM always > operates > on guest state, e.g. kvm_fpu_{load,put}(). > > I do think we should squeeze in "mediated" somewhere, otherwise the it's hard > to > see that these are specific to the mediated PMU. > > So probably mediated_{load,put}()? After moving all PMC's manipulation into kvm common code, these two helper actually only manipulates global shared MSRs right now, not sure if add extra "global" word to the ops name which makes the name more descriptive. But considering we may add more other MSRs like PEBS MSRs in these 2 helpers, I'm ok for mediated_{load,put}. > >> #undef KVM_X86_PMU_OP >> #undef KVM_X86_PMU_OP_OPTIONAL >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h >> index 7ee740aa..4117a382739a 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -568,6 +568,10 @@ struct kvm_pmu { >> u64 raw_event_mask; >> struct kvm_pmc gp_counters[KVM_MAX_NR_GP_COUNTERS]; >> struct kvm_pmc fixed_counters[KVM_MAX_NR_FIXED_COUNTERS]; >> +u32 gp_eventsel_base; >> +u32 gp_counter_base; >> +u32 fixed_base; >> +u32 cntr_shift; > Gah, my bad, "shift" was a terrible suggestion. It should be "stride". Yes. > >> @@ -306,6 +313,10 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu); >> int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp); >> void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel); >> bool vcpu_pmu_can_enable(struct kvm_vcpu *vcpu); >> +void kvm_pmu_put_guest_pmcs(struct kvm_vcpu *vcpu); >> +void kvm_pmu_load_guest_pmcs(struct kvm_vcpu *vcpu); >> +void kvm_pmu_put_guest_context(struct kvm_vcpu *vcpu); >> +void kvm_pmu_load_guest_context(struct kvm_vcpu *vcpu); >> >> bool is_vmware_backdoor_pmc(u32 pmc_idx); >> bool kvm_rdpmc_in_guest(struct kvm_vcpu *vcpu); >> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c >> index 1a7e3a897fdf..7e0d84d50b74 100644 >> --- a/arch/x86/kvm/svm/pmu.c >> +++ b/arch/x86/kvm/svm/pmu.c >> @@ -175,6 +175,22 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, >> struct msr_data *msr_info) >> return 1; >> } >> >> +static inline void amd_update_msr_base(struct kvm_vcpu *vcpu) >> +{ >> +struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); >> + >> +if (kvm_pmu_has_perf_global_ctrl(pmu) || >> +guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE)) { >> +pmu->gp_eventsel_base = MSR_F15H_PERF_CTL0; >> +pmu->gp_counter_base = MSR_F15H_PERF_CTR0; >> +pmu->cntr_shift = 2; >> +} else { >> +pmu->gp_eventsel_base = MSR_K7_EVNTSEL0; >> +pmu->gp_counter_base = MSR_K7_PERFCTR0; >> +pmu->cntr_shift = 1; >> +} >> +} > Moving quoted text around to organize responses... > >> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c >> index 796b7bc4affe..ed17ab198dfb 100644 >> --- a/arch/x86/kvm/vmx/pmu_intel.c >> +++ b/arch/x86/kvm/vmx/pmu_intel.c >> @@ -460,6 +460,17 @@ static void intel_pmu_enable_fixed_counter_bits(struct >> kvm_pmu *pmu, u64 bits) >> pmu->fixed_ctr_ctrl_rsvd &= ~intel_fixed_bits_by_idx(i, bits); >> } >> >> +static inline void intel_update_msr_base(struct kvm_vcpu *vcpu) >> +{ >> +struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); >> + >> +pmu->gp_eventsel_base = MSR_P6_EVNTSEL0; >> +pmu->gp_counter_base = fw_writes_is_enabled(vcpu) ? >> + MSR_IA32_PMC0 : MSR_IA32_PERFCTR0; > This is wrong. And I unintentionally proved that it's wrong, by goofing when > I > fixed up this code and using MSR_IA32_PERFCTR0 instead of MSR_IA32_PMC0. > > Whether or not the guest supports full-width writes is irrelevant, because > support > for FW writes doesn't change the width of the counters. Just because the > *guest* > can't directly write all e.g. 48 bits doesn't mean clobbering bits 47:32 is > ok. > > Similarly, on the AMD side, using the legacy interface in KVM is unnecessary. > The guest may be limited to those MSRs, but KVM has a hard dependency on PMU > v2, > so just unconditionally use MSR_F15H_PERF_CTR0 (and for the record, because I > had to look it up, the newfangled MSRs on AMD are aliased to the legacy MSRs > for > 0..3). > > Very happily, that means the MSRs don't need to be per-PMU, and they don't > even > need to be configured at runtime