Re: [PATCH/RFC] kunit/rtc: Add real support for very slow tests
Hi Geert,

Thanks for sending this out: I think this raises some good questions about exactly how to handle long-running tests (particularly on older/slower hardware). I've put a few notes below, but, tl;dr: I think these are all good changes, even if there's more we can do to better scale to slower hardware.

On Fri, 28 Mar 2025 at 00:07, Geert Uytterhoeven wrote:
>
> When running rtc_lib_test ("lib_test" before my "[PATCH] rtc: Rename
> lib_test to rtc_lib_test") on m68k/ARAnyM:
>
>     KTAP version 1
>     1..1
>         KTAP version 1
>         # Subtest: rtc_lib_test_cases
>         # module: rtc_lib_test
>         1..2
>         # rtc_time64_to_tm_test_date_range_1000: Test should be marked slow (runtime: 3.222371420s)
>         ok 1 rtc_time64_to_tm_test_date_range_1000
>         # rtc_time64_to_tm_test_date_range_16: try timed out
>         # rtc_time64_to_tm_test_date_range_16: test case timed out
>         # rtc_time64_to_tm_test_date_range_16.speed: slow
>         not ok 2 rtc_time64_to_tm_test_date_range_16
>     # rtc_lib_test_cases: pass:1 fail:1 skip:0 total:2
>     # Totals: pass:1 fail:1 skip:0 total:2
>     not ok 1 rtc_lib_test_cases
>
> Commit 02c2d0c2a84172c3 ("kunit: Add speed attribute") added the notion
> of "very slow" tests, but this is further unused and unhandled.
>
> Hence:
>   1. Introduce KUNIT_CASE_VERY_SLOW(),

Thanks -- I think we want this regardless.

>   2. Increase timeout by ten; ideally this should only be done for very
>      slow tests, but I couldn't find how to access kunit_case.attr.case
>      from kunit_try_catch_run(),

My feeling for tests generally is:
- Normal: effectively instant on modern hardware, O(seconds) on ancient hardware.
- Slow: takes O(seconds) to run on modern hardware, O(minutes)..O(10s of minutes) on ancient hardware.
- Very slow: O(minutes) or higher on modern hardware, infeasible on ancient hardware.
Obviously the definition of "modern" and "ancient" hardware here is pretty arbitrary: I'm using "modern, high-end x86" ~4GHz as my "modern" example, and "66MHz 486" as my "ancient" one, but things like emulation or embedded systems fit in-between.

Ultimately, I think the timeout probably needs to be configurable on a per-machine basis more than a per-test one, but having a 10x multiplier (or even a 100x multiplier) for very slow tests would also work for me.

I quickly tried hacking together something to pass through the attribute and implement this. Diff (probably mangled by gmail) below:
---
diff --git a/include/kunit/try-catch.h b/include/kunit/try-catch.h
index 7c966a1adbd3..24a29622068b 100644
--- a/include/kunit/try-catch.h
+++ b/include/kunit/try-catch.h
@@ -50,6 +50,13 @@ struct kunit_try_catch {
 	void *context;
 };
 
+struct kunit_try_catch_context {
+	struct kunit *test;
+	struct kunit_suite *suite;
+	struct kunit_case *test_case;
+};
+
+
 void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context);
 
 void __noreturn kunit_try_catch_throw(struct kunit_try_catch *try_catch);
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 146d1b48a096..79d12c0c2d25 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -420,12 +420,6 @@ static void kunit_run_case_cleanup(struct kunit *test,
 	kunit_case_internal_cleanup(test);
 }
 
-struct kunit_try_catch_context {
-	struct kunit *test;
-	struct kunit_suite *suite;
-	struct kunit_case *test_case;
-};
-
 static void kunit_try_run_case(void *data)
 {
 	struct kunit_try_catch_context *ctx = data;
diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index 92099c67bb21..5f62e393d422 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -34,30 +34,15 @@ static int kunit_generic_run_threadfn_adapter(void *data)
 	return 0;
 }
 
-static unsigned long kunit_test_timeout(void)
+static unsigned long kunit_test_timeout(struct kunit_try_catch *try_catch)
 {
-	/*
-	 * TODO(brendanhigg...@google.com): We should probably have some type of
-	 * variable timeout here. The only question is what that timeout value
-	 * should be.
-	 *
-	 * The intention has always been, at some point, to be able to label
-	 * tests with some type of size bucket (unit/small, integration/medium,
-	 * large/system/end-to-end, etc), where each size bucket would get a
-	 * default timeout value kind of like what Bazel does:
-	 * https://docs.bazel.build/versions/master/be/common-definitions.html#test.size
-	 * There is still some debate to be had on exactly how we do this. (For
-	 * one, we probably want to have some sort of test runner level
-	 * timeout.)
-	 *
-	 * For more background on this topic, see:
-	 * https://mike-bland.com/2011/11/01/small-medium-large.html
-	 *
-	 * If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
-	 * the task will be killed and an oops generated.
-	 */
-	// FIXME
Re: [PATCH 1/4] x86/sgx: Add total number of EPC pages
On Fri, Mar 28, 2025 at 08:07:24AM +, Reshetova, Elena wrote:
> > Yes but obviously I cannot promise that I'll accept this as it is
> > until I see the final version
>
> Are you saying you prefer *this version with spinlock* vs.
> simpler version that utilizes the fact that sgx_nr_free_pages is changed
> into tracking of number of used pages?

I don't really know what I prefer.

Maybe a +1 version would make sense where you keep the approach you've
chosen (used pages) and better rationalize why it is mandatory, and why
free pages would be worse?

> > Also you probably should use mutex given the loop where we cannot
> > temporarily exit the lock (like e.g. in keyrings gc we can).
>
> Not sure I understand this, could you please elaborate why do I need an
> additional mutex here? Or are you suggesting switching spinlock to mutex?

In your code example you had a loop inside the spinlock, which was based
on the return code of an opcode, i.e. a potentially infinite loop.

I'd like to remind you that the hardware I have is a NUC7 from 2018, so
you really have to nail how things will work semantically, as I can only
think about these things at a theoretical level ;-) [1]

> Best Regards,
> Elena.

[1] https://social.kernel.org/notice/AsUbsYH0T4bTcUSdUW

BR, Jarkko
Re: [PATCH 4/4] x86/sgx: Implement ENCLS[EUPDATESVN] and opportunistically call it during first EPC page alloc
On Fri, Mar 28, 2025 at 08:27:51AM +, Reshetova, Elena wrote:
> > On Thu, Mar 27, 2025 at 03:42:30PM +, Reshetova, Elena wrote:
> > > > > > > +	case SGX_NO_UPDATE:
> > > > > > > +		pr_debug("EUPDATESVN was successful, but CPUSVN was not updated, "
> > > > > > > +			 "because current SVN was not newer than CPUSVN.\n");
> > > > > > > +		break;
> > > > > > > +	case SGX_EPC_NOT_READY:
> > > > > > > +		pr_debug("EPC is not ready for SVN update.");
> > > > > > > +		break;
> > > > > > > +	case SGX_INSUFFICIENT_ENTROPY:
> > > > > > > +		pr_debug("CPUSVN update is failed due to Insufficient entropy in RNG, "
> > > > > > > +			 "please try it later.\n");
> > > > > > > +		break;
> > > > > > > +	case SGX_EPC_PAGE_CONFLICT:
> > > > > > > +		pr_debug("CPUSVN update is failed due to concurrency violation, please "
> > > > > > > +			 "stop running any other ENCLS leaf and try it later.\n");
> > > > > > > +		break;
> > > > > > > +	default:
> > > > > > > +		break;
> > > > > >
> > > > > > Remove pr_debug() statements.
> > > > >
> > > > > This I am not sure is a good idea. I think it would be useful for
> > > > > system admins to have a way to see whether the update happened or not.
> > > > > It is true that you can find this out by requesting a new SGX
> > > > > attestation quote (and see if a newer SVN is used), but it is not the
> > > > > fastest way.
> > > >
> > > > Maybe pr_debug() is then the wrong level if they are meant for sysadmins?
> > > >
> > > > I mean these should not happen in normal behavior like ever? As
> > > > pr_debug() I don't really grab this.
> > >
> > > SGX_NO_UPDATE will absolutely happen normally all the time.
> > > Since EUPDATESVN is executed every time EPC is empty, this is the
> > > most common code you will get back (because microcode updates are rare).
> > > Others yes, that would indicate some error condition.
> > > So, what is the pr_level that you would suggest?
> >
> > Right, got it.
> > That changes my conclusions:
> >
> > So I'd reformulate it like:
> >
> > 	switch (ret) {
> > 	case 0:
> > 		pr_info("EUPDATESVN: success\n");
> > 		break;
> > 	case SGX_EPC_NOT_READY:
> > 	case SGX_INSUFFICIENT_ENTROPY:
> > 	case SGX_EPC_PAGE_CONFLICT:
> > 		pr_err("EUPDATESVN: error %d\n", ret);
> > 		/* TODO: block/teardown driver? */
> > 		break;
> > 	case SGX_NO_UPDATE:
> > 		break;
> > 	default:
> > 		pr_err("EUPDATESVN: unknown error %d\n", ret);
> > 		/* TODO: block/teardown driver? */
> > 		break;
> > 	}
> >
> > Since when this is executed EPC usage is zero, error cases should block
> > or tear down the SGX driver, presuming that they are caused by either
> > incorrect driver state or a spurious error code.
>
> I agree with the above, but not sure at all about blocking/tearing down the
> driver. They are all potentially temporary things, and SGX_INSUFFICIENT_ENTROPY
> is even outside of SGX driver control and *does not* indicate any error
> condition on the driver side itself. SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT
> would mean we have a bug somewhere, because we thought we could go
> do EUPDATESVN on an empty EPC and prevented anyone from creating
> pages in the meanwhile, but it looks like we missed something. That said, I
> don't know if we want to fail the whole system in case we have such a code
> bug; this is very aggressive (in case it is some rare edge condition that no
> one knew about or guessed). So, I would propose to print the pr_err() as you
> have above but avoid destroying the driver.
> Would this work?

I think now is the time that you should roll out a new version in the way
you see fit and we will revisit that. I already gathered from your example
that I got some of the error codes horribly wrong :-) Still, I think the
draft of error handling I put together is at least in the right direction.

> > Best Regards,
> > Elena.

> > If this happens, we definitely do not want service, right?
> >
> > I'm not sure about all the error codes, how serious they are, or
> > whether they are all a consequence of an incorrectly working driver.
> >
> > BR, Jarkko

BR, Jarkko
RE: [PATCH 1/4] x86/sgx: Add total number of EPC pages
> On Fri, Mar 28, 2025 at 08:07:24AM +, Reshetova, Elena wrote:
> > > Yes but obviously I cannot promise that I'll accept this as it is
> > > until I see the final version
> >
> > Are you saying you prefer *this version with spinlock* vs.
> > simpler version that utilizes the fact that sgx_nr_free_pages is changed
> > into tracking of number of used pages?
>
> I don't know really what I do prefer.
>
> Maybe +1 version would make sense where you keep with the approach
> you've chosen (used pages) and better rationalize why it is mandatory,
> and why free pages would be worse?

Sure, let me send out v2 with the old approach, all suggestions and fixes
taken into account, and better reasoning.

> > > Also you probably should use mutex given the loop where we cannot
> > > temporarily exit the lock (like e.g. in keyrings gc we can).
> >
> > Not sure I understand this, could you please elaborate why do I need an
> > additional mutex here? Or are you suggesting switching spinlock to mutex?
>
> In your code example you had a loop inside spinlock, which was based on
> a return code of an opcode, i.e. potentially infinite loop.

Oh, this is a misunderstanding due to the limited snippet posted. The
loop was also bounded by a "retry" condition in the while, with the max
number of retries being 10. It only exits early if there is success.

> I'd like to remind you that the hardware I have is NUC7 from 2018 so
> you really have to nail how things will work semantically as I can
> only think these things only in theoretical level ;-) [1]

Sure, I understand.

Best Regards,
Elena.

> [1] https://social.kernel.org/notice/AsUbsYH0T4bTcUSdUW
>
> BR, Jarkko
Re: [PATCH v3 2/3] openrisc: Introduce new utility functions to flush and invalidate caches
Hi,

Thank you for the review.

On 3/26/25 10:41 PM, Stafford Horne wrote:
> On Mon, Mar 24, 2025 at 01:25:43AM +0530, Sahil Siddiq wrote:
>> According to the OpenRISC architecture manual, the dcache and icache
>> may not be present. When these caches are present, the invalidate and
>> flush registers may be absent. The current implementation does not
>> perform checks to verify their presence before utilizing cache
>> registers, or invalidating and flushing cache blocks.
>>
>> Introduce new functions to detect the presence of cache components and
>> related special-purpose registers.
>>
>> There are a few places where a range of addresses have to be flushed
>> or invalidated and the implementation is duplicated. Introduce new
>> utility functions and macros that generalize this implementation and
>> reduce duplication.
>>
>> Signed-off-by: Sahil Siddiq
>> ---
>> Changes from v2 -> v3:
>> - arch/openrisc/include/asm/cacheflush.h: Declare new functions and macros.
>> - arch/openrisc/include/asm/cpuinfo.h: Implement new functions.
>>   (cpu_cache_is_present):
>>   1. The implementation of this function was strewn all over the place
>>      in the previous versions.
>>   2. Fix condition. The condition in the previous version was incorrect.
>>   (cb_inv_flush_is_implemented): New function.
>> - arch/openrisc/kernel/dma.c: Use new functions.
>> - arch/openrisc/mm/cache.c:
>>   (cache_loop): Extend function.
>>   (local_*_page_*): Use new cache_loop interface.
>>   (local_*_range_*): Implement new functions.
>> - arch/openrisc/mm/init.c: Use new functions.
arch/openrisc/include/asm/cacheflush.h | 17 + arch/openrisc/include/asm/cpuinfo.h| 42 + arch/openrisc/kernel/dma.c | 18 ++--- arch/openrisc/mm/cache.c | 52 ++ arch/openrisc/mm/init.c| 5 ++- 5 files changed, 110 insertions(+), 24 deletions(-) diff --git a/arch/openrisc/include/asm/cacheflush.h b/arch/openrisc/include/asm/cacheflush.h index 984c331ff5f4..0e60af486ec1 100644 --- a/arch/openrisc/include/asm/cacheflush.h +++ b/arch/openrisc/include/asm/cacheflush.h @@ -23,6 +23,9 @@ */ extern void local_dcache_page_flush(struct page *page); extern void local_icache_page_inv(struct page *page); +extern void local_dcache_range_flush(unsigned long start, unsigned long end); +extern void local_dcache_range_inv(unsigned long start, unsigned long end); +extern void local_icache_range_inv(unsigned long start, unsigned long end); /* * Data cache flushing always happen on the local cpu. Instruction cache @@ -38,6 +41,20 @@ extern void local_icache_page_inv(struct page *page); extern void smp_icache_page_inv(struct page *page); #endif /* CONFIG_SMP */ +/* + * Even if the actual block size is larger than L1_CACHE_BYTES, paddr + * can be incremented by L1_CACHE_BYTES. When paddr is written to the + * invalidate register, the entire cache line encompassing this address + * is invalidated. Each subsequent reference to the same cache line will + * not affect the invalidation process. + */ +#define local_dcache_block_flush(addr) \ + local_dcache_range_flush(addr, addr + L1_CACHE_BYTES) +#define local_dcache_block_inv(addr) \ + local_dcache_range_inv(addr, addr + L1_CACHE_BYTES) +#define local_icache_block_inv(addr) \ + local_icache_range_inv(addr, addr + L1_CACHE_BYTES) + /* * Synchronizes caches. Whenever a cpu writes executable code to memory, this * should be called to make sure the processor sees the newly written code. 
diff --git a/arch/openrisc/include/asm/cpuinfo.h b/arch/openrisc/include/asm/cpuinfo.h index 82f5d4c06314..7839c41152af 100644 --- a/arch/openrisc/include/asm/cpuinfo.h +++ b/arch/openrisc/include/asm/cpuinfo.h @@ -15,6 +15,9 @@ #ifndef __ASM_OPENRISC_CPUINFO_H #define __ASM_OPENRISC_CPUINFO_H +#include +#include + struct cache_desc { u32 size; u32 sets; @@ -34,4 +37,43 @@ struct cpuinfo_or1k { extern struct cpuinfo_or1k cpuinfo_or1k[NR_CPUS]; extern void setup_cpuinfo(void); +/* + * Check if the cache component exists. + */ +static inline bool cpu_cache_is_present(const unsigned int cache_type) +{ + unsigned long upr = mfspr(SPR_UPR); + + return !!(upr & (SPR_UPR_UP | cache_type)); +} + +/* + * Check if the cache block flush/invalidate register is implemented for the + * cache component. + */ +static inline bool cb_inv_flush_is_implemented(const unsigned int reg, + const unsigned int cache_type) +{ + unsigned long cfgr; + + if (cache_type == SPR_UPR_DCP) { + cfgr = mfspr(SPR_DCCFGR); + if (reg == SPR_DCBFR) + return !!(cfgr & SPR_DCCFGR_CBFRI); + + if (reg == SPR_DCBIR) + return !!(cfgr & SPR_DCCFGR_CBIRI); + } + + /* +* The cache block flush register is not implemented for the instruction cache. +*/ + if (cache_type == SPR_UPR_ICP) { +
Re: [PATCH] virtio: console: Make resizing compliant with virtio spec
On Thu, 20 Mar 2025 15:09:57 +0100 Halil Pasic wrote:
> > I already implemented it in my patch v2 (just waiting for Amit to
> > confirm the new commit message). But if you want to split it you can
> > create a separate patch for it as well (I don't really mind either
> > way).
>
> Your v2 has not been posted yet, or? I can't find it in my Inbox.

I understand that you have confirmed that the byte order handling is
needed but missing, right?

> It is conceptually a different bug and warrants a patch and a commit
> message of its own. At least IMHO.
[PATCH v4 3/3] openrisc: Add cacheinfo support
Add cacheinfo support for OpenRISC.

Currently, a few CPU cache attributes pertaining to OpenRISC processors
are exposed along with other unrelated CPU attributes in the procfs file
system (/proc/cpuinfo). However, a few cache attributes remain unexposed.

Provide a mechanism that the generic cacheinfo infrastructure can employ
to expose these attributes via the sysfs file system. These attributes
can then be exposed in /sys/devices/system/cpu/cpuX/cache/indexN.

Move the implementation to pull cache attributes from the processor's
registers from arch/openrisc/kernel/setup.c with a few modifications.

This implementation is based on similar work done for MIPS and LoongArch.

Link: https://raw.githubusercontent.com/openrisc/doc/master/openrisc-arch-1.4-rev0.pdf
Signed-off-by: Sahil Siddiq
---
Changes from v3 -> v4:
- arch/openrisc/kernel/cacheinfo.c: Fix build warning detected by kernel
  test robot.

Changes from v2 -> v3:
- arch/openrisc/kernel/cacheinfo.c:
  1. Use new functions introduced in patch #2.
  2. Address review comments regarding coding style.
- arch/openrisc/kernel/setup.c:
  (print_cpuinfo): Don't remove detection of UPR register.
arch/openrisc/kernel/Makefile| 2 +- arch/openrisc/kernel/cacheinfo.c | 104 +++ arch/openrisc/kernel/setup.c | 44 + 3 files changed, 108 insertions(+), 42 deletions(-) create mode 100644 arch/openrisc/kernel/cacheinfo.c diff --git a/arch/openrisc/kernel/Makefile b/arch/openrisc/kernel/Makefile index 79129161f3e0..e4c7d9bdd598 100644 --- a/arch/openrisc/kernel/Makefile +++ b/arch/openrisc/kernel/Makefile @@ -7,7 +7,7 @@ extra-y := vmlinux.lds obj-y := head.o setup.o or32_ksyms.o process.o dma.o \ traps.o time.o irq.o entry.o ptrace.o signal.o \ - sys_call_table.o unwinder.o + sys_call_table.o unwinder.o cacheinfo.o obj-$(CONFIG_SMP) += smp.o sync-timer.o obj-$(CONFIG_STACKTRACE) += stacktrace.o diff --git a/arch/openrisc/kernel/cacheinfo.c b/arch/openrisc/kernel/cacheinfo.c new file mode 100644 index ..61230545e4ff --- /dev/null +++ b/arch/openrisc/kernel/cacheinfo.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * OpenRISC cacheinfo support + * + * Based on work done for MIPS and LoongArch. All original copyrights + * apply as per the original source declaration. 
+ * + * OpenRISC implementation: + * Copyright (C) 2025 Sahil Siddiq + */ + +#include +#include +#include +#include + +static inline void ci_leaf_init(struct cacheinfo *this_leaf, enum cache_type type, + unsigned int level, struct cache_desc *cache, int cpu) +{ + this_leaf->type = type; + this_leaf->level = level; + this_leaf->coherency_line_size = cache->block_size; + this_leaf->number_of_sets = cache->sets; + this_leaf->ways_of_associativity = cache->ways; + this_leaf->size = cache->size; + cpumask_set_cpu(cpu, &this_leaf->shared_cpu_map); +} + +int init_cache_level(unsigned int cpu) +{ + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()]; + struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); + int leaves = 0, levels = 0; + unsigned long upr = mfspr(SPR_UPR); + unsigned long iccfgr, dccfgr; + + if (!(upr & SPR_UPR_UP)) { + printk(KERN_INFO + "-- no UPR register... unable to detect configuration\n"); + return -ENOENT; + } + + if (cpu_cache_is_present(SPR_UPR_DCP)) { + dccfgr = mfspr(SPR_DCCFGR); + cpuinfo->dcache.ways = 1 << (dccfgr & SPR_DCCFGR_NCW); + cpuinfo->dcache.sets = 1 << ((dccfgr & SPR_DCCFGR_NCS) >> 3); + cpuinfo->dcache.block_size = 16 << ((dccfgr & SPR_DCCFGR_CBS) >> 7); + cpuinfo->dcache.size = + cpuinfo->dcache.sets * cpuinfo->dcache.ways * cpuinfo->dcache.block_size; + leaves += 1; + printk(KERN_INFO + "-- dcache: %d bytes total, %d bytes/line, %d set(s), %d way(s)\n", + cpuinfo->dcache.size, cpuinfo->dcache.block_size, + cpuinfo->dcache.sets, cpuinfo->dcache.ways); + } else + printk(KERN_INFO "-- dcache disabled\n"); + + if (cpu_cache_is_present(SPR_UPR_ICP)) { + iccfgr = mfspr(SPR_ICCFGR); + cpuinfo->icache.ways = 1 << (iccfgr & SPR_ICCFGR_NCW); + cpuinfo->icache.sets = 1 << ((iccfgr & SPR_ICCFGR_NCS) >> 3); + cpuinfo->icache.block_size = 16 << ((iccfgr & SPR_ICCFGR_CBS) >> 7); + cpuinfo->icache.size = + cpuinfo->icache.sets * cpuinfo->icache.ways * cpuinfo->icache.block_size; + leaves += 1; + printk(KERN_INFO + "-- icache: 
%d bytes total, %d bytes/line, %d set(s), %d way(s)\n", + cp
Re: [PATCH v2 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc
On Fri, Mar 28, 2025 at 02:57:41PM +0200, Elena Reshetova wrote:
> SGX architecture introduced a new instruction called EUPDATESVN
> to Ice Lake. It allows updating security SVN version, given that EPC
> is completely empty. The latter is required for security reasons
> in order to reason that enclave security posture is as secure as the
> security SVN version of the TCB that created it.
>
> Additionally it is important to ensure that while ENCLS[EUPDATESVN]
> runs, no concurrent page creation happens in EPC, because it might
> result in #GP delivered to the creator. Legacy SW might not be prepared
> to handle such unexpected #GPs and therefore this patch introduces
> a locking mechanism to ensure no concurrent EPC allocations can happen.
>
> It is also ensured that ENCLS[EUPDATESVN] is not called when running
> in a VM since it does not have a meaning in this context (microcode
> updates application is limited to the host OS) and will create
> unnecessary load.
>
> This patch is based on previous submission by Cathy Zhang
> https://lore.kernel.org/all/20220520103904.1216-1-cathy.zh...@intel.com/
>
> Signed-off-by: Elena Reshetova
> ---
>  arch/x86/include/asm/sgx.h      | 41 +
>  arch/x86/kernel/cpu/sgx/encls.h |  6 
>  arch/x86/kernel/cpu/sgx/main.c  | 63 -
>  arch/x86/kernel/cpu/sgx/sgx.h   |  1 +
>  4 files changed, 95 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 6a0069761508..5caf5c31ebc6 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -26,23 +26,26 @@
>  #define SGX_CPUID_EPC_SECTION	0x1
>  /* The bitmask for the EPC section type.
*/ > #define SGX_CPUID_EPC_MASK GENMASK(3, 0) > +/* EUPDATESVN presence indication */ > +#define SGX_CPUID_EUPDATESVN BIT(10) > > enum sgx_encls_function { > - ECREATE = 0x00, > - EADD= 0x01, > - EINIT = 0x02, > - EREMOVE = 0x03, > - EDGBRD = 0x04, > - EDGBWR = 0x05, > - EEXTEND = 0x06, > - ELDU= 0x08, > - EBLOCK = 0x09, > - EPA = 0x0A, > - EWB = 0x0B, > - ETRACK = 0x0C, > - EAUG= 0x0D, > - EMODPR = 0x0E, > - EMODT = 0x0F, > + ECREATE = 0x00, > + EADD= 0x01, > + EINIT = 0x02, > + EREMOVE = 0x03, > + EDGBRD = 0x04, > + EDGBWR = 0x05, > + EEXTEND = 0x06, > + ELDU= 0x08, > + EBLOCK = 0x09, > + EPA = 0x0A, > + EWB = 0x0B, > + ETRACK = 0x0C, > + EAUG= 0x0D, > + EMODPR = 0x0E, > + EMODT = 0x0F, > + EUPDATESVN = 0x18, > }; > > /** > @@ -73,6 +76,11 @@ enum sgx_encls_function { > * public key does not match IA32_SGXLEPUBKEYHASH. > * %SGX_PAGE_NOT_MODIFIABLE: The EPC page cannot be modified because it > * is in the PENDING or MODIFIED state. > + * %SGX_INSUFFICIENT_ENTROPY:Insufficient entropy in RNG. > + * %SGX_EPC_NOT_READY: EPC is not ready for SVN update. > + * %SGX_NO_UPDATE: EUPDATESVN was successful, but CPUSVN was not > + * updated because current SVN was not newer than > + * CPUSVN. > * %SGX_UNMASKED_EVENT: An unmasked event, e.g. INTR, was > received > */ > enum sgx_return_code { > @@ -81,6 +89,9 @@ enum sgx_return_code { > SGX_CHILD_PRESENT = 13, > SGX_INVALID_EINITTOKEN = 16, > SGX_PAGE_NOT_MODIFIABLE = 20, > + SGX_INSUFFICIENT_ENTROPY= 29, > + SGX_EPC_NOT_READY = 30, > + SGX_NO_UPDATE = 31, > SGX_UNMASKED_EVENT = 128, > }; > > diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h > index 99004b02e2ed..3d83c76dc91f 100644 > --- a/arch/x86/kernel/cpu/sgx/encls.h > +++ b/arch/x86/kernel/cpu/sgx/encls.h > @@ -233,4 +233,10 @@ static inline int __eaug(struct sgx_pageinfo *pginfo, > void *addr) > return __encls_2(EAUG, pginfo, addr); > } > > +/* Update CPUSVN at runtime. 
*/ > +static inline int __eupdatesvn(void) > +{ > + return __encls_ret_1(EUPDATESVN, ""); > +} > + > #endif /* _X86_ENCLS_H */ > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index b61d3bad0446..24563110811d 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -32,6 +32,11 @@ static DEFINE_XARRAY(sgx_epc_address_space); > static LIST_HEAD(sgx_active_page_list); > static DEFINE_SPINLOCK(sgx_reclaimer_lock); > > +/* This lock is held to prevent new EPC pages from being created > + * during the execution of ENCLS[EUPDATESVN]. > + */ > +static DEFINE_SPINLOCK(sgx_epc_eupdatesvn_lock); > + > static atomic_long_t sgx_nr_used_pages = ATOMIC_LONG_IN
Re: [PATCH] selftests/run_kselftest.sh: Use readlink if realpath is not available
On 3/18/25 10:05, Yosry Ahmed wrote:
> 'realpath' is not always available, fallback to 'readlink -f' if it is
> not available. They seem to work equally well in this context.

Can you add more specifics on "'realpath' is not always available"?
No issues with the patch itself. I would like to know the cases where
the "realpath" command is missing.

> Signed-off-by: Yosry Ahmed
> ---
>  tools/testing/selftests/run_kselftest.sh | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
> index 50e03eefe7ac7..0443beacf3621 100755
> --- a/tools/testing/selftests/run_kselftest.sh
> +++ b/tools/testing/selftests/run_kselftest.sh
> @@ -3,7 +3,14 @@
>  #
>  # Run installed kselftest tests.
>  #
> -BASE_DIR=$(realpath $(dirname $0))
> +
> +# Fallback to readlink if realpath is not available
> +if which realpath > /dev/null; then
> +    BASE_DIR=$(realpath $(dirname $0))
> +else
> +    BASE_DIR=$(readlink -f $(dirname $0))
> +fi
> +
>  cd $BASE_DIR
>
>  TESTS="$BASE_DIR"/kselftest-list.txt
>  if [ ! -r "$TESTS" ] ; then

thanks,
-- Shuah
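As a side note on the fallback pattern itself: POSIX specifies `command -v` for checking tool availability, and it avoids depending on `which` (which is itself an external tool that can be absent). A hedged sketch of the same fallback using it; this is only an illustration of the pattern, not a proposed replacement for the patch above.

```shell
#!/bin/sh
# Resolve the directory containing this script, preferring realpath but
# falling back to readlink -f when realpath is unavailable.
if command -v realpath > /dev/null 2>&1; then
    BASE_DIR=$(realpath "$(dirname "$0")")
else
    BASE_DIR=$(readlink -f "$(dirname "$0")")
fi
echo "$BASE_DIR"
```

Quoting the `$(dirname "$0")` substitution also keeps the script working when it is invoked from a path containing spaces.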
[PATCH v4 1/3] openrisc: Refactor struct cpuinfo_or1k to reduce duplication
The "cpuinfo_or1k" structure currently has identical data members for
different cache components. Move these fields out of struct cpuinfo_or1k
and into their own struct. This reduces duplication while keeping
cpuinfo_or1k extensible so more cache descriptors can be added in the
future.

Also add a new field "sets" to the new structure.

Signed-off-by: Sahil Siddiq
---
No changes from v3 -> v4.

Changes from v1/v2 -> v3:
- arch/openrisc/kernel/setup.c:
  (print_cpuinfo):
  1. Cascade changes made to struct cpuinfo_or1k.
  2. These lines are ultimately shifted to the new file created in patch #3.
  (setup_cpuinfo): Likewise.
  (show_cpuinfo): Likewise.

 arch/openrisc/include/asm/cpuinfo.h | 16 +-
 arch/openrisc/kernel/setup.c        | 45 ++---
 2 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/arch/openrisc/include/asm/cpuinfo.h b/arch/openrisc/include/asm/cpuinfo.h
index 5e4744153d0e..82f5d4c06314 100644
--- a/arch/openrisc/include/asm/cpuinfo.h
+++ b/arch/openrisc/include/asm/cpuinfo.h
@@ -15,16 +15,18 @@
 #ifndef __ASM_OPENRISC_CPUINFO_H
 #define __ASM_OPENRISC_CPUINFO_H
 
+struct cache_desc {
+	u32 size;
+	u32 sets;
+	u32 block_size;
+	u32 ways;
+};
+
 struct cpuinfo_or1k {
 	u32 clock_frequency;
 
-	u32 icache_size;
-	u32 icache_block_size;
-	u32 icache_ways;
-
-	u32 dcache_size;
-	u32 dcache_block_size;
-	u32 dcache_ways;
+	struct cache_desc icache;
+	struct cache_desc dcache;
 
 	u16 coreid;
 };
diff --git a/arch/openrisc/kernel/setup.c b/arch/openrisc/kernel/setup.c
index be56eaafc8b9..66207cd7bb9e 100644
--- a/arch/openrisc/kernel/setup.c
+++ b/arch/openrisc/kernel/setup.c
@@ -115,16 +115,16 @@ static void print_cpuinfo(void)
 
 	if (upr & SPR_UPR_DCP)
 		printk(KERN_INFO
-		       "-- dcache: %4d bytes total, %2d bytes/line, %d way(s)\n",
-		       cpuinfo->dcache_size, cpuinfo->dcache_block_size,
-		       cpuinfo->dcache_ways);
+		       "-- dcache: %4d bytes total, %2d bytes/line, %d set(s), %d way(s)\n",
+		       cpuinfo->dcache.size, cpuinfo->dcache.block_size,
+		       cpuinfo->dcache.sets, cpuinfo->dcache.ways);
 	else
printk(KERN_INFO "-- dcache disabled\n"); if (upr & SPR_UPR_ICP) printk(KERN_INFO - "-- icache: %4d bytes total, %2d bytes/line, %d way(s)\n", - cpuinfo->icache_size, cpuinfo->icache_block_size, - cpuinfo->icache_ways); + "-- icache: %4d bytes total, %2d bytes/line, %d set(s), %d way(s)\n", + cpuinfo->icache.size, cpuinfo->icache.block_size, + cpuinfo->icache.sets, cpuinfo->icache.ways); else printk(KERN_INFO "-- icache disabled\n"); @@ -156,7 +156,6 @@ void __init setup_cpuinfo(void) { struct device_node *cpu; unsigned long iccfgr, dccfgr; - unsigned long cache_set_size; int cpu_id = smp_processor_id(); struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[cpu_id]; @@ -165,18 +164,18 @@ void __init setup_cpuinfo(void) panic("Couldn't find CPU%d in device tree...\n", cpu_id); iccfgr = mfspr(SPR_ICCFGR); - cpuinfo->icache_ways = 1 << (iccfgr & SPR_ICCFGR_NCW); - cache_set_size = 1 << ((iccfgr & SPR_ICCFGR_NCS) >> 3); - cpuinfo->icache_block_size = 16 << ((iccfgr & SPR_ICCFGR_CBS) >> 7); - cpuinfo->icache_size = - cache_set_size * cpuinfo->icache_ways * cpuinfo->icache_block_size; + cpuinfo->icache.ways = 1 << (iccfgr & SPR_ICCFGR_NCW); + cpuinfo->icache.sets = 1 << ((iccfgr & SPR_ICCFGR_NCS) >> 3); + cpuinfo->icache.block_size = 16 << ((iccfgr & SPR_ICCFGR_CBS) >> 7); + cpuinfo->icache.size = + cpuinfo->icache.sets * cpuinfo->icache.ways * cpuinfo->icache.block_size; dccfgr = mfspr(SPR_DCCFGR); - cpuinfo->dcache_ways = 1 << (dccfgr & SPR_DCCFGR_NCW); - cache_set_size = 1 << ((dccfgr & SPR_DCCFGR_NCS) >> 3); - cpuinfo->dcache_block_size = 16 << ((dccfgr & SPR_DCCFGR_CBS) >> 7); - cpuinfo->dcache_size = - cache_set_size * cpuinfo->dcache_ways * cpuinfo->dcache_block_size; + cpuinfo->dcache.ways = 1 << (dccfgr & SPR_DCCFGR_NCW); + cpuinfo->dcache.sets = 1 << ((dccfgr & SPR_DCCFGR_NCS) >> 3); + cpuinfo->dcache.block_size = 16 << ((dccfgr & SPR_DCCFGR_CBS) >> 7); + cpuinfo->dcache.size = + cpuinfo->dcache.sets * cpuinfo->dcache.ways * cpuinfo->dcache.block_size; if 
(of_property_read_u32(cpu, "clock-frequency", &cpuinfo->clock_frequency)) { @@ -320,14 +319,14 @@ static int show_cpuinfo(struct seq_file *m, void *v) seq_printf(m, "revision\t\t: %d\n", vr & SPR_VR_REV); } seq_pr
[PATCH v4 2/3] openrisc: Introduce new utility functions to flush and invalidate caches
According to the OpenRISC architecture manual, the dcache and icache may
not be present. When these caches are present, the invalidate and flush
registers may be absent. The current implementation does not perform
checks to verify their presence before utilizing cache registers, or
invalidating and flushing cache blocks.

Introduce new functions to detect the presence of cache components and
related special-purpose registers.

There are a few places where a range of addresses have to be flushed or
invalidated and the implementation is duplicated. Introduce new utility
functions and macros that generalize this implementation and reduce
duplication.

Signed-off-by: Sahil Siddiq
---
Changes from v3 -> v4:
- arch/openrisc/include/asm/cpuinfo.h: Move new definitions to cache.c.
- arch/openrisc/mm/cache.c:
  (cache_loop): Split function.
  (cache_loop_page): New function.
  (cpu_cache_is_present): Move definition here.
  (cb_inv_flush_is_implemented): Move definition here.

Changes from v2 -> v3:
- arch/openrisc/include/asm/cacheflush.h: Declare new functions and macros.
- arch/openrisc/include/asm/cpuinfo.h: Implement new functions.
  (cpu_cache_is_present):
  1. The implementation of this function was strewn all over the place in
     the previous versions.
  2. Fix condition. The condition in the previous version was incorrect.
  (cb_inv_flush_is_implemented): New function.
- arch/openrisc/kernel/dma.c: Use new functions.
- arch/openrisc/mm/cache.c:
  (cache_loop): Extend function.
  (local_*_page_*): Use new cache_loop interface.
  (local_*_range_*): Implement new functions.
- arch/openrisc/mm/init.c: Use new functions.
 arch/openrisc/include/asm/cacheflush.h | 17 +
 arch/openrisc/include/asm/cpuinfo.h    | 15 +
 arch/openrisc/kernel/dma.c             | 18 ++
 arch/openrisc/mm/cache.c               | 87 +++---
 arch/openrisc/mm/init.c                |  5 +-
 5 files changed, 118 insertions(+), 24 deletions(-)

diff --git a/arch/openrisc/include/asm/cacheflush.h b/arch/openrisc/include/asm/cacheflush.h
index 984c331ff5f4..0e60af486ec1 100644
--- a/arch/openrisc/include/asm/cacheflush.h
+++ b/arch/openrisc/include/asm/cacheflush.h
@@ -23,6 +23,9 @@
  */
 extern void local_dcache_page_flush(struct page *page);
 extern void local_icache_page_inv(struct page *page);
+extern void local_dcache_range_flush(unsigned long start, unsigned long end);
+extern void local_dcache_range_inv(unsigned long start, unsigned long end);
+extern void local_icache_range_inv(unsigned long start, unsigned long end);

 /*
  * Data cache flushing always happen on the local cpu. Instruction cache
@@ -38,6 +41,20 @@ extern void local_icache_page_inv(struct page *page);
 extern void smp_icache_page_inv(struct page *page);
 #endif /* CONFIG_SMP */

+/*
+ * Even if the actual block size is larger than L1_CACHE_BYTES, paddr
+ * can be incremented by L1_CACHE_BYTES. When paddr is written to the
+ * invalidate register, the entire cache line encompassing this address
+ * is invalidated. Each subsequent reference to the same cache line will
+ * not affect the invalidation process.
+ */
+#define local_dcache_block_flush(addr) \
+	local_dcache_range_flush(addr, addr + L1_CACHE_BYTES)
+#define local_dcache_block_inv(addr) \
+	local_dcache_range_inv(addr, addr + L1_CACHE_BYTES)
+#define local_icache_block_inv(addr) \
+	local_icache_range_inv(addr, addr + L1_CACHE_BYTES)
+
 /*
  * Synchronizes caches. Whenever a cpu writes executable code to memory, this
  * should be called to make sure the processor sees the newly written code.
diff --git a/arch/openrisc/include/asm/cpuinfo.h b/arch/openrisc/include/asm/cpuinfo.h
index 82f5d4c06314..e46afbfe9b5a 100644
--- a/arch/openrisc/include/asm/cpuinfo.h
+++ b/arch/openrisc/include/asm/cpuinfo.h
@@ -15,6 +15,9 @@
 #ifndef __ASM_OPENRISC_CPUINFO_H
 #define __ASM_OPENRISC_CPUINFO_H

+#include
+#include
+
 struct cache_desc {
 	u32 size;
 	u32 sets;
@@ -34,4 +37,16 @@ struct cpuinfo_or1k {
 extern struct cpuinfo_or1k cpuinfo_or1k[NR_CPUS];
 extern void setup_cpuinfo(void);

+/*
+ * Check if the cache component exists.
+ */
+extern bool cpu_cache_is_present(const unsigned int cache_type);
+
+/*
+ * Check if the cache block flush/invalidate register is implemented for the
+ * cache component.
+ */
+extern bool cb_inv_flush_is_implemented(const unsigned int reg,
+					const unsigned int cache_type);
+
 #endif /* __ASM_OPENRISC_CPUINFO_H */
diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
index b3edbb33b621..3a7b5baaa450 100644
--- a/arch/openrisc/kernel/dma.c
+++ b/arch/openrisc/kernel/dma.c
@@ -17,6 +17,7 @@
 #include
 #include
+#include
 #include
 #include
@@ -24,9 +25,6 @@
 static int page_set_nocache(pte_t *pte, unsigned long addr,
 			    unsigned long next, struct mm_walk *walk)
 {
-	unsigned long cl;
-	struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_proc
Re: [PATCH v4 2/3] openrisc: Introduce new utility functions to flush and invalidate caches
On Sat, Mar 29, 2025 at 01:56:31AM +0530, Sahil Siddiq wrote:
> According to the OpenRISC architecture manual, the dcache and icache may
> not be present. When these caches are present, the invalidate and flush
> registers may be absent. The current implementation does not perform
> checks to verify their presence before utilizing cache registers, or
> invalidating and flushing cache blocks.
>
> Introduce new functions to detect the presence of cache components and
> related special-purpose registers.
>
> There are a few places where a range of addresses have to be flushed or
> invalidated and the implementation is duplicated. Introduce new utility
> functions and macros that generalize this implementation and reduce
> duplication.
>
> Signed-off-by: Sahil Siddiq
> ---
> Changes from v3 -> v4:
> - arch/openrisc/include/asm/cpuinfo.h: Move new definitions to cache.c.
> - arch/openrisc/mm/cache.c:
>   (cache_loop): Split function.
>   (cache_loop_page): New function.
>   (cpu_cache_is_present): Move definition here.
>   (cb_inv_flush_is_implemented): Move definition here.
>
> Changes from v2 -> v3:
> - arch/openrisc/include/asm/cacheflush.h: Declare new functions and macros.
> - arch/openrisc/include/asm/cpuinfo.h: Implement new functions.
>   (cpu_cache_is_present):
>   1. The implementation of this function was strewn all over the place in
>      the previous versions.
>   2. Fix condition. The condition in the previous version was incorrect.
>   (cb_inv_flush_is_implemented): New function.
> - arch/openrisc/kernel/dma.c: Use new functions.
> - arch/openrisc/mm/cache.c:
>   (cache_loop): Extend function.
>   (local_*_page_*): Use new cache_loop interface.
>   (local_*_range_*): Implement new functions.
> - arch/openrisc/mm/init.c: Use new functions.
>
>  arch/openrisc/include/asm/cacheflush.h | 17 +
>  arch/openrisc/include/asm/cpuinfo.h    | 15 +
>  arch/openrisc/kernel/dma.c             | 18 ++
>  arch/openrisc/mm/cache.c               | 87 +++---
>  arch/openrisc/mm/init.c                |  5 +-
>  5 files changed, 118 insertions(+), 24 deletions(-)
>
> diff --git a/arch/openrisc/include/asm/cacheflush.h
> b/arch/openrisc/include/asm/cacheflush.h
> index 984c331ff5f4..0e60af486ec1 100644
> --- a/arch/openrisc/include/asm/cacheflush.h
> +++ b/arch/openrisc/include/asm/cacheflush.h
> @@ -23,6 +23,9 @@
>  */
> extern void local_dcache_page_flush(struct page *page);
> extern void local_icache_page_inv(struct page *page);
> +extern void local_dcache_range_flush(unsigned long start, unsigned long end);
> +extern void local_dcache_range_inv(unsigned long start, unsigned long end);
> +extern void local_icache_range_inv(unsigned long start, unsigned long end);
>
> /*
>  * Data cache flushing always happen on the local cpu. Instruction cache
> @@ -38,6 +41,20 @@ extern void local_icache_page_inv(struct page *page);
> extern void smp_icache_page_inv(struct page *page);
> #endif /* CONFIG_SMP */
>
> +/*
> + * Even if the actual block size is larger than L1_CACHE_BYTES, paddr
> + * can be incremented by L1_CACHE_BYTES. When paddr is written to the
> + * invalidate register, the entire cache line encompassing this address
> + * is invalidated. Each subsequent reference to the same cache line will
> + * not affect the invalidation process.
> + */
> +#define local_dcache_block_flush(addr) \
> +	local_dcache_range_flush(addr, addr + L1_CACHE_BYTES)
> +#define local_dcache_block_inv(addr) \
> +	local_dcache_range_inv(addr, addr + L1_CACHE_BYTES)
> +#define local_icache_block_inv(addr) \
> +	local_icache_range_inv(addr, addr + L1_CACHE_BYTES)
> +
> /*
>  * Synchronizes caches. Whenever a cpu writes executable code to memory, this
>  * should be called to make sure the processor sees the newly written code.
> diff --git a/arch/openrisc/include/asm/cpuinfo.h
> b/arch/openrisc/include/asm/cpuinfo.h
> index 82f5d4c06314..e46afbfe9b5a 100644
> --- a/arch/openrisc/include/asm/cpuinfo.h
> +++ b/arch/openrisc/include/asm/cpuinfo.h
> @@ -15,6 +15,9 @@
> #ifndef __ASM_OPENRISC_CPUINFO_H
> #define __ASM_OPENRISC_CPUINFO_H
>
> +#include
> +#include
> +
> struct cache_desc {
> 	u32 size;
> 	u32 sets;
> @@ -34,4 +37,16 @@ struct cpuinfo_or1k {
> extern struct cpuinfo_or1k cpuinfo_or1k[NR_CPUS];
> extern void setup_cpuinfo(void);
>
> +/*
> + * Check if the cache component exists.
> + */
> +extern bool cpu_cache_is_present(const unsigned int cache_type);

This is used in cacheinfo. OK.

> +/*
> + * Check if the cache block flush/invalidate register is implemented for the
> + * cache component.
> + */
> +extern bool cb_inv_flush_is_implemented(const unsigned int reg,
> +					const unsigned int cache_type);

But this function doesn't seem to be used anywhere but in cache.c. Does it
need to be public?

> #endif /* __ASM_OPENRISC_CPUINFO_H */
> diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> index b3
[GIT PULL] remoteproc updates for v6.15
The following changes since commit a64dcfb451e254085a7daee5fe51bf22959d52d3:

  Linux 6.14-rc2 (2025-02-09 12:45:03 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git tags/rproc-v6.15

for you to fetch changes up to e917b73234b02aa4966325e7380d2559bf127ba9:

  remoteproc: qcom_q6v5_pas: Make single-PD handling more robust (2025-03-22 08:42:39 -0500)

remoteproc updates for v6.15

The i.MX8MP DSP remoteproc driver is transitioned to use the reset
framework for driving the run/stall reset bits.

Support for managing the modem remote processor on the Qualcomm MSM8226,
MSM8926, and SM8750 platforms is added.

Dan Carpenter (1):
      remoteproc: sysmon: Update qcom_add_sysmon_subdev() comment

Daniel Baluta (8):
      dt-bindings: reset: audiomix: Add reset ids for EARC and DSP
      dt-bindings: dsp: fsl,dsp: Add resets property
      reset: imx8mp-audiomix: Add prefix for internal macro
      reset: imx8mp-audiomix: Prepare the code for more reset bits
      reset: imx8mp-audiomix: Introduce active_low configuration option
      reset: imx8mp-audiomix: Add support for DSP run/stall
      imx_dsp_rproc: Use reset controller API to control the DSP
      remoteproc: imx_dsp_rproc: Document run_stall struct member

Jiri Slaby (SUSE) (1):
      irqdomain: remoteproc: Switch to of_fwnode_handle()

Konrad Dybcio (1):
      dt-bindings: remoteproc: Consolidate SC8180X and SM8150 PAS files

Krzysztof Kozlowski (4):
      dt-bindings: remoteproc: qcom,sm6115-pas: Use recommended MBN firmware format in DTS example
      dt-bindings: remoteproc: Add SM8750 CDSP
      dt-bindings: remoteproc: Add SM8750 MPSS
      remoteproc: qcom: pas: Add SM8750 MPSS

Luca Weiss (7):
      dt-bindings: remoteproc: qcom,msm8916-mss-pil: Add MSM8926
      remoteproc: qcom_q6v5_mss: Handle platforms with one power domain
      remoteproc: qcom_q6v5_mss: Add modem support on MSM8226
      remoteproc: qcom_q6v5_mss: Add modem support on MSM8926
      remoteproc: qcom: pas: add minidump_id to SC7280 WPSS
      remoteproc: qcom_q6v5_pas: Use resource with CX PD for MSM8226
      remoteproc: qcom_q6v5_pas: Make single-PD handling more robust

Matti Lehtimäki (4):
      dt-bindings: remoteproc: qcom,msm8916-mss-pil: Support platforms with one power domain
      dt-bindings: remoteproc: qcom,msm8916-mss-pil: Add MSM8226
      dt-bindings: remoteproc: qcom,wcnss-pil: Add support for single power-domain platforms
      remoteproc: qcom_wcnss: Handle platforms with only single power domain

Peng Fan (2):
      remoteproc: omap: Add comment for is_iomem
      remoteproc: core: Clear table_sz when rproc_shutdown

 Documentation/devicetree/bindings/dsp/fsl,dsp.yaml | 24 ++-
 .../bindings/remoteproc/qcom,msm8916-mss-pil.yaml  | 64 ++-
 .../bindings/remoteproc/qcom,sc8180x-pas.yaml      | 96 ---
 .../bindings/remoteproc/qcom,sm6115-pas.yaml       |  2 +-
 .../bindings/remoteproc/qcom,sm8150-pas.yaml       |  7 +
 .../bindings/remoteproc/qcom,sm8550-pas.yaml       | 46 +-
 .../bindings/remoteproc/qcom,wcnss-pil.yaml        | 45 -
 drivers/remoteproc/imx_dsp_rproc.c                 | 26 ++-
 drivers/remoteproc/imx_rproc.h                     |  2 +
 drivers/remoteproc/omap_remoteproc.c               |  1 +
 drivers/remoteproc/pru_rproc.c                     |  2 +-
 drivers/remoteproc/qcom_q6v5_mss.c                 | 184 -
 drivers/remoteproc/qcom_q6v5_pas.c                 | 38 -
 drivers/remoteproc/qcom_sysmon.c                   |  2 +-
 drivers/remoteproc/qcom_wcnss.c                    | 33 +++-
 drivers/remoteproc/remoteproc_core.c               |  1 +
 drivers/reset/reset-imx8mp-audiomix.c              | 78 ++---
 include/dt-bindings/reset/imx8mp-reset-audiomix.h  | 13 ++
 18 files changed, 499 insertions(+), 165 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,sc8180x-pas.yaml
 create mode 100644 include/dt-bindings/reset/imx8mp-reset-audiomix.h
[PATCH v4 0/3] openrisc: Add cacheinfo support and introduce new utility functions
Hi,

The main purpose of this series is to expose CPU cache attributes for
OpenRISC in sysfs using the cacheinfo API. The core implementation to
achieve this is in patch #3.

Patch #1 and #2 add certain enhancements to simplify the implementation
of cacheinfo support. Patch #1 removes duplication of cache-related data
members in struct cpuinfo_or1k. Patch #2 introduces several utility
functions. One set of functions is used to check if the cache components
and SPRs exist before attempting to use them. The other set provides a
convenient interface to flush or invalidate a range of cache blocks.

This version addresses review comments posted in response to v3. In
commit #2, I chose not to make "cache_loop_page()" inline after reading
point 15 in the coding style doc [1]. Let me know if it should be made
inline.

Thanks,
Sahil

[1] https://www.kernel.org/doc/html/latest/process/coding-style.html

Sahil Siddiq (3):
  openrisc: Refactor struct cpuinfo_or1k to reduce duplication
  openrisc: Introduce new utility functions to flush and invalidate
    caches
  openrisc: Add cacheinfo support

 arch/openrisc/include/asm/cacheflush.h |  17
 arch/openrisc/include/asm/cpuinfo.h    |  31 ++--
 arch/openrisc/kernel/Makefile          |   2 +-
 arch/openrisc/kernel/cacheinfo.c       | 104 +
 arch/openrisc/kernel/dma.c             |  18 +
 arch/openrisc/kernel/setup.c           |  45 +--
 arch/openrisc/mm/cache.c               |  87 +++--
 arch/openrisc/mm/init.c                |   5 +-
 8 files changed, 235 insertions(+), 74 deletions(-)
 create mode 100644 arch/openrisc/kernel/cacheinfo.c

base-commit: ea1413e5b53a8dd4fa7675edb23cdf828bbdce1e
--
2.48.1
[PATCH net 0/4] mptcp: misc. fixes for 6.15-rc0
Here are 4 unrelated patches:

- Patch 1: fix a NULL pointer when two SYN-ACK for the same request are
  handled in parallel. A fix for up to v5.9.

- Patch 2: selftests: fix check for the wrong FD. A fix for up to v5.17.

- Patch 3: selftests: close all FDs in case of error. A fix for up to
  v5.17.

- Patch 4: selftests: ignore a new generated file. A fix for 6.15-rc0.

Signed-off-by: Matthieu Baerts (NGI0)
---
Cong Liu (1):
      selftests: mptcp: fix incorrect fd checks in main_loop

Gang Yan (1):
      mptcp: fix NULL pointer in can_accept_new_subflow

Geliang Tang (1):
      selftests: mptcp: close fd_in before returning in main_loop

Matthieu Baerts (NGI0) (1):
      selftests: mptcp: ignore mptcp_diag binary

 net/mptcp/subflow.c                               | 15 ---
 tools/testing/selftests/net/mptcp/.gitignore      |  1 +
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 11 +++
 3 files changed, 16 insertions(+), 11 deletions(-)
---
base-commit: 2ea396448f26d0d7d66224cb56500a6789c7ed07
change-id: 20250328-net-mptcp-misc-fixes-6-15-98bfbeaa15ac

Best regards,
--
Matthieu Baerts (NGI0)
[PATCH net 1/4] mptcp: fix NULL pointer in can_accept_new_subflow
From: Gang Yan

When testing valkey benchmark tool with MPTCP, the kernel panics in
'mptcp_can_accept_new_subflow' because subflow_req->msk is NULL.

Call trace:

  mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P)
  subflow_syn_recv_sock (./net/mptcp/subflow.c:854)
  tcp_check_req (./net/ipv4/tcp_minisocks.c:863)
  tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268)
  ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207)
  ip_local_deliver_finish (./net/ipv4/ip_input.c:234)
  ip_local_deliver (./net/ipv4/ip_input.c:254)
  ip_rcv_finish (./net/ipv4/ip_input.c:449)
  ...

According to the debug log, the same req received two SYN-ACK in a very
short time, very likely because the client retransmits the syn ack due
to multiple reasons. Even if the packets are transmitted with a relevant
time interval, they can be processed by the server on different CPUs
concurrently. The 'subflow_req->msk' ownership is transferred to the
subflow on the first one, so there is a risk of a NULL pointer
dereference here.

This patch fixes this issue by moving the 'subflow_req->msk' checks
under the `own_req == true` conditional.

Note that the !msk check in subflow_hmac_valid() can be dropped, because
the same check already exists under the own_req mpj branch where the
code has been moved to.
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Cc: sta...@vger.kernel.org
Suggested-by: Paolo Abeni
Signed-off-by: Gang Yan
Reviewed-by: Matthieu Baerts (NGI0)
Signed-off-by: Matthieu Baerts (NGI0)
---
 net/mptcp/subflow.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index efe8d86496dbd06a3c4cae6ffc6462e43e42c959..409bd415ef1d190d5599658d01323ad8c8a9be93 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -754,8 +754,6 @@ static bool subflow_hmac_valid(const struct request_sock *req,
 	subflow_req = mptcp_subflow_rsk(req);
 	msk = subflow_req->msk;
-	if (!msk)
-		return false;

 	subflow_generate_hmac(READ_ONCE(msk->remote_key),
 			      READ_ONCE(msk->local_key),
@@ -850,12 +848,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 	} else if (subflow_req->mp_join) {
 		mptcp_get_options(skb, &mp_opt);
-		if (!(mp_opt.suboptions & OPTION_MPTCP_MPJ_ACK) ||
-		    !subflow_hmac_valid(req, &mp_opt) ||
-		    !mptcp_can_accept_new_subflow(subflow_req->msk)) {
-			SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC);
+		if (!(mp_opt.suboptions & OPTION_MPTCP_MPJ_ACK))
 			fallback = true;
-		}
 	}

 create_child:
@@ -905,6 +899,13 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 			goto dispose_child;
 		}

+		if (!subflow_hmac_valid(req, &mp_opt) ||
+		    !mptcp_can_accept_new_subflow(subflow_req->msk)) {
+			SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC);
+			subflow_add_reset_reason(skb, MPTCP_RST_EPROHIBIT);
+			goto dispose_child;
+		}
+
 		/* move the msk reference ownership to the subflow */
 		subflow_req->msk = NULL;
 		ctx->conn = (struct sock *)owner;
--
2.48.1
[PATCH net 2/4] selftests: mptcp: fix incorrect fd checks in main_loop
From: Cong Liu

Fix a bug where the code was checking the wrong file descriptors when
opening the input files. The code was checking 'fd' instead of 'fd_in',
which could lead to incorrect error handling.

Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: sta...@vger.kernel.org
Fixes: ca7ae8916043 ("selftests: mptcp: mptfo Initiator/Listener")
Co-developed-by: Geliang Tang
Signed-off-by: Geliang Tang
Signed-off-by: Cong Liu
Reviewed-by: Matthieu Baerts (NGI0)
Signed-off-by: Matthieu Baerts (NGI0)
---
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index d240d02fa443a1cd802f0e705ab36db5c22063a8..893dc36b12f607bec56a41c9961eff272a7837c7 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -1270,7 +1270,7 @@ int main_loop(void)

 	if (cfg_input && cfg_sockopt_types.mptfo) {
 		fd_in = open(cfg_input, O_RDONLY);
-		if (fd < 0)
+		if (fd_in < 0)
 			xerror("can't open %s:%d", cfg_input, errno);
 	}

@@ -1293,7 +1293,7 @@ int main_loop(void)

 	if (cfg_input && !cfg_sockopt_types.mptfo) {
 		fd_in = open(cfg_input, O_RDONLY);
-		if (fd < 0)
+		if (fd_in < 0)
 			xerror("can't open %s:%d", cfg_input, errno);
 	}

--
2.48.1
[PATCH net 3/4] selftests: mptcp: close fd_in before returning in main_loop
From: Geliang Tang

The file descriptor 'fd_in' is opened when cfg_input is configured, but
not closed in main_loop(), this patch fixes it.

Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: sta...@vger.kernel.org
Co-developed-by: Cong Liu
Signed-off-by: Cong Liu
Signed-off-by: Geliang Tang
Reviewed-by: Matthieu Baerts (NGI0)
Signed-off-by: Matthieu Baerts (NGI0)
---
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 893dc36b12f607bec56a41c9961eff272a7837c7..c83a8b47bbdfa5fcf1462e2b2949b41fd32c9b14 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -1299,7 +1299,7 @@ int main_loop(void)

 	ret = copyfd_io(fd_in, fd, 1, 0, &winfo);
 	if (ret)
-		return ret;
+		goto out;

 	if (cfg_truncate > 0) {
 		shutdown(fd, SHUT_WR);
@@ -1320,7 +1320,10 @@ int main_loop(void)
 		close(fd);
 	}

-	return 0;
+out:
+	if (cfg_input)
+		close(fd_in);
+	return ret;
 }

 int parse_proto(const char *proto)
--
2.48.1
[GIT PULL] hwspinlock updates for v6.15
The following changes since commit a64dcfb451e254085a7daee5fe51bf22959d52d3:

  Linux 6.14-rc2 (2025-02-09 12:45:03 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git tags/hwlock-v6.15

for you to fetch changes up to fec04edb74126f21ac628c7be763c97deb49f69d:

  hwspinlock: Remove unused hwspin_lock_get_id() (2025-03-21 17:12:04 -0500)

hwspinlock updates for v6.15

Drop a few unused functions from the hwspinlock framework.

Dr. David Alan Gilbert (2):
      hwspinlock: Remove unused (devm_)hwspin_lock_request()
      hwspinlock: Remove unused hwspin_lock_get_id()

 Documentation/locking/hwspinlock.rst | 57 +-
 drivers/hwspinlock/hwspinlock_core.c | 94
 include/linux/hwspinlock.h           | 18 ---
 3 files changed, 1 insertion(+), 168 deletions(-)
[PATCH v2 1/2] x86/sgx: Use sgx_nr_used_pages for EPC page count instead of sgx_nr_free_pages
sgx_nr_free_pages is an atomic that is used to keep track of free EPC
pages and detect whenever page reclaiming should start. Since successful
execution of ENCLS[EUPDATESVN] requires an empty EPC and preferably a
fast lockless way of checking for this condition in all code paths where
EPC is already used, change the reclaiming code to track the number of
used pages via sgx_nr_used_pages instead of sgx_nr_free_pages.

For this change to work in the page reclamation code, add a new
variable, sgx_nr_total_pages, that will keep track of the total number
of EPC pages.

It would have been possible to implement ENCLS[EUPDATESVN] using the
existing sgx_nr_free_pages counter and a new sgx_nr_total_pages counter,
but it won't be possible to avoid taking a lock *every time* a new EPC
page is being allocated. The conversion of sgx_nr_free_pages into
sgx_nr_used_pages allows avoiding the lock in all cases except when the
first EPC page is being allocated, via a quick atomic_long_inc_not_zero
check.

Note: The serialization for sgx_nr_total_pages is not needed because the
variable is only updated during the initialization and there's no
concurrent access.

Signed-off-by: Elena Reshetova
---
 arch/x86/kernel/cpu/sgx/main.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 8ce352fc72ac..b61d3bad0446 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -32,7 +32,8 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 static LIST_HEAD(sgx_active_page_list);
 static DEFINE_SPINLOCK(sgx_reclaimer_lock);

-static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
+static atomic_long_t sgx_nr_used_pages = ATOMIC_LONG_INIT(0);
+static unsigned long sgx_nr_total_pages;

 /* Nodes with one or more EPC sections. */
 static nodemask_t sgx_numa_mask;
@@ -378,8 +379,8 @@ static void sgx_reclaim_pages(void)

 static bool sgx_should_reclaim(unsigned long watermark)
 {
-	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	return (sgx_nr_total_pages - atomic_long_read(&sgx_nr_used_pages))
+	       < watermark && !list_empty(&sgx_active_page_list);
 }

 /*
@@ -456,7 +457,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page->flags = 0;

 	spin_unlock(&node->lock);
-	atomic_long_dec(&sgx_nr_free_pages);
+	atomic_long_inc(&sgx_nr_used_pages);

 	return page;
 }
@@ -616,7 +617,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	page->flags = SGX_EPC_PAGE_IS_FREE;

 	spin_unlock(&node->lock);
-	atomic_long_inc(&sgx_nr_free_pages);
+	atomic_long_dec(&sgx_nr_used_pages);
 }

 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
@@ -648,6 +649,8 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}

+	sgx_nr_total_pages += nr_pages;
+
 	return true;
 }

@@ -848,6 +851,8 @@ static bool __init sgx_page_cache_init(void)
 		return false;
 	}

+	atomic_long_set(&sgx_nr_used_pages, sgx_nr_total_pages);
+
 	for_each_online_node(nid) {
 		if (!node_isset(nid, sgx_numa_mask) && node_state(nid, N_MEMORY) &&
 		    node_state(nid, N_CPU))
--
2.45.2
[PATCH v2 0/2] Enable automatic SVN updates for SGX enclaves
Changes since v1 following review by Jarkko:
- first and second patch are squashed together and a better explanation
  of the change is added into the commit message
- third and fourth patch are also combined for better understanding of
  the error code purposes used in the 4th patch
- implementation of sgx_updatesvn adjusted following Jarkko's suggestions
- minor fixes in both commit messages and code from the review
- dropping the co-developed-by tag since the code now differs enough from
  the original submission. However, the reference to where the original
  code came from and credits to the original author are kept.

Background
----------

In case an SGX vulnerability is discovered and TCB recovery for SGX is
triggered, Intel specifies a process that must be followed for a given
vulnerability. Steps to mitigate can vary based on vulnerability type,
affected components, etc. In some cases, a vulnerability can be
mitigated via a runtime recovery flow by shutting down all running SGX
enclaves, clearing the enclave page cache (EPC), applying a microcode
patch that does not require a reboot (via late microcode loading) and
restarting all SGX enclaves.

Problem statement
-----------------

Even when the above-described runtime recovery flow to mitigate the SGX
vulnerability is followed, the SGX attestation evidence will still
reflect the security SVN of the previous (vulnerable) state that created
and managed the enclave until the runtime recovery event. This
limitation can currently only be overcome via a platform reboot, which
negates all the benefits of rebootless late microcode loading and is not
required in this case for functional or security purposes.

Proposed solution
-----------------

SGX architecture introduced a new instruction called EUPDATESVN [1] to
Ice Lake. It allows updating the security SVN version, given that the
EPC is completely empty. The latter is required for security reasons in
order to reason that enclave security posture is as secure as the
security SVN version of the TCB that created it.

This series enables opportunistic execution of EUPDATESVN upon the first
EPC page allocation for the first enclave to be run on the platform.

This series is partly based on the previous work done by Cathy Zhang
[2], which attempted to enable forceful destruction of all SGX enclaves
and execution of EUPDATESVN upon successful application of any microcode
patch. This approach was determined to be too intrusive for the running
SGX enclaves, especially taking into account that it would be performed
upon *every* microcode patch application, regardless of whether it
changes the security SVN or not (a change to the security SVN is a rare
event).

Testing
-------

Tested on an EMR machine using kernel-6.14.0_rc7 & sgx selftests. If
Google folks in CC can test on their side, it would be greatly
appreciated.

References
----------

[1] https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true
[2] https://lore.kernel.org/all/20220520103904.1216-1-cathy.zh...@intel.com/

Elena Reshetova (2):
  x86/sgx: Use sgx_nr_used_pages for EPC page count instead of
    sgx_nr_free_pages
  x86/sgx: Implement EUPDATESVN and opportunistically call it during
    first EPC page alloc

 arch/x86/include/asm/sgx.h      | 41 +++---
 arch/x86/kernel/cpu/sgx/encls.h |  6 +++
 arch/x86/kernel/cpu/sgx/main.c  | 76 ++---
 arch/x86/kernel/cpu/sgx/sgx.h   |  1 +
 4 files changed, 104 insertions(+), 20 deletions(-)

--
2.45.2
Re: [PATCH 0/3] Avoid calling WARN_ON() on allocation failure in cfg802154_switch_netns()
Hello Ivan,

On 28/03/2025 at 04:04:24 +03, Ivan Abramov wrote:

> This series was inspired by Syzkaller report on warning in
> cfg802154_switch_netns().

Thanks for the series, lgtm.

Reviewed-by: Miquel Raynal

Miquèl
RE: [PATCH 1/4] x86/sgx: Add total number of EPC pages
> On Thu, Mar 27, 2025 at 03:29:53PM +, Reshetova, Elena wrote:
> >
> > > On Mon, Mar 24, 2025 at 12:12:41PM +, Reshetova, Elena wrote:
> > > > > On Fri, Mar 21, 2025 at 02:34:40PM +0200, Elena Reshetova wrote:
> > > > > > In order to successfully execute ENCLS[EUPDATESVN], EPC must be
> > > > > > empty. SGX already has a variable sgx_nr_free_pages that tracks free
> > > > > > EPC pages. Add a new variable, sgx_nr_total_pages, that will keep
> > > > > > track of total number of EPC pages. It will be used in subsequent
> > > > > > patch to change the sgx_nr_free_pages into sgx_nr_used_pages and
> > > > > > allow an easy check for an empty EPC.
> > > > >
> > > > > First off, remove "in subsequent patch".
> > > >
> > > > Ok
> > > > >
> > > > > What does "change sgx_nr_free_pages into sgx_nr_used_pages" mean?
> > > >
> > > > As you can see from patch 2/4, I had to turn around the meaning of the
> > > > existing sgx_nr_free_pages atomic counter not to count the # of free pages
> > > > in EPC, but to count the # of used EPC pages (hence the change of name
> > > > to sgx_nr_used_pages). The reason for doing this is only apparent in patch
> > >
> > > Why you *absolutely* need to invert the meaning and cannot make
> > > this work by any means otherwise?
> > >
> > > I doubt highly doubt this could not be done other way around.
> >
> > I can make it work. The point that this way is much better and no damage to
> > existing logic is done. The sgx_nr_free_pages counter that is used only for
> > page reclaiming and checked in a single piece of code.
> > To give you an idea the previous iteration of the code looked like below.
> >
> > First, I had to define a new unconditional spinlock to protect the EPC page
> > allocation:
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> > index c8a2542140a1..4f445c28929b 100644
> > --- a/arch/x86/kernel/cpu/sgx/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/main.c
> > @@ -31,6 +31,7 @@ static DEFINE_XARRAY(sgx_epc_address_space);
> >  */
> > static LIST_HEAD(sgx_active_page_list);
> > static DEFINE_SPINLOCK(sgx_reclaimer_lock);
> > +static DEFINE_SPINLOCK(sgx_allocate_epc_page_lock);
> >
> > static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
> > static unsigned long sgx_nr_total_pages;
> > @@ -457,7 +458,10 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> > page->flags = 0;
> >
> > spin_unlock(&node->lock);
> > +
> > + spin_lock(&sgx_allocate_epc_page_lock);
> > atomic_long_dec(&sgx_nr_free_pages);
> > + spin_unlock(&sgx_allocate_epc_page_lock);
> >
> > return page;
> > }
> >
> > And then also take spinlock every time eupdatesvn attempts to run:
> >
> > int sgx_updatesvn(void)
> > +{
> > + int ret;
> > + int retry = 10;
>
> Reverse xmas tree order.
>
> > +
> > + spin_lock(&sgx_allocate_epc_page_lock);
>
> You could use guard for this.
>
> https://elixir.bootlin.com/linux/v6.13.7/source/include/linux/cleanup.h
>
> > +
> > + if (atomic_long_read(&sgx_nr_free_pages) != sgx_nr_total_pages) {
> > + spin_unlock(&sgx_allocate_epc_page_lock);
> > + return SGX_EPC_NOT_READY;
>
> Don't use uarch error codes.

Sure, thanks, I can fix all of the above, this was just to give an idea
how the other version of the code would look like.
> > > + } > > + > > + do { > > + ret = __eupdatesvn(); > > + if (ret != SGX_INSUFFICIENT_ENTROPY) > > + break; > > + > > + } while (--retry); > > + > > + spin_unlock(&sgx_allocate_epc_page_lock); > > > > Which was called from each enclave create ioctl: > > > > @@ -163,6 +163,11 @@ static long sgx_ioc_enclave_create(struct sgx_encl > *encl, void __user *arg) > > if (copy_from_user(&create_arg, arg, sizeof(create_arg))) > > return -EFAULT; > > > > + /* Unless running in a VM, execute EUPDATESVN if instruction is avalible > > */ > > + if ((cpuid_eax(SGX_CPUID) & SGX_CPUID_EUPDATESVN) && > > +!boot_cpu_has(X86_FEATURE_HYPERVISOR)) > > + sgx_updatesvn(); > > + > > secs = kmalloc(PAGE_SIZE, GFP_KERNEL); > > if (!secs) > > return -ENOMEM; > > > > Would you agree that this way it is much worse even code/logic-wise even > without benchmarks? > > Yes but obviously I cannot promise that I'll accept this as it is > until I see the final version Are you saying you prefer *this version with spinlock* vs. simpler version that utilizes the fact that sgx_nr_free_pages is changed into tracking of number of used pages? > > Also you probably should use mutex given the loop where we cannot > temporarily exit the lock (like e.g. in keyrings gc we can). Not sure I understand this, could you please elaborate why do I need an additional mutex here? Or are you suggesting switching spinlock to mutex? Best Regards, Elena.
[PATCH v8 4/8] vhost: Introduce vhost_worker_ops in vhost_worker
Abstract vhost worker operations (create/stop/wakeup) into an ops structure to prepare for kthread mode support. Signed-off-by: Cindy Lu --- drivers/vhost/vhost.c | 63 ++- drivers/vhost/vhost.h | 11 2 files changed, 56 insertions(+), 18 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 20571bd6f7bd..c162ad772f8f 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -243,7 +243,7 @@ static void vhost_worker_queue(struct vhost_worker *worker, * test_and_set_bit() implies a memory barrier. */ llist_add(&work->node, &worker->work_list); - vhost_task_wake(worker->vtsk); + worker->ops->wakeup(worker); } } @@ -706,7 +706,7 @@ static void vhost_worker_destroy(struct vhost_dev *dev, WARN_ON(!llist_empty(&worker->work_list)); xa_erase(&dev->worker_xa, worker->id); - vhost_task_stop(worker->vtsk); + worker->ops->stop(worker); kfree(worker); } @@ -729,42 +729,69 @@ static void vhost_workers_free(struct vhost_dev *dev) xa_destroy(&dev->worker_xa); } +static void vhost_task_wakeup(struct vhost_worker *worker) +{ + return vhost_task_wake(worker->vtsk); +} + +static void vhost_task_do_stop(struct vhost_worker *worker) +{ + return vhost_task_stop(worker->vtsk); +} + +static int vhost_task_worker_create(struct vhost_worker *worker, + struct vhost_dev *dev, const char *name) +{ + struct vhost_task *vtsk; + u32 id; + int ret; + + vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed, +worker, name); + if (IS_ERR(vtsk)) + return PTR_ERR(vtsk); + + worker->vtsk = vtsk; + vhost_task_start(vtsk); + ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL); + if (ret < 0) { + vhost_task_do_stop(worker); + return ret; + } + worker->id = id; + return 0; +} + +static const struct vhost_worker_ops vhost_task_ops = { + .create = vhost_task_worker_create, + .stop = vhost_task_do_stop, + .wakeup = vhost_task_wakeup, +}; + static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev) { struct vhost_worker *worker; - struct 
vhost_task *vtsk; char name[TASK_COMM_LEN]; int ret; - u32 id; + const struct vhost_worker_ops *ops = &vhost_task_ops; worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT); if (!worker) return NULL; worker->dev = dev; + worker->ops = ops; snprintf(name, sizeof(name), "vhost-%d", current->pid); - vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed, -worker, name); - if (IS_ERR(vtsk)) - goto free_worker; - mutex_init(&worker->mutex); init_llist_head(&worker->work_list); worker->kcov_handle = kcov_common_handle(); - worker->vtsk = vtsk; - - vhost_task_start(vtsk); - - ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL); + ret = ops->create(worker, dev, name); if (ret < 0) - goto stop_worker; - worker->id = id; + goto free_worker; return worker; -stop_worker: - vhost_task_stop(vtsk); free_worker: kfree(worker); return NULL; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 19bb94922a0e..98895e299efa 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -26,6 +26,16 @@ struct vhost_work { unsigned long flags; }; +struct vhost_worker; +struct vhost_dev; + +struct vhost_worker_ops { + int (*create)(struct vhost_worker *worker, struct vhost_dev *dev, + const char *name); + void (*stop)(struct vhost_worker *worker); + void (*wakeup)(struct vhost_worker *worker); +}; + struct vhost_worker { struct vhost_task *vtsk; struct vhost_dev*dev; @@ -36,6 +46,7 @@ struct vhost_worker { u32 id; int attachment_cnt; boolkilled; + const struct vhost_worker_ops *ops; }; /* Poll a file (eventfd or socket) */ -- 2.45.0
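The shape of the change above — core code dispatching through worker->ops so a second backend can be added without touching callers — can be sketched self-contained. This is a hypothetical userspace analogue of the indirection, not the kernel code; a kthread-mode backend would simply be a second ops table:

```c
#include <stdlib.h>

/* Minimal sketch of the ops-table indirection the patch introduces.
 * Names are illustrative, not the vhost structures. */

struct worker;

struct worker_ops {
	int  (*create)(struct worker *w);
	void (*stop)(struct worker *w);
	void (*wakeup)(struct worker *w);
};

struct worker {
	const struct worker_ops *ops;
	int wakeups;
	int running;
};

/* One concrete backend; a kthread-style backend would be a second,
 * independent ops table pointing at its own create/stop/wakeup. */
static int demo_create(struct worker *w)  { w->running = 1; return 0; }
static void demo_stop(struct worker *w)   { w->running = 0; }
static void demo_wakeup(struct worker *w) { w->wakeups++; }

static const struct worker_ops demo_ops = {
	.create = demo_create,
	.stop   = demo_stop,
	.wakeup = demo_wakeup,
};

/* Caller code only ever touches w->ops, mirroring vhost_worker_create()
 * after the refactor. */
struct worker *worker_create(const struct worker_ops *ops)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (!w)
		return NULL;
	w->ops = ops;
	if (w->ops->create(w)) {
		free(w);
		return NULL;
	}
	return w;
}
```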
[PATCH] selftests/mm: Fix loss of information warnings
Cppcheck reported a style warning: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. Changing the type of page_size from 'unsigned int' to 'unsigned long long' was considered. But that might cause new conversion issues in other parts of the code where calculations involving 'page_size' are assigned to int variables. So we take the approach of appending ULL suffixes instead. Reported-by: David Binderman Closes: https://lore.kernel.org/all/as8pr02mb10217315060bbfdb21f19643e9c...@as8pr02mb10217.eurprd02.prod.outlook.com/ Signed-off-by: Siddarth G --- tools/testing/selftests/mm/pagemap_ioctl.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/mm/pagemap_ioctl.c b/tools/testing/selftests/mm/pagemap_ioctl.c index 57b4bba2b45f..f3b12402ca89 100644 --- a/tools/testing/selftests/mm/pagemap_ioctl.c +++ b/tools/testing/selftests/mm/pagemap_ioctl.c @@ -244,7 +244,7 @@ int sanity_tests_sd(void) long walk_end; vec_size = num_pages/2; - mem_size = num_pages * page_size; + mem_size = num_pages * (long long)page_size; vec = malloc(sizeof(struct page_region) * vec_size); if (!vec) @@ -432,7 +432,7 @@ int sanity_tests_sd(void) free(vec2); /* 8. Smaller vec */ - mem_size = 1050 * page_size; + mem_size = 1050ULL * page_size; vec_size = mem_size/(page_size*2); vec = malloc(sizeof(struct page_region) * vec_size); @@ -487,7 +487,7 @@ int sanity_tests_sd(void) total_pages = 0; /* 9. Smaller vec */ - mem_size = 1 * page_size; + mem_size = 1ULL * page_size; vec_size = 50; vec = malloc(sizeof(struct page_region) * vec_size); @@ -1058,7 +1058,7 @@ int sanity_tests(void) char *tmp_buf; /* 1. wrong operation */ - mem_size = 10 * page_size; + mem_size = 10ULL * page_size; vec_size = mem_size / page_size; vec = malloc(sizeof(struct page_region) * vec_size); @@ -1507,7 +1507,7 @@ int main(int __attribute__((unused)) argc, char *argv[]) sanity_tests_sd(); /* 2.
Normal page testing */ - mem_size = 10 * page_size; + mem_size = 10ULL * page_size; mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); if (mem == MAP_FAILED) ksft_exit_fail_msg("error nomem\n"); @@ -1520,7 +1520,7 @@ int main(int __attribute__((unused)) argc, char *argv[]) munmap(mem, mem_size); /* 3. Large page testing */ - mem_size = 512 * 10 * page_size; + mem_size = 512ULL * 10 * page_size; mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); if (mem == MAP_FAILED) ksft_exit_fail_msg("error nomem\n"); -- 2.43.0
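The truncation cppcheck is pointing at comes from C's usual arithmetic conversions: with two 32-bit operands the multiplication is performed in 32 bits, and only the (possibly wrapped) result is widened for the long long assignment. A small sketch — the values are illustrative, not from the selftest, and the wrap assumes a 32-bit unsigned int:

```c
#include <stdint.h>

/* Without a 64-bit operand, the product wraps modulo 2^32 before it is
 * widened for the long long assignment. */
unsigned long long buggy_size(unsigned int pages, unsigned int page_size)
{
	/* unsigned * unsigned is computed in 32 bits, then widened */
	return pages * page_size;
}

/* One ULL operand (what the 1050ULL-style suffixes achieve) promotes
 * the whole multiplication to 64 bits. */
unsigned long long fixed_size(unsigned int pages, unsigned int page_size)
{
	return (unsigned long long)pages * page_size;
}
```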
Re: [PATCH v2] selftests/ptrace/get_syscall_info: fix for MIPS n32
On 1/15/25 16:37, Dmitry V. Levin wrote: MIPS n32 is one of two ILP32 architectures supported by the kernel that have 64-bit syscall arguments (another one is x32). When this test passed 32-bit arguments to syscall(), they were sign-extended in libc, PTRACE_GET_SYSCALL_INFO reported these sign-extended 64-bit values, and the test complained about the mismatch. Fix this by passing arguments of the appropriate type to syscall(), which is "unsigned long long" on MIPS n32, and __kernel_ulong_t on other architectures. As a side effect, this also extends the test on all 64-bit architectures by choosing constants that don't fit into 32-bit integers. Signed-off-by: Dmitry V. Levin --- v2: Fixed MIPS #ifdef. .../selftests/ptrace/get_syscall_info.c | 53 +++ 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/tools/testing/selftests/ptrace/get_syscall_info.c b/tools/testing/selftests/ptrace/get_syscall_info.c index 5bcd1c7b5be6..2970f72d66d3 100644 --- a/tools/testing/selftests/ptrace/get_syscall_info.c +++ b/tools/testing/selftests/ptrace/get_syscall_info.c @@ -11,8 +11,19 @@ #include #include #include +#include #include "linux/ptrace.h" +#if defined(_MIPS_SIM) && _MIPS_SIM == _MIPS_SIM_NABI32 +/* + * MIPS N32 is the only architecture where __kernel_ulong_t + * does not match the bitness of syscall arguments. + */ +typedef unsigned long long kernel_ulong_t; +#else +typedef __kernel_ulong_t kernel_ulong_t; +#endif + What's the reason for adding these typedefs? checkpatch should have warned you about adding new typedefs. Also this introduces kernel_ulong_t in user-space test code. Something to avoid. 
static int kill_tracee(pid_t pid) { @@ -42,37 +53,37 @@ sys_ptrace(int request, pid_t pid, unsigned long addr, unsigned long data) TEST(get_syscall_info) { - static const unsigned long args[][7] = { + const kernel_ulong_t args[][7] = { /* a sequence of architecture-agnostic syscalls */ { __NR_chdir, - (unsigned long) "", - 0xbad1fed1, - 0xbad2fed2, - 0xbad3fed3, - 0xbad4fed4, - 0xbad5fed5 + (uintptr_t) "", You could use ifdef here. + (kernel_ulong_t) 0xdad1bef1bad1fed1ULL, + (kernel_ulong_t) 0xdad2bef2bad2fed2ULL, + (kernel_ulong_t) 0xdad3bef3bad3fed3ULL, + (kernel_ulong_t) 0xdad4bef4bad4fed4ULL, + (kernel_ulong_t) 0xdad5bef5bad5fed5ULL }, { __NR_gettid, - 0xcaf0bea0, - 0xcaf1bea1, - 0xcaf2bea2, - 0xcaf3bea3, - 0xcaf4bea4, - 0xcaf5bea5 + (kernel_ulong_t) 0xdad0bef0caf0bea0ULL, + (kernel_ulong_t) 0xdad1bef1caf1bea1ULL, + (kernel_ulong_t) 0xdad2bef2caf2bea2ULL, + (kernel_ulong_t) 0xdad3bef3caf3bea3ULL, + (kernel_ulong_t) 0xdad4bef4caf4bea4ULL, + (kernel_ulong_t) 0xdad5bef5caf5bea5ULL }, { __NR_exit_group, 0, - 0xfac1c0d1, - 0xfac2c0d2, - 0xfac3c0d3, - 0xfac4c0d4, - 0xfac5c0d5 + (kernel_ulong_t) 0xdad1bef1fac1c0d1ULL, + (kernel_ulong_t) 0xdad2bef2fac2c0d2ULL, + (kernel_ulong_t) 0xdad3bef3fac3c0d3ULL, + (kernel_ulong_t) 0xdad4bef4fac4c0d4ULL, + (kernel_ulong_t) 0xdad5bef5fac5c0d5ULL } }; - const unsigned long *exp_args; + const kernel_ulong_t *exp_args; pid_t pid = fork(); @@ -154,7 +165,7 @@ TEST(get_syscall_info) } ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_t) &info))) { LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m"); } ASSERT_EQ(expected_none_size, rc) { @@ -177,7 +188,7 @@ TEST(get_syscall_info) case SIGTRAP | 0x80: ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, - (unsigned long) &info))) { + (uintptr_
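The sign-extension the patch works around can be reproduced in plain C: a constant such as 0xbad1fed1 does not fit in a 32-bit signed int, so when a 32-bit argument travels through a signed 64-bit syscall-argument slot (as on MIPS n32) it arrives with the upper 32 bits set, and no longer matches the value the test wrote. A hedged sketch, not the selftest code:

```c
#include <stdint.h>

/* Model of what the tracee effectively hands over on an ILP32 ABI
 * whose syscall arguments are 64-bit and signed (MIPS n32). */
unsigned long long sign_extended_arg(uint32_t arg)
{
	/* reinterpret as signed 32-bit, widen to 64-bit (sign-extends),
	 * then view the 64-bit pattern unsigned */
	return (unsigned long long)(long long)(int32_t)arg;
}
```

This is why the v2 test switches to a 64-bit argument type and full-width constants: the comparison then happens on the value that actually reaches the kernel.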
Re: [PATCH net-next v2] vsock/test: Add test for null ptr deref when transport changes
On Wed, Mar 26, 2025 at 05:21:03PM +0100, Stefano Garzarella wrote: On Wed, Mar 26, 2025 at 04:14:20PM +0100, Luigi Leonardi wrote: Hi Michal, On Wed, Mar 19, 2025 at 01:27:35AM +0100, Michal Luczaj wrote: On 3/14/25 10:27, Luigi Leonardi wrote: Add a new test to ensure that when the transport changes a null pointer dereference does not occur[1]. Note that this test does not fail, but it may hang on the client side if it triggers a kernel oops. This works by creating a socket, trying to connect to a server, and then executing a second connect operation on the same socket but to a different CID (0). This triggers a transport change. If the connect operation is interrupted by a signal, this could cause a null-ptr-deref. Just to be clear: that's the splat, right? Oops: general protection fault, probably for non-canonical address 0xdc0c: [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0060-0x0067] CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted Workqueue: vsock-loopback vsock_loopback_work RIP: 0010:vsock_stream_has_data+0x44/0x70 Call Trace: virtio_transport_do_close+0x68/0x1a0 virtio_transport_recv_pkt+0x1045/0x2ae4 vsock_loopback_work+0x27d/0x3f0 process_one_work+0x846/0x1420 worker_thread+0x5b3/0xf80 kthread+0x35a/0x700 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x1a/0x30 Yep! I'll add it to the commit message in v3. ... +static void test_stream_transport_change_client(const struct test_opts *opts) +{ + __sighandler_t old_handler; + pid_t pid = getpid(); + pthread_t thread_id; + time_t tout; + + old_handler = signal(SIGUSR1, test_transport_change_signal_handler); + if (old_handler == SIG_ERR) { + perror("signal"); + exit(EXIT_FAILURE); + } + + if (pthread_create(&thread_id, NULL, test_stream_transport_change_thread, &pid)) { + perror("pthread_create"); Does pthread_create() set errno on failure? It does not, very good catch! + exit(EXIT_FAILURE); + } + + tout = current_nsec() + TIMEOUT * NSEC_PER_SEC; Isn't 10 seconds a bit excessive? 
I see the oops pretty much immediately. Yeah it's probably excessive. I used it because it's the default timeout value. + do { + struct sockaddr_vm sa = { + .svm_family = AF_VSOCK, + .svm_cid = opts->peer_cid, + .svm_port = opts->peer_port, + }; + int s; + + s = socket(AF_VSOCK, SOCK_STREAM, 0); + if (s < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + + connect(s, (struct sockaddr *)&sa, sizeof(sa)); + + /* Set CID to 0 cause a transport change. */ + sa.svm_cid = 0; + connect(s, (struct sockaddr *)&sa, sizeof(sa)); + + close(s); + } while (current_nsec() < tout); + + if (pthread_cancel(thread_id)) { + perror("pthread_cancel"); And errno here. + exit(EXIT_FAILURE); + } + + /* Wait for the thread to terminate */ + if (pthread_join(thread_id, NULL)) { + perror("pthread_join"); And here. Aaand I've realized I've made exactly the same mistake elsewhere :) ... +static void test_stream_transport_change_server(const struct test_opts *opts) +{ + time_t tout = current_nsec() + TIMEOUT * NSEC_PER_SEC; + + do { + int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port); + + close(s); + } while (current_nsec() < tout); +} I'm not certain you need to re-create the listener or measure the time here. What about something like int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port); control_expectln("DONE"); close(s); Just tried and it triggers the oops :) If this works (as I also initially thought), we should check the result of the first connect() in the client code. It can succeed or fail with -EINTR, in other cases we should report an error because it is not expected. And we should also check the second connect(), it should always fail, right? For this I think you need another sync point to be sure the server is listening before trying to connect the first time: client: // pthread_create, etc. control_expectln("LISTENING"); do {
} while(); control_writeln("DONE"); server: int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port); control_writeln("LISTENING"); We found that this needed to be extended by adding an accept() loop to avoid filling up the backlog of the listening socket. But by doing accept() and close() back to back, we found a problem in AF_VSOCK, where connect() in some cases would get stuck until the timeout (default: 2 seconds) returning -ETIMEDOUT. Fix is coming. Thanks, Stefano
Re: [PATCH/RFC] kunit/rtc: Add real support for very slow tests
Hi David, On Fri, 28 Mar 2025 at 09:07, David Gow wrote: > Thanks for sending this out: I think this raises some good questions > about exactly how to handle long running tests (particularly on > older/slower hardware). > > I've put a few notes below, but, tl;dr: I think these are all good > changes, even if there's more we can do to better scale to slower > hardware. > > On Fri, 28 Mar 2025 at 00:07, Geert Uytterhoeven wrote: > > 2. Increase timeout by ten; ideally this should only be done for very > > slow tests, but I couldn't find how to access kunit_case.attr.case > > from kunit_try_catch_run(), > > > My feeling for tests generally is: > - Normal: effectively instant on modern hardware, O(seconds) on > ancient hardware. > - Slow: takes O(seconds) to run on modern hardware, O(minutes)..O(10s > of minutes) on ancient hardware. > - Very slow: O(minutes) or higher on modern hardware, infeasible on > ancient hardware. > > Obviously the definition of "modern" and "ancient" hardware here is > pretty arbitrary: I'm using "modern, high-end x86" ~4GHz as my > "modern" example, and "66MHz 486" as my "ancient" one, but things like > emulation or embedded systems fit in-between. > > Ultimately, I think the timeout probably needs to be configurable on a > per-machine basis more than a per-test one, but having a 10x > multiplier (or even a 100x multiplier) for very slow tests would also > work for me. Yes, adapting automatically to the speed of the target machine would be nice, but non-trivial. > I quickly tried hacking together something to pass through the > attribute and implement this. Diff (probably mangled by gmail) below: [...] Thanks! > I'll get around to extending this to allow the "base timeout" to be > configurable as a command-line option, too, if this seems like a good > way to go. > > > 3. Mark rtc_time64_to_tm_test_date_range_1000 slow, > > 4. Mark rtc_time64_to_tm_test_date_range_16 very slow. > > Hmm...
these are definitely fast enough on my "modern" machine that > they probably only warrant "slow", not "very slow". But given they're > definitely causing problems on older machines, I'm happy to go with > marking the large ones very slow. (I've been waiting for them for > about 45 minutes so far on my 486.) > > Do the time tests in kernel/time/time_test.c also need to be marked > very slow, or does that run much faster on your setup? Hmm, I did run time_test (insmod took ~7 minutes), but I don't seem to have pass/fail output. Will rerun... Indeed: # time64_to_tm_test_date_range.speed: slow Another test that wanted to be marked as slow was: # kunit_platform_device_add_twice_fails_test: Test should be marked slow (runtime: 30.788248702s) I will rerun all, as it seems I have lost some logs... > Is this causing you enough strife that you want it in as-is, straight > away, or would you be happy with it being split up and polished a bit > first -- particularly around supporting the more configurable timeout, > and shifting the test changes into separate patches? (I'm happy to do > that for you if you don't want to dig around in the somewhat messy > KUnit try-catch stuff any further.) This is definitely not something urgent for me. Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [ANNOUNCE] kmod 34.2
kmod 34.2 is out: https://www.kernel.org/pub/linux/utils/kernel/kmod/kmod-34.2.tar.xz https://www.kernel.org/pub/linux/utils/kernel/kmod/kmod-34.2.tar.sign The tarballs generated for kmod-34 and kmod-34.1 were not very compatible for distros still on autotools. Hint: v35 will not have autotools and it'd be better to be prepared. Fix it and also bring a few fixes to weakdep parsing. Shortlog is below: Emil Velikov (1): NEWS: squash a couple of typos Jakub Ślepecki (1): libkmod: fix buffer-overflow in weakdep_to_char Lucas De Marchi (3): testsuite: Add modprobe -c test for weakdep autotools: Fix generated files in tarball kmod 34.2 Tobias Stoeckmann (2): libkmod: release memory on builtin error path libkmod: fix buffer-overflow in weakdep_to_char thanks Lucas De Marchi
[GIT PULL] Modules changes for v6.15-rc1
The following changes since commit 80e54e84911a923c40d7bee33a34c1b4be148d7a: Linux 6.14-rc6 (2025-03-09 13:45:25 -1000) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux.git/ tags/modules-6.15-rc1 for you to fetch changes up to 897c0b4e27135132dc5b348c1a3773d059668489: MAINTAINERS: Update the MODULE SUPPORT section (2025-03-28 15:08:20 +0100) Modules changes for 6.15-rc1 - Use RCU instead of RCU-sched The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable() in the module code and its users has been replaced with just rcu_read_lock(). - The rest of changes are smaller fixes and updates. The changes have been on linux-next for at least 2 weeks, with the RCU cleanup present for 2 months. One performance problem was reported with the RCU change when KASAN + lockdep were enabled, but it was effectively addressed by the already merged ee57ab5a3212 ("locking/lockdep: Disable KASAN instrumentation of lockdep.c"). Joel Granados (1): tests/module: nix-ify Petr Pavlu (1): MAINTAINERS: Update the MODULE SUPPORT section Sebastian Andrzej Siewior (27): module: Begin to move from RCU-sched to RCU. module: Use proper RCU assignment in add_kallsyms(). module: Use RCU in find_kallsyms_symbol(). module: Use RCU in module_get_kallsym(). module: Use RCU in find_module_all(). module: Use RCU in __find_kallsyms_symbol_value(). module: Use RCU in module_kallsyms_on_each_symbol(). module: Remove module_assert_mutex_or_preempt() from try_add_tainted_module(). module: Use RCU in find_symbol(). module: Use RCU in __is_module_percpu_address(). module: Allow __module_address() to be called from RCU section. module: Use RCU in search_module_extables(). module: Use RCU in all users of __module_address(). module: Use RCU in all users of __module_text_address(). ARM: module: Use RCU in all users of __module_text_address(). arm64: module: Use RCU in all users of __module_text_address(). 
LoongArch/orc: Use RCU in all users of __module_address(). LoongArch: ftrace: Use RCU in all users of __module_text_address(). powerpc/ftrace: Use RCU in all users of __module_text_address(). cfi: Use RCU while invoking __module_address(). x86: Use RCU in all users of __module_address(). jump_label: Use RCU in all users of __module_address(). jump_label: Use RCU in all users of __module_text_address(). bpf: Use RCU in all users of __module_text_address(). kprobes: Use RCU in all users of __module_text_address(). static_call: Use RCU in all users of __module_text_address(). bug: Use RCU instead RCU-sched to protect module_bug_list. Thorsten Blum (3): params: Annotate struct module_param_attrs with __counted_by() module: Replace deprecated strncpy() with strscpy() module: Remove unnecessary size argument when calling strscpy() MAINTAINERS | 4 +- arch/arm/kernel/module-plts.c| 4 +- arch/arm64/kernel/ftrace.c | 7 +- arch/loongarch/kernel/ftrace_dyn.c | 9 ++- arch/loongarch/kernel/unwind_orc.c | 4 +- arch/powerpc/kernel/trace/ftrace.c | 6 +- arch/powerpc/kernel/trace/ftrace_64_pg.c | 6 +- arch/x86/kernel/callthunks.c | 3 +- arch/x86/kernel/unwind_orc.c | 4 +- include/linux/kallsyms.h | 3 +- include/linux/module.h | 2 +- kernel/cfi.c | 5 +- kernel/jump_label.c | 31 + kernel/kprobes.c | 2 +- kernel/livepatch/core.c | 4 +- kernel/module/internal.h | 11 kernel/module/kallsyms.c | 73 - kernel/module/main.c | 109 +++ kernel/module/tracking.c | 2 - kernel/module/tree_lookup.c | 8 +-- kernel/module/version.c | 14 ++-- kernel/params.c | 29 kernel/static_call_inline.c | 13 ++-- kernel/trace/bpf_trace.c | 24 +++ kernel/trace/trace_kprobe.c | 9 +-- lib/bug.c| 22 +++ lib/tests/module/gen_test_kallsyms.sh| 2 +- 27 files changed, 160 insertions(+), 250 deletions(-)
Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote: > On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote: > >Hi, > > > >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote: > >> From: Peng Fan > >> > >> There is case as below could trigger kernel dump: > >> Use U-Boot to start remote processor(rproc) with resource table > >> published to a fixed address by rproc. After Kernel boots up, > >> stop the rproc, load a new firmware which doesn't have resource table > >> ,and start rproc. > >> > > > >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table() > >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the > >remote processor won't be started. What am I missing? > > STM32 and i.MX use their own parse_fw implementation which allows no resource > table: > https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272 > https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598 Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table() that will return NULL if a resource table is not found and preventing the memcpy() in rproc_start() from happening: https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288 > > Thanks, > Peng > > > > >[1]. > >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338 > >[2]. > >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 > > > > > >> When starting rproc with a firmware not have resource table, > >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will > >> trigger dump, because rproc->cache_table is set to NULL during the last > >> stop operation, but rproc->table_sz is still valid. > >> > >> This issue is found on i.MX8MP and i.MX9. 
> >> > >> Dump as below: > >> Unable to handle kernel NULL pointer dereference at virtual address > >> > >> Mem abort info: > >> ESR = 0x9604 > >> EC = 0x25: DABT (current EL), IL = 32 bits > >> SET = 0, FnV = 0 > >> EA = 0, S1PTW = 0 > >> FSC = 0x04: level 0 translation fault > >> Data abort info: > >> ISV = 0, ISS = 0x0004, ISS2 = 0x > >> CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > >> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > >> user pgtable: 4k pages, 48-bit VAs, pgdp=00010af63000 > >> [] pgd=, p4d= > >> Internal error: Oops: 9604 [#1] PREEMPT SMP > >> Modules linked in: > >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted > >> 6.14.0-rc7-next-20250317-dirty #38 > >> Hardware name: NXP i.MX8MPlus EVK board (DT) > >> pstate: a005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > >> pc : __pi_memcpy_generic+0x110/0x22c > >> lr : rproc_start+0x88/0x1e0 > >> Call trace: > >> __pi_memcpy_generic+0x110/0x22c (P) > >> rproc_boot+0x198/0x57c > >> state_store+0x40/0x104 > >> dev_attr_store+0x18/0x2c > >> sysfs_kf_write+0x7c/0x94 > >> kernfs_fop_write_iter+0x120/0x1cc > >> vfs_write+0x240/0x378 > >> ksys_write+0x70/0x108 > >> __arm64_sys_write+0x1c/0x28 > >> invoke_syscall+0x48/0x10c > >> el0_svc_common.constprop.0+0xc0/0xe0 > >> do_el0_svc+0x1c/0x28 > >> el0_svc+0x30/0xcc > >> el0t_64_sync_handler+0x10c/0x138 > >> el0t_64_sync+0x198/0x19c > >> > >> Clear rproc->table_sz to address the issue. > >> > >> While at here, also clear rproc->table_sz when rproc_fw_boot and > >> rproc_detach. 
> >> > >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table > >> when detaching") > >> Signed-off-by: Peng Fan > >> --- > >> > >> V2: > >> Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud > >> > >> drivers/remoteproc/remoteproc_core.c | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/drivers/remoteproc/remoteproc_core.c > >> b/drivers/remoteproc/remoteproc_core.c > >> index c2cf0d277729..1efa53d4e0c3 100644 > >> --- a/drivers/remoteproc/remoteproc_core.c > >> +++ b/drivers/remoteproc/remoteproc_core.c > >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const > >> struct firmware *fw) > >>kfree(rproc->cached_table); > >>rproc->cached_table = NULL; > >>rproc->table_ptr = NULL; > >> + rproc->table_sz = 0; > >> unprepare_rproc: > >>/* release HW resources if needed */ > >>rproc_unprepare_device(rproc); > >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc) > >>kfree(rproc->cached_table); > >>rproc->cached_table = NULL; > >>rproc->table_ptr = NULL; > >> + rproc->table_sz = 0; > >> out: > >>mutex_unlock(&rproc->lock); > >>return ret; > >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc) > >>kfree(rproc->cached_table); > >>rproc->cached_table = NULL; > >>rproc->table_ptr = NULL; > >> + rproc->table_sz = 0; > >> out: > >>mutex_unlock(&rproc->lock); > >>return ret; > >> -- > >> 2.37.1 > >
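The bug class the patch closes — freeing a buffer and NULLing its pointer while the recorded size survives, so a later memcpy(dst, ptr, size) runs with ptr == NULL and size != 0 — is easy to model outside the kernel. A hypothetical sketch (field names mirror the remoteproc ones, but this is not the kernel code):

```c
#include <stdlib.h>
#include <string.h>

struct rsc {
	void *cached_table;
	size_t table_sz;
};

/* The invariant the patch restores: pointer and size are cleared
 * together on every teardown path. */
void rsc_teardown(struct rsc *r)
{
	free(r->cached_table);
	r->cached_table = NULL;
	r->table_sz = 0;	/* the line the patch adds */
}

/* Safe only because size and pointer are kept consistent: a NULL
 * pointer with a stale nonzero size would make memcpy() fault, which
 * is exactly the reported oops. */
size_t rsc_copy(struct rsc *r, void *dst)
{
	if (!r->cached_table)
		return 0;
	memcpy(dst, r->cached_table, r->table_sz);
	return r->table_sz;
}
```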
[PATCH net] vsock: avoid timeout during connect() if the socket is closing
From: Stefano Garzarella When a peer attempts to establish a connection, vsock_connect() contains a loop that waits for the state to be TCP_ESTABLISHED. However, the other peer can be fast enough to accept the connection and close it immediately, thus moving the state to TCP_CLOSING. When this happens, the peer in the vsock_connect() is properly woken up, but since the state is not TCP_ESTABLISHED, it goes back to sleep until the timeout expires, returning -ETIMEDOUT. If the socket state is TCP_CLOSING, waiting for the timeout is pointless. vsock_connect() can return immediately without errors or delay since the connection actually happened. The socket will be in a closing state, but this is not an issue, and subsequent calls will fail as expected. We discovered this issue while developing a test that accepts and immediately closes connections to stress the transport switch between two connect() calls, where the first one was interrupted by a signal (see Closes link). Reported-by: Luigi Leonardi Closes: https://lore.kernel.org/virtualization/bq6hxrolno2vmtqwcvb5bljfpb7mvwb3kohrvaed6auz5vxrfv@ijmd2f3grobn/ Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella --- net/vmw_vsock/af_vsock.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 7e3db87ae433..fc6afbc8d680 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1551,7 +1551,11 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr, timeout = vsk->connect_timeout; prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); - while (sk->sk_state != TCP_ESTABLISHED && sk->sk_err == 0) { + /* If the socket is already closing or it is in an error state, there +* is no point in waiting. 
+*/ + while (sk->sk_state != TCP_ESTABLISHED && + sk->sk_state != TCP_CLOSING && sk->sk_err == 0) { if (flags & O_NONBLOCK) { /* If we're not going to block, we schedule a timeout * function to generate a timeout on the connection -- 2.49.0
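The fix's core idea — the sleep loop must wake for every state that ends the wait, not only the success state — maps onto the classic condition-variable pattern. A userspace sketch with illustrative state names (not the af_vsock implementation, which uses sk_sleep() wait queues and a timeout):

```c
#include <pthread.h>

enum conn_state { CONNECTING, ESTABLISHED, CLOSING };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static enum conn_state state = CONNECTING;

/* The peer (or the loopback worker, in the vsock case) moves the
 * state and wakes any waiter. */
void set_state(enum conn_state s)
{
	pthread_mutex_lock(&lock);
	state = s;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
}

/* Returns the state that ended the wait. Mirrors the patched loop:
 * waiting only for ESTABLISHED would put us back to sleep when the
 * peer accepts and immediately closes (state == CLOSING), and we
 * would sit there until a timeout despite being woken. */
enum conn_state wait_connected(void)
{
	enum conn_state s;

	pthread_mutex_lock(&lock);
	while (state != ESTABLISHED && state != CLOSING)
		pthread_cond_wait(&cond, &lock);
	s = state;
	pthread_mutex_unlock(&lock);
	return s;
}
```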
Re: [PATCH v2 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc
On Fri, Mar 28, 2025 at 07:50:43PM +0200, Jarkko Sakkinen wrote: > On Fri, Mar 28, 2025 at 02:57:41PM +0200, Elena Reshetova wrote: > > SGX architecture introduced a new instruction called EUPDATESVN > > to Ice Lake. It allows updating security SVN version, given that EPC > > is completely empty. The latter is required for security reasons > > in order to reason that enclave security posture is as secure as the > > security SVN version of the TCB that created it. > > > > Additionally it is important to ensure that while ENCLS[EUPDATESVN] > > runs, no concurrent page creation happens in EPC, because it might > > result in #GP delivered to the creator. Legacy SW might not be prepared > > to handle such unexpected #GPs and therefore this patch introduces > > a locking mechanism to ensure no concurrent EPC allocations can happen. > > > > It is also ensured that ENCLS[EUPDATESVN] is not called when running > > in a VM since it does not have a meaning in this context (microcode > > updates application is limited to the host OS) and will create > > unnecessary load. > > > > This patch is based on previous submision by Cathy Zhang > > https://lore.kernel.org/all/20220520103904.1216-1-cathy.zh...@intel.com/ > > > > Signed-off-by: Elena Reshetova > > --- > > arch/x86/include/asm/sgx.h | 41 + > > arch/x86/kernel/cpu/sgx/encls.h | 6 > > arch/x86/kernel/cpu/sgx/main.c | 63 - > > arch/x86/kernel/cpu/sgx/sgx.h | 1 + > > 4 files changed, 95 insertions(+), 16 deletions(-) > > > > diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h > > index 6a0069761508..5caf5c31ebc6 100644 > > --- a/arch/x86/include/asm/sgx.h > > +++ b/arch/x86/include/asm/sgx.h > > @@ -26,23 +26,26 @@ > > #define SGX_CPUID_EPC_SECTION 0x1 > > /* The bitmask for the EPC section type. 
*/ > > #define SGX_CPUID_EPC_MASK GENMASK(3, 0) > > +/* EUPDATESVN presence indication */ > > +#define SGX_CPUID_EUPDATESVN BIT(10) > > > > enum sgx_encls_function { > > - ECREATE = 0x00, > > - EADD= 0x01, > > - EINIT = 0x02, > > - EREMOVE = 0x03, > > - EDGBRD = 0x04, > > - EDGBWR = 0x05, > > - EEXTEND = 0x06, > > - ELDU= 0x08, > > - EBLOCK = 0x09, > > - EPA = 0x0A, > > - EWB = 0x0B, > > - ETRACK = 0x0C, > > - EAUG= 0x0D, > > - EMODPR = 0x0E, > > - EMODT = 0x0F, > > + ECREATE = 0x00, > > + EADD= 0x01, > > + EINIT = 0x02, > > + EREMOVE = 0x03, > > + EDGBRD = 0x04, > > + EDGBWR = 0x05, > > + EEXTEND = 0x06, > > + ELDU= 0x08, > > + EBLOCK = 0x09, > > + EPA = 0x0A, > > + EWB = 0x0B, > > + ETRACK = 0x0C, > > + EAUG= 0x0D, > > + EMODPR = 0x0E, > > + EMODT = 0x0F, > > + EUPDATESVN = 0x18, > > }; > > > > /** > > @@ -73,6 +76,11 @@ enum sgx_encls_function { > > * public key does not match IA32_SGXLEPUBKEYHASH. > > * %SGX_PAGE_NOT_MODIFIABLE: The EPC page cannot be modified because > > it > > * is in the PENDING or MODIFIED state. > > + * %SGX_INSUFFICIENT_ENTROPY: Insufficient entropy in RNG. > > + * %SGX_EPC_NOT_READY: EPC is not ready for SVN update. > > + * %SGX_NO_UPDATE: EUPDATESVN was successful, but CPUSVN was not > > + * updated because current SVN was not newer than > > + * CPUSVN. > > * %SGX_UNMASKED_EVENT:An unmasked event, e.g. 
INTR, was > > received > > */ > > enum sgx_return_code { > > @@ -81,6 +89,9 @@ enum sgx_return_code { > > SGX_CHILD_PRESENT = 13, > > SGX_INVALID_EINITTOKEN = 16, > > SGX_PAGE_NOT_MODIFIABLE = 20, > > + SGX_INSUFFICIENT_ENTROPY= 29, > > + SGX_EPC_NOT_READY = 30, > > + SGX_NO_UPDATE = 31, > > SGX_UNMASKED_EVENT = 128, > > }; > > > > diff --git a/arch/x86/kernel/cpu/sgx/encls.h > > b/arch/x86/kernel/cpu/sgx/encls.h > > index 99004b02e2ed..3d83c76dc91f 100644 > > --- a/arch/x86/kernel/cpu/sgx/encls.h > > +++ b/arch/x86/kernel/cpu/sgx/encls.h > > @@ -233,4 +233,10 @@ static inline int __eaug(struct sgx_pageinfo *pginfo, > > void *addr) > > return __encls_2(EAUG, pginfo, addr); > > } > > > > +/* Update CPUSVN at runtime. */ > > +static inline int __eupdatesvn(void) > > +{ > > + return __encls_ret_1(EUPDATESVN, ""); > > +} > > + > > #endif /* _X86_ENCLS_H */ > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > > index b61d3bad0446..24563110811d 100644 > > --- a/arch/x86/kernel/cpu/sgx/main.c > > +++ b/arch/x86/kernel/cpu/sgx/main.c > > @@ -32,6 +32,11 @@ static DEFINE_XARRAY(sgx_epc_address_space); > > static LIST_HEAD(sgx_active_page_list); > > static DEFINE_SPINLOCK(sgx_reclaimer_lock); > > > > +/*
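The CPUID plumbing in the quoted sgx.h hunk is plain bit arithmetic. A userspace sketch (the macro values are copied from the patch; the helper functions and the sample EAX values are illustrative, not from the kernel) shows how the new EUPDATESVN bit would be tested against an SGX CPUID sub-leaf:

```c
#include <assert.h>
#include <stdint.h>

/* Bit helpers mirroring the kernel's BIT()/GENMASK() macros (32-bit here). */
#define BIT(n)         (1u << (n))
#define GENMASK(h, l)  (((~0u) >> (31 - (h))) & ~((1u << (l)) - 1))

/* Constants from the quoted patch. */
#define SGX_CPUID_EPC_SECTION  0x1
#define SGX_CPUID_EPC_MASK     GENMASK(3, 0)
#define SGX_CPUID_EUPDATESVN   BIT(10)

/* Decode the EPC section type from a CPUID sub-leaf EAX value. */
static inline uint32_t epc_section_type(uint32_t eax)
{
	return eax & SGX_CPUID_EPC_MASK;
}

/* Check whether the EUPDATESVN feature bit is advertised. */
static inline int has_eupdatesvn(uint32_t eax)
{
	return (eax & SGX_CPUID_EUPDATESVN) != 0;
}
```

With an EAX of 0x401 both the EPC-section type (low nibble) and the feature bit (bit 10) are set, which is the shape of value the driver would be probing for.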
Re: [PATCH v3 1/7] dt-bindings: input: syna,rmi4: document syna,pdt-fallback-desc
On 26/03/2025 11:26, Caleb Connolly wrote: On 3/26/25 07:57, Krzysztof Kozlowski wrote: On 25/03/2025 14:23, Caleb Connolly wrote: On 3/25/25 08:36, Krzysztof Kozlowski wrote: On 24/03/2025 19:00, David Heidelberg wrote: On 10/03/2025 10:45, Krzysztof Kozlowski wrote: On Sat, Mar 08, 2025 at 03:08:37PM +0100, David Heidelberg wrote: From: Caleb Connolly This new property allows devices to specify some register values which are missing on units with third party replacement displays. These displays use unofficial touch ICs which only implement a subset of the RMI4 specification. These are different ICs, so they have their own compatibles. Why this cannot be deduced from the compatible? Yes, but these identify as the originals. It does not matter how they identify. You have the compatible for them. If you cannot add compatible for them, how can you add dedicated property for them? Hi Krzysztof, There are an unknown number of knock-off RMI4 chips which are sold in cheap replacement display panels from multiple vendors. We suspect there's more than one implementation. A new compatible string wouldn't help us, since we use the same DTB on fully original hardware as on hardware with replacement parts. The proposed new property describes configuration registers which are present on original RMI4 chips but missing on the third party ones, the contents of the registers is static. So you want to add redundant information for existing compatible, while claiming you cannot deduce it from that existing compatible... Well, no.. you cannot be sure that only chosen boards will have touchscreens replaced, thus you will have to add this property to every board using this compatible making it equal to the compatible and we are back at my original comment. This is deducible from the compatible. If not the new one, then from old one. hmm I see, so instead we should add a compatible for the specific variant (S3320 or something) of RMI4 in this device and handle this in the driver? 
I think that makes sense. Agree, preparing it for v4. So far proposing `compatible = "syna,rmi4-s3706b-i2c", "syna,rmi4-i2c"` (as S3706B is written in the commit and search confirms it for OP6/6T). David> Best regards, Krzysztof -- David Heidelberg
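The fallback behaviour being proposed — match the specific `"syna,rmi4-s3706b-i2c"` string when the driver knows it, otherwise fall back to the generic `"syna,rmi4-i2c"` — can be sketched as a heavily simplified, hypothetical version of OF compatible matching (the real kernel logic in `of_match_node()` scores candidates; this sketch only captures the "most specific device compatible wins" idea):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Try each entry of the device's compatible list, most specific first,
 * against the driver's match table. Returns the first table entry that
 * matches, or NULL. Simplified sketch, not the kernel implementation.
 */
static const char *match_compatible(const char *const *dev_compat,
				    const char *const *table)
{
	for (; *dev_compat; dev_compat++)
		for (const char *const *t = table; *t; t++)
			if (!strcmp(*dev_compat, *t))
				return *t;
	return NULL;
}

/* Hypothetical OP6/6T device with the proposed two-entry compatible. */
static const char *pick_for_op6(void)
{
	static const char *const dev_compat[] = {
		"syna,rmi4-s3706b-i2c", "syna,rmi4-i2c", NULL };
	static const char *const driver_table[] = {
		"syna,rmi4-i2c", "syna,rmi4-s3706b-i2c", NULL };

	return match_compatible(dev_compat, driver_table);
}
```

Because the device list is walked most-specific-first, a driver that gains a `"syna,rmi4-s3706b-i2c"` entry picks it up automatically, while older drivers still bind via the generic fallback.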
Re: bug report for linux-6.14/tools/testing/selftests/mm/pagemap_ioctl.c
On 3/26/25 13:25, David Binderman wrote: Hello there, Static analyser cppcheck says: linux-6.14/tools/testing/selftests/mm/pagemap_ioctl.c:1061:11: style: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. [truncLongCastAssignment] linux-6.14/tools/testing/selftests/mm/pagemap_ioctl.c:1510:11: style: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. [truncLongCastAssignment] linux-6.14/tools/testing/selftests/mm/pagemap_ioctl.c:1523:11: style: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. [truncLongCastAssignment] linux-6.14/tools/testing/selftests/mm/pagemap_ioctl.c:247:11: style: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. [truncLongCastAssignment] linux-6.14/tools/testing/selftests/mm/pagemap_ioctl.c:435:11: style: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. [truncLongCastAssignment] linux-6.14/tools/testing/selftests/mm/pagemap_ioctl.c:490:11: style: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. [truncLongCastAssignment] The source code of the first one is mem_size = 10 * page_size; Maybe better code: mem_size = 10ULL * page_size; Regards David Binderman Can you send a patch for us to review? thanks, -- Shuah
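The truncation cppcheck flags is easy to demonstrate: with 32-bit arithmetic the product wraps before it is widened to the 64-bit destination, whereas the `ULL` suffix forces a 64-bit multiply from the start. The values below are illustrative (an unsigned page size of 512 MiB), not taken from the selftest:

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit multiply first, result then widened: wraps modulo 2^32. */
static long long scaled_narrow(unsigned int page_size)
{
	return 10u * page_size;
}

/* 64-bit multiply from the start, as the suggested fix does. */
static long long scaled_wide(unsigned int page_size)
{
	return 10ULL * page_size;
}
```

For `page_size = 0x20000000` (512 MiB), the narrow form yields 1 GiB after wraparound while the wide form yields the intended 5 GiB.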
Re: [PATCH] selftests/nolibc: drop unnecessary sys/io.h include
On 3/24/25 16:01, Thomas Weißschuh wrote: The include of sys/io.h is not necessary anymore since commit 67eb617a8e1e ("selftests/nolibc: simplify call to ioperm"). Its existence is also problematic as the header does not exist on all architectures. Reported-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Weißschuh --- tools/testing/selftests/nolibc/nolibc-test.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c index 5884a891c491544050fc35b07322c73a1a9dbaf3..7a60b6ac1457e8d862ab1a6a26c9e46abec92111 100644 --- a/tools/testing/selftests/nolibc/nolibc-test.c +++ b/tools/testing/selftests/nolibc/nolibc-test.c @@ -16,7 +16,6 @@ #ifndef _NOLIBC_STDIO_H /* standard libcs need more includes */ #include -#include #include #include #include --- base-commit: bceb73904c855c78402dca94c82915f078f259dd change-id: 20250324-nolibc-ioperm-155646560b95 Best regards, Acked-by: Shuah Khan thanks, -- Shuah
[PATCH v8 0/8] vhost: Add support of kthread API
In commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"), the vhost now uses vhost_task and operates as a child of the owner thread. This aligns with containerization principles. However, this change has caused confusion for some legacy userspace applications. Therefore, we are reintroducing support for the kthread API. In this series, a new UAPI is implemented to allow userspace applications to configure their thread mode. Changelog v2: 1. Change the module_param's name to enforce_inherit_owner, and the default value is true. 2. Change the UAPI's name to VHOST_SET_INHERIT_FROM_OWNER. Changelog v3: 1. Change the module_param's name to inherit_owner_default, and the default value is true. 2. Add a structure for task function; the worker will select a different mode based on the value inherit_owner. 3. devices will have their own inherit_owner in struct vhost_dev 4. Address other comments Changelog v4: 1. remove the module_param, only keep the UAPI 2. remove the structure for task function; change to use the function pointer in vhost_worker 3. fix the issue in vhost_worker_create and vhost_dev_ioctl 4. Address other comments Changelog v5: 1. Change wakeup and stop function pointers in struct vhost_worker to void. 2. merged patches 4, 5, 6 into a single patch 3. Fix spelling issues and address other comments. Changelog v6: 1. move the check of VHOST_NEW_WORKER from vhost_scsi to vhost 2. Change the ioctl name VHOST_SET_INHERIT_FROM_OWNER to VHOST_FORK_FROM_OWNER 3. reuse the function __vhost_worker_flush 4. use an ops struct to support worker-related functions 5. reset the value of inherit_owner in vhost_dev_reset_owner. Changelog v7: 1. add a KConfig knob to disable legacy app support 2. Split the changes into two patches to separately introduce the ops and add kthread support. 3. Utilized INT_MAX to avoid modifications in __vhost_worker_flush 4. Rebased on the latest kernel 5. Address other comments Changelog v8: 1. Rebased on the latest kernel 2.
Address some other comments Tested with QEMU with kthread mode/task mode/kthread+task mode Cindy Lu (8): vhost: Add a new parameter in vhost_dev to allow user select kthread vhost: Reintroduce vhost_worker to support kthread vhost: Add the cgroup related function vhost: Introduce vhost_worker_ops in vhost_worker vhost: Reintroduce kthread mode support in vhost vhost: uapi to control task mode (owner vs kthread) vhost: Add check for inherit_owner status vhost: Add a KConfig knob to enable IOCTL VHOST_FORK_FROM_OWNER drivers/vhost/Kconfig | 15 +++ drivers/vhost/vhost.c | 219 + drivers/vhost/vhost.h | 21 include/uapi/linux/vhost.h | 16 +++ 4 files changed, 252 insertions(+), 19 deletions(-) -- 2.45.0
[PATCH v8 7/8] vhost: Add check for inherit_owner status
The VHOST_NEW_WORKER requires the inherit_owner setting to be true. So we need to add a check for this. Signed-off-by: Cindy Lu --- drivers/vhost/vhost.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index ff930c2e5b78..fb0c7fb43f78 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1018,6 +1018,13 @@ long vhost_worker_ioctl(struct vhost_dev *dev, unsigned int ioctl, switch (ioctl) { /* dev worker ioctls */ case VHOST_NEW_WORKER: + /* +* vhost_tasks will account for worker threads under the parent's +* NPROC value but kthreads do not. To avoid userspace overflowing +* the system with worker threads inherit_owner must be true. +*/ + if (!dev->inherit_owner) + return -EFAULT; ret = vhost_new_worker(dev, &state); if (!ret && copy_to_user(argp, &state, sizeof(state))) ret = -EFAULT; -- 2.45.0
[PATCH v8 8/8] vhost: Add a KConfig knob to enable IOCTL VHOST_FORK_FROM_OWNER
Introduce a new config knob `CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL`, to control the availability of the `VHOST_FORK_FROM_OWNER` ioctl. When CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL is set to n, the ioctl is disabled, and any attempt to use it will result in failure. Signed-off-by: Cindy Lu --- drivers/vhost/Kconfig | 15 +++ drivers/vhost/vhost.c | 3 +++ 2 files changed, 18 insertions(+) diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index b455d9ab6f3d..e5b9dcbf31b6 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -95,3 +95,18 @@ config VHOST_CROSS_ENDIAN_LEGACY If unsure, say "N". endif + +config VHOST_ENABLE_FORK_OWNER_IOCTL + bool "Enable IOCTL VHOST_FORK_FROM_OWNER" + default n + help + This option enables the IOCTL VHOST_FORK_FROM_OWNER, which allows + userspace applications to modify the thread mode for vhost devices. + + By default, `CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL` is set to `n`, + meaning the ioctl is disabled and any operation using this ioctl + will fail. + When the configuration is enabled (y), the ioctl becomes + available, allowing users to set the mode if needed. + + If unsure, say "N". diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index fb0c7fb43f78..568e43cb54a9 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2294,6 +2294,8 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp) r = vhost_dev_set_owner(d); goto done; } + +#ifdef CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL if (ioctl == VHOST_FORK_FROM_OWNER) { u8 inherit_owner; /*inherit_owner can only be modified before owner is set*/ @@ -2313,6 +2315,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp) r = 0; goto done; } +#endif /* You must be the owner to do anything else */ r = vhost_dev_check_owner(d); if (r) -- 2.45.0
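The compile-time gating pattern used in this patch — the ioctl handler only recognizes the command when the Kconfig symbol is enabled, otherwise it falls through to the "unknown ioctl" path — can be sketched in userspace. The command number is taken from the series; the macro names and errno value are illustrative stand-ins for the real Kconfig symbol and `-ENOTTY`:

```c
#include <assert.h>

#define DEMO_FORK_FROM_OWNER 0x83

/* Flip this to 0 to mimic CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL=n */
#define DEMO_ENABLE_FORK_OWNER_IOCTL 1

static int demo_dev_ioctl(unsigned int ioctl)
{
#if DEMO_ENABLE_FORK_OWNER_IOCTL
	if (ioctl == DEMO_FORK_FROM_OWNER)
		return 0;   /* handled */
#endif
	return -25;         /* -ENOTTY: command not recognized */
}
```

With the knob off, userspace attempting the ioctl gets the same failure as for any unknown command, which is exactly the "any operation using this ioctl will fail" behaviour the help text describes.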
[PATCH v8 3/8] vhost: Add the cgroup related function
Add back the previously removed cgroup functions to support the kthread mode. The biggest change for this part is in vhost_attach_cgroups() and vhost_attach_task_to_cgroups(). The old functions were removed in commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") Signed-off-by: Cindy Lu --- drivers/vhost/vhost.c | 41 + 1 file changed, 41 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 9500e85b42ce..20571bd6f7bd 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -620,6 +621,46 @@ long vhost_dev_check_owner(struct vhost_dev *dev) } EXPORT_SYMBOL_GPL(vhost_dev_check_owner); +struct vhost_attach_cgroups_struct { + struct vhost_work work; + struct task_struct *owner; + int ret; +}; + +static void vhost_attach_cgroups_work(struct vhost_work *work) +{ + struct vhost_attach_cgroups_struct *s; + + s = container_of(work, struct vhost_attach_cgroups_struct, work); + s->ret = cgroup_attach_task_all(s->owner, current); +} + +static int vhost_attach_task_to_cgroups(struct vhost_worker *worker) +{ + struct vhost_attach_cgroups_struct attach; + int saved_cnt; + + attach.owner = current; + + vhost_work_init(&attach.work, vhost_attach_cgroups_work); + vhost_worker_queue(worker, &attach.work); + + mutex_lock(&worker->mutex); + + /* +* Bypass attachment_cnt check in __vhost_worker_flush: +* Temporarily change it to INT_MAX to bypass the check +*/ + saved_cnt = worker->attachment_cnt; + worker->attachment_cnt = INT_MAX; + __vhost_worker_flush(worker); + worker->attachment_cnt = saved_cnt; + + mutex_unlock(&worker->mutex); + + return attach.ret; +} + /* Caller should have device mutex */ bool vhost_dev_has_owner(struct vhost_dev *dev) { -- 2.45.0
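The callback in vhost_attach_cgroups_work() recovers its enclosing structure from the embedded work item with container_of(). A minimal userspace sketch of that pattern (types, the macro definition, and the return value 42 standing in for cgroup_attach_task_all() are all illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_work { int queued; };

struct demo_attach {
	struct demo_work work;   /* embedded member, as in the patch */
	int ret;
};

static void demo_work_fn(struct demo_work *w)
{
	/* Walk back from the embedded member to the enclosing struct. */
	struct demo_attach *a = container_of(w, struct demo_attach, work);

	a->ret = 42;             /* stand-in for cgroup_attach_task_all() */
}

static int demo_run(void)
{
	struct demo_attach a = { { 0 }, 0 };

	demo_work_fn(&a.work);
	return a.ret;
}
```

This is why the work item can be queued to the worker by address alone: the callback needs no extra context pointer, only the known layout of the enclosing struct.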
[PATCH v8 5/8] vhost: Reintroduce kthread mode support in vhost
This commit restores the previously removed kthread wake/stop/create functions and uses the ops structure vhost_worker_ops to manage worker wakeup, stop and creation. The function vhost_worker_create initializes these ops pointers based on the value of inherit_owner. The old functions were removed in commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") Signed-off-by: Cindy Lu --- drivers/vhost/vhost.c | 48 ++- drivers/vhost/vhost.h | 1 + 2 files changed, 48 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index c162ad772f8f..be97028a8baf 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -734,11 +734,21 @@ static void vhost_task_wakeup(struct vhost_worker *worker) return vhost_task_wake(worker->vtsk); } +static void vhost_kthread_wakeup(struct vhost_worker *worker) +{ + wake_up_process(worker->kthread_task); +} + static void vhost_task_do_stop(struct vhost_worker *worker) { return vhost_task_stop(worker->vtsk); } +static void vhost_kthread_do_stop(struct vhost_worker *worker) +{ + kthread_stop(worker->kthread_task); +} + static int vhost_task_worker_create(struct vhost_worker *worker, struct vhost_dev *dev, const char *name) { @@ -762,6 +772,41 @@ static int vhost_task_worker_create(struct vhost_worker *worker, return 0; } +static int vhost_kthread_worker_create(struct vhost_worker *worker, + struct vhost_dev *dev, const char *name) +{ + struct task_struct *task; + u32 id; + int ret; + + task = kthread_create(vhost_run_work_kthread_list, worker, "%s", name); + if (IS_ERR(task)) + return PTR_ERR(task); + + worker->kthread_task = task; + wake_up_process(task); + ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL); + if (ret < 0) + goto stop_worker; + + ret = vhost_attach_task_to_cgroups(worker); + if (ret) + goto stop_worker; + + worker->id = id; + return 0; + +stop_worker: + vhost_kthread_do_stop(worker); + return ret; +} + +static const struct vhost_worker_ops kthread_ops = { + .create =
vhost_kthread_worker_create, + .stop = vhost_kthread_do_stop, + .wakeup = vhost_kthread_wakeup, +}; + static const struct vhost_worker_ops vhost_task_ops = { .create = vhost_task_worker_create, .stop = vhost_task_do_stop, @@ -773,7 +818,8 @@ static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev) struct vhost_worker *worker; char name[TASK_COMM_LEN]; int ret; - const struct vhost_worker_ops *ops = &vhost_task_ops; + const struct vhost_worker_ops *ops = + dev->inherit_owner ? &vhost_task_ops : &kthread_ops; worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT); if (!worker) diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 98895e299efa..af4b2f7d3b91 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -37,6 +37,7 @@ struct vhost_worker_ops { }; struct vhost_worker { + struct task_struct *kthread_task; struct vhost_task *vtsk; struct vhost_dev*dev; /* Used to serialize device wide flushing with worker swapping. */ -- 2.45.0
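The ops-structure selection done in vhost_worker_create() — one function-pointer table per worker mode, chosen by the inherit_owner flag — is a classic C vtable pattern. A self-contained sketch (names and bodies are illustrative; the real callbacks wake and stop tasks rather than record a string):

```c
#include <assert.h>
#include <string.h>

struct demo_worker { const char *mode; };

struct demo_worker_ops {
	void (*wakeup)(struct demo_worker *w);
};

static void demo_task_wakeup(struct demo_worker *w)    { w->mode = "task"; }
static void demo_kthread_wakeup(struct demo_worker *w) { w->mode = "kthread"; }

static const struct demo_worker_ops demo_task_ops    = { demo_task_wakeup };
static const struct demo_worker_ops demo_kthread_ops = { demo_kthread_wakeup };

/* Mirrors the ternary in vhost_worker_create(): pick ops by flag. */
static const char *demo_wakeup(int inherit_owner)
{
	const struct demo_worker_ops *ops =
		inherit_owner ? &demo_task_ops : &demo_kthread_ops;
	struct demo_worker w = { 0 };

	ops->wakeup(&w);
	return w.mode;
}
```

The advantage over scattering `if (inherit_owner)` checks is that the mode decision is made exactly once, at worker creation, and every later call site just dereferences the ops pointer.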
[PATCH v8 6/8] vhost: uapi to control task mode (owner vs kthread)
Add a new UAPI to configure the vhost device to use the kthread mode. The userspace application can use the IOCTL VHOST_FORK_FROM_OWNER to choose between owner and kthread mode if necessary. This setting must be applied before VHOST_SET_OWNER, as the worker is created in the VHOST_SET_OWNER function. Signed-off-by: Cindy Lu --- drivers/vhost/vhost.c | 22 -- include/uapi/linux/vhost.h | 16 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index be97028a8baf..ff930c2e5b78 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1134,7 +1134,7 @@ void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem) int i; vhost_dev_cleanup(dev); - + dev->inherit_owner = true; dev->umem = umem; /* We don't need VQ locks below since vhost_dev_cleanup makes sure * VQs aren't running. @@ -2287,7 +2287,25 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp) r = vhost_dev_set_owner(d); goto done; } - + if (ioctl == VHOST_FORK_FROM_OWNER) { + u8 inherit_owner; + /*inherit_owner can only be modified before owner is set*/ + if (vhost_dev_has_owner(d)) { + r = -EBUSY; + goto done; + } + if (copy_from_user(&inherit_owner, argp, sizeof(u8))) { + r = -EFAULT; + goto done; + } + if (inherit_owner > 1) { + r = -EINVAL; + goto done; + } + d->inherit_owner = (bool)inherit_owner; + r = 0; + goto done; + } /* You must be the owner to do anything else */ r = vhost_dev_check_owner(d); if (r) diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h index b95dd84eef2d..1ae0917bfeca 100644 --- a/include/uapi/linux/vhost.h +++ b/include/uapi/linux/vhost.h @@ -235,4 +235,20 @@ */ #define VHOST_VDPA_GET_VRING_SIZE _IOWR(VHOST_VIRTIO, 0x82, \ struct vhost_vring_state) + +/** + * VHOST_FORK_FROM_OWNER - Set the inherit_owner flag for the vhost device, + * This ioctl must be called before VHOST_SET_OWNER.
+ * + * @param inherit_owner: An 8-bit value that determines the vhost thread mode + * + * When inherit_owner is set to 1(default value): + * - Vhost will create tasks similar to processes forked from the owner, + * inheriting all of the owner's attributes. + * + * When inherit_owner is set to 0: + * - Vhost will create tasks as kernel thread. + */ +#define VHOST_FORK_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, __u8) + #endif -- 2.45.0
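The validation order in the VHOST_FORK_FROM_OWNER handler — reject once an owner exists, then reject any value other than 0 or 1 — can be expressed as a pure function. The errno numbers below are the usual Linux values for EBUSY/EINVAL; the function and macro names are illustrative:

```c
#include <assert.h>

#define DEMO_EBUSY  16
#define DEMO_EINVAL 22

/* Mirrors the checks in the quoted vhost_dev_ioctl() hunk. */
static int demo_set_inherit_owner(int has_owner, unsigned char val,
				  int *inherit_owner)
{
	if (has_owner)
		return -DEMO_EBUSY;   /* only settable before VHOST_SET_OWNER */
	if (val > 1)
		return -DEMO_EINVAL;  /* flag is strictly 0 or 1 */
	*inherit_owner = val;
	return 0;
}

/* Exercise the three paths; returns 0 when all behave as expected. */
static int demo_scenario(void)
{
	int io = 1;

	if (demo_set_inherit_owner(1, 0, &io) != -DEMO_EBUSY)
		return 1;
	if (demo_set_inherit_owner(0, 2, &io) != -DEMO_EINVAL)
		return 2;
	if (demo_set_inherit_owner(0, 0, &io) != 0 || io != 0)
		return 3;
	return 0;
}
```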
[PATCH v8 2/8] vhost: Reintroduce vhost_worker to support kthread
Add the previously removed function vhost_worker() back to support kthreads and rename it to vhost_run_work_kthread_list. The old vhost_worker() function was changed to support vhost_task in commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") and changed to an xarray in commit 1cdaafa1b8b4 ("vhost: replace single worker pointer with xarray") Signed-off-by: Cindy Lu --- drivers/vhost/vhost.c | 38 ++ 1 file changed, 38 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 250dc43f1786..9500e85b42ce 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -388,6 +388,44 @@ static void vhost_vq_reset(struct vhost_dev *dev, __vhost_vq_meta_reset(vq); } +static int vhost_run_work_kthread_list(void *data) +{ + struct vhost_worker *worker = data; + struct vhost_work *work, *work_next; + struct vhost_dev *dev = worker->dev; + struct llist_node *node; + + kthread_use_mm(dev->mm); + + for (;;) { + /* mb paired w/ kthread_stop */ + set_current_state(TASK_INTERRUPTIBLE); + + if (kthread_should_stop()) { + __set_current_state(TASK_RUNNING); + break; + } + node = llist_del_all(&worker->work_list); + if (!node) + schedule(); + + node = llist_reverse_order(node); + /* make sure flag is seen after deletion */ + smp_wmb(); + llist_for_each_entry_safe(work, work_next, node, node) { + clear_bit(VHOST_WORK_QUEUED, &work->flags); + __set_current_state(TASK_RUNNING); + kcov_remote_start_common(worker->kcov_handle); + work->fn(work); + kcov_remote_stop(); + cond_resched(); + } + } + kthread_unuse_mm(dev->mm); + + return 0; +} + static bool vhost_run_work_list(void *data) { struct vhost_worker *worker = data; -- 2.45.0
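In the loop above, llist_del_all() hands the worker its queued items in LIFO order, which is why the code calls llist_reverse_order() before processing, restoring submission order. A minimal singly-linked reversal showing the same idea (node type and values are illustrative, not the kernel's llist):

```c
#include <assert.h>

struct demo_node { int val; struct demo_node *next; };

/* Classic in-place list reversal, as llist_reverse_order() does. */
static struct demo_node *demo_reverse(struct demo_node *head)
{
	struct demo_node *prev = 0;

	while (head) {
		struct demo_node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}

/* Items pushed 1, 2, 3 come off the lock-free list as 3 -> 2 -> 1;
 * after reversal the worker sees item 1 first, i.e. FIFO order. */
static int demo_first_after_reverse(void)
{
	static struct demo_node n1 = { 1, 0 }, n2 = { 2, 0 }, n3 = { 3, 0 };

	n3.next = &n2;
	n2.next = &n1;
	n1.next = 0;
	return demo_reverse(&n3)->val;
}
```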
[PATCH v8 1/8] vhost: Add a new parameter in vhost_dev to allow user select kthread
The vhost now uses vhost_task and workers as a child of the owner thread. While this aligns with containerization principles, it confuses some legacy userspace applications. Therefore, we are reintroducing kthread API support. Introduce a new parameter to enable users to choose between kthread and task mode. Signed-off-by: Cindy Lu --- drivers/vhost/vhost.c | 1 + drivers/vhost/vhost.h | 9 + 2 files changed, 10 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 63612faeab72..250dc43f1786 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -552,6 +552,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->byte_weight = byte_weight; dev->use_worker = use_worker; dev->msg_handler = msg_handler; + dev->inherit_owner = true; init_waitqueue_head(&dev->wait); INIT_LIST_HEAD(&dev->read_list); INIT_LIST_HEAD(&dev->pending_list); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index bb75a292d50c..19bb94922a0e 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -176,6 +176,15 @@ struct vhost_dev { int byte_weight; struct xarray worker_xa; bool use_worker; + /* +* If inherit_owner is true we use vhost_tasks to create +* the worker so all settings/limits like cgroups, NPROC, +* scheduler, etc are inherited from the owner. If false, +* we use kthreads and only attach to the same cgroups +* as the owner for compat with older kernels. +* here we use true as default value +*/ + bool inherit_owner; int (*msg_handler)(struct vhost_dev *dev, u32 asid, struct vhost_iotlb_msg *msg); }; -- 2.45.0
Re: Symbol too long for allsyms warnings on KSYM_NAME_LEN
On Thu, Mar 27, 2025, at 14:58, Peter Zijlstra wrote: > On Thu, Mar 27, 2025 at 09:38:46AM +0100, Arnd Bergmann wrote: >> My randconfig builds sometimes (around one in every 700 configs) run >> into this warning on x86: >> >> Symbol >> __pfx_sg1h2i3j4k5l6m7ng1h2i3j4k5l6m7nng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nnng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nng1h2i3j4k5l6m7ng1h2i3j4k5l6m7g1h2i3j4k5l6m7ng1h2i3j4k5l6m7nng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nnng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nng1h2i3j4k5l6m7ng1h2i3j4k5l6m7ng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nnng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nng1h2i3j4k5l6m7ng1h2i3j4k5l6m7g1h2i3j4k5l6m7ng1h2i3j4k5l6m7nng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nnng1h2i3j4k5l6m7ng1h2i3j4k5l6m7nng1h2i3j4k5l6m7ng1h2i3j4k5l6m7n >> too long for kallsyms (517 >= 512). >> Please increase KSYM_NAME_LEN both in kernel and kallsyms.c >> >> The check that gets triggered was added in commit c104c16073b >> ("Kunit to check the longest symbol length"), see >> https://lore.kernel.org/all/20241117195923.222145-1-sergio.coll...@gmail.com/ >> >> and the overlong identifier seems to be the result of objtool adding >> the six-byte "__pfx_" string to a symbol in elf_create_prefix_symbol() >> when CONFIG_FUNCTION_PADDING_CFI is set. >> >> I think the suggestion to "Please increase KSYM_NAME_LEN both in >> kernel and kallsyms.c" is misleading here and should probably be >> changed. I don't know if this something that objtool should work >> around, or something that needs to be adapted in the test. > > Probably test needs to be fixed; objtool can't really do anything here, > it just take the existing symname and prefixes it. I found a workaround that avoids the problem for me now, see https://lore.kernel.org/linux-kbuild/20250328112156.2614513-1-a...@kernel.org/ Arnd
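The failing check reduces to simple string arithmetic: objtool prepends the six-byte "__pfx_" string, so a symbol whose name is already near the limit overflows after prefixing. A sketch reproducing the numbers from the warning (the limit matches the report; the helper names are illustrative):

```c
#include <assert.h>
#include <string.h>

#define KSYM_NAME_LEN 512
#define PFX "__pfx_"

/* Length of the symbol after objtool adds the prefix. */
static int prefixed_len(const char *sym)
{
	return (int)(strlen(PFX) + strlen(sym));
}

static int fits_kallsyms(const char *sym)
{
	return prefixed_len(sym) < KSYM_NAME_LEN;
}

/* A 511-character base name, like the randconfig symbol in the report:
 * 511 + 6 = 517, triggering "517 >= 512". */
static int overlong_example(void)
{
	char sym[512];

	memset(sym, 'g', 511);
	sym[511] = '\0';
	return prefixed_len(sym);
}
```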
Re: [PATCH 1/4] x86/sgx: Add total number of EPC pages
On Fri, Mar 28, 2025 at 10:42:18AM +0200, Jarkko Sakkinen wrote: > In your code example you had a loop inside spinlock, which was based on > a return code of an opcode, i.e. potentially infinite loop. > > I'd like to remind you that the hardware I have is NUC7 from 2018 so > you really have to nail how things will work semantically as I can > only think these things only in theoretical level ;-) [1] That said, I do execute these on the NUC7, but it is getting a bit old. The cheapest hardware I've heard of is a Xeon E-2334, but even that with a case etc. is nearing 2k in price. BR, Jarkko
Re: [PATCH net-next v24 00/23] Introducing OpenVPN Data Channel Offload
Hi Sabrina, do you plan to leave more comments on this patchset at this point? I have gone through all the requested changes and will get the patches ready for submission once net-next is open again. Thanks a lot! Cheers, On 18/03/2025 02:40, Antonio Quartulli wrote: Notable changes since v23: * dropped call to netif_tx_start/stop_all_queues() * dropped NETIF_F_HW_CSUM and NETIF_F_RXCSUM dev flags * dropped conditional call to skb_checksum_help() due to the point above * added call to dst_cache_reset() in nl_peer_modify() * dropped obsolete comment in ovpn_peer_keepalive_work() * reversed scheduling delay computation in ovpn_peer_keepalive_work() Please note that some patches were already reviewed/tested by a few people. These patches have retained the tags as they have hardly been touched. The latest code can also be found at: https://github.com/OpenVPN/ovpn-net-next Thanks a lot! Best Regards, Antonio Quartulli OpenVPN Inc. --- Antonio Quartulli (23): net: introduce OpenVPN Data Channel Offload (ovpn) ovpn: add basic netlink support ovpn: add basic interface creation/destruction/management routines ovpn: keep carrier always on for MP interfaces ovpn: introduce the ovpn_peer object ovpn: introduce the ovpn_socket object ovpn: implement basic TX path (UDP) ovpn: implement basic RX path (UDP) ovpn: implement packet processing ovpn: store tunnel and transport statistics ovpn: implement TCP transport skb: implement skb_send_sock_locked_with_flags() ovpn: add support for MSG_NOSIGNAL in tcp_sendmsg ovpn: implement multi-peer support ovpn: implement peer lookup logic ovpn: implement keepalive mechanism ovpn: add support for updating local or remote UDP endpoint ovpn: implement peer add/get/dump/delete via netlink ovpn: implement key add/get/del/swap via netlink ovpn: kill key and notify userspace in case of IV exhaustion ovpn: notify userspace when a peer is deleted ovpn: add basic ethtool support testing/selftests: add test tool and scripts for ovpn module
Documentation/netlink/specs/ovpn.yaml | 367 +++ Documentation/netlink/specs/rt_link.yaml | 16 + MAINTAINERS| 11 + drivers/net/Kconfig| 15 + drivers/net/Makefile |1 + drivers/net/ovpn/Makefile | 22 + drivers/net/ovpn/bind.c| 55 + drivers/net/ovpn/bind.h| 101 + drivers/net/ovpn/crypto.c | 211 ++ drivers/net/ovpn/crypto.h | 145 ++ drivers/net/ovpn/crypto_aead.c | 409 drivers/net/ovpn/crypto_aead.h | 29 + drivers/net/ovpn/io.c | 455 drivers/net/ovpn/io.h | 34 + drivers/net/ovpn/main.c| 330 +++ drivers/net/ovpn/main.h| 14 + drivers/net/ovpn/netlink-gen.c | 213 ++ drivers/net/ovpn/netlink-gen.h | 41 + drivers/net/ovpn/netlink.c | 1250 ++ drivers/net/ovpn/netlink.h | 18 + drivers/net/ovpn/ovpnpriv.h| 57 + drivers/net/ovpn/peer.c| 1364 +++ drivers/net/ovpn/peer.h| 163 ++ drivers/net/ovpn/pktid.c | 129 ++ drivers/net/ovpn/pktid.h | 87 + drivers/net/ovpn/proto.h | 118 + drivers/net/ovpn/skb.h | 61 + drivers/net/ovpn/socket.c | 244 ++ drivers/net/ovpn/socket.h | 49 + drivers/net/ovpn/stats.c | 21 + drivers/net/ovpn/stats.h | 47 + drivers/net/ovpn/tcp.c | 592 + drivers/net/ovpn/tcp.h | 36 + drivers/net/ovpn/udp.c | 442 drivers/net/ovpn/udp.h | 25 + include/linux/skbuff.h |2 + include/uapi/linux/if_link.h | 15 + include/uapi/linux/ovpn.h | 109 + include/uapi/linux/udp.h |1 + net/core/skbuff.c | 18 +- net/ipv6/af_inet6.c|1 + net/ipv6/udp.c |1 + tools/testing/selftests/Makefile |1 + tools/testing/selftests/net/ovpn/.gitignore|2 + tools/testing/selftests/net/ovpn/Makefile | 31 + t
RE: [PATCH 4/4] x86/sgx: Implement ENCLS[EUPDATESVN] and opportunistically call it during first EPC page alloc
> On Thu, Mar 27, 2025 at 03:42:30PM +, Reshetova, Elena wrote: > > > > > > + case SGX_NO_UPDATE: > > > > > > + pr_debug("EUPDATESVN was successful, but CPUSVN > was not > > > > > updated, " > > > > > > + "because current SVN was not newer than > > > > > CPUSVN.\n"); > > > > > > + break; > > > > > > + case SGX_EPC_NOT_READY: > > > > > > + pr_debug("EPC is not ready for SVN update."); > > > > > > + break; > > > > > > + case SGX_INSUFFICIENT_ENTROPY: > > > > > > + pr_debug("CPUSVN update is failed due to Insufficient > > > > > entropy in RNG, " > > > > > > + "please try it later.\n"); > > > > > > + break; > > > > > > + case SGX_EPC_PAGE_CONFLICT: > > > > > > + pr_debug("CPUSVN update is failed due to > concurrency > > > > > violation, please " > > > > > > + "stop running any other ENCLS leaf and try it > > > > > later.\n"); > > > > > > + break; > > > > > > + default: > > > > > > + break; > > > > > > > > > > Remove pr_debug() statements. > > > > > > > > This I am not sure it is good idea. I think it would be useful for > > > > system > > > > admins to have a way to see that update either happened or not. > > > > It is true that you can find this out by requesting a new SGX > > > > attestation > > > > quote (and see if newer SVN is used), but it is not the faster way. > > > > > > Maybe pr_debug() is them wrong level if they are meant for sysadmins? > > > > > > I mean these should not happen in normal behavior like ever? As > > > pr_debug() I don't really grab this. > > > > SGX_NO_UPDATE will absolutely happen normally all the time. > > Since EUPDATESVN is executed every time EPC is empty, this is the > > most common code you will get back (because microcode updates are rare). > > Others yes, that would indicate some error condition. > > So, what is the pr_level that you would suggest? > > Right, got it. 
That changes my conclusions: > > So I'd reformulate it like: > > switch (ret) { > case 0: > pr_info("EUPDATESVN: success\n"); > break; > case SGX_EPC_NOT_READY: > case SGX_INSUFFICIENT_ENTROPY: > case SGX_EPC_PAGE_CONFLICT: > pr_err("EUPDATESVN: error %d\n", ret); > /* TODO: block/teardown driver? */ > break; > case SGX_NO_UPDATE: > break; > default: > pr_err("EUPDATESVN: unknown error %d\n", ret); > /* TODO: block/teardown driver? */ > break; > } > > Since when this is executed EPC usage is zero error cases should block > or teardown SGX driver, presuming that they are because of either > incorrect driver state or spurious error code. I agree with the above, but I am not at all sure about blocking/tearing down the driver. These are all potentially temporary conditions, and SGX_INSUFFICIENT_ENTROPY is even outside the SGX driver's control and *does not* indicate any error condition on the driver side itself. SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT would mean we have a bug somewhere, because we thought we could go do EUPDATESVN on an empty EPC and prevented anyone from creating pages in the meanwhile, but it looks like we missed something. That said, I don't know if we want to fail the whole system in case we have such a code bug; this is very aggressive (in case it is some rare edge condition that no one knew about or guessed). So, I would propose to print the pr_err() as you have above but avoid destroying the driver. Would this work? Best Regards, Elena. > > If this happens, we definitely do not want service, right? > > I'm not sure of all error codes how serious they are, or are all of them > consequence of incorrectly working driver. > > BR, Jarkko
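The handling being converged on — SGX_NO_UPDATE is the normal "microcode unchanged" outcome, success is rare but fine, and everything else is logged as an error without tearing the driver down — can be written as a small classifier. The return-code values are taken from the quoted sgx.h hunk; the enum and function names are illustrative:

```c
#include <assert.h>

/* Values from the quoted patch. */
#define SGX_INSUFFICIENT_ENTROPY 29
#define SGX_EPC_NOT_READY        30
#define SGX_NO_UPDATE            31

enum demo_outcome { DEMO_UPDATED, DEMO_NOOP, DEMO_ERROR };

static enum demo_outcome demo_classify(int ret)
{
	switch (ret) {
	case 0:
		return DEMO_UPDATED;   /* CPUSVN actually bumped */
	case SGX_NO_UPDATE:
		return DEMO_NOOP;      /* expected common case */
	case SGX_EPC_NOT_READY:
	case SGX_INSUFFICIENT_ENTROPY:
	default:
		return DEMO_ERROR;     /* log it, keep the driver alive */
	}
}
```

Keeping SGX_NO_UPDATE out of the error bucket is the key point of the thread: it fires on every EPC-empty EUPDATESVN attempt where no microcode update has landed, so treating it as an error would spam the log on perfectly healthy systems.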
Re: [RFC PATCH v3 5/8] KVM: arm64: Introduce module param to partition the PMU
On 13/02/2025 6:03 pm, Colton Lewis wrote:
> For PMUv3, the register MDCR_EL2.HPMN partitions the PMU counters into
> two ranges, where counters 0..HPMN-1 are accessible by EL1 and, if
> allowed, EL0, while counters HPMN..N are only accessible by EL2.
>
> Introduce a module parameter in KVM to set this register. The name
> reserved_host_counters reflects the intent to reserve some counters
> for the host, so the guest may eventually be allowed direct access to
> a subset of PMU functionality for increased performance.
>
> Track HPMN and whether the PMU is partitioned in struct arm_pmu,
> because both KVM and the PMUv3 driver will need to know that to handle
> guests correctly.
>
> Due to the difficulty this feature would create for the driver running
> at EL1 on the host, partitioning is only allowed in VHE mode. Making
> it work in nVHE mode would require a hypercall for every register
> access, because the counters reserved for the host by HPMN are now
> only accessible to EL2.
>
> The parameter is only configurable at boot time. Making the parameter
> configurable on a running system is dangerous, due to the difficulty
> of knowing for sure that no counters are in use anywhere, so that it
> is safe to reprogram HPMN.

Hi Colton,

As some high-level feedback for the RFC, it probably makes sense to
include the other half of the feature at the same time. I think there is
a risk that it requires something slightly different from what's here,
and there ends up being some churn. Other than that, I think it looks OK
apart from some minor code review nits.

I was also thinking about how BRBE interacts with this. Alex has done
some analysis which finds that it's difficult to use BRBE in guests with
virtualized counters, because BRBE freezes on any counter overflow,
rather than just guest ones. That leaves the guest with branch blackout
windows in the delay between a host counter overflowing and the
interrupt being taken and BRBE being restarted.
But with HPMN, BRBE does allow freezing on overflow of only one
partition or the other (or both, but I don't think we'd want that).
E.g. rule RNXCWF:

  If EL2 is implemented, a BRBE freeze event occurs when all of the
  following are true:

  * BRBCR_EL1.FZP is 1.
  * Generation of Branch records is not paused.
  * PMOVSCLR_EL0[(MDCR_EL2.HPMN-1):0] is nonzero.
  * The PE is in a BRBE Non-prohibited region.

Unfortunately that means we could only let guests use BRBE with a
partitioned PMU, which would massively reduce flexibility if hosts have
to lose counters just so the guest can use BRBE.

I don't know if this is a stupid idea, but instead of having a fixed
number for the partition, wouldn't it be nice if we could trap and
increment HPMN on the first guest use of a counter, then decrement it on
guest exit depending on what's still in use? The host would always
assign its counters from the top down, and guests would go bottom up if
they want PMU passthrough.

Maybe it's too complicated or won't work for various reasons, but
because of BRBE, the counter partitioning changes go from an
optimization to almost a necessity.
> Signed-off-by: Colton Lewis
> ---
>  arch/arm64/include/asm/kvm_pmu.h |  4 +++
>  arch/arm64/kvm/Makefile          |  2 +-
>  arch/arm64/kvm/debug.c           |  9 --
>  arch/arm64/kvm/pmu-part.c        | 47
>  arch/arm64/kvm/pmu.c             |  2 ++
>  include/linux/perf/arm_pmu.h     |  2 ++
>  6 files changed, 62 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/kvm/pmu-part.c
>
> diff --git a/arch/arm64/include/asm/kvm_pmu.h b/arch/arm64/include/asm/kvm_pmu.h
> index 613cddbdbdd8..174b7f376d95 100644
> --- a/arch/arm64/include/asm/kvm_pmu.h
> +++ b/arch/arm64/include/asm/kvm_pmu.h
> @@ -22,6 +22,10 @@ bool kvm_set_pmuserenr(u64 val);
>  void kvm_vcpu_pmu_resync_el0(void);
>  void kvm_host_pmu_init(struct arm_pmu *pmu);
>
> +u8 kvm_pmu_get_reserved_counters(void);
> +u8 kvm_pmu_hpmn(u8 nr_counters);
> +void kvm_pmu_partition(struct arm_pmu *pmu);
> +
>  #else
>
>  static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
>
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 3cf7adb2b503..065a6b804c84 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -25,7 +25,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
>  	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
>  	 vgic/vgic-its.o vgic/vgic-debug.o
>
> -kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
> +kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu-part.o pmu.o
>  kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
>  kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o
>
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 7fb1d9e7180f..b5ac5a213877 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -31,15 +31,18 @@
>   */
>  static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
>  {
> +	u8 counters = *host_data_ptr(nr_event_counters);
> +	u8 hpmn = kvm_pmu_hpmn(counters);
> +
>  	preempt_disable();

Would you no
[PATCH net 4/4] selftests: mptcp: ignore mptcp_diag binary
A new binary is now generated by the MPTCP selftests: mptcp_diag.

Like the other binaries from this directory, there is no need to track
this in Git, it should then be ignored.

Fixes: 00f5e338cf7e ("selftests: mptcp: Add a tool to get specific msk_info")
Reviewed-by: Mat Martineau
Signed-off-by: Matthieu Baerts (NGI0)
---
 tools/testing/selftests/net/mptcp/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/net/mptcp/.gitignore b/tools/testing/selftests/net/mptcp/.gitignore
index 49daae73c41e6f86c6f0e47aa42426e5ad5c17e6..833279fb34e2dd74a27f16c26e44108029dd45e1 100644
--- a/tools/testing/selftests/net/mptcp/.gitignore
+++ b/tools/testing/selftests/net/mptcp/.gitignore
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 mptcp_connect
+mptcp_diag
 mptcp_inq
 mptcp_sockopt
 pm_nl_ctl

-- 
2.48.1
[PATCH v2 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc
SGX architecture introduced a new instruction called EUPDATESVN to Ice
Lake. It allows updating the security SVN version, given that the EPC is
completely empty. The latter is required for security reasons, in order
to reason that the enclave security posture is as secure as the security
SVN version of the TCB that created it.

Additionally, it is important to ensure that while ENCLS[EUPDATESVN]
runs, no concurrent page creation happens in the EPC, because it might
result in a #GP delivered to the creator. Legacy SW might not be
prepared to handle such unexpected #GPs, and therefore this patch
introduces a locking mechanism to ensure no concurrent EPC allocations
can happen.

It is also ensured that ENCLS[EUPDATESVN] is not called when running in
a VM, since it has no meaning in this context (application of microcode
updates is limited to the host OS) and would create unnecessary load.

This patch is based on a previous submission by Cathy Zhang:
https://lore.kernel.org/all/20220520103904.1216-1-cathy.zh...@intel.com/

Signed-off-by: Elena Reshetova
---
 arch/x86/include/asm/sgx.h      | 41 +
 arch/x86/kernel/cpu/sgx/encls.h |  6
 arch/x86/kernel/cpu/sgx/main.c  | 63 -
 arch/x86/kernel/cpu/sgx/sgx.h   |  1 +
 4 files changed, 95 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
index 6a0069761508..5caf5c31ebc6 100644
--- a/arch/x86/include/asm/sgx.h
+++ b/arch/x86/include/asm/sgx.h
@@ -26,23 +26,26 @@
 #define SGX_CPUID_EPC_SECTION	0x1
 /* The bitmask for the EPC section type.
 */
 #define SGX_CPUID_EPC_MASK	GENMASK(3, 0)
 
+/* EUPDATESVN presence indication */
+#define SGX_CPUID_EUPDATESVN	BIT(10)
 
 enum sgx_encls_function {
-	ECREATE	= 0x00,
-	EADD	= 0x01,
-	EINIT	= 0x02,
-	EREMOVE	= 0x03,
-	EDGBRD	= 0x04,
-	EDGBWR	= 0x05,
-	EEXTEND	= 0x06,
-	ELDU	= 0x08,
-	EBLOCK	= 0x09,
-	EPA	= 0x0A,
-	EWB	= 0x0B,
-	ETRACK	= 0x0C,
-	EAUG	= 0x0D,
-	EMODPR	= 0x0E,
-	EMODT	= 0x0F,
+	ECREATE		= 0x00,
+	EADD		= 0x01,
+	EINIT		= 0x02,
+	EREMOVE		= 0x03,
+	EDGBRD		= 0x04,
+	EDGBWR		= 0x05,
+	EEXTEND		= 0x06,
+	ELDU		= 0x08,
+	EBLOCK		= 0x09,
+	EPA		= 0x0A,
+	EWB		= 0x0B,
+	ETRACK		= 0x0C,
+	EAUG		= 0x0D,
+	EMODPR		= 0x0E,
+	EMODT		= 0x0F,
+	EUPDATESVN	= 0x18,
 };
 
 /**
@@ -73,6 +76,11 @@ enum sgx_encls_function {
  *				public key does not match IA32_SGXLEPUBKEYHASH.
  * %SGX_PAGE_NOT_MODIFIABLE:	The EPC page cannot be modified because it
  *				is in the PENDING or MODIFIED state.
+ * %SGX_INSUFFICIENT_ENTROPY:	Insufficient entropy in RNG.
+ * %SGX_EPC_NOT_READY:		EPC is not ready for SVN update.
+ * %SGX_NO_UPDATE:		EUPDATESVN was successful, but CPUSVN was not
+ *				updated because current SVN was not newer than
+ *				CPUSVN.
  * %SGX_UNMASKED_EVENT:	An unmasked event, e.g. INTR, was received
  */
 enum sgx_return_code {
@@ -81,6 +89,9 @@ enum sgx_return_code {
 	SGX_CHILD_PRESENT	= 13,
 	SGX_INVALID_EINITTOKEN	= 16,
 	SGX_PAGE_NOT_MODIFIABLE	= 20,
+	SGX_INSUFFICIENT_ENTROPY = 29,
+	SGX_EPC_NOT_READY	= 30,
+	SGX_NO_UPDATE		= 31,
 	SGX_UNMASKED_EVENT	= 128,
 };
 
diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
index 99004b02e2ed..3d83c76dc91f 100644
--- a/arch/x86/kernel/cpu/sgx/encls.h
+++ b/arch/x86/kernel/cpu/sgx/encls.h
@@ -233,4 +233,10 @@ static inline int __eaug(struct sgx_pageinfo *pginfo, void *addr)
 	return __encls_2(EAUG, pginfo, addr);
 }
 
+/* Update CPUSVN at runtime.
+ */
+static inline int __eupdatesvn(void)
+{
+	return __encls_ret_1(EUPDATESVN, "");
+}
+
 #endif /* _X86_ENCLS_H */

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index b61d3bad0446..24563110811d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -32,6 +32,11 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 static LIST_HEAD(sgx_active_page_list);
 static DEFINE_SPINLOCK(sgx_reclaimer_lock);
 
+/* This lock is held to prevent new EPC pages from being created
+ * during the execution of ENCLS[EUPDATESVN].
+ */
+static DEFINE_SPINLOCK(sgx_epc_eupdatesvn_lock);
+
 static atomic_long_t sgx_nr_used_pages = ATOMIC_LONG_INIT(0);
 static unsigned long sgx_nr_total_pages;
 
@@ -457,7 +462,17 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 	page->flags = 0;
 
 	spin_unlock(&node->lock);
-	atomic_long_inc(&sgx_nr_used_