[PATCH v2] lib: objpool: fix head overrun on big.LITTLE system
objpool overrun stress with test_objpool on OrangePi5+ SBC triggered the following kernel warnings: WARNING: CPU: 6 PID: 3115 at lib/objpool.c:168 objpool_push+0xc0/0x100 This message is from objpool.c:168: WARN_ON_ONCE(tail - head > pool->nr_objs); The overrun test case is to validate the case that pre-allocated objects are insufficient: 8 objects are pre-allocated for each node and consumer thread per node tries to grab 16 objects in a row. The testing system is OrangePI 5+, with RK3588, a big.LITTLE SOC with 4x A76 and 4x A55. When disabling either all 4 big or 4 little cores, the overrun tests run well, and once with big and little cores mixed together, the overrun test would always cause an overrun loop. It's likely the memory timing differences of big and little cores cause this trouble. Here are the debugging data of objpool_try_get_slot after try_cmpxchg_release: objpool_pop: cpu: 4/0 0:0 head: 278/279 tail:278 last:276/278 The local copies of 'head' and 'last' were 278 and 276, and reloading of 'slot->head' and 'slot->last' got 279 and 278. After try_cmpxchg_release 'slot->head' became 'head + 1', which is correct. But what's wrong here is the stale value of 'last', and that stale value of 'last' finally led the overrun of 'head'. Memory updating of 'last' and 'head' are performed in push() and pop() independently, which could be the culprit leading this out of order visibility of 'last' and 'head'. So for objpool_try_get_slot(), it's not enough only checking the condition of 'head != slot', the implicit condition 'last - head <= nr_objs' must also be explicitly asserted to guarantee 'last' is always behind 'head' before the object retrieving. This patch will check and try reloading of 'head' and 'last' to ensure 'last' is behind 'head' at the time of object retrieving. Performance testings show the average impact is about 0.1% for X86_64 and 1.12% for ARM64. Here are the results: OS: Debian 10 X86_64, Linux 6.6rc HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s 1T 2T 4T 8T16T native: 49543304 99277826 199017659 399070324 795185848 objpool:29909085 59865637 119692073 239750369 478005250 objpool+: 29879313 59230743 119609856 239067773 478509029 32T48T64T96T 128T native: 1596927073 2390099988 2929397330 3183875848 3257546602 objpool: 957553042 1435814086 1680872925 2043126796 2165424198 objpool+: 956476281 1434491297 1666055740 2041556569 2157415622 OS: Debian 11 AARCH64, Linux 6.6rc HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s 1T 2T 4T 8T16T native: 30890508 60399915 123111980 242257008 494002946 objpool:14742531 28883047 57739948 115886644 232455421 objpool+: 14107220 29032998 57286084 113730493 232232850 24T32T48T64T96T native:746406039 1000174750 1493236240 1998318364 2942911180 objpool: 349164852 467284332 702296756 934459713 1387898285 objpool+: 348388180 462750976 696606096 927865887 1368402195 Fixes: b4edb8d2d464 ("lib: objpool added: ring-array based lockless MPMC") v1 -> v2: - Title updated since it's a common issue of objpool for big.LITTLE systems, verified on Rockchip RK3588 and Amlogic A311D - Fixes tag added, as reminded by Masami Hiramatsu Signed-off-by: wuqiang.matt Acked-by: Masami Hiramatsu (Google) --- lib/objpool.c | 17 + 1 file changed, 17 insertions(+) diff --git a/lib/objpool.c b/lib/objpool.c index 1ab49b897b2e..e8c9a30e485f 100644 --- a/lib/objpool.c +++ b/lib/objpool.c @@ -199,6 +199,23 @@ static inline void *objpool_try_get_slot(struct objpool_head *pool, int cpu) while (head != READ_ONCE(slot->last)) { void *obj; + /* +* data visibility of 'last' and 'head' could be out of +* order since memory updating of 'last' and 'head' are +* performed in push() and pop() independently +* +* before any retrieving attempts, pop() must guarantee +* 'last' is behind 'head', that is to say, there must +* be available objects in slot, which could be ensured +* by condition 'last != head && last - head <= nr_objs' +* that is equivalent to 'last - head - 1 < nr_objs' as +* 'last' and 'head' are both unsigned int32 +*/ + if (READ_ONCE(slot->last) - head - 1 >= pool->nr_objs) { + head = READ_ONCE(slot->head); + continue; + } + /* obj must be retrieved before moving forward head */ obj = READ_ONCE(slot->entries[head & slot->mask]); -- 2.40.1
Re: [PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc()
Hi Alexandru, On Wed, Aug 23, 2023 at 6:16 AM Alexandru Elisei wrote: > > If the source page being migrated has metadata associated with it, make > sure to reserve the metadata storage when choosing a suitable destination > page from the free list. > > Signed-off-by: Alexandru Elisei > --- > mm/compaction.c | 9 + > mm/internal.h | 1 + > 2 files changed, 10 insertions(+) > > diff --git a/mm/compaction.c b/mm/compaction.c > index cc0139fa0cb0..af2ee3085623 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -570,6 +570,7 @@ static unsigned long isolate_freepages_block(struct > compact_control *cc, > bool locked = false; > unsigned long blockpfn = *start_pfn; > unsigned int order; > + int ret; > > /* Strict mode is for isolation, speed is secondary */ > if (strict) > @@ -626,6 +627,11 @@ static unsigned long isolate_freepages_block(struct > compact_control *cc, > > /* Found a free page, will break it into order-0 pages */ > order = buddy_order(page); > + if (metadata_storage_enabled() && cc->reserve_metadata) { > + ret = reserve_metadata_storage(page, order, > cc->gfp_mask); At this point the zone lock is held and preemption is disabled, which makes it invalid to call reserve_metadata_storage. Peter > + if (ret) > + goto isolate_fail; > + } > isolated = __isolate_free_page(page, order); > if (!isolated) > break; > @@ -1757,6 +1763,9 @@ static struct folio *compaction_alloc(struct folio > *src, unsigned long data) > struct compact_control *cc = (struct compact_control *)data; > struct folio *dst; > > + if (metadata_storage_enabled()) > + cc->reserve_metadata = folio_has_metadata(src); > + > if (list_empty(>freepages)) { > isolate_freepages(cc); > > diff --git a/mm/internal.h b/mm/internal.h > index d28ac0085f61..046cc264bfbe 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -492,6 +492,7 @@ struct compact_control { > */ > bool alloc_contig; /* alloc_contig_range allocation */ > bool source_has_metadata; /* source pages have associated > metadata */ > + bool reserve_metadata; > }; > > /* > -- > 2.41.0 >
[PATCH 1/2] eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
From: "Steven Rostedt (Google)" The logic to free the eventfs_inode (ei) use to set is_freed and clear the "dentry" field under the eventfs_mutex. But that changed when a race was found where the ei->dentry needed to be cleared when the last dput() was called on it. But there was still logic that checked if ei->dentry was not NULL and is_freed is set, and would warn if it was. But since that situation was changed and the ei->dentry isn't cleared until the last dput() is called on it while the ei->is_freed is set, do not test for that condition anymore, and change the comments to reflect that. Fixes: 020010fbfa20 ("eventfs: Delete eventfs_inode when the last dentry is freed") Reported-by: Mark Rutland Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index f8a594a50ae6..f239b2b507a4 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -27,16 +27,16 @@ /* * eventfs_mutex protects the eventfs_inode (ei) dentry. Any access * to the ei->dentry must be done under this mutex and after checking - * if ei->is_freed is not set. The ei->dentry is released under the - * mutex at the same time ei->is_freed is set. If ei->is_freed is set - * then the ei->dentry is invalid. + * if ei->is_freed is not set. When ei->is_freed is set, the dentry + * is on its way to being freed after the last dput() is made on it. */ static DEFINE_MUTEX(eventfs_mutex); /* * The eventfs_inode (ei) itself is protected by SRCU. It is released from * its parent's list and will have is_freed set (under eventfs_mutex). - * After the SRCU grace period is over, the ei may be freed. + * After the SRCU grace period is over and the last dput() is called + * the ei is freed. */ DEFINE_STATIC_SRCU(eventfs_srcu); @@ -365,12 +365,14 @@ create_file_dentry(struct eventfs_inode *ei, int idx, * created the dentry for this e_dentry. In which case * use that one. * -* Note, with the mutex held, the e_dentry cannot have content -* and the ei->is_freed be true at the same time. +* If ei->is_freed is set, the e_dentry is currently on its +* way to being freed, don't return it. If e_dentry is NULL +* it means it was already freed. */ - dentry = *e_dentry; - if (WARN_ON_ONCE(dentry && ei->is_freed)) + if (ei->is_freed) dentry = NULL; + else + dentry = *e_dentry; /* The lookup does not need to up the dentry refcount */ if (dentry && !lookup) dget(dentry); @@ -473,8 +475,8 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, * created the dentry for this e_dentry. In which case * use that one. * -* Note, with the mutex held, the e_dentry cannot have content -* and the ei->is_freed be true at the same time. +* If ei->is_freed is set, the e_dentry is currently on its +* way to being freed. */ dentry = ei->dentry; if (dentry && !lookup) -- 2.42.0
[PATCH 2/2] eventfs: Do not invalidate dentry in create_file/dir_dentry()
From: "Steven Rostedt (Google)" With the call to simple_recursive_removal() on the entire eventfs sub system when the directory is removed, it performs the d_invalidate on all the dentries when it is removed. There's no need to do clean ups when a dentry is being created while the directory is being deleted. As dentries are cleaned up by the simpler_recursive_removal(), trying to do d_invalidate() in these functions will cause the dentry to be invalidated twice, and crash the kernel. Link: https://lore.kernel.org/all/20231116123016.140576-1-naresh.kamb...@linaro.org/ Fixes: 407c6726ca71 ("eventfs: Use simple_recursive_removal() to clean up dentries") Reported-by: Mark Rutland Reported-by: Naresh Kamboju Reported-by: Linux Kernel Functional Testing Signed-off-by: Steven Rostedt (Google) --- fs/tracefs/event_inode.c | 19 ++- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index f239b2b507a4..3eb6c622a74d 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -326,7 +326,6 @@ create_file_dentry(struct eventfs_inode *ei, int idx, struct eventfs_attr *attr = NULL; struct dentry **e_dentry = >d_children[idx]; struct dentry *dentry; - bool invalidate = false; mutex_lock(_mutex); if (ei->is_freed) { @@ -389,17 +388,14 @@ create_file_dentry(struct eventfs_inode *ei, int idx, * Otherwise it means two dentries exist with the same name. */ WARN_ON_ONCE(!ei->is_freed); - invalidate = true; + dentry = NULL; } mutex_unlock(_mutex); - if (invalidate) - d_invalidate(dentry); - - if (lookup || invalidate) + if (lookup) dput(dentry); - return invalidate ? NULL : dentry; + return dentry; } /** @@ -439,7 +435,6 @@ static struct dentry * create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, struct dentry *parent, bool lookup) { - bool invalidate = false; struct dentry *dentry = NULL; mutex_lock(_mutex); @@ -495,16 +490,14 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, * Otherwise it means two dentries exist with the same name. */ WARN_ON_ONCE(!ei->is_freed); - invalidate = true; + dentry = NULL; } mutex_unlock(_mutex); - if (invalidate) - d_invalidate(dentry); - if (lookup || invalidate) + if (lookup) dput(dentry); - return invalidate ? NULL : dentry; + return dentry; } /** -- 2.42.0
[PATCH 0/2] eventfs: Fixes for v6.7-rc2
[ Resend to trigger my patchwork updates, and Cc linux-trace-kernel ] A couple of fixes to eventfs: - With the usage of simple_recursive_remove() recommended by Al Viro, the code should not be calling "d_invalidate()" itself. Doing so is causing crashes. The code was calling d_invalidate() on the race of trying to look up a file while the parent was being deleted. This was detected, and the added dentry was having d_invalidate() called on it, but the deletion of the directory was also calling d_invalidate() on that same dentry. - A fix to not free the eventfs_inode (ei) until the last dput() was called on its ei->dentry made the ei->dentry exist even after it was marked for free by setting the ei->is_freed. But code elsewhere still was checking if ei->dentry was NULL if ei->is_freed is set and would trigger WARN_ON if that was the case. That's no longer true and there should not be any warnings when it is true. Steven Rostedt (Google) (2): eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL eventfs: Do not invalidate dentry in create_file/dir_dentry() fs/tracefs/event_inode.c | 41 ++--- 1 file changed, 18 insertions(+), 23 deletions(-)
Re: [PATCH v4 0/2] Rpmsg support for i.MX DSP with resource table
On Fri, Oct 13, 2023 at 06:27:29PM +0300, Iuliana Prodan (OSS) wrote: > From: Iuliana Prodan > > These patches are needed in order to support rpmsg on DSP when a > resource table is available. > > Changes since v3: > - add reserve-memory nodes in imx8mp-evk.dts rather than .dtsi (patch 2/2) > > Changes since v2: > - add newline between nodes in dtsi (patch 2/2) > > Changes since v1: > - add missing bracket in dtsi (patch 2/2) > > Iuliana Prodan (2): > remoteproc: imx_dsp_rproc: add mandatory find_loaded_rsc_table op > arm64: dts: imx8mp: add reserve-memory nodes for DSP > > arch/arm64/boot/dts/freescale/imx8mp-evk.dts | 22 > drivers/remoteproc/imx_dsp_rproc.c | 1 + > 2 files changed, 23 insertions(+) > I have applied this set. Thanks, Mathieu > -- > 2.17.1 >
Re: selftests: ftrace: WARNING: __list_del_entry_valid_or_report (lib/list_debug.c:62 (discriminator 1))
On Thu, 16 Nov 2023 18:00:16 +0530 Naresh Kamboju wrote: > Following kernel crash noticed while running selftests: ftrace on arm64 > Juno-r2 > device running stable-rc linux-6.6.y kernel. > > This kernel crash is hard to reproduce. > Can you test this patch. Note, there's a similar bug on 6.7-rc1 which I'll fix first. And when that's accepted, I'll push this one for v6.6. This may be two patches as one if the d_invalidate() issue, and another is a memory leak fix. -- Steve diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 5fcfb634fec2..b60048469df1 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -289,6 +289,8 @@ void eventfs_set_ef_status_free(struct tracefs_inode *ti, struct dentry *dentry) ef = dentry->d_fsdata; if (ef) free_ef(ef); + else + kfree(ei); return; } @@ -342,7 +344,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef) static struct dentry * create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) { - bool invalidate = false; struct dentry *dentry; mutex_lock(_mutex); @@ -387,23 +388,24 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) eventfs_post_create_dir(ef); dentry->d_fsdata = ef; } else { - /* A race here, should try again (unless freed) */ - invalidate = true; - /* +* If we are here then the directory is being freed. +* The simple_recursive_removal() will get rid of the dentry +* here. +*/ + dentry = NULL; +/* * Should never happen unless we get here due to being freed. * Otherwise it means two dentries exist with the same name. */ WARN_ON_ONCE(!ef->is_freed); } mutex_unlock(_mutex); - if (invalidate) - d_invalidate(dentry); - if (lookup || invalidate) + if (lookup) dput(dentry); - return invalidate ? NULL : dentry; + return dentry; } static bool match_event_file(struct eventfs_file *ef, const char *name) -- 2.42.0
[PATCH v3 2/5] params: Do not go over the limit when getting the string length
We can use strnlen() even on early stages and it prevents from going over the string boundaries in case it's already too long. Reviewed-by: Luis Chamberlain Reviewed-by: Kees Cook Signed-off-by: Andy Shevchenko --- kernel/params.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/params.c b/kernel/params.c index 626fa8265932..f8e3c4139854 100644 --- a/kernel/params.c +++ b/kernel/params.c @@ -260,7 +260,10 @@ EXPORT_SYMBOL_GPL(param_set_uint_minmax); int param_set_charp(const char *val, const struct kernel_param *kp) { - if (strlen(val) > 1024) { + size_t len, maxlen = 1024; + + len = strnlen(val, maxlen + 1); + if (len == maxlen + 1) { pr_err("%s: string parameter too long\n", kp->name); return -ENOSPC; } @@ -270,7 +273,7 @@ int param_set_charp(const char *val, const struct kernel_param *kp) /* This is a hack. We can't kmalloc in early boot, and we * don't need to; this mangled commandline is preserved. */ if (slab_is_available()) { - *(char **)kp->arg = kmalloc_parameter(strlen(val)+1); + *(char **)kp->arg = kmalloc_parameter(len + 1); if (!*(char **)kp->arg) return -ENOMEM; strcpy(*(char **)kp->arg, val); @@ -508,7 +511,7 @@ int param_set_copystring(const char *val, const struct kernel_param *kp) { const struct kparam_string *kps = kp->str; - if (strlen(val)+1 > kps->maxlen) { + if (strnlen(val, kps->maxlen) == kps->maxlen) { pr_err("%s: string doesn't fit in %u chars.\n", kp->name, kps->maxlen-1); return -ENOSPC; -- 2.43.0.rc1.1.gbec44491f096
[PATCH v3 5/5] params: Fix multi-line comment style
The multi-line comment style in the file is rather arbitrary. Make it follow the standard one. Reviewed-by: Luis Chamberlain Reviewed-by: Kees Cook Signed-off-by: Andy Shevchenko --- kernel/params.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/kernel/params.c b/kernel/params.c index eb55b32399b4..2e447f8ae183 100644 --- a/kernel/params.c +++ b/kernel/params.c @@ -1,8 +1,8 @@ // SPDX-License-Identifier: GPL-2.0-or-later -/* Helpers for initial module or kernel cmdline parsing - Copyright (C) 2001 Rusty Russell. - -*/ +/* + * Helpers for initial module or kernel cmdline parsing + * Copyright (C) 2001 Rusty Russell. + */ #include #include #include @@ -271,8 +271,10 @@ int param_set_charp(const char *val, const struct kernel_param *kp) maybe_kfree_parameter(*(char **)kp->arg); - /* This is a hack. We can't kmalloc in early boot, and we -* don't need to; this mangled commandline is preserved. */ + /* +* This is a hack. We can't kmalloc() in early boot, and we +* don't need to; this mangled commandline is preserved. +*/ if (slab_is_available()) { *(char **)kp->arg = kmalloc_parameter(len + 1); if (!*(char **)kp->arg) @@ -743,8 +745,10 @@ void module_param_sysfs_remove(struct module *mod) { if (mod->mkobj.mp) { sysfs_remove_group(>mkobj.kobj, >mkobj.mp->grp); - /* We are positive that no one is using any param -* attrs at this point. Deallocate immediately. */ + /* +* We are positive that no one is using any param +* attrs at this point. Deallocate immediately. +*/ free_module_param_attrs(>mkobj); } } -- 2.43.0.rc1.1.gbec44491f096
[PATCH v3 4/5] params: Sort headers
Sort the headers in alphabetic order in order to ease the maintenance for this part. Reviewed-by: Luis Chamberlain Reviewed-by: Kees Cook Signed-off-by: Andy Shevchenko --- kernel/params.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/kernel/params.c b/kernel/params.c index c3a029fe183d..eb55b32399b4 100644 --- a/kernel/params.c +++ b/kernel/params.c @@ -3,18 +3,18 @@ Copyright (C) 2001 Rusty Russell. */ -#include -#include -#include -#include -#include -#include +#include #include #include +#include +#include +#include +#include +#include #include -#include -#include #include +#include +#include #ifdef CONFIG_SYSFS /* Protects all built-in parameters, modules use their own param_lock */ -- 2.43.0.rc1.1.gbec44491f096
[PATCH v3 1/5] params: Introduce the param_unknown_fn type
Introduce a new type for the callback to parse an unknown argument. This unifies function prototypes which takes that as a parameter. Reviewed-by: Luis Chamberlain Reviewed-by: Kees Cook Signed-off-by: Andy Shevchenko --- include/linux/moduleparam.h | 6 +++--- kernel/params.c | 8 ++-- 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h index 4fa9726bc328..bfb85fd13e1f 100644 --- a/include/linux/moduleparam.h +++ b/include/linux/moduleparam.h @@ -385,6 +385,8 @@ extern bool parameq(const char *name1, const char *name2); */ extern bool parameqn(const char *name1, const char *name2, size_t n); +typedef int (*parse_unknown_fn)(char *param, char *val, const char *doing, void *arg); + /* Called on module insert or kernel boot */ extern char *parse_args(const char *name, char *args, @@ -392,9 +394,7 @@ extern char *parse_args(const char *name, unsigned num, s16 level_min, s16 level_max, - void *arg, - int (*unknown)(char *param, char *val, -const char *doing, void *arg)); + void *arg, parse_unknown_fn unknown); /* Called by module remove. */ #ifdef CONFIG_SYSFS diff --git a/kernel/params.c b/kernel/params.c index 2d4a0564697e..626fa8265932 100644 --- a/kernel/params.c +++ b/kernel/params.c @@ -120,9 +120,7 @@ static int parse_one(char *param, unsigned num_params, s16 min_level, s16 max_level, -void *arg, -int (*handle_unknown)(char *param, char *val, -const char *doing, void *arg)) +void *arg, parse_unknown_fn handle_unknown) { unsigned int i; int err; @@ -165,9 +163,7 @@ char *parse_args(const char *doing, unsigned num, s16 min_level, s16 max_level, -void *arg, -int (*unknown)(char *param, char *val, - const char *doing, void *arg)) +void *arg, parse_unknown_fn unknown) { char *param, *val, *err = NULL; -- 2.43.0.rc1.1.gbec44491f096
[PATCH v3 3/5] params: Use size_add() for kmalloc()
Prevent allocations from integer overflow by using size_add(). Reviewed-by: Luis Chamberlain Reviewed-by: Kees Cook Signed-off-by: Andy Shevchenko --- kernel/params.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/params.c b/kernel/params.c index f8e3c4139854..c3a029fe183d 100644 --- a/kernel/params.c +++ b/kernel/params.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -48,7 +49,7 @@ static void *kmalloc_parameter(unsigned int size) { struct kmalloced_param *p; - p = kmalloc(sizeof(*p) + size, GFP_KERNEL); + p = kmalloc(size_add(sizeof(*p), size), GFP_KERNEL); if (!p) return NULL; -- 2.43.0.rc1.1.gbec44491f096
[PATCH v3 0/5] params: harden string ops and allocatio ops
A couple of patches are for get the string ops, used in the module, slightly harden. On top a few cleanups. Since the main part is rather hardening, I think the Kees' tree is the best fit for the series. It also possible to route via Greg's sysfs (driver core?), but I'm open for another option(s). Changelog v3: - added tags (Kees, Luis) Changelog v2: - dropped the s*printf() --> sysfs_emit() conversion as it revealed an issue, i.e. reuse getters with non-page-aligned pointer, which would be addressed separately - added cover letter and clarified the possible route for the series (Luis) Andy Shevchenko (5): params: Introduce the param_unknown_fn type params: Do not go over the limit when getting the string length params: Use size_add() for kmalloc() params: Sort headers params: Fix multi-line comment style include/linux/moduleparam.h | 6 ++-- kernel/params.c | 56 - 2 files changed, 33 insertions(+), 29 deletions(-) -- 2.43.0.rc1.1.gbec44491f096
Re: [PATCH v1 1/1] ACPI: NFIT: Switch to use acpi_evaluate_dsm_typed()
On Mon, Nov 20, 2023 at 4:03 PM Andy Shevchenko wrote: > > On Thu, Oct 19, 2023 at 06:03:28PM -0700, Dan Williams wrote: > > Andy Shevchenko wrote: > > > The acpi_evaluate_dsm_typed() provides a way to check the type of the > > > object evaluated by _DSM call. Use it instead of open coded variant. > > > > Looks good to me. > > > > Reviewed-by: Dan Williams > > Thank you! > > Who is taking care of this? Rafael? I can apply it.
Re: [PATCH v1 1/1] ACPI: NFIT: Switch to use acpi_evaluate_dsm_typed()
On Thu, Oct 19, 2023 at 06:03:28PM -0700, Dan Williams wrote: > Andy Shevchenko wrote: > > The acpi_evaluate_dsm_typed() provides a way to check the type of the > > object evaluated by _DSM call. Use it instead of open coded variant. > > Looks good to me. > > Reviewed-by: Dan Williams Thank you! Who is taking care of this? Rafael? -- With Best Regards, Andy Shevchenko
Re: [PATCH v1] lib: objpool: fix head overrun on RK3588 SBC
On 2023/11/20 13:18, Masami Hiramatsu (Google) wrote: On Tue, 14 Nov 2023 19:51:48 +0800 "wuqiang.matt" wrote: objpool overrun stress with test_objpool on OrangePi5+ SBC triggered the following kernel warnings: WARNING: CPU: 6 PID: 3115 at lib/objpool.c:168 objpool_push+0xc0/0x100 This message is from objpool.c:168: WARN_ON_ONCE(tail - head > pool->nr_objs); The overrun test case is to validate the case that pre-allocated objects are insufficient: 8 objects are pre-allocated for each node and consumer thread per node tries to grab 16 objects in a row. The testing system is OrangePI 5+, with RK3588, a big.LITTLE SOC with 4x A76 and 4x A55. When disabling either all 4 big or 4 little cores, the overrun tests run well, and once with big and little cores mixed together, the overrun test would always cause an overrun loop. It's likely the memory timing differences of big and little cores cause this trouble. Here are the debugging data of objpool_try_get_slot after try_cmpxchg_release: objpool_pop: cpu: 4/0 0:0 head: 278/279 tail:278 last:276/278 The local copies of 'head' and 'last' were 278 and 276, and reloading of 'slot->head' and 'slot->last' got 279 and 278. After try_cmpxchg_release 'slot->head' became 'head + 1', which is correct. But what's wrong here is the stale value of 'last', and that stale value of 'last' finally led the overrun of 'head'. Ah, good catch! So even if the ring size is enough, the head/tail update value is not updated locally, it can cause the overrun! It's really confusing at the first glance of such an issue. I was assuming the order between 'last' and 'head' should be implicitly maintained, but after more digging, then found that wasn't true actually, the order should be explicitly guaranteed by pop(). I also verified with Amlogic A311D which has 6 cores (4x A73 and 4x A53), and got same results. I think I just need re-discover the differences of HMP (heterogeneous multiprocessing) for big.LITTLE or P/E cores cpus. Memory updating of 'last' and 'head' are performed in push() and pop() independently, which could be the culprit leading this out of order visibility of 'last' and 'head'. So for objpool_try_get_slot(), it's not enough only checking the condition of 'head != slot', the implicit condition 'last - head <= nr_objs' must also be explicitly asserted to guarantee 'last' is always behind 'head' before the object retrieving. Indeed. Thanks for the investigation! This patch will check and try reloading of 'head' and 'last' to ensure 'last' is behind 'head' at the time of object retrieving. Performance testings show the average impact is about 0.1% for X86_64 and 1.12% for ARM64. Here are the results: OS: Debian 10 X86_64, Linux 6.6rc HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s 1T 2T 4T 8T16T native: 49543304 99277826 199017659 399070324 795185848 objpool:29909085 59865637 119692073 239750369 478005250 objpool+: 29879313 59230743 119609856 239067773 478509029 32T48T64T96T 128T native: 1596927073 2390099988 2929397330 3183875848 3257546602 objpool: 957553042 1435814086 1680872925 2043126796 2165424198 objpool+: 956476281 1434491297 1666055740 2041556569 2157415622 OS: Debian 11 AARCH64, Linux 6.6rc HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s 1T 2T 4T 8T16T native: 30890508 60399915 123111980 242257008 494002946 objpool:14742531 28883047 57739948 115886644 232455421 objpool+: 14107220 29032998 57286084 113730493 232232850 24T32T48T64T96T native:746406039 1000174750 1493236240 1998318364 2942911180 objpool: 349164852 467284332 702296756 934459713 1387898285 objpool+: 348388180 462750976 696606096 927865887 1368402195 OK, looks good to me. Acked-by: Masami Hiramatsu (Google) And let me pick it. Signed-off-by: wuqiang.matt BTW, this is a real bugfix, so it should have Fixes tag :) Fixes: b4edb8d2d464 ("lib: objpool added: ring-array based lockless MPMC") Oh, right! Thanks for your kind reminder. I'll keep that in mind. Thank you! Best regards. --- lib/objpool.c | 17 + 1 file changed, 17 insertions(+) diff --git a/lib/objpool.c b/lib/objpool.c index ce0087f64400..cfdc02420884 100644 --- a/lib/objpool.c +++ b/lib/objpool.c @@ -201,6 +201,23 @@ static inline void *objpool_try_get_slot(struct objpool_head *pool, int cpu) while (head != READ_ONCE(slot->last)) { void *obj; + /* +* data visibility of 'last' and 'head' could be out of +* order since memory updating of 'last' and 'head' are +* performed in push() and pop() independently +
Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)
On 11/20/23 6:56 PM, Peter Zijlstra Wrote: On Sat, Nov 18, 2023 at 01:14:32PM +0800, Abel Wu wrote: Hi Peter, I'm a little confused here. As we adopt placement strategy #1 when PLACE_LAG is enabled, the lag of that entity needs to be preserved. Given that the weight doesn't change, we have: vl' = vl But in fact it is scaled on placement: vl' = vl * W/(W + w) (W+w)/W Ah, right. I misunderstood (again) the comment which says: vl_i = (W + w_i)*vl'_i / W So the current implementation is: v' = V - vl' and what I was proposing is: v' = V' - vl and they are equal in fact. Does this intended? The scaling, yes that's intended and the comment explains why. So now you have me confused too :-) Specifically, I want the lag after placement to be equal to the lag we come in with. Since placement will affect avg_vruntime (adding one element to the average changes the average etc..) the placement also affects the lag as measured after placement. Yes. You did the math in an iterative fashion and mine is facing the final state: v' = V' - vlag V' = (WV + wv') / (W + w) which gives: V' = V - w * vlag / W Or rather, if you enqueue and dequeue, I want the lag to be preserved. If you do not take placement into consideration, lag will dissipate real quick. And to illustrate my understanding of strategy #1: @@ -5162,41 +5165,17 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) * vl_i is given by: * * V' = (\Sum w_j*v_j + w_i*v_i) / (W + w_i) -* = (W*V + w_i*(V - vl_i)) / (W + w_i) -* = (W*V + w_i*V - w_i*vl_i) / (W + w_i) -* = (V*(W + w_i) - w_i*l) / (W + w_i) -* = V - w_i*vl_i / (W + w_i) -* -* And the actual lag after adding an entity with vl_i is: -* -* vl'_i = V' - v_i -* = V - w_i*vl_i / (W + w_i) - (V - vl_i) -* = vl_i - w_i*vl_i / (W + w_i) -* -* Which is strictly less than vl_i. So in order to preserve lag -* we should inflate the lag before placement such that the -* effective lag after placement comes out right. -* -* As such, invert the above relation for vl'_i to get the vl_i -* we need to use such that the lag after placement is the lag -* we computed before dequeue. +* = (W*V + w_i*(V' - vl_i)) / (W + w_i) +* = V - w_i*vl_i / W * -* vl'_i = vl_i - w_i*vl_i / (W + w_i) -* = ((W + w_i)*vl_i - w_i*vl_i) / (W + w_i) -* -* (W + w_i)*vl'_i = (W + w_i)*vl_i - w_i*vl_i -* = W*vl_i -* -* vl_i = (W + w_i)*vl'_i / W */ load = cfs_rq->avg_load; if (curr && curr->on_rq) load += scale_load_down(curr->load.weight); - - lag *= load + scale_load_down(se->load.weight); if (WARN_ON_ONCE(!load)) load = 1; - lag = div_s64(lag, load); + + vruntime -= div_s64(lag * scale_load_down(se->load.weight), load); } se->vruntime = vruntime - lag; So you're proposing we do: v = V - (lag * w) / (W + w) - lag What I 'm proposing is: V' = V - w * vlag / W so we have: v' = V' - vlag = V - vlag * w/W - vlag = V - vlag * (W + w)/W which is exactly the same as current implementation. ? That can be written like: v = V - (lag * w) / (W+w) - (lag * (W+w)) / (W+w) = V - (lag * (W+w) + lag * w) / (W+w) = V - (lag * (W+2w)) / (W+w) And that turns into a mess AFAICT. Let me repeat my earlier argument. Suppose v,w,l are the new element. V,W are the old avg_vruntime and sum-weight. Then: V = V*W / W, and by extention: V' = (V*W + v*w) / (W + w). The new lag, after placement: l' = V' - v = (V*W + v*w) / (W+w) - v = (V*W + v*w) / (W+w) - v * (W+w) / (W+v) = (V*W + v*w -v*W - v*w) / (W+w) = (V*W - v*W) / (W+w) = W*(V-v) / (W+w) = W/(W+w) * (V-v) Substitute: v = V - (W+w)/W * l, my scaling thing, to obtain: l' = W/(W+w) * (V - (V - (W+w)/W * l)) = W/(W+w) * (V - V + (W+w)/W * l) = W/(W+w) * (W+w)/W * l = l So by scaling, we've preserved lag across placement. That make sense? Yes, I think I won't misunderstand again for the 3rd time :) Thanks! Abel
Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)
On Sat, Nov 18, 2023 at 01:14:32PM +0800, Abel Wu wrote: > Hi Peter, I'm a little confused here. As we adopt placement strategy #1 > when PLACE_LAG is enabled, the lag of that entity needs to be preserved. > Given that the weight doesn't change, we have: > > vl' = vl > > But in fact it is scaled on placement: > > vl' = vl * W/(W + w) (W+w)/W > > Does this intended? The scaling, yes that's intended and the comment explains why. So now you have me confused too :-) Specifically, I want the lag after placement to be equal to the lag we come in with. Since placement will affect avg_vruntime (adding one element to the average changes the average etc..) the placement also affects the lag as measured after placement. Or rather, if you enqueue and dequeue, I want the lag to be preserved. If you do not take placement into consideration, lag will dissipate real quick. > And to illustrate my understanding of strategy #1: > @@ -5162,41 +5165,17 @@ place_entity(struct cfs_rq *cfs_rq, struct > sched_entity *se, int flags) >* vl_i is given by: >* >* V' = (\Sum w_j*v_j + w_i*v_i) / (W + w_i) > - * = (W*V + w_i*(V - vl_i)) / (W + w_i) > - * = (W*V + w_i*V - w_i*vl_i) / (W + w_i) > - * = (V*(W + w_i) - w_i*l) / (W + w_i) > - * = V - w_i*vl_i / (W + w_i) > - * > - * And the actual lag after adding an entity with vl_i is: > - * > - * vl'_i = V' - v_i > - * = V - w_i*vl_i / (W + w_i) - (V - vl_i) > - * = vl_i - w_i*vl_i / (W + w_i) > - * > - * Which is strictly less than vl_i. So in order to preserve lag > - * we should inflate the lag before placement such that the > - * effective lag after placement comes out right. > - * > - * As such, invert the above relation for vl'_i to get the vl_i > - * we need to use such that the lag after placement is the lag > - * we computed before dequeue. > + * = (W*V + w_i*(V' - vl_i)) / (W + w_i) > + * = V - w_i*vl_i / W >* > - * vl'_i = vl_i - w_i*vl_i / (W + w_i) > - * = ((W + w_i)*vl_i - w_i*vl_i) / (W + w_i) > - * > - * (W + w_i)*vl'_i = (W + w_i)*vl_i - w_i*vl_i > - * = W*vl_i > - * > - * vl_i = (W + w_i)*vl'_i / W >*/ > load = cfs_rq->avg_load; > if (curr && curr->on_rq) > load += scale_load_down(curr->load.weight); > - > - lag *= load + scale_load_down(se->load.weight); > if (WARN_ON_ONCE(!load)) > load = 1; > - lag = div_s64(lag, load); > + > + vruntime -= div_s64(lag * scale_load_down(se->load.weight), > load); > } > se->vruntime = vruntime - lag; So you're proposing we do: v = V - (lag * w) / (W + w) - lag ? That can be written like: v = V - (lag * w) / (W+w) - (lag * (W+w)) / (W+w) = V - (lag * (W+w) + lag * w) / (W+w) = V - (lag * (W+2w)) / (W+w) And that turns into a mess AFAICT. Let me repeat my earlier argument. Suppose v,w,l are the new element. V,W are the old avg_vruntime and sum-weight. Then: V = V*W / W, and by extention: V' = (V*W + v*w) / (W + w). The new lag, after placement: l' = V' - v = (V*W + v*w) / (W+w) - v = (V*W + v*w) / (W+w) - v * (W+w) / (W+v) = (V*W + v*w -v*W - v*w) / (W+w) = (V*W - v*W) / (W+w) = W*(V-v) / (W+w) = W/(W+w) * (V-v) Substitute: v = V - (W+w)/W * l, my scaling thing, to obtain: l' = W/(W+w) * (V - (V - (W+w)/W * l)) = W/(W+w) * (V - V + (W+w)/W * l) = W/(W+w) * (W+w)/W * l = l So by scaling, we've preserved lag across placement. That make sense?