Re: Current status and possible improvements in CONFIG_MODULE_FORCE_UNLOAD
Thanks for the reply Lucas. It makes sense now! > On 15 Jun 2024, at 12:18 AM, Lucas De Marchi wrote: > > On Thu, Jun 06, 2024 at 06:49:59AM GMT, Aditya Garg wrote: >> Hi >> >> I am Aditya Garg. I often require using out of tree drivers to support >> various hardwares on Linux. Sometimes the provider doesn't write good >> drivers, and often they have to be force unloaded. It's a common thing in >> proprietary drivers. I know the author of the driver should take note of the >> issues, but still the force unloading of the modules does come in handy many >> times. >> >> Unfortunately if CONFIG_MODULE_FORCE_UNLOAD is not enabled in your kernel, >> which most probably is not enabled if you are using a Distribution pre >> compiled kernel, you have to recompile the whole kernel again. > > CONFIG_MODULE_FORCE_UNLOAD only ever makes sense on a developer > environment loading/unloading multiple times his own .ko module. Then > the developer knows better the state of the driver and hw to judge if > it's safe to ignore krefs. > >> >> I want wondering if instead of a kernel config option, we could use a kernel >> parameter to enable/disable this feature, I believe it should act as a >> better alternative. After all there must be people like me who are forced to >> recompile the whole linux kernel just for the sake of getting a >> functionality. > > Just allowing it like this is not a good thing. You may have a all kind > of issues with use after free, dangling pointers etc. That would just > make life harder for people not involved with proprietary modules. > > >> I hope for a reply and suggestions > > I´d ask them to upstream their driver and start sending the issues to > their side. > > Lucas De Marchi > >> >> Regards >> Aditya
Re: [PATCH] ARM: dts: qcom: motorola-falcon: add accelerometer, magnetometer
On Sun, 09 Jun 2024 13:05:43 +0200, Stanislav Jakubek wrote: > Add the accelerometer and magnetometer that are present on the Motorola > Moto G (2013) device. > > Applied, thanks! [1/1] ARM: dts: qcom: motorola-falcon: add accelerometer, magnetometer commit: 5756101babc5334a9bc99601d1cc0d6776fa9ada Best regards, -- Bjorn Andersson
Re: [PATCH v2 0/2] Support mailbox interface in qcom,smsm driver
On Thu, 06 Jun 2024 21:18:31 +0200, Luca Weiss wrote: > Take a shot at converting the last driver that requires direct > "qcom,ipc*" syscon references in devicetree by allowing the smsm driver > to use the mailbox interface to achieve the same effect. > > Still not sure if the devicetree bindings are the prettiest but they're > functional. > > [...] Applied, thanks! [1/2] dt-bindings: soc: qcom,smsm: Allow specifying mboxes instead of qcom,ipc commit: 5e66abcf1e250f032ecb18a7ecfac5287298ed8e [2/2] soc: qcom: smsm: Support using mailbox interface commit: 75287992f58a74271a083fef0356bc81d629f671 Best regards, -- Bjorn Andersson
Re: [PATCH] arm64: dts: qcom: qcm6490-fairphone-fp5: Use .mbn firmware for IPA
On Thu, 06 Jun 2024 11:09:06 +0200, Luca Weiss wrote: > Specify the file name for the squashed/non-split firmware with the .mbn > extension instead of the split .mdt. The kernel can load both but the > squashed version is preferred in dts nowadays. > > Applied, thanks! [1/1] arm64: dts: qcom: qcm6490-fairphone-fp5: Use .mbn firmware for IPA commit: ee5dcd7393af9af3494f533a6308faa539bd6718 Best regards, -- Bjorn Andersson
[PATCH] virtio_net: Eliminate OOO packets during switching
Disable the network device & turn off carrier before modifying the number of queue pairs. Process all the in-flight packets and then turn on carrier, followed by waking up all the queues on the network device. Signed-off-by: Abhinav Jain --- drivers/net/virtio_net.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 61a57d134544..d0a655a3b4c6 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3447,7 +3447,6 @@ static void virtnet_get_drvinfo(struct net_device *dev, } -/* TODO: Eliminate OOO packets during switching */ static int virtnet_set_channels(struct net_device *dev, struct ethtool_channels *channels) { @@ -3471,6 +3470,15 @@ static int virtnet_set_channels(struct net_device *dev, if (vi->rq[0].xdp_prog) return -EINVAL; + /* Disable network device to prevent packet processing during +* the switch. +*/ + netif_tx_disable(dev); + netif_carrier_off(dev); + + /* Make certain that all in-flight packets are processed. */ + synchronize_net(); + cpus_read_lock(); err = virtnet_set_queues(vi, queue_pairs); if (err) { @@ -3482,7 +3490,12 @@ static int virtnet_set_channels(struct net_device *dev, netif_set_real_num_tx_queues(dev, queue_pairs); netif_set_real_num_rx_queues(dev, queue_pairs); - err: + + /* Restart the network device */ + netif_carrier_on(dev); + netif_tx_wake_all_queues(dev); + +err: return err; } -- 2.34.1
Re: [PATCHv8 bpf-next 3/9] uprobe: Add uretprobe syscall to speed up return probe
On Fri, Jun 14, 2024 at 09:26:59PM +0200, Jiri Olsa wrote: > On Fri, Jun 14, 2024 at 10:48:22AM -0700, Nathan Chancellor wrote: > > Hi Jiri, > > > > On Tue, Jun 11, 2024 at 01:21:52PM +0200, Jiri Olsa wrote: > > > Adding uretprobe syscall instead of trap to speed up return probe. > > ... > > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > > > index 2c83ba776fc7..2816e65729ac 100644 > > > --- a/kernel/events/uprobes.c > > > +++ b/kernel/events/uprobes.c > > > @@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm, > > > struct xol_area *area) > > > return ret; > > > } > > > > > > +void * __weak arch_uprobe_trampoline(unsigned long *psize) > > > +{ > > > + static uprobe_opcode_t insn = UPROBE_SWBP_INSN; > > > > This change as commit ff474a78cef5 ("uprobe: Add uretprobe syscall to > > speed up return probe") in -next causes the following build error for > > ARCH=loongarch allmodconfig: > > > > In file included from include/linux/uprobes.h:49, > >from include/linux/mm_types.h:16, > >from include/linux/mmzone.h:22, > >from include/linux/gfp.h:7, > >from include/linux/xarray.h:16, > >from include/linux/list_lru.h:14, > >from include/linux/fs.h:13, > >from include/linux/highmem.h:5, > >from kernel/events/uprobes.c:13: > > kernel/events/uprobes.c: In function 'arch_uprobe_trampoline': > > arch/loongarch/include/asm/uprobes.h:12:33: error: initializer element is > > not constant > > 12 | #define UPROBE_SWBP_INSN > > larch_insn_gen_break(BRK_UPROBE_BP) > > | ^~~~ > > kernel/events/uprobes.c:1479:39: note: in expansion of macro > > 'UPROBE_SWBP_INSN' > >1479 | static uprobe_opcode_t insn = UPROBE_SWBP_INSN; > > reproduced, could you please try the change below Yeah, that fixes the issue for me. Tested-by: Nathan Chancellor # build > --- > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index 2816e65729ac..6986bd993702 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -1476,8 +1476,9 @@ static int xol_add_vma(struct mm_struct *mm, struct > xol_area *area) > > void * __weak arch_uprobe_trampoline(unsigned long *psize) > { > - static uprobe_opcode_t insn = UPROBE_SWBP_INSN; > + static uprobe_opcode_t insn; > > + insn = insn ?: UPROBE_SWBP_INSN; > *psize = UPROBE_SWBP_INSN_SIZE; > return &insn; > }
Re: [PATCH v4 net-next 1/7] net: add rx_sk to trace_kfree_skb
On Fri, Jun 14, 2024 at 5:15 AM Paolo Abeni wrote: > > On Wed, 2024-06-12 at 09:59 +0200, Jesper Dangaard Brouer wrote: > > > > On 11/06/2024 22.11, Yan Zhai wrote: > > > skb does not include enough information to find out receiving > > > sockets/services and netns/containers on packet drops. In theory > > > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP > > > stack for OOO packet lookup. Similarly, skb->sk often identifies a local > > > sender, and tells nothing about a receiver. > > > > > > Allow passing an extra receiving socket to the tracepoint to improve > > > the visibility on receiving drops. > > > > > > Signed-off-by: Yan Zhai > > > --- > > > v3->v4: adjusted the TP_STRUCT field order to be consistent > > > v2->v3: fixed drop_monitor function prototype > > > --- > > > include/trace/events/skb.h | 11 +++ > > > net/core/dev.c | 2 +- > > > net/core/drop_monitor.c| 9 ++--- > > > net/core/skbuff.c | 2 +- > > > 4 files changed, 15 insertions(+), 9 deletions(-) > > > > > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > > > index 07e0715628ec..3e9ea1cca6f2 100644 > > > --- a/include/trace/events/skb.h > > > +++ b/include/trace/events/skb.h > > > @@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN) > > > TRACE_EVENT(kfree_skb, > > > > > > TP_PROTO(struct sk_buff *skb, void *location, > > > -enum skb_drop_reason reason), > > > +enum skb_drop_reason reason, struct sock *rx_sk), > > > > > > - TP_ARGS(skb, location, reason), > > > + TP_ARGS(skb, location, reason, rx_sk), > > > > > > TP_STRUCT__entry( > > > __field(void *, skbaddr) > > > __field(void *, location) > > > + __field(void *, rx_skaddr) > > > > Is there any reason for appending the "addr" part to "rx_sk" ? > > It makes it harder to read this is the sk (socket). > > > > AFAICR the skbaddr naming is a legacy thing. > > I'm double-minded about the above: I can see your point, but on the > flip side the 'addr' suffix is consistently used in net-related > tracepoints. > > > > > __field(unsigned short, protocol) > > > __field(enum skb_drop_reason, reason) > > > ), > > > @@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb, > > > TP_fast_assign( > > > __entry->skbaddr = skb; > > > __entry->location = location; > > > + __entry->rx_skaddr = rx_sk; > > > __entry->protocol = ntohs(skb->protocol); > > > __entry->reason = reason; > > > ), > > > > > > - TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s", > > > - __entry->skbaddr, __entry->protocol, __entry->location, > > > + TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: > > > %s", > >^ > > I find it hard to visually tell skbaddr and rx_skaddr apart. > > And especially noticing the "skb" vs "sk" part of the string. > > I agree 'rx_skaddr' is sub-optimal. Either be consistent with all the > other net tracepoints and use 'skaddr' (which will very likely will > increase Jesper concerns, but I personally have no problem with such > format) or prefer readability with something alike 'rx_sk' or (even > better) 'sk'. > Jesper explained to me in a private message that "addr" makes more sense when there was no BPF, since likely nothing would dereference the pointer anymore at that time, so it's purely an address. But it is no longer the case now. Also, in later patches of this change, I am already breaking the "convention" by replacing kfree_skb with sk_skb_reason_drop, so how about breaking it once more, and just calling it "rx_sk". I want to keep the "rx_" to emphasize this is a receiving socket. Let me send an amended version early next week and see if more thoughts come. thanks Yan > Thanks, > > Paolo >
Re: [PATCHv8 bpf-next 3/9] uprobe: Add uretprobe syscall to speed up return probe
On Fri, Jun 14, 2024 at 10:48:22AM -0700, Nathan Chancellor wrote: > Hi Jiri, > > On Tue, Jun 11, 2024 at 01:21:52PM +0200, Jiri Olsa wrote: > > Adding uretprobe syscall instead of trap to speed up return probe. > ... > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > > index 2c83ba776fc7..2816e65729ac 100644 > > --- a/kernel/events/uprobes.c > > +++ b/kernel/events/uprobes.c > > @@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm, struct > > xol_area *area) > > return ret; > > } > > > > +void * __weak arch_uprobe_trampoline(unsigned long *psize) > > +{ > > + static uprobe_opcode_t insn = UPROBE_SWBP_INSN; > > This change as commit ff474a78cef5 ("uprobe: Add uretprobe syscall to > speed up return probe") in -next causes the following build error for > ARCH=loongarch allmodconfig: > > In file included from include/linux/uprobes.h:49, >from include/linux/mm_types.h:16, >from include/linux/mmzone.h:22, >from include/linux/gfp.h:7, >from include/linux/xarray.h:16, >from include/linux/list_lru.h:14, >from include/linux/fs.h:13, >from include/linux/highmem.h:5, >from kernel/events/uprobes.c:13: > kernel/events/uprobes.c: In function 'arch_uprobe_trampoline': > arch/loongarch/include/asm/uprobes.h:12:33: error: initializer element is > not constant > 12 | #define UPROBE_SWBP_INSNlarch_insn_gen_break(BRK_UPROBE_BP) > | ^~~~ > kernel/events/uprobes.c:1479:39: note: in expansion of macro > 'UPROBE_SWBP_INSN' >1479 | static uprobe_opcode_t insn = UPROBE_SWBP_INSN; reproduced, could you please try the change below thanks, jirka --- diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 2816e65729ac..6986bd993702 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1476,8 +1476,9 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area) void * __weak arch_uprobe_trampoline(unsigned long *psize) { - static uprobe_opcode_t insn = UPROBE_SWBP_INSN; + static uprobe_opcode_t insn; + insn = insn ?: UPROBE_SWBP_INSN; *psize = UPROBE_SWBP_INSN_SIZE; return &insn; }
Re: [PATCH v4 2/3] kbuild, kconfig: generate offset range data for builtin modules
My apologies (esp. to Masahiro Yamada)... this patch was supposed to resolve the outstanding issue of needing to add gawk to the dependencies in the documentation and that part of the patch still didn't make it in. I've added it on my end for v5, and will absolutely ensure that it will be in the posted version. Again, sorry for have yet again overlooked that. Kris On Fri, Jun 14, 2024 at 01:14:27PM -0400, Kris Van Hees wrote: > The offset range data for builtin modules is generated using: > - modules.builtin: associates object files with module names > - vmlinux.map: provides load order of sections and offset of first member > per section > - vmlinux.o.map: provides offset of object file content per section > - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME > > The generated data will look like: > > .text - = _text > .text baf0-cb10 amd_uncore > .text 0009bd10-0009c8e0 iosf_mbi > ... > .text 008e6660-008e9630 snd_soc_wcd_mbhc > .text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x > .text 008ea610-008ea780 snd_soc_wcd9335 > ... > .data - = _sdata > .data f020-f680 amd_uncore > > For each ELF section, it lists the offset of the first symbol. This can > be used to determine the base address of the section at runtime. > > Next, it lists (in strict ascending order) offset ranges in that section > that cover the symbols of one or more builtin modules. Multiple ranges > can apply to a single module, and ranges can be shared between modules. > > The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data > is generated for kernel modules that are built into the kernel image. > > Signed-off-by: Kris Van Hees > Reviewed-by: Nick Alcock > Reviewed-by: Alan Maguire > --- > Changes since v3: > - Consolidated patches 2 through 5 into a single patch > - Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug > - Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP > - Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y > - Support LLVM (lld) compiles in generate_builtin_ranges.awk > - Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y > > Changes since v2: > - Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES > - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo > - Switched from using modules.builtin.objs to parsing .*.cmd files > - Parse data from .*.cmd in generate_builtin_ranges.awk > - Use $(real-prereqs) rather than $(filter-out ...) > --- > lib/Kconfig.debug | 19 ++ > scripts/Makefile.vmlinux| 16 ++ > scripts/Makefile.vmlinux_o | 3 + > scripts/generate_builtin_ranges.awk | 284 > 4 files changed, 322 insertions(+) > create mode 100755 scripts/generate_builtin_ranges.awk > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index 291185f54ee4..03fddad67d59 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -571,6 +571,25 @@ config VMLINUX_MAP > pieces of code get eliminated with > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION. > > +config BUILTIN_MODULE_RANGES > + bool "Generate address range information for builtin modules" > + depends on !LTO_CLANG_FULL > + depends on !LTO_CLANG_THIN > + select VMLINUX_MAP > + help > + When modules are built into the kernel, there will be no module name > + associated with its symbols in /proc/kallsyms. Tracers may want to > + identify symbols by module name and symbol name regardless of whether > + the module is configured as loadable or not. > + > + This option generates modules.builtin.ranges in the build tree with > + offset ranges (per ELF section) for the module(s) they belong to. > + It also records an anchor symbol to determine the load address of the > + section. > + > + It is fully compatible with CONFIG_RANDOMIZE_BASE and similar late- > + address-modification options. > + > config DEBUG_FORCE_WEAK_PER_CPU > bool "Force weak per-cpu definitions" > depends on DEBUG_KERNEL > diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux > index c9f3e03124d7..5fd1f272ccde 100644 > --- a/scripts/Makefile.vmlinux > +++ b/scripts/Makefile.vmlinux > @@ -36,6 +36,22 @@ targets += vmlinux > vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE > +$(call if_changed_dep,link_vmlinux) > > +# module.builtin.ranges > +# --- > +ifdef CONFIG_BUILTIN_MODULE_RANGES > +__default: modules.builtin.ranges > + > +quiet_cmd_modules_builtin_ranges = GEN $@ > + cmd_modules_builtin_ranges = \ > + $(srctree)/scripts/generate_builtin_ranges.awk $(real-prereqs) > $@ > + > +vmlinux.map: vmlinux > + > +targets += modules.builtin.ranges > +modules.builtin.ranges: modules.builtin vmlinux.map vmlinux.o.map FORCE > + $(cal
Re: [PATCH] lib/test_kmod: add missing MODULE_DESCRIPTION() macro
On Fri, May 31, 2024 at 05:23:09PM GMT, Jeff Johnson wrote: make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_kmod.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson Reviewed-by: Lucas De Marchi Lucas De Marchi --- lib/test_kmod.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/test_kmod.c b/lib/test_kmod.c index 1eec3b7ac67c..064ed0fce75a 100644 --- a/lib/test_kmod.c +++ b/lib/test_kmod.c @@ -1223,4 +1223,5 @@ static void __exit test_kmod_exit(void) module_exit(test_kmod_exit); MODULE_AUTHOR("Luis R. Rodriguez "); +MODULE_DESCRIPTION("kmod stress test driver"); MODULE_LICENSE("GPL"); --- base-commit: b050496579632f86ee1ef7e7501906db579f3457 change-id: 20240531-md-lib-test_kmod-83bf2ee7e725
Re: Current status and possible improvements in CONFIG_MODULE_FORCE_UNLOAD
On Thu, Jun 06, 2024 at 06:49:59AM GMT, Aditya Garg wrote: Hi I am Aditya Garg. I often require using out of tree drivers to support various hardwares on Linux. Sometimes the provider doesn't write good drivers, and often they have to be force unloaded. It's a common thing in proprietary drivers. I know the author of the driver should take note of the issues, but still the force unloading of the modules does come in handy many times. Unfortunately if CONFIG_MODULE_FORCE_UNLOAD is not enabled in your kernel, which most probably is not enabled if you are using a Distribution pre compiled kernel, you have to recompile the whole kernel again. CONFIG_MODULE_FORCE_UNLOAD only ever makes sense on a developer environment loading/unloading multiple times his own .ko module. Then the developer knows better the state of the driver and hw to judge if it's safe to ignore krefs. I want wondering if instead of a kernel config option, we could use a kernel parameter to enable/disable this feature, I believe it should act as a better alternative. After all there must be people like me who are forced to recompile the whole linux kernel just for the sake of getting a functionality. Just allowing it like this is not a good thing. You may have a all kind of issues with use after free, dangling pointers etc. That would just make life harder for people not involved with proprietary modules. I hope for a reply and suggestions I´d ask them to upstream their driver and start sending the issues to their side. Lucas De Marchi Regards Aditya
Re: [PATCH v4 1/3] kbuild: add mod(name,file)_flags to assembler flags for module objects
On Fri, 14 Jun 2024 14:10:58 -0400 Kris Van Hees wrote: > On Fri, Jun 14, 2024 at 01:46:51PM -0400, Steven Rostedt wrote: > > On Fri, 14 Jun 2024 13:14:26 -0400 > > Kris Van Hees wrote: > > > > > Module objects compiled from C source can be identified by the presence > > > of -DKBUILD_MODFILE and -DKBUILD_MODNAME on their compile command lines. > > > However, module objects from assembler source do not have this defines. > > > > > > Add $(modfile_flags) to modkern_aflags (similar to modkern_cflahs), and > > > add $(modname_flags) to a_flags (similar to c_flags). > > > > You explain what this does but not why it does it. > > The first paragraph is meant to estabish the "why" (being able to identify > what objects are module objects, even if they are compiled from assembler > source). Perhaps there's a lack of context. Sure, the cover letter can help in this regard, but I always look at each commit as a stand alone. > > As I mention, for objects compiled from C source code, those defines being > present identifies those objects as belonging to a module. For objects > compiled from assembler source code, those defines are not present. Passing > them on the compile command line for assembler source code files for objects > that are part of one or more modules allows us to identify all objects that > are part of modules with a single consistent mechanism. Sure, but why do we care? Again, if this was the only patch you sent, it should explain why it is being done. Perhaps something like: "In order to be able to identify what code is from a module, even if it is built in, ..." But what you are saying is just "C code has these flags, make assembly have them too". Which is meaningless. The other patches could use some more explanation too. -- Steve
Re: [PATCH v4 1/3] kbuild: add mod(name,file)_flags to assembler flags for module objects
On Fri, Jun 14, 2024 at 01:46:51PM -0400, Steven Rostedt wrote: > On Fri, 14 Jun 2024 13:14:26 -0400 > Kris Van Hees wrote: > > > Module objects compiled from C source can be identified by the presence > > of -DKBUILD_MODFILE and -DKBUILD_MODNAME on their compile command lines. > > However, module objects from assembler source do not have this defines. > > > > Add $(modfile_flags) to modkern_aflags (similar to modkern_cflahs), and > > add $(modname_flags) to a_flags (similar to c_flags). > > You explain what this does but not why it does it. The first paragraph is meant to estabish the "why" (being able to identify what objects are module objects, even if they are compiled from assembler source). As I mention, for objects compiled from C source code, those defines being present identifies those objects as belonging to a module. For objects compiled from assembler source code, those defines are not present. Passing them on the compile command line for assembler source code files for objects that are part of one or more modules allows us to identify all objects that are part of modules with a single consistent mechanism. Kris
Re: [PATCHv8 bpf-next 3/9] uprobe: Add uretprobe syscall to speed up return probe
Hi Jiri, On Tue, Jun 11, 2024 at 01:21:52PM +0200, Jiri Olsa wrote: > Adding uretprobe syscall instead of trap to speed up return probe. ... > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index 2c83ba776fc7..2816e65729ac 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm, struct > xol_area *area) > return ret; > } > > +void * __weak arch_uprobe_trampoline(unsigned long *psize) > +{ > + static uprobe_opcode_t insn = UPROBE_SWBP_INSN; This change as commit ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") in -next causes the following build error for ARCH=loongarch allmodconfig: In file included from include/linux/uprobes.h:49, from include/linux/mm_types.h:16, from include/linux/mmzone.h:22, from include/linux/gfp.h:7, from include/linux/xarray.h:16, from include/linux/list_lru.h:14, from include/linux/fs.h:13, from include/linux/highmem.h:5, from kernel/events/uprobes.c:13: kernel/events/uprobes.c: In function 'arch_uprobe_trampoline': arch/loongarch/include/asm/uprobes.h:12:33: error: initializer element is not constant 12 | #define UPROBE_SWBP_INSNlarch_insn_gen_break(BRK_UPROBE_BP) | ^~~~ kernel/events/uprobes.c:1479:39: note: in expansion of macro 'UPROBE_SWBP_INSN' 1479 | static uprobe_opcode_t insn = UPROBE_SWBP_INSN; | ^~~~ > + *psize = UPROBE_SWBP_INSN_SIZE; > + return &insn; > +} > + > static struct xol_area *__create_xol_area(unsigned long vaddr) > { > struct mm_struct *mm = current->mm; > - uprobe_opcode_t insn = UPROBE_SWBP_INSN; > + unsigned long insns_size; > struct xol_area *area; > + void *insns; > > area = kmalloc(sizeof(*area), GFP_KERNEL); > if (unlikely(!area)) > @@ -1502,7 +1511,8 @@ static struct xol_area *__create_xol_area(unsigned long > vaddr) > /* Reserve the 1st slot for get_trampoline_vaddr() */ > set_bit(0, area->bitmap); > atomic_set(&area->slot_count, 1); > - arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE); > + insns = arch_uprobe_trampoline(&insns_size); > + arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size); > > if (!xol_add_vma(mm, area)) > return area; > @@ -1827,7 +1837,7 @@ void uprobe_copy_process(struct task_struct *t, > unsigned long flags) > * > * Returns -1 in case the xol_area is not allocated. > */ > -static unsigned long get_trampoline_vaddr(void) > +unsigned long uprobe_get_trampoline_vaddr(void) > { > struct xol_area *area; > unsigned long trampoline_vaddr = -1; > @@ -1878,7 +1888,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, > struct pt_regs *regs) > if (!ri) > return; > > - trampoline_vaddr = get_trampoline_vaddr(); > + trampoline_vaddr = uprobe_get_trampoline_vaddr(); > orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, > regs); > if (orig_ret_vaddr == -1) > goto fail; > @@ -2123,7 +2133,7 @@ static struct return_instance > *find_next_ret_chain(struct return_instance *ri) > return ri; > } > > -static void handle_trampoline(struct pt_regs *regs) > +void uprobe_handle_trampoline(struct pt_regs *regs) > { > struct uprobe_task *utask; > struct return_instance *ri, *next; > @@ -2187,8 +2197,8 @@ static void handle_swbp(struct pt_regs *regs) > int is_swbp; > > bp_vaddr = uprobe_get_swbp_addr(regs); > - if (bp_vaddr == get_trampoline_vaddr()) > - return handle_trampoline(regs); > + if (bp_vaddr == uprobe_get_trampoline_vaddr()) > + return uprobe_handle_trampoline(regs); > > uprobe = find_active_uprobe(bp_vaddr, &is_swbp); > if (!uprobe) { > -- > 2.45.1 > Cheers, Nathan
Re: [PATCH v4 1/3] kbuild: add mod(name,file)_flags to assembler flags for module objects
On Fri, 14 Jun 2024 13:14:26 -0400 Kris Van Hees wrote: > Module objects compiled from C source can be identified by the presence > of -DKBUILD_MODFILE and -DKBUILD_MODNAME on their compile command lines. > However, module objects from assembler source do not have this defines. > > Add $(modfile_flags) to modkern_aflags (similar to modkern_cflahs), and > add $(modname_flags) to a_flags (similar to c_flags). You explain what this does but not why it does it. -- Steve
[PATCH v4 3/3] module: add install target for modules.builtin.ranges
When CONFIG_BUILTIN_MODULE_RANGES is enabled, the modules.builtin.ranges file should be installed in the module install location. Signed-off-by: Kris Van Hees Reviewed-by: Nick Alcock --- Changes since v3: - Only install modules.builtin.ranges if CONFIG_BUILTIN_MODULE_RANGES=y --- scripts/Makefile.modinst | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/scripts/Makefile.modinst b/scripts/Makefile.modinst index 0afd75472679..c38bf63a33be 100644 --- a/scripts/Makefile.modinst +++ b/scripts/Makefile.modinst @@ -30,10 +30,12 @@ $(MODLIB)/modules.order: modules.order FORCE quiet_cmd_install_modorder = INSTALL $@ cmd_install_modorder = sed 's:^\(.*\)\.o$$:kernel/\1.ko:' $< > $@ -# Install modules.builtin(.modinfo) even when CONFIG_MODULES is disabled. +# Install modules.builtin(.modinfo,.ranges) even when CONFIG_MODULES is disabled. install-y += $(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo) -$(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo): $(MODLIB)/%: % FORCE +install-$(CONFIG_BUILTIN_MODULE_RANGES) += $(MODLIB)/modules.builtin.ranges + +$(addprefix $(MODLIB)/, modules.builtin modules.builtin.modinfo modules.builtin.ranges): $(MODLIB)/%: % FORCE $(call cmd,install) endif -- 2.45.1
[PATCH v4 2/3] kbuild, kconfig: generate offset range data for builtin modules
The offset range data for builtin modules is generated using: - modules.builtin: associates object files with module names - vmlinux.map: provides load order of sections and offset of first member per section - vmlinux.o.map: provides offset of object file content per section - .*.cmd: build cmd file with KBUILD_MODFILE and KBUILD_MODNAME The generated data will look like: .text - = _text .text baf0-cb10 amd_uncore .text 0009bd10-0009c8e0 iosf_mbi ... .text 008e6660-008e9630 snd_soc_wcd_mbhc .text 008e9630-008ea610 snd_soc_wcd9335 snd_soc_wcd934x snd_soc_wcd938x .text 008ea610-008ea780 snd_soc_wcd9335 ... .data - = _sdata .data f020-f680 amd_uncore For each ELF section, it lists the offset of the first symbol. This can be used to determine the base address of the section at runtime. Next, it lists (in strict ascending order) offset ranges in that section that cover the symbols of one or more builtin modules. Multiple ranges can apply to a single module, and ranges can be shared between modules. The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data is generated for kernel modules that are built into the kernel image. Signed-off-by: Kris Van Hees Reviewed-by: Nick Alcock Reviewed-by: Alan Maguire --- Changes since v3: - Consolidated patches 2 through 5 into a single patch - Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug - Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP - Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y - Support LLVM (lld) compiles in generate_builtin_ranges.awk - Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y Changes since v2: - Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo - Switched from using modules.builtin.objs to parsing .*.cmd files - Parse data from .*.cmd in generate_builtin_ranges.awk - Use $(real-prereqs) rather than $(filter-out ...) --- lib/Kconfig.debug | 19 ++ scripts/Makefile.vmlinux| 16 ++ scripts/Makefile.vmlinux_o | 3 + scripts/generate_builtin_ranges.awk | 284 4 files changed, 322 insertions(+) create mode 100755 scripts/generate_builtin_ranges.awk diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 291185f54ee4..03fddad67d59 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -571,6 +571,25 @@ config VMLINUX_MAP pieces of code get eliminated with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION. +config BUILTIN_MODULE_RANGES + bool "Generate address range information for builtin modules" + depends on !LTO_CLANG_FULL + depends on !LTO_CLANG_THIN + select VMLINUX_MAP + help +When modules are built into the kernel, there will be no module name +associated with its symbols in /proc/kallsyms. Tracers may want to +identify symbols by module name and symbol name regardless of whether +the module is configured as loadable or not. + +This option generates modules.builtin.ranges in the build tree with +offset ranges (per ELF section) for the module(s) they belong to. +It also records an anchor symbol to determine the load address of the +section. + +It is fully compatible with CONFIG_RANDOMIZE_BASE and similar late- +address-modification options. + config DEBUG_FORCE_WEAK_PER_CPU bool "Force weak per-cpu definitions" depends on DEBUG_KERNEL diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux index c9f3e03124d7..5fd1f272ccde 100644 --- a/scripts/Makefile.vmlinux +++ b/scripts/Makefile.vmlinux @@ -36,6 +36,22 @@ targets += vmlinux vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE +$(call if_changed_dep,link_vmlinux) +# module.builtin.ranges +# --- +ifdef CONFIG_BUILTIN_MODULE_RANGES +__default: modules.builtin.ranges + +quiet_cmd_modules_builtin_ranges = GEN $@ + cmd_modules_builtin_ranges = \ + $(srctree)/scripts/generate_builtin_ranges.awk $(real-prereqs) > $@ + +vmlinux.map: vmlinux + +targets += modules.builtin.ranges +modules.builtin.ranges: modules.builtin vmlinux.map vmlinux.o.map FORCE + $(call if_changed,modules_builtin_ranges) +endif + # Add FORCE to the prequisites of a target to force it to be always rebuilt. # --- diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o index 6de297916ce6..252505505e0e 100644 --- a/scripts/Makefile.vmlinux_o +++ b/scripts/Makefile.vmlinux_o @@ -45,9 +45,12 @@ objtool-args = $(vmlinux-objtool-args-y) --link # Link of vmlinux.o used for section mismatch analysis # --- +vmlinux-o-ld-args-$(CONFIG_BUI
[PATCH v4 1/3] kbuild: add mod(name,file)_flags to assembler flags for module objects
Module objects compiled from C source can be identified by the presence of -DKBUILD_MODFILE and -DKBUILD_MODNAME on their compile command lines. However, module objects from assembler source do not have this defines. Add $(modfile_flags) to modkern_aflags (similar to modkern_cflahs), and add $(modname_flags) to a_flags (similar to c_flags). Signed-off-by: Kris Van Hees --- scripts/Makefile.lib | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 3179747cbd2c..a2524ffd046f 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -234,7 +234,7 @@ modkern_rustflags = \ modkern_aflags = $(if $(part-of-module), \ $(KBUILD_AFLAGS_MODULE) $(AFLAGS_MODULE), \ - $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL)) + $(KBUILD_AFLAGS_KERNEL) $(AFLAGS_KERNEL) $(modfile_flags)) c_flags= -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \ -include $(srctree)/include/linux/compiler_types.h \ @@ -244,7 +244,7 @@ c_flags= -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \ rust_flags = $(_rust_flags) $(modkern_rustflags) @$(objtree)/include/generated/rustc_cfg a_flags= -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \ -$(_a_flags) $(modkern_aflags) +$(_a_flags) $(modkern_aflags) $(modname_flags) cpp_flags = -Wp,-MMD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \ $(_cpp_flags) -- 2.45.1
[PATCH v4 0/3] Generate address range data for built-in modules
Especially for tracing applications, it is convenient to be able to refer to a symbol using a pair and to be able to translate an address into a pair. But that does not work if the module is built into the kernel because the object files that comprise the built-in module implementation are simply linked into the kernel image along with all other kernel object files. This is especially visible when providing tracing scripts for support purposes, where the developer of the script targets a particular kernel version, but does not have control over whether the target system has a particular module as loadable module or built-in module. When tracing symbols within a module, referring them by pairs is both convenient and aids symbol lookup. But that naming will not work if the module name information is lost if the module is built into the kernel on the target system. Earlier work addressing this loss of information for built-in modules involved adding module name information to the kallsyms data, but that required more invasive code in the kernel proper. This work never did get merged into the kernel tree. All that is really needed is knowing whether a given address belongs to a particular module (or multiple modules if they share an object file). Or in other words, whether that address falls within an address range that is associated with one or more modules. Objects can be identified as belonging to a particular module (or modules) based on defines that are passed as flags to their respective compilation commands. The data found in modules.builtin is used to determine what modules are built into the kernel proper. Then, vmlinux.o.map and vmlinux.map can be parsed in a single pass to generate a modules.buitin.ranges file with offset range information (relative to the base address of the associated section) for built-in modules. This file gets installed along with the other modules.builtin.* files. The impact on the kernel build is minimal because everything is done using a single-pass AWK script. The generated data size is minimal as well, (depending on the exact kernel configuration) usually in the range of 500-700 lines, with a file size of 20-40KB (if all modules are built in, the file contains about 8000 lines, with a file size of about 285KB). Changes since v3: - Consolidated patches 2 through 5 into a single patch - Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug - Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP - Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y - Support LLVM (lld) compiles in generate_builtin_ranges.awk - Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y - Only install modules.builtin.ranges if CONFIG_BUILTIN_MODULE_RANGES=y Changes since v2: - Switched from using modules.builtin.objs to parsing .*.cmd files - Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo - Parse data from .*.cmd in generate_builtin_ranges.awk - Use $(real-prereqs) rather than $(filter-out ...) - Include modules.builtin.ranges in modules install target Changes since v1: - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES - Moved the config option to the tracers section - 2nd arg to generate_builtin_ranges.awk should be vmlinux.map Kris Van Hees (5): trace: add CONFIG_BUILTIN_MODULE_RANGES option kbuild: generate a linker map for vmlinux.o module: script to generate offset ranges for builtin modules kbuild: generate modules.builtin.ranges when linking the kernel module: add install target for modules.builtin.ranges Luis Chamberlain (1): kbuild: add modules.builtin.objs .gitignore | 2 +- Documentation/dontdiff | 2 +- Documentation/kbuild/kbuild.rst | 5 ++ Makefile| 8 +- include/linux/module.h | 4 +- kernel/trace/Kconfig| 17 scripts/Makefile.lib| 5 +- scripts/Makefile.modinst| 11 ++- scripts/Makefile.vmlinux| 17 scripts/Makefile.vmlinux_o | 18 - scripts/generate_builtin_ranges.awk | 149 11 files changed, 228 insertions(+), 10 deletions(-) create mode 100755 scripts/generate_builtin_ranges.awk base-commit: dd5a440a31fae6e459c0d627162825505361 -- 2.42.0
Re: [PATCH v6 0/7] DAMON based tiered memory management for CXL memory
On Fri, 14 Jun 2024 12:00:02 +0900 Honggyu Kim wrote: > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > posted at [1]. > > It says there is no implementation of the demote/promote DAMOS action > are made. This patch series is about its implementation for physical > address space so that this scheme can be applied in system wide level. > > Changes from v5: > https://lore.kernel.org/20240613132056.608-1-honggyu@sk.com > 1. Remove new actions in usage document as its for debugfs Thank you, I confirmed this and gave you my Reviewed-by: tag. > 2. Apply minor fixes on cover letter But... [...] > 2. YCSB zipfian distribution read only workload (with demotion_enabled true) > memory pressure with cold memory on node0 with 512GB of local DRAM. > > ++= > | cold memory occupied by mmap and memset | > | 0G 440G 450G 460G 470G 480G 490G 500G | > > ++= > Execution time normalized to DRAM-only values| > GEOMEAN > > ++- > DAMON tiered|- 1.03 1.03 1.03 1.03 1.03 1.07 1.05 | 1.04 > DAMON lazy |- 1.04 1.03 1.04 1.05 1.06 1.06 1.06 | 1.05 > DAMON tiered kswapd |- 1.03 1.03 1.03 1.03 1.02 1.02 1.03 | 1.03 > DAMON lazy kswapd |- 1.04 1.04 1.04 1.03 1.05 1.04 1.05 | 1.04 > > ++= > CXL usage of redis-server in GB | > AVERAGE > > ++- > DAMON tiered|- 0.6 0.5 0.4 0.7 0.8 7.1 5.6 | 2.2 > DAMON lazy |- 0.5 3.0 4.5 5.4 6.4 9.4 9.1 | 5.5 > DAMON tiered kswapd |- 0.0 0.0 0.4 0.5 0.1 0.8 1.0 | 0.4 > DAMON lazy kswapd |- 4.2 4.6 5.3 1.7 6.8 8.1 5.8 | 5.2 > > ++= > > Each test result is based on the exeuction environment as follows. Seems the typo is not fixed? I don't want to delay this work for such trivial thing, though. For the patch series, Acked-by: SeongJae Park Thanks, SJ [...]
Re: [PATCH v6 7/7] Docs/damon: document damos_migrate_{hot,cold}
On Fri, 14 Jun 2024 12:00:09 +0900 Honggyu Kim wrote: > This patch adds damon description for "migrate_hot" and "migrate_cold" > actions for both usage and design documents as long as a new > "target_nid" knob to set the migration target node. > > Signed-off-by: Honggyu Kim Reviewed-by: SeongJae Park Thanks, SJ [...]
Re: [PATCH V2] net: qrtr: ns: Ignore ENODEV failures in ns
Hello: This patch was applied to netdev/net-next.git (main) by Paolo Abeni : On Wed, 12 Jun 2024 12:01:56 +0530 you wrote: > From: Chris Lew > > Ignore the ENODEV failures returned by kernel_sendmsg(). These errors > indicate that either the local port has been closed or the remote has > gone down. Neither of these scenarios are fatal and will eventually be > handled through packets that are later queued on the control port. > > [...] Here is the summary with links: - [V2] net: qrtr: ns: Ignore ENODEV failures in ns https://git.kernel.org/netdev/net-next/c/404dbd26322f You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
[PATCH v2] net: missing check virtio
Yeah, I was thinking of adding Fixes: But this code is new, it complements what is done. 1. check (!(ret && (hdr->gso_size > needed) && ((remainder > needed) || (remainder == 0 complements comit 0f6925b3e8da0 2. The setting of the SKBFL_SHARED_FRAG flag can be associated with this comit cef401de7be8c. In the skb_checksum_help function, a check for skb_has_shared_frag has been added. If the flag is not set, the skb buffer will remain non-linear, which is not good.
Re: [PATCH v4 net-next 1/7] net: add rx_sk to trace_kfree_skb
On Wed, 2024-06-12 at 09:59 +0200, Jesper Dangaard Brouer wrote: > > On 11/06/2024 22.11, Yan Zhai wrote: > > skb does not include enough information to find out receiving > > sockets/services and netns/containers on packet drops. In theory > > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP > > stack for OOO packet lookup. Similarly, skb->sk often identifies a local > > sender, and tells nothing about a receiver. > > > > Allow passing an extra receiving socket to the tracepoint to improve > > the visibility on receiving drops. > > > > Signed-off-by: Yan Zhai > > --- > > v3->v4: adjusted the TP_STRUCT field order to be consistent > > v2->v3: fixed drop_monitor function prototype > > --- > > include/trace/events/skb.h | 11 +++ > > net/core/dev.c | 2 +- > > net/core/drop_monitor.c| 9 ++--- > > net/core/skbuff.c | 2 +- > > 4 files changed, 15 insertions(+), 9 deletions(-) > > > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > > index 07e0715628ec..3e9ea1cca6f2 100644 > > --- a/include/trace/events/skb.h > > +++ b/include/trace/events/skb.h > > @@ -24,13 +24,14 @@ DEFINE_DROP_REASON(FN, FN) > > TRACE_EVENT(kfree_skb, > > > > TP_PROTO(struct sk_buff *skb, void *location, > > -enum skb_drop_reason reason), > > +enum skb_drop_reason reason, struct sock *rx_sk), > > > > - TP_ARGS(skb, location, reason), > > + TP_ARGS(skb, location, reason, rx_sk), > > > > TP_STRUCT__entry( > > __field(void *, skbaddr) > > __field(void *, location) > > + __field(void *, rx_skaddr) > > Is there any reason for appending the "addr" part to "rx_sk" ? > It makes it harder to read this is the sk (socket). > > AFAICR the skbaddr naming is a legacy thing. I'm double-minded about the above: I can see your point, but on the flip side the 'addr' suffix is consistently used in net-related tracepoints. > > > __field(unsigned short, protocol) > > __field(enum skb_drop_reason, reason) > > ), > > @@ -38,12 +39,14 @@ TRACE_EVENT(kfree_skb, > > TP_fast_assign( > > __entry->skbaddr = skb; > > __entry->location = location; > > + __entry->rx_skaddr = rx_sk; > > __entry->protocol = ntohs(skb->protocol); > > __entry->reason = reason; > > ), > > > > - TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s", > > - __entry->skbaddr, __entry->protocol, __entry->location, > > + TP_printk("skbaddr=%p rx_skaddr=%p protocol=%u location=%pS reason: %s", >^ > I find it hard to visually tell skbaddr and rx_skaddr apart. > And especially noticing the "skb" vs "sk" part of the string. I agree 'rx_skaddr' is sub-optimal. Either be consistent with all the other net tracepoints and use 'skaddr' (which will very likely will increase Jesper concerns, but I personally have no problem with such format) or prefer readability with something alike 'rx_sk' or (even better) 'sk'. Thanks, Paolo
[PATCH] bpf/selftests: Fix __NR_uretprobe in uprobe_syscall test
Fixing the __NR_uretprobe number in uprobe_syscall test, because it changed due to merge conflict. Signed-off-by: Jiri Olsa --- tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c index c8517c8f5313..bd8c75b620c2 100644 --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c @@ -216,7 +216,7 @@ static void test_uretprobe_regs_change(void) } #ifndef __NR_uretprobe -#define __NR_uretprobe 463 +#define __NR_uretprobe 467 #endif __naked unsigned long uretprobe_syscall_call_1(void) -- 2.45.1
[PATCH] module: Add log information for loading module failures
Add log information in kernel-space when loading module failures. Try to load the unsigned module and the module with bad signature when set 1 to /sys/module/module/parameters/sig_enforce. Unsigned module case: (linux) insmod unsigned.ko [ 18.714661] Loading of unsigned module is rejected insmod: can't insert 'unsigned.ko': Key was rejected by service (linux) Bad signature module case: (linux) insmod bad_signature.ko insmod: can't insert 'bad_signature.ko': Key was rejected by service (linux) There have different logging behavior the bad signature case only log in user-space, add log info for fatal errors in module_sig_check(). Signed-off-by: Yusong Gao --- kernel/module/signing.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/module/signing.c b/kernel/module/signing.c index a2ff4242e623..6a6493c8f7e4 100644 --- a/kernel/module/signing.c +++ b/kernel/module/signing.c @@ -113,6 +113,7 @@ int module_sig_check(struct load_info *info, int flags) * unparseable signatures, and signature check failures -- * even if signatures aren't required. */ + pr_notice("Loading module failed (errno=%d)\n", -err); return err; } -- 2.34.1
[PATCH v5 4/4] net: drop_monitor: use drop_reason_lookup()
From: Johannes Berg Now that we have drop_reason_lookup(), we can just use it for drop_monitor as well, rather than exporting the list itself. Signed-off-by: Johannes Berg --- v3: - look up SKB_DROP_REASON_NOT_SPECIFIED if initial lookup returns NULL, to preserve previous behaviour --- include/net/dropreason.h | 4 net/core/drop_monitor.c | 20 +--- net/core/skbuff.c| 6 +++--- 3 files changed, 8 insertions(+), 22 deletions(-) diff --git a/include/net/dropreason.h b/include/net/dropreason.h index c157070b5303..0e2195ccf2cd 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -38,10 +38,6 @@ struct drop_reason_list { size_t n_reasons; }; -/* Note: due to dynamic registrations, access must be under RCU */ -extern const struct drop_reason_list __rcu * -drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM]; - #ifdef CONFIG_TRACEPOINTS const char *drop_reason_lookup(unsigned long long value); void drop_reason_show(struct seq_file *m); diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 430ed18f8584..fddf6b68bf06 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -610,9 +610,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, size_t payload_len) { struct net_dm_skb_cb *cb = NET_DM_SKB_CB(skb); - const struct drop_reason_list *list = NULL; - unsigned int subsys, subsys_reason; char buf[NET_DM_MAX_SYMBOL_LEN]; + const char *reason_str; struct nlattr *attr; void *hdr; int rc; @@ -630,19 +629,10 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, goto nla_put_failure; rcu_read_lock(); - subsys = u32_get_bits(cb->reason, SKB_DROP_REASON_SUBSYS_MASK); - if (subsys < SKB_DROP_REASON_SUBSYS_NUM) - list = rcu_dereference(drop_reasons_by_subsys[subsys]); - subsys_reason = cb->reason & ~SKB_DROP_REASON_SUBSYS_MASK; - if (!list || - subsys_reason >= list->n_reasons || - !list->reasons[subsys_reason] || - strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) { - list = rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]); - subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED; - } - if (nla_put_string(msg, NET_DM_ATTR_REASON, - list->reasons[subsys_reason])) { + reason_str = drop_reason_lookup(cb->reason); + if (unlikely(!reason_str)) + reason_str = drop_reason_lookup(SKB_DROP_REASON_NOT_SPECIFIED); + if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) { rcu_read_unlock(); goto nla_put_failure; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index cd1ea6c3e0f8..bd4fb7410284 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -139,13 +139,11 @@ static const struct drop_reason_list drop_reasons_core = { .n_reasons = ARRAY_SIZE(drop_reasons), }; -const struct drop_reason_list __rcu * +static const struct drop_reason_list __rcu * drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { [SKB_DROP_REASON_SUBSYS_CORE] = RCU_INITIALIZER(&drop_reasons_core), }; -EXPORT_SYMBOL(drop_reasons_by_subsys); -#ifdef CONFIG_TRACEPOINTS const char *drop_reason_lookup(unsigned long long value) { unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT; @@ -162,7 +160,9 @@ const char *drop_reason_lookup(unsigned long long value) return NULL; return subsys->reasons[reason]; } +EXPORT_SYMBOL(drop_reason_lookup); +#ifdef CONFIG_TRACEPOINTS void drop_reason_show(struct seq_file *m) { u32 subsys_id; -- 2.45.2
[PATCH v5 3/4] net: dropreason: use new __print_sym() in tracing
From: Johannes Berg The __print_symbolic() could only ever print the core drop reasons, since that's the way the infrastructure works. Now that we have __print_sym() with all the advantages mentioned in that commit, convert to that and get all the drop reasons from all subsystems. As we already have a list of them, that's really easy. This is a little bit of .text (~100 bytes in my build) and saves a lot of .data (~17k). Signed-off-by: Johannes Berg --- include/net/dropreason.h | 5 + include/trace/events/skb.h | 16 +++--- net/core/skbuff.c | 43 ++ 3 files changed, 51 insertions(+), 13 deletions(-) diff --git a/include/net/dropreason.h b/include/net/dropreason.h index 56cb7be92244..c157070b5303 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -42,6 +42,11 @@ struct drop_reason_list { extern const struct drop_reason_list __rcu * drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM]; +#ifdef CONFIG_TRACEPOINTS +const char *drop_reason_lookup(unsigned long long value); +void drop_reason_show(struct seq_file *m); +#endif + void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys, const struct drop_reason_list *list); void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys); diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 07e0715628ec..8a1a63f9e796 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -8,15 +8,9 @@ #include #include #include +#include -#undef FN -#define FN(reason) TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason); -DEFINE_DROP_REASON(FN, FN) - -#undef FN -#undef FNe -#define FN(reason) { SKB_DROP_REASON_##reason, #reason }, -#define FNe(reason){ SKB_DROP_REASON_##reason, #reason } +TRACE_DEFINE_SYM_FNS(drop_reason, drop_reason_lookup, drop_reason_show); /* * Tracepoint for free an sk_buff: @@ -44,13 +38,9 @@ TRACE_EVENT(kfree_skb, TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s", __entry->skbaddr, __entry->protocol, __entry->location, - __print_symbolic(__entry->reason, - DEFINE_DROP_REASON(FN, FNe))) + __print_sym(__entry->reason, drop_reason )) ); -#undef FN -#undef FNe - TRACE_EVENT(consume_skb, TP_PROTO(struct sk_buff *skb, void *location), diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 466999a7515e..cd1ea6c3e0f8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -145,6 +145,49 @@ drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { }; EXPORT_SYMBOL(drop_reasons_by_subsys); +#ifdef CONFIG_TRACEPOINTS +const char *drop_reason_lookup(unsigned long long value) +{ + unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT; + u32 reason = value & ~SKB_DROP_REASON_SUBSYS_MASK; + const struct drop_reason_list *subsys; + + if (subsys_id >= SKB_DROP_REASON_SUBSYS_NUM) + return NULL; + + subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]); + if (!subsys) + return NULL; + if (reason >= subsys->n_reasons) + return NULL; + return subsys->reasons[reason]; +} + +void drop_reason_show(struct seq_file *m) +{ + u32 subsys_id; + + rcu_read_lock(); + for (subsys_id = 0; subsys_id < SKB_DROP_REASON_SUBSYS_NUM; subsys_id++) { + const struct drop_reason_list *subsys; + u32 i; + + subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]); + if (!subsys) + continue; + + for (i = 0; i < subsys->n_reasons; i++) { + if (!subsys->reasons[i]) + continue; + seq_printf(m, ", { %u, \"%s\" }", + (subsys_id << SKB_DROP_REASON_SUBSYS_SHIFT) | i, + subsys->reasons[i]); + } + } + rcu_read_unlock(); +} +#endif + /** * drop_reasons_register_subsys - register another drop reason subsystem * @subsys: the subsystem to register, must not be the core -- 2.45.2
[PATCH v5 1/4] tracing: add __print_sym() to replace __print_symbolic()
From: Johannes Berg The way __print_symbolic() works is limited and inefficient in multiple ways: - you can only use it with a static list of symbols, but e.g. the SKB dropreasons are now a dynamic list - it builds the list in memory _three_ times, so it takes a lot of memory: - The print_fmt contains the list (since it's passed to the macro there). This actually contains the names _twice_, which is fixed up at runtime. - TRACE_DEFINE_ENUM() puts a 24-byte struct trace_eval_map for every entry, plus the string pointed to by it, which cannot be deduplicated with the strings in the print_fmt - The in-kernel symbolic printing creates yet another list of struct trace_print_flags for trace_print_symbols_seq() - it also requires runtime fixup during init, which is a lot of string parsing due to the print_fmt fixup Introduce __print_sym() to - over time - replace the old one. We can easily extend this also to __print_flags later, but I cared only about the SKB dropreasons for now, which has only __print_symbolic(). This new __print_sym() requires only a single list of items, created by TRACE_DEFINE_SYM_LIST(), or can even use another already existing list by using TRACE_DEFINE_SYM_FNS() with lookup and show methods. Then, instead of doing an init-time fixup, just do this at the time when userspace reads the print_fmt. This way, dynamically updated lists are possible. For userspace, nothing actually changes, because the print_fmt is shown exactly the same way the old __print_symbolic() was. This adds about 4k .text in my test builds, but that'll be more than paid for by the actual conversions. Signed-off-by: Johannes Berg --- v2: - fix RCU - use ':' as separator to simplify the code, that's still not valid in a C identifier v3: - add missing #undef lines (reported by Simon Horman) - clarify name is not NUL-terminated and fix logic for the comparison for that v4: - fix non-modular builds and handle TRACE_EVENT_FL_DYNAMIC v5: - fix build warning in non-modular build --- include/asm-generic/vmlinux.lds.h | 3 +- include/linux/module.h | 2 + include/linux/trace_events.h | 7 ++ include/linux/tracepoint.h | 20 + include/trace/stages/init.h| 55 include/trace/stages/stage2_data_offsets.h | 6 ++ include/trace/stages/stage3_trace_output.h | 9 ++ include/trace/stages/stage7_class_define.h | 3 + kernel/module/main.c | 3 + kernel/trace/trace_events.c| 99 +- kernel/trace/trace_output.c| 45 ++ 11 files changed, 249 insertions(+), 3 deletions(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 5703526d6ebf..8275a06bcaee 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -258,7 +258,8 @@ #define FTRACE_EVENTS() \ . = ALIGN(8); \ BOUNDED_SECTION(_ftrace_events) \ - BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) + BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) \ + BOUNDED_SECTION(_ftrace_sym_defs) #else #define FTRACE_EVENTS() #endif diff --git a/include/linux/module.h b/include/linux/module.h index ffa1c603163c..7256762d59e1 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -524,6 +524,8 @@ struct module { unsigned int num_trace_events; struct trace_eval_map **trace_evals; unsigned int num_trace_evals; + struct trace_sym_def **trace_sym_defs; + unsigned int num_trace_sym_defs; #endif #ifdef CONFIG_FTRACE_MCOUNT_RECORD unsigned int num_ftrace_callsites; diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 9df3e2973626..2743280c9a46 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -27,6 +27,13 @@ const char *trace_print_flags_seq(struct trace_seq *p, const char *delim, const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val, const struct trace_print_flags *symbol_array); +const char *trace_print_sym_seq(struct trace_seq *p, unsigned long long val, + const char *(*lookup)(unsigned long long val)); +const char *trace_sym_lookup(const struct trace_sym_entry *list, +size_t len, unsigned long long value); +void trace_sym_show(struct seq_file *m, + const struct trace_sym_entry *list, size_t len); + #if BITS_PER_LONG == 32 const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim, unsigned long long flags, diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 689b6d71590e..cc3b387953d1 10064
[PATCH v5 2/4] tracing/timer: use __print_sym()
From: Johannes Berg Use the new __print_sym() in the timer tracing, just to show how to convert something. This adds ~80 bytes of .text for a saving of ~1.5K of data in my builds. Note the format changes from print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { (1 << 0), "POSIX_TIMER" }, { (1 << 1), "PERF_EVENTS" }, { (1 << 2), "SCHED" }, { (1 << 3), "CLOCK_UNSTABLE" }, { (1 << 4), "RCU" }, { (1 << 5), "RCU_EXP" }) to print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { 1, "POSIX_TIMER" }, { 2, "PERF_EVENTS" }, { 4, "SCHED" }, { 8, "CLOCK_UNSTABLE" }, { 16, "RCU" }, { 32, "RCU_EXP" }) since the values are now just printed in the show function as pure decimal values. Signed-off-by: Johannes Berg --- include/trace/events/timer.h | 22 +++--- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index 1ef58a04fc57..d483abffed78 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -402,26 +402,18 @@ TRACE_EVENT(itimer_expire, #undef tick_dep_mask_name #undef tick_dep_name_end -/* The MASK will convert to their bits and they need to be processed too */ -#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \ - TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); -#define tick_dep_name_end(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \ - TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); -/* NONE only has a mask defined for it */ -#define tick_dep_mask_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); - -TICK_DEP_NAMES - -#undef tick_dep_name -#undef tick_dep_mask_name -#undef tick_dep_name_end - #define tick_dep_name(sdep) { TICK_DEP_MASK_##sdep, #sdep }, #define tick_dep_mask_name(sdep) { TICK_DEP_MASK_##sdep, #sdep }, #define tick_dep_name_end(sdep) { TICK_DEP_MASK_##sdep, #sdep } +TRACE_DEFINE_SYM_LIST(tick_dep_names, TICK_DEP_NAMES); + +#undef tick_dep_name +#undef tick_dep_mask_name +#undef tick_dep_name_end + #define show_tick_dep_name(val)\ - __print_symbolic(val, TICK_DEP_NAMES) + __print_sym(val, tick_dep_names) TRACE_EVENT(tick_stop, -- 2.45.2
[PATCH v5 0/4] tracing: improve symbolic printing
v2 was: - rebased on 6.9-rc1 - always search for __print_sym() and get rid of the DYNPRINT flag and associated code; I think ideally we'll just remove the older __print_symbolic() entirely - use ':' as the separator instead of "//" since that makes searching for it much easier and it's still not a valid char in an identifier - fix RCU v3: - fix #undef issues - fix drop_monitor default - rebase on linux-trace/for-next (there were no conflicts) - move net patches to 3/4 - clarify symbol name matching logic (and remove ")" from it) v4: - fix non-module build and possibly dynamic event handling v5: - fix build warning in non-module build To recap, it's annoying to have irq/65-iwlwifi:-401 [000]22.79: kfree_skb: ... reason: 0x2 and much nicer to see irq/65-iwlwifi:-401 [000]22.79: kfree_skb: ... reason: RX_DROP_MONITOR but this doesn't work now because __print_symbolic() can only deal with a hard-coded list (which is actually really big.) So here's __print_sym() which doesn't build the list into the kernel image, but creates it at runtime. For userspace, it will look the same as __print_symbolic() (it literally shows __print_symbolic() to userspace) so no changes are needed, but the actual list of values exposed to userspace in there is built dynamically. For SKB drop reasons, this then has all the reasons known when userspace queries the trace format. I guess patch 3/4 should go through net-next, so not sure how to handle this patch series. Or perhaps, as this will not cause conflicts, in fact I've been rebasing it for a long time, go through tracing anyway with an Ack from netdev? But I can also just wait for the trace patch(es) to land and resubmit the net patches after. Assuming this looks good at all :-) Thanks, johannes