[syzbot] [net?] [virt?] BUG: unable to handle kernel paging request in clear_page_erms (6)
Hello, syzbot found the following issue on: HEAD commit:abf2050f51fd Merge tag 'media/v6.12-1' of git://git.kernel.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=15a03107980000 kernel config: https://syzkaller.appspot.com/x/.config?x=2a8c36c5e2b56016 dashboard link: https://syzkaller.appspot.com/bug?extid=0a31340d42a1d572f904 compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/9800778169d6/disk-abf2050f.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/32a789de3883/vmlinux-abf2050f.xz kernel image: https://storage.googleapis.com/syzbot-assets/24e5e7200094/bzImage-abf2050f.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+0a31340d42a1d572f...@syzkaller.appspotmail.com BUG: unable to handle page fault for address: 8880603bc000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 1a801067 P4D 1a801067 PUD 6c591063 PMD 30259063 PTE 800f9fc43060 Oops: Oops: 0002 [#1] PREEMPT SMP KASAN PTI CPU: 0 UID: 0 PID: 14210 Comm: syz.2.2649 Not tainted 6.11.0-syzkaller-09959-gabf2050f51fd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 RIP: 0010:clear_page_erms+0xb/0x20 arch/x86/lib/clear_page_64.S:50 Code: 48 8d 7f 40 75 d9 90 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa b9 00 10 00 00 31 c0 aa c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 RSP: 0018:c9007310 EFLAGS: 00010246 RAX: RBX: RCX: 1000 RDX: 8880603bc000 RSI: 0001 RDI: 8880603bc000 RBP: dc00 R08: ea000180ef37 R09: R10: ed100c077800 R11: f94000301de7 R12: 0001 R13: 0001 R14: ea000180ef00 R15: FS: 7fa3b27206c0() GS:8880b860() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 8880603bc000 CR3: 3ec82000 CR4: 003506f0 DR0: DR1: DR2: DR3: DR6: 
fffe0ff0 DR7: 0400 Call Trace: clear_page arch/x86/include/asm/page_64.h:54 [inline] clear_highpage_kasan_tagged include/linux/highmem.h:248 [inline] kernel_init_pages mm/page_alloc.c:1036 [inline] post_alloc_hook+0xf8/0x230 mm/page_alloc.c:1535 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x3039/0x3180 mm/page_alloc.c:3457 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4733 page_frag_alloc_1k net/core/skbuff.c:249 [inline] napi_alloc_skb+0x641/0xa00 net/core/skbuff.c:847 page_to_skb+0x276/0x9b0 drivers/net/virtio_net.c:800 receive_mergeable drivers/net/virtio_net.c:2253 [inline] receive_buf+0x3bc/0x17b0 drivers/net/virtio_net.c:2391 virtnet_receive_packets drivers/net/virtio_net.c:2698 [inline] virtnet_receive drivers/net/virtio_net.c:2722 [inline] virtnet_poll+0x26b2/0x3980 drivers/net/virtio_net.c:2817 __napi_poll+0xcb/0x490 net/core/dev.c:6771 napi_poll net/core/dev.c:6840 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6962 handle_softirqs+0x2c5/0x980 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 common_interrupt+0xb9/0xd0 arch/x86/kernel/irq.c:278 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693 RIP: 0010:finish_task_switch+0x1ea/0x870 kernel/sched/core.c:5189 Code: c9 50 e8 79 fa 0b 00 48 83 c4 08 4c 89 f7 e8 4d 39 00 00 e9 de 04 00 00 4c 89 f7 e8 e0 70 60 0a e8 db 58 38 00 fb 48 8b 5d c0 <48> 8d bb f8 15 00 00 48 89 f8 48 c1 e8 03 49 be 00 00 00 00 00 fc RSP: 0018:c9000caa7228 EFLAGS: 0282 RAX: 0b99d481b6833300 RBX: 888050f5 RCX: 817088da RDX: dc00 RSI: 8c0aca40 RDI: 8c601bc0 RBP: c9000caa7270 R08: 9422a907 R09: 12845520 R10: dc00 R11: fbfff2845521 R12: 1110170c7f0c R13: dc00 R14: 8880b863ea40 R15: 8880b863f860 context_switch kernel/sched/core.c:5318 [inline] __schedule+0x184b/0x4ae0 kernel/sched/core.c:6674 preempt_schedule_common+0x84/0xd0 
kernel/sched/core.c:6853 preempt_schedule+0xe1/0xf0 kernel/sched/core.c:6877 preempt_schedule_thunk+0x1a/0x30 arch/x86/entry/thunk.S:12 free_unref_page+0x6b5/0xf00 mm/page_alloc.c:2662 __folio_put+0x2c7/0x440 mm/swap.c:126 secretmem_fault+0x1f9/0x430 mm/secretmem.c:87 __do_fault+0x135/0x460 mm/memory.c:4876 do_shared_fault mm/memory.c:5346 [inline] do_fault mm/memory.c:5420 [inline] do_pte_missing mm/memory.c:3965 [inline] handle_pte_fault+0x1105/0x6800 mm/memory.c:5751
[linus:master] [selftests] ecb8bd70d5: kernel-selftests.vDSO.vdso_standalone_test_x86.fail
Hello,

kernel test robot noticed "kernel-selftests.vDSO.vdso_standalone_test_x86.fail" on:

commit: ecb8bd70d51ccf9009219a6097cef293deada65b ("selftests: vDSO: build tests with O2 optimization")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: kernel-selftests
version: kernel-selftests-x86_64-977d51cf-1_20240508
with the following parameters:

	group: group-03

compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to the attached dmesg/kmsg for the entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-lkp/202409241558.98e13f6f-oliver.s...@intel.com

# timeout set to 300
# selftests: vDSO: vdso_standalone_test_x86
# Segmentation fault
not ok 5 selftests: vDSO: vdso_standalone_test_x86 # exit=139

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240924/202409241558.98e13f6f-oliver.s...@intel.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [RFC PATCH 0/4] Add hazard pointers to kernel
On 9/19/2024 12:16 PM, Linus Torvalds wrote: > On Thu, 19 Sept 2024 at 00:44, Neeraj Upadhyay > wrote: >> >> While we were working on this problem, this refcount scalability issue got >> resolved recently with conditional ref acquisition [3] (however, there are >> new >> developments in apparmor code which might bring back the refcount problem >> [4]). > > Honestly, the various security layers should be a whole lot more > careful about their horrid performance issues, and I think that [4] > you point at needs to just be headed off at the pass. > > No more "the security layer is so bad at performance that we have to > introduce new ref mechanisms", please. Let's push back on bad security > layer code instead. > Ok got it. Thanks for your feedback! I had tried using percpu refcount first (in place of kref) in AppArmor. However, that required managing the last reference drop (implemented in [1] and [2]). Mateusz has shared some ideas in his reply to this thread. Maybe that is a workable solution. Will defer to John on this as I have limited understanding of the cred management code. - Neeraj > Linus
Re: [RFC PATCH 0/4] Add hazard pointers to kernel
On Thu, Sep 19, 2024 at 04:14:05AM +0530, Neeraj Upadhyay wrote: > On 9/18/2024 12:48 PM, Linus Torvalds wrote: > > On Tue, 17 Sept 2024 at 16:34, Boqun Feng wrote: > >> > >> This series introduces hazard pointers [1] to kernel space. A TL;DR > >> description of hazard pointers is "a scalable refcounting mechanism > >> with RCU-like API". More information can be found at [2]. > > > > Please give actual "this is useful for X, and here is an actual real > > load with numbers showing why it matters". > > > > One of the use cases where we have seen improvement is Nginx > web server throughput scalability with AppArmor enabled. For this use > case, we see a refcount scalability problem when kref operations > are done on the AppArmor label object in the Nginx worker's context. More > details about this are captured in [1] and [2]. > > When we switch from kref to hazard pointers in apparmor_file_open(), > we see a ~7% improvement in Nginx throughput for this use case. > > While we were working on this problem, this refcount scalability issue got > resolved recently with conditional ref acquisition [3] (however, there are > new developments in apparmor code which might bring back the refcount problem > [4]). >

The open/close path is still serializing across different processes; the slowdown just got lower. That is, apparmor *as is* continues to be a problem at a big enough scale.

Per my messages in the area in the past, I'm confident this is fixable by changing the refcount model to cache ref changes per-thread. I employed this very scheme $elsewhere. Since an equivalent mechanism is applicable to creds, this may want to be implemented as something under lib/. I even started to work on it for Linux, but real life got in the way and then I could not be arsed to finish. It is a little reminiscent of per-cpu refs.
Here is the outline again:

kref usage gets replaced with a tuple of { kref users; s64 refs; }

task_struct grows a pointer to the cached label and a refs counter on it.

When a new thread is created, it bumps users and stores the pointer. On destruction, it decrements users and rolls up the local changes. Similarly, if it turns out the label has to change during the thread's lifetime, the same thing happens.

In pseudo-code for apparmor_file_open():

	if (unlikely(current->aa_cached_label != check_label())) {
		/* do a replacement here */
	}
	/* just bump the local counter, no synchronisation with other
	 * cpus in the common case */
	current->aa_cached_label_refs++;

In apparmor_file_close():

	/* common case fast path */
	if (file->aa_label == current->aa_cached_label) {
		current->aa_cached_label_refs--;
		return;
	}
	/* we get here if apparmor got reconfigured or this is a file we
	 * inherited from another proc which had a different label and
	 * this is the last fput */
	kref_put(file->aa_label);

Conceptually there is almost nothing to see here. As outlined above, stale labels would clear themselves out as threads open files. However, a thread which stubbornly refuses to allocate a new file obj may hold on to a stale label indefinitely.

One way to sort it out: I presume there is a spot somewhere in user<->kernel transition handling which updates the credentials pointer, should it have changed. $elsewhere I patched it up with a "cow" generation counter. If it does not match the real task struct, you know you need to take the slow path and check creds, apparmor and whatever else. No extra branches in the fast path, but a new int does have to be read. Given that task_struct is a little bit of a cluster fuck, I don't think it's a problem.

That would be a rough sketch; anyone interested can fill in the details. This still performs serializing atomics in *certain* cases, but avoids them in almost all cases, and there is nothing complicated about this that I see, just some effort to implement.
So I don't believe patching up RCU with hazard pointers is warranted if apparmor is the only justification. Anyway no ETA from my end, anyone interested is free to take the idea or do better.
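The accounting in the outline can be modelled in user space. What follows is a minimal single-threaded sketch, assuming C11 atomics stand in for the kernel's atomics and for kref; every name in it (`struct label`, `struct thread_cache`, `file_open()`, `file_close()`, `label_unused()`) is hypothetical and not AppArmor code. It only shows the bookkeeping: a per-thread count may go negative when a file is closed by a different thread caching the same label, so only the rolled-up total is meaningful, and only once no thread caches the label any more. Stale-label handling (the generation counter idea) and actual freeing are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a label: "users" counts threads caching it,
 * "refs" collects rolled-up per-thread counts plus slow-path puts. */
struct label {
	atomic_long users;
	atomic_llong refs;
};

/* Hypothetical per-thread state standing in for new task_struct fields.
 * Only the owning thread touches local_refs, so it needs no atomics. */
struct thread_cache {
	struct label *cached;
	int64_t local_refs;
};

/* Fold the local count into the label and stop caching it
 * (thread exit or label change). */
static void cache_release(struct thread_cache *tc)
{
	if (!tc->cached)
		return;
	assert(atomic_load(&tc->cached->users) > 0);
	atomic_fetch_add(&tc->cached->refs, tc->local_refs);
	atomic_fetch_sub(&tc->cached->users, 1);
	tc->cached = NULL;
	tc->local_refs = 0;
}

/* Analogue of apparmor_file_open(): returns the label stored in the file. */
static struct label *file_open(struct thread_cache *tc, struct label *cur)
{
	if (tc->cached != cur) {
		/* Slow path: label changed, roll up the old one and switch. */
		cache_release(tc);
		atomic_fetch_add(&cur->users, 1);
		tc->cached = cur;
	}
	tc->local_refs++; /* common case: no atomic operation at all */
	return cur;
}

/* Analogue of apparmor_file_close(). */
static void file_close(struct thread_cache *tc, struct label *file_label)
{
	if (file_label == tc->cached) {
		tc->local_refs--; /* common case fast path */
		return;
	}
	/* Reconfigured or inherited file: shared atomic put, like kref_put(). */
	atomic_fetch_sub(&file_label->refs, 1);
}

/* Unreferenced once no thread caches the label and the total is zero. */
static int label_unused(struct label *l)
{
	return atomic_load(&l->users) == 0 && atomic_load(&l->refs) == 0;
}
```

For example, a thread that opens two files under label `a`, then opens one under `b`, rolls `a`'s local count of 2 into `a.refs`; closing one of the `a` files afterwards takes the atomic slow path, while closing the `b` file stays thread-local.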
Re: [RFC PATCH 0/4] Add hazard pointers to kernel
On Thu, 19 Sept 2024 at 16:15, Christoph Hellwig wrote: > > Agreed. From the description this would seem like a good fit for > q_usage_counter in the block layer, which currently makes creative use > of percpu counters. Yes, if this actually could simplify code that currently uses percpu counters, that might be lovely. The percpu counters often perform very well, but then have huge pain in either managing the percpu allocation, or in trying to synchronize across CPUs. I'd be a lot more interested in "we can fix complex code" than in "we have crappy code in bad subsystems where we can hide the performance impact of the subsystem not having been done right". Linus
Re: [RFC PATCH 0/4] Add hazard pointers to kernel
On Wed, Sep 18, 2024 at 09:18:43AM +0200, Linus Torvalds wrote: > On Tue, 17 Sept 2024 at 16:34, Boqun Feng wrote: > > > > This series introduces hazard pointers [1] to kernel space. A TL;DR > > description of hazard pointers is "a scalable refcounting mechanism > > with RCU-like API". More information can be found at [2]. > > Please give actual "this is useful for X, and here is an actual real > load with numbers showing why it matters". Agreed. From the description this would seem like a good fit for q_usage_counter in the block layer, which currently makes creative use of percpu counters.
Re: [RFC PATCH 0/4] Add hazard pointers to kernel
On Thu, 19 Sept 2024 at 00:44, Neeraj Upadhyay wrote: > > While we were working on this problem, this refcount scalability issue got > resolved recently with conditional ref acquisition [3] (however, there are > new > developments in apparmor code which might bring back the refcount problem > [4]). Honestly, the various security layers should be a whole lot more careful about their horrid performance issues, and I think that [4] you point at needs to just be headed off at the pass. No more "the security layer is so bad at performance that we have to introduce new ref mechanisms", please. Let's push back on bad security layer code instead. Linus
Re: [RFC PATCH 0/4] Add hazard pointers to kernel
On 9/18/2024 12:48 PM, Linus Torvalds wrote: > On Tue, 17 Sept 2024 at 16:34, Boqun Feng wrote: >> >> This series introduces hazard pointers [1] to kernel space. A TL;DR >> description of hazard pointers is "a scalable refcounting mechanism >> with RCU-like API". More information can be found at [2]. > > Please give actual "this is useful for X, and here is an actual real > load with numbers showing why it matters". >

One of the use cases where we have seen improvement is Nginx web server throughput scalability with AppArmor enabled. For this use case, we see a refcount scalability problem when kref operations are done on the AppArmor label object in the Nginx worker's context. More details about this are captured in [1] and [2].

When we switch from kref to hazard pointers in apparmor_file_open(), we see a ~7% improvement in Nginx throughput for this use case.

While we were working on this problem, this refcount scalability issue got resolved recently with conditional ref acquisition [3] (however, there are new developments in apparmor code which might bring back the refcount problem [4]).

[1] https://lore.kernel.org/lkml/20240110111856.87370-7-neeraj.upadh...@amd.com/T/
[2] https://lore.kernel.org/lkml/20240916050811.473556-1-neeraj.upadh...@amd.com/
[3] https://lore.kernel.org/lkml/20240620131524.156312-1-mjgu...@gmail.com/
[4] https://lore.kernel.org/lkml/71c0ea18-8b8b-402b-b03c-029aeedc2...@canonical.com/

- Neeraj

> We don't just merge random infrastructure without a use-case and an > argument for it. > > Linus
Re: [RFC PATCH 0/4] Add hazard pointers to kernel
On Tue, 17 Sept 2024 at 16:34, Boqun Feng wrote: > > This series introduces hazard pointers [1] to kernel space. A TL;DR > description of hazard pointers is "a scalable refcounting mechanism > with RCU-like API". More information can be found at [2]. Please give actual "this is useful for X, and here is an actual real load with numbers showing why it matters". We don't just merge random infrastructure without a use-case and an argument for it. Linus
[RFC PATCH 0/4] Add hazard pointers to kernel
Hi,

This series introduces hazard pointers [1] to kernel space. A TL;DR description of hazard pointers is "a scalable refcounting mechanism with RCU-like API". More information can be found at [2].

The problem we are trying to solve here is refcount scalability issues that cannot be solved simply by RCU or SRCU (possibly because an unbounded protection duration is required). Neeraj has tried it on the scalability issue [3] he has been working on, and he will share more information in our LPC session [4] (and I will post an update to the list later for those who cannot make it to the session).

My micro-benchmark shows that hazard pointers provide very good scalability on the reader side, on par with percpu_ref/RCU/SRCU:

(refscale on x86_64 + PREEMPT=y, avg reader duration in ns)

  nreaders      1          8          32
  percpu_ref    6.95123    10.0869    8.9674
  rcu           2.97923    3.243      3.55077
  hazptr        8.5991     8.40443    8.5762
  srcu          16.7754    22.4807    20.2406

Things that we know are currently not working:

* Module unload handling: this probably needs a hazptr_barrier() similar to rcu_barrier().
* rcutorture support should be added to catch potential bugs (esp. for callback handling).
* Updater-side performance: currently all callbacks are handled in one work item; this can be improved by using multiple work_structs or threads.

Of course, I may have introduced some bugs, so please take a look. I would also love to hear any thoughts on the current API. Any feedback is welcome!

Patch #1 is the implementation of hazptr; Paul and Neeraj contributed a lot, but all bugs are mine ;-)

Patches #2-#3 add micro-benchmarks for hazptr and percpu_ref.

Patch #4 is a simple test I've used during development. I include it in case someone wants to give it a quick try; eventually, hazptr needs to be added to rcutorture (or get its own torture test) for more testing.

Regards,
Boqun

[1]: M. M. Michael, "Hazard pointers: safe memory reclamation for lock-free objects," in IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
[2]: https://docs.google.com/document/d/113WFjGlAW4m72xNbZWHUSE-yU2HIJnWpiXp91ShtgeE/
[3]: https://lore.kernel.org/lkml/20240916050811.473556-1-neeraj.upadh...@amd.com/
[4]: https://lpc.events/event/18/contributions/1731/
[5]: Herlihy, Maurice, Victor Luchangco, and Mark Moir. "The repeat offender problem: A mechanism for supporting dynamic-sized, lock-free data structures." International Symposium on Distributed Computing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002.

Boqun Feng (4):
  hazptr: Add initial implementation of hazard pointers
  refscale: Add benchmarks for hazptr
  refscale: Add benchmarks for percpu_ref
  WIP: hazptr: Add hazptr test sample

 include/linux/hazptr.h       |  83 +++
 kernel/Makefile              |   1 +
 kernel/hazptr.c              | 463 +++++++
 kernel/rcu/refscale.c        | 127 +-
 samples/Kconfig              |   6 +
 samples/Makefile             |   1 +
 samples/hazptr/hazptr_test.c |  87 +++
 7 files changed, 767 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/hazptr.h
 create mode 100644 kernel/hazptr.c
 create mode 100644 samples/hazptr/hazptr_test.c

-- 
2.45.2
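For readers unfamiliar with the technique from Michael's paper [1], here is a minimal sketch of the classic hazard-pointer protect/retire/scan cycle in user-space C11. The `hp_*` names, the fixed slot and retire-list sizes, and the use of sequentially consistent atomics in place of kernel barriers are all assumptions for illustration; this is not the API added by patch #1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define HP_SLOTS   4  /* hypothetical: one hazard slot per reader */
#define HP_RETIRED 16 /* hypothetical: retire-list capacity */

struct obj {
	int value;
};

/* Published hazard pointers; static storage starts out all NULL. */
static _Atomic(struct obj *) hp_slot[HP_SLOTS];
/* Objects unlinked by updaters but possibly still in use by readers. */
static struct obj *retired[HP_RETIRED];
static size_t nretired;
static size_t nfreed; /* demonstration counter only */

/* Reader side: publish the pointer, then re-read the source. If it is
 * unchanged, an updater that retires the object later must see the slot
 * (all accesses are seq_cst, standing in for smp_mb()). */
static struct obj *hp_protect(int slot, _Atomic(struct obj *) *src)
{
	struct obj *p = atomic_load(src);

	for (;;) {
		atomic_store(&hp_slot[slot], p);
		struct obj *again = atomic_load(src);

		if (again == p)
			return p;
		p = again; /* source changed under us: retry */
	}
}

/* Reader side: drop protection once done with the object. */
static void hp_clear(int slot)
{
	atomic_store(&hp_slot[slot], NULL);
}

static bool hp_is_protected(const struct obj *p)
{
	for (int i = 0; i < HP_SLOTS; i++)
		if (atomic_load(&hp_slot[i]) == p)
			return true;
	return false;
}

/* Updater side: free every retired object that no slot protects. */
static void hp_scan(void)
{
	size_t kept = 0;

	for (size_t i = 0; i < nretired; i++) {
		if (hp_is_protected(retired[i])) {
			retired[kept++] = retired[i];
		} else {
			free(retired[i]);
			nfreed++;
		}
	}
	nretired = kept;
}

/* Updater side: call only after the object is unreachable from shared state. */
static void hp_retire(struct obj *p)
{
	assert(nretired < HP_RETIRED);
	retired[nretired++] = p;
	hp_scan();
}
```

A reader calls `hp_protect()` before dereferencing and `hp_clear()` when done; an updater unlinks the old object (for example with `atomic_exchange()`) and hands it to `hp_retire()`, which frees it only once no slot holds it. In this scheme a reader writes only its own slot, never a shared counter, which is the property that lets reader cost stay flat as readers are added.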
[PATCH v3 0/9] SEV Kernel Selftests
This series primarily introduces SEV-SNP tests for the kernel selftest framework. It tests boot, ioctl, pre-fault, and fallocate in various combinations to exercise both positive and negative launch flow paths.

Patch 1 - Adds a wrapper for the ioctl calls that decouples the ioctls from the asserts, which enables the use of negative test cases. No functional change intended.
Patch 2 - Extends the SEV smoke tests to use the SNP-specific ioctl calls and sets up memory to boot an SNP guest VM.
Patch 3 - Adds SNP to shutdown testing.
Patch 4, 5 - Test the ioctl path for SEV, SEV-ES and SNP.
Patch 6 - Adds support for SNP in KVM_SEV_INIT2 tests.
Patch 7, 8, 9 - Enable prefault tests for SEV, SEV-ES and SNP.

The patchset is rebased on top of the kvm-x86/next branch.

v3:
1. Remove the assignments for the prefault and fallocate test type enums.
2. Fix the error message for SEV launch measure and finish.
3. Collect Tested-by tags [Peter, Srikanth].

v2: https://lore.kernel.org/kvm/20240816192310.117456-1-pratikrajesh.sam...@amd.com/
1. Add SMT parsing check to populate SNP policy flags.
2. Extend Peter Gonda's shutdown test to include SNP.
3. Introduce new prefault tests which exercise prefault, fallocate, and hole-punch in various combinations.
4. Rework the decouple-ioctl patch to introduce private variants of the functions that call into the ioctl. Also move this patch to the front of the series so that new APIs are not rewritten right after their introduction.
5. General cleanups - adding comments, avoiding local booleans, better error messages.

Suggestions incorporated from Peter, Tom, and Sean.

RFC: https://lore.kernel.org/kvm/20240710220540.188239-1-pratikrajesh.sam...@amd.com/

Any feedback/review is highly appreciated!

Michael Roth (2):
  KVM: selftests: Add interface to manually flag protected/encrypted ranges
  KVM: selftests: Add a CoCo-specific test for KVM_PRE_FAULT_MEMORY

Pratik R. Sampat (7):
  KVM: selftests: Decouple SEV ioctls from asserts
  KVM: selftests: Add a basic SNP smoke test
  KVM: selftests: Add SNP to shutdown testing
  KVM: selftests: SEV IOCTL test
  KVM: selftests: SNP IOCTL test
  KVM: selftests: SEV-SNP test for KVM_SEV_INIT2
  KVM: selftests: Interleave fallocate for KVM_PRE_FAULT_MEMORY

 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/include/kvm_util.h  |  13 +
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/include/x86_64/sev.h        |  76 +++-
 tools/testing/selftests/kvm/lib/kvm_util.c    |  53 ++-
 .../selftests/kvm/lib/x86_64/processor.c      |   6 +-
 tools/testing/selftests/kvm/lib/x86_64/sev.c  | 190 +++-
 .../kvm/x86_64/coco_pre_fault_memory_test.c   | 421 ++
 .../selftests/kvm/x86_64/sev_init2_tests.c    |  13 +
 .../selftests/kvm/x86_64/sev_smoke_test.c     | 297 +++-
 10 files changed, 1023 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/coco_pre_fault_memory_test.c

-- 
2.34.1
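The decoupling described for patch 1 can be sketched as a pair of functions: a `__`-prefixed variant that only issues the ioctl and returns its result, and a wrapper that keeps the assert-on-failure behaviour. This is a hypothetical illustration of the pattern only: the names `__sev_cmd()` and `sev_cmd()` are invented here and plain `ioctl()` stands in for the selftest helpers, so it is not the code from the series. The KVM selftests library already uses the same convention for its `__vm_ioctl()`/`vm_ioctl()` pair.

```c
#include <assert.h> /* for the negative-test checks written by callers */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical private variant: issue the command and hand the raw result
 * back, so a negative test can check for the expected failure. */
int __sev_cmd(int vm_fd, unsigned long cmd, void *arg)
{
	return ioctl(vm_fd, cmd, arg);
}

/* Hypothetical public variant: keeps the assert-on-failure behaviour that
 * positive tests rely on. */
void sev_cmd(int vm_fd, unsigned long cmd, void *arg)
{
	int ret = __sev_cmd(vm_fd, cmd, arg);
	int err = errno;

	if (ret) {
		fprintf(stderr, "sev_cmd(0x%lx) failed: ret=%d, errno=%d (%s)\n",
			cmd, ret, err, strerror(err));
		abort();
	}
}
```

A negative test calls `__sev_cmd()` directly and checks for the expected errno, while positive tests keep calling `sev_cmd()`. For example, passing an invalid fd of -1 makes `__sev_cmd()` return -1 with errno set to EBADF instead of aborting the test.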
Re: [PATCH v4 1/2] virtiofs: use pages instead of pointer for kernel direct IO
Hi, On 9/3/2024 4:44 PM, Jingbo Xu wrote: > > On 8/31/24 5:37 PM, Hou Tao wrote: >> From: Hou Tao >> >> When trying to insert a 10MB kernel module kept in a virtio-fs with cache >> disabled, the following warning was reported: >> SNIP >> >> Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem") >> Signed-off-by: Hou Tao > Tested-by: Jingbo Xu Thanks for the test. > > >> --- >> fs/fuse/file.c | 62 +++-- >> fs/fuse/fuse_i.h| 6 + >> fs/fuse/virtio_fs.c | 1 + >> 3 files changed, 50 insertions(+), 19 deletions(-) >> >> diff --git a/fs/fuse/file.c b/fs/fuse/file.c >> index f39456c65ed7..331208d3e4d1 100644 >> --- a/fs/fuse/file.c >> +++ b/fs/fuse/file.c >> @@ -645,7 +645,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct >> file *file, loff_t pos, >> args->out_args[0].size = count; >> } >> >> - SNIP >> static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter >> *ii, >> size_t *nbytesp, int write, >> - unsigned int max_pages) >> + unsigned int max_pages, >> + bool use_pages_for_kvec_io) >> { >> +bool flush_or_invalidate = false; >> size_t nbytes = 0; /* # bytes already packed in req */ >> ssize_t ret = 0; >> >> -/* Special case for kernel I/O: can copy directly into the buffer */ >> +/* Special case for kernel I/O: can copy directly into the buffer. >> + * However if the implementation of fuse_conn requires pages instead of >> + * pointer (e.g., virtio-fs), use iov_iter_extract_pages() instead. 
>> */ >> if (iov_iter_is_kvec(ii)) { >> -unsigned long user_addr = fuse_get_user_addr(ii); >> -size_t frag_size = fuse_get_frag_size(ii, *nbytesp); >> +void *user_addr = (void *)fuse_get_user_addr(ii); >> >> -if (write) >> -ap->args.in_args[1].value = (void *) user_addr; >> -else >> -ap->args.out_args[0].value = (void *) user_addr; >> +if (!use_pages_for_kvec_io) { >> +size_t frag_size = fuse_get_frag_size(ii, *nbytesp); >> >> -iov_iter_advance(ii, frag_size); >> -*nbytesp = frag_size; >> -return 0; >> +if (write) >> +ap->args.in_args[1].value = user_addr; >> +else >> +ap->args.out_args[0].value = user_addr; >> + >> +iov_iter_advance(ii, frag_size); >> +*nbytesp = frag_size; >> +return 0; >> +} >> + >> +if (is_vmalloc_addr(user_addr)) { >> +ap->args.vmap_base = user_addr; >> +flush_or_invalidate = true; > Could we move flush_kernel_vmap_range() upon here, so that > flush_or_invalidate is not needed anymore and the code looks cleaner? flush_kernel_vmap_range() needs to know the length of the area to flush; if it were moved here, that length would not be known yet. > >> +} >> } >> >> while (nbytes < *nbytesp && ap->num_pages < max_pages) { >> @@ -1513,6 +1533,10 @@ static int fuse_get_user_pages(struct fuse_args_pages >> *ap, struct iov_iter *ii, >> (PAGE_SIZE - ret) & (PAGE_SIZE - 1); >> } >> >> +if (write && flush_or_invalidate) >> +flush_kernel_vmap_range(ap->args.vmap_base, nbytes); >> + >> +ap->args.invalidate_vmap = !write && flush_or_invalidate; > How about initializing vmap_base only when the data buffer is vmalloced > and it's a read request? In this case invalidate_vmap is no longer needed. You mean using the value of vmap_base to indicate whether invalidation is needed or not, right? I prefer to keep it, because the extra variable invalidate_vmap explicitly indicates the required action for the vmap area, and it doesn't increase the size of fuse_args.
> >> ap->args.is_pinned = iov_iter_extract_will_pin(ii); >> ap->args.user_pages = true; >> if (write) >> @@ -1581,7 +1605,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct >> iov_iter *iter, >> size_t nbytes = min(count, nmax); >> >> err = fuse_get_user_pages(&ia->ap, i
Re: [PATCH v4 1/2] virtiofs: use pages instead of pointer for kernel direct IO
On 8/31/24 5:37 PM, Hou Tao wrote: > From: Hou Tao > > When trying to insert a 10MB kernel module kept in a virtio-fs with cache > disabled, the following warning was reported: > > [ cut here ] > WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 .. > Modules linked in: > CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) .. > RIP: 0010:__alloc_pages+0x2bf/0x380 > .. > Call Trace: > >? __warn+0x8e/0x150 >? __alloc_pages+0x2bf/0x380 >__kmalloc_large_node+0x86/0x160 >__kmalloc+0x33c/0x480 >virtio_fs_enqueue_req+0x240/0x6d0 >virtio_fs_wake_pending_and_unlock+0x7f/0x190 >queue_request_and_unlock+0x55/0x60 >fuse_simple_request+0x152/0x2b0 >fuse_direct_io+0x5d2/0x8c0 >fuse_file_read_iter+0x121/0x160 >__kernel_read+0x151/0x2d0 >kernel_read+0x45/0x50 >kernel_read_file+0x1a9/0x2a0 >init_module_from_file+0x6a/0xe0 >idempotent_init_module+0x175/0x230 >__x64_sys_finit_module+0x5d/0xb0 >x64_sys_call+0x1c3/0x9e0 >do_syscall_64+0x3d/0xc0 >entry_SYSCALL_64_after_hwframe+0x4b/0x53 >.. > > ---[ end trace ]--- > > The warning is triggered as follows: > > 1) syscall finit_module() handles the module insertion and it invokes > kernel_read_file() to read the content of the module first. > > 2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and > passes it to kernel_read(). kernel_read() constructs a kvec iter by > using iov_iter_kvec() and passes it to fuse_file_read_iter(). > > 3) virtio-fs disables the cache, so fuse_file_read_iter() invokes > fuse_direct_io(). As for now, the maximal read size for kvec iter is > only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so > fuse_direct_io() doesn't split the 10MB buffer. It saves the address and > the size of the 10MB-sized buffer in out_args[0] of a fuse request and > passes the fuse request to virtio_fs_wake_pending_and_unlock(). > > 4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to > queue the request. 
Because virtiofs needs DMA-able addresses, > virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for > all fuse args, copies these args into the bounce buffer and passes the > physical address of the bounce buffer to virtiofsd. The total length of > these fuse args for the passed fuse request is about 10MB, so > copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and > it triggers the warning in __alloc_pages(): > > if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) > return NULL; > > 5) virtio_fs_enqueue_req() will retry the memory allocation in a > kworker, but it won't help, because kmalloc() will always return NULL > due to the abnormal size, so finit_module() will hang forever. > > A feasible solution is to limit the value of max_read for virtio-fs, so > the length passed to kmalloc() will be limited. However, it would also limit > the maximal read size for normal reads. Virtio-fs writes initiated > from the kernel have a similar problem, but currently there is no way to limit > fc->max_write in the kernel. > > So instead of limiting both max_read and max_write in > the kernel, introduce use_pages_for_kvec_io in fuse_conn and set it to > true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use > pages instead of pointers to pass the KVEC_IO data. > > After switching to pages for KVEC_IO data, these pages will be used for > DMA through virtio-fs. If these pages are backed by vmalloc(), > {flush|invalidate}_kernel_vmap_range() are necessary to flush the cache > before the DMA operation (for writes) or invalidate it after the DMA operation (for reads). So add two new fields in > fuse_args_pages to record the base address of the vmalloc area and the > condition indicating whether invalidation is needed. Perform the flush > in fuse_get_user_pages() for write operations and the invalidation in > fuse_release_user_pages() for read operations.
> It may seem necessary to introduce another field in fuse_conn to > indicate that these KVEC_IO pages are used for DMA. However, considering > that virtio-fs is currently the only user of use_pages_for_kvec_io, just > reuse use_pages_for_kvec_io to indicate that these pages will be used > for DMA. > > Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem") > Signed-off-by: Hou Tao Tested-by: Jingbo Xu > --- > fs/fuse/file.c | 62 +++-- > fs/fuse/fuse_i.h| 6 + > fs/fuse/virtio_fs.c | 1 + > 3 files changed, 50 insertions(+), 19 deletions(-) >
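Step 4 of the description can be checked with a little arithmetic. The sketch below reimplements the order calculation that `get_order()` performs, assuming 4 KiB pages and the x86-64 default `MAX_PAGE_ORDER` of 10; the `SKETCH_*` and `sketch_*` names are hypothetical. A 10 MiB buffer spans 2560 pages, which rounds up to order 12 (16 MiB), so the `order > MAX_PAGE_ORDER` check fails on every retry, while the largest request that passes is order 10, i.e. 4 MiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumptions: 4 KiB pages and the x86-64 default MAX_PAGE_ORDER of 10. */
#define SKETCH_PAGE_SHIFT     12
#define SKETCH_MAX_PAGE_ORDER 10

/* Smallest order with (PAGE_SIZE << order) >= size, which is what
 * get_order() computes for size > 0. The shift is bounded so it stays
 * defined for any size_t input. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while (SKETCH_PAGE_SHIFT + order < 8 * sizeof(size_t) - 1 &&
	       ((size_t)1 << (SKETCH_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Mirrors the quoted check: requests above MAX_PAGE_ORDER are refused. */
static bool sketch_order_too_large(size_t size)
{
	return sketch_get_order(size) > SKETCH_MAX_PAGE_ORDER;
}
```

Since no retry can change the order of a fixed-size request, bounding the kmalloc() size (or, as the patch does, avoiding the large bounce buffer by passing pages) is the only way out.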
[RFC 04/31] kernel/sys: Don't reference UTS_RELEASE directly
Objtool will be getting a new feature to calculate build-time function checksums, so that each function can be uniquely identified. A function's checksum is calculated based on its instructions, jump/call targets, alternatives, string literals, and more.

When there are any changes to the git working tree, UTS_RELEASE is suffixed with "+". That can result in undesired checksum changes for the functions which inline override_release(), due to its direct reference to the UTS_RELEASE string literal.

Convert the override_release() 'rest' variable to a static local so it won't affect function checksums.

Signed-off-by: Josh Poimboeuf
---
 kernel/sys.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 3a2df1bd9f64..526464ea194b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1291,7 +1291,7 @@ static int override_release(char __user *release, size_t len)
 	int ret = 0;
 
 	if (current->personality & UNAME26) {
-		const char *rest = UTS_RELEASE;
+		static const char *rest = UTS_RELEASE;
 		char buf[65] = { 0 };
 		int ndots = 0;
 		unsigned v;
-- 
2.45.2
[syzbot] [modules?] kernel panic: stack is corrupted in call_usermodehelper_exec
Hello, syzbot found the following issue on: HEAD commit:3b9dfd9e5936 Merge tag 'hwmon-for-v6.11-rc6' of git://git... git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=141ab933980000 kernel config: https://syzkaller.appspot.com/x/.config?x=d76559f775f44ba6 dashboard link: https://syzkaller.appspot.com/bug?extid=14d9438422f594f856bd compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17d8c77b98 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11034a3598 Downloadable assets: disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-3b9dfd9e.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/3dab2f917732/vmlinux-3b9dfd9e.xz kernel image: https://storage.googleapis.com/syzbot-assets/541828a1cf09/bzImage-3b9dfd9e.xz mounted in repro: https://storage.googleapis.com/syzbot-assets/cc6a8f9d7bd9/mount_0.gz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+14d9438422f594f85...@syzkaller.appspotmail.com Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: call_usermodehelper_exec+0x493/0x4a0 CPU: 0 UID: 0 PID: 5107 Comm: syz-executor310 Not tainted 6.11.0-rc5-syzkaller-00148-g3b9dfd9e5936 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 panic+0x349/0x860 kernel/panic.c:354 __stack_chk_fail+0x15/0x20 kernel/panic.c:827 call_usermodehelper_exec+0x493/0x4a0 call_modprobe kernel/module/kmod.c:103 [inline] __request_module+0x3ee/0x650 kernel/module/kmod.c:173 ctrl_getfamily+0x28e/0x6b0 net/netlink/genetlink.c:1450 genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline] genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210 
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2550 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357 netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 __sys_sendto+0x3a4/0x4f0 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2212 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb2add42023 Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d 81 90 09 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24 RSP: 002b:7ffe2a46ace8 EFLAGS: 0202 ORIG_RAX: 002c RAX: ffda RBX: 7ffe2a46ad90 RCX: 7fb2add42023 RDX: 001c RSI: 7ffe2a46ade0 RDI: 0005 RBP: 0005 R08: 7ffe2a46ad04 R09: 000c R10: R11: 0202 R12: R13: 7ffe2a46ad58 R14: 7ffe2a46ade0 R15: 00000000 Kernel Offset: disabled Rebooting in 86400 seconds.. --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the report is already addressed, let syzbot know by replying with: #syz fix: exact-commit-title If you want syzbot to run the reproducer, reply with: #syz test: git://repo/address.git branch-or-commit-hash If you attach or paste a git patch, syzbot will apply it before testing. 
If you want to overwrite report's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the report is a duplicate of another one, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup
Re: [PATCH 0/2] module: Split modules_install compression and in-kernel decompression
On Mon, Jul 22, 2024 at 11:06:20AM +0200, Petr Pavlu wrote: > Allow enabling the in-kernel module decompression support separately, > without requiring to enable also the automatic compression during > 'make modules_install'. Applied and pushed, thanks! Luis
Re: [PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO
On 8/14/24 3:46 PM, Hou Tao wrote: > Hi, > > On 8/14/2024 2:34 PM, Jingbo Xu wrote: >> Hi, Tao, >> >> On 4/26/24 10:39 PM, Hou Tao wrote: >>> From: Hou Tao >>> >>> Hi, >>> >>> The patch set aims to fix the warning related to an abnormal size >>> parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing >>> use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs. >>> Beside the abnormal size parameter for kmalloc, the gfp parameter is >>> also questionable: GFP_ATOMIC is used even when the allocation occurs >>> in a kworker context. Patch #2 fixes it by using GFP_NOFS when the >>> allocation is initiated by the kworker. For more details, please check >>> the individual patches. >>> >>> As usual, comments are always welcome. >>> >>> Change Log: >>> >>> v3: >>> * introduce use_pages_for_kvec_io for virtiofs. When the option is >>>enabled, fuse will use iov_iter_extract_pages() to construct a page >>>array and pass the pages array instead of a pointer to virtiofs. >>>The benefit is twofold: the length of the data passed to virtiofs is >>>limited by max_pages, and there is no memory copy compared with v2. >>> >>> v2: >>> https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/ >>> * limit the length of ITER_KVEC dio by max_pages instead of the >>> newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC >>> dio being consistent with other rw operations. >>> * replace kmalloc-allocated bounce buffer by using a bounce buffer >>> backed by scattered pages when the length of the bounce buffer for >>> KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with >>> fragmented memory, the KVEC_ITER dio can be handled normally by >>> virtiofs. 
(Bernd Schubert) >>> * merge the GFP_NOFS patch [1] into this patch-set and use >>> memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS >>> (Benjamin Coddington) >>> >>> v1: >>> https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/ >>> >>> [1]: >>> https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/ >>> >>> Hou Tao (2): >>> virtiofs: use pages instead of pointer for kernel direct IO >>> virtiofs: use GFP_NOFS when enqueuing request through kworker >>> >>> fs/fuse/file.c | 12 >>> fs/fuse/fuse_i.h| 3 +++ >>> fs/fuse/virtio_fs.c | 25 - >>> 3 files changed, 27 insertions(+), 13 deletions(-) >>> >> We also encountered the same issue as [1] these days when attempting to >> insmod a module with ~6MB size, which is upon a virtiofs filesystem. >> >> It would be much helpful if this issue has a standard fix in the >> upstream. I see there will be v4 when reading through the mailing >> thread. Glad to know if there's any update to this series. > > Being busy with other stuff these days. I hope to send v4 before next > weekend. Many thanks, Tao. -- Thanks, Jingbo
Re: [PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO
Hi, On 8/14/2024 2:34 PM, Jingbo Xu wrote: > Hi, Tao, > > On 4/26/24 10:39 PM, Hou Tao wrote: >> From: Hou Tao >> >> Hi, >> >> The patch set aims to fix the warning related to an abnormal size >> parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing >> use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs. >> Beside the abnormal size parameter for kmalloc, the gfp parameter is >> also questionable: GFP_ATOMIC is used even when the allocation occurs >> in a kworker context. Patch #2 fixes it by using GFP_NOFS when the >> allocation is initiated by the kworker. For more details, please check >> the individual patches. >> >> As usual, comments are always welcome. >> >> Change Log: >> >> v3: >> * introduce use_pages_for_kvec_io for virtiofs. When the option is >>enabled, fuse will use iov_iter_extract_pages() to construct a page >>array and pass the pages array instead of a pointer to virtiofs. >>The benefit is twofold: the length of the data passed to virtiofs is >>limited by max_pages, and there is no memory copy compared with v2. >> >> v2: >> https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/ >> * limit the length of ITER_KVEC dio by max_pages instead of the >> newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC >> dio being consistent with other rw operations. >> * replace kmalloc-allocated bounce buffer by using a bounce buffer >> backed by scattered pages when the length of the bounce buffer for >> KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with >> fragmented memory, the KVEC_ITER dio can be handled normally by >> virtiofs. 
(Bernd Schubert) >> * merge the GFP_NOFS patch [1] into this patch-set and use >> memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS >> (Benjamin Coddington) >> >> v1: >> https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/ >> >> [1]: >> https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/ >> >> Hou Tao (2): >> virtiofs: use pages instead of pointer for kernel direct IO >> virtiofs: use GFP_NOFS when enqueuing request through kworker >> >> fs/fuse/file.c | 12 >> fs/fuse/fuse_i.h| 3 +++ >> fs/fuse/virtio_fs.c | 25 - >> 3 files changed, 27 insertions(+), 13 deletions(-) >> > We also encountered the same issue as [1] these days when attempting to > insmod a module with ~6MB size, which is upon a virtiofs filesystem. > > It would be much helpful if this issue has a standard fix in the > upstream. I see there will be v4 when reading through the mailing > thread. Glad to know if there's any update to this series. I have been busy with other stuff these days. I hope to send v4 before next weekend. > > [1] > https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/ >
Re: [PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO
Hi, Tao, On 4/26/24 10:39 PM, Hou Tao wrote: > From: Hou Tao > > Hi, > > The patch set aims to fix the warning related to an abnormal size > parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing > use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs. > Beside the abnormal size parameter for kmalloc, the gfp parameter is > also questionable: GFP_ATOMIC is used even when the allocation occurs > in a kworker context. Patch #2 fixes it by using GFP_NOFS when the > allocation is initiated by the kworker. For more details, please check > the individual patches. > > As usual, comments are always welcome. > > Change Log: > > v3: > * introduce use_pages_for_kvec_io for virtiofs. When the option is >enabled, fuse will use iov_iter_extract_pages() to construct a page >array and pass the pages array instead of a pointer to virtiofs. >The benefit is twofold: the length of the data passed to virtiofs is >limited by max_pages, and there is no memory copy compared with v2. > > v2: > https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/ > * limit the length of ITER_KVEC dio by max_pages instead of the > newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC > dio being consistent with other rw operations. > * replace kmalloc-allocated bounce buffer by using a bounce buffer > backed by scattered pages when the length of the bounce buffer for > KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with > fragmented memory, the KVEC_ITER dio can be handled normally by > virtiofs. 
(Bernd Schubert) > * merge the GFP_NOFS patch [1] into this patch-set and use > memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS > (Benjamin Coddington) > > v1: > https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/ > > [1]: > https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/ > > Hou Tao (2): > virtiofs: use pages instead of pointer for kernel direct IO > virtiofs: use GFP_NOFS when enqueuing request through kworker > > fs/fuse/file.c | 12 > fs/fuse/fuse_i.h| 3 +++ > fs/fuse/virtio_fs.c | 25 - > 3 files changed, 27 insertions(+), 13 deletions(-) > We also encountered the same issue as [1] recently when attempting to insmod a ~6MB module that resides on a virtiofs filesystem. It would be very helpful if this issue had a standard fix upstream. Reading through the mailing thread, I see there will be a v4. I would be glad to know if there is any update to this series. [1] https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/ -- Thanks, Jingbo
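Patch #1 in v3 bounds the data passed to virtiofs by max_pages instead of relying on one large kmalloc() bounce buffer. The page arithmetic behind such a bound can be sketched in userspace C as follows. The 4 KiB page size, the helper names (`pages_spanned()`, `chunk_len()`, `count_requests()`) and the chunking loop are assumptions for illustration, not code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages, as on x86-64 */

/* Number of pages touched by the byte range [addr, addr + len). */
size_t pages_spanned(uintptr_t addr, size_t len)
{
	size_t offset = addr & (SKETCH_PAGE_SIZE - 1);

	if (len == 0)
		return 0;
	return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical helper: length of the longest prefix of the range starting
 * at addr that fits in max_pages pages, i.e. one request bounded by
 * max_pages. Callers must pass max_pages > 0.
 */
size_t chunk_len(uintptr_t addr, size_t len, size_t max_pages)
{
	size_t offset = addr & (SKETCH_PAGE_SIZE - 1);
	size_t cap = max_pages * SKETCH_PAGE_SIZE - offset;

	return len < cap ? len : cap;
}

/* How many max_pages-bounded requests a buffer of len bytes needs. */
size_t count_requests(uintptr_t addr, size_t len, size_t max_pages)
{
	size_t n = 0;

	assert(max_pages > 0); /* with zero pages no progress is possible */
	while (len > 0) {
		size_t chunk = chunk_len(addr, len, max_pages);

		addr += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```

For instance, a page-aligned 6 MiB buffer spans 1536 pages, so with a hypothetical max_pages of 256 it would be sent as six 1 MiB requests rather than needing a single 6 MiB contiguous allocation.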
Re: [PATCH 1/2] module: Split modules_install compression and in-kernel decompression
On Thu, Jul 25, 2024 at 9:59 PM Petr Pavlu wrote: > > On 7/22/24 12:23, Masahiro Yamada wrote: > > On Mon, Jul 22, 2024 at 6:07 PM Petr Pavlu wrote: > >> > >> The kernel configuration allows specifying a module compression mode. If > >> one is selected then each module gets compressed during > >> 'make modules_install' and additionally one can also enable support for > >> a respective direct in-kernel decompression support. This means that the > >> decompression support cannot be enabled without the automatic compression. > >> > >> Some distributions, such as the (open)SUSE family, use a signer service for > >> modules. A build runs on a worker machine but signing is done by a separate > >> locked-down server that is in possession of the signing key. The build > >> invokes 'make modules_install' to create a modules tree, collects > >> information about the modules, asks the signer service for their signature, > >> appends each signature to the respective module and compresses all modules. > >> > >> When using this arrangment, the 'make modules_install' step produces > >> unsigned+uncompressed modules and the distribution's own build recipe takes > >> care of signing and compression later. > >> > >> The signing support can be currently enabled without automatically signing > >> modules during 'make modules_install'. However, the in-kernel decompression > >> support can be selected only after first enabling automatic compression > >> during this step. 
> >> > >> To allow only enabling the in-kernel decompression support without the > >> automatic compression during 'make modules_install', separate the > >> compression options similarly to the signing options, as follows: > >> > >>> Enable loadable module support > >> [*] Module compression > >> Module compression type (GZIP) ---> > >> [*] Automatically compress all modules > >> [ ] Support in-kernel module decompression > >> > >> * "Module compression" (MODULE_COMPRESS) is a new main switch for the > >> compression/decompression support. It replaces MODULE_COMPRESS_NONE. > >> * "Module compression type" (MODULE_COMPRESS_) chooses the > >> compression type, one of GZ, XZ, ZSTD. > >> * "Automatically compress all modules" (MODULE_COMPRESS_ALL) is a new > >> option to enable module compression during 'make modules_install'. It > >> defaults to Y. > >> * "Support in-kernel module decompression" (MODULE_DECOMPRESS) enables > >> in-kernel decompression. > >> > >> Signed-off-by: Petr Pavlu > >> --- > > > > > > > > My preference is to add > > CONFIG_MODULE_DECOMPRESS_GZIP > > CONFIG_MODULE_DECOMPRESS_XZ > > CONFIG_MODULE_DECOMPRESS_ZSTD > > instead of > > CONFIG_MODULE_COMPRESS_ALL. 
> > > > > > > > > For example, > > > > > > if MODULE_DECOMPRESS > > > > config MODULE_DECOMPRESS_GZIP > > bool "Support in-kernel GZIP decompression for module" > >default MODULE_COMPRESS_GZIP > > > > config MODULE_DECOMPRESS_XZ > >bool "Support in-kernel XZ decompression for module" > >default MODULE_COMPRESS_XZ > > > > config MODULE_DECOMPRESS_ZSTD > >bool "Support in-kernel ZSTD decompression for module" > >default MODULE_COMPRESS_ZSTD > > > > endif > > > > > > > > > > > > OR, maybe > > > > > > > > config MODULE_DECOMPRESS_GZIP > >bool "Support in-kernel GZIP decompression for module" > > select MODULE_DECOMPRESS > > > > config MODULE_DECOMPRESS_XZ > >bool "Support in-kernel XZ decompression for module" > >select MODULE_DECOMPRESS > > > > config MODULE_DECOMPRESS_ZSTD > >bool "Support in-kernel ZSTD decompression for module" > >select MODULE_DECOMPRESS > > > > config MODULE_DECOMPRESS > >bool > > > > > > > > > > You can toggle MODULE_COMPRESS_GZIP and > > MODULE_DECOMPRESS_GZIP independently > > I can implement this, but what would be a use case to enable multiple module > decompression types in the kernel? I just thought there could be a case where signer service A compresses modules with GZIP, signer service B with XZ, etc. If the compression type is predictable at Kbuild time, it is fine. > > > > > > > Of course, the current kernel/module/decompress.c does not > > work when multiple (or zero) CONFIG_MODULE_DECOMPRESS_* is > > enabled. It needs a little modification. > > One issue is with the file /sys/module/compression which shows the module > decompression type supported by the kernel. If multiple types are allowed then > I think they should all get listed there. This could however create some > compatibility problems. For instance, kmod reads this file and currently > expects to find exactly one type, so it would need updating as well. OK, understood. Then, Acked-by: Masahiro Yamada -- Best Regards Masahiro Yamada
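Masahiro notes that kernel/module/decompress.c would need changes if more than one CONFIG_MODULE_DECOMPRESS_* option could be enabled at once. One conceivable approach, sketched here under stated assumptions, is to choose the decompressor from the magic bytes at the start of the module file. The `enum mod_comp` values and `detect_module_compression()` are hypothetical and not part of the kernel; the magic values are the standard gzip (1f 8b), xz (fd 37 7a 58 5a 00) and zstd frame (28 b5 2f fd) signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers; not taken from kernel/module/decompress.c. */
enum mod_comp {
	MOD_COMP_NONE,
	MOD_COMP_GZIP,
	MOD_COMP_XZ,
	MOD_COMP_ZSTD,
};

/* Returns true if buf starts with the len_magic bytes in magic. */
static int has_magic(const unsigned char *buf, size_t len,
		     const unsigned char *magic, size_t len_magic)
{
	return len >= len_magic && memcmp(buf, magic, len_magic) == 0;
}

/* Classify a module image by the magic bytes at its start. */
enum mod_comp detect_module_compression(const unsigned char *buf, size_t len)
{
	static const unsigned char gzip_magic[] = { 0x1f, 0x8b };
	static const unsigned char xz_magic[] = { 0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00 };
	static const unsigned char zstd_magic[] = { 0x28, 0xb5, 0x2f, 0xfd };

	assert(buf != NULL || len == 0);
	if (has_magic(buf, len, gzip_magic, sizeof(gzip_magic)))
		return MOD_COMP_GZIP;
	if (has_magic(buf, len, xz_magic, sizeof(xz_magic)))
		return MOD_COMP_XZ;
	if (has_magic(buf, len, zstd_magic, sizeof(zstd_magic)))
		return MOD_COMP_ZSTD;
	/* Anything else, e.g. a plain ELF .ko, is treated as uncompressed. */
	return MOD_COMP_NONE;
}
```

Dispatching on content rather than on a single build-time choice is what would let several decompressors coexist. Whether that is worth the extra code is exactly the use-case question Petr raises.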
Re: [PATCH 1/2] module: Split modules_install compression and in-kernel decompression
On 7/22/24 12:23, Masahiro Yamada wrote: > On Mon, Jul 22, 2024 at 6:07 PM Petr Pavlu wrote: >> >> The kernel configuration allows specifying a module compression mode. If >> one is selected then each module gets compressed during >> 'make modules_install' and additionally one can also enable support for >> a respective direct in-kernel decompression support. This means that the >> decompression support cannot be enabled without the automatic compression. >> >> Some distributions, such as the (open)SUSE family, use a signer service for >> modules. A build runs on a worker machine but signing is done by a separate >> locked-down server that is in possession of the signing key. The build >> invokes 'make modules_install' to create a modules tree, collects >> information about the modules, asks the signer service for their signature, >> appends each signature to the respective module and compresses all modules. >> >> When using this arrangment, the 'make modules_install' step produces >> unsigned+uncompressed modules and the distribution's own build recipe takes >> care of signing and compression later. >> >> The signing support can be currently enabled without automatically signing >> modules during 'make modules_install'. However, the in-kernel decompression >> support can be selected only after first enabling automatic compression >> during this step. >> >> To allow only enabling the in-kernel decompression support without the >> automatic compression during 'make modules_install', separate the >> compression options similarly to the signing options, as follows: >> >>> Enable loadable module support >> [*] Module compression >> Module compression type (GZIP) ---> >> [*] Automatically compress all modules >> [ ] Support in-kernel module decompression >> >> * "Module compression" (MODULE_COMPRESS) is a new main switch for the >> compression/decompression support. It replaces MODULE_COMPRESS_NONE. 
>> * "Module compression type" (MODULE_COMPRESS_) chooses the >> compression type, one of GZ, XZ, ZSTD. >> * "Automatically compress all modules" (MODULE_COMPRESS_ALL) is a new >> option to enable module compression during 'make modules_install'. It >> defaults to Y. >> * "Support in-kernel module decompression" (MODULE_DECOMPRESS) enables >> in-kernel decompression. >> >> Signed-off-by: Petr Pavlu >> --- > > > > My preference is to add > CONFIG_MODULE_DECOMPRESS_GZIP > CONFIG_MODULE_DECOMPRESS_XZ > CONFIG_MODULE_DECOMPRESS_ZSTD > instead of > CONFIG_MODULE_COMPRESS_ALL. > > > > > For example, > > > if MODULE_DECOMPRESS > > config MODULE_DECOMPRESS_GZIP >bool "Support in-kernel GZIP decompression for module" >default MODULE_COMPRESS_GZIP > > config MODULE_DECOMPRESS_XZ >bool "Support in-kernel XZ decompression for module" >default MODULE_COMPRESS_XZ > > config MODULE_DECOMPRESS_ZSTD >bool "Support in-kernel ZSTD decompression for module" >default MODULE_COMPRESS_ZSTD > > endif > > > > > > OR, maybe > > > > config MODULE_DECOMPRESS_GZIP >bool "Support in-kernel GZIP decompression for module" >select MODULE_DECOMPRESS > > config MODULE_DECOMPRESS_XZ >bool "Support in-kernel XZ decompression for module" >select MODULE_DECOMPRESS > > config MODULE_DECOMPRESS_ZSTD >bool "Support in-kernel ZSTD decompression for module" >select MODULE_DECOMPRESS > > config MODULE_DECOMPRESS >bool > > > > > You can toggle MODULE_COMPRESS_GZIP and > MODULE_DECOMPRESS_GZIP independently I can implement this, but what would be a use case to enable multiple module decompression types in the kernel? > > > Of course, the current kernel/module/decompress.c does not > work when multiple (or zero) CONFIG_MODULE_DECOMPRESS_* is > enabled. It needs a little modification. One issue is with the file /sys/module/compression which shows the module decompression type supported by the kernel. If multiple types are allowed then I think they should all get listed there. 
This could however create some compatibility problems. For instance, kmod reads this file and currently expects to find exactly one type, so it would need updating as well. Thanks, Petr
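Petr's compatibility concern is concrete: /sys/module/compression currently reports a single decompression type, and kmod expects to find exactly one. A reader that accepts both the current single-name form and a hypothetical multi-name form (for example "gzip zstd") could look like the following minimal sketch. The function name and the whitespace-separated multi-name format are assumptions, not an agreed ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `type` appears as a whitespace-separated token in the
 * contents read from /sys/module/compression. Accepts a single name
 * ("gzip\n") as well as a hypothetical list ("gzip zstd\n").
 */
bool compression_supported(const char *contents, const char *type)
{
	size_t tlen = strlen(type);
	const char *p = contents;

	assert(contents != NULL && type != NULL);
	while (*p != '\0') {
		/* Skip separators, then measure the next token. */
		p += strspn(p, " \t\n");
		size_t n = strcspn(p, " \t\n");

		if (n > 0 && n == tlen && strncmp(p, type, n) == 0)
			return true;
		p += n;
	}
	return false;
}
```

Matching whole tokens, rather than using a substring search, keeps a name such as "gz" from falsely matching "gzip".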
Re: [PATCH 1/2] module: Split modules_install compression and in-kernel decompression
On Mon, Jul 22, 2024 at 6:07 PM Petr Pavlu wrote: > > The kernel configuration allows specifying a module compression mode. If > one is selected then each module gets compressed during > 'make modules_install' and additionally one can also enable support for > a respective direct in-kernel decompression support. This means that the > decompression support cannot be enabled without the automatic compression. > > Some distributions, such as the (open)SUSE family, use a signer service for > modules. A build runs on a worker machine but signing is done by a separate > locked-down server that is in possession of the signing key. The build > invokes 'make modules_install' to create a modules tree, collects > information about the modules, asks the signer service for their signature, > appends each signature to the respective module and compresses all modules. > > When using this arrangment, the 'make modules_install' step produces > unsigned+uncompressed modules and the distribution's own build recipe takes > care of signing and compression later. > > The signing support can be currently enabled without automatically signing > modules during 'make modules_install'. However, the in-kernel decompression > support can be selected only after first enabling automatic compression > during this step. > > To allow only enabling the in-kernel decompression support without the > automatic compression during 'make modules_install', separate the > compression options similarly to the signing options, as follows: > > > Enable loadable module support > [*] Module compression > Module compression type (GZIP) ---> > [*] Automatically compress all modules > [ ] Support in-kernel module decompression > > * "Module compression" (MODULE_COMPRESS) is a new main switch for the > compression/decompression support. It replaces MODULE_COMPRESS_NONE. > * "Module compression type" (MODULE_COMPRESS_) chooses the > compression type, one of GZ, XZ, ZSTD. 
> * "Automatically compress all modules" (MODULE_COMPRESS_ALL) is a new > option to enable module compression during 'make modules_install'. It > defaults to Y. > * "Support in-kernel module decompression" (MODULE_DECOMPRESS) enables > in-kernel decompression. > > Signed-off-by: Petr Pavlu > ---

My preference is to add CONFIG_MODULE_DECOMPRESS_GZIP, CONFIG_MODULE_DECOMPRESS_XZ and CONFIG_MODULE_DECOMPRESS_ZSTD instead of CONFIG_MODULE_COMPRESS_ALL.

For example,

if MODULE_DECOMPRESS

config MODULE_DECOMPRESS_GZIP
	bool "Support in-kernel GZIP decompression for module"
	default MODULE_COMPRESS_GZIP

config MODULE_DECOMPRESS_XZ
	bool "Support in-kernel XZ decompression for module"
	default MODULE_COMPRESS_XZ

config MODULE_DECOMPRESS_ZSTD
	bool "Support in-kernel ZSTD decompression for module"
	default MODULE_COMPRESS_ZSTD

endif

OR, maybe

config MODULE_DECOMPRESS_GZIP
	bool "Support in-kernel GZIP decompression for module"
	select MODULE_DECOMPRESS

config MODULE_DECOMPRESS_XZ
	bool "Support in-kernel XZ decompression for module"
	select MODULE_DECOMPRESS

config MODULE_DECOMPRESS_ZSTD
	bool "Support in-kernel ZSTD decompression for module"
	select MODULE_DECOMPRESS

config MODULE_DECOMPRESS
	bool

You can toggle MODULE_COMPRESS_GZIP and MODULE_DECOMPRESS_GZIP independently.

Of course, the current kernel/module/decompress.c does not work when multiple (or zero) CONFIG_MODULE_DECOMPRESS_* options are enabled. It needs a little modification.

I will wait for Luis's comment.
> kernel/module/Kconfig| 61 > scripts/Makefile.modinst | 2 ++ > 2 files changed, 33 insertions(+), 30 deletions(-) > > diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig > index 4047b6d48255..bb7f7930fef6 100644 > --- a/kernel/module/Kconfig > +++ b/kernel/module/Kconfig > @@ -278,64 +278,65 @@ config MODULE_SIG_HASH > default "sha3-384" if MODULE_SIG_SHA3_384 > default "sha3-512" if MODULE_SIG_SHA3_512 > > -choice > - prompt "Module compression mode" > +config MODULE_COMPRESS > + bool "Module compression" > help > - This option allows you to choose the algorithm which will be used to > - compress modules when 'make modules_install' is run. (or, you can > - choose to not compress modules at all.) > - > - External modules will also be compressed in the same way during the > - installation. > - > - For modules inside an initrd or initramfs
[PATCH 1/2] module: Split modules_install compression and in-kernel decompression
The kernel configuration allows specifying a module compression mode. If one is selected, each module gets compressed during 'make modules_install', and one can additionally enable the corresponding in-kernel decompression support. This means that the decompression support cannot be enabled without the automatic compression. Some distributions, such as the (open)SUSE family, use a signer service for modules. A build runs on a worker machine but signing is done by a separate locked-down server that is in possession of the signing key. The build invokes 'make modules_install' to create a modules tree, collects information about the modules, asks the signer service for their signatures, appends each signature to the respective module and compresses all modules. When using this arrangement, the 'make modules_install' step produces unsigned+uncompressed modules and the distribution's own build recipe takes care of signing and compression later. The signing support can currently be enabled without automatically signing modules during 'make modules_install'. However, the in-kernel decompression support can be selected only after first enabling automatic compression during this step. To allow enabling only the in-kernel decompression support, without the automatic compression during 'make modules_install', separate the compression options similarly to the signing options, as follows: > Enable loadable module support [*] Module compression Module compression type (GZIP) ---> [*] Automatically compress all modules [ ] Support in-kernel module decompression * "Module compression" (MODULE_COMPRESS) is a new main switch for the compression/decompression support. It replaces MODULE_COMPRESS_NONE. * "Module compression type" (MODULE_COMPRESS_) chooses the compression type, one of GZ, XZ, ZSTD. * "Automatically compress all modules" (MODULE_COMPRESS_ALL) is a new option to enable module compression during 'make modules_install'. It defaults to Y.
* "Support in-kernel module decompression" (MODULE_DECOMPRESS) enables in-kernel decompression. Signed-off-by: Petr Pavlu --- kernel/module/Kconfig| 61 ---- scripts/Makefile.modinst | 2 ++ 2 files changed, 33 insertions(+), 30 deletions(-) diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig index 4047b6d48255..bb7f7930fef6 100644 --- a/kernel/module/Kconfig +++ b/kernel/module/Kconfig @@ -278,64 +278,65 @@ config MODULE_SIG_HASH default "sha3-384" if MODULE_SIG_SHA3_384 default "sha3-512" if MODULE_SIG_SHA3_512 -choice - prompt "Module compression mode" +config MODULE_COMPRESS + bool "Module compression" help - This option allows you to choose the algorithm which will be used to - compress modules when 'make modules_install' is run. (or, you can - choose to not compress modules at all.) - - External modules will also be compressed in the same way during the - installation. - - For modules inside an initrd or initramfs, it's more efficient to - compress the whole initrd or initramfs instead. - + Enable module compression to reduce on-disk size of module binaries. This is fully compatible with signed modules. - Please note that the tool used to load modules needs to support the - corresponding algorithm. module-init-tools MAY support gzip, and kmod - MAY support gzip, xz and zstd. + The tool used to work with modules needs to support the selected + compression type. kmod MAY support gzip, xz and zstd. Other tools + might have a limited selection of the supported types. - Your build system needs to provide the appropriate compression tool - to compress the modules. + Note that for modules inside an initrd or initramfs, it's more + efficient to compress the whole ramdisk instead. - If in doubt, select 'None'. + If unsure, say N. -config MODULE_COMPRESS_NONE - bool "None" +choice + prompt "Module compression type" + depends on MODULE_COMPRESS help - Do not compress modules. The installed modules are suffixed - with .ko. 
+ Choose the supported algorithm for module compression. config MODULE_COMPRESS_GZIP bool "GZIP" help - Compress modules with GZIP. The installed modules are suffixed - with .ko.gz. + Support modules compressed with GZIP. The installed modules are + suffixed with .ko.gz. config MODULE_COMPRESS_XZ bool "XZ" help - Compress modules with XZ. The installed modules are suffixed - with .ko.xz. + Support m
[PATCH 0/2] module: Split modules_install compression and in-kernel decompression
Allow enabling the in-kernel module decompression support separately, without also requiring the automatic compression during 'make modules_install' to be enabled. Petr Pavlu (2): module: Split modules_install compression and in-kernel decompression module: Clean up the description of MODULE_SIG_ kernel/module/Kconfig| 77 scripts/Makefile.modinst | 2 ++ 2 files changed, 41 insertions(+), 38 deletions(-) base-commit: 933069701c1b507825b514317d4edd5d3fd9d417 -- 2.35.3
Re: [PATCH 01/17] mm: move kernel/numa.c to mm/
On Tue, 16 Jul 2024 14:13:30 +0300 Mike Rapoport wrote: > From: "Mike Rapoport (Microsoft)" > > The stub functions in kernel/numa.c belong to mm/ rather than to kernel/ > > Signed-off-by: Mike Rapoport (Microsoft) Makes sense + all arch specific implementations are in arch/*/mm not arch/*/kernel so this makes it more consistent with that. Reviewed-by: Jonathan Cameron
Re: [PATCH 01/17] mm: move kernel/numa.c to mm/
On 16.07.24 13:13, Mike Rapoport wrote: From: "Mike Rapoport (Microsoft)" The stub functions in kernel/numa.c belong to mm/ rather than to kernel/ Signed-off-by: Mike Rapoport (Microsoft) --- Acked-by: David Hildenbrand -- Cheers, David / dhildenb
Re: [BUG REPORT] kernel BUG at lib/dynamic_queue_limits.c:99!
Hi, On 2024/7/13 8:44, Jakub Kicinski wrote: > On Fri, 12 Jul 2024 17:43:21 -0700 Jakub Kicinski wrote: >> CC: virtio_net maintainers and Jiri who added BQL > > Oh, sounds like the fix may be already posted: > https://lore.kernel.org/all/20240712080329.197605-2-jean-phili...@linaro.org/ Thanks, this patch indeed resolved the issue.
[PATCH 01/17] mm: move kernel/numa.c to mm/
From: "Mike Rapoport (Microsoft)" The stub functions in kernel/numa.c belong to mm/ rather than to kernel/ Signed-off-by: Mike Rapoport (Microsoft) --- kernel/Makefile | 1 - mm/Makefile | 1 + {kernel => mm}/numa.c | 0 3 files changed, 1 insertion(+), 1 deletion(-) rename {kernel => mm}/numa.c (100%) diff --git a/kernel/Makefile b/kernel/Makefile index 3c13240dfc9f..87866b037fbe 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -116,7 +116,6 @@ obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o obj-$(CONFIG_CFI_CLANG) += cfi.o -obj-$(CONFIG_NUMA) += numa.o obj-$(CONFIG_PERF_EVENTS) += events/ diff --git a/mm/Makefile b/mm/Makefile index 8fb85acda1b1..773b3b267438 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -139,3 +139,4 @@ obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o obj-$(CONFIG_EXECMEM) += execmem.o +obj-$(CONFIG_NUMA) += numa.o diff --git a/kernel/numa.c b/mm/numa.c similarity index 100% rename from kernel/numa.c rename to mm/numa.c -- 2.43.0
Re: [BUG REPORT] kernel BUG at lib/dynamic_queue_limits.c:99!
On Fri, 12 Jul 2024 17:43:21 -0700 Jakub Kicinski wrote: > CC: virtio_net maintainers and Jiri who added BQL Oh, sounds like the fix may be already posted: https://lore.kernel.org/all/20240712080329.197605-2-jean-phili...@linaro.org/
Re: [BUG REPORT] kernel BUG at lib/dynamic_queue_limits.c:99!
CC: virtio_net maintainers and Jiri who added BQL On Fri, 12 Jul 2024 10:12:42 +0800 xiujianfeng wrote: > On 2024/7/12 10:08, xiujianfeng wrote: > > I found a problem with my QEMU environment, and the log is as follows. > > > > After I did the bisect to locate the issue, I found > > 8490dd0592e85e0cceefa6b48d66dbdd73df0fb3 is the first bad commit, > > however this is a merge commit, and I cannot further confirm which > > specific commit caused this issue. > > It's on > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git and > the base commit is f477dd6eede3 > > > > > [ cut here ] > > kernel BUG at lib/dynamic_queue_limits.c:99! > > Oops: invalid opcode: [#1] PREEMPT SMP NOPTI > > CPU: 1 UID: 0 PID: 203 Comm: ip Not tainted > > 6.10.0-rc7-next-20240711-12643-gf477dd6eede3 #613 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 > > 04/01/2014 > > RIP: 0010:dql_completed+0x212/0x230 > > Code: 41 1c 01 48 89 57 58 e9 85 fe ff ff 85 ed 40 0f 95 c5 41 39 d8 0f > > 95 c1 40 84 cd 74 05 45 85 e4 78 0a 44 89 d9 e9 67 fe fe > > RSP: 0018:c90f0d70 EFLAGS: 0213 > > RAX: RBX: 88800413b800 RCX: 888005925240 > > RDX: RSI: 81df1116 RDI: 888003a0d700 > > RBP: 888003a0d600 R08: R09: > > R10: R11: 88800a403c90 R12: 0001 > > R13: c90f0db0 R14: 888003a0d680 R15: 88803cc8 > > FS: 7fcf4229f1c0() GS:88803cc8() knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: 5596d60d1290 CR3: 093c CR4: 06f0 > > Call Trace: > > > > ? die+0x32/0x90 > > ? do_trap+0xdc/0x100 > > ? dql_completed+0x212/0x230 > > ? do_error_trap+0x60/0x80 > > ? dql_completed+0x212/0x230 > > ? exc_invalid_op+0x4f/0x70 > > ? dql_completed+0x212/0x230 > > ? asm_exc_invalid_op+0x1a/0x20 > > ? dql_completed+0x212/0x230 > > __free_old_xmit+0xb2/0x120 > > free_old_xmit+0x23/0x70 > > ? _raw_spin_trylock+0x46/0x60 > > virtnet_poll+0xe0/0x590 > > ? update_curr+0xf9/0x1c0 > > ? find_held_lock+0x2b/0x80 > > __napi_poll+0x25/0x160 > > net_rx_action+0x177/0x310 > > ? 
clockevents_program_event+0x53/0x100 > > ? lock_release+0xa4/0x1d0 > > ? ktime_get+0x76/0x100 > > ? lapic_next_event+0x10/0x20 > > handle_softirqs+0xd0/0x210 > > do_softirq+0x3b/0x60 > > > > > > __local_bh_enable_ip+0x55/0x70 > > virtnet_open+0xac/0x2d0 > > __dev_open+0xda/0x190 > > __dev_change_flags+0x1b3/0x230 > > ? __pfx_stack_trace_consume_entry+0x10/0x10 > > ? arch_stack_walk+0x9d/0xf0 > > dev_change_flags+0x20/0x60 > > do_setlink+0x27e/0x1120 > > ? set_track_prepare+0x3b/0x60 > > ? rtnl_newlink+0x5a/0xa0 > > ? rtnetlink_rcv_msg+0x199/0x4c0 > > ? __nla_validate_parse+0x5e/0xed0 > > ? netlink_sendmsg+0x1e3/0x420 > > ? __sock_sendmsg+0x5e/0x60 > > ? sys_sendmsg+0x1da/0x210 > > ? ___sys_sendmsg+0x7b/0xc0 > > ? __sys_sendmsg+0x50/0x90 > > ? do_syscall_64+0x4b/0x110 > > ? entry_SYSCALL_64_after_hwframe+0x76/0x7e > > __rtnl_newlink+0x50d/0x990 > > ? __kmalloc_cache_noprof+0x1a0/0x260 > > ? __kmalloc_cache_noprof+0x204/0x260 > > ? rtnetlink_rcv_msg+0x14e/0x4c0 > > ? rtnl_newlink+0x5a/0xa0 > > rtnl_newlink+0x73/0xa0 > > rtnetlink_rcv_msg+0x199/0x4c0 > > ? find_held_lock+0x2b/0x80 > > ? __pfx_rtnetlink_rcv_msg+0x10/0x10 > > netlink_rcv_skb+0x56/0x100 > > ? netlink_unicast+0x69/0x3a0 > > netlink_unicast+0x283/0x3a0 > > netlink_sendmsg+0x1e3/0x420 > > __sock_sendmsg+0x5e/0x60 > > sys_sendmsg+0x1da/0x210 > > ? copy_msghdr_from_user+0x68/0xa0 > > ___sys_sendmsg+0x7b/0xc0 > > ? stack_depot_save_flags+0x2e/0x8a0 > > ? check_bytes_and_report.constprop.0+0x48/0x120 > > ? check_object+0xb5/0x3a0 > > ? find_held_lock+0x2b/0x80 > > __sys_sendmsg+0x50/0x90 > > do_syscall_64+0x4b/0x110 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > RIP: 0033:0x7fcf423c7f03 > > Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 > > 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 08 > > RSP: 002b:7ffcbfa59528 EFLAGS: 0246 ORIG_RAX: 002e > > RAX: ffda RBX: RCX: 7fcf423c7f03 > > RDX: RSI: 7ffcbfa59590 RDI: 0003 > > RBP: 00
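The trace ends in the BUG() inside dql_completed(), reached from free_old_xmit() during virtnet_open(). Assuming the check at lib/dynamic_queue_limits.c:99 is the "can't complete more than what's in queue" invariant that mainline dql_completed() enforces with a BUG_ON(), the userspace model below shows that accounting rule. The struct and function names are hypothetical stand-ins, not the kernel's struct dql API. The reset-while-in-flight sequence in bql_demo() is only one illustrative way to break the invariant, not a diagnosis of this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for BQL's per-queue byte counters. */
struct bql_model {
	unsigned int num_queued;    /* bytes handed to the device so far */
	unsigned int num_completed; /* bytes the device reported as sent */
};

/* Driver queued @bytes for transmission (cf. netdev_tx_sent_queue()). */
static void bql_sent(struct bql_model *q, unsigned int bytes)
{
	q->num_queued += bytes;
}

/* Driver reclaimed @bytes of finished buffers. Returns false in the case
 * where dql_completed() would BUG(): completing more than is outstanding.
 * Unsigned subtraction keeps the check correct across counter wraparound. */
static bool bql_completed(struct bql_model *q, unsigned int bytes)
{
	if (bytes > q->num_queued - q->num_completed)
		return false;
	q->num_completed += bytes;
	return true;
}

/* Forget all outstanding bytes (cf. netdev_tx_reset_queue()). */
static void bql_reset(struct bql_model *q)
{
	q->num_queued = 0;
	q->num_completed = 0;
}

/* Hypothetical failure sequence: a reset while buffers are still in
 * flight makes their later completion look like over-completion. */
static void bql_demo(void)
{
	struct bql_model q = { 0, 0 };

	bql_sent(&q, 1500);
	assert(bql_completed(&q, 1000));   /* partial completion is fine */
	bql_reset(&q);                     /* 500 bytes still in flight */
	assert(!bql_completed(&q, 500));   /* would trip the BUG_ON */
}
```

The point of the model is that every byte a driver reports as completed must previously have been reported as sent to the same queue since its last reset; any path that reclaims buffers the counters never saw violates it.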
Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module
On Fri, Jun 21, 2024 at 04:05:27PM +0200, Daniel von Kirschten wrote: > Am 18.06.2024 um 21:58 schrieb Luis Chamberlain: > > On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote: > > > If a module is being loaded, and the .gnu.linkonce.this_module section > > > in the module's ELF file does not have the WRITE flag, the kernel will > > > map the finished module struct of that module as read-only. > > > This causes a kernel panic when the struct is written to the first time > > > after it has been marked read-only. Currently this happens in > > > complete_formation in kernel/module/main.c:2765 when the module's state is > > > set to MODULE_STATE_COMING, just after setting up the memory protections. > > > > How did you find this issue? > > In a university course I got the assignment to manually craft a loadable .ko > file, given only a regular object file, without using Kbuild. During testing > my module files, most of them were simply (correctly) rejected by the kernel > with an appropriate error message, but at some point I ran into this exact > kernel panic, and investigated it to understand why my module file was > invalid. OK, then the commit log should describe that this doesn't fix any known real world issue, but rather a custom crafted module without the regular module build system. > > > Down the line, this seems to lead to unpredictable freezes when trying to > > > load other modules - I guess this is due to some structures not being > > > cleaned up properly, but I didn't investigate this further. > > > > > > A check already exists which verifies that .gnu.linkonce.this_module > > > is ALLOC. This patch simply adds an analogous check for WRITE. > > > > Can you check to ensure our modules generated have a respective check to > > ensure this check exists at build time? That would proactively inform > > userspace when a built module is not built correctly, and the tool > > responsible can be identified. 
> > See above - I don't think it's possible to create such a broken module file > with any of "official" tools. That should be clearly stated on the commit log. > I haven't looked too deeply into how Kbuild > actually builds modules, but as far as I know, the user doesn't even come > into contact with this_module when using the regular toolchain, because > Kbuild is responsible for creating the .this_module section. Consider that a next-level university assignment; it is more useful to the world than this debug message. Because above you suggest "I don't think", now go out and be sure. > And Kbuild of > course creates it with the correct flags. So if I understand correctly, ... > this > problem can only occur when the module was built by some external tooling > (or manually, in my case). Who would create custom modules without the Linux kernel module build system, and what uses does that provide? It seems you are proving why this would be a terribly silly thing to do. Now, the *value* your change has is it can prevent a crash in case of a corrupted module, which *can* occur, consider an odd filesystem live corruption, at least this would be caught at module load attempt and not crash. That's worth committing for this reason but your commit log really needs much more clarity. Why? Because stupid bots want to assign stupid CVEs for anything that seems like a security issue and this could escalate to such type of things. Providing clarity helps system integrators decide if they want to backport this sort of patch. Providing clarity on the chances of this happening and how we think it can happen helps a lot. If you want to be more proactive, try to enhance userspace kmod modprobe so that this is also verified. Luis
Re: (subset) [PATCH v9 0/5] soc: qcom: add in-kernel pd-mapper implementation
On Sat, 22 Jun 2024 01:03:39 +0300, Dmitry Baryshkov wrote: > Protection domain mapper is a QMI service providing mapping between > 'protection domains' and services supported / allowed in these domains. > For example such mapping is required for loading of the WiFi firmware or > for properly starting up the UCSI / altmode / battery manager support. > > The existing userspace implementation has several issues. It doesn't play > well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the > firmware location is changed (or if the firmware was not available at > the time pd-mapper was started but the corresponding directory is > mounted later), etc. > > [...] Applied, thanks! [5/5] remoteproc: qcom: enable in-kernel PD mapper commit: 5b9f51b200dcb2c3924ecbff324fa52f1faa84d3 Best regards, -- Bjorn Andersson
Re: (subset) [PATCH v9 0/5] soc: qcom: add in-kernel pd-mapper implementation
On Sat, 22 Jun 2024 01:03:39 +0300, Dmitry Baryshkov wrote: > Protection domain mapper is a QMI service providing mapping between > 'protection domains' and services supported / allowed in these domains. > For example such mapping is required for loading of the WiFi firmware or > for properly starting up the UCSI / altmode / battery manager support. > > The existing userspace implementation has several issues. It doesn't play > well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the > firmware location is changed (or if the firmware was not available at > the time pd-mapper was started but the corresponding directory is > mounted later), etc. > > [...] Applied, thanks! [1/5] soc: qcom: pdr: protect locator_addr with the main mutex commit: 107924c14e3ddd85119ca43c26a4ee1056fa9b84 [2/5] soc: qcom: pdr: fix parsing of domains lists commit: 57f20d51f35780f240ecf39d81cda23612800a92 [3/5] soc: qcom: pdr: extract PDR message marshalling data commit: 0ac5c7d933de6570e0efa62bb5ef9e440311a6fe [4/5] soc: qcom: add pd-mapper implementation commit: 1ebcde047c547134e894508468ead0b7bd3b967d Best regards, -- Bjorn Andersson
[PATCH v9 5/5] remoteproc: qcom: enable in-kernel PD mapper
Request in-kernel protection domain mapper to be started before starting Qualcomm DSP and release it once DSP is stopped. Once all DSPs are stopped, the PD mapper will be stopped too. Reviewed-by: Chris Lew Tested-by: Steev Klimaszewski Tested-by: Neil Armstrong # on SM8550-QRD Signed-off-by: Dmitry Baryshkov --- drivers/remoteproc/qcom_common.c| 87 + drivers/remoteproc/qcom_common.h| 10 + drivers/remoteproc/qcom_q6v5_adsp.c | 3 ++ drivers/remoteproc/qcom_q6v5_mss.c | 3 ++ drivers/remoteproc/qcom_q6v5_pas.c | 3 ++ drivers/remoteproc/qcom_q6v5_wcss.c | 3 ++ 6 files changed, 109 insertions(+) diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c index 03e5f5d533eb..8c8688f99f0a 100644 --- a/drivers/remoteproc/qcom_common.c +++ b/drivers/remoteproc/qcom_common.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -25,6 +26,7 @@ #define to_glink_subdev(d) container_of(d, struct qcom_rproc_glink, subdev) #define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev) #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev) +#define to_pdm_subdev(d) container_of(d, struct qcom_rproc_pdm, subdev) #define MAX_NUM_OF_SS 10 #define MAX_REGION_NAME_LENGTH 16 @@ -519,5 +521,90 @@ void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr) } EXPORT_SYMBOL_GPL(qcom_remove_ssr_subdev); +static void pdm_dev_release(struct device *dev) +{ + struct auxiliary_device *adev = to_auxiliary_dev(dev); + + kfree(adev); +} + +static int pdm_notify_prepare(struct rproc_subdev *subdev) +{ + struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev); + struct auxiliary_device *adev; + int ret; + + adev = kzalloc(sizeof(*adev), GFP_KERNEL); + if (!adev) + return -ENOMEM; + + adev->dev.parent = pdm->dev; + adev->dev.release = pdm_dev_release; + adev->name = "pd-mapper"; + adev->id = pdm->index; + + ret = auxiliary_device_init(adev); + if (ret) { + kfree(adev); + return ret; + } + + ret = 
auxiliary_device_add(adev); + if (ret) { + auxiliary_device_uninit(adev); + return ret; + } + + pdm->adev = adev; + + return 0; +} + + +static void pdm_notify_unprepare(struct rproc_subdev *subdev) +{ + struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev); + + if (!pdm->adev) + return; + + auxiliary_device_delete(pdm->adev); + auxiliary_device_uninit(pdm->adev); + pdm->adev = NULL; +} + +/** + * qcom_add_pdm_subdev() - register PD Mapper subdevice + * @rproc: rproc handle + * @pdm: PDM subdevice handle + * + * Register @pdm so that Protection Device mapper service is started when the + * DSP is started too. + */ +void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm) +{ + pdm->dev = &rproc->dev; + pdm->index = rproc->index; + + pdm->subdev.prepare = pdm_notify_prepare; + pdm->subdev.unprepare = pdm_notify_unprepare; + + rproc_add_subdev(rproc, &pdm->subdev); +} +EXPORT_SYMBOL_GPL(qcom_add_pdm_subdev); + +/** + * qcom_remove_pdm_subdev() - remove PD Mapper subdevice + * @rproc: rproc handle + * @pdm: PDM subdevice handle + * + * Remove the PD Mapper subdevice. 
+ */ +void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm) +{ + rproc_remove_subdev(rproc, &pdm->subdev); +} +EXPORT_SYMBOL_GPL(qcom_remove_pdm_subdev); + MODULE_DESCRIPTION("Qualcomm Remoteproc helper driver"); MODULE_LICENSE("GPL v2"); diff --git a/drivers/remoteproc/qcom_common.h b/drivers/remoteproc/qcom_common.h index 9ef4449052a9..b07fbaa091a0 100644 --- a/drivers/remoteproc/qcom_common.h +++ b/drivers/remoteproc/qcom_common.h @@ -34,6 +34,13 @@ struct qcom_rproc_ssr { struct qcom_ssr_subsystem *info; }; +struct qcom_rproc_pdm { + struct rproc_subdev subdev; + struct device *dev; + int index; + struct auxiliary_device *adev; +}; + void qcom_minidump(struct rproc *rproc, unsigned int minidump_id, void (*rproc_dumpfn_t)(struct rproc *rproc, struct rproc_dump_segment *segment, void *dest, size_t offset, @@ -52,6 +59,9 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr, const char *ssr_name); void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr); +void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm); +void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm); + #if IS_ENABLED(CONFIG_QCOM_SYSMON) struct qcom_sysmon *qcom_add_sysmon_subdev(struct rproc *rproc,
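The commit message says the PD mapper is started before a Qualcomm DSP starts and is stopped once all DSPs are stopped. The rproc side above only creates one auxiliary device per remote processor in prepare and deletes it in unprepare; the "last one out stops the service" behaviour amounts to a reference count. A minimal userspace sketch of that rule follows. The names pdm_service, pdm_get and pdm_put are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a single, shared PD-mapper service. */
struct pdm_service {
	unsigned int users;  /* DSPs currently between prepare and unprepare */
	bool running;
};

/* Called from a DSP's prepare step: the first user starts the service. */
static void pdm_get(struct pdm_service *s)
{
	if (s->users++ == 0)
		s->running = true;
}

/* Called from a DSP's unprepare step: the last user stops the service. */
static void pdm_put(struct pdm_service *s)
{
	assert(s->users > 0);  /* an unbalanced put would be a caller bug */
	if (--s->users == 0)
		s->running = false;
}
```

In kernel code the counter and the start/stop actions would need to sit under a common lock, since several DSPs can be started and stopped concurrently.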
[PATCH v9 0/5] soc: qcom: add in-kernel pd-mapper implementation
Protection domain mapper is a QMI service providing mapping between 'protection domains' and services supported / allowed in these domains. For example such mapping is required for loading of the WiFi firmware or for properly starting up the UCSI / altmode / battery manager support. The existing userspace implementation has several issues. It doesn't play well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the firmware location is changed (or if the firmware was not available at the time pd-mapper was started but the corresponding directory is mounted later), etc. However, this configuration is largely static and common between different platforms. Provide an in-kernel service implementing static per-platform data. --- Changes in v9: - Adjust locking in pdr_get_domain_list(), releasing the mutex right after qmi_send_request() (Chris Lew) - Link to v8: https://lore.kernel.org/r/20240512-qcom-pd-mapper-v8-0-5ecbb276f...@linaro.org Changes in v8: - Reworked pd-mapper to register as an rproc_subdev / auxdev - Dropped Tested-by from Steev and Alexey from the last patch since the implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey Minnekhanov) - Added platform entry for sm4250 (used for qrb4210 / RB2) - Added locking to the pdr_get_domain_list() (Chris Lew) - Remove the call to qmi_del_server() and corresponding API (Chris Lew) - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew) - Link to v7: https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org Changes in v7: - Fixed modular build (Steev) - Link to v6: https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org Changes in v6: - Reworked mutex to fix lockdep issue on deregistration - Fixed dependencies between PD-mapper and remoteproc to fix modular builds (Krzysztof) - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof) - Fixed kerneldocs (Krzysztof) - Removed extra pr_debug messages (Krzysztof) - Fixed wcss build (Krzysztof) - Added platforms which do not require protection domain mapping to silence the notice on those platforms - Link to v5: https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org Changes in v5: - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew) - pd_mapper: reworked to provide static configuration per platform (Bjorn) - Link to v4: https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org Changes in v4: - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad) - Added configuration for sm6350 (Thanks to Luca) - Removed RFC tag (Konrad) - Link to v3: https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org Changes in RFC v3: - Send start / stop notifications when PD-mapper domain list is changed - Reworked the way PD-mapper treats protection domains, register all of them in a single batch - Added SC7180 domains configuration based on TCL Book 14 GO - Link to v2: https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org Changes in RFC v2: - Swapped num_domains / domains 
(Konrad) - Fixed an issue with battery not working on sc8280xp - Added missing configuration for QCS404 To: Bjorn Andersson To: Konrad Dybcio To: Sibi Sankar To: Mathieu Poirier Cc: linux-arm-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-remotep...@vger.kernel.org --- Dmitry Baryshkov (5): soc: qcom: pdr: protect locator_addr with the main mutex soc: qcom: pdr: fix parsing of domains lists soc: qcom: pdr: extract PDR message marshalling data soc: qcom: add pd-mapper implementation remoteproc: qcom: enable in-kernel PD mapper drivers/remoteproc/qcom_common.c| 87 + drivers/remoteproc/qcom_common.h| 10 + drivers/remoteproc/qcom_q6v5_adsp.c | 3 + drivers/remoteproc/qcom_q6v5_mss.c | 3 + drivers/remoteproc/qcom_q6v5_pas.c | 3 + drivers/remoteproc/qcom_q6v5_wcss.c | 3 + drivers/soc/qcom/Kconfig| 15 + drivers/soc/qcom/Makefile | 2 + drivers/soc/qcom/pdr_interface.c| 8 +- drivers/soc/qcom/pdr_internal.h | 318 ++--- drivers/soc/qcom/qcom_pd_mapper.c | 676 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++ 12 files changed, 1183 insertions(+), 298 deletions(-) --- base-commit: 2102cb0d050d34d50b9642a3a50861787527e922 change-id: 20240301-qcom-pd-mapper-e12d622d4ad0 Best regards, -- Dmitry Baryshkov
Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module
Am 18.06.2024 um 21:58 schrieb Luis Chamberlain: On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote: If a module is being loaded, and the .gnu.linkonce.this_module section in the module's ELF file does not have the WRITE flag, the kernel will map the finished module struct of that module as read-only. This causes a kernel panic when the struct is written to the first time after it has been marked read-only. Currently this happens in complete_formation in kernel/module/main.c:2765 when the module's state is set to MODULE_STATE_COMING, just after setting up the memory protections. How did you find this issue? In a university course I got the assignment to manually craft a loadable .ko file, given only a regular object file, without using Kbuild. During testing my module files, most of them were simply (correctly) rejected by the kernel with an appropriate error message, but at some point I ran into this exact kernel panic, and investigated it to understand why my module file was invalid. Down the line, this seems to lead to unpredictable freezes when trying to load other modules - I guess this is due to some structures not being cleaned up properly, but I didn't investigate this further. A check already exists which verifies that .gnu.linkonce.this_module is ALLOC. This patch simply adds an analogous check for WRITE. Can you check to ensure our modules generated have a respective check to ensure this check exists at build time? That would proactively inform userspace when a built module is not built correctly, and the tool responsible can be identified. See above - I don't think it's possible to create such a broken module file with any of "official" tools. I haven't looked too deeply into how Kbuild actually builds modules, but as far as I know, the user doesn't even come into contact with this_module when using the regular toolchain, because Kbuild is responsible for creating the .this_module section. 
And Kbuild of course creates it with the correct flags. So if I understand correctly, this problem can only occur when the module was built by some external tooling (or manually, in my case). Daniel
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
disabled CONFIG_FORCE_NR_CPUS option for 6.9.5 but the trace + panic still exists. So that one didn't help. I've also been bisecting the trace but have not finished it yet as the last half dozen builds produced non-bootable kernels. Anyway, I will continue it soon(ish) when I have a bit more free time. --Ilkka On Tue, Jun 18, 2024 at 5:52 PM Steven Rostedt wrote: > > On Thu, 13 Jun 2024 10:32:24 +0300 > Ilkka Naulapää wrote: > > > ok, so if you don't have any idea where this bug is after those debug > > patches, I'll try to find some time to bisect it as a last resort. > > Stay tuned. > > FYI, > > I just debugged a strange crash that was caused by my config having > something leftover from your config. Specifically, that was: > > CONFIG_FORCE_NR_CPUS > > Do you get any warning about nr cpus not matching at boot up? > > Regardless, can you disable that and see if you still get the same > crash. > > Thanks, > > -- Steve
Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module
On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote: > If a module is being loaded, and the .gnu.linkonce.this_module section > in the module's ELF file does not have the WRITE flag, the kernel will > map the finished module struct of that module as read-only. > This causes a kernel panic when the struct is written to the first time > after it has been marked read-only. Currently this happens in > complete_formation in kernel/module/main.c:2765 when the module's state is > set to MODULE_STATE_COMING, just after setting up the memory protections. How did you find this issue? > Down the line, this seems to lead to unpredictable freezes when trying to > load other modules - I guess this is due to some structures not being > cleaned up properly, but I didn't investigate this further. > > A check already exists which verifies that .gnu.linkonce.this_module > is ALLOC. This patch simply adds an analogous check for WRITE. Can you check to ensure our modules generated have a respective check to ensure this check exists at build time? That would proactively inform userspace when a built module is not built correctly, and the tool responsible can be identified. Luis
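The check under discussion (the .gnu.linkonce.this_module section must be both ALLOC and WRITE) can also be sketched in userspace, for instance as a building block for the build-time or kmod-side verification suggested in this thread. This is a minimal sketch assuming a 64-bit ELF whose section headers and section-header string table are already in memory. find_section() and this_module_section_ok() are hypothetical helpers, not the kernel's module loader code or an existing kmod function.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the section called @name, or -1 if absent.
 * @shstrtab is the contents of the section-header string table. */
static long find_section(const Elf64_Shdr *shdrs, size_t n,
                         const char *shstrtab, const char *name)
{
	assert(shdrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (strcmp(shstrtab + shdrs[i].sh_name, name) == 0)
			return (long)i;
	return -1;
}

/* The module struct lives in this section and is written after load
 * (e.g. when its state becomes MODULE_STATE_COMING), so the section
 * must be mapped (ALLOC) and writable (WRITE). */
static bool this_module_section_ok(const Elf64_Shdr *shdrs, size_t n,
                                   const char *shstrtab)
{
	const Elf64_Xword need = SHF_ALLOC | SHF_WRITE;
	long idx = find_section(shdrs, n, shstrtab,
	                        ".gnu.linkonce.this_module");

	if (idx < 0)
		return false;  /* no module struct at all */
	return (shdrs[idx].sh_flags & need) == need;
}
```

A caller would read e_shoff, e_shnum and e_shstrndx from the Elf64_Ehdr to locate the section headers and the string table, then reject the file when this_module_section_ok() returns false.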
[RFC PATCH 1/4] kernel/reboot: Introduce pre_restart notifiers
Introduce a new pre_restart notifier chain for callbacks that need to be executed after the system has been made quiescent with syscore_shutdown(), before machine restart. This pre_restart notifier chain should be invoked on machine restart and on emergency machine restart. The use-case for this new notifier chain is to preserve tracing data within pmem areas on systems where the BIOS does not clear memory across warm reboots. Why do we need a new notifier chain? 1) The reboot and restart_prepare notifiers are called too early in the reboot sequence: they are invoked before syscore_shutdown(), which leaves other CPUs actively running threads while those notifiers are invoked. 2) The "restart" notifier is meant to trigger the actual machine restart, and is not meant to be invoked as a last step immediately before restart. It is also not always used: some architecture code chooses to bypass this restart notifier and reboot directly from the architecture code. Wiring up the architecture code to call this notifier chain is left to follow-up arch-specific patches. Signed-off-by: Mathieu Desnoyers Cc: Dan Williams Cc: Vishal Verma Cc: Dave Jiang Cc: Ira Weiny Cc: Steven Rostedt Cc: nvd...@lists.linux.dev Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: x...@kernel.org Cc: "H.
Peter Anvin" Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org --- include/linux/reboot.h | 4 kernel/reboot.c| 51 ++ 2 files changed, 55 insertions(+) diff --git a/include/linux/reboot.h b/include/linux/reboot.h index abcdde4df697..c7f340e81451 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -50,6 +50,10 @@ extern int register_restart_handler(struct notifier_block *); extern int unregister_restart_handler(struct notifier_block *); extern void do_kernel_restart(char *cmd); +extern int register_pre_restart_handler(struct notifier_block *); +extern int unregister_pre_restart_handler(struct notifier_block *); +extern void do_kernel_pre_restart(char *cmd); + /* * Architecture-specific implementations of sys_reboot commands. */ diff --git a/kernel/reboot.c b/kernel/reboot.c index 22c16e2564cc..b7287dd48d35 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -235,6 +235,57 @@ void do_kernel_restart(char *cmd) atomic_notifier_call_chain(&restart_handler_list, reboot_mode, cmd); } +/* + * Notifier list for kernel code which wants to be called immediately + * before restarting the system. + */ +static ATOMIC_NOTIFIER_HEAD(pre_restart_handler_list); + +/** + * register_pre_restart_handler - Register function to be called in preparation + *to reset the system + * @nb: Info about handler function to be called + * + * Registers a function with code to be called in preparation to restart + * the system. + * + * Currently always returns zero, as atomic_notifier_chain_register() + * always returns zero. + */ +int register_pre_restart_handler(struct notifier_block *nb) +{ + return atomic_notifier_chain_register(&pre_restart_handler_list, nb); +} +EXPORT_SYMBOL(register_pre_restart_handler); + +/** + * unregister_pre_restart_handler - Unregister previously registered + * pre-restart handler + * @nb: Hook to be unregistered + * + * Unregisters a previously registered pre-restart handler function. 
+ * + * Returns zero on success, or %-ENOENT on failure. + */ +int unregister_pre_restart_handler(struct notifier_block *nb) +{ + return atomic_notifier_chain_unregister(&pre_restart_handler_list, nb); +} +EXPORT_SYMBOL(unregister_pre_restart_handler); + +/** + * do_kernel_pre_restart - Execute kernel pre-restart handler call chain + * + * Calls functions registered with register_pre_restart_handler. + * + * Expected to be called from machine_restart and + * machine_emergency_restart before invoking the restart handlers. + */ +void do_kernel_pre_restart(char *cmd) +{ + atomic_notifier_call_chain(&pre_restart_handler_list, reboot_mode, cmd); +} + void migrate_to_reboot_cpu(void) { /* The boot cpu is always logical cpu 0 */ -- 2.39.2
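Conceptually the new chain behaves like any other notifier chain: handlers are kept in priority order and do_kernel_pre_restart() walks them, passing reboot_mode and cmd. The userspace model below illustrates only that ordering and the register/unregister semantics. It deliberately omits the RCU and spinlock protection that ATOMIC_NOTIFIER_HEAD provides, and all model_* names, MODEL_NOTIFY_* values and the call_log helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this model. */
#define MODEL_NOTIFY_DONE 0
#define MODEL_NOTIFY_STOP 1

struct model_nb {
	int (*call)(struct model_nb *nb, unsigned long mode, void *data);
	int priority;               /* higher priority runs earlier */
	struct model_nb *next;
};

/* Insert sorted by descending priority; equal priorities keep
 * registration order, as the kernel's generic notifier code does. */
static void model_register(struct model_nb **head, struct model_nb *nb)
{
	assert(nb != NULL && nb->call != NULL);
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Returns 0 on success, -1 if @nb was not registered (kernel: -ENOENT). */
static int model_unregister(struct model_nb **head, struct model_nb *nb)
{
	for (; *head; head = &(*head)->next) {
		if (*head == nb) {
			*head = nb->next;
			return 0;
		}
	}
	return -1;
}

/* Walk the chain in order, stopping early if a handler asks to. */
static void model_call_chain(struct model_nb *head, unsigned long mode,
                             void *data)
{
	for (; head; head = head->next)
		if (head->call(head, mode, data) == MODEL_NOTIFY_STOP)
			break;
}

/* Example handler: records the order in which handlers ran. */
struct call_log {
	int n;
	int order[8];
};

static int record_priority(struct model_nb *nb, unsigned long mode, void *data)
{
	struct call_log *calls = data;

	(void)mode;
	if (calls->n < 8)
		calls->order[calls->n++] = nb->priority;
	return MODEL_NOTIFY_DONE;
}
```

Because the chain is walked at the very end of the restart path, handlers registered this way should be short and must not sleep, which is what the atomic variant of the chain implies.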
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Thu, 13 Jun 2024 10:32:24 +0300 Ilkka Naulapää wrote: > ok, so if you don't have any idea where this bug is after those debug > patches, I'll try to find some time to bisect it as a last resort. > Stay tuned. FYI, I just debugged a strange crash that was caused by my config having something leftover from your config. Specifically, that was: CONFIG_FORCE_NR_CPUS Do you get any warning about nr cpus not matching at boot up? Regardless, can you disable that and see if you still get the same crash. Thanks, -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On 13.06.24 09:32, Ilkka Naulapää wrote: > On Wed, Jun 12, 2024 at 6:56 PM Steven Rostedt wrote: >> On Wed, 12 Jun 2024 15:36:22 +0200 >> "Linux regression tracking (Thorsten Leemhuis)" >> wrote: >>> >>> Ilkka or Steven, what happened to this? This thread looks stalled. I >>> also was unsuccessful when looking for other threads related to this >>> report or the culprit. Did it fall through the cracks or am I missing >>> something here? > >> Honestly, I have no idea where the bug is. I can't reproduce it. [...] Steven, thx for the update! And yeah, that's how it sometimes is. Given that we haven't seen similar reports (at least afaics) it's nothing I worry much about. > ok, so if you don't have any idea where this bug is after those debug > patches, I'll try to find some time to bisect it as a last resort. > Stay tuned. Yeah, that would be great help. Thank you, too! Ciao, Thorsten >>> On 02.06.24 09:32, Ilkka Naulapää wrote: sorry longer delay, been a bit busy but here is the result from that new patch. Only applied this patch so if the previous one is needed also, let me know and I'll rerun it. --Ilkka On Thu, May 30, 2024 at 5:00 PM Steven Rostedt wrote: > > On Thu, 30 May 2024 16:02:37 +0300 > Ilkka Naulapää wrote: > >> applied your patch and here's the output. >> > > Unfortunately, it doesn't give me any new information. I added one more > BUG on, want to try this? Otherwise, I'm pretty much at a loss.
:-/ > > -- Steve > > diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c > index de5b72216b1a..a090495e78c9 100644 > --- a/fs/tracefs/inode.c > +++ b/fs/tracefs/inode.c > @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct > super_block *sb) > return NULL; > > ti->flags = 0; > + ti->magic = 20240823; > > return &ti->vfs_inode; > } > > static void tracefs_free_inode(struct inode *inode) > { > - kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); > + struct tracefs_inode *ti = get_tracefs(inode); > + > + BUG_ON(ti->magic != 20240823); > + kmem_cache_free(tracefs_inode_cachep, ti); > } > > static ssize_t default_read_file(struct file *file, char __user *buf, > @@ -147,16 +151,6 @@ static const struct inode_operations > tracefs_dir_inode_operations = { > .rmdir = tracefs_syscall_rmdir, > }; > > -struct inode *tracefs_get_inode(struct super_block *sb) > -{ > - struct inode *inode = new_inode(sb); > - if (inode) { > - inode->i_ino = get_next_ino(); > - inode->i_atime = inode->i_mtime = > inode_set_ctime_current(inode); > - } > - return inode; > -} > - > struct tracefs_mount_opts { > kuid_t uid; > kgid_t gid; > @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry > *dentry, struct inode *inode) > return; > > ti = get_tracefs(inode); > + BUG_ON(ti->magic != 20240823); > if (ti && ti->flags & TRACEFS_EVENT_INODE) > eventfs_set_ef_status_free(dentry); > iput(inode); > @@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry > *dentry) > return dentry; > } > > +struct inode *tracefs_get_inode(struct super_block *sb) > +{ > + struct inode *inode = new_inode(sb); > + > + BUG_ON(sb->s_op != &tracefs_super_operations); > + if (inode) { > + inode->i_ino = get_next_ino(); > + inode->i_atime = inode->i_mtime = > inode_set_ctime_current(inode); > + } > + return inode; > +} > + > /** > * tracefs_create_file - create a file in the tracefs filesystem > * @name: a pointer to a string containing the name of the file to > create. 
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h > index 69c2b1d87c46..9059b8b11bb6 100644 > --- a/fs/tracefs/internal.h > +++ b/fs/tracefs/internal.h > @@ -9,12 +9,15 @@ enum { > struct tracefs_inode { > unsigned long flags; > void *private; > + unsigned long magic; > struct inode vfs_inode; > }; > > static inline struct tracefs_inode *get_tracefs(const struct inode > *inode) > { > - return container_of(inode, struct tracefs_inode, vfs_inode); > + struct tracefs_inode *ti = container_of(inode, struct > tracefs_inode, vfs_inode); > + BUG_ON(ti->magic != 20240823); > + return ti; > } > > struct dentry *tracefs_start_creating(const char *name, struct dentry > *parent);
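The debug patch uses a common canary technique: stamp an arbitrary magic value into the containing struct tracefs_inode at allocation time and check it whenever container_of() converts a struct inode back, so an inode that did not come from tracefs_alloc_inode() trips immediately. A userspace sketch of the same idea follows, with hypothetical model_* stand-ins for the kernel structures and assert() in place of BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Same arbitrary value the debug patch uses. */
#define MODEL_MAGIC 20240823UL

/* Hypothetical stand-in for struct inode. */
struct model_inode {
	unsigned long i_ino;
};

/* Mirrors the debug layout of struct tracefs_inode. */
struct model_tracefs_inode {
	unsigned long flags;
	void *private;
	unsigned long magic;
	struct model_inode vfs_inode;
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like tracefs_alloc_inode(): allocate the container, stamp the canary. */
static struct model_inode *model_alloc_inode(void)
{
	struct model_tracefs_inode *ti = calloc(1, sizeof(*ti));

	if (!ti)
		return NULL;
	ti->magic = MODEL_MAGIC;
	return &ti->vfs_inode;
}

/* True if the canary in the surrounding container is intact. */
static bool model_is_tracefs(struct model_inode *inode)
{
	struct model_tracefs_inode *ti =
		model_container_of(inode, struct model_tracefs_inode, vfs_inode);

	return ti->magic == MODEL_MAGIC;
}

/* Like the patched tracefs_free_inode(): refuse to free a foreign object. */
static void model_free_inode(struct model_inode *inode)
{
	assert(model_is_tracefs(inode));  /* BUG_ON() in the kernel patch */
	free(model_container_of(inode, struct model_tracefs_inode, vfs_inode));
}
```

For an inode that really belongs to another filesystem, the check reads whatever memory precedes the embedded inode, which is acceptable for a throwaway debug patch but is why such canaries are heuristics rather than proofs.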
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
ok, so if you don't have any idea where this bug is after those debug patches, I'll try to find some time to bisect it as a last resort. Stay tuned. --Ilkka On Wed, Jun 12, 2024 at 6:56 PM Steven Rostedt wrote: > > On Wed, 12 Jun 2024 15:36:22 +0200 > "Linux regression tracking (Thorsten Leemhuis)" > wrote: > > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting > > for once, to make this easily accessible to everyone. > > > > Ilkka or Steven, what happened to this? This thread looks stalled. I > > also was unsuccessful when looking for other threads related to this > > report or the culprit. Did it fall through the cracks or am I missing > > something here? > > Honestly, I have no idea where the bug is. I can't reproduce it. These > patches I sent would check all the places that add to the list to make > sure the proper trace_inode was being added, and the output shows that > they are all correct. Then suddenly, something that came from the > inode cache is calling the tracefs inode cache to free it, and that's > where the bug is happening. > > This really looks like another bug that the recent changes have made > more predominant. > > -- Steve > > > > > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > > -- > > Everything you wanna know about Linux kernel regression tracking: > > https://linux-regtracking.leemhuis.info/about/#tldr > > If I did something stupid, please tell me, as explained on that page. > > > > #regzbot poke > > > > On 02.06.24 09:32, Ilkka Naulapää wrote: > > > sorry longer delay, been a bit busy but here is the result from that > > > new patch. Only applied this patch so if the previous one is needed > > > also, let me know and I'll rerun it. > > > > > > --Ilkka > > > > > > On Thu, May 30, 2024 at 5:00 PM Steven Rostedt > > > wrote: > > >> > > >> On Thu, 30 May 2024 16:02:37 +0300 > > >> Ilkka Naulapää wrote: > > >> > > >>> applied your patch and here's the output.
> > >>> > > >> > > >> Unfortunately, it doesn't give me any new information. I added one more > > >> BUG_ON, want to try this? Otherwise, I'm pretty much at a loss. :-/ > > >> > > >> -- Steve > > >> > > >> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c > > >> index de5b72216b1a..a090495e78c9 100644 > > >> --- a/fs/tracefs/inode.c > > >> +++ b/fs/tracefs/inode.c > > >> @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct > > >> super_block *sb) > > >> return NULL; > > >> > > >> ti->flags = 0; > > >> + ti->magic = 20240823; > > >> > > >> return &ti->vfs_inode; > > >> } > > >> > > >> static void tracefs_free_inode(struct inode *inode) > > >> { > > >> - kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); > > >> + struct tracefs_inode *ti = get_tracefs(inode); > > >> + > > >> + BUG_ON(ti->magic != 20240823); > > >> + kmem_cache_free(tracefs_inode_cachep, ti); > > >> } > > >> > > >> static ssize_t default_read_file(struct file *file, char __user *buf, > > >> @@ -147,16 +151,6 @@ static const struct inode_operations > > >> tracefs_dir_inode_operations = { > > >> .rmdir = tracefs_syscall_rmdir, > > >> }; > > >> > > >> -struct inode *tracefs_get_inode(struct super_block *sb) > > >> -{ > > >> - struct inode *inode = new_inode(sb); > > >> - if (inode) { > > >> - inode->i_ino = get_next_ino(); > > >> - inode->i_atime = inode->i_mtime = > > >> inode_set_ctime_current(inode); > > >> - } > > >> - return inode; > > >> -} > > >> - > > >> struct tracefs_mount_opts { > > >> kuid_t uid; > > >> kgid_t gid; > > >> @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry > > >> *dentry, struct inode *inode) > > >> return; > > >> > > >> ti = get_tracefs(inode); > > >> + BUG_ON(ti->magic != 20240823); > > >>
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Wed, 12 Jun 2024 15:36:22 +0200 "Linux regression tracking (Thorsten Leemhuis)" wrote: > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting > for once, to make this easily accessible to everyone. > > Ilkka or Steven, what happened to this? This thread looks stalled. I > also was unsuccessful when looking for other threads related to this > report or the culprit. Did it fall through the cracks or am I missing > something here? Honestly, I have no idea where the bug is. I can't reproduce it. These patches I sent would check all the places that add to the list to make sure the proper trace_inode was being added, and the output shows that they are all correct. Then suddenly, something that came from the inode cache is calling the tracefs inode cache to free it, and that's where the bug is happening. This really looks like another bug that the recent changes have made more predominant. -- Steve > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > -- > Everything you wanna know about Linux kernel regression tracking: > https://linux-regtracking.leemhuis.info/about/#tldr > If I did something stupid, please tell me, as explained on that page. > > #regzbot poke > > On 02.06.24 09:32, Ilkka Naulapää wrote: > > sorry for the longer delay, been a bit busy but here is the result from that > > new patch. Only applied this patch so if the previous one is needed > > also, let me know and I'll rerun it. > > > > --Ilkka > > > > On Thu, May 30, 2024 at 5:00 PM Steven Rostedt wrote: > > > >> > >> On Thu, 30 May 2024 16:02:37 +0300 > >> Ilkka Naulapää wrote: > >> > >>> applied your patch and here's the output. > >>> > >> > >> Unfortunately, it doesn't give me any new information. I added one more > >> BUG_ON, want to try this? Otherwise, I'm pretty much at a loss.
:-/ > >> > >> -- Steve > >> > >> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c > >> index de5b72216b1a..a090495e78c9 100644 > >> --- a/fs/tracefs/inode.c > >> +++ b/fs/tracefs/inode.c > >> @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct > >> super_block *sb) > >> return NULL; > >> > >> ti->flags = 0; > >> + ti->magic = 20240823; > >> > >> return &ti->vfs_inode; > >> } > >> > >> static void tracefs_free_inode(struct inode *inode) > >> { > >> - kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); > >> + struct tracefs_inode *ti = get_tracefs(inode); > >> + > >> + BUG_ON(ti->magic != 20240823); > >> + kmem_cache_free(tracefs_inode_cachep, ti); > >> } > >> > >> static ssize_t default_read_file(struct file *file, char __user *buf, > >> @@ -147,16 +151,6 @@ static const struct inode_operations > >> tracefs_dir_inode_operations = { > >> .rmdir = tracefs_syscall_rmdir, > >> }; > >> > >> -struct inode *tracefs_get_inode(struct super_block *sb) > >> -{ > >> - struct inode *inode = new_inode(sb); > >> - if (inode) { > >> - inode->i_ino = get_next_ino(); > >> - inode->i_atime = inode->i_mtime = > >> inode_set_ctime_current(inode); > >> - } > >> - return inode; > >> -} > >> - > >> struct tracefs_mount_opts { > >> kuid_t uid; > >> kgid_t gid; > >> @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, > >> struct inode *inode) > >> return; > >> > >> ti = get_tracefs(inode); > >> + BUG_ON(ti->magic != 20240823); > >> if (ti && ti->flags & TRACEFS_EVENT_INODE) > >> eventfs_set_ef_status_free(dentry); > >> iput(inode); > >> @@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry > >> *dentry) > >> return dentry; > >> } > >> > >> +struct inode *tracefs_get_inode(struct super_block *sb) > >> +{ > >> + struct inode *inode = new_inode(sb); > >> + > >> + BUG_ON(sb->s_op != &tracefs_super_operations); > >> + if (inode) { > >> + inode->i_ino = get_next_ino(); > >> + inode->i_atime = inode->i_
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting for once, to make this easily accessible to everyone. Ilkka or Steven, what happened to this? This thread looks stalled. I also was unsuccessful when looking for other threads related to this report or the culprit. Did it fall through the cracks or am I missing something here? Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot poke On 02.06.24 09:32, Ilkka Naulapää wrote: > sorry for the longer delay, been a bit busy but here is the result from that > new patch. Only applied this patch so if the previous one is needed > also, let me know and I'll rerun it. > > --Ilkka > > On Thu, May 30, 2024 at 5:00 PM Steven Rostedt wrote: >> >> On Thu, 30 May 2024 16:02:37 +0300 >> Ilkka Naulapää wrote: >> >>> applied your patch and here's the output. >>> >> >> Unfortunately, it doesn't give me any new information. I added one more >> BUG_ON, want to try this? Otherwise, I'm pretty much at a loss.
:-/ >> >> -- Steve >> >> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c >> index de5b72216b1a..a090495e78c9 100644 >> --- a/fs/tracefs/inode.c >> +++ b/fs/tracefs/inode.c >> @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct >> super_block *sb) >> return NULL; >> >> ti->flags = 0; >> + ti->magic = 20240823; >> >> return &ti->vfs_inode; >> } >> >> static void tracefs_free_inode(struct inode *inode) >> { >> - kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); >> + struct tracefs_inode *ti = get_tracefs(inode); >> + >> + BUG_ON(ti->magic != 20240823); >> + kmem_cache_free(tracefs_inode_cachep, ti); >> } >> >> static ssize_t default_read_file(struct file *file, char __user *buf, >> @@ -147,16 +151,6 @@ static const struct inode_operations >> tracefs_dir_inode_operations = { >> .rmdir = tracefs_syscall_rmdir, >> }; >> >> -struct inode *tracefs_get_inode(struct super_block *sb) >> -{ >> - struct inode *inode = new_inode(sb); >> - if (inode) { >> - inode->i_ino = get_next_ino(); >> - inode->i_atime = inode->i_mtime = >> inode_set_ctime_current(inode); >> - } >> - return inode; >> -} >> - >> struct tracefs_mount_opts { >> kuid_t uid; >> kgid_t gid; >> @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, >> struct inode *inode) >> return; >> >> ti = get_tracefs(inode); >> + BUG_ON(ti->magic != 20240823); >> if (ti && ti->flags & TRACEFS_EVENT_INODE) >> eventfs_set_ef_status_free(dentry); >> iput(inode); >> @@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry >> *dentry) >> return dentry; >> } >> >> +struct inode *tracefs_get_inode(struct super_block *sb) >> +{ >> + struct inode *inode = new_inode(sb); >> + >> + BUG_ON(sb->s_op != &tracefs_super_operations); >> + if (inode) { >> + inode->i_ino = get_next_ino(); >> + inode->i_atime = inode->i_mtime = >> inode_set_ctime_current(inode); >> + } >> + return inode; >> +} >> + >> /** >> * tracefs_create_file - create a file in the tracefs filesystem >> * @name: 
a pointer to a string containing the name of the file to create. >> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h >> index 69c2b1d87c46..9059b8b11bb6 100644 >> --- a/fs/tracefs/internal.h >> +++ b/fs/tracefs/internal.h >> @@ -9,12 +9,15 @@ enum { >> struct tracefs_inode { >> unsigned long flags; >> void *private; >> + unsigned long magic; >> struct inode vfs_inode; >> }; >> >> static inline struct tracefs_inode *get_tracefs(const struct inode *inode) >> { >> - return container_of(inode, struct tracefs_inode, vfs_inode); >> + struct tracefs_inode *ti = container_of(inode, struct tracefs_inode, >> vfs_inode); >> + BUG_ON(ti->magic != 20240823); >> + return ti; >> } >> >> struct dentry *tracefs_start_creating(const char *name, struct dentry >> *parent);
Re: [PATCH -next 2/2] ftrace: Add kernel-doc comments for unregister_ftrace_direct() function
On Fri, 7 Jun 2024 16:49:57 +0800 Yang Li wrote: > Added kernel-doc comments for the unregister_ftrace_direct() function to > improve code documentation and readability. > Someone else beat you to this. -- Steve > Reported-by: Abaci Robot > Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9300 > Signed-off-by: Yang Li > --- > kernel/trace/ftrace.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index 4aeb1183ea9f..3b0dbd55cc05 100644 > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -5988,6 +5988,8 @@ EXPORT_SYMBOL_GPL(register_ftrace_direct); > * unregister_ftrace_direct - Remove calls to custom trampoline > * previously registered by register_ftrace_direct for @ops object. > * @ops: The address of the struct ftrace_ops object > + * @addr: The address of the direct call to remove > + * @free_filters: Boolean indicating whether to free the filters > * > * This is used to remove a direct calls to @addr from the nop locations > * of the functions registered in @ops (with by ftrace_set_filter_ip
Re: [PATCH -next 1/2] function_graph: Add kernel-doc comments for ftrace_graph_ret_addr() function
On Fri, 7 Jun 2024 16:49:56 +0800 Yang Li wrote: > Added kernel-doc comments for the ftrace_graph_ret_addr() function to > improve code documentation and readability. > > Reported-by: Abaci Robot > Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9299 > Signed-off-by: Yang Li > --- > kernel/trace/fgraph.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c > index a13551a023aa..4ad33e4cb8da 100644 > --- a/kernel/trace/fgraph.c > +++ b/kernel/trace/fgraph.c > @@ -872,6 +872,12 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int > idx) > /** > * ftrace_graph_ret_addr - convert a potentially modified stack return > address > * to its original value > + * @task: pointer to the task_struct of the task being examined > + * @idx: pointer to a state variable, should be initialized to zero > + * before the first call
Parameter descriptions should not go across more than one line. At least not in my code. Also, you don't need to add that it needs to be initialized here. That belongs in the body. And it's not a state variable. The description you got that from is wrong. I'll go update it and give you a reported by, as the entire description needs a rewrite. -- Steve
> + * @ret: the current return address found on the stack > + * @retp: pointer to the return address on the stack, ignored if > + * HAVE_FUNCTION_GRAPH_RET_ADDR_PTR is not defined > * > * This function can be called by stack unwinding code to convert a found > stack > * return address ('ret') to its original value, in case the function graph
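Steve's two layout points for kernel-doc (each parameter description on a single line, usage requirements such as initialization explained in the body) can be seen on a small example. This is a hypothetical function, not the rewritten ftrace_graph_ret_addr() comment, whose final wording is not part of this thread:

```c
#include <stddef.h>

/**
 * example_find_slot - find the index of a key in an array (hypothetical)
 * @table: array to search
 * @len: number of entries in @table
 * @key: value to look for
 *
 * Each parameter description above stays on a single line. Rules about
 * how the arguments must be prepared before the call (for example, that
 * @table must already be filled in) are explained here in the body.
 *
 * Return: the index of @key in @table, or -1 if it is not present.
 */
static int example_find_slot(const int *table, size_t len, int key)
{
	for (size_t i = 0; i < len; i++) {
		if (table[i] == key)
			return (int)i;
	}
	return -1;
}
```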
Re: [PATCH v8 5/5] remoteproc: qcom: enable in-kernel PD mapper
On 5/11/2024 2:56 PM, Dmitry Baryshkov wrote: Request in-kernel protection domain mapper to be started before starting Qualcomm DSP and release it once DSP is stopped. Once all DSPs are stopped, the PD mapper will be stopped too. Signed-off-by: Dmitry Baryshkov --- drivers/remoteproc/qcom_common.c| 87 + drivers/remoteproc/qcom_common.h| 10 + drivers/remoteproc/qcom_q6v5_adsp.c | 3 ++ drivers/remoteproc/qcom_q6v5_mss.c | 3 ++ drivers/remoteproc/qcom_q6v5_pas.c | 3 ++ drivers/remoteproc/qcom_q6v5_wcss.c | 3 ++ 6 files changed, 109 insertions(+) Thanks for looking into whether this could be implemented as a remoteproc subdevice. Reviewed-by: Chris Lew
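The lifetime rule in this patch (start the PD mapper before the first DSP starts, stop it once every DSP has stopped) is a plain use count. A minimal single-threaded sketch; pd_mapper_get(), pd_mapper_put() and the two globals are hypothetical names, not the driver's API, and a real implementation would also need a lock around the counter:

```c
#include <stdbool.h>

/* Hypothetical state; a real driver would protect it with a mutex. */
static unsigned int pd_mapper_users;
static bool pd_mapper_running;

/* Called before a DSP is started. */
static void pd_mapper_get(void)
{
	if (pd_mapper_users++ == 0)
		pd_mapper_running = true;	/* first user: start the service */
}

/* Called after a DSP has been stopped. */
static void pd_mapper_put(void)
{
	if (pd_mapper_users == 0)
		return;				/* unbalanced put: ignore it */
	if (--pd_mapper_users == 0)
		pd_mapper_running = false;	/* last user gone: stop the service */
}
```

Ignoring an unbalanced put keeps the sketch simple; a kernel driver would more likely warn about it.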
[PATCH] kernel/trace: fix possible deadlock in trie_delete_elem
BPF syscall map operations call bpf_disable_instrumentation() for the reason described in the comment above that function, and that description matches this bug. The function increments the per-CPU integer variable bpf_prog_active, but this variable is not checked in the BPF trace path. The fix adds handling similar to what is already done for kprobes. As a trade-off, it degrades BPF tracing by skipping some eBPF trace sequences that might otherwise deadlock. Reported-by: syzbot+9d95beb2a3c260622...@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9d95beb2a3c260622518 Link: https://lore.kernel.org/all/adb08b0614139...@google.com/T/ Signed-off-by: Wojciech Gładysz --- kernel/trace/bpf_trace.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 6249dac61701..8de2e084b162 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2391,7 +2391,9 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args) struct bpf_trace_run_ctx run_ctx; cant_sleep(); - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { + + // if the instrumentation is not disabled disable recurrence and go + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) { bpf_prog_inc_misses_counter(prog); goto out; } @@ -2405,7 +2407,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args) bpf_reset_run_ctx(old_run_ctx); out: - this_cpu_dec(*(prog->active)); + __this_cpu_dec(bpf_prog_active); } #define UNPACK(...) __VA_ARGS__ -- 2.35.3
[PATCH -next 2/2] ftrace: Add kernel-doc comments for unregister_ftrace_direct() function
Added kernel-doc comments for the unregister_ftrace_direct() function to improve code documentation and readability. Reported-by: Abaci Robot Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9300 Signed-off-by: Yang Li --- kernel/trace/ftrace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 4aeb1183ea9f..3b0dbd55cc05 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5988,6 +5988,8 @@ EXPORT_SYMBOL_GPL(register_ftrace_direct); * unregister_ftrace_direct - Remove calls to custom trampoline * previously registered by register_ftrace_direct for @ops object. * @ops: The address of the struct ftrace_ops object + * @addr: The address of the direct call to remove + * @free_filters: Boolean indicating whether to free the filters * * This is used to remove a direct calls to @addr from the nop locations * of the functions registered in @ops (with by ftrace_set_filter_ip -- 2.20.1.7.g153144c
[PATCH -next 1/2] function_graph: Add kernel-doc comments for ftrace_graph_ret_addr() function
Added kernel-doc comments for the ftrace_graph_ret_addr() function to improve code documentation and readability. Reported-by: Abaci Robot Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9299 Signed-off-by: Yang Li --- kernel/trace/fgraph.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c index a13551a023aa..4ad33e4cb8da 100644 --- a/kernel/trace/fgraph.c +++ b/kernel/trace/fgraph.c @@ -872,6 +872,12 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx) /** * ftrace_graph_ret_addr - convert a potentially modified stack return address * to its original value + * @task: pointer to the task_struct of the task being examined + * @idx: pointer to a state variable, should be initialized to zero + * before the first call + * @ret: the current return address found on the stack + * @retp: pointer to the return address on the stack, ignored if + * HAVE_FUNCTION_GRAPH_RET_ADDR_PTR is not defined * * This function can be called by stack unwinding code to convert a found stack * return address ('ret') to its original value, in case the function graph -- 2.20.1.7.g153144c
[PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module
If a module is being loaded, and the .gnu.linkonce.this_module section in the module's ELF file does not have the WRITE flag, the kernel will map the finished module struct of that module as read-only. This causes a kernel panic when the struct is written to for the first time after it has been marked read-only. Currently this happens in complete_formation in kernel/module/main.c:2765 when the module's state is set to MODULE_STATE_COMING, just after setting up the memory protections. Down the line, this seems to lead to unpredictable freezes when trying to load other modules - I guess this is due to some structures not being cleaned up properly, but I didn't investigate this further. A check already exists which verifies that .gnu.linkonce.this_module is ALLOC. This patch simply adds an analogous check for WRITE. Signed-off-by: Daniel Kirschten --- kernel/module/main.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/module/main.c b/kernel/module/main.c index d18a94b973e1..abba097551a2 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -1886,6 +1886,12 @@ static int elf_validity_cache_copy(struct load_info *info, int flags) goto no_exec; } + if (!(shdr->sh_flags & SHF_WRITE)) { + pr_err("module %s: .gnu.linkonce.this_module must be writable\n", + info->name ?: "(missing .modinfo section or name field)"); + goto no_exec; + } + if (shdr->sh_size != sizeof(struct module)) { pr_err("module %s: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time\n", info->name ?: "(missing .modinfo section or name field)"); -- 2.34.1
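The checks described here amount to testing bits in the section header's sh_flags, plus a size comparison. A userspace sketch, assuming glibc's <elf.h> (Elf64_Shdr, SHF_ALLOC, SHF_WRITE); check_this_module_shdr() is a hypothetical helper, expected_size stands in for sizeof(struct module), and the patch's `goto no_exec` is assumed here to mean returning -ENOEXEC:

```c
#include <elf.h>
#include <errno.h>
#include <stdio.h>

/* Validate the header of .gnu.linkonce.this_module (hypothetical helper). */
static int check_this_module_shdr(const Elf64_Shdr *shdr, Elf64_Xword expected_size)
{
	if (!(shdr->sh_flags & SHF_ALLOC)) {
		fprintf(stderr, ".gnu.linkonce.this_module must be allocated\n");
		return -ENOEXEC;
	}
	/* The check added by this patch: the struct is written after loading. */
	if (!(shdr->sh_flags & SHF_WRITE)) {
		fprintf(stderr, ".gnu.linkonce.this_module must be writable\n");
		return -ENOEXEC;
	}
	if (shdr->sh_size != expected_size) {
		fprintf(stderr, ".gnu.linkonce.this_module has the wrong size\n");
		return -ENOEXEC;
	}
	return 0;
}
```

Rejecting the module at validation time turns a later write fault on read-only memory into an ordinary load error.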
Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
On 11/05/2024 23:56, Dmitry Baryshkov wrote: Protection domain mapper is a QMI service providing mapping between 'protection domains' and services supported / allowed in these domains. For example, such mapping is required for loading of the WiFi firmware or for properly starting up the UCSI / altmode / battery manager support. The existing userspace implementation has several issues. It doesn't play well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the firmware location is changed (or if the firmware was not available at the time pd-mapper was started but the corresponding directory is mounted later), etc. However, this configuration is largely static and common between different platforms. Provide an in-kernel service implementing static per-platform data. To: Bjorn Andersson To: Konrad Dybcio To: Sibi Sankar To: Mathieu Poirier Cc: linux-arm-...@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-remotep...@vger.kernel.org Cc: Johan Hovold Cc: Xilin Wu Cc: "Bryan O'Donoghue" Cc: Steev Klimaszewski Cc: Alexey Minnekhanov -- Changes in v8: - Reworked pd-mapper to register as an rproc_subdev / auxdev - Dropped Tested-by from Steev and Alexey from the last patch since the implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey Minnekhanov) - Added platform entry for sm4250 (used for qrb4210 / RB2) - Added locking to the pdr_get_domain_list() (Chris Lew) - Remove the call to qmi_del_server() and corresponding API (Chris Lew) - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew) - Link to v7: https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org Changes in v7: - Fixed modular build (Steev) - Link to v6: https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org Changes in v6: - Reworked mutex to fix lockdep issue on deregistration - Fixed dependencies between PD-mapper and remoteproc to fix modular builds (Krzysztof) - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof) - Fixed kerneldocs (Krzysztof) - Removed extra pr_debug messages (Krzysztof) - Fixed wcss build (Krzysztof) - Added platforms which do not require protection domain mapping to silence the notice on those platforms - Link to v5: https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org Changes in v5: - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew) - pd_mapper: reworked to provide static configuration per platform (Bjorn) - Link to v4: https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org Changes in v4: - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad) - Added configuration for sm6350 (Thanks to Luca) - Removed RFC tag (Konrad) - Link to v3: https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org Changes in RFC v3: - Send start / stop notifications when PD-mapper domain list is changed - Reworked the way PD-mapper treats protection domains, register all of them in a single batch - Added SC7180 domains configuration based on TCL Book 14 GO - Link to v2: https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org Changes in RFC v2: - Swapped num_domains / domains 
(Konrad) - Fixed an issue with battery not working on sc8280xp - Added missing configuration for QCS404 --- Dmitry Baryshkov (5): soc: qcom: pdr: protect locator_addr with the main mutex soc: qcom: pdr: fix parsing of domains lists soc: qcom: pdr: extract PDR message marshalling data soc: qcom: add pd-mapper implementation remoteproc: qcom: enable in-kernel PD mapper drivers/remoteproc/qcom_common.c| 87 + drivers/remoteproc/qcom_common.h| 10 + drivers/remoteproc/qcom_q6v5_adsp.c | 3 + drivers/remoteproc/qcom_q6v5_mss.c | 3 + drivers/remoteproc/qcom_q6v5_pas.c | 3 + drivers/remoteproc/qcom_q6v5_wcss.c | 3 + drivers/soc/qcom/Kconfig| 15 + drivers/soc/qcom/Makefile | 2 + drivers/soc/qcom/pdr_interface.c| 17 +- drivers/soc/qcom/pdr_internal.h | 318 ++--- drivers/soc/qcom/qcom_pd_mapper.c | 676 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++ 12 files changed, 1190 insertions(+), 300 deletions(-) --- base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488 change-id: 20240301-qcom-pd-mapper-e12d622d4ad0 Best regards, Tested-by: Neil Armstrong # on SM8550-QRD Tested-by: Neil Armstrong # on SM8550-HDK Tested-by: Neil Armstrong # on SM8650-QRD Thanks, Neil
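The "static per-platform data" approach can be pictured as a table keyed by the board's compatible string. Everything in this sketch (struct layout, domain and service strings, compatibles, pd_find_platform()) is hypothetical; the entry with zero domains mirrors the v6 change that lists platforms needing no mapping, so they can be told apart from unknown ones:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; the real pd-mapper data structures differ. */
struct pd_domain {
	const char *domain;
	const char *service;
};

struct pd_platform {
	const char *compatible;
	const struct pd_domain *domains;
	size_t num_domains;
};

static const struct pd_domain board_a_domains[] = {
	{ "example/adsp/pd_a", "example/service_1" },
	{ "example/adsp/pd_b", "example/service_2" },
};

static const struct pd_platform pd_platforms[] = {
	{ "vendor,board-a", board_a_domains,
	  sizeof(board_a_domains) / sizeof(board_a_domains[0]) },
	/* Known platform that needs no mapping: listed with zero domains. */
	{ "vendor,board-b", NULL, 0 },
};

/* Return the entry for @compatible, or NULL for an unknown platform. */
static const struct pd_platform *pd_find_platform(const char *compatible)
{
	size_t n = sizeof(pd_platforms) / sizeof(pd_platforms[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(pd_platforms[i].compatible, compatible) == 0)
			return &pd_platforms[i];
	}
	return NULL;
}
```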
Re: [PATCH 0/6] ftrace: Minor fixes for sparse and kernel test robot
On Wed, 05 Jun 2024 16:26:44 -0400 Steven Rostedt wrote: > > Received some minor bug reports from the kernel test robot. First I started > cleaning up some of the sparse warnings. There are many more, but most of those changes > would not really help anything; they would just quiet the warnings. > > But the reports from the kernel test robot need to be fixed. All looks good to me. Acked-by: Masami Hiramatsu (Google) Thank you! > > Steven Rostedt (Google) (6): > ftrace: Declare function_trace_op in header to quiet sparse warning > ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU > ftrace: Assign RCU list variable with rcu_assign_ptr() > ftrace: Fix prototypes for ftrace_startup/shutdown_subops() > function_graph: Make fgraph_do_direct static key static > function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not > enabled > > > include/linux/ftrace.h | 3 +++ > kernel/trace/fgraph.c | 4 +++- > kernel/trace/ftrace.c | 4 ++-- > kernel/trace/ftrace_internal.h | 9 +++++++++ > kernel/trace/trace.h | 1 - > 5 files changed, 17 insertions(+), 4 deletions(-) -- Masami Hiramatsu (Google)
[PATCH 0/6] ftrace: Minor fixes for sparse and kernel test robot
Received some minor bug reports from the kernel test robot. First I started cleaning up some of the sparse warnings. There are many more, but most of those changes would not really help anything; they would just quiet the warnings. But the reports from the kernel test robot need to be fixed. Steven Rostedt (Google) (6): ftrace: Declare function_trace_op in header to quiet sparse warning ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU ftrace: Assign RCU list variable with rcu_assign_ptr() ftrace: Fix prototypes for ftrace_startup/shutdown_subops() function_graph: Make fgraph_do_direct static key static function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled include/linux/ftrace.h | 3 +++ kernel/trace/fgraph.c | 4 +++- kernel/trace/ftrace.c | 4 ++-- kernel/trace/ftrace_internal.h | 9 +++++++++ kernel/trace/trace.h | 1 - 5 files changed, 17 insertions(+), 4 deletions(-)
[PATCH v4 07/11] riscv: mm: Take memory hotplug read-lock during kernel page table dump
From: Björn Töpel During memory hot remove, the ptdump functionality can end up touching stale data. Avoid any potential crashes (or worse), by holding the memory hotplug read-lock while traversing the page table. This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for kernel page table dump"). Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Björn Töpel --- arch/riscv/mm/ptdump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c index 1289cc6d3700..9d5f657a251b 100644 --- a/arch/riscv/mm/ptdump.c +++ b/arch/riscv/mm/ptdump.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -370,7 +371,9 @@ bool ptdump_check_wx(void) static int ptdump_show(struct seq_file *m, void *v) { + get_online_mems(); ptdump_walk(m, m->private); + put_online_mems(); return 0; } -- 2.43.0
Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
I've tested this applied on top of kernel 6.8.11 on an X13s over the past week and it's been working well. -- classabbyamp
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Thu, 30 May 2024 16:02:37 +0300 Ilkka Naulapää wrote: > applied your patch and here's the output. > Unfortunately, it doesn't give me any new information. I added one more BUG_ON, want to try this? Otherwise, I'm pretty much at a loss. :-/ -- Steve diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index de5b72216b1a..a090495e78c9 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb) return NULL; ti->flags = 0; + ti->magic = 20240823; return &ti->vfs_inode; } static void tracefs_free_inode(struct inode *inode) { - kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); + struct tracefs_inode *ti = get_tracefs(inode); + + BUG_ON(ti->magic != 20240823); + kmem_cache_free(tracefs_inode_cachep, ti); } static ssize_t default_read_file(struct file *file, char __user *buf, @@ -147,16 +151,6 @@ static const struct inode_operations tracefs_dir_inode_operations = { .rmdir = tracefs_syscall_rmdir, }; -struct inode *tracefs_get_inode(struct super_block *sb) -{ - struct inode *inode = new_inode(sb); - if (inode) { - inode->i_ino = get_next_ino(); - inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode); - } - return inode; -} - struct tracefs_mount_opts { kuid_t uid; kgid_t gid; @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode) return; ti = get_tracefs(inode); + BUG_ON(ti->magic != 20240823); if (ti && ti->flags & TRACEFS_EVENT_INODE) eventfs_set_ef_status_free(dentry); iput(inode); @@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry *dentry) return dentry; } +struct inode *tracefs_get_inode(struct super_block *sb) +{ + struct inode *inode = new_inode(sb); + + BUG_ON(sb->s_op != &tracefs_super_operations); + if (inode) { + inode->i_ino = get_next_ino(); + inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode); + } + return inode; +} + /** * tracefs_create_file - create a file in the tracefs
filesystem * @name: a pointer to a string containing the name of the file to create. diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 69c2b1d87c46..9059b8b11bb6 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -9,12 +9,15 @@ enum { struct tracefs_inode { unsigned long flags; void *private; + unsigned long magic; struct inode vfs_inode; }; static inline struct tracefs_inode *get_tracefs(const struct inode *inode) { - return container_of(inode, struct tracefs_inode, vfs_inode); + struct tracefs_inode *ti = container_of(inode, struct tracefs_inode, vfs_inode); + BUG_ON(ti->magic != 20240823); + return ti; } struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
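The magic-number check in this debug patch can be tried outside the kernel. A minimal userspace sketch, assuming a cut-down struct inode and a container_of() without the kernel's type checking; get_tracefs_checked() and tracefs_inode_init() are hypothetical, and the sketch returns NULL where the patch calls BUG_ON():

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified container_of(); the kernel version adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define TI_MAGIC 20240823UL	/* same value as the debug patch */

struct inode {			/* cut-down stand-in */
	unsigned long i_ino;
};

struct tracefs_inode {
	unsigned long flags;
	void *private;
	unsigned long magic;
	struct inode vfs_inode;
};

/* Mirrors tracefs_alloc_inode() setting ti->magic. */
static void tracefs_inode_init(struct tracefs_inode *ti)
{
	ti->flags = 0;
	ti->private = NULL;
	ti->magic = TI_MAGIC;
}

/* Like get_tracefs(), but reports a bad inode instead of BUG_ON(). */
static struct tracefs_inode *get_tracefs_checked(struct inode *inode)
{
	struct tracefs_inode *ti = container_of(inode, struct tracefs_inode, vfs_inode);

	if (ti->magic != TI_MAGIC) {
		fprintf(stderr, "inode %p is not a tracefs inode\n", (void *)inode);
		return NULL;
	}
	return ti;
}
```

Because magic sits just before the embedded vfs_inode, an inode that did not come from tracefs_alloc_inode() is unlikely to hold 20240823 at that offset, which is what lets the check catch an inode from the wrong cache at the point where it is first misused.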
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Wed, 29 May 2024 14:47:57 -0400 Steven Rostedt wrote: > Let me make a debug patch (that crashes on this issue) for that kernel, > and perhaps you could bisect it? Can you try this on 6.6-rc1 and see if it gives you any other splats? Hmm, you can switch it to WARN_ON and that way it may not crash the machine, and you can use dmesg to get the output. Thanks, -- Steve diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index de5b72216b1a..a090495e78c9 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb) return NULL; ti->flags = 0; + ti->magic = 20240823; return &ti->vfs_inode; } static void tracefs_free_inode(struct inode *inode) { - kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); + struct tracefs_inode *ti = get_tracefs(inode); + + BUG_ON(ti->magic != 20240823); + kmem_cache_free(tracefs_inode_cachep, ti); } static ssize_t default_read_file(struct file *file, char __user *buf, @@ -147,16 +151,6 @@ static const struct inode_operations tracefs_dir_inode_operations = { .rmdir = tracefs_syscall_rmdir, }; -struct inode *tracefs_get_inode(struct super_block *sb) -{ - struct inode *inode = new_inode(sb); - if (inode) { - inode->i_ino = get_next_ino(); - inode->i_atime = inode->i_mtime = inode_set_ctime_current(inode); - } - return inode; -} - struct tracefs_mount_opts { kuid_t uid; kgid_t gid; @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode) return; ti = get_tracefs(inode); + BUG_ON(ti->magic != 20240823); if (ti && ti->flags & TRACEFS_EVENT_INODE) eventfs_set_ef_status_free(dentry); iput(inode); @@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry *dentry) return dentry; } +struct inode *tracefs_get_inode(struct super_block *sb) +{ + struct inode *inode = new_inode(sb); + + BUG_ON(sb->s_op != &tracefs_super_operations); + if (inode) { + inode->i_ino = get_next_ino(); + inode->i_atime = inode->i_mtime = 
inode_set_ctime_current(inode); + } + return inode; +} + /** * tracefs_create_file - create a file in the tracefs filesystem * @name: a pointer to a string containing the name of the file to create. diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 69c2b1d87c46..9f6f303a9e58 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -9,6 +9,7 @@ enum { struct tracefs_inode { unsigned long flags; void *private; + unsigned long magic; struct inode vfs_inode; };
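Switching BUG_ON() to WARN_ON() works because the kernel's WARN_ON() logs a warning and evaluates to the condition, so the caller can bail out while the machine keeps running and dmesg stays readable. A userspace approximation; WARN_ON_SIM, struct obj and free_obj_checked() are hypothetical, and poisoning the magic after a successful free is an addition that is not in the patch:

```c
#include <stdio.h>

#define OBJ_MAGIC 20240823UL

/* Print a warning and return the condition, so callers can bail out. */
static int warn_on_sim(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s:%d: %s\n", file, line, expr);
	return cond;
}
#define WARN_ON_SIM(cond) warn_on_sim(!!(cond), #cond, __FILE__, __LINE__)

struct obj {
	unsigned long magic;
};

/* Return 0 on success, -1 if the object failed the magic check. */
static int free_obj_checked(struct obj *o)
{
	if (WARN_ON_SIM(o->magic != OBJ_MAGIC))
		return -1;	/* skip the free instead of crashing */
	o->magic = 0;		/* poison, so a second free is caught too */
	return 0;
}
```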
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Wed, 29 May 2024 21:36:08 +0300 Ilkka Naulapää wrote: > applied your patch without others, so trace and panic there. > Screenshot attached. Also tested kernels backward and found out that Bah, it's still in an RCU callback, which doesn't tell us why a normal inode is being sent to the trace inode free list. > this trace bug first triggered on 6.6-rc1. Hmm, that's when eventfs was added. > > Let me know if you need more assistance with this. Let me make a debug patch (that crashes on this issue) for that kernel, and perhaps you could bisect it? Thanks! -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Tue, 28 May 2024 07:51:30 +0300
Ilkka Naulapää wrote:

> yeah, the cache_from_obj tracing bug (without panic) has been
> displayed quite some time now - maybe even since 6.7.x or so. I could
> try checking a few versions back for this and try bisecting it if I
> can find when this started.
>

OK, so I don't think the commit your last bisect hit is the cause of the
bug. It added a delay (via RCU) and is causing the real bug to blow up
more.

Can you add this patch to v6.9.2 and hopefully it crashes in a better
location that we can find where the mixup happened.

You may need to add the other commit (too if this doesn't trigger.

Thanks,

-- Steve

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 417c840e6403..7af3f696696d 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -50,6 +50,7 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
 	list_add_rcu(&ti->list, &tracefs_inodes);
 	spin_unlock_irqrestore(&tracefs_inode_lock, flags);

+	ti->magic = 20240823;
 	return &ti->vfs_inode;
 }

@@ -66,6 +67,7 @@ static void tracefs_free_inode(struct inode *inode)
 	struct tracefs_inode *ti = get_tracefs(inode);
 	unsigned long flags;

+	BUG_ON(ti->magic != 20240823);
 	spin_lock_irqsave(&tracefs_inode_lock, flags);
 	list_del_rcu(&ti->list);
 	spin_unlock_irqrestore(&tracefs_inode_lock, flags);
@@ -271,16 +273,6 @@ static const struct inode_operations tracefs_file_inode_operations = {
 	.setattr	= tracefs_setattr,
 };

-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-	struct inode *inode = new_inode(sb);
-	if (inode) {
-		inode->i_ino = get_next_ino();
-		simple_inode_init_ts(inode);
-	}
-	return inode;
-}
-
 struct tracefs_mount_opts {
 	kuid_t uid;
 	kgid_t gid;
@@ -448,6 +440,17 @@ static const struct super_operations tracefs_super_operations = {
 	.show_options	= tracefs_show_options,
 };

+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+	struct inode *inode = new_inode(sb);
+	BUG_ON(sb->s_op != &tracefs_super_operations);
+	if (inode) {
+		inode->i_ino = get_next_ino();
+		simple_inode_init_ts(inode);
+	}
+	return inode;
+}
+
 /*
  * It would be cleaner if eventfs had its own dentry ops.
  *
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index f704d8348357..dda7d2708e30 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -16,6 +16,7 @@ struct tracefs_inode {
 	};
 	/* The below gets initialized with memset_after(ti, 0, vfs_inode) */
 	struct list_head	list;
+	unsigned long		magic;
 	unsigned long		flags;
 	void			*private;
 };
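The `internal.h` comment says that every member after `vfs_inode` is initialized with `memset_after(ti, 0, vfs_inode)`, so a field placed there, like the new `magic`, is presumably only meaningful if it is stamped after that call. Below is a user-space sketch of those zeroing semantics. `MEMSET_AFTER` and `struct demo` are illustrative; the end-of-member offset is computed as `offsetof()` plus the member size (the same quantity the kernel's `offsetofend()` gives), and `__typeof__` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of memset_after(): set every byte that follows @member, up to
 * the end of *@obj, to @v. Bytes of @member and before it are untouched.
 */
#define MEMSET_AFTER(obj, v, member)                                        \
	memset((unsigned char *)(obj) +                                     \
	       offsetof(__typeof__(*(obj)), member) + sizeof((obj)->member), \
	       (v),                                                         \
	       sizeof(*(obj)) - offsetof(__typeof__(*(obj)), member) -      \
	       sizeof((obj)->member))

/* Illustrative layout loosely modelled on struct tracefs_inode. */
struct demo {
	long vfs_inode;		/* stand-in for the embedded struct inode */
	void *list;
	unsigned long magic;
	unsigned long flags;
};
```

Zeroing by byte range rather than field by field keeps the initialization correct when fields such as `magic` are added after `vfs_inode`.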
Re: [PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
On 5/20/24 07:42, Yang Li wrote: > The patch updates the function documentation comment for > rv_en(dis)able_monitor to adhere to the kernel-doc specification. > > Signed-off-by: Yang Li Acked-by: Daniel Bristot de Oliveira Thanks -- Daniel
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
yeah, the cache_from_obj tracing bug (without panic) has been displayed quite some time now - maybe even since 6.7.x or so. I could try checking a few versions back for this and try bisecting it if I can find when this started. --Ilkka On Tue, May 28, 2024 at 1:31 AM Steven Rostedt wrote: > > On Fri, 24 May 2024 12:50:08 +0200 > "Linux regression tracking (Thorsten Leemhuis)" > wrote: > > > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > > > quick display of a kernel trace dump before the shutdown/reboot > > > completed. Starting from version 6.8.10 and continuing into version > > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > > > preventing the shutdown or reboot from completing and leaving the > > > machine stuck. > > You state "Before kernel version 6.8.10, the bug caused ...". Does that > mean that a bug was happening before v6.8.10? But did not cause a panic? > > I just noticed your second screen shot from your report, and it has: > > "cache_from_obj: Wrong slab cache, tracefs_inode_cache but object is from > inode_cache" > > So somehow a tracefs_inode was allocated from the inode_cache and is > now being freed by the tracefs_inode logic? Did this happen before > 6.8.10? If so, this code could just be triggering an issue from an > unrelated bug. > > Thanks, > > -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
I tried 6.10-rc1 and it still ends up in a panic --Ilkka On Tue, May 28, 2024 at 12:44 AM Steven Rostedt wrote: > > On Mon, 27 May 2024 20:14:42 +0200 > Greg KH wrote: > > > On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote: > > > Hi Steven, > > > > > > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the > > > panic inducing commit: > > > > > > 414fb08628143 (tracefs: Reset permissions on remount if permissions are > > > options) > > > > > > I reverted that commit to 6.9.2 and now it only serves the trace but > > > the panic is gone. But I can live with it. > > > > Steven, should we revert that? > > > > Or is there some other change that we should take to resolve this? > > > > Before we revert it (as it may be a bug in mainline), Ilkka, can you > test v6.10-rc1? If it exists there, it will let me know whether or not > I missed something. > > Thanks, > > -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Fri, 24 May 2024 12:50:08 +0200 "Linux regression tracking (Thorsten Leemhuis)" wrote: > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > > quick display of a kernel trace dump before the shutdown/reboot > > completed. Starting from version 6.8.10 and continuing into version > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > > preventing the shutdown or reboot from completing and leaving the > > machine stuck. You state "Before kernel version 6.8.10, the bug caused ...". Does that mean that a bug was happening before v6.8.10? But did not cause a panic? I just noticed your second screen shot from your report, and it has: "cache_from_obj: Wrong slab cache, tracefs_inode_cache but object is from inode_cache" So somehow a tracefs_inode was allocated from the inode_cache and is now being freed by the tracefs_inode logic? Did this happen before 6.8.10? If so, this code could just be triggering an issue from an unrelated bug. Thanks, -- Steve
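The "wrong slab cache" message makes sense once you recall how tracefs maps a `struct inode *` back to its wrapper: `get_tracefs()` is a `container_of()` on the embedded `vfs_inode`, which is only valid if the inode really lives inside a `struct tracefs_inode`. For a plain inode from `inode_cache`, depending on where `vfs_inode` sits in the wrapper, the computed pointer either lands before the object or names memory owned by a different slab cache. The user-space sketch below shows the valid direction of that pattern; `CONTAINER_OF`, `struct wrapper` and `get_wrapper()` are illustrative names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal container_of(): recover the enclosing struct from a member pointer. */
#define CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct inner {
	int ino;
};

/* Layout like the 6.6-era struct tracefs_inode: wrapper fields first. */
struct wrapper {
	unsigned long flags;
	void *private_data;
	struct inner vfs;
};

/* Only valid when @in is embedded in a struct wrapper. */
static struct wrapper *get_wrapper(struct inner *in)
{
	return CONTAINER_OF(in, struct wrapper, vfs);
}
```

Calling `get_wrapper()` on a standalone `struct inner` compiles without complaint but yields a pointer `offsetof(struct wrapper, vfs)` bytes before it; nothing in the type system catches that, which is why the debug patches add runtime checks.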
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Mon, 27 May 2024 20:14:42 +0200 Greg KH wrote: > On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote: > > Hi Steven, > > > > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the > > panic inducing commit: > > > > 414fb08628143 (tracefs: Reset permissions on remount if permissions are > > options) > > > > I reverted that commit to 6.9.2 and now it only serves the trace but > > the panic is gone. But I can live with it. > > Steven, should we revert that? > > Or is there some other change that we should take to resolve this? > Before we revert it (as it may be a bug in mainline), Ilkka, can you test v6.10-rc1? If it exists there, it will let me know whether or not I missed something. Thanks, -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote: > Hi Steven, > > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the > panic inducing commit: > > 414fb08628143 (tracefs: Reset permissions on remount if permissions are > options) > > I reverted that commit to 6.9.2 and now it only serves the trace but > the panic is gone. But I can live with it. Steven, should we revert that? Or is there some other change that we should take to resolve this? thanks, greg k-h
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
Hi Steven, I took some time and bisected the 6.8.9 - 6.8.10 and git gave the panic inducing commit: 414fb08628143 (tracefs: Reset permissions on remount if permissions are options) I reverted that commit to 6.9.2 and now it only serves the trace but the panic is gone. But I can live with it. --Ilkka On Sun, May 26, 2024 at 8:42 PM Ilkka Naulapää wrote: > > hi, > > I took 6.9.2 and applied that 0bcfd9aa4dafa to it. Now the kernel is > serving me both problems; the trace and the panic as the pic shows. > > > To understand this, did you do anything with tracing? Before shutting down, > > is there anything in /sys/kernel/tracing/instances directory? > > Were any of the files/directories permissions in /sys/kernel/tracing > > changed? > > And to answer your question, I did not do any tracing or so and the > /sys/kernel/tracing is empty. > Just plain boot-up, no matter if in full desktop or in bare rescue > mode, ends up the same way. > > --Ilkka > > On Fri, May 24, 2024 at 8:19 PM Steven Rostedt wrote: > > > > On Fri, 24 May 2024 12:50:08 +0200 > > "Linux regression tracking (Thorsten Leemhuis)" > > wrote: > > > > > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > > > > quick display of a kernel trace dump before the shutdown/reboot > > > > completed. Starting from version 6.8.10 and continuing into version > > > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > > > > preventing the shutdown or reboot from completing and leaving the > > > > machine stuck. > > > > Ah, I bet it was this commit: baa23a8d4360d ("tracefs: Reset permissions on > > remount if permissions are options"), which added an "iput" callback to the > > dentry without calling iput, leaving stale inodes around. > > > > This is fixed with: > > > > 0bcfd9aa4dafa ("tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()") > > > > Try adding just that patch. It will at least make it go back to what was > > happening before 6.8.10 (I hope!). > > > > -- Steve
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Fri, 24 May 2024 12:50:08 +0200 "Linux regression tracking (Thorsten Leemhuis)" wrote: > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > > quick display of a kernel trace dump before the shutdown/reboot > > completed. Starting from version 6.8.10 and continuing into version > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > > preventing the shutdown or reboot from completing and leaving the > > machine stuck. Ah, I bet it was this commit: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options"), which added an "iput" callback to the dentry without calling iput, leaving stale inodes around. This is fixed with: 0bcfd9aa4dafa ("tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()") Try adding just that patch. It will at least make it go back to what was happening before 6.8.10 (I hope!). -- Steve
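The failure mode described here follows from a VFS rule: when a dentry has a `d_iput` operation, the VFS calls it instead of `iput()`, so the callback itself must drop the inode reference. The user-space model below shows why forgetting that leaves an inode behind. The structures and `unlink_inode()` are simplified stand-ins for the logic in `fs/dcache.c`, and the reference count is a plain `int`; this is an illustration, not the real code.

```c
#include <assert.h>
#include <stddef.h>

struct inode {
	int refcount;
};

struct dentry;

struct dentry_operations {
	/* If set, called instead of iput(); it must drop the reference itself. */
	void (*d_iput)(struct dentry *, struct inode *);
};

struct dentry {
	const struct dentry_operations *d_op;
	struct inode *d_inode;
};

static void iput(struct inode *inode)
{
	inode->refcount--;
}

/* Simplified model of the VFS detaching an inode from a dentry. */
static void unlink_inode(struct dentry *dentry)
{
	struct inode *inode = dentry->d_inode;

	dentry->d_inode = NULL;
	if (dentry->d_op && dentry->d_op->d_iput)
		dentry->d_op->d_iput(dentry, inode);
	else
		iput(inode);
}

/* Correct callback: does its own bookkeeping, then drops the reference. */
static void good_d_iput(struct dentry *dentry, struct inode *inode)
{
	(void)dentry;
	iput(inode);
}

/* Buggy callback: forgets iput(), so the inode reference is leaked. */
static void leaky_d_iput(struct dentry *dentry, struct inode *inode)
{
	(void)dentry;
	(void)inode;
}
```

With `leaky_d_iput` installed, the inode keeps a reference after the dentry goes away, which is the "stale inodes around" situation.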
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
On Fri, 24 May 2024 12:50:08 +0200 "Linux regression tracking (Thorsten Leemhuis)" wrote: > [CCing a few people] > Thanks for the Cc. > On 24.05.24 12:31, Ilkka Naulapää wrote: > > > > I have encountered a critical bug in the Linux vanilla kernel that > > leads to a kernel panic during the shutdown or reboot process. The > > issue arises after all services, including `journald`, have been > > stopped. As a result, the machine fails to complete the shutdown or > > reboot procedure, effectively causing the system to hang and not shut > > down or reboot. To understand this, did you do anything with tracing? Before shutting down, is there anything in /sys/kernel/tracing/instances directory? Were any of the files/directories permissions in /sys/kernel/tracing changed? > > Thx for the report. Not my area of expertise, so take this with a grain > of salt. But given the versions you mention in your report and the > screenshot that mentioned tracefs_free_inode I suspect this is caused by > baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are > options"). A few fixes for it will soon hit mainline and are meant to be > backported to affected stable trees: > > https://lore.kernel.org/all/20240523212406.254317...@goodmis.org/ > https://lore.kernel.org/all/20240523174419.1e588...@gandalf.local.home/ > > You might want to try them – or recheck once they hit the stable trees > you care about. If they don't work, please report back. There have been quite a few updates in this code, but this looks new to me. I have more fixes that were just pulled by Linus today. https://git.kernel.org/torvalds/c/0eb03c7e8e2a4cc3653eb5eeb2d2001182071215 I'm not sure how relevant that is for this. But if you can reproduce it with that commit, then this is a new bug. -- Steve
Re: How to properly fix reading user pointers in bpf in android kernel 4.9?
[also Cc: bpf maintainers and get_maintainer output]

On Thu, May 23, 2024 at 07:52:22PM +0300, Marcel wrote:
> This seems that it was a long standing problem with the Linux kernel in
> general. bpf_probe_read should have worked for both kernel and user pointers
> but it fails with access error when reading a user one instead.
>
> I know there's a patch upstream that fixes this by introducing new helpers
> for reading kernel and userspace pointers and I tried to backport them
> to my kernel but with no success. Tools like bcc fail to use them and instead
> they report that the arguments sent to the helpers are invalid. I assume this
> is due to the arguments ARG_CONST_STACK_SIZE and ARG_PTR_TO_RAW_STACK handling
> data differently in the 4.9 android version and the upstream version but I'm
> not sure that this is the cause. I left the patch I did below and with a link
> to the kernel I'm working on and maybe someone can take a look and give me a
> hand (the patch isn't applied yet)

What upstream patch? Has it already been in mainline?

>
> <https://github.com/nitanmarcel/android_kernel_oneplus_sdm845-bpf>
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 744b4763b80e..de94c13b7193 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -559,6 +559,43 @@ enum bpf_func_id {
>  	 */
>  	BPF_FUNC_probe_read_user,
>
> +	/**
> +	 * int bpf_probe_read_kernel(void *dst, int size, void *src)
> +	 * Read a kernel pointer safely.
> +	 * Return: 0 on success or negative error
> +	 */
> +	BPF_FUNC_probe_read_kernel,
> +
> +	/**
> +	 * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
> +	 * Copy a NUL terminated string from user unsafe address. In case the string
> +	 * length is smaller than size, the target is not padded with further NUL
> +	 * bytes. In case the string length is larger than size, just count-1
> +	 * bytes are copied and the last byte is set to NUL.
> +	 * @dst: destination address
> +	 * @size: maximum number of bytes to copy, including the trailing NUL
> +	 * @unsafe_ptr: unsafe address
> +	 * Return:
> +	 *   > 0 length of the string including the trailing NUL on success
> +	 *   < 0 error
> +	 */
> +	BPF_FUNC_probe_read_user_str,
> +
> +	/**
> +	 * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
> +	 * Copy a NUL terminated string from unsafe address. In case the string
> +	 * length is smaller than size, the target is not padded with further NUL
> +	 * bytes. In case the string length is larger than size, just count-1
> +	 * bytes are copied and the last byte is set to NUL.
> +	 * @dst: destination address
> +	 * @size: maximum number of bytes to copy, including the trailing NUL
> +	 * @unsafe_ptr: unsafe address
> +	 * Return:
> +	 *   > 0 length of the string including the trailing NUL on success
> +	 *   < 0 error
> +	 */
> +	BPF_FUNC_probe_read_kernel_str,
> +
>  	__BPF_FUNC_MAX_ID,
>  };
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index a1e37a5d8c88..3478ca744a45 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -94,7 +94,7 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
>  	.arg3_type	= ARG_ANYTHING,
>  };
>
> -BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void *, unsafe_ptr)
> +BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void __user *, unsafe_ptr)
>  {
>  	int ret;
>
> @@ -115,6 +115,27 @@ static const struct bpf_func_proto bpf_probe_read_user_proto = {
>  };
>
>
> +BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, unsafe_ptr)
> +{
> +	int ret;
> +
> +	ret = probe_kernel_read(dst, unsafe_ptr, size);
> +	if (unlikely(ret < 0))
> +		memset(dst, 0, size);
> +
> +	return ret;
> +}
> +
> +static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
> +	.func		= bpf_probe_read_kernel,
> +	.gpl_only	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_RAW_STACK,
> +	.arg2_type	= ARG_CONST_STACK_SIZE,
> +	.arg3_type	= ARG_ANYTHING,
> +};
> +
> +
>  BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
>  	   u32, size)
>  {
> @@ -487,6 +508,69 @@ static const struct
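The kernel-doc comments in the backport spell out the string-copy contract: at most `size - 1` bytes of the source are copied, the result is always NUL-terminated, the destination is not padded, and a positive return is the length including the NUL. The user-space sketch below implements that documented contract over an ordinary buffer. `read_str_sketch()` is a hypothetical name, `-1` stands in for a negative error code, and the sketch cannot model the fault handling that makes the real helpers safe on bad kernel or user pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a NUL-terminated string into dst (capacity size) following the
 * documented bpf_probe_read_str() contract. Returns the number of bytes
 * written including the trailing NUL, or -1 if size is 0.
 */
static long read_str_sketch(char *dst, size_t size, const char *src)
{
	size_t i;

	if (size == 0)
		return -1;
	for (i = 0; i + 1 < size && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';		/* always terminate; bytes after it are left alone */
	return (long)(i + 1);
}
```

For a 4-byte buffer and the source `"abcdef"`, only `"abc"` plus the terminator is written and the return value is 4, matching the "count-1 bytes are copied and the last byte is set to NUL" rule.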
Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot
[CCing a few people] On 24.05.24 12:31, Ilkka Naulapää wrote: > > I have encountered a critical bug in the Linux vanilla kernel that > leads to a kernel panic during the shutdown or reboot process. The > issue arises after all services, including `journald`, have been > stopped. As a result, the machine fails to complete the shutdown or > reboot procedure, effectively causing the system to hang and not shut > down or reboot. Thx for the report. Not my area of expertise, so take this with a grain of salt. But given the versions you mention in your report and the screenshot that mentioned tracefs_free_inode I suspect this is caused by baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options"). A few fixes for it will soon hit mainline and are meant to be backported to affected stable trees: https://lore.kernel.org/all/20240523212406.254317...@goodmis.org/ https://lore.kernel.org/all/20240523174419.1e588...@gandalf.local.home/ You might want to try them – or recheck once they hit the stable trees you care about. If they don't work, please report back. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. > Here are the details of the issue: > > - Affected Versions: Before kernel version 6.8.10, the bug caused a > quick display of a kernel trace dump before the shutdown/reboot > completed. Starting from version 6.8.10 and continuing into version > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic, > preventing the shutdown or reboot from completing and leaving the > machine stuck. > > - Symptoms: > - In normal shutdown/reboot scenarios, the kernel trace dump briefly > appears as the last message on the screen. > - In rescue mode, the kernel panic message is displayed. Normally it > is not shown.
> > Since `journald` is stopped before this issue occurs, no textual logs > are available. However, I have captured two pictures illustrating > these related issues, which I am attaching to this email for your > reference. Also added my custom kernel config. > > Thank you for your attention to this matter. Please let me know if any > additional information is required to assist in diagnosing and > resolving this bug. > > Best regards, > > Ilkka Naulapää
[PATCH v3 6/9] riscv: mm: Take memory hotplug read-lock during kernel page table dump
From: Björn Töpel During memory hot remove, the ptdump functionality can end up touching stale data. Avoid any potential crashes (or worse), by holding the memory hotplug read-lock while traversing the page table. This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for kernel page table dump"). Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Signed-off-by: Björn Töpel --- arch/riscv/mm/ptdump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c index 1289cc6d3700..9d5f657a251b 100644 --- a/arch/riscv/mm/ptdump.c +++ b/arch/riscv/mm/ptdump.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -370,7 +371,9 @@ bool ptdump_check_wx(void) static int ptdump_show(struct seq_file *m, void *v) { + get_online_mems(); ptdump_walk(m, m->private); + put_online_mems(); return 0; } -- 2.40.1
[PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
The patch updates the function documentation comment for rv_en(dis)able_monitor to adhere to the kernel-doc specification. Signed-off-by: Yang Li --- kernel/trace/rv/rv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c index 2f68e93fff0b..df0745a42a3f 100644 --- a/kernel/trace/rv/rv.c +++ b/kernel/trace/rv/rv.c @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def *mdef, bool sync) /** * rv_disable_monitor - disable a given runtime monitor + * @mdef: Pointer to the monitor definition structure. * * Returns 0 on success. */ @@ -256,6 +257,7 @@ int rv_disable_monitor(struct rv_monitor_def *mdef) /** * rv_enable_monitor - enable a given runtime monitor + * @mdef: Pointer to the monitor definition structure. * * Returns 0 on success, error otherwise. */ -- 2.20.1.7.g153144c
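For reference, kernel-doc expects every parameter of a documented function to have an `@name:` line after the one-line summary, which is what the patch adds for `@mdef`; the kernel-doc guide also describes a dedicated `Return:` section for the return value. A complete comment in that style, on a hypothetical `demo_enable()` helper that is not part of `rv.c`, might look like this:

```c
#include <assert.h>
#include <stddef.h>

/**
 * demo_enable - enable a demo monitor (hypothetical example)
 * @enabled: pointer to the flag to set; must not be NULL.
 *
 * Return: 0 on success, -1 if @enabled is NULL.
 */
static int demo_enable(int *enabled)
{
	if (!enabled)
		return -1;
	*enabled = 1;
	return 0;
}
```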
Re: [PATCH] kernel: trace: preemptirq_delay_test: add MODULE_DESCRIPTION()
On Sat, 18 May 2024 15:54:49 -0700 Jeff Johnson wrote: > Fix the 'make W=1' warning: > > WARNING: modpost: missing MODULE_DESCRIPTION() in > kernel/trace/preemptirq_delay_test.o > Looks good to me. Acked-by: Masami Hiramatsu (Google) Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency tracers") Thanks, > Signed-off-by: Jeff Johnson > --- > kernel/trace/preemptirq_delay_test.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/trace/preemptirq_delay_test.c > b/kernel/trace/preemptirq_delay_test.c > index 8c4ffd076162..cb0871fbdb07 100644 > --- a/kernel/trace/preemptirq_delay_test.c > +++ b/kernel/trace/preemptirq_delay_test.c > @@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void) > > module_init(preemptirq_delay_init) > module_exit(preemptirq_delay_exit) > +MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency > tracers"); > MODULE_LICENSE("GPL v2"); > > --- > base-commit: 674143feb6a8c02d899e64e2ba0f992896afd532 > change-id: 20240518-md-preemptirq_delay_test-552cd20e7b0b > -- Masami Hiramatsu (Google)
[PATCH] kernel: trace: preemptirq_delay_test: add MODULE_DESCRIPTION()
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/trace/preemptirq_delay_test.o Signed-off-by: Jeff Johnson --- kernel/trace/preemptirq_delay_test.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index 8c4ffd076162..cb0871fbdb07 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void) module_init(preemptirq_delay_init) module_exit(preemptirq_delay_exit) +MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency tracers"); MODULE_LICENSE("GPL v2"); --- base-commit: 674143feb6a8c02d899e64e2ba0f992896afd532 change-id: 20240518-md-preemptirq_delay_test-552cd20e7b0b
Re: [PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
Hi Yang On 5/17/24 11:14, Yang Li wrote: > The patch updates the function documentation comment for > rv_en(dis)able_monitor to adhere to the kernel-doc specification. > > Signed-off-by: Yang Li > --- > kernel/trace/rv/rv.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c > index 2f68e93fff0b..df0745a42a3f 100644 > --- a/kernel/trace/rv/rv.c > +++ b/kernel/trace/rv/rv.c > @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def > *mdef, bool sync) > > /** > * rv_disable_monitor - disable a given runtime monitor > + * @mdef: Pointer to the monitor definition structure. This change is in for mainline kernel, why are you using the -next on the Subject? -- Daniel
[PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc
The patch updates the function documentation comment for rv_en(dis)able_monitor to adhere to the kernel-doc specification. Signed-off-by: Yang Li --- kernel/trace/rv/rv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c index 2f68e93fff0b..df0745a42a3f 100644 --- a/kernel/trace/rv/rv.c +++ b/kernel/trace/rv/rv.c @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def *mdef, bool sync) /** * rv_disable_monitor - disable a given runtime monitor + * @mdef: Pointer to the monitor definition structure. * * Returns 0 on success. */ @@ -256,6 +257,7 @@ int rv_disable_monitor(struct rv_monitor_def *mdef) /** * rv_enable_monitor - enable a given runtime monitor + * @mdef: Pointer to the monitor definition structure. * * Returns 0 on success, error otherwise. */ -- 2.20.1.7.g153144c
[PATCH v3 5/6] kbuild: generate modules.builtin.ranges when linking the kernel
Signed-off-by: Kris Van Hees Reviewed-by: Nick Alcock --- Changes since v2: - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo - Use $(real-prereqs) rather than $(filter-out ...) --- scripts/Makefile.vmlinux | 16 1 file changed, 16 insertions(+) diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux index c9f3e03124d7f..afe8287e8dda0 100644 --- a/scripts/Makefile.vmlinux +++ b/scripts/Makefile.vmlinux @@ -36,6 +36,22 @@ targets += vmlinux vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE +$(call if_changed_dep,link_vmlinux) +# module.builtin.ranges +# --- +ifdef CONFIG_BUILTIN_MODULE_RANGES +__default: modules.builtin.ranges + +quiet_cmd_modules_builtin_ranges = GEN $@ + cmd_modules_builtin_ranges = \ + $(srctree)/scripts/generate_builtin_ranges.awk $(real-prereqs) > $@ + +vmlinux.map: vmlinux + +targets += modules.builtin.ranges +modules.builtin.ranges: modules.builtin.modinfo vmlinux.map vmlinux.o.map FORCE + $(call if_changed,modules_builtin_ranges) +endif + # Add FORCE to the prequisites of a target to force it to be always rebuilt. # --- -- 2.43.0
Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump
On Tue, May 14, 2024 at 04:04:43PM +0200, Björn Töpel wrote: > From: Björn Töpel > > During memory hot remove, the ptdump functionality can end up touching > stale data. Avoid any potential crashes (or worse), by holding the > memory hotplug read-lock while traversing the page table. > > This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm: > Hold memory hotplug lock while walking for kernel page table dump"). > > Signed-off-by: Björn Töpel Reviewed-by: Oscar Salvador funny enough, it seems arm64 and riscv are the only ones holding the hotplug lock here. I think we have the same problem on the other arches as well (at least on x86_64 that I can see). If we happen to finally need the lock in those, I would rather have a centric function in the generic mm code with the locking and then calling an arch specific ptdump_show function, so the lock is not scattered. But that is another story. -- Oscar Salvador SUSE Labs
Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump
On 14.05.24 16:04, Björn Töpel wrote: From: Björn Töpel During memory hot remove, the ptdump functionality can end up touching stale data. Avoid any potential crashes (or worse), by holding the memory hotplug read-lock while traversing the page table. This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for kernel page table dump"). Signed-off-by: Björn Töpel --- arch/riscv/mm/ptdump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c index 1289cc6d3700..9d5f657a251b 100644 --- a/arch/riscv/mm/ptdump.c +++ b/arch/riscv/mm/ptdump.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -370,7 +371,9 @@ bool ptdump_check_wx(void) static int ptdump_show(struct seq_file *m, void *v) { + get_online_mems(); ptdump_walk(m, m->private); + put_online_mems(); return 0; } Reviewed-by: David Hildenbrand -- Cheers, David / dhildenb
[PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump
From: Björn Töpel During memory hot remove, the ptdump functionality can end up touching stale data. Avoid any potential crashes (or worse), by holding the memory hotplug read-lock while traversing the page table. This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for kernel page table dump"). Signed-off-by: Björn Töpel --- arch/riscv/mm/ptdump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c index 1289cc6d3700..9d5f657a251b 100644 --- a/arch/riscv/mm/ptdump.c +++ b/arch/riscv/mm/ptdump.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -370,7 +371,9 @@ bool ptdump_check_wx(void) static int ptdump_show(struct seq_file *m, void *v) { + get_online_mems(); ptdump_walk(m, m->private); + put_online_mems(); return 0; } -- 2.40.1
Re: [PATCH 2/3] kernel/pid: Remove default pid_max value
On Mon, Apr 08, 2024 at 04:58:18PM GMT, Michal Koutný wrote: > The kernel provides mechanisms, while it should not imply policies -- > default pid_max seems to be an example of the policy that does not fit > all. At the same time pid_max must have some value assigned, so use the > end of the allowed range -- pid_max_max. > > This change thus increases initial pid_max from 32k to 4M (x86_64 > defconfig). Out of curiosity I dug out the commit acdc721fe26d ("[PATCH] pid-max-2.5.33-A0") v2.5.34~5 that introduced the 32k default. The commit message doesn't say why such a sudden change though. Previously, the limit was 1G of pids (i.e. effectively no default limit like the intention of this series). Honestly, I expected more enthusiasm or reasons against removing the default value of pid_max. Is this really not of interest to anyone? (Thanks, Andrew, for your responses. I don't plan to pursue this further should there be no more interest in having less default limit values in kernel.) Regards, Michal
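The 32k and 4M figures follow from the limits in `include/linux/threads.h`, where `PID_MAX_DEFAULT` is `0x8000` and `PID_MAX_LIMIT` is `4 * 1024 * 1024` on builds with a 64-bit `long` and without `CONFIG_BASE_SMALL`. The sketch below restates that arithmetic as plain functions; the `base_small` and `long_size` parameters and the fixed 4096-byte page are simplifying assumptions, not kernel code.

```c
#include <assert.h>

/* Assumed page size for the CONFIG_BASE_SMALL branch. */
#define PAGE_SIZE_SKETCH 4096L

/* Mirrors the shape of PID_MAX_DEFAULT in include/linux/threads.h. */
static long pid_max_default(int base_small)
{
	return base_small ? 0x1000 : 0x8000;
}

/* Mirrors the shape of PID_MAX_LIMIT (what pid_max_max is set to). */
static long pid_max_limit(int base_small, int long_size)
{
	if (base_small)
		return PAGE_SIZE_SKETCH * 8;
	return long_size > 4 ? 4L * 1024 * 1024 : pid_max_default(base_small);
}
```

So on x86_64 defconfig the series moves the initial value from 32768 to 4194304, while a 32-bit build would see no change because its limit equals the default.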
Re: [PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel
On Sun, May 12, 2024 at 7:44 AM Kris Van Hees wrote: > > Signed-off-by: Kris Van Hees > Reviewed-by: Nick Alcock > --- > Changes since v1: > - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES > --- > scripts/Makefile.vmlinux | 17 + > 1 file changed, 17 insertions(+) > > diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux > index c9f3e03124d7f..54095d72f7fd7 100644 > --- a/scripts/Makefile.vmlinux > +++ b/scripts/Makefile.vmlinux > @@ -36,6 +36,23 @@ targets += vmlinux > vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE > +$(call if_changed_dep,link_vmlinux) > > +# module.builtin.ranges > +# --- > +ifdef CONFIG_BUILTIN_MODULE_RANGES > +__default: modules.builtin.ranges > + > +quiet_cmd_modules_builtin_ranges = GEN $@ > + cmd_modules_builtin_ranges = \ > + $(srctree)/scripts/generate_builtin_ranges.awk \ > + $(filter-out FORCE,$+) > $@ $(filter-out FORCE,$+) -> $(real-prereqs) > + > +vmlinux.map: vmlinux > + > +targets += modules.builtin.ranges > +modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE > + $(call if_changed,modules_builtin_ranges) > +endif > + > # Add FORCE to the prequisites of a target to force it to be always rebuilt. > # --- > > -- > 2.43.0 > > -- Best Regards Masahiro Yamada
Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
On Sat, May 11, 2024 at 4:56 PM Dmitry Baryshkov wrote:
>
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
>
> The existing userspace implementation has several issues. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide in-kernel service implementing static
> per-platform data.
>
> To: Bjorn Andersson
> To: Konrad Dybcio
> To: Sibi Sankar
> To: Mathieu Poirier
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold
> Cc: Xilin Wu
> Cc: "Bryan O'Donoghue"
> Cc: Steev Klimaszewski
> Cc: Alexey Minnekhanov
>
> --
>
> Changes in v8:
> - Reworked pd-mapper to register as an rproc_subdev / auxdev
> - Dropped Tested-by from Steev and Alexey from the last patch since the
>   implementation was changed significantly.
> - Add sensors, cdsp and mpss_root domains to 660 config (Alexey
>   Minnekhanov)
> - Added platform entry for sm4250 (used for qrb4210 / RB2)
> - Added locking to the pdr_get_domain_list() (Chris Lew)
> - Remove the call to qmi_del_server() and corresponding API (Chris Lew)
> - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
> - Link to v7:
>   https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6:
>   https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5:
>   https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris
>   Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4:
>   https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3:
>   https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2:
>   https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (5):
>       soc: qcom: pdr: protect locator_addr with the main mutex
>       soc: qcom: pdr: fix parsing of domains lists
>       soc: qcom: pdr: extract PDR message marshalling data
>       soc: qcom: add pd-mapper implementation
>       remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/qcom_common.c    |  87 +
>  drivers/remoteproc/qcom_common.h    |  10 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
>  drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
>  drivers/soc/qcom/Kconfig            |  15 +
>  drivers/soc/qcom/Makefile           |   2 +
>  drivers/soc/qcom/pdr_interface.c    |  17 +-
>  drivers/soc/qcom/pdr_internal.h     | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 676
>  drivers/soc/qcom/qcom_pdr_msg.c     | 353 +++
>  12 files changed, 1190 insertions(+), 300 deletions(-)
> ---
> base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,
> --
> Dmitry Baryshkov
>

I've tested this over the weekend on my ThinkPad X13s with a number of
reboots, and it seems to do the correct thing in v8 as well.

Tested-by: Steev Klimaszewski
[PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel
Signed-off-by: Kris Van Hees
Reviewed-by: Nick Alcock
---
Changes since v1:
- Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
---
 scripts/Makefile.vmlinux | 17 +
 1 file changed, 17 insertions(+)

diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index c9f3e03124d7f..54095d72f7fd7 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -36,6 +36,23 @@ targets += vmlinux
 vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
 	+$(call if_changed_dep,link_vmlinux)
 
+# module.builtin.ranges
+# ---
+ifdef CONFIG_BUILTIN_MODULE_RANGES
+__default: modules.builtin.ranges
+
+quiet_cmd_modules_builtin_ranges = GEN $@
+      cmd_modules_builtin_ranges = \
+	$(srctree)/scripts/generate_builtin_ranges.awk \
+	$(filter-out FORCE,$+) > $@
+
+vmlinux.map: vmlinux
+
+targets += modules.builtin.ranges
+modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE
+	$(call if_changed,modules_builtin_ranges)
+endif
+
 # Add FORCE to the prequisites of a target to force it to be always rebuilt.
 # ---
 
-- 
2.43.0
[PATCH v8 5/5] remoteproc: qcom: enable in-kernel PD mapper
Request in-kernel protection domain mapper to be started before starting Qualcomm DSP and release it once DSP is stopped. Once all DSPs are stopped, the PD mapper will be stopped too. Signed-off-by: Dmitry Baryshkov --- drivers/remoteproc/qcom_common.c| 87 + drivers/remoteproc/qcom_common.h| 10 + drivers/remoteproc/qcom_q6v5_adsp.c | 3 ++ drivers/remoteproc/qcom_q6v5_mss.c | 3 ++ drivers/remoteproc/qcom_q6v5_pas.c | 3 ++ drivers/remoteproc/qcom_q6v5_wcss.c | 3 ++ 6 files changed, 109 insertions(+) diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c index 03e5f5d533eb..8c8688f99f0a 100644 --- a/drivers/remoteproc/qcom_common.c +++ b/drivers/remoteproc/qcom_common.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -25,6 +26,7 @@ #define to_glink_subdev(d) container_of(d, struct qcom_rproc_glink, subdev) #define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev) #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev) +#define to_pdm_subdev(d) container_of(d, struct qcom_rproc_pdm, subdev) #define MAX_NUM_OF_SS 10 #define MAX_REGION_NAME_LENGTH 16 @@ -519,5 +521,90 @@ void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr) } EXPORT_SYMBOL_GPL(qcom_remove_ssr_subdev); +static void pdm_dev_release(struct device *dev) +{ + struct auxiliary_device *adev = to_auxiliary_dev(dev); + + kfree(adev); +} + +static int pdm_notify_prepare(struct rproc_subdev *subdev) +{ + struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev); + struct auxiliary_device *adev; + int ret; + + adev = kzalloc(sizeof(*adev), GFP_KERNEL); + if (!adev) + return -ENOMEM; + + adev->dev.parent = pdm->dev; + adev->dev.release = pdm_dev_release; + adev->name = "pd-mapper"; + adev->id = pdm->index; + + ret = auxiliary_device_init(adev); + if (ret) { + kfree(adev); + return ret; + } + + ret = auxiliary_device_add(adev); + if (ret) { + auxiliary_device_uninit(adev); + return ret; + } + + pdm->adev = 
adev; + + return 0; +} + + +static void pdm_notify_unprepare(struct rproc_subdev *subdev) +{ + struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev); + + if (!pdm->adev) + return; + + auxiliary_device_delete(pdm->adev); + auxiliary_device_uninit(pdm->adev); + pdm->adev = NULL; +} + +/** + * qcom_add_pdm_subdev() - register PD Mapper subdevice + * @rproc: rproc handle + * @pdm: PDM subdevice handle + * + * Register @pdm so that Protection Device mapper service is started when the + * DSP is started too. + */ +void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm) +{ + pdm->dev = &rproc->dev; + pdm->index = rproc->index; + + pdm->subdev.prepare = pdm_notify_prepare; + pdm->subdev.unprepare = pdm_notify_unprepare; + + rproc_add_subdev(rproc, &pdm->subdev); +} +EXPORT_SYMBOL_GPL(qcom_add_pdm_subdev); + +/** + * qcom_remove_pdm_subdev() - remove PD Mapper subdevice + * @rproc: rproc handle + * @pdm: PDM subdevice handle + * + * Remove the PD Mapper subdevice. + */ +void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm) +{ + rproc_remove_subdev(rproc, &pdm->subdev); +} +EXPORT_SYMBOL_GPL(qcom_remove_pdm_subdev); + MODULE_DESCRIPTION("Qualcomm Remoteproc helper driver"); MODULE_LICENSE("GPL v2"); diff --git a/drivers/remoteproc/qcom_common.h b/drivers/remoteproc/qcom_common.h index 9ef4449052a9..b07fbaa091a0 100644 --- a/drivers/remoteproc/qcom_common.h +++ b/drivers/remoteproc/qcom_common.h @@ -34,6 +34,13 @@ struct qcom_rproc_ssr { struct qcom_ssr_subsystem *info; }; +struct qcom_rproc_pdm { + struct rproc_subdev subdev; + struct device *dev; + int index; + struct auxiliary_device *adev; +}; + void qcom_minidump(struct rproc *rproc, unsigned int minidump_id, void (*rproc_dumpfn_t)(struct rproc *rproc, struct rproc_dump_segment *segment, void *dest, size_t offset, @@ -52,6 +59,9 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr, const char *ssr_name); void qcom_remove_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr); +void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm); +void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm); + #if IS_ENABLED(CONFIG_QCOM_SYSMON) struct qcom_sysmon *qcom_add_sysmon_subdev(struct rproc *rproc, const char *name, diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/d
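The error handling in pdm_notify_prepare() follows the auxiliary bus ownership rule: if auxiliary_device_init() fails, the caller still owns the allocation and frees it with kfree(); once init has succeeded, the device's reference count owns it, so a later failure (or teardown) goes through auxiliary_device_uninit(), and the memory is released by pdm_dev_release(). The userspace model below illustrates that rule under stated assumptions: `struct dev_sketch` and the `sketch_*` helpers are hypothetical stand-ins for `struct device`, device_initialize() and put_device(), not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device's refcount + release hook. */
struct dev_sketch {
	int refcount;
	void (*release)(struct dev_sketch *dev);
};

static int release_calls;	/* counts release() invocations */

/* Plays the role of pdm_dev_release(): the only place the object is freed. */
static void sketch_release(struct dev_sketch *dev)
{
	release_calls++;
	free(dev);
}

/* Like device_initialize() inside auxiliary_device_init(): refcount = 1. */
static void sketch_init(struct dev_sketch *dev)
{
	dev->refcount = 1;
}

/* Like put_device() inside auxiliary_device_uninit(): last put frees. */
static void sketch_put(struct dev_sketch *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

/*
 * Mirrors the error paths of pdm_notify_prepare(): add_fails simulates
 * auxiliary_device_add() failing after a successful init.
 * Returns the device on success, NULL on failure.
 */
static struct dev_sketch *sketch_prepare(int add_fails)
{
	struct dev_sketch *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->release = sketch_release;
	sketch_init(dev);
	if (add_fails) {
		/* No free() here: dropping the reference frees via release. */
		sketch_put(dev);
		return NULL;
	}
	return dev;
}
```

On the teardown side, pdm_notify_unprepare() calls auxiliary_device_delete() and then auxiliary_device_uninit(); in this model that final step corresponds to one more sketch_put(), which triggers the release callback.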
[PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation
Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However, this configuration is largely static and common between
different platforms. Provide an in-kernel service implementing static
per-platform data.

To: Bjorn Andersson
To: Konrad Dybcio
To: Sibi Sankar
To: Mathieu Poirier
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold
Cc: Xilin Wu
Cc: "Bryan O'Donoghue"
Cc: Steev Klimaszewski
Cc: Alexey Minnekhanov

--

Changes in v8:
- Reworked pd-mapper to register as an rproc_subdev / auxdev
- Dropped Tested-by from Steev and Alexey from the last patch since the
  implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey
  Minnekhanov)
- Added platform entry for sm4250 (used for qrb4210 / RB2)
- Added locking to the pdr_get_domain_list() (Chris Lew)
- Remove the call to qmi_del_server() and corresponding API (Chris Lew)
- In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
- Link to v7:
  https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org

Changes in v7:
- Fixed modular build (Steev)
- Link to v6:
  https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5:
  https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris
  Lew)
- pd_mapper: reworked to provide static configuration per platform
  (Bjorn)
- Link to v4:
  https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3:
  https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2:
  https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (5):
      soc: qcom: pdr: protect locator_addr with the main mutex
      soc: qcom: pdr: fix parsing of domains lists
      soc: qcom: pdr: extract PDR message marshalling data
      soc: qcom: add pd-mapper implementation
      remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/qcom_common.c    |  87 +
 drivers/remoteproc/qcom_common.h    |  10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
 drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
 drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
 drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
 drivers/soc/qcom/Kconfig            |  15 +
 drivers/soc/qcom/Makefile           |   2 +
 drivers/soc/qcom/pdr_interface.c    |  17 +-
 drivers/soc/qcom/pdr_internal.h     | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 676
 drivers/soc/qcom/qcom_pdr_msg.c     | 353 +++
 12 files changed, 1190 insertions(+), 300 deletions(-)
---
base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,
--
Dmitry Baryshkov
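The cover letter's core argument is that the service-to-domain mapping is static per platform, so it can live in a table compiled into the kernel instead of in JSON files read by a userspace daemon. A minimal sketch of such a table and a lookup by service name follows; the struct layout, the function name and the service/domain strings are illustrative assumptions, not the contents of qcom_pd_mapper.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One service and the protection domain that hosts it (illustrative). */
struct pd_entry {
	const char *service;
	const char *domain;
};

/* Hypothetical static table for one platform; names are examples only. */
static const struct pd_entry example_platform[] = {
	{ "avs/audio",   "msm/adsp/audio_pd" },
	{ "tms/servreg", "msm/adsp/audio_pd" },
	{ "tms/servreg", "msm/modem/root_pd" },
};

/*
 * Collect up to max domains that provide service: the kind of answer a
 * "get domain list" request needs. Returns how many were stored in out.
 */
static size_t pd_lookup(const struct pd_entry *tbl, size_t n,
			const char *service, const char **out, size_t max)
{
	size_t found = 0;

	assert(service != NULL);
	for (size_t i = 0; i < n && found < max; i++)
		if (strcmp(tbl[i].service, service) == 0)
			out[found++] = tbl[i].domain;
	return found;
}
```

Because the table is part of the kernel image, it is available as soon as the remote processor starts, which sidesteps the firmware-location and CONFIG_EXTRA_FIRMWARE problems listed for the userspace implementation.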
Re: kernel BUG in ptr_stale
On Thu, May 09, 2024 at 02:26:24PM +0800, Ubisectech Sirius wrote:
> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec.
> Recently, our team has discovered an issue in Linux kernel 6.7. Attached to
> the email is a PoC file of the issue.

This (and several of your others) are fixed in Linus's tree.

>
> Stack dump:
>
> bcachefs (loop1): mounting version 1.7: (unknown version)
> opts=metadata_checksum=none,data_checksum=none,nojournal_transaction_names
> ----[ cut here ]
> kernel BUG at fs/bcachefs/buckets.h:114!
> invalid opcode: [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 9472 Comm: syz-executor.1 Not tainted 6.7.0 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
> 04/01/2014
> RIP: 0010:bucket_gen fs/bcachefs/buckets.h:114 [inline]
> RIP: 0010:ptr_stale+0x474/0x4e0 fs/bcachefs/buckets.h:188
> Code: 48 c7 c2 80 8c 1b 8b be 67 00 00 00 48 c7 c7 e0 8c 1b 8b c6 05 ea a6 72
> 0b 01 e8 57 55 9c fd e9 fb fc ff ff e8 9d 02 bd fd 90 <0f> 0b 48 89 04 24 e8
> 31 bb 13 fe 48 8b 04 24 e9 35 fc ff ff e8 23
> RSP: 0018:c90007c4ec38 EFLAGS: 00010246
> RAX: 0004 RBX: 0080 RCX: c90002679000
> RDX: 0004 RSI: 83ccf3b3 RDI: 0006
> RBP: R08: 0006 R09: 1028
> R10: 0080 R11: R12: 1028
> R13: 88804dee5100 R14: R15: 88805b1a4110
> FS: 7f79ba8ab640() GS:88807ec0() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 7f0bbda3f000 CR3: 5f37a000 CR4: 00750ef0
> DR0: DR1: DR2:
> DR3: DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
>
>  bch2_bkey_ptrs_to_text+0xb4e/0x1760 fs/bcachefs/extents.c:1012
>  bch2_btree_ptr_v2_to_text+0x288/0x330 fs/bcachefs/extents.c:215
>  bch2_val_to_text fs/bcachefs/bkey_methods.c:287 [inline]
>  bch2_bkey_val_to_text+0x1c8/0x210 fs/bcachefs/bkey_methods.c:297
>  journal_validate_key+0x7ab/0xb50 fs/bcachefs/journal_io.c:322
>  journal_entry_btree_root_validate+0x31c/0x380 fs/bcachefs/journal_io.c:411
>  bch2_journal_entry_validate+0xc7/0x130 fs/bcachefs/journal_io.c:752
>  bch2_sb_clean_validate_late+0x14b/0x1e0 fs/bcachefs/sb-clean.c:32
>  bch2_read_superblock_clean+0xbb/0x250 fs/bcachefs/sb-clean.c:160
>  bch2_fs_recovery+0x113/0x52d0 fs/bcachefs/recovery.c:691
>  bch2_fs_start+0x365/0x5e0 fs/bcachefs/super.c:978
>  bch2_fs_open+0x1ac9/0x3890 fs/bcachefs/super.c:1968
>  bch2_mount+0x538/0x13c0 fs/bcachefs/fs.c:1863
>  legacy_get_tree+0x109/0x220 fs/fs_context.c:662
>  vfs_get_tree+0x93/0x380 fs/super.c:1771
>  do_new_mount fs/namespace.c:3337 [inline]
>  path_mount+0x679/0x1e40 fs/namespace.c:3664
>  do_mount fs/namespace.c:3677 [inline]
>  __do_sys_mount fs/namespace.c:3886 [inline]
>  __se_sys_mount fs/namespace.c:3863 [inline]
>  __x64_sys_mount+0x287/0x310 fs/namespace.c:3863
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f79b9a91b3e
> Code: 48 c7 c0 ff ff ff ff eb aa e8 be 0d 00 00 66 2e 0f 1f 84 00 00 00 00 00
> 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73
> 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f79ba8aae38 EFLAGS: 0202 ORIG_RAX: 00a5
> RAX: ffda RBX: 000119f4 RCX: 7f79b9a91b3e
> RDX: 20011a00 RSI: 20011a40 RDI: 7f79ba8aae90
> RBP: 7f79ba8aaed0 R08: 7f79ba8aaed0 R09: 0181c050
> R10: 0181c050 R11: 0202 R12: 20011a00
> R13: 20011a40 R14: 7f79ba8aae90 R15: 21c0
>
> Modules linked in:
> ---[ end trace ]---
>
>
> Thank you for taking the time to read this email and we look forward to
> working with you further.
[PATCH] tracing: Fix trace_pid_list_free() kernel-doc
make C=1 reports:

kernel/trace/pid_list.c:458: warning: Function parameter or struct member 'pid_list' not described in 'trace_pid_list_free'

Add the missing parameter to the trace_pid_list_free() kernel-doc.

Signed-off-by: Jeff Johnson
---
 kernel/trace/pid_list.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/pid_list.c b/kernel/trace/pid_list.c
index 95106d02b32d..19b271a12c99 100644
--- a/kernel/trace/pid_list.c
+++ b/kernel/trace/pid_list.c
@@ -451,6 +451,7 @@ struct trace_pid_list *trace_pid_list_alloc(void)
 
 /**
  * trace_pid_list_free - Frees an allocated pid_list.
+ * @pid_list: The pid list to free.
  *
  * Frees the memory for a pid_list that was allocated.
  */
---
base-commit: dd5a440a31fae6e459c0d627162825505361
change-id: 20240506-trace_pid_list_free-kdoc-e2bf15be84ee
Re: [PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO
On 4/26/2024 10:39 PM, Hou Tao wrote:
> From: Hou Tao
>
> When trying to insert a 10MB kernel module kept in a virtio-fs with cache
> disabled, the following warning was reported:
>
>   [ cut here ]
>   WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
>   Modules linked in:
>   CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
>   RIP: 0010:__alloc_pages+0x2bf/0x380
>   ..
>   Call Trace:
>    ? __warn+0x8e/0x150
>    ? __alloc_pages+0x2bf/0x380
>    __kmalloc_large_node+0x86/0x160
>    __kmalloc+0x33c/0x480
>    virtio_fs_enqueue_req+0x240/0x6d0
>    virtio_fs_wake_pending_and_unlock+0x7f/0x190
>    queue_request_and_unlock+0x55/0x60
>    fuse_simple_request+0x152/0x2b0
>    fuse_direct_io+0x5d2/0x8c0
>    fuse_file_read_iter+0x121/0x160
>    __kernel_read+0x151/0x2d0
>    kernel_read+0x45/0x50
>    kernel_read_file+0x1a9/0x2a0
>    init_module_from_file+0x6a/0xe0
>    idempotent_init_module+0x175/0x230
>    __x64_sys_finit_module+0x5d/0xb0
>    x64_sys_call+0x1c3/0x9e0
>    do_syscall_64+0x3d/0xc0
>    entry_SYSCALL_64_after_hwframe+0x4b/0x53
>    ..
>
>   ---[ end trace ]---
>
> The warning is triggered as follows:
> SNIP
> @@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct
> iov_iter *iter,
>  		size_t nbytes = min(count, nmax);
>
>  		err = fuse_get_user_pages(&ia->ap, iter, &nbytes, write,
> -					  max_pages);
> +					  max_pages, fc->use_pages_for_kvec_io);
>  		if (err && !nbytes)
>  			break;

I just found out that flush_kernel_vmap_range() should be called before
the DMA read/write operation, and invalidate_kernel_vmap_range() after
the DMA read operation, if the kvec I/O buffer is backed by a vmalloc()
area. I will update it in v4.
>
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f239196103137..d4f04e19058c1 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -860,6 +860,9 @@ struct fuse_conn {
>  	/** Passthrough support for read/write IO */
>  	unsigned int passthrough:1;
>
> +	/* Use pages instead of pointer for kernel I/O */
> +	unsigned int use_pages_for_kvec_io:1;
> +
>  	/** Maximum stack depth for passthrough backing files */
>  	int max_stack_depth;
>
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 322af827a2329..36984c0e23d14 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1512,6 +1512,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
>  	fc->delete_stale = true;
>  	fc->auto_submounts = true;
>  	fc->sync_fs = true;
> +	fc->use_pages_for_kvec_io = true;
>
>  	/* Tell FUSE to split requests that exceed the virtqueue's size */
>  	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
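The trace shows why the pointer-based path hurts: the whole kernel buffer is copied into one __kmalloc() allocation, which for a 10MB module becomes a large multi-page request that trips the warning in __alloc_pages(). Passing pages instead means describing the (possibly vmalloc()ed, physically discontiguous) buffer one page at a time. The userspace sketch below shows only that splitting arithmetic; in the kernel the per-page lookup for vmalloc memory would go through a helper such as vmalloc_to_page(), whereas here plain addresses stand in for `struct page` pointers. The 4 KiB page size and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* One page-sized piece of a virtually contiguous buffer. */
struct page_chunk {
	uintptr_t page_addr;	/* page-aligned start of the page */
	size_t offset;		/* offset of the data within that page */
	size_t len;		/* bytes of the buffer in that page */
};

/* Number of pages touched by [addr, addr + len); 0 for an empty range. */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
	if (len == 0)
		return 0;
	return (addr + len - 1) / SKETCH_PAGE_SIZE - addr / SKETCH_PAGE_SIZE + 1;
}

/*
 * Split [addr, addr + len) into per-page chunks, as needed when each page
 * of the buffer may live at an unrelated physical address.
 * Returns the number of chunks written to out (at most max).
 */
static size_t split_into_pages(uintptr_t addr, size_t len,
			       struct page_chunk *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	while (len > 0 && n < max) {
		uintptr_t page = addr & ~(SKETCH_PAGE_SIZE - 1);
		size_t off = addr - page;
		size_t chunk = SKETCH_PAGE_SIZE - off;

		if (chunk > len)
			chunk = len;
		out[n].page_addr = page;
		out[n].offset = off;
		out[n].len = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

For a page-aligned 10 MiB buffer this yields 2560 page entries (2561 if the buffer starts mid-page) instead of one multi-megabyte contiguous allocation. As noted in the reply, a vmalloc()-backed buffer additionally needs flush_kernel_vmap_range() before the DMA and invalidate_kernel_vmap_range() after a DMA read.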
Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side
On Mon, May 6, 2024 at 3:00 PM maobibo wrote:
>
>
>
> On 2024/5/6 9:53 AM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao wrote:
> >>
> >> The PARAVIRT option and PV IPI support are added on the guest kernel
> >> side; pv_ipi_init() adds the IPI sending and receiving hooks. This
> >> function first checks whether the system runs in VM mode. If so, it
> >> calls kvm_para_available() to detect the current hypervisor type. Only
> >> KVM detection is supported for now, so the paravirt functions work only
> >> when the hypervisor is KVM, which is the only hypervisor supported on
> >> LoongArch today.
> >>
> >> PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the
> >> virtual IPI sender, the IPI message is stored in DDR memory rather than
> >> in emulated HW. IPI multicast is supported, and 128 vCPUs can receive
> >> IPIs at the same time, like the x86 KVM method. A hypercall is used for
> >> IPI sending.
> >>
> >> With the virtual IPI receiver, HW SW0 is used rather than the real IPI
> >> HW. Since each vCPU has a separate HW SW0, like the HW timer, there is
> >> no trap on IPI interrupt acknowledge. And since the IPI message is
> >> stored in DDR, there is no trap when reading the IPI message.
> >> > >> Signed-off-by: Bibo Mao > >> --- > >> arch/loongarch/Kconfig | 9 ++ > >> arch/loongarch/include/asm/hardirq.h | 1 + > >> arch/loongarch/include/asm/paravirt.h | 27 > >> .../include/asm/paravirt_api_clock.h | 1 + > >> arch/loongarch/kernel/Makefile| 1 + > >> arch/loongarch/kernel/irq.c | 2 +- > >> arch/loongarch/kernel/paravirt.c | 151 ++ > >> arch/loongarch/kernel/smp.c | 4 +- > >> 8 files changed, 194 insertions(+), 2 deletions(-) > >> create mode 100644 arch/loongarch/include/asm/paravirt.h > >> create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > >> create mode 100644 arch/loongarch/kernel/paravirt.c > >> > >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > >> index 54ad04dacdee..0a1540a8853e 100644 > >> --- a/arch/loongarch/Kconfig > >> +++ b/arch/loongarch/Kconfig > >> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH > >> bool > >> default y > >> > >> +config PARAVIRT > >> + bool "Enable paravirtualization code" > >> + depends on AS_HAS_LVZ_EXTENSION > >> + help > >> + This changes the kernel so it can modify itself when it is run > >> + under a hypervisor, potentially improving performance > >> significantly > >> + over full virtualization. However, when run without a hypervisor > >> + the kernel is theoretically slower and slightly larger. 
> >> + > >> config ARCH_SUPPORTS_KEXEC > >> def_bool y > >> > >> diff --git a/arch/loongarch/include/asm/hardirq.h > >> b/arch/loongarch/include/asm/hardirq.h > >> index 9f0038e19c7f..b26d596a73aa 100644 > >> --- a/arch/loongarch/include/asm/hardirq.h > >> +++ b/arch/loongarch/include/asm/hardirq.h > >> @@ -21,6 +21,7 @@ enum ipi_msg_type { > >> typedef struct { > >> unsigned int ipi_irqs[NR_IPI]; > >> unsigned int __softirq_pending; > >> + atomic_t message cacheline_aligned_in_smp; > >> } cacheline_aligned irq_cpustat_t; > >> > >> DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); > >> diff --git a/arch/loongarch/include/asm/paravirt.h > >> b/arch/loongarch/include/asm/paravirt.h > >> new file mode 100644 > >> index ..58f7b7b89f2c > >> --- /dev/null > >> +++ b/arch/loongarch/include/asm/paravirt.h > >> @@ -0,0 +1,27 @@ > >> +/* SPDX-License-Identifier: GPL-2.0 */ > >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H > >> +#define _ASM_LOONGARCH_PARAVIRT_H > >> + > >> +#ifdef CONFIG_PARAVIRT > >> +#include > >> +struct static_key; > >> +extern struct static_key paravirt_steal_enabled; > >> +extern struct static_key paravirt_steal_rq_enabled; > >> + > >> +u64 dummy_steal_clock(int cpu); > >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > >> + > >> +static inline u64 paravirt_steal_clock
Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side
On 2024/5/6 9:53 AM, Huacai Chen wrote:
Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao wrote:

The PARAVIRT option and PV IPI support are added on the guest kernel
side; pv_ipi_init() adds the IPI sending and receiving hooks. This
function first checks whether the system runs in VM mode. If so, it
calls kvm_para_available() to detect the current hypervisor type. Only
KVM detection is supported for now, so the paravirt functions work only
when the hypervisor is KVM, which is the only hypervisor supported on
LoongArch today.

PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the
virtual IPI sender, the IPI message is stored in DDR memory rather than
in emulated HW. IPI multicast is supported, and 128 vCPUs can receive
IPIs at the same time, like the x86 KVM method. A hypercall is used for
IPI sending.

With the virtual IPI receiver, HW SW0 is used rather than the real IPI
HW. Since each vCPU has a separate HW SW0, like the HW timer, there is
no trap on IPI interrupt acknowledge. And since the IPI message is
stored in DDR, there is no trap when reading the IPI message.
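The commit message describes keeping IPI messages in memory (the new `atomic_t message` field in `irq_cpustat_t`) instead of in emulated hardware. Below is a userspace C11 sketch of that general pattern: the sender ORs a message bit into the target CPU's word and only needs to notify the target (a hypercall in the patch) when the word was previously empty; the receiver atomically takes and clears all pending bits. The "notify only if the word was empty" optimization and every name here are assumptions, since the quoted hunks do not show the patch's send path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical IPI message types, one bit each. */
enum { SKETCH_IPI_RESCHEDULE = 0, SKETCH_IPI_CALL_FUNCTION = 1 };

#define SKETCH_NR_CPUS 4

/* Per-CPU pending-message word, analogous to irq_stat[cpu].message. */
static atomic_uint pending[SKETCH_NR_CPUS];

/*
 * Sender: record the message in memory. Returns true when the word was
 * previously empty, i.e. when the target must be notified; otherwise the
 * target has not consumed its earlier messages yet and will see this bit
 * when it does.
 */
static bool ipi_post(int cpu, unsigned int type)
{
	unsigned int old;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	old = atomic_fetch_or(&pending[cpu], 1u << type);
	return old == 0;
}

/* Receiver: atomically take every pending message and clear the word. */
static unsigned int ipi_collect(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return atomic_exchange(&pending[cpu], 0u);
}
```

Because the receiver's exchange clears the word in one step, a sender that posts after the collection sees an empty word again and issues a fresh notification, so no message is lost between the two sides.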
Signed-off-by: Bibo Mao --- arch/loongarch/Kconfig| 9 ++ arch/loongarch/include/asm/hardirq.h | 1 + arch/loongarch/include/asm/paravirt.h | 27 .../include/asm/paravirt_api_clock.h | 1 + arch/loongarch/kernel/Makefile| 1 + arch/loongarch/kernel/irq.c | 2 +- arch/loongarch/kernel/paravirt.c | 151 ++ arch/loongarch/kernel/smp.c | 4 +- 8 files changed, 194 insertions(+), 2 deletions(-) create mode 100644 arch/loongarch/include/asm/paravirt.h create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h create mode 100644 arch/loongarch/kernel/paravirt.c diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 54ad04dacdee..0a1540a8853e 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. 
+ config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h index 9f0038e19c7f..b26d596a73aa 100644 --- a/arch/loongarch/include/asm/hardirq.h +++ b/arch/loongarch/include/asm/hardirq.h @@ -21,6 +21,7 @@ enum ipi_msg_type { typedef struct { unsigned int ipi_irqs[NR_IPI]; unsigned int __softirq_pending; + atomic_t message cacheline_aligned_in_smp; } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..58f7b7b89f2c --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_ipi_init(void); +#else +static inline int pv_ipi_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3a7620b66bc6..c9bfeda89e40 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += smp.o diff --git a/arch/loongarch/kernel/irq.c 
b/arch/loongarch/kernel/irq.c index ce36897d1e5a..4863e6c1b739 100644 --- a/arch/loongarch/kernel/irq.c +++ b/arch/loongarch/kernel/irq.c @@ -113,5 +113,5 @@ void __init init_IRQ(void) per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE); } - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC); + se
[PATCH] kernel/module: disable cfi for do_mod_ctors
A CFI failure occurs when both CONFIG_CONSTRUCTORS and CONFIG_CFI_CLANG
are enabled:

CFI failure at do_init_module+0x100/0x384 (target: tsan.module_ctor+0x0/0xa98 [module_name_xx]; expected type: 0xa540670c)

Disable CFI for do_mod_ctors() to avoid the CFI check on the indirect
call mod->ctors[i]().

Signed-off-by: Joey Jiao
---
 kernel/module/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..d51e63795637 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2453,6 +2453,7 @@ static int post_relocation(struct module *mod, const struct load_info *info)
 }
 
 /* Call module constructors. */
+__nocfi
 static void do_mod_ctors(struct module *mod)
 {
 #ifdef CONFIG_CONSTRUCTORS
-- 
2.43.2
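For context, the check being disabled sits on an indirect call through a constructor pointer. The sketch below shows the shape of such a loop in userspace C; `struct module_sketch`, `run_ctors()` and the demo constructors are hypothetical stand-ins for the relevant `struct module` fields and do_mod_ctors(). Built as plain userspace C it performs no CFI check at all; the comments only describe what a CONFIG_CFI_CLANG kernel build would add at the call site.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's ctor_fn_t: no arguments, no return value. */
typedef void (*ctor_fn_t)(void);

/* Hypothetical stand-in for the two struct module fields used here. */
struct module_sketch {
	ctor_fn_t *ctors;
	unsigned int num_ctors;
};

/*
 * The kind of indirect-call loop do_mod_ctors() runs. In a kernel built
 * with CONFIG_CFI_CLANG, each mod->ctors[i]() call is preceded by a check
 * of the target's type hash; a target whose hash does not match (as in
 * the tsan.module_ctor report above) triggers a CFI failure. Marking the
 * caller __nocfi, as the patch does, omits that check.
 */
static void run_ctors(const struct module_sketch *mod)
{
	assert(mod != NULL);
	for (unsigned int i = 0; i < mod->num_ctors; i++)
		mod->ctors[i]();
}

/* Demo constructors for the usage example. */
static int demo_calls;
static void demo_ctor_a(void) { demo_calls += 1; }
static void demo_ctor_b(void) { demo_calls += 10; }
```

A trade-off worth weighing in review: __nocfi removes the check for every constructor called from this function, not only for the sanitizer-generated ones.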
Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side
Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao wrote:
>
> The PARAVIRT option and PV IPI support are added on the guest kernel
> side; pv_ipi_init() adds the IPI sending and receiving hooks. This
> function first checks whether the system runs in VM mode. If so, it
> calls kvm_para_available() to detect the current hypervisor type. Only
> KVM detection is supported for now, so the paravirt functions work only
> when the hypervisor is KVM, which is the only hypervisor supported on
> LoongArch today.
>
> PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the
> virtual IPI sender, the IPI message is stored in DDR memory rather than
> in emulated HW. IPI multicast is supported, and 128 vCPUs can receive
> IPIs at the same time, like the x86 KVM method. A hypercall is used for
> IPI sending.
>
> With the virtual IPI receiver, HW SW0 is used rather than the real IPI
> HW. Since each vCPU has a separate HW SW0, like the HW timer, there is
> no trap on IPI interrupt acknowledge. And since the IPI message is
> stored in DDR, there is no trap when reading the IPI message.
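The commit message says IPI multicast reaches up to 128 vCPUs at once, "like the x86 KVM method". One way to picture this is to pack a destination set into 128-bit windows (a base CPU id plus a bitmap), with one hypercall per window. The sketch below is an illustrative assumption modeled loosely on x86 KVM's PV send-IPI; the window layout, the requirement that the input be sorted, and all names are not taken from the LoongArch patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IPI_BITS 128	/* vCPUs covered by one hypercall (assumed) */

/* Destination set for one hypercall: a base CPU id plus a 128-bit bitmap. */
struct ipi_batch {
	int min;		/* CPU id represented by bit 0 */
	uint64_t bits[2];	/* bit i set => CPU (min + i) is a target */
};

/*
 * Pack a sorted, duplicate-free list of target CPU ids into 128-wide
 * windows; each window would be delivered with a single hypercall.
 * Returns the number of batches written, or -1 if out is too small.
 */
static int build_ipi_batches(const int *cpus, int n,
			     struct ipi_batch *out, int max)
{
	int nb = 0;

	for (int i = 0; i < n; i++) {
		int cpu = cpus[i];

		assert(i == 0 || cpus[i - 1] < cpu);	/* sorted, no duplicates */
		if (nb == 0 || cpu - out[nb - 1].min >= SKETCH_IPI_BITS) {
			if (nb == max)
				return -1;
			out[nb].min = cpu;
			out[nb].bits[0] = 0;
			out[nb].bits[1] = 0;
			nb++;
		}
		int off = cpu - out[nb - 1].min;
		out[nb - 1].bits[off / 64] |= UINT64_C(1) << (off % 64);
	}
	return nb;
}
```

For example, targets {0, 5, 127, 128, 300} need three hypercalls: one window based at CPU 0 covering 0, 5 and 127, one based at 128, and one based at 300.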
> > Signed-off-by: Bibo Mao > --- > arch/loongarch/Kconfig| 9 ++ > arch/loongarch/include/asm/hardirq.h | 1 + > arch/loongarch/include/asm/paravirt.h | 27 > .../include/asm/paravirt_api_clock.h | 1 + > arch/loongarch/kernel/Makefile| 1 + > arch/loongarch/kernel/irq.c | 2 +- > arch/loongarch/kernel/paravirt.c | 151 ++ > arch/loongarch/kernel/smp.c | 4 +- > 8 files changed, 194 insertions(+), 2 deletions(-) > create mode 100644 arch/loongarch/include/asm/paravirt.h > create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h > create mode 100644 arch/loongarch/kernel/paravirt.c > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 54ad04dacdee..0a1540a8853e 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH > bool > default y > > +config PARAVIRT > + bool "Enable paravirtualization code" > + depends on AS_HAS_LVZ_EXTENSION > + help > + This changes the kernel so it can modify itself when it is run > + under a hypervisor, potentially improving performance significantly > + over full virtualization. However, when run without a hypervisor > + the kernel is theoretically slower and slightly larger. 
> + > config ARCH_SUPPORTS_KEXEC > def_bool y > > diff --git a/arch/loongarch/include/asm/hardirq.h > b/arch/loongarch/include/asm/hardirq.h > index 9f0038e19c7f..b26d596a73aa 100644 > --- a/arch/loongarch/include/asm/hardirq.h > +++ b/arch/loongarch/include/asm/hardirq.h > @@ -21,6 +21,7 @@ enum ipi_msg_type { > typedef struct { > unsigned int ipi_irqs[NR_IPI]; > unsigned int __softirq_pending; > + atomic_t message cacheline_aligned_in_smp; > } cacheline_aligned irq_cpustat_t; > > DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); > diff --git a/arch/loongarch/include/asm/paravirt.h > b/arch/loongarch/include/asm/paravirt.h > new file mode 100644 > index ..58f7b7b89f2c > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt.h > @@ -0,0 +1,27 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _ASM_LOONGARCH_PARAVIRT_H > +#define _ASM_LOONGARCH_PARAVIRT_H > + > +#ifdef CONFIG_PARAVIRT > +#include > +struct static_key; > +extern struct static_key paravirt_steal_enabled; > +extern struct static_key paravirt_steal_rq_enabled; > + > +u64 dummy_steal_clock(int cpu); > +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); > + > +static inline u64 paravirt_steal_clock(int cpu) > +{ > + return static_call(pv_steal_clock)(cpu); > +} > + > +int pv_ipi_init(void); > +#else > +static inline int pv_ipi_init(void) > +{ > + return 0; > +} > + > +#endif // CONFIG_PARAVIRT > +#endif > diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h > b/arch/loongarch/include/asm/paravirt_api_clock.h > new file mode 100644 > index ..65ac7cee0dad > --- /dev/null > +++ b/arch/loongarch/include/asm/paravirt_api_clock.h > @@ -0,0 +1 @@ > +#include > diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile > index 3a7620b66bc6..c9bfeda89e40 100644 > --- a/arch/loongarch/kernel/Makefile > +++ b/arch/loongarch/kernel/Makefile > @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o > obj-$(CONFIG_STACKTRACE) += stacktrace.o > > 
obj-$(CONFIG_PROC_FS) += proc.o >
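The multicast scheme in the quoted commit message (128 vCPUs receiving IPIs at once via a hypercall, as in x86 KVM) can be modelled by grouping target vCPU ids into a 128-bit bitmap relative to the lowest id in the group. This is a sketch under assumptions: the quoted patch is cut off before paravirt.c, so the real hypercall arguments are not visible here, and `fake_hypercall_send_ipi()` is a hypothetical stub that only counts calls. This simplified version starts a new group whenever a target falls outside `[min, min + 127]`.

```c
#include <assert.h>
#include <stdint.h>

#define IPI_CLUSTER_BITS 128 /* assumption: one hypercall covers 128 vCPUs */

struct ipi_cluster {
	uint64_t bits[2]; /* bit n set => vCPU (min + n) is a target */
	int min;          /* lowest vCPU id in this cluster, -1 if empty */
};

static int hypercalls; /* counts calls to the stub below */

/* Hypothetical stand-in for the IPI hypercall; it only counts. */
static void fake_hypercall_send_ipi(const struct ipi_cluster *c)
{
	(void)c;
	hypercalls++;
}

/* Issue one hypercall for a non-empty cluster, then reset it. */
static void cluster_flush(struct ipi_cluster *c)
{
	if (c->min >= 0)
		fake_hypercall_send_ipi(c);
	c->bits[0] = 0;
	c->bits[1] = 0;
	c->min = -1;
}

/* Send one IPI to every vCPU in targets[], batching up to 128 ids per call. */
static void model_send_ipi_many(const int *targets, int n)
{
	struct ipi_cluster c = { { 0, 0 }, -1 };

	for (int i = 0; i < n; i++) {
		int cpu = targets[i];

		assert(cpu >= 0);
		/* cpu does not fit in [min, min + 127]: flush and start over. */
		if (c.min >= 0 && (cpu < c.min || cpu - c.min >= IPI_CLUSTER_BITS))
			cluster_flush(&c);
		if (c.min < 0)
			c.min = cpu;

		int off = cpu - c.min;
		c.bits[off / 64] |= UINT64_C(1) << (off % 64);
	}
	cluster_flush(&c);
}
```

For example, targets {0, 5, 127} fit in one hypercall, while {0, 128} need two. A smarter version could also slide `min` downward when a lower id still fits, at the cost of shifting the bitmap.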
Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper
On 4/26/2024 6:36 PM, Dmitry Baryshkov wrote: On Sat, 27 Apr 2024 at 04:03, Chris Lew wrote: On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote: diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_adsp.c index 1d24c9b656a8..02d0c626b03b 100644 --- a/drivers/remoteproc/qcom_q6v5_adsp.c +++ b/drivers/remoteproc/qcom_q6v5_adsp.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc) int ret; unsigned int val; - ret = qcom_q6v5_prepare(&adsp->q6v5); + ret = qcom_pdm_get(); if (ret) return ret; Would it make sense to try and model this as an rproc subdev? This section of the remoteproc code seems to be focused on making specific calls to set up and enable hardware resources, whereas pd mapper is software. sysmon and ssr are also purely software and they are modeled as subdevs in qcom_common. I'm not an expert on remoteproc organization but this was just a thought. Well, the issue is that the pd-mapper is a global, not a per-remoteproc instance. Both sysmon and ssr have some kind of global state that they manage too. Each subdev's functionality tends to be a mix of per-remoteproc instance management and global state management. If pd-mapper were completely global, it would be able to instantiate itself. Instead, instantiation depends on each remoteproc instance properly getting and putting references. The pdm subdev could manage the references to pd-mapper for that remoteproc instance. On the other hand, I think Bjorn recommended in v4 that this could be moved to probe time. The v4 version was doing the reinitialization dance, but I think the recommendation could still apply to this version. Thanks!
Chris + ret = qcom_q6v5_prepare(&adsp->q6v5); + if (ret) + goto put_pdm; + ret = adsp_map_carveout(rproc); if (ret) { dev_err(adsp->dev, "ADSP smmu mapping failed\n"); @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc) adsp_unmap_carveout(rproc); disable_irqs: qcom_q6v5_unprepare(&adsp->q6v5); +put_pdm: + qcom_pdm_release(); return ret; }
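Chris's suggestion, that a pdm subdev hold the per-remoteproc reference to an otherwise global pd-mapper, can be sketched as a refcounted global service that the first get starts and the last release stops, with get/put tied to per-instance prepare/unprepare hooks. Everything here is hypothetical (`fake_pdm_get`, `struct fake_subdev`), only loosely modelled on the prepare/unprepare callbacks of struct rproc_subdev; a kernel version would also need a lock around the refcount, which this single-threaded sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global service: started on first get, stopped on last put. */
static int pdm_refcount;
static bool pdm_running;

static int fake_pdm_get(void)
{
	if (pdm_refcount++ == 0)
		pdm_running = true; /* first user instantiates the service */
	return 0;
}

static void fake_pdm_release(void)
{
	assert(pdm_refcount > 0);
	if (--pdm_refcount == 0)
		pdm_running = false; /* last user tears it down */
}

/* Hypothetical per-remoteproc hooks; each instance holds at most one ref. */
struct fake_subdev {
	int (*prepare)(struct fake_subdev *sd);
	void (*unprepare)(struct fake_subdev *sd);
	bool holds_ref;
};

static int pdm_subdev_prepare(struct fake_subdev *sd)
{
	int ret = fake_pdm_get();

	if (!ret)
		sd->holds_ref = true;
	return ret;
}

static void pdm_subdev_unprepare(struct fake_subdev *sd)
{
	if (sd->holds_ref) {
		fake_pdm_release();
		sd->holds_ref = false;
	}
}
```

With two remoteprocs prepared, the service keeps running until the second one is unprepared, which is the per-instance reference management Chris describes.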
BUG: unable to handle kernel paging request in do_split
Hello,

We are the Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. Recently, our team discovered an issue in Linux kernel 6.7. A PoC file for the issue is attached to this email.

Stack dump:

BUG: unable to handle page fault for address: ed110c2fd97f
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 7ffd0067 P4D 7ffd0067 PUD 0
Oops: [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 24082 Comm: syz-executor.3 Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047
Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef
RSP: 0018:c90001e9f858 EFLAGS: 00010a02
RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000
RDX: 11110c2fd97f RSI: 823364ab RDI: 0005
RBP: 8880617ecc00 R08: 0005 R09:
R10: R11: R12: dc00
R13: R14: R15: 88801ee8d2b0
FS: 7f191402a640() GS:88802c60() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:
 make_indexed_dir+0x1158/0x1540 fs/ext4/namei.c:2342
 ext4_add_entry+0xcd0/0xe80 fs/ext4/namei.c:2454
 ext4_add_nondir+0x90/0x2b0 fs/ext4/namei.c:2795
 ext4_symlink+0x539/0x9e0 fs/ext4/namei.c:3436
 vfs_symlink fs/namei.c:4464 [inline]
 vfs_symlink+0x3f6/0x640 fs/namei.c:4448
 do_symlinkat+0x245/0x2f0 fs/namei.c:4490
 __do_sys_symlink fs/namei.c:4511 [inline]
 __se_sys_symlink fs/namei.c:4509 [inline]
 __x64_sys_symlink+0x79/0xa0 fs/namei.c:4509
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f191329002d
Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7f191402a028 EFLAGS: 0246 ORIG_RAX: 0058
RAX: ffda RBX: 7f19133cbf80 RCX: 7f191329002d
RDX: RSI: 2e40 RDI: 20001640
RBP: 7f19132f14d0 R08: R09:
R10: R11: 0246 R12:
R13: 000b R14: 7f19133cbf80 R15: 7f191400a000
Modules linked in:
CR2: ed110c2fd97f
---[ end trace ]---
RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047
Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef
RSP: 0018:c90001e9f858 EFLAGS: 00010a02
RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000
RDX: 11110c2fd97f RSI: 823364ab RDI: 0005
RBP: 8880617ecc00 R08: 0005 R09:
R10: R11: R12: dc00
R13: R14: R15: 88801ee8d2b0
FS: 7f191402a640() GS:88802c60() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Code disassembly (best guess):
   0:  d2 0f                   rorb   %cl,(%rdi)
   2:  85 38                   test   %edi,(%rax)
   4:  0b 00                   or     (%rax),%eax
   6:  00 8b 45 00 89 84       add    %cl,-0x7b76ffbb(%rbx)
   c:  24 84                   and    $0x84,%al
   e:  00 00                   add    %al,(%rax)
  10:  00 41 8d                add    %al,-0x73(%rcx)
  13:  45 ff 48 8d             rex.RB decl -0x73(%r8)
  17:  1c c3                   sbb    $0xc3,%al
  19:  48 b8 00 00 00 00 00    movabs $0xdc00,%rax
  20:  fc ff df
  23:  48 89 da                mov    %rbx,%rdx
  26:  48 c1 ea 03             shr    $0x3,%rdx
* 2a:  0f b6 14 02             movzbl (%rdx,%rax,1),%edx <-- trapping instruction
  2e:  48 89 d8                mov    %rbx,%rax
  31:  83 e0 07                and    $0x7,%eax
  34:  83 c0 03                add    $0x3,%eax
  37:  38 d0                   cmp    %dl,%al
  39:  7c 08                   jl     0x43
  3b:  84 d2                   test   %dl,%dl
  3d:  0f                      .byte 0xf
  3e:  85 ef                   test
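The trapping instruction is a KASAN shadow check: the accessed address is shifted right by 3, added to the shadow offset loaded into RAX, and the resulting shadow byte is read. The sketch below reproduces that arithmetic in userspace. Assumptions: generic KASAN on x86-64 with shadow offset 0xdffffc0000000000; the archive appears to have dropped runs of repeated digits from the register dump, so the full base 0xffff8880617ecc00 is inferred from the printed RBP (8880617ecc00) and taken to be the array base, which is not confirmed. If R13 (the 32-bit count consumed by `lea -0x1(%r13),%eax`) was 0, the index wraps to 0xffffffff, and the computed shadow address ends in the same digits as the printed CR2 (ed110c2fd97f).

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 generic KASAN: one shadow byte describes 8 bytes of memory. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET UINT64_C(0xdffffc0000000000)

/* Userspace model of the shadow lookup done before each checked access. */
static uint64_t kasan_mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/*
 * Mirrors "lea -0x1(%r13),%eax; lea (%rbx,%rax,8),%rbx": a 32-bit
 * count - 1, zero-extended and scaled by an 8-byte element size.
 */
static uint64_t element_addr(uint64_t base, uint32_t count)
{
	uint32_t idx = count - 1; /* wraps to 0xffffffff when count == 0 */

	return base + (uint64_t)idx * 8;
}
```

Under those assumptions the access lands 0x7fffffff8 bytes (about 32 GiB) past the base, so even its shadow page is unmapped, which fits the not-present fault on CR2.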
[PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side
The PARAVIRT option and PV IPI support are added on the guest kernel side. The function pv_ipi_init() adds the IPI sending and receiving hooks. It first checks whether the system is running in VM mode; if so, it calls kvm_para_available() to detect the current hypervisor type. Only KVM detection is supported for now, so the paravirt functions work only when the hypervisor is KVM, which is currently the only hypervisor supported on LoongArch.

PV IPI uses a virtual IPI sender and a virtual IPI receiver. With the virtual IPI sender, the IPI message is stored in DDR memory rather than in emulated hardware. IPI multicast is supported, and 128 vCPUs can receive IPIs at the same time, similar to the x86 KVM method. A hypercall is used to send IPIs.

With the virtual IPI receiver, the SW0 interrupt is used rather than the real IPI hardware. Since each vCPU has its own SW0, like the hardware timer, acknowledging the IPI interrupt does not trap. And since the IPI message is stored in DDR, reading the IPI message does not trap either.

Signed-off-by: Bibo Mao
---
 arch/loongarch/Kconfig | 9 ++
 arch/loongarch/include/asm/hardirq.h | 1 +
 arch/loongarch/include/asm/paravirt.h | 27
 .../include/asm/paravirt_api_clock.h | 1 +
 arch/loongarch/kernel/Makefile | 1 +
 arch/loongarch/kernel/irq.c | 2 +-
 arch/loongarch/kernel/paravirt.c | 151 ++
 arch/loongarch/kernel/smp.c | 4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 54ad04dacdee..0a1540a8853e 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH bool default y +config PARAVIRT + bool "Enable paravirtualization code" + depends on AS_HAS_LVZ_EXTENSION + help + This changes the kernel so it can modify itself when it is run + under a hypervisor, potentially improving performance significantly + over full
virtualization. However, when run without a hypervisor + the kernel is theoretically slower and slightly larger. + config ARCH_SUPPORTS_KEXEC def_bool y diff --git a/arch/loongarch/include/asm/hardirq.h b/arch/loongarch/include/asm/hardirq.h index 9f0038e19c7f..b26d596a73aa 100644 --- a/arch/loongarch/include/asm/hardirq.h +++ b/arch/loongarch/include/asm/hardirq.h @@ -21,6 +21,7 @@ enum ipi_msg_type { typedef struct { unsigned int ipi_irqs[NR_IPI]; unsigned int __softirq_pending; + atomic_t message cacheline_aligned_in_smp; } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/loongarch/include/asm/paravirt.h b/arch/loongarch/include/asm/paravirt.h new file mode 100644 index ..58f7b7b89f2c --- /dev/null +++ b/arch/loongarch/include/asm/paravirt.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_LOONGARCH_PARAVIRT_H +#define _ASM_LOONGARCH_PARAVIRT_H + +#ifdef CONFIG_PARAVIRT +#include +struct static_key; +extern struct static_key paravirt_steal_enabled; +extern struct static_key paravirt_steal_rq_enabled; + +u64 dummy_steal_clock(int cpu); +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock); + +static inline u64 paravirt_steal_clock(int cpu) +{ + return static_call(pv_steal_clock)(cpu); +} + +int pv_ipi_init(void); +#else +static inline int pv_ipi_init(void) +{ + return 0; +} + +#endif // CONFIG_PARAVIRT +#endif diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h b/arch/loongarch/include/asm/paravirt_api_clock.h new file mode 100644 index ..65ac7cee0dad --- /dev/null +++ b/arch/loongarch/include/asm/paravirt_api_clock.h @@ -0,0 +1 @@ +#include diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile index 3a7620b66bc6..c9bfeda89e40 100644 --- a/arch/loongarch/kernel/Makefile +++ b/arch/loongarch/kernel/Makefile @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_PROC_FS) += proc.o 
+obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_SMP) += smp.o diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c index ce36897d1e5a..4863e6c1b739 100644 --- a/arch/loongarch/kernel/irq.c +++ b/arch/loongarch/kernel/irq.c @@ -113,5 +113,5 @@ void __init init_IRQ(void) per_cpu(irq_stack, i), per_cpu(irq_stack, i) + IRQ_STACK_SIZE); } - set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC); + set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC); } diff --git a/arch/loongarch/kernel/paravir
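The hardirq.h hunk above adds a per-CPU `atomic_t message` to irq_cpustat_t, and the commit message says the IPI message lives in memory so the receiver does not trap to read it. One common way to use such a word is shown below as a userspace model with C11 atomics, not the patch's actual code (paravirt.c is truncated above): the sender ORs in one bit per IPI action and raises an interrupt only when the word was previously empty, and the receiver atomically swaps the word to zero and handles every pending bit. The action names and the `interrupts_raised` counter are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical IPI actions; the real enum ipi_msg_type is not shown here. */
enum { FAKE_IPI_RESCHEDULE, FAKE_IPI_CALL_FUNCTION, FAKE_NR_IPI };

struct fake_cpu_stat {
	atomic_uint message;                /* pending actions, one bit each */
	unsigned int ipi_irqs[FAKE_NR_IPI]; /* per-action handled counts */
};

static int interrupts_raised; /* stands in for the hypercall / SW0 injection */

/* Sender: post an action; only kick the target if nothing was pending. */
static void fake_send_ipi(struct fake_cpu_stat *dst, unsigned int action)
{
	unsigned int old = atomic_fetch_or(&dst->message, 1u << action);

	if (old == 0)
		interrupts_raised++;
}

/* Receiver: take all pending actions at once, then handle each bit. */
static void fake_handle_ipi(struct fake_cpu_stat *self)
{
	unsigned int pending = atomic_exchange(&self->message, 0);

	for (unsigned int a = 0; a < FAKE_NR_IPI; a++)
		if (pending & (1u << a))
			self->ipi_irqs[a]++;
}
```

Two sends before the receiver runs cost only one interrupt, and because the receiver clears the word before handling it, a send that arrives afterwards sees an empty word and raises a fresh interrupt, so no action is lost.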
Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper
On Sat, 27 Apr 2024 at 04:03, Chris Lew wrote: > > > > On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote: > > diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c > > b/drivers/remoteproc/qcom_q6v5_adsp.c > > index 1d24c9b656a8..02d0c626b03b 100644 > > --- a/drivers/remoteproc/qcom_q6v5_adsp.c > > +++ b/drivers/remoteproc/qcom_q6v5_adsp.c > > @@ -23,6 +23,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > > > @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc) > > int ret; > > unsigned int val; > > > > - ret = qcom_q6v5_prepare(&adsp->q6v5); > > + ret = qcom_pdm_get(); > > if (ret) > > return ret; > > Would it make sense to try and model this as an rproc subdev? This > section of the remoteproc code seems to be focused on making specific > calls to set up and enable hardware resources, whereas pd mapper is > software. > > sysmon and ssr are also purely software and they are modeled as subdevs > in qcom_common. I'm not an expert on remoteproc organization but this > was just a thought. Well, the issue is that the pd-mapper is a global, not a per-remoteproc instance. > > Thanks! > Chris > > > > > + ret = qcom_q6v5_prepare(&adsp->q6v5); > > + if (ret) > > + goto put_pdm; > > + > > ret = adsp_map_carveout(rproc); > > if (ret) { > > dev_err(adsp->dev, "ADSP smmu mapping failed\n"); > > @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc) > > adsp_unmap_carveout(rproc); > > disable_irqs: > > qcom_q6v5_unprepare(&adsp->q6v5); > > +put_pdm: > > + qcom_pdm_release(); > > > > return ret; > > } > -- With best wishes Dmitry
Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper
On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
> diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c b/drivers/remoteproc/qcom_q6v5_adsp.c
> index 1d24c9b656a8..02d0c626b03b 100644
> --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> @@ -23,6 +23,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  
> @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
>  	int ret;
>  	unsigned int val;
>  
> -	ret = qcom_q6v5_prepare(&adsp->q6v5);
> +	ret = qcom_pdm_get();
>  	if (ret)
>  		return ret;

Would it make sense to try and model this as an rproc subdev? This
section of the remoteproc code seems to be focused on making specific
calls to set up and enable hardware resources, whereas pd mapper is
software.

sysmon and ssr are also purely software and they are modeled as subdevs
in qcom_common. I'm not an expert on remoteproc organization but this
was just a thought.

Thanks!
Chris

>  
> +	ret = qcom_q6v5_prepare(&adsp->q6v5);
> +	if (ret)
> +		goto put_pdm;
> +
>  	ret = adsp_map_carveout(rproc);
>  	if (ret) {
>  		dev_err(adsp->dev, "ADSP smmu mapping failed\n");
> @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
>  	adsp_unmap_carveout(rproc);
>  disable_irqs:
>  	qcom_q6v5_unprepare(&adsp->q6v5);
> +put_pdm:
> +	qcom_pdm_release();
>  
>  	return ret;
>  }
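The error path in the quoted adsp_start() hunk follows the usual kernel unwind rule: resources are acquired in order, and each failure jumps to a label that releases everything acquired so far in reverse order, so the pd-mapper reference taken first is dropped last. Below is a userspace model with hypothetical stubs (`stub_pdm_get`, `stub_prepare`, `stub_map_carveout`) that only track state; `fail_at` selects which step fails.

```c
#include <assert.h>

/* Hypothetical state trackers standing in for the real resources. */
static int pdm_refs, q6v5_prepared, carveout_mapped;
static int fail_at; /* 0: no failure, 1: prepare fails, 2: mapping fails */

static int stub_pdm_get(void) { pdm_refs++; return 0; }
static void stub_pdm_release(void) { assert(pdm_refs > 0); pdm_refs--; }

static int stub_prepare(void)
{
	if (fail_at == 1)
		return -1;
	q6v5_prepared = 1;
	return 0;
}

static void stub_unprepare(void) { q6v5_prepared = 0; }

static int stub_map_carveout(void)
{
	if (fail_at == 2)
		return -1;
	carveout_mapped = 1;
	return 0;
}

/* Acquire in order; on failure, release what was taken in reverse order. */
static int model_adsp_start(void)
{
	int ret;

	ret = stub_pdm_get();
	if (ret)
		return ret;

	ret = stub_prepare();
	if (ret)
		goto put_pdm;

	ret = stub_map_carveout();
	if (ret)
		goto unprepare;

	return 0;

unprepare:
	stub_unprepare();
put_pdm:
	stub_pdm_release();
	return ret;
}
```

Whichever step fails, the pd-mapper reference count returns to zero; only a fully successful start leaves all three resources held.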
Re: [PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.
Hi LuMingYin,

Thanks for finding the problem! But please write the commit message
following Documentation/process/submitting-patches.rst.

On Fri, 26 Apr 2024 10:13:43 +0100
lumingyindet...@126.com wrote:

> From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
>
> At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables
> named code and tmp are defined. At line 1437, a new dynamic memory area is
> allocated using the function kcalloc. When the if statement at line 1467
> evaluates to true, the program jumps to the out label at line 1469. Within
> this function, there are two labels: out and fail. The difference between
> these two labels is that fail additionally frees the dynamic memory area
> pointed to by the variable code. Therefore, the program should jump to the
> fail label instead of the out label. This commit fixes this bug.
>

For example, you must wrap lines at about 70 characters. Also, please
don't use line numbers, because they change easily (function names are
OK).

Since this bug is a very clear mistake, you can simply explain it as
follows:

  If traceprobe_parse_probe_arg_body() fails to allocate 'parg->fmt', it
  jumps to 'out' instead of 'fail' by mistake. As a result, the 'tmp'
  buffer is not freed and its memory is leaked.

  Fix it by jumping to 'fail' in that case.

The first paragraph explains what happens, and the second one explains
how to fix it.

Also, please add this Fixes tag.

Fixes: 032330abd08b ("tracing/probes: Cleanup probe argument parser")

You can easily find this commit number with git blame.

Thank you,

> Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> ---
>  kernel/trace/trace_probe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index dfe3ee6035ec..42bc0f362226 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
>  		parg->fmt = kmalloc(len, GFP_KERNEL);
>  		if (!parg->fmt) {
>  			ret = -ENOMEM;
> -			goto out;
> +			goto fail;
>  		}
>  		snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
>  			 parg->count);
> -- 
> 2.25.1
>

-- 
Masami Hiramatsu (Google)
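The leak Masami describes comes from two cleanup labels where 'fail' frees the scratch buffer and then falls through to 'out', so jumping directly to 'out' skips the free. This is a userspace model, not the trace_probe.c code: `parse_arg()`, `tracked_calloc()` and `tracked_free()` are hypothetical, and `use_fail_label` switches between the buggy and the fixed jump so the leak shows up as an outstanding allocation count.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs; /* outstanding allocations made via tracked_calloc() */

static void *tracked_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/*
 * 'fail' frees the scratch buffer and falls through to 'out'.
 * use_fail_label = 0 reproduces the buggy jump, 1 the fixed one.
 */
static int parse_arg(int fmt_alloc_fails, int use_fail_label)
{
	int ret = 0;
	int *tmp = tracked_calloc(8, sizeof(*tmp));

	if (!tmp)
		return -1;

	if (fmt_alloc_fails) {
		ret = -1; /* stands in for -ENOMEM */
		if (use_fail_label)
			goto fail;
		goto out; /* bug: skips tracked_free(tmp) */
	}

	/* ... normal work using tmp ... */

fail:
	tracked_free(tmp); /* tmp is scratch space on every path */
out:
	return ret;
}
```

The buggy jump leaves one allocation outstanding per failed call, while the fixed jump and the success path both return with nothing leaked.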