
[syzbot] [net?] [virt?] BUG: unable to handle kernel paging request in clear_page_erms (6)

2024-09-28 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:    abf2050f51fd Merge tag 'media/v6.12-1' of git://git.kernel..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15a03107980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2a8c36c5e2b56016
dashboard link: https://syzkaller.appspot.com/bug?extid=0a31340d42a1d572f904
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9800778169d6/disk-abf2050f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/32a789de3883/vmlinux-abf2050f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/24e5e7200094/bzImage-abf2050f.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+0a31340d42a1d572f...@syzkaller.appspotmail.com

BUG: unable to handle page fault for address: 8880603bc000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1a801067 P4D 1a801067 PUD 6c591063 PMD 30259063 PTE 800f9fc43060
Oops: Oops: 0002 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 14210 Comm: syz.2.2649 Not tainted 6.11.0-syzkaller-09959-gabf2050f51fd #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
RIP: 0010:clear_page_erms+0xb/0x20 arch/x86/lib/clear_page_64.S:50
Code: 48 8d 7f 40 75 d9 90 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 90 f3 0f 1e fa b9 00 10 00 00 31 c0  aa c3 cc cc cc cc 66 
2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
RSP: 0018:c9007310 EFLAGS: 00010246
RAX:  RBX:  RCX: 1000
RDX: 8880603bc000 RSI: 0001 RDI: 8880603bc000
RBP: dc00 R08: ea000180ef37 R09: 
R10: ed100c077800 R11: f94000301de7 R12: 0001
R13: 0001 R14: ea000180ef00 R15: 
FS:  7fa3b27206c0() GS:8880b860() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 8880603bc000 CR3: 3ec82000 CR4: 003506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 <IRQ>
 clear_page arch/x86/include/asm/page_64.h:54 [inline]
 clear_highpage_kasan_tagged include/linux/highmem.h:248 [inline]
 kernel_init_pages mm/page_alloc.c:1036 [inline]
 post_alloc_hook+0xf8/0x230 mm/page_alloc.c:1535
 prep_new_page mm/page_alloc.c:1545 [inline]
 get_page_from_freelist+0x3039/0x3180 mm/page_alloc.c:3457
 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4733
 page_frag_alloc_1k net/core/skbuff.c:249 [inline]
 napi_alloc_skb+0x641/0xa00 net/core/skbuff.c:847
 page_to_skb+0x276/0x9b0 drivers/net/virtio_net.c:800
 receive_mergeable drivers/net/virtio_net.c:2253 [inline]
 receive_buf+0x3bc/0x17b0 drivers/net/virtio_net.c:2391
 virtnet_receive_packets drivers/net/virtio_net.c:2698 [inline]
 virtnet_receive drivers/net/virtio_net.c:2722 [inline]
 virtnet_poll+0x26b2/0x3980 drivers/net/virtio_net.c:2817
 __napi_poll+0xcb/0x490 net/core/dev.c:6771
 napi_poll net/core/dev.c:6840 [inline]
 net_rx_action+0x89b/0x1240 net/core/dev.c:6962
 handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
 __do_softirq kernel/softirq.c:588 [inline]
 invoke_softirq kernel/softirq.c:428 [inline]
 __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
 common_interrupt+0xb9/0xd0 arch/x86/kernel/irq.c:278
 </IRQ>
 <TASK>
 asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
RIP: 0010:finish_task_switch+0x1ea/0x870 kernel/sched/core.c:5189
Code: c9 50 e8 79 fa 0b 00 48 83 c4 08 4c 89 f7 e8 4d 39 00 00 e9 de 04 00 00 
4c 89 f7 e8 e0 70 60 0a e8 db 58 38 00 fb 48 8b 5d c0 <48> 8d bb f8 15 00 00 48 
89 f8 48 c1 e8 03 49 be 00 00 00 00 00 fc
RSP: 0018:c9000caa7228 EFLAGS: 0282
RAX: 0b99d481b6833300 RBX: 888050f5 RCX: 817088da
RDX: dc00 RSI: 8c0aca40 RDI: 8c601bc0
RBP: c9000caa7270 R08: 9422a907 R09: 12845520
R10: dc00 R11: fbfff2845521 R12: 1110170c7f0c
R13: dc00 R14: 8880b863ea40 R15: 8880b863f860
 context_switch kernel/sched/core.c:5318 [inline]
 __schedule+0x184b/0x4ae0 kernel/sched/core.c:6674
 preempt_schedule_common+0x84/0xd0 kernel/sched/core.c:6853
 preempt_schedule+0xe1/0xf0 kernel/sched/core.c:6877
 preempt_schedule_thunk+0x1a/0x30 arch/x86/entry/thunk.S:12
 free_unref_page+0x6b5/0xf00 mm/page_alloc.c:2662
 __folio_put+0x2c7/0x440 mm/swap.c:126
 secretmem_fault+0x1f9/0x430 mm/secretmem.c:87
 __do_fault+0x135/0x460 mm/memory.c:4876
 do_shared_fault mm/memory.c:5346 [inline]
 do_fault mm/memory.c:5420 [inline]
 do_pte_missing mm/memory.c:3965 [inline]
 handle_pte_fault+0x1105/0x6800 mm/memory.c:5751

[linus:master] [selftests] ecb8bd70d5: kernel-selftests.vDSO.vdso_standalone_test_x86.fail

2024-09-24 Thread kernel test robot




Hello,

kernel test robot noticed "kernel-selftests.vDSO.vdso_standalone_test_x86.fail" 
on:

commit: ecb8bd70d51ccf9009219a6097cef293deada65b ("selftests: vDSO: build tests 
with O2 optimization")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: kernel-selftests
version: kernel-selftests-x86_64-977d51cf-1_20240508
with following parameters:

group: group-03



compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz 
(Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-lkp/202409241558.98e13f6f-oliver.s...@intel.com



# timeout set to 300
# selftests: vDSO: vdso_standalone_test_x86
# Segmentation fault
not ok 5 selftests: vDSO: vdso_standalone_test_x86 # exit=139



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240924/202409241558.98e13f6f-oliver.s...@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [RFC PATCH 0/4] Add hazard pointers to kernel

2024-09-19 Thread Neeraj Upadhyay

On 9/19/2024 12:16 PM, Linus Torvalds wrote:
> On Thu, 19 Sept 2024 at 00:44, Neeraj Upadhyay wrote:
>>
>> While we were working on this problem, this refcount scalability issue got
>> resolved recently with conditional ref acquisition [3] (however, there are
>> new developments in apparmor code which might bring back the refcount
>> problem [4]).
> 
> Honestly, the various security layers should be a whole lot more
> careful about their horrid performance issues, and I think that [4]
> you point at needs to just be headed off at the pass.
> 
> No  more "the security layer is so bad at performance that we have to
> introduce new ref mechanisms", please. Let's push back on bad security
> layer code instead.
> 

Ok got it. Thanks for your feedback! I had tried using percpu refcount first
(in place of kref) in AppArmor. However, that required managing the last
reference drop (implemented in [1] and [2]). Mateusz has shared some ideas
in his reply to this thread. Maybe that is a workable solution. Will defer
to John on this as I have limited understanding of the cred management code.

- Neeraj

> Linus

Re: [RFC PATCH 0/4] Add hazard pointers to kernel

2024-09-19 Thread Mateusz Guzik

On Thu, Sep 19, 2024 at 04:14:05AM +0530, Neeraj Upadhyay wrote:
> On 9/18/2024 12:48 PM, Linus Torvalds wrote:
> > On Tue, 17 Sept 2024 at 16:34, Boqun Feng  wrote:
> >>
> >> This series introduces hazard pointers [1] to kernel space. A TL;DR
> >> description of hazard pointers is "a scalable refcounting mechanism
> >> with RCU-like API". More information can be found at [2].
> > 
> > Please give actual "this is useful for X, and here is an actual real
> > load with numbers showing why it matters".
> > 
> 
> One of the use cases where we have seen improvement is Nginx
> web server throughput scalability with AppArmor enabled. For this use
> case we see a refcount scalability problem when kref operations
> are done on the AppArmor label object in the Nginx worker's context. More
> details about this are captured in [1] and [2].
> 
> When we switch from kref to hazard pointer in apparmor_file_open(),
> we see ~7% improvement in Nginx throughput for this use case.
> 
> While we were working on this problem, this refcount scalability issue got
> resolved recently with conditional ref acquisition [3] (however, there are
> new developments in apparmor code which might bring back the refcount
> problem [4]).
> 

The open/close thing is still serializing across different processes,
the slowdown just got lower. As in apparmor *as is* continues to be a
problem at big enough scale.

Per my messages in the area in the past, I'm confident this is fixable
with changing the refcount model to cache ref changes per-thread. I
employed this very scheme $elsewhere.

Since an equivalent mechanism is applicable to creds, this may want to be
implemented as something under lib/. I even started to work on it for
Linux, but real life got in the way and then I could not be arsed to
finish.

It is a little reminiscent of per-cpu refs. Here is the outline again:

kref usage gets replaced with a tuple of { kref users; s64 refs; }

task_struct grows a pointer to the cached label and refs counter on it

When a new thread is created it bumps users and stores the pointer. On
destruction it decrements users and rolls up the local changes.
Similarly, if it turns out the label has to change during thread's
lifetime, the same thing happens.

In pseudo-code for apparmor_file_open():
    if (unlikely(current->aa_cached_label != check_label())) {
        /* do a replacement here */
    }
    /* just bump the local counter, no synchronisation with other
     * cpus in the common case */
    current->aa_cached_label_refs++;

In apparmor_file_close():
    /* common case fast path */
    if (file->aa_label == current->aa_cached_label) {
        current->aa_cached_label_refs--;
        return;
    }
    /* we get here if apparmor got reconfigured or this is a file we
     * inherited from another proc which had a different label and
     * this is the last fput */
    kref_put(file->aa_label);

Conceptually there is almost nothing to see here.
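
The outline above can be modelled in user space. What follows is a minimal sketch under stated assumptions, not the AppArmor or kernel implementation: struct label, struct thread_cache and every function name are hypothetical, C11 atomics stand in for kref, and the per-thread cache is passed explicitly instead of living in task_struct.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a label: the { users; refs; } tuple. */
struct label {
	atomic_int users;          /* threads that currently cache this label */
	atomic_int_fast64_t refs;  /* rolled-up references held by open files */
};

/* Hypothetical per-thread state; the outline keeps this in task_struct. */
struct thread_cache {
	struct label *label;
	int64_t local_refs;        /* unsynchronised delta, owned by one thread */
};

static void label_init(struct label *l)
{
	atomic_init(&l->users, 0);
	atomic_init(&l->refs, 0);
}

/* Thread creation (or label change): bump users and store the pointer. */
static void cache_attach(struct thread_cache *tc, struct label *l)
{
	atomic_fetch_add(&l->users, 1);
	tc->label = l;
	tc->local_refs = 0;
}

/* Thread destruction (or label change): roll up local changes, drop users. */
static void cache_detach(struct thread_cache *tc)
{
	struct label *l = tc->label;

	if (l == NULL)
		return;
	atomic_fetch_add(&l->refs, tc->local_refs);
	int prev = atomic_fetch_sub(&l->users, 1);
	assert(prev > 0);
	(void)prev;
	tc->label = NULL;
	tc->local_refs = 0;
}

/* Open: replace a stale cached label, then bump only the local counter. */
static struct label *file_open(struct thread_cache *tc, struct label *cur)
{
	if (tc->label != cur) {
		cache_detach(tc);
		cache_attach(tc, cur);
	}
	tc->local_refs++;
	return tc->label;
}

/* Close: local decrement in the common case, shared atomic otherwise. */
static void file_close(struct thread_cache *tc, struct label *file_label)
{
	if (file_label == tc->label) {
		tc->local_refs--;
		return;
	}
	atomic_fetch_sub(&file_label->refs, 1);
}
```

In this model a thread's local counter may go negative (it closed more files than it opened); the sum only has to balance once every thread has rolled up. Deciding when a label with users == 0 and refs == 0 can be freed is left out of the sketch.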

As outlined above, stale labels would clear themselves out as threads
open files. However, a thread which stubbornly refuses to allocate a
new file obj may hold on to a stale label indefinitely.

One way to sort it out:
I presume there is a spot somewhere in user<->kernel transition handling
which updates the credentials pointer, should it have changed.

$elsewhere I patched it up with a "cow" generation counter. If it does not
match the one in the real task struct, you know you need to take the slow
path and check creds, apparmor and whatever else. No extra branches in
the fast path, but a new int does have to be read. Given that
task_struct is a little bit of a cluster fuck I don't think it's a
problem.
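
One way to read the generation-counter idea is sketched below. This is an interpretation, not code from any tree: policy_gen, struct task_cache, policy_replace() and task_enter() are hypothetical names, and the "recheck creds, apparmor and whatever else" step is reduced to a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global generation, bumped whenever policy or creds change. */
static atomic_uint policy_gen;

/* Hypothetical per-task state: the last generation this task acted on. */
struct task_cache {
	unsigned int seen_gen;
	unsigned int slow_paths;   /* how often the recheck path was taken */
};

/* Updater side: publish that cached state may now be stale. */
static void policy_replace(void)
{
	atomic_fetch_add_explicit(&policy_gen, 1, memory_order_release);
}

/*
 * Called at a user<->kernel transition. The common case costs one extra
 * integer read and a compare; the recheck runs only on a mismatch.
 * Returns true when the slow path was taken.
 */
static bool task_enter(struct task_cache *t)
{
	unsigned int now;

	assert(t != NULL);
	now = atomic_load_explicit(&policy_gen, memory_order_acquire);
	if (now == t->seen_gen)
		return false;

	/* Slow path: this is where stale cached labels would be dropped. */
	t->seen_gen = now;
	t->slow_paths++;
	return true;
}
```

The point of the design is that a thread which never opens a file still notices a policy change the next time it crosses into the kernel.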

That would be a rough sketch, anyone interested can fill in the details.
This still performs serializing atomics in *certain* cases, but avoids
them in almost all cases and there is nothing complicated about this
that I see, just some effort to implement.

So I don't believe patching up RCU with hazard pointers is warranted if
apparmor is the only justification.

Anyway no ETA from my end, anyone interested is free to take the idea or
do better.

Re: [RFC PATCH 0/4] Add hazard pointers to kernel

2024-09-19 Thread Linus Torvalds

On Thu, 19 Sept 2024 at 16:15, Christoph Hellwig  wrote:
>
> Agreed.  From the description this would seem like a good fit for
> q_usage_counter in the block layer, which currently makes creative use
> of percpu counters.

Yes, if this actually could simplify code that currently used percpu
counters, that might be lovely.

The percpu counters often perform very well, but then have huge pain
in either managing the percpu allocation, or in trying to synchronize
across CPU's.

I'd be a lot more interested in "we can fix complex code" than in "we
have crappy code in bad subsystems where we can hide the performance
impact of the subsystem not having been done right".

   Linus

Re: [RFC PATCH 0/4] Add hazard pointers to kernel

2024-09-19 Thread Christoph Hellwig

On Wed, Sep 18, 2024 at 09:18:43AM +0200, Linus Torvalds wrote:
> On Tue, 17 Sept 2024 at 16:34, Boqun Feng  wrote:
> >
> > This series introduces hazard pointers [1] to kernel space. A TL;DR
> > description of hazard pointers is "a scalable refcounting mechanism
> > with RCU-like API". More information can be found at [2].
> 
> Please give actual "this is useful for X, and here is an actual real
> load with numbers showing why it matters".

Agreed.  From the description this would seem like a good fit for
q_usage_counter in the block layer, which currently makes creative use
of percpu counters.

Re: [RFC PATCH 0/4] Add hazard pointers to kernel

2024-09-18 Thread Linus Torvalds

On Thu, 19 Sept 2024 at 00:44, Neeraj Upadhyay  wrote:
>
> While we were working on this problem, this refcount scalability issue got
> resolved recently with conditional ref acquisition [3] (however, there are
> new developments in apparmor code which might bring back the refcount
> problem [4]).

Honestly, the various security layers should be a whole lot more
careful about their horrid performance issues, and I think that [4]
you point at needs to just be headed off at the pass.

No  more "the security layer is so bad at performance that we have to
introduce new ref mechanisms", please. Let's push back on bad security
layer code instead.

Linus

Re: [RFC PATCH 0/4] Add hazard pointers to kernel

2024-09-18 Thread Neeraj Upadhyay

On 9/18/2024 12:48 PM, Linus Torvalds wrote:
> On Tue, 17 Sept 2024 at 16:34, Boqun Feng  wrote:
>>
>> This series introduces hazard pointers [1] to kernel space. A TL;DR
>> description of hazard pointers is "a scalable refcounting mechanism
>> with RCU-like API". More information can be found at [2].
> 
> Please give actual "this is useful for X, and here is an actual real
> load with numbers showing why it matters".
> 

One of the use cases where we have seen improvement is Nginx
web server throughput scalability with AppArmor enabled. For this use
case we see a refcount scalability problem when kref operations
are done on the AppArmor label object in the Nginx worker's context. More
details about this are captured in [1] and [2].

When we switch from kref to hazard pointer in apparmor_file_open(),
we see ~7% improvement in Nginx throughput for this use case.

While we were working on this problem, this refcount scalability issue got
resolved  recently with conditional ref acquisition [3] (however, there are new
developments in apparmor code which might bring back the refcount problem [4]).

[1] https://lore.kernel.org/lkml/20240110111856.87370-7-neeraj.upadh...@amd.com/T/
[2] https://lore.kernel.org/lkml/20240916050811.473556-1-neeraj.upadh...@amd.com/
[3] https://lore.kernel.org/lkml/20240620131524.156312-1-mjgu...@gmail.com/
[4] https://lore.kernel.org/lkml/71c0ea18-8b8b-402b-b03c-029aeedc2...@canonical.com/

- Neeraj

> We don't just merge random infrastructure without a use-case and an
> argument for it.
> 
>  Linus

Re: [RFC PATCH 0/4] Add hazard pointers to kernel

2024-09-18 Thread Linus Torvalds

On Tue, 17 Sept 2024 at 16:34, Boqun Feng  wrote:
>
> This series introduces hazard pointers [1] to kernel space. A TL;DR
> description of hazard pointers is "a scalable refcounting mechanism
> with RCU-like API". More information can be found at [2].

Please give actual "this is useful for X, and here is an actual real
load with numbers showing why it matters".

We don't just merge random infrastructure without a use-case and an
argument for it.

 Linus

[RFC PATCH 0/4] Add hazard pointers to kernel

2024-09-17 Thread Boqun Feng

Hi,

This series introduces hazard pointers [1] to kernel space. A TL;DR
description of hazard pointers is "a scalable refcounting mechanism
with RCU-like API". More information can be found at [2].

The problem we are trying to resolve here is refcount scalability
issues that cannot be resolved simply by RCU or SRCU (maybe due to the
requirement of an unbounded protection duration). Neeraj has tried it on the
scalability issue [3] he has been working on, and he will share more
information in our LPC session [4] (and I will update the list later for
those who cannot make it to the session).
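
For readers new to the technique, the core protect/retire handshake of hazard pointers can be sketched in user space as below. This is an illustration of the general scheme from [1], not the hazptr API proposed in this series: hazard_slot, hp_protect(), hp_clear() and hp_try_free() are hypothetical names, the slot count is fixed, and default sequentially consistent C11 atomics are used for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NSLOTS 4   /* hypothetical fixed number of reader slots */

/* One published "I am using this pointer" slot per reader. */
static _Atomic(void *) hazard_slot[NSLOTS];

/*
 * Reader: publish the pointer, then re-read the source. If the source
 * changed in between, an updater may have missed the slot, so retry.
 */
static void *hp_protect(int slot, _Atomic(void *) *src)
{
	void *p;

	assert(slot >= 0 && slot < NSLOTS);
	do {
		p = atomic_load(src);
		atomic_store(&hazard_slot[slot], p);
	} while (p != atomic_load(src));
	return p;
}

/* Reader: done with the object, stop protecting it. */
static void hp_clear(int slot)
{
	assert(slot >= 0 && slot < NSLOTS);
	atomic_store(&hazard_slot[slot], NULL);
}

/*
 * Updater: call only after the object has been unpublished from every
 * shared location. Frees it if no reader slot still refers to it;
 * otherwise the caller must retry later.
 */
static bool hp_try_free(void *p)
{
	for (int i = 0; i < NSLOTS; i++)
		if (atomic_load(&hazard_slot[i]) == p)
			return false;
	free(p);
	return true;
}
```

Unlike an RCU read-side critical section, a reader here can hold its slot for an unbounded time while delaying the freeing of only that one object. The price is the store plus re-read on the reader side.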

My micro-benchmark shows the hazard pointers provide very good
scalability on par with percpu_ref/RCU/SRCU on the reader side:

(refscale in x86_64 + PREEMPT=y, avg reader duration in ns)
nreaders        1         8         32
percpu_ref      6.95123   10.0869   8.9674
rcu             2.97923   3.243     3.55077
hazptr          8.5991    8.40443   8.5762
srcu            16.7754   22.4807   20.2406

Things that we know are currently not working:

*   Handling module unload, probably needs a hazptr_barrier()
similar to rcu_barrier().

*   rcutorture support should be added to catch potential bugs (esp.
for callback handling).

*   Improvement for updater side performance: currently all
callbacks are handled in one work item; this can be improved by using
multiple work_structs or threads.

Of course, I might have created some bugs, so please take a look. I would
also love to hear anything on the current API. Any feedback is welcome!

Patch #1 is the implementation of hazptr. Paul and Neeraj contributed a
lot, but all bugs are mine ;-)

Patches #2-#3 add micro-benchmarks for hazptr and percpu_ref.

Patch #4 is a simple test I've used for development. I put it here just
in case someone wants to give it a quick try; eventually, we need to add
hazptr to rcutorture (or give it its own torture test) for more testing.

Regards,
Boqun

[1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
 lock-free objects," in IEEE Transactions on Parallel and
 Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
[2]: https://docs.google.com/document/d/113WFjGlAW4m72xNbZWHUSE-yU2HIJnWpiXp91ShtgeE/
[3]: https://lore.kernel.org/lkml/20240916050811.473556-1-neeraj.upadh...@amd.com/
[4]: https://lpc.events/event/18/contributions/1731/
[5]: Herlihy, Maurice, Victor Luchangco, and Mark Moir. "The repeat
 offender problem: A mechanism for supporting dynamic-sized,
 lock-free data structures." International Symposium on Distributed
 Computing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002.

Boqun Feng (4):
  hazptr: Add initial implementation of hazard pointers
  refscale: Add benchmarks for hazptr
  refscale: Add benchmarks for percpu_ref
  WIP: hazptr: Add hazptr test sample

 include/linux/hazptr.h   |  83 +++
 kernel/Makefile  |   1 +
 kernel/hazptr.c  | 463 +++++++
 kernel/rcu/refscale.c| 127 +-
 samples/Kconfig  |   6 +
 samples/Makefile |   1 +
 samples/hazptr/hazptr_test.c |  87 +++
 7 files changed, 767 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/hazptr.h
 create mode 100644 kernel/hazptr.c
 create mode 100644 samples/hazptr/hazptr_test.c

-- 
2.45.2

[PATCH v3 0/9] SEV Kernel Selftests

2024-09-05 Thread Pratik R. Sampat

This series primarily introduces SEV-SNP test for the kernel selftest
framework. It tests boot, ioctl, pre fault, and fallocate in various
combinations to exercise both positive and negative launch flow paths.

Patch 1 - Adds a wrapper for the ioctl calls that decouples the ioctls from
the asserts, which enables the use of negative test cases. No functional
change intended.
Patch 2 - Extends the sev smoke tests to use the SNP-specific ioctl
calls and sets up memory to boot an SNP guest VM
Patch 3 - Adds SNP to shutdown testing
Patch 4, 5 - Test the ioctl path for SEV, SEV-ES and SNP
Patch 6 - Adds support for SNP in KVM_SEV_INIT2 tests
Patch 7, 8, 9 - Enable prefault tests for SEV, SEV-ES and SNP

The patchset is rebased on top of kvm-x86/next branch.

v3:
1. Remove the assignments for the prefault and fallocate test type
   enums.
2. Fix error message for sev launch measure and finish.
3. Collect tested-by tags [Peter, Srikanth]

v2:
https://lore.kernel.org/kvm/20240816192310.117456-1-pratikrajesh.sam...@amd.com/
1. Add SMT parsing check to populate SNP policy flags
2. Extend Peter Gonda's shutdown test to include SNP
3. Introduce new tests for prefault which include exercising prefault,
   fallocate, hole-punch in various combinations.
4. Decouple ioctl patch reworked to introduce private variants of the
   functions that call into the ioctl. Also reordered the patch for
   it to arrive first so that new APIs are not rewritten right after
   their introduction.
5. General cleanups - adding comments, avoiding local booleans, better
   error message. Suggestions incorporated from Peter, Tom, and Sean.

RFC:
https://lore.kernel.org/kvm/20240710220540.188239-1-pratikrajesh.sam...@amd.com/

Any feedback/review is highly appreciated!

Michael Roth (2):
  KVM: selftests: Add interface to manually flag protected/encrypted
ranges
  KVM: selftests: Add a CoCo-specific test for KVM_PRE_FAULT_MEMORY

Pratik R. Sampat (7):
  KVM: selftests: Decouple SEV ioctls from asserts
  KVM: selftests: Add a basic SNP smoke test
  KVM: selftests: Add SNP to shutdown testing
  KVM: selftests: SEV IOCTL test
  KVM: selftests: SNP IOCTL test
  KVM: selftests: SEV-SNP test for KVM_SEV_INIT2
  KVM: selftests: Interleave fallocate for KVM_PRE_FAULT_MEMORY

 tools/testing/selftests/kvm/Makefile  |   1 +
 .../testing/selftests/kvm/include/kvm_util.h  |  13 +
 .../selftests/kvm/include/x86_64/processor.h  |   1 +
 .../selftests/kvm/include/x86_64/sev.h|  76 +++-
 tools/testing/selftests/kvm/lib/kvm_util.c|  53 ++-
 .../selftests/kvm/lib/x86_64/processor.c  |   6 +-
 tools/testing/selftests/kvm/lib/x86_64/sev.c  | 190 +++-
 .../kvm/x86_64/coco_pre_fault_memory_test.c   | 421 ++
 .../selftests/kvm/x86_64/sev_init2_tests.c|  13 +
 .../selftests/kvm/x86_64/sev_smoke_test.c | 297 +++-
 10 files changed, 1023 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/coco_pre_fault_memory_test.c

-- 
2.34.1

Re: [PATCH v4 1/2] virtiofs: use pages instead of pointer for kernel direct IO

2024-09-03 Thread Hou Tao

Hi,

On 9/3/2024 4:44 PM, Jingbo Xu wrote:
>
> On 8/31/24 5:37 PM, Hou Tao wrote:
>> From: Hou Tao 
>>
>> When trying to insert a 10MB kernel module kept in a virtio-fs with cache
>> disabled, the following warning was reported:
>>

SNIP
>>
>> Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
>> Signed-off-by: Hou Tao 
> Tested-by: Jingbo Xu 

Thanks for the test.
>
>
>> ---
>>  fs/fuse/file.c  | 62 +++--
>>  fs/fuse/fuse_i.h|  6 +
>>  fs/fuse/virtio_fs.c |  1 +
>>  3 files changed, 50 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index f39456c65ed7..331208d3e4d1 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -645,7 +645,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct 
>> file *file, loff_t pos,
>>  args->out_args[0].size = count;
>>  }
>>  
>> -

SNIP
>>  static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter 
>> *ii,
>> size_t *nbytesp, int write,
>> -   unsigned int max_pages)
>> +   unsigned int max_pages,
>> +   bool use_pages_for_kvec_io)
>>  {
>> +bool flush_or_invalidate = false;
>>  size_t nbytes = 0;  /* # bytes already packed in req */
>>  ssize_t ret = 0;
>>  
>> -/* Special case for kernel I/O: can copy directly into the buffer */
>> +/* Special case for kernel I/O: can copy directly into the buffer.
>> + * However if the implementation of fuse_conn requires pages instead of
>> + * pointer (e.g., virtio-fs), use iov_iter_extract_pages() instead.
>> + */
>>  if (iov_iter_is_kvec(ii)) {
>> -unsigned long user_addr = fuse_get_user_addr(ii);
>> -size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
>> +void *user_addr = (void *)fuse_get_user_addr(ii);
>>  
>> -if (write)
>> -ap->args.in_args[1].value = (void *) user_addr;
>> -else
>> -ap->args.out_args[0].value = (void *) user_addr;
>> +if (!use_pages_for_kvec_io) {
>> +size_t frag_size = fuse_get_frag_size(ii, *nbytesp);
>>  
>> -iov_iter_advance(ii, frag_size);
>> -*nbytesp = frag_size;
>> -return 0;
>> +if (write)
>> +ap->args.in_args[1].value = user_addr;
>> +else
>> +ap->args.out_args[0].value = user_addr;
>> +
>> +iov_iter_advance(ii, frag_size);
>> +*nbytesp = frag_size;
>> +return 0;
>> +}
>> +
>> +if (is_vmalloc_addr(user_addr)) {
>> +ap->args.vmap_base = user_addr;
>> +flush_or_invalidate = true;
> Could we move flush_kernel_vmap_range() upon here, so that
> flush_or_invalidate is not needed anymore and the code looks cleaner?

flush_kernel_vmap_range() needs to know the length of the flushed area;
if it is moved here, the length will be unknown.
>
>> +}
>>  }
>>  
>>  while (nbytes < *nbytesp && ap->num_pages < max_pages) {
>> @@ -1513,6 +1533,10 @@ static int fuse_get_user_pages(struct fuse_args_pages 
>> *ap, struct iov_iter *ii,
>>  (PAGE_SIZE - ret) & (PAGE_SIZE - 1);
>>  }
>>  
>> +if (write && flush_or_invalidate)
>> +flush_kernel_vmap_range(ap->args.vmap_base, nbytes);
>> +
>> +ap->args.invalidate_vmap = !write && flush_or_invalidate;
> How about initializing vmap_base only when the data buffer is vmalloced
> and it's a read request?  In this case invalidate_vmap is no longer needed.

You mean using the value of vmap_base to indicate whether invalidation
is needed or not, right? I prefer to keep it, because the extra
variable invalidate_vmap indicates the required action for the vmap area
and it doesn't increase the size of fuse_args.

>
>>  ap->args.is_pinned = iov_iter_extract_will_pin(ii);
>>  ap->args.user_pages = true;
>>  if (write)
>> @@ -1581,7 +1605,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct 
>> iov_iter *iter,
>>  size_t nbytes = min(count, nmax);
>>  
>>  err = fuse_get_user_pages(&ia->ap, i

Re: [PATCH v4 1/2] virtiofs: use pages instead of pointer for kernel direct IO

2024-09-03 Thread Jingbo Xu




On 8/31/24 5:37 PM, Hou Tao wrote:
> From: Hou Tao 
> 
> When trying to insert a 10MB kernel module kept in a virtio-fs with cache
> disabled, the following warning was reported:
> 
>   [ cut here ]
>   WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
>   Modules linked in:
>   CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
>   RIP: 0010:__alloc_pages+0x2bf/0x380
>   ..
>   Call Trace:
>
>? __warn+0x8e/0x150
>? __alloc_pages+0x2bf/0x380
>__kmalloc_large_node+0x86/0x160
>__kmalloc+0x33c/0x480
>virtio_fs_enqueue_req+0x240/0x6d0
>virtio_fs_wake_pending_and_unlock+0x7f/0x190
>queue_request_and_unlock+0x55/0x60
>fuse_simple_request+0x152/0x2b0
>fuse_direct_io+0x5d2/0x8c0
>fuse_file_read_iter+0x121/0x160
>__kernel_read+0x151/0x2d0
>kernel_read+0x45/0x50
>kernel_read_file+0x1a9/0x2a0
>init_module_from_file+0x6a/0xe0
>idempotent_init_module+0x175/0x230
>__x64_sys_finit_module+0x5d/0xb0
>x64_sys_call+0x1c3/0x9e0
>do_syscall_64+0x3d/0xc0
>entry_SYSCALL_64_after_hwframe+0x4b/0x53
>..
>
>   ---[ end trace  ]---
> 
> The warning is triggered as follows:
> 
> 1) syscall finit_module() handles the module insertion and it invokes
> kernel_read_file() to read the content of the module first.
> 
> 2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
> passes it to kernel_read(). kernel_read() constructs a kvec iter by
> using iov_iter_kvec() and passes it to fuse_file_read_iter().
> 
> 3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
> fuse_direct_io(). For now, the maximal read size for a kvec iter is
> only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
> fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
> the size of the 10MB-sized buffer in out_args[0] of a fuse request and
> passes the fuse request to virtio_fs_wake_pending_and_unlock().
> 
> 4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
> queue the request. Because virtiofs needs a DMA-able address,
> virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
> all fuse args, copies these args into the bounce buffer and passes the
> physical address of the bounce buffer to virtiofsd. The total length of
> these fuse args for the passed fuse request is about 10MB, so
> copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
> it triggers the warning in __alloc_pages():
> 
>   if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
>   return NULL;
> 
> 5) virtio_fs_enqueue_req() will retry the memory allocation in a
> kworker, but it won't help, because kmalloc() will always return NULL
> due to the abnormal size and finit_module() will hang forever.
> 
> A feasible solution is to limit the value of max_read for virtio-fs, so
> the length passed to kmalloc() will be limited. However, it will affect
> the maximal read size for normal reads. And a virtio-fs write initiated
> from the kernel has a similar problem, but currently there is no way to
> limit fc->max_write in the kernel.
> 
> So instead of limiting both the values of max_read and max_write in
> the kernel, introduce use_pages_for_kvec_io in fuse_conn and set it to
> true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use
> pages instead of a pointer to pass the KVEC_IO data.
> 
> After switching to pages for KVEC_IO data, these pages will be used for
> DMA through virtio-fs. If these pages are backed by vmalloc(),
> {flush|invalidate}_kernel_vmap_range() are necessary to flush or
> invalidate the cache before the DMA operation. So add two new fields in
> fuse_args_pages to record the base address of vmalloc area and the
> condition indicating whether invalidation is needed. Perform the flush
> in fuse_get_user_pages() for write operations and the invalidation in
> fuse_release_user_pages() for read operations.
> 
> It may seem necessary to introduce another field in fuse_conn to
> indicate that these KVEC_IO pages are used for DMA. However, considering
> that virtio-fs is currently the only user of use_pages_for_kvec_io, just
> reuse use_pages_for_kvec_io to indicate that these pages will be used
> for DMA.
> 
> Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
> Signed-off-by: Hou Tao 

Tested-by: Jingbo Xu 


> ---
>  fs/fuse/file.c  | 62 +++--
>  fs/fuse/fuse_i.h|  6 +
>  fs/fuse/virtio_fs.c |  1 +
>  3 files changed, 50 insertions(+), 19 deletions(-)
>
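
The WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp) check quoted in step 4 of the commit message can be checked with a little arithmetic. The sketch below is a user-space model, not kernel code: it assumes 4 KiB pages and a MAX_PAGE_ORDER of 10, which are common x86-64 defaults but are configuration dependent, and order_for_size() is a hypothetical helper that mimics what the kernel's get_order() computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT_ASSUMED      12   /* assumption: 4 KiB pages */
#define MAX_PAGE_ORDER_ASSUMED  10   /* assumption: common default */

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t pages;

	assert(size > 0);
	pages = ((size - 1) >> PAGE_SHIFT_ASSUMED) + 1;
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* True when a physically contiguous allocation of this size cannot work. */
static bool exceeds_max_order(size_t size)
{
	return order_for_size(size) > MAX_PAGE_ORDER_ASSUMED;
}
```

Under these assumptions a 10 MiB bounce buffer needs 2560 pages, which rounds up to order 12 (4096 pages), above the order-10 limit of 4 MiB. That is why retrying the allocation in a kworker can never succeed, and why passing pages instead of one contiguous buffer sidesteps the limit.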

[RFC 04/31] kernel/sys: Don't reference UTS_RELEASE directly

2024-09-02 Thread Josh Poimboeuf

Objtool will be getting a new feature to calculate build-time function
checksums, so each function can be uniquely identified.  A function's
checksum is calculated based on its instructions, jump/call targets,
alternatives, string literals, and more.

When there are any changes to the git working tree, UTS_RELEASE is
suffixed with "+".  That can result in an undesired changed checksum for
the functions which inline override_release() due to its direct
reference of the UTS_RELEASE string literal.

Convert the override_release() 'rest' variable to a static local so it
won't affect function checksums.

Signed-off-by: Josh Poimboeuf 
---
 kernel/sys.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 3a2df1bd9f64..526464ea194b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1291,7 +1291,7 @@ static int override_release(char __user *release, size_t len)
 	int ret = 0;
 
 	if (current->personality & UNAME26) {
-		const char *rest = UTS_RELEASE;
+		static const char *rest = UTS_RELEASE;
 		char buf[65] = { 0 };
 		int ndots = 0;
 		unsigned v;
-- 
2.45.2

[syzbot] [modules?] kernel panic: stack is corrupted in call_usermodehelper_exec

2024-08-29 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:    3b9dfd9e5936 Merge tag 'hwmon-for-v6.11-rc6' of git://git...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=141ab933980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d76559f775f44ba6
dashboard link: https://syzkaller.appspot.com/bug?extid=14d9438422f594f856bd
compiler:   Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17d8c77b98
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11034a3598

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-3b9dfd9e.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/3dab2f917732/vmlinux-3b9dfd9e.xz
kernel image: https://storage.googleapis.com/syzbot-assets/541828a1cf09/bzImage-3b9dfd9e.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/cc6a8f9d7bd9/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+14d9438422f594f85...@syzkaller.appspotmail.com

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: call_usermodehelper_exec+0x493/0x4a0
CPU: 0 UID: 0 PID: 5107 Comm: syz-executor310 Not tainted 6.11.0-rc5-syzkaller-00148-g3b9dfd9e5936 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
 panic+0x349/0x860 kernel/panic.c:354
 __stack_chk_fail+0x15/0x20 kernel/panic.c:827
 call_usermodehelper_exec+0x493/0x4a0
 call_modprobe kernel/module/kmod.c:103 [inline]
 __request_module+0x3ee/0x650 kernel/module/kmod.c:173
 ctrl_getfamily+0x28e/0x6b0 net/netlink/genetlink.c:1450
 genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
 genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2550
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
 netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
 netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 __sys_sendto+0x3a4/0x4f0 net/socket.c:2204
 __do_sys_sendto net/socket.c:2216 [inline]
 __se_sys_sendto net/socket.c:2212 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2212
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb2add42023
Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d 81 90 09 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
RSP: 002b:7ffe2a46ace8 EFLAGS: 0202 ORIG_RAX: 002c
RAX: ffda RBX: 7ffe2a46ad90 RCX: 7fb2add42023
RDX: 001c RSI: 7ffe2a46ade0 RDI: 0005
RBP: 0005 R08: 7ffe2a46ad04 R09: 000c
R10:  R11: 0202 R12: 
R13: 7ffe2a46ad58 R14: 7ffe2a46ade0 R15: 00000000
 
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Re: [PATCH 0/2] module: Split modules_install compression and in-kernel decompression

2024-08-19 Thread Luis Chamberlain

On Mon, Jul 22, 2024 at 11:06:20AM +0200, Petr Pavlu wrote:
> Allow enabling the in-kernel module decompression support separately,
> without requiring to enable also the automatic compression during
> 'make modules_install'.

Applied and pushed, thanks!

  Luis

Re: [PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO

2024-08-14 Thread Jingbo Xu




On 8/14/24 3:46 PM, Hou Tao wrote:
> Hi,
> 
> On 8/14/2024 2:34 PM, Jingbo Xu wrote:
>> Hi, Tao,
>>
>> On 4/26/24 10:39 PM, Hou Tao wrote:
>>> From: Hou Tao 
>>>
>>> Hi,
>>>
>>> The patch set aims to fix the warning related to an abnormal size
>>> parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing
>>> use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs.
>>> Besides the abnormal size parameter for kmalloc, the gfp parameter is
>>> also questionable: GFP_ATOMIC is used even when the allocation occurs
>>> in a kworker context. Patch #2 fixes it by using GFP_NOFS when the
>>> allocation is initiated by the kworker. For more details, please check
>>> the individual patches.
>>>
>>> As usual, comments are always welcome.
>>>
>>> Change Log:
>>>
>>> v3:
>>>  * introduce use_pages_for_kvec_io for virtiofs. When the option is
>>>    enabled, fuse will use iov_iter_extract_pages() to construct a page
>>>    array and pass the pages array instead of a pointer to virtiofs.
>>>    The benefit is twofold: the length of the data passed to virtiofs is
>>>    limited by max_pages, and there is no memory copy compared with v2.
>>>
>>> v2: https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/
>>>   * limit the length of ITER_KVEC dio by max_pages instead of the
>>>     newly-introduced max_nopage_rw. Using max_pages makes the ITER_KVEC
>>>     dio consistent with other rw operations.
>>>   * replace the kmalloc-allocated bounce buffer with a bounce buffer
>>>     backed by scattered pages when the length of the bounce buffer for
>>>     ITER_KVEC dio is larger than PAGE_SIZE, so even on hosts with
>>>     fragmented memory, the ITER_KVEC dio can be handled normally by
>>>     virtiofs. (Bernd Schubert)
>>>   * merge the GFP_NOFS patch [1] into this patch-set and use
>>>     memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
>>>     (Benjamin Coddington)
>>>
>>> v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/
>>>
>>> [1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/
>>>
>>> Hou Tao (2):
>>>   virtiofs: use pages instead of pointer for kernel direct IO
>>>   virtiofs: use GFP_NOFS when enqueuing request through kworker
>>>
>>>  fs/fuse/file.c  | 12 
>>>  fs/fuse/fuse_i.h|  3 +++
>>>  fs/fuse/virtio_fs.c | 25 -
>>>  3 files changed, 27 insertions(+), 13 deletions(-)
>>>
>> We also encountered the same issue as [1] recently when attempting to
>> insmod a ~6MB module that resides on a virtiofs filesystem.
>>
>> It would be very helpful if this issue had a standard fix
>> upstream.  I see from the mailing list thread that there will be a v4.
>> I would be glad to know if there is any update to this series.
> 
> Being busy with other stuff these days. I hope to send v4 before next
> weekend.

Many thanks, Tao.


-- 
Thanks,
Jingbo

Re: [PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO

2024-08-14 Thread Hou Tao

Hi,

On 8/14/2024 2:34 PM, Jingbo Xu wrote:
> Hi, Tao,
>
> On 4/26/24 10:39 PM, Hou Tao wrote:
>> From: Hou Tao 
>>
>> Hi,
>>
>> The patch set aims to fix the warning related to an abnormal size
>> parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing
>> use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs.
>> Besides the abnormal size parameter for kmalloc, the gfp parameter is
>> also questionable: GFP_ATOMIC is used even when the allocation occurs
>> in a kworker context. Patch #2 fixes it by using GFP_NOFS when the
>> allocation is initiated by the kworker. For more details, please check
>> the individual patches.
>>
>> As usual, comments are always welcome.
>>
>> Change Log:
>>
>> v3:
>>  * introduce use_pages_for_kvec_io for virtiofs. When the option is
>>    enabled, fuse will use iov_iter_extract_pages() to construct a page
>>    array and pass the pages array instead of a pointer to virtiofs.
>>    The benefit is twofold: the length of the data passed to virtiofs is
>>    limited by max_pages, and there is no memory copy compared with v2.
>>
>> v2: https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/
>>   * limit the length of ITER_KVEC dio by max_pages instead of the
>>     newly-introduced max_nopage_rw. Using max_pages makes the ITER_KVEC
>>     dio consistent with other rw operations.
>>   * replace the kmalloc-allocated bounce buffer with a bounce buffer
>>     backed by scattered pages when the length of the bounce buffer for
>>     ITER_KVEC dio is larger than PAGE_SIZE, so even on hosts with
>>     fragmented memory, the ITER_KVEC dio can be handled normally by
>>     virtiofs. (Bernd Schubert)
>>   * merge the GFP_NOFS patch [1] into this patch-set and use
>>     memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
>>     (Benjamin Coddington)
>>
>> v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/
>>
>> [1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/
>>
>> Hou Tao (2):
>>   virtiofs: use pages instead of pointer for kernel direct IO
>>   virtiofs: use GFP_NOFS when enqueuing request through kworker
>>
>>  fs/fuse/file.c  | 12 
>>  fs/fuse/fuse_i.h|  3 +++
>>  fs/fuse/virtio_fs.c | 25 -
>>  3 files changed, 27 insertions(+), 13 deletions(-)
>>
> We also encountered the same issue as [1] recently when attempting to
> insmod a ~6MB module that resides on a virtiofs filesystem.
>
> It would be very helpful if this issue had a standard fix
> upstream.  I see from the mailing list thread that there will be a v4.
> I would be glad to know if there is any update to this series.

Being busy with other stuff these days. I hope to send v4 before next
weekend.
>
> [1]
> https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/
>

Re: [PATCH v3 0/2] virtiofs: fix the warning for kernel direct IO

2024-08-13 Thread Jingbo Xu

Hi, Tao,

On 4/26/24 10:39 PM, Hou Tao wrote:
> From: Hou Tao 
> 
> Hi,
> 
> The patch set aims to fix the warning related to an abnormal size
> parameter of kmalloc() in virtiofs. Patch #1 fixes it by introducing
> use_pages_for_kvec_io option in fuse_conn and enabling it in virtiofs.
> Besides the abnormal size parameter for kmalloc, the gfp parameter is
> also questionable: GFP_ATOMIC is used even when the allocation occurs
> in a kworker context. Patch #2 fixes it by using GFP_NOFS when the
> allocation is initiated by the kworker. For more details, please check
> the individual patches.
> 
> As usual, comments are always welcome.
> 
> Change Log:
> 
> v3:
>  * introduce use_pages_for_kvec_io for virtiofs. When the option is
>    enabled, fuse will use iov_iter_extract_pages() to construct a page
>    array and pass the pages array instead of a pointer to virtiofs.
>    The benefit is twofold: the length of the data passed to virtiofs is
>    limited by max_pages, and there is no memory copy compared with v2.
> 
> v2: https://lore.kernel.org/linux-fsdevel/20240228144126.2864064-1-hou...@huaweicloud.com/
>   * limit the length of ITER_KVEC dio by max_pages instead of the
>     newly-introduced max_nopage_rw. Using max_pages makes the ITER_KVEC
>     dio consistent with other rw operations.
>   * replace the kmalloc-allocated bounce buffer with a bounce buffer
>     backed by scattered pages when the length of the bounce buffer for
>     ITER_KVEC dio is larger than PAGE_SIZE, so even on hosts with
>     fragmented memory, the ITER_KVEC dio can be handled normally by
>     virtiofs. (Bernd Schubert)
>   * merge the GFP_NOFS patch [1] into this patch-set and use
>     memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
>     (Benjamin Coddington)
> 
> v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/
> 
> [1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-hou...@huaweicloud.com/
> 
> Hou Tao (2):
>   virtiofs: use pages instead of pointer for kernel direct IO
>   virtiofs: use GFP_NOFS when enqueuing request through kworker
> 
>  fs/fuse/file.c  | 12 
>  fs/fuse/fuse_i.h|  3 +++
>  fs/fuse/virtio_fs.c | 25 -
>  3 files changed, 27 insertions(+), 13 deletions(-)
> 

We also encountered the same issue as [1] recently when attempting to
insmod a ~6MB module that resides on a virtiofs filesystem.

It would be very helpful if this issue had a standard fix
upstream.  I see from the mailing list thread that there will be a v4.
I would be glad to know if there is any update to this series.

[1]
https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-hou...@huaweicloud.com/

-- 
Thanks,
Jingbo
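
The quoted cover letter says that, with pages passed instead of a pointer,
the length of the data handed to virtiofs is limited by max_pages. A rough
userspace sketch of that arithmetic follows. It is only an illustration
under stated assumptions: sketch_clamp_to_max_pages() is a hypothetical
helper, not a fuse or virtiofs function, and the 4096-byte page size and
256-page limit used in the note below are example values, not figures taken
from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of a buffer starting at addr fit
 * into at most max_pages pages of page_size bytes each.
 */
static size_t sketch_clamp_to_max_pages(uintptr_t addr, size_t len,
					size_t page_size, size_t max_pages)
{
	size_t offset, room;

	assert(page_size > 0 && max_pages > 0);

	offset = addr % page_size;	/* bytes skipped in the first page */
	room = max_pages * page_size - offset;
	return len < room ? len : room;
}
```

Under those example values, a page-aligned 6 MiB request (roughly the module
size mentioned above) would be cut to 1 MiB per request, and a buffer that
starts 100 bytes into a page would get 100 bytes less than that.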

Re: [PATCH 1/2] module: Split modules_install compression and in-kernel decompression

2024-07-28 Thread Masahiro Yamada

On Thu, Jul 25, 2024 at 9:59 PM Petr Pavlu  wrote:
>
> On 7/22/24 12:23, Masahiro Yamada wrote:
> > On Mon, Jul 22, 2024 at 6:07 PM Petr Pavlu  wrote:
> >>
> >> The kernel configuration allows specifying a module compression mode. If
> >> one is selected then each module gets compressed during
> >> 'make modules_install' and additionally one can also enable support for
> >> a respective direct in-kernel decompression support. This means that the
> >> decompression support cannot be enabled without the automatic compression.
> >>
> >> Some distributions, such as the (open)SUSE family, use a signer service for
> >> modules. A build runs on a worker machine but signing is done by a separate
> >> locked-down server that is in possession of the signing key. The build
> >> invokes 'make modules_install' to create a modules tree, collects
> >> information about the modules, asks the signer service for their signature,
> >> appends each signature to the respective module and compresses all modules.
> >>
> >> When using this arrangement, the 'make modules_install' step produces
> >> unsigned+uncompressed modules and the distribution's own build recipe takes
> >> care of signing and compression later.
> >>
> >> The signing support can be currently enabled without automatically signing
> >> modules during 'make modules_install'. However, the in-kernel decompression
> >> support can be selected only after first enabling automatic compression
> >> during this step.
> >>
> >> To allow only enabling the in-kernel decompression support without the
> >> automatic compression during 'make modules_install', separate the
> >> compression options similarly to the signing options, as follows:
> >>
> >>> Enable loadable module support
> >> [*] Module compression
> >>   Module compression type (GZIP)  --->
> >> [*]   Automatically compress all modules
> >> [ ]   Support in-kernel module decompression
> >>
> >> * "Module compression" (MODULE_COMPRESS) is a new main switch for the
> >>   compression/decompression support. It replaces MODULE_COMPRESS_NONE.
> >> * "Module compression type" (MODULE_COMPRESS_) chooses the
> >>   compression type, one of GZ, XZ, ZSTD.
> >> * "Automatically compress all modules" (MODULE_COMPRESS_ALL) is a new
> >>   option to enable module compression during 'make modules_install'. It
> >>   defaults to Y.
> >> * "Support in-kernel module decompression" (MODULE_DECOMPRESS) enables
> >>   in-kernel decompression.
> >>
> >> Signed-off-by: Petr Pavlu 
> >> ---
> >
> >
> >
> > My preference is to add
> >  CONFIG_MODULE_DECOMPRESS_GZIP
> >  CONFIG_MODULE_DECOMPRESS_XZ
> >  CONFIG_MODULE_DECOMPRESS_ZSTD
> > instead of
> >  CONFIG_MODULE_COMPRESS_ALL.
> >
> >
> >
> >
> > For example,
> >
> >
> > if MODULE_DECOMPRESS
> >
> > config MODULE_DECOMPRESS_GZIP
> >    bool "Support in-kernel GZIP decompression for module"
> >default MODULE_COMPRESS_GZIP
> >
> > config MODULE_DECOMPRESS_XZ
> >bool "Support in-kernel XZ decompression for module"
> >default MODULE_COMPRESS_XZ
> >
> > config MODULE_DECOMPRESS_ZSTD
> >bool "Support in-kernel ZSTD decompression for module"
> >default MODULE_COMPRESS_ZSTD
> >
> > endif
> >
> >
> >
> >
> >
> > OR, maybe
> >
> >
> >
> > config MODULE_DECOMPRESS_GZIP
> >bool "Support in-kernel GZIP decompression for module"
> >    select MODULE_DECOMPRESS
> >
> > config MODULE_DECOMPRESS_XZ
> >bool "Support in-kernel XZ decompression for module"
> >select MODULE_DECOMPRESS
> >
> > config MODULE_DECOMPRESS_ZSTD
> >bool "Support in-kernel ZSTD decompression for module"
> >select MODULE_DECOMPRESS
> >
> > config MODULE_DECOMPRESS
> >bool
> >
> >
> >
> >
> > You can toggle MODULE_COMPRESS_GZIP and
> > MODULE_DECOMPRESS_GZIP independently
>
> I can implement this, but what would be a use case to enable multiple module
> decompression types in the kernel?


I just thought there might be a case where signer service A
compresses modules with GZIP, and signer service B with XZ, etc.

If the compression type is predictable at Kbuild time,
it is fine.




>
> >
> >
> > Of course, the current kernel/module/decompress.c does not
> > work when multiple (or zero) CONFIG_MODULE_DECOMPRESS_* options are
> > enabled. It needs a little modification.
>
> One issue is with the file /sys/module/compression which shows the module
> decompression type supported by the kernel. If multiple types are allowed then
> I think they should all get listed there. This could however create some
> compatibility problems. For instance, kmod reads this file and currently
> expects to find exactly one type, so it would need updating as well.


OK, understood. Then,

Acked-by: Masahiro Yamada 



--
Best Regards
Masahiro Yamada

Re: [PATCH 1/2] module: Split modules_install compression and in-kernel decompression

2024-07-25 Thread Petr Pavlu

On 7/22/24 12:23, Masahiro Yamada wrote:
> On Mon, Jul 22, 2024 at 6:07 PM Petr Pavlu  wrote:
>>
>> The kernel configuration allows specifying a module compression mode. If
>> one is selected then each module gets compressed during
>> 'make modules_install' and additionally one can also enable support for
>> a respective direct in-kernel decompression support. This means that the
>> decompression support cannot be enabled without the automatic compression.
>>
>> Some distributions, such as the (open)SUSE family, use a signer service for
>> modules. A build runs on a worker machine but signing is done by a separate
>> locked-down server that is in possession of the signing key. The build
>> invokes 'make modules_install' to create a modules tree, collects
>> information about the modules, asks the signer service for their signature,
>> appends each signature to the respective module and compresses all modules.
>>
>> When using this arrangement, the 'make modules_install' step produces
>> unsigned+uncompressed modules and the distribution's own build recipe takes
>> care of signing and compression later.
>>
>> The signing support can be currently enabled without automatically signing
>> modules during 'make modules_install'. However, the in-kernel decompression
>> support can be selected only after first enabling automatic compression
>> during this step.
>>
>> To allow only enabling the in-kernel decompression support without the
>> automatic compression during 'make modules_install', separate the
>> compression options similarly to the signing options, as follows:
>>
>>> Enable loadable module support
>> [*] Module compression
>>   Module compression type (GZIP)  --->
>> [*]   Automatically compress all modules
>> [ ]   Support in-kernel module decompression
>>
>> * "Module compression" (MODULE_COMPRESS) is a new main switch for the
>>   compression/decompression support. It replaces MODULE_COMPRESS_NONE.
>> * "Module compression type" (MODULE_COMPRESS_) chooses the
>>   compression type, one of GZ, XZ, ZSTD.
>> * "Automatically compress all modules" (MODULE_COMPRESS_ALL) is a new
>>   option to enable module compression during 'make modules_install'. It
>>   defaults to Y.
>> * "Support in-kernel module decompression" (MODULE_DECOMPRESS) enables
>>   in-kernel decompression.
>>
>> Signed-off-by: Petr Pavlu 
>> ---
> 
> 
> 
> My preference is to add
>  CONFIG_MODULE_DECOMPRESS_GZIP
>  CONFIG_MODULE_DECOMPRESS_XZ
>  CONFIG_MODULE_DECOMPRESS_ZSTD
> instead of
>  CONFIG_MODULE_COMPRESS_ALL.
> 
> 
> 
> 
> For example,
> 
> 
> if MODULE_DECOMPRESS
> 
> config MODULE_DECOMPRESS_GZIP
>bool "Support in-kernel GZIP decompression for module"
>default MODULE_COMPRESS_GZIP
> 
> config MODULE_DECOMPRESS_XZ
>bool "Support in-kernel XZ decompression for module"
>default MODULE_COMPRESS_XZ
> 
> config MODULE_DECOMPRESS_ZSTD
>bool "Support in-kernel ZSTD decompression for module"
>default MODULE_COMPRESS_ZSTD
> 
> endif
> 
> 
> 
> 
> 
> OR, maybe
> 
> 
> 
> config MODULE_DECOMPRESS_GZIP
>bool "Support in-kernel GZIP decompression for module"
>select MODULE_DECOMPRESS
> 
> config MODULE_DECOMPRESS_XZ
>bool "Support in-kernel XZ decompression for module"
>select MODULE_DECOMPRESS
> 
> config MODULE_DECOMPRESS_ZSTD
>bool "Support in-kernel ZSTD decompression for module"
>select MODULE_DECOMPRESS
> 
> config MODULE_DECOMPRESS
>bool
> 
> 
> 
> 
> You can toggle MODULE_COMPRESS_GZIP and
> MODULE_DECOMPRESS_GZIP independently

I can implement this, but what would be a use case to enable multiple module
decompression types in the kernel?

> 
> 
> Of course, the current kernel/module/decompress.c does not
> work when multiple (or zero) CONFIG_MODULE_DECOMPRESS_* options are
> enabled. It needs a little modification.

One issue is with the file /sys/module/compression which shows the module
decompression type supported by the kernel. If multiple types are allowed then
I think they should all get listed there. This could however create some
compatibility problems. For instance, kmod reads this file and currently
expects to find exactly one type, so it would need updating as well.

Thanks,
Petr
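
To illustrate the compatibility concern raised above, here is a small
sketch of a reader that expects exactly one type name in the text of
/sys/module/compression. This is not kmod's actual code: the enum, the
function name and the strictness rules are hypothetical, and the thread
does not define how multiple types would be listed.

```c
#include <assert.h>
#include <string.h>

enum sketch_comp {
	SKETCH_COMP_UNKNOWN,
	SKETCH_COMP_GZIP,
	SKETCH_COMP_XZ,
	SKETCH_COMP_ZSTD,
};

/*
 * Parse attribute text that is expected to hold exactly one type name,
 * optionally followed by a single trailing newline.
 */
static enum sketch_comp sketch_parse_compression(const char *text)
{
	size_t n;

	assert(text != NULL);

	n = strcspn(text, "\n");	/* length of the first line */
	if (text[n] == '\n' && text[n + 1] != '\0')
		return SKETCH_COMP_UNKNOWN;	/* more than one line */

	if (n == 4 && strncmp(text, "gzip", 4) == 0)
		return SKETCH_COMP_GZIP;
	if (n == 2 && strncmp(text, "xz", 2) == 0)
		return SKETCH_COMP_XZ;
	if (n == 4 && strncmp(text, "zstd", 4) == 0)
		return SKETCH_COMP_ZSTD;
	return SKETCH_COMP_UNKNOWN;	/* e.g. "gzip xz" is rejected */
}
```

A reader written this way treats any multi-type listing as unknown, which
is the kind of breakage that would require updating userspace tools if the
file ever listed several types.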

Re: [PATCH 1/2] module: Split modules_install compression and in-kernel decompression

2024-07-22 Thread Masahiro Yamada

On Mon, Jul 22, 2024 at 6:07 PM Petr Pavlu  wrote:
>
> The kernel configuration allows specifying a module compression mode. If
> one is selected then each module gets compressed during
> 'make modules_install' and additionally one can also enable support for
> a respective direct in-kernel decompression support. This means that the
> decompression support cannot be enabled without the automatic compression.
>
> Some distributions, such as the (open)SUSE family, use a signer service for
> modules. A build runs on a worker machine but signing is done by a separate
> locked-down server that is in possession of the signing key. The build
> invokes 'make modules_install' to create a modules tree, collects
> information about the modules, asks the signer service for their signature,
> appends each signature to the respective module and compresses all modules.
>
> When using this arrangement, the 'make modules_install' step produces
> unsigned+uncompressed modules and the distribution's own build recipe takes
> care of signing and compression later.
>
> The signing support can be currently enabled without automatically signing
> modules during 'make modules_install'. However, the in-kernel decompression
> support can be selected only after first enabling automatic compression
> during this step.
>
> To allow only enabling the in-kernel decompression support without the
> automatic compression during 'make modules_install', separate the
> compression options similarly to the signing options, as follows:
>
> > Enable loadable module support
> [*] Module compression
>   Module compression type (GZIP)  --->
> [*]   Automatically compress all modules
> [ ]   Support in-kernel module decompression
>
> * "Module compression" (MODULE_COMPRESS) is a new main switch for the
>   compression/decompression support. It replaces MODULE_COMPRESS_NONE.
> * "Module compression type" (MODULE_COMPRESS_) chooses the
>   compression type, one of GZ, XZ, ZSTD.
> * "Automatically compress all modules" (MODULE_COMPRESS_ALL) is a new
>   option to enable module compression during 'make modules_install'. It
>   defaults to Y.
> * "Support in-kernel module decompression" (MODULE_DECOMPRESS) enables
>   in-kernel decompression.
>
> Signed-off-by: Petr Pavlu 
> ---



My preference is to add
 CONFIG_MODULE_DECOMPRESS_GZIP
 CONFIG_MODULE_DECOMPRESS_XZ
 CONFIG_MODULE_DECOMPRESS_ZSTD
instead of
 CONFIG_MODULE_COMPRESS_ALL.




For example,


if MODULE_DECOMPRESS

config MODULE_DECOMPRESS_GZIP
   bool "Support in-kernel GZIP decompression for module"
   default MODULE_COMPRESS_GZIP

config MODULE_DECOMPRESS_XZ
   bool "Support in-kernel XZ decompression for module"
   default MODULE_COMPRESS_XZ

config MODULE_DECOMPRESS_ZSTD
   bool "Support in-kernel ZSTD decompression for module"
   default MODULE_COMPRESS_ZSTD

endif





OR, maybe



config MODULE_DECOMPRESS_GZIP
   bool "Support in-kernel GZIP decompression for module"
   select MODULE_DECOMPRESS

config MODULE_DECOMPRESS_XZ
   bool "Support in-kernel XZ decompression for module"
   select MODULE_DECOMPRESS

config MODULE_DECOMPRESS_ZSTD
   bool "Support in-kernel ZSTD decompression for module"
   select MODULE_DECOMPRESS

config MODULE_DECOMPRESS
   bool




You can toggle MODULE_COMPRESS_GZIP and
MODULE_DECOMPRESS_GZIP independently


Of course, the current kernel/module/decompress.c does not
work when multiple (or zero) CONFIG_MODULE_DECOMPRESS_* options are
enabled. It needs a little modification.


I will wait for Luis's comment.







>  kernel/module/Kconfig| 61 
>  scripts/Makefile.modinst |  2 ++
>  2 files changed, 33 insertions(+), 30 deletions(-)
>
> diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
> index 4047b6d48255..bb7f7930fef6 100644
> --- a/kernel/module/Kconfig
> +++ b/kernel/module/Kconfig
> @@ -278,64 +278,65 @@ config MODULE_SIG_HASH
> default "sha3-384" if MODULE_SIG_SHA3_384
> default "sha3-512" if MODULE_SIG_SHA3_512
>
> -choice
> -   prompt "Module compression mode"
> +config MODULE_COMPRESS
> +   bool "Module compression"
> help
> - This option allows you to choose the algorithm which will be used to
> - compress modules when 'make modules_install' is run. (or, you can
> - choose to not compress modules at all.)
> -
> - External modules will also be compressed in the same way during the
> - installation.
> -
> - For modules inside an initrd or initramfs

[PATCH 1/2] module: Split modules_install compression and in-kernel decompression

2024-07-22 Thread Petr Pavlu

The kernel configuration allows specifying a module compression mode. If
one is selected then each module gets compressed during
'make modules_install' and additionally one can also enable support for
a respective direct in-kernel decompression support. This means that the
decompression support cannot be enabled without the automatic compression.

Some distributions, such as the (open)SUSE family, use a signer service for
modules. A build runs on a worker machine but signing is done by a separate
locked-down server that is in possession of the signing key. The build
invokes 'make modules_install' to create a modules tree, collects
information about the modules, asks the signer service for their signature,
appends each signature to the respective module and compresses all modules.

When using this arrangement, the 'make modules_install' step produces
unsigned+uncompressed modules and the distribution's own build recipe takes
care of signing and compression later.

The signing support can be currently enabled without automatically signing
modules during 'make modules_install'. However, the in-kernel decompression
support can be selected only after first enabling automatic compression
during this step.

To allow only enabling the in-kernel decompression support without the
automatic compression during 'make modules_install', separate the
compression options similarly to the signing options, as follows:

> Enable loadable module support
[*] Module compression
  Module compression type (GZIP)  --->
[*]   Automatically compress all modules
[ ]   Support in-kernel module decompression

* "Module compression" (MODULE_COMPRESS) is a new main switch for the
  compression/decompression support. It replaces MODULE_COMPRESS_NONE.
* "Module compression type" (MODULE_COMPRESS_) chooses the
  compression type, one of GZ, XZ, ZSTD.
* "Automatically compress all modules" (MODULE_COMPRESS_ALL) is a new
  option to enable module compression during 'make modules_install'. It
  defaults to Y.
* "Support in-kernel module decompression" (MODULE_DECOMPRESS) enables
  in-kernel decompression.

Signed-off-by: Petr Pavlu 
---
 kernel/module/Kconfig| 61 ----
 scripts/Makefile.modinst |  2 ++
 2 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
index 4047b6d48255..bb7f7930fef6 100644
--- a/kernel/module/Kconfig
+++ b/kernel/module/Kconfig
@@ -278,64 +278,65 @@ config MODULE_SIG_HASH
default "sha3-384" if MODULE_SIG_SHA3_384
default "sha3-512" if MODULE_SIG_SHA3_512
 
-choice
-   prompt "Module compression mode"
+config MODULE_COMPRESS
+   bool "Module compression"
help
- This option allows you to choose the algorithm which will be used to
- compress modules when 'make modules_install' is run. (or, you can
- choose to not compress modules at all.)
-
- External modules will also be compressed in the same way during the
- installation.
-
- For modules inside an initrd or initramfs, it's more efficient to
- compress the whole initrd or initramfs instead.
-
+ Enable module compression to reduce on-disk size of module binaries.
  This is fully compatible with signed modules.
 
- Please note that the tool used to load modules needs to support the
- corresponding algorithm. module-init-tools MAY support gzip, and kmod
- MAY support gzip, xz and zstd.
+ The tool used to work with modules needs to support the selected
+ compression type. kmod MAY support gzip, xz and zstd. Other tools
+ might have a limited selection of the supported types.
 
- Your build system needs to provide the appropriate compression tool
- to compress the modules.
+ Note that for modules inside an initrd or initramfs, it's more
+ efficient to compress the whole ramdisk instead.
 
- If in doubt, select 'None'.
+ If unsure, say N.
 
-config MODULE_COMPRESS_NONE
-   bool "None"
+choice
+   prompt "Module compression type"
+   depends on MODULE_COMPRESS
help
- Do not compress modules. The installed modules are suffixed
- with .ko.
+ Choose the supported algorithm for module compression.
 
 config MODULE_COMPRESS_GZIP
bool "GZIP"
help
- Compress modules with GZIP. The installed modules are suffixed
- with .ko.gz.
+ Support modules compressed with GZIP. The installed modules are
+ suffixed with .ko.gz.
 
 config MODULE_COMPRESS_XZ
bool "XZ"
help
- Compress modules with XZ. The installed modules are suffixed
- with .ko.xz.
+ Support m

[PATCH 0/2] module: Split modules_install compression and in-kernel decompression

2024-07-22 Thread Petr Pavlu

Allow enabling the in-kernel module decompression support separately,
without requiring to enable also the automatic compression during
'make modules_install'.

Petr Pavlu (2):
  module: Split modules_install compression and in-kernel decompression
  module: Clean up the description of MODULE_SIG_

 kernel/module/Kconfig| 77 
 scripts/Makefile.modinst |  2 ++
 2 files changed, 41 insertions(+), 38 deletions(-)


base-commit: 933069701c1b507825b514317d4edd5d3fd9d417
-- 
2.35.3

Re: [PATCH 01/17] mm: move kernel/numa.c to mm/

2024-07-19 Thread Jonathan Cameron

On Tue, 16 Jul 2024 14:13:30 +0300
Mike Rapoport  wrote:

> From: "Mike Rapoport (Microsoft)" 
> 
> The stub functions in kernel/numa.c belong to mm/ rather than to kernel/
> 
> Signed-off-by: Mike Rapoport (Microsoft) 

Makes sense + all arch specific implementations are in arch/*/mm not
arch/*/kernel so this makes it more consistent with that.

Reviewed-by: Jonathan Cameron

Re: [PATCH 01/17] mm: move kernel/numa.c to mm/

2024-07-17 Thread David Hildenbrand


On 16.07.24 13:13, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)"
>
> The stub functions in kernel/numa.c belong to mm/ rather than to kernel/
>
> Signed-off-by: Mike Rapoport (Microsoft)
> ---

Acked-by: David Hildenbrand 

--
Cheers,

David / dhildenb

Re: [BUG REPORT] kernel BUG at lib/dynamic_queue_limits.c:99!

2024-07-16 Thread xiujianfeng

Hi,

On 2024/7/13 8:44, Jakub Kicinski wrote:
> On Fri, 12 Jul 2024 17:43:21 -0700 Jakub Kicinski wrote:
>> CC: virtio_net maintainers and Jiri who added BQL
> 
> Oh, sounds like the fix may be already posted:
> https://lore.kernel.org/all/20240712080329.197605-2-jean-phili...@linaro.org/

Thanks, this patch indeed resolved the issue.

[PATCH 01/17] mm: move kernel/numa.c to mm/

2024-07-16 Thread Mike Rapoport

From: "Mike Rapoport (Microsoft)" 

The stub functions in kernel/numa.c belong to mm/ rather than to kernel/

Signed-off-by: Mike Rapoport (Microsoft) 
---
 kernel/Makefile   | 1 -
 mm/Makefile   | 1 +
 {kernel => mm}/numa.c | 0
 3 files changed, 1 insertion(+), 1 deletion(-)
 rename {kernel => mm}/numa.c (100%)

diff --git a/kernel/Makefile b/kernel/Makefile
index 3c13240dfc9f..87866b037fbe 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -116,7 +116,6 @@ obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o
 obj-$(CONFIG_CFI_CLANG) += cfi.o
-obj-$(CONFIG_NUMA) += numa.o
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/mm/Makefile b/mm/Makefile
index 8fb85acda1b1..773b3b267438 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -139,3 +139,4 @@ obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
 obj-$(CONFIG_EXECMEM) += execmem.o
+obj-$(CONFIG_NUMA) += numa.o
diff --git a/kernel/numa.c b/mm/numa.c
similarity index 100%
rename from kernel/numa.c
rename to mm/numa.c
-- 
2.43.0

Re: [BUG REPORT] kernel BUG at lib/dynamic_queue_limits.c:99!

2024-07-12 Thread Jakub Kicinski

On Fri, 12 Jul 2024 17:43:21 -0700 Jakub Kicinski wrote:
> CC: virtio_net maintainers and Jiri who added BQL

Oh, sounds like the fix may be already posted:
https://lore.kernel.org/all/20240712080329.197605-2-jean-phili...@linaro.org/
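
For readers following the trace: the BUG fires inside dql_completed(),
reached from virtnet_poll() via __free_old_xmit(). The thread does not say
which check sits at line 99, so the following is only a userspace sketch,
assuming it is the byte queue limits accounting invariant that a completion
may never exceed what was queued. The dql_model names are hypothetical and
this is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the fields loosely mirror the BQL counters. */
struct dql_model {
	unsigned int num_queued;    /* total objects ever queued */
	unsigned int num_completed; /* total objects ever completed */
};

/* Account for newly queued objects. */
static void dql_model_queued(struct dql_model *dql, unsigned int count)
{
	assert(dql != NULL);
	dql->num_queued += count;
}

/*
 * Account for completed objects. Returns false in the situation where
 * the kernel would be expected to BUG (assumption): more is being
 * completed than is currently in flight.
 */
static bool dql_model_completed(struct dql_model *dql, unsigned int count)
{
	unsigned int in_flight;

	assert(dql != NULL);
	in_flight = dql->num_queued - dql->num_completed;
	if (count > in_flight)
		return false;

	dql->num_completed += count;
	return true;
}
```

In this model, a driver that reports completions for packets it never
accounted as queued trips the check on the first such completion.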

Re: [BUG REPORT] kernel BUG at lib/dynamic_queue_limits.c:99!

2024-07-12 Thread Jakub Kicinski

CC: virtio_net maintainers and Jiri who added BQL

On Fri, 12 Jul 2024 10:12:42 +0800 xiujianfeng wrote:
> On 2024/7/12 10:08, xiujianfeng wrote:
> > I found a problem with my QEMU environment, and the log is as follows.
> > 
> > After I did the bisect to locate the issue, I found
> > 8490dd0592e85e0cceefa6b48d66dbdd73df0fb3 is the first bad commit,
> > however this is a merge commit, and I cannot further confirm which
> > specific commit caused this issue.  
> 
> It's on
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git and
> the base commit is f477dd6eede3
> 
> > 
> > [ cut here ]
> > kernel BUG at lib/dynamic_queue_limits.c:99!
> > Oops: invalid opcode:  [#1] PREEMPT SMP NOPTI
> > CPU: 1 UID: 0 PID: 203 Comm: ip Not tainted
> > 6.10.0-rc7-next-20240711-12643-gf477dd6eede3 #613
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
> > 04/01/2014
> > RIP: 0010:dql_completed+0x212/0x230
> > Code: 41 1c 01 48 89 57 58 e9 85 fe ff ff 85 ed 40 0f 95 c5 41 39 d8 0f
> > 95 c1 40 84 cd 74 05 45 85 e4 78 0a 44 89 d9 e9 67 fe fe
> > RSP: 0018:c90f0d70 EFLAGS: 0213
> > RAX:  RBX: 88800413b800 RCX: 888005925240
> > RDX:  RSI: 81df1116 RDI: 888003a0d700
> > RBP: 888003a0d600 R08:  R09: 
> > R10:  R11: 88800a403c90 R12: 0001
> > R13: c90f0db0 R14: 888003a0d680 R15: 88803cc8
> > FS:  7fcf4229f1c0() GS:88803cc8() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 5596d60d1290 CR3: 093c CR4: 06f0
> > Call Trace:
> >  
> >  ? die+0x32/0x90
> >  ? do_trap+0xdc/0x100
> >  ? dql_completed+0x212/0x230
> >  ? do_error_trap+0x60/0x80
> >  ? dql_completed+0x212/0x230
> >  ? exc_invalid_op+0x4f/0x70
> >  ? dql_completed+0x212/0x230
> >  ? asm_exc_invalid_op+0x1a/0x20
> >  ? dql_completed+0x212/0x230
> >  __free_old_xmit+0xb2/0x120
> >  free_old_xmit+0x23/0x70
> >  ? _raw_spin_trylock+0x46/0x60
> >  virtnet_poll+0xe0/0x590
> >  ? update_curr+0xf9/0x1c0
> >  ? find_held_lock+0x2b/0x80
> >  __napi_poll+0x25/0x160
> >  net_rx_action+0x177/0x310
> >  ? clockevents_program_event+0x53/0x100
> >  ? lock_release+0xa4/0x1d0
> >  ? ktime_get+0x76/0x100
> >  ? lapic_next_event+0x10/0x20
> >  handle_softirqs+0xd0/0x210
> >  do_softirq+0x3b/0x60
> >  
> >  
> >  __local_bh_enable_ip+0x55/0x70
> >  virtnet_open+0xac/0x2d0
> >  __dev_open+0xda/0x190
> >  __dev_change_flags+0x1b3/0x230
> >  ? __pfx_stack_trace_consume_entry+0x10/0x10
> >  ? arch_stack_walk+0x9d/0xf0
> >  dev_change_flags+0x20/0x60
> >  do_setlink+0x27e/0x1120
> >  ? set_track_prepare+0x3b/0x60
> >  ? rtnl_newlink+0x5a/0xa0
> >  ? rtnetlink_rcv_msg+0x199/0x4c0
> >  ? __nla_validate_parse+0x5e/0xed0
> >  ? netlink_sendmsg+0x1e3/0x420
> >  ? __sock_sendmsg+0x5e/0x60
> >  ? sys_sendmsg+0x1da/0x210
> >  ? ___sys_sendmsg+0x7b/0xc0
> >  ? __sys_sendmsg+0x50/0x90
> >  ? do_syscall_64+0x4b/0x110
> >  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >  __rtnl_newlink+0x50d/0x990
> >  ? __kmalloc_cache_noprof+0x1a0/0x260
> >  ? __kmalloc_cache_noprof+0x204/0x260
> >  ? rtnetlink_rcv_msg+0x14e/0x4c0
> >  ? rtnl_newlink+0x5a/0xa0
> >  rtnl_newlink+0x73/0xa0
> >  rtnetlink_rcv_msg+0x199/0x4c0
> >  ? find_held_lock+0x2b/0x80
> >  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> >  netlink_rcv_skb+0x56/0x100
> >  ? netlink_unicast+0x69/0x3a0
> >  netlink_unicast+0x283/0x3a0
> >  netlink_sendmsg+0x1e3/0x420
> >  __sock_sendmsg+0x5e/0x60
> >  sys_sendmsg+0x1da/0x210
> >  ? copy_msghdr_from_user+0x68/0xa0
> >  ___sys_sendmsg+0x7b/0xc0
> >  ? stack_depot_save_flags+0x2e/0x8a0
> >  ? check_bytes_and_report.constprop.0+0x48/0x120
> >  ? check_object+0xb5/0x3a0
> >  ? find_held_lock+0x2b/0x80
> >  __sys_sendmsg+0x50/0x90
> >  do_syscall_64+0x4b/0x110
> >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > RIP: 0033:0x7fcf423c7f03
> > Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00
> > 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 08
> > RSP: 002b:7ffcbfa59528 EFLAGS: 0246 ORIG_RAX: 002e
> > RAX: ffda RBX:  RCX: 7fcf423c7f03
> > RDX:  RSI: 7ffcbfa59590 RDI: 0003
> > RBP: 00

Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module

2024-06-28 Thread Luis Chamberlain

On Fri, Jun 21, 2024 at 04:05:27PM +0200, Daniel von Kirschten wrote:
> On 18.06.2024 at 21:58, Luis Chamberlain wrote:
> > On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote:
> > > If a module is being loaded, and the .gnu.linkonce.this_module section
> > > in the module's ELF file does not have the WRITE flag, the kernel will
> > > map the finished module struct of that module as read-only.
> > > This causes a kernel panic when the struct is written to the first time
> > > after it has been marked read-only. Currently this happens in
> > > complete_formation in kernel/module/main.c:2765 when the module's state is
> > > set to MODULE_STATE_COMING, just after setting up the memory protections.
> > 
> > How did you find this issue?
> 
> In a university course I got the assignment to manually craft a loadable .ko
> file, given only a regular object file, without using Kbuild. During testing
> my module files, most of them were simply (correctly) rejected by the kernel
> with an appropriate error message, but at some point I ran into this exact
> kernel panic, and investigated it to understand why my module file was
> invalid.

OK, then the commit log should describe that this doesn't fix any known
real world issue, but rather a custom crafted module without the regular
module build system.

> > > Down the line, this seems to lead to unpredictable freezes when trying to
> > > load other modules - I guess this is due to some structures not being
> > > cleaned up properly, but I didn't investigate this further.
> > > 
> > > A check already exists which verifies that .gnu.linkonce.this_module
> > > is ALLOC. This patch simply adds an analogous check for WRITE.
> > 
> > Can you check to ensure our modules generated have a respective check to
> > ensure this check exists at build time? That would proactively inform
> > userspace when a built module is not built correctly, and the tool
> > responsible can be identified.
> 
> See above - I don't think it's possible to create such a broken module file
> with any of "official" tools.

That should be clearly stated on the commit log.

> I haven't looked too deeply into how Kbuild
> actually builds modules, but as far as I know, the user doesn't even come
> into contact with this_module

Consider that a next-level university assignment, and one more useful to the
world than this debug message. Since above you only say "I don't think", go
out and make sure.

> when using the regular toolchain, because
> Kbuild is responsible for creating the .this_module section. And Kbuild of
> course creates it with the correct flags. So if I understand correctly,

...

> this
> problem can only occur when the module was built by some external tooling
> (or manually, in my case).

Who would create custom modules without the Linux kernel module build
system, and what uses does that provide? It seems you are proving why
this would be a terribly silly thing to do.

Now, the *value* of your change is that it can prevent a crash in the case
of a corrupted module, which *can* occur: consider an odd live filesystem
corruption. At least this would be caught at module load time and would
not crash. That's worth committing for this reason, but your commit
log really needs much more clarity. Why? Because stupid bots want to
assign stupid CVEs for anything that seems like a security issue, and
this could escalate to that sort of thing. Providing clarity helps
system integrators decide if they want to backport this sort of patch.
Providing clarity on the chances of this happening and how we think it
can happen helps a lot.

If you want to be more proactive, try to enhance userspace kmod modprobe
so that this is also verified.

  Luis
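
A userspace check along the lines suggested above could look roughly like
this. It is a minimal sketch, assuming a 64-bit ELF module whose
.gnu.linkonce.this_module section header has already been located; the
helper name this_module_flags_ok() is hypothetical and is not part of kmod
or the kernel.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true only when the section header carries
 * both SHF_ALLOC and SHF_WRITE, the two flags discussed in this thread
 * for the .gnu.linkonce.this_module section.
 */
static bool this_module_flags_ok(const Elf64_Shdr *shdr)
{
	const Elf64_Xword required = SHF_ALLOC | SHF_WRITE;

	assert(shdr != NULL);
	return (shdr->sh_flags & required) == required;
}
```

A caller would walk the section header table, match the section name
against ".gnu.linkonce.this_module", and refuse to load the file when the
helper returns false.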

Re: (subset) [PATCH v9 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-06-25 Thread Bjorn Andersson



On Sat, 22 Jun 2024 01:03:39 +0300, Dmitry Baryshkov wrote:
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
> 
> The existing userspace implementation has several issues. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
> 
> [...]

Applied, thanks!

[5/5] remoteproc: qcom: enable in-kernel PD mapper
  commit: 5b9f51b200dcb2c3924ecbff324fa52f1faa84d3

Best regards,
-- 
Bjorn Andersson

Re: (subset) [PATCH v9 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-06-24 Thread Bjorn Andersson



On Sat, 22 Jun 2024 01:03:39 +0300, Dmitry Baryshkov wrote:
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
> 
> The existing userspace implementation has several issues. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
> 
> [...]

Applied, thanks!

[1/5] soc: qcom: pdr: protect locator_addr with the main mutex
  commit: 107924c14e3ddd85119ca43c26a4ee1056fa9b84
[2/5] soc: qcom: pdr: fix parsing of domains lists
  commit: 57f20d51f35780f240ecf39d81cda23612800a92
[3/5] soc: qcom: pdr: extract PDR message marshalling data
  commit: 0ac5c7d933de6570e0efa62bb5ef9e440311a6fe
[4/5] soc: qcom: add pd-mapper implementation
  commit: 1ebcde047c547134e894508468ead0b7bd3b967d

Best regards,
-- 
Bjorn Andersson

[PATCH v9 5/5] remoteproc: qcom: enable in-kernel PD mapper

2024-06-21 Thread Dmitry Baryshkov

Request in-kernel protection domain mapper to be started before starting
Qualcomm DSP and release it once DSP is stopped. Once all DSPs are
stopped, the PD mapper will be stopped too.

Reviewed-by: Chris Lew 
Tested-by: Steev Klimaszewski 
Tested-by: Neil Armstrong  # on SM8550-QRD
Signed-off-by: Dmitry Baryshkov 
---
 drivers/remoteproc/qcom_common.c| 87 +
 drivers/remoteproc/qcom_common.h| 10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |  3 ++
 drivers/remoteproc/qcom_q6v5_mss.c  |  3 ++
 drivers/remoteproc/qcom_q6v5_pas.c  |  3 ++
 drivers/remoteproc/qcom_q6v5_wcss.c |  3 ++
 6 files changed, 109 insertions(+)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 03e5f5d533eb..8c8688f99f0a 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -25,6 +26,7 @@
 #define to_glink_subdev(d) container_of(d, struct qcom_rproc_glink, subdev)
 #define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev)
 #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev)
+#define to_pdm_subdev(d) container_of(d, struct qcom_rproc_pdm, subdev)
 
 #define MAX_NUM_OF_SS   10
 #define MAX_REGION_NAME_LENGTH  16
@@ -519,5 +521,90 @@ void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr)
 }
 EXPORT_SYMBOL_GPL(qcom_remove_ssr_subdev);
 
+static void pdm_dev_release(struct device *dev)
+{
+   struct auxiliary_device *adev = to_auxiliary_dev(dev);
+
+   kfree(adev);
+}
+
+static int pdm_notify_prepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+   struct auxiliary_device *adev;
+   int ret;
+
+   adev = kzalloc(sizeof(*adev), GFP_KERNEL);
+   if (!adev)
+   return -ENOMEM;
+
+   adev->dev.parent = pdm->dev;
+   adev->dev.release = pdm_dev_release;
+   adev->name = "pd-mapper";
+   adev->id = pdm->index;
+
+   ret = auxiliary_device_init(adev);
+   if (ret) {
+   kfree(adev);
+   return ret;
+   }
+
+   ret = auxiliary_device_add(adev);
+   if (ret) {
+   auxiliary_device_uninit(adev);
+   return ret;
+   }
+
+   pdm->adev = adev;
+
+   return 0;
+}
+
+
+static void pdm_notify_unprepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+
+   if (!pdm->adev)
+   return;
+
+   auxiliary_device_delete(pdm->adev);
+   auxiliary_device_uninit(pdm->adev);
+   pdm->adev = NULL;
+}
+
+/**
+ * qcom_add_pdm_subdev() - register PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm:   PDM subdevice handle
+ *
+ * Register @pdm so that the Protection Domain mapper service is started when
+ * the DSP is started.
+ */
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+   pdm->dev = &rproc->dev;
+   pdm->index = rproc->index;
+
+   pdm->subdev.prepare = pdm_notify_prepare;
+   pdm->subdev.unprepare = pdm_notify_unprepare;
+
+   rproc_add_subdev(rproc, &pdm->subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_add_pdm_subdev);
+
+/**
+ * qcom_remove_pdm_subdev() - remove PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm:   PDM subdevice handle
+ *
+ * Remove the PD Mapper subdevice.
+ */
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+   rproc_remove_subdev(rproc, &pdm->subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_remove_pdm_subdev);
+
 MODULE_DESCRIPTION("Qualcomm Remoteproc helper driver");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/remoteproc/qcom_common.h b/drivers/remoteproc/qcom_common.h
index 9ef4449052a9..b07fbaa091a0 100644
--- a/drivers/remoteproc/qcom_common.h
+++ b/drivers/remoteproc/qcom_common.h
@@ -34,6 +34,13 @@ struct qcom_rproc_ssr {
struct qcom_ssr_subsystem *info;
 };
 
+struct qcom_rproc_pdm {
+   struct rproc_subdev subdev;
+   struct device *dev;
+   int index;
+   struct auxiliary_device *adev;
+};
+
 void qcom_minidump(struct rproc *rproc, unsigned int minidump_id,
void (*rproc_dumpfn_t)(struct rproc *rproc,
struct rproc_dump_segment *segment, void *dest, size_t offset,
@@ -52,6 +59,9 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr,
 const char *ssr_name);
 void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr);
 
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+
 #if IS_ENABLED(CONFIG_QCOM_SYSMON)
 struct qcom_sysmon *qcom_add_sysmon_subdev(struct rproc *rproc,

[PATCH v9 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-06-21 Thread Dmitry Baryshkov

Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.
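
As an illustration of what "static per-platform data" can mean, here is a
minimal userspace sketch, assuming a simple table keyed by a platform
identifier. Every name and table entry below is hypothetical and is not
taken from qcom_pd_mapper.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names and entries here are hypothetical, for illustration only. */
struct pd_platform_model {
	const char *compatible;     /* platform identifier */
	const char *const *domains; /* NULL-terminated list of domains */
};

static const char *const example_a_domains[] = {
	"adsp/root_pd", "adsp/audio_pd", NULL
};
static const char *const example_b_domains[] = {
	"adsp/root_pd", "cdsp/root_pd", NULL
};
static const char *const no_domains[] = { NULL };

static const struct pd_platform_model pd_platforms[] = {
	{ "vendor,example-a", example_a_domains },
	{ "vendor,example-b", example_b_domains },
	/* Known platform that needs no protection domain mapping. */
	{ "vendor,example-none", no_domains },
};

/* Returns the domain list for a platform, or NULL if it is unknown. */
static const char *const *pd_lookup_domains(const char *compatible)
{
	size_t i;

	assert(compatible != NULL);
	for (i = 0; i < sizeof(pd_platforms) / sizeof(pd_platforms[0]); i++) {
		if (strcmp(pd_platforms[i].compatible, compatible) == 0)
			return pd_platforms[i].domains;
	}
	return NULL;
}
```

Listing a platform with an empty domain list, as the v6 changelog below
describes, lets a lookup tell "known, nothing to map" apart from "unknown
platform".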

---
Changes in v9:
- Adjust locking in pdr_get_domain_list(), releasing the mutex right
  after qmi_send_request() (Chris Lew)
- Link to v8: https://lore.kernel.org/r/20240512-qcom-pd-mapper-v8-0-5ecbb276f...@linaro.org

Changes in v8:
- Reworked pd-mapper to register as an rproc_subdev / auxdev
- Dropped Tested-by from Steev and Alexey from the last patch since the
  implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey
  Minnekhanov)
- Added platform entry for sm4250 (used for qrb4210 / RB2)
- Added locking to the pdr_get_domain_list() (Chris Lew)
- Remove the call to qmi_del_server() and corresponding API (Chris Lew)
- In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
- Link to v7: https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5: https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
  (Bjorn)
- Link to v4: https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org

---
Dmitry Baryshkov (5):
  soc: qcom: pdr: protect locator_addr with the main mutex
  soc: qcom: pdr: fix parsing of domains lists
  soc: qcom: pdr: extract PDR message marshalling data
  soc: qcom: add pd-mapper implementation
  remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/qcom_common.c|  87 +
 drivers/remoteproc/qcom_common.h|  10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
 drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
 drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
 drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
 drivers/soc/qcom/Kconfig|  15 +
 drivers/soc/qcom/Makefile   |   2 +
 drivers/soc/qcom/pdr_interface.c|   8 +-
 drivers/soc/qcom/pdr_internal.h | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 676 
 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
 12 files changed, 1183 insertions(+), 298 deletions(-)
---
base-commit: 2102cb0d050d34d50b9642a3a50861787527e922
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,
-- 
Dmitry Baryshkov

Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module

2024-06-21 Thread Daniel von Kirschten


On 18.06.2024 at 21:58, Luis Chamberlain wrote:
> On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote:
> > If a module is being loaded, and the .gnu.linkonce.this_module section
> > in the module's ELF file does not have the WRITE flag, the kernel will
> > map the finished module struct of that module as read-only.
> > This causes a kernel panic when the struct is written to the first time
> > after it has been marked read-only. Currently this happens in
> > complete_formation in kernel/module/main.c:2765 when the module's state is
> > set to MODULE_STATE_COMING, just after setting up the memory protections.
>
> How did you find this issue?


In a university course I got the assignment to manually craft a loadable 
.ko file, given only a regular object file, without using Kbuild. During 
testing my module files, most of them were simply (correctly) rejected 
by the kernel with an appropriate error message, but at some point I ran 
into this exact kernel panic, and investigated it to understand why my 
module file was invalid.





> > Down the line, this seems to lead to unpredictable freezes when trying to
> > load other modules - I guess this is due to some structures not being
> > cleaned up properly, but I didn't investigate this further.
> >
> > A check already exists which verifies that .gnu.linkonce.this_module
> > is ALLOC. This patch simply adds an analogous check for WRITE.
>
> Can you check to ensure our modules generated have a respective check to
> ensure this check exists at build time? That would proactively inform
> userspace when a built module is not built correctly, and the tool
> responsible can be identified.


See above - I don't think it's possible to create such a broken module
file with any of the "official" tools. I haven't looked too deeply into how
Kbuild actually builds modules, but as far as I know, the user doesn't 
even come into contact with this_module when using the regular 
toolchain, because Kbuild is responsible for creating the .this_module 
section. And Kbuild of course creates it with the correct flags. So if I 
understand correctly, this problem can only occur when the module was 
built by some external tooling (or manually, in my case).


  Daniel

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-06-19 Thread Ilkka Naulapää

I disabled the CONFIG_FORCE_NR_CPUS option for 6.9.5, but the trace + panic
still occurs, so that one didn't help. I've also been bisecting the
trace but have not finished yet, as the last half dozen builds
produced non-bootable kernels. Anyway, I will continue soon(ish)
when I have a bit more free time.

--Ilkka

On Tue, Jun 18, 2024 at 5:52 PM Steven Rostedt  wrote:
>
> On Thu, 13 Jun 2024 10:32:24 +0300
> Ilkka Naulapää  wrote:
>
> > ok, so if you don't have any idea where this bug is after those debug
> > patches, I'll try to find some time to bisect it as a last resort.
> > Stay tuned.
>
> FYI,
>
> I just debugged a strange crash that was caused by my config having
> something leftover from your config. Specifically, that was:
>
> CONFIG_FORCE_NR_CPUS
>
> Do you get any warning about nr cpus not matching at boot up?
>
> Regardless, can you disable that and see if you still get the same
> crash.
>
> Thanks,
>
> -- Steve

Re: [PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module

2024-06-18 Thread Luis Chamberlain

On Thu, Jun 06, 2024 at 03:31:49PM +0200, Daniel v. Kirschten wrote:
> If a module is being loaded, and the .gnu.linkonce.this_module section
> in the module's ELF file does not have the WRITE flag, the kernel will
> map the finished module struct of that module as read-only.
> This causes a kernel panic when the struct is written to the first time
> after it has been marked read-only. Currently this happens in
> complete_formation in kernel/module/main.c:2765 when the module's state is
> set to MODULE_STATE_COMING, just after setting up the memory protections.

How did you find this issue?

> Down the line, this seems to lead to unpredictable freezes when trying to
> load other modules - I guess this is due to some structures not being
> cleaned up properly, but I didn't investigate this further.
> 
> A check already exists which verifies that .gnu.linkonce.this_module
> is ALLOC. This patch simply adds an analogous check for WRITE.

Can you check to ensure our modules generated have a respective check to
ensure this check exists at build time? That would proactively inform
userspace when a built module is not built correctly, and the tool
responsible can be identified.

  Luis

[RFC PATCH 1/4] kernel/reboot: Introduce pre_restart notifiers

2024-06-18 Thread Mathieu Desnoyers

Introduce a new pre_restart notifier chain for callbacks that need to
be executed after the system has been made quiescent with
syscore_shutdown(), before machine restart.

This pre_restart notifier chain should be invoked on machine restart and
on emergency machine restart.

The use-case for this new notifier chain is to preserve tracing data
within pmem areas on systems where the BIOS does not clear memory across
warm reboots.

Why do we need a new notifier chain?

1) The reboot and restart_prepare notifiers are called too early in the
   reboot sequence: they are invoked before syscore_shutdown(), which
   leaves other CPUs actively running threads while those notifiers are
   invoked.

2) The "restart" notifier is meant to trigger the actual machine
   restart, and is not meant to be invoked as a last step immediately
   before restart. It is also not always used: some architecture code
   chooses to bypass this restart notifier and reboots directly from the
   architecture code.

Wiring up the architecture code to call this notifier chain is left to
follow-up arch-specific patches.

Signed-off-by: Mathieu Desnoyers 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Ira Weiny 
Cc: Steven Rostedt 
Cc: nvd...@lists.linux.dev
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
---
 include/linux/reboot.h |  4 
 kernel/reboot.c| 51 ++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index abcdde4df697..c7f340e81451 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -50,6 +50,10 @@ extern int register_restart_handler(struct notifier_block *);
 extern int unregister_restart_handler(struct notifier_block *);
 extern void do_kernel_restart(char *cmd);
 
+extern int register_pre_restart_handler(struct notifier_block *);
+extern int unregister_pre_restart_handler(struct notifier_block *);
+extern void do_kernel_pre_restart(char *cmd);
+
 /*
  * Architecture-specific implementations of sys_reboot commands.
  */
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 22c16e2564cc..b7287dd48d35 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -235,6 +235,57 @@ void do_kernel_restart(char *cmd)
atomic_notifier_call_chain(&restart_handler_list, reboot_mode, cmd);
 }
 
+/*
+ * Notifier list for kernel code which wants to be called immediately
+ * before restarting the system.
+ */
+static ATOMIC_NOTIFIER_HEAD(pre_restart_handler_list);
+
+/**
+ * register_pre_restart_handler - Register function to be called in preparation
+ *                                to reset the system
+ * @nb: Info about handler function to be called
+ *
+ * Registers a function with code to be called in preparation to restart
+ * the system.
+ *
+ * Currently always returns zero, as atomic_notifier_chain_register()
+ * always returns zero.
+ */
+int register_pre_restart_handler(struct notifier_block *nb)
+{
+   return atomic_notifier_chain_register(&pre_restart_handler_list, nb);
+}
+EXPORT_SYMBOL(register_pre_restart_handler);
+
+/**
+ * unregister_pre_restart_handler - Unregister previously registered
+ *  pre-restart handler
+ * @nb: Hook to be unregistered
+ *
+ * Unregisters a previously registered pre-restart handler function.
+ *
+ * Returns zero on success, or %-ENOENT on failure.
+ */
+int unregister_pre_restart_handler(struct notifier_block *nb)
+{
+   return atomic_notifier_chain_unregister(&pre_restart_handler_list, nb);
+}
+EXPORT_SYMBOL(unregister_pre_restart_handler);
+
+/**
+ * do_kernel_pre_restart - Execute kernel pre-restart handler call chain
+ *
+ * Calls functions registered with register_pre_restart_handler.
+ *
+ * Expected to be called from machine_restart and
+ * machine_emergency_restart before invoking the restart handlers.
+ */
+void do_kernel_pre_restart(char *cmd)
+{
+   atomic_notifier_call_chain(&pre_restart_handler_list, reboot_mode, cmd);
+}
+
 void migrate_to_reboot_cpu(void)
 {
/* The boot cpu is always logical cpu 0 */
-- 
2.39.2
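
To make the register / unregister / call semantics documented in the
kernel-doc comments above easier to follow, here is a small userspace
model. It is a sketch only: the *_model names are hypothetical, it assumes
a priority-ordered singly linked list, it has no locking, and it ignores
handler return values, so it is not a substitute for the kernel's atomic
notifier implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model; the *_model names are hypothetical, not kernel API. */
struct nb_model {
	int (*call)(struct nb_model *nb, unsigned long mode, void *cmd);
	struct nb_model *next;
	int priority;
};

static struct nb_model *pre_restart_model_list;

/* Insert a handler, keeping the list sorted so higher priority runs first. */
static int register_model(struct nb_model *nb)
{
	struct nb_model **p = &pre_restart_model_list;

	assert(nb != NULL && nb->call != NULL);
	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
	return 0;
}

/* Remove a handler; returns -ENOENT if it was never registered. */
static int unregister_model(struct nb_model *nb)
{
	struct nb_model **p;

	for (p = &pre_restart_model_list; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			return 0;
		}
	}
	return -ENOENT;
}

/* Invoke every registered handler in list order. */
static void call_chain_model(unsigned long mode, void *cmd)
{
	struct nb_model *nb;

	for (nb = pre_restart_model_list; nb; nb = nb->next)
		nb->call(nb, mode, cmd);
}

/* Demo handler: records the priority of each handler as it runs. */
static int call_log[4];
static int call_count;

static int record_call(struct nb_model *nb, unsigned long mode, void *cmd)
{
	(void)mode;
	(void)cmd;
	if (call_count < 4)
		call_log[call_count++] = nb->priority;
	return 0;
}
```

In the patch itself, the equivalent of call_chain_model() is
do_kernel_pre_restart(), which is expected to be called from
machine_restart and machine_emergency_restart before the restart handlers
run.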

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-06-18 Thread Steven Rostedt

On Thu, 13 Jun 2024 10:32:24 +0300
Ilkka Naulapää  wrote:

> ok, so if you don't have any idea where this bug is after those debug
> patches, I'll try to find some time to bisect it as a last resort.
> Stay tuned.

FYI,

I just debugged a strange crash that was caused by my config having
something leftover from your config. Specifically, that was:

CONFIG_FORCE_NR_CPUS

Do you get any warning about nr cpus not matching at boot up?

Regardless, can you disable that and see if you still get the same
crash.

Thanks,

-- Steve

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-06-13 Thread Linux regression tracking (Thorsten Leemhuis)

On 13.06.24 09:32, Ilkka Naulapää wrote:
> On Wed, Jun 12, 2024 at 6:56 PM Steven Rostedt  wrote:
>> On Wed, 12 Jun 2024 15:36:22 +0200
>> "Linux regression tracking (Thorsten Leemhuis)" wrote:
>>>
>>> Ilkka or Steven, what happened to this? This thread looks stalled. I
>>> also was unsuccessful when looking for other threads related to this
>>> report or the culprit. Did it fall through the cracks or am I missing
>>> something here?
>
>> Honestly, I have no idea where the bug is. I can't reproduce it. [...]

Steven, thx for the update! And yeah, that's how it sometimes is. Given
that we haven't seen similar reports (at least afaics) it's nothing I
worry much about.

> ok, so if you don't have any idea where this bug is after those debug
> patches, I'll try to find some time to bisect it as a last resort.
> Stay tuned.

Yeah, that would be great help. Thank you, too!

Ciao, Thorsten

>>> On 02.06.24 09:32, Ilkka Naulapää wrote:
 sorry longer delay, been a bit busy but here is the result from that
 new patch. Only applied this patch so if the previous one is needed
 also, let me know and I'll rerun it.

 --Ilkka

 On Thu, May 30, 2024 at 5:00 PM Steven Rostedt  wrote:
>
> On Thu, 30 May 2024 16:02:37 +0300
> Ilkka Naulapää  wrote:
>
>> applied your patch and here's the output.
>>
>
> Unfortunately, it doesn't give me any new information. I added one more
> BUG_ON(); want to try this? Otherwise, I'm pretty much at a loss. :-/
>
> -- Steve
>
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index de5b72216b1a..a090495e78c9 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct 
> super_block *sb)
> return NULL;
>
> ti->flags = 0;
> +   ti->magic = 20240823;
>
> return &ti->vfs_inode;
>  }
>
>  static void tracefs_free_inode(struct inode *inode)
>  {
> -   kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
> +   struct tracefs_inode *ti = get_tracefs(inode);
> +
> +   BUG_ON(ti->magic != 20240823);
> +   kmem_cache_free(tracefs_inode_cachep, ti);
>  }
>
>  static ssize_t default_read_file(struct file *file, char __user *buf,
> @@ -147,16 +151,6 @@ static const struct inode_operations 
> tracefs_dir_inode_operations = {
> .rmdir  = tracefs_syscall_rmdir,
>  };
>
> -struct inode *tracefs_get_inode(struct super_block *sb)
> -{
> -   struct inode *inode = new_inode(sb);
> -   if (inode) {
> -   inode->i_ino = get_next_ino();
> -   inode->i_atime = inode->i_mtime = 
> inode_set_ctime_current(inode);
> -   }
> -   return inode;
> -}
> -
>  struct tracefs_mount_opts {
> kuid_t uid;
> kgid_t gid;
> @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry 
> *dentry, struct inode *inode)
> return;
>
> ti = get_tracefs(inode);
> +   BUG_ON(ti->magic != 20240823);
> if (ti && ti->flags & TRACEFS_EVENT_INODE)
> eventfs_set_ef_status_free(dentry);
> iput(inode);
> @@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry 
> *dentry)
> return dentry;
>  }
>
> +struct inode *tracefs_get_inode(struct super_block *sb)
> +{
> +   struct inode *inode = new_inode(sb);
> +
> +   BUG_ON(sb->s_op != &tracefs_super_operations);
> +   if (inode) {
> +   inode->i_ino = get_next_ino();
> +   inode->i_atime = inode->i_mtime = 
> inode_set_ctime_current(inode);
> +   }
> +   return inode;
> +}
> +
>  /**
>   * tracefs_create_file - create a file in the tracefs filesystem
>   * @name: a pointer to a string containing the name of the file to 
> create.
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index 69c2b1d87c46..9059b8b11bb6 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -9,12 +9,15 @@ enum {
>  struct tracefs_inode {
> unsigned long   flags;
> void*private;
> +   unsigned long   magic;
> struct inodevfs_inode;
>  };
>
>  static inline struct tracefs_inode *get_tracefs(const struct inode 
> *inode)
>  {
> -   return container_of(inode, struct tracefs_inode, vfs_inode);
> +   struct tracefs_inode *ti = container_of(inode, struct 
> tracefs_inode, vfs_inode);
> +   BUG_ON(ti->magic != 20240823);
> +   return ti;
>  }
>
>  struct dentry *tracefs_start_creating(const char *name, struct dentry 
> *parent);

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-06-13 Thread Ilkka Naulapää

ok, so if you don't have any idea where this bug is after those debug
patches, I'll try to find some time to bisect it as a last resort.
Stay tuned.

--Ilkka

On Wed, Jun 12, 2024 at 6:56 PM Steven Rostedt  wrote:
>
> On Wed, 12 Jun 2024 15:36:22 +0200
> "Linux regression tracking (Thorsten Leemhuis)"  
> wrote:
>
> > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > for once, to make this easily accessible to everyone.
> >
> > Ilkka or Steven, what happened to this? This thread looks stalled. I
> > also was unsuccessful when looking for other threads related to this
> > report or the culprit. Did it fall through the cracks or am I missing
> > something here?
>
> Honestly, I have no idea where the bug is. I can't reproduce it. These
> patches I sent would check all the places that add to the list to make
> sure the proper trace_inode was being added, and the output shows that
> they are all correct. Then suddenly, something that came from the
> inode cache is calling the tracefs inode cache to free it, and that's
> where the bug is happening.
>
> This really looks like another bug that the recent changes have made
> more prominent.
>
> -- Steve
>
>
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > --
> > Everything you wanna know about Linux kernel regression tracking:
> > https://linux-regtracking.leemhuis.info/about/#tldr
> > If I did something stupid, please tell me, as explained on that page.
> >
> > #regzbot poke
> >
> > On 02.06.24 09:32, Ilkka Naulapää wrote:
> > > sorry longer delay, been a bit busy but here is the result from that
> > > new patch. Only applied this patch so if the previous one is needed
> > > also, let me know and I'll rerun it.
> > >
> > > --Ilkka
> > >
> > > On Thu, May 30, 2024 at 5:00 PM Steven Rostedt  
> > > wrote:
> > >>
> > >> On Thu, 30 May 2024 16:02:37 +0300
> > >> Ilkka Naulapää  wrote:
> > >>
> > >>> applied your patch and here's the output.
> > >>>
> > >>
> > >> Unfortunately, it doesn't give me any new information. I added one more
> > >> BUG_ON(); want to try this? Otherwise, I'm pretty much at a loss. :-/
> > >>
> > >> -- Steve
> > >>
> > >> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> > >> index de5b72216b1a..a090495e78c9 100644
> > >> --- a/fs/tracefs/inode.c
> > >> +++ b/fs/tracefs/inode.c
> > >> @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct 
> > >> super_block *sb)
> > >> return NULL;
> > >>
> > >> ti->flags = 0;
> > >> +   ti->magic = 20240823;
> > >>
> > >> return &ti->vfs_inode;
> > >>  }
> > >>
> > >>  static void tracefs_free_inode(struct inode *inode)
> > >>  {
> > >> -   kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
> > >> +   struct tracefs_inode *ti = get_tracefs(inode);
> > >> +
> > >> +   BUG_ON(ti->magic != 20240823);
> > >> +   kmem_cache_free(tracefs_inode_cachep, ti);
> > >>  }
> > >>
> > >>  static ssize_t default_read_file(struct file *file, char __user *buf,
> > >> @@ -147,16 +151,6 @@ static const struct inode_operations 
> > >> tracefs_dir_inode_operations = {
> > >> .rmdir  = tracefs_syscall_rmdir,
> > >>  };
> > >>
> > >> -struct inode *tracefs_get_inode(struct super_block *sb)
> > >> -{
> > >> -   struct inode *inode = new_inode(sb);
> > >> -   if (inode) {
> > >> -   inode->i_ino = get_next_ino();
> > >> -   inode->i_atime = inode->i_mtime = 
> > >> inode_set_ctime_current(inode);
> > >> -   }
> > >> -   return inode;
> > >> -}
> > >> -
> > >>  struct tracefs_mount_opts {
> > >> kuid_t uid;
> > >> kgid_t gid;
> > >> @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry 
> > >> *dentry, struct inode *inode)
> > >> return;
> > >>
> > >> ti = get_tracefs(inode);
> > >> +   BUG_ON(ti->magic != 20240823);
> > >>

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-06-12 Thread Steven Rostedt

On Wed, 12 Jun 2024 15:36:22 +0200
"Linux regression tracking (Thorsten Leemhuis)"  
wrote:

> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
> 
> Ilkka or Steven, what happened to this? This thread looks stalled. I
> also was unsuccessful when looking for other threads related to this
> report or the culprit. Did it fall through the cracks or am I missing
> something here?

Honestly, I have no idea where the bug is. I can't reproduce it. These
patches I sent would check all the places that add to the list to make
sure the proper trace_inode was being added, and the output shows that
they are all correct. Then suddenly, something that came from the
inode cache is calling the tracefs inode cache to free it, and that's
where the bug is happening.

This really looks like another bug that the recent changes have made
more prominent.

-- Steve


> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 
> #regzbot poke
> 
> On 02.06.24 09:32, Ilkka Naulapää wrote:
> > sorry longer delay, been a bit busy but here is the result from that
> > new patch. Only applied this patch so if the previous one is needed
> > also, let me know and I'll rerun it.
> > 
> > --Ilkka
> > 
> > On Thu, May 30, 2024 at 5:00 PM Steven Rostedt  wrote: 
> >  
> >>
> >> On Thu, 30 May 2024 16:02:37 +0300
> >> Ilkka Naulapää  wrote:
> >>  
> >>> applied your patch and here's the output.
> >>>  
> >>
> >> Unfortunately, it doesn't give me any new information. I added one more
> >> BUG_ON(); want to try this? Otherwise, I'm pretty much at a loss. :-/
> >>
> >> -- Steve
> >>
> >> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> >> index de5b72216b1a..a090495e78c9 100644
> >> --- a/fs/tracefs/inode.c
> >> +++ b/fs/tracefs/inode.c
> >> @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct 
> >> super_block *sb)
> >> return NULL;
> >>
> >> ti->flags = 0;
> >> +   ti->magic = 20240823;
> >>
> >> return &ti->vfs_inode;
> >>  }
> >>
> >>  static void tracefs_free_inode(struct inode *inode)
> >>  {
> >> -   kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
> >> +   struct tracefs_inode *ti = get_tracefs(inode);
> >> +
> >> +   BUG_ON(ti->magic != 20240823);
> >> +   kmem_cache_free(tracefs_inode_cachep, ti);
> >>  }
> >>
> >>  static ssize_t default_read_file(struct file *file, char __user *buf,
> >> @@ -147,16 +151,6 @@ static const struct inode_operations 
> >> tracefs_dir_inode_operations = {
> >> .rmdir  = tracefs_syscall_rmdir,
> >>  };
> >>
> >> -struct inode *tracefs_get_inode(struct super_block *sb)
> >> -{
> >> -   struct inode *inode = new_inode(sb);
> >> -   if (inode) {
> >> -   inode->i_ino = get_next_ino();
> >> -   inode->i_atime = inode->i_mtime = 
> >> inode_set_ctime_current(inode);
> >> -   }
> >> -   return inode;
> >> -}
> >> -
> >>  struct tracefs_mount_opts {
> >> kuid_t uid;
> >> kgid_t gid;
> >> @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, 
> >> struct inode *inode)
> >> return;
> >>
> >> ti = get_tracefs(inode);
> >> +   BUG_ON(ti->magic != 20240823);
> >> if (ti && ti->flags & TRACEFS_EVENT_INODE)
> >> eventfs_set_ef_status_free(dentry);
> >> iput(inode);
> >> @@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry 
> >> *dentry)
> >> return dentry;
> >>  }
> >>
> >> +struct inode *tracefs_get_inode(struct super_block *sb)
> >> +{
> >> +   struct inode *inode = new_inode(sb);
> >> +
> >> +   BUG_ON(sb->s_op != &tracefs_super_operations);
> >> +   if (inode) {
> >> +   inode->i_ino = get_next_ino();
> >> +   inode->i_atime = inode->i_

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-06-12 Thread Linux regression tracking (Thorsten Leemhuis)

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Ilkka or Steven, what happened to this? This thread looks stalled. I
also was unsuccessful when looking for other threads related to this
report or the culprit. Did it fall through the cracks or am I missing
something here?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 02.06.24 09:32, Ilkka Naulapää wrote:
> sorry longer delay, been a bit busy but here is the result from that
> new patch. Only applied this patch so if the previous one is needed
> also, let me know and I'll rerun it.
> 
> --Ilkka
> 
> On Thu, May 30, 2024 at 5:00 PM Steven Rostedt  wrote:
>>
>> On Thu, 30 May 2024 16:02:37 +0300
>> Ilkka Naulapää  wrote:
>>
>>> applied your patch and here's the output.
>>>
>>
>> Unfortunately, it doesn't give me any new information. I added one more
>> BUG_ON(); want to try this? Otherwise, I'm pretty much at a loss. :-/
>>
>> -- Steve
>>
>> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
>> index de5b72216b1a..a090495e78c9 100644
>> --- a/fs/tracefs/inode.c
>> +++ b/fs/tracefs/inode.c
>> @@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct 
>> super_block *sb)
>> return NULL;
>>
>> ti->flags = 0;
>> +   ti->magic = 20240823;
>>
>> return &ti->vfs_inode;
>>  }
>>
>>  static void tracefs_free_inode(struct inode *inode)
>>  {
>> -   kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
>> +   struct tracefs_inode *ti = get_tracefs(inode);
>> +
>> +   BUG_ON(ti->magic != 20240823);
>> +   kmem_cache_free(tracefs_inode_cachep, ti);
>>  }
>>
>>  static ssize_t default_read_file(struct file *file, char __user *buf,
>> @@ -147,16 +151,6 @@ static const struct inode_operations 
>> tracefs_dir_inode_operations = {
>> .rmdir  = tracefs_syscall_rmdir,
>>  };
>>
>> -struct inode *tracefs_get_inode(struct super_block *sb)
>> -{
>> -   struct inode *inode = new_inode(sb);
>> -   if (inode) {
>> -   inode->i_ino = get_next_ino();
>> -   inode->i_atime = inode->i_mtime = 
>> inode_set_ctime_current(inode);
>> -   }
>> -   return inode;
>> -}
>> -
>>  struct tracefs_mount_opts {
>> kuid_t uid;
>> kgid_t gid;
>> @@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, 
>> struct inode *inode)
>> return;
>>
>> ti = get_tracefs(inode);
>> +   BUG_ON(ti->magic != 20240823);
>> if (ti && ti->flags & TRACEFS_EVENT_INODE)
>> eventfs_set_ef_status_free(dentry);
>> iput(inode);
>> @@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry 
>> *dentry)
>> return dentry;
>>  }
>>
>> +struct inode *tracefs_get_inode(struct super_block *sb)
>> +{
>> +   struct inode *inode = new_inode(sb);
>> +
>> +   BUG_ON(sb->s_op != &tracefs_super_operations);
>> +   if (inode) {
>> +   inode->i_ino = get_next_ino();
>> +   inode->i_atime = inode->i_mtime = 
>> inode_set_ctime_current(inode);
>> +   }
>> +   return inode;
>> +}
>> +
>>  /**
>>   * tracefs_create_file - create a file in the tracefs filesystem
>>   * @name: a pointer to a string containing the name of the file to create.
>> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
>> index 69c2b1d87c46..9059b8b11bb6 100644
>> --- a/fs/tracefs/internal.h
>> +++ b/fs/tracefs/internal.h
>> @@ -9,12 +9,15 @@ enum {
>>  struct tracefs_inode {
>> unsigned long   flags;
>> void*private;
>> +   unsigned long   magic;
>> struct inodevfs_inode;
>>  };
>>
>>  static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
>>  {
>> -   return container_of(inode, struct tracefs_inode, vfs_inode);
>> +   struct tracefs_inode *ti = container_of(inode, struct tracefs_inode, 
>> vfs_inode);
>> +   BUG_ON(ti->magic != 20240823);
>> +   return ti;
>>  }
>>
>>  struct dentry *tracefs_start_creating(const char *name, struct dentry 
>> *parent);
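
The debug patch quoted above stamps every tracefs_inode with a magic value at
allocation time and checks it wherever get_tracefs() recovers the wrapper from
the embedded struct inode. A minimal userspace sketch of that idea follows. It
is not the tracefs code: every demo_* name is hypothetical, malloc/free stand
in for the slab cache, and the check returns false where the patch would hit
BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_MAGIC 20240823UL	/* same constant the debug patch uses */

/* Stand-in for struct inode. */
struct demo_inode {
	unsigned long i_ino;
};

/* Stand-in for struct tracefs_inode with the extra magic field. */
struct demo_tracefs_inode {
	unsigned long flags;
	void *priv;
	unsigned long magic;
	struct demo_inode vfs_inode;
};

/* Allocate a wrapper, stamp it, and hand out only the embedded inode. */
static struct demo_inode *demo_alloc_inode(void)
{
	struct demo_tracefs_inode *ti = calloc(1, sizeof(*ti));

	if (!ti)
		return NULL;
	ti->magic = DEMO_MAGIC;
	return &ti->vfs_inode;
}

/* True only if the inode is embedded in a wrapper that was stamped. */
static bool demo_inode_is_ours(const struct demo_inode *inode)
{
	const struct demo_tracefs_inode *ti;

	assert(inode != NULL);
	ti = demo_container_of(inode, struct demo_tracefs_inode, vfs_inode);
	return ti->magic == DEMO_MAGIC;
}

/* Free through the wrapper; refuse (instead of BUG) on a foreign inode. */
static bool demo_free_inode(struct demo_inode *inode)
{
	if (!demo_inode_is_ours(inode))
		return false;
	free(demo_container_of(inode, struct demo_tracefs_inode, vfs_inode));
	return true;
}
```

An inode that was never stamped fails the check before it reaches the
allocator. This matches the failure described in the thread, where something
from another inode cache ends up being freed through the tracefs one.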

Re: [PATCH -next 2/2] ftrace: Add kernel-doc comments for unregister_ftrace_direct() function

2024-06-10 Thread Steven Rostedt

On Fri,  7 Jun 2024 16:49:57 +0800
Yang Li  wrote:

> Added kernel-doc comments for the unregister_ftrace_direct() function to
> improve code documentation and readability.
> 

Someone else beat you to this.

-- Steve

> Reported-by: Abaci Robot 
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9300
> Signed-off-by: Yang Li 
> ---
>  kernel/trace/ftrace.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 4aeb1183ea9f..3b0dbd55cc05 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -5988,6 +5988,8 @@ EXPORT_SYMBOL_GPL(register_ftrace_direct);
>   * unregister_ftrace_direct - Remove calls to custom trampoline
>   * previously registered by register_ftrace_direct for @ops object.
>   * @ops: The address of the struct ftrace_ops object
> + * @addr: The address of the direct call to remove
> + * @free_filters: Boolean indicating whether to free the filters
>   *
>   * This is used to remove a direct calls to @addr from the nop locations
>   * of the functions registered in @ops (with by ftrace_set_filter_ip

Re: [PATCH -next 1/2] function_graph: Add kernel-doc comments for ftrace_graph_ret_addr() function

2024-06-10 Thread Steven Rostedt

On Fri,  7 Jun 2024 16:49:56 +0800
Yang Li  wrote:

> Added kernel-doc comments for the ftrace_graph_ret_addr() function to
> improve code documentation and readability.
> 
> Reported-by: Abaci Robot 
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9299
> Signed-off-by: Yang Li 
> ---
>  kernel/trace/fgraph.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index a13551a023aa..4ad33e4cb8da 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -872,6 +872,12 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int 
> idx)
>  /**
>   * ftrace_graph_ret_addr - convert a potentially modified stack return 
> address
>   *  to its original value
> + * @task: pointer to the task_struct of the task being examined
> + * @idx: pointer to a state variable, should be initialized to zero
> + *before the first call

parameter descriptions should not go across more than one line. At least
not in my code. Also, you don't need to add that it needs to be initialized
here. That belongs in the body.

And it's not a state variable. The description you got that from is wrong.

I'll go update it and give you a reported by, as the entire description
needs a rewrite.

-- Steve


> + * @ret: the current return address found on the stack
> + * @retp: pointer to the return address on the stack, ignored if
> + * HAVE_FUNCTION_GRAPH_RET_ADDR_PTR is not defined
>   *
>   * This function can be called by stack unwinding code to convert a found 
> stack
>   * return address ('ret') to its original value, in case the function graph

Re: [PATCH v8 5/5] remoteproc: qcom: enable in-kernel PD mapper

2024-06-07 Thread Chris Lew





On 5/11/2024 2:56 PM, Dmitry Baryshkov wrote:

Request in-kernel protection domain mapper to be started before starting
Qualcomm DSP and release it once DSP is stopped. Once all DSPs are
stopped, the PD mapper will be stopped too.

Signed-off-by: Dmitry Baryshkov 
---
  drivers/remoteproc/qcom_common.c| 87 +
  drivers/remoteproc/qcom_common.h| 10 +
  drivers/remoteproc/qcom_q6v5_adsp.c |  3 ++
  drivers/remoteproc/qcom_q6v5_mss.c  |  3 ++
  drivers/remoteproc/qcom_q6v5_pas.c  |  3 ++
  drivers/remoteproc/qcom_q6v5_wcss.c |  3 ++
  6 files changed, 109 insertions(+)



Thanks for looking into whether this could be implemented as a 
remoteproc subdevice.


Reviewed-by: Chris Lew

[PATCH] kernel/trace: fix possible deadlock in trie_delete_elem

2024-06-07 Thread Wojciech Gładysz

During bpf syscall map operations, bpf_disable_instrumentation() is called
for the reason given in that function's comment, and that description
matches this bug. The function increments the per-CPU integer variable
bpf_prog_active, but the bpf trace path does not check that variable. This
fix makes the trace path check it, in the same way kprobe handling does. The
cost is that some eBPF trace invocations are skipped where they might
otherwise deadlock.

Reported-by: syzbot+9d95beb2a3c260622...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9d95beb2a3c260622518
Link: https://lore.kernel.org/all/adb08b0614139...@google.com/T/
Signed-off-by: Wojciech Gładysz 
---
 kernel/trace/bpf_trace.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6249dac61701..8de2e084b162 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2391,7 +2391,9 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
struct bpf_trace_run_ctx run_ctx;
 
cant_sleep();
-   if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
+
+   // if the instrumentation is not disabled disable recurrence and go
+   if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
bpf_prog_inc_misses_counter(prog);
goto out;
}
@@ -2405,7 +2407,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 
bpf_reset_run_ctx(old_run_ctx);
 out:
-   this_cpu_dec(*(prog->active));
+   __this_cpu_dec(bpf_prog_active);
 }
 
 #define UNPACK(...)__VA_ARGS__
-- 
2.35.3
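
The guard described in this patch can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the kernel code: a
thread-local counter stands in for the per-CPU bpf_prog_active, and all
demo_* names are hypothetical. The hook runs the program only when it is the
first to raise the counter, so it skips both nested invocations and
invocations made while instrumentation is disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-CPU bpf_prog_active counter (per-thread here). */
static _Thread_local int demo_prog_active;
static _Thread_local unsigned long demo_runs;
static _Thread_local unsigned long demo_misses;

/* Analogues of bpf_disable_instrumentation() and its enable counterpart. */
static void demo_disable_instrumentation(void)
{
	demo_prog_active++;
}

static void demo_enable_instrumentation(void)
{
	demo_prog_active--;
}

/* Trace hook analogue: returns true if the program was actually run. */
static bool demo_trace_run(void (*prog)(void))
{
	bool ran = false;

	/* Anyone else already active on this "CPU"? Then count a miss. */
	if (++demo_prog_active != 1) {
		demo_misses++;
		goto out;
	}
	prog();
	demo_runs++;
	ran = true;
out:
	demo_prog_active--;
	return ran;
}

static void demo_noop_prog(void)
{
}

/* A program that re-enters the hook; the nested call must be skipped. */
static void demo_reentrant_prog(void)
{
	bool nested_ran = demo_trace_run(demo_noop_prog);

	assert(!nested_ran);
	(void)nested_ran;
}
```

The trade-off is the one the patch description names: a skipped invocation is
lost to tracing, and only the miss counter records that it happened.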

[PATCH -next 2/2] ftrace: Add kernel-doc comments for unregister_ftrace_direct() function

2024-06-07 Thread Yang Li

Added kernel-doc comments for the unregister_ftrace_direct() function to
improve code documentation and readability.

Reported-by: Abaci Robot 
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9300
Signed-off-by: Yang Li 
---
 kernel/trace/ftrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4aeb1183ea9f..3b0dbd55cc05 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5988,6 +5988,8 @@ EXPORT_SYMBOL_GPL(register_ftrace_direct);
  * unregister_ftrace_direct - Remove calls to custom trampoline
  * previously registered by register_ftrace_direct for @ops object.
  * @ops: The address of the struct ftrace_ops object
+ * @addr: The address of the direct call to remove
+ * @free_filters: Boolean indicating whether to free the filters
  *
  * This is used to remove a direct calls to @addr from the nop locations
  * of the functions registered in @ops (with by ftrace_set_filter_ip
-- 
2.20.1.7.g153144c

[PATCH -next 1/2] function_graph: Add kernel-doc comments for ftrace_graph_ret_addr() function

2024-06-07 Thread Yang Li

Added kernel-doc comments for the ftrace_graph_ret_addr() function to
improve code documentation and readability.

Reported-by: Abaci Robot 
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9299
Signed-off-by: Yang Li 
---
 kernel/trace/fgraph.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index a13551a023aa..4ad33e4cb8da 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -872,6 +872,12 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 /**
  * ftrace_graph_ret_addr - convert a potentially modified stack return address
  *to its original value
+ * @task: pointer to the task_struct of the task being examined
+ * @idx: pointer to a state variable, should be initialized to zero
+ *  before the first call
+ * @ret: the current return address found on the stack
+ * @retp: pointer to the return address on the stack, ignored if
+ *   HAVE_FUNCTION_GRAPH_RET_ADDR_PTR is not defined
  *
  * This function can be called by stack unwinding code to convert a found stack
  * return address ('ret') to its original value, in case the function graph
-- 
2.20.1.7.g153144c

[PATCH 6.10.0-rc2] kernel/module: avoid panic on loading broken module

2024-06-06 Thread Daniel v. Kirschten


If a module is being loaded, and the .gnu.linkonce.this_module section
in the module's ELF file does not have the WRITE flag, the kernel will
map the finished module struct of that module as read-only.
This causes a kernel panic when the struct is written to the first time
after it has been marked read-only. Currently this happens in
complete_formation in kernel/module/main.c:2765 when the module's state is
set to MODULE_STATE_COMING, just after setting up the memory protections.

Down the line, this seems to lead to unpredictable freezes when trying to
load other modules - I guess this is due to some structures not being
cleaned up properly, but I didn't investigate this further.

A check already exists which verifies that .gnu.linkonce.this_module
is ALLOC. This patch simply adds an analogous check for WRITE.

Signed-off-by: Daniel Kirschten 
---
 kernel/module/main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index d18a94b973e1..abba097551a2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1886,6 +1886,12 @@ static int elf_validity_cache_copy(struct load_info *info, int flags)
goto no_exec;
}
 
+   if (!(shdr->sh_flags & SHF_WRITE)) {
+   pr_err("module %s: .gnu.linkonce.this_module must be writable\n",
+  info->name ?: "(missing .modinfo section or name field)");
+   goto no_exec;
+   }
+
if (shdr->sh_size != sizeof(struct module)) {
 pr_err("module %s: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time\n",
    info->name ?: "(missing .modinfo section or name field)");
--
2.34.1

Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-06-06 Thread Neil Armstrong


On 11/05/2024 23:56, Dmitry Baryshkov wrote:

Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
Cc: Steev Klimaszewski 
Cc: Alexey Minnekhanov 

--

Changes in v8:
- Reworked pd-mapper to register as an rproc_subdev / auxdev
- Dropped Tested-by from Steev and Alexey from the last patch since the
   implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey
   Minnekhanov)
- Added platform entry for sm4250 (used for qrb4210 / RB2)
- Added locking to the pdr_get_domain_list() (Chris Lew)
- Remove the call to qmi_del_server() and corresponding API (Chris Lew)
- In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
- Link to v7: 
https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
   builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
   silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
   (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
   them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (5):
   soc: qcom: pdr: protect locator_addr with the main mutex
   soc: qcom: pdr: fix parsing of domains lists
   soc: qcom: pdr: extract PDR message marshalling data
   soc: qcom: add pd-mapper implementation
   remoteproc: qcom: enable in-kernel PD mapper

  drivers/remoteproc/qcom_common.c|  87 +
  drivers/remoteproc/qcom_common.h|  10 +
  drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
  drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
  drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
  drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
  drivers/soc/qcom/Kconfig|  15 +
  drivers/soc/qcom/Makefile   |   2 +
  drivers/soc/qcom/pdr_interface.c|  17 +-
  drivers/soc/qcom/pdr_internal.h | 318 ++---
  drivers/soc/qcom/qcom_pd_mapper.c   | 676 
  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
  12 files changed, 1190 insertions(+), 300 deletions(-)
---
base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,


Tested-by: Neil Armstrong  # on SM8550-QRD
Tested-by: Neil Armstrong  # on SM8550-HDK
Tested-by: Neil Armstrong  # on SM8650-QRD

Thanks,
Neil

Re: [PATCH 0/6] ftrace: Minor fixes for sparse and kernel test robot

2024-06-05 Thread Google

On Wed, 05 Jun 2024 16:26:44 -0400
Steven Rostedt  wrote:

> 
> Received some minor bug reports from the kernel test robot. First I started
> cleaning up some of the sparse warnings. There are many more, but most changes
> are not really helping anything, but just quieting the warnings.
> 
> But the reports from kernel test robot need to be fixed.

All looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Thank you!

> 
> Steven Rostedt (Google) (6):
>   ftrace: Declare function_trace_op in header to quiet sparse warning
>   ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
>   ftrace: Assign RCU list variable with rcu_assign_ptr()
>   ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
>   function_graph: Make fgraph_do_direct static key static
>   function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled
> 
> 
>  include/linux/ftrace.h | 3 +++
>  kernel/trace/fgraph.c      | 4 +++-
>  kernel/trace/ftrace.c  | 4 ++--
>  kernel/trace/ftrace_internal.h | 9 +
>  kernel/trace/trace.h   | 1 -
>  5 files changed, 17 insertions(+), 4 deletions(-)


-- 
Masami Hiramatsu (Google)

[PATCH 0/6] ftrace: Minor fixes for sparse and kernel test robot

2024-06-05 Thread Steven Rostedt



Received some minor bug reports from the kernel test robot. First I started
cleaning up some of the sparse warnings. There are many more, but most changes
are not really helping anything, but just quieting the warnings.

But the reports from kernel test robot need to be fixed.

Steven Rostedt (Google) (6):
  ftrace: Declare function_trace_op in header to quiet sparse warning
  ftrace: Assign ftrace_list_end to ftrace_ops_list type cast to RCU
  ftrace: Assign RCU list variable with rcu_assign_ptr()
  ftrace: Fix prototypes for ftrace_startup/shutdown_subops()
  function_graph: Make fgraph_do_direct static key static
  function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabled


 include/linux/ftrace.h | 3 +++
 kernel/trace/fgraph.c  | 4 +++-
 kernel/trace/ftrace.c  | 4 ++--
 kernel/trace/ftrace_internal.h | 9 +++++
 kernel/trace/trace.h   | 1 -
 5 files changed, 17 insertions(+), 4 deletions(-)

[PATCH v4 07/11] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-06-05 Thread Björn Töpel

From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Reviewed-by: David Hildenbrand 
Reviewed-by: Oscar Salvador 
Signed-off-by: Björn Töpel 
---
 arch/riscv/mm/ptdump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
+   get_online_mems();
    ptdump_walk(m, m->private);
+   put_online_mems();
 
    return 0;
 }
-- 
2.43.0
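
As a rough user-space illustration of the bracket pattern in the patch above (not kernel code): the walker enters the read side before traversing and leaves it afterwards, so a hot-remove, which needs the write side, cannot start in the middle of a walk. The `hotplug_model` type and every `model_*` function below are hypothetical names for this sketch. The kernel's `get_online_mems()`/`put_online_mems()` block rather than fail and are implemented differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a reader/writer hotplug lock. */
struct hotplug_model {
	int readers;   /* walkers currently inside the read side */
	bool removing; /* true while a modelled hot-remove holds the write side */
};

/* Model of get_online_mems(): enter the read side. The real lock would
 * block while a remove is in progress; this model just reports failure. */
static bool model_get_online_mems(struct hotplug_model *l)
{
	if (l->removing)
		return false;
	l->readers++;
	return true;
}

/* Model of put_online_mems(): leave the read side. */
static void model_put_online_mems(struct hotplug_model *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* A modelled hot-remove may only start when no walker is active. */
static bool model_try_begin_remove(struct hotplug_model *l)
{
	if (l->readers > 0 || l->removing)
		return false;
	l->removing = true;
	return true;
}

/* End the modelled hot-remove and let walkers in again. */
static void model_end_remove(struct hotplug_model *l)
{
	l->removing = false;
}
```

In this model, a remove attempted between `model_get_online_mems()` and `model_put_online_mems()` is refused, which is the property the patch relies on for `ptdump_walk()`.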

Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-05-30 Thread classabbyamp

I've tested this applied on top of kernel 6.8.11 on an X13s over the 
past week and it's been working well.


--
classabbyamp

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-30 Thread Steven Rostedt

On Thu, 30 May 2024 16:02:37 +0300
Ilkka Naulapää  wrote:

> applied your patch and here's the output.
> 

Unfortunately, it doesn't give me any new information. I added one more
BUG_ON; want to try this? Otherwise, I'm pretty much at a loss. :-/

-- Steve

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index de5b72216b1a..a090495e78c9 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct super_block 
*sb)
return NULL;
 
ti->flags = 0;
+   ti->magic = 20240823;
 
return &ti->vfs_inode;
 }
 
 static void tracefs_free_inode(struct inode *inode)
 {
-   kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+   struct tracefs_inode *ti = get_tracefs(inode);
+
+   BUG_ON(ti->magic != 20240823);
+   kmem_cache_free(tracefs_inode_cachep, ti);
 }
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
@@ -147,16 +151,6 @@ static const struct inode_operations 
tracefs_dir_inode_operations = {
.rmdir  = tracefs_syscall_rmdir,
 };
 
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-   struct inode *inode = new_inode(sb);
-   if (inode) {
-   inode->i_ino = get_next_ino();
-   inode->i_atime = inode->i_mtime = 
inode_set_ctime_current(inode);
-   }
-   return inode;
-}
-
 struct tracefs_mount_opts {
kuid_t uid;
kgid_t gid;
@@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, 
struct inode *inode)
return;
 
ti = get_tracefs(inode);
+   BUG_ON(ti->magic != 20240823);
if (ti && ti->flags & TRACEFS_EVENT_INODE)
eventfs_set_ef_status_free(dentry);
iput(inode);
@@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry *dentry)
return dentry;
 }
 
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+   struct inode *inode = new_inode(sb);
+
+   BUG_ON(sb->s_op != &tracefs_super_operations);
+   if (inode) {
+   inode->i_ino = get_next_ino();
+   inode->i_atime = inode->i_mtime = 
inode_set_ctime_current(inode);
+   }
+   return inode;
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 69c2b1d87c46..9059b8b11bb6 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -9,12 +9,15 @@ enum {
 struct tracefs_inode {
    unsigned long           flags;
    void                    *private;
+   unsigned long           magic;
    struct inode            vfs_inode;
 };
 
 static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
 {
-   return container_of(inode, struct tracefs_inode, vfs_inode);
+   struct tracefs_inode *ti = container_of(inode, struct tracefs_inode, vfs_inode);
+   BUG_ON(ti->magic != 20240823);
+   return ti;
 }
 
 struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
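
The idea behind the debug patch above can be sketched in user space: a wrapper structure embeds the generic inode, `container_of()` recovers the wrapper from the embedded member, and a magic field stamped at allocation time shows whether the pointer really came from such a wrapper. This is only an illustration. The `demo_*` types and `demo_is_tracefs_inode()` are hypothetical, and the `container_of` macro here is a simplified stand-in for the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_MAGIC 20240823UL /* same constant the debug patch stamps */

/* Stand-in for struct inode. */
struct demo_inode {
	unsigned long i_ino;
};

/* Stand-in for struct tracefs_inode, with the added magic field. */
struct demo_tracefs_inode {
	unsigned long flags;
	void *private;
	unsigned long magic;
	struct demo_inode vfs_inode;
};

/* Recover the wrapper from the embedded inode and check its magic.
 * The caller must guarantee that the bytes in front of the inode are
 * readable; the kernel debug patch makes the same assumption. */
static bool demo_is_tracefs_inode(const struct demo_inode *inode)
{
	const struct demo_tracefs_inode *ti =
		container_of(inode, struct demo_tracefs_inode, vfs_inode);

	return ti->magic == DEMO_MAGIC;
}
```

A check like this only catches a foreign inode when the bytes at the magic offset happen not to match, which is why an unlikely constant is used.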

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-29 Thread Steven Rostedt

On Wed, 29 May 2024 14:47:57 -0400
Steven Rostedt  wrote:

> Let me make a debug patch (that crashes on this issue) for that kernel,
> and perhaps you could bisect it?

Can you try this on 6.6-rc1 and see if it gives you any other splats?

Hmm, you can switch it to WARN_ON and that way it may not crash the
machine, and you can use dmesg to get the output.

Thanks,

-- Steve


diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index de5b72216b1a..a090495e78c9 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -39,13 +39,17 @@ static struct inode *tracefs_alloc_inode(struct super_block 
*sb)
return NULL;
 
ti->flags = 0;
+   ti->magic = 20240823;
 
return &ti->vfs_inode;
 }
 
 static void tracefs_free_inode(struct inode *inode)
 {
-   kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode));
+   struct tracefs_inode *ti = get_tracefs(inode);
+
+   BUG_ON(ti->magic != 20240823);
+   kmem_cache_free(tracefs_inode_cachep, ti);
 }
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
@@ -147,16 +151,6 @@ static const struct inode_operations 
tracefs_dir_inode_operations = {
.rmdir  = tracefs_syscall_rmdir,
 };
 
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-   struct inode *inode = new_inode(sb);
-   if (inode) {
-   inode->i_ino = get_next_ino();
-   inode->i_atime = inode->i_mtime = 
inode_set_ctime_current(inode);
-   }
-   return inode;
-}
-
 struct tracefs_mount_opts {
kuid_t uid;
kgid_t gid;
@@ -384,6 +378,7 @@ static void tracefs_dentry_iput(struct dentry *dentry, 
struct inode *inode)
return;
 
ti = get_tracefs(inode);
+   BUG_ON(ti->magic != 20240823);
if (ti && ti->flags & TRACEFS_EVENT_INODE)
eventfs_set_ef_status_free(dentry);
iput(inode);
@@ -568,6 +563,18 @@ struct dentry *eventfs_end_creating(struct dentry *dentry)
return dentry;
 }
 
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+   struct inode *inode = new_inode(sb);
+
+   BUG_ON(sb->s_op != &tracefs_super_operations);
+   if (inode) {
+   inode->i_ino = get_next_ino();
+   inode->i_atime = inode->i_mtime = 
inode_set_ctime_current(inode);
+   }
+   return inode;
+}
+
 /**
  * tracefs_create_file - create a file in the tracefs filesystem
  * @name: a pointer to a string containing the name of the file to create.
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 69c2b1d87c46..9f6f303a9e58 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -9,6 +9,7 @@ enum {
 struct tracefs_inode {
    unsigned long           flags;
    void                    *private;
+   unsigned long           magic;
    struct inode            vfs_inode;
 };
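
The suggestion in this message to switch from BUG_ON to WARN_ON comes down to the difference between stopping at the first failed check and reporting it while continuing. A minimal user-space sketch of that difference, assuming nothing about the kernel macros beyond that behaviour, could look like the following; `demo_warn_on()`, `demo_bug_on()` and `demo_warn_count` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int demo_warn_count; /* warnings reported so far */

/* WARN_ON-like: report the condition, count it, and keep running.
 * Returns 1 if the condition held, 0 otherwise. */
static int demo_warn_on(int cond, const char *what)
{
	if (cond) {
		demo_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond != 0;
}

/* BUG_ON-like: report the condition and stop immediately. */
static void demo_bug_on(int cond, const char *what)
{
	if (cond) {
		fprintf(stderr, "BUG: %s\n", what);
		abort();
	}
}
```

With the warning variant, the later checks still run and all reports can be collected afterwards, much as the message proposes doing with dmesg.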

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-29 Thread Steven Rostedt

On Wed, 29 May 2024 21:36:08 +0300
Ilkka Naulapää  wrote:

> applied your patch without others, so trace and panic there.
> Screenshot attached. Also tested kernels backward and found out that

Bah, it's still in an RCU callback, which doesn't tell us why a
normal inode is being sent to the trace inode free list.

> this trace bug first triggered on 6.6-rc1.

Hmm, that's when eventfs was added.

> 
> Let me know if you need more assistance with this.

Let me make a debug patch (that crashes on this issue) for that kernel,
and perhaps you could bisect it?

Thanks!

-- Steve

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-28 Thread Steven Rostedt

On Tue, 28 May 2024 07:51:30 +0300
Ilkka Naulapää  wrote:

> yeah, the cache_from_obj tracing bug (without panic) has been
> displayed quite some time now - maybe even since 6.7.x or so. I could
> try checking a few versions back for this and try bisecting it if I
> can find when this started.
> 

OK, so I don't think the commit your last bisect hit is the cause of
the bug. It added a delay (via RCU) and is causing the real bug to blow
up more.

Can you add this patch to v6.9.2? Hopefully it crashes in a better
location, so that we can find where the mixup happened.

You may need to add the other commit too if this doesn't trigger.

Thanks,

-- Steve

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 417c840e6403..7af3f696696d 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -50,6 +50,7 @@ static struct inode *tracefs_alloc_inode(struct super_block 
*sb)
list_add_rcu(&ti->list, &tracefs_inodes);
spin_unlock_irqrestore(&tracefs_inode_lock, flags);
 
+   ti->magic = 20240823;
return &ti->vfs_inode;
 }
 
@@ -66,6 +67,7 @@ static void tracefs_free_inode(struct inode *inode)
struct tracefs_inode *ti = get_tracefs(inode);
unsigned long flags;
 
+   BUG_ON(ti->magic != 20240823);
spin_lock_irqsave(&tracefs_inode_lock, flags);
list_del_rcu(&ti->list);
spin_unlock_irqrestore(&tracefs_inode_lock, flags);
@@ -271,16 +273,6 @@ static const struct inode_operations 
tracefs_file_inode_operations = {
.setattr= tracefs_setattr,
 };
 
-struct inode *tracefs_get_inode(struct super_block *sb)
-{
-   struct inode *inode = new_inode(sb);
-   if (inode) {
-   inode->i_ino = get_next_ino();
-   simple_inode_init_ts(inode);
-   }
-   return inode;
-}
-
 struct tracefs_mount_opts {
kuid_t uid;
kgid_t gid;
@@ -448,6 +440,17 @@ static const struct super_operations 
tracefs_super_operations = {
.show_options   = tracefs_show_options,
 };
 
+struct inode *tracefs_get_inode(struct super_block *sb)
+{
+   struct inode *inode = new_inode(sb);
+   BUG_ON(sb->s_op != &tracefs_super_operations);
+   if (inode) {
+   inode->i_ino = get_next_ino();
+   simple_inode_init_ts(inode);
+   }
+   return inode;
+}
+
 /*
  * It would be cleaner if eventfs had its own dentry ops.
  *
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index f704d8348357..dda7d2708e30 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -16,6 +16,7 @@ struct tracefs_inode {
    };
    /* The below gets initialized with memset_after(ti, 0, vfs_inode) */
    struct list_head        list;
+   unsigned long           magic;
    unsigned long           flags;
    void                    *private;
 };

Re: [PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-28 Thread Daniel Bristot de Oliveira

On 5/20/24 07:42, Yang Li wrote:
> The patch updates the function documentation comment for
> rv_en(dis)able_monitor to adhere to the kernel-doc specification.
> 
> Signed-off-by: Yang Li 

Acked-by: Daniel Bristot de Oliveira 

Thanks
-- Daniel

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Ilkka Naulapää

yeah, the cache_from_obj tracing bug (without panic) has been
displayed quite some time now - maybe even since 6.7.x or so. I could
try checking a few versions back for this and try bisecting it if I
can find when this started.

--Ilkka

On Tue, May 28, 2024 at 1:31 AM Steven Rostedt  wrote:
>
> On Fri, 24 May 2024 12:50:08 +0200
> "Linux regression tracking (Thorsten Leemhuis)"  
> wrote:
>
> > > - Affected Versions: Before kernel version 6.8.10, the bug caused a
> > > quick display of a kernel trace dump before the shutdown/reboot
> > > completed. Starting from version 6.8.10 and continuing into version
> > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> > > preventing the shutdown or reboot from completing and leaving the
> > > machine stuck.
>
> You state "Before kernel version 6.8.10, the bug caused ...". Does that
> mean that a bug was happening before v6.8.10? But did not cause a panic?
>
> I just noticed your second screen shot from your report, and it has:
>
>  "cache_from_obj: Wrong slab cache, tracefs_inode_cache but object is from 
> inode_cache"
>
> So somehow a tracefs_inode was allocated from the inode_cache and is
> now being freed by the tracefs_inode logic? Did this happen before
> 6.8.10? If so, this code could just be triggering an issue from an
> unrelated bug.
>
> Thanks,
>
> -- Steve

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Ilkka Naulapää

I tried 6.10-rc1 and it still ends up in a panic

--Ilkka


On Tue, May 28, 2024 at 12:44 AM Steven Rostedt  wrote:
>
> On Mon, 27 May 2024 20:14:42 +0200
> Greg KH  wrote:
>
> > On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote:
> > > Hi Steven,
> > >
> > > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
> > > panic inducing commit:
> > >
> > > 414fb08628143 (tracefs: Reset permissions on remount if permissions are 
> > > options)
> > >
> > > I reverted that commit to 6.9.2 and now it only serves the trace but
> > > the panic is gone. But I can live with it.
> >
> > Steven, should we revert that?
> >
> > Or is there some other change that we should take to resolve this?
> >
>
> Before we revert it (as it may be a bug in mainline), Ilkka, can you
> test v6.10-rc1?  If it exists there, it will let me know whether or not
> I missed something.
>
> Thanks,
>
> -- Steve

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Steven Rostedt

On Fri, 24 May 2024 12:50:08 +0200
"Linux regression tracking (Thorsten Leemhuis)"  
wrote:

> > - Affected Versions: Before kernel version 6.8.10, the bug caused a
> > quick display of a kernel trace dump before the shutdown/reboot
> > completed. Starting from version 6.8.10 and continuing into version
> > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> > preventing the shutdown or reboot from completing and leaving the
> > machine stuck.

You state "Before kernel version 6.8.10, the bug caused ...". Does that
mean that a bug was happening before v6.8.10? But did not cause a panic?

I just noticed your second screen shot from your report, and it has:

 "cache_from_obj: Wrong slab cache, tracefs_inode_cache but object is from 
inode_cache"

So somehow a tracefs_inode was allocated from the inode_cache and is
now being freed by the tracefs_inode logic? Did this happen before
6.8.10? If so, this code could just be triggering an issue from an
unrelated bug.

Thanks,

-- Steve
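
The "Wrong slab cache" message quoted above reports that an object is being freed through a cache that did not allocate it. As a toy user-space model of that kind of ownership check (not the kernel's slab code), each object below carries a hidden header naming the cache it came from, and the free path compares it with the cache it was handed. The `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy cache, identified only by its name. */
struct toy_cache {
	const char *name;
};

/* Hidden header placed in front of every object. The payload gets the
 * alignment of a pointer, which is enough for this sketch. */
struct toy_header {
	const struct toy_cache *owner;
};

/* Allocate size bytes from cache c and remember the owner. */
static void *toy_cache_alloc(const struct toy_cache *c, size_t size)
{
	struct toy_header *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->owner = c;
	return h + 1; /* the caller sees only the payload */
}

/* Free obj through cache c. Returns false and leaves the object alone
 * if c is not the cache that allocated it (the "wrong cache" case). */
static bool toy_cache_free(const struct toy_cache *c, void *obj)
{
	struct toy_header *h;

	assert(obj != NULL);
	h = (struct toy_header *)obj - 1;
	if (h->owner != c)
		return false;
	free(h);
	return true;
}
```

In this model, freeing an `inode_cache` object through `tracefs_inode_cache` is rejected. That is the situation the screenshot describes, although the check only shows that the mixup happened, not where.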

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Steven Rostedt

On Mon, 27 May 2024 20:14:42 +0200
Greg KH  wrote:

> On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote:
> > Hi Steven,
> > 
> > I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
> > panic inducing commit:
> > 
> > 414fb08628143 (tracefs: Reset permissions on remount if permissions are 
> > options)
> > 
> > I reverted that commit to 6.9.2 and now it only serves the trace but
> > the panic is gone. But I can live with it.  
> 
> Steven, should we revert that?
> 
> Or is there some other change that we should take to resolve this?
> 

Before we revert it (as it may be a bug in mainline), Ilkka, can you
test v6.10-rc1?  If it exists there, it will let me know whether or not
I missed something.

Thanks,

-- Steve

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Greg KH

On Mon, May 27, 2024 at 07:40:21PM +0300, Ilkka Naulapää wrote:
> Hi Steven,
> 
> I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
> panic inducing commit:
> 
> 414fb08628143 (tracefs: Reset permissions on remount if permissions are 
> options)
> 
> I reverted that commit to 6.9.2 and now it only serves the trace but
> the panic is gone. But I can live with it.

Steven, should we revert that?

Or is there some other change that we should take to resolve this?

thanks,

greg k-h

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-27 Thread Ilkka Naulapää

Hi Steven,

I took some time and bisected the 6.8.9 - 6.8.10 and git gave the
panic inducing commit:

414fb08628143 (tracefs: Reset permissions on remount if permissions are options)

I reverted that commit on 6.9.2 and now the kernel only shows the trace;
the panic is gone. I can live with that.

--Ilkka

On Sun, May 26, 2024 at 8:42 PM Ilkka Naulapää  wrote:
>
> hi,
>
> I took 6.9.2 and applied that 0bcfd9aa4dafa to it. Now the kernel is
> serving me both problems; the trace and the panic as the pic shows.
>
> > To understand this, did you do anything with tracing? Before shutting down,
> > is there anything in /sys/kernel/tracing/instances directory?
> > Were any of the files/directories permissions in /sys/kernel/tracing 
> > changed?
>
> And to answer your question, I did not do any tracing or so and the
> /sys/kernel/tracing is empty.
> Just plain boot-up, no matter if in full desktop or in bare rescue
> mode, ends up the same way.
>
> --Ilkka
>
> On Fri, May 24, 2024 at 8:19 PM Steven Rostedt  wrote:
> >
> > On Fri, 24 May 2024 12:50:08 +0200
> > "Linux regression tracking (Thorsten Leemhuis)"  
> > wrote:
> >
> > > > - Affected Versions: Before kernel version 6.8.10, the bug caused a
> > > > quick display of a kernel trace dump before the shutdown/reboot
> > > > completed. Starting from version 6.8.10 and continuing into version
> > > > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> > > > preventing the shutdown or reboot from completing and leaving the
> > > > machine stuck.
> >
> > Ah, I bet it was this commit: baa23a8d4360d ("tracefs: Reset permissions on
> > remount if permissions are options"), which added a "iput" callback to the
> > dentry without calling iput, leaving stale inodes around.
> >
> > This is fixed with:
> >
> >   0bcfd9aa4dafa ("tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()")
> >
> > Try adding just that patch. It will at least make it go back to what was
> > happening before 6.8.10 (I hope!).
> >
> > -- Steve

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-24 Thread Steven Rostedt

On Fri, 24 May 2024 12:50:08 +0200
"Linux regression tracking (Thorsten Leemhuis)"  
wrote:

> > - Affected Versions: Before kernel version 6.8.10, the bug caused a
> > quick display of a kernel trace dump before the shutdown/reboot
> > completed. Starting from version 6.8.10 and continuing into version
> > 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> > preventing the shutdown or reboot from completing and leaving the
> > machine stuck.

Ah, I bet it was this commit: baa23a8d4360d ("tracefs: Reset permissions on
remount if permissions are options"), which added a "iput" callback to the
dentry without calling iput, leaving stale inodes around.

This is fixed with:

  0bcfd9aa4dafa ("tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()")

Try adding just that patch. It will at least make it go back to what was
happening before 6.8.10 (I hope!).

-- Steve

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-24 Thread Steven Rostedt

On Fri, 24 May 2024 12:50:08 +0200
"Linux regression tracking (Thorsten Leemhuis)"  
wrote:

> [CCing a few people]
> 

Thanks for the Cc.

> On 24.05.24 12:31, Ilkka Naulapää wrote:
> > 
> > I have encountered a critical bug in the Linux vanilla kernel that
> > leads to a kernel panic during the shutdown or reboot process. The
> > issue arises after all services, including `journald`, have been
> > stopped. As a result, the machine fails to complete the shutdown or
> > reboot procedure, effectively causing the system to hang and not shut
> > down or reboot.  

To understand this, did you do anything with tracing? Before shutting down,
is there anything in /sys/kernel/tracing/instances directory?
Were any of the files/directories permissions in /sys/kernel/tracing changed?

> 
> Thx for the report. Not my area of expertise, so take this with a grain
> of salt. But given the versions you mention in your report and the
> screenshot that mentioned tracefs_free_inode I suspect this is caused by
> baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are
> options"). A few fixes for it will soon hit mainline and are meant to be
> backported to affected stable trees:
> 
> https://lore.kernel.org/all/20240523212406.254317...@goodmis.org/
> https://lore.kernel.org/all/20240523174419.1e588...@gandalf.local.home/
> 
> You might want to try them – or recheck once they hit the stable trees
> you care about. If they don't work, please report back.

There's been quite a bit of updates in this code, but this looks new to me.
I have more fixes that were just pulled by Linus today.

  https://git.kernel.org/torvalds/c/0eb03c7e8e2a4cc3653eb5eeb2d2001182071215

I'm not sure how relevant that is for this. But if you can reproduce it
with that commit, then this is a new bug.

-- Steve

Re: How to properly fix reading user pointers in bpf in android kernel 4.9?

2024-05-24 Thread Bagas Sanjaya

[also Cc: bpf maintainers and get_maintainer output]

On Thu, May 23, 2024 at 07:52:22PM +0300, Marcel wrote:
> This seems to have been a long-standing problem with the Linux kernel in
> general. bpf_probe_read should have worked for both kernel and user pointers,
> but it fails with an access error when reading a user pointer.
> 
> I know there's a patch upstream that fixes this by introducing new helpers
> for reading kernel and userspace pointers, and I tried to backport them to
> my kernel, but with no success. Tools like bcc fail to use them and instead
> report that the arguments sent to the helpers are invalid. I assume this
> is because ARG_CONST_STACK_SIZE and ARG_PTR_TO_RAW_STACK handle data
> differently in the 4.9 Android version than in the upstream version, but I'm
> not sure that this is the cause. I left the patch I did below, with a link
> to the kernel I'm working on; maybe someone can take a look and give me a
> hand (the patch isn't applied yet).

What upstream patch? Has it already been in mainline?

> 
> <https://github.com/nitanmarcel/android_kernel_oneplus_sdm845-bpf>
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 744b4763b80e..de94c13b7193 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -559,6 +559,43 @@ enum bpf_func_id {
> */
> BPF_FUNC_probe_read_user,
>  
> +   /**
> +   * int bpf_probe_read_kernel(void *dst, int size, void *src)
> +   * Read a kernel pointer safely.
> +   * Return: 0 on success or negative error
> +   */
> +   BPF_FUNC_probe_read_kernel,
> +
> + /**
> +  * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
> +  * Copy a NUL terminated string from user unsafe address. In case 
> the string
> +  * length is smaller than size, the target is not padded with 
> further NUL
> +  * bytes. In case the string length is larger than size, just 
> count-1
> +  * bytes are copied and the last byte is set to NUL.
> +  * @dst: destination address
> +  * @size: maximum number of bytes to copy, including the trailing 
> NUL
> +  * @unsafe_ptr: unsafe address
> +  * Return:
> +  *   > 0 length of the string including the trailing NUL on success
> +  *   < 0 error
> +  */
> + BPF_FUNC_probe_read_user_str,
> +
> + /**
> +  * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
> +  * Copy a NUL terminated string from unsafe address. In case the 
> string
> +  * length is smaller than size, the target is not padded with 
> further NUL
> +  * bytes. In case the string length is larger than size, just 
> count-1
> +  * bytes are copied and the last byte is set to NUL.
> +  * @dst: destination address
> +  * @size: maximum number of bytes to copy, including the trailing 
> NUL
> +  * @unsafe_ptr: unsafe address
> +  * Return:
> +  *   > 0 length of the string including the trailing NUL on success
> +  *   < 0 error
> +  */
> + BPF_FUNC_probe_read_kernel_str,
> +
>   __BPF_FUNC_MAX_ID,
>  };
>  
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index a1e37a5d8c88..3478ca744a45 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -94,7 +94,7 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
>   .arg3_type  = ARG_ANYTHING,
>  };
>  
> -BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void *, 
> unsafe_ptr)
> +BPF_CALL_3(bpf_probe_read_user, void *, dst, u32, size, const void  __user 
> *, unsafe_ptr)
>  {
>   int ret;
>  
> @@ -115,6 +115,27 @@ static const struct bpf_func_proto 
> bpf_probe_read_user_proto = {
>  };
>  
>  
> +BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size, const void *, 
> unsafe_ptr)
> +{
> + int ret;
> +
> + ret = probe_kernel_read(dst, unsafe_ptr, size);
> + if (unlikely(ret < 0))
> + memset(dst, 0, size);
> +
> + return ret;
> +}
> +
> +static const struct bpf_func_proto bpf_probe_read_kernel_proto = {
> + .func   = bpf_probe_read_kernel,
> + .gpl_only   = true,
> + .ret_type   = RET_INTEGER,
> + .arg1_type  = ARG_PTR_TO_RAW_STACK,
> + .arg2_type  = ARG_CONST_STACK_SIZE,
> + .arg3_type  = ARG_ANYTHING,
> +};
> +
> +
>  BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
>  u32, size)
>  {
> @@ -487,6 +508,69 @@ static const struct
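
The copy semantics described in the helper comments of the quoted patch (no padding when the string is shorter than the buffer, truncation to size-1 bytes plus a NUL when it is longer, return value counting the trailing NUL) can be sketched in user space as follows. This is an illustration only: `demo_copy_str()` is a hypothetical name, it reads ordinary memory and so cannot fault the way a probe read can, and the -1 it returns for bad arguments is an assumption of the sketch rather than the helper's error code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy a NUL-terminated string into dst, writing at most size bytes.
 * At most size-1 characters are copied and dst is always terminated.
 * Bytes after the terminator are left untouched (no padding).
 * Returns the number of bytes written including the NUL, or -1 if the
 * arguments are unusable.
 */
static long demo_copy_str(char *dst, size_t size, const char *src)
{
	size_t i = 0;

	if (!dst || !src || size == 0)
		return -1;

	while (i < size - 1 && src[i] != '\0') {
		dst[i] = src[i];
		i++;
	}
	dst[i] = '\0';
	return (long)(i + 1);
}
```

Copying "hello" into a 4-byte buffer under these rules stores "hel" plus the terminator and returns 4.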

Re: Bug in Kernel 6.8.x, 6.9.x Causing Trace/Panic During Shutdown/Reboot

2024-05-24 Thread Linux regression tracking (Thorsten Leemhuis)

[CCing a few people]

On 24.05.24 12:31, Ilkka Naulapää wrote:
> 
> I have encountered a critical bug in the Linux vanilla kernel that
> leads to a kernel panic during the shutdown or reboot process. The
> issue arises after all services, including `journald`, have been
> stopped. As a result, the machine fails to complete the shutdown or
> reboot procedure, effectively causing the system to hang and not shut
> down or reboot.

Thx for the report. Not my area of expertise, so take this with a grain
of salt. But given the versions you mention in your report and the
screenshot that mentioned tracefs_free_inode I suspect this is caused by
baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are
options"). A few fixes for it will soon hit mainline and are meant to be
backported to affected stable trees:

https://lore.kernel.org/all/20240523212406.254317...@goodmis.org/
https://lore.kernel.org/all/20240523174419.1e588...@gandalf.local.home/

You might want to try them – or recheck once they hit the stable trees
you care about. If they don't work, please report back.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Here are the details of the issue:
> 
> - Affected Versions: Before kernel version 6.8.10, the bug caused a
> quick display of a kernel trace dump before the shutdown/reboot
> completed. Starting from version 6.8.10 and continuing into version
> 6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
> preventing the shutdown or reboot from completing and leaving the
> machine stuck.
> 
> - Symptoms:
>   - In normal shutdown/reboot scenarios, the kernel trace dump briefly
> appears as the last message on the screen.
>   - In rescue mode, the kernel panic message is displayed. Normally it
> is not shown.
> 
> Since `journald` is stopped before this issue occurs, no textual logs
> are available. However, I have captured two pictures illustrating
> these related issues, which I am attaching to this email for your
> reference. Also added my custom kernel config.
> 
> Thank you for your attention to this matter. Please let me know if any
> additional information is required to assist in diagnosing and
> resolving this bug.
> 
> Best regards,
> 
> Ilkka Naulapää

[PATCH v3 6/9] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-21 Thread Björn Töpel

From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Reviewed-by: David Hildenbrand 
Reviewed-by: Oscar Salvador 
Signed-off-by: Björn Töpel 
---
 arch/riscv/mm/ptdump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
+   get_online_mems();
    ptdump_walk(m, m->private);
+   put_online_mems();
 
    return 0;
 }
-- 
2.40.1

[PATCH] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-19 Thread Yang Li

The patch updates the function documentation comment for
rv_en(dis)able_monitor to adhere to the kernel-doc specification.

Signed-off-by: Yang Li 
---
 kernel/trace/rv/rv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
index 2f68e93fff0b..df0745a42a3f 100644
--- a/kernel/trace/rv/rv.c
+++ b/kernel/trace/rv/rv.c
@@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def 
*mdef, bool sync)
 
 /**
  * rv_disable_monitor - disable a given runtime monitor
+ * @mdef: Pointer to the monitor definition structure.
  *
  * Returns 0 on success.
  */
@@ -256,6 +257,7 @@ int rv_disable_monitor(struct rv_monitor_def *mdef)
 
 /**
  * rv_enable_monitor - enable a given runtime monitor
+ * @mdef: Pointer to the monitor definition structure.
  *
  * Returns 0 on success, error otherwise.
  */
-- 
2.20.1.7.g153144c
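
For readers unfamiliar with the comment format being fixed here, the sketch below shows a complete kernel-doc style comment on a small stand-alone function: a summary line, one `@name:` line per parameter, and a description of the return value. The `demo_monitor_def` structure and `demo_enable_monitor()` are hypothetical and are not the rv code touched by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a monitor definition. */
struct demo_monitor_def {
	const char *name;
	bool enabled;
};

/**
 * demo_enable_monitor - enable a given demo monitor
 * @mdef: Pointer to the monitor definition structure.
 *
 * Returns 0 on success, -EINVAL if @mdef is NULL.
 */
static int demo_enable_monitor(struct demo_monitor_def *mdef)
{
	if (!mdef)
		return -EINVAL;
	mdef->enabled = true;
	return 0;
}
```

The patch above adds the missing `@mdef:` line, the part of this layout that documents each parameter.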

Re: [PATCH] kernel: trace: preemptirq_delay_test: add MODULE_DESCRIPTION()

2024-05-18 Thread Masami Hiramatsu (Google)

On Sat, 18 May 2024 15:54:49 -0700
Jeff Johnson  wrote:

> Fix the 'make W=1' warning:
> 
> WARNING: modpost: missing MODULE_DESCRIPTION() in 
> kernel/trace/preemptirq_delay_test.o
> 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) 

Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency 
tracers")

Thanks,

> Signed-off-by: Jeff Johnson 
> ---
>  kernel/trace/preemptirq_delay_test.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/trace/preemptirq_delay_test.c 
> b/kernel/trace/preemptirq_delay_test.c
> index 8c4ffd076162..cb0871fbdb07 100644
> --- a/kernel/trace/preemptirq_delay_test.c
> +++ b/kernel/trace/preemptirq_delay_test.c
> @@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void)
>  
>  module_init(preemptirq_delay_init)
>  module_exit(preemptirq_delay_exit)
> +MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency 
> tracers");
>  MODULE_LICENSE("GPL v2");
> 
> ---
> base-commit: 674143feb6a8c02d899e64e2ba0f992896afd532
> change-id: 20240518-md-preemptirq_delay_test-552cd20e7b0b
> 


-- 
Masami Hiramatsu (Google)

[PATCH] kernel: trace: preemptirq_delay_test: add MODULE_DESCRIPTION()

2024-05-18 Thread Jeff Johnson

Fix the 'make W=1' warning:

WARNING: modpost: missing MODULE_DESCRIPTION() in 
kernel/trace/preemptirq_delay_test.o

Signed-off-by: Jeff Johnson 
---
 kernel/trace/preemptirq_delay_test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/preemptirq_delay_test.c 
b/kernel/trace/preemptirq_delay_test.c
index 8c4ffd076162..cb0871fbdb07 100644
--- a/kernel/trace/preemptirq_delay_test.c
+++ b/kernel/trace/preemptirq_delay_test.c
@@ -215,4 +215,5 @@ static void __exit preemptirq_delay_exit(void)
 
 module_init(preemptirq_delay_init)
 module_exit(preemptirq_delay_exit)
+MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency 
tracers");
 MODULE_LICENSE("GPL v2");

---
base-commit: 674143feb6a8c02d899e64e2ba0f992896afd532
change-id: 20240518-md-preemptirq_delay_test-552cd20e7b0b

Re: [PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-17 Thread Daniel Bristot de Oliveira

Hi Yang

On 5/17/24 11:14, Yang Li wrote:
> The patch updates the function documentation comment for
> rv_en(dis)able_monitor to adhere to the kernel-doc specification.
> 
> Signed-off-by: Yang Li 
> ---
>  kernel/trace/rv/rv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
> index 2f68e93fff0b..df0745a42a3f 100644
> --- a/kernel/trace/rv/rv.c
> +++ b/kernel/trace/rv/rv.c
> @@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def 
> *mdef, bool sync)
>  
>  /**
>   * rv_disable_monitor - disable a given runtime monitor
> + * @mdef: Pointer to the monitor definition structure.

This change is for the mainline kernel, so why are you using -next in the
Subject?

-- Daniel

[PATCH -next] rv: Update rv_en(dis)able_monitor doc to match kernel-doc

2024-05-17 Thread Yang Li

The patch updates the function documentation comment for
rv_en(dis)able_monitor to adhere to the kernel-doc specification.

Signed-off-by: Yang Li 
---
 kernel/trace/rv/rv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
index 2f68e93fff0b..df0745a42a3f 100644
--- a/kernel/trace/rv/rv.c
+++ b/kernel/trace/rv/rv.c
@@ -245,6 +245,7 @@ static int __rv_disable_monitor(struct rv_monitor_def 
*mdef, bool sync)
 
 /**
  * rv_disable_monitor - disable a given runtime monitor
+ * @mdef: Pointer to the monitor definition structure.
  *
  * Returns 0 on success.
  */
@@ -256,6 +257,7 @@ int rv_disable_monitor(struct rv_monitor_def *mdef)
 
 /**
  * rv_enable_monitor - enable a given runtime monitor
+ * @mdef: Pointer to the monitor definition structure.
  *
  * Returns 0 on success, error otherwise.
  */
-- 
2.20.1.7.g153144c

[PATCH v3 5/6] kbuild: generate modules.builtin.ranges when linking the kernel

2024-05-16 Thread Kris Van Hees

Signed-off-by: Kris Van Hees 
Reviewed-by: Nick Alcock 
---
Changes since v2:
 - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
 - Use $(real-prereqs) rather than $(filter-out ...)
---
 scripts/Makefile.vmlinux | 16 
 1 file changed, 16 insertions(+)

diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index c9f3e03124d7f..afe8287e8dda0 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -36,6 +36,22 @@ targets += vmlinux
 vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
+$(call if_changed_dep,link_vmlinux)
 
+# module.builtin.ranges
+# ---
+ifdef CONFIG_BUILTIN_MODULE_RANGES
+__default: modules.builtin.ranges
+
+quiet_cmd_modules_builtin_ranges = GEN $@
+  cmd_modules_builtin_ranges = \
+   $(srctree)/scripts/generate_builtin_ranges.awk $(real-prereqs) > $@
+
+vmlinux.map: vmlinux
+
+targets += modules.builtin.ranges
+modules.builtin.ranges: modules.builtin.modinfo vmlinux.map vmlinux.o.map FORCE
+   $(call if_changed,modules_builtin_ranges)
+endif
+
 # Add FORCE to the prequisites of a target to force it to be always rebuilt.
 # ---
 
-- 
2.43.0

Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread Oscar Salvador

On Tue, May 14, 2024 at 04:04:43PM +0200, Björn Töpel wrote:
> From: Björn Töpel 
> 
> During memory hot remove, the ptdump functionality can end up touching
> stale data. Avoid any potential crashes (or worse), by holding the
> memory hotplug read-lock while traversing the page table.
> 
> This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
> Hold memory hotplug lock while walking for kernel page table dump").
> 
> Signed-off-by: Björn Töpel 

Reviewed-by: Oscar Salvador 

funny enough, it seems arm64 and riscv are the only ones holding the
hotplug lock here.
I think we have the same problem on the other arches as well (at least
on x86_64 that I can see).

If we happen to finally need the lock in those, I would rather have a
central function in the generic mm code that takes the lock and then
calls an arch-specific ptdump_show function, so the locking is not
scattered. But that is another story.

-- 
Oscar Salvador
SUSE Labs

Re: [PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread David Hildenbrand


On 14.05.24 16:04, Björn Töpel wrote:

> From: Björn Töpel 
> 
> During memory hot remove, the ptdump functionality can end up touching
> stale data. Avoid any potential crashes (or worse), by holding the
> memory hotplug read-lock while traversing the page table.
> 
> This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
> Hold memory hotplug lock while walking for kernel page table dump").
> 
> Signed-off-by: Björn Töpel 
> ---
>  arch/riscv/mm/ptdump.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
> index 1289cc6d3700..9d5f657a251b 100644
> --- a/arch/riscv/mm/ptdump.c
> +++ b/arch/riscv/mm/ptdump.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
>  
>  static int ptdump_show(struct seq_file *m, void *v)
>  {
> +	get_online_mems();
>  	ptdump_walk(m, m->private);
> +	put_online_mems();
>  
>  	return 0;
>  }


Reviewed-by: David Hildenbrand 

--
Cheers,

David / dhildenb

[PATCH v2 5/8] riscv: mm: Take memory hotplug read-lock during kernel page table dump

2024-05-14 Thread Björn Töpel

From: Björn Töpel 

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Signed-off-by: Björn Töpel 
---
 arch/riscv/mm/ptdump.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
index 1289cc6d3700..9d5f657a251b 100644
--- a/arch/riscv/mm/ptdump.c
+++ b/arch/riscv/mm/ptdump.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -370,7 +371,9 @@ bool ptdump_check_wx(void)
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
+	get_online_mems();
 	ptdump_walk(m, m->private);
+	put_online_mems();
 
 	return 0;
 }
-- 
2.40.1

Re: [PATCH 2/3] kernel/pid: Remove default pid_max value

2024-05-13 Thread Michal Koutný

On Mon, Apr 08, 2024 at 04:58:18PM GMT, Michal Koutný  wrote:
> The kernel provides mechanisms, while it should not imply policies --
> default pid_max seems to be an example of the policy that does not fit
> all. At the same time pid_max must have some value assigned, so use the
> end of the allowed range -- pid_max_max.
> 
> This change thus increases initial pid_max from 32k to 4M (x86_64
> defconfig).

Out of curiosity I dug out the commit
acdc721fe26d ("[PATCH] pid-max-2.5.33-A0") v2.5.34~5
that introduced the 32k default. The commit message doesn't say why such
a sudden change though.
Previously, the limit was 1G of pids (i.e. effectively no default limit
like the intention of this series).

Honestly, I expected more enthusiasm or reasons against removing the
default value of pid_max. Is this really not of interest to anyone?

(Thanks, Andrew, for your responses. I don't plan to pursue this further
should there be no more interest in having less default limit values in
kernel.)

Regards,
Michal


Re: [PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel

2024-05-12 Thread Masahiro Yamada

On Sun, May 12, 2024 at 7:44 AM Kris Van Hees  wrote:
>
> Signed-off-by: Kris Van Hees 
> Reviewed-by: Nick Alcock 
> ---
> Changes since v1:
>  - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
> ---
>  scripts/Makefile.vmlinux | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index c9f3e03124d7f..54095d72f7fd7 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -36,6 +36,23 @@ targets += vmlinux
>  vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
> +$(call if_changed_dep,link_vmlinux)
>
> +# module.builtin.ranges
> +# ---
> +ifdef CONFIG_BUILTIN_MODULE_RANGES
> +__default: modules.builtin.ranges
> +
> +quiet_cmd_modules_builtin_ranges = GEN $@
> +  cmd_modules_builtin_ranges = \
> +   $(srctree)/scripts/generate_builtin_ranges.awk \
> + $(filter-out FORCE,$+) > $@


$(filter-out FORCE,$+)

  ->

$(real-prereqs)



> +
> +vmlinux.map: vmlinux
> +
> +targets += modules.builtin.ranges
> +modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE
> +   $(call if_changed,modules_builtin_ranges)
> +endif
> +
>  # Add FORCE to the prequisites of a target to force it to be always rebuilt.
>  # ---
>
> --
> 2.43.0
>
>


-- 
Best Regards
Masahiro Yamada

Re: [PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-05-12 Thread Steev Klimaszewski

On Sat, May 11, 2024 at 4:56 PM Dmitry Baryshkov
 wrote:
>
> Protection domain mapper is a QMI service providing mapping between
> 'protection domains' and services supported / allowed in these domains.
> For example such mapping is required for loading of the WiFi firmware or
> for properly starting up the UCSI / altmode / battery manager support.
>
> The existing userspace implementation has several issues. It doesn't play
> well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
> firmware location is changed (or if the firmware was not available at
> the time pd-mapper was started but the corresponding directory is
> mounted later), etc.
>
> However this configuration is largely static and common between
> different platforms. Provide in-kernel service implementing static
> per-platform data.
>
> To: Bjorn Andersson 
> To: Konrad Dybcio 
> To: Sibi Sankar 
> To: Mathieu Poirier 
> Cc: linux-arm-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: Johan Hovold 
> Cc: Xilin Wu 
> Cc: "Bryan O'Donoghue" 
> Cc: Steev Klimaszewski 
> Cc: Alexey Minnekhanov 
>
> --
>
> Changes in v8:
> - Reworked pd-mapper to register as an rproc_subdev / auxdev
> - Dropped Tested-by from Steev and Alexey from the last patch since the
>   implementation was changed significantly.
> - Add sensors, cdsp and mpss_root domains to 660 config (Alexey
>   Minnekhanov)
> - Added platform entry for sm4250 (used for qrb4210 / RB2)
> - Added locking to the pdr_get_domain_list() (Chris Lew)
> - Remove the call to qmi_del_server() and corresponding API (Chris Lew)
> - In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
> - Link to v7: 
> https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org
>
> Changes in v7:
> - Fixed modular build (Steev)
> - Link to v6: 
> https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org
>
> Changes in v6:
> - Reworked mutex to fix lockdep issue on deregistration
> - Fixed dependencies between PD-mapper and remoteproc to fix modular
>   builds (Krzysztof)
> - Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
> - Fixed kerneldocs (Krzysztof)
> - Removed extra pr_debug messages (Krzysztof)
> - Fixed wcss build (Krzysztof)
> - Added platforms which do not require protection domain mapping to
>   silence the notice on those platforms
> - Link to v5: 
> https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org
>
> Changes in v5:
> - pdr: drop lock in pdr_register_listener, list_lock is already held (Chris 
> Lew)
> - pd_mapper: reworked to provide static configuration per platform
>   (Bjorn)
> - Link to v4: 
> https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org
>
> Changes in v4:
> - Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
> - Added configuration for sm6350 (Thanks to Luca)
> - Removed RFC tag (Konrad)
> - Link to v3: 
> https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org
>
> Changes in RFC v3:
> - Send start / stop notifications when PD-mapper domain list is changed
> - Reworked the way PD-mapper treats protection domains, register all of
>   them in a single batch
> - Added SC7180 domains configuration based on TCL Book 14 GO
> - Link to v2: 
> https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org
>
> Changes in RFC v2:
> - Swapped num_domains / domains (Konrad)
> - Fixed an issue with battery not working on sc8280xp
> - Added missing configuration for QCS404
>
> ---
> Dmitry Baryshkov (5):
>   soc: qcom: pdr: protect locator_addr with the main mutex
>   soc: qcom: pdr: fix parsing of domains lists
>   soc: qcom: pdr: extract PDR message marshalling data
>   soc: qcom: add pd-mapper implementation
>   remoteproc: qcom: enable in-kernel PD mapper
>
>  drivers/remoteproc/qcom_common.c|  87 +
>  drivers/remoteproc/qcom_common.h|  10 +
>  drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
>  drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
>  drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
>  drivers/soc/qcom/Kconfig|  15 +
>  drivers/soc/qcom/Makefile   |   2 +
>  drivers/soc/qcom/pdr_interface.c|  17 +-
>  drivers/soc/qcom/pdr_internal.h | 318 ++---
>  drivers/soc/qcom/qcom_pd_mapper.c   | 676 
> 
>  drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
>  12 files changed, 1190 insertions(+), 300 deletions(-)
> ---
> base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
> change-id: 20240301-qcom-pd-mapper-e12d622d4ad0
>
> Best regards,
> --
> Dmitry Baryshkov 
>

I've tested this over the weekend on my Thinkpad X13s with a number of
reboots and seems to do the correct thing in v8 as well.
Tested-by: Steev Klimaszewski

[PATCH v2 5/6] kbuild: generate modules.builtin.ranges when linking the kernel

2024-05-11 Thread Kris Van Hees

Signed-off-by: Kris Van Hees 
Reviewed-by: Nick Alcock 
---
Changes since v1:
 - Renamed CONFIG_BUILTIN_RANGES to CONFIG_BUILTIN_MODULE_RANGES
---
 scripts/Makefile.vmlinux | 17 +
 1 file changed, 17 insertions(+)

diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index c9f3e03124d7f..54095d72f7fd7 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -36,6 +36,23 @@ targets += vmlinux
 vmlinux: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
+$(call if_changed_dep,link_vmlinux)
 
+# module.builtin.ranges
+# ---
+ifdef CONFIG_BUILTIN_MODULE_RANGES
+__default: modules.builtin.ranges
+
+quiet_cmd_modules_builtin_ranges = GEN $@
+  cmd_modules_builtin_ranges = \
+   $(srctree)/scripts/generate_builtin_ranges.awk \
+ $(filter-out FORCE,$+) > $@
+
+vmlinux.map: vmlinux
+
+targets += modules.builtin.ranges
+modules.builtin.ranges: modules.builtin.objs vmlinux.map vmlinux.o.map FORCE
+   $(call if_changed,modules_builtin_ranges)
+endif
+
 # Add FORCE to the prequisites of a target to force it to be always rebuilt.
 # ---
 
-- 
2.43.0

[PATCH v8 5/5] remoteproc: qcom: enable in-kernel PD mapper

2024-05-11 Thread Dmitry Baryshkov

Request in-kernel protection domain mapper to be started before starting
Qualcomm DSP and release it once DSP is stopped. Once all DSPs are
stopped, the PD mapper will be stopped too.

Signed-off-by: Dmitry Baryshkov 
---
 drivers/remoteproc/qcom_common.c| 87 +
 drivers/remoteproc/qcom_common.h| 10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |  3 ++
 drivers/remoteproc/qcom_q6v5_mss.c  |  3 ++
 drivers/remoteproc/qcom_q6v5_pas.c  |  3 ++
 drivers/remoteproc/qcom_q6v5_wcss.c |  3 ++
 6 files changed, 109 insertions(+)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 03e5f5d533eb..8c8688f99f0a 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -25,6 +26,7 @@
 #define to_glink_subdev(d) container_of(d, struct qcom_rproc_glink, subdev)
 #define to_smd_subdev(d) container_of(d, struct qcom_rproc_subdev, subdev)
 #define to_ssr_subdev(d) container_of(d, struct qcom_rproc_ssr, subdev)
+#define to_pdm_subdev(d) container_of(d, struct qcom_rproc_pdm, subdev)
 
 #define MAX_NUM_OF_SS   10
 #define MAX_REGION_NAME_LENGTH  16
@@ -519,5 +521,90 @@ void qcom_remove_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr)
 }
 EXPORT_SYMBOL_GPL(qcom_remove_ssr_subdev);
 
+static void pdm_dev_release(struct device *dev)
+{
+   struct auxiliary_device *adev = to_auxiliary_dev(dev);
+
+   kfree(adev);
+}
+
+static int pdm_notify_prepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+   struct auxiliary_device *adev;
+   int ret;
+
+   adev = kzalloc(sizeof(*adev), GFP_KERNEL);
+   if (!adev)
+   return -ENOMEM;
+
+   adev->dev.parent = pdm->dev;
+   adev->dev.release = pdm_dev_release;
+   adev->name = "pd-mapper";
+   adev->id = pdm->index;
+
+   ret = auxiliary_device_init(adev);
+   if (ret) {
+   kfree(adev);
+   return ret;
+   }
+
+   ret = auxiliary_device_add(adev);
+   if (ret) {
+   auxiliary_device_uninit(adev);
+   return ret;
+   }
+
+   pdm->adev = adev;
+
+   return 0;
+}
+
+
+static void pdm_notify_unprepare(struct rproc_subdev *subdev)
+{
+   struct qcom_rproc_pdm *pdm = to_pdm_subdev(subdev);
+
+   if (!pdm->adev)
+   return;
+
+   auxiliary_device_delete(pdm->adev);
+   auxiliary_device_uninit(pdm->adev);
+   pdm->adev = NULL;
+}
+
+/**
+ * qcom_add_pdm_subdev() - register PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm:   PDM subdevice handle
+ *
+ * Register @pdm so that Protection Domain mapper service is started when the
+ * DSP is started too.
+ */
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+   pdm->dev = &rproc->dev;
+   pdm->index = rproc->index;
+
+   pdm->subdev.prepare = pdm_notify_prepare;
+   pdm->subdev.unprepare = pdm_notify_unprepare;
+
+   rproc_add_subdev(rproc, &pdm->subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_add_pdm_subdev);
+
+/**
+ * qcom_remove_pdm_subdev() - remove PD Mapper subdevice
+ * @rproc: rproc handle
+ * @pdm:   PDM subdevice handle
+ *
+ * Remove the PD Mapper subdevice.
+ */
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm)
+{
+   rproc_remove_subdev(rproc, &pdm->subdev);
+}
+EXPORT_SYMBOL_GPL(qcom_remove_pdm_subdev);
+
 MODULE_DESCRIPTION("Qualcomm Remoteproc helper driver");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/remoteproc/qcom_common.h b/drivers/remoteproc/qcom_common.h
index 9ef4449052a9..b07fbaa091a0 100644
--- a/drivers/remoteproc/qcom_common.h
+++ b/drivers/remoteproc/qcom_common.h
@@ -34,6 +34,13 @@ struct qcom_rproc_ssr {
struct qcom_ssr_subsystem *info;
 };
 
+struct qcom_rproc_pdm {
+   struct rproc_subdev subdev;
+   struct device *dev;
+   int index;
+   struct auxiliary_device *adev;
+};
+
 void qcom_minidump(struct rproc *rproc, unsigned int minidump_id,
void (*rproc_dumpfn_t)(struct rproc *rproc,
struct rproc_dump_segment *segment, void *dest, 
size_t offset,
@@ -52,6 +59,9 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr,
 const char *ssr_name);
 void qcom_remove_ssr_subdev(struct rproc *rproc, struct qcom_rproc_ssr *ssr);
 
+void qcom_add_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+void qcom_remove_pdm_subdev(struct rproc *rproc, struct qcom_rproc_pdm *pdm);
+
 #if IS_ENABLED(CONFIG_QCOM_SYSMON)
 struct qcom_sysmon *qcom_add_sysmon_subdev(struct rproc *rproc,
   const char *name,
diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/d
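
The commit message above says the PD mapper is started before a DSP starts and
stopped once all DSPs are stopped. One way to picture that lifetime rule is a
simple user count, sketched below in user-space C. This is an illustration
only, not the code from the series: pd_mapper_get(), pd_mapper_put(),
pd_mapper_is_running() and both variables are hypothetical names, and the
sketch leaves out the locking a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names come from the patch. */
static unsigned int pd_mapper_users;
static bool pd_mapper_running;

/* Called before a DSP starts; the first user starts the mapper. */
void pd_mapper_get(void)
{
	if (pd_mapper_users++ == 0)
		pd_mapper_running = true;
}

/* Called after a DSP stops; the last user stops the mapper. */
void pd_mapper_put(void)
{
	assert(pd_mapper_users > 0);
	if (--pd_mapper_users == 0)
		pd_mapper_running = false;
}

bool pd_mapper_is_running(void)
{
	return pd_mapper_running;
}
```

In this model, starting two DSPs and stopping one leaves the mapper running;
it only stops when the second DSP stops as well.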

[PATCH v8 0/5] soc: qcom: add in-kernel pd-mapper implementation

2024-05-11 Thread Dmitry Baryshkov

Protection domain mapper is a QMI service providing mapping between
'protection domains' and services supported / allowed in these domains.
For example such mapping is required for loading of the WiFi firmware or
for properly starting up the UCSI / altmode / battery manager support.

The existing userspace implementation has several issues. It doesn't play
well with CONFIG_EXTRA_FIRMWARE, it doesn't reread the JSON files if the
firmware location is changed (or if the firmware was not available at
the time pd-mapper was started but the corresponding directory is
mounted later), etc.

However this configuration is largely static and common between
different platforms. Provide in-kernel service implementing static
per-platform data.

To: Bjorn Andersson 
To: Konrad Dybcio 
To: Sibi Sankar 
To: Mathieu Poirier 
Cc: linux-arm-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-remotep...@vger.kernel.org
Cc: Johan Hovold 
Cc: Xilin Wu 
Cc: "Bryan O'Donoghue" 
Cc: Steev Klimaszewski 
Cc: Alexey Minnekhanov 

--

Changes in v8:
- Reworked pd-mapper to register as an rproc_subdev / auxdev
- Dropped Tested-by from Steev and Alexey from the last patch since the
  implementation was changed significantly.
- Add sensors, cdsp and mpss_root domains to 660 config (Alexey
  Minnekhanov)
- Added platform entry for sm4250 (used for qrb4210 / RB2)
- Added locking to the pdr_get_domain_list() (Chris Lew)
- Remove the call to qmi_del_server() and corresponding API (Chris Lew)
- In qmi_handle_init() changed 1024 to a defined constant (Chris Lew)
- Link to v7: 
https://lore.kernel.org/r/20240424-qcom-pd-mapper-v7-0-05f7fc646...@linaro.org

Changes in v7:
- Fixed modular build (Steev)
- Link to v6: 
https://lore.kernel.org/r/20240422-qcom-pd-mapper-v6-0-f96957d01...@linaro.org

Changes in v6:
- Reworked mutex to fix lockdep issue on deregistration
- Fixed dependencies between PD-mapper and remoteproc to fix modular
  builds (Krzysztof)
- Added EXPORT_SYMBOL_GPL to fix modular builds (Krzysztof)
- Fixed kerneldocs (Krzysztof)
- Removed extra pr_debug messages (Krzysztof)
- Fixed wcss build (Krzysztof)
- Added platforms which do not require protection domain mapping to
  silence the notice on those platforms
- Link to v5: 
https://lore.kernel.org/r/20240419-qcom-pd-mapper-v5-0-e35b6f847...@linaro.org

Changes in v5:
- pdr: drop lock in pdr_register_listener, list_lock is already held (Chris Lew)
- pd_mapper: reworked to provide static configuration per platform
  (Bjorn)
- Link to v4: 
https://lore.kernel.org/r/20240311-qcom-pd-mapper-v4-0-24679cca5...@linaro.org

Changes in v4:
- Fixed missing chunk, reenabled kfree in qmi_del_server (Konrad)
- Added configuration for sm6350 (Thanks to Luca)
- Removed RFC tag (Konrad)
- Link to v3: 
https://lore.kernel.org/r/20240304-qcom-pd-mapper-v3-0-6858fa1ac...@linaro.org

Changes in RFC v3:
- Send start / stop notifications when PD-mapper domain list is changed
- Reworked the way PD-mapper treats protection domains, register all of
  them in a single batch
- Added SC7180 domains configuration based on TCL Book 14 GO
- Link to v2: 
https://lore.kernel.org/r/20240301-qcom-pd-mapper-v2-0-5d12a081d...@linaro.org

Changes in RFC v2:
- Swapped num_domains / domains (Konrad)
- Fixed an issue with battery not working on sc8280xp
- Added missing configuration for QCS404

---
Dmitry Baryshkov (5):
  soc: qcom: pdr: protect locator_addr with the main mutex
  soc: qcom: pdr: fix parsing of domains lists
  soc: qcom: pdr: extract PDR message marshalling data
  soc: qcom: add pd-mapper implementation
  remoteproc: qcom: enable in-kernel PD mapper

 drivers/remoteproc/qcom_common.c|  87 +
 drivers/remoteproc/qcom_common.h|  10 +
 drivers/remoteproc/qcom_q6v5_adsp.c |   3 +
 drivers/remoteproc/qcom_q6v5_mss.c  |   3 +
 drivers/remoteproc/qcom_q6v5_pas.c  |   3 +
 drivers/remoteproc/qcom_q6v5_wcss.c |   3 +
 drivers/soc/qcom/Kconfig|  15 +
 drivers/soc/qcom/Makefile   |   2 +
 drivers/soc/qcom/pdr_interface.c|  17 +-
 drivers/soc/qcom/pdr_internal.h | 318 ++---
 drivers/soc/qcom/qcom_pd_mapper.c   | 676 
 drivers/soc/qcom/qcom_pdr_msg.c | 353 +++
 12 files changed, 1190 insertions(+), 300 deletions(-)
---
base-commit: e5119bbdaca76cd3c15c3c975d51d840bbfb2488
change-id: 20240301-qcom-pd-mapper-e12d622d4ad0

Best regards,
-- 
Dmitry Baryshkov

Re: kernel BUG in ptr_stale

2024-05-09 Thread Kent Overstreet

On Thu, May 09, 2024 at 02:26:24PM +0800, Ubisectech Sirius wrote:
> Hello.
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> Recently, our team has discovered an issue in Linux kernel 6.7. Attached to 
> the email is a PoC file of the issue.

This (and several of your others) are fixed in Linus's tree.

> 
> Stack dump:
> 
> bcachefs (loop1): mounting version 1.7: (unknown version) 
> opts=metadata_checksum=none,data_checksum=none,nojournal_transaction_names
> ----[ cut here ]
> kernel BUG at fs/bcachefs/buckets.h:114!
> invalid opcode:  [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 9472 Comm: syz-executor.1 Not tainted 6.7.0 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 
> 04/01/2014
> RIP: 0010:bucket_gen fs/bcachefs/buckets.h:114 [inline]
> RIP: 0010:ptr_stale+0x474/0x4e0 fs/bcachefs/buckets.h:188
> Code: 48 c7 c2 80 8c 1b 8b be 67 00 00 00 48 c7 c7 e0 8c 1b 8b c6 05 ea a6 72 
> 0b 01 e8 57 55 9c fd e9 fb fc ff ff e8 9d 02 bd fd 90 <0f> 0b 48 89 04 24 e8 
> 31 bb 13 fe 48 8b 04 24 e9 35 fc ff ff e8 23
> RSP: 0018:c90007c4ec38 EFLAGS: 00010246
> RAX: 0004 RBX: 0080 RCX: c90002679000
> RDX: 0004 RSI: 83ccf3b3 RDI: 0006
> RBP:  R08: 0006 R09: 1028
> R10: 0080 R11:  R12: 1028
> R13: 88804dee5100 R14:  R15: 88805b1a4110
> FS:  7f79ba8ab640() GS:88807ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f0bbda3f000 CR3: 5f37a000 CR4: 00750ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> PKRU: 5554
> Call Trace:
>  
>  bch2_bkey_ptrs_to_text+0xb4e/0x1760 fs/bcachefs/extents.c:1012
>  bch2_btree_ptr_v2_to_text+0x288/0x330 fs/bcachefs/extents.c:215
>  bch2_val_to_text fs/bcachefs/bkey_methods.c:287 [inline]
>  bch2_bkey_val_to_text+0x1c8/0x210 fs/bcachefs/bkey_methods.c:297
>  journal_validate_key+0x7ab/0xb50 fs/bcachefs/journal_io.c:322
>  journal_entry_btree_root_validate+0x31c/0x380 fs/bcachefs/journal_io.c:411
>  bch2_journal_entry_validate+0xc7/0x130 fs/bcachefs/journal_io.c:752
>  bch2_sb_clean_validate_late+0x14b/0x1e0 fs/bcachefs/sb-clean.c:32
>  bch2_read_superblock_clean+0xbb/0x250 fs/bcachefs/sb-clean.c:160
>  bch2_fs_recovery+0x113/0x52d0 fs/bcachefs/recovery.c:691
>  bch2_fs_start+0x365/0x5e0 fs/bcachefs/super.c:978
>  bch2_fs_open+0x1ac9/0x3890 fs/bcachefs/super.c:1968
>  bch2_mount+0x538/0x13c0 fs/bcachefs/fs.c:1863
>  legacy_get_tree+0x109/0x220 fs/fs_context.c:662
>  vfs_get_tree+0x93/0x380 fs/super.c:1771
>  do_new_mount fs/namespace.c:3337 [inline]
>  path_mount+0x679/0x1e40 fs/namespace.c:3664
>  do_mount fs/namespace.c:3677 [inline]
>  __do_sys_mount fs/namespace.c:3886 [inline]
>  __se_sys_mount fs/namespace.c:3863 [inline]
>  __x64_sys_mount+0x287/0x310 fs/namespace.c:3863
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f79b9a91b3e
> Code: 48 c7 c0 ff ff ff ff eb aa e8 be 0d 00 00 66 2e 0f 1f 84 00 00 00 00 00 
> 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f79ba8aae38 EFLAGS: 0202 ORIG_RAX: 00a5
> RAX: ffda RBX: 000119f4 RCX: 7f79b9a91b3e
> RDX: 20011a00 RSI: 20011a40 RDI: 7f79ba8aae90
> RBP: 7f79ba8aaed0 R08: 7f79ba8aaed0 R09: 0181c050
> R10: 0181c050 R11: 0202 R12: 20011a00
> R13: 20011a40 R14: 7f79ba8aae90 R15: 21c0
>  
> Modules linked in:
> ---[ end trace  ]---
> 
> 
> Thank you for taking the time to read this email and we look forward to 
> working with you further.
> 
> 
> 
> 
> 
>

[PATCH] tracing: Fix trace_pid_list_free() kernel-doc

2024-05-06 Thread Jeff Johnson

make C=1 reports:

kernel/trace/pid_list.c:458: warning: Function parameter or struct member 
'pid_list' not described in 'trace_pid_list_free'

Add the missing parameter to the trace_pid_list_free() kernel-doc.

Signed-off-by: Jeff Johnson 
---
 kernel/trace/pid_list.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/trace/pid_list.c b/kernel/trace/pid_list.c
index 95106d02b32d..19b271a12c99 100644
--- a/kernel/trace/pid_list.c
+++ b/kernel/trace/pid_list.c
@@ -451,6 +451,7 @@ struct trace_pid_list *trace_pid_list_alloc(void)
 
 /**
  * trace_pid_list_free - Frees an allocated pid_list.
+ * @pid_list: The pid list to free.
  *
  * Frees the memory for a pid_list that was allocated.
  */

---
base-commit: dd5a440a31fae6e459c0d627162825505361
change-id: 20240506-trace_pid_list_free-kdoc-e2bf15be84ee
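
As an illustration of the kernel-doc layout that the warning above asks for,
here is a small user-space sketch in which every parameter has its own
"@name:" line. The demo_pid_list type and both functions are hypothetical
stand-ins, not the kernel's trace_pid_list code; only the comment format is
the point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct trace_pid_list. */
struct demo_pid_list {
	size_t nr;
	int *pids;
};

/**
 * demo_pid_list_alloc - Allocate a pid list with room for @nr entries.
 * @nr: Number of pid slots to allocate; must be non-zero.
 *
 * Return: the new zero-filled list, or NULL on allocation failure.
 */
struct demo_pid_list *demo_pid_list_alloc(size_t nr)
{
	struct demo_pid_list *pid_list;

	assert(nr > 0);
	pid_list = malloc(sizeof(*pid_list));
	if (!pid_list)
		return NULL;

	pid_list->pids = calloc(nr, sizeof(*pid_list->pids));
	if (!pid_list->pids) {
		free(pid_list);
		return NULL;
	}
	pid_list->nr = nr;
	return pid_list;
}

/**
 * demo_pid_list_free - Frees an allocated pid list.
 * @pid_list: The pid list to free. May be NULL.
 *
 * Frees the memory for a pid list that was allocated.
 */
void demo_pid_list_free(struct demo_pid_list *pid_list)
{
	if (!pid_list)
		return;
	free(pid_list->pids);
	free(pid_list);
}
```

A parameter that appears in the prototype but has no "@name:" line is the
kind of omission the quoted warning reports.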

Re: [PATCH v3 1/2] virtiofs: use pages instead of pointer for kernel direct IO

2024-05-06 Thread Hou Tao




On 4/26/2024 10:39 PM, Hou Tao wrote:
> From: Hou Tao 
>
> When trying to insert a 10MB kernel module kept in a virtio-fs with cache
> disabled, the following warning was reported:
>
>   [ cut here ]
>   WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ..
>   Modules linked in:
>   CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ..
>   RIP: 0010:__alloc_pages+0x2bf/0x380
>   ..
>   Call Trace:
>
>? __warn+0x8e/0x150
>? __alloc_pages+0x2bf/0x380
>__kmalloc_large_node+0x86/0x160
>__kmalloc+0x33c/0x480
>virtio_fs_enqueue_req+0x240/0x6d0
>virtio_fs_wake_pending_and_unlock+0x7f/0x190
>queue_request_and_unlock+0x55/0x60
>fuse_simple_request+0x152/0x2b0
>fuse_direct_io+0x5d2/0x8c0
>fuse_file_read_iter+0x121/0x160
>__kernel_read+0x151/0x2d0
>kernel_read+0x45/0x50
>kernel_read_file+0x1a9/0x2a0
>init_module_from_file+0x6a/0xe0
>idempotent_init_module+0x175/0x230
>__x64_sys_finit_module+0x5d/0xb0
>x64_sys_call+0x1c3/0x9e0
>do_syscall_64+0x3d/0xc0
>entry_SYSCALL_64_after_hwframe+0x4b/0x53
>..
>
>   ---[ end trace  ]---
>
> The warning is triggered as follows:
>

SNIP
> @@ -1585,7 +1589,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct 
> iov_iter *iter,
>   size_t nbytes = min(count, nmax);
>  
>   err = fuse_get_user_pages(&ia->ap, iter, &nbytes, write,
> -   max_pages);
> +   max_pages, fc->use_pages_for_kvec_io);
>   if (err && !nbytes)
>   break;

I just found out that flush_kernel_vmap_range() should be called before a
DMA read/write operation, and invalidate_kernel_vmap_range() after a DMA
read operation, when the kvec I/O is backed by a vmalloc() area.
I will update this in v4.
>  
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f239196103137..d4f04e19058c1 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -860,6 +860,9 @@ struct fuse_conn {
>   /** Passthrough support for read/write IO */
>   unsigned int passthrough:1;
>  
> + /* Use pages instead of pointer for kernel I/O */
> + unsigned int use_pages_for_kvec_io:1;
> +
>   /** Maximum stack depth for passthrough backing files */
>   int max_stack_depth;
>  
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 322af827a2329..36984c0e23d14 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1512,6 +1512,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
>   fc->delete_stale = true;
>   fc->auto_submounts = true;
>   fc->sync_fs = true;
> + fc->use_pages_for_kvec_io = true;
>  
>   /* Tell FUSE to split requests that exceed the virtqueue's size */
>   fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
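
To see why a 10MB buffer can trip the page allocator warning quoted at the top
of this message, the arithmetic can be sketched in user space. Everything
below is an assumption made for illustration: a 4096-byte page, a maximum
page order of 10 (a common default, but configuration dependent), the whole
10MB being requested as one physically contiguous allocation, and the demo_*
names, which are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; both depend on architecture and kernel configuration. */
#define DEMO_PAGE_SIZE      4096UL
#define DEMO_MAX_PAGE_ORDER 10U

/* Smallest order such that (DEMO_PAGE_SIZE << order) >= size. */
unsigned int demo_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((DEMO_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Non-zero if one contiguous allocation of @size would exceed the max order. */
int demo_exceeds_max_order(size_t size)
{
	return demo_get_order(size) > DEMO_MAX_PAGE_ORDER;
}
```

Under these assumptions 10MB is 2560 pages, which rounds up to order 12 and
exceeds the assumed limit, while a 4MB request (order 10) would still fit.
Passing the data as individual pages avoids the need for one large contiguous
buffer.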

Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-06 Thread Huacai Chen

On Mon, May 6, 2024 at 3:00 PM maobibo  wrote:
>
>
>
> On 2024/5/6 9:53 AM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
> >>
> >> PARAVIRT option and pv ipi is added on guest kernel side, function
> >> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
> >> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> >> it will call function kvm_para_available() to detect current hypervisor
> >> type. Now only KVM type detection is supported, the paravirt function can
> >> work only if current hypervisor type is KVM, since there is only KVM
> >> supported on LoongArch now.
> >>
> >> PV IPI uses virtual IPI sender and virtual IPI receiver function. With
> >> virtual IPI sender, ipi message is stored in DDR memory rather than
> >> emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
> >> at the same time like X86 KVM method. Hypercall method is used for IPI
> >> sending.
> >>
> >> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> >> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
> >> acknowledge. And IPI message is stored in DDR, no trap in get IPI message.
> >>
> >> Signed-off-by: Bibo Mao 
> >> ---
> >>   arch/loongarch/Kconfig    |   9 ++
> >>   arch/loongarch/include/asm/hardirq.h  |   1 +
> >>   arch/loongarch/include/asm/paravirt.h |  27 
> >>   .../include/asm/paravirt_api_clock.h  |   1 +
> >>   arch/loongarch/kernel/Makefile|   1 +
> >>   arch/loongarch/kernel/irq.c   |   2 +-
> >>   arch/loongarch/kernel/paravirt.c  | 151 ++
> >>   arch/loongarch/kernel/smp.c   |   4 +-
> >>   8 files changed, 194 insertions(+), 2 deletions(-)
> >>   create mode 100644 arch/loongarch/include/asm/paravirt.h
> >>   create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
> >>   create mode 100644 arch/loongarch/kernel/paravirt.c
> >>
> >> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >> index 54ad04dacdee..0a1540a8853e 100644
> >> --- a/arch/loongarch/Kconfig
> >> +++ b/arch/loongarch/Kconfig
> >> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> >>  bool
> >>  default y
> >>
> >> +config PARAVIRT
> >> +   bool "Enable paravirtualization code"
> >> +   depends on AS_HAS_LVZ_EXTENSION
> >> +   help
> >> +  This changes the kernel so it can modify itself when it is run
> >> + under a hypervisor, potentially improving performance significantly
> >> + over full virtualization.  However, when run without a hypervisor
> >> + the kernel is theoretically slower and slightly larger.
> >> +
> >>   config ARCH_SUPPORTS_KEXEC
> >>  def_bool y
> >>
> >> diff --git a/arch/loongarch/include/asm/hardirq.h 
> >> b/arch/loongarch/include/asm/hardirq.h
> >> index 9f0038e19c7f..b26d596a73aa 100644
> >> --- a/arch/loongarch/include/asm/hardirq.h
> >> +++ b/arch/loongarch/include/asm/hardirq.h
> >> @@ -21,6 +21,7 @@ enum ipi_msg_type {
> >>   typedef struct {
> >>  unsigned int ipi_irqs[NR_IPI];
> >>  unsigned int __softirq_pending;
> >> +   atomic_t message cacheline_aligned_in_smp;
> >>   } cacheline_aligned irq_cpustat_t;
> >>
> >>   DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> >> diff --git a/arch/loongarch/include/asm/paravirt.h 
> >> b/arch/loongarch/include/asm/paravirt.h
> >> new file mode 100644
> >> index ..58f7b7b89f2c
> >> --- /dev/null
> >> +++ b/arch/loongarch/include/asm/paravirt.h
> >> @@ -0,0 +1,27 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> >> +#define _ASM_LOONGARCH_PARAVIRT_H
> >> +
> >> +#ifdef CONFIG_PARAVIRT
> >> +#include 
> >> +struct static_key;
> >> +extern struct static_key paravirt_steal_enabled;
> >> +extern struct static_key paravirt_steal_rq_enabled;
> >> +
> >> +u64 dummy_steal_clock(int cpu);
> >> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> >> +
> >> +static inline u64 paravirt_steal_clock

Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-06 Thread maobibo





On 2024/5/6 9:53 AM, Huacai Chen wrote:

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:


PARAVIRT option and pv ipi is added on guest kernel side, function
pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current hypervisor
type. Now only KVM type detection is supported, the paravirt function can
work only if current hypervisor type is KVM, since there is only KVM
supported on LoongArch now.

PV IPI uses virtual IPI sender and virtual IPI receiver function. With
virtual IPI sender, ipi message is stored in DDR memory rather than
emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
at the same time like X86 KVM method. Hypercall method is used for IPI
sending.

With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
acknowledge. And IPI message is stored in DDR, no trap in get IPI message.

Signed-off-by: Bibo Mao 
---
  arch/loongarch/Kconfig|   9 ++
  arch/loongarch/include/asm/hardirq.h  |   1 +
  arch/loongarch/include/asm/paravirt.h |  27 
  .../include/asm/paravirt_api_clock.h  |   1 +
  arch/loongarch/kernel/Makefile|   1 +
  arch/loongarch/kernel/irq.c   |   2 +-
  arch/loongarch/kernel/paravirt.c  | 151 ++
  arch/loongarch/kernel/smp.c   |   4 +-
  8 files changed, 194 insertions(+), 2 deletions(-)
  create mode 100644 arch/loongarch/include/asm/paravirt.h
  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
  create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 54ad04dacdee..0a1540a8853e 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
 bool
 default y

+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
  config ARCH_SUPPORTS_KEXEC
 def_bool y

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
  typedef struct {
 unsigned int ipi_irqs[NR_IPI];
 unsigned int __softirq_pending;
+   atomic_t message cacheline_aligned_in_smp;
  } cacheline_aligned irq_cpustat_t;

  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3a7620b66bc6..c9bfeda89e40 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
  obj-$(CONFIG_STACKTRACE)   += stacktrace.o

  obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o

  obj-$(CONFIG_SMP)  += smp.o

diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
 per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
 }

-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   se

[PATCH] kernel/module: disable cfi for do_mod_ctors

2024-05-05 Thread Joey Jiao

A CFI failure occurs when both CONFIG_CONSTRUCTORS and CFI_CLANG are enabled:

CFI failure at do_init_module+0x100/0x384 (target:
tsan.module_ctor+0x0/0xa98 [module_name_xx]; expected type: 0xa540670c)

Disable CFI for do_mod_ctors() to avoid the CFI check on the indirect
call mod->ctors[i]().

Signed-off-by: Joey Jiao 
---
 kernel/module/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..d51e63795637 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2453,6 +2453,7 @@ static int post_relocation(struct module *mod, const struct load_info *info)
 }
 
 /* Call module constructors. */
+__nocfi
 static void do_mod_ctors(struct module *mod)
 {
 #ifdef CONFIG_CONSTRUCTORS
-- 
2.43.2
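
The failure reported in the patch above happens at an indirect call: do_mod_ctors() walks a table of constructor pointers and calls each one, and CFI checks every such call against the expected function type. The following is a minimal userspace sketch of that call pattern, not the kernel code. The names demo_nocfi, ctor_fn_t, run_ctors and demo_run are hypothetical, and demo_nocfi is only an assumed stand-in for the kernel's __nocfi annotation (it expands to a Clang attribute, and to nothing on other compilers).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __nocfi; only meaningful with Clang. */
#if defined(__clang__)
#define demo_nocfi __attribute__((no_sanitize("cfi")))
#else
#define demo_nocfi
#endif

/* Constructor signature assumed for this sketch. */
typedef void (*ctor_fn_t)(void);

int ctor_calls;

void ctor_a(void) { ctor_calls += 1; }
void ctor_b(void) { ctor_calls += 10; }

/*
 * Call every constructor in the table. Each ctors[i]() is an indirect
 * call, which is the kind of call site a CFI scheme instruments.
 * demo_nocfi asks the compiler to skip that instrumentation here.
 */
demo_nocfi
size_t run_ctors(const ctor_fn_t *ctors, size_t n)
{
	size_t i;

	assert(ctors != NULL);
	for (i = 0; i < n; i++)
		ctors[i]();
	return n;
}

/* Run a two-entry table and report the accumulated counter. */
int demo_run(void)
{
	static const ctor_fn_t table[] = { ctor_a, ctor_b };

	ctor_calls = 0;
	run_ctors(table, sizeof(table) / sizeof(table[0]));
	return ctor_calls;
}
```

Disabling the check on the whole function is the approach the patch takes. It trades the type check on these particular calls for compatibility with compiler-generated constructors whose type hash does not match.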

Re: [PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-05-05 Thread Huacai Chen

Hi, Bibo,

On Sun, Apr 28, 2024 at 6:05 PM Bibo Mao  wrote:
>
> PARAVIRT option and pv ipi is added on guest kernel side, function
> pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
> firstly checks whether system runs on VM mode. If kernel runs on VM mode,
> it will call function kvm_para_available() to detect current hypervisor
> type. Now only KVM type detection is supported, the paravirt function can
> work only if current hypervisor type is KVM, since there is only KVM
> supported on LoongArch now.
>
> PV IPI uses virtual IPI sender and virtual IPI receiver function. With
> virtual IPI sender, ipi message is stored in DDR memory rather than
> emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
> at the same time like X86 KVM method. Hypercall method is used for IPI
> sending.
>
> With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
> VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
> acknowledge. And IPI message is stored in DDR, no trap in get IPI message.
>
> Signed-off-by: Bibo Mao 
> ---
>  arch/loongarch/Kconfig|   9 ++
>  arch/loongarch/include/asm/hardirq.h  |   1 +
>  arch/loongarch/include/asm/paravirt.h |  27 
>  .../include/asm/paravirt_api_clock.h  |   1 +
>  arch/loongarch/kernel/Makefile|   1 +
>  arch/loongarch/kernel/irq.c   |   2 +-
>  arch/loongarch/kernel/paravirt.c  | 151 ++
>  arch/loongarch/kernel/smp.c   |   4 +-
>  8 files changed, 194 insertions(+), 2 deletions(-)
>  create mode 100644 arch/loongarch/include/asm/paravirt.h
>  create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
>  create mode 100644 arch/loongarch/kernel/paravirt.c
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 54ad04dacdee..0a1540a8853e 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
> bool
> default y
>
> +config PARAVIRT
> +   bool "Enable paravirtualization code"
> +   depends on AS_HAS_LVZ_EXTENSION
> +   help
> +  This changes the kernel so it can modify itself when it is run
> + under a hypervisor, potentially improving performance significantly
> + over full virtualization.  However, when run without a hypervisor
> + the kernel is theoretically slower and slightly larger.
> +
>  config ARCH_SUPPORTS_KEXEC
> def_bool y
>
> diff --git a/arch/loongarch/include/asm/hardirq.h 
> b/arch/loongarch/include/asm/hardirq.h
> index 9f0038e19c7f..b26d596a73aa 100644
> --- a/arch/loongarch/include/asm/hardirq.h
> +++ b/arch/loongarch/include/asm/hardirq.h
> @@ -21,6 +21,7 @@ enum ipi_msg_type {
>  typedef struct {
> unsigned int ipi_irqs[NR_IPI];
> unsigned int __softirq_pending;
> +   atomic_t message cacheline_aligned_in_smp;
>  } cacheline_aligned irq_cpustat_t;
>
>  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/loongarch/include/asm/paravirt.h 
> b/arch/loongarch/include/asm/paravirt.h
> new file mode 100644
> index ..58f7b7b89f2c
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_LOONGARCH_PARAVIRT_H
> +#define _ASM_LOONGARCH_PARAVIRT_H
> +
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +struct static_key;
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 dummy_steal_clock(int cpu);
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> +   return static_call(pv_steal_clock)(cpu);
> +}
> +
> +int pv_ipi_init(void);
> +#else
> +static inline int pv_ipi_init(void)
> +{
> +   return 0;
> +}
> +
> +#endif // CONFIG_PARAVIRT
> +#endif
> diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
> b/arch/loongarch/include/asm/paravirt_api_clock.h
> new file mode 100644
> index ..65ac7cee0dad
> --- /dev/null
> +++ b/arch/loongarch/include/asm/paravirt_api_clock.h
> @@ -0,0 +1 @@
> +#include 
> diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
> index 3a7620b66bc6..c9bfeda89e40 100644
> --- a/arch/loongarch/kernel/Makefile
> +++ b/arch/loongarch/kernel/Makefile
> @@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
>  obj-$(CONFIG_STACKTRACE)   += stacktrace.o
>
>  obj-$(CONFIG_PROC_FS)  += proc.o
>

Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-30 Thread Chris Lew





On 4/26/2024 6:36 PM, Dmitry Baryshkov wrote:

On Sat, 27 Apr 2024 at 04:03, Chris Lew  wrote:




On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:

diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_adsp.c
index 1d24c9b656a8..02d0c626b03b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -23,6 +23,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 

@@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
   int ret;
   unsigned int val;

- ret = qcom_q6v5_prepare(&adsp->q6v5);
+ ret = qcom_pdm_get();
   if (ret)
   return ret;


Would it make sense to try and model this as a rproc subdev? This
section of the remoteproc code seems to be focused on making specific
calls to set up and enable hardware resources, whereas pd mapper is
software.

sysmon and ssr are also purely software and they are modeled as subdevs
in qcom_common. I'm not an expert on remoteproc organization but this
was just a thought.


Well, the issue is that the pd-mapper is global, not a per-remoteproc instance.



Both sysmon and ssr have some kind of global states that they manage 
too. Each subdev functionality tends to be a mix of per-remoteproc 
instance management and global state management.


If pd-mapper was completely global, pd-mapper would be able to 
instantiate by itself. Instead, instantiation is dependent on each 
remoteproc instance properly getting and putting references.


The pdm subdev could manage the references to pd-mapper for that 
remoteproc instance.


On the other hand, I think Bjorn recommended this could be moved to 
probe time in v4. The v4 version was doing the reinitialization-dance, 
but I think the recommendation could still apply to this version.




Thanks!
Chris



+ ret = qcom_q6v5_prepare(&adsp->q6v5);
+ if (ret)
+ goto put_pdm;
+
   ret = adsp_map_carveout(rproc);
   if (ret) {
   dev_err(adsp->dev, "ADSP smmu mapping failed\n");
@@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
   adsp_unmap_carveout(rproc);
   disable_irqs:
   qcom_q6v5_unprepare(&adsp->q6v5);
+put_pdm:
+ qcom_pdm_release();

   return ret;
   }
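
The point debated above is that the PD mapper is a single shared service whose lifetime depends on every remoteproc instance taking and dropping a reference correctly. The following is a minimal userspace sketch of that get/release pattern and of the put_pdm unwinding added to adsp_start(). All demo_* names are hypothetical, the prepare_fails flag merely simulates qcom_q6v5_prepare() failing, and locking is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* One shared service for all users, tracked by a plain reference count. */
int demo_pdm_refs;
bool demo_pdm_running;

/* First user starts the service; later users only take a reference. */
int demo_pdm_get(void)
{
	if (demo_pdm_refs++ == 0)
		demo_pdm_running = true;
	return 0;
}

/* Last user stops the service. */
void demo_pdm_release(void)
{
	assert(demo_pdm_refs > 0);
	if (--demo_pdm_refs == 0)
		demo_pdm_running = false;
}

/*
 * Mirrors the shape of the patched start path: take the reference
 * first, and drop it again if a later step fails.
 */
int demo_adsp_start(bool prepare_fails)
{
	int ret;

	ret = demo_pdm_get();
	if (ret)
		return ret;

	ret = prepare_fails ? -1 : 0;	/* simulated prepare step */
	if (ret)
		goto put_pdm;

	return 0;

put_pdm:
	demo_pdm_release();
	return ret;
}
```

A successful start leaves one reference held, and a failed start leaves the count unchanged. That is the invariant each remoteproc driver has to maintain by hand in this design, and it is what a subdev or probe-time approach would centralize.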

BUG: unable to handle kernel paging request in do_split

2024-04-29 Thread Ubisectech Sirius

Hello.
We are the Ubisectech Sirius Team, the vulnerability lab of China ValiantSec.
Recently, our team discovered an issue in Linux kernel 6.7. A PoC file for the
issue is attached to this email.

Stack dump:
BUG: unable to handle page fault for address: ed110c2fd97f
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 7ffd0067 P4D 7ffd0067 PUD 0
Oops:  [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 24082 Comm: syz-executor.3 Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047
Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c 
c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 
e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef
RSP: 0018:c90001e9f858 EFLAGS: 00010a02
RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000
RDX: 11110c2fd97f RSI: 823364ab RDI: 0005
RBP: 8880617ecc00 R08: 0005 R09: 
R10:  R11:  R12: dc00
R13:  R14:  R15: 88801ee8d2b0
FS:  7f191402a640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554
Call Trace:
 
 make_indexed_dir+0x1158/0x1540 fs/ext4/namei.c:2342
 ext4_add_entry+0xcd0/0xe80 fs/ext4/namei.c:2454
 ext4_add_nondir+0x90/0x2b0 fs/ext4/namei.c:2795
 ext4_symlink+0x539/0x9e0 fs/ext4/namei.c:3436
 vfs_symlink fs/namei.c:4464 [inline]
 vfs_symlink+0x3f6/0x640 fs/namei.c:4448
 do_symlinkat+0x245/0x2f0 fs/namei.c:4490
 __do_sys_symlink fs/namei.c:4511 [inline]
 __se_sys_symlink fs/namei.c:4509 [inline]
 __x64_sys_symlink+0x79/0xa0 fs/namei.c:4509
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x43/0x120 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f191329002d
Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7f191402a028 EFLAGS: 0246 ORIG_RAX: 0058
RAX: ffda RBX: 7f19133cbf80 RCX: 7f191329002d
RDX:  RSI: 2e40 RDI: 20001640
RBP: 7f19132f14d0 R08:  R09: 
R10:  R11: 0246 R12: 
R13: 000b R14: 7f19133cbf80 R15: 7f191400a000
 
Modules linked in:
CR2: ed110c2fd97f
---[ end trace  ]---
RIP: 0010:do_split+0xfef/0x1e10 fs/ext4/namei.c:2047
Code: d2 0f 85 38 0b 00 00 8b 45 00 89 84 24 84 00 00 00 41 8d 45 ff 48 8d 1c 
c3 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 
e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ef
RSP: 0018:c90001e9f858 EFLAGS: 00010a02
RAX: dc00 RBX: 617ecbf8 RCX: c9001048f000
RDX: 11110c2fd97f RSI: 823364ab RDI: 0005
RBP: 8880617ecc00 R08: 0005 R09: 
R10:  R11:  R12: dc00
R13:  R14:  R15: 88801ee8d2b0
FS:  7f191402a640() GS:88802c60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ed110c2fd97f CR3: 5500a000 CR4: 00750ef0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
PKRU: 5554

Code disassembly (best guess):
   0:   d2 0f                   rorb   %cl,(%rdi)
   2:   85 38                   test   %edi,(%rax)
   4:   0b 00                   or     (%rax),%eax
   6:   00 8b 45 00 89 84       add    %cl,-0x7b76ffbb(%rbx)
   c:   24 84                   and    $0x84,%al
   e:   00 00                   add    %al,(%rax)
  10:   00 41 8d                add    %al,-0x73(%rcx)
  13:   45 ff 48 8d             rex.RB decl -0x73(%r8)
  17:   1c c3                   sbb    $0xc3,%al
  19:   48 b8 00 00 00 00 00    movabs $0xdc00,%rax
  20:   fc ff df
  23:   48 89 da                mov    %rbx,%rdx
  26:   48 c1 ea 03             shr    $0x3,%rdx
* 2a:   0f b6 14 02             movzbl (%rdx,%rax,1),%edx <-- trapping instruction
  2e:   48 89 d8                mov    %rbx,%rax
  31:   83 e0 07                and    $0x7,%eax
  34:   83 c0 03                add    $0x3,%eax
  37:   38 d0                   cmp    %dl,%al
  39:   7c 08                   jl     0x43
  3b:   84 d2                   test   %dl,%dl
  3d:   0f                      .byte 0xf
  3e:   85 ef                   test

[PATCH v8 6/6] LoongArch: Add pv ipi support on guest kernel side

2024-04-28 Thread Bibo Mao

PARAVIRT option and pv ipi is added on guest kernel side, function
pv_ipi_init() is to add ipi sending and ipi receiving hooks. This function
firstly checks whether system runs on VM mode. If kernel runs on VM mode,
it will call function kvm_para_available() to detect current hypervisor
type. Now only KVM type detection is supported, the paravirt function can
work only if current hypervisor type is KVM, since there is only KVM
supported on LoongArch now.

PV IPI uses virtual IPI sender and virtual IPI receiver function. With
virtual IPI sender, ipi message is stored in DDR memory rather than
emulated HW. IPI multicast is supported, and 128 vcpus can receive IPIs
at the same time like X86 KVM method. Hypercall method is used for IPI
sending.

With virtual IPI receiver, HW SW0 is used rather than real IPI HW. Since
VCPU has separate HW SW0 like HW timer, there is no trap in IPI interrupt
acknowledge. And IPI message is stored in DDR, no trap in get IPI message.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|   9 ++
 arch/loongarch/include/asm/hardirq.h  |   1 +
 arch/loongarch/include/asm/paravirt.h |  27 
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |   2 +-
 arch/loongarch/kernel/paravirt.c  | 151 ++
 arch/loongarch/kernel/smp.c   |   4 +-
 8 files changed, 194 insertions(+), 2 deletions(-)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 54ad04dacdee..0a1540a8853e 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -583,6 +583,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+   bool "Enable paravirtualization code"
+   depends on AS_HAS_LVZ_EXTENSION
+   help
+  This changes the kernel so it can modify itself when it is run
+ under a hypervisor, potentially improving performance significantly
+ over full virtualization.  However, when run without a hypervisor
+     the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..b26d596a73aa 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+   atomic_t message cacheline_aligned_in_smp;
 } cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..58f7b7b89f2c
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_ipi_init(void);
+#else
+static inline int pv_ipi_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3a7620b66bc6..c9bfeda89e40 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index ce36897d1e5a..4863e6c1b739 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -113,5 +113,5 @@ void __init init_IRQ(void)
per_cpu(irq_stack, i), per_cpu(irq_stack, i) + 
IRQ_STACK_SIZE);
}
 
-   set_csr_ecfg(ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
+   set_csr_ecfg(ECFGF_SIP0 | ECFGF_IP0 | ECFGF_IP1 | ECFGF_IP2 | ECFGF_IPI | ECFGF_PMC);
 }
diff --git a/arch/loongarch/kernel/paravir
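
The commit message above describes an IPI message kept in ordinary memory (the patch adds an atomic_t message field to irq_cpustat_t), with a hypercall used only to notify the target. Since the paravirt.c part of the patch is cut off here, the following is only a userspace sketch of such a mailbox using C11 atomics. All demo_* names are hypothetical, and the notification counter stands in for the hypercall.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

/* Hypothetical action bits; each action owns one bit of the message word. */
enum demo_action { DEMO_RESCHEDULE = 0, DEMO_CALL_FUNCTION = 1 };

/* Per-CPU message word in plain memory, plus a count of notifications. */
atomic_uint demo_message[DEMO_NR_CPUS];
unsigned int demo_notify_count[DEMO_NR_CPUS];

/*
 * Sender: set the action bit atomically. Only the sender that finds
 * the word empty needs to notify the target (the hypercall stand-in).
 * Returns true when a notification was issued.
 */
bool demo_send_ipi(int cpu, enum demo_action action)
{
	unsigned int old;

	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	old = atomic_fetch_or(&demo_message[cpu], 1u << action);
	if (old)
		return false;	/* a notification is already pending */
	demo_notify_count[cpu]++;
	return true;
}

/* Receiver: fetch and clear all pending actions in one atomic step. */
unsigned int demo_receive_ipi(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return atomic_exchange(&demo_message[cpu], 0);
}
```

Because the receiver only reads and clears a memory word, acknowledging and decoding the IPI needs no trap to the hypervisor, which is the benefit the commit message claims for the virtual receiver.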

Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-26 Thread Dmitry Baryshkov

On Sat, 27 Apr 2024 at 04:03, Chris Lew  wrote:
>
>
>
> On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:
> > diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
> > b/drivers/remoteproc/qcom_q6v5_adsp.c
> > index 1d24c9b656a8..02d0c626b03b 100644
> > --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> > +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> > @@ -23,6 +23,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >
> > @@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)
> >   int ret;
> >   unsigned int val;
> >
> > - ret = qcom_q6v5_prepare(&adsp->q6v5);
> > + ret = qcom_pdm_get();
> >   if (ret)
> >   return ret;
>
> Would it make sense to try and model this as a rproc subdev? This
> section of the remoteproc code seems to be focused on making specific
> calls to set up and enable hardware resources, whereas pd mapper is
> software.
>
> sysmon and ssr are also purely software and they are modeled as subdevs
> in qcom_common. I'm not an expert on remoteproc organization but this
> was just a thought.

Well, the issue is that the pd-mapper is global, not a per-remoteproc instance.

>
> Thanks!
> Chris
>
> >
> > + ret = qcom_q6v5_prepare(&adsp->q6v5);
> > + if (ret)
> > + goto put_pdm;
> > +
> >   ret = adsp_map_carveout(rproc);
> >   if (ret) {
> >   dev_err(adsp->dev, "ADSP smmu mapping failed\n");
> > @@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
> >   adsp_unmap_carveout(rproc);
> >   disable_irqs:
> >   qcom_q6v5_unprepare(&adsp->q6v5);
> > +put_pdm:
> > + qcom_pdm_release();
> >
> >   return ret;
> >   }
>


-- 
With best wishes
Dmitry

Re: [PATCH v7 6/6] remoteproc: qcom: enable in-kernel PD mapper

2024-04-26 Thread Chris Lew





On 4/24/2024 2:28 AM, Dmitry Baryshkov wrote:

diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
b/drivers/remoteproc/qcom_q6v5_adsp.c
index 1d24c9b656a8..02d0c626b03b 100644
--- a/drivers/remoteproc/qcom_q6v5_adsp.c
+++ b/drivers/remoteproc/qcom_q6v5_adsp.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -375,10 +376,14 @@ static int adsp_start(struct rproc *rproc)

int ret;
unsigned int val;
  
-	ret = qcom_q6v5_prepare(&adsp->q6v5);

+   ret = qcom_pdm_get();
if (ret)
return ret;


Would it make sense to try and model this as a rproc subdev? This
section of the remoteproc code seems to be focused on making specific
calls to set up and enable hardware resources, whereas pd mapper is
software.


sysmon and ssr are also purely software and they are modeled as subdevs 
in qcom_common. I'm not an expert on remoteproc organization but this 
was just a thought.


Thanks!
Chris

  
+	ret = qcom_q6v5_prepare(&adsp->q6v5);

+   if (ret)
+   goto put_pdm;
+
ret = adsp_map_carveout(rproc);
if (ret) {
dev_err(adsp->dev, "ADSP smmu mapping failed\n");
@@ -446,6 +451,8 @@ static int adsp_start(struct rproc *rproc)
adsp_unmap_carveout(rproc);
  disable_irqs:
qcom_q6v5_unprepare(&adsp->q6v5);
+put_pdm:
+   qcom_pdm_release();
  
  	return ret;

  }

Re: [PATCH] kernel/trace/trace_probe:Fixed memory leak issues in trace_probe.c.

2024-04-26 Thread Google

Hi LuMingYin,

Thanks for finding the problem! But please write the commit message
following Documentation/process/submitting-patches.rst

On Fri, 26 Apr 2024 10:13:43 +0100
lumingyindet...@126.com wrote:

> From: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> 
> At line 1408 of the file /linux/kernel/trace/trace_probe.c, pointer variables 
> named code and tmp are defined. At line 1437, a new dynamic memory area is 
> allocated using the function kcalloc. When the if statement at line 1467 
> evaluates to true, the program jumps to the out label at line 1469. Within 
> this function, there are two labels: out and fail. The difference between 
> these two labels is that fail additionally frees the dynamic memory area 
> pointed to by the variable code. Therefore, the program should jump to the 
> fail label instead of the out label. This commit fixes this bug.
> 

For example, you must break lines after about 70 characters. Also,
please don't refer to line numbers, because they change easily
(function names are OK). Since this bug is a very clear mistake,
you can just explain it as follows.

 If traceprobe_parse_probe_arg_body() fails to allocate 'parg->fmt', it
 jumps to 'out' instead of 'fail' by mistake. As a result, the 'tmp'
 buffer is not freed and its memory leaks.

 Fix it by jumping to 'fail' in that case.

The first paragraph explains what happens, and the second one explains
how to fix it.

Also, please add this Fixes tag.

Fixes: 032330abd08b ("tracing/probes: Cleanup probe argument parser")

You can easily find this commit number with git blame.
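
The difference between the two labels can be illustrated with a small userspace sketch. This is a simplified model, not the trace_probe.c code: the demo_* names are hypothetical, the jump_to_fail flag selects between the buggy and the fixed jump target, and a counter of live allocations stands in for a leak detector.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Number of buffers currently allocated and not yet freed. */
int demo_live_allocs;

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live_allocs--;
		free(p);
	}
}

/*
 * 'fail' frees the scratch buffer and then falls through to 'out';
 * 'out' only returns. Jumping to 'out' on error therefore skips the
 * free. fmt_alloc_fails simulates the second allocation failing.
 */
int demo_parse_arg(bool fmt_alloc_fails, bool jump_to_fail)
{
	char *tmp, *fmt;
	int ret = 0;

	tmp = demo_alloc(64);		/* scratch buffer, like 'tmp' */
	if (!tmp)
		return -ENOMEM;

	fmt = fmt_alloc_fails ? NULL : demo_alloc(16);
	if (!fmt) {
		ret = -ENOMEM;
		if (jump_to_fail)
			goto fail;	/* fixed: scratch buffer is freed */
		goto out;		/* buggy: scratch buffer leaks */
	}
	demo_free(fmt);			/* freed here only to keep the demo balanced */

fail:
	demo_free(tmp);
out:
	return ret;
}
```

With the fixed jump target the allocation count returns to zero on the error path. With the buggy one, exactly one buffer stays allocated after the function returns.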

Thank you,

> Signed-off-by: LuMingYin <11570291+yin-lum...@user.noreply.gitee.com>
> ---
>  kernel/trace/trace_probe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index dfe3ee6035ec..42bc0f362226 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1466,7 +1466,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
>   parg->fmt = kmalloc(len, GFP_KERNEL);
>   if (!parg->fmt) {
>   ret = -ENOMEM;
> - goto out;
> + goto fail;
>   }
>   snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
>parg->count);
> -- 
> 2.25.1
> 

-- 
Masami Hiramatsu (Google)
