[PATCH] vfio: fix virtio-pci dependency

2024-01-08 Thread Arnd Bergmann
From: Arnd Bergmann 

The new vfio-virtio driver already has a dependency on VIRTIO_PCI_ADMIN_LEGACY,
but that is a bool symbol and allows vfio-virtio to be built-in even if
virtio-pci itself is a loadable module. This leads to a link failure:

aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function 
`virtiovf_pci_probe':
main.c:(.text+0xec): undefined reference to `virtio_pci_admin_has_legacy_io'
aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function 
`virtiovf_pci_init_device':
main.c:(.text+0x260): undefined reference to 
`virtio_pci_admin_legacy_io_notify_info'
aarch64-linux-ld: drivers/vfio/pci/virtio/main.o: in function 
`virtiovf_pci_bar0_rw':
main.c:(.text+0x6ec): undefined reference to 
`virtio_pci_admin_legacy_common_io_read'
aarch64-linux-ld: main.c:(.text+0x6f4): undefined reference to 
`virtio_pci_admin_legacy_device_io_read'
aarch64-linux-ld: main.c:(.text+0x7f0): undefined reference to 
`virtio_pci_admin_legacy_common_io_write'
aarch64-linux-ld: main.c:(.text+0x7f8): undefined reference to 
`virtio_pci_admin_legacy_device_io_write'

Add another explicit dependency on the tristate symbol.

Fixes: eb61eca0e8c3 ("vfio/virtio: Introduce a vfio driver over virtio devices")
Signed-off-by: Arnd Bergmann 
---
 drivers/vfio/pci/virtio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
index fc3a0be9d8d4..bd80eca4a196 100644
--- a/drivers/vfio/pci/virtio/Kconfig
+++ b/drivers/vfio/pci/virtio/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config VIRTIO_VFIO_PCI
 tristate "VFIO support for VIRTIO NET PCI devices"
-depends on VIRTIO_PCI_ADMIN_LEGACY
+depends on VIRTIO_PCI && VIRTIO_PCI_ADMIN_LEGACY
 select VFIO_PCI_CORE
 help
   This provides support for exposing VIRTIO NET VF devices which 
support
-- 
2.39.2




Re: [PATCH 5/5] LoongArch: Add pv ipi support on LoongArch system

2024-01-08 Thread kernel test robot
Hi Bibo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 610a9b8f49fbcf1100716370d3b5f6f884a2835a]

url:
https://github.com/intel-lab-lkp/linux/commits/Bibo-Mao/LoongArch-KVM-Add-hypercall-instruction-emulation-support/20240103-151946
base:   610a9b8f49fbcf1100716370d3b5f6f884a2835a
patch link:
https://lore.kernel.org/r/20240103071615.3422264-6-maobibo%40loongson.cn
patch subject: [PATCH 5/5] LoongArch: Add pv ipi support on LoongArch system
config: loongarch-randconfig-r061-20240109 
(https://download.01.org/0day-ci/archive/20240109/202401091354.gtbergqj-...@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202401091354.gtbergqj-...@intel.com/

cocci warnings: (new ones prefixed by >>)
>> arch/loongarch/kvm/exit.c:681:5-8: Unneeded variable: "ret". Return "  0" on 
>> line 701
--
>> arch/loongarch/kvm/exit.c:720:2-3: Unneeded semicolon

vim +681 arch/loongarch/kvm/exit.c

   678  
   679  static int kvm_pv_send_ipi(struct kvm_vcpu *vcpu, int sgi)
   680  {
 > 681  int ret = 0;
   682  u64 ipi_bitmap;
   683  unsigned int min, cpu;
   684  struct kvm_vcpu *dest;
   685  
   686  ipi_bitmap = vcpu->arch.gprs[LOONGARCH_GPR_A1];
   687  min = vcpu->arch.gprs[LOONGARCH_GPR_A2];
   688  
   689  if (ipi_bitmap) {
   690  cpu = find_first_bit((void *)_bitmap, 
BITS_PER_LONG);
   691  while (cpu < BITS_PER_LONG) {
   692  if ((cpu + min) < KVM_MAX_VCPUS) {
   693  dest = kvm_get_vcpu_by_id(vcpu->kvm, 
cpu + min);
   694  kvm_queue_irq(dest, sgi);
   695  kvm_vcpu_kick(dest);
   696  }
   697  cpu = find_next_bit((void *)_bitmap, 
BITS_PER_LONG, cpu + 1);
   698  }
   699  }
   700  
 > 701  return ret;
   702  }
   703  
   704  /*
   705   * hypcall emulation always return to guest, Caller should check retval.
   706   */
   707  static void kvm_handle_pv_hcall(struct kvm_vcpu *vcpu)
   708  {
   709  unsigned long func = vcpu->arch.gprs[LOONGARCH_GPR_A0];
   710  long ret;
   711  
   712  switch (func) {
   713  case KVM_HC_FUNC_IPI:
   714  kvm_pv_send_ipi(vcpu, INT_SWI0);
   715  ret = KVM_HC_STATUS_SUCCESS;
   716  break;
   717  default:
   718  ret = KVM_HC_INVALID_CODE;
   719  break;
 > 720  };
   721  
   722  vcpu->arch.gprs[LOONGARCH_GPR_A0] = ret;
   723  }
   724  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: linux-next: Tree for Jan 8 (drivers/vfio/pci/virtio/main.o)

2024-01-08 Thread Randy Dunlap


On 1/7/24 21:09, Stephen Rothwell wrote:
> Hi all,
> 
> News: the merge window has opened, so please do not add any material
> intended for v6.9 to your linux-next included branches until asfter
> v6.8-rc1 has been released.
> 
> Changes since 20240105:
> 

on powerpc 64-bit:

when
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_PCI_ADMIN_LEGACY=y


powerpc64-linux-ld: drivers/vfio/pci/virtio/main.o: in function 
`virtiovf_pci_probe':
main.c:(.text+0x17c): undefined reference to `virtio_pci_admin_has_legacy_io'
powerpc64-linux-ld: drivers/vfio/pci/virtio/main.o: in function 
`virtiovf_pci_init_device':
main.c:(.text+0x82c): undefined reference to 
`virtio_pci_admin_legacy_io_notify_info'
powerpc64-linux-ld: drivers/vfio/pci/virtio/main.o: in function 
`virtiovf_pci_bar0_rw':
main.c:(.text+0xe98): undefined reference to 
`virtio_pci_admin_legacy_common_io_read'
powerpc64-linux-ld: main.c:(.text+0xecc): undefined reference to 
`virtio_pci_admin_legacy_device_io_read'
powerpc64-linux-ld: main.c:(.text+0x10d8): undefined reference to 
`virtio_pci_admin_legacy_common_io_write'
powerpc64-linux-ld: main.c:(.text+0x1108): undefined reference to 
`virtio_pci_admin_legacy_device_io_write'


Also, it is not a good idea to have a driver depend on a hidden (not visible) 
Kconfig symbol,
like this:

config VIRTIO_VFIO_PCI
tristate "VFIO support for VIRTIO NET PCI devices"
depends on VIRTIO_PCI_ADMIN_LEGACY


Full randconfig file is attached.

-- 
#Randy

config-r4841.gz
Description: application/gzip


Re: REGRESSION: lockdep warning triggered by 15b9ce7ecd: virtio_balloon: stay awake while adjusting balloon

2024-01-08 Thread David Stevens
On Tue, Jan 9, 2024 at 6:50 AM Theodore Ts'o  wrote:
>
> Hi, while doing final testing before sending a pull request, I merged
> in linux-next, and commit 5b9ce7ecd7: virtio_balloon: stay awake while
> adjusting balloon seems to be causing a lockdep warning (see attached)
> when running gce-xfstests on a Google Compute Engine e2 VM.  I was not
> able to trigger it using kvm-xfstests, but the following command:
> "gce-xfstests -C 10 ext4/4k generic/476) was sufficient to triger the
> problem.   For more information please see [1] and [2].
>
> [1] 
> https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
> [2] https://thunk.org/gce-xfstests
>
> I found it by looking at the git logs, and this commit aroused my
> suspicions, and I further testing showed that the lockdep warning was
> reproducible with this commit, but not when testing with the
> immediately preceeding commit (15b9ce7ecd^).
>
> Cheers,
>
> - Ted
>
>
> root: ext4/4k run xfstest generic/476
> systemd[1]: Started fstests-generic-476.scope - /usr/bin/bash -c test -w 
> /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec 
> ./tests/generic/476.
> kernel: [  399.361181] EXT4-fs (dm-1): mounted filesystem 
> 840e25bd-f650-4819-8562-7eded85ef370 r/w with ordered data mode. Quota mode: 
> none.
> systemd[1]: fstests-generic-476.scope: Deactivated successfully.
> systemd[1]: fstests-generic-476.scope: Consumed 3min 1.966s CPU time.
> systemd[1]: xt\x2dvdb.mount: Deactivated successfully.
> kernel: [  537.085404] EXT4-fs (dm-0): unmounting filesystem 
> d3d7a675-f7b6-4384-abec-2e60d885b6da.
> systemd[1]: xt\x2dvdc.mount: Deactivated successfully.
> kernel: [  540.565870]
> kernel: [  540.567523] 
> kernel: [  540.572007] WARNING: inconsistent lock state
> kernel: [  540.576407] 6.7.0-rc3-xfstests-lockdep-00012-g5b9ce7ecd715 #318 
> Not tainted
> kernel: [  540.583532] 
> kernel: [  540.587928] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> kernel: [  540.594326] kworker/0:3/329 [HC0[0]:SC0[0]:HE1:SE1] takes:
> kernel: [  540.599955] 90b280a548c0 (>adjustment_lock){?...}-{2:2}, 
> at: update_balloon_size_func+0x33/0x190
> kernel: [  540.609926] {IN-HARDIRQ-W} state was registered at:
> kernel: [  540.614935]   __lock_acquire+0x3f2/0xb30
> kernel: [  540.618992]   lock_acquire+0xbf/0x2b0
> kernel: [  540.622786]   _raw_spin_lock_irqsave+0x43/0x90
> kernel: [  540.627366]   virtballoon_changed+0x51/0xd0
> kernel: [  540.631947]   virtio_config_changed+0x5a/0x70
> kernel: [  540.636437]   vp_config_changed+0x11/0x20
> kernel: [  540.640576]   __handle_irq_event_percpu+0x88/0x230
> kernel: [  540.645500]   handle_irq_event+0x38/0x80
> kernel: [  540.649558]   handle_edge_irq+0x8f/0x1f0
> kernel: [  540.653791]   __common_interrupt+0x47/0xf0
> kernel: [  540.658106]   common_interrupt+0x79/0xa0
> kernel: [  540.661672] EXT4-fs (dm-1): unmounting filesystem 
> 840e25bd-f650-4819-8562-7eded85ef370.
> kernel: [  540.663183]   asm_common_interrupt+0x26/0x40
> kernel: [  540.663190]   acpi_safe_halt+0x1b/0x30
> kernel: [  540.663196]   acpi_idle_enter+0x7b/0xd0
> kernel: [  540.663199]   cpuidle_enter_state+0x90/0x4f0
> kernel: [  540.688723]   cpuidle_enter+0x2d/0x40
> kernel: [  540.692516]   cpuidle_idle_call+0xe4/0x120
> kernel: [  540.697036]   do_idle+0x84/0xd0
> kernel: [  540.700393]   cpu_startup_entry+0x2a/0x30
> kernel: [  540.704588]   rest_init+0xe9/0x180
> kernel: [  540.708118]   arch_call_rest_init+0xe/0x30
> kernel: [  540.712426]   start_kernel+0x41c/0x4b0
> kernel: [  540.716310]   x86_64_start_reservations+0x18/0x30
> kernel: [  540.721164]   x86_64_start_kernel+0x8c/0x90
> kernel: [  540.725737]   secondary_startup_64_no_verify+0x178/0x17b
> kernel: [  540.731432] irq event stamp: 22681
> kernel: [  540.734956] hardirqs last  enabled at (22681): 
> [] _raw_spin_unlock_irq+0x28/0x50
> kernel: [  540.744564] hardirqs last disabled at (22680): 
> [] _raw_spin_lock_irq+0x5d/0x90
> kernel: [  540.753475] softirqs last  enabled at (22076): 
> [] srcu_invoke_callbacks+0x101/0x1c0
> kernel: [  540.762904] softirqs last disabled at (22072): 
> [] srcu_invoke_callbacks+0x101/0x1c0
> kernel: [  540.773298]
> kernel: [  540.773298] other info that might help us debug this:
> kernel: [  540.780207]  Possible unsafe locking scenario:
> kernel: [  540.780207]
> kernel: [  540.786438]CPU0
> kernel: [  540.789007]
> kernel: [  540.791766]   lock(>adjustment_lock);
> kernel: [  540.796014]   
> kernel: [  540.798778] lock(>adjustment_lock);
> kernel: [  540.803605]

Oh, that's embarrassing, I completely whiffed on interactions with
interrupts. The following patch fixes it, and I've locally repro'ed
the issue and verified the fix. What's the process for getting this
fix merged? Does it get merged as a seperatch patch, or squashed into
the original commit?

>From 

Re: [PATCH v6 01/12] cgroup/misc: Add per resource callbacks for CSS events

2024-01-08 Thread Haitao Huang
On Wed, 15 Nov 2023 14:25:59 -0600, Jarkko Sakkinen   
wrote:



On Mon Oct 30, 2023 at 8:20 PM EET, Haitao Huang wrote:

From: Kristen Carlson Accardi 

The misc cgroup controller (subsystem) currently does not perform
resource type specific action for Cgroups Subsystem State (CSS) events:
the 'css_alloc' event when a cgroup is created and the 'css_free' event
when a cgroup is destroyed.

Define callbacks for those events and allow resource providers to
register the callbacks per resource type as needed. This will be
utilized later by the EPC misc cgroup support implemented in the SGX
driver.

Also add per resource type private data for those callbacks to store and
access resource specific data.

Signed-off-by: Kristen Carlson Accardi 
Co-developed-by: Haitao Huang 
Signed-off-by: Haitao Huang 
---
V6:
- Create ops struct for per resource callbacks (Jarkko)
- Drop max_write callback (Dave, Michal)
- Style fixes (Kai)
---
 include/linux/misc_cgroup.h | 14 ++
 kernel/cgroup/misc.c| 27 ---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index e799b1f8d05b..5dc509c27c3d 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -27,16 +27,30 @@ struct misc_cg;

 #include 

+/**
+ * struct misc_operations_struct: per resource callback ops.
+ * @alloc: invoked for resource specific initialization when cgroup is  
allocated.
+ * @free: invoked for resource specific cleanup when cgroup is  
deallocated.

+ */
+struct misc_operations_struct {
+   int (*alloc)(struct misc_cg *cg);
+   void (*free)(struct misc_cg *cg);
+};


Maybe just misc_operations, or even misc_ops?



With Michal's suggestion to make ops per-resource-type, I'll rename this  
misc_res_ops  (I was following vm_operations_struct as example)



+
 /**
  * struct misc_res: Per cgroup per misc type resource
  * @max: Maximum limit on the resource.
  * @usage: Current usage of the resource.
  * @events: Number of times, the resource limit exceeded.
+ * @priv: resource specific data.
+ * @misc_ops: resource specific operations.
  */
 struct misc_res {
u64 max;
atomic64_t usage;
atomic64_t events;
+   void *priv;


priv is the wrong patch, it just confuses the overall picture heere.
please move it to 04/12. Let's deal with the callbacks here.



Ok


+   const struct misc_operations_struct *misc_ops;
 };


misc_ops would be at least consistent with this, as misc_res also has an
acronym.



 /**
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 79a3717a5803..d971ede44ebf 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -383,23 +383,37 @@ static struct cftype misc_cg_files[] = {
 static struct cgroup_subsys_state *
 misc_cg_alloc(struct cgroup_subsys_state *parent_css)
 {
+   struct misc_cg *parent_cg, *cg;
enum misc_res_type i;
-   struct misc_cg *cg;
+   int ret;

if (!parent_css) {
-   cg = _cg;
+   parent_cg = cg = _cg;
} else {
cg = kzalloc(sizeof(*cg), GFP_KERNEL);
if (!cg)
return ERR_PTR(-ENOMEM);
+   parent_cg = css_misc(parent_css);
}

for (i = 0; i < MISC_CG_RES_TYPES; i++) {
WRITE_ONCE(cg->res[i].max, MAX_NUM);
atomic64_set(>res[i].usage, 0);
+		if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->alloc)  
{

+   ret = parent_cg->res[i].misc_ops->alloc(cg);
+   if (ret)
+   goto alloc_err;


The patch set only has a use case for both operations defined - any
partial combinations should never be allowed.

To enforce this invariant you could create a set of operations (written
out of top of my head):

static int misc_res_init(struct misc_res *res, struct misc_ops *ops)
{
if (!misc_ops->alloc) {
pr_err("%s: alloc missing\n", __func__);
return -EINVAL;
}

if (!misc_ops->free) {
pr_err("%s: free missing\n", __func__);
return -EINVAL;
}

res->misc_ops = misc_ops;
return 0;
}

static inline int misc_res_alloc(struct misc_cg *cg, struct misc_res  
*res)

{
int ret;

if (!res->misc_ops)
return 0;

return res->misc_ops->alloc(cg);
}

static inline void misc_res_free(struct misc_cg *cg, struct misc_res  
*res)

{
int ret;

if (!res->misc_ops)
return 0;

return res->misc_ops->alloc(cg);
}

Now if anything has misc_ops, it will also always have *both* callback,
and nothing half-baked gets in.

The above loops would be then:

for (i = 0; i < MISC_CG_RES_TYPES; i++) {
WRITE_ONCE(cg->res[i].max, MAX_NUM);
atomic64_set(>res[i].usage, 0);
ret = 

Re: possible deadlock in __perf_install_in_context

2024-01-08 Thread Steven Rostedt


This is related to perf not tracing.

-- Steve


On Tue, 09 Jan 2024 08:34:15 +0800
"Ubisectech Sirius"  wrote:

> Dear concerned.
> Greetings!
> We are Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. 
> Recently, our team has discovered a issue in Linux kernel 6.7.0-g0dd3ee311255.
> technical details:
> 1. Issue Description: possible deadlock in __perf_install_in_context
> 2. Stack Dump: 
> [ 158.488994][ T8029] Call Trace:
> [ 158.489411][ T8029] 
> arch/x86/events/intel/../perf_event.h:1166 arch/x86/events/intel/core.c:2799)
> [ 158.498427][ T8029] x86_pmu_start (arch/x86/events/core.c:1516)
> [ 158.499034][ T8029] x86_pmu_enable (arch/x86/events/core.c:1331 
> (discriminator 2))
> [ 158.499601][ T8029] perf_ctx_enable (kernel/events/core.c:703 
> (discriminator 2))
> [ 158.500171][ T8029] ctx_resched (kernel/events/core.c:2741)
> [ 158.500733][ T8029] __perf_install_in_context (kernel/events/core.c:2807)
> [ 158.502106][ T8029] remote_function (kernel/events/core.c:92 
> kernel/events/core.c:72)
> [ 158.503364][ T8029] generic_exec_single (kernel/smp.c:134 (discriminator 3) 
> kernel/smp.c:404 (discriminator 3))
> [ 158.503995][ T8029] smp_call_function_single (kernel/smp.c:647)
> (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:77 
> ./arch/x86/include/asm/irqflags.h:135 lib/percpu_counter.c:102)
> [ 158.512958][ T8029] perf_install_in_context (kernel/events/core.c:2909 
> (discriminator 1))
> [ 158.515717][ T8029] __do_sys_perf_event_open (kernel/events/core.c:1443 
> kernel/events/core.c:12747)
> [ 158.518483][ T8029] do_syscall_64 (arch/x86/entry/common.c:52 
> arch/x86/entry/common.c:83)
> [ 158.519281][ T8029] entry_SYSCALL_64_after_hwframe 
> (arch/x86/entry/entry_64.S:129)
> [ 158.519991][ T8029] RIP: 0033:0x7f04a0c9cf29
> [ 158.520536][ T8029] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 
> 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 
> <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 37 8f 0d 00 f7 d8 64 89 01 48
> All code
> 
>  0: 00 c3 add %al,%bl
>  2: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
>  9: 00 00 00
>  c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
>  11: 48 89 f8 mov %rdi,%rax
>  14: 48 89 f7 mov %rsi,%rdi
>  17: 48 89 d6 mov %rdx,%rsi
>  1a: 48 89 ca mov %rcx,%rdx
>  1d: 4d 89 c2 mov %r8,%r10
>  20: 4d 89 c8 mov %r9,%r8
>  23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
>  28: 0f 05 syscall
>  2a:* 48 3d 01 f0 ff ff cmp $0xf001,%rax <-- trapping instruction
>  30: 73 01 jae 0x33
>  32: c3 ret
>  33: 48 8b 0d 37 8f 0d 00 mov 0xd8f37(%rip),%rcx # 0xd8f71
>  3a: f7 d8 neg %eax
>  3c: 64 89 01 mov %eax,%fs:(%rcx)
>  3f: 48 rex.W
> Code starting with the faulting instruction
> ===
>  0: 48 3d 01 f0 ff ff cmp $0xf001,%rax
>  6: 73 01 jae 0x9
>  8: c3 ret
>  9: 48 8b 0d 37 8f 0d 00 mov 0xd8f37(%rip),%rcx # 0xd8f47
>  10: f7 d8 neg %eax
>  12: 64 89 01 mov %eax,%fs:(%rcx)
>  15: 48 rex.W
> [ 158.522837][ T8029] RSP: 002b:7ffe5f1174b8 EFLAGS: 0246 ORIG_RAX: 
> 012a
> [ 158.523848][ T8029] RAX: ffda RBX:  RCX: 
> 7f04a0c9cf29
> [ 158.524797][ T8029] RDX:  RSI:  RDI: 
> 20004740
> [ 158.525738][ T8029] RBP: 7ffe5f1174c0 R08:  R09: 
> 7ffe5f1174f0
> [ 158.526717][ T8029] R10:  R11: 0246 R12: 
> 5597d067d180
> [ 158.527661][ T8029] R13:  R14:  R15: 
> 
> [ 158.528611][ T8029] 
> [ 158.530059][ T8029]
> [ 158.530364][ T8029] ==
> [ 158.531146][ T8029] WARNING: possible circular locking dependency detected
> [ 158.531881][ T8029] 6.7.0-g0dd3ee311255 #6 Not tainted
> [ 158.532457][ T8029] --
> [ 158.533256][ T8029] poc/8029 is trying to acquire lock:
> [ 158.533880][ T8029] 88801ca53018 (>lock){}-{2:2}, at: 
> __perf_event_task_sched_out (kernel/events/core.c:3573 
> kernel/events/core.c:3676)
> [ 158.535067][ T8029]
> [ 158.535067][ T8029] but task is already holding lock:
> [ 158.535925][ T8029] 88802d23c758 (>__lock){-.-.}-{2:2}, at: 
> raw_spin_rq_lock_nested (kernel/sched/core.c:574)
> [ 158.537001][ T8029]
> [ 158.537001][ T8029] which lock already depends on the new lock.
> [ 158.537001][ T8029]
> [ 158.538196][ T8029]
> [ 158.538196][ T8029] the existing dependency chain (in reverse order) is:
> [ 158.539200][ T8029]
> [ 158.539200][ T8029] -> #3 (>__lock){-.-.}-{2:2}:
> [ 158.540081][ T8029] _raw_spin_lock_nested (kernel/locking/spinlock.c:379)
> [ 158.540772][ T8029] raw_spin_rq_lock_nested (kernel/sched/core.c:574)
> [ 158.541471][ T8029] task_fork_fair (kernel/sched/sched.h:1222 
> kernel/sched/sched.h:1581 kernel/sched/sched.h:1664 kernel/sched/fair.c:12586)
> [ 158.542092][ T8029] sched_cgroup_fork (kernel/sched/core.c:4814)
> [ 158.542772][ 

Re: [PATCH RFT] arm64: dts: qcom: sm8350: Reenable crypto & cryptobam

2024-01-08 Thread Dmitry Baryshkov
On Mon, 8 Jan 2024 at 16:23, Luca Weiss  wrote:
>
> On Mon Jan 8, 2024 at 3:18 PM CET, Konrad Dybcio wrote:
> > On 8.01.2024 14:49, Luca Weiss wrote:
> > > When num-channels and qcom,num-ees is not provided in devicetree, the
> > > driver will try to read these values from the registers during probe but
> > > this fails if the interconnect is not on and then crashes the system.
> > >
> > > So we can provide these properties in devicetree (queried after patching
> > > BAM driver to enable the necessary interconnect) so we can probe
> > > cryptobam without reading registers and then also use the QCE as
> > > expected.
> >
> > This really feels a bit backwards.. Enable the resource to query the
> > hardware for numbers, so that said resource can be enabled, but
> > slightly later :/
>
> If you think adding interconnect support to driver and dtsi is better,
> let me know.

I'd say, adding the proper interconnect is a better option. Otherwise
we just depend on the QCE itself to set up the vote for us.

>
> Stephan (+CC) mentioned it should be okay like this *shrug*
>
> For the record, this is the same way I got the values for sc7280[0] and
> sm6350[1].
>
> [0] 
> https://lore.kernel.org/linux-arm-msm/20231229-sc7280-cryptobam-fixup-v1-1-bd8f68589...@fairphone.com/
> [1] 
> https://lore.kernel.org/linux-arm-msm/20240105-sm6350-qce-v1-0-416e5c731...@fairphone.com/
>
> diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi 
> b/arch/arm64/boot/dts/qcom/sm8350.dtsi
> index b46236235b7f..cd4dd9852d9e 100644
> --- a/arch/arm64/boot/dts/qcom/sm8350.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi
> @@ -1756,8 +1756,8 @@ cryptobam: dma-controller@1dc4000 {
> qcom,controlled-remotely;
> iommus = <_smmu 0x594 0x0011>,
>  <_smmu 0x596 0x0011>;
> -   /* FIXME: Probing BAM DMA causes some abort and 
> system hang */
> -   status = "fail";
> +   interconnects = <_noc MASTER_CRYPTO 0 _virt 
> SLAVE_EBI1 0>;
> +   interconnect-names = "memory";
> };
>
> crypto: crypto@1dfa000 {
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index 5e7d332731e0..9de28f615639 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include "../dmaengine.h"
> @@ -394,6 +395,7 @@ struct bam_device {
> const struct reg_offset_data *layout;
>
> struct clk *bamclk;
> +   struct icc_path *mem_path;
> int irq;
>
> /* dma start transaction tasklet */
> @@ -1206,6 +1208,7 @@ static int bam_init(struct bam_device *bdev)
> bdev->num_channels = val & BAM_NUM_PIPES_MASK;
> }
>
> +   printk(KERN_ERR "%s:%d DBG num_ees=%u num_channels=%u\n", __func__, 
> __LINE__, bdev->num_ees, bdev->num_channels);
> /* Reset BAM now if fully controlled locally */
> if (!bdev->controlled_remotely && !bdev->powered_remotely)
> bam_reset(bdev);
> @@ -1298,6 +1301,14 @@ static int bam_dma_probe(struct platform_device *pdev)
> return ret;
> }
>
> +   bdev->mem_path = devm_of_icc_get(bdev->dev, "memory");
> +   if (IS_ERR(bdev->mem_path))
> +   return PTR_ERR(bdev->mem_path);
> +
> +   ret = icc_set_bw(bdev->mem_path, 1, 1);

Probably this needs some more sensible value.

> +   if (ret)
> +   return ret;
> +
> ret = bam_init(bdev);
> if (ret)
> goto err_disable_clk;
>


-- 
With best wishes
Dmitry



Re: [PATCH] tracing user_events: Simplify user_event_parse_field() parsing

2024-01-08 Thread Steven Rostedt
On Mon, 8 Jan 2024 17:13:12 -0500
Steven Rostedt  wrote:

> On Mon, 8 Jan 2024 21:47:44 +
> Beau Belgrave  wrote:
> 
> > > - len = str_has_prefix(field, "__rel_loc ");
> > > - if (len)
> > > - goto skip_next;
> > > + if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&
> > > + !(len = str_has_prefix(field, "__data_loc ")) &&
> > > + !(len = str_has_prefix(field, "__rel_loc unsigned ")) &&
> > > + !(len = str_has_prefix(field, "__rel_loc "))) {
> > > + goto parse;
> > > + }
> > 
> > This now triggers a checkpatch error:
> > ERROR: do not use assignment in if condition  
> 
> What a horrible message.
> 
> > #1184: FILE: kernel/trace/trace_events_user.c:1184:
> > +   if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&
> > 
> > I personally prefer to keep these files fully checkpatch clean.  
> 
> I've stopped using checkpatch years ago because I disagreed with so much it 
> :-p
>   (Including this message)

Note that checkpatch is a guideline and not a rule. The general rule is, if
the code looks worse when applying the checkpatch rule, don't do it.

- Steve



Re: [PATCH] tracing user_events: Simplify user_event_parse_field() parsing

2024-01-08 Thread Steven Rostedt
On Mon, 8 Jan 2024 21:47:44 +
Beau Belgrave  wrote:

> > -   len = str_has_prefix(field, "__rel_loc ");
> > -   if (len)
> > -   goto skip_next;
> > +   if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&
> > +   !(len = str_has_prefix(field, "__data_loc ")) &&
> > +   !(len = str_has_prefix(field, "__rel_loc unsigned ")) &&
> > +   !(len = str_has_prefix(field, "__rel_loc "))) {
> > +   goto parse;
> > +   }  
> 
> This now triggers a checkpatch error:
> ERROR: do not use assignment in if condition

What a horrible message.

> #1184: FILE: kernel/trace/trace_events_user.c:1184:
> +   if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&
> 
> I personally prefer to keep these files fully checkpatch clean.

I've stopped using checkpatch years ago because I disagreed with so much it :-p
  (Including this message)

> However, I did test these changes under the self-tests and it passed.
> 
> Do they bug you that much? :)

No big deal if you prefer the other way. I was just doing an audit of
str_has_prefix() to see what code could be cleaned up that uses it, and I
found this code.

If you prefer to limit your code to "checkpatch clean", I'll leave it alone.

-- Steve



REGRESSION: lockdep warning triggered by 15b9ce7ecd: virtio_balloon: stay awake while adjusting balloon

2024-01-08 Thread Theodore Ts'o
Hi, while doing final testing before sending a pull request, I merged
in linux-next, and commit 5b9ce7ecd7: virtio_balloon: stay awake while
adjusting balloon seems to be causing a lockdep warning (see attached)
when running gce-xfstests on a Google Compute Engine e2 VM.  I was not
able to trigger it using kvm-xfstests, but the following command:
"gce-xfstests -C 10 ext4/4k generic/476) was sufficient to triger the
problem.   For more information please see [1] and [2].

[1] 
https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
[2] https://thunk.org/gce-xfstests

I found it by looking at the git logs, and this commit aroused my
suspicions, and I further testing showed that the lockdep warning was
reproducible with this commit, but not when testing with the
immediately preceeding commit (15b9ce7ecd^).

Cheers,

- Ted


root: ext4/4k run xfstest generic/476
systemd[1]: Started fstests-generic-476.scope - /usr/bin/bash -c test -w 
/proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec 
./tests/generic/476.
kernel: [  399.361181] EXT4-fs (dm-1): mounted filesystem 
840e25bd-f650-4819-8562-7eded85ef370 r/w with ordered data mode. Quota mode: 
none.
systemd[1]: fstests-generic-476.scope: Deactivated successfully.
systemd[1]: fstests-generic-476.scope: Consumed 3min 1.966s CPU time.
systemd[1]: xt\x2dvdb.mount: Deactivated successfully.
kernel: [  537.085404] EXT4-fs (dm-0): unmounting filesystem 
d3d7a675-f7b6-4384-abec-2e60d885b6da.
systemd[1]: xt\x2dvdc.mount: Deactivated successfully.
kernel: [  540.565870] 
kernel: [  540.567523] 
kernel: [  540.572007] WARNING: inconsistent lock state
kernel: [  540.576407] 6.7.0-rc3-xfstests-lockdep-00012-g5b9ce7ecd715 #318 Not 
tainted
kernel: [  540.583532] 
kernel: [  540.587928] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
kernel: [  540.594326] kworker/0:3/329 [HC0[0]:SC0[0]:HE1:SE1] takes:
kernel: [  540.599955] 90b280a548c0 (>adjustment_lock){?...}-{2:2}, at: 
update_balloon_size_func+0x33/0x190
kernel: [  540.609926] {IN-HARDIRQ-W} state was registered at:
kernel: [  540.614935]   __lock_acquire+0x3f2/0xb30
kernel: [  540.618992]   lock_acquire+0xbf/0x2b0
kernel: [  540.622786]   _raw_spin_lock_irqsave+0x43/0x90
kernel: [  540.627366]   virtballoon_changed+0x51/0xd0
kernel: [  540.631947]   virtio_config_changed+0x5a/0x70
kernel: [  540.636437]   vp_config_changed+0x11/0x20
kernel: [  540.640576]   __handle_irq_event_percpu+0x88/0x230
kernel: [  540.645500]   handle_irq_event+0x38/0x80
kernel: [  540.649558]   handle_edge_irq+0x8f/0x1f0
kernel: [  540.653791]   __common_interrupt+0x47/0xf0
kernel: [  540.658106]   common_interrupt+0x79/0xa0
kernel: [  540.661672] EXT4-fs (dm-1): unmounting filesystem 
840e25bd-f650-4819-8562-7eded85ef370.
kernel: [  540.663183]   asm_common_interrupt+0x26/0x40
kernel: [  540.663190]   acpi_safe_halt+0x1b/0x30
kernel: [  540.663196]   acpi_idle_enter+0x7b/0xd0
kernel: [  540.663199]   cpuidle_enter_state+0x90/0x4f0
kernel: [  540.688723]   cpuidle_enter+0x2d/0x40
kernel: [  540.692516]   cpuidle_idle_call+0xe4/0x120
kernel: [  540.697036]   do_idle+0x84/0xd0
kernel: [  540.700393]   cpu_startup_entry+0x2a/0x30
kernel: [  540.704588]   rest_init+0xe9/0x180
kernel: [  540.708118]   arch_call_rest_init+0xe/0x30
kernel: [  540.712426]   start_kernel+0x41c/0x4b0
kernel: [  540.716310]   x86_64_start_reservations+0x18/0x30
kernel: [  540.721164]   x86_64_start_kernel+0x8c/0x90
kernel: [  540.725737]   secondary_startup_64_no_verify+0x178/0x17b
kernel: [  540.731432] irq event stamp: 22681
kernel: [  540.734956] hardirqs last  enabled at (22681): [] 
_raw_spin_unlock_irq+0x28/0x50
kernel: [  540.744564] hardirqs last disabled at (22680): [] 
_raw_spin_lock_irq+0x5d/0x90
kernel: [  540.753475] softirqs last  enabled at (22076): [] 
srcu_invoke_callbacks+0x101/0x1c0
kernel: [  540.762904] softirqs last disabled at (22072): [] 
srcu_invoke_callbacks+0x101/0x1c0
kernel: [  540.773298] 
kernel: [  540.773298] other info that might help us debug this:
kernel: [  540.780207]  Possible unsafe locking scenario:
kernel: [  540.780207] 
kernel: [  540.786438]CPU0
kernel: [  540.789007]
kernel: [  540.791766]   lock(>adjustment_lock);
kernel: [  540.796014]   
kernel: [  540.798778] lock(>adjustment_lock);
kernel: [  540.803605] 
kernel: [  540.803605]  *** DEADLOCK ***
kernel: [  540.803605] 
kernel: [  540.809840] 2 locks held by kworker/0:3/329:
kernel: [  540.814259]  #0: 90b280079148 
((wq_completion)events_freezable){+.+.}-{0:0}, at: process_one_work+0x1a6/0x500
kernel: [  540.824952]  #1: 9e3b40d1fe50 
((work_completion)(>update_balloon_size_work)){+.+.}-{0:0}, at: 
process_one_work+0x1a6/0x500
kernel: [  540.837088] 
kernel: [  540.837088] stack backtrace:
kernel: [  540.841584] CPU: 0 PID: 329 Comm: kworker/0:3 Not tainted 

Re: [PATCH] tracing user_events: Simplify user_event_parse_field() parsing

2024-01-08 Thread Beau Belgrave
On Mon, Jan 08, 2024 at 01:37:23PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" 
> 
> Instead of having a bunch of if statements with:
> 
>len = str_has_prefix(field, "__data_loc unsigned ");
>if (len)
>goto skip_next;
> 
>len = str_has_prefix(field, "__data_loc ");
>if (len)
>goto skip_next;
> 
>len = str_has_prefix(field, "__rel_loc unsigned ");
>if (len)
>goto skip_next;
> 
>len = str_has_prefix(field, "__rel_loc ");
>if (len)
>goto skip_next;
> 
>   goto parse;
> 
>  skip_next:
> 
> Consolidate it into a negative check and jump to parse if all the
> str_has_prefix() calls fail. If one succeeds, it will just continue with
> len equal to the proper string:
> 
>if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&
>!(len = str_has_prefix(field, "__data_loc ")) &&
>!(len = str_has_prefix(field, "__rel_loc unsigned ")) &&
>!(len = str_has_prefix(field, "__rel_loc "))) {
>goto parse;
>}
> 
>  skip_next:
> 
> Signed-off-by: Steven Rostedt (Google) 
> ---
>  kernel/trace/trace_events_user.c | 22 ++
>  1 file changed, 6 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_user.c 
> b/kernel/trace/trace_events_user.c
> index 9365ce407426..ce0c5f1ded48 100644
> --- a/kernel/trace/trace_events_user.c
> +++ b/kernel/trace/trace_events_user.c
> @@ -1175,23 +1175,13 @@ static int user_event_parse_field(char *field, struct 
> user_event *user,
>   goto skip_next;
>   }
>  
> - len = str_has_prefix(field, "__data_loc unsigned ");
> - if (len)
> - goto skip_next;
> -
> - len = str_has_prefix(field, "__data_loc ");
> - if (len)
> - goto skip_next;
> -
> - len = str_has_prefix(field, "__rel_loc unsigned ");
> - if (len)
> - goto skip_next;
> -
> - len = str_has_prefix(field, "__rel_loc ");
> - if (len)
> - goto skip_next;
> + if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&
> + !(len = str_has_prefix(field, "__data_loc ")) &&
> + !(len = str_has_prefix(field, "__rel_loc unsigned ")) &&
> + !(len = str_has_prefix(field, "__rel_loc "))) {
> + goto parse;
> + }

This now triggers a checkpatch error:
ERROR: do not use assignment in if condition
#1184: FILE: kernel/trace/trace_events_user.c:1184:
+   if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&

I personally prefer to keep these files fully checkpatch clean.
However, I did test these changes under the self-tests and it passed.

Do they bug you that much? :)

Thanks,
-Beau

>  
> - goto parse;
>  skip_next:
>   type = field;
>   field = strpbrk(field + len, " ");
> -- 
> 2.43.0



Re: [PATCH v5] can: virtio: Initial virtio CAN driver.

2024-01-08 Thread Christophe JAILLET

Le 08/01/2024 à 14:10, Mikhail Golubev-Ciuchea a écrit :

From: Harald Mommer 

- CAN Control

   - "ip link set up can0" starts the virtual CAN controller,
   - "ip link set up can0" stops the virtual CAN controller

- CAN RX

   Receive CAN frames. CAN frames can be standard or extended, classic or
   CAN FD. Classic CAN RTR frames are supported.

- CAN TX

   Send CAN frames. CAN frames can be standard or extended, classic or
   CAN FD. Classic CAN RTR frames are supported.

- CAN BusOff indication

   CAN BusOff is handled by a bit in the configuration space.

Signed-off-by: Harald Mommer 
Signed-off-by: Mikhail Golubev-Ciuchea 
Co-developed-by: Marc Kleine-Budde 
Signed-off-by: Marc Kleine-Budde 
Cc: Damir Shaikhutdinov 
---


Hi,
a few nits below, should there be a v6.



+static int virtio_can_alloc_tx_idx(struct virtio_can_priv *priv)
+{
+   int tx_idx;
+
+   tx_idx = ida_alloc_range(>tx_putidx_ida, 0,
+priv->can.echo_skb_max - 1, GFP_KERNEL);
+   if (tx_idx >= 0)
+   atomic_add(1, >tx_inflight);


atomic_inc() ?


+
+   return tx_idx;
+}
+
+static void virtio_can_free_tx_idx(struct virtio_can_priv *priv,
+  unsigned int idx)
+{
+   ida_free(>tx_putidx_ida, idx);
+   atomic_sub(1, >tx_inflight);


atomic_dec() ?


+}


...


+static int virtio_can_probe(struct virtio_device *vdev)
+{
+   struct virtio_can_priv *priv;
+   struct net_device *dev;
+   int err;
+
+   dev = alloc_candev(sizeof(struct virtio_can_priv),
+  VIRTIO_CAN_ECHO_SKB_MAX);
+   if (!dev)
+   return -ENOMEM;
+
+   priv = netdev_priv(dev);
+
+   ida_init(>tx_putidx_ida);
+
+   netif_napi_add(dev, >napi, virtio_can_rx_poll);
+   netif_napi_add(dev, >napi_tx, virtio_can_tx_poll);
+
+   SET_NETDEV_DEV(dev, >dev);
+
+   priv->dev = dev;
+   priv->vdev = vdev;
+   vdev->priv = priv;
+
+   priv->can.do_set_mode = virtio_can_set_mode;
+   /* Set Virtio CAN supported operations */
+   priv->can.ctrlmode_supported = CAN_CTRLMODE_BERR_REPORTING;
+   if (virtio_has_feature(vdev, VIRTIO_CAN_F_CAN_FD)) {
+   err = can_set_static_ctrlmode(dev, CAN_CTRLMODE_FD);
+   if (err != 0)
+   goto on_failure;
+   }
+
+   /* Initialize virtqueues */
+   err = virtio_can_find_vqs(priv);
+   if (err != 0)
+   goto on_failure;
+
+   INIT_LIST_HEAD(>tx_list);
+
+   spin_lock_init(>tx_lock);
+   mutex_init(>ctrl_lock);
+
+   init_completion(>ctrl_done);
+
+   virtio_can_populate_vqs(vdev);
+
+   register_virtio_can_dev(dev);


Check for error?

CJ


+
+   napi_enable(>napi);
+   napi_enable(>napi_tx);
+
+   /* Request device going live */
+   virtio_device_ready(vdev); /* Optionally done by virtio_dev_probe() */
+
+   return 0;
+
+on_failure:
+   virtio_can_free_candev(dev);
+   return err;
+}


...




[PATCH] tracing user_events: Simplify user_event_parse_field() parsing

2024-01-08 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Instead of having a bunch of if statements with:

   len = str_has_prefix(field, "__data_loc unsigned ");
   if (len)
   goto skip_next;

   len = str_has_prefix(field, "__data_loc ");
   if (len)
   goto skip_next;

   len = str_has_prefix(field, "__rel_loc unsigned ");
   if (len)
   goto skip_next;

   len = str_has_prefix(field, "__rel_loc ");
   if (len)
   goto skip_next;

goto parse;

 skip_next:

Consolidate it into a negative check and jump to parse if all the
str_has_prefix() calls fail. If one succeeds, it will just continue with
len equal to the proper string:

   if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&
   !(len = str_has_prefix(field, "__data_loc ")) &&
   !(len = str_has_prefix(field, "__rel_loc unsigned ")) &&
   !(len = str_has_prefix(field, "__rel_loc "))) {
   goto parse;
   }

 skip_next:

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/trace_events_user.c | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 9365ce407426..ce0c5f1ded48 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -1175,23 +1175,13 @@ static int user_event_parse_field(char *field, struct 
user_event *user,
goto skip_next;
}
 
-   len = str_has_prefix(field, "__data_loc unsigned ");
-   if (len)
-   goto skip_next;
-
-   len = str_has_prefix(field, "__data_loc ");
-   if (len)
-   goto skip_next;
-
-   len = str_has_prefix(field, "__rel_loc unsigned ");
-   if (len)
-   goto skip_next;
-
-   len = str_has_prefix(field, "__rel_loc ");
-   if (len)
-   goto skip_next;
+   if (!(len = str_has_prefix(field, "__data_loc unsigned ")) &&
+   !(len = str_has_prefix(field, "__data_loc ")) &&
+   !(len = str_has_prefix(field, "__rel_loc unsigned ")) &&
+   !(len = str_has_prefix(field, "__rel_loc "))) {
+   goto parse;
+   }
 
-   goto parse;
 skip_next:
type = field;
field = strpbrk(field + len, " ");
-- 
2.43.0




Re: [PATCH v8 3/3] remoteproc: zynqmp: parse TCM from device tree

2024-01-08 Thread Mathieu Poirier
On Thu, 4 Jan 2024 at 09:14, Tanmay Shah  wrote:
>
>
> On 1/3/24 12:17 PM, Mathieu Poirier wrote:
> > On Fri, Dec 15, 2023 at 03:57:25PM -0800, Tanmay Shah wrote:
> > > ZynqMP TCM information is fixed in driver. Now ZynqMP TCM information
> >
> > s/"is fixed in driver"/"was fixed in driver"
> >
> > > is available in device-tree. Parse TCM information in driver
> > > as per new bindings.
> > >
> > > Signed-off-by: Tanmay Shah 
> > > ---
> > >
> > > Changes in v8:
> > >   - parse power-domains property from device-tree and use EEMI calls
> > > to power on/off TCM instead of using pm domains framework
> > >   - Remove checking of pm_domain_id validation to power on/off tcm
> > >   - Remove spurious change
> > >
> > > Changes in v7:
> > >   - move checking of pm_domain_id from previous patch
> > >   - fix mem_bank_data memory allocation
> > >
> > >  drivers/remoteproc/xlnx_r5_remoteproc.c | 154 +++-
> > >  1 file changed, 148 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
> > > b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > index 4395edea9a64..36d73dcd93f0 100644
> > > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
> > > @@ -74,8 +74,8 @@ struct mbox_info {
> > >  };
> > >
> > >  /*
> > > - * Hardcoded TCM bank values. This will be removed once TCM bindings are
> > > - * accepted for system-dt specifications and upstreamed in linux kernel
> > > + * Hardcoded TCM bank values. This will stay in driver to maintain 
> > > backward
> > > + * compatibility with device-tree that does not have TCM information.
> > >   */
> > >  static const struct mem_bank_data zynqmp_tcm_banks_split[] = {
> > > {0xffe0UL, 0x0, 0x1UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB 
> > > each */
> > > @@ -878,6 +878,139 @@ static struct zynqmp_r5_core 
> > > *zynqmp_r5_add_rproc_core(struct device *cdev)
> > > return ERR_PTR(ret);
> > >  }
> > >
> > > +static int zynqmp_r5_get_tcm_node_from_dt(struct zynqmp_r5_cluster 
> > > *cluster)
> > > +{
> > > +   struct of_phandle_args out_args;
> > > +   int tcm_reg_per_r5, tcm_pd_idx;
> > > +   struct zynqmp_r5_core *r5_core;
> > > +   int i, j, tcm_bank_count, ret;
> > > +   struct platform_device *cpdev;
> > > +   struct mem_bank_data *tcm;
> > > +   struct device_node *np;
> > > +   struct resource *res;
> > > +   u64 abs_addr, size;
> > > +   struct device *dev;
> > > +
> > > +   for (i = 0; i < cluster->core_count; i++) {
> > > +   r5_core = cluster->r5_cores[i];
> > > +   dev = r5_core->dev;
> > > +   np = of_node_get(dev_of_node(dev));
> > > +   tcm_pd_idx = 1;
> > > +
> > > +   /* we have address cell 2 and size cell as 2 */
> > > +   tcm_reg_per_r5 = of_property_count_elems_of_size(np, "reg",
> > > +4 * 
> > > sizeof(u32));
> > > +   if (tcm_reg_per_r5 <= 0) {
> > > +   dev_err(dev, "can't get reg property err %d\n", 
> > > tcm_reg_per_r5);
> > > +   return -EINVAL;
> > > +   }
> > > +
> > > +   /*
> > > +* In lockstep mode, r5 core 0 will use r5 core 1 TCM
> > > +* power domains as well. so allocate twice of per core TCM
> >
> > Twice of what?  Please use proper english in your multi line comments, i.e a
> > capital letter for the first word and a dot at the end.
> >
> > > +*/
> > > +   if (cluster->mode == LOCKSTEP_MODE)
> > > +   tcm_bank_count = tcm_reg_per_r5 * 2;
> > > +   else
> > > +   tcm_bank_count = tcm_reg_per_r5;
> > > +
> > > +   r5_core->tcm_banks = devm_kcalloc(dev, tcm_bank_count,
> > > + sizeof(struct mem_bank_data 
> > > *),
> > > + GFP_KERNEL);
> > > +   if (!r5_core->tcm_banks)
> > > +   ret = -ENOMEM;
> > > +
> > > +   r5_core->tcm_bank_count = tcm_bank_count;
> > > +   for (j = 0; j < tcm_bank_count; j++) {
> > > +   tcm = devm_kzalloc(dev, sizeof(struct mem_bank_data),
> > > +  GFP_KERNEL);
> > > +   if (!tcm)
> > > +   return -ENOMEM;
> > > +
> > > +   r5_core->tcm_banks[j] = tcm;
> > > +
> > > +   /*
> > > +* In lockstep mode, get second core's TCM power 
> > > domains id
> > > +* after first core TCM parsing is done as
> >
> > There seems to be words missing at the end of the sentence, and there is no 
> > dot.
> >
> > > +*/
> > > +   if (j == tcm_reg_per_r5) {
> > > +   /* dec first core node */
> > > +   of_node_put(np);
> > > +
> > > +   /* get second core node */
> > > +

Re: [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation

2024-01-08 Thread Alexander Duyck
On Mon, Jan 8, 2024 at 12:25 AM Yunsheng Lin  wrote:
>
> On 2024/1/5 23:35, Alexander H Duyck wrote:
> > On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
> >> Currently there seems to be three page frag implementions
> >> which all try to allocate order 3 page, if that fails, it
> >> then fail back to allocate order 0 page, and each of them
> >> all allow order 3 page allocation to fail under certain
> >> condition by using specific gfp bits.
> >>
> >> The gfp bits for order 3 page allocation are different
> >> between different implementation, __GFP_NOMEMALLOC is
> >> or'd to forbid access to emergency reserves memory for
> >> __page_frag_cache_refill(), but it is not or'd in other
> >> implementions, __GFP_DIRECT_RECLAIM is masked off to avoid
> >> direct reclaim in skb_page_frag_refill(), but it is not
> >> masked off in __page_frag_cache_refill().
> >>
> >> This patch unifies the gfp bits used between different
> >> implementions by or'ing __GFP_NOMEMALLOC and masking off
> >> __GFP_DIRECT_RECLAIM for order 3 page allocation to avoid
> >> possible pressure for mm.
> >>
> >> Signed-off-by: Yunsheng Lin 
> >> CC: Alexander Duyck 
> >> ---
> >>  drivers/vhost/net.c | 2 +-
> >>  mm/page_alloc.c | 4 ++--
> >>  net/core/sock.c | 2 +-
> >>  3 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index f2ed7167c848..e574e21cc0ca 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct 
> >> vhost_net *net, unsigned int sz,
> >>  /* Avoid direct reclaim but allow kswapd to wake */
> >>  pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> >>__GFP_COMP | __GFP_NOWARN |
> >> -  __GFP_NORETRY,
> >> +  __GFP_NORETRY | __GFP_NOMEMALLOC,
> >>SKB_FRAG_PAGE_ORDER);
> >>  if (likely(pfrag->page)) {
> >>  pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 9a16305cf985..1f0b36dd81b5 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -4693,8 +4693,8 @@ static struct page *__page_frag_cache_refill(struct 
> >> page_frag_cache *nc,
> >>  gfp_t gfp = gfp_mask;
> >>
> >>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >> -gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
> >> -__GFP_NOMEMALLOC;
> >> +gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
> >> +   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
> >>  page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
> >>  PAGE_FRAG_CACHE_MAX_ORDER);
> >>  nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
> >> diff --git a/net/core/sock.c b/net/core/sock.c
> >> index 446e945f736b..d643332c3ee5 100644
> >> --- a/net/core/sock.c
> >> +++ b/net/core/sock.c
> >> @@ -2900,7 +2900,7 @@ bool skb_page_frag_refill(unsigned int sz, struct 
> >> page_frag *pfrag, gfp_t gfp)
> >>  /* Avoid direct reclaim but allow kswapd to wake */
> >>  pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> >>__GFP_COMP | __GFP_NOWARN |
> >> -  __GFP_NORETRY,
> >> +  __GFP_NORETRY | __GFP_NOMEMALLOC,
> >>SKB_FRAG_PAGE_ORDER);
> >>  if (likely(pfrag->page)) {
> >>  pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> >
> > Looks fine to me.
> >
> > One thing you may want to consider would be to place this all in an
> > inline function that could just consolidate all the code.
>
> Do you think it is possible to further unify the implementations of the
> 'struct page_frag_cache' and 'struct page_frag', so adding a inline
> function for above is unnecessary?

Actually the skb_page_frag_refill seems to function more similarly to
how the Intel drivers do in terms of handling fragments. It is
basically slicing off pieces until either it runs out of them and
allocates a new one, or if the page reference count is one without
pre-allocating the references.

However, with that said many of the core bits are the same so it might
be possible to look at unifiying at least pieces of this. For example
the page_frag has the same first 3 members as the page_frag_cache so
it might be possible to look at refactoring things further to unify
more of the frag_refill logic.



Re: [PATCH v2] virtiofs: use GFP_NOFS when enqueuing request through kworker

2024-01-08 Thread Benjamin Coddington
On 5 Jan 2024, at 5:53, Hou Tao wrote:

> From: Hou Tao 
>
> When invoking virtio_fs_enqueue_req() through kworker, both the
> allocation of the sg array and the bounce buffer still use GFP_ATOMIC.
> Considering the size of both the sg array and the bounce buffer may be
> greater than PAGE_SIZE, use GFP_NOFS instead of GFP_ATOMIC to lower the
> possibility of memory allocation failure.

Perhaps not appropriate for this case, but are you aware of
memalloc_nofs_save/restore?  NFS has been converting over and cleaning out
our GFP_NOFS usage:

Documentation/core-api/gfp_mask-from-fs-io.rst

Ben




Re: [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill()

2024-01-08 Thread Alexander Duyck
On Mon, Jan 8, 2024 at 1:06 AM Yunsheng Lin  wrote:
>
> On 2024/1/6 0:06, Alexander H Duyck wrote:
> >>
> >>  static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >> @@ -1353,8 +1318,7 @@ static int vhost_net_open(struct inode *inode, 
> >> struct file *f)
> >>  vqs[VHOST_NET_VQ_RX]);
> >>
> >>  f->private_data = n;
> >> -n->page_frag.page = NULL;
> >> -n->refcnt_bias = 0;
> >> +n->pf_cache.va = NULL;
> >>
> >>  return 0;
> >>  }
> >> @@ -1422,8 +1386,9 @@ static int vhost_net_release(struct inode *inode, 
> >> struct file *f)
> >>  kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
> >>  kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
> >>  kfree(n->dev.vqs);
> >> -if (n->page_frag.page)
> >> -__page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
> >> +if (n->pf_cache.va)
> >> +__page_frag_cache_drain(virt_to_head_page(n->pf_cache.va),
> >> +n->pf_cache.pagecnt_bias);
> >>  kvfree(n);
> >>  return 0;
> >>  }
> >
> > I would recommend reordering this patch with patch 5. Then you could
> > remove the block that is setting "n->pf_cache.va = NULL" above and just
> > make use of page_frag_cache_drain in the lower block which would also
> > return the va to NULL.
>
> I am not sure if we can as there is no zeroing for 'struct vhost_net' in
> vhost_net_open().
>
> If we don't have "n->pf_cache.va = NULL", don't we use the uninitialized data
> when calling page_frag_alloc_align() for the first time?

I see. So kvmalloc is used instead of kvzalloc when allocating the
structure. That might be an opportunity to clean things up a bit by
making that change to reduce the risk of some piece of memory
initialization being missed.

That said, I still think reordering the two patches might be useful as
it would help to make it so that the change you make to vhost_net is
encapsulated in one patch to fully enable the use of the new page pool
API.



Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-08 Thread Steven Rostedt
On Mon, 8 Jan 2024 12:32:46 +0100
Christian Brauner  wrote:

> On Sun, Jan 07, 2024 at 01:32:28PM -0500, Steven Rostedt wrote:
> > On Sun, 7 Jan 2024 13:29:12 -0500
> > Steven Rostedt  wrote:
> >   
> > > > 
> > > > IOW, the inode_permission() in lookup_one_len() that eventfs does is
> > > > redundant and just wrong.
> > > 
> > > I don't think so.  
> > 
> > Just to make it clear. eventfs has nothing to do with mkdir instance/foo.
> > It exists without that. Although one rationale to do eventfs was so  
> 
> Every instance/foo/ tracefs instances also contains an events directory
> and thus a eventfs portion. Eventfs is just a subtree of tracefs. It's
> not a separate filesystem. Both eventfs and tracefs are on the same
> single, system wide superblock.
> 
> > that the instance directories wouldn't recreate the same 10thousands
> > event inodes and dentries for every mkdir done.  
> 
> I know but that's irrelevant to what I'm trying to tell you.
> 
> A mkdir /sys/kernel/tracing/instances/foo creates a new tracefs
> instance. With or without the on-demand dentry and inode creation for
> the eventfs portion that tracefs "instance" has now been created in its
> entirety including all the required information for someone to later
> come along and perform a lookup on /sys/kernel/tracing/instances/foo/events.
> 
> All you've done is to defer the addition of the dentries and inodes when
> someone does actually look at the events directory of the tracefs
> instance.
> 
> Whether you choose to splice in the dentries and inodes for the eventfs
> portion during lookup and readdir or if you had chosen to not do the
> on-demand thing at all and the entries were created at the same time as
> the mkdir call are equivalent from the perspective of permission
> checking.
> 
> If you have the required permissions to look at the events directory
> then there's no reason why listing the directory entries in there should
> fail. This can't even happen right now.

Ah, I think I know where the confusion lies. The tracing information in
kernel/trace/*.c doesn't keep track of permission. It relies totally on
fs/tracefs/* to do so. If someone does 'chmod' or 'chown' or mounts with
'gid=xxx' then it's up to tracefs to maintain that information and not the
tracing subsystem. The tracing subsystem only gives the "default"
permissions (before boot finishes).

The difference between normal file systems and pseudo file systems like
debugfs and tracefs, is that normal file systems keep the permission
information stored on the external device. That is, when the inodes and
dentries are created, the information is retrieved from the stored file
system.

I think this may actually be a failure of debugfs (and tracefs as it was
based on debugfs), in that the inodes and dentries are created at the same
time the "files" backing them are. Which is normally at boot up and before
the file system is mounted.

That is, inodes and dentries are actually coupled with the data they
represent. It's not a cache for a back store like a hard drive partition.

To create a file in debugfs you do:

struct dentry *debugfs_create_file(const char *name, umode_t mode,
   struct dentry *parent, void *data,
   const struct file_operations *fops)

That is, you pass a the name, the mode, the parent dentry, data, and the
fops and that will create an inode and dentry (which is returned).

This happens at boot up before user space is running and before debugfs is
even mounted.

Because debugfs is mostly for debugging, people don't care about how it's
mounted. It is usually restricted to root only access. Especially since
there's a lot of sensitive information that shouldn't be exposed to
non-privileged users.

The reason tracefs came about is that people asked me to be able to have
access to tracing without needing to even enable debugfs. They also want to
easily make it accessible to non root users and talking with Kees Cook, he
recommended using ACL. But because it inherited a lot from debugfs, I
started doing these tricks like walking the dentry tree to make it work a
bit better. Because the dentries and inodes were created before mount, I
had to play these tricks.

But as Linus pointed out, that was the wrong way to do that. The right way
was to use .getattr and .permission callbacks to figure out what the
permissions to the files are.

This has nothing to do with the creation of the files, it's about who has
access to the files that the inodes point to.

This sounds like another topic to bring up at LSFMM ;-)  "Can we
standardize pseudo file systems like debugfs and tracefs to act more like
real file systems, and have inodes and dentries act as cache and not be so
coupled to the data?"

-- Steve



Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-08 Thread Steven Rostedt
On Mon, 8 Jan 2024 12:04:54 +0100
Christian Brauner  wrote:

> > > IOW, the inode_permission() in lookup_one_len() that eventfs does is
> > > redundant and just wrong.  
> > 
> > I don't think so.  
> 
> I'm very well aware that the dentries and inode aren't created during
> mkdir but the completely directory layout is determined. You're just
> splicing in dentries and inodes during lookup and readdir.
> 
> If mkdir /sys/kernel/tracing/instances/foo has succeeded and you later
> do a lookup/readdir on
> 
> ls -al /sys/kernel/tracing/instances/foo/events
> 
> Why should the creation of the dentries and inodes ever fail due to a
> permission failure?

They shouldn't.

> The vfs did already verify that you had the required
> permissions to list entries in that directory. Why should filling up
> /sys/kernel/tracing/instances/foo/events ever fail then? It shouldn't
> That tracefs instance would be half-functional. And again, right now
> that inode_permission() check cannot even fail.

And it shouldn't. But without dentries and inodes, how does VFS know what
is allowed to open the files?

-- Steve



Re: [PATCH] tracing histograms: Simplify parse_actions() function

2024-01-08 Thread Steven Rostedt
On Mon, 8 Jan 2024 10:32:14 +0200
Andy Shevchenko  wrote:

> On Mon, Jan 8, 2024 at 3:31 AM Steven Rostedt  wrote:
> >
> > From: "Steven Rostedt (Google)" 
> >
> > The parse_actions() function uses 'len = str_has_prefix()' to test which
> > action is in the string being parsed. But then it goes and repeats the
> > logic for each different action. This logic can be simplified and
> > duplicate code can be removed as 'len' contains the length of the found
> > prefix which should be used for all actions.  
> 
> > Link: 
> > https://lore.kernel.org/all/20240107112044.6702c...@gandalf.local.home/
> >
> > Signed-off-by: Steven Rostedt (Google)   
> 
> If you want Link to be formally a tag, you should drop the following
> blank line.

The link is for humans not for parsers.

> 
> 
> > +   if ((len = str_has_prefix(str, "onmatch(")))
> > +   hid = HANDLER_ONMATCH;
> > +   else if ((len = str_has_prefix(str, "onmax(")))
> > +   hid = HANDLER_ONMAX;
> > +   else if ((len = str_has_prefix(str, "onchange(")))
> > +   hid = HANDLER_ONCHANGE;  
> 
> The repeating check for ( might be moved out as well after this like
> 
>   if (str[len] != '(') {
> // not sure if you need data to be assigned here as well
> ret = -EINVAL;
> ...
>   }
> 

Not sure how that makes it any better. It adds more code. I could start
with checking the "on" before checking for "match", "max" and "change", but
that just makes it more complex.

-- Steve




Re: [PATCH v5 11/34] function_graph: Have the instances use their own ftrace_ops for filtering

2024-01-08 Thread Mark Rutland
On Mon, Jan 08, 2024 at 02:21:03PM +, Mark Rutland wrote:
> On Mon, Jan 08, 2024 at 12:25:55PM +, Mark Rutland wrote:
> > We also have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, but since the return address 
> > is
> > not on the stack at the point function-entry is intercepted we use the FP as
> > the retp value -- in the absence of tail calls this will be different 
> > between a
> > caller and callee.
> 
> Ah; I just spotted that this patch changed that in ftrace_graph_func(), which
> is the source of the bug. 
> 
> As of this patch, we use the address of fregs->lr as the retp value, but the
> unwinder still uses the FP value, and so when unwind_recover_return_address()
> calls ftrace_graph_ret_addr(), the retp value won't match the expected entry 
> on
> the fgraph ret_stack, resulting in failing to find the expected entry.
> 
> Since the ftrace_regs only exist transiently during function entry/exit, it's
> possible for a stackframe to reuse that same address on the stack, which would
> result in finding a different entry by mistake.
> 
> The diff below restores the existing behaviour and fixes the issue for me.
> Could you please fold that into this patch?
> 
> On a separate note, looking at how this patch changed arm64's
> ftrace_graph_func(), do we need similar changes to arm64's
> prepare_ftrace_return() for the old-style mcount based ftrace?
> 
> Mark.
> 
> >8
> diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
> index 205937e04ece..329092ce06ba 100644
> --- a/arch/arm64/kernel/ftrace.c
> +++ b/arch/arm64/kernel/ftrace.c
> @@ -495,7 +495,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
> parent_ip,
> if (bit < 0)
> return;
>  
> -   if (!function_graph_enter_ops(*parent, ip, fregs->fp, parent, gops))
> +   if (!function_graph_enter_ops(*parent, ip, fregs->fp, (void 
> *)fregs->fp, gops))
> *parent = (unsigned long)_to_handler;
>  
> ftrace_test_recursion_unlock(bit);

Thinking some more, this line gets excessively long when we pass the fregs too,
so it's probably worth adding a local variable for fp, i.e. the diff below.

Mark.

>8
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 205937e04ece..d4e142ef4686 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -481,8 +481,9 @@ void prepare_ftrace_return(unsigned long self_addr, 
unsigned long *parent,
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
-   unsigned long *parent = >lr;
struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+   unsigned long *parent = >lr;
+   unsigned long fp = fregs->fp;
int bit;
 
if (unlikely(ftrace_graph_is_dead()))
@@ -495,7 +496,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
if (bit < 0)
return;
 
-   if (!function_graph_enter_ops(*parent, ip, fregs->fp, parent, gops))
+   if (!function_graph_enter_ops(*parent, ip, fp, (void *)fp, gops))
*parent = (unsigned long)_to_handler;
 
ftrace_test_recursion_unlock(bit);



Re: [PATCH RFT] arm64: dts: qcom: sm8350: Reenable crypto & cryptobam

2024-01-08 Thread Luca Weiss
On Mon Jan 8, 2024 at 3:18 PM CET, Konrad Dybcio wrote:
> On 8.01.2024 14:49, Luca Weiss wrote:
> > When num-channels and qcom,num-ees is not provided in devicetree, the
> > driver will try to read these values from the registers during probe but
> > this fails if the interconnect is not on and then crashes the system.
> > 
> > So we can provide these properties in devicetree (queried after patching
> > BAM driver to enable the necessary interconnect) so we can probe
> > cryptobam without reading registers and then also use the QCE as
> > expected.
>
> This really feels a bit backwards.. Enable the resource to query the
> hardware for numbers, so that said resource can be enabled, but
> slightly later :/

If you think adding interconnect support to driver and dtsi is better,
let me know.

Stephan (+CC) mentioned it should be okay like this *shrug*

For the record, this is the same way I got the values for sc7280[0] and
sm6350[1].

[0] 
https://lore.kernel.org/linux-arm-msm/20231229-sc7280-cryptobam-fixup-v1-1-bd8f68589...@fairphone.com/
[1] 
https://lore.kernel.org/linux-arm-msm/20240105-sm6350-qce-v1-0-416e5c731...@fairphone.com/

diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi 
b/arch/arm64/boot/dts/qcom/sm8350.dtsi
index b46236235b7f..cd4dd9852d9e 100644
--- a/arch/arm64/boot/dts/qcom/sm8350.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi
@@ -1756,8 +1756,8 @@ cryptobam: dma-controller@1dc4000 {
qcom,controlled-remotely;
iommus = <_smmu 0x594 0x0011>,
 <_smmu 0x596 0x0011>;
-   /* FIXME: Probing BAM DMA causes some abort and system 
hang */
-   status = "fail";
+   interconnects = <_noc MASTER_CRYPTO 0 _virt 
SLAVE_EBI1 0>;
+   interconnect-names = "memory";
};
 
crypto: crypto@1dfa000 {
diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
index 5e7d332731e0..9de28f615639 100644
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "../dmaengine.h"
@@ -394,6 +395,7 @@ struct bam_device {
const struct reg_offset_data *layout;
 
struct clk *bamclk;
+   struct icc_path *mem_path;
int irq;
 
/* dma start transaction tasklet */
@@ -1206,6 +1208,7 @@ static int bam_init(struct bam_device *bdev)
bdev->num_channels = val & BAM_NUM_PIPES_MASK;
}
 
+   printk(KERN_ERR "%s:%d DBG num_ees=%u num_channels=%u\n", __func__, 
__LINE__, bdev->num_ees, bdev->num_channels);
/* Reset BAM now if fully controlled locally */
if (!bdev->controlled_remotely && !bdev->powered_remotely)
bam_reset(bdev);
@@ -1298,6 +1301,14 @@ static int bam_dma_probe(struct platform_device *pdev)
return ret;
}
 
+   bdev->mem_path = devm_of_icc_get(bdev->dev, "memory");
+   if (IS_ERR(bdev->mem_path))
+   return PTR_ERR(bdev->mem_path);
+
+   ret = icc_set_bw(bdev->mem_path, 1, 1);
+   if (ret)
+   return ret;
+
ret = bam_init(bdev);
if (ret)
goto err_disable_clk;



Re: [PATCH v5 11/34] function_graph: Have the instances use their own ftrace_ops for filtering

2024-01-08 Thread Mark Rutland
On Mon, Jan 08, 2024 at 12:25:55PM +, Mark Rutland wrote:
> We also have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, but since the return address is
> not on the stack at the point function-entry is intercepted we use the FP as
> the retp value -- in the absence of tail calls this will be different between 
> a
> caller and callee.

Ah; I just spotted that this patch changed that in ftrace_graph_func(), which
is the source of the bug. 

As of this patch, we use the address of fregs->lr as the retp value, but the
unwinder still uses the FP value, and so when unwind_recover_return_address()
calls ftrace_graph_ret_addr(), the retp value won't match the expected entry on
the fgraph ret_stack, resulting in failing to find the expected entry.

Since the ftrace_regs only exist transiently during function entry/exit, it's
possible for a stackframe to reuse that same address on the stack, which would
result in finding a different entry by mistake.

The diff below restores the existing behaviour and fixes the issue for me.
Could you please fold that into this patch?

On a separate note, looking at how this patch changed arm64's
ftrace_graph_func(), do we need similar changes to arm64's
prepare_ftrace_return() for the old-style mcount based ftrace?

Mark.

>8
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 205937e04ece..329092ce06ba 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -495,7 +495,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long 
parent_ip,
if (bit < 0)
return;
 
-   if (!function_graph_enter_ops(*parent, ip, fregs->fp, parent, gops))
+   if (!function_graph_enter_ops(*parent, ip, fregs->fp, (void 
*)fregs->fp, gops))
*parent = (unsigned long)_to_handler;
 
ftrace_test_recursion_unlock(bit);



Re: [PATCH RFT] arm64: dts: qcom: sm8350: Reenable crypto & cryptobam

2024-01-08 Thread Konrad Dybcio
On 8.01.2024 14:49, Luca Weiss wrote:
> When num-channels and qcom,num-ees is not provided in devicetree, the
> driver will try to read these values from the registers during probe but
> this fails if the interconnect is not on and then crashes the system.
> 
> So we can provide these properties in devicetree (queried after patching
> BAM driver to enable the necessary interconnect) so we can probe
> cryptobam without reading registers and then also use the QCE as
> expected.

This really feels a bit backwards.. Enable the resource to query the
hardware for numbers, so that said resource can be enabled, but
slightly later :/

Konrad



[PATCH RFT] arm64: dts: qcom: sm8350: Reenable crypto & cryptobam

2024-01-08 Thread Luca Weiss
When num-channels and qcom,num-ees is not provided in devicetree, the
driver will try to read these values from the registers during probe but
this fails if the interconnect is not on and then crashes the system.

So we can provide these properties in devicetree (queried after patching
BAM driver to enable the necessary interconnect) so we can probe
cryptobam without reading registers and then also use the QCE as
expected.

Fixes: 4d29db204361 ("arm64: dts: qcom: sm8350: fix BAM DMA crash and reboot")
Fixes: f1040a7fe8f0 ("arm64: dts: qcom: sm8350: Add Crypto Engine support")
Signed-off-by: Luca Weiss 
---
Not tested myself, but David Heidelberg was so nice and ran it on a
SM8350 board in a test farm and it seems to be working as expected.

But still please test it on some other boards so make sure it actually
works as expected there also.
---
 arch/arm64/boot/dts/qcom/sm8350.dtsi | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi 
b/arch/arm64/boot/dts/qcom/sm8350.dtsi
index b46236235b7f..3cd75ab552c5 100644
--- a/arch/arm64/boot/dts/qcom/sm8350.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi
@@ -1754,10 +1754,10 @@ cryptobam: dma-controller@1dc4000 {
#dma-cells = <1>;
qcom,ee = <0>;
qcom,controlled-remotely;
+   num-channels = <16>;
+   qcom,num-ees = <4>;
iommus = <_smmu 0x594 0x0011>,
 <_smmu 0x596 0x0011>;
-   /* FIXME: Probing BAM DMA causes some abort and system 
hang */
-   status = "fail";
};
 
crypto: crypto@1dfa000 {
@@ -1769,8 +1769,6 @@ crypto: crypto@1dfa000 {
 <_smmu 0x596 0x0011>;
interconnects = <_noc MASTER_CRYPTO 0 _virt 
SLAVE_EBI1 0>;
interconnect-names = "memory";
-   /* FIXME: dependency BAM DMA is disabled */
-   status = "disabled";
};
 
    ipa: ipa@1e4 {

---
base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
change-id: 20240108-sm8350-qce-6ada49f90657

Best regards,
-- 
Luca Weiss 




Re: [PATCH v9] bus: mhi: host: Add tracing support

2024-01-08 Thread Krishna Chaitanya Chundru

Hi Steven,

Even though I added your reviewed-by tag, I incorporated changes 
mentioned in the previous patch.


Can you please review it once.

Thanks & Regards,

Krishna Chaitanya.

On 1/5/2024 5:53 PM, Krishna chaitanya chundru wrote:

This change adds ftrace support for following functions which
helps in debugging the issues when there is Channel state & MHI
state change and also when we receive data and control events:
1. mhi_intvec_mhi_states
2. mhi_process_data_event_ring
3. mhi_process_ctrl_ev_ring
4. mhi_gen_tre
5. mhi_update_channel_state
6. mhi_tryset_pm_state
7. mhi_pm_st_worker

Change the implementation of the arrays which has enum to strings mapping
to make it consistent in both trace header file and other files.

Where ever the trace events are added, debug messages are removed.

Signed-off-by: Krishna chaitanya chundru 
Reviewed-by: "Steven Rostedt (Google)" 
---
Changes in v9:
- Change the implementations of some array so that the strings to enum mapping
- is same in both trace header and other files as suggested by steve.
- Link to v8: 
https://lore.kernel.org/r/20231207-ftrace_support-v8-1-7f62d4558...@quicinc.com

Changes in v8:
- Pass the structure and derefernce the variables in TP_fast_assign as 
suggested by steve
- Link to v7: 
https://lore.kernel.org/r/20231206-ftrace_support-v7-1-aca49a042...@quicinc.com

Changes in v7:
- change log format as pointed by mani.
- Link to v6: 
https://lore.kernel.org/r/20231204-ftrace_support-v6-1-9b206546d...@quicinc.com

Changes in v6:
- use 'rp' directly as suggested by jeffrey.
- Link to v5: 
https://lore.kernel.org/r/20231127-ftrace_support-v5-1-eb67daead...@quicinc.com

Changes in v5:
- Use DECLARE_EVENT_CLASS for multiple events as suggested by steve.
- Instead of converting to u64 to print address, use %px to print the address 
to avoid
- warnings in some platforms.
- Link to v4: 
https://lore.kernel.org/r/2023-ftrace_support-v4-1-c83602399...@quicinc.com

Changes in v4:
- Fix compilation issues in previous patch which happended due to rebasing.
- In the defconfig FTRACE config is not enabled due to that the compilation 
issue is not
- seen in my workspace.
- Link to v3: 
https://lore.kernel.org/r/2023-ftrace_support-v3-1-f358d2911...@quicinc.com

Changes in v3:
- move trace header file from include/trace/events to drivers/bus/mhi/host/ so 
that
- we can include driver header files.
- Use macros directly in the trace events as suggested Jeffrey Hugo.
- Reorder the structure in the events as suggested by steve to avoid holes in 
the buffer.
- removed the mhi_to_physical function as this can give security issues.
- removed macros to define strings as we can get those from driver headers.
- Link to v2: 
https://lore.kernel.org/r/20231013-ftrace_support-v2-1-6e893ce01...@quicinc.com

Changes in v2:
- Passing the raw state into the trace event and using  __print_symbolic() as 
suggested by bjorn.
- Change mhi_pm_st_worker to mhi_pm_st_transition as suggested by bjorn.
- Fixed the kernel test rebot issues.
- Link to v1: 
https://lore.kernel.org/r/20231005-ftrace_support-v1-1-23a2f394f...@quicinc.com
---
  drivers/bus/mhi/common.h|  38 +++---
  drivers/bus/mhi/host/init.c |  63 +
  drivers/bus/mhi/host/internal.h |  40 ++
  drivers/bus/mhi/host/main.c |  19 ++-
  drivers/bus/mhi/host/pm.c   |   7 +-
  drivers/bus/mhi/host/trace.h| 275 
  6 files changed, 378 insertions(+), 64 deletions(-)

diff --git a/drivers/bus/mhi/common.h b/drivers/bus/mhi/common.h
index f794b9c8049e..dda340aaed95 100644
--- a/drivers/bus/mhi/common.h
+++ b/drivers/bus/mhi/common.h
@@ -297,30 +297,30 @@ struct mhi_ring_element {
__le32 dword[2];
  };
  
+#define MHI_STATE_LIST\

+   mhi_state(RESET,"RESET")  \
+   mhi_state(READY,"READY")  \
+   mhi_state(M0,   "M0") \
+   mhi_state(M1,   "M1") \
+   mhi_state(M2,   "M2") \
+   mhi_state(M3,   "M3") \
+   mhi_state(M3_FAST,  "M3_FAST")\
+   mhi_state(BHI,  "BHI")\
+   mhi_state_end(SYS_ERR,  "SYS ERROR")
+
+#undef mhi_state
+#undef mhi_state_end
+
+#define mhi_state(a, b)case MHI_STATE_##a: return b;
+#define mhi_state_end(a, b)case MHI_STATE_##a: return b;
+
  static inline const char *mhi_state_str(enum mhi_state state)
  {
switch (state) {
-   case MHI_STATE_RESET:
-   return "RESET";
-   case MHI_STATE_READY:
-   return "READY";
-   case MHI_STATE_M0:
-   return "M0";
-   case MHI_STATE_M1:
-   return "M1";
-   case MHI_STATE_M2:
-   return "M2";
-   case MHI_STATE_M3:
-   return "M3";
-   case MHI_STATE_M3_FAST:
-   return "M3 FAST";
-   case MHI_STATE_BHI:
-   return "BHI";
-   case MHI_STATE_SYS_ERR:
-

Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

2024-01-08 Thread Tobias Huschle
On Thu, Dec 14, 2023 at 02:14:59AM -0500, Michael S. Tsirkin wrote:
> 
> Peter, would appreciate feedback on this. When is cond_resched()
> insufficient to give up the CPU? Should 
> Documentation/kernel-hacking/hacking.rst
> be updated to require schedule() instead?
> 

Happy new year everybody!

I'd like to bring this thread back to life. To reiterate:

- The introduction of the EEVDF scheduler revealed a performance
  regression in a uperf testcase of ~50%.
- Tracing the scheduler showed that it takes decisions which are
  in line with its design.
- The traces showed as well, that a vhost instance might run
  excessively long on its CPU in some circumstance. Those cause
  the performance regression as they cause delay times of 100+ms
  for a kworker which drives the actual network processing.
- Before EEVDF, the vhost would always be scheduled off its CPU
  in favor of the kworker, as the kworker was being woken up and
  the former scheduler was giving more priority to the woken up
  task. With EEVDF, the kworker, as a long running process, is
  able to accumulate negative lag, which causes EEVDF to not
  prefer it on its wake up, leaving the vhost running.
- If the kworker is not scheduled when being woken up, the vhost
  continues looping until it is migrated off the CPU.
- The vhost offers to be scheduled off the CPU by calling 
  cond_resched(), but, the the need_resched flag is not set,
  therefore cond_resched() does nothing.

To solve this, I see the following options 
  (might not be a complete nor a correct list)
- Along with the wakeup of the kworker, need_resched needs to
  be set, such that cond_resched() triggers a reschedule.
- The vhost calls schedule() instead of cond_resched() to give up
  the CPU. This would of course be a significantly stricter
  approach and might limit the performance of vhost in other cases.
- Preventing the kworker from accumulating negative lag as it is
  mostly not runnable and if it runs, it only runs for a very short
  time frame. This might clash with the overall concept of EEVDF.
- On cond_resched(), verify if the consumed runtime of the caller
  is outweighing the negative lag of another process (e.g. the 
  kworker) and schedule the other process. Introduces overhead
  to cond_resched.

I would be curious on feedback on those ideas and interested in
alternative approaches.



[PATCH v5] can: virtio: Initial virtio CAN driver.

2024-01-08 Thread Mikhail Golubev-Ciuchea
From: Harald Mommer 

- CAN Control

  - "ip link set up can0" starts the virtual CAN controller,
  - "ip link set up can0" stops the virtual CAN controller

- CAN RX

  Receive CAN frames. CAN frames can be standard or extended, classic or
  CAN FD. Classic CAN RTR frames are supported.

- CAN TX

  Send CAN frames. CAN frames can be standard or extended, classic or
  CAN FD. Classic CAN RTR frames are supported.

- CAN BusOff indication

  CAN BusOff is handled by a bit in the configuration space.

Signed-off-by: Harald Mommer 
Signed-off-by: Mikhail Golubev-Ciuchea 
Co-developed-by: Marc Kleine-Budde 
Signed-off-by: Marc Kleine-Budde 
Cc: Damir Shaikhutdinov 
---

V5:
* Re-base on top of linux-next (next-20240103)
* Tested with https://github.com/OpenSynergy/qemu/tree/virtio-can-spec-rfc-v3

RFC V4:
* Apply reverse Christmas tree style
* Add member *classic_dlc to RX and TX CAN frames
* Fix race causing a NETDEV_TX_BUSY return
* Fix TX queue going stuck on -ENOMEM
* Update stats.tx_dropped on kzalloc() failure
* Replace "(err != 0)" with "(unlikely(err))"
* Use "ARRAY_SIZE(sgs)"
* Refactor SGs in virtio_can_send_ctrl_msg()
* Tested with https://github.com/OpenSynergy/qemu/tree/virtio-can-spec-rfc-v3

RFC V3:
* Incorporate patch "[PATCH] can: virtio-can: cleanups" from
  
https://lore.kernel.org/all/20230424-footwear-daily-9339bd0ec428-...@pengutronix.de/
* Add missing can_free_echo_skb()
* Replace home-brewed ID allocator with the standard one from kernel
* Simplify flow control
* Tested with https://github.com/OpenSynergy/qemu/tree/virtio-can-spec-rfc-v3

RFC V2:
* Remove the event indication queue and use the config space instead, to
  indicate a bus off condition
* Rework RX and TX messages having a length field and some more fields for CAN
  EXT
* Fix CAN_EFF_MASK comparison
* Remove MISRA style code (e.g. '! = 0u')
* Remove priorities leftovers
* Remove BUGONs
* Based on virtio can spec RFCv3
* Tested with https://github.com/OpenSynergy/qemu/tree/virtio-can-spec-rfc-v3


 MAINTAINERS |7 +
 drivers/net/can/Kconfig |   12 +
 drivers/net/can/Makefile|1 +
 drivers/net/can/virtio_can.c| 1008 +++
 include/uapi/linux/virtio_can.h |   75 +++
 5 files changed, 1103 insertions(+)
 create mode 100644 drivers/net/can/virtio_can.c
 create mode 100644 include/uapi/linux/virtio_can.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cef9597b90ac..d3892370db2b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23214,6 +23214,13 @@ F: drivers/scsi/virtio_scsi.c
 F: include/uapi/linux/virtio_blk.h
 F: include/uapi/linux/virtio_scsi.h
 
+VIRTIO CAN DRIVER
+M: "Harald Mommer" 
+L: linux-...@vger.kernel.org
+S: Maintained
+F: drivers/net/can/virtio_can.c
+F: include/uapi/linux/virtio_can.h
+
 VIRTIO CONSOLE DRIVER
 M: Amit Shah 
 L: virtualizat...@lists.linux.dev
diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
index eb410714afc2..f966bab19ed8 100644
--- a/drivers/net/can/Kconfig
+++ b/drivers/net/can/Kconfig
@@ -215,6 +215,18 @@ config CAN_XILINXCAN
  Xilinx CAN driver. This driver supports both soft AXI CAN IP and
  Zynq CANPS IP.
 
+config CAN_VIRTIO_CAN
+   depends on VIRTIO
+   tristate "Virtio CAN device support"
+   default n
+   help
+ Say Y here if you want to support for Virtio CAN.
+
+ To compile this driver as a module, choose M here: the
+ module will be called virtio-can.
+
+ If unsure, say N.
+
 source "drivers/net/can/c_can/Kconfig"
 source "drivers/net/can/cc770/Kconfig"
 source "drivers/net/can/ctucanfd/Kconfig"
diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile
index ff8f76295d13..4488e591736e 100644
--- a/drivers/net/can/Makefile
+++ b/drivers/net/can/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_CAN_PEAK_PCIEFD) += peak_canfd/
 obj-$(CONFIG_CAN_SJA1000)  += sja1000/
 obj-$(CONFIG_CAN_SUN4I)+= sun4i_can.o
 obj-$(CONFIG_CAN_TI_HECC)  += ti_hecc.o
+obj-$(CONFIG_CAN_VIRTIO_CAN)   += virtio_can.o
 obj-$(CONFIG_CAN_XILINXCAN)+= xilinx_can.o
 
 subdir-ccflags-$(CONFIG_CAN_DEBUG_DEVICES) += -DDEBUG
diff --git a/drivers/net/can/virtio_can.c b/drivers/net/can/virtio_can.c
new file mode 100644
index ..924a7c19e438
--- /dev/null
+++ b/drivers/net/can/virtio_can.c
@@ -0,0 +1,1008 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CAN bus driver for the Virtio CAN controller
+ * Copyright (C) 2021-2023 OpenSynergy GmbH
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* CAN device queues */
+#define VIRTIO_CAN_QUEUE_TX 0 /* Driver side view! The device receives here */
+#define VIRTIO_CAN_QUEUE_RX 1 /* Driver side view! The device transmits here */
+#define VIRTIO_CAN_QUEUE_CONTROL 2
+#define VIRTIO_CAN_QUEUE_COUNT 3
+
+#define CAN_KNOWN_FLAGS \
+   

Re: [PATCH v5 11/34] function_graph: Have the instances use their own ftrace_ops for filtering

2024-01-08 Thread Mark Rutland
Hi,

There's a bit more of an info-dump below; I'll go try to dump the fgraph shadow
stack so that we can analyse this in more detail.

On Mon, Jan 08, 2024 at 10:14:36AM +0900, Masami Hiramatsu wrote:
> On Fri, 5 Jan 2024 17:09:10 +
> Mark Rutland  wrote:
> 
> > On Mon, Dec 18, 2023 at 10:13:46PM +0900, Masami Hiramatsu (Google) wrote:
> > > From: Steven Rostedt (VMware) 
> > > 
> > > Allow for instances to have their own ftrace_ops part of the fgraph_ops
> > > that makes the funtion_graph tracer filter on the set_ftrace_filter file
> > > of the instance and not the top instance.
> > > 
> > > This also change how the function_graph handles multiple instances on the
> > > shadow stack. Previously we use ARRAY type entries to record which one
> > > is enabled, and this makes it a bitmap of the fgraph_array's indexes.
> > > Previous function_graph_enter() expects calling back from
> > > prepare_ftrace_return() function which is called back only once if it is
> > > enabled. But this introduces different ftrace_ops for each fgraph
> > > instance and those are called from ftrace_graph_func() one by one. Thus
> > > we can not loop on the fgraph_array(), and need to reuse the ret_stack
> > > pushed by the previous instance. Finding the ret_stack is easy because
> > > we can check the ret_stack->func. But that is not enough for the self-
> > > recursive tail-call case. Thus fgraph uses the bitmap entry to find it
> > > is already set (this means that entry is for previous tail call).
> > > 
> > > Signed-off-by: Steven Rostedt (VMware) 
> > > Signed-off-by: Masami Hiramatsu (Google) 
> > 
> > As a heads-up, while testing the topic/fprobe-on-fgraph branch on arm64, I 
> > get
> > a warning which bisets down to this commit:
> 
> Hmm, so does this happen when enabling function graph tracer?

Yes; I see it during the function_graph boot-time self-test if I also enable
CONFIG_IRQSOFF_TRACER=y. I can also trigger it regardless of
CONFIG_IRQSOFF_TRACER if I cat /proc/self/stack with the function_graph tracer
enabled (note that I hacked the unwinder to continue after failing to recover a
return address):

| # mount -t tracefs none /sys/kernel/tracing/
| # echo function_graph > /sys/kernel/tracing/current_tracer
| # cat /proc/self/stack
| [   37.469980] [ cut here ]
| [   37.471503] WARNING: CPU: 2 PID: 174 at arch/arm64/kernel/stacktrace.c:84 
arch_stack_walk+0x2d8/0x338
| [   37.474381] Modules linked in:
| [   37.475501] CPU: 2 PID: 174 Comm: cat Not tainted 
6.7.0-rc2-00026-gea1e68a341c2-dirty #15
| [   37.478133] Hardware name: linux,dummy-virt (DT)
| [   37.479670] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| [   37.481923] pc : arch_stack_walk+0x2d8/0x338
| [   37.483373] lr : arch_stack_walk+0x1bc/0x338
| [   37.484818] sp : 8000835f3a90
| [   37.485974] x29: 8000835f3a90 x28: 8000835f3b80 x27: 
8000835f3b38
| [   37.488405] x26: 04341e00 x25: 8000835f4000 x24: 
80008002df18
| [   37.490842] x23: 80008002df18 x22: 8000835f3b60 x21: 
80008015d240
| [   37.493269] x20: 8000835f3b50 x19: 8000835f3b40 x18: 

| [   37.495704] x17:  x16:  x15: 

| [   37.498144] x14:  x13:  x12: 

| [   37.500579] x11: 800082b4d920 x10: 8000835f3a70 x9 : 
8000800e55a0
| [   37.503021] x8 : 80008002df18 x7 : 04341e00 x6 : 

| [   37.505452] x5 :  x4 : 8000835f3e48 x3 : 
8000835f3b80
| [   37.507888] x2 : 80008002df18 x1 : 07f7b000 x0 : 
80008002df18
| [   37.510319] Call trace:
| [   37.511202]  arch_stack_walk+0x2d8/0x338
| [   37.512541]  stack_trace_save_tsk+0x90/0x110
| [   37.514012]  return_to_handler+0x0/0x48
| [   37.515336]  return_to_handler+0x0/0x48
| [   37.516657]  return_to_handler+0x0/0x48
| [   37.517985]  return_to_handler+0x0/0x48
| [   37.519305]  return_to_handler+0x0/0x48
| [   37.520623]  return_to_handler+0x0/0x48
| [   37.521957]  return_to_handler+0x0/0x48
| [   37.523272]  return_to_handler+0x0/0x48
| [   37.524595]  return_to_handler+0x0/0x48
| [   37.525931]  return_to_handler+0x0/0x48
| [   37.527254]  return_to_handler+0x0/0x48
| [   37.528564]  el0t_64_sync_handler+0x120/0x130
| [   37.530046]  el0t_64_sync+0x190/0x198
| [   37.531310] ---[ end trace  ]---
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] ftrace_stub_graph+0x8/0x8
| [<0>] el0t_64_sync_handler+0x120/0x130
| [<0>] el0t_64_sync+0x190/0x198

One interesting thing there is that there are two distinct failure modes: the
unwind for the WARNING gives 

Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-08 Thread Christian Brauner
On Sun, Jan 07, 2024 at 01:32:28PM -0500, Steven Rostedt wrote:
> On Sun, 7 Jan 2024 13:29:12 -0500
> Steven Rostedt  wrote:
> 
> > > 
> > > IOW, the inode_permission() in lookup_one_len() that eventfs does is
> > > redundant and just wrong.  
> > 
> > I don't think so.
> 
> Just to make it clear. eventfs has nothing to do with mkdir instance/foo.
> It exists without that. Although one rationale to do eventfs was so

Every instance/foo/ tracefs instances also contains an events directory
and thus a eventfs portion. Eventfs is just a subtree of tracefs. It's
not a separate filesystem. Both eventfs and tracefs are on the same
single, system wide superblock.

> that the instance directories wouldn't recreate the same 10thousands
> event inodes and dentries for every mkdir done.

I know but that's irrelevant to what I'm trying to tell you.

A mkdir /sys/kernel/tracing/instances/foo creates a new tracefs
instance. With or without the on-demand dentry and inode creation for
the eventfs portion that tracefs "instance" has now been created in its
entirety including all the required information for someone to later
come along and perform a lookup on /sys/kernel/tracing/instances/foo/events.

All you've done is to defer the addition of the dentries and inodes when
someone does actually look at the events directory of the tracefs
instance.

Whether you choose to splice in the dentries and inodes for the eventfs
portion during lookup and readdir or if you had chosen to not do the
on-demand thing at all and the entries were created at the same time as
the mkdir call are equivalent from the perspective of permission
checking.

If you have the required permissions to look at the events directory
then there's no reason why listing the directory entries in there should
fail. This can't even happen right now.



Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

2024-01-08 Thread Christian Brauner
> > * Tracefs supports the creation of instances from userspace via mkdir.
> >   For example,
> > 
> > mkdir /sys/kernel/tracing/instances/foo
> > 
> >   And here the idmapping is relevant so we need to make the helpers
> >   aware of the idmapping.
> > 
> >   I just went and plumbed this through to most helpers.
> > 
> > There's some subtlety in eventfs. Afaict, the directories and files for
> > the individual events are created on-demand during lookup or readdir.
> > 
> > The ownership of these events is again inherited from the parent inode
> > or recovered from stored state. In both cases the actual idmapping is
> > irrelevant.
> > 
> > The callchain here is:
> > 
> > eventfs_root_lookup("xfs", "events")
> > -> create_{dir,file}_dentry("xfs", "events")
> >-> create_{dir,file}("xfs", "events")
> >   -> eventfs_start_creating("xfs", "events")
> >  -> lookup_one_len("xfs", "events")  
> > 
> > And the subtlety is that lookup_one_len() does permission checking on
> > the parent inode (IOW, if you want a dentry for "blech" under "events"
> > it'll do a permission check on events->d_inode) for exec permissions
> > and then goes on to give you a new dentry.
> > 
> > Usually this call would have to be changed to lookup_one() and the
> > idmapping be handed down to it. But I think that's irrelevant here.
> > 
> > Lookup generally doesn't need to be aware of idmappings at all. The
> > permission checking is done purely in the vfs via may_lookup() and the
> > idmapping is irrelevant because we always initialize inodes with the
> > filesystem level ownership (see the idmappings.rst) documentation if
> > you're interested in excessive details (otherwise you get inode aliases
> > which you really don't want).
> > 
> > For tracefs it would not matter for lookup per se but only because
> > tracefs seemingly creates inodes/dentries during lookup (and readdir()).
> 
> tracefs creates the inodes/dentries at boot up, it's eventfs that does
> it on demand during lookup.
> 
> For inodes/dentries:
> 
>  /sys/kernel/tracing/* is all created at boot up, except for "events".

Yes.

>  /sys/kernel/tracing/events/* is created on demand.

Yes.

> 
> > 
> > But imho the permission checking done in current eventfs_root_lookup()
> > via lookup_one_len() is meaningless in any way; possibly even
> > (conceptually) wrong.
> > 
> > Because, the actual permission checking for the creation of the eventfs
> > entries isn't really done during lookup or readdir, it's done when mkdir
> > is called:
> > 
> > mkdir /sys/kernel/tracing/instances/foo
> 
> No. that creates a entire new tracefs instance, which happens to
> include another eventfs directory.

Yes, I'm aware of all that.

> No. Only the meta data is created for the eventfs directory with a
> mkdir instances/foo. The inodes and dentries are not there.

I know, that is what I'm saying.

> 
> > 
> > When one goes and looksup stuff under foo/events/ or readdir the entries
> > in that directory:
> > 
> > fd = open("foo/events")
> > readdir(fd, ...)
> > 
> > then they are licensed to list an entry in that directory. So all that
> > needs to be done is to actually list those files in that directory. And
> > since they already exist (they were created during mkdir) we just need
> > to splice in inodes and dentries for them. But for that we shouldn't
> > check permissions on the directory again. Because we've done that
> > already correctly when the VFS called may_lookup().
> 
> No they do not exist.

I am aware.

> 
> > 
> > IOW, the inode_permission() in lookup_one_len() that eventfs does is
> > redundant and just wrong.
> 
> I don't think so.

I'm very well aware that the dentries and inode aren't created during
mkdir but the completely directory layout is determined. You're just
splicing in dentries and inodes during lookup and readdir.

If mkdir /sys/kernel/tracing/instances/foo has succeeded and you later
do a lookup/readdir on

ls -al /sys/kernel/tracing/instances/foo/events

Why should the creation of the dentries and inodes ever fail due to a
permission failure? The vfs did already verify that you had the required
permissions to list entries in that directory. Why should filling up
/sys/kernel/tracing/instances/foo/events ever fail then? It shouldn't
That tracefs instance would be half-functional. And again, right now
that inode_permission() check cannot even fail.



Re: [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill()

2024-01-08 Thread Yunsheng Lin
On 2024/1/6 0:06, Alexander H Duyck wrote:
>>  
>>  static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>> @@ -1353,8 +1318,7 @@ static int vhost_net_open(struct inode *inode, struct 
>> file *f)
>>  vqs[VHOST_NET_VQ_RX]);
>>  
>>  f->private_data = n;
>> -n->page_frag.page = NULL;
>> -n->refcnt_bias = 0;
>> +n->pf_cache.va = NULL;
>>  
>>  return 0;
>>  }
>> @@ -1422,8 +1386,9 @@ static int vhost_net_release(struct inode *inode, 
>> struct file *f)
>>  kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
>>  kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
>>  kfree(n->dev.vqs);
>> -if (n->page_frag.page)
>> -__page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
>> +if (n->pf_cache.va)
>> +__page_frag_cache_drain(virt_to_head_page(n->pf_cache.va),
>> +n->pf_cache.pagecnt_bias);
>>  kvfree(n);
>>  return 0;
>>  }
> 
> I would recommend reordering this patch with patch 5. Then you could
> remove the block that is setting "n->pf_cache.va = NULL" above and just
> make use of page_frag_cache_drain in the lower block which would also
> return the va to NULL.

I am not sure if we can as there is no zeroing for 'struct vhost_net' in
vhost_net_open().

If we don't have "n->pf_cache.va = NULL", don't we use the uninitialized data
when calling page_frag_alloc_align() for the first time?

> .
> 



Re: [PATCH v2 0/5] PM: domains: Add helpers for multi PM domains to avoid open-coding

2024-01-08 Thread Daniel Baluta
On Fri, Jan 5, 2024 at 6:02 PM Ulf Hansson  wrote:
>
> Updates in v2:
> - Ccing Daniel Baluta and Iuliana Prodan the NXP remoteproc patches to
> requests help with testing.
> - Fixed NULL pointer bug in patch1, pointed out by Nikunj.
> - Added some tested/reviewed-by tags.

Thanks for doing this Ulf. I remember that I've tried introducing the
helpers some time ago
but got side tracked by other tasks.

https://lore.kernel.org/linux-pm/20200624103247.7115-1-daniel.bal...@oss.nxp.com/t/

Will review the series and test the remoteproc part this week.

Daniel.



Re: [PATCH] tracing histograms: Simplify parse_actions() function

2024-01-08 Thread Andy Shevchenko
On Mon, Jan 8, 2024 at 3:31 AM Steven Rostedt  wrote:
>
> From: "Steven Rostedt (Google)" 
>
> The parse_actions() function uses 'len = str_has_prefix()' to test which
> action is in the string being parsed. But then it goes and repeats the
> logic for each different action. This logic can be simplified and
> duplicate code can be removed as 'len' contains the length of the found
> prefix which should be used for all actions.

> Link: https://lore.kernel.org/all/20240107112044.6702c...@gandalf.local.home/
>
> Signed-off-by: Steven Rostedt (Google) 

If you want Link to be formally a tag, you should drop the following
blank line.


> +   if ((len = str_has_prefix(str, "onmatch(")))
> +   hid = HANDLER_ONMATCH;
> +   else if ((len = str_has_prefix(str, "onmax(")))
> +   hid = HANDLER_ONMAX;
> +   else if ((len = str_has_prefix(str, "onchange(")))
> +   hid = HANDLER_ONCHANGE;

The repeating check for ( might be moved out as well after this like

  if (str[len] != '(') {
// not sure if you need data to be assigned here as well
ret = -EINVAL;
...
  }

-- 
With Best Regards,
Andy Shevchenko



Re: [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation

2024-01-08 Thread Yunsheng Lin
On 2024/1/5 23:35, Alexander H Duyck wrote:
> On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
>> Currently there seems to be three page frag implementions
>> which all try to allocate order 3 page, if that fails, it
>> then fail back to allocate order 0 page, and each of them
>> all allow order 3 page allocation to fail under certain
>> condition by using specific gfp bits.
>>
>> The gfp bits for order 3 page allocation are different
>> between different implementation, __GFP_NOMEMALLOC is
>> or'd to forbid access to emergency reserves memory for
>> __page_frag_cache_refill(), but it is not or'd in other
>> implementions, __GFP_DIRECT_RECLAIM is masked off to avoid
>> direct reclaim in skb_page_frag_refill(), but it is not
>> masked off in __page_frag_cache_refill().
>>
>> This patch unifies the gfp bits used between different
>> implementions by or'ing __GFP_NOMEMALLOC and masking off
>> __GFP_DIRECT_RECLAIM for order 3 page allocation to avoid
>> possible pressure for mm.
>>
>> Signed-off-by: Yunsheng Lin 
>> CC: Alexander Duyck 
>> ---
>>  drivers/vhost/net.c | 2 +-
>>  mm/page_alloc.c | 4 ++--
>>  net/core/sock.c | 2 +-
>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index f2ed7167c848..e574e21cc0ca 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net 
>> *net, unsigned int sz,
>>  /* Avoid direct reclaim but allow kswapd to wake */
>>  pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>>__GFP_COMP | __GFP_NOWARN |
>> -  __GFP_NORETRY,
>> +  __GFP_NORETRY | __GFP_NOMEMALLOC,
>>SKB_FRAG_PAGE_ORDER);
>>  if (likely(pfrag->page)) {
>>  pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 9a16305cf985..1f0b36dd81b5 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4693,8 +4693,8 @@ static struct page *__page_frag_cache_refill(struct 
>> page_frag_cache *nc,
>>  gfp_t gfp = gfp_mask;
>>  
>>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> -gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
>> -__GFP_NOMEMALLOC;
>> +gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
>> +   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>>  page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
>>  PAGE_FRAG_CACHE_MAX_ORDER);
>>  nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 446e945f736b..d643332c3ee5 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2900,7 +2900,7 @@ bool skb_page_frag_refill(unsigned int sz, struct 
>> page_frag *pfrag, gfp_t gfp)
>>  /* Avoid direct reclaim but allow kswapd to wake */
>>  pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>>__GFP_COMP | __GFP_NOWARN |
>> -  __GFP_NORETRY,
>> +  __GFP_NORETRY | __GFP_NOMEMALLOC,
>>SKB_FRAG_PAGE_ORDER);
>>  if (likely(pfrag->page)) {
>>  pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> 
> Looks fine to me.
> 
> One thing you may want to consider would be to place this all in an
> inline function that could just consolidate all the code.

Do you think it is possible to further unify the implementations of the
'struct page_frag_cache' and 'struct page_frag', so adding a inline
function for above is unnecessary?

> 
> Reviewed-by: Alexander Duyck 
> 
> .
>